OrionGeno (https://github.com/BGIResearch/OrionGeno)
Source: CNGBdb Project (ID CNP0009228)
License: CC BY 4.0

Description: This dataset provides large-scale, high-quality gene annotations generated by OrionGeno for 1,249 phylogenetically representative eukaryotic families, with one high-quality genome per family. Of these, 725 families have no prior gene annotation in NCBI; they span major taxonomic groups including Insecta, Actinopteri, Aves, and Magnoliopsida. All annotations were completed in 10 days on a single NVIDIA A100 GPU node. BUSCO assessment indicates high completeness: the median score is 94.7%, 77.1% of genomes (963 of 1,249) exceed 90% completeness, and 89.2% of sampled classes have a mean BUSCO score above 80%. Lower scores were found only in a few early-diverging lineages and are attributed to high genomic divergence. Independent validation with multi-tissue RNA-seq data from six representative species showed an average transcriptional support rate (TPM > 1) of 93.1%, supporting the reliability of the predicted gene structures.
Data type: Genome sequencing and assembly; Exome
Sample scope: Multispecies
Submitter: 吴逸文; BGI Research
Release date: 2026-04-11
Last updated: 2026-04-11
Data size: 133.76 GB
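
The proportions quoted in the description can be re-derived from the stated counts. The following is a minimal sketch for readers who want to sanity-check them; the helper name `percent` and the variable names are illustrative and not part of OrionGeno, and the per-day throughput is a simple average that assumes the 10-day figure covers all 1,249 genomes.

```python
# Counts taken from the dataset description.
TOTAL_FAMILIES = 1249      # one genome per family
ABOVE_90_BUSCO = 963       # genomes with BUSCO completeness > 90%
NO_PRIOR_ANNOTATION = 725  # families without a prior NCBI gene annotation
RUNTIME_DAYS = 10          # stated wall-clock time on one A100 GPU node


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


share_above_90 = percent(ABOVE_90_BUSCO, TOTAL_FAMILIES)
share_no_prior = percent(NO_PRIOR_ANNOTATION, TOTAL_FAMILIES)

# Average throughput, assuming the 10 days cover every genome in the set.
genomes_per_day = round(TOTAL_FAMILIES / RUNTIME_DAYS, 1)

if __name__ == "__main__":
    print(f"Genomes above 90% BUSCO: {share_above_90}%")
    print(f"Families newly annotated: {share_no_prior}%")
    print(f"Average genomes per day: {genomes_per_day}")
```

The first value matches the 77.1% reported in the description; the other two are derived figures that the record itself does not state.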