Scientific database

The scientific databases build on CNGBdb data to provide a set of multi-omics databases. They aim to offer scientific data services for different research areas, such as plants, animals, microorganisms, viruses, and disease and health, to support the needs of researchers and to enhance the value of the data.

MPSTA: Mouse Placentation Spatiotemporal Transcriptomic Atlas

The MPSTA database contains a total of 15 sagittal sections of C57BL/6 mouse uteri profiled with Stereo-seq at E7.5 (x2), E8.5 (x2), E9.5 (x2), E10.5 (x3), E10.5 HFD (x1), E12.5 (x2), E14.5 (x2) and E14.5 HFD (x1). For each stage, at least two sections were included. In MPSTA, we provide spatial maps showing gene expression and subregion annotation for each sagittal section. Our panoramic atlas will allow in-depth investigation of longstanding questions concerning placenta development.

Release at 2024-10-22
Wu, Y., Su, K., Zhang, Y. et al. A spatiotemporal transcriptomic atlas of mouse placentation. Cell Discov 10, 110 (2024). DOI: 10.1038/s41421-024-00740-6

CBMSTA: A spatial transcriptome atlas of the mammalian whole cerebella

Recent discoveries about the molecular heterogeneity of the cerebellar cortex suggest the existence of functionally divergent subclasses of anatomically defined cell types. Using spatial transcriptome and single-nucleus RNA-seq analysis, we mapped 3D transcriptomic atlases of the whole cerebellum of mice, marmosets, and macaques at single-cell resolution. Comparative analysis revealed specific cell types, cell localizations, and intra-cerebellum molecular heterogeneity across species. The comprehensive database generated from this study will expand knowledge of the mammalian cerebellum.

Release at 2024-09-27
Shijie Hao et al., Cross-species single-cell spatial transcriptomic atlases of the cerebellar cortex. Science 385, eado3927 (2024). DOI: 10.1126/science.ado3927

The Global Ocean Microbiome Catalogue

The Global Ocean Microbiome Catalogue (GOMC), a database built in this study, is stored on the National Gene Bank's Life Big Data platform. Through analysis and in-depth mining of currently published marine microbial metagenomic data, we not only built the most complete marine microbial gene database so far, but also found a large number of gene resources with application potential, offering new leads for the development of gene editing tools, antimicrobial peptides, PET plastic-degrading enzymes, etc. This will greatly promote the application and development of related industries.

Release at 2024-09-04
Chen, J., Jia, Y., Sun, Y. et al. Global marine microbial diversity and its potential in bioprospecting. Nature (2024). DOI: 10.1038/s41586-024-07891-2

ASRSSTA: Arabidopsis Shoot Regeneration Single-cell Spatiotemporal Transcriptomic Atlas

The visualization website for the callus single-cell transcriptome map built in this study provides a reference resource for studying plant cell totipotency and regeneration mechanisms. Detailed single-cell transcriptome profiles of callus induction were constructed using single-cell RNA sequencing (scRNA-seq) to identify the cell types responsible for initiating early callus. We reconstructed the dedifferentiation trajectory of LRPI-like cells and QC-like cells during callus formation, and inferred transcription factors regulating QC-like cells and gene expression characteristics related to cell fate determination. This study provides a unique perspective on cell fate transition during callus formation and improves the understanding of callus formation and plant cell totipotency.

Release at 2024-08-12
Yin, Ruilian et al. A single-cell transcriptome atlas reveals the trajectory of early cell fate transition during callus induction in Arabidopsis. Plant communications vol. 5,8 (2024): 100941. DOI: 10.1016/j.xplc.2024.100941

MDESTA: Maize Development Ear Spatial Transcriptome Atlas

A comprehensive understanding of inflorescence development is crucial for crop genetic improvement, as inflorescence meristems give rise to reproductive organs and determine grain yield. However, dissecting inflorescence development at the cellular level has been challenging due to the lack of distinct marker genes to distinguish each cell type, particularly the various types of meristems that are vital for organ formation. In this study, we used spatial enhanced resolution omics-sequencing (Stereo-seq) to construct a precise spatial transcriptome map of developing maize ear primordia, identifying twelve cell types, including four newly identified cell types that are mainly distributed in the inflorescence meristem.

Release at 2024-05-14
Wang, Y., Luo, Y., Guo, X. et al. A spatial transcriptome map of the developing maize ear. Nat. Plants (2024). DOI: 10.1038/s41477-024-01683-2

HLMA: Human Muscle Ageing Cell Atlas

Muscle atrophy and frailty are common manifestations of sarcopenia and are critical contributors to morbidity and mortality in the elderly. Deciphering the molecular mechanisms underlying sarcopenia has major implications for understanding human ageing. Yet progress has been slow, in part due to the complexity of characterising skeletal muscle niche heterogeneity (with myofibres being the most abundant) and of obtaining well-characterized human samples. Here, we have generated a single-cell/single-nucleus transcriptomic and chromatin accessibility map of human limb skeletal muscles encompassing over 387,000 cells/nuclei from individuals ranging from 15 to 99 years of age with distinct fitness and frailty levels.

Release at 2024-04-22
Lai, Y., Ramírez-Pardo, I., Isern, J., An, J., Perdiguero, E., Serrano, A. L., ... & Esteban, M. A. (2024). Multimodal cell atlas of the ageing human skeletal muscle. Nature, 1-11. DOI: 10.1038/s41586-024-07348-6

LISTA: LIver Spatio-Temporal Atlas

The LISTA (LIver Spatio-Temporal Atlas) database used Stereo-seq (Spatio-Temporal Enhanced REsolution Omics-sequencing) combined with single-cell RNA-sequencing and single-cell ATAC-sequencing to profile mouse liver homeostasis and a time course of regeneration after partial hepatectomy. This integrative spatiotemporal analysis accurately resolves the transcriptomic and epigenetic gradients controlling the identity of all liver cell types and their molecular crosstalk in both homeostasis and regeneration.

Release at 2024-04-16
Xu, J., Guo, P., Hao, S., Shangguan, S., Shi, Q., Volpe, G., ... & Esteban, M. A. (2024). A spatiotemporal atlas of mouse liver homeostasis and regeneration. Nature Genetics, 1-17. DOI: 10.1038/s41588-024-01709-7

CIRSTA: Cholestatic Injury and Repair Spatio-Temporal Atlas

Cholestatic injuries, characterized by regional damage around the periportal region, lack curative therapies and cause considerable mortality. In this study, we generated a high-definition spatiotemporal atlas during cholestatic injury and repair by Stereo-seq and single-cell transcriptomics. We uncovered that cholangiocytes function as a periportal hub (cholangio-hub) by integrating multiple signals with neighboring cells. Feedback between cholangiocytes and lipid-associated macrophages (LAM) was detected in the cholangio-hub, which is related to the differentiation of LAM, a recently identified subpopulation of macrophages crucial in tissue injury.

Release at 2024-04-16
Wu, B., Shentu, X., Nan, H., Guo, P., Hao, S., Xu, J., ... & Hui, L. (2024). A spatiotemporal atlas of cholestatic injury and repair in mice. Nature Genetics, 1-15. DOI: 10.1038/s41588-024-01687-w

LettuceDB: Lettuce DataBase

LettuceDB is a comprehensive multi-omics database for cultivated lettuce. The database integrates multi-dimensional data into six modules: germplasm, genome, variome, phenome, microbiome, and spatio-temporal transcriptome. Gene annotation, sequence variations, selection regions and genome-wide association results are included in the JBrowse genome browser. A user-friendly bioinformatics toolbox was also developed, which enables LettuceDB to serve as a one-stop platform for lettuce research and breeding.

Release at 2024-04-01
Zhou, W., Yang, T., Zeng, L. et al. LettuceDB: an integrated multi-omics database for cultivated lettuce. Database (2024) Vol. 2024: article ID baae018 DOI: 10.1093/database/baae018

MHAOD: Macaque Hypothalamus Atlas for Obesity and Diabetes

This website offers an open and interactive database for interrogating snRNA-seq datasets and Stereo-seq maps of the macaque hypothalamus. Its comprehensive single-cell transcriptomic atlas provides valuable insights into the molecular changes of individual cells in various hypothalamic regions in control, obese and type 2 diabetic macaques. As a result, researchers studying metabolic disorders such as diabetes and obesity can use this resource to enhance their understanding of the molecular mechanisms that underlie such conditions.

Release at 2024-02-06
Lei, Ying et al. “Region-specific transcriptomic responses to obesity and diabetes in macaque hypothalamus.” Cell metabolism vol. 36,2 (2024): 438-453.e6. DOI: 10.1016/j.cmet.2024.01.003

SCAtlas HCL: Single-cell atlas of human cell lines

The SCAtlas HCL database analyzed 23,089 cells from 40 human cancer cell lines and 2 normal human cell lines to construct and visualize single-cell multi-omics maps. It explores transcriptomic and epigenetic heterogeneity within individual cancer cell lines, finding that transcriptomic heterogeneity is a common feature of different cancers, and reveals the molecular mechanism of the interaction between the transcriptome and the epigenome.

Release at 2023-12-09
Zhu, Q., Zhao, X., Zhang, Y. et al. Single cell multi-omics reveal intra-cell-line heterogeneity across human cancer cell lines. Nat Commun 14, 8170 (2023). DOI: 10.1038/s41467-023-43991-9

CNGB Imputation Service

CNGB Imputation Service builds a high-precision Chinese population haplotype reference panel from data resources such as the China Kadoorie Biobank (CKB) and the 1000 Genomes Project (1KGP) for genotype imputation on chip data or low-depth sequencing data, and provides a secure online genotype imputation service. Evaluation results show an imputation accuracy of more than 96%, making the service an important tool and data resource for population genetics research and the medical field.

Release at 2023-10-23
Yu, Canqing et al. “A high-resolution haplotype-resolved Reference panel constructed from the China Kadoorie Biobank Study.” Nucleic acids research vol. 51,21 (2023): 11770-11782. DOI: 10.1093/nar/gkad779

STOmics DB: Spatial Transcript Omics DataBase

Spatial Transcript Omics DataBase (STOmics DB) is a comprehensive portal of literature and datasets related to spatial transcriptomics. STOmics DB curates most public spatial transcriptomic datasets and provides interactive visualization and analysis.

Release at 2023-10-08
Xu, Z., Wang, W., Yang, T., Li, L., Ma, X., Chen, J., Wang, J., Huang, Y., Gould, J., Lu, H., et al. (2023). STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Research. DOI: 10.1093/nar/gkad933

CDCP 2.0: Cell-omics Data Coordinate Platform

CDCP (Cell-omics Data Coordinate Platform) shares and integrates complex single-cell datasets, and provides single-cell analysis tools and visualization services so that researchers can access and explore published single-cell datasets.

Release at 2023-09-01
Li Y, Yang T, Lai T, You L, Yang F, Qiu J, et al. CDCP: a visualization and analyzing platform for single-cell datasets. Journal of Genetics and Genomics. DOI: 10.1016/j.jgg.2021.12.004

CottonGVD: The Genomic Variation Database of Cultivated Gossypium spp.

Cotton multiomics Database (CottonGVD) integrates six types of omics data: genome, transcriptome, variome, epigenome, phenome and metabolome. CottonGVD integrates 25 cotton genomes, transcriptome data from 76 tissue samples, epigenetic information from 5 species, genetic variation data from 4,180 samples, 20 phenotypes and 768 metabolite contents. It provides an important data resource and analysis platform for researchers to compare the variation of cotton germplasm resources, explore the evolutionary relationship between cultivated and wild species, develop genetic biomarkers and conduct genome-wide association studies.

Release at 2023-01-06
Zhiquan Yang, Jing Wang, Yiming Huang, Shengbo Wang, Lulu Wei, Dongxu Liu, Yonglin Weng, Jinhai Xiang, Qiang Zhu, Zhaoen Yang, Xinhui Nie, Yu Yu, Zuoren Yang, Qing-Yong Yang, CottonMD: a multi-omics database for cotton biological study, Nucleic Acids Research, Volume 51, Issue D1, 6 January 2023, Pages D1446–D1456. DOI: 10.1093/nar/gkac863

MBA: Macaque brain atlas

Macaque Brain Atlas (MBA) generated single-cell chromatin accessibility (single-cell ATAC) and transcriptomic data of 358,237 cells from three cortical regions of the adult cynomolgus monkey (Macaca fascicularis) brain. We then integrated this dataset with Stereo-seq (Spatio-Temporal Enhanced REsolution Omics-sequencing) of the corresponding cortical areas to assign topographic information to molecular and regulatory states.

Release at 2022-11-08
Lei, Y., Cheng, M., Li, Z., Zhuang, Z., Wu, L., Sun, Y., ... & Xu, X. (2022). Spatially resolved gene regulatory and disease-related vulnerability map of the adult Macaque cortex. Nature Communications, 13(1), 6747. DOI: 10.1038/s41467-022-34413-3

ARTISTA: Axolotl Regenerative Telencephalon Interpretation via Spatiotemporal Transcriptomic Atlas

Axolotl Regenerative Telencephalon Interpretation via Spatiotemporal Transcriptomic Atlas (ARTISTA) is a spatially resolved transcriptomic data resource that provides visualization of gene expression across the regeneration and development stages of the axolotl telencephalon at single-cell resolution.

Release at 2022-09-02
Wei, X., Fu, S., Li, H., Liu, Y., Wang, S., Feng, W., ... & Gu, Y. (2022). Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science, 377(6610), eabp9444. DOI: 10.1126/science.abp9444

MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas

Mouse Organogenesis Spatiotemporal Transcriptomic Atlas (MOSTA) provides spatial maps showing gene expression, gene co-expression, and regulons in each embryo sagittal section. The panoramic atlas allows in-depth investigation of longstanding questions concerning mammalian development.

Release at 2022-05-04
Chen, A., Liao, S., Cheng, M., Ma, K., Wu, L., Lai, Y., ... & Wang, J. (2022). Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell, 185(10), 1777-1792. DOI: 10.1016/j.cell.2022.04.003

ZESTA: Zebrafish Embryogenesis Spatiotemporal Transcriptomic Atlas

ZESTA employed Stereo-seq to dissect the spatiotemporal dynamics of gene expression and regulatory networks in developing zebrafish embryos. We profiled 91 embryo sections covering six critical time points during the first 24 hours of development, obtaining a total of 152,977 spots at a resolution of 10 × 10 × 15 µm³ (close to cellular size) with spatial coordinates. Meanwhile, we identified spatial modules and co-varying genes for specific tissue organizations. By performing integrated analysis of the Stereo-seq and scRNA-seq data from each time point, we reconstructed the spatially resolved developmental trajectories of cell fate transitions and molecular changes during zebrafish embryogenesis.

Release at 2022-05-04
Liu, C., Li, R., Li, Y., Lin, X., Zhao, K., Liu, Q., ... & Liu, L. (2022). Spatiotemporal mapping of gene expression landscapes and developmental trajectories during zebrafish embryogenesis. Developmental Cell, 57(10), 1284-1298. DOI: 10.1016/j.devcel.2022.04.009

Flysta3D: High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae

Flysta3D is intended to curate 3D spatial transcriptomes of all stages of Drosophila embryos and larvae generated by Stereo-seq. It can be used to visualize and analyze the spatial expression patterns of genes of interest, reconstruct tissue-specific spatial transcriptomes in 3D by clustering and annotation, simulate tissue developmental trajectories across development, identify cell signalling pathways and gene regulatory networks, examine gene functions in their intact spatial context, etc.

Release at 2022-05-04
Wang, M., Hu, Q., Lv, T., Wang, Y., Lan, Q., Xiang, R., ... & Liu, L. (2022). High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Developmental Cell, 57(10), 1271-1283. DOI: 10.1016/j.devcel.2022.04.006

NHPCA: Non-human Primate Single-cell Atlas Database

Non-human primate cell atlas (NHPCA) is a single-cell multi-omics database of non-human primates (NHPs). This single-cell data resource provides visualization and preliminary analysis of transcriptomic and forthcoming epigenetic single-cell data sampled from multiple NHP organs and tissues, aiming to construct a comprehensive, high-quality single-cell reference map with multi-omics single-cell sequencing technology. The reference map comprises not only a healthy adult single-cell atlas but also single-cell atlases of embryonic development and of disease animal models. These single-cell maps will be critical digital infrastructure for better understanding physiological function and the underlying pathogenesis of diseases.

Release at 2022-04-13
Han, L., Wei, X., Liu, C. et al. Cell transcriptomic atlas of the non-human primate Macaca fascicularis. Nature 604, 723–731 (2022). DOI: 10.1038/s41586-022-04587-3

Deepseadb: DataBase of Deep-Sea Life

The Database of Deep-Sea Life (Deepseadb) includes both sequencing resources and metadata of ecological communities, isolates and animals collected from the deep sea (>1000 m depth). This database aims to be a dictionary for the exploration and utilization of genetic resources in the deep sea, providing uniform metadata, standardized analysis results and batch analysis tools.

Release at 2021-08-10

QTP: Qinghai-Tibet Plateau Animal Microbiome Database

The QTP Animal Microbiome Project has generated more than 30 Tb of gut microbial metagenomic data. Combined with other approaches such as culturomics, metabolomics, and functional verification in germ-free animal models, this project not only provides new knowledge for understanding the omics mechanisms underlying these animals' adaptation to extreme environments, but also offers an opportunity and a safeguard for the mining, protection and sustainable utilization of unique gut microbial species and genetic resources.

Release at 2021-04-06

GISAID: EpiCoV™ Database

China National GeneBank DataBase (CNGBdb) is an official partner of the GISAID Initiative. It provides access to EpiCoV and features the most complete collection of hCoV-19 genome sequences along with related clinical and epidemiological data. With the data from this database scientific researchers can construct a virus phylogenetic tree to reveal the characteristics of the pathogen, and provide effective references for the study and analysis of the evolutionary source and pathological mechanism of the novel coronavirus.

Release at 2021-03-31

Agricultural Digital Service Platform

Based on the germplasm resource bank, the establishment of the three-in-one Core Germplasm GeneBank will provide basic support for improving molecular breeding efficiency, increasing germplasm resources and data sharing, and winning the "seed industry turnaround".

Release at 2021-02-26

BiosalineDB: Biosaline DataBase

Biosaline DataBase is a comprehensive database of information related to biosaline research and utilization. It mainly contains halophyte resources, genomic information and literature about salt tolerance in plants.

Release at 2020-10-14

Fish10K: The 10,000 Fish Genomes Project

Fish10K (The 10,000 Fish Genomes Project) was officially launched by BGI at ICG-Ocean 2019, which was held in September 2019, aiming to sample, sequence, assemble and analyze 10,000 representative fish genomes under a systematic context within ten years. We will construct high-quality reference genomes for representative species in all orders (Phase I) and families (Phase II) in concert with the generation of draft genome sequences for additional related species (Phase III).

Release at 2020-08-18
Fan, G., Song, Y., Yang, L., Huang, X., Zhang, S., Zhang, M., ... & He, S. (2020). Initial data release and announcement of the 10,000 Fish Genomes Project (Fish10K). GigaScience, 9(8), giaa080. DOI: 10.1093/gigascience/giaa080

ZEAMAP: a comprehensive database adapted to the maize multi-omics era

ZEAMAP is a comprehensive database incorporating population-level multi-omic data for the Zea genus, including genomics, transcriptomics, genetic variants, phenotypes, metabolomics, epigenetics, and genetic mapping loci of complex traits. ZEAMAP is user friendly, with the ability to interactively integrate, visualize, and cross-reference multiple different omics datasets. The database is powered by the National Key Laboratory of Crop Genetic Improvement (Huazhong Agricultural University), Beijing Genomics Institute-Shenzhen (BGI) and China National GeneBank (CNGB), and aims to support maize improvement by integrating pan-Zea multi-omics information.

Release at 2020-06-26
Gui, S., Yang, L., Li, J., Luo, J., Xu, X., Yuan, J., ... & Yan, J. (2020). ZEAMAP, a comprehensive database adapted to the maize multi-omics era. IScience, 23(6). DOI: 10.1016/j.isci.2020.101241

MPDB: Medicinal Plant DataBase

The germplasm resources of medicinal plants, as the source of the traditional Chinese medicine industry, play an important role in the development of Chinese medicine. In recent years, research on medicinal plant resources has developed rapidly, providing the pharmaceutical industry with vast resources for new drug creation, and research on the gene resources of medicinal plants has gradually become a hot topic in the industry. China is one of the countries with the most abundant germplasm resources of medicinal plants in the world, so strengthening the conservation, sustainable use and research of medicinal plant genetic resources is of great significance to the development of the Chinese medicine industry.

Release at 2020-03-31

VirusDIP: Virus Data Integration Platform

VirusDIP (Virus Data Integration Platform) is a community portal for viral sequence data from CNGBdb, GISAID and NCBI. It provides meta-information retrieval and data file download services, and is gradually deploying multiple tools such as BLAST (Basic Local Alignment Search Tool), phylogenetic analysis, a genome browser and a secure multi-party computation tool. The Virus Data Integration Platform initiative aims to provide a reference for rapid identification of pathogens, tracing the specific source of epidemic outbreaks, virus phylogenetic analysis and research, and the study of the pathological mechanisms of disease.

Release at 2020-03-27

HCL: Human Cell Landscape

We used single-cell RNA sequencing to determine the cell type composition of major human organs and construct a basic scheme for the human cell landscape (HCL). The HCL database contains data visualization resources for 102 human cell types and 843 cell subtypes identified from 702,968 single-cell transcriptomes, and the scHCL tool can help you identify cell types in your own data.

Release at 2020-03-25
Han, X., Zhou, Z., Fei, L. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020). DOI: 10.1038/s41586-020-2157-4

NGD: Nelumbo Genome Database

The Nelumbo Genome Database was constructed through a collaboration between the Wuhan Institute of Landscape Architecture and the Research Group of Aquatic Plant Biogeography at Wuhan Botanical Garden, Chinese Academy of Sciences. Users can browse and retrieve multiple datasets and also perform simple analyses with deployed tools such as BLAST and Primer.

Release at 2020-03-10
Li, H., Yang, X., Wang, Q., Chen, J., & Shi, T. (2021). Distinct methylome patterns contribute to ecotypic differentiation in the growth of the storage organ of a flowering plant (sacred lotus). Molecular ecology, 30(12), 2831-2845. DOI: 10.1111/mec.15933

10KMP: Database of 10,000 Medicinal Plants

The Database of 10,000 Medicinal Plants is designed based on the data of the Guangxi Innovation-Driven Development Special Project, Construction of Big Data Centre of Medicinal Plants (Guike AA18242040). The database aims to build the largest big data center of medicinal plants in the world and to interpret the relationship between medicinal plant big data and pharmaceutical efficacy. Through smart innovation of medicinal plant resources and smart creation of medicinal plant products, it seeks to achieve sustainable utilization of medicinal plant resources, resource digitization, data industrialization and industry modernization, so as to promote the development of traditional Chinese medicine and the health industry.

Release at 2019-12-18

MMHP: Million Microbiomes from Humans Project

The "Million Microbiomes from Humans Project" (MMHP) was officially launched at the 14th International Conference on Genomics (ICG-14). Scientists from China, Sweden, Denmark, France, and Latvia agreed to collaborate on a large-scale microbial metagenomic project, aiming to sequence and analyze one million samples from the intestine, mouth, skin, reproductive tract, and other organs in the next three to five years to construct a microbiome map of the human body and build the world's largest database of the human microbiome.

Release at 2019-10-29
Fang C, Zhong H, Lin Y, et al. Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing. Gigascience, 2018, 7(3): gix133. DOI: 10.1093/gigascience/gix133

MDB: Microbiome Database

The Microbiome Database (MDB) provides relevant sample and microbial data. The Human Microbial Database currently covers 83 G of sequencing data and phenotypic information for 1,443 stool samples from 8 human intestinal microbiome research projects. It also contains the most complete human intestinal microbial gene set in the world.

Release at 2019-10-28
Jie Zhu, Liu Tian, Peishan Chen, Mo Han, Liju Song, Xin Tong, Xiaohuan Sun, Fangming Yang, Zhipeng Lin, Xing Liu, Chuan Liu, Xiaohan Wang, Yuxiang Lin, Kaiye Cai, Yong Hou, Xun Xu, Huanming Yang, Jian Wang, Karsten Kristiansen, Liang Xiao, Tao Zhang, Huijue Jia, Zhuye Jie, Over 50,000 Metagenomically Assembled Draft Genomes for the Human Oral Microbiome Reveal New Taxa, Genomics, Proteomics & Bioinformatics, Volume 20, Issue 2, April 2022, Pages 246–259. DOI: 10.1016/j.gpb.2021.05.001

PIRD: Pan Immune Repertoire Database

Pan Immune Repertoire Database (PIRD) mainly focuses on human immune repertoire data. It collects BCR and TCR sequencing data for various diseases, together with the experimental information and phenotype information of the corresponding individuals. PIRD V1 stores data from 1,923 samples and 554,696,060 sequences. PIRD provides data comparison and visualization services for researchers and clinicians in the fields of disease and public health.

Release at 2019-08-02
Zhang W, Wang L, Liu K, Wei X, Yang K, Du W, Wang S, Guo N, Ma C, Luo L, et al. PIRD: Pan immune repertoire database. Bioinformatics (2019). DOI: 10.1093/bioinformatics/btz614

10KP: 10,000 Plant Genomes Project

The 10,000 Plant Genomes Project (tenKP or 10KP) aims to sequence over 10,000 genomes representing every major clade of plants and eukaryotic microbes. The project plans to generate large-scale plant genome data within five years (2017-2022), addressing fundamental questions about plant evolution. Major supporters include Beijing Genomics Institute in Shenzhen (BGI-Shenzhen) and China National GeneBank (CNGB). BGI will support this project by developing new tools for de novo genome sequencing and assembly on MGISEQ platforms.

Release at 2018-02-20
Cheng, S., Melkonian, M., Smith, S. A., Brockington, S., Archibald, J. M., Delaux, P. M., ... & Wong, G. K. S. (2018). 10KP: A phylodiverse genome sequencing plan. Gigascience, 7(3), giy013. DOI: 10.1093/gigascience/giy013

GDRD: Genetic Disease and Rare Disease database

GDRD is an integrated platform for genetic disease and rare disease research and application, which focuses on the collection, storage, analysis and mining of human genetic data and phenotype data. In GDRD phase I, around 7,000 papers, 10,000 causative variants and 300 families with rare disorders from the BGI, ClinVar and OMIM databases have been organized and presented on the website.

Release at 2017-10-27

PVD: Pathogen Variation Database

Pathogen Variation Database (PVD) focuses on the identification and detection of unknown pathogens in human samples, and contains various pathogen genomic data and related annotation information. PVD presents results clearly and accessibly through data analysis and visualization, and will provide toxicity identification and drug resistance information for some pathogens (HBV/HIV/HCV/HP). It thus offers fast and comprehensive detection services for clinicians, patients and researchers.

Release at 2017-07-26

DISSECT: Data Integration Solution for Systematic Exploration of Cancer Traits

DISSECT (Data Integration Solution for Systematic Exploration of Cancer Traits) is a comprehensive data integration platform for cancer research. It includes the first mirror site of the ICGC Data Portal in China, which provides important resources for domestic researchers. Building on big data research, we aim to establish the most comprehensive cancer big data integration system through large-scale, standardized data platform construction. The main value of the system lies in integrating omics data and supporting in-depth mining of large-sample data for a single cancer or multiple cancers, to support the development of precision cancer medicine in China.

Release at 2017-07-10

WGVD: Wheat Genome Variation Database and Selective Signatures

The Northwest Agriculture and Forestry University Wheat Variation Database is deployed as a mirror database in CNGBdb. It contains resequencing and exon sequencing data for 968 wheat germplasm accessions. Variations include 7,353,314 SNPs and 1,044,400 indels, and the database provides and displays statistics of wheat population selection signals (Pi and FST).

Release at 2017-07-07
Avni, R., Nave, M., Barad, O., Baruch, K., Twardziok, S. O., Gundlach, H., ... & Distelfeld, A. (2017). Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science, 357(6346), 93-97. DOI: 10.1126/science.aan0032
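
The Pi and FST statistics mentioned for WGVD can be illustrated with a minimal sketch. The entry does not state which estimators the database uses, so this is an assumption: the code below uses the standard unbiased per-site nucleotide diversity for a biallelic SNP and Hudson's FST estimator computed as a ratio of averages across sites. The function names and the input layout (alternate-allele count and number of sampled chromosomes per population) are hypothetical and chosen only for illustration.

```python
import math


def nucleotide_diversity(alt_count, n_chrom):
    """Per-site nucleotide diversity (Pi) for one biallelic SNP.

    alt_count: copies of the alternate allele observed in the sample.
    n_chrom:   number of sampled chromosomes (at least 2).
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    # Expected heterozygosity with the small-sample correction n / (n - 1).
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def hudson_fst(sites):
    """Hudson's FST between two populations, as a ratio of averages.

    sites: iterable of (alt1, n1, alt2, n2) tuples, one per SNP, giving the
    alternate-allele count and chromosome count in each population.
    Returns NaN when no site is informative.
    """
    num = 0.0
    den = 0.0
    for alt1, n1, alt2, n2 in sites:
        p1 = alt1 / n1
        p2 = alt2 / n2
        # Numerator: squared frequency difference minus sampling variance.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that two alleles, one from each
        # population, differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else math.nan


if __name__ == "__main__":
    # Toy allele counts, not taken from WGVD.
    print(nucleotide_diversity(5, 10))
    print(hudson_fst([(10, 10, 0, 10), (5, 10, 5, 10)]))
```

Low Pi within a population together with high FST between populations at the same region is the usual pattern read as a candidate selection signal; a real analysis would compute both in sliding windows along the genome rather than per site.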

FishT1K: Transcriptomes of 1,000 Fishes

The FishT1K (Transcriptomes of 1,000 Fishes) project was officially launched by BGI in November 2013, with the aim of generating genome-wide transcriptome sequences for 1,000 diverse species of fishes using RNA-seq. The FishT1K database will establish the first platform for data storage, application and sharing in fish research, greatly advancing the study of fish biology and eventually contributing towards global fish biodiversity conservation efforts and the sustainable utilization of natural resources. In addition, the database will promote the development of new technologies and software for transcriptome sequencing, data analysis, annotation, and storage.

Release at 2016-05-03
Sun, Y., Huang, Y., Li, X., Baldwin, C. C., Zhou, Z., Yan, Z., ... & Shi, Q. (2016). Fish-T1K (Transcriptomes of 1,000 Fishes) Project: large-scale transcriptome data for fish evolution studies. Gigascience, 5(1), s13742-016. DOI: 10.1186/s13742-016-0124-7

B10K: Bird 10K Genomes

The Bird 10,000 Genomes (B10K) Project plans to generate representative draft genome sequences from all extant bird species within five years (2015-2020). The B10K project will complete a genome-scale tree of all bird species, decode the relationship between genetic variation and phenotypic variation, uncover the correlation between genetic evolution and biogeographical and biodiversity patterns, evaluate the impact of various ecological factors and human influence on species evolution, and unveil the demographic history of birds.

Release at 2015-06-03
Zhang, G. (2015). Bird sequencing project takes off. Nature, 522(7554), 34-34. DOI: 10.1038/522034d

1KITE: 1,000 Insect Transcriptome Evolution

Insects are one of the most species-rich groups of metazoan organisms. They play a pivotal role in most non-marine ecosystems, and many insect species are of enormous economic and medical importance. Unraveling the evolution of insects is essential for understanding how life in terrestrial and limnic environments evolved. The 1KITE (1K Insect Transcriptome Evolution) project aims to study the transcriptomes (that is, the entirety of expressed genes) of more than 1,000 insect species encompassing all recognized insect orders.

Release at 2014-11-07
Misof, B., Liu, S., Meusemann, K., Peters, R. S., Donath, A., Mayer, C., ... & Zhou, X. (2014). Phylogenomics resolves the timing and pattern of insect evolution. Science, 346(6210), 763-767. DOI: 10.1126/science.1257570

OneKP: 1,000 Plants

The 1,000 Plants project (OneKP or 1KP) is an international multi-disciplinary consortium that has generated large-scale genome sequencing data for over 1,000 species of plants. We constructed an online BLAST platform based on the OneKP datasets in the database.

Release at 2014-10-27
Matasci, N., Hung, L. H., Yan, Z., Carpenter, E. J., Wickett, N. J., Mirarab, S., ... & Wong, G. K. S. (2014). Data access for the 1,000 Plants (1KP) project. Gigascience, 3(1), 2047-217X. DOI: 10.1186/2047-217X-3-17

MilletDB: Millet DataBase

The millet database is based on data from the millet genome project conducted by BGI and the Zhangjiakou Academy of Agricultural Sciences. The database records genotype-phenotype information for millet. Users can query and retrieve millet genotypes by phenotype, and the corresponding phenotypes by genotype. In addition, the database applies big data technology and machine learning methods to construct genotype-phenotype models that promote intelligent molecular breeding.

Release at 2013-06-23
Zhang, G., Liu, X., Quan, Z. et al. Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat Biotechnol 30, 549–554 (2012). DOI: 10.1038/nbt.2195
