The scientific databases build on CNGBdb data to provide multiple multi-omics databases. They aim to provide scientific data services for different research areas, such as plants, animals, microorganisms, viruses, and disease and health, to support the needs of researchers and to enhance the value of the data.
The 1,000 Plants project (OneKP or 1KP) is an international multi-disciplinary consortium that has generated large-scale genome sequencing data for over 1,000 species of plants. We constructed an online BLAST platform based on the OneKP datasets in the database.
Biosaline DataBase is a comprehensive database of information related to the research and utilization of biosaline. It mainly contains halophyte resources, genomic information, and literature on salt tolerance in plants.
The 10,000 Plants project (tenKP or 10KP) aims to sequence over 10,000 genomes representing every major clade of plants and eukaryotic microbes. The project plans to generate large-scale plant genome data within five years (2017-2022), addressing fundamental questions about plant evolution. Major supporters include the Beijing Genomics Institute in Shenzhen (BGI-Shenzhen) and the China National GeneBank (CNGB). BGI will support the project by developing new tools for de novo genome sequencing and assembly on MGISEQ platforms.
The Database of 10,000 Medicinal Plants is built on data from the Guangxi Innovation-Driven Development Special Project "Construction of Big Data Centre of Medicinal Plants" (Guike AA18242040). The database aims to build the world's largest big data center of medicinal plants and to interpret the relationship between medicinal plant big data and pharmaceutical efficacy. Through smart innovation of medicinal plant resources and smart creation of medicinal plant products, it seeks to achieve sustainable utilization of medicinal plant resources, resource digitization, data industrialization, and industry modernization, so as to promote the development of traditional Chinese medicine and the health industry.
The Northwest Agriculture and Forestry University Wheat Variation Database is deployed as a mirror database in CNGBdb. It contains resequencing and exon sequencing data for 968 wheat germplasm accessions. The variations include 7,353,314 SNPs and 1,044,400 indels, and the database provides and displays statistics of wheat population selection signals (Pi and FST).
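The description above does not say which estimators the database uses for Pi and FST. As a minimal sketch only, the following assumes biallelic SNPs, per-site nucleotide diversity computed as unbiased heterozygosity, and Hudson's FST estimator combined across sites as a ratio of sums. The function names and the input layout (alternate-allele counts per site plus the number of sampled chromosomes) are hypothetical and are not part of the database.

```python
# Illustrative sketch: NOT the database's own pipeline.
# Assumptions: biallelic SNPs; inputs are alternate-allele counts per site.

def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity (Pi) for one biallelic SNP.

    Equals the probability that two chromosomes drawn without
    replacement from the sample carry different alleles.
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def hudson_fst(counts1, n1, counts2, n2):
    """Hudson's FST between two populations, as a ratio of sums over sites.

    counts1, counts2: alternate-allele counts per site in each population.
    n1, n2: number of sampled chromosomes in each population.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    num = 0.0
    den = 0.0
    for k1, k2 in zip(counts1, counts2):
        p1 = k1 / n1
        p2 = k2 / n2
        # Squared frequency difference, corrected for finite sample size.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")
```

In a selection scan, values like these are typically summarized in windows along the genome; regions with low Pi in one population or high FST between populations are candidate selection signals.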
The germplasm resources of medicinal plants, as the source of the traditional Chinese medicine industry, play an important role in the development of Chinese medicine. In recent years, research on medicinal plant resources has developed rapidly and has provided the pharmaceutical industry with a large pool of resources for new drug development, and research on the gene resources of medicinal plants has gradually become a hot topic in the industry. In a country with some of the world's most abundant germplasm resources of medicinal plants, strengthening the conservation, sustainable use, and research of medicinal plant genetic resources is of great significance to the development of the Chinese medicine industry.
The millet database is based on data from the millet genome project conducted by BGI and the Zhangjiakou Academy of Agricultural Sciences. The database records genotype-phenotype information for millet. Users can query millet genotypes by phenotype and retrieve the corresponding phenotypes by genotype. In addition, the database applies big data technology and machine learning methods to build genotype-phenotype models that promote intelligent molecular breeding.
The Bird 10,000 Genomes (B10K) Project plans to generate representative draft genome sequences from all extant bird species within five years (2015-2020). The B10K project aims to build a genome-level tree of all bird species, decode the relationship between genetic variation and phenotypic variation, uncover the correlation between genetic evolution and biogeographical and biodiversity patterns, evaluate the impact of various ecological factors and human influence on species evolution, and unveil the demographic history of bird species.
The FishT1K (Transcriptomes of 1,000 Fishes) project was officially launched by BGI in November 2013, with the aim of generating genome-wide transcriptome sequences for 1,000 diverse species of fishes using RNA-seq. The FishT1K database will establish the first platform for data storage, application, and sharing in fish research, greatly advancing the study of fish biology and eventually contributing to global fish biodiversity conservation efforts and the sustainable utilization of natural resources. In addition, the database will promote the development of new technologies and software for transcriptome sequencing, data analysis, annotation, and storage.
Fish10K (The 10,000 Fish Genomes Project) was officially launched by BGI at ICG-Ocean 2019, held in September 2019, with the aim of sampling, sequencing, assembling, and analyzing 10,000 representative fish genomes in a systematic context within ten years. We will construct high-quality reference genomes for representative species in all orders (Phase I) and families (Phase II), in concert with the generation of draft genome sequences for additional related species (Phase III).
Insects are one of the most species-rich groups of metazoan organisms. They play a pivotal role in most non-marine ecosystems, and many insect species are of enormous economic and medical importance. Unraveling the evolution of insects is essential for understanding how life in terrestrial and limnic environments evolved. The 1KITE (1K Insect Transcriptome Evolution) project aims to study the transcriptomes (that is, the entirety of expressed genes) of more than 1,000 insect species encompassing all recognized insect orders.
The Non-Human Primate Cell Atlas (NHPCA) is a single-cell multi-omics database of non-human primates (NHPs). This single-cell data resource provides visualization and preliminary analysis of transcriptomic and forthcoming epigenetic single-cell data sampled from multiple NHP organs and tissues, with the aim of constructing a comprehensive, high-quality single-cell reference map using multi-omics single-cell sequencing technology. The reference map comprises not only a healthy adult single-cell atlas but also single-cell atlases of embryonic development and disease animal models. These single-cell maps will serve as critical digital infrastructure for better understanding physiological function and the underlying pathogenesis of diseases.
The Microbiome Database (MDB) provides relevant sample and microbial data. The Human Microbial Database currently covers 83G of sequencing data and phenotypic information from 1,443 stool samples collected in 8 human intestinal microbiome research projects. It also contains the most complete human intestinal microbial gene set in the world.
The "Million Microbiomes from Humans Project" (MMHP) was officially launched at the 14th International Conference on Genomics (ICG-14). Scientists from China, Sweden, Denmark, France, and Latvia agreed to collaborate on a large-scale microbial metagenomic project, aiming to sequence and analyze one million samples from the intestine, mouth, skin, reproductive tract, and other organs in the next three to five years to construct a microbiome map of the human body and build the world's largest database of the human microbiome.
China National GeneBank DataBase (CNGBdb) is an official partner of the GISAID Initiative. It provides access to EpiCoV, which features the most complete collection of hCoV-19 genome sequences along with related clinical and epidemiological data. With the data from this database, researchers can construct a virus phylogenetic tree to reveal the characteristics of the pathogen, providing an effective reference for studying the evolutionary origin and pathological mechanism of the novel coronavirus.
VirusDIP (Virus Data Integration Platform) is a community portal for viral sequence data from CNGBdb, GISAID, and NCBI. It provides metadata retrieval and data file download services, and is gradually deploying tools such as BLAST (Basic Local Alignment Search Tool), phylogenetic analysis, a genome browser, and a secure multi-party computation tool. The Virus Data Integration Platform initiative aims to provide a reference for the rapid identification of pathogens, the tracing of specific sources of epidemic outbreaks, virus phylogenetic analysis and research, and the study of the pathological mechanisms of disease.
The Pathogen Variation Database (PVD) focuses on the identification and detection of unknown pathogens in human samples, and contains various pathogen genomic data and related annotation information. PVD presents results clearly through data analysis and visualization, and will provide virulence identification and drug resistance information for some pathogens (HBV/HIV/HCV/HP). In this way, it offers fast and comprehensive detection services for clinicians, patients, and researchers.
We used single-cell RNA sequencing to determine the cell type composition of major human organs and construct a basic scheme for the human cell landscape (HCL). The HCL database contains data visualization resources for 102 human cell types and 843 cell subtypes identified from 702,968 single-cell transcriptomes. scHCL can help you identify cell types in your own data.
GDRD is an integrated platform for genetic disease and rare disease research and application, focusing on the collection, storage, analysis, and mining of human genetic and phenotype data. In GDRD phase I, around 7,000 papers, 10,000 causative variants, and 300 families with rare disorders from BGI, ClinVar, and the OMIM database have been organized and presented on the website.
DISSECT (Data Integration Solution for Systematic Exploration of Cancer Traits) is a comprehensive data integration platform for cancer research. It includes the first mirror site of the ICGC Data Portal in China, which provides important resources for domestic researchers. Building on big data research, we aim to establish the most comprehensive cancer big data integration system through large-scale, standardized data platform construction. The system's greatest value lies in integrating omics data and enabling in-depth mining of large-sample data for single or multiple cancer types, in support of the development of precision cancer medicine in China.
The Pan Immune Repertoire Database (PIRD) mainly focuses on immune data related to the human body. It collects BCR and TCR sequencing data for various diseases, along with experimental information and phenotype information for the corresponding individuals. PIRD V1 stores data for 1,923 samples and 554,696,060 sequences. PIRD provides data comparison and visualization services for researchers and clinicians in the fields of disease and public health.
CDCP (Cell-omics Data Coordinate Platform) shares and integrates complex single-cell datasets, and provides single-cell analysis tools and visualization services that help researchers access and explore published single-cell datasets.