The Literature database includes more than 30 million Chinese and English biomedical publications from sources such as GigaScience, PubMed, Europe PMC, and Baidu Academic, and supports retrieval in both Chinese and English.
The Gene database draws mainly on gene data from the genomes of species sequenced by CNGB and on gene information from external databases such as NCBI, EBI, and DDBJ. Available information includes gene name, chromosomal location, gene product and its attributes, the genome a gene belongs to, gene sequence, and gene variation.
The Organism database supports queries on the classification and nomenclature of organisms. Its data comes mainly from new animal, plant, and marine species sequenced by CNGB and from external databases such as NCBI, EBI, and DDBJ. Available information includes organism name, rank, and references.
The Variation database contains more than 600 million variation records from the rare disease, cancer, and population databases of CNGB and from dbSNP, dbVar, and ClinVar of NCBI. The variation data is linked to disease, phenotype, literature, and population frequency information for query and retrieval.
The Protein database collects sequences from several protein sequence databases, such as UniProt, and supports query and retrieval of protein information, mainly protein name, protein length, organism, and the gene encoding the protein.
The Sequence database integrates data from several nucleic acid sequence databases, such as the CNGB immune sequence database, RefSeq, GenBank, and Nucleotide, and supports query and retrieval of sequence information, mainly sequence title, species, length, and molecular type.
The Genome database contains genomic information for more than 1,000 species, covering bacteria, archaea, and eukaryotes, as well as viruses, phages, viroids, plasmids, and organelles. The data comes mainly from the Genome database of NCBI.
A project is a general description of a study and usually contains multiple samples and data sets. The Project database draws mainly on project data archived by CNSA, and its public data can be queried and retrieved. Available information includes project name, description, and data type, which helps users quickly find different types of data sets.
A sample describes the biological source material used in experimental assays, and each sample submitted to CNSA must carry its own set of attributes. The Sample database supports query and retrieval of public sample information, mainly sample name, organism, sample type, and description.
An experiment describes how a sample was processed, including the library construction method, sequencing instrument, and sequencing method. One experiment is usually associated with one project and one sample. The Experiment database supports query and retrieval of public experiment information, mainly experiment title, sequencing platform, library construction strategy, library source, and library option.
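The relationship between projects, samples, and experiments described above can be sketched as a small data model. This is only an illustration, not the CNSA schema: the class names, field names, and the helper experiments_for_project are hypothetical, chosen to mirror the fields listed in the descriptions, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Project:
    # Hypothetical fields mirroring "project name, description, data type".
    name: str
    description: str
    data_type: str


@dataclass(frozen=True)
class Sample:
    # Hypothetical fields mirroring "sample name, organism, sample type, description".
    name: str
    organism: str
    sample_type: str
    description: str


@dataclass(frozen=True)
class Experiment:
    # Hypothetical fields mirroring the experiment information listed above.
    title: str
    sequencing_platform: str
    library_strategy: str
    library_source: str
    # An experiment is usually associated with one project and one sample.
    project: Project
    sample: Sample


def experiments_for_project(experiments: List[Experiment], project: Project) -> List[Experiment]:
    """Return the experiments that belong to the given project."""
    return [e for e in experiments if e.project == project]


# Placeholder records, for illustration only.
project = Project("example-project", "Placeholder study", "placeholder data type")
sample_a = Sample("sample-a", "placeholder organism", "placeholder type", "first sample")
sample_b = Sample("sample-b", "placeholder organism", "placeholder type", "second sample")
experiments = [
    Experiment("exp-1", "placeholder platform", "placeholder strategy", "placeholder source", project, sample_a),
    Experiment("exp-2", "placeholder platform", "placeholder strategy", "placeholder source", project, sample_b),
]

project_experiments = experiments_for_project(experiments, project)
project_samples = sorted({e.sample.name for e in project_experiments})
```

In this sketch a project does not store its samples directly. They are reached through the experiments, which matches the statement that each experiment ties together one project and one sample.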
Assembly data is the genome assembly produced from the raw sequencing data. The Assembly database draws mainly on assembly data archived by CNSA. Available information includes assembly name, molecular type, sequencing technology, and assembly method.
The Run database stores second-generation sequencing data, including raw data generated by sequencing platforms such as BGISEQ, MGISEQ, HiSeq, and 454. These raw data are archived by CNSA and made available for query and retrieval.