Data resources


The Gene database draws mainly on gene data from the genomes of species sequenced by CNGB and on gene information from external databases such as NCBI, EBI, and DDBJ. The main available information includes gene name, chromosomal location, gene product and its attributes, the genome containing the gene, gene sequence, and gene variation.


The Variation database contains more than 600 million variation records from CNGB's rare disease, cancer, and population databases and from NCBI's dbSNP, dbVar, and ClinVar. Each variation is linked to disease, phenotype, literature, and population frequency information to support query and retrieval.


The Protein database collects sequences from several protein sequence databases, such as UniProt, and supports query and retrieval of protein information, mainly protein name, protein length, organism, and the gene encoding the protein.


The Sequence database integrates data from several nucleic acid sequence databases, such as the CNGB immune sequence database, RefSeq, GenBank, and Nucleotide, and supports query and retrieval of sequence information, mainly sequence title, species, length, and molecular type.


A project is a general description of a study and usually contains multiple samples and data sets. The Project database draws mainly on project data archived in CNSA, and its public data can be queried and retrieved. The main available information includes project name, description, and data type, which helps users quickly find different types of data sets.


A sample describes the biological source material used in experimental assays, and each sample submitted to CNSA needs its own attributes. The Sample database supports query and retrieval of public sample information, mainly sample name, organism, sample type, and description.


An experiment describes how a sample was processed, including the library construction method, sequencing instrument, and sequencing method. One experiment is usually associated with one project and one sample. The Experiment database supports query and retrieval of public experiment information, mainly experiment title, sequencing platform, library construction strategy, library source, and library option.
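
The relationship between projects, samples, and experiments described above can be sketched as a small data model. This is only an illustration, not the actual CNSA or CNGB schema: the class names, field names, helper function, and all record values below are assumptions chosen to mirror the fields listed in the text.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Fields mirror those listed for the Project database (names are hypothetical).
    name: str
    description: str
    data_type: str


@dataclass
class Sample:
    # Fields mirror those listed for the Sample database (names are hypothetical).
    name: str
    organism: str
    sample_type: str
    description: str


@dataclass
class Experiment:
    # Fields mirror those listed for the Experiment database (names are hypothetical).
    title: str
    sequencing_platform: str
    library_strategy: str
    library_source: str
    # An experiment is usually associated with one project and one sample.
    project: Project
    sample: Sample


def experiments_for_project(experiments, project_name):
    """Return the experiments linked to the project with the given name."""
    return [e for e in experiments if e.project.name == project_name]


# Placeholder records for illustration only; none of these values are real data.
project = Project("Example project", "Illustrative record", "Whole genome sequencing")
sample = Sample("Example sample", "Homo sapiens", "Human", "Illustrative record")
experiment = Experiment(
    "Example experiment", "Example platform", "WGS", "GENOMIC", project, sample
)
```

Because each experiment holds a reference to its project and sample, a call such as `experiments_for_project([experiment], "Example project")` collects the experiments of one study, and the sample of each is reachable through `experiment.sample`.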


Assembly data is the genome assembly produced from the original sequencing data. The Assembly database draws mainly on assembly data archived in CNSA. The main available information includes assembly name, molecular type, sequencing technology, and assembly method.