The Data Application Team of the Big Data Center, China National GeneBank (CNGB), is developing a new system architecture to rebuild the platform's data layer (data warehouse, data mart, database cluster, index cluster and computing cluster) and application layer (search engine, data analysis, data visualization, authorization management and data submission service), so that the platform can support the rapid growth of PB-scale biological big data. We also use API services to manage and expose the system's capabilities, including storage, computing, network, search and analysis. CNGB has already upgraded and developed several database applications on the new architecture, and will release more powerful applications with new features (including the data search engine and submission services) in the second half of this year.
BioMiGo, a biological data search engine dedicated to data sharing, integrates a wide range of data resources covering plants, animals, microbes, diseases and more.
DBazar works like a data "market" for data sharing. One of its data services, the CNGB Nucleotide Sequence Archive (CNSA), has now been launched.
The databases cover important topics in human disease, agriculture, animals, plants, viruses and more.
VizMusée (Visualize Atlas of Lives) supports visualization of all datasets included in the BigData Application Center.
The computing tools platform collects a variety of tools commonly used in genomics research, such as BLAST.
News and events
CNGB Nucleotide Sequence Archive (CNSA) released
CNGB Nucleotide Sequence Archive (CNSA) is a convenient and fast online submission system for biological research projects, samples, experiments and other related data. CNSA is dedicated to the storage and sharing of biological sequencing information and data, and aims to provide researchers worldwide with comprehensive data and information resources, enabling them to access and use the data easily and in depth.
Free sharing of omics data from tens of thousands of cells in the Single Cell DataBase
The Single Cell DataBase (SCDB) aims to create an atlas of human cells: cataloguing all cell types and subtypes in the body, building a complete list of human cells, defining cells and constructing a framework of human cells. So far, the database has collected and presented one single-cell project group comprising 46 samples, 30,854 cells and 470 GB of single-cell omics data.
Pathogen Variation Database (PVD) v1.0 released with a free online pathogen identification service
Pathogen Variation Database (PVD) focuses on the identification and detection of millions of pathogens in human samples, and contains a variety of pathogen genomic data and related annotation information. PVD presents its results clearly through data analysis and visualization, and will provide toxicity identification and drug resistance information for selected pathogens (HBV, HIV, HCV and HP).
Public beta version of the high-performance sequence alignment service
CNGB is developing a high-performance sequence search service for researchers. The current public beta is based on NCBI BLAST+ 2.6.0 and integrates most of the NCBI BLAST databases along with some CNGB public data. In the second half of this year, we will focus on optimizing the service with parallel computing, collecting new high-quality datasets, integrating visualization features, and releasing a stable version.
First version of the Marine Life Genome Database (MLGD) released, with information on 472,547 species
Marine Life Genome Database (MLGD) is an online database that aims to provide comprehensive knowledge and analysis of marine life genomes. We have collected genome, transcriptome and proteome data and information for the marine species sequenced and published to date. The information is organized by marine life taxonomy, and each species can be searched for and viewed in the taxonomic tree. At present, MLGD holds information on 472,547 species, 7,538 genomic data entries and 25,514 images.
PIRD update log
Version 1.1 of the Pan Immune Repertoire Database (PIRD) integrates a knowledge repository that records CDR3 sequences, specific diseases and their corresponding CDR3 information, supporting research on immunological diseases.