On October 25, 2018, at the 13th International Conference on Genomics (ICG-13) hosted by BGI, the China National GeneBank DataBase (CNGBdb) was officially released in the presence of guests from China and abroad, becoming a highlight of the conference.
Xiaofeng Wei, the head of the CNGBdb platform and the leader of the data application team in the Big Data Center of BGI, introduced CNGBdb. The platform was built to promote the sharing and utilization of biological data. Unlike other data platforms, CNGBdb builds on big data and cloud computing technologies to integrate massive data from CNGB (China National GeneBank), NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute), OMIM (Online Mendelian Inheritance in Man), and other sources. It aims to meet needs such as search, storage, computation, and use, to break down the barriers created by scattered databases in the field, and to offer users a one-stop solution through a unified portal. For this reason it has been called the “Google in the genetic world”.
The head of CNGBdb platform
What can CNGBdb do specifically? According to Wei, the platform provides researchers around the world with data services including data archiving, knowledge search, computational analysis, management and authorization, and visualization. It covers more than a dozen studies on maternal and child health, tumors, animal and plant diversity, and pathogenic microorganisms, forming a very large scientific research data system that integrates multiple research fields, multiple data types, and multiple analysis dimensions.
CNGBdb data structure
CNGBdb first addressed the problem of “storage” for researchers in China by providing a domestic, localized data archive. Its CNGB Nucleotide Sequence Archive (CNSA) offers a bilingual English-Chinese interface, 1Gb bandwidth, and batch submission, so that users can work with it easily and upload and download quickly, while the security of genetic data resources in China is strengthened. CNGBdb also assigns each dataset a unique "identity card", a DOI (Digital Object Identifier), for easy retrieval, tracking, and citation, which increases data exposure and citation rates. At present, the platform supports nearly 3,000 projects, with an archived data volume of nearly 600 TB.
Another major advantage of CNGBdb is reported to be its distributed, AI-driven search engine. It is described as the largest search engine in the life science vertical, indexing more than 3 billion data items and interconnecting 10 TB of metadata. Twelve data structures, including literature, variation, gene, protein, and sequence, are linked to each other, and related information is shown on the same page at retrieval time, which doubles the efficiency of information collection and screening. In addition, the CNGBdb search engine responds within seconds and supports Chinese keywords and full-text search.
In terms of "computation", CNGBdb provides a series of data computing and analysis services, of which BLAST is one of the most recommended. The service is built as a high-performance hybrid computing pool that integrates NCBI's latest nt and nr libraries together with datasets unique to CNGB, including the 1,000 Plants (1KP) project dataset, the Bird 10,000 Genomes (B10K) Project dataset, and the world's largest immune sequence dataset, making it easy for users to search and compare nucleic acid or protein sequences.
The release of CNGBdb is a major event in the field of life science and drew a strong response from guests from China and abroad. The platform is now officially online, and users can access its features and services at db.cngb.org. Wei said that CNGBdb will provide great convenience for life science research in China and abroad. CNGB will further expand the platform's data storage and continue to strengthen its functions and services with cutting-edge technologies, in order to better promote the interoperability, open sharing, and effective use of multi-omics big data in the life sciences, and to advance the rapid development of life science and the bioindustry.
Log in to CNGBdb or the CNGB website
Experience the services of CNGBdb