China National GeneBank DataBase (CNGBdb) Officially Released
October 25, 2018
On October 25, 2018, at the 13th International Conference on Genomics (ICG-13) hosted by BGI, the China National GeneBank DataBase (CNGBdb) was officially released before guests from home and abroad and became a highlight of ICG-13.
Xiaofeng Wei, the head of the CNGBdb platform and the leader of the data application team in the Big Data Center of BGI, introduced CNGBdb. The platform was built to promote the sharing and utilization of biological data. Built on big data and cloud computing technologies, CNGBdb integrates massive data from CNGB (China National GeneBank), NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute), OMIM (Online Mendelian Inheritance in Man) and other sources. It supports search, storage, computation and use, breaks down the barriers created by scattered databases in the field, and offers users a one-stop solution through a unified portal, which is why it is called the “Google of the genetic world”.
The head of CNGBdb platform
What can CNGBdb do specifically? According to Xiaofeng Wei, the platform provides researchers around the world with data services such as data archiving, knowledge search, computational analysis, management authorization and visualization. It covers more than a dozen studies on maternal and child health, tumors, animal and plant diversity, and pathogenic microorganisms, forming a very large scientific research data system that integrates multiple research fields, data types and analysis dimensions.
CNGBdb data structure
CNGBdb first solved the problem of “storage” for domestic researchers by providing data archive space hosted in China. Its CNGB Nucleotide Sequence Archive (CNSA) provides an English-Chinese bilingual interface, 1Gb bandwidth and batch submission, so that users can work easily, upload and download quickly, and keep genetic data resources in China more secure. At the same time, CNGBdb assigns each dataset a unique "identity card", a DOI (Digital Object Identifier), for easy retrieval, tracking and citation, which increases data exposure and citation rate. At present, the platform has supported nearly 3,000 projects, with an archived data volume of nearly 600TB.
CNSA
It is reported that another major advantage of CNGBdb is its distributed, AI-driven search engine. It is also the largest search engine in the vertical field of life science, indexing more than 3 billion data entries and interconnecting 10TB of meta information. Twelve data structures, such as literature, variation, gene, protein and sequence, are linked to each other, and the related information is shown on the same page at retrieval time, which doubles the efficiency of information collection and screening. In addition, the CNGBdb search engine not only responds within seconds, but also supports Chinese keywords and full-text search.
In terms of "computation", CNGBdb provides a series of data computing and analysis services, of which BLAST is one of the most recommended applications. It runs on a high-performance hybrid computing pool that integrates NCBI's latest nt and nr libraries together with datasets unique to CNGB, such as the 1,000 Plants project (1KP) dataset, the Bird 10,000 Genomes (B10K) Project dataset and the world's largest immune sequence dataset, making it easy for users to search and compare nucleic acid or protein sequences.
CNGBdb BLAST
The release of CNGBdb is a major event in the field of life science and drew a strong response from guests at home and abroad. The platform is now officially online, and users can access its features and services at db.cngb.org. Xiaofeng Wei, the head of the platform, said that CNGBdb will provide great convenience for scientific research in life science at home and abroad. The CNGB will further expand the data storage of the platform and continue to strengthen its functions and services with cutting-edge technologies, to better promote the interoperability, open sharing and effective use of life multi-omics big data, and to promote the rapid development of life science and bioindustry.
Log in to CNGBdb or the CNGB website to experience the services of CNGBdb.
The latest upgrades to CNSA: large capacity, batch submission, easy to hold your sequencing data
October 11, 2018
Recently, the CNGB Nucleotide Sequence Archive (CNSA) underwent a major upgrade. The batch submission of sequencing data that many users had been eagerly awaiting is now available, for the first time in China. From now on, CNSA not only completes data submission several times faster than before, but also offers a much better operating experience.
With the development of biotechnology, a large amount of biological research data has been produced. These massive results need to be shared, which creates bottlenecks in data security management and efficient transmission. CNSA, established by China National GeneBank and built for the big data of life science, will solve these problems.
1 Solve the problem of local data submission
Researchers often need to publish articles in authoritative scientific journals. When you submit sequencing data in advance, do you often encounter problems such as these?
As a result, data submission is time-consuming and laborious, which not only overwhelms many researchers but also carries hidden dangers such as the leakage of sensitive data.
CNSA was released on October 25, 2017. With the recognition and support of many researchers, it has supported more than 700 scientific research projects with a total archived data volume of nearly 600TB.
The CNSA system is easy to operate, supports high-speed upload and download, and provides domestic localized storage space for genomics data. In addition, CNGB systematically optimizes and enhances both information security management and technology, which has passed the three-level review of information security level protection and the protection capability review of trusted cloud service, providing powerful information security protection for data storage.
2 An all-purpose platform to promote global information sharing
CNSA is dedicated to facilitating the storage and sharing of sequencing data, providing researchers worldwide with the most comprehensive data and information resources, and making it easier and more in-depth for researchers to access and use data. In order to fully meet the various needs of biological researchers for data, achieve complete archive of data from generation, submission, synchronization, search to download, and ensure the integrity of scientific research results, CNSA can not only archive data, but also provide other important services:
Note: At present, CNSA does not accept data covered by the Human Genetic Resources Regulations.
3 Support the DOI index and publication of scientific research results
In addition to the above features, CNSA has another significant advantage: its data can be identified using a DOI (Digital Object Identifier). CNGB is one of the first organizations to cooperate with the British Library on DataCite services and is the first data center in Asia to participate in the project. Using unique DOI numbers to identify the data generated by scientific research facilitates the retrieval, tracking and citation of data, can improve its exposure and citation rate, and will greatly facilitate the establishment of a system for sharing data and evaluating that sharing.
CNSA follows the International Nucleotide Sequence Database Collaboration (INSDC) standard and the DataCite standard, and accepts sequencing data (including raw data and other supporting data) from around the world. Its data submission service can be used as a supplement to the literature publishing process to support early data sharing. CNSA will support the publication of global scientific research results, realize data sharing for different research types and scales, improve the reproducibility of scientific research results, and promote new discoveries in science and technology.
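As an illustration of how a DOI makes a dataset easy to retrieve and cite, the following minimal Python sketch turns a DOI string into a resolvable link through the public https://doi.org/ resolver. The DOI shown is hypothetical and is not a real CNSA identifier, and the syntax check is a deliberately loose assumption rather than a full validation of the DOI specification.

```python
import re
from urllib.parse import quote

# Loose syntax check (an assumption for this sketch): "10.", a numeric
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a valid DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
example_url = doi_to_url("10.12345/example-dataset")
print(example_url)
```

A link built this way can be placed in a manuscript's data availability statement, so that readers reach the archived dataset even if its hosting location changes.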
4 Batch submission, do more with less
In order to better meet the needs of large-scale data submission, the CNSA system has undergone a major upgrade. CNSA is the first nucleotide sequence archive system in China to offer online batch submission. The upgrade also improves the operating experience: the submission process is clearer and takes less time, so you can easily upload and archive large amounts of data.
CNSA link: https://db.cngb.org/cnsa/
Please contact us with any problems or suggestions.
Email: datasubs@cngb.org
CNGB Nucleotide Sequence Archive (CNSA) released
October 25, 2017
On October 25, 2017, China National GeneBank released the CNGB Nucleotide Sequence Archive (CNSA). CNSA is a convenient and fast online submission system for biological research projects, samples, experiments and other data. CNSA is committed to the storage and sharing of biological sequencing information and data, and is designed to provide researchers worldwide with the most comprehensive data and information resources, enabling them to access and use data easily and in depth.
With the development of biotechnology, a large amount of biological research data has been produced. These massive results need to be shared, which creates bottlenecks in data security management and efficient transmission. CNSA, established by China National GeneBank and built for the big data of life science, will solve these problems.
Combining authoritative international data structure standards to support global sharing of scientific research
CNSA accepts submissions of raw reads and other supporting data, follows the INSDC and DataCite standards, and supports sharing of data from research of different types and scales.
Following international open data protocols and complementing the literature publication process for research results worldwide
CNSA follows international open data protocols such as the Fort Lauderdale Agreement, the NHGRI Rapid Data Release Policies, the Joint Data Archiving Policy and CC0 (No Rights Reserved). It accepts sequencing data (including raw data and other supporting data) from researchers worldwide, and its data submission service can be used as a supplement to the literature publishing process to support early data sharing.
Following the user's stated data permissions and rights constraints
CNSA follows the "Interim Measures for the Management of Human Genetic Resources" and the ethical norms of users' countries. Researchers need to send an electronic copy of the Ethics Committee's approval document to datasubs@cngb.org. For data that requires approval for the collection, sale, export or exit of human genetic resources, researchers need to send an electronic copy of the approval document issued by the relevant human genetic resources management department (e.g., in a region or country that has human genetic resources management measures).
Ensuring a level of security that takes into account the categories of data
CNSA applies technical and management measures matched to the data types and processing methods to ensure the appropriate level of security for each.
Using a high-performance distributed archiving system
CNSA uses a high-performance distributed system for data archiving, with an independent high-availability backup storage system for secure data storage.
Having a high-speed internet network and logistics network
CNSA relies on the high-speed internet network and logistics network of BGI and CNGB, which cover multiple centers worldwide, to synchronize data to the global public databases quickly.
Having a full-text search engine
CNSA has a full-text search engine that can support petabytes of data, combine arbitrary keywords and locate results quickly.
Providing localized Chinese language services, fastest feedback, zero-distance communication
CNSA provides staffed bilingual support in Chinese and English; users can contact us by phone, email, etc., for barrier-free, zero-distance communication.
CNSA Quick Start Guide:
1. Raw sequence data submission
Raw data refers to all the original data generated by a sequencing run, in principle without any filtering. For raw sequence data submission, CNSA adopts the INSDC data standards and structure for data review and archiving, covering project, sample, experiment and data submission.
Fig 1. Raw sequence data submission process
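The project, sample, experiment and data levels named above can be pictured as a nested record structure. The Python sketch below is only an illustrative model under that assumption: the class and field names are hypothetical and do not reproduce the actual CNSA or INSDC submission schema. It shows how a submitter might check, before upload, that every sample has at least one data file attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    filename: str
    md5: str  # checksum used to verify the upload (hypothetical field)


@dataclass
class Experiment:
    title: str
    platform: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    organism: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Project:
    title: str
    samples: List[Sample] = field(default_factory=list)

    def samples_without_data(self) -> List[str]:
        """Names of samples that have no data file under any experiment."""
        return [s.name for s in self.samples
                if not any(e.files for e in s.experiments)]


# Made-up example records, for illustration only.
run_file = DataFile("sample_a_R1.fq.gz", "0" * 32)
project = Project("Example project", [
    Sample("sample_a", "Oryza sativa",
           [Experiment("WGS library", "BGISEQ-500", [run_file])]),
    Sample("sample_b", "Oryza sativa", [Experiment("WGS library", "BGISEQ-500")]),
])
print(project.samples_without_data())
```

Keeping the levels separate in this way mirrors the review order described in the text: metadata for projects, samples and experiments can be checked independently of the large data files.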
After the raw data and related metadata have been submitted and reviewed by a data administrator, CNSA, acting as an ENA broker, by default synchronizes the data to the ENA (European Nucleotide Archive) public database to obtain ENA accession IDs. The IDs are automatically returned to CNSA, where submitters can view them directly on the overview page of the related modules. If the submitted data requires permission control, or needs to be uploaded to NCBI SRA (Sequence Read Archive, National Center for Biotechnology Information) or DDBJ DRA (Sequence Read Archive, DNA Data Bank of Japan), please contact the administrator at datasubs@cngb.org.
2. Other support data submission
Other support data, meaning data other than raw reads that is related to articles or research, includes but is not limited to process and result data, analysis methods, software programs, image files, audio files, video files, imaging files, electronic charts and Word documents. CNSA cooperates with GigaScience GigaDB to archive the support data. Through a link to DataCite, each dataset is assigned a DOI that can be cited directly (Fig 2).
Fig 2. Other support data submission process
3. Data search and download
China National GeneBank (CNGB) joins Data Center Alliance and Open Data Center Committee
September 18, 2017
In September 2017, the China National GeneBank (CNGB) joined the Data Center Alliance as a member of the board of directors and took on the responsibilities of an associate member. Xun Xu, the executive director of CNGB and dean of BGI, serves as the director of the Alliance and will give priority to the work of the Alliance's infrastructure working group, IT equipment working group, internet security working group, international cooperation committee and so on, gradually extending to other work.
At the same time, the CNGB will join the Open Data Center Committee (ODCC), fully participate in all working groups of the Alliance and gradually improve the standardization of the construction and development of the CNGB. Joining domestic and foreign scientific research organizations and standardization agencies and participating in related projects will promote and support the rapid development of the CNGB, help turn unique technologies in the life sciences field into transferable technology standards, accelerate the integration of BT and IT, and build a broader development platform for cross-disciplinary talents.
China National GeneBank (CNGB) series database adding new members
May 8, 2018
To provide a uniform external sharing portal, CNGB has built a biological data sharing and application service that contains a data layer (data warehouse, data mart, database cluster, index cluster and computing cluster) and an application layer (search engine, data analysis, authorization management, data submission and download services). The data storage is opened at several levels of granularity through APIs to support petabyte-scale biological data sharing. In addition, we have launched a convenient and fast online submission platform named CNGB Nucleotide Sequence Archive (CNSA) to archive raw sequencing data, including projects, samples, experiments, assemblies and other support data. It provides data submission, data download and data management services for researchers all over the world.
1 Pan Immune Repertoire Database (PIRD V1.1)
Pan Immune Repertoire Database (PIRD), which focuses on human immune research, has collected information on 1,923 samples and 554,696,060 sequences. All of them are reads related to BCR and TCR data, with experimental and phenotype information from various diseases. PIRD V1.1 incorporates a repository that records CDR3 sequences, as well as specific diseases and the corresponding CDR3 information, providing support for immunological disease research. In the new version, which is under development and will be released this year, the samples and data will increase to 5,000 individuals and 10TB, respectively. PIRD aims to provide data analysis and visualization services for researchers and clinicians in the field of disease and public health who have few computing resources, analysis tools and data.
2 Pathogen Variation Database
Pathogen Variation Database (PVD) focuses on the identification and detection of millions of pathogens in human samples and contains various pathogen genomic data and related annotation information. PVD presents results clearly through data analysis and visualization, and will provide toxicity identification and drug resistance information for some pathogens (HBV/HIV/HCV/HP). In the future, we will offer fast and comprehensive detection services for clinicians, patients and researchers.
3 Single Cell Database (SCDB)
The single cell database will create an atlas of human cells, catalog all kinds of body cells including subtypes, build a complete list of human cells, define human cells and construct the cell framework. The first version of the single cell database presents four projects comprising 46 samples, 30,854 cells and 470GB of data.
4 Marine Life Genome Database (MLGD)
Marine Life Genome Database (MLGD) is an online database that aims to provide comprehensive knowledge and analysis of the genomes of marine life. We have collected the genome, transcriptome and proteome data and information of the marine species that have been sequenced and published so far. This information is organized based on the taxonomy of marine life, and each species can be found and reviewed in the taxon tree. The information for each species contains a description, references, images, genetic information and data. Datasets can be downloaded directly or through a redirect to the NCBI Genome database, and we will add online analysis tools in future editions. We warmly welcome cooperation to sequence and analyze new marine species for a better understanding of the genomes of marine life.
The species are organized based on taxonomy information, with categories from kingdom to subsection. Each category is colored differently, as described in the legend. A category can be selected by searching in the form or by clicking on the nodes in the taxon tree. The scale of the taxon tree can be adjusted by scrolling the mouse wheel, and the tree can be moved by clicking and dragging. At present, MLGD has collected information on 472,547 species, 7,538 genomic data entries and 25,514 images.
5 BLAST
CNGB is developing a high-performance sequence searching service for researchers. The current public beta version is based on NCBI BLAST+ 2.6.0 and integrates most of the NCBI BLAST databases and some of the CNGB public data. In the second half of this year, we will put more effort into optimizing the sequence service with parallel computing, collecting new high-quality datasets, providing visualization functions and releasing the stable version.
CNGB constructs topic databases covering tumor diseases, population polymorphism, biodiversity, microbiology and others, to provide data sharing systems and communities that meet the needs of researchers in different areas, enhance the value of data, and promote the development and application of data.
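For readers unfamiliar with BLAST+, the sketch below shows what a nucleotide search and its tabular output look like when BLAST+ is run locally. It is an assumption-laden illustration, not a description of the CNGB web service, whose interface is not covered here: the helper names, file names and the sample result line are all made up. The `blastn` options used (`-query`, `-db`, `-outfmt 6`, `-out`) and the twelve default columns of `-outfmt 6` are standard BLAST+ behavior.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastn_command(query_fasta: str, database: str, out_path: str) -> List[str]:
    """Build the argument list for a local blastn run (not executed here)."""
    return ["blastn", "-query", query_fasta, "-db", database,
            "-outfmt", "6", "-out", out_path]


def parse_hits(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per hit from tab-separated -outfmt 6 output."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) == len(COLUMNS):  # skip blank or malformed lines
            yield dict(zip(COLUMNS, row))


# Made-up result line, for illustration only.
sample = "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-95\t353\n"
hits = list(parse_hits(io.StringIO(sample)))
best = max(hits, key=lambda h: float(h["bitscore"]))
print(best["sseqid"], best["pident"])
```

The command list returned by `blastn_command` could be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are installed.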
Biological diversity
1KITE: 1K Insect Transcriptome Evolution
B10K: Bird 10K Genomes
FishT1K: Transcriptomes of 1,000 Fishes
MilletDB: Millet DataBase
MLGD: Marine Life Genome Database
OneKP: 1000 Plants
MT10K: 10K Mitochondrion Genome
ADD: Agriculture BioDiversity Database
10KP: 10,000 plants
Health & Disease
BDDB: Birth Defects Database
DHGV: Database of Human Genetic Variations
DISSECT: Data Integration Solution for Systematic Exploration of Cancer Traits
GDRD: Genetic Disease and Rare Disease database
GeMap: Human omics-scale annotation system
MDB: Microbiome Database
ICGC: ICGC Data Portal China Mirror
PIRD: Pan Immune Repertoire Database
PVD: Pathogen Variation Database
SCDB: Single Cell DataBase
Service
Biomigo: Biomigo
BLAST: The public beta version of high-performance sequence alignment service
CNSA: CNGB Nucleotide Sequence Archive
GigaDB: GigaDB
The China National GeneBank (CNGB) Officially Joins the American Children's Brain Tumor Tissue Consortium (CBTTC) to Establish an International Children's Brain Tumor Disease Data Center in China
May 08, 2018
On May 8, 2017, the CNGB and the Neurosurgery Center of Beijing Tian Tan Hospital joined the CBTTC, which also officially announced that the CNGB had become a new satellite member. So far, the CBTTC has 15 members from Europe, Asia and America. The member countries will work together to advance children's brain tumor disease research and open a new chapter in children's health.
The CNGB will establish the China Children's Brain Tumor Disease Data Center with the CBTTC to support the effective accumulation of data on children's brain tumor disease in China. This will help researchers better master, share and analyze children's cancer data. The CNGB calls for more Chinese hospitals to join the program, share research results, promote the rapid development of life sciences, work hard to solve globally shared health challenges, and make active contributions to improving human well-being. The CNGB relies on its own large sample collection and big data platform to accelerate the scientific research and clinical translation of children's brain tumors and other children's tumors, help eliminate diseases, and is committed to making genetic resources “owned by all, shared by all, and completed by all”.
CNGB is about to announce a strategic partnership with the GISAID Initiative.
2020-03-12
By Mar 11, 2020, GISAID’s EpiCoV™ database has accumulated 461 hCoV-19 genome sequences submitted by researchers across the world.
To support research on hCoV-19, the CNGB data platform (CNGBdb) published four genome assemblies of hCoV-19 on Jan 22, 2020 and later launched an hCoV-19 database integrating hCoV-19 genome sequence data from different platforms, datasets on potential intermediate hosts of hCoV-19, and information on published research papers. By Mar 11, the CNGBdb database had collected 118 hCoV-19 genome sequences, of which 16 were published directly on CNGBdb.
CNGB has been playing a key role in the fight against COVID-19 in Guangdong Province, including for the 20 million residents of Shenzhen. As part of the network of emergency COVID-19 testing laboratories that BGI established throughout the country, the Shenzhen Dapeng “Huo-Yan” ("Piercing Eyes") Lab came into service on Feb 15, 2020 and has provided over 80,000 tests for Shenzhen and the surrounding cities. The lab has now developed a throughput capability of up to 10,000 sample tests per day.
China National GeneBank (CNGB) announced a strategic partnership with the GISAID Initiative.
2020-03-16
Shenzhen, March 16, 2020 - China National GeneBank (CNGB) announced its strategic partnership with the GISAID Initiative, noted as a key player in global health security for its data-sharing program, which enables near real-time surveillance to respond to and mitigate seasonal and pandemic influenza and now the hCoV-19 pandemic. Comprising 14,000 collaborating researchers and 1,500 contributing institutions worldwide, GISAID's unique sharing mechanism is enabling immediate progress, for example in the understanding of the new COVID-19 disease and in the research and development of candidate medical countermeasures.
The CNGB partnership with GISAID will facilitate access to and use of hCoV-19 data for Chinese researchers, and provide a safe data sharing environment where the interests of data submitters are safeguarded, while also providing effective measures for acknowledging the laboratories that provide viruses or contribute genomic data, and for IP rights.
GISAID’s EpiCoV™ database for hCoV-19 data has already surpassed conventional platforms, such as the public-domain archives, where access to data takes place anonymously, with no protection of owners’ interests and no transparency on the use of data.
CNGBdb sets up a mirror site of Human Cell Landscape; research findings were published online in Nature.
2020-03-25
On Mar 25, 2020, a scheme for the human cell landscape, constructed jointly by several research institutions including the School of Basic Medical Sciences and the First and Second Affiliated Hospitals of Zhejiang University, was published online in Nature. To allow public access to the data resource, the research team built a human cell landscape database (http://bis.zju.edu.cn/HCL/), with a mirror site on CNGBdb (https://db.cngb.org/HCL/).
The hCoV-19 Genome Analysis Platform jointly developed by CNGB and BGI Chain is officially launched and made available in CNGBdb.
2020-03-27
On Mar 27, 2020, the hCoV-19 Genome Analysis Platform (https://db.cngb.org/virus/secure_evolution/visualization), jointly developed by China National GeneBank and BGI Chain, was officially launched and made available in China National GeneBank DataBase (CNGBdb). This analysis platform, the first in China based on blockchain and secure multi-party computing technology, allows users to collaborate with other researchers in data analysis and share the results without revealing their own data. It will further promote the real-time sharing of novel coronavirus genome data and related evolutionary analysis results, and provide strong support for epidemic prevention and control.
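To give a sense of how several parties can compute a joint result without revealing their own data, the Python sketch below uses additive secret sharing, a textbook building block of secure multi-party computation. The platform's actual protocol is not described here, so this is only an assumed illustration; the function names and the per-lab counts are hypothetical.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all share arithmetic is done modulo this value


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: List[int]) -> int:
    """Simulate parties jointly summing their inputs via additive shares."""
    n = len(private_values)
    # Party i splits its value into n shares; share j is sent to party j.
    distributed = [share(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # The published partial sums reveal the total, not any single input.
    return sum(partial_sums) % MODULUS


# Hypothetical per-lab counts of genomes carrying some variant.
lab_counts = [12, 30, 7]
total = secure_sum(lab_counts)
print(total)
```

Each individual share is uniformly random, so no single party learns another party's count; only the combined total is recoverable. This sketch assumes non-negative inputs whose sum is smaller than the modulus.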
CODEPLOT awarded Top Ten Innovative Products at 2020 Nanjing International Life and Health Conference
2020-12-11
From December 9th to 11th, the 2020 Nanjing International Life and Health Conference and Expo, co-organized by the Nanjing Jiangbei New District Management Committee, the Editorial Board of Progress in Pharmaceutical Sciences and the Chinese Non-government Medical Institutions Association, was held in Nanjing, Jiangsu Province. China National GeneBank DataBase attended the conference and presented the newly-released CODEPLOT computing platform at its innovative product launch session, drawing wide attention from experts and enterprises in the field of healthcare. CODEPLOT was awarded one of the Top Ten Innovative Products.
Dr. Cong Tan, Associate Researcher at China National GeneBank, gave a speech titled "China National GeneBank - CODEPLOT". He introduced the general framework and functions of CODEPLOT and displayed several data analysis examples. In addition, Dr. Tan also introduced CNGBdb's contribution to the global combat against the COVID-19 epidemic, including establishing a partnership with GISAID (Global Initiative on Sharing All Influenza Data), building VirusDIP (Virus Data Integration Platform), and developing the Multi-party Genome Analysis Platform for COVID-19 Virus.
New England Journal of Medicine publishes latest research on Covid-19 vaccine
2020-12-11
On December 11, Beijing time, the New England Journal of Medicine published online the results of the Phase III clinical trial of the BNT162b2 mRNA Covid-19 Vaccine. A two-dose regimen of BNT162b2 conferred 95% protection against Covid-19 in persons 16 years of age or older. Safety over a median of 2 months was similar to that of other viral vaccines.
BNT162b2 was jointly developed by BioNTech and Pfizer. Its development was initiated on January 10, 2020, when the SARS-CoV-2 genetic sequence was released by the Chinese Center for Disease Control and Prevention and disseminated globally by the GISAID (Global Initiative on Sharing All Influenza Data) initiative. According to the NEJM paper, this rigorous demonstration of safety and efficacy less than 11 months later provides a practical demonstration that RNA-based vaccines, which require only viral genetic sequence information to initiate development, are a major new tool to combat pandemics and other infectious disease outbreaks.
China National GeneBank takes lead in joining Privacy-Preserving Computing Alliance, Multi-party Genome Analysis Platform for COVID-19 Virus recognized as benchmark case
2020-12-18
On December 18, the Data Asset Management Conference 2020, co-hosted by the China Academy of Information and Communications Technology (CAICT), the China Communications Standards Association (CCSA) and the Beijing Big Data Centre, was held in Beijing. At this conference, the Privacy-Preserving Computing Alliance was formally established, and China National GeneBank (CNGB) became one of the first organizations to join the alliance.
The Multi-party Genome Analysis Platform for COVID-19 Virus, jointly developed by CNGB and MGI, was selected as a benchmark case in the 2020 Big Data "Galaxy" Case Collection activity, which was co-organized by the CAICT and the CCSA’s Big Data Technology Standard Promotion Committee (CCSA TC601). In the critical period of the fight against COVID-19, the Multi-party Genome Analysis Platform for COVID-19 Virus promotes the real-time sharing of COVID-19 Virus genome data and related evolutionary analysis results, and thus provides comprehensive and effective data support for assessing epidemic risks, initiating public health response measures, and formulating medical countermeasures.