CNGB Nucleotide Sequence Archive (CNSA) released

October 25, 2017

On October 25, 2017, China National GeneBank (CNGB) released the CNGB Nucleotide Sequence Archive (CNSA). CNSA is a convenient and fast online system for submitting biological research projects, samples, experiments, and related data. It is committed to storing and sharing biological sequencing data, and is designed to give researchers worldwide comprehensive data and information resources that are easy to access and use.

With the development of biotechnology, a large volume of biological research data has been produced. This data needs to be shared, which creates bottlenecks in data security management and efficient transmission. CNSA, established by China National GeneBank and built for big data in the life sciences, is intended to address these problems.

Adopting internationally recognized data structure standards to support global sharing of scientific research data

CNSA accepts submissions of raw reads and other support data, integrates the INSDC and DataCite standards, and shares research data of different types and scales.

Following international open data agreements and serving as a complement to the publication process for scientific research worldwide

CNSA follows international open data agreements and policies such as the Fort Lauderdale Agreement, the NHGRI Rapid Data Release Policies, the Joint Data Archiving Policy, and CC0 (No Rights Reserved). It accepts submissions of sequencing data from researchers worldwide (including raw data and other support data), and its data submission service can supplement the literature publishing process to support early data sharing.

Following the user's stated data permissions and rights constraints

CNSA follows the "Interim Measures for the Management of Human Genetic Resources" and the ethical norms of users' countries. Researchers need to send an electronic copy of the Ethics Committee's approval document to datasubs@cngb.org. For data involving the collection, sale, export, or exit approval of human genetic resources, researchers also need to send an electronic copy of the approval document issued by the relevant human genetic resources management authority (e.g., in a region or country that has human genetic resources management regulations).

Ensuring an appropriate level of security, taking into account the categories of data

CNSA applies technical and management measures matched to each data type and processing method, ensuring an appropriate level of security for each category of data.

Using a high-performance distributed archiving system

CNSA uses a high-performance distributed system for data archiving, with an independent, high-availability backup storage system for secure data storage.

Having a high-speed internet network and logistics network

CNSA relies on the high-speed internet and logistics networks of BGI and CNGB, which span multiple centers worldwide, to synchronize data to global public databases quickly.

Having a full-text search engine

CNSA has a full-text search engine that can support petabytes of data, accepts any combination of keywords, and locates results quickly.

Providing localized Chinese-language services, fast feedback, and zero-distance communication

CNSA provides bilingual support in Chinese and English from human staff, who can be reached by phone, email, and other channels, for barrier-free, zero-distance communication.

CNSA Quick Start Guide:

1. Raw sequence data submission

Raw data refers to all the original data generated by a sequencing run, in principle without any filtering. For raw sequence data submission, CNSA adopts the INSDC data standards and structure for data review and archiving, covering project, sample, experiment, and data submission.

Fig 1. Raw sequence data submission process

After the raw data and related metadata have been submitted and reviewed by a data administrator, CNSA, acting as an ENA broker, will by default synchronize the data to the ENA (European Nucleotide Archive) public database to obtain an ENA accession ID. The ID is automatically returned to CNSA, where submitters can view it directly on the overview page of the related module. If the submitted data requires permission control, or needs to be uploaded to NCBI SRA (Sequence Read Archive, National Center for Biotechnology Information) or DDBJ DRA (DDBJ Sequence Read Archive, DNA Data Bank of Japan), please contact the administrator at datasubs@cngb.org.
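
As an illustration of how a returned accession ID can be read, the sketch below classifies an accession by the prefix convention shared by the INSDC read archives (ENA, NCBI SRA, DDBJ DRA). The function name classify_accession is hypothetical, the pattern covers only study, sample, experiment, and run accessions, and it does not assume anything about CNSA's own identifier format. "Study" here corresponds to the project level.

```python
import re

# First letter of the prefix identifies the INSDC archive that issued the ID.
ARCHIVES = {"E": "ENA", "S": "NCBI SRA", "D": "DDBJ DRA"}
# Third letter of the prefix identifies the kind of object.
OBJECT_TYPES = {"P": "study", "S": "sample", "X": "experiment", "R": "run"}

# e.g. ERR000001, SRX123456, DRP000001 (three-letter prefix, then digits)
ACCESSION_RE = re.compile(r"^([ESD])R([PSXR])(\d{6,})$")


def classify_accession(accession):
    """Return (archive, object_type), or None if the pattern does not match.

    Hypothetical helper for illustration; not part of any CNSA or ENA API.
    """
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return ARCHIVES[match.group(1)], OBJECT_TYPES[match.group(2)]


if __name__ == "__main__":
    for acc in ("ERR000001", "SRX123456", "not-an-accession"):
        print(acc, classify_accession(acc))
```

An accession starting with "ER" would indicate an ID issued by ENA, which is the default case described above.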

2. Other support data submission

Other support data is any data besides raw reads that relates to an article or research project. It includes, but is not limited to, process and result data, analysis methods, software programs, image files, audio files, video files, imaging files, electronic charts, and Word documents. CNSA cooperates with GigaScience GigaDB to archive support data. Through a link to DataCite, each dataset is assigned a DOI that can be cited directly (Fig 2).

Fig 2. Other support data submission process
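
Because each support dataset receives a DOI, it can be cited as a resolvable link. The sketch below turns a DOI string into a URL for the public doi.org resolver. The function name doi_to_url and the DOI used in the example are hypothetical placeholders, not identifiers issued by CNSA or GigaDB.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Build a doi.org resolver URL from a DOI string.

    Hypothetical helper for illustration. Accepts a bare DOI, a "doi:"
    prefixed DOI, or an existing doi.org URL.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a
    # prefix/suffix pair separated by a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI, used only to show the output shape.
    print(doi_to_url("doi:10.1234/example"))
```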

3. Data search and download

With the full-text search engine on the CNSA home page, users can search with any combination of keywords, quickly obtain results, and locate and download the related data.
Users can download data from the Run page or the Assembly page by clicking the Run or Assembly accession ID, which can be found through the full-text search engine on the home page.
CNGB Nucleotide Sequence Archive (CNSA) link: https://db.cngb.org/cnsa
Please contact us with any problems or suggestions.
Email: datasubs@cngb.org