-
Purpose
The China National GeneBank Sequence Archive (CNSA) endeavors to ensure the long-term
preservation and accessibility of biological sequence data, thus facilitating and promoting
scientific research in biology. This preservation plan outlines CNSA's protocols for data
archiving, curation, preservation, and sharing, aiming to safeguard data from loss or
corruption, maintain accessibility for researchers, and ensure proper curation to preserve
data quality and utility.
-
Ethics and Legal Compliance
CNSA complies with the laws and regulations of the People's Republic of China, international
guidelines, and industry best practices for the responsible and ethical preservation of
biological data. When submitting data to CNSA, depositors must comply with the local laws,
regulations, and ethical requirements that govern Human Genetic Resources. Data are
anonymized or de-identified to uphold privacy standards.
-
Data Standards
CNSA adopts data and metadata submission standards from established consortia and
organizations such as INSDC, DataCite, GA4GH, and GGBN. Depositors are required to
provide comprehensive metadata to ensure long-term preservation and understandability. CNSA
supports commonly used data formats (FASTQ, FASTA, VCF, etc.) and offers guidance to
depositors for data and metadata submission. CNSA is committed to updating its standards in
alignment with any changes from these international organizations.
-
Data Archive and Curation
CNSA supports the archival of diverse data types, such as raw sequencing data, assembly,
variation information, metabolism data, single-cell data, and other sequence data. Submitted
data undergo both automatic and manual review to ensure quality. Data integrity is verified
with MD5 checksums at each submission and transfer.
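
As an illustration of the checksum step described above, the following minimal Python sketch
computes a file's MD5 digest and compares it with the value supplied alongside the
submission. The function names and the chunk size are assumptions made for this example;
they do not describe CNSA's actual tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path, expected_md5):
    """Return True if the received file matches the checksum recorded before transfer."""
    # Normalize case and whitespace, since checksum files vary in formatting.
    return md5_of_file(path) == expected_md5.strip().lower()
```

A depositor would typically record the digest before upload, and the receiving side would
call verify_transfer on the received file; a mismatch indicates that the file should be
transferred again.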
-
Accessibility
To promote data sharing and reuse while protecting data privacy and security, CNSA provides
different levels of access. Public data are openly accessible, while access to controlled
data is granted to users upon request. The data depositor chooses the access level for each
submission.
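
The two access levels can be pictured with a small Python sketch. Everything here is
hypothetical and chosen for illustration only: the class names, the fields, and the
accession string are assumptions, and the approval set merely stands in for whatever request
workflow applies to controlled data.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    PUBLIC = "public"
    CONTROLLED = "controlled"


@dataclass
class Dataset:
    accession: str                      # hypothetical identifier, not a real accession format
    access_level: AccessLevel           # chosen by the depositor at submission time
    approved_users: set = field(default_factory=set)  # users whose requests were granted


def can_download(dataset, user_id):
    """Public data are open to everyone; controlled data require an approved request."""
    if dataset.access_level is AccessLevel.PUBLIC:
        return True
    return user_id in dataset.approved_users
```

For example, a controlled dataset created with an empty approval set is inaccessible to
every user until a request is granted and the user is added to approved_users.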
-
Security
Archived data are backed up at a geographically separate data center, and the two copies are
periodically validated using MD5 checksums. CNSA regularly checks the condition of storage
media and replaces any defective units. If media fail, data are recovered from the redundant
copies. Security measures, which may include encryption of sensitive data, are carried out
by staff with appropriate security skills, kept current through continuous training. CNSA
maintains a detailed plan outlining recovery procedures for data loss events such as
technical failures or cyber-attacks.
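
The periodic validation and recovery described above might look like the following Python
sketch. It assumes that a reference MD5 digest was recorded when the file was archived and
that both copies are reachable as local paths; the function names and status strings are
invented for this example and are not CNSA's actual procedure.

```python
import hashlib
import shutil


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_copies(primary, backup, recorded_md5):
    """Check both copies against the recorded digest and repair a damaged one if possible."""
    primary_ok = md5_of_file(primary) == recorded_md5
    backup_ok = md5_of_file(backup) == recorded_md5
    if primary_ok and backup_ok:
        return "ok"
    if primary_ok:
        # Backup is damaged: overwrite it with the intact primary copy.
        shutil.copyfile(primary, backup)
        return "backup restored"
    if backup_ok:
        # Primary is damaged: recover it from the intact backup copy.
        shutil.copyfile(backup, primary)
        return "primary restored"
    # Neither copy matches; this case needs the wider recovery plan.
    return "both copies damaged"
```

Comparing each copy with an independently recorded digest, rather than only with each other,
makes it possible to tell which copy is the damaged one.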
-
Long-term Preservation and Migration Strategy
CNSA is committed to long-term data preservation, which includes migrating data to new
systems when needed as technology evolves, so that the data remain accessible and usable in
the future.
-
Review and Update
CNSA's preservation plan is a living document that a designated team regularly reviews and
updates to ensure its continued effectiveness and relevance.