Handbook

About CNSA

CNGB Sequence Archive (CNSA) is a convenient and fast online submission system for biological research projects, samples, experiments, variations and other data. It follows the International Nucleotide Sequence Database Collaboration (INSDC) and DataCite standards and accepts submissions of sequencing data, related information and analysis results from researchers worldwide. Its data submission service can supplement the literature publishing process and support early data sharing. CNSA is committed to storing and sharing biological sequencing data, information and analysis results, and is designed to give researchers worldwide comprehensive data and information resources, making data easier to access and reuse.

So far, CNSA has archived XXXX TB data, XXX TB public data, XXXX projects, XXXXX samples, XXXXX experiments, and XXXXX runs.

So far, CNSA has supported XX articles, XX journals, and XX organizations.

Please view the Statistics for detailed statistics.

Handbook (simple version)

Download handbook (simple version)

CNSA Handbook (simple version in English)

Register/Login

Please register or log in on the registration page with your email address or mobile number, and fill in the submitter’s information.

Enter the submission portal

Click “Submit” on the CNSA homepage or click “Submission portal” on the homepage navigation bar to enter the Submission portal page.

Submit project

1. Enter the submission process

Click “Project” on the Submission portal page to enter the submission process.

2. Submit project information

Select Data management form -> fill in the basic information -> fill in the details -> overview -> submit

Notes:

  1. The first step of project submission is to choose a Data management form. If you choose "Public" or "Controlled", you can base the release date on the expected publication date of the article; we recommend a release date later than the publication date.
  2. Article information can also be added after the article is published.
  3. After the project is submitted successfully, you can find the project accession assigned by CNSA (prefixed with CNP) in “My submission-Project”.

Submit review materials

After the project is submitted successfully, the system will send the “Data Submission Review Application Form” to your email address. Please send the completed form, together with the review materials it requires, to the CNGB Bioresource Sharing Compliance Center (BSCC) at cngb-ebb@genomics.cn as soon as possible. Once the materials pass review, the data administrator will review the project. If you have questions about the review materials, please contact cngb-ebb@cngb.org.

Submit sample

1. Enter the submission process

Click “Sample” on the Submission portal page to enter the submission process.

2. Submit sample information

If you submit only one sample at a time, we recommend the single submission method. If you submit multiple samples at a time, we recommend the batch submission method.

(1) Single submission: Select "Submit a single sample" -> Select sample type -> Fill in sample attributes -> Fields pass check -> Overview -> Submit

(2) Batch submission: Select "Submit batch samples" -> Select sample type -> Download template -> Upload completed template -> Template pass check -> Submit

Notes:

  1. Please select the sample type carefully; you cannot change it yourself after submission.
  2. Sample names must not be duplicated.
  3. When filling out the batch template file, please read the related description and field comments first. If the value of a required field is unavailable, you can fill in 'not collected', 'not applicable' or 'missing'. If you are unsure of the taxonomy ID or scientific name of the organism, you can search for it in the single submission process to make sure the information is correct.
  4. Collection date supports 4 date formats: YYYY, YYYY-MM, YYYY-MM-DD and YYYY-YYYY.
  5. If the number of samples exceeds 1000, or the file size exceeds 100 KB, please split the submission into multiple batches.
  6. After the sample is submitted successfully, you can find the sample accession assigned by CNSA (prefixed with CNS) in “My submission-Sample”.
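Before uploading a batch template, it can help to check collection dates locally against the four formats listed in the notes. The following is a minimal sketch, not CNSA's own validation: the function name is hypothetical, and the rule that a year range must not run backwards is an assumption.

```python
import re
from datetime import date

# The three single-date formats from the notes.
_PATTERNS = [
    re.compile(r"^(\d{4})$"),                  # YYYY
    re.compile(r"^(\d{4})-(\d{2})$"),          # YYYY-MM
    re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"),  # YYYY-MM-DD
]
_YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")  # YYYY-YYYY


def is_valid_collection_date(value: str) -> bool:
    """Return True if value matches one of the four documented formats."""
    value = value.strip()
    m = _YEAR_RANGE.match(value)
    if m:
        # Assumption: the start year should not be later than the end year.
        return int(m.group(1)) <= int(m.group(2))
    for pattern in _PATTERNS:
        m = pattern.match(value)
        if m:
            parts = [int(g) for g in m.groups()]
            # Pad a missing month/day with 1 so date() can check the calendar.
            year, month, day = (parts + [1, 1])[:3]
            try:
                date(year, month, day)
            except ValueError:
                return False
            return True
    return False
```

For example, `is_valid_collection_date("2019-05")` is accepted, while `"2019/05/17"` and `"2019-02-30"` are rejected.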

Submit experiment/run

1. Enter the submission process

Click “Experiment/run” on the Submission portal page to enter the submission process.

2. Submit data files and metadata

If you submit only one experiment/run at a time, we recommend the single submission method. If you submit multiple experiments/runs at a time, we recommend the batch submission method.

(1) Single submission: Select submission type (Submit a single experiment/run) -> Fill in basic information -> Fill in metadata -> Metadata pass check -> Submit data files -> Data files pass check -> Overview -> Submit

(2) Batch submission: Select submission type (Submit batch experiments/runs) -> Upload data files -> Download metadata template -> Upload completed metadata template -> Metadata pass check -> Data files pass check -> Submit

Notes:

  1. It is recommended to upload the data files first. All users can upload data via FTP or by mailing a hard drive.

    (1) Once the data files are in your personal FTP directory, the upload is complete.

    (2) The FTP server, username and password can be viewed in the submission process or in "My service". Each user has a unique FTP account.

  2. When filling out the batch template file, please read the related instructions and field comments first. One line represents one run. If a sample is associated with multiple data files, please submit them on multiple lines, making sure that the experiment information is consistent and the library name is unique. The file name and MD5 value of each data file must be unique.
  3. The metadata file size cannot exceed 100 KB.
  4. After the experiment/run is submitted successfully, you can find the accessions assigned by CNSA in “My submission-Experiment/run” (Experiment: prefixed with CNX; Run: prefixed with CNR).
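The uniqueness rule for file names and MD5 values can be checked locally before uploading the batch template. The sketch below assumes the rows have already been read into dictionaries; the column names `file_name` and `md5` are hypothetical and may not match the real template headers.

```python
from collections import Counter


def find_duplicates(rows, column):
    """Return the values of `column` that occur in more than one row."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def check_batch_rows(rows):
    """Collect readable problems for a list of run rows (one dict per line)."""
    problems = []
    # Hypothetical column names; adjust them to the downloaded template.
    for column in ("file_name", "md5"):
        for value in find_duplicates(rows, column):
            problems.append(f"duplicate {column}: {value}")
    return problems
```

An empty list means no duplicate file names or MD5 values were found; it does not replace the checks performed by the submission system.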

Submit assembly

1. Enter the submission process

Click “Assembly” on the Submission portal page to enter the submission process.

2. Submit data files and metadata

If you submit only one assembly at a time, we recommend the single submission method. If you submit multiple assemblies at a time, we recommend the batch submission method.

(1) Single submission: Select submission type (Submit a single assembly) -> Fill in basic information -> Fill in metadata -> Metadata pass check -> Submit data files -> Data files pass check -> Overview -> Submit

(2) Batch submission: Select submission type (Submit batch assemblies) -> Upload data files -> Download metadata template -> Upload completed metadata template -> Metadata pass check -> Data files pass check -> Submit

Notes:

  1. It is recommended to upload the data files first (currently only the FASTA format is supported). All users can upload data via FTP or by mailing a hard drive.
  2. When filling out the batch template file, please read the related instructions and field comments first. One line represents one assembly. Make sure that each assembly name is unique, and that the file name and MD5 value of each data file are unique.
  3. The metadata file size cannot exceed 100 KB.
  4. After the assembly is submitted successfully, you can find the assembly accession assigned by CNSA (prefixed with CNA) in “My submission-Assembly”.

Submit variation

1. Enter the submission process

Click “Variation” on the Submission portal page to enter the submission process.

2. Submit data files and metadata

(1) Submit SNP: Select variation type (SNP) -> Upload data files to FTP -> Download metadata template -> Upload completed metadata template -> Metadata pass check -> Data files pass check -> Submit

(2) Submit SV: Select variation type (SV) -> Upload data files to FTP (optional) -> Download template -> Upload completed template -> Pass check -> Data files pass check (if submitted) -> Submit

(3) Submit CAHV: Select variation type (CAHV) -> Download template -> Upload completed template -> Pass check -> Submit

Notes:

  1. If the selected variation type is SNP, we recommend uploading the data files first (currently only the VCF format is supported). All users can upload data via FTP or by mailing a hard drive.
  2. When filling out the batch template file, please read the description and field comments first. If you need to upload VCF files, make sure that the file name and MD5 value of each data file are unique.
  3. After the data has been reviewed, you can find the variation accessions assigned by CNSA (prefixed with varc) in “My submission-Variation”.

Memo

1. Data release

The release date can be set under “Data Management” in the project submission process. Only today or a date within two years from today can be selected. The data is made public only after it has passed review and the release date set by the user has been reached. When the release date is approaching, the system sends a reminder email 15 days in advance.
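The release-date window and the reminder date can be worked out with a few lines of date arithmetic. This is a sketch under stated assumptions, not the system's implementation: "two years" is read as the same calendar day two years later, both ends of the window are treated as inclusive, and all function names are hypothetical.

```python
from datetime import date, timedelta


def latest_release_date(today: date) -> date:
    """Same calendar day two years after `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year + 2)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year.
        return today.replace(year=today.year + 2, day=28)


def is_selectable_release_date(candidate: date, today: date) -> bool:
    """True if `candidate` lies between today and two years from today, inclusive."""
    return today <= candidate <= latest_release_date(today)


def reminder_date(release: date) -> date:
    """Date on which the 15-day advance reminder would fall."""
    return release - timedelta(days=15)
```

For a project submitted on 2024-03-01, this reading allows release dates from 2024-03-01 through 2026-03-01.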

2. Modify and delete

On the "My submission" page, you can click the "pencil icon" in the status column to modify a submission. If the status column has no "pencil icon", please send an email to datasubs@cngb.org stating the submission ID or data accession and the reason for the modification.

(1) Modify release date

  • If the status of the project is “Unfinished”, go to “My submission” and click the “pencil icon” in the project status column to enter the modification process.
  • If the status of the project is "Processing" or "Processed", you can modify the release date by clicking the date in the release date column or the "pencil icon".
  • If the status of the project is “Public” or “Controlled”, you cannot modify the release date yourself. If you need to change it, please send an email to datasubs@cngb.org stating the project accession and the reason for the change.

(2) Delete submission

If the status is “Unfinished”, click the “trash can icon” to delete the submission. If it is in any other status, please send an email to datasubs@cngb.org stating the submission ID and the reason for the deletion.

3. Data association

The association of a sample with a project is triggered only after an experiment/run or assembly is submitted. Only after the experiment/run or assembly has been reviewed and the data is public can all information associated with the project be retrieved by project accession. Otherwise, the project and sample information can only be retrieved separately, with no association between them.

4. MD5 check

Please fill in the file name and MD5 value of the uploaded data file, and then click "Check". There are four possible statuses:

(1) Not uploaded: The data file has not been uploaded or is still being uploaded. If you have already uploaded the data file and it still shows "Not uploaded", please click "Check" again later.

(2) Calculating: The data file has been uploaded, but its MD5 value has not been calculated yet or is being calculated. Please click "Check" again later.

(3) MD5 mismatch: The MD5 value calculated by the system differs from the MD5 value you filled in. This can happen if the check runs while the data file is only partially uploaded; in that case, please click "Check" again later. If the status still shows "MD5 mismatch" after a long time (such as half an hour), please recalculate the MD5 of the data file and fill it in again. If the mismatch persists, please contact datasubs@cngb.org and give the file name in the email.

(4) Check finished: The data file has been uploaded and has passed the check.
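The MD5 value to fill in has to be computed on the local copy of each data file before upload. One way to do this is with Python's standard `hashlib` module, reading the file in chunks so that large sequencing files do not have to fit in memory. The function name and chunk size below are illustrative choices.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned 32-character hexadecimal string is the value to compare with the one calculated by the system.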

5. View accessions and submitted metadata

On “My submission” page, you can directly view the accession of a single submission, download the batch-submitted attribute files with accessions in the status column, or click on the completed submission ID to view the details.

6. Contact us

If you have any questions, please contact the administrator at datasubs@cngb.org or 0755-33945586.

Metadata

Metadata is data that describes an information resource or a data object.

Currently, CNSA metadata includes submitter, project, sample, experiment, assembly, and variation. Below is an introduction to each data type and its fields (required fields are marked with *).

Submitter

The submitter submits data on project, sample, experiment, run, assembly and variation to the CNSA. A submitter can submit multiple data types, update and modify data, set Data management form, etc.

Field | Description
*First name | First (given) name of the submitter.
Middle name | Middle name of the submitter.
*Last name | Last (family) name of the submitter.
*Primary E-mail | Primary email address of the submitter.
Secondary E-mail | Secondary email address of the submitter.
*Submitting organization | Full name of submitter’s organization.
Submitting organization URL | The URL of submitter’s organization.
*Department | The department of the submitter.
Phone | The phone number of the submitter.
Fax | The fax number of the submitter.
*Street | The street name of the submitter.
*City | The city name of the submitter.
State/Province | The state/province of the submitter.
*Country | The country/region of the submitter.
*Postal code | The postal code of the submitter.

Project

The definition of a 'project', a set of related data, is very flexible and allows projects to be defined using different parameters. For example, project records can be established for:

  • Genome sequencing and assembly
  • Metagenomes
  • Transcriptome sequencing and expression
  • Targeted locus sequencing
  • Genetic or RH Maps
  • Epigenetics
  • Phenotype or Genotype
  • Variation detection

A project represents a submission, initiative, or group of data that is logically related in some manner, or that is of interest to retrieve as a distinct dataset. A project may be identified in terms of distinctions in the type of data produced.

    Data management form

    CNSA has three Data management forms: Public, Controlled and Private. The data submitter chooses a Data management form when submitting a project.

  • Public: The metadata and data files associated with the project will be public. The data submitter sets a release date during project submission, and all metadata and data files associated with the project become public on that date. Public data is displayed on the China National GeneBank DataBase (CNGBdb) and is open to the world; users can access and use it freely at CNGBdb.
  • Controlled: The metadata associated with the project will be public and the data files will be controlled. The data submitter sets the release date of the metadata during project submission, and all metadata associated with the project becomes public on that date. Only the metadata of controlled data (project, sample and other data types) is displayed on CNGBdb; the data files are not. Other registered users can apply for access to controlled data. Applicants may use the data only after the data submitter has reviewed and approved the request, and the data submitter grants the access or provides the data files to the applicant.
  • Private: The metadata and data files associated with the project are both controlled. Private data is not displayed on CNGBdb, and no access or download requests are accepted.

    General info

    *Project title

    Short descriptive name of the project such as a phrase or short sentence for public display.

    Project name

    A short name for the study.

    *Public description

    A description (a paragraph) of the study goals and relevance. Provide enough information (more than 100 characters) in the description for other users to interpret the data.

    *Relevance

    The primary general relevance of the project.

    Relevance | Description
    Agricultural |
    Environmental |
    Evolution |
    Industrial | Could include bio-remediation, bio-fuels and other areas of research where there are areas of mass production.
    Medical |
    Model organism |
    Other | Unspecified major impact categories to be defined in the "Relevance description".

    *Relevance description

    Describe the relevance when "Other" is selected.

    *Functional annotation

    Indicate whether the project will contain functional annotation. If yes, a unique locus tag prefix will be created.

    *Locus tag prefix

    The prefix of a locus tag. Locus tags (locus_tag) are identifiers that are systematically applied to every gene in a genome. All components of a project (such as multiple chromosomes or plasmids) should use the same locus_tag prefix.

    Format requirements:

    (1) It can contain only alphanumeric characters and must be at least 3 characters long.

    (2) All letters must be uppercase. It should start with a letter; numerals can appear in the 2nd position or later in the string (e.g. A1C).

    (3) It should contain no symbols, such as -, _ or *.
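The three format requirements can be expressed as a single regular expression. This is a minimal sketch of a local pre-check; the function name is hypothetical, and no maximum length is enforced because the text above does not state one.

```python
import re

# One uppercase letter first, then at least two more uppercase letters
# or digits, which gives a minimum total length of 3 and excludes symbols.
LOCUS_TAG_PREFIX = re.compile(r"[A-Z][A-Z0-9]{2,}")


def is_valid_locus_tag_prefix(prefix: str) -> bool:
    """Return True if `prefix` satisfies the three format requirements."""
    return LOCUS_TAG_PREFIX.fullmatch(prefix) is not None
```

With this pattern, `A1C` is accepted, while `1AB` (starts with a numeral), `AB` (too short), `a1c` (lowercase) and `A_C` (contains a symbol) are rejected.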

    External links

    The web sites that are related to this project.

    Field | Description
    URL | URL of web site that is related to this project.
    Link description | Display name of web site that is related to this project.

    Related projects

    The projects that are related to this project.

    Field | Description
    Project accession | Related project accession ID
    Project description | Description of related project

    Grants

    The funding sources of this project.

    Field | Description
    Grant number | Number of the grant that supports the research.
    Grant title | Title of the grant that supports the research.
    Institution abbreviation | Abbreviation of the institution that supported the research.
    Institution | The institution that supported the research.

    Consortium

    If the project is carried out as part of a consortium, please provide the related consortium information.

    Field | Description
    Consortium name | If the project is carried out as part of a consortium, provide the consortium name.
    Consortium URL | If the consortium maintains a web site, provide the URL.

    Data providers

    Indicate the data provider (data submitter) if it is someone other than the submitting organization or consortium.

    Field | Description
    Data provider | Data provider
    Data provider URL | If the data provider maintains a web site, provide the URL.

    Detailed information

    Project type

    *Project data type

    A general label indicating the primary study goal. Select appropriate types.

    Project data type | Description
    Genome sequencing and assembly | Whole, or partial, genome sequencing project (with or without a genome assembly).
    Raw sequence reads | Submission of raw sequencing information as it comes out of machine.
    Genome sequencing | Genome sequencing
    Assembly | Assembly
    Clone ends | Clone-end sequencing project
    Epigenomics | DNA methylation, histone modification, chromatin accessibility datasets
    Exome | Exome resequencing project
    Map | Project that results in non-sequence map data such as genetic map, radiation hybrid map, cytogenetic map, optical map, etc.
    Metagenome | Sequence analysis of environmental samples
    Metagenomic assembly | Metagenomic assembly
    Phenotype or Genotype | Project correlating phenotype and genotype
    Proteome | Large scale proteomics experiment including mass spec. analysis
    Random survey | Sequence generated from a random sampling of the collected sample; not intended to be comprehensive sampling of the material.
    Targeted loci cultured | Targeted loci cultured
    Targeted loci environmental | Targeted loci environmental
    Targeted Locus (Loci) | Project to sequence specific loci, such as a 16S rRNA sequencing
    Transcriptome or Gene expression | Large scale RNA sequencing or expression analysis. Includes cDNA, EST, RNA_seq, and microarray.
    Variation | Project with a primary goal of identifying large or small sequence variation across populations.
    Other | A free text description is provided to indicate Other data type

    * Project data type description

    Describe the project data type when "Other" is selected.

    *Sample scope

    The scope and purity of the biological sample used for the study.

    1. Choose “Multiisolate” as the scope when the goal of the research is to compare multiple individuals or strains of the same species, e.g., in a Variation or Genome sequencing and assembly project.
    2. Choose “Multispecies” when different species are being examined.
    3. Choose “Monoisolate” if the goal is to make a single genome or transcriptome assembly, even if more than one individual was the source of the DNA or RNA.

    Sample scope | Description
    Monoisolate | a single animal, cultured cell-line, inbred population (or possibly a heterogeneous population when a single genome assembly is generated from the pooled sample; not preferred).
    Multiisolate | multiple individuals, a population (representative of a species). To be used for variation or other sequence comparison projects, not when multiple genomes will be annotated. Make separate monoisolate projects when more than one genome will be annotated.
    Multispecies | sample represents multiple species.
    Environment | the species content of the sample is not known.
    Synthetic | the sample is synthetically created by a machine.
    Other | specify the sample scope that was used.

    * Target description

    Describe the target when "Other" is selected.

    Publications

    Field | Description
    PubMed ID | The PubMed ID will be used to populate the publication information.
    DOI | Provide a DOI if a PubMed ID is not available. If you choose DOI, you also need to fill in the additional reference information below.
    *Reference title | Title of the reference.
    *Journal title | Title of the journal.
    *Year | Year of publication.
    *Volume | Journal volume.
    *Issue | Journal issue.
    *Start page number | Start page number of publication.
    *End page number | End page number of publication.
    *Author | Name of author.
    *Institution | Institution of author.

    Sample

    Description of biological source material; each physically unique specimen should be registered as a single Sample with a unique set of attributes.

    General information

    Submission type

    Submission type | Description
    Submit batch samples | Users will be asked to upload a text file that describes each of their samples and its attributes.
    Submit a single sample | Users will be asked to manually complete a web form to describe one sample and its attributes.

    Sample type

    In preparing your submission, please refer to this attributes list and Sample examples and fill in the relevant fields. Select the package that best describes your samples.

    Sample type | Description
    Clinical or host-associated pathogen |
    Environmental, food or other pathogen |
    Combined pathogen | Batch submissions that include both clinical and environmental pathogen.
    Microbial sample | Use for bacteria or other unicellular microbes when it is not appropriate or advantageous to use MIxS, Pathogen or Virus packages.
    Model organism or animal sample | Use for multicellular samples or cell lines derived from common laboratory model organisms, e.g., mouse, rat, Drosophila, worm, fish, frog, or large mammals including zoo and farm animals.
    Metagenome or environmental sample | Use for metagenomic and environmental samples when it is not appropriate or advantageous to use MIxS packages.
    Invertebrate sample | Use for any invertebrate sample.
    Human sample | Only use for human samples or cell lines that have no privacy concerns. For samples isolated from humans use the Pathogen, Microbe or appropriate MIxS package.
    Plant sample | Use for any plant sample or cell line.
    Virus sample | Use for all virus samples not directly associated with disease.
    GSC MIxS air |
    GSC MIxS built environment |
    GSC MIxS host associated |
    GSC MIxS human associated |
    GSC MIxS human gut |
    GSC MIxS human oral |
    GSC MIxS human skin |
    GSC MIxS human vaginal |
    GSC MIxS microbial mat biofilm |
    GSC MIxS miscellaneous natural or artificial environment |
    GSC MIxS plant associated |
    GSC MIxS sediment |
    GSC MIxS soil |
    GSC MIxS waste water sludge |
    GSC MIxS water |
    Beta-lactamase | Use for beta-lactamase gene transformants that have antibiotic resistance data.

    Sample attributes

    A major component of a Sample record is the sample attributes section. Attributes define the material under investigation and can include sample characteristics such as cell type, collection site and phenotypic information like disease state.

    Sample attributes are captured as structured name: value pairs, for example, tissue: liver. The first targeted dictionaries implemented in the Sample submission are the MIxS minimum information checklists for standardizing descriptions of genomes, metagenomes and targeted locus sequences, as developed by the Genomic Standards Consortium.

    Experiment

    A description of sample-specific sequencing library, instrument and sequencing methods. An Experiment references 1 Project and 1 Sample.

    General information

    Submission type

    Submission type | Description
    Submit batch experiments/runs | Users will be asked to upload a text file that describes each of their experiments and runs.
    Submit a single experiment/run(s) | Users will be asked to manually complete a web form to describe their sequencing experiment and upload their raw sequencing reads.

    *Project accession

    Select the project this experiment belongs to.

    *Sample accession

    Select the sample this experiment uses.

    Metadata

    Experiment reuse

    Reuse the information of an experiment that has already been submitted. The existing experiment information is filled in automatically so that users can submit quickly.

    *Data files type

    The format of sequencing data files.

    Data files type | Description
    fastq | fastq files
    Sff | 454 Standard Flowgram Format file
    PacBio_HDF5 | PacBio hdf5 Format file
    Bam | Binary SAM format for use by loaders that combine alignment and sequencing data

    General information

    * Platform

    The sequencing platform and instrument model.

    Platform | Sequencer
    _LS454 | 454 GS
    _LS454 | 454 GS 20
    _LS454 | 454 GS FLX
    _LS454 | 454 GS FLX+
    _LS454 | 454 GS FLX Titanium
    _LS454 | 454 GS Junior
    ILLUMINA | HiSeq X Five
    ILLUMINA | HiSeq X Ten
    ILLUMINA | Illumina Genome Analyzer
    ILLUMINA | Illumina Genome Analyzer II
    ILLUMINA | Illumina Genome Analyzer IIx
    ILLUMINA | Illumina HiScanSQ
    ILLUMINA | Illumina HiSeq 1000
    ILLUMINA | Illumina HiSeq 1500
    ILLUMINA | Illumina HiSeq 2000
    ILLUMINA | Illumina HiSeq 2500
    ILLUMINA | Illumina HiSeq 3000
    ILLUMINA | Illumina HiSeq 4000
    ILLUMINA | Illumina NovaSeq 6000
    ILLUMINA | Illumina MiniSeq
    ILLUMINA | Illumina MiSeq
    ILLUMINA | NextSeq 500
    ILLUMINA | NextSeq 550
    BGISEQ | BGISEQ-500
    BGISEQ | BGISEQ-50
    BGISEQ | BGISEQ-1000
    BGISEQ | BGISEQ-100
    CAPILLARY | AB 310 Genetic Analyzer
    CAPILLARY | AB 3130 Genetic Analyzer
    CAPILLARY | AB 3130xL Genetic Analyzer
    CAPILLARY | AB 3500 Genetic Analyzer
    CAPILLARY | AB 3500xL Genetic Analyzer
    CAPILLARY | AB 3730 Genetic Analyzer
    CAPILLARY | AB 3730xL Genetic Analyzer
    COMPLETE_GENOMICS | Complete Genomics
    HELICOS | Helicos HeliScope
    ABI_SOLID | AB 5500 Genetic Analyzer
    ABI_SOLID | AB 5500xl Genetic Analyzer
    ABI_SOLID | AB 5500x-Wl Genetic Analyzer
    ABI_SOLID | AB 5500xl-W Genetic Analysis System
    ABI_SOLID | AB SOLiD 3 Plus System
    ABI_SOLID | AB SOLiD 4 System
    ABI_SOLID | AB SOLiD 4hq System
    ABI_SOLID | AB SOLiD PI System
    ABI_SOLID | AB SOLiD System
    ABI_SOLID | AB SOLiD System 2.0
    ABI_SOLID | AB SOLiD System 3.0
    ION_TORRENT | Ion Torrent PGM
    ION_TORRENT | Ion Torrent Proton
    ION_TORRENT | Ion Torrent S5 XL
    ION_TORRENT | Ion Torrent S5
    OXFORD_NANOPORE | GridION
    OXFORD_NANOPORE | MinION
    OXFORD_NANOPORE | PromethION
    PACBIO_SMRT | PacBio RS
    PACBIO_SMRT | PacBio RS II
    PACBIO_SMRT | Sequel
    DNBSEQ | DNBSEQ-G50(MGISEQ-200)
    DNBSEQ | DNBSEQ-G400(MGISEQ-2000)
    DNBSEQ | DNBSEQ-T7
    DNBSEQ | DNBSEQ-T10x4
    DIPSEQ | DIPSEQ-T1

    * Title

    Short text that can be used to call out experiment records in searches or in displays.

    Library

    * Library name

    Provide a name for your library if you have any.

    * Strategy

    The library strategy specifies the sequencing technique intended for this library.

    Strategy | Description
    WGA | Random sequencing of the whole genome following non-PCR amplification
    WGS | Random sequencing of the whole genome
    WXS | Random sequencing of exonic regions selected from the genome
    RNA-Seq | Random sequencing of whole transcriptome
    miRNA-Seq | Random sequencing of small miRNAs
    WCS | Random sequencing of a whole chromosome or other replicon isolated from a genome
    CLONE | Genomic clone based (hierarchical) sequencing
    POOLCLONE | Shotgun of pooled clones (usually BACs and Fosmids)
    AMPLICON | Sequencing of overlapping or distinct PCR or RT-PCR products
    CLONEEND | Clone end (5', 3', or both) sequencing
    FINISHING | Sequencing intended to finish (close) gaps in existing coverage
    ChIP-Seq | Direct sequencing of chromatin immunoprecipitates
    MNase-Seq | Direct sequencing following MNase digestion
    DNase-Hypersensitivity | Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI
    Bisulfite-Seq | Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status
    Tn-Seq | Sequencing from transposon insertion sites
    EST | Single pass sequencing of cDNA templates
    FL-cDNA | Full-length sequencing of cDNA templates
    CTS | Concatenated Tag Sequencing
    MRE-Seq | Methylation-Sensitive Restriction Enzyme Sequencing strategy
    MeDIP-Seq | Methylated DNA Immunoprecipitation Sequencing strategy
    MBD-Seq | Direct sequencing of methylated fractions sequencing strategy
    Synthetic-Long-Read | Binning and barcoding of large DNA fragments to facilitate assembly of the fragment
    ATAC-seq | Assay for Transposase-Accessible Chromatin (ATAC) strategy is used to study genome-wide chromatin accessibility. An alternative method to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and to integrate primer DNA sequences into the cleaved genomic DNA
    ChIA-PET | Direct sequencing of proximity-ligated chromatin immunoprecipitates
    FAIRE-seq | Formaldehyde Assisted Isolation of Regulatory Elements. Reveals regions of open chromatin
    Hi-C | Chromosome Conformation Capture technique where a biotin-labeled nucleotide is incorporated at the ligation junction, enabling selective purification of chimeric DNA ligation junctions followed by deep sequencing
    ncRNA-Seq | Capture of other non-coding RNA types, including post-translation modification types such as snRNA (small nuclear RNA) or snoRNA (small nucleolar RNA), or expression regulation types such as siRNA (small interfering RNA) or piRNA/piwi/RNA (piwi-interacting RNA).
    RAD-Seq | Restriction Site Associated DNA Sequence
    RIP-Seq | Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP).
    SELEX | Systematic Evolution of Ligands by EXponential enrichment
    ssRNA-seq | strand-specific RNA sequencing
    Targeted-Capture |
    Tethered Chromatin Conformation Capture |
    OTHER | Library strategy not listed (please include additional info in the "design description")

    * Source

    The library source specifies the type of source material that is being sequenced.

    Source | Description
    GENOMIC | Genomic DNA (includes PCR products from genomic DNA)
    TRANSCRIPTOMIC | Transcription products or non-genomic DNA (EST, cDNA, RT-PCR, screened libraries)
    METAGENOMIC | Mixed material from metagenome
    METATRANSCRIPTOMIC | Transcription products from community targets
    SYNTHETIC | Synthetic DNA
    VIRAL RNA | Viral RNA
    OTHER | Other, unspecified, or unknown library source material

    * Selection

    The library selection specifies whether any method was used to select for or against, enrich, or screen the material being sequenced.

    Selection | Description
    RANDOM | Random selection by shearing or other method
    PCR | Source material was selected by designed primers
    RANDOM PCR | Source material was selected by randomly generated primers
    RT-PCR | Source material was selected by reverse transcription PCR
    HMPR | Hypo-methylated partial restriction digest
    MF | Methyl Filtrated
    MDA | Multiple displacement amplification
    MSLL | Methylation Spanning Linking Library
    cDNA | complementary DNA
    ChIP | Chromatin immunoprecipitation
    MNase | Micrococcal Nuclease (MNase) digestion
    DNase | Deoxyribonuclease (DNase) digestion
    Hybrid Selection | Selection by hybridization in array or solution
    Reduced Representation | Reproducible genomic subsets, often generated by restriction fragment size selection, containing a manageable number of loci to facilitate re-sampling
    Restriction Digest | DNA fractionation using restriction enzymes
    5-methylcytidine antibody | Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C)
    MBD2 protein methyl-CpG binding domain | Enrichment by methyl-CpG binding domain
    CAGE | Cap-analysis gene expression
    RACE | Rapid Amplification of cDNA Ends
    size fractionation | Physical selection of size appropriate targets
    Padlock probes capture method | Circularized oligonucleotide probes
    Oligo-dT | enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT.
    repeat fractionation | Selection for less repetitive (and more gene rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics.
    other | Other library enrichment, screening, or selection process (please include additional info in the "design description")
    unspecified | Library enrichment, screening, or selection is not specified (please include additional info in the "design description")

    * Layout

    The library layout specifies whether to expect single, paired, or other configuration of reads. In the case of paired reads, information about the relative distance and orientation is specified.

    Layout | Description
    fragment/single | Single-end read
    paired | Paired-end reads

    * Nominal size(bp)

    The average insert size for paired reads. The insert size is the size of the DNA fragments after fragmentation (i.e. it is NOT the fragment size minus forward read size, minus reverse read size).

    Nominal standard deviation(bp)

    The standard deviation of the fragment lengths about the mean (insert size).

    Spot layout

    If technical reads (e.g. barcodes, adaptors or linkers) are included in the submitted raw sequences, a spot descriptor must be submitted to describe the position of the technical reads so that they can be removed.

    Some examples for the spot layout.

    • A[TTACG]F* : Single reads with adapter (A) of sequence [TTACG] followed by biological forward read (F).
    • A[TTACG]B[ATGC]F* : Single reads with adapter (A) of sequence [TTACG] followed by barcode (B) of sequence [ATGC] and biological forward read (F).
    • A[TTACG]B[ATGC]P[CGTTT]F* : Single reads with adapter (A) of sequence [TTACG] followed by barcode (B) of sequence [ATGC], primer (P) of sequence [CGTTT] and biological forward read (F).
    • A[TTACG]P[CGTTT]B[ATGC]F*A[GGTATTC] : Single reads with adapter (A) of sequence [TTACG] followed by primer (P) of sequence [CGTTT], barcode (B) of sequence [ATGC], 100bp biological forward read (F) and adapter (A) of sequence [GGTATTC].
    • A[TTACG]F*L[linker_sequence]F* : Mate pair reads with adapter (A) of sequence [TTACG] followed by first forward read (F), linker (L) of sequence [linker_sequence] and second forward (F) read.
    • A[TTACG]F*P[CGTTT]L[linker_sequence]P[AGGCTC]F*A[GGTATTC] : Mate pair reads with adapter (A) of sequence [TTACG] followed by biological forward read (F), primer (P) of sequence [CGTTT], linker (L) of sequence [linker_sequence], primer (P) of sequence [AGGCTC], second biological forward read (F) and adapter (A) of sequence [GGTATTC].
    • A[TTACG]B[ATGC]F*L[linker_sequence]F* : Mate pair reads with adapter (A) of sequence [TTACG] followed by barcode (B) of sequence [ATGC], first forward read (F), linker (L) of sequence [linker_sequence] and second forward (F) read.
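
The notation used in the examples above can be read mechanically. The following Python sketch splits a spot layout string into its segments. It is only an illustration based on the examples listed here: the function name parse_spot_layout and the assumption that every segment is a letter (A, B, P, L or F) followed by either [sequence] or * are not taken from CNSA itself.

```python
import re

# Segment letters as they appear in the examples above.
SEGMENT_NAMES = {
    "A": "adapter",
    "B": "barcode",
    "P": "primer",
    "L": "linker",
    "F": "forward read",
}

# One segment: a letter followed by [sequence] (technical read)
# or by * (biological read of unspecified length).
_SEGMENT = re.compile(r"([ABPLF])(?:\[([^\]]*)\]|\*)")


def parse_spot_layout(descriptor):
    """Split a spot layout string into (segment name, sequence or None) pairs."""
    segments = []
    pos = 0
    while pos < len(descriptor):
        match = _SEGMENT.match(descriptor, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}: {descriptor[pos:]!r}")
        code, sequence = match.groups()
        segments.append((SEGMENT_NAMES[code], sequence))
        pos = match.end()
    return segments
```

For example, parse_spot_layout("A[TTACG]B[ATGC]F*") yields an adapter with sequence TTACG, a barcode with sequence ATGC and a forward read with no fixed sequence.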

    Design description

    The goal and setup of the individual library.

    Library construction protocol

    Describes the protocol by which the sequencing library was constructed.

    Pipeline

    Field | Description
    Index | The index of the programs or algorithms.
    Program | The programs or algorithms used.
    Version | The version of the programs or algorithms.

    Data files

    *File name

    The name of a sequence data file.

    *MD5 value

    MD5 checksum of a sequence data file.

    Status

    The status of the file uploaded to the FTP server.

    Status | Description
    Not uploaded | The data file has not been uploaded or is still being uploaded; the system cannot find the file name.
    Calculating | The data file has been uploaded, but its MD5 value has not been calculated yet or is still being calculated.
    MD5 mismatch | The data file has been fully or partially uploaded, but the MD5 value calculated by the system does not match the MD5 value entered by the user.
    Check finished | The data file has been uploaded and the check has finished.

    Assembly

    An assembly is a collection of genomic sequences that are used to represent the genome of an organism.

    General information

    Submission type

    Field | Description
    Submit batch assemblies | You will be asked to upload a text file that describes your metadata and submit your data files in batches.
    Submit a single assembly | You will be asked to manually complete a web form to describe your assembly and upload your data.

    *Project accession

    Select the project to which this assembly belongs.

    *Sample accession

    Select the sample this assembly uses.

    Metadata

    Assembly metadata

    Field | Description
    Assembly name | Assembly name (e.g. GRCh37.p5).
    *Molecule type | The in vivo molecule type of the sequence to be submitted.
    *Coverage | The average sequencing depth (e.g. 12).
    *Sequencing Technology | Sequencing platform.
    *Sequencing technology description | Describe the sequencing technology when “Other” is selected.
    Minimum gap length | The minimum length of a run of Ns that is considered a gap.
    *Partial | Select 'Yes' if the genome is partial and 'No' if the genome is complete.

    Assembly method

    Field | Description
    *Assembly method | The program used to generate the assembly.
    *Version | The version of the program used to generate the assembly.
    *Assembly method description | Describe the assembly method when “Other” is selected.

    Data files

    *File type

    The assembly data file format.

    File type | Description
    Fasta | Sequence data format indicating sequence base calls. Format: a header line starting with the > character, followed by data lines with base calls.

    *File name

    The name of an assembly data file.

    *MD5 value

    MD5 checksum of an assembly data file.

    Status

    The status of the file uploaded to the FTP server.

    Status | Description
    Not uploaded | The data file has not been uploaded or is still being uploaded; the system cannot find the file name.
    Calculating | The data file has been uploaded, but its MD5 value has not been calculated yet or is still being calculated.
    MD5 mismatch | The data file has been fully or partially uploaded, but the MD5 value calculated by the system does not match the MD5 value entered by the user.
    Check finished | The data file has been uploaded and the check has finished.

    Variation

    CNSA accepts genomic variations from any species, including single nucleotide polymorphisms, short insertions/deletions and genomic structural variations, etc., and provides long-term stable archive accessions and data. The variation data includes Analysis, Samplesets, Subject, Call, File and Region.

    Submission template

    There are three templates for submission of variations.

    SNP_submission_template.v1.1.xlsx is for submitting simple, small-scale genomic variations (<= 50 bp), such as single nucleotide polymorphisms (SNP), short insertions and deletions (INDEL), microsatellites, etc. It has four parts: Analysis, Samplesets, Subject and File, all of which are required.

    SV_submission_template.v1.1.xlsx is for submitting complex, large-scale genomic structural variations (SV) (> 50 bp), such as insertions, deletions, duplications, inversions, translocations, mobile elements, etc. It has six parts: Analysis, Samplesets, Subject, Call, Region and File. Analysis, Samplesets and Subject are required, at least one of Call and File is required, and Region is optional.

    CAHV_Submission_template.v1.0.xlsx is for submitting Clinically Associated Human Variations (CAHV), including genomic variations and related phenotypes, clinical significance, etc. It has four parts: Analysis, Samplesets, Subject and Call, all of which are required.

    Data file format

    Run

    CNSA accepts five data file formats: FASTQ, BAM, SFF, PacBio_HDF5 and CRAM.

    FASTQ format

    We recommend FASTQ format. Single and paired reads are accepted. Do not combine multiple files into one compressed file for upload. All file names in your account folder must be unique.

    • Quality scores must be in Phred scale.
    • Both ASCII and space-delimited decimal encoding of quality scores are supported. We will automatically detect the Phred quality offset of either 33 or 64.
    • No technical reads (adapters, linkers, barcodes) are allowed.
    • Single reads must be submitted using a single Fastq file and can be submitted with or without read names.
    • Paired reads must be split and submitted using two Fastq files. The read names must have a suffix identifying the first and second read from the pair, for example '/1' and '/2' (regular expression for the reads: "^@([a-zA-Z0-9_-]+:[0-9]+:[a-zA-Z0-9]+:[0-9]+:[0-9]+:[0-9-]+:[0-9-]+) ([12]):[YN]:[0-9]*[02468]:[ACGTN]+$").
    • The first line for each read must start with '@'.
    • The base calls and quality scores must be separated by a line starting with '+'.
    • The Fastq files must be compressed using gzip or bzip2.
    • The regular expression for bases is "^([ACGTNactgn.]*?)$"
    • Example of FASTQ file containing single reads:
      @read_name
      GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
      +
      !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
      ...
    • Example of FASTQ records for paired reads:
      @read_name/1
      GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
      +
      !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
      @read_name/2
      GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
      +
      !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
      ...
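
Several of the rules above can be checked locally before upload. The Python sketch below is a minimal illustration and not the check CNSA itself runs. It assumes four lines per record (no line wrapping) and ASCII-encoded quality scores, and the function name check_fastq_records is hypothetical.

```python
import re

# Regular expression for bases, as quoted in the list above.
BASES = re.compile(r"^([ACGTNactgn.]*?)$")


def check_fastq_records(lines):
    """Return a list of problems found in FASTQ text given as an iterable of lines."""
    rows = [line.rstrip("\r\n") for line in lines]
    problems = []
    if len(rows) % 4 != 0:
        problems.append("line count is not a multiple of 4")
    usable = len(rows) - len(rows) % 4
    for start in range(0, usable, 4):
        header, bases, separator, quality = rows[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {record}: first line does not start with '@'")
        if not BASES.match(bases):
            problems.append(f"record {record}: unexpected character in base calls")
        if not separator.startswith("+"):
            problems.append(f"record {record}: separator line does not start with '+'")
        # Assumption: ASCII-encoded qualities, one character per base.
        if len(quality) != len(bases):
            problems.append(f"record {record}: quality and base call lengths differ")
    return problems
```

A compressed file can be passed in as an open text handle, for example the object returned by gzip.open(path, "rt") from the Python standard library.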

    BAM file

    Submitted BAM files must be readable with Samtools.

    BAM file names must end with the .bam suffix.

    Do not combine multiple files into one compressed file for upload.

    All file names in your account folder must be unique.

    SFF format

    The SFF format is supported for the 454 and Ion Torrent platforms.

    SFF file names must end with the .sff suffix.

    Do not combine multiple files into one compressed file for upload.

    All file names in your account folder must be unique.

    PacBio_HDF5 format

    PacBio_HDF5 data submissions are supported in the platform-specific native format.

    One run consists of *.bax.h5, *.bas.h5 and xml files. These files should be tarred and compressed.

    PacBio_HDF5 data must be submitted as a single tar.gz or tar.bz file.

    Apart from the per-run archive described above, do not combine multiple files into one compressed file for upload.

    All file names in your account folder must be unique.

    CRAM format

    CRAM is a sequencing read file format that is highly space efficient by using reference-based compression of sequence data and offers both lossless and lossy modes of compression. Please refer to CRAMv3.0 for the specific format.

    CRAM file names must end with the .cram suffix.

    Do not combine multiple files into one compressed file for upload.

    All file names in your account folder must be unique.

    Assembly

    Genome assembly submissions include plasmids, organelles, complete virus genomes, viral segments/replicons, bacteriophages, prokaryotic and eukaryotic genomes. Chromosomes include organelles (e.g. mitochondrion and chloroplast), plasmids and viral segments.

    Sequences should be submitted as a Fasta file. These sequences can be either contig, scaffold or chromosome sequences.

    The submitted Fasta file must be gzip-compressed and should specify the classification (contig, scaffold or chromosome) in the file name.

    All file names in your account folder must be unique. Do not combine multiple files into one compressed file for upload.

    Fasta format

    The sequence name is extracted from the header line starting with >.
    For example, the following sequence has the name contig1:

    >contig1
    AAACCCGGG...

    Variation

    CNSA currently only accepts variation data in VCF format. Please note that your variation data needs to be converted to VCF file format. To ensure that the format of VCF file is correct, you are advised to refer to VCFv4.3.

    VCF file

    VCF is a text file format (most likely stored in a compressed manner). It contains meta-information lines (prefixed with “##”), a header line (prefixed with “#”), and data lines each containing information about a position in the genome and genotype information on samples for each position (text fields separated by tabs). Zero length fields are not allowed, a dot (“.”) must be used instead. In order to ensure interoperability across platforms, VCF compliant implementations must support both LF (\n) and CR+LF (\r\n) newline conventions.

    For example:

    ##fileformat=VCFv4.3
    ##fileDate=20090805
    ##source=myImputationProgramV3.1
    ##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
    ##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
    ##phasing=partial
    ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
    ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
    ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
    ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
    ##FILTER=<ID=q10,Description="Quality below 10">
    ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
    ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
    20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
    20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
    20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
    20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
    20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
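
The structural rules described above (meta-information lines, one header line, tab-separated data lines, no zero-length fields) can be checked with a few lines of Python. This is a rough sketch under stated assumptions, not a full VCF validator and not CNSA's own check: the function name check_vcf_lines is hypothetical, and the requirement that the first line is ##fileformat comes from the VCF specification rather than from this handbook.

```python
def check_vcf_lines(lines):
    """Return a list of structural problems found in an iterable of VCF text lines."""
    problems = []
    header_columns = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # accept both LF and CR+LF line endings
        if not line:
            continue
        if number == 1 and not line.startswith("##fileformat="):
            problems.append("line 1: expected a ##fileformat=... line")
        if line.startswith("##"):
            continue  # meta-information line
        if line.startswith("#"):
            header_columns = line[1:].split("\t")  # header line
            continue
        if header_columns is None:
            problems.append(f"line {number}: data line before the header line")
            continue
        fields = line.split("\t")
        if len(fields) != len(header_columns):
            problems.append(
                f"line {number}: {len(fields)} fields, header has {len(header_columns)}"
            )
        if any(field == "" for field in fields):
            problems.append(f"line {number}: zero-length field (use '.' instead)")
    return problems
```

An empty list means none of these basic problems were found; it does not guarantee that the file conforms to VCFv4.3 in every respect.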

    FTP data upload

    Users can upload data files to their personal directories via FTP.

    General instructions for uploading files using an FTP client:

    1. Use your favorite FTP client, such as FileZilla.
    2. Use binary mode for file transfers.
    3. Use ftp://ftp.cngb.org/ as the target host.
    4. Login with your FTP username and password (available in the data submission process or "My service").
    5. Upload files to your private FTP upload area.
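
For scripted uploads, the same steps can be sketched with Python's standard ftplib module. This is an illustrative sketch only: the function names upload_file and upload_all are hypothetical, the user name and password are placeholders for the FTP credentials mentioned in step 4, and it assumes that files are placed directly in the directory you land in after login.

```python
import os
from ftplib import FTP


def upload_file(ftp, local_path, remote_name=None):
    """Upload one file in binary mode over an already logged-in FTP connection."""
    remote_name = remote_name or os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        # storbinary transfers the file in binary mode (step 2 above).
        ftp.storbinary(f"STOR {remote_name}", handle)
    return remote_name


def upload_all(paths, user, password, host="ftp.cngb.org"):
    """Log in and upload each file; user and password are supplied by the caller."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return [upload_file(ftp, path) for path in paths]
```

A call such as upload_all(["sample_1.fq.gz", "sample_2.fq.gz"], "my_user", "my_password") would upload both files; the file names and credentials here are made up for the example.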

    Note: CNSA retains the data files in the user's personal FTP directory until all of them have been successfully submitted and archived. The FTP directory provided for uploading data is temporary and is not suitable for long-term storage. If a data file has been kept on FTP for more than 2 months without the relevant metadata being submitted, we will send a reminder email 15 days in advance. If the metadata is still not submitted and no special reason is given, the file will be removed during periodic cleanup.

    MD5 check

    Large file transfers do not always complete successfully over the internet.

    An MD5 checksum can be computed for a file before and after transfer to verify that the file was transmitted successfully.

    MD5 (Message Digest Algorithm 5) is a hash function that calculates a hash value (the MD5 value, a string of 32 hexadecimal digits) for a given file.

    You must provide an MD5 checksum for each file submitted to the archive. We will re-compute and verify the MD5 checksum to make sure that the file transfer was completed without any changes to the file contents.

    Obtain MD5 value (Linux)

    Obtain the MD5 values of the files by executing:

    $ md5sum file1 file2
    9f6e6800cfae7749eb6c486619254b9c  file1
    b636e0063e29709b6082f324c76d0911  file2

    Obtain MD5 value (Mac OS X)

    Obtain the MD5 values of the files by executing:

    $ md5 file1 file2
    MD5 (file1) = 9f6e6800cfae7749eb6c486619254b9c
    MD5 (file2) = b636e0063e29709b6082f324c76d0911

    Obtain MD5 value (Windows)

    Install and run the Fsum Frontend (sourceforge.net/projects/fsumfe/).

    First, tick the "md5" checkbox.

    After clicking the [+] button, open the sequence data files that you need. You can select multiple files at the same time.

    Click the [Calculate hashes] button. The MD5 values of the files are displayed.

    By clicking the [Export] button, you can save the list of MD5 values as an HTML, CSV, or XML file.
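
As a platform-independent alternative to the tools above, the MD5 value can also be computed with Python's standard hashlib module. This is a minimal sketch; the function name md5_of_file and the 1 MiB chunk size are arbitrary choices for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a 32-character hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large sequence files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the value once before upload and comparing it with the value reported after transfer is the same check described in the MD5 check section.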

    Ethics and regulations on human genetic resources

    When submitting data from human subjects (human data) to CNSA, it is the submitter's responsibility to ensure that the dignity and rights of the human subjects are protected in accordance with all applicable laws, ordinances, guidelines and policies of the submitter's institution. In principle, remove any direct personal identifiers of human subjects from the data before submission.

    When submitting data to CNSA, users must follow the Interim Measures for the Human Genetic Resources Regulations and the ethical norms of their countries, provide genuine organization and contact information, and take responsibility for the legality and compliance of the data they upload. CNSA accepts raw data and assembly data from animals, plants, microorganisms, etc.

    Numbering rules

    Numbering rules for projects, samples, experiments, runs, and assemblies:

    Data type | Numbering rule | Example
    Project | “CNP” + 7 digits | CNP0000063
    Sample | “CNS” + 7 digits | CNS0001796
    Experiment | “CNX” + 7 digits | CNX0002218
    Run | “CNR” + 7 digits | CNR0002529
    Assembly | “CNA” + 7 digits | CNA0001632

    Numbering rules for variations:

    Variation data type | Numbering rule
    Call (SNP) | varc+01+numbers (01 means the variation is less than or equal to 50 bp in length, and the following numbers are assigned cumulatively. For example, varc012341 represents the 2341st variation.)
    Call (SV) | varc+02+numbers (02 means the variation is more than 50 bp in length, and the following numbers are assigned cumulatively. For example, varc022341 represents the 2341st variation.)
    Call (CAHV) | varc+03+numbers (03 means clinically associated human variations, and the following numbers are assigned cumulatively. For example, varc032341 represents the 2341st variation.)
    Analysis | CVA0000001 (the numbers are assigned cumulatively)
    File | CVF0000001 (the numbers are assigned cumulatively)
    Subject | CVS0000001 (the numbers are assigned cumulatively)
    Region | varr+02+numbers (02 means the variation is more than 50 bp in length, and the following numbers are assigned cumulatively. For example, varr022341 represents the 2341st variation.)
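
The numbering rules in the two tables above can be expressed as patterns, which is handy when checking accession lists in scripts. The Python sketch below is illustrative only. The name classify_accession is hypothetical, and the patterns for Analysis, File and Subject assume exactly 7 digits, which is inferred from the single example given for each and is not stated as a rule.

```python
import re

# Patterns derived from the numbering rules above.
ACCESSION_PATTERNS = {
    "Project": r"CNP\d{7}",
    "Sample": r"CNS\d{7}",
    "Experiment": r"CNX\d{7}",
    "Run": r"CNR\d{7}",
    "Assembly": r"CNA\d{7}",
    "Call (SNP)": r"varc01\d+",
    "Call (SV)": r"varc02\d+",
    "Call (CAHV)": r"varc03\d+",
    "Region": r"varr02\d+",
    # Assumption: 7 digits, inferred from the examples CVA0000001 etc.
    "Analysis": r"CVA\d{7}",
    "File": r"CVF\d{7}",
    "Subject": r"CVS\d{7}",
}


def classify_accession(accession):
    """Return the data type an accession belongs to, or None if no rule matches."""
    for data_type, pattern in ACCESSION_PATTERNS.items():
        if re.fullmatch(pattern, accession):
            return data_type
    return None
```

For instance, classify_accession("CNP0000063") returns "Project", while a string that matches none of the rules returns None.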

    Data submission

    Notes

    (1) CNSA currently accepts online submissions of projects, samples, experiments/runs, assemblies, and variations.

    (2) Before submitting data, you need to register/login and fill in the submitter information.

    (3) The project and sample must be submitted before the experiment/run, assembly, or variation.

    (4) A sample can be submitted independently, but it is only associated with a project after the relevant data has been submitted.

    (5) In the data submission process, fields marked with * are required, and the others are optional.

    (6) If you need to submit data files, we recommend uploading the data files before submitting the metadata so that the submission process completes more quickly.

    (7) After the data submission is completed, the system will automatically jump to the corresponding data type under “My submission” after 10 seconds.

    • You can click on the submission ID of completed submission to view the details.
    • In the Status column, you can directly view the accession of a single submission, and you can also download the batch submitted file with attributes and accessions.
    • For public data of a single submission, you can click on the data accession in the status column to access the public details page. For public data of a batch submission, you can search the data accessions on the CNSA home page to access the details page.
    • You can click the "pencil icon" in the status column to make modifications. If the status column does not have a "pencil icon", please send an email to datasubs@cngb.org indicating the submission ID or data accession and the reason for the modification.

    Signup/Login

    Users can reach the CNSA homepage (Fig. 1) at http://db.cngb.org/cnsa. Please click and read the "CNSA User Instructions" at the bottom of the page and agree to them. Click the “Login/Signup” tab on the right side of the page to enter the Login/Signup page (Fig. 2). (Note: You need to register before you can log in and submit data.)

    Fig. 1 CNSA homepage

    Fig. 2 Login/Signup page

    Submitter information

    CNSA obtains part of the submitter information from the user's account information. The submitter information filled in by the user is bound to the submitted project, sample, experiment/run, assembly, and variation data. If the submitter information has not been filled in when you start a submission, the system will jump to the CNSA homepage and ask you to add it; if it is already complete, the system will skip this page and go directly to the submission page. To change the submitter information, click "My CNGBdb" in the home navigation, select "Submit" in the drop-down menu, and go to the CNSA homepage (Fig. 3) to modify it. The modified submitter information will be bound to data that is currently being submitted or is submitted in the future.

    Fig. 3 Submitter page

    Submission portal

    Projects, samples, experiments/runs, assemblies, and variations can be submitted through their respective submission portals. Please click "Submit" on the home page or "Submission portal" in the navigation bar to enter the Submission portal page (Fig. 4), and then click the submission portal for the corresponding data type to submit the data. You can also enter the corresponding submission process by clicking the "New submission" button under each data type in “My submission”.

    Fig. 4 Submission portal page

    Submit project

    1. Project submission portal

      Click on "Project" (Fig. 5) on the Submission portal page to enter the submission process.

      Fig. 5 Project submission portal

    2. Data management

      On the data management page (Fig. 6), you need to choose a Data management form. If you choose “Public” or “Controlled”, you need to set a release date. Then click "Save and continue" to proceed to the next step.

      Note:

      (1) When selecting the Data management form, carefully read the prompt under each option. After submission, you cannot change it yourself. If you need to make changes, please send an email to datasubs@cngb.org with the project accession and the reason for the change. If you need to change controlled data to public, you will need to fill out the “Data Submission Review Request Form” and prepare the appropriate review materials.

      (2) If you need to change the release date, click “My submission”, find the submission under the project, and click the "pencil icon" in the release date column to edit it.

      Fig. 6 Data management

    3. General information

      On the general information page (Fig. 7), fill in the information of the Project title, Relevance, Public description, External links, Related projects, etc., and then click "Save and Continue" to proceed to the next step.

      Fig. 7 General information

    4. Detailed information

      On the detailed information page (Fig. 8), select the project type (you can choose more than one) and sample scope. The literature information is optional. Then click "Save and Continue" to proceed to the next step. If your article has not been published yet, you can add its details after publication: click “My submission” in the navigation bar, find the corresponding submission ID, and click the "pencil icon" in the status column to enter the modification process. If the status column does not have a "pencil icon", please contact datasubs@cngb.org and indicate the project accession in the email.

      Fig. 8 Detailed information

    5. Overview

      The overview page (Fig. 9) summarizes the information entered in the previous steps. If you find any errors, click “Previous” to return to the relevant page and make the corresponding changes. If everything is correct, click “Submit”. While you fill in the project information, the system retains the most recently entered values.

      Fig. 9 Overview

    6. My submission-Project

      After the project is submitted, the system will automatically assign a project accession (CNPXXXXXXX) and jump to "My submission-Project" after 10 seconds. You can view the project accession on this page (Fig. 10).

      Fig. 10 My submission-Project

    Submit sample

    Sample submission portal

    Click on "Sample" (Fig. 11) on the Submission portal page to enter the submission process.

    Fig. 11 Sample submission portal

    Submission type

    Select a submission type (Fig. 12). If you submit multiple samples at a time, we recommend that you choose the batch submission, which is more convenient and faster than a single submission. You will need to download the batch submission template for the samples first in the submission process, then fill it out and upload it. If you submit only one sample at a time, we recommend that you choose a single submission. You need to fill out the sample information online in the submission process.

    Fig. 12 Submission type

    1. Submit batch samples

      (1) Select a sample type (Fig. 13). First select a main category from the drop-down list in the left input box, then select a subcategory from the drop-down list in the right input box. Please select the sample type carefully. Once the submission is completed, you cannot modify it yourself. If you need to modify it, please send an email to datasubs@cngb.org with the submission ID and the reason for the modification.

      (2) Download the batch submission template for samples, fill it out and upload it (Fig. 13). Apart from "sample_name", "sample_title" and "description", the values of the remaining fields of a sample must not duplicate those of another sample.

      Fig. 13 Submit batch samples

      (3) If the file fails the check, correct it according to the check rule and error line number shown in the pop-up box (Fig. 14), and then upload it again.

      Fig. 14 Check result

      (4) After the check has passed, click “Submit”. The system will automatically assign the sample accessions (CNSXXXXXXX) and jump to “My submission-Sample” after 10 seconds. The metadata file with accessions can be downloaded from the status column of this page (Fig. 15).

      Fig. 15 My submission-Sample

    2. Submit a single sample

      (1)Sample type

      On the sample type page (Fig. 16), please carefully select the sample type, then click “Save and continue” to proceed to the next step. Once the process is submitted, you cannot modify it yourself. If you need to modify it, please send an email to datasubs@cngb.org with the submission ID and the reason for the modification.

      Fig. 16 Sample type

      (2)Sample attributes

      On the sample attributes page (Fig. 17), fill in the attributes required for the selected sample type (different sample types require different attributes), then click "Save and continue". If any fields do not pass the check, correct them according to the prompt, then click "Save and continue" again to proceed to the next step.

      Fig. 17 Sample attributes

      (3)Overview

      This page (Fig. 18) summarizes the information filled in the previous steps. If you find any errors, please click “Previous” to go to any of the previous pages to make the corresponding changes. In the entire submission process, the system will retain the last saved result when you quit the system. If the check is correct, please click “Submit”.

      Fig. 18 Overview

      (4)My submission-Sample

      After the sample is submitted, the system will automatically assign a sample accession such as CNSXXXXXXX and jump to “My submission-Sample”. The sample accession can be viewed in the status column of that page (Fig. 19).

      Fig. 19 My submission-Sample

    Submit experiment/run

    You need to submit the metadata and data files for the experiment/run. Before submitting the experiment/run data, please create the project and sample first.

    Experiment/run submission portal

    Click on "Experiment/run" (Fig. 20) on the Submission portal page to enter the submission process.

    Fig. 20 Experiment/run submission portal

    Submission type

    Select a submission type (Fig. 21). If you submit multiple experiments/runs at a time, we recommend the batch submission method, which is more convenient and faster than single submission. You will need to download the batch submission template for the experiment/run metadata during the submission process, then fill it out and upload it. If you submit only one experiment with its run(s), we recommend the single submission method, in which you fill out the experiment/run metadata online during the submission process.

    Fig. 21 Submission type

    1. Submit batch experiments/runs

      (1) Upload the data files according to the data upload method.

      (2) Download the batch submission template for experiments/runs, fill it out and upload it (Fig. 22). If it fails the check, correct it according to the check rule and error line number shown in the pop-up box, and then upload it again. After the metadata passes the check, the system will check the MD5 values of the data files. If any files do not pass the check, handle them according to the prompt in the pop-up box (Fig. 23).

      Fig. 22 Submit batch experiments/runs

      Fig. 23 Check result

      (3) When the status of the data file is "Check finished", click “Submit”. The system will automatically assign the experiment/run accessions (CNXXXXXXXX/CNRXXXXXXX) and redirect to “My submission- Experiment/run” after 10 seconds. The metadata file with accessions can be downloaded from the status column of this page (Fig. 24).

      Fig. 24 My submission- Experiment/run
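
    The MD5 check described above compares the values you enter with the uploaded files, so it can help to compute the values locally before filling in the template. The following is a minimal Python sketch of one way to do that, not CNSA's own tooling. The function names `md5_of_file` and `md5_table` and the default file pattern `*.fq.gz` are illustrative assumptions and do not come from the handbook.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks
    so that large sequencing files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_table(directory, pattern="*.fq.gz"):
    """Map each matching file name in `directory` to its MD5 value.
    The pattern is a hypothetical default; adjust it to your file names."""
    return {p.name: md5_of_file(p) for p in sorted(Path(directory).glob(pattern))}
```

    For example, `md5_table("/path/to/data")` returns a dictionary of file names and MD5 values that can be copied into the corresponding columns of the template.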

    2. Submit a single experiment/run(s)

      (1)General information

      On the general information page (Fig. 25), select the project accession and sample accession associated with the experiment/run(s) in the drop-down list, then click “Save and continue” to proceed to the next step. If you have not submitted the project and sample, create a new project and sample.

      Fig. 25 General information

      (2)Metadata

      On the metadata page (Fig. 26), you can select a previously submitted experiment accession in the experiment reuse section, and the system will automatically fill in the experiment information. The copied information can then be modified, which speeds up data entry. If you do not reuse experiment information, fill it in manually. Then click "Save and continue". If some fields do not pass the check, correct them according to the prompt information, then click "Save and continue" again to proceed to the next step.

      Fig. 26 Metadata

      (3)Data files

      On the data files page (Fig. 27), upload the data file according to the data upload method, fill in the file name and MD5 value of the data file in the input box of the “Data files” section, then click "Check". If the input box turns red, correct the entry according to the error message shown at the question mark, and then click "Check" again. If the status of the data file is “Not uploaded”, “Calculating” or “MD5 mismatch”, handle it according to the prompt information in this part of the page. If the status of the data file is "Check finished", click "Save and continue" to proceed to the next step.

      Fig. 27 Data files

      (4)Overview

      The overview page (Fig. 28) summarizes the information entered in the previous steps. If you find any errors, click “Previous” to return to the relevant page and make the corresponding changes. Throughout the submission process, the system retains the last saved result if you quit. If everything is correct, click “Submit”.

      Fig. 28 Overview

      (5) My submission- Experiment/run

      After the experiment/run is submitted, the system will automatically assign the experiment/run accessions (CNXXXXXXXX/CNRXXXXXXX) and redirect to “My submission- Experiment/run” after 10 seconds. The experiment accession can be viewed in the status column of that page (Fig. 29). The run accession can be viewed by clicking on the submission ID.

      Fig. 29 My submission- Experiment/run

    Submit assembly

    You need to submit the metadata and data files for the assembly. Before submitting the assembly data, please create the project and sample first.

    Assembly submission portal

    Click "Assembly" (Fig. 30) on the Submission portal page to enter the submission process.

    Fig. 30 Assembly submission portal

    Submission type

    Select a submission type (Fig. 31). If you are submitting multiple assemblies at a time, we recommend the batch submission method, which is more convenient and faster than single submission. For batch submission, you first download the batch submission template for the assembly metadata during the submission process, then fill it out and upload it. If you are submitting only one assembly at a time, we recommend the single submission method, in which you fill out the assembly metadata online during the submission process.

    Fig. 31 Submission type

    1. Submit batch assemblies

      (1)Upload data files according to the data upload method (Fig. 32).

      (2)Download the batch submission template for assemblies, fill it out and upload it (Fig. 32). If it fails the check, correct it according to the check rule and error line number shown in the pop-up box (Fig. 33), and then upload it again. After the metadata pass the check, the system will check the MD5 values of the data files. If any files fail the check, handle them according to the prompt information in the pop-up box.

      Fig. 32 Submit batch assemblies

      Fig. 33 Check result

      (3)When the status of the data file is "Check finished", click “Submit”. The system will automatically assign the assembly accessions (CNAXXXXXXX) and redirect to “My submission-Assembly” after 10 seconds. The metadata file with accessions can be downloaded from the status column of this page (Fig. 34).

      Fig. 34 My submission-Assembly

    2. Submit a single assembly

      (1)General information

      Select the project accession and sample accession associated with the assembly in the drop-down list (Fig. 35), then click “Save and continue” to proceed to the next step. If you have not submitted the project and sample, create a new project and sample.

      Fig. 35 General information

      (2)Metadata

      On the metadata page (Fig. 36), fill in the assembly metadata, then click "Save and continue". If some fields do not pass the check, correct them according to the prompt information, then click "Save and continue" again to proceed to the next step.

      Fig. 36 Metadata

      (3)Data files

      On the data files page (Fig. 37), upload the data file according to the data upload method, fill in the file name and MD5 value of the data file in the input box of the “Data files” section, then click "Check". If a field fails the check, correct it according to the error prompt under the field, and then click "Check" again. If the status of the data file is “Not uploaded”, “Calculating” or “MD5 mismatch”, handle it according to the prompt information in this part of the page. If the status of the data file is "Check finished", click "Save and continue" to proceed to the next step.

      Fig. 37 Data files

      (4)Overview

      The overview page (Fig. 38) summarizes the information entered in the previous steps. If you find any errors, click “Previous” to return to the relevant page and make the corresponding changes. Throughout the submission process, the system retains the last saved result if you quit. If everything is correct, click “Submit”.

      Fig. 38 Overview

      (5)My submission-Assembly

      After the assembly is submitted, the system will automatically assign the assembly accession (CNAXXXXXXX) and redirect to “My submission-Assembly” after 10 seconds. The assembly accession can be viewed in the status column of that page (Fig. 39).

      Fig. 39 My submission-Assembly
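
    The accession placeholders quoted in the sections above (CNSXXXXXXX for samples, CNXXXXXXXX/CNRXXXXXXX for experiments/runs, CNAXXXXXXX for assemblies) suggest a simple way to tell accession types apart when keeping local records. The Python sketch below assumes that each "X" after the three-letter prefix stands for exactly one digit; the handbook does not state this, so treat the patterns as an illustration only. The function name `accession_type` is hypothetical.

```python
import re

# Assumption: every "X" placeholder in the handbook is a single digit,
# giving a three-letter prefix followed by seven digits.
ACCESSION_PATTERNS = {
    "sample": re.compile(r"CNS\d{7}"),
    "experiment": re.compile(r"CNX\d{7}"),
    "run": re.compile(r"CNR\d{7}"),
    "assembly": re.compile(r"CNA\d{7}"),
}


def accession_type(accession):
    """Return the record type implied by an accession string,
    or None if it matches none of the assumed patterns."""
    candidate = accession.strip()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return kind
    return None
```

    A check like this only catches typing mistakes in local spreadsheets; the accessions assigned by the system on the “My submission” pages remain the authoritative values.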

    Submit variation

    Before submitting the variation data, please create the project and sample first. Experiment/run data is optional.

    Variation submission portal

    Click "Variation" (Fig. 40) on the Submission portal page to enter the submission process.

    Fig. 40 Variation submission portal

    Submit variation data

    1. Select a variation type (Fig. 41).

      Fig. 41 Variation type

    2. Upload the VCF file of variations.

      (1)If you choose SNP, you need to upload the VCF file to FTP, then download the SNP submission template, fill it out and upload it (Fig. 42).

      Fig. 42 Submit SNP

      (2)If you choose SV, you need to download the SV submission template, fill it out and upload it. Submitting VCF files is optional (Fig. 43).

      Fig. 43 Submit SV

      (3)If you choose CAHV, you only need to download the CAHV submission template, fill it out and upload it (Fig. 44).

      Fig. 44 Submit CAHV

    3. After the template file is filled in and uploaded, the system will check each sheet in the template in turn. If a field fails the check, correct it according to the check rules and the error line numbers in the pop-up box (Fig. 45), then re-upload the file and click "Check".

      Fig. 45 Check result

    4. If the check is passed, click “Submit”. The system will redirect to “My submission-Variation” after 10 seconds. After the data pass the review, the submitted file with accessions can be downloaded from the status column of this page (Fig. 46).

      Fig. 46 My submission-Variation
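
    Because the SNP workflow above starts with uploading a VCF file, a quick local look at the file header before uploading can catch obvious formatting problems early. The Python sketch below checks only two features of the general VCF format: a first line beginning with `##fileformat=VCF` and a column header line starting with the eight fixed columns. It is not a reproduction of the checks CNSA performs, which the handbook does not describe, and the function names `open_text` and `check_vcf_header` are hypothetical.

```python
import gzip

# The eight fixed columns that open the VCF column header line.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_vcf_header(path):
    """Return a list of problems found in the VCF header (empty if none)."""
    problems = []
    with open_text(path) as handle:
        line = handle.readline().rstrip("\n")
        if not line.startswith("##fileformat=VCF"):
            problems.append("first line is not a ##fileformat=VCF line")
        # Skip the remaining meta-information lines, which start with "##".
        while line.startswith("##"):
            line = handle.readline().rstrip("\n")
        # The first line that is not meta-information should be the column header.
        if line.split("\t")[:8] != REQUIRED_COLUMNS:
            problems.append("column header line is missing or incomplete")
    return problems
```

    An empty list from `check_vcf_header("variants.vcf.gz")` means only that these two header features are present; the file may still fail other checks after upload.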