COVID-19 Dataset

The COVID-19 Novel Coronavirus Sequence database is built by the China National GeneBank DataBase (CNGBdb) by integrating released coronavirus sequence data from several open-source data platforms. The database contains only virus sequences and does not include human sequences. With data from this database, researchers can construct a virus phylogenetic tree to reveal pathogen-related characteristics.

Data volume: 106,436
Updated: 2020-10-21

1. Background

The COVID-19 Novel Coronavirus Sequence database is built by the China National GeneBank DataBase (CNGBdb) by integrating released coronavirus sequence data from several open-source data platforms. The database contains only virus sequences and does not include human sequences. With data from this database, researchers can construct a virus phylogenetic tree to reveal pathogen-related characteristics, which provides a useful reference for studying the evolutionary origin and pathogenic mechanism of the novel coronavirus.

2. Data description

  • Meta data/COVID-19: The metadata table of the COVID-19 Novel Coronavirus Sequence database, which integrates released novel coronavirus sequence data from CNGB, NCBI, GISAID, and PDB.

  • References: A table of file information for the reference genome.

  • Files/Dataset files: Files from the public dataset. They can be used but not downloaded.

  • Files/Calculation results: Result files generated through analysis.

  • Files/my upload: Files uploaded to the workspace. They can be accessed only by you.

3. Workflows

3.1 BLAST

A sequence alignment and search tool based on the widely used NCBI-BLAST (2.10.0). Users choose which BLAST component to run through the method parameter.
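The source does not show the workflow's actual interface, so the following is only a minimal sketch of how a method parameter could map to a BLAST component. It assumes the components are the five standard NCBI BLAST+ programs and that a run is expressed as a BLAST+ command line; the function name `build_blast_command`, the file name `query.fasta`, and the database name `covid19_db` are hypothetical.

```python
import shlex

# Assumption: "BLAST components" means the five standard BLAST+ programs.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_command(method, query, db, evalue=1e-5, out=None):
    """Return the argument list for a BLAST+ run selected by `method`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if method not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported BLAST method: {method!r}")
    # -outfmt 6 requests tabular output, one hit per line.
    cmd = [method, "-query", query, "-db", db,
           "-evalue", str(evalue), "-outfmt", "6"]
    if out is not None:
        cmd += ["-out", out]
    return cmd


if __name__ == "__main__":
    # Nucleotide query against a (hypothetical) nucleotide database.
    print(shlex.join(build_blast_command("blastn", "query.fasta", "covid19_db")))
```

Under these assumptions, a nucleotide query against nucleotide sequences would use `blastn`, while a protein query against the same sequences would use `tblastn`. The returned list can be passed to `subprocess.run` on a machine where BLAST+ is installed.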

4. Data source

1. CNGB

2. GISAID

3. NCBI

4. PDB

Enabled by data from GISAID

Showing 91136 of 91136 genomes sampled between Dec 2019 and Mar 2020.

GISAID is a platform openly accessible to the public. Access to, and use of, the GISAID Database and Data, as defined herein, is governed by the GISAID Database Access Agreement (https://www.gisaid.org/DAA). Both providers and users of Data must accept and agree to be bound by the terms of the Agreement. GISAID will make its best efforts to protect the rights of data producers and data submitters. Data users must clearly acknowledge the data producers and data submitters. If data users reprocess or reanalyze data from this platform, any subsequent acquisition of intellectual property rights is subject to the consent of the data producer and data submitter.