Data standard

Open data sharing is the foundation of life science research and innovation, and standardized data structures are a prerequisite for sharing and reusing data. CNGBdb integrates the data structures and standards of international omics, health, and medicine organizations, such as the International Nucleotide Sequence Database Collaboration (INSDC), the Global Alliance for Genomics and Health (GA4GH), the Global Genome Biodiversity Network (GGBN), and the American College of Medical Genetics and Genomics (ACMG). On this basis it builds widely compatible data standards and structures. CNGBdb applies the GGBN sample standard to sample collection and sample data sharing so that sample data can be shared and reused. It applies the GA4GH standards to datasets derived from individual humans so that these health datasets are available to researchers who have been granted data access. Several modules of the CNGBdb basic data, including project, sample, experiment/run, assembly, and annotation, follow the INSDC data structure. This allows the data to be shared with international data centers such as EBI, NCBI, and DDBJ and used by researchers around the world. In addition, the set of open metadata standards that CNGBdb has built on these data structures can be accessed through CNGBdb datasets.
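To illustrate what following an INSDC-style structure means in practice, the sketch below models the project, sample, experiment, and run modules as linked records and checks that every child record points to an existing parent. It is a minimal sketch under stated assumptions. The class names, field names, and accession strings are hypothetical and do not reflect the actual CNGBdb or INSDC schemas or accession formats. The assembly and annotation modules are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types loosely modelled on the INSDC object hierarchy
# (project -> sample -> experiment -> run). Field names are illustrative only
# and are not the real CNGBdb or INSDC schema.

@dataclass
class Project:
    accession: str
    title: str

@dataclass
class Sample:
    accession: str
    project: str      # accession of the parent project
    organism: str

@dataclass
class Experiment:
    accession: str
    sample: str       # accession of the sample that was sequenced
    platform: str

@dataclass
class Run:
    accession: str
    experiment: str   # accession of the parent experiment
    files: List[str] = field(default_factory=list)

def broken_links(projects, samples, experiments, runs):
    """Return (child_accession, missing_parent_accession) pairs
    for every reference that cannot be resolved."""
    project_ids = {p.accession for p in projects}
    sample_ids = {s.accession for s in samples}
    experiment_ids = {e.accession for e in experiments}
    problems = []
    for s in samples:
        if s.project not in project_ids:
            problems.append((s.accession, s.project))
    for e in experiments:
        if e.sample not in sample_ids:
            problems.append((e.accession, e.sample))
    for r in runs:
        if r.experiment not in experiment_ids:
            problems.append((r.accession, r.experiment))
    return problems

if __name__ == "__main__":
    # Hypothetical accessions, chosen only for this example.
    projects = [Project("PRJ-1", "Example project")]
    samples = [Sample("SAM-1", "PRJ-1", "Homo sapiens")]
    experiments = [Experiment("EXP-1", "SAM-1", "ILLUMINA")]
    runs = [Run("RUN-1", "EXP-1", ["reads_1.fq.gz"]), Run("RUN-2", "EXP-9")]
    print(broken_links(projects, samples, experiments, runs))
    # → [('RUN-2', 'EXP-9')]
```

Keeping parent references as accessions rather than nested objects mirrors how records in such archives are usually cross-referenced, which is what makes them exchangeable between data centers.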

The INSDC is a long-standing foundational initiative operated jointly by DDBJ, EMBL-EBI, and NCBI. INSDC data cover the full spectrum from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information about samples and experimental configurations.

The GA4GH is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Bringing together 500+ leading organizations working in healthcare, research, patient advocacy, life science, and information technology, the GA4GH community is working together to create frameworks and standards to enable the responsible, voluntary, and secure sharing of genomic and health-related data.

The GGBN Data Standard is a set of vocabularies designed to represent tissue, DNA, or RNA samples associated with voucher specimens, tissue samples, and collections.

ACMG’s activities include the development of laboratory and practice standards and guidelines, advocating for quality genetic services in healthcare and in public health, and promoting the development of methods to diagnose, treat and prevent genetic disease.