CNGBdb
CNGB Agricultural Digital Service Platform
Technology

A full range of genomic breeding solutions

De novo whole-genome sequencing of animals and plants whose genome sequence is unknown is used to construct a reference genome for the species. Based on the reference genome, large-scale whole-genome resequencing is carried out to construct a genomic variation map and, at the molecular level, to identify the centers of diversity, cultivation origin, and domestication of agricultural species, laying an important theoretical foundation for the study of crop domestication and trait improvement. To let researchers conduct bioinformatics analysis independently and conveniently, the National Gene Bank has created the National Gene Bank Trusted Computing Platform (CODEPLOT).

Trusted Computing Platform CODEPLOT

Tremendous progress in next-generation sequencing technology has revolutionized research methods, benefiting human health, agricultural science, and pandemic control. CODEPLOT provides comprehensive solutions for data sharing, workflow management, elastic cloud computing resources, and a reliable collaborative environment for research and industry in the life sciences. At this stage, three unique datasets (the assembly and gene annotation of the 1000 plant transcriptomes, the COVID-19 Dataset, and the Single Cell Dataset) and 16 automated analysis workflows are ready to use on the CODEPLOT platform.

CODEPLOT has three advantages:

  • Easy to operate: CODEPLOT is an online bioinformatics platform on which researchers can analyze and mine biological data without a programming background. It combines Docker container technology, the Workflow Description Language (WDL), and the Cromwell workflow engine to provide an efficient, user-friendly framework for creating workflows and performing bioinformatics analysis.

  • High-performance computing: CODEPLOT provides enterprise-level elastic cloud computing resources, built on highly scalable, high-performance Kubernetes clusters and Docker container technology, and runs parallel jobs in batch mode.

  • Security: CODEPLOT adopts cutting-edge technologies such as data encryption, blockchain, and secure multi-party computation to provide a highly secure and reliable environment for data sharing and collaboration.

In short, CODEPLOT is a reliable and flexible bioinformatics computing platform for the life sciences that aims to promote the effective sharing, cooperation, and utilization of omics data in research and industry.
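The idea of running parallel jobs in batch mode can be illustrated with a minimal Python sketch. It is not CODEPLOT's implementation: the per-sample task `gc_content`, the helper `run_batch`, and the sample data are hypothetical, and a local thread pool stands in for containers scheduled on a Kubernetes cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Hypothetical per-sample task: fraction of G and C bases in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_batch(samples: dict[str, str], max_workers: int = 4) -> dict[str, float]:
    """Run the same task on every sample concurrently, keyed by sample ID."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one job per sample; each job is independent of the others.
        futures = {sid: pool.submit(gc_content, seq) for sid, seq in samples.items()}
        # Collect results once every job has finished.
        return {sid: fut.result() for sid, fut in futures.items()}


# Made-up sample data, for illustration only.
samples = {"sample_a": "ATGC", "sample_b": "GGCC", "sample_c": "ATAT"}
results = run_batch(samples)
print(results)
```

Because every sample is processed independently, the same pattern scales from a thread pool on one machine to one container per sample on a cluster.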

CODEPLOT is a flexible and trusted computing platform. Users without any programming background can use the platform's computing tools to perform automated bioinformatics analysis, which lets medical staff focus on their own duties. CODEPLOT is a platform for analyzing, using, and collaboratively sharing life science big data; it integrates a trusted computing environment with a diverse set of online analysis tools. It is also the first platform in China to apply the latest security strategies, such as data encryption, blockchain, secure multi-party computation, and secure container virtualization for genetic data.

We provide users with:

  • Bioinformatics workflows written in WDL (Workflow Description Language), with software environments managed as Docker images. This makes computations easy to read and manage for users in different research fields. Tools are also easily portable across execution platforms, so workflows can be reproduced quickly.

  • Workspaces as the platform's unit of work. Each workspace has four parts: description, data, work, and monitoring. Platform users create a workspace for a given research purpose or biological question. The description makes it easy to organize and record the research background and process for sharing with collaborating analysts. Data units are managed in data tables, such as a sample data sheet, which clearly display the dimensions of the research data and make batch analysis easier.

  • Intelligent scheduling of genomic computing tasks on Kubernetes, together with Spark and other acceleration services. Using the seconds-level concurrency of container technology, whole-genome sequencing (WGS) analysis can be shortened from 30 hours to less than 5 hours. Compared with similar competing products on the same sample, resource utilization is significantly higher. Container applications can also scale out elastically within seconds when traffic surges, ensuring business continuity and high stability.
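The four-part workspace described above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the `Workspace` class, its field types, the `submit_batch` method, and the workflow and sample names are all hypothetical and do not reflect CODEPLOT's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical model of a workspace: description, data, work, monitoring."""

    description: str                                          # research background and purpose
    data: list[dict[str, str]] = field(default_factory=list)  # data table, one row per sample
    work: list[str] = field(default_factory=list)             # names of available workflows
    monitoring: list[dict[str, str]] = field(default_factory=list)  # one record per submitted run

    def submit_batch(self, workflow: str) -> int:
        """Queue one run of `workflow` per data-table row and return the run count."""
        if workflow not in self.work:
            raise ValueError(f"unknown workflow: {workflow}")
        for row in self.data:
            self.monitoring.append(
                {"workflow": workflow, "sample_id": row["sample_id"], "status": "queued"}
            )
        return len(self.data)


# Made-up workspace contents, for illustration only.
ws = Workspace(
    description="Resequencing study of a crop population",
    data=[
        {"sample_id": "S1", "fastq": "S1.fq.gz"},
        {"sample_id": "S2", "fastq": "S2.fq.gz"},
    ],
    work=["variant_calling"],
)
queued = ws.submit_batch("variant_calling")
print(queued, ws.monitoring)
```

Keeping samples as rows in a table is what makes batch analysis straightforward: a single submission fans out to one run per row, and the monitoring part records each run.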