Data analysis

CODEPLOT is a reliable and flexible bioinformatics computing platform for the life sciences that aims to promote the efficient sharing, collaborative use, and utilization of omics data in research and industry. The platform employs data encryption, blockchain, and secure multi-party computation to provide a secure and reliable environment for data sharing and collaboration. It also incorporates Docker container technology, the Workflow Description Language (WDL), and the Cromwell workflow engine to provide an efficient, user-friendly framework for creating workflows and running bioinformatics analyses.

BLAST stands for Basic Local Alignment Search Tool. The BLAST service of CNGB is built on the standalone version of NCBI BLAST+ 2.6.0, downloaded from the NCBI FTP server, and provides sequence searching against public data from CNGB applications, BGI projects, and external data sources. For example, it integrates datasets from the Transcriptomes of 1,000 Plants (ONEKP), the Transcriptomes of 1,000 Fishes (FISHT1K), and NCBI's nr and nt databases. In the name "the BLAST service of CNGB", the word BLAST stands for sequence searching methods in general; more types of sequence searching algorithms will be integrated in the future.
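As an illustration of working with search results, the sketch below parses the 12-column tabular format that standalone BLAST+ writes with `-outfmt 6` and keeps the highest-scoring hit per query. It assumes results are available in that format; whether the CNGB service exports it is not stated here. The names `BlastHit`, `parse_outfmt6`, `best_hit_per_query`, and the sample rows are hypothetical and used only for illustration.

```python
from typing import Dict, List, NamedTuple


class BlastHit(NamedTuple):
    """One row of BLAST+ tabular output (-outfmt 6, default columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[BlastHit]:
    """Parse tab-separated BLAST+ output, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        hits.append(
            BlastHit(
                fields[0],
                fields[1],
                float(fields[2]),
                *(int(x) for x in fields[3:10]),  # length .. send
                float(fields[10]),
                float(fields[11]),
            )
        )
    return hits


def best_hit_per_query(hits: List[BlastHit]) -> Dict[str, BlastHit]:
    """Keep the hit with the highest bit score for each query sequence."""
    best: Dict[str, BlastHit] = {}
    for hit in hits:
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best


# Hypothetical example rows, not real search results.
EXAMPLE = (
    "q1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "q1\tsubjB\t91.00\t180\t16\t1\t5\t184\t40\t219\t2e-60\t230\n"
    "q2\tsubjC\t100.00\t50\t0\t0\t1\t50\t1\t50\t3e-20\t93.5\n"
)

if __name__ == "__main__":
    for query, hit in best_hit_per_query(parse_outfmt6(EXAMPLE)).items():
        print(query, hit.sseqid, hit.pident, hit.evalue)
```

Ranking by bit score rather than E-value is a design choice of this sketch; either is common, and the two usually agree within a single database search.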

Translation is an essential step in gene expression that directly shapes the proteome, contributing to cellular structure, function, and activity in all organisms. A more direct indicator of gene translation is the translatome, which consists of ribosome-protected footprints that mark mRNAs in the process of being translated. Although several techniques have been developed to capture translatome information, ribosome profiling (Ribo-Seq) has a unique advantage over other techniques (e.g. polysome profiling): it provides a global measure of the translatome at near-nucleotide resolution.

DISSECT is a comprehensive data integration platform for cancer research. It includes the first mirror site of the ICGC Data Portal in China, which provides important resources for domestic researchers. Building on big-data research, we are developing an integrated platform that combines multi-omics data with a variety of analytic tools, to help users dissect data from common databases as well as their own. The analyses, which range from a single cluster and a single data type to multiple clusters and multiple data types, are designed to be easy to use. To use DISSECT, first select the data you want to analyze (https://db.cngb.org/dissect/data/) and then select an analysis tool to run on it. DISSECT provides sample data for 47,277 different phenotypes for researchers to select and analyze.

PVD contains information on pathogenic microbes that cause human infectious diseases, including pathogen classification, biological characteristics, nucleic acid sequences, disease phenotypes, and patient immunological characteristics. It comprises three sub-databases (the chronic infectious disease pathogens database, the emerging infectious disease pathogens database, and the major infectious disease pathogens database) and provides search and identification tools for clinicians and researchers.

The H. pylori typing tool (HpTT) is a novel genomic epidemiological tool that performs high-resolution genomic typing and visualization simultaneously, providing insights into the genetic population structure, evolution, and epidemiological surveillance of H. pylori. HpTT is a genomic typing tool based on single nucleotide polymorphisms (SNPs) of bacterial pathogens; it can be applied not only to H. pylori isolates but also to other pathogens that are highly relevant to public health.
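To make the idea of SNP-based typing concrete, the following minimal sketch counts pairwise SNP differences between pre-aligned sequences of equal length. It is a generic illustration, not HpTT's actual method; the function names, the choice to ignore `N` and gap positions, and the example isolates are all assumptions.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str, ignore: str = "N-") -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (any character in `ignore`) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_snp_distances(seqs: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return SNP distances for every unordered pair of isolates."""
    return {
        (p, q): snp_distance(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned fragments, for illustration only.
ISOLATES = {
    "iso_A": "ACGTACGT",
    "iso_B": "ACGTACGA",
    "iso_C": "ACCTANGA",
}

if __name__ == "__main__":
    for pair, dist in pairwise_snp_distances(ISOLATES).items():
        print(pair, dist)
```

A distance table like this is the usual input for clustering isolates or building a tree; real typing pipelines add steps such as read mapping, variant calling, and filtering that are omitted here.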