CNGBdb
CNGB Agricultural Digital Service Platform
Services

Agricultural cash crop breeding schemes

De novo sequencing and large-scale whole-genome re-sequencing of agricultural cash crops

De novo whole-genome sequencing is performed for species (animals and plants) that have not yet been sequenced, to construct a reference genome sequence for the species. On that basis, large-scale whole-genome re-sequencing is carried out to construct a genomic variation map. This addresses questions about the diversity of agricultural species and the origin of their cultivation and domestication at the molecular level, and lays a theoretical foundation for the study of crop domestication and the genetic improvement of traits.

Mining functional genes and genetic loci of agricultural cash crops

Functional research on agricultural cash crop resources mines functional genes and genetic loci related to functional components or phenotypes and clarifies the molecular mechanisms of their regulation. This supports quality improvement of cash crops, accelerates their genetic breeding, and enables an innovative, high-efficiency breeding model: "Germplasm + Gene".

Bioinformatics analysis service for breeding

Bioinformatics Analysis Technology Solution

Through long-term scientific project research, the research team has accumulated extensive project experience and has established bioinformatics analysis technology for animal and plant research in the agricultural field.
Standard analysis includes raw data alignment, variant calling, and annotation.
Advanced analysis includes population diversity analysis, population structure and history analysis, genome-wide association analysis, haplotype map construction, etc.

Whole Genome Re-sequencing Analysis

Content

Whole-genome re-sequencing means sequencing the genomes of individuals or populations of a species that already has a reference genome, and characterizing variants (SNP, InDel, SV, CNV) at the whole-genome level using high-performance computing platforms and bioinformatics methods. Differences between the sequenced genomes and the reference can be identified quickly and accurately, and the results are broadly used in population genetics research, association analysis, evolutionary analysis, etc.

Method

  • Read alignment with the Burrows-Wheeler Aligner (BWA), followed by duplicate removal from the alignment results with the Picard MarkDuplicates tool to eliminate the effect of PCR duplication
  • Variant calling with GATK, including SNPs and InDels
  • Annotation and functional effect prediction of variants (SNPs, small InDels) with SnpEff
  • Structural variant detection with BreakDancer. SV (structural variation) refers to insertions, deletions, inversions, and inter- and intra-chromosomal translocations of large fragments at the genome level
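
To make the variant categories above concrete, the following minimal sketch tallies variant types from VCF-style records using only the REF and ALT columns. It is an illustration, not part of the platform's pipeline: the function names are hypothetical, and the 50 bp cutoff separating InDels from SVs is an assumed convention that the text above does not specify.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify one REF/ALT allele pair as SNP, MNP, InDel, or SV.

    sv_min_len is an assumed size cutoff (not taken from the source text).
    """
    # Symbolic alleles (<DEL>, <INV>, ...) and breakend notation mark SVs.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"  # multi-nucleotide substitution of equal length
    size = abs(len(ref) - len(alt))
    return "SV" if size >= sv_min_len else "InDel"


def count_variant_types(vcf_lines):
    """Count variant types over an iterable of tab-separated VCF lines."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        # A record may list several alternate alleles separated by commas.
        for alt in alts.split(","):
            kind = classify_variant(ref, alt)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

In practice such a tally would be run over the VCF file produced by the variant caller, for example by passing an open file handle to `count_variant_types`.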

Genome-wide Association Study

Content

A genome-wide association study (GWAS) is an unbiased, genome-wide approach for testing whether any nucleotide variation is statistically associated with a phenotypic trait. Whole-genome re-sequencing is performed on each individual from populations with rich genetic diversity. The resulting genotypes are combined with phenotypic data for the target trait, and a genome-wide association analysis is then conducted, which can rapidly identify the chromosome segments or gene loci that affect the target trait.

Method

  • Quality control of sequencing data
  • Mapping of sequencing reads to the reference genome
  • SNP genotyping in populations
  • Phylogenetic tree construction
  • Principal component analysis of populations
  • Trait association analysis
  • Gene function annotation of genomic regions associated with target traits
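
The trait association step can be illustrated with a very simple single-marker test. The sketch below regresses a quantitative phenotype on genotype dosage (0, 1, or 2 copies of the alternate allele) for each SNP and ranks markers by p-value. It is a teaching example under stated assumptions, not the statistical model used by the platform: it ignores population structure and kinship, the function names are hypothetical, and the p-value uses a normal approximation to the t statistic, which is only reasonable for large samples.

```python
import math
from statistics import NormalDist, mean


def association_test(dosages, phenotypes):
    """Simple linear regression of phenotype on genotype dosage.

    Returns (beta, t_statistic, approx_p_value), or None when the marker
    is monomorphic or the phenotype is constant.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mx, my = mean(dosages), mean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        return None
    beta = sxy / sxx                     # effect size per allele copy
    r2 = min(1.0, (sxy * sxy) / (sxx * syy))
    if r2 == 1.0:
        return beta, math.copysign(math.inf, sxy), 0.0
    t = math.copysign(math.sqrt(r2 * (n - 2) / (1.0 - r2)), sxy)
    # Normal approximation to the t distribution (assumption: large n).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return beta, t, p


def scan_markers(genotypes, phenotypes):
    """Test every marker; return (marker, beta, t, p) rows sorted by p."""
    rows = []
    for marker, dosages in genotypes.items():
        result = association_test(dosages, phenotypes)
        if result is not None:
            rows.append((marker, *result))
    return sorted(rows, key=lambda row: row[3])
```

Real GWAS analyses typically add covariates such as principal components (see the PCA step above) to control for population stratification; that is deliberately left out here to keep the example short.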

Bulked Segregant Analysis

Content

Bulked Segregant Analysis (BSA) is a method of mapping functional genes using individuals with extreme traits. Following a cross between parental lines showing contrasting phenotypes, two bulked pools are generated by mixing samples (DNA, RNA, or SLAF-seq libraries) from two groups of individuals in the segregating progeny, each group consisting of a certain number of individuals with extreme, opposite values for the target trait. The sequencing reads from the bulked pools are then mapped to the reference sequence. After SNP and InDel variants have been tested in the two bulked pools, markers showing significant differences between the bulks are selected as linked to the gene(s) responsible for the difference in the trait. The trait-associated region in which the mutation lies can be located and assessed with the Euclidean distance (ED) method and the SNP-index algorithm, and functional annotation and enrichment analysis of the genes in these regions is then conducted. Further in-depth mining can be carried out on this basis, such as primer design, gene mining within the region, and marker screening.

Method

  • Map reads to the reference genome with the Burrows-Wheeler Aligner (BWA)
  • Variant calling with GATK, including SNPs and InDels, and annotation of results with ANNOVAR
  • Calculate SNP and InDel frequencies, and identify SNP and InDel loci with significant differences
  • Locate gene regions associated with target traits
  • Positional candidate gene analysis and functional annotation
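
The frequency comparison between the two bulks can be sketched as follows. The code computes a per-site SNP-index (the fraction of reads carrying the alternate allele), the difference in SNP-index between pools, and the Euclidean distance (ED) between the base frequencies of the two pools. These are the commonly used definitions of the two statistics; the function names and the input layout are hypothetical, and steps such as sliding-window smoothing and significance thresholds are omitted.

```python
import math

BASES = "ACGT"


def snp_index(alt_depth: int, total_depth: int) -> float:
    """Fraction of reads at a site that carry the alternate allele."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_depth / total_depth


def delta_snp_index(pool_a, pool_b) -> float:
    """Difference in SNP-index between two pools.

    Each pool is an (alt_depth, total_depth) tuple for the same site.
    """
    return snp_index(*pool_a) - snp_index(*pool_b)


def euclidean_distance(counts_a: dict, counts_b: dict) -> float:
    """ED between the base-frequency vectors of two pools at one site.

    counts_a and counts_b map bases (A, C, G, T) to read counts.
    """
    total_a = sum(counts_a.get(b, 0) for b in BASES)
    total_b = sum(counts_b.get(b, 0) for b in BASES)
    if total_a == 0 or total_b == 0:
        raise ValueError("both pools need coverage at the site")
    return math.sqrt(sum(
        (counts_a.get(b, 0) / total_a - counts_b.get(b, 0) / total_b) ** 2
        for b in BASES
    ))
```

Sites near the causal gene are expected to show a delta SNP-index far from zero and a large ED, because the two bulks are fixed for different alleles there, whereas unlinked sites tend towards similar allele frequencies in both pools.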

Transcriptome analysis

Content

Based on a known genome sequence and its annotation, next-generation high-throughput RNA-seq data are aligned to the reference genome. From the alignment results, the following can be carried out: detection of novel transcripts (novel genes), identification of new alternative splicing events, refinement of gene structure, quantitative analysis of gene expression, and differential gene expression analysis. Contents: quality evaluation of sequencing data; alignment of sequencing data to the selected reference genome; determination of exon/intron boundaries and analysis of alternative splicing; discovery of new genes and transcripts in unannotated regions; identification of SNPs in transcribed regions; correction of annotated gene boundaries (5' and 3'); quantification of gene and transcript expression levels; identification of genes that are significantly differentially expressed between samples (groups); and functional annotation and enrichment analysis.

Method

  • Quality evaluation of sequencing data
  • Map reads to the reference genome with TopHat
  • Transcript assembly and quantitative analysis of gene expression with Cufflinks. Novel transcripts (novel genes) are detected from the alignment of sequencing data to the reference genome. The novel genes are aligned against different databases with BLAST to obtain annotation information.
  • Based on the TopHat alignment of sequencing data to the reference genome, detect single-base mismatches between the sequencing data and the reference genome with SAMtools, and identify candidate SNPs in gene regions
  • Compare with the known splicing models of each gene, predict new alternative splicing events from intron-spanning reads with Cufflinks, and visualize the results with SpliceGrapher
  • Then, according to the expression levels of genes in different samples, perform differential gene expression analysis with DESeq and EBSeq, and screen the differentially expressed genes by FDR (false discovery rate) and FC (fold change).
  • Finally, extract the annotation information of each differentially expressed gene set and perform GO/KEGG pathway enrichment analysis of the gene set with Fisher's exact test, topGO, etc.
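
The last two steps can be illustrated with a small, self-contained sketch. It does not reimplement DESeq, EBSeq, or topGO; it only shows how genes might be screened once p-values and log2 fold changes are available, and how a one-sided Fisher's exact test for enrichment of one functional category can be computed. The Benjamini-Hochberg adjustment, the cutoffs (FDR 0.05, |log2FC| 1), and all function names are assumptions made for illustration, since the text above does not state them.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_genes(genes, fdr_cutoff=0.05, log2fc_cutoff=1.0):
    """Keep genes passing both the FDR and the fold-change cutoffs.

    genes is a list of (name, log2_fold_change, p_value) tuples.
    """
    fdr = benjamini_hochberg([p for _, _, p in genes])
    return [
        name
        for (name, lfc, _), q in zip(genes, fdr)
        if q <= fdr_cutoff and abs(lfc) >= log2fc_cutoff
    ]


def fisher_enrichment(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    N: annotated background genes, K: background genes in the category,
    n: differentially expressed genes, k: of those, genes in the category.
    Computed as the hypergeometric tail P(X >= k).
    """
    upper = min(n, K)
    tail = sum(
        math.comb(K, i) * math.comb(N - K, n - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(N, n)
```

When many GO terms or KEGG pathways are tested, the enrichment p-values themselves would normally be passed through `benjamini_hochberg` as well.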

For more cloud computing and analysis services, please visit CODEPLOT. CODEPLOT is committed to providing users with a reliable and flexible computing platform. You can carry out automated bioinformatics analysis without a programming background. We also ensure data security by using blockchain, secure multi-party computation, and other cutting-edge technologies.