The Flaveria genome project integrates genomic data from five Flaveria species spanning C₃, C₃-C₄ intermediate and C₄ photosynthetic types. The datasets include raw sequencing data, genome assemblies, annotations, transcriptomes, proteomes and gene co-regulatory networks.
1. Background
1.1 Introduction to the genus Flaveria
The genus Flaveria (Asteraceae), commonly known as yellowtops, is found in North and South America, Australia, Asia and Africa. Most Flaveria species are annuals, and some are perennials. The genus contains 23 known species, including C₃ species and C₄ species, as well as a range of species with intermediate photosynthetic types. In addition, the Flaveria C₄ species are among the youngest known C₄ lineages, having evolved around 5 million years ago. For these reasons, the genus Flaveria is used as a model system for the study of C₄ photosynthesis.
Figure 1. Summary of phylogeny and timescale of the five Flaveria species and the three indicated outgroup species. Bars represent 95% confidence intervals of the estimated divergence time. Whole genome duplications are shown at the corresponding node/branch. Panels at the right display fluorescence in situ hybridization images to assess the chromosome numbers in Ftri, Flin, and Frob.
1.2 The Flaveria genome
The Flaveria genome database integrates genomic data from five Flaveria species: F. robusta (Frob, C₃), F. sonorensis (Fson, C₃-C₄), F. linearis (Flin, C₃-C₄), F. ramosissima (Fram, C₃-C₄) and F. trinervia (Ftri, C₄). The datasets include raw sequencing data, genome assemblies, annotations, transcriptomes, proteomes and ATAC-seq data. The annotations cover transposons, genes and transcription factors (TFs).
Figure 2. Collinearity of chromosomes among Flaveria species. C₄ genes are drawn as red lines. Dashed lines represent either a failure to anchor the gene to a chromosome (NADP-ME in Flin) or a deletion from the genome (PEPC-k in Fram).
2. Data description
2.1 Genome
The genome sequences of the five Flaveria species were assembled from PacBio RSII single-molecule real-time (SMRT) sequencing data. The assembled genome size increases progressively along the evolution of C₄ photosynthesis in this genus: 0.55 Gb in the C₃ species Frob, 1.26 to 1.66 Gb in the C₃-C₄ species, and 1.8 Gb in the C₄ species Ftri. Using chromatin conformation capture (Hi-C) data, 98% to 99% of the assembled genome sequence was anchored to 18 chromosomes in each species.
2.2 Proteome
The proteomes of the five Flaveria species were obtained independently by LC-MS/MS analysis. Six biological replicates of mature leaves were used for each species. Raw data from each species were used to construct a library based on the protein sequences of that species. Data-independent acquisition (DIA) data were analyzed using Spectronaut. The mass spectrometry proteomics data are available in the PRoteomics IDEntifications Database (PRIDE) under accession number PXD024720.
2.3 Transcriptome
The RNA-seq datasets were obtained from plants subjected to treatments previously reported to regulate the expression of C₄ photosynthesis-related genes: low CO₂ (380 ppm vs 100 ppm) for two and four weeks, high light (500 μmol m⁻² s⁻¹ vs 1400 μmol m⁻² s⁻¹) for two weeks, and exogenous ABA (ddH₂O vs 40 μM ABA) for three hours. In total, 18 to 28 RNA-seq datasets were generated for each species. The raw RNA-seq reads are also available in GEO at NCBI under accession numbers PRJNA600545 and PRJNA827625.
Transcript abundances were calculated with RSEM (v1.3.3) by mapping RNA-seq reads to the assembled genome sequence of the corresponding species, with STAR (v2.7.3a) as the aligner.
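As a minimal sketch of how per-sample RSEM output could be collected into a gene expression matrix, the Python snippet below reads the `gene_id` and `TPM` columns that RSEM writes to its tab-separated `*.genes.results` files. The sample names, gene identifiers and values are hypothetical, and this is an illustration rather than the project's actual pipeline.

```python
import csv
import io


def read_rsem_tpm(handle):
    """Return {gene_id: TPM} from an open RSEM *.genes.results file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["TPM"]) for row in reader}


def build_matrix(samples):
    """Combine per-sample {gene_id: TPM} dicts into (genes, sample_names, rows).

    Genes missing from a sample are filled with 0.0.
    """
    sample_names = list(samples)
    genes = sorted({g for tpm in samples.values() for g in tpm})
    rows = [[samples[s].get(g, 0.0) for s in sample_names] for g in genes]
    return genes, sample_names, rows


# Hypothetical two-sample example; in-memory text stands in for real files.
HEADER = ("gene_id\ttranscript_id(s)\tlength\teffective_length\t"
          "expected_count\tTPM\tFPKM\n")
control = io.StringIO(HEADER
                      + "geneA\ttA1\t1500\t1350\t120\t35.2\t28.1\n"
                      + "geneB\ttB1\t900\t750\t10\t4.0\t3.2\n")
treated = io.StringIO(HEADER
                      + "geneA\ttA1\t1500\t1350\t300\t80.5\t64.0\n"
                      + "geneB\ttB1\t900\t750\t8\t3.1\t2.5\n")

samples = {"control": read_rsem_tpm(control), "low_CO2": read_rsem_tpm(treated)}
genes, sample_names, matrix = build_matrix(samples)

# Print one row per gene, one column per sample.
print("gene", *sample_names, sep="\t")
for gene, row in zip(genes, matrix):
    print(gene, *row, sep="\t")
```

With real data, each `io.StringIO` object would be replaced by `open(path, newline="")` on the corresponding results file.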
2.4 Gene regulatory network
Genome-wide gene regulatory networks (GRNs) of the five Flaveria species were constructed independently using the CMIP package. The gene expression matrix obtained as described above was the only input.
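CMIP's own interface is not reproduced here. As a generic illustration of how co-regulation can be scored from an expression matrix, the sketch below estimates the mutual information between two genes' expression profiles with a simple equal-width histogram. Mutual information is a widely used score in expression-based network inference; that it reflects what CMIP does internally is an assumption, and the gene names and values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile carries no information: put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Histogram estimate of the mutual information (in bits) of x and y."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), count in pxy.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) )
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi


# Hypothetical expression profiles across nine samples.
gene_a = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
gene_b = [2.1, 4.0, 6.2, 19.5, 22.3, 24.1, 40.2, 41.8, 44.5]  # tracks gene_a
gene_c = [5.0] * 9                                            # constant

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
print(f"MI(a, b) = {mi_ab:.3f} bits; MI(a, c) = {mi_ac:.3f} bits")
```

A network would then keep gene pairs whose score exceeds some chosen threshold. Histogram estimates are sensitive to the bin count and the number of samples, so this is only suitable as a conceptual example.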
2.5 Assay for Transposase-Accessible Chromatin analysis for the C₄ species F. trinervia
Nuclei were isolated from fully expanded mature leaves. Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) libraries containing DNA inserts between 50 and 150 bp were gel purified and sequenced on the Illumina X Ten platform in paired-end 150 bp mode. Sequencing reads were mapped to the genome sequence of F. trinervia using Bowtie2 (v2.3.4.3). Peaks were called using MACS2 (v2.2.7.1).
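For readers who want to inspect the called peaks, the following minimal Python sketch parses lines in the ten-column narrowPeak format that MACS2 can write (chromosome, start, end, name, score, strand, signal value, -log10 p-value, -log10 q-value, summit offset) and summarizes them per chromosome. The peak records and the q-value cutoff are hypothetical and are not taken from the project's data or settings.

```python
from collections import Counter


def parse_narrowpeak(lines):
    """Yield one dict per peak from narrowPeak-formatted lines."""
    for line in lines:
        # Skip blank lines, comments and optional track headers.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        yield {
            "chrom": fields[0],
            "start": start,
            "end": end,
            "width": end - start,
            "signal": float(fields[6]),
            "neg_log10_q": float(fields[8]),
            # Column 10 is the summit offset relative to the peak start.
            "summit": start + int(fields[9]),
        }


# Hypothetical peak records standing in for a real narrowPeak file.
example = [
    "chr1\t1000\t1400\tpeak_1\t250\t.\t8.5\t30.1\t27.4\t180\n",
    "chr1\t5000\t5200\tpeak_2\t90\t.\t3.2\t10.0\t1.1\t95\n",
    "chr2\t700\t1000\tpeak_3\t400\t.\t12.0\t45.3\t42.0\t150\n",
]

Q_CUTOFF = 2.0  # assumed threshold: -log10(q) >= 2, i.e. q <= 0.01
peaks = [p for p in parse_narrowpeak(example) if p["neg_log10_q"] >= Q_CUTOFF]
per_chrom = Counter(p["chrom"] for p in peaks)

for chrom, count in sorted(per_chrom.items()):
    print(chrom, count)
```

To run this on a real file, pass an open file handle to `parse_narrowpeak` in place of the `example` list.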
2.6 Project
Project: https://db.cngb.org/search/project/CNP0003058/ Raw data are available under project accession CNP0003058.
3. Gallery
F. robusta | F. sonorensis | F. linearis | F. ramosissima | F. trinervia
Frob vein | Fson vein | Flin vein | Fram vein | Ftri vein
Frob BSCs | Fson BSCs | Flin BSCs | Fram BSCs | Ftri BSCs
Frob stomata | Fson stomata | Flin stomata | Fram stomata | Ftri stomata
Figure 3. C₄ photosynthesis-related features. From top to bottom: plant photos; leaf veins, whose density increased during evolution; bundle sheath cells (BSCs), which are activated in the C₄ species Ftri and possess abundant chloroplasts (the specialized BSCs in C₄ species form the so-called "Kranz anatomy"); stomata, whose density is lower in the C₄ species than in the C₃ and C₃-C₄ species.
4. Publication
Not yet.
5. How to analyze in CODEPLOT
The genome and transcriptome data, genome assemblies and annotations can be analyzed in CODEPLOT. Clone the Flaveria dataset to your workspace, then add OrthoFinder or other tools from Tools to your workspace.