High-quality de novo genome assembly and annotation of the tomato genome using long-read sequencing
Source: NCBI BioProject (ID PRJNA553986)
Project name: Solanum lycopersicum cultivar:Micro-Tom
Description: Long-read sequencing technologies offer the possibility of dramatically improving the contiguity of genome assemblies and can extend paths into problematic or repetitive regions. In addition to long-read approaches, recent technologies such as optical mapping and 10X Genomics linked reads can provide additional scaffolding information to achieve chromosome-level assemblies. To improve the current tomato reference genome sequence, we generated 70X coverage of Pacific Biosciences (PacBio RSII) long reads, 100X of optical mapping data using two different enzymes, and 100X of Illumina HiSeq 3000 2x150 bp paired-end reads from 10X Genomics Chromium linked-read libraries. Integrating these three approaches yielded an assembly of ~830 Mb with an N50 of 45 Mb, reaching chromosome-arm-level contiguity. Notably, one entire chromosome (Ch12) was assembled into a single scaffold. Some of the remaining scaffolds span large parts of several centromeric regions, including some of the heterochromatic regions. We assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. Integrating the genetic map allowed us to build 12 pseudomolecules corresponding to the 12 tomato chromosomes. Several regions assigned to chromosome 0 in the SL3.0 reference genome were incorporated into the current assembly. We took advantage of the large RNA-Seq dataset of the TomExpress platform (http://tomexpress.toulouse.inra.fr/) and used it to annotate this new genome assembly. To assess the completeness of the genome and transcriptome assemblies, we used BUSCO v3.0, which showed high gene coverage (> 95%). The genome sequence, annotation, JBrowse instance, and related tools are accessible at http://tomatogenome.gbfwebtools.fr/.
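For readers less familiar with the assembly metrics quoted above, the following is a minimal sketch of how an N50 value is conventionally computed from a list of sequence lengths, together with the simple fold-coverage arithmetic (coverage x genome size). The function names `n50` and `sequenced_bases` are hypothetical illustrations, not part of any tool used in this project, and the lengths in the example are toy values, not data from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    # Sort longest first so we accumulate the biggest sequences first
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        # The first sequence that pushes the running sum past half is the N50
        if running >= half_total:
            return length
    # Only reachable with non-positive lengths; fall back to the shortest
    return sorted_lengths[-1]


def sequenced_bases(genome_size_bp: int, coverage: float) -> float:
    """Approximate number of bases needed to reach a given fold coverage."""
    return genome_size_bp * coverage


if __name__ == "__main__":
    # Toy example: total = 300 bp, half = 150 bp; 100 + 80 >= 150, so N50 = 80
    print(n50([100, 80, 60, 40, 20]))
    # Assumption: taking ~830 Mb as the genome size, 70X is roughly 58.1 Gb
    print(sequenced_bases(830_000_000, 70))
```

Using the ~830 Mb assembly size as a stand-in for the true genome size is an assumption made only for this illustration; the actual sequencing yield depends on the genome size estimate used when the data were generated.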
Data type: Genome sequencing and assembly
Sample scope: Monoisolate
Relevance: ModelOrganism
Organization: INPT
Release date: 2020-04-16
Last updated: 2019-07-11
Statistics: 1 sample