Statistical Modeling of Enzyme Kinetics for Base Modification Detection
Source: NCBI BioProject (ID PRJNA175447)
Description: Current-generation DNA sequencing instruments are moving us closer to sequencing the genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made in identifying small nucleotide variations and structural variations in DNA that impact phenotypes of interest, progress has been less dramatic for epigenetic changes and base-level damage to DNA, largely because of technological limitations in assaying all known and unknown modifications, such as 5-hydroxymethylcytosine, 6-methyladenine, 8-oxoguanine, glucosyl-5-hydroxymethylcytosine, and thymine dimers, at genome scale. Recently, single-molecule real-time sequencing has been reported to identify kinetic variation events (KVEs) that have been demonstrated to reflect epigenetic changes of every known type, showing great promise for objectively identifying chemical modifications to DNA bases as a routine part of sequencing. To date, however, no statistical framework has been proposed to enhance the power to detect these events while also controlling for false positives. Here we develop and apply a statistical framework for inferring kinetic variation from single-molecule real-time sequencing data. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a given genomic region of interest as a conditional random field, we incorporate kinetic information not only at the test position of interest but also at neighboring sites, including interactions among neighboring sites that can help enhance the power to detect kinetic variation events. The performance of this and related models is explored, and the best-performing model is then applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue.
We highlight widespread kinetic variation events, some of which strongly associate with known modification events while others represent putative chemically modified sites of unknown types.
Data type: epigenomics
Sample scope: Multispecies
Relevance: Medical
Organization: Mount Sinai School of Medicine
Last updated: 2012-09-18