Introduction
Despite the complex panoply of physiological and pathological chemical modifications that occur in biological genomes1–4, DNA sequencing approaches have relied on the Watson-Crick base-pairing properties of DNA to reveal the sequence of bases within a DNA molecule5,6. A few methods conditionally alter base-pairing properties according to the chemical modification of a base, such as bisulfite sequencing of 5-methylcytosine7. With those exceptions, this type of sequencing returns only one dimension of information: the base sequence of the nucleic acid. Some approaches can characterize base modifications, based either on altered enzymatic activity at modified DNA bases (such as DamID8) or on antibody-dependent DNA immunoprecipitation9. However, poor base resolution or high noise can limit their utility.

Third-generation sequencing technologies can measure more than simple base-pairing interactions. Pacific Biosciences’ SMRT sequencing is a sequencing-by-synthesis method that, unlike next-generation sequencing, observes the kinetics of base addition. Slight differences in synthesis opposite a modified base can therefore be used to infer that modification10. Nanopore sequencing, including devices created by Oxford Nanopore Technologies, does not use sequencing-by-synthesis. Instead, the sequencer passes a nucleic acid through an engineered pore and records the current as the strand moves through it. Each current measurement depends on the five or six bases located within the pore at that moment11. Notably, a base modification slightly alters the shape of the nucleic acid within the pore. In many cases it also produces a current that differs slightly from that of the unmodified base12–15. To recognize these current alterations, a reference current profile for the modified base must be generated experimentally. The dynamics of nucleic acid shape and pore current are too complex to derive from theoretical principles alone.
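The k-mer dependence of the nanopore signal can be made concrete with a short Python sketch. It steps a sequence through a pore whose sensing region is assumed to hold K = 5 bases and looks up an expected mean current for each k-mer. The table values, the symbol `M` standing in for 5-methylcytosine, and the helper names `kmers` and `expected_currents` are hypothetical illustrations, not a real pore model. The sketch shows only one thing: a k-mer that contains a modified base has no entry in a table built from canonical bases, which is why its reference level has to be measured.

```python
from typing import Dict, List, Optional

# Assumption: 5 bases occupy the pore's sensing region (the text cites five or six).
K = 5

# Hypothetical toy reference table: k-mer -> mean current in pA.
# A real pore model covers all 4**K canonical k-mers and is measured experimentally.
REFERENCE_PA: Dict[str, float] = {
    "ACGTA": 92.1,
    "CGTAC": 85.4,
    "GTACG": 101.7,
    "TACGT": 78.9,
}


def kmers(seq: str, k: int = K) -> List[str]:
    """Return the overlapping k-mers seen as the strand steps through the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def expected_currents(seq: str, table: Dict[str, float]) -> List[Optional[float]]:
    """Look up the expected mean current for each k-mer; None if the table lacks it."""
    return [table.get(km) for km in kmers(seq)]


# Canonical sequence: every k-mer has a reference level.
print(expected_currents("ACGTACGT", REFERENCE_PA))  # [92.1, 85.4, 101.7, 78.9]

# 'M' is a hypothetical symbol for 5-methylcytosine: both k-mers that contain it
# are missing from the canonical table.
print(expected_currents("AMGTACGT", REFERENCE_PA))  # [None, None, 101.7, 78.9]
```

A single modified base affects every k-mer window that overlaps it. With K = 5, it can perturb up to five consecutive current levels, not just one.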
Generating a complete training library is expensive and technically challenging. As such, the number of modified bases that nanopore sequencing approaches can currently identify is still limited. Some approaches bypass these technical challenges by analyzing deviations from the reference current, either in DNA with stochastically incorporated base analogs such as BrdU16 or in RNA with unknown modifications17. These approaches have proven quite successful yet still have limitations, including a requirement of a high...
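The deviation-based strategy can be sketched in Python as a per-position z-score of observed event means against the canonical reference. This is a simplified illustration, not the method of refs 16 or 17. It assumes the observed events are already aligned to reference positions. The reference means and standard deviations are hypothetical, the |z| > 3 cutoff is arbitrary, and the function names are illustrative only.

```python
from typing import List


def deviation_z_scores(observed: List[float],
                       ref_mean: List[float],
                       ref_sd: List[float]) -> List[float]:
    """z-score of each observed event mean against the canonical reference level.

    Assumes the three lists are already aligned position by position.
    """
    if not (len(observed) == len(ref_mean) == len(ref_sd)):
        raise ValueError("inputs must be aligned position by position")
    return [(o - m) / s for o, m, s in zip(observed, ref_mean, ref_sd)]


def flag_candidates(z: List[float], threshold: float = 3.0) -> List[int]:
    """Indices whose absolute deviation exceeds a (hypothetical) cutoff."""
    return [i for i, v in enumerate(z) if abs(v) > threshold]


# Hypothetical canonical reference levels (pA) and spreads for four positions.
ref_mean = [92.1, 85.4, 101.7, 78.9]
ref_sd = [1.5, 1.8, 2.0, 1.6]

# Hypothetical observed event means; position 2 is shifted by ~8 pA.
observed = [92.0, 85.9, 109.8, 79.2]

z = deviation_z_scores(observed, ref_mean, ref_sd)
print(flag_candidates(z))  # [2]
```

Only position 2 is flagged here. Single-read current measurements are noisy, so a real analysis would typically pool deviations across many reads before calling a candidate modification.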