This review commemorates the 40th anniversary of DNA sequencing, a period in which we have already witnessed multiple technological revolutions and a growth in scale from a few kilobases to the first human genome, and now to millions of human and a myriad of other genomes. DNA sequencing has been extensively and creatively repurposed, including as a 'counter' for a vast range of molecular phenomena. We predict that in the long view of history, the impact of DNA sequencing will be on a par with that of the microscope.
DNA sequencing has two intertwined histories: that of the underlying technologies and that of the breadth of problems for which it has proven useful. Here we first review major developments in the history of DNA sequencing technologies (Fig. 1). Next we consider the trajectory of DNA sequencing applications (Fig. 2). Finally, we discuss the future of DNA sequencing.
History of DNA sequencing technologies
The development of DNA sequencing technologies has a rich history, with multiple paradigm shifts occurring within a few decades. Below, we review early efforts to sequence biopolymers, the invention of electrophoretic methods for DNA sequencing and their scaling to the Human Genome Project, and the emergence of second-generation (massively parallel) and third-generation (real-time, single-molecule) DNA sequencing. Some key technical milestones are also summarized in Box 1.
Early sequencing
Fred Sanger devoted his scientific life to the determination of primary sequence, believing that knowledge of the specific chemical structure of biological molecules was necessary for a deeper understanding1. Ironically, given the state of sequencing technology for each biopolymer today, proteins and RNA came first.
The first protein sequence, of insulin, was determined in the early 1950s by Sanger, who fragmented its two chains, deciphered each fragment and overlapped the fragments to yield a complete sequence. His work showed unequivocally that proteins had defined patterns of amino acid residues2. The later development of Edman degradation, a repeated elimination of an N-terminal residue from the peptide chain, made protein sequencing easier3. Although these methods were cumbersome, many proteins had been sequenced by the late 1960s, and it became clear that each protein's sequence varied across species and between individuals.
In the 1960s, RNA sequencing was tackled by this same general process: an RNA species was...