Introduction
In human genomics today, one question looms above all others. How are we going to handle data on the phenotypes, genotypes, and environmental exposures of individual humans [1]? These data are already the lifeblood of our field and will play an increasingly dominant role in human-genomic research for decades, if not centuries, to come. We already collect these data in quantities that were unthinkable a few years ago, and a tsunami of new data will soon be upon us. Indeed, this metaphor is inadequate. Tsunamis are discrete, rare events that do a lot of damage and then recede. Survivors bury the dead, pick up the debris, beef up seawalls, and get on with their lives. In contrast, we are not dealing with a one-time event: the flux of data about human phenotypes, genotypes, and environmental influences will just keep growing, exponentially or super-exponentially, for the foreseeable future. Furthermore, the basic character of these data will differ greatly from those that human genomicists have gathered in the past. We need a strategic plan for managing these data, and it is increasingly obvious we lack one.
Geneticists and genomicists like change and have a good record of adapting to it. Consider the rapidity with which recombinant-DNA and genomic techniques allowed human geneticists to solve longstanding problems in the 1980s and 1990s. In that era, much of the energy of human geneticists went into exploring local features of the human genome in cottage-industry fashion. Once the whole genome had been sequenced, the energy once expended mapping out megabase-pair-sized regions, no easy task in the 1980s, was freed up for more scientifically rewarding endeavors. An optimist might imagine a similarly smooth transition from the current era, in which human genomicists and their collaborators expend enormous energy enrolling patients in one-off research studies, to an era in which huge data sets containing genomic, phenotypic, and environmental data on millions of recontactable people become widely available. In this essay, I argue this will not happen unless we make a big push now to create a true information commons. Inaction or misdirected actions pose an existential threat to the open-science traditions of human genomics. In what follows, I elaborate on this alarmist view and sketch a path forward that offers a...