The Early Modern OCR Project (eMOP), funded by a development grant from the Andrew Mellon Foundation to Texas A&M University, is one of the first large-scale digital humanities (DH) projects to use book history as a solution to a digital problem. This project in a sense inverts the typical perspective of the relationship between book history and the digital humanities, in which DH projects are perceived as providing access to an otherwise-hidden or inaccessible past. Ours are not engines that crunch large amounts of bibliographical data; instead, they find their power source in that bibliographical data. In the case of eMOP, as this essay will discuss, the relationship between the digital and the bibliographic is dialogic and reciprocal: while the project's goals, angles of approach, and ethos of interdisciplinarity are all characteristic of DH, it is only through an acknowledged utilization of book history scholarship and methods that the project's ends will be accomplished. Book history, our corner of eMOP, represents two foundational nodes of the project. In the first place, we are identifying specific, minutely variant typefaces in order to distinguish as best we can between the myriad versions of the standard Roman typeface in early modern books.1 Secondly, we are studying type founders and foundries to trace the flow of fonts into and through London. Through this research we hope to realize the goal of eMOP: the automation of a process by which trained optical character recognition (OCR) engines might more accurately "read" the images of early modern book pages in, for example, Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO). Ultimately our work will be formalized in a database that serves as the hub of this automated OCR process: the printer and typographical data will act as a traffic cop of sorts, directing the properly trained OCR engine to read the appropriate page images.
In fact, a central tenet of the Early Modern OCR Project is that training OCR engines to recognize the letterforms in specific font sets will improve the accuracy of the OCR output (the resultant text files) when these engines are called upon to scan page images printed in that typeface.
While this is only a...