The Statistical Order of Knowledge: A Social and Intellectual History of Search Engines
Abstract (summary)
This dissertation tells the story of how the index, originally a library technology, became the hidden infrastructure of information societies. Using archival material, I trace the historical process through which indexes and indexers came to be automated. To be sure, an index is always a technology of automation. Without indexes, each time we searched for an item in a collection, we would have to read the collection in its entirety. Since the late nineteenth century, however, there have been many attempts to speed up and, eventually, fully automate the creation of indexes and the practice of information retrieval. My focus is scientific rather than literary indexing, because it is in the field of scientific communication that fears of information overload became entangled with commercial interests and the ambition to objectively establish a new unity of science. The main characters in this story are indexers, ranging from librarians and documentalists to information scientists and engineers. Although this story focuses on them, many of its consequences affect us, the indexees, directly.
Each chapter of this dissertation focuses on a different process in the automation of indexing, tracing the intellectual controversies that underlie its development and reflecting on the social consequences that each of these transformations has wrought. Beginning with the world of scientific abstracting services at the turn of the twentieth century, I examine the politics of scholarly information in the years immediately before and after World War II in the United Kingdom; the emergence of keyword search and attempts to fully compute natural language in the 1950s in the United States; and the use of academic citations to organize large collections of data and the origins of automated ranked indexes at the Institute for Scientific Information in the 1960s and 1970s. The final chapter examines the development of an indexing infrastructure that pervades our present moment and the roots of algorithmic bias in information sciences.
The dissertation uncovers an important epistemic shift I have termed the 'statistical order of knowledge'. Attempts to classify and order the sciences have long adopted a deductive approach that subdivided knowledge from larger categories into individual documents. In contrast to this approach, the period I study fostered the emergence of inductive methodologies that rely on the analysis of bibliographic data to produce new organizations of knowledge from which the subjectivity of the indexer can be bracketed. As I will show, however, the bracketing of the indexer's subjectivity was nothing but an illusion, as the tasks of indexing were delegated to algorithms designed to automate decision making and establish their own ranking criteria. The dissertation focuses on libraries, universities, laboratories, and research centers, which represent the institutional landscape in which the statistical order of knowledge emerged, but these spaces do not exhaust its influence. Soon, the ranked index would break out of its perimeter and come to organize every aspect of our digitally mediated experience.
Indexing (details)
Information science;
Statistics;
Information technology
0723: Information science
0489: Information Technology
0463: Statistics