By looking in some detail at how metadata structures might be used to manage the functions of authority control, librarians get a better appreciation for both the potential of these structures and the complexities of authority control in a conventional online catalog.
One way to appreciate the structure of a familiar system is to compare it with another, less familiar one. Aspects and functions of the known system are thrown into a new relief when seen in the light of a different system's operations. By looking in some detail at how metadata structures might be used to manage the functions of authority control, we can better appreciate both the potential of these structures and the complexities of authority control in a conventional online catalog.
My discussion will not attempt to cover all the active definitions of authority control and metadata. Rather, I will focus on comparing two things. The first is authority control in typical U.S. library catalogs based on AACR2, on LCSH applied according to the Library of Congress's Subject Cataloging Manual: Subject Headings, and on the LC name and subject authority files. The second is the set of access management possibilities inherent in the Dublin Core element set and the Resource Description Framework model for data relationships.
Goals of Authority Control
Authority control is sometimes thought of simply in terms of "see" references, but its real goals are more complex than that. First, authority control seeks to distinguish between entities.
The focus in authority work on controlling the form of name, title, and subject headings is the expression of a more basic intellectual effort to differentiate between persons, works, and concepts. Sometimes, the basis for these distinctions is not explicit in the authority record, which might contain only the name or subject term itself. Nonetheless, it is visible in the catalog where one can usually see the differences between the works entered under one John Smith and those entered under another. Current U.S. standard practice is to include enough data in an authority record to assist in making these "who's who, what's what" distinctions, which are the reason the painstaking differentiation of heading forms is undertaken.
In addition to relating a unique heading with the entity or concept it represents in the catalog, authority control also seeks to relate the heading to other real and potential headings with cross-references. Relating a heading to other real headings by making "see also" references often is much more important to navigating the ocean of catalog entries than making simple "see" references. "See" references are essential to diversifying the entry vocabulary in a controlled heading catalog, but are generally left behind once one is "in." Still, the rules currently governing authority control leave many kinds of relationships unspecified: for example, between a person and the many categories to which he or she may belong, or between a work and its genre or subject.
Library online catalogs support authority control in several ways. USMARC, now renamed MARC21, offers distinct formats for bibliographic and authority records. Fields from these records can be indexed in both ordered lists and keyword indexes. In "linked" systems the authority record is the primary site for heading storage, and bibliographic record heading fields point to the corresponding authority record heading for their content. In "unlinked" systems, this connection is implicit, and congruent forms from bibliographic and authority records are integrated in browsable indexes. Beyond these system features, a broadly based, rule-governed consensus exists among data providers about how to select and formulate data elements for catalog records, which in turn generates major reference files of authorized headings, variant terms, and source notes.
Metadata Structures
The development of metadata can be thought of as a loosely consolidated effort to create a standard structure for differing communities to use for the description and retrieval of records. The goal is to allow more freedom to data providers to select terms, forms, and data for describing and providing access to resources, while ensuring a significant measure of interoperability among the resulting metadata records and databases. This interoperability is achieved when metadata records make use of the expanding panoply of structural standards, including Extensible Markup Language (XML) namespaces, schema declarations, and the Resource Description Framework.
Within a given metadata database, traditional authority control may be fairly easy to implement if contributors to the database can agree to follow rules and respect lists of authoritative forms for access data. As long as these functions are contained within a single system, possible routes to consistent practices and data are not hard to trace. But are the structural congruencies which metadata standards seek to achieve supportive of authority control on a wider level, across databases and between the records belonging to different domains? The MARC21 formats have enabled automated library systems to share records and refer to shared global authority files. How does this compare to metadata structures?
One metadata standard, the Resource Description Framework (RDF), holds promise as a tool for achieving several of the goals of authority control over disparate data records. Simply put, RDF defines a coded structure that allows you to specify the relationship between one thing and another. For example, "Moby Dick has an author named Herman Melville" specifies how the work is related to the person named. If the Uniform Resource Locator (URL) for an online version of Moby Dick were substituted for the simple title, the elements of this statement in RDF would become "Moby Dick URL [resource] has author [property] Herman Melville [value]". RDF takes this structure a crucial step further by allowing the "value" position in this statement to be occupied by another "resource" address. This resource address might not be as simple and self-contained as a URL, but within the definitions of an RDF conformant document, all the necessary information would be provided to retrieve the "Herman Melville" record from elsewhere on the Web. (I have deliberately chosen not to illustrate actual RDF coding here; the documents cited from the World Wide Web Consortium's Web site should be consulted for that level of detail.)
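The resource, property, value pattern can be sketched informally in a few lines of Python. This is not RDF syntax, for which the W3C documents remain the place to look; it is only a minimal illustration using plain tuples. The `Statement` name, the `is_resource` helper, and the `http://example.org/...` addresses are all hypothetical placeholders.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    resource: str  # the thing being described
    prop: str      # the relationship being asserted
    value: str     # a literal string, or the address of another resource


# Hypothetical addresses, for illustration only.
MOBY_DICK = "http://example.org/texts/moby-dick"
MELVILLE = "http://example.org/people/herman-melville"

# The value position holds a plain literal name.
literal_form = Statement(MOBY_DICK, "author", "Herman Melville")

# The value position holds the address of another resource,
# which can be described by statements of its own.
linked_form = Statement(MOBY_DICK, "author", MELVILLE)
about_melville = Statement(MELVILLE, "name", "Herman Melville")


def is_resource(value: str) -> bool:
    """Crude check: treat values that look like Web addresses as resources."""
    return value.startswith(("http://", "https://"))
```

In the linked form, the author is no longer just a string of characters to be matched; it is a pointer to a record that can carry its own description, which is the feature that makes the comparison with linked authority systems possible.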
The Resource Description Framework also makes possible the defining and sharing of sets of element names or properties. The Dublin Core element set defines a set of properties by which information resources may be described and accessed. When Dublin Core elements are expressed in terms of RDF, it becomes easier to move between Dublin Core metadata and other metadata types, such as directory records which name and define people, or rights records which detail terms of use. RDF provides a common structure for integrating data records of different kinds, just as MARC21 enables library systems to integrate data (bibliographic, authority, and community information records) in a single system.
But RDF does more than this. By creating a structure that allows the properties of one resource to be described by reference to other resources, RDF generalizes the concept underlying linked authority systems. Conventional linked authority databases are limited to establishing links between records in the same system. But RDF statements can refer to resources that differentiate and define entities and concepts anywhere on the Web. By referring to organized XML namespaces and other well-managed data sets, metadata creators can select from defined sets of property terms and resource records. Further, the information discovered by reference to a second resource can go well beyond that provided by typical Library of Congress authority records. Metadata resource files that define entities are free also to define their own sets of significant properties for those entities and to link elements of their records to additional sets of records. Moby Dick's author is Herman Melville; Melville's birthplace is New York City; New York City includes the borough of Manhattan; and so on. The term "web" seems especially appropriate when considering this potentially unlimited network of resources whose analyzed properties are defined by other resources, each of which may be analyzed and defined by still more resources.
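The Melville chain can be traced in a short Python sketch. It assumes, for simplicity, that all the statements sit in one in-memory dictionary, whereas in practice each resource might live in a different data set on the Web. The identifiers (`work:moby-dick` and so on) and the `follow` function are hypothetical.

```python
# Statements grouped by the resource they describe (hypothetical identifiers).
statements = {
    "work:moby-dick": {"author": "person:herman-melville"},
    "person:herman-melville": {"birthplace": "place:new-york-city"},
    "place:new-york-city": {"includes": "place:manhattan"},
}


def follow(start, properties):
    """Follow a sequence of properties from one resource to the next."""
    path = [start]
    current = start
    for prop in properties:
        described = statements.get(current, {})
        if prop not in described:
            break  # the chain ends where no further description exists
        current = described[prop]
        path.append(current)
    return path


chain = follow("work:moby-dick", ["author", "birthplace", "includes"])
print(" -> ".join(chain))
```

Each step lands on a resource that is itself the subject of further statements, so the traversal can continue for as long as someone has described the next link.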
Potential Problems
There also are problems with this kind of distributed, linked structure for managing entity names and concept terms. It isn't clear how the "see" reference function is provided for in such a system. Offering access to a separate namespace or database that enables you to search for a name using variants of the authorized form may not be as efficient as offering both the name entries that relate to the resources one is describing and the variants of those names in a single, searchable file. Sophisticated searching programs could be developed to direct a searcher's query to the appropriate database and use its response to redirect the query to another database. But if the two data sets are not coextensive in their coverage, the "authority" database may include many hits that turn up empty when they are redirected to the database that was the searcher's original goal. Typical library catalogs control the extent of the authority file and provide a more consistent relationship between references, headings, and works found. It is hard to see how this could be managed in a distributed arrangement of metadata data sets such as is envisaged above.
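The coverage problem can be made concrete with a small Python sketch. It assumes two separate data sets, an "authority" set mapping variant names to authorized headings and a "resource" set listing works under headings, and a hypothetical `search` function that redirects a query from the first to the second. The headings and variants are invented for illustration and are not taken from any real authority file.

```python
# Hypothetical authority data set: name form -> authorized heading.
authority = {
    "Clemens, Samuel": "Twain, Mark",
    "Twain, Mark": "Twain, Mark",
    "Melvill, Herman": "Melville, Herman",
    "Melville, Herman": "Melville, Herman",
}

# Hypothetical resource database with narrower coverage: heading -> works.
resources = {
    "Melville, Herman": ["Moby Dick"],
}


def search(name):
    """Resolve a name in the authority set, then redirect to the resources."""
    heading = authority.get(name)
    if heading is None:
        return None, []  # no entry at all in the authority data set
    return heading, resources.get(heading, [])


print(search("Melvill, Herman"))  # variant resolves and works are found
print(search("Clemens, Samuel"))  # variant resolves but nothing is found
```

The second query succeeds against the authority data set and still comes back empty, because the two data sets do not cover the same ground. A library catalog avoids this by limiting its authority file to headings actually used in the catalog.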
Librarians' experiences managing authority files have deeper cautionary lessons to offer metadata developers. While RDF may be analogous to MARC21 as a common format applicable across different record types, it is harder to discern an analogy in the metadata environment for AACR2 and the LC authority files. These latter two standards are crucial to the consistency of form and meaning that enable library catalogs to work. Much of the discussion around metadata seems to take these rules and the consensus that supports them for granted, as though they were no more than common sense. Librarians have learned from experience that rules for form and content must be stated and interpreted and headings must be established and adhered to, if consistency across a database is to be achieved. Librarians also have learned that databases that attempt to engage the real world are volatile, subject to endless shifts in language, to constant influx of new information, and to unexpected changes in user expectations. It takes a large number of committed librarians working in close cooperation to achieve the level of currency and consistency of access found in online library catalogs. Records that rely on links to disparate systems maintained by different agencies with varying levels of commitment might find the inescapable problems of data management have been compounded by being distributed. Metadata structures may be able to support the same kind of integrated information discovery and retrieval that library catalogs offer, but they never will be able to finesse the effort and commitment to rules and content standards that cataloging requires.
Different Paths
In the end, it may be cultural differences that make the crosswalk between library catalogs and metadata such a problematic path. Metadata structures such as the Resource Description Framework and Dublin Core promise their implementers a measure of ease and freedom with regard to the form and content of data in metadata records which library cataloging, with its insistence on conformity and discipline, cannot. Communities attracted to metadata for these reasons may be successful at building very useful resource discovery and retrieval tools. But the more the freedom to tailor records to suit the interests of different communities is exploited, the more difficult it will be for developers to integrate metadata effectively across different communities' datasets and with standardized library catalogs. The analogies noted above between metadata structures, MARC21, and authority control functions hold only to a point. Looking beyond that point, it's easy to see how metadata-based systems will differ from library catalogs, appreciate the new kinds of navigation they will offer, and grasp the importance of not imposing library catalogers' expectations on looser, less controlled models of information discovery.
Sources Consulted
"Namespaces in XML: World Wide Web Consortium 14 January 1999," Tim Bray, Dave Hollander, and Andrew Layman, editors, http://www.w3.org/TR/REC-xml-names/.
"Resource Description Framework (RDF) Model and Syntax Specification: W3C Recommendation 22 February 1999," Ora Lassila and Ralph R. Swick, editors, http://www.w3.org/TR/REC-rdf-syntax/.
"Resource Description Framework (RDF) Schema Specification: W3C Proposed Recommendation 03 March 1999," Dan Brickley and R.V. Guha, editors, http://www.w3.org/TR/PR-rdf-schema/.
Stephen S. Hearn is Authority Control Coordinator at the University of Minnesota Libraries. He can be contacted at [email protected].
Copyright Media Periodicals Division, Trozzolo Resources, Inc. Jun/Jul 1999