Recommendations are presented for the improvement of online catalogs within the categories of closer connections to the users' work environment, Selective Dissemination of Information (SDI), downloading, reform of the Library of Congress Subject Headings (LCSH), enhanced search capabilities, and linking with other bibliographies and text. Library automation, until now based on factors internal to the library, should be associated with and paced by the parallel shift in the task environment of the people that the library serves. The widespread and prolonged failure to provide users with software that retains the formatting of downloaded records reflects poorly on attitudes toward library users. Online use of LCSH could be greatly simplified if the existing provision for subdivisions were developed and systematized as a verbal faceted classification. Current unpublished research analyzing transaction logs reveals unexpectedly low levels of effectiveness in the use of the MELVYL system and probably online catalogs in general.
Fifteen recommendations are offered for the improvement of online catalogs within the categories of closer connections to the users' work environment, SDI, downloading, reform of LCSH, enhanced search capabilities, and linking with other bibliographies and text.
Recognition of the achievements of the first ten years of the MELVYL online system of the University of California occasions an excellent opportunity to examine what needs to be done in the next ten years of online catalog design and development. What follows is a personal selection of improvements not only for the MELVYL system but for online catalogs generally.
USER ENVIRONMENT
The online catalog has two quite different kinds of impact. For all who visit the library, it is a different sort of catalog, with a keyboard, screen, and a new way of searching that replaces passive trays of cards.
A different impact arises with the growing proportion of library users whose work habits and working environments have changed to include routine use of computers. For these persons, the option of remote access to the library's catalog has constituted an important new extension of library service. Not since library catalogs were (infrequently) printed and distributed in book form in the nineteenth century has this kind of catalog access been possible. This second impact is selective, an enhancement of service for those whose work habits and equipment enable them to benefit. Library automation to improve library service within the library is clearly useful. However, the ability of the library to arrange for access from outside the library to materials stored electronically, such that users with suitable equipment and skills can use the resources by themselves, constitutes a much more substantial extension of library service.
People who have moved to a personal computing environment for their work need online access to the online catalog, online bibliographies, and any other online resources, because the effective performance of their work is based on access to electronic records. Their work is constrained if such access is not provided. For this reason library automation, hitherto based on factors internal to the library, should now be associated with and paced by the parallel shift in the "task environment" of the people the library serves. Once library users begin to work electronically, they are hindered by the lack of remote access to an online catalog and to materials in electronic form. This close coupling of library development with changes in users' working styles requires a new perspective. Any serious agenda for automation in library service should include enhancements designed to bring service to where the users are and into their personal working (and computing) environment. Our first four agenda items are in this class.
AUTOMATIC SDI
The Selective Dissemination of Information (SDI) is the notification of library users of selected, newly received items relevant to their personal interests. SDI is a well-established practice in small, specialized libraries but is labor intensive and, therefore, rarely found in large libraries. The idea of SDI has found new currency outside of libraries as "information filtering." The (largely independent) developments of electronic mail and of online library catalogs can be combined to provide automatic SDI if the catalog has an "AND LOADED SINCE [date]" search limit capability (as the MELVYL system does) or can achieve a similar effect through, for example, record ID numbers in consecutive order.
One feasible approach would be along the following lines. A library user's SDI profile can be expressed in terms of an online search statement (e.g., FIND SUBJECT CATALOGS, ONLINE) and identified by the user's electronic mail address (e.g., buckland@otlet.berkeley.edu). During off-peak periods, at intervals such as once a month, an SDI program would initiate each search with the AND LOADED SINCE search limit set so as to capture records added to the catalog since the previous running of the program. Each search result would then be sent automatically as electronic mail to that library user's e-mail address. Implementing such a service would build on disparate, existing investments in e-mail systems, telecommunications networks, and an online catalog and would add to the value received from each.
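The approach just outlined can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any MELVYL software: the names SdiProfile, build_query, and run_sdi are hypothetical, the date format appended to the search limit is assumed, and the catalog search and mail delivery are passed in as functions so that no particular catalog or mail system is implied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class SdiProfile:
    """One user's standing interest: a stored search plus a mail address."""
    email: str   # where results are sent
    search: str  # stored search statement, e.g. "FIND SUBJECT CATALOGS, ONLINE"


def build_query(profile: SdiProfile, last_run: date) -> str:
    # Append the date limit so that only records loaded since the previous
    # run are retrieved. The ISO date format is an assumption.
    return f"{profile.search} AND LOADED SINCE {last_run.isoformat()}"


def run_sdi(
    profiles: List[SdiProfile],
    last_run: date,
    search_catalog: Callable[[str], List[str]],  # hypothetical catalog interface
    send_mail: Callable[[str, str], None],       # hypothetical mail interface
) -> int:
    """Run every stored search and mail any new records; return mails sent."""
    sent = 0
    for profile in profiles:
        records = search_catalog(build_query(profile, last_run))
        if records:  # nothing new means no message
            send_mail(profile.email, "\n\n".join(records))
            sent += 1
    return sent
```

Run once a month during off-peak hours, with last_run set to the date of the previous run, such a routine would deliver to each user only the records added in the interval.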
It was clear by the mid-1980s that such a service would be feasible and relatively simple to implement, but it was not perceived as a priority for the MELVYL system at that time.(1) The practice of loading indexing and abstracting files in conjunction with online catalogs increases the scope for what could be a popular and inexpensive service.
FORMATTED DOWNLOADING
Library users, especially users of academic libraries, have developed personal computing environments, primarily for word processing, for the very same kind of work that generates library use. Dial-in access to library catalogs has become standard, and standard communications software allows for catalog searches to be downloaded. However, identifying individual records within the downloaded records usually requires tedious editing. Software for this is available for some online catalogs and is under development for others, but the widespread and prolonged failure to provide users with software that retains the formatting of downloaded records reflects poorly on attitudes toward library users.
HIGH-SPEED RECORD TRANSMISSION
A decade ago the MELVYL system and other catalogs were designed to communicate with plain terminals over relatively slow lines. Now that situation is changing rapidly as high-speed networks and increasing numbers of workstations are coming into use. The MELVYL system, for example, currently downloads records at no more than around one record per second in off-peak periods and significantly more slowly during busy periods. Thus, for example, downloading a set of six hundred records for further analysis in a workstation takes at least ten minutes and possibly much longer. Effective use of the Internet, workstations, and the Z39.50 Search and Retrieve protocol will depend on much higher downloading speeds.
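The arithmetic behind the ten-minute figure is simple enough to check directly. The small function below is only a worked calculation; the name download_minutes is ours, and the slower rate in the second call is an assumed figure standing in for a busy period, not a measured one.

```python
def download_minutes(records: int, records_per_second: float) -> float:
    """Minutes needed to download a set at a steady transfer rate."""
    if records_per_second <= 0:
        raise ValueError("rate must be positive")
    return records / records_per_second / 60


print(download_minutes(600, 1.0))   # 10.0, the off-peak case described above
print(download_minutes(600, 0.25))  # 40.0, at an assumed busy-period rate
```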
CORDLESS TELECOMMUNICATIONS
The card catalog could only be used in one place, and an enormous advantage of an online catalog is that it can be used anywhere telecommunications can reach. Very useful first steps have been to allow "dial-in" access from outside the library and to place terminals in stacks and reading rooms where they are needed, as well as in the traditional "catalog hall." The need for electrical power and telecommunications cables to reach the terminal is still an expensive constraint. Library users, however, are beginning to carry around small, portable, battery-powered notebook computers that are very convenient for in-library use. It would be an obvious amenity to enable library users with portable computers to use the online catalog without needing to connect to a telephone line. Cordless telephones or, more likely, radio transmission of data packets hold great promise. The pioneering research at the Division of Library Automation, supported by the California State Library, demonstrated the feasibility of this approach. The initial motivation had been to reduce the high cost of cabling large libraries, but the approach clearly has numerous possibilities.(2)
RESOURCES
What of the catalog as a resource? Two aspects spring to mind.
LCSH MODERNIZED
The predominant form of subject access in U.S. library catalogs is through the Library of Congress Subject Headings (LCSH), a system in which complex topics are expressed in the form of lengthy phrases (e.g., "Marmots as carriers of disease") or else as a heading extended by a set of qualifying subdivisions (e.g., "God -- Knowableness -- History of doctrines -- Early church, ca. 30-600 -- Congresses"). There is considerable scope for routine updating and systematizing LCSH, but there are two more basic problems. Searching long, complex subject headings ("Exact subject searches"), while relatively easy to do when scanning headings already visible on cards or printed catalogs, is doubly difficult in an online catalog: One has to guess what the heading might be; and one must avoid any keying error. In second-generation online catalogs such as the MELVYL system a sensible solution is to allow subject keyword searching, atomizing the carefully constructed pre-coordinate LCSH into component words for postcoordinate searching on these fragments. Online use of the LCSH system could be greatly simplified if the existing provision for subdivisions were developed and systematized as a verbal faceted classification. This is hardly an avant-garde suggestion; it is essentially what Paul Otlet and Henri La Fontaine did in 1895 to develop the Dewey Decimal Classification into the more powerful and versatile Universal Decimal Classification. A more faceted approach would, like the Anglo-American Cataloging Rules, the Dewey Decimal Classification, and the Library of Congress Classification, be respectably grounded in the nineteenth century.
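The difference between the pre-coordinate heading and its atomized keywords can be made concrete with a short Python sketch. It assumes that subdivisions are separated by a double hyphen, as in the example above; the function names are hypothetical. It does not attempt the harder step recommended here, which is to label each subdivision with a facet (topic, period, form, and so on).

```python
import re
from typing import List, Set


def split_heading(heading: str) -> List[str]:
    """Split a subdivided heading into its main heading and subdivisions."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def keywords(heading: str) -> Set[str]:
    """Atomize a heading into lowercase words for postcoordinate searching."""
    return set(re.findall(r"[a-z0-9]+", heading.lower()))


heading = ("God -- Knowableness -- History of doctrines -- "
           "Early church, ca. 30-600 -- Congresses")
print(split_heading(heading))
print(sorted(keywords(heading)))
```

The first function keeps the ordered parts of the heading, which a faceted scheme would need; the second discards that order, which is what subject keyword searching does.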
CATALOGING QUALITY AND COMPLETENESS
The MELVYL system is a union catalog. As a matter of policy, it was decided to retain every variant detail of cataloging from several different cataloging departments. This complexity is largely hidden from the users, since only one form of the record is ordinarily displayed, but it is a wonderful boon for those who teach cataloging. Almost any MELVYL MARC display for an item cataloged by multiple catalog departments can provide a basis for discussion of record differences. There are two aspects to this: consistency and completeness. University of California cataloging may well be of above-average quality, yet the number of variant forms continues to increase. FIND EXACT SUBJECT LIBRARY, for example, yields two records: one a cataloging error for LIBRARIES, the other a miskeying for LIBERTY. A program of record cleansing at OCLC, recently reported to be correcting 30,000 records a day, should be an inspiration to all catalog administrators.
A different problem is that of completeness. Initially only a few fields were searchable: author, title, and subject headings. As online catalog software evolves, the number of fields that can be searched increases, with the trend presumably toward being able to search all fields. However, the usefulness of this extended functionality will be limited by how often these fields are left empty and by what appears to be a higher rate of cataloging error in them.
SEARCH CAPABILITY
SPELLING AND PLURALS
A very useful feature of standard word processing software is the ability to identify spelling errors. This would be an obvious amenity for any online catalog, at least for subject searches that have retrieved no records. Further, it is pedantic and unfriendly to offer a service in which the user does not even know that success in searching may depend on using a plural (e.g., CATFISHES IN ART) or, perhaps, a singular form of a subject -- or a variant spelling of a word when using the library catalog(ue)(s). That the catalog records may be inconsistent can reinforce a user's misunderstanding. For example, in the MELVYL system, FIND SUBJECT CHANSON retrieves 153 records, a result likely to be perceived as successful, but one that masks the fact that FIND SUBJECT CHANSONS retrieves 1,376 records. Only 113 records are common to the two sets.
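The CHANSON figures imply that the two searches together cover 153 + 1,376 - 113 = 1,416 distinct records, so the singular search alone reaches only about 11 percent of them. The sketch below shows that calculation and one way a catalog might search both forms on the user's behalf. The function names are hypothetical, and the add-or-drop-an-S rule is a deliberately naive assumption; real plural handling would need a much fuller rule set.

```python
from typing import Callable, Set


def union_size(a: int, b: int, common: int) -> int:
    """Size of the combined set when two retrieved sets overlap."""
    return a + b - common


def variants(term: str) -> Set[str]:
    """Naive singular and plural forms of a search term (assumed rule)."""
    t = term.upper()
    if t.endswith("S") and len(t) > 1:
        return {t, t[:-1]}
    return {t, t + "S"}


def search_all_variants(term: str, search: Callable[[str], Set[int]]) -> Set[int]:
    """Run the search for every variant and merge the record IDs found."""
    results: Set[int] = set()
    for variant in variants(term):
        results |= search(variant)
    return results


print(union_size(153, 1376, 113))  # 1416
```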
MORE SEARCHABLE INDEXES
Online catalogs started with the traditional access points of author, subject, and title. Since then the range of searchable access points has steadily increased to allow, for example, searching by date and language. The long-term expectation should be that eventually all data in or implicit in the MARC format will become searchable.
CROSS-REFERENCES
One of the major ingredients of cataloging is the systematization of syndetic structure: the web of cross-references between related terms. LCSH was expanded recently to provide "Use," "Use for," "See Broader term," "See Narrower term," "Related term," and "See Also." Further reform is needed in the case of tantalizing and minimally helpful guidance (e.g., "Gums and resins. See also specific gums and resins") in which the names of the specific gums and resins are not revealed.
In the case of the MELVYL system, cross-referencing has been implemented for names but not yet for subjects. As one should expect, "Mark Twain" will also retrieve "Samuel Clemens." It is nonsensical that in a catalog for the general public "Vietnam War" does not retrieve material on what the LCSH coyly calls the "Vietnamese Conflict." LCSH has a cross-reference, but unfortunately, the MELVYL system has yet to implement even "Use" cross-references.
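At its simplest, a "Use" cross-reference is a lookup performed before the search is run. The following Python sketch illustrates the idea with the single example given above; the table, the function name resolve, and the normalization to uppercase are all assumptions for illustration, and the heading is written as it is quoted in the text rather than in its full authority form.

```python
from typing import Dict

# Illustrative table of "Use" references: term the user types -> heading to search.
USE_REFERENCES: Dict[str, str] = {
    "VIETNAM WAR": "VIETNAMESE CONFLICT",
}


def resolve(term: str, use_refs: Dict[str, str] = USE_REFERENCES) -> str:
    """Return the preferred heading for a term, or the term itself if none."""
    key = " ".join(term.upper().split())  # normalize case and spacing
    return use_refs.get(key, key)


print(resolve("Vietnam War"))  # VIETNAMESE CONFLICT
```

A catalog that applied such a lookup silently, or that showed the user the substitution it had made, would remove the need to guess the preferred form.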
ENTRY VOCABULARY
The development of information retrieval in the past thirty years has been in two substantially different streams: traditional, deterministic, bibliographic approaches using human indexing, exact and Boolean searches, and retrieving sets of records (e.g., online catalogs and DIALOG); and probabilistic approaches emphasizing retrieval from full-text and ranked retrieval results (e.g., SMART). These streams have remained remarkably separate.(3) Online catalogs need to be designed for users who lack searching expertise and familiarity with the semantics of LCSH. A development that could be expected to transform the ease of use of online catalogs would be to combine these two approaches, using probabilistic techniques to derive from the user's vocabulary the terms in the system's vocabulary most likely to match the user's interests. A taste of the effects can be seen in the generic keyword search (FIND KW, actually a Boolean OR search combining title keyword and subject keyword searches), which has been implemented on some MELVYL databases, but not on the catalog. The CHESHIRE system and experiments on the OASIS system go one step further in displaying the most promising system subject headings in ranked order.(4) The effect would be similar to that of an up-to-date index, using contemporary language, to the LCSH and, preferably, to the LC Classification.
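A toy version of such an entry vocabulary can be built from the catalog itself, by noting which subject headings have been assigned to records whose titles contain a given word. The sketch below uses simple co-occurrence counts, which is our simplification and not the probabilistic method of CHESHIRE or OASIS; the function names and the three sample records are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Record = Tuple[str, List[str]]  # (title, assigned subject headings)


def build_entry_vocabulary(records: Iterable[Record]) -> DefaultDict[str, Counter]:
    """Map each title word to counts of the headings it co-occurs with."""
    vocab: DefaultDict[str, Counter] = defaultdict(Counter)
    for title, headings in records:
        for word in set(title.lower().split()):
            vocab[word].update(headings)
    return vocab


def suggest_headings(query: str, vocab, limit: int = 5) -> List[str]:
    """Rank system headings by how often they co-occur with the query words."""
    scores: Counter = Counter()
    for word in query.lower().split():
        scores.update(vocab.get(word, Counter()))
    return [heading for heading, _ in scores.most_common(limit)]


# Invented sample records, for illustration only.
sample = [
    ("Working women in America", ["Women -- Employment", "Labor supply"]),
    ("Women and work", ["Women -- Employment"]),
    ("Cats at work", ["Working animals"]),
]
vocab = build_entry_vocabulary(sample)
print(suggest_headings("working women", vocab))
```

The searcher types ordinary words and is shown the system's headings in ranked order, without needing to know in advance how LCSH expresses the topic.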
ADAPTIVENESS
Retrieval is more of a process than an event, so it is desirable that we think in terms of searching sessions rather than individual searches. In this context two developments are needed.
RETRIEVED SET ANALYSIS
The MELVYL system, as is typical in online catalogs, indicates the number of items retrieved by any given search but does little more than that. An amenity being developed experimentally as part of the "prototype adaptive library catalog" project is the routine analysis of any retrieved set to provide the searcher with a summary analysis of the composition of the set retrieved.(5) Such an analysis provides an informed basis for estimating the consequences of modifying the search and, therefore, for deciding what to do next. An expert searcher can, within limits, ascertain the breakdown of the retrieved set by, say, language or date, but doing so can be quite tedious, and it is a task that is very suitable for delegation to the computer. A simple command to analyze by, say, date, language, and holding library could generate a display of the profile of the material with which one is dealing. A variation on this theme is to have the system analyze the distribution of subject headings within the set retrieved by any search. This refinement is particularly useful when exploring some topic that is widely scattered over LCSH. A title keyword search on "Working women," for example, yields records with a very wide spread of different LCSH. Here, as in so many cases, the LCSH headings found are individually plausible, but no one would have the imagination to think of all, or even many, of them. An online catalog can be programmed to excerpt, rank by frequency, and display the LCSH (or, in principle, any other attribute in or implicit in the catalog records) in any retrieved set. Such a display, which could be generated automatically and routinely, could provide the non-expert searcher with a well-informed basis for deciding future moves in the search process, as well as be a useful convenience for even the most expert searcher.
Adding, as some online catalogs can, counts of how many records for each heading would be retrieved from the entire database nicely complements the counts of the number in the set already retrieved. The latter indicates the options; the former, the consequences of moving the search to related headings. Expert systems can be expected to need the same kind of analytical capability as a basis for inferring and proposing good next steps.
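The analysis described in this section amounts to counting attribute values across the retrieved set and ranking them. A minimal Python sketch follows, assuming each record is a simple dictionary; the field names, the function names, and the shape of the database-wide counts are hypothetical and do not describe the prototype's actual design.

```python
from collections import Counter
from typing import Dict, List, Tuple


def analyze(retrieved: List[dict], field: str) -> List[Tuple[str, int]]:
    """Frequency profile of one attribute across a retrieved set, ranked."""
    counts: Counter = Counter()
    for record in retrieved:
        value = record.get(field, [])
        # A field may hold one value (language) or several (subject headings).
        counts.update(value if isinstance(value, list) else [value])
    return counts.most_common()


def with_database_counts(
    profile: List[Tuple[str, int]], database_counts: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Pair each in-set count with the count for the whole database."""
    return [(value, n, database_counts.get(value, 0)) for value, n in profile]
```

Calling analyze(retrieved, "subjects") gives the ranked list of headings found in the set; passing that list to with_database_counts adds, for each heading, the number of records a move to that heading would retrieve.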
STRATEGIC COMMANDS
Problems arise when the complexity of a task exceeds the user's expertise. Various options may be possible, including educating the user to increase expertise; providing advice situationally; simplifying the system; providing an intermediary (human or artificial); and, as with automatic transmissions and automatic cameras, shifting some of the complexity into the system. Expert, effective searching of online bibliographic systems is done by implementing a search strategy composed of a series of tactical moves. In practice, however, not all searchers are expert. Weak expertise is associated with a lack of knowledge of search commands, search strategies, and the arrangement of material in the database. Weak expertise is a significant problem in the case of online library catalogs, which are used by untrained searchers. As the functionality of online catalogs increases, so their complexity increases and so, too, the amount of expertise needed to use them. However, very few of the available commands are frequently used. In particular, as files grow in size with the retrospective conversion of older records, the frequency with which excessive numbers of records are retrieved increases. Expert searchers know search tactics that can be used to reduce retrieved sets. The great majority of relatively inexpert users typically scroll through page after page of displayed records, then settle for the first few found or start over with some new search command.(6)
We use the term strategic search command to denote a search command that instructs the system to implement a series of tactical moves in some direction. Given the propensity of library users to limit themselves to only a few commands, it is difficult to see how else increasing complexity can be handled except by providing more versatile commands. As with the automatic transmission, it is a matter of enabling the user to delegate some of the complexity to the system and, as such, it is necessary that the user remain in control of the pace and direction. We recommend and are currently developing strategic commands of the form FIND MORE, FIND RELATED RECORDS, FIND FEWER, and SUMMARIZE [the retrieved set].(7) What works for the nonexpert is also likely to be a convenient amenity for the expert.
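What a command such as FIND FEWER might do internally can be suggested with a short sketch. It is an illustration only: the function name, the target size of twenty records, and the particular narrowing tactics are our assumptions, and the source does not specify which tactical moves the commands under development actually apply. The tactics are tried in order until the set is small enough, and the list of tactics applied is returned so that the user can see, and reverse, what was done.

```python
from typing import Callable, List, Tuple

Tactic = Tuple[str, Callable[[dict], bool]]  # (label shown to user, filter)


def find_fewer(
    records: List[dict], tactics: List[Tactic], target: int = 20
) -> Tuple[List[dict], List[str]]:
    """Apply narrowing tactics in order until at most `target` records remain."""
    current = list(records)
    applied: List[str] = []
    for label, keep in tactics:
        if len(current) <= target:
            break
        narrowed = [record for record in current if keep(record)]
        # Skip a tactic that would empty the set or that removes nothing.
        if narrowed and len(narrowed) < len(current):
            current = narrowed
            applied.append(label)
    return current, applied
```

For example, given tactics that keep only English-language records and then only records published since 1980, a set of one hundred records might be reduced in two steps, with both steps reported back to the searcher.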
OUT OF ISOLATION
The substitution of the new information technology for the old information technology of paper and card may very well be a sensible and beneficial course of action, but in the longer term it misses the point of technological change.
Sooner or later we need to rethink and redesign what is done so that it is not a mechanization of paper but fully exploits the capabilities of the new technology.(8) In this, the online catalog is of special interest. For example, online catalogs normally display retrieved records in the alphabetical order of main entry. Why? The first few displayed are not likely to be any more interesting than any others in the retrieved set. It is, perhaps, an unconscious carry-over from the necessity of filing, and therefore viewing, 3-by-5-inch cards in alphabetical order. Our last three agenda topics are directions in which the MELVYL system is already pointing.
CATALOG AND BIBLIOGRAPHY
The mounting of a detailed bibliography, providing bibliographic access at the journal article level, when the MEDLINE file was loaded on the MELVYL system seemed radical at the time. Now, with hindsight, this move seems quite sensible, even overdue, yet it symbolizes the reversal of one hundred years of orthodoxy in library thinking: Catalogs are created in technical services departments with records derived from other libraries; bibliographies are normally created by publishers outside of libraries, made accessible through commercial firms, and searched by reference librarians in public services divisions. However, linking online bibliographies with online catalogs transforms this historic separation between bibliography and catalog. Linking bibliographies such as MEDLINE and CURRENT CONTENTS to holdings statements leads us toward a redefinition and dramatic enrichment of the library catalog. The new "catalog" becomes, in effect, the whole range of bibliographic access that can be linked to holdings records.(9)
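In data terms, linking a bibliography to holdings statements is a join between article-level citations and the library's record of what it holds. The sketch below assumes, purely for illustration, that the two are matched on the journal's ISSN; the source does not say what key is used, and the function and field names are hypothetical.

```python
from typing import Dict, List


def link_holdings(
    citations: List[dict], holdings_by_issn: Dict[str, List[str]]
) -> List[dict]:
    """Attach the holding libraries, if any, to each article-level citation."""
    linked = []
    for citation in citations:
        libraries = holdings_by_issn.get(citation.get("issn", ""), [])
        # Copy the citation so the bibliography's own records are not altered.
        linked.append({**citation, "held_by": list(libraries)})
    return linked
```

Each citation retrieved from the bibliography then carries with it the answer to the question the catalog traditionally answers: which library, if any, has the item.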
OTHER CATALOGS
That online catalogs around the world are becoming accessible at a distance over networks echoes the nineteenth-century practice of printing and distributing library catalogs in book form (which became a victim of the move to card catalogs). Facilitating access by "pass-through" (such as the MELVYL system's USE command) and, prospectively, using the emerging "Search and Retrieve" standards (NISO Z39.50; ISO 10162/10163) are valuable moves toward universal bibliographic control.
CATALOG AND TEXT
The pre-automation library was characterized by separations. The library and its catalog were more or less distant from the user's workplace, and the catalog was separate from the books. The online catalog can bring the catalog to the user and into the stacks. As files of documents become available in electronic form, text (and other electronic objects) can be brought to the user. Further, the ability to bring both catalog and texts to the user will provide libraries with the option of having catalog records and their associated texts at the same time, engineering some of the advantages that make browsing in the stacks more attractive than doing so in card catalogs. The new connections are building the electronic library.
PRIORITIES
Since everything cannot be done at once, priorities become important. Current, unpublished research analyzing transaction logs reveals unexpectedly low levels of effectiveness in use of the MELVYL system (and probably of online catalogs generally). A user can easily spend half an hour not quite finding what an expert searcher would quickly find. Perhaps this is to be expected when a complex system is provided that nonexpert people have no choice but to use. The user's ineffectiveness should provide the major basis for priorities in online catalog development. For example, the unorthodox step of providing an "entry vocabulary" that converts the user's terminology into the system's language might do more good than any other reform for those who have to use the catalog.(10)
ACKNOWLEDGMENTS
This paper has benefited from discussions with Michael G. Berger and the assistance of Barbara A. Norgard, as well as the author's past involvement in the development of the MELVYL system.
REFERENCES AND NOTES
(1) Michael K. Buckland, "Combining Electronic Mail with Online Retrieval in a Library Context," Information Technology and Libraries 6:266-71 (Dec. 1987).
(2) Clifford A. Lynch and Edwin B. Brownrigg, Packet Radio Networks: Architectures, Protocols, Technologies, and Applications (Oxford, England: Pergamon, 1987).
(3) An unusual exception is the CHESHIRE system developed by Ray R. Larson, in which a SMART-based system operates on the conventional MARC records of a library catalog. Ray R. Larson, "Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog," Library Quarterly 61:133-73 (Apr. 1991); Ray R. Larson, "Evaluation of Advanced Retrieval Techniques in an Experimental Online Catalog," Journal of the American Society for Information Science 43:34-53 (Jan. 1992).
(4) For CHESHIRE, see Larson, "Classification Clustering" and Larson, "Evaluation." For OASIS, see Michael K. Buckland, Barbara A. Norgard, and Christian Plaunt, "Design for an Adaptive Library Catalog," Proceedings of the American Society for Information Science Mid-Year Meeting, Albuquerque, NM, May 8-10, 1992 (forthcoming).
(5) Buckland, Norgard, and Plaunt, "Design."
(6) Stephen Walker, "Interactional Aspects of a Reference Retrieval System Using Semi-Automatic Query Expansion," in Informatics 10: Prospects for Intelligent Retrieval (London: Aslib, 1990), 119-36.
(7) Buckland, Norgard, and Plaunt, "Design."
(8) Michael K. Buckland, Redesigning Library Services: A Manifesto (Chicago: American Library Assn., forthcoming).
(9) Michael K. Buckland, "Bibliography, Library Records, and the Redefinition of the Library Catalog," Library Resources and Technical Services 33, no. 4:299-311 (Oct. 1988).
(10) Michael G. Berger, private communication, December 1991.
Michael K. Buckland is Professor of Library and Information Studies, University of California, Berkeley. During 1983-86, as Assistant Vice-President for Library Plans and Policies in the University of California systemwide administration, his responsibilities included the Division of Library Automation and the MELVYL system.
Copyright American Library Association Jun 1992