Normalization rules are essential for interoperability between bibliographic systems. In the process of working with Name Authority Cooperative Program (NACO) authority files to match records with Functional Requirements for Bibliographic Records (FRBR) and developing the Faceted Application of Subject Terminology (FAST) subject heading schema, the authors found inconsistencies in independently created NACO normalization implementations. Investigating these, the authors found ambiguities in the NACO standard that need resolution, and came to conclusions on how the procedure could be simplified with little impact on matching headings. To encourage others to test their software for compliance with the current rules, the authors have established a Web site that has test files and interactive services showing their current implementation.
Sharing data between bibliographic systems requires the ability to compare two pieces of information to determine whether they are intellectually equivalent regardless of the ways in which they are stored. The authors attempted to compare data created by disparate systems but theoretically normalized by the same rules, and discovered discrepancies. Researching the problem headings revealed that the NACO normalization rules are vague in some respects and possibly too restrictive in others. Three independently developed implementations of the Program for Cooperative Cataloging's (PCC) Name Authority Cooperative Program (NACO) normalization rules were brought into agreement with each other through the use of a common test environment, which the authors have made publicly available. Areas in need of clarification and simplification were identified during the testing.
Normalization rules can be used to create a standard or generic form for headings and other similar alphanumeric strings. This standard form is essential for clustering logically identical headings and differentiating between logically different headings. The need to determine the equivalence of two headings arises frequently in work with both name and subject authority files. What characteristics of the heading are significant? Should capitalization, spacing, and punctuation be ignored? What about special characters? Are Smith-Jones and Smith & Jones the same? What about Black jack and Black, Jack? Depending on which rules are followed and how they are implemented, these may or may not be considered equivalent.
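The dependence on rule choice can be illustrated with a minimal Python sketch. It is not the NACO rule set and not the authors' implementation. The function names `normalize` and `equivalent` and the `keep` parameter are hypothetical, introduced only for illustration. Both rule sets fold case and strip diacritics; they differ only in whether the ampersand and comma are retained or treated as blanks.

```python
import unicodedata


def normalize(heading: str, keep: str = "") -> str:
    """Return a generic form of a heading (illustrative rules, not NACO).

    Characters listed in `keep` are retained; every other
    non-alphanumeric character is treated as a blank.
    """
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", heading)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Reduce all alphabetic characters to a single case.
    folded = base.upper()
    # Replace punctuation that is not explicitly retained with a blank.
    cleaned = "".join(
        ch if ch.isalnum() or ch in keep else " " for ch in folded
    )
    # Collapse runs of blanks and trim the ends.
    return " ".join(cleaned.split())


def equivalent(a: str, b: str, keep: str = "") -> bool:
    """Two headings match when their normalized forms are identical."""
    return normalize(a, keep) == normalize(b, keep)


if __name__ == "__main__":
    pairs = [("Smith-Jones", "Smith & Jones"), ("Black jack", "Black, Jack")]
    for a, b in pairs:
        # Rule set 1: all punctuation becomes a blank.
        print(a, "|", b, "| drop all:", equivalent(a, b))
        # Rule set 2: ampersand and comma are significant.
        print(a, "|", b, "| keep & and ,:", equivalent(a, b, keep="&,"))
```

Under the first rule set both pairs collapse to the same string and are judged equivalent; under the second, the retained ampersand and comma keep them apart. Two systems that each claim to "normalize" headings can therefore disagree on exactly these cases.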
Normalization is the transformation of a string of characters into a more generic form. Typical transformations include reducing all alphabetic characters to a single case and eliminating diacritics and...