This paper describes the technical challenges of developing software to convert Wade-Giles to Pinyin in bibliographic records that are not in Chinese.
The Chinese language differs from the alphabetic languages to which most Westerners are accustomed. Representing items in such a language in a bibliographic database that principally employs roman script requires converting the original to a representation in alphabetic characters, possibly with diacritics. Systems of such transliteration for Chinese date from at least 1605, but the one prevalent in the United States for the last hundred years or so is the Wade-Giles (WG) system.1
Recently the Library of Congress (LC) decided to discontinue use of WG in favor of the newer Pinyin form of transliteration, which the People's Republic of China adopted in the late 1950s. This meant converting Chinese records in the Online Computer Library Center (OCLC) authority file and OCLC bibliographic file to Pinyin.
This evolved into a consortial effort among LC, the Research Libraries Group (RLG), and OCLC that extended over three years. Earlier efforts by the OCLC Office of Research have been reported elsewhere.2
Background
Once requests for comments and discussions with key libraries had taken place, there were five major parts of the conversion effort to plan:
1. Conversion of LC Chinese authorities by OCLC, scheduled to take place no later than October 1, 2000 ("Day One")
2. Conversion of LC bibliographic records by RLG
3. Bibliographic records conversion by OCLC and RLG of their respective union catalogs
4. Conversion by OCLC of the non-Chinese records containing Chinese text, and later by RLG of similar records in their databases
5. Conversion efforts by OCLC and RLG of records of institutions from WG to Pinyin
Development of the Specifications
The specifications were developed cooperatively as the project progressed and can be seen at the LC Web site.3 Some general points to keep in mind about the conversion:
1. Only fields/subfields specified by LC in the specs were to be converted.
2. Conversions made heavy use of dictionary lookups, not only for conversion of WG syllables to Pinyin counterparts, but also for phrase matching as in place names. The conversion sequences, which were directions for specific types of conversion such as geographic place names or Taiwan names,...
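
The two kinds of dictionary lookup mentioned above (phrase matching and syllable-by-syllable conversion) can be illustrated with a minimal sketch. It is not the project's actual software: the function name convert_wg, the table names, and the handful of entries are assumptions chosen for illustration, and the real conversion dictionaries and sequences are those defined in the LC specifications.

```python
# Illustrative sketch only. The tables below are tiny, hypothetical samples;
# the real conversion dictionaries were defined by the LC specifications.

# Phrase dictionary: sequences of WG syllables that convert as a unit,
# for example place names that are written as one word in Pinyin.
PHRASES = {
    ("pei", "ching"): "Beijing",
}

# Syllable dictionary: individual WG syllables and their Pinyin counterparts.
SYLLABLES = {
    "pei": "bei",
    "ching": "jing",
    "chung": "zhong",
    "kuo": "guo",
    "hsi": "xi",
}


def convert_wg(text):
    """Convert a WG string to Pinyin, trying phrase matches before syllables.

    Hyphens are treated as syllable separators, and tokens found in neither
    dictionary are passed through unchanged.
    """
    tokens = text.replace("-", " ").split()
    longest = max(len(key) for key in PHRASES)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, down to two-syllable phrases.
        for n in range(longest, 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase matched here: fall back to a single-syllable lookup.
            tok = tokens[i]
            pinyin = SYLLABLES.get(tok.lower())
            if pinyin is None:
                out.append(tok)
            elif tok[0].isupper():
                out.append(pinyin.capitalize())
            else:
                out.append(pinyin)
            i += 1
    return " ".join(out)
```

In this sketch, convert_wg("Pei-ching") is resolved by the phrase table, while convert_wg("chung kuo") falls through to the syllable table. Checking phrases first keeps a place name from being broken into separately converted syllables.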





