Keywords
Catalogues, Libraries, Internet, Computer languages, Client-server computing
Abstract
This article details the development of an experimental XML-based online library catalog. The emerging technology of XML, and its early implementation in Microsoft Internet Explorer 5, allowed for the development of an application employing the client-side processing of XML with JavaScript. But slow implementation of XML by other browser vendors, and a tendency towards the slow adoption of the newest browser versions by users, demanded an application employing server-side processing of the XML. Now in its third version, XMLCat demonstrates the viability of this approach, and points to possibilities for its future development.
The genesis of the XMLCat Library Catalog began in the Spring of 1999 when I was Librarian at the Middle East Institute in Washington, DC. I was using Inmagic DB/Textworks to maintain the library's online catalog and wanted to place it on the Web, both as a convenience to patrons, and as a way to cut down on the number of phone inquiries. Although Inmagic has very good software (WebPublisher) for placing databases (or textbases, as they call them) on the Web, the more than $5,000 price tag was beyond the meager budget of my little library. And as I discovered on the Inmagic listserv, there were plenty of smaller libraries in the same situation as mine.
I wanted, therefore, to come up with a low-cost solution (or no-cost, other than my time) that would not require a great deal of programming expertise, and what I knew of the emerging XML standard seemed like it might provide the answer. I had seen from my experience encoding texts in TEI-Lite (an SGML application), and using Softquad's Panorama software, an SGML publisher/viewer, that encoding texts in this fashion makes them highly searchable. I had observed that documents marked up in SGML - and by extension XML, a simplified version of SGML for the Web - with their highly hierarchical data structures, were analogous to relational databases, although seeming to contain far richer possibilities. All that was required to mine the wealth of data contained in these documents were adequate software tools.
The development of XMLCat: client-side version
In the Spring of 1999, Microsoft had just released its Internet Explorer version 5.0, the first XML-aware Web browser. With this release, containing both an XML parser and an XSL processor, the possibility of not only designing an XML-encoded library catalog, but of making it available to the public on the Web, had arrived. But I did not have the skills necessary for achieving this until the appearance, soon thereafter, of what was probably the first book on using XML with IE5, published by Microsoft Press. In XML in Action (Pardi, 1999), I gained an understanding of how to use something called the Document Object Model (DOM) - a representation, contained in memory, of the XML's tree-structure - to walk through the document hierarchy, searching for data within specific XML elements, analogous to fields in a database. This technique seemed tailor-made for library applications. Since it relied heavily on the use of JavaScript for processing the XML, I then picked up a copy of Danny Goodman's JavaScript Bible (1998) to deepen my knowledge of this scripting language, and I was ready to write my application. The only major drawback to this approach, that I could see, was that it required the user to have IE5 installed on their machine, compounded by the fact that Netscape owned more than half of the browser market, and by the notorious slowness of users in adopting the latest versions of browsers. But if that was the price of being on the "cutting edge," so be it, and XML promised, in due course, to become the new standard for encoding documents for the Web. Universal browser implementation of XML, it seemed, couldn't be far off.
First I tackled the creation of the XML. All of my bibliographic records were contained in DB/Textworks, so it seemed just a matter of outputting them from the database as XML. I accomplished this using the built-in report writing feature of the Inmagic software. For simplicity, I wrote a DTD (Document Type Definition) that replicated the structure and field names of my database (Figure 1), and constructed a form to output XML conforming to this DTD. After a bit of experimentation, I was able to export all of the records in the database (or any subset of these records) in well-formed, valid XML. The only problem here was that the extended ASCII (or special) characters were passed along to the XML intact. This meant they would have to be converted to character entities before the XML could be parsed by IE5 or any other XML parser. I had to avoid converting the angle brackets ("<" and ">") since these are used for the tags in XML. To complicate matters, what if the bibliographic records themselves contained angle brackets? These obviously needed to be converted to character entities, otherwise they were bound to cause problems in displaying and processing the XML. To get around these problems, I decided to convert all angle brackets within the database records themselves, using the global search and replace function of DB/Textworks. After exporting the XML, I ran it through a utility called ansi2ent (written in the Omnimark language, and requiring the free Omnimark software), which I modified to omit conversion of angle brackets. I have since written an ANSI-to-entity converter in the Python programming language which gives one the option of converting angle brackets or leaving them intact. A self-installing version is available at http://elementarts.com/ansicon.html
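The core of such a converter can be sketched in a few lines of Python. The function below is an illustrative sketch of the technique only, not the actual ansicon utility; its name and default behavior are invented:

```python
def ansi_to_entities(text, convert_angle_brackets=False):
    """Replace extended (non-ASCII) characters with numeric character
    entities so the XML can be parsed. Angle brackets are left alone
    by default, so that existing markup survives; converting them is
    appropriate only for raw field data."""
    out = []
    for ch in text:
        if convert_angle_brackets and ch in "<>":
            out.append("&lt;" if ch == "<" else "&gt;")
        elif ord(ch) > 127:  # extended/special character
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `ansi_to_entities("<Title>Café</Title>")` yields `<Title>Caf&#233;</Title>`, with the markup untouched and the accented character converted.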
My first version of XMLCat (Figure 2), available at http://www.elementarts.com/xmlcat/search.html, took advantage of IE5's XML implementation in order to build an entirely client-side application, using JavaScript to process the XML. A notice at the top of the search screen informs the would-be user: "You will need Internet Explorer 5 to use this online catalog." As mentioned previously, the disadvantage of this approach is browser dependency, something I felt would be remedied in due course as all major browsers began to implement XML. However, when compared to library databases employing CGI and other server-side technologies, this disadvantage was offset by such features as a higher retrieval rate, the ability to instantly manipulate and reformat data, and the reduction of the load on the server. As I soon discovered, another downside was the time required to download the XML file to the browser. The earliest books on designing XML applications all used client-side JavaScript to process XML, and none warned of this drawback. Their examples worked nicely, but used record sets that were too small for any meaningful bibliographic purpose. My first XML file consisted of a relatively modest 734 records, and at 750K, took nearly two minutes to download with a 28.8K connection, or about a minute with a 56K connection. To alert the user, the following caveat appeared at the top of the search screen: "Please wait for XMLCAT to load (about 2 minutes) before searching." Another message, "Please wait while the catalog loads!", scrolls across the status bar, ending in the words "Catalog loaded" when loading into memory is complete. But I could see from the beginning that this not insignificant delay had the potential for trying the patience of the user.
The search screen attempts to incorporate Boolean and field-specific searching in a very simple interface, based on two sets of radio buttons. In the first set, the user has a choice among "Any word," "All words," and "Exact Phrase" (corresponding to the Boolean "or," the Boolean "and," and no Boolean operator); in the second set, among "Title," "Author," "Subject," and "All fields," the last of which searches the text of the entire record.
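The mapping from these radio-button choices to matching logic can be sketched as follows. This is an illustrative Python version of the logic only (the original used JavaScript); the function name and mode labels are invented:

```python
def matches(record_text, terms, mode):
    """Apply the search-screen options to one record's text.

    mode is "any" (Boolean OR), "all" (Boolean AND), or
    "phrase" (exact phrase); matching is case-insensitive.
    """
    text = record_text.lower()
    if mode == "phrase":
        return " ".join(terms).lower() in text
    hits = (t.lower() in text for t in terms)
    return any(hits) if mode == "any" else all(hits)
```

A search for "middle east" with "All words" selected would then succeed against a record containing both words anywhere in its text, while "Exact Phrase" would require the two words to appear adjacently.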
As mentioned earlier, once the XML is loaded into memory, searches are completed very quickly (in about two or three seconds) since the search is performed entirely in memory, without any need to access either the server or the local hard drive. Search results, in a summarized record format, are displayed in a separate window, with the number of matching records displayed in the status bar (Figure 3). Each record is followed by a button labeled "Full Record." When this is clicked, the full bibliographic record is displayed in a separate window, in a second or less (Figure 3). Both the formatting and the display in separate windows of both search results and full records, were designed to approximate the appearance and functionality of Inmagic DB/Textworks, the initial catalyst for the project.
The underlying structure of the XML is what makes all this possible. Each bibliographic record is enclosed within its own record-level element, and each field of the record (title, author, subject, and so on) is contained within a child element of its own, mirroring the field structure of the original database.
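A record in such a structure might look like the following sketch; the element names and the sample data are purely illustrative, not taken from the actual DTD:

```xml
<!-- Element names and content are illustrative only -->
<Record>
  <Title>A Modern History of the Islamic World</Title>
  <Author>Schulze, Reinhard</Author>
  <Subject>Islamic countries -- History -- 20th century</Subject>
</Record>
```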
The procedures then used to search and display the XML-encoded bibliographic records are fairly simple. Once the XML is loaded into memory, a script is executed which uses the Document Object Model to "walk" through the document tree, examining the text contained in each record's elements and writing any records that match the search criteria to the results window.
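The DOM-walking technique can be sketched with the Python standard library's xml.dom.minidom, which exposes the same Document Object Model interface described here (the original application used client-side JavaScript in IE5; the element names and sample records below are invented):

```python
from xml.dom.minidom import parseString

# A tiny illustrative catalog (element names and data are invented)
CATALOG = (
    "<Catalog>"
    "<Record><Title>Desert Queen</Title><Author>Wallach, Janet</Author></Record>"
    "<Record><Title>The Middle East</Title><Author>Lewis, Bernard</Author></Record>"
    "</Catalog>"
)

def search_catalog(xml_text, term):
    """Walk the DOM tree, record by record, and return the text of
    every record containing the search term (case-insensitive)."""
    doc = parseString(xml_text)
    results = []
    for record in doc.getElementsByTagName("Record"):
        # Concatenate the text of every child element of this record
        text = " ".join(
            node.data
            for field in record.childNodes
            for node in field.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        if term.lower() in text.lower():
            results.append(text)
    return results
```

Because the whole document is already parsed into an in-memory tree, a search of this kind touches no server and no disk, which is why the client-side version responded in a few seconds once loading was complete.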
The development of XMLCat: server-side version
The next challenge facing me in the evolution of XMLCat was the development of a browser-independent version (http://www.elementarts.com/xmlcat/search.asp). The most straightforward way was to use my JavaScript code server-side with Active Server Pages (ASP). Although this requires a Microsoft platform on the server, all of the processing of the XML is done there, transforming the XML into HTML. Then the HTML, usable by any browser on any platform, is served up to the client. And although VBScript is the native scripting language of ASP, JavaScript - or rather JScript, Microsoft's implementation of JavaScript - works equally well. This required only a fairly slight rewriting of the old code and a minimal amount of new code. Also, using capabilities inherent in ASP, I was able to add a simple entry box for user comments (some of which have proved really useful in the re-design process), and counters for total and current users (Figure 4).
The payoff was immediate: although search times lengthened somewhat, to 10-20 seconds, the one- to two-minute wait for the XML to load into memory was eliminated. Happily, the new version permitted catalog access from any browser - Internet Explorer, Netscape, Opera - in any version, and on any platform. Equally importantly, I was able to increase the number of bibliographic records from a mere 750 to more than 2,000 and still have acceptable performance. At 4,600 records, however, a search took about 30 seconds to complete, although in a typical search, results would begin displaying in about eight to ten seconds. The application was now designed to display partial results as it found matching records, rather than all at once at the completion of the search. The idea here was to avoid having users stare at a blank page for 30 seconds or more. One consequence of this was that the record count had to appear at the bottom of the page, when the search was completed (Figure 5). Perhaps even less satisfactory than the 30-second search time was the additional ten to 15 seconds required to retrieve and display the full record from the server. So the second version of XMLCat, although a partial success, clearly still needed more development.
In the third version of XMLCat (available at http://www.elementarts.com/xmlcat/search2.asp), a number of refinements were added, both to the search engine and to the interface. In order to speed up the search process, XMLCat first selects all elements corresponding to the search field, using the getElementsByTagName() method of the DOM to obtain references to, for example, only the title elements when a title search has been requested, rather than walking through the entire document tree. The matching records are then gathered into a new, much smaller XML structure.
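This field-restricted technique can be sketched as follows - again an illustrative Python version using xml.dom.minidom rather than the original server-side JScript, with invented element names and sample data:

```python
from xml.dom.minidom import parseString

SAMPLE = (
    "<Catalog>"
    "<Record><Title>Desert Queen</Title><Author>Wallach, Janet</Author></Record>"
    "<Record><Title>A Peace to End All Peace</Title><Author>Fromkin, David</Author></Record>"
    "</Catalog>"
)

def field_search(xml_text, field, term):
    """Select only the elements for the chosen field with
    getElementsByTagName(), then collect the parent record of each
    one that matches -- avoiding a walk through every node of the tree."""
    doc = parseString(xml_text)
    matches = []
    for el in doc.getElementsByTagName(field):
        text = "".join(n.data for n in el.childNodes
                       if n.nodeType == n.TEXT_NODE)
        if term.lower() in text.lower():
            matches.append(el.parentNode)
    return matches
```

The returned parent nodes carry the complete records, so they can be serialized directly into the smaller results document.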
Other schemes for this restructuring could have been used.
A major departure in methodology was to write the selected XML to a file and then use XSLT (XSL Transformations - the transformation part of XSL) to transform the XML into HTML, as opposed to creating it on the fly with JavaScript, as I had done in the previous version. Most importantly, this greatly speeds up retrieval of the full record, since the application now has to locate it in a much smaller subset of the original XML file. Also, by first writing the search results to a file, there is a much greater capacity to instantly re-sort and otherwise reformat the XML.
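A minimal XSLT 1.0 stylesheet for turning such records into HTML might look like the following sketch; the element names are assumptions carried over from the earlier examples, not taken from the actual application:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Element names (Catalog, Record, Title, Author) are illustrative -->
  <xsl:template match="/Catalog">
    <html>
      <body>
        <xsl:for-each select="Record">
          <p>
            <b><xsl:value-of select="Title"/></b><br/>
            <xsl:value-of select="Author"/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because the stylesheet is declarative, re-sorting or reformatting the results is a matter of changing a few lines (for instance, adding an xsl:sort element inside the for-each), rather than rewriting procedural display code.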
Improvements to the interface include: display of search results and record details in the same browser window as the initial search screen (users found the separate windows to be confusing), enhanced navigation between pages, display of search criteria in the search results screen (Figure 7), and display of the record unique id in the navigation bar of the record display (Figure 8). The record count could now also be brought up to the top of the page (Figure 7).
The future of XMLCat
With three distinct versions, and continuous minor improvements, XMLCat has gone well beyond its roots in Inmagic DB/Textworks. Several methods now exist for converting MARC records directly into XML, and it is equally possible to generate XML from a variety of other databases. I feel that once all browsers are fully XML and XSLT compliant, the ideal configuration for XMLCat and other XML-based library catalogs will be a marriage of the two approaches, server-side and client-side, outlined above. Specifically, the initial searching and processing of the XML is more efficient on the server, while the reformatting, sorting, and other manipulation of the bibliographic information is more efficient and faster on the client.
Until such time that browser implementation of XML and XSLT and consumer adoption of these new browsers reach a level that permits this type of application design, future versions of XMLCat will continue to incorporate these and other new features using only server-side technology.
In order to be practical for even moderately large library catalogs, an XML-based library catalog will have to incorporate the use of some type of database server software. The type will be determined by the structure of the underlying XML (Cagle, 2000). According to Professional XML, "the future of XML is inseparable from database technology," which would include "the ability to generate XML documents automatically from data stored in diverse mediums, and the ability to exchange information from different data stores" (Anderson et al., 2000). The key here is that the XML in the catalog would be updated dynamically from a database, as opposed to the more static approach currently employed by XMLCat.
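The dynamic approach described here can be sketched as follows, assuming a hypothetical relational table of records. This is an illustration of generating catalog XML on demand from a database, not the author's implementation; the table and element names are invented:

```python
import sqlite3
from xml.sax.saxutils import escape

def catalog_xml(conn):
    """Build catalog XML on demand from a relational database,
    escaping special characters as entities along the way.
    (The table and element names here are illustrative.)"""
    parts = ["<Catalog>"]
    for title, author in conn.execute("SELECT title, author FROM records"):
        parts.append(
            "<Record><Title>%s</Title><Author>%s</Author></Record>"
            % (escape(title), escape(author))
        )
    parts.append("</Catalog>")
    return "".join(parts)
```

Because the XML is regenerated from the database at request time, additions and corrections to the catalog appear immediately, with no separate export step of the kind XMLCat currently requires.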
Although a lot more work needs to be done in this area, I hope that this entire project, as preliminary as it is, points the way to and further stimulates development of XML-based library catalogs.
References and further reading
Anderson, R. et al. (2000), Professional XML, Wrox Press, Chicago, IL.
Cagle, K. (2000), "How to find the best XML server for you", XML Magazine, Fall, pp. 48-56.
Flanagan, D. (1998), JavaScript: The Definitive Guide, 3rd ed., O'Reilly, Sebastopol, CA.
Goodman, D. (1998), JavaScript Bible, 3rd ed., IDG Books, Foster City, CA.
Harms, D. and McDonald, K. (2000), The Quick Python Book, Manning Publications, Greenwich, CT.
Harold, E.R. (1999), XML Bible, IDG Books, Foster City, CA.
Hatfield, B. (1999), Active Server Pages for Dummies, 2nd ed., IDG Books, Foster City, CA.
Homer, A. (1999), XML in IE5: Programmer's Reference, Wrox Press, Chicago, IL.
Kay, M. (2000), XSLT: Programmer's Reference, Wrox Press, Chicago, IL.
Miller, D.R. (2000), "XML: libraries' strategic opportunity", netConnect, Summer, pp. 18-22.
Niederst, J. (1999), Web Design in a Nutshell: A Desktop Quick Reference, O'Reilly, Sebastopol, CA.
Pardi, W.J. (1999), XML in Action, Microsoft Press, Redmond, WA.
The author
Paul Yachnes is Manager at the Information Resource Center, Newspaper Association of America, Vienna, Virginia, USA.
Copyright MCB UP Limited (MCB) 2000
