In 1991 and 1992, the Research Libraries Group (RLG) developed a server that implements the main features of ANSI/NISO standard Z39.50-1992, Information Retrieval Service and Protocol. The Z39.50 server allows users of other computers that run Z39.50 client programs to search RLG's bibliographic, authority, and citation databases. This service will be generally available in mid-1993.
In any development project, some of the issues that the development team has to consider are the operating environment of the developed system; the development tools the team will use; the external design of the system, or how it will look to users; and the internal design, or how the programs will look. Then, of course, the coding, testing, and implementing need to be done.
OPERATING ENVIRONMENT
In determining the operating environment for the server, we considered putting it on a separate computer, distinct from the mainframe. Using the UNIX operating system on such a computer might have given us some advantages in flexibility of network connections and in maturity of network software. It also would have taken some of the processing load off the mainframe. In the end, we decided to run the server on RLG's Amdahl mainframe, where the databases are also housed, to avoid running a private protocol between the intermediate system and the mainframe. This choice also yields a simpler system that is easier to maintain because it lives on a single machine.
Initially RLG planned to run the server over a protocol stack that conforms to the ISO Open Systems Interconnection (OSI) model for communication between disparate computer systems. The Z39.50 standard was written to fit within this model as an application-layer protocol. RLG has several years of experience with the Linked Systems Project (LSP), running information-retrieval and record-transfer protocols over a stack that conforms to earlier versions of the ISO standards, and we planned to upgrade those programs to the current standards. During the time we were developing the server, however, other institutions began to express more interest in running Z39.50 over the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. Since many institutions, including RLG, were expanding their use of the Internet, we decided that if our server was going to be used, it should also run over TCP/IP. We learned that the mainframe-based TCP/IP support within the Stanford Timesharing System used by RLG could support the multiuser environment we anticipated, through its session-handling and time-sharing facilities. We then switched our focus to the TCP/IP environment. The agreements reached by Z39.50 implementors in the Coalition for Networked Information's Z39.50 Implementors Testbed (ZIT) specify the use of only one function from the protocol layers between the application layer and the transport layer: at the presentation layer, the Basic Encoding Rules (BER) defined in ISO 8825 are used to carry the application protocol data units (APDUs) defined in Z39.50.
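To make the BER point concrete, here is a minimal Python sketch of how one element is laid out as identifier, length, and contents octets. It assumes tag numbers below 31 and definite lengths only; the context-specific tag [1] in the example is illustrative and is not taken from the Z39.50 APDU definitions.

```python
# Tag classes occupy the top two bits of the identifier octet.
UNIVERSAL, APPLICATION, CONTEXT, PRIVATE = 0x00, 0x40, 0x80, 0xC0
CONSTRUCTED = 0x20  # bit set for elements that contain other elements

def encode_length(n: int) -> bytes:
    """Definite length: short form below 128, otherwise long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def encode_tlv(tag_class: int, constructed: bool, tag: int, contents: bytes) -> bytes:
    """Encode one element as identifier, length, contents (tag numbers below 31 only)."""
    if not 0 <= tag < 31:
        raise ValueError("high-tag-number form is not handled in this sketch")
    identifier = tag_class | (CONSTRUCTED if constructed else 0) | tag
    return bytes([identifier]) + encode_length(len(contents)) + contents

def encode_integer(value: int, tag_class: int = UNIVERSAL, tag: int = 2) -> bytes:
    """INTEGER contents in minimal two's-complement form (universal tag 2 by default)."""
    size = (value if value >= 0 else ~value).bit_length() // 8 + 1
    return encode_tlv(tag_class, False, tag, value.to_bytes(size, "big", signed=True))

if __name__ == "__main__":
    # Illustrative only: INTEGER 5 inside a constructed, context-specific [1] element.
    print(encode_tlv(CONTEXT, True, 1, encode_integer(5)).hex())  # a103020105
```

Nesting falls out naturally: a constructed element's contents are simply the concatenated encodings of its children.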
DEVELOPMENT TOOLS
Several developers of Z39.50 clients and servers began by using the ISO Development Environment (ISODE), a publicly available set of programs and accompanying documentation that implements the upper layers of the ISO communications protocols over TCP/IP for UNIX. Generally, they found this software unwieldy and difficult to use. Under the test-bed group's agreements, only its BER encoding and decoding routines proved useful. As their implementations progressed, OCLC, the University of California at Berkeley, and Stanford University provided code that has been widely used by others implementing under UNIX.
Having decided to operate the server on the mainframe, RLG chose to develop it in Pascal/VS, its usual development language. We looked for commercial ASN.1 compilers (programs that take the abstract syntax definition of the APDUs as input and generate the data structures and the encoding), but none was available for Pascal. So we wrote our own encoding and decoding routines, as did several other developers. These routines are specific to the Z39.50 APDUs and are not as generalized as a compiler would be, but it is not hard to add new PDUs. In addition, our routines are more optimized for speed than compiler-generated code can be.
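RLG's routines were written in Pascal/VS and tailored to the Z39.50 APDUs; the Python sketch below is only a generic illustration of the decoding side of such hand-written code. It reads one element from a buffer and returns it with the offset of the next element. It assumes definite lengths and tag numbers below 31, and the names `Element`, `decode_tlv`, and `decode_all` are hypothetical.

```python
from typing import NamedTuple

class Element(NamedTuple):
    tag_class: int      # 0x00 universal, 0x40 application, 0x80 context, 0xC0 private
    constructed: bool
    tag: int
    contents: bytes

def decode_tlv(buf: bytes, offset: int = 0) -> tuple[Element, int]:
    """Decode one definite-length element at offset; return it and the next offset."""
    if offset + 2 > len(buf):
        raise ValueError("truncated element header")
    ident = buf[offset]
    tag = ident & 0x1F
    if tag == 0x1F:
        raise ValueError("high-tag-number form is not handled in this sketch")
    first = buf[offset + 1]
    pos = offset + 2
    if first < 0x80:                 # short form: the octet is the length
        length = first
    elif first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:                            # long form: low seven bits count the length octets
        count = first & 0x7F
        if pos + count > len(buf):
            raise ValueError("truncated length")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    end = pos + length
    if end > len(buf):
        raise ValueError("contents run past end of buffer")
    return Element(ident & 0xC0, bool(ident & 0x20), tag, buf[pos:end]), end

def decode_all(buf: bytes) -> list[Element]:
    """Split a buffer (e.g. a constructed element's contents) into its child elements."""
    items, offset = [], 0
    while offset < len(buf):
        element, offset = decode_tlv(buf, offset)
        items.append(element)
    return items
```

An APDU-specific decoder would then check each child's tag against what the APDU definition expects, which is where adding a new PDU mostly means adding a new table of expected tags.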
Similarly, because we were writing in mainframe Pascal, we could not make use of the development efforts of other implementors who were using UNIX. Other institutions doing mainframe implementations chose to use the IBM OSI Communications Subsystem. Because of the differences in computing environments, these implementors have not been able to share code.
Historical precedents for servers exist at RLG, in the LSP context and elsewhere; although the Z39.50 server code grew out of this experience, some of the precedents were discarded. The magnitude of the changes between the 1988 and 1992 versions of the Z39.50 standard also motivated us to write some new code.
EXTERNAL DESIGN
In a client-server environment, the external design is specified by the protocol used between the client and the server, in this case, Z39.50. The interaction of protocol data units is specified in state tables in the standard. The state tables define, for any given state of the server, which PDUs can be expected from the client or generated by the server. Implementing the protocol is largely a matter of selecting among the options defined by the protocol and determining such things as message sizes and diagnostics.
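A state table translates directly into data. The sketch below is a deliberately simplified, hypothetical server-side table with only three transitions; the tables in the standard cover more states and events, and the state names here are invented for illustration.

```python
# Simplified, hypothetical server-side state table; not the standard's own tables.
STATE_TABLE = {
    # (current state, incoming PDU): (PDU the server generates, next state)
    ("awaiting_init", "initRequest"): ("initResponse", "open"),
    ("open", "searchRequest"): ("searchResponse", "open"),
    ("open", "presentRequest"): ("presentResponse", "open"),
}

class ProtocolError(Exception):
    """Raised when a PDU arrives that the table does not allow in the current state."""

class ServerStateMachine:
    def __init__(self) -> None:
        self.state = "awaiting_init"

    def handle(self, pdu: str) -> str:
        """Look up the transition, move to the next state, and return the response PDU name."""
        try:
            response, next_state = STATE_TABLE[(self.state, pdu)]
        except KeyError:
            raise ProtocolError(f"{pdu} not allowed in state {self.state}") from None
        self.state = next_state
        return response
```

Keeping transitions in a table rather than in nested conditionals mirrors the way the standard presents them, so the code can be checked against the document row by row.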
One major area of effort for Z39.50 implementors is mapping the attributes that specify the parameters of a search to and from the underlying database management software and indexing schemes. Initially RLG took a very broad approach: we tried to support each attribute as well as we could, in some cases using capabilities of our DBMS that are not even used by our mainstream service, the Research Libraries Information Network (RLIN). We are now rethinking that approach, as some of the searching is inefficient and leads to unacceptable response times.
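The core of attribute mapping can be sketched as a lookup from bib-1 use attribute values to local indexes, with a diagnostic when there is no match. The use attribute numbers (4 Title, 7 ISBN, 21 Subject heading, 1003 Author) and diagnostic 114 ("Unsupported Use attribute") come from the bib-1 sets; the local index codes are hypothetical placeholders, since RLIN's actual index names are not given here.

```python
# Bib-1 use attribute values.
TITLE, ISBN, SUBJECT_HEADING, AUTHOR = 4, 7, 21, 1003

# Hypothetical local index codes standing in for real RLIN indexes.
USE_TO_INDEX = {
    TITLE: "TI",
    AUTHOR: "PN",
    SUBJECT_HEADING: "SU",
    ISBN: "NUM",
}

UNSUPPORTED_USE_ATTRIBUTE = 114  # bib-1 diagnostic "Unsupported Use attribute"

class Diagnostic(Exception):
    """A Z39.50-style diagnostic: a numeric code plus additional information."""
    def __init__(self, code: int, addinfo: str) -> None:
        super().__init__(f"diagnostic {code}: {addinfo}")
        self.code = code
        self.addinfo = addinfo

def map_use_attribute(use: int) -> str:
    """Return the local index for a bib-1 use attribute, or raise a diagnostic."""
    try:
        return USE_TO_INDEX[use]
    except KeyError:
        raise Diagnostic(UNSUPPORTED_USE_ATTRIBUTE, str(use)) from None
```

A narrower table like this one is one way to avoid the inefficient searches described above: attributes that map only onto slow DBMS capabilities can simply be left out and reported as unsupported.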
One implementation difficulty caused by the decision to run over TCP/IP was the lack of a clear definition of how to terminate a client-server session. The standard relies on termination services provided by another application-layer standard, the Association Control Service Element (ACSE), which is not part of what the implementors agreed to use over TCP/IP. The implementors ultimately decided to use a TCP close to shut down sessions.
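With a TCP close as the termination signal, the server's receive loop treats a zero-length read as the end of the session. The Python sketch below, using a local socket pair in place of a real network connection, illustrates only that close detection; the function name `serve_session` is hypothetical, and a real server would decode and answer each APDU as it arrives rather than just collecting bytes.

```python
import socket

def serve_session(conn: socket.socket) -> bytes:
    """Collect bytes from one client until it closes its side of the TCP connection.

    A zero-length recv() means the peer has closed; under the implementors'
    agreement that is the signal that the session is over.
    """
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:            # orderly close from the client
                break
            chunks.append(data)
    return b"".join(chunks)

if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    client_end.sendall(b"\x02\x01\x05")   # stand-in for an encoded APDU
    client_end.close()                    # the client ends the session
    print(serve_session(server_end))
```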
INTERNAL DESIGN
Internally, the RLG Z39.50 server has two major modules, a protocol machine and a search engine. This design is fairly common among the implementors. The protocol machine receives and generates the protocol data units. It analyzes incoming PDUs and passes the necessary values to the search engine; it receives information about completed searches from the search engine and composes the response PDUs. The search engine communicates with the DBMS, which actually carries out the searches and retrieves the records. The search engine also communicates with existing routines called Import/Export Services, which convert records from the internal DBMS format to the USMARC format. The protocol machine communicates with the BER encoding and decoding routines, which in turn work with a third set of modules called Communications Services, whose routines "talk" to the mainframe TCP/IP support.
The internal design separates the protocol machine and the search engine from each other; they communicate by messages that could easily be serialized in the same way that PDUs are serialized. Because of this design decision, it would be possible in the future to port the protocol-machine code to a different system to achieve the advantages of a more robust TCP environment, decreased mainframe processing load, and easier scalability as use grows. Thus the internal design reflects some of the early considerations about the operating environment.
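The separation can be illustrated with two plain message types that cross the module boundary only in serialized form. In this hypothetical Python sketch, JSON stands in for whatever serialization would be used if the modules were split across machines, and the message fields and toy catalog are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class SearchMessage:
    """What the protocol machine passes to the search engine (hypothetical fields)."""
    database: str
    index: str
    term: str
    result_set: str

@dataclass
class SearchResult:
    """What the search engine returns for the protocol machine to put in a response."""
    result_set: str
    count: int

def serialize(message) -> bytes:
    # JSON is a stand-in; the point is that only bytes cross the module boundary.
    return json.dumps(asdict(message)).encode("utf-8")

def search_engine(raw: bytes, catalog: dict[str, dict[str, list[str]]]) -> bytes:
    """Toy search engine: counts index entries (database -> index -> entries) containing the term."""
    msg = SearchMessage(**json.loads(raw))
    entries = catalog.get(msg.database, {}).get(msg.index, [])
    hits = [e for e in entries if msg.term.lower() in e.lower()]
    return serialize(SearchResult(msg.result_set, len(hits)))
```

Because `search_engine` accepts and returns bytes, moving the protocol machine to another computer would change only how those bytes travel, not the interface between the modules.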
As much information as possible is maintained outside the server code. This includes items such as PDU parameters (message sizes, element set names, version numbers, and options available), record conversion tables, database names and aliases, and, most important, the mapping from the use attributes defined in the standard to RLIN indexes. This design decision enables us to make several kinds of changes in the way the server operates without having to recode or recompile any programs. It facilitates prototyping of new capabilities as well as maintenance.
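Keeping such values outside the code might look like the following Python sketch, which reads PDU parameters, database aliases, and use attribute mappings from a configuration text. The file layout, section names, keys, and alias values are all hypothetical; only the bib-1 use attribute numbers and the element set names F (full) and B (brief) come from the standard.

```python
import configparser

# Hypothetical configuration; none of these section or key names come from RLG.
SAMPLE_CONFIG = """
[pdu]
preferred_message_size = 65536
maximum_record_size = 1048576
element_set_names = F, B

[databases]
BOOKS = RLIN-BKS
SERIALS = RLIN-SER

[use_attributes]
4 = TI
1003 = PN
21 = SU
"""

def load_server_config(text: str) -> dict:
    """Parse the configuration text into the values the server consults at run time."""
    parser = configparser.ConfigParser()
    parser.optionxform = str            # keep database aliases case-sensitive
    parser.read_string(text)
    pdu = parser["pdu"]
    return {
        "preferred_message_size": pdu.getint("preferred_message_size"),
        "maximum_record_size": pdu.getint("maximum_record_size"),
        "element_set_names": [n.strip() for n in pdu["element_set_names"].split(",")],
        "databases": dict(parser["databases"]),
        "use_attributes": {int(k): v for k, v in parser["use_attributes"].items()},
    }
```

Changing a mapping or a message size then means editing the file and reloading it, with no recompilation.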
TESTING
Because the server combined existing components of its operating environment in new ways, testing was a lengthy process of tracking down problems in several different pieces of code, some of it not under RLG's control. This caused significant delays. The test environment included a Z39.50 client that lets testers control elements of the PDUs that a well-designed interface would normally hide from the end user. Thus a tester can generate error conditions for the server to handle. There was a fringe benefit of this work: we have been able to use this client to demonstrate what goes on behind an interface, which clarifies the scope and intent of the Z39.50 protocol.
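The test client is not described in detail, so the following is only a hypothetical Python harness in the same spirit: it takes a well-formed encoding, derives deliberately broken variants, and records whether the server side accepted each one. The `valid_tlv` function is a tiny stand-in for a real decoder and assumes a single short-form element.

```python
from typing import Callable

def valid_tlv(buf: bytes) -> bool:
    """Stand-in decoder: True only for exactly one short-form definite-length element."""
    if len(buf) < 2 or buf[1] & 0x80:
        return False
    return len(buf) == 2 + buf[1]

def malformed_variants(good: bytes) -> dict[str, bytes]:
    """Derive broken copies of a well-formed short-form element."""
    return {
        "length_too_long": good[:1] + bytes([good[1] + 1]) + good[2:],
        "truncated_contents": good[:-1],
        "trailing_garbage": good + b"\x00",
    }

def run_error_tests(good: bytes, server_accepts: Callable[[bytes], bool]) -> dict[str, bool]:
    """Report, for each broken variant, whether the server accepted it (it should not)."""
    return {name: server_accepts(pdu) for name, pdu in malformed_variants(good).items()}
```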
Following internal testing, RLG opened up the server to testing by members of the test-bed group. By and large their client programs were able to communicate with our server successfully. Many of the problems that did arise revolved around differing interpretations of the Basic Encoding Rules. Also there is a tendency among implementors to read only the ASN.1 definitions of the PDU parameters in the standard and to ignore the textual description of the parameters, which frequently contains guidance for applying the parameters.
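One common source of such disagreements is that BER permits more than one encoding of the same length: short form (`01`), long form even when short form would do (`81 01`), and, for constructed elements, indefinite length (`80`, terminated by the end-of-contents octets `00 00`). A decoder that accepts only the forms its own encoder produces will reject valid PDUs from other implementations. This Python sketch, assuming tag numbers below 31, computes an element's total size under all three forms.

```python
from typing import Optional

def read_length(buf: bytes, pos: int) -> tuple[Optional[int], int]:
    """Return (length, position after the length octets); None means indefinite length."""
    first = buf[pos]
    if first < 0x80:
        return first, pos + 1            # short form
    if first == 0x80:
        return None, pos + 1             # indefinite form (constructed elements only)
    count = first & 0x7F                 # long form, possibly with leading zero octets
    return int.from_bytes(buf[pos + 1:pos + 1 + count], "big"), pos + 1 + count

def element_end(buf: bytes, offset: int = 0) -> int:
    """Return the offset just past the element that starts at offset."""
    constructed = bool(buf[offset] & 0x20)
    length, pos = read_length(buf, offset + 1)
    if length is not None:
        return pos + length
    if not constructed:
        raise ValueError("indefinite length on a primitive element")
    # Walk the child elements until the end-of-contents marker 00 00.
    while buf[pos:pos + 2] != b"\x00\x00":
        if pos >= len(buf):
            raise ValueError("missing end-of-contents")
        pos = element_end(buf, pos)
    return pos + 2
```

All three inputs in the tests encode legitimate elements, and a lenient decoder should accept each of them.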
Initially the RLG server provided access only to test versions of our databases. When we made the full databases available to the test-bed group, implementors of client programs encountered new issues in accessing very large databases. It is easy for a client to generate a general search, and some seemingly simple searches retrieve large result sets. Managing these result sets on behalf of the user requires care in programming the client.
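One way a client can manage a large result set is to retrieve it in pieces with successive Present requests, each naming a resultSetStartPoint (positions start at 1) and a numberOfRecordsRequested. This Python sketch only computes those pairs; the batch size of 10 and the optional cap on total records are assumptions about client policy, not requirements of the standard.

```python
from typing import Iterator, Optional

def present_batches(result_count: int, batch_size: int = 10,
                    max_records: Optional[int] = None) -> Iterator[tuple[int, int]]:
    """Yield (resultSetStartPoint, numberOfRecordsRequested) for successive Present requests.

    max_records lets the client cap how much of a very large result set it fetches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    total = result_count if max_records is None else min(result_count, max_records)
    start = 1                                  # result set positions are 1-based
    while start <= total:
        count = min(batch_size, total - start + 1)
        yield start, count
        start += count
```

Because the generator is lazy, a client can stop requesting records as soon as the user stops paging.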
An area that still awaits resolution by the community of implementors concerns the mapping of queries, both from native searching languages to the intersite query and the bib-1 use attributes, and from the intersite queries to the searching languages of the database maintainers. Because systems have indexed similar types of data differently, a search entered on one system may not yield the expected results when carried out on another system. Resolving these issues will require a great deal of communication between librarians and other expert users of the systems and system implementors.
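The first half of this mapping, from a native search language to the intersite query, can be sketched as a translation table. In this Python sketch the `FIND <index> <term>` command syntax and the native index codes are hypothetical, and the output is a simplified structure rather than an encoded Type-1 query; attribute type 1 is the bib-1 Use attribute, and 4, 1003, and 21 are the bib-1 values for Title, Author, and Subject heading.

```python
# Hypothetical native index codes mapped to bib-1 use attribute values.
NATIVE_TO_USE = {"TI": 4, "AU": 1003, "SU": 21}
USE_ATTRIBUTE_TYPE = 1  # bib-1 attribute type 1 is Use

def native_to_type1(command: str) -> dict:
    """Translate a command such as 'FIND TI moby dick' into a simplified Type-1 term."""
    words = command.split()
    if len(words) < 3 or words[0].upper() != "FIND":
        raise ValueError(f"unrecognized command: {command!r}")
    index, term = words[1].upper(), " ".join(words[2:])
    if index not in NATIVE_TO_USE:
        raise ValueError(f"no bib-1 equivalent for native index {index}")
    return {"attributes": [(USE_ATTRIBUTE_TYPE, NATIVE_TO_USE[index])], "term": term}
```

The table is the easy part; deciding whether a native "TI" index really covers what another system means by bib-1 Title is the judgment that requires librarians and implementors to work together.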
IMPLEMENTATION
Since July 1992, we have had literally thousands of connection attempts to our server from perhaps fifty different client addresses. Among the test-bed group, between fifteen and twenty institutions have client programs that have connected successfully and regularly. Many of the connection attempts have been from Wide Area Information Server (WAIS) clients; to date, WAIS supports only the 1988 version of Z39.50, which is not compatible with the version supported by the RLG server.
Users at Pennsylvania State University can search RLIN files using the Z39.50 client in LIAS (see accompanying article). This is the first instance of use of RLG's server by people who are not library staff.
OTHER DEVELOPMENTS
Also in 1992, RLG developed a patron-oriented search service called Eureka. Eureka lets users search RLIN databases without training or documentation. Eureka is implemented as a Z39.50 client that runs on RLG's mainframe, communicating with the server running on the same machine. Additional indexes that RLG created for Eureka will be made available to Z39.50 clients as well. We implemented for Eureka some services that are still under development for Z39.50, notably Scan (index browsing) and Sort. This system was developed on personal computers, though it runs on a mainframe. We are in the process of porting the Eureka client software to run under UNIX. When this is complete, the client could run on a computer at RLG or at a remote location.
CONCLUSION
While RLG developed its Z39.50 server over a period of two years, there is now experience, and some code, available that will shorten the development time for future implementors. As is frequently the case with development projects, circumstances changed during the development that made us alter some of our initial decisions, and the server is still evolving. Version 3 of the standard is now under development, and we will be updating the server to support it. However, the Z39.50 vision of being able to use a single interface to search databases housed on multiple remote systems is becoming a reality, and that is very gratifying.
Lennie Stovel is Manager, Intersystem Applications, Research Libraries Group. Rich Fuchs designed RLG's Z39.50 server and coded the Protocol Machine and the BER encoding and decoding routines. Jui-wen Chang coded the search engine.
Copyright American Library Association Jun 1993