In 1953, Harley E. Tillitt began experiments on storage and searching of a coordinate index using an IBM 701 Calculator. The program was operational in 1954, searching the library's coordinate index, which had been converted to a truncated, machine-readable form. A paper describing the system, believed to be the first report on library-related computerization, was presented at an IBM Computation Seminar in 1954. That paper is reprinted here.
Harley E. Tillitt began experiments on storage and searching of a coordinate index using an IBM 701 Calculator soon after the machine arrived in September 1953 at the then Naval Ordnance Test Station, China Lake, California. By April 1954 Mr. Tillitt's program was operational, searching the library's coordinate index, which had been converted to a truncated, machine-readable form. In early May 1954 Mr. Tillitt presented the following paper, describing his system, at an IBM Computation Seminar at Endicott, New York. The paper is believed to be the first report on library-related computerization and is here printed for the first time because of its historic importance.
At the U. S. Naval Ordnance Test Station, an attempt has been made to use the 701 Calculator as a tool in the task of searching library files for documents referring to special subjects. The present system includes only reports which have been written in certain agencies throughout the country and does not include periodicals or books. Furthermore, the subjects are for the most part related to the development and testing of items of naval ordnance.
In any organization that includes research and development in its functions, it is economical in both time and money to be able to determine what has been done in a field before new programs are started. Scientists and engineers, therefore, are anxious to learn what is in the literature prior to starting some new task. Frequently, however, the labor of searching library files is so great or so unprofitable that it is either not done, or done very incompletely.
One of the reasons for the difficulty in searching is that the cataloging of reports may be such that important aspects of their contents are obscured. For example, the following report, Equilibrium Composition and Thermodynamic Properties of Combustion Gases, could logically be cataloged under one or more of several subject headings, which might or might not be appropriate, depending somewhat upon the technical skill of the cataloger. This particular report was filed in the China Lake Technical Library under two subjects: Gases and Physics. Both of these are standard Library of Congress subject headings, and are more or less descriptive of the report.
However, under each subject heading there were several hundred other reports filed, a situation that in itself could discourage searching. More serious was the fact that scientists interested in this category of ordnance development might be equally likely to search under the subjects Combustion or Physical Chemistry. Most serious of all, the cataloging gave no indication that one of the main contributions of the report was to describe a numerical method by which the thermodynamic properties were computed. As a result, for one reason or another, the report was in certain respects lost as far as many interested individuals were concerned.
To avoid some of the difficulty of cataloging documents by subject heading, a system can be used that depends upon a document being described by several single terms called descriptors.(1) In the library application of this system, there is a card for each descriptor. As a document comes to the library it is given an acquisition serial number and this number is entered upon as many different descriptor cards as seem necessary to describe the document.
In the example above, if the serial number of the report had been 1234, this number might have been entered on the following cards: Thermodynamics; Combustion; Gases; Computation; Fuel; Impulse; Pressure; Temperature; Entropy; Enthalpy; Adiabatic. Some descriptors do not seem related to the title, but could have been assigned after a brief inspection of the contents by the cataloger. To use such a system when information of a certain type is desired, an individual would list descriptors that would, in his opinion, describe his needs. These descriptor cards would then be pulled from the files and be visually compared for numbers that matched on the several cards. Reports corresponding to these matching serial numbers would then be withdrawn.
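The hand procedure amounts to intersecting lists of serial numbers, one list per descriptor card. A minimal modern sketch of that idea follows. The card contents are hypothetical (only serial number 1234 comes from the example above), and `coordinate_search` is an illustrative name, not part of the original system.

```python
from functools import reduce

# Hypothetical miniature coordinate index: each descriptor "card" lists the
# acquisition serial numbers entered on it. Only 1234 comes from the text.
cards = {
    "Thermodynamics": [17, 1234, 2210],
    "Combustion": [88, 1234, 3050],
    "Computation": [1234, 2210, 3050],
    "Gases": [17, 88, 1234],
}


def coordinate_search(cards: dict[str, list[int]], wanted: list[str]) -> list[int]:
    """Return the serial numbers that appear on every requested descriptor card."""
    sets = [set(cards.get(d, [])) for d in wanted]  # a missing card matches nothing
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))


print(coordinate_search(cards, ["Combustion", "Thermodynamics", "Computation"]))
# prints [1234]
```

Comparing the cards "for numbers that matched" is exactly a set intersection; the order in which cards are compared does not change the result.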
The original purpose of the 701 program described here was to mechanize the above procedure, with a view to the possible establishment of a daily schedule for library searching.
In designing the 701 system, attention was given to the current size of the file and the expected growth during the next five years. The two quantities considered were the expected total number of serial numbers and the total number of descriptors. It was estimated that during the next five years there would be no more than 30,000 serial numbers nor more than 5,000 descriptors. Furthermore, it was estimated that in searching for documents on a particular subject no more than eight descriptors would be listed and that any one of these would not have more than 1,000 serial numbers associated with it.
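These planning limits bound how much data a single search can touch. The short calculation below works through them. It assumes, as the paper states later, that each serial number occupies one half word of the 701's 36-bit word; the constant names are illustrative only.

```python
# Planning limits quoted in the text (five-year estimates).
MAX_SERIAL_NUMBERS = 30_000
MAX_DESCRIPTORS = 5_000
MAX_KS_PER_SEARCH = 8
MAX_SERIALS_PER_DESCRIPTOR = 1_000

# Worst case for one search: every selected descriptor group is full.
worst_case_serials = MAX_KS_PER_SEARCH * MAX_SERIALS_PER_DESCRIPTOR

# One serial number per half word, i.e. two per 36-bit 701 word.
worst_case_words = worst_case_serials // 2

# Bits needed to hold the largest expected serial and descriptor numbers.
serial_bits = MAX_SERIAL_NUMBERS.bit_length()
descriptor_bits = MAX_DESCRIPTORS.bit_length()

print(worst_case_serials, worst_case_words, serial_bits, descriptor_bits)
# prints 8000 4000 15 13
```

Both numbers fit comfortably within an 18-bit half word, which is consistent with the half-word recording the paper describes.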
This coordinate index system has only recently been put into use at the Naval Ordnance Test Station, and at the time the 701 programming was started there had been established a list of approximately 2,500 descriptors. New descriptors are being added at the rate of about 100 per month, with an anticipated upper limit of 5,000. On these 2,500 cards there had been recorded a total of about 20,000 numbers describing nearly 4,000 documents, indicating that each serial number was recorded on an average of five different descriptor cards.
At the present time, additions are being made to the system at the rate of about 500 documents per month. Because the catalogers are becoming more experienced with the system and are now using about eight descriptors per document, this currently represents approximately 4,000 additional serial-number entries per month.
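The two rates quoted above follow from simple arithmetic on the figures in the text, checked here:

```python
# Figures from the text.
documents_indexed = 4_000
entries_recorded = 20_000
docs_per_month = 500
descriptors_per_doc_now = 8

# Average number of descriptor cards each serial number appears on so far.
average_descriptors = entries_recorded / documents_indexed

# New serial-number entries per month at the current cataloging practice.
monthly_entries = docs_per_month * descriptors_per_doc_now

print(average_descriptors, monthly_entries)  # prints 5.0 4000
```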
The 701 operations are quite simple and go through nearly the same steps as are required in a normal hand search. These steps are as follows:
1. Install the master tape reel on which the file has been written. (The present arrangement of information on the tape is that each descriptor and associated report serial numbers form a unit record. The unit records are on the tape in order of increasing descriptor number.)
2. Load the searching program, plus from 1 to 20 cards on each of which are punched the 2 to 8 descriptors, called K's, that describe the subject of interest. (After loading, transition is made to the search program itself.)
3. Print the report serial numbers that appeared under all of the selected descriptors.
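The three steps can be sketched as a single pass over a sorted file. This is an illustration under stated assumptions, not Tillitt's program: the tape is modelled as a Python list of `(descriptor, serials)` unit records in increasing descriptor order, and the early stop once the tape passes the largest K is an assumption consistent with the paper's remark that search time depends on where the groups lie on the tape. The names `select_groups` and `search` are hypothetical.

```python
from collections.abc import Iterable, Iterator

# One unit record on the master tape: (descriptor number, report serial numbers).
UnitRecord = tuple[int, list[int]]


def select_groups(tape: Iterable[UnitRecord], ks: list[int]) -> Iterator[UnitRecord]:
    """Read the tape once, yielding each group whose descriptor is a requested K.

    Records are in increasing descriptor order, so reading can stop once every
    K has been found or the tape has moved past the largest K.
    """
    wanted = set(ks)
    highest = max(wanted)
    for descriptor, serials in tape:
        if descriptor > highest:
            break  # no later record can match
        if descriptor in wanted:
            wanted.discard(descriptor)
            yield descriptor, serials
            if not wanted:
                break  # every K has been found


def search(tape: Iterable[UnitRecord], ks: list[int]) -> list[int]:
    """Return the serial numbers filed under all of the K's on one search card."""
    if not 2 <= len(ks) <= 8:
        raise ValueError("each search card carries 2 to 8 K's")
    groups = [set(serials) for _, serials in select_groups(tape, ks)]
    if len(groups) < len(set(ks)):
        return []  # some K has no group on the tape, so nothing matches them all
    return sorted(set.intersection(*groups))


# Hypothetical tape contents, in increasing descriptor order.
tape = [(3, [10, 20, 30]), (7, [20, 30, 40]), (9, [5, 20]), (12, [20, 30])]
print(search(tape, [7, 3]))      # prints [20, 30]
print(search(tape, [3, 9, 12]))  # prints [20]
```

A loading of up to twenty search cards would simply call `search` once per card.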
The following brief descriptions show the purpose of several programs used in the system.
Program A: Read into electrostatic storage as many decimal cards as required for a descriptor group.
Program B: Compute a check sum for a descriptor group and write it plus the group on tape.
Program C: Read a group, including its descriptor, from tape and match the descriptor against the 2 to 8 K's.
Program D: If a group descriptor matches any K write the group on drum.
Program E: After either 8 groups have been written on drum or all of the K's have been exhausted, read the first two groups from drum and match their report serial numbers. Store the matches where the first group had been.
Program F: Read subsequent groups from drum one at a time and continue to match those that remain with each new group, storing these where the first group had been.
Program G: When all groups have been read from drum and matched, print the final matches that remain.
Program H: Read a group from cards. Determine whether this is an addition to a group already on the tape or a group having a descriptor not previously used.
Program I: If the group in H is an addition to an old group, produce a new check sum for the old group plus the addition. If the group in H is new, produce a check sum for it. In either case collate the new information with that on the tape and write it on a second tape.
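Programs E through G match groups two at a time rather than all at once, so the running result never needs more room than the first group occupied. One way to express that pairwise matching today is a two-pointer merge over sorted lists, sketched below. Sorting the serial numbers within each group is an assumption (the paper does not say how groups were ordered internally), and the function names are illustrative.

```python
def match_sorted(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer merge: keep the serial numbers present in both sorted lists."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def match_groups(groups: list[list[int]]) -> list[int]:
    """Match the first two groups, then fold in each later group in turn.

    The running result takes the first group's place and can only shrink.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups to match")
    remaining = match_sorted(sorted(groups[0]), sorted(groups[1]))  # like program E
    for group in groups[2:]:  # like program F
        if not remaining:
            break  # nothing left that could match
        remaining = match_sorted(remaining, sorted(group))
    return remaining  # program G would print these


print(match_groups([[12, 4, 9, 1], [4, 9, 20], [30, 9, 4]]))  # prints [4, 9]
```

Stopping once the result is empty is a small optimization added in the sketch; the paper does not mention it.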
It can be seen that A and B will be used only once, when the system is started; C through G whenever a search is made; and H and I when additions are made as required by the continued acquisition of documents.
One of the objectives of the experiment is to reduce the time spent entering new serial numbers onto cards. This is a hand job requiring manipulation of the file and the recording of numbers; it is slow, and it interferes with individuals wishing to use the file at the same time. With programs H and I, the system can be kept up to date without hand operations, except for listing the additions on a sheet of paper instead of making card entries.
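Programs H and I amount to a sequential merge (collation) of a batch of additions with the master file, writing a new master. The sketch below assumes both inputs are sorted by descriptor number, with at most one addition group per descriptor. The check sum is hypothetical: the paper does not describe how it was computed, so a simple sum wrapping at the 701's 36-bit word size stands in for it.

```python
# One group on the master tape: (descriptor, serial numbers, check sum).
Record = tuple[int, list[int], int]
# One group of additions from cards: (descriptor, new serial numbers).
Addition = tuple[int, list[int]]

WORD = 2 ** 36  # hypothetical: the sum wraps at the 701's 36-bit word size


def check_sum(descriptor: int, serials: list[int]) -> int:
    """Hypothetical check sum; the paper does not give the real algorithm."""
    return (descriptor + sum(serials)) % WORD


def verify(record: Record) -> bool:
    """Recompute a group's check sum, e.g. after reading it back from tape."""
    descriptor, serials, stored = record
    return check_sum(descriptor, serials) == stored


def collate(master: list[Record], additions: list[Addition]) -> list[Record]:
    """Merge sorted additions into the sorted master file, producing a new file."""
    new_tape: list[Record] = []
    i = j = 0
    while i < len(master) or j < len(additions):
        master_first = j == len(additions) or (
            i < len(master) and master[i][0] < additions[j][0]
        )
        addition_first = i == len(master) or (
            j < len(additions) and additions[j][0] < master[i][0]
        )
        if master_first:
            new_tape.append(master[i])  # untouched group keeps its old check sum
            i += 1
        elif addition_first:
            descriptor, serials = additions[j]  # descriptor not previously used
            new_tape.append((descriptor, serials, check_sum(descriptor, serials)))
            j += 1
        else:
            descriptor = master[i][0]  # addition to an existing group
            serials = master[i][1] + additions[j][1]
            new_tape.append((descriptor, serials, check_sum(descriptor, serials)))
            i += 1
            j += 1
    return new_tape


master = [(3, [10, 20], check_sum(3, [10, 20])), (9, [5], check_sum(9, [5]))]
print(collate(master, [(3, [30]), (5, [1, 2])]))
# prints [(3, [10, 20, 30], 63), (5, [1, 2], 8), (9, [5], 14)]
```

Writing a fresh tape rather than updating in place matches the paper's description of collating onto a second tape, and it leaves the old master intact if something goes wrong.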
A second objective is to attempt to establish a daily schedule for document searching. Presumably, this would eliminate conflicts that arise when more than one person happens to want to use the file at the same time. Also, it is possible that if the mechanics of searching are such that scientists and engineers can delegate the task to their secretaries (and the 701), the general use of reports will be increased, with presumably beneficial results. At the present time from ten to twenty searches are made per day.
Although there are cases when an individual may wish to search immediately, it is believed that most such "urgent" needs can be planned to meet a schedule, especially if such a schedule would include two periods, such as 11:30 a.m. and 4:00 p.m. There has been no real experience on this part of the experiment as yet.
At present only one search can be made at a time, because the system is built to accommodate eight descriptors per search, each of which might have up to 1,000 serial numbers. However, as indicated above, up to twenty searches can be run in sequence from a single loading.
An improvement that is planned but not yet in effect is to search for 8 K's while also printing the serial numbers that match 7, 6, 5, 4, 3, or 2 of them.
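The planned improvement amounts to counting, for each serial number, how many of the selected groups it appears in, then reporting the counts from highest to lowest. A short sketch with a counter follows; the function name and the dictionary layout are illustrative assumptions, not the 701 program.

```python
from collections import Counter


def partial_matches(groups: dict[int, list[int]], min_ks: int = 2) -> dict[int, list[int]]:
    """Group serial numbers by how many of the requested K's they appear under.

    `groups` maps each requested K to its serial numbers. The result maps
    "number of K's matched" to sorted serial numbers, from most to fewest,
    keeping only serials matched by at least `min_ks` K's.
    """
    counts: Counter = Counter()
    for serials in groups.values():
        counts.update(set(serials))  # count each serial at most once per K
    by_count: dict[int, list[int]] = {}
    for serial, n in counts.items():
        if n >= min_ks:
            by_count.setdefault(n, []).append(serial)
    return {n: sorted(by_count[n]) for n in sorted(by_count, reverse=True)}


print(partial_matches({1: [10, 20, 30], 2: [20, 30], 3: [30, 40], 4: [30]}))
# prints {4: [30], 2: [20]}
```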
It is difficult to estimate the 701 time required for what may become a typical scheduled searching period. This depends upon several factors, including the total number of searches to be made in one period, the number of K's, the number of serial numbers per descriptor, and the location of the descriptor groups on the tape.
The time required to load the cards, which include the program and the K's, plus pressing the card reader and load buttons, is about 10-15 seconds. A search for 8 K's through 300 groups, each with 5 to 40 serial numbers and all located at the front of the tape, requires about another 10-15 seconds. The minimum time for a search, therefore, is in the range of 20-30 seconds.
As the file increases in size, by new descriptor groups being entered and new serial numbers being added to old groups, the time per search will approach that required to read the tape.
If the present estimate of 4,000 new entries per month proves to be correct, there will be about 240,000 new entries in five years, in addition to the present 20,000. Since these are recorded as half words, there would be room to load the file on one 1,200-foot tape. The longest search would therefore require about the time needed to read 1,200 feet of tape, or approximately 4 minutes. The China Lake Technical Library staff suggests that if the labor of putting entries on cards is reduced, the number of descriptors assigned to a document may be greatly increased, perhaps by a factor of two. If this should happen, a single tape might not be sufficient.
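The five-year projection can be checked with a few lines of arithmetic. For the doubling case, the sketch assumes that only future entries are affected; the paper does not say whether the existing 20,000 entries would be re-indexed, and it does not give the tape's capacity, so the sketch stops at counting half words and words.

```python
# Figures from the text; the doubling scenario is an assumption (see above).
entries_per_month = 4_000
months = 5 * 12
present_entries = 20_000

new_entries = entries_per_month * months       # entries added over five years
total_entries = present_entries + new_entries  # serial numbers on the file

# One serial number per half word, two half words per 36-bit 701 word.
total_words = total_entries // 2

# Assumed scenario: descriptors per document double for all future entries.
doubled_total = present_entries + 2 * new_entries

print(new_entries, total_entries, total_words, doubled_total)
# prints 240000 260000 130000 500000
```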
In summary, this paper describes a method by means of which the 701 Calculator can perform certain library searching tasks. Depending upon several variables, a single search may require as little as twenty seconds or as much as four minutes. The system is at present in the nature of an experiment, and whether or not it will prove to be economical or practical remains to be seen.
REFERENCE
1. One discussion of this type of system is given in a series of eight technical reports by Mortimer Taube of Documentation Incorporated, Washington 6, D.C. These reports were prepared under Contract No. AF 18(600)-376 for the Armed Forces Technical Information Agency in the period July 1952 to March 1953.
Copyright American Library Association Mar 1993