The objective of IBM Corp.'s APL IL Interpreter Generator is to create APL interpreters for different machines at minimum cost. This objective has been accomplished by writing an APL interpreter in a specially designed programming language, the Madrid Scientific Center Intermediate Language (IL), which has very low-level semantics but a high-level syntax. The interpreter is translated into each target machine language by easily built compilers that produce high-performance code. The IL compilers are usually written in APL or APL2, which makes them very easy to adapt to new target machines. The APL IL Interpreter Generator has been used to generate eight APL and APL2 systems, as well as LETSMATH, an IBM Japan educational product that includes the interpreter without the user being aware of it.
The objective of the APL IL Interpreter Generator is to create APL interpreters for different machines at minimum cost. This objective has been accomplished by writing an APL interpreter in a specially designed programming language (IL) that has very low-level semantics but a high-level syntax. The interpreter is translated into each target machine language by easily built compilers that produce high-performance code. The paper describes IL, the APL interpreters written in IL, and the final systems generated for seven different target machines and operating systems. Some of these systems have been generated in an extremely short time.
Among the many languages used to write programs, APL and its successor, APL2, are very powerful. They support highly structured data of several different internal types and recognize a large number of primitive functions and operators, some of which (for example, execute, ⍎) are extremely complicated for some arguments. The existence of these primitives makes it very difficult to compile APL, except for subsets of the language or by including an interpreter in the machine code. Thus full APL and APL2 systems have to be interpretive. These interpreters are very large programs, consisting of tens of thousands of instructions.
Since interpreted programs normally run at least an order of magnitude slower than their compiled equivalents, programs written in APL or APL2 start with a speed handicap as compared to programs written in, say, C. However, the designers of APL and APL2 and the implementers of the interpreters have tried to reduce this effect in two different ways:
* By extending the language with ever more powerful primitives. In a single stroke, these perform complex operations that, in other languages, would require complicated algorithms. In this way, the time for interpretation is minimized with respect to the time for execution. The fact that most APL primitives apply to entire arrays also helps in this direction.
* By programming the interpreters in very low-level languages that make the best possible use of the resources of the machine or the operating system.
As a result, APL and APL2 interpreters were usually written in assembly language, with a consequent loss of portability. It has been estimated several times that developing a full APL system for a new machine in this way requires a total of about 30 person-years.
The APL IL Interpreter Generator started as a project at the IBM Madrid Scientific Center in 1977. Its objective was to obtain APL interpreters for different machines at minimum cost. The solution was to write an APL interpreter in a programming language specially designed for the purpose, with very low-level semantics but a high-level syntax. This interpreter is translated for each target machine by appropriate, easily built compilers that produce high-performance code.
In the past 14 years, the programming language called the Madrid Scientific Center Intermediate Language (IL) has reached its third version; it has been essentially stable since 1980. The first section of the paper describes the language design decisions, which in many cases are curiously parallel to those made in the design of the C language, although there are important differences. The second section of the paper describes the different interpreters that have been written in IL since 1980. Finally, the last section describes the procedure used to generate an APL system for a given target environment (a machine and an operating system).
THE INTERMEDIATE LANGUAGE
The Madrid Scientific Center Intermediate Language (IL) was designed in the late 1970s, according to the following criteria: on the one hand, a high-level syntax was desirable to assure portability between different machines and operating systems; on the other hand, very low-level semantics would make it possible to obtain highly optimized code with very simple, easy-to-build compilers.
The procedure that was followed to design the IL instructions was to select the most common operations in the assembly languages of different IBM machines and to represent them with a high-level syntax. In this way, compilation of IL instructions into assembly language usually becomes a one-to-one translation between one IL symbol and one assembly instruction.
Even control instructions were subject to this procedure. Since the only control instruction in assembly languages is usually the branch on condition, this is the only control instruction implemented in IL, although it was given a high-level syntax.
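IL's own notation for the conditional branch is not reproduced in this text. As a rough analogy only, the C sketch below writes a loop using nothing but a conditional transfer and an unconditional transfer, which is the discipline a branch-on-condition instruction imposes; the function name `sum_to` is hypothetical.

```c
#include <assert.h>

/* Sum the integers 1..n using only a conditional branch and an
   unconditional branch (goto), the way a program without while/for
   constructs has to express a loop. */
long sum_to(long n)
{
    long total = 0;
    long i = 1;
    assert(n >= 0);
loop:
    if (i > n) goto done;   /* branch on condition: leave the loop */
    total += i;
    i += 1;
    goto loop;              /* unconditional branch back to the test */
done:
    return total;
}
```

For example, `sum_to(10)` returns 55. A C compiler lowers `while` and `for` into essentially this shape; with only a branch instruction available, the IL programmer writes that shape directly.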
Optimization, in this kind of intermediate language, is a task not for the compilers, which we want to build as quickly as possible, but for the IL programmers who write the APL interpreter. This job needs to be done only once, whereas there may be as many compilers as there are target machines.
The only assumption about the machine in which IL may eventually be implemented is that its memory is considered to be a vector of fixed but undefined size (eight bits or more per byte; two, four, or eight bytes per word). Memory units should be consecutively numbered.
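The memory model just described can be pictured as a single array of consecutively numbered bytes, with a word being a fixed number of consecutive bytes. The C sketch below assumes a 64-byte memory, a 4-byte word, and big-endian byte order; all three, and the names `store_word` and `load_word`, are illustrative choices rather than anything IL prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory is a vector of consecutively numbered bytes,
   and a word is WORD_BYTES consecutive bytes (2, 4, or 8 on real targets). */
#define MEM_SIZE   64
#define WORD_BYTES 4

static uint8_t memory[MEM_SIZE];

/* Store a word at byte address addr, most significant byte first. */
void store_word(size_t addr, uint64_t value)
{
    assert(addr + WORD_BYTES <= MEM_SIZE);
    for (size_t k = 0; k < WORD_BYTES; k++)
        memory[addr + k] = (uint8_t)(value >> (8 * (WORD_BYTES - 1 - k)));
}

/* Reassemble the word stored at byte address addr. */
uint64_t load_word(size_t addr)
{
    uint64_t value = 0;
    assert(addr + WORD_BYTES <= MEM_SIZE);
    for (size_t k = 0; k < WORD_BYTES; k++)
        value = (value << 8) | memory[addr + k];
    return value;
}
```

Changing `WORD_BYTES` to 2 or 8 changes only the word width, which mirrors how the same IL program is meant to run on targets with different word sizes.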
The four elements of IL are now described.
CONSTANTS. Constants can be numeric or literal. In fact, a literal constant can also be treated as numeric and operated on accordingly. This means that an arithmetic expression combining a literal constant with a number is valid and, assuming the character set is ordered, is equivalent to another literal constant. The C language treats character constants in the same way.
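The C behavior mentioned above can be shown directly. This sketch assumes an ASCII execution character set, in which the capital letters are consecutive; the helper names are hypothetical.

```c
#include <assert.h>

/* A character constant is just a small integer, so it can take part in
   arithmetic. The results below rely on 'A'..'Z' being consecutive,
   which holds in ASCII. */
int letter_offset(char c)
{
    return c - 'A';          /* 0 for 'A', 1 for 'B', ... */
}

char shift_letter(char c, int n)
{
    return (char)(c + n);    /* 'A' shifted by 2 gives 'C' */
}
```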
ASCII (American Standard Code for Information Interchange) or EBCDIC (extended binary-coded decimal interchange code) can be selected as the internal representation of the literal constants. In the case of the APL IL interpreters, ASCII has been chosen.
Numeric constants can be either integer or floating point. Floating-point constants, such as 2.0, are distinguished by the presence of the period from integer constants, such as 2.
IDENTIFIERS. Identifiers are names that begin with a letter other than Q (which is reserved) and continue with any (possibly empty) combination of letters and digits. An identifier may contain at most five characters.
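The identifier rule is simple enough to state as a check function. This C sketch assumes identifiers use uppercase letters only (as the examples in this paper do) and that the check runs in the default "C" locale; `is_il_identifier` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A letter other than Q, then letters or digits, five characters at most.
   Q is rejected only in the first position. */
bool is_il_identifier(const char *s)
{
    size_t len = strlen(s);
    if (len == 0 || len > 5)
        return false;
    if (!isupper((unsigned char)s[0]) || s[0] == 'Q')
        return false;
    for (size_t i = 1; i < len; i++)
        if (!isupper((unsigned char)s[i]) && !isdigit((unsigned char)s[i]))
            return false;
    return true;
}
```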
What an identifier represents is controlled by its first letter, according to Table 1.(Table 1 omitted)
A full-word variable has an implementation-dependent length. Depending on the machine (in a 16-bit system, for instance), a full-word variable can be the same as a two-byte variable. This type is, to a certain extent, similar to the int type in the C language, but IL does not distinguish full-word integers from pointers. Assembly languages do not usually make this distinction either.
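In C terms, the closest analogue of a full-word variable is an unsigned integer wide enough to hold an address. The sketch below uses `uintptr_t` for that purpose; the type is optional in the C standard but available on mainstream platforms, and converting an integer that was advanced by arithmetic back to a pointer is implementation-defined, though it behaves as expected on common flat-memory machines. The names `fullword`, `read_through`, and `advance` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One machine word that may hold either an integer or an address. */
typedef uintptr_t fullword;

/* Use the word as an address. */
int read_through(fullword w)
{
    return *(int *)w;
}

/* Use the same word as an integer. */
fullword advance(fullword w, fullword bytes)
{
    return w + bytes;
}
```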
The only data structure supported is the vector (a succession of values at consecutive locations). Higher structures (such as matrices) are not a part of IL, as they are not a part of assembly languages. A scalar is considered to be the same as a vector of one element.
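Because only vectors are available, an interpreter written in IL has to lay out higher-rank arrays itself. A natural choice for APL is row-major order, which matches APL's ravel order; the C sketch below computes where element (i, j) of an r-by-c matrix falls in such a vector. It is an illustration under that assumption, not a description of the actual storage layout of the IL interpreters, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) of a rows-by-cols matrix stored in row-major
   order in a vector of rows*cols elements (zero origin). */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}
```

In a 2-by-3 matrix, element (1, 2) is the last of the six elements, at offset 5.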
DECLARATIONS. In an IL program, declaration instructions are located at the beginning and clearly separated from executable instructions. Every variable used by a program must be declared, either by assigning initial values to it, or by defining an equivalence.
Initial values are assigned by declaration instructions. For instance, three such instructions can define A as a vector of four full words with initial values of one, three, five, and seven; B as a vector of ten full words with initial values of zero; and W as a vector of three bytes whose initial values are the ASCII representations of the letters A, B, and C.
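For comparison, the three initial-value declarations described above correspond roughly to the following C definitions. `int32_t` stands in for a full word and `unsigned char` for a byte; these widths are assumptions, since IL leaves the word size to the target.

```c
#include <assert.h>
#include <stdint.h>

int32_t A[4]  = {1, 3, 5, 7};            /* four full words: 1, 3, 5, 7      */
int32_t B[10] = {0};                     /* ten full words, all zero         */
unsigned char W[3] = {'A', 'B', 'C'};    /* three bytes: codes of A, B, C    */
```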
Equivalences are very powerful and take several forms. One form defines variable C to have the same address as the third element of vector A (zero origin is used, so this is element 2). Both A and C are full-word objects by virtue of their initial letter.
Another form defines V as a vector of eight bytes sharing the address of floating-point variable F. This means that V is the vector of the bytes that make up the floating-point value of F, assuming that floating-point values are represented in eight bytes.
A third form defines C1 as a vector of three full words whose address is the current value of pointer P1 plus four. Of course, if the value of P1 changes, the address of C1 changes accordingly.
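The three equivalences can be mimicked in C with pointers. In this sketch, `C` and `V` are fixed aliases computed once, while the address of `C1` is recomputed from `P1` on every use so that it follows changes to `P1`. Treating the offset of four as a byte offset, the 32-bit full word, and the name `c1_address` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

int32_t A[4] = {1, 3, 5, 7};
double  F = 1.0;

/* C shares the address of the third element of A (zero origin: A[2]). */
int32_t *const C = &A[2];

/* V views the bytes of F; access through unsigned char is always allowed. */
unsigned char *const V = (unsigned char *)&F;

/* C1 lives at "P1 plus four" (a byte offset here), recomputed on each use. */
int32_t *c1_address(unsigned char *p1)
{
    return (int32_t *)(p1 + 4);
}
```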
Pointers are extremely useful in IL programs, just as they are in C. Moreover, there is no restriction on the number of equivalences that may be defined on a pointer at the same time. For instance, it is valid to declare three variables, A1, I1, and V1, that share the same address (the value of pointer P) but have different types: A1 is a pointer or full-word integer vector of four elements, I1 is a two-byte scalar, and V1 is a one-byte scalar.
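The multiple typed views of one address can be sketched in C as follows. Reading through `memcpy` sidesteps C's strict-aliasing rules, which IL does not have; the widths (32-bit full word, 16-bit two-byte scalar) and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Element k of a four-element full-word view (like A1). */
int32_t view_a1(const void *p, int k)
{
    int32_t w;
    assert(k >= 0 && k < 4);
    memcpy(&w, (const unsigned char *)p + (size_t)k * sizeof w, sizeof w);
    return w;
}

/* Two-byte scalar view of the same address (like I1). */
int16_t view_i1(const void *p)
{
    int16_t h;
    memcpy(&h, p, sizeof h);
    return h;
}

/* One-byte scalar view of the same address (like V1). */
unsigned char view_v1(const void *p)
{
    return *(const unsigned char *)p;
}
```

Filling memory with bytes of value 1 makes the results independent of byte order: the byte view gives 1, the two-byte view 257, and each full word 16843009.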
EXECUTABLE INSTRUCTIONS. Executable IL statements are analyzed and executed from right to left. Functions are executed without any precedence rules in the order in which they are found. Parentheses are not allowed. The main IL executable instructions are of two different types: assignment instructions and execution control instructions.
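Right-to-left evaluation without precedence is easy to misread with habits from other languages, so a small evaluator may help. This C sketch assumes a flat expression given as an operand array and an operator string, supports only +, -, and x, and has a hypothetical name; it illustrates the evaluation order, not IL's parser.

```c
#include <assert.h>
#include <stddef.h>

/* ops[i] sits between operands[i] and operands[i + 1]. Evaluation starts
   at the rightmost operand and applies each operator in turn, leftwards,
   with no precedence: "3 x 2 + 1" means 3 x (2 + 1). */
long eval_right_to_left(const long *operands, const char *ops, size_t n)
{
    assert(n >= 1);
    long acc = operands[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        switch (ops[i]) {
        case '+': acc = operands[i] + acc; break;
        case '-': acc = operands[i] - acc; break;
        case 'x': acc = operands[i] * acc; break;
        default:  assert(0 && "unsupported operator");
        }
    }
    return acc;
}
```

Under this rule, "3 x 2 + 1" evaluates to 9 and "10 - 4 - 3" evaluates to 10 - (4 - 3) = 9.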
Assignment instructions may take four different forms: the first corresponds to normal assignment, the second increments the value of the variable by the right-hand expression, the third decrements that value in the same way, and the fourth, applicable only to pointers, assigns to the variable the address of the expression on the right side.
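Since IL's own notation for these forms is not reproduced here, the sketch below shows the C counterparts of the four assignment forms side by side. The function and array names are hypothetical.

```c
#include <assert.h>

int table[4] = {10, 20, 30, 40};

/* Returns the final value of x; *p receives the address of table[2]. */
int demo_assignments(int e, int **p)
{
    int x;
    x = e;            /* 1. normal assignment                       */
    x += 3;           /* 2. increment by the right-hand expression  */
    x -= 1;           /* 3. decrement by the right-hand expression  */
    *p = &table[2];   /* 4. pointer receives the address of table[2] */
    return x;
}
```

For example, `demo_assignments(5, &p)` returns 7 and leaves `p` pointing at `table[2]`.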
Execution control statements have three different forms: the first corresponds to the unconditional transfer, the second to the conditional transfer, and the third to a computed go-to instruction.
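Standard C has no computed go-to (GCC and Clang provide "labels as values" as an extension), but a `switch` on an index, which compilers often implement as a jump table, gives the same effect portably. The sketch below transfers control to the k-th of three labels; the name `computed_goto` and the actions at each label are hypothetical. A typical use in an interpreter is dispatching on an operation code.

```c
#include <assert.h>

/* Jump to label k (0, 1, or 2); any other k goes straight to done. */
int computed_goto(int k, int x)
{
    switch (k) {
    case 0: goto double_it;
    case 1: goto negate;
    case 2: goto square;
    default: goto done;
    }
double_it:
    x = 2 * x;
    goto done;
negate:
    x = -x;
    goto done;
square:
    x = x * x;
done:
    return x;
}
```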
The operations that can be a part of an expression are the typical ones usually encountered in most machine languages, such as the following: addition (+), subtraction (-), multiplication (x), division (/), residue (
Copyright International Business Machines Corporation 1991