AN INCREASING NUMBER of scholarly books include listings of computer programs, commands, or data files. Typesetting such material poses special challenges, and blunders are common in otherwise well-produced material. In this article I will briefly review some technical requirements and stylistic conventions for typesetting computer programs. For further details of the languages themselves, see a textbook on the relevant language or a general handbook such as Tucker's Programming Languages.(f.1)
LISTINGS IN TYPE
The simplest way to typeset a computer program is to reproduce the text file as it would appear on the screen, using monospace type (figure 1). Here immediately the book designer collides with one of computerdom's most time-honoured traditions: the eighty-character line. Most monospace fonts are too wide to accommodate eighty, or even sixty, characters within a reasonable page width. After all, the usual text line is about sixty characters, and monospace is wider than proportional roman. Figure 2 compares the width of several monospace fonts with that of Times Roman.(f.2)
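The arithmetic behind this collision can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a typesetting tool: the function names are hypothetical, the character width of 600 thousandths of an em is Courier's, and the 27-pica (324-point) measure in the usage note is an assumed book page, not a figure from this article.

```python
def chars_per_line(measure_pt, point_size, advance_units=600):
    """Number of monospace characters that fit in one line.

    measure_pt    -- line width (measure) in points
    point_size    -- type size in points
    advance_units -- character width in thousandths of an em
                     (600 for Courier; other fonts differ)
    """
    return int(measure_pt * 1000 // (point_size * advance_units))


def largest_point_size(measure_pt, columns, advance_units=600):
    """Largest type size (in points) at which `columns` characters fit."""
    return measure_pt * 1000 / (columns * advance_units)
```

On the assumed 324-point measure, chars_per_line(324, 10) gives 54 characters of 10-point Courier, and largest_point_size(324, 80) shows that an eighty-character line would need type of about 6.75 points.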
Not only is Courier unusually wide, it is also often too light, and thus doubly unsuitable. Originally, Courier was an IBM typewriter font designed to remain legible on carbon copies, poor photocopies, and the like. The typewriter ribbon contributed weight to it. IBM used Courier in early laser printers, and it has somehow become the default monospace font on modern laser printers: a thinned-out ghost of its former self.
Letter Gothic (also originally an IBM typewriter font) is narrower and bolder but has, to my taste, an unappealing appearance. Lucida Sans Typewriter, designed by Charles Bigelow, is an improvement, with subtle weighting rather than a constant line width. (There is also a Lucida Sans Typewriter Condensed, not shown.) The 'Line Printer' font on Hewlett-Packard laser printers also resembles Letter Gothic.
My favourite font for program listings is Computer Modern Typewriter, which was designed by D.E. Knuth for precisely this job (typesetting sample programs in books about computing) and is included in Knuth's TeX software package.(f.3) If you are using TeX with fonts other than its basic set, take care that Computer Modern Typewriter is not replaced by Courier.(f.4)
Do the lines really have to be eighty characters long? That's up to the author and the programming language. COBOL is hard to fit into less than seventy-two columns, but Prolog and Fortran often fit comfortably into forty. PC Techniques magazine has a sixty-three-character limit for Pascal and C programs, and I have to make a conscious effort to stay within it, but it doesn't cramp my style much.
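A limit of this kind is easy to check mechanically before a listing goes to the typesetter. The following Python sketch is only an illustration: the function name is hypothetical, the default of 63 columns is borrowed from the PC Techniques limit mentioned above, and the 8-column tab stops are an assumption about the author's editor.

```python
def overlong_lines(text, limit=63):
    """Return (line_number, width) pairs for lines wider than `limit`.

    Tabs are expanded to 8-column stops first, since a tab occupies
    several columns in a monospace listing.
    """
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        width = len(line.expandtabs(8))
        if width > limit:
            report.append((number, width))
    return report
```

The report is meant to go back to the author, who can then decide where each offending line should be broken.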
In Microsoft BASIC there are some situations where a line has to be very long (e.g., 250 characters) and can't be broken. I consider this a design flaw in the language, but what do you do if you have to typeset one of these super-long lines anyhow? Microsoft's solution is to mark turned lines with a special symbol (not transcribed here), like this:
Declare Function MYFN Lib "MYLIB" (ByVal Arg1$, ByVal
'Symbol not transcribed' Arg2$, ByVal Arg3$, ByVal Arg4$, ByVal Arg5$,
'Symbol not transcribed' ByVal Arg6$, ByVal Arg7$) As Integer
and announce loudly, in a note at the beginning of each manual, that such a construction should actually be typed all on one line.
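Turning a long line in this way can be sketched as follows. The code is a hypothetical illustration, not Microsoft's tooling: the function names are mine, the marker ">> " is an ASCII stand-in for the publisher's continuation symbol, and breaks are made only at blanks. An unturn function is included because the reader must always be able to recover the single line that is actually typed.

```python
def turn_line(line, width=63, marker=">> "):
    """Split one long line into pieces no wider than `width`.

    Breaks only at blanks; every piece after the first starts with
    `marker` so the reader knows it continues the previous line.
    A single word wider than the space available is left unbroken.
    """
    pieces = []
    words = []  # words collected for the piece being built
    for word in line.split(" "):
        prefix = marker if pieces else ""
        candidate = " ".join(words + [word])
        if words and len(prefix) + len(candidate) > width:
            pieces.append(prefix + " ".join(words))
            words = [word]
        else:
            words.append(word)
    pieces.append((marker if pieces else "") + " ".join(words))
    return pieces


def unturn(pieces, marker=">> "):
    """Rejoin turned pieces into the single line the user must type."""
    first, rest = pieces[0], pieces[1:]
    return " ".join([first] + [piece[len(marker):] for piece in rest])
```

Even with such a tool, the turned listing should be rejoined and run on a computer before publication, in keeping with the advice that follows.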
In any case, only the author should insert line breaks. Editing a computer program is very risky, since different languages put different restrictions on layout, and there is also the risk of changing the text by accident. I will not publish a computer program that has not been tried on a computer in final form, even if I did the editing myself.
Must the type be monospace? Possibly not. Figure 3 shows a program set in proportional type; as you'll see, it's narrower this way. Proportional type is practical only in languages that allow relatively free layout (Pascal, C, C++, Prolog, Lisp), and even then the author must be warned not to rely on alignments that will not be preserved; neat layout of complex material becomes harder to achieve.
The font used for programs must always be distinct from that used for ordinary text, so that words of the programming language stand out when they occur in an English context: we want the word if in 'the if statement' to appear in monospace or boldface, not in the face of the surrounding text. When proportional type is used, it is helpful if blanks in quoted strings are indicated explicitly by a visible-space character such as ␣, so that the number of blanks is evident.
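Marking the blanks in quoted strings can be automated. The sketch below is a minimal illustration with assumptions of my own: the function name is hypothetical, the visible blank is taken to be the Unicode 'open box' character (U+2423), and a string is assumed to run from one quote character to the next occurrence of the same character, with no special escape convention.

```python
VISIBLE_SPACE = "\u2423"  # "open box" character, assumed as the visible blank


def show_blanks_in_strings(line, quotes="'\""):
    """Replace blanks inside quoted strings with a visible-space mark.

    A string starts at a quote character and ends at the next occurrence
    of the same character. Escape conventions such as backslashes differ
    between languages and are deliberately not handled here.
    """
    out = []
    open_quote = None  # the quote character of the string we are inside
    for ch in line:
        if open_quote is None:
            if ch in quotes:
                open_quote = ch
            out.append(ch)
        elif ch == open_quote:
            open_quote = None
            out.append(ch)
        else:
            out.append(VISIBLE_SPACE if ch == " " else ch)
    return "".join(out)
```

Blanks outside strings are left alone, since in most languages their number does not matter to the computer.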
CHARACTER SET
Almost all computers nowadays use the ASCII character set, shown in figure 4. Other characters (especially accented letters, box-drawing characters, and the like) may appear in comments and quoted strings; it's a good idea to determine early on whether the author is going to use any non-standard characters. The PL/I language adds the character ¬, often written ~ on ASCII computers. APL, a language popular in the 1970s, had an alphabet all its own, and a special font is needed to typeset it.
The characters - and _ must not join when written double; that is, -- and __ must not come out as single unbroken lines. If proportional type is used, the minus sign must be the same width as the plus; the opposite of C++, if there were one, would be C−− (set with true minus signs), not C-- (set with narrow hyphens). It goes without saying that the letter O and the digit 0 must be clearly distinct.
A common error is to introduce a false distinction between opening and closing quotation marks. Almost all programming languages begin and end quoted strings with the same character, 'like this' or "like this", not ‘like this’ with distinct opening and closing marks. Many ASCII fonts render ` and ' as visibly different shapes, making it clearer that only the second one is normally used as a quotation mark.
English punctuation must not 'slide under' quotation marks that separate a computer language from English. If the user is supposed to type 'this' you cannot tell him or her to type 'this,' unless you want the comma to be included.
The 'pipe' character | often lacks the break in the middle; as far as I know, no language distinguishes broken and unbroken versions of it.(f.5) Outside the United States, # may print as the British pound sign (£) and \ as the yen sign (¥), both of which are unacceptable in program listings.
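Several of these character problems can be caught by scanning the author's file for anything outside the ASCII set of figure 4. The Python sketch below is one hedged way to do it: the function name is hypothetical, it relies only on the standard unicodedata module, and it treats blank and tab as acceptable in addition to the 94 printable characters.

```python
import unicodedata


def suspicious_characters(text):
    """List (line, column, character, name) for every character outside
    the 94 printable ASCII characters, blank, and tab."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == " " or ch == "\t" or "!" <= ch <= "~":
                continue
            name = unicodedata.name(ch, "UNNAMED CHARACTER")
            findings.append((line_no, col, ch, name))
    return findings
```

A check of this kind flags typographic quotation marks, pound and yen signs, and accented letters alike; whether a flagged character is an error or a legitimate part of a comment or string remains the author's decision.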
Before 1980 most computers lacked lower-case letters, and the languages of that era (Fortran, PL/I, COBOL) are normally set in all caps. Newer languages, such as Pascal, C, and Prolog, use mostly lower case, with upper-case letters here and there for special purposes. Lisp and BASIC are in transition from the old style to the new.
The advent of lower case has also affected the names of programming languages. In 1970 the names of all the languages were set in all caps, because that's how they looked on the computer (FORTRAN, COBOL, ALGOL). Fastidious typesetters occasionally used small caps (FORTRAN, COBOL, ALGOL). Today, only acronyms are set in all caps: BASIC = Beginner's All-purpose Symbolic Instruction Code, COBOL = COmmon Business-Oriented Language. Pascal is a philosopher's name, not an acronym, and only its first letter is capitalized.
With acronyms formed from syllables, rather than initials, usage has shifted: FORmula TRANslation was FORTRAN and is now Fortran; ALGOrithmic Language, ALGOL, became Algol; LISt Processor, LISP, became Lisp. Some recent Microsoft manuals have changed BASIC to Basic.
FANCIER OPTIONS
A completely different way to typeset computer programs was invented by the designers of the Algol programming language in the late 1950s. At that time, computer character sets were not standardized, so the designers chose to design a typographic presentation, leaving it up to the implementors to specify how to type the language into any particular computer.
Figure 5 shows an example of Algol style, which is now used mainly in Pascal and Ada and is also the preferred way to typeset pseudocode, that is, semiformal descriptions of computations written in a mixture of a programming language and English.
In its full glory,(f.6) Algol style uses four kinds of type:
boldface for keywords (reserved words);
italics for identifiers (names);
roman for comments; and
monospace for quoted strings, with blanks denoted by a visible-space character such as ␣.
Note that mathematical symbols such as + - < > and the semicolon are not italicized; they stand upright just as in algebraic formulas. Further, some authors depart from the ASCII character set, writing (for example) × instead of * for multiplication, ↑ instead of ^ for exponentiation, and ← or <= instead of := for assignment.(f.7)
Attenuated forms of Algol style also exist. The most common of them uses only two styles (boldface or all caps for reserved words, italics or ordinary type for everything else). Typesetting in Algol style would be quite a chore except that the markup can be performed by computer. (Computers understand these languages, after all, and can parse them.) Knuth's WEB software handles this, along with many other programming chores. I am not familiar with other software for inserting typographic markup into programs, but, like most programmer-authors, I will gleefully write my own when needed; it's a fine senior-level undergraduate programming project.
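To give a feel for the size of such a project, here is a minimal Python sketch of the two-style form for a Pascal-like language. Everything in it is an assumption made for illustration: the keyword list is deliberately incomplete, the output uses the LaTeX commands \textbf and \textit as the markup, strings and brace comments are passed through untouched, and characters that are special to LaTeX (such as the underscore) are not escaped. It is not how WEB does the job.

```python
import re

# A small, incomplete set of Pascal reserved words (illustrative only).
KEYWORDS = {"begin", "end", "if", "then", "else", "while", "do",
            "for", "to", "var", "procedure", "function", "program"}

# One alternative per token class; strings and comments are tried first
# so that words inside them are not marked up.
TOKEN = re.compile(r"""
      (?P<string>'[^']*')               # quoted string
    | (?P<comment>\{[^}]*\})            # comment in braces
    | (?P<word>[A-Za-z][A-Za-z0-9_]*)   # keyword or identifier
    | (?P<other>.)                      # anything else, one character
""", re.VERBOSE | re.DOTALL)


def mark_up(source):
    """Wrap keywords in \\textbf{...} and identifiers in \\textit{...}."""
    out = []
    for match in TOKEN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            command = "textbf" if text.lower() in KEYWORDS else "textit"
            out.append("\\%s{%s}" % (command, text))
        else:
            out.append(text)
    return "".join(out)
```

A real tool would need the full reserved-word list of the language, proper handling of its string and comment conventions, and escaping of the typesetter's own special characters.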
Even more elaborate styles are possible. Figures 6 and 7 show part of a C program in ASCII and in the format proposed by Ronald M. Baecker and Aaron Marcus.(f.8) Note that comment delimiters and braces around groups of statements disappear, their job being taken over entirely by layout. Boldface indicates externally defined names. When parentheses are nested, the outer ones are larger. Baecker and Marcus give complete specifications for this format; for ease of typesetting with TeX, I have made two slight changes in the example, replacing Helvetica with Computer Modern Sans and changing gray boxes to ruled boxes. Similar formats for other languages and typesetting milieux could easily be devised.
CONCLUSION
The typesetting of computer programs is far from a solved problem, and unsatisfactory techniques are used altogether too often. The choice of format depends on whether the primary goal is to demonstrate how to type something into the computer, or to make a program more readable to someone who already knows the language. The survival of Algol style and the experiments of Baecker and Marcus show convincingly that formats other than the standard eighty-column monospace are useful. Two evident needs are standard software tools to insert typeface specifications into programs, and a better selection of condensed monospace fonts for traditional listings.(f.9)
MICHAEL A. COVINGTON is an associate research scientist in artificial intelligence at the University of Georgia and the author of Natural Language Processing for Prolog Programmers (Englewood Cliffs, NJ: Prentice-Hall 1994) and other books and articles on computer programming.
Footnotes:
(f.1) Allen B. Tucker Jr, Programming Languages (New York: McGraw-Hill 1986). The languages covered are Pascal, Fortran, COBOL, PL/I, SNOBOL, APL, Lisp, Prolog, C, Ada, and Modula-2.
(f.2) The fonts have been scaled to have equal apparent height. The relation between nominal point size and actual size of letters is nowadays somewhat arbitrary.
(f.3) D.E. Knuth, The TeXbook (Reading, MA: Addison-Wesley 1986). The software itself is distributed free by Stanford University and other sources, and has been ported to a huge variety of computers; there are also commercially supported versions. Most TeX typesetting is now done with LaTeX, an extension of the TeX system that is distributed along with it. See Leslie Lamport, LaTeX: A Document Preparation System (Reading, MA: Addison-Wesley 1986). On the typefaces themselves see D.E. Knuth, 'Mathematical typography,' Bulletin of the American Mathematical Society (new series) 1 (March 1979): 337-72, and D.E. Knuth, Computer Modern Typefaces (Computers and Typesetting, vol. E) (Reading, MA: Addison-Wesley 1986).
(f.4) A LaTeX macro that takes care of this problem, somewhat crudely, is available from the author.
(f.5) But consider an epigram frequently seen on the Internet, and paraphrasing René Magritte: 'Ceci n'est pas une pipe.'
(f.6) As practised, for example, by D.E. Knuth (who else?) in Literate Programming (Stanford: Center for the Study of Language and Information 1992).
(f.7) An apocryphal story says that := originated as a typesetter's error in one of the early Algol documents: a marginal note said 'Typeset <= where typescript has :=' and they didn't. But this is not confirmed; := already meant 'is defined as' in mathematical literature.
(f.8) Human Factors and Typography for More Readable Programs (Reading, MA: Addison-Wesley for ACM Press 1990).
(f.9) In preparing this article I have benefited from comments made by Melody Covington and Bob Stearns.
Figure 1 Part of a Pascal program set as a monospaced listing
Figure 2 Monospace fonts are wider than roman
Figure 3 The same listing as figure 1, in a proportional font (not transcribed)
Figure 4 The 94 printable ASCII characters
Figure 5 The same Pascal procedure in Algol style with four kinds of type (not transcribed)
Figure 6 Part of a C program in ASCII; compare figure 7
Figure 7 The same C code from figure 6, in Baecker-Marcus style
Copyright University of Toronto Press Oct 1994