The Lisp 1 Programmer's Manual appeared in 1960, and the Lisp 1.5 Programmer's Manual (McCarthy 1965) was published in 1962. The early history of Lisp is described in McCarthy 1978.
The two dialects in which most major Lisp programs of the 1970s were written are MacLisp (Moon 1978; Pitman 1983), developed at the MIT Project MAC, and Interlisp (Teitelman 1974), developed at Bolt Beranek and Newman Inc. and the Xerox Palo Alto Research Center. Portable Standard Lisp (Hearn 1969; Griss 1981) was a Lisp dialect designed to be easily portable between different machines. MacLisp spawned a number of subdialects, such as Franz Lisp, which was developed at the University of California at Berkeley, and Zetalisp (Moon 1981), which was based on a special-purpose processor designed at the MIT Artificial Intelligence Laboratory to run Lisp very efficiently. The Lisp dialect used in this book, called Scheme (Steele 1975), was invented in 1975 by Guy Lewis Steele Jr. and Gerald Jay Sussman of the MIT Artificial Intelligence Laboratory and later reimplemented for instructional use at MIT. Scheme became an IEEE standard in 1990 (IEEE 1990). The Common Lisp dialect (Steele 1982, Steele 1990) was developed by the Lisp community to combine features from the earlier Lisp dialects to make an industrial standard for Lisp. Common Lisp became an ANSI standard in 1994 (ANSI 1994).
One such special application was a breakthrough computation of scientific importance--an integration of the motion of the Solar System that extended previous results by nearly two orders of magnitude, and demonstrated that the dynamics of the Solar System is chaotic. This computation was made possible by new integration algorithms, a special-purpose compiler, and a special-purpose computer all implemented with the aid of software tools written in Lisp (Abelson et al. 1992; Sussman and Wisdom 1992).
The characterization of numbers as ``simple data'' is a barefaced bluff. In fact, the treatment of numbers is one of the trickiest and most confusing aspects of any programming language. Some typical issues involved are these: Some computer systems distinguish integers, such as 2, from real numbers, such as 2.71. Is the real number 2.00 different from the integer 2? Are the arithmetic operations used for integers the same as the operations used for real numbers? Does 6 divided by 2 produce 3, or 3.0? How large a number can we represent? How many decimal places of accuracy can we represent? Is the range of integers the same as the range of real numbers? Above and beyond these questions, of course, lies a collection of issues concerning roundoff and truncation errors--the entire science of numerical analysis. Since our focus in this book is on large-scale program design rather than on numerical techniques, we are going to ignore these problems. The numerical examples in this chapter will exhibit the usual roundoff behavior that one observes when using arithmetic operations that preserve a limited number of decimal places of accuracy in noninteger operations.
Throughout this book, when we wish to emphasize the distinction between the input typed by the user and the response printed by the interpreter, we will show the latter in slanted characters.
Lisp systems typically provide features to aid the user in formatting expressions. Two especially useful features are one that automatically indents to the proper pretty-print position whenever a new line is started and one that highlights the matching left parenthesis whenever a right parenthesis is typed.
Lisp obeys the convention that every expression has a value. This convention, together with the old reputation of Lisp as an inefficient language, is the source of the quip by Alan Perlis (paraphrasing Oscar Wilde) that ``Lisp programmers know the value of everything but the cost of nothing.''
In this book, we do not show the interpreter's response to evaluating definitions, since this is highly implementation-dependent.
Chapter 3 will show that this notion of environment is crucial, both for understanding how the interpreter works and for implementing interpreters.
It may seem strange that the evaluation rule says, as part of the first step, that we should evaluate the leftmost element of a combination, since at this point that can only be an operator such as + or * representing a built-in primitive procedure such as addition or multiplication. We will see later that it is useful to be able to work with combinations whose operators are themselves compound expressions.
Special syntactic forms that are simply convenient alternative surface structures for things that can be written in more uniform ways are sometimes called syntactic sugar, to use a phrase coined by Peter Landin. In comparison with users of other languages, Lisp programmers, as a rule, are less concerned with matters of syntax. (By contrast, examine any Pascal manual and notice how much of it is devoted to descriptions of syntax.) This disdain for syntax is due partly to the flexibility of Lisp, which makes it easy to change surface syntax, and partly to the observation that many ``convenient'' syntactic constructs, which make the language less uniform, end up causing more trouble than they are worth when programs become large and complex. In the words of Alan Perlis, ``Syntactic sugar causes cancer of the semicolon.''
Observe that there are two different operations being combined here: we are creating the procedure, and we are giving it the name square. It is possible, indeed important, to be able to separate these two notions--to create procedures without naming them, and to give names to procedures that have already been created. We will see how to do this in section .
Throughout this book, we will describe the general syntax of expressions by using italic symbols delimited by angle brackets--e.g., name--to denote the ``slots'' in the expression to be filled in when such an expression is actually used.
More generally, the body of the procedure can be a sequence of expressions. In this case, the interpreter evaluates each expression in the sequence in turn and returns the value of the final expression as the value of the procedure application.
Despite the simplicity of the substitution idea, it turns out to be surprisingly complicated to give a rigorous mathematical definition of the substitution process. The problem arises from the possibility of confusion between the names used for the formal parameters of a procedure and the (possibly identical) names used in the expressions to which the procedure may be applied. Indeed, there is a long history of erroneous definitions of substitution in the literature of logic and programming semantics. See Stoy 1977 for a careful discussion of substitution.
In chapter 3 we will introduce stream processing, which is a way of handling apparently ``infinite'' data structures by incorporating a limited form of normal-order evaluation. In section we will modify the Scheme interpreter to produce a normal-order variant of Scheme.
``Interpreted as either true or false'' means this: In Scheme, there are two distinguished values that are denoted by the constants #t and #f. When the interpreter checks a predicate's value, it interprets #f as false. Any other value is treated as true. (Thus, providing #t is logically unnecessary, but it is convenient.) In this book we will use names true and false, which are associated with the values #t and #f respectively.
Abs also uses the ``minus'' operator -, which, when used with a single operand, as in (- x), indicates negation.
A minor difference between if and cond is that the e part of each cond clause may be a sequence of expressions. If the corresponding p is found to be true, the expressions e are evaluated in sequence and the value of the final expression in the sequence is returned as the value of the cond. In an if expression, however, the consequent and alternative must be single expressions.
Declarative and imperative descriptions are intimately related, as indeed are mathematics and computer science. For instance, to say that the answer produced by a program is ``correct'' is to make a declarative statement about the program. There is a large amount of research aimed at establishing techniques for proving that programs are correct, and much of the technical difficulty of this subject has to do with negotiating the transition between imperative statements (from which programs are constructed) and declarative statements (which can be used to deduce things). In a related vein, an important current area in programming-language design is the exploration of so-called very high-level languages, in which one actually programs in terms of declarative statements. The idea is to make interpreters sophisticated enough so that, given ``what is'' knowledge specified by the programmer, they can generate ``how to'' knowledge automatically. This cannot be done in general, but there are important areas where progress has been made. We shall revisit this idea in chapter 4.
This square-root algorithm is actually a special case of Newton's method, which is a general technique for finding roots of equations. The square-root algorithm itself was developed by Heron of Alexandria in the first century A.D. We will see how to express the general Newton's method as a Lisp procedure in section .
We will usually give predicates names ending with question marks, to help us remember that they are predicates. This is just a stylistic convention. As far as the interpreter is concerned, the question mark is just an ordinary character.
Observe that we express our initial guess as 1.0 rather than 1. This would not make any difference in many Lisp implementations. MIT Scheme, however, distinguishes between exact integers and decimal values, and dividing two integers produces a rational number rather than a decimal. For example, dividing 10 by 6 yields 5/3, while dividing 10.0 by 6.0 yields 1.6666666666666667. (We will learn how to implement arithmetic on rational numbers in section .) If we start with an initial guess of 1 in our square-root program, and x is an exact integer, all subsequent values produced in the square-root computation will be rational numbers rather than decimals. Mixed operations on rational numbers and decimals always yield decimals, so starting with an initial guess of 1.0 forces all subsequent values to be decimals.
Readers who are worried about the efficiency issues involved in using procedure calls to implement iteration should note the remarks on ``tail recursion'' in section .
It is not even clear which of these procedures is a more efficient implementation. This depends upon the hardware available. There are machines for which the ``obvious'' implementation is the less efficient one. Consider a machine that has extensive tables of logarithms and antilogarithms stored in a very efficient manner.
The concept of consistent renaming is actually subtle and difficult to define formally. Famous logicians have made embarrassing errors here.
Lexical scoping dictates that free variables in a procedure are taken to refer to bindings made by enclosing procedure definitions; that is, they are looked up in the environment in which the procedure was defined. We will see how this works in detail in chapter 3 when we study environments and the detailed behavior of the interpreter.
Embedded definitions must come first in a procedure body. The management is not responsible for the consequences of running programs that intertwine definition and use.
In a real program we would probably use the block structure introduced in the last section to hide the definition of fact-iter:(define (factorial n) (define (iter product counter) (if (> counter n) product (iter (* counter product) (+ counter 1)))) (iter 1 1))We avoided doing this here so as to minimize the number of things to think about at once.
When we discuss the implementation of procedures on register machines in chapter 5, we will see that any iterative process can be realized ``in hardware'' as a machine that has a fixed set of registers and no auxiliary memory. In contrast, realizing a recursive process requires a machine that uses an auxiliary data structure known as a stack.
Tail recursion has long been known as a compiler optimization trick. A coherent semantic basis for tail recursion was provided by Carl Hewitt (1977), who explained it in terms of the ``message-passing'' model of computation that we shall discuss in chapter 3. Inspired by this, Gerald Jay Sussman and Guy Lewis Steele Jr. (see Steele 1975) constructed a tail-recursive interpreter for Scheme. Steele later showed how tail recursion is a consequence of the natural way to compile procedure calls (Steele 1977). The IEEE standard for Scheme requires that Scheme implementations be tail-recursive.
An example of this was hinted at in section : The interpreter itself evaluates expressions using a tree-recursive process.
For example, work through in detail how the reduction rule applies to the problem of making change for 10 cents using pennies and nickels.
One approach to coping with redundant computations is to arrange matters so that we automatically construct a table of values as they are computed. Each time we are asked to apply the procedure to some argument, we first look to see if the value is already stored in the table, in which case we avoid performing the redundant computation. This strategy, known as tabulation or memoization, can be implemented in a straightforward way. Tabulation can sometimes be used to transform processes that require an exponential number of steps (such as count-change) into processes whose space and time requirements grow linearly with the input. See exercise .
The elements of Pascal's triangle are called the binomial coefficients, because the nth row consists of the coefficients of the terms in the expansion of (x+y)n. This pattern for computing the coefficients appeared in Blaise Pascal's 1653 seminal work on probability theory, Traité du triangle arithmétique. According to Knuth (1973), the same pattern appears in the Szu-yuen Yü-chien (``The Precious Mirror of the Four Elements''), published by the Chinese mathematician Chu Shih-chieh in 1303, in the works of the twelfth-century Persian poet and mathematician Omar Khayyam, and in the works of the twelfth-century Hindu mathematician Bháscara Áchárya.
These statements mask a great deal of oversimplification. For instance, if we count process steps as ``machine operations'' we are making the assumption that the number of machine operations needed to perform, say, a multiplication is independent of the size of the numbers to be multiplied, which is false if the numbers are sufficiently large. Similar remarks hold for the estimates of space. Like the design and description of a process, the analysis of a process can be carried out at various levels of abstraction.
More precisely, the number of multiplications required is equal to 1 less than the log base 2 of n plus the number of ones in the binary representation of n. This total is always less than twice the log base 2 of n. The arbitrary constants k1 and k2 in the definition of order notation imply that, for a logarithmic process, the base to which logarithms are taken does not matter, so all such processes are described as .
You may wonder why anyone would care about raising numbers to the 1000th power. See section .
This iterative algorithm is ancient. It appears in the Chandah-sutra by Áchárya Pingala, written before 200 B.C. See Knuth 1981, section 4.6.3, for a full discussion and analysis of this and other methods of exponentiation.
This algorithm, which is sometimes known as the ``Russian peasant method'' of multiplication, is ancient. Examples of its use are found in the Rhind Papyrus, one of the two oldest mathematical documents in existence, written about 1700 B.C. (and copied from an even older document) by an Egyptian scribe named A'h-mose.
This exercise was suggested to us by Joe Stoy, based on an example in Kaldewaij 1990.
Euclid's Algorithm is so called because it appears in Euclid's Elements (Book 7, ca. 300 B.C.). According to Knuth (1973), it can be considered the oldest known nontrivial algorithm. The ancient Egyptian method of multiplication (exercise ) is surely older, but, as Knuth explains, Euclid's algorithm is the oldest known to have been presented as a general algorithm, rather than as a set of illustrative examples.
This theorem was proved in 1845 by Gabriel Lamé, a French mathematician and engineer known chiefly for his contributions to mathematical physics. To prove the theorem, we consider pairs (ak ,bk), where , for which Euclid's Algorithm terminates in k steps. The proof is based on the claim that, if are three successive pairs in the reduction process, then we must have . To verify the claim, consider that a reduction step is defined by applying the transformation ak-1 = bk, . The second equation means that ak = qbk + bk-1 for some positive integer q. And since q must be at least 1 we have . But in the previous reduction step we have bk+1= ak. Therefore, . This verifies the claim. Now we can prove the theorem by induction on k, the number of steps that the algorithm requires to terminate. The result is true for k=1, since this merely requires that b be at least as large as . Now, assume that the result is true for all integers less than or equal to k and establish the result for k+1. Let be successive pairs in the reduction process. By our induction hypotheses, we have and . Thus, applying the claim we just proved together with the definition of the Fibonacci numbers gives , which completes the proof of Lamé's Theorem.
If d is a divisor of n, then so is n/d. But d and n/d cannot both be greater than .
Pierre de Fermat (1601-1665) is considered to be the founder of modern number theory. He obtained many important number-theoretic results, but he usually announced just the results, without providing his proofs. Fermat's Little Theorem was stated in a letter he wrote in 1640. The first published proof was given by Euler in 1736 (and an earlier, identical proof was discovered in the unpublished manuscripts of Leibniz). The most famous of Fermat's results--known as Fermat's Last Theorem--was jotted down in 1637 in his copy of the book Arithmetic (by the third-century Greek mathematician Diophantus) with the remark ``I have discovered a truly remarkable proof, but this margin is too small to contain it.'' Finding a proof of Fermat's Last Theorem became one of the most famous challenges in number theory. A complete solution was finally given in 1995 by Andrew Wiles of Princeton University.
The reduction steps in the cases where the exponent e is greater than 1 are based on the fact that, for any integers x, y, and m, we can find the remainder of x times y modulo m by computing separately the remainders of x modulo m and y modulo m, multiplying these, and then taking the remainder of the result modulo m. For instance, in the case where e is even, we compute the remainder of be/2 modulo m, square this, and take the remainder modulo m. This technique is useful because it means we can perform our computation without ever having to deal with numbers much larger than m. (Compare exercise .)
Numbers that fool the Fermat test are called Carmichael numbers, and little is known about them other than that they are extremely rare. There are 255 Carmichael numbers below 100,000,000. The smallest few are 561, 1105, 1729, 2465, 2821, and 6601. In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a ``correct'' algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering.
One of the most striking applications of probabilistic prime testing has been to the field of cryptography. Although it is now computationally infeasible to factor an arbitrary 200-digit number, the primality of such a number can be checked in a few seconds with the Fermat test. This fact forms the basis of a technique for constructing ``unbreakable codes'' suggested by Rivest, Shamir, and Adleman (1977). The resulting RSA algorithm has become a widely used technique for enhancing the security of electronic communications. Because of this and related developments, the study of prime numbers, once considered the epitome of a topic in ``pure'' mathematics to be studied only for its own sake, now turns out to have important practical applications to cryptography, electronic funds transfer, and information retrieval.
This series, usually written in the equivalent form , is due to Leibniz. We'll see how to use this as the basis for some fancy numerical tricks in section .
Notice that we have used block structure (section ) to embed the definitions of pi-next and pi-term within pi-sum, since these procedures are unlikely to be useful for any other purpose. We will see how to get rid of them altogether in section .
The intent of exercises - is to demonstrate the expressive power that is attained by using an appropriate abstraction to consolidate many seemingly disparate operations. However, though accumulation and filtering are elegant ideas, our hands are somewhat tied in using them at this point since we do not yet have data structures to provide suitable means of combination for these abstractions. We will return to these ideas in section when we show how to use sequences as interfaces for combining filters and accumulators to build even more powerful abstractions. We will see there how these methods really come into their own as a powerful and elegant approach to designing programs.
This formula was discovered by the seventeenth-century English mathematician John Wallis.
It would be clearer and less intimidating to people learning Lisp if a name more obvious than lambda, such as make-procedure, were used. But the convention is firmly entrenched. The notation is adopted from the calculus, a mathematical formalism introduced by the mathematical logician Alonzo Church (1941). Church developed the calculus to provide a rigorous foundation for studying the notions of function and function application. The calculus has become a basic tool for mathematical investigations of the semantics of programming languages.
Understanding internal definitions well enough to be sure a program means what we intend it to mean requires a more elaborate model of the evaluation process than we have presented in this chapter. The subtleties do not arise with internal definitions of procedures, however. We will return to this issue in section , after we learn more about evaluation.
We have used 0.001 as a representative ``small'' number to indicate a tolerance for the acceptable error in a calculation. The appropriate tolerance for a real calculation depends upon the problem to be solved and the limitations of the computer and the algorithm. This is often a very subtle consideration, requiring help from a numerical analyst or some other kind of magician.
This can be accomplished using error, which takes as arguments a number of items that are printed as error messages.
Try this during a boring lecture: Set your calculator to radians mode and then repeatedly press the button until you obtain the fixed point.
(pronounced ``maps to'') is the mathematician's way of writing lambda. means (lambda(y) (/ x y)), that is, the function whose value at y is x/y.
Observe that this is a combination whose operator is itself a combination. Exercise already demonstrated the ability to form such combinations, but that was only a toy example. Here we begin to see the real need for such combinations--when applying a procedure that is obtained as the value returned by a higher-order procedure.
See exercise for a further generalization.
Elementary calculus books usually describe Newton's method in terms of the sequence of approximations xn+1=xn-g(xn)/Dg(xn). Having language for talking about processes and using the idea of fixed points simplifies the description of the method.
Newton's method does not always converge to an answer, but it can be shown that in favorable cases each iteration doubles the number-of-digits accuracy of the approximation to the solution. In such cases, Newton's method will converge much more rapidly than the half-interval method.
For finding square roots, Newton's method converges rapidly to the correct solution from any starting point.
The notion of first-class status of programming-language elements is due to the British computer scientist Christopher Strachey (1916-1975).
We'll see examples of this after we introduce data structures in chapter 2.
The major implementation cost of first-class procedures is that allowing procedures to be returned as values requires reserving storage for a procedure's free variables even while the procedure is not executing. In the Scheme implementation we will study in section , these variables are stored in the procedure's environment.
The ability to directly manipulate procedures provides an analogous increase in the expressive power of a programming language. For example, in section we introduced the sum procedure, which takes a procedure term as an argument and computes the sum of the values of term over some specified interval. In order to define sum, it is crucial that we be able to speak of a procedure such as term as an entity in its own right, without regard for how term might be expressed with more primitive operations. Indeed, if we did not have the notion of ``a procedure,'' it is doubtful that we would ever even think of the possibility of defining an operation such as sum. Moreover, insofar as performing the summation is concerned, the details of how term may be constructed from more primitive operations are irrelevant.
The name cons stands for ``construct.'' The names car and cdr derive from the original implementation of Lisp on the IBM 704. That machine had an addressing scheme that allowed one to reference the ``address'' and ``decrement'' parts of a memory location. Car stands for ``Contents of Address part of Register'' and cdr (pronounced ``could-er'') stands for ``Contents of Decrement part of Register.''
Another way to define the selectors and constructor is(define make-rat cons) (define numer car) (define denom cdr)The first definition associates the name make-rat with the value of the expression cons, which is the primitive procedure that constructs pairs. Thus make-rat and cons are names for the same primitive constructor.Defining selectors and constructors in this way is efficient: Instead of make-rat calling cons, make-rat is cons, so there is only one procedure called, not two, when make-rat is called. On the other hand, doing this defeats debugging aids that trace procedure calls or put breakpoints on procedure calls: You may want to watch make-rat being called, but you certainly don't want to watch every call to cons.
We have chosen not to use this style of definition in this book.
Display is the Scheme primitive for printing data. The Scheme primitive newline starts a new line for printing. Neither of these procedures returns a useful value, so in the uses of print-rat below, we show only what print-rat prints, not what the interpreter prints as the value returned by print-rat.
Surprisingly, this idea is very difficult to formulate rigorously. There are two approaches to giving such a formulation. One, pioneered by C. A. R. Hoare (1972), is known as the method of abstract models. It formalizes the ``procedures plus conditions'' specification as outlined in the rational-number example above. Note that the condition on the rational-number representation was stated in terms of facts about integers (equality and division). In general, abstract models define new kinds of data objects in terms of previously defined types of data objects. Assertions about data objects can therefore be checked by reducing them to assertions about previously defined data objects. Another approach, introduced by Zilles at MIT, by Goguen, Thatcher, Wagner, and Wright at IBM (see Thatcher, Wagner, and Wright 1978), and by Guttag at Toronto (see Guttag 1977), is called algebraic specification. It regards the ``procedures'' as elements of an abstract algebraic system whose behavior is specified by axioms that correspond to our ``conditions,'' and uses the techniques of abstract algebra to check assertions about data objects. Both methods are surveyed in the paper by Liskov and Zilles (1975).
The use of the word ``closure'' here comes from abstract algebra, where a set of elements is said to be closed under an operation if applying the operation to elements in the set produces an element that is again an element of the set. The Lisp community also (unfortunately) uses the word ``closure'' to describe a totally unrelated concept: A closure is an implementation technique for representing procedures with free variables. We do not use the word ``closure'' in this second sense in this book.
The notion that a means of combination should satisfy closure is a straightforward idea. Unfortunately, the data combiners provided in many popular programming languages do not satisfy closure, or make closure cumbersome to exploit. In Fortran or Basic, one typically combines data elements by assembling them into arrays--but one cannot form arrays whose elements are themselves arrays. Pascal and C admit structures whose elements are structures. However, this requires that the programmer manipulate pointers explicitly, and adhere to the restriction that each field of a structure can contain only elements of a prespecified form. Unlike Lisp with its pairs, these languages have no built-in general-purpose glue that makes it easy to manipulate compound data in a uniform way. This limitation lies behind Alan Perlis's comment in his foreword to this book: ``In Pascal the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation. It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.''
In this book, we use list to mean a chain of pairs terminated by the end-of-list marker. In contrast, the term list structure refers to any data structure made out of pairs, not just to lists.
Since nested applications of car and cdr are cumbersome to write, Lisp dialects provide abbreviations for them--for instance,
The names of all such procedures start with c and end with r. Each a between them stands for a car operation and each d for a cdr operation, to be applied in the same order in which they appear in the name. The names car and cdr persist because simple combinations like cadr are pronounceable.
It's remarkable how much energy in the standardization of Lisp dialects has been dissipated in arguments that are literally over nothing: Should nil be an ordinary name? Should the value of nil be a symbol? Should it be a list? Should it be a pair? In Scheme, nil is an ordinary name, which we use in this section as a variable whose value is the end-of-list marker (just as true is an ordinary variable that has a true value). Other dialects of Lisp, including Common Lisp, treat nil as a special symbol. The authors of this book, who have endured too many language standardization brawls, would like to avoid the entire issue. Once we have introduced quotation in section , we will denote the empty list as '() and dispense with the variable nil entirely.
To define f and g using lambda we would write(define f (lambda (x y . z) body)) (define g (lambda w body))
Scheme standardly provides a map procedure that is more general than the one described here. This more general map takes a procedure of n arguments, together with n lists, and applies the procedure to all the first elements of the lists, all the second elements of the lists, and so on, returning a list of the results. For example:(map + (list 1 2 3) (list 40 50 60) (list 700 800 900)) (741 852 963)(map (lambda (x y) (+ x (* 2 y))) (list 1 2 3) (list 4 5 6)) (9 12 15)
The order of the first two clauses in the cond matters, since the empty list satisfies null? and also is not a pair.
This is, in fact, precisely the fringe procedure from exercise . Here we've renamed it to emphasize that it is part of a family of general sequence-manipulation procedures.
Richard Waters (1979) developed a program that automatically analyzes traditional Fortran programs, viewing them in terms of maps, filters, and accumulations. He found that fully 90 percent of the code in the Fortran Scientific Subroutine Package fits neatly into this paradigm. One of the reasons for the success of Lisp as a programming language is that lists provide a standard medium for expressing ordered collections so that they can be manipulated using higher-order operations. The programming language APL owes much of its power and appeal to a similar choice. In APL all data are represented as arrays, and there is a universal and convenient set of generic operators for all sorts of array operations.
According to Knuth (1981), this rule was formulated by W. G. Horner early in the nineteenth century, but the method was actually used by Newton over a hundred years earlier. Horner's rule evaluates the polynomial using fewer additions and multiplications than does the straightforward method of first computing an xn, then adding an-1xn-1, and so on. In fact, it is possible to prove that any algorithm for evaluating arbitrary polynomials must use at least as many additions and multiplications as does Horner's rule, and thus Horner's rule is an optimal algorithm for polynomial evaluation. This was proved (for the number of additions) by A. M. Ostrowski in a 1954 paper that essentially founded the modern study of optimal algorithms. The analogous statement for multiplications was proved by V. Y. Pan in 1966. The book by Borodin and Munro (1975) provides an overview of these and other results about optimal algorithms.
This definition uses the extended version of map described in footnote .
This approach to nested mappings was shown to us by David Turner, whose languages KRC and Miranda provide elegant formalisms for dealing with these constructs. The examples in this section (see also exercise ) are adapted from Turner 1981. In section , we'll see how this approach generalizes to infinite sequences.
We're representing a pair here as a list of two elements rather than as a Lisp pair. Thus, the ``pair'' (i,j) is represented as (list i j), not (cons i j).
The set S-x is the set of all elements of S, excluding x.
Semicolons in Scheme code are used to introduce comments. Everything from the semicolon to the end of the line is ignored by the interpreter. In this book we don't use many comments; we try to make our programs self-documenting by using descriptive names.
The picture language is based on the language Peter Henderson created to construct images like M.C. Escher's ``Square Limit'' woodcut (see Henderson 1982). The woodcut incorporates a repeated scaled pattern, similar to the arrangements drawn using the square-limit procedure in this section.
William Barton Rogers (1804-1882) was the founder and first president of MIT. A geologist and talented teacher, he taught at William and Mary College and at the University of Virginia. In 1859 he moved to Boston, where he had more time for research, worked on a plan for establishing a ``polytechnic institute,'' and served as Massachusetts's first State Inspector of Gas Meters.When MIT was established in 1861, Rogers was elected its first president. Rogers espoused an ideal of ``useful learning'' that was different from the university education of the time, with its overemphasis on the classics, which, as he wrote, ``stand in the way of the broader, higher and more practical instruction and discipline of the natural and social sciences.'' This education was likewise to be different from narrow trade-school education. In Rogers's words:
- The world-enforced distinction between the practical and the scientific worker is utterly futile, and the whole experience of modern times has demonstrated its utter worthlessness.
Rogers served as president of MIT until 1870, when he resigned due to ill health. In 1878 the second president of MIT, John Runkle, resigned under the pressure of a financial crisis brought on by the Panic of 1873 and strain of fighting off attempts by Harvard to take over MIT. Rogers returned to hold the office of president until 1881.
Rogers collapsed and died while addressing MIT's graduating class at the commencement exercises of 1882. Runkle quoted Rogers's last words in a memorial address delivered that same year:
In the words of Francis A. Walker (MIT's third president):
- ``As I stand here today and see what the Institute is, ... I call to mind the beginnings of science. I remember one hundred and fifty years ago Stephen Hales published a pamphlet on the subject of illuminating gas, in which he stated that his researches had demonstrated that 128 grains of bituminous coal--''
- ``Bituminous coal,'' these were his last words on earth. Here he bent forward, as if consulting some notes on the table before him, then slowly regaining an erect position, threw up his hands, and was translated from the scene of his earthly labors and triumphs to ``the tomorrow of death,'' where the mysteries of life are solved, and the disembodied spirit finds unending satisfaction in contemplating the new and still unfathomable mysteries of the infinite future.
- All his life he had borne himself most faithfully and heroically, and he died as so good a knight would surely have wished, in harness, at his post, and in the very part and act of public duty.
Equivalently, we could write(define flipped-pairs (square-of-four identity flip-vert identity flip-vert))
Rotate180 rotates a painter by 180 degrees (see exercise ). Instead of rotate180 we could say (compose flip-vert flip-horiz), using the compose procedure from exercise .
Frame-coord-map uses the vector operations described in exercise below, which we assume have been implemented using some representation for vectors. Because of data abstraction, it doesn't matter what this vector representation is, so long as the vector operations behave correctly.
Segments->painter uses the representation for line segments described in exercise below. It also uses the for-each procedure described in exercise .
For example, the rogers painter of figure was constructed from a gray-level image. For each point in a given frame, the rogers painter determines the point in the image that is mapped to it under the frame coordinate map, and shades it accordingly. By allowing different types of painters, we are capitalizing on the abstract data idea discussed in section , where we argued that a rational-number representation could be anything at all that satisfies an appropriate condition. Here we're using the fact that a painter can be implemented in any way at all, so long as it draws something in the designated frame. Section also showed how pairs could be implemented as procedures. Painters are our second example of a procedural representation for data.
Rotate90 is a pure rotation only for square frames, because it also stretches and shrinks the image to fit into the rotated frame.
The diamond-shaped images in figures and were created with squashinwards applied to wave and rogers.
Section describes one such language.
Allowing quotation in a language wreaks havoc with the ability to reason about the language in simple terms, because it destroys the notion that equals can be substituted for equals. For example, three is one plus two, but the word ``three'' is not the phrase ``one plus two.'' Quotation is powerful because it gives us a way to build expressions that manipulate other expressions (as we will see when we write an interpreter in chapter 4). But allowing statements in a language that talk about other statements in that language makes it very difficult to maintain any coherent principle of what ``equals can be substituted for equals'' should mean. For example, if we know that the evening star is the morning star, then from the statement ``the evening star is Venus'' we can deduce ``the morning star is Venus.'' However, given that ``John knows that the evening star is Venus'' we cannot infer that ``John knows that the morning star is Venus.''
The single quote is different from the double quote we have been using to enclose character strings to be printed. Whereas the single quote can be used to denote lists or symbols, the double quote is used only with character strings. In this book, the only use for character strings is as items to be printed.
Strictly, our use of the quotation mark violates the general rule that all compound expressions in our language should be delimited by parentheses and look like lists. We can recover this consistency by introducing a special form quote, which serves the same purpose as the quotation mark. Thus, we would type (quote a) instead of 'a, and we would type (quote (a b c)) instead of '(a b c). This is precisely how the interpreter works. The quotation mark is just a single-character abbreviation for wrapping the next complete expression with quote to form (quote expression). This is important because it maintains the principle that any expression seen by the interpreter can be manipulated as a data object. For instance, we could construct the expression (car '(a b c)), which is the same as (car (quote (a b c))), by evaluating (list 'car (list 'quote '(a b c))).
We can consider two symbols to be ``the same'' if they consist of the same characters in the same order. Such a definition skirts a deep issue that we are not yet ready to address: the meaning of ``sameness'' in a programming language. We will return to this in chapter 3 (section ).
In practice, programmers use equal? to compare lists that contain numbers as well as symbols. Numbers are not considered to be symbols. The question of whether two numerically equal numbers (as tested by =) are also eq? is highly implementation-dependent. A better definition of equal? (such as the one that comes as a primitive in Scheme) would also stipulate that if a and b are both numbers, then a and b are equal? if they are numerically equal.
If we want to be more formal, we can specify ``consistent with the interpretations given above'' to mean that the operations satisfy a collection of rules such as these:
- For any set S and any object x, (element-of-set? x (adjoin-set x S)) is true (informally: ``Adjoining an object to a set produces a set that contains the object'').
For any sets S and T and any object x, (element-of-set? x (union-set S T)) is equal to (or (element-of-set? x S) (element-of-set? x T)) (informally: ``The elements of (union S T) are the elements that are in S or in T'').
For any object x, (element-of-set? x '()) is false (informally: ``No object is an element of the empty set'').
Halving the size of the problem at each step is the distinguishing characteristic of logarithmic growth, as we saw with the fast-exponentiation algorithm of section and the half-interval search method of section .
We are representing sets in terms of trees, and trees in terms of lists--in effect, a data abstraction built upon a data abstraction. We can regard the procedures entry, left-branch, right-branch, and make-tree as a way of isolating the abstraction of a ``binary tree'' from the particular way we might wish to represent such a tree in terms of list structure.
Examples of such structures include B-trees and red-black trees. There is a large literature on data structures devoted to this problem. See Cormen, Leiserson, and Rivest 1990.
Exercises - are due to Paul Hilfinger.
See Hamming 1980 for a discussion of the mathematical properties of Huffman codes.
In actual computational systems, rectangular form is preferable to polar form most of the time because of roundoff errors in conversion between rectangular and polar form. This is why the complex-number example is unrealistic. Nevertheless, it provides a clear illustration of the design of a system using generic operations and a good introduction to the more substantial systems to be developed later in this chapter.
The arctangent function referred to here, computed by Scheme's atan procedure, is defined so as to take two arguments y and x and to return the angle whose tangent is y/x. The signs of the arguments determine the quadrant of the angle.
We use the list (rectangular) rather than the symbol rectangular to allow for the possibility of operations with multiple arguments, not all of the same type.
The type the constructors are installed under needn't be a list because a constructor is always used to make an object of one particular type.
Apply-generic uses the dotted-tail notation described in exercise , because different generic operations may take different numbers of arguments. In apply-generic, op has as its value the first argument to apply-generic and args has as its value a list of the remaining arguments.Apply-generic also uses the primitive procedure apply, which takes two arguments, a procedure and a list. Apply applies the procedure, using the elements in the list as arguments. For example,
(apply + (list 1 2 3 4))returns 10.
One limitation of this organization is it permits only generic procedures of one argument.
We also have to supply an almost identical procedure to handle the types (schemenumber complex).
See exercise for generalizations.
If we are clever, we can usually get by with fewer than n2 coercion procedures. For instance, if we know how to convert from type 1 to type 2 and from type 2 to type 3, then we can use this knowledge to convert from type 1 to type 3. This can greatly decrease the number of coercion procedures we need to supply explicitly when we add a new type to the system. If we are willing to build the required amount of sophistication into our system, we can have it search the ``graph'' of relations among types and automatically generate those coercion procedures that can be inferred from the ones that are supplied explicitly.
This statement, which also appears in the first edition of this book, is just as true now as it was when we wrote it twelve years ago. Developing a useful, general framework for expressing the relations among different types of entities (what philosophers call ``ontology'') seems intractably difficult. The main difference between the confusion that existed ten years ago and the confusion that exists now is that now a variety of inadequate ontological theories have been embodied in a plethora of correspondingly inadequate programming languages. For example, much of the complexity of object-oriented programming languages--and the subtle and confusing differences among contemporary object-oriented languages--centers on the treatment of generic operations on interrelated types. Our own discussion of computational objects in chapter 3 avoids these issues entirely. Readers familiar with object-oriented programming will notice that we have much to say in chapter 3 about local state, but we do not even mention ``classes'' or ``inheritance.'' In fact, we suspect that these problems cannot be adequately addressed in terms of computer-language design alone, without also drawing on work in knowledge representation and automated reasoning.
A real number can be projected to an integer using the round primitive, which returns the closest integer to its argument.
On the other hand, we will allow polynomials whose coefficients are themselves polynomials in other variables. This will give us essentially the same representational power as a full multivariate system, although it does lead to coercion problems, as discussed below.
For univariate polynomials, giving the value of a polynomial at a given set of points can be a particularly good representation. This makes polynomial arithmetic extremely simple. To obtain, for example, the sum of two polynomials represented in this way, we need only add the values of the polynomials at corresponding points. To transform back to a more familiar representation, we can use the Lagrange interpolation formula, which shows how to recover the coefficients of a polynomial of degree n given the values of the polynomial at n+1 points.
This operation is very much like the ordered union-set operation we developed in exercise . In fact, if we think of the terms of the polynomial as a set ordered according to the power of the indeterminate, then the program that produces the term list for a sum is almost identical to union-set.
To make this work completely smoothly, we should also add to our generic arithmetic system the ability to coerce a ``number'' to a polynomial by regarding it as a polynomial of degree zero whose coefficient is the number. This is necessary if we are going to perform operations such as
which requires adding the coefficient y+1 to the coefficient 2.
In these polynomial examples, we assume that we have implemented the generic arithmetic system using the type mechanism suggested in exercise . Thus, coefficients that are ordinary numbers will be represented as the numbers themselves rather than as pairs whose car is the symbol scheme-number.
Although we are assuming that term lists are ordered, we have implemented adjointerm to simply cons the new term onto the existing term list. We can get away with this so long as we guarantee that the procedures (such as add-terms) that use adjoin-term always call it with a higher-order term than appears in the list. If we did not want to make such a guarantee, we could have implemented adjoin-term to be similar to the adjoin-set constructor for the ordered-list representation of sets (exercise ).
The fact that Euclid's Algorithm works for polynomials is formalized in algebra by saying that polynomials form a kind of algebraic domain called a Euclidean ring. A Euclidean ring is a domain that admits addition, subtraction, and commutative multiplication, together with a way of assigning to each element x of the ring a positive integer ``measure'' m(x) with the properties that for any nonzero x and y and that, given any x and y, there exists a q such that y=qx+r and either r=0 or m(r)< m(x). From an abstract point of view, this is what is needed to prove that Euclid's Algorithm works. For the domain of integers, the measure m of an integer is the absolute value of the integer itself. For the domain of polynomials, the measure of a polynomial is its degree.
In an implementation like MIT Scheme, this produces a polynomial that is indeed a divisor of Q1 and Q2, but with rational coefficients. In many other Scheme systems, in which division of integers can produce limited-precision decimal numbers, we may fail to get a valid divisor.
One extremely efficient and elegant method for computing polynomial GCDs was discovered by Richard Zippel (1979). The method is a probabilistic algorithm, as is the fast test for primality that we discussed in chapter 1. Zippel's book (1993) describes this method, together with other ways to compute polynomial GCDs.
Actually, this is not quite true. One exception was the random-number generator in section . Another exception involved the operation/type tables we introduced in section , where the values of two calls to get with the same arguments depended on intervening calls to put. On the other hand, until we introduce assignment, we have no way to create such procedures ourselves.
The value of a set! expression is implementation-dependent. Set! should be used only for its effect, not for its value.The name set! reflects a naming convention used in Scheme: Operations that change the values of variables (or that change data structures, as we will see in section ) are given names that end with an exclamation point. This is similar to the convention of designating predicates by names that end with a question mark.
We have already used begin implicitly in our programs, because in Scheme the body of a procedure can be a sequence of expressions. Also, the consequent part of each clause in a cond expression can be a sequence of expressions rather than a single expression.
In programming-language jargon, the variable balance is said to be encapsulated within the new-withdraw procedure. Encapsulation reflects the general system-design principle known as the hiding principle: One can make a system more modular and robust by protecting parts of the system from each other; that is, by providing information access only to those parts of the system that have a ``need to know.''
In contrast with new-withdraw above, we do not have to use let to make balance a local variable, since formal parameters are already local. This will be clearer after the discussion of the environment model of evaluation in section . (See also exercise .)
One common way to implement rand-update is to use the rule that x is updated to ax+b modulo m, where a, b, and m are appropriately chosen integers. Chapter 3 of Knuth 1981 includes an extensive discussion of techniques for generating sequences of random numbers and establishing their statistical properties. Notice that the rand-update procedure computes a mathematical function: Given the same input twice, it produces the same output. Therefore, the number sequence produced by rand-update certainly is not ``random,'' if by ``random'' we insist that each number in the sequence is unrelated to the preceding number. The relation between ``real randomness'' and so-called pseudo-random sequences, which are produced by well-determined computations and yet have suitable statistical properties, is a complex question involving difficult issues in mathematics and philosophy. Kolmogorov, Solomonoff, and Chaitin have made great progress in clarifying these issues; a discussion can be found in Chaitin 1975.
This theorem is due to E. Cesàro. See section 4.5.2 of Knuth 1981 for a discussion and a proof.
MIT Scheme provides such a procedure. If random is given an exact integer (as in section ) it returns an exact integer, but if it is given a decimal value (as in this exercise) it returns a decimal value.
We don't substitute for the occurrence of balance in the set! expression because the name in a set! is not evaluated. If we did substitute for it, we would get (set! 25 (- 25 amount)), which makes no sense.
The phenomenon of a single computational object being accessed by more than one name is known as aliasing. The joint bank account situation illustrates a very simple example of an alias. In section we will see much more complex examples, such as ``distinct'' compound data structures that share parts. Bugs can occur in our programs if we forget that a change to an object may also, as a ``side effect,'' change a ``different'' object because the two ``different'' objects are actually a single object appearing under different aliases. These so-called side-effect bugs are so difficult to locate and to analyze that some people have proposed that programming languages be designed in such a way as to not allow side effects or aliasing (Lampson et al. 1981; Morris, Schmidt, and Wadler 1980).
In view of this, it is ironic that introductory programming is most often taught in a highly imperative style. This may be a vestige of a belief, common throughout the 1960s and 1970s, that programs that call procedures must inherently be less efficient than programs that perform assignments. (Steele (1977) debunks this argument.) Alternatively it may reflect a view that step-by-step assignment is easier for beginners to visualize than procedure call. Whatever the reason, it often saddles beginning programmers with ``should I set this variable before or after that one'' concerns that can complicate programming and obscure the important ideas.
Assignment introduces a subtlety into step 1 of the evaluation rule. As shown in exercise , the presence of assignment allows us to write expressions that will produce different values depending on the order in which the subexpressions in a combination are evaluated. Thus, to be precise, we should specify an evaluation order in step 1 (e.g., left to right or right to left). However, this order should always be considered to be an implementation detail, and one should never write programs that depend on some particular order. For instance, a sophisticated compiler might optimize a program by varying the order in which subexpressions are evaluated.
If there is already a binding for the variable in the current frame, then the binding is changed. This is convenient because it allows redefinition of symbols; however, it also means that define can be used to change values, and this brings up the issues of assignment without explicitly using set!. Because of this, some people prefer redefinitions of existing symbols to signal errors or warnings.
The environment model will not clarify our claim in section that the interpreter can execute a procedure such as fact-iter in a constant amount of space using tail recursion. We will discuss tail recursion when we deal with the control structure of the interpreter in section .
Whether W1 and W2 share the same physical code stored in the computer, or whether they each keep a copy of the code, is a detail of the implementation. For the interpreter we implement in chapter 4, the code is in fact shared.
Set-car! and set-cdr! return implementation-dependent values. Like set!, they should be used only for their effect.
We see from this that mutation operations on lists can create ``garbage'' that is not part of any accessible structure. We will see in section that Lisp memory-management systems include a garbage collector, which identifies and recycles the memory space used by unneeded pairs.
Get-new-pair is one of the operations that must be implemented as part of the memory management required by a Lisp implementation. We will discuss this in section .
The two pairs are distinct because each call to cons returns a new pair. The symbols are shared; in Scheme there is a unique symbol with any given name. Since Scheme provides no way to mutate a symbol, this sharing is undetectable. Note also that the sharing is what enables us to compare symbols using eq?, which simply checks equality of pointers.
The subtleties of dealing with sharing of mutable data objects reflect the underlying issues of ``sameness'' and ``change'' that were raised in section . We mentioned there that admitting change to our language requires that a compound object must have an ``identity'' that is something different from the pieces from which it is composed. In Lisp, we consider this ``identity'' to be the quality that is tested by eq?, i.e., by equality of pointers. Since in most Lisp implementations a pointer is essentially a memory address, we are ``solving the problem'' of defining the identity of objects by stipulating that a data object ``itself'' is the information stored in some particular set of memory locations in the computer. This suffices for simple Lisp programs, but is hardly a general way to resolve the issue of ``sameness'' in computational models.
On the other hand, from the viewpoint of implementation, assignment requires us to modify the environment, which is itself a mutable data structure. Thus, assignment and mutation are equipotent: Each can be implemented in terms of the other.
If the first item is the final item in the queue, the front pointer will be the empty list after the deletion, which will mark the queue as empty; we needn't worry about updating the rear pointer, which will still point to the deleted item, because empty-queue? looks only at the front pointer.
Be careful not to make the interpreter try to print a structure that contains cycles. (See exercise .)
Because assoc uses equal?, it can recognize keys that are symbols, numbers, or list structure.
Thus, the first backbone pair is the object that represents the table ``itself''; that is, a pointer to the table is a pointer to this pair. This same backbone pair always starts the table. If we did not arrange things in this way, insert! would have to return a new value for the start of the table when it added a new record.
A full-adder is a basic circuit element used in adding two binary numbers. Here A and B are the bits at corresponding positions in the two numbers to be added, and Cin is the carry bit from the addition one place to the right. The circuit generates SUM, which is the sum bit in the corresponding position, and Cout, which is the carry bit to be propagated to the left.
These procedures are simply syntactic sugar that allow us to use ordinary procedural syntax to access the local procedures of objects. It is striking that we can interchange the role of ``procedures'' and ``data'' in such a simple way. For example, if we write (wire 'get-signal) we think of wire as a procedure that is called with the message get-signal as input. Alternatively, writing (get-signal wire) encourages us to think of wire as a data object that is the input to a procedure get-signal. The truth of the matter is that, in a language in which we can deal with procedures as objects, there is no fundamental difference between ``procedures'' and ``data,'' and we can choose our syntactic sugar to allow us to program in whatever style we choose.
The agenda is a headed list, like the tables in section , but since the list is headed by the time, we do not need an additional dummy header (such as the *table* symbol used with tables).
Observe that the if expression in this procedure has no alternative expression. Such a ``one-armed if statement'' is used to decide whether to do something, rather than to select between two expressions. An if expression returns an unspecified value if the predicate is false and there is no alternative.
In this way, the current time will always be the time of the action most recently processed. Storing this time at the head of the agenda ensures that it will still be available even if the associated time segment has been deleted.
Constraint propagation first appeared in the incredibly forward-looking SKETCHPAD system of Ivan Sutherland (1963). A beautiful constraint-propagation system based on the Smalltalk language was developed by Alan Borning (1977) at Xerox Palo Alto Research Center. Sussman, Stallman, and Steele applied constraint propagation to electrical circuit analysis (Sussman and Stallman 1975; Sussman and Steele 1980). TK!Solver (Konopasek and Jayaraman 1984) is an extensive modeling environment based on constraints.
The setter might not be a constraint. In our temperature example, we used user as the setter.
The expression-oriented format is convenient because it avoids the need to name the intermediate expressions in a computation. Our original formulation of the constraint language is cumbersome in the same way that many languages are cumbersome when dealing with operations on compound data. For example, if we wanted to compute the product , where the variables represent vectors, we could work in ``imperative style,'' using procedures that set the values of designated vector arguments but do not themselves return vectors as values:(v-sum a b temp1) (v-sum c d temp2) (v-prod temp1 temp2 answer)Alternatively, we could deal with expressions, using procedures that return vectors as values, and thus avoid explicitly mentioning temp1 and temp2:(define answer (v-prod (v-sum a b) (v-sum c d)))Since Lisp allows us to return compound objects as values of procedures, we can transform our imperative-style constraint language into an expression-oriented style as shown in this exercise. In languages that are impoverished in handling compound objects, such as Algol, Basic, and Pascal (unless one explicitly uses Pascal pointer variables), one is usually stuck with the imperative style when manipulating compound objects. Given the advantage of the expression-oriented format, one might ask if there is any reason to have implemented the system in imperative style, as we did in this section. One reason is that the non-expression-oriented constraint language provides a handle on constraint objects (e.g., the value of the adder procedure) as well as on connector objects. This is useful if we wish to extend the system with new operations that communicate with constraints directly rather than only indirectly via operations on connectors. Although it is easy to implement the expression-oriented style in terms of the imperative implementation, it is very difficult to do the converse.
Most real processors actually execute a few operations at a time, following a strategy called pipelining. Although this technique greatly improves the effective utilization of the hardware, it is used only to speed up the execution of a sequential instruction stream, while retaining the behavior of the sequential program.
To quote some graffiti seen on a Cambridge building wall: ``Time is a device that was invented to keep everything from happening at once.''
An even worse failure for this system could occur if the two set! operations attempt to change the balance simultaneously, in which case the actual data appearing in memory might end up being a random combination of the information being written by the two processes. Most computers have interlocks on the primitive memory-write operations, which protect against such simultaneous access. Even this seemingly simple kind of protection, however, raises implementation challenges in the design of multiprocessing computers, where elaborate cache-coherence protocols are required to ensure that the various processors will maintain a consistent view of memory contents, despite the fact that data may be replicated (``cached'') among the different processors to increase the speed of memory access.
The factorial program in section illustrates this for a single sequential process.
The columns show the contents of Peter's wallet, the joint account (in Bank1), Paul's wallet, and Paul's private account (in Bank2), before and after each withdrawal (W) and deposit (D). Peter withdraws $10 from Bank1; Paul deposits $5 in Bank2, then withdraws $25 from Bank1.
A more formal way to express this idea is to say that concurrent programs are inherently nondeterministic. That is, they are described not by single-valued functions, but by functions whose results are sets of possible values. In section we will study a language for expressing nondeterministic computations.
Parallel-execute is not part of standard Scheme, but it can be implemented in MIT Scheme. In our implementation, the new concurrent processes also run concurrently with the original Scheme process. Also, in our implementation, the value returned by parallel-execute is a special control object that can be used to halt the newly created processes.
We have simplified exchange by exploiting the fact that our deposit message accepts negative amounts. (This is a serious bug in our banking system!)
If the account balances start out as $10, $20, and $30, then after any number of concurrent exchanges, the balances should still be $10, $20, and $30 in some order. Serializing the deposits to individual accounts is not sufficient to guarantee this. See exercise .
Exercise investigates why deposits and withdrawals are no longer automatically serialized by the account.
The term ``mutex'' is an abbreviation for mutual exclusion. The general problem of arranging a mechanism that permits concurrent processes to safely share resources is called the mutual exclusion problem. Our mutex is a simple variant of the semaphore mechanism (see exercise ), which was introduced in the ``THE'' Multiprogramming System developed at the Technological University of Eindhoven and named for the university's initials in Dutch (Dijkstra 1968a). The acquire and release operations were originally called P and V, from the Dutch words passeren (to pass) and vrijgeven (to release), in reference to the semaphores used on railroad systems. Dijkstra's classic exposition (1968b) was one of the first to clearly present the issues of concurrency control, and showed how to use semaphores to handle a variety of concurrency problems.
In most time-shared operating systems, processes that are blocked by a mutex do not waste time ``busy-waiting'' as above. Instead, the system schedules another process to run while the first is waiting, and the blocked process is awakened when the mutex becomes available.
In MIT Scheme for a single processor, which uses a time-slicing model, testandset! can be implemented as follows:(define (test-and-set! cell) (without-interrupts (lambda () (if (car cell) true (begin (set-car! cell true) false)))))Without-interrupts disables time-slicing interrupts while its procedure argument is being executed.
There are many variants of such instructions--including test-and-set, test-and-clear, swap, compare-and-exchange, load-reserve, and store-conditional--whose design must be carefully matched to the machine's processor-memory interface. One issue that arises here is to determine what happens if two processes attempt to acquire the same resource at exactly the same time by using such an instruction. This requires some mechanism for making a decision about which process gets control. Such a mechanism is called an arbiter. Arbiters usually boil down to some sort of hardware device. Unfortunately, it is possible to prove that one cannot physically construct a fair arbiter that works 100% of the time unless one allows the arbiter an arbitrarily long time to make its decision. The fundamental phenomenon here was originally observed by the fourteenth-century French philosopher Jean Buridan in his commentary on Aristotle's De caelo. Buridan argued that a perfectly rational dog placed between two equally attractive sources of food will starve to death, because it is incapable of deciding which to go to first.
The general technique for avoiding deadlock by numbering the shared resources and acquiring them in order is due to Havender (1968). Situations where deadlock cannot be avoided require deadlock-recovery methods, which entail having processes ``back out'' of the deadlocked state and try again. Deadlock-recovery mechanisms are widely used in database management systems, a topic that is treated in detail in Gray and Reuter 1993.
One such alternative to serialization is called barrier synchronization. The programmer permits concurrent processes to execute as they please, but establishes certain synchronization points (``barriers'') through which no process can proceed until all the processes have reached the barrier. Modern processors provide machine instructions that permit programmers to establish synchronization points at places where consistency is required. The PowerPC, for example, includes for this purpose two instructions called SYNC and EIEIO (Enforced In-order Execution of Input/Output).
This may seem like a strange point of view, but there are systems that work this way. International charges to credit-card accounts, for example, are normally cleared on a per-country basis, and the charges made in different countries are periodically reconciled. Thus the account balance may be different in different countries.
For distributed systems, this perspective was pursued by Lamport (1978), who showed how to use communication to establish ``global clocks'' that can be used to establish orderings on events in distributed systems.
Physicists sometimes adopt this view by introducing the ``world lines'' of particles as a device for reasoning about motion. We've also already mentioned (section ) that this is the natural way to think about signal-processing systems. We will explore applications of streams to signal processing in section .
Assume that we have a predicate prime? (e.g., as in section ) that tests for primality.
In the MIT implementation, the-empty-stream is the same as the empty list '(), and stream-null? is the same as null?.
This should bother you. The fact that we are defining such similar procedures for streams and lists indicates that we are missing some underlying abstraction. Unfortunately, in order to exploit this abstraction, we will need to exert finer control over the process of evaluation than we can at present. We will discuss this point further at the end of section . In section , we'll develop a framework that unifies lists and streams.
Although stream-car and stream-cdr can be defined as procedures, cons-stream must be a special form. If cons-stream were a procedure, then, according to our model of evaluation, evaluating (cons-stream a b) would automatically cause b to be evaluated, which is precisely what we do not want to happen. For the same reason, delay must be a special form, though force can be an ordinary procedure.
The numbers shown here do not really appear in the delayed expression. What actually appears is the original expression, in an environment in which the variables are bound to the appropriate numbers. For example, (+ low 1) with low bound to 10,000 actually appears where 10001 is shown.
There are many possible implementations of streams other than the one described in this section. Delayed evaluation, which is the key to making streams practical, was inherent in Algol 60's call-by-name parameter-passing method. The use of this mechanism to implement streams was first described by Landin (1965). Delayed evaluation for streams was introduced into Lisp by Friedman and Wise (1976). In their implementation, cons always delays evaluating its arguments, so that lists automatically behave as streams. The memoizing optimization is also known as call-by-need. The Algol community would refer to our original delayed objects as call-by-name thunks and to the optimized versions as call-by-need thunks.
Exercises such as and are valuable for testing our understanding of how delay works. On the other hand, intermixing delayed evaluation with printing--and, even worse, with assignment--is extremely confusing, and instructors of courses on computer languages have traditionally tormented their students with examination questions such as the ones in this section. Needless to say, writing programs that depend on such subtleties is odious programming style. Part of the power of stream processing is that it lets us ignore the order in which events actually happen in our programs. Unfortunately, this is precisely what we cannot afford to do in the presence of assignment, which forces us to be concerned with time and change.
Eratosthenes, a third-century B.C. Alexandrian Greek philosopher, is famous for giving the first accurate estimate of the circumference of the Earth, which he computed by observing shadows cast at noon on the day of the summer solstice. Eratosthenes's sieve method, although ancient, has formed the basis for special-purpose hardware ``sieves'' that, until recently, were the most powerful tools in existence for locating large primes. Since the 70s, however, these methods have been superseded by outgrowths of the probabilistic techniques discussed in section .
We have named these figures after Peter Henderson, who was the first person to show us diagrams of this sort as a way of thinking about stream processing. Each solid line represents a stream of values being transmitted. The dashed line from the car to the cons and the filter indicates that this is a single value rather than a stream.
This uses the generalized version of stream-map from exercise .
This last point is very subtle and relies on the fact that . (Here, pk denotes the kth prime.) Estimates such as these are very difficult to establish. The ancient proof by Euclid that there are an infinite number of primes shows that , and no substantially better result was proved until 1851, when the Russian mathematician P. L. Chebyshev established that for all n. This result, originally conjectured in 1845, is known as Bertrand's hypothesis. A proof can be found in section 22.3 of Hardy and Wright 1960.
This exercise shows how call-by-need is closely related to ordinary memoization as described in exercise . In that exercise, we used assignment to explicitly construct a local table. Our call-by-need stream optimization effectively constructs such a table automatically, storing values in the previously forced parts of the stream.
We can't use let to bind the local variable guesses, because the value of guesses depends on guesses itself. Exercise addresses why we want a local variable here.
As in section , we represent a pair of integers as a list rather than a Lisp pair.
See exercise for some insight into why we chose this decomposition.
The precise statement of the required property on the order of combination is as follows: There should be a function f of two arguments such that the pair corresponding to element i of the first stream and element j of the second stream will appear as element number f(i,j) of the output stream. The trick of using interleave to accomplish this was shown to us by David Turner, who employed it in the language KRC (Turner 1981).
We will require that the weighting function be such that the weight of a pair increases as we move out along a row or down along a column of the array of pairs.
To quote from G. H. Hardy's obituary of Ramanujan (Hardy 1921): ``It was Mr. Littlewood (I believe) who remarked that `every positive integer was one of his friends.' I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to me a rather dull one, and that I hoped it was not an unfavorable omen. `No,' he replied, `it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.''' The trick of using weighted pairs to generate the Ramanujan numbers was shown to us by Charles Leiserson.
This procedure is not guaranteed to work in all Scheme implementations, although for any implementation there is a simple variation that will work. The problem has to do with subtle differences in the ways that Scheme implementations handle internal definitions. (See section .)
This is a small reflection, in Lisp, of the difficulties that conventional strongly typed languages such as Pascal have in coping with higher-order procedures. In such languages, the programmer must specify the data types of the arguments and the result of each procedure: number, logical value, sequence, and so on. Consequently, we could not express an abstraction such as ``map a given procedure proc over all the elements in a sequence'' by a single higher-order procedure such as stream-map. Rather, we would need a different mapping procedure for each different combination of argument and result data types that might be specified for a proc. Maintaining a practical notion of ``data type'' in the presence of higher-order procedures raises many difficult issues. One way of dealing with this problem is illustrated by the language ML (Gordon, Milner, and Wadsworth 1979), whose ``polymorphic data types'' include templates for higher-order transformations between data types. Moreover, data types for most procedures in ML are never explicitly declared by the programmer. Instead, ML includes a type-inferencing mechanism that uses information in the environment to deduce the data types for newly defined procedures.
Similarly in physics, when we observe a moving particle, we say that the position (state) of the particle is changing. However, from the perspective of the particle's world line in space-time there is no change involved.
John Backus, the inventor of Fortran, gave high visibility to functional programming when he was awarded the ACM Turing award in 1978. His acceptance speech (Backus 1978) strongly advocated the functional approach. A good overview of functional programming is given in Henderson 1980 and in Darlington, Henderson, and Turner 1982.
Observe that, for any two streams, there is in general more than one acceptable order of interleaving. Thus, technically, ``merge'' is a relation rather than a function--the answer is not a deterministic function of the inputs. We already mentioned (footnote ) that nondeterminism is essential when dealing with concurrency. The merge relation illustrates the same essential nondeterminism, from the functional perspective. In section , we will look at nondeterminism from yet another point of view.
The object model approximates the world by dividing it into separate pieces. The functional model does not modularize along object boundaries. The object model is useful when the unshared state of the ``objects'' is much larger than the state that they share. An example of a place where the object viewpoint fails is quantum mechanics, where thinking of things as individual particles leads to paradoxes and confusions. Unifying the object view with the functional view may have little to do with programming, but rather with fundamental epistemological issues.
The same idea is pervasive throughout all of engineering. For example, electrical engineers use many different languages for describing circuits. Two of these are the language of electrical networks and the language of electrical systems. The network language emphasizes the physical modeling of devices in terms of discrete electrical elements. The primitive objects of the network language are primitive electrical components such as resistors, capacitors, inductors, and transistors, which are characterized in terms of physical variables called voltage and current. When describing circuits in the network language, the engineer is concerned with the physical characteristics of a design. In contrast, the primitive objects of the system language are signal-processing modules such as filters and amplifiers. Only the functional behavior of the modules is relevant, and signals are manipulated without concern for their physical realization as voltages and currents. The system language is erected on the network language, in the sense that the elements of signal-processing systems are constructed from electrical networks. Here, however, the concerns are with the large-scale organization of electrical devices to solve a given application problem; the physical feasibility of the parts is assumed. This layered collection of languages is another example of the stratified design technique illustrated by the picture language of section .
The most important features that our evaluator leaves out are mechanisms for handling errors and supporting debugging. For a more extensive discussion of evaluators, see Friedman, Wand, and Haynes 1992, which gives an exposition of programming languages that proceeds via a sequence of evaluators written in Scheme.
Even so, there will remain important aspects of the evaluation process that are not elucidated by our evaluator. The most important of these are the detailed mechanisms by which procedures call other procedures and return values to their callers. We will address these issues in chapter 5, where we take a closer look at the evaluation process by implementing the evaluator as a simple register machine.
If we grant ourselves the ability to apply primitives, then what remains for us to implement in the evaluator? The job of the evaluator is not to specify the primitives of the language, but rather to provide the connective tissue--the means of combination and the means of abstraction--that binds a collection of primitives to form a language. Specifically:The evaluator enables us to deal with nested expressions. For example, although simply applying primitives would suffice for evaluating the expression (+ 1 6), it is not adequate for handling (+ 1 (* 2 3)). As far as the primitive procedure + is concerned, its arguments must be numbers, and it would choke if we passed it the expression (* 2 3) as an argument. One important role of the evaluator is to choreograph procedure composition so that (* 2 3) is reduced to 6 before being passed as an argument to +.
- The evaluator allows us to use variables. For example, the primitive procedure for addition has no way to deal with expressions such as (+ x 1). We need an evaluator to keep track of variables and obtain their values before invoking the primitive procedures.
- The evaluator allows us to define compound procedures. This involves keeping track of procedure definitions, knowing how to use these definitions in evaluating expressions, and providing a mechanism that enables procedures to accept arguments.
The evaluator provides the special forms, which must be evaluated differently from procedure calls.
We could have simplified the application? clause in eval by using map (and stipulating that operands returns a list) rather than writing an explicit list-of-values procedure. We chose not to use map here to emphasize the fact that the evaluator can be implemented without any use of higher-order procedures (and thus could be written in a language that doesn't have higher-order procedures), even though the language that it supports will include higher-order procedures.
In this case, the language being implemented and the implementation language are the same. Contemplation of the meaning of true? here yields expansion of consciousness without the abuse of substance.
This implementation of define ignores a subtle issue in the handling of internal definitions, although it works correctly in most cases. We will see what the problem is and how to solve it in section .
As we said when we introduced define and set!, these values are implementation-dependent in Scheme--that is, the implementor can choose what value to return.
As mentioned in section , the evaluator sees a quoted expression as a list beginning with quote, even if the expression is typed with the quotation mark. For example, the expression 'a would be seen by the evaluator as (quote a). See exercise .
The value of an if expression when the predicate is false and there is no alternative is unspecified in Scheme; we have chosen here to make it false. We will support the use of the variables true and false in expressions to be evaluated by binding them in the global environment. See section .
These selectors for a list of expressions--and the corresponding ones for a list of operands--are not intended as a data abstraction. They are introduced as mnemonic names for the basic list operations in order to make it easier to understand the explicit-control evaluator in section .
The value of a cond expression when all the predicates are false and there is no else clause is unspecified in Scheme; we have chosen here to make it false.
Practical Lisp systems provide a mechanism that allows a user to add new derived expressions and specify their implementation as syntactic transformations without modifying the evaluator. Such a user-defined transformation is called a macro. Although it is easy to add an elementary mechanism for defining macros, the resulting language has subtle name-conflict problems. There has been much research on mechanisms for macro definition that do not cause these difficulties. See, for example, Kohlbecker 1986, Clinger and Rees 1991, and Hanson 1991.
Frames are not really a data abstraction in the following code: Set-variable-value! and define-variable! use set-car! to directly modify the values in a frame. The purpose of the frame procedures is to make the environment-manipulation procedures easy to read.
The drawback of this representation (as well as the variant in exercise ) is that the evaluator may have to search through many frames in order to find the binding for a given variable. (Such an approach is referred to as deep binding.) One way to avoid this inefficiency is to make use of a strategy called lexical addressing, which will be discussed in section .
Any procedure defined in the underlying Lisp can be used as a primitive for the metacircular evaluator. The name of a primitive installed in the evaluator need not be the same as the name of its implementation in the underlying Lisp; the names are the same here because the metacircular evaluator implements Scheme itself. Thus, for example, we could put (list 'first car) or (list 'square (lambda (x) (* x x))) in the list of primitive-procedures.
Apply-in-underlying-scheme is the apply procedure we have used in earlier chapters. The metacircular evaluator's apply procedure (section ) models the working of this primitive. Having two different things called apply leads to a technical problem in running the metacircular evaluator, because defining the metacircular evaluator's apply will mask the definition of the primitive. One way around this is to rename the metacircular apply to avoid conflict with the name of the primitive procedure. We have assumed instead that we have saved a reference to the underlying apply by doing(define apply-in-underlying-scheme apply)before defining the metacircular apply. This allows us to access the original version of apply under a different name.
The primitive procedure read waits for input from the user, and returns the next complete expression that is typed. For example, if the user types (+ 23 x), read returns a three-element list containing the symbol +, the number 23, and the symbol x. If the user types 'x, read returns a two-element list containing the symbol quote and the symbol x.
The fact that the machines are described in Lisp is inessential. If we give our evaluator a Lisp program that behaves as an evaluator for some other language, say C, the Lisp evaluator will emulate the C evaluator, which in turn can emulate any machine described as a C program. Similarly, writing a Lisp evaluator in C produces a C program that can execute any Lisp program. The deep idea here is that any evaluator can emulate any other. Thus, the notion of ``what can in principle be computed'' (ignoring practicalities of time and memory required) is independent of the language or the computer, and instead reflects an underlying notion of computability. This was first demonstrated in a clear way by Alan M. Turing (1912-1954), whose 1936 paper laid the foundations for theoretical computer science. In the paper, Turing presented a simple computational model--now known as a Turing machine--and argued that any ``effective process'' can be formulated as a program for such a machine. (This argument is known as the Church-Turing thesis.) Turing then implemented a universal machine, i.e., a Turing machine that behaves as an evaluator for Turing-machine programs. He used this framework to demonstrate that there are well-posed problems that cannot be computed by Turing machines (see exercise ), and so by implication cannot be formulated as ``effective processes.'' Turing went on to make fundamental contributions to practical computer science as well. For example, he invented the idea of structuring programs using general-purpose subroutines. See Hodges 1983 for a biography of Turing.
Some people find it counterintuitive that an evaluator, which is implemented by a relatively simple procedure, can emulate programs that are more complex than the evaluator itself. The existence of a universal evaluator machine is a deep and wonderful property of computation. Recursion theory, a branch of mathematical logic, is concerned with logical limits of computation. Douglas Hofstadter's beautiful book Gödel, Escher, Bach (1979) explores some of these ideas.
Warning: This eval primitive is not identical to the eval procedure we implemented in section , because it uses actual Scheme environments rather than the sample environment structures we built in section . These actual environments cannot be manipulated by the user as ordinary lists; they must be accessed via eval or other special operations. Similarly, the apply primitive we saw earlier is not identical to the metacircular apply, because it uses actual Scheme procedures rather than the procedure objects we constructed in sections and .
The MIT implementation of Scheme includes eval, as well as a symbol userinitial-environment that is bound to the initial environment in which the user's input expressions are evaluated.
Although we stipulated that halts? is given a procedure object, notice that this reasoning still applies even if halts? can gain access to the procedure's text and its environment. This is Turing's celebrated Halting Theorem, which gave the first clear example of a non-computable problem, i.e., a well-posed task that cannot be carried out as a computational procedure.
Wanting programs to not depend on this evaluation mechanism is the reason for the ``management is not responsible'' remark in footnote of chapter 1. By insisting that internal definitions come first and do not use each other while the definitions are being evaluated, the IEEE standard for Scheme leaves implementors some choice in the mechanism used to evaluate these definitions. The choice of one evaluation rule rather than another here may seem like a small issue, affecting only the interpretation of ``badly formed'' programs. However, we will see in section that moving to a model of simultaneous scoping for internal definitions avoids some nasty difficulties that would otherwise arise in implementing a compiler.
The IEEE standard for Scheme allows for different implementation strategies by specifying that it is up to the programmer to obey this restriction, not up to the implementation to enforce it. Some Scheme implementations, including MIT Scheme, use the transformation shown above. Thus, some programs that don't obey this restriction will in fact run in such implementations.
The MIT implementors of Scheme support Alyssa on the following grounds: Eva is in principle correct--the definitions should be regarded as simultaneous. But it seems difficult to implement a general, efficient mechanism that does what Eva requires. In the absence of such a mechanism, it is better to generate an error in the difficult cases of simultaneous definitions (Alyssa's notion) than to produce an incorrect answer (as Ben would have it).
This example illustrates a programming trick for formulating recursive procedures without using define. The most general trick of this sort is the Y operator, which can be used to give a ``pure -calculus'' implementation of recursion. (See Stoy 1977 for details on the lambda calculus, and Gabriel 1988 for an exposition of the Y operator in Scheme.)
This technique is an integral part of the compilation process, which we shall discuss in chapter 5. Jonathan Rees wrote a Scheme interpreter like this in about 1982 for the T project (Rees and Adams 1982). Marc Feeley (1986) (see also Feeley and Lapalme 1987) independently invented this technique in his master's thesis.
There is, however, an important part of the variable search that can be done as part of the syntactic analysis. As we will show in section , one can determine the position in the environment structure where the value of the variable will be found, thus obviating the need to scan the environment for the entry that matches the variable.
See exercise for some insight into the processing of sequences.
Snarf: ``To grab, especially a large document or file for the purpose of using it either with or without the owner's permission.'' Snarf down: ``To snarf, sometimes with the connotation of absorbing, processing, or understanding.'' (These definitions were snarfed from Steele et al. 1983. See also Raymond 1993.)
The difference between the ``lazy'' terminology and the ``normal-order'' terminology is somewhat fuzzy. Generally, ``lazy'' refers to the mechanisms of particular evaluators, while ``normal-order'' refers to the semantics of languages, independent of any particular evaluation strategy. But this is not a hard-and-fast distinction, and the two terminologies are often used interchangeably.
The ``strict'' versus ``non-strict'' terminology means essentially the same thing as ``applicative-order'' versus ``normal-order,'' except that it refers to individual procedures and arguments rather than to the language as a whole. At a conference on programming languages you might hear someone say, ``The normal-order language Hassle has certain strict primitives. Other procedures take their arguments by lazy evaluation.''
The word thunk was invented by an informal working group that was discussing the implementation of call-by-name in Algol 60. They observed that most of the analysis of (``thinking about'') the expression could be done at compile time; thus, at run time, the expression would already have been ``thunk'' about (Ingerman et al. 1960).
This is analogous to the use of force on the delayed objects that were introduced in chapter 3 to represent streams. The critical difference between what we are doing here and what we did in chapter 3 is that we are building delaying and forcing into the evaluator, and thus making this uniform and automatic throughout the language.
Lazy evaluation combined with memoization is sometimes referred to as call-by-need argument passing, in contrast to call-by-name argument passing. (Call-by-name, introduced in Algol 60, is similar to non-memoized lazy evaluation.) As language designers, we can build our evaluator to memoize, not to memoize, or leave this an option for programmers (exercise ). As you might expect from chapter 3, these choices raise issues that become both subtle and confusing in the presence of assignments. (See exercises and .) An excellent article by Clinger (1982) attempts to clarify the multiple dimensions of confusion that arise here.
Notice that we also erase the env from the thunk once the expression's value has been computed. This makes no difference in the values returned by the interpreter. It does help save space, however, because removing the reference from the thunk to the env once it is no longer needed allows this structure to be garbage-collected and its space recycled, as we will discuss in section .Similarly, we could have allowed unneeded environments in the memoized delayed objects of section to be garbage-collected, by having memo-proc do something like (set! proc '()) to discard the procedure proc (which includes the environment in which the delay was evaluated) after storing its value.
This exercise demonstrates that the interaction between lazy evaluation and side effects can be very confusing. This is just what you might expect from the discussion in chapter 3.
This is precisely the issue with the unless procedure, as in exercise .
This is the procedural representation described in exercise . Essentially any procedural representation (e.g., a message-passing implementation) would do as well. Notice that we can install these definitions in the lazy evaluator simply by typing them at the driver loop. If we had originally included cons, car, and cdr as primitives in the global environment, they will be redefined. (Also see exercises and .)
This permits us to create delayed versions of more general kinds of list structures, not just sequences. Hughes 1990 discusses some applications of ``lazy trees.''
We assume that we have previously defined a procedure prime? that tests whether numbers are prime. Even with prime? defined, the prime-sum-pair procedure may look suspiciously like the unhelpful ``pseudo-Lisp'' attempt to define the square-root function, which we described at the beginning of section . In fact, a square-root procedure along those lines can actually be formulated as a nondeterministic program. By incorporating a search mechanism into the evaluator, we are eroding the distinction between purely declarative descriptions and imperative specifications of how to compute answers. We'll go even farther in this direction in section .
The idea of amb for nondeterministic programming was first described in 1961 by John McCarthy (see McCarthy 1967).
In actuality, the distinction between nondeterministically returning a single choice and returning all choices depends somewhat on our point of view. From the perspective of the code that uses the value, the nondeterministic choice returns a single value. From the perspective of the programmer designing the code, the nondeterministic choice potentially returns all possible values, and the computation branches so that each value is investigated separately.
One might object that this is a hopelessly inefficient mechanism. It might require millions of processors to solve some easily stated problem this way, and most of the time most of those processors would be idle. This objection should be taken in the context of history. Memory used to be considered just such an expensive commodity. In 1964 a megabyte of RAM cost about $400,000. Now every personal computer has many megabytes of RAM, and most of the time most of that RAM is unused. It is hard to underestimate the cost of mass-produced electronics.
Automagically: ``Automatically, but in a way which, for some reason (typically because it is too complicated, or too ugly, or perhaps even too trivial), the speaker doesn't feel like explaining.'' (Steele 1983, Raymond 1993)
The integration of automatic search strategies into programming languages has had a long and checkered history. The first suggestions that nondeterministic algorithms might be elegantly encoded in a programming language with search and automatic backtracking came from Robert Floyd (1967). Carl Hewitt (1969) invented a programming language called Planner that explicitly supported automatic chronological backtracking, providing for a built-in depth-first search strategy. Sussman, Winograd, and Charniak (1971) implemented a subset of this language, called MicroPlanner, which was used to support work in problem solving and robot planning. Similar ideas, arising from logic and theorem proving, led to the genesis in Edinburgh and Marseille of the elegant language Prolog (which we will discuss in section ). After sufficient frustration with automatic search, McDermott and Sussman (1972) developed a language called Conniver, which included mechanisms for placing the search strategy under programmer control. This proved unwieldy, however, and Sussman and Stallman (1975) found a more tractable approach while investigating methods of symbolic analysis for electrical circuits. They developed a non-chronological backtracking scheme that was based on tracing out the logical dependencies connecting facts, a technique that has come to be known as dependency-directed backtracking. Although their method was complex, it produced reasonably efficient programs because it did little redundant search. Doyle (1979) and McAllester (1978, 1980) generalized and clarified the methods of Stallman and Sussman, developing a new paradigm for formulating search that is now called truth maintenance. Modern problem-solving systems all use some form of truth-maintenance system as a substrate. See Forbus and deKleer 1993 for a discussion of elegant ways to build truth-maintenance systems and applications using truth maintenance. Zabih, McAllester, and Chapman 1987 describes a nondeterministic extension to Scheme that is based on amb; it is similar to the interpreter described in this section, but more sophisticated, because it uses dependency-directed backtracking rather than chronological backtracking. Winston 1992 gives an introduction to both kinds of backtracking.
Our program uses the following procedure to determine if the elements of a list are distinct:(define (distinct? items) (cond ((null? items) true) ((null? (cdr items)) true) ((member (car items) (cdr items)) false) (else (distinct? (cdr items)))))Member is like memq except that it uses equal? instead of eq? to test for equality.
This is taken from a booklet called ``Problematical Recreations,'' published in the 1960s by Litton Industries, where it is attributed to the Kansas State Engineer.
Here we use the convention that the first element of each list designates the part of speech for the rest of the words in the list.
Notice that parse-word uses set! to modify the unparsed input list. For this to work, our amb evaluator must undo the effects of set! operations when it backtracks.
Observe that this definition is recursive--a verb may be followed by any number of prepositional phrases.
This kind of grammar can become arbitrarily complex, but it is only a toy as far as real language understanding is concerned. Real natural-language understanding by computer requires an elaborate mixture of syntactic analysis and interpretation of meaning. On the other hand, even toy parsers can be useful in supporting flexible command languages for programs such as information-retrieval systems. Winston 1992 discusses computational approaches to real language understanding and also the applications of simple grammars to command languages.
Although Alyssa's idea works just fine (and is surprisingly simple), the sentences that it generates are a bit boring--they don't sample the possible sentences of this language in a very interesting way. In fact, the grammar is highly recursive in many places, and Alyssa's technique ``falls into'' one of these recursions and gets stuck. See exercise for a way to deal with this.
We chose to implement the lazy evaluator in section as a modification of the ordinary metacircular evaluator of section . In contrast, we will base the amb evaluator on the analyzing evaluator of section , because the execution procedures in that evaluator provide a convenient framework for implementing backtracking.
We assume that the evaluator supports let (see exercise ), which we have used in our nondeterministic programs.
We didn't worry about undoing definitions, since we can assume that internal definitions are scanned out (section ).
Logic programming has grown out of a long history of research in automatic theorem proving. Early theorem-proving programs could accomplish very little, because they exhaustively searched the space of possible proofs. The major breakthrough that made such a search plausible was the discovery in the early 1960s of the unification algorithm and the resolution principle (Robinson 1965). Resolution was used, for example, by Green and Raphael (1968) (see also Green 1969) as the basis for a deductive question-answering system. During most of this period, researchers concentrated on algorithms that are guaranteed to find a proof if one exists. Such algorithms were difficult to control and to direct toward a proof. Hewitt (1969) recognized the possibility of merging the control structure of a programming language with the operations of a logic-manipulation system, leading to the work in automatic search mentioned in section (footnote ). At the same time that this was being done, Colmerauer, in Marseille, was developing rule-based systems for manipulating natural language (see Colmerauer et al. 1973). He invented a programming language called Prolog for representing those rules. Kowalski (1973; 1979), in Edinburgh, recognized that execution of a Prolog program could be interpreted as proving theorems (using a proof technique called linear Horn-clause resolution). The merging of the last two strands led to the logic-programming movement. Thus, in assigning credit for the development of logic programming, the French can point to Prolog's genesis at the University of Marseille, while the British can highlight the work at the University of Edinburgh. According to people at MIT, logic programming was developed by these groups in an attempt to figure out what Hewitt was talking about in his brilliant but impenetrable Ph.D. thesis. For a history of logic programming, see Robinson 1983.
To see the correspondence between the rules and the procedure, let x in the procedure (where x is nonempty) correspond to (cons u v) in the rule. Then z in the rule corresponds to the append of (cdr x) and y.
This certainly does not relieve the user of the entire problem of how to compute the answer. There are many different mathematically equivalent sets of rules for formulating the append relation, only some of which can be turned into effective devices for computing in any direction. In addition, sometimes ``what is'' information gives no clue ``how to'' compute an answer. For example, consider the problem of computing the y such that y2 = x.
Interest in logic programming peaked during the early 80s when the Japanese government began an ambitious project aimed at building superfast computers optimized to run logic programming languages. The speed of such computers was to be measured in LIPS (Logical Inferences Per Second) rather than the usual FLOPS (FLoating-point Operations Per Second). Although the project succeeded in developing hardware and software as originally planned, the international computer industry moved in a different direction. See Feigenbaum and Shrobe 1993 for an overview evaluation of the Japanese project. The logic programming community has also moved on to consider relational programming based on techniques other than simple pattern matching, such as the ability to deal with numerical constraints such as the ones illustrated in the constraint-propagation system of section .
This uses the dotted-tail notation introduced in exercise .
Actually, this description of not is valid only for simple cases. The real behavior of not is more complex. We will examine not's peculiarities in sections and .
Lisp-value should be used only to perform an operation not provided in the query language. In particular, it should not be used to test equality (since that is what the matching in the query language is designed to do) or inequality (since that can be done with the same rule shown below).
Notice that we do not need same in order to make two things be the same: We just use the same pattern variable for each--in effect, we have one thing instead of two things in the first place. For example, see ?town in the lives-near rule and ?middle-manager in the wheel rule below. Same is useful when we want to force two things to be different, such as ?person-1 and ?person-2 in the lives-near rule. Although using the same pattern variable in two parts of a query forces the same value to appear in both places, using different pattern variables does not force different values to appear. (The values assigned to different pattern variables may be the same or different.)
We will also allow rules without bodies, as in same, and we will interpret such a rule to mean that the rule conclusion is satisfied by any values of the variables.
Because matching is generally very expensive, we would like to avoid applying the full matcher to every element of the data base. This is usually arranged by breaking up the process into a fast, coarse match and the final match. The coarse match filters the data base to produce a small set of candidates for the final match. With care, we can arrange our data base so that some of the work of coarse matching can be done when the data base is constructed rather then when we want to select the candidates. This is called indexing the data base. There is a vast technology built around data-base-indexing schemes. Our implementation, described in section , contains a simple-minded form of such an optimization.
But this kind of exponential explosion is not common in and queries because the added conditions tend to reduce rather than expand the number of frames produced.
There is a large literature on data-base-management systems that is concerned with how to handle complex queries efficiently.
There is a subtle difference between this filter implementation of not and the usual meaning of not in mathematical logic. See section .
In one-sided pattern matching, all the equations that contain pattern variables are explicit and already solved for the unknown (the pattern variable).
Another way to think of unification is that it generates the most general pattern that is a specialization of the two input patterns. That is, the unification of (?x a) and ((b ?y) ?z) is ((b ?y) a), and the unification of (?x a ?y) and (?y ?z a), discussed above, is (a a a). For our implementation, it is more convenient to think of the result of unification as a frame rather than a pattern.
Since unification is a generalization of matching, we could simplify the system by using the unifier to produce both streams. Treating the easy case with the simple matcher, however, illustrates how matching (as opposed to full-blown unification) can be useful in its own right.
The reason we use streams (rather than lists) of frames is that the recursive application of rules can generate infinite numbers of values that satisfy a query. The delayed evaluation embodied in streams is crucial here: The system will print responses one by one as they are generated, regardless of whether there are a finite or infinite number of responses.
That a particular method of inference is legitimate is not a trivial assertion. One must prove that if one starts with true premises, only true conclusions can be derived. The method of inference represented by rule applications is modus ponens, the familiar method of inference that says that if A is true and A implies B is true, then we may conclude that B is true.
We must qualify this statement by agreeing that, in speaking of the ``inference'' accomplished by a logic program, we assume that the computation terminates. Unfortunately, even this qualified statement is false for our implementation of the query language (and also false for programs in Prolog and most other current logic programming languages) because of our use of not and lisp-value. As we will describe below, the not implemented in the query language is not always consistent with the not of mathematical logic, and lisp-value introduces additional complications. We could implement a language consistent with mathematical logic by simply removing not and lisp-value from the language and agreeing to write programs using only simple queries, and, and or. However, this would greatly restrict the expressive power of the language. One of the major concerns of research in logic programming is to find ways to achieve more consistency with mathematical logic without unduly sacrificing expressive power.
This is not a problem of the logic but one of the procedural interpretation of the logic provided by our interpreter. We could write an interpreter that would not fall into a loop here. For example, we could enumerate all the proofs derivable from our assertions and our rules in a breadth-first rather than a depth-first order. However, such a system makes it more difficult to take advantage of the order of deductions in our programs. One attempt to build sophisticated control into such a program is described in deKleer et al. 1977. Another technique, which does not lead to such serious control problems, is to put in special knowledge, such as detectors for particular kinds of loops (exercise ). However, there can be no general scheme for reliably preventing a system from going down infinite paths in performing deductions. Imagine a diabolical rule of the form ``To show P(x) is true, show that P(f(x)) is true,'' for some suitably chosen function f.
Consider the query (not (baseball-fan (Bitdiddle Ben))). The system finds that (baseball-fan (Bitdiddle Ben)) is not in the data base, so the empty frame does not satisfy the pattern and is not filtered out of the initial stream of frames. The result of the query is thus the empty frame, which is used to instantiate the input query to produce (not (baseball-fan (Bitdiddle Ben))).
A discussion and justification of this treatment of not can be found in the article by Clark (1978).
In general, unifying ?y with an expression involving ?y would require our being able to find a fixed point of the equation ?y = expression involving ?y. It is sometimes possible to syntactically form an expression that appears to be the solution. For example, ?y = (f ?y) seems to have the fixed point (f (f (f ... ))), which we can produce by beginning with the expression (f ?y) and repeatedly substituting (f ?y) for ?y. Unfortunately, not every such equation has a meaningful fixed point. The issues that arise here are similar to the issues of manipulating infinite series in mathematics. For example, we know that 2 is the solution to the equation y = 1 + y/2. Beginning with the expression 1 + y/2 and repeatedly substituting 1 + y/2 for y gives
which leads to
However, if we try the same manipulation beginning with the observation that -1 is the solution to the equation y = 1 + 2y, we obtain
which leads to
Although the formal manipulations used in deriving these two equations are identical, the first result is a valid assertion about infinite series but the second is not. Similarly, for our unification results, reasoning with an arbitrary syntactically constructed expression may lead to errors.
Most Lisp systems give the user the ability to modify the ordinary read procedure to perform such transformations by defining reader macro characters. Quoted expressions are already handled in this way: The reader automatically translates 'expression into (quote expression) before the evaluator sees it. We could arrange for ?expression to be transformed into (? expression) in the same way; however, for the sake of clarity we have included the transformation procedure here explicitly.Expand-question-mark and contract-question-mark use several procedures with string in their names. These are Scheme primitives.
This assumption glosses over a great deal of complexity. Usually a large portion of the implementation of a Lisp system is dedicated to making reading and printing work.
One might argue that we don't need to save the old n; after we decrement it and solve the subproblem, we could simply increment it to recover the old value. Although this strategy works for factorial, it cannot work in general, since the old value of a register cannot always be computed from the new one.
In section we will see how to implement a stack in terms of more primitive operations.
Using the receive procedure here is a way to get extract-labels to effectively return two values--labels and insts--without explicitly making a compound data structure to hold them. An alternative implementation, which returns an explicit pair of values, is(define (extract-labels text) (if (null? text) (cons '() '()) (let ((result (extract-labels (cdr text)))) (let ((insts (car result)) (labels (cdr result))) (let ((next-inst (car text))) (if (symbol? next-inst) (cons insts (cons (make-label-entry next-inst insts) labels)) (cons (cons (make-instruction next-inst) insts) labels)))))))which would be called by assemble as follows:(define (assemble controller-text machine) (let ((result (extract-labels controller-text))) (let ((insts (car result)) (labels (cdr result))) (update-insts! insts labels machine) insts)))You can consider our use of receive as demonstrating an elegant way to return multiple values, or simply an excuse to show off a programming trick. An argument like receive that is the next procedure to be invoked is called a ``continuation.'' Recall that we also used continuations to implement the backtracking control structure in the amb evaluator in section .
We could represent memory as lists of items. However, the access time would then not be independent of the index, since accessing the nth element of a list requires n-1 cdr operations.
For completeness, we should specify a make-vector operation that constructs vectors. However, in the present application we will use vectors only to model fixed divisions of the computer memory.
This is precisely the same ``tagged data'' idea we introduced in chapter 2 for dealing with generic operations. Here, however, the data types are included at the primitive machine level rather than constructed through the use of lists.
Type information may be encoded in a variety of ways, depending on the details of the machine on which the Lisp system is to be implemented. The execution efficiency of Lisp programs will be strongly dependent on how cleverly this choice is made, but it is difficult to formulate general design rules for good choices. The most straightforward way to implement typed pointers is to allocate a fixed set of bits in each pointer to be a type field that encodes the data type. Important questions to be addressed in designing such a representation include the following: How many type bits are required? How large must the vector indices be? How efficiently can the primitive machine instructions be used to manipulate the type fields of pointers? Machines that include special hardware for the efficient handling of type fields are said to have tagged architectures.
This decision on the representation of numbers determines whether eq?, which tests equality of pointers, can be used to test for equality of numbers. If the pointer contains the number itself, then equal numbers will have the same pointer. But if the pointer contains the index of a location where the number is stored, equal numbers will be guaranteed to have equal pointers only if we are careful never to store the same number in more than one location.
This is just like writing a number as a sequence of digits, except that each ``digit'' is a number between 0 and the largest number that can be stored in a single pointer.
There are other ways of finding free storage. For example, we could link together all the unused pairs into a free list. Our free locations are consecutive (and hence can be accessed by incrementing a pointer) because we are using a compacting garbage collector, as we will see in section .
This is essentially the implementation of cons in terms of set-car! and set-cdr!, as described in section . The operation get-new-pair used in that implementation is realized here by the free pointer.
This may not be true eventually, because memories may get large enough so that it would be impossible to run out of free memory in the lifetime of the computer. For example, there are about , microseconds in a year, so if we were to cons once per microsecond we would need about 1015 cells of memory to build a machine that could operate for 30 years without running out of memory. That much memory seems absurdly large by today's standards, but it is not physically impossible. On the other hand, processors are getting faster and a future computer may have large numbers of processors operating in parallel on a single memory, so it may be possible to use up memory much faster than we have postulated.
We assume here that the stack is represented as a list as described in section , so that items on the stack are accessible via the pointer in the stack register.
This idea was invented and first implemented by Minsky, as part of the implementation of Lisp for the PDP-1 at the MIT Research Laboratory of Electronics. It was further developed by Fenichel and Yochelson (1969) for use in the Lisp implementation for the Multics time-sharing system. Later, Baker (1978) developed a ``real-time'' version of the method, which does not require the computation to stop during garbage collection. Baker's idea was extended by Hewitt, Lieberman, and Moon (see Lieberman and Hewitt 1983) to take advantage of the fact that some structure is more volatile and other structure is more permanent.An alternative commonly used garbage-collection technique is the mark-sweep method. This consists of tracing all the structure accessible from the machine registers and marking each pair we reach. We then scan all of memory, and any location that is unmarked is ``swept up'' as garbage and made available for reuse. A full discussion of the mark-sweep method can be found in Allen 1978.
The Minsky-Fenichel-Yochelson algorithm is the dominant algorithm in use for large-memory systems because it examines only the useful part of memory. This is in contrast to mark-sweep, in which the sweep phase must check all of memory. A second advantage of stop-and-copy is that it is a compacting garbage collector. That is, at the end of the garbage-collection phase the useful data will have been moved to consecutive memory locations, with all garbage pairs compressed out. This can be an extremely important performance consideration in machines with virtual memory, in which accesses to widely separated memory addresses may require extra paging operations.
This list of registers does not include the registers used by the storage-allocation system--root, the-cars, the-cdrs, and the other registers that will be introduced in this section.
The term broken heart was coined by David Cressey, who wrote a garbage collector for MDL, a dialect of Lisp developed at MIT during the early 1970s.
The garbage collector uses the low-level predicate pointer-to-pair? instead of the list-structure pair? operation because in a real system there might be various things that are treated as pairs for garbage-collection purposes. For example, in a Scheme system that conforms to the IEEE standard a procedure object may be implemented as a special kind of ``pair'' that doesn't satisfy the pair? predicate. For simulation purposes, pointer-to-pair? can be implemented as pair?.
See Batali et al. 1982 for more information on the chip and the method by which it was designed.
In our controller, the dispatch is written as a sequence of test and branch instructions. Alternatively, it could have been written in a data-directed style (and in a real system it probably would have been) to avoid the need to perform sequential tests and to facilitate the definition of new expression types. A machine designed to run Lisp would probably include a dispatch-on-type instruction that would efficiently execute such data-directed dispatches.
This is an important but subtle point in translating algorithms from a procedural language, such as Lisp, to a register-machine language. As an alternative to saving only what is needed, we could save all the registers (except val) before each recursive call. This is called a framed-stack discipline. This would work but might save more registers than necessary; this could be an important consideration in a system where stack operations are expensive. Saving registers whose contents will not be needed later may also hold onto useless data that could otherwise be garbage-collected, freeing space to be reused.
We add to the evaluator data-structure procedures in section the following two procedures for manipulating argument lists:(define (empty-arglist) '())We also use an additional syntax procedure to test for the last operand in a combination:(define (adjoin-arg arg arglist) (append arglist (list arg)))
(define (last-operand? ops) (null? (cdr ops)))
The optimization of treating the last operand specially is known as evlis tail recursion (see Wand 1980). We could be somewhat more efficient in the argument evaluation loop if we made evaluation of the first operand a special case too. This would permit us to postpone initializing argl until after evaluating the first operand, so as to avoid saving argl in this case. The compiler in section performs this optimization. (Compare the construct-arglist procedure of section .)
The order of operand evaluation in the metacircular evaluator is determined by the order of evaluation of the arguments to cons in the procedure list-of-values of section (see exercise ).
We saw in section how to implement such a process with a register machine that had no stack; the state of the process was stored in a fixed set of registers.
This implementation of tail recursion in ev-sequence is one variety of a well-known optimization technique used by many compilers. In compiling a procedure that ends with a procedure call, one can replace the call by a jump to the called procedure's entry point. Building this strategy into the interpreter, as we have done in this section, provides the optimization uniformly throughout the language.
We can define no-more-exps? as follows:(define (no-more-exps? seq) (null? seq))
This isn't really cheating. In an actual implementation built from scratch, we would use our explicit-control evaluator to interpret a Scheme program that performs source-level transformations like cond->if in a syntax phase that runs before execution.
We assume here that read and the various printing operations are available as primitive machine operations, which is useful for our simulation, but completely unrealistic in practice. These are actually extremely complex operations. In practice, they would be implemented using low-level input-output operations such as transferring single characters to and from a device.To support the get-global-environment operation we define
(define the-global-environment (setup-environment))(define (get-global-environment) the-global-environment)
There are other errors that we would like the interpreter to handle, but these are not so simple. See exercise .
We could perform the stack initialization only after errors, but doing it in the driver loop will be convenient for monitoring the evaluator's performance, as described below.
Regrettably, this is the normal state of affairs in conventional compiler-based language systems such as C. In UNIX the system ``dumps core,'' and in DOS/Windows it becomes catatonic. The Macintosh displays a picture of an exploding bomb and offers you the opportunity to reboot the computer--if you're lucky.
This is a theoretical statement. We are not claiming that the evaluator's data paths are a particularly convenient or efficient set of data paths for a general-purpose computer. For example, they are not very good for implementing high-performance floating-point calculations or calculations that intensively manipulate bit vectors.
Actually, the machine that runs compiled code can be simpler than the interpreter machine, because we won't use the exp and unev registers. The interpreter used these to hold pieces of unevaluated expressions. With the compiler, however, these expressions get built into the compiled code that the register machine will run. For the same reason, we don't need the machine operations that deal with expression syntax. But compiled code will use a few additional machine operations (to represent compiled procedure objects) that didn't appear in the explicit-control evaluator machine.
Notice, however, that our compiler is a Scheme program, and the syntax procedures that it uses to manipulate expressions are the actual Scheme procedures used with the metacircular evaluator. For the explicit-control evaluator, in contrast, we assumed that equivalent syntax operations were available as operations for the register machine. (Of course, when we simulated the register machine in Scheme, we used the actual Scheme procedures in our register machine simulation.)
This procedure uses a feature of Lisp called backquote (or quasiquote) that is handy for constructing lists. Preceding a list with a backquote symbol is much like quoting it, except that anything in the list that is flagged with a comma is evaluated.For example, if the value of linkage is the symbol branch25, then the expression `((goto (label ,linkage))) evaluates to the list ((goto (label branch25))). Similarly, if the value of x is the list (a b c), then `(1 2 ,(car x)) evaluates to the list (1 2 a).
We can't just use the labels true-branch, false-branch, and after-if as shown above, because there might be more than one if in the program. The compiler uses the procedure make-label to generate labels. Make-label takes a symbol as argument and returns a new symbol that begins with the given symbol. For example, successive calls to (make-label 'a) would return a1, a2, and so on. Make-label can be implemented similarly to the generation of unique variable names in the query language, as follows:(define label-counter 0)(define (new-label-number) (set! label-counter (+ 1 label-counter)) label-counter)
(define (make-label name) (string->symbol (string-append (symbol->string name) (number->string (new-label-number)))))
We need machine operations to implement a data structure for representing compiled procedures, analogous to the structure for compound procedures described in section :(define (make-compiled-procedure entry env) (list 'compiled-procedure entry env))(define (compiled-procedure? proc) (tagged-list? proc 'compiled-procedure))
(define (compiled-procedure-entry c-proc) (cadr c-proc))
(define (compiled-procedure-env c-proc) (caddr c-proc))
Actually, we signal an error when the target is not val and the linkage is return, since the only place we request return linkages is in compiling procedures, and our convention is that procedures return their values in val.
Making a compiler generate tail-recursive code might seem like a straightforward idea. But most compilers for common languages, including C and Pascal, do not do this, and therefore these languages cannot represent iterative processes in terms of procedure call alone. The difficulty with tail recursion in these languages is that their implementations use the stack to store procedure arguments and local variables as well as return addresses. The Scheme implementations described in this book store arguments and variables in memory to be garbage-collected. The reason for using the stack for variables and arguments is that it avoids the need for garbage collection in languages that would not otherwise require it, and is generally believed to be more efficient. Sophisticated Lisp compilers can, in fact, use the stack for arguments without destroying tail recursion. (See Hanson 1990 for a description.) There is also some debate about whether stack allocation is actually more efficient than garbage collection in the first place, but the details seem to hinge on fine points of computer architecture. (See Appel 1987 and Miller and Rozas 1994 for opposing views on this issue.)
The variable all-regs is bound to the list of names of all the registers:(define all-regs '(env proc val argl continue))
Note that preserving calls append with three arguments. Though the definition of append shown in this book accepts only two arguments, Scheme standardly provides an append procedure that takes an arbitrary number of arguments.
We have used the same symbol + here to denote both the source-language procedure and the machine operation. In general there will not be a one-to-one correspondence between primitives of the source language and primitives of the machine.
Making the primitives into reserved words is in general a bad idea, since a user cannot then rebind these names to different procedures. Moreover, if we add reserved words to a compiler that is in use, existing programs that define procedures with these names will stop working. See exercise for ideas on how to avoid this problem.
This is not true if we allow internal definitions, unless we scan them out. See exercise .
This is the modification to variable lookup required if we implement the scanning method to eliminate internal definitions (exercise ). We will need to eliminate these definitions in order for lexical addressing to work.
Lexical addresses cannot be used to access variables in the global environment, because these names can be defined and redefined interactively at any time. With internal definitions scanned out, as in exercise , the only definitions the compiler sees are those at top level, which act on the global environment. Compilation of a definition does not cause the defined name to be entered in the compile-time environment.
Of course, compiled procedures as well as interpreted procedures are compound (nonprimitive). For compatibility with the terminology used in the explicit-control evaluator, in this section we will use ``compound'' to mean interpreted (as opposed to compiled).
Now that the evaluator machine starts with a branch, we must always initialize the flag register before starting the evaluator machine. To start the machine at its ordinary read-eval-print loop, we could use(define (start-eceval) (set! the-global-environment (setup-environment)) (set-register-contents! eceval 'flag false) (start eceval))
Since a compiled procedure is an object that the system may try to print, we also modify the system print operation user-print (from section ) so that it will not attempt to print the components of a compiled procedure:(define (user-print object) (cond ((compound-procedure? object) (display (list 'compound-procedure (procedure-parameters object) (procedure-body object) '<procedure-env>))) ((compiled-procedure? object) (display '<compiled-procedure>)) (else (display object))))
We can do even better by extending the compiler to allow compiled code to call interpreted procedures. See exercise .
Independent of the strategy of execution, we incur significant overhead if we insist that errors encountered in execution of a user program be detected and signaled, rather than being allowed to kill the system or produce wrong answers. For example, an out-of-bounds array reference can be detected by checking the validity of the reference before performing it. The overhead of checking, however, can be many times the cost of the array reference itself, and a programmer should weigh speed against safety in determining whether such a check is desirable. A good compiler should be able to produce code with such checks, should avoid redundant checks, and should allow programmers to control the extent and type of error checking in the compiled code.Compilers for popular languages, such as C and C++, put hardly any error-checking operations into running code, so as to make things run as fast as possible. As a result, it falls to programmers to explicitly provide error checking. Unfortunately, people often neglect to do this, even in critical applications where speed is not a constraint. Their programs lead fast and dangerous lives. For example, the notorious ``Worm'' that paralyzed the Internet in 1988 exploited the UNIX operating system's failure to check whether the input buffer has overflowed in the finger daemon. (See Spafford 1989.)
Of course, with either the interpretation or the compilation strategy we must also implement for the new machine storage allocation, input and output, and all the various operations that we took as ``primitive'' in our discussion of the evaluator and compiler. One strategy for minimizing work here is to write as many of these operations as possible in Lisp and then compile them for the new machine. Ultimately, everything reduces to a small kernel (such as garbage collection and the mechanism for applying actual machine primitives) that is hand-coded for the new machine.
This strategy leads to amusing tests of correctness of the compiler, such as checking whether the compilation of a program on the new machine, using the compiled compiler, is identical with the compilation of the program on the original Lisp system. Tracking down the source of differences is fun but often frustrating, because the results are extremely sensitive to minuscule details.