An Incremental Approach to Compiler Construction

Abdulaziz Ghuloum
Department of Computer Science, Indiana University, Bloomington, IN 47408
[email protected]

Abstract

Compilers are perceived to be magical artifacts, carefully crafted by the wizards, and unfathomable by the mere mortals. Books on compilers are better described as wizard-talk: written by and for a clique of all-knowing practitioners. Real-life compilers are too complex to serve as an educational tool. And the gap between real-life compilers and the educational toy compilers is too wide. The novice compiler writer stands puzzled facing an impenetrable barrier, "better write an interpreter instead."

The goal of this paper is to break that barrier. We show that building a compiler can be as easy as building an interpreter. The compiler we construct accepts a large subset of the Scheme programming language and produces assembly code for the Intel-x86 architecture, the dominant architecture of personal computing. The development of the compiler is broken into many small incremental steps. Every step yields a fully working compiler for a progressively expanding subset of Scheme. Every compiler step produces real assembly code that can be assembled and then executed directly by the hardware. We assume that the reader is familiar with basic computer architecture: its components and execution model. Detailed knowledge of the Intel-x86 architecture is not required.

The development of the compiler is described in detail in an extended tutorial. Supporting material for the tutorial, such as an automated testing facility coupled with a comprehensive test suite, is provided with the tutorial. It is our hope that current and future implementors of Scheme find in this paper the motivation for developing high-performance compilers and the means for achieving that goal.

Categories and Subject Descriptors D.3.4 [Processors]: Compilers; K.3.2 [Computer and Information Science Education]: Computer science education

Keywords Scheme, Compilers

1. Introduction

Compilers have traditionally been regarded as complex pieces of software. The perception of complexity stems mainly from traditional methods of teaching compilers as well as the lack of available examples of small and functional compilers for real languages.

Compiler books are polarized into two extremes. Some of them focus on "educational" toy compilers while the others focus on "industrial-strength" optimizing compilers. The toy compilers are too simplistic and do not prepare the novice compiler writer to construct a useful compiler. The source language of these compilers often lacks depth and the target machine is often fictitious. Niklaus Wirth states that "to keep both the resulting compiler reasonably simple and the development clear of details that are of relevance only for a specific machine and its idiosyncrasies, we postulate an architecture according to our own choice"[20]. On the other extreme, advanced books focus mainly on optimization techniques, and thus target people who are already well-versed in the topic. There is no gradual progression into the field of compiler writing.

The usual approach to introducing compilers is by describing the structure and organization of a finalized and polished compiler. The sequencing of the material as presented in these books mirrors the passes of the compilers. Many of the issues that a compiler writer has to be aware of are solved beforehand, and only the final solution is presented. The reader is not engaged in the process of developing the compiler.

In these books, the sequential presentation of compiler implementation leads to loss of focus on the big picture. Too much focus is placed on the individual passes of the compiler; thus the reader is not actively aware of the relevance of a single pass to the other passes and where it fits in the whole picture. Andrew Appel states that "a student who implements all the phases described in Part I of the book will have a working compiler"[2]. Part I of Appel's book concludes with a 6-page chapter on "Putting it all together" after presenting 11 chapters on the different passes of Tiger.

Moreover, practical topics such as code generation for a real machine, interfacing to the operating system or to other languages, heap allocation and garbage collection, and the issues surrounding dynamic languages are either omitted completely or placed in an appendix. Muchnick states that "most of the compiler material in this book is devoted to languages that are well suited for compilation: languages that have static, compile-time type systems, that do not allow the user to incrementally change the code, and that typically make much heavier use of stack storage than heap storage"[13].

2. Preliminary Issues

To develop a compiler, there are a few decisions to be made. The source language, the implementation language, and the target architecture must be selected. The development time frame must be set. The development methodology and the final goal must be decided. For the purpose of our tutorial, we made the decisions presented below.

2.1 Our Target Audience

We do not assume that the reader knows anything about assembly language beyond a general knowledge of computer organization, memory, and data structures. The reader is assumed to have very limited or no experience in writing compilers. Some experience with writing simple interpreters is helpful, but not required.

Proceedings of the 2006 Scheme and Functional Programming Workshop
University of Chicago Technical Report TR-2006-06

We assume that the reader has basic knowledge of C and the C standard library (e.g. malloc, printf, etc.). Although our compiler will produce assembly code, some functionality is easier to implement in C; implementing it directly as assembly routines distracts the reader from the more important tasks.

2.2 The Source Language

In our tutorial, we choose a subset of Scheme as the source programming language. The simple and uniform syntax of Scheme obviates the need for a lengthy discussion of scanners and parsers. The execution model of Scheme, with strict call-by-value evaluation, simplifies the implementation. Moreover, all of the Scheme primitives in the subset can be implemented in short sequences of assembly instructions. Although not all of Scheme is implemented in the first compiler, all the major compiler-related issues are tackled. The implementation is a middle ground between a full Scheme compiler and a toy compiler.

In choosing a specific source language, we gain the advantage that the presentation is more concrete, and we eliminate the burden of making the connection from the abstract concepts to the actual language.

2.3 The Implementation Language

Scheme is chosen as the implementation language of the compiler. Scheme's data structures are simple, and most Scheme programmers are familiar with basic tasks such as constructing and processing lists and trees. The ability to manipulate Scheme programs as Scheme data structures greatly simplifies the first steps of constructing a compiler, since the issue of reading the input program is solved. Implementing a lexical scanner and a parser is pushed to the end of the tutorial.

Choosing Scheme as the implementation language also eliminates the need for sophisticated and specialized tools. These tools add a considerable overhead to the initial learning process and distract the reader from acquiring the essential concepts.

2.4 Choosing The Target Architecture

We choose the Intel-x86 architecture as our target platform. The x86 architecture is the dominant architecture on personal computers and thus is widely available.

Talking about compilers that are detached from a particular architecture puts the burden on the reader to make the connection from the abstract ideas to the concrete machine. Novice compiler writers are unlikely to be able to derive the connection on their own. Additionally, the compiler we develop is small enough to be easily portable to other architectures, and the majority of the compiler passes are platform independent.

2.5 Development Time Frame

The development of the compiler must proceed in small steps where every step can be implemented and tested in one sitting. Features that require many sittings to complete are broken down into smaller steps. The result of completing every step is a fully working compiler. The compiler writer, therefore, achieves progress in every step of the development. This is in contrast with the traditional development strategies that advocate developing the compiler as a series of passes, only the last of which gives the sense of accomplishment. With our approach of incremental development, where every step results in a fully working compiler for some subset of Scheme, the risk of not "completing" the compiler is minimized. This approach is useful for people learning about compilers on their own, where the amount of time they can dedicate constantly changes. It is also useful in time-limited settings such as an academic semester.

2.6 Development Methodology

We advocate the following iterative development methodology:

1. Choose a small subset of the source language that we can compile directly to assembly.

2. Write as many test cases as necessary to cover the chosen subset of the language.

3. Write a compiler that accepts an expression (in the chosen subset of the source language) and outputs the equivalent sequence of assembly instructions.

4. Ensure that the compiler is functional, i.e. it passes all the tests that are written beforehand.

5. Refactor the compiler, if necessary, making sure that none of the tests are broken due to incorrect refactoring.

6. Enlarge the subset of the language in a very small step and repeat the cycle by writing more tests and extending the compiler to meet the newly-added requirements.

A fully working compiler for the given subset of the language is available at every step in the development cycle, starting from the first day of development. The test cases are written to help ensure that the implementation meets the specifications and to guard against bugs that may be introduced during the refactoring steps. Knowledge of compilation techniques as well as the target machine is built incrementally. The initial overhead of learning the assembly instructions of the target machine is eliminated: instructions are introduced only when they are needed. The compiler starts small and is well focused on translating the source language to assembly, and every incremental step reinforces that focus.

2.7 Testing Infrastructure

The interface to the compiler is defined by one Scheme procedure, compile-program, that takes as input an s-expression representing a Scheme program. The output assembly is emitted using an emit form that routes the output of the compiler to an assembly file.

Defining the compiler as a Scheme procedure allows us to develop and debug the compiler interactively by inspecting the output assembly code. It also allows us to utilize an automated testing facility. There are two core components of the testing infrastructure: the test cases and the test driver.

The test cases are made of sample programs and their expected output. For example, the test cases for the primitive + may be defined as follows:

(test-section "Simple Addition")
(test-case '(+ 10 15) "25")
(test-case '(+ -10 15) "5")
...

The test driver iterates over the test cases, performing the following actions: (1) the input expression is passed to compile-program to produce assembly code; (2) the assembly code and a minimal run-time system (to support printing) are assembled and linked to form an executable; (3) the executable is run and the output is compared to the expected output string. An error is signaled if any of the previous steps fails.

2.8 The End Goal

For the purpose of this paper, we define the end goal to be writing a compiler powerful enough to compile an interactive evaluator. Building such a compiler forces us to solve many interesting problems.



A large subset of Scheme's core forms (lambda, quote, set!, etc.) and extended forms (cond, case, letrec, internal define, etc.) must be supported by the compiler. Although most of these forms are not essential, their presence allows us to write our programs in a more natural way. In implementing the extended forms, we show how a large number of syntactic forms can be added without changing the core language that the compiler supports.

A large collection of primitives (cons, car, vector?, etc.) and library procedures (map, apply, list->vector, etc.) need to be implemented. Some of these library procedures can be implemented directly, while others require some added support from the compiler. For example, some of the primitives cannot be implemented without supporting variable-arity procedures, and others require the presence of apply. Implementing a writer and a reader requires adding a way to communicate with an external run-time system.

3. Writing a Compiler in 24 Small Steps

Now that we have described the development methodology, we turn our attention to the actual steps taken in constructing a compiler. This section is a brief description of 24 incremental stages: the first is a small language composed only of small integers, and the last covers most of the requirements of R5RS. A more detailed presentation of these stages is in the accompanying extended tutorial.

3.1 Integers

The simplest language that we can compile and test is composed of the fixed-size integers, or fixnums. Let's write a small compiler that takes a fixnum as input and produces a program in assembly that returns that fixnum. Since we don't know yet how to do that, we ask for some help from another compiler that does know: gcc. Let's write a small C function that returns an integer:

int scheme_entry(){
  return 42;
}

Let's compile it using gcc -O3 -fomit-frame-pointer -S test.c and see the output. The most relevant lines of the output file are the following:

1. .text
2. .p2align 4,,15
3. .globl scheme_entry
4. .type scheme_entry, @function
5. scheme_entry:
6. movl $42, %eax
7. ret

Line 1 starts a text segment, where code is located. Line 2 aligns the beginning of the procedure at 4-byte boundaries (not important at this point). Line 3 informs the assembler that the scheme_entry label is global so that it becomes visible to the linker. Line 4 says that scheme_entry is a function. Line 5 denotes the start of the scheme_entry procedure. Line 6 sets the value of the %eax register to 42. Line 7 returns control to the caller, which expects the received value to be in the %eax register.

Generating this file from Scheme is straightforward. Our compiler takes an integer as input and prints the given assembly with the input substituted in for the value to be returned.

(define (compile-program x)
  (emit "movl $~a, %eax" x)
  (emit "ret"))

To test our implementation, we write a small C run-time system that calls our scheme_entry and prints the value it returns:

/* a simple driver for scheme_entry */
#include <stdio.h>
int main(int argc, char** argv){
  printf("%d\n", scheme_entry());
  return 0;
}

3.2 Immediate Constants

Values in Scheme are not limited to the fixnum integers. Booleans, characters, and the empty list form a collection of immediate values. Immediate values are those that can be stored directly in a machine word and therefore do not require additional storage. The types of the immediate objects in Scheme are disjoint; consequently, the implementation cannot use fixnums to denote booleans or characters. The types must also be available at run time to allow the driver to print the values appropriately and to allow us to provide the type predicates (discussed in the next step).

One way of encoding the type information is by dedicating some of the lower bits of the machine word to type information and using the rest of the machine word for storing the value. Every type of value is defined by a mask and a tag. The mask defines which bits of the integer are used for the type information, and the tag defines the value of these bits.

For fixnums, the lower two bits (mask = 11b) must be 0 (tag = 00b). This leaves 30 bits to hold the value of a fixnum. Characters are tagged with 8 bits (tag = 00001111b), leaving 24 bits for the value (7 of which are actually used to encode the ASCII characters). Booleans are given a 7-bit tag (tag = 0011111b) and a 1-bit value. The empty list is given the value 00101111b.

We extend our compiler to handle the immediate types appropriately. The code generator must convert the different immediate values to the corresponding machine integer values.

(define (compile-program x)
  (define (immediate-rep x)
    (cond
      ((integer? x) (shift x fixnum-shift))
      ...))
  (emit "movl $~a, %eax" (immediate-rep x))
  (emit "ret"))

The driver must also be extended to handle the newly-added values. The following code illustrates the concept:

#include <stdio.h>
#define fixnum_mask 3
#define fixnum_tag 0
#define fixnum_shift 2
...
int main(int argc, char** argv){
  int val = scheme_entry();
  if((val & fixnum_mask) == fixnum_tag){
    printf("%d\n", val >> fixnum_shift);
  } else if(val == empty_list){
    printf("()\n");
  } ...
  return 0;
}

3.3 Unary Primitives

We extend the language now to include calls to primitives that accept one argument. We start with the simplest of these primitives: add1 and sub1. To compile an expression of the form (add1 e), we first emit the code for e. That code would evaluate e, placing its value in the %eax register. What remains to be done is incrementing


the value of the %eax register by 4 (the shifted value of 1). The machine instructions that perform addition and subtraction are addl and subl.

(define (emit-expr x)
  (cond
    ((immediate? x)
     (emit "movl $~a, %eax" (immediate-rep x)))
    ((primcall? x)
     (case (primcall-op x)
       ((add1)
        (emit-expr (primcall-operand1 x))
        (emit "addl $~a, %eax" (immediate-rep 1)))
       ...))
    (else ...)))

The primitives integer->char and char->integer can be added next. To convert an integer (assuming it's in the proper range) to a character, the integer (already shifted by 2 bits) is shifted a further 6 bits to make up a total of char-shift; the result is then tagged with the char-tag. Converting a character to a fixnum requires a shift to the right by 6 bits. The choice of tags for the fixnums and characters is important for realizing this concise and potentially fast conversion.

We implement the predicates null?, zero?, and not next. There are many possible ways of implementing each of these predicates. The following sequence works for zero? (assuming the value of the operand is in %eax):

1. cmpl $0, %eax
2. movl $0, %eax
3. sete %al
4. sall $7, %eax
5. orl $63, %eax

Line 1 compares the value of %eax to 0. Line 2 zeros the value of %eax. Line 3 sets %al, the low byte of %eax, to 1 if the two compared values were equal, and to 0 otherwise. Lines 4 and 5 construct the appropriate boolean value from the one bit in %eax. The predicates integer? and boolean? are handled similarly, with the exception that the tag of the value must be extracted (using andl) before it is compared to the fixnum/boolean tag.

3.4 Binary Primitives

Calls to binary, and higher-arity, primitives cannot in general be evaluated using a single register, since evaluating one subexpression may overwrite the value computed for the other subexpression. To implement binary primitives (such as +, *, char<?, etc.), we use a stack to save intermediate values of computations. For example, generating the code for (+ e0 e1) is achieved by (1) emitting the code for e1, (2) emitting an instruction to save the value of %eax on the stack, (3) emitting the code for e0, and (4) adding the value of %eax to the value saved on the stack.

The stack is arranged as a contiguous array of memory locations. A pointer to the base of the stack is in the %esp register. The base of the stack, 0(%esp), contains the return-point. The return-point is an address in memory to which we return after computing the value, and therefore it should not be modified. We are free to use the memory locations above the return-point (-4(%esp), -8(%esp), -12(%esp), etc.) to hold our intermediate values.

In order to guarantee never overwriting any value that will be needed after the evaluation of an expression, we arrange for the code generator to maintain a stack index. The stack index is a negative number that points to the first stack location that is free. The value of the stack index is initialized to −4 and is decremented by 4 (the word size, 4 bytes) every time a new value is saved on the stack. The following segment of code illustrates how the primitive + is implemented:

(define (emit-primitive-call x si)
  (case (primcall-op x)
    ((add1) ...)
    ((+)
     (emit-expr (primcall-operand2 x) si)
     (emit "movl %eax, ~a(%esp)" si)
     (emit-expr
       (primcall-operand1 x)
       (- si wordsize))
     (emit "addl ~a(%esp), %eax" si))
    ...))

The other primitives (-, *, =, <, char=?, etc.) can be easily implemented with what we know so far.

3.5 Local Variables

Now that we have a stack, implementing let and local variables is straightforward. All local variables will be saved on the stack, and an environment mapping variables to stack locations is maintained. When the code generator encounters a let-expression, it first evaluates the right-hand-side expressions, one by one, saving the value of each in a specific stack location. Once all the right-hand sides are evaluated, the environment is extended to associate the new variables with their locations, and code for the body of the let is generated in the new extended environment. When a reference to a variable is encountered, the code generator locates the variable in the environment and emits a load from that location.

(define (emit-expr x si env)
  (cond
    ((immediate? x) ...)
    ((variable? x)
     (emit "movl ~a(%esp), %eax" (lookup x env)))
    ((let? x)
     (emit-let (bindings x) (body x) si env))
    ((primcall? x) ...)
    ...))

(define (emit-let bindings body si env)
  (let f ((b* bindings) (new-env env) (si si))
    (cond
      ((null? b*) (emit-expr body si new-env))
      (else
       (let ((b (car b*)))
         (emit-expr (rhs b) si env)
         (emit "movl %eax, ~a(%esp)" si)
         (f (cdr b*)
            (extend-env (lhs b) si new-env)
            (- si wordsize)))))))

3.6 Conditional Expressions

Conditional evaluation is simple at the assembly level. The simplest implementation of (if test conseq altern) is:

(define (emit-if test conseq altern si env)
  (let ((L0 (unique-label)) (L1 (unique-label)))
    (emit-expr test si env)
    (emit-cmpl (immediate-rep #f) eax)
    (emit-je L0)
    (emit-expr conseq si env)
    (emit-jmp L1)
    (emit-label L0)
    (emit-expr altern si env)
    (emit-label L1)))

The code above first evaluates the test expression and compares the result to the false value. Control is transferred to the alternate


code if the value of the test was false; otherwise, it falls through to the consequent.

3.7 Heap Allocation

Scheme's pairs, vectors, strings, etc. do not fit in one machine word and must be allocated in memory. We allocate all objects from one contiguous area of memory. The heap is preallocated at the start of the program, and its size is chosen to be large enough to accommodate our current needs. A pointer to the beginning of the heap is passed to scheme_entry to serve as the allocation pointer. We dedicate one register, %esi, to hold the allocation pointer. Every time an object is constructed, the value of %esi is incremented according to the size of the object.

Figure 1. Illustration of the heap, (A) before allocation and (B) after allocation. The allocation pointer (ap) is held in the %esi register and its value is always aligned on 8-byte boundaries. Individual objects are allocated at address %esi and the allocation pointer is bumped to the first boundary after the object.

The types of the objects must also be distinguishable from one another. We use a tagging scheme similar to the one used for fixnums, booleans, and characters. Every pointer to a heap-allocated object is tagged by a 3-bit tag (001b for pairs, 010b for vectors, 011b for strings, 101b for symbols, and 110b for closures; 000b, 100b and 111b were already used for fixnums and the other immediate objects). For this tagging scheme to work, we need to guarantee that the lowest three bits of the address of every heap-allocated object are 000b so that the tag and the value of the pointer do not interfere. This is achieved by always allocating objects at double-word (or 8-byte) boundaries.

Let's consider how pairs are implemented first. A pair requires two words of memory to hold its car and cdr fields. A call to (cons 10 20) can be translated to:

movl $40, 0(%esi)  # set the car
movl $80, 4(%esi)  # set the cdr
movl %esi, %eax    # eax = esi | 1
orl $1, %eax
addl $8, %esi      # bump esi

The primitives car and cdr are simple; we only need to remember that the pointer to the pair is its address incremented by 1. Consequently, the car and cdr fields are located at −1 and 3 from the pointer. For example, the primitive caddr translates to:

movl 3(%eax), %eax   # cdr
movl 3(%eax), %eax   # cddr
movl -1(%eax), %eax  # caddr

Vectors and strings are different from pairs in that they vary in length. This has two implications: (1) we must reserve one extra memory location in the vector/string to hold the length, and (2) after allocating the object, the allocation pointer must be aligned to the next double-word boundary (allocating pairs was fine because their size is a multiple of 8). For example, a call to the primitive make-vector translates to:

movl %eax, 0(%esi)  # set the length
movl %eax, %ebx     # save the length
movl %esi, %eax     # eax = esi | 2
orl $2, %eax
addl $11, %ebx      # align size to next
andl $-8, %ebx      # object boundary
addl %ebx, %esi     # advance alloc ptr

Strings are implemented similarly, except that the size of a string is smaller than the size of a vector of the same length. The primitives string-ref (and string-set!) must also take care of converting a byte value to a character (and vice versa).

3.8 Procedure Calls

The implementation of procedures and procedure calls is perhaps the hardest aspect of constructing our compiler. The reason for the difficulty is that Scheme's lambda form performs more than one task, and the compiler must tease these tasks apart. First, a lambda expression closes over the variables that occur free in its body, so we must perform some analysis to determine the set of variables that are referenced, but not defined, in the body of a lambda. Second, lambda constructs a closure object that can be passed around. Third, the notion of procedure calls and parameter-passing must be introduced at the same point. We'll handle these issues one at a time, starting with procedure calls and forgetting all about the other issues surrounding lambda.

We extend the language accepted by our code generator to contain top-level labels (each bound to a code expression containing a list of formal parameters and a body expression) and label calls.

<Prog>  ::= (labels ((lvar <LExpr>) ...) <Expr>)
<LExpr> ::= (code (var ...) <Expr>)
<Expr>  ::= immediate
         | var
         | (if <Expr> <Expr> <Expr>)
         | (let ((var <Expr>) ...) <Expr>)
         | (primcall prim-name <Expr> ...)
         | (labelcall lvar <Expr> ...)

Figure 2. The view of the stack from (A) the Caller's side before making the procedure call, and (B) the Callee's side on entry to the procedure.



Code generation for the new forms is as follows: low address low address

• For the labels form, a new set of unique labels are created and
free
the initial environment is constructed to map each of the lvars stack
to its corresponding label. space …
free
• For each code expression, the label is first emitted, followed arg4 %esp-28

by the code of the body. The environment used for the body outgoing
arguments
arg3 %esp-24

%esp-20
stack
space …
arg2
contains, in addition to the lvars, a mapping of each of the arg1 %esp-16 arg4 %esp-16

formal parameters to the first set of stack locations (−4, −8, local3 %esp-12
outgoing arg3 %esp-12

etc.). The stack index used for evaluating the body starts above locals local2 %esp-8 arguments arg2 %esp-8

local1 %esp-4 arg1 %esp-4


the last index used for the formals. return point %esp return point %esp
base base
• For a (labelcall lvar e ...), the arguments are evaluated high address high address
and their values are saved in consecutive stack locations, skip-
(A) Before Tail Call (B) At Tail Call
ping one location to be used for the return-point. Once all of the
arguments are evaluated, the value of the stack-pointer, %esp
is incremented to point to one word below the return-point. A Figure 3. One way of implementing proper tail calls is by collaps-
call to the label associated with the lvar is issued. A call ing the tail frame. The figures show (A) the evaluation and place-
instruction decrements the value of %esp by 4 and saves the ad- ment of the arguments on the stack above the local variables, then
dress of the next instruction in the appropriate return-point slot. (B) moving the arguments down to overwrite the current frame im-
Once the called procedure returns (with a value in %eax), the
stack pointer is adjusted back to its initial position. Figure 2
illustrates the view of the stack from the caller and callee
perspectives.

3.9 Closures
Implementing closures on top of what we have so far should be
straightforward. First, we modify the language accepted by our
code generator as follows:

• The form (closure lvar var ...) is added to the language. This
  form is responsible for constructing closures. The first cell of
  a closure contains the label of a procedure, and the remaining
  cells contain the values of the free variables.
• The code form is extended to contain a list of the free
  variables in addition to the existing formal parameters.
• The labelcall form is replaced by a funcall form that takes an
  arbitrary expression as its first argument instead of an lvar.

The closure form is similar to a call to vector. The label
associated with the lvar is stored at 0(%esi) and the values of
the variables are stored in the next locations. The value of %esi
is tagged to get the value of the closure, and %esi is bumped by
the required amount.

The code form, in addition to associating the formals with the
corresponding stack locations, associates each of the free
variables with its displacement from the closure pointer %edi.

The funcall form evaluates all the arguments as before but skips
not one but two stack locations: one to be used to save the
current value of the closure pointer, and one for the return
point. After the arguments are evaluated and saved, the operator
is evaluated, and its value is moved to %edi (whose value must
first be saved to its stack location). The value of %esp is
adjusted and an indirect call through the first cell of the
closure pointer is issued. Upon return from the call, the value of
%esp is adjusted back and the value of %edi is restored from the
location at which it was saved.

One additional problem needs to be solved. The source language
that our compiler accepts has a lambda form, and none of the
labels, code, and closure forms. So, Scheme input must be
converted to this form before our code generator can accept it.
The conversion is easy to do in two steps:

1. Free-variable analysis is performed. Every lambda expression
   appearing in the source program is annotated with the set of
   variables that are referenced but not defined in the body of
   the lambda. For example,

   (let ((x 5))
     (lambda (y) (lambda () (+ x y))))

   is transformed to:

   (let ((x 5))
     (lambda (y) (x) (lambda () (x y) (+ x y))))

2. The lambda forms are transformed into closure forms and the
   codes are collected at the top. The previous example yields:

   (labels ((f0 (code () (x y) (+ x y)))
            (f1 (code (y) (x) (closure f0 x y))))
     (let ((x 5)) (closure f1 x)))

3.10 Proper Tail Calls
The Scheme report requires that implementations be properly
tail-recursive. By treating tail calls properly, we guarantee that
an unbounded number of tail calls can be performed in constant
space. So far, our compiler would compile tail calls as regular
calls followed by a return. A proper tail call, on the other hand,
must perform a jmp to the target of the call, reusing the stack
position of the caller itself.

A very simple way of implementing tail calls is as follows
(illustrated in Figure 3):

1. All the arguments are evaluated and saved on the stack in the
   same way arguments to nontail calls are evaluated.
2. The operator is evaluated and placed in the %edi register,
   replacing the current closure pointer.
3. The arguments are copied from their current positions on the
   stack to the positions adjacent to the return point at the base
   of the stack.
4. An indirect jmp, not call, through the address in the closure
   pointer is issued.

This treatment of tail calls is the simplest way of achieving the
objective of the requirement. Other methods for enhancing
performance by minimizing the excessive copying are discussed
later in Section 4.
32 Scheme and Functional Programming, 2006
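The free-variable analysis used in the first conversion step of
Section 3.9 can be modeled compactly. The following Python sketch
is illustrative only, not the compiler's own code: the tuple-based
AST encoding and the PRIMITIVES set are assumptions made for the
example.

```python
# Hypothetical sketch of free-variable analysis over a tiny
# expression language: a variable is a string, a lambda is
# ("lambda", params, body), and any other tuple is an application.

PRIMITIVES = {"+"}  # primitive operators do not count as variables

def free_vars(expr, bound=frozenset()):
    """Set of variables referenced but not defined within expr."""
    if isinstance(expr, str):                   # variable reference
        return set() if expr in bound or expr in PRIMITIVES else {expr}
    if expr[0] == "lambda":                     # ("lambda", params, body)
        _, params, body = expr
        return free_vars(body, bound | set(params))
    # application: union over all sub-expressions
    return set().union(*(free_vars(e, bound) for e in expr))

# The running example: (lambda (y) (lambda () (+ x y)))
inner = ("lambda", (), ("+", "x", "y"))
outer = ("lambda", ("y",), inner)
print(sorted(free_vars(inner)))   # ['x', 'y']: annotated (x y)
print(sorted(free_vars(outer)))   # ['x']: annotated (x)
```

Step 2 of the conversion then uses exactly these sets: each lambda
becomes a closure form whose extra operands are its free
variables, and the matching code form lists them alongside the
formals.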
3.11 Complex Constants
Scheme's constants are not limited to the immediate objects. Using
the quote form, lists, vectors, and strings can be turned into
constants as well. The formal semantics of Scheme require that
quoted constants always evaluate to the same object. The following
example must always evaluate to true:

(let ((f (lambda () (quote (1 . "H")))))
  (eq? (f) (f)))

So, in general, we cannot transform a quoted constant into an
unquoted series of constructions, as the following incorrect
transformation demonstrates:

(let ((f (lambda () (cons 1 (string #\H)))))
  (eq? (f) (f)))

One way of implementing complex constants is by lifting their
construction to the top of the program. The example program can be
transformed to an equivalent program containing no complex
constants as follows:

(let ((tmp0 (cons 1 (string #\H))))
  (let ((f (lambda () tmp0)))
    (eq? (f) (f))))

Performing this transformation before closure conversion makes the
introduced temporaries occur as free variables in the enclosing
lambdas. This increases the size of many closures, increasing heap
consumption and slowing down the compiled programs.

Another approach for implementing complex constants is by
introducing global memory locations to hold the values of these
constants. Every complex constant is assigned a label, denoting
its location. All the complex constants are initialized at the
start of the program. Our running example would be transformed to:

(labels ((f0 (code () () (constant-ref t1)))
         (t1 (datum)))
  (constant-init t1 (cons 1 (string #\H)))
  (let ((f (closure f0)))
    (eq? (f) (f))))

The code generator should now be modified to handle the data
labels as well as the two internal forms constant-ref and
constant-init.

3.12 Assignment
Let's examine how our compiler treats variables. At the source
level, variables are introduced either by let or by lambda. By the
time we get to code generation, a third kind (free variables) is
there as well. When a lambda closes over a reference to a
variable, we copied the value of the variable into a field in the
closure. If more than one closure references the variable, each
gets its own copy of the value. If the variable is assignable,
then all references and assignments occurring in the code must
reference/assign the same location that holds the value of the
variable. Therefore, every assignable variable must be given one
unique location to hold its value.

The way we treat assignment is by making the locations of
assignable variables explicit. These locations cannot in general
be stack-allocated due to the indefinite extent of Scheme's
closures. So, for every assignable variable, we allocate space on
the heap (a vector of size 1) to hold its value. An assignment to
a variable x is rewritten as an assignment to the memory location
holding x (via vector-set!) and references to x are rewritten as
references to the location of x (via vector-ref).

The following example illustrates assignment conversion when
applied to a program containing one assignable variable c:

(let ((f (lambda (c)
           (cons (lambda (v) (set! c v))
                 (lambda () c)))))
  (let ((p (f 0)))
    ((car p) 12)
    ((cdr p))))
=>
(let ((f (lambda (t0)
           (let ((c (vector t0)))
             (cons (lambda (v) (vector-set! c 0 v))
                   (lambda () (vector-ref c 0)))))))
  (let ((p (f 0)))
    ((car p) 12)
    ((cdr p))))

3.13 Extending the Syntax
With most of the core forms (lambda, let, quote, if, set!,
constants, variables, procedure calls, and primitive calls) in
place, we can turn to extending the syntax of the language. The
input to our compiler is preprocessed by a pass, a macro-expander,
which performs the following tasks:

• All the variables are renamed to new unique names through
  α-conversion. This serves two purposes. First, making all
  variables unique eliminates the ambiguity between variables.
  This makes the analysis passes required for closure and
  assignment conversion simpler. Second, there is no fear of
  confusing the core forms with procedure calls to local variables
  with the same name (e.g. an occurrence of (lambda (x) x) where
  lambda is a lexical variable).
• Additionally, this pass places explicit tags on all internal
  forms including function calls (funcall) and primitive calls
  (primcall).
• Extended forms are simplified to the core forms. The forms
  let*, letrec, letrec*, cond, case, or, and, when, unless, and
  internal define are rewritten in terms of the core forms.

3.14 Symbols, Libraries, and Separate Compilation
All of the primitives that we have supported so far were simple
enough to be implemented directly in the compiler as a sequence of
assembly instructions. This is fine for simple primitives such as
pair? and vector-ref, but it will not be practical for
implementing more complex primitives such as length, map, display,
etc.

Also, we restricted our language to allow primitives to occur only
in the operator position: passing the value of the primitive car
around was not allowed because car has no value. One way of fixing
this is by performing an inverse-η transformation:

car ⇒ (lambda (x) (car x)).

This approach has many disadvantages. First, the resulting
assembly code is bloated with too many closures that were not
present in the source program. Second, the primitives cannot be
defined recursively or defined by using common helpers.

Another approach for making an extended library available is by
wrapping the user code with a large letrec that defines all the
primitive libraries. This approach is discouraged because the
intermixing of user code and library code hinders our ability to
debug our compiler.

A better approach is to define the libraries in separate files,
compile them independently, and link them directly with the user
code. The library primitives are initialized before control enters
the user program. Every primitive is given a global location, or a
label, to hold its value. We modify our compiler to handle two
additional forms: (primitive-ref x) and (primitive-set! x v),
which are analogous to constant-ref and constant-init that
we introduced in 3.11. The only difference is that global labels
are used to hold the values of the primitives.

The first library file initializes one primitive: string->symbol.
Our first implementation of string->symbol need not be efficient:
a simple linked list of symbols suffices. The primitive
string->symbol, as its name suggests, takes a string as input and
returns a symbol. By adding the core primitives make-symbol¹ and
symbol-string, the implementation of string->symbol simply
traverses the list of symbols looking for one having the same
string. A new symbol is constructed if one with the same name does
not exist. This new symbol is then added to the list before it is
returned.

Once string->symbol is implemented, adding symbols to our set of
valid complex constants is straightforward by the following
transformation:

(labels ((f0 (code () () 'foo)))
  (let ((f (closure f0)))
    (eq? (funcall f) (funcall f))))
=>
(labels ((f0 (code () () (constant-ref t1)))
         (t1 (datum)))
  (constant-init t1
    (funcall (primitive-ref string->symbol)
             (string #\f #\o #\o)))
  (let ((f (closure f0)))
    (eq? (funcall f) (funcall f))))

3.15 Foreign Functions
Our Scheme implementation cannot exist in isolation. It needs a
way of interacting with the host operating system in order to
perform Input/Output and many other useful operations. We now add
a very simple way of calling foreign C procedures. We add one
additional form to our compiler:

<Expr> ::= (foreign-call <string> <Expr> ...)

The foreign-call form takes a string literal as its first
argument. The string denotes the name of the C procedure that we
intend to call. Each of the expressions is evaluated first and
their values are passed as arguments to the C procedure. The
calling convention for C differs from the calling convention that
we have been using for Scheme in that the arguments are placed
below the return point and in reverse order. Figure 4 illustrates
the difference.

Figure 4. The parameters to Scheme functions are placed on the
stack above the return point while the parameters to C functions
are placed below the return point. (Diagrams omitted: in (A),
Parameter passing to Scheme, the return point is at %esp and the
incoming arguments sit above it at %esp-4, %esp-8, %esp-12, ...;
in (B), Parameter passing to C, the return point is at %esp and
the incoming arguments sit below it at %esp+4, %esp+8,
%esp+12, ...)

To accommodate the C calling convention, we evaluate the arguments
to a foreign-call in reverse order, saving the values on the
stack, adjusting the value of %esp, issuing a call to the named
procedure, then adjusting the stack pointer back to its initial
position. We need not worry about the C procedure clobbering the
values of the allocation and closure pointers because the
Application Binary Interface (ABI) guarantees that the callee
preserves the values of the %edi, %esi, %ebp and %esp
registers [14].

Since the values we pass to the foreign procedure are tagged, we
write wrapper procedures in our run-time file that take care of
converting from Scheme to C values and back.

We first implement and test calling the exit procedure. Calling
(foreign-call "exit" 0) should cause our program to exit without
performing any output. We also implement a wrapper around write as
follows:

ptr s_write(ptr fd, ptr str, ptr len){
  int bytes = write(unshift(fd),
                    string_data(str),
                    unshift(len));
  return shift(bytes);
}

3.16 Error Checking and Safe Primitives
Using our newly acquired ability to write and exit, we can define
a simple error procedure that takes two arguments: a symbol
(denoting the caller of error) and a string (describing the
error). The error procedure writes an error message to the
console, then causes the program to exit.

With error, we can secure some parts of our implementation to
provide better debugging facilities. Better debugging allows us to
progress with implementing the rest of the system more quickly
since we won't have to hunt for the causes of segfaults. There are
three main causes of fatal errors:

1. Attempting to call non-procedures.
2. Passing an incorrect number of arguments to a procedure.
3. Calling primitives with invalid arguments. For example,
   performing (car 5) causes an immediate segfault. Worse,
   performing vector-set! with an index that's out of range causes
   other parts of the system to get corrupted, resulting in
   hard-to-debug errors.

Calling non-procedures can be handled by performing a procedure
check before making the procedure call. If the operator is not a
procedure, control is transferred to an error handler label that
sets up a call to a procedure that reports the error and exits.

Passing an incorrect number of arguments to a procedure can be
handled by a collaboration between the caller and the callee. The
caller, once it performs the procedure check, sets the value of
the %eax register to the number of arguments passed. The callee
checks that the value of %eax is consistent with the number of
arguments it expects. An invalid argument count causes a jump to a
label that calls a procedure that reports the error and exits.

For primitive calls, we can modify the compiler to insert explicit
checks at every primitive call. For example, car translates to:

movl %eax, %ebx
andl $7, %ebx
cmpl $1, %ebx
jne L_car_error
movl -1(%eax), %eax
...
L_car_error:
movl car_err_proc, %edi  # load handler
movl $0, %eax            # set arg-count
jmp *-3(%edi)            # call the handler
...

¹ Symbols are similar to pairs in having two fields: a string and
a value.
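The shift and unshift conversions in the s_write wrapper strip and
restore the fixnum tag. A small Python model may make this
concrete; it assumes the 2-bit fixnum shift used in the earlier
sections of the paper, and the function names simply mirror the C
helpers above.

```python
# Sketch of the fixnum conversions used by the s_write wrapper,
# assuming the representation from earlier sections: fixnums carry
# a 2-bit tag of 00, so the fixnum n is represented as n << 2.

FX_SHIFT = 2

def shift(n):
    """Convert a raw C integer into a tagged Scheme fixnum."""
    return n << FX_SHIFT

def unshift(x):
    """Convert a tagged Scheme fixnum back into a raw C integer."""
    return x >> FX_SHIFT

print(shift(10))            # 40: the fixnum 10 is represented as 40
print(unshift(shift(10)))   # 10: the round trip is lossless
```

Under this scheme, the file descriptor and length arrive in
s_write as tagged fixnums and must be unshifted before the write
system call, and the resulting byte count is re-tagged with shift
before being handed back to Scheme code.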
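The inline check emitted for car can likewise be modeled in
Python. This is an illustrative sketch, not generated code: it
assumes the 3-bit pair tag 001 implied by the andl $7 / cmpl $1
sequence, and uses a dictionary as a stand-in for the heap.

```python
# Model of the inline type check emitted for car: the three low
# bits of a pair pointer must equal the pair tag (001).

PAIR_MASK, PAIR_TAG = 7, 1   # andl $7, %ebx / cmpl $1, %ebx

def checked_car(ptr, heap):
    """Return the car field, or fail as a jump to L_car_error would."""
    if ptr & PAIR_MASK != PAIR_TAG:
        raise TypeError("car: argument is not a pair")
    return heap[ptr - PAIR_TAG]      # movl -1(%eax), %eax

heap = {0: 40, 4: 8}        # pair at address 0: car=40, cdr=8
pair_ptr = 0 | PAIR_TAG     # tagging the address yields the pair value
print(checked_car(pair_ptr, heap))   # 40
```

With this check in place, (car 5) no longer segfaults: the fixnum
5 is represented as 20, whose low bits (100) fail the comparison,
so control reaches the error handler instead of a stray memory
load.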
Another approach is to restrict the compiler to unsafe primitives.
Calls to safe primitives are not open-coded by the compiler;
instead, a procedure call to the safe primitive is issued. The
safe primitives are defined to perform the error checks
themselves. Although this strategy is less efficient than
open-coding the safe primitives, the implementation is much
simpler and less error-prone.

3.17 Variable-arity Procedures
Scheme procedures that accept a variable number of arguments are
easy to implement in the architecture we have defined so far.
Suppose a procedure is defined to accept two or more arguments as
in the following example:

(let ((f (lambda (a b . c) (vector a b c))))
  (f 1 2 3 4))

The call to f passes four arguments in the stack locations %esp-4,
%esp-8, %esp-12, and %esp-16, in addition to the number of
arguments in %eax. Upon entry to f, and after performing the
argument check, f enters a loop that constructs a list of the
extra arguments, last to front.

Implementing variable-arity procedures allows us to define many
library procedures that accept any number of arguments including
+, -, *, =, <, ..., char=?, char<?, ..., string=?, string<?, ...,
list, vector, string, and append.

Other variations of lambda such as case-lambda, which allows us to
dispatch to different parts of the code depending on the number of
actual arguments, can be implemented easily and efficiently by a
series of comparisons and conditional jumps.

3.18 Apply
The implementation of the apply primitive is analogous to the
implementation of variable-arity procedures. Procedures accepting
a variable number of arguments convert the extra arguments passed
on the stack to a list. Calling apply, on the other hand, splices
a list of arguments onto the stack.

When the code generator encounters an apply call, it generates the
code in the same manner as if it were a regular procedure call.
The operands are evaluated and saved in their appropriate stack
locations as usual. The operator is evaluated and checked. In the
case of nontail calls, the current closure pointer is saved and
the stack pointer is adjusted. In the case of tail calls, the
operands are moved to overwrite the current frame. The number of
arguments is placed in %eax as usual. The only difference is that
instead of calling the procedure directly, we call/jmp to the
L_apply label, which splices the last argument onto the stack
before transferring control to the destination procedure.

Implementing apply makes it possible to define the library
procedures that take a function as well as an arbitrary number of
arguments, such as map and for-each.

3.19 Output Ports
The functionality provided by our compiler so far allows us to
implement output ports easily in Scheme. We represent output ports
by a vector containing the following fields:

0. A unique identifier that allows us to distinguish output ports
   from ordinary vectors.
1. A string denoting the file name associated with the port.
2. A file descriptor associated with the opened file.
3. A string that serves as an output buffer.
4. An index pointing to the next position in the buffer.
5. The size of the buffer.

The current-output-port is initialized at startup and its file
descriptor is 1 on Unix systems. The buffers are chosen to be
sufficiently large (4096 characters) in order to reduce the number
of trips to the operating system. The procedure write-char writes
to the buffer and increments the index; if the index of the port
reaches its size, the contents of the buffer are flushed using
s_write (from 3.15) and the index is reset. The procedures
output-port?, open-output-file, close-output-port, and
flush-output-port are also implemented.

3.20 Write and Display
Once write-char is implemented, implementing the procedures write
and display becomes straightforward by dispatching on the type of
the argument. The two procedures are identical except for their
treatment of strings and characters and therefore can be
implemented in terms of one common procedure. In order to write
the fixnums, the primitive quotient must be added to the compiler.

Implementing write in Scheme allows us to eliminate the
now-redundant writer that we implemented as part of the C run-time
system.

3.21 Input Ports
The representation of input ports is very similar to that of
output ports. The only difference is that we add one extra field
to support "unreading" a character, which adds very minor overhead
to the primitives read-char and peek-char but greatly simplifies
the implementation of the tokenizer (next step). The primitives
added at this stage are input-port?, open-input-file, read-char,
unread-char, peek-char, and eof-object? (by adding a special
end-of-file object that is similar to the empty list).

3.22 Tokenizer
In order to implement the read procedure, we first implement
read-token. The procedure read-token takes an input port as an
argument and, using read-char, peek-char, and unread-char, returns
the next token. Reading a token involves writing a deterministic
finite-state automaton that mimics the syntax of Scheme. The
return value of read-token is one of the following:

• A pair (datum . x) where x is a fixnum, boolean, character,
  string, or symbol that was encountered next while scanning the
  port.
• A pair (macro . x) where x denotes one of Scheme's predefined
  reader macros: quote, quasiquote, unquote, or unquote-splicing.
• A symbol left-paren, right-paren, vec-paren, or dot denoting the
  corresponding non-datum token encountered.
• The end-of-file object if read-char returns the end-of-file
  object before we find any other tokens.

3.23 Reader
The read procedure is built as a recursive-descent parser on top
of read-token. Because of the simplicity of the syntax (i.e. the
only possible output is the eof-object, data, lists, and vectors),
the entire implementation, including error checking, should not
exceed 40 lines of direct Scheme code.

3.24 Interpreter
We have all the ingredients required for implementing an
environment-passing interpreter for core Scheme. Moreover, we can
lift the first pass of the compiler and make it the first pass of
the interpreter as well. We might want to add some restrictions to
the language of the interpreter (i.e. disallowing primitive-set!)
in order to prevent
the user code from interfering with the run-time system. We might
also like to add different binding modes that determine whether
references to primitive names refer to the actual primitives or to
the current top-level bindings, and whether assignments to
primitive names are allowed or not.

4. Beyond the Basic Compiler
There are several axes along which one can enhance the basic
compiler. The two main axes of enhancement are the feature axis
and the performance axis.

4.1 Enhancing Features
The implementation presented in Section 3 featured many of the
essential requirements for Scheme including proper tail calls,
variable-arity procedures, and apply. It also featured a facility
for performing foreign calls that allows us to easily leverage the
capabilities provided by the host operating system and its
libraries. With separate compilation, we can implement an extended
library of procedures including those required by the R5RS or the
various SRFIs. The missing features that can be added directly
without changing the architecture of the compiler by much include:

• A full numeric tower can be added. The extended numerical
  primitives can either be coded directly in Scheme or provided by
  external libraries such as GNU MP.
• Multiple values are easy to implement efficiently using our
  stack-based implementation with very little impact on the
  performance of procedure calls that do not use multiple
  values [3].
• User-defined macros and a powerful module system can be added
  simply by compiling and loading the freely-available portable
  syntax-case implementation [7, 18].
• Our compiler does not handle heap overflows. Inserting overflow
  checks before allocation attempts should be simple and fast:
  compare the value of the allocation pointer to an
  allocation-limit pointer (held elsewhere) and jump to an
  overflow handler label. A simple copying collector can be
  implemented first before attempting more ambitious collectors
  such as the ones used in Chez Scheme or the Glasgow Haskell
  Compiler [6, 12].
• Similarly, we did not handle stack overflows. A stack-based
  implementation can perform fast stack-overflow checks by
  comparing the stack pointer to an end-of-stack pointer (held
  elsewhere) and then jumping to a stack-overflow handler. The
  handler can allocate a new stack segment and wire up the two
  stacks by utilizing an underflow handler. Implementing stack
  overflow and underflow handlers simplifies implementing
  efficient continuation capture and reinstatement [9].
• Alternatively, we can transform the input program into
  continuation-passing style prior to performing closure
  conversion. This transformation eliminates most of the stack
  overflow checks and simplifies the implementation of call/cc.
  On the downside, more closures would be constructed at run time,
  causing excessive copying of variables and more frequent garbage
  collections. Shao et al. show how to optimize the representation
  of such closures [15].

4.2 Enhancing Performance
The implementation of Scheme as presented in Section 3 is simple
and straightforward. We avoided almost all optimizations by
performing only the essential analysis passes required for
assignment and closure conversion. On the other hand, we have
chosen a very compact and efficient representation for Scheme data
structures. Such a choice of representation makes error checks
faster and reduces the memory requirements and cache exhaustion.

Although we did not implement any source-level or backend
optimizations, there is no reason why these optimization passes
cannot be added in the future. We mention some "easy" steps that
can be added to the compiler and are likely to yield a high
payoff:

• Our current treatment of letrec and letrec* is extremely
  inefficient. Better letrec treatment as described in [19] would
  allow us to (1) reduce the amount of heap allocation since most
  letrec-bound variables won't be assignable, (2) reduce the size
  of many closures by eliminating closures with no free variables,
  (3) recognize calls to known procedures, which allows us to
  perform calls to known assembly labels instead of making all
  calls indirect through the code pointers stored in closures,
  (4) eliminate the procedure check at calls to statically-known
  procedures, (5) recognize recursive calls, which eliminates
  re-evaluating the value of the closures, (6) skip the
  argument-count check when the target of the call is known
  statically, and (7) cons up the rest arguments for known calls
  to procedures that accept a variable number of arguments.
• Our compiler introduces temporary stack locations for all
  complex operands. For example, (+ e 4) can be compiled by
  evaluating e first and adding 16 to the result. Instead, we
  transformed it to (let ((t0 e)) (+ t0 4)), which causes
  unnecessary saving and reloading of the value of e. Direct
  evaluation is likely to yield better performance unless good
  register allocation is performed.
• Our treatment of tail calls can be improved greatly by
  recognizing cases where the arguments can be evaluated and
  stored in place. The greedy-shuffling algorithm is a simple
  strategy that eliminates most of the overhead that we currently
  introduce for tail calls [4].
• None of the safe primitives were implemented in the compiler.
  Open-coding safe primitives reduces the number of procedure
  calls performed.
• Simple copy propagation of constants and immutable variables as
  well as constant folding and strength reduction would allow us
  to write simpler code without fear of inefficiencies. For
  example, with our current compiler, we might be discouraged from
  giving names to constants because these names would increase the
  size of any closure that contains a reference to them.

More sophisticated optimizations such as register allocation
[5, 4, 16], inlining [17], elimination of run-time type checks
[10, 21], etc. could be targeted next once the simple
optimizations are performed.

5. Conclusion
Compiler construction is not as complex as it is commonly
perceived to be. In this paper, we showed that constructing a
compiler for a large subset of Scheme that targets real hardware
is simple. The basic compiler is achieved by concentrating on the
essential aspects of compilation and freeing the compiler from
sophisticated analysis and optimization passes. This helps novice
compiler writers build intuition for the inner workings of
compilers without being distracted by details. First-hand
experience in implementing a basic compiler gives the implementor
a better feel for the compiler's shortcomings and thus provides
the motivation for enhancing it. Once the basic compiler is
mastered, the novice implementor is better equipped to tackle more
ambitious tasks.
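The simple copy propagation and constant folding suggested in
Section 4.2 can be sketched as a small rewriting pass. The
following Python model is illustrative only: the tuple encoding of
expressions, the single-binding let form, and the restriction to
the + operator are assumptions made for the example.

```python
# Hypothetical sketch of copy propagation of constants plus
# constant folding, over tuple-encoded expressions:
# ("let", var, rhs, body) and ("+", a, b); integers are constants
# and strings are (immutable) variable references.

def fold(expr, env=None):
    """Propagate let-bound constants and fold constant + forms."""
    env = env or {}
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):                  # variable reference
        return env.get(expr, expr)
    if expr[0] == "let":                       # ("let", var, rhs, body)
        _, var, rhs, body = expr
        rhs = fold(rhs, env)
        if isinstance(rhs, int):               # propagate the constant
            return fold(body, {**env, var: rhs})
        return ("let", var, rhs, fold(body, env))
    if expr[0] == "+":                         # ("+", a, b)
        a, b = fold(expr[1], env), fold(expr[2], env)
        if isinstance(a, int) and isinstance(b, int):
            return a + b                       # fold at compile time
        return ("+", a, b)
    return expr

# (let ((t0 3)) (+ t0 4)) folds to the constant 7
print(fold(("let", "t0", 3, ("+", "t0", 4))))   # 7
```

Note that such a pass is only sound for immutable bindings, which
is why it pairs naturally with the assignment conversion of
Section 3.12: after that pass, every remaining variable binding is
immutable.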
Acknowledgments
I extend my thanks to Will Byrd, R. Kent Dybvig, and the anonymous
reviewers for their insightful comments.

Supporting Material
The extended tutorial and the accompanying test suite are
available for download from the author's website:
http://www.cs.indiana.edu/~aghuloum

References
[1] Aho, A. V., Sethi, R., and Ullman, J. D. Compilers:
    Principles, Techniques, and Tools. Addison-Wesley, 1986.
[2] Appel, A. W. Modern Compiler Implementation in ML. Cambridge
    University Press, Cambridge, UK, 1998.
[3] Ashley, J. M., and Dybvig, R. K. An efficient implementation
    of multiple return values in Scheme. In LISP and Functional
    Programming (1994), pp. 140–149.
[4] Burger, R. G., Waddell, O., and Dybvig, R. K. Register
    allocation using lazy saves, eager restores, and greedy
    shuffling. In SIGPLAN Conference on Programming Language
    Design and Implementation (1995), pp. 130–138.
[5] Chaitin, G. J. Register allocation & spilling via graph
    coloring. In SIGPLAN '82: Proceedings of the 1982 SIGPLAN
    Symposium on Compiler Construction (New York, NY, USA, 1982),
    ACM Press, pp. 98–101.
[6] Dybvig, R. K., Eby, D., and Bruggeman, C. Don't stop the
    BIBOP: Flexible and efficient storage management for
    dynamically-typed languages. Tech. Rep. 400, Indiana
    University, 1994.
[7] Dybvig, R. K., Hieb, R., and Bruggeman, C. Syntactic
    abstraction in Scheme. Lisp Symb. Comput. 5, 4 (1992),
    295–326.
[8] Dybvig, R. K., Hieb, R., and Butler, T. Destination-driven
    code generation. Tech. Rep. 302, Indiana University Computer
    Science Department, February 1990.
[9] Hieb, R., Dybvig, R. K., and Bruggeman, C. Representing
    control in the presence of first-class continuations. In
    Proceedings of the ACM SIGPLAN '90 Conference on Programming
    Language Design and Implementation (White Plains, NY, June
    1990), vol. 25, pp. 66–77.
[10] Jagannathan, S., and Wright, A. K. Effective flow analysis
    for avoiding runtime checks. In 2nd International Static
    Analysis Symposium (September 1995), no. LNCS 983.
[11] Kelsey, R., Clinger, W., and Rees, J. (Editors). Revised⁵
    report on the algorithmic language Scheme. ACM SIGPLAN Notices
    33, 9 (1998), 26–76.
[12] Marlow, S., and Jones, S. P. The new GHC/Hugs runtime system.
[13] Muchnick, S. S. Advanced Compiler Design and Implementation.
    Morgan Kaufmann Publishers, 1997.
[14] SCO. System V Application Binary Interface, Intel386
    Architecture Processor Supplement, Fourth Edition, 1997.
[15] Shao, Z., and Appel, A. W. Space-efficient closure
    representations. In LISP and Functional Programming (1994),
    pp. 150–161.
[16] Traub, O., Holloway, G. H., and Smith, M. D. Quality and
    speed in linear-scan register allocation. In SIGPLAN
    Conference on Programming Language Design and Implementation
    (1998), pp. 142–151.
[17] Waddell, O., and Dybvig, R. K. Fast and effective procedure
    inlining. In Proceedings of the Fourth International Symposium
    on Static Analysis (SAS '97) (September 1997), vol. 1302 of
    Springer-Verlag Lecture Notes in Computer Science, pp. 35–52.
[18] Waddell, O., and Dybvig, R. K. Extending the scope of
    syntactic abstraction. In POPL '99: Proceedings of the 26th
    ACM SIGPLAN-SIGACT Symposium on Principles of Programming
    Languages (New York, NY, USA, 1999), ACM Press, pp. 203–215.
[19] Waddell, O., Sarkar, D., and Dybvig, R. K. Fixing letrec: A
    faithful yet efficient implementation of Scheme's recursive
    binding construct. Higher Order Symbol. Comput. 18, 3–4
    (2005), 299–326.
[20] Wirth, N. Compiler Construction. Addison Wesley Longman
    Limited, Essex, England, 1996.
[21] Wright, A. K., and Cartwright, R. A practical soft type
    system for Scheme. Transactions on Programming Languages and
    Systems (1997).