Academia.eduAcademia.edu

Realistic compilation by partial evaluation

1996, ACM SIGPLAN Notices

Two key steps in the compilation of strict functional languages are the conversion of higher-order functions to data structures (closures) and the transformation to tail-recursive style. We show how to perform both steps at once by applying first-order offline partial evaluation to a suitable interpreter. The resulting code is easy to transliterate to low-level C or native code. We have implemented the compilation to C; it yields a performance comparable to that of other modern Scheme-to-C compilers. In addition, we have integrated various optimizations such as constant propagation, higherorder removal, and arity raising simply by modifying the underlying interpreter. Purely first-order methods suffice to achieve the transformations. Our approach is an instance of semantics-directed compiler generation.

Realistic Compilation By Partial Evaluation Michael Sperber Peter Thiemann Wilhelm-Schickard-Institut für Informatik Universität Tübingen Sand 13, D-72076 Tübingen, Germany fsperber,[email protected] Abstract translator from first-order tail-recursive Scheme into low-level C. It is also possible to generate the back-end translator by the same method from an interpreter, using a partial evaluator for C [2]. Hence, partial evaluation offers the development of a Scheme compiler for the price of writing two interpreters. The automatic conversion to tail form is also the first solution to Jones’s 1987 challenge 11.5 [26]. The optimizing compiler performs aggressive constant propagation and higher-order removal; it is a specializer in its own right. For its generation, we exploit two principles: the specializer projections for the generation of the transformer, and the language preservation property of offline partial evaluators for the translation of higher-order programs into first-order tail-recurs ive code. We have generated the transformer from an interpreter using the partial evaluator Unmix. Unmix, a descendant of the Moscow specializer [36], dating back to 1990, treats only a first-order subset of Scheme, and does not handle partially static data structures. Since our transformer performs a much more powerful specialization on higher-order Scheme, and does handle partially static data structures, we have achieved a bootstrapping effect. Our work also refutes the 1991 claim of Consel and Danvy [17] that realistic compiler generation by partial evaluation is only possible through recent advances in partial evaluation technology. We show that neither higher-order specialization nor partially static data structures are vital to achieve realistic compilation. A simple first-order partial evaluator suffices to do the job, even for a higher-order subject language. Two key steps in the compilation of strict functional languages are the conversion of higher-order functions to data structures (closures) and the transformation to tail-recursive style. We show how to perform both steps at once by applying first-order offline partial evaluation to a suitable interpreter. The resulting code is easy to transliterate to low-level C or native code. We have implemented the compilation to C; it yields a performance comparable to that of other modern Scheme-to-C compilers. In addition, we have integrated various optimizations such as constant propagation, higherorder removal, and arity raising simply by modifying the underlying interpreter. Purely first-order methods suffice to achieve the transformations. Our approach is an instance of semantics-directed compiler generation. Keywords semantics-directed compiler generation, partial evaluation, compilation of higher-order functional languages Partial evaluation is an automatic program transformation that performs aggressive constant propagation [28, 18]. When applied to an interpreter with respect to a constant (“static”) input program for the interpreter, partial evaluation performs compilation into the target language of the partial evaluator. Naive interpreters subjected to offline partial evaluation usually produce straightforward compiled code. Moreover, if the input language of the interpreter, and the input and output languages of the partial evaluator are identical—that is, if it is a self-interpreter—the compilation is essentially the identity function. However, if the interpreter uses only a subset of the subject language, so do the compiled programs. In addition, after changing the interpreter to propagate more information statically, the produced compiler performs optimization. The generation of optimizing program transformers by partial evaluation is called the interpretive approach [24]. It has been applied to a wide range of problems: to the generation of optimizing specializers, supercompilers, and deforesters [24, 25]—albeit in the context of first-order languages. We show that the interpretive approach can achieve optimizing compilation of a strict, higher-order functional language. Our compilation system consists of two parts: an optimizing transformer, which translates higher-order recursion equations into first-order tail-recursive Scheme programs, generated automatically from a suitable interpreter by partial evaluation, and a simple, hand-written Overview We start with a small example for specialization and translation into tail-recursive code in Sec. 1. Section 2 is a brief introduction to partial evaluation. In Sec. 3 we explain the two basic principles needed to generate stand-alone compilers by partial evaluation: the specializer projections and the language preservation property. Section 4 shows how to turn a simple-minded recursivedescent interpreter into a two-level interpreter from which the partial evaluator produces the optimizing compiler. In Sec. 5, we describe our approach to generating C code from the compiler output by hand, followed by a recipe on how to achieve the same effect by using partial evaluation again. Section 6 presents experimental results, and Sec. 7 discusses related work. 1 Examples We illustrate the transformations that our compiler performs by applying them to a version of append written in continuation-passing style (CPS): Appears in: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’96, Philadelphia, USA. (define (append x y) 1 An offline partial evaluator consists of a binding-time analysis and a reducer. The binding-time analysis, applied to the subject program and the binding times of its arguments, annotates each expression in the program with a binding time, static or dynamic. The reducer processes the annotated program and the static part of the input, reducing static expressions and rebuilding dynamic ones, driven by the annotations. Whereas simple-minded bindingtime analyses only handle the binding times “completely static” and “completely dynamic,” more sophisticated variants also treat partially static data [33, 32, 9, 16]. In contrast, online partial evaluators [48, 38] are one-pass programs which decide “online” whether to reduce or rebuild an expression. They are generally more powerful than their offline counterparts because they exploit information about actual values—rather than only their binding times—to decide whether to reduce or rebuild. For our experiments, we use Unmix, a simple offline partial evaluator for a first-order, purely functional subset of Scheme. Unmix employs classic Mix technology [29], and does not handle partially static data. However, its post-processor performs arity raising [37] which is crucial to the generation of efficient residual programs in the absence of partially static data. (cps-append x y (lambda (x) x))) (define (cps-append x y c) (if (null? x) (c y) (cps-append (cdr x) y (lambda (xy) (c (cons (car x) xy)))))) The compiler converts the program to first-order tail-recursive Scheme. It residualizes the lambda appearing in the program, and represents the resulting functions by closures. Closures consist of a closure label identifying its originating expression, and the values of their free variables. They are constructed by make-closure and accessed by closure-label and closure-freeval The closure label 10 denotes the identity, 24 the inner continuation. Whenever the program applies a closure, it dispatches on the closure-label component. This happens for both applications of the continuation c, once in s1-eval-$3 for (c y) and once in s1-eval-$9 for the other application. Note that the identifier names in the residual program have been replaced by generic names from the interpreter. Namely, the counterparts to the original identifiers now have the form cv-vals-xx . (define (append x y) (s1-eval-$3 (make-closure 10) y x)) 3 Prerequisites for Compiler Generation The interpreters described here exploit two basic principles: The specializer projections specify how to generate specializers from interpreters, and the language-preservation property of offline partial evaluation is the basis for higher-order removal and conversion to tail form. (define (s1-eval-$3 cv-vals-$1 cv-vals-$2 cv-vals-$3) (if (null? cv-vals-$3) (if (equal? 10 (closure-label cv-vals-$1)) cv-vals-$2 (do-closure-cv-bindings-$2 cv-vals-$2 cv-vals-$1)) (s1-eval-$3 (make-closure 24 cv-vals-$1 cv-vals-$3) cv-vals-$2 (cdr cv-vals-$3)))) The Specializer Projections Partial evaluation of interpreters can perform compilation. The specification of an S -interpreter int written in L is JintKL (define (do-closure-cv-bindings-$2 first-val closure) (s1-eval-$9 (closure-freeval closure 1) first-val (closure-freeval closure 0))) PS inp = JPS KS inp where J KL is the execution of an L-program, PS is an S -program, and inp is its input. An L L-partial evaluator pe can compile PS into an equivalent L-program PL such that JPL KL inp JPS KS inp as described by the first Futamura projection [21]: ! (define (s1-eval-$9 cv-vals-$1 cv-vals-$2 cv-vals-$3) (if (equal? 10 (closure-label cv-vals-$3)) (cons (car cv-vals-$1) cv-vals-$2) (do-closure-cv-bindings-$2 (cons (car cv-vals-$1) cv-vals-$2) cv-vals-$3))) = PL = JpeKL intsd PS . The sd superscript to int means that the partial evaluator is to treat the first argument of int as static, the second as dynamic. Exploiting repeated self-application, the second and third Futamura projections describe the generation of compilers and compiler generators [28]. A generalization of the Futamura projections shows how to generate specializers, or constant-propagating optimizers from a twolevel interpreter [24, 25] 2int which accepts the input to the interpreted program in two parts: one static and one dynamic. The interpreter tries to perform each operation with the static part of the input first; only if that fails, the dynamic part is consulted. Residual programs result from the first specializer projection [23, 25]: When given a known first argument (foo bar), the compiler performs specialization: (define (append-$1 y) (cons ’foo (cons ’bar y))) The next step in the compilation is the translation to C. We have omitted actual output. Section 5 describes our C back end. 2 Partial Evaluation Issues RL = JpeK 2intssd PS inps where inps is the static part of the input and RL Partial evaluation is a specialization technique: If parts of the input of a subject program are known at compile time, a partial evaluator generates a residual program specialized with respect to the static input. The residual program only takes the remaining, dynamic parts of the input as parameters, and produces the same results as the subject program applied to the full input. Partial evaluation can remove interpretive overhead and produce significant speedups [28]. is the specialized program. Analogous to the Futamura projections, stand-alone specializers and specializer generators result result from the second and third specializer projection. We will introduce an ordinary one-level interpreter and then show how to extend it to two levels. 2 int H F compiler ?! ! H 2int C V 2 P2 O2 K2 E2 D2 Futamura Projections F specializer Specializer Projections ::= D ::=  ::= E Figure 1: Generation of compilers and specializers The Language Preservation Property Mix-style offline partial evaluators have the language preservation property which is obvious from inspecting their specialization phase [28]. = JpeK intsd PH = JcompilerKPH = JpeK 2intssd PH inp = JspecializerKPH inp As the first step towards true compilation, we apply Reynolds’s defunctionalization [35], and change the representation of functions in the interpreter to closures consisting of the label of the originating lambda expression, and the values of its free variables (see Fig. 4). (freevars(`) computes the list of the free variables of the expression at ` in an arbitrary but fixed order.) Consequently, we now have a first-order interpreter for a higher-order langua ge. We start from a straightforward, environment-based interpreter and transform it step by step: By subjecting the interpreter to closure conversion, the generated transformer performs closure conversion as well. Converting the interpreter to tail form leads to a transformer into tail form. = Label  Value = BaseValue + Closure ` E J(lambda (V ) E )K = let V1 : : : Vn = freevars(`) in (`; V1 : : : Vn ) = let (`; v1 : : : vn ) = E JE1 K E J(E1 E2 )K (lambda (V ) E ) = (`) V1 : : : Vn = freevars(`) in E JE K[V 7! E JE2 K; V1 7! v1 ; : : : Vn 7! vn ] Closure Value Next, adding constant propagation in static data turns the transformer into an optimizer. Finally, we introduce a generalization strategy to ensure termination. 4.1 A Straightforward Interpreter Figure 2 defines the syntax of the purely functional Scheme subset treated by our interpreters. For the sake of simplicity, we have restricted it to lambda abstractions of one argument. Figure 3 shows a standard interpreter for the Scheme subset. The meta-language is a call-by-value lambda calculus enriched with E1 E2 denotes the Mcconstants, sums, and products. T Carthy conditional. The notation Value Value is a shorthand Value, Value Value, Value Value for the sum of () Value etc. We have omitted the injections and case analysis for elements of Value. We assume that each expression is uniquely labeled by an ` Label. Where necessary, we indicate the label by a superscript. serves for both function and label lookup. ! D+ 4.2 Removing  4 Deriving the Interpreter ! j ! ! j Figure 3: A standard call-by-value interpreter —the specialized program—are F -programs.  j = BaseValue + (Value ! Value) = Variable ! Value 2 ProcEnv = (ProcName + Label) ! Expression KJ_K : Constants ! Value OJ_K : Operators ! Value ! Value E J_K : Expression ! Env ! Value E JV K = JV K E JK K = KJK K E J(if E1 E2 E3 )K = E JE1 K ! E JE2 K j E JE3 K E J(O E1 : : : En )K = OJOK(E JE1 K; : : : ; E JEn K) E J(P E1 : : : En )K = E J (P)K[Vi 7! E JEi K] E J(let ((V E1 )) E2 )K = E JE2 K[V 7! E JE1 K] E J(lambda (V ) E )K = y:E JE K[V 7! y] = (E JE1 K)(E JE2 K) E J(E1 E2 )K —the compiled program—and  j Value   j  2 Env Specialization of an interpreter can translate higher-order to first-order programs: Suppose pe is a partial evaluator for a subset C of Scheme. The first-order language F in which the interC . Finally, the interpreter itself executes preter is written has F programs in the higher-order Scheme subset H . See Fig. 1 for an illustration. Because pe preserves the F -ness of the subject program, the residual programs  j ` 2 Label 2 RF 2 j j Figure 2: Syntax  L which includes all L0 For any sublanguage L0 computable values as literals, and for any binding-timeannotated L-program P every dynamic expression of which belongs to L0 , JpeK P x L0 holds for arbitrary static x. PF Variable ProcName Operators Constants Expression Definition  Program  V K (if E E E ) (O E ) (P E  ) (let ((V E )) E ) (lambda (V ) E ) (E E ) (define (P V  ) E )  Figure 4: Changes to interpreter after closure conversion The interpreter shown in Fig. 4 does not specialize effectively yet. On closure application, the label ` is dynamic. Hence, the lambda expression in the program text would normally be dynamic as well which would lead to unwanted interpretive overhead in the specialized code. Instead, the actual interpreter employs a binding-time improvement to make the expression argument static again—called “The Trick” [28]: On closure application, the interpreter loops over all lambda expressions that could have generated ! 2 3 S  J_K evaluates a simple expression to a value description. The the closure to be applied, comparing each one with ` successively. When the interpreter finds the lambda belonging to `, it continues interpretation with the now static expression. The interpreter employs a simple equational flow analysis [11] to restrict the set of lambdas which it needs to test. The residual code then performs a sequential dispatch on closure application. constructors cons and lambda evaluate to the corresponding descriptions. For selector and primitive applications, the interpreter first examines if it can reduce them statically—for example, when car is applied to a cons description. If that is not possible, the interpreter generates a new configuration variable and maps it to the dynamic result of the expression. Therefore, S  J_K returns a new configuation variable environment along with the value description. Again, all non-tail calls in the definition are statically unfoldable. 4.3 Converting to Tail Form In the next step we convert the interpreter to tail-recursive style. Again, by changing the interpreter, the generated compiler performs the corresponding transformation. In a higher-order setting, we would transform the interpreter into CPS [35, 20, 4]. CPS makes control flow explicit by encoding the current evaluation context as a function. As we only have first-order methods at our disposal, we encode evaluation contexts in the same way as functions: by closures. Thus, we encode the current evaluation context directly as a function, avoiding an explicit CPS transformation. E SE ::= ::= Note that a simple expression is static if all its free variables refer to static value descriptions (those that do not contain cv components). For static simple expressions, S  J_K always produces a static value. DJ_K evaluates an arbitrary value description to a value. E  J_K is the main evaluation function. It is analogous to the E  J_K function in the simple tail-recursive interpreter in Fig. 6. The main difference is in the handling of if: The interpreter tries to determine the conditional statically first. Only if that fails, it introduces a residual conditional. Our implementation can actually infer a static if more often than the interpreter shown in Fig. 6, for example on null? tests on cons descriptions with dynamic components. SE j (if SE E E ) j (P SE  ) j (SE E ) V j K j (O SE  ) j (lambda (V ) E ) Figure 5: Desugared syntax In our interpreter, a desugaring phase reduces the number of different evaluation contexts to one—the application of a closure. In non-tail positions we only allow simple expressions which evaluate directly to values—constants, variables, applications of primitives, and lambda abstractions. Figure 5 shows the simplified syntax. In the specification, SE is for simple expressions. The desugaring phase simply moves the non-tail expressions into parameters to lambda abstractions. Thus, the expression (f (g x)) becomes ((lambda(r )(f r ))(g x)). In addition, the desugarer replaces lets by equivalent applications of lambda abstractions. The tail-recursive interpreter is shown in Fig. 6. S evaluates simple expressions. E  evaluates “serious” expressions. E  has an additional argument, a context stack, which keeps track of pending context closures. When E  reaches a simple expression SE , it evaluates SE via S , and passes the result to C which processes the stack of pending contexts. C applies the closure on top. If the context stack is empty, C ’s argument is the final result of the interpretation. S need not be tail-recursive: All calls to S are statically unfoldable, and consequently never perform function calls. Hence, partially evaluating the interpreter in Fig. 6 yields tail-recursive residual programs. C  handles context application, analogous to C in Fig. 6. C  also needs to distinguish between static and dynamic contexts. For static contexts, it is trivial to prepare a suitable environment and continue evaluation. For dynamic contexts, the interpreter needs to introduce new configuration variables for their (dynamic) free variables. The interpreter presented here is not yet suitable for successful offline partial evaluation. Some standard binding-time improvements [28] are necessary to ensure that  and as well as the expression to be evaluated stay static. For instance, the interpreter also performs “The Trick” on the application of a dynamic context, just as the interpreter shown in Fig. 4. 4.5 Addressing Non-Termination Woes The two-level interpreter exhibited in the last section is first-order, tail-recursive, and performs constant propagation. However, partial evaluation with respect to a static input program does not terminate for non-tail-recursive input programs: Mix-style partial evaluators such as Unmix do not detect and properly handle static data structures that grow without bounds under dynamic control. Our interpreter propagates such data in three places: 4.4 Propagating Constants 1. The stack of evaluation contexts may contain a context that leads to its own repeated evaluation. Now we turn the transformer into an optimizer to first-order tailrecursive code. We split the environment  into a static and a dynamic part, converting the interpreter into a two-level interpreter and making it amenable to the specializer projections. To support partially static data structures, we change  to associate names with completely static value descriptions instead of dynamic values. A value description may represent an arbitrary partially static data object: 2. A closure may contain a closure generated from the same lambda expression as part of the value of one of its free variables. 3. Applications of cons may nest. desc ::= quote(K ) j cons(desc; desc) j clos(`; desc ) j cv(i) Exactly these conditions lead to self-embedding data structures which potentially grow infinitely. The critical data structures must be generalized (coerced to dynamic values) which removes their static value from the view of the partial evaluator. For closures and data structures, generalization is straightforward: The interpreter replaces the offending value descriptions by fresh cv descriptions, and adds the generalized values to . To handle dynamic evaluation contexts, we must split the context stack into a static part and a A value description can be a completely static atomic value (quote), a pair of value descriptions (cons), a partially static closure (clos), or a configuration variable [24, 47, 45] whose value is stored in a separate environment . fresh() yields an unused configuration variable. Figure 7 shows the two-level interpreter with the following functions: 4 2 Context = Closure J_K : SimpleExpression Env Value  J_K : Expression Env Context Value : Value Context Value S E C SV  SK S (O SE : : : SE )  S (lambda (V ) E)  E  SE  E  (if SE E E )  E  (P SE : : : SE )  E (SE E)  Cv `; v : : : v 0 J J J K K 1 J J J J J (( Cv [] n K ` K K 1 1 1 2 K n K K n) : ) = = = = V ! ! KK O O S SE J K J K( J ! ! ! ! ! 1 K; : : : ; S SE J n K) let V1 : : : Vn = freevars(`) in (`; V1 : : : Vn ) = = = = C S SE   S SE  ! E E  j E  E  E  P V 7! S SE ; : : : ; V 7! S SE E E  S SE  = let (lambda (V ) E ) = ( J K ) J K J 1K J 2K J ( )K[ 1 J 1K n J K ( J K : ) V1 : : : V n in  JE K[V v; V1 v E 7! J n K] (`) = = freevars(`) 7! v ; : : : ; V 7! v 1 n n] 0 Figure 6: Tail-recursive interpreter eration to C. The translation of an S0 program to C yields a single function program. Procedure headers are translated into labels, hence (tail-recursive) function calls turn out to be gotos. Parameters are passed in a fixed number of variables local to program, but global to all procedures. On entry to a procedure, a new C scope is opened which declares the procedure’s private parameter variables. Then the relevant global parameter variables are copied into the private variables. Since procedure calls’ arguments are simple expressions, there are no nested procedure calls in S0 . Therefore the arguments of a call can be evaluated without referring to the global parameter variables. Thus the construction of an argument list is straightforward: generate code to evaluate the simple argument expressions and assign the result to the respective global parameter variables. Finally, control is transferred to the next procedure by a goto. The translation of simple expressions SE is an assignment of its value to a new temporary variable. Temporary variables are also local variables of program, but global to all procedures. Each temporary variable is defined and used exactly once. We rely on the C compiler’s register allocator to merge variables (global parameter variables, procedure argument variables, and temporaries) if their life ranges are disjoint. The evaluation is sequentialized using C’s sequential evaluation operator (expr, expr). Thus the result of the translation is a C expression. All other expressions E are translated into C statements. For a simple expression a return statement is generated which terminates the execution of program. The most important interface between the tail-recursive interpreter and the translation to C is the closure representation. The interpreter treats closures as an abstract datatype with operations make-closure, closure-label, and closure-freeval with the obvious interpretations. These operations are propagated to residual programs. The translator to C is free to choose an efficient implementation for closures. The current implementation uses a flat vector representation. Note that the C code also performs a sequential dispatch on closure applications exactly like the Scheme input programs. It might be desirable to perform closure application by an indirect goto statement as allowed by GCC [41]. However, since sequential dispatch is inherent in our approach, it would seem difficult to achieve this by straightforward means. We represent Scheme data objects by a C union, and we employ the Boehm garbage collector for C [6]. There is no cooperation between the translation and the garbage collector. dynamic part, and use the dynamic stack for critical contexts that may cause non-termination. We have implemented an online strategy and an offline analysis for generalization: Online Generalization Self-embedding data can only grow without bounds inside of the branches of dynamic conditionals and through bodies of dynamic lambdas [10]. Under the online strategy [46], our interpreter delays generalization until it encounters a dynamic conditional. In that case, the interpreter scans  and for critical data structures and closures, and generalizes them as described above. Evaluation continues using dynamic evaluation contexts. Offline Generalization Analysis An alternative approach uses a flow analysis [40, 8] to determine statically which lambdas and which cons expressions may lead to critical data in the interpreter. The corresponding descriptions are generalized on creation. As for critical evaluation contexts, they are merely closures already caught by the analysis. The online strategy is less conservative since it generalizes only under dynamic conditionals. It necessarily generalizes less and propagates more static information. However, the online approach delays the generalization too long: The interpreter can only detect self-embedding when it has already occurred. Consequently, the respective code is already part of the residual program. Thus, the underlying data structures and loops are unrolled at least once before generalization happens, leading to redundant code. This is a well-known problem in online partial evaluation [38]. 5 Compilation to C We describe two ways to achieve compilation to the C language. The first one describes a very simple translation implemented by hand. It has been implemented and used to obtain the experimental data presented below. The second one presents a more speculative approach which again employs partial evaluation to obtain a C program from our Scheme subset. It has not been carried out in practice. 5.1 By Hand The output language S0 of the partial evaluation process is a tailrecursive first-order subset of Scheme which has a simple translit5  2 Env  2 CVEnv 2 Context S  J_K DJ_K E  J_K C _ = = = : : : : ! Variable ValDesc ConfigVariable Value ValDesc SimpleExpression Env CVEnv (ValDesc ValDesc CVEnv Value Expression CVEnv Context Value ValDesc CVEnv Context Value = = = S  J(car SE )K = DJquote(K )K DJcons(desc1 ; desc2 )K DJclos(`; desc1 : : : descn )K DJcv(i)K E  JSE K E J(if SE E1 E2 )K E  J(P SE1 : : : SEn )K E  J(SE E )K C  desc (c : 0 ) ! ! ! ! ! ! ! ! ! ! ! S  JV K S  JK K S J(cons SE1 SE2 )K S  J(cdr SE )K S J(lambda` (V ) E)K S  J(O E1 : : : En )K ! = = = = = = = (V;  ) (quote(K );  ) JSE1 K let (desc1 ; 1 ) = (desc2 ; 2 ) = JSE2 K1 in (cons(desc1 ; desc2 ); 2 ) let (desc;  ) = JSE K in (desc = cons(desc1 ; desc2 )) (desc1 ;  ) let i = fresh() in (cv(i);  S  S 0 S 0 ! j 7! OJcarK(DJdescK0 )]) analogous let V1 : : : Vn = freevars(`) in clos(`; V1 : : : Vn ) SE1 : : : SEn static (quote( JO K( Jdesc1 K; : : : ; Jdescn K ));  ) let i = fresh() in (cv(i);  JO K( Jdesc1 K; : : : ; Jdescn K )]) h i ! O D j 7! O D KJK K OJconsK(DJdesc1 K; DJdesc2 K) (`; D Jdesc1 K : : : D Jdescn K ) D D (i) let (desc; 0 ) =  JSE K in  desc 0 let (desc; 0 ) =  JSE K in SE static  JE2 K  JE1 K (desc = quote(false)) 0  JE1 K  JE2 K ( JdescK )  JSE1 K = let (desc1 ; 1 ) = .. .  JSEn Kn?1 (descn ; n ) = in  J (P )K[V1 desc1 ; : : : ; Vn desc ] 0  JSE K in  JE K0 (ndescn : ) = let (desc;  ) = = = h ! j D S S i = C !E S E !E S 7 ! S E (c = clos(`; desc1 : : : descn )) let (lambda (V ) E ) = V1 : : : V n = in JE K[V desc; V1 let (`; v1 : : : vn ) = (lambda (V ) E ) = V1 : : : Vn = i1 = desc1 = 1 = ! j E 7! 7! .. . C  desc [ ]  CVEnv) = D in  JE K[V JdescK E in = descn = n = desc; V1 7! 7! jE jE 7! (`) freevars(`) desc1 ; : : : ; Vn JcK D 7! descn ] 0 (`) freevars(`) fresh() cv(i1 ) 1 7! v1 ] fresh(n?1 ) cv(in ) n?1 [in vn ] desc1 ; : : : ; Vn 7! 7! descn ]n 0 Figure 7: Two-level interpreter 5.2 By Partial Evaluation higher-order to first-order interpreter which has been deve loped in Section 4, and int-c, a hypothetical interpreter for first-order tailrecursive Scheme written in C. Our tools are the compiler generator cogen derived by self-application from the Unmix specializer and the compiler generator C-mix [2]. Let J KS denote execution of The recent advent of C specializers [1, 2] facilitates a development which culminates in a complete compiler written entirely in C. As ingredients we only have to provide two interpreters, int-s, the 6 Scheme programs and J KC execution of C programs. First, we apply deriv tak cpstak takl fibclos cps-append queens = JcogenKS int-s gen-s to obtain a program generator which turns higher-order Scheme programs into first-order tail-recursive Scheme F . Next, we apply gen-s to itself, gen-s-ft = Jgen-sKS gen-s Figure 8: Benchmarks (timings in milliseconds) 7 Related Work = JC-mixKC int-c. Turchin [47] shows that the interpretive approach can perform powerful transformations. Glück and Jørgensen [24, 25] use the interpretive approach to generate a deforester and a supercompiler. However, they only deal with first-order languages. Past attempts at compilers for higher-order languages by partial evaluation have produced higher-order target code because they are written in higher-order languages. Bondorf [7] studies the automatic generation of a compiler for a lazy higher-order functional language from an interpreter. Jørgensen shows that optimizing compilers for realistic functional languages can be generated by rewriting an interpreter [30, 31]. Consel and Danvy [17] use partial evaluation to compile Algol to tail-recursive Scheme. However, they attribute their success to sophisticated features of the partial evaluator they use, Schism, such as partially static data structures and higher-order functions. Burke and Consel [12] translate Scheme into low-level stack-machine code by multiple interpretive passes, starting from a denotational semantics for Scheme. However, they also make extensive use of higher-order features of the partial evaluator. The first mention of higher-order removal or defunctionalization appears in work of Reynolds [35]. Compilers for functional languages [42, 4, 3, 20] usually achieve closure conversion with a direct non-optimizing transformation algorithm, and employ CPS conversion to transform programs into tail form. Chin and Darlington [13, 14] give a higher-order removal algorithm for lazy functional languages. However, the resulting program may still be higher-order—the algorithm does not perform closure conversion. The compilation of higher-order languages via a C compiler has been used successfully in several projects, such as sml2c [44], Hobbit [43], Bigloo [39], and the Glasgow Haskell Compiler [34]. In particular, sml2c also translates tail-recursive intermediate code obtained from a CPS transformation, but uses a function dispatcher for handling control. We can now translate gen-s-ft to C by using the compiler just constructed: gen-s-ft-c = Jgen-cKC gen-s-ft : It remains to compose the programs gen-c and gen-s-ft-c to obtain a full Scheme to C compiler written in C: scheme->c = gen-c  gen-s-ft-c Performing the composition merely consists in merging the print routine of gen-c with the parser of gen-s-ft-c. In essence, we have seen that a Scheme-to-C compiler (written in C) can be generated by partial evaluation for the price of writing two interpreters, int-s and int-c: scheme->c Hobbit 390 810 6490 870 19480 36340 2370 ; and obtain the higher-order to F program generator, but now written in F ! Now we start on the C end of the translation. An F !C compiler (written in C) is the result of an application of C-mix to int-c: gen-c We 2420 5820 6400 220 15820 5480 8110 = int-s + int-c + partial evaluation. The ideas presented in this section have not been realized in practice, due to the fact that no C specializer is publicly available as of yet. 6 Experimental Results We have run some preliminary benchmarks which indicate that the performance of our approach is comparable to other Scheme compilers which generate C code. Our benchmarks are a program computing derivatives deriv from the Gabriel benchmark suite [22], the Takeuchi function tak, a CPS version of it cpstak, a version of it using lists instead of integers takl (also taken from the Gabriel suite) a version of the Fibonacci function involving closures fibclos, a suite of calls to cps-append, and a program solving the 10-queens problem queens. Figure 8 shows the timings of the benchmarks as compared with Hobbit 4d [43], an optimizing compiler which produces code for the scm runtime, used with maximum optimization and fixnum arithmetic. Our versions of the benchmarks were all run using the offline generalization strategy. The tests were run on an IBM PowerPC/250. The fibclos and cps-append benchmarks indicate that our approach deals especially well with higher-order code. For the firstorder code in tak, deriv, and queens, our approach introduces evaluation contexts and thus closures whereas Hobbit can use the native C stack to some advantage. Note that we have spent no effort whatsoever on tuning either the resulting first-order Scheme code, or the translation to C. We believe that further optimizations will result in an additional substantial performance increase. Also, using the online generalization strategy, the cpstak benchmark ran roughly 3 times faster. Our compiler produces quite compact stand-alone executables. The complete benchmark suite yields a binary well under 200 Kilobytes—including the Boehm collector. The programs associated with the optimizing compiler to tailrecursive Scheme take up less than 70 Kilobytes. The compiler to C takes up a mere 10 Kilobytes. 8 Conclusion We have used the interpretive approach to generate the middle end of a compiler for a strict, higher-order functional language from an interpreter. We achieve closure conversion and conversion to tail form by applying the respective transformations on a straightforward interpreter manually. Offline partial evaluation turns the interpreter into an automatic transformer by virtue of the language preservation property. Adding constant propagation in static data to our interpreter then turns the simple transformer into an optimizer and specializer thanks to the specializer projections. The translation also makes optimizations present in the partial evaluator such as post-unfolding and arity raising accessible to the optimizied programs. In addition, we have presented a translator of the resulting code into low-level C. We have formulated the language preservation property, and put it to use for the optimizing compilation of a higher-order language into C with little conceptual effort. We consider this a successful 7 bootstrapping process. Our results prove that partial evaluation is a practical approach to the generation of optimizing compilers. [17] Charles Consel and Olivier Danvy. Static and dynamic semantics processing. In Symposium on Principles of Programming Languages’ 91, pages 14–24. ACM, January 1991. References [18] Charles Consel and Olivier Danvy. Tutorial notes on partial evaluation. In Symposium on Principles of Programming Languages ’93, pages 493–501, Charleston, January 1993. ACM. [1] Lars Ole Andersen. Self-applicable C program specialization. In Consel [15], pages 54–61. Report YALEU/DCS/RR-909. [19] Functional Programming Languages and Computer Architecture, London, GB, 1989. [2] Lars Ole Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19). [20] Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes. Essentials of programming languages. MIT Press, Cambridge, MA, 1992. [3] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992. [21] Y. Futamura. Partial evaluation of computation process—an approach to a compiler-compiler. Systems, Computers, Controls, 2(5):45–50, 1971. [4] Andrew W. Appel and Trevor Jim. Continuation-passing, closure-passing style. In Proc. 16th ACM Symposium on Principles of Programming Languages, pages 293–302, 1989. [22] Richard P. Gabriel. Performance and Evaluation of Lisp Systems. MIT Press, Cambridge, MA, 1985. [5] Dines Bjørner, Andrei P. Ershov, and Neil D. Jones, editors. Partial Evaluation and Mixed Computation. North-Holland, 1987. Proceedings of the IFIP TC2 Workshop on Partial Evaluation and Mixed Computation. [23] Robert Glück. On the generation of specializers. Journal of Functional Programming, 4(4):499–514, October 1994. [6] Hans-J. Boehm, Alan J. Demers, and Scott Shenker. Mostly parallel garbage collection. In Proc. of the ACM SIGPLAN ’91 Conference on Programming Language Design and Implementation, pages 157–164, Toronto, June 1991. ACM. SIGPLAN Notices, v26, 6. [24] Robert Glück and Jesper Jørgensen. Generating optimizing specializers. In IEEE International Conference on Computer Languages, pages 183–194. IEEE Computer Society Press, 1994. [7] Anders Bondorf. Automatic autoprojection of higher order recursive equations. In Jones [27], pages 70–87. LNCS 432. [25] Robert Glück and Jesper Jørgensen. Generating transformers for deforestation and supercompilation. In B. Le Charlier, editor, Static Analysis, volume 864 of Lecture Notes in Computer Science, pages 432–448. Springer-Verlag, 1994. [8] Anders Bondorf. Automatic autoprojection of higher order recursive equations. Science of Computer Programming, 17:3– 34, 1991. [26] Neil D. Jones. Challenging problems in partial evaluation and mixed computation. In Bjørner et al. [5], pages 1–14. Proceedings of the IFIP TC2 Workshop on Partial Evaluation and Mixed Computation. [9] Anders Bondorf. Similix 5.0 Manual. DIKU, University of Copenhagen, May 1993. [27] Neil D. Jones, editor. Proc. of the 3rd European Symposium on Programming 1990, Copenhagen, Denmark, 1990. Springer-Verlag. LNCS 432. [10] Anders Bondorf and Olivier Danvy. Automatic autoprojection of recursive equations with global variables and abstract data types. Science of Computer Programming, 19(2):151–195, 1991. [28] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. PrenticeHall, 1993. [11] Anders Bondorf and Jesper Jørgensen. Efficient analyses for realistic off-line partial evaluation. Journal of Functional Programming, 3(3):315–346, July 1993. [12] E. David Burke and Charles Consel. Compiling scheme programs via multi-pass partial evaluation. Technical report, Oregon Graduate Institute of Science and Technology, 1994. [29] Neil D. Jones, Peter Sestoft, and Harald Søndergaard. An experiment in partial evaluation: The generation of a compiler generator. In J.-P. Jouannaud, editor, Rewriting Techniques and Applications, pages 124–140, Dijon, France, 1985. Springer-Verlag. LNCS 202. [13] Wei-Ngan Chin. Fully lazy higher-order removal. In Consel [15], pages 38–47. Report YALEU/DCS/RR-909. [30] Jesper Jørgensen. Compiler generation by partial evaluation. Master’s thesis, DIKU, University of Copenhagen, 1991. [14] Wei-Ngan Chin and John Darlington. Higher-order removal transformation technique for functional programs. In Proc. of 15th Australian Computer Science Conference, pages 181– 194, Hobart, Tasmania, January 1992. Australian CS Comm Vol 14, No 1. [31] Jesper Jørgensen. Generating a compiler for a lazy language by partial evaluation. In Symposium on Principles of Programming Languages ’92, pages 258–268. ACM, ACM, January 1992. [32] John Launchbury. Projection Factorisations in Partial Evaluation, volume 1 of Distinguished Dissertations in Computer Science. Cambridge University Press, 1991. [15] Charles Consel, editor. Workshop Partial Evaluation and Semantics-Based Program Manipulation ’92, San Francisco, CA, June 1992. Yale University. Report YALEU/DCS/RR909. [33] Torben Æ. Mogensen. Separating binding times in language specifications. In FPCA1989 [19], pages 14–25. [16] Charles Consel. A tour of Schism. In Symp. Partial Evaluation and Semantics-Based Program Manipulation ’93, pages 134–154, Copenhagen, Denmark, June 1993. ACM. 8 [34] Simon L Peyton Jones, Cordelia Hall, Kevin Hammond, Will Partain, and Philip Wadler. The Glasgow Haskell compiler: a technical overview. In Proceedings of the UK Joint Framework for Information Technology (JFIT) Technical Conference, Keele, 1993. [35] John C. Reynolds. Definitional interpreters for higher-order programming. In ACM Annual Conference, pages 717–740, July 1972. [36] Sergei A. Romanenko. A compiler generator produced by a self-applicable specializer can have a surprisingly natural and understandable structure. In Bjørner et al. [5], pages 445–464. Proceedings of the IFIP TC2 Workshop on Partial Evaluation and Mixed Computation. [37] Sergei A. Romanenko. Arity raiser and its use in program specialization. In Jones [27], pages 341–360. LNCS 432. [38] Erik Ruf. Topics in Online Partial Evaluation. PhD thesis, Stanford University, Stanford, CA 94305-4055, March 1993. Technical report CSL-TR-93-563. [39] Manual Serrano. Bigloo user’s manual. Technical report, INRIA, 1994. (to appear). [40] Peter Sestoft. Replacing function parameters by global variables. In FPCA1989 [19], pages 39–53. [41] Richard M. Stallman. Using GNU CC, November 1995. (part of the GCC distribution). [42] Guy L. Steele. Rabbit: a compiler for Scheme. Technical Report AI-TR-474, MIT, Cambridge, MA, 1978. [43] Tanel Tammet. Lambda-lifting as an optimization for compiling scheme to C. available as ftp://www.cs.chalmers. edu/pub/users/tammet/www/hobbit.ps. [44] D. Tarditi, A. Acharya, and P. Lee. No assembly required: compiling Standard ML to C. Technical Report CMU-CS90-187, School of Computer Science, Carnegie Mellon University, November 1990. [45] Peter Thiemann. Higher-order redundancy elimination. In Peter Sestoft and Harald Søndergaard, editors, Workshop Partial Evaluation and Semantics-Based Program Manipulation ’94, pages 73–84, Orlando, Fla., June 1994. ACM. [46] Peter Thiemann and Robert Glück. The generation of a higher-order online partial evaluator. In Masato Takeichi, editor, Fuji Workshop on Functional and Logic Programming, pages 239–253, Fuji Susono, Japan, July 1995. World Scientific Press, Singapore. [47] Valentin F. Turchin. Program tranformation with metasystem transitions. Journal of Functional Programming, 3(3):283– 313, July 1993. [48] Daniel Weise, Roland Conybeare, Erik Ruf, and Scott Seligman. Automatic online partial evaluation. In Conf. Functional Programming Languages and Computer Architecture ’91, pages 165–191, Cambridge, September 1991. ACM. 9