Down with Emacs Lisp

2001, ACM SIGPLAN Notices

ABSTRACT It is possible to translate code written in Emacs Lisp or another Lisp dialect which uses dynamic scoping to a more modern programming language with lexical scoping while largely preserving structure and readability of the code. The biggest obstacle to such an idiomatic translation from Emacs Lisp is the translation of dynamic binding into suitable instances of lexical binding: Many binding constructs in real programs in fact exhibit identical behavior under both dynamic and lexical binding. An idiomatic translation needs to detect as many of these binding constructs as possible and convert them into lexical binding constructs in the target language to achieve readability and efficiency of the target code. The basic prerequisite for such an idiomatic translation is thus a dynamic scope analysis which associates variable occurrences with binding constructs. We present such an analysis. It is an application of the Nielson/Nielson framework for flow analysis to a semantics for dynamic binding akin to Moreau's. Its implementation handles a substantial portion of Emacs Lisp, has been applied to realistic Emacs Lisp code, and is highly accurate and reasonably efficient in practice.

Down with Emacs Lisp: Dynamic Scope Analysis

Matthias Neubauer, Institut für Informatik, Universität Freiburg
Michael Sperber, Wilhelm-Schickard-Institut für Informatik, Universität Tübingen

ICFP'01, September 3-5, 2001, Florence, Italy. Copyright 2001 ACM 1-58113-415-0/01/0009.

1. MIGRATING EMACS LISP

Emacs Lisp [16, 29] is a popular programming language for a considerable number of desktop applications which run within the Emacs editor or one of its variants. The actively maintained code base measures around 1,000,000 loc.¹ As the Emacs Lisp code base is growing, the language is showing its age: It lacks important concepts from modern functional programming practice as well as provisions for large-scale modularity. Its implementations are slow compared to mainstream implementations of other Lisp dialects. Moreover, the development of both Emacs dialects places comparatively little focus on significant improvements of the Emacs Lisp interpreter.

¹ The XEmacs package collection, which includes many popular add-ons and applications, currently contains more than 700,000 loc.

On the other hand, recent years have seen the advent of a large number of extension language implementations of full programming languages suitable for inclusion in application software. Specifically, several current Scheme implementations are technologically much better suited as an extension language for Emacs than Emacs Lisp itself. In fact, the official long-range plan for GNU Emacs is to replace the Emacs Lisp substrate with Guile, also a Scheme implementation [28]. The work presented here is part of a different, independent effort to do the same for XEmacs, a variant of GNU Emacs which also uses Emacs Lisp as its extension language.

Replacing such a central part of an application like XEmacs presents difficult pragmatic problems: It is not feasible to reimplement the entire Emacs Lisp code base by hand. Thus, a successful migration requires at least the following ingredients:

• Emacs Lisp code must continue to run unchanged for a transitory period.
• An automatic tool translates Emacs Lisp code into the language of the new substrate, and it must produce maintainable code.
Whereas the first of these ingredients is not particularly hard to implement (either by keeping the old Emacs Lisp implementation around or by re-implementing an Emacs Lisp engine in the new substrate), the second is more difficult. A direct one-to-one translation of Emacs Lisp into a modern latently-typed functional language is straightforward: dynamic assignment or dynamic-environment passing can implement dynamic scoping. However, such a translation does not result in maintainable output code: Users of modern functional languages use dynamic binding only in very limited contexts such as exception handling or parameterization.

As it turns out, the situation is not much different for Emacs Lisp users: For many lets and other binding constructs in real Emacs Lisp code, dynamic scope and lexical scope are identical! Consequently, a good “idiomatic” translation of Emacs Lisp into, say, Scheme, should convert these binding constructs into the corresponding lexical binding constructs of the target substrate. The only problem is to recognize these binding constructs, or rather, to distinguish those where the programmer “meant” dynamic scope from those where she “meant” lexical scope.
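To make the distinction concrete, here is a minimal Python sketch of the two situations. It assumes a hypothetical `DynEnv` class that models dynamic binding as a stack of active bindings (one simple form of dynamic-environment passing). The class, the functions, and the variable name `coding-system` are all illustrative and are not taken from el2scm or from Emacs.

```python
from contextlib import contextmanager


class DynEnv:
    """Hypothetical model of dynamic binding as a stack of active bindings."""

    def __init__(self):
        self._globals = {}   # top-level bindings
        self._stack = []     # active let bindings, innermost last

    def defvar(self, name, value):
        self._globals[name] = value

    @contextmanager
    def let(self, name, value):
        # The binding is active for the dynamic extent of the body,
        # including every function called from it.
        self._stack.append((name, value))
        try:
            yield
        finally:
            self._stack.pop()

    def ref(self, name):
        # Variable access uses the most recent binding that is still active.
        for bound, value in reversed(self._stack):
            if bound == name:
                return value
        return self._globals[name]


env = DynEnv()
env.defvar("coding-system", "utf-8")


def read_file():
    # Free occurrence: sees whichever binding is active at call time.
    return env.ref("coding-system")


def read_file_literally():
    # Parameterization: the callee observes this binding, so replacing
    # it with a lexical binding would change the result.
    with env.let("coding-system", "binary"):
        return read_file()


def double_temp():
    # Temporary name: only the body of the let reads `tmp`, so dynamic
    # and lexical scope coincide for this binding.
    with env.let("tmp", 21):
        return env.ref("tmp") * 2
```

In this sketch, only the binding in `read_file_literally` is "meant" dynamically; the one in `double_temp` could be turned into an ordinary local variable without changing behavior. Telling the two apart automatically is the job of the analysis.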
Since with dynamic scope, bindings travel through the program execution much as values do, this requires a proper flow analysis. This paper presents such an analysis, called dynamic scope analysis.

    (let* ((filename (expand-file-name filename))
           (file (file-name-nondirectory filename))
           (dir (file-name-directory filename))
           (comp (file-name-all-completions file dir))
           newest)
      (while comp
        (setq file (concat dir (car comp))
              comp (cdr comp))
        (if (and (backup-file-name-p file)
                 (or (null newest)
                     (file-newer-than-file-p file newest)))
            (setq newest file)))
      newest)

Figure 1: Typical usage of let in Emacs Lisp.

    (let ((file-name-handler-alist nil)
          (format-alist nil)
          (after-insert-file-functions nil)
          (coding-system-for-read 'binary)
          (coding-system-for-write 'binary)
          (find-buffer-file-type-function
           (if (fboundp 'find-buffer-file-type)
               (symbol-function 'find-buffer-file-type)
             nil)))
      (unwind-protect
          (progn
            (fset 'find-buffer-file-type
                  (lambda (filename) t))
            (insert-file-contents
             filename visit start end replace))
        (if find-buffer-file-type-function
            (fset 'find-buffer-file-type
                  find-buffer-file-type-function)
          (fmakunbound 'find-buffer-file-type))))

Figure 2: Parameterizations via dynamic let in Emacs Lisp.

Specifically, our contributions are the following:

• We have formulated a semantics for a subset of Emacs Lisp, called Mini Emacs Lisp, similar to the sequential evaluation function for Λd by Moreau [20].
• We have applied the flow analysis framework of Nielson and Nielson [22] to the semantics, resulting in an acceptability relation for flow analyses of Mini Emacs Lisp programs.
• We have used the acceptability relation to formulate and implement a flow analysis for Emacs Lisp which tracks the flow of bindings in addition to the flow of values.
• We have applied the analysis to real Emacs Lisp code. More specifically, the analysis is able to handle medium-sized real-world examples with high accuracy and reasonable efficiency.
The work presented here is a part of the el2scm project that works on the migration from Emacs Lisp to Scheme. However, the other aspects of the translation (such as front-end issues, correct handling of symbols, the code-data duality, treatment of primitives and so on) are outside the (lexical) scope of this paper. Indeed, the analysis could be used for a number of other purposes, among them the development of an efficient compiler for Emacs Lisp, or the translation to a different substrate such as Common Lisp.

Overview. The next section presents some code examples which show the need for a dynamic scope analysis. Section 3 defines the syntax of Mini Emacs Lisp. Section 4 develops an operational semantics with evaluation contexts. Based on the semantics, Section 5 presents a specification of a correct flow analysis. Section 6 sketches a correctness proof. Our implementation approach is described in Section 7. Section 8 describes some experimental results gained with our implementation prototype. We end with a discussion of related work and a conclusion.

2. EXAMPLES

Consider the Emacs Lisp code shown in Figure 1, taken literally from files.el in the current XEmacs core. It contains five variable bindings, all introducing temporary names for intermediate values. The bindings of the variables filename, file, dir, comp, and newest are all visible in the other functions reachable from the body of the let*, yet none of these functions contain occurrences of these names. The only variable occurrences which access the bindings are in the body of the let* itself, and all are within the lexical scope of the bindings. Hence, translating the let* into a lexically-scoped counterpart in the target language would preserve the behavior of this function.

Figure 2 shows an example of idiomatic use of dynamic binding (also taken from files.el): It is part of the implementation of insert-file-contents-literally, which calls insert-file-contents in the body of the let.
The definition of insert-file-contents indeed contains occurrences of the variables bound in the let, with the exception of find-buffer-file-type-function. Therefore, it is not permissible to translate the let into a lexically-scoped binding construct.

For the vast majority of binding constructs in real Emacs Lisp code, dynamic scope and lexical scope coincide. Thus, the ultimate goal of the analysis is to detect as many of these binding constructs as possible. In general, however, value flow and the flow of bindings interact during the evaluation of Emacs Lisp programs. Hence, it is not possible to apply standard flow analyses based on lexical-binding semantics to solve the problem; a new analysis is necessary.

3. SYNTAX OF MINI EMACS LISP

For the sake of simplicity, we concentrate on a subset of Emacs Lisp called Mini Emacs Lisp in the paper. We omit multi-parameter (and variable-parameter) functions, catch/throw, dual name spaces for functions and “ordinary” values, the resulting gratuitous split between funcall and regular application, as well as the data/code duality which appears in various contexts in Emacs Lisp. Adding these features to the analysis is straightforward and does not require significant new insights, which is why we omit them here. Our implementation of the analysis does treat all of these features.

Here is the syntax for Mini Emacs Lisp:

$$
\begin{array}{rcll}
l &\in& \mathrm{Lab} & ::= \ldots \\
s, x &\in& \mathrm{SymVar} & ::= \texttt{fritz} \mid \texttt{franz} \mid \ldots \\
c &\in& \mathrm{Lit} & ::= 0 \mid 1 \mid 2 \mid \ldots \\
b &\in& \mathrm{Prim} & ::= \texttt{cons} \mid \texttt{car} \mid \ldots \\[4pt]
t &\in& \mathrm{Term} & ::= c \mid (\texttt{quote}\ s) \mid (\texttt{lambda}\ (x)\ e) \mid x \mid (\texttt{setq}\ x\ e) \\
 & & & \quad \mid (e_0\ e_1) \mid (\texttt{let}\ x\ e_1\ e_2) \mid (\texttt{if}\ e_0\ e_1\ e_2) \mid (b\ e_1 \ldots e_n) \\
e &\in& \mathrm{Exp} & ::= t^l \\
d &\in& \mathrm{Def} & ::= (\texttt{defvar}\ x\ e) \mid (\texttt{defun}\ x_0\ (x_1)\ e) \\
p &\in& \mathrm{Prg} & ::= d^*\ e
\end{array}
$$

All expressions carry unique labels which the analysis uses for identifying locations in the program source. The set of literals is trivially extensible. Note that Emacs Lisp uses the nil symbol for boolean false, and everything else for true.
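The grammar above maps naturally onto a small set of record types. The following Python sketch is one possible representation; all class names, the `lab` helper that hands out fresh labels, and the `labels` traversal are assumptions made for illustration and do not describe the data structures of the actual implementation.

```python
from dataclasses import dataclass, fields
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class Exp:
    """A labelled term t^l."""
    term: object
    label: int


@dataclass(frozen=True)
class Lit:            # c
    value: int


@dataclass(frozen=True)
class Quote:          # (quote s)
    sym: str


@dataclass(frozen=True)
class Lambda:         # (lambda (x) e)
    param: str
    body: Exp


@dataclass(frozen=True)
class Var:            # x
    name: str


@dataclass(frozen=True)
class Setq:           # (setq x e)
    name: str
    value: Exp


@dataclass(frozen=True)
class App:            # (e0 e1)
    fun: Exp
    arg: Exp


@dataclass(frozen=True)
class Let:            # (let x e1 e2)
    name: str
    init: Exp
    body: Exp


@dataclass(frozen=True)
class If:             # (if e0 e1 e2)
    test: Exp
    then: Exp
    orelse: Exp


@dataclass(frozen=True)
class PrimApp:        # (b e1 ... en)
    prim: str
    args: Tuple[Exp, ...]


_fresh = count(1)


def lab(term):
    """Attach a fresh, unique label to a term."""
    return Exp(term, next(_fresh))


def labels(exp):
    """Collect the labels of exp and of all its subexpressions."""
    found = [exp.label]
    for f in fields(exp.term):
        value = getattr(exp.term, f.name)
        children = value if isinstance(value, tuple) else (value,)
        for child in children:
            if isinstance(child, Exp):
                found.extend(labels(child))
    return found


# (let x 1 (cons x (quote fritz))), with a label on every subexpression
example = lab(Let("x", lab(Lit(1)),
                  lab(PrimApp("cons", (lab(Var("x")), lab(Quote("fritz")))))))
```

Keeping the label in a separate `Exp` wrapper mirrors the production `Exp ::= t^l`: terms describe structure, and the wrapper identifies a program point.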
An Emacs Lisp program consists of a sequence of definitions followed by a single expression, the entry point of the program.

4. A SEMANTICS FOR MINI EMACS LISP

We present a structural operational or small-step semantics [23] for Mini Emacs Lisp. We use evaluation contexts and syntactic rewriting as developed by Felleisen and Friedman [6].

4.1 Values and Intermediate Terms

We use separate syntactic categories for intermediate expressions and values. Here is the syntax for literals and abstractions:

$$
\begin{array}{rcll}
f &\in& \mathrm{Fun} & ::= (\texttt{func}\ (x)\ e) \\
v &\in& \mathrm{Val} & ::= (\texttt{prim}\ c) \mid (\texttt{sym}\ s) \mid f \mid (\texttt{pair}\ v_1^{l_1}\ v_2^{l_2}) \\
\mathit{it} &\in& \mathrm{ITerm} & ::= (\texttt{bind}\ x\ v\ e) \\
e &\in& \mathrm{Exp} & ::= v^l \mid \mathit{it}^l
\end{array}
$$

The elements of Val, called values, are results from successful computations. They represent primitive values, symbols and functions, and correspond to the expressions of Exp which produce them. The semantics uses intermediate bind terms to handle dynamic binding: They result from reducing let expressions with the value to be bound to the variable already evaluated. Expressions attach labels to values and intermediate terms. Only the value bound to a variable by a bind term does not carry a label, because bind expressions only show up during evaluation and not in the analysis, which only looks at the source code.

4.2 Environments

Environments ρ are finite mappings from symbols to values and contain bindings:

$$
\rho \in \mathrm{Env} = \mathrm{SymVar} \to_{\mathrm{fin}} \mathrm{Val}.
$$

The notation for the empty environment is []. The modification of an existing environment through the new mapping of a symbol x to a value v is written as ρ[x ↦ v].

4.3 Evaluation Contexts

Here are the evaluation contexts for Mini Emacs Lisp:

$$
\begin{array}{rcl}
E \in \mathrm{EvalContext} & ::= & [-] \mid (E\ e_2)^l \mid (\texttt{if}\ E\ e_1\ e_2)^l \mid (\texttt{let}\ x\ E\ e_2)^l \\
 & & \mid (\texttt{bind}\ x\ v\ E)^l \mid (\texttt{setq}\ x\ E)^l \mid (b\ v^*\ E\ e^*)^l \\[4pt]
V_{x_0} \in \mathrm{VarContext}(x_0) & ::= & [-] \mid (V_{x_0}\ e_2)^l \mid (\texttt{if}\ V_{x_0}\ e_1\ e_2)^l \mid (\texttt{let}\ x\ V_{x_0}\ e_2)^l \\
 & & \mid (\texttt{bind}\ x_1\ v\ V_{x_0})^l \ \text{ if } x_0 \neq x_1 \\
 & & \mid (\texttt{setq}\ x\ V_{x_0})^l \mid (b\ v^*\ V_{x_0}\ e^*)^l
\end{array}
$$

The rules for EvalContext describe all contexts in which a reduction step in Mini Emacs Lisp can occur.
Variable access needs the most recent dynamic binding of the variable. The variable contexts in VarContext(x0) help accomplish this; they describe all contexts that do not contain any bindings associated with the symbol x0.

4.4 Reductions

An evaluation state consists of a partially evaluated expression and a global environment. Thus, a configuration γ of Conf is a tuple consisting of an environment and a current expression:

$$
\gamma \in \mathrm{Conf} = \mathrm{Env} \times \mathrm{Exp}.
$$

The primitive steps of the evaluation process are reduction rules. Some expressions immediately reduce to a value:

$$
\begin{array}{ll}
[\text{c}] & \rho, E[c^l] \to \rho, E[(\texttt{prim}\ c)^l] \\
[\text{quote}] & \rho, E[(\texttt{quote}\ s)^l] \to \rho, E[(\texttt{sym}\ s)^l] \\
[\text{lambda}] & \rho, E[(\texttt{lambda}\ (x)\ e)^l] \to \rho, E[(\texttt{func}\ (x)\ e)^l]
\end{array}
$$

Note that in Emacs Lisp, abstractions do not evaluate to closures; this is dynamic scope, after all. Here are the semantic mechanics for dealing with variable access:

$$
\begin{array}{ll}
[\text{var}] & \rho, E[(\texttt{bind}\ x\ v\ V_x[x^l])^{l_0}] \to \rho, E[(\texttt{bind}\ x\ v\ V_x[v^l])^{l_0}] \\
[\text{var}_{\text{glob}}] & \rho, V_x[x^l] \to \rho, V_x[v^l] \quad \text{if } x \in \mathrm{dom}(\rho),\ \rho(x) = v
\end{array}
$$

A variable may have either a local or a global binding. The let and lambda constructs introduce local bindings. For a variable occurrence, the closest bind context for that variable holds its value. The [var] rule expresses this behavior; the context V_x guarantees that there is no other binding closer to the variable. Lacking a local binding, a global one must apply; the [var_glob] rule takes over. The machinery for mutating bindings by setq is analogous to the one for referencing variables:

$$
\begin{array}{ll}
[\text{setq}] & \rho, E[(\texttt{bind}\ x\ v\ V_x[(\texttt{setq}\ x\ v_0^{l_0})^l])^{l_1}] \to \rho, E[(\texttt{bind}\ x\ v_0\ V_x[v_0^l])^{l_1}] \\
[\text{setq}_{\text{global}}] & \rho, V_x[(\texttt{setq}\ x\ v_0^{l_0})^l] \to \rho[x \mapsto v_0], V_x[v_0^l]
\end{array}
$$

In the case of a local binding, the [setq] rule changes the value in the corresponding bind context. Assignments to global variables mutate the global environment.

Here are the reductions for function applications and local variable bindings:

$$
\begin{array}{ll}
[\text{app}] & \rho, E[((\texttt{func}\ (x_1)\ e_2)^{l_0}\ e_1)^l] \to \rho, E[(\texttt{let}\ x_1\ e_1\ e_2)^l] \\
[\text{let}] & \rho, E[(\texttt{let}\ x\ v_1^{l_1}\ e_2)^l] \to \rho, E[(\texttt{bind}\ x\ v_1\ e_2)^l] \\
[\text{bind}] & \rho, E[(\texttt{bind}\ x\ v_0\ v_1^{l_1})^l] \to \rho, E[v_1^l]
\end{array}
$$

The [app] rule reduces a function application to a let which binds the function parameter around the function body. The [let] rule turns a let expression into a corresponding bind expression. Evaluation continues with the body e2 until it becomes a value. Then, the [bind] rule removes the obsolete context. Note that the distinction between let expressions and bind expressions is unnecessary when considering only the semantics, but the formulation of the flow analysis requires their separation. The [if1] and [if2] rules handle conditionals:

$$
\begin{array}{ll}
[\text{if}_1] & \rho, E[(\texttt{if}\ v^{l_0}\ t_1^{l_1}\ e_2)^l] \to \rho, E[t_1^l] \quad \text{if } v \neq (\texttt{sym}\ \texttt{nil}) \\
[\text{if}_2] & \rho, E[(\texttt{if}\ v^{l_0}\ e_1\ t_2^{l_2})^l] \to \rho, E[t_2^l] \quad \text{if } v = (\texttt{sym}\ \texttt{nil})
\end{array}
$$

Here are reduction rules for selected primitives, namely those dealing with pairs:

$$
\begin{array}{ll}
[\text{cons}] & \rho, E[(\texttt{cons}\ v_1^{l_1}\ v_2^{l_2})^l] \to \rho, E[(\texttt{pair}\ v_1^{l_1}\ v_2^{l_2})^l] \\
[\text{car}] & \rho, E[(\texttt{car}\ (\texttt{pair}\ v_1^{l_1}\ v_2^{l_2})^{l_0})^l] \to \rho, E[v_1^l] \\
[\text{cdr}] & \rho, E[(\texttt{cdr}\ (\texttt{pair}\ v_1^{l_1}\ v_2^{l_2})^{l_0})^l] \to \rho, E[v_2^l]
\end{array}
$$

The [cons] rule produces a pair value from two argument values. The [car] rule selects the first component of a pair; the [cdr] rule selects the second.

The combination of the reduction rules defines the reduction relation

$$
\to\ \subseteq \mathrm{Conf} \times \mathrm{Conf},
$$

which relates all possible configurations before and after a reduction step during evaluation. Its reflexive transitive closure is written →*.

4.5 Expression Contexts

So far, the → relation defines only the meaning of expressions. For programs, we define another kind of context, the expression contexts of ExpContext:

$$
X \in \mathrm{ExpContext} ::= (\texttt{defvar}\ x\ [-])\ p \mid [-]
$$

4.6 Reductions for Programs

Equipped with the notion of program configurations δ ∈ PConf = Env × Prg, as well as contexts for programs X and the reduction relation → for expressions, it is possible to state the rewriting rules →d for programs:

$$
\begin{array}{ll}
[\text{defvar}] & \rho, (\texttt{defvar}\ x\ v_0^{l_0})\ p \to_d \rho[x \mapsto v_0], p \quad \text{if } x \notin \mathrm{dom}(\rho) \\
[\text{defun}] & \rho, (\texttt{defun}\ x_0\ (x_1)\ e)\ p \to_d \rho[x_0 \mapsto (\texttt{func}\ (x_1)\ e)], p \quad \text{if } x_0 \notin \mathrm{dom}(\rho) \\[6pt]
[\text{exp}] & \dfrac{\rho, e \to \rho', e'}{\rho, X[e] \to_d \rho', X[e']}
\end{array}
$$

The [defvar] and [defun] rules process top-level definitions. The [defvar] rule inserts the new global binding in the variable environment ρ. The condition x ∉ dom(ρ) guarantees that there is only one global variable for every name. The [defun] rule does the equivalent for procedures. The [exp] rule allows the use of all the reductions for expressions at the places defined by the contexts of ExpContext. Again, →d* is the reflexive transitive closure of →d.

4.7 The Evaluation Function of Programs

The reduction relation →d rewrites the program until it gets a final answer. This does not always happen: the program may loop, in which case the reduction sequence is infinite, or evaluation may get stuck at a configuration with no matching reduction rule. Thus, the reduction relation induces a partial evaluation function eval:

$$
\mathit{eval} : \mathrm{Prg} \rightharpoonup \mathrm{Val}, \qquad
\mathit{eval}(p) =
\begin{cases}
v & \text{if } [], p \to_d^* \rho, v \text{ for some } \rho \\
\text{undefined} & \text{otherwise}
\end{cases}
$$

5. SPECIFICATION OF THE ANALYSIS

This section specifies a flow analysis for Mini Emacs Lisp. With the help of the definitions for the abstract domains of the analysis, we define an acceptability relation for correct flow analyses which employ these domains. The actual analysis results directly from the definition of the acceptability relation.

5.1 Abstract Domains

Here are the abstract domains of the analysis:

$$
\begin{array}{rcll}
\mathit{bp} &\in& \widehat{\mathrm{BP}} & = \mathrm{Lab} \cup \{\diamond\} \\
\mathit{bpe} &\in& \widehat{\mathrm{BPEnv}} & = \mathrm{SymVar} \to \widehat{\mathrm{BP}} \\
\hat{p} &\in& \widehat{\mathrm{Cons}} & = \mathrm{Lab} \times \mathrm{Lab} \times \widehat{\mathrm{BPEnv}} \\
\hat{v} &\in& \widehat{\mathrm{Val}} & = \mathcal{P}(\mathrm{SymVar} \cup \{\omega\} \cup \mathrm{Fun} \cup \widehat{\mathrm{Cons}}) \\
\hat{\rho} &\in& \widehat{\mathrm{Env}} & = (\mathrm{SymVar} \times \widehat{\mathrm{BP}}) \to \widehat{\mathrm{Val}} \\
\hat{C} &\in& \widehat{\mathrm{Cache}} & = (\mathrm{Lab} \times \widehat{\mathrm{BPEnv}}) \to \widehat{\mathrm{Val}}
\end{array}
$$

Birthplaces ($\widehat{\mathrm{BP}}$ for short) denote syntactic locations of variable bindings. The ⋄ stands for top-level bindings. The label of the body of a function or of a let expression serves as the birthplace for the binding it creates.
Birthplace environments $\widehat{\mathrm{BPEnv}}$ are abstractions over regular variable environments; they map variables to birthplaces instead of regular values.

$\widehat{\mathrm{Cons}}$ is one part of the abstract value domain; it is the set of all possible abstract pairs and contains all triples of two labels and a birthplace environment. The two labels are the labels of the two argument subexpressions of the cons expression which created the pair. The birthplace environment registers the abstract bindings active at the time of creation of the pair. Registering the birthplace environment is necessary because we differentiate program points depending on the birthplace environments they occur under.

$\widehat{\mathrm{Val}}$ is the set of all possible abstract values $\hat{v}$. An abstract value represents a set of run-time values. Not every run-time value is relevant to the analysis: the single symbol ω represents all primitive values except for symbols. The analysis tracks symbols (needed eventually for variable names), functions, primitive values, and abstract pairs.

An abstract cache $\hat{C}$ of $\widehat{\mathrm{Cache}}$ is an abstract profile of all values which occur during a program run. It tracks the abstract values of program subexpressions, differentiated by birthplace environment.

An abstract environment $\hat{\rho}$ of $\widehat{\mathrm{Env}}$ is a union of the environments that occur during the evaluation of a program. It associates a variable name and one of its birthplaces with an abstract value.

5.2 Acceptability for Programs

We define an acceptability relation ⊨ for programs:

$$
\models\ \subseteq \widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Prg}.
$$

The ⊨ relation defines the validity of analyses $(\hat{C}, \hat{\rho})$ with regard to a program p and a current birthplace environment bpe. From now on, the notation is $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} p$.
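The abstract domains of Section 5.1 can be pictured as plain data structures. The Python sketch below is only an illustration under stated assumptions: the `BPEnv` class, the constants `TOP` and `OMEGA` (standing for ⋄ and ω), the concrete labels 3, 4, 5 and 7, and the convention that a variable without a local binding has birthplace `TOP` are all hypothetical choices, not the representation used by the implementation.

```python
from collections import defaultdict

TOP = "top"      # stands for the top-level birthplace (the diamond)
OMEGA = "omega"  # abstracts every primitive value that is not a symbol


class BPEnv:
    """Immutable birthplace environment: variable name -> birthplace."""

    def __init__(self, bindings=()):
        self._items = frozenset(dict(bindings).items())

    def __call__(self, var):
        # Assumption: a variable without a local binding has birthplace TOP.
        return dict(self._items).get(var, TOP)

    def extend(self, var, birthplace):
        # bpe[x -> l]: a new binding for x replaces its previous birthplace.
        updated = dict(self._items)
        updated[var] = birthplace
        return BPEnv(updated)

    def __eq__(self, other):
        return isinstance(other, BPEnv) and self._items == other._items

    def __hash__(self):
        return hash(self._items)


# Abstract cache: (label, bpe) -> set of abstract values.
cache = defaultdict(set)
# Abstract environment: (variable, birthplace) -> set of abstract values.
abs_env = defaultdict(set)

bpe0 = BPEnv()
bpe1 = bpe0.extend("x", 7)   # x bound by a let whose body carries label 7

abs_env[("x", bpe1("x"))].add(OMEGA)     # the let binds x to a literal
abs_env[("x", bpe0("x"))].add("fritz")   # the global x holds a symbol

# The same occurrence of x (label 3), recorded under both environments:
cache[(3, bpe1)] |= abs_env[("x", bpe1("x"))]
cache[(3, bpe0)] |= abs_env[("x", bpe0("x"))]

# An abstract pair records the labels of its two components and the bpe.
pair = (3, 4, bpe1)
cache[(5, bpe1)].add(pair)
```

Because cache entries are keyed by label and birthplace environment together, the occurrence at label 3 keeps its two abstract values apart: one for the let binding and one for the global binding. This separation is what allows the analysis to associate a variable occurrence with specific binding constructs.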
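5.2.1 Value Expressions

$$
\begin{array}{lll}
[\text{c}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} c^l & \text{iff } \omega \in \hat{C}(l, \mathit{bpe}) \\
[\text{quote}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{quote}\ s)^l & \text{iff } s \in \hat{C}(l, \mathit{bpe}) \\
[\text{lam}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{lambda}\ (x)\ e_0)^l & \text{iff } (\texttt{func}\ (x)\ e_0) \in \hat{C}(l, \mathit{bpe})
\end{array}
$$

The [c], [quote], and [lam] clauses register their abstract counterpart in the abstract cache under the program point l and the current birthplace environment bpe. Note that [lam] does not require that the analysis is also valid for the body of each lambda term, because an acceptable analysis must only treat the reachable functions correctly.

5.2.2 Expressions

Occurrences of variable references and mutations induce further validity constraints. The [var] rule for variable references enforces that the abstract value for the variable x and its current birthplace bpe(x), held in the abstract environment, must be a subset of the abstract value that the abstract cache associates with its label and birthplace environment:

$$
\begin{array}{lll}
[\text{var}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} x^l & \text{iff } \hat{\rho}(x, \mathit{bpe}(x)) \subseteq \hat{C}(l, \mathit{bpe})
\end{array}
$$

The [setq] clause enforces that the analysis for the right-hand side is also valid. Moreover, a valid analysis allows values that result from the subexpression t0 to be possible values for the variable x under the current bindings bpe and also for the whole expression:

$$
\begin{array}{lll}
[\text{setq}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{setq}\ x\ t_0^{l_0})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ \hat{C}(l_0, \mathit{bpe}) \subseteq \hat{\rho}(x, \mathit{bpe}(x)) \\
 & & \land\ \hat{C}(l_0, \mathit{bpe}) \subseteq \hat{C}(l, \mathit{bpe})
\end{array}
$$

The [app] clause specifies the constraints for procedure calls. Its first and second conditions guarantee that the analysis is also valid under the same birthplace environment for the operator t0 and the operand t1:

$$
\begin{array}{lll}
[\text{app}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (t_0^{l_0}\ t_1^{l_1})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_1^{l_1} \\
 & & \land\ (\forall (\texttt{func}\ (x_1)\ t_b^{l_b}) \in \hat{C}(l_0, \mathit{bpe}). \\
 & & \qquad (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} t_b^{l_b} \\
 & & \qquad \land\ \hat{C}(l_1, \mathit{bpe}) \subseteq \hat{\rho}(x_1, l_b) \\
 & & \qquad \land\ \hat{C}(l_b, \mathit{bpe}') \subseteq \hat{C}(l, \mathit{bpe}) \\
 & & \qquad \text{where } \mathit{bpe}' = \mathit{bpe}[x_1 \mapsto l_b])
\end{array}
$$

Every function $(\texttt{func}\ (x_1)\ t_b^{l_b})$ which might occur in the operator position t0 of a procedure call under bpe must have a valid analysis for its body as well, under an expanded birthplace environment bpe′ which contains a binding for the function parameter x1. Moreover, the analysis must link the abstract values of the argument with those of the formal parameter x1, as well as the possible results of the body with those of the whole expression. The [let] clause works analogously to function application:

$$
\begin{array}{lll}
[\text{let}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{let}\ x\ t_1^{l_1}\ t_2^{l_2})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_1^{l_1} \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} t_2^{l_2} \\
 & & \land\ \hat{C}(l_1, \mathit{bpe}) \subseteq \hat{\rho}(x, l_2) \\
 & & \land\ \hat{C}(l_2, \mathit{bpe}') \subseteq \hat{C}(l, \mathit{bpe}) \\
 & & \text{where } \mathit{bpe}' = \mathit{bpe}[x \mapsto l_2]
\end{array}
$$

In the [if] clause, each branch contributes to a valid analysis:

$$
\begin{array}{lll}
[\text{if}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{if}\ t_0^{l_0}\ t_1^{l_1}\ t_2^{l_2})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_1^{l_1} \land \hat{C}(l_1, \mathit{bpe}) \subseteq \hat{C}(l, \mathit{bpe}) \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_2^{l_2} \land \hat{C}(l_2, \mathit{bpe}) \subseteq \hat{C}(l, \mathit{bpe})
\end{array}
$$

5.2.3 Primitives

Each primitive has its own associated flow behavior. Pair construction and selection serve as examples. The [cons] rule for the pair constructor is straightforward: A cons produces an abstract pair from abstract values with the labels of its arguments, under the original birthplace environment:

$$
\begin{array}{lll}
[\text{cons}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{cons}\ t_1^{l_1}\ t_2^{l_2})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_1^{l_1} \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_2^{l_2} \\
 & & \land\ (l_1, l_2, \mathit{bpe}) \in \hat{C}(l, \mathit{bpe})
\end{array}
$$

The [car] and [cdr] clauses are also straightforward: They induce validity constraints on the argument, and then simply pick the first or second component, respectively, of the abstract pairs flowing into it:

$$
\begin{array}{lll}
[\text{car}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{car}\ t_0^{l_0})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ (\forall (l_1, l_2, \mathit{bpe}') \in \hat{C}(l_0, \mathit{bpe}).\ \hat{C}(l_1, \mathit{bpe}') \subseteq \hat{C}(l, \mathit{bpe})) \\[4pt]
[\text{cdr}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{cdr}\ t_0^{l_0})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ (\forall (l_1, l_2, \mathit{bpe}') \in \hat{C}(l_0, \mathit{bpe}).\ \hat{C}(l_2, \mathit{bpe}') \subseteq \hat{C}(l, \mathit{bpe}))
\end{array}
$$

5.2.4 Definitions

The [defvar] and [defun] clauses extend the notion of valid scope analyses to entire programs. The [defvar] clause handles variable definitions:

$$
\begin{array}{lll}
[\text{defvar}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{defvar}\ x\ t_0^{l_0})\ p & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} \\
 & & \land\ \hat{C}(l_0, \mathit{bpe}) \subseteq \hat{\rho}(x, \mathit{bpe}(x)) \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} p
\end{array}
$$

A valid analysis must reflect the value initially bound to the variable. It must also associate the variable x with its abstract values under the current binding bpe(x). Moreover, a valid analysis must take into account the rest of the program p, too. The [defun] clause registers the procedure in the abstract environment. As in the [defvar] clause, the rest of the program p must be valid as well.

$$
\begin{array}{lll}
[\text{defun}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{defun}\ x_0\ (x_1)\ e)\ p & \text{iff } (\texttt{func}\ (x_1)\ e) \in \hat{\rho}(x_0, \diamond) \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} p
\end{array}
$$

5.2.5 Values

So far, the definition of the relation ⊨ for all possible expressions and programs checks whether a certain analysis for a program is valid or not. Now, the next goal is to show that valid analyses agree with the semantics, that is, that they are semantically correct. However, the reduction rules generate intermediate terms not covered by the rules so far. Here they are:

$$
\begin{array}{lll}
[\text{const}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{prim}\ c)^l & \text{iff } \omega \in \hat{C}(l, \mathit{bpe}) \\
[\text{sym}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{sym}\ s)^l & \text{iff } s \in \hat{C}(l, \mathit{bpe}) \\
[\text{proc}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{func}\ (x)\ e_0)^l & \text{iff } (\texttt{func}\ (x)\ e_0) \in \hat{C}(l, \mathit{bpe})
\end{array}
$$

The [const], [sym], and [proc] clauses are identical to their equivalents [c], [quote], and [lam] because their semantics are identical. The [pair] clause is a simpler version of the [cons] clause. The only difference is that, since a pair already carries two values in it, it is unknown under which prior birthplace environment the evaluation took place. The only requirement is that a suitable birthplace environment bpe′ exists:

$$
\begin{array}{lll}
[\text{pair}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{pair}\ v_1^{l_1}\ v_2^{l_2})^l & \text{iff } \exists \mathit{bpe}'.\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} v_1^{l_1} \\
 & & \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} v_2^{l_2} \\
 & & \land\ (l_1, l_2, \mathit{bpe}') \in \hat{C}(l, \mathit{bpe})
\end{array}
$$

5.2.6 Intermediate Expressions

The final clause in the definition of acceptability handles intermediate bind expressions. A bind expression binds a variable x to a value v1 during the evaluation of the body t2:

$$
\begin{array}{lll}
[\text{bind}] & (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{bind}\ x\ v_1\ t_2^{l_2})^l & \text{iff } (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} t_2^{l_2} \\
 & & \land\ \hat{C}(l_2, \mathit{bpe}') \subseteq \hat{C}(l, \mathit{bpe}) \\
 & & \land\ v_1 \mathrel{\mathcal{A}} (\hat{\rho}(x, l_2), \hat{C}) \\
 & & \text{where } \mathit{bpe}' = \mathit{bpe}[x \mapsto l_2]
\end{array}
$$

The [bind] rule requires a valid analysis for the body under a suitably extended birthplace environment bpe′. Moreover, the value of the body becomes the value of the bind expression. The supplementary constraint $v_1 \mathrel{\mathcal{A}} (\hat{\rho}(x, l_2), \hat{C})$ reflects that the actual new binding also has to show up in the abstract variable environment under the relevant birthplace l2; the $\mathcal{A}$ relation is explained in the next section.

5.3 The Approximation Relation $\mathcal{A}$

Intuitively, the ⊨ relation determines if a dynamic scope analysis $(\hat{C}, \hat{\rho})$ correctly reflects the evaluation process of a program in an abstract sense. The formulation of ⊨ uses the approximation relation $\mathcal{A}$ that regulates the approximation of values of Val by their abstract equivalents. Here is its formal definition:

$$
\begin{array}{l}
\mathcal{A} \subseteq \mathrm{Val} \times (\widehat{\mathrm{Val}} \times \widehat{\mathrm{Cache}}) \\[4pt]
v \mathrel{\mathcal{A}} (\hat{v}, \hat{C}) \ \text{ iff } \ \forall c\ \forall s\ \forall f\ \forall v_1\ \forall v_2. \\
\quad ((v = c \Rightarrow \omega \in \hat{v}) \\
\quad \land\ (v = s \Rightarrow s \in \hat{v}) \\
\quad \land\ (v = f \Rightarrow f \in \hat{v}) \\
\quad \land\ (v = (\texttt{pair}\ v_1^{l_1}\ v_2^{l_2}) \Rightarrow \exists \mathit{bpe}.\ (l_1, l_2, \mathit{bpe}) \in \hat{v} \\
\qquad \land\ v_1 \mathrel{\mathcal{A}} (\hat{C}(l_1, \mathit{bpe}), \hat{C}) \ \land\ v_2 \mathrel{\mathcal{A}} (\hat{C}(l_2, \mathit{bpe}), \hat{C})))
\end{array}
$$

$\mathcal{A}$ holds between a value v and its correct representation as a set of abstract values and an abstract cache. This is straightforward except for the treatment of pairs: The representation of a pair consists of its components' creation points and a birthplace environment. An abstract representation, however, must also map to abstract values for its components. This is why a value cache $\hat{C}$ participates in the definition of $\mathcal{A}$.

5.4 The Well-Definedness of ⊨

It is not immediately clear that the acceptability relation ⊨ from Subsection 5.2 is well-defined. Structural induction by itself is not sufficient because the [app] clause is not compositional.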
On the other hand, the specification of ⊨ can be considered as a functional

$$
Q : \mathcal{P}(\widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Exp}) \to \mathcal{P}(\widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Exp})
$$

with

$$
\begin{array}{ll}
(\hat{C}, \hat{\rho}, \mathit{bpe}, (\texttt{let}\ x\ t_1^{l_1}\ t_2^{l_2})^l) \in Q(R) & \text{iff } R(\hat{C}, \hat{\rho}, \mathit{bpe}, t_1^{l_1}) \\
 & \land\ R(\hat{C}, \hat{\rho}, \mathit{bpe}[x \mapsto l_2], t_2^{l_2}) \\
 & \land\ \hat{C}(l_1, \mathit{bpe}) \subseteq \hat{\rho}(x, l_2) \\
 & \land\ \hat{C}(l_2, \mathit{bpe}[x \mapsto l_2]) \subseteq \hat{C}(l, \mathit{bpe}) \\
(\hat{C}, \hat{\rho}, \mathit{bpe}, \ldots) \in Q(R) & \text{iff } \ldots
\end{array}
$$

This change in perspective leads to a specification of ⊨ using sound mathematical means. Q is a monotone function on a complete lattice because

• $(\mathcal{P}(\widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Exp}), \sqsubseteq)$ is a complete lattice with respect to the partial order $R_1 \sqsubseteq R_2$ iff $\forall t.\ t \in R_1 \Rightarrow t \in R_2$, and
• Q is a monotone function on this complete lattice, that is, $\forall R_1, R_2.\ R_1 \sqsubseteq R_2 \Rightarrow Q(R_1) \sqsubseteq Q(R_2)$.

Consequently, Q has a greatest fixed point. Thus, ⊨ is well-defined by coinduction as ⊨ := gfp(Q).

5.5 Acceptability for Environments

Since dynamic scope analysis is ultimately concerned with scope and hence with environments, it is necessary to extend the notion of acceptability to environments:

$$
\begin{array}{l}
\models\ \subseteq \widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Env} \\[4pt]
(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho \ \text{ iff } \ \forall x \in \mathrm{dom}(\rho).\ \rho(x) \mathrel{\mathcal{A}} (\hat{\rho}(x, \mathit{bpe}(x)), \hat{C})
\end{array}
$$

This acceptability relation for environments examines every binding in an actual environment which occurs during evaluation and relates it to its abstract counterpart for correctness.

5.6 Acceptability for Configurations

The combination of the acceptability relation for programs with that for environments produces an acceptability relation for configurations, that is, combinations of environments and expressions:

$$
\begin{array}{l}
\models\ \subseteq \widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{Conf} \\[4pt]
(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho, e \ \text{ iff } \ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho \ \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} e
\end{array}
$$

Furthermore, it is possible to define an acceptability relation for program configurations, that is, combinations of programs and environments:

$$
\begin{array}{l}
\models\ \subseteq \widehat{\mathrm{Cache}} \times \widehat{\mathrm{Env}} \times \widehat{\mathrm{BPEnv}} \times \mathrm{PConf} \\[4pt]
(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho, p \ \text{ iff } \ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho \ \land\ (\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} p
\end{array}
$$

6. SEMANTIC CORRECTNESS

The semantics developed in Section 4 employs evaluation contexts and rewriting rules. Hence, the specification of the semantics uses almost exclusively syntactical means, with the exception of the notion of environments: a program transitions through a sequence of configurations which include valid programs or expressions until it reaches a final value, gets stuck, or loops forever. The definition of the acceptability relation in the previous section was derived intuitively. A correctness proof is necessary which must show that every valid analysis stays valid under the evaluation process. This section summarizes the most important lemmas and theorems involved in the proof. For details, the reader is referred to Neubauer's thesis [21].

The first lemma states that a dynamic scope analysis is valid for a value if and only if the abstract cache at its label and the given bpe approximates the value:

Lemma 1 $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} v^l$ iff $v \mathrel{\mathcal{A}} (\hat{C}(l, \mathit{bpe}), \hat{C})$.

Proof. By structural induction over v.

Another lemma states the obvious property that if an abstract value $\hat{v}_1$ is a correct approximation of a true value v, then so is every abstract value $\hat{v}_2$ which includes $\hat{v}_1$:

Lemma 2 If $v \mathrel{\mathcal{A}} (\hat{v}_1, \hat{C})$ and also $\hat{v}_1 \subseteq \hat{v}_2$, then $v \mathrel{\mathcal{A}} (\hat{v}_2, \hat{C})$.

Proof. By structural induction over v. Each case of the proof is obtained individually by inspecting the definition of $\mathcal{A}$.

The specification of the acceptability relation has the important property stated by the following lemma: if an analysis is valid for a term t at label l1, and the abstract values flowing through it are all contained in the values flowing through label l2, the analysis is also valid at label l2:

Lemma 3 If $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t^{l_1}$ and $\hat{C}(l_1, \mathit{bpe}) \subseteq \hat{C}(l_2, \mathit{bpe})$, then also $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t^{l_2}$.

Proof. By case analysis over the rules of Term. As an example, here is the case for setq expressions: From the first premise $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} (\texttt{setq}\ x\ t_0^{l_0})^{l_1}$ follows

$$
\begin{array}{lr}
(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} t_0^{l_0} & (1) \\
\hat{C}(l_0, \mathit{bpe}) \subseteq \hat{\rho}(x, \mathit{bpe}(x)) & (2) \\
\hat{C}(l_0, \mathit{bpe}) \subseteq \hat{C}(l_1, \mathit{bpe}) & (3)
\end{array}
$$

by the [setq] clause of ⊨. The assumption $\hat{C}(l_1, \mathit{bpe}) \subseteq \hat{C}(l_2, \mathit{bpe})$ together with (3) yields

$$
\hat{C}(l_0, \mathit{bpe}) \subseteq \hat{C}(l_2, \mathit{bpe}). \qquad (4)
$$

The backwards application of the [setq] clause together with (1), (2), and (4) yields the proposition. The other cases work analogously.

Another central insight is that the validity of the dynamic scope analysis of an expression carries over to those subexpressions which are in an evaluation context. Even stronger, such a subexpression can be replaced by another valid one without violating the validity of the whole expression. With this result, the further proof of the correctness of a reduction step can concentrate on the possible redexes of all expressions; the following lemma then allows us to generalize the result to the big picture. This facility is known as the replacement lemma in the realm of combinatory logic [13]:

Lemma 4 If $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} E[t_1^{l_1}]$ where $E[t_1^{l_1}]$ carries the label l, then there exists bpe′ such that
a) $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} t_1^{l_1}$ holds.
b) If also $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}'} t_2^{l_1}$, then $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} E[t_2^{l_1}]$.
c) If also $E \in \mathrm{VarContext}(x)$, then $\mathit{bpe}'(x) = \mathit{bpe}(x)$.

Proof. Structural induction over E.

The first main theorem is subject reduction for expressions under the reduction relation →. A valid dynamic scope analysis for an expression e and a correct approximation of the environment stay valid after one step with → for the resulting expression e′ and the modified environment:

Theorem 1 If $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho, e$ and $\rho, e \to \rho', e'$, then also $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho', e'$.

Proof. By case analysis over the reduction relation →.

The second theorem formulates subject reduction for entire programs, adapting the previous theorem to the reduction relation →d:

Theorem 2 If $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho, p$ and $\rho, p \to_d \rho', p'$, then also $(\hat{C}, \hat{\rho}) \models_{\mathit{bpe}} \rho', p'$.

Proof. By case analysis over →d.

7. IMPLEMENTATION

The definition of the acceptability relation presented in Section 5.2 is a blueprint for a practical implementation of a dynamic scope analysis. Our own analysis is constraint-based [1]; it uses a set of syntactic entities to represent applications of the rules generated by the acceptability relation. The analysis, just like every other constraint-based program analysis, consists of two phases: constraint generation and constraint simplification.

In the following we consider a fixed program p∗ and describe how to compute the least dynamic flow analysis for p∗ which is acceptable with respect to the acceptability relation ⊨. Since the program p∗ is finite, it is possible to enumerate all its occurring labels, symbols, and functions. We call these finite sets Lab∗, SymVar∗ and Fun∗, respectively. Similarly, the sets of possibly occurring birthplace environments $\widehat{\mathrm{BPEnv}}_*$ and possibly occurring abstract pairs $\widehat{\mathrm{Cons}}_*$ are also identifiable and finite. Accordingly, the finite set of all abstract values that are conceivable for all possible program runs of p∗ is

$$
a \in \mathrm{Abs}_* = \mathrm{SymVar}_* \cup \{\omega\} \cup \mathrm{Fun}_* \cup \widehat{\mathrm{Cons}}_*.
$$

The finite sets serve as the basis for the specification of the dynamic scope analysis for a program p∗.

7.1 Generating Constraints

In the constraints generated by the analysis, flow variables V stand for sets of abstract values. A flow variable $C_{l,\mathit{bpe}}$ stands for the set of abstract values in the abstract cache at label l and birthplace environment bpe. A flow variable $r_{x,l}$ stands for a set of abstract values in the abstract environment. A constraint co in our analysis belongs to one of three different kinds. A simple constraint of the form

$$
\{a\} \subseteq V,
$$

where a is an abstract value of Abs∗, states that a certain abstract value a must be a member of the set of abstract values denoted by V. A variable constraint

$$
V_0 \subseteq V_1
$$

says that the abstract values of V0 are all contained in those of V1. A conditional constraint

$$
\{a\} \subseteq V \Longrightarrow co
$$
where $a$ is an abstract value of $\mathrm{Abs}_*$ and $co$ is another constraint, states that the constraint $co$ must hold if the abstract value $a$ is a member of the set of abstract values denoted by $V$.

By inspecting the rules of the acceptability relation, we define the function $\mathcal{G}[\![p]\!]_{bpe}\,M$ that constructs the set of constraints to be solved, as shown in Figure 3. Its first parameter is the program or expression for which constraints are generated. The second one, $bpe$, is the birthplace environment relative to which the generation of the constraints takes place. The third parameter, $M$, is a set of pairs, each consisting of the label $l_b$ of a procedure body and a birthplace environment $bpe$. This set memoizes the combinations of procedures and birthplace environments already handled by the constraint generation. The analysis uses it to prevent generating duplicate constraints.

$$\begin{aligned}
\mathcal{G}[\![c^{l}]\!]_{bpe}\,M &= \{\{\omega\} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{quote}\ s)^{l}]\!]_{bpe}\,M &= \{\{s\} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{lambda}\ (x)\ e_0)^{l}]\!]_{bpe}\,M &= \{\{(\mathtt{func}\ (x)\ e_0)\} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![x^{l}]\!]_{bpe}\,M &= \{r_{x,bpe(x)} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{setq}\ x\ t_0^{l_0})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \{C_{l_0,bpe} \subseteq r_{x,bpe(x)}\} \cup \{C_{l_0,bpe} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(t_0^{l_0}\ t_1^{l_1})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \mathcal{G}[\![t_1^{l_1}]\!]_{bpe}\,M \\
&\quad \cup \{\{(\mathtt{func}\ (x_1)\ t_b^{l_b})\} \subseteq C_{l_0,bpe} \Longrightarrow co \\
&\qquad \mid (\mathtt{func}\ (x_1)\ t_b^{l_b}) \in \mathrm{Fun}_*,\ bpe' = bpe[x_1 \mapsto l_b],\ (l_b, bpe') \notin M, \\
&\qquad\ \ M' = M \cup \{(l_b, bpe')\},\ co \in \mathcal{G}[\![t_b^{l_b}]\!]_{bpe'}\,M'\} \\
&\quad \cup \{\{(\mathtt{func}\ (x_1)\ t_b^{l_b})\} \subseteq C_{l_0,bpe} \Longrightarrow C_{l_1,bpe} \subseteq r_{x_1,l_b} \mid (\mathtt{func}\ (x_1)\ t_b^{l_b}) \in \mathrm{Fun}_*\} \\
&\quad \cup \{\{(\mathtt{func}\ (x_1)\ t_b^{l_b})\} \subseteq C_{l_0,bpe} \Longrightarrow C_{l_b,bpe[x_1 \mapsto l_b]} \subseteq C_{l,bpe} \mid (\mathtt{func}\ (x_1)\ t_b^{l_b}) \in \mathrm{Fun}_*\} \\
\mathcal{G}[\![(\mathtt{let}\ x\ t_1^{l_1}\ t_2^{l_2})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_1^{l_1}]\!]_{bpe}\,M \cup \mathcal{G}[\![t_2^{l_2}]\!]_{bpe[x \mapsto l_2]}\,M \cup \{C_{l_1,bpe} \subseteq r_{x,l_2}\} \cup \{C_{l_2,bpe[x \mapsto l_2]} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{if}\ t_0^{l_0}\ t_1^{l_1}\ t_2^{l_2})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \mathcal{G}[\![t_1^{l_1}]\!]_{bpe}\,M \cup \{C_{l_1,bpe} \subseteq C_{l,bpe}\} \cup \mathcal{G}[\![t_2^{l_2}]\!]_{bpe}\,M \cup \{C_{l_2,bpe} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{cons}\ t_1^{l_1}\ t_2^{l_2})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_1^{l_1}]\!]_{bpe}\,M \cup \mathcal{G}[\![t_2^{l_2}]\!]_{bpe}\,M \cup \{\{(l_1, l_2, bpe)\} \subseteq C_{l,bpe}\} \\
\mathcal{G}[\![(\mathtt{car}\ t_0^{l_0})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \{\{(l_1, l_2, bpe')\} \subseteq C_{l_0,bpe} \Longrightarrow C_{l_1,bpe'} \subseteq C_{l,bpe} \mid (l_1, l_2, bpe') \in \widehat{\mathrm{Cons}}_*\} \\
\mathcal{G}[\![(\mathtt{cdr}\ t_0^{l_0})^{l}]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \{\{(l_1, l_2, bpe')\} \subseteq C_{l_0,bpe} \Longrightarrow C_{l_2,bpe'} \subseteq C_{l,bpe} \mid (l_1, l_2, bpe') \in \widehat{\mathrm{Cons}}_*\} \\
\mathcal{G}[\![(\mathtt{defvar}\ x\ t_0^{l_0})\ p]\!]_{bpe}\,M &= \mathcal{G}[\![t_0^{l_0}]\!]_{bpe}\,M \cup \{C_{l_0,bpe} \subseteq r_{x,bpe(x)}\} \cup \mathcal{G}[\![p]\!]_{bpe}\,M \\
\mathcal{G}[\![(\mathtt{defun}\ x_0\ (x_1)\ e)\ p]\!]_{bpe}\,M &= \{\{(\mathtt{func}\ (x_1)\ e)\} \subseteq r_{x_0,\diamond}\} \cup \mathcal{G}[\![p]\!]_{bpe}\,M
\end{aligned}$$

Figure 3: Generating Constraints.

The constraint generation rules in Figure 3 are mostly straightforward translations of the corresponding rules of the acceptability relation. The most involved case is the treatment of procedure applications. In addition to the generation of constraints for the terms at the operator position and the parameter position, every procedure flowing into the operator triggers, via a conditional constraint, the generation of constraints for its body under the current birthplace environment extended by the binding of its parameter. The treatment of the primitives car and cdr works in a similar way: we do not know which abstract pairs could flow into their argument. Therefore, the analysis generates conditional constraints for all abstract pairs. The well-definedness and the termination of the algorithm follow by simple fixed-point arguments on an underlying finite complete lattice.

$\mathcal{G}[\![p]\!]_{bpe}\,M$ as specified generates a large number of conditional constraints in the application, car, and cdr rules, many of which are never triggered during the constraint-solving phase. Therefore, our implementation defers the generation of their right-hand sides until constraint solving. It would have been possible to specify the analysis this way from the beginning, but this would have meant mixing the constraint-generation and constraint-solving phases, which would obscure the presentation.

7.2 Solving the Constraints

The generated set of constraints expresses the behavior of all valid dynamic scope analyses.
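To make the notion of a generated constraint set concrete, the following Python sketch implements a handful of the generation rules (constants, quote, variable reference, setq, and let). It is only an illustration under stated assumptions, not the authors' Scheme implementation: the tuple encodings of terms, constraints, and flow variables, the names `generate`, `freeze`, and `TOP`, and the use of the string "top" for the top-level birthplace are all hypothetical. The body of a let is generated under $bpe[x \mapsto l_2]$, following the let clause of the acceptability relation. Applications, car, cdr, and the memo set $M$ are omitted.

```python
TOP = "top"  # assumed stand-in for the top-level birthplace (the diamond in the paper)


def freeze(bpe):
    """Turn a birthplace environment (dict: variable -> label) into a hashable key."""
    return tuple(sorted(bpe.items()))


def generate(term, bpe):
    """Return the list of constraints for a labelled term under bpe.

    Terms are tuples (kind, label, ...); constraints are
    ("elem", a, V) for {a} <= V and ("sub", V0, V1) for V0 <= V1.
    """
    kind, label = term[0], term[1]
    here = ("C", label, freeze(bpe))                 # flow variable C_{l,bpe}
    if kind == "const":                              # c^l
        return [("elem", "omega", here)]
    if kind == "quote":                              # (quote s)^l
        return [("elem", term[2], here)]
    if kind == "var":                                # x^l
        x = term[2]
        return [("sub", ("r", x, bpe.get(x, TOP)), here)]
    if kind == "setq":                               # (setq x t0)^l
        x, t0 = term[2], term[3]
        c0 = ("C", t0[1], freeze(bpe))
        return generate(t0, bpe) + [
            ("sub", c0, ("r", x, bpe.get(x, TOP))),
            ("sub", c0, here),
        ]
    if kind == "let":                                # (let x t1 t2)^l
        x, t1, t2 = term[2], term[3], term[4]
        inner = {**bpe, x: t2[1]}                    # bpe[x -> l2]
        return generate(t1, bpe) + generate(t2, inner) + [
            ("sub", ("C", t1[1], freeze(bpe)), ("r", x, t2[1])),
            ("sub", ("C", t2[1], freeze(inner)), here),
        ]
    raise NotImplementedError(f"no rule sketched for {kind!r}")


# (let x (quote a)^1 x^2)^3 under the empty birthplace environment
example = ("let", 3, "x", ("quote", 1, "a"), ("var", 2, "x"))
for constraint in generate(example, {}):
    print(constraint)
```

For the example term, the sketch yields one simple constraint for the quoted symbol and three variable constraints that route it through $r_{x,2}$ and the let body to the label of the whole expression.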
To get the least dynamic scope analysis, we close the generated constraints under the following inference rules $S$:

$$[\text{VarProp}]\ \frac{\{a\} \subseteq V_0 \qquad V_0 \subseteq V_1}{\{a\} \subseteq V_1} \qquad\qquad [\text{CondProp}]\ \frac{\{a\} \subseteq V \qquad \{a\} \subseteq V \Longrightarrow co}{co}$$

and write $S(CO)$ for the closure of a set of constraints $CO$ under $S$. The actual dynamic scope analysis results from the solving phase as all abstract values associated with a variable $V$ after generating the initial constraints and closing those constraints under $S$:

$$dsa(p)(V) = \{a \mid \{a\} \subseteq V \in S(\mathcal{G}[\![p]\!]_{bpe_\diamond}\,\emptyset)\}$$

where $bpe_\diamond$ denotes the top-level birthplace environment. For our implementation, we employ the standard technique of using a graph representation for the constraint set and apply a worklist algorithm on the graph to compute the least solution of the original constraints [2, 14, 32].

Package         Lines   Prims   Bps   Dynamic Bps    Iters   Analysis Time (sec)
mail-utils.el     355      51    63             0     4159                  0.96
rfc822.el         378      48    56             1    89428                 81.84
add-log.el        718      74    67             1    22284                  8.32
pop3.el           839      67   169             5    93640                130.49
footnote.el       975      47   153             0   115930                 73.86

Figure 4: Analyzed Emacs Lisp packages, their size in lines of code after macro expansion, the number of additionally used primitives, the number of birthplaces, the number of birthplaces recognized as dynamic binding, the number of iterations the worklist algorithm used, and the analysis time.

The worklist algorithm always terminates. Every program induces only a finite set of abstract values ($\mathrm{Abs}_*$) and there is only a finite number of potential nodes since there is only a finite number of program points, variables, and birthplace environments. Hence, the analysis propagates a finite number of data objects over a finite number of nodes. The process ends after a finite number of steps: at the latest when every datum has arrived at every node. The algorithm has exponential worst-case complexity with respect to the size of the analyzed program: the number of all possible birthplace environments is already exponential.
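The closure under [VarProp] and [CondProp] can be sketched as a small worklist solver. The Python code below is a minimal illustration under assumptions, not the prototype described in this paper (which is written in Scheme): the tuple encoding of constraints, the string names for flow variables, and the function name `solve` are all hypothetical. Edges of the constraint graph are the variable constraints; a conditional constraint is parked until its guard becomes true.

```python
from collections import defaultdict, deque


def solve(constraints):
    """Close constraints under [VarProp] and [CondProp]; return the least solution.

    Assumed encoding: ("elem", a, V) is {a} <= V, ("sub", V0, V1) is V0 <= V1,
    and ("cond", a, V, co) is {a} <= V ==> co.
    """
    values = defaultdict(set)    # flow variable -> abstract values derived so far
    succs = defaultdict(set)     # V0 -> all V1 with an edge V0 <= V1
    waiting = defaultdict(list)  # (a, V) -> constraints guarded by {a} <= V
    worklist = deque()           # derived facts {a} <= V not yet propagated

    def add_value(a, var):
        if a not in values[var]:
            values[var].add(a)
            worklist.append((a, var))

    def add(co):
        kind = co[0]
        if kind == "elem":
            add_value(co[1], co[2])
        elif kind == "sub":
            v0, v1 = co[1], co[2]
            if v1 not in succs[v0]:
                succs[v0].add(v1)
                for a in list(values[v0]):       # [VarProp] for facts already known
                    add_value(a, v1)
        elif kind == "cond":
            a, var, inner = co[1], co[2], co[3]
            if a in values[var]:
                add(inner)                       # [CondProp]: guard already holds
            else:
                waiting[(a, var)].append(inner)
        else:
            raise ValueError(f"unknown constraint kind {kind!r}")

    for co in constraints:
        add(co)
    while worklist:
        a, var = worklist.popleft()
        for v1 in list(succs[var]):              # [VarProp]
            add_value(a, v1)
        for inner in waiting.pop((a, var), []):  # [CondProp]
            add(inner)
    return {var: vals for var, vals in values.items() if vals}


demo = [
    ("elem", "f", "C1"),
    ("sub", "C1", "C2"),
    ("cond", "f", "C2", ("sub", "C3", "r_x")),    # triggered: f reaches C2
    ("elem", "s", "C3"),
    ("cond", "g", "C2", ("elem", "never", "C4")), # never triggered
]
solution = solve(demo)
```

Each fact enters the worklist at most once, which mirrors the termination argument above. The sketch does nothing to avoid the exponential number of birthplace environments just mentioned; it only solves whatever constraints it is handed.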
However, the next section shows that our prototype implementation is already practical for medium-sized real-world examples. Also, since the translation of Emacs Lisp programs into a new substrate ideally happens only once, speed is not quite as important as, say, in compilers which run often. Instead, precision is at a premium.

8. MEETING THE REAL WORLD

Our prototype implementation of the algorithm consists of about 5500 lines of Scheme code and runs atop Scheme 48 [15], a byte-code implementation of Scheme. It handles a large subset of Emacs Lisp programs. Specifically, it correctly deals with a number of aspects of the language not treated in this paper, including the following:

• multi-parameter functions and optional arguments,
• catch and throw,
• funcall and the duality between functions and their names, and
• separation of function and value components of bindings.

In this section, we present the results of the analysis run on various packages taken directly from the XEmacs package collection. To obtain accurate information from real packages, the implementation must know the flow behavior of a substantial number of the XEmacs primitives they use. The implementation contains a small macro language to describe the flow behavior of basic primitives for which no implementation in Emacs Lisp exists. Using those macros simplifies the description of the primitives tremendously. For instance, the three lines

    (primitive-flow (FILENAME)
      ((const)
       (union (symbol t) (symbol nil))))

describe the constraint generation for the built-in primitive file-exists-p, which checks for the existence of a file with a given name.

Currently, the system emits an annotated version of the input program, marking those bindings which would have to stay dynamic under lexical binding.
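What it means for a binding to "stay dynamic" can be illustrated with a toy evaluator that runs the same program under both scoping disciplines. The Python sketch below is an assumption-laden illustration, not Emacs Lisp semantics and not part of the analysis: the tuple encoding of expressions and the names `evaluate`, `same`, and `differs` are hypothetical. Under dynamic scoping a procedure body sees the caller's bindings; under lexical scoping it sees the bindings in force where the lambda was created.

```python
def evaluate(expr, env, dynamic):
    """Evaluate a tiny let/lambda language under dynamic or lexical scoping."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "let":
        _, x, e1, e2 = expr
        return evaluate(e2, {**env, x: evaluate(e1, env, dynamic)}, dynamic)
    if kind == "lambda":
        _, x, body = expr
        # A lexical closure remembers its defining environment; a dynamic one does not.
        return ("closure", x, body, None if dynamic else env)
    if kind == "app":
        _, f, arg = expr
        _, x, body, captured = evaluate(f, env, dynamic)
        value = evaluate(arg, env, dynamic)
        base = env if captured is None else captured
        return evaluate(body, {**base, x: value}, dynamic)
    raise ValueError(f"unknown expression kind {kind!r}")


# (let x 1 (let f (lambda (y) y) (f x))): no reference escapes its static binding.
same = ("let", "x", ("const", 1),
        ("let", "f", ("lambda", "y", ("var", "y")),
         ("app", ("var", "f"), ("var", "x"))))

# (let x 1 (let f (lambda (y) x) (let x 2 (f 0)))): the inner let rebinds x
# for the body of f only under dynamic scoping.
differs = ("let", "x", ("const", 1),
           ("let", "f", ("lambda", "y", ("var", "x")),
            ("let", "x", ("const", 2),
             ("app", ("var", "f"), ("const", 0)))))

for program in (same, differs):
    print(evaluate(program, {}, dynamic=True), evaluate(program, {}, dynamic=False))
```

In the first program every let could be translated to a lexical binding without changing the result. In the second, the reference to x inside the lambda observes the inner let only under dynamic scoping, so that binding is of the kind an annotation would have to mark.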
The binding-type condition, which we use to decide to which type a variable reference belongs, is the following:

Binding-type condition. A variable is used dynamically iff the abstract cache registers some abstract object for the variable under its label for a birthplace other than the static one; that is, $x^{l_0}$ with static birthplace $l_1$ is dynamic iff

$$\exists bpe.\ bpe(x) \neq l_1 \ \land\ \hat{C}(l_0, bpe) \neq \emptyset.$$

Further conditions exist for our implementation to recognize Emacs Lisp lets used in the flavor of the statically scoped let* and letrec of Scheme.

The results in this section were obtained by running the implementation under the Scheme system Scheme 48 0.53 on an Athlon 1 GHz system with 256 kByte second-level cache and 256 MByte of physical memory. We made no effort to optimize the implementation heavily or to compile it to native code; feasibility was our main concern.

Figure 4 lists the packages used for the experimental results. They are all part of the regular XEmacs distribution. Mail-utils contains utility functions used by both of the packages rmail and rnews. Rfc822 implements a parser for standard Internet messages. Pop3 provides POP3 functionality for email clients. Add-log lets a programmer manage files of changes for programs. Footnote offers the functionality to add footnotes to XEmacs buffers. Figure 4 shows that the analysis is highly accurate: it leaves behind at most 5 dynamic binding constructs per package (5 of 169 birthplaces in pop3.el), and two of the five packages need none at all.

9. LIMITATIONS

While the analysis described here solves some of the hardest problems associated with translating Emacs Lisp programs to readable Scheme programs, a few remain:

eval. Emacs Lisp has an eval function which interprets a piece of data as an Emacs Lisp expression. Its semantics is naturally quite undefined in a Scheme environment. Except for simple cases (for example, where the expression to be evaluated is a symbol), there is no idiomatic translation for eval forms.
Programmers must transform the Emacs Lisp code so that it does not use eval before attempting translation. Dynamic generation of symbols as well as some introspection capabilities of the language also belong in this category.

buffer-local variables. Emacs Lisp features buffer-local variables which implicitly change value according to the current buffer. This is an unfortunate conflation of the language semantics and the application domain, and often leads to unexpected and hard-to-track behavior of Emacs Lisp code. However, buffer locality is usually a global property of variables: programs rarely use the same variable both buffer-locally and buffer-globally. Hence, a feasible approach is to translate buffer-local variables into specially designated data structures and access them via special constructs rather than preserving their implicit nature. No special analysis is required as long as the calls to make-variable-buffer-local are close to their variable declarations.

Note that these kinds of problems are inherent in almost any translation from one programming language to another, if maintainability is to be preserved.

10. RELATED WORK

10.1 Dynamic Binding

Despite the fact that languages with dynamic variable binding have existed for a long time, formulations of semantics for these languages are quite rare. On the other hand, it is folklore that dynamic binding can be eliminated by a dynamic-environment passing translation that makes the dynamic bindings explicit [24]. Gordon [9] initially formalized dynamic binding in the context of early Lisp dialects and studied their metacircular interpreters, using denotational semantics. Moreau [20] rounded out the view on dynamic binding by introducing a syntactic theory of dynamic binding with a calculus allowing equational reasoning. From this theory, he also derived a small-step semantics using evaluation contexts and syntactic rewriting as developed by Felleisen and Friedman [6].
Wright and Felleisen [30, 31] and Harper and Stone [11] formulated semantics for exception mechanisms which also employ a kind of dynamic binding. Lewis et al. [17] introduce a language feature called implicit parameters that provides dynamically scoped variables for languages with Hindley-Milner type systems, and formalize it with an axiomatic semantics. However, functions with implicit parameters are not first-class values in their setting.

10.2 Flow Analysis

Most realistic implementations of flow analysis for functional programming languages are simple monovariant (or 0-CFA) flow analyses [12, 7, 8, 26], that is, the analysis looks at every program point independently of context. Shivers [27] proposed splitting the analysis at function call sites depending on the context of the k most recent procedure calls (called k-CFA). Other splittings, also depending on procedure calls, were proposed by Jagannathan and Weeks [14] as poly-k-CFA and by Wright and Jagannathan [32] as polymorphic splitting.

The concept of coinduction arose from Milner and Tofte's work [18] on semantics and type systems of an extended λ-calculus with references. Nielson and Nielson were the first to use coinduction as a means for specifying a static analysis [22]. Their work provides the theoretical framework for the specification of our analysis.

10.3 Subject reduction

The notion of correctness we used is generally called a subject reduction result. Curry and Feys [5] introduced subject reduction to show the correctness of predicates in the languages of combinatory logic. Mitchell and Plotkin [19] used the idea to show a type correctness result for a λ-calculus-like language, whereas Wright and Felleisen [30, 31] adapted it to the more flexible concept of operational semantics with reduction rules and evaluation contexts. Wright and Jagannathan [32] used the same technique for their polymorphic splitting flow analysis.
10.4 Emacs Lisp and Scheme

A number of other projects have built or are currently building Scheme-based variants of Emacs. The oldest is Matt Birkholz's Emacs Lisp interpreter, which allows running Emacs Lisp programs on top of MIT Scheme's Edwin editor. Current efforts include Ken Raeburn's work on creating a Guile-based Emacs [25], the Guile Emacs project [10], and Per Bothner's JEmacs [3, 4], which aims at reimplementing Emacs atop Java bytecodes, leveraging Bothner's Kawa compiler for Scheme. Of these, only JEmacs seems to have seen significant work on actually making Emacs Lisp programs run. None of these projects addresses permanently translating Emacs Lisp code to Scheme while retaining maintainability.

11. CONCLUSION AND FUTURE WORK

We have specified, proved correct, and implemented a flow analysis for Emacs Lisp whose distinguishing feature is its correct handling of dynamic binding. The primary purpose of the analysis is to aid translation of Emacs Lisp programs into more modern language substrates with lexical scoping, since most binding in real Emacs Lisp programs behaves identically under lexical and dynamic scoping. Our analysis is highly accurate in practice. Our prototype implementation is reasonably efficient. We have two main directions for future research:

• improving the efficiency of the analysis by ordinary optimization, compilation of the code, and modularization of the constraints [8], and
• integration of the analysis into a translation suite from Emacs Lisp to Scheme.

Acknowledgments. We would like to thank the initial members of the el2scm project: Martin Gasbichler, Johannes Hirche, and Peter Biber. Specifically, Peter Biber developed a precursor to the analysis presented here, demonstrating the feasibility of the project. Peter Thiemann provided valuable suggestions for the paper. We also thank the ICFP referees for valuable comments.

12. REFERENCES

[1] A. Aiken.
Set constraints: Results, applications and future directions. Lecture Notes in Computer Science, 874:326–335, 1994.
[2] A. Aiken and E. Wimmers. Type inclusion constraints and type inference. In Proceedings of the FPCA 1993, pages 31–41, 1993.
[3] P. Bothner. JEmacs - the Java/Scheme-based Emacs. In Proceedings of the FREENIX Track: 2000 USENIX Annual Technical Conference (FREENIX-00), pages 271–278, Berkeley, CA, June 18–23 2000. USENIX Ass.
[4] P. Bothner. JEmacs - the Java/Scheme-based Emacs text editor. Feb. 2001.
[5] H. B. Curry and R. Feys. Combinatory Logic, volume I. North-Holland, Amsterdam, 1958.
[6] M. Felleisen and D. P. Friedman. Control operators, the SECD-machine, and the λ-calculus. In M. Wirsing, editor, Formal Description of Programming Concepts III, pages 193–217. North-Holland, 1986.
[7] C. Flanagan and M. Felleisen. Set-based analysis for full Scheme and its use in soft-typing. Technical Report TR95-254, Rice University, Oct. 1995.
[8] C. Flanagan and M. Felleisen. Componential set-based analysis. ACM Transactions on Programming Languages and Systems, 21(2):370–416, Mar. 1999.
[9] M. J. C. Gordon. Programming Language Theory and its Implementation. Prentice-Hall, 1988.
[10] Guile Emacs. July 2000.
[11] R. Harper and C. Stone. An interpretation of Standard ML in type theory. Technical Report CMU-CS-97-147, Carnegie Mellon University, Pittsburgh, PA, June 1997. (Also published as Fox Memorandum CMU-CS-FOX-97-01.)
[12] N. Heintze. Set-based analysis of ML programs. In ACM Conference on Lisp and Functional Programming, pages 306–317, 1994.
[13] R. Hindley and J. Seldin. Introduction to Combinators and λ-Calculus, volume 1 of London Mathematical Society Student Texts. Cambridge University Press, 1986.
[14] S. Jagannathan and S. Weeks. A unified treatment of flow analysis in higher-order languages. In POPL, 1995.
[15] R. A. Kelsey and J. A. Rees. A tractable Scheme implementation. Lisp and Symbolic Computation, 7(4):315–335, 1995.
[16] B. Lewis, D. LaLiberte, R. Stallman, and the GNU Manual Group. GNU Emacs Lisp reference manual. elisp.html, 1998.
[17] J. Lewis, M. Shields, E. Meijer, and J. Launchbury. Implicit parameters: Dynamic scoping with static types. In Proceedings of the 27th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Boston, Massachusetts, pages 108–118, Jan. 2000.
[18] R. Milner and M. Tofte. Co-induction in relational semantics. Theoretical Computer Science, 87:209–220, 1991.
[19] J. C. Mitchell and G. D. Plotkin. Abstract types have existential type. ACM Transactions on Programming Languages and Systems, volume 10, pages 470–502, July 1988.
[20] L. Moreau. A syntactic theory of dynamic binding. Higher-Order and Symbolic Computation, 11(3):233–279, Dec. 1998.
[21] M. Neubauer. Dynamic scope analysis for Emacs Lisp. Master's thesis, Eberhard-Karls-Universität Tübingen, Dec. 2000. http://www.informatik.uni-freiburg.de/~neubauer/
[22] F. Nielson and H. R. Nielson. Infinitary control flow analysis: a collecting semantics for closure analysis. In Proc. POPL'97, pages 332–345. ACM Press, 1997.
[23] G. D. Plotkin. A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, Denmark, Sept. 1981.
[24] C. Queinnec. Lisp in Small Pieces. Cambridge University Press, 1996.
[25] K. Raeburn. Guile-based Emacs. July 1999.
[26] M. Serrano and M. Feeley. Storage use analysis and its applications. In Proceedings of the 1st International Conference on Functional Programming, page 12, Philadelphia, June 1996.
[27] O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie-Mellon University, May 1991.
[28] R. Stallman. GNU extension language plans. Usenet article, Oct. 1994.
[29] B. Wing. XEmacs Lisp Reference Manual. lispref-a4.pdf.gz, May 1999. Version 3.4.
[30] A. K. Wright and M. Felleisen. A syntactic approach to type soundness. Technical Report 91-160, Rice University, Apr. 1991. Final version in Information and Computation 115(1), 1994, 38–94.
[31] A. K. Wright and M. Felleisen. A syntactic approach to type soundness. Information and Computation, 115(1):38–94, 1994. Preliminary version in Rice TR 91-160.
[32] A. K. Wright and S. Jagannathan. Polymorphic splitting: an effective polyvariant flow analysis. ACM Transactions on Programming Languages and Systems, 20(1):166–207, Jan. 1998.