Tail-call optimization in emacs lisp

It's been a while since I wrote about any coding here. Most of my coding lately has been done at my job, and while it has been interesting, it's not for public consumption. However, I started a fun little project last weekend and finished it this weekend: adding tail-call optimization (TCO) to emacs lisp.

Background

If you're not familiar with TCO, it's not too complicated. Most programming language interpreters (including the CPU) use a stack to keep track of the local variables in a function. When a new function is called, its arguments are pushed on the stack before the call and read from the stack. When the function returns, the arguments are removed from the stack and the old local variables are available just as they were before. However, in the case where the result of the function call is being returned unmodified, this is somewhat wasteful: the old local variables are not going to be used any more so we could just put the arguments on the stack where the old variables were. We can then just jump into the new function.

Using this technique, multiple tail-calls won't grow the stack, and tail-recursive functions (that is, functions that only recurse when they are about to return) become a good alternative to loops. This is heavily used in Scheme, where tail-call elimination is part of the standard. Note that the reason this is an optimization isn't speed of execution; it's more about the amount of memory that is used.

Until recently, TCO was not possible at all in emacs lisp. This is because, until recently, all of emacs lisp was dynamically scoped. Dynamic scoping essentially means that local variables shadow global variables in functions that are called while the local variables are in scope. So if you have global variable a and function b does something that depends on a, then you can change the behavior of b when you call it from c by setting the value of a locally. This rule, while unconventional, can be quite useful in configuring the behavior of the editor. Anyway, dynamic scoping means that before any function returns, all the bound local variables must be unbound so that they no longer affect other functions. This means that a function call is almost never followed by a return.

However, emacs 24 added the option to compile files using lexical scope (the more common scoping rule that you are probably already familiar with if you've gotten this far). There is no need to explicitly unbind variables before a return in lexical scope because local variables aren't bound. Thus we can do TCO on these functions. In short, this is what I added.

Implementation Challenges

The first thing to decide was where to add the TCO code. Since emacs byte-code doesn't have the ability to jump into a function, I quickly decided that I needed to modify the byte-code interpreter instead. This means going into the C guts of emacs.

The first thing I did was to follow the path that a call to a lexically-bound function would make from the byte-code interpreter. I added all this code plus a goto statement in case the operation that followed the call was a return. It wasn't a whole lot of code, so I was feeling fairly good about things.

The first problem with this approach is that the emacs stack could be a different size from what was needed by the calling function. Since the stack is allocated by alloca (which allocates on the processor's stack), and because there is only one return from the byte-code interpreter, I thought I would be fine changing the alloca to a combination of malloc and free. Strangely, when I went to recompile the emacs lisp files, I kept getting errors. This left me fairly befuddled.

I finally figured it out when I realized the garbage collector was looking at the processor's stack to find variables that are in use. Thus to work properly, the emacs stack must be allocated on the processor's stack. That meant I had to go back to alloca (or modify the garbage collector – something I was not likely to do correctly). Since the (emacs) stack size could change, the memory allocated for it would also need to change. I needed something that let me blow away part of the (CPU) stack – goto keeps the stack in place.

In my spelunking in the emacs C code, I had seen functions setjmp and longjmp used to implement emacs's exception mechanism. I had never used these functions before (and with good reason – if you think goto creates spaghetti code, these functions are much worse), but it seemed like they might do what I want. That is, when you call longjmp, you get the effect of a goto, but the processor's stack is restored to the state that it was when you were at the place that jumped to. So by using a longjmp within the function, I could allocate the emacs stack on the processor's stack and still do TCO without growing the amount of memory used for each tail call.

By the way, I also learned more about the volatile keyword than I wanted to know, including this beauty:

funargs = *(volatile Lisp_Object *) &funargs;

That means (as far as I understand it) "put this in memory right now".

Benefits / Limitations

There is an obvious benefit from this work: people can now write tail-recursive functions and not worry about the the size of the input causing the function to stop working. It does, however, have some limitations. It

only works for lexically-bound functions,
only works for byte-compiled functions, and
could cause problems with debugging.

I've gone through the reasons for the first two, and the third should be well known to people who are familiar with TCO. The main reason is that since the function call can't add to the memory usage, it must also not add information about itself to the list of function calls displayed in a backtrace.

Conclusion

I am going to test this for a while longer before I release it. However, I think it's really going to work, and it will allow an entirely different style of programming in emacs lisp. That's pretty exciting for me.