Macroexpand anywhere with local-apply-transformer
!
Racket programmers are accustomed to the language’s incredible capacity for extension and customization. Writing useful macros that do complicated things is easy, and it’s simple to add new syntactic forms to meet domain-specific needs. However, it doesn’t take long before many budding macrologists bump into the realization that only certain positions in Racket code are subject to macroexpansion.
To illustrate, consider a macro that provides a Clojure-style let
form:
(require syntax/parse/define)
(define-simple-macro (clj-let [{~seq x:id e:expr} ...] body:expr ...+)
(let ([x e] ...) body ...))
This can be used anywhere an expression is expected, and it does as one would expect:
> (clj-let [x 1
y 2]
(+ x y))
3
However, a novice macro programmer might realize that clj-let
really only modifies the syntax of binding pairs for a let
form. Therefore, could one define a macro that only adjusts the binding pairs of some existing let
form instead of expanding to an entire let
? That is, could one write the above example like this:
(define-simple-macro (clj-binding-pairs [{~seq x:id e:expr} ...])
([x e] ...))
> (let (clj-binding-pairs
[x 1
y 2])
(+ x y))
3
The answer is no: the binding pairs of a let
form are not subject to macroexpansion, so the above attempt fails with a syntax error. In this blog post, we will examine the reasons behind this limitation, then explain how to overcome it using a solution that allows macroexpansion anywhere in a Racket program.
Why only some positions are subject to macroexpansion
To understand why the macroexpander refuses to touch certain positions in a program, we must first understand how the macro system operates. In Racket, a macro is defined as a compile-time function associated with a particular binding, and macros are given complete control over the syntax trees they are surrounded with. If we define a macro mac
, then we write the expression (mac form)
, form
is provided as-is to mac
as a syntax object. Its structure can be anything at all, since mac
can be an arbitrary Racket function, and that function can use form
however it pleases.
To give a concrete illustration, consider a macro that binds some identifiers to symbols in a local scope:
(define-simple-macro (let-symbols (x:id ...) body ...+)
(let ([x 'x] ...) body ...))
> (let-symbols (hello goodbye)
(list hello goodbye))
'(hello goodbye)
It isn’t the most exciting macro in the world, but it illustrates a key point: the first subform to let-symbols
is a list of identifiers that are eventually put in binding position. This means that hello
and goodbye
are bindings, not uses, and such bindings shadow any existing bindings that might have been in scope:
> (let ([foo 42])
(let-symbols (foo)
foo))
'foo
This might not seem very interesting, but it’s critical to understand, since it means that the expander can’t know which sub-pieces of a use of let-symbols
will eventually be expressions themselves until it expands the macro and discovers it produces a let
form, so it can’t know where it’s safe to perform macroexpansion. To make this more explicit, imagine we define a macro under some name, then try and use that name with our let-symbols
macro:
(define-simple-macro (hello x:id)
(x:id))
> (let-symbols (hello goodbye)
hello)
What should the above program do? If we treat the first use of hello
in the let-symbols
form as a macro application, then (hello goodbye)
should be transformed into (goodbye)
, and the use of hello
in the body should be a syntax error. But if the first use of hello
was instead intended to be a binder, then it should shadow the hello
definition above, and the output of the program should be 'hello
.
To avoid the chaos that would ensue if defining a macro could completely break local reasoning about other macros, Racket chooses the second option, and the program produces 'hello
. The macroexpander has no way of knowing how each macro will inspect its constituent pieces, so it avoids touching anything until the macro expands. After it discovers the let
form in the expansion of let-symbols
, it can safely determine that the body expressions are, indeed, expressions, and it can recursively expand any macros they contain. To put things another way, a macro’s sub-forms are never expanded before the macro itself is expanded, only after.
Forcing sub-form expansion
The above section explains why the expander must operate as it does, but it’s a little bit unsatisfying. What if we write a macro where we want certain sub-forms to be expanded before they are passed to us? Fortunately, the Racket macro system provides an API to handle this use case, too.
It is true that the Racket macro system never automatically expands sub-forms before outer forms are expanded, but macro transformers can explicitly op-in to recursive expansion via the local-expand
function. This function effectively yields control back to the expander to expand some arbitrary piece of syntax as an expression, and when it returns, the macro transformer can inspect the expanded expression however it wishes. In theory, this can be used to implement extensible macros that allow macroexpansion in locations other than expression position.
To give an example of such a macro, consider the Racket match
form, which implements an expressive pattern-matcher as a macro. One of the most interesting qualities of Racket’s match
macro is that its pattern language is user-extensible, essentially allowing pattern-level macros. For example, a user might find they frequently match against natural numbers, and they wish to be able to write (nat n)
as a shorthand for (? exact-nonnegative-integer? n)
. Fortunately, this is easy using define-match-expander
:
(define-match-expander nat
(syntax-parser
[(_ pat)
#'(? exact-nonnegative-integer? pat)]))
> (match '(-5 -2 4 -7)
[(list _ ... (nat n) _ ...)
n])
4
Clearly, match
is somehow expanding the nat
match expander as a part of its expansion. Is it using local-expand
?
Well, no. While a previous blog post of mine has illustrated that it is possible to do such a thing with local-expand
via some clever trickery, local-expand
is really designed to expand expressions. This is a problem, since (nat n)
is not an expression, it’s a pattern: it will expand into (? exact-nonnegative-integer? n)
, which will lead to a syntax error, since ?
is not bound in the world of expressions. Instead, for a long while, match
and forms like it have emulated how the expander performs macroexpansion in ad-hoc ways. Fortunately, as of Racket v7.0, the new local-apply-transformer
API provides a way to invoke recursive macroexpansion in a consistent way, and it doesn’t assume that what’s being expanded is an expression.
A closer look at local-apply-transformer
If local-apply-transformer
is the answer, what does it actually do? Well, local-apply-transformer
allows explicitly invoking a transformer function on some piece of syntax and retrieving the result. In other words, local-apply-transformer
allows expanding an arbitrary macro, but since it doesn’t make any assumptions about what the output will be, it only expands it once: just a single step of macro transformation.
To illustrate, we can write a macro that uses local-apply-transformer
to invoke a transformer function and preserve the result using quote-syntax
:
(require (for-syntax syntax/apply-transformer))
(define-for-syntax flip
(syntax-parser
[(a b more ...)
#'(b a more ...)]))
(define-simple-macro (mac)
#:with result (local-apply-transformer flip #'(([x 1]) let x) 'expression)
(quote-syntax result))
When we use mac
, our flip
function will be applied, as a macro, to the syntax object we provide:
> (mac)
#<syntax (let ((x 1)) x)>
Alright, so this works, but it raises some questions. Why is flip
defined as a function at phase 1 (using define-for-syntax
) instead of as a macro (using define-syntax
)? What’s the deal with the 'expression
argument to local-apply-transformer
given that local-apply-transformer
is supposedly decoupled from expression expansion? And finally, how is this any different from just calling our flip
function on the syntax object directly by writing (flip #'(([x 1]) let x))
?
Let’s start with the first of those questions: why is flip
defined as a function rather than as a macro? Well, local-apply-transformer
is a fairly low-level operation: remember, it doesn’t assume anything about the argument it’s given! Therefore, it doesn’t take an expression containing a macro and expand it based on its structure, it needs to be explicitly provided the macro transformer function to apply. In practice, this might not seem very useful, since presumably we want to write our macros as macros, not as phase 1 functions. Fortunately, it’s possible to look up the function associated with a macro binding using the syntax-local-value
function, so if we use that, we can define flip
using define-syntax
as usual:
(define-syntax flip
(syntax-parser
[(a b more ...)
#'(b a more ...)]))
(define-simple-macro (mac)
#:with result (local-apply-transformer (syntax-local-value #'flip)
#'(([x 1]) let x)
'expression)
(quote-syntax result))
Now for the next question: what is the meaning of the 'expression
argument? This one is more of a historical artifact than anything else: when the expander applies a macro transformer, it does it in a “context”, which is accessible using the syntax-local-context
function. This context can be one of a predefined enumeration of cases, including 'expression
, 'top-level
, 'module
, 'module-begin
, or a list representing a definition context. Whether or not any of those actually apply to our use case, we still have to pick one, but aside from how they affect the value returned by syntax-local-context
(which some macros inspect), the value we choose is largely irrelevant. Using 'expression
will do, even if it’s a bit of a lie.
Finally, how does any of this differ from just applying the function we get directly? Well, the critical answer is all about hygiene. Racket’s macro system is hygienic, which, among other things, ensures bindings defined with the same name in different places do not unintentionally conflict. Racket’s hygiene mechanism is implemented in the macroexpander, when macro transformers are applied. If we just applied the flip
transformer procedure to a syntax object directly, we would circumvent this hygiene mechanism, potentially causing all sorts of problems. By using local-apply-transformer
, we ensure hygiene is preserved.
There is one small problem left with our program, however. Can you spot it? The key is to consider what would happen if we used flip
as an ordinary macro, without using local-apply-transformer
:
> (flip (([x 1]) let x))
let: bad syntax
in: let
What happened? Well, remember that when a macro in Racket is used, it receives the whole use site as a syntax object: in this case, #'(flip (([x 1]) let x))
. This means that flip
ought to be written to parse its argument slightly differently:
(define-syntax flip
(syntax-parser
[(_ (a b more ...))
#'(b a more ...)]))
Indeed, now that we’ve properly restructured the macro, we can easily switch to using the convenient define-simple-macro
shorthand:
(define-simple-macro (flip (a b more ...))
(b a more ...))
This means we also need to update our definition of mac
to provide the full syntax object the expander would:
(define-simple-macro (mac)
#:with result (local-apply-transformer (syntax-local-value #'flip)
#'(flip (([x 1]) let x))
'expression)
(quote-syntax result))
This might seem redundant, but remember, local-apply-transformer
is very low-level! While the convention that (mac . _)
is the syntax for a macro transformation might seem obvious, local-apply-transformer
makes no assumptions. It just does what we tell it to do.
Applying local-apply-transformer
So what does local-apply-transformer
have to do with the problem at the beginning of this blog post? Well, as it happens, we can use local-apply-transformer
to implement a macro that allows expansion anywhere using some simple tricks. While it’s true that we cannot magically divine which locations ought to be expanded, what we can do is explicitly annotate which places to expand.
To do this, we will implement a macro, expand-inside
, that looks for subforms annotated with a special $expand
identifier and performs macro transformation on those locations before proceeding with ordinary macroexpansion. Using the clj-binding-pairs
example from the beginning of this blog post, our solution to that problem will look like this:
(define-simple-macro (clj-binding-pairs [{~seq x:id e:expr} ...])
([x e] ...))
> (expand-inside
(let ($expand
(clj-binding-pairs
[x 1
y 2]))
(+ x y)))
3
Put another way, expand-inside
will force eager expansion on any subform surrounded with an $expand
annotation.
We’ll start by defining the $expand
binding itself. This binding won’t mean anything at all outside of expand-inside
, but we’d like it to be a unique binding so that users can rename it (using, rename-in
, for example) if they wish. To do this, we’ll use the usual trick of defining it as a macro that always produces an error if it’s ever used:
(define-syntax ($expand stx)
(raise-syntax-error #f "illegal outside an ‘expand-inside’ form" stx))
Next, we’ll implement a syntax class that will form the bulk of our implementation of expand-inside
. Since we need to find uses of $expand
that might be deeply-nested inside the syntax object provided to expand-inside
, we need to recursively look through the syntax object, find any instances of $expand
, and put it all back together once we’re done. This can be done relatively cleanly using a recursive syntax class:
(begin-for-syntax
(define-syntax-class do-expand-inside
#:literals [$expand]
#:attributes [expansion]
[pattern {~or $expand ($expand . _)}
#:with :do-expand-inside (do-$expand this-syntax)]
[pattern (a:do-expand-inside . b:do-expand-inside)
#:attr expansion
(let ([reassembled (cons (attribute a.expansion)
(attribute b.expansion))])
(if (syntax? this-syntax)
(datum->syntax this-syntax reassembled
this-syntax this-syntax)
reassembled))]
[pattern _ #:attr expansion this-syntax]))
There are some tricky details to get right in the reassembly of pairs, since syntax lists are actually composed of ordinary pairs rather than syntax pairs, but ultimately, the code for walking a syntax object is small. The key case of this syntax class is the call to do-$expand
in the first clause, which we have not yet defined. This function will actually handle performing the expansion by invoking local-apply-transformer
:
(begin-for-syntax
(define (do-$expand stx)
(syntax-parse stx
[(_ {~and form {~or trans (trans . _)}})
#:declare trans (static (disjoin procedure? set!-transformer?)
"syntax transformer")
(local-apply-transformer (attribute trans.value)
#'form
'expression)])))
This uses the handy static
syntax class that comes with syntax/parse
, which implicitly handles the call to syntax-local-value
and produces a nice error message if the value returned does not match a predicate. All we have to do is apply the transformer value bound to the trans.value
attribute using local-apply-transformer
, and now the expand-macro
can be written in just a couple lines of code:
(define-syntax-parser expand-inside
#:track-literals
[(_ form:do-expand-inside) #'form.expansion])
(Using the #:track-literals
option, also new in Racket v7.0, ensures that Check Syntax will be able to recognize the uses of $expand
that disappear from after expand-inside
is expanded.)
Putting everything together, our example from above really works:
(define-simple-macro (clj-binding-pairs [{~seq x:id e:expr} ...])
([x e] ...))
> (expand-inside
(let ($expand
(clj-binding-pairs
[x 1
y 2]))
(+ x y)))
3
That’s it. All told, the entire implementation is only about 30 lines of code. For a full, compilable, working example, see this gist.