This blog-post assumes that the reader already has an inkling of what macro-instructions –or “macros” for short– actually are. Even if not, however, their essence ought to be intuited through the examples provided.
No other introduction will be offered. We'll learn by tinkering, the engineer way.
Declaration
Obviously, the first step is to declare a macro, like so:
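A declaration could be sketched as below. The name babys_first_macro is borrowed from the invocations later in this post; the single empty rule is only a placeholder (my assumption) so that the snippet compiles by itself.

```rust
// Declare a macro named `babys_first_macro`.
// Rules of the form `(matcher) => { expansion };` go between the braces.
macro_rules! babys_first_macro {
    // Placeholder rule: accepts an empty invocation and expands to nothing.
    () => {};
}
```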
So far, so good. Now, what do we put inside?
Simplest thing that works
The simplest thing, of course, would be a macro that takes no arguments. It would look like the following:
which then, when invoked with
babys_first_macro!();
would output My first macro worked!.
That's not so scary, is it?
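For reference, a minimal sketch of the no-argument macro described above might read as follows. The message is the one quoted in the text; the layout is my own reconstruction.

```rust
macro_rules! babys_first_macro {
    // No arguments: the matcher is an empty pair of parentheses.
    () => {
        println!("My first macro worked!");
    };
}
```

Invoking babys_first_macro!(); from main prints My first macro worked!.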
More matchers
Anything non-trivial would require our macro to actually take some arguments. Our first instinct might be to create a different macro for this, but there is no need; we can just declare that the macro can take different arguments, and implement it differently in each of those cases.
Please note: we have to delimit each pair of blocks with a semi-colon, to help the compiler figure out where one ends and the other begins. For me personally, it is easier to just always end each block with a semi-colon.
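A sketch of what such a multi-matcher macro could look like, reconstructed from the invocations and output listed in this section (the exact original bodies are an assumption):

```rust
macro_rules! babys_first_macro {
    // Each rule is a `(matcher) => { expansion }` pair, ended by a semi-colon.
    () => {
        println!("My first macro worked!");
    };
    (1) => {
        println!("One.");
    };
    (2) => {
        println!("Two.");
    };
    (1+1) => {
        println!("One plus one.");
    };
    // Same tokens as the rule above, so this rule is never selected.
    (1 + 1) => {
        println!("One plus one.");
    };
    (dmnkly) => {
        println!("Monkey Island quote.");
    };
}
```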
Then, we invoke this via
babys_first_macro!(/*Comment!*/);
babys_first_macro![1]; // ← With brackets
babys_first_macro!{2}; // ← With braces
babys_first_macro!(1+1);
babys_first_macro!(1 + 1);
babys_first_macro!(dmnkly);
which outputs
My first macro worked!
One.
Two.
One plus one.
One plus one.
Monkey Island quote.
This snippet offers some crucial first insights into how macros work. For one, it handily explains a difference between macros and functions: functions can only ever be called with one set of arguments, so their arguments are listed immediately after their name. Macros, on the other hand, can be called with lots of different sets of arguments (henceforth called “matchers”), so those matchers have to be declared within the macro's braces rather than outside.
Other important details:
- Macros can be invoked with parentheses (()), brackets ([]), or braces ({}). Technically, so can the matchers and expansions, but by convention those are delimited by parentheses and braces respectively.
- Macro inputs don't have to contain metavariables. (Metavariables are described in the next section.) We can use anything inside our matchers. There's a reason they're not called “argument lists”.
- Non-metavariable inputs don't even get parsed. After all, parsing the input of the macro is your job, not the compiler's. We saw above that 1+1 leads to a different invocation than just 2. On the other hand…
- White-space is ignored. The 1 + 1 matcher is dead code, because the 1+1 matcher is satisfied first. And the reason the 1+1 matcher is satisfied first is because…
- Matchers are examined top-to-bottom. Thus, if the 1 + 1 matcher were moved above the 1+1 matcher, then the 1+1 matcher would become dead code instead.
- Comments within macro invocations are ignored.
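The white-space and ordering rules above can be checked with a small sketch. The macro which_arm is hypothetical (it is not part of the original post); it expands to a string naming the rule that matched, so the result can be compared directly.

```rust
macro_rules! which_arm {
    (1+1) => { "first arm" };
    // Identical token sequence once white-space is discarded: dead code.
    (1 + 1) => { "second arm" };
    // The literal `2` is a different token sequence from `1+1`;
    // the macro input is matched as tokens, not evaluated.
    (2) => { "literal two" };
}
```

Both which_arm!(1+1) and which_arm!(1 + 1) select the first rule, while which_arm!(2) selects the third.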
Metavariables
Of course, usually, when we pass something as an argument to a macro, we want to be able to then use it within the macro. To do that, we must declare it to be a metavariable.
Metavariables are always prefixed with a dollar sign, $. For the purposes of this section, they must always be accompanied by an explanation as to what exactly they are. This explanation is formally called a fragment specifier. There are 14 of them in Rust, but in this article we will focus on just two: expr (for expressions) and ty (for types).
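As a quick taste of the two fragment specifiers, here is a small sketch. Both macros (double and default_of) are hypothetical names invented for illustration.

```rust
// `$value` is captured as a whole expression, so `1 + 2` stays grouped.
macro_rules! double {
    ($value:expr) => { $value * 2 };
}

// `$kind` is captured as a type and can be used wherever a type is expected.
macro_rules! default_of {
    ($kind:ty) => { <$kind>::default() };
}
```

With these, double!(1 + 2) evaluates to 6 rather than 5, and default_of!(u8) evaluates to 0.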
A super common target for macro usage is trait implementations, because they can get very verbose and repetitive. We will therefore use one for our example.
Let there be a trait, which just offers a constant expression:
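Such a trait might be sketched like this. The names SimpleTrait and CONSTANT come from the text; the i32 type is my assumption.

```rust
trait SimpleTrait {
    // Each implementor supplies its own value (`i32` is an assumed type).
    const CONSTANT: i32;
}
```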
And let there also be six random data-types, for which we want to implement this trait:
struct A;
struct B;
struct C;
struct D;
struct E;
struct F;
We want to implement SimpleTrait for all 6 of those, without noise and repetitions. Declarative macros are ideal for that, so let's use them!
To do this, we obviously need to know which type to implement the trait for (ie a ty) and what value CONSTANT should be set to (ie an expr). Those will be the two metavariables that our macro must accept.
So, let's see:
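A sketch of such a macro, assuming the trait's constant is an i32 and using only two of the unit structs to keep things short:

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;

macro_rules! impl_simple_trait {
    // One type and one expression, separated by a comma.
    ($our_type:ty, $our_expression:expr) => {
        impl SimpleTrait for $our_type {
            const CONSTANT: i32 = $our_expression;
        }
    };
}

impl_simple_trait!(A, 1);
impl_simple_trait!(B, 2);
```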
That's all there is to it! The only new information here is that metavariables must be prefixed by a dollar sign in both their declaration and usage.
Afterwards, we can call the macro like so:
impl_simple_trait!(A, 1);
impl_simple_trait!(B, 2);
impl_simple_trait!(C, 3);
impl_simple_trait!(D, 4);
impl_simple_trait!(E, 5);
impl_simple_trait!(F, 6);
and verify its correct operation as follows:
println!("{} {} {} {} {} {}", A::CONSTANT, B::CONSTANT, C::CONSTANT, D::CONSTANT, E::CONSTANT, F::CONSTANT);
which outputs
1 2 3 4 5 6
It works! It does useful work, even! But… we can improve it. We have the technology.
Repetition
One big advantage that macros have over ordinary functions (and the reason why println! is a macro instead of an ordinary function) is that macros can be declared to take arbitrary numbers of arguments, not necessarily a fixed number. We can use this to our advantage.
I personally like to think of repetition in macro arguments as creating a metavariable out of another metavariable. For example, if we have a metavariable followed by a comma ($expression: expr,), then an arbitrary number of comma-terminated expressions is denoted by $($expression: expr,)*. In other words, we take our previous matcher, enclose it in parentheses, stick a dollar sign in front, and put an asterisk immediately afterwards.
Why an asterisk? It's one of three choices we can make. * stands for “zero or more times”, + stands for “at least once”, and ? stands for “at most once”. Those are called “repeat operators”, for obvious reasons.
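To see the enclose-dollar-asterisk recipe in action, here is a small sketch. The macro sum_all is a hypothetical example; the same repetition syntax is used both in the matcher and in the expansion.

```rust
macro_rules! sum_all {
    // Zero or more expressions, each followed by a comma.
    ($($expression:expr,)*) => {
        // Expands to `0 + a + b + ...`, or just `0` for an empty input.
        0 $(+ $expression)*
    };
}
```

Here sum_all!(1, 2, 3,) evaluates to 6 and sum_all!() evaluates to 0. Swapping the asterisk for a plus sign in the matcher would reject the empty invocation.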
In our case we have two metavariables, one type and one expression. We could delimit them all with commas (like A, 1, B, 2, C, 3, etc) but that's a bit error-prone. Therefore, after each type we will be using commas as before, and after each expression we will be using a semi-colon, like A, 1; B, 2; C, 3; etc.
(Why semi-colons? Because an expr metavariable may only be followed by =>, a comma, or a semi-colon, so in this case it's basically the only other delimiter that Rust permits us to use.)
Having explained all this, there are two ways to proceed. In keeping with our tinker-first approach, we'll first detail the more intuitive one.
Recursion
First option is: Since we already have an implementation for the case of a single type, why not just call it and then recurse?
Our structure will be as follows:
- The $our_type: ty, $our_expression: expr matcher will be kept as-is
- We will also include an empty matcher that does nothing, as an end for our recursion
- If we have an arbitrary amount of $our_type: ty, $our_expression: expr; pairs, we will separate the first pair, pass it on to the first matcher, then call the macro again with the rest of the pairs.
This will look as follows:
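One way to sketch this structure is shown below. The names $first_type and $first_expression are mine, and the i32 constant type is assumed; three structs are enough to show the recursion.

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;
struct C;

macro_rules! impl_simple_trait {
    // End of the recursion: nothing left, so expand to nothing.
    () => {};
    // A single pair: the matcher from before, kept as-is.
    ($our_type:ty, $our_expression:expr) => {
        impl SimpleTrait for $our_type {
            const CONSTANT: i32 = $our_expression;
        }
    };
    // First pair, then any number of further pairs, each ending in `;`.
    ($first_type:ty, $first_expression:expr; $($our_type:ty, $our_expression:expr;)*) => {
        // Hand the first pair to the single-pair rule...
        impl_simple_trait!($first_type, $first_expression);
        // ...and recurse on whatever remains.
        impl_simple_trait!($($our_type, $our_expression;)*);
    };
}

impl_simple_trait!(A, 1; B, 2; C, 3;);
```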
All that's left is to call this macro with impl_simple_trait!(A, 1; B, 2; C, 3; D, 4; E, 5; F, 6;);, and it works just as before!
Iteration
Recursion is a mighty fine option, but it cramps your style. It's not as powerful as straight-forward iteration, after all. Most of all, you're here to hone your craft, and to do that you need the 1337est possible solution.
Syntactically, iterating through a macro's metavariables is done exactly the same way as its declaration: Enclose the whole thing in parentheses, stick a dollar sign in front, and put the repeat operator immediately afterwards.
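A sketch of the iterative version, under the same assumptions as before (an i32 constant, and fewer structs for brevity):

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;
struct C;

macro_rules! impl_simple_trait {
    ($($our_type:ty, $our_expression:expr;)*) => {
        // Everything inside `$( ... )*` is emitted once per matched pair.
        $(
            impl SimpleTrait for $our_type {
                const CONSTANT: i32 = $our_expression;
            }
        )*
    };
}

impl_simple_trait!(A, 1; B, 2; C, 3;);
```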
That's all there is to it.
Making the trailing semi-colon optional
By now, there's just one thing irritating your (my) perfectionism: The trailing semi-colon is mandatory. Can we get rid of it?
Yes we can, relatively easily in fact! But it needs a trick that's not apparent from what we've said so far.
The solution is to first create a matcher that disallows a trailing semi-colon, and then at the end match on an optional one.
A matcher that disallows a trailing semi-colon
Even with what we know so far, we can do this relatively straight-forwardly. First we match on the first pair without its semi-colon, and then we match on a repetition that has the semi-colon at the beginning, rather than the end.
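This first-pair-plus-leading-semi-colon approach might be sketched as follows. The $first_type and $first_expression names are my own, and the i32 type is assumed. Note that this form needs at least one pair.

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;
struct C;

macro_rules! impl_simple_trait {
    // First pair without a semi-colon, then zero or more `; type, expr` groups.
    ($first_type:ty, $first_expression:expr $(; $our_type:ty, $our_expression:expr)*) => {
        impl SimpleTrait for $first_type {
            const CONSTANT: i32 = $first_expression;
        }
        $(
            impl SimpleTrait for $our_type {
                const CONSTANT: i32 = $our_expression;
            }
        )*
    };
}

// No trailing semi-colon is accepted here.
impl_simple_trait!(A, 1; B, 2; C, 3);
```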
However, Rust offers a much simpler solution, that I only managed to figure out… after I initially published this article.
As it turns out, if the final character inside the parentheses of the repetition is a separator, you can move it between the closing parenthesis and the repeat operator (ie one position to the right) to disallow its usage in a trailing position. So something like
does what we want with much less fuss.
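The separator form described above could be sketched as follows, with the semi-colon sitting between the closing parenthesis and the asterisk (same assumptions as earlier: an i32 constant and a shortened list of structs):

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;
struct C;

macro_rules! impl_simple_trait {
    // `;` is now a separator between pairs, not a terminator after each one.
    ($($our_type:ty, $our_expression:expr);*) => {
        $(
            impl SimpleTrait for $our_type {
                const CONSTANT: i32 = $our_expression;
            }
        )*
    };
}

// A trailing semi-colon would be rejected by this matcher.
impl_simple_trait!(A, 1; B, 2; C, 3);
```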
An optional trailing semi-colon
We said before that the ? repeat operator means “at most once”. Thus, if you've been paying attention, you already know the drill: instead of ; we say $(;)?. Enclose, dollar, repeat operator.
With that in mind, our new implementation is as follows:
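Putting both pieces together, a sketch of the final macro could look like this (i32 is still an assumed type; four structs are used so that both invocation styles can be shown):

```rust
trait SimpleTrait {
    const CONSTANT: i32; // assumed type
}

struct A;
struct B;
struct C;
struct D;

macro_rules! impl_simple_trait {
    // Pairs separated by `;`, followed by at most one trailing `;`.
    ($($our_type:ty, $our_expression:expr);* $(;)?) => {
        $(
            impl SimpleTrait for $our_type {
                const CONSTANT: i32 = $our_expression;
            }
        )*
    };
}

// Without a trailing semi-colon...
impl_simple_trait!(A, 1; B, 2);
// ...and with one.
impl_simple_trait!(C, 3; D, 4;);
```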
For values, you sometimes need double braces
Suppose we have a few values. We want to use a macro to sum them, and then square the result.
So we write a macro to do that…
…aaaaand Rust is angry at us. What with error: expected expression, found `let` statement, or error: macro expansion ignores `sum` and any tokens following or warning: trailing semicolon in macro used in expression position.
That's a lot of errors for something with a very simple solution: For this expander, we need double braces, not single. So we change it to
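and it works without issue. The short version of why: the outer braces merely delimit the expansion, so it takes a second, inner pair to turn the expansion into a block expression that may contain let statements. Either do your own deep-dive into the details, or just remember that if your macro doesn't work then it might work if you double the braces in the expander. That said, in this case, it could also work if we phrased it as a one-liner somehow.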
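A sketch of the double-brace version is given below. The macro name square_of_sum and the comma-separated matcher are my assumptions; only the let-then-square shape comes from the text.

```rust
macro_rules! square_of_sum {
    // Outer braces delimit the expansion; inner braces form a block expression.
    ($($value:expr),*) => {{
        let sum = 0 $(+ $value)*;
        sum * sum
    }};
}
```

With this in place, square_of_sum!(1, 2, 3) evaluates to 36. Writing the expansion with a single pair of braces is what produces the errors quoted earlier.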
Macros can see their environment
One last thing, before concluding this article. Take the following code:
Ignore for now the #[macro_export] and pub(crate) use increment_x lines, they're not what we're here to focus on. The thing to focus on is that the two increment_x macros are identical, but only one of them is usable within main. Uncommenting the some_module::increment_x!() line will make the program fail to compile.
Why is that? It's because the second macro can, from the place it is declared, already see a variable named x in scope, so when called it increments that. In contrast, the first macro –the one that's declared within some_module– does not see any variable named x during its declaration, so calling it leads to an error. This definition-site lookup of local variables is part of what Rust calls macro hygiene.
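The behaviour described above can be reproduced with a sketch along these lines. It is my own reconstruction: it leaves out #[macro_export], and it wraps the demonstration in a hypothetical increment_twice function instead of main so that the result can be returned.

```rust
mod some_module {
    // No variable named `x` is visible here, so any use of this macro fails.
    macro_rules! increment_x {
        () => {
            x += 1;
        };
    }
    pub(crate) use increment_x;
}

fn increment_twice() -> i32 {
    let mut x = 0;
    // Declared after `x`, so the `x` in the expansion refers to this variable.
    macro_rules! increment_x {
        () => {
            x += 1;
        };
    }
    increment_x!();
    increment_x!();
    // Uncommenting the next line stops the program from compiling:
    // some_module::increment_x!();
    x
}
```

Calling increment_twice() returns 2, since only the locally declared macro is ever expanded.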