Yours truly is not an expert in compilers or assembly, but I do enjoy delving into these topics and learning something new from time to time.
What’s this even about?
Here’s a C++ function that does nothing useful except for wasting a bit of electricity. Ba Dum Tss!
void f(int a, int b, int c, int d) {
    bool value;
    value = (a + b) == (c + d);
}
It contains the expression value = (a + b) == (c + d). Asking a C++ compiler to translate this function without optimizations results in assembly code with the portion corresponding to our expression looking like this:
mov eax, dword ptr [rbp - 4] ; `rbp - 4` is the address of `a`
add eax, dword ptr [rbp - 8] ; `rbp - 8` is the address of `b`
mov ecx, dword ptr [rbp - 12] ; `rbp - 12` is the address of `c`
add ecx, dword ptr [rbp - 16] ; `rbp - 16` is the address of `d`
cmp eax, ecx
sete al
mov byte ptr [rbp - 17], al ; `rbp - 17` is the address of `value`
Let’s pick this code apart
Having an abstract syntax tree (AST) representing the expression handy will help us work our way through the assembly.
The dashed arrows are not part of the AST. I added them to illustrate the results of evaluating their respective subexpressions:
- Either the compiler allocates a register to temporarily hold the subexpression’s result and emits assembly to load it into this register. Afterwards the compiler operates with the register’s name when processing the parent AST node.
- Or it simply finds the address of a variable by its name and operates with this address when it comes back up to the parent AST node.
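To make the tree concrete, here is a minimal sketch of how such an AST could be modeled in C++. The Node struct and the leaf/bin helpers are made up purely for illustration and don't correspond to any real compiler's classes:

#include <memory>
#include <string>

// A toy AST node: a leaf holds an identifier, an interior node holds a binary operator.
struct Node {
    std::string label;                  // "=", "==", "+"  or  "value", "a", "b", "c", "d"
    std::unique_ptr<Node> left, right;  // both null for leaf nodes
};

std::unique_ptr<Node> leaf(std::string name) {
    return std::make_unique<Node>(Node{std::move(name), nullptr, nullptr});
}

std::unique_ptr<Node> bin(std::string op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::make_unique<Node>(Node{std::move(op), std::move(l), std::move(r)});
}

int main() {
    // value = (a + b) == (c + d)
    auto expr = bin("=",
                    leaf("value"),
                    bin("==",
                        bin("+", leaf("a"), leaf("b")),
                        bin("+", leaf("c"), leaf("d"))));
    (void)expr;  // silence unused-variable warnings in this toy example
}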
To translate our expression into assembly, the compiler walks the AST.
It starts at the root — the = node, noticing it has two children. So the compiler has to deal with the child nodes before it can emit assembly for their parent. As the post-order tree-traversal algorithm dictates, it walks the children from left (value) to right (==).
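In code, this post-order walk is just a recursion that finishes with both children before touching their parent. A rough sketch, reusing the toy Node from the sketch above, with emit_for standing in for whatever work the compiler does per node:

// Hypothetical per-node action: emit instructions, or note where the node's value lives.
void emit_for(const Node& node);

// Post-order traversal: left child, then right child, then the node itself.
void walk(const Node& node) {
    if (node.left)  walk(*node.left);
    if (node.right) walk(*node.right);
    emit_for(node);
}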
Thus, it enters the value node first. It's a leaf-node (i.e., this node has no children), so the compiler can take care of the node itself1. value is an identifier referencing a local bool variable, 1 byte2 in size, that has a designated place (rbp - 17) on the stack. Our compiler has two options:
- It can either temporarily allocate a register and emit instructions to load the variable from the stack into it. Later, the compiler will use the register to emit assembly for the parent node.
- Or it can decide it doesn’t need to emit any instructions. Then the compiler just finds the address of value on the stack and keeps track of it. The translation algorithm will need this address later, when it comes back up to the parent node.
In this case, the parent node is an assignment binary operator (=), which the compiler will translate into a mov instruction. One of the operands of the mov can be a memory address3. To take advantage of that, the translation algorithm decides to save on emitting redundant instructions and resorts to resolving the address of value.
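One way to picture what each visited node hands back up to its parent is a tiny "operand" value that is either a register name or a stack address. This is only an illustration of the idea, not how any particular compiler structures it:

#include <string>
#include <variant>

// The result of visiting a node: either "the value is in this register"
// or "the value lives at this offset from rbp on the stack".
struct InRegister { std::string name; };  // e.g. {"eax"}
struct OnStack    { int rbp_offset; };    // e.g. {-17} for `value`

using Operand = std::variant<InRegister, OnStack>;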
Next, the compiler enters the right child node ==, which itself has two children: the left + node and the right + node.
In line with the AST walking rules, the translation algorithm enters the left + node, which has two children of its own: the a and b nodes.
Again, walking the left subtree must happen first, so the translation algorithm enters the a node. This one is a leaf-node. Similar to value, a is an identifier, but referencing an int parameter — a 4-byte value that, in our case, has a designated place (rbp - 4) on the stack. This time, the parent node is an addition binary operator, which the compiler will translate into an add instruction. One of the operands of the add can be a memory address and the other has to be a register. The compiler decides to emit assembly that loads a into a temporary register, eax:
mov eax, dword ptr [rbp - 4] ; `rbp - 4` is the address of `a`
Now the translation algorithm can move on to the right child of the left + node: it enters the b node. It's a leaf-node; b is an identifier referencing an int parameter — a 4-byte value that, again, resides at a designated address (rbp - 8) on the stack. The sibling's value went into the eax register. As a result, the compiler can avoid emitting redundant assembly for this node and limit itself to resolving the address of b.
At this point both children of the left + node have been processed, so it's time to emit instructions for the node itself. Adding two numbers is the add instruction's job, so the compiler is going to emit it.
add eax, dword ptr [rbp - 8] ; `rbp - 8` is the address of `b`
The line of code above adds the value in eax (a) to the value in memory at the address rbp - 8, which happens to point at b on the stack. add stores the sum in eax — that's OK, the compiler doesn't need eax's previous value anymore. Hence, at this stage eax is tracked as the temporary register containing the result of the left + node's evaluation.
Once the translation algorithm is done with the left + subtree, it starts walking the right + subtree. The process is exactly the same, except for a different register used to store the intermediate results. The ecx register contains the result of the right + subtree evaluation.
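For reference, these are the two instructions from the listing above that correspond to the right + subtree:

mov ecx, dword ptr [rbp - 12] ; `rbp - 12` is the address of `c`
add ecx, dword ptr [rbp - 16] ; `rbp - 16` is the address of `d`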
At this step, the compiler goes back up to the == node. Both of its children have been processed, and their respective values are in the registers eax and ecx. With that in place, the compiler is ready to generate the assembly:
cmp eax, ecx
sete al
This is the first time we've seen two instructions for a single node. In contrast to add, cmp doesn't modify its operands — the comparison result is encoded in the special EFLAGS register. Hence, the translation algorithm has to emit sete al, which sets the al register's value to 0 or 1 according to the comparison result. In other words, sete al materializes the comparison result in al.
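In C++ terms, the pair of instructions computes roughly the following (a loose analogy, treating eax, ecx and al as plain integers rather than registers):

#include <cstdint>

// `cmp eax, ecx` sets the zero flag if the operands are equal;
// `sete al` then writes that flag into `al` as 0 or 1.
std::uint8_t compare_and_set(std::uint32_t eax, std::uint32_t ecx) {
    return eax == ecx ? 1 : 0;
}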
From here, the compiler goes back up to the = node. By now it has taken care of both children. The result of visiting the value node is the address rbp - 17, and the result of evaluating the == subtree is going to be in al at runtime. Finally, the compiler is ready to generate assembly code for the = node. As = is the expression AST's root, this completes the expression translation.
mov byte ptr [rbp - 17], al ; `rbp - 17` is the address of `value`
= is an assignment — the corresponding assembly instruction is mov. Despite its name, mov doesn't move data from its source operand to the destination operand, but copies it. The destination operand comes first: it's rbp - 17, the address of value. The source comes second: it's the al register, which temporarily holds the result of the comparison (a + b) == (c + d).
Summary
This is it: we finally have our original expression value = (a + b) == (c + d) encoded in assembly. Yay!
A general pattern here is to break down a complex expression into smaller subexpressions according to the operations' priorities. Each subexpression should be small enough to have a straightforward mapping into assembly. Then the compiler emits instructions to evaluate a subexpression's result and store this result in a register allocated to temporarily hold it.
This way, when dealing with a subexpression which has child subexpressions of its own, the compiler simply looks up the addresses or registers allocated to hold the results of the child subexpressions. With those at hand, it's straightforward to generate instructions corresponding to the subexpression's operation, "replacing" the child subexpressions with their respective registers or addresses.
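Putting the pieces together, the whole pattern can be sketched as one recursive function that returns where each subexpression's result ended up. This is a hedged toy model, repeating the small types from the earlier sketches so it stands alone; stack_offset_of and emit_operation are made-up placeholders for a symbol table and for real instruction selection:

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <variant>

struct Node {
    std::string label;                  // operator or identifier
    std::unique_ptr<Node> left, right;  // both null for leaf nodes
};

struct InRegister { std::string name; };
struct OnStack    { int rbp_offset; };
using Operand = std::variant<InRegister, OnStack>;

// Placeholder symbol table: where each local lives relative to rbp.
OnStack stack_offset_of(const std::string& name) {
    static const std::map<std::string, int> offsets{
        {"a", -4}, {"b", -8}, {"c", -12}, {"d", -16}, {"value", -17}};
    return OnStack{offsets.at(name)};
}

// Placeholder instruction selection: would emit add / cmp+sete / mov and
// report where the operation left its result.
Operand emit_operation(const std::string& op, Operand lhs, Operand rhs) {
    (void)lhs; (void)rhs;
    std::cout << "emit code for " << op << '\n';
    return InRegister{"eax"};
}

// The general pattern: handle both children first, then combine their
// result locations into instructions for the node itself.
Operand emit(const Node& node) {
    if (!node.left && !node.right)
        return stack_offset_of(node.label);  // leaf: just track the address
    Operand lhs = emit(*node.left);
    Operand rhs = emit(*node.right);
    return emit_operation(node.label, lhs, rhs);
}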
1. Taking care of a node is called visiting it in the tree-traversal algorithm's parlance.
2. The size of bool is defined by a compiler implementation.
3. In general, x86 machine code can't encode an instruction with two explicit memory operands.