Stack-machines
The Ethereum VM is stack-based. This means the operands for the instructions are taken from the stack, and it is also where the results are added.
Here is a very simple example of how adding 5 and 10 would work in a stack machine. Assume there is a PUSH
instruction that adds an integer to the stack, and an ADD
instruction that pops the topmost two items on the stack, adds them, and puts the sum back on top of the stack. This is how adding would look:
|
|
The stack now has one item on it: 15.
If we wanted to add 5, 10, and 20, it could be done in a similar way:
|
|
There are other ways of doing it as well, for example we could push all three numbers on the stack directly and just run ADD
twice.
|
|
The data will also have to come from somewhere. The way it could be done is by giving each instruction a byte-code, i.e. a number from 0 to 255. By doing so, we can pass instructions and input values in a simple byte array. We can use these rules:
PUSH
is the byte0x01
ADD
is the byte0x02
Numbers are interpreted as regular 32 bit integers meaning they will occupy 4 bytes, e.g. 5 is
0x00000005
.
We also add a few rules, so that the bytecode can be interpreted.
The first byte is always an instruction.
The byte following an
ADD
is always an instruction.The four bytes following a
PUSH
is to be interpreted as a number.
Now we can write byte-code for the first program.
|
|
Note that it could still fail because for example an ADD
instruction could be added before there are two items on the stack, but that is not important now.
Either way, if we run this now it will be possible to push and add numbers; for example, this javascript function can do it (unless numbers get too big; there’s no proper handling of integer over/underflow):
|
|
The EVM Stack
The EVM is much more complex then this simple example, obviously, but addition would actually be done in much the same way. One big difference is that the EVM has a word-size of 32 bytes instead of the 4 bytes I used in the example. It also has several PUSH
instructions - one for each possible number of bytes (PUSH1
to PUSH32
). This is how it would look:
|
|
This should add 1 and 3.
PUSH1
is 0x60
and ADD
is 0x01
, and we will add a STOP
at the end 0x00
, so the bytecode array becomes:
600160030100
Running this in the JavaScript VM (using my own tools) produces this:
|
|
Most other arithmetic operations (and similar) works in pretty much the same way.
Manual stack manipulation
It is possible to manipulate the items on the stack using POP
, SWAP
and DUP
.
POP
Pop will remove the top item. I will now add it to the code from the previous example, right before the STOP
.
|
|
POP
is 0x50
, so the bytecode is: 60016003015000
Output:
|
|
SWAP
SWAP
has several versions, depending on which items you want to swap. SWAP1
will swap the first with the second, SWAP2
the first with the third, and so on until SWAP16
.
I will run the original example except change ADD
to SWAP1
. This means changing 0x01
to 0x90
.
|
|
Bytecode: 600160039000
Output:
|
|
DUP
DUP
will copy a stack item and put the copy on top of the stack. It has several versions as well, and the principle is the same. DUP1
will duplicate the first item, DUP2
the third, and so on until DUP16
.
I will run the original example except change ADD
to DUP2
, copying the second stack item. This means changing 0x01
to 0x81
.
|
|
Bytecode: 600160038100
Output:
|
|
Flow control
Flow control is done through jumps and conditional statements. A plainJUMP
will set the program counter to the value that’s currently on top of the stack. JUMPI
is a conditional jump. It will set the program counter to the value on top of the stack, if the next stack item is not zero. It is often used in combination with conditionals like EQ
and LT
.
It is not possible to set the PC to any value, the code at that point has to be a JUMPDEST
. I will now add a number of items to the stack, but do a jump that will skip some.
|
|
This should add 0x01
and 0x02
to the stack, jump past the next two PUSH1
s to index 11, then stop.
Bytecode: 60016002600B56600360045B00
|
|
JUMPI
can be done in the same way, but without input it gets complicated, so a true conditional jump will instead be done in a later post about calldata. Now I will simply make two examples - one where I push 0 as the “condition”, and one where I pass 1.
Successful conditional jump:
|
|
Bytecode: 60016001600B57600360045B00
|
|
Failed conditional jump:
|
|
Bytecode: 60016000600B57600360045B00
|
|
Notice it will pass JUMPDEST
in the failed example as well. This is fine, because JUMPDEST
does not affect normal execution; it simply moves past it to the next instruction.
Coming posts
Knowing how to work with the stack and program counter is a good start. Arithmetic, bitwise, comparison operations is easy to use, and does not need a lot of explaining. Same thing with operations like ADDRESS
, CALLER
, BLOCKHASH
, etc. Instead, the coming posts will likely be on calldata and memory, since memory makes it possible to return data, and calldata makes it possible to provide input. Storage will also be treated.
Once the basics are covered I will move on to more advanced things like calling contracts from contracts, creating new contracts, manually calling libraries, copying bytecode, etc.