How can you call it knowledge if it's not shared?
Spotchecks are one of the core activities of The Red Guild. These are time-bounded interest-driven code reviews on projects that we periodically hand-pick, prioritizing them by their impact on the Ethereum ecosystem.
Spotchecks give us tons of insights into the latest Ethereum applications. Our alchemists then transform these insights into learnings, and learnings into educational content. That we share with you.
While running a spotcheck on one implementation of the imposing ERC-4337 for Account Abstraction, I found many parts of the code that pushed me out of my comfort zone. Some because of how short and specific their logic was, yet surprisingly complex once I started digging.
Today I'm sharing with you the first of three of these scenarios.
It all began with the following two functions of the EntryPoint contract, a singleton contract at the core of Account Abstraction. You might not be familiar with the whole contract, but don't worry. We'll study the behavior of these functions without looking at their actual use case in the system.
function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
assembly {offset := data}
}
function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
assembly {data := offset}
}
What are these functions doing? Well, my first impression was that they were implementing some kind of type casting, which seemed strange. Once I paid more attention, I understood.
Look at the first one:
function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
assembly {offset := data}
}
If you're familiar with C-style memory pointers, like (ptr = &data), you could say the function above is implementing the Solidity equivalent. It's reading the memory pointer of an in-memory variable. Though of course, since the EVM's memory is short-lived and tied to a call frame, we just get the memory position of that variable in that specific call.
Variables inside an assembly block work differently from regular Solidity variables.
For storage location variables, Solidity won't even let you reference them directly. You would need to use their .slot property (only available in assembly) to get the slot position where the value is stored in the contract's storage, and then load the value from it using sload(var.slot). Storage variables also expose .offset, the byte offset of the value inside that slot.
But when working with types in a memory location, such as in the EntryPoint::getOffsetOfMemoryBytes function, you can reference the variable directly. The value it returns is the position in memory where the data starts.
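Here is a minimal sketch of that contrast. The PointerDemo contract and its stored variable are hypothetical names I made up for illustration; they are not part of EntryPoint.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract for illustration only; not part of EntryPoint.
contract PointerDemo {
    uint256 private stored = 42;

    // Storage: the variable cannot be referenced directly in assembly,
    // so we take its slot number and load the value with sload.
    function readStored() public view returns (uint256 value) {
        assembly {
            value := sload(stored.slot)
        }
    }

    // Memory: the variable name itself evaluates to its memory position.
    function readPointer() public pure returns (uint256 pointer) {
        bytes memory data = "abc";
        assembly {
            pointer := data
        }
    }
}
```

Calling readStored() returns 42, while readPointer() returns a memory position rather than the bytes themselves.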
To read what is stored at data in assembly, instead of its location, you'd use mload(data).
Check out the next function. Notice that the inverse works as well.
function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
assembly { data := offset }
}
When the code returns a bytes memory variable, Solidity expects it to be a position in memory pointing to where it should retrieve its contents.
Note that the position in offset above, since data is a bytes data type, points to a value that indicates the size of the actual contents of data, not to the contents themselves.
Building a test
To put my knowledge to the test, I created a proof-of-concept in Remix.
I copied both previous functions to a testing contract (marking them as internal), and then added these two:
function test_getOffsetOfMemoryBytes() public pure returns (uint256) {
bytes memory aVariable = "Some stuff that uses memory space";
bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";
bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";
return getOffsetOfMemoryBytes(data);
}
function test_getMemoryBytesFromOffset(uint256 offset) public pure returns (bytes memory) {
bytes memory aVariable = "Some stuff that uses memory space";
bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";
bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";
return getMemoryBytesFromOffset(offset);
}
Both test functions declare the same variables. This means the memory layout will be identical, and I should get the same results. Why do this? Think about it for a moment. Remember what I said above! Memory only holds what belongs to the call being executed at that moment. That's why I set getMemoryBytesFromOffset and getOffsetOfMemoryBytes to be internal.
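To make that concrete, here is a small sketch of a round trip inside a single call. RoundTripDemo and roundTrip are names I'm assuming for illustration; the two helpers are copied from above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical test contract; only the two helpers come from EntryPoint.
contract RoundTripDemo {
    function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
        assembly { offset := data }
    }

    function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
        assembly { data := offset }
    }

    // Takes the pointer and turns it back into bytes within the same call,
    // so the memory it points to is still there.
    function roundTrip() public pure returns (bool same) {
        bytes memory data = "The real data";
        uint256 offset = getOffsetOfMemoryBytes(data);
        bytes memory recovered = getMemoryBytesFromOffset(offset);
        same = keccak256(recovered) == keccak256(data);
    }
}
```

Since both helpers run in the same call frame, roundTrip() returns true. Passing the offset to a different external call would not work this way, because that call starts with its own fresh memory.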
So, to test the expected behavior, I needed to recreate the same memory layout. Which, bear in mind, is the following:
bytes memory aVariable = "Some stuff that uses memory space";
bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";
bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";
In Solidity, the convention is that the free-memory pointer is stored at memory position 0x40. Its value always starts at 0x80 (128), increasing as memory is allocated. So if our scenario entails the declaration of the previous three variables, memory will hold (starting from 0x80) three data structures, each with its size first and then its actual contents.
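If you want to watch the free-memory pointer move, a sketch like the following should do. FreeMemoryPointerDemo and watchPointer are hypothetical names, and the exact values returned depend on the compiler version and settings, so take them as something to check in Remix rather than a guarantee.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract for illustration only.
contract FreeMemoryPointerDemo {
    function watchPointer()
        public
        pure
        returns (uint256 beforeAlloc, uint256 afterAlloc, uint256 length)
    {
        // Read the free-memory pointer before allocating anything.
        assembly { beforeAlloc := mload(0x40) }

        bytes memory aVariable = "Some stuff that uses memory space";

        // Read it again once the variable lives in memory.
        assembly { afterAlloc := mload(0x40) }

        // Use the variable so it is not flagged as unused.
        length = aVariable.length;
    }
}
```

If the layout matches the one described in this post, I'd expect beforeAlloc to be 0x80 and afterAlloc to be 0xe0: one 32-byte slot for the size plus two for the 33 bytes of content.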
Inspecting memory layout
Memory slots are handled in multiples of 32 bytes. But the content of aVariable in my test is 33 bytes. There is 1 byte that spills into a new slot, occupying it entirely.
Let's see that visually. Starting at the memory position that the free-memory pointer initially points to (0x80), we will have the following data.
First, a 32-byte segment specifying the size of aVariable, with the value 33 (0x21) for its 33 characters, followed by its content (all those characters).
0x80: len(aVariable) == 33
0x0000000000000000000000000000000000000000000000000000000000000021 ????????????????????????????????
0xa0: content of aVariable part #1
0x536f6d6520737475666620746861742075736573206d656d6f72792073706163 Some stuff that uses memory spac
0xc0: content of aVariable part #2
0x6500000000000000000000000000000000000000000000000000000000000000 e???????????????????????????????
Second, a 32-byte segment specifying the size of anotherVariable (0xa4 or 164 bytes), and its content.
0xe0: len(anotherVariable) == 164
0x00000000000000000000000000000000000000000000000000000000000000a4 ????????????????????????????????
0x100: content of anotherVariable part #1
0x416e6f7468657220737475666620746861742075736573206d656d6f72792073 Another stuff that uses memory s
0x120: content of anotherVariable part #2
0x7061636520696e2061206c6f6e676572207761792c2062656361757365207468 pace in a longer way? because th
0x140: content of anotherVariable part #3
0x6973207365656d7320746f2062652061206d6f6e6f6c6f677565206d6f726520 is seems to be a monologue more
0x160: content of anotherVariable part #4
0x7468616e206a75737420612073696d706c65207069656365206f662074657874 than just a simple piece of text
0x180: content of anotherVariable part #5
0x2e205374696c6c20676f696e67206d616820667269656e643f20646179756d6d ? Still going mah friend? dayumm
0x1a0: content of anotherVariable part #6
0x6d6d6d6d00000000000000000000000000000000000000000000000000000000 mmmm????????????????????????????
Third and final, a 32-byte segment specifying the length of data (0x63 or 99 bytes), also followed by its content.
0x1c0: len(data) == 99
0x0000000000000000000000000000000000000000000000000000000000000063 ???????????????????????????????c
0x1e0: content of data part #1
0x546865207265616c20646174612c2074686f756768207468697320686173206e The real data? though this has n
0x200: content of data part #2
0x6f7468696e6720696e20706172746963756c61722e204e6f7420676f696e6720 othing in particular? Not going
0x220: content of data part #3
0x746f206d616b652074686973206d756368206c6f6e676572207468616e207468 to make this much longer than th
0x240: content of data part #4
0x69732e0000000000000000000000000000000000000000000000000000000000 is??????????????????????????????
We can validate this by calling test_getOffsetOfMemoryBytes() and comparing its return value (448) to the position of the beginning of the data variable in memory (0x1c0). They should be, and indeed are, equal, since 448 is the decimal representation of 0x1c0.
Now that we have the memory position of the data variable, we can use our other test function with it to get its contents from memory: test_getMemoryBytesFromOffset(0x1c0). Doing so will return the hexadecimal representation of the string:
0x546865207265616c20646174612c2074686f756768207468697320686173206e6f7468696e6720696e20706172746963756c61722e204e6f7420676f696e6720746f206d616b652074686973206d756368206c6f6e676572207468616e20746869732e
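You can also peek at the structure by hand. The sketch below (LengthPrefixDemo and inspect are hypothetical names of mine) reads the word sitting at the pointer, and then the word right after it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract for illustration only.
contract LengthPrefixDemo {
    function inspect() public pure returns (uint256 length, bytes32 firstWord) {
        bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";
        assembly {
            // The word at the pointer holds the size of the contents.
            length := mload(data)
            // The contents themselves begin 32 bytes (0x20) later.
            firstWord := mload(add(data, 0x20))
        }
    }
}
```

Here length comes back as 99, and firstWord holds the first 32 bytes of the text, 0x546865207265616c20646174612c2074686f756768207468697320686173206e, the same word shown at 0x1e0 in the layout above.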
Now, if we wanted to calculate the memory position of the data variable manually, we could do it like this:
0x80 + (0x20 * (ceil(len(aVariable) / 32) + 1)) + (0x20 * (ceil(len(anotherVariable) / 32) + 1))
0x80 + (0x20 * (2 + 1)) + (0x20 * (6 + 1))
0x80 + 0x20 * 10 = 0x1c0 (448 in decimal)
If you were to try this on Remix, or scroll up to see the memory layout, you will be able to verify that indeed 0x1c0 holds the beginning of the data bytes structure!
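The same arithmetic can be written as a tiny helper. OffsetMath, footprint and expectedDataOffset are hypothetical names for this sketch, and the 0x80 starting point assumes nothing else was allocated before the three variables.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract for illustration only.
contract OffsetMath {
    // Bytes taken in memory by a bytes value of the given length:
    // one 32-byte size word plus the contents rounded up to whole slots.
    function footprint(uint256 length) public pure returns (uint256) {
        return 0x20 * ((length + 31) / 32 + 1);
    }

    // Position of data, assuming memory starts at 0x80 and holds
    // aVariable (33 bytes) and anotherVariable (164 bytes) before it.
    function expectedDataOffset() public pure returns (uint256) {
        return 0x80 + footprint(33) + footprint(164);
    }
}
```

expectedDataOffset() returns 448, which is 0x1c0.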
What did we learn?
That it's possible to use low-level assembly to read memory pointers!
Now that you've assimilated this, we encourage you to take a closer look at the EntryPoint contract, and figure out why these utility functions are needed.
See you in our next tale!