Pointers in Solidity?

How can you call it knowledge if it's not shared?

Spotchecks are one of the core activities of The Red Guild. These are time-bounded interest-driven code reviews on projects that we periodically hand-pick, prioritizing them by their impact on the Ethereum ecosystem.

Spotchecks give us tons of insights into the latest Ethereum applications. Our alchemists then transform these insights into learnings, and learnings into educational content that we share with you.

While running a spotcheck on one implementation of the imposing ERC-4337 for Account Abstraction, I found many parts of the code that pushed me out of my comfort zone. Some of them had short and specific logic that turned out to be surprisingly complex once I started digging.

Today I'm sharing with you the first of three of these scenarios.

It all began with the following two functions of the EntryPoint contract, a singleton contract at the core of Account Abstraction. You might not be familiar with the whole contract, but don't worry. We'll study the behavior of these functions without looking at their actual use case in the system.

function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
	assembly {offset := data}
}

function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
	assembly {data := offset}
}

What are these functions doing? My first impression was that they were implementing some kind of type casting, which seemed strange. Once I paid more attention, I understood.

Look at the first one:

function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
	assembly {offset := data}
}

If you're familiar with C-style memory pointers, you could say the function above is the Solidity equivalent of taking an address (something like ptr = &data). It's reading the memory pointer of an in-memory variable. Of course, since the EVM's memory is short-lived and tied to a call frame, we just get the memory position of that variable in that specific call.

Variables inside an assembly block work differently from regular Solidity variables.

For storage variables, Solidity won't even let you reference them directly inside assembly. You need to use their .slot property (only available in assembly) to get the slot where the value is stored in the contract's storage, and then load the value from it using sload(var.slot).

🗒️

There are a few more things to consider for storage variables. For example, if the variable's type is smaller than 256 bits, it might be packed into the same slot as another variable. In that case you also need .offset, which gives the variable's byte offset within the slot.
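Here's a minimal sketch of how .slot and .offset could be used together on a packed variable. The contract and its variable names (StorageSlotSketch, first, second, readSecond) are made up for illustration and are not part of the EntryPoint contract.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, written only to illustrate .slot and .offset.
contract StorageSlotSketch {
    uint128 private first = 1;  // slot 0, bytes 0..15
    uint128 private second = 2; // slot 0, bytes 16..31 (packed with `first`)

    function readSecond()
        external
        view
        returns (uint256 slotIndex, uint256 byteOffset, uint256 value)
    {
        assembly {
            slotIndex := second.slot
            byteOffset := second.offset
            // Load the whole 32-byte slot, shift the variable down to the
            // low-order bytes, and mask away everything above 128 bits.
            let word := sload(slotIndex)
            value := and(shr(mul(byteOffset, 8), word), 0xffffffffffffffffffffffffffffffff)
        }
    }
}
```

Since both variables fit in a single 32-byte slot, readSecond() should report slot 0, byte offset 16, and the value 2.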

But when working with types in a memory location, such as in the EntryPoint::getOffsetOfMemoryBytes function, you can reference the variable directly. The value it returns is the position in memory where the data starts.

🗒️

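If you wanted to read what's stored at that location in assembly, instead of the location itself, you'd use mload(data). For a bytes variable, that first word holds the length of the data. The actual contents start 32 bytes later, at add(data, 0x20).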

Check out the next function. Notice that the inverse works as well.

function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
	assembly { data := offset }
}

When the code returns a bytes memory variable, Solidity expects it to be a position in memory pointing to where it should retrieve its contents.

🗒️

Remember that the position passed in offset and assigned to the named return variable data, being a bytes data type, points to a word that indicates the size of the actual contents. It does not point to the contents themselves.
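To make that structure concrete, here's a small sketch that reads the pointer, the length word, and the first content word of a bytes variable. BytesLayoutSketch and inspect are hypothetical names of mine, and the "abc" literal is just a placeholder.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, written only to illustrate the bytes memory layout.
contract BytesLayoutSketch {
    function inspect()
        external
        pure
        returns (uint256 ptr, uint256 length, bytes32 firstWord)
    {
        bytes memory data = "abc";
        assembly {
            ptr := data                         // where the structure starts
            length := mload(data)               // first word: the length (3)
            firstWord := mload(add(data, 0x20)) // next word: the contents
        }
    }
}
```

The returned length should be 3, and firstWord should start with 0x616263 (the ASCII codes for "abc"). The exact value of ptr depends on what was allocated earlier in the call.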

Building a test

To put my knowledge to the test, I created a proof-of-concept in Remix.

I copied both previous functions to a testing contract (marking them as internal), and then added these two:

function test_getOffsetOfMemoryBytes() public pure returns (uint256) {
	// Unused on purpose: these two only take up memory before `data`
	bytes memory aVariable = "Some stuff that uses memory space";
	bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";
	bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";

	return getOffsetOfMemoryBytes(data);
}

function test_getMemoryBytesFromOffset(uint256 offset) public pure returns (bytes memory) {
	// Same three declarations as above, so the memory layout is identical
	bytes memory aVariable = "Some stuff that uses memory space";
	bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";
	bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";

	return getMemoryBytesFromOffset(offset);
}

Both test functions declare the same variables. This means the memory layout will be identical, and I should get the same results. Why do this? Think about it for a moment. Remember what I said above! Memory only holds what was written during the call being executed. That's why I set getMemoryBytesFromOffset and getOffsetOfMemoryBytes to be internal.

So, to test the expected behavior, I needed to recreate the same memory layout, which, bear in mind, is the following:
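When both helpers run inside the same call, no layout tricks are needed. The following is a sketch under that assumption: RoundTripSketch and roundTrip are hypothetical names, and the two helpers are copied from the snippets above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: reads a pointer and uses it again in the same call.
contract RoundTripSketch {
    function getOffsetOfMemoryBytes(bytes memory data) internal pure returns (uint256 offset) {
        assembly { offset := data }
    }

    function getMemoryBytesFromOffset(uint256 offset) internal pure returns (bytes memory data) {
        assembly { data := offset }
    }

    function roundTrip() external pure returns (bool same) {
        bytes memory original = "hello";
        // Both helpers are internal, so they share this call's memory.
        uint256 offset = getOffsetOfMemoryBytes(original);
        bytes memory recovered = getMemoryBytesFromOffset(offset);
        same = keccak256(recovered) == keccak256(original);
    }
}
```

Here recovered points at the very same memory as original, so roundTrip() should return true. Across two separate external calls the offset alone would not be enough, because each call starts with fresh memory.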

bytes memory aVariable = "Some stuff that uses memory space";

bytes memory anotherVariable = "Another stuff that uses memory space in a longer way, because this seems to be a monologue more than just a simple piece of text. Still going mah friend? dayummmmmm";

bytes memory data = "The real data, though this has nothing in particular. Not going to make this much longer than this.";

In Solidity, the convention is that the free-memory pointer is stored at memory position 0x40. Its value always starts at 0x80 (128), increasing as memory is allocated. So if our scenario entails the declaration of the previous three variables, memory will hold (starting from 0x80) three data structures, each made of a 32-byte length word followed by the actual contents.

Inspecting memory layout

Memory is handled in 32-byte words, but the content of aVariable in my test is 33 bytes. That extra byte spills into a new word and occupies it entirely.
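One way to observe this rounding is to read the free-memory pointer before and after the declaration. This is only a sketch: FreeMemoryPointerSketch and watch are names I made up, and the exact values assume a fresh call in an unoptimized build like Remix's default setup.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: watches the free-memory pointer move.
contract FreeMemoryPointerSketch {
    function watch()
        external
        pure
        returns (uint256 beforeAlloc, uint256 afterAlloc, uint256 length)
    {
        assembly { beforeAlloc := mload(0x40) }
        // 33 bytes of content: one length word plus two content words
        bytes memory aVariable = "Some stuff that uses memory space";
        assembly { afterAlloc := mload(0x40) }
        length = aVariable.length;
    }
}
```

Under those assumptions I'd expect the pointer to move from 0x80 to 0xe0, a jump of 0x60 (96) bytes for only 33 bytes of content, with length reported as 33.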

Let's see that visually. Starting at the memory position that the free-memory pointer initially points to (0x80), we will have the following data.

First, a 32-byte segment specifying the size of aVariable, with the value 33 (0x21) for its 33 characters, followed by its content (all those characters).

0x80: len(aVariable) == 33
0x0000000000000000000000000000000000000000000000000000000000000021
???????????????????????????????!

0xa0: content of aVariable part #1
0x536f6d6520737475666620746861742075736573206d656d6f72792073706163
Some stuff that uses memory spac

0xc0: content of aVariable part #2
0x6500000000000000000000000000000000000000000000000000000000000000
e???????????????????????????????

Second, a 32-byte segment specifying the size of anotherVariable (0xa4 or 164 bytes), and its content.

0xe0: len(anotherVariable) == 164
0x00000000000000000000000000000000000000000000000000000000000000a4
????????????????????????????????

0x100: content of anotherVariable part #1
0x416e6f7468657220737475666620746861742075736573206d656d6f72792073
Another stuff that uses memory s

0x120: content of anotherVariable part #2
0x7061636520696e2061206c6f6e676572207761792c2062656361757365207468
pace in a longer way, because th

0x140: content of anotherVariable part #3
0x6973207365656d7320746f2062652061206d6f6e6f6c6f677565206d6f726520
is seems to be a monologue more 

0x160: content of anotherVariable part #4
0x7468616e206a75737420612073696d706c65207069656365206f662074657874
than just a simple piece of text

0x180: content of anotherVariable part #5
0x2e205374696c6c20676f696e67206d616820667269656e643f20646179756d6d
. Still going mah friend? dayumm

0x1a0: content of anotherVariable part #6
0x6d6d6d6d00000000000000000000000000000000000000000000000000000000
mmmm????????????????????????????

Third and final, a 32-byte segment specifying the length of data (0x63 or 99 bytes), also followed by its content.

0x1c0: len(data) == 99
0x0000000000000000000000000000000000000000000000000000000000000063
???????????????????????????????c

0x1e0: content of data part #1
0x546865207265616c20646174612c2074686f756768207468697320686173206e
The real data, though this has n

0x200: content of data part #2
0x6f7468696e6720696e20706172746963756c61722e204e6f7420676f696e6720
othing in particular. Not going 

0x220: content of data part #3
0x746f206d616b652074686973206d756368206c6f6e676572207468616e207468
to make this much longer than th

0x240: content of data part #4
0x69732e0000000000000000000000000000000000000000000000000000000000
is.?????????????????????????????

We can validate this by calling test_getOffsetOfMemoryBytes() and comparing its return value (448) to the position of the beginning of the data variable in memory (0x1c0). They should be and indeed are equal, since 448 is the decimal representation of 0x1c0.

Now that we have the memory position of the data variable we can use our other test function with it to get its contents from memory: test_getMemoryBytesFromOffset(0x1c0). Doing so will return the hexadecimal representation of the string:
0x546865207265616c20646174612c2074686f756768207468697320686173206e6f7468696e6720696e20706172746963756c61722e204e6f7420676f696e6720746f206d616b652074686973206d756368206c6f6e676572207468616e20746869732e

🗒️

You can use tools like CyberChef to help you interpret encoded data. Decoding the returned bytes from hex will show data's string.

Now, if we wanted to calculate the memory position of the data variable manually, we could do it like this:

0x80 + (0x20 * (ceil(len(aVariable) / 32) + 1)) + (0x20 * (ceil(len(anotherVariable) / 32) + 1))

0x80 + (0x20 * (2 + 1)) + (0x20 * (6 + 1))

0x80 + 0x20 * 10 = 0x1c0 (448 in decimal)

🗒️

I'm using the ceiling function in the divisions because memory is allocated in whole 32-byte words. The + 1 accounts for the extra word that stores each variable's length.
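The same arithmetic can be written as a tiny helper. This is a sketch of mine, not something from the EntryPoint contract: OffsetMathSketch, footprint and expectedDataOffset are hypothetical names, and the lengths 33 and 164 come from the two variables declared before data.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: reproduces the manual offset calculation.
contract OffsetMathSketch {
    // Bytes taken in memory by a `bytes` value of the given length:
    // one length word plus the contents rounded up to whole 32-byte words.
    function footprint(uint256 length) public pure returns (uint256) {
        // (length + 31) / 32 is integer division acting as ceil(length / 32)
        return 0x20 * ((length + 31) / 32 + 1);
    }

    // Where `data` should start when it is declared after aVariable
    // (33 bytes) and anotherVariable (164 bytes), starting from 0x80.
    function expectedDataOffset() external pure returns (uint256) {
        return 0x80 + footprint(33) + footprint(164);
    }
}
```

footprint(33) is 96 and footprint(164) is 224, so expectedDataOffset() returns 128 + 96 + 224 = 448, which is 0x1c0.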

If you try this on Remix, or scroll up to see the memory layout, you will be able to verify that 0x1c0 indeed holds the beginning of the data bytes structure!

What did we learn?

That it's possible to use low-level assembly to read memory pointers!

Now that you've assimilated this, we encourage you to take a closer look at the EntryPoint contract, and figure out why these utility functions are needed.

See you in our next tale!
