Package ssz provides a zero-allocation, opinionated toolkit for working with Ethereum's Simple Serialize (SSZ) format through Go. The focus is on code maintainability, only secondarily striving towards raw performance.
Please note, this repository is a work in progress. The API is unstable and breaking changes will regularly be made. Do not depend on this in publicly available modules.
This package is heavily inspired by the code generated by, and contained within, fastssz!
- Elegant API surface: Binary protocols are low level constructs and writing encoders/decoders entails boilerplate and fumbling with details. Code generators can do a good job in achieving performance, but with a too low level API, the generated code becomes impossible to humanly maintain. That isn't an issue, until you throw something at the generator it cannot understand (e.g. multiplexed types), at which point you'll be in deep pain. By defining an API that is elegant from a dev perspective, we can create maintainable code for the special snowflake types, yet still generate it for the rest of boring types.
- Reduced redundancies: The API aims to make the common case easy and the less common case possible. Redundancies in user encoding/decoding code are deliberately avoided to remove subtle bugs (even at a slight hit on performance). If the user's types require some asymmetry, explicit encoding and decoding code paths are still supported.
- Support existing types: Serialization libraries often assume the user is going to define a completely new, isolated type-set for all the things they want to encode. That is simply not the case, and to introduce a new encoding library into a pre-existing codebase, it must play nicely with the existing types. That means common Go typing and aliasing patterns should be supported without annotating everything with new methods.
- Performant, as meaningful: Encoding/decoding code should be performant, even if we're giving up some of it to cater for the above goals. Language constructs that are known to be slow (e.g. reflection) should be avoided, and code should have performance similar to low level generated ones, including needing 0 allocations. That said, a meaningful application of the library will do something with the encoded data, which will almost certainly take more time than generating/parsing a binary blob.
Whilst we aim to become the SSZ encoder of go-ethereum (and more generally, a go-to encoder for all Go applications that need to work with Ethereum data blobs), there is no guarantee that this outcome will occur. At the present moment, this package is still in the design and experimentation phase and is not ready for a formal proposal.
There are several possible outcomes from this experiment:
- We determine the effort required to implement all SSZ features is not worth it, abandoning this package.
- All the needed features are shipped, but the package is rejected in favor of some other superior design.
- The API of this package gets merged into some existing library and this work gets abandoned in its favor.
- The package turns out simple, safe and performant enough to be added to go-ethereum as a test.
- Some other unforeseen outcome of the infinite possibilities.
The ssz package splits the responsibility between user code and library code as follows:
- Users are responsible for creating Go structs, which are mapped one-to-one to the SSZ container type.
- The library is responsible for creating all other SSZ types from the fields of the user-defined structs.
- Some SSZ types require specific types to be used due to robustness and performance reasons.
- SSZ unions are not implemented as they are an unused (and disliked) feature in Ethereum.
The Simple Serialize spec has schema definitions for mapping SSZ data to JSON. We believe in separation of concerns. This library does not concern itself with encoding/decoding from formats other than SSZ.
First up, you need to add the package to your project:
go get github.com/karalabe/ssz
Some types in Ethereum will only contain a handful of statically sized fields. One example is a Withdrawal:
type Address [20]byte
type Withdrawal struct {
Index uint64
Validator uint64
Address Address
Amount uint64
}
To encode/decode such an object via SSZ, it needs to implement the ssz.StaticObject interface:
type StaticObject interface {
// SizeSSZ returns the total size of an SSZ object.
SizeSSZ() uint32
// DefineSSZ defines how an object would be encoded/decoded.
DefineSSZ(codec *Codec)
}
- SizeSSZ is self-explanatory. It returns the total size of the final SSZ encoding, and for static types such as a Withdrawal, you need to calculate this by hand (or by a code generator, more on that later).
- DefineSSZ is more involved. It expects you to define what fields, in what order and with what types are going to be encoded. Essentially, it's the serialization format.
func (w *Withdrawal) SizeSSZ() uint32 { return 44 }
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
ssz.DefineUint64(codec, &w.Index) // Field (0) - Index - 8 bytes
ssz.DefineUint64(codec, &w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.DefineStaticBytes(codec, &w.Address) // Field (2) - Address - 20 bytes
ssz.DefineUint64(codec, &w.Amount) // Field (3) - Amount - 8 bytes
}
- The DefineXYZ methods should feel self-explanatory. They spell out what fields to encode in what order and into what types. The interesting tidbit is the addressing of the fields. Since this code is used for both encoding and decoding, it needs to be able to instantiate any nil fields during decoding, so pointers are needed.
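To make the layout concrete, here is a minimal standard-library sketch of the bytes those four definitions describe. It does not use the ssz package at all: encodeWithdrawal is a hypothetical helper written for this illustration, relying only on the SSZ rule that fixed-size fields are concatenated in declaration order with integers stored little-endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Address [20]byte

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   Address
	Amount    uint64
}

// encodeWithdrawal is a hypothetical, hand-rolled stand-in for the library's
// encoder. It only illustrates the wire layout of a static container.
func encodeWithdrawal(w *Withdrawal) []byte {
	buf := make([]byte, 44)
	binary.LittleEndian.PutUint64(buf[0:8], w.Index)      // Field (0) - bytes  0..7
	binary.LittleEndian.PutUint64(buf[8:16], w.Validator) // Field (1) - bytes  8..15
	copy(buf[16:36], w.Address[:])                        // Field (2) - bytes 16..35
	binary.LittleEndian.PutUint64(buf[36:44], w.Amount)   // Field (3) - bytes 36..43
	return buf
}

func main() {
	w := &Withdrawal{Index: 1, Validator: 2, Address: Address{0xaa}, Amount: 3}
	blob := encodeWithdrawal(w)
	fmt.Printf("%d bytes: %x\n", len(blob), blob)
}
```

The 8 + 8 + 20 + 8 split is where the 44 returned by SizeSSZ comes from.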
To encode the above Withdrawal into an SSZ stream, use either ssz.EncodeToStream or ssz.EncodeToBytes. The former will write into a stream directly, whilst the latter will write into a bytes buffer directly. In both cases you need to supply the output location to avoid GC allocations in the library.
func main() {
out := new(bytes.Buffer)
if err := ssz.EncodeToStream(out, new(Withdrawal)); err != nil {
panic(err)
}
fmt.Printf("ssz: %#x\n", out.Bytes())
}
To decode an SSZ blob, use ssz.DecodeFromStream or ssz.DecodeFromBytes, with the same disclaimers about allocations. Note, decoding requires knowing the size of the SSZ blob in advance. Unfortunately, this is a limitation of the SSZ format.
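The size requirement comes from SSZ blobs carrying no length prefix or terminator of their own. The sketch below shows the idea with the standard library only: decodeWithdrawal and decodeWithdrawalBytes are hypothetical helpers (not the library's API) that are told how many bytes to expect, read exactly that many from a stream, and then slice the fields out at their fixed positions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

const withdrawalSize = 44

// decodeWithdrawal is a hypothetical helper: the caller states the blob size
// up front, since the stream itself does not say where the object ends.
func decodeWithdrawal(r io.Reader, size uint32, w *Withdrawal) error {
	if size != withdrawalSize {
		return errors.New("unexpected size for a static Withdrawal")
	}
	var buf [withdrawalSize]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return err
	}
	w.Index = binary.LittleEndian.Uint64(buf[0:8])
	w.Validator = binary.LittleEndian.Uint64(buf[8:16])
	copy(w.Address[:], buf[16:36])
	w.Amount = binary.LittleEndian.Uint64(buf[36:44])
	return nil
}

// decodeWithdrawalBytes wraps the stream decoder for in-memory blobs, where
// the size is simply the length of the slice.
func decodeWithdrawalBytes(blob []byte, w *Withdrawal) error {
	return decodeWithdrawal(bytes.NewReader(blob), uint32(len(blob)), w)
}

func main() {
	blob := make([]byte, withdrawalSize)
	blob[0], blob[36] = 7, 9

	var w Withdrawal
	if err := decodeWithdrawalBytes(blob, &w); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", w)
}
```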
Most data types in Ethereum will contain a cool mix of static and dynamic data fields. Encoding those is much more interesting, yet still proudly simple. One such data type is an ExecutionPayload, as seen below:
type Hash [32]byte
type LogsBLoom [256]byte
type ExecutionPayload struct {
ParentHash Hash
FeeRecipient Address
StateRoot Hash
ReceiptsRoot Hash
LogsBloom LogsBLoom
PrevRandao Hash
BlockNumber uint64
GasLimit uint64
GasUsed uint64
Timestamp uint64
ExtraData []byte
BaseFeePerGas *uint256.Int
BlockHash Hash
Transactions [][]byte
Withdrawals []*Withdrawal
}
Do note, we've reused the previously defined Address and Withdrawal types. You'll need those too to make this part of the code work. The uint256.Int type is from the github.com/holiman/uint256 package.
To encode/decode such an object via SSZ, it needs to implement the ssz.DynamicObject interface:
type DynamicObject interface {
// SizeSSZ returns either the static size of the object if fixed == true, or
// the total size otherwise.
SizeSSZ(fixed bool) uint32
// DefineSSZ defines how an object would be encoded/decoded.
DefineSSZ(codec *Codec)
}
If you look at it more closely, you'll notice that it's almost the same as ssz.StaticObject, except the type of SizeSSZ is different, here taking an extra boolean argument. The method name/type clash is deliberate: it guarantees at compile time that dynamic objects cannot end up in static ssz slots and vice versa.
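The clash can be seen in isolation with a small sketch. The interfaces below are local copies mirroring the two definitions shown in this document (Codec is an empty placeholder, not the library's type), and the method bodies are dummies. Since Go has no method overloading, a type can carry only one SizeSSZ, so it satisfies exactly one of the two interfaces.

```go
package main

import "fmt"

// Codec is an empty placeholder standing in for the library's codec type.
type Codec struct{}

type StaticObject interface {
	SizeSSZ() uint32
	DefineSSZ(codec *Codec)
}

type DynamicObject interface {
	SizeSSZ(fixed bool) uint32
	DefineSSZ(codec *Codec)
}

type Withdrawal struct{}

func (w *Withdrawal) SizeSSZ() uint32        { return 44 }
func (w *Withdrawal) DefineSSZ(codec *Codec) {}

type ExecutionPayload struct{ ExtraData []byte }

// SizeSSZ is simplified here: only ExtraData counts as dynamic content.
func (e *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
	if fixed {
		return 512
	}
	return 512 + uint32(len(e.ExtraData))
}
func (e *ExecutionPayload) DefineSSZ(codec *Codec) {}

// Compile-time assertions: each type fits exactly one interface.
var _ StaticObject = (*Withdrawal)(nil)
var _ DynamicObject = (*ExecutionPayload)(nil)

// Uncommenting the next line fails the build, because SizeSSZ has the wrong
// signature for a static slot:
// var _ StaticObject = (*ExecutionPayload)(nil)

// kind reports which of the two interfaces a value satisfies.
func kind(obj interface{}) string {
	switch obj.(type) {
	case StaticObject:
		return "static"
	case DynamicObject:
		return "dynamic"
	}
	return "unknown"
}

func main() {
	fmt.Println(kind(new(Withdrawal)), kind(new(ExecutionPayload)))
}
```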
func (e *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
// Start out with the static size
size := uint32(512)
if fixed {
return size
}
// Append all the dynamic sizes
size += ssz.SizeDynamicBytes(e.ExtraData) // Field (10) - ExtraData - max 32 bytes (not enforced)
size += ssz.SizeSliceOfDynamicBytes(e.Transactions) // Field (13) - Transactions - max 1048576 items, 1073741824 bytes each (not enforced)
size += ssz.SizeSliceOfStaticObjects(e.Withdrawals) // Field (14) - Withdrawals - max 16 items, 44 bytes each (not enforced)
return size
}
As opposed to the static Withdrawal from the previous section, ExecutionPayload has both static and dynamic fields, so we can't just return a pre-computed literal number.
- First up, we will still need to know the static size of the object to avoid costly runtime calculations over and over. Just for reference, that would be the size of all the static fields in the object + 4 bytes for each dynamic field (offset encoding). Feel free to verify the number 512 above.
  - If the caller requested only the static size via the fixed parameter, return early.
- If the caller, however, requested the total size of the object, we need to iterate over all the dynamic fields and accumulate all their sizes too.
  - For all the usual Go suspects like slices and arrays of bytes; 2D slices and arrays of bytes (i.e. ExtraData and Transactions above), there are helper methods available in the ssz package.
  - For types implementing ssz.StaticObject / ssz.DynamicObject (e.g. one item of Withdrawals above), there are again helper methods available to use them as single objects, static arrays of objects, or dynamic slices of objects.
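If you'd rather not add up the 512 in your head, the arithmetic can be written out as a short sketch. The helper names are hypothetical and nothing here calls the ssz package; the dynamic part follows the SSZ layout rules (a list of variable-size items stores one 4-byte offset per item plus the item bytes, a list of fixed-size items is a plain concatenation), which is an assumption about the spec rather than a description of the library's size helpers.

```go
package main

import "fmt"

const (
	offsetSize     = 4  // every dynamic field leaves a 4-byte offset in the static area
	withdrawalSize = 44 // static size of one Withdrawal
)

// staticSize adds up the fixed part of ExecutionPayload, field by field.
func staticSize() uint32 {
	widths := []uint32{
		32, 20, 32, 32, 256, 32, // ParentHash .. PrevRandao
		8, 8, 8, 8, // BlockNumber .. Timestamp
		offsetSize, // ExtraData (offset only)
		32, 32,     // BaseFeePerGas, BlockHash
		offsetSize, // Transactions (offset only)
		offsetSize, // Withdrawals (offset only)
	}
	var size uint32
	for _, w := range widths {
		size += w
	}
	return size
}

// totalSize adds the dynamic content on top of the static part.
func totalSize(extraData []byte, txs [][]byte, withdrawals int) uint32 {
	size := staticSize()
	size += uint32(len(extraData))
	for _, tx := range txs {
		size += offsetSize + uint32(len(tx)) // per-item offset + item bytes
	}
	size += uint32(withdrawals) * withdrawalSize
	return size
}

func main() {
	fmt.Println(staticSize()) // prints 512
	fmt.Println(totalSize([]byte{1, 2, 3}, [][]byte{{1, 2}, {3}}, 2))
}
```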
The codec itself is very similar to the static example before:
func (e *ExecutionPayload) DefineSSZ(codec *ssz.Codec) {
// Define the static data (fields and dynamic offsets)
ssz.DefineStaticBytes(codec, &e.ParentHash) // Field ( 0) - ParentHash - 32 bytes
ssz.DefineStaticBytes(codec, &e.FeeRecipient) // Field ( 1) - FeeRecipient - 20 bytes
ssz.DefineStaticBytes(codec, &e.StateRoot) // Field ( 2) - StateRoot - 32 bytes
ssz.DefineStaticBytes(codec, &e.ReceiptsRoot) // Field ( 3) - ReceiptsRoot - 32 bytes
ssz.DefineStaticBytes(codec, &e.LogsBloom) // Field ( 4) - LogsBloom - 256 bytes
ssz.DefineStaticBytes(codec, &e.PrevRandao) // Field ( 5) - PrevRandao - 32 bytes
ssz.DefineUint64(codec, &e.BlockNumber) // Field ( 6) - BlockNumber - 8 bytes
ssz.DefineUint64(codec, &e.GasLimit) // Field ( 7) - GasLimit - 8 bytes
ssz.DefineUint64(codec, &e.GasUsed) // Field ( 8) - GasUsed - 8 bytes
ssz.DefineUint64(codec, &e.Timestamp) // Field ( 9) - Timestamp - 8 bytes
ssz.DefineDynamicBytesOffset(codec, &e.ExtraData, 32) // Offset (10) - ExtraData - 4 bytes
ssz.DefineUint256(codec, &e.BaseFeePerGas) // Field (11) - BaseFeePerGas - 32 bytes
ssz.DefineStaticBytes(codec, &e.BlockHash) // Field (12) - BlockHash - 32 bytes
ssz.DefineSliceOfDynamicBytesOffset(codec, &e.Transactions, 1_048_576, 1_073_741_824) // Offset (13) - Transactions - 4 bytes
ssz.DefineSliceOfStaticObjectsOffset(codec, &e.Withdrawals, 16) // Offset (14) - Withdrawals - 4 bytes
// Define the dynamic data (fields)
ssz.DefineDynamicBytesContent(codec, &e.ExtraData, 32) // Field (10) - ExtraData
ssz.DefineSliceOfDynamicBytesContent(codec, &e.Transactions, 1_048_576, 1_073_741_824) // Field (13) - Transactions
ssz.DefineSliceOfStaticObjectsContent(codec, &e.Withdrawals, 16) // Field (14) - Withdrawals
}
Most of the DefineXYZ methods are similar as before. However, you might spot two distinct sets of method calls, DefineXYZOffset and DefineXYZContent. You'll need to use these for dynamic fields:
- When SSZ encodes a dynamic object, it encodes it in two steps.
  - A 4-byte offset pointing to the dynamic data is written into the static SSZ area.
  - The dynamic object's actual encoding is written into the dynamic SSZ area.
- Encoding libraries can take two routes to handle this scenario:
  - Explicitly require the user to give one command to write the object offset, followed by another command later to write the object content. This is fast, but leaks out encoding detail into user code.
  - Require only one command from the user, under the hood writing the object offset immediately, and stashing the object itself away for later serialization when the dynamic area is reached. This keeps the offset notion hidden from users, but entails a GC hit to the encoder.
- This package was designed to be allocation free, thus the user needs to be aware that they must define the dynamic offset first and the dynamic content later. It's a tradeoff to achieve a 50-100% speed increase.
- You might also note that dynamic fields pass in size limits, and in two places at that. This is an unfortunate asymmetry in the SSZ spec between the encoding and hashing data layouts.
  - During encoding/decoding, dynamic data is placed at the end of the SSZ blob, so limits need to be passed to the DefineXYZContent methods.
  - During hashing, dynamic data is merkleized inline, mixed with static data, so limits need to be passed to the DefineXYZOffset methods.
  - This is a bit unfortunate. Either parameter set could be avoided at the cost of internal tracking, but that would break 0-alloc.
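The two-step layout is easier to see on a toy container than on the full payload. The sketch below is standard library only; the Sample type and encodeSample helper are hypothetical and exist purely for illustration. The static area holds the fixed fields plus one 4-byte offset per dynamic field, and the dynamic contents follow in the same order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Sample is a hypothetical container with two static and two dynamic fields.
type Sample struct {
	ID    uint64
	Extra []byte // dynamic
	Nonce uint64
	Note  []byte // dynamic
}

func encodeSample(s *Sample) []byte {
	const fixed = 8 + 4 + 8 + 4 // static fields + one offset per dynamic field

	buf := make([]byte, fixed, fixed+len(s.Extra)+len(s.Note))
	offset := uint32(fixed) // the first dynamic field starts right after the static area

	// Pass 1: static data and dynamic offsets, in field order.
	binary.LittleEndian.PutUint64(buf[0:8], s.ID)
	binary.LittleEndian.PutUint32(buf[8:12], offset) // where Extra will start
	offset += uint32(len(s.Extra))
	binary.LittleEndian.PutUint64(buf[12:20], s.Nonce)
	binary.LittleEndian.PutUint32(buf[20:24], offset) // where Note will start

	// Pass 2: dynamic contents, in the same order as their offsets.
	buf = append(buf, s.Extra...)
	buf = append(buf, s.Note...)
	return buf
}

func main() {
	blob := encodeSample(&Sample{ID: 1, Extra: []byte{0xaa, 0xbb}, Nonce: 2, Note: []byte{0xcc}})
	fmt.Printf("static:  %x\n", blob[:24])
	fmt.Printf("dynamic: %x\n", blob[24:])
}
```

Writing the offset and the content in two separate passes is what lets an encoder stream straight into the output without stashing anything away for later.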
To encode the above ExecutionPayload, do just as we have done with the static Withdrawal object.
For types defined in perfect isolation - dedicated for SSZ - it's easy to define the fields with the perfect types, and perfect sizes, and perfect everything. Generating or writing an elegant encoder for those is easy.
In reality, often you'll need to encode/decode types which already exist in a codebase, which might not map so cleanly onto the SSZ defined structure spec you want (e.g. you have one union type of ExecutionPayload that contains all the Bellatrix, Capella, Deneb, etc. fork fields together) and you want to encode/decode them differently based on the context.
Most SSZ libraries will not permit you to do such a thing. Reflection based libraries cannot infer the context in which they should switch encoders, nor can they represent multiple encodings at the same time. Generator based libraries again have no meaningful way to specify optional fields based on different constraints and contexts.
The only way to handle such scenarios is to write the encoders by hand, and furthermore, encoding might be dependent on what's in the struct, whilst decoding might be dependent on what it's contained within. Completely asymmetric, so our unified codec definition approach from the previous sections cannot work.
For these scenarios, this package has support for asymmetric encoders/decoders, where the caller can independently implement the two paths with their unique quirks.
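As a rough, standard-library-only illustration of that asymmetry, consider a hypothetical Payload with a field that only exists from some fork onwards. Nothing below is the library's API and the type is made up for this sketch: the encoder can decide from the struct's contents whether to emit the field, whereas the decoder has to be told by its caller which layout to expect.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Payload is a hypothetical multi-fork type: BlobGasUsed is nil before the
// fork that introduced it.
type Payload struct {
	Number      uint64
	BlobGasUsed *uint64
}

// encodePayload decides what to write based on what is in the struct.
func encodePayload(p *Payload) []byte {
	buf := make([]byte, 8, 16)
	binary.LittleEndian.PutUint64(buf, p.Number)
	if p.BlobGasUsed != nil {
		var tmp [8]byte
		binary.LittleEndian.PutUint64(tmp[:], *p.BlobGasUsed)
		buf = append(buf, tmp[:]...)
	}
	return buf
}

// decodePayload cannot look inside the (still empty) struct, so the caller
// supplies the context: does this fork carry the extra field or not?
func decodePayload(blob []byte, withBlobGas bool, p *Payload) error {
	want := 8
	if withBlobGas {
		want = 16
	}
	if len(blob) != want {
		return errors.New("unexpected payload size for this fork")
	}
	p.Number = binary.LittleEndian.Uint64(blob[0:8])
	p.BlobGasUsed = nil
	if withBlobGas {
		v := binary.LittleEndian.Uint64(blob[8:16])
		p.BlobGasUsed = &v
	}
	return nil
}

func main() {
	gas := uint64(131072)
	blob := encodePayload(&Payload{Number: 42, BlobGasUsed: &gas})

	var out Payload
	if err := decodePayload(blob, true, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Number, *out.BlobGasUsed)
}
```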
To avoid having a real-world example's complexity overshadow the point we're trying to make here, we'll just convert the previously demoed Withdrawal encoding/decoding from the unified codec version to a separate encoder and decoder version.
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
codec.DefineEncoder(func(enc *ssz.Encoder) {
ssz.EncodeUint64(enc, w.Index) // Field (0) - Index - 8 bytes
ssz.EncodeUint64(enc, w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.EncodeStaticBytes(enc, &w.Address) // Field (2) - Address - 20 bytes
ssz.EncodeUint64(enc, w.Amount) // Field (3) - Amount - 8 bytes
})
codec.DefineDecoder(func(dec *ssz.Decoder) {
ssz.DecodeUint64(dec, &w.Index) // Field (0) - Index - 8 bytes
ssz.DecodeUint64(dec, &w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.DecodeStaticBytes(dec, &w.Address) // Field (2) - Address - 20 bytes
ssz.DecodeUint64(dec, &w.Amount) // Field (3) - Amount - 8 bytes
})
}
- As you can see, we piggy-back on the already existing ssz.Object's DefineSSZ method, and do not require implementing new functions. This is good because we want to be able to seamlessly use unified or split encoders without having to tell everyone about it.
- Whereas previously we had a bunch of DefineXYZ methods to enumerate the fields for the unified encoding/decoding, here we replaced them with separate definitions for the encoder and decoder via codec.DefineEncoder and codec.DefineDecoder.
- The implementation of the encoder and decoder follows the exact same pattern and naming conventions as with the codec, but instead of operating on an ssz.Codec object, we're operating on ssz.Encoder / ssz.Decoder objects; and instead of calling methods named ssz.DefineXYZ, we're calling methods named ssz.EncodeXYZ and ssz.DecodeXYZ.
- Perhaps note, the EncodeXYZ methods do not take pointers to everything anymore, since they do not require the ability to instantiate the field during operation. Still, static bytes are passed by pointer to avoid heavy copy overheads for large arrays.
To encode the above Withdrawal into an SSZ stream, use the same calls as before. Everything is seamless.
If your types are using strongly typed arrays (e.g. [32]byte, and not []byte) for static lists, the above code works just fine. However, some types might want to use []byte as the field type, but have it still behave as if it was [32]byte. This poses an issue, because if the decoder only sees []byte, it cannot figure out how much data you want to decode into it. For those scenarios, we have checked methods.
The previous Withdrawal is a good example. Let's replace the type Address [20]byte alias with a plain []byte slice (not a [20]byte array, rather an opaque []byte slice).
type Withdrawal struct {
Index uint64
Validator uint64
Address []byte
Amount uint64
}
The code for the SizeSSZ
remains the same. The code for DefineSSZ
changes ever so slightly:
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
ssz.DefineUint64(codec, &w.Index) // Field (0) - Index - 8 bytes
ssz.DefineUint64(codec, &w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.DefineCheckedStaticBytes(codec, &w.Address, 20) // Field (2) - Address - 20 bytes
ssz.DefineUint64(codec, &w.Amount) // Field (3) - Amount - 8 bytes
}
Notably, the ssz.DefineStaticBytes call from our old code (which was given a [20]byte array) is replaced with ssz.DefineCheckedStaticBytes. The latter method operates on an opaque []byte slice, so if we want it to behave like a statically sized list, we need to tell it how large it needs to be. This will result in a runtime check to ensure that the size is correct before decoding.
Note, checked methods entail a runtime cost. When decoding such opaque slices, we can't blindly fill the fields with data, rather we need to ensure that they are allocated and that they are of the correct size. Ideally only use checked methods for prototyping or for pre-existing types where you just have to run with whatever you have and can't change the field to an array.
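To give a feel for where that runtime cost comes from, here is a standard-library-only sketch of the kind of work a checked call has to do. Both helpers and the error value are hypothetical and are not the library's internals: on encode the slice length must be validated, and on decode the slice may first have to be allocated to the declared size.

```go
package main

import (
	"errors"
	"fmt"
)

var errSizeMismatch = errors.New("static bytes size mismatch")

// encodeCheckedStaticBytes appends field to out, but only if the opaque
// slice really has the declared static size.
func encodeCheckedStaticBytes(out []byte, field []byte, size int) ([]byte, error) {
	if len(field) != size {
		return out, errSizeMismatch
	}
	return append(out, field...), nil
}

// decodeCheckedStaticBytes fills *field with exactly size bytes from in,
// allocating the slice if it is nil or has the wrong length. It returns the
// remaining input.
func decodeCheckedStaticBytes(field *[]byte, in []byte, size int) ([]byte, error) {
	if len(in) < size {
		return in, errors.New("short buffer")
	}
	if len(*field) != size {
		*field = make([]byte, size) // an array field would never need this
	}
	copy(*field, in[:size])
	return in[size:], nil
}

func main() {
	var addr []byte // opaque slice, meant to behave like [20]byte
	rest, err := decodeCheckedStaticBytes(&addr, make([]byte, 28), 20)
	fmt.Println(len(addr), len(rest), err)

	_, err = encodeCheckedStaticBytes(nil, []byte{1, 2, 3}, 20)
	fmt.Println(err)
}
```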
More often than not, the Go structs that you'd like to serialize to/from SSZ are simple data containers. Without some particular quirk you'd like to explicitly support, there's little reason to spend precious time counting the bits and digging through a long list of encoder methods to call.
For those scenarios, the library also supports generating the encoding/decoding code via a Go command:
go run github.com/karalabe/ssz/cmd/sszgen --help
Let's go back to our very simple Withdrawal
type from way back.
type Withdrawal struct {
Index uint64
Validator uint64
Address [20]byte
Amount uint64
}
This seems like a fairly simple thing that we should be able to automatically generate a codec for. Let's try:
go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal
Calling the generator on this type will produce the following (very nice I might say) code:
// Code generated by github.com/karalabe/ssz. DO NOT EDIT.
package main
import "github.com/karalabe/ssz"
// SizeSSZ returns the total size of the static ssz object.
func (obj *Withdrawal) SizeSSZ() uint32 {
return 8 + 8 + 20 + 8
}
// DefineSSZ defines how an object is encoded/decoded.
func (obj *Withdrawal) DefineSSZ(codec *ssz.Codec) {
ssz.DefineUint64(codec, &obj.Index) // Field (0) - Index - 8 bytes
ssz.DefineUint64(codec, &obj.Validator) // Field (1) - Validator - 8 bytes
ssz.DefineStaticBytes(codec, &obj.Address) // Field (2) - Address - 20 bytes
ssz.DefineUint64(codec, &obj.Amount) // Field (3) - Amount - 8 bytes
}
It has everything we would have written ourselves: SizeSSZ and DefineSSZ... and it also has a lot of useful comments we for sure wouldn't have written ourselves. Generator for the win!
Ok, but this was too easy. All the fields of the Withdrawal object were primitive types of known lengths, so there's no heavy lifting involved at all. Let's take a look at a juicier example.
For our complex test, let's pick our dynamic ExecutionPayload type from before, but let's make it as hard as it gets and remove all size information from the Go types (e.g. instead of using [32]byte, we can make it extra hard by using []byte only).
Now, obviously, if we were to write serialization code by hand, we'd take advantage of our knowledge of what each of these fields is semantically, so we could provide the necessary sizes for a decoder to use. If we want to, however, generate the serialization code, we need to share all that "insider-knowledge" with the code generator somehow.
The standard way in the Go world is through struct tags. Specifically in the context of this library, it will be through the ssz-size and ssz-max tags. These follow the convention set previously by other Go SSZ libraries:
- ssz-size can be used to declare a field having a static size.
- ssz-max can be used to declare a field having a dynamic size with a size cap.
- Both tags support multiple dimensions via comma-separation and omitting via ?.
type ExecutionPayload struct {
ParentHash []byte `ssz-size:"32"`
FeeRecipient []byte `ssz-size:"32"`
StateRoot []byte `ssz-size:"20"`
ReceiptsRoot []byte `ssz-size:"32"`
LogsBloom []byte `ssz-size:"256"`
PrevRandao []byte `ssz-size:"32"`
BlockNumber uint64
GasLimit uint64
GasUsed uint64
Timestamp uint64
ExtraData []byte `ssz-max:"32"`
BaseFeePerGas *uint256.Int
BlockHash []byte `ssz-size:"32"`
Transactions [][]byte `ssz-max:"1048576,1073741824"`
Withdrawals []*Withdrawal `ssz-max:"16"`
}
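If the tag syntax feels abstract, the sketch below reads such tags back with the standard reflect package. It is only an illustration of the tag format: parseDims and fieldDims are hypothetical helpers, the trimmed Payload type and its Roots field (combining ssz-size:"?,32" with ssz-max, as seen in other Go SSZ libraries) are assumptions, and this is not a description of how the sszgen generator is implemented.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// Payload is a trimmed, hypothetical type used only to carry some tags.
type Payload struct {
	ParentHash   []byte   `ssz-size:"32"`
	Transactions [][]byte `ssz-max:"1048576,1073741824"`
	Roots        [][]byte `ssz-size:"?,32" ssz-max:"16"`
	BlockNumber  uint64
}

// parseDims splits a comma-separated tag into one number per dimension.
// An omitted dimension ("?") is reported as -1.
func parseDims(tag string) ([]int, error) {
	if tag == "" {
		return nil, nil
	}
	parts := strings.Split(tag, ",")
	dims := make([]int, len(parts))
	for i, p := range parts {
		p = strings.TrimSpace(p)
		if p == "?" {
			dims[i] = -1
			continue
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid dimension %q: %w", p, err)
		}
		dims[i] = n
	}
	return dims, nil
}

// fieldDims returns the ssz-size and ssz-max dimensions of a struct field.
func fieldDims(v interface{}, name string) (size, limit []int, err error) {
	f, ok := reflect.TypeOf(v).FieldByName(name)
	if !ok {
		return nil, nil, fmt.Errorf("no field named %s", name)
	}
	if size, err = parseDims(f.Tag.Get("ssz-size")); err != nil {
		return nil, nil, err
	}
	if limit, err = parseDims(f.Tag.Get("ssz-max")); err != nil {
		return nil, nil, err
	}
	return size, limit, nil
}

func main() {
	for _, name := range []string{"ParentHash", "Transactions", "Roots", "BlockNumber"} {
		size, limit, err := fieldDims(Payload{}, name)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-12s ssz-size=%v ssz-max=%v\n", name, size, limit)
	}
}
```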
Calling the generator as before, just with the ExecutionPayload, yields the below, much more interesting code:
// Code generated by github.com/karalabe/ssz. DO NOT EDIT.
package main
import "github.com/karalabe/ssz"
// SizeSSZ returns either the static size of the object if fixed == true, or
// the total size otherwise.
func (obj *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
var size = uint32(32 + 32 + 20 + 32 + 256 + 32 + 8 + 8 + 8 + 8 + 4 + 32 + 32 + 4 + 4)
if fixed {
return size
}
size += ssz.SizeDynamicBytes(obj.ExtraData)
size += ssz.SizeSliceOfDynamicBytes(obj.Transactions)
size += ssz.SizeSliceOfStaticObjects(obj.Withdrawals)
return size
}
// DefineSSZ defines how an object is encoded/decoded.
func (obj *ExecutionPayload) DefineSSZ(codec *ssz.Codec) {
// Define the static data (fields and dynamic offsets)
ssz.DefineCheckedStaticBytes(codec, &obj.ParentHash, 32) // Field ( 0) - ParentHash - 32 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.FeeRecipient, 32) // Field ( 1) - FeeRecipient - 32 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.StateRoot, 20) // Field ( 2) - StateRoot - 20 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.ReceiptsRoot, 32) // Field ( 3) - ReceiptsRoot - 32 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.LogsBloom, 256) // Field ( 4) - LogsBloom - 256 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.PrevRandao, 32) // Field ( 5) - PrevRandao - 32 bytes
ssz.DefineUint64(codec, &obj.BlockNumber) // Field ( 6) - BlockNumber - 8 bytes
ssz.DefineUint64(codec, &obj.GasLimit) // Field ( 7) - GasLimit - 8 bytes
ssz.DefineUint64(codec, &obj.GasUsed) // Field ( 8) - GasUsed - 8 bytes
ssz.DefineUint64(codec, &obj.Timestamp) // Field ( 9) - Timestamp - 8 bytes
ssz.DefineDynamicBytesOffset(codec, &obj.ExtraData, 32) // Offset (10) - ExtraData - 4 bytes
ssz.DefineUint256(codec, &obj.BaseFeePerGas) // Field (11) - BaseFeePerGas - 32 bytes
ssz.DefineCheckedStaticBytes(codec, &obj.BlockHash, 32) // Field (12) - BlockHash - 32 bytes
ssz.DefineSliceOfDynamicBytesOffset(codec, &obj.Transactions, 1048576, 1073741824) // Offset (13) - Transactions - 4 bytes
ssz.DefineSliceOfStaticObjectsOffset(codec, &obj.Withdrawals, 16) // Offset (14) - Withdrawals - 4 bytes
// Define the dynamic data (fields)
ssz.DefineDynamicBytesContent(codec, &obj.ExtraData, 32) // Field (10) - ExtraData - ? bytes
ssz.DefineSliceOfDynamicBytesContent(codec, &obj.Transactions, 1048576, 1073741824) // Field (13) - Transactions - ? bytes
ssz.DefineSliceOfStaticObjectsContent(codec, &obj.Withdrawals, 16) // Field (14) - Withdrawals - ? bytes
}
Points of interest to note:
- The generator realized that this type contains dynamic fields (either through ssz-max tags or via embedded dynamic objects), so it generated an implementation for ssz.DynamicObject (vs. ssz.StaticObject in the previous section).
- The generator took into consideration all the ssz-size and ssz-max tags to generate serialization calls with different base types and runtime size checks.
  - Note, it is less performant to have runtime size checks like this, so if you know the size of a field, arrays are always preferable vs dynamic lists.
We've seen that the size of a field can either be deduced automatically, or it can be provided to the generator explicitly. But what happens if we provide an ssz struct tag for a field of known size?
type Withdrawal struct {
Index uint64 `ssz-size:"8"`
Validator uint64 `ssz-size:"8"`
Address [20]byte `ssz-size:"32"` // Deliberately wrong tag size
Amount uint64 `ssz-size:"8"`
}
go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal
failed to validate field Withdrawal.Address: array of byte basic type tag conflict: field is 20 bytes, tag wants [32] bytes
The code generator will take into consideration the information in both the field's Go type and the struct tag, and will cross validate them against each other. If there's a size conflict, it will abort the code generation.
This functionality can be very helpful in detecting refactor issues, where the user changes the type of a field, which would result in a different encoding. By having the field tagged with an ssz-size
, such an error would be detected.
As such, we'd recommend always tagging all SSZ encoded fields with their sizes. It results in both safer code and self-documenting code.
Perhaps just a mention, anyone using the code generator should call it from a go:generate directive. It is much simpler, and once added to the code, it can always be invoked by running go generate.
When generating code for multiple types at once (with one call or many), there's one ordering issue you need to be aware of.
When the code generator finds a field that is a struct of some sort, it needs to decide if it's a static or a dynamic type. To do that, it relies on checking if the type implements the ssz.StaticObject or ssz.DynamicObject interface. If it doesn't implement either, the generator will error.
This means, however, that if you have a type that's embedded in another type (e.g. in our examples above, Withdrawal was embedded inside ExecutionPayload in a slice), you need to generate the code for the inner type first, and then the outer type. This ensures that when the outer type is resolving the interface of the inner one, that is already generated and available.
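Put together, the go:generate advice and the ordering rule could look like the sketch below. It reuses the exact generator invocation shown earlier and relies on go generate running the directives of a file from top to bottom; the ExecutionPayload here is trimmed to two fields for brevity, and any flags controlling where the generated code ends up are not covered by this document, so none are shown.

```go
package main

import "fmt"

// The inner type is listed first, so its codec already exists by the time the
// generator inspects the outer type that contains it.
//
//go:generate go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal
//go:generate go run github.com/karalabe/ssz/cmd/sszgen --type ExecutionPayload

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// ExecutionPayload is trimmed down for this sketch.
type ExecutionPayload struct {
	ExtraData   []byte        `ssz-max:"32"`
	Withdrawals []*Withdrawal `ssz-max:"16"`
}

func main() {
	fmt.Println("run `go generate ./...` to (re)create the codecs")
}
```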
Half the SSZ spec is about encoding/decoding data into a binary format, the other half is about proving the data via Merkle Proofs.
The same way that encoding/decoding has a "symmetric" and "asymmetric" API, so does merkleization. What's more, the symmetric API is actually exactly the same as for encoding/decoding, with no code changes necessary!
Taking our very simple Withdrawal type and its codec code:
type Address [20]byte
type Withdrawal struct {
Index uint64
Validator uint64
Address Address
Amount uint64
}
func (w *Withdrawal) SizeSSZ() uint32 { return 44 }
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
ssz.DefineUint64(codec, &w.Index) // Field (0) - Index - 8 bytes
ssz.DefineUint64(codec, &w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.DefineStaticBytes(codec, &w.Address) // Field (2) - Address - 20 bytes
ssz.DefineUint64(codec, &w.Amount) // Field (3) - Amount - 8 bytes
}
Hashing this works out of the box. To merkleize the above Withdrawal and calculate its Merkle trie root, use either ssz.HashSequential or ssz.HashConcurrent. The former will run on a single thread and use 0 allocations, whereas the latter might run on multiple threads concurrently (if large enough fields are present) and use O(1) memory.
func main() {
hash := ssz.HashSequential(new(Withdrawal))
fmt.Printf("hash: %#x\n", hash)
}
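For the curious, the root of such a small container can also be reproduced by hand with nothing but crypto/sha256. The sketch below is not the library's hasher, and withdrawalRoot, uint64Chunk and hashPair are hypothetical helpers. It assumes the SSZ merkleization rules for this shape: each of the four fields becomes one 32-byte chunk (integers little-endian, everything right-padded with zeroes), and the four chunks are hashed pairwise into a two-level binary tree.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// uint64Chunk places a little-endian uint64 into a zero-padded 32-byte chunk.
func uint64Chunk(v uint64) [32]byte {
	var chunk [32]byte
	binary.LittleEndian.PutUint64(chunk[:8], v)
	return chunk
}

// hashPair computes the parent node of two sibling chunks.
func hashPair(a, b [32]byte) [32]byte {
	var buf [64]byte
	copy(buf[:32], a[:])
	copy(buf[32:], b[:])
	return sha256.Sum256(buf[:])
}

// withdrawalRoot merkleizes the four field chunks into a single root.
func withdrawalRoot(w *Withdrawal) [32]byte {
	var addr [32]byte
	copy(addr[:], w.Address[:]) // 20 bytes of address, 12 bytes of padding

	left := hashPair(uint64Chunk(w.Index), uint64Chunk(w.Validator))
	right := hashPair(addr, uint64Chunk(w.Amount))
	return hashPair(left, right)
}

func main() {
	fmt.Printf("hash: %#x\n", withdrawalRoot(new(Withdrawal)))
}
```

Four fields happen to fill a perfect two-level tree; containers with a field count that is not a power of two would additionally need zero chunks as padding leaves.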
If for some reason you have a type that requires custom encoders/decoders, chances are high that it will also require a custom hasher. For those cases, this library provides an API surface very similar to how the asymmetric encoding/decoding worked:
func (w *Withdrawal) DefineSSZ(codec *ssz.Codec) {
codec.DefineEncoder(func(enc *ssz.Encoder) {
ssz.EncodeUint64(enc, w.Index) // Field (0) - Index - 8 bytes
ssz.EncodeUint64(enc, w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.EncodeStaticBytes(enc, &w.Address) // Field (2) - Address - 20 bytes
ssz.EncodeUint64(enc, w.Amount) // Field (3) - Amount - 8 bytes
})
codec.DefineDecoder(func(dec *ssz.Decoder) {
ssz.DecodeUint64(dec, &w.Index) // Field (0) - Index - 8 bytes
ssz.DecodeUint64(dec, &w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.DecodeStaticBytes(dec, &w.Address) // Field (2) - Address - 20 bytes
ssz.DecodeUint64(dec, &w.Amount) // Field (3) - Amount - 8 bytes
})
codec.DefineHasher(func(has *ssz.Hasher) {
ssz.HashUint64(has, w.Index) // Field (0) - Index - 8 bytes
ssz.HashUint64(has, w.Validator) // Field (1) - ValidatorIndex - 8 bytes
ssz.HashStaticBytes(has, &w.Address) // Field (2) - Address - 20 bytes
ssz.HashUint64(has, w.Amount) // Field (3) - Amount - 8 bytes
})
}
To hash the above Withdrawal into a Merkle trie root, use the same calls as before. Everything is seamless.
The table below is a summary of the methods available for SizeSSZ
and DefineSSZ
:
- The Size API is to be used to implement the SizeSSZ method's dynamic parts.
- The Symmetric API is to be used if the encoding/decoding/hashing doesn't require specialised logic.
- The Asymmetric API is to be used if encoding or decoding or hashing requires special casing.
If some type you need is missing, please open an issue, so it can be added.
¹Type is from github.com/holiman/uint256.
²Type is from github.com/prysmaticlabs/go-bitfield.
The goal of this package is to be close in performance to low level generated encoders, without sacrificing maintainability. It should, however, be significantly faster than runtime reflection encoders.
The package includes a set of benchmarks for handling the beacon spec types and test datasets. You can run them with go test ./tests --bench=. These can be interesting for some baseline numbers, but they are unrealistic with regard to live beacon state data.
If you want to see the performance on a more realistic piece of data, you'll need to provide a beacon state SSZ object and place it into the project root named state.ssz. You can then run go test --bench=Mainnet ./tests/manual_test.go to explicitly run this one benchmark. A sample output running against a 208MB state export from around June 11, 2024, on a MacBook Pro M2 Max:
go test --bench=Mainnet ./tests/manual_test.go
BenchmarkMainnetState/beacon-state/208757379-bytes/encode-12 26 45164494 ns/op 4622.16 MB/s 74 B/op 0 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/decode-12 27 40984980 ns/op 5093.51 MB/s 8456490 B/op 54910 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/merkleize-sequential-12 2 659472250 ns/op 316.55 MB/s 904 B/op 1 allocs/op
BenchmarkMainnetState/beacon-state/208757379-bytes/merkleize-concurrent-12 9 113414449 ns/op 1840.66 MB/s 16416 B/op 108 allocs/op