This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Merge pull request #358 from vulcanize/ethereum-data-structures
Ethereum data structures
warpfork authored Apr 25, 2021
2 parents 7e74a9e + f5ae917 commit 7f5db3b
Showing 5 changed files with 503 additions and 0 deletions.
17 changes: 17 additions & 0 deletions data-structures/ethereum/README.md
@@ -0,0 +1,17 @@
# Ethereum as an IPLD Data Structure

Within these documents, schemas are grouped by their serialized blocks.
Other than those types listed in "Basic Types", the top-level schema type in each grouping of schema
types in a code block represents a data structure that is serialized into a single IPLD block with its own Link (CID).

Some state data structures repeat the same form: a modified Merkle Patricia trie node.
They are deliberately not de-duplicated here, so that the different purposes and contents of those data structures remain clear.

For more information about the IPLD Schema language, see the [specification](https://specs.ipld.io/schemas/).

## Data Structure Descriptions

* [Ethereum Data Structures **Basic Types**](basic_types.md)
* [Ethereum **Chain** Data Structures](chain.md)
* [Ethereum **Convenience Types**](convenience_types.md)
* [Ethereum **State** Data Structures](state.md)
31 changes: 31 additions & 0 deletions data-structures/ethereum/basic_types.md
@@ -0,0 +1,31 @@
# Ethereum Data Structure Basic Types
These types are used throughout the Ethereum data structures but are themselves not IPLD blocks.

```ipldsch
# Go big.Int
# Prefer presenting to users either as a number or a string view of the decimal number
# for readability.
type BigInt bytes
# Unsigned integer
# Used to explicitly specify that an integer cannot be negative
type Uint int
# Block nonce is an 8 byte binary representation of a block's nonce
type BlockNonce bytes
# Hash represents the 32 byte KECCAK_256 hash of arbitrary data.
type Hash bytes
# Address represents the 20 byte address of an Ethereum account.
type Address bytes
# Bloom represents a 256 byte bloom filter.
type Bloom bytes
# Balance represents an account's balance in units of wei (1*10^-18 ETH)
type Balance BigInt
# OpCode is a 1 byte EVM opcode
type OpCode bytes
```
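
As a rough illustration of the `BigInt` presentation advice above, the sketch below decodes a `BigInt` into a decimal string and expresses a `Balance` in ETH. It assumes the bytes hold the big-endian magnitude that Go's `big.Int.Bytes()` produces; the helper names are hypothetical and are not part of any IPLD library.

```python
from decimal import Decimal

WEI_PER_ETH = 10**18  # 1 wei = 1*10^-18 ETH


def bigint_to_int(raw: bytes) -> int:
    """Interpret BigInt bytes as an unsigned big-endian integer (assumed encoding)."""
    return int.from_bytes(raw, "big")


def bigint_to_decimal_string(raw: bytes) -> str:
    """String view of the decimal number, as suggested for presentation to users."""
    return str(bigint_to_int(raw))


def balance_in_eth(raw: bytes) -> Decimal:
    """Convert a Balance (in wei) to ETH without floating point rounding."""
    return Decimal(bigint_to_int(raw)) / Decimal(WEI_PER_ETH)


# Example value: 1.5 ETH expressed in wei, stored as 8 big-endian bytes.
one_and_a_half_eth = (1_500_000_000_000_000_000).to_bytes(8, "big")
print(bigint_to_decimal_string(one_and_a_half_eth))
print(balance_in_eth(one_and_a_half_eth))
```

`Decimal` is used instead of `float` because balances routinely exceed the precision of a 64-bit float.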
113 changes: 113 additions & 0 deletions data-structures/ethereum/chain.md
@@ -0,0 +1,113 @@
# Ethereum Chain Data Structures

This section contains the IPLD schemas for the blockchain data structures of Ethereum.
This includes headers, uncle sets, transactions, and receipts. The state trie, storage trie,
receipt trie, and transaction trie IPLDs are described in the [state](state.md) section. Note
that getting from a header to a specific transaction or receipt requires traversing
the respective trie, beginning at the root referenced in the header. Uncles, in contrast, are referenced
directly from the header by the hash of the RLP encoded list of uncles.

## Header IPLD

This is the IPLD schema for a canonical Ethereum block header.
* The IPLD block is the RLP encoded header
* Links to headers use a KECCAK_256 multihash of the RLP encoded header and the EthHeader codec (0x90).
* Each header references its parent header.
* The genesis header is unique in that it does not reference a parent header in `ParentCID`; instead, it contains a reference to a `GenesisInfo` ADL.

```ipldsch
# Header contains the consensus fields of an Ethereum block header
type Header struct {
# CID link to the parent header
# This CID is composed of the KECCAK_256 multihash of the linked RLP encoded header and the EthHeader codec (0x90)
ParentCID &Header
# CID link to the list of uncles at this block
# This CID is composed of the KECCAK_256 multihash of the RLP encoded list of Uncles and the EthHeaderList codec (0x91)
# Note that an uncle is simply a header that does not have an associated body
UnclesCID &Uncles
Coinbase Address
# CID link to the root node of the state trie
# This CID is composed of the KECCAK_256 multihash of the RLP encoded state trie root node and the EthStateTrie codec (0x96)
# This steps us down into the state trie, from which we can link to the rest of the state trie nodes and all the linked storage tries
StateRootCID &StateTrieNode
# CID link to the root node of the transaction trie
# This CID is composed of the KECCAK_256 multihash of the RLP encoded tx trie root node and the EthTxTrie codec (0x92)
# This steps us down into the transaction trie, from which we can link to the rest of the tx trie nodes and all of the linked transactions
TxRootCID &TxTrieNode
# CID link to the root of the receipt trie
# This CID is composed of the KECCAK_256 multihash of the RLP encoded rct trie root node and the EthTxReceiptTrie codec (0x94)
# This steps us down into the receipt trie, from which we can link to the rest of the rct trie nodes and all of the linked receipts
RctRootCID &RctTrieNode
Bloom Bloom
Difficulty BigInt
Number BigInt
GasLimit Uint
GasUsed Uint
Time Uint
Extra Bytes
MixDigest Hash
Nonce BlockNonce
}
```
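
The link comments above all describe the same recipe: a KECCAK_256 multihash combined with a content-type codec. The sketch below shows how such a CIDv1 could be assembled in binary form. It is an illustration under assumptions, not a reference implementation: the function and table names are hypothetical, `0x1b` is the keccak-256 code from the multicodec table, and the digest is taken as an input because Keccak-256 is not available in the Python standard library (`hashlib.sha3_256` is the NIST SHA-3 variant, which uses different padding).

```python
KECCAK_256 = 0x1B  # multihash code for keccak-256

# Codec values as given in this document; the key names are illustrative.
ETH_CODECS = {
    "EthHeader": 0x90,
    "EthHeaderList": 0x91,
    "EthTxTrie": 0x92,
    "EthTx": 0x93,
    "EthTxReceiptTrie": 0x94,
    "EthTxReceipt": 0x95,
    "EthStateTrie": 0x96,
}


def uvarint(n: int) -> bytes:
    """Unsigned varint (LEB128), the integer encoding used by multiformats."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def eth_cid(codec_name: str, keccak_digest: bytes) -> bytes:
    """Binary CIDv1: <version><codec><multihash code><digest length><digest>."""
    if len(keccak_digest) != 32:
        raise ValueError("a KECCAK_256 digest must be 32 bytes")
    multihash = uvarint(KECCAK_256) + uvarint(len(keccak_digest)) + keccak_digest
    return uvarint(1) + uvarint(ETH_CODECS[codec_name]) + multihash


# A placeholder digest stands in for keccak256(rlp(header)).
placeholder_digest = bytes(32)
print(eth_cid("EthHeader", placeholder_digest).hex())
```

Because the digest inside the CID is the same hash Ethereum already uses to reference the object, a block hash can be converted to a `Header` CID (and back) without touching the underlying data.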

## Uncles IPLD
This is the IPLD schema for a list of uncles sorted in ascending order by block number.
* The IPLD block is the RLP encoded list of uncles
* CID links to `Uncles` use a KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91).
* `Uncles` is referenced in an Ethereum `Header` by the `UnclesCID`.

```ipldsch
# Uncles contains an ordered list of Ethereum uncles (headers that have no associated body)
# This IPLD object is referenced by a CID composed of the KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91)
type Uncles [Header]
```

## Transaction IPLD
This is the IPLD schema for a canonical Ethereum transaction. It contains only the fields required for consensus.
Note that this will need to be updated once EIP-1559 and EIP-2718 are approved.
* The IPLD block is the RLP encoded transaction
* CID links to `Transaction` use a KECCAK_256 multihash of the RLP encoded transaction and the EthTx codec (0x93).
* `Transaction` IPLDs are not referenced directly from an Ethereum `Header` but are instead linked to from within the transaction trie whose root is referenced in the `Header` by the `TxRootCID`.
```ipldsch
# Transaction contains the consensus fields of an Ethereum transaction
type Transaction struct {
AccountNonce Uint
Price BigInt
GasLimit Uint
Recipient nullable Address # null recipient means the tx is a contract creation
Amount BigInt
Payload Bytes
# Signature values
V BigInt
R BigInt
S BigInt
}
```

## Receipt IPLD
This is the IPLD schema for a canonical Ethereum receipt. It contains only the fields required for consensus.
* The IPLD block is the RLP encoded receipt
* CID links to `Receipt` use a KECCAK_256 multihash of the RLP encoded receipt and the EthTxReceipt codec (0x95).
* `Receipt` IPLDs are not referenced directly from an Ethereum `Header` but are instead linked to from within the receipt trie whose root is referenced in the `Header` by the `RctRootCID`.
```ipldsch
# Receipt contains the consensus fields of an Ethereum receipt
type Receipt struct {
PostStateOrStatus Bytes
CumulativeGasUsed Uint
Bloom Bloom
Logs [Log]
}
# Log contains the consensus fields of an Ethereum receipt log
type Log struct {
Address Address
Topics [Hash]
Data Bytes
}
```
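
The `PostStateOrStatus` field holds one of two things, depending on the block's era. The sketch below shows one way to tell them apart, assuming the convention used by go-ethereum: a 32 byte value is a pre-Byzantium post-transaction state root, `0x01` is a successful status, and an empty value is a failed status. The function name is hypothetical.

```python
from typing import Tuple, Union


def interpret_post_state_or_status(value: bytes) -> Tuple[str, Union[bool, bytes]]:
    """Classify the first receipt field as a status flag or a state root."""
    if value == b"\x01":
        return ("status", True)       # transaction succeeded
    if value == b"":
        return ("status", False)      # transaction failed
    if len(value) == 32:
        return ("post_state", value)  # intermediate state root
    raise ValueError(f"invalid receipt status {value.hex()}")


print(interpret_post_state_or_status(b"\x01"))
```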
208 changes: 208 additions & 0 deletions data-structures/ethereum/convenience_types.md
@@ -0,0 +1,208 @@
# Convenience IPLD types

The types described below are not referenced directly from within the canonical Ethereum merkle tree.
Instead, these types can be constructed and verified from underlying canonical Ethereum IPLD structures using the algorithms described here.
These types are introduced to improve the convenience and performance of accessing and working with the Ethereum objects for certain purposes.

## Transaction Trace IPLD

Transaction traces contain the EVM context, input, and output for each individual OPCODE operation performed during the application of a transaction on a certain state.
These objects can be generated or verified by applying the referenced transactions on top of the referenced state.
* The IPLD block is the RLP encoded object
* CID links to `TxTrace` use a KECCAK_256 multihash of the RLP encoded object and the EthTxTrace codec (tbd).

```ipldsch
# TxTrace contains the EVM context, input, and output for each OPCODE in a transaction that was applied to a specific state
type TxTrace struct {
# List of CIDs linking to the transactions that were used to generate this trace by applying them onto the state referenced below
# If this trace was produced by the first transaction in a block then this list will contain only that one transaction
# and this trace was produced by applying it directly to the referenced state
# Otherwise, only the last transaction in the list is the one directly responsible for producing this trace whereas the
# preceding ones were sequentially applied to the referenced state to generate the intermediate state that the final,
# trace-producing transaction was applied on top of
# This is analogous to the TransactionHashes IPLD defined below, but only in the case of a trace produced by the last
# transaction in a block will the list be the same as a complete TransactionHashes IPLD
TxCIDs [&Transaction]
# CID link to the root node of the state trie that the above transaction set was applied on top of to produce this trace
StateRootCID &StateTrieNode
Result Bytes
Frames [Frame]
Gas Uint
Failed Bool
}
# Frame represents the EVM context, input, and output for a specific OPCODE during a transaction trace
type Frame struct {
Op OpCode
From Address
To Address
Input Bytes
Output Bytes
Gas Uint
Cost Uint
Value BigInt
}
```

Provided a `Header` multihash/CID and a transaction index, we can generate a `TxTrace` by
1) Fetching and decoding the `Header` IPLD.
2) Stepping down into the transaction trie referenced in the header.
    1) Collecting the transaction at the provided index and all transactions with indexes lower than the provided index.
    2) KECCAK_256 hashing each transaction.
    3) Converting the hashes to CIDs using the KECCAK_256 multihash and EthTx codec.
    4) Ordering these CIDs in a list by transaction index.
3) Collecting the `StateRootCID` from within this `Header`.
4) Using [ipfs-ethdb](https://github.com/vulcanize/ipfs-ethdb) with the state root linked in the `Header` to instantiate an EVM on top of the state of this block.
5) Applying each of the transactions on top of this state using the ipfs-ethdb based EVM.
6) Collecting the trace output from the EVM for the final transaction applied.
7) Assembling the trace output, the `Transaction` CIDs, and the root `StateTrieNode` CID into the `TxTrace` object.
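
The transaction-collection part of the procedure above (building the `TxCIDs` list) can be sketched as follows. This is illustrative only: `tx_cids_for_trace` is a hypothetical helper, the transactions are assumed to be already read from the trie and ordered by index, and the Keccak-256 function is passed in as a parameter because it is not part of the Python standard library. Instantiating the EVM and collecting the trace are outside the scope of the sketch.

```python
from typing import Callable, List, Sequence

KECCAK_256 = 0x1B  # multihash code for keccak-256
ETH_TX = 0x93      # EthTx codec, as given in this document


def uvarint(n: int) -> bytes:
    """Unsigned varint (LEB128), the integer encoding used by multiformats."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def tx_cids_for_trace(ordered_txs: Sequence[bytes], tx_index: int,
                      keccak256: Callable[[bytes], bytes]) -> List[bytes]:
    """CIDs for the traced transaction and every transaction before it.

    ordered_txs holds the RLP encoded transactions of one block, by index.
    """
    if not 0 <= tx_index < len(ordered_txs):
        raise IndexError("transaction index out of range")
    cids = []
    for rlp_tx in ordered_txs[: tx_index + 1]:
        digest = keccak256(rlp_tx)
        multihash = uvarint(KECCAK_256) + uvarint(len(digest)) + digest
        cids.append(uvarint(1) + uvarint(ETH_TX) + multihash)
    return cids
```

The last CID in the returned list identifies the transaction that produces the trace; the earlier ones only bring the state forward to the point where that transaction executes.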

## Block IPLD

`Block` IPLD represents an entire block (header + body) in the Ethereum blockchain. It contains direct content hash references to
the sets of transactions and receipts for that block in order to avoid the need to traverse the transaction
and receipt tries to collect these sets (as is required when starting from a canonical `Header` block).
These objects can be generated or verified by following the links within the contained `Header` to collect the `Transactions` and `Receipts`
from the referenced transaction and receipt tries.
* The IPLD block is a CBOR serialization of the object
* CID links to `Block` use a KECCAK_256 multihash of the CBOR serialized object and the DagCbor codec (0x71).

```ipldsch
# Block represents an entire block in the Ethereum blockchain.
type Block struct {
# CID link to the header at this block
# This CID is composed of the KECCAK_256 multihash of the RLP encoded header and the EthHeader codec (0x90)
# Note that the header contains references to the uncles and tx, receipt, and state tries at this height
Header &Header
# CID link to the list of hashes for each of the transactions at this block
# This CID is composed of the KECCAK_256 multihash of the RLP encoded list of transaction hashes and the EthTxHashList codec (tbd)
Transactions &TransactionHashes
# CID link to the list of hashes for each of the receipts at this block
# This CID is composed of the KECCAK_256 multihash of the RLP encoded list of receipt hashes and the EthTxReceiptHashList codec (tbd)
Receipts &ReceiptHashes
}
```

Provided a `Header` multihash/CID, we can generate a `Block` IPLD by
1) Fetching and decoding the `Header` IPLD.
2) Stepping down into the transaction trie referenced in the header.
    1) Collecting each transaction stored at the leaf nodes in the trie.
    2) KECCAK_256 hashing each transaction.
    3) Ordering these hashes in a list by transaction index.
    4) KECCAK_256 hashing the RLP encoded list.
    5) Converting to a CID using the KECCAK_256 multihash and EthTxHashList codec.
3) Stepping down into the receipt trie referenced in the header.
    1) Collecting each receipt stored at the leaf nodes in the trie.
    2) KECCAK_256 hashing each receipt.
    3) Ordering these hashes in a list by receipt index.
    4) KECCAK_256 hashing the RLP encoded list.
    5) Converting to a CID using the KECCAK_256 multihash and EthTxReceiptHashList codec.
4) Assembling the `Header` CID, `Transactions` CID, and `Receipts` CID into the `Block` object.
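
The hash-list sub-steps in the procedure above are the same for transactions and receipts, and can be sketched as follows. The helper names are hypothetical, the trie leaves are assumed to be already collected into a mapping from index to RLP encoded value, and the Keccak-256 function is passed in because the Python standard library does not provide it. The final conversion to a CID is left out because the EthTxHashList and EthTxReceiptHashList codecs are still to be determined.

```python
from typing import Callable, Dict, List

HashFn = Callable[[bytes], bytes]


def rlp_encode_hash_list(hashes: List[bytes]) -> bytes:
    """RLP encode a list of 32 byte hashes (a special case of full RLP)."""
    for h in hashes:
        if len(h) != 32:
            raise ValueError("every hash must be 32 bytes")
    # A 32 byte string is encoded as the prefix 0x80 + 32 = 0xa0 plus the bytes.
    payload = b"".join(b"\xa0" + h for h in hashes)
    n = len(payload)
    if n < 56:
        return bytes([0xC0 + n]) + payload
    n_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0xF7 + len(n_bytes)]) + n_bytes + payload


def ordered_hashes(leaves: Dict[int, bytes], keccak256: HashFn) -> List[bytes]:
    """Hash each leaf value and order the hashes by transaction or receipt index."""
    return [keccak256(leaves[index]) for index in sorted(leaves)]


def hash_list_digest(leaves: Dict[int, bytes], keccak256: HashFn) -> bytes:
    """Digest of the RLP encoded hash list, ready to be wrapped in a CID."""
    return keccak256(rlp_encode_hash_list(ordered_hashes(leaves, keccak256)))
```

Running the helper once over the transaction trie leaves and once over the receipt trie leaves yields the two digests behind the `Transactions` and `Receipts` links of a `Block`.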

## TransactionHashes IPLD

This is the IPLD schema for the ordered list of all transactions for a given block.
* The IPLD block is the RLP encoded list of transaction hashes
* CID links to `TransactionHashes` use a KECCAK_256 multihash of the RLP encoded list of transaction hashes and the EthTxHashList codec (tbd).
* `TransactionHashes` IPLDs are not referenced from any canonical Ethereum object, but are instead linked to from the above `Block` and `TxTrace` objects.

```ipldsch
# TransactionHashes contains a list of CIDs that reference all of the Ethereum transactions at this block
# These CIDs are composed from the KECCAK_256 multihash of the referenced transaction and the EthTx codec (0x93)
type TransactionHashes [&Transaction]
```

## ReceiptHashes IPLD

This is the IPLD schema for the ordered list of all receipts for a given block.
* The IPLD block is the RLP encoded list of receipt hashes
* CID links to `ReceiptHashes` use a KECCAK_256 multihash of the RLP encoded list of receipt hashes and the EthTxReceiptHashList codec (tbd).
* `ReceiptHashes` IPLDs are not referenced directly from any canonical Ethereum object, but are instead linked to from the above `Block` ADL object.

```ipldsch
# ReceiptHashes contains a list of CIDs that reference all of the receipts at this block
# These CIDs are composed from the KECCAK_256 multihash of the referenced receipt and the EthTxReceipt codec (0x95)
type ReceiptHashes [&Receipt]
```

## Genesis IPLD

This is the IPLD schema for the configuration settings and genesis allocations to produce a specific genesis block and begin an Ethereum
blockchain. It also includes a reference to the genesis block `Header` it produces. This is a single IPLD block at the base of an entire Ethereum chain.
* The IPLD block is a CBOR serialization of the object
* CID links to `GenesisInfo` use a KECCAK_256 multihash of the CBOR serialized object and the DagCbor codec (0x71).

```ipldsch
# GenesisInfo specifies the header fields, state of a genesis block, and hard fork switch-over blocks through the chain configuration.
# NOTE: we need a new multicodec type for the Genesis object
type GenesisInfo struct {
# CID link to the genesis header this genesis info produces
# This CID is composed of the KECCAK_256 multihash of the linked RLP encoded header and the EthHeader codec (0x90)
GenesisHeader &Header
Config ChainConfig
Nonce Uint
Timestamp Uint
ExtraData Bytes
GasLimit Uint
Difficulty BigInt
Mixhash Hash
Coinbase Address
Alloc GenesisAlloc
# These fields are used for consensus tests. Please don't use them
# in actual genesis blocks.
Number Uint
GasUsed Uint
ParentHash Hash
}
# GenesisAlloc is a map that specifies the initial state that is part of the genesis block.
type GenesisAlloc {Address:GenesisAccount}
# GenesisAccount is an account in the state of the genesis block.
type GenesisAccount struct {
Code Bytes
Storage {Hash:Hash}
Balance BigInt
Nonce Uint
PrivateKey Bytes
}
# ChainConfig is the core config which determines the blockchain settings.
# ChainConfig is stored in the database on a per block basis.
# This means that any network, identified by its genesis block, can have its own set of configuration options.
# The ChainConfig referenced in GenesisInfo is used to produce the genesis block but is not necessarily used for later blocks down the chain.
type ChainConfig struct {
ChainID BigInt
HomesteadBlock BigInt
DAOForkBlock BigInt
DAOForkSupport Bool
EIP150Block BigInt
EIP150Hash Hash
EIP155Block BigInt
EIP158Block BigInt
ByzantiumBlock BigInt
ConstantinopleBlock BigInt
PetersburgBlock BigInt
IstanbulBlock BigInt
MuirGlacierBlock BigInt
YoloV2Block BigInt
EWASMBlock BigInt
# Various consensus engines
Ethash EthashConfig
Clique CliqueConfig
}
# EthashConfig is the consensus engine config for proof-of-work based sealing.
# At this time there are no configuration options for the Ethash engine.
type EthashConfig struct {} representation tuple
# CliqueConfig is the consensus engine config for proof-of-authority based sealing.
type CliqueConfig struct {
Period Uint
Epoch Uint
}
```
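
The `*Block` fields of `ChainConfig` are the hard fork switch-over points. The sketch below shows how such a field is typically interpreted: a fork is active once the chain head reaches the configured block number. It is an illustration under assumptions: the helper names are hypothetical, only a few fields are shown (with their Ethereum mainnet values), and an unscheduled fork is modeled as `None`, mirroring go-ethereum's use of a nil `big.Int`, even though the schema above does not mark these fields as nullable.

```python
from typing import Dict, List, Optional

# A subset of ChainConfig fork fields, using Ethereum mainnet block numbers.
MAINNET_FORK_BLOCKS: Dict[str, Optional[int]] = {
    "HomesteadBlock": 1_150_000,
    "ByzantiumBlock": 4_370_000,
    "IstanbulBlock": 9_069_000,
    "EWASMBlock": None,  # assumption: None means the fork is not scheduled
}


def is_fork_active(fork_block: Optional[int], head: int) -> bool:
    """A fork applies from its switch-over block onward."""
    return fork_block is not None and fork_block <= head


def active_forks(config: Dict[str, Optional[int]], head: int) -> List[str]:
    """Names of the configured forks that are active at block number `head`."""
    return [name for name, block in config.items() if is_fork_active(block, head)]


print(active_forks(MAINNET_FORK_BLOCKS, 5_000_000))
```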