This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
Commit
Merge pull request #358 from vulcanize/ethereum-data-structures
Ethereum data structures
Showing 5 changed files with 503 additions and 0 deletions.
# Ethereum as an IPLD Data Structure

Within these documents, schemas are grouped by the serialized blocks they describe.
Apart from the types listed in "Basic Types", the top-level schema type in each code block represents a data structure that is serialized into a single IPLD block with its own Link (CID).

Several state data structures repeat the same form: a modified Merkle Patricia trie node.
For clarity, they are not de-duplicated here, so that the different purposes and contents of those data structures remain visible.

For more information about the IPLD Schema language, see the [specification](https://specs.ipld.io/schemas/).

## Data Structure Descriptions

* [Ethereum Data Structures **Basic Types**](basic_types.md)
* [Ethereum **Chain** Data Structures](chain.md)
* [Ethereum **Convenience Types**](convenience_types.md)
* [Ethereum **State** Data Structures](state.md)
# Ethereum Data Structure Basic Types
These types are used throughout the Ethereum data structures but are not themselves IPLD blocks.

```ipldsch
# Go big.Int
# Prefer presenting to users either as a number or as a string view of the decimal number
# for readability.
type BigInt bytes
# Unsigned integer
# Used to explicitly specify that an integer cannot be negative
type Uint int
# BlockNonce is the 8 byte binary representation of a block's nonce
type BlockNonce bytes
# Hash represents the 32 byte KECCAK_256 hash of arbitrary data.
type Hash bytes
# Address represents the 20 byte address of an Ethereum account.
type Address bytes
# Bloom represents a 256 byte bloom filter.
type Bloom bytes
# Balance represents an account's balance in units of wei (1 wei = 10^-18 ETH)
type Balance BigInt
# OpCode is a 1 byte EVM opcode
type OpCode bytes
```
# Ethereum Chain Data Structures

This section contains the IPLD schemas for the blockchain data structures of Ethereum.
These include headers, uncle sets, transactions, and receipts. The state trie, storage trie,
receipt trie, and transaction trie IPLDs are described in the [state](state.md) section. Note
that getting from a header to a specific transaction or receipt requires traversing
the respective trie, beginning at the root referenced in the header. By contrast, uncles are referenced
directly from the header by the hash of the RLP encoded list of uncles.

## Header IPLD

This is the IPLD schema for a canonical Ethereum block header.
* The IPLD block is the RLP encoded header.
* Links to headers use a KECCAK_256 multihash of the RLP encoded header and the EthHeader codec (0x90).
* Each header references its parent header.
* The genesis header is unique in that it does not reference a parent header in `ParentCID`; instead, it contains a reference to a `GenesisInfo` ADL.

```ipldsch
# Header contains the consensus fields of an Ethereum block header
type Header struct {
  # CID link to the parent header
  # This CID is composed of the KECCAK_256 multihash of the linked RLP encoded header and the EthHeader codec (0x90)
  ParentCID &Header
  # CID link to the list of uncles at this block
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded list of Uncles and the EthHeaderList codec (0x91)
  # Note that an uncle is simply a header that does not have an associated body
  UnclesCID &Uncles
  Coinbase Address
  # CID link to the root node of the state trie
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded state trie root node and the EthStateTrie codec (0x96)
  # This steps us down into the state trie, from which we can link to the rest of the state trie nodes and all the linked storage tries
  StateRootCID &StateTrieNode
  # CID link to the root node of the transaction trie
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded tx trie root node and the EthTxTrie codec (0x92)
  # This steps us down into the transaction trie, from which we can link to the rest of the tx trie nodes and all of the linked transactions
  TxRootCID &TxTrieNode
  # CID link to the root node of the receipt trie
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded rct trie root node and the EthTxReceiptTrie codec (0x94)
  # This steps us down into the receipt trie, from which we can link to the rest of the rct trie nodes and all of the linked receipts
  RctRootCID &RctTrieNode
  Bloom Bloom
  Difficulty BigInt
  Number BigInt
  GasLimit Uint
  GasUsed Uint
  Time Uint
  Extra Bytes
  MixDigest Hash
  Nonce BlockNonce
}
```

## Uncles IPLD
This is the IPLD schema for a list of uncles, ordered by ascending block number.
* The IPLD block is the RLP encoded list of uncles.
* CID links to `Uncles` use a KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91).
* `Uncles` is referenced in an Ethereum `Header` by the `UnclesCID`.

```ipldsch
# Uncles contains an ordered list of Ethereum uncles (headers that have no associated body)
# This IPLD object is referenced by a CID composed of the KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91)
type Uncles [Header]
```

## Transaction IPLD
This is the IPLD schema for a canonical Ethereum transaction. It contains only the fields required for consensus.
Note that this schema will need to be updated once EIP-1559 and EIP-2718 are approved.
* The IPLD block is the RLP encoded transaction.
* CID links to `Transaction` use a KECCAK_256 multihash of the RLP encoded transaction and the EthTx codec (0x93).
* `Transaction` IPLDs are not referenced directly from an Ethereum `Header`; instead, they are linked to from within the transaction trie whose root is referenced in the `Header` by the `TxRootCID`.

```ipldsch
# Transaction contains the consensus fields of an Ethereum transaction
type Transaction struct {
  AccountNonce Uint
  Price BigInt
  GasLimit Uint
  # A null Recipient means the transaction is a contract creation
  Recipient nullable Address
  Amount BigInt
  Payload Bytes
  # Signature values
  V BigInt
  R BigInt
  S BigInt
}
```

## Receipt IPLD
This is the IPLD schema for a canonical Ethereum receipt. It contains only the fields required for consensus.
* The IPLD block is the RLP encoded receipt.
* CID links to `Receipt` use a KECCAK_256 multihash of the RLP encoded receipt and the EthTxReceipt codec (0x95).
* `Receipt` IPLDs are not referenced directly from an Ethereum `Header`; instead, they are linked to from within the receipt trie whose root is referenced in the `Header` by the `RctRootCID`.

```ipldsch
# Receipt contains the consensus fields of an Ethereum receipt
type Receipt struct {
  PostStateOrStatus Bytes
  CumulativeGasUsed Uint
  Bloom Bloom
  Logs [Log]
}
# Log contains the consensus fields of an Ethereum receipt log
type Log struct {
  Address Address
  Topics [Hash]
  Data Bytes
}
```
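Two `Receipt` fields are easy to misread. `PostStateOrStatus` held a 32-byte intermediate state root before the Byzantium fork and, since EIP-658, holds a status code (empty for failure, `0x01` for success). `CumulativeGasUsed` is a running total over the block, so a single transaction's gas is the difference from the previous receipt's total. The Go sketch below illustrates both; the `Receipt` type is a trimmed, hypothetical mirror of the schema.

```go
package main

import (
	"errors"
	"fmt"
)

// Receipt is a trimmed, hypothetical Go mirror of the schema's Receipt.
type Receipt struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
}

// Status interprets PostStateOrStatus. An empty value means failure and 0x01
// means success (post-Byzantium, EIP-658); anything else, normally a 32-byte
// root, is treated as a pre-Byzantium post-state root that carries no status.
func (r Receipt) Status() (success, isStatus bool) {
	switch b := r.PostStateOrStatus; {
	case len(b) == 0:
		return false, true
	case len(b) == 1 && b[0] == 0x01:
		return true, true
	default:
		return false, false
	}
}

// GasUsed recovers each transaction's own gas usage from the cumulative
// totals, given one block's receipts in transaction-index order.
func GasUsed(rcts []Receipt) ([]uint64, error) {
	used := make([]uint64, len(rcts))
	var prev uint64
	for i, r := range rcts {
		if r.CumulativeGasUsed < prev {
			return nil, errors.New("cumulative gas must not decrease")
		}
		used[i] = r.CumulativeGasUsed - prev
		prev = r.CumulativeGasUsed
	}
	return used, nil
}

func main() {
	rcts := []Receipt{
		{PostStateOrStatus: []byte{0x01}, CumulativeGasUsed: 21000},
		{PostStateOrStatus: []byte{}, CumulativeGasUsed: 74000},
	}
	used, err := GasUsed(rcts)
	if err != nil {
		panic(err)
	}
	for i, r := range rcts {
		ok, _ := r.Status()
		fmt.Printf("tx %d: gas %d, success %v\n", i, used[i], ok)
	}
}
```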
# Convenience IPLD types

The types described below are not referenced directly from within the canonical Ethereum Merkle tree.
Instead, these types can be constructed and verified from the underlying canonical Ethereum IPLD structures using the algorithms described here.
They are introduced to make accessing and working with Ethereum objects more convenient and performant for certain purposes.

## Transaction Trace IPLD

Transaction traces contain the EVM context, input, and output for each individual opcode executed while a transaction is applied on top of a given state.
These objects can be generated or verified by applying the referenced transactions on top of the referenced state.
* The IPLD block is the RLP encoded object.
* CID links to `TxTrace` use a KECCAK_256 multihash of the RLP encoded object and the EthTxTrace codec (tbd).

```ipldsch
# TxTrace contains the EVM context, input, and output for each opcode in a transaction that was applied to a specific state
type TxTrace struct {
  # List of CIDs linking to the transactions that were used to generate this trace by applying them onto the state referenced below
  # If this trace was produced by the first transaction in a block, then this list contains only that one transaction
  # and this trace was produced by applying it directly to the referenced state
  # Otherwise, only the last transaction in the list is directly responsible for producing this trace, whereas the
  # preceding ones were sequentially applied to the referenced state to generate the intermediate state that the final,
  # trace-producing transaction was applied on top of
  # This is analogous to the Transactions IPLD defined below, but only for a trace produced by the last
  # transaction in a block will the list be the same as a complete Transactions IPLD
  TxCIDs [&Transaction]
  # CID link to the root node of the state trie that the above transaction set was applied on top of to produce this trace
  StateRootCID &StateTrieNode
  Result Bytes
  Frames [Frame]
  Gas Uint
  Failed Bool
}
# Frame represents the EVM context, input, and output for a specific opcode during a transaction trace
type Frame struct {
  Op OpCode
  From Address
  To Address
  Input Bytes
  Output Bytes
  Gas Uint
  Cost Uint
  Value BigInt
}
```

Provided a `Header` multihash/CID and a transaction index, we can generate a `TxTrace` by:
1) Fetching and decoding the `Header` IPLD.
2) Stepping down into the transaction trie referenced in the header, then:
    1) Collecting the transaction at the provided index and all transactions with lower indexes.
    2) KECCAK_256 hashing each transaction.
    3) Converting the hashes to CIDs using the KECCAK_256 multihash and the EthTx codec.
    4) Ordering these CIDs in a list by transaction index.
3) Collecting the `StateRootCID` from the parent `Header`, which is linked by this header's `ParentCID`. A header's own `StateRootCID` reflects the state *after* all of its transactions have been applied, so tracing must start from the parent's state root.
4) Using [ipfs-ethdb](https://github.com/vulcanize/ipfs-ethdb) with that state root to instantiate an EVM on top of the state at the start of this block.
5) Applying each of the transactions, in order, on top of this state using the ipfs-ethdb based EVM.
6) Collecting the trace output from the EVM for the final transaction applied.
7) Assembling the trace output, the `Transaction` CIDs, and the root `StateTrieNode` CID into the `TxTrace` object.
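
The input-gathering part of this procedure (the transaction prefix and the pre-state root) can be sketched in Go as follows. `CID` and `Header` are hypothetical stand-ins for decoded IPLD objects, the transaction CIDs are assumed to be already ordered by index, and the EVM step through ipfs-ethdb is deliberately left out. The sketch takes the pre-state from the parent header because an Ethereum header's own state root commits to the state after all of its transactions.

```go
package main

import (
	"errors"
	"fmt"
)

// CID is a hypothetical stand-in for a decoded CID; real code would use a CID library.
type CID string

// Header is a hypothetical decoded header holding only the fields this sketch reads.
type Header struct {
	ParentCID    CID
	StateRootCID CID
}

// TraceInputs are the TxTrace fields that can be assembled before running the EVM.
type TraceInputs struct {
	TxCIDs       []CID // transactions 0..index, in transaction-index order
	StateRootCID CID   // pre-state that TxCIDs are applied on top of
}

// collectTraceInputs selects the transactions up to and including index and
// pairs them with the parent header's state root (the state before this block).
// txCIDs must already be ordered by transaction index.
func collectTraceInputs(parent Header, txCIDs []CID, index int) (TraceInputs, error) {
	if index < 0 || index >= len(txCIDs) {
		return TraceInputs{}, errors.New("transaction index out of range")
	}
	prefix := make([]CID, index+1)
	copy(prefix, txCIDs[:index+1])
	return TraceInputs{TxCIDs: prefix, StateRootCID: parent.StateRootCID}, nil
}

func main() {
	// A toy block store keyed by CID.
	store := map[CID]Header{
		"parentHeader": {StateRootCID: "stateBeforeBlock"},
		"childHeader":  {ParentCID: "parentHeader", StateRootCID: "stateAfterBlock"},
	}
	child := store["childHeader"]
	parent := store[child.ParentCID]
	in, err := collectTraceInputs(parent, []CID{"tx0", "tx1", "tx2"}, 1)
	if err != nil {
		panic(err)
	}
	// An EVM backed by ipfs-ethdb would now apply in.TxCIDs on top of in.StateRootCID.
	fmt.Println(in.TxCIDs, in.StateRootCID)
}
```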

## Block IPLD

The `Block` IPLD represents an entire block (header + body) in the Ethereum blockchain. It contains direct content-hash references to
the sets of transactions and receipts for that block, avoiding the need to traverse the transaction
and receipt tries to collect these sets (as is required when starting from a canonical `Header` block).
These objects can be generated or verified by following the links within the contained `Header` to collect the `Transactions` and `Receipts`
from the referenced transaction and receipt tries.
* The IPLD block is a CBOR serialization of the object.
* CID links to `Block` use a KECCAK_256 multihash of the CBOR serialized object and the DagCbor codec (0x71).

```ipldsch
# Block represents an entire block in the Ethereum blockchain.
type Block struct {
  # CID link to the header at this block
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded header and the EthHeader codec (0x90)
  # Note that the header contains references to the uncles and the tx, receipt, and state tries at this height
  Header &Header
  # CID link to the list of hashes for each of the transactions at this block
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded list of transaction hashes and the EthTxHashList codec (tbd)
  Transactions &Transactions
  # CID link to the list of hashes for each of the receipts at this block
  # This CID is composed of the KECCAK_256 multihash of the RLP encoded list of receipt hashes and the EthTxReceiptHashList codec (tbd)
  Receipts &Receipts
}
```

Provided a `Header` multihash/CID, we can generate a `Block` IPLD by:
1) Fetching and decoding the `Header` IPLD.
2) Stepping down into the transaction trie referenced in the header, then:
    1) Collecting each transaction stored at the leaf nodes in the trie.
    2) KECCAK_256 hashing each transaction.
    3) Ordering these hashes in a list by transaction index.
    4) KECCAK_256 hashing the RLP encoded list.
    5) Converting the result to a CID using the KECCAK_256 multihash and the EthTxHashList codec.
3) Stepping down into the receipt trie referenced in the header, then:
    1) Collecting each receipt stored at the leaf nodes in the trie.
    2) KECCAK_256 hashing each receipt.
    3) Ordering these hashes in a list by receipt index.
    4) KECCAK_256 hashing the RLP encoded list.
    5) Converting the result to a CID using the KECCAK_256 multihash and the EthTxReceiptHashList codec.
4) Assembling the `Header` CID, `Transactions` CID, and `Receipts` CID into the `Block` object.
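
Both the transaction and receipt branches of this procedure produce an RLP list of 32-byte hashes ordered by index. A hedged Go sketch of that encoding follows. `indexedHash` is a hypothetical helper type, the hashes are assumed to be already computed with KECCAK_256, and because the EthTxHashList and EthTxReceiptHashList codecs are still tbd, the sketch stops at the block bytes that would then be hashed and wrapped in a CID.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"sort"
)

// indexedHash pairs a transaction (or receipt) index with its 32-byte Keccak-256 hash.
type indexedHash struct {
	Index uint64
	Hash  [32]byte
}

// rlpListPrefix returns the RLP list header for a payload of n bytes.
func rlpListPrefix(n int) []byte {
	if n <= 55 {
		return []byte{0xc0 + byte(n)}
	}
	var lenBytes []byte
	for x := n; x > 0; x >>= 8 {
		lenBytes = append([]byte{byte(x)}, lenBytes...)
	}
	return append([]byte{0xf7 + byte(len(lenBytes))}, lenBytes...)
}

// encodeHashList orders hashes by index and RLP-encodes them as a list of
// 32-byte strings. Indexes must be exactly 0..n-1, so gaps and duplicates are rejected.
func encodeHashList(hashes []indexedHash) ([]byte, error) {
	sorted := append([]indexedHash(nil), hashes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index < sorted[j].Index })
	payload := make([]byte, 0, 33*len(sorted))
	for i, h := range sorted {
		if h.Index != uint64(i) {
			return nil, fmt.Errorf("missing or duplicate index at position %d", i)
		}
		payload = append(payload, 0x80+32) // RLP string prefix for 32 bytes
		payload = append(payload, h.Hash[:]...)
	}
	return append(rlpListPrefix(len(payload)), payload...), nil
}

func main() {
	var h0, h1 [32]byte
	h0[31], h1[31] = 0x01, 0x02
	enc, err := encodeHashList([]indexedHash{{Index: 1, Hash: h1}, {Index: 0, Hash: h0}})
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(enc[:2]), len(enc))
}
```

Rejecting gaps is a design choice of this sketch: a missing leaf would otherwise silently shift every later hash and produce a list that no longer matches the trie.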

## TransactionHashes IPLD

This is the IPLD schema for the ordered list of all transactions for a given block.
* The IPLD block is the RLP encoded list of transaction hashes.
* CID links to `Transactions` use a KECCAK_256 multihash of the RLP encoded list of transaction hashes and the EthTxHashList codec (tbd).
* `Transactions` IPLDs are not referenced from any canonical Ethereum object; instead, they are linked to from the above `Block` and `TxTrace` objects.

```ipldsch
# Transactions contains a list of CIDs that reference all of the Ethereum transactions at this block
# These CIDs are composed from the KECCAK_256 multihash of the referenced transaction and the EthTx codec (0x93)
type Transactions [&Transaction]
```

## ReceiptHashes IPLD

This is the IPLD schema for the ordered list of all receipts for a given block.
* The IPLD block is the RLP encoded list of receipt hashes.
* CID links to `Receipts` use a KECCAK_256 multihash of the RLP encoded list of receipt hashes and the EthTxReceiptHashList codec (tbd).
* `Receipts` IPLDs are not referenced directly from any canonical Ethereum object; instead, they are linked to from the above `Block` ADL object.

```ipldsch
# Receipts contains a list of CIDs that reference all of the receipts at this block
# These CIDs are composed from the KECCAK_256 multihash of the referenced receipt and the EthTxReceipt codec (0x95)
type Receipts [&Receipt]
```

## Genesis IPLD

This is the IPLD schema for the configuration settings and genesis allocations that produce a specific genesis block and begin an Ethereum
blockchain. It also includes a reference to the genesis block `Header` it produces. This is a single IPLD block at the base of an entire Ethereum chain.
* The IPLD block is a CBOR serialization of the object.
* CID links to `GenesisInfo` use a KECCAK_256 multihash of the CBOR serialized object and the DagCbor codec (0x71).

```ipldsch
# GenesisInfo specifies the header fields, the state of a genesis block, and hard fork switch-over blocks through the chain configuration.
# NOTE: we need a new multicodec type for the Genesis object
type GenesisInfo struct {
  # CID link to the genesis header this genesis info produces
  # This CID is composed of the KECCAK_256 multihash of the linked RLP encoded header and the EthHeader codec (0x90)
  GenesisHeader &Header
  Config ChainConfig
  Nonce Uint
  Timestamp Uint
  ExtraData Bytes
  GasLimit Uint
  Difficulty BigInt
  Mixhash Hash
  Coinbase Address
  Alloc GenesisAlloc
  # These fields are used for consensus tests. Please don't use them
  # in actual genesis blocks.
  Number Uint
  GasUsed Uint
  ParentHash Hash
}
# GenesisAlloc is a map that specifies the initial state that is part of the genesis block.
type GenesisAlloc {Address:GenesisAccount}
# GenesisAccount is an account in the state of the genesis block.
type GenesisAccount struct {
  Code Bytes
  Storage {Hash:Hash}
  Balance BigInt
  Nonce Uint
  PrivateKey Bytes
}
# ChainConfig is the core config which determines the blockchain settings.
# ChainConfig is stored in the database on a per block basis.
# This means that any network, identified by its genesis block, can have its own set of configuration options.
# The ChainConfig referenced in GenesisInfo is used to produce the genesis block but is not necessarily used for later blocks down the chain.
type ChainConfig struct {
  ChainID BigInt
  HomesteadBlock BigInt
  DAOForkBlock BigInt
  DAOForkSupport Bool
  EIP150Block BigInt
  EIP150Hash Hash
  EIP155Block BigInt
  EIP158Block BigInt
  ByzantiumBlock BigInt
  ConstantinopleBlock BigInt
  PetersburgBlock BigInt
  IstanbulBlock BigInt
  MuirGlacierBlock BigInt
  YoloV2Block BigInt
  EWASMBlock BigInt
  # Various consensus engines
  Ethash EthashConfig
  Clique CliqueConfig
}
# EthashConfig is the consensus engine config for proof-of-work based sealing.
# At this time there are no configuration options for the Ethash engine.
type EthashConfig struct {} representation tuple
# CliqueConfig is the consensus engine config for proof-of-authority based sealing.
type CliqueConfig struct {
  Period Uint
  Epoch Uint
}
```