Reth's architecture

Overview

The image represents a rough component flow of Reth's architecture:

Engine: Similar to other clients, it is the primary driver of Reth.
Sync: Reth has two sync modes: historical and live.
Pipeline: The pipeline performs historical sync sequentially, which lets us optimize each stage of the synchronization process. The pipeline is split into stages, where a stage is a trait that provides a function to execute the stage and a function to unwind (undo) it. The pipeline currently has 12 configurable stages, with the first two running separately. It proceeds top to bottom; when a problem is encountered, it unwinds from the failing stage upwards:
1. HeaderStage: Header verification stage.
2. BodyStage: Downloads blocks over P2P.
3. SenderRecoveryStage: Recovers the sender's address from the signature of every transaction in the block's body, which is computationally costly.
4. ExecutionStage: The most time-consuming and computationally heavy stage. It takes the senders, transactions, and headers and executes them within the REVM, generating receipts and change sets. Change sets are data structures that function as hash maps and record the modifications made to accounts within a single block. The execution stage operates on a plain state that contains only addresses and account information as key-value pairs.
5. MerkleStage(unwind): Skipped during the execution flow, used when unwinding.
6. AccountHashingStage: Required by the merkle stage. We take the plain state, apply a hashing function to it, and save the resulting hashed accounts in a table dedicated to them.
7. StorageHashingStage: Similar to above but for storage.
8. MerkleStage(execute): Generates a state root from the hashes produced by the two preceding stages, then checks that the resulting state root is correct for the given block.
9. TransactionLookupStage: Helper stage, allows us to do transaction lookup.
10. IndexStorageHistoryStage: The execution stage generates the change sets; this stage indexes them, i.e. the data that existed before each block was executed. This enables us to retrieve the historical data for any given block number.
11. IndexAccountHistoryStage: Similar to above.
12. FinishStage: Signals that the engine is now able to receive new fork choice updates from the consensus layer.
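
The execute/unwind flow described above can be sketched as follows. This is a minimal illustration and not Reth's actual `Stage` trait: the trait methods, `run_pipeline`, and `ToyStage` are hypothetical names, and errors are plain strings for brevity.

```rust
type BlockNumber = u64;

/// Hypothetical, simplified stage interface: run forward or undo.
trait Stage {
    fn id(&self) -> &'static str;
    /// Process blocks up to `target`; `Err` signals a problem.
    fn execute(&mut self, target: BlockNumber) -> Result<(), String>;
    /// Roll the stage back so that nothing above `unwind_to` remains.
    fn unwind(&mut self, unwind_to: BlockNumber);
}

/// Runs the stages top to bottom. If one fails, unwinds from the failing
/// stage upwards (in reverse order) back to `last_good`.
fn run_pipeline<S: Stage>(
    stages: &mut [S],
    target: BlockNumber,
    last_good: BlockNumber,
) -> Result<(), String> {
    for i in 0..stages.len() {
        if let Err(err) = stages[i].execute(target) {
            for stage in stages[..=i].iter_mut().rev() {
                stage.unwind(last_good);
            }
            return Err(format!("{} failed: {}", stages[i].id(), err));
        }
    }
    Ok(())
}

/// Toy stage for illustration: tracks a checkpoint and can fail on demand.
struct ToyStage {
    name: &'static str,
    checkpoint: BlockNumber,
    fail: bool,
}

impl Stage for ToyStage {
    fn id(&self) -> &'static str {
        self.name
    }

    fn execute(&mut self, target: BlockNumber) -> Result<(), String> {
        if self.fail {
            return Err("simulated failure".to_string());
        }
        self.checkpoint = target;
        Ok(())
    }

    fn unwind(&mut self, unwind_to: BlockNumber) {
        // Never move the checkpoint forward while unwinding.
        self.checkpoint = self.checkpoint.min(unwind_to);
    }
}
```

With a slice of `ToyStage` values in which the third one has `fail: true`, `run_pipeline` returns an error and the first two stages end up back at `last_good`, while later stages are never executed.
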
BlockchainTree: When we near the tip of the chain during sync, we transition to the blockchain tree. Close to the tip, state root validation and execution take place in memory.
Database: When a block gets canonicalized, it is moved to the database.
Provider: An abstraction over the database that provides utility functions, so that we avoid directly accessing the keys and values of the underlying database.
Downloader: Retrieves blocks and headers over the peer-to-peer (P2P) network. It is used by the pipeline during its first two stages and by the engine when it needs to bridge the gap at the tip.
P2P: When we approach the tip, the transactions we have read over P2P are passed to the transaction pool.
Transaction Pool: Includes DDoS mitigation measures. Holds transactions ordered by the gas price users are willing to pay, best-paying first.
Payload Builder: Takes the first n transactions from the pool to construct a new payload.
Pruner: Allows us to run a full node. Once a block has been canonicalized by the blockchain tree, we must wait an additional 64 blocks for it to reach finalization. Once it is finalized, we can be certain that the block will not be reorganized. Therefore, if we are operating a full node, we have the option to remove the old block's data using the pruner.
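
The pruning rule described for the Pruner can be sketched as simple arithmetic over block numbers. The constant and both function names below are hypothetical, and a `BTreeMap` keyed by block number stands in for the historical data; Reth's real pruner works per segment and is configurable.

```rust
use std::collections::BTreeMap;

/// Assumed distance (in blocks) after which a canonical block is treated as final.
const FINALIZATION_DISTANCE: u64 = 64;

/// Highest block number that is already final for the given tip,
/// or `None` while the chain is too short for anything to be final.
fn highest_prunable_block(tip: u64) -> Option<u64> {
    tip.checked_sub(FINALIZATION_DISTANCE)
}

/// Removes all entries at or below the prunable block and returns how many
/// were removed. `history` is a stand-in for per-block historical data.
fn prune_finalized<V>(history: &mut BTreeMap<u64, V>, tip: u64) -> usize {
    match highest_prunable_block(tip) {
        Some(target) => {
            // `split_off` leaves keys below `target + 1` in `history`
            // and returns the entries from `target + 1` onwards.
            let kept = history.split_off(&(target + 1));
            let pruned = history.len();
            *history = kept;
            pruned
        }
        None => 0,
    }
}
```

For a tip at block 100, everything up to block 36 is eligible, so only the most recent 64 blocks of history remain after pruning.
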

Storage

Reth primarily uses the mdbx database. On top of it, Reth offers several abstractions that enable data transformation, compression, iteration, writing, and querying. These abstractions are designed so that Reth can replace its underlying database, mdbx, with minimal changes to the existing storage abstractions.

Codecs

This crate allows different codecs to be created for different purposes. The primary codec is the Compact trait, which compresses data such as unsigned integers (by dropping their leading zeros) as well as structures such as access lists and headers.
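
To make the leading-zero idea concrete, here is a minimal sketch for a `u64`. It is not the Compact trait itself: `compact_u64` and `expand_u64` are hypothetical helper names, and the sketch assumes the caller records the encoded length somewhere so the value can be decoded later.

```rust
/// Encodes `value` as big-endian bytes with the leading zero bytes removed.
fn compact_u64(value: u64) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let leading_zero_bytes = (value.leading_zeros() / 8) as usize;
    bytes[leading_zero_bytes..].to_vec()
}

/// Restores a `u64` from bytes produced by `compact_u64`.
fn expand_u64(compact: &[u8]) -> u64 {
    assert!(compact.len() <= 8, "a u64 has at most 8 significant bytes");
    let mut bytes = [0u8; 8];
    // Right-align the significant bytes; the rest stays zero.
    bytes[8 - compact.len()..].copy_from_slice(compact);
    u64::from_be_bytes(bytes)
}
```

Small values shrink the most: `255` needs one byte instead of eight, and `0` needs none at all.
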

DB Abstractions

The database trait is the fundamental abstraction; it provides either read-only or read/write transactions on the low-level database.
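
The split between read-only and read/write access can be illustrated with a small in-memory stand-in. All types and method names here (`MemoryDb`, `RoTx`, `RwTx`, `tx`, `tx_mut`, `get`, `put`) are assumptions made for this sketch and are not taken from Reth's code; a `BTreeMap` plays the role of a single table.

```rust
use std::collections::BTreeMap;

type Table = BTreeMap<Vec<u8>, Vec<u8>>;

/// Read-only transaction: can look values up but cannot change them.
struct RoTx<'a> {
    table: &'a Table,
}

/// Read/write transaction: can also insert values.
struct RwTx<'a> {
    table: &'a mut Table,
}

impl<'a> RoTx<'a> {
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.table.get(key).map(|v| v.as_slice())
    }
}

impl<'a> RwTx<'a> {
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.table.get(key).map(|v| v.as_slice())
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.table.insert(key.to_vec(), value.to_vec());
    }
}

/// Hypothetical in-memory database with a single table.
#[derive(Default)]
struct MemoryDb {
    table: Table,
}

impl MemoryDb {
    /// Any number of read-only transactions may borrow the database at once.
    fn tx(&self) -> RoTx<'_> {
        RoTx { table: &self.table }
    }

    /// A read/write transaction needs exclusive access.
    fn tx_mut(&mut self) -> RwTx<'_> {
        RwTx { table: &mut self.table }
    }
}
```

In this sketch the borrow checker enforces the distinction: `RoTx` has no `put` method, and `tx_mut` cannot be called while a read-only transaction is still alive.
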

The cursor enables iteration over the values in the database and offers a fast way to retrieve transactions or blocks. It is particularly useful when calculating merkle roots, as sequential access is significantly faster than random seeking. Likewise, when we have a large amount of data to write, sorting it first and then writing it is much faster. The cursor provides convenient functions for writing either sorted or unsorted data.
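
The difference between seeking and walking sequentially can be sketched with a cursor over a sorted slice. This is an illustration only: the `Cursor` type and its `seek`, `next_entry`, and `current` methods are hypothetical and much simpler than a real database cursor.

```rust
/// Hypothetical cursor over a table whose entries are sorted by key.
struct Cursor<'a> {
    entries: &'a [(u64, &'a str)],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(entries: &'a [(u64, &'a str)]) -> Self {
        Self { entries, pos: 0 }
    }

    /// Positions the cursor at the first entry with a key >= `key`.
    /// This is the "random seek": one binary search over the table.
    fn seek(&mut self, key: u64) -> Option<(u64, &'a str)> {
        self.pos = self.entries.partition_point(|(k, _)| *k < key);
        self.current()
    }

    /// Moves to the following entry without searching again.
    fn next_entry(&mut self) -> Option<(u64, &'a str)> {
        if self.pos < self.entries.len() {
            self.pos += 1;
        }
        self.current()
    }

    /// Entry under the cursor, if any.
    fn current(&self) -> Option<(u64, &'a str)> {
        self.entries.get(self.pos).copied()
    }
}
```

A range scan then costs one `seek` followed by cheap `next_entry` calls, instead of a fresh search for every key.
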

Tables

Table	Key	Value	Description
CanonicalHeaders	BlockNumber	HeaderHash	Stores the header hash indexed by block number.
HeaderTerminalDifficulties	BlockNumber	CompactU256	Stores the total difficulty value obtained from a block header. Total difficulty is used in proof-of-work systems and is currently not in use.
HeaderNumbers	BlockHash	BlockNumber	A utility table that stores the block number associated with a header hash.
Headers	BlockNumber	Header	Stores header bodies.
BlockBodyIndices	BlockNumber	StoredBlockBodyIndices	Stores block body indices, which hold the indexes of the block's transactions and their count. This allows us to determine which transaction numbers are included in the block.
BlockOmmers	BlockNumber	StoredBlockOmmers	Stores the uncles/ommers of the block, which are the side blocks that got included (used in proof-of-work).
BlockWithdrawals	BlockNumber	StoredBlockWithdrawals	Stores the block withdrawals.
Transactions	TxNumber	TransactionSignedNoHash	Stores the transaction body indexed by the ordinal transaction number. This information includes the total number of transactions and the number of transactions that were executed. It also lets us easily retrieve a single transaction.
TransactionHashNumbers	TxHash	TxNumber	Stores the transaction number indexed by the transaction hash.
TransactionBlocks	TxNumber	BlockNumber	Stores the mapping of the highest transaction number to the block number. Allows us to fetch the block number for a given transaction number.
Receipts	TxNumber	Receipt	Stores transaction receipts indexed by transaction number.
Bytecodes	B256	Bytecode	Stores the bytecode of all smart contracts. Multiple accounts can have identical bytecode, so a reference counter is needed.
PlainAccountState	Address	Account	Stores the current state of an Account, the plain state, indexed by the Account address. The plain state is updated during the execution stage.
PlainStorageState	Address , SubKey = B256	StorageEntry	Stores the current value of a storage key; the sub-key is the hash of the storage key. Concerning sub-keys: mdbx supports dup tables (duplicate values inside tables), which can lead to faster access to some values.
AccountsHistory	`ShardedKey<Address>`	BlockNumberList	Stores pointers to the block changesets that contain modifications for each account key. Each account is associated with a record of modifications, represented as a list of blocks. For example, to retrieve the account balance at block 1 million, we need to determine the next block where the account was modified. If the next modification occurs at block 1,000,001, we fetch the set of changes for that account from the tables below.
StoragesHistory	StorageShardedKey	BlockNumberList	Stores pointers to the block changesets that contain changes for each storage key. This allows us to index the change sets and find a change that happened in the past.
AccountChangeSets	BlockNumber, SubKey = Address	AccountBeforeTx	Stores the state of an account before any transaction alters it, such as when the account is created, self-destructed, accessed while empty, or when its balance or nonce is modified. For each block number and account address, we therefore have the previous values.
StorageChangeSets	BlockNumberAddress , SubKey = B256	StorageEntry	Stores the state of a storage slot before a specific transaction alters it. For each block number and account address, with the storage key as the sub-key, we can obtain the previous storage value. The execution stage modifies both this table and the one above it. These tables are used for the merkle trie calculations, which require the values to be incremental, and for any history tracing performed by the JSON-RPC API.
HashedAccounts	B256	Account	Stores the current state of an account indexed by keccak256(Address). This table is in preparation for merkleization and calculation of the state root. This and the table below are used by the merkle trie; the first calculation of the merkle trie needs sorted hashed addresses.
HashedStorages	B256, SubKey = B256	StorageEntry	Stores the current storage values indexed by keccak256(Address), with the hash of the storage key, keccak256(key), as the sub-key. As above, this is useful for merkleization because the hashed addresses/keys are sorted.
AccountsTrie	StoredNibbles	StoredBranchNode	Stores the current state's Merkle Patricia Tree.
StoragesTrie	B256 , SubKey = StoredNibblesSubKey	StorageTrieEntry	From HashedAddress => NibblesSubKey => Intermediate value. This and the above table store the nodes needed for the merkle trie calculation.
TransactionSenders	TxNumber	Address	Stores the transaction sender for each transaction. It is needed to speed up the execution stage and allows fetching the signer without doing the computationally expensive transaction signer recovery.
StageCheckpoints	StageId	StageCheckpoint	Stores the highest synced block number and stage-specific checkpoint of each stage.
StageCheckpointProgresses	StageId	`Vec<u8>`	Stores arbitrary data to keep track of a stage's first-sync progress. This and the above table allow us to know where the stage stopped and to determine what to do next.
PruneCheckpoints	PruneSegment	PruneCheckpoint	Records the highest pruned block number and the pruning mode for each segment of the pruning process. This tells us how far we have pruned our data. Pruning removes change sets and their corresponding indexes to drop historical data, leaving only the most recent data to be retrieved, i.e. the tip.
VersionHistory	u64	ClientVersion	Stores the history of client versions that have accessed the database with write privileges, indexed by Unix timestamp in seconds.
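
The historical lookup described for AccountsHistory and AccountChangeSets (the "block 1 million" example) can be sketched with in-memory maps. The `State` struct, its field names, and `account_at` are hypothetical stand-ins for the three tables involved; the real tables use sharded keys and database cursors rather than hash maps.

```rust
use std::collections::HashMap;

type Address = [u8; 20];
type BlockNumber = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Account {
    nonce: u64,
    balance: u128,
}

struct State {
    /// Stand-in for PlainAccountState: the latest value of each account.
    plain: HashMap<Address, Account>,
    /// Stand-in for AccountsHistory: sorted block numbers in which the account changed.
    history: HashMap<Address, Vec<BlockNumber>>,
    /// Stand-in for AccountChangeSets: the account as it was before that block changed it.
    changesets: HashMap<(BlockNumber, Address), Account>,
}

impl State {
    /// Returns the account as it was at the end of `block`.
    fn account_at(&self, address: Address, block: BlockNumber) -> Option<Account> {
        let changed_in: &[BlockNumber] = match self.history.get(&address) {
            Some(blocks) => blocks,
            None => &[],
        };
        // Index of the first block after `block` that modified the account.
        let idx = changed_in.partition_point(|b| *b <= block);
        match changed_in.get(idx) {
            // The pre-change value stored for that later block is the value at `block`.
            Some(next_change) => self.changesets.get(&(*next_change, address)).copied(),
            // No later change: the plain state is still valid.
            None => self.plain.get(&address).copied(),
        }
    }
}
```

If an account changed in blocks 5 and 1,000,001, a query for block 1,000,000 finds 1,000,001 as the next change and returns the value recorded in the change set for that block; a query for any block from 1,000,001 onwards falls through to the plain state.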
