Ethereum Source Code Analysis: Blocks, Transactions, Contracts, and the Virtual Machine

Ethereum stands as one of the most influential blockchain platforms in the world, enabling decentralized applications (dApps) and smart contracts through its robust architecture. This article dives deep into the core components of Ethereum’s source code—specifically focusing on blocks, transactions, contracts, and the Ethereum Virtual Machine (EVM). By analyzing the Go implementation (go-ethereum), we aim to provide a technically rich yet accessible understanding of how Ethereum operates under the hood.


Understanding Ethereum's Core Concepts

SHA-3 Hashing and RLP Encoding

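At the heart of Ethereum’s data integrity lie cryptographic hashing and efficient data serialization. The platform uses Keccak-256 for its hash operations. The source code and much documentation call it SHA-3, but it is the original Keccak submission, whose padding differs from the finalized NIST SHA-3 standard, so the two produce different digests for the same input. Keccak-256 is a one-way function that maps any input to a 256-bit digest; collisions exist in principle but are computationally infeasible to find. Unlike SHA-1 and SHA-2, which are built on the Merkle-Damgård construction, Keccak uses a sponge construction, which among other things makes it immune to length-extension attacks.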

Data in Ethereum is serialized using Recursive Length Prefix (RLP) encoding, which flattens arbitrarily nested byte arrays into a single byte string. Because the encoding of a given structure is deterministic, complex objects such as transactions and blocks are encoded and decoded identically on every node. Hashing an object’s RLP encoding with Keccak-256 yields its RLP hash, a cornerstone of Ethereum’s state management.

These RLP hashes are used as keys in key-value databases such as LevelDB, where the value is the RLP-encoded object itself. This design allows for efficient storage and retrieval while ensuring immutability and verifiability.
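To make the encoding rules concrete, here is a minimal sketch of an RLP encoder for byte strings and lists. The Item type and the Str and List helpers are hypothetical names chosen for illustration, not go-ethereum's rlp package API, and hashing is left out because Keccak-256 is not part of Go's standard library.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Item is a hypothetical RLP value: either a byte string or a list of items.
type Item struct {
	Bytes  []byte
	List   []Item
	IsList bool
}

// Str and List are small constructors for readability.
func Str(s string) Item       { return Item{Bytes: []byte(s)} }
func List(items ...Item) Item { return Item{List: items, IsList: true} }

// encodeLength builds the RLP prefix for a payload of the given size.
// offset is 0x80 for byte strings and 0xc0 for lists.
func encodeLength(size int, offset byte) []byte {
	if size < 56 {
		return []byte{offset + byte(size)}
	}
	// Long form: the prefix byte records how many bytes the length occupies.
	var lenBytes []byte
	for n := size; n > 0; n >>= 8 {
		lenBytes = append([]byte{byte(n)}, lenBytes...)
	}
	return append([]byte{offset + 55 + byte(len(lenBytes))}, lenBytes...)
}

// Encode serializes an item following the RLP rules.
func Encode(it Item) []byte {
	if it.IsList {
		var payload []byte
		for _, child := range it.List {
			payload = append(payload, Encode(child)...)
		}
		return append(encodeLength(len(payload), 0xc0), payload...)
	}
	// A single byte below 0x80 is its own encoding.
	if len(it.Bytes) == 1 && it.Bytes[0] < 0x80 {
		return []byte{it.Bytes[0]}
	}
	return append(encodeLength(len(it.Bytes), 0x80), it.Bytes...)
}

func main() {
	fmt.Println(hex.EncodeToString(Encode(Str("dog"))))                   // 83646f67
	fmt.Println(hex.EncodeToString(Encode(List(Str("cat"), Str("dog"))))) // c88363617483646f67
}
```

Because every value has exactly one valid encoding, two nodes that hash the same object always arrive at the same key.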


Key Data Types: Hashes and Addresses

Two fundamental types in Ethereum’s codebase are common.Hash and common.Address, both defined in the common package:

const (
    HashLength    = 32
    AddressLength = 20
)
type Hash [HashLength]byte
type Address [AddressLength]byte

A Hash is 32 bytes (256 bits), used universally to identify blocks, transactions, and states. An Address is 20 bytes, representing an account—either externally owned (EOA) or a smart contract.
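The following sketch shows how such fixed-size array types behave in practice. BytesToAddress follows the right-aligning behavior of the go-ethereum helper of the same name as I understand it; the Hex method here is a simplified assumption that skips checksum casing.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const (
	HashLength    = 32
	AddressLength = 20
)

type Hash [HashLength]byte
type Address [AddressLength]byte

// BytesToAddress right-aligns b in a 20-byte array. Longer input keeps only
// its last 20 bytes; shorter input is left-padded with zeros.
func BytesToAddress(b []byte) Address {
	var a Address
	if len(b) > AddressLength {
		b = b[len(b)-AddressLength:]
	}
	copy(a[AddressLength-len(b):], b)
	return a
}

// Hex renders the address as a 0x-prefixed lowercase hex string.
// (Simplified: no mixed-case checksum is applied in this sketch.)
func (a Address) Hex() string { return "0x" + hex.EncodeToString(a[:]) }

func main() {
	addr := BytesToAddress([]byte{0xde, 0xad, 0xbe, 0xef})
	fmt.Println(addr.Hex())

	// Fixed-size arrays are comparable, so they can be used as map keys.
	balances := map[Address]uint64{addr: 42}
	fmt.Println(balances[BytesToAddress([]byte{0xde, 0xad, 0xbe, 0xef})]) // 42
}
```

Using arrays rather than slices gives value semantics: copying an Address copies its bytes, and two addresses can be compared with ==.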

Additionally, Ethereum relies heavily on Go’s big.Int type to handle large integers safely, especially for values like balances, gas limits, and prices. Operations on big.Int must use built-in methods (e.g., Add, Sub) rather than native operators, ensuring precision and avoiding overflow issues.
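A short sketch of why big.Int matters for balances: 100 ETH expressed in wei no longer fits in a uint64, and a native counter silently wraps on overflow. The helper names EtherToWei, Credit, and Debit are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

var weiPerEther = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)

// EtherToWei converts a whole number of ether to wei (1 ETH = 10^18 wei).
func EtherToWei(eth int64) *big.Int {
	return new(big.Int).Mul(big.NewInt(eth), weiPerEther)
}

// Credit returns balance + amount without modifying either argument.
func Credit(balance, amount *big.Int) *big.Int {
	return new(big.Int).Add(balance, amount)
}

// Debit returns balance - amount, or false if the balance is too small.
func Debit(balance, amount *big.Int) (*big.Int, bool) {
	if balance.Cmp(amount) < 0 {
		return nil, false
	}
	return new(big.Int).Sub(balance, amount), true
}

func main() {
	// A native uint64 silently wraps around on overflow.
	var native uint64 = math.MaxUint64
	native++
	fmt.Println(native) // 0

	// 100 ETH in wei exceeds the uint64 range, but big.Int handles it.
	balance := EtherToWei(100)
	fmt.Println(balance.IsUint64()) // false

	// Methods write into their receiver, so allocate a fresh result
	// when the operands must stay untouched.
	fmt.Println(Credit(balance, EtherToWei(1)))
	_, ok := Debit(EtherToWei(1), EtherToWei(2))
	fmt.Println(ok) // false
}
```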


Gas and Ether: Fueling the Network

In Ethereum, Gas is the unit measuring computational effort required to execute operations. Every action—from simple transfers to complex smart contract executions—consumes Gas. This mechanism prevents abuse by ensuring users pay for resource usage.

Ether (ETH) is the native cryptocurrency. Users spend ETH to purchase Gas at a specified price (GasPrice). For example, if a transaction sets GasLimit = 21000 and GasPrice = 100 gwei, the sender pays at most 21000 × 100 = 2,100,000 gwei (0.0021 ETH); a plain transfer uses exactly 21000 Gas, so here that maximum is also the actual cost.

This dual-layer model separates computational cost from monetary value, allowing dynamic pricing based on network congestion while maintaining predictable execution costs.
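The fee arithmetic from the example above can be checked with a few lines of Go. MaxFeeWei and FormatEther are hypothetical helper names for this sketch.

```go
package main

import (
	"fmt"
	"math/big"
)

var (
	weiPerGwei  = big.NewInt(1_000_000_000)
	weiPerEther = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
)

// MaxFeeWei returns gasLimit * gasPrice in wei, with the price given in gwei.
func MaxFeeWei(gasLimit uint64, gasPriceGwei int64) *big.Int {
	price := new(big.Int).Mul(big.NewInt(gasPriceGwei), weiPerGwei)
	return price.Mul(price, new(big.Int).SetUint64(gasLimit))
}

// FormatEther renders a wei amount as a decimal ETH string.
func FormatEther(wei *big.Int, decimals int) string {
	return new(big.Rat).SetFrac(wei, weiPerEther).FloatString(decimals)
}

func main() {
	fee := MaxFeeWei(21000, 100)
	fmt.Println(fee, "wei")                 // 2100000000000000 wei
	fmt.Println(FormatEther(fee, 4), "ETH") // 0.0021 ETH
}
```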


Blocks as Containers of Transactions

A block is a collection of transactions, and blocks are linked into a chain by cryptographic hashes: each header stores the hash of its parent. A simplified view of the two structures (the real Header carries many more fields):

type Block struct {
    header       *Header
    transactions Transactions
}

type Header struct {
    ParentHash common.Hash
    Number     *big.Int
}

Blocks form a tamper-evident ledger: altering any block invalidates all subsequent hashes. Transactions within a block are processed sequentially by miners or validators during block execution.
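The tamper-evidence property can be demonstrated with a small sketch. Note the assumptions: the header layout is reduced to three fields, TxRoot is a hypothetical stand-in for the transaction commitment, and SHA-256 over a fixed byte layout replaces the Keccak-256 hash of the RLP-encoded header that Ethereum actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type Hash [32]byte

// Header is a reduced, illustrative header.
type Header struct {
	ParentHash Hash
	Number     uint64
	TxRoot     Hash // hypothetical stand-in for the transactions commitment
}

// Hash uses SHA-256 as a stand-in for Keccak-256 over the RLP encoding.
func (h Header) Hash() Hash {
	var num [8]byte
	binary.BigEndian.PutUint64(num[:], h.Number)
	buf := append([]byte{}, h.ParentHash[:]...)
	buf = append(buf, num[:]...)
	buf = append(buf, h.TxRoot[:]...)
	return sha256.Sum256(buf)
}

// BuildChain creates n headers, each pointing at the hash of its predecessor.
func BuildChain(n int) []Header {
	chain := make([]Header, n)
	for i := range chain {
		chain[i].Number = uint64(i)
		chain[i].TxRoot = sha256.Sum256([]byte(fmt.Sprintf("transactions of block %d", i)))
		if i > 0 {
			chain[i].ParentHash = chain[i-1].Hash()
		}
	}
	return chain
}

// VerifyChain returns the index of the first header whose ParentHash does not
// match its predecessor, or -1 if every link is intact.
func VerifyChain(chain []Header) int {
	for i := 1; i < len(chain); i++ {
		if chain[i].ParentHash != chain[i-1].Hash() {
			return i
		}
	}
	return -1
}

func main() {
	chain := BuildChain(4)
	fmt.Println(VerifyChain(chain)) // -1: all links valid

	chain[1].TxRoot[0] ^= 0xff      // tamper with block 1
	fmt.Println(VerifyChain(chain)) // 2: block 2 no longer points at block 1
}
```

Repairing the link in block 2 would change its own hash and break block 3 in turn, which is why a change anywhere forces every later block to be recomputed.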


Transaction Execution: From Submission to Finality

The Execution Pipeline

Transaction processing occurs in two layers: outside and inside the EVM.

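The entry point is StateProcessor.Process(), which iterates over each transaction in a block and calls ApplyTransaction(). This function returns a Receipt that captures the execution results, such as the cumulative Gas used, the logs emitted during execution, and, for a contract creation, the address of the new contract.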

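Each receipt includes a Bloom filter, a probabilistic data structure that lets nodes quickly rule out receipts and blocks that cannot contain a given log without scanning the full data. A miss is definitive, while a hit may be a false positive and still has to be confirmed against the actual logs.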
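A minimal sketch of the idea follows: a 2048-bit filter in which each entry sets three bits derived from its hash. The filter size and the three-bit scheme follow Ethereum's log bloom as I understand it, but SHA-256 is used here as a stand-in for Keccak-256 and the bit ordering is simplified, so the resulting bit patterns will not match a real node's.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const BloomBits = 2048

// Bloom is a 2048-bit filter stored as 256 bytes.
type Bloom [BloomBits / 8]byte

// bloomBits derives three bit positions from the first six hash bytes.
// SHA-256 is an assumption of this sketch; Ethereum uses Keccak-256.
func bloomBits(data []byte) [3]uint {
	h := sha256.Sum256(data)
	var idx [3]uint
	for i := 0; i < 3; i++ {
		idx[i] = (uint(h[2*i])<<8 | uint(h[2*i+1])) % BloomBits
	}
	return idx
}

// Add sets the three bits that belong to data.
func (b *Bloom) Add(data []byte) {
	for _, bit := range bloomBits(data) {
		b[bit/8] |= 1 << (bit % 8)
	}
}

// MayContain reports false only if data was definitely never added.
// A true result can be a false positive.
func (b *Bloom) MayContain(data []byte) bool {
	for _, bit := range bloomBits(data) {
		if b[bit/8]&(1<<(bit%8)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	var bloom Bloom
	topic := []byte("Transfer(address,address,uint256)")

	fmt.Println(bloom.MayContain(topic)) // false: the filter is empty
	bloom.Add(topic)
	fmt.Println(bloom.MayContain(topic)) // true
}
```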


Gas Mechanics: Consumption and Refunds

During execution, several Gas-related steps occur:

  1. Gas Purchase: Deduct (GasLimit × GasPrice) from sender’s balance.
  2. Intrinsic Gas Cost: Calculate base cost based on payload size and transaction type.
  3. EVM Execution: Run contract code or transfer funds.
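  4. Refund Calculation: Unused Gas is refunded; operations such as clearing storage earn an additional refund, which is capped at a fraction of the Gas used.
  5. Block Producer Reward: In the fee model described here, the block producer receives (GasUsed × GasPrice) in ETH. Since EIP-1559, the base-fee portion is burned and only the tip is paid out.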

This system incentivizes efficient coding (lower Gas usage) and rewards validators for securing the network.
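The five steps above can be sketched as a single settlement function. This is an illustration under stated assumptions rather than go-ethereum's state transition code: it models the legacy fee model, uses a per-byte cost of 4 for zero bytes and 16 for non-zero bytes, caps refunds at half of the Gas used (the pre-London rule), and returns an error on out-of-gas where a real node would still charge the full limit. The Settle function and Settlement type are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

const (
	TxGas            uint64 = 21000 // flat cost of any transaction
	TxDataZeroGas    uint64 = 4     // per zero byte of payload
	TxDataNonZeroGas uint64 = 16    // per non-zero byte (68 before EIP-2028)
)

var ErrOutOfGas = errors.New("out of gas")

// IntrinsicGas is the cost charged before any EVM code runs.
// Contract-creation surcharges are left out of this sketch.
func IntrinsicGas(data []byte) uint64 {
	gas := TxGas
	for _, b := range data {
		if b == 0 {
			gas += TxDataZeroGas
		} else {
			gas += TxDataNonZeroGas
		}
	}
	return gas
}

// Settlement summarizes where the prepaid fee ends up, in wei.
type Settlement struct {
	GasUsed      uint64
	Prepaid      *big.Int // step 1: GasLimit x GasPrice
	SenderRefund *big.Int // step 4: unused gas returned to the sender
	ProducerFee  *big.Int // step 5: GasUsed x GasPrice
}

// Settle walks the steps for a transaction whose EVM execution consumed
// execGas and accumulated refundCounter gas of storage-clearing refunds.
func Settle(gasLimit, execGas, refundCounter uint64, data []byte, gasPriceWei int64) (Settlement, error) {
	price := big.NewInt(gasPriceWei)
	toWei := func(gas uint64) *big.Int {
		return new(big.Int).Mul(new(big.Int).SetUint64(gas), price)
	}
	used := IntrinsicGas(data) + execGas // steps 2 and 3
	if used > gasLimit {
		return Settlement{}, ErrOutOfGas
	}
	// Assumed cap: half of the gas used (the pre-London rule).
	if limit := used / 2; refundCounter > limit {
		refundCounter = limit
	}
	used -= refundCounter
	return Settlement{
		GasUsed:      used,
		Prepaid:      toWei(gasLimit),
		SenderRefund: toWei(gasLimit - used),
		ProducerFee:  toWei(used),
	}, nil
}

func main() {
	s, err := Settle(100000, 20000, 15000, nil, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.GasUsed, s.Prepaid, s.SenderRefund, s.ProducerFee) // 26000 200000 148000 52000
}
```

The sender's refund and the producer's fee always add up to the prepaid amount, so no ETH is created or lost in this model.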


Digital Signatures and Address Recovery

Every transaction must be digitally signed using the Elliptic Curve Digital Signature Algorithm (ECDSA) over the secp256k1 curve. A signature consists of three parts, R, S, and V, each stored as a *big.Int in the transaction.

The sender’s address isn’t explicitly stored but derived from the public key recovered during signature verification. Ethereum defines a Signer interface to abstract this logic:

type Signer interface {
    Sender(tx *Transaction) (common.Address, error)
    Hash(tx *Transaction) common.Hash
}

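Functions like SignTx() generate signatures, while Sender() recovers the originating address. Because anyone can recover the sender from the signature, this design does not hide who sent a transaction; its benefit is that the address need not be stored in the transaction a second time.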

Before execution, AsMessage() converts a signed transaction into a Message, extracting essential fields including the sender’s address.


Inside the Ethereum Virtual Machine (EVM)

Context and State Management

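The EVM executes transaction logic in a sandboxed environment. It receives contextual information via a Context struct, which carries transaction-level data (such as the origin address and Gas price) and block-level data (such as the coinbase, block number, timestamp, and Gas limit).

The Context also carries two critical function hooks:

type TransferFunc func(StateDB, common.Address, common.Address, *big.Int)
type CanTransferFunc func(StateDB, common.Address, *big.Int) bool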

These allow customization—for instance, implementing alternative transfer rules in private chains.

State changes are not immediately written to disk. Instead, they’re cached in StateDB, which uses a Merkle Patricia Trie for efficient state representation. Only after full block validation are changes committed.
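A toy model can illustrate the cache-then-commit behavior together with the two transfer hooks. Everything here is an assumption made for illustration: the StateDB below is a pair of in-memory maps rather than go-ethereum's interface, the trie is not modeled at all, and Discard and wei are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math/big"
)

type Address [20]byte

// StateDB is a toy stand-in: pending writes sit in an in-memory overlay and
// reach the committed map only on Commit. The trie itself is not modeled.
type StateDB struct {
	committed map[Address]*big.Int
	pending   map[Address]*big.Int
}

func NewStateDB() *StateDB {
	return &StateDB{committed: map[Address]*big.Int{}, pending: map[Address]*big.Int{}}
}

// GetBalance reads through the overlay and returns a copy.
func (s *StateDB) GetBalance(a Address) *big.Int {
	if v, ok := s.pending[a]; ok {
		return new(big.Int).Set(v)
	}
	if v, ok := s.committed[a]; ok {
		return new(big.Int).Set(v)
	}
	return new(big.Int)
}

// SetBalance records a write in the overlay only.
func (s *StateDB) SetBalance(a Address, v *big.Int) { s.pending[a] = new(big.Int).Set(v) }

// Commit flushes pending writes into the committed state.
func (s *StateDB) Commit() {
	for a, v := range s.pending {
		s.committed[a] = v
	}
	s.pending = map[Address]*big.Int{}
}

// Discard drops all pending writes.
func (s *StateDB) Discard() { s.pending = map[Address]*big.Int{} }

// CanTransfer and Transfer mirror the shape of the hooks described above.
func CanTransfer(db *StateDB, from Address, amount *big.Int) bool {
	return db.GetBalance(from).Cmp(amount) >= 0
}

func Transfer(db *StateDB, from, to Address, amount *big.Int) {
	db.SetBalance(from, new(big.Int).Sub(db.GetBalance(from), amount))
	db.SetBalance(to, new(big.Int).Add(db.GetBalance(to), amount))
}

func wei(n int64) *big.Int { return big.NewInt(n) }

func main() {
	alice, bob := Address{1}, Address{2}
	db := NewStateDB()
	db.SetBalance(alice, wei(100))
	db.Commit()

	if CanTransfer(db, alice, wei(30)) {
		Transfer(db, alice, bob, wei(30))
	}
	fmt.Println(db.GetBalance(bob)) // 30, visible in the cache
	db.Discard()                    // e.g. the block failed validation
	fmt.Println(db.GetBalance(bob)) // 0 again
}
```

Swapping in a different CanTransfer or Transfer function is all it takes to change the transfer rules, which is the customization point the hooks provide.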


Smart Contracts: Creation and Invocation

A Contract is an EVM object representing a smart contract instance:

type Contract struct {
    CallerAddress common.Address
    self          ContractRef
    Code          []byte
    Input         []byte
    Gas           uint64
}

Two primary functions handle contract interaction: Create(), which deploys a new contract, and Call(), which invokes the code of an existing one.

When a transaction has no recipient (Recipient == nil), its Payload becomes the contract’s bytecode (Code). Otherwise, Payload serves as input data (Input) for function calls.
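The branching on the recipient can be sketched as follows. The Message, CodeStore, and Prepare names are hypothetical and chosen for this example; the real Create and Call paths do considerably more work, such as deriving the new contract address and transferring value.

```go
package main

import "fmt"

type Address [20]byte

// Message carries the two fields discussed above; the names are illustrative.
type Message struct {
	To      *Address // nil signals contract creation
	Payload []byte
}

// Contract holds what the EVM will execute and the data passed to it.
type Contract struct {
	Code  []byte
	Input []byte
}

// CodeStore is a hypothetical lookup of deployed bytecode by address.
type CodeStore map[Address][]byte

// Prepare builds the Contract to execute and reports whether it is a creation.
func Prepare(msg Message, store CodeStore) (c Contract, creation bool) {
	if msg.To == nil {
		// No recipient: the payload itself is the code to run.
		return Contract{Code: msg.Payload}, true
	}
	// Existing recipient: run its stored code with the payload as input.
	return Contract{Code: store[*msg.To], Input: msg.Payload}, false
}

func main() {
	token := Address{0xaa}
	store := CodeStore{token: {0x60, 0x00}}

	deploy, creation := Prepare(Message{Payload: []byte{0x60, 0x01}}, store)
	fmt.Println(creation, len(deploy.Code), len(deploy.Input)) // true 2 0

	call, creation := Prepare(Message{To: &token, Payload: []byte{0xde, 0xad}}, store)
	fmt.Println(creation, len(call.Code), len(call.Input)) // false 2 2
}
```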


Precompiled Contracts for Efficiency

Certain critical operations—like SHA-256 hashing or elliptic curve recovery—are implemented as precompiled contracts located at specific addresses. These bypass the interpreter loop for speed and predictability.

They implement:

type PrecompiledContract interface {
    RequiredGas(input []byte) uint64
    Run(input []byte) ([]byte, error)
}

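Examples include ecrecover at address 0x01, sha256 at 0x02, ripemd160 at 0x03, and the identity (data copy) function at 0x04.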

Developers can extend this set in custom forks for high-performance use cases.
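As an illustration, here is a sketch of two precompiles implementing the interface above, using only the standard library. The gas formulas (a base cost plus a per-32-byte-word cost) follow the values I believe go-ethereum uses for sha256 and identity; the RunPrecompiled helper and the registry map are hypothetical simplifications.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

type Address [20]byte

type PrecompiledContract interface {
	RequiredGas(input []byte) uint64
	Run(input []byte) ([]byte, error)
}

var ErrOutOfGas = errors.New("out of gas")

// words returns the input length rounded up to 32-byte words.
func words(input []byte) uint64 { return uint64(len(input)+31) / 32 }

// sha256hash: assumed cost of 60 base + 12 per word.
type sha256hash struct{}

func (sha256hash) RequiredGas(input []byte) uint64 { return words(input)*12 + 60 }
func (sha256hash) Run(input []byte) ([]byte, error) {
	h := sha256.Sum256(input)
	return h[:], nil
}

// dataCopy (identity): assumed cost of 15 base + 3 per word.
type dataCopy struct{}

func (dataCopy) RequiredGas(input []byte) uint64 { return words(input)*3 + 15 }
func (dataCopy) Run(input []byte) ([]byte, error) {
	return append([]byte{}, input...), nil
}

// Precompiles maps the well-known addresses 0x02 and 0x04 to native code.
var Precompiles = map[Address]PrecompiledContract{
	Address{19: 0x02}: sha256hash{},
	Address{19: 0x04}: dataCopy{},
}

// RunPrecompiled charges the gas up front, then runs the native code.
func RunPrecompiled(p PrecompiledContract, input []byte, gas uint64) (out []byte, gasLeft uint64, err error) {
	cost := p.RequiredGas(input)
	if gas < cost {
		return nil, 0, ErrOutOfGas
	}
	out, err = p.Run(input)
	return out, gas - cost, err
}

func main() {
	p := Precompiles[Address{19: 0x02}]
	out, left, err := RunPrecompiled(p, []byte("abc"), 100)
	fmt.Println(hex.EncodeToString(out), left, err)
}
```

Because the cost is a pure function of the input, the caller knows the price before any work is done, which is what makes precompiles predictable.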


Interpreter: Executing Opcodes

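For non-precompiled contracts, execution falls to the Interpreter, which processes bytecode one opcode at a time. Each opcode (1 byte) maps to an operation struct that defines how the opcode executes, how much Gas it costs, and what it requires of the stack and memory.

Supported operations span arithmetic and comparison, stack, memory, and storage access, control flow, calls to other contracts, and logging. Logs are crucial for off-chain indexing services (e.g., The Graph), enabling event-driven dApp functionality.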
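The opcode-by-opcode loop described in this section can be sketched with a four-instruction machine. The opcode values and Gas costs for STOP, ADD, MUL, and PUSH1 match the EVM's, but the rest is simplified by assumption: the stack holds uint64 values instead of 256-bit words, there is no memory or storage, and the vm, operation, and Run names are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

type OpCode byte

const (
	STOP  OpCode = 0x00
	ADD   OpCode = 0x01
	MUL   OpCode = 0x02
	PUSH1 OpCode = 0x60
)

var (
	ErrOutOfGas       = errors.New("out of gas")
	ErrStackUnderflow = errors.New("stack underflow")
	ErrInvalidOpCode  = errors.New("invalid opcode")
)

// vm holds the mutable execution state for one run.
type vm struct {
	code  []byte
	pc    int
	stack []uint64 // simplified: the EVM uses 256-bit words
}

func (m *vm) push(v uint64) { m.stack = append(m.stack, v) }
func (m *vm) pop() uint64 {
	v := m.stack[len(m.stack)-1]
	m.stack = m.stack[:len(m.stack)-1]
	return v
}

// operation pairs an opcode's behavior with its cost and stack needs.
type operation struct {
	execute  func(m *vm)
	gasCost  uint64
	minStack int
}

var jumpTable = map[OpCode]operation{
	ADD: {func(m *vm) { m.push(m.pop() + m.pop()) }, 3, 2},
	MUL: {func(m *vm) { m.push(m.pop() * m.pop()) }, 5, 2},
	PUSH1: {func(m *vm) {
		m.pc++ // the immediate operand follows the opcode
		var v uint64
		if m.pc < len(m.code) {
			v = uint64(m.code[m.pc])
		}
		m.push(v)
	}, 3, 0},
}

// Run executes code until STOP or the end of code, charging gas per opcode.
// It returns the top of the stack (0 if empty) and the gas left over.
func Run(code []byte, gas uint64) (top uint64, gasLeft uint64, err error) {
	m := &vm{code: code}
	for ; m.pc < len(code); m.pc++ {
		op := OpCode(code[m.pc])
		if op == STOP {
			break
		}
		entry, ok := jumpTable[op]
		switch {
		case !ok:
			return 0, 0, ErrInvalidOpCode
		case len(m.stack) < entry.minStack:
			return 0, 0, ErrStackUnderflow
		case gas < entry.gasCost:
			return 0, 0, ErrOutOfGas
		}
		gas -= entry.gasCost
		entry.execute(m)
	}
	if len(m.stack) > 0 {
		top = m.stack[len(m.stack)-1]
	}
	return top, gas, nil
}

func main() {
	// PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP computes (2 + 3) * 4.
	prog := []byte{0x60, 0x02, 0x60, 0x03, 0x01, 0x60, 0x04, 0x02, 0x00}
	fmt.Println(Run(prog, 100)) // 20 83 <nil>
	fmt.Println(Run(prog, 10))  // 0 0 out of gas
}
```

Every iteration checks the remaining Gas before executing, so a program can never run longer than its budget allows.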


Frequently Asked Questions

Q: What is the difference between Gas and Ether?
A: Gas measures computational work; Ether is the currency used to pay for Gas. You buy Gas with Ether at a rate set by GasPrice.

Q: How is a sender's address recovered from a transaction?
A: Using ECDSA signature recovery: the public key is derived from the (R, S, V) values and hashed with Keccak-256, and the last 20 bytes of that hash form the sender’s address.

Q: Why use RLP instead of JSON or Protocol Buffers?
A: RLP ensures canonical encoding—identical objects always produce the same byte output—critical for consistent hashing across nodes.

Q: Can a transaction modify its own Gas consumption mid-execution?
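A: No. Gas is only ever consumed during execution. Operations such as clearing storage accumulate a refund, but it is credited after execution finishes, is capped at a fraction of the Gas used, and never makes more Gas available while the code is running.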

Q: How do logs help in decentralized applications?
A: Logs emit events that frontends listen to (via Web3.js or Ethers.js), enabling real-time updates without querying full state.

Q: Is the EVM Turing-complete?
A: Yes in principle, but every execution is bounded by its Gas limit, which prevents infinite loops and denial-of-service attacks.


Conclusion

Ethereum’s architecture combines cryptographic rigor with flexible computation through blocks, transactions, smart contracts, and the EVM. Its use of SHA-3 hashing, RLP encoding, ECDSA signatures, and Gas-based economics creates a secure, scalable foundation for decentralized innovation.

Understanding these mechanisms provides insight not only into how Ethereum works but also how developers can build efficient, secure applications on top of it.
