zeth: the EVM as a differential trail against PyEVM

2026-05-18 · project zeth

This is a lab notebook entry, not an essay. Format is claim → why differential testing is load-bearing → opcode coverage → precompiles → gas accounting → honest gaps → reproducibility. Every claim is graded against the controlled evidence vocabulary; every assertion is footnoted to the [zeth][repo] repo's README, STATUS_SUMMARY.md, or CHANGELOG.md. The substrate under examination is a Zig implementation of the Ethereum Virtual Machine, written with correctness against the reference clients (PyEVM / go-ethereum) as the explicit objective rather than absolute performance.

[repo]: https://github.com/SMC17/zeth

1. The claim

zeth is a Zig EVM whose v1.0 surface has dispatch handlers for 142 of the 143 opcode enum entries, routes the nine canonical precompile addresses (0x01..0x09), and publishes machine-readable validation artifacts on every CI run. At the snapshot the README and STATUS_SUMMARY.md both pin (revision 381f677 +, 2026-03-26)[1, 2]:

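- opcode_report summary: total 33, passed 33, precompile tests 14, precompile passed 14, failures 0.
- The WASM (wasm32-wasi) and RISC-V (riscv64-linux) cross-compile targets build in CI.
- Precompiles 0x01..0x05 (ECRECOVER, SHA256, RIPEMD160, IDENTITY, MODEXP) are verified against PyEVM.
- Precompiles 0x06..0x09 (BN256ADD, BN256MUL, BN256PAIRING, BLAKE2F) have routing implemented but 58 discrepancies vs PyEVM under investigation; the CI Test Suite is currently red on that regression-gate step. Do not depend on these four precompiles for production cryptography.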

<!-- runnable-claim: zeth-opcode-report --> <!-- runnable-claim: zeth-precompile-discrepancies -->

Evidence grade: differential-tested. The grade applies to the opcode-and-state surface that is verified against PyEVM; it does not apply to the four BN256 / BLAKE2F precompiles, which the repo itself flags as red and under investigation[1]. The discipline of naming the red row honestly is the point of the differential-test framing.

2. Why differential testing is the load-bearing claim

Most "EVM in Zig" projects answer the wrong question. "Does the interpreter run the toy examples?" is the wrong bar. The real question for a consensus-critical implementation is: for every observable state transition the reference clients produce, does this implementation produce the same one? That is the question only a differential test can answer.
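A differential test reduces to one loop: run the same input through both implementations and compare every observable field of the result. The Python sketch below is illustrative only and is not zeth's run_reference_tests; the Outcome fields, run_differential, and the two toy interpreters are hypothetical stand-ins, and treating the success flag, gas used, return data and storage as "observable" is an assumption. A run with no reference is reported as such rather than counted as passing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Outcome:
    """Observable result of one execution (the field set is an assumption)."""
    success: bool
    gas_used: int
    return_data: bytes
    storage: Dict[int, int] = field(default_factory=dict)


Interpreter = Callable[[bytes], Outcome]
OBSERVABLES = ("success", "gas_used", "return_data", "storage")


def diff_outcomes(ref: Outcome, cand: Outcome) -> List[str]:
    """Names of the observable fields on which the two outcomes disagree."""
    return [f for f in OBSERVABLES if getattr(ref, f) != getattr(cand, f)]


def run_differential(cases: List[Tuple[str, bytes]], candidate: Interpreter,
                     reference: Optional[Interpreter] = None) -> dict:
    """Run every case through the candidate and, when available, the reference."""
    report = {"mode": "differential" if reference is not None else "no-reference",
              "ran": 0, "passed": 0, "discrepancies": []}
    for name, code in cases:
        cand = candidate(code)
        report["ran"] += 1
        if reference is None:
            continue  # nothing to compare against: never counted as a pass
        mismatched = diff_outcomes(reference(code), cand)
        if mismatched:
            report["discrepancies"].append((name, mismatched))
        else:
            report["passed"] += 1
    return report


# Toy stand-ins for the two interpreters; the candidate has a deliberate bug.
def toy_reference(code: bytes) -> Outcome:
    return Outcome(True, 3 * len(code), code[::-1])


def toy_candidate(code: bytes) -> Outcome:
    return Outcome(True, 3 * len(code) if code else 1, code[::-1])


cases = [("empty", b""), ("one", b"\x01"), ("two", b"\x01\x02")]
print(run_differential(cases, toy_candidate, toy_reference))
```

The toy candidate agrees with the reference on every non-empty input, so only the "empty" case surfaces, and only on gas_used.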

zeth is explicit about this stance. The README describes the project as "a Zig EVM focused on correctness-first execution semantics, differential validation, and a path to high-performance research tooling"[1], and its Status section adds:

> Alpha software under active development. Claims should be tied to passing tests, differential results, and CI artifacts.[3]

The differential surface as it exists today comprises five pieces, reachable from zig build and the CI artifacts[4]:

  1. run_reference_tests: the differential runner that takes PyEVM (and optionally go-ethereum) as a reference. Compiled binary at ./zig-out/bin/run_reference_tests. In no-reference mode (when neither PyEVM nor Geth is on the path), 22 / 22 self-consistency cases pass; in differential mode the runner asserts post-state equality against the reference for the cases it can drive.
  2. zig build opcode-report -- --format json: machine-readable per-opcode pass / fail status emitted to a JSON file that CI publishes as an artifact (opcode_report.json). Total 33 cases, 33 passed, 0 failed at the pinned snapshot[2].
  3. zig build validate-rlp / validate-rlp-decode / validate-rlp-invalid: RLP encoder / decoder validation against the canonical test corpus.
  4. zig build validate-vm: VMTests against https://github.com/ethereum/tests (requires a sibling clone of the official Ethereum test corpus).
  5. precompile_differential_report.json: the CI artifact that tracks per-precompile pass / fail status against PyEVM. This is the artifact that currently reports the 58 BN256 / BLAKE2F discrepancies; the discrepancies themselves are tracked in validation/baselines/reference_discrepancies.json[1].

The load-bearing observation is where the status lives: the opcode and precompile surfaces emit machine-readable JSON artifacts, and CI either gates on them or publishes them for inspection. A future contributor, human or agent, does not need to read the source to know the implementation status; the JSON is the source of truth, and the README is the human-readable rollup.

3. The opcode coverage map

Per STATUS_SUMMARY.md at revision 381f677 +[2]:

| Surface | Count |
| -------------------------- | ------: |
| Opcode enum entries | 143 |
| Opcode dispatch handlers | 142 |
| zig build test | 263 / 263 passing |
| TODO / FIXME markers | 2 across src/ + validation/ |
| Total Zig source | 18 608 lines across 35 files |
| opcode_report JSON summary | total 33, passed 33, failures 0 |

The 142-of-143 gap is the single un-dispatched opcode in the enum. The README's "Current Reality" section does not say which opcode that is, so this entry will not speculate. The opcode_report artifact reports zero failures across the 33-case machine-readable matrix, which suggests the un-dispatched enum entry is not exercised by the current report's cases. That is a coverage gap to close, not a silent bug, and the difference matters for honesty.

STATUS_SUMMARY.md also enumerates the correctness fixes that landed in this revision, each one mapping to a concrete EVM-spec behaviour[2]:

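- U256 endianness: fromBytes / toBytes now correctly map big-endian bytes to the little-endian limb layout (limbs[0] is LSB, limbs[3] is MSB). Previously the MSB was stored in limbs[0], which produced an incorrect memory layout for MSTORE / MLOAD relative to the EVM specification.
- SIGNEXTEND: the byte-position guard now handles position >= 31 and large U256 positions (the prior code masked with & 31, which wrapped around). The sign-bit-zero case now clears the upper bytes (previously it left them unchanged).
- SHL / SHR / SAR: return the correct result when the shift amount has non-zero upper limbs (shift ≥ 2⁶⁴).
- EXP gas: returns 10 (not 60) when the exponent is zero (its byte-length is 0, not 1).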
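Three of these fixes are easiest to read against an executable reference model. The sketch below uses Python integers masked to 256 bits; the function names are hypothetical, not zeth's, and the EXP constants are the Yellow Paper's G_exp = 10 and G_expbyte = 50 (the per-byte cost since EIP-160).

```python
U256_MAX = (1 << 256) - 1
G_EXP, G_EXPBYTE = 10, 50  # Yellow Paper fee schedule (G_expbyte = 50 since EIP-160)


def to_signed(x: int) -> int:
    """Interpret a 256-bit word as a two's-complement integer."""
    return x - (1 << 256) if x >> 255 else x


def signextend(b: int, x: int) -> int:
    """SIGNEXTEND: extend the sign of byte b (0 = least significant) of x."""
    if b >= 31:  # compare the full word: masking with & 31 would wrap 32 to 0
        return x
    sign_bit = 8 * b + 7
    low_mask = (1 << (sign_bit + 1)) - 1
    if (x >> sign_bit) & 1:
        return x | (U256_MAX ^ low_mask)  # sign bit set: fill the upper bytes with ones
    return x & low_mask                   # sign bit clear: zero the upper bytes


def shl(shift: int, value: int) -> int:
    return 0 if shift >= 256 else (value << shift) & U256_MAX


def shr(shift: int, value: int) -> int:
    return 0 if shift >= 256 else value >> shift


def sar(shift: int, value: int) -> int:
    if shift >= 256:  # also covers shifts with non-zero upper limbs (>= 2**64)
        return U256_MAX if value >> 255 else 0
    return (to_signed(value) >> shift) & U256_MAX  # Python's >> floors, as SAR requires


def exp_gas(exponent: int) -> int:
    byte_len = (exponent.bit_length() + 7) // 8  # 0 when the exponent is 0
    return G_EXP + G_EXPBYTE * byte_len
```

With these definitions, signextend(32, 0xFF) returns 0xFF, whereas a version that masks the position with & 31 would treat 32 as 0 and return 2**256 - 1.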

These are exactly the class of bug a differential test catches and a unit test misses: each one survives a textbook reading of the EVM spec until the implementation runs side by side with PyEVM on an adversarial corner-case input.
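The limb-layout bug shows why corner inputs have to be chosen with care. The Python sketch below (hypothetical function names; zeth's fromBytes / toBytes are Zig) decodes a big-endian word into four 64-bit limbs with limbs[0] least significant, keeps the pre-fix MSB-first decoder for comparison, and checks both against Python's own integer decoding as the reference.

```python
from typing import List


def from_bytes_be(word: bytes) -> List[int]:
    """32 big-endian bytes -> four 64-bit limbs, limbs[0] least significant."""
    assert len(word) == 32
    return [int.from_bytes(word[24 - 8 * i: 32 - 8 * i], "big") for i in range(4)]


def to_bytes_be(limbs: List[int]) -> bytes:
    """Inverse of from_bytes_be: the most significant limb (limbs[3]) is written first."""
    return b"".join(limb.to_bytes(8, "big") for limb in reversed(limbs))


def from_bytes_be_buggy(word: bytes) -> List[int]:
    """Pre-fix layout, for comparison: the most significant 8 bytes land in limbs[0]."""
    return [int.from_bytes(word[8 * i: 8 * i + 8], "big") for i in range(4)]


def limbs_value(limbs: List[int]) -> int:
    """The integer a little-endian limb array represents; limb arithmetic relies on this."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


# Differential check against Python's int as the reference decoding.
corners = [0, 1, 1 << 64, 1 << 255, (1 << 256) - 1]
buggy_mismatches = [v for v in corners
                    if limbs_value(from_bytes_be_buggy(v.to_bytes(32, "big"))) != v]
print(buggy_mismatches)
```

Only 1, 2**64 and 2**255 are flagged: 0 and 2**256 - 1 have identical limbs, so they decode the same way under either layout, and a suite built only from those two textbook values would pass with the bug present.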

4. Precompile coverage and the BN256 / BLAKE2F honest gap

The precompile address space 0x01..0x09 is the most consequential correctness surface in any EVM implementation, because precompiles substitute curated cryptographic primitives for opcode-level arithmetic, and getting one of them wrong silently produces a fork. zeth's differential-test status, per the README[1]:

| Precompile | Address | Function | Differential status (vs PyEVM, May 2026) |
| ------------ | ------- | ---------------------- | --------------------------------------------------- |
| ECRECOVER | 0x01 | secp256k1 sig recovery | verified |
| SHA256 | 0x02 | SHA-256 hash | verified |
| RIPEMD160 | 0x03 | RIPEMD-160 hash | verified |
| IDENTITY | 0x04 | data echo | verified |
| MODEXP | 0x05 | modular exponentiation | verified |
| BN256ADD | 0x06 | BN256 curve addition | routing implemented, 58 discrepancies investigating |
| BN256MUL | 0x07 | BN256 scalar mul | routing implemented, 58 discrepancies investigating |
| BN256PAIRING | 0x08 | BN256 pairing check | routing implemented, 58 discrepancies investigating |
| BLAKE2F | 0x09 | BLAKE2b F compression | routing implemented, 58 discrepancies investigating |
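For orientation, the fixed-formula precompiles are small. The Python sketch below follows the Yellow Paper's precompiled-contract gas formulas for 0x01..0x04 and EIP-152's input rules for BLAKE2F; it is not zeth's dispatch code, the function names are hypothetical, and MODEXP and the BN256 operations are deliberately left out because their arithmetic is where the hard correctness work sits.

```python
import hashlib


def words(n: int) -> int:
    """Number of 32-byte words needed to hold n bytes."""
    return (n + 31) // 32


def blake2f_rounds(data: bytes) -> int:
    """EIP-152 input check: 4-byte rounds, 64-byte h, 128-byte m, 16-byte t, 1-byte f."""
    if len(data) != 213:
        raise ValueError("BLAKE2F input must be exactly 213 bytes")  # the call fails
    if data[212] not in (0, 1):
        raise ValueError("BLAKE2F final-block flag must be 0 or 1")  # the call fails
    return int.from_bytes(data[0:4], "big")


def precompile_gas(address: int, data: bytes) -> int:
    """Gas for the fixed-formula precompiles; other addresses are out of scope here."""
    if address == 0x01:  # ECRECOVER: flat cost
        return 3000
    if address == 0x02:  # SHA256
        return 60 + 12 * words(len(data))
    if address == 0x03:  # RIPEMD160
        return 600 + 120 * words(len(data))
    if address == 0x04:  # IDENTITY
        return 15 + 3 * words(len(data))
    if address == 0x09:  # BLAKE2F: one gas per round
        return blake2f_rounds(data)
    raise NotImplementedError(f"no gas sketch for 0x{address:02x}")


def run_simple(address: int, data: bytes) -> bytes:
    """Outputs for the two precompiles that need no curve arithmetic."""
    if address == 0x02:
        return hashlib.sha256(data).digest()
    if address == 0x04:
        return bytes(data)
    raise NotImplementedError(f"no output sketch for 0x{address:02x}")


blake_input = (12).to_bytes(4, "big") + bytes(208) + b"\x01"  # 12 rounds, final block
print(precompile_gas(0x04, b"x" * 33), precompile_gas(0x09, blake_input))
```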

The 58 discrepancies are tracked in validation/baselines/reference_discrepancies.json, and the README's posture is the right one[1]:

> The CI Test Suite is currently red on this regression-gate step until the discrepancies are resolved or accepted into the baseline as known-investigating findings. Do not depend on these four precompiles for production cryptography.

Three things to read off this honestly:

  1. The red regression gate is the substrate that makes "verified" mean something for 0x01..0x05. A green CI with 58 silent discrepancies in 0x06..0x09 would be exactly the failure mode the differential discipline is designed to prevent.
  2. A single fault in the curve arithmetic can produce dozens of failing cases across the BN256 surface, because the test vectors share inputs across operations. The right reading is "the four red precompiles do not pass the differential gate," not "there are 58 independent bugs."
  3. "Routing implemented, discrepancies under investigation" is the correct framing. No other claim is being made. The five green precompiles are the v1-class surface; the four red ones are an honest open coverage gap on the path to v2.

The precompile_differential_report.json artifact published by CI is the live source of truth for which precompiles pass on a given commit. The README's color is a snapshot; the JSON is the live state.
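The baseline mechanism the README describes can be sketched as a set difference: any discrepancy in the current report that is not in the accepted baseline turns the gate red. The JSON shape used here ({"discrepancies": [{"id": ...}]}) and the case identifiers are assumptions for illustration; the real schemas of precompile_differential_report.json and reference_discrepancies.json may differ.

```python
import json
from typing import List, Tuple


def regression_gate(report_json: str, baseline_json: str) -> Tuple[bool, List[str], List[str]]:
    """Return (passed, new, resolved) for a discrepancy report against a baseline.

    Assumed schema for both documents: {"discrepancies": [{"id": "..."}, ...]}.
    """
    current = {d["id"] for d in json.loads(report_json)["discrepancies"]}
    accepted = {d["id"] for d in json.loads(baseline_json)["discrepancies"]}
    new = sorted(current - accepted)       # not in the baseline: the gate goes red
    resolved = sorted(accepted - current)  # in the baseline but no longer observed
    return (not new, new, resolved)


# Hypothetical case ids: one accepted finding, one new divergence.
report = json.dumps({"discrepancies": [{"id": "0x06/case-3"}, {"id": "0x09/case-1"}]})
baseline = json.dumps({"discrepancies": [{"id": "0x06/case-3"}]})
passed, new, resolved = regression_gate(report, baseline)
print(passed, new, resolved)
```

Under this model, accepting the 58 findings into the baseline as known-investigating would turn the gate green while keeping them listed, and any further divergence would still fail the build.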

5. Gas accounting — exact-equality goldens, not approximations

Gas accounting is where most EVM implementations silently disagree with the reference. The cost rules for CALL* / CREATE* / SELFDESTRUCT / SSTORE involve enough conditional branches that "close enough" is not a viable correctness bar. zeth's README lists its gas-correctness coverage as exact-equality golden tests[1]:

| Surface | Coverage |
| ---------------------------------------- | --------------------------- |
| CALL* / CREATE* stipend / refund edges | exact-equality golden tests |
| SELFDESTRUCT accounting | exact-equality golden tests |
| Memory expansion boundaries | exact-equality golden tests |
| SSTORE EIP-2200 / EIP-2929 refund logic | exact-equality golden tests |
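As an example of why these need exact goldens, the SSTORE rule alone branches on the slot's original, current and new values plus its warm / cold status. The sketch below is a Python reading of EIP-2200 with the EIP-2929 substitutions (Berlin constants); the names are hypothetical, EIP-2200's 2300-gas sentry is omitted, and EIP-3529 (London) later lowered the clearing refund, so which fork's numbers zeth's goldens pin is not asserted here.

```python
from typing import Tuple

# Berlin values: EIP-2200 rules with the EIP-2929 substitutions.
COLD_SLOAD_COST = 2100
WARM_STORAGE_READ_COST = 100                # replaces EIP-2200's SLOAD_GAS
SSTORE_SET_GAS = 20000
SSTORE_RESET_GAS = 5000 - COLD_SLOAD_COST   # 2900
SSTORE_CLEARS_SCHEDULE = 15000              # EIP-3529 (London) lowers this to 4800


def sstore_cost(original: int, current: int, new: int, warm: bool) -> Tuple[int, int]:
    """Return (gas charged, refund-counter delta) for one SSTORE.

    original: slot value at transaction start; current: value before this SSTORE.
    """
    gas = 0 if warm else COLD_SLOAD_COST    # first touch of the slot pays the cold cost
    refund = 0
    if current == new:                      # no-op write
        return gas + WARM_STORAGE_READ_COST, 0
    if original == current:                 # clean slot: first change in this transaction
        if original == 0:
            return gas + SSTORE_SET_GAS, 0
        if new == 0:
            refund += SSTORE_CLEARS_SCHEDULE
        return gas + SSTORE_RESET_GAS, refund
    # Dirty slot: already changed earlier in this transaction.
    if original != 0:
        if current == 0:
            refund -= SSTORE_CLEARS_SCHEDULE  # undo a clearing refund granted earlier
        if new == 0:
            refund += SSTORE_CLEARS_SCHEDULE
    if original == new:                     # slot restored to its original value
        if original == 0:
            refund += SSTORE_SET_GAS - WARM_STORAGE_READ_COST
        else:
            refund += SSTORE_RESET_GAS - WARM_STORAGE_READ_COST
    return gas + WARM_STORAGE_READ_COST, refund


print(sstore_cost(0, 0, 1, warm=False))
```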

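The CHANGELOG's v0.1.0 "Other" section enumerates the closure work that produced this surface, line by line[6]:

- `add returndatacopy boundary exact-golden matrix`
- `evm: fix sha3 log and returndatacopy memory gas`
- `evm: close zero-length copy gas edges`
- `evm: avoid zero-length copy memory expansion gas`
- `evm: fix extcodecopy memory gas and add state edge tests`
- `evm: close P0 correctness gaps — fix 4 bugs, add 58 tests, ship 263 green`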

The pattern is consistent: identify a specific gas-accounting edge, write a golden test pinning the reference's exact result, fix the implementation until the golden passes, ship. The 263-test green wall is the cumulative artifact of that loop.
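Several of those landings concern memory expansion on copy opcodes. The Yellow Paper charges memory quadratically in 32-byte words, C_mem(a) = 3·a + ⌊a² / 512⌋, bills only the difference when memory grows, and an access of length zero does not expand memory at any offset. A Python sketch with hypothetical names; out-of-gas handling for huge offsets is left out.

```python
from typing import Tuple

G_MEMORY, G_VERYLOW, G_COPY = 3, 3, 3  # Yellow Paper fee-schedule constants


def words(n: int) -> int:
    """Number of 32-byte words needed to hold n bytes."""
    return (n + 31) // 32


def memory_cost(n_words: int) -> int:
    """Total cost of a memory of n_words words: linear term plus quadratic term."""
    return G_MEMORY * n_words + n_words * n_words // 512


def expansion_cost(current_words: int, offset: int, size: int) -> Tuple[int, int]:
    """Return (extra gas, new memory size in words) for touching memory[offset:offset+size]."""
    if size == 0:  # zero-length access never expands memory, whatever the offset
        return 0, current_words
    new_words = max(current_words, words(offset + size))
    return memory_cost(new_words) - memory_cost(current_words), new_words


def copy_gas(current_words: int, offset: int, size: int) -> Tuple[int, int]:
    """Gas for a *COPY opcode such as CALLDATACOPY: base + per-word copy + expansion."""
    extra, new_words = expansion_cost(current_words, offset, size)
    return G_VERYLOW + G_COPY * words(size) + extra, new_words


print(copy_gas(0, 10**6, 0), copy_gas(0, 0, 33))
```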

6. State journaling — nested call isolation as integration test

The state-journaling surface (transaction-scoped snapshot / commit / revert with nested call isolation) is exercised through integration tests across the five state-lifecycle surfaces the EVM mutates[1]:

- storage (observed through the SLOAD walk)
- balance
- nonce
- code
- selfdestruct (the destructed account at end-of-transaction)

The README's phrasing is "proven through integration tests covering storage, balance, nonce, code, and selfdestruct lifecycle." This is the substrate that makes nested-call semantics correct: when a CALL into a child context modifies state and the child later reverts, the journal must roll back exactly the right set of mutations, including the SELFDESTRUCT flag, the nonce increment from a child CREATE, and the child's storage writes.
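One common way to get this behaviour is an undo log: each write records the value it overwrote, a call frame remembers the log length on entry, and a revert replays the log backwards to that point. The Python sketch below assumes that design; it is not zeth's journal, and the Journal class and key tuples are hypothetical.

```python
from typing import Any, Dict, Hashable, List, Tuple

_ABSENT = object()  # marks keys that did not exist before a write


class Journal:
    """Undo-log state journal: every write records the value it overwrote."""

    def __init__(self) -> None:
        self.state: Dict[Hashable, Any] = {}
        self._log: List[Tuple[Hashable, Any]] = []

    def set(self, key: Hashable, value: Any) -> None:
        self._log.append((key, self.state.get(key, _ABSENT)))
        self.state[key] = value

    def snapshot(self) -> int:
        """Taken on entry to a call frame: the current log length."""
        return len(self._log)

    def revert(self, snap: int) -> None:
        """Undo every write made since `snap`, newest first."""
        while len(self._log) > snap:
            key, previous = self._log.pop()
            if previous is _ABSENT:
                del self.state[key]
            else:
                self.state[key] = previous


j = Journal()
j.set(("nonce", "parent"), 1)
frame = j.snapshot()                      # parent CALLs into the child
j.set(("storage", "child", 0), 42)        # child SSTORE
j.set(("nonce", "child"), 1)              # child CREATE bumps its nonce
j.set(("selfdestructed", "child"), True)  # child SELFDESTRUCT flag
j.revert(frame)                           # child reverts
print(j.state)
```

Committing a child frame needs no work in this design: its entries simply stay in the log, so a revert of an enclosing frame still undoes them.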

The static-context write prohibitions are the dual of that surface: the README states that SSTORE, LOG*, CREATE*, SELFDESTRUCT, and value-carrying CALL / CALLCODE are all prohibited inside a static context[1]. The CHANGELOG's "extend static-mode write prohibitions and regressions" and "enforce static sstore and add staticcall regression" entries are the closure work for that prohibition surface[6].

7. The full opcode-parity surface

Beyond gas and state, the README lists the opcode-level correctness work that landed in the parity-closure tranche[1]:

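- Signed arithmetic: SDIV / SMOD overflow and sign propagation.
- SIGNEXTEND and SHL / SHR / SAR shift-by-256+, with the byte-position and upper-limb fixes detailed in §3.
- BALANCE / EXTCODE* / BLOCKHASH edge semantics, covered by the parity-closure unit tests; the CHANGELOG names "fix blockhash high-limb input semantics" and "lock extcode and balance push20 address semantics" as the specific landings.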
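The parity items above reduce to small, checkable rules. The Python sketch below reads 256-bit words as two's complement; the function names are hypothetical, and the BLOCKHASH window (the 256 most recent complete blocks) and the 160-bit address truncation are the standard EVM rules rather than anything read from zeth's source.

```python
U256_MOD = 1 << 256
ADDRESS_MASK = (1 << 160) - 1


def to_signed(x: int) -> int:
    return x - U256_MOD if x >> 255 else x


def to_unsigned(x: int) -> int:
    return x % U256_MOD


def sdiv(a: int, b: int) -> int:
    a_s, b_s = to_signed(a), to_signed(b)
    if b_s == 0:
        return 0
    q = abs(a_s) // abs(b_s)  # truncate toward zero, not floor
    # -2**255 / -1 yields 2**255, which wraps back to -2**255 (the overflow case)
    return to_unsigned(q if (a_s < 0) == (b_s < 0) else -q)


def smod(a: int, b: int) -> int:
    a_s, b_s = to_signed(a), to_signed(b)
    if b_s == 0:
        return 0
    r = abs(a_s) % abs(b_s)
    return to_unsigned(-r if a_s < 0 else r)  # the result takes the dividend's sign


def stack_word_to_address(word: int) -> int:
    """BALANCE / EXTCODE* read only the low 20 bytes of the stack word."""
    return word & ADDRESS_MASK


def blockhash_in_window(requested: int, current: int) -> bool:
    """BLOCKHASH returns a real hash only for the 256 most recent complete blocks."""
    return current - 256 <= requested < current


print(sdiv(1 << 255, U256_MOD - 1) == 1 << 255, blockhash_in_window((1 << 64) + 5, 100))
```

A requested block number with non-zero upper limbs simply falls outside the window, so BLOCKHASH yields zero rather than a hash of a truncated number.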

The status line in STATUS_SUMMARY.md ties these together: "Closed (signed arithmetic, SIGNEXTEND, shift, environmental opcodes)" sits under the "High-impact parity gaps" workstream[2].

8. What's NOT in v1 — the honest roadmap

The repo is explicitly alpha software. The five workstreams, from STATUS_SUMMARY.md[2]:

  1. ~~Complete remaining gas-rule edge correctness~~: closed (stipend / refund edges, memory expansion goldens).
  2. ~~Land transaction-scoped state journaling~~: closed (journal integration tests through EVM).
  3. ~~Close high-impact parity gaps~~: closed (signed arithmetic, SIGNEXTEND, shift, environmental opcodes).
  4. Differential validation hardening and CI regression gates: in progress. The 58 BN256 / BLAKE2F discrepancies are the live work in this workstream. A broader test corpus and machine-readable reports are the deliverable shape.
  5. Strategic tracks: zeth-sim, zeth-wasm, then zeth-prove. The wasm32-wasi and riscv64-linux cross-compile targets that already build in CI are the substrate prep for these tracks[1].

The CHANGELOG's v0.1.0 "Added" list also names the infrastructure waves: "massive infrastructure wave — 392 tests, 27.7K LoC, 8 new modules"; "ship 5 infrastructure pillars — 308 tests, EVMC, benchmarks, state tests, zkVM guest"; "ship Cancun opcodes, transaction layer, rv32 zkVM target — 285 tests green"[7]. The Cancun opcode work is in the substrate; end-to-end Cancun fork support as an integrated, differential-tested surface is not what the current snapshot claims.

Six gaps this entry will not paper over:

  1. Cancun fork support is "in the substrate" per the CHANGELOG, but the README's "Current Reality" section pins the correctness-tested surface at the gas / state / parity work above, not at end-to-end Cancun differential testing. The new Cancun-era opcodes (MCOPY, TLOAD, TSTORE, BLOBHASH) and the London-fork BASEFEE handling are not enumerated under "verified."
  2. The 58 BN256 / BLAKE2F discrepancies are the live red gate.
  3. The single un-dispatched opcode (142 handlers vs 143 enum entries) is a coverage observation, not a known bug, until the opcode-report matrix expands to exercise it.
  4. PyEVM-only differential, not go-ethereum. The run_reference_tests runner supports Geth ("uses PyEVM / Geth if available"[4]), but the README's "verified" calls in §4 are explicitly framed against PyEVM. A go-ethereum cross-check is the natural next layer.

  5. Toolchain pin nuance. The README pins Zig 0.14.1 as the build toolchain for the measured snapshot, while build.zig.zon's minimum_zig_version reads 0.16.0. The two are inconsistent: a 0.16.0 minimum would exclude the 0.14.1 toolchain the snapshot was measured on. Until they are reconciled, the README's 0.14.1 is the build-tested toolchain claim, and it is the toolchain the v1 evidence applies to.

  6. No published throughput numbers. The README does not publish opcodes-per-second or transactions-per-second figures. Performance characterization is a downstream concern that follows correctness closure, not a current claim.

9. The version contract

The CHANGELOG records two relevant tag lines[8, 9]:

- v1.0.0 (2026-05-13), the production-grade hygiene milestone: SECURITY.md (coordinated disclosure policy), CODE_OF_CONDUCT.md (Contributor Covenant 2.1), Dependabot configured for github-actions security updates, CODEOWNERS routing review to @SMC17, LICENSE / README / CONTRIBUTING / CI workflow verified, and the v1.x stability promise written down (surface stable; breaking changes bump to v2.x).
- v0.1.0 (2026-05-13), the substrate-and-tests delivery release that established the 263-test surface, the differential-test runner, the opcode-report machinery, the precompile-routing surface, and the cross-compile targets.

The two tag lines are deliberate: v1.0.0 marks the hygiene milestone (the API-stability commitment), and v0.1.0 marks the substrate-and-tests delivery that the hygiene wraps. Both carry the same date, which is the rough-but-versioned convention in practice: ship a 1.0 that means "the existing surface stands and breaking changes bump to 2.x," not "feature-complete."

10. Reproducibility

The full reproduction recipe, from the README[5]:

```sh
git clone https://github.com/SMC17/zeth.git
cd zeth
zig build              # builds the binaries
zig build test         # 263 tests, 0 failures on Zig 0.14.1
```

Examples:

```sh
zig build run-counter
zig build run-storage
zig build run-arithmetic
zig build run-events
```

Validation surfaces:

```sh
# RLP validators
zig build validate-rlp
zig build validate-rlp-decode
zig build validate-rlp-invalid

# Machine-readable opcode / precompile report
zig build opcode-report -- --format json --output /tmp/opcode_report.json

# Differential runner (uses PyEVM / Geth if available)
./zig-out/bin/run_reference_tests

# VMTests (requires: git clone https://github.com/ethereum/tests ethereum-tests)
zig build validate-vm
```

The differential runner's posture is the right one: when PyEVM / Geth are unavailable on the host, the runner falls into "no-reference mode" and exercises 22 / 22 self-consistency cases. The differential claim is gated on the reference being present; the runner does not silently downgrade to "passed" when there is no reference to compare against[2].

11. Status footer

- Evidence grade: differential-tested, for the gas / state / parity / 0x01..0x05 precompile surfaces. It does not apply to the 0x06..0x09 BN256 / BLAKE2F precompiles, which are explicitly red and tracked in validation/baselines/reference_discrepancies.json[1].
- Reproducibility: §10 carries the full reproduction recipe; Zig 0.14.1 is the pinned build toolchain[1].
- Snapshot: the measured counts in this entry are pinned at revision 381f677 + (2026-03-26 snapshot). Re-pinning the snapshot is the natural next move; the CI artifacts (opcode_report.json, precompile_differential_report.json) are the live state.
- Next: opcode-report matrix coverage for the 143rd enum entry; go-ethereum cross-check alongside PyEVM; end-to-end Cancun differential testing; published throughput characterization.

12. Cross-references

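- rippled-zig: XRPL protocol toolkit; the consensus-critical Zig pair.
- PyEVM: the Python reference EVM and zeth's primary differential reference[10].
- ethereum/tests: the official Ethereum test corpus, consumed by zig build validate-vm[11].
- Ethereum Yellow Paper: the canonical specification of EVM execution semantics[12].
- EIP-2200 (Structured Definitions for Net Gas Metering) and EIP-2929 (Gas Cost Increases for State Access Opcodes), the SSTORE refund-logic specifications the goldens pin[13, 14].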


Footnotes

  1. zeth/README.md, "Current Reality" section, lines 9–24. Pinned snapshot date 2026-03-26 at revision 381f677 +. Toolchain: Zig 0.14.1. zig build test passes (263 tests, 0 failures). Full EVM opcode dispatch (142 / 143 opcodes); precompile routing (0x01..0x09). Per-precompile differential status against PyEVM (May 2026): 0x01 (ECRECOVER), 0x02 (SHA256), 0x03 (RIPEMD160), 0x04 (IDENTITY), 0x05 (MODEXP) verified; 0x06 (BN256ADD), 0x07 (BN256MUL), 0x08 (BN256PAIRING), 0x09 (BLAKE2F) routing implemented, 58 discrepancies vs PyEVM under investigation, tracked in validation/baselines/reference_discrepancies.json, CI Test Suite currently red on the regression-gate step. Gas correctness: exact-equality golden tests for CALL* / CREATE* stipend / refund edges, SELFDESTRUCT accounting, memory expansion boundaries, SSTORE EIP-2200 / EIP-2929 refund logic. State journaling: transaction-scoped snapshot / commit / revert with nested call isolation, proven through integration tests covering storage, balance, nonce, code, and selfdestruct lifecycle. Parity: signed arithmetic (SDIV / SMOD overflow, sign propagation), SIGNEXTEND, SHL / SHR / SAR shift-by-256+, BALANCE / EXTCODE* / BLOCKHASH edge semantics. Static-context write prohibitions enforced for SSTORE, LOG*, CREATE*, SELFDESTRUCT, value-carrying CALL / CALLCODE. Cross-compilation: WASM (wasm32-wasi) and RISC-V (riscv64-linux) targets build in CI.
  2. zeth/STATUS_SUMMARY.md — Snapshot Date 2026-03-26, Revision 381f677 + pending, Toolchain Zig 0.14.1. Measured state: zig build test passes 263 / 263; opcode enum entries 143; opcode dispatch handlers 142; TODO / FIXME markers 2 across src/ + validation/; ./zig-out/bin/run_reference_tests 22 / 22 pass in no-reference mode when PyEVM / Geth are unavailable; opcode_report summary total 33, passed 33, precompile tests 14, precompile passed 14, failures 0; total Zig source 18 608 lines across 35 files. Correctness fixes in this revision: U256 endianness (fromBytes / toBytes correctly map BE bytes to LE limb layout, limbs[0] = LSB, limbs[3] = MSB); SIGNEXTEND byte-position guard handles position ≥ 31 and large U256 positions, sign-bit-zero case clears upper bytes; SHL / SHR / SAR returns correct result when shift amount has non-zero upper limbs (shift ≥ 2⁶⁴); EXP gas returns 10 (not 60) when exponent is zero (byte-length is 0, not 1). Active workstreams: gas-rule edge correctness closed; state journaling closed; high-impact parity gaps closed; differential validation hardening and CI regression gates in progress; strategic tracks zeth-sim, zeth-wasm, then zeth-prove.
  3. zeth/README.md, "Status" section, lines 98–100. "Alpha software under active development. Claims should be tied to passing tests, differential results, and CI artifacts."
  4. zeth/README.md, "Validation" subsection, lines 57–73. zig build validate-rlp / validate-rlp-decode / validate-rlp-invalid for RLP encoder / decoder validators. zig build opcode-report -- --format json --output /tmp/opcode_report.json for machine-readable opcode + precompile report. ./zig-out/bin/run_reference_tests is the differential runner that "uses PyEVM / Geth if available." zig build validate-vm for VMTests against a sibling clone of https://github.com/ethereum/tests.
  5. zeth/README.md, "Quick Start" section, lines 31–55. Prerequisites: Zig 0.14.1, Python 3.11+ (optional, for PyEVM-based validation). Build and test: git clone https://github.com/SMC17/zeth.git && cd zeth && zig build && zig build test. Examples: zig build run-counter / run-storage / run-arithmetic / run-events.
  6. zeth/CHANGELOG.md v0.1.0 "Other" section. The relevant gas / state edges, exact-equality golden landings, and static-context regression closures landing in this release: evm: close P0 correctness gaps — fix 4 bugs, add 58 tests, ship 263 green; add returndatacopy boundary exact-golden matrix; evm: fix sha3 log and returndatacopy memory gas; evm: close zero-length copy gas edges; evm: avoid zero-length copy memory expansion gas; evm: fix extcodecopy memory gas and add state edge tests; lock extcode and balance push20 address semantics; evm: fix blockhash high-limb input semantics; evm: extend static-mode write prohibitions and regressions; evm: enforce static sstore and add staticcall regression.
  7. zeth/CHANGELOG.md v0.1.0 "Added" section. The infrastructure-wave headlines: massive infrastructure wave — 392 tests, 27.7K LoC, 8 new modules; ship 5 infrastructure pillars — 308 tests, EVMC, benchmarks, state tests, zkVM guest; ship Cancun opcodes, transaction layer, rv32 zkVM target — 285 tests green; Add CI differential gating and machine-readable opcode report; Add precompile CALL dispatch with SHA256 RIPEMD160 ECRECOVER. The measured state at the snapshot date pins to 263 / 263 green; the larger test counts in the CHANGELOG headlines reflect cumulative wave totals across the milestones.
  8. zeth/CHANGELOG.md v1.0.0 (2026-05-13). Production-grade hygiene milestone: SECURITY.md present (coordinated disclosure policy); CODE_OF_CONDUCT.md (Contributor Covenant 2.1); Dependabot configured for github-actions security updates (monthly); CODEOWNERS routes review to @SMC17; LICENSE / README / CONTRIBUTING / CI workflow verified; v1.x cycle = surface stable, breaking changes bump to v2.x.
  9. zeth/CHANGELOG.md v0.1.0 (2026-05-13). The substrate-and-tests delivery release. Ships the 263-test surface, the differential-test runner, the opcode-report machinery, the precompile-routing surface, and the wasm32-wasi / riscv64-linux cross-compile targets.
  10. PyEVM — the Python reference EVM implementation at https://github.com/ethereum/py-evm. Used by zeth's run_reference_tests runner as the primary differential reference. The README's "verified" calls for precompiles 0x01..0x05 are framed against this reference.
  11. ethereum/tests — the official Ethereum test corpus at https://github.com/ethereum/tests. Consumed by zig build validate-vm (per zeth/README.md line 71: "requires: git clone https://github.com/ethereum/tests ethereum-tests"). The corpus is the canonical test-vector source the Ethereum implementations cross-validate against.
  12. Gavin Wood, Ethereum: A Secure Decentralised Generalised Transaction Ledger (Berlin / London / Paris versions). The canonical EVM specification: §9 defines the execution model, Appendix G gives the fee schedule, and Appendix H specifies the opcode set. The "exact-equality golden tests" framing in §5 means the implementation produces gas costs equal to those derived from the Yellow Paper's formulas plus the EIP overlays (EIP-2200, EIP-2929, etc.).
  13. EIP-2200 (Wei Tang, 2019) — "Structured Definitions for Net Gas Metering." Replaces the previous SSTORE gas metering with a structured set of rules (no-op, fresh-slot, dirty-slot) and refund mechanics. Referenced in zeth/README.md line 18 under the gas-correctness surface ("SSTORE EIP-2200 / EIP-2929 refund logic").
  14. EIP-2929 (Vitalik Buterin, Martin Swende, 2020): "Gas cost increases for state access opcodes." Introduces the warm / cold distinction for state access (accessed addresses and storage keys) that interacts with EIP-2200's SSTORE cost and refund logic. Both EIPs are pinned as exact-equality goldens in the zeth gas suite.