Portfolio bench: five projects, one machine, one day, parseable numbers

2026-05-15 · project sovereign-stack

This is a lab-notebook entry, not an essay. Five OSS projects, each shipping its own BENCH.md with measured numbers, all captured on the same machine on 2026-05-15. The point is not the absolute numbers (those are Intel Ice Lake specific). The point is the discipline: consistent methodology, parseable wire format, reproducibility recipe in every BENCH.md.

The discipline is the substrate; the numbers are the artifacts.

1. The portfolio

| Project           | Visibility | Repo                                         | Latest tag |
|-------------------|------------|----------------------------------------------|-----------:|
| zig-h3            | public     | [SMC17/zig-h3][zig-h3]                       | v1.3.0     |
| mast              | public     | [SMC17/mast][mast]                           | v1.2.0     |
| aac-launch        | public     | [SMC17/aac-launch][aac-launch]               | v0.4.0     |
| stax-experiment   | private    | [SMC17/stax-experiment][stax-experiment]     | v0.2.0     |
| agent-app-control | public     | [SMC17/agent-app-control][agent-app-control] | v0.6.0     |

[zig-h3]: https://github.com/SMC17/zig-h3
[mast]: https://github.com/SMC17/mast
[aac-launch]: https://github.com/SMC17/aac-launch
[stax-experiment]: https://github.com/SMC17/stax-experiment
[agent-app-control]: https://github.com/SMC17/agent-app-control

Every release is GPG-signed (fingerprint 079261B06444C6A410B3BE363CFCB60243028886). stax-experiment is private while the substrate-discipline gets dogfooded — the other four are public AGPL.

2. Hardware + methodology

Host: Intel Core i7-1065G7 @ 1.30 GHz (Ice Lake, 4C/8T, 1.3 GHz base / 3.9 GHz boost), Linux 7.0.3-arch1-1 x86_64. Compiler: Zig 0.16.0, -Doptimize=ReleaseFast. Timing: std.os.linux.clock_gettime(.MONOTONIC, &ts). Date: 2026-05-15.

Every BENCH.md is a child of this snapshot. Different hardware will move absolute numbers; ratios (e.g., pure-Zig vs libh3) are expected to be more stable under Zig 0.16.0, although that has not yet been validated beyond x86_64 Ice Lake (see section 8).
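The timing method can be sketched in a few lines. This is a Python illustration, not the projects' Zig harness: the function name `bench` and the workload are hypothetical, and interpreter overhead dominates the result, so only the shape of the measurement carries over (read the monotonic clock, run a fixed iteration count, read it again, divide).

```python
import time

def bench(fn, iters):
    """Return (ns_per_op, ops_per_sec) for calling fn() iters times."""
    # CLOCK_MONOTONIC never jumps backwards; available on Unix only.
    start = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    for _ in range(iters):
        fn()
    elapsed_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC) - start
    ns_per_op = elapsed_ns / iters
    ops_per_sec = 1e9 / ns_per_op if ns_per_op > 0 else float("inf")
    return ns_per_op, ops_per_sec

# Hypothetical workload: allocate a 16-byte buffer.
ns, ops = bench(lambda: bytes(16), 100_000)
print(f"ns_per_op={ns:.0f} ops_per_sec={ops:.0f}")
```

Both reported numbers come from one elapsed-time reading, so they always agree with each other up to rounding.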

3. Hot-path primitives — 19–28 M ops/sec

Operations that run on every dispatch and gate larger I/O. These need to be cheap enough that no realistic workload bottlenecks on them.

| Project / Operation                            | Throughput     | ns/op |
|------------------------------------------------|----------------|------:|
| agent-app-control isMutatingCommand (16 verbs) | 28.5 M ops/sec |    35 |
| mast buffer setContents (16 B)                 | 27.0 M ops/sec |    37 |
| mast buffer append (single chunk)              | 23.4 M ops/sec |    42 |
| mast buffer fromBytes (16 B)                   | 19.5 M ops/sec |    51 |

These primitives cost tens of nanoseconds, so they sit roughly two orders of magnitude below a microsecond-scale hyprctl roundtrip and four or more below a millisecond-scale ydotool key event or file fsync.
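The two numeric columns are one measurement in two units: throughput is 10^9 divided by ns/op. A small sketch that recomputes the throughput column from the ns/op column, using only the values in the table above (the helper name `ops_per_sec` is illustrative):

```python
NS_PER_SEC = 1_000_000_000

def ops_per_sec(ns_per_op):
    """Convert a per-operation cost in nanoseconds to operations per second."""
    return NS_PER_SEC / ns_per_op

# (operation, ns/op, reported M ops/sec) copied from the table above.
rows = [
    ("isMutatingCommand (16 verbs)", 35, 28.5),
    ("setContents (16 B)", 37, 27.0),
    ("append (single chunk)", 42, 23.4),
    ("fromBytes (16 B)", 51, 19.5),
]
for name, ns, reported in rows:
    recomputed = ops_per_sec(ns) / 1e6
    print(f"{name}: {recomputed:.1f} M ops/sec recomputed, {reported} reported")
```

Because ns/op is rounded to whole nanoseconds, the recomputed throughput can differ from the reported column by a few tenths of an M ops/sec.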

4. Parser surfaces — operations that scale with input size

| Project / Operation                        | Throughput       |  ns/op |
|--------------------------------------------|------------------|-------:|
| aac-launch parseExecLine (bare exec)       | 3.9 M ops/sec    |    254 |
| aac-launch parseExecLine (%f substitution) | 2.4 M ops/sec    |    418 |
| aac-launch parseExecLine (quoted)          | 1.2 M ops/sec    |    843 |
| stax-experiment parseLineSummary (95 B)    | 95 K ops/sec     | 10 478 |
| stax-experiment bulk-read 100 K events     | 170 K parses/sec |  5 860 |

parseExecLine is roughly 12× to 41× faster than parseLineSummary per call, depending on the exec-line variant, because parseLineSummary builds a full std.json DOM tree per line. The next-frontier optimization target (streaming parser, est. 3–5× speedup) is named in stax-experiment's BENCH.md and filed honestly as a post-v0.2 move, not a v0.2 blocker.
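The per-call gap and the effect of the estimated streaming speedup can be worked out from the table. The sketch below uses only the ns/op values above; the projected figures are arithmetic on the 3–5× estimate, not measurements.

```python
PARSE_LINE_SUMMARY_NS = 10_478

# ns/op for each parseExecLine variant, from the table above.
exec_variants = {"bare exec": 254, "%f substitution": 418, "quoted": 843}

# How many parseExecLine calls fit in one parseLineSummary call.
ratios = {name: PARSE_LINE_SUMMARY_NS / ns for name, ns in exec_variants.items()}
for name, ratio in ratios.items():
    print(f"parseLineSummary / parseExecLine ({name}): {ratio:.1f}x")

# Projected per-line cost if the estimated 3-5x streaming speedup holds.
projected_ns = [PARSE_LINE_SUMMARY_NS / speedup for speedup in (3, 5)]
print(f"projected parseLineSummary: {projected_ns[1]:.0f}-{projected_ns[0]:.0f} ns/op")
```

Even at the optimistic end of the estimate, the projected cost stays above the slowest parseExecLine variant (843 ns).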

5. Geo-conversion — comparative against C reference

The headline result of the portfolio. zig-h3's pure-Zig path takes 0.71–0.88× the time of libh3 (12–29% less time per operation) on every measured geo operation. Same hardware, same allocator, same RNG seed, same binary.

| Operation             | libh3 ns/op | pure-Zig ns/op | Ratio |
|-----------------------|------------:|---------------:|------:|
| latLngToCell (res 7)  |        1068 |            761 | 0.71× |
| latLngToCell (res 9)  |        1151 |            826 | 0.72× |
| latLngToCell (res 11) |        1314 |            959 | 0.73× |
| cellToLatLng (res 7)  |         562 |            398 | 0.71× |
| cellToLatLng (res 9)  |         526 |            435 | 0.83× |
| cellToLatLng (res 11) |         640 |            482 | 0.75× |
| gridDisk (res 9, k=3) |        1026 |            902 | 0.88× |

This is not "Zig beats C" as a general claim. It is "this specific port, with this specific set of compiler decisions (inline-default, tighter struct layout, no prototype-width noise), comes out ahead on this hardware." Different compilers or CPU generations will move the absolutes. The dedicated entry on this is at /lab/zig-h3-pure-zig-vs-libh3/.
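The Ratio column in the table above is pure-Zig ns/op divided by libh3 ns/op. A short sketch that recomputes it from the two measured columns, along with the share of time saved and the equivalent throughput gain (the variable names are illustrative):

```python
# (operation, libh3 ns/op, pure-Zig ns/op) copied from the table above.
rows = [
    ("latLngToCell (res 7)", 1068, 761),
    ("latLngToCell (res 9)", 1151, 826),
    ("latLngToCell (res 11)", 1314, 959),
    ("cellToLatLng (res 7)", 562, 398),
    ("cellToLatLng (res 9)", 526, 435),
    ("cellToLatLng (res 11)", 640, 482),
    ("gridDisk (res 9, k=3)", 1026, 902),
]

results = []
for op, libh3_ns, zig_ns in rows:
    ratio = zig_ns / libh3_ns            # time ratio; below 1.0 means pure-Zig is quicker
    time_saved_pct = (1 - ratio) * 100   # share of libh3's time that is saved
    throughput_gain = libh3_ns / zig_ns  # ops/sec multiplier
    results.append((op, ratio, time_saved_pct, throughput_gain))
    print(f"{op}: {ratio:.2f}x time, {time_saved_pct:.0f}% saved, "
          f"{throughput_gain:.2f}x throughput")
```

The time ratio and the throughput gain are reciprocals, so a 0.71× time ratio corresponds to about 1.41× the operations per second.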

6. IO-bound — operations that touch disk

| Project / Operation                          | Throughput     | µs/op |
|----------------------------------------------|----------------|------:|
| mast Buffer.save (16 B, atomic fsync+rename) | 42.8 K ops/sec |  23.4 |
| mast Buffer.save (4 KB)                      | 45.3 K ops/sec |  22.1 |
| mast Buffer.save (64 KB)                     | 22.4 K ops/sec |  44.6 |

Dominated by the fsync → rename syscall sequence. Throwing more CPU at it does not help; the bottleneck is the journal commit. At roughly 22–45 µs per atomic save, there is no perceptible lag even at 10+ saves per second.
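For readers unfamiliar with the fsync + rename pattern, here is a minimal Python sketch of the general technique. It is not mast's implementation: the function name `atomic_save` is hypothetical, and whether mast also syncs the parent directory is not stated in this entry, so the sketch leaves that step out.

```python
import os
import tempfile

def atomic_save(path, data):
    """Replace path with data so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push file contents to stable storage
        os.replace(tmp_path, path)   # atomic rename over the target on POSIX
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "buffer.txt")
    atomic_save(target, b"first")
    atomic_save(target, b"second")
    with open(target, "rb") as f:
        print(f.read())
```

The cost sits in the `os.fsync` call and the rename, which is why it stays nearly flat between 16 B and 4 KB in the table above.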

7. Cross-project pattern

Across all 5 projects, the bench surfaces share these properties:

  1. Single-threaded baseline. No SIMD, no PGO, no LTO. The numbers are reproducible from a fresh git clone.

  2. MONOTONIC timing. Every bench uses clock_gettime directly, not std.time.Timer (which was removed in Zig 0.16).

  3. Parseable output. Every bench emits `bench=NAME size=N iters=N ns_per_op=N ops_per_sec=N` lines that downstream tools can grep. No JSON, no XML, no protobuf; just key=value.

  4. Honest scope. Every BENCH.md names what the numbers are NOT: not multi-threaded, not PGO-optimized, not hardware-portable absolutes (only ratios are robust). Every "we beat X" claim is paired with a "we beat it under these specific conditions" clause.

  5. Reproducibility recipe at the bottom. Every BENCH.md ends with the exact 2–3 commands a reader needs to re-run the bench.
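The key=value wire format in the list above needs very little code to consume. A minimal parser sketch, assuming that NAME contains no whitespace and that every N is numeric; the function name `parse_bench_line` and the sample line are made up for illustration and are not captured bench output.

```python
NUMERIC_KEYS = ("size", "iters", "ns_per_op", "ops_per_sec")

def parse_bench_line(line):
    """Parse one 'bench=NAME size=N ...' line into a dict, or return None."""
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    if "bench" not in fields:
        return None  # not a bench line (e.g. compiler chatter)
    try:
        for key in NUMERIC_KEYS:
            fields[key] = float(fields[key])
    except (KeyError, ValueError):
        return None  # a required field is missing or not numeric
    return fields

# Made-up sample in the documented format.
sample = "bench=fromBytes size=16 iters=1000000 ns_per_op=51 ops_per_sec=19500000"
print(parse_bench_line(sample))
print(parse_bench_line("warning: unrelated build output"))
```

Returning `None` for anything that is not a complete bench line lets the same function be mapped over a whole captured output file without pre-filtering.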

8. What this portfolio is NOT yet

These benches are a start, not a destination. Open next-frontier moves filed honestly in individual BENCH.md files:

to validate the ratio claims hold beyond x86_64 Ice Lake.

binding comparison Uber actually ships).

allocator pressure).

overhead, currently unmeasured).

None of these are v1.0 blockers. They are the multi-quarter next moves that compound the substrate as the public footprint grows.

9. Why this entry exists

A single project's BENCH.md is a perf claim. Five aligned BENCH.md files in five released projects, all captured on the same hardware with the same methodology and a parseable wire format, is a substrate claim — that the engineering discipline is consistent across projects rather than ad-hoc per release.

The discipline is the substrate; the numbers are the artifacts.

10. Reproducibility

Every individual BENCH.md ends with the exact reproduction recipe for its specific bench. To reproduce the aggregate view:

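```sh
for repo in zig-h3 mast aac-launch agent-app-control; do
  git clone "https://github.com/SMC17/$repo"
  # The subshell keeps the cd local, so a failed step cannot strand the loop.
  (
    cd "$repo" &&
    # As in the original recipe, only stderr is captured to the .out file.
    zig build bench -Doptimize=ReleaseFast 2> "/tmp/$repo-bench.out"
  )
done

# stax-experiment is currently private; if you have access:
# git clone https://github.com/SMC17/stax-experiment && ...
```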

The full audit register that gates every claim here lives in stax-experiment. The 169 verdicts on this workstation (134 confirmed, 22 refuted, 14 inconclusive, 11 Type-1 catches, 33 Type-2 catches as of 2026-05-15) record what was claimed and what actually held.