Benchmark Challenge: znap — Zero-Allocation Zig Neural Inference (scaffold)
Target claim (from the public PRE_LAUNCH_CHECKLIST.md and paper mirror): a 30→6 linear policy runs in 31 ns; BitNet b1.58 ternary weights give 8× compression with zero quality loss on the "same action" metric; and 30-dim workloads run 34× to 138× faster than their NumPy/PyTorch equivalents. The hot path is zero-allocation on Apple M1 (NEON) and x86_64 (AVX2/AVX-512). Everything is reproducible with the public model and Zig alone.
Hardware class for baseline: Apple M1 (or M-series) 10-core or equivalent x86_64 with AVX2+. Consumer GPU not required.
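The 8× compression figure in the target claim can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the znap source: it assumes the uncompressed baseline stores 16-bit weights and that each ternary weight occupies a 2-bit field, four per byte. The names `pack_ternary`, `unpack_ternary`, and `compression_ratio` are hypothetical.

```python
# Assumption: the uncompressed baseline stores each weight as a 16-bit float.
BASELINE_BITS = 16


def pack_ternary(weights):
    """Pack ternary weights four per byte, using codes 0, 1, 2 for -1, 0, +1."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        if w not in (-1, 0, 1):
            raise ValueError(f"not a ternary weight: {w!r}")
        packed[i // 4] |= (w + 1) << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary weights from the packed bytes."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(count)]


def compression_ratio(count):
    """Baseline bytes divided by packed bytes for `count` weights."""
    baseline_bytes = count * BASELINE_BITS // 8
    packed_bytes = len(pack_ternary([0] * count))
    return baseline_bytes / packed_bytes


# A 30x6 weight matrix filled with a repeating -1, 0, +1 pattern.
weights = [(i % 3) - 1 for i in range(30 * 6)]
packed = pack_ternary(weights)
```

Under these assumptions the 180 weights of a 30→6 layer take 360 bytes at 16 bits and 45 bytes packed. A different baseline precision or a denser packing would change the ratio, so the layout actually used by znap should be confirmed against the repo.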
Reproduction (exact, repo-relative references):
- Clone or extract the znap surface (public mirror at SMC17/znap or local mac-mining/gh-iceberg/clones/znap equivalent after clean extraction).
curl -L -O https://huggingface.co/SMC17/stories15M-zig/resolve/main/stories15M.bin
curl -L -O https://huggingface.co/SMC17/stories15M-zig/resolve/main/tokenizer.bin
zig build run -- stories15M.bin tokenizer.bin -t 1 -n 256
- For micro-benches: build and run the SDOT / linear / DQN benches in the repo (see PRE_LAUNCH_CHECKLIST.md for 30-dim, 30→64→6 cases).
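For the baseline side of the comparison, the measurement method matters as much as the workload. The following is a minimal sketch of a median-of-samples timer applied to a pure-Python 30→6 linear policy. It is a stand-in for methodology only, not the NumPy/PyTorch baseline the claim refers to, and it assumes the policy's action is the argmax of the six outputs. The names `linear_policy` and `median_ns_per_call` are hypothetical.

```python
import random
import statistics
import time

IN_DIM, OUT_DIM = 30, 6


def linear_policy(weights, bias, x):
    """Return the index of the largest output of y = W x + b (assumed action rule)."""
    best_index, best_value = 0, float("-inf")
    for j in range(OUT_DIM):
        row = weights[j]
        value = bias[j]
        for i in range(IN_DIM):
            value += row[i] * x[i]
        if value > best_value:
            best_index, best_value = j, value
    return best_index


def median_ns_per_call(fn, samples=31, calls_per_sample=200):
    """Time `fn` in batches and return the median per-call latency in ns."""
    per_call = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(calls_per_sample):
            fn()
        elapsed = time.perf_counter_ns() - start
        per_call.append(elapsed / calls_per_sample)
    return statistics.median(per_call)


# Fixed seed so every run times the same weights and input.
rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
bias = [rng.uniform(-1.0, 1.0) for _ in range(OUT_DIM)]
x = [rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)]

if __name__ == "__main__":
    ns = median_ns_per_call(lambda: linear_policy(weights, bias, x))
    print(f"median latency: {ns:.0f} ns per call")
```

Timing in batches amortizes the clock-read overhead, which would otherwise dominate a sub-microsecond operation, and the median is less sensitive than the mean to scheduler noise. The same shape of harness could wrap a NumPy or PyTorch call to produce the baseline number for the speedup table.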
Success criteria (falsifiable):
- Linear policy (30→6) reports ≤ 50 ns median on M1-class hardware at ReleaseFast (or equivalent documented ISA).
- 8× memory reduction on BitNet b1.58 matrices with action-match rate documented.
- Speedup table vs PyTorch/NumPy baseline (same workload, single-thread where applicable) shows ≥ 30× on at least one micro-op.
- No heap allocation in the inner loop (a run under Zig's GeneralPurposeAllocator with safety = true shows zero allocation events).
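The action-match rate in the second criterion could be measured along the following lines. This is a sketch under stated assumptions, not the znap implementation: it uses the absmean ternarization described for BitNet b1.58 (divide by the mean absolute weight, round, clip to [-1, 1]), assumes the action is the argmax of the layer outputs, and omits the bias term. The names `quantize_absmean`, `action`, and `action_match_rate` are hypothetical.

```python
def quantize_absmean(weights):
    """Ternarize a weight matrix: divide by mean |w|, round, clip to [-1, 1]."""
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) or 1.0  # avoid dividing by zero
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def action(weights, x):
    """Index of the largest output of W x (assumed action rule, no bias)."""
    outputs = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(outputs)), key=outputs.__getitem__)


def action_match_rate(full, ternary, inputs):
    """Fraction of inputs on which both weight matrices pick the same action."""
    matches = sum(action(full, x) == action(ternary, x) for x in inputs)
    return matches / len(inputs)
```

The scale returned by `quantize_absmean` is a positive scalar, so it does not change the argmax and is left out of `action`. A documented run would report the rate over a stated input distribution and sample count, since "zero quality loss" is only meaningful relative to the inputs tested.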
Falsifier: Median exceeds 100 ns on the linear policy case on documented M1-class hardware, or speedup < 10× vs equivalent PyTorch path on the same 30-dim workload, or allocation events appear in the hot path under safety allocator.
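The success criteria and the falsifier use different thresholds, so a run can land between them. A minimal sketch of how a run might be classified is shown below. It covers only the latency, speedup, and allocation conditions, treats the speedup as measured on a single workload, and leaves the memory-reduction criterion out. The function name `judge` and its parameters are hypothetical.

```python
from statistics import median

# Thresholds taken from the success criteria and the falsifier.
PASS_NS, FAIL_NS = 50.0, 100.0
PASS_SPEEDUP, FAIL_SPEEDUP = 30.0, 10.0


def judge(latencies_ns, baseline_ns, alloc_events):
    """Classify a run as 'pass', 'falsified', or 'inconclusive'.

    latencies_ns: per-call latencies of the linear policy, in nanoseconds.
    baseline_ns:  latency of the equivalent baseline path on the same workload.
    alloc_events: allocation events observed in the hot path.
    """
    med = median(latencies_ns)
    if med <= 0:
        raise ValueError("median latency must be positive")
    speedup = baseline_ns / med
    if med > FAIL_NS or speedup < FAIL_SPEEDUP or alloc_events > 0:
        return "falsified"
    if med <= PASS_NS and speedup >= PASS_SPEEDUP:
        return "pass"
    return "inconclusive"
```

A median between 50 ns and 100 ns, or a speedup between 10× and 30×, neither meets the success criteria nor triggers the falsifier, which is why the sketch returns a third outcome rather than forcing a binary verdict.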
Proof level of this challenge: scaffold (repro commands drawn from public audit checklist; independent run required for "benchmarked").
Related public evidence in this stack:
- blog/content/lab/zig-h3-pure-zig-vs-libh3.md (another pure-Zig win).
- COMPETITIVE_LANDSCAPE.md (evidence graveyard section).
Register any new run or improvement:
stax-experiment register --lane znap-challenge --hypothesis "My hardware reproduces or beats the 31 ns linear policy" --falsifier "Median >100 ns or speedup <10× on equivalent workload"
Run the challenge. Publish the trace. Non-Zig inference stacks that cannot match the latency or allocation profile on equivalent silicon are the baseline this evidence attacks.
Extraction note: This document was produced as part of the clean harness + QM positioning surface (aac-launch/extract-harness-core.sh + blog canon). All paths are repository-relative.