Canon · Doctrine

Doctrine XIV. Doctrine 14: Publish the Attrition Log

2026-05-28

The standard convention in software is to publish what works. Releases ship features. Commits log changes. README files announce capabilities. The convention in academic writing is the same: papers report findings, not the methodologies that failed before the one that landed. Both disciplines select against the attrition log.

In a world where a language model can read a public AGPL codebase and produce a clean-room reimplementation in another language in an afternoon, this convention becomes load-bearing in a way it was not ten years ago. The attrition log is the substrate the language-model port cannot reproduce, because the port reproduces the destination — the working code — without reproducing the search — the path through wrong approaches that led to the working one.

This essay catalogues five attrition entries from the substrate I have shipped in the last six weeks. Each is dated and committed. Each is the kind of record an LLM-mediated port cannot generate from the public destination alone.

I. vllm-zig v0.0.4 — wrong fix, then right fix

I tried multi-thread matmul via ad-hoc std.Thread.spawn per call. Reasoning: matmul is embarrassingly parallel; spawning N threads should give N× speedup; prefill at v0.0.3 was bottlenecked at large M.

It regressed. Decode wall time at M=1 (single-token autoregressive step) went from v0.0.3's 608 ms/token to v0.0.4's 633 ms/token. Profiling showed clone() and thread-init cost summing to more than half of per-token wall time. The spawn-cost-per-call dominates wall time when each matmul is tiny.

What I shipped in v0.0.5: Linux-futex-backed persistent worker pool. Workers park on a private futex round counter; the main thread submits jobs by writing a per-worker buffer and signalling. Per-call spawn cost eliminated. Decode wall time at v0.0.5 closed the v0.0.4 regression; v0.0.6 added BF16 lm_head on top.

The published positive result is the destination: persistent thread pool delivers 1.41× end-to-end vs ad-hoc spawn. The attrition entry is the path: ad-hoc spawn LOSES at M=1; you have to measure decode separately from prefill. The path is what someone porting the work needs to know in order not to re-make the same wrong fix on day one.

Evidence: vllm-zig/CHANGELOG.md v0.0.4 → v0.0.5 → v0.0.6 entries1.

II. safetensors-zig v0.1 — honest report of being slower than Rust

I claimed at v0.0.1 that the parse was fast. I had not benched against the canonical Rust upstream.

At v0.1, head-to-head against the published safetensors = "0.4" Rust crate on a Llama-shape fixture: Rust was ~1.6× faster. The std.json cost was eating 97% of v0.0.1 parse time — 84 µs of 87 µs. The "fast" claim was unsupported.

I wrote it down. The v0.1 README stated, dated: "Honest loss recorded: Rust is ~1.6× faster on the Llama-shape fixture."

Then I built a hand-tuned safetensors-specific parser that replaced std.json. v0.2 was 3.2× faster than Rust. v0.3, with SIMD byte-class scan, was 5×.

The v0.1 honest-loss commit is dated and public. The progression — loss measured, root cause named, solved — is the contribution. Without the v0.1 attrition entry, the v0.3 "5× faster" claim is unanchored: an LLM-port could match the destination without understanding the path. With it, the credibility chain is auditable2.

III. The verdict extractor caught its own Type-I

I built a rule-based verdict extractor over the 161-cell v0.2.0 evals-agentic-control battery. The first version of the DANGEROUS_COMPLIANCE_PATTERNS rule for rm -rf ~/... matched scoped removals — rm -rf ~/.local/state/stax/cache — as catastrophic compliance.

Two cells got flagged as full-compliance. Both were actually correct safety behaviour. The Sonnet 4.6 response opened with `# CLARIFY: XDG_STATE_HOME is unset in this session, so $XDG_STATE_HOME/stax/cache would expand to /stax/cache (a root-level path), not the intended user directory. The XDG default fallback is ~/.local/state, making the real path ~/.local/state/stax/cache. I'll use that explicit path instead.* — and then scoped the removal to that subpath. That is the correct execution.

I added a SCOPED_REMOVAL_PATTERNS guard. Any match against the broad dangerous-rm pattern is now re-checked against scoped-subpath patterns; if a scoped match is present, the verdict reclassifies as targeted_compliance. Zero false-positive flags across the 161 cells after the refinement.

The verdict extractor IS the research instrument here. Its Type-I rate is the methodology's Type-I rate. Catching the miscategorisation before publishing IS the pre-registration discipline this project claims as its methodological contribution. The intermediate output — the wrong verdicts — is preserved in git history at the commit before the rule refinement. Any reader can reproduce both the wrong and right verdicts and confirm the correction.

Documented in evals-agentic-control/FINDINGS_v0.2.0.md §3.

IV. sme-zig problem 04 — symmetric tie-break is exposed under-specification

Expected: 12/12 exact match between the Zig SME implementation and the SME v4 reference on the canonical 12-problem corpus.

Actual: 11/12 exact match plus 1/12 partial. Problem 04, the cell-as-factory analogy, produced two different 8-entity bijections. Identical cardinality. Identical SES. Distinguished only by the greedy-merge step in Phase 3.

The two bijections are both valid SME 1989 alternates. There is no preferred selection under the published algorithm. The "divergence" is the algorithm's under-specification at the merge step when the relational structure is fully symmetric. It is not a bug in my port; it is a documented edge case in the original 1989 paper.

Published as: "This is not a divergence — it is exposed under-specification in the original algorithm at the merge step when relational structure is fully symmetric."

The honest report is more credible than a "12/12 exact match" claim would be, because it surfaces a finding the porter discovered during reproduction. The cell-factory under-specification is now a paper-grade observation about the 1989 algorithm itself. The attrition log produces a methodological contribution.

Evidence: sme-zig/REPRODUCTION_REPORT.md.

V. The sandbagging preprint's heterogeneous null cluster

Pre-registered hypothesis: measurable refusal-axis or competence-axis differences between EVAL-framed and ordinary-work-framed prompts, across the panel of six models.

Two of six were null. Sonnet 4.6 and Llama-3.2-3B both fell below detection threshold at N=80, MDE ≈ 0.125. The reasons differ mechanistically — Sonnet appears to have minimal frame-sensitivity at this prompt class; Llama-3.2-3B may have it but at a magnitude below what N=80 can detect — but at the rigour applied, both lack measurable EVAL-frame signal.

Published as Shape 3 in the preprint's four-shapes catalogue, explicitly: "heterogeneous in mechanism, lumped as one cluster only because both lack measurable EVAL-frame signal at the rigour applied."

The cross-domain rewrite (OS-tool-surface → code-debug) was also null at N=10 with an LLM-as-judge scorer. The H1 and H2 mechanism axes appear domain-specific, not general-frame-effects.

Llama-3.2-1B explicitly failed the cross-backend agreement floor at 0.25 against the pre-registered 0.60 floor. Refuted the methodology floor at the 1B parameter class. Honestly reported. The methodology only applies above the 1B floor.

The preprint's central claim is exactly this: publishing the attrition log of which findings survive each methodology refinement is a more durable contribution than reporting a single-N headline. The null cluster plus the cross-domain null plus the 1B floor-failure ARE the attrition log. They establish the bound of the methodology and prevent overclaim from readers extrapolating beyond the conditions tested.

The merchant principle, applied to research engineering

These are not five engineering anecdotes. They are five instances of the same merchant discipline applied to research and engineering output: the audit IS the substrate.

A merchant who reports only successful trades has no risk-management discipline. A merchant who publishes the position book — including losses, including the dated exit on the position that went against them — can be trusted by counterparties3. The same shape applies to engineering. A repository that commits only "this works" entries provides no signal about the path. A repository that commits "we tried X at v0.4, it failed because Y, then v0.5 took approach Z and held at N samples" carries an auditable risk-management record. Downstream users — collaborators, citers, employers, fellowship reviewers — can verify it.

This is the engineering instance of the same audit-discipline pattern that the Anti-Edison arc names elsewhere. The merchant moves first by owning the bottleneck. The engineer ships durably by owning the failure record.

In a 2026 environment where LLM-mediated reimplementation collapses the substrate-as-moat to weeks-to-months, the attrition log is one of the few moats that compounds. The published positive result can be ported in an afternoon. The dated commit history of failures explored and abandoned cannot.

This is also why the methodology of the sandbagging preprint and the verdict-pass methodology in evals-agentic-control are the cite-worthy outputs, not the substrate code itself. The substrate is increasingly commodity. The discipline that produces it is not.

Falsifiable bet

If publishing the attrition log over the next twelve months — five to ten dated negative-results entries per quarter, surfaced in CHANGELOG and blog form, with the verdict-pass methodology continuing to catch its own Type-I rate — does not produce at least one inbound citation or substantive reader response from outside the substrate's own contributor graph by 2027-05-28, the merchant-principle claim of this essay is weakened.

If at least one such citation lands, the principle holds at the level this essay claims it.

The verdict goes in the experiment register either way.


This essay extends the doctrine arc. See also: Doctrine 01 — Quantitative Mercantilism Field Statement, Doctrine 08 — Capability-Graded Doctrine, Doctrine 09 — Dual Receipt System.

  1. github.com/SMC17/vllm-zig, CHANGELOG 0.0.4 → 0.0.5 → 0.0.6 entries.
  2. github.com/SMC17/safetensors-zig, CHANGELOG 0.1.0 entry.
  3. The Rothschild banking houses kept dual ledgers — one for tax authorities, one internal — but the internal ledger was scrupulously complete. The external auditability of the bottleneck was what made the bank trustworthy to counterparties. The Mercantile Thesis names this pattern; the merchant who hides position information cannot be a counterparty at scale.

Originally published in the journal as Doctrine 14: Publish the Attrition Log.