"DOCTRINE 06"

Doctrine 06: The Eight-Axis Check — A Sovereign-Appliance Audit Rubric

2026-05-12 · 19 min read · 4561 words

The Mercantile Thesis names a slot: the appliance layer of AI is built by an integrator who owns the path from silicon to editor as a single sovereign stack. The thesis includes a falsifiable bet, Bet 3, dated Q4 2029: someone will ship that stack and pass an eight-axis check. Without a published rubric, the bet is unfalsifiable: the author would get to score his own product, define each axis after the fact, and grade the result. That is not how falsifiable bets work.

This essay specifies the rubric. It defines each of the eight axes, provides a Pass / Partial / Fail scoring scale, demonstrates the rubric on a real product (NVIDIA's DGX Spark / Project DIGITS), states what is deliberately omitted from the rubric and why, and describes the audit procedure that three external reviewers would follow when Bet 3 resolves.

The rubric is not a moat checklist. A product can fail axes and still be commercially successful. The rubric scores one specific structural property: whether the product is a sovereign appliance in the Mercantile Thesis sense, a stack whose owner cannot be cut off by an upstream vendor, regulator, or model lab without consent. That property is not the same as commercial success, brand strength, or capital efficiency. It is the property the merchant lens specifically claims is durable.

I. The Eight Axes

Each axis names a layer of the integration the appliance has to defend. The axes are ordered roughly bottom-up (from the metal toward the user), but the ordering is not a priority weighting. All eight count equally in the check.

1. Silicon path. The integrator can emit instructions for the target silicon without going through a vendor-controlled intermediate runtime that the vendor reserves the right to revoke or break. The standard is operational stability (the integrator can ship today and keep shipping next year on the same interface), not formal documentation. NVIDIA's PTX is documented, but the JIT compiler's behavior shifts between driver versions; CUDA-the-runtime is the contract that actually stays stable. Apple's Metal Shading Language is documented; AMX is undocumented but reverse-engineered with multi-year stability (MLX ships against it). AMD's HSA/HIP and the LLVM AMDGPU backend are documented and stable. RISC-V's vector ISA is the cross-vendor reference. Documentation correlates with stability but is neither necessary nor sufficient; the check is whether the interface holds across vendor releases the integrator does not control. The Python API for torch.cuda is not a silicon-path interface; it is a framework call stacked on top of a vendor runtime.

2. Runtime. The integrator ships an inference-and-orchestration daemon that runs on the user's hardware, accepts a model and a workload, and produces results without requiring a network round-trip to a vendor endpoint. vLLM, llama.cpp, MLX, Ollama, NVIDIA's local NIM containers, and Apple's local Core ML inference are all runtime artifacts in this sense. A wrapper that calls api.openai.com is not. The check: can the user run inference air-gapped, on owned hardware, indefinitely?
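One way to make the air-gap check mechanical is to run the inference call with socket creation disabled and see whether it still completes. The sketch below is a minimal illustration under stated assumptions: `local_infer` and `remote_infer` are hypothetical stand-ins for real runtime calls, and patching `socket.socket` only catches Python-level sockets created in the same process, so it does not replace a test on a physically disconnected machine.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def network_disabled():
    """Make any attempt to create a Python socket raise inside the block."""
    with mock.patch("socket.socket", side_effect=RuntimeError("network disabled")):
        yield


def local_infer(prompt: str) -> str:
    # Hypothetical stand-in for an on-device runtime call; touches no network.
    return prompt.upper()


def remote_infer(prompt: str) -> str:
    # Hypothetical stand-in for a wrapper around a vendor endpoint.
    conn = socket.socket()  # a real wrapper would connect to the vendor here
    conn.close()
    return prompt


def runs_air_gapped(infer, prompt: str = "hello") -> bool:
    """Return True if `infer` completes without creating a socket."""
    with network_disabled():
        try:
            infer(prompt)
            return True
        except RuntimeError:
            return False


print(runs_air_gapped(local_infer))   # True
print(runs_air_gapped(remote_infer))  # False
```

A reviewer would still want the "indefinitely" half of the check covered separately (license expiry, phone-home timers), which a single in-process run cannot show.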

3. Determinism. Given the same model weights, input tokens, sampling parameters, and seed, the runtime produces the same output bytes. Bit-identical reproducibility is the strict standard; reproducibility up to floating-point epsilon is the partial standard. Most production inference stacks fail this strict standard today because of nondeterministic GPU reduction kernels and atomic accumulator races. Determinism matters because it is the prerequisite for both auditable behavior and reproducible debugging; without it, every defect investigation is archaeology. The check: does the runtime ship with a reproducibility test in CI that fails the build if a deterministic-mode invariant breaks?
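The strict standard can be expressed as a result-level invariant: run the same request several times and require a single output digest. The sketch below assumes a hypothetical `generate` callable that returns raw output bytes; `toy_generate` and `noisy_generate` are illustrative stand-ins, not any real runtime's API.

```python
import hashlib
import os
import random


def toy_generate(weights_id: str, tokens: list, temperature: float, seed: int) -> bytes:
    """Hypothetical deterministic runtime: output depends only on its inputs."""
    rng = random.Random(f"{weights_id}|{tokens}|{temperature}|{seed}")
    return bytes(rng.randrange(256) for _ in range(32))


def noisy_generate(weights_id: str, tokens: list, temperature: float, seed: int) -> bytes:
    """Hypothetical nondeterministic runtime: ignores the seed entirely."""
    return os.urandom(32)


def output_digest(generate, **params) -> str:
    """SHA-256 of the output bytes for one run."""
    return hashlib.sha256(generate(**params)).hexdigest()


def check_bit_identical(generate, runs: int = 5, **params) -> bool:
    """True only if every run yields byte-identical output."""
    digests = {output_digest(generate, **params) for _ in range(runs)}
    return len(digests) == 1


request = dict(weights_id="demo-weights", tokens=[1, 2, 3], temperature=0.0, seed=7)
print(check_bit_identical(toy_generate, **request))    # True
print(check_bit_identical(noisy_generate, **request))  # False
```

In CI, a failing `check_bit_identical` would fail the build. Repeating runs on one machine is the weakest form of the test; a fuller version would also compare digests across hardware and driver versions.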

4. Multi-agent orchestration. The integrator owns the harness that coordinates multiple model calls into a single user task: tool selection, agent-to-agent routing, retry and fallback policy, loop termination. The harness is owned when its source is in the integrator's repository, the orchestration loop is introspectable at every step, and the user can replace any agent or tool with their own without modifying the framework. LangGraph, the Anthropic Agent SDK harness shape, and Cognition's trajectory-trained orchestrator are partial-credit examples; a black-box "agent" endpoint is a Fail. The check: can the user trace any model call back to the line of harness code that emitted it, and substitute a custom implementation at that line without forking the framework?
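The two halves of this check, traceability and substitutability, can be illustrated with a toy harness. Everything here is hypothetical (the `Harness` class, the step names, the lambda "agents"), and the trace records the step name rather than a source line, which a real harness would need to add.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]


@dataclass
class Harness:
    tools: dict[str, Tool] = field(default_factory=dict)
    # Each trace entry is (step name, input, output).
    trace: list[tuple[str, str, str]] = field(default_factory=list)

    def register(self, name: str, tool: Tool) -> None:
        # Re-registering an existing name substitutes the implementation.
        self.tools[name] = tool

    def call(self, name: str, payload: str) -> str:
        result = self.tools[name](payload)
        self.trace.append((name, payload, result))
        return result

    def run(self, task: str, plan: list[str]) -> str:
        value = task
        for step in plan:
            value = self.call(step, value)
        return value


harness = Harness()
harness.register("draft", lambda text: f"draft({text})")
harness.register("review", lambda text: f"review({text})")
print(harness.run("task", ["draft", "review"]))  # review(draft(task))

# The user swaps in their own reviewer without touching the Harness class.
harness.register("review", lambda text: f"my_review({text})")
print(harness.run("task", ["draft", "review"]))  # my_review(draft(task))
print(len(harness.trace))                        # 4
```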

5. Editor surface. The user's editing environment is the human-facing surface where the appliance is consumed. The check is two-sided: the integrator must support the user's existing editor (Emacs, Neovim, JetBrains, VS Code via LSP) without requiring a switch, and the integration must not be a thin protocol bridge that hides the appliance behind a vendor-controlled UI. Cursor and Zed are partial credit (forked-editor pattern: own UI, but the user must adopt it); Continue, Aider, and Codeium plugins running locally earn higher credit (they work inside the user's chosen editor). The fail mode is requiring all editing to happen inside a vendor app.

6. Build gate. The integrator owns the boundary at which source becomes signed, distributable artifact. Reproducible builds (Nix, Bazel-with-remote-cache-disabled, Guix, Buck2 in hermetic mode), signing keys held by the integrator and not by a vendor (Sigstore with the integrator's own Fulcio or with offline keys), and release pipelines that a third party can independently re-run from public source are the full-credit standard. Vendor-CI-with-local-signing is partial credit. Vendor-controlled release approval is fail. The check: can a third party rebuild the released artifact from the published source and verify the signature against the integrator's published key?
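The rebuild half of this check reduces to a digest comparison: a third party rebuilds from source, hashes the result, and compares it with the digest the integrator published. The sketch below covers only that step and assumes SHA-256 digests; the file name is hypothetical, and signature verification against the integrator's key is deliberately left out because it depends on the signing tool in use.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuild_matches(rebuilt: Path, published_digest: str) -> bool:
    """True if the independently rebuilt artifact matches the published digest."""
    return hmac.compare_digest(sha256_file(rebuilt), published_digest.lower())


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "appliance.img"  # hypothetical artifact name
    artifact.write_bytes(b"release bytes")
    published = hashlib.sha256(b"release bytes").hexdigest()
    print(rebuild_matches(artifact, published))  # True
    print(rebuild_matches(artifact, "0" * 64))   # False
```

A match is only meaningful if the build is reproducible in the first place; a non-hermetic build will fail this comparison even when nothing is wrong with the source.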

7. Data lineage. Every model output is traceable to (a) the training corpus or fine-tune dataset that produced the weights, (b) the tool-call inputs that produced the runtime context, and (c) the user-owned log of the call itself. Provenance for training data is the hardest part of this axis (most open-weight models do not publish complete training corpora), and partial credit is appropriate when the integrator publishes the fine-tune lineage even if the base-model lineage is opaque. The check: given a model output, can the user produce a complete chain of evidence (training data identifiers, fine-tune dataset IDs, tool-call inputs, prompt) that explains where the output came from?
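The chain of evidence named in the check can be modeled as a record whose missing links are listable. This is a sketch with hypothetical field names; it assumes an empty tuple of tool-call inputs means "logged, no tools called", while `None` means the log is absent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    output_id: str
    prompt: str
    training_data_ids: tuple[str, ...] = ()      # base-model corpus identifiers
    finetune_dataset_ids: tuple[str, ...] = ()   # fine-tune dataset identifiers
    tool_call_inputs: Optional[tuple[str, ...]] = None  # None = not logged


def missing_links(record: LineageRecord) -> list[str]:
    """Name every link in the evidence chain that the record cannot supply."""
    missing = []
    if not record.training_data_ids:
        missing.append("training_data_ids")
    if not record.finetune_dataset_ids:
        missing.append("finetune_dataset_ids")
    if record.tool_call_inputs is None:
        missing.append("tool_call_inputs")
    if not record.prompt:
        missing.append("prompt")
    return missing


opaque_base = LineageRecord(
    output_id="out-001",
    prompt="summarize the changelog",
    finetune_dataset_ids=("ft-2026-03",),
    tool_call_inputs=(),
)
print(missing_links(opaque_base))  # ['training_data_ids']
```

The example record is the partial-credit case described above: fine-tune lineage is published, base-model lineage is not.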

8. License posture. Every load-bearing component (code, model weights, configuration, tool descriptors, default prompts) is licensed such that the integrator (and downstream users) can fork, modify, redistribute, and run commercially without vendor consent. AGPL, Apache 2.0, MIT, BSD, and the other OSI-approved open licenses pass; the OpenRAIL family, source-available licenses with field-of-use restrictions, and most "research-only" model licenses are partial credit; proprietary EULAs and closed weights are fail. License posture is the legal absorption surface; it is what makes the stack actually portable when a vendor changes terms. The check: can the user fork the entire stack and run it under their own terms without violating any license in the dependency tree?
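Because the check covers the whole dependency tree, the stack's posture is that of its weakest component. The sketch below is illustrative only: the license labels and the two sets are short hypothetical samples, not a complete or authoritative classification, and treating an unrecognized license as Fail is an assumption of this sketch rather than a rule stated in the rubric.

```python
# Illustrative samples only; a real audit would use the full OSI list.
PASS_LICENSES = {"AGPL-3.0", "Apache-2.0", "MIT", "BSD-3-Clause"}
PARTIAL_LICENSES = {"OpenRAIL", "source-available", "research-only"}

RANK = {"Fail": 0, "Partial": 1, "Pass": 2}


def classify(license_label: str) -> str:
    """Map one component's license label to Pass / Partial / Fail."""
    if license_label in PASS_LICENSES:
        return "Pass"
    if license_label in PARTIAL_LICENSES:
        return "Partial"
    return "Fail"  # assumption: unknown or proprietary terms score Fail


def stack_posture(components: dict) -> str:
    """The stack scores as its weakest load-bearing component."""
    if not components:
        raise ValueError("no components to score")
    return min((classify(lic) for lic in components.values()), key=RANK.__getitem__)


stack = {
    "runtime": "Apache-2.0",
    "harness": "MIT",
    "weights": "OpenRAIL",
    "default prompts": "MIT",
}
print(stack_posture(stack))  # Partial
```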

II. The Scoring Scale

Each of the eight axes is scored on a three-point scale.

Pass. The integrator unambiguously meets the axis criterion as stated. There is documentary evidence (source code, signing keys, license files, reproducibility test logs) that an external reviewer can verify without vendor cooperation. The standard is binary and conservative: when in doubt, score Partial, not Pass.

Partial. The integrator meets the spirit of the axis but with a meaningful caveat: a vendor dependency the integrator does not control, a reproducibility gap the integrator has not yet closed, a license that grants most but not all of the rights the axis requires. Partial-credit scoring is the most common outcome on a serious audit; it is not a failure. It is the honest signal that the appliance is sovereign on this axis under the current vendor relationships and would need explicit work to remain so if those relationships changed.

Fail. The integrator does not meet the axis criterion. The product is dependent on an external party for a function the rubric requires the integrator to own. A Fail on a single axis says nothing about the product's standing on the other seven, but it does mean that on that axis the product is at the mercy of the upstream party, and under the composite rule it disqualifies the product for Bet 3's purposes.

Substrate-Owner Pass (asterisk). A fourth outcome reserved for the case where the candidate being audited is itself the substrate owner for the axis being scored: NVIDIA being scored on Silicon Path, or Apple being scored on Silicon Path on Apple Silicon, for example. The substrate owner trivially satisfies the axis criterion because they own the layer the criterion is testing for, which is structurally uninformative for the appliance-integrator question this rubric is built to answer. Substrate-Owner Pass is recorded with the asterisk and explicitly does not count toward the composite-rubric Pass-on-all-eight requirement; an axis with Substrate-Owner Pass is treated as Partial for composite-scoring purposes. The asterisk is the rubric's honest signal that the candidate is the wrong reference frame for this axis.

The composite rubric requires Pass on all eight axes for the product to qualify as a sovereign appliance under Bet 3. Partial on any axis flags the product as sovereign-aspirational: the integrator has named the axis but has not yet closed it. Fail on any axis disqualifies the product from the sovereign-appliance category for the bet's purposes; it may still be a successful product on other dimensions.
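The compositing rule is small enough to state as code. This is a sketch of the rule as written in this section, with hypothetical names (`Score`, `verdict`); it is not a reference implementation of the audit.

```python
from enum import Enum


class Score(Enum):
    PASS = "Pass"
    PARTIAL = "Partial"
    FAIL = "Fail"
    SUBSTRATE_OWNER_PASS = "Substrate-Owner Pass*"


def for_composite(score: Score) -> Score:
    """Substrate-Owner Pass is treated as Partial when compositing."""
    return Score.PARTIAL if score is Score.SUBSTRATE_OWNER_PASS else score


def verdict(scores: list) -> str:
    """Apply the composite rule to exactly eight per-axis scores."""
    if len(scores) != 8:
        raise ValueError("the rubric scores exactly eight axes")
    composited = [for_composite(s) for s in scores]
    if Score.FAIL in composited:
        return "disqualified"
    if Score.PARTIAL in composited:
        return "sovereign-aspirational"
    return "sovereign appliance"


print(verdict([Score.PASS] * 8))                                 # sovereign appliance
print(verdict([Score.SUBSTRATE_OWNER_PASS] + [Score.PASS] * 7))  # sovereign-aspirational
print(verdict([Score.PASS] * 7 + [Score.FAIL]))                  # disqualified
```

Fail is checked before Partial, so a product with both lands in the disqualified bucket.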

This scoring is deliberately strict. Bet 3 requires a Pass on all eight axes; that requirement is what makes the bet hard. A scale that allowed mostly-passing products to qualify would convert Bet 3 into a near-tautology, the failure mode the verification protocol explicitly flagged for the original Bet 3 wording.

III. A Worked Example: NVIDIA DGX Spark / Project DIGITS

To show the rubric is operable rather than rhetorical, here is the scoring for the most-discussed contemporary candidate: NVIDIA's DGX Spark / Project DIGITS workstation appliance, scored as of 2026-05.

Version-pinned scoring: the per-axis scoring below references NVIDIA's product surface as it stood in May 2026 (NIM container architecture, NeMo orchestration, NGC registry distribution, NVIDIA AI Enterprise license terms). Reviewers running this rubric in 2027 or later should expect the product surface to have shifted and should treat the scores below as illustrative methodology, not as authoritative current-state assessment. The rubric's structure is the durable artifact; the scoring is a snapshot.

1. Silicon path. Substrate-Owner Pass (asterisk). PTX is publicly documented, the LLVM NVPTX backend is open source, and a developer can emit PTX without going through CUDA-the-runtime. NVIDIA owns the silicon and the public ABI; the integrator's interface is documented. The asterisk: the substrate owner trivially passes axes where it owns the layer being scored, which is structurally uninformative for the appliance-integrator question this rubric is built to answer. NVIDIA-on-NVIDIA on Silicon Path is not a meaningful Pass; an independent integrator emitting PTX without NVIDIA's cooperation is. The asterisk applies to any axis where the candidate is the substrate owner. The same reasoning bears on Build Gate (axis 6), where NVIDIA signing its own images says little about an independent integrator; that axis is scored Partial on its own evidence, which is also what a Substrate-Owner Pass would composite to.

2. Runtime. Partial. NIM (NVIDIA Inference Microservices) ships as locally runnable containers; the user can run inference air-gapped on owned hardware. The partial-credit caveat: NVIDIA's recommended deployment path bundles enterprise-licensed components and depends on the NGC container registry for updates, which is a vendor-controlled distribution channel. A determined user can pin and self-mirror, but the default path is vendor-tethered.

3. Determinism. Fail. As shipped, the inference path uses nondeterministic CUDA reductions for performance, and there is no shipping-default deterministic-mode CI test that gates releases. NVIDIA exposes deterministic-mode flags but does not guarantee them at the runtime level; an external reviewer cannot point to a CI invariant that fails the build if determinism breaks.

4. Multi-agent orchestration. Fail. DGX Spark ships with NeMo and a recommended agent framework, but the orchestration loop is not an integrator-owned harness in the rubric's sense: the recommended path routes through NeMo or LangGraph, both of which are upstream third parties. The product itself does not own the multi-agent layer; it ships pointers to other people's harnesses.

5. Editor surface. Fail. No first-party editor integration ships with the workstation. Users are expected to bring their own editor and to integrate via existing tooling (LangChain, Aider, etc.). This is not a fail on the editor's side; it is a fail on the appliance's side. The rubric requires the appliance to own the editor seam, and DGX Spark does not.

6. Build gate. Partial. The OS image is signed by NVIDIA; updates are signed by NVIDIA. Users can run their own builds on the hardware and sign with their own keys, but the appliance's default release pipeline is vendor-controlled. A third party cannot independently rebuild the shipped DGX Spark image from public source.

7. Data lineage. Fail. Models shipped via NIM are accompanied by model cards but not by complete training-data provenance. Tool-call lineage is whatever the user's chosen harness provides; since the harness is not part of the appliance (axis 4), data lineage is not part of the appliance either.

8. License posture. Fail. Significant components of the DGX Spark stack ship under the NVIDIA AI Enterprise license, which restricts redistribution and field of use. The user cannot fork the entire appliance and run it under their own terms; they can only use the components NVIDIA has licensed for their use case.

Composite: 0 Pass, 3 Partial, 5 Fail. (Per Section II's compositing rule, Substrate-Owner Pass on axis 1 composites as Partial; the per-axis label remains Substrate-Owner Pass with the asterisk, but the composite count reflects the lack of meaningful Pass for the appliance-integrator question this rubric scores.) DGX Spark is, on this rubric, not a sovereign appliance. It is a workstation-scale NVIDIA product that solves the silicon-path axis (because NVIDIA owns the silicon) but does not solve the higher-layer axes that the appliance integrator's slot specifically requires. This is exactly the prediction the Mercantile Thesis makes: substrate owners build substrate-shaped products, and the appliance slot is filled by an integrator who picks up the axes the substrate owner has organizational reasons not to ship.

This worked example is illustrative, not normative. NVIDIA's organizational shape is right for what NVIDIA is doing; the appliance integrator's slot is a different slot.
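The composite count can be reproduced mechanically from the per-axis labels in this section. The sketch below uses a plain dictionary and `collections.Counter`; the variable names are hypothetical, and the labels are copied from the scoring above.

```python
from collections import Counter

# Per-axis labels as scored in this section (snapshot: 2026-05).
DGX_SPARK_2026_05 = {
    "Silicon path": "Substrate-Owner Pass",
    "Runtime": "Partial",
    "Determinism": "Fail",
    "Multi-agent orchestration": "Fail",
    "Editor surface": "Fail",
    "Build gate": "Partial",
    "Data lineage": "Fail",
    "License posture": "Fail",
}


def composite_counts(scores: dict) -> Counter:
    """Count labels after compositing Substrate-Owner Pass as Partial."""
    return Counter(
        "Partial" if label == "Substrate-Owner Pass" else label
        for label in scores.values()
    )


counts = composite_counts(DGX_SPARK_2026_05)
summary = ", ".join(f"{counts[label]} {label}" for label in ("Pass", "Partial", "Fail"))
print(summary)  # 0 Pass, 3 Partial, 5 Fail
```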

IV. What's Deliberately Out

Three categories are deliberately omitted from the rubric. Each omission is a choice with a reason.

Go-to-market. Distribution channels, partnerships, sales motion, and marketing surface are not in the rubric. The reason is that a sovereign appliance can succeed or fail on go-to-market without becoming any more or less sovereign; the axes scored are structural properties of the product, not of the company. A perfectly architected appliance with bad distribution is still a sovereign appliance; a perfectly distributed wrapper with poor architecture is not. The rubric scores the architecture.

Capital structure. Funding source, equity-vs-grant-vs-bootstrapped, valuation, and runway are not in the rubric. The reason is the same: capital structure changes the company's velocity and survival probability but does not change whether the product is sovereign. A bootstrapped sovereign appliance and a Series-B sovereign appliance score identically on the eight axes. (Capital structure does affect whether the integrator can resist acquisition pressure that would compromise the axes, but that is a future-state consideration; the rubric scores the current state.)

Brand and ecosystem. Developer mindshare, GitHub stars, conference talks, and ecosystem extensions are not in the rubric. These are downstream effects of the architecture, not architectural properties. A product with strong axes and weak brand is still sovereign; a product with strong brand and weak axes is not.

These omissions are the hardest part of the rubric to defend rhetorically (every one of the omitted categories is real and important), but each is real and important for a different question than the one the rubric is trying to score. The rubric scores sovereignty. Sovereignty is not the same as success. Conflating them would corrupt the rubric.

V. The Audit Procedure

Reviewers. Three external reviewers, named by mid-2027, none of whom are employees, contractors, advisors, or equity holders in the integrator being audited. Reviewers are recruited from across at least two regions (e.g., one EU, one US, one rest-of-world) to dilute single-jurisdiction bias. Reviewer identities are public. Reviewer disclosures (any indirect relationship to the integrator) are public.

Materials. The integrator provides: source repository access (read-only is acceptable), build documentation, signing key information, license inventory, runtime artifacts, sample model weights, and a written self-assessment scoring each axis. The self-assessment is the integrator's claim; the reviewers' job is to confirm or contest it.

Procedure. Each reviewer scores each axis independently. Scores are submitted privately to the audit coordinator. After all three reviewers have submitted, scores are compared. Where reviewers agree (3-of-3 on Pass / Partial / Fail), the score stands. Where they disagree, the reviewers convene and produce a joint written justification for the final score; the joint justification is published alongside the score. The composite rubric requires Pass on all eight axes; any axis with a non-unanimous Pass requires the joint justification to explicitly address the dissent.
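The comparison step can be sketched as a small reconciliation function: unanimous axes are settled, and everything else is queued for a joint written justification. The function name and data layout are hypothetical; the rule is the 3-of-3 agreement described above.

```python
def reconcile(axis_votes: dict) -> tuple:
    """Split axes into settled scores and axes needing a joint justification.

    `axis_votes` maps an axis name to the three reviewers' independent scores.
    """
    settled = {}
    needs_joint_justification = []
    for axis, votes in axis_votes.items():
        if len(votes) != 3:
            raise ValueError(f"{axis}: expected three reviewer scores")
        if len(set(votes)) == 1:
            settled[axis] = votes[0]  # 3-of-3 agreement: the score stands
        else:
            needs_joint_justification.append(axis)
    return settled, needs_joint_justification


votes = {
    "Runtime": ("Pass", "Pass", "Pass"),
    "Determinism": ("Pass", "Partial", "Pass"),
    "License posture": ("Fail", "Fail", "Fail"),
}
settled, disputed = reconcile(votes)
print(settled)   # {'Runtime': 'Pass', 'License posture': 'Fail'}
print(disputed)  # ['Determinism']
```

The function deliberately does not pick a winner for disputed axes; under the procedure that outcome comes from the reviewers' published joint justification, not from a majority vote.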

Publication. The full audit (per-axis scores, per-reviewer notes, joint justifications, and the integrator's self-assessment) is published as a document attached to the bet's resolution. The publication is the bet's resolution; there is no parallel private record.

Disputes. If the integrator believes the audit was conducted incorrectly, they may publish a written dissent. The dissent does not change the published audit; it is published alongside it. The rubric explicitly does not allow re-scoring after publication. Bet 3 says the bet defaults to failed if the rubric is contested at resolution time; the audit's authority comes from the procedure being followed, not from the integrator's agreement with the result.

A lighter procedure (single auditor, unpublished scores, integrator-vetoed publication) would not produce a result an external skeptic could trust. The Mercantile Thesis is a public claim; its falsification has to be a public process. The procedure's weight is the cost of making the structural claim falsifiable.

VI. Adversarial Robustness: What If the Integrator Refuses, or Nobody Scores 8/8?

Two failure modes the procedure has to handle explicitly.

The non-cooperating integrator. An integrator scoring poorly has every incentive to refuse repository access, decline to submit a self-assessment, or stiff-arm reviewers and let the bet expire. The default rule: if a plausible candidate has been publicly identified as a Bet 3 candidate and the integrator declines to participate by 2029-09-30 (three months before the bet resolves), reviewers score from public artifacts only and the candidate is presumptively scored Fail on any axis where public evidence is unavailable. The integrator does not get to win the bet by silence. Refusal is not procedural neutrality; it is a Fail signal on the axes whose evidence is held private. (The audit coordinator is the rubric author or a successor named publicly by 2027: a single accountable person, not a committee, charged with identifying candidates, recruiting reviewers, and publishing the audit. The role's identity is itself a public commitment; if the rubric author cannot serve in 2029, succession is announced no later than 2027-12-31.)
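The default rule for a non-cooperating integrator can be written down directly. In this sketch the function names are hypothetical, and the reading that public-artifact-only scoring starts once the 2029-09-30 deadline has passed is an interpretation of the text above.

```python
from datetime import date

PARTICIPATION_DEADLINE = date(2029, 9, 30)

AXES = (
    "Silicon path", "Runtime", "Determinism", "Multi-agent orchestration",
    "Editor surface", "Build gate", "Data lineage", "License posture",
)


def use_public_artifacts_only(participating: bool, today: date) -> bool:
    """True once the deadline has passed without the integrator participating."""
    return (not participating) and today > PARTICIPATION_DEADLINE


def score_from_public_artifacts(public_scores: dict) -> dict:
    """Axes with no public evidence are presumptively scored Fail."""
    return {axis: public_scores.get(axis, "Fail") for axis in AXES}


# Reviewers found public evidence for only two axes.
scores = score_from_public_artifacts({"Runtime": "Partial", "License posture": "Pass"})
print(scores["Runtime"])      # Partial
print(scores["Determinism"])  # Fail
print(use_public_artifacts_only(participating=False, today=date(2029, 10, 1)))  # True
```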

The structurally-near-impossible 8/8. A serious objection: if no candidate plausibly scores 8/8, Bet 3 defaults to Failed by construction, and the parent thesis can claim the appliance slot remains "open" indefinitely. That would convert the bet into rhetorical armor disguised as falsification. The honest answer: the rubric is calibrated against the eight load-bearing axes the parent thesis depends on, not against the lowest bar a 2029 product might clear. If 8/8 turns out to be unreachable in practice by Q4 2029, the right inference is not that the rubric is too strict; it is that the merchant lens's appliance-layer prediction was wrong on its stated timeline, and the bet has fired its falsification correctly. The lens needs revision in that case, with the failed bet documented as part of the canon. The rubric's job is to specify what would constitute a sovereign appliance; it is not the rubric's job to ensure that a sovereign appliance exists.

A weaker accommodation: if Q4 2029 produces a candidate scoring 7/8 with a defensible argument that the missing axis was the wrong one to gate on, the merchant lens has not been straightforwardly falsified. It has been partly falsified, in a way the next iteration of the rubric should incorporate. The bet's resolution document should record both the score and the argument, so that future readers can judge whether the merchant lens's structural prediction held under a corrected rubric.

The eight axes are not the only way to draw the appliance-vs-wrapper line. They are one way, specified rigidly enough that the line can be checked rather than asserted. If a future reviewer believes the rubric is wrong (that an axis is missing, or that the scoring scale lets through products that should fail), the right move is to publish a competing rubric, audit the same products against both, and see where the rubrics disagree. That is how disciplines become rigorous: not by the original author getting it right on the first pass, but by the rubric being public enough that others can sharpen it.

With the rubric published, the bet dated, and the audit procedure specified, the merchant lens has produced a tactical prediction that can fail in public, and a rubric that anyone can use to check the prediction.

Sources

Foundational:

The Mercantile Thesis — the parent essay this rubric backs, particularly the "Where the appliance gets built" section's eight-axes specification and the Bet 3 falsification surface.
Doctrine 01 — Quantitative Mercantilism, A Field Statement — the methodological foundation the rubric inherits from.

Adjacent rubric and audit references:

The Open Source Initiative's OSI-approved license list (opensource.org/licenses) is the operational definition for the License Posture axis's pass standard.
Reproducible Builds project (reproducible-builds.org) is the operational definition for the Build Gate axis's reproducibility standard.
Sigstore (sigstore.dev) and the in-toto attestation framework (in-toto.io) are the operational references for the Build Gate axis's signing standard.
The MLPerf Inference benchmark suite is the closest existing reference for the kind of public, reproducible scoring procedure this rubric requires; the Mercantile Thesis's Bet 1 SWE-bench Verified criterion follows the same shape.

Cross-references in the canon:

Sovereign Audit 08 — The Mercantile Thesis — the engineering-side companion that tracks the integrator's progress against these axes in running code.
Doctrine 02 — Quants and Plumbers — the data-stack architectural distinction that informs the Data Lineage axis.
Doctrine 05 — The Sovereign Cloak — the sovereign-integration architectural template.

Working definition of "sovereign" used throughout this essay: a stack whose owner cannot be cut off by an upstream vendor, regulator, or model lab without consent. This requires hardware control (run inference on owned silicon indefinitely), license openness (fork, modify, redistribute every load-bearing component without vendor approval), and operational independence from any single supplier (re-target a different supplier as a one-time engineering cost, not a thesis-breaking event). This is stricter than data sovereignty (where the bits live) and stricter than regulatory sovereignty (jurisdiction).

Known Issues for the Next Iteration (V2)

"V2" here means a future revision of this essay, not a separate canon-update or successor essay. When V2 ships it will replace this V1 (including its V1.x minor revisions) at the same URL, with a dated revision footer and a versioned audit trail. Reviewers running the rubric should record the rubric version they audited against; Bet 3 resolves on whichever rubric version is in force at audit time.

This rubric is V1.2. It was audited adversarially before publication, patched on the four most load-bearing seams from that audit, and patched again after a cold-reader test for clarity (Substrate-Owner Pass formal definition, audit-coordinator role definition, inline sovereign-definition, this V2-referent note). Eight substantive known issues remain, surfaced by the original audit but deferred to V2 rather than papered over with hedging language. They are listed publicly so external reviewers know exactly where the rubric is weakest:

Determinism axis check-text vs axis-text divergence (gameable). The check ("a CI test that fails the build if a deterministic-mode invariant breaks") is a process check; the axis demands bit-identical output (a result check). An integrator can ship a trivial CI invariant and pass while production is wildly nondeterministic. V2 needs result-level checks.
Determinism axis discrimination. Pass standard (bit-identical GPU inference at full throughput) is currently unmeetable; Partial standard (epsilon-determinism) is barely meetable. The axis carries little discriminating power between sovereign and non-sovereign integrators in 2026. V2 should either lower the bar to a meaningful one or document why the high bar is load-bearing.
Build Gate silent on supply-chain hygiene. The axis rewards possession of signing infrastructure but is silent on key management, HSM use, rotation policy, and post-incident recovery. SolarWinds-class attacks are out of scope. V2 should add a key-management sub-criterion.
License Posture vs EU AI Act tension. AGPL alone may not satisfy AI Act Annex VIII transparency requirements; OpenRAIL-style licenses exist because OSI licenses don't address model-specific harm restrictions some regulators now require. The rubric currently forces a choice between Pass on License Posture and regulatory compliance in some jurisdictions. V2 needs to acknowledge or resolve the tension.
CPU-only appliance treatment undefined. The Silicon Path axis was written assuming GPU/accelerator paths; an integrator running llama.cpp on Threadripper with direct AVX-512 emission is in an underspecified position. V2 should clarify whether the silicon path must be the accelerator path or any compute path.
Roadmap-state vs current-state divergence. The rubric scores current state; an integrator who passes 8/8 today while having announced an acquisition or weight-closing for next quarter currently wins the bet. V2 should either score the integrator's two-year-forward roadmap or invoke a re-audit clause when a Pass is contingent on a publicly disclosed reversal.
Multi-agent orchestration scores introspectability, not safety. Axis 4's check confirms the harness is traceable and substitutable. It does not confirm the orchestration is sandboxed, rate-limited, or rollback-capable. A sovereign-but-unsafe harness scores Pass. V2 should add an operational-safety sub-criterion.
Reviewer-pool jurisdictional bias. "Public reviewer identities from at least two regions" excludes reviewers from jurisdictions where publicly auditing US/EU/PRC AI products carries personal risk; the procedure embeds the bias it claims to dilute. V2 should allow a pseudonymous-reviewer track for jurisdiction-sensitive cases, with the audit coordinator verifying real-world identity to a published bond.

The right move when one of these breaks the rubric in production is to publish the V2 with the patch, not to retroactively reinterpret V1. Bet 3 still resolves on the rubric in force at audit time; an integrator who would pass V2 but fails V1 is an honest fail under V1, which is a signal to ship V2 sooner.

Footnotes

The chip-to-runtime ABI distinction is canonical in the systems literature: the difference between writing for x86 and writing for libc, between writing GPU shaders in PTX or SPIR-V vs. writing in CUDA C, is the difference between owning the silicon path and being a customer of someone else's runtime. The Vulkan / SPIR-V combination is the cross-vendor reference for owning the silicon-path axis without owning the silicon.
Achieving deterministic GPU inference at full throughput is an open engineering problem as of 2026; the standard mitigations (deterministic-mode CUDA flags, CPU fallback for reduction kernels, fixed-seed sampling) impose meaningful performance cost. The rubric scores the commitment to determinism (a CI test that fails the build if determinism breaks) rather than the absence of any nondeterminism. The standard is engineering discipline, not magic.
The three-reviewer-with-published-disagreement procedure is modeled on the IETF's review-and-IESG-ballot process for RFCs, the IEEE peer-review structure for standards documents, and the academic peer-review tradition's published-dissent norm. It is not modeled on commercial third-party audits (SOC 2, ISO 27001) because those audits' authority depends on the auditor's commercial accreditation rather than on the public verifiability of the audit procedure.