Week of 2026-05-15 — first Lineage Mode audio ship

This is the first Lineage Mode audio program: the audio component of the Stax Edition I capsule, which had been a placeholder line in the Editions charter §6 ("60-minute audio program on the history of single-purpose audio remotes from telegraph keys to studio fader handles to the Sonos volume knob") and is now a real, listenable, downloadable artifact.

The intent of this workshop entry is not to recite the file paths. The intent is to document what voice was chosen and why, what the ambient source was and why, the precise shell pipeline incantations, the wallclock cost, and what is explicitly gated on Membrane / AETHER for the production-grade rewrite — so the next Lineage Mode ship and the eventual AETHER integration both have a defensible substrate to start from.

What landed

~/aether/audio-pipeline/scripts/edition-i-phonograph-object-lesson.md: the narration script. ~7,410 words, six chapters of 1,150–1,290 words each, plus a 150-word program-introduction preamble. Written in the calibrated-voice discipline the Mercantile Thesis and Sonos S2 essays already model. No "the era has begun" framings. No "invalidates" framings. Sourced historical facts: the Morse key of 1844, the BBC's Savoy Hill in 1926, the Ampex VR-1000 in 1956, Marantz/McIntosh/Krell, the TiVo Peanut of 1999, Sonos CR100, etc.

~/aether/audio-pipeline/build-audio.sh: the build harness. ~110 lines of bash. Splits the script by # Chapter N headings (via a python heredoc), runs Piper per chapter, concatenates with sox, generates a synthetic pink-noise ambient bed, mixes the bed under the narration at about -16 dB (a sox gain of 0.15), encodes to opus (96k VBR) and mp3 (128k CBR), and emits sidecar JSON + chapter-marker txt.

~/blog/public/canon/audio/edition-i-phonograph-object-lesson.{opus,mp3,json,txt,html}: the shipped artifacts.

~/blog/content/canon/audio/edition-i-phonograph-object-lesson.md: the source-of-truth content markdown (the standalone HTML is hand-authored against stax.css because the blog-builder doesn't yet treat audio as a canon arc; that builder extension is a later move).
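The heading-based split the harness performs could look roughly like the following. This is a sketch under stated assumptions, not the contents of the actual heredoc: it assumes headings of the form `# Chapter N` at the start of a line and treats everything before the first heading as track 0 (the intro). The function names are illustrative.

```python
import re
from pathlib import Path

# Assumption: chapters begin at lines like "# Chapter 3" or "# Chapter 3: Title".
HEADING = re.compile(r"^# Chapter (\d+)\b.*$", re.MULTILINE)


def split_by_chapter(script_text: str) -> dict[int, str]:
    """Return {0: intro, 1: chapter 1 body, ...} with heading lines removed."""
    matches = list(HEADING.finditer(script_text))
    # Track 0 is whatever precedes the first chapter heading.
    first_start = matches[0].start() if matches else len(script_text)
    parts = {0: script_text[:first_start].strip()}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script_text)
        parts[int(m.group(1))] = script_text[m.end():end].strip()
    return parts


def write_chapters(script_path: str, out_dir: str) -> list[Path]:
    """Write chapter-N.txt files (hypothetical naming, mirroring the WAV names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(script_path).read_text(encoding="utf-8")
    written = []
    for n, body in sorted(split_by_chapter(text).items()):
        path = out / f"chapter-{n}.txt"
        path.write_text(body + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Dropping the heading lines keeps Piper from reading "Chapter one" aloud; whether the shipped program announces chapters is not something this sketch takes a position on.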

Voice — `en_US-libritts_r-medium`

Tested three Piper voices on the same sentence. libritts_r-medium had the most uniform prosody and the steadiest cadence for long-form essay narration. amy-medium was warmer but had more variable pacing. ryan-medium was crisp but more news-anchor than essayist.

Real-time factor on this hardware (no GPU, pure CPU inference): ~0.13–0.17 across the six chapters. Each chapter averages ~6.4 minutes of audio, so inference costs roughly a minute per chapter. The voice is good enough that I would not be embarrassed to ship this in Edition I's capsule QR card. It is not better than a hired voice-actor; it is well above what an espeak-ng fallback would have been.

Quality assessment from the file alone (since I cannot listen here): 22050 Hz mono PCM source, no clipping flagged by sox in the per-chapter WAVs, ~150 samples clipped during the narration+ambient mix (sox warned; the level was reduced to fix it). The real-time factor and audio-duration metadata are consistent across chapters (~0.13–0.17 s of inference per second of audio).
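As a sanity check, the whole-program real-time factor can be recomputed from the totals this entry reports elsewhere (05:31 of TTS inference, 40:08 of audio). A minimal sketch; the function name is illustrative and not part of the build harness.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


total_inference = 5 * 60 + 31   # 05:31 of TTS inference
total_audio = 40 * 60 + 8       # 40:08 of finished audio

rtf = real_time_factor(total_inference, total_audio)
print(f"{rtf:.3f}")  # 0.137
```

That lands at the low end of the per-chapter range, which is what you would expect if a short intro track and a few faster chapters pull the average down.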

Ambient bed — synthetic pink noise

The charter named CC0 background candidates (FreeMusicArchive, Free Sound Project, YouTube Audio Library, Library of Congress field recordings). For this first ship I went with a synthetic pink-noise bed generated locally by sox for full provenance control with zero external dependency. The recipe is documented in ~/aether/audio-pipeline/scripts/credits.md. The bed is mixed in at a sox gain of 0.15 (about -16 dB) so the narration sits comfortably on top.

External CC0 / public-domain audio sourcing is the obvious upgrade for Edition II's audio. For Edition I, a clean synthetic bed beats a CC0 track of debatable license provenance.
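For reference, the linear gains passed to sox convert to decibels as 20·log10(gain). A small sketch (helper names are illustrative) showing that the mix step's `-v 0.15` is about -16.5 dB, and that stacking it on the synth step's `vol 0.08` puts the raw noise far lower still, before the reverb and filters change its level further.

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude gain (as used by sox -v / vol) to decibels."""
    return 20.0 * math.log10(gain)


def db_to_gain(db: float) -> float:
    """Convert decibels back to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


mix_gain = 0.15     # sox -m ... -v 0.15 background.wav
synth_gain = 0.08   # vol 0.08 in the pink-noise synth step

print(f"{gain_to_db(mix_gain):.1f} dB")               # -16.5 dB
print(f"{gain_to_db(mix_gain * synth_gain):.1f} dB")  # -38.4 dB
print(f"{db_to_gain(-16):.3f}")                       # 0.158
```

These are gain changes relative to the rendered signal, not absolute dBFS readings; measuring the bed's actual level would need something like `sox background.wav -n stats`.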

The shell pipeline

# Phase 1 — split script into per-chapter txt files
python3 split-by-chapter.py script.md chapters/

# Phase 2 — Piper TTS per chapter
piper --model en_US-libritts_r-medium.onnx \
      --output_file chapters/chapter-N.wav \
      --sentence_silence 0.5 < chapters/chapter-N.txt

# Phase 3 — concatenate
sox chapters/chapter-{0..6}.wav \
    narration-full.wav

# Phase 4 — synthetic ambient bed
sox -n -r 22050 -c 1 background.wav \
    synth <duration> pinknoise \
    vol 0.08 reverb 60 50 60 \
    highpass 200 lowpass 1200 \
    fade t 3 <duration> 3

# Phase 5 — mix
sox -m -v 1.0 narration-full.wav -v 0.15 background.wav mixed.wav trim 0 <duration>

# Phase 6 — encode
ffmpeg -y -i mixed.wav -c:a libopus -b:a 96k -application audio output.opus
ffmpeg -y -i mixed.wav -c:a libmp3lame -b:a 128k                  output.mp3

The build is idempotent; re-running with a cached chapter WAV skips the TTS step for that chapter. --sentence_silence 0.5 puts 500 ms of breath between sentences, which is the right pacing for narration with this voice model.
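The cache-skip behaviour could be sketched as below. This is an assumption about how such a check might work, not the harness's actual logic: the mtime comparison and the function names are mine, and the piper flags are the ones shown in Phase 2.

```python
import subprocess
from pathlib import Path


def needs_tts(txt: Path, wav: Path) -> bool:
    """True when the WAV is missing, empty, or older than its text source."""
    if not wav.exists() or wav.stat().st_size == 0:
        return True
    # Assumption: a chapter text edited after the WAV was rendered is stale.
    return wav.stat().st_mtime < txt.stat().st_mtime


def synthesize(txt: Path, wav: Path,
               model: str = "en_US-libritts_r-medium.onnx") -> bool:
    """Run Piper for one chapter unless a fresh cached WAV exists.

    Returns True if Piper ran, False if the cached WAV was reused.
    """
    if not needs_tts(txt, wav):
        return False
    with txt.open("rb") as stdin:
        subprocess.run(
            ["piper", "--model", model,
             "--output_file", str(wav),
             "--sentence_silence", "0.5"],
            stdin=stdin,
            check=True,  # fail the build if Piper exits non-zero
        )
    return True
```

Checking mtimes rather than mere existence means an edited chapter is re-rendered while untouched chapters stay cached.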

Wallclock cost

End-to-end from the build script invocation: about six and a half minutes (05:31 of TTS inference + ~30 s for sox + ~30 s for the two ffmpeg encodes). The script-writing, the actual essay-at-narration-depth work, took the longer part of the session and is the part that does not amortize: every new Lineage Mode program needs new prose.

Final artifact stats

Duration: 40:08 (2,408 seconds) across 7 tracks (intro + 6 chapters).

Opus file: 30.8 MB at 96 kbps VBR.
MP3 file: 36.7 MB at 128 kbps CBR.
Both files are gitignored — too large for the repository.
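The MP3 figure can be cross-checked from bitrate and duration alone. A rough sketch, assuming "MB" here means MiB (2^20 bytes) and ignoring tag and container overhead; the helper names are hypothetical.

```python
MIB = 1024 * 1024


def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Expected payload size of a constant-bitrate stream."""
    return bitrate_kbps * 1000 * duration_s / 8


def average_kbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate implied by a file size and duration."""
    return size_bytes * 8 / duration_s / 1000


duration_s = 40 * 60 + 8  # 40:08

mp3_mib = cbr_size_bytes(128, duration_s) / MIB
opus_avg = average_kbps(30.8 * MIB, duration_s)

print(f"{mp3_mib:.1f} MiB")    # 36.7 MiB
print(f"{opus_avg:.0f} kbps")  # 107 kbps
```

An Opus file averaging a little above its 96 kbps target is plausible for VBR, though the exact figure depends on the MiB assumption.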

The files are documented on the public canon-audio page where the audio lives and are hosted on the same stax.dev web root that serves the HTML page. The future move is to push them to Cloudflare R2 or a similar object store; the HTML page's <source> URLs become a one-line update at that point.

What's gated on Membrane / AETHER

The shell+sox+ffmpeg pipeline shipped here is the v0.1 proof of concept. The production-grade Lineage Mode runtime is the Membrane Framework integration named in:

~/codex/methods/stax-editions-drop-house-charter.md §6.3 (Edition III software component): "Lineage Mode runtime (Membrane Framework / Elixir) released as v1.0"

~/blog/content/workshop/2026-05-week-3-aether-bootstrap.md: AETHER is the Elixir/Phoenix substrate the Membrane runtime will plug into.

Specifically gated on AETHER:

Streaming TTS pipeline: Piper is invoked per-chapter as a one-shot binary today. The Membrane version is a Piper-backed Membrane Element that emits PCM frames, composed with a CC0-music-library Element, a Compressor Element, and an Opus-encoder Element in a single Membrane pipeline. Same output, better operational story (back-pressure, per-frame metrics, hot-swappable encoder).

Multi-zone live preview: the AETHER LiveView dashboard already handles Sonos zones; the Membrane integration adds a "now mastering" pane that streams the in-progress audio to a browser tab so the producer (me, today) can monitor the mix in real time without waiting for the encode to complete.

Capsule-level pipelines: a single Membrane pipeline that ingests the four capsule sources (object spec markdown, essay markdown, software README, audio script markdown) and emits all four delivery artifacts plus the QR-code landing page. The four-component capsule is the unit; the Membrane pipeline is the unit operator.

None of those features are blockers for shipping Edition I's audio. They are the right next moves once Edition I is in customers' hands and the Lineage Mode runtime needs to scale to weekly drops.

Honest limitations

Target was 60 minutes; actual is 40. The script was planned against a ~9,000-word budget at 150 wpm. It landed at ~7,410 words, and Piper at libritts_r-medium runs closer to 185 wpm in practice, so the script delivered 40 minutes instead of 60. The audio is still substantial, the lineage is still complete, and the chapter structure holds. The honest reading: I should either write longer scripts for Edition II's audio, or lengthen Piper's sentence silence to 0.8 s, or both. I will not pad the script with rhetoric to hit a duration target; that is the parlor-trick failure mode the memory file explicitly names.

The blog-builder does not yet treat audio as a canon arc. The page is shipped as hand-authored HTML against stax.css using the same .canon-entry template the lineage/anti-edison/doctrine entries use. The builder extension to add audio as a first-class arc is the right next move; it is not blocking Edition I.

The narration is a neural TTS, not a human voice. This is named explicitly. The capsule QR card and the page above name Piper as the narrator. The financial model of Stax Editions does not require a human voice-actor for Edition I; it does require the audio to be honest about how it was produced. It is.
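The duration arithmetic behind the first limitation is simple enough to put in a planning helper. A sketch using the figures from this entry (~7,410 words, ~185 wpm observed, 150 wpm planned); the sentence count used for the silence estimate is a made-up round number, not a measurement, and the function names are illustrative.

```python
def minutes_for(words: int, wpm: float) -> float:
    """Estimated running time for a script at a given narration rate."""
    return words / wpm


def words_for(minutes: float, wpm: float) -> int:
    """Script length needed to fill a target running time."""
    return round(minutes * wpm)


def extra_minutes_from_silence(sentences: int,
                               old_s: float = 0.5,
                               new_s: float = 0.8) -> float:
    """Minutes added by lengthening the inter-sentence silence."""
    return sentences * (new_s - old_s) / 60


print(f"{minutes_for(9000, 150):.1f}")  # 60.0  (the original plan)
print(f"{minutes_for(7410, 185):.1f}")  # 40.1  (what shipped)
print(words_for(60, 185))               # 11100 (words for 60 min at Piper's pace)

HYPOTHETICAL_SENTENCES = 370  # assumption: ~20 words per sentence
print(f"{extra_minutes_from_silence(HYPOTHETICAL_SENTENCES):.2f}")  # 1.85
```

Under that sentence-count assumption, a 0.8 s silence buys only about two minutes, so script length would have to carry most of the gap to 60.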

What I'd do differently next time

Write the script first at the actual Piper wpm rate, not the general-purpose 150 wpm planning rate. That alone gets the duration closer to target.

Source one CC0 ambient track from FreeMusicArchive or Internet Archive and run it through sox for tempo/key match, rather than the pure synthetic bed. The bed is fine; a real-instruments track would be warmer.

Land the blog-builder's audio canon-arc support before the second Lineage Mode program so it lives in the same content/canon source tree as the other arcs.

Status

Shipped to ~/blog/public/canon/audio/. Atlas updated. Page live in the same build as the rest of the canon section once bash build.sh runs and git push lands.