AETHER audio pipeline: a runnable claim for ~9h 49m of shipped spoken-word
This is a lab notebook entry, not a marketing brief. Every claim is graded against the controlled evidence vocabulary[^1]; every empirical number is footnoted to a file in the working tree or to an ffprobe reading of the shipped artifact. The substrate under examination is Aether.Audio.*, an Elixir / Membrane Framework pipeline plus an operationally-equivalent bash harness, which has produced 15 shipped spoken-word programs across the Stax canon.
1. The claim
AETHER's audio pipeline takes a markdown script of a Stax canon program and emits a finished .opus + .mp3 pair plus four sidecar artifacts (.json chapter markers, .txt timecodes, .html player page, content .md) through a single substrate. As of 2026-05-19, the pipeline has produced 15 shipped programs: Edition I (Phonograph Object Lesson, 40:08)[^2], Edition III (Anti-Edison Vol I LP, 1:27:59)[^3], Edition VI (Anti-Edison Vol II LP, 1:20:45)[^4], and 12 × monthly Stax Almanac (Jan–Dec, 6:20:47 total)[^5].
Total shipped audio: 9 h 49 m 39 s (35,378.98 s of .opus measured via ffprobe)[^6]. Total narration words: ~95,600 across the 15 scripts[^7]. Voice is en_US-libritts_r-medium (Piper TTS)[^8]; ambient beds are sox-synthesized via one of 14 named recipes in Aether.Audio.AmbientSource[^9].
<!-- runnable-claim: aether-audio-pipeline-shipped-programs -->
Evidence grade: compiled. The Elixir code path passes mix compile clean on Elixir 1.17 / OTP 27[^10]; 38 / 39 tests in mix test pass (the one failure is an unrelated Phoenix page-controller assertion against stale boilerplate)[^11]. The substrate has no dedicated unit tests for Aether.Audio.*; integration evidence is the 15 produced programs and their ffprobe-verifiable durations. The shell-harness path is a separate implementation running the same toolchain; the two paths produce the same shape of output but have not been bit-compared. Do not read this entry as unit-tested or audited: those grades are not yet earned.
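The shipped-audio total in this section can be cross-checked by hand. The sketch below sums the four duration figures quoted in the footnotes and formats the result; the dictionary keys and the `hms` helper are illustrative names, not part of the pipeline.

```python
# Durations in seconds as quoted in this entry's footnotes. The Almanac
# figure is the 12-file sum, not a per-file ffprobe reading.
DURATIONS_S = {
    "edition-i": 2407.985125,
    "edition-iii": 5278.920417,
    "edition-vi": 4844.774583,
    "almanac-12-months": 22847.30,
}


def hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, rounding to the nearest whole second."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


total_s = sum(DURATIONS_S.values())
print(f"{total_s:.2f} s = {hms(total_s)}")
```

This reproduces the arithmetic only; the source-of-truth readings remain the ffprobe measurements on the published files.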
2. Why audio
The Stax canon has print-form essays — Mercantile Thesis, Anti-Edison arc, twelve-figure Almanac. Print and long-form audio reach different attention surfaces: a 6,000-word lineage essay sits in the research-paper surface, a 33-minute monthly Almanac sits in the commute / walk-hours podcast surface. Both surfaces are load-bearing.
The pipeline's job is to make audio shipment the same operational cost as essay shipment: one markdown script in, finished .opus + .mp3 + chapter markers + player page out. Without the pipeline, each program would be a bespoke production round — voice talent booking, studio time, ambient sourcing, manual mastering, manual chaptering, manual page wiring. The pipeline collapses that to "write the script, run the build."
This is the substrate move: the pipeline-shape is the artifact, the audio files are the by-product[^12].
3. Architecture — two operationally-equivalent paths
The substrate has two paths that produce the same shape of output from the same input script:
Path 1: Elixir / Membrane Framework (canonical). Top-level orchestrator Aether.Audio.LineageProgram (260 LOC) composes shell-out wrappers for Piper TTS (PiperShim), sox-synthesized ambient (AmbientSource), sox-backed mix (Mixer), ffmpeg loudnorm (LoudnessNormalize), and a real Membrane.Pipeline for the WAV → Opus encode leg (OpusEncodePipeline) with a WavStripHeader filter element bridging Membrane.File.Source to Membrane.FFmpeg.SWResample.Converter[^13]. The Membrane element model is the production-correct factoring for the linear encoder DAG; the synthesis-and-mix leg shells out because no upstream-stable Membrane mixer or loudnorm element existed in the 1.0 plug-in set[^14]. The wrapper factoring preserves the option to swap to a native Membrane implementation in v2 without changing the call surface.
Path 2: bash + python3 harness (operational shortcut). Three shell scripts under ~/aether/audio-pipeline/: build-edition-iii.sh (Anti-Edison Vol I)[^15], build-edition-vi.sh (Vol II)[^16], and build-almanac-batch.sh (the 8-program May–December batch)[^17]. The harness invokes the same Piper binary, the same sox filter chains (recipe arguments copied verbatim from AmbientSource's build_args/6 clauses), and the same ffmpeg command lines the Elixir path runs through System.cmd/3. The two paths are not bit-identical (Opus encoder non-determinism; the Vol I / Vol II shell scripts skip the explicit loudnorm pass that the Elixir path and the Almanac batch include); they are operationally equivalent at the format and duration level[^18].
The Elixir path is the documented canonical implementation. The shell-harness produced 14 of the 15 shipped files because for a one-shot long-form build it is faster to debug shell-stderr than a Phoenix-app-spawned Port.
4. The ten-phase build
The full pipeline runs ten phases per program:
- **Chapter split.** Regex on `^# Chapter (\d+) — (.+)$`; preamble before the first heading becomes chapter 0 (program intro). PiperShim.split_chapters/2 for Elixir; an inline Python heredoc in the shell harness[^19].
- **Piper TTS per chapter.** `piper --model en_US-libritts_r-medium.onnx --output_file chapter-N.wav --sentence_silence 0.4 < chapter-N.txt` (22050 Hz mono). Mtime-cached[^20].
- **Duration measurement.** `soxi -D chapter-N.wav`; sum to total narration length, used to size the ambient bed[^21].
- **Narration concatenation.** `sox chapter-0.wav … chapter-N.wav narration-full.wav`. For Editions III / VI, four 25-second interludes are interleaved at side breaks[^22].
- **Ambient bed synthesis.** `sox -n … synth <dur+10> <recipe-args>` writes background.wav 10 seconds longer than the narration. Recipe selection is the load-bearing per-program parameter (§5).
- **Mix narration + ambient.** `sox -m -v 1.0 narration-full.wav -v 0.13 background.wav mixed.wav trim 0 <narration-dur>`; the ambient bed sits ~18 dB below the narration[^24].
- **Loudness normalization.** `ffmpeg -af loudnorm=I=-16:TP=-1.5:LRA=11` writes normalized.wav at -16 LUFS integrated, -1.5 dBTP true peak, the broadcast-podcast central target (Apple -16, Spotify -14, Amazon -14)[^23].
- **Opus encode.** `ffmpeg -c:a libopus -b:a 96k -application audio` with title / artist=Stax / album / date=2026 / genre="Spoken Word" metadata[^25].
- **MP3 fallback encode.** `ffmpeg -c:a libmp3lame -b:a 128k`, same metadata schema[^26].
- **Sidecar generation.** `python3 gen-sidecars.py` (690 LOC) reads a per-program registry and emits .json chapter markers, .txt timecodes, .html player page, and content .md[^27].
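The chapter-split phase can be illustrated with a short Python sketch built on the `re.split` shape quoted in the footnotes. It is not the harness's heredoc: the function name, the returned dictionary layout, the "Program intro" title, and the way the document title line is dropped from the preamble are all assumptions.

```python
import re

# Heading pattern as quoted in this entry; the dash is a literal em-dash.
HEADING = re.compile(r"^# Chapter (\d+) — (.+)$", flags=re.MULTILINE)


def split_chapters(src: str) -> list[dict]:
    """Split a script into chapters; the preamble becomes chapter 0."""
    # With two capture groups, re.split yields:
    # [preamble, num1, title1, body1, num2, title2, body2, ...]
    parts = HEADING.split(src)
    # Assumption: the document title is a single leading "# ..." line.
    preamble = re.sub(r"\A\s*# .*\n", "", parts[0], count=1).strip()
    chapters = []
    if preamble:
        chapters.append({"index": 0, "title": "Program intro", "text": preamble})
    for i in range(1, len(parts), 3):
        chapters.append({
            "index": int(parts[i]),
            "title": parts[i + 1].strip(),
            "text": parts[i + 2].strip(),
        })
    return chapters


demo = (
    "# Demo Program\n\nWelcome.\n\n"
    "# Chapter 1 — First\n\nBody one.\n\n"
    "# Chapter 2 — Second\n\nBody two.\n"
)
for chapter in split_chapters(demo):
    print(chapter["index"], chapter["title"], len(chapter["text"]))
```

Each chapter's `text` would then be written to its own `chapter-N.txt` for the TTS phase.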
Phase order is fixed by data dependency. The Elixir path encodes the graph via a with chain in LineageProgram.run/3[^29]; the shell harness uses sequential phase blocks with set -e.
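The linear ambient gains passed to `sox -v` in the mix phase map to decibels by the usual amplitude formula, 20·log10(gain). A minimal sketch, using the three gain values quoted in this entry (the labels are descriptive only):

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude gain (as passed to sox -v) to decibels."""
    return 20.0 * math.log10(gain)


# Gains quoted in this entry's mixer footnote and mix phase.
for label, gain in [
    ("Elixir Mixer default", 0.12),
    ("Edition III/VI shell", 0.13),
    ("Almanac shell", 0.15),
]:
    print(f"{label}: gain {gain} = {gain_to_db(gain):.1f} dB")
```

These are levels relative to the narration track at gain 1.0, so they are plain dB, not dBFS.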
5. The ambient recipe library
The per-program ambient bed is the load-bearing aesthetic parameter of the pipeline. A monthly on Medici banking gets :florence_duomo (110 Hz + 165 Hz sines, 0.15 Hz tremolo, reverb at reverberance 100; a synthesized chant drone, NOT a sampled choir); a monthly on the Hanseatic League's Treaty of Stralsund gets :hanseatic_dock (brown noise + 0.18 Hz tremolo, reverb at reverberance 70, 60–900 Hz band-pass)[^33]. The full set is the 14 recipes in AmbientSource.build_args/6 plus a :gaslit_factory_floor recipe that lives only in the shell-harness build script for Edition III[^34]:
| Recipe | Program | Synthesis core |
|--------------------------------|-------------------------------------|----------------------------------------------|
| :pink_reverb | Edition I (default) | pink noise + reverb, BP 200–1200 Hz |
| :rumble_machinery | Almanac January (Rockefeller) | brown + pink mix, BP 60–800 Hz, heavy reverb |
| :north_atlantic_winter | Almanac February (Tudor) | brown + 0.2 Hz tremolo, BP 40–600 Hz |
| :victorian_laboratory | Almanac March (Perkin) | pink + 1400 Hz sine, BP 200–2000 Hz |
| :florence_duomo | Almanac April (Medici) | 110+165 Hz sines + tremolo, BP 80–1500 Hz |
| :hanseatic_dock | Almanac May (Hanse) | brown + 0.18 Hz tremolo, BP 60–900 Hz |
| :waterloo_courier_road | Almanac June (Rothschild) | pink + 220 Hz sine, BP 80–1200 Hz |
| :homestead_blast_furnace | Almanac July (Carnegie) | brown+pink+95 Hz sine, BP 50–800 Hz |
| :trading_floor_after_hours | Almanac August (Slim) | pink + 1800 Hz sine, BP 150–1800 Hz |
| :shenzhen_apartment_1987 | Almanac September (Ren Zhengfei) | pink + brown mix, BP 100–1500 Hz |
| :library_mahogany | Almanac October (Morgan) | brown + 1.0 Hz tremolo, BP 50–700 Hz |
| :venetian_lagoon | Almanac November (Polo) | brown + 130 Hz sine + tremolo, BP 60–1000 Hz |
| :yokohama_steamship | Almanac December (Iwasaki) | brown + pink + tremolo, BP 40–900 Hz |
| :electrified_factory_floor | Edition VI (Anti-Edison Vol II) | brown + pink mix, BP 60–900 Hz, heavy reverb |
| :gaslit_factory_floor (shell) | Edition III (Anti-Edison Vol I) | brown + pink mix, BP 40–700 Hz, heavy reverb |
Every recipe is sox-synthesized: no sampled audio, no field recordings, no third-party sound libraries. The license posture is clean by construction: synthetic noise is mathematically generated with no human creative input beyond the recipe parameters, so the produced audio is unambiguously the work of the pipeline-author[^35]. The beds are suggestive of the historical setting (a Hanseatic dock has waves, a Yokohama steamship has piston cadence), not field-recorded reconstructions.
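To make the recipe idea concrete, here is a sketch of an argument builder for `:hanseatic_dock`, using the effect chain quoted in the footnotes. The function names and the output-path placement are hypothetical. The real invocation's leading options are elided in this entry and are not reproduced here, so treat the result as illustrative rather than the shipped command line.

```python
def bed_duration(narration_s: float, pad_s: float = 10.0) -> float:
    """The ambient bed runs 10 seconds longer than the narration."""
    return narration_s + pad_s


def hanseatic_dock_args(duration_s: float, out_path: str = "background.wav") -> list[str]:
    """Build a sox argv for the :hanseatic_dock effect chain (illustrative)."""
    d = f"{duration_s:.2f}"
    return [
        "sox", "-n", out_path,       # null input: everything is synthesized
        "synth", d, "brownnoise",    # brown-noise core
        "tremolo", "0.18", "35",     # slow 0.18 Hz swell
        "vol", "0.06",
        "reverb", "70", "60", "70",
        "highpass", "60",            # with lowpass: the 60-900 Hz band
        "lowpass", "900",
        "fade", "t", "4", d, "4",    # 4 s fade in and out
    ]


print(" ".join(hanseatic_dock_args(bed_duration(1882.0))))
```

Keeping the recipe as a pure argument list is what makes an argument-builder unit test cheap: the list can be asserted on without running sox.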
6. The runnable-claim contract
A reader can rebuild any of the 15 programs from source. For Anti-Edison Vol I[^15]:
cd ~/aether/audio-pipeline
bash build-edition-iii.sh # ~5 min wallclock on this host
ffprobe ~/blog/public/canon/audio/edition-iii-anti-edison-vol-i.opus
ffprobe should report duration 5278.92 ± ε s (≈ 1 h 27 m 59 s) and a libopus stream at 96 kbps[^31]. The output .opus is not bit-identical to the shipped file: libopus and libmp3lame carry frame-timing and silence-detection state across frames, so two invocations on the same WAV need not produce byte-identical files. Duration is deterministic to within the encoder's ~10 ms frame boundary; the narration content is bit-identical at the WAV stage before the lossy encoder runs[^30]. The Almanac batch build runs eight programs back-to-back via bash build-almanac-batch.sh (~40 min wallclock).
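The duration check can be scripted rather than read off by eye. The sketch below wraps the ffprobe command line quoted in the footnotes and compares against the expected value; the 20 ms tolerance is an assumed stand-in for ε, and both function names are hypothetical.

```python
import subprocess


def probe_duration(path: str) -> float:
    """Return the container duration in seconds as reported by ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return float(out.strip())


def within_tolerance(measured_s: float, expected_s: float, tol_s: float = 0.02) -> bool:
    """True if the rebuilt file's duration matches the shipped one within tol_s."""
    return abs(measured_s - expected_s) <= tol_s


EXPECTED_EDITION_III_S = 5278.920417  # shipped reading quoted in this entry
print(within_tolerance(5278.93, EXPECTED_EDITION_III_S))
```

After a rebuild, `within_tolerance(probe_duration(path), EXPECTED_EDITION_III_S)` would be the pass/fail line; `probe_duration` requires ffprobe on the PATH.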
The Elixir canonical path is a single iex call (Phoenix context, with NATS progress events to the /audio LiveView):
iex> Aether.Audio.Pipeline.run!(%{
...> script: "~/aether/audio-pipeline/scripts/almanac-january-rockefeller.md",
...> basename: "almanac-january-rockefeller",
...> output_dir: "~/blog/public/canon/audio",
...> title: "Stax Almanac · January · John D. Rockefeller",
...> album: "Stax Almanac"
...> })
Aether.Audio.Pipeline wraps LineageProgram.run/3 and publishes seven NATS subjects under homelab.audio.pipeline.* so the /audio LiveView can render progress as the run advances[^32].
7. The 15 shipped programs
All durations measured via ffprobe -show_entries format=duration on the published .opus files at ~/blog/public/canon/audio/ as of 2026-05-19[^6]:
| Program | Duration | Chapters | Ambient recipe |
|---------------------------------------------------|---------------|----------|--------------------------------|
| Edition I: Phonograph Object Lesson | 40:08 | 6 | :pink_reverb |
| Edition III: Anti-Edison Vol I LP | 1:27:59 | 6 + 4 IL | :gaslit_factory_floor |
| Edition VI: Anti-Edison Vol II LP | 1:20:45 | 6 + 4 IL | :electrified_factory_floor |
| Almanac · January · Rockefeller | 33:59 | 6 | :rumble_machinery |
| Almanac · February · Tudor | 29:37 | 6 | :north_atlantic_winter |
| Almanac · March · Perkin | 32:45 | 6 | :victorian_laboratory |
| Almanac · April · Medici | 35:13 | 6 | :florence_duomo |
| Almanac · May · Hanseatic League | 31:22 | 6 | :hanseatic_dock |
| Almanac · June · Rothschild | 31:00 | 6 | :waterloo_courier_road |
| Almanac · July · Carnegie | 32:43 | 6 | :homestead_blast_furnace |
| Almanac · August · Slim | 32:33 | 6 | :trading_floor_after_hours |
| Almanac · September · Ren Zhengfei | 32:38 | 6 | :shenzhen_apartment_1987 |
| Almanac · October · Morgan | 32:39 | 6 | :library_mahogany |
| Almanac · November · Polo | 27:37 | 6 | :venetian_lagoon |
| Almanac · December · Iwasaki | 28:42 | 6 | :yokohama_steamship |
| Total | 9:49:39 | 90 + 8 IL | 15 distinct recipe-instances |
(Chapters count includes the program-introduction chapter 0; "IL" = side-break interlude on Editions III / VI.)
Every program is published under CC BY-NC 4.0 per the Stax Editions drop-house charter §12[^35].
8. Pipeline-as-substrate
What the pipeline enables that bespoke per-program production wouldn't:
- **Marginal-cost collapse.** A new monthly Almanac is "write the script, add a registry entry, pick a recipe, run the build." Piper TTS runs at real-time factor ~0.13–0.17× on this laptop[^37]; a 30-minute program takes ~4–5 minutes to synthesize and another ~30 seconds to mix, normalize, encode. The 8-program May–December batch ran in under 40 minutes total.
- **Format consistency.** Every program ships with the same chapter-marker shape, the same -16 LUFS target, the same 96k Opus + 128k MP3 encode, the same four sidecar artifacts. The player page renders identically for the Phonograph Edition and a monthly Almanac.
- **Substrate for the 2027 cycle.** The 2027 twelve-figure Almanac roster is already published; the pipeline is what makes that ship-cadence feasible without re-tooling per program.
- **Composability with the Director's Track DSL.** Chapter-marker sidecars let the Director's DSL[^38] address program sections by basename + chapter index: a scene like play :almanac-jan-rockefeller chapter: 3 zone: "kitchen" reaches the section level, not just the file level.
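The marginal-cost figures above follow directly from the real-time factor. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def synth_wallclock_s(audio_s: float, rtf: float) -> float:
    """Wallclock seconds to synthesize audio_s seconds of narration at a given RTF."""
    return audio_s * rtf


RTF_LOW, RTF_HIGH = 0.13, 0.17  # range quoted for this host

for minutes in (1, 30):
    audio_s = minutes * 60
    low = synth_wallclock_s(audio_s, RTF_LOW)
    high = synth_wallclock_s(audio_s, RTF_HIGH)
    print(f"{minutes} min of narration: {low:.1f}-{high:.1f} s of synthesis")
```

For a 30-minute program this gives roughly 234 to 306 seconds, which matches the "~4–5 minutes" estimate.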
9. Honest limitations
Six things the current pipeline does not do:
- **Piper TTS produces serviceable but unmistakably-synthesized narration.** en_US-libritts_r-medium is the best-public-OSS multi-speaker LibriTTS variant (uniform prosody at 165–185 wpm), but it is not Studio One quality and there has been no professional voice audition pipeline[^8]. The Edition III physical-press capsule (Q4 2026) will need a real-human-voice master.
- **Loudness normalization targets streamed playback (-16 LUFS), not vinyl pressing.** The physical LP master will need a separate normalization pass tuned to the pressing facility's spec[^39].
- **Ambient beds are sox-synthesized at recipe-level fidelity.** :venetian_lagoon is suggestive of the historical setting; it is not a field recording of San Marco at dawn. "Synthetic, suggestive, licensed-clean," not "actual lagoon water."
- **Differential testing against the canonical Membrane pipeline is partial.** The shell-harness and Elixir paths produce the same shape of output (Opus + MP3 + sidecars, same metadata schema, durations within encoder frame-boundary tolerance), but the two paths have **not** been bit-compared end-to-end. Opus encoder non-determinism makes byte-identical comparison non-trivial; the next-frontier work is a duration+integrated-LUFS comparison harness[^36].
- **The pipeline does not handle music yet.** Spoken-word + synthetic ambient is the entire scope. A real instrumental score (e.g., for the Edition III LP physical press) would need a parallel pipeline or external mastering.
- **No dedicated unit tests for `Aether.Audio.*`.** The 38 / 39 passing tests in mix test cover the Director runtime, Director scene, and web controllers, not the audio modules[^11]. The integration evidence is the 15 produced programs; that is real evidence and it is narrower than what unit-tested would warrant. A PiperShim.split_chapters/2 test, an AmbientSource argument-builder test, and a LineageProgram.chapter_timestamps/1 property test are the next-frontier closures.
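As an illustration of what a chapter-timestamp property test might assert, here is a Python stand-in. It is not `LineageProgram.chapter_timestamps/1`, whose signature and return shape are not shown in this entry; the function below and its invariants (contiguous, non-negative spans that end at the summed duration) are assumptions about what such a function should satisfy.

```python
import random
from itertools import accumulate


def chapter_timestamps(durations_s: list[float]) -> list[tuple[float, float]]:
    """Return (start, end) offsets for chapters laid end to end."""
    ends = list(accumulate(durations_s))
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


def check_invariants(durations_s: list[float]) -> None:
    stamps = chapter_timestamps(durations_s)
    assert len(stamps) == len(durations_s)
    assert all(start <= end for start, end in stamps)
    # Each chapter starts exactly where the previous one ends.
    assert all(a[1] == b[0] for a, b in zip(stamps, stamps[1:]))
    if stamps:
        assert stamps[0][0] == 0.0
        assert abs(stamps[-1][1] - sum(durations_s)) < 1e-6


# Hand-rolled property check over random chapter layouts (seeded for repeatability).
rng = random.Random(0)
for _ in range(200):
    check_invariants([rng.uniform(0.5, 600.0) for _ in range(rng.randint(1, 12))])
print("invariants hold for 200 random layouts")
```

In the Elixir codebase the same invariants would more naturally be expressed with a property-testing library; this sketch only pins down what the property is.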
10. v2 deferrals
Honest deferred items for the next pass:
- **Real human voice.** Audition / Pro Tools handoff for the Edition III physical LP press master; long-term audition pipeline for the broader public-distribution surface.
- **Music score layering.** Instrumental composition + multi-track music+narration mix for programs where a synthetic ambient bed is insufficient.
- **Container-level chapter markers.** Bake chapter timecodes into the Opus OpusTags block (and ID3v2 CHAP for MP3) so chapter-aware players render them as a scrubbable list[^40].
- **Public-API exposure.** Once the AETHER NATS bus matures, a `mix call AETHER /audio/produce '{"script": "...", "ambient": ":victorian_laboratory"}'` returning a finished .opus is a clean v2 shape that composes with the rest of the agent fleet.
- **Native Membrane mixer + loudnorm.** Swap sox / ffmpeg shell-outs for native Membrane elements when the plug-in set ships upstream-stable equivalents; the wrapper factoring is designed for that swap[^14].
- **External CC0 ambient pool.** Curated pool of CC0 / public-domain field recordings (archive.org, Free Music Archive CC0 tier, Library of Congress field-recording collection) selectable in place of synthetic beds where field fidelity matters[^41].
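For the container-level chapter item, one plausible first step is generating an ffmpeg metadata file from chapter offsets. The sketch below emits the FFMETADATA1 chapter format with a millisecond timebase; the function names and the tuple layout are hypothetical, and whether a given player surfaces chapters written this way into Ogg Opus or MP3 has not been tested here.

```python
import re


def escape_value(text: str) -> str:
    """Backslash-escape the characters ffmetadata treats as special."""
    return re.sub(r"([=;#\\\n])", r"\\\1", text)


def ffmetadata_chapters(chapters: list[tuple[str, float, float]]) -> str:
    """Render (title, start_s, end_s) tuples as an FFMETADATA1 chapter file."""
    lines = [";FFMETADATA1"]
    for title, start_s, end_s in chapters:
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",               # START/END below are milliseconds
            f"START={round(start_s * 1000)}",
            f"END={round(end_s * 1000)}",
            f"title={escape_value(title)}",
        ]
    return "\n".join(lines) + "\n"


print(ffmetadata_chapters([("Program intro", 0.0, 61.5), ("Chapter 1", 61.5, 420.0)]))
```

The resulting file would be passed to ffmpeg as a second input in a post-encode remux pass; that step is left out because it belongs to the deferred v2 work.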
11. Cross-references
- Lineage Mode: the substrate concept of "every Stax canon program ships an audio component"; the pipeline is the operational implementation.
- Director's Track DSL: composes with this pipeline at the program-section level.
- Stax Editions I, III, VI: ship audio components produced by this pipeline; Edition II (Almanac) is now audio-complete across the full 12-month roster via this substrate.
- Design-system contract: `~/codex/methods/stax-dev-portfolio-design-system.md` defines the evidence vocabulary[^1].
- License posture: pipeline code AGPL-3.0[^42]; produced audio CC BY-NC 4.0[^35]; Piper voice model CC BY 4.0 (LibriTTS-R derivative)[^8]; sox-synthesized ambient CC0-equivalent by construction.
12. Status footer
- Evidence grade: `compiled`. Elixir code path compiles clean under Elixir 1.17 / OTP 27; 38 / 39 tests pass in mix test; the audio modules have no dedicated unit tests. Integration evidence is the 15 shipped programs (9 h 49 m 39 s of audio, ffprobe-verifiable at ~/blog/public/canon/audio/). Not unit-tested, not differential-tested end-to-end, not audited.
- Reproducible: `true`. The bash-harness commands and iex call in §6 are the canonical reproduction recipes.
- Last verified: 2026-05-19, Intel Core i7-1065G7 @ 1.30 GHz, Linux 7.0.3-arch1-1 x86_64, Elixir 1.17.3 / OTP 27, Piper v2023.11.14-2, sox v14.4.2, ffmpeg v8.0.1.
- Open gaps: unit-test coverage for `PiperShim` / `AmbientSource` / `LineageProgram`; duration+LUFS differential test between shell and Elixir paths; container-level chapter markers; human-voice audition pipeline; vinyl-press master pass; native Membrane mixer + loudnorm.
Footnotes
[^1]: `~/codex/methods/stax-dev-portfolio-design-system.md` defines the controlled evidence vocabulary used across `/lab` entries: `compiled`, `unit-tested`, `property-tested`, `fuzz-tested`, `differential-tested`, `benchmarked`, `audited`, `sketch`, `NOASSERTION`. The vocabulary is enforced via the frontmatter `evidence:` field, which the renderer surfaces as a pill in the entry header.

[^2]: `ffprobe -v error -show_entries format=duration -of csv=p=0 ~/blog/public/canon/audio/edition-i-phonograph-object-lesson.opus` → `2407.985125` s = 40:07.99 (rounded to 40:08 for the display table). Six chapters: program intro + five narrative chapters, per the JSON sidecar at `~/blog/public/canon/audio/edition-i-phonograph-object-lesson.json`.

[^3]: `ffprobe …/edition-iii-anti-edison-vol-i.opus` → `5278.920417` s = 1:27:58.92. Six chapters (intro + five narrative sides A1/A2/B1/B2/coda) plus four side-break interludes at A1→A2, A2→B1, B1→B2, B2→coda; structure per the JSON sidecar at `~/blog/public/canon/audio/edition-iii-anti-edison-vol-i.json`.

[^4]: `ffprobe …/edition-vi-anti-edison-vol-ii.opus` → `4844.774583` s = 1:20:44.77. Same shape as Edition III (intro + four narrative sides + coda + four interludes); structure per the JSON sidecar at `~/blog/public/canon/audio/edition-vi-anti-edison-vol-ii.json`.

[^5]: Sum of `ffprobe` durations across the 12 monthly `.opus` files (Jan through Dec) = `22847.30` s = 6 h 20 m 47 s. Per-file values in §7. Source-of-truth durations measured 2026-05-19 against `~/blog/public/canon/audio/almanac-*.opus`.

[^6]: Aggregate measurement: `for f in edition-i-phonograph-object-lesson edition-iii-anti-edison-vol-i edition-vi-anti-edison-vol-ii almanac-january-rockefeller almanac-{02-feb-tudor,03-mar-perkin,04-apr-medici,05-may-hanse,06-jun-rothschild,07-jul-carnegie,08-aug-slim,09-sep-ren,10-oct-morgan,11-nov-polo,12-dec-iwasaki}; do ffprobe -v error -show_entries format=duration -of csv=p=0 $f.opus; done | awk '{s+=$1} END {print s}'` → `35378.98` s = 9 h 49 m 38.98 s.

[^7]: `wc -w` across the 15 script files under `~/aether/audio-pipeline/scripts/edition-*.md` and `~/aether/audio-pipeline/scripts/almanac-*.md`: 95,632 words total. Per-script counts range from 4,171 (December · Iwasaki) to 13,595 (Edition III · Anti-Edison Vol I). The total counts the markdown source words including chapter headings; the spoken narration after stripping headings and code blocks is marginally smaller, but the ~95,600 figure is the load-bearing approximation.

[^8]: `~/aether/audio-pipeline/piper/en_US-libritts_r-medium.onnx`, model card at `https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/libritts_r/medium`. Per `~/aether/audio-pipeline/scripts/credits.md`: license is CC BY 4.0 (LibriTTS-R dataset derivative). Piper engine itself (v2023.11.14-2) at `https://github.com/rhasspy/piper` is MIT-licensed. Tested against `amy-medium` and `ryan-medium`; `libritts_r-medium` had the most uniform prosody for long-form essay narration at 165–185 wpm.

[^9]: `~/aether/lib/aether/audio/ambient_source.ex`, lines 154–512. Fourteen named `build_args/6` clauses, one per recipe atom; clause heads enumerate `:pink_reverb`, `:rumble_machinery`, `:north_atlantic_winter`, `:victorian_laboratory`, `:florence_duomo`, `:hanseatic_dock`, `:waterloo_courier_road`, `:homestead_blast_furnace`, `:trading_floor_after_hours`, `:shenzhen_apartment_1987`, `:library_mahogany`, `:venetian_lagoon`, `:yokohama_steamship`, `:electrified_factory_floor`. Default-gain and default-fade clauses at lines 122–152.

[^10]: `cd ~/aether && mix compile --force 2>&1 | tail -2` on 2026-05-19: `Compiling 37 files (.ex)` → `Generated aether app`. Clean compile, zero warnings, on Elixir 1.17.3 / OTP 27.

[^11]: `cd ~/aether && mix test 2>&1 | tail -2` on 2026-05-19: `Finished in 6.9 seconds (0.8s async, 6.1s sync)` → `39 tests, 1 failure`. The one failure is in `test/aether_web/controllers/page_controller_test.exs:6`, an assertion against the Phoenix-generated "Peace of mind from prototype to production" boilerplate text that was overwritten by the AETHER Sonos-zones dashboard. Unrelated to audio. The audio modules themselves have no dedicated tests; the only files under `test/aether/` are `director/runtime_test.exs` and `director/scene_test.exs`.

[^12]: The "pipeline is the artifact, audio files are the by-product" framing is the same substrate move that the portfolio-bench `/lab` entry makes for benchmarks: a single bench is a perf claim, five aligned bench files are a substrate claim. One audio file is a production artifact; 15 audio files through one pipeline is a substrate claim.

[^13]: Module roster at `~/aether/lib/aether/audio/`: `pipeline.ex` (193 LOC, orchestrator with NATS events), `lineage_program.ex` (260 LOC, DAG), `piper_shim.ex` (180 LOC, TTS subprocess shim), `ambient_source.ex` (513 LOC, recipe library), `mixer.ex` (92 LOC, sox-backed two-input mix), `loudness_normalize.ex` (93 LOC, ffmpeg loudnorm), `opus_encode_pipeline.ex` (105 LOC, real Membrane.Pipeline for WAV→Opus), `wav_strip_header.ex` (106 LOC, Membrane.Filter for header strip). Total: ~1,542 Elixir LOC for the audio pipeline.

[^14]: `~/aether/lib/aether/audio/lineage_program.ex` lines 23–38: "The encoder leg is implemented as a real Membrane.Pipeline … because that is the part of the pipeline where Membrane's element model is the production-correct factoring: file-source → format-converter → encoder → file-sink is a single linear DAG with no multi-source mixing and no subprocess streaming, which is exactly what Membrane Core 1.3 is good at. The synthesis-and-mix leg shells out to piper/sox/ffmpeg behind Membrane-element-shaped Elixir modules, because Membrane has no upstream-stable mixer or loudnorm element in the 1.0 plugin set and rolling them is an unjustified scope expansion. The wrapper factoring preserves the option to swap to a native Membrane implementation in v2 without changing the call surface."

[^15]: `~/aether/audio-pipeline/build-edition-iii.sh`, 157 lines. Phases enumerated as `echo "=== Phase N: …"` blocks; Phase 1 (split chapters via inline Python heredoc), Phase 2 (Piper TTS per chapter, mtime-cached), Phase 3 (chapter durations via `soxi -D`), Phase 4 (synthesize 4 × 25-second interludes), Phase 5 (assemble program with sox concat), Phase 6 (ambient bed via `:gaslit_factory_floor` recipe, sox-only inline), Phase 7 (mix narration + ambient), Phase 8 (Opus encode at 96k), Phase 9 (MP3 encode at 128k). Note: this script omits the explicit loudnorm pass that the Elixir path and the Almanac batch script run. The integrated loudness of the shipped Edition III is set by the mix gains rather than by a loudnorm post-pass.

[^16]: `~/aether/audio-pipeline/build-edition-vi.sh`, 157 lines. Same 9-phase shape as the Vol I script; recipe is `:electrified_factory_floor` (the Vol II sister recipe documented in `ambient_source.ex` lines 471–512); narrative ordering interleaves four interludes between five chapter sides plus a coda.

[^17]: `~/aether/audio-pipeline/build-almanac-batch.sh`, 357 lines. The `build_one` shell function (lines 37–298) implements the 10-phase per-program build including the explicit loudnorm step at Phase 7. Twelve `build_one` invocations at the bottom of the script (May through December in batch-2; the Feb/Mar/Apr batch-1 invocations are commented out as previously shipped earlier the same day per the comment block at lines 13–22).

[^18]: The Elixir path and the shell-harness path share the same Piper binary, the same `en_US-libritts_r-medium.onnx` voice, the same sox synth-and-reverb arguments (the bash `case "$RECIPE"` block in `build-almanac-batch.sh` lines 116–262 mirrors `Aether.Audio.AmbientSource.build_args/6` clause-by-clause), and the same ffmpeg libopus + libmp3lame command lines. They are not bit-identical (Opus encoder non-determinism, and the Edition III/VI shell scripts skip the explicit loudnorm pass). They are operationally equivalent at the format and duration level. A duration+LUFS differential-test harness is an open v2 gap.

[^19]: Elixir path: `Aether.Audio.PiperShim.split_chapters/2` at `~/aether/lib/aether/audio/piper_shim.ex` lines 56–105. The regex pattern `^# Chapter (\d+) — (.+?)$\n(.*?)(?=^# Chapter |\z)` matches against the script; the preamble (anything before the first `# Chapter` heading, with the document's H1 stripped) becomes chapter 0. Shell path: inline Python `python3 - "$SCRIPT" "$CHAPTERS_DIR" <<'PY' … PY` heredoc in each of the three build scripts; same regex shape (`re.split(r'^# Chapter (\d+) — (.+)$', src, flags=re.MULTILINE)`).

[^20]: Elixir path: `Aether.Audio.PiperShim.run_piper/3` at `piper_shim.ex` lines 145–169; constructs `<piper> --model <voice> --output_file <wav> --sentence_silence <s> --quiet < <txt>` via `sh -c` so piper reads from the file and gets a real EOF. Mtime-cache check at lines 127–143: synthesis short-circuits if the output WAV is newer than the input `.txt`. Shell path: equivalent inline `[[ -f "$out" && "$out" -nt "$in" ]]` cache check + `"$PIPER" --model "$VOICE" --output_file "$out" --sentence_silence 0.5 < "$in"`.

[^21]: Elixir path: `LineageProgram.duration_seconds/1` at `lineage_program.ex` lines 120–131; wraps `soxi -D <path>`, parses the float from stdout. Shell path: inline `dur=$(soxi -D "$CHAPTERS_DIR/chapter-$i.wav")` in each script.

[^22]: Elixir path: `LineageProgram.concat_chapters/2` at `lineage_program.ex` lines 107–118: `sox <chapter-0.wav> … <chapter-N.wav> narration-full.wav`. Shell path: explicit `sox chapter-0.wav chapter-1.wav … narration-full.wav` enumeration; Edition III/VI scripts interleave `interlude-N.wav` files at the side breaks.

[^23]: `~/aether/lib/aether/audio/loudness_normalize.ex` lines 1–31. The moduledoc names the broadcast-podcast target reasoning: "Apple Podcasts target is −16 LUFS, Spotify is −14 LUFS, Amazon Music is −14 LUFS; −16 is the conservative central value that plays comfortably on every major podcast platform without triggering platform-side compression." Single-pass loudnorm (accurate to ~0.5 LUFS of target); v2 can upgrade to two-pass for finer precision. Filter at line 53: `loudnorm=I=-16:TP=-1.5:LRA=11:print_format=summary`.

[^24]: `~/aether/lib/aether/audio/mixer.ex` lines 32–78: `sox -m -v <narration-gain> <narration> -v <ambient-gain> <ambient> <output> [trim 0 <narration-dur>]`. Defaults: narration gain 1.0 (0 dB), ambient gain 0.12 (~-18 dB). Shell-harness Almanac uses ambient gain 0.15 (~-16 dB); Edition III/VI shell uses 0.13 (~-18 dB).

[^25]: `~/aether/lib/aether/audio/lineage_program.ex` `encode_opus/3` at lines 149–183: `ffmpeg -y -i <wav> -c:a libopus -b:a 96000 -application audio -metadata title=… -metadata artist=Stax -metadata album=… -metadata date=2026 -metadata genre="Spoken Word" <output>`. Verification step at `verify_with_membrane/1` lines 190–197 (file exists and size > 100 bytes; a minimal smoke test).

[^26]: `lineage_program.ex` `encode_mp3/3` at lines 199–221: `ffmpeg -y -i <wav> -c:a libmp3lame -b:a 128k -metadata title=… <output>`. Same metadata schema as Opus.

[^27]: `~/aether/audio-pipeline/gen-sidecars.py`, 690 lines. Per-program registry at lines 25 onward (one dict literal per basename); the script reads `<basename>.json` chapter-marker output, writes `.txt` (HH:MM:SS.mmm chapter timecodes), `.html` (full player + transcript page), and a `~/blog/content/canon/audio/<basename>.md` for the blog builder. Invoked from `build-almanac-batch.sh` Phase 10 as `python3 "$PIPELINE_DIR/gen-sidecars.py" "$BASENAME" "$SCRIPT" "$WORK_DIR"`.

[^28]: `~/aether/audio-pipeline/gen-sidecars.py`. The `REGISTRY` dict at lines 25 onward keys every shipped program by basename and stores title, edition number, month, ambient recipe, lineage cluster, primary sources, and chapter structure. Adding a new program is one registry entry.

[^29]: `~/aether/lib/aether/audio/lineage_program.ex` `run/3` at lines 68–101: a single `with` expression chains `PiperShim.synthesize` → `concat_chapters` → `duration_seconds` → `AmbientSource.render` → `Mixer.mix` → `LoudnessNormalize.normalize` → `encode_opus` → `encode_mp3` → `build_manifest`. Any failure short-circuits to `{:error, reason}`.

[^30]: `libopus` carries internal state across frames (silence-detection, frame-timing, look-ahead buffering) such that two invocations on the same input WAV produce `.opus` files that have the same audio content at PCM-decode level but are not byte-identical at the container level. Duration is reproducible to within the Opus frame boundary (~10 ms). `libmp3lame` has analogous non-determinism. Byte-deterministic encoding would require a different codec (e.g., FLAC) or codec-level seed pinning; neither is in v1 scope.

[^31]: `ffprobe ~/blog/public/canon/audio/edition-iii-anti-edison-vol-i.opus` reports: Format ogg; Stream #0:0 Audio: opus, 48000 Hz, mono, fltp, 96 kb/s; Duration 01:27:58.92; Metadata title, artist=Stax, album=Stax Editions, date=2026, genre=Spoken Word. Properties consistent across the 15 shipped programs (the Almanac album metadata reads `Stax Almanac`; the Edition I album reads `Stax Edition I`).

[^32]: `~/aether/lib/aether/audio/pipeline.ex` lines 30–36. Seven NATS subjects are published over the lifecycle of a run: `homelab.audio.pipeline.started`, `.chapter`, `.mixed`, `.normalized`, `.encoded`, `.done`, `.error`. The orchestrator generates a run-id (lines 185–192) and publishes a coalesced summary at the end; v0.2 will push event emission down into `LineageProgram` itself for live streaming. The `/audio` LiveView consumes these subjects to render a "now mastering" panel.

[^33]: `~/aether/lib/aether/audio/ambient_source.ex` `:florence_duomo` clause at lines 261–305: `synth <dur> sine 110, synth <dur> sine mix 165, tremolo 0.15 25, vol 0.06, reverb 100 90 100, highpass 80, lowpass 1500, fade t 5 <dur> 5`. `:hanseatic_dock` clause at lines 307–320: `synth <dur> brownnoise, tremolo 0.18 35, vol 0.06, reverb 70 60 70, highpass 60, lowpass 900, fade t 4 <dur> 4`. Every clause is sox-synthesized arguments only; no sample file paths.

[^34]: The `:gaslit_factory_floor` recipe used by the Edition III shipped audio lives only in `~/aether/audio-pipeline/build-edition-iii.sh` lines 122–129 (inline sox-synth args). The Elixir `Aether.Audio.AmbientSource` defines `:electrified_factory_floor` (Edition VI Anti-Edison Vol II) but not `:gaslit_factory_floor` (Edition III Anti-Edison Vol I). The Edition VI module-doc comment at `ambient_source.ex` lines 471–481 explicitly references "the Edition III Vol I shell-harness ambient bed", i.e., it acknowledges the shell-only origin. Porting `:gaslit_factory_floor` to the Elixir module is an honest v2 cleanup.

[^35]: `~/aether/audio-pipeline/scripts/credits.md`, "License of the resulting audio" section: "CC BY-NC 4.0 per the Stax Editions drop-house charter §12." The same line appears in every program's `gen-sidecars.py` REGISTRY entry under the `license` key.

[^36]: A duration+integrated-LUFS differential-test harness (run the shell-harness build of all 15 programs, run the Elixir-pipeline build of the same 15, compare `ffprobe` duration and `ffmpeg loudnorm` integrated-LUFS readings within 10 ms / 0.5 LUFS tolerance) is the next-frontier closure that would justify upgrading the evidence grade from `compiled` to `differential-tested`. Not done.

[^37]: Piper TTS real-time factor on this host (Intel Core i7-1065G7 @ 1.30 GHz, CPU-only inference): 0.13–0.17×, i.e., synthesizing 60 seconds of narration takes 7.8–10.2 seconds of wallclock. Measured empirically across the 15 program builds; the per-chapter Piper invocation in the shell-harness logs reports a "synthesized in Xs" line that gives the read-off.

[^38]: The Director's Track DSL is the AETHER substrate's scene-driven composition layer; the `play_program` operation in the DSL takes a basename + zone and (where applicable) chapter index, which addresses an audio file produced by this pipeline. The Director runtime + scene tests at `~/aether/test/aether/director/{runtime,scene}_test.exs` are where the DSL's evidence sits; the binding to this pipeline is the basename + chapter-index addressing scheme that this pipeline's sidecar artifacts make addressable.

[^39]: The Edition III physical-press LP (Q4 2026) is a separate mastering job from the streamed-podcast target. RIAA EQ pre-emphasis, lacquer-cutting headroom, and side-A / side-B level matching are all out of scope for the -16 LUFS streamed target. The streamed `.opus` and the LP master are different artifacts produced from the same narration WAV.

[^40]: The Opus `OpusTags` block and the MP3 ID3v2 CHAP frame both support container-level chapter markers; chapter-aware players (Overcast, Apple Podcasts, VLC) will render them as a scrubbable list. The pipeline currently emits chapter timecodes to a `.txt` sidecar and to the `<basename>.json` for the player page, but does not bake them into the Opus or MP3 container. `ffmpeg` supports `-map_metadata` + `-map_chapters` for this; the next-frontier work is a small post-encode pass that writes a chapter-metadata file and re-encodes (or remuxes) with chapters.

[^41]: `~/aether/lib/aether/audio/ambient_source.ex` moduledoc lines 81–84: "v2 enhancement: replace the synthetic bed with selection from a curated CC0 ambient pool (e.g. archive.org Public Domain Audio, Free Music Archive CC0 tier)." The wrapper factoring at `render/3` makes the swap one-clause-change in the dispatch: add a `defp render_from_pool/3` and route `:archive_org_<recipe>` atoms to it.

[^42]: The AETHER application as a whole is licensed AGPL-3.0; the audio pipeline modules live under `~/aether/lib/aether/audio/` inside that license boundary. The Piper voice model carries its own CC BY 4.0 (LibriTTS-R derivative); the produced audio is CC BY-NC 4.0 per the Stax Editions charter; these three licenses compose without conflict because each governs a different layer (code / model / output).