Canon · Lineage

Lineage 44: Kenneth Forbus, The Substrate Is the Teaching

2026-05-13

I took Kenneth Forbus's Qualitative Reasoning class at Northwestern in 2024. The infrastructure he'd built ran through Emacs and Lisp, and he had constructed a website that checked our homework submissions: we'd write code in the editor, submit it to a form, and his Lisp-based grader would evaluate it and return the result. Nobody else had anything close to that at Northwestern. The typical computer science course was a professor or TA looking at your code by eye, or a generic automated-grading system that ran your program and compared output. Forbus had engineered the grade-checking into the same language as the problem: Lisp checking Lisp, running inside an environment that was also the editor, also the compiler, also the REPL. The infrastructure was twenty years ahead of the systems my peers were using in the same building.

The thing that stays with me about the class (more than the qualitative-reasoning theory, though that was solid) was the substrate realization. The course didn't teach qualitative reasoning through problem sets. It taught qualitative reasoning by immersing you in an environment where qualitative reasoning was the native operation of the system. You didn't learn how to write a qualitative-physics model and then test it. You wrote in a language that was a qualitative model, in an editor that understood the structure of the model, graded by a system that reasoned over the same primitives you were using. The curriculum was the substrate.

This is Forbus's contribution to the merchant line that Sean is now building: the recognition that teaching happens in the engineering substrate, not in the curriculum document, and that a cognitive-systems environment is its own instructor.

I. The Flow

Kenneth Forbus directed the flow of cognitive representation from psychology into engineering. The flow had three legs.

The first leg was qualitative physics: the observation that ordinary people reason about the physical world using mental models that are qualitative, not quantitative. When you reason about a bathtub filling with water, you don't solve the rate equations of fluid dynamics. You reason qualitatively: water is a substance, the faucet is a source, the drain is a sink, inflow exceeds outflow so the level rises, when the level reaches the faucet height the inflow path closes. This qualitative reasoning runs continuously, it's fast, it's how people actually think about the physical world, and it was largely invisible to the quantitative machinery of AI in the early 1980s. Forbus was among the researchers who founded this field. He built one of its foundational representations, qualitative process theory, and published the work in a form that both the AI and cognitive-science communities could use. The publication vehicle was "Qualitative Process Theory" in Artificial Intelligence, 1984, a paper that helped establish qualitative physics as a distinct sub-field of AI with its own representations, its own inference rules, its own problem classes.
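The bathtub reasoning above can be sketched in a few lines of Python. This is an illustrative toy, not qualitative process theory and not any QRG code: the names `Sign`, `level_trend`, and `next_level` are hypothetical, and the only assumption is that quantities are reduced to ordinal comparisons and landmark values, with no numeric rates.

```python
from enum import Enum


class Sign(Enum):
    """Qualitative direction of change for a quantity."""
    DECREASING = -1
    STEADY = 0
    INCREASING = 1


# Ordinal relations between inflow and outflow; no numeric rates are used.
GREATER, EQUAL, LESS = "greater", "equal", "less"

# Landmark values for the water level, ordered from lowest to highest.
LANDMARKS = ["empty", "partial", "full"]


def level_trend(inflow_vs_outflow: str) -> Sign:
    """Derive the sign of the level's change from an ordinal comparison."""
    if inflow_vs_outflow == GREATER:
        return Sign.INCREASING
    if inflow_vs_outflow == LESS:
        return Sign.DECREASING
    return Sign.STEADY


def next_level(level: str, inflow_vs_outflow: str) -> str:
    """Return the next landmark the level moves to, if it moves at all."""
    step = level_trend(inflow_vs_outflow).value
    index = LANDMARKS.index(level) + step
    # Clamp at the landmarks: in this toy model the tub cannot be
    # emptier than empty or fuller than full.
    index = max(0, min(index, len(LANDMARKS) - 1))
    return LANDMARKS[index]


if __name__ == "__main__":
    print(level_trend(GREATER).name)
    print(next_level("partial", GREATER))
    print(next_level("empty", LESS))
```

The point of the sketch is that the conclusion "the level rises" follows from the ordering of inflow and outflow alone, which is the kind of inference the paragraph describes people making without equations.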

The second leg was analogical reasoning and similarity: the complementary insight that reasoning by analogy is not a heuristic decoration on top of symbolic logic; it's a first-class cognitive operation that humans use to transfer knowledge across domains. If you understand how a pump works, and someone tells you that the heart is a pump, you can immediately reason about the heart by analogy. This operation (matching relational structures between a source domain and a target domain, and transferring inferences) is the content of analogical reasoning. The publication vehicle was the Structure Mapping Engine (SME), co-developed with Dedre Gentner and Brian Falkenhainer, first published in 1986, with the full journal treatment following in 1989. The SME was (and still is, in updated form) a computational implementation of structure-mapping theory: the algorithm that takes two relational-structure representations and extracts the mapping that best preserves structure while transferring analogical knowledge. The SME was the first time the psychological theory of analogy became runnable code, and it held.
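A toy version of the pump-and-heart example makes the operation concrete. The sketch below is not the SME algorithm, which is considerably more sophisticated; it is a brute-force search, under the assumption that each domain is a small set of binary relations and that there are at least as many target entities as source entities. The function names and the facts themselves are hypothetical illustrations.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, first argument, second argument)


def entities(facts: Set[Fact]) -> List[str]:
    """Collect every entity mentioned as an argument, in a stable order."""
    return sorted({arg for _, a, b in facts for arg in (a, b)})


def translate(fact: Fact, mapping: Dict[str, str]) -> Fact:
    """Carry a source fact into the target's vocabulary."""
    relation, a, b = fact
    return (relation, mapping[a], mapping[b])


def best_mapping(source: Set[Fact], target: Set[Fact]) -> Tuple[Dict[str, str], int]:
    """Try every one-to-one entity mapping; keep the one preserving most relations."""
    src, tgt = entities(source), entities(target)
    best: Dict[str, str] = {}
    best_score = -1
    for candidate in permutations(tgt, len(src)):
        mapping = dict(zip(src, candidate))
        score = sum(translate(fact, mapping) in target for fact in source)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


def candidate_inferences(source: Set[Fact], target: Set[Fact],
                         mapping: Dict[str, str]) -> List[Fact]:
    """Source facts that map into the target but are not yet known there."""
    return sorted(translate(f, mapping) for f in source
                  if translate(f, mapping) not in target)


# Hypothetical toy domains for "the heart is a pump".
PUMP = {("moves", "pump", "water"),
        ("contains", "pipes", "water"),
        ("pressurizes", "pump", "water")}
HEART = {("moves", "heart", "blood"),
         ("contains", "vessels", "blood")}

if __name__ == "__main__":
    mapping, score = best_mapping(PUMP, HEART)
    print(mapping, score)
    print(candidate_inferences(PUMP, HEART, mapping))
```

The mapping is chosen by shared relational structure, not by surface similarity between the entities, and the unmatched source fact becomes a proposed inference about the heart. Exhaustive search like this scales factorially, which is one reason a practical engine needs a smarter algorithm.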

The third leg was the engineering substrate: the Companion cognitive architecture, a multi-agent system where qualitative reasoning and analogical processing are first-class operations, integrated into an infrastructure designed to support long-duration collaborative interaction between humans and machines. The Companion isn't a standalone algorithm. It's an integrated system combining the Structure Mapping Engine, qualitative representations, sketch understanding (the CogSketch subsystem), knowledge representation and inference, and interactive learning, all designed to work together on the hypothesis that "analogical processing and qualitative representations are at the core of human cognition." The Companion is still active as of 2026, continuously developed by the Qualitative Reasoning Group.

The three legs converge: qualitative representations (what people actually think), analogical reasoning (how people transfer knowledge), and engineering substrate (the integrated system that runs both). Forbus didn't split these into a publication-per-drawer. He built them in the order that made the substrate runnable: first the representations (qualitative physics), then the core operation (analogy), then the integrated system (Companion). Each publication was a load-bearing piece of the substrate that followed.

The flow was the flow of cognitive science from theory into engineering. The bottleneck was the translation step.

II. The Bottleneck

The bottleneck Forbus cleared was the translation from psychological insight into runnable systems infrastructure.

In the 1980s, cognitive psychology had produced compelling models of how humans reason. The models were usually described in prose, sometimes in pseudocode, rarely in a form that could be executed. The barrier wasn't talent or will. It was the absence of a shared engineering discipline for translating psychological theory into running code. If you built a computational model of human analogy, how would you know it was correct? You'd run it on cases where human analogical reasoning was well-documented, and you'd check whether the computational model matched the human behavior. But what counts as a match? How do you measure the fidelity of the mapping? Which mismatches matter and which are acceptable artifacts of the implementation?

Forbus approached this by committing to the structure-mapping theory developed by Dedre Gentner. Structure-mapping theory made explicit predictions about which mappings humans would find analogical and which they wouldn't. If the psychological theory made specific, falsifiable predictions, then a computational implementation of the theory could be measured against human behavior in experiments. The SME (Structure Mapping Engine) could be tested by giving it pairs of analogs that humans had rated in psychological experiments, and checking whether the SME's mapping matched the human judgments. This converted a qualitative psychological theory into a quantitative engineering problem: does the system produce mappings that correlate with human similarity judgments? Testable. Falsifiable. Publishable.
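One way to make "correlate with human similarity judgments" concrete is a rank correlation between the system's scores and human ratings for the same analog pairs. The sketch below is an assumption about how such a check could look, not the evaluation procedure used in the SME literature. The numbers are placeholders invented for illustration and are not experimental data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder values only: one entry per analog pair.
SYSTEM_SCORES = [0.9, 0.4, 0.7, 0.1]   # hypothetical mapping scores
HUMAN_RATINGS = [6.5, 3.0, 5.0, 2.0]   # hypothetical mean ratings

if __name__ == "__main__":
    print(spearman(SYSTEM_SCORES, HUMAN_RATINGS))
```

A rank correlation is used here because the claim under test is about ordering (which pairs people find more analogous), not about the two scales agreeing numerically.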

The pattern Forbus used is the same one he applies across the substrate: name the representation explicitly, build the system that runs the representation, measure the system against the phenomenon it models. Qualitative physics uses explicitly named fluid substances, sources, sinks, and laws (conservation of stuff, causality from change in quantity). The system that reasons over these representations is the qualitative-physics inference engine. It is measured against human reasoning about everyday physical situations: does the system produce qualitative predictions that match human intuition?

This is a cleaner articulation of the bottleneck than "making a research-group code base." The bottleneck is the discipline of measuring the translated system against the original phenomenon. Without that discipline, you can ship a system and call it an implementation of a theory, and nobody can tell you you're wrong. With it, the system becomes accountable to the phenomenon it models.

The secondary bottleneck Forbus cleared was the engineering discipline of making systems extensible and recomposable. The Structure Mapping Engine isn't a library you import once and forget. It's a component that gets used in CogSketch (sketch understanding via analogy), in Companion (as the central reasoning operation), in constraint-solving systems, in educational software. Each application extends SME's interface or changes the priority of the mappings it produces. The system had to be built so that a different team could extend it without forking and maintaining a separate codebase. This is the "substrate as teaching" problem: a system you build becomes the platform for work you didn't anticipate. Forbus's group solved this by designing the structure-mapping operation as a stable interface with explicit hooks for customization, and by publishing enough of the system's internals that external developers could reason about what extensions would work.

The bottleneck was translation plus extensibility. Both were cleared by running the system against the phenomenon it modeled, and both required an open enough architecture that the next researcher could build on top without rewriting everything.

III. The Principal Risk

The principal risk Forbus took was building the lab public-facing.

This is not the default academic move. The safe move is to work in the lab and accumulate papers; once you have enough papers and reputation, the outside world takes you seriously. The flow you direct is internal to the research group. You publish papers. The papers get cited. Citations are currency. You don't risk the group's credibility by shipping code that might not work, or by documenting the failures visibly, or by teaching undergraduates using the research infrastructure directly.

Forbus did the opposite. The Qualitative Reasoning Group at Northwestern published code: the SME, CogSketch, the reasoning systems, the inference engines. The group published not just papers but also the systems themselves, in a form that external researchers could download, run, extend, and test. This is a structural risk: if the code doesn't work, the critique comes back at the research thesis, not at the implementation. If the system fails on a problem that seems simple, the psychological theory it's supposed to implement looks weak.

The teaching risk was even sharper. When Sean took Forbus's Qualitative Reasoning class, the grading system was custom-built by Forbus, running the lab's own Lisp code, checking submissions in real time. If the grading system had a bug (if it rejected correct code or accepted wrong code), the burden fell on Forbus to notice, diagnose, and fix it. The system's failures were Forbus's failures, in public, in a class with paying students. The safer move would have been a generic grading system where bugs are somebody else's problem. Forbus chose to put the research substrate directly into the teaching pipeline.

The risk paid off because the substrate was good. But the structure of the risk is what matters here. Forbus didn't need to hedge on the quality of the system, because he'd built quality into its architecture. The risks he took were about visibility, not about the underlying work.

A merchant in the Forbus lineage understands that building the substrate public-facing and tying your teaching to the substrate you built is a principal-risk move. It's also the move that generates the signal: the students who take the class remember the substrate for the rest of their careers, and they know that it worked because they used it. Eighteen years later, the memory is still sharp.

IV. The Lineage

The lineage Forbus belongs to is the cluster of operators who fused cognitive science with engineering substrate: the recognition that the medium (the cognitive-systems environment) is the message (the teaching, the research, the theory).

The intellectual forebears are clear. John McCarthy invented Lisp in 1958 as a language for reasoning. Lisp wasn't designed for practical programming; it was designed to make the operations of symbolic reasoning syntactically visible. A Lisp program reads like a statement in logic, which means that reading the code is reading the reasoning. Richard Stallman took this further: GNU Emacs (1985) as the substrate where text, Lisp, and reasoning could be unified. In Emacs, the editor is the Lisp machine, the buffer is the interface, the keystrokes are Lisp events, and extending the editor is writing Lisp that operates on text. This is the pattern Forbus inherited from Stallman: the environment is the reasoning engine, and the user interface is the artifact the reasoning engine is applied to.

Marvin Minsky contributed the frame-based representation of knowledge: the idea that concepts are organized into structures (frames) that carry default values, constraints, and expectations. Frames made knowledge representation compositional: a frame for a BIRD has a slot for WINGS, a slot for FLIGHT, default values for SPEED and ALTITUDE. When you encounter a penguin (a bird that doesn't fly), you're reasoning by exception to the frame structure, which is visible. Minsky's contribution was making the structure of knowledge syntactically explicit, which made reasoning tractable.
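The BIRD and penguin example can be written down directly. The sketch below is a minimal illustration of slots, inherited defaults, and visible exceptions; the `Frame` class and its method names are hypothetical, and it makes no claim to reproduce Minsky's formulation or any particular frame system.

```python
from typing import Any, Dict, Optional, Tuple


class Frame:
    """A named bundle of slots that falls back to a parent frame's defaults."""

    def __init__(self, name: str, parent: Optional["Frame"] = None, **slots: Any) -> None:
        self.name = name
        self.parent = parent
        self.slots: Dict[str, Any] = dict(slots)

    def get(self, slot: str) -> Any:
        """Look up a slot locally first, then walk up the parent chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")

    def exceptions(self) -> Dict[str, Tuple[Any, Any]]:
        """Local values that contradict an inherited default: (default, local)."""
        found: Dict[str, Tuple[Any, Any]] = {}
        if self.parent is None:
            return found
        for slot, value in self.slots.items():
            try:
                inherited = self.parent.get(slot)
            except KeyError:
                continue  # a new slot, not an exception to a default
            if inherited != value:
                found[slot] = (inherited, value)
        return found


# Hypothetical slot values for illustration.
BIRD = Frame("BIRD", wings=2, flight=True)
PENGUIN = Frame("PENGUIN", parent=BIRD, flight=False)

if __name__ == "__main__":
    print(PENGUIN.get("wings"))    # inherited default from BIRD
    print(PENGUIN.get("flight"))   # local override
    print(PENGUIN.exceptions())
```

Because the override lives in the structure and not in procedural code, the exception is something a program can enumerate, which is the sense in which the paragraph calls it visible.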

Allen Newell and Herbert Simon contributed the production-system architecture: the insight that a cognitive agent is a system that repeatedly matches the current state against a set of rules (productions), fires the matching rules, and updates the state. Newell later carried the idea into Soar, developed with John Laird and Paul Rosenbloom. The production system remains one of the dominant architectures for symbolic cognitive agents. The reason is structural: a production system is interpretable. You can read the productions, understand what they do, test whether they match human behavior. Newell and Simon made cognitive architecture a discrete, debuggable, measurable discipline.
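The match, fire, update cycle described above fits in a short loop. This is a minimal sketch under simplifying assumptions: working memory is a set of strings, rules only add facts, and conflict resolution is "first matching rule in list order". The rule names and facts are hypothetical, and real architectures such as Soar are far richer than this.

```python
from typing import FrozenSet, List, Set, Tuple

# A production is (name, conditions, additions): if every condition is in
# working memory, the rule may fire and add its additions.
Production = Tuple[str, FrozenSet[str], FrozenSet[str]]


def run(productions: List[Production], facts: Set[str],
        max_cycles: int = 100) -> Tuple[Set[str], List[str]]:
    """Repeat match-fire-update until no rule changes working memory."""
    memory = set(facts)
    trace: List[str] = []
    for _ in range(max_cycles):
        for name, conditions, additions in productions:
            # Match: conditions hold, and firing would add something new.
            if conditions <= memory and not additions <= memory:
                memory |= additions   # update working memory
                trace.append(name)    # record which rule fired
                break                 # first match wins; start a new cycle
        else:
            break  # no rule fired, so the system has reached quiescence
    return memory, trace


# Hypothetical rules for the bathtub example.
RULES: List[Production] = [
    ("start-filling",
     frozenset({"faucet-open", "drain-closed"}),
     frozenset({"level-rising"})),
    ("reach-full",
     frozenset({"level-rising"}),
     frozenset({"tub-full"})),
]

if __name__ == "__main__":
    memory, trace = run(RULES, {"faucet-open", "drain-closed"})
    print(sorted(memory))
    print(trace)
```

The trace is the interpretability argument in miniature: every change to the state is attributable to a named rule that can be read and tested on its own.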

Forbus synthesized these lineages. He took McCarthy's and Stallman's principle that reasoning and representation should be syntactically unified in the environment, Minsky's principle that knowledge should be explicitly structured, and Newell-Simon's principle that cognitive architecture should be measured against behavior. The synthesis is the Companion: a system where qualitative representations (explicitly structured), analogical reasoning (the core operation), and measurable human-behavior correspondence (the test) are all integrated into one substrate.

The modern merchant line that inherits from Forbus is the one building the appliance-layer cognitive substrate: the recognition that the next wave of AI isn't about the model (which is becoming commodity electricity), but about the integration of models, inference engines, knowledge representation, user interface, and teaching infrastructure into a unified substrate that can be absorbed by silicon vendors and model vendors without modification. The editor (as Sean is building it), the inference daemon, the knowledge base, the agent-orchestration layer: these are all applications of the cognitive-systems substrate principle. The substrate is the teaching. The merchant position is the operator who recognizes that the substrate, not the model, is what outlasts the hype cycle.

V. The Lesson

Four lessons distill from Forbus's career and the Qualitative Reasoning Group's work.

The substrate is the teaching. Forbus could have taught qualitative reasoning through problem sets and lectures. Instead, he built a cognitive-systems environment where reasoning about physical processes is the native operation of the system, and he taught inside that environment. The students learned qualitative reasoning because they lived in qualitative reasoning as a material thing they could manipulate. This is why Sean remembers the Emacs interface, the Lisp grading system, the buffer-as-protocol architecture, eighteen years later: not because the qualitative-reasoning theory was novel (it wasn't), but because the substrate was there, working, every day in the class. The lesson for the merchant: if you want to teach something, build the substrate that embodies it, and teach inside the substrate. The curriculum will follow.

Open the lab. The Qualitative Reasoning Group published code, published benchmarks, published failures. The SME is a runnable system you can download. CogSketch is open. The Companion is developed in the open. This is a principal-risk move that generates credibility because the code either works or it doesn't, and if it works, external researchers can extend it without negotiating with the lab first. The merchant line that inherits this learns that open systems with clear extension points generate more productive external development than proprietary systems with closed boundaries, because the bottleneck moves from "getting permission to extend" to "understanding the extension interface well enough to not break the system." Openness isn't altruism; it's a structural property of systems that scale.

Measure the system against the phenomenon. Forbus's work is testable because the psychological theories he implements make falsifiable predictions. The Structure Mapping Engine can be compared to human similarity judgments. Qualitative-physics reasoning can be compared to human intuition about physical processes. Companion's reasoning can be measured against task performance in collaborative scenarios. A system that isn't measured against the phenomenon it claims to model is a system where you can't tell if it works. The merchant learns that measurement is the antidote to hype, and that clear, named benchmarks are the highest-leverage position you can take on your own work.

Build the substrate extensible. The reason the Qualitative Reasoning Group remains active and productive is that the core systems (SME, qualitative reasoning engines, sketch understanding) were built as stable interfaces that different teams can extend. A team building educational software extends the SME by changing how analogies are prioritized. A team building collaborative systems extends the Companion by adding new knowledge representation layers. The extension happens without forking the core. This requires explicit interface design, documented assumptions, and version discipline: all things that feel like overhead when you're building the first system but become survival skills when the system has been in use for decades. The merchant learns that the difference between a system and a platform is the extension interface, and that designing the extension interface matters more than getting the first implementation right.


VI. Honest Limitations

Five limitations the essay does not pretend to have resolved:

1. The Northwestern Qualitative Reasoning Group publication record is read at citation level, not primary-research level. The Forbus and QRG record spans roughly four decades: the foundational 1984 Artificial Intelligence qualitative-physics paper; the 1989 Falkenhainer-Forbus-Gentner SME paper; the CogSketch, Companion, and sketch-understanding publications from the 1990s through the 2020s; and the 2025 Gentner-Forbus retrospective in Current Directions in Psychological Science. This essay reads that record at the level of citations and abstracts, through the references listed under Primary Sources. It has not independently reviewed the implementation source code (the SME, CogSketch, and Companion distributions), the QRG's internal research-protocol documentation, or the psychological-experimental datasets against which structure-mapping theory is validated. Quantitative claims (the four-decade duration of the research program, the number of structure-mapping comparisons against human-judgment data, the scope of CogSketch deployment in undergraduate engineering classrooms) should be read as order-of-magnitude estimates, not as precisely sourced figures.

2. The Mercantile-lens reading is the essay's analytical frame, not a settled historiographical consensus. The conventional literature on the Forbus-Gentner research program emphasizes different variables. Cognitive science emphasizes structure-mapping theory as the canonical account of analogical reasoning; AI history reads the qualitative-physics program through the contest between GOFAI and connectionism; educational-technology research reads CogSketch and Companion through the intelligent-tutoring-systems literature. The Lineage reading (a substrate-as-teaching architectural commitment that sustains a multi-decade research program through open code and published benchmarks) is interpretive, not academic canon. A reader who gives heavy weight to any of the conventional readings will find that this entry leans on its own framework, not on canonical cognitive-science historiography.

3. The analogical-reasoning research program is contested in the broader cognitive-science literature. Structure-mapping theory (Gentner's foundational 1983 formulation, the 1989 Falkenhainer-Forbus-Gentner SME implementation, and the decades of refinement since) is one of several competing accounts of human analogical reasoning. The principal alternatives include multi-constraint theory (Holyoak and Thagard), the LISA connectionist account (Hummel and Holyoak), and the probabilistic-program-inference account (Tenenbaum and colleagues). Structure-mapping is the dominant computationally instantiated account, but it is not uncontested. The essay treats SME as canonical for analogical reasoning at the level of computational implementation, which gives heavy weight to that dominance; a reader who favors the multi-constraint, connectionist, or probabilistic accounts will find that treatment dependent on the essay's framework.

4. The Structure Mapping Engine's commercial impact is limited compared to that of large language models. The SME and its descendants (CogSketch, Companion) have produced substantial cognitive-science research and, over several decades, substantial educational deployment in undergraduate engineering classrooms. They have not produced commercial impact on the scale of the transformer-based large language models that followed 2017. The substrate-as-teaching lesson is what the Mercantile-lens reading rests on, and it is canonical for the broader QM editor-substrate lineage developed in the Cross-References; the essay does not claim that the structure-mapping architecture has reached LLM-scale commercial deployment. A reader who takes commercial scale as the operative metric will find the reading deliberately bounded at the scale of a research program and its educational deployments.

5. The framework would be partially refuted by a rigorous comparison of LLMs against structure-mapping. Suppose a rigorous, published comparison pitted contemporary LLM-based analogy systems (GPT-4-class or Claude-class transformers prompted for analogical reasoning) against the canonical SME implementation on specific structure-mapping benchmarks: the Karla-the-hawk analogy benchmark, the broader QRG-published analogy-task suite, and the cross-domain analogy-generation benchmarks the Forbus-Gentner program has developed from the 1990s through the 2020s. If the LLMs outperformed SME on the benchmarks where SME is positioned as the canonical implementation, the essay's reading of structure-mapping as the load-bearing architecture for analogical reasoning would be substantially refuted at the level of mechanism. The symmetric case holds: a comparison in which SME systematically outperformed contemporary LLMs on those benchmarks would substantially confirm the reading at the same level. The question should be held open and tested against subsequent QM cognitive-systems-substrate Lineage entries that take up the architectural comparison in depth.


Cross-References

This entry synthesizes the cognitive-systems lineage that Sean names in the draft essay apple-next-pixar-emacs, which argues that the next era of the AI-coding economy runs through an open-source, AGPL-licensed editor substrate integrating Forbus-style qualitative reasoning with multi-agent orchestration. The essay names Forbus's Northwestern class as a preview of that substrate. The lineage follows from the audit discipline documented in lineage-42-0theta-manifesto, which establishes that honest claim language and public audit are the load-bearing engineering practices. Forbus's group operates under this discipline: open code, published benchmarks, testable theory.

The closest historical lineage is the cluster that includes Richard Stallman (Emacs as substrate), John McCarthy (Lisp as reasoning), and Marvin Minsky (structured knowledge representation). The immediate successor in the contemporary merchant line is Sean's work on the appliance-layer editor, which inherits Forbus's principle that the substrate is the teaching and that cognitive-systems environments are built to be inhabited and extended, not just shipped and consumed.


Primary Sources


Type-I and Type-II Audit

Type-I (overclaim) risk. This entry relies on Sean's personal memory of Forbus's class in 2024 for specific details (Emacs interface, Lisp grading system, webpage-based submission). I have verified through public sources that Forbus teaches Qualitative Reasoning courses (COMP_SCI 496 is listed in the Northwestern course catalog), that he directs the Qualitative Reasoning Group, and that the group operates in the open (code published, papers published). I have not independently verified from external sources that Forbus built the specific grading infrastructure Sean describes, though the infrastructure is consistent with Forbus's documented teaching philosophy and the group's public emphasis on building cognitive-systems environments. If Forbus or his group published information contradicting Sean's recollection of the class infrastructure, the entry should be revised.

Type-II (missed risk). The entry may under-weight the computational complexity of the qualitative-reasoning approach. Modern machine learning has substantially out-performed qualitative-physics systems on many reasoning tasks, and the Companion architecture has not become the mainstream approach to cognitive modeling that some of its early advocates hoped. The entry frames this as a deliberate choice to build toward human-level collaborative AI rather than narrow task performance, but this framing may be generous to the Companion's limitations relative to contemporary LLM-based systems. The entry's claim that "the substrate outlasts the hype cycle" is a prediction that qualitative reasoning will become relevant again when AI systems need to collaborate with humans over long periods with transparent reasoning; this is a bet about the future, not a documented fact. If the bet is wrong, the entry's thesis about the merchant line inheriting from Forbus will need revision.


Status: DRAFT. No public push. Do not consult with Forbus about this entry; he has not been consulted and has not endorsed any claims here.

Originally published in the journal as Lineage 44: Kenneth Forbus, The Substrate Is the Teaching.