Outcome Engineering — Spec-driven development for AI agents

Three things quietly go wrong

Goals drift out of sight

Specs capture what to build, but not why it exists. Over time, the product is defined by what was built rather than what it should be. Dead code accumulates because nobody can tie it back to a purpose.

Agents imitate whatever they find

Context selection is heuristic: grepping for keywords, embedding similarity, tool defaults. As the repo grows, agents pick up patterns without knowing which ones are current or correct.

Instructions decay silently

Specs evolve, but tests still pass for old behavior. The binding between what you intended and what the code does erodes with every change — and nothing signals the gap.

Three principles for staying in control

spx/
├── spx-cli.product.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
└── 54-spx-tree-interpretation.outcome/  ○ needs work
    ├── spx-tree-interpretation.md
    ├── 43-status-rollup.outcome/  ◐ stale
    └── 54-spx-tree-status.outcome/  ○ needs work

Always be converging

Iterate on a durable artifact in small, reviewable steps — each reversible and repeatable, each reducing uncertainty about what the system should do.

Agents will get things wrong. Because every agent starts from the spec — not the implementation — mistakes do not compound. Go back one step, try again, and the spec anchors the retry.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
└── 54-spx-tree-interpretation.outcome/
    └── 43-status-rollup.outcome/

Determinism unless creation is the goal

Never generate what can be derived deterministically.

The path from root to node defines what context an agent receives: ancestor specs and lower-index siblings. No keyword search. No embedding lottery.

Curate context rather than letting agents search the codebase. Link traceability explicitly rather than inferring it. Reserve generative capacity for the parts that require it.
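
The context rule above can be sketched as a pure function over node paths. This is a minimal sketch under stated assumptions, not the spx CLI's implementation: the `Tree` type, `prefix`, and `contextFor` are hypothetical names, the rule is applied at every level of the path (the text does not say whether siblings of ancestors count), and an unprefixed `.md` file is treated as the spec of its containing node.

```typescript
// Hypothetical in-memory model: each directory path maps to its entry names.
type Tree = Record<string, string[]>;

// Numeric prefix of an entry such as "21-test-harness.enabler", or null if absent.
function prefix(name: string): number | null {
  const match = /^(\d+)-/.exec(name);
  return match ? Number(match[1]) : null;
}

// Walk from the root to the target node. At each level, keep the ancestor's
// own spec (unprefixed .md file, an assumption) and every lower-index sibling
// of the entry we descend into.
function contextFor(tree: Tree, nodePath: string[]): string[] {
  const context: string[] = [];
  let dir = nodePath[0] ?? "";
  for (const next of nodePath.slice(1)) {
    const limit = prefix(next) ?? Infinity;
    for (const entry of tree[dir] ?? []) {
      const index = prefix(entry);
      const isLowerSibling = index !== null && index < limit;
      const isAncestorSpec = index === null && entry.endsWith(".md");
      if (isLowerSibling || isAncestorSpec) context.push(`${dir}/${entry}`);
    }
    dir = `${dir}/${next}`;
  }
  return context;
}

// Example tree assembled from the node names shown on this page.
const tree: Tree = {
  "spx": [
    "spx-cli.product.md",
    "15-cli-framework.adr.md",
    "21-test-harness.enabler",
    "32-parse-directory-tree.enabler",
    "54-spx-tree-interpretation.outcome",
  ],
  "spx/54-spx-tree-interpretation.outcome": [
    "spx-tree-interpretation.md",
    "21-parent-child-links.enabler",
    "43-status-rollup.outcome",
    "54-spx-tree-status.outcome",
    "tests",
    "spx-lock.yaml",
  ],
};

const ctx = contextFor(tree, [
  "spx",
  "54-spx-tree-interpretation.outcome",
  "43-status-rollup.outcome",
]);
```

The same input always yields the same list, in directory order, with no search step involved. The target node's own spec is not part of the result; in this sketch it would be read separately.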

54-spx-tree-interpretation.outcome/
├── spx-tree-interpretation.md
│     ↑ human writes hypothesis + assertions
├── tests/
│     ↑ agent writes tests + implementation
└── spx-lock.yaml
      ↑ system binds spec to evidence

Ask what matters

Expose the maximum-leverage decisions. Let the agent handle the rest.

The structure separates intended behavior (spec) from whether that behavior holds (tests + lock file). Humans write the hypothesis and assertions; agents write the tests and implementation.

Why outcomes are the right interface

Outcomes bridge the gap between what the product should do and what the code actually does. They are present at the moment of every decision — and they are the stable interface for every future agent.

Outcome Hypothesis

Roots/

Goals and understanding

Below the surface: the business goals, user research, and domain knowledge that inform what to build. These are the inputs to every outcome hypothesis.

Trunk/

The outcome hypothesis

The interface between product thinking and engineering. Each outcome states a belief about what change the software will produce — and what evidence would confirm it.

Canopy/

Validated outcomes

Above ground: the specs, tests, and lock files that make each outcome visible. Green leaves are validated. Amber means stale. Gray means work is needed.

The Spec Tree in practice

A git-native directory structure where each node co-locates a spec, its tests, and a lock file.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/
└── ...

The product node anchors everything

Every Spec Tree starts with a single product file at its root. This file captures what the product is and why it exists — the value function that every other node must trace back to.

As product learning evolves, the product file still records why the product exists. Every outcome stays justified because it traces back to this root.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 15-tree-structure-contract.pdr.md
├── 21-test-harness.enabler/
└── ...

Decision records capture choices upfront

Architecture decision records (ADRs) and product decision records (PDRs) sit alongside the specs they affect. Both are captured before implementation begins.

When an agent needs context, it doesn't guess: it reads the decision records on the path from root to its current node. Context is deterministic.

spx/
├── spx-cli.product.md
├── 15-cli-framework.adr.md
├── 21-test-harness.enabler/  ● valid
├── 32-parse-directory-tree.enabler/  ● valid
├── 43-node-status.enabler/  ● valid
├── 54-spx-tree-interpretation.outcome/
└── ...

Enablers build infrastructure bottom-up

Enabler nodes (marked with .enabler) are infrastructure — shared utilities, parsers, test harnesses. The numeric prefix encodes dependency order: lower numbers are dependencies for higher ones.

Index 21 (test-harness) must be valid before index 32 (parse-directory-tree) can be worked on. The tree encodes this constraint in the node name.
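
The ordering constraint can be checked mechanically. The sketch below is illustrative only: `SiblingNode`, `nodeIndex`, and `blockers` are hypothetical names, and the statuses in the example are invented to show a blocked node.

```typescript
type Status = "valid" | "stale" | "needs work";

interface SiblingNode {
  name: string; // e.g. "32-parse-directory-tree.enabler"
  status: Status;
}

// Read the numeric prefix that encodes dependency order.
function nodeIndex(name: string): number {
  const match = /^(\d+)-/.exec(name);
  if (!match) throw new Error(`no numeric prefix in "${name}"`);
  return Number(match[1]);
}

// Lower-index siblings that are not yet valid.
// An empty result means the target node is unblocked.
function blockers(siblings: SiblingNode[], target: string): string[] {
  const limit = nodeIndex(target);
  return siblings
    .filter((s) => nodeIndex(s.name) < limit && s.status !== "valid")
    .map((s) => s.name);
}

// Invented statuses for illustration: index 32 is stale, so index 54 is blocked.
const siblings: SiblingNode[] = [
  { name: "21-test-harness.enabler", status: "valid" },
  { name: "32-parse-directory-tree.enabler", status: "stale" },
  { name: "43-node-status.enabler", status: "valid" },
  { name: "54-spx-tree-interpretation.outcome", status: "needs work" },
];

const blocked = blockers(siblings, "54-spx-tree-interpretation.outcome");
const unblocked = blockers(siblings, "32-parse-directory-tree.enabler");
```

Here `blocked` names the stale enabler at index 32, while `unblocked` is empty because the only lower-index sibling (index 21) is valid.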

spx/
├── ...
├── 54-spx-tree-interpretation.outcome/  ◐ stale
│   ├── spx-tree-interpretation.md
│   ├── 21-parent-child-links.enabler/  ● valid
│   ├── 43-status-rollup.outcome/  ◐ stale
│   └── 54-spx-tree-status.outcome/  ○ needs work
└── ...

Outcomes express testable hypotheses

Outcome nodes (marked with .outcome) each begin with a belief about what change the software will produce — an outcome hypothesis with testable assertions.

When a spec changes, its lock file hash breaks. Parent nodes inherit the worst child state. Staleness bubbles up to the root — nothing hides. Three states — valid, stale, needs work — tell you exactly where the product stands.
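
The rollup rule can be written as a short recursion. This is a sketch, not the spx implementation: `SpecNode` and `rollup` are hypothetical names, and because the text does not say whether "stale" or "needs work" counts as worse, the severity order is an explicit parameter with an assumed default.

```typescript
type Status = "valid" | "stale" | "needs work";

interface SpecNode {
  name: string;
  own: Status; // status derived from the node's own spec, tests and lock file
  children: SpecNode[];
}

// Assumed severity order, best to worst. Swap the last two entries if
// "stale" should outrank "needs work" in your tree.
const ASSUMED_ORDER: Status[] = ["valid", "stale", "needs work"];

// A node's reported status is the worst of its own status and its children's.
function rollup(node: SpecNode, order: Status[] = ASSUMED_ORDER): Status {
  let worst = node.own;
  for (const child of node.children) {
    const childStatus = rollup(child, order);
    if (order.indexOf(childStatus) > order.indexOf(worst)) worst = childStatus;
  }
  return worst;
}

// A stale grandchild makes every ancestor stale, even if the ancestors are valid themselves.
const root: SpecNode = {
  name: "root",
  own: "valid",
  children: [
    { name: "a", own: "valid", children: [] },
    {
      name: "b",
      own: "valid",
      children: [{ name: "b1", own: "stale", children: [] }],
    },
  ],
};

const rootStatus = rollup(root);
```

Passing the order as data keeps the policy visible and easy to change without touching the traversal.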

# spx-lock.yaml
schema: spx-lock/v1
blob: a3b7c12
tests:
  - path: tests/status.unit.test.ts
    blob: 9d4e5f2
# When either blob changes,
# the node becomes ◐ stale

Lock files bind specs to evidence

Each node can have a spx-lock.yaml that records Git blob hashes for the spec and its tests. When either side changes, the hash breaks and the node is visibly stale — before anyone runs a test.

This is drift detection: the binding between spec and tests never silently decays.
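
A staleness check of this kind needs only file contents and a hash function. The sketch below assumes a SHA-1 Git repository and assumes the recorded values are abbreviated prefixes of the full blob hash, as the short values in the example lock file suggest. `LockEntry`, `gitBlobHash`, and `isStale` are hypothetical names, not the spx API.

```typescript
import { createHash } from "node:crypto";

// Git's blob object ID: SHA-1 over the header "blob <byte length>\0"
// followed by the raw file content.
function gitBlobHash(content: string | Buffer): string {
  const body = typeof content === "string" ? Buffer.from(content, "utf8") : content;
  return createHash("sha1")
    .update(`blob ${body.length}\0`)
    .update(body)
    .digest("hex");
}

// Hypothetical in-memory form of one lock entry:
// a path plus the recorded (possibly abbreviated) blob hash.
interface LockEntry {
  path: string;
  blob: string;
}

// A node is stale when any recorded hash is no longer a prefix of the
// hash of the current file content. No test run is required.
function isStale(entries: LockEntry[], read: (path: string) => string): boolean {
  return entries.some((e) => !gitBlobHash(read(e.path)).startsWith(e.blob));
}

// Illustration with an in-memory "file system" (file names and contents are made up).
const files: Record<string, string> = { "spec.md": "hello\n" };
const lock: LockEntry[] = [{ path: "spec.md", blob: gitBlobHash("hello\n").slice(0, 7) }];

const before = isStale(lock, (p) => files[p] ?? "");
files["spec.md"] = "hello, changed\n";
const after = isStale(lock, (p) => files[p] ?? "");
```

In this example `before` is false and `after` is true: editing the spec alone is enough to flip the node to stale.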

Build with outcomes, not just features

The Spec Tree is open source. Read the methodology, explore the CLI, or try it in your next project.
