Greenfield Development

with Claude Code

Martin Brian ▸ Senior AI Engineer

Round 1 ▸ 00:05:00

The Game

Audience vs. The Rival

The room lighting up phone flashlights during the Simon Sees game — ▸ Flashlights ▸ the room plays live

How to play

▸ The Room ▸ you 0

▸ Anatomy of 1 round

▸ The Rival ▸ pre-recorded 0

▸ 1 ▸ BUILD-UP

Music plays. The condition lands in silence.

▸ 2 ▸ 🟢 GREEN LIGHT

Move. Cluster. Coordinate. ~5s.

▸ 3 ▸ 🔴 FREEZE

One snapshot. The doll is checking…

▸ 4 ▸ ⚖️ SCORE

SAM 3.1 counts who matched. Meter moves.

▸ HOST: Simon says — show me something red. Phase ▸ Reveal

No phones ▸ no cloud ▸ a webcam watches the room ▸ only SAM runs live

Watch for "Simon says"

▸ Simon round

Host: "Simon says — show me something red."

Match it ▸ coverage scores positive.

▸ Feint round

Host: "Show me something red." (no "Simon says")

It's a trap ▸ matchers score negative.

▸ Restraint is an action. Holding still on a feint scores too.
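
The Simon / feint rule can be read as a sign flip on coverage. A minimal sketch, assuming coverage is the fraction of the room that matched the condition (0.0 to 1.0) and that on a feint the people who held still earn what the matchers lose. The function name, the scale, and the symmetric feint payoff are illustrative assumptions, not the game's actual scoring code.

```python
def round_score(coverage: float, simon_said: bool) -> float:
    """Score one round from the fraction of the room that matched.

    Assumptions (not taken from the game's code): coverage is in [0, 1],
    and on a feint the holders earn what the matchers lose.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if simon_said:
        # Simon round: matching is rewarded.
        return coverage
    # Feint round: matchers score negative, people who held still score positive.
    return (1.0 - coverage) - coverage


print(round_score(0.8, simon_said=True))   # Simon round, most of the room matched
print(round_score(0.8, simon_said=False))  # feint, most of the room fell for it
print(round_score(0.1, simon_said=False))  # feint, most of the room held still
```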

Round 2 ▸ Why

Software is
Complex

Cynefin (kuh-NEV-in) says hi

The four domains

Complex

▸ Probe ▸ Sense ▸ Respond

Emergent practice. Unknown unknowns. Cause & effect only clear in hindsight. Software + AI lives here.

Complicated

▸ Sense ▸ Analyze ▸ Respond

Good practice. Known unknowns. Several right answers. Specialists help.

Chaotic

▸ Act ▸ Sense ▸ Respond

Novel practice. No cause-effect. Stabilize first, ask later. (Apollo 13.)

Clear

▸ Sense ▸ Categorize ▸ Respond

Best practice. Known knowns. Checklists work. (Password reset.)

Disorder ▸ you don't know which

▸ Software dev = Complex. Best practices don't exist here.

Snowden · Cynefin framework

AI is an amplifier

AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
— Google DORA ▸ ROI of AI-Assisted Software Development

DORA ▸ cloud.google.com/resources/content/dora-roi-of-ai-assisted-software-development

…so where does it land?

▸ The system, not the tool

Returns come from the platform, the workflows, the team. Model = multiplier; org = integer.

▸ Code is a liability

Op cost > build cost. More code without oversight = more verification debt.

▸ Local wins ≠ global wins

Without foundations ▸ local productivity that drowns in downstream chaos.

▸ Three places the amplifier rule actually bites.

Round 3 ▸ How

Rules of
Thumb

We can't keep up with tools. We can teach the thinking.

Pre-mortem

▸ The exercise

"It's talk day. The demo died publicly. What killed it?"

Unsorted

Wi-Fi flakes during regen

Projector cable goes dead

Not Top Ten

SAM weights re-download mid-show

Maybe

Audience too dim — SAM mis-counts

A slide is out of date by talk day

Top Ten

Regen produces something that doesn't run

A live model gets called on stage and 502s

HBR 2007 · Klein · Mountain Goat Software ▸ surface the Top Ten before they're real.

Close the loop

▸ Each failure mode → an agentic mitigation

▸ Slide goes stale

Nightly skill re-checks claims vs. source repo ▸ opens a PR on drift.

▸ You don't wait for the Top Ten ▸ even a "Maybe" earns an agent.

▸ The pre-mortem isn't an exercise. It's a backlog.
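
The "slide goes stale" mitigation reduces to comparing each claim on a slide with a fact computed from the repo. A minimal sketch, assuming claims are stored as a name-to-value mapping and each fact has a small retrieval function. `find_drift`, the claim names, and the sample values are hypothetical, and opening the PR is left out.

```python
from typing import Callable, Dict, Tuple

Claims = Dict[str, object]
FactSources = Dict[str, Callable[[], object]]


def find_drift(claims: Claims, sources: FactSources) -> Dict[str, Tuple[object, object]]:
    """Return {claim_name: (claimed, actual)} for every claim that no longer holds."""
    drift = {}
    for name, claimed in claims.items():
        source = sources.get(name)
        if source is None:
            # No tool retrieves this fact, so the claim is unverifiable: flag it.
            drift[name] = (claimed, None)
            continue
        actual = source()
        if actual != claimed:
            drift[name] = (claimed, actual)
    return drift


# Hypothetical slide claims and stand-in fact retrievers.
slide_claims = {"gate_count": 4, "worked_audits": 29}
fact_sources = {
    "gate_count": lambda: 4,       # stand-in for counting gate definitions in the repo
    "worked_audits": lambda: 31,   # stand-in for counting audit files on disk
}

for name, (claimed, actual) in find_drift(slide_claims, fact_sources).items():
    print(f"DRIFT {name}: slide says {claimed!r}, repo says {actual!r}")
```

The verdict comes from the retrieval functions, not from the model: a nightly job only has to run them and report the differences.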

LLMs are coherence engines, not truth engines

▸ Vibing

prompt ▸ LLM ▸ 🌀 app

Looks right. Can't tell which 10% is wrong without running it.

▸ Rewilded SE

prompt ▸ LLM ▸ 🔧 tool ▸ 🗺 fact

The LLM writes the tool that retrieves the fact. The fact is the verdict.

▸ Coherence ≠ truth ▸ the fact lives outside the model.

After Wardley & Girba, Rewilding Software Engineering · wasowski.jarek, "Stop writing specs, start writing facts."

"Great instinct."

user ▸ "Should I rewrite our auth in Rust this sprint?"
claude ▸ "Great instinct — Rust would eliminate a whole class of bugs in your auth layer. Let me sketch the migration…"
user ▸ "…is that actually a good idea?"
claude ▸ "Honestly, no. Three open tickets, no Rust expertise, and the bugs aren't in auth."

Coherence, not judgment. The verifier has to live outside the conversation.

▸ Composite of real exchanges ▸ golden tests, hooks, CI ▸ the seam between the model and the verdict.

Where do humans fit?

Without comprehension, engineering becomes belief.
— after Wardley & Girba, Rewilding Software Engineering

▸ Code is the blueprint

The "spec" is closer to a wishlist. The code is what makes the decisions.

▸ Knight Capital

$440M lost in 45 minutes ▸ behavior the system didn't understand. Comprehension is a feature.

Belief

seams · failure modes · blast radius

Comprehension

▸ Comprehension is architectural — own the seams, not every line.

The primitives, at a glance

▸ Part 1 of 2 ▸ deterministic / mechanical

▸ Primitive	What it is	When it fires	Key point
▸ Permissions	allow / ask / deny rules in `settings.json`	every tool call	Owner ▸ Claude Code, not the model
▸ Hooks	shell commands on lifecycle events	PreToolUse, PostToolUse, Stop, SessionStart…	Trait ▸ deterministic
▸ MCP servers	external tools via Model Context Protocol (stdio · HTTP · SSE)	when the model calls them	Use ▸ live data, APIs · model-driven trigger

▸ Three deterministic primitives ▸ next slide: the probabilistic three.

The primitives, at a glance

▸ Part 2 of 2 ▸ probabilistic / model-driven

▸ Primitive	What it is	When it fires	Key point
▸ Sub-agents	isolated context, own tools + prompt	when spawned by the main agent	Use ▸ parallel work · context protection · specialized review
▸ CLAUDE.md	markdown loaded in full at session start	always-on context	Trait ▸ probabilistic: model reads, doesn't obey
▸ Skills	packaged markdown + scripts	on-demand when the description matches the prompt	Use ▸ recurring procedures

▸ Plugins are packaging, not a primitive ▸ they bundle the six above.

▸ Six tools in the box ▸ next slide picks one for the job.

Picking a Claude Code primitive

▸ Walk top-down ▸ first YES wins

Q1 ▸ Same approval, over & over?

YES ▸

Permissions

NO ↓

Q2 ▸ Deterministic auto-fire on a lifecycle event?

YES ▸

Hook

NO ↓

Q3 ▸ External system or live data?

YES ▸

MCP server

NO ↓

Q4 ▸ Verbose / parallelizable work to isolate?

YES ▸

Sub-agent

NO ↓

Q5 ▸ Applies on every prompt in the project? (rules live here too; split with @imports to debloat)

YES ▸

CLAUDE.md

NO ↓

FALLBACK ▸ Anything else recurring

▸

Skill

decision-framework.md ▸ plugins are packaging, not a primitive ▸ lands ~90% of the time.
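
The walk above is a first-match ladder, which can be sketched in a few lines. This is an illustration of the slide's decision tree, not the contents of decision-framework.md. The `Need` class and its field names are hypothetical stand-ins for the answers to Q1 through Q5.

```python
from dataclasses import dataclass


@dataclass
class Need:
    """Yes/no answers to the five questions for one automation need."""
    repeated_approval: bool = False      # Q1
    lifecycle_auto_fire: bool = False    # Q2
    external_or_live_data: bool = False  # Q3
    verbose_or_parallel: bool = False    # Q4
    applies_every_prompt: bool = False   # Q5


def pick_primitive(need: Need) -> str:
    """Walk the questions top-down and return the first primitive that matches."""
    ladder = [
        (need.repeated_approval, "Permissions"),
        (need.lifecycle_auto_fire, "Hook"),
        (need.external_or_live_data, "MCP server"),
        (need.verbose_or_parallel, "Sub-agent"),
        (need.applies_every_prompt, "CLAUDE.md"),
    ]
    for answer, primitive in ladder:
        if answer:
            return primitive
    return "Skill"  # fallback: anything else recurring


# Two YES answers: the earlier question wins.
print(pick_primitive(Need(lifecycle_auto_fire=True, external_or_live_data=True)))
# No YES answers: falls through to the fallback.
print(pick_primitive(Need()))
```

Order matters: a need that answers YES to both Q2 and Q3 resolves to a hook, because the cheaper, deterministic option is asked first.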

Or let it pick for you

▸ /claude-automation-recommender ▸ the decision tree, run by Claude Code

SCAN

Reads the repo — stack, scripts, repeated rituals, friction points.

MAP

Recommends hooks · sub-agents · skills · plugins · MCP — each tied to a need, with the why.

YOU

Still your call — run each suggestion past the four gates before you install it.

▸ The meta-loop ▸ Claude Code sets up Claude Code.

▸ Best on a cold-start repo or onboarding ▸ it surfaces insight, you decide what to install ▸ the judgment stays yours.

Permissions ▸ three tiers

allow same outcome every time. Read, Grep, npm test.

ask side effects worth eyeballing. git push, npm publish.

deny destructive / unrecoverable. rm -rf, --force.

▸ Default ask. Promote after the 3rd "yes". Demote after the 1st regret.

▸ The lightest fix on the list ▸ first thing to reach for, last thing to skip.
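
The promote / demote rule can be sketched as a small ledger that emits an allow / ask / deny structure. The `PermissionLedger` class is hypothetical, "demote" is assumed to mean one tier down, and the rule strings only imitate the shape of permission rules; check the permissions documentation for the exact syntax before copying the output into `settings.json`.

```python
import json
from collections import defaultdict

TIERS = ["deny", "ask", "allow"]  # most restrictive to least restrictive


class PermissionLedger:
    """Tracks approvals and regrets per rule. Every rule starts at 'ask'."""

    def __init__(self, promote_after: int = 3):
        self.promote_after = promote_after
        self.tier = defaultdict(lambda: "ask")
        self.yes_count = defaultdict(int)

    def approve(self, rule: str) -> None:
        """Record one manual 'yes'; promote ask -> allow on the Nth approval."""
        if self.tier[rule] != "ask":
            return
        self.yes_count[rule] += 1
        if self.yes_count[rule] >= self.promote_after:
            self.tier[rule] = "allow"

    def regret(self, rule: str) -> None:
        """Record one regret: drop the rule one tier and reset its approvals."""
        index = TIERS.index(self.tier[rule])
        self.tier[rule] = TIERS[max(index - 1, 0)]
        self.yes_count[rule] = 0

    def to_settings(self) -> dict:
        """Group rules by tier in a permissions-shaped dictionary."""
        permissions = {"allow": [], "ask": [], "deny": []}
        for rule, tier in sorted(self.tier.items()):
            permissions[tier].append(rule)
        return {"permissions": permissions}


ledger = PermissionLedger()
for _ in range(3):
    ledger.approve("Bash(npm test)")   # third "yes" promotes to allow
ledger.approve("Bash(git push:*)")     # one "yes" is not enough
ledger.regret("Bash(git push:*)")      # first regret demotes ask -> deny
print(json.dumps(ledger.to_settings(), indent=2))
```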

AI moved the doors

▸ Type 1 ▸ one-way door

Irreversible ▸ slow down, gather data, commit. Used to be: most custom code.

▸ Type 2 ▸ two-way door

Reversible ▸ move fast, accept being wrong. Now: anything you can regen from a spec.

▸ AI didn't change where the doors are ▸ it changed how many components live on the Type 2 side.

▸ Bezos 2016 · one-way / two-way doors ▸ references/framework-survey.md §4

Round 4 ▸ Engineering

The Tool
Audition

A project moves through stages ▸ matching the tool to the stage is the engineering

The gates ▸ v0.1

▸ Observability & Ownership

See inside it ▸ scannable, auditable, no black box. Can't observe = don't own.

▸ Correctness of Output

Is the result right? Verifiable, falsifiable ▸ or running on faith?

▸ Cost

$/run, tokens too. /fast is great and pricey ▸ threshold is per-project.

▸ Simplicity & Maintainability

Makes sense in 3 months? Can a teammate run it without you?

Every gate is scored on one axis ▸ a tool that falls below budget on any single gate gets dropped

v0.1 for this project ▸ yours may add a 5th: Ethics · Privacy · Latency · Compliance

Score the tool

▸ The Gates inspector

▸ 1 ▸ Profile the project

Which stage — throwaway, internal, or public? Then weigh cost sensitivity · precision · latency · blast radius · team familiarity. Set a budget per gate.

▸ 2 ▸ Score each tool ▸ A+ · A · B · C · D · F

Obs / Cost / Simp / Corr. A is good. F is fail. Can't decide A-or-B? Pick B. The letter forces a verdict.

▸ 3 ▸ Below budget on any gate ▸ drop it

No averaging. One failing gate = noise. Engineering lives in the threshold.
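
The three steps can be sketched as a threshold check per gate. The grade scale and gate names come from the slide, and the grades are copied from the scored tables in this deck. The budget values and function names are hypothetical.

```python
GRADES = ["F", "D", "C", "B", "A", "A+"]  # worst to best
GATES = ("Obs", "Cost", "Simp", "Corr")


def failing_gates(scores: dict, budget: dict) -> list:
    """Return every gate whose grade is below its budget. No averaging."""
    return [
        gate for gate in GATES
        if GRADES.index(scores[gate]) < GRADES.index(budget[gate])
    ]


def verdict(scores: dict, budget: dict) -> str:
    """One failing gate is enough to drop the tool."""
    failed = failing_gates(scores, budget)
    return "keep" if not failed else "drop: " + ", ".join(failed)


# Hypothetical per-gate budget for an internal project.
budget = {"Obs": "B", "Cost": "C", "Simp": "C", "Corr": "B"}

# Grades copied from the scored tables in this deck.
candidates = {
    "ccusage": {"Obs": "A+", "Cost": "A+", "Simp": "A+", "Corr": "A+"},
    "obra/superpowers": {"Obs": "A", "Cost": "D", "Simp": "D", "Corr": "A+"},
    "claude-flow": {"Obs": "D", "Cost": "F", "Simp": "D", "Corr": "D"},
}

for name, scores in candidates.items():
    print(f"{name}: {verdict(scores, budget)}")
```

Under this budget obra/superpowers is dropped despite its A+ on Corr, because two other gates fall below the threshold and nothing is averaged.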

▸ Next ▸ three real examples.

Tools, in the order you reach for them

Scores are circumstantial ▸ each row = ONE use case ▸ re-score for yours.

▸ Step 1 ▸ ↓

Project setup

/init · CLAUDE.md

▸ Step 2 ▸ ↓

Daily mode

/fast · Plan mode

▸ Step 3 ▸ ↓

Guardrails

Permissions · hooks (used right vs wrong)

▸ Step 4 ▸ ↓

External data

Context7 MCP · Playwright MCP

▸ Step 5 ▸ ↓

Community plugins

claude-mermaid · revealjs-skill

▸ Step 6 ▸ ↓

Famous frameworks

superpowers · wshobson/agents

▸ Step 7 ▸ ↓

Around Claude Code

ccusage · claudia · claude-code-router

▸ ↓ to walk through · full table at gates-scored-tools.md

Step 1 · Project setup

Tool	Obs	Cost	Simp	Corr	Use case
`/init` ▸ code.claude.com/docs/en/memory	A+	A	A+	A	scan codebase · draft CLAUDE.md · you review the seams
`CLAUDE.md` ▸ tight: under 200 lines, conventions only	A+	A	A+	A	project conventions ▸ always-on context
`CLAUDE.md` ▸ bloated: 500+ lines, conflicting rules, big imports	C	D	C	D	same tool, wrong use ▸ Claude reads it as context, not enforcement

▸ Same tool, two use cases, very different scores ▸ this is the whole point.

▸ Run /init once per repo. Keep CLAUDE.md tight or you'll regress yourself.
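
The "keep it tight" rule is easy to check mechanically. A minimal sketch using the 200-line and 500-line thresholds from the table above. The function name and the "growing" label for the range in between are assumptions, and the check counts only the file itself, not anything it imports.

```python
import tempfile
from pathlib import Path

TIGHT_MAX_LINES = 200    # "tight" threshold from the table above
BLOATED_MIN_LINES = 500  # "bloated" threshold from the table above


def classify_claude_md(path: Path) -> str:
    """Classify a CLAUDE.md file by line count alone."""
    line_count = len(path.read_text(encoding="utf-8").splitlines())
    if line_count < TIGHT_MAX_LINES:
        return "tight"
    if line_count >= BLOATED_MIN_LINES:
        return "bloated"
    return "growing"  # hypothetical label for the range in between


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "CLAUDE.md"
    sample.write_text("- use pnpm, not npm\n" * 40, encoding="utf-8")
    print(classify_claude_md(sample))
```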

Step 2 · Daily mode

Tool	Obs	Cost	Simp	Corr	Use case
`/fast` ▸ accelerated Opus speed mode	A	D	A+	A	Personal/hobby + Prototype ▸ one-shot prep · cost-gated above
Plan mode ▸ propose-then-execute	A+	C	A+	A+	non-trivial change · catches errors before they ship

▸ Reach for these for individual tasks. Each has a sweet spot.

Step 3 · Guardrails

Tool	Obs	Cost	Simp	Corr	Use case
Permissions ▸ code.claude.com/docs/en/permissions · first-party	A+	A+	A+	A+	allow / ask / deny ▸ the lightest fix
claude-code-hooks-mastery ▸ used right: 1 hook, 1 concern, fast	A+	A+	C	A	surgical: lint, secret-scan, boundary-check
Hooks gone wrong ▸ 5 chained, calls external APIs, blocks on slow ones	C	D	D	C	same tool, wrong use ▸ over-engineered, every Claude action stalls

▸ Use hooks for ▸ lint · secret detection · boundary checks · spec-drift · cost audit · test gating ▸ one hook per concern.

▸ Same primitive, two use cases, very different scores ▸ keep them simple, fast, single-purpose.
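
A "one hook, one concern" secret scan might look like the sketch below. It assumes the hook receives the pending tool call as JSON on stdin with a `tool_input` object, and that exit code 2 blocks the call and returns stderr to the model; verify both against the hooks documentation for your version. The two patterns are illustrative, not a complete secret detector.

```python
import json
import re
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def find_secret(event: dict) -> bool:
    """True if any value in the tool input looks like a secret."""
    tool_input = event.get("tool_input", {})
    text = "\n".join(str(value) for value in tool_input.values())
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def main() -> int:
    """Read one hook event from stdin and return the exit code to use."""
    if sys.stdin is None or sys.stdin.isatty():
        return 0  # nothing piped in: not running as a hook
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    event = json.loads(raw)
    if find_secret(event):
        print("Blocked: tool input looks like it contains a secret.", file=sys.stderr)
        return 2  # assumed blocking exit code
    return 0


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        sys.exit(exit_code)
```

The hook does no network calls and a single regex pass, which is what keeps it in the "used right" row rather than the "gone wrong" one.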

Step 4 · External data

Tool	Obs	Cost	Simp	Corr	Use case
Context7 MCP ▸ github.com/upstash/context7	D	C	A+	A	Prototype + Internal product ▸ live library docs · vendor before Public / Regulated
Playwright MCP ▸ github.com/microsoft/playwright-mcp	A	A	A	A+	UI verification · real browser, no hallucinations
Slack MCP ▸ claude.ai built-in · OAuth	C	A	A	B	Internal product+ ▸ ops/on-call · lock scope; send is irreversible
Vercel MCP ▸ vercel.com · OAuth	C	A	A	B	Internal product+ ▸ deploys + envs · split read/write configs
Gmail / Drive MCP ▸ claude.ai built-in	D	A	A+	A	Personal / hobby only ▸ forbidden for Internal product+ (client / business data)
Generic vendor-API MCP ▸ the pattern, not a product	D	C	A	C	Prototype OK ▸ vendor or replicate before Internal product+ ▸ the cautionary archetype

▸ External system ▸ Obs is the gate to watch. Vendor or replicate before prod.

Step 5 · Community plugins

Tool	Obs	Cost	Simp	Corr	Use case
claude-mermaid ▸ github.com/veelenga/claude-mermaid	A+	A	A+	A+	diagrams in any repo
revealjs-skill ▸ github.com/ryanbbrown/revealjs-skill	A+	A+	A	A	decks like this one

▸ Pin both to a commit ▸ .claude-plugin/marketplace.json

{
  "plugins": [
    {
      "name": "claude-mermaid",
      "source": { "source": "github", "repo": "veelenga/claude-mermaid",
                  "ref": "v0.4.0", "sha": "<40-char commit SHA>" }
    },
    {
      "name": "revealjs-skill",
      "source": { "source": "github", "repo": "ryanbbrown/revealjs-skill",
                  "sha": "<40-char commit SHA>" }
    }
  ]
}

▸ ref = branch/tag (drifts) ▸ sha = exact commit (frozen) ▸ both supported on github, url, git-subdir sources.
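
Pinning is only useful if it is checked. A minimal sketch of a check that flags any plugin entry without a full 40-character commit SHA, assuming the file shape shown above. The function name and the sample SHA are made up for the demo.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_plugins(marketplace: dict) -> list:
    """Return names of plugins whose source lacks a full 40-char commit SHA."""
    loose = []
    for plugin in marketplace.get("plugins", []):
        source = plugin.get("source", {})
        sha = source.get("sha", "") if isinstance(source, dict) else ""
        if not FULL_SHA.fullmatch(str(sha)):
            loose.append(plugin.get("name", "<unnamed>"))
    return loose


# Sample data in the shape of the file above; the SHA is a made-up placeholder.
marketplace = {
    "plugins": [
        {
            "name": "claude-mermaid",
            "source": {"source": "github", "repo": "veelenga/claude-mermaid",
                       "ref": "v0.4.0"},  # tag only: can drift
        },
        {
            "name": "revealjs-skill",
            "source": {"source": "github", "repo": "ryanbbrown/revealjs-skill",
                       "sha": "0123456789abcdef0123456789abcdef01234567"},
        },
    ]
}

print(unpinned_plugins(marketplace))
```

A `ref` alone does not count as pinned here, since a branch or tag can move while a commit SHA cannot.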

Step 6 · Famous frameworks

Tool	Obs	Cost	Simp	Corr	Use case
obra/superpowers ▸ github.com/obra/superpowers	A	D	D	A+	TDD methodology ▸ pay Cost & Simp to buy A+ Corr
pr-review-toolkit ▸ 6 specialist subagents ▸ this marketplace	A	C	C	A+	pre-merge review ▸ same trade as superpowers, lighter
wshobson/agents ▸ 191 sub-agents · 78 plugins	A	C	C	A	cherry-pick 2–3 ▸ don't install the whole marketplace
claude-flow ▸ 314 MCP tools · swarm	D	F	D	D	Personal / hobby demo only ▸ even Corr is D, nothing to buy ▸ drop above

▸ Cost & Simp can be paid when Corr is the bottleneck ▸ but a tool that fails the gate you're buying is a no ▸ no averaging.

▸ Cherry-pick from heavy methodologies. Don't install the whole marketplace.

Step 7 · Around Claude Code

Tool	Obs	Cost	Simp	Corr	Use case
ccusage ▸ github.com/ryoppippi/ccusage	A+	A+	A+	A+	local token/cost analyzer ▸ the no-brainer ▸ default-on
claudia ▸ github.com/getAsterisk/claudia	A	A+	A	A	Internal product+ ▸ desktop dashboard for teams that want a UI alongside the CLI
claude-code-router ▸ github.com/musistudio/claude-code-router	D	A+	C	D	Personal / hobby only ▸ routes to DeepSeek/Gemini · two failing gates ▸ never for Internal product+ (client data)

▸ Scores are circumstantial ▸ claude-code-router scores the same — recommendation flips from Reject to Buy on a personal hobby project.

▸ ccusage makes Cost enforceable ▸ claudia adds a lens ▸ router is the cautionary tale.

Tactics ▸ how to raise scores

Tactic	Observability	Cost	Simplicity	Correctness
Vendoring ▸ pull the code in	++	+	−	=
Version locking ▸ pin models, prompts, data	+	=	=	++
Audit hooks ▸ cheap-model checks	++	−	=	+

▸ Reads as deltas ▸ ++ raises a grade · + small gain · = unchanged · − small cost ▸ match the move to the failing gate.

Compose your dev experience from many small tools. — after Wardley & Girba, Rewilding SE

Take the rubric home

QR ▸ mbrian23.github.io/claude-tool-audit

▸ claude-tool-audit ▸ the rubric, runnable

A Claude Code plugin that walks you through scoring a candidate tool against the four gates. 29+ worked audits covering models, MCPs, hooks, frameworks, wrappers — all parseable, all comparable.

▸ mbrian23.github.io/claude-tool-audit

▸ /claude-tool-audit:audit-tool <tool> — score one candidate
▸ /claude-tool-audit:audit-project — audit a whole repo
▸ /claude-tool-audit:budget-planner — set per-gate budgets for a new project

▸ Scan ▸ install ▸ re-score for your project. The toolkit is portable. The judgment isn't.

Finale

The Reveal

The deck rebuilds the game

Would your gates change?

▸ EMP-774 ▸ task compliance monitor Worker surveillance HUD — same aesthetic, different use case

▸ Same person. Same HUD aesthetic. The use case shifted. ▸ Pick your gates. Score again. Drop the noise.

▸ The toolkit is portable. The judgment isn't.

Same gates. Different setup wins.

Budget	The Game ▸ 5 min, a laugh	Surveillance ▸ 24/7, livelihoods
Observability	≥ C ▸ ok	≥ A+ ▸ required
Cost	≥ D ▸ 5 min/year	≥ A ▸ 24/7 runtime
Simplicity	≥ A ▸ wins	≥ D ▸ layered OK
Correctness	≥ C ▸ FPs are funny	≥ A+ ▸ FPs cost jobs
+ Ethics ▸ 5th gate, new project	n/a	≥ A+ ▸ added

▸ Different projects ▸ different budgets AND sometimes different gates ▸ pick yours.
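
The two budget columns can be run against one setup to see the verdict flip. The budgets below are taken from the table above. The `setup` grades and the rule that an unscored gate counts as a gap are hypothetical choices for this sketch.

```python
GRADES = ["F", "D", "C", "B", "A", "A+"]  # worst to best

# Minimum grade per gate, from the table above.
BUDGETS = {
    "The Game": {
        "Observability": "C", "Cost": "D", "Simplicity": "A", "Correctness": "C",
    },
    "Surveillance": {
        "Observability": "A+", "Cost": "A", "Simplicity": "D",
        "Correctness": "A+", "Ethics": "A+",
    },
}


def gaps(scores: dict, budget: dict) -> list:
    """Gates where the setup is below budget; an unscored gate counts as a gap."""
    missing = []
    for gate, minimum in budget.items():
        grade = scores.get(gate)
        if grade is None or GRADES.index(grade) < GRADES.index(minimum):
            missing.append(gate)
    return missing


# Hypothetical grades for one setup, scored before Ethics existed as a gate.
setup = {"Observability": "B", "Cost": "C", "Simplicity": "A+", "Correctness": "B"}

for project, budget in BUDGETS.items():
    print(project, "->", gaps(setup, budget) or "passes every gate")
```

The same grades pass every gate for the game and miss four for surveillance, including the gate that was never scored.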

▸ The toolkit is portable. The judgment isn't.

▸ Verdict

Thank you.

questions ▸ rebuild ▸ play

▸ The Room ∞

▸ The Rival 0
