Marvik ▸ ¡IA en vivo! ▸ 2026-05-28

Greenfield Development

with Claude Code

Round 1 ▸ 00:05:00

The Game

Audience vs. The Rival

How to play

▸ The Room ▸ you
▸ The Rival ▸ pre-recorded

▸ Anatomy of 1 round

▸ 1 ▸ BUILD-UP

Music plays. The condition lands in silence.

▸ 2 ▸ 🟢 GREEN LIGHT

Move. Cluster. Coordinate. ~5s.

▸ 3 ▸ 🔴 FREEZE

One snapshot. The doll is checking…

▸ 4 ▸ ⚖️ SCORE

SAM 3.1 counts who matched. Meter moves.

▸ HOST: Simon says — show me something red. Phase ▸ Reveal
No phones ▸ no cloud ▸ a webcam watches the room ▸ only SAM runs live

Watch for "Simon says"

▸ Simon round

Host: "Simon says — show me something red."

Match it ▸ coverage scores positive.

▸ Feint round

Host: "Show me something red." (no "Simon says")

It's a trap ▸ matchers score negative.

▸ Restraint is an action. Holding still on a feint scores too.

Round 2 ▸ Why

Software is
Complex

Cynefin (kuh-NEV-in) says hi

The four domains

Complex

▸ Probe ▸ Sense ▸ Respond

Emergent practice. Unknown unknowns. Cause & effect only clear in hindsight. Software + AI lives here.

Complicated

▸ Sense ▸ Analyze ▸ Respond

Good practice. Known unknowns. Several right answers. Specialists help.

Chaotic

▸ Act ▸ Sense ▸ Respond

Novel practice. No cause-effect. Stabilize first, ask later. (Apollo 13.)

Clear

▸ Sense ▸ Categorize ▸ Respond

Best practice. Known knowns. Checklists work. (Password reset.)

Disorder ▸ you don't know which

▸ Software dev = Complex. Best practices don't exist here.

Snowden · Cynefin framework

AI is an amplifier

AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.

— Google DORA ▸ ROI of AI-Assisted Software Development
DORA ▸ cloud.google.com/resources/content/dora-roi-of-ai-assisted-software-development

…so where does it land?

▸ The system, not the tool

Returns come from the platform, the workflows, the team. Model = multiplier; org = the number it multiplies.

▸ Code is a liability

Op cost > build cost. More code without oversight = more verification debt.

▸ Local wins ≠ global wins

Without foundations ▸ local productivity that drowns in downstream chaos.

▸ Three places the amplifier rule actually bites.

Round 3 ▸ How

Rules of
Thumb

We can't keep up with tools. We can teach the thinking.

Pre-mortem

▸ The exercise

"It's talk day. The demo died publicly. What killed it?"

Unsorted

Wi-Fi flakes during regen
Projector cable goes dead

Not Top Ten

SAM weights re-download mid-show

Maybe

Audience lit too dimly ▸ SAM mis-counts
A slide is out of date by talk day

Top Ten

Regen produces something that doesn't run
A live model gets called on stage and 502s
HBR 2007 · Klein · Mountain Goat Software ▸ surface the Top Ten before they're real.

Close the loop

▸ Each failure mode → an agentic mitigation

▸ Slide goes stale

Nightly skill re-checks claims vs. source repo ▸ opens a PR on drift.

▸ You don't wait for the Top Ten ▸ even a "Maybe" earns an agent.

▸ The pre-mortem isn't an exercise. It's a backlog.
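The "nightly skill" mitigation reduces to a comparison between what a slide claims and what the source says. Here is a minimal sketch. It assumes numeric claims can be pulled out of slide text with a regex and that the facts have already been recomputed from the repo. `extract_claims`, `find_drift`, and the regex are hypothetical, and opening the PR is left to whatever job runs the check.

```python
import re

# Hypothetical claim pattern: a number (optionally "29+") followed by a noun,
# e.g. "191 sub-agents". Only the word right after the number is captured.
CLAIM_RE = re.compile(r"(\d+)\+?\s+([A-Za-z][\w-]*)")


def extract_claims(slide_text: str) -> dict[str, int]:
    """Map each claimed noun to the number the slide states for it."""
    return {noun.lower(): int(num) for num, noun in CLAIM_RE.findall(slide_text)}


def find_drift(claims: dict[str, int], facts: dict[str, int]) -> list[str]:
    """Report claims that disagree with facts recomputed from the source repo.

    Claims with no matching fact are skipped: no fact, no verdict.
    """
    drift = []
    for noun, claimed in sorted(claims.items()):
        actual = facts.get(noun)
        if actual is not None and actual != claimed:
            drift.append(f"{noun}: slide says {claimed}, source says {actual}")
    return drift
```

A nightly job would call `find_drift(extract_claims(slide), facts)` and open a PR whenever the list is non-empty. The regex is deliberately dumb; the facts dictionary is where the verdict lives.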

LLMs are coherence engines, not truth engines

▸ Vibing

prompt ▸ LLM ▸ 🌀 app

Looks right. Can't tell which 10% is wrong without running it.

▸ Rewilded SE

prompt ▸ LLM ▸ 🔧 tool ▸ 🗺 fact

The LLM writes the tool that retrieves the fact. The fact is the verdict.

▸ Coherence ≠ truth ▸ the fact lives outside the model.

After Wardley & Girba, Rewilding Software Engineering · wasowski.jarek, "Stop writing specs, start writing facts."
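The "tool that retrieves the fact" can be tiny. This sketch assumes project sub-agents are defined as one markdown file each under `.claude/agents/`. Instead of asking the model how many agents the repo defines, it counts the files. `count_subagents` is a hypothetical name.

```python
from pathlib import Path


def count_subagents(repo_root: Path) -> int:
    """Fact, not recollection: count sub-agent definition files on disk."""
    agents_dir = repo_root / ".claude" / "agents"
    if not agents_dir.is_dir():
        return 0
    # Only markdown files count as agent definitions; anything else is ignored.
    return sum(1 for path in agents_dir.glob("*.md") if path.is_file())
```

The model may still write this function. What matters is that the number comes from running it, not from the model's answer.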

"Great instinct."

user ▸ "Should I rewrite our auth in Rust this sprint?"
claude ▸ "Great instinct — Rust would eliminate a whole class of bugs in your auth layer. Let me sketch the migration…"
user ▸ "…is that actually a good idea?"
claude ▸ "Honestly, no. Three open tickets, no Rust expertise, and the bugs aren't in auth."

Coherence, not judgment. The verifier has to live outside the conversation.

▸ Composite of real exchanges ▸ golden tests, hooks, CI ▸ the seam between the model and the verdict.
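A golden test is one way to put the verifier outside the conversation. The expected output lives in a file a human reviewed, and the check is a plain comparison. This is a minimal sketch; `check_golden` and its `update` flag are hypothetical conventions, not a specific framework's API.

```python
from pathlib import Path


def check_golden(actual: str, golden: Path, update: bool = False) -> bool:
    """Compare output against a reviewed golden file.

    update=True (re)records the file; do that only after a human has
    reviewed the new output. Otherwise the file is the verdict, however
    convincing the model's explanation was.
    """
    if update:
        golden.write_text(actual, encoding="utf-8")
        return True
    if not golden.exists():
        raise FileNotFoundError(f"no golden file yet: {golden} (record one with update=True)")
    return golden.read_text(encoding="utf-8") == actual
```

Refusing to pass when the golden file is missing is the design choice. A first run that silently records whatever the model produced would turn belief back into the verdict.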

Where do humans fit?

Without comprehension, engineering becomes belief.

— after Wardley & Girba, Rewilding Software Engineering

▸ Code is the blueprint

The "spec" is closer to a wishlist. The code is what makes the decisions.

▸ Knight Capital

$440M lost in 45 minutes ▸ dormant code woke up and did things nobody understood. Comprehension is a feature.

Belief ▸ seams · failure modes · blast radius ▸ Comprehension

▸ Comprehension is architectural — own the seams, not every line.

The primitives, at a glance

▸ Part 1 of 2 ▸ deterministic / mechanical

▸ Primitive | What it is | When it fires | Key point
▸ Permissions | allow / ask / deny rules in settings.json | every tool call | Owner: Claude Code, not the model
▸ Hooks | shell commands on lifecycle events | PreToolUse, PostToolUse, Stop, SessionStart… | Trait: deterministic
▸ MCP servers | external tools via Model Context Protocol (stdio · HTTP · SSE) | when the model calls them | Use: live data, APIs · model-driven trigger
▸ Three deterministic primitives ▸ next slide: the probabilistic three.

The primitives, at a glance

▸ Part 2 of 2 ▸ probabilistic / model-driven

▸ Primitive | What it is | When it fires | Key point
▸ Sub-agents | isolated context, own tools + prompt | when spawned by the main agent | Use: parallel work · context protection · specialized review
▸ CLAUDE.md | markdown loaded in full at session start | always-on context | Trait: probabilistic ▸ model reads, doesn't obey
▸ Skills | packaged markdown + scripts | on demand, when the description matches the prompt | Use: recurring procedures

▸ Slash commands & plugins are packaging, not primitives ▸ they bundle the six above.

▸ Six tools in the box ▸ next slide picks one for the job.

Picking a Claude Code primitive

▸ Walk top-down ▸ first YES wins

Q1 ▸ Same approval, over & over? ▸ YES: Permissions ▸ NO ↓
Q2 ▸ Deterministic auto-fire on a lifecycle event? ▸ YES: Hook ▸ NO ↓
Q3 ▸ External system or live data? ▸ YES: MCP server ▸ NO ↓
Q4 ▸ Verbose / parallelizable work to isolate? ▸ YES: Sub-agent ▸ NO ↓
Q5 ▸ Applies on every prompt in the project? (rules live here too; split with @imports to debloat) ▸ YES: CLAUDE.md ▸ NO ↓
FALLBACK ▸ Anything else recurring ▸ Skill

decision-framework.md ▸ slash commands & plugins are packaging, not primitives ▸ lands ~90% of the time.
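The walk above is an ordered list of yes/no questions where the first YES wins, so it fits in a few lines. In this sketch the field names are hypothetical labels for Q1 to Q5; answering them is still your judgment.

```python
from dataclasses import dataclass


@dataclass
class Need:
    """Your answers to Q1 to Q5 for one recurring need (field names are hypothetical)."""
    same_approval_repeated: bool = False   # Q1
    lifecycle_auto_fire: bool = False      # Q2
    external_or_live_data: bool = False    # Q3
    isolate_verbose_work: bool = False     # Q4
    applies_to_every_prompt: bool = False  # Q5


def pick_primitive(need: Need) -> str:
    """Walk the questions top-down; the first YES wins, Skill is the fallback."""
    ladder = [
        (need.same_approval_repeated, "Permissions"),
        (need.lifecycle_auto_fire, "Hook"),
        (need.external_or_live_data, "MCP server"),
        (need.isolate_verbose_work, "Sub-agent"),
        (need.applies_to_every_prompt, "CLAUDE.md"),
    ]
    for answered_yes, primitive in ladder:
        if answered_yes:
            return primitive
    return "Skill"
```

The order is the design choice. A need that could be either a Hook or an MCP server becomes a Hook, because the more deterministic option sits higher on the ladder.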

Or let it pick for you

/claude-automation-recommender ▸ the decision tree, run by Claude Code

SCAN
Reads the repo — stack, scripts, repeated rituals, friction points.
MAP
Recommends hooks · sub-agents · skills · plugins · MCP — each tied to a need, with the why.
YOU
Still your call — run each suggestion past the four gates before you install it.

▸ The meta-loop ▸ Claude Code sets up Claude Code.

▸ Best on a cold-start repo or onboarding ▸ it surfaces insight, you decide what to install ▸ the judgment stays yours.

Permissions ▸ three tiers

allow same outcome every time. Read, Grep, npm test.
ask side effects worth eyeballing. git push, npm publish.
deny destructive / unrecoverable. rm -rf, --force.

▸ Default ask. Promote after the 3rd "yes". Demote after the 1st regret.

▸ The lightest fix on the list ▸ first thing to reach for, last thing to skip.
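The promote/demote rule is a small state machine per permission rule. This sketch rests on stated assumptions. Rule strings such as `Bash(npm test)` follow Claude Code's permission-rule style, and `to_settings()` only mirrors the shape of the `permissions` block in settings.json. "Demote" is read here as dropping one tier (allow to ask, ask to deny). `PermissionLedger` is a hypothetical helper, not part of Claude Code.

```python
from collections import defaultdict

TIERS = ["deny", "ask", "allow"]  # lowest to highest trust


class PermissionLedger:
    """Track approvals per rule and move rules between tiers.

    Every rule starts at "ask", is promoted to "allow" after the 3rd "yes",
    and drops one tier after the 1st regret.
    """

    def __init__(self, promote_after: int = 3) -> None:
        self.promote_after = promote_after
        self.tiers: dict[str, str] = {}
        self.yes_counts = defaultdict(int)  # rule -> approvals since last change

    def tier_of(self, rule: str) -> str:
        # Unseen rules are registered at the default tier, "ask".
        return self.tiers.setdefault(rule, "ask")

    def record_yes(self, rule: str) -> None:
        if self.tier_of(rule) != "ask":
            return  # only "ask" rules are candidates for promotion
        self.yes_counts[rule] += 1
        if self.yes_counts[rule] >= self.promote_after:
            self.tiers[rule] = "allow"

    def record_regret(self, rule: str) -> None:
        index = TIERS.index(self.tier_of(rule))
        self.tiers[rule] = TIERS[max(index - 1, 0)]
        self.yes_counts[rule] = 0  # trust has to be re-earned

    def to_settings(self) -> dict:
        """Group rules by tier, shaped like the permissions block of settings.json."""
        grouped = {tier: [] for tier in ("allow", "ask", "deny")}
        for rule, tier in sorted(self.tiers.items()):
            grouped[tier].append(rule)
        return {"permissions": grouped}
```

The ledger only proposes a settings block. A person still commits it, which keeps every tier change reviewable.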

AI moved the doors

▸ Type 1 ▸ one-way door

Irreversible ▸ slow down, gather data, commit. Used to be: most custom code.

▸ Type 2 ▸ two-way door

Reversible ▸ move fast, accept being wrong. Now: anything you can regen from a spec.

▸ AI didn't change where the doors are ▸ it changed how many components live on the Type 2 side.

▸ Bezos 2016 · one-way / two-way doors ▸ references/framework-survey.md §4

Round 4 ▸ Engineering

The Tool
Audition

A project moves through stages ▸ matching the tool to the stage is the engineering

The gates ▸ v0.1

▸ Observability & Ownership

How it works inside. Scannable. Auditable supply chain. No black box, no unapproved external LLMs. If you can't observe it, you don't own it.

▸ Correctness of Output

Whether the result is right. Deterministic-enough? Verifiable? Falsifiable? Or running on faith?

▸ Cost

$/request, $/run, tokens too. /fast is great and expensive. Threshold is per-project, per-moment.

▸ Simplicity & Maintainability

Will it make sense in 3 months? Can a teammate run it without you in the room?

Score the tool

▸ The Gates inspector

▸ 1 ▸ Profile the project

Which stage — throwaway, internal, or public? Then weigh cost sensitivity · precision · latency · blast radius · team familiarity. Set a budget per gate.

▸ 2 ▸ Score each tool ▸ A+ · A · B · C · D · F

Obs / Cost / Simp / Corr. A is good. F is fail. Can't decide A-or-B? Pick B. The letter forces a verdict.

▸ 3 ▸ Below budget on any gate ▸ drop it

No averaging. One failing gate = noise. Engineering lives in the threshold.

▸ Next ▸ three real examples.
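Steps 2 and 3 above amount to a per-gate threshold check with no averaging. This is a minimal sketch. It assumes letter grades compare in the order F < D < C < B < A < A+ and that a gate with no score counts as failing. The function names and the sample budgets are hypothetical.

```python
# Letter grades from worst to best; a higher index is a better grade.
GRADES = ["F", "D", "C", "B", "A", "A+"]
RANK = {grade: i for i, grade in enumerate(GRADES)}


def failing_gates(scores: dict[str, str], budget: dict[str, str]) -> list[str]:
    """Return every gate where the tool scores below the project's budget.

    No averaging: an A+ on one gate cannot pay for a D on another.
    A gate the tool was never scored on counts as failing.
    """
    return [
        gate
        for gate, minimum in budget.items()
        if gate not in scores or RANK[scores[gate]] < RANK[minimum]
    ]


def verdict(scores: dict[str, str], budget: dict[str, str]) -> str:
    return "drop" if failing_gates(scores, budget) else "keep"
```

Take claude-code-router's Step 7 scores (Obs D, Cost A+, Simp C, Corr D). A hypothetical Internal-product budget of B / C / C / B fails it on Obs and Corr. A hobby budget of D / C / C / D keeps it. The scores are the same; the verdict changes.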

Tools, in the order you reach for them

Scores are circumstantial ▸ each row = ONE use case ▸ re-score for yours.

▸ Step 1 ▸ ↓

Project setup

/init · CLAUDE.md

▸ Step 2 ▸ ↓

Daily mode

/fast · Plan mode

▸ Step 3 ▸ ↓

Guardrails

Permissions · hooks (used right vs wrong)

▸ Step 4 ▸ ↓

External data

Context7 MCP · Playwright MCP

▸ Step 5 ▸ ↓

Community plugins

claude-mermaid · revealjs-skill

▸ Step 6 ▸ ↓

Famous frameworks

superpowers · wshobson/agents

▸ Step 7 ▸ ↓

Around Claude Code

ccusage · claudia · claude-code-router

▸ ↓ to walk through · full table at gates-scored-tools.md

Step 1 · Project setup

Tool | Obs | Cost | Simp | Corr | Use case
/init (code.claude.com/docs/en/memory) | A+ | A | A+ | A | scan codebase · draft CLAUDE.md · you review the seams
CLAUDE.md, tight (under 200 lines, conventions only) | A+ | A | A+ | A | project conventions ▸ always-on context
CLAUDE.md, bloated (500+ lines, conflicting rules, big imports) | C | D | C | D | same tool, wrong use ▸ Claude reads it as context, not enforcement

Same tool, two use cases, very different scores ▸ this is the whole point.

▸ Run /init once per repo. Keep CLAUDE.md tight or you'll regress yourself.
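This sketch keeps the "under 200 lines" budget honest from CI. Two assumptions apply: the 200-line budget comes from the table above, and treating lines that start with `@` as imports is a rough heuristic. The function names are hypothetical.

```python
from pathlib import Path

LINE_BUDGET = 200  # "tight: under 200 lines" from the Step 1 table


def import_lines(text: str) -> list[str]:
    """Rough heuristic: a line starting with "@" is an import worth reviewing."""
    return [line.strip() for line in text.splitlines() if line.lstrip().startswith("@")]


def check_claude_md(path: Path, line_budget: int = LINE_BUDGET) -> list[str]:
    """Return findings for a reviewer; an empty list means nothing to flag."""
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8")
    findings = []
    line_count = len(text.splitlines())
    if line_count >= line_budget:
        findings.append(f"{line_count} lines (budget: under {line_budget})")
    findings.extend(f"import to review: {line}" for line in import_lines(text))
    return findings
```

Failing the build on an over-budget file is cheap. Whether each import deserves a flag is a judgment call, so those findings are better surfaced than enforced.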

Step 2 · Daily mode

Tool | Obs | Cost | Simp | Corr | Use case
/fast (accelerated Opus speed mode) | A | D | A+ | A | Personal/hobby + Prototype ▸ one-shot prep · cost-gated above Prototype
Plan mode (propose-then-execute) | A+ | C | A+ | A+ | non-trivial change · catches errors before they ship
▸ Reach for these for individual tasks. Each has a sweet spot.

Step 3 · Guardrails

Tool | Obs | Cost | Simp | Corr | Use case
Permissions (code.claude.com/docs/en/permissions · first-party) | A+ | A+ | A+ | A+ | allow / ask / deny ▸ the lightest fix
claude-code-hooks-mastery, used right (1 hook, 1 concern, fast) | A+ | A+ | C | A | surgical: lint, secret-scan, boundary-check
Hooks gone wrong (5 chained, calling external APIs, blocking on slow ones) | C | D | D | C | same tool, wrong use ▸ over-engineered, every Claude action stalls

▸ Use hooks for ▸ lint · secret detection · boundary checks · spec-drift · cost audit · test gating ▸ one hook per concern.

▸ Same primitive, two use cases, very different scores ▸ keep them simple, fast, single-purpose.
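Here is one hook with one concern: a fast secret scan before a write lands. It is a sketch that assumes the PreToolUse payload is JSON with a `tool_input` object, where written text sits under `content` (Write) or `new_string` (Edit). It also assumes exit code 2 is the blocking code whose stderr goes back to Claude. Check the hooks reference before relying on these. The patterns cover only a few well-known token shapes, and the function names are hypothetical.

```python
import json
import re

# A few well-known token shapes; a real secret scanner covers far more.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Names of the secret patterns that appear in text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def run_hook(stdin_text: str) -> tuple[int, str]:
    """Block writes that look like they contain a secret.

    Returns (exit_code, stderr_message); 2 blocks the tool call, 0 allows it.
    """
    event = json.loads(stdin_text or "{}")
    tool_input = event.get("tool_input") or {}
    written = f"{tool_input.get('content', '')}\n{tool_input.get('new_string', '')}"
    hits = find_secrets(written)
    if hits:
        return 2, "Blocked: possible secret (" + ", ".join(hits) + ")"
    return 0, ""
```

To wire it as a PreToolUse hook with a matcher such as `Write|Edit`, the script's entry point would read stdin, pass it to `run_hook`, print the message to stderr, and exit with the returned code. It stays fast because it never leaves the process: no network, no model call.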

Step 4 · External data

Tool | Obs | Cost | Simp | Corr | Use case
Context7 MCP (github.com/upstash/context7) | D | C | A+ | A | Prototype + Internal product ▸ live library docs · vendor before Public / Regulated
Playwright MCP (github.com/microsoft/playwright-mcp) | A | A | A | A+ | UI verification · real browser, no hallucinations
Slack MCP (claude.ai built-in · OAuth) | C | A | A | B | Internal product+ ▸ ops/on-call · lock scope; send is irreversible
Vercel MCP (vercel.com · OAuth) | C | A | A | B | Internal product+ ▸ deploys + envs · split read/write configs
Gmail / Drive MCP (claude.ai built-in) | D | A | A+ | A | Personal / hobby only ▸ forbidden for Internal product+ (client / business data)
Generic vendor-API MCP (the pattern, not a product) | D | C | A | C | Prototype OK ▸ vendor or replicate before Internal product+ ▸ the cautionary archetype
▸ External system ▸ Obs is the gate to watch. Vendor or replicate before prod.
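For external data, the first Obs question is which servers a project talks to and whether they run remotely. This sketch assumes a project-level `.mcp.json` shaped as `{"mcpServers": {name: {...}}}`, with remote servers carrying a `url`. `audit_mcp_config` and the approved list are hypothetical, and passing the check only starts the Obs review.

```python
from urllib.parse import urlparse


def audit_mcp_config(config: dict, approved: set[str]) -> list[str]:
    """Flag MCP servers that need an Obs review before going further.

    Two first-pass signals: the server is not on the approved list, or it is
    remote (has a "url"), so its code runs somewhere you cannot inspect.
    """
    findings = []
    for name, server in sorted(config.get("mcpServers", {}).items()):
        if name not in approved:
            findings.append(f"{name}: not on the approved list")
        url = server.get("url")
        if url:
            host = urlparse(url).hostname or url
            findings.append(f"{name}: remote server at {host}; vendor or replicate before prod")
    return findings
```

An approved remote server still gets flagged on purpose. Approval covers Obs for the current stage, and the vendor-or-replicate step is still due before prod.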

Step 5 · Community plugins

Tool | Obs | Cost | Simp | Corr | Use case
claude-mermaid (github.com/veelenga/claude-mermaid) | A+ | A | A+ | A+ | diagrams in any repo
revealjs-skill (github.com/ryanbbrown/revealjs-skill) | A+ | A+ | A | A | decks like this one

▸ Pin both to a commit ▸ .claude-plugin/marketplace.json

{
  "plugins": [
    {
      "name": "claude-mermaid",
      "source": { "source": "github", "repo": "veelenga/claude-mermaid",
                  "ref": "v0.4.0", "sha": "<40-char commit SHA>" }
    },
    {
      "name": "revealjs-skill",
      "source": { "source": "github", "repo": "ryanbbrown/revealjs-skill",
                  "sha": "<40-char commit SHA>" }
    }
  ]
}
ref = branch/tag (drifts) ▸ sha = exact commit (frozen) ▸ both supported on github, url, git-subdir sources.
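A pin only helps if something checks it. This sketch lints a marketplace file shaped like the one above. It assumes remote sources are objects that may carry `sha`, and that string sources (e.g. relative paths inside the marketplace repo) are local and skipped. `unpinned_plugins` is a hypothetical name.

```python
import re

SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_plugins(marketplace: dict) -> list[str]:
    """Names of plugins whose remote source lacks a full 40-char commit SHA.

    A "ref" alone (branch or tag) can drift, so it does not count as pinned.
    String sources (local paths) are skipped.
    """
    unpinned = []
    for plugin in marketplace.get("plugins", []):
        source = plugin.get("source")
        if isinstance(source, dict) and not SHA_RE.fullmatch(str(source.get("sha", ""))):
            unpinned.append(plugin.get("name", "<unnamed>"))
    return unpinned
```

Run against the JSON above, both entries are flagged until the `<40-char commit SHA>` placeholders are replaced with real commits. That is the intended behavior for a CI gate.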

Step 6 · Famous frameworks

Tool | Obs | Cost | Simp | Corr | Use case
obra/superpowers (github.com/obra/superpowers) | A | D | D | A+ | TDD methodology ▸ pay Cost & Simp to buy A+ Corr
pr-review-toolkit (6 specialist subagents ▸ this marketplace) | A | C | C | A+ | pre-merge review ▸ same trade as superpowers, lighter
wshobson/agents (191 sub-agents · 78 plugins) | A | C | C | A | cherry-pick 2–3 ▸ don't install the whole marketplace
claude-flow (314 MCP tools · swarm) | D | F | D | D | Personal / hobby demo only ▸ even Corr is D ▸ nothing to buy ▸ drop above

Cost & Simp can be paid when Corr is the bottleneck ▸ but if a tool fails the very gate you're buying, it's a no ▸ no averaging.

▸ Cherry-pick from heavy methodologies. Don't install the whole marketplace.

Step 7 · Around Claude Code

Tool | Obs | Cost | Simp | Corr | Use case
ccusage (github.com/ryoppippi/ccusage) | A+ | A+ | A+ | A+ | local token/cost analyzer ▸ the no-brainer ▸ default-on
claudia (github.com/getAsterisk/claudia) | A | A+ | A | A | Internal product+ ▸ desktop dashboard for teams that want a UI alongside the CLI
claude-code-router (github.com/musistudio/claude-code-router) | D | A+ | C | D | Personal / hobby only ▸ routes to DeepSeek/Gemini · two failing gates ▸ never for Internal product+ (client data)

▸ Scores are circumstantial ▸ claude-code-router scores the same — recommendation flips from Reject to Buy on a personal hobby project.

▸ ccusage makes Cost enforceable ▸ claudia adds a lens ▸ router is the cautionary tale.

Tactics ▸ how to raise scores

Tactic | deltas across Observability · Cost · Simplicity · Correctness
Vendoring (pull the code in) | ++ + =
Version locking (pin models, prompts, data) | + = = ++
Audit hooks (cheap-model checks) | ++ = +

▸ Reads as deltas ▸ ++ raises a grade · = unchanged · small cost ▸ match the move to the failing gate.

Compose your dev experience from many small tools. — after Wardley & Girba, Rewilding SE

Take the rubric home

QR ▸ mbrian23.github.io/claude-tool-audit

claude-tool-audit ▸ the rubric, runnable

A Claude Code plugin that walks you through scoring a candidate tool against the four gates. 29+ worked audits covering models, MCPs, hooks, frameworks, wrappers — all parseable, all comparable.

mbrian23.github.io/claude-tool-audit

/claude-tool-audit:audit-tool <tool> — score one candidate
/claude-tool-audit:audit-project — audit a whole repo
/claude-tool-audit:budget-planner — set per-gate budgets for a new project

▸ Scan ▸ install ▸ re-score for your project. The toolkit is portable. The judgment isn't.

Finale

The Reveal

The deck rebuilds the game

Would your gates change?

▸ EMP-774 ▸ task compliance monitor ▸ Worker surveillance HUD: same aesthetic, different use case

▸ Same person. Same HUD aesthetic. The use case shifted. ▸ Pick your gates. Score again. Drop the noise.

▸ The toolkit is portable. The judgment isn't.

Same gates. Different setup wins.

Budget | The Game ▸ 5 min, a laugh | Surveillance ▸ 24/7, livelihoods
Observability | ≥ C ▸ ok | ≥ A+ ▸ required
Cost | ≥ D ▸ 5 min/year | ≥ A ▸ 24/7 runtime
Simplicity | ≥ A ▸ wins | ≥ D ▸ layered OK
Correctness | ≥ C ▸ FPs are funny | ≥ A+ ▸ FPs cost jobs
+ Ethics (5th gate, new project) | n/a | ≥ A+ ▸ added

▸ Different projects ▸ different budgets AND sometimes different gates ▸ pick yours.

▸ The toolkit is portable. The judgment isn't.

▸ Verdict

Thank you.

questions ▸ rebuild ▸ play
