▸ 1 ▸ BUILD-UP
Music plays. The condition lands in silence.
▸ 2 ▸ 🟢 GREEN LIGHT
Move. Cluster. Coordinate. ~5s.
▸ 3 ▸ 🔴 FREEZE
One snapshot. The doll is checking…
▸ 4 ▸ ⚖️ SCORE
SAM 3.1 counts who matched. Meter moves.
Marvik ▸ ¡IA en vivo! ▸ 2026-05-28
with Claude Code
Round 1 ▸ 00:05:00
Audience vs. The Rival
▸ Simon round
Host: "Simon says — show me something red."
Match it ▸ coverage scores positive.
▸ Feint round
Host: "Show me something red." (no "Simon says")
It's a trap ▸ matchers score negative.
▸ Restraint is an action. Holding still on a feint scores too.
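The Simon/feint rule can be read as a signed score. A minimal sketch, not the game's actual scoring code: `round_score`, the `coverage` fraction (share of the audience that matched, 0 to 1), and the symmetric -1..1 range are all assumptions made for illustration.

```python
def round_score(coverage: float, simon_said: bool) -> float:
    """Return a signed score in [-1, 1] for one round.

    coverage   -- fraction of the audience that matched the condition (0..1);
                  hypothetical input, e.g. derived from a segmentation count.
    simon_said -- True for a Simon round, False for a feint.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if simon_said:
        # Simon round: matching is rewarded.
        return coverage
    # Feint round: matchers are penalized and everyone who held still
    # earns credit, so restraint scores too.
    return (1.0 - coverage) - coverage


print(round_score(0.75, simon_said=True))   # 0.75
print(round_score(0.25, simon_said=False))  # 0.5
print(round_score(1.0, simon_said=False))   # -1.0
```

On a feint, a room that holds perfectly still gets the full +1, the same as a room that matches perfectly on a Simon round.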
Round 2 ▸ Why
Cynefin (kuh-NEV-in) says hi
Complex
▸ Probe ▸ Sense ▸ Respond
Emergent practice. Unknown unknowns. Cause & effect only clear in hindsight. Software + AI lives here.
Complicated
▸ Sense ▸ Analyze ▸ Respond
Good practice. Known unknowns. Several right answers. Specialists help.
Chaotic
▸ Act ▸ Sense ▸ Respond
Novel practice. No cause-effect. Stabilize first, ask later. (Apollo 13.)
Clear
▸ Sense ▸ Categorize ▸ Respond
Best practice. Known knowns. Checklists work. (Password reset.)
▸ Software dev = Complex. Best practices don't exist here.
AI magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
— Google DORA ▸ ROI of AI-Assisted Software Development
▸ The system, not the tool
Returns come from the platform, the workflows, the team. Model = multiplier; org = integer.
▸ Code is a liability
Op cost > build cost. More code without oversight = more verification debt.
▸ Local wins ≠ global wins
Without foundations ▸ local productivity that drowns in downstream chaos.
Round 3 ▸ How
We can't keep up with tools. We can teach the thinking.
▸ The exercise
"It's talk day. The demo died publicly. What killed it?"
Unsorted
Not Top Ten
Maybe
Top Ten
▸ Each failure mode → an agentic mitigation
▸ Slide goes stale
Nightly skill re-checks claims vs. source repo ▸ opens a PR on drift.
▸ You don't wait for the Top Ten ▸ even a "Maybe" earns an agent.
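The "slide goes stale" mitigation boils down to comparing what the deck claims with what the repo says today. A minimal sketch under assumptions: `find_drift` is a hypothetical helper, the claim keys are made up, and collecting the facts from the source repo (and opening the PR) is left out.

```python
def find_drift(claims, facts):
    """Return (key, claimed, actual) for every claim the repo no longer backs.

    claims -- values quoted on the slides, keyed by a claim id (hypothetical ids)
    facts  -- the same keys, freshly retrieved from the source repo
    """
    drift = []
    for key, claimed in sorted(claims.items()):
        actual = facts.get(key)  # None when the repo no longer exposes the fact
        if actual != claimed:
            drift.append((key, claimed, actual))
    return drift


# Claimed numbers are taken from the deck; the "facts" are invented for the example.
claims = {"sub_agents": 191, "plugins": 78}
facts = {"sub_agents": 191, "plugins": 80}

print(find_drift(claims, facts))  # [('plugins', 78, 80)]
```

A non-empty result is the signal to open the PR. An empty list means the slide still matches the repo.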
▸ Vibing
prompt ▸ LLM ▸ 🌀 app
Looks right. Can't tell which 10% is wrong without running it.
▸ Rewilded SE
prompt ▸ LLM ▸ 🔧 tool ▸ 🗺 fact
The LLM writes the tool that retrieves the fact. The fact is the verdict.
▸ Coherence ≠ truth ▸ the fact lives outside the model.
user ▸ "Should I rewrite our auth in Rust this sprint?"
claude ▸ "Great instinct — Rust would eliminate a whole class of bugs in your auth layer. Let me sketch the migration…"
user ▸ "…is that actually a good idea?"
claude ▸ "Honestly, no. Three open tickets, no Rust expertise, and the bugs aren't in auth."
Coherence, not judgment. The verifier has to live outside the conversation.
Without comprehension, engineering becomes belief.
— after Wardley & Girba, Rewilding Software Engineering
▸ Code is the blueprint
The "spec" is closer to a wishlist. The code is what makes the decisions.
▸ Knight Capital
$440M lost in 45 minutes ▸ behavior the system didn't understand. Comprehension is a feature.
▸ Comprehension is architectural — own the seams, not every line.
▸ Part 1 of 2 ▸ deterministic / mechanical
| ▸ Primitive | What it is | When it fires | Key point |
|---|---|---|---|
| ▸ Permissions | allow / ask / deny rules in settings.json | every tool call | Owner: Claude Code, not the model |
| ▸ Hooks | shell commands on lifecycle events | PreToolUse, PostToolUse, Stop, SessionStart… | Trait: deterministic |
| ▸ MCP servers | external tools via Model Context Protocol (stdio · HTTP · SSE) | when the model calls them | Use: live data, APIs · model-driven trigger |
▸ Part 2 of 2 ▸ probabilistic / model-driven
| ▸ Primitive | What it is | When it fires | Key point |
|---|---|---|---|
| ▸ Sub-agents | isolated context, own tools + prompt | when spawned by the main agent | Use: parallel work · context protection · specialized review |
| ▸ CLAUDE.md | markdown loaded in full at session start | always-on context | Trait: probabilistic ▸ model reads, doesn't obey |
| ▸ Skills | packaged markdown + scripts | on-demand when the description matches the prompt | Use: recurring procedures |
▸ Slash commands & plugins are packaging, not primitives ▸ they bundle the six above.
▸ Walk top-down ▸ first YES wins
▸ /claude-automation-recommender ▸ the decision tree, run by Claude Code
▸ The meta-loop ▸ Claude Code sets up Claude Code.
Allow ▸ Read, Grep, npm test.
Ask ▸ git push, npm publish.
Deny ▸ rm -rf, --force.
▸ Default ask. Promote after the 3rd "yes". Demote after the 1st regret.
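The promote/demote rule is a small state machine. A minimal sketch, with assumptions stated up front: `PermissionLedger` is a hypothetical helper (not part of Claude Code), "demote" is read as one step down (allow to ask, ask to deny), and a regret resets the approval count.

```python
from collections import defaultdict

PROMOTE_AFTER_YES = 3  # "Promote after the 3rd yes"
# Assumption: demoting moves one step down and never below deny.
DEMOTE_ONE_STEP = {"allow": "ask", "ask": "deny", "deny": "deny"}


class PermissionLedger:
    """Tracks how much trust each command has earned."""

    def __init__(self):
        self.level = defaultdict(lambda: "ask")  # default ask
        self.yes_count = defaultdict(int)

    def approve(self, command):
        """Record one human 'yes'; promote ask -> allow on the 3rd."""
        self.yes_count[command] += 1
        if self.level[command] == "ask" and self.yes_count[command] >= PROMOTE_AFTER_YES:
            self.level[command] = "allow"
        return self.level[command]

    def regret(self, command):
        """Record one regret; demote immediately and restart the count."""
        self.level[command] = DEMOTE_ONE_STEP[self.level[command]]
        self.yes_count[command] = 0
        return self.level[command]


ledger = PermissionLedger()
print([ledger.approve("npm test") for _ in range(3)])  # ['ask', 'ask', 'allow']
print(ledger.regret("npm test"))                       # ask
```

The asymmetry is deliberate: trust takes three approvals to earn and one regret to lose.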
▸ Type 1 ▸ one-way door
Irreversible ▸ slow down, gather data, commit. Used to be: most custom code.
▸ Type 2 ▸ two-way door
Reversible ▸ move fast, accept being wrong. Now: anything you can regen from a spec.
▸ AI didn't change where the doors are ▸ it changed how many components live on the Type 2 side.
references/framework-survey.md §4
Round 4 ▸ Engineering
A project moves through stages ▸ matching the tool to the stage is the engineering
▸ Observability & Ownership
How it works inside. Scannable. Auditable supply chain. No black box, no unapproved external LLMs. If you can't observe it, you don't own it.
▸ Correctness of Output
Whether the result is right. Deterministic-enough? Verifiable? Falsifiable? Or running on faith?
▸ Cost
$/request, $/run, tokens too. /fast is great and expensive. Threshold is per-project, per-moment.
▸ Simplicity & Maintainability
Will it make sense in 3 months? Can a teammate run it without you in the room?
▸ These 4 are v0.1 for this project. Your project may add a 5th ▸ Ethics, Privacy, Latency, Compliance. ▸ Scale: A+ · A · B · C · D · F (A is good, F is fail).
drop assets/caricature.png ▸ see PROMPTS.md
▸ 1 ▸ Profile the project
Which stage — throwaway, internal, or public? Then weigh cost sensitivity · precision · latency · blast radius · team familiarity. Set a budget per gate.
▸ 2 ▸ Score each tool ▸ A+ · A · B · C · D · F
Obs / Cost / Simp / Corr. A is good. F is fail. Can't decide A-or-B? Pick B. The letter forces a verdict.
▸ 3 ▸ Below budget on any gate ▸ drop it
No averaging. One failing gate = noise. Engineering lives in the threshold.
Scores are circumstantial ▸ each row = ONE use case ▸ re-score for yours.
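The three steps can be sketched as a threshold check with no averaging. A minimal sketch: the function names and the example budget are assumptions; the grade scale and the two score rows are copied from the deck.

```python
# Grade scale from the deck, ordered worst to best.
GRADES = ["F", "D", "C", "B", "A", "A+"]
RANK = {grade: position for position, grade in enumerate(GRADES)}


def failing_gates(scores, budget):
    """Return the gates where the tool scores below the project's budget."""
    return [gate for gate, minimum in budget.items()
            if RANK[scores[gate]] < RANK[minimum]]


def verdict(scores, budget):
    """No averaging: a single failing gate is enough to drop the tool."""
    return "drop" if failing_gates(scores, budget) else "keep"


# Hypothetical budget for an internal product (an assumption, not from the deck).
budget = {"obs": "B", "cost": "C", "simp": "C", "corr": "B"}

# Scores copied from the deck's tables.
context7 = {"obs": "D", "cost": "C", "simp": "A+", "corr": "A"}
playwright = {"obs": "A", "cost": "A", "simp": "A", "corr": "A+"}

print(verdict(context7, budget), failing_gates(context7, budget))      # drop ['obs']
print(verdict(playwright, budget), failing_gates(playwright, budget))  # keep []
```

Context7's A+ on Simplicity does not rescue its D on Observability under this budget. That is the point of skipping the average.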
▸ Step 1 ▸ ↓
Project setup
/init · CLAUDE.md
▸ Step 2 ▸ ↓
Daily mode
/fast · Plan mode
▸ Step 3 ▸ ↓
Guardrails
Permissions · hooks (used right vs wrong)
▸ Step 4 ▸ ↓
External data
Context7 MCP · Playwright MCP
▸ Step 5 ▸ ↓
Community plugins
claude-mermaid · pr-review-toolkit
▸ Step 6 ▸ ↓
Famous frameworks
superpowers · wshobson/agents
▸ Step 7 ▸ ↓
Around Claude Code
ccusage · claudia · claude-code-router
gates-scored-tools.md
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| /init · code.claude.com/docs/en/memory | A+ | A | A+ | A | scan codebase · draft CLAUDE.md · you review the seams |
| CLAUDE.md · tight: under 200 lines, conventions only | A+ | A | A+ | A | project conventions ▸ always-on context |
| CLAUDE.md · bloated: 500+ lines, conflicting rules, big imports | C | D | C | D | same tool, wrong use ▸ Claude reads it as context, not enforcement |
▸ Same tool, two use cases, very different scores ▸ this is the whole point.
/init once per repo. Keep CLAUDE.md tight or you'll regress yourself.
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| /fast · accelerated Opus speed mode | A | D | A+ | A | Personal/hobby + Prototype ▸ one-shot prep · cost-gated above |
| Plan mode · propose-then-execute | A+ | C | A+ | A+ | non-trivial change · catches errors before they ship |
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| Permissions · code.claude.com/docs/en/permissions · first-party | A+ | A+ | A+ | A+ | allow / ask / deny ▸ the lightest fix |
| claude-code-hooks-mastery · used right: 1 hook, 1 concern, fast | A+ | A+ | C | A | surgical: lint, secret-scan, boundary-check |
| Hooks gone wrong · 5 chained, calls external APIs, blocks on slow ones | C | D | D | C | same tool, wrong use ▸ over-engineered, every Claude action stalls |
▸ Use hooks for ▸ lint · secret detection · boundary checks · spec-drift · cost audit · test gating ▸ one hook per concern.
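A "one hook, one concern" secret scan could look like the sketch below. It rests on assumptions to verify against the hooks documentation: that the hook receives the pending tool call as JSON on stdin with a `tool_input` object, and that exit code 2 blocks the call and surfaces stderr. The two patterns are illustrative, not a complete secret detector.

```python
import json
import re
import sys

# Illustrative patterns only: an AWS-style access key id and a PEM private key header.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def check_payload(payload):
    """Return (exit_code, message) for one hook payload.

    Assumes the payload carries the tool arguments under "tool_input".
    Exit code 2 is assumed to mean "block this tool call".
    """
    tool_input = payload.get("tool_input") or {}
    text = " ".join(str(value) for value in tool_input.values())
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            return 2, f"Blocked: content matches secret pattern {pattern.pattern!r}"
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr):
    """Read one payload from stdin and return the exit code for the hook."""
    code, message = check_payload(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code
```

To run it as a hook script, end the file with `sys.exit(main())` and register it under a PreToolUse hook. It makes no network calls and does a single pass over the input, so it stays on the "used right" row of the table.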
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| Context7 MCP · github.com/upstash/context7 | D | C | A+ | A | Prototype + Internal product ▸ live library docs · vendor before Public / Regulated |
| Playwright MCP · github.com/microsoft/playwright-mcp | A | A | A | A+ | UI verification · real browser, no hallucinations |
| Slack MCP · claude.ai built-in · OAuth | C | A | A | B | Internal product+ ▸ ops/on-call · lock scope; send is irreversible |
| Vercel MCP · vercel.com · OAuth | C | A | A | B | Internal product+ ▸ deploys + envs · split read/write configs |
| Gmail / Drive MCP · claude.ai built-in | D | A | A+ | A | Personal / hobby only ▸ forbidden for Internal product+ (client / business data) |
| Generic vendor-API MCP · the pattern, not a product | D | C | A | C | Prototype OK ▸ vendor or replicate before Internal product+ ▸ the cautionary archetype |
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| claude-mermaid · github.com/veelenga/claude-mermaid | A+ | A | A+ | A+ | diagrams in any repo |
| revealjs-skill · github.com/ryanbbrown/revealjs-skill | A+ | A+ | A | A | decks like this one |
▸ Pin both to a commit ▸ .claude-plugin/marketplace.json
{
  "plugins": [
    {
      "name": "claude-mermaid",
      "source": { "source": "github", "repo": "veelenga/claude-mermaid",
                  "ref": "v0.4.0", "sha": "<40-char commit SHA>" }
    },
    {
      "name": "revealjs-skill",
      "source": { "source": "github", "repo": "ryanbbrown/revealjs-skill",
                  "sha": "<40-char commit SHA>" }
    }
  ]
}
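Pinning is easy to check mechanically. A minimal sketch that relies only on the `plugins[].source.sha` shape shown in the marketplace.json above; `unpinned_plugins` is a hypothetical helper, and the SHA in the example is a made-up value.

```python
import json
import re

# A full Git commit SHA-1: exactly 40 lowercase hex characters.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_plugins(marketplace):
    """Return the names of plugins whose source lacks a full 40-char commit SHA."""
    names = []
    for plugin in marketplace.get("plugins", []):
        source = plugin.get("source")
        sha = source.get("sha", "") if isinstance(source, dict) else ""
        if not SHA_RE.match(str(sha)):
            names.append(plugin.get("name", "<unnamed>"))
    return names


# Made-up example data: one plugin pinned to a fake SHA, one tracking only a tag.
example = json.loads("""
{
  "plugins": [
    {"name": "pinned",   "source": {"source": "github", "repo": "org/a",
                                    "sha": "0123456789abcdef0123456789abcdef01234567"}},
    {"name": "floating", "source": {"source": "github", "repo": "org/b", "ref": "v1.0.0"}}
  ]
}
""")

print(unpinned_plugins(example))  # ['floating']
```

A check like this fits a CI step or a single-purpose hook: fail the build whenever the list is non-empty.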
ref = branch/tag (drifts) ▸ sha = exact commit (frozen) ▸ both supported on github, url, git-subdir sources.
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| obra/superpowers · github.com/obra/superpowers | A | D | D | A+ | TDD methodology ▸ pay Cost & Simp to buy A+ Corr |
| pr-review-toolkit · 6 specialist subagents ▸ this marketplace | A | C | C | A+ | pre-merge review ▸ same trade as superpowers, lighter |
| wshobson/agents · 191 sub-agents · 78 plugins | A | C | C | A | cherry-pick 2–3 ▸ don't install the whole marketplace |
| claude-flow · 314 MCP tools · swarm | D | F | D | D | Personal / hobby demo only ▸ even Corr is D, so nothing to buy ▸ drop above |
▸ Cost & Simp can be paid when Corr is the bottleneck ▸ but fail the gate you're buying = no ▸ no averaging.
| Tool | Obs | Cost | Simp | Corr | Use case |
|---|---|---|---|---|---|
| ccusage · github.com/ryoppippi/ccusage | A+ | A+ | A+ | A+ | local token/cost analyzer ▸ the no-brainer ▸ default-on |
| claudia · github.com/getAsterisk/claudia | A | A+ | A | A | Internal product+ ▸ desktop dashboard for teams that want a UI alongside the CLI |
| claude-code-router · github.com/musistudio/claude-code-router | D | A+ | C | D | Personal / hobby only ▸ routes to DeepSeek/Gemini · two failing gates ▸ never for Internal product+ (client data) |
▸ Scores are circumstantial ▸ claude-code-router scores the same — recommendation flips from Reject to Buy on a personal hobby project.
| | Observability | Cost | Simplicity | Correctness |
|---|---|---|---|---|
| Vendoring · pull the code in | ++ | + | − | = |
| Version locking · pin models, prompts, data | + | = | = | ++ |
| Audit hooks · cheap-model checks | ++ | − | = | + |
▸ Reads as deltas ▸ ++ raises a grade · = unchanged · − small cost ▸ match the move to the failing gate.
▸ claude-tool-audit ▸ the rubric, runnable
A Claude Code plugin that walks you through scoring a candidate tool against the four gates. 29+ worked audits covering models, MCPs, hooks, frameworks, wrappers — all parseable, all comparable.
▸ mbrian23.github.io/claude-tool-audit
▸ /claude-tool-audit:audit-tool <tool> — score one candidate
▸ /claude-tool-audit:audit-project — audit a whole repo
▸ /claude-tool-audit:budget-planner — set per-gate budgets for a new project
Finale
The deck rebuilds the game
drop assets/monitoring.png ▸ see PROMPTS.md
▸ Same person. Same HUD aesthetic. The use case shifted. ▸ Pick your gates. Score again. Drop the noise.
| Budget | The Game ▸ 5 min, a laugh | Surveillance ▸ 24/7, livelihoods |
|---|---|---|
| Observability | ≥ C ▸ ok | ≥ A+ ▸ required |
| Cost | ≥ D ▸ 5 min/year | ≥ A ▸ 24/7 runtime |
| Simplicity | ≥ A ▸ wins | ≥ D ▸ layered OK |
| Correctness | ≥ C ▸ FPs are funny | ≥ A+ ▸ FPs cost jobs |
| + Ethics · 5th gate, new project | n/a | ≥ A+ ▸ added |
▸ Different projects ▸ different budgets AND sometimes different gates ▸ pick yours.
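The two budget columns can be encoded directly, including the gate that exists for only one project. A minimal sketch: the budgets are copied from the table above; `gates_failed`, the rule that an unscored gate counts as a failure, and the candidate scores are assumptions.

```python
GRADES = ["F", "D", "C", "B", "A", "A+"]  # worst to best
RANK = {grade: position for position, grade in enumerate(GRADES)}

# Per-project budgets from the table; Ethics is a gate only for Surveillance.
BUDGETS = {
    "game": {"obs": "C", "cost": "D", "simp": "A", "corr": "C"},
    "surveillance": {"obs": "A+", "cost": "A", "simp": "D", "corr": "A+", "ethics": "A+"},
}


def gates_failed(scores, budget):
    """List every budgeted gate the candidate misses.

    Assumption: a gate with no score at all counts as a failure.
    """
    failed = []
    for gate, minimum in budget.items():
        grade = scores.get(gate)
        if grade is None or RANK[grade] < RANK[minimum]:
            failed.append(gate)
    return failed


# Hypothetical scores for one candidate tool (not taken from the deck).
candidate = {"obs": "B", "cost": "B", "simp": "A", "corr": "B"}

print(gates_failed(candidate, BUDGETS["game"]))          # []
print(gates_failed(candidate, BUDGETS["surveillance"]))  # ['obs', 'cost', 'corr', 'ethics']
```

The candidate's scores are identical in both runs. Only the budget changed, and with it the verdict.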
▸ Verdict
questions ▸ rebuild ▸ play