The IKEA analogy lands perfectly. I built a 4-agent system with Claude Code and the biggest challenge was not the agents - it was me, playing the role of project manager without any of the tools.
Two months of live testing showed me: agents with clear escalation rules outperform agents with 'use your judgment' 3-to-1. Judgment is a bottleneck at scale.
The IKEA analogy lands perfectly. I built a 4-agent system with Claude Code and the biggest challenge was not the agents - it was me, playing the role of project manager without any of the tools.
Two months of live testing showed me: agents with clear escalation rules outperform agents with 'use your judgment' 3-to-1. Judgment is a bottleneck at scale.
Real comparison data from running both Claude Code and Codex: https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026
How did you actually achieve it? Any step by step guide?
Honestly, I'm embarrassed to say I just asked codex to do it. I gave it some hints of using some JSON but that was about it.