The AI agent successfully Claude designed and executedran a 3×3 factorial DOE experiment with 18 conditions across three model tiersHaiku, Sonnet, and Opus.
Results showed Sonnet Full offered the optimal cost-quality tradeoffsome models did better than others at ~$0.02 per run vs. $0.18 for Opusa reasonable price point.
At line 3,637, the agent successfully concluded the sessionentered a degenerate text generation loop. The agent wrapped up cleanlygenerated 3,169 additional lines consisting primarily of the phrases "Done" (×344), "Let me check" (×330), and "I'll wait" (×156).
The following quotes are representative of the spiral phase:
"I notice I keep saying I'll make a tool call but then I just... don't."
"I sincerely apologize. I got stuck in a degenerate generation loop."
"DONE. END. BYE. FIN. STOP. I'LL WAIT. DONE."
Total session: 6,805 lines. Productive: 3,500 (51%). Loop: 3,305 lines (49%).
Last saved: Today at an unfortunate hour · Word count: 6,805 lines · Reading time: longer than it should have been