AI Coding Agents

6 chunks

Tool Calling Loop: How a Coding Harness Drives a Stateless Model

Coding harnesses execute a loop where the model emits a tool call, pauses, the harness runs the tool, appends the result to history, and re-invokes the model — making the model functionally stateless between steps.

90%
0

AI Coding Harness: Tools, System Prompt, Permissions Around the Model

An AI coding harness is the package of tools, environment, system prompt, and permissions layer that wraps a language model into an agent. The harness, not the model alone, determines real-world coding performance.

88%
0

Context Rot in Long AI Coding Sessions: Why Agents Get Worse as Context Fills

Context rot is the documented degradation of language models as their context fills, driven by softmax-normalized attention spreading thinner over more tokens, and it is sharpest in agentic coding workloads with heavy tool-call churn.

83%
0

Matt Mayer Harness Benchmark: Same Opus, 16-Point Swing Between Claude Code and Cursor

The Matt Mayer benchmark held the model, PRD, and rubric constant while swapping the harness, finding Opus scored 77% in Claude Code versus 93% in Cursor — a 16-point swing attributable to the harness alone.

82%
0

Claude 1M Context Performance: Opus vs Sonnet vs Competitors

On the multi-needle MRCR v2 benchmark at 1M tokens, Claude Sonnet 4.5 collapses to 18.5% while Opus 4.6 holds 76-78% and Opus 4.7's long-context retrieval was overhauled further, making Opus best-in-class for usable long context as of April 2026.

78%
0

Build-Your-Own Coding Harness: The 200-Line Core Loop

A minimal AI coding harness is a 60-200 line program that wraps the tool-call loop around three tools — read_file, list_files, edit_file — or even just bash alone, because modern models are trained on tool-call transcripts.

78%
0