429 rate limit — the 6 minutes when the infrastructure died before the model did

The numbers

These figures sum the signals from 7,729 sessions recorded over seven months.

Metric	Value	Source
Total sessions	7,729	derived_metrics.session_count
Total events	132,293	derived_metrics.event_count
Total tool calls	64,206	derived_metrics.tool_count
Total failures	11,147	derived_metrics.failure_count
Failure rate per tool call	17.4%	derived (11,147 / 64,206)

The seven failure categories carry different weight.

error          3,115   ← generic tool error (includes upstream service refusal)
timeout        2,549   ← no response (includes gateway/proxy cut-off)
failure        2,378   ← explicit failure signal
nonzero_exit   1,949   ← process exited abnormally
permission       874   ← refusal signal
exception        227   ← unhandled exception
fatal             55   ← unrecoverable
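
The arithmetic behind the table and the breakdown can be checked in a few lines. This is a minimal sketch, not the pipeline that produced derived_metrics: the counts are copied from the figures above, and the names `CATEGORY_COUNTS` and `share` are hypothetical helpers introduced for illustration.

```python
# Counts copied from the tables above; nothing here is read from the real JSON file.
TOOL_CALLS = 64_206
TOTAL_FAILURES = 11_147

CATEGORY_COUNTS = {
    "error": 3_115,
    "timeout": 2_549,
    "failure": 2_378,
    "nonzero_exit": 1_949,
    "permission": 874,
    "exception": 227,
    "fatal": 55,
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The seven categories should account for every recorded failure.
assert sum(CATEGORY_COUNTS.values()) == TOTAL_FAILURES

failure_rate = share(TOTAL_FAILURES, TOOL_CALLS)
category_shares = {name: share(n, TOTAL_FAILURES) for name, n in CATEGORY_COUNTS.items()}

print(f"failure rate per tool call: {failure_rate}%")
for name, pct in category_shares.items():
    print(f"{name:<13}{pct:>5}%")
```

The assertion on the category sum is the useful part: if a category were added or miscounted, the breakdown would stop matching failure_count and the check would fail loudly.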

Failure — the outside cap closes before the model does

A 429 (HTTP "Too Many Requests") is not a signal that the model gave a wrong answer. It says the request was refused before the model ever saw it. When any one limit fills, whether an hourly quota, a concurrent-connection cap, or a per-key cap, the model never gets a turn.

This pattern is common inside the 11,147 cumulative failures. A large slice of the 2,549 timeout rows has the same shape: not the model hanging, but a gateway, proxy, or queue cutting the response. The 3,115 error rows include upstream refusals too. The cleaner "model got it wrong" cases live closer to failure (2,378) and nonzero_exit (1,949), and even combined they account for only 4,327 of the 11,147 failures, or 38.8%.

Put differently, the first place an agent dies is not "wrong thought" but "denied entry." The two categories where infrastructure-boundary signals land, error and timeout, together hold 5,664 of the 11,147 failures (50.8%), and a sizeable share of those never reached a model at all.
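
The shares quoted in this section can be recomputed from the category counts. The sketch below is illustrative only. The grouping in `GROUPS` is an assumption made for this example: the logs do not say how many error and timeout rows actually originated at the infrastructure boundary, so the second figure is a ceiling for those two categories, not a measured split.

```python
TOTAL_FAILURES = 11_147

# Counts copied from the breakdown earlier in this note.
COUNTS = {
    "error": 3_115,
    "timeout": 2_549,
    "failure": 2_378,
    "nonzero_exit": 1_949,
    "permission": 874,
    "exception": 227,
    "fatal": 55,
}

# Hypothetical grouping for illustration, not a field in derived_metrics.
GROUPS = {
    "closer_to_model_error": ("failure", "nonzero_exit"),
    "boundary_candidates": ("error", "timeout"),
}


def group_share(names: tuple[str, ...]) -> float:
    """Percentage of all failures that fall in the named categories."""
    return round(100 * sum(COUNTS[n] for n in names) / TOTAL_FAILURES, 1)


group_shares = {label: group_share(names) for label, names in GROUPS.items()}

for label, pct in group_shares.items():
    print(f"{label}: {pct}%")
```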

[operator command]   "analyze workspace structure"
[t=0:00]             session opens, directory scan begins
[t=2:14]             first shell call → ok
[t=5:47]             18 tool calls accumulated
[t=6:03]             429 response → session aborted
[t=6:08]             same command handed off to another agent → passes

Six minutes is short. In those six minutes the model didn't hallucinate once. The next request simply had no seat left. Had an automation pipeline been running without an operator, the next several tasks downstream would have spun into an endless retry loop behind that 429.
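
One way to keep an unattended pipeline out of that loop is to cap retries and surface the rate limit as its own outcome, so the caller can hand the task off instead of retrying forever. The following is a minimal sketch under stated assumptions, not the setup used in these sessions: `Response`, `RateLimitExhausted`, `call_with_bounded_retry`, and the `send` callable are all hypothetical names, and the delay values are arbitrary defaults.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical response shape: a status code and an optional Retry-After hint in seconds."""
    status: int
    retry_after: Optional[float] = None


class RateLimitExhausted(Exception):
    """Raised once retries are used up, so the caller can hand off instead of looping."""


def call_with_bounded_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() up to max_attempts times, waiting between 429 responses."""
    for attempt in range(max_attempts):
        response = send()
        if response.status != 429:
            return response
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's hint when present; otherwise use exponential backoff with jitter.
        if response.retry_after is not None:
            delay = response.retry_after
        else:
            delay = random.uniform(0, base_delay * 2 ** attempt)
        sleep(min(delay, max_delay))
    raise RateLimitExhausted(f"still rate limited after {max_attempts} attempts")


# Usage with a stubbed sender: two refusals, then success. Sleeping is stubbed out.
replies = iter([Response(429, retry_after=0.0), Response(429), Response(200)])
result = call_with_bounded_retry(lambda: next(replies), sleep=lambda seconds: None)
print(result.status)
```

Raising a distinct exception rather than returning the last 429 is a deliberate choice in this sketch: it forces the caller to decide what happens next, which is the same handoff the operator performed by hand at t=6:08.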

Next note

Next: the router itself. The next most expensive decision after "which model" turned out to be which failure flows to which next seat. The 11,147 distribution wasn't an alarm graph — it was a draft of a routing table.

Editor's note: The session (7,729), event (132,293), tool call (64,206), and failure (11,147) counts and the seven-category breakdown are measured from AX_AI_USAGE_SEMANTIC_ANALYSIS.json derived_metrics. The single-session sequence (6 minutes, 429, 18 tool calls, handoff then pass) is a reconstructed narrative from one observed Codex CLI session interrupted by a 429 rate limit during a workspace structure analysis, with exact timestamps generalized. Model-quality judgments are scoped to what these logs show. People, companies, repos, domains, and keys: zero, per codename policy. The mid-May secret incident is deliberately out of scope (hold).