ARC-AGI-3 — Interactive. Adaptive. AI Stuck at 0–5%.
2026 · 25 PUBLIC DEMO TASKS · AGENT FORMAT · REAL SCORES
25 Public Demo Tasks · Interactive / agent format tasks
100% Human Score · All 25 tasks passed by humans
4.76% Best AI (ft09) · Gemini 3.1 Pro on task ft09 only
~99.8% Avg Gap · Human − Best AI across all 25 tasks
⚡ ARC-AGI-3 Is Not Just Harder Grids
ARC-AGI-1 & 2 Format
Static grid transformations. Given input → produce output. Rules inferred from 2–5 example pairs. One correct answer. No feedback loop during solving.
ARC-AGI-3 Format
Interactive environments. Agent must take actions, observe consequences, adapt. Multi-step. Novel environments per task. The format itself is part of the puzzle — adapt on the fly to scenarios that have never been seen before.
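The contrast between the two formats above can be sketched as two small interfaces. This is a minimal illustration only, not the ARC-AGI-3 API: `StaticTask`, `ToggleEnv`, and `run_episode` are hypothetical names, and the toy environment (a row of cells where each action flips one cell) is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Grid = List[List[int]]


@dataclass
class StaticTask:
    """ARC-AGI-1/2 style: example pairs plus one test input, one answer."""
    examples: List[Tuple[Grid, Grid]]
    test_input: Grid

    def solve(self, rule: Callable[[Grid], Grid]) -> Grid:
        # No feedback during solving: the inferred rule is applied once.
        return rule(self.test_input)


@dataclass
class ToggleEnv:
    """Hypothetical interactive task: flip cells until every cell is 1."""
    cells: List[int] = field(default_factory=lambda: [0, 1, 0])

    def step(self, action: int) -> Tuple[List[int], bool]:
        # Each action changes the state that the next action will see.
        self.cells[action] ^= 1
        return list(self.cells), all(c == 1 for c in self.cells)


def run_episode(env: ToggleEnv, max_steps: int = 10) -> int:
    """Act, observe, adapt: choose each action from the latest observation."""
    obs, done = list(env.cells), all(c == 1 for c in env.cells)
    steps = 0
    while not done and steps < max_steps:
        action = obs.index(0)          # decide based on what was just observed
        obs, done = env.step(action)   # observe the consequence
        steps += 1
    return steps


def flip(grid: Grid) -> Grid:
    """Toy rule for the static task: invert every binary cell."""
    return [[1 - v for v in row] for row in grid]


static = StaticTask(examples=[([[0, 1]], [[1, 0]])], test_input=[[1, 1, 0]])
print(static.solve(flip))        # one-shot answer: [[0, 0, 1]]
print(run_episode(ToggleEnv()))  # number of actions taken: 2
```

The static task produces its answer in a single call, while the interactive episode only finishes through a loop in which every action depends on the previous observation.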
"ARC-AGI-3 is not just harder grids — it's a different kind of challenge: adapt on the fly to novel interactive environments." Even if an AI could solve all of ARC-AGI-2, it would still face a new kind of gap in ARC-AGI-3.
Top AI scores across all 25 tasks (all non-zero scores are listed; the rest are 0.00%):
ft09: 4.76% (Gemini 3.1 Pro)
re86 / sb26: 2.78% (Anthropic Opus 4.6)
sp80: 2.94% (Gemini 3.1 Pro)
ar25: 1.08% (Anthropic Opus 4.6)
m0r0: 0.61% (Anthropic Opus 4.6)
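The gap arithmetic behind these scores can be checked with a short sketch. It rests on assumptions not confirmed by the page: that "re86 / sb26" means two tasks scoring 2.78% each, that each listed score is the only non-zero score for its task, and that three models were evaluated (Anthropic Opus 4.6, GPT-5.4, Gemini 3.1 Pro). Under those assumptions, averaging the best score per task gives a gap of about 99.4%, while averaging over every model-task entry gives about 99.8%, which may be how the headline figure was derived.

```python
# Non-zero scores as listed above (percent); every other entry is assumed 0.00.
nonzero = {
    ("ft09", "Gemini 3.1 Pro"): 4.76,
    ("re86", "Anthropic Opus 4.6"): 2.78,   # assumption: re86 and sb26 are
    ("sb26", "Anthropic Opus 4.6"): 2.78,   # two tasks at 2.78% each
    ("sp80", "Gemini 3.1 Pro"): 2.94,
    ("ar25", "Anthropic Opus 4.6"): 1.08,
    ("m0r0", "Anthropic Opus 4.6"): 0.61,
}

NUM_TASKS = 25
NUM_MODELS = 3      # assumption: Anthropic Opus 4.6, GPT-5.4, Gemini 3.1 Pro
HUMAN = 100.0       # humans pass all 25 tasks

total = sum(nonzero.values())

# Gap when only the best AI score on each task is counted.
gap_best_per_task = HUMAN - total / NUM_TASKS

# Gap when every (model, task) entry is counted, zeros included.
gap_all_entries = HUMAN - total / (NUM_TASKS * NUM_MODELS)

print(f"{gap_best_per_task:.2f}")  # 99.40
print(f"{gap_all_entries:.2f}")    # 99.80
```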
ARC-AGI-1 · 800 tasks · Original Battlefield
ARC-AGI-2 · 1,120 tasks · The Current Wall
ARC-AGI-3 · 25 tasks · New Frontier · You Are Here
◈ All 25 Public Demo Tasks — Real Scores
Task ID Human Anthropic Opus 4.6 GPT-5.4 Gemini 3.1 Pro TRIME⚡ Gap Replay
◈ EOSE TRIME Analysis — ARC-AGI-3
The 4.76% Signal
Gemini 3.1 Pro's 4.76% on ft09 is the highest score any AI has achieved on ARC-AGI-3. It's not evidence of comprehension; it's evidence that one task in the interactive set has a format that happens to partially align with Gemini's training distribution. No other task scored above 2.94%, and most scored 0.00%. This is noise, not signal.
Why Interactive Kills AI
ARC-AGI-3's interactive format requires maintaining state across action steps, updating beliefs from environmental feedback, and adapting strategy mid-task when the initial approach fails. These require working memory and causal world models, and current transformer architectures have neither. TRIME's 3-brain swarm is designed for exactly this: PRIME-1/2/3 convergence through iterative belief update.
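The three requirements named above (carrying state across steps, updating beliefs from feedback, adapting once the evidence is clear) can be illustrated with a toy hypothesis-elimination loop. This is a sketch under stated assumptions, not TRIME's method and not an ARC-AGI-3 environment: `HiddenRuleEnv`, `HYPOTHESES`, and `infer_rule` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical candidate rules for how an action changes an integer state.
HYPOTHESES: Dict[str, Callable[[int, int], int]] = {
    "add": lambda state, a: state + a,
    "multiply": lambda state, a: state * a,
    "subtract": lambda state, a: state - a,
}


class HiddenRuleEnv:
    """Toy environment whose true transition rule is hidden from the agent."""

    def __init__(self, rule: str, start: int = 2) -> None:
        self._rule = HYPOTHESES[rule]
        self.state = start

    def step(self, action: int) -> int:
        self.state = self._rule(self.state, action)
        return self.state


def infer_rule(env: HiddenRuleEnv, actions: List[int]) -> List[str]:
    """Keep only the hypotheses consistent with every observed transition."""
    beliefs = list(HYPOTHESES)
    state = env.state                    # working memory: last known state
    for action in actions:
        observed = env.step(action)      # act, then observe the consequence
        beliefs = [h for h in beliefs
                   if HYPOTHESES[h](state, action) == observed]
        state = observed                 # carry state into the next step
        if len(beliefs) == 1:            # adapt: stop probing once one remains
            break
    return beliefs


print(infer_rule(HiddenRuleEnv("multiply"), actions=[2, 3]))  # ['multiply']
```

With a start state of 2, the first action (2) cannot separate "add" from "multiply" because both give 4; only the second observation resolves the ambiguity, which is why the loop has to keep state between steps.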
TRIME Floor Mapping
ARC-AGI-3 tasks map primarily to DESEOF (causal flow analysis) in the TRIME system. Tasks requiring agent action planning flow to the Bond Library for constraint extraction. Interactive environment adaptation maps to FoundFloor's signal-from-noise patterns. TRIME analysis is queued; the initial floor mapping suggests a higher success probability than single-model approaches, though no TRIME scores are reported yet.
The Real Gap
The ARC-AGI-1 gap could be papered over with compute ($456k). ARC-AGI-2 closed the compute escape hatch. ARC-AGI-3 changes the game entirely: you can't brute-force interactive environments because each action changes the state. The gap here isn't just "more reasoning"; it's a fundamental architectural difference between pattern-matching and genuine adaptive intelligence.
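One way to make the brute-force point concrete is to count action sequences. If each step offers a fixed number of actions and every action changes the state the next one acts on, exhaustive search over a horizon of T steps must consider |A|^T sequences. The action count (6) and the horizons below are illustrative assumptions, not ARC-AGI-3 parameters.

```python
def num_action_sequences(num_actions: int, horizon: int) -> int:
    """Count distinct action sequences of a given length."""
    return num_actions ** horizon


# Hypothetical setting: 6 available actions per step.
for horizon in (5, 10, 20):
    print(horizon, num_action_sequences(6, horizon))
# 5 7776
# 10 60466176
# 20 3656158440062976
```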