ARC-AGI-2 — 1,120 Tasks. All N/A.
2025 · THE CURRENT WALL · ZERO AI COMPLETIONS · HUMANS PASS
Total Tasks: 1,120 · Each validated by ≥2 humans
AI Completions: 0 · Across all frontier models tested
Human Baseline: Pass · All 1,120 tasks passed by humans
Avg Gap (Human − AI): ~100% · Every single task
Every task passed by at least 2 humans. Zero AI systems have passed any.
ARC-AGI-2 was designed after o3's semi-private success on ARC-AGI-1 to close the compute-escape hatch.
1,120 tasks. Humans: Pass. Frontier AI: N/A. The wall is real.
ARC-AGI-1 · 800 · Original Battlefield
ARC-AGI-2 · 1,120 · The Current Wall · You Are Here
ARC-AGI-3 · 25 · Interactive · 2026
∅ The Wall: 25 of 1,120 Tasks · every model · every task · N/A
…and 1,095 more tasks: all N/A across all models
◈ First 25 Tasks — Full Model Breakdown
Task ID | Human | Anthropic Opus 4.6 (Max) | GPT-5.4 (High) | Grok 4.20 (Beta) | Gemini 3.1 Pro (Preview) | TRIME⚡ | Gap | Replay
…and 1,095 more · all N/A · all ~100% gap
◈ EOSE TRIME Analysis — Why 1,120 = 0
What ARC-AGI-2 Changed
ARC-AGI-2 was specifically designed to resist the strategies that let o3 score well on ARC-AGI-1: more complex compositional rules, more ambiguous training examples, and more diverse task types. The result: even o3-level reasoning, applied at extreme compute cost, cannot pass these tasks consistently. The compute-escape hatch is closed.
The Validation Standard
Every ARC-AGI-2 task had to be solved by at least 2 independent human solvers before inclusion. The "human pass rate" is therefore empirical, not theoretical: real people solved every task. No AI system has. The gap is not a benchmark artifact; it is a direct measurement of the difference in capability on these tasks.
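The inclusion rule described above can be sketched as a simple filter. This is a minimal illustration and not the benchmark's actual tooling: the `Attempt` record, its field names, the task and solver IDs, and the `min_solvers` parameter are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One human attempt at one task (hypothetical record layout)."""
    task_id: str
    solver_id: str
    solved: bool


def validated_tasks(attempts, min_solvers=2):
    """Return the task IDs solved by at least `min_solvers` distinct people."""
    solvers_by_task = {}
    for a in attempts:
        if a.solved:
            # A set keeps repeat solves by the same person from counting twice.
            solvers_by_task.setdefault(a.task_id, set()).add(a.solver_id)
    return {t for t, s in solvers_by_task.items() if len(s) >= min_solvers}


# Made-up example data, for illustration only.
attempts = [
    Attempt("task-a", "h1", True),
    Attempt("task-a", "h2", True),
    Attempt("task-b", "h1", True),
    Attempt("task-b", "h1", True),   # same person again: still one solver
    Attempt("task-c", "h2", False),
]
print(sorted(validated_tasks(attempts)))
```

Counting distinct solvers, not raw solves, is what makes the "independent" part of the rule hold in this sketch: only `task-a` survives the filter.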
TRIME Floor Analysis
TRIME analysis queued for ARC-AGI-2. Initial mapping suggests tasks span Bond Library (structural invariants), DESEOF (causal chains), and FoundFloor (frequency/counting). The 3-brain swarm should handle task decomposition better than single-model approaches — but TRIME hasn't run the full eval yet. Status: Analyzing.
The N/A Signal
A wall of N/A is not nothing — it's a precise measurement. It says: current AI architectures, regardless of scale, compute, or training data, cannot perform the type of rule induction these 1,120 tasks require. ARC-AGI-3 goes further — interactive environments where even the format is novel. N/A → N/A → 0-5%. The trajectory is clear.
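The headline "Human − AI" gap can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes an N/A entry is counted as "not passed" when computing a pass rate, which is one reading of this page and not a documented scoring rule, and the function and variable names are hypothetical.

```python
def pass_rate(results):
    """Fraction of entries equal to "Pass"; "N/A" counts as not passed (assumption)."""
    if not results:
        raise ValueError("no results")
    return sum(r == "Pass" for r in results) / len(results)


TOTAL_TASKS = 1120
human = ["Pass"] * TOTAL_TASKS   # every task passed by humans, per the page
model = ["N/A"] * TOTAL_TASKS    # every model entry is N/A, per the page

gap = pass_rate(human) - pass_rate(model)
print(f"gap = {gap:.0%}")
```

Under a stricter reading, where N/A means "no result recorded", the model pass rate would be undefined and no gap could be computed from these entries alone.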