SOVEREIGN view — EOSE 3-Cap Verifier at 64% leads the sovereign leaderboard.
EULER V7 (projected 52%) uses γ₁-distance scoring across 8 SOVEREIGN categories.
The gap between the sovereign best (64%) and the frontier best (87.5%) reflects the semi-private set anomaly:
o3's 87.5% came from a separate 100-task semi-private eval, not the 800-task public set, so the two figures are not directly comparable.
Value    Item             Notes
-----    ----             -----
~1000    Tasks (est.)     2025 · harder by design
4%       o3 Best          Even frontier struggles
~4%      EOSE Target      EULER V8 path · MECreature engine
~60%     Human Avg        Harder for humans too
?        SOVEREIGN Cats   Being mapped · MECreature work
ARC-AGI-2 · Leaderboard view
ARC-AGI-2, released in 2025, was designed to resist the approaches that cracked ARC-AGI-1.
Even o3 scores only ~4%, and the human average drops to ~60%; the gap between human and frontier
narrows because both are struggling. EOSE EULER path: MECreature engine + γ₁-distance scoring.
This is the unsolved game; the floor is still being poured.
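The γ₁-distance scoring used by the EULER path is not specified in this document, so the sketch below shows only the baseline metric the leaderboard percentages refer to: ARC-style exact-match grid scoring, assuming the standard ARC Prize convention that a task output counts as solved if any of the first k attempts matches the target grid cell-for-cell.

```python
# Minimal sketch of ARC-style exact-match scoring. This is the generic
# ARC Prize baseline, NOT the EULER γ₁-distance method (which this
# document does not define).

Grid = list[list[int]]

def output_solved(attempts: list[Grid], target: Grid, k: int = 2) -> bool:
    """True if any of the first k attempted grids equals the target exactly."""
    return any(attempt == target for attempt in attempts[:k])

def leaderboard_score(solved_flags: list[bool]) -> float:
    """Percentage of evaluation task outputs solved."""
    return 100.0 * sum(solved_flags) / len(solved_flags)
```

On a ~120-task eval set, solving 5 tasks comes out to roughly 4.2%, the neighborhood of the frontier scores quoted above.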
Value    Item             Notes
-----    ----             -----
LIVE     Interactive      Novel environments · adapt on-the-fly
30%      Jack Cole #1     Individual · Kaggle · top public
64%      EOSE Target      When submitted · MSV+CEQ
55.5%    MindsAI 2024     ARC Prize 2024 top score · $1M pool · PI+ENS
$50      Kaggle Budget    120 eval tasks · hard constraint
ARC-AGI-3 · Leaderboard view
ARC-AGI-3 is interactive: AI agents must adapt on the fly to novel environments.
This is not passive pattern matching but active adaptation. Jack Cole leads the public board at 30%;
MindsAI topped the 2024 ARC Prize at 55.5%. EOSE target: 64% with 3-Cap MSV+CEQ+CSE when submitted.
Cost constraint: $50 for 120 eval tasks, so efficiency matters as much as accuracy.
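The budget constraint is simple arithmetic worth making explicit: $50 spread over 120 evaluation tasks leaves well under half a dollar of compute per task. The helper functions below are illustrative, not part of any ARC or EOSE tooling.

```python
# The $50 / 120-task Kaggle budget turned into a hard per-task ceiling.
# Function names are illustrative, not part of any ARC or EOSE API.

def per_task_budget(total_usd: float = 50.0, n_tasks: int = 120) -> float:
    """Uniform per-task spending ceiling: total budget / number of tasks."""
    return total_usd / n_tasks

def within_budget(task_costs_usd: list[float], total_usd: float = 50.0) -> bool:
    """True if a run's summed per-task costs stay under the hard cap."""
    return sum(task_costs_usd) <= total_usd
```

`per_task_budget()` comes out to about $0.42 per task, which is why efficiency matters as much as accuracy: a solver that needs a dollar of inference per task is disqualified regardless of its score.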