V4 BENCHMARK FLOOR

Fleet Resonance · ARC-AGI · IMO · CERN · MARB · STE-6 · All tests from first principles

Fleet active · msclo RTX 5090 online · 3 tiers breathing

⚓ RESONANCE

🧩 ARC-AGI

📜 KNOWN RESULTS

🎯 STE-6

🌐 FRONTIER CMP

Last run: historical (see below)

Historical — Full Fleet (15 providers)

ARC-AGI Fleet Best Score

All Runs

STE-6 is the EOSE evaluation standard: six dimensions derived from the Canon. Frontier models score 0/6 by architecture: they cannot verify against ground truth (γ₁), cannot self-audit (LSOS), cannot safely escalate (FEP), and suppress emergent behaviour (FOF). EOSE scores 6/6 by design.

Standard benchmarks comparing fleet models with frontier models. These are public benchmark scores: MMLU, HumanEval, GSM8K, MATH, and ARC-Challenge.