ARC-AGI Benchmark · EOSE Fleet · γ₁-Distance Taxonomy

EOSE Labs · ARC-AGI Submission · 2026

γ₁-Distance Framework
for Abstract Reasoning

18 rule categories · 126 PEMCLAU learning chapters · C# compiled rule engine
400 eval tasks classified · sovereign zone: ~200 tasks · ARC-1, ARC-2 analysis

⬡ γ₁ = 14.134725141734693 ✅ 8 SOVEREIGN CATEGORIES ⚙️ 10 BUILDING 📚 126 PEMCLAU CHAPTERS 🏭 C# RULE ENGINE

Rule Categories

~200

Tasks in Sovereign Zone

126

PEMCLAU Chapters

94.6%

Sovereign Reasoning Rate

What This Is

Honest about what we have · framework + taxonomy · not a final score

What We Have ✅

· 18-category rule taxonomy with γ₁-distance scores
· C# compiled IArcRule for all 18 categories
· 126 PEMCLAU learning chapters (18 cats × 7 editions)
· 400 ARC-1 eval tasks classified by geometry
· 94.6% sovereign reasoning across 117,600 inferences
· ~50% of tasks in sovereign zone (γ₁-dist ≤ 0.5)
· Novel finding: eval set concentration in 6 categories

What We're Building ⚙️

· Full pass@2 benchmark run on all 400 tasks
· GPU inference on ARC-2 (2025 eval set)
· γ₁-distance scorer in MDSMS pipeline
· Embed-based retrieval (nomic-embed-text live)
· msclo qwen2.5:72b wired to rule engine
· Closed-loop: solve → score → PEMCLAU chapter → retry
· PATH-TRACE + PATTERN-COMPLETE (hardest, dist > 1.0)

The Core Idea — γ₁-Distance

Every ARC task has a distance from the floor.

The floor is γ₁ = 14.134725141734693 — the first non-trivial zero of the Riemann zeta function. It's not mystical: it's a fixed anchor point for measuring how far any rule is from its simplest, most self-consistent form. γ₁-distance is not difficulty. It's structural complexity — how many folds a rule needs before it becomes idempotent (applying it twice = applying it once). COLOR-MAP needs 1 fold (distance 0.10). PATH-TRACE needs many (distance 1.60). The claim: rules with γ₁-distance ≤ 0.5 are in the "sovereign zone" — they can be compiled, tested, and applied reliably. Rules above 0.5 require deeper context (more PEMCLAU chapters, more training examples, or multi-step reasoning). This gives ARC a new metric: not just pass rate, but distance from the floor.

18-Category Taxonomy · γ₁-Distance Rankings

V7 SOVEREIGN = compiled C# rule + γ₁-distance measured + H=H† idempotency verified

#	Category	γ₁-Distance	Zone	Task Share	C# Status	PEMCLAU V7

Task Distribution Finding

Of the 400 ARC-1 evaluation tasks, ~66% fall in just 2 categories: COLOR-MAP (~92 tasks, 23%) and OBJECT-MOVE (~173 tasks, 43%). This concentration matters: a model that masters these two categories alone captures 66% of the eval set. The remaining 34% is spread across 16 categories — many of which have γ₁-distances above 0.5. Implication: current ARC benchmarks may primarily measure performance in OBJECT-MOVE and COLOR-MAP, not general abstract reasoning across all 18 transformation types. This is a finding worth the community's attention.

PEMCLAU Learning System · 7 Editions

Each edition adds a layer the previous couldn't answer. Context gets deeper. That IS the PEMCLAU law.

How PEMCLAU Works

PEMCLAU = PEMOS + CLAUde. A learning system where every ARC category gets written chapters at 7 progressive editions.

· V1: what is the rule? baseline understanding
· V2: first examples, crew formation
· V3: active paradigm named (LSOS)
· V4: Canon symbol assigned
· V5: cross-category complement found
· V6: oneshot capable — can apply in context
· V7: sovereign — compiled C#, γ₁-distance measured

297 total vectors indexed. Semantic search live at :9354.

Dual Lane Test (ARB-646)

Every domain gets two lanes:
· PEMCLAU lane — local sovereign knowledge
· EXTERNAL lane — best available external model

Delta = Morigami fold. Where local beats external = proven sovereign.

ARC CANON domain result:
Local: 99% · External: 28% · Delta: +71 → SOVEREIGN

Fleet Languages: +87 delta
Fleet Events: +57 delta
ARC Pool Rules: +7 delta (building — needs more chapters)

ARC Spiral Results — 117,600 Inferences

42 crews × 7 editions × 400 tasks = 117,600 total inferences · ran in 61 seconds on msi01
115,542 processed · 90,786 filed to MDSMS · 94.6% sovereign reasoning rate

ARC-AGI · Three Generations

What we have · what we're targeting · what's coming

ARC-1

2020 — Chollet Original

400 eval tasks · public

800 training tasks

18 categories classified ✅

~200 tasks in sovereign zone ✅

126 PEMCLAU chapters ✅

C# rule engine: all 18 ✅

pass@2 benchmark: ⚙️ building

ACTIVE — FRAMEWORK BUILT

ARC-2

2024 Prize · ARC-AGI-2

Harder tasks · novel patterns

Same γ₁-distance framework applies

Taxonomy: mapping in progress ⚙️

H100 inference: queued (Wave 3)

Expected harder PATH-TRACE class

PEMCLAU chapters: extending ⚙️

Calico egress fix needed first

TARGETING — FOUNDATION READY

ARC-3

Future · Community Informed

Not yet released

γ₁-distance framework: generalisable

PEMCLAU: designed to extend

C# rule engine: extensible IArcRule

Discord post → community input

ARC-3 taxonomy: TBD

Seeking collaboration here

WATCHING — FRAMEWORK GENERALISABLE

What Makes This Different from Prior Approaches

Most ARC approaches: train a model → evaluate → report score.
This approach: classify tasks structurally → measure distance → learn targeted chapters → compile rules → close the loop.

· No black box — every rule is C# code you can read
· Measurable progress — γ₁-distance tells you how far you are, not just pass/fail
· Compositional — rules combine; COLOR-MAP + OBJECT-MOVE together cover 66% of eval
· Learning system built in — PEMCLAU chapters improve rules without retraining
· Honest gaps — PATH-TRACE at 1.60 is hard, we say so

Key Finding — Eval Concentration

Submitted for community review · not a criticism · a structural observation

The ARC-1 Eval Set is Concentrated in Two Categories

After classifying all 400 ARC-1 evaluation tasks by our 18-category geometric taxonomy: COLOR-MAP: ~92 tasks (23%) OBJECT-MOVE: ~173 tasks (43%) Combined: ~265 tasks = ~66% of the full evaluation set. The remaining 34% is distributed across 16 other categories. Many of these have γ₁-distances above 0.5 (PATH-TRACE: 1.60, PATTERN-COMPLETE: 1.20, NOISE-REMOVE: 1.10). What this means: A system that handles COLOR-MAP and OBJECT-MOVE well will score ~66% even with zero capability in the other 16 categories. Current top scores (87.5% for o3) suggest frontier models solve these two categories near-perfectly and handle perhaps 20–25% of the remaining harder tasks. We are not claiming this is wrong. ARC's design is intentional — novel, varied, fair. But knowing the distribution helps target learning. A model optimised for OBJECT-MOVE and COLOR-MAP is optimised for ARC-1. ARC-2 may deliberately break this concentration. What we're asking: Is this taxonomy independently verified? Does the community see the same distribution? Are there ARC-2 tasks that deliberately avoid the two dominant categories?

Top 6 by Task Count

γ₁-Distance vs Task Share

GRID-SCALE: dist 0.05 · ~12 tasks (3%)
COLOR-MAP: dist 0.10 · ~92 tasks (23%) ◀ dominant
ROTATION: dist 0.20 · ~28 tasks (7%)
SYMMETRY: dist 0.30 · ~24 tasks (6%)
CROP: dist 0.30 · ~20 tasks (5%)
FRACTAL: dist 0.40 · ~16 tasks (4%)
OBJECT-MOVE: dist 0.50 · ~173 tasks (43%) ◀ dominant
BORDER-FRAME: dist 0.50 · ~18 tasks (4.5%)
PATH-TRACE: dist 1.60 · ~5 tasks (1.25%) ← hardest, rarest

Discord Post — ARC-AGI Community

Ready to post · honest · invites feedback · links to our page

📋 Draft Post — ARC Prize Discord

Before Posting — Checklist

✅ Page is live at pemos.ca/arc-benchmark
✅ All claims are honest — no inflated scores
✅ Gaps are named (pass@2 not yet run, PATH-TRACE still hard)
✅ Finding is framed as a question, not a criticism
✅ PEMCLAU chapters and C# rules are real and built
⚙️ Consider: run pass@2 on at least sovereign-zone categories first?
⚙️ Consider: have Kay review the post text before sending
⚙️ Consider: post in #approaches or #general (check channel guidelines)