EOSE Labs · ARC-AGI Floor

We're building toward 64%.
Come help.

ARC-AGI tests whether AI can actually reason, not just memorize. We're building a 3-cap execution verifier on sovereign infrastructure: three prompt identities (caps) look at the same puzzle and vote on the answer. The floor is open. The standards are public. We want builders.

64% target

〰️ curious welcome 🌀 wonder encouraged ⚓ no sorry needed

Ways to contribute

⚙️

Code · Fix a gap

Run the executor. Find where it fails. Fix it. The parse loop, the vote logic, the color equivariance layer — all open problems with clear success criteria.

Python Ollama ARC tasks

→ arc-3cap repo

◈

Cap builder · Add a stance

Design a 4th prompt identity. Each cap is a different way of looking at the same puzzle. More stances = better vote diversity = higher score. Test it. File it as a standard.

NP standard floor

→ floor@pemos.ca

📐

Standard builder · File a pattern

We file everything we learn as a named pattern standard (NP). If you crack something — parse reliability, voting logic, color equivariance — it becomes a filed standard, not just a code comment.

NP ARB LSOS

→ see the floor

🔬

Tester · Run and report

Download the runner. Run it on your hardware. Report what you see — which task types fail, which caps do better, where the vote breaks down. Raw results are as valuable as fixes.

results validation

→ audit@pemos.ca

🌐

Community · Bring others

Post results. Share the floor. Point people at the viz. The ARC Prize community is 500+ people who care about real reasoning. We're building something worth showing.

ARC Discord LinkedIn

→ pemos.ca/arc-agi

⚖️

Advisor · Shape the floor

If you've worked on ARC-AGI seriously, we want to hear how you think about task selection, failure taxonomy, and what beats 64%. CLO-gated. Serious only.

C=11+ CLO gate

→ legal@pemos.ca

Open gaps · pick one up

GAP-PARSE

Reliable grid extraction

LLM output → valid [[int]] grid, reliably, every time. Current: ~60% parse rate.
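One way to approach this gap, sketched under assumptions: scan the raw model output for bracketed candidates, parse each as JSON, and accept only rectangular grids whose cells are ints in 0..9 (the standard ARC color range). The names `extract_grid` and `is_valid_grid` are hypothetical and are not taken from the arc-3cap repo.

```python
import json
import re
from typing import List, Optional

Grid = List[List[int]]

# Non-greedy match for something shaped like [[...], [...]] made only of
# digits, commas, whitespace, and brackets.
GRID_PATTERN = re.compile(r"\[\s*\[[\d,\s\[\]]*?\]\s*\]")


def is_valid_grid(grid: object) -> bool:
    """True if `grid` is a non-empty rectangular list of lists of ints 0..9."""
    if not isinstance(grid, list) or not grid:
        return False
    if not all(isinstance(row, list) and row for row in grid):
        return False
    width = len(grid[0])
    for row in grid:
        if len(row) != width:
            return False
        for cell in row:
            if isinstance(cell, bool) or not isinstance(cell, int):
                return False
            if not 0 <= cell <= 9:
                return False
    return True


def extract_grid(text: str) -> Optional[Grid]:
    """Return the last valid grid found in `text`, or None if there is none."""
    # Models often restate the input before answering, so prefer the last match.
    for candidate in reversed(GRID_PATTERN.findall(text)):
        try:
            grid = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if is_valid_grid(grid):
            return grid
    return None
```

For example, `extract_grid("Answer: [[1, 0], [0, 1]]")` returns `[[1, 0], [0, 1]]`, while ragged rows or out-of-range colors return `None` so the caller can retry or count the failure toward the parse rate.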

GAP-COLOR

Color equivariance

Permuting colors shouldn't change the answer. Each cap tries a different color mapping, then remaps its answer back to the original colors at the end.
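A minimal sketch of the permute-then-remap idea, assuming ARC colors are the ints 0..9 and that color 0 (usually background) stays fixed. Both the fixed background and the function names are assumptions for illustration, not the project's actual layer.

```python
import random
from typing import Dict, List

Grid = List[List[int]]


def make_color_permutation(seed: int, keep_background: bool = True) -> Dict[int, int]:
    """Build a seeded bijection on colors 0..9 (optionally leaving 0 fixed)."""
    rng = random.Random(seed)
    colors = list(range(1, 10)) if keep_background else list(range(10))
    shuffled = colors[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(colors, shuffled))
    if keep_background:
        mapping[0] = 0
    return mapping


def apply_mapping(grid: Grid, mapping: Dict[int, int]) -> Grid:
    """Recolor every cell of `grid` through `mapping`."""
    return [[mapping[cell] for cell in row] for row in grid]


def invert_mapping(mapping: Dict[int, int]) -> Dict[int, int]:
    """Inverse of a bijective color mapping."""
    return {new: old for old, new in mapping.items()}
```

Each cap would receive the task recolored with `apply_mapping(grid, mapping)`, and its predicted grid would be passed through `apply_mapping(prediction, invert_mapping(mapping))` so all caps are compared in the original color space.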

GAP-VOTE

Vote with confidence

3-cap vote · when there is no majority, pick the answer from the cap with the best training accuracy. Don't withdraw blindly.
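The fallback rule could look roughly like this. It is a sketch that assumes each cap yields either a parsed grid or `None`, and that a per-cap training accuracy is already available as a float; the `vote` function and its argument names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Grid = List[List[int]]


def vote(
    predictions: Dict[str, Optional[Grid]],
    train_accuracy: Dict[str, float],
) -> Optional[Grid]:
    """Majority vote over caps, falling back to the most accurate cap."""
    # Drop caps whose output failed to parse.
    valid = {cap: grid for cap, grid in predictions.items() if grid is not None}
    if not valid:
        return None

    # Lists are unhashable, so count tuple-of-tuples versions of each grid.
    keys = {cap: tuple(map(tuple, grid)) for cap, grid in valid.items()}
    top_key, top_count = Counter(keys.values()).most_common(1)[0]

    # Strict majority over all caps, including those that failed to parse.
    if top_count > len(predictions) / 2:
        return [list(row) for row in top_key]

    # No majority: trust the cap with the best training accuracy.
    best_cap = max(valid, key=lambda cap: train_accuracy.get(cap, 0.0))
    return valid[best_cap]
```

With three caps, two matching grids win outright; three different grids return the answer from whichever cap scored highest in `train_accuracy`, so the run only withdraws when every cap failed to produce a grid.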

GAP-TASK

Task selection

Some tasks are solvable by small models. Predict which ones. Spend budget on winnable tasks.

GAP-DSL

DSL primitives

ARC patterns recur. A small library of grid primitives (mirror, rotate, fill, mask) helps caps reason faster.
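As an illustration of what such a library might contain, here is a sketch of the four named primitives on plain lists of lists. The exact semantics are assumptions: `fill` is read here as "recolor every cell of one color", and `mask` as "binary grid marking one color"; other readings (for example flood fill) are equally plausible.

```python
from typing import List

Grid = List[List[int]]


def mirror_horizontal(grid: Grid) -> Grid:
    """Flip left-to-right."""
    return [row[::-1] for row in grid]


def mirror_vertical(grid: Grid) -> Grid:
    """Flip top-to-bottom."""
    return [row[:] for row in grid[::-1]]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def fill(grid: Grid, old: int, new: int) -> Grid:
    """Replace every cell of color `old` with color `new`."""
    return [[new if cell == old else cell for cell in row] for row in grid]


def mask(grid: Grid, color: int) -> Grid:
    """Binary grid: 1 where the cell equals `color`, else 0."""
    return [[1 if cell == color else 0 for cell in row] for row in grid]
```

All five functions return new grids and leave their input untouched, which keeps them safe to chain, for example `mask(rotate_cw(grid), 3)`.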

Where we sit · ARC-AGI v4

Team · Score · Note
noemon.ai · 91.4% · LLM+DSL+vote
EdisCore · 84% · transform() verifier
P'tit Ju · 57.6% · DSL+TDD
→ EOSE Labs (3-cap verifier) · → 64% · building now
compress-arc · 20% · floor to beat

How we work · the 3 says

〰️

Wiggles

Curiosity is enough to start. You don't need to know everything. Come look. Ask questions. That's a contribution.

🌀

Wonder

The best ideas come from people who look at a problem sideways. Don't suppress the odd angle — that's often where the gap closes.

⚓

No sorry

No gatekeeping. No hierarchy of worthiness. If your contribution is real, it goes on the floor. The work speaks.

We're building toward 64%. Come help.

The floor is open.
