EOSE Labs · ARC-AGI Floor

We're building toward 64%.
Come help.

ARC-AGI tests whether AI can actually reason, not just memorize. We're building a 3-cap execution verifier on sovereign infrastructure: three caps (prompt identities) each attempt the same puzzle, then vote. The floor is open. The standards are public. We want builders.

0%
64% target
〰️ curious welcome 🌀 wonder encouraged ⚓ no sorry needed
Ways to contribute
⚙️
Code · Fix a gap
Run the executor. Find where it fails. Fix it. The parse loop, the vote logic, the color equivariance layer — all open problems with clear success criteria.
Python Ollama ARC tasks
→ arc-3cap repo
Cap builder · Add a stance
Design a 4th prompt identity. Each cap is a different way of looking at the same puzzle. More stances should mean more vote diversity, and we expect that to lift the score. Test it. File it as a standard.
NP standard floor
→ floor@pemos.ca
📐
Standard builder · File a pattern
We file everything we learn as a named pattern standard (NP). If you crack something — parse reliability, voting logic, color equivariance — it becomes a filed standard, not just a code comment.
NP ARB LSOS
→ see the floor
🔬
Tester · Run and report
Download the runner. Run it on your hardware. Report what you see — which task types fail, which caps do better, where the vote breaks down. Raw results are as valuable as fixes.
results validation
→ audit@pemos.ca
🌐
Community · Bring others
Post results. Share the floor. Point people at the viz. The ARC Prize community is 500+ people who care about real reasoning. We're building something worth showing.
ARC Discord LinkedIn
→ pemos.ca/arc-agi
⚖️
Advisor · Shape the floor
If you've worked ARC-AGI seriously, we want to hear how you think about task selection, failure taxonomy, and what beats 64%. CLO-gated. Serious only.
C=11+ CLO gate
→ legal@pemos.ca
Open gaps · pick one up
GAP-PARSE
Reliable grid extraction
P0
LLM output → valid [[int]] grid, reliably, every time. Current: ~60% parse rate.
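A minimal sketch of what GAP-PARSE is asking for, not the repo's actual parser. It assumes the model's reply contains a JSON-style nested list somewhere in free text, that cells are the ten ARC color codes 0-9, and that the last valid grid in the reply is the intended answer. The names `extract_grid` and `is_valid_grid` are hypothetical.

```python
import json
from typing import List, Optional

Grid = List[List[int]]


def is_valid_grid(obj) -> bool:
    """True if obj is a non-empty rectangular list of lists of ints in 0..9."""
    if not isinstance(obj, list) or not obj:
        return False
    if not all(isinstance(row, list) and row for row in obj):
        return False
    width = len(obj[0])
    for row in obj:
        if len(row) != width:
            return False
        for cell in row:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, int):
                return False
            if not 0 <= cell <= 9:
                return False
    return True


def extract_grid(text: str) -> Optional[Grid]:
    """Return the last valid grid found in free-form text, or None."""
    candidates = []
    start = text.find("[[")
    while start != -1:
        depth = 0
        # Walk forward until the bracket opened at `start` is closed.
        for end in range(start, len(text)):
            if text[end] == "[":
                depth += 1
            elif text[end] == "]":
                depth -= 1
                if depth == 0:
                    try:
                        obj = json.loads(text[start:end + 1])
                    except json.JSONDecodeError:
                        obj = None
                    if is_valid_grid(obj):
                        candidates.append(obj)
                    break
        start = text.find("[[", start + 1)
    # Assumption: models tend to state the final answer last.
    return candidates[-1] if candidates else None
```

Returning `None` instead of raising lets the caller count a parse failure and retry, which is one way to measure the parse rate quoted above.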
GAP-COLOR
Color equivariance
P1
Permuting the colors of a puzzle should permute the answer the same way and change nothing else. Each cap tries a different color mapping, then remaps its prediction back to the original colors at the end.
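One way to picture GAP-COLOR, as a sketch under stated assumptions: colors are the codes 0-9, color 0 is treated as background and left fixed (a common ARC convention, but an assumption here), and a cap is stood in for by any function from grid to grid. All names (`make_permutation`, `solve_under_mapping`, `mirror_solver`) are hypothetical.

```python
import random
from typing import Callable, Dict, List

Grid = List[List[int]]


def make_permutation(seed: int, keep_background: bool = True) -> Dict[int, int]:
    """Build a seeded color permutation over 0..9."""
    rng = random.Random(seed)
    colors = list(range(1, 10)) if keep_background else list(range(10))
    shuffled = colors[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(colors, shuffled))
    if keep_background:
        mapping[0] = 0  # assumption: 0 is background and stays fixed
    return mapping


def invert_mapping(mapping: Dict[int, int]) -> Dict[int, int]:
    """Invert a permutation so predictions can be mapped back."""
    return {new: old for old, new in mapping.items()}


def apply_mapping(grid: Grid, mapping: Dict[int, int]) -> Grid:
    """Recolor every cell of the grid through the mapping."""
    return [[mapping[cell] for cell in row] for row in grid]


def solve_under_mapping(
    solver: Callable[[Grid], Grid], grid: Grid, mapping: Dict[int, int]
) -> Grid:
    """Recolor the input, run the solver, then remap the prediction back."""
    permuted = apply_mapping(grid, mapping)
    prediction = solver(permuted)
    return apply_mapping(prediction, invert_mapping(mapping))


def mirror_solver(grid: Grid) -> Grid:
    """Stand-in for a cap: a color-agnostic left-right mirror."""
    return [row[::-1] for row in grid]
```

For a solver that really is color-agnostic, `solve_under_mapping(solver, grid, mapping)` equals `solver(grid)` for every mapping. Where the two disagree for a real cap, that disagreement is the gap.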
GAP-VOTE
Vote with confidence
P1
3-cap vote · when no majority, pick the cap with best training accuracy. Don't withdraw blindly.
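The GAP-VOTE rule could be sketched like this. It is an illustration, not the repo's vote logic. Assumptions: each cap returns a grid or `None` on a parse failure, "training accuracy" means the fraction of the task's training pairs that cap reproduced, and a majority means more than half of all caps. The function name `vote` and its parameters are hypothetical.

```python
import json
from collections import Counter
from typing import List, Optional, Sequence

Grid = List[List[int]]


def vote(
    predictions: Sequence[Optional[Grid]],
    train_accuracy: Sequence[float],
) -> Optional[Grid]:
    """Majority vote over cap predictions, with an accuracy-based fallback."""
    valid = [(i, p) for i, p in enumerate(predictions) if p is not None]
    if not valid:
        return None  # nothing parsed, so there is nothing to submit

    # Grids are lists (unhashable), so count them by their JSON encoding.
    counts = Counter(json.dumps(p) for _, p in valid)
    top_key, top_count = counts.most_common(1)[0]
    if top_count > len(predictions) / 2:
        return json.loads(top_key)

    # No majority: trust the cap that did best on the training pairs.
    # Ties go to the earliest cap, since max() keeps the first maximum.
    _, best_prediction = max(valid, key=lambda item: train_accuracy[item[0]])
    return best_prediction
```

With three caps, two matching answers win outright. Three different answers fall through to the cap with the highest training accuracy instead of withdrawing.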
GAP-TASK
Task selection
P2
Some tasks are solvable by small models. Predict which ones. Spend budget on winnable tasks.
GAP-DSL
DSL primitives
P2
ARC patterns recur. A small library of grid primitives (mirror, rotate, fill, mask) helps caps reason faster.
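To make GAP-DSL concrete, here is a small sketch of the four primitives named above on plain list-of-lists grids. The exact semantics are assumptions: `mirror` flips left-right, `rotate` turns 90 degrees clockwise, `fill` recolors every cell of one color, and `mask` returns a 0/1 grid marking one color. The real library may define them differently.

```python
from typing import List

Grid = List[List[int]]


def mirror(grid: Grid) -> Grid:
    """Flip the grid left to right."""
    return [row[::-1] for row in grid]


def rotate(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def fill(grid: Grid, old: int, new: int) -> Grid:
    """Replace every cell of color `old` with color `new`."""
    return [[new if cell == old else cell for cell in row] for row in grid]


def mask(grid: Grid, color: int) -> Grid:
    """Return a 0/1 grid marking where `color` appears."""
    return [[1 if cell == color else 0 for cell in row] for row in grid]
```

Each primitive returns a new grid and leaves its input untouched, so they compose freely, for example `fill(rotate(mirror(g)), 1, 5)`.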
Where we sit · ARC-AGI v4
Rank · Team · Score · Note
1 · noemon.ai · 91.4% · LLM+DSL+vote
2 · EdisCore · 84% · transform() verifier
3 · P'tit Ju · 57.6% · DSL+TDD
- · EOSE Labs (3-cap verifier) · → 64% · building now
- · compress-arc · 20% · floor to beat
How we work · the 3 says
〰️
Wiggles
Curiosity is enough to start. You don't need to know everything. Come look. Ask questions. That's a contribution.
🌀
Wonder
The best ideas come from people who look at a problem sideways. Don't suppress the odd angle — that's often where the gap closes.
⚓
No sorry
No gatekeeping. No hierarchy of worthiness. If your contribution is real, it goes on the floor. The work speaks.

The floor is open.

We're building toward 64% on ARC-AGI v4. Come check the floor, pick up a gap, or just watch what happens. All paths welcome.

→ see the floor · floor@pemos.ca · lookbook