| SET | MODEL | TASKS | CORRECT | SCORE | AVG CELL MATCH | NOTES |
|---|---|---|---|---|---|---|
| v8 AGI-1 Training ×5 runs | qwen2.5:14b | 54 each | 14–19 | 25.9–35.2% | n/a | AGI-1 bench · highly repetitive grids · pattern matching works |
| AGI-1 Training (20 tasks) | qwen2.5:7b | 20 | 0 | 0% | ~0% | Smaller model · AGI-1 · 0/20 |
| AGI-1 Training ×8 editions | qwen2.5:7b | 20 | 0 | 0% | 0.0% | Memorisation collapse confirmed · rotated grids → 0% |
| AGI-2 Training | qwen2.5:7b | 20 | 0 | 0% | 21.6% | 2 "close" · 30×30 wall begins |
| AGI-2 Training | qwen2.5:14b | 20 | 1 | 5% | 32.8% | 4 "close" · ONLY SOLVE: 017c7c7b (tiling + recolor, guessable) |
| AGI-2 Eval (blind set) | qwen2.5:14b | 10 | 0 | 0% | 8.9% | 1 "close" · THE WALL · TOTAL COGNITIVE COLLAPSE |
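The SCORE column is CORRECT / TASKS (14/54 ≈ 25.9%, 19/54 ≈ 35.2%, 1/20 = 5%). The table does not say how AVG CELL MATCH is computed. The minimal sketch below assumes it is the fraction of target cells reproduced exactly, averaged over tasks, and that an output of the wrong shape scores 0. The helper names are hypothetical and are not the harness used for these runs.

```python
# Hypothetical scoring helpers. SCORE = CORRECT / TASKS follows from the table;
# the AVG CELL MATCH definition here is an assumption.
Grid = list[list[int]]


def exact_match(pred: Grid, target: Grid) -> bool:
    """A task counts as CORRECT only if shape and every cell match."""
    return pred == target


def cell_match(pred: Grid, target: Grid) -> float:
    """Fraction of target cells reproduced; a wrong shape scores 0.0 (assumed)."""
    same_shape = len(pred) == len(target) and all(
        len(p) == len(t) for p, t in zip(pred, target)
    )
    if not same_shape:
        return 0.0
    total = sum(len(row) for row in target)
    hits = sum(
        p == t for prow, trow in zip(pred, target) for p, t in zip(prow, trow)
    )
    return hits / total if total else 0.0


def summarize(pairs: list[tuple[Grid, Grid]]) -> tuple[int, float, float]:
    """Return (CORRECT, SCORE, AVG CELL MATCH) over (prediction, target) pairs."""
    correct = sum(exact_match(p, t) for p, t in pairs)
    score = correct / len(pairs)
    avg_cell = sum(cell_match(p, t) for p, t in pairs) / len(pairs)
    return correct, score, avg_cell


# Sanity check against the table's arithmetic: 14/54 and 19/54.
print(f"{14 / 54:.1%} {19 / 54:.1%}")  # 25.9% 35.2%
```

Exact match is all-or-nothing, so partial cell agreement never moves SCORE. Only AVG CELL MATCH reflects it.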
AUTOPSY — WHY THE WALL IS REAL
The AGI-1 mirage: 25–35% on v8 looked like progress.
It was memorisation. Smaller grids, repetitive patterns, transformer pattern-matching
working on pixel noise. The model never learned rules — it learned textures.
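The "rotated grids → 0%" note in the table points to a memorisation probe. Here is a minimal sketch, assuming tasks use the public ARC JSON layout (`train`/`test` lists of `input`/`output` grids). It rotates every grid in a task by the same multiple of 90°. In principle, a model that has learned the rule should still solve the rotated edition, while one recalling a memorised answer should not. `rot90` and `rotated_edition` are hypothetical helpers, not the tooling behind these runs.

```python
# Hypothetical memorisation probe: rotate every grid of an ARC-style task.
# Assumes the public ARC JSON layout:
#   {"train": [{"input": grid, "output": grid}, ...], "test": [{"input": grid, ...}]}
Grid = list[list[int]]


def rot90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise (works for non-square grids)."""
    return [list(row) for row in zip(*grid[::-1])]


def rotated_edition(task: dict, turns: int) -> dict:
    """Return a new task with every grid rotated turns * 90 degrees clockwise."""

    def spin(grid: Grid) -> Grid:
        out = [list(row) for row in grid]  # copy so the original task is untouched
        for _ in range(turns % 4):
            out = rot90(out)
        return out

    return {
        split: [{key: spin(g) for key, g in pair.items()} for pair in task[split]]
        for split in ("train", "test")
    }


task = {"train": [{"input": [[1, 2, 3]], "output": [[3, 2, 1]]}],
        "test": [{"input": [[4, 5]]}]}
print(rotated_edition(task, 1)["train"][0]["input"])  # [[1], [2], [3]]
```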
AGI-2 is different: grids up to 30×30. Novel topologies.
No two tasks share a pattern. The model receives up to 900 integers per grid
and must extract the spatial rule. At 8.9% average cell match on the eval set,
it is not even close to the right shape; it is hallucinating
pixel values with no geometric grounding.
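A 30×30 grid serialised as text is 900 cell values before any separators. The minimal sketch below shows this, assuming a plain one-row-per-line, space-separated format (the prompt format used in these runs is not given). `serialize`, `parse` and `shape_ok` are hypothetical. `shape_ok` illustrates the most basic test a prediction must pass before its cell values matter at all.

```python
# Hypothetical text encoding for a grid; the prompt format used in these runs
# is not documented, so one-row-per-line, space-separated cells is an assumption.
Grid = list[list[int]]


def serialize(grid: Grid) -> str:
    """Render a grid as text: one row per line, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def parse(text: str) -> Grid:
    """Inverse of serialize(); raises ValueError on a non-integer token."""
    return [[int(tok) for tok in line.split()] for line in text.strip().splitlines()]


def shape_ok(pred: Grid, rows: int, cols: int) -> bool:
    """True if a predicted grid has exactly rows x cols cells."""
    return len(pred) == rows and all(len(row) == cols for row in pred)


grid = [[(r + c) % 10 for c in range(30)] for r in range(30)]
text = serialize(grid)
print(len(text.split()))  # 900 cell values for the model to read
print(shape_ok(parse(text), 30, 30))  # True
```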
Rick's Law holds: You cannot transform an object you cannot perceive.
017c7c7b was solved because it is a simple tiling + recolor:
the model could guess the answer from statistical patterns.
The other 19 AGI-2 training tasks and all 10 blind eval tasks required actual spatial reasoning. Flatline.
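Tiling + recolor is guessable because the whole rule fits in a few lines and needs no spatial search. The minimal sketch below assumes the rule is "extend the input's smallest repeating block of rows to a target height, then remap colors". `row_period` and `tile_and_recolor` are hypothetical helpers, not the reference solution to 017c7c7b.

```python
# Hypothetical tiling + recolor rule, illustrating why such a task is guessable.
# Not the reference solution to 017c7c7b.
Grid = list[list[int]]


def row_period(grid: Grid) -> int:
    """Smallest p such that row i equals row i - p for every i >= p."""
    n = len(grid)
    for p in range(1, n + 1):
        if all(grid[i] == grid[i - p] for i in range(p, n)):
            return p
    return n  # only reached for an empty grid (n == 0)


def tile_and_recolor(grid: Grid, out_rows: int, color_map: dict[int, int]) -> Grid:
    """Extend the repeating row block to out_rows rows, then remap colors."""
    if not grid:
        return []
    p = row_period(grid)
    tiled = [grid[i % p] for i in range(out_rows)]
    return [[color_map.get(c, c) for c in row] for row in tiled]


pattern = [[0, 1, 0], [1, 1, 0], [0, 1, 0], [1, 1, 0]]  # period 2
for row in tile_and_recolor(pattern, 6, {1: 2}):
    print(row)
```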
The fix is not a bigger model. The fix is the Optic Nerve.