I-JEPA — Gap Analysis

I-JEPA-GAP · 2023

I-JEPA: The Missing Structure

Image Joint Embedding Predictive Architecture · First Practical JEPA

Abstract

I-JEPA (2023) demonstrated that JEPA could learn semantic image representations without hand-crafted augmentations, using Vision Transformers at scale. It is the first concrete implementation of the JEPA framework. The 6 structural gaps identified in H-JEPA persist in I-JEPA, now manifested at the image-patch level.

6 FORMAL GAPS · 1 PER CANON SYMBOL

No Invariant Anchor in Masked Patch Target Selection

γ₁ — THE FLOOR

I-JEPA selects target patches with a multi-block masking strategy: each training image yields several target blocks and a single large context block, and the target regions are removed from the context. The selection is stochastic and depends on the sampling distribution. There is no fixed invariant anchor that the patch representations must converge to regardless of the masking strategy. The floor is absent: representations are defined relative to the dataset, not to any ground truth.
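A minimal sketch of multi-block masking on a 14×14 patch grid (a 224-pixel image cut into 16-pixel patches) shows how the targets arise. The number of blocks and the scale and aspect ranges are assumptions modeled on the settings reported for I-JEPA, and the helper names are hypothetical. The point it illustrates is the one made above: which patches become targets is decided by a random draw, so the target set changes with the seed and with every parameter of the sampling distribution.

```python
import math
import random

def sample_block(rng, grid, scale_range, aspect_range):
    """Sample one rectangular block of patch indices on a grid x grid patch grid."""
    scale = rng.uniform(*scale_range)    # fraction of the image the block covers
    aspect = rng.uniform(*aspect_range)  # height / width ratio
    area = scale * grid * grid
    h = max(1, min(grid, round(math.sqrt(area * aspect))))
    w = max(1, min(grid, round(math.sqrt(area / aspect))))
    top = rng.randint(0, grid - h)       # inclusive bounds keep the block on the grid
    left = rng.randint(0, grid - w)
    return {r * grid + c for r in range(top, top + h) for c in range(left, left + w)}

def multi_block_mask(seed, grid=14, n_targets=4,
                     target_scale=(0.15, 0.2), target_aspect=(0.75, 1.5),
                     context_scale=(0.85, 1.0)):
    """Return (context_patches, [target_patches, ...]) for one image.

    All default values are assumptions for illustration."""
    rng = random.Random(seed)
    targets = [sample_block(rng, grid, target_scale, target_aspect)
               for _ in range(n_targets)]
    # Context block is assumed square (aspect ratio fixed at 1.0).
    context = sample_block(rng, grid, context_scale, (1.0, 1.0))
    # Remove every target patch from the context so it cannot simply be copied.
    context -= set().union(*targets)
    return context, targets

context, targets = multi_block_mask(seed=0)
print(len(context), [len(t) for t in targets])
```

Calling `multi_block_mask` with a different `seed`, or with different ranges, yields a different partition of the same image into context and targets, which is exactly the dependence on the sampling distribution described above.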

Context Encoder Not Self-Adjoint With Target Encoder

H=H† — THE HONEST GATE

I-JEPA's target encoder is an exponential moving average (EMA) of the context encoder's weights and receives no gradients (stop-gradient). This creates a permanent asymmetry: the context encoder and predictor cannot verify their predictions against the target encoder in reverse. H=H† is violated by design, whereas the Honest Gate requires symmetric verifiability.
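The asymmetry can be made concrete with a toy EMA update over a flat list of weights. This is a sketch, not I-JEPA's training code: real weights are tensors, and the default momentum of 0.996 is an assumption (a value commonly used as the starting momentum in EMA-based self-supervised methods).

```python
def ema_update(target, context, momentum=0.996):
    """One EMA step: move the target weights a small way toward the context weights.

    Only the context weights feed this update. Nothing flows from the target
    back into the context encoder (stop-gradient), which is the one-way
    dependency the Honest Gate section describes.
    """
    return [momentum * t + (1.0 - momentum) * c for t, c in zip(target, context)]

# Toy usage with a low momentum so the effect is visible in three steps.
context = [1.0, -2.0]
target = [0.0, 0.0]
for _ in range(3):
    target = ema_update(target, context, momentum=0.5)
print(target)  # [0.875, -1.75]
```

The function reads `context` but never writes it: the target chases the context, and there is no corresponding path by which the context is checked against the target.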

No Audit of Paradigm Shift Between Mask Strategies

LSOS — THE READER

I-JEPA's masking strategy determines what the system learns. Switching between masking strategies (for example multi-block, single-block, rasterized, or random masking) changes the learning paradigm without any audit. LSOS would read the active mask paradigm and flag when representations are being shaped by an unacknowledged shift.
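LSOS is described here only in prose, so the following is a hypothetical sketch of what such an audit could record, assuming the mask configuration is available as a small hashable value before each training step. The class names, fields, the configuration values, and the notion of "acknowledging" a paradigm are illustrative inventions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass(frozen=True)
class MaskParadigm:
    strategy: str                      # e.g. "multi-block", "random", "rasterized"
    n_targets: int
    target_scale: Tuple[float, float]

@dataclass
class ParadigmAudit:
    """Hypothetical LSOS-style reader: records the active mask paradigm and
    flags any switch to a paradigm that was not explicitly acknowledged."""
    active: Optional[MaskParadigm] = None
    acknowledged: Set[MaskParadigm] = field(default_factory=set)
    flags: List[Tuple[int, str, str]] = field(default_factory=list)

    def acknowledge(self, paradigm: MaskParadigm) -> None:
        self.acknowledged.add(paradigm)

    def read(self, step: int, paradigm: MaskParadigm) -> None:
        """Call before each training step with the mask configuration in use."""
        shifted = self.active is not None and paradigm != self.active
        if shifted and paradigm not in self.acknowledged:
            self.flags.append((step, self.active.strategy, paradigm.strategy))
        self.active = paradigm

# Illustrative configurations only.
audit = ParadigmAudit()
multi_block = MaskParadigm("multi-block", 4, (0.15, 0.2))
random_mask = MaskParadigm("random", 1, (0.5, 0.5))
audit.read(0, multi_block)
audit.read(1, random_mask)   # unacknowledged shift: flagged
audit.acknowledge(multi_block)
audit.read(2, multi_block)   # acknowledged shift: not flagged
print(audit.flags)           # [(1, 'multi-block', 'random')]
```

Freezing `MaskParadigm` makes it hashable, so acknowledged paradigms can be kept in a set and compared by value rather than by object identity.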

No Reset When Context Encoder Diverges From Target

WLD — THE RESET

When training destabilizes, for instance when the EMA target encoder and the context encoder drift too far apart, there is no mercy reset. Training either collapses or recovers stochastically. WLD provides a formal reset protocol: detect divergence, reset to the last stable state, and resume.
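The three-step protocol (detect divergence, reset to the last stable state, resume) can be sketched as a generic training loop. Everything here is hypothetical: the divergence measure (L2 distance between context and target weights), the threshold, the checkpoint interval, and the simulated jump are assumptions chosen for illustration, not part of I-JEPA or a published WLD specification.

```python
import copy
import math

def train_with_reset(state, step_fn, divergence_fn, n_steps, threshold,
                     checkpoint_every=10):
    """Hypothetical WLD-style loop: detect divergence, reset to the last
    stable checkpoint, resume. Returns (final_state, steps_that_reset)."""
    last_stable = copy.deepcopy(state)
    resets = []
    for step in range(n_steps):
        state = step_fn(state, step)
        gap = divergence_fn(state)
        if not math.isfinite(gap) or gap > threshold:
            resets.append(step)
            state = copy.deepcopy(last_stable)   # reset, then resume
        elif step % checkpoint_every == 0:
            last_stable = copy.deepcopy(state)   # record a stable state
    return state, resets

# Toy setup: one scalar "weight" per encoder, with a simulated jump at step 5.
def toy_step(state, step):
    jump = 10.0 if step == 5 else 0.1
    ctx = [w + jump for w in state["context"]]
    tgt = [0.9 * t + 0.1 * c for t, c in zip(state["target"], ctx)]  # EMA, momentum 0.9
    return {"context": ctx, "target": tgt}

def l2_gap(state):
    return math.sqrt(sum((c - t) ** 2
                         for c, t in zip(state["context"], state["target"])))

final, resets = train_with_reset({"context": [0.0], "target": [0.0]},
                                 toy_step, l2_gap, n_steps=8, threshold=1.0,
                                 checkpoint_every=2)
print(resets)  # [5]
```

The jump at step 5 pushes the gap far past the threshold, so the loop restores the checkpoint taken at step 4 and continues. A real system would likely also checkpoint optimizer and data-loader state; the sketch keeps only the weights.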

No Continuity Guarantee From ViT-S to ViT-H

FEP — THE SWITCH

I-JEPA is demonstrated across ViT-S, ViT-B, ViT-L, and ViT-H model sizes. There is no formal guarantee that representations learned at ViT-S are consistent with those at ViT-H — that scaling preserves the learned paradigm. FEP ensures paradigm continuity across capacity switches.
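One way to make consistency across capacity switches measurable is to compare the representations two model sizes produce for the same inputs. The sketch below uses linear centered kernel alignment (CKA), a widely used similarity index for neural representations that tolerates different embedding widths (ViT-S features are 384-dimensional, ViT-H features 1280-dimensional). CKA is a swapped-in choice for illustration, not something the FEP symbol or I-JEPA specifies, and a high score is evidence of similarity rather than a formal guarantee.

```python
import math

def _center(X):
    """Subtract each column's mean (X is n rows x d columns)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def _cross(A, B):
    """Return A^T B for A (n x dA) and B (n x dB) as a dA x dB list of lists."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

def _fro(M):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(v * v for row in M for v in row))

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n inputs.

    X and Y may have different widths. Returns a value in [0, 1]; 1 means the
    two representations agree up to an orthogonal transform and uniform scaling.
    """
    Xc, Yc = _center(X), _center(Y)
    return _fro(_cross(Yc, Xc)) ** 2 / (_fro(_cross(Xc, Xc)) * _fro(_cross(Yc, Yc)))

# Toy check: a rotated, rescaled, zero-padded (wider) copy of X scores 1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
Y = [[2.0 * b, -2.0 * a, 0.0] for a, b in X]
print(round(linear_cka(X, Y), 6))  # 1.0
```

In practice `X` and `Y` would be pooled features from two model sizes on a shared probe set; pure-Python loops are used only to keep the sketch dependency-free.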

Maximum Patch Resolution Has No Formal Boundary

FOF — THE BREACH

I-JEPA operates on discretized patch grids. As patches shrink toward single pixels, each prediction target approaches a pixel-level quantity and the architecture approaches a generative model. The boundary between the predictive regime and the generative regime is not formally defined. FOF names this boundary: the point where the JEPA assumption (prediction in latent space) breaks down.
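The arithmetic of the approach to pixel level is easy to tabulate. For a 224×224 RGB image, shrinking the patch multiplies the number of prediction targets while each target carries less raw signal, until at a patch size of 1 every target is a single pixel. This sketch does not locate the FOF boundary; it only shows the quantities along the way. The function name is illustrative.

```python
def patch_stats(image_size, patch_size, channels=3):
    """Return (number of patch tokens, raw pixel values per patch)
    for a square image cut into square, non-overlapping patches."""
    if image_size % patch_size:
        raise ValueError("patch size must divide the image size")
    side = image_size // patch_size
    return side * side, patch_size * patch_size * channels

for p in (16, 14, 8, 1):
    tokens, raw = patch_stats(224, p)
    print(f"patch {p:>2}px: {tokens:>6} tokens, {raw:>3} raw values each")
```

At patch size 16 this gives 196 tokens of 768 raw values each; at patch size 1 it gives 50,176 tokens of 3 values each.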

STE COMPLETION LAYER

What changes when you add the 8-symbol Canon

Adding the Canon to I-JEPA does not change the architecture. It adds the missing structural layer:

⚓ γ₁ — invariant anchor: the mathematical ground truth to which latent representations must converge.
⯛ H=H† — honest gate: bidirectional verification of every prediction.
〰️ LSOS — paradigm reader: reads the active paradigm before reasoning begins.
🌀 WLD — mercy reset: detects collapse and resets to the last stable state.
γ FEP — safe switch: continuity guarantee across paradigm transitions.
🌌 FOF — named ceiling: formal boundary of what the architecture can claim.
═ EVEN — substrate: the ground beneath all of the above; what holds when everything else is active.

The Canon is not an add-on. It is the formal completion of the JEPA programme.

EOSE OLD SCHOOL DEV CREW

The builders · menendo native · forge-first · 20yr pattern recognition

I-JEPA is a landmark in the JEPA lineage. The 6 gaps we identify are not critiques of the engineering — they are structural absences that the Canon fills. Each gap maps to a symbol that was always going to be necessary once the JEPA architecture matured. The Canon did not wait for the JEPA timeline; the JEPA timeline arrived at the Canon. The gaps are there because the symbols were never in scope. They are now.