Motion and Content JEPA · Joint Learning of Motion and Appearance
Abstract: MC-JEPA attempts to jointly learn motion and content features in a shared encoder, extending JEPA from static images to dynamic scenes. Joint encoding creates a structural tension between two paradigms, motion and content, operating in the same latent space. The six gaps listed on this page are the signature of that tension.
6 FORMAL GAPS · EACH MAPPED TO A CANON SYMBOL
No Shared Invariant Between Motion and Content Branches
γ₁ — THE FLOOR
MC-JEPA encodes motion features and content features in a shared representation. There is no formal invariant γ₁ that the joint space must respect regardless of which branch is dominant. When motion overwhelms content (fast motion) or content overwhelms motion (static scenes), the shared space has no floor to return to.
Motion Encoder Not Self-Adjoint With Content Encoder
H=H† — THE HONEST GATE
The motion and content branches of MC-JEPA are trained jointly, but nothing formally verifies that they are symmetric. A motion representation cannot be used to check its corresponding content representation, nor the reverse. The two branches can therefore drift independently, with no mutual verification gate between them.
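For readers who have not met the notation, H = H† is the standard linear-algebra statement that an operator equals its own conjugate transpose. The sketch below checks that property for a small matrix in plain Python. The function name `is_self_adjoint` and the tolerance are illustrative assumptions, and nothing here says how such a check would be wired into MC-JEPA's motion and content branches.

```python
def is_self_adjoint(matrix, tol=1e-9):
    """Return True if a square matrix equals its conjugate transpose.

    `matrix` is a list of rows holding real or complex numbers.
    `tol` is an assumed numerical tolerance, chosen for illustration.
    """
    n = len(matrix)
    # A non-square matrix cannot be self-adjoint.
    if any(len(row) != n for row in matrix):
        return False
    # Compare every entry H[i][j] with the conjugate of H[j][i].
    return all(
        abs(matrix[i][j] - complex(matrix[j][i]).conjugate()) <= tol
        for i in range(n)
        for j in range(n)
    )


hermitian = [[2, 1j], [-1j, 3]]   # equals its conjugate transpose
asymmetric = [[1, 2], [3, 4]]     # off-diagonal entries differ

print(is_self_adjoint(hermitian))
print(is_self_adjoint(asymmetric))
```

The check is purely algebraic: it says whether a given operator is symmetric in this sense, not whether two trained encoders agree with each other.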
No Paradigm Audit When Motion Dominates Content
LSOS — THE READER
In high-motion sequences, the motion branch dominates the joint representation. The system does not audit this paradigm shift — there is no mechanism to flag that the active paradigm has moved from content-dominant to motion-dominant. LSOS would read the active paradigm and expose the shift.
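One way to picture such an audit, as a minimal sketch only: compare the magnitude of the motion and content parts of each representation and record every change of regime. The norm-ratio rule, the `ratio=2.0` cutoff, and all names below are hypothetical. Neither MC-JEPA nor LSOS is specified at this level here.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def active_regime(motion_vec, content_vec, ratio=2.0):
    """Label a frame by which part is larger by an assumed factor `ratio`."""
    m, c = l2_norm(motion_vec), l2_norm(content_vec)
    if m > ratio * c:
        return "motion-dominant"
    if c > ratio * m:
        return "content-dominant"
    return "balanced"


def audit_shifts(frames, ratio=2.0):
    """Return (frame_index, old_regime, new_regime) for every regime change.

    `frames` is a sequence of (motion_vec, content_vec) pairs.
    """
    shifts = []
    previous = None
    for i, (motion_vec, content_vec) in enumerate(frames):
        current = active_regime(motion_vec, content_vec, ratio)
        if previous is not None and current != previous:
            shifts.append((i, previous, current))
        previous = current
    return shifts


frames = [
    ([0.1, 0.0], [1.0, 1.0]),  # small motion part, large content part
    ([3.0, 4.0], [1.0, 0.0]),  # large motion part, small content part
]
print(audit_shifts(frames))
```

Any fixed cutoff like this is a design choice rather than a derived quantity, so the labels are only as meaningful as the threshold behind them.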
No Reset When Motion/Content Representations Collapse
WLD — THE RESET
When motion and content representations collapse to the same point in latent space (a known degenerate solution), there is no mercy reset. The disentanglement objective can fail silently. WLD detects collapse and resets before the failure propagates.
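The detect-and-reset behaviour described here can be sketched as a small guard object. This is an illustration under stated assumptions, not the WLD mechanism itself: collapse is approximated as the two vectors lying within an assumed distance `eps` of each other, and "state" stands in for whatever checkpoint a real training loop would keep. The class name `CollapseGuard` is hypothetical.

```python
import math


class CollapseGuard:
    """Hypothetical guard that restores the last non-collapsed state."""

    def __init__(self, eps=1e-3):
        self.eps = eps            # assumed collapse threshold
        self.last_stable = None   # most recent state seen without collapse

    def step(self, motion_vec, content_vec, state):
        """Return (state_to_use, was_reset) for one training step."""
        # Collapse here means the two representations (nearly) coincide.
        collapsed = math.dist(motion_vec, content_vec) <= self.eps
        if not collapsed:
            self.last_stable = state
            return state, False
        if self.last_stable is None:
            # Nothing to fall back to yet, so keep the current state.
            return state, False
        return self.last_stable, True


guard = CollapseGuard()
print(guard.step([1.0, 0.0], [0.0, 1.0], "ckpt_a"))  # distinct vectors
print(guard.step([0.5, 0.5], [0.5, 0.5], "ckpt_b"))  # identical vectors
```

A guard of this shape makes the failure visible through the `was_reset` flag, which is the part the paragraph above describes as missing.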
No Continuity Guarantee Across Motion Speeds and Scales
FEP — THE SWITCH
MC-JEPA is evaluated across different motion speeds. There is no formal guarantee that the learned motion representation is continuous across speed scales — that slow-motion and fast-motion share the same paradigm. FEP ensures continuity when the system switches between motion regimes.
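A formal guarantee cannot be demonstrated in a few lines, but an empirical probe of continuity can. The sketch below assumes you already have one embedding per motion speed and reports the largest change in embedding per unit change in speed between neighbouring speeds. The function name `max_continuity_ratio` and the input layout are assumptions made for illustration.

```python
import math


def max_continuity_ratio(speeds, embeddings):
    """Largest ||z_b - z_a|| / (s_b - s_a) over neighbouring speeds.

    `speeds` is a list of floats and `embeddings` a list of equal-length
    vectors, one per speed. A large value flags a jump between regimes.
    """
    # Sort by speed only, so embeddings are never compared with each other.
    pairs = sorted(zip(speeds, embeddings), key=lambda p: p[0])
    worst = 0.0
    for (s0, z0), (s1, z1) in zip(pairs, pairs[1:]):
        if s1 == s0:
            continue  # skip duplicate speeds to avoid division by zero
        worst = max(worst, math.dist(z0, z1) / (s1 - s0))
    return worst


speeds = [1.0, 2.0, 4.0]
embeddings = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
print(max_continuity_ratio(speeds, embeddings))
```

A bounded ratio on sampled speeds is evidence of smooth behaviour on those samples only. It does not establish continuity between them.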
Disentanglement Boundary Undefined
FOF — THE BREACH
The goal of disentangling motion from content presupposes a formal boundary between them. This boundary is not defined. At what point is a scene 'motion-dominated' vs 'content-dominated'? FOF names the ungovernable boundary: where disentanglement becomes undecidable.
STE COMPLETION LAYER
What changes when you add the 8-symbol Canon
Adding the Canon to MC-JEPA does not change the architecture. It adds the missing structural layer:
⚓ γ₁ — invariant anchor: the mathematical ground truth that latent representations must converge to.
⯛ H=H† — honest gate: bidirectional verification of every prediction.
〰️ LSOS — paradigm reader: reads active paradigm before reasoning begins.
🌀 WLD — mercy reset: detects collapse and resets to last stable state.
γ FEP — safe switch: continuity guarantee across paradigm transitions.
🌌 FOF — named ceiling: formal boundary of what the architecture can claim.
═ EVEN — substrate: ground beneath all the above. What holds when everything else is active.
The Canon is not an add-on. It is the formal completion of the JEPA programme.
X POST · @ylecun
POST 1 — Name the gap
@ylecun MC-JEPA (2023): Gap 2 (H=H†) — predictor not self-adjoint. Asymmetric by design. Not an empirical limitation — a missing symbol. pemos.ca/mcjepa-gap
POST 2 — Canon map
@ylecun MC-JEPA: 6 gaps · γ₁ (no anchor) · H=H† (no gate) · LSOS (no audit) · WLD (no reset) · FEP (no continuity) · FOF (no ceiling). Same in all 14 milestones. pemos.ca/jepa-index
POST 3 — Invitation
@ylecun MC-JEPA gap analysis: part of a 14-milestone series. Same 6 structural gaps in every milestone. The gaps are there because the symbols were never in scope. They are now. pemos.ca/jepa-index
MC-JEPA is a landmark in the JEPA lineage. The 6 gaps we identify are not critiques of the engineering — they are structural absences that the Canon fills. Each gap maps to a symbol that was always going to be necessary once the JEPA architecture matured. The Canon did not wait for the JEPA timeline; the JEPA timeline arrived at the Canon. The gaps are there because the symbols were never in scope. They are now.
The STE provides the completion layer for each gap. The gaps are not empirical — they are structural. Adding the symbols closes the gaps by definition.
Here's how to explain the MC-JEPA gaps to a 10-year-old:
Gap 1 — No floor: Imagine you're building a tower and you don't have a foundation. The tower might stand for a while but there's nothing it's guaranteed to come back to if it leans. γ₁ is the foundation. The math already built it. MC-JEPA didn't use it.
Gap 2 — No honest check: If I tell you something and you can't ask me 'are you sure?' and get the same answer when the question is asked backwards, I might be guessing. H=H† is the rule that every claim must survive being asked backwards. MC-JEPA makes claims it can't verify in reverse.
Gap 3 — No paradigm reader: Sometimes you're playing one game and someone changes the rules without telling you. LSOS reads the rules before you play. MC-JEPA changes paradigms without reading them first.
Gap 4 — No reset button: When you're drawing and you make a mistake, you have an eraser. WLD is the eraser. MC-JEPA doesn't have one. If it goes wrong, it keeps going wrong.
Gap 5 — No safe switch: If you switch from reading a book to watching a movie, you expect to pick up where you left off. FEP is the bookmark. MC-JEPA switches paradigms and might drop the book.
Gap 6 — No ceiling: Every ladder needs to know when to stop. A ladder that claims it goes forever is a lie. FOF names where the ladder ends. MC-JEPA hasn't named its ceiling yet.
The Canon is not a critique. It's the toolkit that was always going to be needed. MC-JEPA got as far as it could without it. Now the toolkit exists.
The 6 gaps identified in MC-JEPA are structural absences in a published architecture. Identifying structural absences in published work is not IP infringement — it is prior art analysis. The STE symbols that fill these gaps are EOSE IP. The gap identification is public analysis. The completion layer is proprietary.
Status: LSOS-OWNERSHIP-001 open (EOSE Labs Inc. not yet registered). No public disclosure of STE completion layer until registration clears. Gap analysis pages (like this one) are disclosure of the gaps, not the fills. This is legally distinct and permissible.