Matchstick Analogy — Independence of the Three Terms
For the same edit cost (EPC = 1), topology (Δβ1) and information (ΔH) vary independently.
This is not a metaphor — it is a concrete instance of the geDIG decomposition.
F = EPC − λ (ΔH + γ · Δβ1)
Case A (Insight): β1 = 0 → β1 = 1. EPC = 1 (+1 edge); Δβ1 = +1 (loop created); ΔH = +0.4 (new path options).
Case B (Routine): β1 = 0 → β1 = 0. EPC = 1 (+1 edge); Δβ1 = 0 (topology unchanged); ΔH = +0.3 (information gained).
Case C (Structural collapse): β1 = 1 → β1 = 0. EPC = 1 (−1 edge); Δβ1 = −1 (loop destroyed); ΔH = −0.2 (redundancy lost).
The Blind Spot of KL Divergence
All three cases have EPC = 1, yet Δβ1 takes the values +1, 0, and −1.
KL divergence sees only the measure (ΔH) and cannot distinguish Case A from Case B.
geDIG decomposes every change into three irreducible primitives:
metric, measure, and topology.
EPC is the metric term, ΔH the measure term, and Δβ1 the topology term.
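As a minimal sketch (not part of geDIG itself), the three cases above can be reproduced with networkx: β1 is computed as the cycle rank E − V + C, the ΔH values are simply the figure's illustrative numbers, and lam/gamma are placeholder weights.

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

def free_energy(epc, dH, dB1, lam=1.0, gamma=1.0):
    """F = EPC - lam * (dH + gamma * dB1); lam and gamma are placeholder weights."""
    return epc - lam * (dH + gamma * dB1)

# Case A: adding edge a-c to the path a-b-c closes a loop.
before_a = nx.path_graph(["a", "b", "c"])
after_a = before_a.copy()
after_a.add_edge("a", "c")

# Case B: adding pendant edge c-d to the same path leaves the topology unchanged.
before_b = nx.path_graph(["a", "b", "c"])
after_b = before_b.copy()
after_b.add_edge("c", "d")

# Case C: removing edge a-c from the triangle a-b-c destroys its loop.
before_c = nx.cycle_graph(["a", "b", "c"])
after_c = before_c.copy()
after_c.remove_edge("a", "c")

# The dH values are the figure's illustrative numbers, not derived quantities.
cases = {"A": (before_a, after_a, 0.4), "B": (before_b, after_b, 0.3), "C": (before_c, after_c, -0.2)}
for name, (before, after, dH) in cases.items():
    epc = 1  # exactly one edge added or removed in every case
    dB1 = beta1(after) - beta1(before)
    print(f"Case {name}: EPC={epc}  dB1={dB1:+d}  dH={dH:+.1f}  F={free_energy(epc, dH, dB1):+.1f}")
```

The same edit cost yields Δβ1 = +1, 0, −1 across the three cases, which is the independence the figure illustrates.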
Figure 2
The Pruning Paradox — Why Removing Structure Improves Performance
Removing the central hub node incurs an edit cost, but it eliminates three redundant cycles and reduces the structure to a topologically sufficient state. F decreases, which here means efficiency increases.
Dense (before pruning, 4 redundant cycles) → PRUNE → Sparse (after pruning, 1 sufficient cycle), F ↓↓.
Structural change: nodes 5 → 4 (−1); edges 8 → 4 (−4); EPC = 5 (1 node + 4 edges).
F decomposition: Δβ1: 4 → 1 (−3 cycles); ΔH: non-uniform → uniform (↓); F: drops sharply (↓↓).
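These numbers can be checked directly. A minimal sketch, assuming the dense structure is the hub-plus-square graph the panel describes (a square rim with a central hub wired to all four corners); node names are arbitrary:

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

# Dense structure from the panel: a square rim plus a central hub wired to every corner.
dense = nx.cycle_graph(["a", "b", "c", "d"])          # the square rim (4 edges)
dense.add_edges_from(("hub", rim) for rim in "abcd")  # 4 spokes to the hub

sparse = dense.copy()
sparse.remove_node("hub")                             # pruning: drop the hub and its 4 edges

epc = 1 + dense.degree("hub")                         # 1 node + 4 incident edges = 5
print("nodes:", dense.number_of_nodes(), "->", sparse.number_of_nodes())  # 5 -> 4
print("edges:", dense.number_of_edges(), "->", sparse.number_of_edges())  # 8 -> 4
print("beta1:", beta1(dense), "->", beta1(sparse), "  EPC:", epc)         # 4 -> 1, EPC = 5
```

The printed values match the structural-change and F-decomposition figures above.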
Why does removing structure improve performance?
A dense layer is fully connected (K5-like), with high β1.
Many independent cycles exist, but most are redundant paths carrying the same information.
Pruning reduces β1, eliminating redundant paths and concentrating signal on essential routes.
Overfitting occurs when redundant cycles memorize noise in the training data —
lowering β1 physically reduces the capacity to memorize noise.
Lottery Ticket Hypothesis: a "winning ticket" is a subgraph that minimizes β1
while maintaining β0 = 1 (global connectivity). Here, the square is the winning ticket inside the K5-like dense graph.
Graph ↔ neural network ↔ F behavior:
K5 (complete graph) ↔ dense layer (fully connected): high β1, high F, redundant.
Square (pruned) ↔ sparse layer (pruned): minimal β1, low F, efficient.
Tree (β1 = 0) ↔ over-pruned layer: zero redundancy, fragile.
The Principle of Optimal Pruning
Pruning down to β1 = 0 (a tree) eliminates all redundancy and leaves the structure fragile.
A graph with β1 ≫ 1 (fully connected) is too redundant and memorizes noise.
Optimal pruning is the operation of tuning β1 to "just enough redundancy",
which is equivalent to finding the subgraph that minimizes F (see the sketch below).
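A minimal sketch of the comparison above, using the same hub-plus-square graph as in Fig. 2; the single-edge-loss check is an illustrative stand-in for "fragility", not a geDIG quantity:

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

def survives_single_edge_loss(g: nx.Graph) -> bool:
    """True if the graph stays connected after removing any one edge."""
    for edge in list(g.edges()):
        trimmed = g.copy()
        trimmed.remove_edge(*edge)
        if not nx.is_connected(trimmed):
            return False
    return True

square = nx.cycle_graph(["a", "b", "c", "d"])         # pruned "winning ticket": beta1 = 1
tree = nx.path_graph(["a", "b", "c", "d"])            # over-pruned: beta1 = 0
dense = square.copy()
dense.add_edges_from(("hub", rim) for rim in "abcd")  # K5-like dense graph: beta1 = 4

for name, g in [("dense", dense), ("square", square), ("tree", tree)]:
    print(f"{name:6s} beta1={beta1(g)}  survives_single_edge_loss={survives_single_edge_loss(g)}")
# dense : redundant (many independent cycles carrying the same information)
# square: just enough redundancy -- still connected after any single edge loss
# tree  : fragile -- losing any one edge disconnects it
```

The square is the only structure here that combines a nonzero β1 with tolerance to a single edge failure, which is the "just enough redundancy" the text describes.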
Fig. 2: Removing the central hub from a K5-like dense graph (a dense layer) reduces β1 from 4 to 1 and sharply lowers F.
This provides a topological explanation of why pruning improves performance in neural networks,
and offers a geDIG reinterpretation of the Lottery Ticket Hypothesis.
Figure 3 — Speculative Analogy
The Matchstick Equation — Why One Move Creates Meaning
A single matchstick moved transforms a false equation into a true one.
The structural cost is minimal (EPC = 1), but the information gain is maximal —
this is the "aha" moment that geDIG's Information Gain captures.
IG = ΔH + γ · Δβ1
The cost of one edit that makes everything click.
FALSE: 6 + 4 = 4 (???) → MOVE 1 matchstick (IG ↑↑) → TRUE: 0 + 4 = 4 ✓
Structural cost: matchsticks moved = 1; EPC = 1 (remove + add); Δβ1 = 0 (both digits enclose one loop).
Information gain: ΔH: max → 0 (nonsense → true); IG: massive ↑↑; experience: "Aha!"
The Aha Moment as Information Gain
The matchstick puzzle captures the essence of geDIG's Information Gain.
Before the move: the equation is false — every interpretation fails,
entropy is maximal, the structure carries no valid meaning.
After the move: the equation is true — a unique valid interpretation emerges,
entropy collapses to zero, meaning crystallizes from the same material.
The structural cost was trivial (EPC = 1). The topology didn't even change (Δβ1 = 0).
But the information gain was maximal — one rearrangement turned noise into signal.
This is exactly what geDIG measures: not how much you changed, but
how much meaning emerged from the change.
The AG (Attention Gate) fires at the moment of the "aha" —
the DG (Differential Information Gain) quantifies why.
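As a toy sketch of that entropy collapse: the puzzle state is summarized here by a belief over candidate readings of the equation. The eight-hypothesis prior, the weight gamma, and the convention that ΔH is the entropy reduction are illustrative choices, not part of geDIG:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative belief over candidate readings of the display.
# Before the move, "6 + 4 = 4" admits no valid reading, so belief is spread
# uniformly over 8 hypothetical repairs (an arbitrary illustrative number).
p_before = [1 / 8] * 8
# After the move, "0 + 4 = 4" has a single valid reading.
p_after = [1.0]

dH = shannon_entropy(p_before) - shannon_entropy(p_after)  # entropy reduction (3 bits)
d_beta1 = 0   # the digit 6 and the digit 0 each enclose exactly one loop
epc = 1       # one matchstick moved (remove + add counted as a single edit)
gamma = 1.0   # placeholder weight
ig = dH + gamma * d_beta1

print(f"EPC={epc}  dH={dH:.2f} bits  d_beta1={d_beta1}  IG={ig:.2f}")
```

A trivial edit cost paired with a large entropy reduction is exactly the pattern the figure labels an "aha" moment.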
Matchstick puzzle ↔ geDIG ↔ Transformer inference:
Equation before the move (false) ↔ graph before the edge edit ↔ hidden state at a shallow layer (high H).
Moving one matchstick ↔ EPC = 1 (minimal edit) ↔ one layer's transformation.
Equation after the move (true) ↔ graph after the edit (IG maximized) ↔ hidden state at a deep layer (low H).
The "aha" moment ↔ AG fires → DG quantifies ↔ phase transition in the F trajectory.
A Speculative Bridge
This analogy is deliberately stretched — matchstick puzzles operate in a semantic space
that the current F decomposition does not formally address. However, the structural parallel
is suggestive:
In all three domains (puzzles, graphs, neural networks),
minimal structural rearrangement can trigger maximal information gain.
F measures the gap between the cost of change (EPC) and the value of change (IG).
When F drops sharply, structure has found its meaning.
Fig. 3: A matchstick puzzle as semantic analogy for Information Gain. Moving one matchstick (EPC = 1)
transforms a false equation into a true one, collapsing semantic entropy from maximum to zero.
The topology is unchanged (Δβ1 = 0), yet the meaning is entirely different —
illustrating that IG captures semantic transitions invisible to purely topological or metric measures.