Matchstick Analogy — Independence of the Three Terms
For the same edit cost (EPC = 1), topology (Δβ1) and information (ΔH) vary independently.
This is not a metaphor — it is a concrete instance of the geDIG decomposition.
F = EPC − λ (ΔH + γ · Δβ1)
Case A (Insight): β1 = 0 → β1 = 1. EPC = 1 (+1 edge); Δβ1 = +1 (loop created); ΔH = +0.4 (new path options).
Case B (Routine): β1 = 0 → β1 = 0. EPC = 1 (+1 edge); Δβ1 = 0 (topology unchanged); ΔH = +0.3 (information gained).
Case C (Structural collapse): β1 = 1 → β1 = 0. EPC = 1 (−1 edge); Δβ1 = −1 (loop destroyed); ΔH = −0.2 (redundancy lost).
The Blind Spot of KL Divergence
All three cases have EPC = 1, yet Δβ1 takes the values +1, 0, and −1.
KL divergence sees only the measure (ΔH) and cannot distinguish Case A from Case B.
geDIG decomposes every change into three irreducible primitives:
metric, measure, and topology.
EPC is the metric term, ΔH the measure term, and Δβ1 the topology term.
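As a minimal sketch (not part of geDIG itself), the three cases above can be reproduced with networkx: β1 is computed as the cycle rank E − V + C, the ΔH values are simply the figure's illustrative numbers, and lam/gamma are placeholder weights.

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

def free_energy(epc, dH, dB1, lam=1.0, gamma=1.0):
    """F = EPC - lam * (dH + gamma * dB1); lam and gamma are placeholder weights."""
    return epc - lam * (dH + gamma * dB1)

# Case A: adding edge a-c to the path a-b-c closes a loop.
before_a = nx.path_graph(["a", "b", "c"])
after_a = before_a.copy()
after_a.add_edge("a", "c")

# Case B: adding pendant edge c-d to the same path leaves the topology unchanged.
before_b = nx.path_graph(["a", "b", "c"])
after_b = before_b.copy()
after_b.add_edge("c", "d")

# Case C: removing edge a-c from the triangle a-b-c destroys its loop.
before_c = nx.cycle_graph(["a", "b", "c"])
after_c = before_c.copy()
after_c.remove_edge("a", "c")

# The dH values are the figure's illustrative numbers, not derived quantities.
cases = {"A": (before_a, after_a, 0.4), "B": (before_b, after_b, 0.3), "C": (before_c, after_c, -0.2)}
for name, (before, after, dH) in cases.items():
    epc = 1  # exactly one edge added or removed in every case
    dB1 = beta1(after) - beta1(before)
    print(f"Case {name}: EPC={epc}  dB1={dB1:+d}  dH={dH:+.1f}  F={free_energy(epc, dH, dB1):+.1f}")
```

The same edit cost yields Δβ1 = +1, 0, −1 across the three cases, which is the independence the figure illustrates.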
Figure 2
The Pruning Paradox — Why Removing Structure Improves Performance
Removing the central hub node incurs an edit cost, but it eliminates three redundant cycles and reduces the structure to a topologically sufficient state. F decreases, which here means efficiency increases.
Dense (before pruning, 4 redundant cycles) → PRUNE → Sparse (after pruning, 1 sufficient cycle), F ↓↓.
Structural change: nodes 5 → 4 (−1); edges 8 → 4 (−4); EPC = 5 (1 node + 4 edges).
F decomposition: Δβ1: 4 → 1 (−3 cycles); ΔH: non-uniform → uniform (↓); F: drops sharply (↓↓).
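These numbers can be checked directly. A minimal sketch, assuming the dense structure is the hub-plus-square graph the panel describes (a square rim with a central hub wired to all four corners); node names are arbitrary:

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

# Dense structure from the panel: a square rim plus a central hub wired to every corner.
dense = nx.cycle_graph(["a", "b", "c", "d"])          # the square rim (4 edges)
dense.add_edges_from(("hub", rim) for rim in "abcd")  # 4 spokes to the hub

sparse = dense.copy()
sparse.remove_node("hub")                             # pruning: drop the hub and its 4 edges

epc = 1 + dense.degree("hub")                         # 1 node + 4 incident edges = 5
print("nodes:", dense.number_of_nodes(), "->", sparse.number_of_nodes())  # 5 -> 4
print("edges:", dense.number_of_edges(), "->", sparse.number_of_edges())  # 8 -> 4
print("beta1:", beta1(dense), "->", beta1(sparse), "  EPC:", epc)         # 4 -> 1, EPC = 5
```

The printed values match the structural-change and F-decomposition figures above.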
Why does removing structure improve performance?
A dense layer is fully connected (K5-like), with high β1.
Many independent cycles exist, but most are redundant paths carrying the same information.
Pruning reduces β1, eliminating redundant paths and concentrating signal on essential routes.
Overfitting occurs when redundant cycles memorize noise in the training data —
lowering β1 physically reduces the capacity to memorize noise.
Lottery Ticket Hypothesis: a "winning ticket" is a subgraph that minimizes β1
while maintaining β0 = 1 (global connectivity). Here, the square is the winning ticket inside the K5-like dense graph.
Graph ↔ neural network ↔ F behavior:
K5 (complete graph) ↔ dense layer (fully connected): high β1, high F, redundant.
Square (pruned) ↔ sparse layer (pruned): minimal β1, low F, efficient.
Tree (β1 = 0) ↔ over-pruned layer: zero redundancy, fragile.
The Principle of Optimal Pruning
Pruning down to β1 = 0 (a tree) eliminates all redundancy and leaves the structure fragile.
A graph with β1 ≫ 1 (fully connected) is too redundant and memorizes noise.
Optimal pruning is the operation of tuning β1 to "just enough redundancy",
which is equivalent to finding the subgraph that minimizes F (see the sketch below).
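A minimal sketch of the comparison above, using the same hub-plus-square graph as in Fig. 2; the single-edge-loss check is an illustrative stand-in for "fragility", not a geDIG quantity:

```python
import networkx as nx

def beta1(g: nx.Graph) -> int:
    """First Betti number (cycle rank): edges - nodes + connected components."""
    return g.number_of_edges() - g.number_of_nodes() + nx.number_connected_components(g)

def survives_single_edge_loss(g: nx.Graph) -> bool:
    """True if the graph stays connected after removing any one edge."""
    for edge in list(g.edges()):
        trimmed = g.copy()
        trimmed.remove_edge(*edge)
        if not nx.is_connected(trimmed):
            return False
    return True

square = nx.cycle_graph(["a", "b", "c", "d"])         # pruned "winning ticket": beta1 = 1
tree = nx.path_graph(["a", "b", "c", "d"])            # over-pruned: beta1 = 0
dense = square.copy()
dense.add_edges_from(("hub", rim) for rim in "abcd")  # K5-like dense graph: beta1 = 4

for name, g in [("dense", dense), ("square", square), ("tree", tree)]:
    print(f"{name:6s} beta1={beta1(g)}  survives_single_edge_loss={survives_single_edge_loss(g)}")
# dense : redundant (many independent cycles carrying the same information)
# square: just enough redundancy -- still connected after any single edge loss
# tree  : fragile -- losing any one edge disconnects it
```

The square is the only structure here that combines a nonzero β1 with tolerance to a single edge failure, which is the "just enough redundancy" the text describes.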
Fig. 2: Removing the central hub from a K5-like dense graph (a dense layer) reduces β1 from 4 to 1 and sharply lowers F.
This provides a topological explanation of why pruning improves performance in neural networks,
and offers a geDIG reinterpretation of the Lottery Ticket Hypothesis.
Figure 3 — Speculative Analogy
The Matchstick Equation — Why One Move Creates Meaning
A single matchstick moved transforms a false equation into a true one.
The structural cost is minimal (EPC = 1), but the information gain is maximal —
this is the "aha" moment that geDIG's Information Gain captures.
IG = ΔH + γ · Δβ1
The cost of one edit that makes everything click.
FALSE: 6 + 4 = 4 (???) → MOVE 1 matchstick (IG ↑↑) → TRUE: 0 + 4 = 4 ✓
Structural cost: matchsticks moved = 1; EPC = 1 (remove + add); Δβ1 = 0 (both digits enclose one loop).
Information gain: ΔH: max → 0 (nonsense → true); IG: massive ↑↑; experience: "Aha!"
The Aha Moment as Information Gain
The matchstick puzzle captures the essence of geDIG's Information Gain.
Before the move: the equation is false — every interpretation fails,
entropy is maximal, the structure carries no valid meaning.
After the move: the equation is true — a unique valid interpretation emerges,
entropy collapses to zero, meaning crystallizes from the same material.
The structural cost was trivial (EPC = 1). The topology didn't even change (Δβ1 = 0).
But the information gain was maximal — one rearrangement turned noise into signal.
This is exactly what geDIG measures: not how much you changed, but
how much meaning emerged from the change.
The AG (Attention Gate) fires at the moment of the "aha" —
the DG (Differential Information Gain) quantifies why.
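As a toy sketch of that entropy collapse: the puzzle state is summarized here by a belief over candidate readings of the equation. The eight-hypothesis prior, the weight gamma, and the convention that ΔH is the entropy reduction are illustrative choices, not part of geDIG:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative belief over candidate readings of the display.
# Before the move, "6 + 4 = 4" admits no valid reading, so belief is spread
# uniformly over 8 hypothetical repairs (an arbitrary illustrative number).
p_before = [1 / 8] * 8
# After the move, "0 + 4 = 4" has a single valid reading.
p_after = [1.0]

dH = shannon_entropy(p_before) - shannon_entropy(p_after)  # entropy reduction (3 bits)
d_beta1 = 0   # the digit 6 and the digit 0 each enclose exactly one loop
epc = 1       # one matchstick moved (remove + add counted as a single edit)
gamma = 1.0   # placeholder weight
ig = dH + gamma * d_beta1

print(f"EPC={epc}  dH={dH:.2f} bits  d_beta1={d_beta1}  IG={ig:.2f}")
```

A trivial edit cost paired with a large entropy reduction is exactly the pattern the figure labels an "aha" moment.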
Matchstick puzzle ↔ geDIG ↔ Transformer inference:
Equation before the move (false) ↔ graph before the edge edit ↔ hidden state at a shallow layer (high H).
Moving one matchstick ↔ EPC = 1 (minimal edit) ↔ one layer's transformation.
Equation after the move (true) ↔ graph after the edit (IG maximized) ↔ hidden state at a deep layer (low H).
The "aha" moment ↔ AG fires → DG quantifies ↔ phase transition in the F trajectory.
A Speculative Bridge
This analogy is deliberately stretched — matchstick puzzles operate in a semantic space
that the current F decomposition does not formally address. However, the structural parallel
is suggestive:
In all three domains (puzzles, graphs, neural networks),
minimal structural rearrangement can trigger maximal information gain.
F measures the gap between the cost of change (EPC) and the value of change (IG).
When F drops sharply, structure has found its meaning.
Fig. 3: A matchstick puzzle as semantic analogy for Information Gain. Moving one matchstick (EPC = 1)
transforms a false equation into a true one, collapsing semantic entropy from maximum to zero.
The topology is unchanged (Δβ1 = 0), yet the meaning is entirely different —
illustrating that IG captures semantic transitions invisible to purely topological or metric measures.