A structural fitness score for knowledge graphs —
can one equation decide when to restructure?
Three independent structures:
$\Delta \mathrm{EPC} \approx$ metric (distance), $\Delta H \approx$ measure (probability), $\Delta \beta_1 \approx$ topology (loops)
$\mathcal{F} < 0$ means information gain exceeds structural cost — the system should commit the change.
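As a minimal sketch, the accept/reject rule might look like the code below. The source does not spell out how the three terms combine, so the sign conventions, the function names, and the example values here are all assumptions, not the project's definition:

```python
def fitness(delta_epc, delta_h, delta_beta1, lam=1.0, gamma=1.0):
    """One plausible form of the gauge F: structural cost (edit path
    cost plus a topology term) minus weighted information gain.
    The sign on the beta_1 term is an assumption."""
    return delta_epc + gamma * delta_beta1 - lam * delta_h

def should_commit(delta_epc, delta_h, delta_beta1, lam=1.0, gamma=1.0):
    # F < 0: information gain exceeds structural cost -> commit the change.
    return fitness(delta_epc, delta_h, delta_beta1, lam, gamma) < 0

# A unit-cost edit that buys a large entropy reduction is accepted.
print(should_commit(delta_epc=1.0, delta_h=2.0, delta_beta1=0))  # -> True
```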
Maze: 60% → 98% goal-reach (15×15).
Transformer: F tracks model quality across 8 models.
Why does $\mathcal{F}$ need three terms? Consider three edits with the same edit cost but completely different outcomes.
All three cases have $\Delta\mathrm{EPC} = 1$, yet $\Delta\beta_1$ diverges to +1, 0, and −1. KL divergence sees only $\Delta H$ and cannot distinguish Case A from Case B; $\mathcal{F}$ sees all three.
(λ = 1, γ = 1)
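The divergence of $\Delta\beta_1$ under equal edit cost can be checked concretely with networkx. The base graph and the three unit edits below are illustrative choices of mine, not necessarily the source's Cases A–C:

```python
import networkx as nx

def betti_1(G):
    # First Betti number (independent loops) of an undirected graph:
    # edges - nodes + connected components.
    return (G.number_of_edges() - G.number_of_nodes()
            + nx.number_connected_components(G))

# Base: triangle 0-1-2 with a pendant node 3; beta_1 = 1.
base = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])

A = base.copy(); A.add_edge(0, 3)     # one edit closing a second loop
B = base.copy(); B.add_edge(3, 4)     # one edit attaching a new leaf node
C = base.copy(); C.remove_edge(0, 2)  # one edit breaking the existing loop

for name, G in [("A", A), ("B", B), ("C", C)]:
    print(name, betti_1(G) - betti_1(base))
# -> A 1, B 0, C -1: same edit cost, three different topological outcomes.
```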
Real-time simulation of the gauge $\mathcal{F}$ as the graph grows.
Click anywhere in the graph to inject a new query node and see whether it triggers an Insight (DG) or an Ambiguity (AG).
AG (Ambiguity Gauge): 0-hop error (High Cost).
DG (Discovery Gauge): Multi-hop shortcut (Insight Found).
Where can a single thermodynamic gauge make a difference?
F acts as an accept/reject gate for retrieved information. AG detects novelty; DG validates structural integration. Tested on HotPotQA with real LLM inference.
A partial-observation maze agent builds a persistent knowledge graph. F decides when to explore (AG) vs. exploit (DG). Wake-Sleep-Wake cycle with three-layer search.
Layer-by-layer $\mathcal{F}$ decomposition across hidden states. Tests whether F tracks model quality as a structural signature.
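The actual decomposition used in this experiment is not reproduced here; the sketch below only illustrates the per-layer $\Delta H$ ingredient, under the assumption that each layer's hidden state is mapped to a distribution by a softmax:

```python
import numpy as np

def layer_entropies(hidden_states):
    """Per-layer Shannon entropy of softmax-normalized hidden states.
    hidden_states: array of shape (num_layers, dim). This covers only
    the Delta-H term of F, not the full decomposition."""
    z = hidden_states - hidden_states.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

rng = np.random.default_rng(0)
H = layer_entropies(rng.normal(size=(12, 64)))  # e.g. 12 layers, 64-dim
delta_h = np.diff(H)                            # layer-to-layer Delta H
print(delta_h.shape)  # -> (11,)
```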
Two gates govern intelligent processing. One equation drives both.
"Is this surprising?" — 0-hop novelty detection. When prediction diverges from input, AG fires and triggers processing.
Computational analogy: noradrenaline
"Does $\mathcal{F}$ decrease?" — Multi-hop structural validation. If restructuring reduces F, the system commits the change.
Computational analogy: dopamine
*The neurotransmitter correspondence is a computational analogy, not a physiological claim.
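The two gates can be sketched as simple threshold tests. Both the distance-based novelty measure and the threshold below are illustrative assumptions, not the project's implementation:

```python
import numpy as np

def ambiguity_gate(prediction, observation, threshold=1.0):
    """AG: 0-hop novelty check. Fires when the prediction diverges from
    the input by more than a threshold (Euclidean distance assumed)."""
    return float(np.linalg.norm(prediction - observation)) > threshold

def discovery_gate(f_before, f_after):
    """DG: multi-hop structural validation. Commit the restructuring
    only if it decreases the gauge F."""
    return f_after < f_before

# AG fires on a surprising input, triggering processing; DG then decides
# whether the resulting restructuring is kept.
surprised = ambiguity_gate(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
commit = discovery_gate(f_before=0.2, f_after=-0.4)
print(surprised, commit)  # -> True True
```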
geDIG provides an operational correspondence between the Free Energy Principle (minimizing surprise) and Minimum Description Length (maximizing compression).
*This is an operational analogy, not a formal proof of equivalence.
Empirical tests of F across different domains.
| Experiment | Scale | Key Result | Status |
|---|---|---|---|
| Maze 15×15 | 12 seeds, 250 steps | 98% goal-reach (baseline ~60%) | Reproducible |
| Maze 25×25 | Active | Graph-persistent DG + 10D vector extension | In progress |
| HotPotQA | dev-500, GPT-4o-mini | EM=38.0%, F1=53.7% | Archived |
| Transformer F-decomp | 8 models | GPT series: $\Delta R^2_{\text{struct}}$ improves with scale | Preliminary |
Reproduce (Maze 15×15)
Requires a Python virtual environment (.venv) with networkx, numpy, etc.
See experiments/maze/README.md for full CLI reference.
Papers, code, and open questions.
"geDIG: Gauge what Knowledge Graph needs"
v6.0 Draft
miyauchikazuyoshi/InsightSpike-AI
Research framework for the thermodynamics of intelligence
Open Research Questions
We welcome collaboration on these specific questions:
How to engage: open an Issue, PR, or DM on X: @kazuyoshim5436.
Citation (BibTeX)
See also: geDIG spec, Matchstick Figure.