The Theory of Persistence

The fundamental principle of persistence

PT starts from an almost childlike idea: when a system has a fixed number of possibilities, its budget of distinctions is fixed. Part of that budget may become structure, another part may remain uncertainty, but the total does not disappear.

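GFT is the identity that formalizes this budget. For any distribution $P$ over $m$ states, the total capacity $\log_2 m$ decomposes exactly into persistent information, $D_{\mathrm{KL}}(P\|U_m)$, and residual entropy, $H(P)$, where $U_m$ is the uniform distribution over the same $m$ states.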

$\log_2 m = D_{\mathrm{KL}}(P\|U_m)+H(P)$
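The identity can be checked numerically. The following is a minimal sketch in Python, not part of the original page: the function names and the example distribution are hypothetical, chosen only for illustration.

```python
import math

def entropy_bits(p):
    """Shannon entropy H(P) in bits; terms with p_r = 0 contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_to_uniform_bits(p):
    """D_KL(P || U_m) in bits, where U_m is uniform over len(p) states."""
    m = len(p)
    return sum(x * math.log2(x * m) for x in p if x > 0)

# Hypothetical example distribution over m = 4 states (not taken from the text).
p = [0.7, 0.1, 0.1, 0.1]

capacity = math.log2(len(p))          # total budget, log2(m)
persistence = kl_to_uniform_bits(p)   # structured part
residual = entropy_bits(p)            # part that remains uncertainty

print(f"capacity    = {capacity:.6f} bits")
print(f"persistence = {persistence:.6f} bits")
print(f"entropy     = {residual:.6f} bits")
print(f"sum         = {persistence + residual:.6f} bits")
```

Any other list of non-negative numbers summing to 1 can be substituted for `p`; the last two printed lines should agree with the first up to floating-point rounding.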
Intuition

One budget, two ways to occupy it

Picture a game with several possible squares. The more squares there are, the more distinctions you need to say exactly where you are. That total number of distinctions is the system’s budget.

Mathematically, that budget is written $\log_2 m$ because it is counted in yes/no questions, that is, in bits: singling out one square among $m$ takes about $\log_2 m$ such questions. But the idea behind it is simply this: the system has a total amount of distinction available.

One part becomes form: it distinguishes, constrains, selects. That is persistence. The other part remains open, dispersed, undecided. That is entropy.

The fundamental principle says that, for a fixed number of possibilities, the two parts compensate exactly. When structure gains one bit, available entropy loses one bit, and conversely. PT places that conservation at the center of the theory.

Information budget of the sieve

[Figure: stacked bars (persistence + entropy) on an axis from 0 to 8 bits, for m=2 (1.00), m=6 (2.58), m=30 (4.91) and m=210 (7.71). Each bar shows the same total budget: persistence + entropy.]

capacity

Total capacity is fixed

For $m$ possibilities, there is a total budget of distinctions. In technical language, that budget is $\log_2 m$ bits.
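The capacities quoted in the chart above (1.00, 2.58, 4.91 and 7.71 bits) are simply $\log_2 m$ for the listed values of $m$. A short Python sketch reproduces them; the variable names are illustrative only.

```python
import math

# The values of m listed in the chart: 2, 6, 30, 210.
sizes = [2, 6, 30, 210]

# Capacity in bits is log2(m) for each system size.
capacities = {m: math.log2(m) for m in sizes}

for m, bits in capacities.items():
    print(f"m={m:>3}  capacity={bits:.2f} bits")
```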

persistence

Structure is distance from uniformity

$D_{\mathrm{KL}}$ measures what distinguishes the actual distribution from the maximally undifferentiated state.

entropy

The remainder stays as uncertainty

$H(P)$ measures what remains dispersed, uncrystallized, not selected as persistent structure.

Standard

Why this is a fundamental principle

The identity is an algebraic tautology: it holds for every distribution over $m$ states. Its force therefore does not come from technical difficulty, but from the conservation language it gives to PT.

In the sieve, $D_{\mathrm{KL}}$ becomes the structured part: what has been selected, constrained, made distinct. $H(P)$ becomes the part still dispersed. Their sum remains the same information cap.

$dD_{\mathrm{KL}}+dH=0,\quad \frac{dH}{dD_{\mathrm{KL}}}=-1$
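The compensation also holds for finite changes, not only infinitesimal ones, as long as $m$ stays fixed. The sketch below (hypothetical helper names and an arbitrary example distribution, assumed for illustration) moves a little probability mass between two states and compares the two variations.

```python
import math

def entropy_bits(p):
    """Shannon entropy H(P) in bits; terms with p_r = 0 contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_to_uniform_bits(p):
    """D_KL(P || U_m) in bits, where U_m is uniform over len(p) states."""
    m = len(p)
    return sum(x * math.log2(x * m) for x in p if x > 0)

def shift_mass(p, src, dst, eps):
    """Return a copy of p with eps of probability moved from src to dst."""
    q = list(p)
    q[src] -= eps
    q[dst] += eps
    return q

# Hypothetical starting distribution over m = 4 states.
p = [0.4, 0.3, 0.2, 0.1]
# Move mass from the least likely state to the most likely one (sharpens P).
q = shift_mass(p, src=3, dst=0, eps=0.05)

delta_d = kl_to_uniform_bits(q) - kl_to_uniform_bits(p)
delta_h = entropy_bits(q) - entropy_bits(p)

print(f"change in D_KL = {delta_d:+.6f} bits")
print(f"change in H    = {delta_h:+.6f} bits")
print(f"sum of changes = {delta_d + delta_h:+.2e}")
```

Because both distributions live on the same four states, the two changes cancel up to rounding: whatever the structured part gains, the entropy loses.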

Exact status

  • GFT-ID. Exact algebraic identity.
  • Arrow. Fixed-$m$ identity: variations compensate.
  • Bekenstein. Immediate theorem: since $H(P)\ge 0$, $D_{\mathrm{KL}}\le\log_2 m$.
  • Physical reading. Interpretive bridge treated elsewhere in the monograph.
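The bound $D_{\mathrm{KL}}\le\log_2 m$ can be illustrated at its two extremes. In this sketch (the function name, the choice $m=6$ and the skewed distribution are assumptions made for the example), the uniform distribution spends none of the budget on structure and a point mass spends all of it.

```python
import math

def kl_to_uniform_bits(p):
    """D_KL(P || U_m) in bits, where U_m is uniform over len(p) states."""
    m = len(p)
    return sum(x * math.log2(x * m) for x in p if x > 0)

m = 6
bound = math.log2(m)

distributions = {
    "uniform": [1 / m] * m,                         # no structure at all
    "skewed": [0.5, 0.2, 0.1, 0.1, 0.05, 0.05],     # hypothetical intermediate case
    "point mass": [1.0] + [0.0] * (m - 1),          # all mass on one state
}

results = {name: kl_to_uniform_bits(p) for name, p in distributions.items()}

for name, d in results.items():
    print(f"{name:<10}  D_KL = {d:.4f} bits  (bound = {bound:.4f})")
```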
Proof

Complete proof

Let $P=(p_0,\ldots,p_{m-1})$ be a probability distribution and let $U_m=(1/m,\ldots,1/m)$ be the uniform distribution. By definition:

$D_{\mathrm{KL}}(P\|U_m)=\sum_r p_r\log_2\frac{p_r}{1/m}$

Split the logarithm of the quotient, then use $\sum_r p_r=1$ (terms with $p_r=0$ contribute zero, by the usual convention $0\log_2 0=0$):

$D_{\mathrm{KL}}=\sum_r p_r(\log_2 p_r+\log_2 m)=\log_2 m+\sum_r p_r\log_2 p_r$

Since $H(P)=-\sum_r p_r\log_2 p_r$, we obtain:

$D_{\mathrm{KL}}=\log_2 m-H(P)\quad\Longleftrightarrow\quad \log_2 m=D_{\mathrm{KL}}+H(P)$
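
As a sanity check of the derivation, take an illustrative distribution over $m=4$ states, $P=(1/2,1/4,1/8,1/8)$, chosen here for round numbers and not taken from the text. Then $\log_2 m=2$ and:

$H(P)=\tfrac12\cdot 1+\tfrac14\cdot 2+\tfrac18\cdot 3+\tfrac18\cdot 3=1.75$

$D_{\mathrm{KL}}(P\|U_4)=\tfrac12\log_2 2+\tfrac14\log_2 1+\tfrac18\log_2\tfrac12+\tfrac18\log_2\tfrac12=0.25$

$D_{\mathrm{KL}}(P\|U_4)+H(P)=0.25+1.75=2=\log_2 4$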

What this page claims

GFT is first of all a universal identity. It becomes a PT reading principle once its two terms are applied to the sieve and to residue distributions.

What it does not claim

The identity alone does not prove a new law about primes. Its depth comes from its role in the PT chain, not from the algebra.