The Theory of Persistence

PT mathematics / master identity

The fundamental principle of persistence

PT starts from a simple idea: when a constraint acts, not everything disappears. One part becomes structure; another part remains dispersed. The first principle says that these two parts are not independent: they draw on the same total amount of distinction.

GFT is the identity that formalizes this budget. For any distribution $P$ over $m$ states, the total capacity $\log_2 m$ decomposes exactly into persistent information and residual entropy.

$\log_2 m = D_{\mathrm{KL}}(P\|U_m)+H(P)$
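The identity can be checked numerically on any finite distribution. The following is a minimal sketch, not part of PT itself: the function name `gft_terms` and the example distribution are chosen here purely for illustration, and zero-probability states are skipped under the usual convention $0\log 0=0$.

```python
import math

def gft_terms(p):
    """Return (D_KL(P || U_m), H(P), log2 m) in bits for a finite distribution p."""
    m = len(p)
    if not math.isclose(sum(p), 1.0):
        raise ValueError("probabilities must sum to 1")
    # States with p_r = 0 contribute nothing (convention 0 * log 0 = 0).
    entropy = sum(-x * math.log2(x) for x in p if x > 0)
    d_kl = sum(x * math.log2(x * m) for x in p if x > 0)
    return d_kl, entropy, math.log2(m)

# Illustrative distribution over m = 4 states (an assumption, not from the text).
d_kl, entropy, capacity = gft_terms([0.5, 0.25, 0.125, 0.125])
print(d_kl, entropy, capacity)  # 0.25 1.75 2.0
```

Here 0.25 bits of the 2-bit capacity are persistent structure and 1.75 bits remain as entropy.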

Plain language

Before the formula

Imagine a sorting process. At first, many things are possible. Then a rule acts: it removes, separates, or orients. At the end, there are two things to read: what has taken a stable form, and what remains still open.

For PT, this is persistence. It does not mean “staying the same without moving.” It means surviving an admissible constraint while keeping a readable identity. The technical formula comes later; it counts how the total is split between what has taken form and what remains open.

A constraint

A constraint is a rule that prevents everything from happening. In a sieve, some numbers disappear; in a physical system, some configurations become impossible.

What persists

What persists is not what merely stays motionless. It is what passes through a constraint without losing its structural identity.

What disperses

Everything that is not stabilized remains as uncertainty, open choice, or noise. PT calls this the residual entropy.

The total budget

The key point is that structure and dispersion do not add freely. They share a single budget of distinction.

Intuition

One budget, two ways to occupy it

The word “information” can sound intimidating, but here it first means: how many differences do you need to recognize a situation? If everything looks the same, there is little to distinguish. If a form appears, something becomes recognizable.

The first principle says that this recognizability has a cost. The sharper a structure becomes, the less blur remains available inside the same system. If the structure dissolves, the blur rises again.

This is the thread linking the other pages: in cosmogony, the cascade sorts thresholds; in the periodic table, shells and blocks are stabilized forms; here, we explain the common accounting rule.

Picture a game with several possible squares. The more squares there are, the more distinctions you need to say exactly where you are. That total number of distinctions is the system’s budget.

Mathematically, that budget is written $\log_2(m)$ because it is counted in yes/no questions, hence in bits. But the idea behind it is simply this: the system has a total amount of distinction available.
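The link between squares and yes/no questions can be sketched in a few lines. This is an illustrative sketch only: the helper `questions_needed` and the halving strategy are assumptions made here to show why the count is a base-2 logarithm; they are not taken from PT.

```python
import math

def questions_needed(m, target):
    """Count yes/no questions used to locate `target` among squares 0..m-1."""
    lo, hi = 0, m              # remaining candidates are lo, ..., hi - 1
    asked = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        asked += 1             # question: "is the square numbered below mid?"
        if target < mid:
            hi = mid
        else:
            lo = mid
    return asked

m = 8
counts = [questions_needed(m, t) for t in range(m)]
print(counts, math.log2(m))    # every square takes 3 questions; log2(8) = 3.0
```

When $m$ is not a power of two, the count depends on the square: for $m=6$ this halving strategy uses 2 or 3 questions, which brackets $\log_2 6\approx 2.58$.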

One part becomes form: it distinguishes, constrains, selects. That is persistence. The other part remains open, dispersed, undecided. That is entropy.

The fundamental principle says that, for a fixed number of possibilities, the two parts compensate exactly. When structure gains one bit, available entropy loses one bit, and conversely. PT places that conservation at the center of the theory.
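The compensation can be watched on the smallest possible system. The sketch below assumes a two-state system (budget of 1 bit) whose bias toward one state is increased step by step; the function name `split_budget` and the chosen bias values are illustrative, not part of PT.

```python
import math

def split_budget(p):
    """Return (persistence, entropy) in bits for a distribution p over len(p) states."""
    m = len(p)
    entropy = sum(-x * math.log2(x) for x in p if x > 0)
    persistence = sum(x * math.log2(x * m) for x in p if x > 0)
    return persistence, entropy

# Two states (m = 2, budget = 1 bit), from fully open to fully decided.
rows = []
for bias in (0.5, 0.75, 0.9, 1.0):
    persistence, entropy = split_budget([bias, 1.0 - bias])
    rows.append((bias, persistence, entropy))
    print(f"bias={bias:.2f}  persistence={persistence:.3f}  "
          f"entropy={entropy:.3f}  sum={persistence + entropy:.3f}")
```

As the bias grows, persistence rises from 0 to 1 bit while entropy falls from 1 bit to 0, and the sum stays at 1 bit on every row.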

Read it in four gestures

01. possibilities: what could happen

02. constraint: what sorts

03. persistence: what keeps a form

04. entropy: what remains open

Information budget of the sieve

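[Figure: bar chart, vertical axis from 0 to 8 bits, each bar split into persistence and entropy. Total capacities shown: m=2, 1.00 bits; m=6, 2.58 bits; m=30, 4.91 bits; m=210, 7.71 bits. Each bar shows the same total budget: persistence + entropy.]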
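The capacities in the figure can be reproduced directly from $\log_2 m$. The sketch below assumes that the plotted values of $m$ are the products of the first primes, 2, 2·3, 2·3·5 and 2·3·5·7; this matches the numbers 2, 6, 30 and 210, but the reading is made here for illustration only.

```python
import math

# Assumption for illustration: the m values in the figure are the products
# of the first sieving primes (2, 2*3, 2*3*5, 2*3*5*7).
primes = [2, 3, 5, 7]
m = 1
capacities = {}
for p in primes:
    m *= p
    capacities[m] = math.log2(m)
    print(f"m={m:<4d} capacity={capacities[m]:.2f} bits")
```

The printed capacities are 1.00, 2.58, 4.91 and 7.71 bits, the same totals as in the figure.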
capacity

Total capacity is fixed

For $m$ possibilities, there is a total budget of distinctions. In technical language, that budget is $\log_2 m$ bits.

persistence

Structure is distance from uniformity

$D_{\mathrm{KL}}$ measures what distinguishes the actual distribution from the maximally undifferentiated state.

entropy

The remainder stays as uncertainty

$H(P)$ measures what remains dispersed, uncrystallized, not selected as persistent structure.
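A small worked case shows the three quantities side by side. The example is chosen here for illustration: take $m=4$ states and a constraint that leaves only two of them possible, each equally likely, so $P=(1/2,1/2,0,0)$. With the convention $0\log 0=0$:

$D_{\mathrm{KL}}(P\|U_4)=2\cdot\tfrac12\log_2\frac{1/2}{1/4}=1,\quad H(P)=-2\cdot\tfrac12\log_2\tfrac12=1,\quad \log_2 4=2$

The constraint has answered one of the two yes/no questions (1 bit of persistence), and one question remains open (1 bit of entropy).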

Standard

Why this is a fundamental principle

The identity is an algebraic tautology: it holds for any distribution over $m$ states. Its force therefore comes not from technical difficulty but from the conservation language it gives to PT.

In the sieve, $D_{\mathrm{KL}}$ becomes the structured part: what has been selected, constrained, made distinct. $H(P)$ becomes the part still dispersed. Their sum always equals the same information cap, $\log_2 m$.

$dD_{\mathrm{KL}}+dH=0,\quad \frac{dH}{dD_{\mathrm{KL}}}=-1$
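The differential form follows in one step from the identity. As a sketch, assume $P$ varies along a smooth one-parameter family $P(t)$ at fixed $m$ (the parameter $t$ is introduced here only for illustration). Since $\log_2 m$ does not depend on $t$:

$\frac{d}{dt}\left[D_{\mathrm{KL}}(P(t)\|U_m)+H(P(t))\right]=\frac{d}{dt}\log_2 m=0$

Any increase in $D_{\mathrm{KL}}$ is therefore matched by an equal decrease in $H$, which gives the slope $-1$.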

Exact status

  • GFT-ID. Exact algebraic identity.
  • Arrow. Fixed-$m$ identity: variations compensate.
  • Bekenstein. Immediate theorem: $D_{\mathrm{KL}}\le\log_2 m$.
  • Physical reading. Interpretive bridge treated elsewhere in the monograph.

Proof

Complete proof

Let $P=(p_0,\ldots,p_{m-1})$ be a probability distribution and let $U_m=(1/m,\ldots,1/m)$ be the uniform distribution. By definition:

$D_{\mathrm{KL}}(P\|U_m)=\sum_r p_r\log_2\frac{p_r}{1/m}$

Distribute the logarithm, then use $\sum_r p_r=1$:

$D_{\mathrm{KL}}=\sum_r p_r(\log_2 p_r+\log_2 m)=\log_2 m+\sum_r p_r\log_2 p_r$

Since $H(P)=-\sum_r p_r\log_2 p_r$, we obtain:

$D_{\mathrm{KL}}=\log_2 m-H(P)\quad\Longleftrightarrow\quad \log_2 m=D_{\mathrm{KL}}+H(P)$
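The bound $D_{\mathrm{KL}}\le\log_2 m$ follows from this result in one line, using only the standard fact that Shannon entropy is non-negative, $H(P)\ge 0$:

$D_{\mathrm{KL}}(P\|U_m)=\log_2 m-H(P)\le\log_2 m$

Equality holds exactly when $H(P)=0$, that is, when $P$ puts all its weight on a single state. At the other end, $D_{\mathrm{KL}}=0$ exactly when $P=U_m$, where $H(P)=\log_2 m$.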

What this page claims

GFT is first a universal identity. It becomes a PT reading principle when its two terms are applied to the sieve and to residue distributions.

What it does not claim

The identity alone does not prove a new law about primes. Its depth comes from its role in the PT chain, not from the algebra.