The Theory of Persistence
Essay · Plain · 6 min

What is persistence?

The word "persistence" has a precise technical meaning in PT: it is the structured part of information, in bits, that resists mixing. Here is how it is defined and why it is conserved.

Go deeper: GFT, T2

The question

When a signal goes through a noisy channel, part gets through and part is lost. When a physical system evolves, some structures persist, others dilute. When you shuffle a deck, the initial order fades, but our knowledge of it is more subtle.

In all three situations, we have an intuition of “what persists”. PT gives it an operational definition.

The definition

In plain words, persistence is the part of a system that remains readable after mixing, noise, or constraint. If everything becomes perfectly random, there is no persistence left. If a form remains recognizable, then part of the information has persisted.

A very ordinary image helps: a pebble. The sea does not add its shape from the outside; it removes what does not hold. What remains is not just any residue, but the stable trace of a long filtering. In PT, the discrete layer often has this role: it marks the remarkable positions where a continuous mechanics under constraint becomes stable and readable.

The technical formula says the same thing with three terms:

The persistence of a distribution P on m states is:

D_{KL}(P \,\|\, U_m) = \log_2 m - H(P),

where H(P) is Shannon entropy and U_m the uniform distribution on m states. Here, m is only the number of possibilities, and \log_2(m) counts the total budget in bits: in other words, how many binary distinctions it would take to identify a state. The more P deviates from chance, the larger D_{KL} becomes.

A uniform distribution has D_{KL} = 0 — no persistence, pure noise. A distribution concentrated on a single state has D_{KL} = \log_2 m — the entire informational capacity is structured.
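These two extremes are easy to check numerically. A minimal sketch (the helper names `entropy` and `persistence` are mine, not PT's):

```python
import math

def entropy(p):
    """Shannon entropy of a distribution, in bits; 0*log(0) taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def persistence(p):
    """D_KL(P || U_m) = log2(m) - H(P), in bits."""
    m = len(p)
    return math.log2(m) - entropy(p)

uniform = [1/8] * 8          # pure noise on m = 8 states
delta   = [1.0] + [0.0] * 7  # fully concentrated on one state

print(persistence(uniform))  # 0.0 — no persistence
print(persistence(delta))    # 3.0 — the whole budget log2(8)
```

Any distribution in between lands strictly between 0 and \log_2 m bits.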

That is the Gap Fundamental Theorem (GFT):

\log_2 m \;=\; D_{KL}(P \,\|\, U_m) \;+\; H(P).

This identity is exact, not approximate. For any distribution, on any number of states. It is the fundamental principle of persistence: the total budget of distinctions is conserved, partitioned between persistence and entropy.
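The exactness can be verified by computing the two right-hand terms independently, D_{KL} from its own definition rather than as \log_2 m - H. A minimal sketch (function name is mine):

```python
import math

def gft_terms(p):
    """Return (log2 m, D_KL(P || U_m), H(P)), each computed independently."""
    m = len(p)
    h = -sum(x * math.log2(x) for x in p if x > 0)          # Shannon entropy
    dkl = sum(x * math.log2(x * m) for x in p if x > 0)     # KL vs uniform, by definition
    return math.log2(m), dkl, h

# An arbitrary distribution on m = 5 states
budget, dkl, h = gft_terms([0.5, 0.25, 0.125, 0.0625, 0.0625])
assert abs(budget - (dkl + h)) < 1e-12  # the identity holds to machine precision
```

No approximation is involved: the identity is algebra, so it survives any choice of P and m.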

Why it is central

The conservation \log_2 m = D_{KL} + H is an algebraic identity, not a physical law. But it has a strong physical consequence: knowing two of the three quantities (\log_2 m, D_{KL}, H) determines the third. No double counting possible.

That is precisely what prevents cheating in PT by adding an extra parameter to compensate for an error. Any correction to D_{KL} must be mirrored in H. Any redefinition of H must shift D_{KL} by the same amount. The balance is exactly conserved.

In the language of binary codes: \log_2 m is the optimal description length of a state. D_{KL} is what we save on that description thanks to the structure of P. H is what we still must spend because P is not concentrated.

Physical persistence

When P is identified with the distribution of gaps between consecutive primes, D_{KL} becomes a physical quantity, subject to definite bounds.

One such bound is the Shannon cap of PT: no prime can carry more than one bit. Three active primes, three bits — exactly the informational content of a Standard Model particle with its gauge quantum numbers.

An analogy

Take a text. Its bit length is \log_2 m — the raw capacity. Its entropy H measures how unpredictable the words are. Its persistence D_{KL} measures what is structured: grammar, redundancies, patterns.

A random text has D_{KL} = 0, H = \log_2 m: useless and unreadable.

A hyper-structured text (“aaaaaa…”) has D_{KL} = \log_2 m, H = 0: predictable, no new information.

An interesting text lives in between, with a partition between the two. PT says physics also lives in between, and that partition follows the arithmetic cascade of the sieve.
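The text analogy can be made concrete by measuring the character distribution of a string over a fixed alphabet. A minimal sketch (the function and the sample strings are mine, for illustration only):

```python
import math
from collections import Counter

def char_persistence(text, alphabet):
    """Persistence D_KL of a text's character distribution vs uniform."""
    counts = Counter(text)
    n = len(text)
    p = [counts.get(c, 0) / n for c in alphabet]
    h = -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(len(alphabet)) - h

alphabet = "abcdefgh"  # m = 8 symbols, so the budget is log2(8) = 3 bits

print(char_persistence("aaaaaaaa", alphabet))      # 3.0 — hyper-structured
print(char_persistence("abcdefgh", alphabet))      # 0.0 — looks uniform
print(char_persistence("aabbaabbccdd", alphabet))  # strictly between 0 and 3
```

The "interesting" middle ground shows up as a persistence strictly between zero and the full budget.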

And conservation?

One last way to put it: the GFT is the PT equivalent of energy conservation, but written directly in the language of persistence. Informational capacity is neither created nor destroyed — it is transformed. That is why the identity holds at every scale, for every distribution.

This is what makes persistence usable as a biomarker (medical imaging, IST/IEE), a linguistic measure (language evolvability), or an internal consistency check in a physical derivation. The same object, everywhere.
