A constraint
A constraint is a rule that keeps some things from happening, so that not everything remains possible. In a sieve, some numbers disappear; in a physical system, some configurations become impossible.
PT mathematics / master identity
PT starts from a simple idea: when a constraint acts, not everything disappears. One part becomes structure; another part remains dispersed. The first principle says that these two parts are not independent: they share a single total amount of distinction.
GFT is the identity that formalizes this budget. For any distribution $P$ over $m$ states, the total capacity $\log_2 m$ decomposes exactly into persistent information and residual entropy.
Plain language
Imagine a sorting process. At first, many things are possible. Then a rule acts: it removes, separates, or orients. At the end, there are two things to read: what has taken a stable form, and what remains still open.
PT calls the first of these persistence. It does not mean “staying the same without moving.” It means surviving an admissible constraint while keeping a readable identity. The technical formula comes later; it counts how the total is divided between the two.
What persists is not what merely stays motionless. It is what passes through a constraint without losing its structural identity.
Everything that is not stabilized remains as uncertainty, open choice, or noise. PT calls this the residual entropy.
The key point is that structure and dispersion do not add freely. They share a single budget of distinction.
The word “information” can sound intimidating, but here it first means: how many differences do you need to recognize a situation? If everything looks the same, there is little to distinguish. If a form appears, something becomes recognizable.
The first principle says that this recognizability has a cost. The sharper a structure becomes, the less blur remains available inside the same system. If the structure dissolves, the blur rises again.
This is the thread linking the other pages: in cosmogony, the cascade sorts thresholds; in the periodic table, shells and blocks are stabilized forms; here, we explain the common accounting rule.
Picture a game with several possible squares. The more squares there are, the more distinctions you need to say exactly where you are. That total number of distinctions is the system’s budget.
Mathematically, that budget is written $\log_2(m)$ because it is counted in yes/no questions, hence in bits. But the idea behind it is simply this: the system has a total amount of distinction available.
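As a small illustration of the budget in bits, here is a minimal sketch. The function names `budget_bits` and `questions_needed` are hypothetical helpers chosen for this example, not PT terminology. It computes $\log_2 m$ for a board of $m$ squares and compares it with the number of yes/no questions needed to single out one square by repeated halving.

```python
import math


def budget_bits(m: int) -> float:
    """Total distinction budget, in bits, for m equally possible squares."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return math.log2(m)


def questions_needed(m: int) -> int:
    """Worst-case number of yes/no questions to single out one of m squares.

    This is the ceiling of log2(m), computed with integer arithmetic.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    return (m - 1).bit_length()


for m in (2, 6, 8, 64):
    print(m, round(budget_bits(m), 3), questions_needed(m))
```

When $m$ is a power of two the two numbers coincide; otherwise the budget is a fractional number of bits and the question count rounds it up.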
One part becomes form: it distinguishes, constrains, selects. That is persistence. The other part remains open, dispersed, undecided. That is entropy.
The fundamental principle says that the two parts compensate exactly. When structure gains one bit, available entropy loses one bit, and conversely. PT places that conservation at the center of the theory.
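The one-bit-for-one-bit exchange can be seen on a toy system with four squares. The sketch below is only an illustration: the three stages and the helper names are assumptions made for this example. It measures structure as the divergence from the uniform distribution and dispersion as the Shannon entropy, both in bits.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(P) in bits, with the convention 0 * log(0) = 0."""
    return sum(x * math.log2(1 / x) for x in p if x > 0)


def kl_to_uniform_bits(p):
    """Divergence D_KL(P || U_m) in bits, where m = len(p)."""
    m = len(p)
    return sum(x * math.log2(x * m) for x in p if x > 0)


# Three hypothetical stages of a constraint acting on four squares.
stages = {
    "nothing excluded": [0.25, 0.25, 0.25, 0.25],
    "two squares excluded": [0.5, 0.5, 0.0, 0.0],
    "one square selected": [1.0, 0.0, 0.0, 0.0],
}

for name, p in stages.items():
    structure = kl_to_uniform_bits(p)
    dispersion = entropy_bits(p)
    print(f"{name}: structure={structure:.1f}, dispersion={dispersion:.1f}, "
          f"total={structure + dispersion:.1f}")
```

At each stage the structured part grows by one bit and the dispersed part shrinks by one bit, while the total stays at $\log_2 4 = 2$ bits.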
Four things to read: what could happen (the possibilities), what sorts (the constraint), what keeps a form (persistence), and what remains open (entropy).
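For $m$ possibilities, there is a total budget of distinctions. In technical language, that budget is $\log_2 m$ bits, and it splits as $\log_2 m = D_{\mathrm{KL}}(P\,\|\,U_m) + H(P)$, where $U_m$ is the uniform distribution over the $m$ possibilities.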
$D_{\mathrm{KL}}$ measures what distinguishes the actual distribution from the maximally undifferentiated state.
$H(P)$ measures what remains dispersed, uncrystallized, not selected as persistent structure.
The identity is an algebraic tautology: it is true for any distribution. Its force therefore does not come from technical difficulty, but from the conservation language it gives to PT.
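That the identity holds for any distribution can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names, the random seed, and the range of $m$ are arbitrary choices for this example. It draws random distributions and records the largest deviation of $D_{\mathrm{KL}}(P\,\|\,U_m) + H(P)$ from $\log_2 m$.

```python
import math
import random


def gft_gap(p):
    """Return log2(m) - (D_KL(P || U_m) + H(P)); zero up to rounding."""
    m = len(p)
    h = sum(x * math.log2(1 / x) for x in p if x > 0)
    d = sum(x * math.log2(x * m) for x in p if x > 0)
    return math.log2(m) - (d + h)


def random_distribution(m, rng):
    """A random probability vector of length m (normalized random weights)."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
worst = max(
    abs(gft_gap(random_distribution(rng.randint(2, 50), rng)))
    for _ in range(1000)
)
print(f"largest deviation over 1000 random distributions: {worst:.2e}")
```

The deviation stays at the level of floating-point rounding, as expected from an identity that is true by algebra alone.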
In the sieve, $D_{\mathrm{KL}}$ becomes the structured part: what has been selected, constrained, made distinct. $H(P)$ becomes the part still dispersed. Their sum remains the same information cap.
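To make the sieve reading concrete, here is a hedged toy example. It assumes, purely for illustration, that the distribution of interest is the uniform one over the residues modulo $m$ that survive removal of the multiples of a few small primes. This specific choice and the function names are assumptions of this sketch, not a statement of how PT defines its residue distributions.

```python
import math


def surviving_residues(modulus, sieved_primes):
    """Residues r in [0, modulus) not divisible by any of the sieved primes."""
    return [r for r in range(modulus) if all(r % p != 0 for p in sieved_primes)]


def budget_split(modulus, sieved_primes):
    """Return (structured, dispersed, cap) in bits.

    Assumption for illustration only: every surviving residue gets equal
    weight and every sieved residue gets weight zero.
    """
    survivors = set(surviving_residues(modulus, sieved_primes))
    if not survivors:
        raise ValueError("the sieve removed every residue")
    weight = 1.0 / len(survivors)
    dist = [weight if r in survivors else 0.0 for r in range(modulus)]
    dispersed = sum(x * math.log2(1 / x) for x in dist if x > 0)
    structured = sum(x * math.log2(x * modulus) for x in dist if x > 0)
    return structured, dispersed, math.log2(modulus)


d, h, cap = budget_split(30, [2, 3, 5])
print(f"structured: {d:.4f} bits, dispersed: {h:.4f} bits, cap: {cap:.4f} bits")
```

With modulus 30 and the primes 2, 3, and 5 sieved out, eight residues survive, so the dispersed part is 3 bits and the structured part makes up the rest of the $\log_2 30$ cap.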
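Let $P=(p_0,\ldots,p_{m-1})$ be a probability distribution and let $U_m=(1/m,\ldots,1/m)$ be the uniform distribution. By definition:
$$D_{\mathrm{KL}}(P\,\|\,U_m)=\sum_r p_r\log_2\frac{p_r}{1/m}=\sum_r p_r\log_2(m\,p_r).$$
Distribute the logarithm, then use $\sum_r p_r=1$:
$$D_{\mathrm{KL}}(P\,\|\,U_m)=\log_2 m\sum_r p_r+\sum_r p_r\log_2 p_r=\log_2 m+\sum_r p_r\log_2 p_r.$$
Since $H(P)=-\sum_r p_r\log_2 p_r$, we obtain:
$$\log_2 m=D_{\mathrm{KL}}(P\,\|\,U_m)+H(P).$$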
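As a worked check of the identity $\log_2 m = D_{\mathrm{KL}}(P\,\|\,U_m) + H(P)$, take an example distribution chosen here only for its round numbers: $m=4$ and $P=(1/2,\,1/4,\,1/8,\,1/8)$. Then:
$$H(P)=\tfrac12\cdot 1+\tfrac14\cdot 2+\tfrac18\cdot 3+\tfrac18\cdot 3=1.75\ \text{bits},$$
$$D_{\mathrm{KL}}(P\,\|\,U_4)=\tfrac12\log_2 2+\tfrac14\log_2 1+\tfrac18\log_2\tfrac12+\tfrac18\log_2\tfrac12=0.5+0-0.125-0.125=0.25\ \text{bits},$$
$$D_{\mathrm{KL}}(P\,\|\,U_4)+H(P)=0.25+1.75=2=\log_2 4.$$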
GFT is first a universal identity, and then a PT reading principle once its two terms are applied to the sieve and to residue distributions.
The identity alone does not prove a new law about primes. Its depth comes from its role in the PT chain, not from the algebra.