dm422 0 01 PDF

0.
Introduction
The theory of hereditarily finite sets, called here HF, and described in the Appendix, has
been investigated by Ackermann, Beth, Givant and Tarski ([A], [Be] and [TG]). HF is
very closely related to PA (Peano Arithmetic); in fact, they are definitionally equivalent
(they have a common extension by definitions T —see Section 10). Thus it appears that
the general mathematical significance of establishing Gödel’s incompleteness theorems
with reference to one of these theories is the same as for the other.
Unlike most other authors working in this field, we have chosen HF, rather than
PA, for this task. This was done because HF is much better suited for the purpose of
describing its own meta-theory. The reason lies in the great expressive power of set theory.
A set-theoretical description of the language of HF (and then of the whole meta-theory)
presents itself in a wholly natural fashion, once it has been decided how to code (represent)
by constant terms the 7 basic (primitive) symbols and the variables of the first-order
language of HF. It is at hand to code the variables x1 , x2 , . . . simply by the ordinals
1, 2, . . . The constant 0 can be coded as 0, and the remaining 6 symbols as n-tuples of
0’s, say ∈ as h0, 0i, etc. And here ends the arbitrariness of coding, which is so unpleasant
when languages are arithmetized. Because terms, formulas and proofs are built from these
already coded basic symbols and variables by the formation of n-tuples or sequences and
such processes can be faithfully and naturally replicated by set-theoretical constructions.
By contrast, if natural (Gödel) numbers are used as codes, there is no natural way of
coding a sequence of these (numbers) as one number. (Mostly this is done by means of the
Chinese Remainder Theorem or by using the uniqueness of prime power decomposition.)
The necessity of coding proofs appropriately was pointed out by S. Feferman; he showed
that for a less than natural coding, Gödel’s results will fail (Theorems 5.9, 4.10 in [F]).
We need not face such a problem here: A sequence of codes of formulas that represents
a proof, i.e., a sequence of finite sets, is a finite set in its own right and there is no need
to code it again!
The simplicity (and possibly elegance) of working towards Gödel’s results (6.5 and
9.8) in the realm of finite sets offers one further advantage: With marginally more effort,
one can present all arguments completely, without omissions. As far as we know, such
a degree of completeness in proving Gödel’s results has never been attained before. To
reiterate: All published proofs of Gödel’s incompleteness theorems contain gaps, omissions
or references to key results in other publications. The omissions might not be gross, but
nevertheless presenting a potential stumbling block for the non-specialist. Thus, we have
made our exposition very detailed, hoping to make it thereby accessible to the widest
possible readership. Gödel’s theorems certainly deserve this sort of “popularization”; they
[5]
6 S. Świerczkowski
have generated much interest, even among non-mathematicians, and shed much light on
our understanding (or non-understanding) of the real nature of the game of mathematics.
It has been known since long that the more difficult (second) incompleteness theorem
is easily deduced, once one has established for each formula α the theorem (Hilbert–
Bernays derivabiltity condition)
Pf(pαq) → Pf(pPf(pαq)q),
where pαq is the code of α and Pf(x) is the formula which states that x is the code of a
provable formula. To show this, we followed in broad outline the work of Boolos [Bo]. The
latter pertained to PA and as such it had to be, in the words of its author, “incomplete
and irremediably messy” (p. 16 in [Bo]).
We have added to the main body of the paper Sections 10 and 11: The first deals with
the relationship between PA and HF and in the second we address what might be called
the “real world unprovability” of a theorem.
The author would like to thank Jan Mycielski for suggesting this line of research, to
Don Monk for a detailed verification of correctness (and pointing out inaccuracies), and
also to the other participants of the so-called Ulam seminar at the University of Colorado
in Boulder for listening to the first presentation of this work.
1. The language of HF and its coding
The theory of finite sets, i.e. HF, is presented in the language with the logical constants
=, ∨, ¬, ∃, the variables x1 , x2 , . . . and the non-logical constants 0, ∈ (binary relation)
and / (binary operation). (We shall treat ∧, → and ∀ as defined symbols.)
The following axiomatization of HF was taken from A. Tarski and S. Givant [TG]
(though we modified it slightly). There are two axioms and one scheme. The first axiom
defines 0 as the set deprived of elements, the second axiom defines / as the operation of
adjoining an element to a set, and the axiom scheme describes proofs by induction, very
much like in the case of PA.
HF1. z = 0 ↔ ∀x(x 6∈ z).
HF2. z = x / y ↔ ∀u(u ∈ z ↔ u ∈ x ∨ u = y).
HF3. α(0) ∧ ∀x∀y[α(x) ∧ α(y) → α(x / y)] → ∀xα(x).
The assumption in HF3 is that α is any formula in the first-order language based on
0, ∈ and /, moreover α contains a freely occurring variable z such that x and y are
substitutable for z, whilst α(x), α(y) and α(x / y) denote the effects of substitutions.
(We shall use lower case Latin letters, k, l, m, n, r, s, t, u, v, w, x, y, z, mostly to denote
any conveniently chosen variables; e.g., in the above axioms we can think of x, y, z as
x1 , x2 , x3 .)
A detailed development of the theory, to the extent that is needed for proving the
incompleteness theorems, is given in the Appendix (references to it will be prefixed with
Ap). To point out some differences between HF and the Zermelo–Fränkel set theory ZF,
let us note that in HF we have terms, due to the presence of the binary function symbol /,
Finite sets and Gödel’s incompleteness theorems 7
whereas in ZF there are none. We shall also use the fact that the universe of finite sets
is well ordered by an HF-definable relation < (Ap.5). No such ordering can be defined
in ZF.
The only way to produce terms in HF is by repeated applications of /. We can read
x / y as “x eats y” since this set arises from x by adopting a (new, or not) element y. Of
course, x can eat itself; the result is x / x and this is the successor of x, denoted by x+ .
The first few ordinals are then 0, 0+ = 1, 1+ = 2, . . . etc. (Ap.2).
All the constant terms of HF are obtained from terms with variables by replacing
in the latter all the variables by 0. The family of constant terms will be denoted by C.
The elements of C will be used as codes, and also for the construction of the standard
model S of HF (Ap.6).
We call 0 / x a singleton and denote it by {x}. Similarly, (0 / x) / y will be abbreviated
as {x, y} and ((0 / x) / y) / z as {x, y, z}. Thus {x}, {x, y}, {x, y, z} are terms. The
ordered pair hx, yi = {{x}, {x, y}}, triple hx, y, zi = hx, hy, zii, and, in general, any n-
tuple are composites of terms and hence these are terms themselves.
We wish to apply now the theory HF to describe its own meta-theory, i.e., various
relations between the words of the language of HF. For this purpose we assign to the main
objects of this language (variables, logical and non-logical symbols, terms and formulas)
their codes. What kind of entity should a code be? Since we wish to apply set-theoretical
formulas to codes in order to describe the meta-theory, it appears that a code should be
a set, or at least an entity closely and naturally associated to a set. However in the proof
that follows we have to assign to the code pϕq of a formula ϕ the code of this code, i.e.,
p(pϕq)q (this is accomplished by the function H in the proof of Gödel’s Diagonal Lemma
in Section 6). Since only language objects are coded, pϕq must be one of these. The
simplest language objects naturally associated to sets are the constant terms (associated
to the sets they name). Thus constant terms will be selected here as codes. But there
is a restriction. Two constant terms could be just different names for the same set, like
{{0}, 0} and {0, {0}}. To avoid this situation we shall select codes from a family Γ of
constant terms such that for any two distinct terms σ,τ in Γ one has ` σ 6= τ .
Definition 1.1. We denote by Γ0 the least family of constant terms such that
(a) 0 is in Γ0 .
(b) If σ is in Γ0 then so is σ / σ.
Further, we denote by Γ the least family of terms satisfying Γ0 ⊆ Γ and such that
(c) If σ,τ are in Γ then so is the term hσ, τ i.
Theorem 1.2. For any two distinct (as strings of symbols) terms σ, τ in Γ one has
` σ 6= τ .
Proof. Let us say that two constant terms σ, τ are identified in HF if σ = τ is a theorem.
By Ap.6.4, exactly one of the formulas σ = τ , σ 6= τ is a theorem. So our task is to prove
that distinct terms in Γ are never identified in HF. Let σ,τ be distinct. We consider three
cases:
8 S. Świerczkowski
Case 1: Both σ, τ are in Γ0 . It is easy to see that the terms belonging to Γ0 are names
of ordinal numbers, moreover σ / σ is a name of the successor of the number named by σ
(Ap.2.2 and 2.3(a)). Thus, if τ is longer than σ then the ordinal number named by τ
is obtained from the one named by σ by several applications of the successor operation,
which means that it is larger than the ordinal named by σ. It follows that σ, τ are not
identified in HF.
Case 2: Exactly one of σ, τ is in Γ0 . Suppose σ is in Γ0 . Since σ names an ordinal,

either σ is 0 or ` 0 ∈ σ (see Ap.2.5). On the other hand, τ is an ordered pair. But it is
easily proved that an ordered pair (i.e., {{x}, {x, y}}) is never the empty set nor does it
contain 0 among its elements. Thus σ, τ are not identified in HF.
Case 3: Both σ, τ are not in Γ0 . We prove by induction on the length of σ the following
implication:
If σ, τ are distinct terms in Γ \ Γ0 then σ, τ are not identified in HF.
The inductive assumption is that the above implication holds if σ, τ are replaced by σ 0 , τ 0 ,
where σ 0 is shorter than σ. Let σ, τ be distinct terms in Γ \ Γ0 . Suppose σ is hσ 0 , σ 00 i
and τ is hτ 0 , τ 00 i. Clearly, σ 0 is not the same term as τ 0 , or σ 00 is not the same term as τ 00 .
We assume the first possibility (if the second holds, the proof is analogous). If σ 0 or τ 0
is in Γ0 then one of the Cases 1, 2 applies and we see that σ 0 and τ 0 are not identified
in HF. If both σ 0 , τ 0 are not in Γ0 then the inductive hypothesis yields that σ 0 and τ 0 are
not identified in HF. It follows that hσ 0 , σ 00 i = hτ 0 , τ 00 i is not a theorem and thus σ, τ are
not are not identified in HF.
We begin coding by assigning to each variable xk and each of the basic 7 language
symbols 0, ∈, /, =, ∨, ¬, ∃ their codes, denoted by pxk q, p0q, p∈q, p/q, p=q, p∨q, p¬q, p∃q.
These are the following terms belonging to Γ :
p0q = 0,
p∈q = h0, 0i,
p/q = h0, 0, 0i,
p=q = h0, 0, 0, 0i,
p∨q = h0, 0, 0, 0, 0i,
p¬q = h0, 0, 0, 0, 0, 0i,
p∃q = h0, 0, 0, 0, 0, 0, 0i,
px1 q = 0 / 0, px2 q = px1 q / px1 q, and in general: pxk+ q = pxk q / pxk q.
Thus the code pxk q of xk is a term representing the ordinal number k (see Ap.2.3(a)).
The coding of terms and formulas now proceeds as follows:
Terms: The terms 0 and xk have been coded already. Other terms are coded so that if µ
and τ have been coded, then
pµ / τ q = hp/q, pµq, pτ qi.
Atomic formulas: The codes of atomic formulas (for terms µ, τ ) are

pµ = τ q = hp=q, pµq, pτ qi and pµ ∈ τ q = hp∈q, pµq, pτ qi.
Non-atomic formulas: For non-atomic formulas, if α, β are formulas already coded, we
put
p¬αq = hp¬q, pαqi,
pα ∨ βq = hp∨q, pαq, pβqi,
p∃xk αq = hp∃q, pxk q, pαqi.
2. Σ-formulas
We wish now to describe the property of a formula ϕ being provable by a set-theoretical

condition applicable to the code pϕq. This means that we should find a formula Pf(x)
of HF such that
` ϕ iff ` Pf(pϕq)
for any formula ϕ. Let us call the above the proof formalization condition. The description
of the proof process not being unique, there are many candidates for Pf(x). After having
selected Pf(x), in order to establish the proof formalization condition, we shall invoke a
model S of HF which we call the standard model. It will be not too hard then to argue
convincingly that ϕ is a theorem iff Pf(pϕq) is true in S, i.e.,
` ϕ iff S Pf(pϕq).
However not every sentence true in S is provable in HF (for example, the sentence δ
in Gödel’s First Incompleteness Theorem 6.5). Fortunately there is a large class of sen-
tences for which truth in S is equivalent to provability in HF. These are the Σ-sentences
introduced in this section. Hence we need to choose Pf(x) so that Pf(pϕq) is a Σ-sentence.
We first introduce the strict Σ-formulas, i.e., formulas that involve only bounded
universal quantifiers, the variables and the symbols ∈, ∨, ∧, ∃.
Definition 2.1. The class Σ of strict Σ-formulas is the smallest class of formulas such
that:
(1) The atomic formulas xi ∈ xj belong to Σ for all variables xi , xj .
(2) If α, β are in Σ, then so are α ∨ β and α ∧ β.
(3) If α is in Σ, then so are ∃xi α and ∀(xi ∈ xj )α for any distinct variables xi , xj .
If α is a strict Σ-formula and β is provably in HF equivalent to α then β will be called
simply a Σ-formula.
Here ∀(xi ∈ xj )α is an abbreviation of ∀xi (xi ∈ xj → α) (which in turn is an
abbreviation of ¬∃xi (xi ∈ xj ∧ ¬α)). So the only permissible occurrences of → and ¬ in
a strict Σ-formula are those which are introduced by the abbreviations ∀(xi ∈ xj )α.
Lemma 2.2. The formulas x ⊆ y, x = y z = x / y, x = 0, x ∈ 0, 0 ∈ x and all atomic
formulas are Σ-formulas.
10 S. Świerczkowski
Proof. We have the theorems:

(a) x ⊆ y ↔ ∀(u ∈ x)[u ∈ y].
(b) x = y ↔ x ⊆ y ∧ y ⊆ x.
(c) z = x / y ↔ ∀(u ∈ z)[u ∈ x ∨ u = y] ∧ (x ⊆ z) ∧ (y ∈ z).
(d) x = 0 ↔ ∀(u ∈ x)[u ∈ u] (see Ap.1.1 and 1.13).
(e) x ∈ 0 ↔ x ∈ x (see Ap.1.13).
(g) 0 ∈ x ↔ ∃y(y = 0 ∧ y ∈ x).
It follows from 2.1(1), (b) and (c) that every atomic formula is a Σ-formula. (E.g.,
z = (x / u) / y is equivalent to ∃v(z = v / y ∧ v = x / u).)
Our way of coding terms (in Section 1) implies that, to code a longer term, one has to
perform a sequence of coding operations, applied to its sub-terms. The same observation
applies to coding formulas. Thus it is clear that formulas such as Term(x) or Form(x)
(saying that x is the code of a term or a formula) must involve the sub-formulas Seq(s, k),
LstSeq(s, k, y) described in the next lemma. We need to know that these are Σ-formulas.
Lemma 2.3. Each of the following formulas is a Σ-formula: Ord(x), Seq(s, k), Seq(s, k, y);
their respective meanings are:
(a) Ord(x): x is an ordinal number ,
(b) Seq(s, k): s is a sequence of length k > 0 (i.e., s is a function whose domain is
the ordinal number k > 0),
(c) LstSeq(s, k, y): s is a sequence of length k > 0 and s terminates (ends) with y.
Proof.
(a) Ord(x) ↔ ∀(y ∈ x){[y ⊆ x] ∧ ∀(z ∈ y)[z ⊆ y]}
(see 2.2 and Ap.2.1).
(b) Seq(s, k) ↔ Ord(k) ∧ 0 ∈ k ∧ ∀(n ∈ k)∃u[hn, ui ∈ s] ∧
∀(y, z ∈ s)∃(m, n ∈ k)∃u∃v[y = hm, ui ∧ z = hn, vi ∧ (m 6= n ∨ u = v)].
(c) LstSeq(s, k, y) ↔ Seq(s, k) ∧ ∃n(n / n = k ∧ hn, yi ∈ s) (see 2.2).
In (b) the sub-formula m 6= n ∨ u = v replaces the implication m = n → u = v (which
is not allowed in a strict Σ-formula). Observe that m 6= n is a Σ-formula if m, n are
ordinals, as it is equivalent to m ∈ n ∨ n ∈ m (see Ap.2.5).
In Ap.6 we shall construct the standard model S of HF. Its elements are all the
sets that can be written down using only { , } and 0. For example: 0, {0}, {0, {0}},
{0, {{{0}}}}. Each of these is represented by a constant term, but in general, not uniquely
(e.g., the terms (x / y) / z and (x / z) / y are provably equal in HF).
A Σ-sentence is a Σ-formula that is a sentence. The key property of these sentences
is that any true Σ-sentence is provable. Here “true” means “valid in the standard model”
S of HF and provability is in HF. In order to establish this property, we need the class
of Σ/ -formulas:
Definition 2.4. By a Σ/ -formula we shall mean a formula that is made up of terms, the
symbols 0, =, ∈, ∨, ∧, ∃xi and ∀(xi ∈ τ ), where τ is any term that does not contain xi .
For a Σ/ -formula α, its length λ(α) is defined to be the total number of occurrences of
the symbols ∨, ∧, ∃ and ∀.
Let us observe that the length of a Σ/ -formula remains unchanged if a variable occur-
ring in the formula is replaced by a term (assuming that after this replacement we again
get a formula). Evidently, every strict Σ-formula is a Σ/ -formula.
Theorem 2.5. For every Σ-sentence α,
(§) S α ⇒ ` α.
Proof. Since every strict Σ-sentence is a Σ/ -sentence, it will suffice to prove (§) for every
Σ/ -sentence α. We proceed by induction on the length λ(α).
Suppose λ(α) = 0. As all logical operators (i.e., ¬, →, ∨, ∧, ∃, ∀) are then excluded
from α, α must be of the form σ ∈ τ or σ = τ , where σ, τ ∈ C. For a sentence of this
kind, (§) follows directly from the definition of ∈ and = in the structure S (see Ap.6.6).
Now suppose (§) has been shown for all Σ/ -sentences β (in place of α) such that
λ(β) < λ(α). We deduce that then (§) holds for the Σ/ -sentence α, by considering the
following cases:
• α is β ∨ γ. In this case β and γ are Σ/ -sentences and, by the inductive assumption,
S α ⇒ (S β or S γ) ⇒ (` β or ` γ) ⇒ ` β ∨ γ ⇒ ` α.
• α is β ∧ γ. The reasoning is analogous.
• α is ∃xβ(x). If S α then S β(τ ) for some τ ∈ C. Since λ(β(τ )) = λ(β) < λ(α),
the inductive assumption yields
S β(τ ) ⇒ ` β(τ ) ⇒ ` ∃xβ(x)
(by the rules of logic). Thus S α ⇒ ` α.
• α is ∀(x ∈ τ )β(x). Since there are no free variables in α and x does not occur in τ ,
we must have τ ∈ C. Thus, by Ap.6.2, there are τ1 , . . . , τm ∈ C such that
(§§) ` ∀(x ∈ τ )β(x) ↔ β(τ1 ) ∧ . . . ∧ β(τm ).
Since λ(β(τj )) = λ(β(x)) < λ(α) for all j, we can apply the inductive assumption,
thus getting:
S α ⇒ S β(τ1 ) and . . . and S β(τm ) ⇒
` β(τ1 ) and . . . and ` β(τm ) ⇒ ` β(τ1 ) ∧ . . . ∧ β(τm ) ⇒ ` α,
by (§§).
3. An HF-description of relations between formulas
In this section we cover the more tedious part of the work needed for producing the
Σ-formula Pf(x). This amounts to a direct verification that all basic relations between
variables, terms and formulas that underlie the idea of a proof can be described by
means of Σ-formulas “talking” only about codes. All of these 25 formulas will become
sub-formulas of Pf(x).
We shall introduce each of the formulas 1–25 below by stating first its intended
meaning (in the meta-theory of HF) and then writing the formula itself. If any of the
formulas Seq(s, k), LstSeq(s, k, t) (both introduced above) is used then, for any m ∈ k
(i.e., m < k), we shall write ϕ(sm ) to abbreviate ∃y[hm, yi ∈ s ∧ ϕ(y)] (see Ap.3.2).
Hence, if ϕ is a Σ-formula, then so is ϕ(sm ).
1. Var(x)
Means: x is the code of a variable.
Ord(x) ∧ 0 ∈ x.
2. SeqTerm(s, k, t)
Means: s is a sequence of length k and s ends with the code t of a term.
LstSeq(s, k, t) ∧ ∀(l ∈ k){sl = 0 ∨ Var(sl ) ∨ ∃(m, n ∈ l)[sl = hp/q, sm , sn i]}.
3. Term(t)
Means: t is the code of a term.
∃k∃s SeqTerm(s, k, t).
4. Const(t)
Means: t is the code of a constant term.
∃k∃s[LstSeq(s, k, t) ∧ ∀(l ∈ k){sl = 0 ∨ ∃(m, n ∈ l)[sl = hp/q, sm , sn i]}].
5. NecSeqTerm(s, k, t)
Means: s is a sequence of length k whose last element t is the code of a term and
all terms whose codes appear in s (as some sl ) are sub-terms of the term coded
by t (i.e., each sl is necessary for the formation of the code t).
SeqTerm(s, k, t) ∧ ∀(l ∈ k){l+ = k ∨
∃(n, m ∈ k)[(l ∈ m) ∧ (sm = hp/q, sl , sn i ∨ sm = hp/q, sn , sl i)]}.
In other words, each sl , except the last, is used in creating some sm further on.
Lemma 3.1. ` Term(t) → ∃k∃s NecSeqTerm(s, k, t).
Proof. Assume Term(t), i.e., SeqTerm(s, k, t) for some sequence s of length k. If there is
an l ∈ k such that k 6= l (i.e., l is not the predecessor of k) and sl is not needed further
on in s (i.e., no sm with l ∈ m is of the form sm = hp/q, sn , sl i or sm = hp/q, sl , sn i)
then, omitting sl , we obtain a sequence z of length k such that SeqTerm(z, k, t). Thus,
for the shortest s (smallest k) such that SeqTerm(s, k, t), we have NecSeqTerm(s, k, t).
6. VarOccTerm(v, t)
Means: v codes a variable, t codes a term and the variable coded by v occurs in
the term coded by t.
Var(v) ∧ ∃k∃s[NecSeqTerm(s, k, t) ∧ ∃(l ∈ k)(sl = v)].
7. VarNonOccTerm(v, t)
Means: v codes a variable, t codes a term and the variable coded by v does not
occur in the term coded by t.
Var(v) ∧ ∃k∃s{NecSeqTerm(s, k, t) ∧
∀(l ∈ k)[sl = 0 ∨ (Var(sl ) ∧ (sl 6= v)) ∨ ∃(n, m ∈ l)[sl = hp/q, sn , sm i]]}.
Note. Requiring NecSeqTerm(s, k, t) in 7 instead of SeqTerm(s, k, t) (obviously the lat-

ter would do) does not introduce any restrictions. Indeed, the proof of 3.1 shows that if
there is a sequence s of some length k which terminates with t and satisfies the require-
ments of 7 without the “Nec” part, then there is also such a sequence (obtained from
the previous one by pruning) where all sl are “necessary” in the sense of satisfying the
requirement NecSeqTerm.
Let us observe that 6, 7 and 3.1 imply
` Term(t) ∧ Var(v) → VarOccTerm(v, t) ∨ VarNonOccTerm(v, t).
8. At(y)
Means: y is the code of an atomic formula.
∃u∃t[Term(u) ∧ Term(t) ∧ (y = hp=q, u, ti ∨ y = hp∈q, u, ti)].
9. VarOccAt(v, y)
Means: v codes a variable, y codes an atomic formula and the variable coded
by v occurs in the atomic formula coded by y.
∃u∃t[Term(u) ∧ Term(t) ∧ (y = hp=q, u, ti ∨ y = hp∈q, u, ti) ∧
(VarOccTerm(v, u) ∨ VarOccTerm(v, t))].
10. VarNonOccAt(v, y)
Means: v codes a variable, y codes an atomic formula and the variable coded
by v does not occur in the atomic formula coded by y.
∃u∃t[VarNonOccTerm(v, u) ∧ VarNonOccTerm(v, t) ∧
(y = hp=q, u, ti ∨ y = hp∈q, u, ti)].
11. MakeForm(u, w, v, y)
Means: v codes a variable and if u, w are codes of formulas, then y is the code
of a formula created from one or both formulas coded by u, w and possibly the
variable coded by v by a single application of ∨, ¬ or ∃.
Var(v) ∧ [y = hp∨q, u, wi ∨ y = hp∨q, w, ui ∨ y = hp¬q, ui ∨ y = hp∃q, v, ui].
12. SeqForm(s, k, y)
Means: s is a sequence of length k whose last element y is the code of a formula.
LstSeq(s, k, y) ∧ ∀(n ∈ k)[At(sn ) ∨ ∃(m, l ∈ n)∃v MakeForm(sm , sl , v, sn )].
13. Form(y)
Means: y is the code of a formula.
∃k∃s SeqForm(s, k, y).
14. NonAt(y)
Means: y is the code of a non-atomic formula.
Form(y) ∧ ∃u∃w∃v MakeForm(u, w, v, y).
15. VarTopForm(v, y)
Means: y is the code of a formula beginning with ∃xi , where pxi q = v.
Var(v) ∧ ∃w[Form(w) ∧ y = hp∃q, v, wi].
16. SeqVarTopForm(v, z, n, y)
Means: v codes a variable, y codes a formula and z is a sequence of length n
showing how the formula coded by y is made up of formulas beginning with ∃xi ,
where pxi q = v, and of atomic formulas, by using ∨, ¬ and ∃xj , where j 6= i.
Var(v) ∧ LstSeq(z, n, y) ∧ ∀(r ∈ n)[At(zr ) ∨ VarTopForm(v, zr ) ∨
∃(m, l ∈ r)∃u(Var(u) ∧ u 6= v ∧ MakeForm(zm , zl , u, zr ))].
17. NecSeqVarTopForm(v, z, n, y)
Means: As in 15, but with the additional requirement that all formulas whose
codes appear in z (as some zn ) are sub-formulas of the formula coded by y (i.e.,
each zn is necessary for the formation of the code y).
SeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n){r + = n ∨ ∃(l, m ∈ n)∃u[(r ∈ m) ∧ MakeForm(zr , zl , u, zm )]}.
Lemma 3.2. ` Var(v) ∧ Form(y) → ∃n∃z NecSeqVarTopForm(v, z, n, y).
Proof. Suppose Var(v) and Form(y). Then there is a sequence s of length k such that
SeqForm(s, k, y). We now remove from s all sl for which VarTopForm(v, sl ) and place
these, possibly in the order of their first original appearance in s, at the beginning of a new
sequence z. The remainder of z is composed of those sl which were not removed from s.
Let n be the length of z (actually n = k but we do not need this fact). One checks that
SeqVarTopForm(v, z, n, y). If n is the smallest ordinal for which there is a z satisfying this
latter formula then, as in the proof of 3.1, we see that NecSeqVarTopForm(v, z, n, y).
18. VarOccFreeForm(v, y)
Means: v codes a variable, y codes a formula and the variable coded by v has a
free occurrence in the formula coded by y.
∃n∃z{NecSeqVarTopForm(v, z, n, y) ∧ ∃(r ∈ n)[At(zr ) ∧ VarOccAt(v, zr )]}.
19. VarNonOccFreeForm(v, y)
Means: v codes a variable, y codes a formula and the variable coded by v has
no free occurrence (is bound) in the formula coded by y.
∃n∃z{NecSeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n)[(At(zr ) ∧ VarNonOccAt(v, zr )) ∨ NonAt(zr )]}.
Note. Choosing NecSeqVarTopForm instead of SeqVarTopForm in 19 (clearly the latter
would do) does not introduce any restrictions. (We justify this similarly to the Note
following 7.)
From 3.2 and the implication preceding 8 we see that
` Var(v) ∧ Form(y) → VarOccFreeForm(v, y) ∨ VarNonOccFreeForm(v, y).
20. TermSubsVarForm(t, v, y)
Means: t codes a term, v codes a variable, y codes a formula and the term coded
by t is substitutable for the variable coded by v in the formula coded by y.
Term(t) ∧ ∃n∃z{SeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n)[At(zr ) ∨ ∃(m, l ∈ r)∃u{zr = hp∨q, zm , zl i ∨ zr = hp¬q, zm i ∨
[zr = hp∃q, u, zm i ∧ (VarNonOccFreeForm(v, zm ) ∨
(VarOccFree Form(v, zm ) ∧ VarNonOccTerm(u, t)))]}]}.
21. SeqRepVarTermTerm(s, s0 , k, v, t, u, u0 )
Means: v codes a variable, t codes a term, u and u0 are coding terms and s and s0
are sequences of length k such that s ends with u and s0 ends with u0 and the
term coded by u0 is obtained from that coded by u by replacing each occurrence
of the variable coded by v by the term coded by t.
Var(v) ∧ Term(t) ∧ LstSeq(s, k, u) ∧ LstSeq(s0 , k, u0 ) ∧
∀(n ∈ k)[((sn = v) ∧ (s0n = t)) ∨ (Var(sn ) ∧ (sn 6= v) ∧ (s0n = sn )) ∨
((sn = 0) ∧ (s0n = 0)) ∨ ∃(m, l ∈ n)[sn = hp/q, sm , sl i ∧ s0n = hp/q, s0m , s0l i]].
22. RepVarTermTerm(v, t, u, u0 )
Means: v codes a variable, t codes a term, u and u0 code terms and replacing
in the term coded by u the variable coded by v, at all of its occurrences, by the
term coded by t one obtains the term coded by u0 .
∃k∃s∃s0 SeqRepVarTermTerm(s, s0 , k, v, t, u, u0 ).
23. RepVarTermAt(v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 are coding atomic formulas
and replacing the variable coded by v by the term coded by t, at all occurrences
of this variable in the atomic formula coded by y, one gets the atomic formula
coded by y 0 .
∃u∃u0 ∃w∃w0 {RepVarTermTerm(v, t, u, u0 ) ∧ RepVarTermTerm(v, t, w, w 0 ) ∧
[(y = hp=q, u, wi ∧ y 0 = hp=q, u0 , w0 i) ∨ (y = hp∈q, u, wi ∧ y 0 = hp∈q, u0 , w0 i)]}.
24. SeqRepVarTermForm(s, s0 , k, v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 code formulas and s and
s0 are sequences of length k such that s ends with y and s0 ends with y 0 and
the formula coded by y 0 is obtained from that coded by y by replacing each free
occurrence of the variable coded by v by the term coded by t.
Var(v) ∧ LstSeq(s, k, y) ∧ LstSeq(s0 , k, y 0 ) ∧
∀(l ∈ k){[At(sl ) ∧ RepVarTermAt(v, t, sl , s0l )] ∨ ∃(m, n ∈ l)∃u
[(sl = hp∃q, v, sm i ∧ s0l = sl ) ∨ (sl = hp∨q, sm , sn i ∧ s0l = hp∨q, s0m , s0n i) ∨
(Var(u) ∧ u 6= v ∧ sl = hp∃q, u, sm i ∧ s0l = hp∃q, u, s0m i) ∨
(sl = hp¬q, sm i ∧ s0l = hp¬q, s0m i)]}.
25. RepVarTermForm(v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 code formulas and replacing
in the formula coded by y each free occurrence of the variable coded by v by the
term coded by t, one obtains the formula coded by y 0 .
∃k∃s∃s0 SeqRepVarTermForm(s, s0 , k, v, t, y, y 0 ).
4. The formula Pf
A formula ϕ is a theorem of HF (in symbols: HF ` ϕ or ` ϕ) if there is a sequence

of formulas, terminating with ϕ, where each formula in the sequence is either an axiom
of HF or a logical axiom or is derived from one or two preceding formulas by a rule of
inference (see 4.1 and 4.2 below). We begin this section by recalling the logical axioms
and rules of inference for a first-order theory, as applied to HF. Next, these axioms and
rules are expressed by Σ-formulas and codes. This leads to a Σ-formula Pf(x) such that
` ϕ iff ` Pf(pϕq) for every formula ϕ, moreover ` α → Pf(pαq) for every Σ-sentence α.
It is easily verified that the description of the logic of HF given below is equivalent
(in the sense of yielding the same theorems) to the description of the logic of a first-order
theory in [Sh] (when applied to HF).
Definition 4.1. The logical axioms are the following formulas (where ϕ → ψ abbreviates
¬ϕ ∨ ψ):
Sentential (Boolean) Axioms: For any formulas ϕ, ψ, µ:
ϕ → ϕ,
ϕ → ϕ ∨ ψ,
ϕ ∨ ϕ → ϕ,
ϕ ∨ (ψ ∨ µ) → (ϕ ∨ ψ) ∨ µ,
(ϕ ∨ ψ) ∧ (¬ϕ ∨ µ) → ψ ∨ µ.
Specialization Axioms: For every formula ϕ and every xi :
ϕ → ∃xi ϕ.
Equality Axioms:
x1 = x 1 ,
(x1 = x2 ) ∧ (x3 = x4 ) → [(x1 = x3 ) → (x2 = x4 )],
(x1 = x2 ) ∧ (x3 = x4 ) → [(x1 ∈ x3 ) → (x2 ∈ x4 )],
(x1 = x2 ) ∧ (x3 = x4 ) → [x1 / x3 = x2 / x4 ].
Definition 4.2. The rules of inference are as follows:
ϕ, ϕ → ψ
Modus Ponens: .
ψ
ϕ
Substitution: for any term τ that is substitutable for xi in ϕ.
ϕ(xi /τ )
ϕ→ψ
∃-introduction: provided xi does not occur free in ψ.
∃xi ϕ → ψ
For each of these three rules we say that the formula written below the line is derived by
that rule from the formula(s) written above the line.
In order to write the formula Pf, it will be convenient to have short suggestive names
for certain terms (see the coding of formulas in Section 1):
Definition 4.3.
Neg(x) = hp¬q, xi,
Disj(x, y) = hp∨q, x, yi,
Impl(x, y) = Disj(Neg(x), y),
Conj(x, y) = Neg(Disj(Neg(x), Neg(y)),

Exi(x, y) = hp∃q, x, yi,
All(x, y) = Neg(Exi(x, Neg(y)).
Keeping the style adopted for presenting the 25 Σ-formulas in Section 3, we now list
nine Σ-formulas which describe the codes of axioms and the rules of inference. To begin,
let us denote by c1 , . . . , c6 the constant terms which code the axioms HF1, HF2 and the
four Equality Axioms in 4.1. The nine Σ-formulas are as follows.
(1) Ax(y)
Means: y is the code of an individual axiom.
y = c1 ∨ y = c2 ∨ . . . ∨ y = c 6 .
(2) Ind(x)
Means: x is the code of an instance of the Induction Scheme HF3, i.e., of
α(0) ∧ ∀x1 ∀x2 [α(x1 ) ∧ α(x2 ) → α(x1 / x2 )] → ∀x1 α(x1 ),
where α is any formula such that x2 is substitutable for x1 in α (see Note below).
∃y{Form(y) ∧ TermSubsVarForm(px2 q, px1 q, y) ∧
∃y 0 ∃y 00 ∃y 000 [RepVarTermForm(px1 q, 0, y, y 0 ) ∧
RepVarTermForm(px1 q, px2 q, y, y 00 ) ∧
RepVarTermForm(px1 q, hp/q, px1 q, px2 qi, y, y 000 ) ∧ x =
Impl(Conj(y 0 , All(px1 q, All(px2 q, Impl(Conj(y, y 00 ), y 000 )))), All(px1 q, y))]}.
Note. To check the above, imagine that, for some α,

y = pα(x1 )q, y 0 = pα(0)q, y 00 = pα(x2 )q, y 000 = pα(x1 / x2 )q.
(3) Sent(x)
Means: x codes a formula that is an instance of one of the five Sentential Axioms
(see 4.1).
∃y∃z∃w{Form(y) ∧ Form(z) ∧ Form(w) ∧
[x = Impl(y, y) ∨
x = Impl(y, Disj(y, z)) ∨
x = Impl(Disj(y, y), y) ∨
x = Impl(Disj(y, Disj(z, w)), Disj(Disj(y, z), w) ∨
x = Impl(Conj(Disj(y, z), Disj(Neg(y), w)), Disj(z, w))]}.
(4) Spec(x)
Means: x codes a Specialization Axiom (see 4.1).
∃y∃v[Form(y) ∧ Var(v) ∧ x = Impl(y, Exi(v, y))].
(5) ModPon(x, y, y 0 )
Means: If x, y, y 0 are codes of formulas then y 0 codes a formula which follows
from the formulas coded by x and y by an application of the Modus Ponens Rule
(in 4.2).
y = Impl(x, y 0 ).
(6) Subst(y, y 0 )
Means: y, y 0 are codes of formulas and the formula coded by y 0 is obtained from
that coded by y by an application of the Substitution Rule (in 4.2).
∃v∃t[TermSubsVarForm(t, v, y) ∧ RepVarTermForm(v, t, y, y 0 )].
(7) ∃intro(y, y 0 )
Means: y codes a formula and y 0 codes the formula obtained from that coded
by y by an application of the ∃-introduction rule (in 4.2).
∃u∃w∃v[Form(u) ∧ VarNonOccFreeForm(v, w) ∧
y = Impl(u, w) ∧ y 0 = Impl(Exi(v, u), w)].
(8) Prf(s, k, y)
Means: s is a sequence of length k of codes of formulas such that the corresponding
sequence of formulas is a proof of the formula coded by y.
LstSeq(s, k, y) ∧ ∀(n ∈ k)[Ax(sn ) ∨ Ind(sn ) ∨ Sent(sn ) ∨ Spec(sn ) ∨
∃(m, l ∈ n)[ModPon(sm , sl , sn ) ∨ Subst(sm , sn ) ∨ ∃ intro(sm , sn )]].
(9) Pf(y)
Means: y codes a formula which is a theorem of ZF.
∃k∃s Prf(s, k, y).
Proposition 4.4 (Proof formalization condition). For every formula ϕ,
`ϕ iff ` Pf(pϕq).
Proof. Assume ` ϕ. Then there is a sequence of formulas ϕ0 , . . . , ϕp that constitutes a
proof of ϕ. If s = {hl, pϕl qi : l ∈ p+ }, then s ∈ S (= the universe of the standard model S;
see Ap.6.6) and S Prf(s, p+ , pϕq), so that S Pf(pϕq). Since Pf(pϕq) is a Σ-sentence,
we conclude, by 2.5, that ` Pf(pϕq).
Now assume ` P f (pϕq). Hence S ∃k∃s Prf(s, k, pϕq). This means that there is a
sequence s ∈ S of length k, where each sl codes some formula ϕl , so that ϕk is ϕ and the
sequence of formulas (ϕ0 , . . . , ϕk ) constitutes a proof of ϕ. Thus ` ϕ.
5. Phantom terms, functions and formulas
As commonly defined, we call f a function if all elements of f are ordered pairs and
hu, wi ∈ f ∧ hu, zi ∈ f → w = z.
Associated to a function are two sets: the domain dom(f ) and the range rng(f ). Functions
can be obtained also from the function symbols of the language, in our case, from /. For
example, x / x determines a function once we select some set for the domain.
More generally, functions can be obtained from certain formulas. Thus let ϕ be a
formula with exactly n + 1 free variables (n ≥ 0) and let y be one of these variables.
Suppose that ∃!yϕ is a theorem of HF (“∃!” means: “∃ a unique”). Then we shall associate
with ϕ and y a new symbol Fϕy and we shall call it an n-ary p-function symbol, where
“p” might stand for “phantom” to stress that we do not wish to consider Fϕy as part
of the language of HF (in [Bo], “p” stands for “pseudo”). We shall call ϕ the defining
formula for Fϕy . A term Fϕy (z1 , . . . , zn ) (of the language of HF extended by the p-function
symbol Fϕy ) will be called a p-term. We would like to write formulas including p-terms,
however these will be always regarded as representing certain formulas of HF. Thus let
xi1 , . . . , xin be the other free variables in ϕ, where i1 < . . . < in . Then the representation
rule is as follows: If β(z) is an atomic formula of HF then
β(Fϕy (z1 , . . . , zn )) represents any formula ∃w[ϕ(z1 , . . . , zn , w) ∧ β(w)],
where w is a variable substitutable for y in ϕ, moreover each zk is substitutable for xik
in ϕ. Thus α(Fϕy (z1 , . . . , zn )) represents many formulas, however these are equivalent to
each other in HF. The above rule is extended in an obvious way to all formulas containing
the symbol Fϕy so that the representation commutes with the logical connectives and
quantifiers (for details see Ap.3).
Only capital letters will be used for p-function symbols and we shall omit the suffix ϕ
and exponent y when circumstances permit. Also, we shall sometimes say p-function to
mean a p-function symbol Fϕy , or even its whole definition. Several p-functions will be in-
troduced below. Then by a p-term we shall mean any term of the extended language, i.e., a
term formed from the variables, 0, the function symbol / and the p-function symbols. For-
mulas containing such terms will be called p-formulas. The above abbreviation principle
can be naturally extended to any p-formula which then is easily seen to be an abbrevia-
tion of a (not unique) formula of HF. By a slight abuse of language we shall occasionally
say that a p-formula α is a theorem of HF to mean that α abbreviates such a theorem.
Our goal in this section is to prepare the definition, for every i, of a p-function K such
that for every formula γ(xi ), the p-formula K(pγq) = pγ(pγq)q is a theorem of HF.
Proposition 5.1. There exists a ternary p-function, denoted here by REPL, such that
` REPL(pxi q, pτ q, pϕq) = pϕ(xi /τ )q
for every variable xi , term τ and formula ϕ [where ϕ(xi /τ ) results from replacing each
free occurrence of xi in ϕ by τ ].
(We are not concerned at present with the issue of substitutability.) As the defining
formula ϕ(v, t, y, y 0 ) for REPL we choose the formula:
[(∃!z) RepVarTermForm(v, t, y, z) ∧ RepVarTermForm(v, t, y, y 0 )] ∨
[¬(∃!z) RepVarTermForm(v, t, y, z) ∧ y 0 = 0].
It is clear that (∃!y 0 )ϕ(v, t, y, y 0 ). So, for the reader who found our description of the
meaning of formula 25 (Section 3) wholly convincing, the above proposition requires no
further proof. For the more meticulous, we suggest checking through Lemmas 5.2–5.4.
Lemma 5.2. For every term µ,
` RepVarTermTerm(pxi q, pτ q, pµq, u0 ) ↔ u0 = pµ(xi /τ )q.
(Here µ(xi /τ ) is the term obtained from µ by replacing each occurrence of xi by τ .)
Proof. It will be enough to show that for any µ,
(1) ` RepVarTermTerm(pxi q, pτ q, pµq, u0 ) → u0 = pµ(xi /τ )q,
(2) ` RepVarTermTerm(pxi q, pτ q, pµq, pµ(xi /τ )q).
Proof of (1). It will suffice to show (see 22) that

(3) SeqRepVarTermTerm(s, s0 , k, pxi q, pτ q, pµq, u0 ) → u0 = pµ(xi /τ )q
is a theorem of HF. We keep the variable xi and term τ fixed and proceed by induction
with respect to the length (number of appearances of /) of µ. It is clear from 21 that
if µ is shortest possible, i.e., µ is 0 or a variable xj , then u0 is either 0 or pxj q (if j 6= i)
or pτ q (if j = i) as required. For the inductive step one assumes that µ = σ / ν in (3).
Then, assuming the left side of (3), we see from 21 that pσq = sm , pνq = sn for some
m, n ∈ k and
pµq = hp/q, sm , sn i, u0 = hp/q, s0m , s0n i.
If in (3) we replace s by the initial segment of s terminating with sm , and s0 by the initial
segment of s0 terminating with s0m , and µ by σ, then, by the inductive hypothesis (σ being
shorter than µ), we get s0m = pσ(xi /τ )q. Similarly, s0n = pν(xi /τ )q, so that finally
u0 = hp/q, s0m , s0n i = hp/q, pσ(xi /τ )q, pν(xi /τ )qi = pσ(xi /τ ) / ν(xi /τ )q = pµ(xi /τ )q.
Proof of (2). By 22 and (3), it will suffice to find a constant (term) c such that
(4) ` ∃k∃s∃s0 SeqRepVarTermTerm(s, s0 , k, pxi q, pτ q, pµq, c).
Now, for the given term µ, one can explicitly exhibit a sequence s, of some length k, of
constant terms such that S SeqTerm(s, k, pµq). Then formula 21 tells us exactly how
to find a sequence s0 of constant terms, terminating with some constant c, for which
S SeqRepVarTermTerm(s, s0 , k, pxi q, pτ q, pµq, c).
The last formula is a Σ-sentence, hence it is a theorem of HF (see 2.5). Thus (4) follows.
Lemma 5.3. If ϕ is an atomic formula, then
` RepVarTermAt(pxi q, pτ q, pϕq, y) ↔ y = pϕ(xi /τ )q.
Proof. This is a direct consequence of 5.2 and 23.
Lemma 5.4. For every formula ϕ,
` RepVarTermForm(pxi q, pτ q, pϕq, y 0 ) ↔ y 0 = pϕ(xi /τ )q.
(Just now we leave aside the issue of substitutability of τ for xi .)
Proof. Very similar to the proof of 5.2. Instead of (1), (2), we now have to establish
(10 ) ` RepVarTermForm(pxi q, pτ q, pϕq, y 0 ) → y 0 = pϕ(xi /τ )q,
(20 ) ` RepVarTermForm(pxi q, pτ q, pϕq, pϕ(xi /τ )q).
The proof of (1 ) reduces to showing (see 25)
0
(30 ) ` SeqRepVarTermForm(s, s0 , k, px1 q, pτ q, pϕq, y 0 ) → y 0 = pϕ(xi /τ )q.

To show this we use induction on the length of ϕ. (Here by “length” we mean the number
of occurrences of ∨, ¬ and ∃.) If ϕ is atomic (i.e., of length 0), then (30 ) is a direct
consequence of 24 and 5.3. For the inductive step, one has to consider four possibilities
for ϕ: α ∨ β, ¬α, ∃xi α, and ∃xj α, where j 6= i. In each of these cases the inductive
hypothesis is applied (to the shorter formula α or β) similarly to the proof of (3) above.
The proof of (20 ) is entirely analogous to that of (2) above.
6. Gödel’s First Incompleteness Theorem
In this section we wish to prove Gödel’s Diagonal Lemma and then his First Incomplete-
ness Theorem. To start with, we introduce two unary p-functions W, H that describe the
formation of certain codes. Since a code (of a variable, a term or a formula) is a constant
term, that code can be coded again. H will do this, i.e., we shall have H(µ) = pµq,
whenever µ is already a code. W is just the restriction of H to the codes of variables (i.e.,
to the ordinals), however we begin with W .
Recall that px1 q is 0 / 0, i.e. 0+ , and in general, pxk+ q is pxk q / pxk q, i.e. (pxk q)+ ,
for every ordinal k. The predecessor of a non-zero ordinal x is x.
Lemma 6.1. There is a p-function W such that
` W (pxk q) = p(pxk q)q
for every variable xk .
Proof. Define W recursively on ordinals by (see Ap.3.4):

0 if x = 0 or x is not an ordinal,
W (x) =
hp/q, W (x), W (x)i if x is a non-zero ordinal.
By this definition,
W (px1 q) = W (0+ ) = hp/q, W (0), W (0)i = hp/q, 0, 0i = p0 / 0q = p(px1 q)q,
as required. Assuming that W (pxk q) = p(pxk q)q, we get
W (pxk+ q) = W ((pxk q)+ ) = hp/q, W (pxk q), W (pxk q)i = hp/q, p(pxk q)q, p(pxk q)qi
= p(pxk q / pxk q)q = p(pxk+ q)q.
Recall that all codes (as described in Section 1) belong to the family of terms Γ
defined in 1.1.
Lemma 6.2. There is a p-function H such that H(µ) = pµq for every µ in Γ .
Proof. According to the definition of an ordered pair,
hx, yi = {{x}, {x, y}} = {0 / x, (0 / x) / y} = (0 / (0 / x)) / ((0 / x) / y).
Thus, for any terms σ, τ ,
phσ, τ iq = p(0 / (0 / σ)) / ((0 / σ) / τ )q
= hp/q, hp/q, 0, hp/q, 0, pσqii, hp/q, hp/q, 0, pσqi, pτ qii.
Let t(x, y) be the term
hp/q, hp/q, 0, hp/q, 0, xii, hp/q, hp/q, 0, xi, yii.
Thus t(pσq, pτ q) is phσ, τ iq for any terms µ, σ. We now define H(x) recursively (see
Ap.4.9) as
(
W (x) if x is an ordinal,
H(x) = t(H(u), H(v)) if x = hu, vi for some u, v,
0 in all other cases.
The definition makes sense because no ordered pair is an ordinal (0 ∈ x for every non-zero
ordinal x, whereas 0 6∈ hu, vi for all u, v). Since x = hu, vi → u, v ∈ cl(x), the requirements
of Ap.4.9 are satisfied.
Now suppose µ ∈ Γ . If µ = 0 or µ is the code of a variable, then H(µ) = W (µ) = pµq
because the first line of the definition of H applies and we can use 6.1. All other terms µ
in Γ are generated from 0 and from the codes of variables by (repeatedly) forming ordered
pairs. Since for µ = hσ, τ i the terms σ, τ are shorter than µ, we may proceed by induction
and assume that H(σ) = pσq, H(τ ) = pτ q. Then, by the definition of t(x, y),
pµq = phσ, τ iq = t(pσq, pτ q) = t(H(σ), H(τ )) = H(µ),
as we wished to show.
It is clear that no unary p-function H can satisfy H(τ ) = pτ q for all constant terms τ .
Indeed, if this were so, the terms σ = (0 / µ) / ν and τ = (0 / ν) / µ, where µ, ν are
different constant terms, would have provably equal codes, which is easily seen not to be
the case.
Let us agree that when a formula ϕ is written as ϕ(xi ) then, for any term τ , we shall
write ϕ(τ ) to mean ϕ(xi /τ ).
Lemma 6.3. For every variable xi , there exists a p-function K such that, for every for-
mula ϕ(xi ),
` K(pϕq) = pϕ(pϕq)q.
Proof. Consider the Replacement Function REPL defined in Section 5. Noting that the
composite of p-functions is a p-function, we put
K(x) = REPL(pxi q, H(x), x).
By 5.1, for any formula ϕ(xi ) and any term τ one has
` REPL(pxi q, pτ q, pϕq) = pϕ(τ )q.
Let τ = pϕq and recall that pτ q = H(τ ) in this case (by 6.2). Then
` REPL(pxi q, H(pϕq), pϕq) = pϕ(pϕq)q,
i.e. ` K(pϕq) = pϕ(pϕq)q.
Theorem 6.4 (Gödel’s Diagonal Lemma). For every formula α(xi ), there is a formula δ
such that
` δ ↔ α(pδq).
Proof. We replace the variable xi in α by the p-term K(xi ), and we denote by β(xi ) the
resulting formula. Thus
` β(xi ) ↔ α(K(xi )).
Then, by 6.3,
` β(pβq) ↔ α(p(pβq)q)).
Thus a formula δ satisfying the claim of the Diagonal Lemma is β(pβq).
Theorem 6.5 (Gödel’s First Incompleteness Theorem). If HF is a consistent theory,
then there is a sentence δ such that neither HF ` δ nor HF ` ¬δ. Moreover , S δ.
Proof. Consider the p-formula ¬ Pf(x1 ). By the Diagonal Lemma, there is a sentence δ
such that
(4) ` δ ↔ ¬ Pf(pδq).
(a) Suppose ` δ. Then ` Pf(pδq), by 4.4. On the other hand, from (4) follows
` ¬ Pf(pδq). This contradicts the assumption of consistency.
(b) Suppose ` ¬δ. Then, by (4), ` Pf(pδq) and hence ` δ, by 4.4. This, together with
the assumed ` ¬δ, contradicts the consistency assumption.
(c) Suppose non[S δ]. Then S ¬δ, and hence S Pf(pδq), by (4). So ` δ,
by 4.4, contradicting non[S δ].
7. Pseudo-coding
Our next (and last) goal is to prove Gödel’s Second Incompleteness Theorem, i.e., 9.8.
A glance at its proof in Section 9 will show that at this moment it hinges only on the
Hilbert–Bernays derivability condition
` Pf(pδq) → Pf(pPf(pδq)q),
albeit only for the sentence δ satisfying (4) in 6.5. In what follows, the above implication
will be obtained as a special case of α → Pf(pαq), which we show to be a theorem for every
Σ-sentence α. (Clearly Pf(pδq) is a Σ-sentence.) Often results are proved by induction
on the length of a formula, but in this case induction does not work: α → Pf(pαq) is not
a theorem for every Σ-formula α. Indeed, we can take α to be x1 = x2 . Then, accepting
as a theorem x1 = x2 → Pf(px1 = x2 q), we would quickly reach a contradiction: Since
x2 does not occur in Pf(px1 = x2 q) (i.e., in Pf(hp=q, 0 / 0, (0 / 0) / (0 / 0)i), we would
conclude ` x1 = x1 → Pf(px1 = x2 q), hence ` Pf(px1 = x2 q), and finally ` x1 = x2 ,
by 4.4.
The approach that works is as follows. First, we define a pseudo-code of a formula α,
denoting it by [[α]]. Such a pseudo-code is a term, like the code pαq, but whereas pαq
is a constant term, [[α]] is in general a term with variables. In fact, the variables of the
term [[α]] are exactly those that are free in α. Next, we describe a rather special p-func-
tion F (we call it the special p-function). Then we replace each variable xi occurring
in [[α]] by the p-term F (xi ). The resulting p-term is denoted by [[α]](F (x1 ), . . . , F (xn )),
where n is chosen large enough so that all variables in [[α]] are among x1 , . . . , xn . Then
we show by induction on the length of the Σ-formula α that
α → Pf([[α]](F (x1 ), . . . , F (xn )))
is a theorem of HF. It is part of the definition of a pseudo-code that if α is sentence
then [[α]] is identical with pαq. This allows us to conclude ` α → Pf(pαq) for every
Σ-sentence α.
To create [[α]] we follow the same coding procedure as the one leading to the code
pαq, except that no variable xi occurring free in α is coded. The free occurrences of xi
are just left standing. For example:
[[x1 = x2 ]] = hp=q, x1 , x2 i and [[∃x1 (x1 ∈ x3 )]] = hp∃q, px1 q, hp∈q, px1 q, x3 ii.
If we try to define [[α]] by induction on the length of the formula α, i.e., by analogy
with defining pαq, we hit an obstacle: when an occurrence of xi is to be coded at a stage
when a sub-formula of α containing xi free is coded, one does not know yet whether this
occurrence of xi will remain free or finally become bound in α. To overcome this obstacle,
we introduce the concept of an intermediary code [[α]]V corresponding to any set V of
indices of variables (i.e., non-zero ordinals). This code [[α]]V , which we call a V -code, is
created just like [[α]], except that for an occurrence of xi not to be coded, we require not
only that this occurrence is free in α but also that i ∈ V . Thus [[α]]V is [[α]] when V
contains all the indices of the variables free in α, and [[α]]V is pαq when V is empty.
V -coding can be defined inductively. We begin with the terms. We put [[0]]V = 0, and
for any variable xi we define

xi if i ∈ V ,
[[xi ]]V =
pxi q otherwise.
The V -codes of other terms are defined by induction on the length of the term: if σ, τ are
V -coded, then [[σ / τ ]]V = hp/q, [[σ]]V , [[τ ]]V i.
The V -codes of atomic formulas are defined by [[σ ∈ τ ]]V = hp∈q, [[σ]]V , [[τ ]]V i and
[[σ = τ ]]V = hp=q, [[σ]]V , [[τ ]]V i (for any terms σ, τ ).
The V -codes of other formulas are defined by induction on the length of the formula:
[[¬α]]V = hp¬q, [[α]]V i,
[[α ∨ β]]V = hp∨q, [[α]]V , [[β]]V i,
[[∃xk α]]V = hp∃q, pxk q, [[α]]V \{k} i.
Definition 7.1. If V contains the indices of all variables occurring free in α then [[α]]V
(obviously not depending on V in that case) will be denoted by [[α]] and called the
pseudo-code of α.
It may be worth observing that if we replace each variable xi occurring in the term
[[α]] by pxi q then we obtain pαq.
In this section we describe by means of pseudo-codes some properties of the formula
Pf(x). Our goal is Theorem 7.3. This will be needed for proving Theorem 9.1, from which
the Hilbert–Bernays condition and then Gödel’s Second Theorem are easily deduced.
Definition 7.2. Given any string t1 , . . . , tn of variables and any V -code [[α]]V , we shall
denote by [[α]]V (t1 , . . . , tn ) the term that results from [[α]]V when each variable xi oc-
curring in [[α]]V , with i ≤ n, is replaced by ti .
Suppose β is a theorem, i.e., `β. If we replace each of the variables x1 , . . . , xn at
each of its free occurrences in β by some constant term then the formula so obtained
is also a theorem (by the Substitution Rule in 4.2). This just described situation in
the meta-theory admits description in HF. Thus, `β is equivalent to Pf(pβq) (see 4.4).
If t1 , . . . , tn are the codes of the constant terms that are substituted for x1 , . . . , xn (at
the free occurrences of these variables) then this new theorem we get from β has the
{1, . . . , n}-code [[β]]{1,...,n} (t1 , . . . , tn ). Hence the mentioned description in HF is the im-
plication () in 7.3 below. The reader who accepts the accuracy of our description of the
meta-theory of HF by formulas of HF in Sections 3 and 4 might feel no need to see the
proof of () (Lemmas 7.4–7.6). We give it here for the sake of completeness, although it
is rather long and tedious (due to the fact that pseudo-codes do not participate in the
creation of the formula Pf).
Theorem 7.3. For every formula β and every n = 1, 2, . . . ,
() Const(t1 ) ∧ . . . ∧ Const(tn ) ∧ Pf(pβq) → Pf([[β]]{1,...,n} (t1 , . . . , tn ))
is a theorem of HF.
Proof. We reduce the proof to establishing Lemma 7.6, i.e., the theorem
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermForm(pxr q, tr , [[β]]V \{r} (t1 , . . . , tn ), [[β]]V (t1 , . . . , tn ))
for every r ∈ V ⊆ {1, . . . , n}. Recall that in forming the term [[β]]V \{r} (t1 , . . . , tn ) all the
occurrences of xr are coded as pxr q, and when these occurrences of pxr q which corres-
pond to free occurrences of xr in β are replaced by tr , we obtain [[β]]V (t1 , . . . , tn ). This
procedure of replacing the pxr q by tr is described above by RepVarTermForm.
So let us assume that Lemma 7.6 (i.e., the above implication) has been proved. Then,
to prove 7.3, we first observe that 4, 7 and the subsequent Note (Section 3) imply
` Const(t) → VarNonOccTerm(u, t).
Hence
` Var(v) ∧ Form(y) ∧ Const(t) → TermSubsVarForm(t, v, y),
using 20, Lemma 3.2 and the Note after 19 (last line). Next, let us observe that
` TermSubsVarForm(t, v, y) ∧ RepVarTermForm(v, t, y, y 0 ) ∧ Pf(y) → Pf(y 0 )
by the definitions of Subst and Pf, i.e., (6) and (9) in Section 4. Combining this with
the previous theorem, we get
` Const(t) ∧ RepVarTermForm(v, t, y, y 0 ) ∧ Pf(y) → Pf(y 0 ).
Consider now this theorem for every r = 1, . . . , n with
v = pxr q, t = tr , y = [[β]]{1,...,r} (t1 , . . . , tn ), y 0 = [[β]]{1,...,r} (t1 , . . . , tn ),
where, for r = 1, [[β]]{1,...,r} = [[β]]0 = pβq. Then it follows, by the assumed 7.6 with
V = {1, . . . , r} and r = 1, . . . , n, that
` Const(t1 ) ∧ . . . ∧ Const(tr ) ∧ Pf([[β]]{1,...,r} (t1 , . . . , tn )) → Pf([[β]]{1,...,r} (t1 , . . . , tn )).
Combining all of these implications, for r = 1, . . . , n, we get the assertion of 7.3.
The following two lemmas pave the ground for the proof of Lemma 7.6.
Lemma 7.4. Suppose τ is a term, V ⊆ {1, . . . , n} and r ∈ V . Then
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermTerm(pxr q, tr , [[τ ]]V \{r} (t1 , . . . , tn ), [[τ ]]V (t1 , . . . , tn ))
(see 21, 22 in Section 3).
Proof. We use induction on the length of τ . So we have to consider the cases when τ is 0
or a variable, and the case when τ = σ / µ.
A. Case when τ is 0. We have [[0]]V = 0, for every V , and by 22,

` Var(v) ∧ Term(t) → RepVarTermTerm(v, t, 0, 0).
B. Case when τ is a variable. We have three possibilities:
(a): τ is xi and i 6∈ V . From the definition of V -coding follows [[xi ]]V \{r} = [[xi ]]V =
pxi q. Moreover pxi q 6= pxr q, because r ∈ V . From 21, 22 one obtains the general theorem
` Var(v) ∧ Var(u) ∧ Term(t) ∧ u 6= v → RepVarTermTerm(v, t, u, u).
It remains to substitute v = pxr q, t = tr , u = pxi q.
(b): τ is xi , i ∈ V and i 6= r. By the definition of V -coding, we have in this case
[[xi ]]V \{r} = [[xi ]]V = xi , and thus
[[τ ]]V \{r} (t1 , . . . , tn ) = [[τ ]]V (t1 , . . . , tn )) = ti .
We have the general theorem
` Var(v) ∧ Term(t) ∧ Const(t0 ) → RepVarTermTerm(v, t, t0 , t0 )
(see 4 and 21, 22 in Section 3). It suffices to substitute here v = pxr q, t = tr , t0 = ti .
(c): τ is xr . We have [[xr ]]V \{r} = pxr q, [[xr ]]V = xr , whence [[τ ]]V (t1 , . . . , tn ) = tr .
In the general theorem
` Var(v) ∧ Term(t) → RepVarTermTerm(v, t, v, t)
(following directly from 21, 22) we can now substitute v = pxr q, t = tr .
C. Case when τ is σ / µ. From 21, 22 follows the general theorem
` RepVarTermTerm(v, t, u, u0 ) ∧ RepVarTermTerm(v, t, w, w 0 ) →
RepVarTermTerm(v, t, hp/q, u, wi, hp/q, u0 , w0 i).
Let us substitute here v = pxr q, t = tr , and
u = [[σ]]V \{r} (t1 , . . . , tn ), u0 = [[σ]]V (t1 , . . . , tn ),
w = [[µ]]V \{r} (t1 , . . . , tn ), w0 = [[µ]]V (t1 , . . . , tn ).
The inductive hypothesis (σ, µ being shorter than τ ) leads us now to the conclusion
that the left side of the above implication becomes a theorem when it is preceded by
“Const(t1 ) ∧ . . . ∧ Const(tn ) →”. Hence the right side, when so preceded, is a theorem as
well. It remains to observe that
hp/q, u, wi = hp/q, [[σ]]V \{r} , [[µ]]V \{r} i(t1 , . . . , tn ) = [[τ ]]V \{r} (t1 , . . . tn ),
hp/q, u0 , w0 i = hp/q, [[σ]]V , [[µ]]V i(t1 , . . . , tn ) = [[τ ]]V (t1 , . . . tn )
(by 7.2 and the definition of V -coding of terms).
Lemma 7.5. Suppose α is an atomic formula, V ⊆ {1, . . . , n} and r ∈ V . Then
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermAt(pxr q, tr , [[α]]V \{r} (t1 , . . . , tn ), [[α]]V (t1 , . . . , tn )).
Proof. There are terms σ, µ such that α is either σ ∈ µ or σ = µ. Let us consider only
the first case, since the proof for the second is analogous. By 23, we have the general
theorem
` RepVarTermTerm(v, t, u, u0 ) ∧ RepVarTermTerm(v, t, w, w 0 ) →
RepVarTermAt(v, t, hp∈q, u, wi, hp∈q, u0 , w0 i).
Using the same substitutions for v, . . . , w 0 as in the proof of Lemma 7.4, we see from
Lemma 7.4 that the left side of the above implication becomes a theorem when it is
preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”. So the same applies to the right side. It
remains to observe that
hp∈q, u, wi = hp∈q, [[σ]]V \{r} , [[µ]]V \{r} i(t1 , . . . , tn ) = [[α]]V \{r} (t1 , . . . , tn ),
hp∈q, u0 , w0 i = hp∈q, [[σ]]V , [[µ]]V i(t1 , . . . , tn ) = [[α]]V (t1 , . . . , tn ).
Lemma 7.6. Suppose β is a formula, V ⊆ {1, . . . , n} and r ∈ V . Then
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermForm(pxr q, tr , [[β]]V \{r} (t1 , . . . , tn ), [[β]]V (t1 , . . . , tn )).
Proof. We fix n ≥ 1. When β is atomic, the proof is already there. So we have to
consider the cases when β is of the form ∃xi γ, or γ ∨ δ, or ¬γ, while accepting the
inductive hypothesis that the implication in 7.6 has been proved for γ and δ (and for all
r, V satisfying r ∈ V ⊆ {1, . . . , n}).
A. Case when β is ∃xi γ. As in the proof of 7.4.B, we consider various possibilities for
i, r and V .
(a): i 6∈ V and hence pxi q 6= pxr q. We have the general theorem
` Var(u) ∧ u 6= v ∧ RepVarTermForm(v, t, y, y 0 ) →
RepVarTermForm(v, t, hp∃q, u, yi, hp∃q, u, y 0 i).
(see 24, 25). Let us substitute here u = pxi q, v = pxr q, t = tr and
y = [[γ]]V \{r} (t1 , . . . , tn ), y 0 = [[γ]]V (t1 , . . . , tn ).
Then, by the inductive hypothesis, the left side of the above implication becomes a
theorem when preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”. So the same applies to the
right side, where, by the definition of the V -coding,
hp∃q, u, yi = hp∃q, pxi q, [[γ]]V \{r} i(t1 , . . . , tn )
= [[∃xi γ]]V \{r} (t1 , . . . , tn ) = [[β]]V \{r} (t1 , . . . , tn ),
and similarly for y 0 and V in place of y and V \ {r}.
(b): i ∈ V and i 6= r. Let W = V \ {i}. In the general implication used in the proof
of (a), we make the same substitutions for u, v, t as in (a) and put
y = [[γ]]W \{r} (t1 , . . . , tn ), y 0 = [[γ]]W (t1 , . . . , tn ).
Then, by the inductive hypothesis applied to γ and W , the left side of that implication,
when preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”, becomes a theorem. So the same
applies to the right side. We observe that now, by the definition of the V -coding, and
since W \ {r} = (V \ {r}) \ {i},

hp∃q, u, yi = hp∃q, pxi q, [[γ]]W \{r} i(t1 , . . . , tn )
= [[∃xi γ]]V \{r} (t1 , . . . , tn ) = [[β]]V \{r} (t1 , . . . , tn ).
We have a similar equality for y 0 , W and V in place of y, W \ {r} and V \ {r}.
(c): i = r. We use the general theorem
` RepVarTermForm(v, t, y, y 0 ) → RepVarTermForm(v, t, hp∃q, v, yi, hp∃q, v, yi).
Here we substitute v = pxr q = pxi q and let t, y, y 0 be as in (a). By the definition of
V -coding,
hp∃q, v, yi = hp∃q, pxr q, [[γ]]V \{r} i(t1 , . . . , tn )
= [[∃xr γ]]V \{r} (t1 , . . . , tn ) = [[β]]V \{r} (t1 , . . . , tn )
= [[∃xi γ]]V (t1 , . . . , tn ) = [[β]]V (t1 , . . . , tn ).
So we reach the conclusion, arguing similarly to cases (a) and (b).
B. Case when β is γ ∨ δ. We have the general theorem
` RepVarTermForm(v, t, y, y 0 ) ∧ RepVarTermForm(v, t, z, z 0 ) →
RepVarTermForm(v, t, hp∨q, y, zi, hp∨q, y 0 , z 0 i)
(see 24, 25). We substitute here v = pxr q, t = tr and
y = [[γ]]V \{r} (t1 , . . . , tn ), y 0 = [[γ]]V (t1 , . . . , tn )),
z = [[δ]]V \{r} (t1 , . . . , tn ), z 0 = [[δ]]V (t1 , . . . , tn )).
Then
hp∨q, y, zi = hp∨q, [[γ]]V \{r} , [[δ]]V \{r} i(t1 , . . . , tn )
= [[γ ∨ δ]]V \{r} (t1 , . . . tn ) = [[β]]V \{r} (t1 , . . . , tn )
and similarly for y, z, V \ {r} replaced by y 0 , z 0 , V . We reach the conclusion by using the
induction hypothesis for γ and δ.
C. Case when β is ¬γ. We argue as in the preceding case, using the theorem
` RepVarTermForm(v, t, y, y 0 ) → RepVarTermForm(v, t, hp¬q, yi, hp¬q, y 0 i).
Corollary 7.7. For any formulas α, β and n = 1, 2, . . . ,
` Const(t1 ) ∧ . . . ∧ Const(tn ) ∧ Pf(pα → βq) →
{Pf([[α]]{1,...,n} (t1 , . . . , tn )) → Pf([[β]]{1,...,n} (t1 , . . . , tn ))}.
Proof. Suppose Const(ti ) for i = 1, . . . , n and Pf(pα → βq) (where α → β is the abbre-
viation for ¬α ∨ β). Then, by 7.3,
Pf([[α → β]]{1,...,n} (t1 , . . . , tr )),
and by the rules of V -coding this is
Pf(Impl([[α]]{1,...,n} (t1 , . . . , tn ), [[β]]{1,...,n} (t1 , . . . , tn ))).
Since the Modus Ponens Rule of Inference is incorporated in the definition ((8) and (9))
of the formula Pf, we have the general theorem
` Pf(Impl(x, y)) → [Pf(x) → Pf(y)].
It remains to substitute x = [[α]]{1,...,n} (t1 , . . . , tn ) and y = [[β]]{1,...,n} (t1 , . . . , tn ).
8. The special p-function F
We shall presently introduce the special p-function F . According to the plan outlined at
the beginning of the previous section, a key role will be given to the p-term [[α]]((F (x 1 ),
. . . , F (xn )), obtained from the term [[α]] by replacing each xi occurring in it by the p-term
F (xi ). In this section, after defining F , we shall show that the p-formula Const(F (xi )) is
(or rather, abbreviates) a theorem of HF. This will allow us to apply the results of the
preceding section to the situation when each ti is replaced by F (xi ).
To define F , we need to select definably from every non-empty set x an element v, so
that then canonically x = u / v, with u = x \ {v}. To select v ∈ x, we use an ordering
relation < on the universe of all (hereditarily finite) sets. The description of < starts from
the following observation: Any ordering of a (finite) set y determines an ordering of the
power set P (y) (= set of subsets of y). This is defined so that for z1 , z2 ⊆ y and z1 6= z2
one puts z1 < z2 iff the greatest element of the symmetric difference (z1 ∪ z2 ) \ (z1 ∩ z2 ) is
in z2 , and otherwise, z2 < z1 . Since all sets can be reached from 0 by repeated application
of the power-set operation (and 0 is ordered!), the above principle yields a unique ordering
of the universe of all sets. This ordering refines the partial ordering by ⊂ and it also has
the property that x < y when x ∈ y (see Ap.5 for the details). We can use the same
symbol < for this order and for the usual order among ordinals, since on the sub-universe
of ordinals, both orders coincide.
Definition 8.1. The special p-function F is defined by recursion on inequality (see
Ap.5.14) as follows:
1◦ F (0) = 0.
2◦ Suppose x 6= 0 and F (y) has been defined for all y < x. Then put
F (x) = hp/q, F (u), F (v)i,
where v is the <-greatest element of x and u = x \ {v} (i.e., x = u / v and v 6∈ u).
The existence of v follows from Ap.5.10. Also, we have u, v < x, by 4.5 and 5.9 in Ap.
(We leave to the reader the task of describing the p-function G required in 5.14 of Ap.)
Let us list some properties of F that may justify why we called F special:
(a) In the standard model S, the values of F are codes (or rather, named by codes)
of constant terms.
(b) The definition of F parallels the coding of terms (compare 2◦ in 8.1 with pσ / τ q
= hp/q, pσq, pτ qi ).
(c) F is injective.
(d) For each constant term τ there is a constant term µ such that ` τ = µ and
F (µ) = pµq.
(e) F (k) = p(pxk q)q for every ordinal k 6= 0. (Hence F coincides on the sub-universe
of ordinals with the two p-functions W, H of Section 6.)
Property (a) follows from 8.2, (b) is obvious, and (c), (d), (e) will not be used below.
There is a p-function in [Bo] that plays a similar role to that of F here, however the
definition in [Bo] is simpler, because in PA each x is the successor of a unique u, so that
x = Su can be used instead of x = u / v.
Theorem 8.2. ` Const(F (x)).
Proof. Let us prove Const(F (x)) by induction on < (applying the scheme 5.11 of Ap).
We have Const(F (0)), due to F (0) = 0. Now let x 6= 0 and assume Const(F (y)) for all
y < x. Let u, v < x be such that v is the <-largest element of x and u = x \ {v}, so that
F (x) = hp/q, F (u), F (v)i. Since then Const(F (u)) and Const(F (v)), by the inductive
assumption, we can use the easily proved theorem
` Const(t1 ) ∧ Const(t2 ) → Const(hp/q, t1 , t2 i)
to deduce Const(hp/q, F (u), F (v)i), i.e., Const(F (x)).
The theorem Const(F (x)) allows us to replace in 7.3 and 7.7 each ti by F (xi ), thus
leading to:
Lemma 8.3. If β is a theorem and all variables free in β are among x1 , . . . , xn then
` Pf([[β]](F (x1 ), . . . , F (xn ))).
Proof. Assume ` β. Then ` Pf(pβq), by 4.4. Replacing each ti by F (xi ) in 7.3 and
using 8.2, we obtain
` Pf(pβq) → Pf([[β]]{1,...,n} (F (x1 ), . . . , F (xn ))).
Moreover the assumption about the variables implies [[β]]{1,...,n} = [[β]].
Lemma 8.4. If α → β is a theorem and all variables free in α and β are among x 1 , . . . , xn
then
` Pf([[α]](F (x1 ), . . . , F (xn ))) → Pf([[β]](F (x1 ), . . . , F (xn ))).
Proof. Assume ` α → β. Then ` Pf(pα → βq), by 4.4. Replacing each ti by F (xi ) in 7.7
and using 8.2, we obtain
` Pf(pα → βq) →
{Pf([[α]]{1,...,n} (F (x1 ), . . . , F (xn ))) → Pf([[β]]{1,...,n} (F (x1 ), . . . , F (xn )))}.
Moreover the assumption about the variables implies [[β]]{1,...,n} = [[β]] and [[α]]{1,...,n}
= [[α]].
In the above lemmas we obtained [[α]](F (x1 ), . . . , F (xn )) by replacing each ti in
[[α]](t1 , . . . , tn ) by F (xi ). However, t1 , . . . , tn were introduced merely for the sake of con-
venience and clearly we can pass directly from [[α]] to [[α]](F (x1 ), . . . , F (xn )) just by
replacing each xi in [[α]] by F (xi ).
9. Gödel’s Second Incompleteness Theorem
The proof of Gödel’s Second Incompleteness Theorem is based on the fact that α →
Pf(pαq) is a theorem of HF for every Σ-sentence α. We obtain this result as a direct
consequence of Theorem 9.1 whose proof occupies the major part of this section.
Theorem 9.1. If α is a Σ-formula and all variables free in α are among x 1 , . . . , xn then
the p-formula
(?) α → Pf([[α]](F (x1 ), . . . , F (xn )))
abbreviates a theorem of HF.
Definition 9.2. A formula α will be called a (?)-formula iff (?) abbreviates a theorem
of HF for every (or equivalently, some) n such that all free variables of α are among
x1 , . . . , x n .
Due to 8.4, it will suffice to show that every strict Σ-formula is a (?)-formula. Recall
that (see 2.1) the class Σ of strict Σ-formulas is the least class of formulas containing all
atomic formulas of the form xi ∈ xj , and such that Σ is closed under conjunctions, dis-
junctions and the application of existential quantifiers and bounded universal quantifiers
∀(xi ∈ xj ), where i 6= j. Thus the proof is of 9.1 is reduced to showing Lemmas 9.4–9.7
below. (However we first need the auxiliary result 9.3.)
Lemma 9.3. xi = xj is a (?)-formula.
Proof. We have to prove that
` xi = xj → Pf([[xi = xj ]](F (x1 ), . . . , F (xn )))
provided i, j ≤ n. Since in this case, by the definition of pseudo-coding,
[[xi = xj ]](F (x1 ), . . . , F (xn )) = hp=q, F (xi ), F (xj )i,
we have to prove
(1) xi = xj → Pf(hp=q, F (xi ), F (xj )i).
We have:
` xi = x i .
` Pf([[xi = xi ]](F (x1 ), . . . , F (xn ))) when i ≤ n, by 8.3.
` Pf(hp=q, F (xi ), F (xi )i), by the definition of pseudo-coding.
` xi = xj → Pf(hp=q, F (xi ), F (xj )i), by the rules of logic.
Lemma 9.4. xi ∈ xj is a (?)-formula.
Proof. We have to show
` xi ∈ xj → Pf([[xi ∈ xj ]](F (x1 ), . . . , F (xn )))
provided i, j ≤ n. Similarly to the case of the formula xi = xj , this amounts to proving
` xi ∈ xj → Pf(hp∈q, F (xi ), F (xj )i).
There will be no loss of generality if we take here i = 1, j = 2 = n, so let β(x2 ) be the

formula
x1 ∈ x2 → Pf(hp∈q, F (x1 ), F (x2 )i).
We shall prove ∀x2 β(x2 ) by induction on < (applying the scheme 5.11 in Ap) with
respect to x2 , keeping x1 fixed. In this proof we shall need the theorem
(2) ` F (x2 ) = hp/q, F (x3 ), F (x4 )i →
{Pf(hp∈q, F (x1 ), F (x3 )i) ∨ Pf(hp=q, F (x1 ), F (x4 )i) → Pf(hp∈q, F (x1 ), F (x2 )i)}.
To prove (2), note first the theorems
` Pf(zi ) → Pf(hp∨q, z1 , z2 i) (i = 1, 2),
which follow directly from our definition of the Pf formula (see second Boolean Axiom
in 4.1 and the Modus Ponens Rule in 4.2). Hence
` Pf(z1 ) ∨ Pf(z2 ) → Pf(hp∨q, z1 , z2 i).
We substitute here z1 = hp∈q, F (x1 ), F (x3 )i and z2 = hp=q, F (x1 ), F (x4 )i and observe
that then
hp∨q, z1 , z2 i = hp∨q, hp∈q, F (x1 ), F (x3 )i, hp=q, F (x1 ), F (x4 )ii
= [[x1 ∈ x3 ∨ x1 = x4 ]](F (x1 ), F (x2 ), F (x3 ), F (x4 )).
This leads to
(3) ` Pf(hp∈q, F (x1 ), F (x3 )i) ∨ Pf(hp=q, F (x1 ), F (x4 )i) →
Pf([[x1 ∈ x3 ∨ x1 = x4 ]](F (x1 ), F (x2 ), F (x3 ), F (x4 ))).
Moreover
(4) ` Pf([[x1 ∈ x3 ∨ x1 = x4 ]](F (x1 ), F (x2 ), F (x3 ), F (x4 ))) →
Pf([[x1 ∈ x3 / x4 ]](F (x1 ), F (x2 ), F (x3 ), F (x4 )))
by the theorem x1 ∈ x3 ∨ x1 = x4 → x1 ∈ x3 / x4 and 8.4. Combining (3) and (4), we
obtain
` Pf(hp∈q, F (x1 ), F (x3 )i) ∨ Pf(hp=q, F (x1 ), F (x4 )i) →
Pf(hp∈q, F (x1 ), hp/q, F (x3 ), F (x4 )i).
So (2) follows.
Returning now to the proof of ∀x2 β(x2 ), by induction on >, we note that it will suffice
to show that
(5) ∀(x3 < x2 )β(x3 ) → β(x2 )
is a theorem. Let us assume ∀(x3 < x2 )β(x3 ). If x2 = 0 then (5) is a theorem, because
β(0) is a theorem (being an implication with the premise x1 ∈ 0). If x2 6= 0, let x3 < x2
and x4 be such that x2 = x3 / x4 and F (x2 ) = hp/q, F (x3 ), F (x4 )i (see 8.1). Then
x1 ∈ x2 → x1 ∈ x3 ∨ x1 = x4 → Pf(hp∈q, F (x1 ), F (x3 )i) ∨ Pf(hp=q, F (x1 ), F (x4 )i)
by the assumption (since x3 < x2 ) and (1) in 9.3. Combining this with (2), we get β(x2 ),
so (5) is shown.
Lemma 9.5. If α and β are (?)-formulas then α ∨ β and α ∧ β are (?)-formulas.
Proof. Suppose all variables free in α ∨ β are among x1 , . . . , xn . Then we should show
(a) ` α ∨ β → Pf([[α ∨ β]](F (x1 ), . . . , F (xn ))),
(b) ` α ∧ β → Pf([[α ∧ β]](F (x1 ), . . . , F (xn ))).
To get (a), note that, by assumption,
` α ∨ β → Pf([[α]](F (x1 ), . . . , F (xn ))) ∨ Pf([[β]](F (x1 ), . . . , F (xn ))).
It now remains to apply the theorem Pf(z1 ) ∨ Pf(z2 ) → Pf(hp∨q, z1 , z2 i) with z1 =
[[α]](F (x1 ), . . . , F (xn )), z2 = [[β]](F (x1 ), . . . , F (xn )), noting that in this case
hp∨q, z1 , z2 i = [[α ∨ β]](F (x1 ), . . . , F (xn )).
The proof of part (b) is analogous.
Lemma 9.6. If α is a (?)-formula then, for every i, ∃xi α is a (?)-formula. In other
words, if all variables free in ∃xi α are among x1 , . . . , xn then
` ∃xi α → Pf([[∃xi α]](F (x1 ), . . . , F (xn ))).
Proof. Let m ≥ n be such that all variables free in α are among x1 , . . . , xm . Then
` Pf([[α]](F (x1 ), . . . , F (xm ))) → Pf([[∃xi α]](F (x1 ), . . . , F (xm )))
by the Specialization Axiom ` α → ∃xi α and 8.4. Combining this with the assump-
tion (?), we obtain
` α → Pf([[∃xi α]](F (x1 ), . . . , F (xm ))).
Since xi is not free on the right hand side, we can apply the ∃-introduction rule (in 4.2),
to get
` ∃xi α → Pf([[∃xi α]](F (x1 ), . . . , F (xm ))).
All the variables free in ∃xi α are among x1 , . . . , xn , so we can replace here m by n.
Lemma 9.7. If α is a (?)-formula then, for every i 6= j, ∀(xi ∈ xj )α is a (?)-formula. In
other words, if all variables free in ∀(xi ∈ xj )α are among x1 , . . . , xn then
` ∀(xi ∈ xj )α → Pf([[∀(xi ∈ xj )α]](F (x1 ), . . . , F (xn ))).
Proof. Let α be a (?)-formula. The following two observations will be used in the proof:
(a) If β ↔ α then β is a (?)-formula (in virtue of 8.4).
(b) If a variable xk is replaced in α by some variable xl (substitutable for xk in α)
then the formula so obtained is also a (?)-formula.
To simplify matters, we take i = 1, j = 2 and represent by x3 the remaining free
variables in α. We shall distinguish two cases:
Case 1: x2 is not free in α. We shall write α as α(x1 ). It will be also convenient
to take n = 5 and assume that x5 is substitutable for x1 (otherwise, we can rename
the bound variables of α so that x5 will not be among them and then replace α by the
formula β so obtained; see (a) above).
Thus our purpose is to prove ∀x2 γ(x2 ) for the formula γ(x2 ):
∀(x1 ∈ x2 )α(x1 ) → Pf([[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), . . . , F (x5 ))),
under the assumption

(6) ` α(x1 ) → Pf([[α(x1 )]](F (x1 ), . . . , F (x5 ))).
The proof, by induction on <, will be somewhat analogous to the proof of ∀x2 β(x2 ) in 9.4.
We begin by showing
(7) ` F (x2 ) = hp/q, F (x4 ), F (x5 )i →
{Pf([[∀(x1 ∈ x4 )α(x1 )]](F (x1 ), . . . , F (x5 ))) ∧ Pf([[α(x5 )]](F (x1 ), . . . , F (x5 ))) →
Pf([[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), . . . , F (x5 )))}.
To do this, note first that
(8) ` Pf(z1 ) ∧ Pf(z2 ) → Pf(Conj(z1 , z2 )),
by our definition of the Pf formula in Section 4. Let us substitute here
z1 = [[∀(x1 ∈ x4 )α(x1 )]](F (x1 ), . . . , F (x5 )),
z2 = [[α(x5 )]](F (x1 ), . . . , F (x5 ))
and note that then
Conj(z1 , z2 ) = [[∀(x1 ∈ x4 )α(x1 ) ∧ α(x5 )]](F (x1 ), . . . , F (x5 )),
by our definition of pseudo-coding. Together with (8) this leads to
(9) ` Pf([[∀(x1 ∈ x4 )α(x1 )]](F (x1 ), . . . , F (x5 ))) ∧ Pf([[α(x5 )]](F (x1 ), . . . , F (x5 ))) →
Pf([[∀(x1 ∈ x4 )α(x1 ) ∧ α(x5 )]](F (x1 ), . . . , F (x5 ))).
Since x5 is substitutable for x1 in α, one easily proves
` ∀(x1 ∈ x4 )α(x1 ) ∧ α(x5 ) → ∀(x1 ∈ x4 / x5 )α(x1 ).
Hence
(10) ` Pf([[∀(x1 ∈ x4 )α(x1 ) ∧ α(x5 )]](F (x1 ), . . . , F (x5 ))) →
Pf([[∀(x1 ∈ x4 / x5 )α(x1 )]](F (x1 ), . . . , F (x5 ))),
due to 8.4. Finally, from the definition of pseudo-coding, and under the assumption
F (x2 ) = hp/q, F (x4 ), F (x5 )i, we obtain
[[∀(x1 ∈ x4 / x5 )α(x1 )]](F (x1 ), . . . , F (x5 )) =
[[∀x1 (x1 ∈ x4 / x5 → α(x1 ))]](F (x1 ), . . . , F (x5 )) =
All(px1 q, Impl(hp∈q, px1 q, hp/q, F (x4 ), F (x5 )ii, [[α(x1 )]]{3} (F (x1 ), . . . , F (x5 )))) =
All(px1 q, Impl(hp∈q, px1 q, F (x2 )i, [[α(x1 )]]{3} (F (x1 ), . . . , F (x5 )))) =
[[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), . . . , F (x5 )).
This calculation, together with (9) and (10), yields the desired result (7).
Returning now to the proof of ∀x2 γ(x2 ) by induction on <, we recall (see 5.1 in Ap)
that it will suffice to establish that
(11) ∀(x4 < x2 )γ(x4 ) → γ(x2 )
is a theorem. Thus let us assume ∀(x4 < x2 )γ(x4 ).
Suppose x2 = 0. Then the proof of (11) obviously reduces to verifying ` γ(0). To do

this, we note first that ` ∀(x1 ∈ 0)α(x1 ), whence by 8.3,
` Pf([[∀(x1 ∈ 0)α(x1 )]](F (x1 ), . . . , F (x5 ))).
Now, since ∀(x1 ∈ x2 )α(x1 ) is just an abbreviation of ∀x1 (x1 ∈ x2 → α(x1 )), we easily
check that
[[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), F (0), F (x3 ), F (x4 ), F (x5 )) =
[[∀(x1 ∈ 0)α(x1 )]](F (x1 ), F (x2 ), F (x3 ), F (x4 ), F (x5 ))
(as both sides of this equality are equal to
All(px1 q, Impl(hp∈q, px1 q, 0i, [[α(x1 )]]{3} (F (x1 ), . . . , F (x5 )))),
because F (0) = 0). Hence
` Pf([[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), F (0), F (x3 ), F (x4 ), F (x5 ))),
which shows that ` γ(0). Thus (11) is a theorem when x2 = 0.
Now suppose x2 6= 0 and let x4 , x5 < x2 be such that x2 = x4 / x5 and F (x2 ) =
hp/q, F (x4 ), F (x5 )i (see 8.1). Then γ(x4 ) by assumption, whence
∀(x1 ∈ x4 )α(x1 ) → Pf([[∀(x1 ∈ x4 )α(x1 )]](F (x1 ), . . . , F (x5 )))
(since x2 is not free in α). Moreover, replacing x1 by x5 in (6) we conclude
` α(x5 ) → Pf([[α(x5 )]](F (x1 ), . . . , F (x5 )))
(see property (b) above). Hence
∀(x1 ∈ x2 )α(x1 ) → ∀(x1 ∈ x4 )α(x1 ) ∧ α(x5 ) →
Pf([[∀(x1 ∈ x4 )α(x1 )]](F (x1 ), . . . , F (x5 ))) ∧ Pf([[α(x5 )]](F (x1 ), . . . , F (x5 ))).
Combining this sequence of implications with (7), we obtain γ(x2 ).
Case 2: x2 is free in α. The assumption is now that α(x1 , x2 , x3 ) is a (?)-formula.
Consider a variable that is substitutable for x2 in α, say x4 . Then, by virtue of (b)
above, also α(x1 , x4 , x3 ) is a (?)-formula. Treating now the pair x4 , x3 the same way
as x3 was treated in Case 1 we conclude by the same considerations as in Case 1
that ∀(x1 ∈ x2 )α(x1 , x4 , x3 ) is a (?)-formula. Hence, applying (b) again, we see that
∀(x1 ∈ x2 )α(x1 , x2 , x3 ) is a (?)-formula.
Let Con(HF) be a sentence stating that HF is a consistent theory. For instance, this
could be: “0 ∈ 0 is not provable in HF”. To express this in HF, we may take Con(HF) to
be ¬ Pf(p0 ∈ 0q).
Theorem 9.8 (Gödel’s Second Incompleteness Theorem). If HF is a consistent theory,
then Con(HF) is not a theorem of HF.
Proof. We take the sentence δ satisfying
(4) ` δ ↔ ¬ Pf(pδq)
(see 6.5) and we show that
(44) ` Con(HF) → δ.
Then, since δ is not a theorem (see 6.5), neither is Con(HF). To show (44), we apply 9.1
to the Σ-sentence Pf(pδq):
` Pf(pδq) → Pf(pPf(pδq)q).
On the other hand, applying 8.4 to (4), we deduce
` Pf(pδq) → Pf(p¬ Pf(pδq)q).
The last two theorems yield
` Pf(pδq) → Pf(pPf(pδq)q) ∧ Pf(p¬ Pf(pδq)q).
By the definition of Pf, we have, for any formulas α, β,
` Pf(pαq) ∧ Pf(pβq) → Pf(pα ∧ βq),
so we conclude from the above:
` Pf(pδq) → Pf(pPf(pδq) ∧ ¬ Pf(pδq)q).
Since ` Pf(pδq) ∧ ¬ Pf(pδq) → 0 ∈ 0, we can use 8.4 to obtain
` Pf(pδq) → Pf(p0 ∈ 0q),
that is,
` Pf(pδq) → ¬ Con(HF).
Hence
` Con(HF) → ¬ Pf(pδq).
Combining this with (4), we obtain (44).
This Incompleteness Phenomenon still attracts very much attention. Apart from the
publications listed below (consulted during the preparation of this paper) one can find a
more extensive list of references in [L].
10. Finite sets and arithmetic
We conclude this paper by stating a theorem which says that each of the theories HF and
PA is an extension by definitions of the other. In detail: If one adds to PA the symbols ∈
and / and one views x ∈ y and x ∈ y / z as abbreviations of certain formulas of PA (the
formulas (∈) and (/) in 10.1) then the axioms of HF can be proved in PA. Conversely,
if one adds to HF the function symbols S (successor), + and ×, and one views Sx = y,
x + y = z and x × y = z as abbreviations of certain formulas of HF then all axioms of
PA become theorems of HF. Most of these facts are discussed in [Be], [My] and [TG]
(although in [Be] this mutual relationship between PA and HF is not fully worked out,
in [TG] the results are merely stated and in [My] the work concerns the model ω of PA).
This situation would not exclude the possibility that arithmetic, when so defined
in HF, is stronger than PA (i.e., that some theorem of HF, after the elimination of ∈
and / in favor of +, × and S, becomes a statement about numbers not provable in PA).
The opposite situation (where PA and HF are interchanged) might be possible as well.
That neither of these can happen (at least when (∈) in 10.1 below is used for defining
the relation x ∈ y in PA) is a consequence of the following result.
Theorem 10.1. Let T be the theory, in the language {0, =, +, ×, S, ∈, /}, axiomatized by
the axioms of PA and two further axioms (where z < y abbreviates ∃w(z + Sw = y)):
(∈) x ∈ y ↔ ∃(u < y)∃(v < y)[y = v + 2x + u × 2Sx ∧ v < 2x ]
(that is, x occurs as an exponent when y is written as a sum of powers of 2),
(/) x ∈ y / z ↔ x ∈ y ∨ x = z.
Then there exist formulas α(x, y), β(x, y, z) and γ(x, y, z) in the language of HF such
that another axiomatization of T is provided by the axioms HF1, HF2, the axiom scheme
HF3 and three further axioms:
(S) Sx = y ↔ α(x, y),
(+) x + y = z ↔ β(x, y, z),
(×) x × y = z ↔ γ(x, y, z).
Moreover α, β, γ are unique up to equivalence in HF.
Thus PA and HF have the common extension T by definitions:
T = PA + (∈) + (/) = FS + (S) + (+) + (×).
The above definition (∈) of the relation ∈ in arithmetic is due to Ackermann [A]. The
proof of 10.1 is too tedious to be included here; let us merely indicate how α, β and γ
can be obtained. Suppose E : S → ω is the “enumeration” by natural numbers of the
universe S of the standard model S of HF (see Ap.6), satisfying
1◦ E(0) = 0,
2◦ E({a0 , . . . , an }) = 2E(a0 ) + . . . + 2E(an )
for any distinct a0 , . . . , an ∈ S. It is not hard to formalize in HF this description of E so
as to get a p-function E (in the sense of Section 5) which maps bijectively the universe
of HF onto its sub-universe of ordinal numbers. Then we define
α(x, y) ↔ E(x) / E(x) = E(y),
β(x, y, z) ↔ E(x) ⊕ E(y) = E(z),
γ(x, y, z) ↔ E(x) ⊗ E(y) = E(z),
where ⊕ and ⊗ denote addition and multiplication of ordinals in HF.
11. A theorem of Reinhardt
This section deals with a question raised by J. Mycielski, whether there exists a theorem
which is of moderate length (so that it can be written down), whilst each proof of it
is so long that it can never be written down. W. Reinhardt has indicated (verbal com-
munication) how to construct a theorem of this kind and we shall work out here some
of the details of this construction. In its original form this result concerned PA but we
believe that the proof of Theorem 11.1 would become much more cumbersome if we were
to replace HF by PA.
Theorem 11.1. If HF is consistent, then there exists a theorem δ of HF which can be
written down, yet such that no proof of δ can be written down (because of its immense
length).
Proof. The proof is based on Theorem 6.4 (Gödel’s Diagonal Lemma). We conclude from
Gödel’s Lemma that there exists, for any fixed ordinal, a sentence δ asserting about itself
that it has no proof whose code is less (in the sense of the inequality < in HF) than
that ordinal. It then remains to attend to the details, to ensure that δ is of reasonably
moderate length, HF ` δ and there does not exist a proof of δ of moderate length. To be
more precise, we start with
Definition 11.2. The length of a string of symbols, such as a term, a formula or a proof,
is the total number of symbols occurring in the string. Here, in counting the symbols, we
disregard the parentheses and commas. Every re-occurring symbol is counted at every
occurrence.
Definition 11.3. It will be said that a formula α is of strictly moderate length, or simply
strictly moderate, if α can be practically written down using an imaginably limited amount
of paper or time. A formula that is HF-equivalent to a strictly moderate one will be called
moderate.
This definition is not at all precise, so it will be left to the reader to agree with us when
we call a given formula “moderate”. Here is an example: Consider the ordinal-valued
exponential p-function exp2 (x) to the base 2, i.e., the p-function satisfying exp2 (x) =
exp2 (x) + exp2 (x) for every non-zero ordinal x and exp2 (x) = 1 when x = 0 or x is not
an ordinal. Then exp2 is defined recursively on ordinals (Ap.3.3) and thus y = exp2 (x)
may be viewed as an abbreviation of a moderate formula. The same may be said about
the 6-fold superposition
y = exp2 (. . . (exp2 (x))))))).
Let K be the constant term for which K = exp2 (. . . (exp2 (2))))))) is a theorem of HF.
Then K denotes a very large ordinal number, in fact K > 1050 , which supposedly exceeds
the number of elementary particles in the Universe. Since only 0, / and parentheses may
appear in any constant term, it is impossible to write K explicitly. Thus y = K is a
formula of HF that is not strictly moderate. But y = K is evidently moderate since it
is HF-equivalent to a strictly moderate formula resulting from a 6-fold application of
the moderate formula abbreviated by y = exp2 (x). We shall need below the formula
z = R(4K + 17), and a discussion similar to the above (based on the recursive definition
of R in Ap.3.3) shows that this one is moderate as well.
We now claim that for proving Reinhardt’s theorem it is enough to find a moderate
formula α(x), with one free variable x, such that for every sentence δ:
(a) S α(pδq) ⇒ ` δ. [Recall that S = standard model of HF.]
(b) If ` δ and δ has a proof of length less than K, then S α(pδq).
(c) S ¬α(pδq) ⇒ ` ¬α(pδq).
Indeed, given such α(x), let δ be the sentence obtained in the proof of the Diagonal
Lemma (6.4 above), when applied to ¬α(x), so that
(1) ` δ ↔ ¬α(pδq).
Then δ is moderate as well. Moreover S α(pδq) is impossible, by (a) and (1). Thus
S ¬α(pδq), so (c) and (1) imply ` δ. Also, no proof of δ can be of length less than K,
by (b).
Let us show that (a), (b), (c) are statisfied when α(x) is
(2) ∃(k ∈ K)∃(s ∈ R(4K + 17))Prf(s, k, x).
[Note that ` k ∈ K ↔ Ord(k) ∧ k < K.] Then α(x) is equivalent to
∀y∀z[(y = K ∧ z = R(4K + 17)) → ∃(k ∈ y)∃(s ∈ z)Prf(s, k, x)],
so we see from the above discussion of the formulas y = K and z = R(4K + 17) that α(x)
is moderate. Since condition (a) is now a direct consequence of 4.4, it remains to verify
(b) and (c).
Proof of (b). We need to show that if δ has a proof of length less than K then there
exists a proof δ0 , . . . , δn of δ such that the sequence
(3) s = {h0, pδ0 qi, . . . , hn, pδn qi}
satisfies s ∈ R(4K + 17) and n+ < K. The next two lemmas will serve this purpose.
Lemma 11.4. Let w be a term or formula of length l, and suppose that the index j of
every variable xj occurring in w does not exceed m. Then rank(pwq) ≤ 2l + m + 12.
Proof. First, we deduce directly from the definition of rank that rank(j) = j for all
ordinals j (see Ap.4.4 and 4.5). Let l = 1. Then w is 0 or xj and pwq = 0 or pwq = j ≤ m.
Since rank(j) = j, the assertion is true. We now proceed by induction on l, noting first
that hx, yi = {{x}, {x, y}} and hx, y, zi = hx, hy, zii imply
(4) rank(hx, yi) ≤ max(rank(x), rank(y)) + 2,
(5) rank(hx, y, zi) ≤ max(rank(x), rank(y), rank(z)) + 4.
If l > 1, then pwq is hx, yi or hx, y, zi, where x, y, z are codes of certain sub-strings
of the string w. If pwq is hx, yi, then x = p¬q, whence rank(x) = 10 (see Section 1),
moreover the length of the sub-string of w whose code in y is l − 1. By the inductive
assumption, rank(y) ≤ 2(l − 1) + m + 12, and rank(pwq) ≤ 2l + m + 12 follows from (4).
If pwq = hx, y, zi, then l > 2, x is one of the codes p∈q, p/q, . . . , p∃q and y, z are codes
of sub-strings of w of length at most l − 2. Hence rank(x) ≤ rank(p∃q) ≤ 12, and using
the inductive assumption and (5), we get rank(pwq) ≤ 2l + m + 12.
Lemma 11.5. If each formula δi in the sequence s (see (3)) is of length at most l and the
index j of any variable xj occurring in any δi does not exceed m, then s < 2l +m+n+16.
Proof. One first shows by induction on ordinals n that if rank(x) = n then x < n+
(this is a consequence of rank(n) = n and R(n) < R(n+ ) \ R(n) proved in Ap.5.8 and
5.9). In other words, x < [rank(x)]+ for all x. Thus it will be enough to show that
rank(s) ≤ 2l + m + n + 15. Each element of s is a pair hi, pδi qi, where i ≤ n and
rank(pδi q) ≤ 2l + m + 12, by 11.4. Thus, by (4), each element of s is of rank at most

2l + m + n + 14, which implies that rank(s) ≤ 2l + m + n + 15.
To conclude the proof of (b), assume that δ has a proof δ0 , . . . , δn shorter than K.
Such proof must employ less than K variables, so replacing these variables by others, if
need be, we may assume that the indices j of the variables xj occurring in the formulas
δi are less than K. Also, it is clear that n+ < K and that the length of each formula δi is
less than K. Thus the assumptions of 11.5 are satisfied with l < K, m < K and n+ < K.
It now follows from 11.5 that the sequence s (see (3)) satisfies s < 4K + 16. The theorems
y < x → y ∈ R(rank(x)+ ) (see proof of Ap.5.12) and Ord(j) → rank(j) = j now imply
that s ∈ R(4K + 17). Hence S α(pδq).
Proof of (c). Clearly, ¬α(x) is equivalent to
∀(k ∈ K)∀(s ∈ R(4K + 17))¬ Prf(s, k, x).
It follows from the construction of the standard model S (Ap.6) that there are constant
terms σ and τ naming K and R(4K + 17) respectively. Thus Lemma 6.2 in the Appendix
implies that there are constant terms σ1 , . . . , σl and τ1 , . . . , τm such that
^
(6) ` ¬α(pδq) ↔ ¬ Prf(τi , σj , pδq)
1≤i≤m
1≤j≤l
(see (§§) in the proof of 2.5 above). Consequently S ¬α(pδq) yields S ¬ Prf(τ i , σj , pδq)
for all 1 ≤ i ≤ m, 1 ≤ j ≤ l. For any particular τi , σj , S ¬ Prf(τi , σj , pδq) means that
σj fails to be a sequence of length τi coding a proof of δ, that is, σj is not a sequence
of length τi of codes of formulas such that the corresponding sequence of formulas is
a proof of δ. Now, there is an effective procedure that, when applied to the term σj ,
leads to verifying that either σj does not name a sequence of length (named by) τi or,
while naming such a sequence, that sequence fails to satisfy at least one of the conditions
spelled out in the description of the formula Prf(τi , σj , pδq) by (8) in Sect. 4. Thus the
failure of σj to code a proof of δ of length τi is provable in HF. We conclude that
(7) S ¬ Prf(τi , σj , pδq) ⇒ ` ¬ Prf(τi , σj , pδq)
for all 1 ≤ i ≤ m, 1 ≤ j ≤ l. So S ¬α(pδq) implies ` ¬α(pδq), by (6) and (7). This
finishes the proof of (c), and hence of Theorem 11.1.
Appendix: The axiomatic theory of finite sets
In this Appendix, totally self-contained, we present a systematic development of the

theory of finite sets, also called the theory of hereditarily finite sets or sets of finite rank.
We call this theory HF. A systematic presentation of HF from its axioms has not been
available hitherto. Tarski and Givant proposed in [TG] several equivalent axiomatizations,
and ours is a modification of one of these. In the exposition of HF given below, we present
only as much of the theory as is needed for the proofs of the incompleteness results on
the previous pages.
1. Axioms and basic results
The non-logical symbols of HF are 0, ∈ and /, where 0 is a constant, ∈ a binary relation

symbol and / a binary operation symbol. One can axiomatize HF using ∈ as the only
non-logical symbol (see [TG]). However in that case the theory has no terms, whilst it is
precisely the constant terms that we have used for the coding of the meta-theory.
Informally, 0 denotes the empty set and x / y is the result of enlarging the set x by a
new element, namely y. (Thus y ∈ x is equivalent to x / y = x, so that / could serve as
the only non-logical symbol, while 0 would be defined by HF1.) For the axioms we take
the universal closures of the following formulas:
HF1. z = 0 ↔ ∀x(x 6∈ z).
HF2. z = x / y ↔ ∀u(u ∈ z ↔ u ∈ x ∨ u = y).
HF3. α(0) ∧ ∀x∀y[α(x) ∧ α(y) → α(x / y)] → ∀xα(x).
The assumption in HF3 is that α is any formula in the first-order language based on
0, ∈ and /, moreover α contains a freely occurring variable z such that x and y are
substitutable for z, whilst α(x), α(y) and α(x / y) denote the effects of substitutions.
Theorem 1.1.
(a) x 6∈ 0.
(b) u ∈ x / y ↔ u ∈ x ∨ u = y.
(c) z ∈ 0 / y ↔ z = y.
Proof.
(a) Substitute 0 for z in HF1.
(b) Substitute x / y for z in HF2.
(c) Substitute 0 for x in (b) and use (a).
Theorem 1.2 (Extensionality Property). x = z ↔ ∀u(u ∈ x ↔ u ∈ z).
Proof. Let α(x) be the asserted formula. To apply HF3, we check that
1◦ ` α(0) [by 1.1(a) and HF1].
2◦ ` α(x / y) [by 1.1(b) and HF2].
Definition 1.3. We denote by {x}, {x, y}, {x, y, z} and hx, yi the following terms:
(a) {x} = 0 / x.
(b) {x, y} = {x} / y, {x, y, z} = {x, y} / z.
(c) hx, yi = {{x}, {x, y}}.
Theorem 1.4.
(a) u ∈ {x} ↔ u = x.
(b) u ∈ {x, y} ↔ u = x ∨ u = y.
(c) {x, x} = {x}.
(d) hx, yi = hu, vi ↔ x = u ∧ y = v.
Proof. (a), (b) follow directly from the definition. For (c), (d) use Theorem 1.2 and
(a), (b).
Theorem 1.5 (Existence of the union of two sets).

∃z∀u(u ∈ z ↔ u ∈ x ∨ u ∈ y).
Proof. Let α(x) be the above formula. If z exists, z is unique by 1.2, so denote z by x ∪ y.
To prove ∀xα(x), we apply HF3:
1◦ ` α(0) [take z = y in α(0)].
2◦ ` [α(x) → α(x / w)].
To show 2◦ , assume α(x), i.e., the existence of x ∪ y. By 1.1(b), α(x / w) is equivalent to
∃z∀u(u ∈ z ↔ u ∈ x ∨ u = w ∨ u ∈ y). We can take z = (x ∪ y) / w.
Theorem 1.6 (Existence of the union of a set of sets).
∃z∀u(u ∈ z ↔ ∃(y ∈ x)[u ∈ y]).
S
Proof. Let α(x) be the above formula and let x be the above z, if it exists. To prove
∀xα(x), we apply HF3:
S
1◦ ` α(0) [take z = 0 in α(0), i.e., 0 = 0].
2◦ ` [α(x) → α(x / w)].
S
To show 2◦ , assume α(x), i.e., that x exists. We need to find z such that
u ∈ z ↔ ∃(y ∈ x / w)[u ∈ y]
↔ ∃y[(y ∈ x ∨ y = w) ∧ (u ∈ y)]
↔ ∃(y ∈ x)[u ∈ y] ∨ u ∈ w
[
↔u∈ x ∨ u ∈ w.
S
So we can take z = x ∪ w (see previous theorem).
Theorem 1.7 (Comprehension Scheme). ∃z∀u[u ∈ z ↔ (u ∈ x) ∧ ϕ(u)] for any formula
ϕ(u) in which z is not free.
Proof. Let α(x) be the above formula. If z exists, then it is unique and we shall denote
it by {u ∈ x : ϕ(u)}. To prove ∀xα(x), we use HF3:
1◦ ` α(0) [take z = 0 in α(0)].
2◦ ` [α(x) → α(x / y)].
To show 2◦ , assume α(x), i.e., that {u ∈ x : ϕ(u)} exists. We have to find a z such that
u ∈ z ↔ (u ∈ x / y) ∧ ϕ(u) ↔ (u ∈ x ∨ u = y) ∧ ϕ(u).
Since z is not free in ϕ(u), the required z is {u ∈ x : ϕ(u)} / y if ϕ(y) and {u ∈ x : ϕ(u)}
if ¬ϕ(y).
Definition 1.8 (Intersection).
(a) x ∩ y = {u ∈ x : u ∈ y}.
T T
(b) x = {v ∈ u0 : ∀(u ∈ x)[v ∈ u]} for some u0 ∈ x if x 6= 0 and x = 0 if x = 0.
Theorem 1.9 (Replacement Scheme).
[(∀(u ∈ x)∃!vψ(u, v)] → ∃z∀v[v ∈ z ↔ ∃(u ∈ x)ψ(u, v)]
for any formula ψ in which z is not free. (“∃!” means “there exists a unique”.)
Proof. Let α(x) be the above formula. If z exists, then it is unique and we shall denote
it by {v : ∃(u ∈ x)ψ}. To prove ` ∀xα(x), we use HF3:
1◦ ` α(0) [take z = 0 in α(0)].
2◦ ` [α(x) → α(x / y)].
To show 2◦ , assume α(x), that is, assume that {v : ∃(u ∈ x)ψ} exists provided that
∀(u ∈ x)∃!vψ(u, v). To show α(x / y), assume that ∀(u ∈ x / y)∃!vψ(u, v), i.e., that
∀u[(u ∈ x ∨ u = y) → ∃!vψ(u, v)]. Hence ∃!vψ(y, v), so let vy be this unique v. Thus the
required z on the right hand side of the implication α(x / y), i.e., {v : ∃(u ∈ x / y)ψ}, is
{v : ∃(u ∈ x)ψ} / vy .
Definition 1.10 (Subset relation).
(a) y ⊆ x ↔ ∀v(v ∈ y → v ∈ x).
(b) y ⊂ x ↔ y ⊆ x ∧ y 6= x.
Theorem 1.11 (Existence of the power set). ∃z∀u(u ∈ z ↔ u ⊆ x).
Proof. Let α(x) be the above formula. If z exists, it is unique and we shall denote it by
P (x). To prove ∀xα(x) we apply HF3. We have
1◦ ` α(0) [take z = {0} in α(0)].
2◦ ` [α(x) → α(x / y)].
To show 2◦ , assume α(x), so that P (x) exists. One checks directly from the definition
that
u ⊆ x / y ↔ u ∈ P (x) ∨ ∃(v ∈ P (x))[u = v / y].
Hence, using the existence of union and the Replacement Scheme (Theorems 1.5 and 1.9),
we find that P (x / y) = P (x) ∪ {u : ∃(v ∈ P (x))[u = v / y]}.
Definition 1.12. We shall call w an ∈-minimal element of z if w ∈ z and no element
of w belongs to z (i.e., w ∩ z = 0).
Theorem 1.13 (Foundation (Regularity) Property).
z 6= 0 → ∃(w ∈ z)[w ∩ z = 0].
Proof (by Donald Monk ). We need to show that ∀(w ∈ z)[w ∩ z 6= 0] → z = 0 and this
will follow if we prove the implication ∀(w ∈ z)[w ∩ z 6= 0] → ∀xα(x), where α(x) is the
formula (x 6∈ z) ∧ (x ∩ z = 0). By HF3, it will be sufficient to establish:
1◦ ` ∀(w ∈ z)[w ∩ z 6= 0] → α(0).
2◦ ` ∀(w ∈ z)[w ∩ z 6= 0] → [α(x) ∧ α(y) → α(x / y)].
Here 1◦ is obvious, and to prove 2◦ , we verify the mutual incompatibility of the following
four conditions:
(a) ∀(w ∈ z)[w ∩ z 6= 0].
(b) α(x), that is, (x 6∈ z) ∧ (x ∩ z = 0).
(c) α(y), that is, (y 6∈ z) ∧ (y ∩ z = 0).
(d) ¬α(x / y), that is, [(x / y) ∈ z)] ∨ [(x / y) ∩ z 6= 0].
The two cases in (d) are:
Case 1: (x / y) ∩ z 6= 0. Since y ∈ z is excluded by (c), we get x ∩ z 6= 0, and that is

excluded by (b).
Case 2: (x / y) ∈ z. Then (x / y) ∩ z 6= 0, by (a), and we are back to Case 1.
Corollary 1.14. x ∩ {x} = 0 (i.e., x 6∈ x).
Proof. Apply Foundation Property to z = {x} 6= 0.
2. Ordinals
Definition 2.1. We say that x is transitive if every element of x is a subset of x. We
call x an ordinal if x is transitive and every element of x is transitive. This will be
expressed by the formula
∀(y ∈ x){[y ⊆ x] ∧ ∀(z ∈ y)[z ⊆ y]}.
The above formula will be abbreviated as Ord(x).

Definition 2.2 (Successor). x+ = x / x = x ∪ {x}.
Theorem 2.3.
(a) Ord(x) → Ord(x+ ).
(b) Ord(x) ∧ y ∈ x → Ord(y).
(c) Ord(0).
(d) If x is a non-zero ordinal , then the ∈-minimal element of x is 0.
Proofs. Immediate from Definitions 1.12, 2.1 and 2.2.
Ordinals will be mostly denoted by k, l, m, n, p, q, r.
Theorem 2.4 (Comparability of ordinals).
Ord(k) ∧ Ord(l) → k ∈ l ∨ k = l ∨ l ∈ k.
Proof. Let β(k, l) be the formula k ∈ l ∨ k = l ∨ l ∈ k and suppose that
(1) ¬β(k0 , l0 )
for some ordinals k0 , l0 . We claim that then we can select k0 that satisfies
(2) ∀(m ∈ k0 )∀lβ(m, l)
(∀l ranging over ordinals). Indeed, if (2) fails then {m ∈ k0 : ∃l¬β(m, l)} 6= 0 and if r0
is an ∈-minimal element of this set (see 1.13), then r0 ∈ k0 and for each n ∈ r0 , we have
n ∈ k0 (because k0 is transitive), and ∀lβ(n, l). In other words, ∀(n ∈ r0 )∀lβ(n, l). So (2)
holds if we replace k0 by r0 —or simply start our considerations with r0 in place of k0
(noting that ∃l¬β(r0 , l), by the definition of r0 ).
We claim further that, having selected k0 such that (2) holds, we can select l0 satisfying
(3) ∀(p ∈ l0 )β(k0 , p).
Indeed, suppose (3) fails for some p, so that {p ∈ l0 : ¬β(k0 , p)} 6= 0. Let q0 be an
∈-minimal element of this set. Then q0 ∈ l0 and for each m ∈ q0 we have m ∈ l0 (because
l0 is transitive), and also β(k0 , m). In other words, ∀(m ∈ q0 )β(k0 , m)—so we get (3)
with q0 in place of l0 , or else we get (3) by just replacing l0 at the beginning of our
considerations by q0 (noting that ¬β(k0 , q0 ), by the definition of q0 ).
We now claim that l0 ⊂ k0 . Indeed, suppose p ∈ l0 . Then β(k0 , p), by (3), i.e. k0 ∈
p ∨ k0 = p ∨ p ∈ k0 . But k0 ∈ p contradicts p ∈ l0 and (1) (due to p ⊆ l0 ). Also k0 = p
contradicts (1) (since p ∈ l0 is assumed). Thus p ∈ k0 , and l0 ⊆ k0 follows. Since l0 = k0
contradicts (1), we get l0 ⊂ k0 .
Let m ∈ k0 \ l0 . Then β(m, l0 ), by (2), i.e.,
m ∈ l0 ∨ m = l0 ∨ l0 ∈ m.
Here both m = l0 and l0 ∈ m contradict (1), because m ∈ k0 . But also m ∈ l0 is
impossible, by the choice of m.
Theorem 2.5. Let k, l be ordinals. Then
(a) Exactly one of k ∈ l, k = l, l ∈ k occurs.
(b) k ∈ l ↔ k ⊂ l.
(c) l ∈ k → (l+ = k ∨ l+ ∈ k).
(d) k + = l+ → k = l.
Proof. (a) Any two of the three mentioned possibilities contradict 1.14. At least one
possibility occurs, by 2.4.
(b) Implication → follows by the transitivity of l and 1.14. Suppose k ⊂ l. Then
k ∈ l ∨ l ∈ k, by 2.4. But l ∈ k yields l ∈ l, contradicting 1.14.
(c) By 2.4, we have to exclude k ∈ l + , i.e., we have to show that neither k ∈ l nor
k = l can occur. This is indeed so because each of these possibilities, together with l ∈ k,
leads to l ∈ l.
(d) Suppose k 6= l and k + = k ∪ {k} = l ∪ {l} = l+ . Then k ∈ l and l ∈ k, yielding
again the contradiction l ∈ l.
Definition 2.6. For ordinals k, l, we shall use:
(i) k < l to abbreviate k ∈ l,
(ii) k ≤ l to abbreviate x < y ∨ x = y.
In Section 5 we shall define a binary relation of inequality < on the universe of all
sets. On the sub-universe of ordinals, both relations will be seen to coincide.
Theorem 2.7. Every set x of ordinals is ordered by the binary relation < (or equivalen-
tly, by ⊂). Moreover if x 6= 0 then x has a smallest and a largest element.
Proof. Let x be a set of ordinals. Then x is ordered by ∈ because of (a), (b) in 2.5.
Denote the largest element of x, if existing, by max(x). Let α(x) be the formula stating
the existence of max(x) for non-empty x:
[x 6= 0 ∧ ∀(k ∈ x) Ord(k)] → ∃(l ∈ x)∀(k ∈ x)[k ≤ l].
To prove ` ∀xα(x), we use HF3, i.e., we verify:
1◦ ` α(0) [obvious].
2◦ ` [α(x) → α(x / y)].
To show the latter, assume α(x), i.e., the existence of max(x), provided x is a non-empty
set of ordinals. We should deduce α(x / y). Thus assume that x / y is a set of ordinals
(evidently non-empty). Then x is a set of ordinals, so if x 6= 0, then max(x) exists by
the assumption α(x). Since x / y arises from x by adjoining one element, it is clear that
max(x / y) also exists. If, on the other hand, x = 0, then x / y = {y} and y = max(x / y).
To deduce the existence of min(x), replace in the above proof ≤ by ≥ and “max” by
“min” throughout.
Theorem 2.8. Ord(k) ∧ k 6= 0 → ∃!l(l + = k).
Proof. By 2.1, k is a non-empty set of ordinals. Let l = max(k) (see 2.7). Then l ∈ k,
whence l+ = k ∨ l+ ∈ k, by 2.5. But l+ ∈ k contradicts l = max(k). Thus l + = k. The
uniqueness of l follows from 2.5(d).
Definition 2.9. The unique (if existing) l for which l + = k will be denoted by k and
called the predecessor of k.
3. p-functions
As commonly defined, x is a function if each y ∈ x is an ordered pair and

[hu, v1 i ∈ x ∧ hu, v2 i ∈ x] → v1 = v2 .
S
Since hu, vi = {{u}, {u, v}}, it follows from hu, vi ∈ x that {u} ∈ x, and hence u ∈
SS
( x). Thus the domain of a function x is
n [ [ o
dom(x) = u ∈ x : ∃v(hu, vi ∈ x)
(see 1.6 and 1.7).
It will be convenient for us to use also functions in a broader sense; their domains will
not be sets and we shall call them p-functions. A symbol denoting a p-function will not
belong to the formal language describing HF (if it were added to the language of HF then
in discourses like in Part 1 it would have to be coded). Nevertheless, formulas containing
(names of) p-functions will be written and understood as representing certain formulas
of HF. p-functions will always be introduced by formulas of HF, by a process we shall
presently describe.
Definition 3.1. A formula ϕ of HF will be called functional with respect to a free
occurring variable y iff (∃!y)ϕ is a theorem.
If ϕ has n + 1 free variables and ϕ is functional with respect to y, we associate with ϕ
and y a new n-ary function symbol Fϕy . We shall write formulas of the first order language
obtained by adjoining Fϕy to the language of HF. However this expanded language will
NOT be our formal means of describing HF, and formulas containing Fϕy will be viewed
merely as representing (mostly abbreviating) certain formulas of HF. Fϕy will be called
a p-symbol and formulas containing Fϕy will be called p-formulas. Here we may read “p”
as standing for “phantom” (in [Bo], “p” stands for “pseudo”)—to symbolize appearance
without formal acceptance.
A term Fϕy (τ1 , . . . , τn ), where τ1 , . . . , τn are any terms of HF, will be called a simple
p-term. In general, terms belonging to the language of HF expanded by Fϕy will be
called p-terms. Such p-terms do not represent anything in HF, however formulas contain-
ing p-terms (i.e., p-formulas) will, as mentioned above, represent formulas of HF. This
representation is not unique: a p-formula represents many, logically equivalent, formulas
of HF. To describe this representation, we introduce a process of reduction by which a
p-formula is reduced to one containing one less simple p-term. In detail, suppose the free
variables in ϕ, other than y, are xi1 , . . . , xin (say, in the increasing order of indices). Then
we shall write ϕ as ϕ(xi1 , . . . , xin , y). Let α be a p-formula. Clearly, α contains a simple
p-term and the latter occurs in an atomic sub-formula of α. In other words, there exists
an atomic formula β(z) and there are HF-terms τ1 , . . . , τn such that β(Fϕy (τ1 , . . . , τn )) is a
sub-formula of α. The reduction which we now apply to α consists in replacing, inside α,
the atomic sub-formula β(Fϕy (τ1 , . . . , τn )) by
∃w[β(w) ∧ ϕ(τ
e 1 , . . . , τn , w)],
where w is a new variable, not occurring in α (thus neither in any of the terms τ1 , . . . , τn )
and ϕ
e is any variant of ϕ such that τ1 , . . . , τn , w are substitutable for xi1 , . . . , xin , y in ϕ.
e
[A variant ϕe of ϕ is obtained by renaming the bound variables of ϕ so that ϕ ↔ ϕ e is a
theorem of logic.]
As a result of such a reduction the number of occurrences of Fϕy in a formula is reduced
by one. Thus, when no further reduction is possible, we end up with a formula of HF and
we shall say that the latter is represented by α (or, that α represents the latter). Let us
agree that if α represents γ then α also represents every formula equivalent to γ.
For example, the p-formula Fϕy (xi1 , . . . , xin ) = u represents
∃w[w = u ∧ ϕ(xi1 , . . . , xin , w)],
and the latter, in view of ` ∃!yϕ(xi1 , . . . , xin , y), is equivalent to ϕ(xi1 , . . . , xin , u).
Often instead of Fϕy we shall use only one letter (usually capital); then ϕ and y will be
specified beforehand. By abuse of language, we shall call a p-function symbol a p-function.
We shall use several p-functions, each of these introduced by a formula functional
with respect to a variable (3.1). A formula made up of symbols of HF and several distinct
p-function symbols represents a formula of HF. We obtain the latter by consecutive
reductions eliminating simple p-terms as described above.
Occasionally it may be convenient to consider a p-formula ψ(x, y) functional with
respect to y. However the p-function Fψy defined by such a formula need not be regarded
as a new sort of entity, as we can identify it with the p-function Fϕy , where ϕ is an
HF-formula represented by ψ.
Examples of p-functions abound; e.g., it is clear that the following formulas are func-
tional with respect to y:
S
(a) y = x.
T
(b) y = x.
(c) y = P (x) (i.e., ∀u(u ∈ y ↔ u ⊆ x)).
Other examples are p-functions defined by recursion. We begin with infinite sequences,
defined by recursion on ordinals. To define these, we use finite sequences, called here
simply sequences.
Definition 3.2 (Sequence). A function whose domain is an ordinal k 6= 0 will be called
a sequence of length k. We denote by Seq(s, k) any formula stating that s is a sequence of
length k (e.g., the formula introduced in Part 1, 2.3(b)). If Seq(s, k) then, for any formula
α(z) and n < k, we shall write α(sn ) to abbreviate ∃z[hn, zi ∈ s ∧ α(z)].
Theorem 3.3. For every constant term τ and p-functions G, H there exists a p-function
F such that
(a) F (0) = τ .
(b) x 6= 0 ∧ Ord(x) → F (x) = G(F (x)).
(c) ¬ Ord(x) → F (x) = H(x).
Proof. Let ψ(x, y) be the p-formula
{x 6= 0 ∧ Ord(x) ∧ ∃s[Seq(s, x) ∧ y = G(sx ) ∧ ∀(n < x)[(n = 0 ∧ sn = τ ) ∨
(n 6= 0 ∧ sn = G(sn ))]]} ∨ [¬ Ord(x) ∧ H(x) = y] ∨ [x = 0 ∧ y = τ ].
Then it is easy to prove that ψ is functional with respect to y. (The existence of a least
ordinal x such that there is no unique y for which ψ(x, y) leads easily to a contradiction.)
Next, we check that (a), (b), (c) hold when F is Fψy .
Definition 3.4. We shall say that the p-function F in 3.3 is defined recursively on ordi-
nals.
4. The rank function
Definition 4.1. Let R(x) denote the p-function defined recursively on ordinals by
(a) R(0) = 0.
(b) R(x) = P (R(x)) for very non-zero ordinal x.
(c) R(x) = 0 if x is not an ordinal.
Theorem 4.2. For all ordinals m, n:
(a) n < m → R(n) ⊂ R(m).
(b) R(m) is transitive.
Proof. (a) Obviously R(0) = 0 ⊂ R(m) for 0 < m. Now, if there is some n such that
n < m, for some m, but R(n) ⊂ R(m) fails, then by 2.7 there is the least n with
that property. By the above, n > 0, whence n < m. Moreover, R(n) ⊂ R(m). Thus
R(n) ⊂ R(m), by the definition of R, and we get a contradiction.
(b) Clearly R(0) = 0 is transitive. If m 6= 0, then m < m, whence
x ∈ R(m) → x ⊆ R(m) ⊂ R(m),
by (b) in 4.1 and (a) above.
Theorem 4.3. ∃n[Ord(n) ∧ x ∈ R(n)] (i.e., the sets R(n) cover the universe).
Proof. Let α(x) be the above formula. We apply HF3 to prove ∀xα(x). We have
1◦ α(0) [because 0 ∈ R(0+ ) = P (R(0)) = P (0) = {0}].
2◦ ` [α(x) ∧ α(y) → α(x / y)].
To prove 2◦ , assume α(x) ∧ α(y), i.e., x ∈ R(n), y ∈ R(m) for some ordinals m, n. Then
x / y = x ∪ {y} ⊆ R(n) ∪ R(m+ )
by 4.1(b) and 4.2(b). So, for k = max{m+ , n}, x / y ⊆ R(k) (see 4.2(a)), whence
x / y ∈ R(k + ).
Definition 4.4. For every x, let rank(x) be the least ordinal n such that x ∈ R(n+ )
(i.e., x ⊆ R(n)).
Theorem 4.5. x ∈ y → rank(x) < rank(y).
Proof. Let x ∈ y and suppose rank(y) = n. Then y ⊆ R(n) and hence x ∈ R(n). Thus
n 6= 0 and x ⊆ R(n), whence rank(x) ≤ n < n.
Definition 4.6. The transitive closure cl(x) of x is the minimal transitive set y such
that x ⊆ y.
Concerning the existence of cl(x), note that x ⊆ R(n) for some ordinal n (see 4.3),
and R(n) is transitive (see 2.2). Thus cl(x) is the intersection of all transitive subsets
of R(n) which contain x as a subset.
Theorem 4.7. rank(x) = rank(cl(x)).
Proof. One checks without problems that:
(a) u ⊆ v → rank(u) ≤ rank(v).
(b) rank(u ∪ v) = max{rank(u), rank(v)} (see 4.2(a)).
(c) rank({w}) = (rank(w))+ .
Now let α(x) be the formula rank(x) = rank(cl(x)). We apply HF3 to prove ∀xα(x). We
have
1◦ ` α(0) [because cl(0) = 0].
2◦ ` [α(x) ∧ α(y) → α(x / y)].
To prove 2◦ , assume α(x) ∧ α(y), i.e., that for some m, n, rank(x) = rank(cl(x)) = n and
rank(y) = rank(cl(y)) = m. Then both sets on the two sides of the inclusion
x / y = x ∪ {y} ⊆ cl(x) ∪ cl(y) ∪ {y}
have the same rank, namely max{n, m+ } (see (b), (c) above). Since the set on the right
is transitive,
x / y ⊆ cl(x / y) ⊆ cl(x) ∪ cl(y) ∪ {y}.
Thus x / y and cl(x / y) have the same rank (see (a)).
The name of the following theorem is justified by the fact that rank(u) < rank(x) for
all u ∈ cl(x), by 4.5 and 4.7. To state it, we need the following definition.
Definition 4.8 (Restriction). If f is a function and z ⊆ dom(f ), then the function
{y ∈ f : ∃(u ∈ z)∃v(y = hu, vi)} will be denoted by f z and called the restriction of f
to z. A suggestive notation is f z = {hu, f (u)i : u ∈ z}. Similarly, if F is a p-function, we

denote by F z the function {hu, F (u)i : u ∈ z}. (The existence of F z follows from 1.9.)
Theorem 4.9 (Definition of a p-function by recursion on rank). For every binary p-
function G(x, y) there exists a p-function F such that
` F (x) = G(x, F cl(x)).
Proof. Let us call a function f good if its domain d is transitive and
(∗) f (u) = G(u, f cl(u))
for all u ∈ d. (The right side makes sense because u ∈ d → cl(u) ⊆ cl(d) = d due to
transitivity.) We shall show:
(a) If f is good and d0 is a transitive subset of the domain d of f , then f d0 is good.
(b) If f, f 0 are good functions with domains d, d0 , then f (d ∩ d0 ) = f 0 (d ∩ d0 ).
(c) For every x, there exists a good function with domain cl(x) / x.
Assuming for the moment that (a), (b) and (c) have been shown, let us show how
to find F . For every x, let fx be the good function with domain cl(x) / x (which exists
and is unique, by (b) and (c)). Obviously, there is a formula ϕ(x, y) of HF that can be
abbreviated as y = fx (x). And clearly, such ϕ is functional with respect to y. Let F
be Fϕy . Then F (x) = fx (x) is a theorem, because according to the reduction procedure
in Section 3, this p-formula reduces to ∃w(w = fx (x)). So it remains to prove fx (x) =
G(x, F cl(x)), and this will follow from (∗) (taken with f = fx , u = x) provided F cl(x) =
fx cl(x), i.e. if for all u ∈ cl(x) one has fu (u) = fx (u). Now, this is indeed so, for if
u ∈ cl(x) then
dom(fu ) = cl(u) ∪ {u} ⊆ cl(x) ⊆ dom(fx ),
and since both domains contain u, fu (u) = fx (u) follows from (b).
Proof of (a). If (∗) holds for all u ∈ d, then (∗) holds for all u ∈ d0 ⊆ d.
Proof of (b). By the Foundation Theorem 1.13, 0 belongs to every transitive set, hence
0 ∈ d ∩ d0 . Also
f (0) = G(0, 0) = f 0 (0).
Let u0 ∈ d ∩ d0 be of smallest rank such that f (u0 ) 6= f 0 (u0 ), if such u0 exists. By the
above, u0 6= 0 and we have
u ∈ cl(u0 ) → rank(u) < rank(u0 )
(see 4.5 and 4.7), whence f (u) = f 0 (u) for all such u and thus f cl(u0 ) = f 0 cl(u0 ). Hence
f (u0 ) = G(u0 , f cl(u0 )) = G(u0 , f 0 cl(u0 )) = f 0 (u0 ).
This contradicts the choice of u0 . We conclude that f (u) = f 0 (u) for all u ∈ d ∩ d0 .
Proof of (c). We have cl(0) / 0 = {0} and a good function f with this domain is
defined by f (0) = G(0, 0). Suppose there is an x0 such that a good function with domain
cl(x0 ) / x0 does not exist, and let x0 be of the least rank possible. Then x0 6= 0. For each
u ∈ cl(x0 ), let fu denote the unique (see (b)) good function with domain cl(u) / u. The
existence of fu follows from
rank(u) < rank(cl(x0 )) = rank(x0 )
(see 4.5, 4.7). Clearly cl(x0 ) is covered by the domains cl(u) / u of all fu , u ∈ cl(x0 ). So,
by (b), the union
[
f = {fu : u ∈ cl(x0 )}
is a good function with domain cl(x0 ). But then the set
f / hx0 , G(x0 , f )i
is evidently a good function with domain cl(x0 ) / x0 , contrary to the choice of x0 . The
contradiction shows that a good function with domain cl(x) / x always exists. This
finishes the proof of (c), and hence of Theorem 4.9.
Certain p-functions of n variables can be defined in a similar way. Let us illustrate
this for the case n = 2.
Theorem 4.10. For every ternary p-function G(x, y, z) there exists a binary p-function
F (x, y) such that
` F (x, y) = G(x, y, F (cl(x) × cl(y))).
Proof. To show this, we proceed by analogy with the preceding proof. For transitive sets
d1 , d2 , we say that a function f with domain d1 × d2 is good if
f (u, v) = G(u, v, f (cl(u) × cl(v)))
for all hu, vi ∈ d1 × d2 . One proves that for every x, y there is exactly one good function
with domain (cl(x) / x) × (cl(y) / y), and when this function is denoted by fxy , one
defines F (x, y) = fxy (x, y). In the part of the proof corresponding to (c) above, one
can start again from negating the claim which leads to the existence of an x0 of least
rank for which there is a y0 such that there does not exist a good function with domain
(cl(x0 ) / x0 )×(cl(y0 ) / y0 ). Since also y0 can then be chosen of least rank, a contradiction
results similarly to the proof of (c) above.
5. A definable order on the universe
In the ZF set theory there does not seem to exist a formula that defines a relation of
global linear order. For HF, such definable order exists, moreover it is a refinement of
the partial order by rank and also of the partial order by subsets. The existence of a
definable order for HF can be deduced from the definitional equivalence of HF and PA,
but we choose here a shorter path: The order will be defined recursively by a suitably
chosen formula (the latter can be found in [My]).
In contrast to the ZF set theory, in HF every ordering of a set x is a well-ordering
(in spite of the fact that in non-standard models of HF certain sets are infinite).
Theorem 5.1. Every ordering relation r ⊆ x × x is a well-ordering.
Proof. Let α(x) be the formula saying: “if r ⊆ x × x orders x, then r well-orders” x. To
prove ` ∀xα(x), we use HF3. We have
1◦ ` α(0) [obvious].
2◦ ` α(x) → α(x / y).
To prove 2◦ , assume α(x) and suppose r ⊆ (x / y) × (x / y) orders x / y. Then, by α(x),

the restriction of r to x well-orders x. By adding to x the single element y (if it is not
in x already), one does not spoil the well-ordering.
Corollary 5.2. For every ordering relation r on a set x, every non-empty y ⊆ x has
a minimal element minr (y) and a maximal element maxr (y). [For maxr consider the
inverse relation r −1 , which also well-orders x.]
Definition 5.3. By an n-ary p-relation we shall mean a formula ϕ with n free variables.
The characteristic p-function of such a p-relation is the {0, 1}-valued p-function F such
that
` (F (x1 , . . . , xn ) = 1) ↔ ϕ(x1 , . . . , xn ).
We say that the p-relation is defined by recursion on rank if its characteristic p-function
is defined by recursion on rank (as in Theorem 4.9 or its generalization to more variables).
Theorem 5.4. There exists a binary p-relation < such that
` x < y ↔ ∃(v ∈ y \ x)∀(u ∈ x \ y)(u < v).
Proof. This reduces to a verification that the above defines < by recursion on rank, i.e.,
according to 4.10. Obviously 0 < y for all y 6= 0, and x ≮ 0 for all x. Moreover, for x, y 6= 0
the above condition determines x < y when u < v is known for all hu, vi ∈ x × y, thus
x < y is determined by the restriction of < to cl(x) × cl(y). Hence 4.10 can be applied to
find the (binary) recursively defined characteristic p-function F of <. (We leave checking
the details, i.e., finding (the ternary) G, to the reader).
Note. It is clear from this definition that x ⊂ y → x < y. Hence our use hitherto of <
between ordinals does not clash with the definition of < in 5.4.
To establish the order properties of < we shall associate with every order r on a set z
an order {r}z on the power set P (z) of z.
Definition 5.5. If r is an order relation on z, we shall denote by {r}z the binary relation
on the power set P (z) given, for any x, y ⊆ z, by
x{r}z y ↔ ∃(v ∈ y \ x)∀(u ∈ x \ y)(hu, vi ∈ r).
(Here x{r}z y abbreviates hx, yi ∈ {r}z .)
Note. Each y ⊆ z is described by its characteristic function χy : z → {0, 1}, so that
χy (u) = 1 iff u ∈ y. The functions χy can be viewed as z-indexed sequences of 0’s
and 1’s. The order r on z (which is a well-ordering; see 4.1) determines the so-called
anti-lexicographical ordering of z-indexed sequences of integers. This (when restricted to
0–1 sequences) is an ordering of the set of characteristic functions χy , y ⊆ z. By the
correspondence χy ⇔ y, it induces the order {r}z .
It is easily seen that x{r}z y ↔ x 6= y ∧ maxr [(x \ y) ∪ (y \ x)] ∈ y. From this we
immediately get the proof of:
Theorem 5.6. x ⊂ y ⊆ z → x{r}z y.
Theorem 5.7. For every order relation r on a set z, {r}z is an order relation on P (z).
Proof. For any subset u of z, let {r}u denote the restriction of r to u, i.e., r ∩ (u × u).
If z = 0 then there is nothing to prove. So let us assume that z 6= 0 and let us denote
maxr (z) by w (see 4.2). We first show that if {r}z\{w} is an order on P (z \ {w}), then
{r}z is an order on P (z). For this purpose, we first note that for every x, y ⊆ z:
(1) w ∈ x ∩ y → [x{r}z y ↔ (x \ {w}){r}z\{w} (y \ {w})].
(2) w ∈
6 x ∪ y → [x{r}z y ↔ x{r}z\{w} y].
(3) w ∈ y \ x → x{r}z y.
Let P (z) be partitioned in two sets: those subsets of z which contain w and those which
do not. Then, assuming that {r}z\{w} is an order on P (z \ {w}), we see from (1) and (2)
that each of the two sets of subsets of z is ordered by {r}z , and it follows from (3) that
each element of the second set (of those subsets of z which do not contain w) r-precedes
each element of the first set. Thus {r}z is an order on P (z).
To complete the proof, let u0 = minr (z) and for each v ∈ z, let [u0 , v] ⊆ z be the closed
interval (for the order r) with end-points u0 , v. One now shows by induction with respect
to the well-ordering r (see 4.1) that for each v ∈ z, {r}[u0 ,v] is an order on P ([u0 , v]).
This is evident for v = u0 (because of [u0 , u0 ] = {u0 }, P ({u0 }) = {0, {u0 }} and 5.6). The
inductive step is accomplished by applying to [u0 , v] the same observations (1), (2), (3)
as were made for z (but with w = v in this case).
Theorem 5.8. Let < be the p-relation introduced by Theorem 5.4, let n be a non-zero
ordinal and suppose that:
(a) R(n) is ordered by <, and
(b) R(n) < (R(n) \ R(n)), i.e., ∀u∀v[u ∈ R(n) ∧ v ∈ (R(n) \ R(n)) → u < v].
Then:
(a+ ) R(n+ ) is ordered by <, and
(b+ ) R(n) < (R(n+ ) \ R(n)).
Proof. Assume (a), (b) and n 6= 0. By definition, R(n+ ) = P (R(n)) and on R(n+ )
the relation < coincides with {<}R(n) (see 5.5). Hence (a+ ) holds, by 5.7. Now let y ∈
R(n+ ) \ R(n) and let x ∈ R(n), i.e., x ⊆ R(n). Then y ⊆ R(n) and y * R(n), so there is
a v ∈ y such that v ∈ R(n) \ R(n). By (b), this v satisfies u < v for all u ∈ R(n), hence
also for all u ∈ x ⊆ R(n). Thus x < y, by 5.4, and (b+ ) is shown.
Theorem 5.9. < is a relation of order and < refines the partial order by subsets and the
partial order by rank.
Proof. It will suffice to prove that:
(1) x 6< x ∧ (x < y < z → x < z) ∧ (x < y ∨ y < x ∨ x = y).
(2) rank(x) < rank(y) → x < y.
To begin with, note that (a) and (b) in 5.8 hold when n = 0, i.e., when R(n) = 0
and R(n) = {0}. So it follows from that theorem that (a) and (b) hold for all non-zero
ordinals n. In particular, taking n large enough so that x, y, z ∈ R(n) (see 4.3), we get (1).
Next, assume that rank(x) < rank(y) and let rank(x) = m, rank(y) = n. Then
m < n → m+ ≤ n → R(m+ ) ⊆ R(n).
Since x ∈ R(m+ ) ⊆ R(n) and y ∈ R(n+ ) \ R(n), x < y follows from R(n) < R(n+ ) \ R(n)
(which is property (b+ ) in 5.8).
Corollary 5.10.
(a) Every set x 6= 0 has an <-largest and an <-smallest element.
(b) x ∈ y → x < y.
Proof. (a) follows from 5.2, and (b) from 4.5 and property (2) in 5.9.
Theorem 5.11 (Induction scheme for inequality). For any formula ϕ(x),
∀x[∀(y < x)ϕ(y) → ϕ(x))] → ∀xϕ(x).
Proof. Suppose the premise ∀x[. . .] of the above implication holds and ¬ϕ(x0 ) is possible,
where x0 ∈ R(n+ +
0 ). Then the set z0 = {u ∈ R(n0 ) : ¬ϕ(u)} is non-empty and min< (z)
exists (see 5.2). We can assume that min< (z) = x0 , whence x0 ∈ R(n+ 0 ) and rank(x0 )
≤ n0 . Now, for any y,
y < x0 → rank(y) ≤ rank(x0 ) ≤ n0 ,
by 5.9(2), whence y < x0 → y ∈ R(n+ 0 ). By the definition of z and x0 , it now follows
that y < x0 → ϕ(y), i.e., ∀(y < x0 )ϕ(y). Thus ϕ(x0 ), by the assumed premise ∀x[. . .],
contrary to the choice of x0 .
Theorem 5.12. ∃z∀y(y < x ↔ y ∈ z).
Proof. We see from (2) in 5.9 that y < x → rank(y) ≤ rank(x). Thus
` y < x → y ∈ R((rank(x))+ ),
by the definition (4.4) of rank, and 4.2(a). By the Comprehension Scheme 1.7, the required
z may now be taken as
z = {y ∈ R((rank(x))+ ) : y < x}.
Definition 5.13. For every x, we shall denote by [0, x) the unique z that satisfies
∀y(y ∈ z ↔ y < x).
Theorem 5.14 (Definition of a p-function by recursion on inequality). For every binary
p-function G(x, y) there exists a p-function F such that
` F (x) = G(x, F [0, x)).
We omit the straightforward proof (similar to that of 4.9).
6. The standard model S of HF

We conclude the Appendix with a description of the standard model of HF. Such a
description cannot be given formally in HF; it belongs to the meta-theory and to formalize
it one would need (a fragment of) the ZF set theory.
When viewed as an object of the ZF set theory, the standard model of HF is the
structure S = hS, 0, ∈, /i, where S is the smallest (infinite) set that contains the empty
set 0 and is closed under the operation / of adding one element (i.e., S has the property
that for any x, y in S, also x ∪ {y} is in S). However, it does not seem appropriate to
involve the more powerful ZF theory in order to describe a model of HF, so we rather
choose for a construction using a part of the meta-theory of HF and only as much of ZF
(informally) as is needed. To begin with, we establish some properties of constant terms.
Definition 6.1. We denote by C the class of constant terms, i.e., C is the smallest class
of terms such that 0 ∈ C and σ / τ ∈ C for any σ, τ ∈ C. For every τ ∈ C the number of
appearances of / in τ will be called the length of τ and denoted by l(τ ).
Lemma 6.2. For every 0 6= τ ∈ C there are finitely many τ1 , . . . , τm ∈ C, all shorter
than τ , such that
` x ∈ τ ↔ x = τ 1 ∨ . . . ∨ x = τm .
Proof. The shortest τ 6= 0 is 0 / 0. We have the theorem
` x ∈ 0 / 0 ↔ x ∈ 0 ∨ x = 0 ↔ x = 0,
i.e., m = 1 and τ1 is 0. Now assume that the lemma has been established for all non-zero
constant terms shorter than τ , and let τ be σ / µ. Then σ is shorter than τ , hence there
are σ1 , . . . , σn ∈ C such that
` x ∈ σ ↔ x = σ1 ∨ . . . ∨ x = σn .
Since ` x ∈ τ ↔ x ∈ σ ∨ x = µ, we get
` x ∈ τ ↔ x = σ1 ∨ . . . ∨ x = σn ∨ x = µ.
By inductive assumption, each σi is shorter than σ, thus also shorter than τ . Also µ is
shorter than τ .
Lemma 6.3. Let σ, τ ∈ C be such that non ` σ = τ (i.e., σ = τ is not a theorem of HF).
Then, for some ν ∈ C, one of the two possibilities occurs:
(A) ` ν ∈ σ and ` ν ∈6 τ.
(B) ` ν 6∈ σ and ` ν ∈ τ .
Proof. To begin with, let us establish (A) and (B) when one of the terms σ, τ is 0. Suppose
σ is 0, and assume that σ = τ is not a theorem. Then obviously τ is not the term 0, so τ
is of the form µ / ν and we have ` ν ∈ τ . Clearly ` ν 6∈ 0. Thus (B) follows. Analogously,
if τ is 0 and σ = τ is not a theorem, we get (A).
We now prove the lemma by induction on the length l(σ). The case l(σ) = 0 has been
dealt with already. Suppose that l(σ) > 0 and that the lemma has been shown in every
case (i.e., for every τ ) when σ is replaced by a term shorter than σ. Suppose further that
σ = τ is not a theorem. We should show that then, for every τ , there is a ν such that
either (A) or (B). This was already shown when τ is 0. So it remains to deal with the
situation when neither σ nor τ is 0. Then, by 6.2, there are σ1 , . . . , σn ∈ C, all shorter
than σ, and τ1 , . . . , τm ∈ C, all shorter than τ , such that
(1) ` x ∈ σ ↔ x = σ1 ∨ . . . ∨ x = σn .
(2) ` x ∈ τ ↔ x = τ1 ∨ . . . ∨ x = τm .
Suppose for a moment that for every σi there is a τj such that ` σi = τj , and also for
every τj there is a σi such that ` σi = τj . It is clear that then (1) and (2) above imply
` (x ∈ σ ↔ x ∈ τ ), i.e., ` σ = τ . As this contradicts our assumption, at least one of the
two possibilities must occur.
(a) There is a σi such that non ` σi = τj for all τj . Since σi is shorter than σ, we
conclude by the inductive assumption that either (A) or (B) holds for σi (in place of σ)
and every τj (in place of τ ), for suitable νj . Hence ` σi 6= τj for all j. We now replace x
by σi in
` x 6∈ τ ↔ x 6= τ1 ∧ . . . ∧ x 6= τm
(see (2)), concluding ` σi 6∈ τ . On the other hand, ` σi ∈ σ, by (1). So (A) holds with σi
in place of ν.
(b) There is a τj such that non ` σi = τj for all σi . This case is quite analogous to
the previous; so, proceeding as in (a), we show that (B) holds with τj in place of ν.
Corollary 6.4. For every σ, τ ∈ C, non ` σ = τ implies ` σ 6= τ .
Corollary 6.5. If σ, τ ∈ C are such that ` ν ∈ σ ⇔ ` ν ∈ τ for every ν ∈ C, then
` σ = τ.
Definition 6.6. To define the structure S = hS, 0, ∈, /i we introduce the following:
(a) An equivalence relation ≡ on C given by
σ ≡ τ ⇔ ` σ = τ.
(b) For every τ ∈ C, the ≡-equivalence class [τ ] containing τ .
(c) The set S of all equivalence classes [τ ], i.e., S = {[τ ] : τ ∈ C}.
(d) The binary relation ∈ on S defined by
[τ ] ∈ [µ] ⇔ ` τ ∈ µ.
(e) The binary operation / on S defined by
[τ ] / [µ] = [τ / µ].
(f) The abbreviation 0 for [0].
It is easy to see that this structure is well defined, i.e., [τ ] ∈ [µ] and [τ ] / [µ] depend
only on the equivalence classes [τ ], [µ].
Proposition 6.7. If HF is consistent, then S is a model of HF.
Proof. Let us check that HF1, HF2 and HF3 (see Section 1) hold in S.
HF1. Suppose [σ] ∈ S is such that S [τ ] 6∈ [σ] for every τ . Then σ cannot be of the
form µ / ν, because [ν] ∈ [µ / ν] holds in S (see 6.6(d)). Thus σ is 0. Conversely, we have
S [τ ] 6∈ [0] for all τ , because S [τ ] ∈ [0] is equivalent to ` σ ∈ 0 and this contradicts
Theorem 1.1 and the assumption of consistency.
HF2. By 6.5, the Extensionality Property 1.2 holds for S. Thus HF2 will be true
in S if we show that for every τ, µ, ν ∈ C,
` τ ∈ µ / ν ⇔ (` τ ∈ µ or ` τ = ν).
To verify ⇒, assume ` τ ∈ µ / ν. If ` τ = ν, then we are done. If non ` τ = ν, then
by 6.4, ` τ 6= ν. So, in this case,
` τ ∈ µ / ν ⇒ ` (τ ∈ µ ∨ τ = ν) ∧ (τ 6= ν) ⇒ ` τ ∈ µ.
The verification of ⇐ is immediate:
(` τ ∈ µ or ` τ = ν) ⇒ ` (τ ∈ µ ∨ τ = ν) ⇒ ` τ ∈ µ / ν.
HF3. For each equivalence class a ∈ S, let h(a), the height of a, be the smallest l(τ )
(see 6.1) such that τ ∈ a. It is easy to see that if a 6= 0, then there are b, c ∈ S such that
S (a = b / c) and h(b), h(c) < h(a). Given a formula α(x) with one free variable x, let
Aα ⊆ S be the truth set of α, i.e.,
Aα = {a ∈ S : S α(a)}.
Assuming that 0 ∈ Aα and moreover that a, b ∈ Aα implies a / b ∈ Aα for all a, b ∈ S,
we deduce by induction on the height h(c) that c ∈ Aα for every c ∈ S. Thus Aα = S,
and HF3 is verified.
References
[A] W. Ackermann, Die Wiederspruchsfreiheit der allgemeinen Mengenlehre, Math. Ann.
114 (1937), 305–315.
[A-B] Z. Adamowicz and T. Bigorajska, Existentially closed structures and Gödel’s second
incompleteness theorem, J. Symbolic Logic 66 (2001), 349–356.
[Be] E. W. Beth, Axiomatique de la théorie des ensembles sans axiome de l’infini , Bull. Soc.
Math. Belg. 16 (1964), 127–136.
[Bo] G. Boolos, The Logic of Provability, Cambridge Univ. Press, Cambridge, 1993.
[F] S. Feferman, Arithmetization of metamathematics in a general setting, Fund. Math. 49
(1960), 35–92.
[GT] S. Givant and A. Tarski, Peano arithmetic and the Zermelo-like theory of sets with
finite ranks, Notices Amer. Math. Soc. 24 (1977), A-437.
[H-B] D. Hilbert and P. Bernays, Grundlagen der Mathematik , Vol. 2, Springer, Berlin, 1939.
[J] T. Jech, On Gödel’s second incompleteness theorem, Proc. Amer. Math. Soc. 121
(1994), 311–313.
[Ki] M. Kikuchi, Kolmogorov complexity and the second incompleteness theorem, Arch.
Math. Logic 36 (1997), 437–443.
[Kl] S. C. Kleene, Introduction to Metamathematics, Van Nostrand, 1952; Russian transla-
tion by A. S. Yesenin-Vol’pin, Izdat. Inostr. Liter., Moscow, 1957.
[Ko] H. Kotlarski, Other proofs of old results, Math. Logic Quart. 44 (1998), 474–480.
[L] P. Lindström, Aspects of Incompleteness, Lecture Notes in Logic 10, Springer, Berlin,
1997.
[Mo] A. Mostowski, Sentences Undecidable in Formalized Arithmetic, North-Holland, Ams-
terdam, 1964.
[My] J. Mycielski, The definition of arithmetic operations in the Ackermann model , Algebra
i Logika Sem. 3 (1964), no. 5–6, 64–65 (in Russian).
[Sh] J. R. Shoenfield, Mathematical Logic, Addison-Wesley, Reading, MA, 1967.

[Sm] C. A. Smorynski, The incompleteness theorems, in: Handbook of Mathematical Logic,
J. Barwise (ed.), North-Holland, Amsterdam, 1977, 821–865.
[TG] A. Tarski and S. Givant, A Formalization of Set Theory without Variables, Amer. Math.
Soc. Colloq. Publ. 41, Amer. Math. Soc., Providence, RI, 1987.
[TMR] A. Tarski with A. Mostowski and R. M. Robinson, Undecidable Theories, North-
Holland, Amsterdam, 1953.
[V] P. Vopěnka, A new proof of the Gödel’s result of non-provability of consistency, Bull.
Acad. Polon. Sci. Sér. Sci. Math. Astronom. Phys. 14 (1966), 111–116.

dm422 0 01 PDF

Uploaded by

Copyright:

Available Formats

dm422 0 01 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

dm422 0 01 PDF

Uploaded by

Copyright:

Available Formats

0.

1. The language of HF and its coding

Case 2: Exactly one of σ, τ is in Γ0 . Suppose σ is in Γ0 . Since σ names an ordinal,

Atomic formulas: The codes of atomic formulas (for terms µ, τ ) are

We wish now to describe the property of a formula ϕ being provable by a set-theoretical

Proof. We have the theorems:

3. An HF-description of relations between formulas

Note. Requiring NecSeqTerm(s, k, t) in 7 instead of SeqTerm(s, k, t) (obviously the lat-

A formula ϕ is a theorem of HF (in symbols: HF ` ϕ or ` ϕ) if there is a sequence

Conj(x, y) = Neg(Disj(Neg(x), Neg(y)),

Note. To check the above, imagine that, for some α,

5. Phantom terms, functions and formulas

Proof of (1). It will suffice to show (see 22) that

(30 ) ` SeqRepVarTermForm(s, s0 , k, px1 q, pτ q, pϕq, y 0 ) → y 0 = pϕ(xi /τ )q.

6. Gödel’s First Incompleteness Theorem

A. Case when τ is 0. We have [[0]]V = 0, for every V , and by 22,

since W \ {r} = (V \ {r}) \ {i},

8. The special p-function F

9. Gödel’s Second Incompleteness Theorem

There will be no loss of generality if we take here i = 1, j = 2 = n, so let β(x2 ) be the

under the assumption

Suppose x2 = 0. Then the proof of (11) obviously reduces to verifying ` γ(0). To do

10. Finite sets and arithmetic

11. A theorem of Reinhardt

rank(pδi q) ≤ 2l + m + 12, by 11.4. Thus, by (4), each element of s is of rank at most

Appendix: The axiomatic theory of finite sets

In this Appendix, totally self-contained, we present a systematic development of the

1. Axioms and basic results

The non-logical symbols of HF are 0, ∈ and /, where 0 is a constant, ∈ a binary relation

Theorem 1.5 (Existence of the union of two sets).

Case 1: (x / y) ∩ z 6= 0. Since y ∈ z is excluded by (c), we get x ∩ z 6= 0, and that is

The above formula will be abbreviated as Ord(x).

As commonly defined, x is a function if each y ∈ x is an ordered pair and

4. The rank function

to z. A suggestive notation is f z = {hu, f (u)i : u ∈ z}. Similarly, if F is a p-function, we

5. A definable order on the universe

To prove 2◦ , assume α(x) and suppose r ⊆ (x / y) × (x / y) orders x / y. Then, by α(x),

6. The standard model S of HF

[Sh] J. R. Shoenfield, Mathematical Logic, Addison-Wesley, Reading, MA, 1967.

You might also like

to z. A suggestive notation is f z = {hu, f (u)i : u ∈ z}. Similarly, if F is a p-function, we