dm422 0 01 PDF
dm422 0 01 PDF
dm422 0 01 PDF
Introduction
The theory of hereditarily finite sets, called here HF, and described in the Appendix, has
been investigated by Ackermann, Beth, Givant and Tarski ([A], [Be] and [TG]). HF is
very closely related to PA (Peano Arithmetic); in fact, they are definitionally equivalent
(they have a common extension by definitions T —see Section 10). Thus it appears that
the general mathematical significance of establishing Gödel’s incompleteness theorems
with reference to one of these theories is the same as for the other.
Unlike most other authors working in this field, we have chosen HF, rather than
PA, for this task. This was done because HF is much better suited for the purpose of
describing its own meta-theory. The reason lies in the great expressive power of set theory.
A set-theoretical description of the language of HF (and then of the whole meta-theory)
presents itself in a wholly natural fashion, once it has been decided how to code (represent)
by constant terms the 7 basic (primitive) symbols and the variables of the first-order
language of HF. It is at hand to code the variables x1 , x2 , . . . simply by the ordinals
1, 2, . . . The constant 0 can be coded as 0, and the remaining 6 symbols as n-tuples of
0’s, say ∈ as h0, 0i, etc. And here ends the arbitrariness of coding, which is so unpleasant
when languages are arithmetized. Because terms, formulas and proofs are built from these
already coded basic symbols and variables by the formation of n-tuples or sequences and
such processes can be faithfully and naturally replicated by set-theoretical constructions.
By contrast, if natural (Gödel) numbers are used as codes, there is no natural way of
coding a sequence of these (numbers) as one number. (Mostly this is done by means of the
Chinese Remainder Theorem or by using the uniqueness of prime power decomposition.)
The necessity of coding proofs appropriately was pointed out by S. Feferman; he showed
that for a less than natural coding, Gödel’s results will fail (Theorems 5.9, 4.10 in [F]).
We need not face such a problem here: A sequence of codes of formulas that represents
a proof, i.e., a sequence of finite sets, is a finite set in its own right and there is no need
to code it again!
The simplicity (and possibly elegance) of working towards Gödel’s results (6.5 and
9.8) in the realm of finite sets offers one further advantage: With marginally more effort,
one can present all arguments completely, without omissions. As far as we know, such
a degree of completeness in proving Gödel’s results has never been attained before. To
reiterate: All published proofs of Gödel’s incompleteness theorems contain gaps, omissions
or references to key results in other publications. The omissions might not be gross, but
nevertheless presenting a potential stumbling block for the non-specialist. Thus, we have
made our exposition very detailed, hoping to make it thereby accessible to the widest
possible readership. Gödel’s theorems certainly deserve this sort of “popularization”; they
[5]
6 S. Świerczkowski
have generated much interest, even among non-mathematicians, and shed much light on
our understanding (or non-understanding) of the real nature of the game of mathematics.
It has been known since long that the more difficult (second) incompleteness theorem
is easily deduced, once one has established for each formula α the theorem (Hilbert–
Bernays derivabiltity condition)
Pf(pαq) → Pf(pPf(pαq)q),
where pαq is the code of α and Pf(x) is the formula which states that x is the code of a
provable formula. To show this, we followed in broad outline the work of Boolos [Bo]. The
latter pertained to PA and as such it had to be, in the words of its author, “incomplete
and irremediably messy” (p. 16 in [Bo]).
We have added to the main body of the paper Sections 10 and 11: The first deals with
the relationship between PA and HF and in the second we address what might be called
the “real world unprovability” of a theorem.
The author would like to thank Jan Mycielski for suggesting this line of research, to
Don Monk for a detailed verification of correctness (and pointing out inaccuracies), and
also to the other participants of the so-called Ulam seminar at the University of Colorado
in Boulder for listening to the first presentation of this work.
The theory of finite sets, i.e. HF, is presented in the language with the logical constants
=, ∨, ¬, ∃, the variables x1 , x2 , . . . and the non-logical constants 0, ∈ (binary relation)
and / (binary operation). (We shall treat ∧, → and ∀ as defined symbols.)
The following axiomatization of HF was taken from A. Tarski and S. Givant [TG]
(though we modified it slightly). There are two axioms and one scheme. The first axiom
defines 0 as the set deprived of elements, the second axiom defines / as the operation of
adjoining an element to a set, and the axiom scheme describes proofs by induction, very
much like in the case of PA.
HF1. z = 0 ↔ ∀x(x 6∈ z).
HF2. z = x / y ↔ ∀u(u ∈ z ↔ u ∈ x ∨ u = y).
HF3. α(0) ∧ ∀x∀y[α(x) ∧ α(y) → α(x / y)] → ∀xα(x).
The assumption in HF3 is that α is any formula in the first-order language based on
0, ∈ and /, moreover α contains a freely occurring variable z such that x and y are
substitutable for z, whilst α(x), α(y) and α(x / y) denote the effects of substitutions.
(We shall use lower case Latin letters, k, l, m, n, r, s, t, u, v, w, x, y, z, mostly to denote
any conveniently chosen variables; e.g., in the above axioms we can think of x, y, z as
x1 , x2 , x3 .)
A detailed development of the theory, to the extent that is needed for proving the
incompleteness theorems, is given in the Appendix (references to it will be prefixed with
Ap). To point out some differences between HF and the Zermelo–Fränkel set theory ZF,
let us note that in HF we have terms, due to the presence of the binary function symbol /,
Finite sets and Gödel’s incompleteness theorems 7
whereas in ZF there are none. We shall also use the fact that the universe of finite sets
is well ordered by an HF-definable relation < (Ap.5). No such ordering can be defined
in ZF.
The only way to produce terms in HF is by repeated applications of /. We can read
x / y as “x eats y” since this set arises from x by adopting a (new, or not) element y. Of
course, x can eat itself; the result is x / x and this is the successor of x, denoted by x+ .
The first few ordinals are then 0, 0+ = 1, 1+ = 2, . . . etc. (Ap.2).
All the constant terms of HF are obtained from terms with variables by replacing
in the latter all the variables by 0. The family of constant terms will be denoted by C.
The elements of C will be used as codes, and also for the construction of the standard
model S of HF (Ap.6).
We call 0 / x a singleton and denote it by {x}. Similarly, (0 / x) / y will be abbreviated
as {x, y} and ((0 / x) / y) / z as {x, y, z}. Thus {x}, {x, y}, {x, y, z} are terms. The
ordered pair hx, yi = {{x}, {x, y}}, triple hx, y, zi = hx, hy, zii, and, in general, any n-
tuple are composites of terms and hence these are terms themselves.
We wish to apply now the theory HF to describe its own meta-theory, i.e., various
relations between the words of the language of HF. For this purpose we assign to the main
objects of this language (variables, logical and non-logical symbols, terms and formulas)
their codes. What kind of entity should a code be? Since we wish to apply set-theoretical
formulas to codes in order to describe the meta-theory, it appears that a code should be
a set, or at least an entity closely and naturally associated to a set. However in the proof
that follows we have to assign to the code pϕq of a formula ϕ the code of this code, i.e.,
p(pϕq)q (this is accomplished by the function H in the proof of Gödel’s Diagonal Lemma
in Section 6). Since only language objects are coded, pϕq must be one of these. The
simplest language objects naturally associated to sets are the constant terms (associated
to the sets they name). Thus constant terms will be selected here as codes. But there
is a restriction. Two constant terms could be just different names for the same set, like
{{0}, 0} and {0, {0}}. To avoid this situation we shall select codes from a family Γ of
constant terms such that for any two distinct terms σ,τ in Γ one has ` σ 6= τ .
Definition 1.1. We denote by Γ0 the least family of constant terms such that
(a) 0 is in Γ0 .
(b) If σ is in Γ0 then so is σ / σ.
Further, we denote by Γ the least family of terms satisfying Γ0 ⊆ Γ and such that
(c) If σ,τ are in Γ then so is the term hσ, τ i.
Theorem 1.2. For any two distinct (as strings of symbols) terms σ, τ in Γ one has
` σ 6= τ .
Proof. Let us say that two constant terms σ, τ are identified in HF if σ = τ is a theorem.
By Ap.6.4, exactly one of the formulas σ = τ , σ 6= τ is a theorem. So our task is to prove
that distinct terms in Γ are never identified in HF. Let σ,τ be distinct. We consider three
cases:
8 S. Świerczkowski
Case 1: Both σ, τ are in Γ0 . It is easy to see that the terms belonging to Γ0 are names
of ordinal numbers, moreover σ / σ is a name of the successor of the number named by σ
(Ap.2.2 and 2.3(a)). Thus, if τ is longer than σ then the ordinal number named by τ
is obtained from the one named by σ by several applications of the successor operation,
which means that it is larger than the ordinal named by σ. It follows that σ, τ are not
identified in HF.
Case 3: Both σ, τ are not in Γ0 . We prove by induction on the length of σ the following
implication:
If σ, τ are distinct terms in Γ \ Γ0 then σ, τ are not identified in HF.
The inductive assumption is that the above implication holds if σ, τ are replaced by σ 0 , τ 0 ,
where σ 0 is shorter than σ. Let σ, τ be distinct terms in Γ \ Γ0 . Suppose σ is hσ 0 , σ 00 i
and τ is hτ 0 , τ 00 i. Clearly, σ 0 is not the same term as τ 0 , or σ 00 is not the same term as τ 00 .
We assume the first possibility (if the second holds, the proof is analogous). If σ 0 or τ 0
is in Γ0 then one of the Cases 1, 2 applies and we see that σ 0 and τ 0 are not identified
in HF. If both σ 0 , τ 0 are not in Γ0 then the inductive hypothesis yields that σ 0 and τ 0 are
not identified in HF. It follows that hσ 0 , σ 00 i = hτ 0 , τ 00 i is not a theorem and thus σ, τ are
not are not identified in HF.
We begin coding by assigning to each variable xk and each of the basic 7 language
symbols 0, ∈, /, =, ∨, ¬, ∃ their codes, denoted by pxk q, p0q, p∈q, p/q, p=q, p∨q, p¬q, p∃q.
These are the following terms belonging to Γ :
p0q = 0,
p∈q = h0, 0i,
p/q = h0, 0, 0i,
p=q = h0, 0, 0, 0i,
p∨q = h0, 0, 0, 0, 0i,
p¬q = h0, 0, 0, 0, 0, 0i,
p∃q = h0, 0, 0, 0, 0, 0, 0i,
px1 q = 0 / 0, px2 q = px1 q / px1 q, and in general: pxk+ q = pxk q / pxk q.
Thus the code pxk q of xk is a term representing the ordinal number k (see Ap.2.3(a)).
The coding of terms and formulas now proceeds as follows:
Terms: The terms 0 and xk have been coded already. Other terms are coded so that if µ
and τ have been coded, then
pµ / τ q = hp/q, pµq, pτ qi.
Finite sets and Gödel’s incompleteness theorems 9
2. Σ-formulas
For a Σ/ -formula α, its length λ(α) is defined to be the total number of occurrences of
the symbols ∨, ∧, ∃ and ∀.
Let us observe that the length of a Σ/ -formula remains unchanged if a variable occur-
ring in the formula is replaced by a term (assuming that after this replacement we again
get a formula). Evidently, every strict Σ-formula is a Σ/ -formula.
Theorem 2.5. For every Σ-sentence α,
(§) S α ⇒ ` α.
Proof. Since every strict Σ-sentence is a Σ/ -sentence, it will suffice to prove (§) for every
Σ/ -sentence α. We proceed by induction on the length λ(α).
Suppose λ(α) = 0. As all logical operators (i.e., ¬, →, ∨, ∧, ∃, ∀) are then excluded
from α, α must be of the form σ ∈ τ or σ = τ , where σ, τ ∈ C. For a sentence of this
kind, (§) follows directly from the definition of ∈ and = in the structure S (see Ap.6.6).
Now suppose (§) has been shown for all Σ/ -sentences β (in place of α) such that
λ(β) < λ(α). We deduce that then (§) holds for the Σ/ -sentence α, by considering the
following cases:
• α is β ∨ γ. In this case β and γ are Σ/ -sentences and, by the inductive assumption,
S α ⇒ (S β or S γ) ⇒ (` β or ` γ) ⇒ ` β ∨ γ ⇒ ` α.
• α is β ∧ γ. The reasoning is analogous.
• α is ∃xβ(x). If S α then S β(τ ) for some τ ∈ C. Since λ(β(τ )) = λ(β) < λ(α),
the inductive assumption yields
S β(τ ) ⇒ ` β(τ ) ⇒ ` ∃xβ(x)
(by the rules of logic). Thus S α ⇒ ` α.
• α is ∀(x ∈ τ )β(x). Since there are no free variables in α and x does not occur in τ ,
we must have τ ∈ C. Thus, by Ap.6.2, there are τ1 , . . . , τm ∈ C such that
(§§) ` ∀(x ∈ τ )β(x) ↔ β(τ1 ) ∧ . . . ∧ β(τm ).
Since λ(β(τj )) = λ(β(x)) < λ(α) for all j, we can apply the inductive assumption,
thus getting:
S α ⇒ S β(τ1 ) and . . . and S β(τm ) ⇒
` β(τ1 ) and . . . and ` β(τm ) ⇒ ` β(τ1 ) ∧ . . . ∧ β(τm ) ⇒ ` α,
by (§§).
In this section we cover the more tedious part of the work needed for producing the
Σ-formula Pf(x). This amounts to a direct verification that all basic relations between
variables, terms and formulas that underlie the idea of a proof can be described by
means of Σ-formulas “talking” only about codes. All of these 25 formulas will become
sub-formulas of Pf(x).
12 S. Świerczkowski
We shall introduce each of the formulas 1–25 below by stating first its intended
meaning (in the meta-theory of HF) and then writing the formula itself. If any of the
formulas Seq(s, k), LstSeq(s, k, t) (both introduced above) is used then, for any m ∈ k
(i.e., m < k), we shall write ϕ(sm ) to abbreviate ∃y[hm, yi ∈ s ∧ ϕ(y)] (see Ap.3.2).
Hence, if ϕ is a Σ-formula, then so is ϕ(sm ).
1. Var(x)
Means: x is the code of a variable.
Ord(x) ∧ 0 ∈ x.
2. SeqTerm(s, k, t)
Means: s is a sequence of length k and s ends with the code t of a term.
LstSeq(s, k, t) ∧ ∀(l ∈ k){sl = 0 ∨ Var(sl ) ∨ ∃(m, n ∈ l)[sl = hp/q, sm , sn i]}.
3. Term(t)
Means: t is the code of a term.
∃k∃s SeqTerm(s, k, t).
4. Const(t)
Means: t is the code of a constant term.
∃k∃s[LstSeq(s, k, t) ∧ ∀(l ∈ k){sl = 0 ∨ ∃(m, n ∈ l)[sl = hp/q, sm , sn i]}].
5. NecSeqTerm(s, k, t)
Means: s is a sequence of length k whose last element t is the code of a term and
all terms whose codes appear in s (as some sl ) are sub-terms of the term coded
by t (i.e., each sl is necessary for the formation of the code t).
SeqTerm(s, k, t) ∧ ∀(l ∈ k){l+ = k ∨
∃(n, m ∈ k)[(l ∈ m) ∧ (sm = hp/q, sl , sn i ∨ sm = hp/q, sn , sl i)]}.
In other words, each sl , except the last, is used in creating some sm further on.
Lemma 3.1. ` Term(t) → ∃k∃s NecSeqTerm(s, k, t).
Proof. Assume Term(t), i.e., SeqTerm(s, k, t) for some sequence s of length k. If there is
an l ∈ k such that k 6= l (i.e., l is not the predecessor of k) and sl is not needed further
on in s (i.e., no sm with l ∈ m is of the form sm = hp/q, sn , sl i or sm = hp/q, sl , sn i)
then, omitting sl , we obtain a sequence z of length k such that SeqTerm(z, k, t). Thus,
for the shortest s (smallest k) such that SeqTerm(s, k, t), we have NecSeqTerm(s, k, t).
6. VarOccTerm(v, t)
Means: v codes a variable, t codes a term and the variable coded by v occurs in
the term coded by t.
Var(v) ∧ ∃k∃s[NecSeqTerm(s, k, t) ∧ ∃(l ∈ k)(sl = v)].
7. VarNonOccTerm(v, t)
Means: v codes a variable, t codes a term and the variable coded by v does not
occur in the term coded by t.
Var(v) ∧ ∃k∃s{NecSeqTerm(s, k, t) ∧
∀(l ∈ k)[sl = 0 ∨ (Var(sl ) ∧ (sl 6= v)) ∨ ∃(n, m ∈ l)[sl = hp/q, sn , sm i]]}.
Finite sets and Gödel’s incompleteness theorems 13
8. At(y)
Means: y is the code of an atomic formula.
∃u∃t[Term(u) ∧ Term(t) ∧ (y = hp=q, u, ti ∨ y = hp∈q, u, ti)].
9. VarOccAt(v, y)
Means: v codes a variable, y codes an atomic formula and the variable coded
by v occurs in the atomic formula coded by y.
∃u∃t[Term(u) ∧ Term(t) ∧ (y = hp=q, u, ti ∨ y = hp∈q, u, ti) ∧
(VarOccTerm(v, u) ∨ VarOccTerm(v, t))].
10. VarNonOccAt(v, y)
Means: v codes a variable, y codes an atomic formula and the variable coded
by v does not occur in the atomic formula coded by y.
∃u∃t[VarNonOccTerm(v, u) ∧ VarNonOccTerm(v, t) ∧
(y = hp=q, u, ti ∨ y = hp∈q, u, ti)].
11. MakeForm(u, w, v, y)
Means: v codes a variable and if u, w are codes of formulas, then y is the code
of a formula created from one or both formulas coded by u, w and possibly the
variable coded by v by a single application of ∨, ¬ or ∃.
Var(v) ∧ [y = hp∨q, u, wi ∨ y = hp∨q, w, ui ∨ y = hp¬q, ui ∨ y = hp∃q, v, ui].
12. SeqForm(s, k, y)
Means: s is a sequence of length k whose last element y is the code of a formula.
LstSeq(s, k, y) ∧ ∀(n ∈ k)[At(sn ) ∨ ∃(m, l ∈ n)∃v MakeForm(sm , sl , v, sn )].
13. Form(y)
Means: y is the code of a formula.
∃k∃s SeqForm(s, k, y).
14. NonAt(y)
Means: y is the code of a non-atomic formula.
Form(y) ∧ ∃u∃w∃v MakeForm(u, w, v, y).
15. VarTopForm(v, y)
Means: y is the code of a formula beginning with ∃xi , where pxi q = v.
Var(v) ∧ ∃w[Form(w) ∧ y = hp∃q, v, wi].
14 S. Świerczkowski
16. SeqVarTopForm(v, z, n, y)
Means: v codes a variable, y codes a formula and z is a sequence of length n
showing how the formula coded by y is made up of formulas beginning with ∃xi ,
where pxi q = v, and of atomic formulas, by using ∨, ¬ and ∃xj , where j 6= i.
Var(v) ∧ LstSeq(z, n, y) ∧ ∀(r ∈ n)[At(zr ) ∨ VarTopForm(v, zr ) ∨
∃(m, l ∈ r)∃u(Var(u) ∧ u 6= v ∧ MakeForm(zm , zl , u, zr ))].
17. NecSeqVarTopForm(v, z, n, y)
Means: As in 15, but with the additional requirement that all formulas whose
codes appear in z (as some zn ) are sub-formulas of the formula coded by y (i.e.,
each zn is necessary for the formation of the code y).
SeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n){r + = n ∨ ∃(l, m ∈ n)∃u[(r ∈ m) ∧ MakeForm(zr , zl , u, zm )]}.
Lemma 3.2. ` Var(v) ∧ Form(y) → ∃n∃z NecSeqVarTopForm(v, z, n, y).
Proof. Suppose Var(v) and Form(y). Then there is a sequence s of length k such that
SeqForm(s, k, y). We now remove from s all sl for which VarTopForm(v, sl ) and place
these, possibly in the order of their first original appearance in s, at the beginning of a new
sequence z. The remainder of z is composed of those sl which were not removed from s.
Let n be the length of z (actually n = k but we do not need this fact). One checks that
SeqVarTopForm(v, z, n, y). If n is the smallest ordinal for which there is a z satisfying this
latter formula then, as in the proof of 3.1, we see that NecSeqVarTopForm(v, z, n, y).
18. VarOccFreeForm(v, y)
Means: v codes a variable, y codes a formula and the variable coded by v has a
free occurrence in the formula coded by y.
∃n∃z{NecSeqVarTopForm(v, z, n, y) ∧ ∃(r ∈ n)[At(zr ) ∧ VarOccAt(v, zr )]}.
19. VarNonOccFreeForm(v, y)
Means: v codes a variable, y codes a formula and the variable coded by v has
no free occurrence (is bound) in the formula coded by y.
∃n∃z{NecSeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n)[(At(zr ) ∧ VarNonOccAt(v, zr )) ∨ NonAt(zr )]}.
Note. Choosing NecSeqVarTopForm instead of SeqVarTopForm in 19 (clearly the latter
would do) does not introduce any restrictions. (We justify this similarly to the Note
following 7.)
From 3.2 and the implication preceding 8 we see that
` Var(v) ∧ Form(y) → VarOccFreeForm(v, y) ∨ VarNonOccFreeForm(v, y).
20. TermSubsVarForm(t, v, y)
Means: t codes a term, v codes a variable, y codes a formula and the term coded
by t is substitutable for the variable coded by v in the formula coded by y.
Term(t) ∧ ∃n∃z{SeqVarTopForm(v, z, n, y) ∧
∀(r ∈ n)[At(zr ) ∨ ∃(m, l ∈ r)∃u{zr = hp∨q, zm , zl i ∨ zr = hp¬q, zm i ∨
[zr = hp∃q, u, zm i ∧ (VarNonOccFreeForm(v, zm ) ∨
(VarOccFree Form(v, zm ) ∧ VarNonOccTerm(u, t)))]}]}.
Finite sets and Gödel’s incompleteness theorems 15
21. SeqRepVarTermTerm(s, s0 , k, v, t, u, u0 )
Means: v codes a variable, t codes a term, u and u0 are coding terms and s and s0
are sequences of length k such that s ends with u and s0 ends with u0 and the
term coded by u0 is obtained from that coded by u by replacing each occurrence
of the variable coded by v by the term coded by t.
Var(v) ∧ Term(t) ∧ LstSeq(s, k, u) ∧ LstSeq(s0 , k, u0 ) ∧
∀(n ∈ k)[((sn = v) ∧ (s0n = t)) ∨ (Var(sn ) ∧ (sn 6= v) ∧ (s0n = sn )) ∨
((sn = 0) ∧ (s0n = 0)) ∨ ∃(m, l ∈ n)[sn = hp/q, sm , sl i ∧ s0n = hp/q, s0m , s0l i]].
22. RepVarTermTerm(v, t, u, u0 )
Means: v codes a variable, t codes a term, u and u0 code terms and replacing
in the term coded by u the variable coded by v, at all of its occurrences, by the
term coded by t one obtains the term coded by u0 .
∃k∃s∃s0 SeqRepVarTermTerm(s, s0 , k, v, t, u, u0 ).
23. RepVarTermAt(v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 are coding atomic formulas
and replacing the variable coded by v by the term coded by t, at all occurrences
of this variable in the atomic formula coded by y, one gets the atomic formula
coded by y 0 .
∃u∃u0 ∃w∃w0 {RepVarTermTerm(v, t, u, u0 ) ∧ RepVarTermTerm(v, t, w, w 0 ) ∧
[(y = hp=q, u, wi ∧ y 0 = hp=q, u0 , w0 i) ∨ (y = hp∈q, u, wi ∧ y 0 = hp∈q, u0 , w0 i)]}.
24. SeqRepVarTermForm(s, s0 , k, v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 code formulas and s and
s0 are sequences of length k such that s ends with y and s0 ends with y 0 and
the formula coded by y 0 is obtained from that coded by y by replacing each free
occurrence of the variable coded by v by the term coded by t.
Var(v) ∧ LstSeq(s, k, y) ∧ LstSeq(s0 , k, y 0 ) ∧
∀(l ∈ k){[At(sl ) ∧ RepVarTermAt(v, t, sl , s0l )] ∨ ∃(m, n ∈ l)∃u
[(sl = hp∃q, v, sm i ∧ s0l = sl ) ∨ (sl = hp∨q, sm , sn i ∧ s0l = hp∨q, s0m , s0n i) ∨
(Var(u) ∧ u 6= v ∧ sl = hp∃q, u, sm i ∧ s0l = hp∃q, u, s0m i) ∨
(sl = hp¬q, sm i ∧ s0l = hp¬q, s0m i)]}.
25. RepVarTermForm(v, t, y, y 0 )
Means: v codes a variable, t codes a term, y and y 0 code formulas and replacing
in the formula coded by y each free occurrence of the variable coded by v by the
term coded by t, one obtains the formula coded by y 0 .
∃k∃s∃s0 SeqRepVarTermForm(s, s0 , k, v, t, y, y 0 ).
4. The formula Pf
inference (see 4.1 and 4.2 below). We begin this section by recalling the logical axioms
and rules of inference for a first-order theory, as applied to HF. Next, these axioms and
rules are expressed by Σ-formulas and codes. This leads to a Σ-formula Pf(x) such that
` ϕ iff ` Pf(pϕq) for every formula ϕ, moreover ` α → Pf(pαq) for every Σ-sentence α.
It is easily verified that the description of the logic of HF given below is equivalent
(in the sense of yielding the same theorems) to the description of the logic of a first-order
theory in [Sh] (when applied to HF).
Definition 4.1. The logical axioms are the following formulas (where ϕ → ψ abbreviates
¬ϕ ∨ ψ):
Sentential (Boolean) Axioms: For any formulas ϕ, ψ, µ:
ϕ → ϕ,
ϕ → ϕ ∨ ψ,
ϕ ∨ ϕ → ϕ,
ϕ ∨ (ψ ∨ µ) → (ϕ ∨ ψ) ∨ µ,
(ϕ ∨ ψ) ∧ (¬ϕ ∨ µ) → ψ ∨ µ.
Specialization Axioms: For every formula ϕ and every xi :
ϕ → ∃xi ϕ.
Equality Axioms:
x1 = x 1 ,
(x1 = x2 ) ∧ (x3 = x4 ) → [(x1 = x3 ) → (x2 = x4 )],
(x1 = x2 ) ∧ (x3 = x4 ) → [(x1 ∈ x3 ) → (x2 ∈ x4 )],
(x1 = x2 ) ∧ (x3 = x4 ) → [x1 / x3 = x2 / x4 ].
Definition 4.2. The rules of inference are as follows:
ϕ, ϕ → ψ
Modus Ponens: .
ψ
ϕ
Substitution: for any term τ that is substitutable for xi in ϕ.
ϕ(xi /τ )
ϕ→ψ
∃-introduction: provided xi does not occur free in ψ.
∃xi ϕ → ψ
For each of these three rules we say that the formula written below the line is derived by
that rule from the formula(s) written above the line.
In order to write the formula Pf, it will be convenient to have short suggestive names
for certain terms (see the coding of formulas in Section 1):
Definition 4.3.
Neg(x) = hp¬q, xi,
Disj(x, y) = hp∨q, x, yi,
Impl(x, y) = Disj(Neg(x), y),
Finite sets and Gödel’s incompleteness theorems 17
(1) Ax(y)
Means: y is the code of an individual axiom.
y = c1 ∨ y = c2 ∨ . . . ∨ y = c 6 .
(2) Ind(x)
Means: x is the code of an instance of the Induction Scheme HF3, i.e., of
α(0) ∧ ∀x1 ∀x2 [α(x1 ) ∧ α(x2 ) → α(x1 / x2 )] → ∀x1 α(x1 ),
where α is any formula such that x2 is substitutable for x1 in α (see Note below).
∃y{Form(y) ∧ TermSubsVarForm(px2 q, px1 q, y) ∧
∃y 0 ∃y 00 ∃y 000 [RepVarTermForm(px1 q, 0, y, y 0 ) ∧
RepVarTermForm(px1 q, px2 q, y, y 00 ) ∧
RepVarTermForm(px1 q, hp/q, px1 q, px2 qi, y, y 000 ) ∧ x =
Impl(Conj(y 0 , All(px1 q, All(px2 q, Impl(Conj(y, y 00 ), y 000 )))), All(px1 q, y))]}.
(3) Sent(x)
Means: x codes a formula that is an instance of one of the five Sentential Axioms
(see 4.1).
∃y∃z∃w{Form(y) ∧ Form(z) ∧ Form(w) ∧
[x = Impl(y, y) ∨
x = Impl(y, Disj(y, z)) ∨
x = Impl(Disj(y, y), y) ∨
x = Impl(Disj(y, Disj(z, w)), Disj(Disj(y, z), w) ∨
x = Impl(Conj(Disj(y, z), Disj(Neg(y), w)), Disj(z, w))]}.
(4) Spec(x)
Means: x codes a Specialization Axiom (see 4.1).
∃y∃v[Form(y) ∧ Var(v) ∧ x = Impl(y, Exi(v, y))].
(5) ModPon(x, y, y 0 )
Means: If x, y, y 0 are codes of formulas then y 0 codes a formula which follows
from the formulas coded by x and y by an application of the Modus Ponens Rule
(in 4.2).
y = Impl(x, y 0 ).
18 S. Świerczkowski
(6) Subst(y, y 0 )
Means: y, y 0 are codes of formulas and the formula coded by y 0 is obtained from
that coded by y by an application of the Substitution Rule (in 4.2).
∃v∃t[TermSubsVarForm(t, v, y) ∧ RepVarTermForm(v, t, y, y 0 )].
(7) ∃intro(y, y 0 )
Means: y codes a formula and y 0 codes the formula obtained from that coded
by y by an application of the ∃-introduction rule (in 4.2).
∃u∃w∃v[Form(u) ∧ VarNonOccFreeForm(v, w) ∧
y = Impl(u, w) ∧ y 0 = Impl(Exi(v, u), w)].
(8) Prf(s, k, y)
Means: s is a sequence of length k of codes of formulas such that the corresponding
sequence of formulas is a proof of the formula coded by y.
LstSeq(s, k, y) ∧ ∀(n ∈ k)[Ax(sn ) ∨ Ind(sn ) ∨ Sent(sn ) ∨ Spec(sn ) ∨
∃(m, l ∈ n)[ModPon(sm , sl , sn ) ∨ Subst(sm , sn ) ∨ ∃ intro(sm , sn )]].
(9) Pf(y)
Means: y codes a formula which is a theorem of ZF.
∃k∃s Prf(s, k, y).
Proposition 4.4 (Proof formalization condition). For every formula ϕ,
`ϕ iff ` Pf(pϕq).
Proof. Assume ` ϕ. Then there is a sequence of formulas ϕ0 , . . . , ϕp that constitutes a
proof of ϕ. If s = {hl, pϕl qi : l ∈ p+ }, then s ∈ S (= the universe of the standard model S;
see Ap.6.6) and S Prf(s, p+ , pϕq), so that S Pf(pϕq). Since Pf(pϕq) is a Σ-sentence,
we conclude, by 2.5, that ` Pf(pϕq).
Now assume ` P f (pϕq). Hence S ∃k∃s Prf(s, k, pϕq). This means that there is a
sequence s ∈ S of length k, where each sl codes some formula ϕl , so that ϕk is ϕ and the
sequence of formulas (ϕ0 , . . . , ϕk ) constitutes a proof of ϕ. Thus ` ϕ.
As commonly defined, we call f a function if all elements of f are ordered pairs and
hu, wi ∈ f ∧ hu, zi ∈ f → w = z.
Associated to a function are two sets: the domain dom(f ) and the range rng(f ). Functions
can be obtained also from the function symbols of the language, in our case, from /. For
example, x / x determines a function once we select some set for the domain.
More generally, functions can be obtained from certain formulas. Thus let ϕ be a
formula with exactly n + 1 free variables (n ≥ 0) and let y be one of these variables.
Suppose that ∃!yϕ is a theorem of HF (“∃!” means: “∃ a unique”). Then we shall associate
with ϕ and y a new symbol Fϕy and we shall call it an n-ary p-function symbol, where
“p” might stand for “phantom” to stress that we do not wish to consider Fϕy as part
of the language of HF (in [Bo], “p” stands for “pseudo”). We shall call ϕ the defining
Finite sets and Gödel’s incompleteness theorems 19
formula for Fϕy . A term Fϕy (z1 , . . . , zn ) (of the language of HF extended by the p-function
symbol Fϕy ) will be called a p-term. We would like to write formulas including p-terms,
however these will be always regarded as representing certain formulas of HF. Thus let
xi1 , . . . , xin be the other free variables in ϕ, where i1 < . . . < in . Then the representation
rule is as follows: If β(z) is an atomic formula of HF then
β(Fϕy (z1 , . . . , zn )) represents any formula ∃w[ϕ(z1 , . . . , zn , w) ∧ β(w)],
where w is a variable substitutable for y in ϕ, moreover each zk is substitutable for xik
in ϕ. Thus α(Fϕy (z1 , . . . , zn )) represents many formulas, however these are equivalent to
each other in HF. The above rule is extended in an obvious way to all formulas containing
the symbol Fϕy so that the representation commutes with the logical connectives and
quantifiers (for details see Ap.3).
Only capital letters will be used for p-function symbols and we shall omit the suffix ϕ
and exponent y when circumstances permit. Also, we shall sometimes say p-function to
mean a p-function symbol Fϕy , or even its whole definition. Several p-functions will be in-
troduced below. Then by a p-term we shall mean any term of the extended language, i.e., a
term formed from the variables, 0, the function symbol / and the p-function symbols. For-
mulas containing such terms will be called p-formulas. The above abbreviation principle
can be naturally extended to any p-formula which then is easily seen to be an abbrevia-
tion of a (not unique) formula of HF. By a slight abuse of language we shall occasionally
say that a p-formula α is a theorem of HF to mean that α abbreviates such a theorem.
Our goal in this section is to prepare the definition, for every i, of a p-function K such
that for every formula γ(xi ), the p-formula K(pγq) = pγ(pγq)q is a theorem of HF.
Proposition 5.1. There exists a ternary p-function, denoted here by REPL, such that
` REPL(pxi q, pτ q, pϕq) = pϕ(xi /τ )q
for every variable xi , term τ and formula ϕ [where ϕ(xi /τ ) results from replacing each
free occurrence of xi in ϕ by τ ].
(We are not concerned at present with the issue of substitutability.) As the defining
formula ϕ(v, t, y, y 0 ) for REPL we choose the formula:
[(∃!z) RepVarTermForm(v, t, y, z) ∧ RepVarTermForm(v, t, y, y 0 )] ∨
[¬(∃!z) RepVarTermForm(v, t, y, z) ∧ y 0 = 0].
It is clear that (∃!y 0 )ϕ(v, t, y, y 0 ). So, for the reader who found our description of the
meaning of formula 25 (Section 3) wholly convincing, the above proposition requires no
further proof. For the more meticulous, we suggest checking through Lemmas 5.2–5.4.
Lemma 5.2. For every term µ,
` RepVarTermTerm(pxi q, pτ q, pµq, u0 ) ↔ u0 = pµ(xi /τ )q.
(Here µ(xi /τ ) is the term obtained from µ by replacing each occurrence of xi by τ .)
Proof. It will be enough to show that for any µ,
(1) ` RepVarTermTerm(pxi q, pτ q, pµq, u0 ) → u0 = pµ(xi /τ )q,
(2) ` RepVarTermTerm(pxi q, pτ q, pµq, pµ(xi /τ )q).
20 S. Świerczkowski
In this section we wish to prove Gödel’s Diagonal Lemma and then his First Incomplete-
ness Theorem. To start with, we introduce two unary p-functions W, H that describe the
formation of certain codes. Since a code (of a variable, a term or a formula) is a constant
term, that code can be coded again. H will do this, i.e., we shall have H(µ) = pµq,
whenever µ is already a code. W is just the restriction of H to the codes of variables (i.e.,
to the ordinals), however we begin with W .
Recall that px1 q is 0 / 0, i.e. 0+ , and in general, pxk+ q is pxk q / pxk q, i.e. (pxk q)+ ,
for every ordinal k. The predecessor of a non-zero ordinal x is x.
Lemma 6.1. There is a p-function W such that
` W (pxk q) = p(pxk q)q
for every variable xk .
Proof. Define W recursively on ordinals by (see Ap.3.4):
0 if x = 0 or x is not an ordinal,
W (x) =
hp/q, W (x), W (x)i if x is a non-zero ordinal.
By this definition,
W (px1 q) = W (0+ ) = hp/q, W (0), W (0)i = hp/q, 0, 0i = p0 / 0q = p(px1 q)q,
as required. Assuming that W (pxk q) = p(pxk q)q, we get
W (pxk+ q) = W ((pxk q)+ ) = hp/q, W (pxk q), W (pxk q)i = hp/q, p(pxk q)q, p(pxk q)qi
= p(pxk q / pxk q)q = p(pxk+ q)q.
Recall that all codes (as described in Section 1) belong to the family of terms Γ
defined in 1.1.
Lemma 6.2. There is a p-function H such that H(µ) = pµq for every µ in Γ .
Proof. According to the definition of an ordered pair,
hx, yi = {{x}, {x, y}} = {0 / x, (0 / x) / y} = (0 / (0 / x)) / ((0 / x) / y).
Thus, for any terms σ, τ ,
phσ, τ iq = p(0 / (0 / σ)) / ((0 / σ) / τ )q
= hp/q, hp/q, 0, hp/q, 0, pσqii, hp/q, hp/q, 0, pσqi, pτ qii.
Let t(x, y) be the term
hp/q, hp/q, 0, hp/q, 0, xii, hp/q, hp/q, 0, xi, yii.
Thus t(pσq, pτ q) is phσ, τ iq for any terms µ, σ. We now define H(x) recursively (see
Ap.4.9) as
(
W (x) if x is an ordinal,
H(x) = t(H(u), H(v)) if x = hu, vi for some u, v,
0 in all other cases.
22 S. Świerczkowski
The definition makes sense because no ordered pair is an ordinal (0 ∈ x for every non-zero
ordinal x, whereas 0 6∈ hu, vi for all u, v). Since x = hu, vi → u, v ∈ cl(x), the requirements
of Ap.4.9 are satisfied.
Now suppose µ ∈ Γ . If µ = 0 or µ is the code of a variable, then H(µ) = W (µ) = pµq
because the first line of the definition of H applies and we can use 6.1. All other terms µ
in Γ are generated from 0 and from the codes of variables by (repeatedly) forming ordered
pairs. Since for µ = hσ, τ i the terms σ, τ are shorter than µ, we may proceed by induction
and assume that H(σ) = pσq, H(τ ) = pτ q. Then, by the definition of t(x, y),
pµq = phσ, τ iq = t(pσq, pτ q) = t(H(σ), H(τ )) = H(µ),
as we wished to show.
It is clear that no unary p-function H can satisfy H(τ ) = pτ q for all constant terms τ .
Indeed, if this were so, the terms σ = (0 / µ) / ν and τ = (0 / ν) / µ, where µ, ν are
different constant terms, would have provably equal codes, which is easily seen not to be
the case.
Let us agree that when a formula ϕ is written as ϕ(xi ) then, for any term τ , we shall
write ϕ(τ ) to mean ϕ(xi /τ ).
Lemma 6.3. For every variable xi , there exists a p-function K such that, for every for-
mula ϕ(xi ),
` K(pϕq) = pϕ(pϕq)q.
Proof. Consider the Replacement Function REPL defined in Section 5. Noting that the
composite of p-functions is a p-function, we put
K(x) = REPL(pxi q, H(x), x).
By 5.1, for any formula ϕ(xi ) and any term τ one has
` REPL(pxi q, pτ q, pϕq) = pϕ(τ )q.
Let τ = pϕq and recall that pτ q = H(τ ) in this case (by 6.2). Then
` REPL(pxi q, H(pϕq), pϕq) = pϕ(pϕq)q,
i.e. ` K(pϕq) = pϕ(pϕq)q.
Theorem 6.4 (Gödel’s Diagonal Lemma). For every formula α(xi ), there is a formula δ
such that
` δ ↔ α(pδq).
Proof. We replace the variable xi in α by the p-term K(xi ), and we denote by β(xi ) the
resulting formula. Thus
` β(xi ) ↔ α(K(xi )).
Then, by 6.3,
` β(pβq) ↔ α(p(pβq)q)).
Thus a formula δ satisfying the claim of the Diagonal Lemma is β(pβq).
Theorem 6.5 (Gödel’s First Incompleteness Theorem). If HF is a consistent theory,
then there is a sentence δ such that neither HF ` δ nor HF ` ¬δ. Moreover , S δ.
Finite sets and Gödel’s incompleteness theorems 23
Proof. Consider the p-formula ¬ Pf(x1 ). By the Diagonal Lemma, there is a sentence δ
such that
(4) ` δ ↔ ¬ Pf(pδq).
(a) Suppose ` δ. Then ` Pf(pδq), by 4.4. On the other hand, from (4) follows
` ¬ Pf(pδq). This contradicts the assumption of consistency.
(b) Suppose ` ¬δ. Then, by (4), ` Pf(pδq) and hence ` δ, by 4.4. This, together with
the assumed ` ¬δ, contradicts the consistency assumption.
(c) Suppose non[S δ]. Then S ¬δ, and hence S Pf(pδq), by (4). So ` δ,
by 4.4, contradicting non[S δ].
7. Pseudo-coding
Our next (and last) goal is to prove Gödel’s Second Incompleteness Theorem, i.e., 9.8.
A glance at its proof in Section 9 will show that at this moment it hinges only on the
Hilbert–Bernays derivability condition
` Pf(pδq) → Pf(pPf(pδq)q),
albeit only for the sentence δ satisfying (4) in 6.5. In what follows, the above implication
will be obtained as a special case of α → Pf(pαq), which we show to be a theorem for every
Σ-sentence α. (Clearly Pf(pδq) is a Σ-sentence.) Often results are proved by induction
on the length of a formula, but in this case induction does not work: α → Pf(pαq) is not
a theorem for every Σ-formula α. Indeed, we can take α to be x1 = x2 . Then, accepting
as a theorem x1 = x2 → Pf(px1 = x2 q), we would quickly reach a contradiction: Since
x2 does not occur in Pf(px1 = x2 q) (i.e., in Pf(hp=q, 0 / 0, (0 / 0) / (0 / 0)i), we would
conclude ` x1 = x1 → Pf(px1 = x2 q), hence ` Pf(px1 = x2 q), and finally ` x1 = x2 ,
by 4.4.
The approach that works is as follows. First, we define a pseudo-code of a formula α,
denoting it by [[α]]. Such a pseudo-code is a term, like the code pαq, but whereas pαq
is a constant term, [[α]] is in general a term with variables. In fact, the variables of the
term [[α]] are exactly those that are free in α. Next, we describe a rather special p-func-
tion F (we call it the special p-function). Then we replace each variable xi occurring
in [[α]] by the p-term F (xi ). The resulting p-term is denoted by [[α]](F (x1 ), . . . , F (xn )),
where n is chosen large enough so that all variables in [[α]] are among x1 , . . . , xn . Then
we show by induction on the length of the Σ-formula α that
α → Pf([[α]](F (x1 ), . . . , F (xn )))
is a theorem of HF. It is part of the definition of a pseudo-code that if α is sentence
then [[α]] is identical with pαq. This allows us to conclude ` α → Pf(pαq) for every
Σ-sentence α.
To create [[α]] we follow the same coding procedure as the one leading to the code
pαq, except that no variable xi occurring free in α is coded. The free occurrences of xi
are just left standing. For example:
[[x1 = x2 ]] = hp=q, x1 , x2 i and [[∃x1 (x1 ∈ x3 )]] = hp∃q, px1 q, hp∈q, px1 q, x3 ii.
24 S. Świerczkowski
If we try to define [[α]] by induction on the length of the formula α, i.e., by analogy
with defining pαq, we hit an obstacle: when an occurrence of xi is to be coded at a stage
when a sub-formula of α containing xi free is coded, one does not know yet whether this
occurrence of xi will remain free or finally become bound in α. To overcome this obstacle,
we introduce the concept of an intermediary code [[α]]V corresponding to any set V of
indices of variables (i.e., non-zero ordinals). This code [[α]]V , which we call a V -code, is
created just like [[α]], except that for an occurrence of xi not to be coded, we require not
only that this occurrence is free in α but also that i ∈ V . Thus [[α]]V is [[α]] when V
contains all the indices of the variables free in α, and [[α]]V is pαq when V is empty.
V -coding can be defined inductively. We begin with the terms. We put [[0]]V = 0, and
for any variable xi we define
xi if i ∈ V ,
[[xi ]]V =
pxi q otherwise.
The V -codes of other terms are defined by induction on the length of the term: if σ, τ are
V -coded, then [[σ / τ ]]V = hp/q, [[σ]]V , [[τ ]]V i.
The V -codes of atomic formulas are defined by [[σ ∈ τ ]]V = hp∈q, [[σ]]V , [[τ ]]V i and
[[σ = τ ]]V = hp=q, [[σ]]V , [[τ ]]V i (for any terms σ, τ ).
The V -codes of other formulas are defined by induction on the length of the formula:
[[¬α]]V = hp¬q, [[α]]V i,
[[α ∨ β]]V = hp∨q, [[α]]V , [[β]]V i,
[[∃xk α]]V = hp∃q, pxk q, [[α]]V \{k} i.
Definition 7.1. If V contains the indices of all variables occurring free in α then [[α]]V
(obviously not depending on V in that case) will be denoted by [[α]] and called the
pseudo-code of α.
It may be worth observing that if we replace each variable xi occurring in the term
[[α]] by pxi q then we obtain pαq.
In this section we describe by means of pseudo-codes some properties of the formula
Pf(x). Our goal is Theorem 7.3. This will be needed for proving Theorem 9.1, from which
the Hilbert–Bernays condition and then Gödel’s Second Theorem are easily deduced.
Definition 7.2. Given any string t1 , . . . , tn of variables and any V -code [[α]]V , we shall
denote by [[α]]V (t1 , . . . , tn ) the term that results from [[α]]V when each variable xi oc-
curring in [[α]]V , with i ≤ n, is replaced by ti .
Suppose β is a theorem, i.e., `β. If we replace each of the variables x1 , . . . , xn at
each of its free occurrences in β by some constant term then the formula so obtained
is also a theorem (by the Substitution Rule in 4.2). This just described situation in
the meta-theory admits description in HF. Thus, `β is equivalent to Pf(pβq) (see 4.4).
If t1 , . . . , tn are the codes of the constant terms that are substituted for x1 , . . . , xn (at
the free occurrences of these variables) then this new theorem we get from β has the
{1, . . . , n}-code [[β]]{1,...,n} (t1 , . . . , tn ). Hence the mentioned description in HF is the im-
plication () in 7.3 below. The reader who accepts the accuracy of our description of the
meta-theory of HF by formulas of HF in Sections 3 and 4 might feel no need to see the
Finite sets and Gödel’s incompleteness theorems 25
proof of () (Lemmas 7.4–7.6). We give it here for the sake of completeness, although it
is rather long and tedious (due to the fact that pseudo-codes do not participate in the
creation of the formula Pf).
Theorem 7.3. For every formula β and every n = 1, 2, . . . ,
() Const(t1 ) ∧ . . . ∧ Const(tn ) ∧ Pf(pβq) → Pf([[β]]{1,...,n} (t1 , . . . , tn ))
is a theorem of HF.
Proof. We reduce the proof to establishing Lemma 7.6, i.e., the theorem
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermForm(pxr q, tr , [[β]]V \{r} (t1 , . . . , tn ), [[β]]V (t1 , . . . , tn ))
for every r ∈ V ⊆ {1, . . . , n}. Recall that in forming the term [[β]]V \{r} (t1 , . . . , tn ) all the
occurrences of xr are coded as pxr q, and when these occurrences of pxr q which corres-
pond to free occurrences of xr in β are replaced by tr , we obtain [[β]]V (t1 , . . . , tn ). This
procedure of replacing the pxr q by tr is described above by RepVarTermForm.
So let us assume that Lemma 7.6 (i.e., the above implication) has been proved. Then,
to prove 7.3, we first observe that 4, 7 and the subsequent Note (Section 3) imply
` Const(t) → VarNonOccTerm(u, t).
Hence
` Var(v) ∧ Form(y) ∧ Const(t) → TermSubsVarForm(t, v, y),
using 20, Lemma 3.2 and the Note after 19 (last line). Next, let us observe that
` TermSubsVarForm(t, v, y) ∧ RepVarTermForm(v, t, y, y 0 ) ∧ Pf(y) → Pf(y 0 )
by the definitions of Subst and Pf, i.e., (6) and (9) in Section 4. Combining this with
the previous theorem, we get
` Const(t) ∧ RepVarTermForm(v, t, y, y 0 ) ∧ Pf(y) → Pf(y 0 ).
Consider now this theorem for every r = 1, . . . , n with
v = pxr q, t = tr , y = [[β]]{1,...,r} (t1 , . . . , tn ), y 0 = [[β]]{1,...,r} (t1 , . . . , tn ),
where, for r = 1, [[β]]{1,...,r} = [[β]]0 = pβq. Then it follows, by the assumed 7.6 with
V = {1, . . . , r} and r = 1, . . . , n, that
` Const(t1 ) ∧ . . . ∧ Const(tr ) ∧ Pf([[β]]{1,...,r} (t1 , . . . , tn )) → Pf([[β]]{1,...,r} (t1 , . . . , tn )).
Combining all of these implications, for r = 1, . . . , n, we get the assertion of 7.3.
The following two lemmas pave the ground for the proof of Lemma 7.6.
Lemma 7.4. Suppose τ is a term, V ⊆ {1, . . . , n} and r ∈ V . Then
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermTerm(pxr q, tr , [[τ ]]V \{r} (t1 , . . . , tn ), [[τ ]]V (t1 , . . . , tn ))
(see 21, 22 in Section 3).
Proof. We use induction on the length of τ . So we have to consider the cases when τ is 0
or a variable, and the case when τ = σ / µ.
26 S. Świerczkowski
theorem
` RepVarTermTerm(v, t, u, u0 ) ∧ RepVarTermTerm(v, t, w, w 0 ) →
RepVarTermAt(v, t, hp∈q, u, wi, hp∈q, u0 , w0 i).
Using the same substitutions for v, . . . , w 0 as in the proof of Lemma 7.4, we see from
Lemma 7.4 that the left side of the above implication becomes a theorem when it is
preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”. So the same applies to the right side. It
remains to observe that
hp∈q, u, wi = hp∈q, [[σ]]V \{r} , [[µ]]V \{r} i(t1 , . . . , tn ) = [[α]]V \{r} (t1 , . . . , tn ),
hp∈q, u0 , w0 i = hp∈q, [[σ]]V , [[µ]]V i(t1 , . . . , tn ) = [[α]]V (t1 , . . . , tn ).
Lemma 7.6. Suppose β is a formula, V ⊆ {1, . . . , n} and r ∈ V . Then
` Const(t1 ) ∧ . . . ∧ Const(tn ) →
RepVarTermForm(pxr q, tr , [[β]]V \{r} (t1 , . . . , tn ), [[β]]V (t1 , . . . , tn )).
Proof. We fix n ≥ 1. When β is atomic, the proof is already there. So we have to
consider the cases when β is of the form ∃xi γ, or γ ∨ δ, or ¬γ, while accepting the
inductive hypothesis that the implication in 7.6 has been proved for γ and δ (and for all
r, V satisfying r ∈ V ⊆ {1, . . . , n}).
A. Case when β is ∃xi γ. As in the proof of 7.4.B, we consider various possibilities for
i, r and V .
(a): i 6∈ V and hence pxi q 6= pxr q. We have the general theorem
` Var(u) ∧ u 6= v ∧ RepVarTermForm(v, t, y, y 0 ) →
RepVarTermForm(v, t, hp∃q, u, yi, hp∃q, u, y 0 i).
(see 24, 25). Let us substitute here u = pxi q, v = pxr q, t = tr and
y = [[γ]]V \{r} (t1 , . . . , tn ), y 0 = [[γ]]V (t1 , . . . , tn ).
Then, by the inductive hypothesis, the left side of the above implication becomes a
theorem when preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”. So the same applies to the
right side, where, by the definition of the V -coding,
hp∃q, u, yi = hp∃q, pxi q, [[γ]]V \{r} i(t1 , . . . , tn )
= [[∃xi γ]]V \{r} (t1 , . . . , tn ) = [[β]]V \{r} (t1 , . . . , tn ),
and similarly for y 0 and V in place of y and V \ {r}.
(b): i ∈ V and i 6= r. Let W = V \ {i}. In the general implication used in the proof
of (a), we make the same substitutions for u, v, t as in (a) and put
y = [[γ]]W \{r} (t1 , . . . , tn ), y 0 = [[γ]]W (t1 , . . . , tn ).
Then, by the inductive hypothesis applied to γ and W , the left side of that implication,
when preceded by “Const(t1 ) ∧ . . . ∧ Const(tn ) →”, becomes a theorem. So the same
applies to the right side. We observe that now, by the definition of the V -coding, and
28 S. Świerczkowski
Since the Modus Ponens Rule of Inference is incorporated in the definition ((8) and (9))
of the formula Pf, we have the general theorem
` Pf(Impl(x, y)) → [Pf(x) → Pf(y)].
It remains to substitute x = [[α]]{1,...,n} (t1 , . . . , tn ) and y = [[β]]{1,...,n} (t1 , . . . , tn ).
We shall presently introduce the special p-function F . According to the plan outlined at
the beginning of the previous section, a key role will be given to the p-term [[α]]((F (x 1 ),
. . . , F (xn )), obtained from the term [[α]] by replacing each xi occurring in it by the p-term
F (xi ). In this section, after defining F , we shall show that the p-formula Const(F (xi )) is
(or rather, abbreviates) a theorem of HF. This will allow us to apply the results of the
preceding section to the situation when each ti is replaced by F (xi ).
To define F , we need to select definably from every non-empty set x an element v, so
that then canonically x = u / v, with u = x \ {v}. To select v ∈ x, we use an ordering
relation < on the universe of all (hereditarily finite) sets. The description of < starts from
the following observation: Any ordering of a (finite) set y determines an ordering of the
power set P (y) (= set of subsets of y). This is defined so that for z1 , z2 ⊆ y and z1 6= z2
one puts z1 < z2 iff the greatest element of the symmetric difference (z1 ∪ z2 ) \ (z1 ∩ z2 ) is
in z2 , and otherwise, z2 < z1 . Since all sets can be reached from 0 by repeated application
of the power-set operation (and 0 is ordered!), the above principle yields a unique ordering
of the universe of all sets. This ordering refines the partial ordering by ⊂ and it also has
the property that x < y when x ∈ y (see Ap.5 for the details). We can use the same
symbol < for this order and for the usual order among ordinals, since on the sub-universe
of ordinals, both orders coincide.
Definition 8.1. The special p-function F is defined by recursion on inequality (see
Ap.5.14) as follows:
1◦ F (0) = 0.
2◦ Suppose x 6= 0 and F (y) has been defined for all y < x. Then put
F (x) = hp/q, F (u), F (v)i,
where v is the <-greatest element of x and u = x \ {v} (i.e., x = u / v and v 6∈ u).
The existence of v follows from Ap.5.10. Also, we have u, v < x, by 4.5 and 5.9 in Ap.
(We leave to the reader the task of describing the p-function G required in 5.14 of Ap.)
Let us list some properties of F that may justify why we called F special:
(a) In the standard model S, the values of F are codes (or rather, named by codes)
of constant terms.
(b) The definition of F parallels the coding of terms (compare 2◦ in 8.1 with pσ / τ q
= hp/q, pσq, pτ qi ).
(c) F is injective.
30 S. Świerczkowski
(d) For each constant term τ there is a constant term µ such that ` τ = µ and
F (µ) = pµq.
(e) F (k) = p(pxk q)q for every ordinal k 6= 0. (Hence F coincides on the sub-universe
of ordinals with the two p-functions W, H of Section 6.)
Property (a) follows from 8.2, (b) is obvious, and (c), (d), (e) will not be used below.
There is a p-function in [Bo] that plays a similar role to that of F here, however the
definition in [Bo] is simpler, because in PA each x is the successor of a unique u, so that
x = Su can be used instead of x = u / v.
Theorem 8.2. ` Const(F (x)).
Proof. Let us prove Const(F (x)) by induction on < (applying the scheme 5.11 of Ap).
We have Const(F (0)), due to F (0) = 0. Now let x 6= 0 and assume Const(F (y)) for all
y < x. Let u, v < x be such that v is the <-largest element of x and u = x \ {v}, so that
F (x) = hp/q, F (u), F (v)i. Since then Const(F (u)) and Const(F (v)), by the inductive
assumption, we can use the easily proved theorem
` Const(t1 ) ∧ Const(t2 ) → Const(hp/q, t1 , t2 i)
to deduce Const(hp/q, F (u), F (v)i), i.e., Const(F (x)).
The theorem Const(F (x)) allows us to replace in 7.3 and 7.7 each ti by F (xi ), thus
leading to:
Lemma 8.3. If β is a theorem and all variables free in β are among x1 , . . . , xn then
` Pf([[β]](F (x1 ), . . . , F (xn ))).
Proof. Assume ` β. Then ` Pf(pβq), by 4.4. Replacing each ti by F (xi ) in 7.3 and
using 8.2, we obtain
` Pf(pβq) → Pf([[β]]{1,...,n} (F (x1 ), . . . , F (xn ))).
Moreover the assumption about the variables implies [[β]]{1,...,n} = [[β]].
Lemma 8.4. If α → β is a theorem and all variables free in α and β are among x 1 , . . . , xn
then
` Pf([[α]](F (x1 ), . . . , F (xn ))) → Pf([[β]](F (x1 ), . . . , F (xn ))).
Proof. Assume ` α → β. Then ` Pf(pα → βq), by 4.4. Replacing each ti by F (xi ) in 7.7
and using 8.2, we obtain
` Pf(pα → βq) →
{Pf([[α]]{1,...,n} (F (x1 ), . . . , F (xn ))) → Pf([[β]]{1,...,n} (F (x1 ), . . . , F (xn )))}.
Moreover the assumption about the variables implies [[β]]{1,...,n} = [[β]] and [[α]]{1,...,n}
= [[α]].
In the above lemmas we obtained [[α]](F (x1 ), . . . , F (xn )) by replacing each ti in
[[α]](t1 , . . . , tn ) by F (xi ). However, t1 , . . . , tn were introduced merely for the sake of con-
venience and clearly we can pass directly from [[α]] to [[α]](F (x1 ), . . . , F (xn )) just by
replacing each xi in [[α]] by F (xi ).
Finite sets and Gödel’s incompleteness theorems 31
The proof of Gödel’s Second Incompleteness Theorem is based on the fact that α →
Pf(pαq) is a theorem of HF for every Σ-sentence α. We obtain this result as a direct
consequence of Theorem 9.1 whose proof occupies the major part of this section.
Theorem 9.1. If α is a Σ-formula and all variables free in α are among x 1 , . . . , xn then
the p-formula
(?) α → Pf([[α]](F (x1 ), . . . , F (xn )))
abbreviates a theorem of HF.
Definition 9.2. A formula α will be called a (?)-formula iff (?) abbreviates a theorem
of HF for every (or equivalently, some) n such that all free variables of α are among
x1 , . . . , x n .
Due to 8.4, it will suffice to show that every strict Σ-formula is a (?)-formula. Recall
that (see 2.1) the class Σ of strict Σ-formulas is the least class of formulas containing all
atomic formulas of the form xi ∈ xj , and such that Σ is closed under conjunctions, dis-
junctions and the application of existential quantifiers and bounded universal quantifiers
∀(xi ∈ xj ), where i 6= j. Thus the proof is of 9.1 is reduced to showing Lemmas 9.4–9.7
below. (However we first need the auxiliary result 9.3.)
Lemma 9.3. xi = xj is a (?)-formula.
Proof. We have to prove that
` xi = xj → Pf([[xi = xj ]](F (x1 ), . . . , F (xn )))
provided i, j ≤ n. Since in this case, by the definition of pseudo-coding,
[[xi = xj ]](F (x1 ), . . . , F (xn )) = hp=q, F (xi ), F (xj )i,
we have to prove
(1) xi = xj → Pf(hp=q, F (xi ), F (xj )i).
We have:
` xi = x i .
` Pf([[xi = xi ]](F (x1 ), . . . , F (xn ))) when i ≤ n, by 8.3.
` Pf(hp=q, F (xi ), F (xi )i), by the definition of pseudo-coding.
` xi = xj → Pf(hp=q, F (xi ), F (xj )i), by the rules of logic.
Lemma 9.4. xi ∈ xj is a (?)-formula.
Proof. We have to show
` xi ∈ xj → Pf([[xi ∈ xj ]](F (x1 ), . . . , F (xn )))
provided i, j ≤ n. Similarly to the case of the formula xi = xj , this amounts to proving
` xi ∈ xj → Pf(hp∈q, F (xi ), F (xj )i).
32 S. Świerczkowski
Proof. Suppose all variables free in α ∨ β are among x1 , . . . , xn . Then we should show
(a) ` α ∨ β → Pf([[α ∨ β]](F (x1 ), . . . , F (xn ))),
(b) ` α ∧ β → Pf([[α ∧ β]](F (x1 ), . . . , F (xn ))).
To get (a), note that, by assumption,
` α ∨ β → Pf([[α]](F (x1 ), . . . , F (xn ))) ∨ Pf([[β]](F (x1 ), . . . , F (xn ))).
It now remains to apply the theorem Pf(z1 ) ∨ Pf(z2 ) → Pf(hp∨q, z1 , z2 i) with z1 =
[[α]](F (x1 ), . . . , F (xn )), z2 = [[β]](F (x1 ), . . . , F (xn )), noting that in this case
hp∨q, z1 , z2 i = [[α ∨ β]](F (x1 ), . . . , F (xn )).
The proof of part (b) is analogous.
Lemma 9.6. If α is a (?)-formula then, for every i, ∃xi α is a (?)-formula. In other
words, if all variables free in ∃xi α are among x1 , . . . , xn then
` ∃xi α → Pf([[∃xi α]](F (x1 ), . . . , F (xn ))).
Proof. Let m ≥ n be such that all variables free in α are among x1 , . . . , xm . Then
` Pf([[α]](F (x1 ), . . . , F (xm ))) → Pf([[∃xi α]](F (x1 ), . . . , F (xm )))
by the Specialization Axiom ` α → ∃xi α and 8.4. Combining this with the assump-
tion (?), we obtain
` α → Pf([[∃xi α]](F (x1 ), . . . , F (xm ))).
Since xi is not free on the right hand side, we can apply the ∃-introduction rule (in 4.2),
to get
` ∃xi α → Pf([[∃xi α]](F (x1 ), . . . , F (xm ))).
All the variables free in ∃xi α are among x1 , . . . , xn , so we can replace here m by n.
Lemma 9.7. If α is a (?)-formula then, for every i 6= j, ∀(xi ∈ xj )α is a (?)-formula. In
other words, if all variables free in ∀(xi ∈ xj )α are among x1 , . . . , xn then
` ∀(xi ∈ xj )α → Pf([[∀(xi ∈ xj )α]](F (x1 ), . . . , F (xn ))).
Proof. Let α be a (?)-formula. The following two observations will be used in the proof:
(a) If β ↔ α then β is a (?)-formula (in virtue of 8.4).
(b) If a variable xk is replaced in α by some variable xl (substitutable for xk in α)
then the formula so obtained is also a (?)-formula.
To simplify matters, we take i = 1, j = 2 and represent by x3 the remaining free
variables in α. We shall distinguish two cases:
Case 1: x2 is not free in α. We shall write α as α(x1 ). It will be also convenient
to take n = 5 and assume that x5 is substitutable for x1 (otherwise, we can rename
the bound variables of α so that x5 will not be among them and then replace α by the
formula β so obtained; see (a) above).
Thus our purpose is to prove ∀x2 γ(x2 ) for the formula γ(x2 ):
∀(x1 ∈ x2 )α(x1 ) → Pf([[∀(x1 ∈ x2 )α(x1 )]](F (x1 ), . . . , F (x5 ))),
34 S. Świerczkowski
Then, since δ is not a theorem (see 6.5), neither is Con(HF). To show (44), we apply 9.1
to the Σ-sentence Pf(pδq):
` Pf(pδq) → Pf(pPf(pδq)q).
On the other hand, applying 8.4 to (4), we deduce
` Pf(pδq) → Pf(p¬ Pf(pδq)q).
The last two theorems yield
` Pf(pδq) → Pf(pPf(pδq)q) ∧ Pf(p¬ Pf(pδq)q).
By the definition of Pf, we have, for any formulas α, β,
` Pf(pαq) ∧ Pf(pβq) → Pf(pα ∧ βq),
so we conclude from the above:
` Pf(pδq) → Pf(pPf(pδq) ∧ ¬ Pf(pδq)q).
Since ` Pf(pδq) ∧ ¬ Pf(pδq) → 0 ∈ 0, we can use 8.4 to obtain
` Pf(pδq) → Pf(p0 ∈ 0q),
that is,
` Pf(pδq) → ¬ Con(HF).
Hence
` Con(HF) → ¬ Pf(pδq).
Combining this with (4), we obtain (44).
This Incompleteness Phenomenon still attracts very much attention. Apart from the
publications listed below (consulted during the preparation of this paper) one can find a
more extensive list of references in [L].
We conclude this paper by stating a theorem which says that each of the theories HF and
PA is an extension by definitions of the other. In detail: If one adds to PA the symbols ∈
and / and one views x ∈ y and x ∈ y / z as abbreviations of certain formulas of PA (the
formulas (∈) and (/) in 10.1) then the axioms of HF can be proved in PA. Conversely,
if one adds to HF the function symbols S (successor), + and ×, and one views Sx = y,
x + y = z and x × y = z as abbreviations of certain formulas of HF then all axioms of
PA become theorems of HF. Most of these facts are discussed in [Be], [My] and [TG]
(although in [Be] this mutual relationship between PA and HF is not fully worked out,
in [TG] the results are merely stated and in [My] the work concerns the model ω of PA).
This situation would not exclude the possibility that arithmetic, when so defined
in HF, is stronger than PA (i.e., that some theorem of HF, after the elimination of ∈
and / in favor of +, × and S, becomes a statement about numbers not provable in PA).
The opposite situation (where PA and HF are interchanged) might be possible as well.
Finite sets and Gödel’s incompleteness theorems 37
That neither of these can happen (at least when (∈) in 10.1 below is used for defining
the relation x ∈ y in PA) is a consequence of the following result.
Theorem 10.1. Let T be the theory, in the language {0, =, +, ×, S, ∈, /}, axiomatized by
the axioms of PA and two further axioms (where z < y abbreviates ∃w(z + Sw = y)):
(∈) x ∈ y ↔ ∃(u < y)∃(v < y)[y = v + 2x + u × 2Sx ∧ v < 2x ]
(that is, x occurs as an exponent when y is written as a sum of powers of 2),
(/) x ∈ y / z ↔ x ∈ y ∨ x = z.
Then there exist formulas α(x, y), β(x, y, z) and γ(x, y, z) in the language of HF such
that another axiomatization of T is provided by the axioms HF1, HF2, the axiom scheme
HF3 and three further axioms:
(S) Sx = y ↔ α(x, y),
(+) x + y = z ↔ β(x, y, z),
(×) x × y = z ↔ γ(x, y, z).
Moreover α, β, γ are unique up to equivalence in HF.
Thus PA and HF have the common extension T by definitions:
T = PA + (∈) + (/) = FS + (S) + (+) + (×).
The above definition (∈) of the relation ∈ in arithmetic is due to Ackermann [A]. The
proof of 10.1 is too tedious to be included here; let us merely indicate how α, β and γ
can be obtained. Suppose E : S → ω is the “enumeration” by natural numbers of the
universe S of the standard model S of HF (see Ap.6), satisfying
1◦ E(0) = 0,
2◦ E({a0 , . . . , an }) = 2E(a0 ) + . . . + 2E(an )
for any distinct a0 , . . . , an ∈ S. It is not hard to formalize in HF this description of E so
as to get a p-function E (in the sense of Section 5) which maps bijectively the universe
of HF onto its sub-universe of ordinal numbers. Then we define
α(x, y) ↔ E(x) / E(x) = E(y),
β(x, y, z) ↔ E(x) ⊕ E(y) = E(z),
γ(x, y, z) ↔ E(x) ⊗ E(y) = E(z),
where ⊕ and ⊗ denote addition and multiplication of ordinals in HF.
This section deals with a question raised by J. Mycielski, whether there exists a theorem
which is of moderate length (so that it can be written down), whilst each proof of it
is so long that it can never be written down. W. Reinhardt has indicated (verbal com-
munication) how to construct a theorem of this kind and we shall work out here some
of the details of this construction. In its original form this result concerned PA but we
38 S. Świerczkowski
believe that the proof of Theorem 11.1 would become much more cumbersome if we were
to replace HF by PA.
Theorem 11.1. If HF is consistent, then there exists a theorem δ of HF which can be
written down, yet such that no proof of δ can be written down (because of its immense
length).
Proof. The proof is based on Theorem 6.4 (Gödel’s Diagonal Lemma). We conclude from
Gödel’s Lemma that there exists, for any fixed ordinal, a sentence δ asserting about itself
that it has no proof whose code is less (in the sense of the inequality < in HF) than
that ordinal. It then remains to attend to the details, to ensure that δ is of reasonably
moderate length, HF ` δ and there does not exist a proof of δ of moderate length. To be
more precise, we start with
Definition 11.2. The length of a string of symbols, such as a term, a formula or a proof,
is the total number of symbols occurring in the string. Here, in counting the symbols, we
disregard the parentheses and commas. Every re-occurring symbol is counted at every
occurrence.
Definition 11.3. It will be said that a formula α is of strictly moderate length, or simply
strictly moderate, if α can be practically written down using an imaginably limited amount
of paper or time. A formula that is HF-equivalent to a strictly moderate one will be called
moderate.
This definition is not at all precise, so it will be left to the reader to agree with us when
we call a given formula “moderate”. Here is an example: Consider the ordinal-valued
exponential p-function exp2 (x) to the base 2, i.e., the p-function satisfying exp2 (x) =
exp2 (x) + exp2 (x) for every non-zero ordinal x and exp2 (x) = 1 when x = 0 or x is not
an ordinal. Then exp2 is defined recursively on ordinals (Ap.3.3) and thus y = exp2 (x)
may be viewed as an abbreviation of a moderate formula. The same may be said about
the 6-fold superposition
y = exp2 (. . . (exp2 (x))))))).
Let K be the constant term for which K = exp2 (. . . (exp2 (2))))))) is a theorem of HF.
Then K denotes a very large ordinal number, in fact K > 1050 , which supposedly exceeds
the number of elementary particles in the Universe. Since only 0, / and parentheses may
appear in any constant term, it is impossible to write K explicitly. Thus y = K is a
formula of HF that is not strictly moderate. But y = K is evidently moderate since it
is HF-equivalent to a strictly moderate formula resulting from a 6-fold application of
the moderate formula abbreviated by y = exp2 (x). We shall need below the formula
z = R(4K + 17), and a discussion similar to the above (based on the recursive definition
of R in Ap.3.3) shows that this one is moderate as well.
We now claim that for proving Reinhardt’s theorem it is enough to find a moderate
formula α(x), with one free variable x, such that for every sentence δ:
(a) S α(pδq) ⇒ ` δ. [Recall that S = standard model of HF.]
(b) If ` δ and δ has a proof of length less than K, then S α(pδq).
(c) S ¬α(pδq) ⇒ ` ¬α(pδq).
Finite sets and Gödel’s incompleteness theorems 39
Indeed, given such α(x), let δ be the sentence obtained in the proof of the Diagonal
Lemma (6.4 above), when applied to ¬α(x), so that
(1) ` δ ↔ ¬α(pδq).
Then δ is moderate as well. Moreover S α(pδq) is impossible, by (a) and (1). Thus
S ¬α(pδq), so (c) and (1) imply ` δ. Also, no proof of δ can be of length less than K,
by (b).
Let us show that (a), (b), (c) are statisfied when α(x) is
(2) ∃(k ∈ K)∃(s ∈ R(4K + 17))Prf(s, k, x).
[Note that ` k ∈ K ↔ Ord(k) ∧ k < K.] Then α(x) is equivalent to
∀y∀z[(y = K ∧ z = R(4K + 17)) → ∃(k ∈ y)∃(s ∈ z)Prf(s, k, x)],
so we see from the above discussion of the formulas y = K and z = R(4K + 17) that α(x)
is moderate. Since condition (a) is now a direct consequence of 4.4, it remains to verify
(b) and (c).
Proof of (b). We need to show that if δ has a proof of length less than K then there
exists a proof δ0 , . . . , δn of δ such that the sequence
(3) s = {h0, pδ0 qi, . . . , hn, pδn qi}
satisfies s ∈ R(4K + 17) and n+ < K. The next two lemmas will serve this purpose.
Lemma 11.4. Let w be a term or formula of length l, and suppose that the index j of
every variable xj occurring in w does not exceed m. Then rank(pwq) ≤ 2l + m + 12.
Proof. First, we deduce directly from the definition of rank that rank(j) = j for all
ordinals j (see Ap.4.4 and 4.5). Let l = 1. Then w is 0 or xj and pwq = 0 or pwq = j ≤ m.
Since rank(j) = j, the assertion is true. We now proceed by induction on l, noting first
that hx, yi = {{x}, {x, y}} and hx, y, zi = hx, hy, zii imply
(4) rank(hx, yi) ≤ max(rank(x), rank(y)) + 2,
(5) rank(hx, y, zi) ≤ max(rank(x), rank(y), rank(z)) + 4.
If l > 1, then pwq is hx, yi or hx, y, zi, where x, y, z are codes of certain sub-strings
of the string w. If pwq is hx, yi, then x = p¬q, whence rank(x) = 10 (see Section 1),
moreover the length of the sub-string of w whose code in y is l − 1. By the inductive
assumption, rank(y) ≤ 2(l − 1) + m + 12, and rank(pwq) ≤ 2l + m + 12 follows from (4).
If pwq = hx, y, zi, then l > 2, x is one of the codes p∈q, p/q, . . . , p∃q and y, z are codes
of sub-strings of w of length at most l − 2. Hence rank(x) ≤ rank(p∃q) ≤ 12, and using
the inductive assumption and (5), we get rank(pwq) ≤ 2l + m + 12.
Lemma 11.5. If each formula δi in the sequence s (see (3)) is of length at most l and the
index j of any variable xj occurring in any δi does not exceed m, then s < 2l +m+n+16.
Proof. One first shows by induction on ordinals n that if rank(x) = n then x < n+
(this is a consequence of rank(n) = n and R(n) < R(n+ ) \ R(n) proved in Ap.5.8 and
5.9). In other words, x < [rank(x)]+ for all x. Thus it will be enough to show that
rank(s) ≤ 2l + m + n + 15. Each element of s is a pair hi, pδi qi, where i ≤ n and
40 S. Świerczkowski
(see (§§) in the proof of 2.5 above). Consequently S ¬α(pδq) yields S ¬ Prf(τ i , σj , pδq)
for all 1 ≤ i ≤ m, 1 ≤ j ≤ l. For any particular τi , σj , S ¬ Prf(τi , σj , pδq) means that
σj fails to be a sequence of length τi coding a proof of δ, that is, σj is not a sequence
of length τi of codes of formulas such that the corresponding sequence of formulas is
a proof of δ. Now, there is an effective procedure that, when applied to the term σj ,
leads to verifying that either σj does not name a sequence of length (named by) τi or,
while naming such a sequence, that sequence fails to satisfy at least one of the conditions
spelled out in the description of the formula Prf(τi , σj , pδq) by (8) in Sect. 4. Thus the
failure of σj to code a proof of δ of length τi is provable in HF. We conclude that
(7) S ¬ Prf(τi , σj , pδq) ⇒ ` ¬ Prf(τi , σj , pδq)
for all 1 ≤ i ≤ m, 1 ≤ j ≤ l. So S ¬α(pδq) implies ` ¬α(pδq), by (6) and (7). This
finishes the proof of (c), and hence of Theorem 11.1.
Proof. Let α(x) be the above formula. If z exists, then it is unique and we shall denote
it by {v : ∃(u ∈ x)ψ}. To prove ` ∀xα(x), we use HF3:
1◦ ` α(0) [take z = 0 in α(0)].
2◦ ` [α(x) → α(x / y)].
To show 2◦ , assume α(x), that is, assume that {v : ∃(u ∈ x)ψ} exists provided that
∀(u ∈ x)∃!vψ(u, v). To show α(x / y), assume that ∀(u ∈ x / y)∃!vψ(u, v), i.e., that
∀u[(u ∈ x ∨ u = y) → ∃!vψ(u, v)]. Hence ∃!vψ(y, v), so let vy be this unique v. Thus the
required z on the right hand side of the implication α(x / y), i.e., {v : ∃(u ∈ x / y)ψ}, is
{v : ∃(u ∈ x)ψ} / vy .
Definition 1.10 (Subset relation).
(a) y ⊆ x ↔ ∀v(v ∈ y → v ∈ x).
(b) y ⊂ x ↔ y ⊆ x ∧ y 6= x.
Theorem 1.11 (Existence of the power set). ∃z∀u(u ∈ z ↔ u ⊆ x).
Proof. Let α(x) be the above formula. If z exists, it is unique and we shall denote it by
P (x). To prove ∀xα(x) we apply HF3. We have
1◦ ` α(0) [take z = {0} in α(0)].
2◦ ` [α(x) → α(x / y)].
To show 2◦ , assume α(x), so that P (x) exists. One checks directly from the definition
that
u ⊆ x / y ↔ u ∈ P (x) ∨ ∃(v ∈ P (x))[u = v / y].
Hence, using the existence of union and the Replacement Scheme (Theorems 1.5 and 1.9),
we find that P (x / y) = P (x) ∪ {u : ∃(v ∈ P (x))[u = v / y]}.
Definition 1.12. We shall call w an ∈-minimal element of z if w ∈ z and no element
of w belongs to z (i.e., w ∩ z = 0).
Theorem 1.13 (Foundation (Regularity) Property).
z 6= 0 → ∃(w ∈ z)[w ∩ z = 0].
Proof (by Donald Monk ). We need to show that ∀(w ∈ z)[w ∩ z 6= 0] → z = 0 and this
will follow if we prove the implication ∀(w ∈ z)[w ∩ z 6= 0] → ∀xα(x), where α(x) is the
formula (x 6∈ z) ∧ (x ∩ z = 0). By HF3, it will be sufficient to establish:
1◦ ` ∀(w ∈ z)[w ∩ z 6= 0] → α(0).
2◦ ` ∀(w ∈ z)[w ∩ z 6= 0] → [α(x) ∧ α(y) → α(x / y)].
Here 1◦ is obvious, and to prove 2◦ , we verify the mutual incompatibility of the following
four conditions:
(a) ∀(w ∈ z)[w ∩ z 6= 0].
(b) α(x), that is, (x 6∈ z) ∧ (x ∩ z = 0).
(c) α(y), that is, (y 6∈ z) ∧ (y ∩ z = 0).
(d) ¬α(x / y), that is, [(x / y) ∈ z)] ∨ [(x / y) ∩ z 6= 0].
The two cases in (d) are:
44 S. Świerczkowski
2. Ordinals
Definition 2.1. We say that x is transitive if every element of x is a subset of x. We
call x an ordinal if x is transitive and every element of x is transitive. This will be
expressed by the formula
∀(y ∈ x){[y ⊆ x] ∧ ∀(z ∈ y)[z ⊆ y]}.
with q0 in place of l0 , or else we get (3) by just replacing l0 at the beginning of our
considerations by q0 (noting that ¬β(k0 , q0 ), by the definition of q0 ).
We now claim that l0 ⊂ k0 . Indeed, suppose p ∈ l0 . Then β(k0 , p), by (3), i.e. k0 ∈
p ∨ k0 = p ∨ p ∈ k0 . But k0 ∈ p contradicts p ∈ l0 and (1) (due to p ⊆ l0 ). Also k0 = p
contradicts (1) (since p ∈ l0 is assumed). Thus p ∈ k0 , and l0 ⊆ k0 follows. Since l0 = k0
contradicts (1), we get l0 ⊂ k0 .
Let m ∈ k0 \ l0 . Then β(m, l0 ), by (2), i.e.,
m ∈ l0 ∨ m = l0 ∨ l0 ∈ m.
Here both m = l0 and l0 ∈ m contradict (1), because m ∈ k0 . But also m ∈ l0 is
impossible, by the choice of m.
Theorem 2.5. Let k, l be ordinals. Then
(a) Exactly one of k ∈ l, k = l, l ∈ k occurs.
(b) k ∈ l ↔ k ⊂ l.
(c) l ∈ k → (l+ = k ∨ l+ ∈ k).
(d) k + = l+ → k = l.
Proof. (a) Any two of the three mentioned possibilities contradict 1.14. At least one
possibility occurs, by 2.4.
(b) Implication → follows by the transitivity of l and 1.14. Suppose k ⊂ l. Then
k ∈ l ∨ l ∈ k, by 2.4. But l ∈ k yields l ∈ l, contradicting 1.14.
(c) By 2.4, we have to exclude k ∈ l + , i.e., we have to show that neither k ∈ l nor
k = l can occur. This is indeed so because each of these possibilities, together with l ∈ k,
leads to l ∈ l.
(d) Suppose k 6= l and k + = k ∪ {k} = l ∪ {l} = l+ . Then k ∈ l and l ∈ k, yielding
again the contradiction l ∈ l.
Definition 2.6. For ordinals k, l, we shall use:
(i) k < l to abbreviate k ∈ l,
(ii) k ≤ l to abbreviate x < y ∨ x = y.
In Section 5 we shall define a binary relation of inequality < on the universe of all
sets. On the sub-universe of ordinals, both relations will be seen to coincide.
Theorem 2.7. Every set x of ordinals is ordered by the binary relation < (or equivalen-
tly, by ⊂). Moreover if x 6= 0 then x has a smallest and a largest element.
Proof. Let x be a set of ordinals. Then x is ordered by ∈ because of (a), (b) in 2.5.
Denote the largest element of x, if existing, by max(x). Let α(x) be the formula stating
the existence of max(x) for non-empty x:
[x 6= 0 ∧ ∀(k ∈ x) Ord(k)] → ∃(l ∈ x)∀(k ∈ x)[k ≤ l].
To prove ` ∀xα(x), we use HF3, i.e., we verify:
1◦ ` α(0) [obvious].
2◦ ` [α(x) → α(x / y)].
46 S. Świerczkowski
To show the latter, assume α(x), i.e., the existence of max(x), provided x is a non-empty
set of ordinals. We should deduce α(x / y). Thus assume that x / y is a set of ordinals
(evidently non-empty). Then x is a set of ordinals, so if x 6= 0, then max(x) exists by
the assumption α(x). Since x / y arises from x by adjoining one element, it is clear that
max(x / y) also exists. If, on the other hand, x = 0, then x / y = {y} and y = max(x / y).
To deduce the existence of min(x), replace in the above proof ≤ by ≥ and “max” by
“min” throughout.
Theorem 2.8. Ord(k) ∧ k 6= 0 → ∃!l(l + = k).
Proof. By 2.1, k is a non-empty set of ordinals. Let l = max(k) (see 2.7). Then l ∈ k,
whence l+ = k ∨ l+ ∈ k, by 2.5. But l+ ∈ k contradicts l = max(k). Thus l + = k. The
uniqueness of l follows from 2.5(d).
Definition 2.9. The unique (if existing) l for which l + = k will be denoted by k and
called the predecessor of k.
3. p-functions
A term Fϕy (τ1 , . . . , τn ), where τ1 , . . . , τn are any terms of HF, will be called a simple
p-term. In general, terms belonging to the language of HF expanded by Fϕy will be
called p-terms. Such p-terms do not represent anything in HF, however formulas contain-
ing p-terms (i.e., p-formulas) will, as mentioned above, represent formulas of HF. This
representation is not unique: a p-formula represents many, logically equivalent, formulas
of HF. To describe this representation, we introduce a process of reduction by which a
p-formula is reduced to one containing one less simple p-term. In detail, suppose the free
variables in ϕ, other than y, are xi1 , . . . , xin (say, in the increasing order of indices). Then
we shall write ϕ as ϕ(xi1 , . . . , xin , y). Let α be a p-formula. Clearly, α contains a simple
p-term and the latter occurs in an atomic sub-formula of α. In other words, there exists
an atomic formula β(z) and there are HF-terms τ1 , . . . , τn such that β(Fϕy (τ1 , . . . , τn )) is a
sub-formula of α. The reduction which we now apply to α consists in replacing, inside α,
the atomic sub-formula β(Fϕy (τ1 , . . . , τn )) by
∃w[β(w) ∧ ϕ(τ
e 1 , . . . , τn , w)],
where w is a new variable, not occurring in α (thus neither in any of the terms τ1 , . . . , τn )
and ϕ
e is any variant of ϕ such that τ1 , . . . , τn , w are substitutable for xi1 , . . . , xin , y in ϕ.
e
[A variant ϕe of ϕ is obtained by renaming the bound variables of ϕ so that ϕ ↔ ϕ e is a
theorem of logic.]
As a result of such a reduction the number of occurrences of Fϕy in a formula is reduced
by one. Thus, when no further reduction is possible, we end up with a formula of HF and
we shall say that the latter is represented by α (or, that α represents the latter). Let us
agree that if α represents γ then α also represents every formula equivalent to γ.
For example, the p-formula Fϕy (xi1 , . . . , xin ) = u represents
∃w[w = u ∧ ϕ(xi1 , . . . , xin , w)],
and the latter, in view of ` ∃!yϕ(xi1 , . . . , xin , y), is equivalent to ϕ(xi1 , . . . , xin , u).
Often instead of Fϕy we shall use only one letter (usually capital); then ϕ and y will be
specified beforehand. By abuse of language, we shall call a p-function symbol a p-function.
We shall use several p-functions, each of these introduced by a formula functional
with respect to a variable (3.1). A formula made up of symbols of HF and several distinct
p-function symbols represents a formula of HF. We obtain the latter by consecutive
reductions eliminating simple p-terms as described above.
Occasionally it may be convenient to consider a p-formula ψ(x, y) functional with
respect to y. However the p-function Fψy defined by such a formula need not be regarded
as a new sort of entity, as we can identify it with the p-function Fϕy , where ϕ is an
HF-formula represented by ψ.
Examples of p-functions abound; e.g., it is clear that the following formulas are func-
tional with respect to y:
S
(a) y = x.
T
(b) y = x.
(c) y = P (x) (i.e., ∀u(u ∈ y ↔ u ⊆ x)).
Other examples are p-functions defined by recursion. We begin with infinite sequences,
48 S. Świerczkowski
defined by recursion on ordinals. To define these, we use finite sequences, called here
simply sequences.
Definition 3.2 (Sequence). A function whose domain is an ordinal k 6= 0 will be called
a sequence of length k. We denote by Seq(s, k) any formula stating that s is a sequence of
length k (e.g., the formula introduced in Part 1, 2.3(b)). If Seq(s, k) then, for any formula
α(z) and n < k, we shall write α(sn ) to abbreviate ∃z[hn, zi ∈ s ∧ α(z)].
Theorem 3.3. For every constant term τ and p-functions G, H there exists a p-function
F such that
(a) F (0) = τ .
(b) x 6= 0 ∧ Ord(x) → F (x) = G(F (x)).
(c) ¬ Ord(x) → F (x) = H(x).
Proof. Let ψ(x, y) be the p-formula
{x 6= 0 ∧ Ord(x) ∧ ∃s[Seq(s, x) ∧ y = G(sx ) ∧ ∀(n < x)[(n = 0 ∧ sn = τ ) ∨
(n 6= 0 ∧ sn = G(sn ))]]} ∨ [¬ Ord(x) ∧ H(x) = y] ∨ [x = 0 ∧ y = τ ].
Then it is easy to prove that ψ is functional with respect to y. (The existence of a least
ordinal x such that there is no unique y for which ψ(x, y) leads easily to a contradiction.)
Next, we check that (a), (b), (c) hold when F is Fψy .
Definition 3.4. We shall say that the p-function F in 3.3 is defined recursively on ordi-
nals.
Definition 4.1. Let R(x) denote the p-function defined recursively on ordinals by
(a) R(0) = 0.
(b) R(x) = P (R(x)) for very non-zero ordinal x.
(c) R(x) = 0 if x is not an ordinal.
Theorem 4.2. For all ordinals m, n:
(a) n < m → R(n) ⊂ R(m).
(b) R(m) is transitive.
Proof. (a) Obviously R(0) = 0 ⊂ R(m) for 0 < m. Now, if there is some n such that
n < m, for some m, but R(n) ⊂ R(m) fails, then by 2.7 there is the least n with
that property. By the above, n > 0, whence n < m. Moreover, R(n) ⊂ R(m). Thus
R(n) ⊂ R(m), by the definition of R, and we get a contradiction.
(b) Clearly R(0) = 0 is transitive. If m 6= 0, then m < m, whence
x ∈ R(m) → x ⊆ R(m) ⊂ R(m),
by (b) in 4.1 and (a) above.
Theorem 4.3. ∃n[Ord(n) ∧ x ∈ R(n)] (i.e., the sets R(n) cover the universe).
Finite sets and Gödel’s incompleteness theorems 49
Proof. Let α(x) be the above formula. We apply HF3 to prove ∀xα(x). We have
1◦ α(0) [because 0 ∈ R(0+ ) = P (R(0)) = P (0) = {0}].
2◦ ` [α(x) ∧ α(y) → α(x / y)].
To prove 2◦ , assume α(x) ∧ α(y), i.e., x ∈ R(n), y ∈ R(m) for some ordinals m, n. Then
x / y = x ∪ {y} ⊆ R(n) ∪ R(m+ )
by 4.1(b) and 4.2(b). So, for k = max{m+ , n}, x / y ⊆ R(k) (see 4.2(a)), whence
x / y ∈ R(k + ).
Definition 4.4. For every x, let rank(x) be the least ordinal n such that x ∈ R(n+ )
(i.e., x ⊆ R(n)).
Theorem 4.5. x ∈ y → rank(x) < rank(y).
Proof. Let x ∈ y and suppose rank(y) = n. Then y ⊆ R(n) and hence x ∈ R(n). Thus
n 6= 0 and x ⊆ R(n), whence rank(x) ≤ n < n.
Definition 4.6. The transitive closure cl(x) of x is the minimal transitive set y such
that x ⊆ y.
Concerning the existence of cl(x), note that x ⊆ R(n) for some ordinal n (see 4.3),
and R(n) is transitive (see 2.2). Thus cl(x) is the intersection of all transitive subsets
of R(n) which contain x as a subset.
Theorem 4.7. rank(x) = rank(cl(x)).
Proof. One checks without problems that:
(a) u ⊆ v → rank(u) ≤ rank(v).
(b) rank(u ∪ v) = max{rank(u), rank(v)} (see 4.2(a)).
(c) rank({w}) = (rank(w))+ .
Now let α(x) be the formula rank(x) = rank(cl(x)). We apply HF3 to prove ∀xα(x). We
have
1◦ ` α(0) [because cl(0) = 0].
2◦ ` [α(x) ∧ α(y) → α(x / y)].
To prove 2◦ , assume α(x) ∧ α(y), i.e., that for some m, n, rank(x) = rank(cl(x)) = n and
rank(y) = rank(cl(y)) = m. Then both sets on the two sides of the inclusion
x / y = x ∪ {y} ⊆ cl(x) ∪ cl(y) ∪ {y}
have the same rank, namely max{n, m+ } (see (b), (c) above). Since the set on the right
is transitive,
x / y ⊆ cl(x / y) ⊆ cl(x) ∪ cl(y) ∪ {y}.
Thus x / y and cl(x / y) have the same rank (see (a)).
The name of the following theorem is justified by the fact that rank(u) < rank(x) for
all u ∈ cl(x), by 4.5 and 4.7. To state it, we need the following definition.
Definition 4.8 (Restriction). If f is a function and z ⊆ dom(f ), then the function
{y ∈ f : ∃(u ∈ z)∃v(y = hu, vi)} will be denoted by f z and called the restriction of f
50 S. Świerczkowski
(see 4.5, 4.7). Clearly cl(x0 ) is covered by the domains cl(u) / u of all fu , u ∈ cl(x0 ). So,
by (b), the union
[
f = {fu : u ∈ cl(x0 )}
is a good function with domain cl(x0 ). But then the set
f / hx0 , G(x0 , f )i
is evidently a good function with domain cl(x0 ) / x0 , contrary to the choice of x0 . The
contradiction shows that a good function with domain cl(x) / x always exists. This
finishes the proof of (c), and hence of Theorem 4.9.
Certain p-functions of n variables can be defined in a similar way. Let us illustrate
this for the case n = 2.
Theorem 4.10. For every ternary p-function G(x, y, z) there exists a binary p-function
F (x, y) such that
` F (x, y) = G(x, y, F (cl(x) × cl(y))).
Proof. To show this, we proceed by analogy with the preceding proof. For transitive sets
d1 , d2 , we say that a function f with domain d1 × d2 is good if
f (u, v) = G(u, v, f (cl(u) × cl(v)))
for all hu, vi ∈ d1 × d2 . One proves that for every x, y there is exactly one good function
with domain (cl(x) / x) × (cl(y) / y), and when this function is denoted by fxy , one
defines F (x, y) = fxy (x, y). In the part of the proof corresponding to (c) above, one
can start again from negating the claim which leads to the existence of an x0 of least
rank for which there is a y0 such that there does not exist a good function with domain
(cl(x0 ) / x0 )×(cl(y0 ) / y0 ). Since also y0 can then be chosen of least rank, a contradiction
results similarly to the proof of (c) above.
In the ZF set theory there does not seem to exist a formula that defines a relation of
global linear order. For HF, such definable order exists, moreover it is a refinement of
the partial order by rank and also of the partial order by subsets. The existence of a
definable order for HF can be deduced from the definitional equivalence of HF and PA,
but we choose here a shorter path: The order will be defined recursively by a suitably
chosen formula (the latter can be found in [My]).
In contrast to the ZF set theory, in HF every ordering of a set x is a well-ordering
(in spite of the fact that in non-standard models of HF certain sets are infinite).
Theorem 5.1. Every ordering relation r ⊆ x × x is a well-ordering.
Proof. Let α(x) be the formula saying: “if r ⊆ x × x orders x, then r well-orders” x. To
prove ` ∀xα(x), we use HF3. We have
1◦ ` α(0) [obvious].
2◦ ` α(x) → α(x / y).
52 S. Świerczkowski
Theorem 5.7. For every order relation r on a set z, {r}z is an order relation on P (z).
Proof. For any subset u of z, let {r}u denote the restriction of r to u, i.e., r ∩ (u × u).
If z = 0 then there is nothing to prove. So let us assume that z 6= 0 and let us denote
maxr (z) by w (see 4.2). We first show that if {r}z\{w} is an order on P (z \ {w}), then
{r}z is an order on P (z). For this purpose, we first note that for every x, y ⊆ z:
(1) w ∈ x ∩ y → [x{r}z y ↔ (x \ {w}){r}z\{w} (y \ {w})].
(2) w ∈
6 x ∪ y → [x{r}z y ↔ x{r}z\{w} y].
(3) w ∈ y \ x → x{r}z y.
Let P (z) be partitioned in two sets: those subsets of z which contain w and those which
do not. Then, assuming that {r}z\{w} is an order on P (z \ {w}), we see from (1) and (2)
that each of the two sets of subsets of z is ordered by {r}z , and it follows from (3) that
each element of the second set (of those subsets of z which do not contain w) r-precedes
each element of the first set. Thus {r}z is an order on P (z).
To complete the proof, let u0 = minr (z) and for each v ∈ z, let [u0 , v] ⊆ z be the closed
interval (for the order r) with end-points u0 , v. One now shows by induction with respect
to the well-ordering r (see 4.1) that for each v ∈ z, {r}[u0 ,v] is an order on P ([u0 , v]).
This is evident for v = u0 (because of [u0 , u0 ] = {u0 }, P ({u0 }) = {0, {u0 }} and 5.6). The
inductive step is accomplished by applying to [u0 , v] the same observations (1), (2), (3)
as were made for z (but with w = v in this case).
Theorem 5.8. Let < be the p-relation introduced by Theorem 5.4, let n be a non-zero
ordinal and suppose that:
(a) R(n) is ordered by <, and
(b) R(n) < (R(n) \ R(n)), i.e., ∀u∀v[u ∈ R(n) ∧ v ∈ (R(n) \ R(n)) → u < v].
Then:
(a+ ) R(n+ ) is ordered by <, and
(b+ ) R(n) < (R(n+ ) \ R(n)).
Proof. Assume (a), (b) and n 6= 0. By definition, R(n+ ) = P (R(n)) and on R(n+ )
the relation < coincides with {<}R(n) (see 5.5). Hence (a+ ) holds, by 5.7. Now let y ∈
R(n+ ) \ R(n) and let x ∈ R(n), i.e., x ⊆ R(n). Then y ⊆ R(n) and y * R(n), so there is
a v ∈ y such that v ∈ R(n) \ R(n). By (b), this v satisfies u < v for all u ∈ R(n), hence
also for all u ∈ x ⊆ R(n). Thus x < y, by 5.4, and (b+ ) is shown.
Theorem 5.9. < is a relation of order and < refines the partial order by subsets and the
partial order by rank.
Proof. It will suffice to prove that:
(1) x 6< x ∧ (x < y < z → x < z) ∧ (x < y ∨ y < x ∨ x = y).
(2) rank(x) < rank(y) → x < y.
To begin with, note that (a) and (b) in 5.8 hold when n = 0, i.e., when R(n) = 0
and R(n) = {0}. So it follows from that theorem that (a) and (b) hold for all non-zero
ordinals n. In particular, taking n large enough so that x, y, z ∈ R(n) (see 4.3), we get (1).
54 S. Świerczkowski
Next, assume that rank(x) < rank(y) and let rank(x) = m, rank(y) = n. Then
m < n → m+ ≤ n → R(m+ ) ⊆ R(n).
Since x ∈ R(m+ ) ⊆ R(n) and y ∈ R(n+ ) \ R(n), x < y follows from R(n) < R(n+ ) \ R(n)
(which is property (b+ ) in 5.8).
Corollary 5.10.
(a) Every set x 6= 0 has an <-largest and an <-smallest element.
(b) x ∈ y → x < y.
Proof. (a) follows from 5.2, and (b) from 4.5 and property (2) in 5.9.
Theorem 5.11 (Induction scheme for inequality). For any formula ϕ(x),
∀x[∀(y < x)ϕ(y) → ϕ(x))] → ∀xϕ(x).
Proof. Suppose the premise ∀x[. . .] of the above implication holds and ¬ϕ(x0 ) is possible,
where x0 ∈ R(n+ +
0 ). Then the set z0 = {u ∈ R(n0 ) : ¬ϕ(u)} is non-empty and min< (z)
exists (see 5.2). We can assume that min< (z) = x0 , whence x0 ∈ R(n+ 0 ) and rank(x0 )
≤ n0 . Now, for any y,
y < x0 → rank(y) ≤ rank(x0 ) ≤ n0 ,
by 5.9(2), whence y < x0 → y ∈ R(n+ 0 ). By the definition of z and x0 , it now follows
that y < x0 → ϕ(y), i.e., ∀(y < x0 )ϕ(y). Thus ϕ(x0 ), by the assumed premise ∀x[. . .],
contrary to the choice of x0 .
Theorem 5.12. ∃z∀y(y < x ↔ y ∈ z).
Proof. We see from (2) in 5.9 that y < x → rank(y) ≤ rank(x). Thus
` y < x → y ∈ R((rank(x))+ ),
by the definition (4.4) of rank, and 4.2(a). By the Comprehension Scheme 1.7, the required
z may now be taken as
z = {y ∈ R((rank(x))+ ) : y < x}.
Definition 5.13. For every x, we shall denote by [0, x) the unique z that satisfies
∀y(y ∈ z ↔ y < x).
Theorem 5.14 (Definition of a p-function by recursion on inequality). For every binary
p-function G(x, y) there exists a p-function F such that
` F (x) = G(x, F [0, x)).
We omit the straightforward proof (similar to that of 4.9).
When viewed as an object of the ZF set theory, the standard model of HF is the
structure S = hS, 0, ∈, /i, where S is the smallest (infinite) set that contains the empty
set 0 and is closed under the operation / of adding one element (i.e., S has the property
that for any x, y in S, also x ∪ {y} is in S). However, it does not seem appropriate to
involve the more powerful ZF theory in order to describe a model of HF, so we rather
choose for a construction using a part of the meta-theory of HF and only as much of ZF
(informally) as is needed. To begin with, we establish some properties of constant terms.
Definition 6.1. We denote by C the class of constant terms, i.e., C is the smallest class
of terms such that 0 ∈ C and σ / τ ∈ C for any σ, τ ∈ C. For every τ ∈ C the number of
appearances of / in τ will be called the length of τ and denoted by l(τ ).
Lemma 6.2. For every 0 6= τ ∈ C there are finitely many τ1 , . . . , τm ∈ C, all shorter
than τ , such that
` x ∈ τ ↔ x = τ 1 ∨ . . . ∨ x = τm .
Proof. The shortest τ 6= 0 is 0 / 0. We have the theorem
` x ∈ 0 / 0 ↔ x ∈ 0 ∨ x = 0 ↔ x = 0,
i.e., m = 1 and τ1 is 0. Now assume that the lemma has been established for all non-zero
constant terms shorter than τ , and let τ be σ / µ. Then σ is shorter than τ , hence there
are σ1 , . . . , σn ∈ C such that
` x ∈ σ ↔ x = σ1 ∨ . . . ∨ x = σn .
Since ` x ∈ τ ↔ x ∈ σ ∨ x = µ, we get
` x ∈ τ ↔ x = σ1 ∨ . . . ∨ x = σn ∨ x = µ.
By inductive assumption, each σi is shorter than σ, thus also shorter than τ . Also µ is
shorter than τ .
Lemma 6.3. Let σ, τ ∈ C be such that non ` σ = τ (i.e., σ = τ is not a theorem of HF).
Then, for some ν ∈ C, one of the two possibilities occurs:
(A) ` ν ∈ σ and ` ν ∈6 τ.
(B) ` ν 6∈ σ and ` ν ∈ τ .
Proof. To begin with, let us establish (A) and (B) when one of the terms σ, τ is 0. Suppose
σ is 0, and assume that σ = τ is not a theorem. Then obviously τ is not the term 0, so τ
is of the form µ / ν and we have ` ν ∈ τ . Clearly ` ν 6∈ 0. Thus (B) follows. Analogously,
if τ is 0 and σ = τ is not a theorem, we get (A).
We now prove the lemma by induction on the length l(σ). The case l(σ) = 0 has been
dealt with already. Suppose that l(σ) > 0 and that the lemma has been shown in every
case (i.e., for every τ ) when σ is replaced by a term shorter than σ. Suppose further that
σ = τ is not a theorem. We should show that then, for every τ , there is a ν such that
either (A) or (B). This was already shown when τ is 0. So it remains to deal with the
situation when neither σ nor τ is 0. Then, by 6.2, there are σ1 , . . . , σn ∈ C, all shorter
than σ, and τ1 , . . . , τm ∈ C, all shorter than τ , such that
56 S. Świerczkowski
(1) ` x ∈ σ ↔ x = σ1 ∨ . . . ∨ x = σn .
(2) ` x ∈ τ ↔ x = τ1 ∨ . . . ∨ x = τm .
Suppose for a moment that for every σi there is a τj such that ` σi = τj , and also for
every τj there is a σi such that ` σi = τj . It is clear that then (1) and (2) above imply
` (x ∈ σ ↔ x ∈ τ ), i.e., ` σ = τ . As this contradicts our assumption, at least one of the
two possibilities must occur.
(a) There is a σi such that non ` σi = τj for all τj . Since σi is shorter than σ, we
conclude by the inductive assumption that either (A) or (B) holds for σi (in place of σ)
and every τj (in place of τ ), for suitable νj . Hence ` σi 6= τj for all j. We now replace x
by σi in
` x 6∈ τ ↔ x 6= τ1 ∧ . . . ∧ x 6= τm
(see (2)), concluding ` σi 6∈ τ . On the other hand, ` σi ∈ σ, by (1). So (A) holds with σi
in place of ν.
(b) There is a τj such that non ` σi = τj for all σi . This case is quite analogous to
the previous; so, proceeding as in (a), we show that (B) holds with τj in place of ν.
Corollary 6.4. For every σ, τ ∈ C, non ` σ = τ implies ` σ 6= τ .
Corollary 6.5. If σ, τ ∈ C are such that ` ν ∈ σ ⇔ ` ν ∈ τ for every ν ∈ C, then
` σ = τ.
Definition 6.6. To define the structure S = hS, 0, ∈, /i we introduce the following:
(a) An equivalence relation ≡ on C given by
σ ≡ τ ⇔ ` σ = τ.
(b) For every τ ∈ C, the ≡-equivalence class [τ ] containing τ .
(c) The set S of all equivalence classes [τ ], i.e., S = {[τ ] : τ ∈ C}.
(d) The binary relation ∈ on S defined by
[τ ] ∈ [µ] ⇔ ` τ ∈ µ.
(e) The binary operation / on S defined by
[τ ] / [µ] = [τ / µ].
(f) The abbreviation 0 for [0].
It is easy to see that this structure is well defined, i.e., [τ ] ∈ [µ] and [τ ] / [µ] depend
only on the equivalence classes [τ ], [µ].
Proposition 6.7. If HF is consistent, then S is a model of HF.
Proof. Let us check that HF1, HF2 and HF3 (see Section 1) hold in S.
HF1. Suppose [σ] ∈ S is such that S [τ ] 6∈ [σ] for every τ . Then σ cannot be of the
form µ / ν, because [ν] ∈ [µ / ν] holds in S (see 6.6(d)). Thus σ is 0. Conversely, we have
S [τ ] 6∈ [0] for all τ , because S [τ ] ∈ [0] is equivalent to ` σ ∈ 0 and this contradicts
Theorem 1.1 and the assumption of consistency.
Finite sets and Gödel’s incompleteness theorems 57
HF2. By 6.5, the Extensionality Property 1.2 holds for S. Thus HF2 will be true
in S if we show that for every τ, µ, ν ∈ C,
` τ ∈ µ / ν ⇔ (` τ ∈ µ or ` τ = ν).
To verify ⇒, assume ` τ ∈ µ / ν. If ` τ = ν, then we are done. If non ` τ = ν, then
by 6.4, ` τ 6= ν. So, in this case,
` τ ∈ µ / ν ⇒ ` (τ ∈ µ ∨ τ = ν) ∧ (τ 6= ν) ⇒ ` τ ∈ µ.
The verification of ⇐ is immediate:
(` τ ∈ µ or ` τ = ν) ⇒ ` (τ ∈ µ ∨ τ = ν) ⇒ ` τ ∈ µ / ν.
HF3. For each equivalence class a ∈ S, let h(a), the height of a, be the smallest l(τ )
(see 6.1) such that τ ∈ a. It is easy to see that if a 6= 0, then there are b, c ∈ S such that
S (a = b / c) and h(b), h(c) < h(a). Given a formula α(x) with one free variable x, let
Aα ⊆ S be the truth set of α, i.e.,
Aα = {a ∈ S : S α(a)}.
Assuming that 0 ∈ Aα and moreover that a, b ∈ Aα implies a / b ∈ Aα for all a, b ∈ S,
we deduce by induction on the height h(c) that c ∈ Aα for every c ∈ S. Thus Aα = S,
and HF3 is verified.
References
[A] W. Ackermann, Die Wiederspruchsfreiheit der allgemeinen Mengenlehre, Math. Ann.
114 (1937), 305–315.
[A-B] Z. Adamowicz and T. Bigorajska, Existentially closed structures and Gödel’s second
incompleteness theorem, J. Symbolic Logic 66 (2001), 349–356.
[Be] E. W. Beth, Axiomatique de la théorie des ensembles sans axiome de l’infini , Bull. Soc.
Math. Belg. 16 (1964), 127–136.
[Bo] G. Boolos, The Logic of Provability, Cambridge Univ. Press, Cambridge, 1993.
[F] S. Feferman, Arithmetization of metamathematics in a general setting, Fund. Math. 49
(1960), 35–92.
[GT] S. Givant and A. Tarski, Peano arithmetic and the Zermelo-like theory of sets with
finite ranks, Notices Amer. Math. Soc. 24 (1977), A-437.
[H-B] D. Hilbert and P. Bernays, Grundlagen der Mathematik , Vol. 2, Springer, Berlin, 1939.
[J] T. Jech, On Gödel’s second incompleteness theorem, Proc. Amer. Math. Soc. 121
(1994), 311–313.
[Ki] M. Kikuchi, Kolmogorov complexity and the second incompleteness theorem, Arch.
Math. Logic 36 (1997), 437–443.
[Kl] S. C. Kleene, Introduction to Metamathematics, Van Nostrand, 1952; Russian transla-
tion by A. S. Yesenin-Vol’pin, Izdat. Inostr. Liter., Moscow, 1957.
[Ko] H. Kotlarski, Other proofs of old results, Math. Logic Quart. 44 (1998), 474–480.
[L] P. Lindström, Aspects of Incompleteness, Lecture Notes in Logic 10, Springer, Berlin,
1997.
[Mo] A. Mostowski, Sentences Undecidable in Formalized Arithmetic, North-Holland, Ams-
terdam, 1964.
[My] J. Mycielski, The definition of arithmetic operations in the Ackermann model , Algebra
i Logika Sem. 3 (1964), no. 5–6, 64–65 (in Russian).
58 S. Świerczkowski