Analytic Number Theory
Math 531 Lecture Notes, Fall 2005 Version 2006.09.01
List of Tables
1.1 Some important multiplicative functions
1.2 Some other important arithmetic functions
5.1 The error term in the Prime Number Theorem, I
5.2 The error term in the Prime Number Theorem, II: Elementary proofs
6.1 Table of all Dirichlet characters modulo 15. The integers $a$ in
the second row are the values of $2^{\mu_1} 11^{\mu_2}$ modulo 15.
Notation
$\mathbb{R}$    the set of real numbers
$\mathbb{C}$    the set of complex numbers
$\mathbb{Z}$    the set of integers
$\mathbb{N}$    the set of positive integers ("natural numbers")
$\mathbb{N}_0$    the set of nonnegative integers (i.e., $\mathbb{N} \cup \{0\}$)
$[x]$    the greatest integer $\le x$ (floor function)
$\{x\}$    the fractional part of $x$, i.e., $x - [x]$
$d, n, m, \dots$    integers (usually positive)
$p, p_i, q, q_i, \dots$    primes
$p^m, p^{\alpha}, \dots$    prime powers
$d \mid n$    $d$ divides $n$
$d \nmid n$    $d$ does not divide $n$
$p^m \| n$    $p^m$ divides exactly $n$ (i.e., $p^m \mid n$ and $p^{m+1} \nmid n$)
$(a, b)$    the greatest common divisor (gcd) of $a$ and $b$
$[a, b]$    the least common multiple (lcm) of $a$ and $b$
$n = \prod_{i=1}^{k} p_i^{\alpha_i}$    canonical prime factorization of an integer $n > 1$
    (with distinct primes $p_i$ and exponents $\alpha_i \ge 1$)
$n = \prod_{p} p^{\alpha(p)}$    the prime factorization of $n$ in a different notation,
    with $p$ running through all primes and exponents $\alpha(p) \ge 0$
$n = \prod_{p^m \| n} p^m$    yet another way of writing the canonical prime
    factorization of $n$
$\sum_{p \le x}$    summation over all primes $\le x$
$\sum_{p^m}$    summation over all prime powers $p^m$ with $p$ prime and $m$ a
    positive integer
$\sum_{d \mid n}$    summation over all positive divisors of $n$ (including the
    trivial divisors $d = 1$ and $d = n$)
$\sum_{d^2 \mid n}$    summation over all positive integers $d$ for which $d^2$
    divides $n$
$\sum_{p^m \| n}$    summation over all prime powers that divide exactly $n$
    (i.e., if $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ is the standard prime
    factorization of $n$, then $\sum_{p^m \| n} f(p^m)$ is the same as
    $\sum_{i=1}^{k} f(p_i^{\alpha_i})$)
$\sum_{p \mid n}$    summation over all (distinct) primes dividing $n$
Convention for empty sums and products: An empty sum (i.e., one in
which the summation condition is never satisfied) is defined as 0; an empty
product is defined as 1. Thus, for example, the relation
$n = \prod_{p^m \| n} p^m$ remains valid for $n = 1$ since the right-hand
side is an empty product in this case.
Chapter 0
Primes and the Fundamental
Theorem of Arithmetic
Primes constitute the holy grail of analytic number theory, and many of
the famous theorems and problems in number theory are statements about
primes. Analytic number theory provides some powerful tools to study prime
numbers, and most of our current (still rather limited) knowledge of primes
has been obtained using these tools.
In this chapter, we give a precise definition of the concept of a prime,
and we state the Fundamental Theorem of Arithmetic, which says that every
integer greater than 1 has a unique (up to order) representation as a product
of primes. We conclude the chapter by proving the infinitude of primes.
The material presented in this chapter belongs to elementary (rather
than analytic) number theory, but we include it here in order to make the
course as self-contained as possible.
0.1 Divisibility and primes
In order to define the concept of a prime, we first need to define the notion
of divisibility.
Given two integers $d \ne 0$ and $n$, we say that $d$ divides $n$, or $n$ is
divisible by $d$, if there exists an integer $m$ such that $n = dm$. We write
$d \mid n$ if $d$ divides $n$, and $d \nmid n$ if $d$ does not divide $n$.
Note that divisibility by 0 is not defined, but the integer $n$ in the above
definition may be 0 (in which case $n$ is divisible by any non-zero integer $d$)
or negative (in which case $d \mid n$ is equivalent to $d \mid (-n)$).
While the above definition allows for the number $d$ in the relation $d \mid n$
to be negative, it is clear that $d \mid n$ if and only if $(-d) \mid n$, so
there is a one-to-one correspondence between the positive and negative
divisors of an integer $n$. In particular, no information is lost by focusing
on the positive divisors of a given integer, and it will be convenient to
restrict the notion of a divisor to that of a positive divisor. We therefore
make the following convention:
Unless otherwise specified, by a divisor of an integer we mean a positive
divisor, and in a notation like $d \mid n$ the variable $d$ represents a
positive divisor of $n$. This convention allows us, for example, to write the
sum-of-divisors function $\sigma(n)$ (defined as the sum of all positive
divisors of $n$) simply as $\sigma(n) = \sum_{d \mid n} d$, without having to
add the extra condition $d > 0$ under the summation symbol.
The greatest common divisor (gcd) of two integers $a$ and $b$ that are
not both zero is the unique integer $d > 0$ satisfying (i) $d \mid a$ and
$d \mid b$, and (ii) if $c \mid a$ and $c \mid b$, then $c \mid d$. The gcd of
$a$ and $b$ is denoted by $(a, b)$. If $(a, b) = 1$, then $a$ and $b$ are said
to be relatively prime or coprime.
The least common multiple (lcm) of two non-zero integers $a$ and $b$
is the unique integer $m > 0$ satisfying (i) $a \mid m$ and $b \mid m$, and
(ii) if $a \mid n$ and $b \mid n$, then $m \mid n$. The lcm of $a$ and $b$ is
denoted by $[a, b]$.
The gcd and the lcm of more than two integers are defined in an analogous
manner.
An integer n > 1 is called prime (or a prime number) if its only
positive divisors are the trivial ones, namely 1 and n.
The sequence of primes, according to this (commonly accepted) definition,
is thus 2, 3, 5, 7, 11, . . . Note, in particular, that 1 is not a prime, nor is
0 or any negative integer.
Primes in other algebraic structures. The notion of a prime can
be defined in quite general algebraic structures. All that is needed for such
a definition to make sense is an analog of the multiplication operation (so
that divisibility can be defined), and the notion of units (which serve as
trivial divisors, analogous to the numbers $\pm 1$ among the integers). One
can then define a prime as any element in the given structure that can only
be factored in a trivial way, in the sense that one of the factors is a unit. The
best-known examples of such structures are algebraic integers, which behave
in many respects like the ordinary integers, and which form the subject of
a separate branch of number theory, algebraic number theory.
Another example is given by the ring of polynomials with integer
coefficients, with multiplication of ordinary polynomials as ring operation and
the constant polynomials $\pm 1$ as units. The primes in such a polynomial
ring turn out to be the irreducible (over $\mathbb{Z}$) polynomials.
0.2 The Fundamental Theorem of Arithmetic
As the name suggests, this result, which we now state, is of fundamental
importance in number theory, and many of the results in later chapters
depend in a crucial way on this theorem and would fail if the theorem were
false.
Theorem 0.1 (Fundamental Theorem of Arithmetic). Every integer
n > 1 can be written as a product of primes, and the representation is unique
up to the order of the factors.
The proof of this result, while elementary, is somewhat involved, and we
will not give it here. (It can be found in any text on elementary number
theory.) We only note here that the crux of the proof lies in showing the
uniqueness of a prime factorization; the proof of the existence of such a
factorization is an easy exercise in induction.
Notation. There are several common ways to denote the prime factorization
guaranteed by the Fundamental Theorem of Arithmetic. First, we can
write the prime factorization of an integer $n \ge 2$ as
$$n = p_1 \cdots p_r,$$
where the $p_i$'s are primes, but not necessarily distinct.
In most situations it is more useful to combine identical factors in the
above representation and write
$$n = p_1^{\alpha_1} \cdots p_r^{\alpha_r},$$
where, this time, the $p_i$'s are distinct primes, and the exponents $\alpha_i$
are positive integers.
Using the notation $p^m \| n$ if $p^m$ is the exact power of $p$ that divides $n$
(i.e., $p^m \mid n$, but $p^{m+1} \nmid n$), we can write the above
representation as
$$n = \prod_{p^m \| n} p^m.$$
Yet another useful representation of the prime factorization of $n$ is
$$n = \prod_{p} p^{\alpha(p)},$$
where the product is extended over all prime numbers and the exponents
$\alpha(p)$ are nonnegative integers with $\alpha(p) \ne 0$ for at most finitely
many $p$.
The last notation is particularly convenient when working with the greatest
common divisor or the least common multiple, since these concepts have
a simple description in terms of this notation: Indeed, if $n$ and $m$ are
positive integers with prime factorizations $n = \prod_p p^{\alpha(p)}$ and
$m = \prod_p p^{\beta(p)}$, then the gcd and lcm of $n$ and $m$ are given by
$$(n, m) = \prod_p p^{\min(\alpha(p), \beta(p))}, \qquad [n, m] = \prod_p p^{\max(\alpha(p), \beta(p))},$$
respectively. Similarly, divisibility is easily characterized in terms of the
exponents arising in the representation: Given $n = \prod_p p^{\alpha(p)}$ and
$m = \prod_p p^{\beta(p)}$, we have $m \mid n$ if and only if
$\beta(p) \le \alpha(p)$ for all $p$.
With the convention that an empty product is to be interpreted as 1, all
of the above formulas remain valid when $n = 1$.
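The exponent description of gcd, lcm, and divisibility can be tried out numerically. The following is a minimal sketch in Python, assuming trial division is good enough for small inputs; the function names (`factorize`, `gcd_by_exponents`, and so on) are illustrative and not part of the notes.

```python
from collections import Counter
from math import gcd

def factorize(n):
    """Return the exponents alpha(p) of n >= 1 as a Counter {p: alpha(p)}."""
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:                      # leftover factor is a prime
        exponents[n] += 1
    return exponents

def from_exponents(exponents):
    """Multiply out prod p^alpha(p); the empty product is 1."""
    result = 1
    for p, a in exponents.items():
        result *= p ** a
    return result

def gcd_by_exponents(n, m):
    a, b = factorize(n), factorize(m)
    # A Counter returns 0 for primes that do not occur.
    return from_exponents({p: min(a[p], b[p]) for p in set(a) | set(b)})

def lcm_by_exponents(n, m):
    a, b = factorize(n), factorize(m)
    return from_exponents({p: max(a[p], b[p]) for p in set(a) | set(b)})

def divides(m, n):
    """m | n if and only if beta(p) <= alpha(p) for all p."""
    a, b = factorize(n), factorize(m)
    return all(b[p] <= a[p] for p in b)

# Cross-check against the standard library gcd.
print(gcd_by_exponents(360, 84), gcd(360, 84))
```

Since the empty product is 1, the same functions handle $n = 1$ or $m = 1$ without a special case.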
Unique factorization in more general algebraic structures. As
mentioned above, the concept of a prime can be defined in very general
algebraic structures. One can then ask if an analog of the Fundamental
Theorem of Arithmetic also holds in these structures. It turns out that the
existence part of this result, i.e., the assertion that every (non-unit) element
in the given structure has a representation as a product of prime elements,
remains valid under very general conditions. By contrast, the uniqueness
of such a representation (up to the order of the factors or multiplication by
units) is no longer guaranteed and can fail, even in some simple examples.
For instance, in the ring of algebraic integers
$\{n + m\sqrt{6}\,i : m, n \in \mathbb{Z}\}$, the number 10 can be factored as
$10 = 2 \cdot 5$ and $10 = (2 + i\sqrt{6})(2 - i\sqrt{6})$
[...]
(*) $\prod_{i \in I} p_i^{\alpha_i}$,
where $p_1 < p_2 < \cdots$ is the sequence of primes, $I$ a finite (possibly
empty) subset of the positive integers, and the exponents $\alpha_i$ are
positive integers. This characterization of the positive integers suggests
the following generalization of the concepts of a prime and a (positive)
integer, which was first proposed some 50 years ago by Arne Beurling.
Instead of starting with an appropriate analog of the integers and then
trying to define a notion of a prime, the idea of Beurling was to start with an
appropriate generalization of the primes and then define generalized integers
as above in terms of these generalized primes. Specifically, let
$\mathcal{P} = \{p_1 < p_2 < \cdots\}$ be an arbitrary sequence of positive
numbers (which need not even be integers), and let $N_{\mathcal{P}}$ be the set
of all finite products of the form (*) with the $p_i$'s taken from $\mathcal{P}$.
Then $\mathcal{P}$ is called a system of Beurling generalized primes, and
$N_{\mathcal{P}}$ the associated system of Beurling generalized integers. One
can study such systems in great generality, and ask, for instance, how the
growth of such a sequence of generalized primes is related to that of the
associated sequence of generalized integers.
0.3 The infinitude of primes
We conclude this chapter with a proof of the infinitude of primes, a result
first proved some two thousand years ago by Euclid.
Theorem 0.2. There are infinitely many primes.
Proof. We give here a somewhat nonstandard proof, which, while not as
short as some other proofs, has a distinctly analytic flavor. It is based on
the following lemma, which is of interest in its own right.
Lemma 0.3. Let $\mathcal{P} = \{p_1, \dots, p_k\}$ be a finite set of primes, let
$$N_{\mathcal{P}} = \{n \in \mathbb{N} : p \mid n \Rightarrow p \in \mathcal{P}\},$$
i.e., $N_{\mathcal{P}}$ is the set of positive integers all of whose prime factors
belong to the set $\mathcal{P}$ (note that $1 \in N_{\mathcal{P}}$), and let
$$N_{\mathcal{P}}(x) = \#\{n \in N_{\mathcal{P}} : n \le x\} \quad (x \ge 1).$$
Then there exist constants $c$ and $x_0$ (depending on $\mathcal{P}$) such that
$N_{\mathcal{P}}(x) \le c (\log x)^k$ for $x \ge x_0$.
Proof. Note that
$$N_{\mathcal{P}} = \{p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} : a_i \in \mathbb{N}_0\},$$
and that by the Fundamental Theorem of Arithmetic each element in
$N_{\mathcal{P}}$ corresponds to a unique $k$-tuple $(a_1, \dots, a_k)$ of
nonnegative integers. Thus,
$$N_{\mathcal{P}}(x) = \#\{(a_1, \dots, a_k) : a_i \in \mathbb{N}_0,\ p_1^{a_1} \cdots p_k^{a_k} \le x\}$$
$$= \#\{(a_1, \dots, a_k) : a_i \in \mathbb{N}_0,\ a_1 \log p_1 + \cdots + a_k \log p_k \le \log x\}.$$
Now note that the inequality $a_1 \log p_1 + \cdots + a_k \log p_k \le \log x$
implies $a_i \le \log x / \log p_i \le \log x / \log 2$ for each $i$. Hence, for
each $a_i$ there are at most $[\log x / \log 2] + 1$ choices, and the number of
tuples $(a_1, \dots, a_k)$ counted in $N_{\mathcal{P}}(x)$ is therefore
$$\le \left( \left[ \frac{\log x}{\log 2} \right] + 1 \right)^k.$$
If we now restrict $x$ by $x \ge 2$, then
$[\log x / \log 2] + 1 \le 2 \log x / \log 2$, so the above becomes
$$\le \left( 2 \frac{\log x}{\log 2} \right)^k = (2 / \log 2)^k (\log x)^k.$$
This gives the asserted bound for $N_{\mathcal{P}}(x)$ with
$c = (2 / \log 2)^k$ and $x_0 = 2$.
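To get a feeling for how generous the bound of the lemma is, one can count the elements of $N_{\mathcal{P}}$ up to $x$ directly and compare with $c(\log x)^k$. The sketch below is only an illustration under simple assumptions (a small set of primes, moderate $x$); `count_smooth` and `lemma_bound` are hypothetical helper names.

```python
from math import log

def count_smooth(primes, x):
    """Count the integers n <= x all of whose prime factors lie in `primes`.

    The integer n = 1 (empty product) is included, as in the lemma.
    """
    members = {1}
    frontier = [1]
    while frontier:
        n = frontier.pop()
        for p in primes:
            candidate = n * p
            if candidate <= x and candidate not in members:
                members.add(candidate)
                frontier.append(candidate)
    return len(members)

def lemma_bound(primes, x):
    """The bound c (log x)^k with c = (2 / log 2)^k, valid for x >= 2."""
    k = len(primes)
    return (2 / log(2)) ** k * log(x) ** k

for x in (10, 100, 1000, 10**6):
    print(x, count_smooth([2, 3, 5], x), round(lemma_bound([2, 3, 5], x), 1))
```

The count grows like a power of $\log x$, far more slowly than $[x]$, which is exactly what the proof of Theorem 0.2 exploits.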
With this lemma at hand, the infinitude of primes follows easily: If
there were only finitely many primes, then we could apply the lemma with
$\mathcal{P}$ equal to the set of all primes and, consequently,
$N_{\mathcal{P}}$ the set of all positive integers, so that
$N_{\mathcal{P}}(x) = [x]$ for all $x \ge 1$. But the lemma would give
the bound $N_{\mathcal{P}}(x) \le c (\log x)^k$ for all $x \ge 2$ with some
constant $c$, and since $(\log x)^k / [x]$ tends to zero as $x \to \infty$, this
is incompatible with the equality $N_{\mathcal{P}}(x) = [x]$.
0.4 Exercises
0.1 Show that there exist arbitrarily large intervals that are free of primes,
i.e., for every positive integer k there exist k consecutive positive in-
tegers none of which is a prime.
0.2 Let $p(x) = \sum_{i=0}^{k} a_i x^i$ be a polynomial with integer
coefficients $a_i$ and of degree $k \ge 1$. Show that $p(n)$ is composite for
infinitely many integers $n$.
Remark: With "composite" replaced by "prime", the question becomes
a famous problem that is open (in general) for polynomials of
degree at least 2. For example, it is not known whether there are
infinitely many primes of the form $p(n) = n^2 + 1$.
0.3 Call a set of positive integers a PC-set if it has the property that
any pair of distinct elements of the set is coprime. Given $x \ge 2$,
let $N(x) = \max\{|A| : A \subset [2, x],\ A \text{ is a PC-set}\}$, i.e., $N(x)$
is the maximal number of integers with the PC property that one can fit
into the interval $[2, x]$. Prove that $N(x)$ is equal to $\pi(x)$, the number
of primes $\le x$.
0.4 A positive integer $n$ is called squarefull if it satisfies
(*) $p \mid n \Rightarrow p^2 \mid n$.
(Note that $n = 1$ is squarefull according to this definition, since 1 has
no prime divisors and the above implication is therefore trivially true.)
(i) Show that $n$ is squarefull if and only if $n$ can be written in the
form $n = a^2 b^3$ with $a, b \in \mathbb{N}$.
(ii) Find a similar characterization of $k$-full integers, i.e., integers
$n \in \mathbb{N}$ that satisfy (*) with 2 replaced by $k$ (where $k \ge 3$).
0.5 Let $\mathcal{P} = \{p_1, \dots, p_k\}$ be a finite set of primes, let
$$N_{\mathcal{P}} = \{n \in \mathbb{N} : p \mid n \Rightarrow p \in \mathcal{P}\},$$
i.e., $N_{\mathcal{P}}$ is the set of positive integers all of whose prime
factors belong to the set $\mathcal{P}$ (note that $1 \in N_{\mathcal{P}}$), and let
$$N_{\mathcal{P}}(x) = \#\{n \in N_{\mathcal{P}} : n \le x\} \quad (x \ge 1).$$
In Lemma 0.3 we showed that $N_{\mathcal{P}}(x) \le c_1 (\log x)^k$ for a
suitable constant $c_1$ (depending on the set $\mathcal{P}$, but not on $x$) and
for all sufficiently large $x$, say $x \ge x_1$. Prove that a bound of the same
type holds in the other direction, i.e., there exist constants $c_2 > 0$ and
$x_2$, depending on $\mathcal{P}$, such that
$N_{\mathcal{P}}(x) \ge c_2 (\log x)^k$ holds for all $x \ge x_2$.
Chapter 1
Arithmetic functions I:
Elementary theory
1.1 Introduction and basic examples
A simple, but very useful concept in number theory is that of an arithmetic
function. An arithmetic function is any real- or complex-valued function
defined on the set $\mathbb{N}$ of positive integers. (In other words, an arithmetic
function is just a sequence of real or complex numbers, though this point of
view is not particularly useful.)
Examples
(1) Constant function: The function defined by $f(n) = c$ for all $n$,
where $c$ is a constant, is denoted by $c$; in particular, 1 denotes the
function that is equal to 1 for all $n$.
(2) Unit function: $e(n)$, defined by $e(1) = 1$ and $e(n) = 0$ for $n \ge 2$.
(3) Identity function: $\mathrm{id}(n)$, defined by $\mathrm{id}(n) = n$ for all $n$.
(4) Logarithm: $\log n$, the (natural) logarithm, restricted to $\mathbb{N}$ and
regarded as an arithmetic function.
(5) Moebius function: $\mu(n)$, defined by $\mu(1) = 1$, $\mu(n) = 0$ if $n$ is
not squarefree (i.e., divisible by the square of a prime), and
$\mu(n) = (-1)^k$ if $n$ is composed of $k$ distinct prime factors (i.e.,
$n = \prod_{i=1}^{k} p_i$).
(6) Characteristic function of squarefree integers: $\mu^2(n)$ or $|\mu(n)|$.
From the definition of the Moebius function, it follows that the absolute
value (or, equivalently, the square) of $\mu$ is the characteristic
function of the squarefree integers.
(7) Liouville function: $\lambda(n)$, defined by $\lambda(1) = 1$ and
$\lambda(n) = (-1)^k$ if $n$ is composed of $k$ not necessarily distinct prime
factors (i.e., if $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ then
$\lambda(n) = \prod_{i=1}^{k} (-1)^{\alpha_i}$).
(8) Euler phi (totient) function: $\varphi(n)$, the number of positive integers
$m \le n$ that are relatively prime to $n$; i.e.,
$\varphi(n) = \sum_{m=1,\,(m,n)=1}^{n} 1$.
(9) Divisor function: $d(n)$, the number of positive divisors of $n$ (including
the trivial divisors $d = 1$ and $d = n$); i.e., $d(n) = \sum_{d \mid n} 1$.
(Another common notation for this function is $\tau(n)$.)
(10) Sum-of-divisors function: $\sigma(n)$, the sum over all positive divisors
of $n$; i.e., $\sigma(n) = \sum_{d \mid n} d$.
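(11) Generalized sum-of-divisors functions: $\sigma_{\alpha}(n)$, defined by
$\sigma_{\alpha}(n) = \sum_{d \mid n} d^{\alpha}$ [...]
(12) Number of distinct prime factors: $\omega(n)$, defined by $\omega(1) = 0$
and $\omega(n) = k$ if $n \ge 2$ and $n = \prod_{i=1}^{k} p_i^{\alpha_i}$; i.e.,
$\omega(n) = \sum_{p \mid n} 1$.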
(13) Total number of prime divisors: $\Omega(n)$, defined in the same way as
$\omega(n)$, except that prime divisors are counted with multiplicity. Thus,
$\Omega(1) = 0$ and $\Omega(n) = \sum_{i=1}^{k} \alpha_i$ if $n \ge 2$ and
$n = \prod_{i=1}^{k} p_i^{\alpha_i}$; i.e., $\Omega(n) = \sum_{p^m \mid n} 1$.
For squarefree integers $n$, the functions $\omega(n)$ and $\Omega(n)$ are
equal and are related to the Moebius function by $\mu(n) = (-1)^{\omega(n)}$.
For all integers $n$, $\lambda(n) = (-1)^{\Omega(n)}$.
(14) Ramanujan sums: Given a positive integer $q$, the Ramanujan sum
$c_q$ is the arithmetic function defined by
$c_q(n) = \sum_{a=1,\,(a,q)=1}^{q} e^{2\pi i a n / q}$.
(15) Von Mangoldt function: $\Lambda(n)$, defined by $\Lambda(n) = 0$ if $n$ is
not a prime power, and $\Lambda(p^m) = \log p$ for any prime power $p^m$.
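Several of the functions in this list can be computed directly from their definitions. The following Python sketch does this by brute force, assuming small arguments; the function names are illustrative choices, and the code is meant for experimenting with the definitions rather than for efficient computation.

```python
from math import gcd, log

def factorize(n):
    """Canonical factorization of n >= 1 as a list of pairs (p_i, alpha_i)."""
    pairs = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            pairs.append((p, a))
        p += 1
    if n > 1:
        pairs.append((n, 1))
    return pairs

def mobius(n):
    pairs = factorize(n)
    if any(a > 1 for _, a in pairs):   # n is not squarefree
        return 0
    return (-1) ** len(pairs)

def liouville(n):
    return (-1) ** sum(a for _, a in factorize(n))

def omega(n):        # number of distinct prime factors
    return len(factorize(n))

def big_omega(n):    # prime factors counted with multiplicity
    return sum(a for _, a in factorize(n))

def euler_phi(n):
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

def num_divisors(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def sigma(n, alpha=1):
    return sum(d ** alpha for d in range(1, n + 1) if n % d == 0)

def von_mangoldt(n):
    pairs = factorize(n)
    return log(pairs[0][0]) if len(pairs) == 1 else 0.0

for n in (1, 12, 30, 49):
    print(n, mobius(n), liouville(n), omega(n), big_omega(n),
          euler_phi(n), num_divisors(n), sigma(n))
```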
1.2 Additive and multiplicative functions
Many important arithmetic functions are multiplicative or additive
functions, in the sense of the following definition.
Definition. An arithmetic function $f$ is called multiplicative if
$f \not\equiv 0$ and
(1.1) $f(n_1 n_2) = f(n_1) f(n_2)$ whenever $(n_1, n_2) = 1$;
$f$ is called additive if it satisfies
(1.2) $f(n_1 n_2) = f(n_1) + f(n_2)$ whenever $(n_1, n_2) = 1$.
If this condition holds without the restriction $(n_1, n_2) = 1$, then $f$ is
called completely (or totally) multiplicative resp. completely (or totally)
additive.
The condition (1.1) can be used to prove the multiplicativity of a given
function. (There are also other, indirect, methods for establishing
multiplicativity, which we will discuss in the following sections.) However, in
order to exploit the multiplicativity of a function known to be multiplicative,
the criterion of the following theorem is usually more useful.
Theorem 1.1 (Characterization of multiplicative functions). An
arithmetic function $f$ is multiplicative if and only if $f(1) = 1$ and, for
$n \ge 2$,
(1.3) $f(n) = \prod_{p^m \| n} f(p^m)$.
The function $f$ is completely multiplicative if and only if the above condition
is satisfied and, in addition, $f(p^m) = f(p)^m$ for all prime powers $p^m$.
Remarks. (i) The result shows that a multiplicative function is uniquely
determined by its values on prime powers, and a completely multiplicative
function is uniquely determined by its values on primes.
(ii) With the convention that an empty product is to be interpreted as
1, the condition $f(1) = 1$ can be regarded as the special case $n = 1$ of (1.3).
With this interpretation, $f$ is multiplicative if and only if $f$ satisfies (1.3)
for all $n \in \mathbb{N}$.
Proof. Suppose first that $f$ satisfies $f(1) = 1$ and (1.3) for $n \ge 2$. If
$n_1$ and $n_2$ are positive integers with $(n_1, n_2) = 1$, then the prime
factorizations of $n_1$ and $n_2$ involve disjoint sets of prime powers, so
expressing each of $f(n_1)$, $f(n_2)$, and $f(n_1 n_2)$ by (1.3) we see that $f$
satisfies (1.1). Moreover, since $f(1) = 1$, $f$ cannot be identically 0.
Hence $f$ is multiplicative.
Conversely, suppose that $f$ is multiplicative. Then $f$ is not identically
0, so there exists $n \in \mathbb{N}$ such that $f(n) \ne 0$. Applying (1.1) with
$(n_1, n_2) = (n, 1)$, we obtain $f(n) = f(1 \cdot n) = f(1) f(n)$, which yields
$f(1) = 1$, upon dividing by $f(n)$.
Next, let $n \ge 2$ be given with prime factorization
$n = \prod_{i=1}^{k} p_i^{\alpha_i}$. Shaving off prime powers one at a time, and
applying (1.1) inductively, we have
$$f(n) = f\left(p_1^{\alpha_1} \cdots p_k^{\alpha_k}\right) = f\left(p_1^{\alpha_1} \cdots p_{k-1}^{\alpha_{k-1}}\right) f\left(p_k^{\alpha_k}\right) = \cdots = f\left(p_1^{\alpha_1}\right) \cdots f\left(p_k^{\alpha_k}\right),$$
so (1.3) holds.
If $f$ is completely multiplicative, then for any prime power $p^m$ we have
$$f(p^m) = f(p^{m-1} \cdot p) = f(p^{m-1}) f(p) = \cdots = f(p)^m.$$
Conversely, if $f$ is multiplicative and satisfies $f(p^m) = f(p)^m$ for all
prime powers $p^m$, then (1.3) can be written as
$f(n) = \prod_{i=1}^{r} f(p_i)$, where now $n = \prod_{i=1}^{r} p_i$ is the
factorization of $n$ into single (not necessarily distinct) prime factors
$p_i$. Since, for any two positive integers $n_1$ and $n_2$, the product of the
corresponding factorizations is the factorization of the product, it follows
that the multiplicativity property $f(n_1 n_2) = f(n_1) f(n_2)$ holds for any
pair $(n_1, n_2)$ of positive integers. Hence $f$ is completely multiplicative.
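Theorem 1.1 says that a multiplicative function can be rebuilt from its values at prime powers via (1.3). The sketch below illustrates this in Python, assuming the prime power values $\varphi(p^m) = p^m - p^{m-1}$ and $d(p^m) = m + 1$ that appear elsewhere in these notes; the helper `multiplicative_from_prime_powers` is a hypothetical name introduced only for this illustration.

```python
from math import gcd

def exact_prime_powers(n):
    """Return the pairs (p, m) with p^m || n; the list is empty for n = 1."""
    pairs = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            m = 0
            while n % p == 0:
                n //= p
                m += 1
            pairs.append((p, m))
        p += 1
    if n > 1:
        pairs.append((n, 1))
    return pairs

def multiplicative_from_prime_powers(value):
    """Extend value(p, m) = f(p^m) to all n by f(n) = prod_{p^m || n} f(p^m)."""
    def f(n):
        result = 1                     # empty product, so f(1) = 1
        for p, m in exact_prime_powers(n):
            result *= value(p, m)
        return result
    return f

phi = multiplicative_from_prime_powers(lambda p, m: p ** m - p ** (m - 1))
num_divisors = multiplicative_from_prime_powers(lambda p, m: m + 1)

def phi_by_counting(n):
    """Euler phi straight from the definition, for comparison."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

print([phi(n) for n in range(1, 13)])
print([phi_by_counting(n) for n in range(1, 13)])
```

Comparing the two printed lists is a quick sanity check that the extension by (1.3) agrees with the definition by counting.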
Theorem 1.2 (Products and quotients of multiplicative functions).
Assume $f$ and $g$ are multiplicative functions. Then:
(i) The (pointwise) product $fg$ defined by $(fg)(n) = f(n) g(n)$ is
multiplicative.
(ii) If $g$ is non-zero, then the quotient $f/g$ (again defined pointwise) is
multiplicative.
Proof. The result is immediate from the definition of multiplicativity.
Analogous properties hold for additive functions: an additive function
satisfies $f(1) = 0$ and $f(n) = \sum_{p^m \| n} f(p^m)$, and the pointwise sums
and differences of additive functions are additive.
Tables 1.1 and 1.2 below list the most important multiplicative and
additive arithmetic functions, along with their values at prime powers, and
basic properties. (Properties that are not obvious from the definition will
be established in the following sections.)
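Function | Value at $n$ | Value at $p^m$ | Properties
$e(n)$ | 1 if $n = 1$, 0 else | 0 | unit element w.r.t. Dirichlet product, $e * f = f * e = f$
$\mathrm{id}(n)$ (identity function) | $n$ | $p^m$ |
$s(n)$ (char. fct. of squares) | 1 if $n = m^2$ with $m \in \mathbb{N}$, 0 else | 1 if $m$ is even, 0 if $m$ is odd |
$\mu^2(n)$ (char. fct. of squarefree integers) | 1 if $n$ is squarefree, 0 else | 1 if $m = 1$, 0 if $m > 1$ |
$\mu(n)$ (Moebius function) | 1 if $n = 1$, $(-1)^k$ if $n = \prod_{i=1}^{k} p_i$ ($p_i$ distinct), 0 otherwise | $-1$ if $m = 1$, 0 if $m > 1$ | $\sum_{d \mid n} \mu(d) = 0$ if $n \ge 2$; $\mu * 1 = e$
$\lambda(n)$ (Liouville function) | 1 if $n = 1$, $(-1)^{\sum_{i=1}^{k} \alpha_i}$ if $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ | $(-1)^m$ | $\sum_{d \mid n} \lambda(d) = s(n)$; $\lambda * 1 = s$
$\varphi(n)$ (Euler phi function) | $\#\{1 \le m \le n : (m, n) = 1\}$ | $p^m (1 - 1/p)$ | $\sum_{d \mid n} \varphi(d) = n$; $\varphi * 1 = \mathrm{id}$
$d(n)$ ($= \tau(n)$) (divisor function) | $\sum_{d \mid n} 1$ | $m + 1$ | $d = 1 * 1$
$\sigma(n)$ (sum of divisors function) | $\sum_{d \mid n} d$ | $\dfrac{p^{m+1} - 1}{p - 1}$ | $\sigma = 1 * \mathrm{id}$
Table 1.1: Some important multiplicative functions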
Function | Value at $n$ | Value at $p^m$ | Properties
$\omega(n)$ (number of distinct prime factors) | 0 if $n = 1$, $k$ if $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ | 1 | additive
$\Omega(n)$ (total number of prime factors) | 0 if $n = 1$, $\sum_{i=1}^{k} \alpha_i$ if $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ | $m$ | completely additive
$\log n$ (logarithm) | $\log n$ | $\log p^m$ | completely additive
$\Lambda(n)$ (von Mangoldt function) | $\log p$ if $n = p^m$, 0 if $n$ is not a prime power | $\log p$ | neither additive nor multiplicative; $\log = \Lambda * 1$
Table 1.2: Some other important arithmetic functions
1.3 The Moebius function
The fundamental property of the Moebius function is given in the following
theorem.
Theorem 1.3 (Moebius identity). For all $n \in \mathbb{N}$,
$\sum_{d \mid n} \mu(d) = e(n)$; i.e., the sum $\sum_{d \mid n} \mu(d)$ is zero
unless $n = 1$, in which case it is 1.
Proof. There are many ways to prove this important identity; we give here
a combinatorial proof that does not require any special tricks or techniques.
We will later give alternate proofs, which are simpler and more elegant, but
which depend on some results in the theory of arithmetic functions.
If $n = 1$, then the sum $\sum_{d \mid n} \mu(d)$ reduces to the single term
$\mu(1) = 1$, so the asserted formula holds in this case. Next, suppose
$n \ge 2$ and let $n = \prod_{i=1}^{k} p_i^{\alpha_i}$ be the canonical prime
factorization of $n$. Since $\mu(d) = 0$ if $d$ is not squarefree, the sum over
$d$ can be restricted to divisors of the form $d = \prod_{i \in I} p_i$, where
$I \subset \{1, 2, \dots, k\}$, and each such divisor contributes a
term $\mu(d) = (-1)^{|I|}$. Hence,
$$\sum_{d \mid n} \mu(d) = \sum_{I \subset \{1, \dots, k\}} (-1)^{|I|}.$$
Now note that, for any $r \in \{0, 1, \dots, k\}$, there are $\binom{k}{r}$
subsets $I$ with $|I| = r$, and for each such subset the summand $(-1)^{|I|}$
is equal to $(-1)^r$. Hence the above sum reduces to
$$\sum_{r=0}^{k} (-1)^r \binom{k}{r} = (1 - 1)^k = 0,$$
by the binomial theorem. (Note that $k \ge 1$, since we assumed $n \ge 2$.)
Hence we have $\sum_{d \mid n} \mu(d) = 0$ for $n \ge 2$, as claimed.
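The Moebius identity is easy to confirm numerically for small $n$. The following is a minimal Python sketch, assuming a straightforward trial-division implementation of $\mu$; the names `mobius` and `mobius_divisor_sum` are illustrative.

```python
def mobius(n):
    """Moebius function via trial division (intended for small n only)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:         # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                      # one remaining prime factor
        result = -result
    return result

def mobius_divisor_sum(n):
    """The sum of mu(d) over all positive divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

# According to Theorem 1.3 this prints 1 followed by zeros.
print([mobius_divisor_sum(n) for n in range(1, 21)])
```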
Motivation for the Moebius function. The identity given in this
theorem is the main reason for the peculiar definition of the Moebius function,
which may seem rather artificial. In particular, the definition of $\mu(n)$ as 0
when $n$ is not squarefree appears to be unmotivated. The Liouville
function $\lambda(n)$, which is identical to the Moebius function on squarefree
integers, but whose definition extends to non-squarefree integers in a natural
way, appears to be a much more natural function to work with. However, this
function does not satisfy the identity of the theorem, and it is this identity
that underlies most of the applications of the Moebius function.
Application: Evaluation of sums involving a coprimality condition.
The identity of the theorem states that $\sum_{d \mid n} \mu(d)$ is the
characteristic function of the integer $n = 1$. This fact can be used to
extract specific terms from a series. A typical application is the evaluation
of sums over integers $n$ that are relatively prime to a given integer $k$. By
the theorem, the characteristic function of integers $n$ with $(n, k) = 1$ is
given by $\sum_{d \mid (n,k)} \mu(d)$. Since the condition $d \mid (n, k)$ is
equivalent to the simultaneous conditions $d \mid n$ and $d \mid k$, one can
formally rewrite a sum $\sum_{n,\,(n,k)=1} f(n)$ as follows:
$$\sum_{\substack{n \\ (n,k)=1}} f(n) = \sum_{n} f(n)\, e((n, k)) = \sum_{n} f(n) \sum_{d \mid (n,k)} \mu(d) = \sum_{d \mid k} \mu(d) \sum_{\substack{n \\ d \mid n}} f(n) = \sum_{d \mid k} \mu(d) \sum_{m} f(dm).$$
The latter sum can usually be evaluated, and doing so yields a formula for
the original sum. (Of course, one has to make sure that the series involved
converge.) The following examples illustrate the general method.
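As a quick numerical illustration of this rewriting, the Python sketch below compares both sides for a finite range $n \le x$, so that convergence is not an issue. The restriction to $n \le x$ and the function names are assumptions made for this example only.

```python
from math import gcd

def mobius(n):
    """Moebius function via trial division (intended for small n only)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def coprime_sum_direct(f, k, x):
    """Sum of f(n) over 1 <= n <= x with (n, k) = 1, straight from the definition."""
    return sum(f(n) for n in range(1, x + 1) if gcd(n, k) == 1)

def coprime_sum_mobius(f, k, x):
    """The same sum, written as sum_{d | k} mu(d) sum_{m <= x/d} f(d m)."""
    total = 0
    for d in range(1, k + 1):
        if k % d == 0 and mobius(d) != 0:
            total += mobius(d) * sum(f(d * m) for m in range(1, x // d + 1))
    return total

square = lambda n: n * n
print(coprime_sum_direct(square, 12, 50), coprime_sum_mobius(square, 12, 50))
```

Taking $f \equiv 1$ and $x = k = n$ in this sketch gives the Euler phi function, which is the first of the examples that follow.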
Evaluation of the Euler phi function. By definition, the Euler phi
function is given by $\varphi(n) = \sum_{m \le n,\,(m,n)=1} 1$. Eliminating the
coprimality condition $(m, n) = 1$, as indicated above, yields the identity
$$\varphi(n) = \sum_{m \le n} \sum_{d \mid (m,n)} \mu(d) = \sum_{d \mid n} \mu(d) \sum_{m \le n,\, d \mid m} 1 = \sum_{d \mid n} \mu(d)(n/d).$$
(For an alternative proof of this identity see Section 1.7.)
Ramanujan sums. The functions
$c_q(n) = \sum_{a=1,\,(a,q)=1}^{q} \exp(2\pi i a n / q)$, where $q$ is a positive
integer, are called Ramanujan sums. By eliminating the condition $(a, q) = 1$
using the above method, one can show that
$c_q(n) = \sum_{d \mid (q,n)} d\, \mu(q/d)$. When $n = 1$, this formula reduces to
$c_q(1) = \mu(q)$, and we obtain the remarkable identity
$\sum_{a=1,\,(a,q)=1}^{q} \exp(2\pi i a / q) = \mu(q)$,
which shows that the sum over all primitive $q$-th roots of unity is equal
to $\mu(q)$.
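The closed formula for $c_q(n)$ can be compared with the defining exponential sum in floating-point arithmetic. This is a sketch only, assuming small $q$ and $n$ so that rounding errors stay negligible; `ramanujan_direct` and `ramanujan_formula` are hypothetical names.

```python
import cmath
from math import gcd

def mobius(n):
    """Moebius function via trial division (intended for small n only)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def ramanujan_direct(q, n):
    """c_q(n) from the definition, as a complex floating-point number."""
    return sum(cmath.exp(2j * cmath.pi * a * n / q)
               for a in range(1, q + 1) if gcd(a, q) == 1)

def ramanujan_formula(q, n):
    """c_q(n) = sum over d | (q, n) of d * mu(q / d), an exact integer."""
    g = gcd(q, n)
    return sum(d * mobius(q // d) for d in range(1, g + 1) if g % d == 0)

for q, n in [(6, 1), (6, 2), (6, 6), (12, 4)]:
    print(q, n, ramanujan_formula(q, n), ramanujan_direct(q, n))
```

The imaginary parts of the direct sums vanish up to rounding, in line with the fact that the formula produces integers.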
A weighted average of the Moebius function. While the estimation
of the partial sums of the Moebius function $\sum_{n \le x} \mu(n)$ is a very
deep (and largely unsolved) problem, remarkably enough a weighted version of
this sum, namely $\sum_{n \le x} \mu(n)/n$, is easy to bound. In fact, we will
prove:
Theorem 1.4. For any real $x \ge 1$ we have
(1.4) $\left| \sum_{n \le x} \frac{\mu(n)}{n} \right| \le 1$.
Proof. Note first that, without loss of generality, one can assume that $x = N$,
where $N$ is a positive integer. We then evaluate the sum
$S(N) = \sum_{n \le N} e(n)$ in two different ways. On the one hand, by the
definition of $e(n)$, we have $S(N) = 1$; on the other hand, writing
$e(n) = \sum_{d \mid n} \mu(d)$ and interchanging summations, we obtain
$S(N) = \sum_{d \le N} \mu(d) [N/d]$, where $[t]$ denotes the greatest integer
$\le t$. Now, for $d \le N - 1$, $[N/d]$ differs from $N/d$ by an amount that is
bounded by 1 in absolute value, while for $d = N$, the quantities $[N/d]$ and
$N/d$ are equal. Replacing $[N/d]$ by $N/d$ and bounding the resulting error,
we therefore obtain
$$\left| S(N) - N \sum_{d \le N} \frac{\mu(d)}{d} \right| \le \sum_{d \le N} |\mu(d)| \cdot |[N/d] - (N/d)| \le \sum_{d \le N-1} |\mu(d)| \le N - 1.$$
Hence,
$$\left| N \sum_{d \le N} \frac{\mu(d)}{d} \right| \le (N - 1) + |S(N)| = (N - 1) + 1 = N,$$
which proves (1.4).
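Both ingredients of this proof, the identity $\sum_{d \le N} \mu(d)[N/d] = 1$ and the bound (1.4), can be checked with exact rational arithmetic. The sketch below assumes small $N$ and uses Python's `fractions` module; the function names are illustrative.

```python
from fractions import Fraction

def mobius(n):
    """Moebius function via trial division (intended for small n only)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def floor_identity_sum(N):
    """S(N) written as the sum of mu(d) * [N/d] over d <= N."""
    return sum(mobius(d) * (N // d) for d in range(1, N + 1))

def weighted_mobius_sum(N):
    """The sum of mu(n)/n over n <= N, as an exact fraction."""
    return sum(Fraction(mobius(n), n) for n in range(1, N + 1))

for N in (1, 10, 100):
    print(N, floor_identity_sum(N), float(weighted_mobius_sum(N)))
```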
The Moebius function and the Prime Number Theorem. Part of
the interest in studying the Moebius function stems from the fact that the
behavior of this function is intimately related to the Prime Number Theorem
(PNT), which says that the number $\pi(x)$ of primes below $x$ is
asymptotically equal to $x / \log x$ as $x \to \infty$ and which is one of the
main results of Analytic Number Theory. We will show later that the PNT is
equivalent to the fact that $\mu(n)$ has average value (mean value) zero, i.e.,
$\lim_{x \to \infty} (1/x) \sum_{n \le x} \mu(n) = 0$. The latter statement has
the following probabilistic interpretation: If a squarefree integer is chosen
at random, then it is equally likely to have an even and an odd number of
prime factors.
Mertens' conjecture. A famous conjecture, attributed to Mertens
(though it apparently was first stated by Stieltjes who mistakenly believed
that he had proved it) asserts that $|\sum_{n \le x} \mu(n)| \le \sqrt{x}$ for
all $x \ge 1$. This conjecture had remained open for more than a century, but
was disproved (though only barely!) in 1985 by A. Odlyzko and H. te Riele, who
used extensive computer calculations, along with some theoretical arguments,
to show that the above sum exceeds 1.06
[...]
$\sum_{d \mid n} \mu(d)(n/d)$.
(iii) $\varphi$ is multiplicative.
(iv) $\varphi(n) = \prod_{p^m \| n} (p^m - p^{m-1}) = n \prod_{p \mid n} (1 - 1/p)$
for all $n \in \mathbb{N}$.
Proof. (i) Split the set $A = \{1, 2, \dots, n\}$ into the pairwise disjoint
subsets $A_d = \{m \in A : (m, n) = d\}$, $d \mid n$. Writing an element
$m \in A_d$ as $m = dm'$, we see that
$A_d = \{dm' : 1 \le m' \le n/d,\ (m', n/d) = 1\}$, and so
$|A_d| = \varphi(n/d)$. Since $n = |A| = \sum_{d \mid n} |A_d|$, it follows that
$n = \sum_{d \mid n} \varphi(n/d)$. Writing $d$
[...]
$\varphi^{-1}(m)$ is never 1) is true for values $m$ up to $10^{10^{10}}$. While
the conjecture is still open, Kevin Ford, a former UIUC graduate student and
now a faculty member here, proved a number of related conjectures. In
particular, he showed that for every integer $k \ge 2$ there exist infinitely
many $m$ such that $\varphi^{-1}(m)$ has cardinality $k$. This was known as
Sierpinski's conjecture, and it complements Carmichael's conjecture, which
asserts that in the case $k = 1$, the only case not covered by Sierpinski's
conjecture, the assertion of Sierpinski's conjecture is not true.
1.5 The von Mangoldt function
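The definition of the von Mangoldt function may seem strange at first glance.
One motivation for this peculiar definition lies in the following identity.
Theorem 1.6. We have
$$\sum_{d \mid n} \Lambda(d) = \log n \quad (n \in \mathbb{N}).$$
Proof. For $n = 1$, the identity holds since $\Lambda(1) = 0 = \log 1$. For
$n \ge 2$ we have, by the definition of $\Lambda$,
$$\sum_{d \mid n} \Lambda(d) = \sum_{p^m \mid n} \log p = \log n.$$
(For the last step note that, for each prime power $p^{\alpha} \| n$, each of the
divisors $p, p^2, \dots, p^{\alpha}$ contributes a term $\log p$, so the powers
of $p$ contribute $\alpha \log p = \log p^{\alpha}$ in total. Summing over all
$p^{\alpha} \| n$ gives
$\sum_{p^{\alpha} \| n} \log p^{\alpha} = \log \prod_{p^{\alpha} \| n} p^{\alpha} = \log n$.)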
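The identity of Theorem 1.6 can also be checked numerically. The sketch below is a simple floating-point illustration for small $n$; the helper names are assumptions, and the comparison uses a tolerance because sums of logarithms are only approximately equal in floating point.

```python
from math import isclose, log

def von_mangoldt(n):
    """Lambda(n): log p if n = p^m is a prime power, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a power of p exactly when nothing is left over
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)                  # n itself is prime

def mangoldt_divisor_sum(n):
    """The sum of Lambda(d) over all positive divisors d of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

for n in (1, 12, 360, 1001):
    print(n, mangoldt_divisor_sum(n), log(n))
```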
The main motivation for introducing the von Mangoldt function is that
the partial sums $\sum_{n \le x} \Lambda(n)$ represent a weighted count of the
prime powers $p^m \le x$, with the weights being $\log p$, the correct weights
to offset the density of primes. It is not hard to show that higher prime
powers (i.e., those with $m \ge 2$) contribute little to the above sum, so the
sum is essentially a weighted sum over prime numbers. In fact, studying the
asymptotic behavior of the above sum is essentially equivalent to studying the
behavior of the prime counting function $\pi(x)$; for example, the PNT is
equivalent to the assertion that
$\lim_{x \to \infty} (1/x) \sum_{n \le x} \Lambda(n) = 1$. In fact, most proofs
of the PNT proceed by first showing the latter relation, and then deducing from
this the original form of the PNT. The reason for doing this is that, because
of the identity in the above theorem (and some similar relations), working
with $\Lambda(n)$ is technically easier than working directly with the
characteristic function of primes.
1.6 The divisor and sum-of-divisors functions
Theorem 1.7. The divisor function $d(n)$ and the sum-of-divisors function $\sigma(n)$ are multiplicative. Their values at prime powers are given by
$$d(p^m) = m + 1, \qquad \sigma(p^m) = \frac{p^{m+1} - 1}{p - 1}.$$
Proof. To prove the multiplicativity of $d(n)$, let $n_1$ and $n_2$ be positive integers with $(n_1, n_2) = 1$. Note that if $d_1 \mid n_1$ and $d_2 \mid n_2$, then $d_1 d_2 \mid n_1 n_2$. Conversely, by the coprimality condition and the fundamental theorem of arithmetic, any divisor $d$ of $n_1 n_2$ factors uniquely into a product $d = d_1 d_2$, where $d_1 \mid n_1$ and $d_2 \mid n_2$. Thus, there is a 1-1 correspondence between the set of divisors of $n_1 n_2$ and the set of pairs $(d_1, d_2)$ with $d_1 \mid n_1$ and $d_2 \mid n_2$. Since there are $d(n_1) d(n_2)$ such pairs and $d(n_1 n_2)$ divisors of $n_1 n_2$, we obtain $d(n_1 n_2) = d(n_1) d(n_2)$, as required. The multiplicativity of $\sigma(n)$ can be proved in the same way. (Alternate proofs of the multiplicativity of $d$ and $\sigma$ will be given in the following section.)
The given values for $d(p^m)$ and $\sigma(p^m)$ are obtained on noting that the divisors of $p^m$ are exactly the numbers $p^0, p^1, \dots, p^m$. Since there are $m + 1$ such divisors, we have $d(p^m) = m + 1$, and applying the geometric series formula to the sum of these divisors gives the asserted formula for $\sigma(p^m)$.
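Theorem 1.7 gives a practical way to compute $d(n)$ and $\sigma(n)$ from the prime factorization of $n$. The sketch below is an illustration, not part of the original notes; it compares the product formula with a brute-force enumeration of divisors. The names `factorize`, `d_and_sigma` and `d_and_sigma_brute` are hypothetical.

```python
def factorize(n):
    """Return {p: m} with n equal to the product of the p^m (trial division).
    Hypothetical helper; adequate only for small n."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def d_and_sigma(n):
    """Compute (d(n), sigma(n)) via multiplicativity and the prime-power values."""
    d, sigma = 1, 1
    for p, m in factorize(n).items():
        d *= m + 1                                # d(p^m) = m + 1
        sigma *= (p ** (m + 1) - 1) // (p - 1)    # geometric series for sigma(p^m)
    return d, sigma

def d_and_sigma_brute(n):
    """Compute (d(n), sigma(n)) directly from the list of divisors."""
    divs = [k for k in range(1, n + 1) if n % k == 0]
    return len(divs), sum(divs)

# The two computations agree on an initial range of integers.
assert all(d_and_sigma(n) == d_and_sigma_brute(n) for n in range(1, 500))
```

The integer division in the formula for $\sigma(p^m)$ is exact, since $p - 1$ divides $p^{m+1} - 1$.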
Perfect numbers. The sum-of-divisors function is important because of its connection to so-called perfect numbers, that is, positive integers $n$ that are equal to the sum of all their proper divisors, i.e., all positive divisors except $n$ itself. Since the divisor $d = n$ is counted in the definition of $\sigma(n)$, the sum of proper divisors of $n$ is $\sigma(n) - n$. Thus, an integer $n$ is perfect if and only if $\sigma(n) = 2n$. For example, 6 is perfect, since $6 = 1 + 2 + 3$. It is an unsolved problem whether there exist infinitely many perfect numbers. However, a result of Euler states:
Theorem (Euler). An even integer $n$ is perfect if and only if $n$ is of the form $n = 2^{p-1}(2^p - 1)$ where $p$ is a prime and $2^p - 1$ is also prime.
This result is not hard to prove, using the multiplicativity of $\sigma(n)$. The problem with this characterization is that it is not known whether there exist infinitely many primes $p$ such that $2^p - 1$ is also prime. (Primes of this form are called Mersenne primes, and whether there exist infinitely many of these is another famous open problem.)
There is no analogous characterization of odd perfect numbers; in fact, not a single odd perfect number has been found, and it is an open problem whether odd perfect numbers exist.
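As a small sanity check of the criterion $\sigma(n) = 2n$ and of Euler's characterization, one can list the perfect numbers below some bound and compare them with the numbers of the form $2^{p-1}(2^p - 1)$. The following sketch is not part of the original notes; the bound 10000 and the helper names `sigma` and `is_prime` are assumptions made for illustration.

```python
def sigma(n):
    """Sum of all positive divisors of n, pairing each divisor k with n // k."""
    total = 0
    k = 1
    while k * k <= n:
        if n % k == 0:
            total += k
            if k != n // k:
                total += n // k
        k += 1
    return total

def is_prime(n):
    """Trial-division primality test (hypothetical helper, small n only)."""
    return n >= 2 and all(n % k for k in range(2, int(n ** 0.5) + 1))

BOUND = 10000  # illustrative search bound, chosen arbitrarily

# Perfect numbers below BOUND, found directly from sigma(n) = 2n.
perfect = [n for n in range(2, BOUND) if sigma(n) == 2 * n]

# Numbers of Euler's form 2^(p-1) (2^p - 1) with p and 2^p - 1 both prime.
euler_form = [2 ** (p - 1) * (2 ** p - 1)
              for p in range(2, 14)
              if is_prime(p) and is_prime(2 ** p - 1)]
euler_form = [n for n in euler_form if n < BOUND]

assert perfect == euler_form
```

The agreement of the two lists in this range is consistent with the fact, noted above, that no odd perfect number is known.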
1.7 The Dirichlet product of arithmetic functions
The two obvious operations on the set of arithmetic functions are pointwise addition and multiplication. The constant functions $f = 0$ and $f = 1$ are neutral elements with respect to these operations, and the additive and multiplicative inverses of a function $f$ are given by $-f$ and $1/f$, respectively.
While these operations are sometimes useful, by far the most important operation among arithmetic functions is the so-called Dirichlet product, an operation that, at first glance, appears mysterious and unmotivated, but which has proved to be an extremely useful tool in the theory of arithmetic functions.
Definition. Given two arithmetic functions $f$ and $g$, the Dirichlet product (or Dirichlet convolution) of $f$ and $g$, denoted by $f * g$, is the arithmetic function defined by
$$(f * g)(n) = \sum_{d \mid n} f(d) g(n/d).$$
In particular, we have $(f * g)(1) = f(1)g(1)$, $(f * g)(p) = f(1)g(p) + f(p)g(1)$ for any prime $p$, and $(f * g)(p^m) = \sum_{k=0}^{m} f(p^k) g(p^{m-k})$ for any prime power $p^m$.
It is sometimes useful to write the Dirichlet product in the symmetric form
$$(f * g)(n) = \sum_{ab = n} f(a) g(b),$$
where the summation runs over all pairs $(a, b)$ of positive integers whose product equals $n$. The equivalence of the two definitions follows immediately from the fact that the pairs $(d, n/d)$, where $d$ runs over all divisors of $n$, are exactly the pairs $(a, b)$ of the above form.
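The definition translates directly into code. The following minimal sketch (not from the original notes) implements the Dirichlet product both via the divisor sum and via the symmetric form, and checks that the two agree; the names `dirichlet`, `dirichlet_symmetric`, `one` and `ident` are hypothetical.

```python
def dirichlet(f, g):
    """Return the function n -> sum over d | n of f(d) g(n/d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

def dirichlet_symmetric(f, g):
    """Same product, written as a sum over all pairs (a, b) with ab = n."""
    def h(n):
        return sum(f(a) * g(b)
                   for a in range(1, n + 1)
                   for b in range(1, n + 1)
                   if a * b == n)
    return h

one = lambda n: 1      # the constant function 1
ident = lambda n: n    # the identity function id(n) = n

# Counting divisors is the product 1 * 1; summing divisors is id * 1.
count_divisors = dirichlet(one, one)
sum_divisors = dirichlet(ident, one)

# Both forms of the definition give the same values.
f = lambda n: n * n
g = lambda n: n + 1
assert all(dirichlet(f, g)(n) == dirichlet_symmetric(f, g)(n) for n in range(1, 60))
```

The symmetric version is quadratic in $n$ and is included only to mirror the formula; the divisor-sum version is the one to use in practice.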
One motivation for introducing this product is the fact that the definitions of many common arithmetic functions have the form of a Dirichlet product, and that many identities among arithmetic functions can be written concisely as identities involving Dirichlet products. Here are some examples:
Examples
(1) $d(n) = \sum_{d \mid n} 1$, so $d = 1 * 1$.
(2) $\sigma(n) = \sum_{d \mid n} d$, so $\sigma = \mathrm{id} * 1$.
(3) $\sum_{d \mid n} \mu(d) = e(n)$ (see Theorem 1.3), so $\mu * 1 = e$.
(4) $\sum_{d \mid n} \mu(d)(n/d) = \phi(n)$ (one of the applications of the Moebius identity, Theorem 1.3), so $\mu * \mathrm{id} = \phi$.
(5) $\sum_{d \mid n} \phi(d) = n$ (Theorem 1.5), so $\phi * 1 = \mathrm{id}$.
(6) $\sum_{d \mid n} \Lambda(d) = \log n$ (Theorem 1.6), so $\Lambda * 1 = \log$.
A second motivation for defining the Dirichlet product in the above manner is that this product has nice algebraic properties.
Theorem 1.8 (Properties of the Dirichlet product).
(i) The function $e$ acts as a unit element for $*$, i.e., $f * e = e * f = f$ for all arithmetic functions $f$.
(ii) The Dirichlet product is commutative, i.e., $f * g = g * f$ for all $f$ and $g$.
(iii) The Dirichlet product is associative, i.e., $(f * g) * h = f * (g * h)$ for all $f, g, h$.
(iv) If $f(1) \ne 0$, then $f$ has a unique Dirichlet inverse, i.e., there is a unique function $g$ such that $f * g = e$.
Proof. (i) follows immediately from the definition of the Dirichlet product. For the proof of (ii) (commutativity) and (iii) (associativity) it is useful to work with the symmetric version of the Dirichlet product, i.e., $(f * g)(n) = \sum_{ab = n} f(a) g(b)$. The commutativity of $*$ is immediate from this representation. To obtain the associativity, we apply this representation twice to get
$$((f * g) * h)(n) = \sum_{dc = n} (f * g)(d) h(c) = \sum_{dc = n} \sum_{ab = d} f(a) g(b) h(c) = \sum_{abc = n} f(a) g(b) h(c),$$
where the last sum runs over all triples $(a, b, c)$ of positive integers whose product is equal to $n$. Replacing $(f, g, h)$ by $(g, h, f)$ in this formula yields the same final (triple) sum, and we conclude that $(f * g) * h = (g * h) * f = f * (g * h)$, proving that $*$ is associative.
It remains to prove (iv). Let $f$ be an arithmetic function with $f(1) \ne 0$. By definition, a function $g$ is a Dirichlet inverse of $f$ if $(f * g)(1) = e(1) = 1$ and $(f * g)(n) = e(n) = 0$ for all $n \ge 2$. Writing out the Dirichlet product $(f * g)(n)$, we see that this is equivalent to the infinite system of equations
$$f(1) g(1) = 1, \qquad (A_1)$$
$$\sum_{d \mid n} g(d) f(n/d) = 0 \quad (n \ge 2). \qquad (A_n)$$
We need to show that the system $(A_n)_{n=1}^{\infty}$ has a unique solution $g$. We do this by inductively constructing the values $g(n)$ and showing that these values are uniquely determined.
For $n = 1$, equation $(A_1)$ gives $g(1) = 1/f(1)$, which is well defined since $f(1) \ne 0$. Hence, $g(1)$ is uniquely defined and $(A_1)$ holds. Let now $n \ge 2$, and suppose we have shown that there exist unique values $g(1), \dots, g(n-1)$ so that equations $(A_1)$–$(A_{n-1})$ hold. Since $f(1) \ne 0$, equation $(A_n)$ is equivalent to
$$g(n) = -\frac{1}{f(1)} \sum_{d \mid n,\ d < n} g(d) f(n/d). \qquad (1.5)$$
Since the right-hand side involves only values $g(d)$ with $d < n$, this determines $g(n)$ uniquely, and defining $g(n)$ by (1.5) we see that $(A_n)$ (in addition to $(A_1)$–$(A_{n-1})$) holds. This completes the induction argument.
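The inductive construction in the proof of (iv) is effectively an algorithm: relation (1.5) computes $g(n)$ from the earlier values $g(d)$, $d < n$. The following is a minimal sketch of that recursion, not part of the original notes; the function name `dirichlet_inverse` and the use of exact rational arithmetic are assumptions made for illustration.

```python
from fractions import Fraction

def dirichlet_inverse(f, N):
    """Return a list g with g[n] = value of the Dirichlet inverse of f at n,
    for 1 <= n <= N (index 0 is unused). Assumes f(1) != 0.
    Hypothetical helper implementing recursion (1.5)."""
    g = [None] * (N + 1)
    g[1] = Fraction(1, 1) / f(1)          # equation (A_1)
    for n in range(2, N + 1):
        # Sum over the proper divisors d of n, as in (1.5).
        s = sum(g[d] * f(n // d) for d in range(1, n) if n % d == 0)
        g[n] = -s / f(1)
    return g

# For f = 1 the recursion reproduces the Moebius function.
g = dirichlet_inverse(lambda n: 1, 30)
assert [int(v) for v in g[1:11]] == [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
```

Exact fractions are used so that the check $f * g = e$ can be made with equality rather than a floating-point tolerance.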
Examples
(1) Since $\mu * 1 = e$, the Moebius function is the Dirichlet inverse of the function 1.
(2) Multiplying the identity $\phi = \mu * \mathrm{id}$ (obtained in the last section) by 1 gives $\phi * 1 = 1 * \phi = 1 * \mu * \mathrm{id} = e * \mathrm{id} = \mathrm{id}$, so we get the identity $\phi * 1 = \mathrm{id}$ stated in Theorem 1.5.
The last example is a special case of an important general principle, which we state as a theorem.
Theorem 1.9 (Moebius inversion formula). If $g(n) = \sum_{d \mid n} f(d)$ for all $n \in \mathbb{N}$, then $f(n) = \sum_{d \mid n} g(d) \mu(n/d)$ for all $n$.
Proof. The given relation can be written as $g = f * 1$. Taking the Dirichlet product of each side in this relation with the function $\mu$ we obtain $g * \mu = (f * 1) * \mu = f * (1 * \mu) = f * e = f$, which is the asserted relation.
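The inversion formula can be tested on any concrete function. In the sketch below (an illustration, not part of the original notes) the test function $f(n) = n^2 + 3$ is an arbitrary choice, and the helper names `mobius` and `divisors` are hypothetical.

```python
def mobius(n):
    """Moebius function via trial division (hypothetical helper)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    # A remaining factor n > 1 is one more prime factor.
    return -result if n > 1 else result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

f = lambda n: n * n + 3                       # arbitrary test function
g = lambda n: sum(f(d) for d in divisors(n))  # g = f * 1

def recovered(n):
    """Right-hand side of the inversion formula: sum of g(d) mu(n/d)."""
    return sum(g(d) * mobius(n // d) for d in divisors(n))

assert all(recovered(n) == f(n) for n in range(1, 200))
```

Note that no multiplicativity of $f$ is needed here; the formula holds for arbitrary arithmetic functions.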
Finally, a third motivation for the definition of the Dirichlet product is that it preserves the important property of multiplicativity of a function, as shown in the following theorem. This is, again, by no means obvious.
Theorem 1.10 (Dirichlet product and multiplicative functions).
(i) If $f$ and $g$ are multiplicative, then so is $f * g$.
(ii) If $f$ is multiplicative, then so is the Dirichlet inverse $f^{-1}$.
(iii) If $f * g = h$ and if $f$ and $h$ are multiplicative, then so is $g$.
(iv) (Distributivity with pointwise multiplication) If $h$ is completely multiplicative, then $h(f * g) = (hf) * (hg)$ for any functions $f$ and $g$.
Remarks. (i) The Dirichlet product of two completely multiplicative functions is multiplicative (by the theorem), but not necessarily completely multiplicative. For example, the divisor function $d(n)$ can be expressed as a product $1 * 1$ in which each factor 1 is completely multiplicative, but the divisor function itself is only multiplicative in the restricted sense (i.e., with the coprimality condition). The same applies to the Dirichlet inverse: if $f$ is completely multiplicative, then $f^{-1}$ is multiplicative, but in general not completely multiplicative.
(ii) By Theorem 1.8, any function $f$ with $f(1) \ne 0$ has a Dirichlet inverse. Since a multiplicative function satisfies $f(1) = 1$, any multiplicative function has a Dirichlet inverse.
(iii) Note that the distributivity asserted in property (iv) only holds when the function $h$ is completely multiplicative. (In fact, one can show that this property characterizes completely multiplicative functions: If $h$ is any non-zero function for which the identity in (iv) holds for all functions $f$ and $g$, then $h$ is necessarily completely multiplicative.)
Proof. (i) Let $f$ and $g$ be multiplicative and let $h = f * g$. Given $n_1$ and $n_2$ with $(n_1, n_2) = 1$, we need to show that $h(n_1 n_2) = h(n_1) h(n_2)$. To this end we use the fact (see the proof of Theorem 1.7) that each divisor $d \mid n_1 n_2$ can be factored uniquely as $d = d_1 d_2$ with $d_1 \mid n_1$ and $d_2 \mid n_2$, and that, conversely, given any pair $(d_1, d_2)$ with $d_1 \mid n_1$ and $d_2 \mid n_2$, the product $d = d_1 d_2$ satisfies $d \mid n_1 n_2$. Hence
$$h(n_1 n_2) = \sum_{d \mid n_1 n_2} f(d) g(n_1 n_2/d) = \sum_{d_1 \mid n_1} \sum_{d_2 \mid n_2} f(d_1 d_2) g((n_1 n_2)/(d_1 d_2)).$$
Since $(n_1, n_2) = 1$, any divisors $d_1 \mid n_1$ and $d_2 \mid n_2$ satisfy $(d_1, d_2) = 1$ and $(n_1/d_1, n_2/d_2) = 1$. Hence, in the above double sum we can apply the multiplicativity of $f$ and $g$ to obtain
$$\begin{aligned} h(n_1 n_2) &= \sum_{d_1 \mid n_1} \sum_{d_2 \mid n_2} f(d_1) g(n_1/d_1) f(d_2) g(n_2/d_2) \\ &= (f * g)(n_1) (f * g)(n_2) = h(n_1) h(n_2), \end{aligned}$$
which is what we had to prove.
(ii) Let $f$ be a multiplicative function and let $g$ be the Dirichlet inverse of $f$. We prove the multiplicativity property
$$g(n_1 n_2) = g(n_1) g(n_2) \quad \text{if } (n_1, n_2) = 1 \qquad (1.6)$$
by induction on the product $n = n_1 n_2$. If $n_1 n_2 = 1$, then $n_1 = n_2 = 1$, and (1.6) holds trivially. Let $n \ge 2$ be given, and suppose (1.6) holds whenever $n_1 n_2 < n$. Let $n_1$ and $n_2$ be given with $n_1 n_2 = n$ and $(n_1, n_2) = 1$. Applying the identity $(A_n)$ above, we obtain, on using the multiplicativity of $f$ and that of $g$ for arguments $< n$,
$$\begin{aligned} 0 &= \sum_{d \mid n_1 n_2} f(d) g(n_1 n_2/d) \\ &= \sum_{d_1 \mid n_1} \sum_{d_2 \mid n_2} f(d_1) f(d_2) g(n_1/d_1) g(n_2/d_2) + (g(n_1 n_2) - g(n_1) g(n_2)) \\ &= (f * g)(n_1) (f * g)(n_2) + (g(n_1 n_2) - g(n_1) g(n_2)) \\ &= e(n_1) e(n_2) + (g(n_1 n_2) - g(n_1) g(n_2)) \\ &= g(n_1 n_2) - g(n_1) g(n_2), \end{aligned}$$
since, by our assumption $n = n_1 n_2 \ge 2$, at least one of $n_1$ and $n_2$ must be $\ge 2$, and so $e(n_1) e(n_2) = 0$. Hence we have $g(n_1 n_2) = g(n_1) g(n_2)$. Thus, (1.6) holds for pairs $(n_1, n_2)$ of relatively prime integers with $n_1 n_2 = n$, and the induction argument is complete.
(iii) The identity $f * g = h$ implies $g = f^{-1} * h$, where $f^{-1}$ is the Dirichlet inverse of $f$. Since $f$ and $h$ are multiplicative functions, so is $f^{-1}$ (by (ii)) and $f^{-1} * h$ (by (i)). Hence $g$ is multiplicative as well.
(iv) If $h$ is completely multiplicative, then for any divisor $d \mid n$ we have $h(n) = h(d) h(n/d)$. Hence, for all $n$,
$$h(f * g)(n) = h(n) \sum_{d \mid n} f(d) g(n/d) = \sum_{d \mid n} h(d) f(d) h(n/d) g(n/d) = ((hf) * (hg))(n),$$
proving (iv).
Application I: Proving identities for multiplicative arithmetic functions. The above results can be used to provide simple proofs of identities for arithmetic functions, using the multiplicativity of the functions involved. To prove an identity of the form $f * g = h$ in the case when $f$, $g$, and $h$ are known to be multiplicative functions, one simply shows, by direct calculation, that $(*)$ $(f * g)(p^m) = h(p^m)$ holds for every prime power $p^m$. Since, by the above theorem, the multiplicativity of $f$ and $g$ implies that of $f * g$, and since multiplicative functions are uniquely determined by their values at prime powers, $(*)$ implies that the identity $(f * g)(n) = h(n)$ holds for all $n \in \mathbb{N}$.
Examples
(1) Alternate proof of the identity $\sum_{d \mid n} \mu(d) = e(n)$. The identity can be written as $\mu * 1 = e$, and since all three functions involved are multiplicative, it suffices to verify that the identity holds on prime powers. Since $e(p^m) = 0$ and $(\mu * 1)(p^m) = \sum_{k=0}^{m} \mu(p^k) = 1 - 1 + 0 + \dots + 0 = 0$, this is indeed the case.
(2) Proof of $\sum_{d \mid n} \mu^2(d)/\phi(d) = n/\phi(n)$. This identity is of the form $f * 1 = g$ with $f = \mu^2/\phi$ and $g = \mathrm{id}/\phi$. The functions $f$ and $g$ are both quotients of multiplicative functions and therefore are multiplicative. Hence all three functions in the identity $f * 1 = g$ are multiplicative, and it suffices to verify the identity at prime powers. We have $g(p^m) = p^m/\phi(p^m) = p^m/(p^m - p^{m-1}) = (1 - 1/p)^{-1}$, and $(f * 1)(p^m) = \sum_{k=0}^{m} (\mu^2(p^k)/\phi(p^k)) = 1 + 1/(p - 1) = (1 - 1/p)^{-1}$, and so $g(p^m) = (f * 1)(p^m)$ for every prime power $p^m$. Thus the identity holds at prime powers, and therefore it holds in general.
(3) The Dirichlet inverse of $\lambda$. Since $\mu * 1 = e$, the function 1 is the Dirichlet inverse of the Moebius function. To find the Dirichlet inverse of $\lambda$, i.e., the unique function $f$ such that $f * \lambda = e$, note first that since $\lambda$ and $e$ are both multiplicative, $f$ must be multiplicative as well, and it therefore suffices to evaluate $f$ at prime powers. Now, for any prime power $p^m$,
$$0 = e(p^m) = \sum_{k=0}^{m} f(p^k) \lambda(p^{m-k}) = \sum_{k=0}^{m} f(p^k) (-1)^{m-k},$$
so $f(p^m) = -\sum_{k=0}^{m-1} f(p^k) (-1)^{m-k}$. This implies $f(p) = 1$, and by induction $f(p^m) = 0$ for $m \ge 2$. Hence $f$ is the characteristic function of the squarefree numbers, i.e., $\lambda^{-1} = \mu^2$.
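The conclusion $\lambda^{-1} = \mu^2$ can be confirmed numerically by checking that $\mu^2 * \lambda = e$ on an initial range of integers. This is a minimal sketch, not part of the original notes; it assumes the standard definition $\lambda(n) = (-1)^{\Omega(n)}$ of the Liouville function, and the helper names are hypothetical.

```python
def prime_exponents(n):
    """Exponents in the prime factorization of n (trial division)."""
    exps = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            m = 0
            while n % p == 0:
                n //= p
                m += 1
            exps.append(m)
        p += 1
    if n > 1:
        exps.append(1)   # one remaining prime factor
    return exps

def liouville(n):
    """lambda(n) = (-1)^(number of prime factors counted with multiplicity)."""
    return (-1) ** sum(prime_exponents(n))

def mu_squared(n):
    """Characteristic function of the squarefree numbers."""
    return 1 if all(m == 1 for m in prime_exponents(n)) else 0

def convolve(f, g, n):
    """Value of the Dirichlet product f * g at n."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

# mu^2 * lambda should equal e, i.e., 1 at n = 1 and 0 elsewhere.
assert all(convolve(mu_squared, liouville, n) == (1 if n == 1 else 0)
           for n in range(1, 300))
```

A check on a finite range is of course no substitute for the prime-power argument above; it only guards against sign or indexing slips.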
Application II: Evaluating Dirichlet products of multiplicative functions. Since the Dirichlet product of multiplicative functions is multiplicative, and since a multiplicative function is determined by its values on prime powers, to evaluate a product $f * g$ with both $f$ and $g$ multiplicative, it suffices to compute the values of $f * g$ at prime powers. By comparing these values to those of familiar arithmetic functions, one can often identify $f * g$ in terms of familiar arithmetic functions.
Examples
(1) The function $\lambda * 1$. We have $(\lambda * 1)(p^m) = \sum_{k=0}^{m} \lambda(p^k) = \sum_{k=0}^{m} (-1)^k$, which equals 1 if $m$ is even, and 0 otherwise. However, the latter values are exactly the values at prime powers of the characteristic function of the squares, which is easily seen to be multiplicative. Hence $\lambda * 1$ is equal to the characteristic function of the squares.
(2) The function $f_k(n) = \sum_{d \mid n,\ (d,k)=1} \mu(d)$. Here $k$ is a fixed positive integer, and the summation runs over those divisors of $n$ that are relatively prime to $k$. We have $f_k = g_k * 1$, where $g_k(n) = \mu(n)$ if $(n, k) = 1$ and $g_k(n) = 0$ otherwise. It is easily seen that $g_k$ is multiplicative, so $f_k$ is also multiplicative. On prime powers $p^m$, $g_k(p^m) = -1$ if $m = 1$ and $p \nmid k$, and $g_k(p^m) = 0$ otherwise, so $f_k(p^m) = \sum_{i=0}^{m} g_k(p^i) = 1 - 1 = 0$ if $p \nmid k$, and $f_k(p^m) = 1$ otherwise. By the multiplicativity of $f_k$ it follows that $f_k$ is the characteristic function of the set $A_k = \{n \in \mathbb{N} : p \mid n \Rightarrow p \mid k\}$.
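The description of $f_k$ as the characteristic function of $A_k$ can be checked directly from the definition. The sketch below is not part of the original notes; the names `f_k` and `in_A_k` and the sample moduli are assumptions made for illustration.

```python
from math import gcd

def mobius(n):
    """Moebius function via trial division (hypothetical helper)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def f_k(n, k):
    """Sum of mu(d) over the divisors d of n that are coprime to k."""
    return sum(mobius(d) for d in range(1, n + 1)
               if n % d == 0 and gcd(d, k) == 1)

def in_A_k(n, k):
    """True if every prime divisor of n also divides k."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            if k % p != 0:
                return False
            while n % p == 0:
                n //= p
        p += 1
    # What remains is 1 or a single prime factor of the original n.
    return n == 1 or k % n == 0

for k in (1, 6, 10):   # sample values of k, chosen arbitrarily
    assert all(f_k(n, k) == int(in_A_k(n, k)) for n in range(1, 200))
```

For $k = 1$ the set $A_1$ reduces to $\{1\}$, and the check recovers the identity $\sum_{d \mid n} \mu(d) = e(n)$.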
Application III: Proving the multiplicativity of functions, using known identities. This is, in a sense, the previous application in reverse. Suppose we know that $f * g = h$ and that $f$ and $h$ are multiplicative. Then, by Theorem 1.10, $g$ must be multiplicative as well.
Examples
(1) Multiplicativity of $\phi$. Since $\phi * 1 = \mathrm{id}$ (see Theorem 1.5) and the functions 1 and id are (obviously) multiplicative, the function $\phi$ must be multiplicative as well. This is the promised proof of the multiplicativity of the Euler function (part (ii) of Theorem 1.5).
(2) Multiplicativity of $d(n)$ and $\sigma(n)$. Since $d = 1 * 1$, and the function 1 is multiplicative, the function $d$ is multiplicative as well. Similarly, since $\sigma = \mathrm{id} * 1$, and 1 and id are multiplicative, $\sigma$ is multiplicative.
1.8 Exercises
1.1 Evaluate the function $f(n) = \sum_{d^2 \mid n} \mu(d)$ (where the summation runs over all positive integers $d$ such that $d^2 \mid n$), in the sense of expressing it in terms of familiar arithmetic functions.
1.2 The unitary divisor function $d^*(n)$ is defined by $d^*(n) = \#\{(a, b) \in \mathbb{N}^2 : ab = n,\ (a, b) = 1\}$.
Show that d
$\sum_{d \mid n} \frac{1}{d} f\left(\frac{n}{d}\right)$ $(n \in \mathbb{N})$.
1.4 For each of the following arithmetic functions, evaluate the function, or express it in terms of familiar arithmetic functions.
(i) $g_k(n) = \sum_{d \mid n,\ (d,k)=1} \mu(d)$, where $k \in \mathbb{N}$ is fixed. (Here the summation runs over all $d \in \mathbb{N}$ that satisfy $d \mid n$ and $(d, k) = 1$.)
(ii) $h_k(n) = \sum_{d \mid n,\ k \mid d} \mu(d)$, where $k \in \mathbb{N}$ is fixed.
1.5 Show that, for every positive integer $n \ge 2$,
$$\sum_{\substack{1 \le k \le n-1 \\ (k,n)=1}} k = \frac{n}{2} \phi(n).$$
1.6 Let $f(n) = \sum_{d \mid n} \mu(d) \log d$. Find a simple expression for $f(n)$ in terms of familiar arithmetic functions.
1.7 Let $f(n) = \#\{(n_1, n_2) \in \mathbb{N}^2 : [n_1, n_2] = n\}$, where $[n_1, n_2]$ is the least common multiple of $n_1$ and $n_2$. Show that $f$ is multiplicative and evaluate $f$ at prime powers.
1.8 Let $f$ be a multiplicative function. We know that the Dirichlet inverse $f^{-1}$ is then also multiplicative. Show that $f^{-1}$ is completely multiplicative if and only if $f(p^m) = 0$ for all prime powers $p^m$ with $m \ge 2$ (i.e., if and only if $f$ is supported by the squarefree numbers).
1.9 Given an arithmetic function $f$, a "Dirichlet square root" of $f$ is an arithmetic function $g$ such that $g * g = f$. Prove by elementary techniques that the constant function 1 has two Dirichlet square roots, of the form $\pm g$, where $g$ is a multiplicative function, and find the values of $g$ at prime powers.
1.10 Let $f(n) = \phi(n)/n$, and let $\{n_k\}_{k=1}^{\infty}$ be the sequence of values $n$ at which $f$ attains a "record low"; i.e., $n_1 = 1$ and, for $k \ge 2$, $n_k$ is defined as the smallest integer $> n_{k-1}$ with $f(n_k) < f(n)$ for all $n < n_k$. (For example, since the first few values of the sequence $f(n)$ are $1, 1/2, 2/3, 1/2, 4/5, 1/3, \dots$, we have $n_1 = 1$, $n_2 = 2$, and $n_3 = 6$, and the corresponding values of $f$ at these arguments are $1$, $1/2$ and $1/3$.) Find (with proof) a general formula for $n_k$ and $f(n_k)$.
1.11 Let $f$ be a multiplicative function satisfying $\lim_{p^m \to \infty} f(p^m) = 0$. Show that $\lim_{n \to \infty} f(n) = 0$.
1.12 An arithmetic function $f$ is called periodic if there exists a positive integer $k$ such that $f(n + k) = f(n)$ for every $n \in \mathbb{N}$; the integer $k$ is called a period for $f$. Show that if $f$ is completely multiplicative and periodic with period $k$, then the values of $f$ are either 0 or roots of unity. (A root of unity is a complex number $z$ such that $z^n = 1$ for some $n \in \mathbb{N}$.)
Chapter 2
Arithmetic functions II:
Asymptotic estimates
The values of most common arithmetic functions $f(n)$, such as the divisor function $d(n)$ or the Moebius function $\mu(n)$, depend heavily on the arithmetic nature of the argument $n$. As a result, such functions exhibit a seemingly chaotic behavior when plotted or tabulated as functions of $n$, and it does not make much sense to seek an "asymptotic formula" for $f(n)$.
However, it turns out that most natural arithmetic functions are very well behaved on average, in the sense that the arithmetic means $M_f(x) = (1/x) \sum_{n \le x} f(n)$, or, equivalently, the "summatory functions" $S_f(x) = \sum_{n \le x} f(n)$, behave smoothly as $x \to \infty$ and can often be estimated very accurately. In this chapter we discuss the principal methods to derive such estimates. Aside from the intrinsic interest of studying the behavior of $M_f(x)$ or $S_f(x)$, these quantities arise naturally in a variety of contexts, and having good estimates available is crucial for a number of applications. Here are some examples, all of which will be discussed in detail later in this chapter.
(1) The number of Farey fractions of order $Q$, i.e., the number of rationals in the interval $(0, 1)$ whose denominator in lowest terms is $\le Q$, is equal to $S_\phi(Q)$, where $S_\phi(x) = \sum_{n \le x} \phi(n)$ is the summatory function of the Euler phi function.
(2) The "probability" that two randomly chosen positive integers are coprime is equal to the limit $\lim_{x \to \infty} 2 S_\phi(x)/x^2$, which turns out to be $6/\pi^2$.
(3) The "probability" that a randomly chosen positive integer is squarefree is equal to the "mean value" of the function $\mu^2(n)$ $(= |\mu(n)|)$, i.e., the
limit lim
x
M
$g(x)$, indicates that the constants $x_0$ and $c$ implicit in the estimate may depend on $\lambda$. If the constants can be chosen independently of $\lambda$, for $\lambda$ in some range, then the estimate is said to hold uniformly in that range. Dependence on several parameters is indicated in an analogous way, as in $f(x) \ll_{\lambda,\alpha} g(x)$.
Oh and oh expressions in equations. A term $O(g(x))$ in an equation stands for a function $R(x)$ satisfying the estimate $R(x) = O(g(x))$. For example, the notation $f(x) = x(1 + O(1/\log x))$ means that there exist constants $x_0$ and $c$ and a function $\delta(x)$ defined for $x \ge x_0$ and satisfying $|\delta(x)| \le c/\log x$ for $x \ge x_0$, such that $f(x) = x(1 + \delta(x))$ for $x \ge x_0$.
"Big oh" versus "small oh". "Big oh" estimates provide more information than "small oh" estimates or asymptotic formulas, since they give explicit bounds for the error terms involved. An o-estimate, or an asymptotic formula, only shows that a certain function tends to zero, but does not provide any information on the rate at which this function tends to zero. For example, the asymptotic relation $f(x) \sim g(x)$, or equivalently, the o-estimate $f(x) = g(x) + o(g(x))$, means that the ratio $(g(x) - f(x))/g(x)$ tends to zero, whereas a corresponding O-estimate, such as $f(x) = g(x) + O(\eta(x) g(x))$, with an explicit function $\eta(x)$ (e.g., $\eta(x) = 1/\log x$), provides additional information on the speed of convergence.
O-estimates are also easier to work with and to manipulate than o-estimates. For example, O's can be "pulled out" of integrals or sums provided the functions involved are nonnegative, whereas such manipulations are in general not allowed with o's.
For the above reasons, O-estimates are much more useful than o-estimates, and one should therefore try to state and prove results in terms of O-estimates. It is very rare that one can prove an o-estimate without getting an explicit bound for the o-term, and hence a more precise O-estimate, by the same argument.
2.1.3 Examples
Examples from analysis
(1) $x^\alpha = O_{\alpha,c}(e^{cx})$ for any fixed real numbers $\alpha$ and $c > 0$.
(2) $\exp(x^\alpha) \sim \exp((x + 1)^\alpha)$ if $\alpha < 1$.
(3) log x = O
(x
,
exp((log x)
)
A,
(log x)
A
).
The proofs of such estimates are usually straightforward exercises at the calculus level. To illustrate some typical arguments, we give the proofs of (1), (2), and (5):
Proof of (1). Let $\alpha$ and $c > 0$ be given. We need to show that there exist constants $C$ and $x_0$ such that $x^\alpha \le C e^{cx}$ for $x \ge x_0$. Setting $f(x) = x^\alpha e^{-cx}$, this is equivalent to showing that $f(x)$ is bounded for sufficiently large $x$. This, however, follows immediately on noting that (i) $f(x)$ tends to zero as $x \to \infty$ (which can be seen by l'Hôpital's rule) and (ii) $f(x)$ is continuous on the positive real axis, and hence must attain a maximal value on the interval $[1, \infty)$. An alternative argument, which yields explicit values for the constants $C$ and $x_0$, runs as follows:
If $\alpha \le 0$, then for $x \ge 1$ we have $f(x) = x^\alpha e^{-cx} \le 1$, so the desired bound holds with constants $C = x_0 = 1$. Assume therefore that $\alpha > 0$. We have $\log f(x) = \alpha \log x - cx$ and hence $f'(x)/f(x) = \alpha/x - c$. Hence $f'(x) \le 0$ for $x \ge \alpha/c$, and setting $x_0 = \alpha/c$ and $C = x_0^\alpha e^{-\alpha}$ it follows that $f(x) \le f(x_0) = x_0^\alpha e^{-cx_0} = C$ for $x \ge x_0$.
Proof of (2). Let $f(x) = \exp(x^\alpha - (x + 1)^\alpha)$, so that the claim is equivalent to $f(x) \to 1$, i.e., to $x^\alpha - (x + 1)^\alpha \to 0$, as $x \to \infty$. We have $|x^\alpha - (x + 1)^\alpha| \le |\alpha| \max_{x \le y \le x+1} y^{\alpha - 1}$ by the mean value theorem of calculus, and the expression on the right here tends to zero as $x \to \infty$, since $\alpha < 1$.
Proof of (5). In the range $|x| \le 1$, the estimate $\cos x - 1 = O(x^2)$ with 1 as O-constant follows immediately from the mean value theorem (or, what amounts to the same argument, Taylor's series for $\cos x$ truncated after the first term and with an explicit error term):
$$|\cos x - 1| = |\cos 0 - \cos x| \le |x| \max_{0 \le |y| \le |x|} |\sin y| \le x^2,$$
where in the last step we used the fact that $|\sin y|$ is increasing and bounded by $|y|$ on the interval $[-1, 1]$. To extend the range for this estimate to all of $\mathbb{R}$, we only need to observe that, since $|\cos x| \le 1$ for all $x$, we have $|\cos x - 1| \le 2$ for all $x$, and hence $|\cos x - 1| \le 2x^2$ for $|x| \ge 1$. Thus, the desired estimate holds for all $x$ with O-constant 2.
Examples from number theory
(1) The prime number theorem (PNT) is the statement that $\pi(x) \sim x/\log x$, or, equivalently, $\pi(x) = x/\log x + o(x/\log x)$. Here $\pi(x)$ is the number of primes not exceeding $x$.
(2) A sharper form of the PNT asserts that $\pi(x) = x/\log x + O(x/(\log x)^2)$. Factoring out the main term $x/\log x$, this estimate can also be written as $\pi(x) = (x/\log x)(1 + O(1/\log x))$, which shows that $O(1/\log x)$ is the relative error in the approximation of $\pi(x)$ by $x/\log x$.
(3) A still sharper form involves the approximation $\mathrm{Li}(x) = \int_2^x (1/\log t)\,dt$.
The currently best known estimate for (x) is of the form (x) =
Li(x) +O
(xexp((log x)
(x
1/2+
) for any xed > 0.
Additional examples and remarks
The following examples and remarks illustrate common uses of the O- and o-notations. The proofs are immediate consequences of the definitions.
(1) A commonly seen O-estimate is $f(x) = O(1)$. This simply means that $f(x)$ is bounded for sufficiently large $x$ (or for all $x$ in a given range). Similarly $f(x) = o(1)$ means that $f(x)$ tends to 0 as $x \to \infty$.
(2) If $C$ is a positive constant, then the estimate $f(x) = O(Cg(x))$ is equivalent to $f(x) = O(g(x))$. In particular, the estimate $f(x) = O(C)$ is equivalent to $f(x) = O(1)$. The same holds for o-estimates.
(3) O-estimates are transitive, in the sense that if $f(x) = O(g(x))$ and $g(x) = O(h(x))$, then $f(x) = O(h(x))$.
(4) As an application of this transitivity and the basic estimates above we have, for example, $\log(1 + O(f(x))) = O(f(x))$ and $1/(1 + O(f(x))) = 1 + O(f(x))$ whenever $f(x) \to 0$ as $x \to \infty$. (The latter condition ensures that the function represented by the term $O(f(x))$ is bounded by $1/2$ for sufficiently large $x$, so the estimates $\log(1 + y) = O(y)$ and $(1 + y)^{-1} = 1 + O(y)$ are applicable with $y$ being the function represented by $O(f(x))$.)
(5) If $f(x) = g(x) + O(1)$, then $e^{f(x)} \asymp e^{g(x)}$, and vice versa.
(6) If $f(x) = g(x) + o(1)$, then $e^{f(x)} \sim e^{g(x)}$, and vice versa.
(7) O's can be pulled out of sums or integrals provided the function inside the O-term is nonnegative. For example, if $F(x) = \int_0^x f(y)\,dy$ and $f(y) = O(g(y))$ for $y \ge 0$, where $g$ is a nonnegative function, then $F(x) = O\left(\int_0^x g(y)\,dy\right)$. (This does not hold without the nonnegativity condition, nor does an analogous result hold for o-estimates; for counterexamples see the exercises.)
(8) According to our convention, an asymptotic estimate for a function of $x$ without an explicitly given range is understood to hold for $x \ge x_0$ for a suitable $x_0$. This is convenient as many estimates (e.g., $\log\log x = O(\sqrt{\log x})$) do not hold, or do not make sense, for small values of $x$, and the convention allows one to just ignore those issues. However, there are applications in which it is desirable to have an estimate involving a simple explicit range for $x$, such as $x \ge 1$, instead of an unspecified range like $x \ge x_0$ with a "sufficiently large" $x_0$. This can often be accomplished in two steps as follows: First establish the desired estimate for $x \ge x_0$, with a certain $x_0$. Then use direct (and usually trivial) arguments to show that the estimate also holds for $1 \le x \le x_0$. For example, one form of the PNT states that $(*)$ $\pi(x) = \mathrm{Li}(x) + O(x(\log x)^{-2})$. Suppose we have established $(*)$ for $x \ge x_0$. To show that $(*)$ in fact holds for $x \ge 2$, we can argue as follows: In the range $2 \le x \le x_0$ the functions $\pi(x)$ and $\mathrm{Li}(x)$ are bounded from above, say $|\pi(x)|, |\mathrm{Li}(x)| \le A$ for $2 \le x \le x_0$ and some constant $A$ (depending on $x_0$). On the other hand, the function in the error term, $x(\log x)^{-2}$, is bounded from below by a positive constant, say $\delta > 0$, in this range. (For example, we can take $\delta = 2(\log x_0)^{-2}$.) Hence, for $2 \le x \le x_0$ we have
$$|\pi(x) - \mathrm{Li}(x)| \le 2A \le \frac{2A}{\delta}\, x(\log x)^{-2} \quad (2 \le x \le x_0),$$
so $(*)$ holds for $2 \le x \le x_0$ with $c = 2A/\delta$ as O-constant.
2.1.4 The logarithmic integral
The logarithmic integral is the function $\mathrm{Li}(x)$ defined by
$$\mathrm{Li}(x) = \int_2^x (\log t)^{-1}\,dt \quad (x \ge 2).$$
This integral is important in number theory as it represents the best known approximation to the prime counting function $\pi(x)$. The integral cannot be evaluated exactly (in terms of elementary functions), but the following theorem gives (a sequence of) asymptotic estimates for the integral in terms of elementary functions.
Theorem 2.1 (The logarithmic integral). For any fixed positive integer $k$ we have
$$\mathrm{Li}(x) = \frac{x}{\log x} \left( \sum_{i=0}^{k-1} \frac{i!}{(\log x)^i} + O_k\left( \frac{1}{(\log x)^k} \right) \right) \quad (x \ge 2).$$
In particular, we have $\mathrm{Li}(x) = x/\log x + O(x/(\log x)^2)$ for $x \ge 2$.
To prove the result we require a crude estimate for a generalized version of the logarithmic integral, namely
$$\mathrm{Li}_k(x) = \int_2^x (\log t)^{-k}\,dt \quad (x \ge 2),$$
where $k$ is a positive integer (so that $\mathrm{Li}_1(x) = \mathrm{Li}(x)$). This result is of independent interest, and its proof is a good illustration of the method of splitting the range of integration.
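Lemma 2.2. For any fixed positive integer $k$ we have
$$\mathrm{Li}_k(x) \ll_k \frac{x}{(\log x)^k} \quad (x \ge 2).$$
Proof. First note that the bound holds trivially in any range of the form $2 \le x \le x_0$ (with the O-constant depending on $x_0$). We therefore may assume that $x \ge 4$. In this case we have $2 \le \sqrt{x} \le x$, so that we can split the range of integration at $\sqrt{x}$. In the range $2 \le t \le \sqrt{x}$ the integrand is bounded by $(\log 2)^{-k}$, so the integral over this range is at most $(\log 2)^{-k}(\sqrt{x} - 2) \ll_k \sqrt{x} \ll_k x(\log x)^{-k}$, which is of the desired order of magnitude. In the range $\sqrt{x} \le t \le x$ the integrand is bounded by $(\log \sqrt{x})^{-k} = 2^k (\log x)^{-k}$, so the integral over this range is at most $2^k (\log x)^{-k}(x - \sqrt{x}) \ll_k x(\log x)^{-k}$, which again is of the desired order of magnitude.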
Remark. The choice of $\sqrt{x}$ as the splitting point is sufficient to obtain the asserted upper bound for $\mathrm{Li}_k(x)$, but it is not optimal and choosing a larger division point allows one to derive a more accurate estimate for $\mathrm{Li}_k(x)$ (which, however, is still inferior to what can be obtained with the integration by parts method that we will use to prove Theorem 2.1). For example, splitting the integral at $x(\log x)^{-k-1}$, the contribution of the lower subrange is $\ll_k x(\log x)^{-k-1}$, whereas in the upper subrange the integrand can be approximated as follows:
$$(\log t)^{-k} = (\log x + \log(t/x))^{-k} = (\log x)^{-k} \left( 1 + O_k\left( \frac{\log\log x}{\log x} \right) \right).$$
This leads to the estimate
$$\mathrm{Li}_k(x) = \frac{x}{(\log x)^k} \left( 1 + O_k\left( \frac{\log\log x}{\log x} \right) \right).$$
Proof of Theorem 2.1. Integration by parts shows that, for $i = 1, 2, \dots$,
\[ \mathrm{Li}_i(x) = \frac{x}{(\log x)^i} - \frac{2}{(\log 2)^i} - \int_2^x t\,\frac{-i}{(\log t)^{i+1}\,t}\,dt = \frac{x}{(\log x)^i} - \frac{2}{(\log 2)^i} + i\,\mathrm{Li}_{i+1}(x). \]
Applying this identity successively for $i = 1, 2, \dots, k$ (or, alternatively, using induction on $k$) gives
\[ \mathrm{Li}(x) = \mathrm{Li}_1(x) = O_k(1) + \sum_{i=1}^{k} \frac{(i-1)!\,x}{(\log x)^i} + k!\,\mathrm{Li}_{k+1}(x). \]
(Here the term $O_k(1)$ absorbs the constant terms $-2(\log 2)^{-i}$ that arise when using the above identity for each $i = 1, 2, \dots, k$.) Since $\mathrm{Li}_{k+1}(x) \ll_k x(\log x)^{-k-1}$ by Lemma 2.2, the asserted estimate follows.
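The expansion just derived can be explored numerically. The following is a minimal sketch, not part of the original notes: the function names `li_k` and `truncated_series`, the use of Simpson's rule, and the sample point $x = 10^6$ are all illustrative assumptions. It evaluates $\mathrm{Li}_k(x)$ after the substitution $t = e^u$ and compares $\mathrm{Li}(x)$ with the truncated main term, ignoring the $O_k(1)$ constant.

```python
import math

def li_k(x, k=1, steps=20000):
    """Approximate Li_k(x) = int_2^x (log t)^(-k) dt by Simpson's rule.

    The substitution t = e^u turns the integral into
    int_{log 2}^{log x} e^u / u^k du, which has a smooth integrand.
    `steps` must be even.
    """
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    g = lambda u: math.exp(u) / u**k
    total = g(a) + g(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def truncated_series(x, k):
    """Main term sum_{i=1}^k (i-1)! x / (log x)^i; the O_k(1) term is dropped."""
    L = math.log(x)
    return sum(math.factorial(i - 1) * x / L**i for i in range(1, k + 1))

x = 1e6
exact = li_k(x)
# Error of the truncated series for k = 1, 2, 3, 4.
errors = [abs(exact - truncated_series(x, k)) for k in range(1, 5)]
for k, err in enumerate(errors, start=1):
    print(k, err)
```

At this value of $x$ the error shrinks as $k$ grows from 1 to 4; for fixed $x$ this cannot continue indefinitely, since the factor $(i-1)!$ eventually dominates.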
Remark. Note that, because of the factor $i!$, the series in the main term diverges if one lets $k \to \infty$. The resulting infinite series $\sum_{i=0}^{\infty} i!\,(\log x)^{-i}$ is an example of a so-called "asymptotic series", a series that diverges everywhere, but which, when truncated at some level $k$, behaves like an ordinary convergent series, in the sense that the error introduced by truncating the series at the $k$th term has an order of magnitude equal to that of the next term in the series.
2.2 Sums of smooth functions: Euler's summation formula
2.2.1 Statement of the formula
The simplest types of sums $\sum_{n\le x} f(n)$ are those in which $f$ is a smooth function that is defined for real arguments $x$. Sums of this type are of interest in their own right (for example, Stirling's formula for $n!$ is equivalent to an estimate for a sum of the above type with $f(n) = \log n$; see Theorem 2.6 and Corollary 2.7 below), but they also occur in the process of estimating sums of arithmetic functions like the divisor function or the Euler phi function (see the following sections).
The basic idea for handling such sums is to approximate the sum by a corresponding integral and investigate the error made in the process. The following important result, known as Euler's summation formula, gives an exact formula for the difference between such a sum and the corresponding integral.
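Theorem 2.3 (Euler's summation formula). Let $0 < y \le x$, and suppose $f(t)$ is a function defined on the interval $[y, x]$ and having a continuous derivative there. Then
\[ \sum_{y<n\le x} f(n) = \int_y^x f(t)\,dt + \int_y^x \{t\}f'(t)\,dt - \{x\}f(x) + \{y\}f(y), \]
where $\{t\}$ denotes the fractional part of $t$.
In most applications one needs to estimate a sum of the form $\sum_{n\le x} f(n)$, taken over all positive integers $n \le x$. In this case, Euler's summation formula reduces to the following result: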
Corollary 2.4 (Euler's summation formula, special case). Let $x \ge 1$ and suppose that $f(t)$ is defined on $[1, x]$ and has a continuous derivative on this interval. Then we have
\[ \sum_{n\le x} f(n) = \int_1^x f(t)\,dt + \int_1^x \{t\}f'(t)\,dt - \{x\}f(x) + f(1). \]
(t)tdt.
Combining these formulas gives the desired identity.
Remark. The above proof is quite simple and intuitive, and motivates the particular form of the Euler summation formula. However, it is less elementary in that it depends on a concept beyond the calculus level, namely the Stieltjes integral. In Section 2.3.1 we will give an independent, more elementary, proof; in fact, we will prove a more general result (the partial summation formula), of which Euler's summation formula is a corollary.
2.2.2 Partial sums of the harmonic series
Euler's summation formula has numerous applications in number theory and analysis. We will give here three such applications; the first is to the partial sums of the harmonic series.
Theorem 2.5 (Partial sums of the harmonic series). We have
\[ \sum_{n\le x} \frac{1}{n} = \log x + \gamma + O\left(\frac{1}{x}\right) \quad (x \ge 1), \]
where $\gamma = \lim_{x\to\infty}\bigl(\sum_{n\le x} 1/n - \log x\bigr) = 0.5772\ldots$ is a constant, the so-called Euler constant.
Remark. The error term O(1/x) here is best-possible, since the left-hand
side has a jump of size 1/x whenever x crosses an integer, while the main
term on the right is continuous in x.
Proof. Let $S(x) = \sum_{n\le x} 1/n$. By Euler's summation formula (in the version given by Corollary 2.4) we have, for $x \ge 1$,
\[ \sum_{n\le x}\frac{1}{n} = \int_1^x \frac{1}{t}\,dt + \int_1^x \{t\}\,\frac{-1}{t^2}\,dt - \frac{\{x\}}{x} + 1 = \log x - I(x) + 1 + O\left(\frac{1}{x}\right), \]
where $I(x) = \int_1^x \{t\}t^{-2}\,dt$. To obtain the desired estimate, it suffices to show that $I(x) = 1 - \gamma + O(1/x)$ for $x \ge 1$. To estimate the integral $I(x)$, we employ the following trick: We extend the integration in the integral to infinity and estimate the tail of the integral. Since the integrand is bounded by $1/t^2$, the integral converges absolutely when extended to infinity, and therefore equals a finite constant, say $I$, and we have
\[ I(x) = I - \int_x^\infty \frac{\{t\}}{t^2}\,dt = I + O\left(\int_x^\infty \frac{1}{t^2}\,dt\right) = I + O\left(\frac{1}{x}\right) \quad (x \ge 1). \]
We have thus shown that $S(x) = \log x + 1 - I + O(1/x)$ for $x \ge 1$. In particular, this implies that $S(x) - \log x$ converges to $1 - I$ as $x \to \infty$. Since, by definition, $\gamma = \lim_{x\to\infty}(S(x) - \log x)$, we have $1 - I = \gamma$, and thus obtain the desired formula.
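As a quick numerical illustration of Theorem 2.5, the sketch below tabulates the error $S(x) - \log x - \gamma$ and its product with $x$, which the theorem says stays bounded. The helper name `harmonic_error`, the hard-coded decimal value of $\gamma$, and the sample points are assumptions made for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # numerical value of Euler's constant

def harmonic_error(x):
    """Return S(x) - log x - gamma, where S(x) = sum_{n <= x} 1/n."""
    s = sum(1.0 / n for n in range(1, int(math.floor(x)) + 1))
    return s - math.log(x) - EULER_GAMMA

for x in (10, 100.5, 1000, 9999.99):
    err = harmonic_error(x)
    # The last column is x * |error|; it should stay bounded as x grows.
    print(x, err, abs(err) * x)
```

At integer arguments the error is positive, while just below an integer it is negative, which reflects the jumps of size $1/x$ mentioned in the remark above.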
2.2.3 Partial sums of the logarithmic function and Stirling's formula
Our second application of Euler's summation formula is a proof of the so-called Stirling formula, which gives an asymptotic estimate for $N!$, where $N$ is a (large) integer. This formula will be an easy consequence of the following estimate for the logarithm of $N!$, $\log N! = \sum_{n\le N}\log n$, which is a sum to which Euler's summation formula can be applied.
Theorem 2.6 (Partial sums of the logarithmic function). We have
\[ \sum_{n\le N}\log n = N(\log N - 1) + \frac{1}{2}\log N + c + O\left(\frac{1}{N}\right) \quad (N \in \mathbb{N}), \]
where $c$ is a constant.
Proof. Let $S(N) = \sum_{n\le N}\log n$. Applying Euler's summation formula (again in the special case provided by Corollary 2.4), and noting that $\{N\} = 0$ since $N$ is an integer, we obtain
\[ S(N) = I_1(N) + I_2(N) \]
with
\[ I_1(N) = \int_1^N (\log t)\,dt = t(\log t - 1)\Big|_1^N = N\log N - N + 1 \]
and
\[ I_2(N) = \int_1^N \frac{\{t\}}{t}\,dt = \int_1^N \frac{1/2}{t}\,dt + \int_1^N \frac{\rho(t)}{t}\,dt = \frac{1}{2}\log N + I_3(N), \]
where $\rho(t) = \{t\} - 1/2$ is the "row of teeth" function and $I_3(N) = \int_1^N (\rho(t)/t)\,dt$. Combining these formulas gives
\[ S(N) = N\log N - N + \frac{1}{2}\log N + 1 + I_3(N). \]
Thus, to obtain the desired estimate, it suffices to show that $I_3(N) = c' + O(1/N)$, for some constant $c'$.
We begin with an integration by parts to get
\[ I_3(N) = \frac{R(t)}{t}\Big|_1^N + I_4(N), \]
where
\[ R(t) = \int_1^t \rho(u)\,du, \qquad I_4(x) = \int_1^x \frac{R(t)}{t^2}\,dt. \]
Since $\rho(t)$ is periodic with period 1, $|\rho(t)| \le 1/2$ for all $t$, and $\int_k^{k+1}\rho(t)\,dt = 0$ for any integer $k$, we have $R(t) = 0$ whenever $t$ is an integer, and $|R(t)| \le 1/2$ for all $t$. Hence the terms $R(t)/t$ vanish at $t = 1$ and $t = N$, so we have $I_3(N) = I_4(N)$. Now, the integral $I_4(x)$ converges as $x \to \infty$, since its integrand is bounded by $|R(t)|t^{-2} \le (1/2)t^{-2}$, and we therefore have
\[ I_4(N) = I - \int_N^\infty \frac{R(t)}{t^2}\,dt = I + O\left(\int_N^\infty \frac{1}{t^2}\,dt\right) = I + O\left(\frac{1}{N}\right), \]
where
\[ I = \int_1^\infty \frac{R(t)}{t^2}\,dt. \]
(Note here again the trick of extending a convergent integral to infinity and estimating the tail.) Hence we have $I_3(N) = I_4(N) = I + O(1/N)$, as we wanted to show.
We now use this estimate to prove (modulo the evaluation of a constant) Stirling's formula for $n!$.
Corollary 2.7 (Stirling's formula). If $n$ is a positive integer, then
\[ n! = C\sqrt{n}\,n^n e^{-n}\left(1 + O\left(\frac{1}{n}\right)\right), \]
where $C$ is a constant.
Remark. One can show that the constant $C$ is equal to $\sqrt{2\pi}$, and Stirling's formula is usually stated with this explicit value of the constant. However, proving this is far from easy, and since the value of the constant is not important for our applications, we will not pursue this here. The argument roughly goes as follows: Consider the sum $\sum_{k=0}^{n}\binom{n}{k}$. On the one hand, this sum is exactly equal to $2^n$. On the other hand, by expressing the binomial coefficients in terms of factorials and estimating the factorials by Stirling's formula one can obtain an estimate for this sum involving the Stirling constant $C$. Comparing the two evaluations one obtains $C = \sqrt{2\pi}$.
Proof. Since $n! = \exp\bigl\{\sum_{k\le n}\log k\bigr\}$, we have, by the theorem,
\[ n! = \exp\left\{n\log n - n + \frac{1}{2}\log n + c + O\left(\frac{1}{n}\right)\right\}, \]
which reduces to the right-hand side in the estimate of the corollary, with constant $C = e^c$.
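The constant in Theorem 2.6 and Corollary 2.7 can be estimated numerically. The sketch below is an illustration only; the function name `stirling_constant_estimate` and the sample values of $n$ are assumptions. It computes $\log n! - (n\log n - n + \frac12\log n)$, which by Theorem 2.6 equals $c + O(1/n)$, and compares $e^{c}$ with the value $\sqrt{2\pi}$ quoted in the remark.

```python
import math

def stirling_constant_estimate(n):
    """log n! - (n log n - n + (1/2) log n), which Theorem 2.6 says is c + O(1/n)."""
    log_factorial = sum(math.log(k) for k in range(2, n + 1))
    return log_factorial - (n * math.log(n) - n + 0.5 * math.log(n))

for n in (10, 100, 1000, 10000):
    c_n = stirling_constant_estimate(n)
    # exp(c_n) is an estimate for the Stirling constant C.
    print(n, c_n, math.exp(c_n))

# Value of C quoted (without proof) in the remark above.
print(math.sqrt(2 * math.pi))
```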
The estimate of Theorem 2.6 applies only to sums $\sum_{n\le x}\log n$ when $x$ is a positive integer; this is the case of interest in the application to Stirling's formula. In number-theoretic applications one needs estimates for these sums that are valid for all (large) real $x$. The following corollary provides such an estimate, at the cost of a weaker error term.
Corollary 2.8. We have
\[ \sum_{n\le x}\log n = x(\log x - 1) + O(\log x) \quad (x \ge 2). \]
Proof. We apply the estimate of the theorem with $N = [x]$. The left-hand side remains unchanged when replacing $x$ by $N$. On the other hand, the main term on the right, $x(\log x - 1)$, has derivative $\log x$, so it changes by an amount of order at most $O(\log x)$ if $x$ is replaced by $[x]$. Since the remaining terms on the right, $\frac{1}{2}\log N + c + O(1/N)$, are also of this order of magnitude, the asserted estimate follows. (Note here the restriction $x \ge 2$; in the larger range $x \ge 1$ this estimate would not be valid, since the error term $O(\log x)$ is 0 at $x = 1$, whereas the main terms on the left and right are clearly not equal when $x = 1$.)
2.2.4 Integral representation of the Riemann zeta function
The Riemann zeta function is defined for complex arguments $s$ with $\mathrm{Re}\, s > 1$ by the series
\[ \zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s}. \]
As an application of Euler's summation formula, we now derive an integral representation for this function. This representation will be crucial in deriving deeper analytic properties of the zeta function.
Theorem 2.9 (Integral representation of the zeta function). For $\mathrm{Re}\, s > 1$ we have
\[ \zeta(s) = \frac{s}{s-1} - s\int_1^\infty \{x\}x^{-s-1}\,dx. \tag{2.1} \]
Proof. Fix $s$ with $\mathrm{Re}\, s > 1$, and let $S(x) = \sum_{n\le x} n^{-s}$. Applying Euler's summation formula in the form of Corollary 2.4 with $f(x) = x^{-s}$, we get, for any $x \ge 1$,
\[ S(x) = I_1(x) + I_2(x) - \{x\}x^{-s} + 1, \]
where
\[ I_1(x) = \int_1^x y^{-s}\,dy = \frac{1 - x^{1-s}}{s-1} = \frac{1}{s-1} + O_s\bigl(x^{1-\mathrm{Re}\, s}\bigr) \]
and
\[ I_2(x) = \int_1^x \{y\}(-s)y^{-s-1}\,dy = -s\int_1^\infty \{y\}y^{-s-1}\,dy + O_s\left(\int_x^\infty y^{-\mathrm{Re}\, s-1}\,dy\right) = -s\int_1^\infty \{y\}y^{-s-1}\,dy + O_s\bigl(x^{-\mathrm{Re}\, s}\bigr). \]
Letting $x \to \infty$, the $O$-terms in the estimates for $I_1(x)$ and $I_2(x)$, as well as the term $\{x\}x^{-s}$, tend to zero, and we conclude
\[ \zeta(s) = \lim_{x\to\infty} S(x) = \frac{1}{s-1} + 1 - s\int_1^\infty \{y\}y^{-s-1}\,dy = \frac{s}{s-1} - s\int_1^\infty \{y\}y^{-s-1}\,dy, \]
which is the asserted identity.
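The identity (2.1) can be tested numerically for real $s > 1$. The following sketch is illustrative only: the function names, the restriction to real $s$, and the truncation point `terms` are assumptions. It uses the fact that $\{x\} = x - n$ on $[n, n+1)$, so the integral over each such interval has a closed form, and it compares the result with the known values $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$.

```python
import math

def fractional_part_integral(s, terms=20000):
    """Approximate int_1^infty {x} x^(-s-1) dx for real s > 1.

    On [n, n+1) we have {x} = x - n, so each piece has the closed form
    int_n^{n+1} (x - n) x^(-s-1) dx
        = [x^(1-s)/(1-s)]_n^{n+1} + (n/s) [x^(-s)]_n^{n+1}.
    The tail beyond x = terms is at most terms^(-s)/s and is simply dropped.
    """
    total = 0.0
    for n in range(1, terms):
        a, b = float(n), float(n + 1)
        total += (b**(1 - s) - a**(1 - s)) / (1 - s) + (n / s) * (b**(-s) - a**(-s))
    return total

def zeta_via_integral(s, terms=20000):
    """Right-hand side of (2.1): s/(s-1) - s * int_1^infty {x} x^(-s-1) dx."""
    return s / (s - 1) - s * fractional_part_integral(s, terms)

print(zeta_via_integral(2.0), math.pi**2 / 6)
print(zeta_via_integral(4.0), math.pi**4 / 90)
```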
2.3 Removing a smooth weight function from a sum: Summation by parts
2.3.1 The summation by parts formula
Summation by parts (also called partial summation or Abel summation) is the analogue for sums of integration by parts. Given a sum of the form $\sum_{n\le x} a(n)f(n)$, where $a(n)$ is an arithmetic function with summatory function $A(x) = \sum_{n\le x} a(n)$ and $f(n)$ is a smooth weight, the summation by parts formula allows one to remove the weight $f(n)$ from the above sum and reduce the evaluation or estimation of the sum to that of an integral over $A(t)$. The general formula is as follows:
Theorem 2.10 (Summation by parts formula). Let $a : \mathbb{N} \to \mathbb{C}$ be an arithmetic function, let $0 < y < x$ be real numbers and $f : [y, x] \to \mathbb{C}$ be a function with continuous derivative on $[y, x]$. Then we have
\[ \sum_{y<n\le x} a(n)f(n) = A(x)f(x) - A(y)f(y) - \int_y^x A(t)f'(t)\,dt, \tag{2.2} \]
where $A(t) = \sum_{n\le t} a(n)$.
This formula is easy to remember since it has the same form as the formula for integration by parts, if one thinks of $A(x)$ as the "integral" of $a(n)$.
In nearly all applications, the sums to be estimated are sums of the form $\sum_{n\le x} a(n)f(n)$, where $n$ ranges over all positive integers $\le x$. We record the formula in this special case separately.
Corollary 2.11 (Summation by parts formula, special case). Let $a : \mathbb{N} \to \mathbb{C}$ be an arithmetic function, let $x \ge 1$ be a real number and $f : [1, x] \to \mathbb{C}$ a function with continuous derivative on $[1, x]$. Then we have
\[ \sum_{n\le x} a(n)f(n) = A(x)f(x) - \int_1^x A(t)f'(t)\,dt. \tag{2.3} \]
Proof. Applying the theorem with $y = 1$ and adding the term $a(1)f(1) = A(1)f(1)$ on both sides of (2.2) gives (2.3).
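The special case (2.3) is easy to verify numerically, because $A(t)$ is a step function and the integral $\int_1^x A(t)f'(t)\,dt$ can therefore be evaluated exactly, interval by interval. The sketch below is an illustration only; the helper names and the particular choices of $a(n)$, $f(t)$, and $x$ are arbitrary assumptions.

```python
import math

def partial_summation_rhs(a, f, x):
    """Right-hand side of (2.3): A(x) f(x) - int_1^x A(t) f'(t) dt.

    A(t) is constant on each interval [n, n+1), so the integral of A(t) f'(t)
    over that interval is exactly A(n) * (f(right end) - f(n)); no numerical
    integration is needed.
    """
    N = int(math.floor(x))
    A = 0.0          # running value of A(n) = a(1) + ... + a(n)
    integral = 0.0
    for n in range(1, N + 1):
        A += a(n)
        right = min(n + 1, x)   # the last interval ends at x, not at N + 1
        integral += A * (f(right) - f(n))
    return A * f(x) - integral

def direct_sum(a, f, x):
    """Left-hand side of (2.3): sum_{n <= x} a(n) f(n)."""
    return sum(a(n) * f(n) for n in range(1, int(math.floor(x)) + 1))

a = lambda n: (-1) ** n * (n % 7)   # an arbitrary arithmetic function (illustrative)
f = lambda t: 1.0 / math.sqrt(t)    # a smooth weight (illustrative)
x = 200.5
print(direct_sum(a, f, x), partial_summation_rhs(a, f, x))
```

The two printed values should agree up to floating-point rounding, for integer as well as non-integer $x$.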
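In the case when $a(n) \equiv 1$ the sum on the left of (2.2) is of the same form as the sum estimated by Euler's summation formula (Theorem 2.3), which states that, under the same conditions on $f$, one has
\[ \sum_{y<n\le x} f(n) = \int_y^x f(t)\,dt + \int_y^x \{t\}f'(t)\,dt - \{x\}f(x) + \{y\}f(y). \]
Indeed, applying (2.2) with $a(n) \equiv 1$, so that $A(t) = [t]$, and writing $[t] = t - \{t\}$, we obtain
\[ \sum_{y<n\le x} f(n) = [x]f(x) - [y]f(y) - \int_y^x [t]f'(t)\,dt = xf(x) - yf(y) - \int_y^x tf'(t)\,dt - \{x\}f(x) + \{y\}f(y) + \int_y^x \{t\}f'(t)\,dt. \]
By an integration by parts, the first integral on the right-hand side equals $xf(x) - yf(y) - \int_y^x f(t)\,dt$, so the above reduces to
\[ \sum_{y<n\le x} f(n) = \int_y^x f(t)\,dt - \{x\}f(x) + \{y\}f(y) + \int_y^x \{t\}f'(t)\,dt, \]
which is the desired formula.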
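Proof of Theorem 2.10. Let $0 < y < x$, $a$, and $f$ be given as in the theorem, and let $I$ denote the integral on the right of (2.2). Setting $\chi(n, t) = 1$ if $n \le t$ and $\chi(n, t) = 0$ otherwise, we have
\[ I = \int_y^x \sum_{n\le x} a(n)\chi(n, t)f'(t)\,dt = \sum_{n\le x} a(n)\int_y^x \chi(n, t)f'(t)\,dt = \sum_{n\le x} a(n)\int_{\max(n,y)}^x f'(t)\,dt, \]
where the interchanging of integration and summation is justified since the sum involves only finitely many terms. Since $f'$ is continuous on $[y, x]$, the inner integrals are equal to $f(x) - f(\max(n, y))$, and hence
\[ I = \sum_{n\le x} a(n)\bigl(f(x) - f(\max(n, y))\bigr) = \sum_{n\le x} a(n)f(x) - \sum_{n\le y} a(n)f(y) - \sum_{y<n\le x} a(n)f(n) = A(x)f(x) - A(y)f(y) - \sum_{y<n\le x} a(n)f(n). \]
Hence,
\[ \sum_{y<n\le x} a(n)f(n) = A(x)f(x) - A(y)f(y) - I, \]
which is the desired formula.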
Partial summation is an extremely useful tool that has numerous appli-
cations in number theory and analysis. In the following subsections we give
three such applications. We will encounter a number of other applications
in later chapters.
2.3.2 Kronecker's Lemma
As a first, and simple, illustration of the use of the partial summation formula we prove the following result, known as "Kronecker's Lemma", which is of independent interest and has a number of applications in its own right, in particular, in probability theory and analysis.
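Theorem 2.12 (Kronecker's Lemma). Let $f : \mathbb{N} \to \mathbb{C}$ be an arithmetic function. If $s$ is a complex number with $\mathrm{Re}\, s > 0$ such that
\[ \sum_{n=1}^{\infty}\frac{f(n)}{n^s} \text{ converges}, \tag{2.4} \]
then
\[ \lim_{x\to\infty}\frac{1}{x^s}\sum_{n\le x} f(n) = 0. \tag{2.5} \]
In particular, the convergence of $\sum_{n=1}^{\infty} f(n)/n$ implies that $f$ has mean value zero in the sense that $\lim_{x\to\infty}(1/x)\sum_{n\le x} f(n) = 0$.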
Remarks. Kronecker's lemma is often stated only in the special case mentioned at the end of the above theorem (i.e., the case $s = 1$), and for most applications it is used in this form. We have stated a slightly more general version involving weights $n^{-s}$ instead of $n^{-1}$, as we will need this version later. In fact, the result holds in greater generality, with the function $x^s$ replaced by a general weight function $w(x)$ in both (2.4) and (2.5).
Proof. Fix a function $f(n)$ and $s \in \mathbb{C}$ with $\mathrm{Re}\, s > 0$ as in the theorem, and set
\[ S(x) = \sum_{n\le x} f(n), \qquad T(x) = \sum_{n\le x}\frac{f(n)}{n^s}. \]
The hypothesis (2.4) means that $T(x)$ converges to a finite limit $T$ as $x \to \infty$, and the desired conclusion (2.5) is equivalent to $\lim_{x\to\infty} S(x)/x^s = 0$.
Let $\varepsilon > 0$ be given. Since $\lim_{x\to\infty} T(x) = T$, there exists $x_0 = x_0(\varepsilon) \ge 1$ such that
\[ |T(x) - T| \le \varepsilon \quad (x \ge x_0). \tag{2.6} \]
Let $x \ge x_0$. Applying the summation by parts formula with $f(n)/n^s$ and $n^s$ in place of $a(n)$ and $f(n)$, respectively, we obtain
\[ S(x) = \sum_{n\le x}\frac{f(n)}{n^s}\,n^s = T(x)x^s - \int_1^x T(t)\,st^{s-1}\,dt = \int_0^x T(x)\,st^{s-1}\,dt - \int_1^x T(t)\,st^{s-1}\,dt. \]
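Defining $T(t)$ to be 0 if $t < 1$, we can combine the last two integrals to a single integral over the interval $[0, x]$ and obtain
\[ |S(x)| = \left|\int_0^x (T(x) - T(t))\,st^{s-1}\,dt\right| \le \int_0^x |T(x) - T(t)|\,|s|\,t^{\mathrm{Re}\, s-1}\,dt. \]
To estimate the latter integral, we split the interval of integration into the two subintervals $[0, x_0)$ and $[x_0, x]$, and bound the integrand separately in these two intervals. For $x_0 \le t \le x$ we have, by (2.6),
\[ |T(t) - T(x)| \le |T(t) - T| + |T - T(x)| \le 2\varepsilon, \]
while for $0 \le t \le x_0$ we use the trivial bound
\[ |T(t) - T(x)| \le |T(t)| + |T| + |T - T(x)| \le \sum_{n\le x_0}\left|\frac{f(n)}{n^s}\right| + |T| + \varepsilon = M, \]
say, with $M = M(\varepsilon)$ a constant depending on $\varepsilon$, but not on $x$. It follows that
\[ |S(x)| \le 2\varepsilon\int_{x_0}^x |s|\,t^{\mathrm{Re}\, s-1}\,dt + M\int_0^{x_0} |s|\,t^{\mathrm{Re}\, s-1}\,dt \le \frac{|s|}{\mathrm{Re}\, s}\Bigl(2\varepsilon\bigl(x^{\mathrm{Re}\, s} - x_0^{\mathrm{Re}\, s}\bigr) + Mx_0^{\mathrm{Re}\, s}\Bigr) \]
and hence
\[ \left|\frac{S(x)}{x^s}\right| \le \frac{|s|}{\mathrm{Re}\, s}\left(2\varepsilon + \frac{Mx_0^{\mathrm{Re}\, s}}{x^{\mathrm{Re}\, s}}\right). \]
Since, by hypothesis, $\mathrm{Re}\, s > 0$, the last term on the right tends to zero as $x \to \infty$, so we obtain $\limsup_{x\to\infty}|S(x)/x^s| \le 2\varepsilon|s|/\mathrm{Re}\, s$. Since $\varepsilon > 0$ was arbitrary, we conclude that $\lim_{x\to\infty} S(x)/x^s = 0$, as desired.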
2.3.3 Relation between different notions of mean values of arithmetic functions
We next use partial summation to study the relation between two different types of mean values, or averages, of an arithmetic function $f$: the ordinary (or asymptotic) mean value
\[ M(f) = \lim_{x\to\infty}\frac{1}{x}\sum_{n\le x} f(n), \tag{2.7} \]
and the logarithmic mean value
\[ L(f) = \lim_{x\to\infty}\frac{1}{\log x}\sum_{n\le x}\frac{f(n)}{n}. \tag{2.8} \]
The asymptotic mean value is (a limit of) an ordinary average, or arithmetic mean, of the values $f(n)$, while the logarithmic mean value can be regarded as a weighted average of these values, with the weights being $1/n$. Thus, to convert between these two mean values it is natural to use partial summation to remove or re-instate the weights $1/n$. The application of partial summation in this way is very common, and it is also quite instructive as it illustrates both a situation in which this approach is successful, and a situation in which the method fails.
In one direction (namely, going from $M(f)$ to $L(f)$), the method works well, and we have the following result.
Theorem 2.13. Let $f$ be an arithmetic function, and suppose that the ordinary mean value $M(f)$ exists. Then the logarithmic mean value $L(f)$ exists as well, and is equal to $M(f)$.
Proof. Suppose $f$ has mean value $M(f) = A$. Let $S(x) = \sum_{n\le x} f(n)$ and $T(x) = \sum_{n\le x} f(n)/n$. By the assumption $M(f) = A$, we have $\lim_{x\to\infty} S(x)/x = A$, and we need to show that $\lim_{x\to\infty} T(x)/\log x = A$.
To obtain an estimate for $T(x)$, we apply the partial summation formula with $a(n) = f(n)$, so that the summatory function is $S(x)$, and with the function $t \mapsto 1/t$ as the weight function to be removed from the sum. We obtain
\[ T(x) = \frac{S(x)}{x} + \int_1^x \frac{S(t)}{t^2}\,dt = \frac{S(x)}{x} + I(x), \tag{2.9} \]
say. Upon dividing by $\log x$, the first term, $S(x)/(x\log x)$, tends to zero, since, by hypothesis, $S(x)/x$ converges, and hence is bounded. To show that the limit $L(f) = \lim_{x\to\infty} T(x)/\log x$ exists and is equal to $A$, it remains therefore to show that the integral $I(x)$ satisfies
\[ \lim_{x\to\infty}\frac{I(x)}{\log x} = A. \tag{2.10} \]
Let $\varepsilon > 0$ be given. By our assumption $\lim_{t\to\infty} S(t)/t = A$, there exists $t_0 = t_0(\varepsilon) \ge 1$ such that $|S(t)/t - A| \le \varepsilon$ for $t \ge t_0$. Moreover, for $1 \le t \le t_0$ we have
\[ \left|\frac{S(t)}{t} - A\right| \le \frac{1}{t}\sum_{n\le t_0}|f(n)| + |A| \le \sum_{n\le t_0}|f(n)| + |A| = K_0, \]
say, where $K_0 = K_0(\varepsilon)$ is a constant depending on $\varepsilon$. Hence, for $x \ge t_0$ we have
\[ |I(x) - A\log x| = \left|\int_1^x \frac{S(t)/t - A}{t}\,dt\right| \le \int_1^{t_0}\frac{K_0}{t}\,dt + \int_{t_0}^x \frac{\varepsilon}{t}\,dt = K_0\log t_0 + \varepsilon\log(x/t_0) \le K_0\log t_0 + \varepsilon\log x. \]
Since $\varepsilon$ was arbitrary, it follows that
\[ \limsup_{x\to\infty}\frac{|I(x) - A\log x|}{\log x} = 0, \]
which is equivalent to (2.10).
In the other direction (going from $L(f)$ to $M(f)$) the method fails; indeed, the converse of Theorem 2.13 is false:
Theorem 2.14. There exist arithmetic functions $f$ such that $L(f)$ exists, but $M(f)$ does not exist.
Proof. Define a function $f$ by $f(n) = n$ if $n = 2^k$ for some nonnegative integer $k$, and $f(n) = 0$ otherwise. This function does not have an ordinary mean value since the average $(1/x)\sum_{n\le x} f(n)$, as a function of $x$, has a jump of size 1 at all powers of 2, and hence does not converge as $x \to \infty$. However, $f$ has a logarithmic mean value (namely $1/\log 2$), since
\[ \frac{1}{\log x}\sum_{n\le x}\frac{f(n)}{n} = \frac{1}{\log x}\sum_{2^k\le x}\frac{2^k}{2^k} = \frac{1}{\log x}\left(\left[\frac{\log x}{\log 2}\right] + 1\right) = \frac{1}{\log 2} + O\left(\frac{1}{\log x}\right). \]
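The counterexample in this proof is simple enough to tabulate. The sketch below is illustrative only (the function names and sample points are assumptions); it evaluates the finite ordinary mean $(1/x)\sum_{n\le x} f(n)$ just below and at a power of 2, and the finite logarithmic mean $(1/\log x)\sum_{n\le x} f(n)/n$, for the function $f$ defined above.

```python
import math

def f(n):
    """f(n) = n if n is a power of 2 (including 1 = 2^0), and 0 otherwise."""
    return n if n & (n - 1) == 0 else 0

def ordinary_mean(x):
    """(1/x) * sum_{n <= x} f(n) for a positive integer x."""
    return sum(f(n) for n in range(1, x + 1)) / x

def logarithmic_mean(x):
    """(1/log x) * sum_{n <= x} f(n)/n for an integer x >= 2."""
    return sum(f(n) / n for n in range(1, x + 1)) / math.log(x)

for k in (8, 10, 12):
    # The ordinary mean jumps from about 1 to about 2 when x passes 2^k.
    print(k, ordinary_mean(2**k - 1), ordinary_mean(2**k))

# The logarithmic mean approaches 1/log 2, slowly (error of order 1/log x).
print(logarithmic_mean(5000), 1 / math.log(2))
```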
For an arithmetic function to have an asymptotic mean value is therefore a stronger condition than having a logarithmic mean value, and the existence of an asymptotic mean value is usually much harder to prove than the existence of a logarithmic mean value. For example, it is relatively easy to prove (as we will see in the next chapter) that the von Mangoldt function $\Lambda(n)$ has logarithmic mean value 1, and the Moebius function $\mu(n)$ has logarithmic mean value 0. By contrast, the existence of an ordinary asymptotic mean value for $\Lambda$ or $\mu$ is equivalent to the prime number theorem and much more difficult to establish.
Failure of partial summation. It is tempting to try to use partial summation in an attempt to show that the existence of $L(f)$ implies that of $M(f)$. Of course, since this implication is not true, such an approach is bound to fail, but it is instructive to see what exactly goes wrong if one tries to apply partial summation in the converse direction. Thus, assume that $L(f)$ exists and is equal to $A$. In an attempt to show that $M(f)$ exists as well and is equal to $A$, one would start with the sum $S(x) = \sum_{n\le x}(f(n)/n)\,n$, and then "remove" the factor $n$ by partial summation. Applying partial summation as in (2.9), but with the roles of $S(x)$ and $T(x)$ interchanged, gives the identity
\[ S(x) = xT(x) - \int_1^x T(t)\,dt, \]
so to show that $M(f) = A$ we would need to show
\[ \frac{S(x)}{x} = T(x) - \frac{1}{x}\int_1^x T(t)\,dt \to A \quad (x \to \infty). \tag{2.11} \]
However, the assumption that $f$ has logarithmic mean value $A$ is equivalent to the estimate $T(x) = A\log x + o(\log x)$, and substituting this estimate into (2.11) introduces an error term $o(\log x)$ that prevents one from drawing any conclusions about the convergence of $S(x)/x$ in (2.11). To obtain (2.11) would require a much stronger estimate for $T(x)$, in which the error term is $o(1)$ instead of $o(\log x)$.
The reason why (2.11) is so ineffective is that the right-hand side is a difference of two large terms of nearly the same size, both of which are much larger than the left-hand side. By contrast, the right-hand side of (2.9) is a sum of two expressions, each of the same (or smaller) order of magnitude than the function on the left.
The logarithmic mean value as an average version of the asymptotic mean value. Further insight into the relation between the asymptotic and logarithmic mean values is provided by rewriting the identity (2.9) in terms of the functions
\[ \mu(t) = m(e^t), \qquad \lambda(t) = l(e^t), \]
where
\[ m(x) = \frac{1}{x}\sum_{n\le x} f(n), \qquad l(x) = \frac{1}{\log x}\sum_{n\le x}\frac{f(n)}{n} \]
are the finite asymptotic, resp. logarithmic, mean values. Assuming that $m(x) = o(\log x)$ (a very mild assumption that holds, for example, if the function $f$ is bounded), (2.9) becomes
\[ \lambda(t) = o(1) + \frac{1}{t}\int_0^t \mu(s)\,ds. \]
Thus, the convergence of $\lambda(t)$ (i.e., the existence of a logarithmic mean value) is equivalent to the convergence of $\bar{\mu}(t) = (1/t)\int_0^t \mu(s)\,ds$, i.e., the convergence of a certain average of $\mu(s)$, the ordinary (finite) mean value. It is obvious that if a function $\mu(t)$ converges, then so does its average $\bar{\mu}(t)$, and it is also easy to construct functions $\mu(t)$ for which the converse does not hold. Interpreting $M(f)$ as the limit of a function $\mu(t)$, and $L(f)$ as the limit of the corresponding average function $\bar{\mu}(t)$, it is then clear that the existence of $M(f)$ implies that of $L(f)$, but not vice versa.
2.3.4 Dirichlet series and summatory functions
As a final illustration of the use of partial summation, we prove an integral representation for the so-called Dirichlet series of an arithmetic function. Given an arithmetic function $f$, the Dirichlet series of $f$ is the (formal) infinite series
\[ F(s) = \sum_{n=1}^{\infty}\frac{f(n)}{n^s}, \]
where $s$ is any complex number. The following result gives a representation of this series as a certain integral involving the partial sums $S(x) = \sum_{n\le x} f(n)$.
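Theorem 2.15 (Mellin transform representation of Dirichlet series). Let $f$ be an arithmetic function, let $S_f(x) = \sum_{n\le x} f(n)$ be the associated summatory function, and let $F(s) = \sum_{n=1}^{\infty} f(n)n^{-s}$ be the Dirichlet series associated with $f$, whenever the series converges.
(i) For any complex number $s$ with $\mathrm{Re}\, s > 0$ such that $F(s)$ converges we have
\[ F(s) = s\int_1^\infty \frac{S_f(x)}{x^{s+1}}\,dx. \tag{2.12} \]
(ii) If $S_f(x) = O(x^{\alpha})$ for some $\alpha \ge 0$, then $F(s)$ converges for all complex numbers $s$ with $\mathrm{Re}\, s > \alpha$, and (2.12) holds for all such $s$.
Remark. The Mellin transform of a function $\phi(x)$ defined on the positive real axis is the function
\[ \hat{\phi}(s) = \int_0^\infty \phi(x)x^{-s}\,dx, \]
provided the integral exists. In this terminology (2.12) says that $F(s)/s$ is the Mellin transform of the function $S_f(x)/x$ (with the convention that $S_f(x) = 0$ if $x < 1$).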
Proof. (i) Suppose that $F(s)$ converges for some $s$ with $\mathrm{Re}\, s > 0$. We want to apply partial summation to remove the factor $n^{-s}$ in the summands of $F(s)$ in order to express $F(s)$ in terms of the partial sums $S_f(x)$. Since $F(s)$ is an infinite series, we cannot apply the partial summation formula directly to $F(s)$. However, we can apply it to the partial sums $F_N(s) = \sum_{n=1}^{N} f(n)n^{-s}$ and obtain, for any positive integer $N$,
\[ F_N(s) = \frac{S_f(N)}{N^s} + s\int_1^N \frac{S_f(x)}{x^{s+1}}\,dx. \tag{2.13} \]
Now let $N \to \infty$ on both sides of this identity. Since, by assumption, the series $F(s)$ converges, the partial sums $F_N(s)$ on the left tend to $F(s)$. Also, by Kronecker's Lemma (Theorem 2.12), the convergence of $F(s)$, along with the hypothesis $\mathrm{Re}\, s > 0$, implies that the first term on the right, $S_f(N)/N^s$, tends to zero. Hence the integral on the right-hand side converges as $N$ tends to infinity, and we obtain (2.12).
(ii) Suppose that $S_f(x) = O(x^{\alpha})$ for some $\alpha \ge 0$, and let $s$ be a complex number with $\sigma = \mathrm{Re}\, s > \alpha$. Then the first term on the right of (2.13), $S_f(N)/N^s$, is of order $O(N^{\alpha-\sigma})$ and hence tends to zero as $N \to \infty$. Also, the integrand $S_f(x)x^{-s-1}$ in the integral on the right of (2.13) is of order $O(x^{\alpha-\sigma-1})$, so this integral is absolutely convergent when extended to infinity. Letting $N \to \infty$, we conclude that the limit $\lim_{N\to\infty} F_N(s)$ exists and is equal to $s\int_1^\infty S_f(x)x^{-s-1}\,dx$. But this means that $F(s)$ converges and the identity (2.12) holds.
2.4 Approximating an arithmetic function by a simpler arithmetic function: The convolution method
2.4.1 Description of the method
Among the various methods for estimating sums of arithmetic functions, one of the most widely applicable is the "convolution method" presented in this section. The basic idea is as follows. Given an arithmetic function $f$ whose partial sums $F(x) = \sum_{n\le x} f(n)$ we want to estimate, we try to express $f$ as a convolution $f = f_0 * g$, where $f_0$ is a function that approximates $f$ (in a suitable sense) and which is well-behaved in the sense that good estimates for the partial sums $F_0(x) = \sum_{n\le x} f_0(n)$ are available, and where $g$ is a "perturbation" that is "small" (again in a suitable sense). Writing $f(n) = \sum_{d|n} g(d)f_0(n/d)$, we have
\[ F(x) = \sum_{n\le x} f(n) = \sum_{n\le x}\sum_{d|n} g(d)f_0(n/d) = \sum_{d\le x} g(d)\sum_{\substack{n\le x \\ d|n}} f_0(n/d) = \sum_{d\le x} g(d)\sum_{n'\le x/d} f_0(n') = \sum_{d\le x} g(d)F_0(x/d). \tag{2.14} \]
Substituting known estimates for $F_0(y)$ then yields an estimate for $F(x) = \sum_{n\le x} f(n)$.
We call this method the convolution method, since the idea of writing an unknown function as a convolution of a known function with a perturbation factor is key to the method.
In practice, the approximating function $f_0$ is usually a very simple and well-behaved function, such as the function 1, or the identity function $f(n) = n$, though other choices are possible, too. In most applications the function $f$ is multiplicative, and an appropriate approximation is usually easily obtained by taking for $f_0$ a simple multiplicative function whose values on primes are similar (or equal) to the corresponding values of $f$.
The following examples illustrate typical situations in which the method can be successfully applied, along with appropriate choices of the approximating function. We will carry out the argument in detail for two of these cases.
Examples
(1) $f(n) = \phi(n)$: $f$ is multiplicative with $f(p) = p - 1$ for all primes $p$. Thus, a natural approximation to $f$ is provided by the identity function $\mathrm{id}$, which at a prime $p$ has value $p$. The identity $\phi * 1 = \mathrm{id}$ proved earlier implies $\phi = \mathrm{id} * \mu$, so we have $\phi = f_0 * g$ with $f_0 = \mathrm{id}$ and $g = \mu$. The estimation of $\sum_{n\le x}\phi(n)$ will be carried out in detail in Theorem 2.16 below.
(2) $f(n) = \sigma(n)$: This case is very similar to the previous example. The function $\sigma(n)$ is multiplicative with values $\sigma(p) = p + 1$ at primes, and choosing $\mathrm{id}$ as the approximating function $f_0$ leads to an estimate for $\sum_{n\le x}\sigma(n)$ of the same quality as the estimate for $\sum_{n\le x}\phi(n)$ given in Theorem 2.16.
(3) $f(n) = \phi(n)/n$: $f$ is multiplicative with $f(p) = 1 - 1/p$ for all primes $p$, so $f_0 = 1$ is a natural choice for an approximating function. The corresponding perturbation factor is $g = \mu/\mathrm{id}$, which can be seen as follows: Starting from the identity $\phi = \mathrm{id} * \mu$, we obtain $f = \phi/\mathrm{id} = (\mathrm{id} * \mu)/\mathrm{id}$. Since the function $1/\mathrm{id}$ is completely multiplicative, it distributes over the Dirichlet product (see Theorem 1.10), so $f = (\mathrm{id} * \mu)/\mathrm{id} = 1 * \mu/\mathrm{id}$.
(4) $f(n) = \mu^2(n)$: $f$ is multiplicative with values 1 at all primes $p$, so $f_0 = 1$ serves as the obvious approximating function. See Theorem 2.18 below for a detailed argument in this case.
(5) $f(n) = \lambda(n)$: Suppose we have information on the behavior of $M(x) = \sum_{n\le x}\mu(n)$, such as the relation $M(x) = o(x)$ (a result which, as we will see in the next chapter, is equivalent to the prime number theorem), or the relation $M(x) = O_\varepsilon(x^{1/2+\varepsilon})$ for $\varepsilon > 0$ (which is equivalent to the Riemann Hypothesis). Applying the convolution method with $f = \lambda$ and $f_0 = \mu$ then allows one to show that estimates of the same type hold for $\lambda(n)$.
(6) $f(n) = 2^{\omega(n)}$: Since $\omega(n)$, the number of distinct prime divisors of $n$, is an additive function, the function $f = 2^{\omega}$ is multiplicative. At primes $f$ has the same values as the divisor function. This suggests applying the convolution method with the divisor function as the approximating function, and trying to derive estimates for the partial sums of $f$ from estimates for the partial sums of the divisor function provided by Dirichlet's theorem (see Theorem 2.20 in the following section). This approach works, and it yields an estimate for $\sum_{n\le x} 2^{\omega(n)}$ of nearly the same quality as Dirichlet's estimate for $\sum_{n\le x} d(n)$.
2.4.2 Partial sums of the Euler phi function
We will prove the following estimate.
Theorem 2.16. We have
\[ \sum_{n\le x}\phi(n) = \frac{3}{\pi^2}x^2 + O(x\log x) \quad (x \ge 2). \tag{2.15} \]
Before proving this result, we present some interesting applications and
interpretations of the result.
Number of Farey fractions of given order. Let $Q$ be a positive integer. The Farey fractions of order $Q$ are the rational numbers in the interval $(0, 1]$ with denominator (in reduced form) at most $Q$. From the definition of $\phi(n)$ it is clear that $\phi(n)$ represents the number of rational numbers in the interval $(0, 1]$ that in reduced form have denominator $n$. The sum $\sum_{n\le Q}\phi(n)$ is therefore equal to the number of rationals in the interval $(0, 1]$ with denominators $\le Q$, i.e., the number of Farey fractions of order $Q$. The theorem shows that this number is equal to $(3/\pi^2)Q^2 + O(Q\log Q)$.
Lattice points visible from the origin. A second application of the theorem is obtained by interpreting the pairs $(n, m)$, with $1 \le m \le n$ and $(m, n) = 1$, as lattice points in the plane. The number of such pairs is equal to the sum $\sum_{n\le x}\phi(n)$ estimated in the theorem. It is easy to see that the condition $(m, n) = 1$ holds if and only if the point $(m, n)$ is visible from the origin, in the sense that the line segment joining this point with the origin does not pass through another lattice point. The theorem therefore gives an estimate for the number of lattice points in the triangular region $0 < n \le x$, $0 < m \le n$, that are visible from the origin. By a simple symmetry argument, it follows that the total number of lattice points in the first quadrant that are visible from the origin and have coordinates at most $x$ is $(6/\pi^2)x^2 + O(x\log x)$.
Probability that two random integers are coprime. Defining this probability as the limit
\[ \lim_{N\to\infty}\frac{1}{N^2}\,\#\{(n, m) : 1 \le n, m \le N,\ (n, m) = 1\}, \]
we see from the previous application that this limit exists and is equal to $6/\pi^2$.
Proof of Theorem 2.16. We apply the identity (2.14) with $f = \phi$ and $f_0 = \mathrm{id}$
as the approximating function. As noted above, the identity $\phi * 1 = \mathrm{id}$ implies
$\phi = \mathrm{id} * \mu$, so we have $g = \mu$. Moreover, the summatory function of $f_0$ ($= \mathrm{id}$)
equals
$$F_0(x) = \sum_{n \le x} n = \frac{1}{2}[x]([x] + 1) = \frac{1}{2}x^2 + O(x).$$
Substituting this estimate into (2.14) gives
$$\begin{aligned}
\sum_{n \le x} \phi(n) &= \sum_{d \le x} \mu(d) \left( \frac{1}{2}\left(\frac{x}{d}\right)^2 + O\left(\frac{x}{d}\right) \right) \\
&= \frac{1}{2}x^2 \sum_{d \le x} \frac{\mu(d)}{d^2} + O\left( x \sum_{d \le x} \frac{1}{d} \right) \\
&= \frac{1}{2}x^2 \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + O\left( x^2 \sum_{d > x} \frac{1}{d^2} \right) + O\left( x \sum_{d \le x} \frac{1}{d} \right).
\end{aligned}$$
(Note here that for the estimation of $\sum_{d \le x} \mu(d) d^{-2}$ in the last step we
used the trick of extending the sum to infinity and estimating the tail of
the infinite series. This is a very useful device that can be applied to any
finite sum that becomes convergent, and hence equal to a constant, when
the summation is extended to infinity.) Since $\sum_{d > x} d^{-2} \ll 1/x$ (e.g., by
Euler's summation formula, or, simpler, by noting the sum is
$\le \int_{x-1}^{\infty} t^{-2}\,dt = (x - 1)^{-1}$) and $\sum_{d \le x} 1/d \ll \log x$ for $x \ge 2$, the two error terms are of
order $O(x)$ and $O(x \log x)$, respectively, while the main term is $Cx^2$, with
$C = (1/2) \sum_{d=1}^{\infty} \mu(d)/d^2$.
To complete the proof, it remains to show that the constant $C$ is equal
to $3/\pi^2$. This follows from the following lemma.
Lemma 2.17. We have $\sum_{n=1}^{\infty} \mu(n) n^{-2} = 6/\pi^2$.
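Proof. By the Moebius identity $e(n) = \sum_{d | n} \mu(d)$ we have
$$1 = \sum_{n=1}^{\infty} \frac{e(n)}{n^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{d | n} \mu(d)
= \sum_{d=1}^{\infty} \sum_{m=1}^{\infty} \frac{\mu(d)}{(dm)^2} = \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} \sum_{m=1}^{\infty} \frac{1}{m^2}.$$
Hence
$$\sum_{n=1}^{\infty} \frac{\mu(n)}{n^2} = \left( \sum_{n=1}^{\infty} \frac{1}{n^2} \right)^{-1}.$$
By Theorem A.1 of the Appendix, the sum $\sum_{n=1}^{\infty} n^{-2}$ is equal to $\pi^2/6$. The
result now follows.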
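As a sanity check on the lemma, one can compute partial sums of $\sum \mu(n)/n^2$ directly. The sketch below is an illustration only; `mobius_sieve` and `mobius_partial_sum` are hypothetical helper names, not taken from the notes. Since the tail of the series is bounded by $\sum_{n > N} n^{-2} < 1/N$, the partial sum up to $N$ should agree with $6/\pi^2$ to within $1/N$.

```python
from math import pi


def mobius_sieve(N: int) -> list:
    """Return a list mu with mu[n] = Moebius function of n for 1 <= n <= N."""
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            # Each prime factor p flips the sign once.
            for k in range(p, N + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]
            # Multiples of p^2 are not squarefree, so mu vanishes there.
            for k in range(p * p, N + 1, p * p):
                mu[k] = 0
    return mu


def mobius_partial_sum(N: int) -> float:
    """Partial sum of mu(n)/n^2 over 1 <= n <= N."""
    mu = mobius_sieve(N)
    return sum(mu[n] / (n * n) for n in range(1, N + 1))


if __name__ == "__main__":
    print(mobius_partial_sum(10_000), 6 / pi**2)
```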
2.4.3 The number of squarefree integers below x
Since $\mu^2(n)$ is the characteristic function of the squarefree integers, the
summatory function of $\mu^2$ is the counting function for the squarefree integers.
The following theorem gives an estimate for this function.
Theorem 2.18. We have
$$\sum_{n \le x} \mu^2(n) = \frac{6}{\pi^2} x + O(\sqrt{x}) \quad (x \ge 1). \tag{2.16}$$
Thus, the probability that a random integer is squarefree is $6/\pi^2 = 0.6079\ldots$.
Proof. Since the function $\mu^2$ is multiplicative and equal to 1 at primes, it
is natural to apply the convolution method with $f_0 = 1$ as approximating
function. Writing $\mu^2 = f_0 * g = 1 * g$, we have $g = \mu^2 * \mu$ by Moebius
inversion.
We begin by explicitly evaluating the function $g$. Since $\mu^2$ and $\mu$ are
multiplicative functions, so is the function $g$, and its value at a prime power
$p^m$ is given by
$$g(p^m) = \sum_{k=0}^{m} \mu^2(p^k) \mu(p^{m-k}) = \mu(p^m) + \mu(p^{m-1}) =
\begin{cases}
0 & \text{if } m = 1, \\
-1 & \text{if } m = 2, \\
0 & \text{if } m \ge 3.
\end{cases}$$
It follows that $g(n) = 0$ unless $n = m^2$ where $m$ is squarefree, and in this
case $g(n) = \mu(m)$. In fact, since $\mu(m) = 0$ if $m$ is not squarefree, we have
$g(m^2) = \mu(m)$ for all positive integers $m$, and $g(n) = 0$ if $n$ is not a square.
The identity (2.14) with $g$ defined as above and $F_0(x) = \sum_{n \le x} 1 = [x]$
then gives
$$\begin{aligned}
\sum_{n \le x} \mu^2(n) &= \sum_{d \le x} g(d)[x/d] = \sum_{m \le \sqrt{x}} \mu(m) \left( \frac{x}{m^2} + O(1) \right) \\
&= x \sum_{m \le \sqrt{x}} \frac{\mu(m)}{m^2} + O\left( \sum_{m \le \sqrt{x}} |\mu(m)| \right) \\
&= x \sum_{m=1}^{\infty} \frac{\mu(m)}{m^2} + O\left( x \sum_{m > \sqrt{x}} \frac{1}{m^2} \right) + O(\sqrt{x}).
\end{aligned}$$
(Note again the trick of extending a convergent sum to an infinite series
and estimating the tail.) The coefficient of $x$ in the main term is
$\sum_{m=1}^{\infty} \mu(m)/m^2 = 6/\pi^2$ by Lemma 2.17, the second of the two error terms
is of the desired order of magnitude $O(\sqrt{x})$, and in view of the estimate
$\sum_{n > y} 1/n^2 \ll 1/y$ the same holds for the first error term. The asserted
estimate therefore follows.
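The identity $\sum_{n \le x} \mu^2(n) = \sum_{m \le \sqrt{x}} \mu(m)[x/m^2]$ used in this proof also gives a fast way to count squarefree integers. The sketch below is illustrative only (the helper names are hypothetical); it compares the formula with a direct count and with the main term $6x/\pi^2$.

```python
from math import isqrt, pi


def mobius_sieve(N: int) -> list:
    """mu[n] for 1 <= n <= N (mu[0] is unused)."""
    mu = [1] * (N + 1)
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for k in range(p, N + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]
            for k in range(p * p, N + 1, p * p):
                mu[k] = 0
    return mu


def squarefree_count(x: int) -> int:
    """Number of squarefree n <= x, via sum over m <= sqrt(x) of mu(m) * [x / m^2]."""
    r = isqrt(x)
    mu = mobius_sieve(r)
    return sum(mu[m] * (x // (m * m)) for m in range(1, r + 1))


def squarefree_count_naive(x: int) -> int:
    """Direct count, testing each n <= x for a square divisor d^2 > 1."""
    def is_squarefree(n: int) -> bool:
        d = 2
        while d * d <= n:
            if n % (d * d) == 0:
                return False
            d += 1
        return True

    return sum(1 for n in range(1, x + 1) if is_squarefree(n))


if __name__ == "__main__":
    for x in (10**3, 10**6):
        q = squarefree_count(x)
        print(x, q, 6 * x / pi**2, (q - 6 * x / pi**2) / x**0.5)
```

The last printed column is the error divided by $\sqrt{x}$, which Theorem 2.18 asserts stays bounded.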
2.4.4 Wintner's mean value theorem
Given an arithmetic function $f$, we say that $f$ has a mean value if the limit
$$\lim_{x \to \infty} \frac{1}{x} \sum_{n \le x} f(n)$$
exists (and is finite), and we denote the limit by $M(f)$, if it exists. The
concept of a mean value is a useful one, as many results in number theory
can be phrased in terms of existence of a mean value. For example, as
we will show in the next chapter, the prime number theorem is equivalent
to the assertions $M(\Lambda) = 1$ and $M(\mu) = 0$; Theorem 2.18 above implies
$M(\mu^2) = 6/\pi^2$; and a similar argument shows that $M(\phi/\mathrm{id}) = 6/\pi^2$.
As a first illustration of the convolution method, we prove a result due
to A. Wintner, that gives a general sufficient condition for the existence of
a mean value. Note that this theorem does not require the function $f$ to be
multiplicative.
Theorem 2.19 (Wintner's mean value theorem). Suppose $f = 1 * g$,
where $\sum_{n=1}^{\infty} |g(n)|/n < \infty$. Then $f$ has a mean value given by
$M(f) = \sum_{n=1}^{\infty} g(n)/n$.
As an illustration of this result, we consider again the function $f = \mu^2$.
We have $f = 1 * g$, where the function $g = \mu^2 * \mu$ is given by $g(n) = \mu(m)$ if
$n = m^2$ and $g(n) = 0$ if $n$ is not a square, as shown in the proof of Theorem
2.18. Hence the series $\sum_{n=1}^{\infty} g(n)/n$ equals $\sum_{m=1}^{\infty} \mu(m)/m^2$, which converges
absolutely, with sum $6/\pi^2$. Wintner's theorem therefore applies and shows
that $\mu^2$ has mean value $6/\pi^2$, as we had obtained in Theorem 2.18. (Of
course, a direct application of the convolution method, as in the proof of
Theorem 2.18, may yield more precise estimates with explicit error terms in
any given case. The main advantage of Wintner's mean value theorem lies
in its generality.)
Proof. Applying again the identity (2.14) with $f_0 = 1$, $F_0(x) = [x]$, we
obtain
$$\begin{aligned}
\frac{1}{x} \sum_{n \le x} f(n) &= \frac{1}{x} \sum_{d \le x} g(d)[x/d] \\
&= \sum_{d=1}^{\infty} \frac{g(d)}{d} + O\left( \sum_{d > x} \frac{|g(d)|}{d} \right) + O\left( \frac{1}{x} \sum_{d \le x} |g(d)| \right).
\end{aligned}$$
As $x \to \infty$, the first of the two error terms tends to zero, by convergence
of the series $\sum_{d=1}^{\infty} |g(d)|/d$. The same is true for the second error term,
in view of Kronecker's Lemma (Theorem 2.12) and the hypothesis that
$\sum_{d=1}^{\infty} |g(d)|/d$ converges. Hence, as $x \to \infty$, the left-hand side converges
to the sum $\sum_{d=1}^{\infty} g(d)/d$, i.e., $M(f)$ exists and is equal to the value of this
sum.
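A second example to which Wintner's theorem applies is $f = \phi/\mathrm{id}$: from $\phi = \mathrm{id} * \mu$ one gets $\phi(n)/n = \sum_{d | n} \mu(d)/d$, so $f = 1 * g$ with $g = \mu/\mathrm{id}$ and $\sum |g(n)|/n \le \sum n^{-2} < \infty$. The predicted mean value is $\sum \mu(n)/n^2 = 6/\pi^2$. The following sketch (helper names are hypothetical, chosen for this illustration) checks this numerically.

```python
from math import pi


def totient_sieve(N: int) -> list:
    """phi[n] = Euler's totient of n for 1 <= n <= N."""
    phi = list(range(N + 1))
    for p in range(2, N + 1):
        if phi[p] == p:  # untouched so far, hence p is prime
            for k in range(p, N + 1, p):
                phi[k] -= phi[k] // p
    return phi


def mean_of_phi_over_n(x: int) -> float:
    """(1/x) * sum over n <= x of phi(n)/n."""
    phi = totient_sieve(x)
    return sum(phi[n] / n for n in range(1, x + 1)) / x


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**5):
        print(x, mean_of_phi_over_n(x), 6 / pi**2)
```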
2.5 A special technique: The Dirichlet hyperbola method
2.5.1 Sums of the divisor function
In this section we consider a rather special technique, the Dirichlet hyperbola
method, invented by Dirichlet to estimate the partial sums of the
divisor function. Dirichlet's result is as follows:
Theorem 2.20 (Dirichlet). We have
$$\sum_{n \le x} d(n) = x \log x + (2\gamma - 1)x + O(\sqrt{x}) \quad (x \ge 1),$$
where $\gamma$ is Euler's constant (see Theorem 2.5).
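Proof. Let $D(x) = \sum_{n \le x} d(n)$. Writing $d(n) = \sum_{ab = n} 1$, where $a$ and $b$ run
over positive integers with product $n$, we obtain
$$D(x) = \sum_{n \le x} \sum_{ab = n} 1 = \sum_{\substack{a, b \le x \\ ab \le x}} 1.$$
Note that, in the latter sum, the condition $ab \le x$ forces at least one of $a$ and
$b$ to be $\le \sqrt{x}$. Consequently, the sum can be split as
$$D(x) = \Sigma_1 + \Sigma_2 - \Sigma_3,$$
where
$$\Sigma_1 = \sum_{a \le \sqrt{x}} \sum_{b \le x/a} 1, \quad
\Sigma_2 = \sum_{b \le \sqrt{x}} \sum_{a \le x/b} 1, \quad
\Sigma_3 = \sum_{a \le \sqrt{x}} \sum_{b \le \sqrt{x}} 1.$$
The last sum, $\Sigma_3$, here compensates for the overlap, i.e., those terms $(a, b)$
that are counted in both $\Sigma_1$ and $\Sigma_2$.
The last sum is trivial to estimate. We have
$$\Sigma_3 = \left( \sum_{a \le \sqrt{x}} 1 \right) \left( \sum_{b \le \sqrt{x}} 1 \right) = [\sqrt{x}]^2 = (\sqrt{x} + O(1))^2 = x + O(\sqrt{x}).$$
Also, $\Sigma_2 = \Sigma_1$, so it remains to estimate $\Sigma_1$.
This is rather straightforward, using the estimate for the partial sums of
the harmonic series (Theorem 2.5). We have
$$\begin{aligned}
\Sigma_1 &= \sum_{a \le \sqrt{x}} \left[ \frac{x}{a} \right] = x \sum_{a \le \sqrt{x}} \frac{1}{a} + O\left( \sum_{a \le \sqrt{x}} 1 \right) \\
&= x \left( \log \sqrt{x} + \gamma + O\left( \frac{1}{\sqrt{x}} \right) \right) + O(\sqrt{x}) \\
&= \frac{1}{2} x \log x + \gamma x + O(\sqrt{x}).
\end{aligned}$$
Hence
$$\Sigma_1 + \Sigma_2 - \Sigma_3 = x \log x + (2\gamma - 1)x + O(\sqrt{x}),$$
which is the desired estimate.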
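The decomposition in this proof translates directly into an exact formula, $D(x) = 2\sum_{a \le \sqrt{x}} [x/a] - [\sqrt{x}]^2$, which needs only about $\sqrt{x}$ operations. The following sketch (function names are hypothetical, and the numerical value of Euler's constant is supplied as a literal) compares it with a direct divisor count and with the main term of Theorem 2.20.

```python
from math import isqrt, log, sqrt

EULER_GAMMA = 0.5772156649015329  # numerical value of Euler's constant


def divisor_sum_hyperbola(x: int) -> int:
    """D(x) = sum of d(n) for n <= x, as 2 * Sigma_1 - Sigma_3."""
    r = isqrt(x)
    return 2 * sum(x // a for a in range(1, r + 1)) - r * r


def divisor_sum_naive(x: int) -> int:
    """D(x) computed by tabulating d(n) for every n <= x."""
    d = [0] * (x + 1)
    for a in range(1, x + 1):
        for multiple in range(a, x + 1, a):
            d[multiple] += 1
    return sum(d)


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        main_term = x * log(x) + (2 * EULER_GAMMA - 1) * x
        exact = divisor_sum_hyperbola(x)
        print(x, exact, main_term, (exact - main_term) / sqrt(x))
```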
2.5.2 Extensions and remarks
Geometric interpretation. The argument in this proof has the following
simple geometric interpretation, which explains why it is called the hyperbola
method. The sum $D(x)$ is equal to the number of pairs $(a, b)$ of positive
integers with $ab \le x$, i.e., the number of lattice points in the first quadrant
(not counting points on the coordinate axes) that are to the left of the hyperbola
$ab = x$. The sums $\Sigma_1$ and $\Sigma_2$ count those points which, in addition,
fall into the infinite strips defined by $0 < a \le \sqrt{x}$, and $0 < b \le \sqrt{x}$, respectively,
whereas $\Sigma_3$ counts points that fall into the intersection of these two
strips. It is geometrically obvious that $\Sigma_1 + \Sigma_2 - \Sigma_3$ is equal to $D(x)$,
the total number of lattice points located in the first quadrant and to the
left of the hyperbola $ab = x$.
The hyperbola method for general functions. The method underlying
the proof of Dirichlet's theorem can be generalized as follows. Consider a
sum $F(x) = \sum_{n \le x} f(n)$, and suppose $f$ can be represented as a convolution
$f = g * h$. Letting $G(x)$ and $H(x)$ denote the partial sums of the functions
$g(n)$ and $h(n)$, respectively, we can try to obtain a good estimate for $F(x)$
by writing $F(x) = \Sigma_1 + \Sigma_2 - \Sigma_3$ with
$$\Sigma_1 = \sum_{a \le \sqrt{x}} g(a) H(x/a), \quad
\Sigma_2 = \sum_{b \le \sqrt{x}} h(b) G(x/b), \quad
\Sigma_3 = G(\sqrt{x}) H(\sqrt{x}),$$
and estimating each of these sums individually. This can lead to better
estimates than more straightforward approaches (such as writing $F(x) =
\sum_{a \le x} g(a) H(x/a)$), provided good estimates for the functions $H(x)$ and
$G(x)$ are available. The Dirichlet divisor problem is an ideal case, since
here the functions $g$ and $h$ are identically 1, and $H(x) = G(x) = [x]$ for all
$x \ge 1$. In practice, the usefulness of this method is limited to a few very
special situations, which are similar to that of the Dirichlet divisor problem,
and in most cases the method does not provide any advantage over simpler
approaches. In particular, the convolution method discussed in the previous
section has a much wider range of applicability, and for most problems this
should be the first method to try.
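The general identity $F(x) = \Sigma_1 + \Sigma_2 - \Sigma_3$ can be checked mechanically. The sketch below is only an illustration of the identity, not an efficient implementation: it builds the partial sums $G$ and $H$ as full tables (an assumption made for simplicity), whereas in applications one would plug in closed-form estimates for $G$ and $H$. The function names are hypothetical.

```python
from math import isqrt


def hyperbola_sum(g, h, x: int) -> int:
    """Sum over n <= x of (g*h)(n), computed as Sigma_1 + Sigma_2 - Sigma_3.

    g and h are functions defined on the positive integers; x is an integer >= 1.
    """
    r = isqrt(x)
    # Tables of partial sums G(y), H(y) for integers 0 <= y <= x.
    G = [0] * (x + 1)
    H = [0] * (x + 1)
    for n in range(1, x + 1):
        G[n] = G[n - 1] + g(n)
        H[n] = H[n - 1] + h(n)
    sigma1 = sum(g(a) * H[x // a] for a in range(1, r + 1))
    sigma2 = sum(h(b) * G[x // b] for b in range(1, r + 1))
    sigma3 = G[r] * H[r]
    return sigma1 + sigma2 - sigma3


def convolution_sum_naive(g, h, x: int) -> int:
    """Sum of g(a) * h(b) over all pairs of positive integers with a * b <= x."""
    return sum(g(a) * h(b) for a in range(1, x + 1) for b in range(1, x // a + 1))


if __name__ == "__main__":
    one = lambda n: 1
    ident = lambda n: n
    print(hyperbola_sum(one, one, 100))    # summatory function of d(n)
    print(hyperbola_sum(one, ident, 100))  # summatory function of sigma(n)
```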
Maximal order of the divisor function. Dirichlet's theorem gives an
estimate for the average order of the divisor function, but the divisor function
can take on values that are significantly smaller or significantly larger
than this average. Regarding lower bounds, we have $d(n) = 2$ whenever $n$ is
prime, and this bound is obviously best possible. The problem of obtaining a
similarly optimal upper bound is harder. It is easy to prove that $d(n)$ grows
at a rate slower than any power of $n$, in the sense that, for any given $\epsilon > 0$
and all sufficiently large $n$, we have $d(n) \le n^{\epsilon}$.
(x) =
2x +O(
x).
Remark. While O-estimates can be integrated provided the range of
integration is contained in the range of validity of the estimate, in general
such estimates cannot be differentiated. The above problem illustrates
a situation where, under certain additional conditions (namely,
the monotonicity of the derivative), differentiation of an O-estimate is
allowed.
2.4 Let $n$ be an integer $\ge 2$ and $p$ a positive real number. A useful estimate
is
$$\left( \sum_{i=1}^{n} a_i \right)^p \asymp_{n,p} \sum_{i=1}^{n} a_i^p \quad (a_1, a_2, \ldots, a_n > 0).$$
Prove this estimate with explicit and best-possible values for the
implied constants. In other words, determine the largest value of
$c_1 = c_1(n, p)$ and the smallest value of $c_2 = c_2(n, p)$ such that
$$c_1 \sum_{i=1}^{n} a_i^p \le \left( \sum_{i=1}^{n} a_i \right)^p \le c_2 \sum_{i=1}^{n} a_i^p \quad (a_1, a_2, \ldots, a_n > 0).$$
2.5 Obtain an estimate for the sum $\sum_{n \le x} (\log n)/n$ with error term
$O((\log x)/x)$.
2.6 Given a positive integer $k$, let $S_k(x) = \sum_{n \le x} (\log n - \log x)^k$. Estimate
$S_k(x)$ to within an error $O_k((\log x)^k)$. Deduce that, for each $k$, the
limit $\lambda_k = \lim_{x \to \infty} (1/x) S_k(x)$ exists (as a finite number), and obtain
an explicit evaluation of the constant $\lambda_k$.
2.7 Obtain an estimate for the sum
$$S(x) = \sum_{2 \le n \le x} \frac{1}{n \log n}$$
with error term $O(1/(x \log x))$.
2.8 Given an arithmetic function $f$ define a mean value $H(f)$ by
$$H(f) = \lim_{x \to \infty} \frac{1}{x \log x} \sum_{n \le x} f(n) \log n,$$
if the limit exists. Show that $H(f)$ exists if and only if the ordinary
mean value $M(f) = \lim_{x \to \infty} (1/x) \sum_{n \le x} f(n)$ exists.
2.9 Given an arithmetic function $a(n)$, $n = 1, 2, \ldots$, and a real number
$\alpha > -1$ define a mean value $M_{\alpha}(a)$ by
$$M_{\alpha}(a) = \lim_{x \to \infty} \frac{1 + \alpha}{x^{1 + \alpha}} \sum_{n \le x} n^{\alpha} a(n),$$
provided the limit exists. (In particular, $M_0(a) = M(a)$ is the usual
asymptotic mean value of $a$.) Prove, using a rigorous $\epsilon$-$x_0$ argument,
that the mean value $M_{\alpha}$
$\sum_{n=1}^{\infty} f(n) n^{-s}$ satisfies
$$(0) \qquad F(s) = \frac{A}{s - 1} + o\left( \frac{1}{s - 1} \right) \quad (s \to 1+).$$
Show that if $f$ has a logarithmic mean value $L(f) = A$, then $f$ also
has an analytic mean value, and the two mean values are equal.
2.11 Let $f$ be an arithmetic function having a non-zero mean value $M(f) = A$,
and let $\alpha$ be a fixed real number. Obtain an asymptotic formula
for the sums $\sum_{n \le x} f(n) n^{i\alpha}$.
2.12 Say that an arithmetic function f has a strong logarithmic mean value
A, and write L
$$\sum_{n \le x} \frac{f(n)}{n} = A \log x + B + o(1) \quad (x \to \infty)$$
for some constants $A$ and $B$. This is obviously a stronger condition
than the existence of a logarithmic mean value, which would correspond
to an estimate of the above form with $o(\log x)$ in place of $B + o(1)$.
(i) Show that, in contrast to the (ordinary) logarithmic mean value,
this stronger condition is sufficient to imply the existence of the
asymptotic mean value. That is, show that if $f$ has a strong
logarithmic mean value $A$ in the above sense, then the ordinary
mean value $M(f)$ also exists and is equal to $A$.
(ii) Is the converse true, i.e., does the existence of M(f) imply that
of a strong logarithmic mean value?
2.13 Let $\lambda > 1$ and $t \ne 0$ be fixed real numbers, and
$S_{t,\lambda}(x) = \sum_{x < n \le \lambda x} n^{-1-it}$. Obtain an estimate for $S_{t,\lambda}(x)$ as $x \to \infty$ with error
term $O_{t,\lambda}(1/x)$. Deduce from this estimate that, for any non-zero $t$ and
any $\lambda > 1$, the limit $\lim_{x \to \infty} |S_{t,\lambda}(x)|$ exists, and that, for given $t \ne 0$
and suitable choices of $\lambda$, this limit is non-zero. (Thus, by Cauchy's
criterion, the series $\sum_{n=1}^{\infty} n^{-1-it}$ diverges for every real $t \ne 0$.)
2.14 Obtain an asymptotic estimate with error term $O(x^{1/3})$ for the number
of squarefull integers $\le x$, i.e., for the quantity
$$S(x) = \#\{n \le x : p | n \Rightarrow p^2 | n\}.$$
2.15 Let $f(n) = \sum_{d^3 | n} \mu(d)$, where the sum runs over all cubes of positive
integers dividing $n$. Estimate $\sum_{n \le x} f(n)$ with as good an error as you
can get.
2.16 For any positive integer $n$ define its squarefree kernel $k(n)$ by
$k(n) = \prod_{p | n} p$. Obtain an estimate for $\sum_{n \le x} k(n)/n$ with error term $O(\sqrt{x})$.
2.17 (i) Obtain an estimate for the sum
$$S(x) = \sum_{\substack{n \le x \\ n \text{ odd}}} \frac{1}{n}, \quad x \ge 1,$$
with error term $O(1/x)$. (The estimate should not involve any
unspecified constants.)
(ii) Let
$$D(x) = \sum_{\substack{n \le x \\ n \text{ odd}}} d(n),$$
where d(n) is the divisor function. Give an estimate for D(x)
with error term O(
nx
d(n), for the sum
nx
2
(n)
.
2.19 Obtain an estimate, similar to the estimate for $\sum_{n \le x} 1/n$ proved in
Theorem 2.5, for the sum $\sum_{n \le x} 1/\phi(n)$. (Hint: Convolution method.)
2.20 Let $q_1 = 1$, $q_2 = 2$, $q_3 = 3$, $q_4 = 5, \ldots$ denote the sequence of squarefree
numbers.
(i) Obtain an asymptotic estimate with error term $O(\sqrt{n})$ for $q_n$.
(ii) Show that there are arbitrarily large gaps in the sequence $q_n$,
i.e., $\limsup_{n \to \infty} (q_{n+1} - q_n) = \infty$. (Hint: Chinese Remainder
Theorem.)
(iii) Prove the stronger bound
$$\limsup_{n \to \infty} \frac{q_{n+1} - q_n}{\log n / \log \log n} \ge \frac{1}{2}.$$
(iv)* (Harder) Prove that (iii) holds with 1/2 replaced by the constant
$\pi^2/12$, i.e., that the limsup above is at least $\pi^2/12$.
2.21 Show that the inequality $\phi(n) \ge n/4$ holds for at least 1/3 of all
positive integers $n$ (in the sense that if $A$ is the set of such $n$, then
$\liminf_{x \to \infty} (1/x) \#\{n \le x : n \in A\} \ge 1/3$). (Hint: use the fact that
(1) $\sum_{n \le x} \phi(n) \sim (3/\pi^2)x^2$ (which was proved in Theorem 2.16) or
(2) $\sum_{n \le x} \phi(n)/n \sim (6/\pi^2)x$ (an easy consequence of Wintner's theorem,
or of (1), by partial summation).)
2.22 Let $f = 1 * g$. Wintner's theorem (Theorem 2.19) shows that if the
series
$$(1) \qquad \sum_{n=1}^{\infty} \frac{g(n)}{n}$$
converges absolutely, then the mean value $M(f)$ of $f$ exists and is equal
to the sum of the series (1).
(i) Show that the conclusion of Wintner's theorem remains valid if
the series (1) converges only conditionally and if, in addition,
$$(2) \qquad \limsup_{x \to \infty} \frac{1}{x} \sum_{n \le x} |g(n)| < \infty.$$
(ii) Show that condition (2) cannot be dropped; i.e., construct an
example of a function $g$ for which the series (1) converges, but
the function $f = 1 * g$ does not have a mean value.
2.23 Using the Dirichlet hyperbola method (or some other method), obtain
an estimate for the sum $\sum_{n \le x} d(n)/n$ with an error term
$O((\log x)/\sqrt{x})$.
Chapter 3
Distribution of primes I:
Elementary results
The Prime Number Theorem (PNT), in its most basic form, is the asymptotic
relation $\pi(x) \sim x/\log x$ for the prime counting function $\pi(x)$, the
number of primes $\le x$. This result had been conjectured by Legendre
and (in a more precise form) by Gauss, based on examining tables of primes.
However, neither succeeded in proving the PNT (and it certainly wasn't for
lack of trying!). It was only much later, near the end of the 19th century,
that a proof of the PNT was given, independently by J. Hadamard and C.
de la Vallée Poussin, via a new, analytic, approach that was not available to
Gauss and his contemporaries. We will give a proof of the PNT, in a strong
form with an explicit error term, in a later chapter.
In this chapter we establish a number of elementary results on the distribution
of primes that are much easier to prove than the PNT and which,
for the most part, have been known long before the PNT was proved. These
results are of interest in their own right, and they have many applications.
3.1 Chebyshev type estimates
Getting upper and lower bounds for the prime counting function $\pi(x)$ is
surprisingly difficult. Euclid's result that there are infinitely many primes
shows that $\pi(x)$ tends to infinity, but the standard proofs of the infinitude
of primes are indirect and do not give an explicit lower bound for $\pi(x)$, or
give only a very weak bound. For example, Euclid's argument shows that
the $n$-th prime $p_n$ satisfies the bound $p_n \le p_1 \cdots p_{n-1} + 1$. By induction,
this implies that $p_n \le e^{e^{n-1}}$ for all $n$, from which one can deduce the bound
$\pi(x) \ge \log \log x$ for sufficiently large $x$. This bound is far from the true order
of $\pi(x)$, but it is essentially the best one can derive from Euclid's argument.
Euler's proof of the infinitude of primes proceeds by showing that
$\sum_{p \le x} 1/p \ge \log \log x - c$ for some constant $c$ and sufficiently large $x$.
Although this gives the correct order for the partial sums of the reciprocals
of primes (as we will see below, the estimate is accurate to within an error
$O(1)$), one cannot deduce from this a lower bound for $\pi(x)$ of comparable
quality. In fact, one can show (see the exercises) that the most one can
deduce from the above bound for $\sum_{p \le x} 1/p$ is a lower bound of the form
$\pi(x) \gg \log x$. While this is better than the bound obtained from Euclid's
argument, it is still far from the true order of magnitude.
In the other direction, getting non-trivial upper bounds for $\pi(x)$ is not
easy either. Even showing that $\pi(x) = o(x)$, i.e., that the primes have
density zero among all integers, is by no means easy, when proceeding from
scratch. (Try to prove this bound without resorting to any of the results,
techniques, and tricks you have learned so far.)
In light of these difficulties in getting even relatively weak nontrivial
bounds for $\pi(x)$ it is remarkable that, in the middle of the 19th century, the
Russian mathematician P.L. Chebyshev was able to determine the precise
order of magnitude of the prime counting function $\pi(x)$, by showing that
there exist positive constants $c_1$ and $c_2$ such that
$$c_1 \frac{x}{\log x} \le \pi(x) \le c_2 \frac{x}{\log x}$$
for all sufficiently large $x$. In fact, Chebyshev proved such an inequality
with constants $c_1 = 0.92\ldots$ and $c_2 = 1.10\ldots$. This enabled him to conclude
that, for sufficiently large $x$ (and, in fact, for all $x \ge 1$) there exists a prime
$p$ with $x < p \le 2x$, an assertion known as Bertrand's postulate.
In establishing these bounds, Chebyshev introduced the auxiliary functions
$$\theta(x) = \sum_{p \le x} \log p, \quad \psi(x) = \sum_{n \le x} \Lambda(n),$$
which proved to be extremely useful in subsequent work. Converting results
on $\theta(x)$ to results on $\psi(x)$ or $\pi(x)$, or vice versa, is easy (see Theorem 3.2
below), and we will state most of our results for all three of these functions
and use whichever version is most convenient for the proof.
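To get a feel for the three functions, it may help to tabulate them. The sketch below is an illustration with hypothetical helper names; it computes $\pi(x)$, $\theta(x)$, and $\psi(x)$ from a sieve, using the fact that $\Lambda(n) = \log p$ when $n$ is a power of a prime $p$ and $\Lambda(n) = 0$ otherwise.

```python
from math import isqrt, log


def primes_up_to(N: int) -> list:
    """Sieve of Eratosthenes: all primes p <= N."""
    if N < 2:
        return []
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(N) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return [n for n in range(2, N + 1) if sieve[n]]


def chebyshev_functions(x: int):
    """Return (pi(x), theta(x), psi(x)) for an integer x >= 1."""
    primes = primes_up_to(x)
    theta = sum(log(p) for p in primes)
    psi = 0.0
    for p in primes:
        pk = p
        while pk <= x:  # each prime power p^m <= x contributes log p
            psi += log(p)
            pk *= p
    return len(primes), theta, psi


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        pi_x, theta, psi = chebyshev_functions(x)
        print(x, pi_x * log(x) / x, theta / x, psi / x)
```

The three printed ratios are the quantities that Chebyshev's estimates bound between positive constants.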
Theorem 3.1 (Chebyshev estimates). For $x \ge 2$ we have
$$\psi(x) \asymp x, \tag{i}$$
$$\theta(x) \asymp x, \tag{ii}$$
$$\pi(x) \asymp \frac{x}{\log x}. \tag{iii}$$
Proof. We will establish (i), and then deduce (ii) and (iii) from (i).
To prove (i), we need to show that there exist positive constants $c_1$ and
$c_2$ such that
$$c_1 x \le \psi(x) \le c_2 x \tag{3.1}$$
holds for all $x \ge 2$.
We begin by noting that it suffices to establish (3.1) for $x \ge x_0$, for
a suitable $x_0 \ge 2$. Indeed, suppose there exists a constant $x_0 \ge 2$ such
that (3.1) holds for $x \ge x_0$. Since for $2 \le x \le x_0$ we have, trivially,
$\psi(x)/x \le \psi(x_0)/2$ and $\psi(x)/x \ge \psi(2)/x_0 = \log 2/x_0$, it then follows that
(3.1) holds for all $x \ge 2$ with constants $c_1' = \min(c_1, \log 2/x_0)$ and
$c_2' = \max(c_2, \psi(x_0)/2)$ in place of $c_1$ and $c_2$.
In what follows, we may therefore assume that $x$ is sufficiently large. (Recall
our convention that O-estimates without explicit range are understood
to hold for $x \ge x_0$ with a sufficiently large $x_0$.)
Define
$$S(x) = \sum_{n \le x} \log n, \quad D(x) = S(x) - 2S(x/2).$$
To prove (3.1) we will evaluate $D(x)$ in two different ways. On the one hand,
using the asymptotic estimate for $S(x)$ established earlier (see Corollary 2.8),
we have
$$\begin{aligned}
D(x) &= x(\log x - 1) + O(\log x) - 2(x/2)(\log(x/2) - 1) + O(\log(x/2)) \\
&= (\log 2)x + O(\log x).
\end{aligned}$$
Since $1/2 < \log 2 < 1$, this implies
$$x/2 \le D(x) \le x \quad (x \ge x_0) \tag{3.2}$$
with a suitable $x_0$.
On the other hand, using the identity $\log n = (\Lambda * 1)(n) = \sum_{d | n} \Lambda(d)$
and interchanging summations, we have
$$S(x) = \sum_{d \le x} \Lambda(d) \sum_{\substack{n \le x \\ d | n}} 1 = \sum_{d \le x} \Lambda(d)[x/d] \quad (x \ge 1), \tag{3.3}$$
where $[t]$ denotes the greatest integer function. Applying (3.3) to $S(x)$ and
$S(x/2)$, we get
$$D(x) = S(x) - 2S(x/2) = \sum_{d \le x} \Lambda(d) f(x/d) \quad (x \ge 2), \tag{3.4}$$
where $f(t) = [t] - 2[t/2]$. (Note that in the evaluation of $S(x/2)$ the summation
range $d \le x/2$ can be extended to $d \le x$, since the terms with
$x/2 < d \le x$ do not contribute to the sum due to the factor $[x/2d]$.) By the
elementary inequalities $[s] \le s$ and $[s] > s - 1$, valid for any real number $s$,
we have
$$f(t) \begin{cases} < t - 2(t/2 - 1) = 2, \\ > t - 1 - 2(t/2) = -1. \end{cases}$$
Since the function $f(t) = [t] - 2[t/2]$ is integer-valued, it follows that
$$f(t) \begin{cases} = 1 & \text{if } 1 \le t < 2, \\ \in \{0, 1\} & \text{if } t \ge 2. \end{cases}$$
Hence (3.4) implies
$$D(x) \begin{cases} \le \sum_{d \le x} \Lambda(d) = \psi(x), \\ \ge \sum_{x/2 < d \le x} \Lambda(d) = \psi(x) - \psi(x/2) \end{cases} \quad (x \ge 2). \tag{3.5}$$
Combining (3.2) and (3.5), we obtain
$$\psi(x) \ge D(x) \ge x/2 \quad (x \ge x_0), \tag{3.6}$$
and
$$\psi(x) \le D(x) + \psi(x/2) \le x + \psi(x/2) \quad (x \ge x_0). \tag{3.7}$$
The first of these inequalities immediately gives the lower bound in (3.1)
(with $c_1 = 1/2$). To obtain a corresponding upper bound, we note that
iteration of (3.7) yields
$$\psi(x) \le \sum_{i=0}^{k-1} x 2^{-i} + \psi(x 2^{-k}),$$
for any positive integer $k$ such that $x 2^{-k+1} \ge x_0$. Choosing $k$ as the maximal
such integer, we have $2^{-k+1} x \ge x_0 > 2^{-k} x$ and thus $\psi(x 2^{-k}) \le \psi(x_0)$, and
hence obtain
$$\psi(x) \le \sum_{i=0}^{k-1} x 2^{-i} + \psi(x_0) \le 2x + \psi(x_0),$$
which gives the upper bound in (3.1) for $x \ge x_0$ with a sufficiently large
constant $c_2$ (in fact, we could take $c_2 = 2 + \epsilon$, for $x \ge x_0(\epsilon)$, for any fixed
$\epsilon > 0$ with a suitable $x_0(\epsilon)$).
This completes the proof of (i).
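The two evaluations of $D(x)$ and the sandwich (3.5) can be verified numerically for moderate $x$. The following sketch is an illustration only (helper names are hypothetical); it computes $D(x)$ both from the definition and from (3.4), and returns $\psi(x)$ and $\psi(x/2)$ for comparison.

```python
from math import log


def von_mangoldt_table(N: int) -> list:
    """lam[n] = log p if n is a power of a prime p, and 0.0 otherwise."""
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for k in range(2 * p, N + 1, p):
                composite[k] = True
            pk = p
            while pk <= N:
                lam[pk] = log(p)
                pk *= p
    return lam


def chebyshev_D(x: int):
    """Return (D from S(x) - 2S(x/2), D from (3.4), psi(x), psi(x/2)) for integer x >= 2."""
    lam = von_mangoldt_table(x)
    S = lambda y: sum(log(n) for n in range(1, y + 1))
    psi = lambda y: sum(lam[1 : y + 1])
    d_direct = S(x) - 2 * S(x // 2)
    # f(t) = [t] - 2[t/2] at t = x/d; for integer x, [x/(2d)] equals [x/d] // 2.
    d_conv = sum(lam[d] * ((x // d) - 2 * ((x // d) // 2)) for d in range(1, x + 1))
    return d_direct, d_conv, psi(x), psi(x // 2)


if __name__ == "__main__":
    for x in (100, 1000, 10000):
        d_direct, d_conv, psi_x, psi_half = chebyshev_D(x)
        print(x, d_direct / x, log(2), psi_x - psi_half <= d_conv <= psi_x)
```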
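To deduce (ii), we note that
$$\begin{aligned}
\psi(x) - \theta(x) &= \sum_{p^m \le x} \log p - \sum_{p \le x} \log p \\
&= \sum_{p \le \sqrt{x}} \log p \sum_{2 \le m \le \log x/\log p} 1
\le \sum_{p \le \sqrt{x}} (\log p) \left( \frac{\log x}{\log p} \right) \le \sqrt{x} \log x,
\end{aligned} \tag{3.8}$$
so that
$$\theta(x) \begin{cases} \le \psi(x), \\ \ge \psi(x) - \sqrt{x} \log x. \end{cases}$$
Hence the upper bound in (3.1) remains valid for $\theta(x)$, with the same values
of $c_2$ and $x_0$, and the lower bound holds for $\theta(x)$ with constant $c_1/2$ (for
example) instead of $c_1$, upon increasing the value of $x_0$ if necessary.
The lower bound in (iii) follows immediately from that in (ii), since
$\pi(x) \ge (1/\log x) \sum_{p \le x} \log p = \theta(x)/\log x$. The upper bound follows from
the inequality
$$\pi(x) \le \pi(\sqrt{x}) + \frac{1}{\log \sqrt{x}} \sum_{\sqrt{x} < p \le x} \log p \le \sqrt{x} + \frac{2}{\log x} \theta(x)$$
and the upper bound in (ii).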
Alternate proofs of Theorem 3.1. The proof given here rests on the
convolution identity $\Lambda * 1 = \log$, which relates the unknown function $\Lambda$ to
two extremely well-behaved functions, namely 1 and log. Given this relation,
it is natural to try to use it to derive information on the average behavior of
the function $\Lambda(n)$ from the very precise information that is available on the
behavior of the functions 1 and $\log n$. The particular way this identity is used
in the proof of the theorem may seem contrived. Unfortunately, more natural
approaches don't work, and one has to resort to some sort of trickery to
get any useful information out of the above identity. For example, it is
tempting to try to simply invert the relation $\Lambda * 1 = \log$ to express $\Lambda$ as
$\Lambda = \mu * \log$, interpret $\Lambda$ as a perturbation of the function log and proceed
as in the convolution method described in Section 2.4. Unfortunately, the
error terms in this approach are too large to be of any use.
There exist alternate proofs of Theorem 3.1, but none is particularly
motivated or natural, and all involve some sort of trick. For example, a
commonly seen argument, which may be a bit shorter than the one given
here, but has more the character of pulling something out of the air, is
based on an analysis of the middle binomial coefficient $\binom{2n}{n}$: On the one
hand, writing this coefficient as a fraction $(n + 1)(n + 2) \cdots (2n)/n!$ and
noting that every prime $p$ with $n < p \le 2n$ divides the numerator, but not
the denominator, we see that $\binom{2n}{n}$ is divisible by the product $\prod_{n < p \le 2n} p$.
On the other hand, the binomial theorem gives the bound
$$\binom{2n}{n} \le \sum_{k=0}^{2n} \binom{2n}{k} = 2^{2n}.$$
Hence $\prod_{n < p \le 2n} p \le 2^{2n}$, and taking logarithms, we conclude
$$\theta(2n) - \theta(n) = \sum_{n < p \le 2n} \log p \le (2 \log 2)n,$$
for any positive integer $n$. By iterating this inequality, one gets
$\theta(2^k) \le (2 \log 2)2^k$ for any positive integer $k$, and then $\theta(x) \le (4 \log 2)x$ for any real
number $x \ge 2$. This proves the upper bound in (ii) with constant $4 \log 2$.
The lower bound in (ii) can be proved by a similar argument, based on an
analysis of the prime factorization of $\binom{2n}{n}$, and the lower bound
$$\binom{2n}{n} \ge \frac{1}{2n + 1} \sum_{k=0}^{2n} \binom{2n}{k} = \frac{2^{2n}}{2n + 1}.$$
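Each step of the binomial coefficient argument can be checked with exact integer arithmetic. The sketch below is illustrative (function names are hypothetical); it verifies, for a given $n$, that $\prod_{n < p \le 2n} p$ divides $\binom{2n}{n}$, that this product is at most $2^{2n}$ (equivalently $\theta(2n) - \theta(n) \le (2 \log 2)n$), and that $2^{2n}/(2n+1) \le \binom{2n}{n} \le 2^{2n}$.

```python
from math import comb


def is_prime(n: int) -> bool:
    """Trial division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def binomial_checks(n: int):
    """Return three booleans: divisibility, theta bound, two-sided binomial bound."""
    c = comb(2 * n, n)
    prod = 1
    for p in range(n + 1, 2 * n + 1):
        if is_prime(p):
            prod *= p
    divides = c % prod == 0
    theta_bound = prod <= 4**n  # same as theta(2n) - theta(n) <= (2 log 2) n
    binomial_bounds = c <= 4**n and c * (2 * n + 1) >= 4**n
    return divides, theta_bound, binomial_bounds


if __name__ == "__main__":
    print(all(all(binomial_checks(n)) for n in range(1, 201)))
```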
The constants in Chebyshev's estimates. An inspection of the above
argument shows that it yields (3.1) with any constants $c_1$ and $c_2$ satisfying
$c_1 < \log 2 = 0.69\ldots$ and $c_2 > 2 \log 2 = 1.38\ldots$, for sufficiently
large $x$. Chebyshev used a more complicated version of this argument,
in which the linear combination $S(x) - 2S(x/2)$ is replaced by
$S(x) - S(x/2) - S(x/3) - S(x/5) + S(x/30)$, to obtain $c_1 = 0.92\ldots$ and
$c_2 = 1.10\ldots$ as constants in (3.1). For most applications, the values of these
constants are not important. However, since the PNT had not been proved
at the time Chebyshev proved his estimates, there was a strong motivation
to obtain constants as close to 1 as possible. It is natural to ask if, by considering
more general linear combinations of the functions $S(x/k)$, one can
further improve these constants. This is indeed the case; in fact, Diamond
and Erdős showed that it is possible to obtain constants $c_1$ and $c_2$ arbitrarily
close to 1, by using Chebyshev's approach with a suitable linear combination
of the functions $S(x/k)$. Now, the assertion that (3.1) holds with constants $c_1$
and $c_2$ arbitrarily close to 1, clearly implies the PNT in the form $\psi(x) \sim x$,
so it would seem that Chebyshev's method in fact yields a proof of the PNT.
However, this is not the case, since in proving that $c_1$ and $c_2$ can be taken
arbitrarily close to 1, Diamond and Erdős had to use the PNT.
Theorem 3.2 (Relation between $\pi$, $\psi$, and $\theta$). For $x \ge 2$ we have
$$\theta(x) = \psi(x) + O(\sqrt{x}), \tag{i}$$
$$\pi(x) = \frac{\psi(x)}{\log x} + O\left( \frac{x}{\log^2 x} \right). \tag{ii}$$
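Proof. A slightly weaker version of (i), with error term $O(\sqrt{x} \log x)$ instead
of $O(\sqrt{x})$, was established in (3.8). To obtain the sharper error term, note
that the sum in (3.8) is at most $\pi(\sqrt{x}) \log x$, and that, by Chebyshev's
estimate for $\pi$,
$$\pi(\sqrt{x}) \log x \ll (\sqrt{x}/\log \sqrt{x}) \log x \ll \sqrt{x}.$$
To obtain (ii), we write
$$\pi(x) = \sum_{p \le x} 1 = \sum_{p \le x} (\log p)(1/\log p)$$
and eliminate the factor $1/\log p$ by partial summation:
$$\pi(x) = \frac{\theta(x)}{\log x} - \int_2^x \theta(t) \left( -\frac{1}{t(\log t)^2} \right) dt = \frac{\theta(x)}{\log x} + \int_2^x \frac{\theta(t)}{t(\log t)^2}\,dt.$$
Since $\theta(t) \ll t$ by Chebyshev's estimate, the last integral is of order
$$\ll \int_2^x \frac{1}{(\log t)^2}\,dt \le \int_2^{\sqrt{x}} \frac{1}{(\log 2)^2}\,dt + \int_{\sqrt{x}}^x \frac{1}{(\log \sqrt{x})^2}\,dt
\ll \sqrt{x} + \frac{x}{(\log \sqrt{x})^2} \ll \frac{x}{(\log x)^2},$$
so we have
$$\pi(x) = \frac{\theta(x)}{\log x} + O\left( \frac{x}{\log^2 x} \right).$$
In view of (i), we may replace $\theta(x)$ by $\psi(x)$ on the right-hand side, and thus
obtain (ii).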
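The partial summation identity $\pi(x) = \theta(x)/\log x + \int_2^x \theta(t)/(t (\log t)^2)\,dt$ is exact, and since $\theta$ is a step function the integral can be evaluated in closed form on each interval between consecutive primes, using $\int_a^b dt/(t (\log t)^2) = 1/\log a - 1/\log b$. The sketch below (with hypothetical helper names) checks the identity numerically.

```python
from math import isqrt, log


def primes_up_to(N: int) -> list:
    """Sieve of Eratosthenes: all primes p <= N."""
    if N < 2:
        return []
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(N) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return [n for n in range(2, N + 1) if sieve[n]]


def pi_via_theta(x: int) -> float:
    """theta(x)/log x + integral from 2 to x of theta(t)/(t log^2 t) dt, for x >= 2."""
    primes = primes_up_to(x)
    theta = 0.0
    integral = 0.0
    for i, p in enumerate(primes):
        theta += log(p)  # theta(t) is constant on [p, next prime)
        upper = primes[i + 1] if i + 1 < len(primes) else x
        if upper > p:
            integral += theta * (1 / log(p) - 1 / log(upper))
    return theta / log(x) + integral


if __name__ == "__main__":
    for x in (10, 100, 1000, 10**5):
        print(x, len(primes_up_to(x)), pi_via_theta(x))
```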
Corollary 3.3 (Equivalent formulations of PNT). The following relations
are equivalent:
$$\pi(x) \sim \frac{x}{\log x} \quad (x \to \infty), \tag{i}$$
$$\theta(x) \sim x \quad (x \to \infty), \tag{ii}$$
$$\psi(x) \sim x \quad (x \to \infty). \tag{iii}$$
Proof. By the previous theorem, the functions $\theta(x)$, $\psi(x)$, and $\pi(x) \log x$
differ by an error term that is of order $O(x/\log x)$ (at worst), and hence are
of smaller order (by a factor $1/\log x$) than the main terms in the asserted
relations.
3.2 Mertens type estimates
A second class of estimates below the level of the PNT are estimates for
certain weighted sums over primes, such as the sum of reciprocals of primes
up to $x$. These estimates seem surprisingly strong as the error terms involved
are by at least a logarithmic factor smaller than the main term, yet they are
not strong enough to imply the PNT.
Theorem 3.4 (Mertens' estimates). We have
$$\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1), \tag{i}$$
$$\sum_{p \le x} \frac{\log p}{p} = \log x + O(1), \tag{ii}$$
$$\sum_{p \le x} \frac{1}{p} = \log \log x + A + O\left( \frac{1}{\log x} \right), \tag{iii}$$
$$\prod_{p \le x} \left( 1 - \frac{1}{p} \right) = \frac{e^{-\gamma}}{\log x} \left( 1 + O\left( \frac{1}{\log x} \right) \right), \tag{iv}$$
where $A$ is a constant and $\gamma$ is Euler's constant.
Before proving this result, we make some remarks and derive two corollaries.
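Estimates (ii) to (iv) are easy to examine numerically. The sketch below is an illustration with hypothetical helper names. The numerical values of Euler's constant and of the constant $A$ in (iii) (about 0.2615, often called the Meissel-Mertens constant) are standard values supplied here as literals; they are not stated in these notes.

```python
from math import exp, isqrt, log

EULER_GAMMA = 0.5772156649015329      # Euler's constant (standard numerical value)
MEISSEL_MERTENS = 0.2614972128476428  # constant A in (iii) (standard numerical value)


def primes_up_to(N: int) -> list:
    """Sieve of Eratosthenes: all primes p <= N."""
    if N < 2:
        return []
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(N) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return [n for n in range(2, N + 1) if sieve[n]]


def mertens_sums(x: int):
    """Return (sum of log p / p, sum of 1/p, product of (1 - 1/p)) over primes p <= x."""
    primes = primes_up_to(x)
    sum_log = sum(log(p) / p for p in primes)
    sum_rec = sum(1.0 / p for p in primes)
    prod = 1.0
    for p in primes:
        prod *= 1.0 - 1.0 / p
    return sum_log, sum_rec, prod


if __name__ == "__main__":
    for x in (10**3, 10**5, 10**6):
        sum_log, sum_rec, prod = mertens_sums(x)
        print(
            x,
            sum_log - log(x),                   # bounded, by (ii)
            sum_rec - log(log(x)),              # tends to A, by (iii)
            prod * log(x) * exp(EULER_GAMMA),   # tends to 1, by (iv)
        )
```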
Estimate (iv) is usually referred to as Mertens' formula. We will prove
this result here with an unspecified constant in place of $e^{-\gamma}$. In reciprocal
form, (iv) reads
$$\prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1} = e^{\gamma} (\log x) \left( 1 + O\left( \frac{1}{\log x} \right) \right).$$
This version has the following interesting interpretation:
Let $P(x)$ denote the product on the left of this reciprocal version of (iv).
Expanding each of the factors $(1 - 1/p)^{-1}$ into a geometric series and multiplying out all terms
in this product, one obtains a sum over terms $(*)$ $\prod_{p \le x} p^{-\alpha_p}$, where the
exponents $\alpha_p$ run over all non-negative integers. Now, $(*)$ is the reciprocal
of a positive integer $n$ all of whose prime factors are $\le x$, and by the fundamental
theorem of arithmetic each such reciprocal $1/n$ has exactly one
representation in the form $(*)$. Hence, letting $A_x = \{n \in \mathbb{N} : p | n \Rightarrow p \le x\}$
denote the set of such integers $n$, we have $P(x) = \sum_{n \in A_x} 1/n$. Now $A_x$
clearly contains every positive integer $n \le x$, so we have the lower bound
$P(x) \ge \sum_{n \le x} 1/n$, which is asymptotic to $\log x$. The estimate (iv) shows
that $P(x)$ is (asymptotically) by a factor $e^{\gamma}$
and A
1 A
_
x
0
1
(t)/t
t
dt + (1 )
_
x
x
0
1
t
dt (x
0
)
_
1
1
t
2
dt + (1 ) log x,
which contradicts (3.9) if x is suciently large. Hence A
1, and a similar
argument shows A
1.
Proof of Theorem 3.4. To prove (i) we begin, as in the proof of Chebyshev's
estimate for $\psi(x)$, with two evaluations for $S(x) = \sum_{n \le x} \log n$. On the one
hand, by Corollary 2.7, we have $S(x) = x \log x + O(x)$. On the other hand,
(3.3) and Chebyshev's estimate imply
$$S(x) = \sum_{d \le x} \Lambda(d)[x/d] = x \sum_{d \le x} \frac{\Lambda(d)}{d} + O\left( \sum_{d \le x} \Lambda(d) \right) = x \sum_{d \le x} \frac{\Lambda(d)}{d} + O(x).$$
Setting the last expression equal to $x \log x + O(x)$ and dividing by $x$, we
obtain (i).
The estimate (ii) follows from (i) on noting that the difference between
the sums in (i) and (ii) equals
$$\sum_{\substack{p^m \le x \\ m \ge 2}} \frac{\log p}{p^m},$$
which can be bounded by
$$\sum_{p} \log p \sum_{m=2}^{\infty} \frac{1}{p^m} \le 2 \sum_{p} \frac{\log p}{p^2} < \infty.$$
Hence the sums in (i) and (ii) differ by a term of order $O(1)$, and so (ii)
follows from (i).
We now deduce (iii) from (ii). To this end we write the summand $1/p$
as $((\log p)/p)(1/\log p)$ and apply partial summation to remove the factor
$1/\log p$. Defining $L(t)$ and $R(t)$ by
$$L(t) = \sum_{p \le t} \frac{\log p}{p} = \log t + R(t)$$
(so that $R(t) = O(1)$ by (ii)), we obtain
$$\begin{aligned}
\sum_{p \le x} \frac{1}{p} &= \frac{L(x)}{\log x} + \int_2^x L(t) \frac{1}{t(\log t)^2}\,dt \\
&= 1 + \frac{R(x)}{\log x} + \int_2^x \frac{1}{t \log t}\,dt + \int_2^x \frac{R(t)}{t(\log t)^2}\,dt \\
&= 1 + O\left( \frac{1}{\log x} \right) + \log \log x - \log \log 2 + I(x),
\end{aligned}$$
where $I(x) = \int_2^x R(t)/(t \log^2 t)\,dt$. To obtain the desired estimate (iii), it
suffices to show that, for some constant $C$,
$$I(x) = C + O\left( \frac{1}{\log x} \right). \tag{3.10}$$
To prove this, note that, since $R(t) = O(1)$ and the integral $\int_2^{\infty} (t \log^2 t)^{-1}\,dt$
converges, the infinite integral $I(\infty) = \int_2^{\infty} R(t)/(t \log^2 t)\,dt$ converges. Setting
$C = I(\infty)$, we have
$$I(x) = C - \int_x^{\infty} \frac{R(t)}{t(\log t)^2}\,dt = C + O\left( \int_x^{\infty} \frac{1}{t(\log t)^2}\,dt \right) = C + O\left( \frac{1}{\log x} \right),$$
which proves (3.10).
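It remains to prove (iv). We will establish (iv) with some positive constant
$B$ in place of $e^{-\gamma}$, deferring the evaluation of this constant
(which is, in fact, the most difficult part of the proof of Theorem 3.4), to a
later chapter.
Taking logarithms, (iv) becomes
$$\sum_{p \le x} \log \left( 1 - \frac{1}{p} \right) = -\gamma - \log \log x + \log \left( 1 + O\left( \frac{1}{\log x} \right) \right).$$
Since, for $|y| \le 1/2$, $|\log(1 + y)| \ll |y|$, this estimate is equivalent to
$$\sum_{p \le x} \log \left( 1 - \frac{1}{p} \right) = -C - \log \log x + O\left( \frac{1}{\log x} \right), \tag{3.11}$$
with $C = \gamma$. We will show that the latter estimate holds, with a suitable
constant $C$.
Using the expansion $-\log(1 - x) = \sum_{n \ge 1} x^n/n$ ($|x| < 1$), we have
$$-\sum_{p \le x} \log \left( 1 - \frac{1}{p} \right) = \sum_{p \le x} \frac{1}{p} + \sum_{p \le x} r_p,$$
with $r_p = \sum_{m=2}^{\infty} 1/(m p^m)$. Since $|r_p| \le (1/2) \sum_{m=2}^{\infty} p^{-m} \le p^{-2}$, the series
$\sum_p r_p$ is absolutely convergent, with sum $R$, say, and we have
$$\sum_{p \le x} r_p = R - \sum_{p > x} r_p = R + O\left( \sum_{p > x} \frac{1}{p^2} \right) = R + O\left( \frac{1}{x} \right).$$
Hence the left-hand side of (iii) and the negative of the left-hand side of (3.11)
differ by $R + O(1/x)$. Therefore (3.11) follows from (iii).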
3.3 Elementary consequences of the PNT
The following result gives some elementary consequences of the PNT.
Theorem 3.7 (Elementary consequences of the PNT). The PNT
implies:
(i) The $n$th prime $p_n$ satisfies $p_n \sim n \log n$ as $n \to \infty$.
(ii) The function $\omega(n)$, the number of distinct prime factors of $n$, has
maximal order $(\log n)/(\log \log n)$, i.e., it satisfies
$$\limsup_{n \to \infty} \frac{\omega(n)}{(\log n)/(\log \log n)} = 1.$$
(iii) For every $\epsilon > 0$ there exists $x_0 = x_0(\epsilon) \ge 2$ such that for all $x \ge x_0$
there exists a prime $p$ with $x < p \le (1 + \epsilon)x$.
(iv) The set of rational numbers $p/q$ with $p$ and $q$ prime is dense on the
positive real axis.
(v) Given any finite string $a_1 \ldots a_n$ of digits $0, 1, \ldots, 9$ with $a_1 \ne 0$,
there exists a prime number whose decimal expansion begins with this
string.
Proof. (i) Since $p_n \to \infty$ as $n \to \infty$, the PNT gives
\[
n = \pi(p_n) \sim \frac{p_n}{\log p_n} \quad (n \to \infty). \tag{3.12}
\]
This implies that, for any fixed $\epsilon > 0$ and all sufficiently large $n$, we have $p_n^{1-\epsilon} \le n \le p_n$, and hence $(1-\epsilon)\log p_n \le \log n \le \log p_n$. The latter relation shows that $\log p_n \sim \log n$ as $n \to \infty$, and substituting this asymptotic formula into (3.12) yields $n \sim p_n/\log n$, which is equivalent to the desired relation $p_n \sim n\log n$.
(ii) First note that, given any positive integer $k$, the least positive integer $n$ with $\omega(n) = k$ is $n_k = p_1 \cdots p_k$, where $p_i$ denotes the $i$-th prime. Since $\log n/\log\log n$ is a monotone increasing function for sufficiently large $n$, it suffices to consider integers $n$ from the sequence $\{n_k\}$ in the limsup in (ii). We then need to show that
\[
\limsup_{k\to\infty} \frac{k}{(\log n_k)/(\log\log n_k)} = 1. \tag{3.13}
\]
The PNT and the asymptotic formula for $p_k$ proved in part (i) imply
\[
\log n_k = \sum_{i=1}^k \log p_i = \theta(p_k) \sim p_k \sim k\log k \quad (k \to \infty)
\]
and
\[
\log\log n_k = \log((1+o(1))k\log k) = \log k + \log\log k + \log(1+o(1)) = (1+o(1))\log k.
\]
Substituting these estimates on the left side of (3.13) gives the desired relation.
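(iii) By the PNT we have, for any fixed $\epsilon > 0$,
\[
\frac{\pi((1+\epsilon)x)}{\pi(x)} \sim \frac{(1+\epsilon)x/\log((1+\epsilon)x)}{x/\log x} = (1+\epsilon)\frac{\log x}{\log(1+\epsilon) + \log x},
\]
and thus $\lim_{x\to\infty} \pi((1+\epsilon)x)/\pi(x) = 1+\epsilon$. This implies $\pi((1+\epsilon)x) > \pi(x)$ for any $\epsilon > 0$ and $x \ge x_0(\epsilon)$, which is equivalent to the assertion in (iii).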
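(iv) Given a positive real number $\alpha$ and $\epsilon > 0$, we need to show that there exist primes $p$ and $q$ with $|p/q - \alpha| \le \epsilon$, or equivalently $(*)$ $\alpha q - \epsilon q \le p \le \alpha q + \epsilon q$. To this end, set $\epsilon' = \epsilon/\alpha$ and let $x_0 = x_0(\epsilon')$ be as in part (iii). Thus, for $x \ge x_0$, there exists a prime $p$ with $x < p \le (1+\epsilon')x$. Taking $x = \alpha q$, where $q$ is any prime with $\alpha q \ge x_0$, we conclude that there exists a prime $p$ with $\alpha q < p \le (1+\epsilon')\alpha q = \alpha q + \epsilon q$, as desired.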
(v) Given a string of digits $a_1 \dots a_r$, with $a_1 \ne 0$, let $A = \overline{a_1 \dots a_r}$ denote the integer formed by these digits. Since $a_1 \ne 0$, $A$ is a positive integer. Now observe that a positive integer $n$ begins with the string $a_1 \dots a_r$ if and only if, for some integer $k \ge 0$, $(*)$ $10^k A \le n < 10^k(A+1)$. Applying the result of (iii) with some $\epsilon < 1/A$ (say, $\epsilon = 1/(A+1)$), we see that the interval $(*)$ contains a prime for sufficiently large $k$.
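Parts (i) and (v) of the theorem are easy to explore numerically. The following Python sketch is only an illustration: the sieve bound 200000, the sample prefix "2005", and the helper names are arbitrary choices, not part of the notes. It tabulates the ratio $p_n/(n\log n)$, which should drift slowly towards 1, and searches for the smallest prime with a given leading digit string.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: return the list of all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

def nth_prime_ratio(n, primes):
    """Return p_n / (n log n), where p_n is the n-th prime (1-indexed, n >= 2)."""
    return primes[n - 1] / (n * math.log(n))

def first_prime_with_prefix(prefix, primes):
    """Smallest prime in the list whose decimal expansion starts with `prefix`."""
    for p in primes:
        if str(p).startswith(prefix):
            return p
    return None  # no such prime below the sieve bound

primes = primes_up_to(200_000)
for n in (10, 100, 1000, 10_000):
    print(n, round(nth_prime_ratio(n, primes), 4))
print(first_prime_with_prefix("2005", primes))
```

The convergence in (i) is slow because the relative error in $p_n \sim n\log n$ is of order $\log\log n/\log n$, so the printed ratios stay visibly above 1 in this range.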
3.4 The PNT and averages of the Moebius function
As we have seen above, the PNT is equivalent to each of the relations $\theta(x) \sim x$ and $\psi(x) \sim x$, in the sense that deducing one statement from the other, and vice versa, is substantially easier than proving either of these statements. (One should be aware that this type of "equivalence" is an imprecise, and to some extent subjective, notion, but in the context of the prime number theorem this informal usage of the term "equivalent" has become standard. Of course, from a purely logical point of view, all true statements are equivalent to each other.)
There exist many other prime number sums or products for which an asymptotic estimation is equivalent, in the same sense, to the PNT. These equivalences are usually neither particularly deep nor unexpected, and are easily established.
In this section we prove the equivalence of the PNT to a rather different type of result, namely that the Moebius function has mean value zero. In contrast to the above-mentioned equivalences, the connection between the PNT and the mean value of the Moebius function lies much deeper and is more difficult to establish (though still easier than a proof of the PNT). The precise statement is the following.
Theorem 3.8 (Relation between PNT and the Moebius function). The PNT is equivalent to the relation
\[
\lim_{x\to\infty} \frac{1}{x} \sum_{n\le x} \mu(n) = 0, \tag{3.14}
\]
i.e., the assertion that the mean value $M(\mu)$ of the Moebius function exists and is equal to 0.
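Both sides of this equivalence can be observed numerically. The sketch below (an illustration only; the function names and the sample values of $x$ are assumptions made here) sieves the Moebius and von Mangoldt functions and prints $M(x)/x = x^{-1}\sum_{n\le x}\mu(n)$ next to $\psi(x)/x$. The first column should approach 0 and the second should approach 1.

```python
import math

def mobius_and_mangoldt(limit):
    """Return lists (mu, lam) with mu[n] = mu(n) and lam[n] = Lambda(n) for 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        # p is prime: mark its proper multiples as composite
        for multiple in range(2 * p, limit + 1, p):
            is_composite[multiple] = True
        # each prime factor flips the sign of mu ...
        for multiple in range(p, limit + 1, p):
            mu[multiple] = -mu[multiple]
        # ... and a square factor forces mu to vanish
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
        # Lambda(p^m) = log p for every prime power p^m
        power = p
        while power <= limit:
            lam[power] = math.log(p)
            power *= p
    return mu, lam

def mean_values(x):
    """Return (M(x)/x, psi(x)/x) for a positive integer x."""
    mu, lam = mobius_and_mangoldt(x)
    return sum(mu[1:]) / x, sum(lam[1:]) / x

for x in (10**3, 10**4, 10**5):
    m, psi = mean_values(x)
    print(x, round(m, 5), round(psi, 5))
```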
The proof of this result will be given in the second part of this section. We first make some remarks and establish several auxiliary results.
Primes and the Moebius function. What is so surprising about this result is that there does not seem to be any obvious connection between the distribution of primes (which is described by the PNT), and the distribution of the values of the Moebius function (which is described by the result that $M(\mu) = 0$). If one restricts to squarefree numbers, then the Moebius function encodes the parity of the number of prime factors of an integer. The assertion that $M(\mu) = 0$ can then be interpreted as saying that the two parities, even and odd, occur with the same asymptotic frequency. More precisely, this may be formulated as follows: Let $Q(x)$ denote the number of squarefree positive integers $\le x$, and $Q_+(x)$, resp. $Q_-(x)$, the number of squarefree positive integers $\le x$ with an even, resp. odd, number of prime factors. Then $M(\mu) = 0$ is equivalent to
\[
Q_-(x) = Q_+(x) + o(x) = (1/2)Q(x) + o(x) \sim \frac{3}{\pi^2}\,x \quad (x \to \infty),
\]
in view of the asymptotic relation $Q(x) \sim (6/\pi^2)x$. The equivalence between the PNT and the relation $M(\mu) = 0$ therefore means that an asymptotic formula for the function $\pi(x)$, which counts positive integers $\le x$ with exactly one prime factor, is equivalent to an asymptotic formula for the function $Q_-(x)$, which counts squarefree positive integers $\le x$ with an odd number of prime factors.
Lemma 3.9. For any real $x \ge 1$ we have
\[
\left|\sum_{n\le x} \frac{\mu(n)}{n}\right| \le 1.
\]
Proof. Note first that, without loss of generality, we can assume that $x = N$, where $N$ is a positive integer. We then evaluate the sum $S(N) = \sum_{n\le N} e(n)$, where $e$ is the convolution identity, defined by $e(n) = 1$ if $n = 1$ and $e(n) = 0$ otherwise, in two different ways. On the one hand, by the definition of $e(n)$, we have $S(N) = 1$; on the other hand, writing $e(n) = \sum_{d|n} \mu(d)$ and interchanging summations, we obtain
\[
S(N) = \sum_{d\le N} \mu(d)[N/d] = N\sum_{d\le N} \frac{\mu(d)}{d} - \sum_{d\le N} \mu(d)\{N/d\},
\]
where $\{t\}$ denotes the fractional part of $t$. We now bound the latter sum. Since $N$ is an integer, we have $\{N/d\} = 0$ when $d = N$. Thus we can restrict the summation to those terms for which $1 \le d \le N-1$, and using the trivial bound $|\mu(d)\{N/d\}| \le 1$ for these terms, we see that this sum is bounded by $N-1$. Hence,
\[
\left|N\sum_{d\le N} \frac{\mu(d)}{d}\right| \le (N-1) + |S(N)| = (N-1) + 1 = N,
\]
which gives the asserted bound for $x = N$.
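The two facts used in this proof can be checked in exact arithmetic. The sketch below is illustrative only (the function names and the bound 2000 are choices made here): it verifies the identity $\sum_{d\le N}\mu(d)[N/d] = 1$ and computes the largest value of $|\sum_{n\le x}\mu(n)/n|$ over integers $x \le 2000$, using Python's `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction

def mobius_table(limit):
    """Return a list mu with mu[n] = mu(n) for 1 <= n <= limit (index 0 is unused)."""
    mu = [1] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_composite[multiple] = True
            mu[multiple] = -mu[multiple]
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
    return mu

def floor_sum(N):
    """S(N) = sum over d <= N of mu(d) * floor(N/d); the proof shows this equals 1."""
    mu = mobius_table(N)
    return sum(mu[d] * (N // d) for d in range(1, N + 1))

def largest_partial_sum(limit):
    """Largest |sum_{n <= x} mu(n)/n| over integers 1 <= x <= limit, computed exactly."""
    mu = mobius_table(limit)
    partial = Fraction(0)
    largest = Fraction(0)
    for n in range(1, limit + 1):
        partial += Fraction(mu[n], n)
        largest = max(largest, abs(partial))
    return largest

print(floor_sum(1000))
print(largest_partial_sum(2000) <= 1)
```

The bound is attained at $x = 1$, where the sum equals 1, so it cannot be improved for all $x \ge 1$.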
Corollary 3.10 (Logarithmic mean value of the Moebius function). The Moebius function has logarithmic mean value $L(\mu) = 0$. Moreover, if the ordinary mean value $M(\mu)$ exists, it must be equal to 0.
Proof. The first statement follows immediately from the definition of the logarithmic mean value and Theorem 3.9. The second statement follows from the first and the general result (Theorem 2.13) that if the ordinary mean value $M(f)$ of an arithmetic function $f$ exists, then the logarithmic mean value $L(f)$ exists as well, and the two mean values are equal.
Our second auxiliary result is a general result relating the ordinary mean value of an arithmetic function to a mean value involving logarithmic weights. Its proof is a simple exercise in partial summation and is omitted here.
Lemma 3.11. Given an arithmetic function $f$, define a mean value $H(f)$ by
\[
H(f) = \lim_{x\to\infty} \frac{1}{x\log x} \sum_{n\le x} f(n)\log n,
\]
if the limit exists. Then $H(f)$ exists if and only if the ordinary mean value $M(f)$ exists.
We are now ready to prove the main result of this section.
Proof of 3.8. The proof of this result is longer and more complex than any
of the proofs we have encountered so far. Yet it is still easier than a proof of
the PNT itself. Its proof requires much of the arsenal of tools and tricks we
have assembled so far: convolution identities between arithmetic functions,
partial summation, convolution arguments, the Dirichlet hyperbola method,
and an estimate for sums of the divisor functions.
We will use the fact that the PNT is equivalent to the relation
\[
\psi(x) \sim x \quad (x \to \infty). \tag{3.15}
\]
We will also use the result of Lemma 3.11 above, according to which (3.14) is equivalent to
\[
\lim_{x\to\infty} \frac{1}{x\log x} \sum_{n\le x} \mu(n)\log n = 0. \tag{3.16}
\]
To prove Theorem 3.8 it is therefore enough to show the implications (i) (3.15) $\Rightarrow$ (3.16) and (ii) (3.14) $\Rightarrow$ (3.15).
(i) Proof of (3.15) $\Rightarrow$ (3.16): This is the easier direction. The proof rests on the following identity which is a variant of the identity $\log = 1 * \Lambda$.
Lemma 3.12. We have
\[
\mu(n)\log n = -(\mu * \Lambda)(n) \quad (n \in \mathbb{N}). \tag{3.17}
\]
Proof. Suppose first that $n$ is squarefree. In this case we have, for any divisor $d$ of $n$, $\mu(n) = \mu(d)\mu(n/d)$ (since any such divisor $d$ must be squarefree and relatively prime to its complementary divisor $n/d$) and $\mu(d)\Lambda(d) = -\Lambda(d)$ (since for squarefree $d$, $\Lambda(d)$ is zero unless $d$ is a prime, in which case $\mu(d) = -1$). Thus, multiplying the identity $\log n = \sum_{d|n} \Lambda(n/d)$ by $\mu(n)$, we obtain
\[
\mu(n)\log n = \sum_{d|n} \mu(d)\mu(n/d)\Lambda(n/d) = -\sum_{d|n} \mu(d)\Lambda(n/d),
\]
which proves (3.17) for squarefree $n$. If $n$ is not squarefree, the left-hand side of (3.17) is zero, so it suffices to show that $(\mu * \Lambda)(n) = 0$ for non-squarefree $n$. Now $(\mu * \Lambda)(n) = \sum_{p^m | n} (\log p)\mu(n/p^m)$. If $n$ is divisible by the squares of at least two primes, then none of the numbers $n/p^m$ occurring in this sum is squarefree, so the sum vanishes. On the other hand, if $n$ is divisible by exactly one square of a prime, then $n$ is of the form $n = p_0^{m_0} n_0$ with $m_0 \ge 2$ and $n_0$ squarefree, $(n_0, p_0) = 1$, and the above sum reduces to $(\log p_0)\mu(n_0) + (\log p_0)\mu(n_0 p_0)$, which is again zero since $\mu(n_0 p_0) = \mu(n_0)\mu(p_0) = -\mu(n_0)$. This completes the proof of the lemma.
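The identity (3.17) can be checked directly for small $n$. The sketch below is an illustration under the assumption that floating-point logarithms with a small tolerance are an acceptable stand-in for exact arithmetic; the helper names are chosen here and do not come from the notes. It evaluates both sides of $\mu(n)\log n = -(\mu * \Lambda)(n)$ for $n \le 500$, covering squarefree and non-squarefree $n$ alike.

```python
import math

def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # n itself is prime

def mu_conv_lambda(n):
    """Dirichlet convolution (mu * Lambda)(n) = sum over d | n of mu(d) Lambda(n/d)."""
    return sum(mobius(d) * von_mangoldt(n // d)
               for d in range(1, n + 1) if n % d == 0)

def identity_holds(n, tol=1e-9):
    """Check mu(n) log n = -(mu * Lambda)(n) up to rounding error."""
    return abs(mobius(n) * math.log(n) + mu_conv_lambda(n)) < tol

print(all(identity_holds(n) for n in range(1, 501)))
```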
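Now, suppose (3.15) holds, and set
\[
H(x) = \sum_{n\le x} \mu(n)\log n.
\]
We need to show that, given $\epsilon > 0$, we have $|H(x)| \le \epsilon x\log x$ for all sufficiently large $x$. From the identity (3.17) we have
\[
H(x) = -\sum_{n\le x} \sum_{d|n} \mu(d)\Lambda(n/d) = -\sum_{d\le x} \mu(d) \sum_{m\le x/d} \Lambda(m) = -\sum_{d\le x} \mu(d)\psi(x/d).
\]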
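Let $\epsilon > 0$ be given. By the hypothesis (3.15) there exists $x_0 = x_0(\epsilon) \ge 1$ such that, for $x \ge x_0$, $|\psi(x) - x| \le \epsilon x$. Moreover, we have trivially $|\psi(x)| \le \sum_{n\le x} \log n \le x\log x$ for all $x \ge 1$. Applying the first bound with $x/d$ in place of $x$ for $d \le x/x_0$, and the second (trivial) bound for $x/x_0 < d \le x$, we obtain, for $x \ge x_0$,
\[
\begin{aligned}
|H(x)| &\le \left|\sum_{d\le x/x_0} \mu(d)\frac{x}{d}\right| + \sum_{d\le x/x_0} |\mu(d)| \left|\psi\left(\frac{x}{d}\right) - \frac{x}{d}\right| + \sum_{x/x_0 < d\le x} |\mu(d)| \left|\psi\left(\frac{x}{d}\right)\right| \\
&\le \left|\sum_{d\le x/x_0} \mu(d)\frac{x}{d}\right| + \epsilon\sum_{d\le x/x_0} |\mu(d)|\frac{x}{d} + \sum_{x/x_0 < d\le x} |\mu(d)|\frac{x}{d}\log(x/d) \\
&= \Sigma_1 + \Sigma_2 + \Sigma_3,
\end{aligned}
\tag{3.18}
\]
say. Of the three sums here, the first is bounded by
\[
\Sigma_1 = x\left|\sum_{d\le x/x_0} \frac{\mu(d)}{d}\right| \le x
\]
by Lemma 3.9. The second sum is bounded by
\[
\Sigma_2 \le \epsilon x\sum_{d\le x/x_0} \frac{1}{d} = \epsilon x(\log(x/x_0) + O(1)) \le \epsilon x(\log x + O(1)),
\]
by Theorem 2.5. The third sum satisfies
\[
\Sigma_3 \le (\log x_0)x_0 \sum_{d\le x} 1 \le (\log x_0)x_0 x,
\]
since $x/d < x_0$ in the range of summation, and hence is of order $O_\epsilon(x)$. Altogether we obtain
\[
|H(x)| \le \epsilon x\log x + O_\epsilon(x) \quad (x \ge x_0),
\]
which implies
\[
\limsup_{x\to\infty} \frac{|H(x)|}{x\log x} \le \epsilon.
\]
Since $\epsilon$ was arbitrary, the limsup must be zero, i.e., (3.16) holds.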
(ii) Proof of (3.14) $\Rightarrow$ (3.15): The proof rests on the identity $\Lambda = \mu * \log$, which follows from $\log = 1 * \Lambda$ by Moebius inversion. However, a direct application of this identity is not successful: Namely, writing $\Lambda(n) = \sum_{d|n} (\log d)\mu(n/d)$ and inverting the order of summation as usual yields
\[
\psi(x) = \sum_{n\le x} \Lambda(n) = \sum_{d\le x} (\log d) \sum_{n\le x/d} \mu(n).
\]
Assuming a bound of the form $\epsilon x/d$ for (the absolute value of) the inner sum would yield a bound $\epsilon\sum_{d\le x} (\log d)(x/d)$, which has order of magnitude $\epsilon x(\log x)^2$ and thus would not even yield a Chebyshev type bound for $\psi(x)$. Even assuming stronger bounds on $\sum_{n\le x} \mu(n)$ (e.g., bounds of the form $O(x/\log^A x)$ for some constant $A$) would at best yield Chebyshev's bound $\psi(x) \ll x$ in this approach.
To get around these difficulties, we take the unusual (and surprising) step of approximating a smooth function, namely $\log n$, by an arithmetic function that is anything but smooth, but which has nice arithmetic properties. The approximation we choose is the function $f(n) = d(n) - 2\gamma$, where $d = 1 * 1$ is the divisor function and $\gamma$ is Euler's constant. This choice is motivated by the fact that the summatory function of $f(n)$ approximates the summatory function of $\log n$ very well. Indeed, on the one hand, Theorem 2.20 gives
\[
\sum_{n\le x} f(n) = \sum_{n\le x} d(n) - 2\gamma\sum_{n\le x} 1 = x(\log x + 2\gamma - 1) + O(\sqrt{x}) - 2\gamma x + O(1) = x(\log x - 1) + O(\sqrt{x}),
\]
while, on the other hand, by Corollary 2.8,
\[
\sum_{n\le x} \log n = x(\log x - 1) + O(\log x).
\]
Thus, if we define the remainder function $r(n)$ by
\[
\log = f + r = (1 * 1) - 2\gamma + r,
\]
we have
\[
\sum_{n\le x} r(n) = \sum_{n\le x} \log n - \sum_{n\le x} f(n) = O(\sqrt{x}) \quad (x \ge 1). \tag{3.19}
\]
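The quality of this approximation is easy to observe numerically. The following sketch is illustrative only: the names, the hard-coded value of Euler's constant, and the sample values of $x$ are assumptions made here. It computes $R(x) = \sum_{n\le x} r(n)$ for integer $x$, using $\sum_{n\le x} d(n) = \sum_{k\le x} [x/k]$ and $\sum_{n\le x} \log n = \log(x!)$, and prints $R(x)/\sqrt{x}$, which (3.19) asserts is bounded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for this sketch

def divisor_summatory(x):
    """sum_{n <= x} d(n), computed as sum_{k <= x} floor(x / k)."""
    return sum(x // k for k in range(1, x + 1))

def remainder_sum(x):
    """R(x) = sum_{n <= x} log n - sum_{n <= x} (d(n) - 2*gamma), for a positive integer x."""
    log_sum = math.lgamma(x + 1)  # log(x!) = sum_{n <= x} log n
    f_sum = divisor_summatory(x) - 2 * EULER_GAMMA * x
    return log_sum - f_sum

for x in (10**2, 10**3, 10**4, 10**5):
    print(x, round(remainder_sum(x) / math.sqrt(x), 4))
```

For comparison, $\sum_{n\le x}\log n$ itself is of size about $x\log x$, so replacing $\log$ by $r$ gains considerably more than a factor $\sqrt{x}$ on average.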
Replacing the function $\log$ by $f + r = (1 * 1) - 2\gamma \cdot 1 + r$ in the identity $\Lambda = \mu * \log$, we obtain, using the algebraic properties of the Dirichlet convolution,
\[
\Lambda = \mu * (1 * 1 - 2\gamma \cdot 1 + r) = (\mu * 1) * 1 - 2\gamma(\mu * 1) + \mu * r = 1 - 2\gamma e + \mu * r,
\]
where $e$ is the usual convolution identity. It follows that
\[
\psi(x) = \sum_{n\le x} 1 - 2\gamma + \sum_{n\le x} (\mu * r)(n) = x + O(1) + E(x), \tag{3.20}
\]
where
\[
E(x) = \sum_{n\le x} (\mu * r)(n).
\]
Thus, in order to obtain (3.15), it remains to show that the term $E(x)$ is of order $o(x)$ as $x \to \infty$.
It is instructive to compare the latter sum $E(x)$ with the sum $\psi(x) = \sum_{n\le x} (\mu * \log)(n)$ we started out with. Both of these sums are convolution sums involving the Moebius function. The difference is that, in the sum $E(x)$, the function $\log n$ has been replaced by the function $r(n)$, which, by (3.19), is much smaller on average than the function $\log n$. This makes a crucial difference in our ability to successfully estimate the sum. Indeed, writing
\[
E(x) = \sum_{d\le x} \mu(d) \sum_{n\le x/d} r(n)
\]
and bounding the inner sum by (3.19) would give the bound
\[
E(x) \ll \sum_{d\le x} \sqrt{x/d} = \sqrt{x}\sum_{d\le x} \frac{1}{\sqrt{d}} \ll x,
\]
which is by a factor $(\log x)^2$ better than what a similar argument with the function $\log n$ instead of $r(n)$ would have given and strong enough to yield Chebyshev's bound for $\psi(x)$.
Thus, it remains to improve the above bound from $O(x)$ to $o(x)$, by exploiting our assumption (3.14) (which was not used in deriving the above bound). To this end, we use a general version of the Dirichlet hyperbola method: We fix $y$ with $1 \le y \le x$ and split the sum $E(x)$ into
\[
E(x) = \sum_{n\le x} \sum_{d|n} r(d)\mu(n/d) = \sum_{dm\le x} r(d)\mu(m) = \Sigma_1 + \Sigma_2 - \Sigma_3, \tag{3.21}
\]
where
\[
\Sigma_1 = \sum_{d\le y} r(d)M(x/d), \quad \Sigma_2 = \sum_{m\le x/y} \mu(m)R(x/m), \quad \Sigma_3 = R(y)M(x/y),
\]
with
\[
M(x) = \sum_{n\le x} \mu(n), \quad R(x) = \sum_{n\le x} r(n).
\]
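We proceed to estimate the three sums arising in the decomposition (3.21). Let $\epsilon > 0$ be given. Then, by our assumption (3.14), there exists $x_0 = x_0(\epsilon)$ such that $|M(x)| \le \epsilon x$ for $x \ge x_0$. Moreover, by (3.19), there exists a constant $c$ such that $|R(x)| \le c\sqrt{x}$ for all $x \ge 1$. Applying these bounds we obtain, for $x \ge x_0 y$,
\[
|\Sigma_3| \le c\sqrt{y}\,\epsilon(x/y) \le \epsilon cx,
\]
and
\[
|\Sigma_2| \le \sum_{m\le x/y} |\mu(m)|\,c\sqrt{x/m} \le c\sqrt{x}\sum_{m\le x/y} \frac{1}{\sqrt{m}} \le \frac{2cx}{\sqrt{y}},
\]
where the latter estimate follows (for example) from Euler's summation formula in the form
\[
\sum_{n\le t} \frac{1}{\sqrt{n}} = 1 + \int_1^t \frac{1}{\sqrt{s}}\,ds - \frac{\{t\}}{\sqrt{t}} - \int_1^t \frac{\{s\}}{2s^{3/2}}\,ds \le 1 + \int_1^t s^{-1/2}\,ds \le 2\sqrt{t} \quad (t \ge 2).
\]
Finally, we have
\[
|\Sigma_1| \le \sum_{d\le y} |r(d)|\,\epsilon(x/d) \le \epsilon C(y)x,
\]
where
\[
C(y) = \sum_{d\le y} \frac{|r(d)|}{d}
\]
is a constant depending only on $y$.
Substituting the above bounds into (3.21), we obtain
\[
|E(x)| \le x\left(\epsilon c + \epsilon C(y) + \frac{2c}{\sqrt{y}}\right).
\]
It follows that
\[
\limsup_{x\to\infty} \frac{|E(x)|}{x} \le \epsilon(c + C(y)) + \frac{2c}{\sqrt{y}},
\]
for any fixed $\epsilon > 0$ and $y \ge 1$. Since $\epsilon > 0$ was arbitrary, the above limsup is bounded by $2c/\sqrt{y}$. Since $y$ can be chosen arbitrarily large, the limsup must be zero, i.e., $E(x) = o(x)$, as required.
$\sum_p a_p$ converges if and only if $\sum_{n=2}^\infty a_n/\log n$ converges.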
3.4 For positive integers $k$ define the generalized von Mangoldt functions $\Lambda_k$ by the identity $\sum_{d|n} \Lambda_k(d) = (\log n)^k$ (which for $k = 1$ reduces to the familiar identity for the ordinary von Mangoldt function $\Lambda(n)$). Show that $\Lambda_k(n) = 0$ if $n$ has more than $k$ distinct prime factors.
3.5 Call a positive integer n round if it has no prime factors greater than
x,
and then show that the difference between $R(x)$ and $R_0(x)$ is of order $O(x/\log x)$ and thus negligible.)
3.6 Let $\alpha$ be a fixed non-zero real number, and let $S_\alpha(x) = \sum_{p\le x} p^{-1-i\alpha}$. Use the prime number theorem in the form $\pi(x) = x/\log x + O(x/\log^2 x)$ to derive an estimate for $S_\alpha(x)$.
3.7 Let
\[
Q(x) = \prod_{p\le x} \left(1 + \frac{1}{p}\right).
\]
Obtain an estimate for $Q(x)$ with relative error $O(1/\log x)$. Express the constant arising in this estimate in terms of well-known mathematical constants. (Hint: Relate $Q(x)$ to the product $P(x) = \prod_{p\le x} (1 - 1/p)$ estimated by Mertens' formula.)
3.8 Let
\[
R(x) = \prod_{p\le x} \left(1 + \frac{2}{p}\right).
\]
Obtain an estimate for $R(x)$ with relative error $O(1/\log x)$. (Constants arising in the estimate need not be evaluated explicitly.)
3.9 Using only Mertens' type estimates (but not the PNT), obtain an asymptotic estimate for the partial sums
\[
S(x) = \sum_{p\le x} \frac{1}{p\log p}
\]
with as good an error term as you can get using only results at the level of Mertens.
3.10 (i) Show that the estimate (a stronger version of one of Mertens' estimates)
\[
\sum_{n\le x} \frac{\Lambda(n)}{n} = \log x + C + o(1), \tag{1}
\]
where $C$ is a constant, implies the PNT.
(ii) (Harder) Show that the converse also holds, i.e., the PNT implies (1).
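3.11 Show that
\[
\sum_{n\le x} \mu^2(n) = \frac{6}{\pi^2}\,x + o(\sqrt{x}) \quad (x \to \infty).
\]
(With error term $O(\sqrt{x})$ in place of $o(\sqrt{x})$ the estimate is easier; the improvement to $o(\sqrt{x})$ requires an appeal to the PNT and a more careful treatment of the term that gave rise to the $O(\sqrt{x})$ error.)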
3.12 Euler's proof of the infinitude of primes shows that $(*)$ $\sum_{p\le x} 1/p \ge \log\log x - C$, for some constant $C$ and all sufficiently large $x$. This is a remarkably good lower bound for the sum of reciprocals of primes (it is off by only a term $O(1)$), so it is of some interest to see what this bound implies for $\pi(x)$. The answer is, surprisingly little, as the following problems show.
(i) Deduce from $(*)$, without using any other information about the primes, that there exists $\delta > 0$ such that $\pi(x) > \delta\log x$ for all sufficiently large $x$. In other words, show that if $A$ is any sequence of positive integers satisfying
\[
\sum_{a\le x,\ a\in A} \frac{1}{a} \ge \log\log x - C \tag{1}
\]
for some constant $C$ and all sufficiently large $x$, then there exists a constant $\delta > 0$ such that the counting function $A(x) = \#\{a \in A,\ a \le x\}$ satisfies
\[
A(x) \ge \delta\log x \tag{2}
\]
for all sufficiently large $x$.
(ii) (Harder) Show that this result is nearly best possible, in the sense
that it becomes false if the function log x on the right-hand side
of (2) is replaced by a power (log x)
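Given an arithmetic function $f(n)$, the series
\[
F(s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} \tag{4.1}
\]
is called the Dirichlet series associated with $f$. A Dirichlet series can be regarded as a purely formal infinite series (i.e., ignoring questions about convergence), or as a function of the complex variable $s$, defined in the region in which the series converges. The variable $s$ is usually written as
\[
s = \sigma + it, \quad \sigma = \operatorname{Re} s, \quad t = \operatorname{Im} s. \tag{4.2}
\]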
Dirichlet series serve as a type of generating functions for arithmetic
functions, adapted to the multiplicative structure of the integers, and they
play a role similar to that of ordinary generating functions in combinatorics.
For example, just as ordinary generating functions can be used to prove
combinatorial identities, Dirichlet series can be applied to discover and prove
identities among arithmetic functions.
On a more sophisticated level, the analytic properties of a Dirichlet series, regarded as a function of the complex variable $s$, can be exploited to obtain information on the behavior of partial sums $\sum_{n\le x} f(n)$ of arithmetic functions. This is how Hadamard and de la Vallée Poussin obtained the first proof of the Prime Number Theorem. In fact, most analytic proofs of the Prime Number Theorem (including the one we shall give in the following chapter) proceed by relating the partial sums $\sum_{n\le x} \Lambda(n)$ to a complex integral involving the Dirichlet series $\sum_{n=1}^\infty \Lambda(n)n^{-s}$, and evaluating that integral by analytic techniques.
The most famous Dirichlet series is the Riemann zeta function $\zeta(s)$, defined as the Dirichlet series associated with the constant function 1, i.e.,
\[
\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \quad (\sigma > 1), \tag{4.3}
\]
where $\sigma$ is the real part of $s$, as defined in (4.2).
4.2 Algebraic properties of Dirichlet series
We begin by proving two important elementary results which show that Dirichlet series respect the multiplicative structure of the integers. It is because of these results that Dirichlet series, rather than ordinary generating functions, are the ideal tool to study the behavior of arithmetic functions.
The first result shows that the Dirichlet series of a convolution product of arithmetic functions is the (ordinary) product of the associated Dirichlet series. It is analogous to the well-known (and easy to prove) fact that, given two functions $f(n)$ and $g(n)$, the product of their ordinary generating functions $\sum_{n=0}^\infty f(n)z^n$ and $\sum_{n=0}^\infty g(n)z^n$ is the generating function for the function $h(n) = \sum_{k=0}^n f(k)g(n-k)$, the additive convolution of $f$ and $g$.
Theorem 4.1 (Dirichlet series of convolution products). Let $f$ and $g$ be arithmetic functions with associated Dirichlet series $F(s)$ and $G(s)$. Let $h = f * g$ be the Dirichlet convolution of $f$ and $g$, and $H(s)$ the associated Dirichlet series. If $F(s)$ and $G(s)$ converge absolutely at some point $s$, then so does $H(s)$, and we have $H(s) = F(s)G(s)$.
Proof. We have
\[
F(s)G(s) = \sum_{k=1}^\infty \sum_{m=1}^\infty \frac{f(k)g(m)}{k^s m^s} = \sum_{n=1}^\infty \frac{1}{n^s} \sum_{km=n} f(k)g(m) = \sum_{n=1}^\infty \frac{(f * g)(n)}{n^s},
\]
where the rearranging of terms in the double sum is justified by the absolute convergence of the series $F(s)$ and $G(s)$. This shows that $F(s)G(s) = H(s)$; the absolute convergence of the series $H(s) = \sum_{n=1}^\infty h(n)n^{-s}$ follows from that of $F(s)$ and $G(s)$ in view of the inequality
\[
\sum_{n=1}^\infty \left|\frac{h(n)}{n^s}\right| \le \sum_{n=1}^\infty \frac{1}{|n^s|} \sum_{km=n} |f(k)|\,|g(m)| = \left(\sum_{k=1}^\infty \left|\frac{f(k)}{k^s}\right|\right) \left(\sum_{m=1}^\infty \left|\frac{g(m)}{m^s}\right|\right).
\]
Remark. The hypothesis that the Dirichlet series F(s) and G(s) converge
absolutely is essential here, since one has to be able to rearrange the terms in
the double series obtained by multiplying the series F(s) and G(s). Without
this hypothesis, the conclusion of the theorem need not hold.
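The theorem can be illustrated numerically with truncated series. The sketch below makes several assumptions for illustration: arithmetic functions are stored as Python lists indexed from 1, the truncation point is 2000, and the test case is $f = g = 1$ at $s = 3$, so that $h = 1 * 1 = d$. The truncated sums of $H(s)$ and $F(s)G(s)$ then agree up to a small truncation error.

```python
def dirichlet_convolution(f, g, limit):
    """Return h = f * g as a list with h[n] = sum over km = n of f[k] g[m], for n <= limit."""
    h = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for m in range(1, limit // k + 1):
            h[k * m] += f[k] * g[m]
    return h

def dirichlet_partial_sum(f, s, limit):
    """Truncated Dirichlet series: sum over n <= limit of f[n] / n^s."""
    return sum(f[n] / n ** s for n in range(1, limit + 1))

LIMIT = 2000
one = [0] + [1] * LIMIT                      # the constant function 1 (index 0 unused)
d = dirichlet_convolution(one, one, LIMIT)   # divisor function d = 1 * 1

F = dirichlet_partial_sum(one, 3, LIMIT)     # approximates zeta(3)
H = dirichlet_partial_sum(d, 3, LIMIT)       # approximates zeta(3)^2
print(H, F * F)
```

The two printed values differ only through terms with $km > 2000$, which is why absolute convergence (and hence a small tail) matters for the identity.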
Corollary 4.2 (Dirichlet series of convolution inverses). Let $f$ be an arithmetic function with associated Dirichlet series $F(s)$, and $g$ the convolution inverse of $f$ (so that $f * g = e$), and let $G(s)$ be the Dirichlet series associated with $g$. Then we have $G(s) = 1/F(s)$ at any point $s$ at which both $F(s)$ and $G(s)$ converge absolutely.
Proof. Since the function $e$ has Dirichlet series $\sum_{n=1}^\infty e(n)n^{-s} = 1$, the result follows immediately from the theorem.
Remark. The absolute convergence of $F(s)$ does not imply that of the Dirichlet series associated with the Dirichlet inverse of $f$. For example, the function defined by $f(1) = 1$, $f(2) = -1$, and $f(n) = 0$ for $n \ge 3$ has Dirichlet series $F(s) = 1 - 2^{-s}$, which converges everywhere. However, the Dirichlet series of the Dirichlet inverse of $f$ is $1/F(s) = (1 - 2^{-s})^{-1} = \sum_{k=0}^\infty 2^{-ks}$, which converges absolutely in $\sigma > 0$, but not in the half-plane $\sigma \le 0$.
The theorem and its corollary can be used, in conjunction with known
convolution identities, to evaluate the Dirichlet series of many familiar arith-
metic functions, as is illustrated by the following examples.
Examples of Dirichlet series
(1) Unit function. The Dirichlet series for $e(n)$, the convolution unit, is $\sum_{n=1}^\infty e(n)n^{-s} = 1$.
(2) Moebius function. Since $\mu$ is the convolution inverse of the function 1 and the associated Dirichlet series $\sum_{n=1}^\infty \mu(n)n^{-s}$ and $\zeta(s) = \sum_{n=1}^\infty n^{-s}$ both converge absolutely in $\sigma > 1$, we have $\sum_{n=1}^\infty \mu(n)n^{-s} = 1/\zeta(s)$ for $\sigma > 1$. In particular, setting $s = 2$, we obtain the relation $\sum_{n=1}^\infty \mu(n)n^{-2} = 1/\zeta(2) = 6/\pi^2$, which we had derived earlier.
(3) Characteristic function of the squares. Let $s(n)$ denote the characteristic function of the squares. Then the associated Dirichlet series is given by $\sum_{n=1}^\infty s(n)n^{-s} = \sum_{m=1}^\infty (m^2)^{-s} = \zeta(2s)$, which converges absolutely in $\sigma > 1/2$.
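(4) Logarithm. Termwise differentiation of the series $\zeta(s) = \sum_{n=1}^\infty n^{-s}$ gives the series $-\sum_{n=1}^\infty (\log n)n^{-s}$. Since $\zeta(s)$ converges absolutely and uniformly in any range of the form $\sigma \ge 1+\epsilon$ with $\epsilon > 0$ (which follows, for example, by applying the Weierstrass M-test since the terms of the series are bounded by $n^{-1-\epsilon}$ in that range and $\sum_{n=1}^\infty n^{-1-\epsilon}$ converges), termwise differentiation is justified in the range $\sigma > 1$, and we therefore have $\zeta'(s) = -\sum_{n=1}^\infty (\log n)n^{-s}$. Hence the Dirichlet series for the function $\log n$ is $-\zeta'(s)$, and it converges absolutely in $\sigma > 1$.
(5) Identity function. The Dirichlet series associated with the identity function $\mathrm{id}(n) = n$ is $\sum_{n=1}^\infty \mathrm{id}(n)n^{-s} = \sum_{n=1}^\infty n^{-(s-1)} = \zeta(s-1)$, which converges absolutely in $\sigma > 2$.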
(6) Euler phi function. By the identity $\phi = \mathrm{id} * \mu$ and the formulas for the Dirichlet series for $\mathrm{id}$ and $\mu$ obtained above, the Dirichlet series for $\phi(n)$ is $\sum_{n=1}^\infty \phi(n)n^{-s} = \zeta(s-1)/\zeta(s)$ and converges absolutely for $\sigma > 2$.
(7) Divisor function. Since $d = 1 * 1$, the Dirichlet series for the divisor function is $\sum_{n=1}^\infty d(n)n^{-s} = \zeta(s)^2$ and converges absolutely in $\sigma > 1$.
(8) Characteristic function of the squarefree numbers. The function $\mu^2$ satisfies the identity $\mu^2 * s = 1$, where $s$ is the characteristic function of the squares, whose Dirichlet series was evaluated above as $\zeta(2s)$. Hence the Dirichlet series associated with $\mu^2$, i.e., $F(s) = \sum_{n=1}^\infty \mu^2(n)n^{-s}$, satisfies $F(s)\zeta(2s) = \zeta(s)$, where all series converge absolutely in $\sigma > 1$. It follows that $F(s) = \zeta(s)/\zeta(2s)$ for $\sigma > 1$.
(9) Von Mangoldt function. Since $\Lambda * 1 = \log$ and the function $\log$ has Dirichlet series $-\zeta'(s)$, we have $\left(\sum_{n=1}^\infty \Lambda(n)n^{-s}\right)\zeta(s) = -\zeta'(s)$, and so
\[
\sum_{n=1}^\infty \Lambda(n)n^{-s} = -\frac{\zeta'(s)}{\zeta(s)} \quad (\sigma > 1).
\]
Because $\zeta(s)$ appears in the denominator, this formula clearly shows the influence of the location of zeta zeros on the distribution of prime numbers.
The second important result of this section gives a representation of the Dirichlet series of a multiplicative function as an infinite product over primes, called Euler product. Given a Dirichlet series $F(s) = \sum_{n=1}^\infty f(n)n^{-s}$, the Euler product for $F(s)$ is the infinite product
\[
\prod_p \left(1 + \sum_{m=1}^\infty \frac{f(p^m)}{p^{ms}}\right). \tag{4.4}
\]
(For the definition of convergence and absolute convergence of infinite products, and some basic results about such products, see Section A.2 in the Appendix.)
Theorem 4.3 (Euler product identity). Let $f$ be a multiplicative arithmetic function with Dirichlet series $F$, and let $s$ be a complex number.
(i) If $F(s)$ converges absolutely at some point $s$, then the infinite product (4.4) converges absolutely and is equal to $F(s)$.
(ii) The Dirichlet series $F(s)$ converges absolutely if and only if
\[
\sum_{p^m} \left|\frac{f(p^m)}{p^{ms}}\right| < \infty. \tag{4.5}
\]
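Proof. (i) The absolute convergence of the infinite product follows from the bound
\[
\sum_p \left|\sum_{m=1}^\infty \frac{f(p^m)}{p^{ms}}\right| \le \sum_p \sum_{m=1}^\infty \left|\frac{f(p^m)}{p^{ms}}\right| \le \sum_{n=1}^\infty \left|\frac{f(n)}{n^s}\right| < \infty,
\]
and a general convergence criterion for infinite products (Lemma A.3 in the Appendix). It therefore remains to show that the product is equal to $F(s)$, i.e., that $\lim_{N\to\infty} P_N(s) = F(s)$, where
\[
P_N(s) = \prod_{p\le N} \left(1 + \sum_{m=1}^\infty \frac{f(p^m)}{p^{ms}}\right).
\]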
Let $N \ge 2$ be given, and let $p_1, \dots, p_k$ denote the primes $\le N$. Upon multiplying out $P_N(s)$ (note that the term 1 in each factor can be written as $f(p^m)/p^{ms}$ with $m = 0$) and using the multiplicativity of $f$, we obtain
\[
P_N(s) = \sum_{m_1=0}^\infty \cdots \sum_{m_k=0}^\infty \frac{f(p_1^{m_1}) \cdots f(p_k^{m_k})}{p_1^{m_1 s} \cdots p_k^{m_k s}} = \sum_{m_1=0}^\infty \cdots \sum_{m_k=0}^\infty \frac{f(p_1^{m_1} \cdots p_k^{m_k})}{(p_1^{m_1} \cdots p_k^{m_k})^s}.
\]
The integers $p_1^{m_1} \cdots p_k^{m_k}$ occurring in this sum are positive integers composed only of prime factors $p \le N$, i.e., elements of the set
\[
A_N = \{n \in \mathbb{N} : p \mid n \Rightarrow p \le N\}.
\]
Moreover, by the Fundamental Theorem of Arithmetic, each element of $A_N$ has a unique factorization as $p_1^{m_1} \cdots p_k^{m_k}$ with $m_i \in \mathbb{N} \cup \{0\}$, and thus occurs exactly once in the above sum. Hence we have $P_N(s) = \sum_{n\in A_N} f(n)n^{-s}$. Since $A_N$ contains all integers $\le N$, it follows that
\[
|P_N(s) - F(s)| = \left|\sum_{n\notin A_N} \frac{f(n)}{n^s}\right| \le \sum_{n>N} \left|\frac{f(n)}{n^s}\right|,
\]
which tends to zero as $N \to \infty$, in view of the absolute convergence of the series $\sum_{n=1}^\infty f(n)n^{-s}$. Hence $\lim_{N\to\infty} P_N(s) = F(s)$.
(ii) Since the series in (4.5) is a subseries of $\sum_{n=1}^\infty |f(n)n^{-s}|$, the absolute convergence of $F(s)$ implies (4.5). Conversely, if (4.5) holds, then, by Lemma A.3, the infinite product
\[
\prod_p \left(1 + \sum_{m=1}^\infty \left|\frac{f(p^m)}{p^{ms}}\right|\right)
\]
converges (absolutely). Moreover, if $P_N^*(s)$ denotes the same product, but restricted to primes $p \le N$, then, as in the proof of part (i), we have
\[
P_N^*(s) = \sum_{n\in A_N} \left|\frac{f(n)}{n^s}\right| \ge \sum_{n\le N} \left|\frac{f(n)}{n^s}\right|.
\]
Since $P_N^*(s)$ converges as $N \to \infty$, the partial sums on the right are bounded as $N \to \infty$. Thus, $F(s)$ converges absolutely.
Remarks. As in the case of the previous theorem, the result is not valid
without assuming absolute convergence of the Dirichlet series F(s).
The theorem is usually only stated in the form (i); however, for most
applications the condition stated in (ii) is easier to verify than the absolute
convergence of F(s).
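Part (i) of the theorem can be watched in action for the simplest case. The sketch below is an illustration under assumptions made here: it takes $f = 1$ and $s = 2$, for which each Euler factor $1 + \sum_{m\ge 1} p^{-ms}$ equals $(1 - p^{-s})^{-1}$ and the Dirichlet series sums to $\pi^2/6$; the function names and cutoffs are arbitrary. It prints the partial products $P_N(2)$, which increase towards $\pi^2/6$.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: return the list of all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

def partial_euler_product(s, n_max):
    """P_N(s) for f = 1: product over primes p <= n_max of 1 / (1 - p^(-s))."""
    product = 1.0
    for p in primes_up_to(n_max):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

target = math.pi ** 2 / 6  # value of the Dirichlet series at s = 2
for n_max in (10, 100, 1000, 10_000):
    print(n_max, partial_euler_product(2, n_max), target)
```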
Examples of Euler products
(1) Riemann zeta function. The most famous Euler product is that of the Riemann zeta function, the Dirichlet series of the arithmetic function 1:
\[
\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \prod_p \left(1 + \sum_{m=1}^\infty \frac{1}{p^{ms}}\right) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \quad (\sigma > 1).
\]
Euler's proof of the infinitude of primes was based on this identity. In fact, if one could take $s = 1$ in this identity one would immediately obtain the infinitude of primes since in that case the series on the left is divergent, forcing the product on the right to have infinitely many factors. However, since the identity is only valid in $\sigma > 1$, a slightly more complicated argument is needed, by applying the identity with real $s = \sigma > 1$ and investigating the behavior of the left and right sides as $s \to 1+$. If there were only finitely many primes, then the product on the right would involve only finitely many factors, and hence would converge to the finite product $\prod_p (1 - 1/p)^{-1}$ as $s \to 1+$. On the other hand, for every $N$, the series on the left (with $s = \sigma > 1$) is $\ge \sum_{n\le N} n^{-\sigma}$, which converges to $\sum_{n\le N} n^{-1}$ as $s \to 1+$. Hence the limit of the left-hand side, as $s \to 1+$, is $\ge \sum_{n\le N} n^{-1}$ for every fixed $N$ and, since $\sum_{n\le N} n^{-1} \to \infty$ as $N \to \infty$, this limit must be infinite. This contradiction proves the infinitude of primes.
(2) Moebius function. The Dirichlet series for the Moebius function has Euler product
\[
\sum_{n=1}^\infty \frac{\mu(n)}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right),
\]
a representation that is valid in the half-plane $\sigma > 1$. This can be seen directly, by the definition of the Euler product. Alternatively, one can argue as follows: Since the Moebius function is the Dirichlet inverse of the arithmetic function 1, its Dirichlet series is the reciprocal of the Riemann zeta function. Hence, by Lemma A.4, its Euler product consists of factors that are reciprocals of the factors of the Euler product of the zeta function.
(3) Completely multiplicative functions. The functions 1 and $\mu$ considered above are examples of completely multiplicative functions and their inverses. The Euler products of arbitrary completely multiplicative functions and their inverses have the same general shape. Indeed, let $f$ be a completely multiplicative function with Dirichlet series $F(s)$, and let $g$ be the Dirichlet inverse of $f$, with Dirichlet series $G(s)$. Then, formally, we have the identities
\[
F(s) = \prod_p \left(\sum_{m=0}^\infty \frac{f(p)^m}{p^{ms}}\right) = \prod_p \left(1 - \frac{f(p)}{p^s}\right)^{-1},
\]
and
\[
G(s) = \frac{1}{F(s)} = \prod_p \left(1 - \frac{f(p)}{p^s}\right).
\]
These representations are valid provided the associated Dirichlet series converge absolutely, a condition that can be checked, for example, using the criterion of part (ii) of Theorem 4.3. For example, if $|f(p)| \le 1$ for all primes $p$, then both $F(s)$ and $G(s)$ converge absolutely in $\sigma > 1$, and so the Euler product representations are valid in $\sigma > 1$ as well.
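(4) Characteristic function of integers relatively prime to a given set of primes. Given a finite or infinite set of primes $\mathcal{P}$, let $f_{\mathcal{P}}$ denote the characteristic function of the positive integers that do not have a prime divisor belonging to the set $\mathcal{P}$. Thus $f_{\mathcal{P}}$ is the completely multiplicative function defined by $f_{\mathcal{P}}(p) = 1$ if $p \notin \mathcal{P}$ and $f_{\mathcal{P}}(p) = 0$ if $p \in \mathcal{P}$. Then the Dirichlet series $F_{\mathcal{P}}$ of $f_{\mathcal{P}}$ is given by the Euler product
\[
F_{\mathcal{P}}(s) = \prod_{p\notin\mathcal{P}} \left(1 - \frac{1}{p^s}\right)^{-1} = \zeta(s) \prod_{p\in\mathcal{P}} \left(1 - \frac{1}{p^s}\right),
\]
and this representation is valid in $\sigma > 1$.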
(5) Characteristic function of k-free integers. Given an integer $k \ge 2$, let $f_k(n)$ denote the characteristic function of the $k$-free integers, i.e., integers which are not divisible by the $k$-th power of a prime. The function $f_k$ is obviously multiplicative, and since it is bounded by 1, its Dirichlet series $F_k(s)$ converges absolutely in the half-plane $\sigma > 1$ and there has Euler product
\[
F_k(s) = \prod_p \left(\sum_{m=0}^{k-1} \frac{1}{p^{ms}}\right) = \prod_p \frac{1 - p^{-ks}}{1 - p^{-s}} = \frac{\zeta(s)}{\zeta(ks)}.
\]
(6) Euler phi function. Since $\phi(p^m) = p^m - p^{m-1}$ for $m \ge 1$,
we have, formally,
\[
\sum_{n=1}^{\infty} \frac{\phi(n)}{n^s}
  = \prod_p \Biggl( 1 + \sum_{m=1}^{\infty} \frac{p^m - p^{m-1}}{p^{ms}} \Biggr)
  = \prod_p \Biggl( 1 + \frac{1 - p^{-1}}{p^{s-1}(1 - p^{-s+1})} \Biggr)
  = \prod_p \frac{1 - p^{-s}}{1 - p^{-s+1}}.
\]
Since $\phi(n) \le n$, the Dirichlet series for $\phi$ converges absolutely
in the half-plane $\sigma > 2$, so the above Euler product representation is
valid in this half-plane. Moreover, the last expression above can be
recognized as the product of the Euler product representations for the
Dirichlet series $\zeta(s-1)$ and $1/\zeta(s)$. Thus, the Dirichlet series
for $\phi(n)$ is equal to $\zeta(s-1)/\zeta(s)$, a result we had obtained
earlier using the identity $\phi = \mathrm{id} * \mu$.
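The identity in Example (6) can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `phi_sieve`, `zeta` and `phi_dirichlet_partial_sum`, the truncation points, and the choice s = 3 are all illustrative assumptions. It compares a partial sum of the Dirichlet series of the Euler phi function with the quotient zeta(s-1)/zeta(s), where zeta is approximated by a partial sum plus a simple Euler-Maclaurin tail correction.

```python
def phi_sieve(limit):
    """Return a list phi with phi[n] = Euler's phi(n) for 0 <= n <= limit."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # phi[p] untouched so far, hence p is prime
            for multiple in range(p, limit + 1, p):
                phi[multiple] -= phi[multiple] // p
    return phi


def zeta(s, terms=10_000):
    """Approximate zeta(s) for real s > 1 (partial sum plus tail correction)."""
    head = sum(n ** -s for n in range(1, terms))
    return head + terms ** (1 - s) / (s - 1) + 0.5 * terms ** -s


def phi_dirichlet_partial_sum(s, limit=20_000):
    """Partial sum of sum_{n <= limit} phi(n) / n**s."""
    phi = phi_sieve(limit)
    return sum(phi[n] / n ** s for n in range(1, limit + 1))


s = 3.0  # any real s > 2 lies in the half-plane of absolute convergence
lhs = phi_dirichlet_partial_sum(s)
rhs = zeta(s - 1) / zeta(s)
print(lhs, rhs)
```

The two printed numbers should agree to several decimal places; the remaining gap comes from truncating the series at `limit`.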
4.3 Analytic properties of Dirichlet series
We begin by proving two results describing the regions in the complex plane
in which a Dirichlet series converges, absolutely or conditionally.
In the case of an ordinary power series $\sum_{n=0}^{\infty} a_n z^n$, it is
well-known that there exists a disk of convergence $|z| < R$ such that the
series converges absolutely when $|z| < R$, and diverges when $|z| > R$. The
number $R$, called the radius of convergence, can be any positive real
number, or 0 (in which case the series diverges for all $z \ne 0$), or
$\infty$ (in which case the series converges everywhere). For values $z$ on
the circle $|z| = R$, the series may converge or diverge.
For Dirichlet series, a similar result is true, with the disk of convergence
replaced by a half-plane of convergence of the form $\sigma > \sigma_0$.
However, in contrast to the situation for power series, where the regions
for convergence and absolute convergence are identical (except possibly for
the boundaries), for Dirichlet series there may be a nontrivial region in
the form of a vertical strip in which the series converges, but does not
converge absolutely.
Theorem 4.4 (Absolute convergence of Dirichlet series). For every
Dirichlet series there exists a number $\sigma_a \in \mathbb{R} \cup \{\pm\infty\}$,
called the abscissa of absolute convergence, such that for all $s$ with
$\sigma > \sigma_a$ the series converges absolutely, and for all $s$ with
$\sigma < \sigma_a$, the series does not converge absolutely.
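Proof. Given a Dirichlet series $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$, let
$A$ be the set of complex numbers $s$ at which $F(s)$ converges absolutely.
If the set $A$ is empty, the conclusion of the theorem holds with
$\sigma_a = \infty$. Otherwise, set
$\sigma_a = \inf\{\mathrm{Re}\, s : s \in A\} \in \mathbb{R} \cup \{-\infty\}$.
By the definition of $\sigma_a$, the series $F(s)$ does not converge
absolutely if $\sigma < \sigma_a$. On the other hand, if $s = \sigma + it$ and
$s' = \sigma' + it'$ with $\sigma' \ge \sigma$, then
\[
\sum_{n=1}^{\infty} \Bigl| \frac{f(n)}{n^{s'}} \Bigr|
  = \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\sigma'}}
  \le \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\sigma}}
  = \sum_{n=1}^{\infty} \Bigl| \frac{f(n)}{n^{s}} \Bigr|.
\]
Hence, if $F(s)$ converges absolutely at some point $s$, then it also
converges absolutely at any point $s'$ with $\mathrm{Re}\, s' \ge \mathrm{Re}\, s$.
Since there are points $s \in A$ with $\mathrm{Re}\, s$ arbitrarily close to
$\sigma_a$, it follows that $F(s)$ converges absolutely whenever
$\sigma > \sigma_a$.
Remark. At points on the line $\sigma = \sigma_a$, the series may or may not
converge absolutely. For example, the Riemann zeta function
$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ has abscissa of absolute convergence
$\sigma_a = 1$, but it does not converge absolutely when $\sigma = 1$. On the
other hand, the Dirichlet series corresponding to the arithmetic function
$f(n) = 1/\log^2(2n)$ has the same abscissa of absolute convergence 1, but
also converges absolutely at $\sigma = 1$.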
Establishing an analogous result for conditional convergence is harder.
The key step is contained in the following proposition.
Proposition 4.5. Suppose the Dirichlet series
$F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ converges at some point
$s = s_0 = \sigma_0 + it_0$. Then the series converges at every point $s$ with
$\sigma > \sigma_0$. Moreover, the convergence is uniform in every compact
region contained in the half-plane $\sigma > \sigma_0$.
Proof. Suppose $F(s)$ converges at $s_0$, and let $s$ be a point with
$\sigma > \sigma_0$. Set $\delta = \sigma - \sigma_0$ (so that $\delta > 0$)
and let
\[
S(x, y) = \sum_{x < n \le y} \frac{f(n)}{n^{s}}, \qquad
S_0(x, y) = \sum_{x < n \le y} \frac{f(n)}{n^{s_0}} \qquad (y > x \ge 1).
\]
We will establish the convergence of the series $F(s)$ by showing that it
satisfies Cauchy's criterion.
Let $\epsilon > 0$ be given. By Cauchy's criterion, applied to the series
$F(s_0) = \sum_{n=1}^{\infty} f(n) n^{-s_0}$ (which, by hypothesis, converges)
there exists $x_0 = x_0(\epsilon) \ge 1$ such that
\[
|S_0(x, y)| \le \epsilon \qquad (y > x \ge x_0). \tag{4.6}
\]
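We now relate the sums $S(x, y)$ to the sums $S_0(x, y)$ by writing the
summands as $f(n) n^{-s_0} \cdot n^{s_0 - s}$, and removing the factor
$n^{s_0 - s}$ by partial summation. Given $y > x \ge x_0$, we have
\[
S(x, y) = \sum_{x < n \le y} \frac{f(n)}{n^{s_0}} \, n^{s_0 - s}
        = S_0(x, y) \, y^{s_0 - s}
          - \int_x^y S_0(x, u) (s_0 - s) u^{s_0 - s - 1} \, du.
\]
Since, by (4.6), $|S_0(x, u)| \le \epsilon$ for $u \ge x$ ($\ge x_0$), and
$|u^{s_0 - s}| = u^{\sigma_0 - \sigma} = u^{-\delta}$, we obtain
\[
|S(x, y)| \le \epsilon y^{-\delta} + \epsilon |s - s_0| \int_x^y u^{-\delta - 1} \, du
  \le \epsilon \Bigl( 1 + |s - s_0| \int_1^{\infty} u^{-\delta - 1} \, du \Bigr)
  = \epsilon \Bigl( 1 + \frac{|s - s_0|}{\delta} \Bigr)
  = C \epsilon \qquad (y > x \ge x_0(\epsilon)),
\]
where $C = C(s, s_0) = 1 + |s - s_0|/\delta$ is independent of $x$ and $y$.
Since $\epsilon$ was arbitrary, this shows that the series $F(s)$ satisfies
Cauchy's criterion and hence converges.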
To prove that the convergence is uniform on compact subsets of the
half-plane $\sigma > \sigma_0$, note that in any compact subset $K$ of the
half-plane $\sigma > \sigma_0$, the quantity $\delta = \sigma - \sigma_0$ is
bounded from below by a positive constant and $|s - s_0|$ is bounded from
above. Hence, the constant $C = C(s, s_0) = 1 + |s - s_0|/\delta$ defined
above is bounded by a constant $C_0 = C_0(K)$ depending only on the subset
$K$, but not on $s$, and the Cauchy criterion therefore holds uniformly in $K$.
The following result describes the region of convergence of a Dirichlet
series and is the analog of Theorem 4.4 for conditional convergence.
Theorem 4.6 (Convergence of Dirichlet series). For every Dirichlet
series there exists a number $\sigma_c \in \mathbb{R} \cup \{\pm\infty\}$,
called the abscissa of convergence, such that the series converges in the
half-plane $\sigma > \sigma_c$ (the half-plane of convergence), and diverges
in the half-plane $\sigma < \sigma_c$. The convergence is uniform on compact
subsets of the half-plane of convergence. Moreover, the abscissa of
convergence $\sigma_c$ and the abscissa of absolute convergence $\sigma_a$
satisfy $\sigma_a - 1 \le \sigma_c \le \sigma_a$.
Remarks. As in the case of Theorem 4.4, at points $s$ on the line
$\sigma = \sigma_c$, the series may converge or diverge.
The inequalities $\sigma_a - 1 \le \sigma_c \le \sigma_a$ are best-possible,
in the sense that equality can occur in both cases, as illustrated by the
following examples.
(i) If $f(n)$ is nonnegative, then, at any point $s$ on the real line the
associated Dirichlet series $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ converges
if and only if it converges absolutely. Since, by the theorem, convergence
(and absolute convergence) occurs on half-planes, this implies that the
half-planes of convergence and absolute convergence are identical. Hence we
have $\sigma_c = \sigma_a$ whenever $f(n)$ is nonnegative.
(ii) The Dirichlet series $F(s) = \sum_{n=1}^{\infty} (-1)^n n^{-s}$ is an
example for which $\sigma_c = \sigma_a - 1$. Indeed, $F(s)$ converges at any
real $s$ with $s > 0$ (since it is an alternating series with decreasing
terms at such points), and diverges for $\sigma \le 0$ (since for
$\sigma \le 0$ the terms of the series do not converge to zero), so we have
$\sigma_c = 0$. However, since
$\sum_{n=1}^{\infty} |(-1)^n n^{-s}| = \sum_{n=1}^{\infty} n^{-\sigma}$, the
series converges absolutely if and only if $\sigma > 1$, so that $\sigma_a = 1$.
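The strip of conditional convergence in example (ii) can be observed numerically. The sketch below is an illustration only (the function name `partial_sums` and the sample point s = 1/2 are assumptions, not taken from the notes): it tabulates partial sums of the series with terms (-1)^n n^{-s} together with the partial sums of the absolute values.

```python
def partial_sums(s, limit):
    """Return (signed, absolute) partial sums up to n = limit.

    signed   = sum of (-1)**n * n**(-s)
    absolute = sum of n**(-s)
    """
    signed = 0.0
    absolute = 0.0
    for n in range(1, limit + 1):
        term = n ** -s
        signed += term if n % 2 == 0 else -term
        absolute += term
    return signed, absolute


# s = 1/2 lies in the strip 0 < sigma <= 1 between sigma_c = 0 and sigma_a = 1.
for limit in (10**3, 10**4, 10**5):
    signed, absolute = partial_sums(0.5, limit)
    print(limit, signed, absolute)
```

At s = 1/2 the signed partial sums settle down as the cutoff grows, while the sums of absolute values keep growing roughly like twice the square root of the cutoff, in line with the values of the two abscissas found above.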
Proof. If the series $F(s)$ diverges everywhere, the result holds with
$\sigma_c = \infty$. Suppose therefore that the series converges at at least
one point, let $D$ be the set of all points $s$ at which the series
converges, and define
$\sigma_c = \inf\{\mathrm{Re}\, s : s \in D\} \in \mathbb{R} \cup \{-\infty\}$.
Then, by the definition of $\sigma_c$, $F(s)$ diverges at any point $s$ with
$\sigma < \sigma_c$. On the other hand, there exist points
$s_0 = \sigma_0 + it_0$ with $\sigma_0$ arbitrarily close to $\sigma_c$ such
that the series converges at $s_0$. By Proposition 4.5 it follows that, given
such a point $s_0$, the series $F(s)$ converges at every point $s$ with
$\sigma > \sigma_0$, and the convergence is uniform in compact subsets of
$\sigma > \sigma_0$. Since $\sigma_0$ can be taken arbitrarily close to
$\sigma_c$, it follows that $F(s)$ converges in the half-plane
$\sigma > \sigma_c$, and that the convergence is uniform on compact subsets
of this half-plane.
To obtain the last assertion of the theorem, note first that the
inequality $\sigma_c \le \sigma_a$ holds trivially since absolute convergence
implies convergence. Moreover, if $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$
converges at some point $s = s_0 = \sigma_0 + it_0$, then $f(n) n^{-s_0}$ tends
to zero as $n \to \infty$, so that, in particular, $|f(n) n^{-s_0}| \le 1$ for
$n \ge n_0$, say. Hence, for $n \ge n_0$ and any $s$ we have
$|f(n) n^{-s}| \le n^{-(\sigma - \sigma_0)}$, and since the series
$\sum_{n=1}^{\infty} n^{-(\sigma - \sigma_0)}$ converges whenever
$\sigma > \sigma_0 + 1$, it follows that $F(s)$ converges absolutely in
$\sigma > \sigma_0 + 1$. Since $\sigma_0$ can be taken arbitrarily close to
$\sigma_c$, this implies that $\sigma_a \le \sigma_c + 1$.
We are now ready to prove the most important result about Dirichlet
series, namely that Dirichlet series are analytic functions of s in their half-
plane of convergence. It is this result that allows one to apply the powerful
apparatus of complex analysis to the study of arithmetic functions.
Theorem 4.7 (Analytic properties of Dirichlet series). A Dirichlet
series $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ represents an analytic function
of $s$ in its half-plane of convergence $\sigma > \sigma_c$. Moreover, in the
half-plane of convergence, the Dirichlet series can be differentiated
termwise, that is, we have
$F'(s) = -\sum_{n=1}^{\infty} f(n) (\log n) n^{-s}$, and the latter series
also converges in this half-plane.
Proof. Let $F_N(s) = \sum_{n=1}^{N} f(n) n^{-s}$ denote the partial sums of
$F(s)$. Since each term $f(n) n^{-s} = f(n) e^{-s \log n}$ is an entire
function of $s$, the functions $F_N(s)$ are entire. By Theorem 4.6, as
$N \to \infty$, $F_N(s)$ converges to $F(s)$, uniformly on compact subsets of
the half-plane $\sigma > \sigma_c$. By Weierstrass' theorem on uniformly
convergent sequences of analytic functions, it follows that $F(s)$ is
analytic in every compact subset of the half-plane $\sigma > \sigma_c$, and
hence in the entire half-plane. This proves the first assertion of the
theorem. The second assertion regarding termwise differentiation follows
since the finite partial sums $F_N(s)$ can be termwise differentiated with
derivative $F_N'(s) = -\sum_{n=1}^{N} f(n) (\log n) n^{-s}$, and since, by
another application of Weierstrass' theorem, the derivatives $F_N'(s)$
converge to $F'(s)$ in the half-plane $\sigma > \sigma_c$.
Theorem 4.8 (Uniqueness theorem for Dirichlet series). Suppose
$F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ and
$G(s) = \sum_{n=1}^{\infty} g(n) n^{-s}$ are Dirichlet series with finite
abscissa of convergence that satisfy $F(s) = G(s)$ for all $s$ with $\sigma$
sufficiently large. Then $f(n) = g(n)$ for all $n$.
Proof. Set $h(n) = f(n) - g(n)$ and let $H(s) = F(s) - G(s)$ be the Dirichlet
series for $h$. By the hypotheses of the theorem there exists $\sigma_0$ such
that $H(s)$ converges absolutely in the half-plane $\sigma \ge \sigma_0$, and
is identically 0 in this half-plane. We need to show that $h(n) = 0$ for all
$n$. To get a contradiction, suppose $h$ is not identically 0, and let $n_0$
be the smallest positive integer $n$ such that $h(n) \ne 0$. Then
$H(s) = h(n_0) n_0^{-s} + H_1(s)$, where
$H_1(s) = \sum_{n = n_0 + 1}^{\infty} h(n) n^{-s}$. Since $H(s) = 0$ for
$\sigma \ge \sigma_0$, it follows that, for any $\sigma \ge \sigma_0$,
$h(n_0) n_0^{-\sigma} = -H_1(\sigma)$, and hence
\[
|h(n_0)| \le |H_1(\sigma)| \, n_0^{\sigma}
  \le \sum_{n = n_0 + 1}^{\infty} |h(n)| \frac{n_0^{\sigma}}{n^{\sigma}}.
\]
Setting $\sigma = \sigma_0 + \lambda$ with $\lambda \ge 0$, we have, for
$n \ge n_0 + 1$,
\[
\frac{n_0^{\sigma}}{n^{\sigma}}
  = \Bigl( \frac{n_0}{n} \Bigr)^{\lambda} \Bigl( \frac{n_0}{n} \Bigr)^{\sigma_0}
  \le \Bigl( \frac{n_0}{n_0 + 1} \Bigr)^{\lambda} \Bigl( \frac{n_0}{n} \Bigr)^{\sigma_0},
\]
so that
\[
|h(n_0)| \le n_0^{\sigma_0} \Bigl( \frac{n_0}{n_0 + 1} \Bigr)^{\lambda}
             \sum_{n = n_0 + 1}^{\infty} \frac{|h(n)|}{n^{\sigma_0}}
          = C_0 \Bigl( \frac{n_0}{n_0 + 1} \Bigr)^{\lambda},
\]
where $C_0 = n_0^{\sigma_0} \sum_{n = n_0 + 1}^{\infty} |h(n)| n^{-\sigma_0}$ is
a (finite) constant, independent of $\lambda$, by the absolute convergence of
$H(\sigma_0)$. Letting $\lambda \to \infty$, the right-hand side tends to
zero, contradicting our hypothesis that $h(n_0) \ne 0$.
Corollary 4.9 (Computing convolution inverses via Dirichlet series).
Let $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ be a Dirichlet series with finite
abscissa of convergence, and suppose that $1/F(s) = G(s)$, where
$G(s) = \sum_{n=1}^{\infty} g(n) n^{-s}$ is a Dirichlet series with finite
abscissa of convergence. Then $g$ is the convolution inverse of $f$.
Proof. Let $h = f * g$, and let $H(s)$ be the Dirichlet series for $h$. By the
hypotheses of the corollary and Theorems 4.4 and 4.6, the series $F(s)$ and
$G(s)$ converge absolutely in a half-plane $\sigma \ge \sigma_0$. By Theorem
4.1 it then follows that $H(s)$ also converges absolutely in the same
half-plane and is equal to $F(s) G(s)$ there. On the other hand, since
$G(s) = 1/F(s)$, we have $H(s) = 1 = \sum_{n=1}^{\infty} e(n) n^{-s}$. By the
uniqueness theorem it follows that $h(n) = e(n)$ for all $n$, i.e., we have
$h = f * g = e$.
Application: Proving identities for arithmetic functions via Dirichlet
series. The uniqueness theorem and its corollary provide a new method
for obtaining identities among arithmetic functions and computing
convolution inverses. In order to prove an identity of the form
$f(n) \equiv g(n)$, it suffices to show that the corresponding Dirichlet
series converge and are equal for sufficiently large $\sigma$. In practice,
this is usually carried out by algebraically manipulating the Dirichlet
series for $f(n)$ to obtain another Dirichlet series and then reading off
the coefficients of the latter Dirichlet series, to conclude that these
coefficients must be equal to those in the original Dirichlet series. In
most cases, the functions involved are multiplicative, so that the Dirichlet
series can be written as Euler products, and it is the individual factors in
the Euler product that are manipulated. We illustrate this technique with
the following examples.
Examples
(1) Alternate proof of the identity $\phi = \mathrm{id} * \mu$. Using only
the multiplicativity of $\phi$ and the definition of $\phi$ at prime powers,
we have shown above that the Dirichlet series of $\phi$ is equal to
$\zeta(s-1)/\zeta(s)$. Since $\zeta(s-1)$ and $1/\zeta(s)$ are, respectively,
the Dirichlet series of the functions $\mathrm{id}$ and $\mu$,
$\zeta(s-1)/\zeta(s)$ is the Dirichlet series of the convolution product
$\mathrm{id} * \mu$. Since both of these series converge absolutely for
$\sigma > 2$, we can apply the uniqueness theorem for Dirichlet series to
conclude that $\phi = \mathrm{id} * \mu$.
(2) Computing "square roots" of completely multiplicative functions.
Given a completely multiplicative function $f$, we want to find a function
$g$ such that $g * g = f$. To this end note that if $g$ satisfies
$g * g = f$, then the corresponding Dirichlet series $G(s)$ must satisfy
$G(s)^2 = F(s)$, provided $G(s)$ converges absolutely. Thus, we seek a
function whose Dirichlet series is the square root of the Dirichlet series
for $f$. Now, since $f$ is completely multiplicative, its Dirichlet series
has an Euler product with factors of the form $(1 - f(p) p^{-s})^{-1}$.
Taking the square root of this expression and using the binomial series
$(1 + x)^{-1/2} = \sum_{n=0}^{\infty} \binom{-1/2}{n} x^n$ gives
\[
\Bigl( 1 - \frac{f(p)}{p^s} \Bigr)^{-1/2}
  = 1 + \sum_{m=1}^{\infty} \binom{-1/2}{m} \frac{(-1)^m f(p)^m}{p^{ms}}.
\]
The latter series can be identified as the p-th factor of the Euler product
of the multiplicative function $g$ defined by
$g(p^m) = (-f(p))^m \binom{-1/2}{m}$. Let $G(s)$ be the Dirichlet series for
$g$. Then $G(s)^2 = F(s)$, and the uniqueness theorem yields $g * g = f$
provided both series $G(s)$ and $F(s)$ have finite abscissa of convergence.
The bound
$\bigl| \binom{-1/2}{m} \bigr| = \bigl| (-1)^m \binom{2m}{m} \bigr| 2^{-2m} \le 1$
shows that this convergence condition is satisfied if, for example, the
values $f(p)$ are uniformly bounded.
(3) Computing convolution inverses. A direct computation of convolution
inverses requires solving an infinite system of linear equations, but
Dirichlet series often allow a quick computation of an inverse. As an
application of Corollary 4.9, consider the function $f$ defined by
$f(1) = 1$, $f(2) = -1$, and $f(n) = 0$ for $n \ge 3$. This function has
Dirichlet series $F(s) = 1 - 2^{-s}$, which is an entire function of $s$. The
reciprocal of $F(s)$ is given by
$1/F(s) = (1 - 2^{-s})^{-1} = \sum_{k=0}^{\infty} 2^{-ks}$. Writing this
series in the form $G(s) = \sum_{n=1}^{\infty} g(n) n^{-s}$, and reading off
the coefficients $g(n)$, we see that $1/F(s)$ is the Dirichlet series of the
function $g$ defined by $g(n) = 1$ if $n = 2^k$ for some nonnegative integer
$k$, and $g(n) = 0$ otherwise. The series $G(s)$ converges absolutely for
$\sigma > 0$. Hence, Corollary 4.9 is applicable and shows that $g$ is the
convolution inverse of $f$.
(4) Evaluating functions via Dirichlet series. Another type of
application is illustrated by the following example. Let $f_k(n)$ denote the
characteristic function of k-free integers. In order to estimate the partial
sums $\sum_{n \le x} f_k(n)$ (i.e., the number of k-free positive integers
$\le x$), a natural approach is to use the convolution method with the
function 1 as the approximating function. This requires computing the
"perturbation factor" $g_k$ defined by $f_k = 1 * g_k$. If $F_k$ and $G_k$
denote the Dirichlet series of $f_k$ and $g_k$, respectively, then
$G_k(s) = F_k(s)/\zeta(s)$. In the previous section, we computed $F_k(s)$ as
$F_k(s) = \zeta(s)/\zeta(ks)$, so
\[
G_k(s) = \frac{1}{\zeta(ks)} = \prod_p \Bigl( 1 - \frac{1}{p^{ks}} \Bigr).
\]
The latter product is the Euler product of the Dirichlet series for the
multiplicative function $g_k^*$ defined by $g_k^*(p^m) = -1$ if $m = k$ and
$g_k^*(p^m) = 0$ otherwise, i.e., $g_k^*(n) = \mu(n^{1/k})$ if $n$ is a k-th
power, and $g_k^*(n) = 0$ otherwise. Since all series involved converge
absolutely for $\sigma > 1$, the uniqueness theorem applies, and we conclude
that the coefficients of the latter series and those of $G_k(s)$ must be
equal, i.e., we have $g_k \equiv g_k^*$.
(5) Wintner's theorem for multiplicative functions. In the terminology
and notation of Dirichlet series, Wintner's theorem (Theorem 2.19) states
that if $f = 1 * g$ and if the Dirichlet series $G(s)$ of $g$ converges
absolutely at $s = 1$, then the mean value $M(f)$ exists and is equal to
$G(1)$. This result holds for arbitrary arithmetic functions $f$ and $g$
satisfying the above conditions, but if these functions are multiplicative,
then one can express the mean value $M(f)$ as an Euler product, and one can
check the condition that the Dirichlet series $G(s)$ converges absolutely at
$s = 1$ by the criterion of Theorem 4.3: Applying Theorem 4.3 to the
function $g = f * \mu$, noting that $g(p^m) = f(p^m) - f(p^{m-1})$ for every
prime power $p^m$, and using the fact that if $f$ is multiplicative, then so
is $g = f * \mu$, we obtain the following version of Wintner's theorem for
multiplicative functions:
Suppose $f$ is multiplicative and satisfies
\[
\sum_{p^m} \frac{|f(p^m) - f(p^{m-1})|}{p^m} < \infty.
\]
Then the mean value $M(f)$ exists and is given by
\[
M(f) = \prod_p \Biggl( 1 + \sum_{m=1}^{\infty} \frac{f(p^m) - f(p^{m-1})}{p^m} \Biggr)
     = \prod_p \Bigl( 1 - \frac{1}{p} \Bigr)
               \Biggl( 1 + \sum_{m=1}^{\infty} \frac{f(p^m)}{p^m} \Biggr).
\]
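Several of the identities obtained in Examples (1), (3) and (4) above can be confirmed by brute force for small n. The following Python sketch is illustrative only: the helper names (`dirichlet_convolution`, `mobius_sieve`, `is_k_free`), the cutoff `LIMIT = 500`, and the choice k = 2 are assumptions made for this example and do not appear in the notes. Arithmetic functions are stored as lists indexed by n, with index 0 unused.

```python
from math import gcd


def dirichlet_convolution(f, g, limit):
    """Return h = f * g (Dirichlet convolution) as a list on 0..limit."""
    h = [0] * (limit + 1)
    for d in range(1, limit + 1):
        if f[d] == 0:
            continue
        for multiple in range(d, limit + 1, d):
            h[multiple] += f[d] * g[multiple // d]
    return h


def mobius_sieve(limit):
    """Return a list mu with mu[n] = Moebius function of n, 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_composite[multiple] = True
            mu[multiple] = -mu[multiple]  # one more prime factor
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0  # divisible by a square
    return mu


def is_k_free(n, k):
    """True if no k-th power of an integer q >= 2 divides n."""
    q = 2
    while q ** k <= n:
        if n % q ** k == 0:
            return False
        q += 1
    return True


LIMIT = 500
mu = mobius_sieve(LIMIT)
identity = list(range(LIMIT + 1))      # id(n) = n
one = [0] + [1] * LIMIT                # the constant function 1
unit = [0, 1] + [0] * (LIMIT - 1)      # e(n) = 1 if n = 1, else 0

# Example (1): phi = id * mu, with phi computed independently by counting.
phi = [0] + [sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)
             for n in range(1, LIMIT + 1)]
check_phi = dirichlet_convolution(identity, mu, LIMIT) == phi

# Example (3): f(1) = 1, f(2) = -1; claimed inverse g = indicator of powers of 2.
f = [0, 1, -1] + [0] * (LIMIT - 2)
g = [0] + [1 if n & (n - 1) == 0 else 0 for n in range(1, LIMIT + 1)]
check_inverse = dirichlet_convolution(f, g, LIMIT) == unit

# Example (4) with k = 2: g_k(n) = mu(d) if n = d**k, and 0 otherwise.
k = 2
g_k = [0] * (LIMIT + 1)
d = 1
while d ** k <= LIMIT:
    g_k[d ** k] = mu[d]
    d += 1
f_k = [0] + [1 if is_k_free(n, k) else 0 for n in range(1, LIMIT + 1)]
check_k_free = dirichlet_convolution(one, g_k, LIMIT) == f_k

print(check_phi, check_inverse, check_k_free)
```

Each of the three checks compares a convolution computed from the definition with the function predicted by the Dirichlet series argument, so all three flags should come out true. Such a finite check is of course no substitute for the uniqueness theorem; it only guards against slips in reading off coefficients.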
4.4 Dirichlet series and summatory functions
4.4.1 Mellin transform representation of Dirichlet series
As we have seen in Chapter 2, to investigate the behavior of arithmetic
functions one usually considers the associated summatory functions
\[
M(f, x) = \sum_{n \le x} f(n), \tag{4.7}
\]
or weighted versions of those sums, such as the "logarithmic" sums
\[
L(f, x) = \sum_{n \le x} \frac{f(n)}{n}. \tag{4.8}
\]
In contrast to the individual values $f(n)$, which for most natural
arithmetic functions oscillate wildly and show no discernible pattern when
$n \to \infty$, the summatory functions $M(f, x)$ and $L(f, x)$ are usually
well-behaved and can be estimated in a satisfactory manner. Most results and
problems on arithmetic functions can be expressed in terms of these
summatory functions. For example, as we have seen in Chapter 2, Mertens'
estimates show that $L(\Lambda, x) = \log x + O(1)$ and the PNT is equivalent
to the asymptotic formula $M(\Lambda, x) \sim x$.
It is therefore natural to try to express the Dirichlet series $F(s)$ of an
arithmetic function $f$ in terms of the summatory functions $M(f, x)$ and
vice versa, and to exploit this to translate between properties of the
analytic function $F(s)$ and those of the arithmetic quantity $M(f, x)$.
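To make the two summatory functions concrete, here is a small Python sketch (an illustration; the names `von_mangoldt_table`, `summatory` and `logarithmic_sum` are hypothetical helpers, and the cutoffs are arbitrary) that tabulates M(f, x) and L(f, x) for the von Mangoldt function and compares them with the two estimates quoted above.

```python
import math


def von_mangoldt_table(limit):
    """Return lam with lam[n] = log p if n = p**m (m >= 1), and 0.0 otherwise."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for multiple in range(2 * p, limit + 1, p):
            is_composite[multiple] = True
        power = p
        while power <= limit:
            lam[power] = math.log(p)
            power *= p
    return lam


def summatory(values, x):
    """M(f, x): sum of f(n) over n <= x."""
    return sum(values[1:x + 1])


def logarithmic_sum(values, x):
    """L(f, x): sum of f(n) / n over n <= x."""
    return sum(values[n] / n for n in range(1, x + 1))


lam = von_mangoldt_table(10**5)
for x in (10**3, 10**4, 10**5):
    # First column stays bounded (Mertens); second column approaches 1 (PNT).
    print(x, logarithmic_sum(lam, x) - math.log(x), summatory(lam, x) / x)
```

The individual values of the von Mangoldt function jump between 0 and log p, yet both printed columns vary smoothly with x, which is the phenomenon described in the paragraph above.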
In one direction, namely going from M(f, x) to F(s), this is rather easy.
The key result is the following theorem which expresses F(s) as an integral
over $M(f, x)$ and is a restatement (in a slightly different notation) of
Theorem 2.15. The converse direction is considerably more difficult, and
will be considered in a separate section.
Theorem 4.10 (Mellin transform representation of Dirichlet series).
Let $F(s) = \sum_{n=1}^{\infty} f(n) n^{-s}$ be a Dirichlet series with finite
abscissa of convergence $\sigma_c$, and let $M(f, x)$ and $L(f, x)$ be given
by (4.7) and (4.8), respectively. Then we have
\[
F(s) = s \int_1^{\infty} M(f, x) x^{-s-1} \, dx
  \qquad (\sigma > \max(0, \sigma_c)), \tag{4.9}
\]
\[
F(s) = (s - 1) \int_1^{\infty} L(f, x) x^{-s} \, dx
  \qquad (\sigma > \max(1, \sigma_c)). \tag{4.10}
\]
Proof. We first show that the second relation follows from the first,
applied to the function $\tilde f(n) = f(n)/n$, with $\tilde s = s - 1$ in
place of $s$, and $\tilde F(s) = \sum_{n=1}^{\infty} \tilde f(n) n^{-s}$ in
place of $F(s)$. To this end, observe first that
$L(f, x) = M(\tilde f, x)$, so the right-hand side of (4.10) becomes the
right-hand side of (4.9) with $\tilde f$ in place of $f$ and
$\tilde s = s - 1$ in place of $s$. Moreover, we have
$F(s) = \tilde F(s - 1) = \tilde F(\tilde s)$, so the left-hand sides of
these relations are also equal under these substitutions. Finally, since
$\tilde F(s) = F(s + 1)$, the abscissa of convergence $\tilde\sigma_c$ of
$\tilde F$ is equal to $\sigma_c - 1$, so the condition
$\tilde\sigma > \max(0, \tilde\sigma_c)$ in (4.9) translates into
$\sigma - 1 > \max(0, \sigma_c - 1)$, or, equivalently,
$\sigma > \max(1, \sigma_c)$, which is the condition in (4.10).
The first relation, (4.9), was already proved in Theorem 2.15 of Chapter 2,
as an application of partial summation. We give here an alternate argument:
Let $f$ and $F(s)$ be given as in the theorem, and fix $s$ with
$\sigma > \max(0, \sigma_c)$, so that the Dirichlet series $F(s)$ converges
at $s$. Write
\[
M(f, x) = \sum_{n=1}^{\infty} \chi(x, n) f(n),
\]
where
\[
\chi(x, n) =
\begin{cases}
1 & \text{if } n \le x, \\
0 & \text{if } n > x.
\end{cases}
\]
Then, for every $X \ge 1$,
\[
\begin{aligned}
s \int_1^X M(f, x) x^{-s-1} \, dx
  &= s \int_1^X \sum_{n=1}^{\infty} \chi(x, n) f(n) x^{-s-1} \, dx \\
  &= s \int_1^X \sum_{n \le X} \chi(x, n) f(n) x^{-s-1} \, dx \\
  &= s \sum_{n \le X} f(n) \int_1^X \chi(x, n) x^{-s-1} \, dx \\
  &= s \sum_{n \le X} f(n) \int_n^X x^{-s-1} \, dx \\
  &= s \sum_{n \le X} f(n) \frac{1}{s} \Bigl( \frac{1}{n^s} - \frac{1}{X^s} \Bigr) \\
  &= \sum_{n \le X} \frac{f(n)}{n^s} - \frac{1}{X^s} M(f, X).
\end{aligned}
\]
Now let $X \to \infty$. Then, by the convergence of $F(s)$, the first term on
the right tends to $F(s)$. Moreover, Kronecker's Lemma (Theorem 2.12) implies
that the second term tends to 0. Hence we conclude
\[
\lim_{X \to \infty} s \int_1^X M(f, x) x^{-s-1} \, dx = F(s),
\]
which proves (4.9).
Despite its rather elementary nature and easy proof, this result has a
number of interesting and important applications, as we will illustrate in
the following subsections.
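The truncated identity in the proof of (4.9) lends itself to a direct numerical check. The sketch below is a hedged illustration: it takes f to be the Moebius function, s = 2 and X = 2000, all of which are choices made here rather than in the notes, and the helper names are hypothetical. Because M(f, x) is constant on each interval [n, n+1), the integral can be evaluated in closed form piece by piece.

```python
import math


def mobius_sieve(limit):
    """Return a list mu with mu[n] = Moebius function of n, 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for multiple in range(p, limit + 1, p):
            if multiple > p:
                is_composite[multiple] = True
            mu[multiple] = -mu[multiple]
        for multiple in range(p * p, limit + 1, p * p):
            mu[multiple] = 0
    return mu


def truncated_mellin_integral(f, s, X):
    """s * integral of M(f, x) * x**(-s-1) over [1, X] for an integer X.

    On [n, n+1) the summatory function equals M(f, n), and
    s * integral of x**(-s-1) over [n, n+1] is n**(-s) - (n+1)**(-s).
    """
    total = 0.0
    running_sum = 0.0  # M(f, n)
    for n in range(1, X):
        running_sum += f[n]
        total += running_sum * (n ** -s - (n + 1) ** -s)
    return total


X, s = 2000, 2.0
mu = mobius_sieve(X)
lhs = truncated_mellin_integral(mu, s, X)
rhs = sum(mu[n] / n ** s for n in range(1, X + 1)) - sum(mu[1:X + 1]) / X ** s
print(lhs, rhs, 6 / math.pi ** 2)
```

The first two numbers agree up to rounding, as the chain of equalities in the proof predicts, and both are close to 1/zeta(2) = 6/pi^2, the value of the Dirichlet series of the Moebius function at s = 2.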
4.4.2 Analytic continuation of the Riemann zeta function
As a first application of Theorem 4.10, we give an integral representation
for the Riemann zeta function that is valid in the half-plane $\sigma > 0$
and provides an analytic continuation of $\zeta(s)$ to this half-plane.
Theorem 4.11 (Integral representation and analytic continuation
of the zeta function). The Riemann zeta function, defined for $\sigma > 1$ by
the series
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \tag{4.11}
\]
has an analytic continuation to a function defined on the half-plane
$\sigma > 0$ and analytic in this half-plane with the exception of a simple
pole at $s = 1$ with residue 1, given by
\[
\zeta(s) = \frac{s}{s - 1} - s \int_1^{\infty} \{x\} x^{-s-1} \, dx
  \qquad (\sigma > 0). \tag{4.12}
\]
Remark. Strictly speaking, we should use a different symbol, say
$\tilde\zeta(s)$, for the analytic continuation defined by (4.12). However,
to avoid awkward notations, it has become standard practice to denote the
analytic continuation of a Dirichlet series by the same symbol as the series
itself, and we will usually follow this practice. That said, one should be
aware that the validity of the series representation in general does not
extend to the region in which the Dirichlet series is analytic (in the sense
of having an analytic continuation there). For example, the Dirichlet series
representation (4.11) of the zeta function diverges at every point in the
half-plane $\sigma < \sigma_c = 1$ (and even at every point on the line
$\sigma = 1$, as one can show by Euler's summation formula), and thus is not
even well defined outside the half-plane $\sigma > 1$. By contrast, the
representation (4.12) is well-defined in the larger half-plane $\sigma > 0$
and represents an analytic function there.
Proof. Applying Theorem 4.10 with $f \equiv 1$, $F(s) = \zeta(s)$,
$\sigma_c = 1$, and $M(f, x) = [x]$, we obtain
\[
\zeta(s) = s \int_1^{\infty} [x] x^{-s-1} \, dx \qquad (\sigma > 1).
\]
Setting $[x] = x - \{x\}$, where $\{x\}$ is the fractional part of $x$, we can
write the last integral as
\[
\int_1^{\infty} x^{-s} \, dx - \int_1^{\infty} \{x\} x^{-s-1} \, dx
  = \frac{1}{s - 1} - \int_1^{\infty} \{x\} x^{-s-1} \, dx,
\]
and thus obtain the representation (4.12) in the half-plane $\sigma > 1$. Now
note that, given $\epsilon > 0$, the integral in (4.12) is bounded, for any
$s$ with $\sigma \ge \epsilon$, by
\[
\Bigl| \int_1^{\infty} \{x\} x^{-s-1} \, dx \Bigr|
  \le \int_1^{\infty} x^{-\sigma - 1} \, dx
  \le \int_1^{\infty} x^{-\epsilon - 1} \, dx
  = \frac{1}{\epsilon}.
\]
Hence this integral converges absolutely and uniformly in the half-plane
$\sigma \ge \epsilon$ and therefore represents an analytic function of $s$ in
the half-plane $\sigma > \epsilon$. Since $\epsilon > 0$ can be taken
arbitrarily small, this function is in fact analytic in the half-plane
$\sigma > 0$. It follows that the right-hand side of (4.12) is an analytic
function in this half-plane, with the exception of the pole at $s = 1$ with
residue 1, coming from the term $s/(s - 1)$. This provides the asserted
analytic continuation of $\zeta(s)$ to the half-plane $\sigma > 0$.
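Formula (4.12) can be used to evaluate the zeta function numerically at real points of the strip 0 < s < 1, where the defining series diverges. The sketch below is an illustration under stated assumptions: the function name and the cutoff are arbitrary, the integral over each interval [n, n+1] is computed from its closed form, and the tail beyond the cutoff is approximated by replacing the fractional part by its mean value 1/2, which is a heuristic shortcut and not part of the proof above.

```python
import math


def zeta_via_fractional_part_integral(s, cutoff=20_000):
    """Approximate zeta(s) for real s > 0, s != 1, from representation (4.12)."""
    integral = 0.0
    for n in range(1, cutoff):
        # Closed form of the integral of (x - n) * x**(-s-1) over [n, n+1].
        integral += (((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
                     + n * ((n + 1) ** -s - n ** -s) / s)
    # Tail over [cutoff, infinity): {x} replaced by 1/2 (an approximation).
    integral += 0.5 * cutoff ** -s / s
    return s / (s - 1) - s * integral


print(zeta_via_fractional_part_integral(2.0), math.pi ** 2 / 6)
print(zeta_via_fractional_part_integral(0.5))
```

At s = 2 the result can be compared with the classical value pi^2/6. At s = 1/2 the output is negative, which shows in particular that the continued function is not given by the (divergent, positive-term) series there.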
As an immediate consequence of the representation (4.12) for $\zeta(s)$, we
obtain an estimate for $\zeta(s)$ near the point $s = 1$.
Corollary 4.12 (Estimate for $\zeta(s)$ near $s = 1$). For
$|s - 1| \le 1/2$, $s \ne 1$, we have
\[
\zeta(s) = \frac{1}{s - 1} + \gamma + O(|s - 1|),
\]
where $\gamma$ is Euler's constant.
Proof. By Theorem 4.11, the function $\zeta(s) - 1/(s - 1)$ is analytic in
the disk $|s - 1| < 1$ and therefore has a power series expansion
\[
\zeta(s) - \frac{1}{s - 1} = \sum_{n=0}^{\infty} a_n (s - 1)^n
\]
in this disk. It follows that
\[
\zeta(s) - \frac{1}{s - 1} = a_0 + O(|s - 1|)
\]
in the disk $|s - 1| \le 1/2$. Thus it remains to show that the constant
$a_0$ is equal to $\gamma$. By (4.12) we have, in the half-plane $\sigma > 0$,
\[
\zeta(s) - \frac{1}{s - 1} = 1 - s \int_1^{\infty} \{x\} x^{-s-1} \, dx.
\]
Letting $s \to 1$ on the right-hand side, we get
\[
a_0 = \lim_{s \to 1} \Bigl( \zeta(s) - \frac{1}{s - 1} \Bigr)
    = 1 - \int_1^{\infty} \{x\} x^{-2} \, dx.
\]
Now, from the proof of the harmonic sum estimate (Theorem 2.5) we have
$\gamma = 1 - \int_1^{\infty} \{x\} x^{-2} \, dx$, so we obtain $a_0 = \gamma$
as claimed.
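The integral formula for Euler's constant used at the end of this proof is easy to test numerically. In the sketch below (an illustration; the function name, the cutoff, and the hard-coded reference value of Euler's constant are assumptions of this example), the integral of {x}/x^2 over [n, n+1] is evaluated exactly as log((n+1)/n) - 1/(n+1).

```python
import math

EULER_GAMMA = 0.5772156649015329  # reference value of Euler's constant


def one_minus_fractional_integral(cutoff):
    """Return 1 - integral of {x} / x**2 over [1, cutoff] for an integer cutoff."""
    integral = sum(math.log1p(1.0 / n) - 1.0 / (n + 1) for n in range(1, cutoff))
    return 1.0 - integral


for cutoff in (10**2, 10**3, 10**5):
    print(cutoff, one_minus_fractional_integral(cutoff) - EULER_GAMMA)
```

The printed differences shrink roughly like 1/(2 * cutoff), reflecting the part of the integral beyond the cutoff that has been dropped.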
4.4.3 Lower bounds for error terms in summatory functions
We begin with a general result relating error terms in estimates for the
summatory functions M(f, x) to the region of analyticity of the Dirichlet
series F(s).
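Theorem 4.13 (Error terms in estimates for M(f, x) and analyticity
of F(s)).
(i) If $M(f, x) = O(x^{\theta})$ for some $\theta \ge 0$, then $F(s)$ is
analytic in the half-plane $\sigma > \theta$.
(ii) If $M(f, x) = A x^{\alpha} + O(x^{\theta})$ for some constants $A$,
$\alpha$, and $\theta$ with $\alpha > \theta \ge 0$, then
$F(s) - A s (s - \alpha)^{-1}$ is analytic in the half-plane
$\sigma > \theta$.
Proof. Note first that $M(f, x) = O(x^{\theta})$ in case (i) and
$M(f, x) = A x^{\alpha} + O(x^{\theta}) = O(x^{\alpha})$ in case (ii), so the
Dirichlet series $F(s)$ has finite abscissa of convergence, and Theorem 4.10
can therefore be applied in both cases.
(i) If $M(f, x) = O(x^{\theta})$, then the integrand in (4.9) is of order
$O(x^{\theta - \sigma - 1})$, so the integral converges absolutely and
uniformly in every half-plane $\sigma \ge \theta + \epsilon$ with
$\epsilon > 0$, and hence represents an analytic function in
$\sigma > \theta$. By (4.9), $F(s)$ is therefore analytic in this half-plane.
(ii) If $M(f, x) = A x^{\alpha} + O(x^{\theta})$, we set
$M(f, x) = A x^{\alpha} + M_1(f, x)$, and split the integral on the right of
(4.9) into a sum of two integrals corresponding to the terms $A x^{\alpha}$
and $M_1(f, x)$. Since $M_1(f, x) = O(x^{\theta})$, the second integral
represents an analytic function in $\sigma > \theta$, by the argument of
part (i). The first integral contributes
$s \int_1^{\infty} A x^{\alpha - s - 1} \, dx = A s (s - \alpha)^{-1}$ for
$\sigma > \alpha$. Hence $F(s) - A s (s - \alpha)^{-1}$ is analytic in
$\sigma > \theta$.
We now apply this result to the Moebius function. Suppose that, for some
$\theta \ge 0$, we have an estimate of the form
\[
M(\mu, x) = O(x^{\theta}). \tag{4.13}
\]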
Theorem 4.13 then implies that the Dirichlet series for the Moebius
function, namely $\sum_{n=1}^{\infty} \mu(n) n^{-s} = 1/\zeta(s)$, is analytic
in the half-plane $\sigma > \theta$. This in turn implies that $\zeta(s)$ is
meromorphic in this half-plane and satisfies
(4.14) $\zeta(s)$ has no zeros in $\sigma > \theta$.
Thus, the quality of estimates for $M(\mu, x)$, and hence the quality of the
error term in the PNT, depends on the "zero-free region" of the Riemann zeta
function, and specifically the values $\theta$ for which (4.14) holds.
Unfortunately, very little is known in this regard. The current state of
knowledge can be summarized as follows:
• (4.14) holds for $\theta > 1$. (This follows immediately from the Euler
product representation for $\zeta(s)$.)
• It is known that $\zeta(s)$ has infinitely many zeros with real part 1/2,
so (4.14) does not hold for any value $\theta < 1/2$. (The proof of this is
not easy and beyond the scope of this course.)
• For $\theta = 1/2$, statement (4.14) is the Riemann Hypothesis, the most
famous problem in number theory. It is easily seen that (4.14) holds for
$\theta = 1/2$ if and only if it holds for all $\theta > 1/2$.
• It is not known whether there exists some $\theta$ with
$1/2 \le \theta < 1$ for which (4.14) holds (which would be a weak form of
the Riemann Hypothesis).
From these remarks and Theorem 4.13 we have the following result.
Theorem 4.14 (Lower bounds for the error term in the PNT). The
estimate (4.13) does not hold for any $\theta < 1/2$. If it holds for
$\theta = 1/2 + \epsilon$, for any $\epsilon > 0$, then the Riemann Hypothesis
follows.
Remarks. (i) By a similar argument, using part (ii) of Theorem 4.13 and
the fact that the Dirichlet series for $\Lambda(n)$ is
$-\zeta'(s)/\zeta(s)$, one sees that the estimate
\[
M(\Lambda, x) = x + O(x^{\theta}) \tag{4.15}
\]
implies (4.14). Since (4.14) is known to be false when $\theta < 1/2$, (4.15)
does not hold if $\theta < 1/2$. Moreover, if (4.15) holds for
$\theta = 1/2 + \epsilon$, for every $\epsilon > 0$, then the Riemann
Hypothesis follows.
(ii) The converse of the above statements also holds, though this requires
an entirely different argument, which is beyond the scope of this course.
Namely, if the Riemann Hypothesis is true, then (4.13) and (4.15) hold
for any $\theta > 1/2$, i.e., the error terms in these estimates are
essentially (namely, up to a factor $x^{\epsilon}$) of size $O(\sqrt{x})$.
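4.4.4 Evaluation of Mertens' constant
Recall Mertens' formula,
\[
\prod_{p \le x} \Bigl( 1 - \frac{1}{p} \Bigr)
  = \frac{e^{-C}}{\log x} \Bigl( 1 + O\Bigl( \frac{1}{\log x} \Bigr) \Bigr)
  \qquad (x \ge 2), \tag{4.16}
\]
with an unspecified constant $C$. We will now show this constant is equal to
the Euler constant $\gamma$, as claimed in Theorem 3.4.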
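Taking logarithms in (4.16), we see that (4.16) is equivalent to
\[
-\log P(x) = \log\log x + C + O\Bigl( \frac{1}{\log x} \Bigr)
  \qquad (x \ge 2), \tag{4.17}
\]
where
\[
P(x) = \prod_{p \le x} \Bigl( 1 - \frac{1}{p} \Bigr).
\]
Now,
\[
\begin{aligned}
-\log P(x) &= -\sum_{p \le x} \log\Bigl( 1 - \frac{1}{p} \Bigr)
            = \sum_{p \le x} \sum_{m=1}^{\infty} \frac{1}{m p^m} \\
  &= \sum_{p^m \le x} \frac{1}{m p^m}
     + O\Biggl( \sum_{p \le x} \sum_{m > \log x / \log p} \frac{1}{p^m} \Biggr) \\
  &= \sum_{p^m \le x} \frac{1}{m p^m} + O\Biggl( \sum_{p \le x} \frac{1}{x} \Biggr) \\
  &= L(f, x) + O\Bigl( \frac{\pi(x)}{x} \Bigr)
   = L(f, x) + O\Bigl( \frac{1}{\log x} \Bigr),
\end{aligned}
\]
where $f$ is the function defined by
\[
f(n) =
\begin{cases}
1/m & \text{if } n = p^m, \\
0 & \text{otherwise,}
\end{cases}
\]
and $L(f, x) = \sum_{n \le x} f(n)/n$ is the "logarithmic" summatory function
of $f$, as defined in Theorem 4.10. Thus, (4.17) is equivalent to
\[
L(f, x) = \log\log x + C + O\Bigl( \frac{1}{\log x} \Bigr)
  \qquad (x \ge 2). \tag{4.18}
\]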
We now relate $f$ to the Riemann zeta function. Let $s$ be real and greater
than 1. Expanding $\zeta(s)$ into an Euler product and taking logarithms, we
obtain
\[
\log \zeta(s) = \log \prod_p \Bigl( 1 - \frac{1}{p^s} \Bigr)^{-1}
  = \sum_p \log \Bigl( 1 - \frac{1}{p^s} \Bigr)^{-1}
  = \sum_p \sum_{m=1}^{\infty} \frac{1}{m p^{ms}}
  = F(s),
\]
where $F(s)$ is the Dirichlet series of $f(n)$. On the other hand,
Corollary 4.12 gives
\[
\begin{aligned}
\log \zeta(s) &= \log \Bigl( \frac{1}{s - 1} \bigl( 1 + O(s - 1) \bigr) \Bigr)
  = \log \frac{1}{s - 1} + \log\bigl( 1 + O(s - 1) \bigr) \\
  &= \log \frac{1}{s - 1} + O(s - 1) \qquad (1 < s < s_0),
\end{aligned}
\]
for a suitable $s_0$ with $1 < s_0 < 3/2$. Thus, we have
\[
F(s) = \log \frac{1}{s - 1} + O(s - 1) \qquad (1 < s < s_0). \tag{4.19}
\]
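Next, we apply Theorem 4.10 to express $F(s)$ as an integral over $L(f, x)$.
Since $0 \le f(n) \le 1$, the abscissa of convergence of $F(s)$ is $\le 1$,
so the representation given by this theorem is valid in $\sigma > 1$. Noting
that $L(f, x) = 0$ for $x < 2$, we obtain, in the half-plane $\sigma > 1$,
\[
F(s) = (s - 1) \int_2^{\infty} L(f, x) x^{-s} \, dx.
\]
Substituting the estimate (4.18) for $L(f, x)$, we get
\[
\begin{aligned}
F(s) &= (s - 1) \int_2^{\infty}
        \Bigl( \log\log x + C + O\Bigl( \frac{1}{\log x} \Bigr) \Bigr) x^{-s} \, dx \\
     &= (s - 1) \int_{\log 2}^{\infty}
        \Bigl( \log u + C + O\Bigl( \frac{1}{u} \Bigr) \Bigr) e^{-u(s-1)} \, du \\
     &= \int_{(s-1)\log 2}^{\infty}
        \Bigl( \log \frac{1}{s - 1} + \log v + C + O\Bigl( \frac{s - 1}{v} \Bigr) \Bigr)
        e^{-v} \, dv.
\end{aligned}
\tag{4.20}
\]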
We now restrict $s$ to the interval $1 < s < s_0$ ($< 3/2$) and estimate the
integral on the right of (4.20). The contribution of the O-term to this
integral is bounded by
\[
\begin{aligned}
\ll (s - 1) \int_{(s-1)\log 2}^{\infty} \frac{e^{-v}}{v} \, dv
  &\le (s - 1) \Bigl( \log \frac{1}{(s - 1)\log 2} + \int_1^{\infty} e^{-v} \, dv \Bigr) \\
  &\ll (s - 1) \Bigl( \log \frac{1}{s - 1} + O(1) \Bigr)
   \ll (s - 1) \log \frac{1}{s - 1},
\end{aligned}
\]
since $1 \ll \log 1/(s - 1)$ by our assumption $1 < s < s_0 < 3/2$. In the
integral over the terms $\log 1/(s - 1) + \log v + C$ we replace the lower
integration limit by 0, which introduces an error of order
\[
\ll \int_0^{(s-1)\log 2}
    \Bigl( \log \frac{1}{s - 1} + |\log v| + |C| \Bigr) dv
\ll (s - 1) \log \frac{1}{s - 1}.
\]
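With these estimates, (4.20) becomes
\[
\begin{aligned}
F(s) &= \log \frac{1}{s - 1} \int_0^{\infty} e^{-v} \, dv
        + \int_0^{\infty} (\log v) e^{-v} \, dv
        + C \int_0^{\infty} e^{-v} \, dv
        + O\Bigl( (s - 1) \log \frac{1}{s - 1} \Bigr) \\
     &= \log \frac{1}{s - 1} + I + C
        + O\Bigl( (s - 1) \log \frac{1}{s - 1} \Bigr),
\end{aligned}
\tag{4.21}
\]
where
\[
I = \int_0^{\infty} (\log v) e^{-v} \, dv.
\]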
Equating the estimates (4.21) and (4.19) for $F(s)$ we get
\[
\log \frac{1}{s - 1} + O(s - 1)
  = \log \frac{1}{s - 1} + I + C + O\Bigl( (s - 1) \log \frac{1}{s - 1} \Bigr)
  \qquad (1 < s < s_0).
\]
Letting $s \to 1+$, the error terms here tend to zero, and we therefore
conclude that
\[
C = -I = -\int_0^{\infty} (\log v) e^{-v} \, dv.
\]
The integral $I$ can be found in many standard tables of integrals (e.g.,
Gradshteyn and Ryzhik, Table of Integrals, Series, and Products) and is
equal to $-\gamma$. Hence $C = \gamma$, which is what we wanted to show.
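Instead of consulting a table, the value of the integral I can also be confirmed numerically. The following Python sketch is an illustration only: the substitution v = e^t, the integration window, the step count, and the hard-coded reference value of Euler's constant are choices made for this example. After the substitution the integrand t * exp(t - e^t) decays rapidly in both directions, so a plain trapezoidal rule is adequate.

```python
import math

EULER_GAMMA = 0.5772156649015329  # reference value of Euler's constant


def log_exp_integral(t_min=-40.0, t_max=6.0, steps=46_000):
    """Trapezoidal approximation of the integral of log(v) * exp(-v) over (0, inf).

    With v = exp(t) the integral becomes the integral of t * exp(t - exp(t))
    over the real line, truncated here to [t_min, t_max].
    """
    h = (t_max - t_min) / steps
    total = 0.0
    for j in range(steps + 1):
        t = t_min + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * t * math.exp(t - math.exp(t))
    return h * total


I = log_exp_integral()
print(I, -EULER_GAMMA)
```

The two printed values should agree to many decimal places, consistent with I being the negative of Euler's constant and hence with C being Euler's constant.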
4.5 Inversion formulas
In this section, we consider the converse problem of representing the
partial sums $M(f, x)$ in terms of the Dirichlet series $F(s)$. This is a
more difficult problem than that of expressing $F(s)$ in terms of $M(f, x)$,
and the resulting formulas are more complicated, involving complex
integrals, usually in truncated form with error terms, because of
convergence problems. However, such "inversion formulas" are essential in
applications such as the analytic proof of the prime number theorem, since
in those applications analytic information on the generating Dirichlet
series of an arithmetic function is available and one needs to translate
that information into information on the behavior of the partial sums of the
arithmetic function.
Formulas expressing $M(f, x)$, or similar functions, in terms of $F(s)$, are
collectively known as "Perron formulas". We prove here two such formulas,
one for $M(f, x)$, and the other for an average version of $M(f, x)$, defined
by
\[
M_1(f, x) = \int_1^x M(f, y) \, dy = \sum_{n \le x} f(n) (x - n). \tag{4.22}
\]
(The second identity here follows by writing $M(f, y) = \sum_{n \le y} f(n)$
and inverting the order of summation and integration.)
The proof of these formulas rests on the evaluation of certain complex
integrals, which we state in the following lemma.
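Lemma 4.15. Let $c > 0$, and for $T > 0$ and $y > 0$ set
\[
I(y, T) = \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{y^s}{s} \, ds, \qquad
I_1(y) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{y^s}{s(s + 1)} \, ds.
\tag{4.23}
\]
(i) Given $T > 0$ and $y > 0$, $y \ne 1$, we have
\[
\begin{cases}
|I(y, T) - 1| \le \dfrac{y^c}{\pi T \log y} & \text{if } y > 1, \\[2ex]
|I(y, T)| \le \dfrac{y^c}{\pi T |\log y|} & \text{if } 0 < y < 1.
\end{cases}
\tag{4.24}
\]
(ii) For all $y > 0$ we have
\[
I_1(y) =
\begin{cases}
1 - \dfrac{1}{y} & \text{if } y > 1, \\[1ex]
0 & \text{if } 0 < y \le 1.
\end{cases}
\tag{4.25}
\]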
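Part (i) of the lemma can be illustrated numerically before turning to the proof. The sketch below is an assumption-laden example rather than anything from the notes: it parametrizes the segment by s = c + it, applies composite Simpson's rule, and uses the sample values c = 1, T = 50, y = 2 and y = 1/2; the names `perron_integral` and `perron_bound` are hypothetical.

```python
import cmath
import math


def perron_integral(y, T, c=1.0, steps=20_000):
    """Approximate I(y, T) = (1 / (2 pi i)) * integral of y**s / s over [c - iT, c + iT]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = 2.0 * T / steps
    log_y = math.log(y)

    def integrand(t):
        s = complex(c, t)
        return cmath.exp(s * log_y) / s  # y**s / s at s = c + it

    total = integrand(-T) + integrand(T)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(-T + j * h)
    # ds = i dt, so the prefactor 1 / (2 pi i) turns into 1 / (2 pi).
    return (h / 3.0) * total / (2.0 * math.pi)


def perron_bound(y, T, c=1.0):
    """Right-hand side of (4.24)."""
    return y ** c / (math.pi * T * abs(math.log(y)))


T = 50.0
print(abs(perron_integral(2.0, T) - 1.0), perron_bound(2.0, T))
print(abs(perron_integral(0.5, T)), perron_bound(0.5, T))
```

In each printed pair the first number (the actual deviation) should not exceed the second (the bound from the lemma). The integral thus acts as a smoothed indicator of the condition y > 1, which is what makes it useful for picking out the terms n <= x in a Dirichlet series.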
Proof. (i) Suppose first that $y > 1$; we seek to estimate $|I(y, T) - 1|$. Given $b < 0$, we apply the residue theorem, replacing the path $[c - iT, c + iT]$ by the path consisting of the two horizontal segments $[c - iT, b - iT]$ and $[b + iT, c + iT]$ and the vertical segment $[b - iT, b + iT]$. In doing so, we pick up a residue equal to 1 from the pole of the integrand $y^s/s$ at $s = 0$. It remains to estimate the integral over the new path. On the vertical segment $[b - iT, b + iT]$, the integrand is bounded by $|y^s|/|s| \le y^b/|b|$, and so the integral over $[b - iT, b + iT]$ is bounded by $2Ty^b/|b|$. Since $y > 1$, this bound tends to 0 as $b \to -\infty$.
On the two horizontal segments we have $|y^s/s| \le y^\sigma/T$, so the contribution of each of these segments to $I(y, T)$ is at most
    $\frac{1}{2\pi} \int_b^c \frac{y^\sigma}{T}\, d\sigma \le \frac{1}{2\pi} \int_{-\infty}^c \frac{y^\sigma}{T}\, d\sigma = \frac{y^c}{2\pi T \log y}.$
Letting $b \to -\infty$, we conclude that, for $y > 1$, the integral $I(y, T)$ differs from 1 by at most twice the above bound, i.e., an amount $\le (1/\pi) y^c/(T \log y)$, as claimed.
In the case $0 < y < 1$, we apply a similar argument, except that we now move the path of integration to a line $\sigma = a$ to the right of the line $\sigma = c$, with the new path consisting of the horizontal segments $[c - iT, a - iT]$ and $[a + iT, c + iT]$ and the vertical segment $[a - iT, a + iT]$. As before, the contribution of the vertical segment tends to 0 on letting $a \to \infty$, whereas the contribution of each of the horizontal segments is at most $(1/2\pi) \int_c^\infty (y^\sigma/T)\, d\sigma \le y^c/(2\pi T |\log y|)$. This time, however, there is no residue contribution, since the integrand has no poles in the region enclosed by the old and new paths of integration. Hence, for $0 < y < 1$, we have $|I(y, T)| \le y^c/(\pi T |\log y|)$.
(ii) Considering first the integral over a finite line segment $[c - iT, c + iT]$ and treating this integral as that of (i) by shifting the path of integration, we obtain in the case $y > 1$ a contribution coming from the residues of $y^s/(s(s + 1))$ at the poles $s = 0$ and $s = -1$, namely $1 - 1/y$, and an error term that tends to 0 as $T \to \infty$. Letting $T \to \infty$, we conclude that $I_1(y) = 1 - 1/y$ in the case $y > 1$. If $0 < y < 1$, the same argument applies, but without a residue contribution, so in this case we have $I_1(y) = 0$. Hence (4.25) holds for all $y > 0$ except possibly $y = 1$. To deal with the remaining case $y = 1$, we use a continuity argument: It is easily verified that the left and right-hand sides of (4.25) are continuous functions of $y > 0$. Since both sides are equal for $y > 1$, it follows that the equality persists when $y = 1$.
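The statements of Lemma 4.15 can be tested numerically. The sketch below is our own illustration, not part of the notes: it parametrizes the line of integration as s = c + it (so that ds = i dt) and approximates the integrals by a midpoint rule. The truncation height and step counts are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def perron_I(y, T, c=1.0, steps=20000):
    """Midpoint-rule value of I(y, T) = (1/(2 pi i)) * integral of y^s / s
    over the segment from c - iT to c + iT."""
    h = 2.0 * T / steps
    log_y = math.log(y)
    total = 0.0 + 0.0j
    for k in range(steps):
        s = complex(c, -T + (k + 0.5) * h)
        total += cmath.exp(s * log_y) / s
    # The imaginary parts cancel by symmetry in t, so only the real part remains.
    return (total * h / (2.0 * math.pi)).real


def perron_I1(y, T=500.0, c=1.0, steps=100000):
    """Approximation of I_1(y), truncating the line integral at height T.
    Since |y^s / (s(s+1))| <= y^c / t^2, the truncation error is at most y^c / (pi T)."""
    h = 2.0 * T / steps
    log_y = math.log(y)
    total = 0.0 + 0.0j
    for k in range(steps):
        s = complex(c, -T + (k + 0.5) * h)
        total += cmath.exp(s * log_y) / (s * (s + 1))
    return (total * h / (2.0 * math.pi)).real


if __name__ == "__main__":
    print(perron_I(2.0, 50.0), perron_I(0.5, 50.0))
    print(perron_I1(2.0), perron_I1(1.0), perron_I1(0.5))
```

With c = 1, the values of `perron_I1` at y = 2, 1, and 0.5 should be close to 1/2, 0, and 0, in agreement with (4.25), while `perron_I(y, T)` should lie within the bound (4.24) of 1 (for y > 1) or 0 (for y < 1).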
We are now ready to state the two main results of this section, which give formulas for $M_1(f, x)$ and $M(f, x)$ as complex integrals over $F(s)$.
Theorem 4.16 (Perron Formula for $M_1(f, x)$). Let $f(n)$ be an arithmetic function, and suppose that the Dirichlet series $F(s) = \sum_{n=1}^\infty f(n) n^{-s}$ has finite abscissa of absolute convergence $\sigma_a$. Let $M_1(f, x)$ be defined by (4.22) and (4.7). Then we have, for any $c > \max(0, \sigma_a)$ and any real number $x \ge 1$,
(4.26)    $M_1(f, x) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} F(s) \frac{x^{s+1}}{s(s+1)}\, ds.$
Remark. Since, on the line of integration, $|x^{s+1}| = x^{c+1}$ and $|F(s)| \le \sum_{n=1}^\infty |f(n)| n^{-c} < \infty$, the integrand is bounded by $\ll x^{c+1}/|s|^2$. Thus, the integral in (4.26) converges absolutely. By contrast, the formula for $M(f, x)$ (see Theorem 4.17 below) involves an integral over $F(s) x^s/s$, which is only conditionally convergent.
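Proof. Ignoring questions of convergence for the moment, we obtain (4.26) by writing the right-hand side as
    $\frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \sum_{n=1}^\infty \frac{f(n)}{n^s} \frac{x^{s+1}}{s(s+1)}\, ds = \frac{1}{2\pi i} \sum_{n=1}^\infty f(n) \int_{c-i\infty}^{c+i\infty} \frac{(x/n)^s\, x}{s(s+1)}\, ds$
    $= \sum_{n=1}^\infty f(n)\, x\, I_1(x/n) = \sum_{n \le x} f(n)(x - n) = M_1(f, x),$
using the evaluation of $I_1(y)$ given by Lemma 4.15. To justify the interchanging of the order of integration and summation, we note that
    $\int_{c-i\infty}^{c+i\infty} \sum_{n=1}^\infty \frac{|f(n)|}{|n^s|} \left| \frac{x^{s+1}}{s(s+1)} \right| |ds| \le \sum_{n=1}^\infty \frac{|f(n)|}{n^c} \int_{c-i\infty}^{c+i\infty} \frac{x^{c+1}}{|s(s+1)|}\, |ds| < \infty,$
since, by the assumption $c > \max(0, \sigma_a)$, we have $\sum_{n=1}^\infty |f(n)| n^{-c} < \infty$ and
    $\int_{c-i\infty}^{c+i\infty} \frac{1}{|s(s+1)|}\, |ds| \le \int_{c-i\infty}^{c+i\infty} \frac{1}{|s|^2}\, |ds| = \int_{-\infty}^{\infty} \frac{1}{c^2 + t^2}\, dt < \infty.$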
Theorem 4.17 (Perron Formula for $M(f, x)$). Let $f(n)$ be an arithmetic function, and suppose that the Dirichlet series $F(s) = \sum_{n=1}^\infty f(n) n^{-s}$ has finite abscissa of absolute convergence $\sigma_a$. Then we have, for any $c > \max(0, \sigma_a)$ and any non-integral value $x > 1$,
(4.27)    $M(f, x) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} F(s) \frac{x^s}{s}\, ds,$
where the improper integral $\int_{c-i\infty}^{c+i\infty}$ is to be interpreted as the symmetric limit $\lim_{T \to \infty} \int_{c-iT}^{c+iT}$. Moreover, given $T > 0$, we have
(4.28)    $M(f, x) = \frac{1}{2\pi i} \int_{c-iT}^{c+iT} F(s) \frac{x^s}{s}\, ds + R(T),$
where
(4.29)    $|R(T)| \le \frac{x^c}{T} \sum_{n=1}^\infty \frac{|f(n)|}{n^c\, |\log(x/n)|}.$
Remark. The restriction to non-integral values of $x$ in the infinite version of Perron's formula (4.27) can be dropped if we replace the function $M(f, x)$ by the average of its left and right limits, namely $(1/2)(M(f, x-) + M(f, x+))$. This can be proved in the same manner using the following evaluation of the integral $I(y, T)$ in the case $y = 1$:
    $I(1, T) = \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{1}{s}\, ds = \frac{1}{2\pi} \int_{-T}^{T} \frac{c - it}{c^2 + t^2}\, dt$
    $= \frac{1}{2\pi} \int_{-T}^{T} \frac{c}{c^2 + t^2}\, dt = \frac{1}{2\pi} \int_{-T/c}^{T/c} \frac{1}{1 + u^2}\, du$
    $= \frac{1}{2\pi} \left( \arctan(T/c) - \arctan(-T/c) \right),$
which converges to $(1/2\pi)(\pi/2 - (-\pi/2)) = 1/2$ as $T \to \infty$.
However, in applications the stated version is sufficient, since for any integer $N$, $M(f, N)$ is equal to $M(f, x)$ for $N < x < N + 1$ and one can therefore apply the formula with such a non-integral value of $x$. Usually one takes $x$ to be of the form $x = N + 1/2$ in order to minimize the effect that a small denominator $\log(x/n)$ on the right-hand side of (4.29) can have on the estimate.
Proof. The formula (4.27) follows on letting $T \to \infty$ in (4.28), so it suffices to prove the latter formula. To this end we proceed as in the proof of Theorem 4.16, substituting $F(s) = \sum_{n=1}^\infty f(n) n^{-s}$ in the first (main) term on the right-hand side of (4.28), to obtain
    $\frac{1}{2\pi i} \int_{c-iT}^{c+iT} F(s) \frac{x^s}{s}\, ds = \sum_{n=1}^\infty f(n)\, I(x/n, T).$
The interchanging of integration and summation is again permissible, since the range of integration is a compact interval, $[c - iT, c + iT]$, and the series $\sum_{n=1}^\infty f(n) n^{-s}$ converges absolutely and uniformly on that interval.
Estimating $I(x/n, T)$ by Lemma 4.15, we obtain
    $\sum_{n=1}^\infty f(n)\, I(x/n, T) = \sum_{n \le x} f(n) + E(T) = M(f, x) + E(T),$
where
    $|E(T)| \le \sum_{n=1}^\infty |f(n)| \frac{(x/n)^c}{T\, |\log(x/n)|} = \frac{x^c}{T} \sum_{n=1}^\infty \frac{|f(n)|}{n^c\, |\log(x/n)|}.$
Collecting these estimates yields (4.28), with $R(T) = -E(T)$ satisfying (4.29), as required.
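The truncated Perron formula (4.28) with the error bound (4.29) can be illustrated numerically. The following sketch is ours and rests on a simplifying assumption: f is finitely supported, so that F(s) is a Dirichlet polynomial and can be evaluated exactly. (Such an f has abscissa of absolute convergence equal to minus infinity, but the proof above only uses absolute convergence on the line of integration.) The integral is approximated by a midpoint rule; all function names are hypothetical.

```python
import cmath
import math


def dirichlet_poly(coeffs, s):
    """F(s) = sum of f(n) n^(-s) for finitely supported f, with coeffs[n-1] = f(n)."""
    return sum(a * cmath.exp(-s * math.log(n)) for n, a in enumerate(coeffs, start=1))


def truncated_perron(coeffs, x, T, c=1.0, steps=40000):
    """Midpoint-rule value of (1/(2 pi i)) * integral of F(s) x^s / s
    over the segment from c - iT to c + iT."""
    h = 2.0 * T / steps
    log_x = math.log(x)
    total = 0.0 + 0.0j
    for k in range(steps):
        s = complex(c, -T + (k + 0.5) * h)
        total += dirichlet_poly(coeffs, s) * cmath.exp(s * log_x) / s
    return (total * h / (2.0 * math.pi)).real


def perron_error_bound(coeffs, x, T, c=1.0):
    """Right-hand side of (4.29); x must not be an integer in the support of f."""
    return (x ** c / T) * sum(
        abs(a) / (n ** c * abs(math.log(x / n)))
        for n, a in enumerate(coeffs, start=1)
    )


if __name__ == "__main__":
    coeffs = [1.0] * 10           # f(n) = 1 for n <= 10, f(n) = 0 otherwise
    x, T = 5.5, 200.0             # half-integer x, as suggested in the remark
    print(truncated_perron(coeffs, x, T), perron_error_bound(coeffs, x, T))
```

Here M(f, 5.5) = 5, and the computed integral should differ from 5 by no more than the bound returned by `perron_error_bound`. Moving x closer to an integer makes the factor 1/|log(x/n)| large for the nearest n, which is the effect the remark warns about.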
4.6 Exercises
4.1 Let $F(s) = \sum_{m,n=1}^\infty [m, n]^{-s}$. Determine the abscissa of convergence $\sigma_c$ of $F(s)$ and express $F(s)$ in terms of the Riemann zeta function. (Hint: Express $F(s)$ as $\sum_{n=1}^\infty f(n) n^{-s}$, where $f(n) = \#\{(a, b) \in \mathbb{N}^2 : [a, b] = n\}$, and represent the latter as an Euler product.)
4.2 Express the Dirichlet series $\sum_{n=1}^\infty d(n)^2 n^{-s}$ in terms of the Riemann zeta function. Then use this relation to derive a convolution identity relating the functions $d^2(n)$ and $d_4(n)$ (where $d_k(n) = \#\{(a_1, \dots, a_k) \in \mathbb{N}^k : a_1 \cdots a_k = n\}$ is the generalized divisor function).
4.3 Let $f(n) = \sum_{d|n} (\log d)/d$, and let $F(s) = \sum_{n=1}^\infty f(n) n^{-s}$. Evaluate $F(s)$ in terms of the Riemann zeta function.
4.4 Evaluate the series $\sum_{(m_1, \dots, m_r) = 1} m_1^{-s} \cdots m_r^{-s}$, where the summation is over all tuples $(m_1, \dots, m_r)$ of positive integers that are relatively prime, in terms of the Riemann zeta function.
4.5 Let $f(n)$ be the unique positive real-valued arithmetic function that satisfies $\sum_{d|n} f(d) f(n/d) = 1$ for all $n$ (i.e., $f$ is the "positive Dirichlet square root" of the function 1). Let $F(s) = \sum_{n=1}^\infty f(n) n^{-s}$ be the Dirichlet series of $f$.
(i) Express $F(s)$ for $\sigma > 1$ in terms of the Riemann zeta function.
(ii) Find an explicit formula for $f(p^k)$, where $p$ is prime and $k \ge 1$.
4.6 For each of the following functions $f(n)$ determine the abscissa of convergence $\sigma_c$ and the abscissa of absolute convergence $\sigma_a$ of the associated Dirichlet series.
(i) $f(n) = \omega(n)$ (where $\omega(n)$ is the number of distinct prime factors of $n$)
(ii) $f(n) = e^{2\pi i \alpha n}$, where $\alpha \in \mathbb{R} \setminus \mathbb{Z}$
(iii) f(n) = n
in
, where Z
(iv) $f(n) = d_k(n) = \#\{(a_1, \dots, a_k) \in \mathbb{N}^k : a_1 \cdots a_k = n\}$ (the generalized divisor function)
(v) $f(n)$ any periodic function with period $q$ and $\sum_{n=1}^q f(n) = 0$.
4.7 Let $\sigma_1$ and $\sigma_2$ be real numbers with $\sigma_1 \le \sigma_2 \le \sigma_1 + 1$. Construct an arithmetic function whose Dirichlet series has abscissa of convergence $\sigma_c = \sigma_1$ and abscissa of absolute convergence $\sigma_a = \sigma_2$.
4.8 Let $f(n)$ be an arithmetic function satisfying
    $S(x) = \sum_{n \le x} f(n) = A x^\alpha + B x^\beta + O(x^\delta) \quad (x \ge 1),$
where $\alpha > \beta > \delta \ge 0$ are real numbers and $A$ and $B$ are non-zero real numbers. Let $F(s) = \sum_{n \ge 1} f(n) n^{-s}$ be the generating Dirichlet series for $f$. Find, with proof, a half-plane (as large as possible) in which $F(s)$ is guaranteed to have a meromorphic continuation, and determine all poles (if any) of (the meromorphic continuation of) $F(s)$ in that region, and the residues of $F$ at those poles.
Chapter 5
Distribution of primes II:
Proof of the Prime Number
Theorem
5.1 Introduction
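In this chapter we give an analytic proof of the Prime Number Theorem (PNT) with error term. In its original form, the PNT is the assertion that the number of primes up to $x$, $\pi(x)$, satisfies
(5.1)    $\pi(x) \sim \frac{x}{\log x} \quad (x \to \infty),$
but, as we have shown in Chapter 3, the PNT is equivalent to any one of the relations
(5.2)    $\pi(x) \sim \operatorname{Li}(x) \quad (x \to \infty),$
(5.3)    $\theta(x) \sim x \quad (x \to \infty),$
(5.4)    $\psi(x) \sim x \quad (x \to \infty),$
where
    $\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t},$
and
    $\theta(x) = \sum_{p \le x} \log p, \qquad \psi(x) = \sum_{n \le x} \Lambda(n).$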
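To get a feeling for the functions entering (5.1) through (5.4), one can tabulate them for moderate x. The following Python sketch is our own illustration (the function names are not from the notes); it uses a simple sieve of Eratosthenes and computes $\psi(x)$ by adding log p once for each prime power $p^m \le x$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out p*p, p*p + p, ... (smaller multiples were already handled).
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def prime_counting(x):
    """pi(x): the number of primes <= x."""
    return len(primes_up_to(int(x)))


def theta(x):
    """theta(x): sum of log p over primes p <= x."""
    return sum(math.log(p) for p in primes_up_to(int(x)))


def psi(x):
    """psi(x): sum of Lambda(n) over n <= x."""
    total = 0.0
    for p in primes_up_to(int(x)):
        q = p
        while q <= x:
            total += math.log(p)
            q *= p
    return total


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, prime_counting(x), theta(x) / x, psi(x) / x)
```

The ratios $\theta(x)/x$ and $\psi(x)/x$ printed by the script approach 1 as x grows, as (5.3) and (5.4) predict.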
We will prove the PNT in the form (5.4); more precisely, we will establish the following quantitative form (i.e., one with explicit error term) of this relation.
Theorem 5.1 (Prime number theorem with error term). We have
(5.5)    $\psi(x) = x + O\left( x \exp(-c (\log x)^\alpha) \right) \quad (x \ge 2),$
where $c$ is a positive constant and $\alpha = 1/10$.
To gauge the quality of the error term, we note that, on the one hand,
    $x \exp(-c (\log x)^\alpha) \ll_k \frac{x}{(\log x)^k},$
for any fixed constant $k$, while, on the other hand,
    $x \exp(-c (\log x)^\alpha) \gg_\epsilon x^{1-\epsilon}$
for any fixed $\epsilon > 0$. (These estimates hold regardless of the specific value of $\alpha$, as long as $0 < \alpha < 1$.)
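The two comparisons above are asymptotic, and for $\alpha = 1/10$ they only take effect for astronomically large x. The following sketch (ours, with hypothetical function names) makes this visible by working with logarithms of the three saving factors, expressed in terms of L = log x, so that no overflow occurs.

```python
import math


def log_pnt_saving(L, c=1.0, alpha=0.1):
    """Logarithm of exp(-c (log x)^alpha), written in terms of L = log x."""
    return -c * L ** alpha


def log_power_of_log_saving(L, k):
    """Logarithm of (log x)^(-k), in terms of L = log x."""
    return -k * math.log(L)


def log_power_saving(L, eps):
    """Logarithm of x^(-eps), in terms of L = log x."""
    return -eps * L


if __name__ == "__main__":
    for L in (1e2, 1e10, math.exp(60.0)):
        print(L, log_pnt_saving(L), log_power_of_log_saving(L, 2), log_power_saving(L, 0.01))
```

With c = 1 and k = 2, the factor $\exp(-(\log x)^{1/10})$ is still larger than $(\log x)^{-2}$ at $\log x = 100$, and it only drops below it once $\log x$ is enormous (for instance at $\log x = e^{60}$); it stays far above $x^{-0.01}$ throughout that range.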
The PNT was proved independently, and essentially simultaneously, by Jacques Hadamard and Charles de la Vallée Poussin at the end of the 19th century. The proofs of Hadamard and de la Vallée Poussin both used an analytic approach that had its roots in the work of Riemann some 50 years earlier.
After the PNT had been proved, the main focus shifted to establishing the PNT with as good an error term as possible. This problem is still wide open, and what we know is very far from what is being conjectured.
Tables 5.1 and 5.2 list the principal milestones in this development, and a more detailed description is given below. We will state all results in terms of the form (5.4) of the PNT, but the error terms in the relations (5.3) and (5.2) are essentially the same as for (5.4). (This is not true for the original form (5.1) of the PNT, since the right-hand side, $x/\log x$, is only a crude approximation to $\pi(x)$ that differs from the true size of $\pi(x)$ by a term of order $x/(\log x)^2$; the correct approximation for $\pi(x)$ is the logarithmic integral $\operatorname{Li}(x)$.)
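Author(s) | Bound for $\psi(x) - x$ | Zero-free region ($t^* = \max(|t|, 3)$) | Remarks
Chebyshev (1851) | $\le cx$ ($x \ge x_0$), $c \approx 0.1$ | | Chebyshev bound
Hadamard, de la Vallée Poussin (1896) | $o(x)$ | $\sigma \ge 1$ | Prime Number Theorem
De la Vallée Poussin (1899) | $O\left( x e^{-c \sqrt{\log x}} \right)$ | $\sigma \ge 1 - \frac{c}{\log t^*}$ | Classical error term
Littlewood (1922) | $O\left( x e^{-c \sqrt{\log x \log\log x}} \right)$ | $\sigma \ge 1 - \frac{c \log\log t^*}{\log t^*}$ |
Vinogradov-Korobov (1958) | $O\left( x e^{-c (\log x)^{3/5} (\log\log x)^{-1/5}} \right)$ | $\sigma \ge 1 - \frac{c}{(\log\log t^*)^{1/3} (\log t^*)^{2/3}}$ | Current record
 | $O_\epsilon(x^{1/2+\epsilon})$, $\epsilon > 0$ | $\sigma > 1/2$ | Riemann Hypothesis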
Table 5.1: The error term in the Prime Number Theorem, I
Author(s) | Bound for $\psi(x) - x$ | Remarks
Erdős-Selberg (1949) | $o(x)$ | First elementary proof
Bombieri, Wirsing (1964) | $O_A\left( x (\log x)^{-A} \right)$, any $A > 0$ | First elementary proof with error term
Diamond-Steinig (1970) | $O\left( x e^{-c (\log x)^\alpha} \right)$, any $\alpha < 1/7$ | First elementary proof with exponential error term
Lavrik-Sobirov (1973) | $O\left( x e^{-c (\log x)^\alpha} \right)$, any $\alpha < 1/6$ | Current confirmed record for error term in elementary proof
 | $O\left( x e^{-c \sqrt{\log x}} \right)$ | Likely limit of elementary proofs
Table 5.2: The error term in the Prime Number Theorem, II: Elementary
proofs
The classical error term. (5.5) with $\alpha = 1/2$ was established by de la Vallée Poussin shortly after his proof of the PNT. This is a stronger result than the one we will prove here (with $\alpha = 1/10$), but obtaining this error term requires considerably more machinery from complex analysis than we have time to develop (such as the theory of entire functions of finite order, Hadamard products, and the theory of the Gamma function). The proof we will give goes back to E. Landau in the early part of the 20th century and has the advantage that it is a relatively "low tech" proof, requiring only a modest amount of complex analysis.
Vinogradov's error term. The only significant improvement over de la Vallée Poussin's error term is due to I.M. Vinogradov who, some 50 years ago, obtained (5.5) with $\alpha = 3/5 - \epsilon$, for any fixed $\epsilon > 0$ (with the constant $c$ depending on $\epsilon$). Aside from minor improvements, in which the $\epsilon$ was made precise, Vinogradov's result still represents the current record in the error term of the PNT.
Error terms obtained by elementary methods. The first elementary proof of the PNT was given by Erdős and Selberg in the 1940s. (Here "elementary" is to be interpreted in a technical sense: an elementary proof is one that avoids the use of tools from complex analysis. "Elementary" in this context is not synonymous with "simple"; in fact, the restriction to "elementary" methods comes at the expense of rendering the proof much longer, more complicated, and less transparent.)
Other elementary proofs have since been given, but the early elementary proofs did not give explicit error terms, and most elementary approaches to the PNT yield only very weak error terms. It was not until the 1970s that Diamond and Steinig obtained a form of the PNT by elementary methods that involved an exponential error term as in (5.5), though only with an exponent $\alpha = 1/7 - \epsilon$, which is smaller than the exponents $\alpha = 1/2$ and $\alpha = 3/5 - \epsilon$ in the results of de la Vallée Poussin and Vinogradov. The current record for the value of $\alpha$ in elementary error terms is only slightly larger, namely $\alpha = 1/6 - \epsilon$. This still falls far short of the "classical" exponent $\alpha = 1/2$. Obtaining the value $\alpha = 1/2$ by elementary means would be a major achievement, and there are reasons to believe that this value represents the limit of what can possibly be achieved by elementary methods.
The conjectured error term. Assuming primes behave, in some appropriate sense, randomly, one might expect the error term in (5.5) to be of size about the square root of the main term. Thus, a natural conjecture would be that $\psi(x) = x + O_\epsilon(x^{1/2+\epsilon})$ for every fixed $\epsilon > 0$. As we will see, this conjecture is equivalent to the Riemann Hypothesis. Moreover, if true, it is best-possible; i.e., the exponent 1/2 here cannot be replaced by a smaller exponent. An indication of how far we are from proving such a result is the fact, noted above, that the error term in (5.5) is greater than $x^{1-\epsilon}$, for any fixed $\epsilon > 0$.
The proof of Theorem 5.1 will take up most of the remainder of this
chapter. We now give a brief outline of the argument.
Our starting point is Perron's formula in the form given by Theorem 4.16 for the function $f(n) = \Lambda(n)$. Since $\Lambda(n)$ has Dirichlet series $-\zeta'(s)/\zeta(s)$, this formula gives
    $\psi_1(x) = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} \left( -\frac{\zeta'(s)}{\zeta(s)} \right) \frac{x^{s+1}}{s(s+1)}\, ds$
for any $a > 1$, where
    $\psi_1(x) = \int_0^x \psi(y)\, dy.$
We apply this formula initially with a value of $a$ depending on $x$ and slightly larger than 1 (namely, $a = 1 + 1/\log x$), and then move part of the line of integration to the left of the line $\sigma = 1$. Since $\zeta(s)$ has a pole at $s = 1$, the integrand has a pole at the same point, and passing over this pole we pick up a contribution $x^2/2$ from the residue of the integrand at $s = 1$. This contribution will be the main term in the estimate for $\psi_1(x)$. The error term will come from bounding the integral over the shifted path of integration. In order to obtain good estimates for the integrand, we need to, on the one hand, move as far to the left of $\sigma = 1$ as possible (so that $|x^s| = x^\sigma$ is small compared to $x$). On the other hand, since any zero of $\zeta(s)$ gives rise to a pole of
    $\sum_{n=1}^\infty \frac{1}{n^s} \quad (\sigma > 1),$
i.e., $\zeta(s)$ is the Dirichlet series for the arithmetic function 1. We begin by collecting some elementary properties of this function, most of which have been established earlier.
Theorem 5.2 (Basic properties of the zeta function).
(i) $\zeta(s)$ is analytic in $\sigma > 1$ and there has the Dirichlet series representation $\zeta(s) = \sum_{n=1}^\infty n^{-s}$.
(ii) $\zeta(s)$ has an analytic continuation to a function defined on the half-plane $\sigma > 0$ and analytic in this half-plane with the exception of a simple pole at $s = 1$ with residue 1. The analytic continuation is also denoted by $\zeta(s)$ and has the integral representation
(5.6)    $\zeta(s) = \frac{s}{s-1} - s \int_1^\infty \{x\} x^{-s-1}\, dx \quad (\sigma > 0).$
(iii) $\zeta(s)$ has an Euler product representation $\zeta(s) = \prod_p (1 - p^{-s})^{-1}$ in $\sigma > 1$.
(iv) $\zeta(s)$ has no zeros in the half-plane $\sigma > 1$.
Proof. (i), (ii), and (iii) were established in Theorems 4.11 and 4.3; (iv) follows immediately from the Euler product representation, since the Euler product is absolutely convergent, and none of its factors is zero, in the half-plane $\sigma > 1$. (In general, an absolutely convergent infinite product can only be 0 if one of its factors is 0.)
5.3 The Riemann zeta function, II: upper bounds
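In this section we establish upper bounds for $\zeta(s)$ and $\zeta'(s)$.
Theorem 5.3 (Upper bounds for $\zeta(s)$ and $\zeta'(s)$). With suitable absolute constants $A_1$ and $A_2$ we have:
(i)    $|\zeta(s)| \le 4\, \frac{|t|^{1-\sigma_0}}{1-\sigma_0} \quad (|t| \ge 2,\ 1/2 \le \sigma_0 < 1,\ \sigma \ge \sigma_0)$
(ii)   $|\zeta(s)| \le A_1 \log |t| \quad \left( |t| \ge 2,\ \sigma \ge 1 - \frac{1}{4 \log |t|} \right)$
(iii)  $|\zeta'(s)| \le A_2 \log^2 |t| \quad \left( |t| \ge 2,\ \sigma \ge 1 - \frac{1}{12 \log |t|} \right)$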
To prove this result, we need three lemmas.
Lemma 5.4.
(5.7)    $\zeta(s) = \sum_{n=1}^N \frac{1}{n^s} - \frac{N^{1-s}}{1-s} - s \int_N^\infty \{u\} u^{-s-1}\, du \quad (N \in \mathbb{N},\ \sigma > 0).$
Proof. The argument is a modification of that used to establish the integral representation (5.6). Given a positive integer $N$, we apply the Mellin transform representation (Theorem 4.10) to the Dirichlet series
    $F(s) = \sum_{n=N+1}^\infty \frac{1}{n^s} = \zeta(s) - \sum_{n=1}^N \frac{1}{n^s}.$
This is the Dirichlet series corresponding to the function $f$ defined by $f(n) = 1$ if $n > N$ and $f(n) = 0$ if $n \le N$, whose partial sums are given by $M(f, x) = [x] - N$ if $x \ge N$ and $M(f, x) = 0$ otherwise. Hence Theorem 4.10 gives, for $\sigma > 1$,
    $F(s) = s \int_1^\infty M(f, x) x^{-s-1}\, dx = s \int_N^\infty ([x] - N) x^{-s-1}\, dx$
    $= s \int_N^\infty x^{-s}\, dx - sN \int_N^\infty x^{-s-1}\, dx - s \int_N^\infty \{x\} x^{-s-1}\, dx$
    $= -\frac{s N^{1-s}}{1-s} - N^{1-s} - s \int_N^\infty \{x\} x^{-s-1}\, dx$
    $= -\frac{N^{1-s}}{1-s} - s \int_N^\infty \{x\} x^{-s-1}\, dx.$
Since $\zeta(s) = F(s) + \sum_{n=1}^N n^{-s}$, this yields the desired relation (5.7) in the range $\sigma > 1$. Since both sides of this relation are analytic in $\sigma > 0$ except for a simple pole at $s = 1$ with residue 1 (note that the integral $\int_N^\infty \{x\} x^{-s-1}\, dx$ is uniformly convergent, and hence analytic, in any half-plane $\sigma \ge \epsilon$, $\epsilon > 0$), the relation remains valid in the larger half-plane $\sigma > 0$, as asserted.
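Formula (5.7) also gives a practical way to approximate $\zeta(s)$: dropping the integral term leaves an explicit expression, and the integral is at most $|s| N^{-\sigma}/\sigma$ in absolute value (since the fractional part is at most 1). The following Python sketch is our own illustration of this, with hypothetical function names.

```python
import math


def zeta_approx(s, N):
    """Main terms of (5.7): sum over n <= N of n^(-s), minus N^(1-s)/(1-s).
    Valid for complex s != 1 with Re s > 0; the integral term is omitted."""
    s = complex(s)
    head = sum(n ** (-s) for n in range(1, N + 1))
    return head - N ** (1 - s) / (1 - s)


def remainder_bound(s, N):
    """Bound |s| N^(-sigma) / sigma for the omitted integral term (sigma = Re s > 0)."""
    s = complex(s)
    return abs(s) * N ** (-s.real) / s.real


if __name__ == "__main__":
    print(zeta_approx(2, 1000), math.pi ** 2 / 6, remainder_bound(2, 1000))
    print(zeta_approx(0.5 + 10j, 2000), remainder_bound(0.5 + 10j, 2000))
```

For s = 2 and N = 1000 the approximation agrees with $\pi^2/6$ to within the bound $10^{-6}$. For points in the critical strip the bound decays only like $N^{-\sigma}$, so this is a sanity check rather than an efficient algorithm.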
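Lemma 5.5.
(5.8)    $|\zeta(s)| \le \sum_{n=1}^N \frac{1}{n^\sigma} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma} N^{-\sigma} \quad (N \in \mathbb{N},\ \sigma > 0,\ t \ne 0).$
Proof. This follows immediately from the previous lemma, on noting that each of the three terms on the right-hand side of (5.7) is bounded, in absolute value, by the corresponding term on the right of (5.8). For the first two terms, this is obvious (note that $|1 - s| \ge |t|$), and for the third term this follows from the inequality
    $\left| s \int_N^\infty \{u\} u^{-s-1}\, du \right| \le |s| \int_N^\infty u^{-\sigma-1}\, du = \frac{|s| N^{-\sigma}}{\sigma}.$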
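Lemma 5.6.
(5.9)    $|\zeta(s)| \le \frac{N^{1-\sigma_0}}{1-\sigma_0} + \frac{N^{1-\sigma_0}}{|t|} + \left( 1 + \frac{|t|}{\sigma_0} \right) N^{-\sigma_0}$
         $(N \in \mathbb{N},\ 0 < \sigma_0 < 1,\ \sigma \ge \sigma_0,\ t \ne 0).$
Proof. We show that the three terms on the right of (5.8) are bounded by the corresponding terms in (5.9). Using the hypotheses $\sigma \ge \sigma_0$ and $\sigma_0 < 1$ and the inequality $n^{-\sigma_0} \le \int_{n-1}^n x^{-\sigma_0}\, dx$ (valid for $n \ge 2$), we see that the first term satisfies
    $\sum_{n=1}^N \frac{1}{n^\sigma} \le \sum_{n=1}^N \frac{1}{n^{\sigma_0}} \le 1 + \int_1^N x^{-\sigma_0}\, dx = 1 + \frac{N^{1-\sigma_0} - 1}{1 - \sigma_0} \le \frac{N^{1-\sigma_0}}{1 - \sigma_0},$
as desired. Since $\sigma \ge \sigma_0$, the second term is trivially bounded by the corresponding term in (5.9). The same holds for the third term, in view of the bound $|s|/\sigma \le (\sigma + |t|)/\sigma \le 1 + |t|/\sigma_0$. Hence (5.9) follows from (5.8).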
Proof of Theorem 5.3. (i) We apply Lemma 5.6 with $N = [|t|]$, where $[x]$ denotes the greatest integer $\le x$. Since, by hypothesis, $0 < \sigma_0 < 1$, we then have $N^{1-\sigma_0} \le |t|^{1-\sigma_0}$, so the lemma gives
(5.10)    $|\zeta(s)| \le \frac{|t|^{1-\sigma_0}}{1-\sigma_0} \left( 1 + \frac{1-\sigma_0}{|t|} + \frac{1-\sigma_0}{[|t|]} + \frac{(1-\sigma_0)|t|}{\sigma_0 [|t|]} \right).$
Since $|t| \ge 2$ we have $(1-\sigma_0)/|t| \le (1-\sigma_0)/[|t|] \le 1/2$. Moreover, the inequalities $[|t|] \ge |t|/2$ and $1/2 \le \sigma_0 < 1$ give $(1-\sigma_0)|t|/(\sigma_0 [|t|]) \le 2$. Hence the expression in parentheses on the right of (5.10) is at most $1 + 1/2 + 1/2 + 2 = 4$, and we obtain (i).
(ii) We set $\sigma_0 = 1 - 1/(4 \log |t|)$. Since $|t| \ge 2$, we have $4 \log |t| \ge \log 16 > 2$ and so $1/2 < \sigma_0 < 1$. Hence we can apply the estimate of part (i) with this value of $\sigma_0$ and obtain
    $|\zeta(s)| \le 4\, \frac{|t|^{1-\sigma_0}}{1-\sigma_0} = \frac{4 e^{1/4}}{1/(4 \log |t|)} = 16 e^{1/4} \log |t|,$
which is the desired bound with constant $A_1 = 16 e^{1/4}$.
(iii) First note that, for $\sigma \ge 2$, the Dirichlet series representation of $\zeta'(s)$ implies $|\zeta'(s)| \le \sum_{n=1}^\infty (\log n) n^{-2}$, so the asserted bound holds trivially in the half-plane $\sigma \ge 2$. Also, the analyticity of $\zeta(s)$ in the region $\{s : \operatorname{Re} s > 0,\ s \ne 1\}$ implies that
C : [s
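(5.11)    $|\zeta'(s)| = \left| \frac{1}{2\pi i} \oint_{|s'-s|=\delta} \frac{\zeta(s')}{(s'-s)^2}\, ds' \right| \le \frac{1}{\delta} \max_{|s'-s|=\delta} |\zeta(s')|.$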
To estimate the right-hand side of (5.11), we will show that for [s
s[
, (s
in this range fall into the range of validity of the upper bound for
(s
+ it
with [s
    $1 - \frac{1}{6 \log |t|} \ge 1 - \frac{1}{6 \log |t'|^{2/3}} = 1 - \frac{1}{4 \log |t'|}.$
Thus the point s
    $|\zeta(s')| \le A_1 \log |t'| \le (3/2) A_1 \log |t| \quad (|s' - s| = \delta).$
Substituting this estimate in (5.11), we obtain
    $|\zeta'(s)| \le \frac{1}{\delta} (3/2) A_1 \log |t| = 18 A_1 (\log |t|)^2,$
which is the desired estimate.
5.4 The Riemann zeta function, III: lower bounds
and zerofree region
The next result gives a zero-free region for $\zeta(s)$ to the left of the line $\sigma = 1$ of width a constant multiple of $(\log |t|)^{-9}$, and a lower bound for $\zeta(s)$ in this region. This result is the most important ingredient in the proof of the PNT; the value $\alpha = 1/10$ in the estimate (5.5) is directly related to the exponent 9 appearing in the definition of the region. De la Vallée Poussin's error term ($\alpha = 1/2$) is a consequence of a similar estimate, but in a wider region, with the exponent 1 instead of 9, and Vinogradov's value $\alpha = 3/5 - \epsilon$ corresponds to an exponent $2/3 + \epsilon$ in the zero-free region.
Theorem 5.7 (Zero-free region and upper bound for $1/\zeta(s)$).
(i) $\zeta(s)$ has no zeros in the closed half-plane $\sigma \ge 1$.
(ii) There exist constants $c_1 > 0$ and $A_3 > 0$ such that $\zeta(s)$ has no zeros in the region
    $\sigma > 1 - c_1, \quad |t| \le 2,$
and in this region satisfies
    $\left| \frac{1}{\zeta(s)} \right| \le A_3.$
(iii) There exist constants $c_2 > 0$ and $A_4 > 0$ such that $\zeta(s)$ has no zeros in the region
    $\sigma \ge 1 - \frac{c_2}{(\log |t|)^9}, \quad |t| \ge 2,$
and in this region satisfies
    $\left| \frac{1}{\zeta(s)} \right| \le A_4 (\log |t|)^7.$
The key ingredient in the proof is the following elementary inequality.
Lemma 5.8 (3-4-1 Lemma). For any real number $\theta$ we have
    $3 + 4 \cos\theta + \cos(2\theta) \ge 0.$
Proof. We have
    $0 \le (1 + \cos\theta)^2 = 1 + 2\cos\theta + \cos^2\theta$
    $= 1 + 2\cos\theta + (1/2)(1 + \cos(2\theta))$
    $= (1/2)(3 + 4\cos\theta + \cos(2\theta)).$
We use this lemma to deduce a lower bound for a certain product of
powers of the zeta function.
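Lemma 5.9 (3-4-1 inequality for $\zeta(s)$). We have
    $|\zeta(\sigma)|^3\, |\zeta(\sigma + it)|^4\, |\zeta(\sigma + 2it)| \ge 1 \quad (\sigma > 1,\ t \in \mathbb{R}).$
Proof. Note that for $\operatorname{Re} s > 1$,
    $\log |\zeta(s)| = \log \left| \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1} \right| = -\operatorname{Re} \sum_p \log \left( 1 - \frac{1}{p^s} \right)$
    $= \operatorname{Re} \sum_p \sum_{m \ge 1} \frac{1}{m p^{ms}} = \sum_p \sum_{m \ge 1} \frac{\cos(t \log p^m)}{m p^{m\sigma}},$
where, as usual, $\sigma = \operatorname{Re} s$ and $t = \operatorname{Im} s$, and $\log$ denotes the principal branch of the logarithm. Applying this relation with $\sigma$, $\sigma + it$, and $\sigma + 2it$ in place of $s$, we obtain
    $\log \left( |\zeta(\sigma)|^3\, |\zeta(\sigma + it)|^4\, |\zeta(\sigma + 2it)| \right) = \sum_p \sum_{m \ge 1} \frac{P(t \log p^m)}{m p^{m\sigma}},$
where
    $P(\theta) = 3 + 4\cos\theta + \cos(2\theta)$
is the trigonometric polynomial of Lemma 5.8. Since, by that lemma, $P(\theta)$ is nonnegative, all terms in the double series on the right are nonnegative, so the left-hand side is nonnegative as well. This implies the asserted inequality.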
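Both the trigonometric inequality and the resulting inequality for the zeta function can be checked numerically. The sketch below is our own illustration and not part of the notes; it approximates $\zeta(s)$ for $\operatorname{Re} s > 1$ by a partial sum of the Dirichlet series plus the tail term $N^{1-s}/(s-1)$, a standard correction whose error is at most $|s| N^{-\sigma}/\sigma$. The function names and the cutoff N are arbitrary choices.

```python
import math


def trig_341(theta):
    """P(theta) = 3 + 4 cos(theta) + cos(2 theta); equals 2 (1 + cos(theta))^2."""
    return 3.0 + 4.0 * math.cos(theta) + math.cos(2.0 * theta)


def zeta_sigma_gt_1(s, N=5000):
    """Approximate zeta(s) for Re s > 1: partial sum up to N plus N^(1-s)/(s-1)."""
    s = complex(s)
    return sum(n ** (-s) for n in range(1, N + 1)) + N ** (1 - s) / (s - 1)


def product_341(sigma, t):
    """The product |zeta(sigma)|^3 |zeta(sigma + it)|^4 |zeta(sigma + 2it)|."""
    z0 = abs(zeta_sigma_gt_1(complex(sigma, 0.0)))
    z1 = abs(zeta_sigma_gt_1(complex(sigma, t)))
    z2 = abs(zeta_sigma_gt_1(complex(sigma, 2.0 * t)))
    return z0 ** 3 * z1 ** 4 * z2


if __name__ == "__main__":
    print(min(trig_341(0.001 * k) for k in range(7000)))
    for t in (1.0, 2.0):
        print(t, product_341(2.0, t))
```

The minimum of P over the grid is 0 up to rounding, and the product is comfortably above 1 at the sample points. Near the ordinates of zeros of $\zeta(s)$ the product comes much closer to 1, which reflects how delicate the inequality is.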
Proof of Theorem 5.7. (i) By Theorem 5.2 $\zeta(s)$ has no zeros in the open half-plane $\sigma > 1$, so it remains to exclude the possibility of a zero on the line $\sigma = 1$. We argue by contradiction and suppose that $\zeta(1 + it_0) = 0$ for some real number $t_0$. Recall that, by Theorem 5.2, $\zeta(s)$ is analytic in the half-plane $\sigma > 0$, except for a simple pole at $s = 1$. Since $\zeta$ has a pole at $s = 1$, it cannot have a zero there, so we necessarily have $t_0 \ne 0$.
With a view towards applying Lemma 5.9, we consider the behavior of the three functions $\zeta(\sigma)$, $\zeta(\sigma + it_0)$, and $\zeta(\sigma + 2it_0)$ as $\sigma \to 1+$. Since $\zeta(s)$ has a simple pole at 1, $\zeta(\sigma)(\sigma - 1)$ is bounded as $\sigma \to 1+$. Furthermore, our assumption that $\zeta(1 + it_0) = 0$ implies, by the analyticity of $\zeta(s)$, that the expression
    $\frac{\zeta(\sigma + it_0)}{\sigma - 1} = \frac{\zeta(\sigma + it_0) - \zeta(1 + it_0)}{(\sigma + it_0) - (1 + it_0)}$
also stays bounded as $\sigma \to 1+$. Finally, the analyticity of $\zeta(s)$ at $1 + 2it_0$ implies that $\zeta(\sigma + 2it_0)$ converges to $\zeta(1 + 2it_0)$ as $\sigma \to 1+$, and, in particular, stays bounded. It follows that the function
    $|\zeta(\sigma)|^3\, |\zeta(\sigma + it_0)|^4\, |\zeta(\sigma + 2it_0)| = (\sigma - 1)\, |\zeta(\sigma)(\sigma - 1)|^3 \left| \frac{\zeta(\sigma + it_0)}{\sigma - 1} \right|^4 |\zeta(\sigma + 2it_0)|$
is of order $O(\sigma - 1)$ as $\sigma \to 1+$ and hence tends to 0. On the other hand, by Lemma 5.9, this function is bounded from below by 1, so we have arrived at a contradiction. Hence $\zeta(s)$ cannot have a zero on the line $\sigma = 1$.
(ii) First note that in the half-plane $\sigma \ge 2$ the asserted bound holds trivially: indeed, in this half-plane we have
    $|\zeta(s)| \ge 1 - \sum_{n \ge 2} \frac{1}{n^2} = 1 - \left( \frac{\pi^2}{6} - 1 \right) > 0$
and thus
(5.12)    $\left| \frac{1}{\zeta(s)} \right| \le (2 - \pi^2/6)^{-1} \quad (\sigma \ge 2).$
It remains therefore to show that $1/\zeta(s)$ is uniformly bounded in the compact rectangle
(5.13)    $1 - c_1 \le \sigma \le 2, \quad |t| \le 2,$
with a sufficiently small positive constant $c_1$.
Now, by part (i), $\zeta(s)$ has no zeros in the closed half-plane $\sigma \ge 1$, so $1/\zeta(s)$ is analytic in this half-plane and therefore bounded in any compact region contained in this half-plane. In particular, $1/\zeta(s)$ is bounded in the rectangle $1 \le \sigma \le 2$, $|t| \le 2$. By compactness, it follows that $1/\zeta(s)$ remains bounded in any sufficiently small neighborhood of this rectangle, and, in particular, in a rectangle of the form (5.13).
(iii) For $\sigma \ge 2$ the bound follows from (5.12), so we may restrict to the case when $\sigma \le 2$. To obtain the desired bound for $1/\zeta(s)$ we will use again Lemma 5.9, in conjunction with the upper bounds for $\zeta(s)$ and $\zeta'(s)$ established in Theorem 5.3.
We fix a constant $A$ that will be chosen later and let $t$ be given with $|t| \ge 2$. We consider first the range
(5.14)    $1 + A (\log |t|)^{-9} \le \sigma \le 2.$
By Lemma 5.9 we have, for $\sigma > 1$,
(5.15)    $|\zeta(\sigma + it)| \ge \frac{1}{|\zeta(\sigma)|^{3/4}\, |\zeta(\sigma + 2it)|^{1/4}}.$
Since $\zeta(s)$ has a simple pole at $s = 1$, there exists an absolute constant $c_3$ such that
    $|\zeta(\sigma)| \le c_3 (\sigma - 1)^{-1}$
for $1 < \sigma \le 2$. Moreover, by Theorem 5.3(ii) we have
    $|\zeta(\sigma + 2it)| \le A_1 \log |2t| \le 2 A_1 \log |t|,$
where in the last step we have used the trivial inequality $\log(2|t|) \le \log |t|^2 = 2 \log |t|$, which is valid since $|t| \ge 2$. Inserting these bounds into (5.15) and now restricting to the narrower range (5.14), we obtain
(5.16)    $|\zeta(\sigma + it)| \ge c_3^{-3/4} (2A_1)^{-1/4} (\sigma - 1)^{3/4} (\log |t|)^{-1/4} \ge c_4 A^{3/4} (\log |t|)^{-7},$
where $c_4 = c_3^{-3/4} (2A_1)^{-1/4}$ is an absolute constant.
This proves the asserted bound in the range (5.14), for any choice of the constant $A$. To complete the proof, we show that, if $A$ is chosen sufficiently small, then a bound of the same type holds in the range
(5.17)    $1 - A (\log |t|)^{-9} \le \sigma \le 1 + A (\log |t|)^{-9}.$
Write
    $\sigma_1 = \sigma_1(A, t) = 1 - A (\log |t|)^{-9}, \qquad \sigma_2 = \sigma_2(A, t) = 1 + A (\log |t|)^{-9},$
and note that for $\sigma_1 \le \sigma \le \sigma_2$ we have
    $|\zeta(\sigma + it)| = \left| \zeta(\sigma_2 + it) - \int_\sigma^{\sigma_2} \zeta'(u + it)\, du \right|$
    $\ge |\zeta(\sigma_2 + it)| - (\sigma_2 - \sigma_1) \max_{\sigma_1 \le u \le \sigma_2} |\zeta'(u + it)|.$
Since $\sigma_2 + it$ falls in the range (5.14), the first term on the right can be estimated by (5.16). Moreover, by Theorem 5.3, we have $|\zeta'(s)| \le A_2 (\log |t|)^2$ provided $\sigma$ satisfies $\sigma \ge 1 - (12 \log |t|)^{-1}$. If the constant $A$ is sufficiently small, the range (5.17) is contained in the latter range, and so the above bound for
    $\left| \frac{\zeta'(s)}{\zeta(s)} \right| \le A_5 (\log |t|)^9,$
and for all $s$ in the range
(5.19)    $\sigma \ge 1 - c_6, \quad |t| \le 2, \quad s \ne 1,$
we have
    $\left| \frac{\zeta'(s)}{\zeta(s)} \right| \le A_5 \max\left( 1, \frac{1}{|\sigma - 1|} \right).$
Proof. The first estimate follows by combining the bounds (iii) of Theorems 5.7 and 5.3, and noting that the ranges of validity of these latter estimates, namely $\sigma \ge 1 - c_2 (\log |t|)^{-9}$ and $\sigma \ge 1 - 1/(12 \log |t|)$, both contain the range (5.18), provided $|t| \ge 2$ and the constant $c_6$ is chosen sufficiently small.
The second estimate is a consequence of the analytic properties of $\zeta(s)$: Since $1/\zeta(s)$ is analytic in $\sigma \ge 1$ and $\zeta(s)$ is analytic in $\sigma > 0$ except for a simple pole at $s = 1$, the logarithmic derivative $\zeta'(s)/\zeta(s)$ is analytic in $\sigma \ge 1$, except for a simple pole at $s = 1$. Hence $(s - 1)\zeta'(s)/\zeta(s)$ is analytic in $\sigma \ge 1$. By compactness the analyticity extends to a region of the form $\sigma \ge 1 - c_6$, $|t| \le 2$, provided $c_6$ is a sufficiently small constant. It follows that this function is bounded in the compact region $1 - c_6 \le \sigma \le 2$, $|t| \le 2$, so that we have $|\zeta'(s)/\zeta(s)| \le A_5/|s - 1| \le A_5/|\sigma - 1|$ in this region. Since, for $\sigma \ge 2$, $-\zeta'(s)/\zeta(s) = \sum_{n \ge 1} \Lambda(n) n^{-s}$ is trivially bounded by $\sum_{n \ge 1} \Lambda(n) n^{-2} < \infty$, we obtain the second estimate of the theorem, by adjusting the constant $A_5$ if necessary.
5.5 Proof of the Prime Number Theorem
We are now ready to prove the prime number theorem in the form given by
Theorem 5.1. We break the proof into several steps:
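Application of Perron's formula. We let
    $\psi_1(x) = \int_0^x \psi(y)\, dy = \sum_{n \le x} \Lambda(n)(x - n),$
and apply Perron's formula in the version given by Theorem 4.16 with $f(n) = \Lambda(n)$. The corresponding Dirichlet series is $F(s) = \sum_{n=1}^\infty f(n) n^{-s} = -\zeta'(s)/\zeta(s)$, so for any $a > 1$ the formula gives
(5.20)    $\psi_1(x) = \frac{1}{2\pi i} \int_{a-i\infty}^{a+i\infty} \left( -\frac{\zeta'(s)}{\zeta(s)} \right) \frac{x^{s+1}}{s(s+1)}\, ds.$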
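Before estimating the integral, it may help to see the function $\psi_1(x)$ concretely. The following Python sketch is our own illustration with hypothetical function names: it tabulates $\Lambda(n)$, evaluates $\psi_1(x)$ both as the sum over $\Lambda(n)(x - n)$ and as the exact integral of the step function $\psi(y)$, and compares the result with $x^2/2$, the size that the residue at $s = 1$ will later contribute.

```python
import math


def von_mangoldt_table(n_max):
    """Lambda(n) for 0 <= n <= n_max: log p if n is a power of the prime p, else 0."""
    lam = [0.0] * (n_max + 1)
    is_composite = bytearray(n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        for multiple in range(2 * p, n_max + 1, p):
            is_composite[multiple] = 1
        q = p
        while q <= n_max:
            lam[q] = math.log(p)
            q *= p
    return lam


def psi1_by_sum(x):
    """psi_1(x) as the sum of Lambda(n) (x - n) over n <= x."""
    lam = von_mangoldt_table(int(x))
    return sum(lam[n] * (x - n) for n in range(1, int(x) + 1))


def psi1_by_integral(x):
    """psi_1(x) as the exact integral of the step function psi(y) over [0, x]."""
    lam = von_mangoldt_table(int(x))
    total, psi = 0.0, 0.0
    for n in range(1, int(x) + 1):
        psi += lam[n]  # psi(y) equals this value for n <= y < n + 1
        total += psi * (min(n + 1, x) - n)
    return total


if __name__ == "__main__":
    x = 1000.5
    print(psi1_by_sum(x), psi1_by_integral(x), x * x / 2)
```

The two evaluations agree up to rounding, and already for x around 1000 the value is within about one percent of $x^2/2$.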
We fix $x \ge e$, let $e \le T \le x$ be a parameter that will be chosen later (as a function of $x$), and we set
    $a = 1 + \frac{1}{\log x}, \qquad b = 1 - \frac{c_6}{(\log T)^9},$
where $c_6$ is the constant of Theorem 5.10. Note that, since $c_6 < 1/2$ and $e \le T \le x$, we have
(5.21)    $1 < a \le 2, \qquad \frac{1}{2} < 1 - c_6 \le b < 1.$
Shifting the path of integration. The path of integration in (5.20) is a vertical line located within the half-plane $\sigma > 1$. We move the portion $|t| \le T$ of this path to the left of the line $\sigma = 1$, replacing it by a rectangular path joining the points $b \pm iT$ and $a \pm iT$. Thus, the new path of integration is of the form $L = \bigcup_{i=1}^5 L_i$, with
    $L_1 = (a - i\infty, a - iT],$
    $L_2 = [a - iT, b - iT],$
    $L_3 = [b - iT, b + iT],$
    $L_4 = [b + iT, a + iT],$
    $L_5 = [a + iT, a + i\infty).$
With this change of the path of integration we have
(5.22)    $\psi_1(x) = M + \frac{1}{2\pi i} \sum_{j=1}^5 I_j,$
where $M$, the main term, is the contribution of the residues at singularities of the integrand in the region enclosed by the two paths and $I_j$ denotes the integral over the path $L_j$.
The main term. The region enclosed by the original and the modified paths of integration is the rectangle with vertices
    $a \pm iT = 1 + \frac{1}{\log x} \pm iT, \qquad b \pm iT = 1 - \frac{c_6}{(\log T)^9} \pm iT,$
which falls within the zero-free region of $\zeta(s)$ given by Theorem 5.7. Thus, the integrand function has only one singularity in this region, namely that generated by the pole of $\zeta(s)$ at $s = 1$. Since this pole is simple, $-\zeta'(s)/\zeta(s)$ has a simple pole with residue 1 at $s = 1$, and it follows that
    $\operatorname{Res}\left( -\frac{\zeta'(s)}{\zeta(s)} \frac{x^{s+1}}{s(s+1)},\ s = 1 \right) = \frac{1}{2} x^2.$
Hence we have
(5.23)    $M = \frac{1}{2} x^2.$
This will be the main term of our estimate for $\psi_1(x)$. It remains to estimate the contribution of the integrals $I_j$. Here and in the remainder of this section, the constants implied in the $\ll$ notation are absolute and, in particular, independent of the value of $T$ (which will only be chosen at the end of the proof).
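Estimates of $I_1$ and $I_5$. These are the integrals along the vertical segments $(a - i\infty, a - iT]$ and $[a + iT, a + i\infty)$. On these segments we have $\sigma = a = 1 + 1/\log x$ and $|t| \ge T$. Thus,
    $\left| \frac{\zeta'(s)}{\zeta(s)} \right| \le \sum_{n=1}^\infty \frac{\Lambda(n)}{n^a} = -\frac{\zeta'(a)}{\zeta(a)} \ll \frac{1}{a - 1} = \log x,$
and
    $\left| \frac{x^{s+1}}{s(s+1)} \right| = \frac{x^{a+1}}{|s||s+1|} \le \frac{x^{a+1}}{t^2} = \frac{e x^2}{t^2}.$
Hence we obtain the bounds
(5.24)    $I_{1,5} \ll \int_T^\infty (\log x) \frac{x^2}{t^2}\, dt \ll \frac{x^2 \log x}{T}.$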
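Estimates of $I_2$ and $I_4$. These are the integrals along the horizontal
segments $[a - iT, b - iT]$ and $[b + iT, a + iT]$. By Theorem 5.10 we have on
these paths
$$\left| \frac{\zeta'(s)}{\zeta(s)} \right| \ll (\log T)^9$$
and
$$\left| \frac{x^{s+1}}{s(s+1)} \right| \le \frac{x^{a+1}}{|s||s+1|} \ll \frac{x^2}{T^2}.$$
Hence
$$I_{2,4} \ll \int_{b}^{a} (\log T)^9 \frac{x^2}{T^2} \, d\sigma \ll \frac{x^2 (\log T)^9}{T^2}. \tag{5.25}$$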
Estimate of $I_3$. The remaining integral $I_3$ is the integral over the vertical
segment $[b - iT, b + iT]$. By Theorem 5.10 and our choice $b = 1 - c_6 (\log T)^{-9}$,
we have on this segment
$$\left| \frac{\zeta'(s)}{\zeta(s)} \right| \ll \max\left\{ (\log T)^9,\ (1-b)^{-1} \right\} \ll (\log T)^9.$$
Since
$$\left| \frac{x^{s+1}}{s(s+1)} \right| = \frac{x^{b+1}}{|s||s+1|} \ll x^{b+1} \min(1, t^{-2}),$$
we obtain the bound
$$I_3 \ll \int_{-T}^{T} x^{b+1} (\log T)^9 \min(1, t^{-2}) \, dt \ll x^{b+1} (\log T)^9. \tag{5.26}$$
Estimation of $\psi_1(x)$. Substituting the estimates (5.23)-(5.26) into (5.22),
we obtain
$$\psi_1(x) = \frac{1}{2} x^2 + R(x, T)$$
with
$$\begin{aligned}
R(x, T) &\ll \sum_{j=1}^{5} |I_j| \ll x^2 \left( \frac{\log x}{T} + \frac{(\log T)^9}{T^2} + x^{b-1} (\log T)^9 \right) \\
&\ll x^2 \left( \frac{\log x}{T} + (\log T)^9 \exp\left\{ -c_6 \frac{\log x}{(\log T)^9} \right\} \right),
\end{aligned}$$
where in the last step we used the assumption $T \le x$, which implies that
the term $(\log T)^9 T^{-2}$ is of smaller order than the term $(\log x) T^{-1}$ and hence
can be dropped. We now choose $T$ as
$$T = \exp\left\{ (\log x)^{1/10} \right\}.$$
Since $x \ge e$, this choice satisfies our initial requirement on $T$, namely
$e \le T \le x$, and we obtain
$$\begin{aligned}
R(x, T) &\ll x^2 \left( (\log x) \exp\left\{ -(\log x)^{1/10} \right\} + (\log x)^{9/10} \exp\left\{ -c_6 (\log x)^{1/10} \right\} \right) \\
&\ll x^2 \exp\left\{ -c_7 (\log x)^{1/10} \right\},
\end{aligned}$$
with a suitable positive constant $c_7$. (In fact, any constant less than $c_6$ will
do.) Hence we have
$$\psi_1(x) = \frac{1}{2} x^2 + O\left( x^2 \exp\left\{ -c_7 (\log x)^{1/10} \right\} \right) \qquad (x \ge e). \tag{5.27}$$
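The main term in (5.27) can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes the identity $\psi_1(x) = \sum_{n \le x} \Lambda(n)(x - n)$, which follows from $\psi_1(x) = \int_0^x \psi(y)\,dy$, and the helper names `von_mangoldt_table` and `psi1` are introduced here for illustration only.

```python
import math


def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_composite[p]:
            # p is prime: mark its multiples and set Lambda(p^k) = log p
            for m in range(p * p, limit + 1, p):
                is_composite[m] = True
            pk = p
            while pk <= limit:
                lam[pk] = math.log(p)
                pk *= p
    return lam


def psi1(x, lam):
    """psi_1(x) = integral of psi over [0, x] = sum_{n <= x} Lambda(n) (x - n)."""
    return sum(lam[n] * (x - n) for n in range(2, int(x) + 1))


LAM = von_mangoldt_table(10**5)
for x in (10**3, 10**4, 10**5):
    # ratio of psi_1(x) to the main term x^2 / 2; should approach 1
    print(x, psi1(x, LAM) / (x * x / 2))
```

The printed ratios only illustrate the asymptotic relation $\psi_1(x) \sim x^2/2$; they say nothing about the size of the constant $c_7$.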
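Transition to $\psi(x)$. As the final step in the proof of the prime number
theorem, we need to derive an estimate for $\psi(x)$ from the above estimate
for $\psi_1(x)$. Recall that the two functions are related by
$$\psi_1(x) = \int_0^x \psi(y) \, dy.$$
While from an estimate for a function one can easily derive a corresponding
estimate for the integral of this function, a similar derivation in the other
direction is in general not possible. However, in this case we are able to do so
by exploiting the fact that the function $\psi(x) = \sum_{n \le x} \Lambda(n)$ is nondecreasing.
We fix $x \ge 6$ and a number $0 < \delta < 1/2$ (to be chosen later as a suitable
function of $x$) and note that, by the monotonicity of $\psi(x)$, we have
$$\psi_1(x) - \psi_1(x(1-\delta)) = \int_{x(1-\delta)}^{x} \psi(y) \, dy \le \delta x \psi(x).$$
Since $x \ge x(1-\delta) \ge x/2 \ge 3 \ge e$ by our assumptions $x \ge 6$ and $\delta < 1/2$,
we can apply (5.27) to each of the two terms on the left and obtain
$$\begin{aligned}
\delta x \psi(x) &\ge \frac{1}{2} x^2 + O(x^2 \epsilon) - \frac{1}{2} x^2 (1-\delta)^2 + O(x^2 \epsilon') \\
&= \delta x^2 + O\left( x^2 (\delta^2 + \epsilon + \epsilon') \right),
\end{aligned} \tag{5.28}$$
where
$$\epsilon = \exp\left\{ -c_7 (\log x)^{1/10} \right\}, \qquad \epsilon' = \exp\left\{ -c_7 (\log x(1-\delta))^{1/10} \right\}$$
denote the relative error terms in (5.27), applied to $x$ and $x' = x(1-\delta)$,
respectively. Since $x(1-\delta) \ge x/2 \ge \sqrt{x}$, we have
$$(\log x(1-\delta))^{1/10} \ge (\log \sqrt{x})^{1/10} \ge (1/2)(\log x)^{1/10},$$
and hence $\epsilon \le \epsilon' \le \epsilon^{1/2} \le 1$. With this inequality, (5.28) yields
$$\psi(x) \ge x + O\left( x (\delta + \epsilon^{1/2}/\delta) \right).$$
Defining now $\delta$ by
$$\delta = \min(1/2, \epsilon^{1/4}),$$
we obtain
$$\psi(x) \ge x + O(x \epsilon^{1/4}) = x + O\left( x \exp\left\{ -c_8 (\log x)^{1/10} \right\} \right) \tag{5.29}$$
with $c_8 = c_7/4$.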
A similar, but slightly simpler, argument starting from the inequality
$$\psi_1(x(1+\delta)) - \psi_1(x) \ge \delta x \psi(x)$$
shows that (5.29) also holds with the inequality sign reversed. Thus we have
obtained the estimate
$$\psi(x) = x + O\left( x \exp\left\{ -c_8 (\log x)^{1/10} \right\} \right)$$
in the range $x \ge 6$. Since the same estimate holds trivially for $2 \le x \le 6$,
this completes the proof of the prime number theorem in the form (5.5)
stated at the beginning of this section.
5.6 Consequences and remarks
Consequences of the PNT with error term. Using partial summa-
tion one can easily derive from the prime number theorem with the error
term established here estimates for other prime number sums with compa-
rable error terms. We collect the most important of these estimates in the
following theorem.
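Theorem 5.11 (Consequences of the PNT). For $x \ge 2$ we have
$$\begin{aligned}
\theta(x) &= x + O(xR(x)), && \text{(i)} \\
\pi(x) &= \operatorname{Li}(x) + O(xR(x)), && \text{(ii)} \\
\sum_{p \le x} \frac{\log p}{p} &= \log x + C_1 + O(R(x)), && \text{(iii)} \\
\sum_{p \le x} \frac{1}{p} &= \log\log x + C_2 + O(R(x)), && \text{(iv)} \\
\prod_{p \le x} \left( 1 - \frac{1}{p} \right) &= \frac{e^{-\gamma}}{\log x} \left( 1 + O(R(x)) \right), && \text{(v)}
\end{aligned}$$
where $\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t}$, the $C_i$ are absolute constants, and $R(x)$ is an error
term of the same type as in Theorem 5.1, except possibly for the value of the
constant in the exponent, i.e., $R(x) = \exp(-c(\log x)^{1/10})$, with $c$ a positive
constant (not necessarily the same as in Theorem 5.1).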
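Proof. Estimate (i) follows from the $\psi(x)$ version of the PNT (i.e., Theorem
5.1) and the estimate
$$0 \le \psi(x) - \theta(x) = \sum_{p \le \sqrt{x}} \ \sum_{m=2}^{[(\log x)/(\log p)]} \log p \le \sum_{p \le \sqrt{x}} \log p \, \frac{\log x}{\log p} = \pi(\sqrt{x}) \log x \ll \sqrt{x} \ll xR(x).$$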
Estimates (ii)-(iv) can be deduced from (i) by a routine application of
partial summation. We omit the details, and only note that the process
typically results in a small loss in the constant in the exponent in $R(x)$.
This is because one has to apply the PNT with values $y \le x$ in place of $x$
and use estimates such as
$$\exp\left\{ -c(\log y)^{\alpha} \right\} \le \exp\left\{ -c\,2^{-\alpha} (\log x)^{\alpha} \right\} \qquad (\sqrt{x} \le y \le x),$$
to bound error terms at $y$ in terms of error terms at $x$.
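The asymptotic relations in Theorem 5.11 are easy to observe numerically. The sketch below is only an illustration under stated assumptions: the function names `primes_up_to` and `summary` are hypothetical, the numerical value of Euler's constant $\gamma$ is hard-coded, and a computation at a single $x$ of course gives no information about the error term $R(x)$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # numerical value of Euler's constant


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # clear the multiples p*p, p*p + p, ... of p
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def summary(x):
    """Compare theta(x) with x and the Mertens product with exp(-gamma)/log x."""
    ps = primes_up_to(x)
    theta = sum(math.log(p) for p in ps)
    mertens_product = math.prod(1 - 1 / p for p in ps)
    return {
        "theta(x)/x": theta / x,
        "log(x) * prod(1 - 1/p)": math.log(x) * mertens_product,
        "exp(-gamma)": math.exp(-EULER_GAMMA),
    }


for name, value in summary(10**5).items():
    print(name, value)
```

By (i) the first quantity should be close to 1, and by (v) the second should be close to the third.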
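The estimate (v) is a sharper version of Mertens' formula. Except for
the value of the constant, this estimate follows from (iv), on noting that
$$-\sum_{p \le x} \log\left( 1 - \frac{1}{p} \right) - \sum_{p \le x} \frac{1}{p} = \sum_{p \le x} \sum_{m=2}^{\infty} \frac{1}{m p^m} = \sum_{p} \sum_{m=2}^{\infty} \frac{1}{m p^m} + O\left( \sum_{p > x} \frac{1}{p^2} \right) = C + O\left( \frac{1}{x} \right),$$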
where C is a constant. That the constant on the right of (v) must be equal
to $e^{-\gamma}$ [...]
$$\sum_{i=1}^{k} \frac{(i-1)!\, x}{(\log x)^{i}} + O_k\left( \frac{x}{(\log x)^{k+1}} \right) \qquad (x \ge 2).$$
Estimates for the Moebius function. As we have seen in Chapter 3,
the PNT in its asymptotic form $\psi(x) \sim x$ is equivalent to the asymptotic
relation $M(\mu, x) = \sum_{n \le x} \mu(n) = o(x)$. It is reasonable to expect that a
sharper form of the PNT would translate to a corresponding sharpening of
the estimate for $M(\mu, x)$. This is indeed the case, and we have:
Theorem 5.13 (Moebius sum estimate). For $x \ge 2$, we have
$$\sum_{n \le x} \mu(n) \ll xR(x),$$
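where $R(x)$ is defined as in Theorem 5.11.
This result can be proved by essentially repeating the proof of the last
section, with the function $-\zeta'(s)/\zeta(s)$ replaced by $1/\zeta(s)$, the Dirichlet
series of $\mu$. The function $1/\zeta(s)$, in contrast to $\zeta'(s)/\zeta(s)$, is analytic
at $s = 1$, so no main term appears when estimating the corresponding Perron
integral. We omit the details.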
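The estimate of Theorem 5.13 can be illustrated by tabulating $M(\mu, x) = \sum_{n \le x} \mu(n)$ directly. This is a minimal sketch using a sieve; the function names `mobius_table` and `mertens` are chosen here for illustration and do not appear in the notes.

```python
def mobius_table(limit):
    """Return a list mu with mu[n] equal to the Moebius function at n."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one sign change per distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0  # n divisible by p^2 is not squarefree
    return mu


def mertens(x, mu):
    """M(mu, x) = sum of mu(n) over n <= x."""
    return sum(mu[1 : x + 1])


MU = mobius_table(10**5)
for x in (10, 10**2, 10**3, 10**4, 10**5):
    # the ratio M(mu, x) / x should tend to 0
    print(x, mertens(x, MU), mertens(x, MU) / x)
```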
Zero-free regions of the zeta function and the error term in the
prime number theorem. The proof of the prime number theorem given
here depended crucially on the existence of a zero-free region for the zeta
function and bounds for $\zeta(s)$ and $1/\zeta(s)$ within this region. It is easy to see
that a larger zero-free region, along with corresponding zeta bounds in this
region, would lead to a better error term. It turns out that, in some sense,
the converse also holds: a smaller error term in the prime number theorem
implies the existence of a larger zero-free region for the zeta function. Indeed,
there are results that go in both directions and give equivalences between
zero-free regions and error terms. We state, without proof, one simple result
of this type and derive several consequences from it.
Theorem 5.14. Let $0 < \theta < 1$. Then the following are equivalent:
(i) The Riemann zeta function has no zeros in the half-plane $\sigma > \theta$.
(ii) The prime number theorem holds in the form
$$\psi(x) = x + O_{\epsilon}(x^{\theta+\epsilon}) \qquad (x \ge 2)$$
for every fixed $\epsilon > 0$.
The Riemann Hypothesis is the statement that $\zeta(s)$ has no zeros in the
half-plane $\sigma > 1/2$. Taking $\theta = 1/2$ in the above result, we therefore obtain
the following equivalence to the Riemann Hypothesis.
Corollary 5.15. The Riemann Hypothesis holds if and only if, for every
$\epsilon > 0$,
$$\psi(x) = x + O_{\epsilon}(x^{1/2+\epsilon}) \qquad (x \ge 2).$$
Since it is known that the Riemann zeta function has infinitely many
zeros on the line $\sigma = 1/2$, condition (i) in Theorem 5.14 cannot hold with
$\theta < 1/2$. By the equivalence of (i) and (ii) it follows that the same is true
for condition (ii); that is, we have:
Corollary 5.16. Given any $\epsilon > 0$, the estimate
$$\psi(x) = x + O(x^{1/2-\epsilon}) \qquad (x \ge 2)$$
does not hold.
5.7 Further results
The results on the Riemann zeta function and the PNT that we have proved
in this chapter represent only a small part of what is known in this connection.
The Riemann zeta function is one of the most thoroughly studied
special functions in mathematics, and entire books have been devoted to
this function. Even though the most famous problem about this function,
the Riemann Hypothesis, remains open, there exists a well-developed theory
of the Riemann zeta function, and its connection to the PNT. In this
section, we present, without proof, some of the major known results, as well
as some of the main conjectures in this area.
The functional equation and analytic continuation. A fundamental
property of the zeta function that is key to any deeper study of this function
is the functional equation it satisfies:
$$\zeta(1-s) = 2(2\pi)^{-s} \Gamma(s) \cos(\pi s/2) \zeta(s). \tag{5.31}$$
Here $\Gamma(s)$ is the so-called Gamma function, a meromorphic function that
interpolates factorials in the sense that $\Gamma(n) = (n-1)!$ when $n$ is a positive
integer. The Gamma function is analytic in the half-plane $\sigma > 0$ and there
has the integral representation $\Gamma(s) = \int_0^{\infty} e^{-x} x^{s-1} \, dx$.
The functional equation relates values of $\zeta(s)$ to values of $\zeta(1-s)$. The
vertical line $\sigma = 1/2$ acts as an axis of symmetry for this functional equation,
in that if $s$ lies in the half-plane to the right of this line, then $1-s$ falls in
the half-plane to the left of this line.
By Theorem 4.11, $\zeta(s)$ has an analytic continuation to the half-plane
$\sigma > 0$. Similar arguments could be used to obtain a continuation to $\sigma > -1$,
and, by induction, to $\sigma > -n$, for any positive integer $n$. However, the
functional equation (5.31) provides an analytic continuation to the entire
complex plane in a single step. To see this, note that the function on the right
of the equation is analytic in $\sigma > 0$ (the pole of $\zeta(s)$ at $s = 1$ is cancelled
out by a zero of $\cos(\pi s/2)$ at the same point). Hence the same must be true
for the function on the left. But this means that $\zeta(1-s)$ has an analytic
continuation to the half-plane $\sigma > 0$, or, equivalently, that $\zeta(s)$ has an
analytic continuation to the half-plane $\sigma < 1$.
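As a quick sanity check of (5.31), one can evaluate both sides at $s = 2$. The computation below assumes the classical value $\zeta(2) = \pi^2/6$, which is not derived in these notes; under that assumption it gives the value of the continued zeta function at $s = -1$.

```latex
\zeta(-1) = \zeta(1 - 2)
          = 2 (2\pi)^{-2} \, \Gamma(2) \, \cos(\pi) \, \zeta(2)
          = 2 \cdot \frac{1}{4\pi^2} \cdot 1 \cdot (-1) \cdot \frac{\pi^2}{6}
          = -\frac{1}{12}.
```

In the same way, taking $s = 2n + 1$ with $n$ a positive integer makes the factor $\cos(\pi s/2)$ vanish while the other factors on the right stay finite, so that $\zeta(-2n) = 0$.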
Analytic properties of the Riemann zeta function. The key analytic
properties (both known and conjectured) of the Riemann zeta function
(henceforth considered as a meromorphic function on the entire complex
plane), are the following:
Poles: $\zeta(s)$ has a single pole at $s = 1$, with residue 1, and is analytic
elsewhere.
Trivial zeros: $\zeta(s)$ has simple zeros at the points $s = -2n$, $n =
1, 2, \dots$. These are called trivial zeros, as they are completely understood
and have no bearing on the distribution of primes.
Nontrivial zeros: All other zeros of $\zeta(s)$ are located in the critical
strip $0 < \sigma < 1$. These zeros, commonly denoted by $\rho = \beta + i\gamma$, are
closely related to the error term in the prime number theorem. They
are symmetric with respect to both the critical line $\sigma = 1/2$, and
the real axis. It is known that there are infinitely many nontrivial
zeros, and good estimates are available for the number of such zeros
up to a given height $T$, but their horizontal distribution within the
strip $0 < \sigma < 1$ remains largely a mystery.
The Riemann Hypothesis: The Riemann Hypothesis (RH) is the
assertion that all nontrivial zeros lie exactly on the critical line $\sigma =
1/2$. This has been numerically verified for the first several billion
nontrivial zeros (when ordered by increasing imaginary part). The
closest theoretical approximation to the Riemann Hypothesis are zero-
free regions of the form > 1 c(log [t[)
[t[
n=1
n
s
is only valid in > 1 (in fact, the series does not even
converge when $\sigma \le 1$), an approximate version of this representation remains
valid to the left of the line $\sigma = 1$. Here is a typical result of this nature:
Theorem 5.17 (Approximate formula for $\zeta(s)$). For $1/2 \le \sigma \le 2$ and
$|t| \ge 2$ we have
$$\zeta(s) = \sum_{n \le |t|} \frac{1}{n^s} + O\left( |t|^{-\sigma} \right).$$
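This estimate can be deduced from the identity (5.7). In fact, applying
this identity with $N = [|t| + 1]$ gives, under the conditions $1/2 \le \sigma \le 2$ and
$|t| \ge 2$ and assuming (without loss of generality) that $|t|$ is not an integer,
$$\left| \zeta(s) - \sum_{n \le |t|} \frac{1}{n^s} \right| \le \frac{[|t| + 1]^{1-\sigma}}{|1 - s|} + |s| \left| \int_{[|t|+1]}^{\infty} \{u\} u^{-s-1} \, du \right|.$$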
The first term on the right is of the desired order $|t|^{-\sigma}$ [...] $|t|^{1-\sigma}$.
This is too weak, but a more careful estimate
of the integral, using integration by parts and the estimate $\int_0^x \{u\} \, du =
(1/2)x + O(1)$, shows that this term is also of order $|t|^{-\sigma}$.
Explicit formulae. The proof of the prime number theorem given in the
last section clearly showed the significance of the zeros of the zeta function
in obtaining sharp versions of the prime number theorem. However, the
effect of possible zeros in this proof was rather indirect: A zero of the zeta
function leads to a singularity in the Dirichlet
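[...]
$$\psi_1(x) = \frac{x^2}{2} - \sum_{\rho} \frac{x^{\rho+1}}{\rho(\rho+1)} - \frac{\zeta'(0)}{\zeta(0)} x + \frac{\zeta'(-1)}{\zeta(-1)} - \sum_{n=1}^{\infty} \frac{x^{1-2n}}{(2n)(2n-1)}, \tag{5.32}$$
where $\rho$ runs through all nontrivial zeros of $\zeta(s)$.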
Theorem 5.19 (Explicit formula for $\psi(x)$). We have, for $x \ge 2$ and
any $2 \le T \le x$,
$$\psi(x) = x - \sum_{|\gamma| \le T} \frac{x^{\rho}}{\rho} + O\left( \frac{x \log^2 x}{T} \right), \tag{5.33}$$
where $\rho = \beta + i\gamma$ runs through all nontrivial zeros of $\zeta(s)$ of height $|\gamma| \le T$.
It is known that
$$\sum_{\rho} \frac{1}{|\rho|^2} < \infty, \tag{5.34}$$
while
$$\sum_{\rho} \frac{1}{|\rho|} = \infty, \tag{5.35}$$
so the series over $\rho$ in the first explicit formula (for $\psi_1(x)$) converges
absolutely, but not that in the second formula (for $\psi(x)$). Hence the need for
working with a truncated version of this series in the latter case.
Explicit formulas can be used to translate results or conjectures on zeros
of the zeta function to estimates for the prime counting functions $\psi(x)$ or
$\psi_1(x)$. We illustrate this with two corollaries.
Corollary 5.20. Assuming the Riemann Hypothesis, we have
$$\psi_1(x) = \frac{x^2}{2} + O(x^{3/2}).$$
Proof. This follows immediately from (5.32) and (5.34) on noting that, under
the Riemann Hypothesis, $|x^{\rho}| = x^{\operatorname{Re} \rho} = x^{1/2}$ for all nontrivial zeros $\rho$.
The second corollary shows the effect of a single hypothetical zero that
violates the Riemann Hypothesis. We first note that, due to the symmetry
of the (nontrivial) zeros of the zeta function, zeros off the line $\sigma = 1/2$ come
in quadruples: If $\rho = \beta + i\gamma$ is a zero of the zeta function with $1/2 < \beta < 1$,
then so are $\beta - i\gamma$ and $1 - \beta \pm i\gamma$. Thus, it suffices to consider zeros $\rho = \beta + i\gamma$
in the quadrant $\beta \ge 1/2$ and $\gamma \ge 0$.
Corollary 5.21. Suppose that there exists exactly one zero $\rho = \beta + i\gamma$ of
the zeta function with $1/2 < \beta < 1$, $\gamma \ge 0$, so that all zeros except $\beta \pm i\gamma$
and $1 - \beta \pm i\gamma$ have real part $1/2$. Then
$$\psi_1(x) = \frac{x^2}{2} + c x^{1+\beta} \cos(\gamma \log x) + O(x^{3/2}),$$
where $c$ is a constant depending on $\rho$.
Proof. As before, the contribution of the zeros satisfying the Riemann
Hypothesis to the sum over $\rho$ in (5.32) is of order $O(x^{3/2})$, and the same holds
for the contribution of the zeros $1 - \beta \pm i\gamma$ since $1 - \beta < 1/2$. Thus, the only
remaining terms in this sum are those corresponding to $\rho = \beta \pm i\gamma$. Their
contribution is
$$\frac{x^{1+\beta+i\gamma}}{(\beta + i\gamma)(\beta + 1 + i\gamma)} + \frac{x^{1+\beta-i\gamma}}{(\beta - i\gamma)(\beta + 1 - i\gamma)},$$
and a simple calculation shows that this term is of the form
$c x^{1+\beta} \cos(\gamma \log x)$.
Thus, under the hypothesis of the corollary, $\psi_1(x)$ oscillates around the
main term $x^2/2$ with an amplitude $c x^{1+\beta}$. In particular, a zero on the
line $\sigma = 1$, i.e., with $\beta = 1$, would result in an oscillatory term of the
form $c x^2 \cos(\gamma \log x)$ in the estimate for $\psi_1(x)$, and thus contradict the PNT
(though not Chebyshev's estimate, if the constant $c$ is smaller than 1).
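The "simple calculation" in the proof of Corollary 5.21 can be sketched as follows. The symbols $r$ and $\vartheta$ are introduced here for illustration only and do not appear elsewhere in the notes. Writing the denominator in polar form, the two terms are complex conjugates of each other, so their sum is twice a real part:

```latex
(\beta + i\gamma)(\beta + 1 + i\gamma) = r e^{i\vartheta},
\qquad r = |\rho| \, |\rho + 1| > 0,

\frac{x^{1+\beta+i\gamma}}{(\beta + i\gamma)(\beta + 1 + i\gamma)}
  = \frac{x^{1+\beta}}{r} \, e^{i(\gamma \log x - \vartheta)},

\frac{x^{1+\beta+i\gamma}}{(\beta + i\gamma)(\beta + 1 + i\gamma)}
  + \frac{x^{1+\beta-i\gamma}}{(\beta - i\gamma)(\beta + 1 - i\gamma)}
  = \frac{2 x^{1+\beta}}{r} \cos(\gamma \log x - \vartheta).
```

Under this computation the amplitude is $2/(|\rho||\rho+1|)$ in absolute value, and the cosine in general carries a phase shift $\vartheta$ depending on $\rho$. The phase shift does not affect the size of the oscillation, which is what matters for the conclusions drawn above.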
5.8 Exercises
5.1 Show that, if $x$ is sufficiently large, then the interval $[2, x]$ contains
more primes than the interval $(x, 2x]$.
5.2 Obtain an asymptotic estimate for the sum
$$S(x) = \sum_{x < p \le 2x} \frac{1}{p}$$
with relative error $1/\log x$ (i.e., an estimate of the form $S(x) =
f(x)(1 + O(1/\log x))$ with a simple elementary function $f(x)$).
5.3 Define $A(x)$ by $\pi(x) = x/(\log x - A(x))$. Show that $A(x) = 1 +
O(1/\log x)$ for $x \ge 2$.
Remark. This result is of historical interest for the following reason:
While the function $x/\log x$ is asymptotically equal to $\pi(x)$ by the
prime number theorem, examination of numerical data suggests that
the function $x/\log x$ is not a particularly good approximation to $\pi(x)$.
Therefore, in the early (pre-PNT) history of prime number theory
several other functions were suggested as suitable approximations to
$\pi(x)$. In particular, Legendre proposed the function $x/(\log x - 1.08366)$.
(The particular value of the constant 1.08366 was presumably obtained
by some kind of regression analysis on the data.) On the other hand,
Gauss suggested that $x/(\log x - 1)$ was a better match to $\pi(x)$. The
problem settles this dispute, showing that Gauss had it right.
5.4 Let $f(n) = \Lambda(n) - 1$. Show that the Dirichlet series $F(s) =
\sum_{n=1}^{\infty} f(n) n^{-s}$ converges for every $s$ on the line $\sigma = 1$, and obtain
an estimate for the rate of convergence, i.e., the difference
$F(s) - \sum_{n \le x} f(n) n^{-s}$, when $s = 1 + it$ for some fixed $t$. (The
estimate may depend on $t$, but try to get as good an error term as
possible assuming the PNT with exponential error term.)
5.5 Let 0 < < 1 be xed. Show that if
(1) (x) = x +O(xexpc(log x)
) (x 2)
with some positive constant c, then
(2) (x) = Li(x) +O
_
xexpc
(log x)
_
(x 2)
with some (other) positive constant c
nx
(n). Using complex integration as in the proof of
the PNT, show that M(x) = O(xexp(c(log x)
n=1
f(n)n
s
is a Dirichlet series, stating any condi-
tions that are needed for this formula to be valid.
5.8 Let $F(s) = \sum_{p} p^{-s}$, where the summation runs over all primes. Show
that $F(s) = \log \zeta(s) + G(s)$, where log denotes the principal branch
of the logarithm, and $G(s)$ is analytic in the half-plane $\sigma > 1/2$. Deduce
from this that the function $F(s)$ does not have a meromorphic
continuation to the left of the line $\sigma = 1$.
Chapter 6
Primes in arithmetic progressions: Dirichlet's Theorem
The main goal of this chapter is a proof of Dirichlet's theorem on the existence
of primes in arithmetic progressions, a result that predates the prime
number theorem by about 50 years, and which has had an equally profound
impact on the development of analytic number theory. In its original version
this result is the following.
Theorem 6.1 (Dirichlet's Theorem). Given any positive integers $q$ and
$a$ with $(a, q) = 1$, there exist infinitely many primes congruent to $a$ modulo
$q$. In other words, each of the arithmetic progressions
$$\{qn + a : n = 0, 1, 2, \dots\}, \qquad q, a \in \mathbb{N}, \ (a, q) = 1, \tag{6.1}$$
contains infinitely many primes.
The above form of Dirichlet's theorem is the analog of Euclid's theorem
on the infinitude of primes. Our proof will in fact give a stronger result,
namely an analog of Mertens' estimate for primes in arithmetic progressions.
Theorem 6.2 (Dirichlet's Theorem, quantitative version). Given any
positive integers $q$ and $a$ with $(a, q) = 1$, we have
$$\sum_{\substack{p \le x \\ p \equiv a \bmod q}} \frac{1}{p} = \frac{1}{\varphi(q)} \log\log x + O_q(1) \qquad (x \ge 3). \tag{6.2}$$
This is still short of an analog of the PNT for primes in arithmetic
progressions, which would be an asymptotic formula for the counting functions
$\pi(x; q, a) = \#\{p \le x : p \equiv a \bmod q\}$. Such estimates are indeed known, but
the proofs are rather technical, so we will not present them here. (Essentially,
one has to combine Dirichlet's method for the proof of Theorem 6.1
with the analytic argument we used in the previous chapter to prove the
PNT with error term.)
We note that the condition $(a, q) = 1$ in Dirichlet's theorem is necessary,
for if $(a, q) = d > 1$, then any integer congruent to $a$ modulo $q$ is divisible by
$d$, and hence there can be at most one prime in the residue class $a$ modulo
$q$. Thus, for given $q$, all but finitely many exceptional primes fall into one of
the residue classes $a \bmod q$, with $(a, q) = 1$. By the definition of the Euler
phi function, there are $\varphi(q)$ such residue classes, and if we assume that the
primes are distributed approximately equally among these residue classes,
then a given class $a$ modulo $q$, with $(a, q) = 1$, can be expected to contain a
proportion $1/\varphi(q)$ of all primes. Thus, the factor $1/\varphi(q)$ in (6.2) is indeed
the "correct" factor here.
A natural attempt to prove Dirichlet's theorem would be to try to mimic
Euclid's proof of the infinitude of primes. In certain special cases, this does
indeed succeed. For example, to show that there are infinitely many primes
congruent to 3 modulo 4, assume there are only finitely many, say $p_1, \dots, p_n$,
and consider the number $N = p_1^2 \cdots p_n^2 + 2$. Since $p_i^2 \equiv 3^2 \equiv 1 \bmod 4$ for
each $i$, $N$ must be congruent to 3 modulo 4. Now note that, since $N$ is odd,
all its prime factors are odd, and so congruent to either 1 or 3 modulo 4.
Moreover, $N$ must be divisible by at least one prime congruent to 3 modulo
4, and hence by one of the primes $p_i$, since otherwise $N$ would be a product
of primes congruent to 1 modulo 4 and thus itself congruent to 1 modulo 4.
But this is impossible, since then $p_i$ would divide both $N$ and $N - 2$, and
hence also $N - (N - 2) = 2$.
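The argument is constructive in the following sense: from any finite list of primes congruent to 3 modulo 4 it produces a new one. A minimal sketch, with the hypothetical helper names `prime_factors` and `new_prime_3_mod_4`, and with plain trial division standing in for a real factoring routine:

```python
def prime_factors(n):
    """Prime factors of n (with multiplicity), found by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_3_mod_4(known):
    """Given primes congruent to 3 mod 4, return (N, p) with N = prod p_i^2 + 2
    and p a prime factor of N that is congruent to 3 mod 4."""
    product_of_squares = 1
    for p in known:
        product_of_squares *= p * p
    n = product_of_squares + 2
    # each p_i^2 is congruent to 1 mod 4, so N is congruent to 3 mod 4
    assert n % 4 == 3
    # a product of primes that are all 1 mod 4 would be 1 mod 4,
    # so this list cannot be empty
    candidates = [p for p in prime_factors(n) if p % 4 == 3]
    return n, min(candidates)


known = [3]
for _ in range(3):
    n, p = new_prime_3_mod_4(known)
    print(known, "->", n, "has the prime factor", p)
    known.append(p)
```

Since no $p_i$ divides $N$ (it would have to divide 2), the prime returned is never in the input list.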
Similar, though more complicated, elementary arguments can be given
for some other special arithmetic progressions, but the general case of
Dirichlet's theorem cannot be proved by these methods.
As was the case with the PNT, the breakthrough that led to a proof
of Dirichlet's theorem in its full generality came with the introduction of
analytic tools. The key tools that Dirichlet introduced, and which are now
named after him, are the Dirichlet characters and Dirichlet L-functions.
Dirichlet characters are certain arithmetic functions that are used to
extract terms belonging to a given arithmetic progression from a summation.
Dirichlet L-functions are the Dirichlet series associated with these functions.
The analytic properties of these Dirichlet series, and in particular the
location of their zeros, play a key role in the argument. In fact, the most difficult
part of the proof consists in showing that the single point $s = 1$ is not a zero
for a Dirichlet L-function.
6.2 Dirichlet characters
The basic definition of a character is as follows.
Definition (Dirichlet characters). Let $q$ be a positive integer. A Dirichlet
character modulo $q$ is an arithmetic function $\chi$ with the following
properties:
(i) $\chi$ is periodic modulo $q$, i.e., $\chi(n + q) = \chi(n)$ for all $n \in \mathbb{N}$.
(ii) $\chi$ is completely multiplicative, i.e., $\chi(nm) = \chi(n)\chi(m)$ for all $n, m \in \mathbb{N}$
and $\chi(1) = 1$.
(iii) $\chi(n) \ne 0$ if and only if $(n, q) = 1$.
The arithmetic function $\chi_0 = \chi_{0,q}$ defined by $\chi_0(n) = 1$ if $(n, q) = 1$ and
$\chi_0(n) = 0$ otherwise (i.e., the characteristic function of the integers coprime
with $q$) is called the principal character modulo $q$.
Remarks. (i) That the principal character $\chi_0$ is, in fact, a Dirichlet character
can be seen as follows: condition (i) follows from the relation $(n, q) = (n +
q, q)$, (ii) follows from the fact that $(nm, q) = 1$ holds if and only if $(n, q) = 1$
and $(m, q) = 1$, and (iii) holds by the definition of $\chi_0$.
(ii) The notation $\chi_0$ has become the standard notation for the principal
character. In using this notation, one must keep in mind that $\chi_0$ depends
on $q$, even though this is not explicitly indicated in the notation.
(iii) In analogy with the residue class notation "$a \bmod q$", the notation
"$\chi \bmod q$" is used to indicate that $\chi$ is a Dirichlet character modulo $q$. In
a summation condition this notation denotes a sum over all characters $\chi$
modulo $q$.
Examples
(1) Characters modulo 1. The constant function $\chi \equiv 1$ clearly satisfies the
conditions of the above definition with $q = 1$. Moreover, since any
character modulo 1 must be periodic modulo 1 and equal to 1 at
1, $\chi \equiv 1$ is the only Dirichlet character modulo 1. Note that this
character is the principal character modulo 1.
(2) Characters modulo 2. The coprimality condition forces any character
modulo 2 to be 0 at even integers, and the periodicity condition
along with the requirement $\chi(1) = 1$ then forces $\chi(n)$ to be equal to 1
at odd integers. Thus, as in the case $q = 1$, there is only one character
modulo 2, namely the principal character $\chi_0$ defined by $\chi_0(n) = 1$ if
$(n, 2) = 1$ and $\chi_0(n) = 0$ otherwise.
(3) Characters modulo 3. We have again the principal character $\chi_0(n)$,
defined by $\chi_0(n) = 1$ if $(n, 3) = 1$ and $\chi_0(n) = 0$ otherwise. We will
show that there is exactly one other character modulo 3.
Suppose $\chi$ is a character modulo 3. Then properties (i)-(iii) force
$\chi(1) = 1$, $\chi(3) = 0$, and $\chi(2)^2 = \chi(4) = \chi(1) = 1$, so that $\chi(2) = \pm 1$.
If $\chi(2) = 1$, then $\chi$ is equal to $\chi_0$ since both functions are periodic
modulo 3 and have the same values at $n = 1, 2, 3$. If $\chi(2) = -1$, then
$\chi = \chi_1$, where $\chi_1$ is the unique periodic function modulo 3 defined by
$\chi_1(1) = 1$, $\chi_1(2) = -1$, and $\chi_1(3) = 0$. Clearly $\chi_1$ satisfies properties
(i) and (iii). The complete multiplicativity is not immediately obvious,
but can be seen as follows:
Define a completely multiplicative function $f$ by $f(3) = 0$, $f(p) = -1$
if $p \equiv -1 \bmod 3$ and $f(p) = 1$ if $p \equiv 1 \bmod 3$. Then $f(n) = \chi_1(n)$ for
$n = 1, 2, 3$, so to show that $f = \chi_1$ (and hence that $\chi_1$ is completely
multiplicative) it suffices to show that $f$ is periodic with period 3.
To this end, note first that if $n \equiv 0 \bmod 3$, then $f(n) = 0$.
Otherwise $n \equiv \pm 1 \bmod 3$, and in this case $n$ can be written as
$n = \prod_{i \in I} p_i \prod_{j \in J} p_j$, where the products over $i \in I$ and $j \in J$
are finite (possibly empty), and the primes $p_i$, $i \in I$, and $p_j$,
$j \in J$, are congruent to 1, resp. $-1$, modulo 3. We then have
$f(n) = \prod_{i \in I} f(p_i) \prod_{j \in J} f(p_j) = (-1)^{|J|}$. On the other hand, since
$p_i \equiv 1 \bmod 3$ and $p_j \equiv -1 \bmod 3$, we have $n \equiv (-1)^{|J|} \bmod 3$. Hence,
if $|J|$ is even, then $n \equiv 1 \bmod 3$, and $f(n) = 1 = f(1)$, and if $|J|$ is
odd, then $n \equiv 2 \bmod 3$, and $f(n) = -1 = f(2)$. Thus, $f$ is periodic
with period 3, as we wanted to show.
(4) Legendre symbols as characters. Let $q$ be an odd prime, and
let $\chi(n) = \left( \frac{n}{q} \right)$ denote the Legendre symbol modulo $q$, defined as 0 if
$(n, q) > 1$, 1 if $(n, q) = 1$ and $n \equiv x^2 \bmod q$ has a solution (i.e., if $n$ is a
quadratic residue modulo $q$), and $-1$ otherwise (i.e., if $n$ is a quadratic
non-residue modulo $q$). Then $\chi(n)$ is a character modulo $q$. Indeed,
properties (i) (periodicity) and (iii) (coprimality) follow immediately
from the definition of the Legendre symbol, while (ii) (complete
multiplicativity) amounts to the identity
$\left( \frac{nm}{q} \right) = \left( \frac{n}{q} \right) \left( \frac{m}{q} \right)$, which is a
known result from elementary number theory.
A character $\chi$ derived from a Legendre symbol in this way takes on only
the values $0, \pm 1$, and thus satisfies $\chi^2 = \chi_0$, but $\chi \ne \chi_0$. Characters
with the latter two properties are called quadratic characters. Note
that a character taking on only real values (such a character is called
a real character) necessarily has all its values in $\{0, \pm 1\}$, and so is
either a principal character or a quadratic character.
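The three defining properties can be tested mechanically for a concrete Legendre symbol. The sketch below computes the symbol directly from the definition given above (by listing the squares modulo $q$); the names `legendre` and `is_character` are hypothetical, and the check over a finite range of $n$ is of course only a plausibility test, not a proof.

```python
from math import gcd


def legendre(n, q):
    """Legendre symbol (n/q) for an odd prime q, straight from the definition."""
    n %= q
    if n == 0:
        return 0
    squares = {x * x % q for x in range(1, q)}  # nonzero quadratic residues
    return 1 if n in squares else -1


def is_character(chi, q, bound=60):
    """Check properties (i)-(iii) of a Dirichlet character for 1 <= n, m < bound."""
    periodic = all(chi(n + q) == chi(n) for n in range(1, bound))
    multiplicative = chi(1) == 1 and all(
        chi(n * m) == chi(n) * chi(m)
        for n in range(1, bound)
        for m in range(1, bound)
    )
    coprimality = all((chi(n) != 0) == (gcd(n, q) == 1) for n in range(1, bound))
    return periodic and multiplicative and coprimality


q = 7
chi = lambda n: legendre(n, q)
print([chi(n) for n in range(1, q + 1)])
print(is_character(chi, q))
```

For $q = 7$ the quadratic residues are 1, 2, 4, so the symbol is quadratic in the sense defined above: its square is the principal character, but it takes the value $-1$ at $n = 3$.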
We will later describe a systematic method for constructing all Dirichlet
characters to a given modulus q.
We next deduce some simple consequences from the definition of a
character.
Theorem 6.3 (Elementary properties of Dirichlet characters). Let
$q$ be a positive integer.
(i) The values of a Dirichlet character $\chi$ modulo $q$ are either 0, or $\varphi(q)$-th
roots of unity; i.e., for all $n$, we have either $\chi(n) = 0$ or $\chi(n) =
e^{2\pi i \nu/\varphi(q)}$ for some $\nu \in \mathbb{N}$.
(ii) The characters modulo $q$ form a group with respect to pointwise
multiplication, defined by $(\chi_1 \chi_2)(n) = \chi_1(n) \chi_2(n)$. The principal character
$\chi_0$ is the neutral element of this group, and the inverse of a character
$\chi$ is given by the character $\overline{\chi}$ defined by $\overline{\chi}(n) = \overline{\chi(n)}$.
Proof. (i) If $\chi(n) \ne 0$, then $(n, q) = 1$. By Euler's generalization of Fermat's
Little Theorem we then have $n^{\varphi(q)} \equiv 1 \bmod q$. By the complete
multiplicativity and periodicity of $\chi$ this implies $\chi(n)^{\varphi(q)} = \chi\left( n^{\varphi(q)} \right) = \chi(1) = 1$, as
claimed.
(ii) The group properties follow immediately from the definition of a
character.
In order to derive further information on the properties of the group of
characters, we need a well-known result from algebra, which we state here
without proof.
Lemma 6.4 (Basis theorem for finite abelian groups). Every finite
abelian group is a direct product of cyclic groups. That is, for every finite
abelian group $G$ there exist elements $g_1, \dots, g_r \in G$ of respective orders
$h_1, \dots, h_r$, such that every element $g \in G$ has a unique representation
$g = \prod_{i=1}^{r} g_i^{\mu_i}$ with $0 \le \mu_i < h_i$. Moreover, we have $\prod_{i=1}^{r} h_i = |G|$.
We note that in case the group is trivial, i.e., consists of only the identity
element, the result remains valid with an empty set as "basis" if the product
representation $g = \prod_{i} g_i^{\mu_i}$ is interpreted as the empty product whose value
is the identity element.
If we specialize the group $G$ in Lemma 6.4 as the multiplicative group
$(\mathbb{Z}/q\mathbb{Z})^*$ [...] $\mu_i$ with $0 \le \mu_i < h_i$, such that
$n \equiv \prod_{i=1}^{r} g_i^{\mu_i} \bmod q$. Moreover, we have $\prod_{i=1}^{r} h_i = \varphi(q)$.
Examples
(1) If $q = p^m$ is a prime power, with $p$ an odd prime, then, by a result from
elementary number theory, there exists a primitive root $g$ modulo $p^m$,
i.e., an element $g$ that generates the multiplicative group modulo $p^m$.
Thus, this group is itself cyclic, and the basis theorem therefore holds
with $r = 1$ and $g_1 = g$. (For powers of 2 the situation is slightly more
complicated, as the corresponding residue class groups are in general
not cyclic.)
(2) Let $q = 15 = 3 \cdot 5$. We claim that $(g_1, g_2) = (2, 11)$ is a basis. It is
easily checked that the orders of $g_1$ and $g_2$ are $h_1 = 4$ and $h_2 = 2$,
respectively. Note also that $h_1 h_2 = 4 \cdot 2 = \varphi(5)\varphi(3) = \varphi(15)$, in
agreement with the theorem. A simple case-by-case check then shows
that the congruence classes $2^{\mu_1} 11^{\mu_2}$ with $0 \le \mu_1 < 4$ and $0 \le \mu_2 < 2$
cover every residue class $a \bmod q$ with $(a, 15) = 1$ exactly once.
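The case-by-case check in Example (2) is small enough to be carried out by machine. A minimal sketch; the helper name `multiplicative_order` and the variable names are chosen here for illustration:

```python
from math import gcd


def multiplicative_order(g, q):
    """Smallest k >= 1 with g^k congruent to 1 mod q (assumes gcd(g, q) = 1)."""
    k, power = 1, g % q
    while power != 1:
        power = power * g % q
        k += 1
    return k


q = 15
g1, g2 = 2, 11
h1, h2 = multiplicative_order(g1, q), multiplicative_order(g2, q)

# residue class of 2^m1 * 11^m2 for every exponent pair (m1, m2)
classes = {
    (m1, m2): pow(g1, m1, q) * pow(g2, m2, q) % q
    for m1 in range(h1)
    for m2 in range(h2)
}
units = sorted(a for a in range(1, q) if gcd(a, q) == 1)

print("orders:", h1, h2)
print("classes:", classes)
print("each unit hit exactly once:", sorted(classes.values()) == units)
```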
Our main application of the basis theorem is given in the following
lemma.
Lemma 6.6 (Number of characters modulo q). Let $q$ be a positive
integer. Then there exist exactly $\varphi(q)$ Dirichlet characters modulo $q$. Moreover,
for any integer $a$ with $(a, q) = 1$ and $a \not\equiv 1 \bmod q$ there exists a character
$\chi$ with $\chi(a) \ne 1$.
Proof. Let $g_1, \dots, g_r$ and $h_1, \dots, h_r$ be as in Lemma 6.5, and set $\omega_j = e^{2\pi i/h_j}$.
Thus $\omega_j^{\nu}$, $\nu = 0, 1, \dots, h_j - 1$, are distinct $h_j$-th roots of unity. We claim that
there is a one-to-one correspondence between the tuples
$$\nu = (\nu_1, \dots, \nu_r), \qquad 0 \le \nu_i < h_i, \tag{6.3}$$
and the characters modulo $q$.
To show this, suppose first that $\chi$ is a character modulo $q$. Then $\chi(n) = 0$
for $(n, q) > 1$, by the definition of a character. On the other hand, if
$(n, q) = 1$, then, by the basis theorem, we have $n \equiv \prod_{i=1}^{r} g_i^{\mu_i} \bmod q$ with
a unique tuple $\mu = (\mu_1, \dots, \mu_r)$ with $0 \le \mu_i < h_i$. The periodicity and
complete multiplicativity properties of a character then imply
$$\chi(n) = \chi\left( \prod_{i=1}^{r} g_i^{\mu_i} \right) = \prod_{i=1}^{r} \chi(g_i)^{\mu_i}, \quad \text{if } n \equiv \prod_{i=1}^{r} g_i^{\mu_i} \bmod q. \tag{6.4}$$
Thus, $\chi$ is uniquely specified by its values on the generators $g_i$. Moreover,
since $g_i$ has order $h_i$, we have
$$\chi(g_i)^{h_i} = \chi(g_i^{h_i}) = \chi(1) = 1,$$
so $\chi(g_i)$ must be an $h_i$-th root of unity; i.e., setting $\omega_j = \exp(2\pi i/h_j)$, we
have
$$\chi(g_i) = \omega_i^{\nu_i} \qquad (i = 1, \dots, r), \tag{6.5}$$
for a unique tuple $\nu = (\nu_1, \dots, \nu_r)$ of the form (6.3). Thus, every character
modulo $q$ gives rise to a unique tuple $\nu$ of the above form.
Conversely, given a tuple $\nu$ of this form, we define an arithmetic function
$\chi = \chi_{\nu}$ by setting $\chi(n) = 0$ if $(n, q) > 1$ and defining $\chi(n)$ via (6.4) and
(6.5) otherwise. By construction, this function $\chi(n)$ is periodic modulo $q$
and satisfies $\chi(n) \ne 0$ if and only if $(n, q) = 1$, and one can verify that
$\chi$ is also completely multiplicative. Thus $\chi$ is indeed a Dirichlet character
modulo $q$.
We have thus shown that the characters modulo $q$ are in one-to-one
correspondence with tuples $\nu$ of the form (6.3). Since by Lemma 6.5 there
are $h_1 \cdots h_r = \varphi(q)$ such tuples, it follows that there are $\varphi(q)$ characters
modulo $q$. This establishes the first part of Lemma 6.6.
For the proof of the second part, let an integer $a$ be given with $(a, q) = 1$
and $a \not\equiv 1 \bmod q$. By the basis theorem, we have $a \equiv \prod_{i=1}^{r} g_i^{\mu_i} \bmod q$ with
suitable exponents $\mu_i$ of the form (6.3). Since $a \not\equiv 1 \bmod q$, at least one
Math 531 Lecture Notes, Fall 2005 Version 2006.09.01
178 CHAPTER 6. PRIMES IN ARITHMETIC PROGRESSIONS
of the exponents
i
must be non-zero. Without loss of generality, suppose
that
1
,= 0. Dene a character by setting (g
1
) =
1
and (g
i
) = 1 for
i = 2, . . . , r. Then
(a) = (g
1
i
) = (g
i
)
1
= exp
_
2i
1
h
1
_
,= 1,
since 0 <
1
< h
1
.
Example: Characters modulo 15
We illustrate the construction of characters in the case of the modulus $q = 15$ considered in the example following Lemma 6.5. There we obtained $g_1 = 2$ and $g_2 = 11$ as generators with respective orders $h_1 = 4$ and $h_2 = 2$. The corresponding roots of unity $\omega_i$ in (6.5) are $\omega_1 = e^{2\pi i/4} = i$ and $\omega_2 = e^{2\pi i/2} = -1$. Thus there are 8 characters $\chi_{\nu_1,\nu_2}$ modulo 15, corresponding to the pairs $(\nu_1,\nu_2)$ with $0 \le \nu_1 < 4$ and $0 \le \nu_2 < 2$, and defined by setting $\chi_{\nu_1,\nu_2}(2) = i^{\nu_1}$ and $\chi_{\nu_1,\nu_2}(11) = (-1)^{\nu_2}$. The corresponding values of $\chi_{\nu_1,\nu_2}(a)$ at all integers $1 \le a \le q$ with $(a,15) = 1$ can be calculated by representing $a$ in the form $a \equiv 2^{\mu_1} 11^{\mu_2} \bmod 15$ with $0 \le \mu_1 < 4$, $0 \le \mu_2 < 2$, and then using the formula $\chi(a) = \chi(2^{\mu_1} 11^{\mu_2}) = \chi(2)^{\mu_1} \chi(11)^{\mu_2}$. The result is given in Table 6.1:
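\[
\begin{array}{c|rrrrrrrr}
(\mu_1,\mu_2) & (0,0) & (1,0) & (2,0) & (3,0) & (0,1) & (1,1) & (2,1) & (3,1) \\
a & 1 & 2 & 4 & 8 & 11 & 7 & 14 & 13 \\ \hline
\chi_{0,0}(a) & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\chi_{0,1}(a) & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
\chi_{1,0}(a) & 1 & i & -1 & -i & 1 & i & -1 & -i \\
\chi_{1,1}(a) & 1 & i & -1 & -i & -1 & -i & 1 & i \\
\chi_{2,0}(a) & 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
\chi_{2,1}(a) & 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
\chi_{3,0}(a) & 1 & -i & -1 & i & 1 & -i & -1 & i \\
\chi_{3,1}(a) & 1 & -i & -1 & i & -1 & i & 1 & -i
\end{array}
\]
Table 6.1: Table of all Dirichlet characters modulo 15. The integers $a$ in the second row are the values of $2^{\mu_1} 11^{\mu_2}$ modulo 15.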
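The construction in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `LOGS`, `I_POWERS`, and `chi` are hypothetical helpers introduced here, and the generators 2 and 11 with orders 4 and 2 are taken from the example above.

```python
from math import gcd

Q = 15
I_POWERS = [1, 1j, -1, -1j]          # i^0, i^1, i^2, i^3 (exact values)

# Exponent table: a ≡ 2^mu1 * 11^mu2 (mod 15)  ->  (mu1, mu2)
LOGS = {
    (pow(2, mu1, Q) * pow(11, mu2, Q)) % Q: (mu1, mu2)
    for mu1 in range(4)
    for mu2 in range(2)
}

def chi(nu, n):
    """Value of chi_{nu1,nu2}(n), defined by chi(2) = i^nu1, chi(11) = (-1)^nu2."""
    if gcd(n, Q) > 1:
        return 0
    mu1, mu2 = LOGS[n % Q]
    return I_POWERS[(nu[0] * mu1) % 4] * (-1) ** ((nu[1] * mu2) % 2)

# Print one row per character, in the column order a = 1, 2, 4, 8, 11, 7, 14, 13.
COLUMNS = [1, 2, 4, 8, 11, 7, 14, 13]
for nu1 in range(4):
    for nu2 in range(2):
        print((nu1, nu2), [chi((nu1, nu2), a) for a in COLUMNS])
```

The exponent table plays the role of the basis theorem: each reduced residue class appears exactly once as a key, so `chi` is well defined on all integers.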
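The main result of this section is the following.
Theorem 6.7 (Orthogonality relations for Dirichlet characters). Let $q$ be a positive integer.
(i) For any Dirichlet character $\chi$ modulo $q$ we have
\[ \sum_{a=1}^{q} \chi(a) = \begin{cases} \phi(q) & \text{if } \chi = \chi_0, \\ 0 & \text{otherwise,} \end{cases} \]
where $\chi_0$ is the principal character modulo $q$.
(ii) For any integer $a \in \mathbb{N}$ we have
\[ \sum_{\chi \bmod q} \chi(a) = \begin{cases} \phi(q) & \text{if } a \equiv 1 \bmod q, \\ 0 & \text{otherwise,} \end{cases} \]
where the summation runs over all Dirichlet characters modulo $q$.
(iii) For any Dirichlet characters $\chi_1, \chi_2$ modulo $q$ we have
\[ \sum_{a=1}^{q} \chi_1(a)\overline{\chi_2(a)} = \begin{cases} \phi(q) & \text{if } \chi_1 = \chi_2, \\ 0 & \text{otherwise.} \end{cases} \]
(iv) For any integers $a_1, a_2 \in \mathbb{N}$ we have
\[ \sum_{\chi \bmod q} \chi(a_1)\overline{\chi(a_2)} = \begin{cases} \phi(q) & \text{if } a_1 \equiv a_2 \bmod q \text{ and } (a_1,q) = 1, \\ 0 & \text{otherwise,} \end{cases} \]
where the summation runs over all Dirichlet characters modulo $q$.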
Proof. (i) Let $S$ denote the sum on the left. If $\chi = \chi_0$, then $\chi(a) = 1$ if $(a,q) = 1$, and $\chi(a) = 0$ otherwise, so $S = \phi(q)$.
Suppose now that $\chi \ne \chi_0$. Then there exists a number $a_1$ with $(a_1,q) = 1$ such that $\chi(a_1) \ne 1$. Note that, since $\chi(a) = 0$ if $(a,q) > 1$, the sum in (i) may be restricted to terms with $(a,q) = 1$, $1 \le a \le q$. Also, observe that, if $a$ runs through these values, then so does $b = aa_1$, after reducing modulo $q$. Therefore,
\[ \chi(a_1)S = \sum_{\substack{1 \le a \le q \\ (a,q)=1}} \chi(a_1)\chi(a) = \sum_{\substack{1 \le a \le q \\ (a,q)=1}} \chi(a_1 a) = \sum_{\substack{1 \le b \le q \\ (b,q)=1}} \chi(b) = S. \]
Since $\chi(a_1) \ne 1$, this implies $S = 0$. This completes the proof of part (i).
(ii) Let $S$ denote the sum on the left. If $(a,q) > 1$, then all terms in this sum are 0, so $S = 0$. If $(a,q) = 1$ and $a \equiv 1 \bmod q$, then $\chi(a) = \chi(1) = 1$ for all characters $\chi$ modulo $q$, and since by Lemma 6.6 there exist exactly $\phi(q)$ such characters, we have $S = \phi(q)$ in this case.
Now suppose that $(a,q) = 1$ and $a \not\equiv 1 \bmod q$. By Lemma 6.6, there exists a character $\chi_1$ modulo $q$ with $\chi_1(a) \ne 1$. Since the characters modulo $q$ form a group, if $\chi$ runs through all characters modulo $q$, then so does $\chi_1\chi$. We therefore have
\[ \chi_1(a)S = \sum_{\chi \bmod q} \chi_1(a)\chi(a) = \sum_{\chi \bmod q} (\chi_1\chi)(a) = \sum_{\psi \bmod q} \psi(a) = S, \]
where in the last sum $\psi$ runs through all characters modulo $q$. Since $\chi_1(a) \ne 1$, this implies $S = 0$, as desired.
(iii) This follows by applying (i) with the character $\chi = \chi_1\overline{\chi_2}$ and noting that $\chi = \chi_0$ if and only if $\chi_1 = \chi_2$.
(iv) We may assume that $(a_1,q) = (a_2,q) = 1$ since otherwise the sum is 0 and the result holds trivially. Therefore, $a_2$ has a multiplicative inverse $\overline{a_2}$ modulo $q$. We now apply (ii) with $a = a_1\overline{a_2}$. Noting that
\[ \chi(a_2)\chi(a) = \chi(a_2 a) = \chi(a_2 a_1 \overline{a_2}) = \chi(a_1) \]
and hence
\[ \chi(a) = \frac{\chi(a_1)}{\chi(a_2)} = \chi(a_1)\overline{\chi(a_2)} \]
(the last step holds because $|\chi(a_2)| = 1$), and that $a = a_1\overline{a_2} \equiv 1 \bmod q$ if and only if $a_1 \equiv a_2 \bmod q$, we obtain the desired relation.
The last part, (iv), is by far the most important, and it is key to Dirichlet's argument. This identity allows one to extract terms satisfying a given congruence from a sum. We will apply it with $a_2$ a fixed congruence class $a \bmod q$ and with $a_1$ running through primes $p$, in order to extract those primes that fall into the congruence class $a \bmod q$. This identity alone is often referred to as the orthogonality relation for Dirichlet characters.
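As a numerical illustration of how part (iv) extracts a congruence class, the following sketch evaluates $\frac{1}{\phi(q)}\sum_{\chi}\chi(n)\overline{\chi(a)}$ for $q = 15$, using the eight characters constructed in the example above. The helper names (`chi`, `indicator`, `CHARS`) and the choice $a = 7$ are assumptions made for this illustration only.

```python
from math import gcd

Q, PHI = 15, 8
I_POWERS = [1, 1j, -1, -1j]
# a ≡ 2^mu1 * 11^mu2 (mod 15)  ->  (mu1, mu2)
LOGS = {(pow(2, m1, Q) * pow(11, m2, Q)) % Q: (m1, m2)
        for m1 in range(4) for m2 in range(2)}
CHARS = [(n1, n2) for n1 in range(4) for n2 in range(2)]

def chi(nu, n):
    """Character chi_{nu1,nu2} modulo 15 with chi(2) = i^nu1, chi(11) = (-1)^nu2."""
    if gcd(n, Q) > 1:
        return 0
    m1, m2 = LOGS[n % Q]
    return I_POWERS[(nu[0] * m1) % 4] * (-1) ** ((nu[1] * m2) % 2)

def indicator(n, a):
    """(1/phi(q)) * sum over chi of chi(n) * conj(chi(a)).

    By part (iv) this is 1 if n ≡ a (mod q) and gcd(n, q) = 1, and 0 otherwise.
    """
    total = sum(chi(nu, n) * complex(chi(nu, a)).conjugate() for nu in CHARS)
    return total / PHI

# Simple trial-division prime list, sufficient for a small demonstration.
primes = [p for p in range(2, 200)
          if all(p % d for d in range(2, int(p ** 0.5) + 1))]

# Primes selected by the character sum with a = 7.
picked = [p for p in primes if abs(indicator(p, 7) - 1) < 1e-12]
print(picked)
```

The list `picked` agrees with the primes below 200 that are congruent to 7 modulo 15, which is exactly the filtering role the identity plays in Dirichlet's argument.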
We record one simple but useful consequence of part (i) of the above theorem.
Corollary 6.8 (Summatory function of characters). Let $q$ be a positive integer and $\chi$ a character modulo $q$.
(i) If $\chi$ is not the principal character $\chi_0$ modulo $q$, then
\[ \Bigl|\sum_{n \le x} \chi(n)\Bigr| \le \phi(q) \quad (x \ge 1). \]
(ii) If $\chi = \chi_0$, then
\[ \Bigl|\sum_{n \le x} \chi(n) - \frac{\phi(q)}{q}\,x\Bigr| \le 2\phi(q) \quad (x \ge 1). \]
Proof. Since $\chi$ is periodic modulo $q$, we have, for any $x \ge 1$,
\[ \sum_{n \le x} \chi(n) = [x/q]\,S + R, \]
where
\[ S = \sum_{a=1}^{q} \chi(a), \qquad |R| \le \sum_{a=1}^{q} |\chi(a)|. \]
Clearly, $|R| \le \phi(q)$, and by part (i) of Theorem 6.7 we have $S = 0$ if $\chi \ne \chi_0$, and $S = \phi(q)$ if $\chi = \chi_0$. The asserted bounds follow from these remarks.
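The two bounds of the corollary are easy to test numerically. The sketch below uses, as an assumed example not taken from the text, the character modulo 5 determined by $\chi(2) = i$ (2 is a primitive root modulo 5), together with the principal character modulo 5; here $\phi(5) = 4$.

```python
PHI = 4                                  # phi(5)
VALUES = [0, 1, 1j, -1j, -1]             # chi(n) for n ≡ 0, 1, 2, 3, 4 (mod 5)

def chi(n):
    """Non-principal character mod 5 with chi(2) = i (illustrative choice)."""
    return VALUES[n % 5]

def chi0(n):
    """Principal character mod 5."""
    return 0 if n % 5 == 0 else 1

max_nonprincipal = 0.0   # max over x of |sum_{n<=x} chi(n)|
max_principal = 0.0      # max over x of |sum_{n<=x} chi0(n) - (phi(q)/q) x|
M = 0
M0 = 0
for x in range(1, 1001):
    M += chi(x)
    M0 += chi0(x)
    max_nonprincipal = max(max_nonprincipal, abs(M))
    max_principal = max(max_principal, abs(M0 - PHI / 5 * x))

print(max_nonprincipal, max_principal)
```

Both maxima stay well below the bounds $\phi(q)$ and $2\phi(q)$ of the corollary, which are not claimed to be sharp.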
6.3 Dirichlet L-functions
Given a Dirichlet character $\chi$, its Dirichlet series, i.e., the function
\[ L(s,\chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \tag{6.6} \]
is called the Dirichlet L-function, or Dirichlet L-series, associated with the character $\chi$.
The analytic behavior of Dirichlet L-functions plays a crucial role in the proof of Dirichlet's theorem. The following theorem collects the main analytic properties of L-functions that we will need for the proof of Dirichlet's theorem. The key property, upon which the success of the argument hinges, is the last one, which asserts that L-functions do not have a zero at the point $s = 1$. This is also the most difficult property to establish, and we therefore defer its proof to a later section.
Theorem 6.9 (Analytic properties of L-functions). Let $\chi$ be a Dirichlet character modulo $q$, and let $L(s,\chi)$ denote the associated Dirichlet L-function, defined by (6.6).
(i) If $\chi \ne \chi_0$, where $\chi_0$ is the principal character modulo $q$, then $L(s,\chi)$ is analytic in the half-plane $\sigma > 0$.
(ii) If $\chi = \chi_0$, then $L(s,\chi)$ has a simple pole at $s = 1$ with residue $\phi(q)/q$, and is analytic at all other points in the half-plane $\sigma > 0$.
(iii) If $\chi \ne \chi_0$, then $L(1,\chi) \ne 0$.
Proof. As mentioned above, we defer the proof of (iii) to a later section, and only prove (i) and (ii) in this section.
By Corollary 6.8 the summatory function $M(\chi,x) = \sum_{n \le x} \chi(n)$ satisfies, for $x \ge 1$,
\[ M(\chi,x) = \begin{cases} O_q(1) & \text{if } \chi \ne \chi_0, \\[2pt] \dfrac{\phi(q)}{q}\,x + O_q(1) & \text{if } \chi = \chi_0, \end{cases} \tag{6.7} \]
where the O-constant depends only on $q$. Parts (i) and (ii) then follow as special cases of Theorem 4.13.
Alternatively, one can obtain (i) and (ii) as follows:
If $\chi \ne \chi_0$, then using partial summation and the fact that the partial sums $\sum_{n \le x} \chi(n)$ are bounded in this case, one can easily see that the Dirichlet series (6.6) converges in the half-plane $\sigma > 0$. Since a Dirichlet series represents an analytic function in its half-plane of convergence, this shows that $L(s,\chi)$ is analytic in $\sigma > 0$ when $\chi \ne \chi_0$.
In the case $\chi = \chi_0$ one can argue similarly with the arithmetic function $f(n) = \chi_0(n) - \phi(q)/q$. By (6.7), the partial sums $\sum_{n \le x} f(n)$ are bounded, so the Dirichlet series $F(s) = \sum_{n=1}^{\infty} f(n)n^{-s}$ converges in the half-plane $\sigma > 0$ and is therefore analytic in this half-plane. On the other hand, writing $\chi_0(n) = f(n) + \phi(q)/q$, we see that $L(s,\chi_0) = F(s) + (\phi(q)/q)\zeta(s)$ for $\sigma > 1$. Since $F(s)$ is analytic in $\sigma > 0$ and $\zeta(s)$ is analytic in $\sigma > 0$ with the exception of a pole at $s = 1$ with residue 1, we conclude that $L(s,\chi_0)$ is analytic in $\sigma > 0$ with the exception of a pole at $s = 1$ with residue $\phi(q)/q$.
6.4 Proof of Dirichlet's Theorem
In this section we prove Dirichlet's theorem in the quantitative version given in Theorem 6.2, modulo the nonvanishing result for $L(1,\chi)$ stated in part (iii) of Theorem 6.9. The latter result will be established in the next section.
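We fix positive integers $a$ and $q$ with $(a,q) = 1$, and we define the functions
\[ S_{a,q}(x) = \sum_{\substack{p \le x \\ p \equiv a \bmod q}} \frac{1}{p} \quad (x \ge 2), \]
\[ F_{a,q}(s) = \sum_{\substack{p \\ p \equiv a \bmod q}} \frac{1}{p^s} \quad (\sigma > 1), \]
\[ F_\chi(s) = \sum_{p} \frac{\chi(p)}{p^s} \quad (\sigma > 1). \]
Note that the Dirichlet series $F_{a,q}(s)$ and $F_\chi(s)$ converge absolutely in $\sigma > 1$. We have to show that
\[ S_{a,q}(x) = \frac{1}{\phi(q)} \log\log x + O(1) \quad (x \ge 3). \tag{6.8} \]
We will do so in several steps, which successively reduce the estimation of $S_{a,q}(x)$ to that of the functions $F_{a,q}(s)$, $F_\chi(s)$, $L(s,\chi)$, and ultimately to the non-vanishing of $L(1,\chi)$ for $\chi \ne \chi_0$.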
To ease the notation, we will not explicitly indicate the dependence of
error terms on q. Through the remainder of the proof, all O-constants are
allowed to depend on q.
Reduction to $F_{a,q}(s)$. We first show that (6.8) follows from
\[ F_{a,q}(\sigma) = \frac{1}{\phi(q)} \log\frac{1}{\sigma - 1} + O(1) \quad (\sigma > 1). \tag{6.9} \]
To see this, let $x \ge 3$ and take $\sigma = \sigma_x = 1 + 1/\log x$ in (6.9). Then the main term on the right of (6.9) is equal to the main term on the right of (6.8), and the error term in (6.9) is of the desired order $O(1)$. Thus, it suffices to show that the left-hand sides of these relations, i.e., $S_{a,q}(x)$ and $F_{a,q}(\sigma_x)$, differ by at most $O(1)$.
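To show this, we write
\[ S_{a,q}(x) - F_{a,q}(\sigma_x) = \sum_{\substack{p \le x \\ p \equiv a \bmod q}} \frac{1 - p^{1-\sigma_x}}{p} - \sum_{\substack{p > x \\ p \equiv a \bmod q}} \frac{1}{p^{\sigma_x}} = \Sigma_1 - \Sigma_2, \tag{6.10} \]
say. Since
\[ 1 - p^{1-\sigma_x} = 1 - \exp\Bigl(-\frac{\log p}{\log x}\Bigr) \ll \frac{\log p}{\log x} \quad (p \le x), \]
we have
\[ \Sigma_1 \ll \frac{1}{\log x} \sum_{p \le x} \frac{\log p}{p} \ll 1, \]
by Mertens' estimate. Moreover, by partial summation and Chebyshev's estimate,
\[ \Sigma_2 \le \sum_{p > x} \frac{1}{p^{\sigma_x}} = -\frac{\pi(x)}{x^{\sigma_x}} + \sigma_x \int_x^{\infty} \frac{\pi(u)}{u^{\sigma_x + 1}}\,du \ll \frac{1}{\log x} + \int_x^{\infty} \frac{1}{u^{\sigma_x} \log u}\,du \le \frac{1}{\log x} + \frac{x^{1-\sigma_x}}{(\sigma_x - 1)(\log x)} \ll 1. \]
Thus, both terms on the right-hand side of (6.10) are of order $O(1)$, which is what we wanted to show.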
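Reduction to $F_\chi(s)$. Let $\sigma > 1$, so that the series $F_{a,q}(\sigma)$ and $F_\chi(\sigma)$ are absolutely convergent. By the orthogonality relation for characters (part (iv) of Theorem 6.7) we have
\[ F_{a,q}(\sigma) = \sum_{p} \frac{1}{p^{\sigma}} \cdot \frac{1}{\phi(q)} \sum_{\chi \bmod q} \overline{\chi(a)}\chi(p) = \frac{1}{\phi(q)} \sum_{\chi \bmod q} \overline{\chi(a)} \sum_{p} \frac{\chi(p)}{p^{\sigma}} = \frac{1}{\phi(q)} \sum_{\chi \bmod q} \overline{\chi(a)}\,F_\chi(\sigma). \tag{6.11} \]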
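Reduction to Dirichlet L-functions. Since a Dirichlet character is completely multiplicative and of absolute value at most 1, the associated L-function $L(s,\chi)$ has an Euler product representation in the half-plane $\sigma > 1$, given by
\[ L(s,\chi) = \prod_{p} \Bigl(1 - \frac{\chi(p)}{p^s}\Bigr)^{-1}. \]
Taking logarithms on both sides (using the principal branch of the logarithm) we get, for $\sigma > 1$,
\[ \log L(s,\chi) = -\sum_{p} \log\Bigl(1 - \frac{\chi(p)}{p^s}\Bigr) = \sum_{p} \sum_{m=1}^{\infty} \frac{\chi(p)^m}{m p^{ms}} = F_\chi(s) + R_\chi(s), \]
where $F_\chi(s) = \sum_p \chi(p)p^{-s}$ is the function defined above and
\[ |R_\chi(s)| \le \sum_{p} \sum_{m=2}^{\infty} \frac{1}{m p^{m\sigma}} \le \sum_{p} \sum_{m=2}^{\infty} \frac{1}{m p^{m}} \le \sum_{p} \frac{1}{p^2} < \infty. \]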
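Thus, we have
\[ F_\chi(\sigma) = \log L(\sigma,\chi) + O(1) \quad (\sigma > 1). \tag{6.12} \]
Contribution of the principal character. If $\chi = \chi_0$, then, by part (ii) of Theorem 6.9, $L(s,\chi_0)$ has a simple pole at $s = 1$ with residue $\phi(q)/q$, so the function $H(s) = L(s,\chi_0) - \dfrac{\phi(q)/q}{s - 1}$ is analytic in $\sigma > 0$. In particular, $H(s)$ is bounded in any compact set contained in this half-plane, and hence satisfies
\[ |H(\sigma)| \le \frac{\phi(q)}{2q(\sigma - 1)} \quad (1 < \sigma \le \sigma_0) \tag{6.13} \]
with a suitable constant $\sigma_0 > 1$ (depending on $q$). Thus
\begin{align*}
\log L(\sigma,\chi_0) &= \log\Bigl(\frac{\phi(q)/q}{\sigma - 1}\Bigl(1 + H(\sigma)(\sigma - 1)\frac{q}{\phi(q)}\Bigr)\Bigr) \\
&= \log(\phi(q)/q) + \log\frac{1}{\sigma - 1} + \log\Bigl(1 + H(\sigma)(\sigma - 1)\frac{q}{\phi(q)}\Bigr) \\
&= \log\frac{1}{\sigma - 1} + O(1) \quad (1 < \sigma \le \sigma_0),
\end{align*}
where in the last step we used (6.13). By (6.12) it follows that
\[ F_{\chi_0}(\sigma) = \log\frac{1}{\sigma - 1} + O(1), \tag{6.14} \]
initially only in the range $1 < \sigma \le \sigma_0$, but in view of the trivial bound
\[ |F_{\chi_0}(\sigma)| \le \sum_{p} \frac{1}{p^{\sigma}} \le \sum_{p} \frac{1}{p^{\sigma_0}} < \infty \quad (\sigma > \sigma_0), \]
in the full range $\sigma > 1$.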
Contribution of the non-principal characters. If $\chi \ne \chi_0$, then, by part (i) of Theorem 6.9, $L(s,\chi)$ is analytic in $\sigma > 0$, and thus, in particular, continuous at $s = 1$. Moreover, by the last part of this result, $L(1,\chi) \ne 0$. Hence, $\log L(s,\chi)$ is analytic and thus continuous in a neighborhood of $s = 1$. In particular, there exists $\sigma_0 > 1$ such that $\log L(\sigma,\chi)$ is bounded in $1 < \sigma \le \sigma_0$. In view of (6.12), this implies
\[ F_\chi(\sigma) = O(1) \quad (\chi \ne \chi_0), \tag{6.15} \]
first for $1 < \sigma \le \sigma_0$, and then, since as before $F_\chi(\sigma)$ is bounded in $\sigma \ge \sigma_0$, for the full range $\sigma > 1$.
Proof of Dirichlet's theorem. Substituting the estimates (6.14) and (6.15) into (6.11), we obtain
\[ F_{a,q}(\sigma) = \frac{1}{\phi(q)}\,\overline{\chi_0(a)}\,F_{\chi_0}(\sigma) + O(1) = \frac{1}{\phi(q)} \log\frac{1}{\sigma - 1} + O(1) \quad (1 < \sigma \le 2), \]
since $\chi_0(a) = 1$ by the definition of a principal character and the assumption $(a,q) = 1$. This proves (6.9), and hence the asserted estimate (6.8).
6.5 The non-vanishing of $L(1,\chi)$
We now prove part (iii) of Theorem 6.9, which we restate in the following theorem.
Theorem 6.10 (Non-vanishing of $L(1,\chi)$). Let $q$ be a positive integer and $\chi$ a non-principal character modulo $q$. Then $L(1,\chi) \ne 0$.
The proof requires several auxiliary results, which we state as lemmas. The first lemma is reminiscent of the "3-4-1 inequality" of the previous chapter (Lemma 5.9), which was key to obtaining a zero-free region for the zeta function.
Lemma 6.11. Let
\[ P(s) = P_q(s) = \prod_{\chi \bmod q} L(s,\chi). \]
Then, for $\sigma > 1$,
\[ P(\sigma) \ge 1. \]
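Proof. Expanding the Dirichlet series $L(s,\chi)$ into Euler products and taking logarithms, we obtain, for $\sigma > 1$,
\begin{align*}
\log P(\sigma) &= \sum_{\chi \bmod q} \log L(\sigma,\chi) = \sum_{\chi \bmod q} \sum_{p} \log\Bigl(1 - \frac{\chi(p)}{p^{\sigma}}\Bigr)^{-1} \\
&= \sum_{\chi \bmod q} \sum_{p} \sum_{m=1}^{\infty} \frac{\chi(p)^m}{m p^{m\sigma}} = \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m p^{m\sigma}} \sum_{\chi \bmod q} \chi(p)^m.
\end{align*}
Since
\[ \sum_{\chi \bmod q} \chi(p)^m = \sum_{\chi \bmod q} \chi(p^m) = \begin{cases} \phi(q) & \text{if } p^m \equiv 1 \bmod q, \\ 0 & \text{else,} \end{cases} \]
by the complete multiplicativity of $\chi$ and the orthogonality relation for characters (part (ii) of Theorem 6.7), the right-hand side above is a sum of nonnegative terms. Hence $\log P(\sigma) \ge 0$, and the assertion of the lemma follows.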
Proof of Theorem 6.10 for complex characters $\chi$. We will use the above lemma to show that $L(1,\chi) \ne 0$ in the case $\chi$ is a complex character modulo $q$, i.e., if $\chi$ takes on non-real values. The argument is similar to that used in the proof of the non-vanishing of $\zeta(s)$ on the line $\sigma = 1$ (see Theorem 5.7).
We assume that $L(1,\chi_1) = 0$ for some complex character $\chi_1$ modulo $q$. We shall derive a contradiction from this assumption.
We first note that, since $\chi_1$ is a complex character, the characters $\chi_1$ and $\overline{\chi_1}$ are distinct, and neither character is equal to the principal character $\chi_0$. Hence, $\chi_0$, $\chi_1$, and $\overline{\chi_1}$ each contribute a factor to the product $P(\sigma)$ in Lemma 6.11. Splitting off these three factors, we obtain, for $\sigma > 1$,
\[ P(\sigma) = L(\sigma,\chi_0)\,L(\sigma,\chi_1)\,L(\sigma,\overline{\chi_1})\,Q(\sigma), \tag{6.16} \]
where
\[ Q(\sigma) = \prod_{\substack{\chi \bmod q \\ \chi \ne \chi_0, \chi_1, \overline{\chi_1}}} L(\sigma,\chi). \]
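We now examine the behavior of each term on the right of (6.16) as $\sigma \to 1+$. First, by part (ii) of Theorem 6.9, $L(s,\chi_0)$ has a simple pole at $s = 1$, so we have $L(\sigma,\chi_0) = O(1/(\sigma - 1))$ as $\sigma \to 1+$. Next, our assumption $L(1,\chi_1) = 0$ and the analyticity of $L(s,\chi_1)$ at $s = 1$ imply $L(\sigma,\chi_1) = O(\sigma - 1)$, and since
\[ L(\sigma,\overline{\chi_1}) = \sum_{n=1}^{\infty} \frac{\overline{\chi_1(n)}}{n^{\sigma}} = \overline{\sum_{n=1}^{\infty} \frac{\chi_1(n)}{n^{\sigma}}} = \overline{L(\sigma,\chi_1)}, \]
we also have $L(\sigma,\overline{\chi_1}) = O(\sigma - 1)$. Finally, by part (i) of Theorem 6.9, $Q(\sigma)$ is bounded as $\sigma \to 1+$.
It follows from these estimates that $P(\sigma) = O(\sigma - 1)$ as $\sigma \to 1+$. This contradicts the bound $P(\sigma) \ge 1$ of Lemma 6.11. Thus $L(1,\chi_1)$ cannot be equal to 0, and the proof of Theorem 6.10 for complex characters is complete.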
The above argument breaks down in the case of a real character $\chi_1$, since then $\overline{\chi_1} = \chi_1$ and in the above factorization of the product $P(\sigma)$ only one L-function corresponding to $\chi_1$ would appear, so the assumption $L(1,\chi_1) = 0$ would only give an estimate $P(\sigma) = O(1)$, which is not enough to obtain a contradiction. To prove the non-vanishing of $L(1,\chi)$ for real characters, a completely different, and more complicated, argument is needed. We prove several auxiliary results first.
Lemma 6.12. Let $\chi$ be a real character and let $f = 1 * \chi$. Then
\[ f(n) \ge \begin{cases} 1 & \text{if } n \text{ is a square,} \\ 0 & \text{otherwise.} \end{cases} \]
Proof. Since $\chi$ is multiplicative, so is $f$. Since $\chi$ is a real character modulo $q$, we have $\chi(p) = \pm 1$ if $p \nmid q$, and $\chi(p) = 0$ if $p \mid q$. Thus, at prime powers $p^m$ we have
\[ f(p^m) = 1 + \sum_{k=1}^{m} \chi(p)^k = \begin{cases} 1 & \text{if } p \mid q, \\ m + 1 & \text{if } p \nmid q \text{ and } \chi(p) = 1, \\ 0 & \text{if } p \nmid q,\ \chi(p) = -1, \text{ and } m \text{ is odd,} \\ 1 & \text{if } p \nmid q,\ \chi(p) = -1, \text{ and } m \text{ is even.} \end{cases} \]
It follows that
\[ f(p^m) \ge \begin{cases} 1 & \text{if } m \text{ is even,} \\ 0 & \text{if } m \text{ is odd.} \end{cases} \]
By the multiplicativity of $f$ this yields the assertion of the lemma.
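The lemma can be checked directly for a specific real character. The sketch below uses, as an assumed example, the non-principal character modulo 4 (equal to $1$ at $n \equiv 1$, $-1$ at $n \equiv 3$, and $0$ at even $n$) and computes $f = 1 * \chi$ by brute force; the function names are illustrative only.

```python
from math import isqrt

def chi(n):
    """Non-principal (real) character modulo 4."""
    return (0, 1, 0, -1)[n % 4]

def f(n):
    """Dirichlet convolution (1 * chi)(n) = sum of chi(d) over divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

def is_square(n):
    return isqrt(n) ** 2 == n

# Verify f(n) >= 0 for all n, and f(n) >= 1 when n is a perfect square.
for n in range(1, 501):
    value = f(n)
    assert value >= 0
    if is_square(n):
        assert value >= 1

print([f(n) for n in range(1, 11)])
```

For this particular character the four cases in the proof are all visible: for instance $f(25) = 3$ (case $\chi(p) = 1$), $f(3) = 0$ and $f(9) = 1$ (case $\chi(p) = -1$), and $f(4) = 1$ (case $p \mid q$).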
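Lemma 6.13. We have
\[ \sum_{n \le x} \frac{1}{\sqrt{n}} = 2\sqrt{x} + A + O\Bigl(\frac{1}{\sqrt{x}}\Bigr) \quad (x \ge 1), \]
where $A$ is a constant.
Proof. By Euler's summation formula, we have
\begin{align*}
\sum_{n \le x} \frac{1}{\sqrt{n}} &= 1 - \{x\}x^{-1/2} + \int_1^x u^{-1/2}\,du + \int_1^x \{u\}(-1/2)u^{-3/2}\,du \\
&= 1 + O\Bigl(\frac{1}{\sqrt{x}}\Bigr) + 2(\sqrt{x} - 1) - \frac{1}{2}\int_1^{\infty} \{u\}u^{-3/2}\,du + O\Bigl(\int_x^{\infty} u^{-3/2}\,du\Bigr) \\
&= 2\sqrt{x} + A + O\Bigl(\frac{1}{\sqrt{x}}\Bigr)
\end{align*}
with $A = -1 - (1/2)\int_1^{\infty} \{u\}u^{-3/2}\,du$.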
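A quick numerical experiment makes the shape of this estimate visible: the difference $D(x) = \sum_{n \le x} n^{-1/2} - 2\sqrt{x}$ should settle down to the constant $A$ at rate $O(x^{-1/2})$. The sketch below is illustrative only; the function name `D` is introduced here, and the comparison value $\zeta(1/2) \approx -1.4603545$ for the limit is a standard fact quoted for reference rather than something derived in the text.

```python
from math import fsum, sqrt

def D(x):
    """sum_{n <= x} 1/sqrt(n) - 2*sqrt(x), for a positive integer x."""
    return fsum(1 / sqrt(n) for n in range(1, x + 1)) - 2 * sqrt(x)

# The values should stabilize as x grows, with spacing shrinking like 1/sqrt(x).
for x in (10**2, 10**4, 10**6):
    print(x, D(x))
```

The printed values differ from each other by amounts comparable to $1/\sqrt{x}$, in line with the error term of the lemma.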
Lemma 6.14. Let $\chi$ be a non-principal character modulo $q$ and $s$ a complex number in the half-plane $\sigma > 0$. Then
\[ \sum_{n \le x} \frac{\chi(n)}{n^s} = L(s,\chi) + O_{q,s}(x^{-\sigma}) \quad (x \ge 1). \tag{6.17} \]
Proof. Let $M(u) = M(\chi,u) = \sum_{n \le u} \chi(n)$. By partial summation we have, for $y > x$,
\[ \sum_{x < n \le y} \frac{\chi(n)}{n^s} = \frac{M(y)}{y^s} - \frac{M(x)}{x^s} + s\int_x^y M(u)u^{-s-1}\,du. \]
Since $\chi$ is non-principal, we have $M(u) = O_q(1)$ by Corollary 6.8, so the right-hand side above is bounded by
\[ \ll_q x^{-\sigma} + |s|\int_x^y u^{-\sigma-1}\,du \ll_{q,s} x^{-\sigma}. \]
Letting $y \to \infty$, the left-hand side tends to $L(s,\chi) - \sum_{n \le x} \chi(n)n^{-s}$, and the result follows.
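Proof of Theorem 6.10 for real characters $\chi$. We fix a real, non-principal character $\chi$ modulo $q$. Throughout the proof, we let constants in O-estimates depend on $\chi$, and hence also on $q$, without explicitly indicating this dependence.
We let $f$ be defined as in Lemma 6.12 and consider the sum
\[ S(x) = \sum_{n \le x} \frac{f(n)}{\sqrt{n}}. \]
On the one hand, by Lemma 6.12 we have
\[ S(x) \ge \sum_{m^2 \le x} \frac{f(m^2)}{\sqrt{m^2}} \ge \sum_{m \le \sqrt{x}} \frac{1}{m} \gg \log x \quad (x \ge 2). \tag{6.18} \]
On the other hand, we can estimate $S(x)$ by writing $f(n) = \sum_{d \mid n} \chi(d) = \sum_{dm = n} \chi(d)$ and splitting up the resulting double sum according to the Dirichlet hyperbola method:
\[ S(x) = \sum_{\substack{d, m \ge 1 \\ dm \le x}} \frac{\chi(d) \cdot 1}{\sqrt{d}\sqrt{m}} = \Sigma_1 + \Sigma_2 - \Sigma_3, \tag{6.19} \]
where
\[ \Sigma_1 = \sum_{d \le \sqrt{x}} \frac{\chi(d)}{\sqrt{d}} \sum_{m \le x/d} \frac{1}{\sqrt{m}}, \qquad \Sigma_2 = \sum_{m \le \sqrt{x}} \frac{1}{\sqrt{m}} \sum_{d \le x/m} \frac{\chi(d)}{\sqrt{d}}, \qquad \Sigma_3 = \sum_{d \le \sqrt{x}} \frac{\chi(d)}{\sqrt{d}} \sum_{m \le \sqrt{x}} \frac{1}{\sqrt{m}}. \]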
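The last three sums can be estimated using Lemmas 6.13 and 6.14. We obtain
\begin{align*}
\Sigma_1 &= \sum_{d \le \sqrt{x}} \frac{\chi(d)}{\sqrt{d}} \Bigl(2\sqrt{x/d} + A + O\Bigl(\frac{1}{\sqrt{x/d}}\Bigr)\Bigr) \\
&= 2\sqrt{x} \sum_{d \le \sqrt{x}} \frac{\chi(d)}{d} + A \sum_{d \le \sqrt{x}} \frac{\chi(d)}{\sqrt{d}} + O\Bigl(\sum_{d \le \sqrt{x}} \frac{1}{\sqrt{x}}\Bigr) \\
&= 2\sqrt{x}\Bigl(L(1,\chi) + O\Bigl(\frac{1}{\sqrt{x}}\Bigr)\Bigr) + A\Bigl(L(1/2,\chi) + O\Bigl(\frac{1}{x^{1/4}}\Bigr)\Bigr) + O(1) \\
&= 2\sqrt{x}\,L(1,\chi) + O(1),
\end{align*}
\begin{align*}
\Sigma_2 &= \sum_{m \le \sqrt{x}} \frac{1}{\sqrt{m}} \Bigl(L(1/2,\chi) + O\Bigl(\frac{1}{\sqrt{x/m}}\Bigr)\Bigr) \\
&= L(1/2,\chi)\bigl(2x^{1/4} + O(1)\bigr) + O\Bigl(\sum_{m \le \sqrt{x}} \frac{1}{\sqrt{x}}\Bigr) \\
&= L(1/2,\chi)\,2x^{1/4} + O(1),
\end{align*}
\begin{align*}
\Sigma_3 &= \Bigl(L(1/2,\chi) + O\Bigl(\frac{1}{x^{1/4}}\Bigr)\Bigr)\bigl(2x^{1/4} + O(1)\bigr) \\
&= L(1/2,\chi)\,2x^{1/4} + O(1).
\end{align*}
Substituting these estimates into (6.19), we get
\[ S(x) = 2\sqrt{x}\,L(1,\chi) + O(1). \]
If now $L(1,\chi) = 0$, then we would have $S(x) = O(1)$, contradicting (6.18). Hence $L(1,\chi) \ne 0$, and the proof of Theorem 6.10 is complete.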
6.6 Exercises
6.1 Show that if every arithmetic progression $a \bmod q$ with $(a,q) = 1$ contains at least one prime, then every such progression contains infinitely many primes.
6.2 Show that if $f$ is a periodic, completely multiplicative arithmetic function, then $f$ is a Dirichlet character to some modulus $q$.
6.3 Let $\chi$ be a nonprincipal character mod $q$. Show that for all positive integers $a < b$ we have $\bigl|\sum_{n=a}^{b} \chi(n)\bigr| \le (1/2)\phi(q)$.
6.4 Given a rational number $a$ with $0 < a \le 1$, define $\zeta(s,a) = \sum_{n=0}^{\infty} (n + a)^{-s}$. Show that any Dirichlet L-function can be expressed in terms of the functions $\zeta(s,a)$, and that, conversely, any such function $\zeta(s,a)$ with rational $a$ can be expressed in terms of Dirichlet L-functions.
6.5 Let a and q be positive integers with (a, q) = 1. Express the Dirichlet
series
na mod q
(n)n
s
in the half-plane > 1 in terms of Dirichlet
L-functions.
6.6 Given an arithmetic function f, and a real number , let f
(n) =
f(n)e
2in
, and let F
(s) in
terms of Dirichlet L-functions.
6.7 Let $f(n)$ denote the remainder of $n$ modulo 5 (so that $f(n) \in \{0, 1, 2, 3, 4\}$ for each $n$), and let $F(s) = \sum_{n=1}^{\infty} f(n)n^{-s}$ be the Dirichlet series of $f$.
(i) Express $F(s)$ in the half-plane $\sigma > 1$ in terms of Dirichlet L-functions.
(ii) Show that $F(s)$ has a meromorphic continuation to the half-plane $\sigma > 0$ and find all poles (if any) of $F(s)$ in this half-plane, and their residues.
6.8 Suppose $\chi$ is a non-principal character modulo $q$. We know (from the general theory of L-series) that the Dirichlet L-series $L(s,\chi) = \sum_{n=1}^{\infty} \chi(n)n^{-s}$ converges at $s = 1$. Obtain an explicit estimate (i.e., one with numerical constants, rather than O's) for the speed of convergence, i.e., a bound for the tails $\bigl|\sum_{n > x} \chi(n)n^{-1}\bigr|$.
6.9 Given a positive integer $a$ with decimal representation $a = a_1 \dots a_k$ (so that $a_i \in \{0, 1, \dots, 9\}$, $a_1 \ne 0$), let $P_a$ denote the set of primes whose decimal representation begins with the string $a_1 \dots a_k$, and let $Q_a$ denote the set of primes whose decimal representation ends with this string. Let $S_a(x) = \sum_{p \le x,\, p \in P_a} 1/p$ and $T_a(x) = \sum_{p \le x,\, p \in Q_a} 1/p$. Obtain asymptotic estimates for $S_a(x)$ and $T_a(x)$ with error term $O_a(1)$. Which of these two functions is asymptotically larger?
Appendix A
Some results from analysis
A.1 Evaluation of $\sum_{n=1}^{\infty} n^{-2}$
Theorem A.1. $\sum_{n=1}^{\infty} n^{-2} = \pi^2/6$.
Proof. We use the following result from Fourier analysis.
Theorem A.2 (Parseval's Formula). Let $f$ be a bounded and integrable function on $[0,1]$, and define the Fourier coefficients of $f$ by $a_n = \int_0^1 f(x)e^{-2\pi i n x}\,dx$. Then
\[ \int_0^1 |f(x)|^2\,dx = \sum_{n \in \mathbb{Z}} |a_n|^2. \]
This result can be found in most standard texts on Fourier series, and in many texts on Differential Equations (such as the text by Boyce/DiPrima). It is usually stated in terms of Fourier sine and cosine coefficients, but the above form is simpler and easier to remember.
We apply Parseval's formula to the function $f(x) = x$ (which, when extended by periodicity to all of $\mathbb{R}$, becomes the fractional parts function $\{x\}$). We have $a_0 = \int_0^1 x\,dx = 1/2$ and, for $n \ne 0$,
\[ a_n = \int_0^1 x e^{-2\pi i n x}\,dx = -\frac{1}{2\pi i n}\,e^{-2\pi i n x}\,x\,\Big|_0^1 + \frac{1}{2\pi i n}\int_0^1 e^{-2\pi i n x}\,dx = -\frac{1}{2\pi i n}. \]
Hence, the right-hand side in Parseval's formula equals
\[ \frac{1}{4} + \sum_{n \in \mathbb{Z},\, n \ne 0} \frac{1}{4\pi^2 n^2} = \frac{1}{4} + \frac{1}{2\pi^2}\sum_{n=1}^{\infty} \frac{1}{n^2}, \]
whereas the left side is equal to $\int_0^1 x^2\,dx = 1/3$. Setting the two expressions equal and solving for $\sum_{n=1}^{\infty} 1/n^2$ gives $\sum_{n=1}^{\infty} 1/n^2 = (1/3 - 1/4)\,2\pi^2 = \pi^2/6$ as claimed.
Remark. There are several alternative proofs of this result. One consists of expanding the function defined by $f(x) = x^2$ on $[0,1)$, and extended to a periodic function with period 1, into a Fourier series: $f(x) = \sum_{n \in \mathbb{Z}} a_n e^{2\pi i n x}$, with $a_0 = \int_0^1 x^2\,dx = 1/3$ and $a_n = 1/(2\pi^2 n^2) + i/(2\pi n)$ for $n \ne 0$. From the general theory of Fourier series, the series (with the terms for $n$ and $-n$ grouped together) converges for all $x$; its sum equals $f(x)$ at points where $f(x)$ is continuous, and $(1/2)(f(x-) + f(x+))$ at any point $x$ at which $f$ has a jump. We take $x = 0$. The terms $i/(2\pi n)$ cancel in pairs $\pm n$, so the Fourier series at this point equals $1/3 + (1/\pi^2)\sum_{n=1}^{\infty} 1/n^2$. On the other hand, we have $(1/2)(f(0-) + f(0+)) = (1/2)(1 + 0) = 1/2$. Setting the two expressions equal and solving for $\sum_{n=1}^{\infty} 1/n^2$, we again obtain the evaluation $\pi^2/6$ for this sum.
Yet another approach is to express $\sum_{n=1}^{\infty} 1/n^2 = \zeta(2)$ in terms of $\zeta(-1)$, using the functional equation of the zeta function, and evaluating $\zeta(-1)$. This is the method given in Apostol's book (see Theorem 12.17).
A.2 Infinite products
Given a sequence $\{a_n\}_{n=1}^{\infty}$ of real or complex numbers, the infinite product
\[ \prod_{n=1}^{\infty} (1 + a_n) \tag{A.1} \]
is said to be convergent to a (real or complex) limit $P$, if the partial products $P_n = \prod_{k=1}^{n} (1 + a_k)$ converge to $P$ as $n \to \infty$. The product is said to be absolutely convergent, if the product $\prod_{n=1}^{\infty} (1 + |a_n|)$ converges.
It is easy to see that absolute convergence of an infinite product implies convergence. Indeed, defining $P_n$ as above and setting $P_n^* = \prod_{k=1}^{n} (1 + |a_k|)$, we have, for $m > n$,
\[ |P_m - P_n| = \prod_{k=1}^{n} |1 + a_k| \cdot \Bigl|\prod_{k=n+1}^{m} (1 + a_k) - 1\Bigr| \le \prod_{k=1}^{n} (1 + |a_k|) \Bigl(\prod_{k=n+1}^{m} (1 + |a_k|) - 1\Bigr) = P_m^* - P_n^*, \]
so by Cauchy's criterion for the sequence $\{P_n^*\}$, the sequence $\{P_n\}$ satisfies Cauchy's criterion and hence converges.
The following lemma gives a criterion for the absolute convergence of an infinite product.
Lemma A.3. The product (A.1) converges absolutely if and only if
\[ \sum_{n=1}^{\infty} |a_n| < \infty. \tag{A.2} \]
Proof. Let $P_n^* = \prod_{k=1}^{n} (1 + |a_k|)$. By definition, the product (A.1) converges absolutely if and only if the sequence $\{P_n^*\}$ converges. Since this sequence is non-decreasing, it converges if and only if it is bounded. If (A.2) holds, then, in view of the inequality $1 + x \le e^x$ (valid for any real $x$), we have $P_n^* \le \exp\bigl(\sum_{k=1}^{n} |a_k|\bigr) \le e^S$, where $S$ is the value of the infinite series in (A.2), so $P_n^*$ is bounded. Conversely, if $P_n^*$ is bounded, then using the inequality $P_n^* \ge 1 + \sum_{k=1}^{n} |a_k|$ (which follows by multiplying out the factors in $P_n^*$ and discarding terms involving more than one factor $|a_k|$), we see that the series in (A.2) converges.
The next lemma shows that the reciprocal of a convergent infinite product is the product of the reciprocals.
Lemma A.4. If the product (A.1) converges to a value $P \ne 0$, then the product $(*)$ $\prod_{n=1}^{\infty} (1 + a_n)^{-1}$ converges to $P^{-1}$. If, in addition, the product (A.1) is absolutely convergent, then so is the product $(*)$.
Proof. We first note that if the product (A.1) converges to a non-zero limit $P$, then none of the factors $1 + a_k$ can be zero (since if $1 + a_k = 0$ then $P_n = 0$ for all $n \ge k$, and so $P = \lim_{n \to \infty} P_n = 0$). Hence, the partial products $P_n^* = \prod_{k=1}^{n} (1 + a_k)^{-1}$ are well-defined, and equal to $P_n^{-1}$, where $P_n = \prod_{k=1}^{n} (1 + a_k)$. The convergence of $P_n$ to $P$ therefore implies that of $P_n^*$ to $P^{-1}$, proving the first assertion of the lemma. If the product (A.1) converges absolutely, then, by Lemma A.3, we have $\sum_{n=1}^{\infty} |a_n| < \infty$ and hence $|a_n| \le 1/2$ for sufficiently large $n$. Writing $(1 + a_n)^{-1} = 1 + a_n^*$, and using the inequality $|(1 + x)^{-1} - 1| \le |x| \sum_{n=0}^{\infty} |x|^n \le 2|x|$ ($|x| \le 1/2$), we see that for such $n$, $|a_n^*| \le 2|a_n|$. Hence, the convergence of $\sum_{n=1}^{\infty} |a_n|$ implies that of $\sum_{n=1}^{\infty} |a_n^*|$, and applying the criterion for absolute convergence given in Lemma A.3 we conclude that the product $\prod_{n=1}^{\infty} (1 + a_n^*)$, i.e., the product $(*)$, converges absolutely.