A diluted version of the perceptron model
arXiv:math/0603162v1 [math.PR] 7 Mar 2006
David Márquez-Carreras¹, Carles Rovira¹ and Samy Tindel²

¹ Facultat de Matemàtiques, Universitat de Barcelona,
Gran Via 585, 08007-Barcelona, Spain
e-mail: [email protected], [email protected]

² Institut Elie Cartan, Université de Nancy 1,
BP 239, 54506-Vandoeuvre-lès-Nancy, France
e-mail: [email protected]
Abstract
This note is concerned with a diluted version of the perceptron
model. We establish a replica symmetric formula at high temperature,
which is achieved by studying the asymptotic behavior of a given spin
magnetization. Our main task will be to identify the order parameter
of the system.
Keywords: spin glasses, perceptron model, magnetization, order parameter.
MSC: 60G15, 82D30
¹ Partially supported by DGES grant BFM2003-01345.

1 Introduction
A number of spectacular advances have occurred in spin glass theory
during the last few years, and it could easily be argued that this
topic, at least as far as the Sherrington-Kirkpatrick model is concerned, has
reached a certain level of maturity from the mathematical point of view:
the cavity method has been set up in a clear and effective way in [9], some
monotonicity properties along a smart path have been discovered in [4], and
these elements have been combined in [10] in order to obtain a completely
rigorous proof of the Parisi solution [7].
However, there are some canonical models of mean field spin glasses for
which the basic theory is far from being complete, and this paper proposes
to study the high temperature behavior of one of them, namely the diluted
perceptron model, which can be described as follows: for N ≥ 1, consider the
configuration space Σ_N = {−1, 1}^N, and for σ = (σ_1, …, σ_N) ∈ Σ_N, define
a Hamiltonian −H_{N,M}(σ) by
$$ -H_{N,M}(\sigma) = \sum_{k\le M} \eta_k\, u\Big( \sum_{i\le N} g_{i,k}\,\gamma_{i,k}\,\sigma_i \Big). \qquad (1) $$
In this Hamiltonian, M stands for a positive integer such that M = αN
for a given α ∈ (0, 1); u is a bounded continuous function defined on R;
{g_{i,k}, i ≥ 1, k ≥ 1} and {γ_{i,k}, i ≥ 1, k ≥ 1} are two independent families
of independent random variables, g_{i,k} following a standard Gaussian law and
γ_{i,k} being a Bernoulli random variable with parameter γ/N, which we denote
by B(γ/N). Finally, {η_k, k ≥ 1} stands for an arbitrary family of numbers,
with η_k ∈ {0, 1}, even if the case of interest for us will be η_k = 1 for all
k ≤ M. Associated to this Hamiltonian, define a random Gibbs measure G_N
on Σ_N, whose density with respect to the uniform measure µ_N is given by
Z_{N,M}^{-1} exp(−H_{N,M}(σ)), where the partition function Z_{N,M} is defined by
$$ Z_{N,M} = \sum_{\sigma\in\Sigma_N} \exp\big( -H_{N,M}(\sigma) \big). $$
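For intuition on these finite-volume objects, Z_{N,M} and a Gibbs average can be computed exactly by brute force when N is tiny. The sketch below is purely illustrative: the choice of u and all parameter values are arbitrary, not taken from the paper.

```python
import itertools
import math
import random

def gibbs(N, M, gamma, u, rng):
    """Brute-force Z_{N,M} and <sigma_1> for the diluted perceptron
    Hamiltonian -H(sigma) = sum_{k<=M} u(sum_{i<=N} g_{i,k} gamma_{i,k} sigma_i),
    with g_{i,k} standard Gaussian and gamma_{i,k} ~ B(gamma/N)."""
    g = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
    dil = [[1 if rng.random() < gamma / N else 0 for _ in range(M)]
           for _ in range(N)]
    Z, mag = 0.0, 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        minus_H = sum(u(sum(g[i][k] * dil[i][k] * sigma[i] for i in range(N)))
                      for k in range(M))
        w = math.exp(minus_H)          # Boltzmann weight exp(-H(sigma))
        Z += w
        mag += sigma[0] * w
    return Z, mag / Z

u = lambda x: 0.1 * math.tanh(x)       # one bounded choice of u, so U_inf = 0.1
Z, m1 = gibbs(N=6, M=3, gamma=1.0, u=u, rng=random.Random(0))
```

For u ≡ 0 the weights are all 1, so the function should return Z = 2^N and zero magnetization, which gives an easy sanity check.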
In the sequel, we will denote by ⟨f⟩ the average of a function f : Σ_N^n → R
with respect to dG_N^{⊗n}, i.e.
$$ \langle f\rangle = Z_{N,M}^{-n} \sum_{(\sigma^1,\dots,\sigma^n)\in\Sigma_N^n} f(\sigma^1, \dots, \sigma^n)\, \exp\Big( -\sum_{l\le n} H_{N,M}(\sigma^l) \Big). $$
The measure described above is of course a generalization of the usual
perceptron model, which has been introduced for neural computation purposes (see [5]), and whose high temperature behavior has been described in
[9, Chapter 3], or [8] for an approach based on convexity properties of the
Hamiltonian. Indeed the usual perceptron model is induced by a Hamiltonian
Ĥ_{N,M} on Σ_N given by
$$ -\hat H_{N,M}(\sigma) = \sum_{k\le M} u\Big( \frac{1}{N^{1/2}} \sum_{i\le N} g_{i,k}\,\sigma_i \Big), \qquad (2) $$
where we have kept the notations introduced for equation (1). Thus, our
model can be seen as a real diluted version of (2), in the sense that in our
model, each condition Σ_{i≤N} g_{i,k}γ_{i,k}σ_i ≥ 0 only involves, on average, a finite
number of spins, uniformly in N. It is worth noticing at this point that
this last requirement fits better with the initial neural computation motivation,
since in a one-layer perceptron, an output is generally obtained by a threshold
function applied to a certain number of spins, which does not grow linearly
with the size of the system. Furthermore, our coefficient γ is arbitrarily large,
which means that the global interaction between spins is not trivial. Another
motivation for the study of the system induced by (1) can be found in [2].
Indeed, in this latter article, a social interaction model is proposed, based on
a Hopfield-like (or perceptron-like) diluted Hamiltonian with parameters N
and M, where N represents the number of social agents and M the diversity
of these agents, the number of interactions of each agent varying with the
dilution parameter. However, in [2], the equilibrium of the system is studied
only when M is a fixed number. The result we will explain later on can thus
be read as follows: as soon as the diversity M does not grow faster than
a small proportion of N, the capacity of the social interaction system is not
attained.
Let us turn now to a brief description of the results contained in this
paper: in fact, we will try to get a replica symmetric formula for the system
when M is a small proportion of N, which amounts to identifying the limit of
(1/N) log(Z_{N,M}) when N → ∞, M = αN. This will be achieved, as in the diluted
SK model studied through the cavity method (see [3] for a study based on
monotonicity methods), once the limiting law for the magnetization ⟨σ_i⟩ is
obtained. This will thus be our first aim, and in order to obtain that result,
we will try to adapt the method designed in [9, Chapter 7]. However, in
our case, the identification of the limiting law for ⟨σ_i⟩ will be done through
an intricate fixed point argument, involving a map T : P → P (where P
stands for the set of probability measures on [−1, 1]), which in turn involves
a kind of P(λ) ⊗ P(µ) measure, for two independent Poisson measures P(λ)
and P(µ). For the sake of readability, we will give the details of (almost) all
the computations we will need in order to establish our replica symmetric
formula, but it should be mentioned at this point that our main contribution,
with respect to [9, Chapter 7], is the construction of the invariant measure.
More specifically, our paper is divided as follows:
• In Section 2, we will establish a decorrelation result for two arbitrary
spins. Namely, setting U_∞ = ‖u‖_∞, for αU_∞ small enough, we will
show that
$$ E\big[ |\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle| \big] \le \frac{K}{N}, $$
for a constant K > 0.
• In Section 3, we will study the asymptotic behavior of the magnetization
of m spins, where m is an arbitrary integer. Here again, if αU_∞ is
small enough, we will see that
$$ E\Big[ \sum_{i\le m} |\langle\sigma_i\rangle - z_i| \Big] \le \frac{Km^3}{N}, $$
where z_1, …, z_m is a family of i.i.d. random variables with law µ_{α,γ}, and
µ_{α,γ} is the fixed point of the map T alluded to above, whose precise
description will be given at the beginning of Section 3.
• Finally, in Section 4, we obtain the replica symmetric formula for
our model: set
$$ \bar V_p = \int \Big\langle \exp\Big( u\Big( \sum_{i\le p} g_{i,M}\,\sigma_i \Big)\Big)\Big\rangle_{(x_1,\dots,x_p)}\, d\mu_{\alpha,\gamma}(x_1)\cdots d\mu_{\alpha,\gamma}(x_p), $$
$$ G(\gamma) = \alpha \log\Bigg( \sum_{p=0}^{\infty} \frac{\gamma^p}{p!}\, e^{-\gamma}\, E\bigg[ \frac{\bar V_{p+1}}{\bar V_p} \bigg] \Bigg), $$
where ⟨·⟩_x means integration with respect to the product measure ν
on {−1, 1}^p such that ∫σ_i dν = x_i. Let F : [0, 1] → R_+ be defined by
F(0) = log 2 − αu(0) and F′(γ) = G(γ). Then, if αU_∞ is small enough,
we will get that
$$ \bigg| \frac{1}{N}\, E[\log(Z_{N,M})] - F(\gamma) \bigg| \le \frac{K}{N}, $$
for a strictly positive constant K.
All these results will be described in greater detail in the corresponding
sections.
2 Spin correlations
As in [9, Chapter 7], the first step towards a replica symmetric formula will
be to establish a decorrelation result for two arbitrary spins in the system.
However, a much more general property holds true, and we will turn now to
its description: for j ≤ N, let T_j be the transformation of Σ_N^n that, for a
configuration (σ^1, …, σ^n) in Σ_N^n, exchanges the j-th coordinates of σ^1 and
σ^2. More specifically, let f : Σ_N^n → R, with n ≥ 2, and let us write, for
j ≤ N,
$$ f = f\big( \sigma^1_{j^c}, \sigma^1_j;\ \sigma^2_{j^c}, \sigma^2_j;\ \dots;\ \sigma^n_{j^c}, \sigma^n_j \big), $$
where, for l = 1, …, n, σ^l_{j^c} = (σ^l_1, …, σ^l_{j−1}, σ^l_{j+1}, …, σ^l_N). Then define f ∘ T_j
by
$$ f\circ T_j(\sigma^1, \dots, \sigma^n) = f\big( \sigma^1_{j^c}, \sigma^2_j;\ \sigma^2_{j^c}, \sigma^1_j;\ \dots;\ \sigma^n_{j^c}, \sigma^n_j \big). \qquad (3) $$
For j ≤ N − 1, we will call Uj the equivalent transformation on ΣnN −1 .
Definition 2.1 We say that Property P(N, γ_0, B) is satisfied if the following
requirement is true: let f and f′ be two functions on Σ_N^n depending on m
coordinates, such that f ≥ 0, f′ ∘ T_N = −f′, and there exists Q ≥ 0 such
that |f′| ≤ Qf; then, if γ ≤ γ_0, we have
$$ E\bigg[ \frac{|\langle f'\rangle|}{\langle f\rangle} \bigg] \le \frac{mQB}{N}, $$
for any Hamiltonian of the form (1), uniformly in η.
Set now U_∞ = ‖u‖_∞. With Definition 2.1 in hand, one of the purposes of
this section is to prove the following theorem.
Theorem 2.2 Let γ_0 be a positive number, and let U_∞ be small enough, so that
$$ 4U_\infty\,\alpha\gamma_0^2\, e^{4U_\infty}\, e^{\alpha\gamma_0(e^{4U_\infty}-1)} \big( 3 + 2\gamma_0 + \alpha(\gamma_0^2+\gamma_0^3)e^{4U_\infty} \big) < 1. \qquad (4) $$
Then there exists a number B_0(γ_0, U_∞) such that, if γ ≤ γ_0, the property
P(N, γ_0, B_0) holds true for each N ≥ 1.
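Condition (4) is fully explicit, so it is easy to probe numerically which values of U_∞ are admissible for a given α and γ_0. A small sketch (the sample parameter values are arbitrary choices, not taken from the paper):

```python
import math

def lhs_condition4(alpha, gamma0, U):
    """Left-hand side of condition (4):
    4 U alpha gamma0^2 e^{4U} e^{alpha gamma0 (e^{4U}-1)}
        * (3 + 2 gamma0 + alpha (gamma0^2 + gamma0^3) e^{4U})."""
    e4u = math.exp(4.0 * U)
    return (4.0 * U * alpha * gamma0 ** 2 * e4u
            * math.exp(alpha * gamma0 * (e4u - 1.0))
            * (3.0 + 2.0 * gamma0 + alpha * (gamma0 ** 2 + gamma0 ** 3) * e4u))

value = lhs_condition4(0.5, 1.0, 0.01)   # well below 1: condition (4) holds
```

The left-hand side vanishes as U_∞ → 0 and grows quickly in U_∞, so for fixed α and γ_0 the condition always holds for U_∞ small enough.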
In the previous theorem, notice that the value of γ_0 can be picked arbitrarily;
we then have to choose U_∞, which also implicitly contains the temperature
parameter, accordingly. Let us also mention that the spin decorrelation
follows easily from the last result:
Corollary 2.3 Assuming (4), there exists K > 0 such that, for all γ < γ_0,
$$ E\big[ |\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle| \big] \le \frac{K}{N}. $$
Proof: It is an easy consequence of property P(N, γ_0, B_0) applied to n = 2,
f = 1 and f′(σ^1, σ^2) = σ^1_1(σ^1_2 − σ^2_2).
We will now prepare the ground for the proof of Theorem 2.2, which will
be based on an induction argument over N. A first step in this direction will
be to state the cavity formula for our model: for σ = (σ_1, …, σ_N) ∈ Σ_N, we
write
$$ \rho \equiv (\rho_1, \dots, \rho_{N-1}) = (\sigma_1, \dots, \sigma_{N-1}) \in \Sigma_{N-1}. $$
Then the Hamiltonian (1) can be decomposed into
$$ -H_{N,M}(\sigma) = \sum_{k\le M} \eta_k\,\gamma_{N,k}\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma_i + g_{N,k}\sigma_N \Big) - H^-_{N-1,M}(\rho), $$
with
$$ -H^-_{N-1,M}(\rho) = \sum_{k\le M} \eta_k^-\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma_i \Big), \quad\text{and}\quad \eta_k^- = \eta_k(1-\gamma_{N,k}). \qquad (5) $$
Note that in H^-_{N−1,M} the coefficients η_k^- = η_k(1 − γ_{N,k}) are not deterministic,
and hence H^-_{N−1,M} is not really of the same kind as H_{N,M}. However, this
problem can be solved by conditioning on {γ_{N,k}, k ≤ M}. Then, given the
randomness contained in the γ_{N,k}, the expression H^-_{N−1,M}(ρ) is the Hamiltonian
of a (N − 1)-spin system with γ^-_{i,k} ∼ B(γ^-/(N−1)), where γ^- = γ(N−1)/N, and so
γ^- ≤ γ ≤ γ_0.
Thus, given a function f : Σ_N^n → R, we easily get the following decomposition
of the mean value of f with respect to G_N^{⊗n}:
$$ \langle f \rangle = \frac{\langle \mathrm{Av}\, f\xi \rangle_-}{\langle \mathrm{Av}\,\xi \rangle_-}, \qquad (6) $$
with
$$ \xi = \exp\Bigg( \sum_{l\le n}\,\sum_{k\le M} \eta_k\,\gamma_{N,k}\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma^l_i + g_{N,k}\sigma^l_N \Big) \Bigg), \qquad (7) $$
and with ⟨f̄⟩_- defined, for a given f̄ : Σ_{N−1}^n → R, by
$$ \langle \bar f \rangle_- = \frac{ \displaystyle\sum_{(\rho^1,\dots,\rho^n)\in\Sigma^n_{N-1}} \bar f(\rho^1,\dots,\rho^n)\, \exp\Big( -\sum_{l\le n} H^-_{N-1,M}(\rho^l) \Big) }{ \displaystyle\sum_{(\rho^1,\dots,\rho^n)\in\Sigma^n_{N-1}} \exp\Big( -\sum_{l\le n} H^-_{N-1,M}(\rho^l) \Big) }. $$
Notice also that in expression (6), Av stands for the average with respect to
the last component of the system: namely, if f = f(ρ^1, σ^1_N, …, ρ^n, σ^n_N), then
$$ \mathrm{Av}\, f(\rho^1, \dots, \rho^n) = \frac{1}{2^n} \sum_{\sigma^j_N = \pm1,\ j\le n} f\big( \rho^1, \sigma^1_N, \dots, \rho^n, \sigma^n_N \big). $$
Let us introduce now a little more notation: in the sequel we will have to
take expectations for a fixed value of ξ given at (7). Let us thus denote by
E_{γ_N} the expectation given γ_{N,k}, k ≤ M, and define
$$ E_{-,\gamma_N}[\,\cdot\,] = E_{\gamma_N}\big[\, \cdot\ \big|\ g_{N,k},\ g_{i,k},\ \gamma_{i,k},\ i\le N-1,\ k\in D^M_{N,1} \big], \qquad (8) $$
where D^M_{N,1} is given by
$$ D^M_{N,1} = \{ k \le M;\ \gamma_{N,k} = 1 \}. $$
One has to be careful about the way all these conditionings are performed,
but it is worth observing that the set D^M_{N,1} is not too large: indeed, it is
obvious that, writing |A| for the size of a set A, we have
$$ |D^M_{N,1}| = \sum_{k\le M} \gamma_{N,k}, \qquad (9) $$
and thus
$$ E|D^M_{N,1}| = M\,\frac{\gamma}{N} = \alpha\gamma. $$
Let us go on now with the first step of the induction procedure for the
proof of Theorem 2.2: in P(N, γ_0, B) we can assume without loss of generality
that f and f′ depend on the coordinates 1, …, m − 1, N. Moreover, since
|f′ξ| ≤ Qfξ, we have
$$ |\langle\mathrm{Av}\, f'\xi\rangle_-| \le \langle\mathrm{Av}\, |f'\xi|\rangle_- \le \langle Q\,\mathrm{Av}\, f\xi\rangle_-, $$
and hence
$$ \bigg| \frac{\langle\mathrm{Av}\, f'\xi\rangle_-}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg| \le Q. \qquad (10) $$
We now define the following two events:
$$ \Omega_1 = \{ \exists\, p\le m-1,\ k\in D^M_{N,1};\ \gamma_{p,k}=1 \} = \{ \exists\, p\le m-1,\ k\le M;\ \gamma_{p,k}=\gamma_{N,k}=1 \}, $$
$$ \Omega_2 = \{ \exists\, j\le N-1,\ k_1\ne k_2\in D^M_{N,1};\ \gamma_{j,k_1}=\gamma_{j,k_2}=1 \} = \{ \exists\, j\le N-1,\ k_1\ne k_2\le M;\ \gamma_{j,k_1}=\gamma_{j,k_2}=\gamma_{N,k_1}=\gamma_{N,k_2}=1 \}. $$
These two events can be considered as exceptional. Indeed, it is readily
checked that
$$ P(\Omega_1) \le \alpha\, \frac{\gamma^2}{N}\, (m-1), \qquad P(\Omega_2) \le \alpha^2\gamma^4\, \frac{N-1}{N^2}. $$
Thus, if Ω = Ω_1 ∪ Ω_2, we get
$$ P(\Omega) \le \frac{\alpha\gamma^2(m-1) + \alpha^2\gamma^4}{N}, $$
and using this fact together with (10), we have
$$ E\bigg[ \frac{|\langle f'\rangle|}{\langle f\rangle} \bigg] = E\bigg[ \bigg| \frac{\langle\mathrm{Av}\, f'\xi\rangle_-}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg| \bigg] = E\bigg[ \mathbf{1}_{\Omega} \bigg| \frac{\langle\mathrm{Av}\, f'\xi\rangle_-}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg| \bigg] + E\bigg[ \mathbf{1}_{\Omega^c} \bigg| \frac{\langle\mathrm{Av}\, f'\xi\rangle_-}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg| \bigg] \le Q\, \frac{\alpha\gamma^2(m-1) + \alpha^2\gamma^4}{N} + E\bigg[ \mathbf{1}_{\Omega^c} \bigg| \frac{\langle\mathrm{Av}\, f'\xi\rangle_-}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg| \bigg]. \qquad (11) $$
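The probability bounds entering (11) can be sanity-checked by simulation. The sketch below is illustrative only (all parameter values are arbitrary choices): it estimates P(Ω_1) with i.i.d. Bernoulli dilution variables and compares the estimate with the bound αγ²(m−1)/N.

```python
import random

def estimate_p_omega1(N, M, gamma, m, trials, rng):
    """Monte Carlo estimate of P(Omega_1): the event that for some
    p <= m-1 and k <= M both gamma_{p,k} = 1 and gamma_{N,k} = 1,
    all dilution variables being i.i.d. B(gamma/N)."""
    p = gamma / N
    hits = 0
    for _ in range(trials):
        occurs = False
        for _k in range(M):
            g_last = rng.random() < p                    # gamma_{N,k}
            g_rows = any(rng.random() < p for _ in range(m - 1))
            if g_last and g_rows:
                occurs = True
        hits += occurs
    return hits / trials

N, gamma, m = 50, 1.0, 4
M = N // 2                                               # so alpha = 1/2
est = estimate_p_omega1(N, M, gamma, m, 20000, random.Random(42))
bound = (M / N) * gamma ** 2 * (m - 1) / N               # alpha gamma^2 (m-1)/N
```

The estimate should sit slightly below the union bound, which here equals 0.03.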
Consequently, in order to prove Theorem 2.2 we only need to bound accurately
the expectation on the right-hand side of (11) by means of the induction
hypothesis. To this purpose, we will introduce some new notations and go
through a series of lemmas: set
$$ J_1 = \{ j \le N;\ \gamma_{j,k} = 1 \text{ for some } k \in D^M_{N,1} \} \setminus \{N\}, $$
and observe that, when Ω_1 does not occur,
$$ J_1 \cap \{1, \dots, m-1\} = \emptyset. $$
Denote |J_1| = card(J_1) and write an enumeration of J_1 as follows: J_1 =
{j_1, …, j_{|J_1|}}.
Lemma 2.4 Let U_j be the transformation defined at (3), and let f′ : Σ_N^n → R
be such that f′ ∘ T_N = −f′. When Ω_1 does not occur, we have
$$ (\mathrm{Av}\, f'\xi)\circ \prod_{j\in J_1} U_j = -\mathrm{Av}\, f'\xi. $$
Proof: The proof of this lemma can be done following the steps of [9, Lemma
7.2.4], and we include it here for the sake of readability. Set T = ∏_{j∈J_1} T_j. Since
f′ depends only on the coordinates {1, …, m − 1, N} and this set is disjoint
from J_1, we have f′ ∘ T = f′. Moreover,
$$ f'\circ T\circ T_N = f'\circ T_N = -f'. $$
On the other hand, ξ only depends on the coordinates in J_1 ∪ {N} and, using
$$ \xi(\sigma^1, \sigma^2, \dots, \sigma^n) = \xi(\sigma^2, \sigma^1, \dots, \sigma^n), $$
we obtain
$$ \xi\circ T\circ T_N = \xi. $$
Hence
$$ (f'\xi)\circ T\circ T_N = -f'\xi, $$
and, since T_N^2 = Id, we get
$$ (f'\xi)\circ T = -(f'\xi)\circ T_N. \qquad (12) $$
Eventually,
$$ \mathrm{Av}[(f'\xi)\circ T_N] = \mathrm{Av}\, f'\xi, \qquad (13) $$
$$ \mathrm{Av}[(f'\xi)\circ T] = (\mathrm{Av}\, f'\xi)\circ \prod_{j\in J_1} U_j. \qquad (14) $$
The proof is now easily concluded by plugging (13) and (14) into (12).
Let us now go on with the proof of Theorem 2.2: thanks to Lemma 2.4,
when Ω_1 does not occur, we can write
$$ \mathrm{Av}\, f'\xi = \frac{1}{2}\Bigg( \mathrm{Av}\, f'\xi - (\mathrm{Av}\, f'\xi)\circ \prod_{1\le s\le|J_1|} U_{j_s} \Bigg) = \frac{1}{2} \sum_{s\le|J_1|} f_s, \qquad (15) $$
with
$$ f_s = (\mathrm{Av}\, f'\xi)\circ \prod_{l\le s-1} U_{j_l} - (\mathrm{Av}\, f'\xi)\circ \prod_{l\le s} U_{j_l}. $$
Notice that U_j^2 = Id, and that f_s enjoys the same kind of antisymmetry
property as f′, since f_s ∘ U_{j_s} = −f_s.
Define R_1 = |D^M_{N,1}|. Then, recalling relation (9), we have
$$ R_1 = |D^M_{N,1}| = \sum_{k\le M} \gamma_{N,k}, $$
and let us enumerate as k_1, …, k_{R_1} the values k ≤ M such that γ_{N,k} = 1.
We also define I^1_1, …, I^1_{R_1} as follows:
$$ I^1_v = \{ j \le N-1;\ \gamma_{j,k_v} = 1 \}, \quad\text{for } v \le R_1, $$
and observe that we trivially have
$$ J_1 = \bigcup_{v\le R_1} I^1_v. $$
Moreover, when Ω_2 does not occur, we have
$$ I^1_{v_1} \cap I^1_{v_2} = \emptyset \quad\text{if } v_1 \ne v_2. \qquad (16) $$
Then, on Ω^c, we get
$$ |J_1| = \mathrm{Card}(J_1) = \sum_{v\le R_1} |I^1_v|. \qquad (17) $$
Furthermore, it is easily checked that, for each v, and conditionally on the
γ_{N,k}, the quantity |I^1_v| is a binomial random variable with parameters N − 1
and γ/N, which we denote by Bin(N − 1, γ/N).
With all these notations in mind, our next step will be to bound f_s in
terms of f, in order to get a condition similar to that of Definition 2.1:
Lemma 2.5 Recall that U_∞ = ‖u‖_∞. Then, on Ω^c, for j_s ∈ I^1_v, we have
$$ |f_s| \le \hat Q\, \mathrm{Av}\, f\xi, \quad\text{where}\quad \hat Q \equiv 4QU_\infty\, \exp( 4U_\infty R_1 ). $$
Proof: Let us decompose ξ as ξ = ξ′ξ″, with
$$ \xi' = \exp\Bigg( \sum_{3\le l\le n}\ \sum_{k\le M} \eta_k\,\gamma_{N,k}\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma^l_i + g_{N,k}\sigma^l_N \Big) \Bigg), $$
$$ \xi'' = \exp\Bigg( \sum_{l\le 2}\ \sum_{k\le M} \eta_k\,\gamma_{N,k}\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma^l_i + g_{N,k}\sigma^l_N \Big) \Bigg). \qquad (18) $$
Thus
$$ \xi \ge \xi'\, \exp\Bigg( - \sum_{l\le 2}\ \sum_{\bar v\le R_1} \Big| u\Big( \sum_{i\le N-1} g_{i,k_{\bar v}}\gamma_{i,k_{\bar v}}\sigma^l_i + g_{N,k_{\bar v}}\sigma^l_N \Big) \Big| \Bigg) \ge \xi'\, \exp( -2U_\infty R_1 ), $$
and hence
$$ \mathrm{Av}\, f\xi \ge (\mathrm{Av}\, f\xi')\, \exp( -2U_\infty R_1 ). \qquad (19) $$
On the other hand, since f′ only depends on {1, …, m − 1, N}, we have
f′ ∘ T_{j_l} = f′ for any l ≤ |J_1| on Ω^c, which yields
$$ f_s = (\mathrm{Av}\, f'\xi)\circ \prod_{l\le s-1} U_{j_l} - (\mathrm{Av}\, f'\xi)\circ \prod_{l\le s} U_{j_l} = \mathrm{Av}\Big[ (f'\xi)\circ \prod_{l\le s-1} T_{j_l} - (f'\xi)\circ \prod_{l\le s} T_{j_l} \Big] = \mathrm{Av}\Big[ f'\Big( \xi\circ \prod_{l\le s-1} T_{j_l} - \xi\circ \prod_{l\le s} T_{j_l} \Big)\Big], \qquad (20) $$
where we have used the fact that J_1 can be written as J_1 = {j_1, …, j_{|J_1|}}.
Moreover, for any l, by construction of ξ′, we have ξ′ ∘ T_{j_l} = ξ′. Thus,
$$ \xi\circ \prod_{l\le s-1} T_{j_l} - \xi\circ \prod_{l\le s} T_{j_l} = \xi' \Big[ \xi''\circ \prod_{l\le s-1} T_{j_l} - \xi''\circ \prod_{l\le s} T_{j_l} \Big]. \qquad (21) $$
Set now
$$ \Gamma = \sup_{\sigma} \Big| \xi''\circ \prod_{l\le s-1} T_{j_l} - \xi''\circ \prod_{l\le s} T_{j_l} \Big| = \sup_{\sigma} \big| \xi'' - \xi''\circ T_{j_s} \big|. $$
Then, from (20) and (21), and invoking the fact that |f′| ≤ Qf, we get
$$ |f_s| \le \Gamma\, \mathrm{Av}( |f'|\,\xi' ) \le Q\,\Gamma\, \mathrm{Av}\, f\xi'. \qquad (22) $$
We now bound Γ: recall that ξ″ is defined by (18), and thus
$$ \xi'' = \prod_{\bar v\le R_1} \xi_{\bar v}, \quad\text{with}\quad \xi_{\bar v} = \exp\Bigg( \sum_{l\le 2} \eta_{k_{\bar v}}\, u\Big( \sum_{i\le N-1} g_{i,k_{\bar v}}\gamma_{i,k_{\bar v}}\sigma^l_i + g_{N,k_{\bar v}}\sigma^l_N \Big) \Bigg). $$
Recall now that we have assumed j_s ∈ I^1_v. Therefore, j_s ∉ I^1_{v̄} if
v̄ ≠ v, according to the fact that I^1_v ∩ I^1_{v̄} = ∅ on Ω^c. Hence
$$ \xi_{\bar v}\circ T_{j_s} = \xi_{\bar v}, $$
and
$$ \xi'' - \xi''\circ T_{j_s} = \big( \xi_v - \xi_v\circ T_{j_s} \big) \prod_{\bar v\ne v} \xi_{\bar v}. \qquad (23) $$
On the other hand, since |e^x − e^y| ≤ |x − y| e^a for |x|, |y| ≤ a, we obtain
$$ |\xi_v - \xi_v\circ T_{j_s}| \le 4U_\infty\, e^{2U_\infty}, \qquad (24) $$
and we also have the trivial bound
$$ \xi_{\bar v} \le e^{2U_\infty}. \qquad (25) $$
Thus, plugging (24) and (25) into (23), we get
$$ \Gamma \le 4U_\infty\, e^{2U_\infty R_1}. $$
Combining this bound with (19) and (22), the proof is now easily completed.
We are now ready to start the induction procedure on P(N, γ_0, B), which
will use the following elementary lemma (whose proof is left to the reader).
Lemma 2.6 Let R be a random variable following the Bin(M, γ/N) distribution,
and λ be a positive number. Then
$$ E\big[ R\, e^{\lambda R} \big] \le \alpha\gamma\, e^{\lambda}\, e^{\alpha\gamma(e^\lambda - 1)}, \qquad (26) $$
$$ E\big[ R^2 e^{\lambda R} \big] \le \alpha^2\gamma^2\, e^{2\lambda}\, e^{\alpha\gamma(e^\lambda-1)} + \alpha\gamma\, e^{\lambda}\, e^{\alpha\gamma(e^\lambda-1)}. \qquad (27) $$
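Since R ~ Bin(M, γ/N) has the explicit exponential moment E[e^{λR}] = (1 − p + pe^λ)^M with p = γ/N, the bound (26) can be checked directly against the closed form E[R e^{λR}] = Mp e^λ (1 − p + pe^λ)^{M−1}. A small sketch (sample values arbitrary):

```python
import math

def exact_R_exp(M, p, lam):
    """Exact E[R e^{lam R}] for R ~ Bin(M, p), by direct enumeration."""
    return sum(r * math.exp(lam * r) * math.comb(M, r) * p ** r
               * (1.0 - p) ** (M - r) for r in range(M + 1))

def bound_26(alpha, gamma, lam):
    """Right-hand side of (26): alpha gamma e^{lam} exp(alpha gamma (e^{lam}-1))."""
    ag = alpha * gamma
    return ag * math.exp(lam) * math.exp(ag * (math.exp(lam) - 1.0))

N, M, gamma, lam = 40, 20, 1.0, 0.4
alpha, p = M / N, gamma / N
exact = exact_R_exp(M, p, lam)
```

The inequality holds because 1 − p + pe^λ ≤ exp(p(e^λ − 1)), which is exactly the comparison of the binomial generating function with its Poisson(αγ) counterpart.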
Let us proceed now with the main step of the induction:
Proposition 2.7 Assume that P(N − 1, γ_0, B) holds for N ≥ 2 and γ ≤ γ_0.
Consider f and f′ as in Definition 2.1. Then
$$ E\bigg[ \frac{|\langle f'\rangle|}{\langle f\rangle} \bigg] \le \frac{mQ}{N}\, \big( \alpha\gamma^2 + \alpha^2\gamma^4 + 4B\,\Upsilon(\alpha,\gamma,U_\infty) \big), \qquad (28) $$
where
$$ \Upsilon(\alpha,\gamma,U_\infty) = U_\infty\,\alpha\gamma^2\, e^{4U_\infty}\, e^{\alpha\gamma(e^{4U_\infty}-1)} \big( 3 + 2\gamma + \alpha(\gamma^2+\gamma^3)e^{4U_\infty} \big). $$
Proof: Using (11) and (15), we have
$$ E\bigg[ \frac{|\langle f'\rangle|}{\langle f\rangle} \bigg] \le Q\, \frac{\alpha\gamma^2(m-1) + \alpha^2\gamma^4}{N} + \frac{1}{2}\, E\bigg[ \mathbf{1}_{\Omega^c} \sum_{s\le |J_1|} \frac{|\langle f_s\rangle_-|}{\langle \mathrm{Av}\, f\xi\rangle_-} \bigg]. $$
However, on Ω^c, the functions f_s and Av fξ depend on m − 1 + |J_1| coordinates.
Since γ^- ≤ γ and m − 1 + |J_1| ≤ m(1 + |J_1|), the definition of the
expectation E_{−,γ_N}, the property P(N − 1, γ_0, B), (17) and Lemma 2.5 imply
$$ E\bigg[ \mathbf{1}_{\Omega^c} \sum_{s\le|J_1|} \frac{|\langle f_s\rangle_-|}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg] = E\bigg[ \mathbf{1}_{\Omega^c} \sum_{s\le|J_1|} E_{-,\gamma_N}\bigg[ \frac{|\langle f_s\rangle_-|}{\langle\mathrm{Av}\, f\xi\rangle_-} \bigg]\bigg] \le E\bigg[ \mathbf{1}_{\Omega^c} \sum_{s\le|J_1|} \frac{(m-1+|J_1|)\, B\,\hat Q}{N-1} \bigg] \le 4\,\frac{m}{N-1}\, BQU_\infty\, E\big[ \mathbf{1}_{\Omega^c}\, |J_1|(1+|J_1|)\, e^{4U_\infty R_1} \big] \le 8\,\frac{m}{N}\, BQU_\infty\, E\big[ \mathbf{1}_{\Omega^c}\, |J_1|(1+|J_1|)\, e^{4U_\infty R_1} \big]. $$
Recall that, according to (16), we have
$$ |J_1| \le \sum_{v\le R_1} |I^1_v|, $$
and that the quantity R_1 is a Bin(M, γ/N) random variable. Thus
$$ E\big[ \mathbf{1}_{\Omega^c} |J_1|\, e^{\lambda R_1} \big] = E\Big[ E\big[ \mathbf{1}_{\Omega^c} |J_1|\, e^{\lambda R_1} \,\big|\, R_1 \big]\Big] = E\Big[ e^{\lambda R_1}\, E\big[ \mathbf{1}_{\Omega^c} |J_1| \,\big|\, R_1 \big]\Big] \le \gamma\, E\big[ R_1 e^{\lambda R_1} \big], $$
$$ E\big[ \mathbf{1}_{\Omega^c} |J_1|^2 e^{\lambda R_1} \big] = E\Big[ e^{\lambda R_1}\, E\big[ \mathbf{1}_{\Omega^c} |J_1|^2 \,\big|\, R_1 \big]\Big] \le (\gamma+\gamma^2)\, E\big[ (R_1+R_1^2)\, e^{\lambda R_1} \big]. \qquad (29) $$
The proof of this proposition is now easily concluded by applying the previous
bounds, together with Lemma 2.6, to the quantity
$$ E\big[ \mathbf{1}_{\Omega^c}\, |J_1|(1+|J_1|)\, e^{4U_\infty R_1} \big]. $$
We can turn now to the main aim of this section:
Proof of Theorem 2.2: The result is now an immediate consequence of
(4) and Proposition 2.7, applied to
$$ B = B_0 = \frac{\alpha\gamma^2 + \alpha^2\gamma^4}{1-\varepsilon}, $$
where ε satisfies
$$ 4U_\infty\,\alpha\gamma_0^2\, e^{4U_\infty}\, e^{\alpha\gamma_0(e^{4U_\infty}-1)} \big( 3 + 2\gamma_0 + \alpha(\gamma_0^2+\gamma_0^3)e^{4U_\infty} \big) < \varepsilon < 1. $$
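For concreteness, once parameters satisfying (4) are fixed, B_0 is explicit up to the choice of ε. The sketch below (arbitrary sample values; the midpoint choice of ε is just one admissible possibility) computes such a B_0:

```python
import math

def b0(alpha, gamma, gamma0, U):
    """B_0 = (alpha gamma^2 + alpha^2 gamma^4)/(1 - eps), with eps chosen
    as the midpoint between the left-hand side of (4) and 1."""
    e4u = math.exp(4.0 * U)
    lhs = (4.0 * U * alpha * gamma0 ** 2 * e4u
           * math.exp(alpha * gamma0 * (e4u - 1.0))
           * (3.0 + 2.0 * gamma0 + alpha * (gamma0 ** 2 + gamma0 ** 3) * e4u))
    if lhs >= 1.0:
        raise ValueError("condition (4) fails for these parameters")
    eps = 0.5 * (lhs + 1.0)            # any eps in (lhs, 1) would do
    return (alpha * gamma ** 2 + alpha ** 2 * gamma ** 4) / (1.0 - eps)
```

Since ε ∈ (0, 1), any such B_0 strictly exceeds αγ² + α²γ⁴.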
Before closing this section, we will give an easy consequence of Theorem
2.2: we will see that, as N grows to ∞, the Gibbs measure G_N taken on a
finite number of spins looks like a product measure. To this purpose, let us
denote by ⟨·⟩_• the average with respect to the product measure ν on Σ_{N−1}
such that
$$ \int \sigma_i\, d\nu(\rho) = \langle\sigma_i\rangle_-, \qquad \forall\, i \le N-1. $$
Equivalently, for a function f̄ on Σ_{N−1}, we can write
$$ \langle \bar f\rangle_\bullet = \big\langle \bar f\big( \sigma^1_1, \dots, \sigma^{N-1}_{N-1} \big) \big\rangle_-, $$
where σ^i_i is the i-th coordinate of the i-th replica ρ^i. Recall also that, for
v ≤ R_1, I^1_v has been defined as
$$ I^1_v = \{ i \le N-1;\ \gamma_{i,k_v} = 1 \}. $$
We now introduce the enumeration {i^v_1, …, i^v_{|I^1_v|}} of this set. Furthermore,
given the randomness contained in the γ_{N,k}, the law of |I^1_v| is Bin(N − 1, γ/N).
Proposition 2.8 Assume (4) and γ ≤ γ_0, and consider
$$ \Theta = \exp\Bigg( \sum_{v\le R_1} \eta_{k_v}\, u\Big( \sum_{p\le|I^1_v|} g_{i^v_p,k_v}\,\sigma_{i^v_p} + g_{N,k_v}\,\sigma_N \Big) \Bigg). $$
Then, when Ω does not occur, we have
$$ E_{-,\gamma_N}\bigg[ \bigg| \frac{\langle\mathrm{Av}\,\sigma_N\Theta\rangle_-}{\langle\mathrm{Av}\,\Theta\rangle_-} - \frac{\langle\mathrm{Av}\,\sigma_N\Theta\rangle_\bullet}{\langle\mathrm{Av}\,\Theta\rangle_\bullet} \bigg| \bigg] \le 2B_0\, (|J_1|-1)\, \frac{|J_1|+1}{N-1}\, \big( e^{2U_\infty} - 1 \big), $$
where the conditional expectation E_{−,γ_N} has been defined at (8).
Remark 2.9 The quantity Θ appears naturally in the decomposition of the
Hamiltonian H_{N,M}. Indeed, on Ω_2^c, we have
$$ -H_{N,M}(\sigma) = \sum_{k\le M} \eta_k\, u\Big( \sum_{i\le N} g_{i,k}\gamma_{i,k}\sigma_i \Big) = \sum_{k\notin D^M_{N,1}} \eta_k\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma_i \Big) + \sum_{k\in D^M_{N,1}} \eta_k\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma_i + g_{N,k}\sigma_N \Big) = \sum_{k\notin D^M_{N,1}} \eta_k\, u\Big( \sum_{i\le N-1} g_{i,k}\gamma_{i,k}\sigma_i \Big) + \sum_{v\le R_1} \eta_{k_v}\, u\Big( \sum_{p\le|I^1_v|} g_{i^v_p,k_v}\,\sigma_{i^v_p} + g_{N,k_v}\,\sigma_N \Big). $$
Observe also that ξ defined by (7), evaluated for n = 1, gives ξ = Θ.
Proof of Proposition 2.8: The proof is similar to that of Proposition 7.2.7 in [9],
and we include it here for the sake of completeness. On Ω^c, since the sets I^1_v are
disjoint, the indices i^v_p, for any v and p, are all different, and we can write
$$ \bigcup_{v\le R_1} I^1_v = J_1 \equiv \{ j_1, \dots, j_{|J_1|} \}. $$
Set
$$ f' = f'\big( \sigma^1_{j_1}, \dots, \sigma^1_{j_{|J_1|}} \big) \equiv \mathrm{Av}\,\sigma_N\Theta, \qquad f = f\big( \sigma^1_{j_1}, \dots, \sigma^1_{j_{|J_1|}} \big) \equiv \mathrm{Av}\,\Theta. $$
Let us also define, for 2 ≤ l ≤ |J_1|,
$$ f'_{j_l} = f'\big( \sigma^{j_1}_{j_1}, \dots, \sigma^{j_l}_{j_l}, \sigma^1_{j_{l+1}}, \dots, \sigma^1_{j_{|J_1|}} \big), $$
and f_{j_l} in a similar way, with the convention f'_{j_1} = f' and f_{j_1} = f. Then
$$ E_{-,\gamma_N}\bigg[ \bigg| \frac{\langle\mathrm{Av}\,\sigma_N\Theta\rangle_-}{\langle\mathrm{Av}\,\Theta\rangle_-} - \frac{\langle\mathrm{Av}\,\sigma_N\Theta\rangle_\bullet}{\langle\mathrm{Av}\,\Theta\rangle_\bullet} \bigg| \bigg] = E_{-,\gamma_N}\bigg[ \bigg| \frac{\langle f'_{j_1}\rangle_-}{\langle f_{j_1}\rangle_-} - \frac{\langle f'_{j_{|J_1|}}\rangle_-}{\langle f_{j_{|J_1|}}\rangle_-} \bigg| \bigg] \le \sum_{2\le l\le|J_1|} E_{-,\gamma_N}\bigg[ \bigg| \frac{\langle f'_{j_{l-1}}\rangle_-}{\langle f_{j_{l-1}}\rangle_-} - \frac{\langle f'_{j_l}\rangle_-}{\langle f_{j_l}\rangle_-} \bigg| \bigg] \le \sum_{2\le l\le|J_1|} \bigg( E_{-,\gamma_N}\bigg[ \frac{|\langle f'_{j_{l-1}} - f'_{j_l}\rangle_-|}{\langle f_{j_{l-1}}\rangle_-} \bigg] + E_{-,\gamma_N}\bigg[ \bigg| \frac{\langle f'_{j_l}\rangle_-\, \langle f_{j_{l-1}} - f_{j_l}\rangle_-}{\langle f_{j_{l-1}}\rangle_-\, \langle f_{j_l}\rangle_-} \bigg| \bigg] \bigg). \qquad (30) $$
Let us concentrate now on the first term of the right-hand side of (30), since
the other term can be bounded similarly: observe that, for 2 ≤ l ≤ |J_1|, we
have
$$ f'_{j_l} = f'_{j_{l-1}}\,\Delta, \quad\text{with}\quad e^{-2U_\infty} \le \Delta \le e^{2U_\infty}. $$
Furthermore, it is easily seen that f'_{j_{l−1}} − f'_{j_l} enjoys the antisymmetry
property assumed in Definition 2.1. Thus, applying P(N − 1, γ_0, B_0), we get
$$ E_{-,\gamma_N}\bigg[ \frac{|\langle f'_{j_{l-1}} - f'_{j_l}\rangle_-|}{\langle f_{j_{l-1}}\rangle_-} \bigg] \le \frac{B_0(|J_1|+1)}{N-1}\, \big( e^{2U_\infty} - 1 \big), $$
which ends the proof.
3 Study of the magnetization
For the non-diluted perceptron model, in the high temperature regime, the
asymptotic behavior of the magnetization can be summarized easily: indeed,
it has been shown in [6] that ⟨σ_1⟩ converges in L² to a random variable of
the form tanh(z√r), where r is the solution to a deterministic equation and
z ∼ N(0, 1). Our goal in this section is to analyze the same problem for the
diluted perceptron model. However, in the current situation, the limiting law
is a more complicated object, and in order to present our asymptotic result,
we will go through a series of notations and preliminary lemmas.
Let P be the set of probability measures on [−1, 1]. We start by constructing
a map T : P → P in the following way: for any integer θ ≥ 1,
let (τ_1, …, τ_θ) be θ arbitrary nonnegative integers. Then, for k = 1, …, θ, let t_k be the
cumulative sum of the τ_k; that is, t_0 = 0 and t_k = Σ_{k̂≤k} τ_{k̂} for k ≥ 1. Let also
{ḡ_{i,k}, i, k ≥ 1} and {ḡ_k, k ≥ 1} be two independent families of independent
standard Gaussian random variables. Define then a random variable ξ_{θ,τ} by
$$ \xi_{\theta,\tau} = \xi_{\theta,\tau}(\sigma_1, \dots, \sigma_{t_\theta}, \varepsilon) = \exp\Bigg( \sum_{k=1}^{\theta} u\Big( \sum_{i=1}^{\tau_k} \bar g_{i,k}\,\sigma_{t_{k-1}+i} + \bar g_k\,\varepsilon \Big) \Bigg). \qquad (31) $$
Whenever θ = 0, set also ξ_θ = 1, which is equivalent to the convention
Σ_{k=1}^{0} w_k = 0 for any real sequence {w_k; k ≥ 0}.
Consider now x = (x_1, …, x_{τ_1+⋯+τ_θ}) with |x_i| ≤ 1 and a function
f : {−1, 1}^{τ_1+⋯+τ_θ} → R. We denote by ⟨f⟩_x the average of f with respect to
the product measure ν on {−1, 1}^{τ_1+⋯+τ_θ} such that ∫σ_i dν(δ) = x_i, where
δ = (σ_1, …, σ_{τ_1+⋯+τ_θ}). Using this notation, when θ ≥ 1, we define T_{θ,τ} : P → P
such that, for µ ∈ P, T_{θ,τ}(µ) is the law of the random variable
$$ \frac{\langle \mathrm{Av}\,\varepsilon\,\xi_{\theta,\tau} \rangle_{X}}{\langle \mathrm{Av}\,\xi_{\theta,\tau} \rangle_{X}}, \qquad (32) $$
where X = (X_1, …, X_{τ_1+⋯+τ_θ}) is a sequence of i.i.d. random variables of law
µ, independent of the randomness in ξ_{θ,τ}, and Av denotes the average over
ε = ±1. When θ = 0 we define T_{θ,τ}(µ) as the Dirac measure at point 0.
Eventually, we can define the map T : P → P by
$$ T(\mu) = \sum_{\theta\ge0}\ \sum_{\tau_1,\dots,\tau_\theta\ge0} \kappa(\theta, \tau_1, \dots, \tau_\theta)\, T_{\theta,\tau}(\mu), \qquad (33) $$
with
$$ \kappa(\theta, \tau_1, \dots, \tau_\theta) = e^{-\alpha\gamma}\, \frac{(\alpha\gamma)^\theta}{\theta!}\; e^{-\theta\gamma}\, \frac{\gamma^{\sum_{l\le\theta}\tau_l}}{\tau_1!\cdots\tau_\theta!}, \qquad (34) $$
and where the coefficients α, γ are the parameters of our perceptron model.
We will see that the asymptotic law µ of the magnetization ⟨σ_1⟩ satisfies
the relation µ = T(µ). Hence, a first natural aim of this section is to prove
that the equation µ = T(µ) admits a unique solution:
Theorem 3.1 Assume
$$ 2U_\infty e^{2U_\infty}\, \alpha\gamma^2 < \frac{1}{2}. \qquad (35) $$
Then there exists a unique probability distribution µ on [−1, 1] such that
µ = T(µ).
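Numerically, the fixed point of T can be approximated by population dynamics, a standard device for distributional fixed-point equations of this kind: represent µ by a finite pool of values, and repeatedly replace each value by a fresh draw of the ratio (32). The sketch below is entirely illustrative — the function u, the pool size, the truncation of the τ_k, and all parameter values are arbitrary choices, not taken from the paper.

```python
import itertools
import math
import random

def poisson(lam, rng):
    """Sample a Poisson(lam) variable by inversion (Knuth's method)."""
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= L:
            return k
        k += 1

def T_sample(pool, alpha, gamma, u, rng, tau_cap=10):
    """One draw from T(mu): theta ~ Poisson(alpha*gamma), tau_k ~ Poisson(gamma)
    (capped at tau_cap so the spin average stays brute-forceable), fresh
    Gaussians, and X_i drawn from the pool representing mu; returns the
    ratio <Av eps xi_{theta,tau}>_X / <Av xi_{theta,tau}>_X of (32)."""
    theta = poisson(alpha * gamma, rng)
    blocks = []
    for _ in range(theta):
        tau = min(poisson(gamma, rng), tau_cap)
        gbar = rng.gauss(0.0, 1.0)
        gs = [rng.gauss(0.0, 1.0) for _ in range(tau)]
        xs = [rng.choice(pool) for _ in range(tau)]
        blocks.append((gbar, gs, xs))
    num = den = 0.0
    for eps in (-1, 1):
        prod = 1.0
        for gbar, gs, xs in blocks:
            b = 0.0
            for sig in itertools.product((-1, 1), repeat=len(gs)):
                w = math.exp(u(sum(g * s for g, s in zip(gs, sig)) + gbar * eps))
                for s, x in zip(sig, xs):
                    w *= 0.5 * (1.0 + s * x)    # product measure with means x_i
                b += w
            prod *= b
        num += eps * prod
        den += prod
    return num / den                            # theta = 0 gives 0: Dirac at 0

rng = random.Random(7)
u = lambda x: 0.1 * math.tanh(x)
alpha, gamma = 0.5, 1.0
pool = [0.0] * 400                              # initial population: Dirac at 0
for _ in range(3):                              # a few population sweeps
    pool = [T_sample(pool, alpha, gamma, u, rng) for _ in range(len(pool))]
```

Under condition (35) the map is a contraction, so the empirical law of the pool should stabilize after a few sweeps; the returned values are always in (−1, 1), being normalized ratios of positive quantities.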
Remark 3.2 Notice that (4) implies (35).
In order to settle the fixed point argument for the proof of Theorem 3.1,
we will need a metric on P, and in fact it will be convenient for computational
purposes to choose the Monge-Kantorovich transportation-cost distance on
the compact metric space ([−1, 1], |·|): for two probability measures µ_1 and µ_2 on
[−1, 1], the distance between µ_1 and µ_2 will be defined as
$$ d(\mu_1, \mu_2) = \inf E|X_1 - X_2|, \qquad (36) $$
where the infimum is taken over all pairs (X_1, X_2) of random variables
such that the law of X_j is µ_j, j = 1, 2. Equivalently,
$$ d(\mu_1, \mu_2) = \inf \int d(x_1, x_2)\, d\zeta(x_1, x_2), \quad\text{with } d(x_1, x_2) = |x_2 - x_1|, $$
where this infimum is now taken over all probability measures ζ on [−1, 1]^2 with
marginals µ_1 and µ_2 (see Section 7.3 in [9] for more information about
transportation-cost distances). Finally, throughout this section, we also use
a local version of the distance between two probabilities, relative to an
event Ω:
$$ d_\Omega(\mu_1, \mu_2) = \inf E\big| (X_1 - X_2)\mathbf{1}_\Omega \big|, \qquad (37) $$
where the infimum is taken as in (36).
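On the real line, the distance (36) between two empirical measures with the same number of atoms has a simple expression: the optimal coupling pairs the sorted samples. A minimal sketch:

```python
def w1_empirical(xs, ys):
    """Monge-Kantorovich (transportation-cost) distance (36) between two
    empirical measures with equally many atoms on R: on the line the
    optimal coupling simply pairs the sorted samples."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)
```

For instance, two samples that are permutations of each other are at distance zero, since the identity coupling after sorting matches every atom exactly.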
Proof of Theorem 3.1: Assume that θ ≥ 1 and τ_k ≥ 1 for some k =
1, …, θ. Then, using arguments similar to those of Lemma 7.3.5 in [9], we can prove,
for 1 ≤ i ≤ τ_1 + ⋯ + τ_θ, that
$$ \bigg| \frac{\partial}{\partial x_i}\, \frac{\langle\mathrm{Av}\,\varepsilon\,\xi_{\theta,\tau}\rangle_{x}}{\langle\mathrm{Av}\,\xi_{\theta,\tau}\rangle_{x}} \bigg| \le 2U_\infty e^{2U_\infty}, \qquad (38) $$
with x = (x_1, …, x_{τ_1+⋯+τ_θ}). Then, if y = (y_1, …, y_{τ_1+⋯+τ_θ}), the bound (38)
implies that
$$ \bigg| \frac{\langle\mathrm{Av}\,\varepsilon\,\xi_{\theta,\tau}\rangle_{x}}{\langle\mathrm{Av}\,\xi_{\theta,\tau}\rangle_{x}} - \frac{\langle\mathrm{Av}\,\varepsilon\,\xi_{\theta,\tau}\rangle_{y}}{\langle\mathrm{Av}\,\xi_{\theta,\tau}\rangle_{y}} \bigg| \le 2U_\infty e^{2U_\infty} \sum_{k=1}^{\theta} \sum_{i=1}^{\tau_k} \big| x_{t_{k-1}+i} - y_{t_{k-1}+i} \big|. \qquad (39) $$
Remark that if θ = 0, or if θ ≠ 0 but τ_k = 0 for every k = 1, …, θ, then the
left-hand side of (39) is zero.
Let now (X, Y) be a pair of random variables such that the laws of X and
Y are µ_1 and µ_2, respectively, the pair being independent of the randomness in
ξ_{θ,τ}. Consider independent copies (X_i, Y_i)_{i≤τ_1+⋯+τ_θ} of this couple of random
variables. Then, if X = (X_i)_{i≤τ_1+⋯+τ_θ} and Y = (Y_i)_{i≤τ_1+⋯+τ_θ}, the random
variable ⟨Av ε ξ_{θ,τ}⟩_X / ⟨Av ξ_{θ,τ}⟩_X has law T_{θ,τ}(µ_1), and ⟨Av ε ξ_{θ,τ}⟩_Y / ⟨Av ξ_{θ,τ}⟩_Y
has law T_{θ,τ}(µ_2).
Hence, applying (39) with x = X and y = Y, taking first the expectation and
then the infimum over the choice of (X, Y), we obtain
$$ d\big( T_{\theta,\tau}(\mu_1), T_{\theta,\tau}(\mu_2) \big) \le 2U_\infty e^{2U_\infty}\, d(\mu_1,\mu_2)\, \sum_{k=1}^{\theta} \tau_k. \qquad (40) $$
Eventually, recall (see [9, Lemma 7.3.2]) that for a given sequence {c_n; n ≥ 1}
of positive numbers such that Σ_{n≥1} c_n = 1, and two sequences {µ_n, ν_n; n ≥ 1}
of elements of P, we have
$$ d\Big( \sum_{n\ge1} c_n\mu_n,\ \sum_{n\ge1} c_n\nu_n \Big) \le \sum_{n\ge1} c_n\, d(\mu_n, \nu_n). \qquad (41) $$
Applying this elementary result to c_{θ,τ} = κ(θ, τ_1, …, τ_θ), µ_{θ,τ} = T_{θ,τ}(µ_1) and
ν_{θ,τ} = T_{θ,τ}(µ_2), we get
$$ d\big( T(\mu_1), T(\mu_2) \big) \le \sum_{\theta\ge0}\ \sum_{\tau_1,\dots,\tau_\theta\ge0} \kappa(\theta, \tau_1, \dots, \tau_\theta)\, d\big( T_{\theta,\tau}(\mu_1), T_{\theta,\tau}(\mu_2) \big) \le 2U_\infty e^{2U_\infty} \Bigg( \sum_{\theta\ge0} e^{-\alpha\gamma}\, \frac{(\alpha\gamma)^\theta}{\theta!}\, \theta\gamma \Bigg)\, d(\mu_1, \mu_2) = 2U_\infty e^{2U_\infty}\, \alpha\gamma^2\, d(\mu_1, \mu_2), $$
where we have used the fact that the mean of a Poisson random variable
with parameter ρ is ρ. Then, under assumption (35), T is a contraction, and
there exists a unique probability distribution µ such that µ = T(µ).
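The only numerical fact used in the last computation is the Poisson mean identity Σ_{θ≥0} θ e^{−ρ} ρ^θ/θ! = ρ, applied with ρ = αγ; a quick check:

```python
import math

def poisson_weighted_sum(rho, terms=80):
    """Partial sum of sum_{theta>=0} theta e^{-rho} rho^theta / theta!,
    which converges to rho, the mean of a Poisson(rho) variable."""
    return sum(t * math.exp(-rho) * rho ** t / math.factorial(t)
               for t in range(terms))
```

With 80 terms the truncation error is far below floating-point precision for moderate ρ.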
Notice that the solution to the equation µ = T(µ) depends on the parameters
α and γ. Furthermore, in the sequel, we will need some continuity
properties of the map (α, γ) ↦ µ_{α,γ}. Thus, we will write µ = µ_{α,γ}
when we want to stress the dependence on the parameters α and γ, and the
following holds true:
Lemma 3.3 If (α, γ) and (α′, γ′) satisfy (35), then
$$ d(\mu_{\alpha,\gamma}, \mu_{\alpha',\gamma'}) \le 4\Big[ |\gamma-\gamma'|\,\alpha'\gamma'\, e^{|\gamma-\gamma'|} + |\alpha\gamma - \alpha'\gamma'|\, e^{|\alpha\gamma-\alpha'\gamma'|} \Big]. $$
Proof: Since µ_{α,γ} = T_{α,γ}(µ_{α,γ}) and µ_{α′,γ′} = T_{α′,γ′}(µ_{α′,γ′}), using the triangle
inequality and Theorem 3.1 we have
$$ d(\mu_{\alpha,\gamma}, \mu_{\alpha',\gamma'}) \le d\big( T_{\alpha,\gamma}(\mu_{\alpha,\gamma}), T_{\alpha,\gamma}(\mu_{\alpha',\gamma'}) \big) + d\big( T_{\alpha,\gamma}(\mu_{\alpha',\gamma'}), T_{\alpha',\gamma'}(\mu_{\alpha',\gamma'}) \big) \le \frac{1}{2}\, d(\mu_{\alpha,\gamma}, \mu_{\alpha',\gamma'}) + d\big( T_{\alpha,\gamma}(\mu_{\alpha',\gamma'}), T_{\alpha',\gamma'}(\mu_{\alpha',\gamma'}) \big). $$
So
$$ d(\mu_{\alpha,\gamma}, \mu_{\alpha',\gamma'}) \le 2\, d\big( T_{\alpha,\gamma}(\mu_{\alpha',\gamma'}), T_{\alpha',\gamma'}(\mu_{\alpha',\gamma'}) \big), $$
and we only need to deal with d(T_{α,γ}(µ_{α′,γ′}), T_{α′,γ′}(µ_{α′,γ′})). However, Lemma
7.3.3 in [9] implies that
$$ d\big( T_{\alpha,\gamma}(\mu_{\alpha',\gamma'}), T_{\alpha',\gamma'}(\mu_{\alpha',\gamma'}) \big) \le 2 \sum_{\theta\ge0}\ \sum_{\tau_1,\dots,\tau_\theta\ge0} \big| \kappa_{\alpha,\gamma}(\theta,\tau_1,\dots,\tau_\theta) - \kappa_{\alpha',\gamma'}(\theta,\tau_1,\dots,\tau_\theta) \big| \le 2(V_1 + V_2), $$
with κ defined in (34) and
$$ V_1 = \sum_{\theta\ge0}\ \sum_{\tau_1,\dots,\tau_\theta\ge0} \frac{e^{-\theta\gamma}\, \gamma^{\sum_{l\le\theta}\tau_l}}{\theta!\,\tau_1!\cdots\tau_\theta!}\; \Big| e^{-\alpha\gamma}(\alpha\gamma)^\theta - e^{-\alpha'\gamma'}(\alpha'\gamma')^\theta \Big|, $$
$$ V_2 = \sum_{\theta\ge0}\ \sum_{\tau_1,\dots,\tau_\theta\ge0} e^{-\alpha'\gamma'}\, \frac{(\alpha'\gamma')^\theta}{\theta!\,\tau_1!\cdots\tau_\theta!}\; \Big| e^{-\theta\gamma}\gamma^{\sum_{l\le\theta}\tau_l} - e^{-\theta\gamma'}\gamma'^{\,\sum_{l\le\theta}\tau_l} \Big|. $$
Now, following the arguments of (7.53) in [9], we get
$$ V_1 = \sum_{\theta\ge0} \frac{1}{\theta!}\, \Big| e^{-\alpha\gamma}(\alpha\gamma)^\theta - e^{-\alpha'\gamma'}(\alpha'\gamma')^\theta \Big| \le |\alpha\gamma - \alpha'\gamma'|\, e^{|\alpha\gamma - \alpha'\gamma'|}, $$
$$ V_2 \le |\gamma - \gamma'|\, e^{|\gamma-\gamma'|} \sum_{\theta\ge0} \theta\, e^{-\alpha'\gamma'}\, \frac{(\alpha'\gamma')^\theta}{\theta!} = \alpha'\gamma'\, |\gamma - \gamma'|\, e^{|\gamma-\gamma'|}, $$
which ends the proof of this lemma.
From now on, we will specialize our Hamiltonian to the case of interest
for us:
Hypothesis 3.4 The parameters ηk , k = 1, . . . , M in the Hamiltonian (1)
are all equal to one.
This assumption being made, we can now turn to the main result of the
section:
Theorem 3.5 Let γ_0 be a positive number such that
$$ 4U_\infty\,\alpha\gamma_0^2\, e^{4U_\infty}\, e^{\alpha\gamma_0(e^{4U_\infty}-1)} \big( 3 + 2\gamma_0 + \alpha(\gamma_0^2+\gamma_0^3)e^{4U_\infty} \big) < 1, \qquad (42) $$
and assume that there exists a positive number C_0 satisfying
$$ C_0\,\alpha\,\gamma_0^6\, U_\infty\, e^{2U_\infty} \le 1. \qquad (43) $$
Then for any γ ≤ γ_0, given any integer m, we can find i.i.d. random variables
z_1, …, z_m with law µ_{α,γ} such that
$$ E\Big[ \sum_{i\le m} |\langle\sigma_i\rangle - z_i| \Big] \le \frac{Km^3}{N}, \qquad (44) $$
for a constant K > 0 independent of m.
Remark 3.6 The two conditions in the above theorem are met when the
following hypothesis is satisfied: there exists L > 0 such that
$$ L\, U_\infty\,\alpha\,\gamma_0^6\, \exp\big( 8U_\infty + \alpha\gamma_0(e^{4U_\infty} - 1) \big) < 1. $$
As in the case of Theorem 2.2, the proof of Theorem 3.5 will require
the introduction of some notations and preliminary lemmas. Let us first
recast relation (44) in a form suitable for an induction procedure: consider
the metric space [−1, 1]^m, equipped with the distance given by
$$ d\big( (x_i)_{i\le m}, (y_i)_{i\le m} \big) = \sum_{i\le m} |x_i - y_i|. $$
We also denote by d the transportation-cost distance on the space of probability
measures on [−1, 1]^m, defined as in (36). Define now
$$ D(N, M, m, \gamma_0) = \sup_{\gamma\le\gamma_0} d\big( \mathcal{L}(\langle\sigma_1\rangle, \dots, \langle\sigma_m\rangle),\ \mu_{\alpha,\gamma}^{\otimes m} \big), \qquad (45) $$
where L(X) stands for the law of the random variable X. Then the statement
of Theorem 3.5 is equivalent to saying that, under Hypothesis (43), we have
$$ D(N, M, m, \gamma_0) \le \frac{Km^3}{N}, $$
for any fixed integer m ≥ 1.
It will also be useful to introduce a cavity formula for m spins, which
we proceed to do now: generalizing some aspects of the previous section, we
consider, for p ∈ {1, …, m}, the random sets
$$ D^M_{N,p} = \{ k \le M;\ \gamma_{N-p+1,k} = 1 \}, \quad\text{and}\quad F^m_{N,M} = \bigcup_{p=1}^{m} D^M_{N,p}. $$
We also define the following two rare events:
$$ \tilde\Omega_1 = \{ \exists\, k\le M,\ p_1 \ne p_2 \le m;\ \gamma_{N-p_1+1,k} = \gamma_{N-p_2+1,k} = 1 \}, $$
$$ \tilde\Omega_2 = \{ \exists\, i\le N-m,\ k_1 \ne k_2 \in F^m_{N,M};\ \gamma_{i,k_1} = \gamma_{i,k_2} = 1 \}, $$
satisfying
$$ P(\tilde\Omega_1) \le \frac{\alpha\gamma^2 m^2}{N} \quad\text{and}\quad P(\tilde\Omega_2) \le \frac{\alpha^2\gamma^4 m^2}{N}. \qquad (46) $$
Then, the following properties hold true: first, for a fixed k, if Ω̃_1^c is realized,
we have
$$ \mathrm{Card}\{ p \le m;\ \gamma_{N-p+1,k} = 1 \} \le 1. $$
Moreover, still on Ω̃_1^c, for p_1 ≠ p_2,
$$ D^M_{N,p_1} \cap D^M_{N,p_2} = \emptyset; $$
and hence,
$$ R_m \equiv |F^m_{N,M}| = \sum_{p=1}^{m} |D^M_{N,p}| = \sum_{k\le M}\, \sum_{p\le m} \gamma_{N-p+1,k}. $$
Actually, notice that we always have
$$ R_m \le \sum_{p=1}^{m} |D^M_{N,p}|. $$
Let us introduce now an enumeration of F^m_{N,M}:
$$ F^m_{N,M} = \{ k_1, \dots, k_{R_m} \}, $$
and for any v ≤ R_m set
$$ I^m_v = \{ j \le N-m;\ \gamma_{j,k_v} = 1 \}. $$
Then, on Ω̃_2^c, we get
$$ I^m_{v_1} \cap I^m_{v_2} = \emptyset \quad\text{if } v_1 \ne v_2, \qquad (47) $$
and we can also write
$$ J_m = \bigcup_{v\le R_m} I^m_v = \bigcup_{v\le R_m} \{ j \le N-m;\ \gamma_{j,k_v} = 1 \}. $$
Let us now separate the m last spins in the Hamiltonian H_{N,M}: if Ω̃_1^c is
realized, for ρ = (σ_1, …, σ_{N−m}), we have the following decomposition:
$$ -H_{N,M}(\sigma) = -H^-_{N-m,M}(\rho) + \log \xi, $$
with
− HN−−m,M (ρρ) =
ξ = exp
m
X
X
u
m )c
k∈(FN,M
X
u
p=1 k∈D M
N,p
X
i≤N −m
gi,k γi,k σi
!
,
!
X
gi,k γi,k σi + gN −p+1,k σN −p+1 .
(48)
i≤N −m
Observe that, in the last formula, $H_{N-m,M}^-(\rho)$ is not exactly the Hamiltonian of a $(N-m)$-spin system changing $\gamma$ into $\gamma^-$, because the set $F_{N,M}^m$ is not deterministic. But this problem will be solved again by conditioning upon the random variables $\{\gamma_{N-p+1,k},\ p = 1, \ldots, m,\ k \le M\}$. For the moment, let us just mention that the $m$-spin cavity formula will be the following: given $f$ on $\Sigma_N$, we have
$$ \mathbf{1}_{\tilde\Omega_1^c} \langle f \rangle = \mathbf{1}_{\tilde\Omega_1^c} \frac{\langle \mathrm{Av}\, f \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-}, \qquad (49) $$
where $\langle \cdot \rangle_-$ is the average with respect to $H_{N-m,M}^-$ and $\mathrm{Av}$ is the average with respect to the last $m$ spins. Moreover, in the last formula, we have kept the notation $\xi$ from Section 2, which hopefully will not lead to any confusion. Eventually, we denote by $\mathcal{L}_0$ the law of a random variable conditioned by $\{\gamma_{N-p+1,k},\ p = 1, \ldots, m,\ k \le M\}$, and by $E_{N,m}^\gamma$ the associated conditional expectation.
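Because the decomposition (48) is exact on $\tilde\Omega_1^c$, the cavity formula (49) is an algebraic identity and can be checked by brute force on a tiny system. In the sketch below, $u = \tanh$ is an arbitrary bounded choice of $u$, the system sizes are ours, and the dilution pattern is built by hand so that each column is linked to at most one of the last $m$ spins.

```python
import itertools
import math
import random

random.seed(0)
N, m, M = 6, 2, 3
u = math.tanh  # any bounded u works; tanh is an illustrative assumption

# Couplings, and a dilution pattern realizing the complement of Omega_1~:
# each column k is connected to at most one of the last m spins.
g = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
gamma = [[0] * M for _ in range(N)]
for i in range(N - m):                 # each bulk spin hits one random column
    gamma[i][random.randrange(M)] = 1
gamma[N - 1][0] = 1                    # last spin hits column 0 only
gamma[N - 2][1] = 1                    # second-to-last spin hits column 1 only

def minus_H(sigma):                    # -H_{N,M}(sigma), computed directly
    return sum(u(sum(g[i][k] * gamma[i][k] * sigma[i] for i in range(N)))
               for k in range(M))

def minus_H_minus(rho):                # -H^-: columns untouched by last m spins
    untouched = [k for k in range(M)
                 if all(gamma[N - p][k] == 0 for p in range(1, m + 1))]
    return sum(u(sum(g[i][k] * gamma[i][k] * rho[i] for i in range(N - m)))
               for k in untouched)

def xi(rho, eps):                      # xi from (48); eps holds the last m spins
    s = 0.0
    for p in range(1, m + 1):
        for k in range(M):
            if gamma[N - p][k] == 1:
                s += u(sum(g[i][k] * gamma[i][k] * rho[i] for i in range(N - m))
                       + g[N - p][k] * eps[p - 1])
    return math.exp(s)

# Left-hand side of (49): plain Gibbs average of f(sigma) = sigma_1.
num = den = 0.0
for sigma in itertools.product((-1, 1), repeat=N):
    w = math.exp(minus_H(sigma))
    num += sigma[0] * w
    den += w
lhs = num / den

# Right-hand side of (49): <Av f xi>_- / <Av xi>_-, Av uniform on last m spins.
num = den = 0.0
for rho in itertools.product((-1, 1), repeat=N - m):
    w = math.exp(minus_H_minus(rho))
    av_xi = sum(xi(rho, eps)
                for eps in itertools.product((-1, 1), repeat=m)) / 2 ** m
    num += rho[0] * av_xi * w
    den += av_xi * w
rhs = num / den

assert abs(lhs - rhs) < 1e-10
```

The two sides agree up to rounding, since the weight of every configuration factorizes exactly as $e^{-H} = e^{-H^-} \xi$ on this event.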
We can now start stating and proving the lemmas and propositions that will lead to the proof of Theorem 3.5. Recall that given $x = (x_1, \ldots, x_{N-m})$, $|x_i| \le 1$, and a function $f$ on $\Sigma_{N-m}$, $\langle f \rangle_x$ means the average of $f$ with respect to the product measure $\nu$ on $\Sigma_{N-m}$ such that $\int \sigma_i \, d\nu(\rho) = x_i$, for $i \le N-m$. Recall also that $\gamma^- = \gamma \frac{N-m}{N}$. Then, as a direct consequence of the definition of the operator $T_{\theta,\tau}$, we have the following result:
Lemma 3.7 Let $X = (X_1, \ldots, X_{N-m})$ be an independent sequence of random variables, where the law of each $X_l$ is $\mu_{\alpha,\gamma^-}$. Set
$$ w_p = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_X}{\langle \mathrm{Av}\, \xi \rangle_X}, \qquad p = 1, \ldots, m. $$
Then, on $\tilde\Omega^c = (\tilde\Omega_1 \cup \tilde\Omega_2)^c$, we have
$$ \mathcal{L}_0(w_1, \ldots, w_m) = T_{|D_{N,1}^M|,\, (|I_k^m|,\, k \in D_{N,1}^M)}(\mu_{\alpha,\gamma^-}) \otimes \cdots \otimes T_{|D_{N,m}^M|,\, (|I_k^m|,\, k \in D_{N,m}^M)}(\mu_{\alpha,\gamma^-}). $$
We will now try to relate the random variables $w_p$ with the magnetization of the $m$ last spins. A first step in that direction is the following lemma, where we use the random value of the parameter $\alpha^-$ associated to the Hamiltonian of a $(N-m)$-spin system.
Lemma 3.8 On $\tilde\Omega^c$, set
$$ \Gamma_m = d\big( \mathcal{L}_0(w_1, \ldots, w_m),\, \mathcal{L}_0(\bar w_1, \ldots, \bar w_m) \big), $$
where, for $p = 1, \ldots, m$,
$$ \bar w_p = T_{|D_{N,p}^M|,\, (|I_k^m|,\, k \in D_{N,p}^M)}(\mu_{\alpha^-,\gamma^-}), \qquad \text{with } \alpha^- = \frac{M - R_m}{N - m}. $$
Then, on $\tilde\Omega^c$, we have
$$ \Gamma_m \le 2 U_\infty e^{2U_\infty} \gamma_0 \frac{|R_m - m\alpha|}{N} \exp\Big( \gamma_0 \frac{|R_m - m\alpha|}{N} \Big) \sum_{k=1}^{R_m} |I_k^m|. $$
Proof: Using (40) we obtain
$$ \Gamma_m \le 2 U_\infty e^{2U_\infty} \sum_{p=1}^m \sum_{k \in D_{N,p}^M} |I_k^m| \, d(\mu_{\alpha,\gamma^-}, \mu_{\alpha^-,\gamma^-}). $$
The proof of this lemma is then easily finished thanks to Lemma 3.3, taking the following equality into account:
$$ \gamma^- |\alpha - \alpha^-| = \gamma \frac{|R_m - m\alpha|}{N}. $$
Notice that we have introduced the random variables $\bar w_p$ for the following reason: given the randomness contained in the $\{\gamma_{N-p+1,k},\ p = 1, \ldots, m,\ k \le M\}$, $\bar w_p$ can be interpreted as
$$ \bar w_p = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_{\bar X}}{\langle \mathrm{Av}\, \xi \rangle_{\bar X}}, \qquad p = 1, \ldots, m, $$
where $\bar X = (\bar X_1, \ldots, \bar X_{N-m})$ is an independent sequence of random variables with law $\mu_{\alpha^-,\gamma^-}$.
Lemma 3.9 Consider $Z = (\langle \sigma_1 \rangle_-, \ldots, \langle \sigma_{N-m} \rangle_-)$, and denote
$$ u_p = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_Z}{\langle \mathrm{Av}\, \xi \rangle_Z}, \qquad p = 1, \ldots, m. $$
Then, on $\tilde\Omega^c$,
$$ d\big( \mathcal{L}_0(u_1, \ldots, u_m),\, \mathcal{L}_0(\bar w_1, \ldots, \bar w_m) \big) \le 4 D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0) \, U_\infty e^{2U_\infty}, $$
where the quantity $D$ has been defined in relation (45).
Proof: As in (38) we can obtain, for any $i \le N-m$,
$$ \left| \frac{\partial}{\partial x_i} \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_x}{\langle \mathrm{Av}\, \xi \rangle_x} \right| \le 2 U_\infty e^{2U_\infty}. \qquad (50) $$
But in fact, these derivatives vanish unless
$$ i \in I_{N,p}^m \equiv \bigcup_{v:\ k_v \in D_{N,p}^M} I_v^m, $$
for some $p = 1, \ldots, m$. Indeed, on $\tilde\Omega^c$, from (47), we have
$$ I_{N,p_1}^m \cap I_{N,p_2}^m = \emptyset, \quad \text{if } p_1 \ne p_2. $$
Then, for a given $p \in \{1, \ldots, m\}$, we can decompose $\xi$ into $\xi = \xi_{N,p} \bar\xi_{N,p}$, with
$$ \xi_{N,p} = \exp \sum_{k \in D_{N,p}^M} u\Big( \sum_{i \le N-m} g_{i,k} \gamma_{i,k} \sigma_i + g_{N-p+1,k} \sigma_{N-p+1} \Big) = \xi_{N,p}\big( \{\sigma_i,\ i \in I_{N,p}^m\},\, \sigma_{N-p+1} \big), $$
$$ \bar\xi_{N,p} = \bar\xi_{N,p}\big( \{\sigma_i,\ i \in J_m \setminus I_{N,p}^m\},\, \{\sigma_{N-\bar p+1},\ \bar p \le m,\ \bar p \ne p\} \big). $$
Then
$$ \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_x}{\langle \mathrm{Av}\, \xi \rangle_x} = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi_{N,p} \rangle_x \, \langle \mathrm{Av}\, \bar\xi_{N,p} \rangle_x}{\langle \mathrm{Av}\, \xi_{N,p} \rangle_x \, \langle \mathrm{Av}\, \bar\xi_{N,p} \rangle_x} = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi_{N,p} \rangle_x}{\langle \mathrm{Av}\, \xi_{N,p} \rangle_x}, $$
and clearly the derivative $\frac{\partial}{\partial x_i}$ is zero when $i$ does not belong to $I_{N,p}^m$, for any $p \in \{1, \ldots, m\}$.
Now, invoking inequality (50), we get
$$ \sum_{p=1}^m \left| \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_{\bar X}}{\langle \mathrm{Av}\, \xi \rangle_{\bar X}} - u_p \right| \le \sum_{p=1}^m \sum_{i \in I_{N,p}^m} \big| \bar X_i - \langle \sigma_i \rangle_- \big| \, 2 U_\infty e^{2U_\infty}. $$
Then, the definition of $E_{N,m}^\gamma$ and (45) easily yield
$$ E_{N,m}^\gamma \sum_{p=1}^m \sum_{i \in I_{N,p}^m} \big| \bar X_i - \langle \sigma_i \rangle_- \big| \le 2 D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0), $$
which ends the proof.
Set now, for $1 \le p \le m$,
$$ \bar u_p = \frac{\langle \mathrm{Av}\, \sigma_{N-p+1} \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-}. \qquad (51) $$
Then $\bar u_p$ is closer to the real magnetization, in the sense that $\bar u_p = \langle \sigma_{N-p+1} \rangle$ on $\tilde\Omega^c$, and the following lemma claims that the distance between $\bar u_p$ and $u_p$ vanishes as $N \to \infty$.
Lemma 3.10 For $1 \le p \le m$, let $\bar u_p$ be defined by (51). Then, on $\tilde\Omega^c$, we have
$$ d\big( \mathcal{L}_0(\bar u_1, \ldots, \bar u_m),\, \mathcal{L}_0(u_1, \ldots, u_m) \big) \le 2 B_0 \frac{|J_m|^2 - 1}{N - m + 1} (e^{2U_\infty} - 1), $$
where the constant $B_0$ has been defined in the previous section.
Proof: The computations can be carried out here almost as in the proof of Proposition 2.8, and the details are left to the reader.
We will now identify the law of the $\bar u_p$ in terms of laws of the type $T(\mu_{\alpha,\gamma^-})$:
Lemma 3.11 Recall that $d_{\tilde\Omega^c}$ has been defined by relation (37). Then, for $m \ge 1$, set
$$ \delta_m = d_{\tilde\Omega^c}\Big( \mathcal{L}(\bar u_1, \ldots, \bar u_m),\, \sum_{(b)} \sum_{(v)} a\big( (b_1, v_1), \ldots, (b_m, v_m) \big) \, T_{b_1, v_1}(\mu_{\alpha,\gamma^-}) \otimes \cdots \otimes T_{b_m, v_m}(\mu_{\alpha,\gamma^-}) \Big), $$
where we have used the following conventions: for $j \le m$, $v_j$ is a multi-index of the form $v_j = (v_1^j, \ldots, v_{b_j}^j)$; the first summation $\sum_{(b)}$ is over $b_j \ge 0$, for $j = 1, \ldots, m$; the second one $\sum_{(v)}$ is over $v_1^j, \ldots, v_{b_j}^j \ge 0$, for $j = 1, \ldots, m$; and $a((b_1, v_1), \ldots, (b_m, v_m))$ is defined by
$$ a\big( (b_1, v_1), \ldots, (b_m, v_m) \big) = P\Big( |D_{N,j}^M| = b_j,\ (|I_k^m|,\, k \in D_{N,j}^M) = v_j,\ \forall j \le m \Big). $$
Then, under the conditions of Lemma 3.10, we have
$$ \delta_m \le c_1(N, m), $$
with
$$ c_1(N, m) = 4 U_\infty e^{2U_\infty} E\big[ D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0) \big] + 2 B_0 \frac{E|J_m|^2 - 1}{N - m + 1} (e^{2U_\infty} - 1) + 2 U_\infty e^{2U_\infty} \frac{\gamma_0}{N} E\Big[ |R_m - m\alpha| \, |J_m| \exp\Big\{ \frac{\gamma_0}{N} |R_m - m\alpha| \Big\} \Big]. $$
Proof: This result is easily obtained by combining Lemmas 3.7, 3.8, 3.9 and 3.10, and taking expectations.
With Lemma 3.11 in hand, we can see that the remaining task is mainly to compare the coefficients $a((b_1, v_1), \ldots, (b_m, v_m))$ with the coefficients $\kappa_{\alpha,\gamma^-}(b_j, v_j)$. This is done in the following lemma.
Lemma 3.12 With the conventions of Lemma 3.11, we have
$$ \sum_{(b)} \sum_{(v)} \Big| a\big( (b_1, v_1), \ldots, (b_m, v_m) \big) - \prod_{j=1}^m \kappa_{\alpha,\gamma^-}(b_j, v_j) \Big| \le \frac{m L_0(\gamma)}{N}. \qquad (52) $$
Proof: In fact, it is easily seen that we only need to prove that
$$ \sum_{b,\, v \ge 0} \big| a(b, v) - \kappa_{\alpha,\gamma^-}(b, v) \big| \le \frac{L_0(\gamma)}{N}, $$
with $v = (v_1, \ldots, v_b)$. However, notice that
$$ a(b, v) = \binom{M}{b} \Big( \frac{\gamma}{N} \Big)^b \Big( 1 - \frac{\gamma}{N} \Big)^{M-b} \prod_{l=1}^b \binom{N-m}{v_l} \Big( \frac{\gamma}{N} \Big)^{v_l} \Big( 1 - \frac{\gamma}{N} \Big)^{N-m-v_l}, $$
and recall that
$$ \kappa_{\alpha,\gamma^-}(b, v) = e^{-\alpha\gamma^-} \frac{(\alpha\gamma^-)^b}{b!} \, e^{-b\gamma^-} \frac{(\gamma^-)^{\sum_{l \le b} v_l}}{v_1! \cdots v_b!}. $$
Then
$$ \sum_{b,\, v \ge 0} \big| a(b, v) - \kappa_{\alpha,\gamma^-}(b, v) \big| \le A + B, $$
with
$$ A = \sum_{b,\, v \ge 0} e^{-\alpha\gamma^-} \frac{(\alpha\gamma^-)^b}{b!} \bar A_{b,v}, \qquad \bar A_{b,v} = \Big| e^{-b\gamma^-} \frac{(\gamma^-)^{\sum_{l \le b} v_l}}{v_1! \cdots v_b!} - \prod_{l=1}^b \binom{N-m}{v_l} \Big( \frac{\gamma}{N} \Big)^{v_l} \Big( 1 - \frac{\gamma}{N} \Big)^{N-m-v_l} \Big|, $$
$$ B = \sum_{b,\, v \ge 0} \bar B_b \prod_{l=1}^b \binom{N-m}{v_l} \Big( \frac{\gamma}{N} \Big)^{v_l} \Big( 1 - \frac{\gamma}{N} \Big)^{N-m-v_l}, \qquad \bar B_b = \Big| e^{-\alpha\gamma^-} \frac{(\alpha\gamma^-)^b}{b!} - \binom{M}{b} \Big( \frac{\gamma}{N} \Big)^b \Big( 1 - \frac{\gamma}{N} \Big)^{M-b} \Big|. $$
Now, following the estimates for the approximation of a Poisson distribution by a Binomial given in [9, Lemma 7.4.6], we can bound $\bar A_{b,v}$ and $\bar B_b$ by a quantity of the form $\frac{c}{N}$. The proof is then easily finished.
Let us now relate the law of $(\bar u_1, \ldots, \bar u_m)$ with $\mu_{\alpha,\gamma^-}^{\otimes m}$.
Lemma 3.13 We have
$$ d_{\tilde\Omega^c}\big( \mathcal{L}(\bar u_1, \ldots, \bar u_m),\, \mu_{\alpha,\gamma^-}^{\otimes m} \big) \le c_2(N, m), $$
with
$$ c_2(N, m) = 4 U_\infty e^{2U_\infty} E\big[ D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0) \big] + 2 B_0 \frac{E|J_m|^2 - 1}{N - m + 1} (e^{2U_\infty} - 1) + \frac{2 m^2 L_0(\gamma_0)}{N} + 2 U_\infty e^{2U_\infty} \frac{\gamma_0}{N} E\Big[ |R_m - m\alpha| \, |J_m| \exp\Big\{ \frac{\gamma_0}{N} |R_m - m\alpha| \Big\} \Big]. $$
Proof: Notice that, invoking relation (33) and Theorem 3.1, we get
$$ \sum_{(b)} \sum_{(v)} \prod_{j=1}^m \kappa_{\alpha,\gamma^-}(b_j, v_j) \, T_{b_1, v_1}(\mu_{\alpha,\gamma^-}) \otimes \cdots \otimes T_{b_m, v_m}(\mu_{\alpha,\gamma^-}) = \mu_{\alpha,\gamma^-}^{\otimes m}. $$
Then, the result follows easily from Lemmas 3.11 and 3.12, Lemma 7.3.3 in [9] and the triangle inequality.
We are now ready to end the proof of the main result concerning the magnetization of the system.
Proof of Theorem 3.5: First of all, notice that by symmetry we have
$$ \mathcal{L}(\langle \sigma_1 \rangle, \ldots, \langle \sigma_m \rangle) = \mathcal{L}(\langle \sigma_{N-m+1} \rangle, \ldots, \langle \sigma_N \rangle). $$
Furthermore, thanks to (49) and (46) and Lemma 3.3, we can write
$$ D(N, M, m, \gamma_0) = \sup_{\gamma \le \gamma_0} d\big( \mathcal{L}(\langle \sigma_1 \rangle, \ldots, \langle \sigma_m \rangle),\, \mu_{\alpha,\gamma}^{\otimes m} \big) $$
$$ \le \sup_{\gamma \le \gamma_0} d_{\tilde\Omega^c}\left( \mathcal{L}\Big( \frac{\langle \mathrm{Av}\, \sigma_{N-m+1} \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-}, \ldots, \frac{\langle \mathrm{Av}\, \sigma_N \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-} \Big),\, \mu_{\alpha,\gamma}^{\otimes m} \right) + \frac{2 m^3 \alpha \gamma_0^2 (1 + \alpha\gamma_0^2)}{N} $$
$$ \le \sup_{\gamma \le \gamma_0} d_{\tilde\Omega^c}\left( \mathcal{L}\Big( \frac{\langle \mathrm{Av}\, \sigma_{N-m+1} \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-}, \ldots, \frac{\langle \mathrm{Av}\, \sigma_N \xi \rangle_-}{\langle \mathrm{Av}\, \xi \rangle_-} \Big),\, \mu_{\alpha,\gamma^-}^{\otimes m} \right) + \frac{2 m^3 \alpha \gamma_0^2 (1 + \alpha\gamma_0^2)}{N} + \frac{4 m \alpha \gamma_0}{N} \Big( \gamma_0 \exp\Big\{ \frac{m\gamma_0}{N} \Big\} + \exp\Big\{ \frac{m\alpha\gamma_0}{N} \Big\} \Big). $$
31
Then, Lemma 3.13 implies
$$ D(N, M, m, \gamma_0) \le 4 U_\infty e^{2U_\infty} E\big[ D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0) \big] + 2 B_0 \frac{E|J_m|^2 - 1}{N - m + 1} (e^{2U_\infty} - 1) $$
$$ + 2 U_\infty e^{2U_\infty} \frac{\gamma_0}{N} E\Big[ |R_m - m\alpha| \, |J_m| \exp\Big\{ \frac{\gamma_0}{N} |R_m - m\alpha| \Big\} \Big] + \frac{2 m^2 L_0(\gamma_0)}{N} + \frac{12 m^3 \alpha \gamma_0^4 \exp(\gamma_0)}{N}. $$
It is readily checked, as we did in (29), that
$$ E(|J_m|) \le \frac{N-m}{N} \alpha m \gamma_0^2, $$
$$ E(|J_m|^2) \le \frac{N-m}{N} (\gamma_0 + \gamma_0^2) \big[ \alpha m \gamma_0 + (\alpha m \gamma_0)^2 \big], $$
$$ E(|J_m|^3) \le \frac{N-m}{N} (\gamma_0 + 3\gamma_0^2 + \gamma_0^3) \big[ \alpha m \gamma_0 + 3 (\alpha m \gamma_0)^2 + (\alpha m \gamma_0)^3 \big]. $$
Thus, using the fact that $R_m \le Y$ where $Y \sim \mathcal{B}(mM, \frac{\gamma}{N})$, together with the trivial bound $R_m \le M$, there exists a constant $K_0 \ge 1$ such that
$$ D(N, M, m, \gamma_0) \le 4 U_\infty e^{2U_\infty} E\big[ D(N-m, |(F_{N,M}^m)^c|, |J_m|, \gamma_0) \big] + \frac{K_0 m^3 \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N}. \qquad (53) $$
Now we are able to prove, by induction over $N$, that
$$ D(N, M, m, \gamma_0) \le \frac{2 K_0 m^3 \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N}, \qquad \text{for all } m \le \frac{N}{2}. $$
Indeed, in order to check the induction step from $N-1$ to $N$, notice that $|(F_{N,M}^m)^c| \le M$ and that
$$ E(|J_m|^3) \le 25 \frac{N-m}{N} m^3 \alpha \gamma_0^6. $$
So, using also that
$$ P\Big( |J_m| \ge \frac{N}{2} \Big) \le \frac{4}{N^2} E(|J_m|^2) \le \frac{16 m^2 \alpha \gamma_0^4}{N^2}, $$
and by our induction hypothesis and (53), we have
$$ D(N, M, m, \gamma_0) \le \frac{K_0 m^3 \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N} + 4 U_\infty e^{2U_\infty} \left( \frac{2 K_0 E(|J_m|^3) \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N - m} + \frac{32 m^3 \alpha \gamma_0^4}{N^2} \right) $$
$$ \le \frac{K_0 m^3 \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N} + 4 U_\infty e^{2U_\infty} \left( \frac{50 K_0 m^3 \alpha \gamma_0^6 \big( \alpha \gamma_0^4 \exp(\frac{3}{2}\gamma_0) + L_0(\gamma_0) \big)}{N} + \frac{32 m^3 \alpha \gamma_0^4}{N^2} \right). $$
Finally, since M < N − m, the proof easily follows from hypothesis (43).
4 Replica symmetric formula
Now that the limiting law of the magnetization has been computed, we can try to evaluate the asymptotic behavior of the free energy of our system, namely
$$ p_N(\gamma) = \frac{1}{N} E\left[ \log\left( \sum_{\sigma \in \Sigma_N} \exp\big( -H_{N,M}(\sigma) \big) \right) \right]. $$
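For very small $N$, $p_N(\gamma)$ can be evaluated by exhaustive enumeration of $\Sigma_N$, with the disorder expectation replaced by a Monte Carlo mean. The sketch below is purely illustrative (the choice $u = \tanh$ and all parameter values are ours); at $\gamma = 0$ the Hamiltonian is constant, so the value of $p_N(0)$ can be checked exactly.

```python
import itertools
import math
import random

def p_N(N, M, gamma, u, n_disorder=50, seed=0):
    """Brute-force estimate of the free energy p_N(gamma) for a tiny system."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_disorder):
        g = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
        gam = [[1 if rng.random() < gamma / N else 0 for _ in range(M)]
               for _ in range(N)]
        z = sum(math.exp(sum(u(sum(g[i][k] * gam[i][k] * s[i] for i in range(N)))
                             for k in range(M)))
                for s in itertools.product((-1, 1), repeat=N))
        total += math.log(z)
    return total / (n_disorder * N)

# At gamma = 0 every gamma_{i,k} vanishes and the Hamiltonian is constant, so
# p_N(0) = log 2 up to the constant alpha*u(0); with u = tanh, u(0) = 0.
assert abs(p_N(4, 2, 0.0, math.tanh, n_disorder=3) - math.log(2)) < 1e-12

# For gamma > 0, |u| <= 1 forces |p_N(gamma) - log 2| <= M/N deterministically.
val = p_N(4, 2, 1.0, math.tanh, n_disorder=20)
assert abs(val - math.log(2)) <= 2 / 4
```

Exhaustive enumeration is of course limited to $N$ of order 20 or so; the point of the section is precisely to replace it by the asymptotic formula of Theorem 4.1.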
To this purpose, set
$$ G(\gamma) = \alpha \log\left( \sum_{p=0}^\infty \frac{\gamma^p}{p!} \exp(-\gamma) \, E\left[ \frac{\bar V_{p+1}}{\bar V_p} \right] \right), $$
where
$$ \bar V_p := \int \Big\langle \exp u\Big( \sum_{i \le p} g_{i,M} \sigma_i \Big) \Big\rangle_{(x_1, \ldots, x_p)} d\mu_{\alpha,\gamma}(x_1) \times \cdots \times d\mu_{\alpha,\gamma}(x_p), $$
and $\langle \cdot \rangle_x$ means integration with respect to the product measure $\nu$ on $\{-1,1\}^p$ such that $\int \sigma_i \, d\nu = x_i$. Then, the main result of this part states that:
Theorem 4.1 Set $F$ such that $F'(\gamma) = G(\gamma)$ and $F(0) = \log 2 - \alpha u(0)$. Then, if $\gamma \le \gamma_0$ and (42) and (43) hold true, we have
$$ |p_N(\gamma) - F(\gamma)| \le \frac{K}{N}, $$
where $K$ does not depend on $\gamma$ and $N$.
Since $p_N(0) = \log 2 - \alpha u(0)$, the proof of the theorem is a consequence of the following proposition.
Proposition 4.2 If $\gamma \le \gamma_0$ and (42) and (43) hold, we have
$$ |p_N'(\gamma) - G(\gamma)| \le \frac{K}{N}, $$
where $p_N'(\gamma)$ is the right derivative of $p_N(\gamma)$.
Proof: We divide the proof into two steps.
Step 1: We will check that
$$ |p_N'(\gamma) - G_1(\gamma)| \le \frac{K}{N}, \qquad (54) $$
where $G_1(\gamma)$ is defined as
$$ \alpha E\left[ \log\Big\langle \exp\Big( u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \Big) - u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i \Big) \Big) \Big\rangle \right]. $$
Following the method used in Lemma 7.4.11 in [9], we introduce the Hamiltonians
$$ -H_{N,M}^1(\sigma) = \sum_{k \le M} u\Big( \sum_{i \le N} g_{i,k} (\gamma_{i,k} + \delta_{i,k}) \sigma_i \Big), $$
$$ -H_{N,M}^2(\sigma) = \sum_{k \le M} u\Big( \sum_{i \le N} g_{i,k} \min(1, \gamma_{i,k} + \delta_{i,k}) \sigma_i \Big), $$
where $\{\delta_{i,k}\}_{1 \le i \le N, 1 \le k \le M}$ is a family of i.i.d. random variables with $P(\delta_{i,k} = 1) = \frac{\delta}{N}$, $P(\delta_{i,k} = 0) = 1 - \frac{\delta}{N}$. We also assume that this sequence is independent of all the random sequences previously introduced. Observe that the random variables $\min(1, \gamma_{i,k} + \delta_{i,k})$ are i.i.d. with Bernoulli law of parameter $\frac{\gamma'}{N}$, where $\gamma' \equiv \gamma + \delta - \frac{\gamma\delta}{N}$. Set now, for $j = 1, 2$,
$$ p_N^j(\delta) = \frac{1}{N} E\left[ \log\left( \sum_{\sigma \in \Sigma_N} \exp\big( -H_{N,M}^j(\sigma) \big) \right) \right]. $$
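The value of $\gamma'$ is elementary inclusion-exclusion: $\min(1, B + B')$, with $B \sim \mathrm{Bernoulli}(\gamma/N)$ and $B' \sim \mathrm{Bernoulli}(\delta/N)$ independent, succeeds with probability $1 - (1 - \gamma/N)(1 - \delta/N)$, and expanding the product gives exactly $\gamma'/N$. A two-line check with arbitrary values:

```python
# min(1, B + B') is Bernoulli with parameter 1 - (1 - gamma/N)(1 - delta/N);
# expanding gives gamma'/N with gamma' = gamma + delta - gamma*delta/N.
N, gamma, delta = 50, 1.3, 0.7
p = 1 - (1 - gamma / N) * (1 - delta / N)
gamma_prime = gamma + delta - gamma * delta / N
assert abs(p - gamma_prime / N) < 1e-12
```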
Obviously, $p_N^2(\delta) = p_N(\gamma')$, and our first task will be to show that $p_N^1(\delta) - p_N^2(\delta)$ is of order $\delta^2$: notice that
$$ p_N^1(\delta) - p_N^2(\delta) = \frac{1}{N} E \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}^2(\sigma) \big) \big\rangle_2, $$
where $\langle \cdot \rangle_2$ denotes the average for the Gibbs measure defined by the Hamiltonian $H_{N,M}^2$. Consider now $Y_{N,M}^1 = \sum_{i,k} \gamma_{i,k} \delta_{i,k}$. Since $\gamma_{i,k} + \delta_{i,k} = \min(1, \gamma_{i,k} + \delta_{i,k}) + \gamma_{i,k} \delta_{i,k}$, on the set $\{Y_{N,M}^1 = 0\}$ we have $H_{N,M}^1 = H_{N,M}^2$. So, we can write
$$ p_N^1(\delta) - p_N^2(\delta) = \frac{1}{N} E\Big[ \mathbf{1}_{\{Y_{N,M}^1 = 1\}} \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}^2(\sigma) \big) \big\rangle_2 \Big] + \frac{1}{N} E\Big[ \mathbf{1}_{\{Y_{N,M}^1 \ge 2\}} \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}^2(\sigma) \big) \big\rangle_2 \Big]. $$
Using that
$$ P(Y_{N,M}^1 \ge 2) = 1 - \Big( 1 - \frac{\gamma\delta}{N^2} \Big)^{NM} - NM \frac{\gamma\delta}{N^2} \Big( 1 - \frac{\gamma\delta}{N^2} \Big)^{NM-1} \le \alpha^2 \delta^2 \gamma^2 $$
and
$$ P(Y_{N,M}^1 = 1) = NM \frac{\gamma\delta}{N^2} \Big( 1 - \frac{\gamma\delta}{N^2} \Big)^{NM-1} \le \alpha\gamma\delta, $$
it is easily checked that
$$ \lim_{\delta \to 0^+} \frac{p_N^1(\delta) - p_N^2(\delta)}{\delta} \le \frac{K}{N}, \qquad (55) $$
which means that we can evaluate the difference $p_N^1(\delta) - p_N(\gamma)$ instead of $p_N^2(\delta) - p_N(\gamma)$.
However, following the same arguments as above, we can write
$$ p_N^1(\delta) - p_N(\gamma) = \frac{1}{N} E \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}(\sigma) \big) \big\rangle. $$
We now consider $Y_{N,M} = \sum_{i,k} \delta_{i,k}$. Notice that on the set $\{Y_{N,M} = 0\}$, $H_{N,M}^1 = H_{N,M}$. So, we can write
$$ p_N^1(\delta) - p_N(\gamma) = \frac{1}{N} E\Big[ \mathbf{1}_{\{Y_{N,M} = 1\}} \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}(\sigma) \big) \big\rangle \Big] + \frac{1}{N} E\Big[ \mathbf{1}_{\{Y_{N,M} \ge 2\}} \log\big\langle \exp\big( -H_{N,M}^1(\sigma) + H_{N,M}(\sigma) \big) \big\rangle \Big] \equiv V_1(\delta) + V_2(\delta). $$
Let us now bound $V_1(\delta)$ and $V_2(\delta)$: since
$$ P(Y_{N,M} \ge 2) = 1 - \Big( 1 - \frac{\delta}{N} \Big)^{NM} - NM \frac{\delta}{N} \Big( 1 - \frac{\delta}{N} \Big)^{NM-1} \le \alpha (NM - 1) \delta^2, $$
we have
$$ |V_2(\delta)| \le 2 \alpha^2 (NM - 1) U_\infty \delta^2. $$
On the other hand, using a symmetry argument, we get
$$ V_1(\delta) = NM \frac{\delta}{N^2} \Big( 1 - \frac{\delta}{N} \Big)^{NM-1} \times E\left[ \log\Big\langle \exp\Big( u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \Big) - u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i \Big) \Big) \Big\rangle \right]. $$
Hence, we obtain that
$$ \lim_{\delta \to 0^+} \frac{p_N^1(\delta) - p_N(\gamma)}{\delta} = \lim_{\delta \to 0^+} \frac{V_1(\delta) + V_2(\delta)}{\delta} = \alpha E\left[ \log\Big\langle \exp\Big( u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \Big) - u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i \Big) \Big) \Big\rangle \right]. \qquad (56) $$
Eventually, since
$$ p_N'(\gamma) = \lim_{\gamma' \to \gamma^+} \frac{p_N(\gamma') - p_N(\gamma)}{\gamma' - \gamma} = \lim_{\delta \to 0^+} \frac{p_N^2(\delta) - p_N(\gamma)}{\delta \big( 1 - \frac{\gamma}{N} \big)}, $$
putting together (55) and (56), we obtain (54).
Step 2: Let us now check that
$$ |G(\gamma) - G_1(\gamma)| \le \frac{K}{N}. \qquad (57) $$
To this purpose, set
$$ \Psi := \Big\langle \exp\Big( u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \Big) - u\Big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i \Big) \Big) \Big\rangle, $$
and let us first try to evaluate $E[\Psi]$: notice that
$$ \Psi = \frac{\sum_{\sigma \in \Sigma_N} \exp u\big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \big) \exp\big( -H_{N,M-1}(\sigma) \big)}{\sum_{\sigma \in \Sigma_N} \exp\big( -H_{N,M}(\sigma) \big)} = \frac{\big\langle \exp u\big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i + g_{N,M} \sigma_N \big) \big\rangle_{M-1}}{\big\langle \exp u\big( \sum_{i \le N} g_{i,M} \gamma_{i,M} \sigma_i \big) \big\rangle_{M-1}}, $$
where $\langle \cdot \rangle_{M-1}$ denotes the usual average using the Hamiltonian $H_{N,M-1}$. Set $B_p := \{ \sum_{i=1}^{N-1} \gamma_{i,M} = p,\ \gamma_{N,M} = 0 \}$ and $B := \{ \gamma_{N,M} = 1 \}$, and let us denote by $E_M$ the conditional expectation given $\{\gamma_{i,M},\ 1 \le i \le N\}$. Then
$$ E[\Psi] = E\left[ \sum_{p=0}^{N-1} \mathbf{1}_{B_p} E_M[\Psi] \right] + E\big[ \mathbf{1}_B E_M[\Psi] \big] \qquad (58) $$
$$ = \sum_{p=0}^{N-1} \binom{N-1}{p} \Big( \frac{\gamma}{N} \Big)^p \Big( 1 - \frac{\gamma}{N} \Big)^{N-p} E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{M-1}}{\langle \exp(V_p) \rangle_{M-1}} \right] + \frac{\gamma}{N} e^{2U_\infty}, $$
where
$$ V_p := u\Big( \sum_{i \le p} g_{i,M} \sigma_i \Big). $$
Set $X^p = (\langle \sigma_1 \rangle, \ldots, \langle \sigma_p \rangle)$. Then, using the triangle inequality and following the same arguments as in Proposition 2.8, we get, for a strictly positive constant $K$,
$$ \left| E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{M-1}}{\langle \exp(V_p) \rangle_{M-1}} \right] - E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{X^{p+1}}}{\langle \exp(V_p) \rangle_{X^p}} \right] \right| = \left| E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{M-1} \langle \exp(V_p) \rangle_{X^p} - \langle \exp(V_{p+1}) \rangle_{X^{p+1}} \langle \exp(V_p) \rangle_{M-1}}{\langle \exp(V_p) \rangle_{M-1} \langle \exp(V_p) \rangle_{X^p}} \right] \right| $$
$$ \le e^{3U_\infty} E\Big[ \big| \langle \exp(V_{p+1}) \rangle_{M-1} - \langle \exp(V_{p+1}) \rangle_{X^{p+1}} \big| + \big| \langle \exp(V_p) \rangle_{M-1} - \langle \exp(V_p) \rangle_{X^p} \big| \Big] \le e^{3U_\infty} \frac{p^2 K}{N}. \qquad (59) $$
Consider now some i.i.d. random variables $z_1, \ldots, z_p$ of law $\mu_{\alpha,\gamma}$ such that (44) holds. Set $Y^p = (z_1, \ldots, z_p)$. Then, following the same arguments as above, we get, for a strictly positive constant $K$,
$$ \left| E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{X^{p+1}}}{\langle \exp(V_p) \rangle_{X^p}} \right] - E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{Y^{p+1}}}{\langle \exp(V_p) \rangle_{Y^p}} \right] \right| \le e^{3U_\infty} E\Big[ \big| \langle \exp(V_{p+1}) \rangle_{X^{p+1}} - \langle \exp(V_{p+1}) \rangle_{Y^{p+1}} \big| + \big| \langle \exp(V_p) \rangle_{X^p} - \langle \exp(V_p) \rangle_{Y^p} \big| \Big] \le e^{3U_\infty} \frac{p^3 K}{N}, \qquad (60) $$
where in the last inequality we have used (44) and the fact that
$$ \left| \frac{\partial}{\partial x_i} \langle \exp(V_p) \rangle_x \right| \le e^{U_\infty}. $$
Notice that if $W$ is a random variable with law $\mathcal{B}(N-1, \frac{\gamma}{N})$, then $E(W^3) \le K$, where $K$ does not depend on $N$. So, putting together (58), (59) and (60), we get
$$ E[\Psi] = \sum_{p=0}^{N-1} \binom{N-1}{p} \Big( \frac{\gamma}{N} \Big)^p \Big( 1 - \frac{\gamma}{N} \Big)^{N-p} E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{Y^{p+1}}}{\langle \exp(V_p) \rangle_{Y^p}} \right] + \frac{K}{N}. $$
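The uniform bound on $E(W^3)$ can be checked directly: for $W$ of law $\mathcal{B}(N-1, \gamma/N)$, the third raw moment stays below $\gamma + 3\gamma^2 + \gamma^3$, the third moment of its Poisson$(\gamma)$ limit, for every $N$. A small sketch with arbitrary parameters:

```python
import math

def binom_third_moment(n, p):
    """Exact E[W^3] for W ~ Bin(n, p), by direct summation of the pmf."""
    return sum(k**3 * math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

gamma = 2.0
for N in (10, 100, 500):
    m3 = binom_third_moment(N - 1, gamma / N)
    # Poisson(gamma) has third raw moment gamma + 3*gamma^2 + gamma^3,
    # which dominates uniformly in N since (N-1)*gamma/N <= gamma
    assert m3 <= gamma + 3 * gamma**2 + gamma**3
```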
Using now arguments similar to those used in the proof of Lemma 3.11, we get
$$ E[\Psi] = \sum_{p=0}^\infty \frac{\gamma^p}{p!} \exp(-\gamma) \, E\left[ \frac{\langle \exp(V_{p+1}) \rangle_{Y^{p+1}}}{\langle \exp(V_p) \rangle_{Y^p}} \right] + \frac{K}{N}. \qquad (61) $$
Eventually, once (61) has been obtained, (57) can be established following the method used in Proposition 7.4.10 in [9], the remaining details being left to the reader.
References
[1] Bardina, X.; Márquez-Carreras, D.; Rovira, C.; Tindel, S.: The p-spin
interaction model with external field. Potential Analysis 21 no. 4 (2004)
311–362.
[2] Cont, R.; Lowe, M.: Social distance, heterogeneity and social interactions.
Preprint.
[3] Franz, S.; Toninelli, F.: The Kac limit for diluted spin glasses. Internat.
J. Modern Phys. B 18 (2004), no. 4-5, 675–679.
[4] Guerra, F.; Toninelli, F.: The thermodynamic limit in mean field spin
glass models. Comm. Math. Phys. 230 (2002), no. 1, 71–79.
[5] Hertz, J.; Krogh, A.; Palmer, R.: Introduction to the Theory of Neural
Computation. Addison-Wesley Publishing Company, 1991.
[6] Márquez-Carreras, D.; Rovira, C.; Tindel, S.: Asymptotic behavior of the magnetization for the perceptron model. To appear in Ann. Inst. H. Poincaré Probab. Statist.
[7] Mezard, M.; Parisi, G.; Virasoro, M.A.: Spin glass theory and beyond.
World Scientific, 1987.
[8] Shcherbina, M.; Tirozzi, B.: Rigorous solution of the Gardner problem.
Comm. Math. Phys. 234 (2003), no. 3, 383–422.
[9] Talagrand, M.: Spin Glasses: a Challenge for Mathematicians. Springer,
2003.
[10] Talagrand, M.: The Parisi solution. To appear in Annals of Math.