
APPLICATIONES MATHEMATICAE

28,1 (2001), pp. 55–72

Bolesław Kopociński (Wrocław)

COMPONENTS OF THE GAME RESULT IN A FOOTBALL LEAGUE

Abstract. We assume that the result of a football game depends upon the
difference of the strengths of the teams, home-field advantage, random
factors and also other components. We describe the goal outcome per game
by independent Poisson random variables; we concentrate on expected values.
The least squares estimators of the parameters are obtained. The study
is illustrated by examples from the Italian and Polish leagues.

1. Introduction. The 1998 Soccer World Cup attracted the attention of many
people. The result of a football (soccer) game is to a considerable degree
a reflection of the “strengths” of the playing teams. Starting from this
supposition and using well-known rankings of teams, experts forecast the
results of elimination groups and publish their predictions (on the
Internet: Magne Aldrin, Norwegian Computing Center). In the description of a
football league, important factors seem to be the home-field advantage and
a random element. The purpose of the paper is to separate these components
using diverse data: the results of a league season or, in the extreme
case, the league table only. It is not our intention here to define the best
strengths. It is reasonable to assume that the point table reflects the
distribution of strengths. So we assume the knowledge of points in the final
table of one season and use these as the strengths of the teams. The strength
and home-field advantage were considered by Glickman and Stern [3] when
analysing the differences of points scored in the American National Football
League (NFL). The competitions of the Soccer World Cup have a limited
element of home-field advantage, but the strengths and the random component
remain.

2000 Mathematics Subject Classification: 62F10, 62P99.

Key words and phrases: football (soccer) league, strength of team, home-field
advantage, style of game, retaliation, least squares estimation.


In the description of a football league we make considerable theoretical
simplifications and assume a limited knowledge of outcomes of one season of
competitions. We assume that the numbers of goals scored by teams in one
game are independent Poisson random variables. In detail, we concentrate
on the expected values of these variables. We assume that the strengths of
teams are constant during each season. Note that Keller [4] assumed this
for games of England, Ireland, Scotland and Wales over the long 1883–1980
period. In the supplement we indicate additional components of the game
result: the mutual dependence of a game–return pair and the question of
game style (imposing the defensive or offensive style on the opponent).
Let us recall some football terminology and introduce the notation. Assume
that in a league there are $n$ teams, and their strengths are denoted by
$m = (m_1, \dots, m_n)$. The goal outcome of a game between team $i$ and team
$j$ with strengths $m_i$ and $m_j$ is denoted by $X_{ij}, Y_{ij}$ (the home
team placed first). Define $K$ to be the number of points gained by a winning
team. In the past, $K = 2$ points were given to a winning team and 1 point to
both teams in a tie game. Now there are usually $K = 3$ points for a win and
1 point for a tie. Denote by $U_{ij}, V_{ij}$ the point outcome of a game
between teams $i, j$ and define
$$U_{ij} = K\,\mathbf{1}_{\{X_{ij} > Y_{ij}\}} + \mathbf{1}_{\{X_{ij} = Y_{ij}\}}, \qquad V_{ij} = K\,\mathbf{1}_{\{X_{ij} < Y_{ij}\}} + \mathbf{1}_{\{X_{ij} = Y_{ij}\}}. \tag{1}$$
The point outcome of team $i$ in the whole season is
$$U_i = \sum_{j=1,\, j \ne i}^{n} (U_{ij} + V_{ji}), \qquad 1 \le i \le n. \tag{2}$$
The random variables (2) are mutually dependent. For example, if $K = 2$,
then $V_{ij} = 2 - U_{ij}$ and we have
$$U_1 = U_{12} + U_{13} + \dots + U_{1n} + V_{21} + V_{31} + \dots + V_{n1},$$
$$U_2 = U_{21} + U_{23} + \dots + U_{2n} + V_{12} + V_{32} + \dots + V_{n2},$$
thus $\mathrm{Cov}(U_1, U_2) = -\mathrm{Var}(U_{12}) - \mathrm{Var}(U_{21})$ is negative.
The position of team i in the final table is given by the rank of Ui in
the sequence (2). The goals scored give the additional classification which
is used if the numbers of points are equal. Here the additional classification
is not essential but the difference of real and expected numbers of goals in
the final table will illustrate the efficiency of the estimation.
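The definitions (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names and the three-team mini-league with made-up scores are assumptions of the sketch, not data from the paper.

```python
def game_points(x, y, K=3):
    """Point outcome (U, V) of one game from the goal outcome (x, y), as in (1)."""
    if x > y:
        return K, 0
    if x < y:
        return 0, K
    return 1, 1


def season_points(results, n, K=3):
    """Season point totals U_i as in (2).

    `results` maps (i, j) -> (x_ij, y_ij), with the home team i listed first.
    """
    totals = [0] * n
    for (i, j), (x, y) in results.items():
        u, v = game_points(x, y, K)
        totals[i] += u   # U_ij: points of the home team i
        totals[j] += v   # V_ij: points of the guest j
    return totals


# Hypothetical three-team mini-league (made-up scores).
results = {
    (0, 1): (2, 1), (1, 0): (0, 0),
    (0, 2): (1, 1), (2, 0): (0, 3),
    (1, 2): (2, 0), (2, 1): (1, 2),
}
points = season_points(results, n=3, K=3)
```

With K = 2 the totals always add up to twice the number of games, which is the source of the negative covariance noted above.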
We illustrate our considerations by examples from the Italian and Polish
leagues. In particular, we consider past and present-day leagues in which the
number of teams, the parameter K and the mean number of goals scored
per game differ. The statistical data used in the paper come from [1] and
[2]. Fragments of text concerning examples are enclosed within  .
We easily find analogies to the football problems in other sport disciplines
and also in other areas. In the medical problem of carcinogenesis one
considers the share of genetic and environmental components. In the
psychological problem of the level of human intelligence the heredity and
education components are considered. However the main components considered
here are rather specific to sport.

2. Components of the game result. Let $f(k)$, $1 \le k \le n$, denote the
point outcome for a team which occupied the $k$th place in the final table
of a league season. If we assume that a high-placed team always beats a
low-placed one, then the number of points in the final table is the linear
function $f_A(k) = 2(n - k)K$. If a home team always wins, then the number
of points in the final table is constant: $f_B(k) = (n - 1)K$.
Now consider the result of each game to be purely random with any
probabilities of a win by the home team, defeat and a tie game. Let $K = 2$.
For every pair of teams $i, j$ we introduce independent identically distributed
random variables $U_{ij}$ such that $P(U_{ij} = 2) = p > 0$, $P(U_{ij} = 1) = r \ge 0$,
$p + r < 1$ ($1 \le i, j \le n$, $i \ne j$). The expected number of points under
random results was established by simulation for $p = 0.467$, $r = 0.327$ and
$1 - p - r = 0.206$ (which corresponds to the Italian league in the 1954/55
season). The point scores in the final table are evidently unequal but the
differences are smaller than in reality.
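A simulation of this kind can be sketched as follows. The paper does not describe its simulation procedure, so the number of replications, the seeding scheme and the function names below are assumptions made only for illustration.

```python
import random


def simulate_random_league(n, p, r, seed=0):
    """Simulate one season with K = 2 in which every game is purely random.

    The home team wins with probability p (2 points), the game is tied with
    probability r (1 point each), otherwise the guest wins (2 points).
    Returns the final point totals sorted from the top of the table down.
    """
    rng = random.Random(seed)
    points = [0] * n
    for i in range(n):            # i is the home team
        for j in range(n):        # j is the guest
            if i == j:
                continue
            u = rng.random()
            if u < p:
                points[i] += 2
            elif u < p + r:
                points[i] += 1
                points[j] += 1
            else:
                points[j] += 2
    return sorted(points, reverse=True)


def mean_table(n, p, r, seasons=200):
    """Average the k-th place point total over many simulated seasons."""
    sums = [0] * n
    for s in range(seasons):
        for k, pts in enumerate(simulate_random_league(n, p, r, seed=s)):
            sums[k] += pts
    return [total / seasons for total in sums]


# Probabilities quoted in the text for the Italian league 1954/55 (n = 18).
table = mean_table(n=18, p=0.467, r=0.327, seasons=200)
```

The averaged table is non-increasing in the place k even though all teams are equally strong, which is the effect of pure randomness described in the text.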
“Pure” models of a league generated only by the strength of teams,
the home-field advantage and random effects give league tables of different
forms. This permits the estimation of these components if we consider them
together.
2.1. Poisson model of league. Assume that the strength vector $m$ is
known and that the goal outcome of a game depends upon: the difference of
the strengths of the teams, the home-field advantage and a random factor.
Formally, assume that the following formulas hold:
$$X_{ij} = \Pi^{(1)}(a_1(r_{ij})) + \Pi^{(3)}(b(m_i)) + \Pi^{(4)}(c_1), \qquad Y_{ij} = \Pi^{(2)}(a_2(r_{ji})) + \Pi^{(5)}(c_2), \tag{3}$$
where $r_{ij} = \max(m_i - m_j, 0)$ is called the strength index; $\Pi(\lambda)$
denotes a Poisson random variable with parameter $\lambda$; the random variables
$\Pi^{(1)}$ to $\Pi^{(5)}$ are mutually independent. Here $a_1, a_2, b$ are known
functions and $c_1, c_2$ are the model parameters. We will specify these
functions later on.
We interpret the random variables $\Pi^{(1)}, \Pi^{(2)}$ as the components of the
game result which are due to the difference in team strengths; $\Pi^{(3)}$
(strictly speaking its expected value) as home-field advantage; and
$\Pi^{(4)}, \Pi^{(5)}$ as random factors. The properties of independent Poisson
random variables imply the existence of random variables $\Pi_1, \Pi_2$ such that
$$X_{ij} \overset{d}{=} \Pi_1(a_1(r_{ij}) + b(m_i) + c_1), \qquad Y_{ij} \overset{d}{=} \Pi_2(a_2(r_{ji}) + c_2). \tag{4}$$
In practice the model functions must be more precise. It is natural to
assume that the expectations of $\Pi^{(1)}, \Pi^{(2)}$ are proportional to the
strength index, the expectation of $\Pi^{(3)}$ is proportional to the strength of
the home team, and the random elements have an identical expected value for
each team. The data of one football season are not extensive, therefore a
limitation of the number of parameters is desired. In what follows, in the
Poisson model we make the following
Assumption. The strength vector $m$ is known, the model functions are
as follows:
$$a_1(r_{ij}) = a r_{ij}, \quad a_2(r_{ji}) = a r_{ji}, \quad b(m_i) = b m_i, \quad c_1 = c_2 = c, \tag{5}$$
and the vector of the model parameters is $(a, b, c) = p$.
Under the assumption (5) the goal outcome of a game of teams $i, j$ is a
pair of independent Poisson random variables $X_{ij}, Y_{ij}$ with expected values
$$E(X_{ij}) = a r_{ij} + b m_i + c, \qquad E(Y_{ij}) = a r_{ji} + c, \qquad 1 \le i, j \le n,\ i \ne j. \tag{6}$$
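As an illustration of the expected values in (6), the following sketch evaluates them for every ordered pair of teams. The function name is an assumption of the sketch; the strengths and parameter values are the point totals and estimates reported in Table 1 for the 1938 Polish league.

```python
def expected_goals(m, a, b, c):
    """Expected goals (E X_ij, E Y_ij) for every ordered pair i != j, as in (6)."""
    n = len(m)
    out = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                r_ij = max(m[i] - m[j], 0)   # strength index of the home team
                r_ji = max(m[j] - m[i], 0)   # strength index of the guest
                out[(i, j)] = (a * r_ij + b * m[i] + c, a * r_ji + c)
    return out


# Point strengths and parameter estimates as reported in Table 1.
m = [27, 21, 20, 19, 19, 18, 18, 15, 12, 11]
a, b, c = 0.0996, 0.0796, 1.167
goals = expected_goals(m, a, b, c)
N = len(goals)                                   # n(n - 1) games
mean_home = sum(x for x, _ in goals.values()) / N
mean_guest = sum(y for _, y in goals.values()) / N
```

Averaged over all games, these values reproduce (up to rounding) the mean numbers 2.861 and 1.428 of goals per game for a home team and a guest quoted in the discussion of Table 1.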

3. Problem of parameter estimation. We consider three variants of
available data. First we consider the goal results for games of one league
season. In the second case, besides goals, for each game we also consider the
date of the match. In the third case we consider the final table of points
only. Let $D = \{(x_{ij}, y_{ij}) : 1 \le i, j \le n,\ i \ne j\}$ denote the goal
results of the games of one season. Obviously using (1) and (2) we can
evaluate the league table.
The least squares method of estimation is applied for all variants of
the problem. Note that other models and methods of analysis are possible.
For example Lee [5] assumed that the strength index depends upon the
available quotient of strengths. In this model the number of goals scored
by team A in a game between team A and team B is Poisson distributed
with parameter $\lambda(A, B) = (\text{constant}) \cdot (\text{strength of team } A) \cdot (\text{strength of team } B)^{-1}$.
Then log-linear models of mathematical statistics (see [6]) may be useful.
3.1. Estimation of p from one season results. Suppose that we estimate
the components of a game result having $D$. Denote by $N = n(n - 1)$ the
number of games and let $1 \le k \le N$ index the games. For each $k$ we
introduce the following variables:
$m_1(k)$: strength of the home team in the $k$th game,
$m_2(k)$: strength of the guest,
$r_1(k)$: strength index of the home team,
$r_2(k)$: strength index of the guest,
$X(k)$: goal outcome of the home team,
$Y(k)$: number of goals lost by the home team.
If it does not cause confusion, we omit the index $k$ and summation limits.
Set $m_1 = m_1(k)$, $r = r(k) = m_1(k) - m_2(k)$, $r_1 = r_1(k) = \max(r(k), 0)$,
$r_2 = r_2(k) = \max(0, -r(k))$, $X = X(k)$, $Y = Y(k)$. Let
$$A = \begin{pmatrix} \sum r^2 & \sum r_1 m_1 & \sum |r| \\ \sum r_1 m_1 & \sum m_1^2 & \sum m_1 \\ \sum |r| & \sum m_1 & 2N \end{pmatrix}, \tag{7}$$
$$B = \Big( \sum (r_1 X + r_2 Y),\ \sum m_1 X,\ \sum (X + Y) \Big). \tag{8}$$
Theorem 1. Under the assumption (6) the least squares method (LSM)
estimator of $p$ is given by
$$\hat p = B A^{-1}. \tag{9}$$
It is unbiased with covariance matrix
$$\mathrm{Cov}(\hat p) = \sum_{k=1}^{N} \Big( A^{-1} B_1(k)^T B_1(k) A^{-1}\, \mathrm{Var}(X(k)) + A^{-1} B_2(k)^T B_2(k) A^{-1}\, \mathrm{Var}(Y(k)) \Big), \tag{10}$$
where $B_1(k) = (r_1(k), m_1(k), 1)$, $B_2(k) = (r_2(k), 0, 1)$ are row vectors.
Proof. Define
$$L(p \mid X(k), Y(k),\ 1 \le k \le N) = \sum_{k=1}^{N} \Big( (a r_1(k) + b m_1(k) + c - X(k))^2 + (a r_2(k) + c - Y(k))^2 \Big).$$
The condition $L = \min$ yields the system of linear equations $A p^T = B^T$,
hence $p^T = A^{-1} B^T$. Since $A^T = A$, the solution of the estimation problem
is (9). The estimator $\hat p$ is unbiased: $E(\hat p^T) = A^{-1} E(B^T) = A^{-1} A p^T = p^T$.
Using the notations of Theorem 1 we obtain
$$\hat p = B A^{-1} = \sum_{k=1}^{N} \big( B_1(k) A^{-1} X(k) + B_2(k) A^{-1} Y(k) \big).$$
This estimator is a linear combination of vectors with coefficients being
independent random variables. This yields (10).
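A minimal sketch of the estimator (9): it accumulates the matrix (7) and the vector (8) game by game and solves the normal equations. The input format, the function names and the small Gaussian-elimination helper are assumptions of the sketch; the check uses noiseless synthetic data rather than league results.

```python
def solve3(A, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def lsm_estimate(games):
    """LSM estimate of p = (a, b, c) following (7)-(9).

    `games` is a list of tuples (m1, m2, X, Y): home strength, guest strength,
    home goals, guest goals.
    """
    A = [[0.0] * 3 for _ in range(3)]
    B = [0.0] * 3
    for m1, m2, X, Y in games:
        r = m1 - m2
        r1, r2 = max(r, 0), max(-r, 0)
        # B1(k) paired with X(k), and B2(k) paired with Y(k)
        rows = ((r1, m1, 1.0, X), (r2, 0.0, 1.0, Y))
        for *v, obs in rows:
            for p in range(3):
                B[p] += v[p] * obs
                for q in range(3):
                    A[p][q] += v[p] * v[q]
    return solve3(A, B)   # A is symmetric, so this equals B A^{-1}


# Synthetic check: goals set equal to their expected values under (6).
m = [27, 21, 20, 19, 19, 18, 18, 15, 12, 11]
a, b, c = 0.1, 0.08, 1.2
games = [(mi, mj, a * max(mi - mj, 0) + b * mi + c, a * max(mj - mi, 0) + c)
         for i, mi in enumerate(m) for j, mj in enumerate(m) if i != j]
estimate = lsm_estimate(games)
```

On such noiseless data the estimate coincides with (a, b, c) up to rounding error; on real scores it would be combined with (10) to judge precision.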
3.2. Expected number of goals. We evaluate the efficiency of estimation
taking into account the expected number of goals in the final table. Denote
by $X_i$ the goal outcome and by $Y_i$ the number of goals lost for team $i$
during the season. We have
$$X_i = \sum_{j=1,\, j \ne i}^{n} (X_{ij} + Y_{ji}), \qquad Y_i = \sum_{j=1,\, j \ne i}^{n} (Y_{ij} + X_{ji}), \qquad 1 \le i \le n.$$
In applications we use the chi-square statistic as the measure of discrepancy
of the observed and expected values but omit the statistical inference
because the random variables used are dependent.

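Theorem 2. Under the assumptions (5) we have
$$E(X_i) = 2a \sum_{j=1}^{n} r_{ij} + (n - 1) b m_i + 2(n - 1)c, \tag{11}$$
$$E(Y_i) = 2a \sum_{j=1}^{n} r_{ji} + b(n \bar m_1 - m_i) + 2(n - 1)c; \tag{12}$$
the expected number of goals per game (home and guest goals together) in the
whole season is
$$\mu = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} E(X_{ij} + Y_{ij}) = (2a + b)\bar m_1 - \frac{4a \tilde m_1}{n - 1} + 2c, \tag{13}$$
where
$$\bar m_1 = \frac{1}{n} \sum_{i=1}^{n} m_i, \qquad \tilde m_1 = \frac{1}{n} \sum_{i=1}^{n} (i - 1) m_{i,n},$$
and $m_{1,n} \ge m_{2,n} \ge \dots \ge m_{n,n}$ is the ordered sequence $m_1, \dots, m_n$.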

Proof. The formulas (11) and (12) can be easily verified from (6). To prove
(13), assume without loss of generality that $m_i \ge m_j$ for $i < j$. Then
$r_{ij} = m_i - m_j$ for $i < j$ and $r_{ij} = 0$ otherwise. Summing (11) over $i$
counts every goal of the season once, so
$$E\Big( \sum_{i=1}^{n} X_i \Big) = 2a \sum_{i=1}^{n} \sum_{j=1}^{n} r_{ij} + N b \bar m_1 + 2N c.$$
Under the above additional assumption we have $m_i = m_{i,n}$, $1 \le i \le n$.
Hence
$$\sum_{i=1}^{n} \sum_{j=1}^{n} r_{ij} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} (m_i - m_j) = \sum_{i=1}^{n} (n - 2i + 1) m_i = N \bar m_1 - 2n \tilde m_1.$$
Substituting this and dividing by $N = n(n - 1)$ we obtain (13).
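The closed form on the right-hand side of (13) can be checked numerically against a direct sum of the expected home and guest goals over all games. The sketch below does this for the strengths and estimates reported in Table 1; the function names are assumptions of the sketch.

```python
def mean_goals_closed_form(m, a, b, c):
    """Right-hand side of (13)."""
    n = len(m)
    ordered = sorted(m, reverse=True)      # m_{1,n} >= ... >= m_{n,n}
    m_bar = sum(m) / n
    # enumerate starts at 0, which matches the factor (i - 1) for i = 1..n
    m_tilde = sum(i * mi for i, mi in enumerate(ordered)) / n
    return (2 * a + b) * m_bar - 4 * a * m_tilde / (n - 1) + 2 * c


def mean_goals_direct(m, a, b, c):
    """Expected goals per game (home plus guest), summed game by game via (6)."""
    n = len(m)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += a * max(m[i] - m[j], 0) + b * m[i] + c   # E(X_ij)
                total += a * max(m[j] - m[i], 0) + c              # E(Y_ij)
    return total / (n * (n - 1))


# Point strengths and parameter estimates as reported in Table 1.
m = [27, 21, 20, 19, 19, 18, 18, 15, 12, 11]
a, b, c = 0.0996, 0.0796, 1.167
closed = mean_goals_closed_form(m, a, b, c)
direct = mean_goals_direct(m, a, b, c)
```

Both values agree and are close to the mean of 4.29 goals per game listed for the 1938 Polish league in Table 6.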


Table 1. The 1938 Polish football league (constant strengths)

                       Points          Goals scored      Goals lost
Team                  obs.   exp.      obs.   exp.       obs.   exp.
Ruch Chorzów           27    23.7       57    58.3        35    33.2
Warta Poznań           21    20.4       58    43.2        38    34.9
Wisła Kraków           20    19.7       41    40.9        36    35.3
Polonia Warszawa       19    19.0       40    38.8        38    36.0
Pogoń Lwów             19    19.0       23    38.8        26    36.0
AKS Chorzów            18    18.3       42    37.1        30    37.1
Cracovia               18    18.3       37    37.1        42    37.1
Warszawianka           15    15.8       34    33.1        46    41.5
ŁKS Łódź               12    13.3       25    29.8        45    46.5
Śmigły Wilno           11    12.5       29    28.9        50    48.4
Fitting              χ² = 0.83        χ² = 13.00        χ² = 5.88

Model parameters: $\hat a = 0.0996$, $\hat b = 0.0796$, $\hat c = 1.167$.
Components of the goal result: due to difference in strength,
$\hat a \bar r = \hat a \frac{1}{n(n-1)} \sum r_{ij} = 0.261$; due to home-field
advantage, $\hat b \bar m_1 = 1.433$.

The data concerning the Polish league in the 1938 season are taken from
[1]. Table 1 shows the results of one season (obs.) and the corresponding
expected values following from the model (exp.). The estimates $\hat a$, $\hat b$,
$\hat c$ of the model are also given. As a final result we divide the goals
scored in three parts: $\hat a \bar r$, due to difference in strengths;
$\hat b \bar m_1$, due to home-field advantage; and $\hat c$, due to random
factors. Taking into account these components in the example we divide the
mean number 2.861 of goals per game for a home team in the following way:
effect of difference in strengths, 0.261 goals; effect of home-field
advantage, 1.433 goals; effect of random factors, 1.167 goals; and similarly
we divide the number 1.428 of goals for a guest as the effect of difference in
strengths, 0.261 goals, and effect of random factors, 1.167 goals.
3.3. Estimation of home-field advantage. It is intuitively clear that the
result of a game and the return enables the estimation of home-field
advantage. Using notations of the Poisson model we now give two new estimators
of $b$. The intuitive form is (15), and its improved version is (16). Let us
introduce notations for moments of strengths:
$$\bar m_1 = \frac{1}{n} \sum_{i=1}^{n} m_i, \qquad \bar m_2 = \frac{1}{n} \sum_{i=1}^{n} m_i^2. \tag{14}$$
Theorem 3. Let $D_{ij} = X_{ij} - Y_{ij}$ denote the difference of goals scored
per game of teams $i, j$. Under the assumptions (5), the following formulas
give estimators of home-field advantage:
$$\hat b_1 = \frac{1}{N \bar m_1} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} D_{ij}, \tag{15}$$
$$\hat b_2 = \frac{1}{c_0 n} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij} D_{ij}, \tag{16}$$
where $c_0 = (n - 2)^2 \bar m_2 + (3n^2 - 4n) \bar m_1^2$,
$c_{ij} = (n - 2)(m_i + m_j) + 2n \bar m_1$, $1 \le i, j \le n$. The estimators are
unbiased with variances
$$\mathrm{Var}(\hat b_1) = \Big( \frac{1}{\bar m_1 N} \Big)^2 \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \mathrm{Var}(D_{ij}), \tag{17}$$
$$\mathrm{Var}(\hat b_2) = \Big( \frac{1}{c_0 n} \Big)^2 \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij}^2\, \mathrm{Var}(D_{ij}), \tag{18}$$
where $\mathrm{Var}(D_{ij}) = \mathrm{Var}(X_{ij}) + \mathrm{Var}(Y_{ij}) = a|m_i - m_j| + b m_i + 2c$,
$1 \le i, j \le n$.

Proof. From (6) we have $E(D_{ij}) = a r_{ij} + b m_i - a r_{ji}$, and using the
equality
$$\sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} r_{ij} = \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} r_{ji}$$
we obtain
$$E\Big( \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} D_{ij} \Big) = b N \bar m_1.$$
This shows that (15) is an estimator for $b$, and also that it is unbiased.
Let $R_{ij} = D_{ij} + D_{ji}$ denote the sum of differences of goals in a game
and in the return and let $R_i = \sum_{j=1,\, j \ne i}^{n} R_{ij}$. From (6) we have
$E(R_{ij}) = b(m_i + m_j)$, hence $E(R_i) = b((n - 2) m_i + n \bar m_1)$.
Then the LSM estimator of $b$ based on $R_1, \dots, R_n$ has the form
$$\hat b_2 = \Big( \sum_{i=1}^{n} ((n - 2) m_i + n \bar m_1)^2 \Big)^{-1} \sum_{i=1}^{n} ((n - 2) m_i + n \bar m_1) R_i = (n c_0)^{-1} \sum_{i=1}^{n} c_i R_i,$$
where $c_i = (n - 2) m_i + n \bar m_1$, so that $c_{ij} = c_i + c_j$. Hence
$$\hat b_2 = (n c_0)^{-1} \sum_{i=1}^{n} c_i \sum_{j=1,\, j \ne i}^{n} (D_{ij} + D_{ji}) = (n c_0)^{-1} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij} D_{ij}.$$
The estimators (15) and (16) are unbiased. They are linear combinations
of independent random variables $D_{ij}$. This also gives the formulas for the
variances.
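The two estimators of Theorem 3 can be sketched as below. The matrix layout of the goal differences and the function name are assumptions of the sketch; the check feeds in the expected differences E(D_ij) = a(m_i - m_j) + b m_i, for which both estimators should return b exactly (up to rounding).

```python
def home_advantage_estimators(m, D):
    """Estimators (15) and (16) of the home-field parameter b.

    `m` is the strength vector; `D[i][j]` is the goal difference X_ij - Y_ij
    of the game played at the ground of team i (diagonal entries are ignored).
    """
    n = len(m)
    N = n * (n - 1)
    m1 = sum(m) / n                    # first moment of strengths, (14)
    m2 = sum(x * x for x in m) / n     # second moment of strengths, (14)
    c0 = (n - 2) ** 2 * m2 + (3 * n * n - 4 * n) * m1 ** 2
    s1 = s2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                c_ij = (n - 2) * (m[i] + m[j]) + 2 * n * m1
                s1 += D[i][j]
                s2 += c_ij * D[i][j]
    return s1 / (N * m1), s2 / (c0 * n)


# Noiseless check with hypothetical parameter values.
m = [27, 21, 20, 19, 19, 18, 18, 15, 12, 11]
a, b = 0.1, 0.08
D = [[0.0 if i == j else a * (mi - mj) + b * mi for j, mj in enumerate(m)]
     for i, mi in enumerate(m)]
b1, b2 = home_advantage_estimators(m, D)
```

The weights c_ij are symmetric in i and j, so the strength term a(m_i - m_j) cancels between a game and its return in (16), as it does in (15).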
Table 2. Dispersions and correlation matrix of estimators for
selected seasons of the Italian league

Dispersion of estimator
Season      Disp($\hat a$)   Disp($\hat b$)   Disp($\hat c$)
1954/55        .0094            .0028            .0648
1969/70        .0087            .0031            .0602
1994/95        .0043            .0019            .0592

Correlation matrix of $\hat p$ (lower triangle)
Season        $\hat a$    $\hat b$    $\hat c$
1954/55        1.000
               −.078      1.000
               −.415      −.547      1.000
1969/70        1.000
               −.097      1.000
               −.391      −.551      1.000
1994/95        1.000
               −.095      1.000
               −.404      −.514      1.000

Table 2 shows the dispersions of estimators (9) of a, b, c and their
correlation matrix, calculated for selected seasons of the Italian league.
We omit the results of estimation of home-field advantage from Theorem 3.
While calculating the covariances the point strength estimators of a, b, c
are used. The results support the opinion that the parameters of the Poisson
model can be estimated from one season with usable precision.

3.4. League plays for points. In the opinion of many football experts a
league team plays for points (or for a place in the final table), not for goals.
Note that in the football pools we often guess the point result of the game.
Hence, for a game between teams $i, j$ with strengths $m_i, m_j$ it is
interesting to anticipate the point result $U_{ij}, V_{ij}$. Consider the linear
model
$$E(U_{ij}) = \tilde a r_{ij} + \tilde b m_i + \tilde c, \qquad E(V_{ij}) = \tilde a r_{ji} + \tilde c.$$
It is easy to see that the LSM estimator of the parameter
$\tilde p = (\tilde a, \tilde b, \tilde c)$ has the form
$$\check p = \tilde B A^{-1},$$
where $A$ is of the form (7) and
$$\tilde B = \Big( \sum (r_{ij} U_{ij} + r_{ji} V_{ij}),\ \sum m_i U_{ij},\ \sum (U_{ij} + V_{ij}) \Big),$$
where the sum is over $1 \le i, j \le n$, $i \ne j$. We omit the covariance matrix
of this estimator.
Table 3. The Italian football league (points, constant strengths)

Season 1954/55              Season 1969/70              Season 1994/95
Team        obs.  exp.      Team        obs.  exp.      Team        obs.  exp.
Milan        48   50.7      Cagliari     45   47.6      Juventus     73   78.4
Udinese      44   44.7      Inter        41   41.8      Lazio        63   63.1
Roma         41   40.5      Juventus     38   37.7      Parma        63   63.1
Bologna      40   39.2      Milan        36   35.1      Milan        60   58.9
Fiorentina   39   37.9      Fiorentina   36   35.1      Roma         59   57.6
Napoli       38   36.7      Napoli       31   29.5      Inter        52   48.9
Juventus     37   35.6      Torino       30   28.4      Napoli       51   47.7
Inter        36   34.5      Vicenza      29   27.5      Sampdoria    50   46.6
Sampdoria    34   32.6      Lazio        29   27.5      Cagliari     49   45.6
Torino       34   32.6      Bologna      28   26.7      Fiorentina   47   43.7
Genoa        31   30.1      Roma         28   26.7      Torino       45   41.9
Catania      30   29.4      Verona       26   25.4      Bari         44   41.1
Lazio        30   29.4      Sampdoria    24   24.3      Cremonese    41   38.9
Triestina    30   29.4      Brescia      20   22.4      Genoa        40   38.2
Atalanta     28   28.3      Palermo      20   22.4      Padova       40   38.2
Novara       28   28.3      Bari         19   22.1      Foggia       34   35.2
Spal         23   26.4                                  Reggiana     18   28.1
Pro Patria   21   25.7                                  Brescia      12   25.9
Fitting χ² = 1.86           Fitting χ² = 1.65           Fitting χ² = 13.51

The characteristics of the leagues are shown in Table 6.

 Table 3 shows the result of estimations for selected seasons of the Italian
league. The left columns show the points scored, and the right columns the
expected ones. It may be observed that the main league teams scored fewer
points than it follows from the model and there are overestimates in the
lower area of the table. 
3.5. Dynamics of strength. The strength of teams varies during the season.
At the beginning of a league season this is a topic of speculation; after
each round it is reasonable to update it using the actual results.
Suppose that both the results and dates of all games of the season are
known. Let us group the games in $N = 2(n - 1)$ rounds consisting of $n/2$
games each (for simplicity we assume that $n$ is even and that the league plays
without changing the time table). Let $1 \le k \le 2(n - 1)$ index rounds and let
$1 \le l \le n/2$ index games in a round. We can assume that the strengths are
random variables depending upon the number of the round. We arbitrarily
assume that the initial strengths $m_i^{(1)}$ are known, and after each round we
update them using the actual table of points $f_i^{(k)}$ in the following way:
$$m_i^{(k+1)} = m_i^{(k)} + \frac{N - k - 1}{N} f_i^{(k)}, \qquad 1 \le i \le n,\ 1 \le k \le N - 1.$$
Let $(\mathbf{X}_k, \mathbf{Y}_k) = \{(X_{jl}, Y_{jl}) : 1 \le j \le k - 1,\ 1 \le l \le n/2\}$ be
the set of results of all games before the $k$th round. From (1) it follows
that the outcome table $f_i^{(k)}$, $1 \le i \le n$, as well as the strengths
$m_i^{(k)}$ are random variables depending upon $(\mathbf{X}_k, \mathbf{Y}_k)$.
Assume that the game indexed by $(k, l)$ is played by teams indexed by
$i = i(k, l)$, $j = j(k, l)$ with strengths $m_i^{(k)}$, $m_j^{(k)}$ and the goal
outcome is $X_{kl}, Y_{kl}$. For simplicity of notation we omit the indices and set
$m_1$: strength of the home team in the game,
$m_2$: strength of the guest,
$r_1 = \max(m_1 - m_2, 0)$: strength index of the home team,
$r_2 = \max(m_2 - m_1, 0)$: strength index of the guest,
$X$: goal outcome of the home team,
$Y$: number of goals lost by the home team.

Theorem 4. Assume that the random variables $X, Y$ conditioned by
$(\mathbf{X}_k, \mathbf{Y}_k)$ are mutually independent Poisson variables with
$E(X \mid (\mathbf{X}_k, \mathbf{Y}_k)) = a r_1 + b m_1 + c$,
$E(Y \mid (\mathbf{X}_k, \mathbf{Y}_k)) = a r_2 + c$. Then $p$ has an estimator of the
form (9), with covariance matrix (10), where $A, B$ are given by (7) and (8),
and the index $k$ in (10) is replaced by $(k, l)$.

Table 4. The 1997/98 Polish football league (constant point strengths)

                  Points          Goals scored      Goals lost
Team             obs.   exp.      obs.   exp.       obs.   exp.
ŁKS               66    60.8       52    55.8        23    33.6
Polonia           63    58.8       46    52.6        30    33.7
Wisła             61    57.4       50    50.5        30    34.0
Widzew            61    57.4       53    50.5        34    34.0
Legia             59    55.9       50    48.7        32    34.4
Ruch              55    53.0       48    45.1        39    35.5
Amica             50    49.2       38    41.0        31    37.1
Górnik            48    47.7       48    39.4        42    37.9
Odra              48    47.7       51    39.4        50    37.9
Lech              46    46.2       41    38.1        37    38.8
Stomil            45    45.4       38    37.5        45    39.3
GKS               43    43.9       37    36.3        33    40.5
Zagłębie          43    43.9       39    36.3        40    40.5
Pogoń             43    43.9       36    36.3        40    40.5
Petrochemia       38    40.1       28    34.3        54    44.2
Dyskobolia        29    33.6       30    31.0        55    51.2
KSZO              24    30.1       24    29.4        47    55.4
Raków             17    25.6       21    27.6        68    61.6
Fitting         χ² = 6.39        χ² = 11.07        χ² = 16.78

Model parameters: $\hat a = 0.0258$, $\hat b = 0.0123$, $\hat c = 0.706$.
Components of the goal result: due to difference in strength,
$\hat a \bar r = 0.199$; due to home-field advantage, $\hat b \bar m_1 = 0.575$.

Fig. 1. The 1997/98 Polish football league. Variability of strengths during the season.

Table 5. The 1997/98 Polish football league (varied strengths)

                  Points          Goals scored      Goals lost
Team             obs.   exp.      obs.   exp.       obs.   exp.
ŁKS               66    63.6       52    59.6        23    32.4
Polonia           63    58.6       46    52.0        30    33.4
Wisła             61    56.0       50    48.4        30    33.9
Widzew            61    59.2       53    52.5        34    32.9
Legia             59    58.4       50    51.2        32    33.1
Ruch              55    53.0       48    44.8        39    34.9
Amica             50    49.5       38    40.9        31    36.5
Górnik            48    47.5       48    39.0        42    37.7
Odra              48    46.4       51    38.1        50    38.4
Lech              46    44.8       41    36.8        37    39.2
Stomil            45    45.1       38    37.2        45    39.2
GKS               43    45.3       37    37.2        33    39.1
Zagłębie          43    43.2       39    35.8        40    41.0
Pogoń             43    44.1       36    36.3        40    39.9
Petrochemia       38    39.8       28    33.6        54    44.3
Dyskobolia        29    34.8       30    31.5        55    49.8
KSZO              24    27.8       24    28.3        47    58.7
Raków             17    23.7       21    26.8        68    65.5
Fitting         χ² = 4.71        χ² = 12.40        χ² = 15.95

Model parameters: $\hat a = 0.0309$, $\hat b = 0.0124$, $\hat c = 0.677$.
Components of the point result: due to difference in strength,
$\hat a \bar r = 0.231$; due to home-field advantage, $\hat b \bar m_1 = 0.576$.

 Table 4 shows the results for the Polish league of the 1997/98 season and
the expected results given by the model under the assumption of constant
point strengths. The model parameters and the expected components of the
result in goals are also given.
Table 5 shows the result of estimation of the model parameters for the
season considered in Table 4 under the assumption of changeable strength.
The parameters and expected results are as above. Other modifications of
the definition of strength are considered. A better forecast is given under
the assumption of equal strengths for each team at the beginning of the
season than assuming the strengths given by the table of the previous season.
Figure 1 shows the variability of strength of teams in the whole season. In
order to eliminate inessential drifts we assume that the initial and final
strengths are equal. 
3.6. Estimation of p from the table. Suppose that we estimate the components
of the game result under limited data: having the final league table
only. Let $p_k(\lambda) = \frac{\lambda^k}{k!} e^{-\lambda}$, $k \ge 0$, $\lambda > 0$, denote
the Poisson probabilities with parameter $\lambda$. In the Poisson model from (1),
using the values (6) we get
$$E(U_{ij}) = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} p_k(r_{ij} a + m_i b + c)\, p_l(r_{ji} a + c)\, (K \mathbf{1}_{\{k > l\}} + \mathbf{1}_{\{k = l\}}),$$
and $E(V_{ij})$ is obtained in the same way with the indicator
$K \mathbf{1}_{\{k < l\}} + \mathbf{1}_{\{k = l\}}$. Hence, by (2), the expected point outcome
for the $i$th team is a function $u_i$ of $p$:
$$u_i(p) := E(U_i) = \sum_{j=1,\, j \ne i}^{n} \big( E(U_{ij}) + E(V_{ji}) \big).$$
We define
$$L(p \mid U_i,\ 1 \le i \le n) = \sum_{i=1}^{n} (u_i(p) - U_i)^2 / u_i(p).$$
The condition $L(p \mid U_i,\ 1 \le i \le n) = \min$ gives the LSM estimator of $p$.
The function $L$ is very complex, so the calculations of the minimum are rather
laborious. We get some simplification by assuming that the expected and
empirical numbers of goals per game are equal.
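The expected point outcome of a single game and the discrepancy function of this subsection can be sketched as follows. The truncation of the infinite double sum at `kmax` goals per team and all function names are assumptions of the sketch; the paper does not say how the sums were evaluated or how the minimum was searched.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability p_k(lambda) = lambda^k e^{-lambda} / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_points(lam_home, lam_guest, K=3, kmax=40):
    """E(U_ij) and E(V_ij) for independent Poisson goal counts.

    The double sum is truncated at `kmax` goals per team; for football-sized
    means the neglected tail is far below rounding error.
    """
    eu = ev = 0.0
    for k in range(kmax + 1):
        pk = poisson_pmf(k, lam_home)
        for l in range(kmax + 1):
            w = pk * poisson_pmf(l, lam_guest)
            if k > l:
                eu += K * w        # home win
            elif k < l:
                ev += K * w        # guest win
            else:
                eu += w            # tie: one point each
                ev += w
    return eu, ev


def table_discrepancy(u_expected, u_observed):
    """Objective sum_i (u_i(p) - U_i)^2 / u_i(p) to be minimized over p."""
    return sum((e - o) ** 2 / e for e, o in zip(u_expected, u_observed))


eu, ev = expected_points(1.5, 1.5, K=2)
```

In a full fit, `expected_points` would be called with the means from (6) for every pair of teams, the results summed into u_i(p), and `table_discrepancy` minimized over (a, b, c) by a numerical search.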

4. Strength in the return match. Football experts devote much attention
to “historical” remarks. In particular it is supposed that the return
game has an element of retaliation: a defeat implies additional strength in
the return. We describe the role of retaliation using the residuals of the LSM
estimation.
For the game and return of teams $i, j$, consider the residuals
$X_{ij} - E(X_{ij})$, $Y_{ij} - E(Y_{ij})$ and the difference $\Delta_{ij}$ of residuals:
$$\Delta_{ij} = D_{ij} - a(r_{ij} - r_{ji}) - b m_i = D_{ij} - a(m_i - m_j) - b m_i, \qquad 1 \le i < j \le n, \tag{19}$$
and also the difference $\Delta'_{ij} = \Delta_{ji}$ of the residuals in the return:
$$\Delta'_{ij} = D_{ji} - a(m_j - m_i) - b m_j.$$
Under the assumption (5) these random variables are centered and mutually
independent. We say that the retaliation hypothesis is confirmed if
the covariance of the residuals is positive. For the correlation test we define
the following statistic:
$$\varrho = \frac{1}{2N} \sum_{i=1}^{n} \sum_{j=1,\, j < i}^{n} \Delta_{ij} \Delta'_{ij} \big( \mathrm{Var}(D_{ij})\, \mathrm{Var}(D_{ji}) \big)^{-1/2}.$$
A few seasons of the Italian league do not confirm the retaliation
hypothesis. Taking into account the goal results of the whole season, assuming
point strengths $m_i$ and the LSM estimators of the parameters, we calculate
$\varrho$ for residuals (19). We get $\varrho = 0.012$ in 1954/55, $\varrho = 0.122$ in
1969/70, $\varrho = 0.040$ in 1994/95. These correlations are positive, but the
differences from zero are not significant.
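A sketch of the statistic of this section, using the residual differences (19) and the variance formula for D_ij from Theorem 3. The normalization 1/(2N) is taken as printed above; the matrix layout of the goal differences and the function name are assumptions of the sketch.

```python
import math


def retaliation_statistic(m, D, a, b, c):
    """Correlation-type statistic built from the residual differences (19).

    `D[i][j]` is the goal difference X_ij - Y_ij of the game hosted by team i.
    """
    n = len(m)
    N = n * (n - 1)

    def delta(i, j):          # residual difference (19) for the game hosted by i
        return D[i][j] - a * (m[i] - m[j]) - b * m[i]

    def var_d(i, j):          # Var(D_ij) as given in Theorem 3
        return a * abs(m[i] - m[j]) + b * m[i] + 2 * c

    total = 0.0
    for i in range(n):
        for j in range(i):    # each game-return pair once (j < i)
            total += delta(i, j) * delta(j, i) / math.sqrt(var_d(i, j) * var_d(j, i))
    return total / (2 * N)


# Hypothetical check: goal differences equal to their expectations give zero.
m = [27, 21, 20, 19, 19, 18, 18, 15, 12, 11]
a, b, c = 0.1, 0.08, 1.2
D0 = [[0.0 if i == j else a * (mi - mj) + b * mi for j, mj in enumerate(m)]
      for i, mi in enumerate(m)]
rho_zero = retaliation_statistic(m, D0, a, b, c)
```

A positive value arises when the residuals of a game and its return tend to have the same sign, which is how the retaliation hypothesis is phrased above.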

5. Stable strength vector. The definition of team or player orderings,
for example in tennis, chess and so on, is an interesting problem for experts
and mathematicians. Obviously the result of competition is expected to be
consistent with the defined order. Hence the proposed strengths in football
should simulate the point table.
For the strength vector $m$ and the data $D$ let $\hat p = P(m, D)$ denote the
estimator of $p$. Having the strengths and parameters, using the model, we can
anticipate each outcome, and from (1) and (2) we can create the expected
final point table $m'$. It may play the role of the new strength vector. The
vector $m'$ is a function of $m$ and $\hat p$. We denote it by $T$:
$$m' = T(P(m, D), m).$$
We say that a strength vector $m^*$ is stable for $D$ if it satisfies the equation
$$m^* = T(P(m^*, D), m^*).$$
The equation does not guarantee the uniqueness of a stable strength.
Some examples can be calculated using the possible convergence of the sequence
$$m_{t+1} = T(P(m_t, D), m_t), \qquad t \ge 0, \tag{20}$$
where $m_0$ is the initial strength, proposed in practice by experts.
Starting from the point strength for the Italian league the sequence (20)
converges for the 1954/55 and 1969/70 seasons, while for the 1994/95 season
it is periodic. The stable (periodic) strengths do not differ much from the
point strengths. We omit the numerical details.

6. Supplements and generalizations. The assumption that home-field
advantage is a linear function of the strength of the home team may be
attractive. A natural modification of the model may be the differentiation
of the mean number of random goals for the home and guest teams. Under
the linear dependence in (6) this idea is included in the modification of
home-field advantage given below.
6.1. Modified home-field advantage. Suppose that in the Poisson model
the goal outcome of the game of teams $i, j$ with strengths $m_i, m_j$ equals
$$X_{ij} = \Pi_1(a r_{ij} + b m_i + b_0 + c), \qquad Y_{ij} = \Pi_2(a r_{ji} + c). \tag{21}$$
Comparing this with (4) we see here the additional parameter $b_0$.
Having the strength vector $m$ and the data $D$ we can estimate $a, b, b_0, c$
using the least squares method. The Italian league shows that the assumption
$b_0 = 0$, $b \ne 0$ describes the league better than the assumption $b_0 \ne 0$,
$b = 0$. In general, one season data do not permit us to reject the hypothesis
$b_0 = 0$ under $b \ne 0$.
It is obvious that the number of points scored in one season depends upon
the number of teams. In the history of many leagues the numbers of teams
are different, hence also the model parameters based on point strength are
not comparable. This disadvantage may be eliminated by a standardization
of strengths.
Consider a league with strength vector $m$ and model (21) with parameters
$a, b, b_0, c$. If $m$ is linearly transformed to
$$m^*_i = A m_i + B, \qquad 1 \le i \le n,$$
then substituting $m_i = (m^*_i - B)/A$ into (21) shows that the expected
values are unchanged when the model parameters are
$$a^* = a/A, \qquad b^* = b/A, \qquad b^*_0 = b_0 - bB/A, \qquad c^* = c.$$
In the standard league we take $n = 18$ and $K = 3$. If $n \ne 18$ then we
use $A = 17 \cdot 18/(n(n - 1))$. When $K \ne 3$, then we use $A = E(U_3)/E(U_K)$
independently of the previous one, where $E(U_K)$ is the mean number of
points scored per game in the whole season, $E(U_2) = 2$. We omit the details
of examples.
6.2. Style of game. Experts often emphasize the importance of imposing
one’s own style of playing on one’s opponent. In great simplification suppose
that a style can be defensive or offensive. In the first case the team
puts its whole effort to self-defense, in the other case the team responds
to each attack by an attack. The Poisson model considered before is now
understood as being defensive. For the offensive style we assume that in the
game between teams $i, j$ with strengths $m_i, m_j$ the goal outcome is
$$X_{ij} = \Pi_1(a r_{ij} + b m_i + c) + \Pi_3(d), \qquad Y_{ij} = \Pi_2(a r_{ji} + c) + \Pi_3(d),$$
where $\Pi_1, \Pi_2$ have the interpretation as in (4), and $\Pi_3(d)$ is a Poisson
random variable with parameter $d$, independent of $\Pi_1$ and $\Pi_2$. We interpret
$\Pi_3$ as the result of reciprocal attacks.
We reduce testing the hypothesis on imposing the offensive style to testing
the parametric hypothesis $H_0: d = 0$ against $H_1: d > 0$.
The data concerning the Italian league, for example in the 1994/95 season,
do not permit the rejection of the hypothesis $H_0$ under the alternative $H_1$.
One may suppose that the element of style also appears in other games, for
example in basketball, but this is beyond the scope of this note.

7. Final remarks. The components of sport results described in this
note are rather evident for experts, but the examples give usable conclusions,
thanks to quantitative expressions of the effects considered.
Note that our data are always limited: we have the full results of the
league round after round, composite results of games of one season, results
of an incomplete season, or at least, the table of the league only. The model
which uses strengths, home-field advantage and a random component gives
in my opinion a quite good description of the expected league table. For the
Italian and Polish leagues the established components are comparable. Note
that the random component is large. This does not permit forecasting the
result of one particular game, but it may be used to anticipate the result of
group eliminations.
The Poisson model sometimes gives interesting details. If we estimate
the components from the table, the point outcome of a champion is smaller
than expected; the point outcome in the critical area of the table is larger
than expected. We observe large deviations in the home-field advantage for
the leagues considered.
The estimation of components which determine the outcome is not equiv-
alent to the problem of forecasting a result. Having the expected number of
goals per game, assuming Poisson variables Xij , Yij for goals scored, we can
calculate the probability of win P (Xij > Yij ), tie P (Xij = Yij ) and defeat
P (Xij < Yij ). For many leagues we have 1 < E(Xij ) < 2, 1 < E(Yij ) < 2,
which implies that the most probable result is 1:1.
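The probabilities of a win, tie and defeat, and the most probable score, can be computed directly from two Poisson means. The sketch below truncates the sums at `kmax` goals per team; the truncation, the function name and the example means 1.5 and 1.2 are assumptions of the sketch, chosen only to lie in the range mentioned above.

```python
import math


def outcome_probabilities(lam_home, lam_guest, kmax=40):
    """P(win), P(tie), P(defeat) for the home team and the most probable score."""
    def pmf(k, lam):
        return lam ** k * math.exp(-lam) / math.factorial(k)

    win = tie = defeat = 0.0
    best_score, best_prob = None, -1.0
    for k in range(kmax + 1):            # goals of the home team
        for l in range(kmax + 1):        # goals of the guest
            pr = pmf(k, lam_home) * pmf(l, lam_guest)
            if k > l:
                win += pr
            elif k == l:
                tie += pr
            else:
                defeat += pr
            if pr > best_prob:
                best_score, best_prob = (k, l), pr
    return win, tie, defeat, best_score


win, tie, defeat, best_score = outcome_probabilities(1.5, 1.2)
```

For any pair of means strictly between 1 and 2 each Poisson distribution has its mode at 1, so the most probable score found by the search is 1:1, in line with the remark above.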
According to the assumptions (3) and (5) of the Poisson model, in the game
between teams $i, j$ with strengths $m_i, m_j$ the expected score is as follows:
$\hat a r_{ij}$ is the expected number of goals following from the strength index,
$\hat b m_i$ is the expected number of goals following from home-field advantage
and $\hat c$ is the expected number of incidental goals. Hence $a \bar r$ is the
expected number of goals in the season due to the strength index, and
$b \bar m_1$ is the expected number of goals due to home-field advantage.
Table 6. Model parameters in Polish and Italian leagues

LSM model for goals: estimation from league season
Season       n   K   Goals  Points   $\hat a$   $\hat b$   $\hat c$   $\hat a \bar r$   $\hat b \bar m_1$
I 1954/55   18   2   2.72   2        .0467     .0155     .9028     .193             .528
I 1969/70   16   2   1.94   2        .0441     .0134     .5727     .195             .402
I 1994/95   18   3   2.53   2.75     .0221     .0129     .7747     .188             .601
P 1938      10   2   4.29   2        .0996     .0796     1.167     .261             1.43
P 1997/98   18   3   2.39   2.75     .0258     .0123     .706      .199             .575

LSM model for points: estimation from league season
Season       n   K   Goals  Points   $\hat a$   $\hat b$   $\hat c$   $\hat a \bar r$   $\hat b \bar m_1$
I 1954/55   18   2                   .0365     .0142     .6074
I 1969/70   16   2                   .0409     .0150     .5946     .181             .450
I 1994/95   18   3          2.75     .0366     .0172     .6593     .312             .806

Estimation from league table
Season       n   K                   $\hat a$   $\hat b$   $\hat c$
I 1969/70   16   2                   .0473     .0206     .4495
I 1994/95   18   3                   .0216     .0244     .5101
I 1996/97   18   3                   .0269     .0170     .5890

I: Italian league; P: Polish league; n: number of teams; K: number of points
for winning team; Goals: mean number of goals in the season; Points: mean point
outcome in the season; $\hat a$, $\hat b$, $\hat c$: estimated values of the model
parameters; $\hat a \bar r$: expected number of goals per game following from
the strength index; $\hat b \bar m_1$: expected number of goals per game
following from home-field advantage. The parameters are not reduced to the
standard league.

Table 6 gathers the results concerning the Italian league in 1954/55–1994/95
and the Polish league in 1938 and 1997/98. The number of teams
in the league, the number of teams dropping to the lower class after the
season and the parameter K vary. Also tendencies in football art change (see
the mean number of goals scored per game). This improves the variation of
the parameters in Table 6. The unexpected conclusion from this analysis
is a relatively large expected number of goals due to the strength index,
compared to the number of goals due to home-field advantage and to the
expected number of random goals. 

References

[1] J. Jeleń et al., Liga gra po pięćdziesiątce, Wyd. II, Wydawnictwo Sport i Turystyka,
Warszawa, 1987 (in Polish).
[2] Encyklopedia Fuji, Rocznik 98-99, Tom 22, Wydawnictwo GiA, Katowice 1998 (in
Polish).

[3] M. E. Glickman and H. S. Stern, A state-space model for National Football League
scores, J. Amer. Statist. Assoc. 93 (1998), 25–35.
[4] J. B. Keller, A characterization of the Poisson distribution and the probability of
winning a game, Amer. Statistician 48 (1994), 294–298.
[5] A. Lee, Modeling scores in the Premier League: Is Manchester United really the best? ,
Chance 10 (1997), 15–19.
[6] P. McCullagh and J. A. Nelder, Generalized Linear Models, 2nd. ed., Chapman &
Hall, 1989.

Mathematical Institute
University of Wrocław
Pl. Grunwaldzki 2/4
50-384 Wrocław, Poland
E-mail: [email protected]

Received on 15.2.2000;
revised version on 26.6.2000 (1528)
