Components of The Game Result in A Football League
56 B. Kopociński
Denote by $X_i$ the goal outcome and by $Y_i$ the number of goals lost by team $i$ during the season. We have
$$X_i = \sum_{j=1,\, j\ne i}^{n} (X_{ij} + Y_{ji}), \qquad Y_i = \sum_{j=1,\, j\ne i}^{n} (Y_{ij} + X_{ji}), \qquad 1 \le i \le n.$$
where
$$m_1 = \frac{1}{n}\sum_{i=1}^{n} m_i, \qquad e_1 = \frac{1}{n}\sum_{i=1}^{n} (i-1)\, m_{i,n},$$
Proof. The formulas (11) and (12) can easily be verified. To prove (13), assume without loss of generality that $m_i \ge m_j$ for $i > j$. Then $r_{ij} = m_i - m_j$ for $i > j$ and $r_{ij} = 0$ otherwise. Hence from (11) we have
$$\sum_{i=1}^{n} E X_i = 2a\sum_{i=1}^{n}\sum_{j=1}^{n} r_{ij} + Nbm_1 + 2Nc.$$
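The sorting step in this proof can be checked numerically. The Python sketch below compares a brute-force sum of $r_{ij} = \max(m_i - m_j, 0)$ over all ordered pairs with the closed form $2n e_1 - n(n-1)m_1$, which follows by rearranging $\sum_{i>j}(m_i - m_j)$ under the assumption that $m_{i,n}$ denotes the $i$th smallest strength. The closed form, the function names and the random strengths are illustrative additions, not formulas or data from the paper.

```python
import itertools
import random

def total_strength_index(m):
    """Brute-force sum of r_ij = max(m_i - m_j, 0) over all ordered pairs i != j."""
    return sum(max(mi - mj, 0.0) for mi, mj in itertools.permutations(m, 2))

def via_order_statistics(m):
    """Same sum via sorted strengths: 2*n*e1 - n*(n-1)*m1.

    Assumes m_{i,n} is the i-th smallest strength, so e1 = (1/n) * sum (i-1) m_{i,n}.
    """
    n = len(m)
    s = sorted(m)                                  # s[i-1] plays the role of m_{i,n}
    m1 = sum(s) / n
    e1 = sum(i * x for i, x in enumerate(s)) / n   # enumerate yields i-1 for the i-th element
    return 2 * n * e1 - n * (n - 1) * m1

random.seed(0)
strengths = [random.uniform(0, 60) for _ in range(16)]   # hypothetical strengths
print(total_strength_index(strengths), via_order_statistics(strengths))
```

For instance, with strengths 1, 2, 4 both functions give $1 + 3 + 2 = 6$.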
The data concerning the Polish league in the 1938 season are taken from [1]. Table 1 shows the results of one season and the corresponding expected values following from the model; the estimates $\hat a$, $\hat b$, $\hat c$ of the model parameters are also given. As a final result we divide the goals scored into three parts: $\hat a\bar r$, due to the difference in strengths; $\hat b m_1$, due to home-field advantage; and $\hat c$, due to random factors. Applying this decomposition in the example, the mean number of 2.861 goals per game scored by the home team splits as follows: effect of the difference in strengths, 0.261 goals; effect of home-field advantage, 1.433 goals; effect of random factors, 1.167 goals. Similarly, the mean of 1.428 goals per game scored by the guest splits into the effect of the difference in strengths, 0.261 goals, and the effect of random factors, 1.167 goals.
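The arithmetic of this split is simple enough to write out: per game, the home team's expected goals are the sum of the strength, home-field and random components, while the guest's expected goals lack the home-field term. The minimal Python sketch below only reuses the component values quoted above; the function and dictionary key names are hypothetical.

```python
def goal_components(a_rbar, b_m1, c):
    """Split expected goals per game into strength, home-field and random parts.

    Home team: a*rbar + b*m1 + c; guest: a*rbar + c (no home-field term).
    """
    home = {"strength": a_rbar, "home_field": b_m1, "random": c}
    guest = {"strength": a_rbar, "random": c}
    return home, guest

home, guest = goal_components(0.261, 1.433, 1.167)
print(round(sum(home.values()), 3), round(sum(guest.values()), 3))  # 2.861 1.428
```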
3.3. Estimation of home-field advantage. It is intuitively clear that the results of a game and of its return game allow home-field advantage to be estimated. Using the notation of the Poisson model we now give two new estimators of $b$: the intuitive one (15) and its improved version (16). Let us introduce notation for the moments of the strengths:
$$m_1 = \frac{1}{n}\sum_{i=1}^{n} m_i, \qquad m_2 = \frac{1}{n}\sum_{i=1}^{n} m_i^2. \tag{14}$$
Theorem 3. Let $D_{ij} = X_{ij} - Y_{ij}$ denote the goal difference in the game between home team $i$ and guest $j$. Under the assumptions (5), the following formulas give estimators of home-field advantage:
$$\hat b_1 = \frac{1}{N m_1}\sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n} D_{ij}, \tag{15}$$
$$\hat b_2 = \frac{1}{c_0 n}\sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n} c_{ij} D_{ij}, \tag{16}$$
Proof. From (6) we have $E(D_{ij}) = ar_{ij} + bm_i - ar_{ji}$, and using the equality
$$\sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n} r_{ij} = \sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n} r_{ji}$$
we obtain
$$\sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n} E D_{ij} = bNm_1.$$
This shows that (15) is an unbiased estimator of $b$.
Let $R_{ij} = D_{ij} + D_{ji}$ denote the sum of the goal differences in a game and in its return game, and let $R_i = \sum_{j=1,\, j\ne i}^{n} R_{ij}$. From (6) we have $E(R_{ij}) = b(m_i + m_j)$, hence $E(R_i) = b((n-2)m_i + nm_1)$.
Then the LSM estimator of $b$ based on $R_1, \ldots, R_n$ has the form
$$\hat b_2 = \Bigl(\sum_{i=1}^{n} \bigl((n-2)m_i + nm_1\bigr)^2\Bigr)^{-1} \sum_{i=1}^{n} \bigl((n-2)m_i + nm_1\bigr) R_i = (nc_0)^{-1}\sum_{i=1}^{n} c_i R_i,$$
The estimators (15) and (16) are unbiased. Since they are linear combinations of the independent random variables $D_{ij}$, formulas for their variances follow directly.
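A small simulation can illustrate the unbiasedness claim. The Python sketch below (using NumPy) implements (15) and the least-squares form of $\hat b_2$ based on $R_i$ with weights $(n-2)m_i + nm_1$, and averages both over many simulated seasons. It assumes that $X_{ij}$ and $Y_{ij}$ are independent Poisson variables with means $ar_{ij} + bm_i + c$ and $ar_{ji} + c$; the strengths, the values of $a$, $b$, $c$ and the function names are hypothetical.

```python
import numpy as np

def b_hat_1(D, m):
    """Estimator (15): sum of D_ij over i != j, divided by N*m1 with N = n(n-1)."""
    n = len(m)
    off_diag = ~np.eye(n, dtype=bool)             # selects the pairs j != i
    return D[off_diag].sum() / (n * (n - 1) * m.mean())

def b_hat_2(D, m):
    """Least-squares estimator of b from R_i, using E(R_i) = b * ((n-2) m_i + n m1)."""
    n = len(m)
    R_pair = D + D.T                               # R_ij = D_ij + D_ji
    np.fill_diagonal(R_pair, 0.0)
    R = R_pair.sum(axis=1)                         # R_i = sum over j != i of R_ij
    w = (n - 2) * m + n * m.mean()                 # regression weights
    return (w @ R) / (w @ w)

def simulate_D(m, a, b, c, rng):
    """One season: X_ij ~ Poisson(a r_ij + b m_i + c), Y_ij ~ Poisson(a r_ji + c)."""
    r = np.maximum(m[:, None] - m[None, :], 0.0)  # r_ij = max(m_i - m_j, 0)
    X = rng.poisson(a * r + b * m[:, None] + c)
    Y = rng.poisson(a * r.T + c)
    D = (X - Y).astype(float)
    np.fill_diagonal(D, 0.0)                       # a team does not play itself
    return D

rng = np.random.default_rng(1)
m = np.linspace(0.5, 1.5, 10)                      # hypothetical strengths, n = 10
a, b, c = 0.3, 0.4, 1.0                             # hypothetical parameters
samples = []
for _ in range(4000):
    D = simulate_D(m, a, b, c, rng)
    samples.append((b_hat_1(D, m), b_hat_2(D, m)))
est = np.array(samples)
print(est.mean(axis=0))                            # both averages should be close to b = 0.4
```

Because both estimators are linear in the $D_{ij}$, their variances are weighted sums of $\operatorname{Var}(D_{ij})$, which under the independence and Poisson assumptions of this simulation equals $(ar_{ij} + bm_i + c) + (ar_{ji} + c)$.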
3.4. League plays for points. In the opinion of many football experts a league team plays for points (or for a place in the final table), not for goals. Note that in football pools one usually predicts the point result of a game. Hence, for a game between teams $i$, $j$ with strengths $m_i$, $m_j$ it is of interest to predict the point result $U_{ij}$, $V_{ij}$. Consider the linear model
$$E(U_{ij}) = \tilde a r_{ij} + \tilde b m_i + \tilde c, \qquad E(V_{ij}) = \tilde a r_{ji} + \tilde c.$$
It is easy to see that the LSM estimator of the parameter $\tilde p = (\tilde a, \tilde b, \tilde c)$ has the form
$$\check p = B\tilde A^{-1},$$
Table 3 shows the results of the estimation for selected seasons of the Italian league. The left columns show the points scored and the right columns the expected ones. It may be observed that the leading teams scored fewer points than the model predicts, and there are overestimates in the lower part of the table.
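For readers who want to try a fit of this kind, here is one way to compute least-squares estimates of $(\tilde a, \tilde b, \tilde c)$: stack, for every game, the home observation $U_{ij}$ with regressors $(r_{ij}, m_i, 1)$ and the guest observation $V_{ij}$ with regressors $(r_{ji}, 0, 1)$, then solve with NumPy's `lstsq`. This is plain ordinary least squares written as a sketch under these assumptions; it does not reproduce the paper's matrix notation, and the strengths and parameter values are made up. With noise-free synthetic data the fit should recover the generating parameters.

```python
import numpy as np

def fit_point_model(m, U, V):
    """OLS fit of E(U_ij) = a r_ij + b m_i + c and E(V_ij) = a r_ji + c.

    m: strengths (length n); U, V: n x n arrays of home and guest points
    (diagonal ignored). Returns the estimates (a, b, c).
    """
    n = len(m)
    r = np.maximum(m[:, None] - m[None, :], 0.0)   # r_ij = max(m_i - m_j, 0)
    rows, y = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rows.append([r[i, j], m[i], 1.0])      # home team's points U_ij
            y.append(U[i, j])
            rows.append([r[j, i], 0.0, 1.0])       # guest's points V_ij
            y.append(V[i, j])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return coef

m = np.array([10.0, 20.0, 35.0, 50.0])             # hypothetical strengths
a, b, c = 0.05, 0.02, 1.0                           # hypothetical "true" parameters
r = np.maximum(m[:, None] - m[None, :], 0.0)
U = a * r + b * m[:, None] + c                      # noise-free expected home points
V = a * r.T + c                                     # noise-free expected guest points
coef = fit_point_model(m, U, V)
print(coef)                                         # approximately [0.05, 0.02, 1.0]
```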
3.5. Dynamics of strength. The strength of teams varies during the season. At the beginning of a season it is a matter of speculation; after each round it is reasonable to update it using the actual results.
Suppose that both the results and the dates of all games of the season are known. Let us group the games into $N = 2(n-1)$ rounds consisting of $n/2$ games each (for simplicity we assume that $n$ is even and that the league plays without changing the timetable). Let $1 \le k \le 2(n-1)$ index rounds and let $1 \le l \le n/2$ index games in a round. We may assume that the strengths are random variables depending upon the number of the round. We arbitrarily assume that the initial strengths $m_i^{(1)}$ are known, and after each round we update them using the actual table of points $f_i^{(k)}$ in the following way:
$$m_i^{(k+1)} = m_i^{(k)} + \frac{N-k-1}{N}\, f_i^{(k)}, \qquad 1 \le i \le n,\ 1 \le k \le N-1.$$
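The update rule can be run forward round by round. The Python sketch below assumes that the factor $(N-k-1)/N$ multiplies $f_i^{(k)}$, as in the printed formula, and treats the tables $f_i^{(k)}$ as given inputs rather than computing them; the function name and the toy tables are hypothetical.

```python
import numpy as np

def update_strengths(m_init, f_tables):
    """Strength path under m^(k+1) = m^(k) + (N - k - 1)/N * f^(k), k = 1..N-1.

    m_init: initial strengths m^(1), shape (n,).
    f_tables: tables f^(k) for k = 1..N-1, shape (N-1, n); N is inferred from it.
    Returns an array of shape (N, n) whose row k-1 holds m^(k).
    """
    f_tables = np.asarray(f_tables, dtype=float)
    N = f_tables.shape[0] + 1                   # number of rounds
    m = np.empty((N, len(m_init)))
    m[0] = m_init
    for k in range(1, N):                       # k is the 1-based round index
        m[k] = m[k - 1] + (N - k - 1) / N * f_tables[k - 1]
    return m

# Toy league with n = 4 teams, hence N = 2(n - 1) = 6 rounds.
path = update_strengths(np.zeros(4), np.ones((5, 4)))
print(path[:, 0])  # strength of team 1 after each round
```

With unit tables the increments are $4/6, 3/6, 2/6, 1/6, 0$, so the final strength is $10/6$; the weight on new information shrinks as the season progresses.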
Let $(X_k, Y_k) = \{(X_{jl}, Y_{jl}) : 1 \le j \le k-1,\ 1 \le l \le n/2\}$ be the set of results of all games before the $k$th round. From (1) it follows that the
outcome table $f_i^{(k)}$, $1 \le i \le n$, as well as the strengths $m_i^{(k)}$, are random variables depending upon $(X_k, Y_k)$.
Assume that the game indexed by $(k, l)$ is played by teams indexed by $i = i(k,l)$, $j = j(k,l)$ with strengths $m_i^{(k)}$, $m_j^{(k)}$ and that the goal outcome is $X_{kl}$, $Y_{kl}$. For simplicity of notation we omit the indices and set
$m_1$: strength of the home team in the game,
$m_2$: strength of the guest,
$r_1 = \max(m_1 - m_2, 0)$: strength index of the home team,
$r_2 = \max(m_2 - m_1, 0)$: strength index of the guest,
$X$: goal outcome of the home team,
$Y$: number of goals lost by the home team.
Fig. 1. The 1997/98 Polish football league. Variability of strengths during the season.
Table 4 shows the results for the Polish league of the 1997/98 season and
the expected results given by the model under the assumption of constant
point strengths. The model parameters and the expected components of the
result in goals are also given.
Table 5 shows the results of estimating the model parameters for the season considered in Table 4 under the assumption of time-varying strengths. The parameters and expected results are presented as above. Other modifications of the definition of strength are also considered: assuming equal strengths for all teams at the beginning of the season gives a better forecast than taking the strengths from the table of the previous season. Figure 1 shows how the strengths of the teams vary over the whole season. In order to eliminate inessential drifts we assume that the initial and final strengths are equal.
3.6. Estimation of p from the table. Suppose that we estimate the components of the game result from limited data, namely the final league table only. Let $p_k(\lambda) = \frac{\lambda^k}{k!}e^{-\lambda}$, $k \ge 0$, $\lambda > 0$, denote the Poisson probabilities with parameter $\lambda$. In the Poisson model (1), using the values (6) we get
$$E(U_{ij}) = \sum_{k=0}^{\infty}\sum_{l=0}^{\infty} p_k(r_{ij}a + m_i b + c)\, p_l(r_{ji}a + c)\,\bigl(K\mathbf{1}_{k>l} + \mathbf{1}_{k=l}\bigr).$$
Hence the expected point outcome of the $i$th team is a function $u_i$ of $p$:
$$u_i(p) := E(U_i) = \sum_{j=1,\, j\ne i}^{n} E(U_{ij}).$$
We define
$$L^*(p \mid U_i,\ 1 \le i \le n) = \sum_{i=1}^{n} \frac{(u_i(p) - U_i)^2}{u_i(p)}.$$
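The double series can be evaluated by truncating it at a large number of goals. The Python sketch below computes $E(U_{ij})$, $u_i(p)$ and $L^*$ for $p = (a, b, c)$, following the formulas above as printed (in particular, $u_i$ sums only the terms $E(U_{ij})$). The choice of $K = 3$ points for a win, the truncation at 30 goals, the strengths, the observed points and all function names are assumptions for illustration only.

```python
import math

def poisson_pmf(k, lam):
    """p_k(lambda) = lambda^k e^(-lambda) / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_home_points(lam_home, lam_guest, K=3, kmax=30):
    """E(U_ij) = sum_k sum_l p_k(lam_home) p_l(lam_guest) (K 1{k>l} + 1{k=l}), truncated at kmax."""
    ph = [poisson_pmf(k, lam_home) for k in range(kmax + 1)]
    pg = [poisson_pmf(l, lam_guest) for l in range(kmax + 1)]
    win = sum(ph[k] * pg[l] for k in range(kmax + 1) for l in range(k))
    draw = sum(ph[k] * pg[k] for k in range(kmax + 1))
    return K * win + draw

def expected_points(p, m, K=3):
    """u_i(p) = sum over j != i of E(U_ij), for p = (a, b, c) and strengths m."""
    a, b, c = p
    u = []
    for i, mi in enumerate(m):
        total = 0.0
        for j, mj in enumerate(m):
            if j != i:
                r_ij, r_ji = max(mi - mj, 0.0), max(mj - mi, 0.0)
                total += expected_home_points(r_ij * a + mi * b + c, r_ji * a + c, K)
        u.append(total)
    return u

def L_star(p, m, U, K=3):
    """L*(p | U_1..U_n) = sum_i (u_i(p) - U_i)^2 / u_i(p)."""
    return sum((ui - Ui) ** 2 / ui for ui, Ui in zip(expected_points(p, m, K), U))

m = [30.0, 25.0, 20.0, 15.0]          # hypothetical strengths
p = (0.02, 0.01, 1.0)                 # hypothetical (a, b, c)
print(expected_points(p, m))
print(L_star(p, m, U=[5, 4, 3, 2]))   # hypothetical observed points
```

Minimising $L^*$ over $p$ with a general-purpose numerical optimiser would be one natural way to use this criterion; the sketch stops at evaluating it.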
it is periodic. The stable (periodic) strengths do not differ much from the
point strengths. We omit the numerical details.
understood as being defensive. For the offensive style we assume that in the
game between teams $i$, $j$ with strengths $m_i$, $m_j$ the goal outcome is
$$X_{ij} = \Pi_1(ar_{ij} + bm_i + c) + \Pi_3(d), \qquad Y_{ij} = \Pi_2(ar_{ji} + c) + \Pi_3(d),$$
where $\Pi_1$, $\Pi_2$ have the same interpretation as in (4), and $\Pi_3(d)$ is a Poisson random variable with parameter $d$, independent of $\Pi_1$ and $\Pi_2$. We interpret $\Pi_3$ as the result of reciprocal attacks.
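A short simulation shows what the common term does. Since the same $\Pi_3(d)$ enters both $X_{ij}$ and $Y_{ij}$, the covariance of the two scores equals $d$, while the goal difference $X_{ij} - Y_{ij}$ does not involve $\Pi_3$ at all. The Python sketch below uses made-up values for the two Poisson means and for $d$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, d = 2.0, 1.0, 0.5          # hypothetical a*r_ij + b*m_i + c, a*r_ji + c, and d
size = 200_000

common = rng.poisson(d, size)          # Pi_3(d): shared "reciprocal attacks" component
X = rng.poisson(lam1, size) + common   # home goals
Y = rng.poisson(lam2, size) + common   # guest goals

print(np.cov(X, Y)[0, 1])              # close to d
print((X - Y).mean(), lam1 - lam2)     # the difference X - Y does not involve Pi_3
```

One consequence, under this model, is that goal differences alone carry no information about $d$, so a test of $H_0$ has to be based on other quantities, such as total goals or the joint distribution of the scores.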
We reduce the test of the hypothesis that the offensive style is imposed to a test of the parametric hypothesis $H_0\colon d = 0$ against $H_1\colon d > 0$.
The data concerning the Italian league, for example in the 1994/95 season, do not permit rejecting $H_0$ in favour of $H_1$. One may suppose that the element of style also appears in other games, for example in basketball, but this is beyond the scope of this note.
References
[1] J. Jeleń et al., Liga gra po pięćdziesiątce, 2nd ed., Wydawnictwo Sport i Turystyka, Warszawa, 1987 (in Polish).
[2] Encyklopedia Fuji, Rocznik 98-99, Vol. 22, Wydawnictwo GiA, Katowice, 1998 (in Polish).
[3] M. E. Glickman and H. S. Stern, A state-space model for National Football League scores, J. Amer. Statist. Assoc. 93 (1998), 25–35.
[4] J. B. Keller, A characterization of the Poisson distribution and the probability of winning a game, Amer. Statistician 48 (1994), 294–298.
[5] A. Lee, Modeling scores in the Premier League: Is Manchester United really the best?, Chance 10 (1997), 15–19.
[6] P. McCullagh and J. A. Nelder, Generalized Linear Models, 2nd ed., Chapman & Hall, 1989.
Mathematical Institute
University of Wrocław
Pl. Grunwaldzki 2/4
50-384 Wrocław, Poland
Received on 15.2.2000;
revised version on 26.6.2000 (1528)