Understanding Baseball Team Standings and Streaks: PACS Numbers: 89.75.-k, 02.50.Cw


arXiv:0804.1110v2 [physics.soc-ph] 28 Jul 2008
Understanding Baseball Team Standings and Streaks
C. Sire (1) and S. Redner (1, 2)
(1) Laboratoire de Physique Théorique - IRSAMC, CNRS, Université Paul Sabatier, 31062 Toulouse, France
(2) Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215, USA
Can one understand the statistics of wins and losses of baseball teams? Are their
consecutive-game winning and losing streaks self-reinforcing or can they be described
statistically? We apply the Bradley-Terry model, which incorporates the heterogeneity
of team strengths in a minimalist way, to answer these questions. Excellent agreement
is found between the predictions of the Bradley-Terry model and the rank dependence of
the average number of team wins and losses in major-league baseball over the past
century when team strengths are taken to be uniformly distributed over a finite range.
Using this uniform strength distribution, we also find very good agreement between
model predictions and the observed distribution of consecutive-game team winning and
losing streaks over the last half-century; however, the agreement is less good for the
previous half-century. The behavior of the last half-century supports the hypothesis
that long streaks are primarily statistical in origin with little self-reinforcing
component. The data further show that the past half-century of baseball has been more
competitive than the preceding half-century.
PACS numbers: 89.75.-k, 02.50.Cw
I. INTRODUCTION
The physics of systems involving large numbers of interacting agents is currently a
thriving field of research [1]. One of its many appeals lies in the opportunity it
offers to apply precise methods and tools of physics to the realm of "soft" science.
In this respect, biological, economic, and a large variety of human systems present
many examples of competitive dynamics that can be studied qualitatively or even
quantitatively by statistical physics. Among them, sports competitions are
particularly appealing because of the large amount of data available, their
popularity, and the fact that they constitute almost perfectly isolated systems.
Indeed, most systems considered in econophysics [2] or evolutionary biology [3] are
strongly affected by external and often unpredictable factors. For instance, a
financial model cannot predict the occurrence of wars or natural disasters, which
dramatically affect financial markets, nor can it include the effect of many other
important external parameters (China's GDP growth, German exports, Google's
profit, ...). On the other hand, sport leagues (soccer [4], baseball [5],
football [6], ...) or tournaments (basketball [7, 8], poker [9], ...) are basically
isolated systems that are much less sensitive to external influences. Hence, despite
their intrinsic human nature, which actually contributes to their appeal, competitive
sports are particularly suited to quantitative theoretical modeling. In this spirit,
this work is focused on basic statistical features of game outcomes in Major-League
baseball.
In Major-League baseball, and indeed in any competitive sport, the main observable is
the outcome of a single game: who wins and who loses. Then at the end of a season,
the win/loss record of each team is fundamental. As statistical physicists, we are
not concerned with the fates of individual teams, but rather with the average
win/loss record of the 1st, 2nd, 3rd, etc. teams, as well as the statistical
properties of winning and losing streaks. We concentrate on major-league baseball to
illustrate statistical properties of game outcomes because of the large amount of
available data [10] and the near constancy of the game rules during the so-called
"modern era" that began in 1901.
For non-US readers or for non-baseball fans: during the modern era of major-league
baseball, teams have been divided into the nearly independent American and National
leagues [11]. At the end of each season, a champion of each league is determined (the
best team in each league prior to 1961, and the winner of league playoffs
subsequently), and the two champions play in the World Series to determine the
overall champion. As the data will reveal, it is also useful to separate the
1901–1960 early modern era, with a 154-game season and 16 teams, and the 1961–2005
expansion era, with a 162-game season in which the number of teams expanded in stages
to its current value of 30, to highlight systematic differences between these two
periods. Our data are based on the 163674 regular-season games that occurred between
1901 and the end of the 2005 season (72741 during 1901–60 and 90933 during
1961–2005).
While the record of each team can change significantly from year to year, we find
that the time-averaged win/loss record of the $r$th-ranked team as a function of rank
$r$ is strikingly regular. One of our goals is to understand the rank dependence of
this win fraction. An important outcome of our study is that the Bradley-Terry (BT)
competition model [12, 13] provides an excellent account of the team win/loss
records. This agreement between the data and theory is predicated on using a specific
form for the distribution of team strengths. We will argue that the best match to the
data is achieved by using a uniform distribution of team strengths in each season.
Another goal of this work is to understand the statistical features of
consecutive-game team winning and losing streaks. The existence of long streaks of
all types of exceptional achievement in baseball, as well as in most competitive
sports, has been well documented [14] and continues to be the source of analysis and
debate among sports fans. For long consecutive-game team winning and team losing
streaks, an often-invoked theme is the notion of reinforcement: a team that is on a
roll is more likely to continue winning, and vice versa for a slumping team on a
losing streak. The question of whether streaks are purely statistical or
self-reinforcing continues to be vigorously debated [15]. Using the BT model and our
inferred uniform distribution of team strengths, we compute the streak length
distribution. We find that the theoretical prediction agrees extremely well with the
streak data during 1961–2005. However, there is a slight discrepancy between theory
and the tail of the streak distribution during 1901–60, suggesting that
non-statistical effects may have played a role during this early period.
As a byproduct of our study, we find clear evidence that baseball has been more
competitive during 1961–2005 than during 1901–60, a feature that has been found
previously [16]. The manifestation of this increased competitiveness is that the
range of team records and the lengths of streaks were narrower during the latter
period. This observation fits with the general principle [17] that outliers become
progressively rarer in a highly competitive environment. Consequently, extremes of
achievement become less and less likely to occur.
II. STATISTICS OF THE WIN FRACTION
A. Bradley-Terry Model
Our starting point to account for the win/loss records of all baseball teams is the
BT model [12, 13], which incorporates the heterogeneity in team strengths in a natural
and simple manner. We assume that each team has an intrinsic strength $x_i$ that is
fixed for each season. The probability that a team of strength $x_i$ wins when it
plays a team of strength $x_j$ is simply
$$p_{ij} = \frac{x_i}{x_i + x_j}. \tag{1}$$
Thus the winning probability depends continuously on
the strengths of the two competing teams [18]. When two
equal-strength teams play, each team has a 50% proba-
bility to win, while if one team is much stronger, then its
winning probability approaches 1.
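The single-game rule of Eq. (1) can be written as a one-line function. The sketch below is illustrative only; the function name `bt_win_prob` is our own choice and does not come from the paper.

```python
def bt_win_prob(x_i: float, x_j: float) -> float:
    """Probability that a team of strength x_i beats a team of strength x_j, Eq. (1)."""
    if x_i <= 0 or x_j <= 0:
        raise ValueError("team strengths must be positive")
    return x_i / (x_i + x_j)


# Equal strengths give an even game.
print(bt_win_prob(1.0, 1.0))
# A much stronger team wins with probability close to 1.
print(bt_win_prob(1000.0, 1.0))
# Only the ratio of strengths matters: rescaling both leaves the result unchanged.
print(bt_win_prob(0.3, 0.6), bt_win_prob(3.0, 6.0))
```

Because the two outcomes are complementary, `bt_win_prob(a, b) + bt_win_prob(b, a)` equals 1 for any pair of positive strengths.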
The form of the winning probability of Eq. (1) is quite general. Indeed, we can
replace the team strength $x_i$ by any monotonic function $f(x_i)$. The only
indispensable attribute is the ordering of the team strengths. Thus the notion of
strength is coupled to the assumed form of the winning probability. If we make a
hypothesis about one of these quantities, then the other is no longer a variable that
we are free to choose, but an outcome of the model. In our analysis, we adopt the
form of the winning probability in Eq. (1) because of its simplicity. Then the only
relevant unknown quantity is the probability distribution of the $x_i$'s. As we shall
see in the next section, this distribution of team strengths can then be inferred
from the season-end win/loss records of the teams, and a good fit to the data is
obtained when assuming a uniform distribution of team strengths. Because only the
ratio of team strengths is relevant in Eq. (1), we therefore take team strengths to
be uniformly distributed in the range $[x_{\min}, 1]$, with $0 \le x_{\min} \le 1$.
Thus the only model parameter is the value of $x_{\min}$.
For uniformly distributed team strengths $\{x_j\}$ that lie in $[x_{\min}, 1]$, the
average winning fraction for a team of strength $x$ that plays a large number of
games $N$, with equal frequencies against each opponent, is
$$
\begin{aligned}
W(x) &= \frac{1}{N}\sum_{j=1}^{N}\frac{x}{x + x_j}
\to \frac{x}{1 - x_{\min}} \int_{x_{\min}}^{1} \frac{dy}{x + y} \\
&= \frac{x}{1 - x_{\min}} \ln\left(\frac{x + 1}{x + x_{\min}}\right),
\end{aligned} \tag{2}
$$
where the limit $N \to \infty$ is taken in passing from the sum to the integral. We
then transform from strength $x$ to scaled rank $r$ by
$x = x_{\min} + (1 - x_{\min})\,r$, with $r = 0, 1$ corresponding to the weakest and
strongest team, respectively (Fig. 1). This expression for the win fraction is one of
our primary results.
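As a quick numerical check of Eq. (2), the closed form can be compared with a direct average of Eq. (1) over a fine grid of opponent strengths. This is a minimal sketch; the function names and the grid size are our own illustrative choices.

```python
import math


def win_fraction(r: float, x_min: float) -> float:
    """Closed form of Eq. (2), expressed through the scaled rank r in [0, 1]."""
    x = x_min + (1.0 - x_min) * r
    return x / (1.0 - x_min) * math.log((x + 1.0) / (x + x_min))


def win_fraction_sum(r: float, x_min: float, n_opponents: int = 10_000) -> float:
    """Average of x / (x + x_j) over opponents placed at grid midpoints of [x_min, 1]."""
    x = x_min + (1.0 - x_min) * r
    step = (1.0 - x_min) / n_opponents
    total = sum(x / (x + x_min + (j + 0.5) * step) for j in range(n_opponents))
    return total / n_opponents


for r in (0.0, 0.5, 1.0):
    print(r, win_fraction(r, 0.435), win_fraction_sum(r, 0.435))
```

Averaging `win_fraction` over all ranks gives 1/2, as it must, since every game produces exactly one winner.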
FIG. 1: Average win fraction $W(r)$ versus scaled rank $r$ for 1901–60 and 1961–2005
(data symbols). For these periods, the dashed lines are simulation results for the BT
model with $x_{\min} = 0.278$ and 0.435, respectively. The solid curves represent
Eq. (2), corresponding to simulations for an infinitely long season and an infinite
number of teams.
To check the prediction of Eq. (2), we start with a value of $x_{\min}$ and simulate
$10^4$ periods of a model baseball league that consists of: (i) 16 teams that play 60
seasons of 154 games (corresponding to 1901–60) and (ii) 30 teams that play 45
seasons of 162 games (1961–2005), with uniformly distributed strengths in
$[x_{\min}, 1]$ for both cases, but with different values of $x_{\min}$. Using the
winning probability $p_{ij}$ of Eq. (1), we then compute the average win fraction
$W(r)$ of each team as a function of its scaled rank $r$. We then incrementally
update the value of $x_{\min}$ to minimize the difference between the simulated
values of $W(r)$ and those from game win/loss data. Nearly the same results are found
if each team plays every opponent with equal probability or equally often, as long as
the number of teams and the number of games are not unrealistically small. The BT
model, with each team playing each opponent with the same probability, gives very
good fits to the data by choosing $x_{\min} = 0.278$ for the period 1901–60, and
$x_{\min} = 0.435$ for 1961–2005 (Fig. 1). If the actual game frequencies in each
season are used to determine opponents, $x_{\min}$ changes slightly, to 0.289, for
1901–60 but remains unchanged for 1961–2005.
FIG. 2: Convergence of $W(r)$ versus scaled rank $r$ as a function of season length
for 1961–2005, using $x_{\min} = 0.435$ and 30 teams. The circles and the thick
dashed curve are the baseball data and the corresponding BT model data for an
$n = 162$ game season. The thin dashed lines are model data for a season of
$n = 300$, 500, and 1000 games averaged over 100000 seasons. The full line
corresponds to the model for an infinitely long season with 30 teams. Finally, the +
symbols give the result of Eq. (2), which corresponds to an infinite-length season
and an infinite number of teams.
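The simulation described above can be sketched in a few lines. This is not the authors' code: it assumes that each game pairs two teams chosen uniformly at random (the "equal probability" variant mentioned in the text), that strengths are redrawn every season, and that rank is assigned from the season-end win fraction. All function names are hypothetical, and far fewer seasons are used than the $10^4$ periods quoted in the text.

```python
import random


def simulate_season(strengths, games_per_team, rng):
    """Play one season and return each team's win fraction."""
    n_teams = len(strengths)
    wins = [0] * n_teams
    played = [0] * n_teams
    for _ in range(n_teams * games_per_team // 2):
        i, j = rng.sample(range(n_teams), 2)           # random pairing (assumption)
        p_ij = strengths[i] / (strengths[i] + strengths[j])   # Eq. (1)
        winner = i if rng.random() < p_ij else j
        wins[winner] += 1
        played[i] += 1
        played[j] += 1
    return [w / max(g, 1) for w, g in zip(wins, played)]


def average_win_fraction_by_rank(n_teams, games_per_team, x_min, n_seasons, seed=0):
    """Average the sorted season-end win fractions over many seasons."""
    rng = random.Random(seed)
    totals = [0.0] * n_teams
    for _ in range(n_seasons):
        strengths = [rng.uniform(x_min, 1.0) for _ in range(n_teams)]
        for rank, w in enumerate(sorted(simulate_season(strengths, games_per_team, rng))):
            totals[rank] += w
    return [t / n_seasons for t in totals]


# Expansion-era settings from the text: 30 teams, 162 games, x_min = 0.435.
ranked = average_win_fraction_by_rank(30, 162, 0.435, n_seasons=50)
print("worst-ranked team:", round(ranked[0], 3))
print("best-ranked team: ", round(ranked[-1], 3))
```

Index 0 of `ranked` is the weakest record and the last index the strongest, so the scaled rank is `rank / (n_teams - 1)`. Fitting $x_{\min}$ would then amount to repeating this for several trial values and keeping the one closest to the data.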
Despite the fact that the number of teams has increased from 16 to 30 since 1961, the
range of win fractions is larger in the early era (0.32–0.67) than in the expansion
era (0.36–0.63), a feature that indicates that baseball has become more competitive.
This observation accords with the notion that the pressure of continuous competition,
as in baseball, gradually diminishes the likelihood of outliers [17]. Given the
crudeness of the model and real features that we have ignored, such as home-field
advantage (approximately 53% for the past century and slowly decreasing with time),
imbalanced playing schedules, and in-season personnel changes due to trades and
player injuries, the agreement between the data and simulations of the BT model is
satisfying.
It is worth noting in Fig. 1 that the win fraction data and the corresponding
numerical results from simulations of the BT model deviate from the theoretical
prediction given in Eq. (2) when $r \to 0$ and $r \to 1$. This discrepancy is simply
a finite-season effect. As shown in Fig. 2, when we simulate the BT model for
progressively longer seasons, the win/loss data gradually converge to the prediction
of Eq. (2).
The present model not only reproduces the average win record $W(r)$ over a given
period, but it also correctly explains the season-to-season fluctuation
$\sigma^2(r)$ of the win fraction, defined as
$$\sigma^2(r) \equiv \frac{1}{Y}\sum_{j=1}^{Y}\left(W(r) - W_j(r)\right)^2, \tag{3}$$
where $W_j(r)$ is the winning fraction of the $r$th-ranked team during the $j$th
season,
$$W(r) = \frac{1}{Y}\sum_{j=1}^{Y} W_j(r)$$
is the average win fraction of the $r$th-ranked team, and $Y$ is the number of years
in the period. These fluctuations are the largest for extremal teams (and minimal for
average teams). There is also an asymmetry of $\sigma(r)$ with respect to $r = 1/2$.
Our simulations of the BT model with the optimal $x_{\min}$ values that were
determined previously by fitting to the win fraction quantitatively reproduce these
two features of $\sigma(r)$.
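The rank-resolved mean and fluctuation of Eq. (3) are straightforward to compute once each season's win fractions are sorted by rank. The sketch below uses three made-up seasons of a three-team league purely as toy input; neither the numbers nor the function name come from the paper.

```python
import math


def rank_statistics(seasons):
    """Return (mean, fluctuation) per rank, following Eq. (3).

    `seasons` is a list of seasons; each season lists win fractions ordered by rank.
    """
    n_years = len(seasons)
    n_ranks = len(seasons[0])
    means = [sum(s[r] for s in seasons) / n_years for r in range(n_ranks)]
    sigmas = [
        math.sqrt(sum((means[r] - s[r]) ** 2 for s in seasons) / n_years)
        for r in range(n_ranks)
    ]
    return means, sigmas


# Hypothetical toy data (not baseball data): three seasons, three ranks.
toy_seasons = [
    [0.40, 0.50, 0.60],
    [0.36, 0.50, 0.64],
    [0.44, 0.50, 0.56],
]
means, sigmas = rank_statistics(toy_seasons)
print(means)
print(sigmas)
```

In this toy input the middle rank never moves, so its fluctuation vanishes, while the extreme ranks fluctuate, mirroring the qualitative statement that fluctuations are largest for extremal teams.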
FIG. 3: Season-to-season fluctuation $\sigma(r)$ for 1901–60 and for 1961–2005 (data
symbols). The dashed lines are numerical simulations of the BT model for $10^4$
periods with the same $x_{\min}$ as in Fig. 1.
In addition to the finite-season effects described above, another basic consequence
of the finiteness of the season is that the intrinsically strongest team does not
necessarily have the best win/loss record. That is, the average win fraction $W$ does
not necessarily increase with team strength. By luck, a strong team can have a poor
record or vice versa. It is instructive to estimate the number of games $G$ that need
to be played to ensure that the win/loss record properly reflects team strength. The
difference in the number of wins of two adjacent teams in the standings is
proportional to $G(1 - x_{\min})/T$, namely, the number of games times their strength
difference; the latter is proportional to $(1 - x_{\min})/T$ for a league that
consists of $T$ teams. This systematic contribution to the difference should
significantly exceed random fluctuations, which are of the order of $\sqrt{G}$. Thus
we require
$$G \gg \left(\frac{T}{1 - x_{\min}}\right)^2 \tag{4}$$
for the end-of-season standings to be ordered by team strength. Fig. 2 and Fig. 3
illustrate the fact that this effect is more important for the top-ranked and
bottom-ranked teams. During the 1901–60 period, when major-league baseball consisted
of independent American and National leagues, $T = 8$, $G = 154$, and
$x_{\min} \approx 0.3$, so that the season was just long enough to resolve adjacent
teams. Currently, however, the season length is insufficient to resolve adjacent
teams. The natural way to deal with this ambiguity is to expand the number of teams
that qualify for the post-season playoffs, which is what is currently done.
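Plugging numbers into the right-hand side of Eq. (4) makes the comparison concrete. The early-era values are those quoted in the text; for the current era we assume, for illustration only, $T = 30$ and $x_{\min} = 0.435$ as in the simulations. Since Eq. (4) drops constants of order one, the result is a scale to compare against the season length, not a sharp threshold.

```python
def resolution_scale(n_teams: int, x_min: float) -> float:
    """Right-hand side of Eq. (4): the scale that the season length G must exceed."""
    return (n_teams / (1.0 - x_min)) ** 2


# Early modern era: T = 8 teams per league, x_min of about 0.3, season of 154 games.
print(round(resolution_scale(8, 0.3), 1), "versus G = 154")

# Expansion era (assumed T = 30, x_min = 0.435), season of 162 games.
print(round(resolution_scale(30, 0.435), 1), "versus G = 162")
```

The first scale is somewhat below 154, while the second exceeds 162 by more than an order of magnitude, in line with the statement that the current season is too short to resolve adjacent teams.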
B. Applicability of the Bradley-Terry Model
Does the BT model with uniform team strengths provide the most appropriate
description of the win/loss data? We perform several tests to validate this model.
First, as mentioned in the previous section, the assumption of Eq. (1) for the
winning probability can be recast more generally as
$$p_{ij} = \frac{f(x_i)}{f(x_i) + f(x_j)}, \tag{5}$$
so that the substitution $X_i = f(x_i)$ recovers the original winning probability of
Eq. (1). Hence the crucial model assumption is the separability of the winning
probability. In particular, the BT model assumes that
$p_{ij}/p_{ji} = p_{ij}/(1 - p_{ij})$ is only a function of characteristics of team
$i$, divided by characteristics of team $j$. One consequence of this separability is
the detailed-balance relation
$$\frac{p_{ik}}{1 - p_{ik}}\,\frac{p_{kj}}{1 - p_{kj}} = \frac{p_{ij}}{1 - p_{ij}}, \tag{6}$$
for any triplet of teams. This relation quantifies the obvious fact that if team A
likely beats B, and B likely beats C, then A is likely to beat C. Since we do not
know the actual $p_{ij}$ in a given baseball season, we instead consider
$$z_{ij} = \frac{W_{ij}}{G_{ij} - W_{ij}}, \tag{7}$$
where $W_{ij}$ is the number of wins of team $i$ against $j$, and $G_{ij}$ is the
number of games they played against each other in a given season. If seasons were
infinitely long, then $z_{ij} \to p_{ij}/(1 - p_{ij})$, and hence
$$z_{ik}\, z_{kj} = z_{ij}. \tag{8}$$
FIG. 4: Comparison of the detailed balance relation Eq. (8) for baseball data to the
results of the BT model over $10^4$ periods (dashed lines), where each period
corresponds to the results of all baseball games during either 1901–60 (triangles) or
1961–2005 (circles). The axes are $\langle\ln(z_{ik} z_{kj})\rangle$ versus
$\langle\ln(z_{ij})\rangle$. The $x_{\min}$ values are the same as in Fig. 1. The
straight lines are guides for the eye, with slope 0.63 for the data for 1901–60 and
0.30 for 1961–2005.
FIG. 5: Dependence of $\langle\ln(z_{ik} z_{kj})\rangle$ vs
$\langle\ln(z_{ij})\rangle$ on season length for the 1961–2005 period. All
$G_{ij}$'s are multiplied by $M = 5$, 10, 100 (steepening dot-dashed lines). The
thick dashed line corresponds to $M = 10^4$ and is indistinguishable from a linear
dependence with unit slope.
To test the detailed balance relation Eq. (8), we plot
$\langle\ln(z_{ik} z_{kj})\rangle$ as a function of $\langle\ln(z_{ij})\rangle$ from
game data, averaged over all team triplets $(i, j, k)$ and all seasons in a given
period (Fig. 4). We discard events for which $W_{ij} = G_{ij}$ or $W_{ij} = 0$ (team
$i$ won or lost all games against team $j$). Our simulations of the BT model over
$10^4$ realizations of the 1901–60 and 1961–2005 periods with the same $G_{ij}$ as in
actual baseball seasons and with the optimal values of $x_{\min}$ for each period are
in excellent agreement with the game data. Although $z_{ik} z_{kj}$ in the figure has
a sublinear dependence on $z_{ij}$ (slope much less than 1 in Fig. 4), the slope
progressively increases and ultimately approaches the expected linear relation
between $z_{ik} z_{kj}$ and $z_{ij}$ as the season length is increased (Fig. 5). We
implement an increased season length by multiplying all the $G_{ij}$ by the same
factor $M$. Notice also that $\langle\ln(z_{ik} z_{kj})\rangle$ versus
$\langle\ln(z_{ij})\rangle$ for the 1901–60 period has a larger slope than for
1961–2005 because the $G_{ij}$'s are larger in the former period ($G_{ij} = 22$) than
in the latter ($G_{ij}$ in the range 5–19).
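The finite-season flattening of the detailed-balance plot can be reproduced with a small simulation. The sketch below is a simplification rather than the authors' procedure: instead of binning and averaging as in Fig. 4, it fits a single least-squares slope of $\ln(z_{ik} z_{kj})$ against $\ln(z_{ij})$ over all team triplets, with every pair of teams playing the same number of games. The function names are hypothetical.

```python
import math
import random
from itertools import permutations


def log_odds_table(strengths, games_per_pair, rng):
    """ln z_ij of Eq. (7) from simulated head-to-head series; None after a sweep."""
    n_teams = len(strengths)
    table = [[None] * n_teams for _ in range(n_teams)]
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            p_ij = strengths[i] / (strengths[i] + strengths[j])
            wins = sum(rng.random() < p_ij for _ in range(games_per_pair))
            if 0 < wins < games_per_pair:          # discard sweeps, as in the text
                table[i][j] = math.log(wins / (games_per_pair - wins))
                table[j][i] = -table[i][j]
    return table


def detailed_balance_slope(n_teams, games_per_pair, x_min, n_seasons, seed=0):
    """Least-squares slope of ln(z_ik z_kj) versus ln(z_ij) over all triplets."""
    rng = random.Random(seed)
    n = sx = sy = sxx = sxy = 0.0
    for _ in range(n_seasons):
        strengths = [rng.uniform(x_min, 1.0) for _ in range(n_teams)]
        lz = log_odds_table(strengths, games_per_pair, rng)
        for i, j, k in permutations(range(n_teams), 3):
            if lz[i][j] is None or lz[i][k] is None or lz[k][j] is None:
                continue
            x, y = lz[i][j], lz[i][k] + lz[k][j]
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


slope_short = detailed_balance_slope(8, 22, 0.278, n_seasons=100)   # G_ij = 22
slope_long = detailed_balance_slope(8, 440, 0.278, n_seasons=100)   # M = 20 times longer
print(round(slope_short, 2), round(slope_long, 2))
```

With 22 games per pair the fitted slope is well below 1 because sampling noise in $z_{ij}$ is comparable to the true spread of odds; lengthening the series pushes the slope toward the unit value implied by Eq. (8).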
This study of game outcomes among triplets of teams provides a detailed and
non-trivial validation of the BT form Eq. (1) for the winning probability. As a
byproduct, we learn that cyclic game outcomes, in which team A beats B, B beats C,
and C beats A, are unlikely to occur.
C. Distribution of Team Strengths
Thus far, we have used a uniform distribution of team strengths to derive the average
win fraction for the BT model. We now determine the most likely strength distribution
by searching for the distribution that gives the best fit to the game data for $W(r)$
by minimizing the deviation between the data and the simulated form of $W(r)$. Here
the deviation is defined as
$$\Delta^2 = \frac{\sum_r \left[W(r) - W(r; \rho)\right]^2}{\sum_r W(r)^2}, \tag{9}$$
where $W(r; \rho)$ is the winning fraction in simulations of the BT model for a trial
distribution $\rho(x)$ in which the actual game frequencies $G_{ij}$ were used in the
simulation, and $W(r)$ is the game data for the winning fraction.
We assume that the two periods 1901–60 and 1961–2005 are long enough for $W(r)$ to
converge to its average value. We parameterize the trial strength distribution as a
piecewise linear function of $n$ points, $\{\rho(y_i)\}$, with $y_i \in [0, 1]$ and
$y_n = 1$. We then perform Monte Carlo (MC) simulations, in which we update the $y_i$
and $\rho_i = \rho(y_i)$ by small amounts in each step to reduce $\Delta$.
Specifically, at each MC step, we select one value of $i = 1, \ldots, n$, and
(i) with probability 1/2, adjust $y_i$ (except $y_n = 1$) by $u\,\delta y/10$, where
$\delta y$ is the spacing between $y_i$ and its nearest neighbor, and $u$ is a
uniform random number between 0 and 1;
(ii) with probability 1/2, update $\rho(y_i)$ by $u\,\rho(y_i)/10$.
If $\Delta$ decreases as a result of this update, then $y_i$ or $\rho(y_i)$ is set to
its new value; otherwise the change in the parameter value is rejected. We choose
$n = 8$, which is large enough to obtain a distribution with significant features and
for which typically 1000–2000 MC steps are sufficient for convergence. A larger $n$
greatly increases the number of MC steps necessary to converge and also increases the
risk of being trapped in a metastable state because the size of the phase space grows
exponentially with $n$. To check that this algorithm does not get trapped in a
metastable state, we started from several different initial states and found
virtually identical final distributions (Fig. 6). The MC-optimized distribution for
each period is remarkably close to uniform, as shown in this figure.
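The accept-if-better update rule can be sketched generically. The code below is not the authors' implementation: the random sign of each move is our assumption (the text only specifies the step size), and the `toy_deviation` function is a hypothetical stand-in for the deviation of Eq. (9), which in the actual study requires a full BT simulation for every trial distribution.

```python
import random


def greedy_mc(y, rho, deviation, n_steps, rng):
    """Random search over knot positions y and heights rho; keep a move only if it helps."""
    y, rho = list(y), list(rho)
    best = deviation(y, rho)
    n = len(y)
    for _ in range(n_steps):
        i = rng.randrange(n)
        u = rng.random()
        sign = rng.choice((-1.0, 1.0))      # assumption: moves may go either way
        new_y, new_rho = list(y), list(rho)
        if rng.random() < 0.5:
            if i == n - 1:
                continue                    # the last knot stays pinned at 1
            gaps = [abs(y[i] - y[k]) for k in (i - 1, i + 1) if 0 <= k < n]
            new_y[i] = max(0.0, y[i] + sign * u * min(gaps) / 10.0)
        else:
            new_rho[i] = rho[i] + sign * u * rho[i] / 10.0
        trial = deviation(new_y, new_rho)
        if trial < best:                    # reject anything that does not improve
            y, rho, best = new_y, new_rho, trial
    return y, rho, best


def toy_deviation(y, rho):
    """Hypothetical objective: squared distance of the heights from a flat profile."""
    return sum((value - 1.0) ** 2 for value in rho) / len(rho)


rng = random.Random(1)
y0 = [0.1 + 0.9 * k / 7 for k in range(8)]            # 8 knots from 0.1 to 1
rho0 = [2.0, 1.5, 1.0, 0.5, 0.5, 1.0, 1.5, 2.0]       # V-shaped starting profile
y_fit, rho_fit, final = greedy_mc(y0, rho0, toy_deviation, 2000, rng)
print(toy_deviation(y0, rho0), final)
```

Because a knot moves by at most one tenth of the distance to its nearest neighbor, the ordering of the knots is preserved automatically.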
FIG. 6: Optimized strength distributions $\rho(x)$ for 1901–60 (triangles) and
1961–2005 (circles), together with the optimal uniform distributions (dashed). For
1961–2005, we also show the final distributions starting from $y_i$'s equally spaced
between $y_1 = 0.1$ and $y_8 = 1$ with the distribution $\rho$: (a) uniform on
[0.1, 1] (open circles), and (b) a symmetric V-shape on [0.1, 1] (full circles).
FIG. 7: Comparison of the winning fraction $W(r)$ extracted from the actual baseball
data (symbols) to the model with a constant $\rho(x)$ (dashed lines), and with the
optimal log-normal distribution $\rho(x)$ (full lines).
Although the optimal distributions are visually not uniform, the small difference in
the relative errors, the closeness of $y_1$ and $x_{\min}$, and the imperceptible
difference in the $r$ dependence of $W(r)$ for the uniform and optimized strength
distributions suggest that a uniform team strength distribution on $[x_{\min}, 1]$
describes the game data quite well.
For completeness, we also considered the conventionally used log-normal distribution
of team strengths [5, 19]:
$$\rho(x) = \frac{1}{\sqrt{2\pi\kappa^2}\,x}\exp\left[-\frac{1}{2\kappa^2}\left(\ln\frac{x}{\bar{x}} + \frac{\kappa^2}{2}\right)^2\right]. \tag{10}$$
With the normalization convention of Eq. (10), the average team strength is simply
$\bar{x}$, which can be set to any value due to the invariance of $p_{ij}$ with
respect to the transformation $x \to \lambda x$. Hence, the only relevant parameter
is the width $\kappa$. Using the same MC optimization procedure described above, we
find that a log-normal ansatz for the strength distribution with optimal parameter
$\kappa$ gives a visually inferior fit of the winning fraction in both periods
compared to the uniform strength distribution, especially for $r$ close to 1 (see
Fig. 7). The relative error for the log-normal distribution is also a factor of 6 and
3 larger, respectively, than for the optimal distribution in the 1901–60 and
1961–2005 periods. However, we do reproduce the feature that the optimal log-normal
distribution for 1961–2005 is narrower ($\kappa = 0.238$) than that for 1901–60
($\kappa = 0.353$), indicating again that baseball is more competitive in the second
period than in the first.
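The normalization convention of Eq. (10) corresponds to a log-normal variable whose logarithm has mean $\ln\bar{x} - \kappa^2/2$, which is what makes the average strength equal to $\bar{x}$. The sketch below checks this numerically with Python's standard `random.lognormvariate`; the names `kappa` and `mean_strength` are our labels for the width and mean parameters.

```python
import math
import random


def lognormal_density(x, mean_strength, kappa):
    """Log-normal density normalized so that its mean equals mean_strength, as in Eq. (10)."""
    z = math.log(x / mean_strength) + 0.5 * kappa ** 2
    return math.exp(-z * z / (2.0 * kappa ** 2)) / (math.sqrt(2.0 * math.pi) * kappa * x)


def sample_strengths(n, mean_strength, kappa, rng):
    """Draw n team strengths from the same distribution."""
    mu = math.log(mean_strength) - 0.5 * kappa ** 2
    return [rng.lognormvariate(mu, kappa) for _ in range(n)]


rng = random.Random(0)
for kappa in (0.353, 0.238):          # widths quoted for 1901-60 and 1961-2005
    draws = sample_strengths(100_000, 1.0, kappa, rng)
    print(kappa, sum(draws) / len(draws))
```

Both sample means come out close to 1, independent of the width, so the width is indeed the only parameter that affects the win probabilities.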
III. WINNING AND LOSING STREAK
STATISTICS
We now turn to the distribution of consecutive-game winning and losing streaks.
Namely, what are the probabilities $W_n$ and $L_n$ to observe a string of $n$
consecutive wins or $n$ consecutive losses, respectively? Because of its emotional
appeal, streakiness in a wide variety of sports continues to be vigorously researched
and debated [15, 20, 21]. In this section, we argue that independent game outcomes
that depend only on relative team strengths describe the streak data for the period
1961–2005 quite well. The agreement is not as good for the period 1901–60 and
suggests that non-statistical effects may have played a role in the longest streaks.
Historically, the longest team winning streak (with ties
allowed) in major-league baseball is 26 games, achieved
by the 1916 New York Giants in the National League
over a 152-game season [22]. The record for a pure win-
ning streak since 1901 (no ties) is 21 games, set by the
Chicago Cubs in 1935 in a 154-game season, while the
American League record is a 20-game winning streak by
the 2002 Oakland Athletics over the now-current 162-
game season. Conversely, the longest losing streak since
1901 is 23, achieved by the 1961 Philadelphia Phillies
in the National League [23], and the American League
losing-streak record is 21 games, set by the Baltimore
Orioles at the start of the 1988 season. For completeness,
the list of all winning and all losing streaks of 15 or more games
is given in the appendix.
FIG. 8: Distribution of winning/losing streaks $P_n$ versus $n$ since 1901 on a
semi-logarithmic scale for 1901–60 and 1961–2005 (data symbols). The dashed curves
are the result of simulations with $x_{\min} = 0.278$ and $x_{\min} = 0.435$ for the
two respective periods. The smooth curves are streak data from randomized win/loss
records, and the dotted curve is $2^{-n}$.
Fig. 8 shows the distribution of team winning and losing streaks in major-league
baseball since 1901. Because these winning and losing streak distributions are
virtually identical for $n \le 15$, we consider $P_n = (W_n + L_n)/2$, the
probability of a winning or a losing streak of length $n$ (Fig. 8). It is revealing
to separate the streak distributions for 1901–60 and 1961–2005. Their distinctness is
again consistent with the hypothesis that baseball is becoming more competitive. In
fact, exceptional streaks were much more likely between 1901 and 1960 than after
1961. Of the 55 streaks of 15 or more games, 27 occurred between 1901 and 1930, 13
between 1931 and 1960, and 15 after 1960 [24].
The first point about the streak distributions is that they decay exponentially with
$n$, for large $n$. This behavior is a simple consequence of the following bound:
consider a baseball league that consists of teams with either strengths $x = 1$ or
$x = x_{\min} > 0$, and with games only between strong and weak teams. Then the
distribution of winning streaks of the strong teams decays as $(1 + x_{\min})^{-n}$;
this represents an obvious upper bound for the streak distribution in a league where
team strengths are uniformly distributed in $[x_{\min}, 1]$.
We now apply the BT model to determine the form of the consecutive-game winning and
losing streak distributions. Using Eq. (1) for the single-game outcome probability,
the probability that a team of strength $x$ has a streak of $n$ consecutive wins is
$$P_n(x) = \left[\prod_{j=1}^{n} \frac{x}{x + x_j}\right] \frac{x_0}{x + x_0}\,\frac{x_{n+1}}{x + x_{n+1}}. \tag{11}$$
The product gives the probability for $n$ consecutive wins against teams of strengths
$x_j$, $j = 1, 2, \ldots, n$ (some factors possibly repeated), while the last two
factors give the probability that the 0th and the $(n+1)$st games are losses that
terminate the winning streak at $n$ games. Assuming a uniform team strength
distribution $\rho(x)$, and for the case where each team plays the same number of
games with every opponent, we average Eq. (11) over all opponents and then over all
teams.
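The first average gives
$$\langle P_n(x)\rangle_{\{x_j\}} = x^n \left\langle \frac{1}{x+y} \right\rangle^{n} \left\langle \frac{y}{x+y} \right\rangle^{2}, \tag{12}$$
with
$$\left\langle \frac{1}{x+y} \right\rangle = \frac{1}{1 - x_{\min}} \ln\left(\frac{x+1}{x+x_{\min}}\right), \qquad \left\langle \frac{y}{x+y} \right\rangle = 1 - \frac{x}{1 - x_{\min}} \ln\left(\frac{x+1}{x+x_{\min}}\right)$$
for a uniform distribution of team strengths in $[x_{\min}, 1]$. Here we use the fact
that each team strength is independent, so that the product in Eq. (11) factorizes.
We now average over the uniform strength distribution to find, for the team-averaged
probability to have a streak of $n$ consecutive wins,
$$P_n = \frac{1}{1 - x_{\min}} \int_{x_{\min}}^{1} f(x)\, e^{n g(x)}\, dx, \tag{13}$$
where
$$f(x) = \left[1 - \frac{x}{1 - x_{\min}} \ln\left(\frac{x+1}{x+x_{\min}}\right)\right]^2, \qquad g(x) = \ln x + \ln\left[\frac{1}{1 - x_{\min}} \ln\left(\frac{x+1}{x+x_{\min}}\right)\right].$$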
Since $g(x)$ monotonically increases with $x$ within $[x_{\min}, 1]$, the integral in
Eq. (13) is dominated by the behavior near the maximum of $g(x)$ at $x = 1$ for large
$n$. Performing the integral by parts [25], the leading behavior is
$$P_n \sim e^{n g(1)}, \tag{14}$$
with
$$g(1) = -\ln(1 - x_{\min}) + \ln\ln\left(\frac{2}{1 + x_{\min}}\right).$$
As expected, $P_n$ decays exponentially with $n$, but with a decay rate that
decreases as teams become more heterogeneous (decreasing $x_{\min}$). In the limit of
equal-strength teams, the most rapid decay of the streak probability arises,
$P_n = 2^{-n}$, while the widest disparity in team strengths, $x_{\min} = 0$, leads
to the slowest possible decay $P_n \sim (\ln 2)^n \approx (0.693)^n$.
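Eq. (13) is easy to evaluate numerically, which also lets one check the asymptotic decay factor $e^{g(1)}$ of Eq. (14). The sketch below uses a plain midpoint rule; the function names and grid size are our own choices, and the integrand is written as $(1 - w)^2 w^n$ with $w(x) = x\,\langle 1/(x+y)\rangle$, which is the same as $f(x)\,e^{n g(x)}$.

```python
import math


def average_win_probability(x, x_min):
    """w(x) = x <1/(x+y)>: single-game win probability averaged over uniform opponents."""
    return x * math.log((x + 1.0) / (x + x_min)) / (1.0 - x_min)


def streak_probability(n, x_min, n_grid=2000):
    """Midpoint-rule evaluation of Eq. (13)."""
    h = (1.0 - x_min) / n_grid
    total = 0.0
    for k in range(n_grid):
        w = average_win_probability(x_min + (k + 0.5) * h, x_min)
        total += (1.0 - w) ** 2 * w ** n      # f(x) * exp(n * g(x))
    return total * h / (1.0 - x_min)


def decay_factor(x_min):
    """exp(g(1)) from Eq. (14): the large-n ratio of successive streak probabilities."""
    return math.log(2.0 / (1.0 + x_min)) / (1.0 - x_min)


for x_min in (0.278, 0.435):
    ratio = streak_probability(41, x_min) / streak_probability(40, x_min)
    print(x_min, round(decay_factor(x_min), 4), round(ratio, 4))
```

The ratio of successive values approaches `decay_factor` from below as $n$ grows, and the factor runs from $\ln 2 \approx 0.693$ at $x_{\min} = 0$ down to 1/2 as $x_{\min} \to 1$, matching the two limits quoted in the text.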
We simulated the streak distribution $P_n$ using the same methodology as that for the
win/loss records; related simulations of streak statistics are given in
Refs. [19, 21]. Taking $x_{\min} = 0.435$ for 1961–2005, the same value as that used
in simulations of the win/loss records, we find a good match to the streak data for
this period. The apparent systematic discrepancy between data and theory for
$n \ge 17$ is illusory because streaks do not exist for every value of $n$. Moreover,
the number of streaks of length $n \ge 17$ is only eight, so that fluctuations are
quite important.
For the 1901–60 period, if we use $x_{\min} = 0.278$, the data for $P_n$ are in
excellent agreement with theory for $n < 17$. However, for $n$ in the range 17–22,
the data are roughly a factor of 2 greater than the values given by the analytical
solution Eq. (14) or by simulations of the BT model. Thus the tail of the streak
distribution for this early period appears to disagree with a purely statistical
model of streaks. Again, the number of events for a given $n \ge 17$ is 5 or less,
compared to a total of approximately 70000 winning and losing streaks during this
period. Hence one cannot exclude the possibility that the observed discrepancy for
$n \ge 17$ is simply due to lack of statistics.
Finally, we test for the possible role of self-reinforcement on winning and losing
streaks. To this end, we take each of the 2166 season-by-season win/loss histories
for each team and randomize them $10^5$ times. For each such realization of a
randomized history, we compute the streak distribution and superpose the results for
all randomized histories. The large amount of data gives streak distributions with
negligible fluctuations up to $n = 30$ and which extend to $n = 44$ and 41 for the
two successive periods. More strikingly, these streak distributions based on
randomized win/loss records are virtually identical to the simulated streak data as
well as to the numerical integration of Eq. (13), as shown in Fig. 8.
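The randomization test can be sketched as follows: shuffle a season's sequence of wins and losses, which destroys any memory between consecutive games while keeping the record fixed, and tabulate the lengths of the resulting runs. The season used below is a made-up 90–72 record, not data from the paper, and the function names are our own.

```python
import random
from collections import Counter


def streak_lengths(outcomes):
    """Lengths of maximal runs of identical outcomes in a sequence such as 'WWLWLLL'."""
    runs = []
    previous, count = None, 0
    for outcome in outcomes:
        if outcome == previous:
            count += 1
        else:
            if previous is not None:
                runs.append(count)
            previous, count = outcome, 1
    if previous is not None:
        runs.append(count)
    return runs


def randomized_streak_distribution(outcomes, n_shuffles, rng):
    """Fraction of streaks of each length, pooled over shuffled copies of the record."""
    counts = Counter()
    record = list(outcomes)
    for _ in range(n_shuffles):
        rng.shuffle(record)
        counts.update(streak_lengths(record))
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


# Hypothetical season: 90 wins and 72 losses in chronological order (toy input).
toy_season = "W" * 90 + "L" * 72
distribution = randomized_streak_distribution(toy_season, 1000, random.Random(0))
for length in list(distribution)[:6]:
    print(length, round(distribution[length], 4))
```

Comparing such a shuffled distribution with the streak distribution of the unshuffled records is what tests for self-reinforcement: if the two agree, the order of the games carries no extra information.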
IV. SUMMARY
To conclude, the Bradley-Terry (BT) competition model, in which the outcome of any
game depends only on the relative strengths of the two competing teams,
quantitatively accounts for the average win/loss records of Major-League baseball
teams. The distribution of team strengths that gives the best match to these win/loss
records was found to be quite close to uniform over a range $[x_{\min}, 1]$, with
$x_{\min} \approx 0.28$ for the early modern era of 1901–1960 and
$x_{\min} \approx 0.44$ for the expansion era of 1961–2005. This same BT model also
reproduces the season-to-season fluctuations of the win/loss records. An important
consequence of the BT model is the existence of a non-trivial detailed-balance
relation, which we verified with satisfying accuracy. We consider this verification
to be a quite stringent test of the theory.
The same BT model was also used to account for the
distribution of team consecutive-game winning and losing
streaks. We found excellent agreement between the pre-
diction of the BT model and the streak data for n < 17
for both the 1901-60 and 1961-2005 periods. However,
8
the tail of the streak distribution for the 1901-60 period
with n ≥ 17 is less accurately described by the BT theory.
The mechanism behind this discrepancy remains an open
question, although it could well originate from a
lack of statistics. We also provided evidence that
self-reinforcement plays little role in streaks, as
randomizations of the actual win/loss records produce streak
distributions that are indistinguishable from the streak data,
except for the n ≥ 17 tail during the 1901-60 period.
We also showed that the optimal team strength
distribution is narrower for the period 1961-2005 than for
1901-60. This narrowing shows that baseball competition
is becoming keener, so that outliers in team performance
over an entire season, as quantified by win/loss
records and lengths of winning and losing streaks, are
less likely to occur.
We close by emphasizing the parsimonious nature
of our modeling. The only assumed features are the
Bradley-Terry form Eq. (2) for the outcome of a single
game, and the uniform distribution of the winning
probabilities, controlled by the single free parameter x_min. All
other model features can then be inferred from the data.
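To make the two ingredients concrete, the following Python sketch simulates one season under the standard Bradley-Terry form, in which a team of strength x_i beats a team of strength x_j with probability x_i/(x_i + x_j), with strengths drawn uniformly from [x_min, 1]. The league size, the number of games per pairing, the independent random draw of the strengths, and all function names are assumptions chosen for illustration; they are not taken from the analysis in this paper.

```python
import random


def bt_win_probability(x_i, x_j):
    """Bradley-Terry probability that a team of strength x_i beats x_j."""
    return x_i / (x_i + x_j)


def simulate_season(n_teams=16, games_per_pair=10, x_min=0.44, seed=0):
    """Simulate a round-robin season and return win fractions by rank.

    Assumed setup (hypothetical): strengths are drawn independently and
    uniformly from [x_min, 1], and every pair of teams meets
    games_per_pair times.
    """
    rng = random.Random(seed)
    strengths = [rng.uniform(x_min, 1.0) for _ in range(n_teams)]
    wins = [0] * n_teams
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            p = bt_win_probability(strengths[i], strengths[j])
            for _ in range(games_per_pair):
                if rng.random() < p:
                    wins[i] += 1
                else:
                    wins[j] += 1
    games_per_team = games_per_pair * (n_teams - 1)
    # Sort from best to worst so that index 0 is the top-ranked team.
    return sorted((w / games_per_team for w in wins), reverse=True)
```

Averaging the returned rank-ordered win fractions over many seeds gives the kind of rank dependence that can be compared with the historical win/loss records; lowering `x_min` widens the strength distribution and spreads the win fractions further apart.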
While we have ignored many aspects of baseball that
ought to play some role (the strength of a team changing
during a season due to major trades of players and/or
injuries, home-field advantage, etc.), the agreement of
the win fraction data and the streak data with
the predictions of the Bradley-Terry model is extremely
good. It will be worthwhile to apply the approaches of
this paper to other major sports to learn about possible
universalities and idiosyncrasies in the statistical features
of game outcomes.
Acknowledgments: SR thanks Guoan Hu for data
collection assistance and Jim Albert for literature advice,
and acknowledges financial support from NSF grant DMR0535503 and
Université Paul Sabatier.
[1] See e.g., W. Weidlich Sociodynamics; A Systematic Ap-
proach to Mathematical Modelling in Social Sciences,
(Harwood Academic Publishers, 2000); M. Lassig and
A. Valleriani (eds.), Biological Evolution and Statisti-
cal Physics, (Springer, Berlin, 2002); M. Newman, A.-L.
Barabasi, and D. J. Watts, The structure and dynamics
of networks, Princeton University Press (2006).
[2] J.-P. Bouchaud and M. Potters, Theory of financial risk
and derivative pricing: from statistical physics to risk
management, Cambridge University Press (2003).
[3] J. Krug and C. Karl, Physica A 318, 137 (2003); K. Jain
and J. Krug, J. Stat. Mech. P04008 (2005).
[4] E. Bittner, A. Nussbaumer, W. Janke, and M. Weigel,
Europhys. Lett. 78, 58002 (2007); Nature 441, 793
(2006).
[5] J. Albert & J. Bennett, Curve Ball: Baseball, Statistics,
and the Role of Chance in the Game (Springer New York,
2001).
[6] J. Park and M. E. J. Newman, J. Stat. Mech. P10014
(2005).
[7] E. Ben-Naim, S. Redner, and F. Vazquez, Europhys.
Lett. 77, 30005 (2007).
[8] E. Ben-Naim and N. W. Hengartner, Phys. Rev. E 76,
026106 (2007).
[9] C. Sire, J. Stat. Mech. P08013 (2007).
[10] The data presented here were obtained from
www.shrpsports.com.
[11] However, since 1997 a small amount of interleague play
during the regular season has been introduced.
[12] E. Zermelo Mathematische Zeitschrift 29, 435 (1929).
[13] R. A. Bradley & M. E. Terry, Biometrika 39, 324 (1952).
[14] See, e.g., http://en.wikipedia.org/wiki/List_of_MLB_individual_streaks.
[15] R. C. Vergin, J. of Sport Behavior 23 (2000).
[16] E. Ben-Naim, F. Vazquez, and S. Redner, Journal of
Quantitative Analysis in Sports 2, No. 4, Article 1 (2006).
[17] S. J. Gould, Full House: The Spread of Excellence from
Plato to Darwin (Three Rivers Press, New York, 1996).
[18] In contrast, in Ref. [16], the winning probability was
taken to be independent of the relative strengths of the
two teams; the stronger team won with a fixed probability
p and the weaker won with probability 1 - p.
[19] B. James, J. Albert, & H. S. Stern, Chance 6, 17 (1993).
[20] T. Gilovich, R. Vallone, & A. Tversky, Cognitive Psy-
chology 17, 295 (1985).
[21] J. Albert, Chance 17, 37 (2004).
[22] http://answers.yahoo.com/question/index?qid=
1006053108634. This record is slightly tainted be-
cause of a tie during this streak, and ties are no longer
allowed to occur; every game that is tied at the end of
the regulation 9 innings must continue until one team
wins.
[23] http://en.wikipedia.org/wiki/List_of_worst_MLB_season_records.
[24] Moreover, three of the post-1960 losing streaks of
n ≥ 15 games occurred during the initial year of expansion
teams, which were necessarily weak because they were stocked with
the weakest players from established teams (1962 NY
Mets, 1969 Montreal Expos, 1972 Texas Rangers).
[25] C. M. Bender & S. A. Orszag, Advanced Mathematical
Methods for Scientists and Engineers (McGraw-Hill, New
York, 1978) section 6.3.
APPENDIX: TEAM WINNING AND LOSING STREAKS
TABLE I: Winning streaks of n ≥ 15 games since 1901.
n year team
26 1916 New York Giants (1 tie)
21 1935 Chicago Cubs
20 2002 Oakland Athletics
19 1906 Chicago White Sox (1 tie)
19 1947 New York Yankees
18 1904 New York Giants
18 1953 New York Yankees
17 1907 New York Giants
17 1912 Washington Senators
17 1916 New York Giants
17 1931 Philadelphia Athletics
16 1909 Pittsburgh Pirates
16 1912 New York Giants
16 1926 New York Yankees
16 1951 New York Giants
16 1977 Kansas City Royals
15 1903 Pittsburgh Pirates
15 1906 New York Highlanders
15 1913 Philadelphia Athletics
15 1924 Brooklyn Dodgers
15 1936 Chicago Cubs
15 1936 New York Giants
15 1946 Boston Red Sox
15 1960 New York Yankees
15 1991 Minnesota Twins
15 2000 Atlanta Braves
15 2001 Seattle Mariners
TABLE II: Losing streaks of n ≥ 15 games since 1901.
n year team
23 1961 Philadelphia Phillies
21 1988 Baltimore Orioles
20 1906 Boston Americans
20 1906 Philadelphia A's
20 1916 Philadelphia A's
20 1969 Montreal Expos (first year)
19 1906 Boston Beaneaters
19 1914 Cincinnati Reds
19 1975 Detroit Tigers
19 2005 Kansas City Royals
18 1920 Philadelphia A's
18 1948 Washington Senators
18 1959 Washington Senators
17 1926 Boston Red Sox
17 1962 NY Mets (first year)
17 1977 Atlanta Braves
16 1911 Boston Braves
16 1907 Boston Doves
16 1907 Boston Americans (2 ties)
16 1944 Brooklyn Dodgers (1 made-up game)
15 1909 St. Louis Browns
15 1911 Boston Rustlers
15 1927 Boston Braves
15 1927 Boston Red Sox
15 1935 Boston Braves
15 1937 Philadelphia A's
15 2002 Tampa Bay
15 1972 Texas Rangers (first year)
