Effects and Valuation of Fielding in Maj-1
A Play-by-Play Analysis
Clemson University
We attempt to estimate (1) the relative importance of fielding skills in the determination
of game outcomes in major league baseball and (2) the valuation of these skills in the labor market.
Fielding skills have received little attention relative to batting, pitching, and managerial inputs in
economic analyses of baseball. This lack of attention is due to two main factors. First, sports
economics as a discipline has had bigger fish to fry. Studies of the baseball labor market have
focused on issues such as discrimination, outcomes of the arbitration process, and the extent to
which structural constraints sever the link between player wages and productivity. Evaluation of
fielding skills may not be an important component of these questions. Second, measurements of
batting and pitching skills have been well known and widely available for decades. Good data
on fielding attributes has become available only in recent years. Although measurement and
valuation of fielding skills is the focal point of this study, we also seek to make methodological
Jay Bennett (1993) used the concept of "player game percentages" (PGPs) to address the
question "Did Shoeless Joe Jackson Throw the 1919 World Series?" By taking into account the
conditions in effect (inning, outs, runners on base, score difference) each time Jackson was at
the plate or made a play on the ball, Bennett was able to compute the impact of Jackson's
performance on game outcomes. This level of detail enables one to get around the argument that
Jackson's aggregate statistics somehow disguise his alleged throwing of the Series. Bennett was
able to show that Jackson's performance actually increased, rather than decreased, the Black Sox
using play-by-play data for an entire season. 1 Situational conditions and fielding skills are
potentially quite relevant to the outcome of the game. For example, suppose some shortstops are
Since these situations occur when runners are on base and scoring is more likely, this skill can
have a tangible effect on the outcome of games and, thus, a team's winning percentage. This,
and other skills, can only be measured and evaluated using techniques based on play-by-play
data. Similar examples for outfielders with outstanding range and accurate throwing arms which
prevent runners from advancing amplify the point: fielding skills are critical elements of a
baseball game.
1
A critique of this approach is that it leans excessively towards what Rodney Fort (1992, p. 137) calls the "Bill
James model" of the salary process. Our ultimate aim differs from that of James, whose target is typically limited to
comparison of different player abilities in different eras, ballparks, etc. Our quantification of different elements by
player game percentages should be viewed as a more careful and detailed analysis of the (baseball) production side of
the profit maximization model.
Proper evaluation of these skills in the past has been inhibited by the limited availability of data
and by computational constraints (our play-by-play file for 2000 is over 40 MB). The "best"
indicator of fielding performance that has been explored in previous research is the "fielding
average," which measures the proportion of chances a fielder faces that he executes without
making an error. While important, this measure clearly fails to capture critical skills (some
described above) that impact the outcome of a game. Relaxation of data and computational
baseball games. It is feasible and useful to address a new set of questions by pushing the
There are a host of questions that can be addressed using player game percentages and
play-by-play data. Our focus in this paper is to (1) measure the contribution of fielding to game
outcomes, and (2) assess whether the labor market's valuation of these skills is commensurate
with their contribution to team success. There are various reasons why one might find a dis-
Early papers on the baseball labor market ignore fielding for reasons stated previously
(e.g. Scully (1974), Sommers and Quinton (1982), Zimbalist (1992), MacDonald and Reynolds (1994)).
More recent papers incorporate measures of fielding inputs in analyses of issues such as
arbitration and technical efficiency, but without much discussion. On the production side, Kahn
(1993a) and Ruggiero et al. (1996) find a strong positive relation between fielding average and
winning percentage. 2 In contrast, valuation studies find a weak relationship between measures of
fielding and player salaries. Kahn (1993b) and Gius and Hylan (1996) report coefficients in log
wage regressions for the number of Gold Glove awards that are positive, but marginally
significant. Similarly, Marburger (1996) obtains coefficient estimates for fielding average that
are positive, but at best marginally significant, in his model of salary determination under
arbitration.
James and Henzler (2002) devised a system called “win shares” to estimate the
contribution of players to team victory totals. To do so, they faced the same difficulties in
measuring defensive ability with aggregate fielding statistics that thwarted previous researchers.
To resolve them, James and Henzler use several ad hoc approximations, which result in plausible
estimates of the contributions of players in historical seasons. Their description of the system’s
details, however, is not sufficiently thorough to allow replication of their results or application of
their system to new data. Also, they report individual player defensive win shares only as a
fraction of the player’s total win shares for the decade, which precludes the possibility of our
using their estimates in our salary study unless we make simplifying assumptions about the time
path and variation of a player’s defensive skills. That said, their reported results do suggest that
defense is a significant factor of production for team wins. James and Henzler (2002) attribute
13-21% of team wins to defense for the 2000 major league season.
2
They do not emphasize the fact, but the coefficient estimates are on the order of 5 to 6 times their standard errors.
The contrast between production and valuation estimates presents a puzzle. While it
could be, despite the findings cited above, that fielding is of meager importance in baseball, it is
more likely that traditional fielding measures are poor estimators of defensive ability at the level
of the individual player. Another possibility to be addressed is that fielding isn't valued in
proportion to its impact on the game because sound defense isn’t as entertaining to the casual
(marginal) fan. Fans occasionally may cheer a quick catch and release that yields a double play
in a close game, but routinely give standing ovations for home runs (at least by the home team)
even in lopsided contests. It is our hope that construction of more detailed and precise estimates
of the contributions made by playmakers in the field will enable us to address these issues.
The economic literature regarding baseball follows two major themes. The first analyzes
the production aspects of the sport, and seeks to identify how teams win games and, by
association, how individual players contribute to those wins. The second studies how production
The simplest possible reason for the relative lack of attention given to fielding in
superficial look at fielding statistics would tend to support this assessment, as there is very little
correlation between the most commonly available fielding statistics and team winning
percentage.
The most commonly available defensive statistics, collected for both individuals and for
teams, are aggregate season totals of errors, putouts, assists, outfield assists, and double plays.
The first three statistics are combined to produce fielding percentage (fpct), which is calculated
as
$$\text{fpct} = \frac{\text{assists} + \text{putouts}}{\text{chances}} = \frac{\text{assists} + \text{putouts}}{\text{assists} + \text{putouts} + \text{errors}} \qquad (1)$$
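Equation (1) is simple enough to state directly as code. The following is a minimal Python sketch; the function name `fielding_pct` and the season line in the usage example are hypothetical, not drawn from the 2000 data.

```python
def fielding_pct(putouts: int, assists: int, errors: int) -> float:
    """Fielding percentage, equation (1): (assists + putouts) / chances."""
    chances = assists + putouts + errors  # every chance ends in a PO, an A, or an E
    if chances == 0:
        raise ValueError("fielding percentage is undefined with zero chances")
    return (assists + putouts) / chances


# Hypothetical season line: 300 putouts, 100 assists, 8 errors
print(round(fielding_pct(300, 100, 8), 3))
```

Because every chance ends in exactly one putout, assist, or error, the denominator can be built from those three totals rather than stored separately.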
Using these statistics to look at team performance does not reveal much of immediate
interest. Table 1 depicts all 30 major league teams in the 2000 season, ranked by their regular
season winning percentage. For each team, the team aggregates of errors, double plays, outfield
assists and fpct are listed next to the team winning percentage. As many baseball managers and
former players are of the opinion that defensive strength of the team is more important “up the
middle”, error totals for those positions are isolated from errors committed by pitchers and
Table 2 uses simple correlation coefficients between team winning percentage (winpct)
and various team aggregates to identify the "best" correlates of winning. Runs scored and runs
against correlate very highly with winning percentage for obvious reasons. Stepping back one
pace to look at the inputs into run scoring, hits allowed are still strongly detrimental to team
performance, but hits made by a team, while covarying directly with team winning percentage,
The defensive statistics listed in Table 2 are not encouraging for the argument that
defense matters. Double plays, errors and fpct are not significantly correlated with winning, and
isolating errors by corner men and by up-the-middle fielders is not helpful. 3 The one significant
correlate with winning percentage is outfield assists, which is significant at the 5% level and has
a negative sign, indicating that teams with higher numbers of outfield assists are less successful
on the field. This result could either be dismissed as a fluke or rationalized in that losing teams
allow more baserunners, hence there are more opportunities to record outfield assists.
the defensive aggregates lose all statistical significance. It is only in Model 3, where both runs
and hits are omitted, that outfield assists are significant at even the 90% confidence level, and
then the coefficient has the negative sign witnessed above. While Kahn (1993a) and Ruggiero et
managerial efficiency, we were unable to obtain similar results with fpct for this particular
sample.
3
The correlation coefficient for winpct versus errors by “up the middle” defenders (catcher, second base, shortstop,
and center field) is -0.016, which is not statistically significant.
Table 4 presents the results of OLS regression models of player valuation. Each team’s market
size is captured through ln(population), which is the population of the metropolitan area, using
the CMSA size where available, and halving the population for markets shared by two major
league teams (New York, Chicago, Los Angeles, and San Francisco/Oakland) before taking the
natural logarithm. For lack of better data at this time, the ability-to-contract variables for
“Arbitration eligible” and “Free agent” were constructed from estimated years of service based
upon the first season in which the player appeared in 25 or more games.
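The ln(population) control described above can be sketched in a few lines. This is an illustrative Python snippet assuming a hypothetical `ln_population` helper and made-up population figures; the paper's own CMSA data are not reproduced here.

```python
import math

# The four markets the text describes as shared by two major league teams.
SHARED_MARKETS = {"New York", "Chicago", "Los Angeles", "San Francisco/Oakland"}


def ln_population(market: str, metro_population: float) -> float:
    """Market-size control: halve the metro (CMSA) population of a shared
    market, then take the natural logarithm."""
    if metro_population <= 0:
        raise ValueError("population must be positive")
    pop = metro_population / 2 if market in SHARED_MARKETS else metro_population
    return math.log(pop)


# Made-up populations, for illustration only
print(ln_population("Chicago", 9_000_000))    # same as math.log(4_500_000)
print(ln_population("Cleveland", 2_900_000))  # not shared: no halving
```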
Models 3 and 4 in Table 4 pool the statistics of all position players on major league
rosters for whom opening day salaries were available through USA Today. All of the coefficients
have the expected signs, and all of the control variables are significant at the 95%, and often the
99%, level. Model 3 focuses upon whether error information influences salaries, which it does
not, while Model 4 uses normalized fielding percentage and “zone rating”. The fielding
percentages were normalized by calculating the deviation of the player’s fielding percentage
from the opportunities-weighted average at his position, and dividing by the standard deviation
of fpcts at that position. Thus a one-unit increase in normalized fielding pct. represents a one
standard deviation improvement in that statistic. Zone rating is a relatively recent statistic which
reflects the percentage of the balls hit into a player’s area of defensive responsibility on which he
is able to make a play. Between zone rating and normalized fielding percentage, we can
presumably see how many balls a player is able to get to and how adroitly he is able to handle
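The normalization of fielding percentage described above (deviation from the opportunities-weighted positional mean, divided by the positional standard deviation) can be sketched in Python. The input format, the use of chances as the weighting variable, and the use of the unweighted population standard deviation are assumptions, since the text does not specify them.

```python
from collections import defaultdict
from statistics import pstdev


def normalized_fpct(players):
    """players: iterable of (player_id, position, fpct, chances) tuples.

    Returns {player_id: z}, where z is the player's fpct minus the
    chances-weighted mean fpct at his position, divided by the standard
    deviation of fpct across players at that position.
    """
    by_pos = defaultdict(list)
    for pid, pos, fpct, chances in players:
        by_pos[pos].append((pid, fpct, chances))

    z = {}
    for rows in by_pos.values():
        total = sum(c for _, _, c in rows)
        mean_w = sum(f * c for _, f, c in rows) / total  # opportunities-weighted mean
        sd = pstdev([f for _, f, _ in rows])  # assumption: population SD, unweighted
        for pid, f, _ in rows:
            # Assumption: a position with no spread in fpct gives everyone z = 0.
            z[pid] = (f - mean_w) / sd if sd > 0 else 0.0
    return z


# Hypothetical players: (id, position, fpct, chances)
print(normalized_fpct([("a", "SS", 0.98, 100), ("b", "SS", 0.99, 300)]))
```

A value of 1.0 then reads as one positional standard deviation above the weighted average, which is the interpretation the salary regressions rely on.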
The results in Model 4, however, would suggest that while sure-handedness is valued by
the player salary market, range is not. This result might appear to be at odds with the
insignificance of a player’s error totals in Model 3, but the difference in results might be due to
information on opportunities not included in plate appearances. 4 When the model is limited to
full-time players, the relevance of defensive statistics disappears. 5 In contrast, the measure of
batting performance, on-base percentage plus slugging percentage (OPS), becomes much
more important, both in magnitude and in statistical significance of the estimated coefficient.
While the evidence from aggregate defensive statistics as to the importance of fielding is
somewhat equivocal, it is possible that the failure to find telling results is due to information lost
in the aggregation. To see whether fielding matters at a more disaggregated level, we have
looked at individual game results to find the correlation of errors in a particular game with the
Table 5 reports the distribution of error differentials for games in the 2000 season. In fewer
than 20% of the contests (474 of 2,429) did one team commit two or more errors more than its opponent.
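The share quoted above follows directly from the counts in Table 5. A short Python sketch:

```python
# Table 5: number of games by absolute error difference between opponents
ERROR_DIFF_COUNTS = {0: 906, 1: 1049, 2: 347, 3: 97, 4: 23, 5: 7}

total_games = sum(ERROR_DIFF_COUNTS.values())
gap_two_plus = sum(n for diff, n in ERROR_DIFF_COUNTS.items() if diff >= 2)
share = gap_two_plus / total_games
print(total_games, gap_two_plus, f"{share:.1%}")  # 2429 474 19.5%
```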
Table 6 reports the conditional probability of winning a game when a team’s opponent has
committed more errors. With even a one-error advantage, a team’s chances of winning
increase from even to about five-to-three (a win probability of roughly .62). As the error
differential increases, the probability of winning increases significantly. Figure 1 shows this result
graphically: the density of wins is higher for teams with negative differences between own errors and opponent errors.
4
For instance, some teams have more fly ball or ground ball pitchers than others. Also, as pitching staffs differ in
strikeout rates and in hits allowed, there may be more balls put in play in an average game against certain teams,
which would alter error totals for their fielders.
5
Full-time is arbitrarily defined here as being among the top 30 players at a given position in balls hit into one’s
zone.
This result carries over to Table 7, which shows that the winning teams average fewer errors than
Table 7 also counters concerns that the error differentials are due to home-field
advantage. Errors per game are statistically equivalent at home and on the road, and for teams
which normally play on grass versus those which are at home on artificial surfaces. More errors
per game occur during games on natural grass, but the extra errors are made by both turf teams
That errors in a game influence a game’s outcome does not logically lead to the
conclusion that aggregate errors should influence team winning percentage. We would also need
to know whether errors occur randomly or if error rates are conditional upon the context of the
game.
The play-by-play data indicate that errors, at least in part, are situational in nature. The
incidence of errors that occur on batted balls is broken down in Table 8. With no outs, the error
rate is 1.4% with the bases empty, but rises to 2.1% with runners on base. A similar increase
takes place with one out (from 1.3 to 1.8%), but the effect vanishes with two outs. In the latter
case, the error rate with men on base (1.4%) is in line with the bases-empty rates at every out count. The multiple options and
decisions that must be made in a split second are more numerous when runners are on base.
These situations may induce momentary indecision which would account for the higher error
rate. With two outs, the runners on base are less relevant to the strategy employed by the fielder,
and often allow for an easier play to be made to end the inning (any force out will do the job).
Table 9 suggests that the score of the game may also influence the likelihood of the error.
Error rates are higher for games which are going poorly and lower for games in which a team is
cruising along. This may be due either to the absence of difficult base runner situations when
one’s team is far ahead, or to the greater likelihood of better defensive teams obtaining the lead.
If errors were purely random acts of God, then the variation in fielding ability from
season to season would simply be noise. There would be no such thing as error-prone fielders or
those with soft hands. Fielding percentage would then be relevant for game outcomes only in the
year in which it is measured, but would be irrelevant for salary determination. Alternatively,
some fielders may be consistently good and others consistently bad. If this is the case, then year
to year measures of fielding ability across players will be positively correlated. We examine this
by estimating correlation coefficients corr(F_i(t-1), F_it), where F_it is a measure of fielding ability for
player i in year t. We estimate these coefficients for the three common indicators of ability:
fielding percentage, range factor, and zone rating. Since these measures are systematically
different for infielders, outfielders, catchers and first basemen, we estimate the coefficients
Table 10 shows that the correlation across seasons is positive for all indicators,
particularly for infielders and outfielders. This is further evidence that there is a systematic
component to both good and bad fielding attributes which should impact both game outcomes
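The year-to-year correlation corr(F_i(t-1), F_it) can be computed by matching players across seasons. A minimal Python sketch, assuming each season's measure (fielding percentage, range factor, or zone rating) is stored in a dict keyed by a hypothetical player identifier and computed separately for each position group; players observed in only one season are dropped.

```python
from math import sqrt


def lag_correlation(prev_season, this_season):
    """Pearson corr(F_i(t-1), F_it) over players observed in both seasons.

    prev_season, this_season: dicts mapping player_id -> fielding measure
    for one position group.
    """
    common = sorted(prev_season.keys() & this_season.keys())
    if len(common) < 2:
        raise ValueError("need at least two players observed in both seasons")
    x = [prev_season[p] for p in common]
    y = [this_season[p] for p in common]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical zone ratings; "retired" and "rookie" appear in only one season
prev = {"a": 0.80, "b": 0.85, "c": 0.90, "retired": 0.70}
curr = {"a": 0.81, "b": 0.86, "c": 0.91, "rookie": 0.50}
print(lag_correlation(prev, curr))
```

A positive value indicates the persistent, systematic component of fielding ability discussed above; a value near zero would be consistent with errors as purely random events.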
In order to measure the influence of a play upon the outcome of the game, we adapt the
“player game percentage” (PGP) method developed in Bennett and Fleuck (1984) and Bennett
(1993). By determining the probability of a team winning a game from a given situation, and
measuring that probability both before and after a play, the effect of the play upon the game can
shown in greater detail below, to know the distribution of runs scored in a given inning. We
assume that this distribution is stationary across innings. Using the play-by-play data, the
empirical run scoring distribution is as shown in Table 11. The distributions conditional upon
base runner codes and outs demonstrate the extent to which allowing base runner advancement
harms a defensive team’s chances, and the changes in distribution as outs increase help inform
the tradeoffs involved in sacrifices and decisions on how to play a particular batted ball (e.g. to
attempt to throw out the lead runner, or prevent the trailing runner from advancing). 6
6
Baserunner positions are coded as follows: 1 = runner on 1st; 2 = runner on 2nd; 3 = runner on 3rd; 4 = runners on 1st
and 2nd; 5 = runners on 1st and 3rd; 6 = runners on 2nd and 3rd; 7 = bases loaded.
Using the probabilities in Table 11, and the assumption of stationarity, we can use
backward induction to calculate the probability of winning in each inning, conditional on the run
difference, basecode and number of outs. Let P_H(h,I,b,o,d) represent the probability that the
home team, H, wins a game situated in the h half of inning I, with runners indicated by basecode
b, o outs, and facing a run difference of d runs. If the home team is trailing at the start of the
where P_S(R|b,o) is the stationary probability function for scoring R runs during the inning
conditional on basecode and outs, with a score difference (runs less opponent's runs) of d runs at
the start of the ninth. At the start of the ninth, the conditional probabilities are taken from the
first row of Table 11. As outs are recorded and/or runners advance, the probabilities move to the
row which matches the basecode and out situation of the game. The relevant columns for the
probability sums also change when d changes as the home team scores.
Once we know the probability that the home team overcomes a deficit in the bottom of
the ninth, we can take any run difference facing the visiting team in the top of the ninth, the
probability it scores and hence changes the run difference, and the probability that the home team
overcomes this new difference (if necessary), and thereby compute the probability that the
visiting team is victorious given any situation it faces in the top of the ninth. For example,
$$P_V(0,9,0,0,d) = \sum_{R=0}^{9} P_S(R \mid 0,0)\,\bigl[1 - P_H(1,9,0,0,-(d+R))\bigr]$$
where P_V(h,I,b,o,d) is the visitor's probability of winning the game. In recursive fashion, the
probabilities can be computed in this manner all the way to the top half of the first inning.
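The backward induction described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation. It uses only the start-of-inning distribution (first row of Table 11, basecode 0, no outs), so within-inning transitions over basecode and outs are collapsed. It treats the p(5+) bucket as exactly 5 runs and assigns a tied game after nine innings a win probability of 0.5 (`EXTRA_INNINGS_P`, a hypothetical rule). The names `p_home` and `p_visitor` are illustrative. At the top of the ninth, `p_visitor` evaluates the sum for P_V(0,9,0,0,d) given in the text.

```python
from functools import lru_cache

# First row of Table 11 (basecode 0, no outs): P_S(R | 0, 0) for R = 0..4 and 5+.
# Assumption: the 5+ bucket is treated as exactly 5 runs.
START_ROW = [0.695, 0.159, 0.078, 0.038, 0.017, 0.014]
P_S = [p / sum(START_ROW) for p in START_ROW]  # renormalize published rounding

EXTRA_INNINGS_P = 0.5  # hypothetical rule: a tie after nine innings is a coin flip
LAST_INNING = 9


@lru_cache(maxsize=None)
def p_home(half: int, inning: int, d: int) -> float:
    """Home team's win probability at the start of a half-inning.

    half: 0 = top (visitor bats), 1 = bottom (home bats).
    d:    home runs minus visitor runs at that moment.
    """
    if half == 1 and inning == LAST_INNING and d > 0:
        return 1.0  # home leads after the top of the ninth: game over
    total = 0.0
    for runs, p in enumerate(P_S):
        if half == 0:
            # Visitor bats: its runs reduce the home team's margin.
            total += p * p_home(1, inning, d - runs)
        elif inning < LAST_INNING:
            # Home bats in innings 1-8, then the next inning starts.
            total += p * p_home(0, inning + 1, d + runs)
        else:
            # Bottom of the ninth: win, extra innings, or loss.
            final = d + runs
            if final > 0:
                total += p
            elif final == 0:
                total += p * EXTRA_INNINGS_P
    return total


def p_visitor(inning: int, d: int) -> float:
    """P_V(0, I, 0, 0, d): visitor's win probability at the top of inning I
    with a visitor lead of d runs."""
    return 1.0 - p_home(0, inning, -d)


for lead in (0, 1, 3):
    print(lead, [round(p_visitor(i, lead), 3) for i in (3, 6, 9)])
```

With the score tied, the sketch returns .500 at the top of any inning, because both teams draw runs from the same distribution. At the start of the bottom of the ninth it gives the home team 1 − 0.5·P_S(0|0,0) ≈ .653. Other situations need not match Table 12, since the paper tracks basecode and outs within each inning and allows up to 9 runs per inning.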
illustrate the relative impact of particular plays at various stages of the game. Section A of Table
12 indicates the increasing value of plays as the game wears on. The value of a proficient
leadoff hitter (such as Rickey Henderson, the record holder for career leadoff home runs) is
apparent in the second through fifth rows of the table. Nevertheless, a single or one-base error in
the ninth inning of a tight game is worth more than twice as much as a single in the first inning
(.079 versus .034 in win probability). More pronounced is the ninth-inning home run, which is
worth more than three times as much as its first-inning counterpart (.321 versus .090). 7
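Because a play's PGP is simply the change in win probability it produces, the comparisons above follow from Section A of Table 12. A short Python sketch using those published values (the visitor's win probability before the play is .500 in both innings). The helper name `pgp` is illustrative.

```python
# Visitor win probabilities from Section A of Table 12 (tie game, bases empty,
# no outs). "single" also covers a one-base error.
P_BEFORE = 0.500
P_AFTER = {
    1: {"single": 0.534, "double": 0.552, "triple": 0.584, "homer": 0.590},
    9: {"single": 0.579, "double": 0.660, "triple": 0.770, "homer": 0.821},
}


def pgp(p_before: float, p_after: float) -> float:
    """Player game percentage of a play: the change in win probability."""
    return p_after - p_before


for play in ("single", "double", "triple", "homer"):
    early = pgp(P_BEFORE, P_AFTER[1][play])
    late = pgp(P_BEFORE, P_AFTER[9][play])
    print(f"{play:7s} 1st: {early:+.3f}  9th: {late:+.3f}  ratio: {late / early:.1f}")
```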
Section B of Table 12 illustrates how the value of small (one-run) and larger (three-run)
leads increases as the game moves towards its conclusion. Section C measures the probability of
winning at the start of the bottom half of innings 1 and 9 in a tie game. The differences of 5 and
15 percentage points over the P_V(0,I,0,0,0) value of .500 at the top of the inning indicate the
increasing value to the home team of recording a scoreless top half of the inning.
The extent to which a particular play influences the probability of winning that game is
illustrated in Table 13. As would be expected, singles are “worth” more than walks due to the
possibilities for runner advancement, and extra base hits are worth more than singles. The
negative effect of grounding into a double play (GIDP) on the offensive team’s probability of
winning is more than three times the magnitude of a routine ground out, and errors are only very
slightly less costly to a team than allowing a hit for an identical number of bases.
7
From a management perspective, Tommy Lasorda's use of Kirk Gibson in the 1988 World Series comes to mind.
There are two shortcomings of these probabilities, but we do not believe they
compromise our approach. First, endgame and batting order effects are likely to make scoring
probabilities non-stationary. Second, we have ignored the identity of the home team and the
pitchers, which will also affect the numbers in the table. But because these effects should shift all
probabilities alike, differencing the probabilities before and after a play removes any bias of
constant magnitude, leaving an unbiased estimate of the play's effect.
The PGP numbers that we have constructed so far are incomplete, but they contain many
of the factors by which defense can influence a game’s outcome. The numbers presented in
Table 7 directly measure key plays (i.e., the PGP effects of errors, turning double plays, and
recording outfield assists), as well as estimates of outfield and infield range calculated from the
average PGP result of all balls hit into a player’s area of responsibility. While several factors
have yet to be included in our measure, such as ballpark effects and the deterrence effect of a
strong outfield throwing arm upon baserunners, the estimates represent a first pass at a
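The aggregation into "defensive credit" can be sketched as follows. This Python snippet is hypothetical. It assumes each ball in play has been tagged with the fielder responsible and with the fielding team's win probability before and after the play. It gives that fielder the whole change, ignoring the split with the pitcher that Section 6 identifies as still to be done. Because PGP units are changes in win probability, the season total reads as expected wins added, and the per-chance average corresponds to the range measure described above.

```python
from collections import defaultdict


def defensive_credit(plays):
    """plays: iterable of (fielder_id, p_before, p_after) tuples, where the
    probabilities are the fielding team's chance of winning.

    Returns {fielder_id: (total PGP, average PGP per ball in his zone)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fielder, p_before, p_after in plays:
        totals[fielder] += p_after - p_before  # full credit/blame to the fielder
        counts[fielder] += 1
    return {f: (totals[f], totals[f] / counts[f]) for f in totals}


# Hypothetical plays: a converted grounder, an error, and an outfield catch
print(defensive_credit([("ss1", 0.40, 0.45), ("ss1", 0.60, 0.52), ("cf1", 0.30, 0.33)]))
```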
The top three and bottom three players at each position, using our method, are listed in
Table 14. In many cases, these rankings are compatible with the conventional wisdom. Alex
Rodriguez, in addition to his hitting prowess, also has the highest PGP value for his fielding,
which is worth three more expected wins than the worst shortstops in the
majors. Noted outfielders Darin Erstad, Barry Bonds, Torii Hunter, and Jim Edmonds also are
among the leaders at their positions, while known defensive liabilities Mo Vaughn and Al Martin
Using these PGP numbers under the label “defensive credit”, Table 15 re-estimates the
salary model first shown in Table 4. As the statistic currently stands, defensive PGP is not
offensive PGPs calculated in the same manner for each batter’s plate appearances are a viable
6. Future Research
The main benefit of PGPs over traditional statistics, however, will come when we
allocate the share of a play’s probability change accurately between the batter, the pitcher, the
baserunners, and the fielders who handle the ball. Then the units of PGP can be used to estimate
the dollar value of a win. As the current “batter credit” variable (incorrectly) gives the full credit
of a hit to the batter, without assigning any blame to the pitcher, it overstates a player’s games
won through his batting. As such, the salary effect of a game won through an everyday player’s
batting prowess is likely to be higher than the 9.6% point estimate in Model 1. Once credit is
properly attributed, however, through incorporation of the missing elements in Tables 16 and 17,
it should be possible to determine whether games won with a bat are valued in salaries equally
References
Bennett, Jay. "Did Shoeless Joe Jackson Throw the 1919 World Series?" The American
Statistician, November 1993, 47, 241-250.
Bennett, Jay, and Fleuck, J. A. "Player Game Percentage," In Proceedings of the Social Statistics
Section, American Statistical Association, 1984, 378-380.
Burgess, Paul L., Marburger, Daniel R., and Scoggins, John F. "Do Baseball Arbitrators Simply
Flip a Coin?" In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.
Fizel, John P. "Is There Bias in Major League Baseball Arbitration?" In Baseball Economics, J.
Fizel et al eds., Praeger, 1996: Westport, CT.
Fort, Rodney. "Pay and Performance: Is the Field of Dreams Barren?" In Diamonds are
Forever, P. Sommers ed., Brookings, 1992: Washington D.C.
Gius, Mark Paul and Hylan, Timothy R. "An Interperiod Analysis of the Salary Impact of
Structural Changes in Major League Baseball: Evidence from Panel Data," In Baseball
Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.
James, Bill and Henzler, Jim. Win Shares. Stats Inc. Publishing, 2002: Morton Grove, IL.
Kahn, Lawrence M. "Management Quality, Team Success, and Individual Player Performance in
Major League Baseball," Industrial and Labor Relations Review, 1993a, 46, 531-547.
Kahn, Lawrence M. "Free Agency, Long-Term Contracts and Compensation in Major League
Baseball: Estimates from Panel Data," The Review of Economics and Statistics, February 1993b,
75, 157-164.
Marburger, Daniel R. "A Comparison of Salary Determination in the Free Agent and Salary
Arbitration Markets," In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.
MacDonald, Don N. and Reynolds, Morgan O. "Are Baseball Players Paid Their Marginal
Products?" Managerial and Decision Economics, September/October 1994, 15, 443-457.
Ruggiero, John, Hadley, Lawrence, and Gustafson, Elizabeth. "Technical Efficiency in Major
League Baseball," In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.
Scully, Gerald P. "Pay and Performance in Major League Baseball," American Economic
Review, December 1974, 64: 915-30.
Sommers, Paul M. and Quinton, Noel. "Pay and Performance in Major League Baseball: The
Case of the First Family of Free Agents," Journal of Human Resources, Summer 1982, 17: 426-
36.
Zimbalist, Andrew. "Salaries and Performance: Beyond the Scully Model," In Diamonds are
Forever, P. Sommers ed., Brookings, 1992: Washington D.C.
Table 1: Team success and aggregate defensive statistics,
2000 season
Team  Win pct.  Errors  Double plays  OF assists  Errors by C,2B,SS,CF  Errors by P,1B,3B,LF,RF  Fielding pct.
SF_ 0.599 93 173 31 48 45 .985
CWS 0.586 133 190 33 73 60 .978
ATL 0.586 129 137 25 64 65 .979
STL 0.586 111 148 23 56 55 .981
NYM 0.580 118 122 28 50 68 .980
OAK 0.565 134 165 20 73 61 .978
SEA 0.562 99 176 20 41 58 .984
CLE 0.556 72 147 26 35 37 .988
NYY 0.540 109 133 18 56 53 .981
LA_ 0.531 135 151 25 65 70 .978
ARI 0.525 107 138 29 51 56 .982
CIN 0.525 111 157 38 51 60 .982
BOS 0.525 109 120 33 51 58 .982
TOR 0.512 100 177 27 47 53 .984
ANA 0.506 134 182 34 63 71 .978
COL 0.506 94 176 35 51 43 .985
FLA 0.491 125 144 37 67 58 .980
DET 0.488 105 172 33 39 66 .983
KC_ 0.475 102 186 32 40 62 .983
SD_ 0.469 141 156 32 73 68 .977
BAL 0.457 116 152 26 54 62 .981
MIL 0.451 118 186 37 57 61 .981
HOU 0.444 133 150 20 63 70 .978
TEX 0.438 135 162 31 57 78 .978
TB_ 0.429 118 171 37 67 51 .981
MIN 0.426 102 155 41 58 44 .983
PIT 0.426 132 170 33 61 71 .979
MON 0.414 132 151 47 52 80 .978
PHI 0.401 100 135 31 55 45 .983
CHC 0.401 100 139 19 43 57 .983
Table 2:
Correlation of team winning percentage with aggregate statistics
Variable Corr
Runs Scored 0.608
(0.00)
Runs allowed -0.677
(0.00)
Hits made 0.261
(0.16)
Hits allowed -0.547
(0.00)
Errors Made -0.129
(0.50)
Double plays -0.029
(0.88)
Outfield assists -0.367
(0.05)
Errors by cornermen -0.190
(0.31)
Fielding Pct. 0.131
(0.49)
p-values in parentheses
Table 3:
Regressions of team winning percentage on aggregate statistics
                    Dependent = Win pct.
Variable            Model 1      Model 2      Model 3
Runs / g            0.085
                    (0.011)
Runs allowed / g    -0.104
                    (0.010)
Hits / g                         0.084
                                 (0.018)
Hits allowed / g                 -0.118
                                 (0.018)
Errors made         0.000205     0.000294     -0.000343
                    (0.000250)   (0.000430)   (0.000674)
Double plays        0.0000069    0.000590     0.000225
                    (0.0002761)  (0.000410)   (0.000601)
Outfield assists    0.000105     -0.001455    -0.003298
                    (0.000733)   (0.001035)   (0.001657)
Constant term       0.567        0.732        0.603
                    (0.079)      (0.170)      (0.119)
R-squared (adj.)    0.8755       0.6535       0.0487
n=30; Standard errors in parentheses;
Bold indicates significant at 0.05 level;
Italics indicates significant at 0.10 level
Table 4: OLS Regressions of Everyday Player Salary as a
Function of Individual and Team Characteristics
Dependent = ln(salary in 2000)
                          Everyday players        All (opening day) position players
Variable                  Model 1     Model 2     Model 3     Model 4
ln(Population)            0.217***    0.212**     0.201***    0.198***
                          (0.089)     (0.090)     (0.076)     (0.075)
Team WPct.                0.519       0.510       1.251**     1.268**
                          (0.712)     (0.714)     (0.610)     (0.607)
Arbitration eligible      1.560***    1.563***    1.115***    1.112***
                          (0.124)     (0.125)     (0.100)     (0.100)
Free agent                2.477***    2.491***    1.891***    1.888***
                          (0.107)     (0.108)     (0.088)     (0.088)
OBP + SLG.                1.667***    1.666***    0.649**     0.862***
                          (0.401)     (0.404)     (0.318)     (0.323)
Plate appearances         0.0014      0.0015***   0.0027***   0.0024***
                          (0.0004)    (0.0004)    (0.0002)    (0.0002)
Errors                    -0.005                  -0.009
                          (0.007)                 (0.007)
Normalized fielding pct.              -0.008                  0.067**
                                      (0.050)                 (0.027)
Zone rating                           0.444                   0.095
                                      (0.675)                 (0.355)
Constant                  7.071***    6.725***    7.489***    7.349***
                          (1.345)     (1.452)     (1.154)     (1.186)
Obs                       225         225         431         431
R-squared (adj.)          0.760       0.758       0.665       0.668
Table 5. Distribution of error difference between opponents
Error difference Frequency Percent
0 906 37.30
1 1049 43.19
2 347 14.28
3 97 4.00
4 23 0.94
5 7 0.29
Total 2429 100.00
Figure 1: Distribution of “error gap”, own errors minus
opponent’s errors, for winning teams
[Figure: vertical axis Fraction (0 to .373); horizontal axis errorgap (-5 to 5)]
Table 7: Average errors per team per game, situational
Situation              Average errors made per team per game    T-stat for significant difference
                       (standard error)                         (assume equal variances)
Winning team 0.570 11.15***
(0.016)
Losing team 0.849
(0.019)
At home 0.710 0.05
(0.018)
On the road 0.709
(0.018)
On (natural) grass 0.722 1.75*
(0.015)
On (artificial) turf 0.669
(0.025)
“Grass” team 0.712 0.37
(0.014)
“Turf” team 0.701
(0.026)
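As a rough check on the t-statistics in Table 7, the difference in means can be divided by the combined standard error. This Python sketch uses an unpooled approximation, sqrt(se_a² + se_b²), rather than the equal-variance test the table reports (which would need the underlying game counts). Its values therefore differ slightly from the published ones, both for that reason and because the published means and standard errors are rounded.

```python
from math import sqrt

# ((mean, standard error), (mean, standard error), reported t) from Table 7
TABLE7 = {
    "winning vs losing": ((0.570, 0.016), (0.849, 0.019), 11.15),
    "home vs road": ((0.710, 0.018), (0.709, 0.018), 0.05),
    "grass vs turf": ((0.722, 0.015), (0.669, 0.025), 1.75),
    "grass team vs turf team": ((0.712, 0.014), (0.701, 0.026), 0.37),
}


def approx_t(a, b):
    """|mean_a - mean_b| divided by the unpooled standard error of the difference."""
    (m_a, se_a), (m_b, se_b) = a, b
    return abs(m_a - m_b) / sqrt(se_a ** 2 + se_b ** 2)


for label, (a, b, reported) in TABLE7.items():
    print(f"{label:25s} approx t = {approx_t(a, b):5.2f}   reported = {reported}")
```

Each approximation lands within about 0.1 of the reported statistic, so the qualitative conclusions (a large winner/loser gap, a marginal grass/turf gap, no home/road gap) are unaffected by the choice of formula.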
Table 8. Distribution of Error Rates, by Outs and Men-on-base
Situation No runners Men on
NO OUTS
Batted balls 34,205 15,694
Error rate 1.4% 2.1%
ONE OUT
Batted balls 23,355 22,843
Error rate 1.3% 1.8%
TWO OUTS
Batted balls 18,114 24,897
Error rate 1.6% 1.4%
Table 11: Probability of Runs Scored in an Inning,
by Basecode and Outs
Basecode Outs p(0) p(1) p(2) p(3) p(4) p(5+) E(runs) Obs
0 0 0.695 0.159 0.078 0.038 0.017 0.014 0.577 45495
0 1 0.818 0.105 0.046 0.019 0.007 0.006 0.313 31968
0 2 0.918 0.054 0.019 0.006 0.002 0.001 0.124 25392
1 0 0.558 0.168 0.138 0.071 0.035 0.031 0.972 10804
1 1 0.709 0.117 0.097 0.044 0.018 0.016 0.600 12227
1 2 0.858 0.058 0.056 0.019 0.005 0.004 0.267 11946
2 0 0.368 0.352 0.139 0.074 0.040 0.027 1.170 3470
2 1 0.585 0.236 0.098 0.050 0.017 0.014 0.727 5867
2 2 0.780 0.147 0.047 0.017 0.005 0.003 0.329 7448
3 0 0.136 0.508 0.189 0.105 0.025 0.038 1.513 524
3 1 0.329 0.485 0.111 0.044 0.022 0.010 0.980 2010
3 2 0.731 0.186 0.054 0.018 0.006 0.004 0.398 3054
4 0 0.349 0.218 0.164 0.130 0.071 0.068 1.616 2886
4 1 0.569 0.160 0.104 0.094 0.039 0.034 0.998 5123
4 2 0.760 0.108 0.060 0.048 0.016 0.008 0.479 6435
5 0 0.116 0.435 0.165 0.136 0.082 0.067 1.889 1068
5 1 0.340 0.380 0.118 0.091 0.044 0.027 1.214 2277
5 2 0.729 0.146 0.054 0.051 0.014 0.006 0.496 2886
6 0 0.154 0.246 0.308 0.138 0.072 0.082 2.034 668
6 1 0.291 0.296 0.212 0.101 0.054 0.045 1.490 1695
6 2 0.727 0.048 0.146 0.047 0.017 0.016 0.628 1911
7 0 0.114 0.257 0.208 0.120 0.153 0.148 2.509 802
7 1 0.315 0.260 0.141 0.112 0.094 0.078 1.691 1949
7 2 0.672 0.091 0.107 0.055 0.049 0.025 0.807 2356
Note: Calculations extend to four decimal places, allow for scoring of up to 9 runs in an inning, and track p(win) for run differentials of up to ±9 runs.
Basecodes: 1 = runner on 1st; 2 = runner on 2nd; 3 = runner on 3rd; 4 =
runners on 1st and 2nd; 5 = runners on 1st and 3rd; 6 = runners on 2nd
and 3rd; 7 = bases loaded.
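The E(runs) column is the mean of each row's runs distribution. Because the printed table collapses all innings with five or more runs into p(5+), while the underlying calculation goes up to 9 runs, E(runs) cannot be recomputed exactly from the table. Treating every "5+" inning as exactly 5 runs does, however, give a lower bound. The sketch below uses a few rows; `ROWS` and `expected_runs_lower_bound` are hypothetical names.

```python
# Selected rows of Table 11:
# (basecode, outs) -> (p0, p1, p2, p3, p4, p5plus, reported E(runs))
ROWS = {
    (0, 0): (0.695, 0.159, 0.078, 0.038, 0.017, 0.014, 0.577),
    (0, 2): (0.918, 0.054, 0.019, 0.006, 0.002, 0.001, 0.124),
    (1, 0): (0.558, 0.168, 0.138, 0.071, 0.035, 0.031, 0.972),
    (7, 0): (0.114, 0.257, 0.208, 0.120, 0.153, 0.148, 2.509),
}

def expected_runs_lower_bound(probs):
    """Sum of k * p(k) for k = 0..4, plus 5 * p(5+): every '5+' inning counted as 5 runs."""
    *p_exact, p_5plus = probs
    return sum(k * p for k, p in enumerate(p_exact)) + 5 * p_5plus

for (basecode, outs), row in ROWS.items():
    probs, reported = row[:6], row[6]
    bound = expected_runs_lower_bound(probs)
    print(f"basecode {basecode}, {outs} out: lower bound {bound:.3f}, reported {reported:.3f}")
```

For bases empty and nobody out, for example, the bound is 0.567 against a reported 0.577; the gap is the extra runs scored in the innings that produced more than five.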
Table 12: Probability of winning, situational
Inning Home/Visitor Run difference Basecode before Outs before Probability of winning Play
A. Early vs Late Plays
Value of Early Plays
1 Visitor 0 0 0 0.500
1 Visitor 0 1 0 0.534 Single/1B Error
1 Visitor 0 2 0 0.552 Double
1 Visitor 0 3 0 0.584 Triple
1 Visitor 1 0 0 0.590 Homer
Value of Late Plays
9 Visitor 0 0 0 0.500
9 Visitor 0 1 0 0.579 Single/1B Error
9 Visitor 0 2 0 0.660 Double
9 Visitor 0 3 0 0.770 Triple
9 Visitor 1 0 0 0.821 Homer
B. Changes in the Value of Leads
Increasing Value of a Small Lead
3 Visitor 1 0 0 0.604
6 Visitor 1 0 0 0.648
9 Visitor 1 0 0 0.821
Increasing Value of a Large Lead
3 Visitor 3 0 0 0.779
6 Visitor 3 0 0 0.848
9 Visitor 3 0 0 0.960
C. The Bottom Half vs. the Top Half
1 Home 0 0 0 0.551
9 Home 0 0 0 0.653
Note: The probability value applies to the team given by the team
indicator.
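Panel A of Table 12 illustrates the idea behind player game percentages: the value of a play is the change in the team's probability of winning from the state before the play to the state after it. The sketch below applies that difference to the panel A values. How PGP apportions this change among batter, pitcher and fielders (for example, the "33%" defensive credit in Tables 14 and 15) is not modeled here, and `P_AFTER` and `play_value` are hypothetical names.

```python
# Visiting team's P(win) from Table 12, panel A. Before each play the game is
# tied with the bases empty and nobody out, so P(win) = 0.500 in both innings.
P_BEFORE = 0.500

# Play -> (P(win) after the play in inning 1, P(win) after the play in inning 9)
P_AFTER = {
    "Single/1B error": (0.534, 0.579),
    "Double": (0.552, 0.660),
    "Triple": (0.584, 0.770),
    "Homer": (0.590, 0.821),
}

def play_value(p_before, p_after):
    """Change in the team's win probability produced by one play."""
    return p_after - p_before

for play, (early, late) in P_AFTER.items():
    v1 = play_value(P_BEFORE, early)
    v9 = play_value(P_BEFORE, late)
    print(f"{play:16s} inning 1: {v1:+.3f}  inning 9: {v9:+.3f}  ratio: {v9 / v1:.1f}")
```

The same single (or first-base error) is worth about 0.034 in the first inning but 0.079 in the ninth, and a home run about 0.090 versus 0.321, which is why a play-by-play measure can weight identical events very differently.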
Table 13: Average Effect of Selected Events Upon Probability of Winning
Event Frequency Mean change in P(win) Std. error
Walk 17,028 0.0281 0.0002
Hit by pitch 1,572 0.0284 0.0006
Single 29,686 0.0418 0.0003
Double 8,902 0.0646 0.0007
Triple 952 0.0948 0.0026
Home run 5,693 0.1217 0.0013
Strikeout 31,254 -0.0276 0.0001
Ground out 35,191 -0.0220 0.0001
Fly out 25,279 -0.0248 0.0001
Ground into double play 3,833 -0.0753 0.0010
1B error 1,549 0.0375 0.0010
2B error 247 0.0595 0.0046
3B error 25 0.0920 0.0222
Stolen base 2,531 0.0137 0.0003
Caught stealing 1,308 -0.0428 0.0008
Table 14: Best/worst everyday players at each position, using PGP, 2000 season
Rank First name Last name Credit (33%) Errors DPs
Pitcher
1 ORLANDO HERNANDEZ 0.849 0 4
2 KIRK RUETER 0.715 0 4
3 KENNY ROGERS 0.615 2 5
118 CHUCK FINLEY -0.329 4 0
119 JOHN SNYDER -0.378 2 0
120 ALBIE LOPEZ -0.383 3 1
Catcher
1 EINAR DIAZ 0.683 4 4
2 JOE GIRARDI 0.477 5 5
3 BOBBY ESTALELLA 0.453 5 11
28 MARK L. JOHNSON -0.005 4 4
29 TODD HUNDLEY -0.013 13 3
30 JASON VARITEK -0.179 7 1
1B
1 TODD ZEILE 1.095 10 4
2 MARK GRACE 0.921 4 8
3 MARK MCGWIRE 0.887 1 2
28 J.T. SNOW -0.550 6 9
29 MO VAUGHN -0.889 14 12
30 KEVIN YOUNG -1.020 17 11
2B
1 LUIS CASTILLO 2.952 11 76
2 MICKEY MORANDINI 2.664 6 68
3 POKEY REESE 2.579 14 79
28 CHUCK KNOBLAUCH 0.037 15 39
29 JAY CANIZARO -0.115 6 39
30 JOSE OFFERMAN -0.178 7 42
3B
1 VINNY CASTILLA 1.055 8 20
2 WILLIE GREENE 0.894 7 17
3 AARON BOONE 0.779 8 18
28 ARAMIS RAMIREZ -0.878 14 6
29 ERIC CHAVEZ -1.258 18 15
30 DEAN PALMER -1.696 23 10
Table 14: Best/worst everyday players at each position, using PGP, 2000 season (cont.)
Rank First name Last name Credit (33%) Errors DPs OAs
SS
1 ALEX RODRIGUEZ 3.393 10 112
2 NEIFI PEREZ 3.276 18 107
3 REY SANCHEZ 2.681 4 100
28 RAFAEL FURCAL 0.446 23 53
29 ALEX GONZALEZ 0.389 19 57
30 MELVIN MORA 0.206 19 45
LF
1 DARIN ERSTAD 3.140 3 1 8
2 GEOFF JENKINS 2.819 7 3 12
3 BARRY BONDS 1.966 3 3 8
28 WILFREDO CORDERO -0.378 2 0 5
29 AL MARTIN -0.439 9 0 5
30 RUSTY GREER -1.451 3 0 2
CF
1 TORII HUNTER 2.306 3 3 12
2 MIKE CAMERON 2.100 6 3 5
3 JIM EDMONDS 1.818 4 2 8
28 JUAN ENCARNACION -0.126 5 1 3
29 GERALD WILLIAMS -0.136 6 1 6
30 CARL EVERETT -0.601 6 4 11
RF
1 MARK KOTSAY 3.151 3 3 12
2 JOSE GUILLEN 2.009 4 3 7
3 BRIAN JORDAN 1.912 3 0 4
28 MATT LAWTON -0.391 3 0 2
29 SAMMY SOSA -0.435 10 1 3
30 JEFFREY HAMMONDS -0.518 1 0 5
Table 15: OLS Regressions of Player Salary as a Function of Individual and Team Characteristics, 2000 data
Dependent variable: ln(salary in 2000)
Everyday players (Models 1-2); all opening-day position players (Models 3-4)
Variable Model 1 Model 2 Model 3 Model 4
ln(Population) 0.227** 0.209** 0.196*** 0.199***
(0.089) (0.089) (0.075) (0.076)
Team WPct. 0.538 0.538 1.143* 1.246**
(0.709) (0.712) (0.607) (0.610)
Arbitration 1.588*** 1.567*** 1.110*** 1.120***
Eligible (0.122) (0.123) (0.099) (0.100)
Free agent 2.518*** 2.484*** 1.904*** 1.901***
(0.105) (0.106) (0.087) (0.088)
OBP + SLG 1.661*** 0.695**
(0.400) (0.317)
Batting credit 0.096*** 0.069***
(0.022) (0.022)
Plate 0.0014*** 0.0015*** 0.0026*** 0.0025***
Appearances (0.0004) (0.0004) (0.0002) (0.0002)
Defensive -0.055 -0.046 -0.003 -0.003
credit (33%) (0.049) (0.050) (0.028) (0.028)
Constant 8.234*** 7.167*** 8.092*** 7.492***
(1.337) (1.348) (1.130) (1.152)
Obs 225 225 432 432
R-squared (adj.) 0.762 0.760 0.668 0.665
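Because the dependent variable in Table 15 is ln(salary), a coefficient b translates into a proportional salary difference of exp(b) - 1 for a one-unit change in the regressor (or for switching a 0/1 indicator on). This is the standard reading of a log-linear model, not a calculation reported in the table; the sketch applies it to three Model 1 coefficients, and `MODEL_1` and `pct_effect` are hypothetical names.

```python
import math

# Selected Model 1 coefficients from Table 15 (dependent variable: ln(salary in 2000)).
MODEL_1 = {
    "Arbitration eligible": 1.588,
    "Free agent": 2.518,
    "Defensive credit (33%)": -0.055,
}

def pct_effect(b):
    """Proportional salary change implied by coefficient b in a log-linear model: exp(b) - 1."""
    return math.exp(b) - 1

for name, b in MODEL_1.items():
    print(f"{name:24s} b = {b:+.3f}  ->  {pct_effect(b):+.1%}")
```

Free-agent status implies a salary roughly 12 times that of a player with neither free-agent nor arbitration eligibility, other things equal. The defensive credit coefficient is not statistically significant in any model (standard error 0.049 in Model 1), so its implied effect of about -5% per unit of credit cannot be distinguished from zero.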
Table 16: Defensive factors considered in current version of PGP credit
Valued Factor
YES Errors
YES Double plays
YES Outfield assists
YES Infielder range
YES Outfielder range
NO Catcher/Pitcher ability to prevent stolen bases
NO Catcher/Pitcher avoidance of wild pitches/passed balls
NO Shifting of zones to “fit” batter’s hitting style (pull hitters, strong/weak)
NO Preventing advancement of baserunners on hits/flyballs
NO Pitcher pickoff move
NO Ballpark effects