
Effects and Valuation of Fielding in Major League Baseball:

A Play-by-Play Analysis

Jahn K. Hakes and Raymond D. Sauer

Clemson University

October 31, 2002

PRELIMINARY DRAFT: PLEASE DO NOT CITE WITHOUT PERMISSION

E-mail: [email protected]; [email protected]. Mail: The John E. Walker Dept. of Economics, 222 Sirrine Hall, Clemson University, Clemson, SC 29634-1309.
1. Introduction

We attempt to estimate (1) the relative importance of fielding skills in the determination

of game outcomes in major league baseball and (2) valuation of these skills in the labor market.

Fielding skills have received little attention relative to batting, pitching, and managerial inputs in

economic analyses of baseball. This lack of attention is due to two main factors. First, sports

economics as a discipline has had bigger fish to fry. Studies of the baseball labor market have

focused on issues such as discrimination, outcomes of the arbitration process, and the extent to

which structural constraints sever the link between player wages and productivity. Evaluation of

fielding skills may not be an important component of these questions. Second, measurements of

batting and pitching skills have been well known and widely available for decades. Good data

on fielding attributes has become available only in recent years. Although measurement and

valuation of fielding skills is the focal point of this study, we also seek to make methodological

advances in other aspects of measurement of player marginal revenue product (MRP).

Jay Bennett (1993) used the concept of "player game percentages" (PGPs) to address the

question "Did Shoeless Joe Jackson Throw the 1919 World Series?" By taking into account the

conditions in effect – inning, outs, runners on base, score difference – each time Jackson was at

the plate or made a play on the ball, Bennett was able to compute the impact of Jackson's

performance on game outcomes. This level of detail enables one to get around the argument that

Jackson's aggregate statistics somehow disguise his alleged throwing of the Series. Bennett was

able to show that Jackson's performance actually increased, rather than decreased, the Black Sox's

chances of winning the Series.

We calculate measures of performance for current players based on situational conditions

using play-by-play data for an entire season. 1 Situational conditions and fielding skills are

1
A critique of this approach is that it leans excessively towards what Rodney Fort (1992, p. 137) calls the "Bill
James model" of the salary process. Our ultimate aim differs from that of James, whose target is typically limited to
comparison of different player abilities in different eras, ballparks, etc. Our quantification of different elements by
potentially quite relevant to the outcome of the game. For example, suppose some shortstops are

consistently more effective than their counterparts at converting double-play opportunities.

Since these situations occur when runners are on base and scoring is more likely, this skill can

have a tangible effect on the outcome of games and, thus, a team's winning percentage. This,

and other skills, can only be measured and evaluated using techniques based on play-by-play

data. Similar examples for outfielders with outstanding range and accurate throwing arms which

prevent runners from advancing amplify the point: fielding skills are critical elements of a

baseball game.

Proper evaluation of these skills in the past has been inhibited by the availability of data

and computational constraints (our play-by-play file for 2000 is over 40 MB). The "best"

indicator of fielding performance that has been explored in previous research is the "fielding

average," which measures the proportion of chances a fielder faces that he executes without

making an error. While important, this measure clearly fails to capture critical skills (some

described above) that impact the outcome of a game. Relaxation of data and computational

constraints now allows us to incorporate important information from play-by-play accounts of

baseball games. It is feasible and useful to address a new set of questions by pushing the

boundaries of research beyond aggregate statistics.

There are a host of questions that can be addressed using player game percentages and

play-by-play data. Our focus in this paper is to (1) measure the contribution of fielding to game

outcomes, and (2) assess whether the labor market's valuation of these skills is commensurate

with their contribution to team success. There are various reasons why one might find a

disconnect between these impacts, which we discuss below.

2. Prior Estimates of the Impact of Fielding

player game percentages should be viewed as a more careful and detailed analysis of the (baseball) production side of
the profit maximization model.
Early papers on the baseball labor market ignore fielding for reasons stated previously

(e.g. Scully (1974), Sommers and Quinton (1982), Zimbalist (1992), MacDonald and Reynolds (1994)).

More recent papers incorporate measures of fielding inputs in analyses of issues such as

arbitration and technical efficiency, but without much discussion. On the production side, Kahn

(1993a) and Ruggiero et al. (1996) find a strong positive relation between fielding average and

winning percentage. 2 In contrast, valuation studies find a weak relationship between measures of

fielding and player salaries. Kahn (1993b) and Gius and Hylan (1996) report coefficients in log

wage regressions for the number of Gold Glove awards that are positive, but marginally

significant. Similarly, Marburger (1996) obtains coefficient estimates for fielding average that

are positive, but at best marginally significant, in his model of salary determination under

arbitration.

James and Henzler (2002) devised a system called “win shares” to estimate the

contribution of players to team victory totals. To do so, they faced the same difficulties in

measuring defensive ability with aggregate fielding statistics that thwarted previous researchers.

To resolve them, James and Henzler use several ad hoc approximations, which result in plausible

estimates of the contributions of players in historical seasons. Their description of the system’s

details, however, is not sufficiently thorough to allow replication of their results or application of

their system to new data. Also, they report individual player defensive win shares only as a

fraction of the player’s total win shares for the decade, which precludes the possibility of our

using their estimates in our salary study unless we make simplifying assumptions about the time

path and variation of a player’s defensive skills. That said, their reported results do suggest that

defense is a significant factor of production for team wins. James and Henzler (2002) attribute

13-21% of team wins to defense for the 2000 major league season.

2
They do not emphasize the fact, but the coefficient estimates are on the order of 5 to 6 times their standard errors.
The contrast between production and valuation estimates presents a puzzle. While it

could be, despite the findings cited above, that fielding is of meager importance in baseball, it is

more likely that traditional fielding measures are poor estimators of defensive ability at the level

of the individual player. Another possibility to be addressed is that fielding isn't valued in

proportion to its impact on the game because sound defense isn’t as entertaining to the casual

(marginal) fan. Fans occasionally may cheer a quick catch and release that yields a double play

in a close game, but routinely give standing ovations for home runs (at least by the home team)

even in lopsided contests. It is our hope that construction of more detailed and precise estimates

of the contributions made by playmakers in the field will enable us to address these issues.

3. Measurement of Fielding's Impact Using Aggregates

The economic literature regarding baseball follows two major themes. The first analyzes

the production aspects of the sport, and seeks to identify how teams win games and, by

association, how individual players contribute to those wins. The second studies how production

of wins is incorporated into team revenue and player salaries.

The simplest possible reason for the relative lack of attention given to fielding in

economic studies of baseball is that fielding may be regarded as irrelevant by researchers. A

superficial look at fielding statistics would tend to support this assessment, as there is very little

correlation between the most commonly available fielding statistics and team winning

percentage.

The most commonly available defensive statistics, collected for both individuals and for

teams, are aggregate season totals of errors, putouts, assists, outfield assists, and double plays.

The first three statistics are combined to produce fielding percentage (fpct), which is calculated

as

\[
\mathrm{fpct} = \frac{\text{assists} + \text{putouts}}{\text{chances}} = \frac{\text{assists} + \text{putouts}}{\text{assists} + \text{putouts} + \text{errors}}. \tag{1}
\]
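
As an illustration of equation (1), the calculation can be sketched in a few lines of Python. The function name and the team totals below are hypothetical and are not drawn from our data; they serve only to show the arithmetic.

```python
def fielding_pct(putouts: int, assists: int, errors: int) -> float:
    """Fielding percentage as in equation (1): clean chances over total chances."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("fielding percentage is undefined with zero chances")
    return (putouts + assists) / chances


# Hypothetical season totals for one team (illustrative values only).
example = fielding_pct(putouts=4300, assists=1700, errors=93)
print(f"fpct = {example:.3f}")
```

Because errors are a small share of total chances, team values of fpct cluster in a narrow band, which is one reason the statistic discriminates poorly between teams.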

Using these statistics to look at team performance does not reveal much of immediate

interest. Table 1 depicts all 30 major league teams in the 2000 season, ranked by their regular

season winning percentage. For each team, the team aggregates of errors, double plays, outfield

assists and fpct are listed next to the team winning percentage. As many baseball managers and

former players are of the opinion that defensive strength of the team is more important “up the

middle”, error totals for those positions are isolated from errors committed by pitchers and

players at the “corner positions”.

Table 2 uses simple correlation coefficients between team winning percentage (winpct)

and various team aggregates to identify the "best" correlates of winning. Runs scored and runs

against correlate very highly with winning percentage for obvious reasons. Stepping back one

pace to look at the inputs into run scoring, hits allowed are still strongly detrimental to team

performance, but hits made by a team, while covarying directly with team winning percentage,

are not statistically significant.

The defensive statistics listed in Table 2 are not encouraging for the argument that

defense matters. Double plays, errors and fpct are not significantly correlated with winning, and

isolating errors by corner men and up the middle fielders is not helpful. 3 The one significant

correlate with winning percentage is outfield assists, which is significant at the 5% level and has

a negative sign, indicating that teams with higher numbers of outfield assists are less successful

on the field. This result could either be dismissed as a fluke or rationalized in that losing teams

allow more baserunners, hence there are more opportunities to record outfield assists.
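
The outfield assists correlation can be checked against the columns of Table 1. The sketch below is an illustration rather than the code used to produce Table 2; the helper name `pearson_r` is our own, and the two lists are transcribed from the "Win pct." and "OF assists" columns of Table 1 in row order.

```python
import math

# Transcribed from Table 1 (2000 season), in the table's row order.
WIN_PCT = [0.599, 0.586, 0.586, 0.586, 0.580, 0.565, 0.562, 0.556, 0.540, 0.531,
           0.525, 0.525, 0.525, 0.512, 0.506, 0.506, 0.491, 0.488, 0.475, 0.469,
           0.457, 0.451, 0.444, 0.438, 0.429, 0.426, 0.426, 0.414, 0.401, 0.401]
OF_ASSISTS = [31, 33, 25, 23, 28, 20, 20, 26, 18, 25,
              29, 38, 33, 27, 34, 35, 37, 33, 32, 32,
              26, 37, 20, 31, 37, 41, 33, 47, 31, 19]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(WIN_PCT, OF_ASSISTS)
print(f"corr(winpct, OF assists) = {r:.3f}")
```

If the transcription is correct, the result should be close to the -0.367 reported in Table 2.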

When controlling for offensive production through multivariate regression, as in Table 3,

the defensive aggregates lose all statistical significance. It is only in Model 3, where both runs

3
The correlation coefficient for winpct versus errors by “up the middle” defenders (catcher, second base, shortstop,
and center field) is –0.016, which is not statistically significant.
and hits are omitted, that outfield assists are significant at even the 90% confidence level, and

then the coefficient has the negative sign witnessed above. While Kahn (1993a) and Ruggiero et

al. (1996) obtained significant correlations for fielding percentage in regression models of

managerial efficiency, we were unable to obtain similar results with fpct for this particular

sample.

Fielding percentage, however, does influence player compensation in certain situations.

Table 4 presents the results of OLS regression models of player valuation. Each team’s market

size is captured through ln(population), which is the population of the metropolitan area, using

the CMSA size where available, and halving the population for markets shared by two major

league teams (New York, Chicago, Los Angeles, and San Francisco/Oakland) before taking the

natural logarithm. For lack of better data at this time, the ability-to-contract variables for

“Arbitration eligible” and “Free agent” were estimated using estimated years of service based

upon the first season in which the player appeared in 25 or more games.

Models 3 and 4 in Table 4 pool the statistics of all position players on major league

rosters for whom opening day salaries were available through USA Today. All of the coefficients

have the expected signs, and all of the control variables are significant at the 95%, and often the

99%, level. Model 3 focuses upon whether error information influences salaries, which it does

not, while Model 4 uses normalized fielding percentage and “zone rating”. The fielding

percentages were normalized by calculating the deviation of the player’s fielding percentage

from the opportunities-weighted average at his position, and dividing by the standard deviation

of fpcts at that position. Thus a one-unit increase in normalized fielding pct. represents a one

standard deviation improvement in that statistic. Zone rating is a relatively recent statistic which

reflects the percentage of the balls hit into a player’s area of defensive responsibility on which he

is able to make a play. Between zone rating and normalized fielding percentage, we can

presumably see how many balls a player is able to get to and how adroitly he is able to handle

the ball once he has it.
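
The normalization of fielding percentage described above can be sketched as follows. This is a minimal illustration under stated assumptions: the player records are hypothetical, the function name is our own, and the standard deviation is taken as the unweighted population standard deviation of the players' fpcts at the position, a choice the text does not pin down.

```python
from statistics import pstdev


def normalized_fpct(players):
    """Normalize fielding pct. for players at ONE position.

    players: list of (name, clean_chances, total_chances) tuples, where
    clean_chances = putouts + assists and total_chances adds errors.
    Returns {name: deviation from the opportunities-weighted average,
                   measured in standard deviations}.
    """
    total_clean = sum(clean for _, clean, _ in players)
    total_chances = sum(chances for _, _, chances in players)
    weighted_avg = total_clean / total_chances  # opportunities-weighted mean

    fpcts = [clean / chances for _, clean, chances in players]
    spread = pstdev(fpcts)  # assumption: unweighted population SD
    if spread == 0:
        raise ValueError("all players have identical fpct; cannot normalize")

    return {name: (clean / chances - weighted_avg) / spread
            for name, clean, chances in players}


# Hypothetical shortstops (illustrative values only).
z = normalized_fpct([("A", 485, 500), ("B", 290, 300), ("C", 672, 700)])
for name, score in z.items():
    print(f"{name}: {score:+.2f}")
```

A score of +1.0 then corresponds to a fielding percentage one standard deviation above the position's weighted average, which is the unit in which the Model 4 coefficient is expressed.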

The results in Model 4, however, would suggest that while sure-handedness is valued by

the player salary market, range is not. This result might appear to be at odds with the

insignificance of a player’s error totals in Model 3, but the difference in results might be due to

information on opportunities not included in plate appearances. 4 When the model is limited to

full time players, the relevance of defensive statistics disappears. 5 In contrast, the measure of

batting performance -- on-base percentage plus slugging percentage (OPS) -- becomes much

more important in both magnitude and in statistical significance of the estimated coefficient.

While the evidence from aggregate defensive statistics as to the importance of fielding is

somewhat equivocal, it is possible that the failure to find telling results is due to information lost

in the aggregation. To see whether fielding matters at a more disaggregated level, we have

looked at individual game results to find the correlation of errors in a particular game with the

probability of winning that game.

Table 5 reports the distribution of error differentials for games in the 2000 season. In fewer

than 20% of the contests did one team commit at least two more errors than its opposition.

Table 6 reports the conditional probability of winning a game once a team’s opponent has

allowed an error advantage. With even a one error advantage, a team’s chances of winning

increase from even to about five-to-three. As the error differential increases, the probability of

winning increases significantly. Figure 1 shows this result graphically, showing that the density

of wins is higher for teams with negative differences between own errors and opponent errors.

4
For instance, some teams have more fly ball or ground ball pitchers than others. Also, as pitching staffs differ in
strikeout rates and in hits allowed, there may be more balls put in play in an average game against certain teams,
which would alter error totals for their fielders.
5
Full time is arbitrarily defined here as being among the top 30 players at a given position in balls hit into one’s
zone.
This result carries over to Table 7, which shows that the winning teams average fewer errors than

the losing teams.
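
The standard errors and the "five-to-three" odds quoted for Table 6 can be reproduced from the observation counts and win percentages alone. The sketch below is illustrative: the function names are our own, and the n - 1 denominator is an assumption, chosen because it matches the tabulated standard errors to three decimals.

```python
import math

# (error advantage, observations, win pct.) transcribed from Table 6.
TABLE_6 = [(1, 1049, 0.620), (2, 347, 0.646), (3, 97, 0.742),
           (4, 23, 0.870), (5, 7, 0.714)]


def win_pct_std_err(p: float, n: int) -> float:
    """Standard error of a win proportion, using an n - 1 denominator (assumed)."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def odds(p: float) -> float:
    """Convert a win probability into odds in favor (wins per loss)."""
    return p / (1.0 - p)


for advantage, n, p in TABLE_6:
    print(f"{advantage} error(s): se = {win_pct_std_err(p, n):.3f}, "
          f"odds = {odds(p):.2f} to 1")
```

For a one-error advantage the odds are about 1.63 to 1, slightly under five-to-three (1.67 to 1). The small samples at advantages of four and five errors explain why their intervals extend beyond 1.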

Table 7 also counters concerns that the error differentials are due to home field

advantage. Errors per game are statistically equivalent at home and on the road, and for teams

which normally play on grass versus those which are at home on artificial surfaces. More errors

per game occur during games on natural grass, but the extra errors are made by both turf teams

and grass teams.

That errors in a game influence a game’s outcome does not logically lead to the

conclusion that aggregate errors should influence team winning percentage. We would also need

to know whether errors occur randomly or if error rates are conditional upon the context of the

game.

The play-by-play data indicate that errors, at least in part, are situational in nature. The

incidence of errors that occur on batted balls is broken down in Table 8. With no outs, the error

rate is 1.4% with the bases empty, but rises to 2.1% with runners on base. A similar increase

takes place with one out (from 1.3 to 1.8%), but the effect vanishes with two outs. In the latter

case, the error rate is similar to all out situations with the bases empty. The multiple options and

decisions that must be made in a split second are more numerous when runners are on base.

These situations may induce momentary indecision which would account for the higher error

rate. With two outs, the runners on base are less relevant to the strategy employed by the fielder,

and often allow for an easier play to be made to end the inning (any force out will do the job).

Table 9 suggests that the score of the game may also influence the likelihood of the error.

Error rates are higher for games which are going poorly and lower for games in which a team is

cruising along. This may be due either to the absence of difficult base runner situations when

one’s team is far ahead, or to the greater likelihood of better defensive teams obtaining the lead.

If errors were purely random acts of God, then the variation in fielding ability from

season to season would simply be noise. There would be no such thing as error-prone fielders or

those with soft hands. Fielding percentage would then be relevant for game outcomes only in the

year in which it is measured, but would be irrelevant for salary determination. Alternatively,

some fielders may be consistently good and others consistently bad. If this is the case, then year

to year measures of fielding ability across players will be positively correlated. We examine this

by estimating correlation coefficients $\mathrm{corr}(F_{i,t-1}, F_{i,t})$, where $F_{i,t}$ is a measure of fielding ability for

player i in year t. We estimate these coefficients for the three common indicators of ability:

fielding percentage, range factor, and zone rating. Since these measures are systematically

different for infielders, outfielders, catchers and first basemen, we estimate the coefficients

separately for each group and present them in Table 10.

Table 10 shows that the correlation across seasons is positive for all indicators,

particularly for infielders and outfielders. This is further evidence that there is a systematic

component to both good and bad fielding attributes which should impact both game outcomes

and valuation of skills.

In order to measure the influence of a play upon the outcome of the game, we adapt the

“player game percentage” (PGP) method developed in Bennett and Fleuck (1984) and Bennett

(1993). By determining the probability of a team winning a game from a given situation, and

measuring that probability both before and after a play, the effect of the play upon the game can

be expressed in expected value terms.
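
In code, the bookkeeping behind a PGP total amounts to summing win-probability changes over a player's plays. The sketch below is a simplified illustration with hypothetical plays and names; it credits the full change of each play to a single player, whereas a complete accounting would split it among the batter, pitcher, baserunners, and fielders.

```python
from collections import defaultdict


def accumulate_pgp(plays):
    """Sum win-probability changes by player.

    plays: iterable of (player, p_before, p_after), with both probabilities
    measured from the perspective of the player's own team.
    """
    credit = defaultdict(float)
    for player, p_before, p_after in plays:
        credit[player] += p_after - p_before
    return dict(credit)


# Hypothetical plays (illustrative probabilities only).
plays = [
    ("SS_A", 0.55, 0.62),  # turns a double play
    ("SS_A", 0.40, 0.37),  # commits an error
    ("CF_B", 0.50, 0.48),  # fails to cut off a ball in the gap
]
for player, total in accumulate_pgp(plays).items():
    print(f"{player}: {total:+.3f}")
```

Summed over a season, a total of +1.0 corresponds to one expected win added.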

4. The Impact of Plays on the Probability of Winning

To establish the probability of winning from a given situation, it is necessary, as will be

shown in greater detail below, to know the distribution of runs scored in a given inning. We

assume that this distribution is stationary across innings. Using the play-by-play data, the

empirical run scoring distribution is as shown in Table 11. The distributions conditional upon

base runner codes and outs demonstrate the extent to which allowing base runner advancement

harms a defensive team’s chances, and the changes in distribution as outs increase help inform

the tradeoffs involved in sacrifices and decisions on how to play a particular batted ball (e.g. to

attempt to throw out the lead runner, or prevent the trailing runner from advancing). 6

Using the probabilities in Table 11, and the assumption of stationarity, we can use

backward induction to calculate the probability of winning in each inning, conditional on the run

difference, basecode and number of outs. Let $P_H(h,I,b,o,d)$ represent the probability that the

home team, H, wins a game situated in the h half of inning I, with runners indicated by basecode

b, o outs, and facing a run difference of d runs. If the home team is trailing (d < 0) at the start of the

bottom of the ninth inning, the probability that it wins is

\[
P_H(1,9,0,0,d) = P_S(R > -d \mid 0,0) + 0.5\,P_S(R = -d \mid 0,0),
\]

where $P_S(R \mid b,o)$ is the stationary probability function for scoring R runs during the inning

conditional on basecode and outs, with a score difference (runs less opponent's runs) of d runs at

the start of the ninth. The second term gives the home team an even chance of winning in extra

innings when it exactly erases its deficit. At the start of the ninth, the conditional probabilities are taken from the

first row of Table 11. As outs are recorded and/or runners advance, the probabilities move to the

row which matches the basecode and out situation of the game. The relevant columns for the

probability sums also change when d changes as the home team scores.

Once we know the probability that the home team overcomes a deficit in the bottom of

the ninth, we can take any run difference facing the visiting team in the top of the ninth, the

probability it scores and hence changes the run difference, and the probability that the home team

overcomes this new difference (if necessary), and thereby compute the probability that the

visiting team is victorious given any situation it faces in the top of the ninth. For example,

6
Baserunner positions are coded as follows: 1 = runner on 1st; 2 = runner on 2nd; 3 = runner on 3rd; 4 = runners on 1st
and 2nd; 5 = runners on 1st and 3rd; 6 = runners on 2nd and 3rd; 7 = bases loaded.
\[
P_V(0,9,0,0,d) = \sum_{R=0}^{9} P_S(R \mid 0,0)\,\bigl(1 - P_H(1,9,0,0,-(d+R))\bigr)
\]

where $P_V(h,I,b,o,d)$ is the visitor’s probability of winning the game. In recursive fashion, the

probabilities can be computed in this manner all the way to the top half of the first inning.
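
The last two steps of the backward induction can be sketched in Python. This is a minimal illustration, not our production code: the run distribution below is hypothetical (it is not Table 11), only the bases-empty, no-out state at the start of each half-inning is handled, and d is treated as signed (own runs minus opponent's runs), so a trailing home team has d < 0 and must score more than -d runs.

```python
# Hypothetical stationary distribution of runs scored in a half-inning that
# starts with the bases empty and no outs. Illustrative only; NOT Table 11.
RUN_DIST = {0: 0.73, 1: 0.15, 2: 0.07, 3: 0.03, 4: 0.02}


def p_home_bottom9(d, dist=RUN_DIST):
    """Home win probability at the start of the bottom of the ninth.

    d is the home team's run difference (home runs minus visitor runs).
    A tie after nine innings is treated as a 50/50 proposition.
    """
    if d > 0:
        return 1.0  # home team already leads; the game is over
    needed = -d
    p_walk_off = sum(p for runs, p in dist.items() if runs > needed)
    p_tie = dist.get(needed, 0.0)
    return p_walk_off + 0.5 * p_tie


def p_visitor_top9(d, dist=RUN_DIST):
    """Visitor win probability at the start of the top of the ninth.

    d is the visitor's run difference (visitor runs minus home runs).
    """
    return sum(p * (1.0 - p_home_bottom9(-(d + runs), dist))
               for runs, p in dist.items())


print(f"home, tied, bottom 9:      {p_home_bottom9(0):.3f}")
print(f"home, down one, bottom 9:  {p_home_bottom9(-1):.3f}")
print(f"visitor, tied, top 9:      {p_visitor_top9(0):.3f}")
```

With any common run distribution the visitor's probability in a tie game at the top of the ninth comes out to one half, which is a useful consistency check on the recursion. Earlier innings follow by applying the same two steps repeatedly, replacing the terminal rule with the next inning's probabilities.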

Table 12 contains a selected group of probability estimates generated in this way to

illustrate the relative impact of particular plays at various stages of the game. Section A of Table

12 indicates the increasing value of plays as the game wears on. The value of a proficient

leadoff hitter (such as Rickey Henderson, the record holder for career leadoff home runs) is

apparent in the second through fifth rows of the table. Nevertheless, a single or one-base error in

the ninth inning of a tight game is worth about twice that of a single in the first inning. More

pronounced is the ninth inning home run, which is worth about three times as much as its first

inning counterpart. 7

Section B of Table 12 illustrates how the value of small (one run) and larger (three run)

leads increases as the game moves towards its conclusion. Section C measures the probability of

winning at the start of the bottom half of innings 1 and 9 in a tie game. The difference of 5 and

15 percentage points over the $P_V(0,I,0,0,0)$ of .500 at the top of the inning indicates the

increasing value to the home team of recording a scoreless top half of the inning.

The extent to which a particular play influences the probability of winning that game is

illustrated in Table 13. As would be expected, singles are “worth” more than walks due to the

possibilities for runner advancement, and extra base hits are worth more than singles. The

negative effect of a ground into double play (GIDP) on the offensive team’s probability of

winning is more than three times the magnitude of a routine ground out, and errors are only very

slightly less costly to a team than allowing a hit for an identical number of bases.

7
From a management perspective, Tommy Lasorda's use of Kirk Gibson in the 1988 World Series comes to mind.
There are two shortcomings of these probabilities, but we do not believe they

compromise our approach. First, endgame and batting order effects are likely to make scoring

probabilities non-stationary. Second, we have ignored the identity of the home team and the

pitchers, which will also affect the numbers in the table. But these effects will impact all

numbers by a similar magnitude. Since our measure of performance is based on changes in

probabilities, differencing any bias of constant magnitude will result in a clean, unbiased

probability estimate.

5. Estimates of Fielding Ability Using PGP

The PGP numbers that we have constructed so far are incomplete, but they contain many

of the factors by which defense can influence a game’s outcome. The numbers presented in

Table 7 directly measure key plays (i.e., the PGP effects of errors, turning double plays, and

recording outfield assists), as well as estimates of outfield and infield range calculated from the

average PGP result of all balls hit into a player’s area of responsibility. While several factors

have yet to be included in our measure, such as ballpark effects, and the deterrence effect of a

strong outfield throwing arm upon baserunners, the estimates represent a first pass at a

comprehensive, and separable, valuation of fielding ability.

The top three and bottom three players at each position, using our method, are listed in

Table 14. In many cases, these rankings are compatible with the conventional wisdom. Alex

Rodriguez, in addition to his hitting prowess, also has the highest PGP value for his fielding,

which is worth three more expected wins than that of the worst shortstops in the

majors. Noted outfielders Darin Erstad, Barry Bonds, Torii Hunter, and Jim Edmonds also are

among the leaders at their positions, while known defensive liabilities Mo Vaughn and Al Martin

make the bottom-three lists.

Using these PGP numbers under the label “defensive credit”, Table 15 reestimates the

salary model first shown in Table 4. As the statistic currently stands, defensive PGP is not

reflected in salaries. As a confirmation of the usefulness of the PGP methodology, however,

offensive PGPs calculated in the same manner for each batter's plate appearances are a viable

substitute for OPS as a measure of offensive value.

6. Future Research

The main benefit of PGPs over traditional statistics, however, will come when we

allocate the share of a play’s probability change accurately between the batter, the pitcher, the

baserunners, and the fielders who handle the ball. Then the units of PGP can be used to estimate

the dollar value of a win. As the current “batter credit” variable (incorrectly) gives the full credit

of a hit to the batter, without assigning any blame to the pitcher, it overstates a player’s games

won through his batting. As such, the salary effect of a game won through an everyday player’s

batting prowess is likely to be higher than the 9.6% point estimate in Model 1. Once credit is

properly attributed, however, through incorporation of the missing elements in Tables 16 and 17,

it should be possible to determine whether games won with a bat are valued in salaries equally

with games won with a glove or a pitching arm.

References

Bennett, Jay. "Did Shoeless Joe Jackson Throw the 1919 World Series?" The American
Statistician, November 1993, 47, 241-250.

Bennett, Jay, and Fleuck, J. A. "Player Game Percentage," In Proceedings of the Social Statistics
Section, American Statistical Association, 1984, 378-380.

Burgess, Paul L., Marburger, Daniel R., and Scoggins, John F. "Do Baseball Arbitrators Simply
Flip a Coin?" In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.

Fizel, John P. "Is There Bias in Major League Baseball Arbitration?" In Baseball Economics, J.
Fizel et al eds., Praeger, 1996: Westport, CT.

Fort, Rodney. "Pay and Performance: Is the Field of Dreams Barren?" In Diamonds are
Forever, P. Sommers ed., Brookings, 1992: Washington D.C.

Gius, Mark Paul and Hylan, Timothy R. "An Interperiod Analysis of the Salary Impact of
Structural Changes in Major League Baseball: Evidence from Panel Data," In Baseball
Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.

James, Bill and Henzler, Jim. Win Shares. Stats Inc. Publishing, 2002: Morton Grove, IL.

Kahn, Lawrence M. "Management Quality, Team Success, and Individual Player Performance in
Major League Baseball," Industrial and Labor Relations Review, 1993a, 46, 531-547.

Kahn, Lawrence M. "Free Agency, Long-Term Contracts and Compensation in Major League
Baseball: Estimates from Panel Data," The Review of Economics and Statistics, February 1993b,
75, 157-164.

Marburger, Daniel R. "A Comparison of Salary Determination in the Free Agent and Salary
Arbitration Markets," In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.

MacDonald, Don N. and Reynolds, Morgan O. "Are Baseball Players Paid their Marginal
Products?" Managerial and Decision Economics, September/October 1994, 15, 443-457.

Ruggiero, John, Hadley, Lawrence, and Gustafson, Elizabeth. "Technical Efficiency in Major
League Baseball," In Baseball Economics, J. Fizel et al eds., Praeger, 1996: Westport, CT.

Scully, Gerald P. "Pay and Performance in Major League Baseball," American Economic
Review, December 1974, 64: 915-30.

Sommers, Paul M. and Quinton, Noel. "Pay and Performance in Major League Baseball: The
Case of the First Family of Free Agents," Journal of Human Resources, Summer 1982, 17: 426-36.

Zimbalist, Andrew. "Salaries and Performance: Beyond the Scully Model," In Diamonds are
Forever, P. Sommers ed., Brookings, 1992: Washington D.C.

Table 1: Team success and aggregate defensive statistics, 2000 season
Team  Win pct.  Errors  Double plays  OF assists  Errors by C,2B,SS,CF  Errors by P,1B,3B,LF,RF  Fielding pct.
SF_ 0.599 93 173 31 48 45 .985
CWS 0.586 133 190 33 73 60 .978
ATL 0.586 129 137 25 64 65 .979
STL 0.586 111 148 23 56 55 .981
NYM 0.580 118 122 28 50 68 .980
OAK 0.565 134 165 20 73 61 .978
SEA 0.562 99 176 20 41 58 .984
CLE 0.556 72 147 26 35 37 .988
NYY 0.540 109 133 18 56 53 .981
LA_ 0.531 135 151 25 65 70 .978
ARI 0.525 107 138 29 51 56 .982
CIN 0.525 111 157 38 51 60 .982
BOS 0.525 109 120 33 51 58 .982
TOR 0.512 100 177 27 47 53 .984
ANA 0.506 134 182 34 63 71 .978
COL 0.506 94 176 35 51 43 .985
FLA 0.491 125 144 37 67 58 .980
DET 0.488 105 172 33 39 66 .983
KC_ 0.475 102 186 32 40 62 .983
SD_ 0.469 141 156 32 73 68 .977
BAL 0.457 116 152 26 54 62 .981
MIL 0.451 118 186 37 57 61 .981
HOU 0.444 133 150 20 63 70 .978
TEX 0.438 135 162 31 57 78 .978
TB_ 0.429 118 171 37 67 51 .981
MIN 0.426 102 155 41 58 44 .983
PIT 0.426 132 170 33 61 71 .979
MON 0.414 132 151 47 52 80 .978
PHI 0.401 100 135 31 55 45 .983
CHC 0.401 100 139 19 43 57 .983

Table 2: Correlation of team winning percentage with aggregate statistics

Variable               Corr      p-value
Runs scored            0.608     (0.00)
Runs allowed           -0.677    (0.00)
Hits made              0.261     (0.16)
Hits allowed           -0.547    (0.00)
Errors made            -0.129    (0.50)
Double plays           -0.029    (0.88)
Outfield assists       -0.367    (0.05)
Errors by cornermen    -0.190    (0.31)
Fielding pct.          0.131     (0.49)
p-values in parentheses

Table 3: Regressions of team winning percentage on aggregate statistics
Dependent = Win pct.

Variable            Model 1        Model 2        Model 3
Runs / g            0.085
                    (0.011)
Runs allowed / g    -0.104
                    (0.010)
Hits / g                           0.084
                                   (0.018)
Hits allowed / g                   -0.118
                                   (0.018)
Errors made         0.000205       0.000294       -0.000343
                    (0.000250)     (0.000430)     (0.000674)
Double plays        6.9e-06        0.000590       0.000225
                    (276.1e-06)    (0.000410)     (0.000601)
Outfield assists    0.000105       -0.001455      -0.003298
                    (0.000733)     (0.001035)     (0.001657)
Constant term       0.567          0.732          0.603
                    (0.079)        (0.170)        (0.119)
R-squared (adj.)    0.8755         0.6535         0.0487
n=30; Standard errors in parentheses;
Bold indicates significant at 0.05 level;
Italics indicates significant at 0.10 level

Table 4: OLS Regressions of Everyday Player Salary as a
Function of Individual and Team Characteristics
Dependent = ln(salary in 2000)
Models 1-2: everyday players; Models 3-4: all (opening day) position players
Variable Model 1 Model 2 Model 3 Model 4
ln(Population) 0.217*** 0.212** 0.201*** 0.198***
(0.089) (0.090) (0.076) (0.075)
Team WPct. 0.519 0.510 1.251** 1.268**
(0.712) (0.714) (0.610) (0.607)
Arbitration 1.560*** 1.563*** 1.115*** 1.112***
Eligible (0.124) (0.125) (0.100) (0.100)
Free agent 2.477*** 2.491*** 1.891*** 1.888***
(0.107) (0.108) (0.088) (0.088)
OBP + SLG. 1.667*** 1.666*** 0.649** 0.862***
(0.401) (0.404) (0.318) (0.323)
Plate 0.0014 0.0015*** 0.0027*** 0.0024***
Appearances (0.0004) (0.0004) (0.0002) (0.0002)
Errors -0.005 -0.009
(0.007) (0.007)
Normalized -0.008 0.067**
fielding pct. (0.050) (0.027)
Zone rating 0.444 0.095
(0.675) (0.355)
Constant 7.071*** 6.725*** 7.489*** 7.349***
(1.345) (1.452) (1.154) (1.186)
Obs 225 225 431 431
R-squared 0.760 0.758 0.665 0.668
(adj.)

Table 5. Distribution of error difference between opponents
Error difference Frequency Percent
0 906 37.30
1 1049 43.19
2 347 14.28
3 97 4.00
4 23 0.94
5 7 0.29
Total 2429 100.00

Table 6: Probability of winning given an error advantage


# errors
advantage Obs Win pct. Std. Err (95% c.i.)
0 906 [.500]
1 1049 .620 .015 (.590, .649)
2 347 .646 .026 (.595, .696)
3 97 .742 .045 (.653, .831)
4 23 .870 .072 (.721, 1.018)
5 7 .714 .184 (.263, 1.166)
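
The standard errors in Table 6 can be approximately reproduced from the win percentages and observation counts. The sketch below assumes the standard error is the sample standard deviation of a 0/1 win indicator (with the n - 1 correction) divided by the square root of n; that formula is inferred from the reported figures, not stated in the paper, and the name `win_pct_se` is hypothetical.

```python
import math


def win_pct_se(p: float, n: int) -> float:
    """Standard error of a sample proportion using the n - 1 sample variance.

    Assumed formula: sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (error advantage, observations, win pct.) copied from Table 6.
rows = [
    (1, 1049, 0.620),
    (2, 347, 0.646),
    (3, 97, 0.742),
    (4, 23, 0.870),
    (5, 7, 0.714),
]

for advantage, n, p in rows:
    print(f"advantage {advantage}: n = {n:4d}  win pct = {p:.3f}  SE = {win_pct_se(p, n):.3f}")
```

Because the interval is built symmetrically around the point estimate, the upper bound can exceed 1 when the sample is small, as it does in the last two rows of the table.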

Figure 1: Distribution of “error gap”, own errors minus opponent’s errors, for winning teams

[Histogram. Horizontal axis: errorgap, from -5 to 5. Vertical axis: Fraction, from 0 to .372993.]

Table 7: Average errors per team per game, situational
Situation  Average errors made per team per game (standard error)  T-stat for significant difference (assume equal variances)
Winning team 0.570 11.15***
(0.016)
Losing team 0.849
(0.019)
At home 0.710 0.05
(0.018)
On the road 0.709
(0.018)
On (natural) grass 0.722 1.75*
(0.015)
On (artificial) turf 0.669
(0.025)
“Grass” team 0.712 0.37
(0.014)
“Turf” team 0.701
(0.026)

Table 8. Distribution of Error Rates, by Outs and Men-on-base
Situation No runners Men on
NO OUTS
Batted balls 34,205 15,694
Error rate 1.4% 2.1%
ONE OUT
Batted balls 23,355 22,843
Error rate 1.3% 1.8%
TWO OUTS
Batted balls 18,114 24,897
Error rate 1.6% 1.4%

Table 9: Distribution of error rates, by current score of game


Score of game Balls in play Errors Error %
3 or more runs ahead 25489 534 2.09
Within two runs 89531 2252 2.51
3 or more runs behind 24095 661 2.74

Table 10: Season to Season Correlation of Individual Fielding Attributes
                                All Positions  Infield  Outfield  1st Base  Catcher
Fielding Pct                    0.760          0.667    0.413     0.258     0.231
Range Factor                    0.988          0.949    0.754     0.467     0.267
Zone Rating                     0.761          0.833    0.571     0.427     0.737
Number of Paired Observations   1025           363      323       166       173
Notes: Active players in 2000 who started 50+ games at a given
position in successive years. Data from StatsInc, 1991-2001.

Table 11: Probability of Runs Scored in an Inning,
by Basecode and Outs
Basecode Outs p(0) p(1) p(2) p(3) p(4) p(5+) E(runs) Obs
0 0 0.695 0.159 0.078 0.038 0.017 0.014 0.577 45495
0 1 0.818 0.105 0.046 0.019 0.007 0.006 0.313 31968
0 2 0.918 0.054 0.019 0.006 0.002 0.001 0.124 25392
1 0 0.558 0.168 0.138 0.071 0.035 0.031 0.972 10804
1 1 0.709 0.117 0.097 0.044 0.018 0.016 0.600 12227
1 2 0.858 0.058 0.056 0.019 0.005 0.004 0.267 11946
2 0 0.368 0.352 0.139 0.074 0.040 0.027 1.170 3470
2 1 0.585 0.236 0.098 0.050 0.017 0.014 0.727 5867
2 2 0.780 0.147 0.047 0.017 0.005 0.003 0.329 7448
3 0 0.136 0.508 0.189 0.105 0.025 0.038 1.513 524
3 1 0.329 0.485 0.111 0.044 0.022 0.010 0.980 2010
3 2 0.731 0.186 0.054 0.018 0.006 0.004 0.398 3054
4 0 0.349 0.218 0.164 0.130 0.071 0.068 1.616 2886
4 1 0.569 0.160 0.104 0.094 0.039 0.034 0.998 5123
4 2 0.760 0.108 0.060 0.048 0.016 0.008 0.479 6435
5 0 0.116 0.435 0.165 0.136 0.082 0.067 1.889 1068
5 1 0.340 0.380 0.118 0.091 0.044 0.027 1.214 2277
5 2 0.729 0.146 0.054 0.051 0.014 0.006 0.496 2886
6 0 0.154 0.246 0.308 0.138 0.072 0.082 2.034 668
6 1 0.291 0.296 0.212 0.101 0.054 0.045 1.490 1695
6 2 0.727 0.048 0.146 0.047 0.017 0.016 0.628 1911
7 0 0.114 0.257 0.208 0.120 0.153 0.148 2.509 802
7 1 0.315 0.260 0.141 0.112 0.094 0.078 1.691 1949
7 2 0.672 0.091 0.107 0.055 0.049 0.025 0.807 2356
Note: Calculations extend to four decimal places, allow for scoring of
up to 9 runs in an inning, and track p(win) for run differentials of
+/- 9 runs.
Basecodes: 1 = runner on 1st; 2 = runner on 2nd; 3 = runner on 3rd; 4 =
runners on 1st and 2nd; 5 = runners on 1st and 3rd; 6 = runners on 2nd
and 3rd; 7 = bases loaded.
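
The E(runs) column in Table 11 is the mean of the run distribution in each row. Since the table reports only a combined p(5+) bucket, the published probabilities alone give a lower bound. The sketch below treats every 5+ inning as exactly 5 runs, which is a simplifying assumption; the function name `expected_runs_lower_bound` is hypothetical.

```python
def expected_runs_lower_bound(probs):
    """Lower bound on expected runs from [p(0), p(1), p(2), p(3), p(4), p(5+)].

    The p(5+) bucket is counted as exactly 5 runs, so the result
    understates the true mean whenever innings of 6 or more runs occur.
    """
    return sum(runs * p for runs, p in enumerate(probs))


# Row for basecode 0, no outs, copied from Table 11.
bases_empty_no_outs = [0.695, 0.159, 0.078, 0.038, 0.017, 0.014]

bound = expected_runs_lower_bound(bases_empty_no_outs)
print(f"lower bound = {bound:.3f}, Table 11 E(runs) = 0.577")
```

The gap between the bound and the tabulated E(runs) reflects innings of six or more runs together with rounding in the published probabilities.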

Table 12: Probability of winning, situational
Inning  Home/Visitor  Run Difference  Basecode Before  Outs Before  Probability of Winning
A. Early vs Late Plays
Value of Early Plays
1 Visitor 0 0 0 0.500
1 Visitor 0 1 0 0.534 Single/1B Error
1 Visitor 0 2 0 0.552 Double
1 Visitor 0 3 0 0.584 Triple
1 Visitor 1 0 0 0.590 Homer
Value of Late Plays
9 Visitor 0 0 0 0.500
9 Visitor 0 1 0 0.579 Single/1B Error
9 Visitor 0 2 0 0.660 Double
9 Visitor 0 3 0 0.770 Triple
9 Visitor 1 0 0 0.821 Homer
B. Changes in the Value of Leads
Increasing Value of a Small Lead
3 Visitor 1 0 0 0.604
6 Visitor 1 0 0 0.648
9 Visitor 1 0 0 0.821
Increasing Value of a Large Lead
3 Visitor 3 0 0 0.779
6 Visitor 3 0 0 0.848
9 Visitor 3 0 0 0.960
C. The Bottom Half vs. the Top Half
1 Home 0 0 0 0.551
9 Home 0 0 0 0.653
Note: The probability value applies to the team given by the team
indicator.
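
Panel A of Table 12 can be read as the gain in win probability from each leadoff event, relative to the 0.500 starting point. A minimal sketch of that subtraction follows; the dictionary names and the `gains` helper are hypothetical and the values are copied from the table.

```python
# Visitor's win probability after a leadoff event in a tied game (Table 12, panel A).
first_inning = {"Single/1B error": 0.534, "Double": 0.552, "Triple": 0.584, "Homer": 0.590}
ninth_inning = {"Single/1B error": 0.579, "Double": 0.660, "Triple": 0.770, "Homer": 0.821}


def gains(after, before=0.500):
    """Change in win probability for each event relative to the starting state."""
    return {event: round(p - before, 3) for event, p in after.items()}


early = gains(first_inning)
late = gains(ninth_inning)

for event in first_inning:
    print(f"{event:16s} 1st: {early[event]:+.3f}   9th: {late[event]:+.3f}")
```

On these figures every event is worth more in the ninth inning than in the first, and the gap widens with the size of the hit.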

Table 13: Average Effect of Selected Events Upon Probability of
Winning
Event  Frequency  Mean Change in P(win)  Std. Error
Walk 17,028 0.0281 0.0002
Hit by pitch 1,572 0.0284 0.0006
Single 29,686 0.0418 0.0003
Double 8,902 0.0646 0.0007
Triple 952 0.0948 0.0026
Home run 5,693 0.1217 0.0013
Strikeout 31,254 -0.0276 0.0001
Ground out 35,191 -0.0220 0.0001
Fly out 25,279 -0.0248 0.0001
Ground into double play 3,833 -0.0753 0.0010
1B error 1,549 0.0375 0.0010
2B error 247 0.0595 0.0046
3B error 25 0.0920 0.0222
Stolen base 2,531 0.0137 0.0003
Caught stealing 1,308 -0.0428 0.0008

Table 14: Best/worst everyday players, using PGP, 2000 season
Rank  First name  Last name  Credit (33%)  Errors  DPs
Pitcher
1 ORLANDO HERNANDEZ 0.849 0 4
2 KIRK RUETER 0.715 0 4
3 KENNY ROGERS 0.615 2 5
118 CHUCK FINLEY -0.329 4 0
119 JOHN SNYDER -0.378 2 0
120 ALBIE LOPEZ -0.383 3 1
Catcher
1 EINAR DIAZ 0.683 4 4
2 JOE GIRARDI 0.477 5 5
3 BOBBY ESTALELLA 0.453 5 11
28 MARK L. JOHNSON -0.005 4 4
29 TODD HUNDLEY -0.013 13 3
30 JASON VARITEK -0.179 7 1
1B
1 TODD ZEILE 1.095 10 4
2 MARK GRACE 0.921 4 8
3 MARK MCGWIRE 0.887 1 2
28 J.T. SNOW -0.550 6 9
29 MO VAUGHN -0.889 14 12
30 KEVIN YOUNG -1.020 17 11
2B
1 LUIS CASTILLO 2.952 11 76
2 MICKEY MORANDINI 2.664 6 68
3 POKEY REESE 2.579 14 79
28 CHUCK KNOBLAUCH 0.037 15 39
29 JAY CANIZARO -0.115 6 39
30 JOSE OFFERMAN -0.178 7 42
3B
1 VINNY CASTILLA 1.055 8 20
2 WILLIE GREENE 0.894 7 17
3 AARON BOONE 0.779 8 18
28 ARAMIS RAMIREZ -0.878 14 6
29 ERIC CHAVEZ -1.258 18 15
30 DEAN PALMER -1.696 23 10
Table 14: Best/worst everyday players at each position, using PGP, 2000
season (cont.)
Rank  First name  Last name  Credit (33%)  Errors  DPs  OAs
SS
1 ALEX RODRIGUEZ 3.393 10 112
2 NEIFI PEREZ 3.276 18 107
3 REY SANCHEZ 2.681 4 100
28 RAFAEL FURCAL 0.446 23 53
29 ALEX GONZALEZ 0.389 19 57
30 MELVIN MORA 0.206 19 45
LF
1 DARIN ERSTAD 3.140 3 1 8
2 GEOFF JENKINS 2.819 7 3 12
3 BARRY BONDS 1.966 3 3 8
28 WILFREDO CORDERO -0.378 2 0 5
29 AL MARTIN -0.439 9 0 5
30 RUSTY GREER -1.451 3 0 2
CF
1 TORII HUNTER 2.306 3 3 12
2 MIKE CAMERON 2.100 6 3 5
3 JIM EDMONDS 1.818 4 2 8
28 JUAN ENCARNACION -0.126 5 1 3
29 GERALD WILLIAMS -0.136 6 1 6
30 CARL EVERETT -0.601 6 4 11
RF
1 MARK KOTSAY 3.151 3 3 12
2 JOSE GUILLEN 2.009 4 3 7
3 BRIAN JORDAN 1.912 3 0 4
28 MATT LAWTON -0.391 3 0 2
29 SAMMY SOSA -0.435 10 1 3
30 JEFFREY HAMMONDS -0.518 1 0 5

Table 15: OLS Regressions of Player Salary as a Function
of Individual and Team Characteristics, 2000 data
Dependent = ln(salary in 2000)
Models 1-2: everyday players; Models 3-4: all opening day position players
Variable Model 1 Model 2 Model 3 Model 4
ln(Population) 0.227** 0.209** 0.196*** 0.199***
(0.089) (0.089) (0.075) (0.076)
Team WPct. 0.538 0.538 1.143* 1.246**
(0.709) (0.712) (0.607) (0.610)
Arbitration 1.588*** 1.567*** 1.110*** 1.120***
Eligible (0.122) (0.123) (0.099) (0.100)
Free agent 2.518*** 2.484*** 1.904*** 1.901***
(0.105) (0.106) (0.087) (0.088)
OBP + SLG. 1.661*** 0.695**
(0.400) (0.317)
Batting credit 0.096*** 0.069***
(0.022) (0.022)
Plate 0.0014*** 0.0015*** 0.0026*** 0.0025***
Appearances (0.0004) (0.0004) (0.0002) (0.0002)
Defensive -0.055 -0.046 -0.003 -0.003
credit (33%) (0.049) (0.050) (0.028) (0.028)
Constant 8.234*** 7.167*** 8.092*** 7.492***
(1.337) (1.348) (1.130) (1.152)
Obs 225 225 432 432
R-squared (adj.) 0.762 0.760 0.668 0.665

Table 16: Defensive factors considered in current version of PGP credit
Valued Factor
YES Errors
YES Double plays
YES Outfield assists
YES Infielder range
YES Outfielder range
NO Catcher/Pitcher ability to prevent stolen bases
NO Catcher/Pitcher avoidance of wild pitches/passed balls
NO Shifting of zones to “fit” batter’s hitting style (pull hitters, strong/weak)
NO Preventing advancement of baserunners on hits/flyballs
NO Pitcher pickoff move
NO Ballpark effects

Table 17: Future research

• Use of PGP to study effectiveness of strategies
o stolen bases
o sacrifice bunts
o intentional walks
o hit-and-runs
o infield in/back/dp-depth
• Who are the real RBI-men, controlling for opportunities?
• Inclusion of memorabilia values to control for popularity unrelated to PGP
• Calculation of pitcher PGP data
o establish pitcher salaries
o test for starter vs. reliever value
o determine share of credit to give to batting, baserunning, pitching, and fielding
o check whether markets correctly value each element
