An Exploration of and Case Studies in Demand Forecast Accuracy:
Replenishment, Point of Sale, and Bounding Conditions
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy
in the Graduate School of The Ohio State University
By
Kevin Barry Smyth, M.S.
Graduate Program in Business Administration
The Ohio State University
2017
Dissertation Committee:
A. Michael Knemeyer, Advisor
Keely L. Croxton
Rod Franklin
Xiang (Sean) Wan

Copyright by
Kevin Barry Smyth
2017
Abstract
Forecasts are a critical input that drives actions within the firm and throughout the
supply chain. For good reason, there is a tremendous focus on accuracy for this input.
This dissertation addresses three areas regarding forecast accuracy in logistics and the
supply chain relating to three questions posed by demand planners at a logistics provider
firm that partnered with this research. In attempting to determine “What is causing our
replenishment forecast error?”, “What predictive factors can help improve our demand
forecast accuracy?”, and with regard to forecast accuracy, “How good is good enough?”,
we explore three interrelated topics that have a broader impact on the academic
conception of forecast accuracy than the original questions posed.
In three essays, we identify governance form factors that affect replenishment
forecast deviation and bias, demonstrate accuracy improvement through the inclusion of
uncertain weather forecast information in demand forecasts, and identify themes that
serve to bound achievable and desirable demand forecast accuracy through a systematic
literature review of logistics and supply chain journals. Our first study measures the
deviation and bias related to franchise governance form, but also demonstrates a novel
approach to contextualize the heterogeneity of effects across regional, temporal, and
product-category conditions. Our second study expands on previous work linking
the inclusion of uncertain weather forecast variables to improvements in demand forecast
accuracy by examining a wider range of products and locations in a new industry, but
also by demonstrating the limits to the value of uncertain information. Finally, our
systematic literature review comprehensively presents the current state of research on the
thematic drivers of forecast accuracy.
Each essay expands theoretical understanding of management phenomena, and
reframes the manner in which previous research can be applied in practice. In each we
also propose future avenues to expand on the work here, and on forecasting in general in
the context of logistics and the supply chain.
Acknowledgments
I first wish to thank my committee, and especially my advisor, for guiding me
through this process. Their advice, constructive criticism, and at times moral support
helped get me through a long and arduous effort. Drs. Mike Knemeyer, Keely Croxton,
Rod Franklin and Xiang Wan all dedicated a tremendous amount of time and energy to
ensure the success of my research. By their support, this product is superior to what
would be possible through my work alone.
I want to thank the supply chain planning division of our practitioner company
research partner for their patience and support as we worked with them on multiple
projects. To Scott Saunders and Chris Karsten for facilitating our research through the
Global Supply Chain Forum at The Ohio State University. To Chris Awalt for his
continued effort to guide, inform and provide for the collaboration between our research
team and their firm. To Jagadeesh Deva, Brent Labhart, Jim O’Rourke and Caroline
Francisca for their guidance and facilitation of data collection. Without all of their
efforts, none of this research would have been possible, and it is my sincere hope that this
work contributes to their success as well.
Thanks to Dr. Douglas Lambert and all those who help run the Global Supply
Chain Forum. Their work facilitating a collaborative research environment for
practitioners and academics served as the genesis of this research, and enabled the
required resources to see it through.
Finally, and most importantly, I want to thank my wife Emily. Her love and
support gave me the strength to complete this project. She sacrificed her time and
shouldered additional burdens with our family, all while excelling as the communications
director at the Virtual Labs School at The Ohio State University and pursuing a Master’s
of Social Work at Indiana University.
Vita
2008 B.S. Nuclear Engineering with a Minor in Aerospace
Studies, Oregon State University
2012 M.S. Logistics and Supply Chain Management, Air Force
Institute of Technology (AFIT)
2012 Graduate Certificate, Nuclear Weapon Effects,
Proliferation and Policy, AFIT
2008 to present Munitions and Missile Maintenance Officer, United States
Air Force
2014 to present Graduate Teaching Associate, Department of Marketing
and Logistics, The Ohio State University
Fields of Study
Major Field: Business Administration
Table of Contents
Abstract ............................................................................................................................... ii
Acknowledgments.............................................................................................................. iv
Vita..................................................................................................................................... vi
Table of Contents .............................................................................................................. vii
List of Tables ................................................................................................................... xiii
List of Figures ................................................................................................................... xv
Chapter 1: Introduction ...................................................................................................... 1
Chapter 2: “The Effect of Governance Form on Replenishment Forecast Deviation and
Bias: A Case Study in the Quick-Service Restaurant Industry” ........................................ 7
Introduction ..................................................................................................................... 7
Research Context............................................................................................................. 9
Literature Review .......................................................................................................... 13
Transaction Costs ...................................................................................................... 13
Resource Scarcity ...................................................................................................... 14
Agency Costs or Administrative Efficiency ............................................................... 15
Post-Contractual Performance.................................................................................. 19
Hypotheses .................................................................................................................... 21
Model Development ...................................................................................................... 24
Data Collection and Analysis .................................................................................... 24
Variable Evaluation and Selection ............................................................................ 26
Training Data Set Formation .................................................................................... 28
Data Processing ........................................................................................................ 30
Data Exploration ....................................................................................................... 32
A Novel Regression-Based Approach ........................................................................ 34
Accounting for Cases of Non-Deviation .................................................................... 38
Outlier Identification ................................................................................................. 43
Results ........................................................................................................................... 44
Deviation Model ........................................................................................................ 45
Bias Model ................................................................................................................. 47
Discussion ..................................................................................................................... 49
Effects of Governance Form ...................................................................................... 50
Extensions to the Post-Contractual Effects from Governance Form ........................ 51
Methodological Implications ..................................................................................... 52
Limitations .................................................................................................................... 55
Conclusions ................................................................................................................... 56
Chapter 3: “The Impact of Including Forward Indicators on POS Demand Forecast
Accuracy: The Case of Short-Term Weather Forecast Data” .......................................... 58
Introduction ................................................................................................................... 58
Literature Review .......................................................................................................... 63
Observed Weather’s Effect on Demand..................................................................... 63
Weather Effects on Retail and Service Sectors .......................................................... 64
Explanations for Consumer Behavior ....................................................................... 67
Problems with Using Observed Weather as a Proxy for Weather Forecasts ........... 68
Advances in Weather Forecasting ............................................................................. 69
Forecasted Weather to Predict Demand ................................................................... 71
Data ............................................................................................................................... 73
Hypotheses .................................................................................................................... 79
Methodology ................................................................................................................. 81
Autoregressive Models .............................................................................................. 81
Models for Comparison ............................................................................................. 85
Models Including Exogenous Observed and Predicted Weather Indicators ............. 86
Forecast Evaluation .................................................................................................. 88
Addressing Computational Scale............................................................................... 89
Output Analysis.......................................................................................................... 90
Results ........................................................................................................................... 93
Discussion ................................................................................................................... 107
Specification Errors ................................................................................................. 107
Confounding ............................................................................................................ 109
Weather Forecast Reliability ................................................................................... 109
Differential Effect between Demand Forecast Quality Measures ........................... 111
Limitations .................................................................................................................. 112
Managerial Implications .............................................................................................. 113
Future Research ........................................................................................................... 114
Chapter 4: “A Systematic Literature Review and Typology of Factors that Bound
Demand Forecast Accuracy” .......................................................................................... 117
Introduction ................................................................................................................. 117
Defining Accuracy in Logistics and Supply Chain Management Research ............... 118
Basic Overview of Systematic Literature Review ...................................................... 120
Criteria for Inclusion in the Literature Review ........................................................... 121
Topics for the Literature Search .............................................................................. 121
Scoping the Literature Search ................................................................................. 126
Individual Article Criteria for the Literature Search .............................................. 129
Collecting Relevant Research and Applying Search Criteria ..................................... 130
Overview of Thematic Findings .................................................................................. 132
Technical Drivers of Forecast Accuracy Bounds ........................................................ 137
Forecastability ......................................................................................................... 137
Horizon .................................................................................................................... 141
Overfitting and Misspecification ............................................................................. 143
Tradeoffs of Metrics................................................................................................. 144
Level of Aggregation and Hierarchy ....................................................................... 152
Data Quality and Availability.................................................................................. 156
Managerial Drivers of Forecast Accuracy Bounds ..................................................... 160
Error Amplification ................................................................................................. 160
Cost Tradeoffs.......................................................................................................... 163
Supply Chain Integration......................................................................................... 173
Supply Chain Flexibility .......................................................................................... 178
Manual Adjustments ................................................................................................ 180
Risk .......................................................................................................................... 182
Production and Inventory Control Policy ............................................................... 185
Limitations .................................................................................................................. 192
Conclusion................................................................................................................... 192
Chapter 5: Conclusions ................................................................................................... 197
Appendix A: Boolean Keywords for EBSCO Business Source Complete..................... 233
List of Tables
Table 1: Predicted Weather Indicator Point Forecast Reliability ..................................... 76
Table 2: Predicted Weather Indicator Probability Forecast Reliability ............................ 77
Table 3: Restaurant Sample by NOAA Region ............................................................... 93
Table 4: Mean Demand Forecast Quality Measures by Included Exogenous Weather
Variable ............................................................................................................................. 90
Table 5: Mean Demand Forecast Quality Measures by NOAA Region for Evaluation
Models............................................................................................................................... 91
Table 6: Mean Demand Forecast Quality Measures by Menu Category for Evaluation
Models............................................................................................................................... 91
Table 7: Regression Effects of Observed Weather in Demand Forecasts on Forecast
Quality Measures .............................................................................................................. 95
Table 8: Regression Effects of Predicted Weather in Demand Forecasts on Forecast
Quality Measures .............................................................................................................. 95
Table 9: Summary of Hypothesis Test Results ............................................................... 106
Table 10: Journals Included in Initial Keyword Search................................................. 129
Table 11: Keyword Search Results in Initial and Cascaded Search .............................. 132
Table 12: Types of Analysis in Article Sample ............................................................. 133
Table 13: Forecast Accuracy Measures Explored in Article Sample ............................ 134
Table 14: Focal Supply Chain Entities in Article Sample ............................................. 135
Table 15: Technical and Managerial Drivers of Forecast Accuracy Bounds ................ 136
Table 16: Summary of Technical Drivers of Demand Forecast Accuracy ..................... 159
Table 17: Summary of Managerial Drivers of Demand Forecast Accuracy .................. 191
List of Figures
Figure 1: Restaurant Firm Data Flows .............................................................................. 10
Figure 2: Prevalence of Deviation Existence by Governance Form ................................. 41
Figure 3: NOAA Climate Regions ................................................................................... 92
Chapter 1: Introduction
The origin of this research is a collaboration with a fourth party logistics (4PL)
provider for a large multinational quick service restaurant firm. Through the Global
Supply Chain Forum at The Ohio State University, we met with the 4PL and discussed
research opportunities. The firm had been experiencing three related issues in their
demand planning. After discussing their concerns, we determined that their issues
coincided with under-investigated themes in logistics research all relating to demand
forecast accuracy. These three issues were then transformed into three sets of research
questions with the aim of creating a better understanding of demand forecast accuracy.
The following research is divided into three essays. Each essay relates to a distinct but
related aspect of forecast accuracy.
The first essay stems from the first question posed by the 4PL: “What is causing
our replenishment forecast error?” This question relates to the way that the restaurant
firm generated replenishment forecasts. The 4PL first collaborates with their restaurant
firm client to generate a demand forecast for each menu item at each restaurant. This
occurs each week on a rolling basis. Each restaurant location is given the estimate for
demand for the upcoming week, and is responsible for purchasing the raw materials required
to meet the predicted demand.
To inform this process, the 4PL generates a forecast for twice-weekly replenishment
derived from their own demand forecasts. This estimate includes safety stock and waste
factors peculiar to each restaurant location. For instance, lead time from a distributor and
available storage space may affect how much can be stored at one time, and how much
may be discarded at each restaurant. By policy, each restaurant receives a supply
shipment twice each week from one of the firm’s distribution partners. The individual
restaurant, as the entity most affected by errors in a replenishment forecast and
presumably in the best position to understand local conditions not captured in a statistical
forecast, is given the final say on replenishment orders.
The problem that arises is that while restaurant order edits may help the individual
restaurant, they introduce variance upstream in the supply chain. Distributors, who also
receive predicted replenishment orders from the 4PL, have to react to changes in
replenishment orders initiated at the restaurant level in order to meet contractually
mandated service levels. This increase in variance increases required levels of safety
stock at distribution centers, as most often distributors have already placed orders with
their suppliers by the time restaurant replenishment orders are made. This is costly not
only because inventory and warehousing costs increase, but as the products are largely
perishable, increased stock levels also increase product loss to spoilage.
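The link between order variance and safety stock can be illustrated with the standard textbook safety stock approximation, which is not taken from this research. The following is a minimal sketch, assuming normally distributed demand and a fixed lead time; the function name, service level, lead time, and demand figures are all hypothetical.

```python
import math
from statistics import NormalDist


def safety_stock(service_level, demand_std, lead_time_periods):
    """Textbook approximation: z * sigma * sqrt(L).

    Assumes independent, normally distributed per-period demand and a
    fixed lead time; all inputs used below are hypothetical.
    """
    z = NormalDist().inv_cdf(service_level)  # z-score for the target service level
    return z * demand_std * math.sqrt(lead_time_periods)


# Hypothetical distributor: 98% service level, lead time of 4 periods.
baseline = safety_stock(0.98, demand_std=100.0, lead_time_periods=4)
# Suppose restaurant order edits raise the standard deviation of orders by 30%.
with_edits = safety_stock(0.98, demand_std=130.0, lead_time_periods=4)

print(f"baseline: {baseline:.0f} cases, with edits: {with_edits:.0f} cases")
```

Under these assumptions safety stock scales linearly with the standard deviation of orders, so any variance introduced by order edits translates directly into additional perishable inventory held at the distribution center.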
In our first essay, we examine the potential reasons why restaurants edit their
orders in a data exploration case study. We frame the investigation around the theme of
agency costs arising from governance form. Previous works (Brickley and Dark 1987,
Norton 1988a, Bertagnoli 1989, Krueger 1990, Carney and Gedajlovic 1991, Kaufmann
and Lafontaine 1994, Michael 2000, Yin and Zajac 2004, de Leeuw, Holweg and
Williams 2011) indicate franchise owners are more likely than corporate outlet managers
to act independently of the best interests of the parent franchisor firm, and consistent with
local incentives. Profit-motivated franchise owners are also less likely than
revenue-motivated corporate outlet managers to shirk, and more likely to actively
monitor costs and waste less (Rubin 1978, Krueger 1990, Noren 1990, Norton 1988a, Norton 1988b).
This first work, titled “The Effect of Governance Form on Replenishment
Forecast Deviation and Bias: A Case Study in the Quick-Service Restaurant Industry”,
promises to answer the 4PL firm’s question of what may be driving their replenishment
forecast error. In it, we simultaneously address a deficiency in management literature on
the post-contractual operational effects of utilizing the franchise governance form.
The second question posed by the 4PL, and the genesis of our second essay, is:
“What predictive factors can help improve our demand forecast accuracy?” The firm was
eager to incorporate additional information into their demand forecasts, but was unsure
of what information was valid to include, and what effect it may have on their forecast
accuracy.
The firm’s demand forecasts, as described above, are generated once each week
and cover two replenishment periods. To be predictive, information must be available to
the decision maker prior to a decision being made. This means that demand planners at
the 4PL must have knowledge of events up to and including one week in the future in
order to effectively include the information in a demand forecast.
We frame our second essay around the inclusion of uncertain information in a
demand forecast, and our prediction of whether and how consumers will respond to these
external factors. For predictive information to hold value, it must be sufficiently reliable
over a forecast horizon, and managers must be able to make material changes as a result
of the information (Thompson and Brier 1955, Thompson 1962, Murphy 1977, Katz and
Murphy 1990, Katz and Lazo 2011). One readily available source of predicted future
information is that of short-term weather. Weather sensitivity research, though well
established in fields like agriculture and mining, is lacking in restaurant and service
industries (Lazo et al. 2011, Bujisic et al. 2016). Work that incorporates weather
forecasts into demand forecasts is similarly sparse, and currently exists only in
industries unrelated to the focal firm.
In our second essay, titled: “The Impact of Including Forward Indicators on POS
Demand Forecast Accuracy: The Case of Short-Term Weather Forecast Data”, we
measure the effect of including predicted weather on demand forecast accuracy. We
predict demand effects are driven by consumer mood and utility (Steele 1951, Starr-
McCluer 2000, Tran 2016), and that this effect is heterogeneous across regions and
product categories.
This not only answers the 4PL’s question of whether this particular set of forward
indicators can improve their demand forecasts, but also addresses two deficiencies in
management literature. Namely, it more systematically addresses what effect weather
has on demand in the restaurant industry, and what the positive or deleterious effects
are of using uncertain weather information to generate demand forecasts.
The third question from the 4PL firm, “How good is good enough?”, is the origin
of our third essay. By “good enough”, they mean that they wish to know when the
pursuit of greater accuracy in demand and replenishment forecasts is a lost cause. We
recognized that there does not appear to be any work that addresses this question
holistically, and endeavored to map out a current understanding of all potential bounds
for demand forecast accuracy.
To this end, we sought to answer this question through a systematic review of
logistics and supply chain management literature. In our third essay, titled: “A Systematic
Literature Review and Typology of Factors that Bound Demand Forecast Accuracy”, we explore
relevant supply chain and logistics academic journals in an attempt to identify a typology
of factors that form tradeoffs in forecast accuracy.
Relevant search topics aim to identify conditions that drive requirements and
capabilities for greater forecast accuracy, but also situations when it is only possible or
indeed desirable to accept lower levels of forecast accuracy. As the voluminous literature
on demand forecasting can be quite diverse in focus, it was imperative that we properly
scoped our search to those topics most relevant to a logistics or supply chain setting.
General conditions for consideration included the effects of error signal amplification
through a supply chain, cost tradeoffs of information gathering and processing, level of
aggregation, impact of manual forecast adjustments, effect of information sharing, effect
of accuracy metrics used, effects of overfitting, degree of supply chain integration and
collaboration, length of lead time and forecast horizon, the impact of product substitution,
supply chain flexibility and resilience, and effect of demand variability. Together, this
body of literature on tradeoffs in various dimensions of forecast accuracy is formed into a
general typology. Such a framework can be used to inform future research defining
individual tradeoffs as well as the interplay between multiple factors, and it identifies
the types that require further investigation for full understanding. This
typology also serves as a self-assessment tool for the 4PL, addressing their third question.
In answering the three questions posed by our 4PL research partner: “What is
causing our replenishment forecast error?”, “What predictive factors can help improve
our demand forecast accuracy?”, and “How good is good enough?”, we also fill shortfalls
that currently exist in management literature in three dimensions. Our first essay
explores factors that drive error in upstream forecasts, framed around governance form
effects on post-contractual performance. Our second essay examines the effect of
including uncertain predictive factors in demand forecasts, framed on information
uncertainty and heterogeneous consumer response. Our final essay reviews the current
academic understanding of factors that form trade-offs in demand forecast accuracy, and
proposes a typology for future research development. All three essays include forecast
accuracy as their focal construct, and contribute to a better understanding of factors that
affect it.
Chapter 2: “The Effect of Governance Form on Replenishment Forecast Deviation and
Bias: A Case Study in the Quick-Service Restaurant Industry”
Introduction
The strategic decision to outsource the rights to use one’s business model and
brand to an outside party through franchise agreements is one that is made by many
different types of firms. As of the 2012 census, franchisors account for 12.1% (560,086)
of all establishments and 17.4% ($1.454 Trillion) of all revenue among U.S.-based
businesses (Census Bureau 2012). As a result, researchers have developed a body of
literature focused on the antecedents of and causes for the managerial decision to pursue
this business approach. In the process they have identified firm, market, and situational
characteristics that significantly contribute to the ex-ante development of franchise
governance form. However, to date, an examination of the ex-post supply chain
performance characteristics of firms pursuing this business model is lacking.
Franchise agreements are developed with the intent of maximizing realized long-term
profits to a parent firm. Management literature to date has identified multiple
potential drivers for pursuing a franchise business model, and the vast majority
of this research has been dedicated to identifying the significant antecedents of a firm’s
decision to implement this organizational form. But how might the use of the franchise
governance form influence the supply chain performance of the firm after a contract is in
place? In their comprehensive review of recent management literature on franchising,
Combs et al. (2010) note a dearth of “operations management” franchising research.
Specifically, they identify no articles exploring how parent firms can create value for
their franchisee clients (and vice versa) via supply chain activities. de Leeuw et al.
(2011) examined inventory performance in franchised business units in the automotive
industry. While they did not directly compare forms of governance, they did observe
differential performance between distributed outlets with decentralized inventory control.
The heterogeneity stemmed from incentives unique to each outlet. As a result, they call
for research comparing inventory policies in devolved supply chains, which most notably
includes firms employing the franchise governance form.
In an effort to understand the operational supply chain implications of pursuing a
franchise model, we measure whether governance form has a significant effect on
replenishment forecast deviation and bias beyond what can be explained by other factors.
We examine rates of deviation and bias in the replenishment forecast of a focal firm that
utilizes both company-owned and franchised outlets, and discuss the ramifications of
governance form with respect to outlet inventory and replenishment policy. To
accomplish these research goals, we collected distribution center level replenishment
forecast and inventory data from a major U.S.-based restaurant chain that utilizes
company-owned and franchised retail outlets. The analysis will identify whether
franchised outlets experience higher levels of replenishment forecast deviation and
whether this deviation has a greater negative bias than their company-owned
counterparts, when controlling for other factors. We will observe whether these results
are consistent with extant management literature on governance form as a result of
agency costs. Theoretical and managerial implications of these findings will then be
described.
Research Context
The origins of this study stem from an ongoing relationship with corporate
members of a university-affiliated research group. Through the group, we started
working with the primary fourth party logistics (4PL) provider for a major international
quick service restaurant to develop research questions that simultaneously address
relevant theoretical management issues and solve multiple supply chain issues their major
restaurant customer was experiencing. One such problem was significant order deviation
by individual restaurant outlets in their centrally developed replenishment forecasts.
Managers and owners at individual restaurant locations are permitted the freedom to
deviate from centrally planned order levels due to the likelihood that they would have
greater knowledge of local conditions that would drive demand. While this policy is
likely to improve operations at the outlet level, deviations drive uncertainty in
replenishment and are viewed as error by distributors.
The restaurant firm in our research operates almost 15,000 retail outlets
domestically, of which more than 80% are contracted to franchise companies. The firm
utilizes the aforementioned 4PL to centrally develop sales forecasts and plan
replenishment for all restaurant outlets, and five third party logistics (3PL) companies
provide warehousing and distribution services for the restaurant chain. The restaurant
chain manages national and regional promotions as part of their demand planning
process, but each outlet may add local promotional activity that tends to be underreported
and can impact centralized forecast accuracy. All involved firms, i.e. the restaurant firm,
its 4PL provider, its 3PL providers, and the franchise companies, utilize a management
information system (MIS) curated by the 4PL. Each entity’s relevant MIS data are
visible to at least the adjacent link in the supply chain. Figure 1 illustrates the relevant
data flows within and between firms.
Figure 1: Restaurant Firm Data Flows
With this complex replenishment process involving so many parties, we had to
determine our best course of action to identify potential causes of deviation in the
replenishment forecast. Interviews within the 4PL firm and with one 3PL provider
indicated that demand information asymmetry (Lee et al. 2000, Cachon and Fisher 2000)
was not likely a cause of deviation, as multiple echelons share data that are updated daily.
Nor were delays in information (Chen 1999), as MIS data are updated daily reflecting
both distribution center (DC) and restaurant inventory levels, as well as the projected
effect of daily orders. Anecdotally, data entry errors, particularly at the individual
restaurant level, introduce some noise in the replenishment process and may lead to
deviation, but this did not appear to be a primary source. Interviews with the restaurant
firm’s promotions team, as well as with one 3PL distribution team, indicated restaurant
governance form could be a source of deviation. Specifically, these teams perceived
franchised outlets to exhibit higher levels of deviation and a more positive bias in their
deviations than the restaurant company-owned outlets. The restaurant firm’s financial
disclosures corroborated this account by indicating outlet governance form was a
significant operational risk factor, as franchised locations were seen as more likely to
deviate from corporate standard procedure.
The logic in permitting both franchised and company-owned outlets the discretion
to edit the centrally developed replenishment forecasts is that outlet-level management
are more likely to have an understanding of local demand conditions (including locally
planned promotions) beyond what is captured in the statistical forecast. This notion is
heavily supported in forecasting literature (Fildes et al. 2008, Syntetos, Boylan and
Disney 2009, Eroglu and Knemeyer 2010, Eroglu and Croxton 2010), and subjective
adjustments are often incorporated in the demand planning process. However, any order
from the outlet that differs from the forecast replenishment amount constitutes a deviation
that must be accounted for with buffer stock at the DC (Zinn et al. 1989, Gardner 1990).
Each 3PL-operated DC receives a recommended buffer stock level from the 4PL
planning firm, but has the discretion to establish their own stock levels. These DC
operators are contractually obligated to meet service level requirements of all company-
owned and franchised outlets, and are responsible for any charges incurred from stock-
outs and emergency shipments. The result of deviation by the restaurant outlet operator
is that the restaurant firm’s supply chain holds excessive system-wide inventory across
echelons. In addition to incurring unnecessary holding costs, many stock keeping units
are perishable or tied to a finite promotion, so some portion of excess inventory will
constitute an irrecoverable loss. DC operators will inevitably pass these costs on to their
supply chain partners.
Replenishment forecasts are currently developed in the same way regardless of
governance form. While this may lower the cost and complexity of demand management
for the firm, management literature on governance form suggests that franchisees may
behave differently when it comes to replenishment, which contributes to additional
inventory carrying costs at the next higher echelon in the supply chain (de Leeuw et al.
2011). As residual claimants of their local business unit, franchisees are more likely than
their firm-employed managerial counterpart to behave in a manner that maximizes their
local interests (Lafontaine 1992, Michael 2000, Kaufmann and Lafontaine 1994, Shelton
1967, Krueger 1991, Brickley and Dark 1987, Rubin 1978). To date, studies have used
aggregated factors to describe ex-ante determinants of organizational form. This research
is positioned to identify specific ex-post performance effects of the choice of
organizational form.
Literature Review
In order to understand the franchising governance form and the performance
implications of its adoption, we first examine extant literature on governance form.
Many possible theoretical explanations for the strategic decision to pursue a franchise
approach have been posited; most prominently transaction costs, resource scarcity and
agency costs. We review each briefly, but derive our primary theoretical motivation from
agency theory.
Transaction Costs
Transaction cost theorists see franchisor firms as an interstice or hybrid of pure
vertical integration and the open market (Rubin 1978, Norton 1988a, Norton 1988b).
Caves and Murphy II (1976) describe franchising as an “intracorporate management of
decentralized units”. Ex-ante investment in a franchise by an entity legally separate from
the firm acts as a “hostage” in Williamsonian terms, bridging the gap between a pure
market and internal corporate transactions (Noren 1990). While most authors agree that
this hybrid form exists, the argument for causal mechanisms that drive the level of firm-
like or market-like characteristics of this organization form, as well as the extent to which
a firm chooses to franchise their outlets tends to split between a resource based view and
an administrative efficiency view.
Resource Scarcity
The idea that a firm chooses to franchise its outlets as a result of a lack of access
to resources, which are then supplied by franchisees, was originated by Oxenfeldt and
Kelly (1969), and further asserted by Caves and Murphy II (1976). Rooted in the
resource based theory of the firm, franchising allows for firm growth even when
resources for expansion are scarce. Rapid growth is viewed as desirable, as it allows a
firm to build brand capital, take advantage of a market opportunity to seize a larger share
of total demand, or achieve economies of scale in promotion and production (Carney and
Gedajlovic 1991). Growth of a firm permits further acquisition and dynamic
replenishment of the available pool of productive resources (Penrose 1959), and is
limited primarily by the market, capital resources and managerial resources.
Due to the problem of adverse selection, in which lenders have inadequate cues to
evaluate the availability of alternatives, Combs and Ketchen (1999) assert franchising can
alleviate the problem of resource scarcity by giving firms access to capital that can be
less costly than capital from debt and equity markets. Firms may also access additional
entrepreneurial capacity by changing investors into managers. Franchising transfers
increased residual ownership and risk to managers, eliminating the need for a loan or
capital market investment while controlling net monitoring costs, i.e. those associated
with observing and controlling actions of outlet managers (Norton 1988a). Carney and
Gedajlovic (1991) and Mathewson and Winter (1985) both support this resource scarcity
view, but in an effect moderated by administrative efficiency and stage of firm
development.
However, multiple studies refute the logical development of a resource scarcity
based motivation for franchising. Franchising as a capital scarcity expedient is countered
with the proposition that passive investment is diversified over all of a firm’s outlets,
presumably demanding a lower return on this lower risk (Rubin 1978, Brickley and Dark
1987, Lafontaine 1992). Detractors found either no significant effect from scarcity on
organizational form, or found that effect to be dominated or heavily moderated by the
effect from agency costs (Rubin 1978, Brickley and Dark 1987). In a meta-analysis of 44
previous studies on the causes of governance form choice, Combs and Ketchen (2003)
conclusively redirect the argument, finding no significant resource scarcity drivers for the
choice of organizational form. They did, however, find numerous significant agency
costs that may help explain the choice to franchise.
Agency Costs or Administrative Efficiency
For this research, the most relevant theoretical explanation for the determination
of governance form is agency theory, which describes the two-sided moral hazard that
exists between principals (firm owners) and agents (firm employees). In the context of
this research, this hazard exists between the parent restaurant firm central planners and
outlet owners or managers. Both work toward their own self-interest where possible, but
agency loss can be mitigated through bonding by the agent and monitoring by the
principal. Franchising is a hybrid of these two control mechanisms (Brickley and Dark
1987).
Franchise bonding occurs by transferring residual risk and claim to a legally
(though not practically) separate firm (Rubin 1978). This is typically in the form of an
ex-ante investment by the franchisee, usually as a franchising fee and a non-returnable
real investment, who in turn receives excess rents as the principal of their own firm.
Bonding may also include elements beyond simple claims on expected profit, such as
supernormal ex-ante and ex-post rents left unclaimed in franchise contracts (Kaufmann
and Lafontaine 1994), expectation for contract renewal, or promise of additional lucrative
contract award (Bertagnoli 1989).
This method of ex-ante bonding investment is intended to capture all of the
expected excess profits for the parent firm after discounting a reasonable return to the
franchisee, monitoring costs and residual agency costs such as those related to shirking.
Due to bounded rationality (or the inability of the parent firm to predict all future profits
with certainty), incomplete franchise agreements (Mathewson and Winter 1985)
necessitate ex-post royalties that are some percent of revenue (Caves and Murphy II
1976, Rubin 1978, Noren 1990, Combs and Ketchen 2003) in addition to the initial fee.
Bonding is a less direct means of minimizing the two-sided moral hazard, in
which it is difficult or expensive for the principal and agent to observe the other’s actions,
by both eliminating the agent’s incentive to shirk (Bertagnoli 1989, Noren 1990, Krueger
1991, Carney and Gedajlovic 1991, Michael 2000) and ensuring the principal’s continued
assistance and interest in the outlet’s performance (Rubin 1978, Combs and Ketchen
2003).
Post contractual monitoring is the primary mechanism designed to eliminate the
remaining moral hazard after bonding costs are established in the contract (Rubin 1978,
Fama and Jensen 1983, Yin and Zajac 2004, Paik and Choi 2007). Monitoring refers to
continued costs related to observing and controlling agent actions after contract formation
and bonding, which could include formation and dissemination of corporate policies to
outlets, or travel and technology costs to observe outlet behavior. This is a more direct
approach that is effective when there is direct coercive control and outlets are easily
accessible. Where monitoring is relatively more expensive, such as when outlets are at
greater distances from a corporate headquarters or in a market with unknown or transient
conditions, bonding is substituted (Fama and Jensen 1983, Norton 1988a, Norton 1988b,
Lafontaine 1992, Combs and Ketchen 2003). Monitoring costs are found to be more
significant among franchised outlets (Brickley and Dark 1987, Norton 1988a, Norton
1988b, Lafontaine 1992, Combs and Ketchen 2003), but some elements of cost are
mitigated in recent years by advances in telecommunications and MIS (Yin and Zajac
2004, Cochet et al. 2008).
After implementing both control mechanisms, there remains a residual loss as a
result of self-interested actions that could not be controlled (Mahoney 2000, Jensen and
Meckling 1976). Firms with high monitoring costs choose the franchise governance form
more often, and in doing so substitute bonding costs to a greater extent.
Therefore, residual agency loss among franchised units is more likely to come as a result
of inefficient bonding than inefficient monitoring. We should expect the manner of
royalty collection between the two governance forms to reflect this difference in
significance.
Managers of firm-owned outlets are typically compensated on a fixed or revenue-
based scale, so additional costs from inefficiencies, shirking and perquisite taking do not
significantly affect their personal income. This revenue basis of evaluation is due to the
fact that it is difficult for parent firms to attribute drivers of revenue or cost to actions of a
manager independent of the firm. Franchisees, on the other hand, typically pay royalties
based on revenue and claim any realized profit (Rubin 1978, Krueger 1991). This
increased claim comes with increased risk. Franchisees, who may have large proportions
of their personal wealth tied up in an outlet, and may be restricted in their property rights
to quickly sell assets (Noren 1990), directly feel any costs from inefficiencies. Given this
greater risk experienced by franchisees, they have a greater motivation than managers at
corporate outlets to actively monitor their own operations. An example of this is in labor
costs. Franchise governance form is positively related to cost structures with large labor
components (Norton 1988a, Norton 1988b). Presumably as a result of more engaged
management, franchise employees are found to report higher levels of adequate managerial
supervision, and outlets experience lower rates of perquisite taking, lower mean wage,
and lower rate of wage increase (Krueger 1991).
Though residual loss components are more prevalent in choosing the franchise
model, inefficient risk bearing, free ridership, and quasi rent appropriation drive the use
of the corporate owned outlet. Increasing risk (and thereby interest and effort) of the
franchisee may cause inefficient risk bearing, or the unwillingness to make optimal
investments as a result of heavy undiversified investment in a single outlet. This cost is
found to be more significant in choosing the franchise governance form (Brickley and
Dark 1987, Carney and Gedajlovic 1991). Underinvestment may also relate to what is
known as the externality or free rider problem, where the franchisee bears less risk from
underinvesting in advertising, customer experience, and overall quality (Rubin 1978,
Mathewson and Winter 1985, Brickley and Dark 1987, Carney and Gedajlovic 1991,
Michael 2000). This effect is found to manifest among franchisees of a parent firm with
high brand value who receive a significant portion of their patronage from customers
unlikely to return, as is the case with business catering to travelers (Norton 1988b),
though counterexamples to this also exist (Brickley and Dark 1987). Free riding loss
can be based on brand value of the parent firm, or on local reputation (Mathewson and
Winter 1985). Finally, residual loss may include quasi-rent appropriation (Brickley and
Dark 1987, Carney and Gedajlovic 1991, Michael 2000). In this case, franchisees may
own assets (as part of the business) that hold higher value with alternative uses, and use
this as leverage in contract renegotiation. Alternatively, franchisees may demand lower
initial fees or royalty rates for higher levels of asset specificity.
Post-Contractual Performance
The majority of franchising research is focused on the ex-ante determination of
governance form. However, some more recent work examines the post-contractual
operational performance implications of the franchise governance form. The opportunity
for the current research lies in the ex-ante incompleteness of contracts and the inability to
perfectly monitor ex-post, allowing for residual agency loss. The agency loss
component, found to be most prevalent in choosing the franchise outlet form, also serves
as the basis for explaining ex-post performance. Existing research on ex-post
performance takes on three dimensions: plural form, relational governance, and the fit of
strategy to organizational form.
The first dimension of franchise performance research is the limits to adoption of
organizational form. In other words, firm performance is observed from the perspective
of plural form, or one where neither franchised nor firm-owned outlets completely
dominate. Research on choice of organizational form describes a plural arrangement, but
only as a result of balanced agency factors, transaction costs, or resource limitations.
This economic equilibrium view does not recognize that under the plural form, a firm is
able to symbiotically exploit unique performance characteristics of both firm-owned and
franchised outlets. The plural arrangement permits system-wide uniformity, but also
adaptation to local opportunities (Bradach 1997, Yin and Zajac 2004). Through
performance benchmarking referred to as ratcheting, both organizational forms benefit
from the advantages of the other (Bradach 1997). Managers at firm-owned outlets,
motivated by promotion within a corporate hierarchy, are incented to maintain standards
and contribute to brand value. Conversely, franchisees have more leeway to try new
strategies (Rubin 1978, Bradach 1997, Yin and Zajac 2004, Paik and Choi 2007), and are
motivated by residual claims to innovate. More precisely, franchisees are more willing to
deviate from corporate guidance if they believe it will contribute to their outlet’s profit
(Brickley and Dark 1987, Norton 1988a, Bertagnoli 1989, Krueger 1991, Carney and
Gedajlovic 1991, Kaufmann and Lafontaine 1994, Michael 2000, Yin and Zajac 2004, de
Leeuw, Holweg and Williams 2011). The parent firm, monitoring via integrated
information systems, has the capability to benchmark franchisee performance and adapt
corporate strategy if franchisee deviations prove effective. Exploiting the advantage of
franchisee adaptability can then be achieved by fiat at corporate owned outlets.
If, however, the parent firm observes superior performance at corporate owned
outlets relative to franchised outlets, it has limited power to coerce franchised outlets to
adopt the effective policy. Contracts permit greater freedom to franchisees, and
monitoring is relatively more expensive for franchisees. This introduces the second
dimension of post-contractual performance, relational governance. Marketing channel
theorists posit that non-coercive interactions are more effective at influencing behavior
with corporate partners that parent firms have limited contractual influence over (Paik
and Choi 2007, Cochet et al. 2008). Bradach (1997) supports the notion that persuasion
is far more effective than threat of contract termination or even monitoring for franchised
units. High performance of corporate outlets can be a persuasive tool to convince
franchisees to similarly uphold standards.
The third dimension of post-contractual performance is fit of form and strategy.
Contingency theorists observe that the choice of plural form exists as a result of matching
compatible strategies and outlet forms (Bradach 1997, Yin and Zajac 2004, Barthélemy
2008). Outlets have agency costs that are more significant to each form, and advantages
that are specific to their form. Performance depends on developing a strategy that
minimizes the form-specific residual loss and maximizes the advantages of the plural
form.
Hypotheses
The literature predicting governance form informs the extension to the effect of
governance form on operational performance. By determining sources of residual agency
loss, we can suggest remedies tailored for the conditions of incomplete control as in the
franchise contract model. In this way, we will contribute to research on post contract
performance.
Research on the agency theoretical drivers of governance form and more recent
work on performance implications of franchising indicate that franchisees, as residual
claimants of their local business unit have incentives that focus on local profit rather than
the goals of the parent firm and third party distributors, and as a result are more likely to
act independently of the franchisor (Brickley and Dark 1987, Norton 1988a, Bertagnoli
1989, Krueger 1991, Carney and Gedajlovic 1991, Kaufmann and Lafontaine 1994,
Michael 2000, Yin and Zajac 2004, de Leeuw, Holweg and Williams 2011). Although the
limited previous research on the performance outcomes of franchising focuses on
aggregated external business measures and not internal indicators, this notion is
supported by discussions with the 4PL firm and 3PL DC operator. Therefore, we
hypothesize that franchised outlet owners are more likely to order supplies in quantities
different from the forecasted replenishment.
H1: Replenishment forecast deviation from proposed order levels by a restaurant
will be relatively higher among restaurants owned by a franchisee than those
corporately owned.
Similarly, franchising researchers have indicated that franchisees are motivated by
profit and not revenue like their corporate manager counterparts. This gives them
incentive not to shirk, more actively monitor costs, and waste less (Rubin 1978, Norton
1988a, Norton 1988b, Noren 1990, Krueger 1991). This would imply that bias in
replenishment forecast adjustments would be negative and would help the local performance
of a restaurant outlet. However, there are alternative explanations for this hypothesis, and
potential for a competing hypothesis.
An alternative explanation for negative bias supported by prior literature is that of
incomplete accounting for residual agency costs. Inefficient risk bearing by franchisees
could include inventory underinvestment in addition to the more traditional concerns of
capital and marketing underinvestment (Brickley and Dark 1987, Carney and Gedajlovic
1991). Free ridership by franchisees could also be an explanation for negative bias in
cases where repeat patronage is low or if local competition is not significant. In these
cases, negative bias would reflect scarcity rather than efficiency, and hurt local
performance. The effect for upstream distributors would be the same, however, so we
include this as merely an alternative explanation for our existing hypothesis rather than a
separate competing hypothesis. If a franchise owner observes a centrally developed
forecast higher than their own perceived knowledge of projected demand, they are more
likely than their corporate manager counterpart to revise their replenishment order
downward.
H2: Bias in replenishment forecast deviations will be relatively more negative
among restaurants owned by a franchisee than those corporately owned.
A competing hypothesis arose from interviews with the focal restaurant firm’s
promotions team as well as from one 3PL distribution team. They indicate positive
expected bias in replenishment order deviations driven by a desire to minimize lost sales
by franchise owners. Prior literature could provide support for this view, as particularly
in cases where local competition is high (Cochet et al. 2008) and repeat patronage is
likely (Norton 1988b), service level and market share considerations could increase
positive replenishment order deviation bias among franchisees to the point of dominating
the negative effects described above. We do not measure levels of competition or travel
intensity (indicating degree of free ridership) in this work, so do not include this as a
formal hypothesis. This could, however, serve as a direction for future research if H2 is
not supported.
Model Development
To test these hypotheses, we built models to predict replenishment forecast
deviation and bias with the primary predictor being restaurant outlet governance form
(𝐺𝑂𝑉), defined and measured as the binary existence of a franchise contract at an outlet
(0 indicates a corporate restaurant and 1 indicates a franchisee owned restaurant).
Franchised locations are treated as equal, even if a single franchise owner operated
several restaurants.
Data Collection and Analysis
We began collecting data with the aim of gathering a sample representative of the
almost 15,000 distributed outlets and 8,100 stock units. In the population, there was the
potential for more than 120 million daily transactions to evaluate. Data also were stored
in multiple segregated repositories, and included a mixture of numeric and non-numeric
data. With this combination of volume, variety and velocity, our data were consistent
with the most widely used definition for “big data” (McAfee and Brynjolfsson 2012,
Megahed and Jones-Farmer 2013, Kitchin and McArdle 2016).
While the “bigness” of data provided tremendous opportunities to explore
unexpected relationships, it presented unique challenges and demanded a different
analytical approach. Aside from technical and computational limitations related to the
scale of data, described separately in a working paper (Smyth et al. 2017), there are two
interrelated inferential challenges in big data analysis. First, the increased size and
complexity of big data drive exponential increases in the prevalence of false positives
(Waller and Fawcett 2013a). That is, both unsupervised and supervised machine learning
algorithms identify (typically) correlative relationships that may have no practical or
theoretical meaning (known as epiphenomenality), may be confounded by other
unexplored factors (spuriousness), or if practically related, may have a causal implication
that is the exact reverse of reality (Darlington and Hayes 1990). The second challenge
arises when researchers treat big data as any other, and pursue traditional hypothesis
testing and analytic methods. Though theoretically grounded, such tests will almost
certainly produce a so-called statistically significant result, as the statistical power of a
test increases with the size of data (Hair et al. 2006, Wooldridge 2015).
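The second challenge can be illustrated with a minimal sketch, assuming a two-sample z-test with known and equal variance, equal group sizes, and a fixed, practically negligible difference in means. None of these values come from the study data, and the function name is hypothetical; the point is only that the same tiny effect moves from "not significant" to "highly significant" as the sample grows.

```python
import math


def two_sided_p_value(mean_diff, sigma, n_per_group):
    """Two-sided p-value of a two-sample z-test (illustrative assumption:
    known common standard deviation and equal group sizes)."""
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect: one hundredth of a standard deviation.
for n in (1_000, 100_000, 1_000_000):
    print(n, two_sided_p_value(mean_diff=0.01, sigma=1.0, n_per_group=n))
```

Because the effect size is held constant, any change in the p-value across rows is attributable to sample size alone, which is why significance by itself says little about practical importance in data of this scale.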
For these reasons, both Waller and Fawcett (2013b) and Cotteleer and Wan
(2016) suggest pairing theoretic grounding with big data exploration. A-priori theorizing
limits false positives in exploration, and exploration lends credibility to an observed
effect by providing necessary context. As such, we explored the additional relationships
in our data sample to augment the hypothesis testing of the effect of governance form on
replenishment forecast deviation and bias.
To accomplish this task, we required data that was not just “big”, but also
multidimensional. Only under this condition do complex and unexpected patterns arise.
As a result, our goal was to bring in enough relevant explanatory variables so we could
train a process of exploratory knowledge extraction on a representative subset, which
could then be applied in cases of larger data in an automated fashion. Data training is a
common approach when developing a supervised machine learning algorithm for labeled
data (Megahed and Jones-Farmer 2013), and is consistent with Waller and Fawcett’s
(2013b) call to combine big data methods with theoretically grounded empirical research.
Variable Evaluation and Selection
Our outcome variables of interest, replenishment forecast deviation (𝐷𝐸𝑉 ∈ ℤ+)
and bias (𝐵𝐼𝐴𝑆 ∈ ℤ), were both derived from individual replenishment transaction data
furnished by the 4PL. Replenishment orders are in units of cases of menu item
ingredients. Deviation is the absolute difference between recommended and ordered
amounts, and bias is simply the signed difference. Replenishment forecast deviation and
bias were internally tracked by stock keeping unit, individual restaurant outlet, and daily order.
Restaurant outlet-level replenishment forecasts and actual ordered quantity were
available from a single database.
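As an illustration only, the two outcome measures can be sketched in Python as follows. The record layout and field names are hypothetical, and the sign convention for bias (ordered minus proposed, so that an order revised downward yields a negative value) is an assumption inferred from the wording of the hypotheses rather than a definition taken from the 4PL data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplenishmentLine:
    """One outlet/SKU/day order line (hypothetical layout)."""
    outlet_id: str
    sku: str
    proposed_cases: int  # centrally recommended quantity, in cases
    ordered_cases: int   # quantity the outlet actually ordered, in cases


def bias(line: ReplenishmentLine) -> int:
    # Assumed sign convention: negative when the outlet orders less
    # than the centrally proposed amount.
    return line.ordered_cases - line.proposed_cases


def deviation(line: ReplenishmentLine) -> int:
    # Deviation is the absolute value of the same difference.
    return abs(bias(line))


example = ReplenishmentLine("outlet-001", "sku-123", proposed_cases=10, ordered_cases=7)
print(bias(example), deviation(example))
```

Under this convention, orders that overshoot and undershoot the proposal by the same amount have equal deviation but opposite bias, which is why the two measures are modeled separately.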
In an effort to relate this research to the limited extant research on post-
contractual performance in firms with franchised or plural form, we also strove to
identify common predictors and control variables from extant literature. However, due to
the nature of the data in existing research, few commonalities emerged. Previous work
either relied on perceptual measures, did not measure governance form, was at a higher
level of aggregation, or measured information unavailable to us such as multiunit
ownership among franchisees (Bradach 1997, Paik and Choi 2007, Barthélemy 2008).
Only Yin and Zajac (2004) utilize governance form as a variable. Theirs is also the only
existing study that includes data that is directly measured, rather than perception-based.
As a result of this limited comparability, we derive control variables and later exploratory
constructs primarily from within the available temporal, geographic, and product-based
structure characteristics of the available transaction data. In their review of the supply
chain forecasting literature, Syntetos et al. (2016) note these grouping dimensions have
been shown to effectively account for demand variance, and multiple papers have utilized
each to find significant effects (Mentzer and Cox 1984, Fliedner and Lawrence 1995,
Zhao et al. 2002b, Zotteri et al. 2005, Williams and Waller 2011b, Rostami-Tabar et al.
2013, Jin et al. 2015, Moon 2015, Paul et al. 2015).
As each deviation represents some edit on the transaction between an individual
restaurant and their servicing DC, we included an indicator for which DC would fulfill
each order (and would be affected by each deviation). Restaurants are geographically
nested in a DC, in that each DC services all stores within a defined geographic area for all
products. Advertising is coordinated through a geographically organized group of
restaurants, called a cooperative, and since advertising patterns could have a great effect
on the variance of replenishment forecasts (and resulting deviations) for a product, we
need to track the advertising cooperative to which each store belongs. Stores are nested
in cooperatives much as they are in DCs, but DCs and advertising cooperatives are not
nested in or coincidental to each other. That is, a cooperative of restaurant outlets could
be serviced by multiple DCs, and vice versa. Cooperative membership is geographically
nested in a television market, which is then geographically nested in a region. Therefore
we included cooperative membership, television market, and region for each transaction
from a separate database.
As our measures of interest were scale dependent, we needed to include controls
for restaurant throughput. Historical usage (𝐻𝐼𝑆𝑇 ∈ ℝ+) was included as a scaling variable for the
level of consumption at a restaurant outlet over the previous replenishment period.
Restaurants with high historical usage tended to have higher deviation levels, but merely
as a function of higher throughput volume. The same is true for recommended order
quantity for the upcoming replenishment period (known as proposal amount or 𝑃𝑅𝑂𝑃 ∈
ℝ+), though in a manner somewhat independent of historical usage. For this reason, we
included both PROP and HIST.
Training Data Set Formation
As described in Megahed and Jones-Farmer (2013), model building in an
environment with large multidimensional data can quickly become overwhelmingly
complex. That is why they recommend training a model on data that is relatively simple,
yet can be expected to represent effects beyond the limited scope. When gathering a
sample to train a big data exploratory model, we had to strike a balance between
complete representation and tractability. If our sample were too large, we might become bogged
down by limitations relating to computational complexity. If our sample were too small,
we would run the risk of identifying anomalous effects and limiting the general applicability of
our results. To achieve this balance, we limited the stock keeping units, date range, and
restaurants examined to be a representative sample of a wide range of menu offerings,
demand patterns, geographic regions, and seasons. The representativeness of our sample
was achieved through multiple discussions with the 4PL firm’s analysts. In this way, we
could develop a machine-operated process guided by domain knowledge. We could test
theoretically-based hypotheses on a limited range of data that can then be applied via a
supervised machine learning algorithm to the exhaustive set of transactions for use by the
restaurant firm and 4PL.
We included one full year to capture the seasonal effects and annual promotions
observable in the data. We also limited our sample to one year of data, as firm
forecasters were still in the process of refining their approach to restaurant replenishment
forecasts, and the process had changed in the previous two years. As a result, the types of
information collected before the identified range differed in nature and quality.
Following this period, increased competitive pressure caused the parent firm to make
significant menu changes. Therefore, data after the identified range differed in products
offered and in maturity of product demand. We selected 39 of over 8,100 stock keeping
units, representing several of the highest-volume limited promotional items, perishable
refrigerated items, frozen items, fresh produce, meats, dry goods, condiments and paper
products. The restaurant firm’s 4PL indicated that this sample was representative of all
major demand patterns they experience in their forecasting. Similarly, we selected the
restaurants serviced by only 8 of 34 domestic DCs. This resulted in only 4,173
restaurants spread fairly evenly across the contiguous U.S., with the greatest
concentrations of outlets being in the greater Los Angeles, Chicago, Washington D.C.
and Dallas-Fort Worth metropolitan areas. Again, the firm’s 4PL partner indicated this
was a representative sample of DC locations and operators. The result was a more
manageable sample of 15.5 million observations, each representing an individual
transaction between a restaurant outlet and their servicing DC. Training samples would
then be randomly drawn from this pool, depending on the complexity requirements of the
analysis method, and retaining some portion of the data as a holdout or verification
sample.
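The draw described above can be sketched as follows. This is a minimal illustration rather than the procedure we ran: the function name, the 20% holdout fraction, the seed, and the stand-in pool of integer transaction IDs are all assumptions made for the example.

```python
import random


def split_training_holdout(pool, train_size, holdout_fraction=0.2, seed=0):
    """Draw a random sample from the pool and split it into training and holdout parts.

    holdout_fraction and seed are illustrative assumptions, not values from the study.
    """
    if train_size > len(pool):
        raise ValueError("train_size exceeds the pool size")
    rng = random.Random(seed)
    drawn = rng.sample(pool, train_size)  # sampling without replacement
    n_holdout = int(round(train_size * holdout_fraction))
    holdout, training = drawn[:n_holdout], drawn[n_holdout:]
    return training, holdout


# Hypothetical pool of transaction IDs standing in for the full transaction pool.
pool = list(range(100_000))
training, holdout = split_training_holdout(pool, train_size=5_000)
print(len(training), len(holdout))
```

Because the draw is made without replacement and then partitioned, no transaction appears in both the training and the holdout portion.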
Data Processing
As noted in Cotteleer and Wan (2016), rich datasets from a large company with
unintegrated databases can pose significant difficulties and require a tremendous amount
of time to process and understand, especially in the early stages of analysis.
Understanding the data meant understanding the business operations that drive data
generation: by whom, how, and for what reason each element of data was recorded. It
also meant reconciling inconsistent labeling, definitions and usages for data elements
between the various repositories we drew from. For this we are grateful for the steadfast
(and patient) support of the 4PL analysts and corporate liaison as we gained an
understanding of the vast pool of data we were analyzing. Their assistance was integral
as we assessed data quality, integrated data samples from each separate source, recoded
and parsed our data prior to our exploratory analysis.
Data entry errors, database anomalies and other intrinsic quality issues are
magnified in big data. As manual scrubbing of data for errors is not possible, we
calculated descriptive statistics for all numeric indicators, as well as factor summaries for
all non-numeric indicators to identify anomalies. We relied on the guidance of the 4PL’s
analysts to determine what constituted an “unreasonable” value in order to assess whether
a mathematical outlier must be cleansed from the data. Where null cells or unreasonable
values existed, the value was recoded (if it was evident that the null indicated zero) or the
observation was eliminated (if the value could not be explained).
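The recode-or-eliminate rule can be sketched as below. The field name, the upper bound of 500 cases, and the sample rows are hypothetical; in our study the bounds for a reasonable value came from the 4PL's analysts and are not reproduced here.

```python
def cleanse(rows, field, max_reasonable, null_means_zero=True):
    """Recode nulls to zero where that reading is evident; drop unreasonable values."""
    kept = []
    for row in rows:
        value = row.get(field)
        if value is None:
            if not null_means_zero:
                continue  # unexplained null: eliminate the observation
            value = 0  # null evidently indicates zero: recode
        if value < 0 or value > max_reasonable:
            continue  # unreasonable value: eliminate the observation
        kept.append({**row, field: value})
    return kept


# Hypothetical rows; max_reasonable=500 is an assumed analyst-supplied bound.
rows = [
    {"id": 1, "cases": 4},
    {"id": 2, "cases": None},
    {"id": 3, "cases": 9_000},
    {"id": 4, "cases": -2},
]
clean = cleanse(rows, "cases", max_reasonable=500)
print([(r["id"], r["cases"]) for r in clean])
```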
Integrating or merging data may also be more difficult with big data, as
information gathered for different purposes even within the same business may have
differing levels of aggregation, dissimilar time windows and date coding, and may have
no common features required for seamless merging. We merged our data by identifying
common factors in the multiple databases we drew samples from, making sure along the
way that definitions of these common factors were consistent. Often this entailed
incorporating data from unrelated repositories solely as a “cypher” linking data elements
we were interested in. This also required frequent recoding of date formats, stock unit
identifiers, and location codes in nested levels.
The process of merging multiple databases left many redundancies and irrelevant
indicators, so parsing was a necessary step. The variables remaining after parsing were the
replenishment forecast deviation and bias terms (DEV, BIAS) as the dependent variables, the
governance form indicator (GOV) as the primary independent variable, the scaling
variables HIST and PROP, multiple nested multicategorical location indicators, a date
indicator, and multicategorical product indicators.
To numerically analyze multicategorical variables, they had to be transformed
into indicator variables for each category level. This increased the number of
independent indicators to be roughly the sum total of factor levels in each of the original
variables. We chose to use a weighted effects coding scheme, as each treatment
combination did not have equal sample sizes. In this scheme, the numeric value of the
indicator variable represents the deviation from the overall mean through membership in
a level of a multicategorical variable. Any multicategorical variable with g levels
becomes 𝑔 − 1 indicator variables (𝐷1 through 𝐷𝑔−1 ), where levels 1 through 𝑔 − 1 are
denoted by only one indicator variable taking the value of 1, and the rest 0. The
reference level g is denoted by all 𝑔 − 1 indicator variables taking the value of a
counterweight −ℎ𝑗⁄ℎ𝑔, ∀𝑗 ∈ {1, 2, …, 𝑔 − 1}, where ℎ𝑗 is the number of observations in
treatment level 𝑗 and ℎ𝑔 is the number of observations in the reference
level 𝑔 (Darlington and Hayes 1990, Cohen et al. 2013).
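A minimal sketch of this coding scheme follows. The function name and the three-level example are assumptions made for illustration. Each non-reference level receives a 1 in its own column, and the reference level receives the counterweight −ℎ𝑗⁄ℎ𝑔 in every column, so each coded column sums to zero over the sample.

```python
from collections import Counter


def weighted_effects_code(levels, reference):
    """Return (column labels, coded rows) for one multicategorical variable."""
    counts = Counter(levels)
    non_ref = sorted(level for level in counts if level != reference)
    h_g = counts[reference]  # observations in the reference level g
    rows = []
    for level in levels:
        if level == reference:
            # Reference level: counterweight -h_j / h_g in every column j.
            rows.append([-counts[j] / h_g for j in non_ref])
        else:
            # Levels 1..g-1: a single 1 in the level's own column, 0 elsewhere.
            rows.append([1.0 if level == j else 0.0 for j in non_ref])
    return non_ref, rows


# Hypothetical variable with unequal cell sizes: 3 x "A", 2 x "B", 5 x "C".
levels = ["A"] * 3 + ["B"] * 2 + ["C"] * 5
columns, coded = weighted_effects_code(levels, reference="C")
column_sums = [sum(row[k] for row in coded) for k in range(len(columns))]
print(columns, coded[-1], column_sums)
```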
Data Exploration
After data processing was complete, we could begin exploring the sample for
contextual factors, clusters, or variables to better characterize the effect of governance
form on deviation and bias. Our collected data permitted exploration over a limited range
of products, restaurant outlets and dates, such that a more general pattern could emerge.
We attempted multiple extant data mining approaches as suggested in Hand et al. (2001),
Han et al. (2011), and Kuhn and Johnson (2013), with the intent of demonstrating
effective data grouping and reductive techniques scalable to the population, and
replicable in different contexts. These group constructs would then be included, along
with governance form indicators, in a regression model predicting forecast deviation and
bias. We had limited success with these approaches due primarily to the scale of our
dataset. The exploratory approaches we tried, and ultimately abandoned, are described in
more detail in a separate working paper (Smyth et al. 2017). A brief summary of our
efforts is listed below. The primary subdivisions among the exploratory approaches are
observation-based and covariance structure-based grouping mechanisms.
We employed the observation-based grouping mechanisms of k-means and both
agglomerative and divisive hierarchical clustering, but were unsuccessful for two reasons.
First, a distance matrix must be calculated between all n observations, which sharply increases
the temporary or random access storage requirements. For samples as small as one
million, storage requirements are on the order of terabytes or more (Buchholtz 1962),
well beyond the current capabilities of most computers. This limited the size of our training
sample. The second and even more hindering reality is that
observation-based methods require manual interpretation of a stable structure. When
attempted, the resultant clusters were neither stable nor interpretable, so we moved on to
covariance structure-based grouping mechanisms.
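The storage figure can be checked with simple arithmetic. The sketch below assumes that only the n(n − 1)/2 unique pairwise distances are stored and that each is held as an 8-byte floating point value; both are assumptions for illustration.

```python
def distance_matrix_bytes(n, bytes_per_value=8):
    """Bytes needed to hold the n(n-1)/2 unique pairwise distances."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (10_000, 1_000_000, 5_000_000):
    terabytes = distance_matrix_bytes(n) / 1e12
    print(f"n = {n:>9,}: {terabytes:,.4f} TB")
```

Under these assumptions, one million observations require roughly 4 TB, and the requirement grows with the square of the sample size.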
Factor analysis, our chosen covariance structure-based approach, had the
computational advantage of requiring less temporary memory (Sharma and Paliwal 2007)
than observation-based methods. This is because it identifies structures that may exist
between m predictor variables, typically far fewer in number than n observations. This
method also has the advantage of established statistical indicators to aid selection of
factor models (Horn 1965, Velicer 1976, Zwick and Velicer 1986). Unfortunately, as
with observation-based methods, structures were neither stable nor interpretable, and
each statistical indicator specified a different model.
The instability and uninterpretability of models developed by both observation-
based and covariance structure-based exploratory methods was likely due to complex and
unexplainable interactions of effects due to high dimensionality (Gu et al. 2012). In
response, we shifted our exploration to be based on a more parsimonious set of temporal,
geographic, and product-based structure characteristics in regression-based exploratory
modelling.
A Novel Regression-Based Approach
Making use of the hierarchical structure of the available data, we formulated a
process to predict replenishment forecast deviation and bias from governance form, as
well as temporal, geographic and product indicators. In this method of analysis, we only
estimated first order linear effects. While this omits the more complex interactions that
may exist between variables, it also greatly reduces the manual interpretation of the vast
number of variable combinations that are possible in higher order effects. By eschewing
interaction effects, we avoid one of the major pitfalls of massive and highly dimensional
data (Gu et al. 2012), and permit additional scalability of our model (National Research
Council 2013). In terms of computational parsimony, regression models estimated by
ordinary least squares (OLS) are highly scalable (providing that 𝑚 ≪ 𝑛, and Cohen et al.
2013 recommend 𝑛⁄𝑚 ≥ 40 for data-driven hierarchical analysis) when compared to
interdependence techniques such as cluster and factor analysis.
The concept behind the approach is that initially only aggregate level nested
dummy indicators for categorical variables such as region or month are used as predictor
variables in linear models with replenishment forecast deviation and bias as the
dependent variables. Only the indicators with the highest aggregation are initially
included to limit the number of terms to evaluate, similar to the fixed effects approach to
clustering described in Cohen et al. (2013). Should a term prove to be a significant
predictor of deviation or bias, we remove only the significant predictor and replace it
with the nested disaggregated indicators contained in it, for instance television market in
place of region and week rather than month. As the data are nested, and we are already
using a weighted effects coding scheme, we can interpret the other unremoved
coefficients as nearly equivalent to their interpretation in the original model with only
aggregated terms. This iterative process requires an evaluation of terms after each stage
of disaggregation, with the number of iterations being dependent on the number of
hierarchical tiers present in the data. As terms are iteratively disaggregated, we approach
a model similar to Cohen et al.’s (2013) disaggregated analysis, which has completely
atomized indicators and numeric independent variables. This method, which we call
hierarchical progressive disaggregation (HPD), permits an analyst or manager to
identify relatively few aggregated identifiers that have some significant effect prior to
diving deeper. This is particularly useful as this method is scaled up from our sample
with only 39 disaggregated product indicators and 4,173 disaggregated restaurant outlet
indicators to one that potentially includes all 8,100 stock units, nearly 15,000 restaurant
locations, and longer time windows.
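One disaggregation step of HPD can be sketched as follows. This is an illustration under stated assumptions rather than our implementation: the term names, the hierarchy, and the test statistics are hypothetical, significance is compared through absolute test statistics relative to a focal term, and the model refit that must follow each step is left out.

```python
def disaggregate_once(terms, children, t_stats, focal="GOV"):
    """Replace each term that is more significant than the focal term with its nested children."""
    threshold = abs(t_stats[focal])
    next_terms = []
    for term in terms:
        more_significant = term != focal and abs(t_stats[term]) > threshold
        if more_significant and term in children:
            next_terms.extend(children[term])  # swap in the nested indicators
        else:
            next_terms.append(term)  # keep the term unchanged
    return next_terms


# Hypothetical hierarchy and test statistics, for illustration only.
children = {
    "REG_West": ["COOP_LA", "COOP_SD"],
    "MONTH_Jan": ["WK_1", "WK_2", "WK_3", "WK_4"],
}
terms = ["GOV", "HIST", "PROP", "REG_West", "REG_East", "MONTH_Jan"]
t_stats = {"GOV": 12.0, "HIST": 3.0, "PROP": 4.0,
           "REG_West": 20.0, "REG_East": 5.0, "MONTH_Jan": 2.0}
print(disaggregate_once(terms, children, t_stats))
```

In the full procedure the model would be re-estimated on the returned terms and the step repeated until no remaining term has nested indicators beneath it or exceeds the chosen threshold.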
We estimated separate models for replenishment forecast deviation (DEV) and
replenishment forecast bias (BIAS). To estimate these two dependent variables, we
initially fit linear regression models via OLS. Under this estimation method, we had to
satisfy certain assumptions: (1) linearity in the relationship between independent and
dependent variables, and, for the residuals, (2) normality, (3) homoscedasticity and (4)
independence (Cohen et al. 2013).
Predictors in our model that are multicategorical were subsequently recoded as
weighted effects coded binary indicator variables, so the assumption of linearity for these
was assured. With only two treatment levels possible for each condition, we would not
be able to characterize a more complex relationship between independent and dependent
variables. However, HIST and PROP are either continuous or have enough
treatment levels that nonlinearity could be an issue. Both DEV and BIAS are counts,
but deviation is strictly non-negative whereas bias can be either negative or positive.
Both counts also include a large proportion of observations with zero values, meaning the
restaurant accepted the centrally developed replenishment forecast. Of note, we had
eliminated the multicategorical DC indicator during previous exploratory analysis due to
high multicollinearity with other location indicators. This made factor models
inestimable. The terms were also left out of subsequent analyses both for high
collinearity and because DC indicators were not nested like the restaurant,
cooperative, and region location indicators.
When fitting initial OLS models for deviation and bias, we found that our
residuals exhibited non-normality and heteroscedasticity. While both sets of residuals
still had the characteristic bell shape indicating normality, they were bimodal with a
heavy concentration near zero. This indicated error in the model prediction when the true
deviation or bias was zero, with a more normally distributed second error source.
Plotting residuals against predicted values, we found similar tight groupings around zero,
with a more homoscedastic grouping of observations away from zero. This result implied
model misspecification, and as a result we had to explore alternative model conceptions.
Cohen et al. (2013) recommend generalized linear models (GLM) to account for
such heteroscedasticity, and as deviation occurs as positive integer counts, we initially
pursued a Poisson distribution for the outcome variable. We encountered two main
issues with fitting such a model. One is that the computational complexity in iteratively
fitting a ML regression equation exceeds that of OLS (Minka 2003) and lacks scalability
(Toulis and Airoldi 2015). Even with a (relatively) small training sample of five million,
the difference in computation time between ML (using Fisher scoring or Newton-
Raphson iteratively weighted least squares algorithms) and OLS (using the QR
decomposition algorithm) is quite noticeable (Fox and Weisberg 2010). This of course
would be exacerbated in the larger enterprise-sized samples, though can be mitigated
partially by capping the number of iterations in the ML estimation (Cohen et al. 2013).
The second issue is that a Poisson model ended up fitting our data poorly. Standard
indicators of fit will tend to fail (asymptotically) as sample size increases (Maydeu-
Olivares and Garcia-Forero 2010). Fit indices are based on the assumption of an
approximate fit to a theoretic distribution. As sample size increases, the credible interval
window will narrow and a null hypothesis will invariably be rejected, thus violating the
assumptions of such a model. Observed phenomena rarely (if ever) converge perfectly to
a theoretic shape, and as the fidelity of the sample grows, the likelihood that it will
deviate from a theoretic distribution shaped (in this case) by only one parameter
increases. It is likely that a Poisson distribution poorly approximates a non-negative
observed value that is actually an amplified reflection of negative and positive count
values.
Jöreskog (2002) notes that this particular data structure may better be
approximated with a Tobit model that assumes a censored normal distribution in the
response variable. A histogram of deviation in our data appeared to support this
assumption, with a large proportion of values (~70%) accumulated around the censored
tail at zero. Unfortunately, this type of model shares the weaknesses of a Poisson
model in that it is estimated by ML, so lacks scalability (Jöreskog 2002) and
asymptotically violates its (in this case three) parameter assumptions (Henningsen 2010).
Accounting for Cases of Non-Deviation
In our attempts to characterize the data via ML models with parametric
assumptions, we began to recognize a different character in those orders in which no
deviation or bias occurs. As mentioned previously, heteroscedasticity occurs mainly
around the zero values, indicating that linear or parametric models estimate non-zero
values fairly well, but that zero values are potentially determined by a separate function.
This problem is known as nonrandom sample selection driven by incidental truncation
(Wooldridge 2015). If this truncation is ignored and we attempt to estimate a parametric
model of the whole sample, there exists a sample selection bias (in a direction determined
by the nature of truncation). One method for correcting this bias is via the Heckit
method, which estimates the existence and magnitude of a dependent variable in separate
equations. The existence estimate is accomplished via logit or probit regression on the
whole sample, which permits estimation of an inverse Mills ratio 𝜆(𝑧𝑖 𝛾𝑖 ), 𝑖 ∈ {1: 𝑚}. In
this function, 𝑧𝑖 represents the list of variables predicting 𝑠, the existence of deviation
(which contains 𝑥𝑗, the predictor variables for 𝑦, nonzero deviation magnitude), and 𝛾𝑖 is the
list of regression coefficients. The magnitude model is then estimated via OLS with the
Mills ratio included as 𝐸(𝑦|𝑧, 𝑠 = 1) = 𝑥𝑗𝛽𝑗 + 𝜌𝜆(𝑧𝑖𝛾𝑖), 𝑗 ∈ {1: 𝑘 ∈ 𝑚}, where 𝜌 is the
correlation between error terms of the two models (Wooldridge 2010). Models estimated
in this way have been found to be both consistent and unbiased (Heckman 1976).
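The inverse Mills ratio that links the two equations can be sketched as below. The sketch assumes a probit first stage, so that 𝜆 is the ratio of the standard normal density to the standard normal distribution function evaluated at the selection index; the function names and example values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def inverse_mills(index):
    """lambda(z*gamma) = pdf(z*gamma) / cdf(z*gamma) for a probit selection index."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def second_stage_row(x, z, gamma):
    """Append the inverse Mills ratio to the magnitude-model predictors x."""
    index = sum(z_i * g_i for z_i, g_i in zip(z, gamma))
    return list(x) + [inverse_mills(index)]


# Hypothetical selection variables and first-stage coefficients.
row = second_stage_row(x=[1.0, 2.5], z=[1.0, 2.5, 0.3], gamma=[0.2, -0.1, 0.5])
print(row)
```

The extra column would then enter the OLS magnitude model, where its coefficient plays the role of the 𝜌 term in the expression above. For very negative index values the distribution function approaches zero, so a production implementation would need a numerically safer form.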
This sample selection observation has a rationale beyond data mining and
exploration of residuals. A restaurant outlet manager or franchisee may be predisposed to
avoid deviating from the forecasted replenishment under certain conditions that are
independent of the conditions that affect how much they would deviate. Perhaps in
instances where they have low volume in or low proportion of profit derived from a
particular product, or little past knowledge of the sales performance of an item, they
would defer to the centrally developed replenishment. Consequently, we would not
expect these (or other) factors to have the same effect for an observation in which no
deviation or bias occurs as we would for those in which we observe at least some
deviation or bias. With such a large proportion of observations with no change to the
proposed forecast, it may make more sense to truncate those observations rather than
censor or try to parameterize them in a single model.
We coded a new variable to be a deviation existence indicator (𝐷𝐸𝑉𝐸𝑋 ∈ {0,1}),
and estimated a logit model to try and predict DEVEX. Utilizing the purposeful method
of fitting logit models (Hosmer Jr. et al. 2013), our preliminary main effects model
retained all predictor variables included in the original OLS model. This is based on their
Wald test significance and consistent with their logical and theoretic justification for
inclusion. Though removing terms with the highest Wald test p-values did not heavily
influence the remaining regression coefficients (change of < 15%), nested models were
significantly different from each other as measured by the likelihood ratio test. Therefore
we retained all original predictors. Evidence of a main effects model was supported by
estimating a model via the method of fractional polynomials, which indicated that
deviance is minimized when only first order terms are included. Thus the preliminary
final model contained only predictor variables included in the original OLS model, and
the model’s log-likelihood indicates a significant increase in deviance explained over a
null model (𝑝 ≪ 0.001).
The logit model indicated franchised outlets were 39% more likely than corporate
restaurants to deviate from a forecast, and that this effect was highly significant (𝑝 ≪
0.001). This further supports the cross-tabulations and descriptive statistics that indicate
increased likelihood among franchisees. The mosaic plot in Figure 2 indicates that a larger
proportion of orders placed by franchisees had non-zero deviation when compared to orders
placed by corporate managed outlets. The widths of the mosaic tiles indicate the
relative number of franchised and corporate transactions. In our sample, 89.5% of observations
were from franchised restaurants (which made up 87.5% of sampled locations).
Figure 2: Prevalence of Deviation Existence by Governance Form
However, our logit model had marginal discrimination (𝐴𝑈𝐶 = 0.649), meaning
that given two observations where one has zero deviation and the other has non-zero
deviation, the model will assign the higher predicted probability of deviation to the
non-zero observation only about 65% of the time.
Beyond classification weaknesses, our model also had a number of indicators of poor
model calibration. LOWESS smoothed plots of numeric (treated as continuous) variables
HIST and PROP against DEVEX reveal monotonic behavior, which would be expected in
a properly fitting logit model. However, when logit transformed, these plots exhibit
nonlinearity in the lower range of values, indicating possible model misspecification.
Both the Pearson’s Chi-Square and the Hosmer-Lemeshow goodness of fit tests indicate
poor model fit (𝑝 ≪ 0.001). Similarly, the McFadden pseudo R² measure, which is analogous to
a proportional reduction in error variance (not to be confused with the R² measure from
OLS), indicates sub-par explanatory power of the model at 𝑅²_𝑀𝑐𝐹 = 0.048 (Wooldridge
2010). Due to these limitations, we cannot obtain a reasonable estimate of the Mills ratio.
This lack of fit may be due to the lack of an exogenous variable in 𝑧𝑖 ∉ 𝑥𝑗 (Wooldridge
2015), and requires the assumption (that we later examine) that 𝜌 = 0.
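The two summary measures cited for the logit model can be illustrated with a short sketch. The predicted probabilities and log-likelihood values below are hypothetical and chosen only to show the calculations. AUC is computed as the share of (non-zero, zero) pairs in which the non-zero observation receives the higher predicted probability, with ties counted as one half. The McFadden measure is computed as one minus the ratio of the fitted to the null log-likelihood.

```python
def auc(scores_positive, scores_negative):
    """Share of positive/negative pairs ranked correctly (ties count one half)."""
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - loglik_model / loglik_null


# Hypothetical predicted probabilities of deviation.
nonzero_scores = [0.9, 0.6, 0.4]  # observations with non-zero deviation
zero_scores = [0.5, 0.3]          # observations with zero deviation
print(auc(nonzero_scores, zero_scores))

# Hypothetical log-likelihoods for the fitted and null models.
print(mcfadden_r2(loglik_model=-952.0, loglik_null=-1000.0))
```

The pairwise loop is quadratic in sample size, so on samples of our scale a rank-based calculation would be the practical choice.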
We then conducted a separate OLS analysis only on those observations in which a
deviation occurs. We did so following the HPD procedure described previously. The
initial model is the most highly aggregated, with indicators only for month
(𝑀𝑂𝑁𝑇𝐻𝑐, 𝑐 ∈ {1, 2, …, 𝐶}, 𝐶 = 12), region (𝑅𝐸𝐺𝑘) and an aggregate indicator for product
grouping (𝐺𝑅𝑃𝑟). Every day (𝐷𝐴𝑌𝑎, 𝑎 ∈ {1, 2, …, 365}) is nested in a week (𝑊𝐾𝑏, 𝑏 ∈
{1, 2, …, 52}), which is then nested in a month by the month in which the week began. All 𝐼 = 4,173
restaurants (𝑅𝐸𝑆𝑇𝑖) are nested in 𝐽 = 56 advertising cooperatives (𝐶𝑂𝑂𝑃𝑗), which in
turn are nested in 𝐾 = 16 geographic regions. Each of 𝑃 = 39 products (𝑃𝑅𝑂𝐷𝑝) falls
into 𝑅 = 6 aggregate product grouping indicators that we jointly determined with the
restaurant’s 4PL to be most likely to have common advertising, storage and handling
characteristics. Frozen, refrigerated, paper, dry goods, promotional items and bread
products were all determined to be unique groupings that would be expected to behave
similarly in terms of replenishment forecast deviation and bias. By initially using such
aggregated terms and estimating only first order effects, we were able to initially evaluate
and interpret only 34 regression coefficients. The initial evaluated model for deviation is
of the form:
\[ DEV_{aip} = \beta_0 + \beta_1 GOV + \beta_2 HIST + \beta_3 PROP + \beta_{3+c} MONTH_c + \beta_{3+C+k} REG_k + \beta_{3+C+K+r} GRP_r \]
And the model for bias is simply:
\[ BIAS_{aip} = \beta_0 + \beta_1 GOV + \beta_2 HIST + \beta_3 PROP + \beta_{3+c} MONTH_c + \beta_{3+C+k} REG_k + \beta_{3+C+K+r} GRP_r \]
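The count of 34 coefficients cited for the initial models can be reproduced under one reading of the setup: the intercept is excluded, GOV, HIST and PROP contribute one coefficient each, and every multicategorical variable with g levels contributes g − 1 weighted effects coded indicators. The sketch below encodes that reading; the function name is an assumption made for illustration.

```python
def slope_count(single_terms, level_counts):
    """Number of slope coefficients: single terms plus (g - 1) indicators per factor."""
    return single_terms + sum(g - 1 for g in level_counts.values())


# Level counts reported for the aggregated model: 12 months, 16 regions, 6 product groups.
aggregated_levels = {"MONTH": 12, "REG": 16, "GRP": 6}
print(slope_count(single_terms=3, level_counts=aggregated_levels))  # 3 + 11 + 15 + 5 = 34
```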
Outlier Identification
As with the OLS models that included all observations, we estimated separate
models for bias and deviation, but now among transactions with strictly nonzero values.
While fitting the models, we tested for and observed outliers in both models based on the
global influence measure Cook’s Distance $D_i = \sum (\hat{Y} - \hat{Y}_{(i)})^2 / [(m+1) MS_{res}]$ with 𝑚 predictors (Cohen et
al. 2013). We selected a global influence measure over a specific influence measure such
as DFBETA because the global measure requires roughly m times less computation
time. This was significant given our training sample size of five million. While
many cutoff thresholds are proposed as being worth examining, we selected a value
4⁄(𝑛 − 𝑚 − 1) with 𝑛 observations and 𝑚 predictors, which is more appropriate for
large data samples (Fox and Weisberg 2010). Despite this being a more stringent cutoff
designed to limit the number of outliers an analyst must examine, the deviation and bias
models had 53,144 and 58,023 flagged observations, respectively (out of 1.6 million observations). In the end,
four observations were removed based on Cook’s distances that were several times larger
than all other highly influential observations. While it is unwise to remove observations
simply due to their large influence (Hair et al. 2006), we observed these four transactions
also to have order values 23.5-91.7 times larger than historic usage and 76.3-80.2 times
larger than the proposed order size. These are not reasonable values and constitute
obvious entry errors. Worth noting: while outliers manifested in a fairly proportional
manner to all predictor factor levels, two stood out. Bun and fry transactions made up
91.8% of the most influential observations in the deviation model and 93.2% of the most
influential observations in the bias model. In fact, fully 40.3% of bun and 15.7% of fry
transactions in the deviation model and 43.7% of bun and 18.5% of fry transactions in the
bias model were considered “outliers”. This indicates that these products behave
differently than the others in this model, and that the linear prediction may not be
sufficient for these products.
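The screening step can be illustrated on a toy problem. The sketch below is restricted to a single predictor (𝑚 = 1) so that it needs no linear algebra library, and it uses the leverage form of Cook's distance, which is algebraically equivalent to the definition based on changes in fitted values. The data are hypothetical, with the last order standing in for an entry error.

```python
def cooks_distances(x, y):
    """Cook's distances and the 4 / (n - m - 1) cutoff for a one-predictor OLS fit."""
    n, m = len(x), 1
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ms_res = sum(e * e for e in residuals) / (n - m - 1)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e * e * leverage / ((m + 1) * ms_res * (1.0 - leverage) ** 2))
    return distances, 4.0 / (n - m - 1)


# Hypothetical data: y = 2x except for the final value, an implausibly large order.
x = [float(v) for v in range(1, 11)]
y = [2.0 * v for v in x[:-1]] + [200.0]
distances, cutoff = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d > cutoff]
print(flagged, cutoff)
```

As in our analysis, a flag only nominates an observation for inspection; removal still requires a substantive reason such as an evident entry error.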
Results
After removing erroneous outliers, we began the disaggregation process and
report the results starting with the deviation model. Starting with terms at the highest
level of aggregation, we fit a model that explains 50.3% of the variation in order
deviation. The results of the aggregated model indicate that nearly all predictors had a
significant effect by traditional alpha-level cutoff standards. This is to be expected, as
our sample size is extremely large. In fact, as this process is scaled to larger portions of
an enterprise dataset, it would be unlikely to calculate an effect that wasn’t significant by
traditional statistical cutoff levels (i.e., 𝛼 = 0.05, 0.01 or 0.001). Using any fixed level
of statistical significance as a threshold becomes problematic in large data sets, because
significance depends not only on effect size and dispersion, but increasingly on the
number of observations in any treatment level (Hair et al. 2006, Wooldridge 2015). We
therefore used a relative threshold to select and disaggregate only those factors whose
effects are least likely to be due only to chance. As governance form is the focal
predictor in this model, we defined statistical significance of covariates in relation to the
statistical significance of that term. Alternatively, if the analysis is purely exploratory, or
if there is a much larger set of variables, some percent of the most statistically significant
terms may be candidates for disaggregation.
Deviation Model
For the deviation model (when deviation is nonzero), only two regions and two
product categories had higher significance than the effect of governance form. For the
first disaggregation, we replaced the indicators for the Heartland and Southern California
regions and the frozen and refrigerated product categories with the 16 and 15 respective
nested terms that make up those aggregated categories. We then re-estimated the model
with the disaggregated terms. Not surprisingly, the disaggregated model explained more
of the variance in deviation at 52.0%. In this model, deviation in orders made by
franchised outlets is expected to be 8% higher than in those made by corporate outlets, all
else equal. We should expect a greater number of terms to represent a higher proportion
of variance given that all information contained in the aggregated terms is implicitly
contained in the disaggregated replacement terms. By again observing only some subset
of terms (bound by either the significance of a focal term or some percent of variables),
we see the value of such a hierarchical procedure. Restaurant orders were on average
3.1 cases different from recommendations when restaurants made replenishment forecast
revisions, so all effects are cited relative to this.
The partially disaggregated location indicator narrowed the search for the source
of variance from two very expansive regions, to many of the constituent marketing
cooperatives. The result is that the Los Angeles cooperative is now the only location
whose effect is less likely than governance form to be simply attributable to chance.
Orders in the Los Angeles cooperative deviate on average 0.3 cases less than other
cooperatives and regions, all else equal.
Interesting results also come from disaggregating product terms. The
disaggregated model includes 3 product groups and 20 individual products that have a
greater significance than governance form. This result is slightly more complicated in its
interpretation, as three indicators (representing the dry goods, bun and paper product
categories) that had previously been less significant than franchising are now more
significant with the removal of some terms and the addition of others. This relative
change is due to collinearity that exists between removed and retained variables. As
indicators of product category that were removed are mutually exclusive of product
category indicators that remain, it is only the removed indicators of region that covaried
with the remaining product category terms. In the same vein, the additional cooperative
indicators (as nested product indicators also are mutually exclusive of product category
indicators) covaried less than the original aggregated terms and so had less of a
confounding effect on the remaining product category indicators. This result provides
useful information about the effect of specific product categories and products on
replenishment forecast deviation. For instance, among those significant predictors
(relative to governance form) only small beef patties, buns, fries and ice cream had a
positive effect on deviation (0.9, 2.7, 3.7 and 0.5 cases respectively). However, these
items constituted 69.2% of all products ordered at the restaurant level. In essence, the
products that have the greatest volume of orders (and thus likely the greatest impact on
costs) are driving positive deviation to the greatest extent.
As we are observing independent variable significance relative to the significance
of governance form, we should also note changes in the significance of governance form
as a predictor of replenishment forecast deviation. The likelihood that the effect of
governance form on deviation is due only to chance has increased with the
introduction of disaggregated indicators. This is because the newly introduced indicators
covary with governance form. We should expect such confounding with the introduction
of additional terms. If governance form is heavily confounded by the introduction of
additional terms, then a more stable point of comparison may be desirable (such as some
fraction of the most significant terms, as suggested above). For our current analysis,
however, we retain a consistent logic for selecting terms to disaggregate.
Bias Model
For the model predicting nonzero bias, two of 12 months, eight of 16 regions and
five of six product categories had a significant effect on bias relative to the significance
of franchising. The aggregate model explains 46.1% of variance in nonzero bias. After
disaggregation, 54.8% of variance is accounted for by the model. In this model, bias in
orders made by franchised outlets was 3% higher than in those made by corporate outlets,
all else equal. This increase in explained variance is expected when expanding the
number of temporal, geographic and product-based terms via disaggregation. Restaurants
tended to revise their orders up by an average of 2.2 cases if they revised a replenishment
forecast, so all effects are cited relative to this.
The partially disaggregated time indicator provided additional fidelity in
determining which time periods may have significant effects on order bias. Five months
that had previously had less statistical significance than governance form, as well as
seven weekly indicators were now relatively more significant in predicting order bias.
This increase of significant terms had less to do with the disaggregated covariates than it
had to do with the decreased significance of franchising. After disaggregation, it became
8.043 × 10^31 times more likely (though still with 𝑝 ≪ 0.001) that the effect of
governance form on bias was purely due to chance. The introduction of disaggregated
terms confounded governance form’s effect to a greater extent than the retained
aggregated terms, thus reducing its apparent effect. Additionally, the removal of
aggregate non-temporal terms that acted as confounders caused the retained aggregate
time predictors to gain significance. The greatest effects occurred in the second week of
May (0.5 cases larger), and three of four weeks in December, all of which coincide with a
significant negative bias effect (0.4, 0.6, and 0.7 cases smaller respectively). This
disaggregation has identified multiple more specific time periods to evaluate possible
reasons for the observed effects. For instance, the advent of the holiday season may
signal lower levels of sales throughout the system that are not currently being captured in
the restaurant firm’s demand planning process. Individual restaurants are recognizing
this, but for some reason the corporate forecasters may not. Notably, we do not
compare the bias in restaurant level orders to actual sales data, so it may be some
cognitive dissonance in the individual restaurant owners and managers (and not in
corporate demand planning) that is driving this observed behavior.
Statistically significant location indicators in the partially disaggregated model
included 3 regions that were not previously significant, as well as 25 advertising
cooperatives. Among those regions and cooperatives considered statistically significant,
the largest positive bias effect occurred in two cooperatives between Oklahoma City and
Dallas (1.7 and 1.2 cases higher on average respectively), as well as two in central
Missouri (1.4 and 1.5 cases). The largest negative bias effect was experienced in a
cooperative in rural west Texas (1.2 cases), but also the two urban cooperatives in Las
Vegas, Nevada and Los Angeles, California (0.7 and 0.6 cases).
Finally, in the disaggregation of product terms in the model predicting bias, one
previously non-significant product category and 33 individual products were relatively
more significant in predicting order bias than governance form. In fact, the only non-
significant product term was lemonade. The largest positive bias effect was observed in
buns (2.1 cases) and fries (2.8 cases), whereas the largest negative effects manifested in
napkins (1.1 cases) and apple slices (0.9 cases).
Discussion
These results have implications for our hypotheses regarding the effect of
governance form on replenishment deviation and bias, extensions to post-contractual
performance management, and methodological implications from our proposed technique
of disaggregation.
Effects of Governance Form
Through our analysis, we found mixed support for our hypotheses. H1 was
supported; replenishment forecast deviation from proposed order levels by a restaurant
was relatively higher among restaurants owned by a franchisee than those corporately
owned. This finding is consistent with both restaurant firm expectations and extant
literature on governance form, but now applied to internal measures of supply chain
performance (deviation).
However, H2 was not supported; bias in replenishment forecast deviations was
relatively less negative among restaurants owned by a franchisee than those corporately
owned. This would seem to support the proposed unofficial competing hypothesis, where
service level and market share considerations could increase positive replenishment order
deviation bias among franchisees to the point of dominating the negative bias effects
from improved oversight and frugality, or inventory underinvestment from inefficient
risk bearing or free ridership. Further work to confirm this alternative explanation would
require additional measurement of proxies for proportion of wealth tied to a franchise
such as multi-unit ownership (indicating degree of inefficient risk bearing), and levels of
competition or travel intensity (indicating degree of free ridership).
This unexpected result could also be the result of greater levels of unreported
local promotions by franchisees, or improperly reported inventory levels. Alternatively,
as we did not assess the accuracy of the proposed replenishment forecast against point of
sale consumption, the franchisees could be compensating for a systematic
underestimation of demand. It could also be some combination of these causes. Each of
these possible alternative explanations challenges the accepted paradigm among
governance form scholars that franchisees more parsimoniously steward their resources,
and merits further investigation.
Extensions to the Post-Contractual Effects from Governance Form
Firms that utilize franchise governance can use these findings on deviation and
bias to address how internal order deviations are structured in contracts. This could be
implemented in future contracts as a limit to how much an order can be edited, or a
simple cue so that changes over a certain threshold require an explanation or update of
previously misreported local inventory or promotional activity. Parent firms can also
implement these findings in ways described in recent research on post-contractual
franchise performance, as described below.
Those firms that utilize the plural form of governance can use these findings (or
rather, apply these methods to their own replenishment data) to ratchet franchisee order
edits if it is found that they improve replenishment forecast accuracy relative to point of
sale consumption (Bradach 1997, Yin and Zajac 2004). By recognizing local information
advantages, parent firms can incorporate the information in an attempt to improve
replenishment forecasts, which in turn requires less system wide inventory to account for
variance.
Conversely, if it is found that franchisee order edits degrade accuracy relative to
point of sale, this provides evidence to convince franchisees to act in their own best
interests. Given that franchise contracts provide for less coercive control by the parent
firm, such information is integral to the relational governance proposed by Bradach
(1997), Paik and Choi (2007), and Cochet et al. (2008).
Finally, knowing that franchisees exhibit higher deviation and more positive bias
can aid the restaurant firm in aligning fit of form and strategy. If the cost of this
increased variance in their replenishment system outweighs the marginal benefit of
franchising, the restaurant firm may consider changing their policies for managing
franchisees or their mix of franchised restaurants. Bradach (1997), Yin and Zajac (2004)
and Barthélemy (2008) all suggest properly accounting for and mitigating agency costs
particular to a governance form can contribute to minimizing form-specific residual loss
and maximizing the advantages of the plural form.
Methodological Implications
The contextualizing effect of HPD can help target responses to only those
temporal, regional or product category peculiarities that have a distinct positive or
negative effect. For instance, knowing that replenishment forecast deviation is much
higher among only small beef patties, buns, fries and ice cream or that positive bias is
driven overwhelmingly by buns and fries may reduce unintended consequences of
broader policy changes. The higher bias observed in mostly rural Midwestern areas may
be either systematic under-forecasting or a response by restaurant outlets to
unreliable service from a common distributor. This allows the parent firm to target
resources to either improve point of sale demand forecasts, or to evaluate potential poor
service of a regional distributor. The same is true for identifying benchmarking
opportunities. The lower observed deviation in the Los Angeles cooperative, and lower
bias observed in many urban cooperatives can serve as an example for how to structure
future contracts, or how to interact with restaurants after contracts are established.
Regardless of how these outlet level order edits are affecting point of sale demand
accuracy, deviation and bias are causing additional variance for upstream distributors.
The results indicate that HPD provides targeted information to managers at the
restaurant firm with minimal computation and human interpretation in a supervised
learning process. This principle can be extended to any large enterprise with an
independently nested structure. This includes most restaurant chains like the focal firm
of this study, retail chains, but also firms that provide primarily services rather than
goods. Take for instance a large cable company trying to forecast the consumption of
cable during its service calls. In terms of regional and temporal aggregation, they may
observe higher consumption in northern regions during periods with known severe
weather that may damage lines. This would indicate they should provide offices region-
wide with greater supplies of cable and possibly shift their staffing. After disaggregating
the most highly significant terms, they may find that the variance of the region is actually
driven by a single office or small subset of offices. This would indicate a different
response, perhaps related to local management or training. While this example is
fictional, it demonstrates the possibility for surprising insights through HPD for alternate
large hierarchical organizations where data volume makes manual and even
computational identification difficult. The process is also not limited to those firms that
use the franchise contract model, as any factor to be examined can be contextualized by
pairing with such an exploratory process.
In this paper, we demonstrate only one level of disaggregation in our model,
though the data structure of our temporal and geographic indicators would permit
additional iterations as they both include three nested levels. This is because we wish
only to demonstrate the potential value of HPD by showing that it can parsimoniously
isolate heterogeneous effects. We do not test any theories about regional, temporal, or
product-based effects, so end our analysis at one level of disaggregation. Our models
predicting deviation and bias in replenishment forecasts are a small-sized example, but the
value the HPD process can bring to a large data set is evident. Instead of an
over-specified model with thousands of confounding indicators, as would be the case if
the lowest level of disaggregation were used, the analysis begins with a relatively simple
model with coarse indicators. Besides being easier to interpret, this aggregate model is
more scalable, given the time complexity of calculating an OLS regression model in a
population of billions (or more) of transactions and thousands of indicators. The
disaggregation process then can refine a search based on coarse indicators that
demonstrate a significant effect on the dependent variable (in our case replenishment
forecast deviation or bias). This process can be set up as a simple machine learning
process as it uses simple logic to disaggregate. The “supervised” portion of the process is
then an evaluation of the various iterations of disaggregation. Each level will be an
increasingly complex model to interpret, so it is up to the analyst to decide when to end
the process. The nested structure of the data also acts as a natural termination for the
process.
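The "simple logic to disaggregate" described above can be sketched as a single selection step. This is one possible reading, not the implementation used in this study: it assumes the p-values of the current model have already been estimated elsewhere, that a term qualifies for disaggregation when its p-value is smaller than that of the focal term, and that a qualifying aggregate term is replaced by its nested terms. All function names, term names and p-values below are hypothetical.

```python
def disaggregate_once(p_values, focal, children):
    """Return the term list for the next, partially disaggregated model.

    p_values: term -> p-value from the current fitted model (fitted elsewhere).
    focal:    name of the focal term (for example governance form); always kept.
    children: aggregate term -> list of nested, finer-grained terms.
    """
    threshold = p_values[focal]
    next_terms = []
    for term, p in p_values.items():
        if term != focal and p < threshold and term in children:
            next_terms.extend(children[term])  # replace with nested terms
        else:
            next_terms.append(term)  # keep as-is (focal, weaker, or lowest level)
    return next_terms


# Hypothetical aggregate model output and nesting structure.
p_values = {
    "franchised": 1e-8,
    "region_west": 1e-12,
    "region_east": 0.2,
    "category_buns": 1e-20,
}
children = {
    "region_west": ["coop_los_angeles", "coop_las_vegas"],
    "region_east": ["coop_columbus", "coop_cleveland"],
    "category_buns": ["product_small_bun", "product_large_bun"],
}
next_terms = disaggregate_once(p_values, "franchised", children)
print(next_terms)
```

The analyst would refit the model on the returned terms and decide whether another pass is worthwhile. A term with no entry in `children` is simply retained, which mirrors the natural termination provided by the nested data structure.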
Limitations
This study provides contributions to both management theory and explanatory
methods. However, as with all research, it suffers from a number of limitations. The
initial limitation is that the sample this analysis is based on comes from a single large
quick service restaurant firm. While this limits generalizability of our results, there are
several mitigating factors. First, we include data from over 4,000 geographically
dispersed restaurant outlets in our model, which accounts for about 1.9% of all domestic
quick service restaurant outlets (Census Bureau 2012). Second, we worked with the
restaurant firm’s demand planners to sample products that represent a wide range of
demand, advertising, storage and handling characteristics. Finally, as an industry leader
(Hoover’s 2016), the characteristics of this firm’s outlets can be expected to represent
large portions of the industry.
A second limitation is the use of a truncated sample examining only transactions
with non-zero values of deviation and bias. This requires the assumption 𝜌 = 0, or that
the error terms of the two models in the Heckit method are uncorrelated. This
assumption holds if model coefficients are observed to be consistent in the truncated
sample (Wooldridge 2015). Using a random holdout sample of five million transactions,
we observed an OLS model with the same aggregate terms found to be significantly
related to changes in deviation and bias. Coefficient estimates were directionally
identical between samples and changed by at most a few percent (acceptable under
thresholds established in Hosmer Jr. et al. 2013). This represents a very small effect
difference and so can be considered consistent.
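The consistency check described above (same direction, small relative change) can be expressed as a short comparison. This is a minimal sketch under stated assumptions: the coefficient values are made up for illustration and are not results from this study, and the tolerance is a hypothetical parameter, since the specific threshold is not restated here.

```python
def compare_coefficients(truncated, holdout, max_relative_change):
    """Compare coefficient estimates from two samples, term by term.

    Returns term -> (same_sign, relative_change, consistent).
    Assumption: every coefficient in `truncated` is non-zero.
    """
    report = {}
    for term, b_truncated in truncated.items():
        b_holdout = holdout[term]
        same_sign = (b_truncated > 0) == (b_holdout > 0)
        relative_change = abs(b_holdout - b_truncated) / abs(b_truncated)
        consistent = same_sign and relative_change <= max_relative_change
        report[term] = (same_sign, relative_change, consistent)
    return report


# Made-up coefficients for two aggregate terms.
truncated_sample = {"franchised": 0.080, "region_west": -0.30}
holdout_sample = {"franchised": 0.082, "region_west": -0.29}
report = compare_coefficients(truncated_sample, holdout_sample, max_relative_change=0.05)
for term, (same_sign, change, ok) in report.items():
    print(f"{term}: same sign={same_sign}, change={change:.1%}, consistent={ok}")
```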
This truncation relates to a third limitation of our study. When fitting GLM
models, parametric assumptions fail asymptotically as sample size increases (Maydeu-
Olivares and Garcia-Forero 2010). As researchers set out to explore larger datasets,
traditional fit measures may indicate rejection of model types that are logically and
practically appropriate for representing the data. Future research needs to address this
weakness in fitting GLM models to larger sample sizes.
A fourth limitation of the study is that it ignores complex interactions and higher
order effects. It is likely that some combinations of factors have significant effects on
deviation and bias. However, in an effort to limit required manual interpretation of
complex interactions as HPD is scaled to massive datasets with thousands of indicators,
we purposely limit our scope to first order effects.
Finally, deviation and bias in restaurant level orders are not compared to point of
sale data, so it is unknown whether restaurant level adjustments of replenishment
forecasts relate to actual end consumption. While deviation and bias will cause adverse
effects in higher echelons of the supply chain regardless of this information, it would be
useful to know whether the source is local information advantage or the result of some
cognitive dissonance by the individual restaurant operator.
Conclusions
In this research, we examined the operational effects of governance form that, in
part, addressed the claim by Combs et al. (2010) on the relative lack of research on
franchising operational performance after contract formation. As part of our inquiry, we
incorporated the call of Waller and Fawcett (2013b) and Cotteleer and Wan (2016) to pair
theoretical inquiry with big data exploration. This tactic helped protect against the risk of
false positives inherent to big data exploration and of over-inflated statistical power when
testing theory in big data. This also permitted rich contextualization of the effect of
governance form on replenishment forecast deviation and bias. In doing so, we also
developed a scalable method for examining and isolating effects in a hierarchical
framework, called HPD. Our proposed method of analysis permitted rapid
characterization of effects from millions of individual transactions with limited manual
interpretation. This process, scalable to much larger populations, isolates temporal,
regional, and product based peculiarities of the impact of governance form on our two
dependent variables. Finally, our mixed results support previous findings on the effect of
governance form on replenishment forecast deviation, but indicate greater theoretical
work must be done to characterize and predict the effect of governance form on
replenishment forecast bias. Future work must aim to identify the alternative causes or
combinations of causes that drive higher bias in franchised restaurants.
Chapter 3: “The Impact of Including Forward Indicators on POS Demand Forecast
Accuracy: The Case of Short-Term Weather Forecast Data”
Introduction
Demand forecasting attempts the unenviable task of predicting complex human
preference and behavior with incomplete information. Short term demand predictions at
the product level are most often executed through a statistical forecast developed from
records of past demand, and extrapolated forward (Jain 2001, Fildes et al. 2015).
Remarkably, this simple approach has managed to produce some highly accurate
forecasts over wide ranges of industries, products, and locations, despite assuming
stationarity of conditions which drive changes in demand (Armstrong 2002). Weather, an
established driver of mood, preference, and ultimately consumer demand, has long been
incorporated into statistical forecasts via estimates of decomposed seasonal effects.
However, such estimates carry the implicit assumption that past seasons and their effects
will occur again in the same way.
What seasonality estimates fail to capture is that weather can change drastically
on a day to day basis, and has an immediate impact beyond a mean seasonal effect. This
has motivated many forward-leaning firms to incorporate short-medium term weather
forecasts into their demand planning processes. Increasingly accurate short-medium term
weather forecasts are available to demand planners from a variety of sources, and
predictive weather indicators have increasingly shown promise in the last few years,
though with some mixed results.
Nestle began incorporating weather forecast data into their bottled water demand
forecasts in 2008, improving weekly sales forecasts by 2-6% (IGD 2009) and saving
them as much as $12 million annually (Banker 2009). British grocery chain Tesco even
started employing their own weather forecasters in 2009 to improve their demand
forecasts (Werdigier 2009). Giants like Walmart and Procter and Gamble paired with the
Weather Company (owner of The Weather Channel and weather.com) in 2013 to match
their point-of-sale (POS) data with weather data to identify trends down to the individual
consumer level (Suddath 2014). IBM, recognizing a growing demand among businesses
for accurate weather forecast data, purchased the sensor, digital and data assets of the
Weather Company for $2 billion in late 2015 (Hardy 2015). Combining the Weather
Company’s assets with sensors from the National Oceanic and Atmospheric
Administration (NOAA), NASA and the U.S. Geological Survey (Dillow 2011), IBM
launched their Deep Thunder hyperlocal custom weather forecast engine in 2016,
providing weather-based insights for their business clients (Stockton 2016). This
expanding interest and investment in predictive weather indicators demands a greater
academic investigation into implications of predictive weather indicators for business
managers.
Observed weather’s immediate effect on demand has been examined in a number
of contexts and industries, with mixed conclusions on its nature, magnitude, and even
direction within the academic literature (Bertrand et al. 2015, Arunaj and Ahrens 2016,
Bujisic et al. 2016, Tran 2016, Li et al. 2017, among the most recent). Though consensus
exists that weather has an effect on demand, the causal linkages are still not
comprehensively understood. Practically applying the limited understanding of the
effects of weather on demand is further hampered by the fact that almost all research
focuses on the effect of observed weather, information that is unavailable to demand planners
when they make critical predictions for their supply chain. To date, despite growing use
of predictive weather indicators among practitioners, little has been published on
forecasting short term demand from predicted weather indicators (Nikolopoulos and
Fildes 2013, Steinker et al. 2016).
This distinction between observed and predicted weather indicators is important,
and is based in the inherent uncertainty of a weather forecast. Though the typical impulse
for a forecaster is to include as much information as possible to improve accuracy, there
is inherent risk in including information in a forecast which is itself uncertain. Thompson
and Brier (1955), Thompson (1962), Murphy (1977), Katz and Murphy (1990) and Katz
and Lazo (2011) all demonstrate, using cost-loss models, that imperfect weather forecast
information can only improve expected economic value for a business decision maker if
it is sufficiently reliable over the decision making horizon, and if it is possible to
make investments that protect against negative weather effects. If a decision maker
incorrectly estimates the reliability of the incorporated weather forecast, the cost or
efficacy of a loss preventive investment, or the loss that would be associated with a
weather event, they stand to lose money. Improper incorporation of uncertain
information like weather forecasts places businesses in a wide array of industries at risk.
For instance, in February of 2017, incorporation of an improperly specified temperature
forecast that was only a few degrees off into a power load projection led to blackouts in
over 40,000 South Australian homes in the middle of a dangerous heatwave (Burton
2017). Regarding economic risk, an estimated 16-25% of U.S. GDP and 80% of U.S.
companies are considered weather sensitive, or elastic to changes in weather (Bertrand
and Sinclair‐Desgagné 2011).
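The cost-loss reasoning cited earlier in this paragraph can be illustrated with the textbook static form of the model. The sketch assumes a decision maker who either pays a fixed cost to protect fully against a weather event or bears a fixed loss if the event occurs unprotected, so that protecting is worthwhile in expectation only when the event probability exceeds the cost-to-loss ratio. The function names and all numbers are hypothetical and are not taken from the cited papers.

```python
def protection_threshold(cost: float, loss: float) -> float:
    """Probability above which protecting is cheaper in expectation (cost / loss)."""
    return cost / loss


def expected_expense(p_event: float, cost: float, loss: float, protect: bool) -> float:
    """Expected expense of one decision: pay `cost` to protect, or risk `loss`."""
    return cost if protect else p_event * loss


# Illustrative numbers only.
cost, loss = 20.0, 100.0       # protect for 20, or lose 100 if the event occurs
true_probability = 0.10        # how often the event actually follows this forecast
stated_probability = 0.30      # what an overconfident forecast claims

# The decision maker trusts the stated probability: 0.30 exceeds the 0.20 threshold.
protect = stated_probability > protection_threshold(cost, loss)
paid = expected_expense(true_probability, cost, loss, protect)
best = min(expected_expense(true_probability, cost, loss, p) for p in (True, False))
print(f"protect={protect}, expected expense={paid:.1f}, best achievable={best:.1f}")
```

In this example, overestimating the reliability of the forecast leads the decision maker to protect and pay 20 where the expected expense of not protecting would have been 10, which is the kind of loss from misjudged forecast reliability the paragraph describes.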
This risk generally increases as the uncertainty of the weather forecast increases,
or as the horizon of the demand forecast is extended, but differs by application. Demand
planners are interested in forecasting horizons which, at a minimum, extend through
decision points where changes can be made to material flows (Murphy 1993). This is, of
course, highly dependent on production and logistics lead time, as well as the degree of
inventory and production centralization. For an industry like agriculture, there is no
requirement to accurately predict weather on a day-to-day basis, but weather information
is required months in advance. Climate forecasts indicate with reasonable accuracy the
expected accumulated levels of rain, wind and sun a farmer might expect, and would
dictate which fields they may fallow or which crops they may plant. For sales and
operations planning, the required weather forecast information horizon is typically
shorter, but requires much greater day-to-day accuracy. Weather effects can only be
aggregated over the relevant planning horizon. For manufacturing concerns, weather
forecasts would need to span a production cycle. To impact logistics costs, weather
forecasts need to span a replenishment cycle.
Paradoxically, due to spatio-temporal aggregation mitigating the effects of short
term variation, long-term climate forecasts tend to be more accurate than short and
medium term weather forecasts (Camargo and Hubbard 1999, Janis et al. 2004). Short
and medium term weather forecasts do not benefit from this aggregation effect, and so
their accuracy significantly degrades at ranges past one to two weeks. Pepsi (France)
experimented with incorporating weather forecasts into their sales and operations forecast
in 2009, but ultimately abandoned the effort when they found weather forecast
degradation past a two week horizon countered any benefit from inclusion (Fustier 2011).
Their two-week production cycle was too long, or available weather predictions at the
time were too unreliable to provide value to their forecasts.
In this study, we demonstrate the effect of including short-medium term predicted
weather indicators in demand forecasts for the quick service restaurant industry.
Utilizing autoregressive prediction models, we introduce exogenous weather variables
into time series forecasts for 41 menu items at 2,742 individual restaurants distributed
throughout the continental U.S. We expand on initial work by Nikolopoulos and Fildes
(2013), and Steinker et al. (2016) to estimate the effect of a greater variety of weather
forecast variables, across more products and locations. In the process, we demonstrate
actual improvement in various demand forecast quality measures through inclusion of
predicted weather, and possible improvements as weather forecast reliability improves.
Further, we demonstrate some instances where simple linear models that include
predictive weather factors show improvement over the proprietary forecast generated by
the restaurant firm over the same period. These improvements in forecast quality have
direct financial implications for the restaurant firm, and provide support for inclusion of
readily available predictive indicators in forecasting efforts in broader contexts.
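The general idea of an autoregressive model with an exogenous weather variable can be sketched as follows. This is an illustration only and not the specification estimated in this study: the single lag, the single weather regressor, the ordinary least squares fit via the normal equations, and all names and numbers are assumptions. The synthetic demand series is generated without noise so that the fit can be checked against known coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_arx1(demand, weather):
    """Least squares fit of demand[t] = c + phi * demand[t-1] + beta * weather[t]."""
    rows = [[1.0, demand[t - 1], weather[t]] for t in range(1, len(demand))]
    targets = demand[1:]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast_next(coefficients, last_demand, weather_forecast):
    """One-step-ahead forecast using a weather forecast for the next period."""
    c, phi, beta = coefficients
    return c + phi * last_demand + beta * weather_forecast


# Synthetic, noise-free data: a temperature series drives demand.
weather = [20.0, 25.0, 22.0, 30.0, 28.0, 18.0, 24.0, 27.0, 21.0, 26.0]
demand = [100.0]
for t in range(1, len(weather)):
    demand.append(10.0 + 0.5 * demand[-1] + 2.0 * weather[t])

coefficients = fit_arx1(demand, weather)
prediction = forecast_next(coefficients, demand[-1], weather_forecast=23.0)
print([round(v, 4) for v in coefficients], round(prediction, 2))
```

In practice the weather input for the forecast period is itself a forecast, so any error in it passes through the weather coefficient into the demand forecast; this is the source of the risk discussed above.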
The remainder of the paper is divided as follows: first we review the literature
regarding observed and predicted weather effects on demand and demand forecasts, next
we describe our modeling effort, report comparative results from our models, discuss
implications and draw conclusions from our results, present limitations, and finally
highlight opportunities for future research.
Literature Review
Observed Weather’s Effect on Demand
The economic effect of weather is well established in a number of familiar
contexts, and there is a significant body of literature dedicated to describing it. The
effects (and resultant implications) vary by industry, weather and demand forecast
horizon, specific weather phenomena and scope or level of aggregation.
Though weather has sizable demonstrated effects on financial markets,
manufacturing, retail, and services, the industries that have seen the most weather-related
research are those that are most directly impacted by (and thus sensitive to) weather
effects: agriculture and energy (Lazo et al. 2011). Agriculture depends directly on both
immediate and accumulated climate effects. Agricultural papers typically model effects
on crop yields (Mjelde et al. 1989, Potgeiter et al. 2003, Hamjah 2014), and depend on
long-range climatological, rather than short-medium range weather forecasts for most
predictive models. Energy related industries have an effect that is nearly as direct. Both
mining and energy utility demand increase under conditions of higher energy
consumption. Consumption tends to increase with both high and low temperature
extremes (Considine 2000, Auffhammer and Mansur 2014), and energy forecast models
may incorporate either short term weather forecasts for load balancing or long-term
climatological forecasts for capacity planning.
Restaurants are typically identified as part of the service industry (Howells and
Morgan 2017), though at times are grouped with retail (Starr-McCluer 2000), and share
multiple demand factors with retail. Lazo et al. (2011) note a relative lack of research
on weather sensitivity in the service industry sector, relative to agricultural and energy
sectors. This is despite weather accounting for an estimated $60B in variation within
service industrial sector revenue, compared to less than $16B in either agricultural or
energy sectors (2008 dollars). Bujisic et al. (2016) cite a specific lack of research on
weather sensitivity in hospitality and restaurant segments of the service industry sector
outside of coarse climatic and seasonality effects. For this reason, we review the
literature of weather sensitivity of both retail and services.
Weather Effects on Retail and Service Sectors
Steele (1951) was the first to demonstrate the effect of weather on retail sales,
determining a negative impact from precipitation, snow accumulation, and ambient
cooling on daily department store sales. Early studies of weather’s effect on retail tended
either to be limited in scope (like Steele’s research on one store) or to be regionally or
temporally aggregated. This is likely due to data availability and computational
limitations, and makes such studies of limited value for enterprise-level, short term and
distributed demand planning. For example, Johnston and Harrison (1980) used monthly
nation-wide average temperature and sunshine deviation to model aggregate UK cider sales.
They found that increases in a combination of temperature and sunlight positively
affected cider sales, but this analysis was not location specific. Juselius (1985) similarly
used monthly nationwide averages, but of numbers of warm-temperature days,
determining a significant positive effect on sales of Finnish soft drinks. Later studies in
retail decreased the temporal or regional aggregation, or included additional effects.
Since these early retail studies, various specific weather effects, most notably
temperature, have been empirically explored in relation to numerous contexts.
Operationalizations of temperature have been found to have a nonlinear but generally
positive effect on demand for a wide range of products, including lawn care products
(Cawthorne 1998), aggregate nondurable products (Starr-McCluer 2000), soft drinks
(Divakar et al. 2005, Ramanathan and Muyldermans 2010), beer (Bratina and Faganel
2008), aggregate service industry sales (Lazo et al. 2011), demand for both cars and
homes with warm-weather features (Busse et al. 2012), online clothing (Steinker et al.
2016), and food and clothing retail sales (Arunaj and Ahrens 2016). This positive effect
is, however, diminished or even negative in restaurant sales (Starr-McCluer 2000, Bujisic
et al. 2016), winter sports demand (King et al. 2014), extreme temperatures (Parsons
2001, Tran 2016), or may be dominated by a negative effect from variation in
temperature (Koksalan et al. 1999, Mena et al. 2014, Bertrand et al. 2015). The
temperature effect can also be heterogeneous based on region (Divakar et al. 2005, Tran
2016), season (Johnston and Harrison 1980, Bahng and Kincade 2012), temporal position
in a season (Cawthorne 1998, Choi et al. 2011), channel (Divakar et al. 2005), and
product (Starr-McCluer 2000, Choi et al. 2011, Busse et al. 2012, Arunaj and Ahrens
2016).
In addition to temperature, retail studies have found precipitation (rain or snow) to
have negative impacts on demand in department stores (Steele 1951), outdoor malls
(Parsons 2001), purchases via mobile phones (Li et al. 2017), online clothing (Steinker et
al. 2016), food and clothing retail sales (Arunaj and Ahrens 2016), and sporting goods
stores (Tran 2016). This effect is reversed, however, in demand for winter weather
appropriate vehicles (Busse et al. 2012) and winter sports (King et al. 2014), and Lazo et
al. (2011) note an overall positive effect of precipitation on service industry revenues.
Sunlight is found to reduce negative affect and to increase demand for tea and coffee
(Murray et al. 2010), alcoholic cider (Johnston and Harrison 1980), purchases via mobile
phones (Li et al. 2017), and online clothing (Steinker et al. 2016), though not outdoor
mall foot traffic (Parsons 2001), and it has the opposite effect on demand for cars with
winter weather features (Busse et al. 2012). Humidity reduces positive affect and has a
negative impact on restaurant sales (Bujisic et al. 2016), though it is not found to
significantly impact tea and coffee sales (Murray et al. 2010) or outdoor mall foot traffic
(Parsons 2001). Wind has also been found to coincide with lower restaurant sales
(Bujisic et al. 2016). As with temperature, these alternate weather effects tend to be both
nonlinear and heterogeneous across a number of dimensions (Arunaj and Ahrens 2016,
Tran 2016).
There are instances where we would not expect the effect of weather on quick
service restaurant demand to resemble retail demand, such as with online (Steinker et al.
2016) or mobile (Li et al. 2017) purchases, or in purchases of some durable goods (Starr-
McCluer 2000, Choi et al. 2011, Bahng and Kincade 2012, Busse et al. 2012, Bertrand et
al. 2015). This is due to the experiential nature of restaurants. Though they sell tangible
goods, those goods are typically consumed within a short time, and so are only purchased
when there is an immediate need. This makes restaurant demand, particularly for the
partially commoditized quick service restaurant industry, highly dependent on short term
inclinations of people to be out and to physically visit a store. Steele (1951) posits four
ways short term weather might affect a customer’s desire (or ability) to visit a business.
Explanations for Consumer Behavior
First, a shopper may be physically prevented from visiting, as would be the case in an extreme
weather event. Second, a shopper may be disinclined due to inconvenience. This might
be the case with severe cold, heat, fog, snow or precipitation, which would require
additional planning, protective clothing, or caution. This notion is supported by research
that links increased outdoor leisure activity to increases in temperature (Smith 1993),
though the effects are regionally and seasonally heterogeneous (Tucker and Gilliland
2007), and negative in extreme temperatures (Zivin and Neidell 2014). These first two
mechanisms are facilitated by local infrastructure and individual adaptability, driven by
local weather norms (Tran 2016). Third, the shopper may face psychological barriers to
either going out or, once out, making a particular purchase. A large body of psychology
and marketing literature links weather with mood, and mood with behavior (Cao and Wei
2005). Persinger and Levesque (1983) indicate
that 40% of mood evaluations can be attributed to weather. Increased temperature
(Cunningham 1979, Howarth and Hoffman 1984), sunlight (Cunningham 1979),
barometric pressure (Goldstein 1972), and decreased humidity (Sanders and Brizzolara
1982, Murray et al. 2010) all relate to positive affect; with negative affect mitigated by
sunlight (Murray et al. 2010). Positive affect is then positively related to purchase
interaction quality (Gardner 1985), positive perceptions of goods (Bitner 1992), and
increased spending (Donovan et al. 1994, Spies et al. 1997); with negative affect related
to a decreased willingness to pay (Murray et al. 2010). Some research also links “bad”
weather (increased precipitation, fog, and decreased sunlight) to risk-averse behavior
(Hirshleifer and Shumway 2003, Bassi et al. 2013, Li et al. 2017). Fourth, a shopper may
perceive different product utilities based on weather conditions. Unexpected rain may
spur the purchase of ponchos or umbrellas, and an early snow flurry may initiate the
season for selling winter garments. Subsequent studies (Starr-McCluer 2000, Tran 2016)
have measured this by incorporating weather variables into household production utility
models. This may also manifest as weather effect heterogeneity, and in cases of
substitution where weather has individual product effects, but not overall demand effects
(Choi et al. 2011, Bahng and Kincade 2012).
Problems with Using Observed Weather as a Proxy for Weather Forecasts
Previous work relating past observed weather effects to demand was either
descriptive, in that it did not claim to be able to predict future behavior with the
identified relationships, or implicitly forecast weather along with demand.
Murphy (1997) and Armstrong (2002) refer to this as ex-post or conditional forecasting,
and note that while it can provide an extremely accurate description of past behavior, it
can perform quite poorly when predicting behavior. The problem with this approach is that
the best known methods for statistically estimating future demand from a time series
differ substantially from the best known methods for forecasting weather.
Advances in Weather Forecasting
It is useful at this point to define what is meant by a short term or medium term
forecast. The National Weather Service (NWS) and American Meteorological Society
define short term forecasts as up to two days ahead, and medium range forecasts as being
between two and seven days ahead (AMS 2015). We use this definition for short range
weather predictions but, as has increasingly been the case in recent years (Hu and
Skaggs 2009), we extend the definition of medium range weather prediction out to ten
days.
Short and medium range weather forecast accuracy has increased rapidly in recent
decades, and the reason for this is also the primary reason why it is advantageous to
estimate weather separately from demand. While demand forecasting is primarily limited
(at least mathematically) to statistical and probabilistic extrapolations, weather has (for
quite some time) been better estimated through an ensemble of methods that include
simulation. From the earliest manual attempts by Lewis Fry Richardson in 1922, to the
more successful computerized efforts in the 1940s by the mathematician John von
Neumann, large scale simulations of fluid mechanic and thermodynamic weather effects
have had accuracy limited only by computing power and environmental sensor data
availability (Tribbia 1997). Significant public and private investments in environmental
sensors and exponential advances in computation have permitted steady improvements in
weather forecast accuracy via Monte-Carlo simulation based ensemble forecasts (Dutton
2002). NWS short term temperature forecast average error was cut in half between 1966
and 2014, and is now 2.5 to 3.0 °F for two-day ahead forecasts (Huntemann et al. 2014).
Between 1992 and 2012, temperature forecasts of five to six days achieved the previous
accuracy of three to four day forecasts (AMS 2015). Probabilistic forecast performance for
short and medium range precipitation has improved similarly (Hu and Skaggs 2009,
Huntemann et al. 2014). Overall, the reliable forecast range of most weather phenomena
has increased roughly one day each decade, and is currently greater than one week (AMS
2015).
Effort has also been made to forecast demand using simulation, but demand
planners still primarily rely on extrapolative time series methods for quantitative
forecasting (Fildes et al. 2015). The reason for this is that the required econometric and
behavioral simulation parameters for demand forecasting depend on much less reliable
information than Newtonian factors which are found to drive short term weather effects.
Forecasters have found limited success extrapolating exogenous effects in what
Armstrong (2002) terms static simulation, which supposes that an effect will not change
from period to period. Similarly, judgmental adjustment bootstrap simulations
outperform both manual adjustments and forecasts without subjective adjustments, but
only under a narrow range of stable exogenous conditions (Ritzman and Sanders 2001,
Fildes et al. 2008). This is clearly not a suitable assumption for most daily weather
conditions, and fails to leverage the tremendous advances in predictive power stemming
from an accurate understanding of physical forces, simulation, exponentially increasing
computing power, and an ever expanding network of physical sensors. Therefore, despite
so many previous efforts to extrapolate weather and demand concurrently, we choose to
incorporate separately generated weather forecasts as exogenous factors in a statistical
model for demand.
Forecasted Weather to Predict Demand
Although observed weather’s effect on demand is widely studied, if somewhat
less so in restaurant and service contexts, it is of limited utility for prediction. Because
all of the research listed above measures the effect of only observed weather, it implies
an ability for perfect weather prediction. We must acknowledge that information available
to demand forecasters is imperfect, and so should include accurately predicted weather
information rather than perfect observed weather information to estimate predictive
models. This is particularly true in a supply chain context, where immediate decisions
about sourcing, production, and material flow relate to sales and operations plans with
horizons of a week or more. The director of the NWS recently noted that weather
forecast accuracy drops off considerably after a few days, due to unpredictable lower
order effects from physical inputs (Palmer 2013). While the NWS and many other
providers now offer 10-day forecasts, and there is evidence of forecaster skill that extends
as far as 14 days (Stern and Davidson 2015), accuracy beyond that range tends to be no
better than what you may find in the Old Farmer’s Almanac (Samenow and Fritz 2015).
Silver (2012) notes that most temperature forecasts beyond nine days in advance actually
tend to perform worse than historical averages, and only slightly better than a naïve
persistence forecast. This limitation has in the past made weather forecast information
less valuable for businesses with longer production cycles or lead times, and certainly
calls into question the use of observed weather as a proxy for predicted weather when
demonstrating the effect of incorporating weather into demand forecasts.
The work of Thompson and Brier (1955), Thompson (1962), Murphy (1977),
Katz and Murphy (1990) and Katz and Lazo (2011) indicates that weather forecast
reliability over a business’s operational planning horizon is a necessary condition for it to
provide value. Weather forecasts with sufficient lead time to permit changes in material
flow or capital investments have only in recent years become more reliable than simple
historical trends or seasonality estimations (Stern and Davidson 2015), a requirement for
use in decision-making (Mjelde and Dixon 1993). To date, only two studies have
observed the impact of incorporating medium range weather forecasts on demand
forecast accuracy.
Nikolopoulos and Fildes (2013) published the first work that evaluates the effect
of incorporating medium range weather forecasts in a demand forecast. In a highly
context-specific study, they investigated the effect of including 10-day ahead temperature
deviation predictions in demand forecasts of beer sales in the United Kingdom. They report
a significant improvement in demand forecast accuracy from this inclusion, especially in
warmer months of the year. These results also support previous indications that weather
effects vary by region, season, product, mood, and a number of other difficult to capture
factors. Steinker et al. (2016) expand on this initial inclusion of weather forecasts, but in
the (again) specific context of German online retail in only two cities. They examine the
effect of including seven-day ahead predicted sunlight, temperature and precipitation on
demand forecast accuracy. They first establish an upper bound by fitting a model to
in-sample data with observed weather, then validate the model in an out-of-sample forecast
generated with historical weather forecast data. They find sunlight and temperature are
positively related to online sales, with this effect increased on the weekend, whereas rain
is negatively related. This is consistent with expectations for weather impacting the time
spent indoors versus outdoors. They also note a significant improvement in demand
forecast accuracy through the inclusion of weather forecast indicators, though the effect
diminishes as forecast horizon increases.
In sum, previous research on the effect of observed weather on demand has
indicated significant effects from a number of weather variables in a variety of industrial
and regional contexts, though these effects tend to be heterogeneous across industries,
regions, seasons, temporal positions in a season, and products. Research also indicates
that predicted weather can improve demand forecasts, but to date this is limited to a
narrow scope and context. We wish to expand on the scope of previous work and test the
potential of weather forecast data to improve demand forecast quality over a broader set
of contexts and in a new industrial setting.
Data
Our data were collected from multiple sources. Historical time series demand
data were furnished by the primary fourth party logistics (4PL) provider for a major
international quick service restaurant. Through an ongoing relationship with a university
research group, they enlisted our assistance in determining the effect of predictive
indicators that may improve their forecasting. To accomplish this, they provided us
with a large sample of forecasts and corresponding POS records they considered to be
representative of a wide range of menu offerings, demand patterns, geographic regions,
and seasons. The potential sample included 41 of the most popular menu-level products
at 4240 individual restaurants, covering a date range from 26 September 2014 through 1
February 2016. In total, we evaluated nearly 85 million individual observations, with
each corresponding to an individual menu item-location-day.
Demand planners at the 4PL currently generate rolling POS and replenishment
forecasts once each week for each menu item-location. If weather forecast information is
to be included in these POS forecasts, it must have a horizon of at least seven days, and
cover the same geographic regions and time periods as our POS sample. Though the
NOAA does not systematically store NWS weather forecasts, we were able to obtain The
Weather Company’s historical weather forecast data from a third party weather forecast
monitoring and assessment firm called ForecastWatch. The firm gathers forecasts from
multiple public and private sources, and regularly assesses their relative performance.
Most private weather forecasters use the NOAA network and even the NWS forecast as a
basis for improvement (Silver 2012), but they vary in how much (if at all) they improve
on the public forecasts. In a 2014 assessment, The Weather Company’s one- to nine-day
temperature forecasts were competitive with the best domestic forecast provider
(WeatherUnderground) and better than the NWS forecast (Floehr 2015).
We collected daily data from 836 available airport weather stations in the
continental US, providing widespread geographic coverage and tracked by International
Civil Aviation Organization (ICAO) codes. Each daily prediction includes nine-, five-,
three-, and one-day prior forecasts of high and low temperature, vector average wind speed,
cloud cover percentage on a five-point Likert scale, and the probability, also on a
five-point Likert scale, of rain, thunderstorms, snow, and overall precipitation.
ForecastWatch provided daily
observed high and low temperature, vector average wind speed, and accumulated
precipitation for each observation. We augmented their data with observed daily rain,
thunderstorm and snow occurrence for each ICAO code from the NOAA’s Local
Climatological Database (NCEI 2017). We include multiple measures of predicted
weather indicator reliability for all point forecasts (temperature and wind speed) in Table
1. Mean error (ME) is an indicator of bias and mean absolute error (MAE) is a measure
of accuracy for point forecasts. Bias decreases with increased forecast horizon: the
further ahead a projection is, the more closely it resembles long-term climatology. There
is also an indication of conservatism in the predictions, as bias (for all three phenomena)
is directionally away from extreme values. Accuracy degrades with longer horizons, which
may reduce the value of predicted indicators at those horizons.
Point Forecasts        One-Day   Three-Day   Five-Day   Nine-Day
ME
  High Temp (°F)         -0.64       -0.64      -0.56      -0.41
  Low Temp (°F)           0.32        0.35       0.24       0.07
  Wind Speed (mph)       -2.46       -2.20      -1.97      -1.58
MAE
  High Temp (°F)          2.22        2.89       3.78       5.75
  Low Temp (°F)           2.28        2.84       3.61       5.05
  Wind Speed (mph)        2.90        2.88       3.10       3.62
Table 1: Predicted Weather Indicator Point Forecast Reliability
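As an illustration only, the two point forecast measures reported in Table 1 can be sketched in Python as follows. The sign convention (forecast minus observation) is an assumption, chosen because it agrees with the conservative bias described above; the function names and the temperature values are hypothetical and are not drawn from our sample.

```python
from statistics import mean


def mean_error(forecasts, observations):
    """Mean error (ME): average signed error, an indicator of bias.

    Assumption: error = forecast - observation, so a negative ME means
    the forecasts ran below what was observed.
    """
    return mean(f - o for f, o in zip(forecasts, observations))


def mean_absolute_error(forecasts, observations):
    """Mean absolute error (MAE): average error magnitude, a measure of accuracy."""
    return mean(abs(f - o) for f, o in zip(forecasts, observations))


# Hypothetical one-day-ahead high temperature forecasts and observations (°F).
forecast_high = [70.0, 65.0, 80.0, 75.0]
observed_high = [72.0, 64.0, 83.0, 75.0]

me = mean_error(forecast_high, observed_high)            # signed errors: -2, 1, -3, 0
mae = mean_absolute_error(forecast_high, observed_high)  # magnitudes: 2, 1, 3, 0
print(f"ME = {me:.2f} °F, MAE = {mae:.2f} °F")
```

In this toy case the signed errors partly cancel, so ME is smaller in magnitude than MAE, which is why the two measures are reported separately.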
Table 2 includes measures of predicted weather indicator quality for all
probabilistic forecasts (rain, thunderstorms, snow and overall precipitation). The Brier
Score is often used to assess probability forecasts, and is a special case of mean squared
error (MSE) bounded by zero and one (Murphy 1997). As with all cases of MSE, lower
values indicate better forecasts. Unfortunately, this measure equally rewards correctly
predicting both occurrences and non-occurrences of a weather event. For rarer events
like snow or thunderstorms (which may have a significant effect on demand), this value is
artificially low. We therefore include measures of positive predictive value (PPV) and
sensitivity (Brenner and Gefeller 1997). PPV expresses the proportion of positive
occurrences given positive predictions of an event. For probabilistic forecasts, we use the
classification probability cutoff of 0.5, indicating that an event is predicted to be more
likely than not to occur. Sensitivity expresses the proportion of positive predictions given
positive occurrences of an event. As with the point forecast quality indicators, the
probability forecast quality indicators generally degrade with longer horizons. PPV and
sensitivity also indicate conservatism in predictions, as the predicted probabilities and
frequencies of weather events tend to be lower than the observed probability. For
example, 94% of one day ahead rain forecasts predicting a 50% or greater chance of rain
are followed by an observed occurrence of rain, but only 23% of rain events are
predicted.
Probability Forecasts   One-Day   Three-Day   Five-Day   Nine-Day
Brier Score
  Rain                    0.306       0.309      0.326      0.365
  Thunderstorm            0.067       0.072      0.073      0.072
  Snow                    0.035       0.041      0.044      0.054
  Precipitation (all)     0.176       0.179      0.199      0.238
PPV
  Rain                    0.936       0.878      0.804      0.586
  Thunderstorm            0.375       0.336      0.298      0.250
  Snow                    0.887       0.720      0.648      0.377
Sensitivity
  Rain                    0.228       0.239      0.215      0.208
  Thunderstorm            0.512       0.563      0.491      0.282
  Snow                    0.333       0.301      0.265      0.127
Table 2: Predicted Weather Indicator Probability Forecast Reliability
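The three probability forecast measures in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions: forecast probabilities are taken to lie in [0, 1], outcomes are coded 1 (event occurred) or 0 (did not occur), and a forecast at or above the 0.5 cutoff counts as a positive prediction. The function names and the sample values are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def ppv_and_sensitivity(probs, outcomes, cutoff=0.5):
    """Return (PPV, sensitivity) after classifying forecasts at the cutoff.

    PPV         = true positives / all positive predictions
    sensitivity = true positives / all observed occurrences
    Either value is None when its denominator is zero.
    """
    predicted = [p >= cutoff for p in probs]
    tp = sum(1 for pred, o in zip(predicted, outcomes) if pred and o == 1)
    fp = sum(1 for pred, o in zip(predicted, outcomes) if pred and o == 0)
    fn = sum(1 for pred, o in zip(predicted, outcomes) if not pred and o == 1)
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity


# Hypothetical rain forecasts (probabilities) and observed occurrences.
rain_probs = [0.75, 0.5, 0.25, 0.0, 0.25, 1.0]
rain_seen = [1, 0, 1, 0, 1, 1]

bs = brier_score(rain_probs, rain_seen)
ppv, sens = ppv_and_sensitivity(rain_probs, rain_seen)
print(f"Brier = {bs:.3f}, PPV = {ppv:.3f}, sensitivity = {sens:.3f}")
```

In this toy sample, two of the three positive predictions are followed by rain, but only two of the four rain events are predicted. This is the same pattern of high PPV with low sensitivity that we read as conservatism in the forecasts.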
One limitation of our available weather predictions is the gaps in projection (i.e.
one, three, five and nine rather than one through nine day predictions). As a result, we
“bin” effects from projected weather into horizon categories. In each weekly demand
forecast, six to seven day horizons depend on weather projected nine days prior, four to
five day horizons on weather projected five days prior, two to three day horizons on
weather projected three days prior, and the next day horizon on weather projected one day
prior. The resultant conservative application of proxies for full weather forecasts means
our results will serve as a lower bound estimate for weather accuracy at longer ranges.
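The binning rule described above can be written as a small lookup. The sketch below is illustrative only, and the function name is hypothetical. It maps each day of the seven day demand forecast horizon to the weather forecast lead (in days prior) that would supply its weather indicators.

```python
def weather_lead_for_horizon(horizon_days):
    """Map a demand forecast horizon (1-7 days) to the available weather lead.

    Only one-, three-, five-, and nine-day prior weather forecasts exist, so
    each horizon takes the shortest available lead that is at least as long
    as the horizon.
    """
    if not 1 <= horizon_days <= 7:
        raise ValueError("horizon must be between 1 and 7 days")
    if horizon_days == 1:
        return 1
    if horizon_days <= 3:
        return 3
    if horizon_days <= 5:
        return 5
    return 9


# Weather lead used for each day of a weekly forecast.
leads = [weather_lead_for_horizon(h) for h in range(1, 8)]
print(leads)
```

Because every horizon other than the first is matched to a weather forecast at least as old as the horizon requires, the weather information used is never fresher than what a planner could actually have had.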
Relevance of station forecasts to individual restaurants was also an area of
concern. In order to sufficiently capture geographic weather variance in the U.S.,
minimum sensor density is dependent on the weather measure of interest. Camargo and
Hubbard (1999) found that distances could not exceed 60 km in order to capture 90% of
inter-site variation in daily max temperature. This distance reduces to 30 km for min
temperature and sunlight, 10 km for wind, and five km for precipitation. Micro-climates
in mountainous areas can drive the minimum distance as low as one km. Unfortunately,
many of the 836 weather stations we gathered data from were too distant from the
nearest restaurants in our sample to be relevant by this metric. We therefore trade
granularity of the measure against relevant sample size. We use the
conservative threshold of 30 km distance to identify a weather station as being relevant to
a restaurant. This ensures that temperature effects can be accurately estimated, and that a
less granular categorical characterization of wind and precipitation can be used. Applied
as the geodesic ellipsoid distance threshold between coordinates for stations and
restaurants (NGIA 2014, Hijmans et al. 2016), this eliminated about 25% of our original
restaurant sample (3162 remaining from 4240), and disproportionately from more
sparsely populated Western states.
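The station relevance filter can be sketched as follows. Note that this example swaps in the haversine formula, a spherical approximation, in place of the ellipsoidal geodesic distance used in our analysis; over distances near 30 km the two typically differ by well under one percent. The station labels and all coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_relevant_station(restaurant, stations, max_km=30.0):
    """Return (station_label, distance_km) for the closest station within max_km.

    Returns None when no station is close enough, in which case the
    restaurant would be dropped from the sample.
    """
    best = None
    for label, (lat, lon) in stations.items():
        d = haversine_km(restaurant[0], restaurant[1], lat, lon)
        if d <= max_km and (best is None or d < best[1]):
            best = (label, d)
    return best


# Hypothetical station and restaurant coordinates (latitude, longitude).
stations = {"STN_A": (39.998, -82.892), "STN_B": (39.814, -82.928)}
urban_restaurant = (39.960, -83.000)
remote_restaurant = (44.000, -110.000)

print(nearest_relevant_station(urban_restaurant, stations))
print(nearest_relevant_station(remote_restaurant, stations))
```

The remote restaurant has no station within 30 km and so returns None, mirroring how restaurants in sparsely populated areas fall out of the sample.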
Hypotheses
As indicated in Nikolopoulos and Fildes (2013) and Steinker et al. (2016), we
expect predicted temperature variables to have a significant (if heterogeneous) effect on
demand, and so inclusion is likely to improve demand forecast accuracy. The direction
of the weather effect does not matter when estimating forecast models, so the
heterogeneity observed across several dimensions (Johnston and Harrison 1980,
Cawthorne 1998, Starr-McCluer 2000, Divakar et al. 2005, Choi et al. 2011, Bahng and
Kincade 2012, Busse et al. 2012, Arunaj and Ahrens 2016, Tran 2016) will not affect the
accuracy of a prediction using weather predictors, provided the effect is correctly
estimated. By including both daily high and low temperature data, we capture the
expected effects from the most extreme possible values. During summer months, when
heat is more likely to influence behavior, the high temperature indicator is
more likely to have an effect on demand. In winter months, the low temperature
indicator is more likely to have an effect on demand. Temperature is also one of the
more reliably estimated weather parameters over a short horizon (Camargo and Hubbard
1999), but accuracy degrades at middle ranges (Floehr 2015). This leads to the following
hypotheses:
H1a: Demand forecast models that incorporate exogenous high temperature predictions
will be more accurate than models that do not.
H1b: Demand forecast models that incorporate exogenous low temperature predictions
will be more accurate than models that do not.
H1c: Demand forecast models that incorporate short range (one to three days)
exogenous temperature predictions will be more accurate than models that incorporate
medium range (five to nine days) exogenous temperature predictions.
Bujisic et al. (2016) indicate wind has a significant negative effect on restaurant
demand, and so inclusion is likely to improve demand forecast accuracy. Wind
prediction, though also quite accurate (Camargo and Hubbard 1999), degrades in quality
with longer forecast horizons. We therefore hypothesize the following:
H2a: Demand forecast models that incorporate exogenous wind speed predictions will be
more accurate than models that do not.
H2b: Demand forecast models that incorporate short range (one to three days)
exogenous wind speed predictions will be more accurate than models that incorporate
medium range (five to nine days) exogenous wind speed predictions.
The effect of sunlight on negative (Murray et al. 2010) and positive (Cunningham
1979) affect is well established, and it has been found to positively relate to willingness
to pay (Murray et al. 2010), interaction quality (Gardner 1985), perceptions of goods
(Bitner 1992), decreased risk aversion (Li et al. 2017), and increased spending (Donovan
et al. 1994, Spies et al. 1997). However, it is still unclear whether this translates to
customer propensity to physically patronize businesses (Parsons 2001). Based on
previous work that indicates a significant effect of this weather phenomenon on demand,
we predict its inclusion will improve demand forecast accuracy. We do not have an
indicator of weather forecast quality for sunlight (cloud cover), so we will not speculate
about differences across forecast horizons. The resulting hypothesis is:
H3: Demand forecast models that incorporate exogenous cloud cover predictions will be
more accurate than models that do not.
Various forms of rain and other precipitation have been found to have a
significant, but heterogeneous, effect on demand (Parsons 2001, Lazo et al. 2011, Busse
et al. 2012, King et al. 2014, Arunaj and Ahrens 2016, Steinker et al. 2016, Tran 2016, Li
et al. 2017). As posited in Steele (1951), this is likely negative (and more pronounced)
when more extreme weather such as snow and thunderstorms make business patronage
inconvenient or impossible. Unfortunately, precipitation forecasting tends to have lower
reliability, particularly for locations further from a weather station (Camargo and
Hubbard 1999). This is especially true as the forecast horizon increases. As a result, we
hypothesize:
H4a: Demand forecast models that incorporate exogenous rain predictions will be more
accurate than models that do not.
H4b: Demand forecast models that incorporate exogenous thunderstorm predictions will
be more accurate than models that do not.
H4c: Demand forecast models that incorporate exogenous snow predictions will be more
accurate than models that do not.
H4d: Demand forecast models that incorporate exogenous precipitation (all kinds)
predictions will be more accurate than models that do not.
H4e: Demand forecast models that incorporate short range (one to three days)
exogenous rain, thunderstorm, snow, or precipitation (all kinds) predictions will be more
accurate than models that incorporate medium range (five to nine days) predictions.
Methodology
Autoregressive Models
To as great an extent as practical, we replicated the 4PL firm’s forecast
conditions. This was to ensure enhanced comparability with their own forecasting
efforts. We did not have access to the forecast management system in use by the firm,
and so could not replicate individual forecast model decisions. The firm customizes
models, parameters, and adjustments in some cases to even the product-restaurant level
using a combination of quantitative extrapolation and subject matter expertise of local
conditions. By comparison, we use a single (albeit adaptive) method to generate all
forecasts, but include weather forecast indicators that are not included in the 4PL’s
models. Despite our inability to directly compare identical models, it is interesting to
observe whether the inclusion of predictive indicators can improve a generic time series
model to the extent that it might even outperform a more customized model.
We wished to generate POS forecasts for each product, at each restaurant, for
each day over a seven day horizon, once each week. This required that we generate as
many as 174,000 separate sets of rolling forecasts. Each set of rolling forecasts would be
replicated with each of eight weather effects under both perfect knowledge and
uncertainty, and include on average 10-15 re-estimated rolling forecasts. It was apparent
that an automated method of forecast generation was required. Automated forecasting
methods have been shown to outperform methods with static manual estimation, and to be
adaptive to multiple time series characteristics (Makridakis and Hibon 2000). In
addition, we needed a method that permitted effective control of exogenous effects. We
wished to observe the effect of including exogenous short term predicted weather
variables. In order to isolate these effects in a regression model, we had to include
corrections for violations of the assumptions of ordinary least squares regression.
Specifically, we wished to correct for autocorrelation, trend and seasonality if they exist.
ARIMA is a flexible class of models that can incorporate these corrections and include
exogenous regressors, all in
an automated fashion.
ARIMA models include autoregressive (AR) terms, or lagged values included as
predictors of outcome variables. They include differencing or integration (I) terms, to
transform non-stationary time series to be stationary. They also include moving average
(MA) terms that express errors as linear combinations of current and lagged errors. Each
of these terms has a non-negative integer order, denoted by p, d, and q, indicating the
number of lagged, differencing, and moving average terms respectively that the outcome
variable is regressed on. To account for seasonality, ARIMA models can be adapted to include
additional lagged, moving average, and differencing terms, denoted by P, Q, and D
respectively, with backshift operators in multiples of the seasonal period m. This model, denoted
$\mathrm{ARIMA}(p, d, q)(P, D, Q)_m$, is expressed as (Cools et al. 2009, Arunaj et al. 2016):
$$\phi_p(B)\,\Phi_P(B^m)\,(1 - B^m)^D (1 - B)^d\, Y_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t$$
Where $Y_t$ is a time series observed value (daily sales) at time $t$, $\phi_p(B)$ and
$\Phi_P(B^m)$ are the non-seasonal and seasonal autoregressive operators with respective
orders $p$ and $P$, $\theta_q(B)$ and $\Theta_Q(B^m)$ are the non-seasonal and seasonal moving average
operators with respective orders $q$ and $Q$, $(1 - B)^d$ and $(1 - B^m)^D$ are the non-seasonal
and seasonal differences with respective orders $d$ and $D$, and $\varepsilon_t$ is a residual error term at
time $t$. All operators of form $\alpha_x(B)$ represent polynomials of backshift operators of
form: $1 - \alpha_1 B - \alpha_2 B^2 - \dots - \alpha_x B^x$.
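As a worked illustration of this notation, consider a hypothetical $\mathrm{ARIMA}(1,0,0)(0,1,0)_7$ model for daily sales. The weekly period $m = 7$ and the orders are assumptions chosen for simplicity, not values estimated in this study. Expanding the backshift polynomials shows how the compact operator form translates into a recursion on lagged sales:

```latex
\begin{align*}
(1 - \phi_1 B)(1 - B^7)\, Y_t &= \varepsilon_t \\
(1 - \phi_1 B - B^7 + \phi_1 B^8)\, Y_t &= \varepsilon_t \\
Y_t &= Y_{t-7} + \phi_1\,(Y_{t-1} - Y_{t-8}) + \varepsilon_t
\end{align*}
```

Under these assumptions, today's sales equal sales on the same weekday one week earlier, adjusted by a fraction $\phi_1$ of yesterday's deviation from its own value a week before.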
To include the effects of exogenous variables, the seasonal ARIMA terms can be
represented as the stochastic error term in a multiple linear regression model (Aburto and
Weber 2007, Cools et al. 2009, Peter and Silvia 2012, Arunaj et al. 2016):
$$Y_t = \beta_0 + \sum_{i=1}^{k} \beta_i X_{i,t} + \eta_t$$
Where $Y_t$, the dependent variable, is a time series observed value (daily sales) at time $t$, $X_{i,t}$
represents $k$ separate independent regressors, $\beta_i$ represent multiple regression coefficients
of the $k$ regressors, and $\eta_t$ is a residual series defined by the seasonal ARIMA
parameters:
$$\eta_t = \frac{\theta_q(B)\,\Theta_Q(B^m)}{\phi_p(B)\,\Phi_P(B^m)\,(1 - B^m)^D (1 - B)^d}\,\varepsilon_t$$
This model, referred to as seasonal ARIMA with external regressors
(SARIMAX), was estimated for each menu item-location combination. Since each time
series would likely exhibit distinct seasonality, trend, and autoregressive characteristics,
we chose an adaptive algorithm developed by Hyndman and Khandakar (2008), the
‘auto.arima’ function in the ‘forecast’ package v.8.0 in R (Hyndman et al. 2017).
As described above, when external regressors are used, ARIMA parameters are
estimated from the residuals of a linear model predicting the time series of interest. The
‘auto.arima’ function estimates seasonal ARIMA parameters automatically by, first,
selecting the order of differencing using successive unit root tests (Hyndman and
Athanasopoulos 2014). As Hyndman and Khandakar (2008) describe, it estimates
seasonal difference order (D) using the Osborn, Chui, Smith and Birchenhall unit root
test. This has been shown to perform favorably compared to other unit root tests for
seasonal differences (Rodrigues and Osborn 1999). Second, it estimates non-seasonal
difference order using the Kwiatkowski-Phillips-Schmidt-Shin test, which corrects a bias
toward over-differencing found in other tests of stationarity by taking stationarity
(d = 0), rather than a unit root (d = 1), as its null hypothesis. Once D and d are determined, it estimates values of p, q, P and
Q by minimizing the corrected Akaike’s Information Criterion (AICc), a correction of the
AIC for finite samples:
$$AICc = -2\log(L) + 2r + \frac{2r(r + 1)}{n - r - 1}$$
Where L is the maximum likelihood function value for the model, n is the number of
observations used to fit the model, and 𝑟 = 𝑝 + 𝑞 + 𝑃 + 𝑄 + 𝑘 + 1 is the number of
model parameters. Model parameters include all order parameters, the variance of the
random error term 𝜀𝑡 , and 𝑘 = 1 when there exists bias in the ARIMA error, else 𝑘 = 0.
AICc rewards goodness of fit and penalizes both model complexity and small relative
sample size, resulting in high performing, but parsimonious models (Hyndman and
Athanasopoulos 2014).
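The AICc calculation above can be sketched as follows. The function name and the log-likelihood values are hypothetical; the sketch only illustrates how the penalty trades a small gain in fit against added parameters, and is not the package's internal implementation.

```python
def aicc(log_likelihood, n_obs, p, q, P, Q, k):
    """Corrected AIC using r = p + q + P + Q + k + 1 estimated parameters."""
    r = p + q + P + Q + k + 1
    if n_obs - r - 1 <= 0:
        raise ValueError("AICc is undefined unless n_obs > r + 1")
    aic = -2.0 * log_likelihood + 2.0 * r
    return aic + (2.0 * r * (r + 1)) / (n_obs - r - 1)


# Hypothetical log-likelihoods for two candidate models fit to the same 365 days.
simple_model = aicc(log_likelihood=-1000.0, n_obs=365, p=1, q=0, P=0, Q=1, k=0)
complex_model = aicc(log_likelihood=-999.5, n_obs=365, p=3, q=2, P=2, Q=2, k=1)

# The model with the lower AICc is preferred.
preferred = "simple" if simple_model < complex_model else "complex"
```

In this invented case the larger model fits slightly better, but its eight additional parameters raise its AICc, so the smaller model is retained.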
Models for Comparison
To measure the effect of including predicted weather variables in seasonal
ARIMA models, which we will call the evaluation models, we include three benchmarks
as a means of comparison. First are the baseline models, with no external predicted
weather indicators. This is the most direct method of comparison. Second, following the
example of Steinker et al. (2016), we generate upper-bound models that include observed
(rather than predicted) weather external covariates to forecast demand over the evaluation
period. This is obviously unavailable to the demand forecaster, but provides an idea of
the potential value of including weather as weather prediction continues to improve. The
third and least direct comparison is the set of 4PL forecasts. The baseline, upper-bound and
4PL forecasts are either generated or collected for each menu item-location for
comparison against the evaluation models.

We begin by estimating baseline seasonal ARIMA forecast models without
exogenous variables for each menu item-location combination. Each baseline model is
fit on a full year of in-sample POS data to capture all potential annual seasons. This
exceeds the minimum of $p + q + P + Q + d + mD + 1$ observations that Hyndman (2007)
suggests is required to estimate a seasonal ARIMA model, a threshold based
on statistical estimability only. That threshold does not take into account the potential for increased
model fidelity achievable when all annual seasonal variations are included in a training
set. This stringent requirement reduced our sample further, as not all menu item-
locations had the minimum training sample size.
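To show how far a full year of data exceeds the estimability threshold above, the minimum can be computed directly. The function name and the example orders are hypothetical and are not taken from the fitted models.

```python
def min_observations(p, q, P, Q, d, D, m):
    """Minimum observations p + q + P + Q + d + m*D + 1 for a seasonal ARIMA model."""
    return p + q + P + Q + d + m * D + 1


# Hypothetical ARIMA(1, 1, 1)(1, 1, 1) model with weekly seasonality (m = 7).
required = min_observations(p=1, q=1, P=1, Q=1, d=1, D=1, m=7)
training_days = 365
surplus = training_days - required
```

For these illustrative orders only 13 observations are strictly required, so the 365-day requirement is driven by coverage of annual variation rather than by estimability.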
The estimated model is then used to forecast over a seven-day horizon. The
out-of-sample data, that is, all observations that occur after the minimum 365 days of in-sample
data, are then used to generate successively updated forecasts. This is a common
method of forecast validation (Armstrong 2002), but we augment this further by also re-
estimating model parameters with updated POS data. In seven day intervals, all model
parameters are re-estimated with the previous 365 days of POS observations. In this way,
we generate forecasts over the entire test period with the same frequency that the 4PL
firm would, and ensure models have the best possible fit over each forecast horizon. For the
retained menu item-location combinations, re-estimation was conducted 10 to 15 times on
average.
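The rolling re-estimation scheme described above can be sketched as a generator of training and forecast windows. This is a minimal illustration with a hypothetical function name; dropping any partial final week is an assumption of the sketch, not a documented detail of the study.

```python
def rolling_origin_windows(n_days, train_days=365, horizon=7):
    """Return (train_start, train_end, forecast_end) index triples.

    Training uses days [train_start, train_end); the forecast covers
    [train_end, forecast_end). Indices are zero-based day offsets.
    """
    windows = []
    origin = train_days
    while origin + horizon <= n_days:
        windows.append((origin - train_days, origin, origin + horizon))
        # Move the origin forward one week and refit on the latest `train_days`.
        origin += horizon
    return windows


# A series with 365 in-sample days followed by three weeks of out-of-sample data.
windows = rolling_origin_windows(n_days=365 + 21)
```

Each triple corresponds to one re-estimation: the model is refit on the most recent 365 days and then projected over the following seven days.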
Models Including Exogenous Observed and Predicted Weather Indicators
We include weather effects individually for four reasons. First, although we can
assess the relative significance of an external regressor in a seasonal ARIMA model, we
cannot know whether a significant predictor improves demand forecast quality.
Therefore, we must assess each regressor separately to determine whether it improves
predictive power. Second, each weather effect varies in its reliability. It is useful to
separately observe differences in demand forecast improvement from inclusion of factors
that differ in reliability. Third, this avoids manual variable selection that would differ
from one model to the next. As we saw in previous efforts to quantify the effect of weather factors on
demand, the effects were heterogeneous over a number of dimensions. Each menu
item-location combination may show significance for different weather factors, which would require
either manual model fitting or acceptance of potentially misspecified models. For thousands of separate
forecasts, this would not be practical, and would limit the comparability of forecasts.
Fourth, and perhaps most limiting when incorporating external regressors in an
autoregressive model, high correlation among exogenous regressors can make a model
inestimable. This is of particular concern with weather effects, which tend to be highly
correlated. Wind and rain, for instance, almost always coincide with thunderstorms, and
snow and low temperatures are strictly linked.
Just as in the baseline models, each model with exogenous observed and
predicted weather is fit on a full year of in-sample POS data. In-sample observed
weather phenomena are included to capture the effect of weather. As mentioned
previously, observed cloud cover (sunlight) data were unavailable, so we include
one-day-ahead forecasts as a proxy. One-day-ahead forecasts are presumed to be the most
accurate available substitute for observed data.
Each model is then projected forward with out-of-sample POS data. For the
upper-bound forecasts, this includes out-of-sample observed weather variables as
covariates. Evaluation models instead include out-of-sample predicted weather over a
seven day horizon. As with the baseline model, this forecast occurs every seven days,
and each successive forecast re-estimates model parameters with the previous 365 days of
in-sample POS and observed weather data.
Forecast Evaluation
From each successively updated forecast, we generate indicators of forecast
quality. As Hyndman and Koehler (2006) note, each proposed measure of forecast
quality has limitations. Some are scale-dependent, such as MAE or root mean squared
error (RMSE). These are more useful for calculating costs, but are of little value for
comparing forecasts across series of different scales. Measures based on percent error, such as mean absolute percent
error (MAPE) are popular in practice because they are easy to calculate and can be useful
for comparison. This is also the measure currently in use by the 4PL. Unfortunately,
MAPE is inflated in time series with low values and is undefined for time
series with zero values. Because of these deficiencies, and because both Makridakis and
Hibon (2000) and Armstrong (2002) recommend using multiple measures of accuracy,
we also include the bias indicator mean error (ME), the relative error indicator Theil’s U
(Theil 1966), and mean absolute scaled error (MASE), a scaled error term suggested by
Hyndman and Koehler (2006). MASE removes scale by comparing a forecast to a
known method (usually naïve), and also permits multiple period out-of-sample error
estimation, not possible with relative error measures like Theil’s U. Error is scaled using
in-sample naïve forecast MAE (Hyndman and Koehler 2006)
$$q_t = \frac{e_t}{\frac{1}{n-1}\sum_{i=2}^{n} |Y_i - Y_{i-1}|}$$
for nonseasonal data, or in-sample seasonal naïve forecast MAE (Hyndman and
Athanasopoulos 2014)
$$q_t = \frac{e_t}{\frac{1}{n-m}\sum_{i=m+1}^{n} |Y_i - Y_{i-m}|}$$
for seasonal data, where $e_t = Y_t - F_t$ is the
difference between observed and forecast demand, n is the number of in-sample observations,
m is the seasonal period and h is the length of
the forecast horizon. MASE is then simply $\frac{1}{h}\sum_{t=1}^{h} |q_t|$. Of note, though we generate
daily forecasts, MAPE is generated over the relevant replenishment period to limit
inflation and ensure estimability. Restaurant replenishment occurs twice weekly, so each
seven-day forecast period is evaluated on two replenishment periods. Finally, we also
include weekly MAPE, as this is how the 4PL firm tracks accuracy, so that a direct
comparison can be made.
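The accuracy measures discussed in this section can be sketched in a few lines. The function names and the small demand series are hypothetical; the sketch follows the definitions cited above (errors as observed minus forecast, and MASE scaled by the in-sample naïve or seasonal naïve MAE) and is not the study's evaluation code.

```python
import math


def forecast_errors(actual, forecast):
    """e_t = Y_t - F_t, so negative errors mean the forecast was too high."""
    return [y - f for y, f in zip(actual, forecast)]


def mean_error(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(errors) / len(errors)


def mae(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def rmse(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual, forecast):
    """Mean absolute percent error; undefined when any observed value is zero."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, in_sample, m=1):
    """Scale out-of-sample MAE by the in-sample (seasonal) naive MAE with lag m."""
    naive_errors = [abs(in_sample[t] - in_sample[t - m]) for t in range(m, len(in_sample))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


# Invented numbers for illustration only.
in_sample = [10, 14, 10, 14, 10]
actual = [10, 12, 8, 10]
forecast = [9, 12, 10, 11]

bias = mean_error(actual, forecast)
scaled_error = mase(actual, forecast, in_sample, m=1)
```

A MASE below one indicates that the forecast's mean absolute error is smaller than that of the in-sample naïve forecast used for scaling.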
Addressing Computational Scale
Automated parameter estimation permits an increase in the number of menu item-
location combinations to be forecast by limiting manual model estimation. However,
even after filtering out menu item-locations with too few observations or at distances too
far for reliable weather effect estimation, we estimated 105,875 baseline sets of forecast
models. After matching available observed and predicted weather data, we estimated an
additional 747,542 upper-bound and evaluation sets of forecast models. Finally, we
calculated forecast quality metrics for each individual seven day forecast model
contained in each rolling set, as well as for each individual seven day forecast period
provided in the 4PL sample. In total, we generated or evaluated over 1.7 million
sets of rolling forecasts. Particularly since we re-estimated ARIMA parameters in
successively updated forecasts, this became computationally burdensome. As a result,
we used extensive parallel computing for tractability. All forecasts were generated
through the Ohio Supercomputer Center’s Owens Cluster, a 23,392-core HP Intel Xeon
E5-2680 v4 machine capable of 750 teraflops (OSC 1987).
Output Analysis
In order to determine the effect of including predicted weather variables on
demand forecasts, we compare the eight generated forecast quality measures in separate
linear models. We include a categorical predictor indicating which external weather variable was used to
generate each forecast. Each of the aforementioned eight weather prediction variables
represents a factor level, with the baseline condition serving as the reference level.
We control for two expected continuous sources of variance in each of the
models. Increased volatility in time series is known to inherently reduce forecastability,
so we control for this by including the coefficient of variation (CV) as a covariate
(Armstrong 2002). Our method of matching restaurant locations via a single distance
threshold also likely has an effect on weather forecast accuracy, and thus demand forecast
quality derived from weather. Both observed and predicted weather relevant to each
ICAO weather sensor will more accurately reflect the weather conditions at restaurants
close by, but less so at restaurants close to the cutoff threshold. We therefore control for
potential differences by including geodesic ellipsoid distance as a covariate.
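The volatility covariate can be illustrated with a short sketch. The function name is hypothetical, and the use of the sample standard deviation is an assumption; the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(series):
    """Sample standard deviation divided by the mean of a demand series."""
    mean = statistics.mean(series)
    if mean == 0:
        raise ValueError("CV is undefined for a series with zero mean")
    return statistics.stdev(series) / mean


# Two invented series with the same mean demand but different volatility.
steady_item = [10, 10, 10, 10]
volatile_item = [5, 15, 5, 15]

cv_steady = coefficient_of_variation(steady_item)
cv_volatile = coefficient_of_variation(volatile_item)
```

Because CV is unitless, it lets volatility be compared across menu items that sell at very different volumes.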
In addition to continuous predictors likely to influence accuracy, we also control
for two expected categorical sources of variance. To account for confounding effects
from regional and product-based heterogeneities in weather variable effects on measures
of demand forecast quality, we separate our analysis by product categories and regions.
As we have no hypotheses regarding the direction or size of the heterogeneous effects,
we chose a weighted effects coding scheme that merely measures differences from an
overall mean (Darlington and Hayes 1990, Cohen et al. 2013).
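A minimal sketch of weighted effects coding follows. The function name is hypothetical, the three restaurant counts are a subset of those reported in Table 3, and the choice of reference group is an assumption made only for this illustration.

```python
def weighted_effects_codes(group_counts, reference):
    """Build weighted effects codes for a categorical predictor.

    Returns (coded_groups, codes) with one code column per non-reference group.
    A group scores 1 on its own column and 0 elsewhere; the reference group
    scores -n_group / n_reference on every column.
    """
    coded_groups = [g for g in group_counts if g != reference]
    n_ref = group_counts[reference]
    codes = {}
    for group in group_counts:
        if group == reference:
            codes[group] = [-group_counts[g] / n_ref for g in coded_groups]
        else:
            codes[group] = [1.0 if g == group else 0.0 for g in coded_groups]
    return coded_groups, codes


# Counts for three regions from Table 3; "West" as reference is an assumption.
counts = {"Central": 663, "South": 583, "West": 758}
coded_groups, codes = weighted_effects_codes(counts, reference="West")

# Each column sums to zero when weighted by group size, so each coefficient
# measures a group's deviation from the overall sample mean.
column_sums = [
    sum(counts[g] * codes[g][j] for g in counts) for j in range(len(coded_groups))
]
```

Unlike unweighted effects coding, the reference codes depend on group sizes, which matters here because the regional samples are highly unbalanced.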
Regional heterogeneities are accounted for using weighted effects codes for the
nine climatic regions defined by the NOAA as exhibiting similar characteristics for
temperature and precipitation for more than a century (Karl and Koss 1984).
Figure 3 (Sanchez-Lugo 2017) displays the regional boundaries used in this
research, and the number of restaurants in each climate region is reported in
Table 3. While more granular microclimate divisions exist that could account for
greater degrees of weather differentiation (Vose et al. 2014), these more aggregate
regions have more explainable differences in response to specific weather effects. For
instance, it is likely that consumer response to snow in the Northeast, where such weather
is common in winter months, will not be the same as in the warm Southwest. However,
responses in Albany and Buffalo, NY, which occupy different microclimate divisions
(Fenimore 2017) but the same climate region, would likely not differ substantially due to
similar weather patterns.
Figure 3: NOAA Climate Regions
NOAA Region                                       No. of Restaurants
Central (Ohio Valley)                                            663
East North Central (Upper Midwest)                                 4
Northeast                                                        300
Northwest                                                          0
South                                                            583
Southeast                                                        375
Southwest                                                          6
West                                                             758
West North Central (Northern Rockies and Plains)                  53
Table 3: Restaurant Sample by NOAA Region
We account for potential product-based demand response heterogeneities
similarly to Bujisic et al. (2016), who classify full-service menu items as main
courses or sides, as children’s or adult meals, and by mealtime. We coded menu items as
breakfast entrees or sides, lunch/dinner entrees or sides, desserts or shakes, drinks, and
add-ons or specialty items.
Results
To compare output from the four sets of models (evaluation, baseline, upper-bound
and 4PL), we began with a preliminary comparison of means for each of the forecast
quality measures we generated, as shown in
Table 4.
Model                ME    RMSE     MAE   MASE  Theil's U  MAPE 1  MAPE 2  Weekly MAPE
4PL              -1.132  14.903  12.269      *      0.838  25.538  27.330       20.688
Baseline         -1.514  18.095  15.015  0.860      0.908  18.302  19.473       14.932
Upper-Bound
  High Temp.     -1.291  18.116  15.024  0.858      0.908  18.018  19.124       14.712
  Low Temp.      -1.418  18.179  15.084  0.861      0.911  18.175  19.294       14.825
  Wind Speed     -1.492  18.097  15.019  0.860      0.909  18.303  19.494       14.948
  Cloud Cover    -0.676  18.833  15.900  0.901      0.987  22.253  23.282       17.927
  Rain           -1.526  18.126  15.045  0.862      0.911  18.365  19.528       14.983
  Thunderstorms  -1.731  18.147  15.061  0.854      0.905  18.144  18.990       14.690
  Snow           -1.717  18.189  15.084  0.858      0.912  18.800  19.852       15.223
  Precipitation  -1.494  18.090  15.013  0.860      0.909  18.275  19.417       14.882
Evaluation
  High Temp.     -1.295  18.116  15.024  0.858      0.907  18.020  19.155       14.709
  Low Temp.      -1.409  18.156  15.064  0.860      0.910  18.157  19.268       14.812
  Wind Speed     -1.304  18.180  15.084  0.864      0.913  18.362  19.604       15.073
  Cloud Cover    -0.923  18.832  15.930  0.901      0.991  22.409  24.045       18.232
  Rain           -1.754  18.144  15.066  0.863      0.912  18.518  19.693       15.141
  Thunderstorms  -1.773  18.166  15.075  0.855      0.905  18.168  19.014       14.708
  Snow           -2.157  18.322  15.203  0.865      0.919  19.257  20.433       15.687
  Precipitation  -1.821  18.132  15.060  0.863      0.912  18.574  19.761       15.198
Table 4: Mean Demand Forecast Quality Measures by Included Exogenous Weather Variable
Bias (ME) across all models tended to be negative on average. This negative bias
was amplified in models with various precipitation-based effects (for both evaluation and
upper-bound models), and was lower in 4PL models. Mean RMSE and MAE were also
lowest in 4PL models, and were lower in the baseline models than in models that
included weather effects. The lone exception was the upper-bound models that
included overall precipitation data, which had on average slightly lower RMSE and
MAE. Scaled error (MASE) was on average slightly lower in both evaluation and upper-
bound models that included thunderstorm data, but equal or slightly worse than the
baseline models with all other weather effects. We could not calculate this metric for
4PL models, as it depended on unavailable in-sample data. Relative measures (Theil’s
U) returned similar results, with thunderstorm data (and curiously predicted high
temperature data) resulting in slightly lower Theil’s U scores on average. As with scale-
dependent metrics, 4PL models had the lowest relative measures.
Percent errors represent the most interesting results, as 4PL models that had
dominated scale-dependent and relative measures were on average worse for percent
errors. Mean MAPE, as measured over the first and second replenishment periods as
well as over the weeklong planning period, was markedly higher in 4PL models than
in baseline, upper-bound or evaluation models. Further, inclusion of high and low
temperature, as well as thunderstorm data resulted in lower average MAPE for both
evaluation and upper-bound models. Upper-bound models with overall precipitation
included also had lower MAPE. This curious result is likely due to the differences
inherent in the metrics in use. Scale-dependent metrics penalize large deviations,
regardless of demand volume, and percent metrics experience inflation when demand is
small. This suggests that evaluation, baseline, and upper-bound models
tend to be more accurate when demand is small, but when they miss, they miss by more
than the 4PL models do. Further, including external weather effects in
estimation appears to exacerbate this effect.
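The divergence between scale-dependent and percent metrics can be reproduced with a small example. The numbers below are invented for illustration and are not drawn from the study data; the function names are hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error, expressed in units of demand."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percent error; assumes no observed value is zero."""
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


# Both forecasts miss by exactly one unit per period.
low_volume_actual, low_volume_forecast = [2, 4], [3, 5]
high_volume_actual, high_volume_forecast = [100, 200], [101, 201]

low_mae = mae(low_volume_actual, low_volume_forecast)
high_mae = mae(high_volume_actual, high_volume_forecast)
low_mape = mape(low_volume_actual, low_volume_forecast)
high_mape = mape(high_volume_actual, high_volume_forecast)
```

The two forecasts are indistinguishable by MAE, yet the low-volume item shows a MAPE fifty times larger, which is the inflation effect described above.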
In addition to the aggregate effects we observe between the forecast methods and
included predictive weather factors, we also examined differences in demand forecast
quality measures that were likely due to regional or product-related heterogeneities.
Table 5 and Table 6 display differences in forecast
quality metrics among evaluation models based on NOAA region and menu category
respectively.
NOAA Region                                           ME    RMSE     MAE   MASE  Theil's U  MAPE 1  MAPE 2  Weekly MAPE
Central (Ohio Valley)                             -1.947  18.388  15.368  0.848      0.926  19.252  20.452       15.639
East North Central (Upper Midwest)                -3.371  19.814  16.506  0.850      0.885  20.113  18.179       15.039
Northeast                                         -2.540  18.446  15.410  0.892      0.965  22.224  24.810       18.723
South                                             -1.460  18.232  15.124  0.865      0.904  18.354  19.124       14.732
Southeast                                         -1.579  17.897  14.854  0.885      0.951  19.895  21.614       16.614
Southwest                                         -1.387  17.793  14.633  0.817      0.903  19.162  19.646       15.060
West                                              -0.449  18.268  15.158  0.862      0.896  16.678  17.576       13.565
West North Central (Northern Rockies and Plains)  -1.753  17.217  14.247  0.838      0.882  19.708  18.939       14.916
Table 5: Mean Demand Forecast Quality Measures by NOAA Region for Evaluation Models
Menu Category            ME    RMSE     MAE   MASE  Theil's U  MAPE 1  MAPE 2  Weekly MAPE
Add-on/Specialty     -0.317   9.739   8.060  0.859      0.849  13.701  13.973       10.727
Breakfast Entrée     -2.386  13.836  11.483  0.889      0.893  20.085  20.840       15.804
Breakfast Side       -5.445  32.840  27.100  0.909      0.916  20.597  20.962       16.014
Dessert/Shake        -0.836  10.088   8.300  0.731      0.875  26.408  27.531       20.629
Drinks               -2.242  16.522  13.721  0.876      0.909  15.675  16.706       12.956
Lunch/Dinner Entrée  -0.035  19.842  16.583  0.886      0.954  16.962  18.809       14.662
Lunch/Dinner Side    -2.420  27.614  23.035  0.893      0.956  18.039  19.159       14.777
Table 6: Mean Demand Forecast Quality Measures by Menu Category for Evaluation Models
Scale-dependent metrics tended to be worse in the Upper Midwest and Northeast
regions, which could be a result of greater proportions of inclement winter weather in
those areas. Severe weather can cause large misses in demand estimates, which are
indicated by scale-dependent measures. The evaluation period spans mostly winter
months, so these areas are more likely than Southern regions to experience inclement
winter weather. Scaled, relative and percent measures tended to be higher primarily in the
Northeast and Southeast.
Among menu categories, side items tended to perform worse on both scale-dependent
and scaled metrics. Lunch/dinner items performed worst on relative
measures, and desserts were markedly worse than all others for percent errors.
Significant heterogeneities exist between both NOAA regions and menu categories, and
so the use of these indicators as controls is justified when evaluating the effect of weather
variable incorporation on demand forecast quality measures.
We next conducted an analysis of variance for upper-bound and evaluation
models that include all previously identified control variables. In the upper-bound
models that include observed weather, we observe the asymptotic limit of the effect of
weather forecast information on demand forecast quality measures. If the included
weather information did not introduce additional variance, these are the results we would
expect. Table 7 displays the
results from inclusion of error-free weather variables.
Intercept         -2.204 (***)  27.217 (***)  22.836 (***)  0.938 (***)  0.893 (***)  6.899 (***)  8.005 (***)  6.472 (***)
Weather Variables
High Temp.        0.223 (***)  -0.002 (*)  -0.284 (***)  -0.349 (***)  -0.221 (***)
Low Temp.         0.096 (**)  0.004 (***)  -0.127 (*)  -0.179 (**)  -0.108 (*)
Wind Speed
Cloud Cover       0.838 (***)  0.738 (***)  0.885 (***)  0.040 (***)  0.079 (***)  3.953 (***)  3.809 (***)  2.995 (***)
Rain              0.002 (.)  0.003 (***)
Thunderstorms     -0.002 (.)  -0.003 (***)  -0.333 (***)  -0.529 (***)  -0.287 (***)
Snow              0.171 (***)  -0.003 (*)  -0.006 (***)  -0.334 (***)  -0.579 (***)  -0.421 (***)
Precipitation
Covariates
CV                1.878 (***)  -23.829 (***)  -20.388 (***)  -0.213 (***)  0.046 (***)  32.473 (***)  33.169 (***)  24.254 (***)
Station Distance  -5.47E-2 (***)  -4.80E-2 (***)  -2.29E-4 (***)  -1.23E-4 (***)  -1.81E-2 (***)  -6.91E-3 (***)
NOAA Region       (***)  (***)  (***)  (***)  (***)  (***)  (***)  (***)
Menu Category     (***)  (***)  (***)  (***)  (***)  (***)  (***)  (***)
Significance codes: ‘***’= <0.001, ‘**’= <0.01, ‘*’= <0.05, ‘.’= <0.1
Table 7: Regression Effects of Observed Weather in Demand Forecasts on Forecast Quality Measures
Evaluation models represent a more realistic state for demand planners. Each
weather prediction may vary in reliability depending on the weather phenomenon, the forecast range,
and the proximity of a restaurant to a weather station. Each of these considerations makes
the inclusion of predictive factors a more complicated and questionable decision. The
results of models that include weather predictions in the estimation of demand forecast
models are included in Table 8. A brief
discussion of the implications of these findings on each hypothesis posed earlier follows,
summarized in Table 9.
Intercept         -2.248 (***)  27.219 (***)  22.845 (***)  0.939 (***)  0.895 (***)  6.814 (***)  7.895 (***)  6.379 (***)
Weather Variables
High Temp.        0.220 (***)  -0.002 (.)  -0.282 (***)  -0.318 (***)  -0.224 (***)
Low Temp.         0.106 (***)  0.002 (*)  -0.144 (**)  -0.205 (**)  -0.120 (**)
Wind Speed        0.210 (***)  0.004 (***)  0.006 (***)  0.131 (*)  0.141 (***)
Cloud Cover       0.592 (***)  0.737 (***)  0.915 (***)  0.041 (***)  0.083 (***)  4.109 (***)  4.573 (***)  3.299 (***)
Rain              -0.240 (***)  0.003 (**)  0.004 (***)  0.220 (***)  0.224 (***)  0.211 (***)
Thunderstorms     -0.002 (.)  -0.304 (***)  -0.468 (***)  -0.249 (***)
Snow              -0.233 (***)  0.174 (*)  0.118 (.)  0.004 (**)  0.109 (.)
Precipitation     -0.307 (***)  0.003 (**)  0.004 (***)  0.274 (***)  0.289 (***)  0.267 (***)
Covariates
CV                2.035 (***)  -23.854 (***)  -20.425 (***)  -0.216 (***)  0.043 (***)  32.833 (***)  33.358 (***)  24.492 (***)
Station Distance  -2.64E-3 (*)  -5.41E-2 (***)  -4.77E-2 (***)  -2.29E-4 (***)  -1.29E-4 (***)  -6.73E-3 (***)  -1.44E-2 (***)  -6.17E-3 (***)
NOAA Region       (***)  (***)  (***)  (***)  (***)  (***)  (***)  (***)
Menu Category     (***)  (***)  (***)  (***)  (***)  (***)  (***)  (***)
Significance codes: ‘***’= <0.001, ‘**’= <0.01, ‘*’= <0.05, ‘.’= <0.1
Table 8: Regression Effects of Predicted Weather in Demand Forecasts on Forecast Quality Measures
By including high temperature in demand forecasts, we demonstrated significant
reductions in negative bias, scaled and percent errors. However, there were no significant
effects on scale-dependent or relative measures. These effects are consistent between
upper-bound and evaluation models in both significance and relative effect size,
indicating that this effect is robust to some degradation in forecast accuracy. This
partially supports H1a, that inclusion of high temperature predictions improves demand
forecast quality.
Inclusion of low temperature in demand forecasts produced similar results, if
generally at a lower magnitude. Both upper-bound and evaluation models showed a
decrease in negative bias, and significant reductions in percent error measures. However,
there was no significant effect on scale-dependent or scaled measures, and a small but
significant increase in Theil’s U. This indicates partial support for H1b, that inclusion of
low temperature predictions improves demand forecast quality. However, improvements
tended to be in measures that face inflation from error coinciding with low demand.
Theil’s U, scaled by RMSE and therefore more sensitive to large errors regardless of
coinciding demand level, saw a slight increase.
H1c was also partially supported, that forecast models incorporating short range
predictions (one to three days) of temperature are more accurate than those incorporating
medium range predictions (five to nine days). While it is true that error was higher in
forecasts at a longer range, the effect of temperature predictions on percent error is
greater at longer ranges. Table 4 shows
that MAPE is higher for the second replenishment period under all upper-bound and
evaluation models, regardless of weather effect included. However, as shown in
Table 7 and
Table 8, the magnitude of error reduction through inclusion of temperature
variables is greater at longer ranges. This means that the potential for error reduction
through inclusion of external predictive factors is greater when the demand forecast is
more uncertain, regardless of the uncertainty of the predictive factor. This contradicts the
theory behind, if not the explicit statement of, H1c.
Wind speed inclusion appears to have no effect on any measure of demand
forecast quality among upper-bound models, and in evaluation models actually increases
measures of scaled, relative and percent error. Despite also reducing negative bias in
evaluation models, the overall effect of wind prediction mostly contradicts H2a, that
inclusion of wind predictions will increase demand forecast quality. H2b, on the other
hand, is supported. Less accurate medium range wind forecasts tend to degrade demand
forecasts to a greater extent than short range wind forecasts.
Inclusion of cloud cover data in both upper-bound and evaluation models
significantly increases scale-dependent, scaled, relative and percent error, while
decreasing negative bias. As a result, H3 is mostly not supported. It is worth noting that
no observed cloud cover data were available against which to gauge prediction accuracy,
so nothing definitive can be said about the
effect of including perfect cloud cover prediction.
Given perfect prediction of rain, as in upper-bound models, the only significant
effects are a slight increase in scaled and relative error. In evaluation models, predicted
rain significantly increases negative bias, scaled, relative and percent error. H4a, that
inclusion of rain predictions in demand forecasts will improve accuracy, is not supported.
Forecasts including thunderstorm data, on the other hand, demonstrated
significant improvements in scaled, relative and percent errors in upper-bound models,
and in relative and percent errors in evaluation models. As a result, H4b is partially
supported.
Inclusion of snow weather data resulted in significant reductions of negative bias
and improvements in scaled, relative and percent errors in upper-bound models.
However, these effects were reversed in evaluation models. Negative bias was increased,
and scale-dependent, scaled and percent errors all showed some degree of increase. H4c
was not supported.
Upper-bound models that included overall precipitation data showed no
significant differences in demand forecast quality measures of any kind. In evaluation
models, negative bias was exacerbated, while scaled, relative and percent errors all
increased. H4d was therefore not supported.
H4e, the supposition that forecast models incorporating short range precipitation
data (of all kinds) are more accurate than those incorporating medium range data, is partially supported. While
longer range demand forecasts each tended to be less accurate regardless of which
weather forecast variable was included, the directional effect of including each variable
was amplified at longer ranges. This means that predicted thunderstorm data decreased
error to a greater extent at longer ranges. Conversely, predicted rain and overall
precipitation inclusion increased demand forecast error slightly more at longer ranges.
     Variable                         Support               Explanation
Temperature
H1a  High (°F)                        Partially Supported   Reduced negative bias, scaled and percent errors
                                                            (supports), no other significant effects (does not
                                                            support)
H1b  Low (°F)                         Partially Supported   Reduced negative bias and percent errors
                                                            (supports), increased relative error, no other
                                                            significant effects (does not support)
H1c  Medium Range                     Partially Supported   Increased error at longer range (supports), error
                                                            reduction greater at longer range (does not support)
Wind
H2a  Speed (mph)                      Mostly Not Supported  Reduced negative bias (supports), Increased scaled,
                                                            relative and percent error, no other significant
                                                            effects (does not support)
H2b  Medium Range                     Supported             Increased error at longer range and error
                                                            amplification greater at longer range (supports)
Cloud Cover
H3   Percent                          Mostly Not Supported  Reduced negative bias (supports), Increased scale-
                                                            dependent, scaled, relative and percent error (does
                                                            not support)
Precipitation Related
H4a  Rain Probability                 Not Supported         Increased negative bias, scaled, relative and
                                                            percent error, no other significant effects (does not
                                                            support)
H4b  Thunderstorm Probability         Partially Supported   Reduced relative and percent error (supports), no
                                                            other significant effects (does not support)
H4c  Snow Probability                 Not Supported         Increased negative bias, scale-dependent, scaled
                                                            and percent error, no other significant effects (does
                                                            not support)
H4d  Total Precipitation Probability  Not Supported         Increased negative bias, scaled, relative and
                                                            percent error, no other significant effects (does not
                                                            support)
H4e  Medium Range (all kinds)         Partial Support       Increased error at longer range and error
                                                            amplification greater at longer range for rain and
                                                            total precipitation (supports), error reduction
                                                            greater at longer range for thunderstorms and error
                                                            amplification reduced at longer range for snow
                                                            (does not support)
Table 9: Summary of Hypothesis Test Results
Discussion
Upper-bound models demonstrate the potential for improvement in demand
forecast quality through inclusion of a number of external weather data under ideal
conditions. In particular, future information on both high and low temperature and
extreme weather such as thunderstorms and snow promised to improve a number of
demand forecast quality measures, provided the weather data contained no errors. In
evaluation models, this effect persisted for high temperature, low temperature and
thunderstorm predictions, but reversed for predictions of snow. Despite potential
improvements in both upper-bound and evaluation models, the inclusion of some weather
variables degraded predictions, and all had disparate effects on the various measures of
demand forecast quality. These mixed results demonstrate that
incorporating external variables does not have straightforward effects, and that the
decision to include predictive indicators in an effort to improve demand forecasts
depends on a number of factors.
Specification Errors
Estimating demand as we have depends on some key assumptions inherent to
linear regression models. As Cohen et al. (2013) note, these include a correct
specification of the form of the relationship between independent (in our case external
weather indicators) and dependent variables (demand), that independent variables are
correctly specified (are significant), and that the independent variables are measured
without error. When including uncertain weather information in demand forecasts, it is
likely that at least one of these assumptions is not strictly satisfied. This does
not typically matter in practice, as models that perform better are chosen regardless of
adherence to strict statistical orthodoxy (Hyndman and Athanasopoulos 2014).
However, ignoring these assumptions can introduce error in estimation and lead to
over-fitting and misspecification. As SARIMAX models are estimated in two stages, the
risks of estimation error can be compounded. Even if we assume independent variables
are measured perfectly, as was the case in our upper-bound models, their relationship can
be improperly specified or result in non-significant (or weakly significant) relationships.
When estimated, such weak relationships often absorb some truly random error,
which is then carried into the next stage of estimation as though it were a genuine
relationship. This has been shown to degrade prediction performance (Kolassa 2016a,
Katsikopoulos and Syntetos 2016). We assumed linear relationships between weather
factors and demand in the first stage, and estimated seasonal ARIMA terms from the
residuals of that model. This can result in spurious explanation of random variance, and
confound relationships that may exist in the second stage of estimation. Misspecification
of this type will inevitably explain additional variance, whether or not the measured
relationships are spurious, so we controlled for this by minimizing our automatically
generated models on a measure (AICc) that includes a penalty for model complexity.
Even so, overfitting can prove problematic with predictive models as evaluation
transitions from in-sample to out-of-sample. Our upper-bound results for models
including error-free wind speed and overall precipitation data, which proved
non-significant or even deleterious across all forecast quality metrics, may have been a
result of such overfitting.
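The complexity penalty referred to above can be illustrated with a minimal sketch. It assumes the standard small-sample correction to AIC; the function name, log-likelihood values, and parameter counts are hypothetical and are not drawn from the models estimated in this study.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion (smaller is better).

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations used in fitting
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    # The correction term grows with k, penalizing extra regressors.
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical comparison: two extra weather regressors barely improve the fit.
baseline = aicc(log_likelihood=-500.0, k=4, n=365)
with_weather = aicc(log_likelihood=-499.5, k=6, n=365)
prefer_baseline = baseline < with_weather
```

In this illustration the small gain in log-likelihood does not offset the penalty for two additional parameters, so the simpler model would be selected. A penalty of this kind limits, but does not eliminate, the risk that a selected model fits noise.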
Confounding
The significant effects of including error-free weather predictors on demand
forecast quality measures may also have been confounded by product and
regional heterogeneities. We did control for NOAA region and menu category, and the
effects of these covariates were highly significant. However, errors in how these regions
or menu categories are specified can have confounding effects for a model. For instance,
locations on either side of a boundary between NOAA regions likely experience similar
effects, but each is assigned the mean effect for its respective region. These discrete
differences make estimation
tractable, but can lead to distortion of effect estimation.
Weather Forecast Reliability
The varying reliabilities of weather factors may also have driven mixed results.
This is evident in the degradation of demand forecast quality between upper-bound and
evaluation models that included wind speed, cloud cover, rain, snow and overall
precipitation data. Each weather variable had differing reliability in evaluation models,
and therefore experienced different levels of degradation between upper-bound and
evaluation models.
For evaluation models including wind speed, the degradation of the effect of
predictive indicator inclusion on demand forecast quality measures indicates a possible
combination of effects. Insignificant effects from these predictors observed in upper-
bound models may have been from overfitting or misspecification. Including uncertainty
in the predictors introduces noise to models that are already potentially misspecified,
amplifying this effect.
The negative effect of including cloud cover is possibly a function of the available
data for cloud cover. Though many prior studies found support for levels of sunlight
significantly affecting demand (Johnston and Harrison 1980, Murray et al. 2010, Busse et
al. 2012, Li et al. 2017, Steinker et al. 2016), we only had predicted cloud cover data
available and thus no means of registering its relative accuracy. Our threshold of 30 km
for the distance between a restaurant and weather station is consistent with expectations
for reliable sunlight forecasts on the aggregate (Camargo and Hubbard 1999). However,
Camargo and Hubbard (1999) directly measured solar radiation in their study, whereas
other studies measure binary sunny or non-sunny days (Li et al. 2017), hours of sunlight
(Johnston and Harrison 1980, Parsons 2001, Murray et al. 2010), sunlight as an input to a
composite weather measure (Steinker et al. 2016), or predicted percent of cloud cover
(Busse et al. 2012) to ostensibly measure the same effect. These proxies may all have
varying levels of reliability that corrupt their effect as previously reported and negatively
impact their value to demand forecasters.
Distinct aspects of rain, thunderstorm, snow and overall precipitation prediction
reliability may help explain our mixed results as well. Of the four precipitation-based
weather predictors, only models including thunderstorm predictions did not degrade
significantly in demand accuracy between upper-bound and evaluation models. Brier
scores alone could not account for this difference, as both snow and thunderstorm
predictions had low scores, and inclusion of uncertain snow forecasts in demand forecasts
degraded forecast quality measures to a greater extent than did inclusion of uncertain
thunderstorm forecasts. Thunderstorm predictions had the lowest PPV and highest
sensitivity of the four measures, indicating the lowest conservatism of the four
precipitation-based weather predictors. This implies that demand forecast quality is not
as adversely affected by false positives as it is by missed predictions. Demand managers
may use this insight to their advantage when selecting weather forecast services or
classification probability cutoffs for precipitation-based weather predictors.
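The reliability measures discussed above can be sketched as follows. The probabilities, outcomes, and cutoffs are hypothetical values chosen for illustration only, not data from this study, and the function names are our own.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def ppv_and_sensitivity(probs, outcomes, cutoff):
    """Classify an event as predicted when probability >= cutoff.

    Returns (PPV, sensitivity):
      PPV         = true positives / all predicted events
      sensitivity = true positives / all observed events
    """
    tp = fp = fn = 0
    for p, o in zip(probs, outcomes):
        predicted = p >= cutoff
        if predicted and o == 1:
            tp += 1
        elif predicted and o == 0:
            fp += 1
        elif not predicted and o == 1:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical forecast probabilities and observed events (1 = event occurred).
probs = [0.9, 0.6, 0.4, 0.35, 0.7, 0.1]
outcomes = [1, 0, 1, 0, 1, 0]

score = brier_score(probs, outcomes)
conservative = ppv_and_sensitivity(probs, outcomes, cutoff=0.5)
liberal = ppv_and_sensitivity(probs, outcomes, cutoff=0.3)
```

In this toy example, lowering the cutoff from 0.5 to 0.3 reduces PPV (more false positives) while raising sensitivity (fewer missed events). This is the kind of trade-off a demand manager would face when choosing a classification cutoff.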
Differential Effect between Demand Forecast Quality Measures
Variation in the effect of including exogenous weather predictors is also a
function of the metric in use. None of the included weather predictors in either the
upper-bound or evaluation models demonstrated an improvement in the scale-dependent
metrics RMSE or MAE. In evaluation models, weather predictors tended to only slightly
improve and more likely degrade scaled and relative error metrics. For the included
weather predictions that significantly improved demand forecast error, improvement
came in the percent error metrics. These differences depend on the manner in which the
SARIMAX models were estimated, and the relative penalties imposed by each demand
forecast quality metric.
Predictive models were based on a minimization of AICc based on in-sample
demand and observed weather. That the primary improvement occurred in percent error
metrics, which experience inflation in error coinciding with low demand, implies that
inclusion of external weather predictors improves accuracy when demand is low.
Increased error or insignificant effects on scale-dependent metrics imply that forecast
responsiveness is higher and may lead to larger individual errors when demand is higher.
Increases in scaled and relative errors imply an increase in responsiveness of demand
forecasts approaching that of a naïve forecast, with larger individual errors when demand
is higher.
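The different penalties imposed by each family of metric can be seen in a small sketch. It assumes common textbook definitions: MAE and RMSE as scale-dependent metrics, MAPE as a percent error metric, and MASE (scaled by the in-sample one-step naive forecast) as a scaled metric. The demand values are hypothetical, and the exact metric variants used in this study may differ.

```python
from math import sqrt


def mae(actual, forecast):
    """Mean absolute error (scale-dependent)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error (scale-dependent, penalizes large misses)."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percent error; requires nonzero actual values."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of a one-step naive forecast."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae


# Hypothetical low-demand and high-demand periods with the same 2-unit miss.
actual = [10.0, 100.0]
forecast = [12.0, 98.0]
train = [10.0, 12.0, 11.0, 13.0]  # hypothetical in-sample history

results = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
    "MASE": mase(actual, forecast, train),
}
```

Both periods contribute equally to MAE and RMSE, but the low-demand period contributes a 20 percent error to MAPE while the high-demand period contributes only 2 percent. An improvement concentrated in low-demand periods would therefore appear mainly in percent error metrics.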
Limitations
Limitations of our research included deficiencies in the weather and POS demand
samples, lack of insight into the 4PL model parameterization and adjustments as a basis
of comparison, and subjectivity regarding assignment to NOAA regions and menu
categories.
One year of in-sample data allowed for a characterization of seasonal demand
patterns for both demand and weather. However, we are limited to the assumption that
conditions which drive changes in demand are stationary, which may not be the case
given shifts in local and national tastes and stages of menu-item life cycle. Without more
demand data, characterization of trends or shifts is more limited. Similarly, weather
patterns can demonstrate anomalies from one year to the next, even on a national level.
As noted in Hu and Skaggs (2009), though weather forecast reliability has increased in
recent years, anomalies and associated reductions in medium-range weather forecast
reliability are likely to increase in frequency as the effects of global climate change
continue to manifest. Our out-of-sample evaluation data covers primarily winter months.
Though we have demand history that would indicate effects during other seasons, we
have no indication of model performance other than in the winter season. Additionally,
our weather samples were drawn from a sensor network with sparse geographic coverage,
requiring us to compromise station relevance for sample size.
By virtue of the sample’s nationwide scope, these findings are generalizable over
a large number of circumstances. However, the results of this research are limited to a
single (albeit industry leading) quick service restaurant firm.
As noted above, boundary conditions of NOAA region assignments may be a
cause for additional effect distortion, and menu categorization is an admittedly subjective
assignment. It is possible that a more exploratory clustering or principal components
analysis approach could reveal a better regional or product-based grouping mechanism.
Managerial Implications
This research suggests that inclusion of short term predictive weather indicators
for high temperature, low temperature and thunderstorms can significantly reduce
demand forecast percent errors. It also indicates that the benefit of including weather
predictions is greater as forecast horizon is increased, despite a decrease in weather
forecast accuracy. These results apply to a wide range of geographic and product specific
contexts. However, demand planners must take care when including other weather
predictions, as they can have negative effects on demand forecast quality. The positive or
negative effects of including predictive weather indicators in demand forecasts depend
on factors such as weather phenomena forecast reliability, demand forecast error metric
of interest, and heterogeneous effects that may exist between regions and products.
While such external predictive weather indicators are constantly improving, they
currently have a reliable range that is on the boundary of being useful for most
operational planning. As weather prediction technology and weather forecaster skill
increase, managers can place more trust in these external indicators. However, even
reliable indicators should be treated with caution for the specification and confounding
issues discussed above.
As each demand forecast quality measure responded differently to inclusion of
additional information, managers should also carefully select which quality measures are
important to their business. In our case study, inclusion of weather variables tended to
make demand forecasts more responsive. When this improved measures of demand
forecast quality, it occurred most in relative measures. This means that for businesses
where large misses in high volume locations are relatively more expensive than a series
of small misses in low volume situations, inclusion of weather in demand forecasts may
not help operations. This may be the case if inventory costs are high and include high
proportions of perishable goods. The opposite may be true for businesses where costs
from low customer satisfaction are relatively more significant. Outlets facing high levels
of competition may risk more from a stock-out than from overstocking. Therefore, the
unequal improvement between measures of forecast quality requires a manager to assess
which measures are most relevant to their situation when including external weather
predictions.
Future Research
The findings in this research provide support for a growing body of work relating
observed weather phenomena to demand and predicted weather to demand forecast
quality. We supported previous findings that inclusion of predicted temperature can
significantly improve demand forecast accuracy (Nikolopoulos and Fildes 2013, Steinker
et al. 2016), while motivating greater investigation into findings that were not supported,
relating predicted sunlight and precipitation to increased demand forecast accuracy
(Steinker et al. 2016). The results of this work are limited to a select few forecast quality
metrics. Future research must include more measures of quality, but also of overall
value.
Some of the counterintuitive results also indicate other relevant information about
the observed weather. Previous work estimating the effect of observed weather on
demand includes studies that look specifically at how different the observed effect is
from the average (Johnston and Harrison 1980, Tran 2016). We include no indication of
the unusualness of a prediction or observed weather effect. Future research should
extend these previous works to include some indication of the unusualness of a predicted
weather effect on demand forecast accuracy.
We also do not include potential substitution indicators. Many of the restaurants
in our sample share a common weather station from which their weather predictions are
gathered. They could be close enough to serve as substitutes for each other’s demand.
The same may be true of similar competing restaurants. Future research should attempt
to control for the confounding effects of substitution.
Travel intensity is another potential factor that could significantly interact with
weather in its effect on demand. Restaurants that experience a significant portion of their
demand from traveling customers, as may be the case in airports and along highways,
may experience effects from weather unrelated to mood and psychology. Effects from
delayed flights, closed roads, or other travel delays may confound or amplify weather
effects and contribute to poor demand forecasts. Controlling for individual restaurant
characteristics could improve future efforts to incorporate predicted short term weather in
demand forecasts.
Previous work in the economic value of uncertain information (Arrow 1965, Katz
and Murphy 1997), especially research on cost-loss utility functions (Thompson and
Brier 1955, Thompson 1962, Murphy 1977, Katz and Murphy 1990, Katz and Lazo
2011), can help guide future extensions to convert demand forecast quality improvements
into calculations of expected value. This research included some discussion of
differential effects of included exogenous weather forecast variable reliability on demand
forecast quality. However, a more explicit treatment of the effects of accuracy, bias,
sensitivity and specificity by predictive weather factor is warranted in order to fully
quantify the effects on demand forecast quality, and eventually value.
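As a pointer toward such an extension, the basic decision rule of the cost-loss literature can be sketched as follows. In the classic static model, a decision maker pays a cost C to protect against an event that would otherwise cause a loss L, and protecting is worthwhile when the forecast probability of the event exceeds C/L. The function name and numeric values are hypothetical, and mapping C and L onto specific demand planning costs is an assumption left to future work.

```python
def cost_loss_decision(p_event: float, cost: float, loss: float):
    """Static cost-loss rule: protect when p_event exceeds cost / loss.

    Returns (protect, expected_expense), where expected_expense is the
    cost of protecting, or the probability-weighted loss otherwise.
    """
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must be a probability")
    if cost < 0 or loss <= 0:
        raise ValueError("cost must be non-negative and loss positive")
    protect = p_event > cost / loss
    expected_expense = cost if protect else p_event * loss
    return protect, expected_expense


# Hypothetical values: protection costs 20, an unprotected event costs 100,
# so the break-even forecast probability is 0.2.
likely = cost_loss_decision(p_event=0.3, cost=20.0, loss=100.0)
unlikely = cost_loss_decision(p_event=0.1, cost=20.0, loss=100.0)
```

Under this rule the value of a weather forecast depends on how often it moves the probability across the C/L threshold, which is why forecast reliability and the user's cost structure must be considered jointly.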
Finally, this research estimated models based on linear and stationary effects of
predictive weather indicators. Prior research suggests that many weather effects are
nonlinear (Murray et al. 2010, Lazo et al. 2011, Bahng and Kincade 2012, Arunaj and
Ahrens 2016, Tran 2016) and short to medium term weather forecasts will face increasing
non-stationarity (Hu and Skaggs 2009), so future extensions ought to include more
general estimation techniques.
Chapter 4: “A Systematic Literature Review and Typology of Factors that Bound
Demand Forecast Accuracy”
Introduction
Demand forecasting is the bellwether that synchronizes the supply chain.
Common thought regarding this critical input, among both practitioners and academics, is
that achieving more accurate forecasts is universally beneficial to the supply chain.
Whether measured as the driver of safety stock, of variance in production processes, or
bullwhip propagating throughout the supply chain, accuracy in forecasts goes a long way
in ensuring efficient supply chain operations (Silver et al. 1998).
It is rare in academic literature for the somewhat counterintuitive question to be
asked: “How good is good enough?”. However, this is of central importance to the
demand planner. We know that demand forecast accuracy is critical and that more is
generally desirable in the supply chain, but what are the achievable limits? For that
matter, are there instances where demand planners ought not even pursue the achievable?
While a tremendous amount of academic literature investigates statistical, managerial, or
technological means to improve demand forecast accuracy (Winklhofer et al. 1996,
Fildes et al. 2008), few even concede the Pareto limits or costs to advances in precision
(Yokum and Armstrong 1995). Current exploration also exists in vertically isolated
channels, without an overarching frame to guide inquiry.
In this paper, we endeavor to identify the effects and contexts that have been
explored in logistics and supply chain management literature that bound the feasible
region for demand forecast accuracy. In doing so, we also identify areas that require
further inquiry with the hopes of guiding future academic engagement. This search
includes managerial, statistical, technological, and contextual facilitators and
impediments for forecast accuracy. To accomplish this, we conducted a systematic
review of the logistics and supply chain management literature, and identified structural
topics regarding drivers of both achievable and desirable levels of forecast accuracy.
This manuscript is organized as follows. First, we define forecast accuracy and
discuss general logistics and supply chain management research topics that may affect, or
be affected by, accuracy. Next, we detail our methodology of systematic literature
review, emphasizing transparency and replicability. Then, we discuss extant findings in
academic literature within identified logistics and supply chain management research
themes, describing managerial implications and suggesting future research areas for each
theme. Finally, we discuss limitations to this study, and draw conclusions regarding the
identified structural themes of demand forecast accuracy.
Defining Accuracy in Logistics and Supply Chain Management Research
Makridakis and Wheelwright (1989) refer to forecast accuracy as the “goodness
of fit” of some predictive model. This obviously has different connotations for each user,
application, and modeling approach. Murphy (1993) defines forecast accuracy as the
correspondence of individual observations and predictions. Though he was referring to
weather forecasts, this is equally applicable to demand forecasts. Box and Jenkins (1976)
express accuracy as the probability limits of a forecast such that some proportion of
realized values fall within it. This takes a statistical viewpoint, and implies a distribution
rather than a point forecast. Armstrong (2002) defines forecast accuracy only in its relation
to forecast error, with error being the difference between forecasted and observed
values. Wooldridge (2015), like Armstrong, defines accuracy in relation to error, but as
the additive inverse of Armstrong’s error. Both Armstrong’s and Wooldridge’s
definitions recognize that it is often easier and more useful to measure when a forecast is
wrong than when it is right. Silver et al. (1998) define forecast accuracy simply as a
surrogate for overall production/inventory system performance. This definition lacks
precision, as many factors affect system performance beyond demand forecasting
accuracy. Makridakis and Hibon (2000) and Hyndman and Koehler (2006) note multiple
measures of forecast accuracy that each prioritize different aspects of error, and conclude
no one measure fully reflects accuracy. While neither work explicitly defines accuracy,
they do present strengths, weaknesses and applications for numerous proposed measures
of forecast error (as proxies for accuracy).
Among the various definitions, there is a general consensus that forecast accuracy
is a measure of absolute closeness of a prediction to observed conditions and a
complement to error. We choose to adopt this more general definition, as many slight
variations exist on how accuracy and/or error are measured and applied.
While this study focuses on forecast accuracy, many forecast evaluation
constructs are derived from, or are used as proxies for accuracy. Terms such as precision,
bias, deviation, error, quality, performance, consistency, and reliability all may overlap
the concept of accuracy, but have definitions that vary depending on source and usage.
We explore these concepts, in addition to the concept of accuracy, as they are discussed
in a wide range of supply chain management and logistics contexts that may help to
categorize our understanding of the effect of forecast accuracy.
Basic Overview of Systematic Literature Review
To address the question of “How good is good enough?”, we conducted a
structured literature review of research that has been done with respect to the bounds of
forecast accuracy. A structured or systematic review involves searching, selecting,
appraising, interpreting and summarizing of data from original studies (Crowther and
Cook 2007), and serves as the highest level in a hierarchy of evidence (Tranfield et al.
2003). In essence, it is the most complete manner of characterizing the state of
knowledge on a given subject, and a methodological advance over the narrative literature
review used more often in management research (Denyer and Neely 2004). Imposing
structure in a review helps make the process of knowledge collection and generation
transparent and replicable, while reducing the impact of researcher bias (Durach et al.
2017).
Following the advice of Tranfield et al. (2003), and expounded on by Durach et
al. (2017), we divide our literature review into several steps. The first is to define a
research question, as we have done above. All subsequent steps ought to be guided by
this original research question. Second, we determine the criteria for inclusion in a
review. This includes criteria to ensure relevance to the research question, but also to
ensure quality of source and tractability of the review itself. Third, we collect potentially
relevant research for review. We conduct this in multiple rounds: first through a
keyword search in a research database, and then through review of the cited and citing
literature of studies we have found to be relevant, a process we call “cascading”. Fourth,
we apply the inclusion criteria to the collected article sample. For each article returned
in the keyword search, this is done separately based on a review of the abstract, and then
on the full article for those abstracts that were deemed relevant. The same process is
conducted for cascaded articles considered for inclusion. Fifth, we synthesize the
relevant literature sample and develop themes around our research question. Sixth and
finally, we present the results of our search.
Criteria for Inclusion in the Literature Review
Topics for the Literature Search
Determining criteria for article inclusion began with the development of several
general topics and contexts that may have an effect on the upper and lower bounds of
demand forecast accuracy. These were identified through discussions within the author
team, and an initial search of the logistics and supply chain management literature for
forecast accuracy. General topics initially emerged as factors that may drive or inhibit
levels of forecast accuracy. These could be statistical, such as time series variability
limits to prediction or model over-fitting. Topics could be managerial, such as the cost of
gathering or processing information or the propensity of agents to distort or withhold
information. These could also be technological, such as advances in information sharing
technology or introduction of novel forecasting techniques. Below are the general topics
that emerged from our initial search, and the guide for our structured literature review.
The complete list of Boolean search terms can be found in Appendix A: Boolean
Keywords for EBSCO Business Source Complete.
Bullwhip Effect: Perhaps most prominent in the supply chain management
literature is the demand signal propagation effect originally known as industrial dynamics
(Forrester 1958), but now more commonly referred to as the bullwhip effect (Lee et al.
1997). Forecast accuracy, and by extension all of the related constructs we listed, have
been observed to affect the nature and amplify the magnitude of the bullwhip effect.
Forecast error amplification can have significant negative cost effects within and between
firms of a supply chain, and it is useful to identify the importance of the accuracy of the
original demand signal relative to other factors contributing to the bullwhip effect. We
therefore included “bullwhip” and several variants in our keyword search.
Cost Tradeoffs: Accuracy improvements almost always come at some additional
cost. These costs must be balanced against the potential benefits derived from
incremental improvements. To capture this dimension, we included (primarily) empirical
research that strove to quantify either the incremental cost of forecast accuracy
improvements, or the costs borne from inaction. For each specific circumstance, we
preferred works that tried to calculate both. We included the term “tradeoff”, and several
variants as identified in two such works (Metters 1997, Sanders and Graman 2009).
Aggregation or Vertical Hierarchical Level: The hierarchical level at which the
forecast is generated, and the level for which it is intended significantly affect the impact
of forecast errors. While greater levels of aggregation tend to wash out noise and result
in higher levels of accuracy, the resultant forecasts also eliminate much of the detectable
signal. This makes more aggregated forecasts more useful for strategic purposes, but of
limited utility for operations. We include variations of “aggregation” defined in Clemen
(1989) and Zotteri et al. (2005) to select research that empirically defines the contextual
utility of various levels and types of forecast aggregation.
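The noise-reducing effect of aggregation can be made concrete under a simplifying assumption that is ours rather than that of the cited authors: suppose n item-level demands are independent, each with mean \mu and standard deviation \sigma. The coefficient of variation (CV) of a single item and of the aggregate are then:

```latex
\[
\mathrm{CV}_{\text{item}} = \frac{\sigma}{\mu},
\qquad
\mathrm{CV}_{\text{aggregate}}
  = \frac{\sqrt{n}\,\sigma}{n\,\mu}
  = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\mu}.
\]
```

Relative variability therefore falls with the square root of the number of items aggregated. Positively correlated demands would weaken this benefit, and the aggregate series by construction no longer carries the item-level signal needed for operational decisions.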
Impact of Manual Adjustments: Manual adjustments are often a necessity with the
reality of imperfect methods, insufficient information, or unquantifiable demand factors.
Furthermore, despite significant advances in both technology and methods to develop
reliable statistical forecasts, a large proportion of businesses rely very little or not at all
on quantitative forecasting methods. We include the term “judgement” and many
variations identified in Sanders and Ritzman (1995, 2004b) in order to explore the
significance of manual forecast adjustments on demand forecast error.
Impact of Information Sharing: A significant portion of research is dedicated to
the cost and efficacy of various types of information sharing in the supply chain. There
are many reasons to share information in the supply chain, but we are interested
specifically in the value generated in demand planning as outlined in Lee et al. (2000)
and Cachon and Fisher (2000). While obviously linked to the mitigation of the bullwhip
effect, it constitutes multiple separate streams of research and is but one of many means
of reducing the impact of forecast error signal amplification. For this reason, we include
“information sharing” along with various derivatives as a keyword search term.
Supply Chain Collaboration and Integration: Related to the idea of information
sharing, some supply chain partners go further and jointly forecast demand to smooth
supply chain operations. We differentiate collaboration from information sharing, and
include it on a continuum of integration between supply chain partners that in the extreme
results in vertical integration (Bowersox et al. 2013). We consider both vertical and
horizontal collaboration, between complementary and also competing members of the
supply chain. We included various instantiations of “collaboration” and “integration” as
keywords to identify work that discusses the forecast accuracy ramifications of different
arrangements.
Choice of Metric or Measure: Though we define accuracy above, there are still
many ways to measure the closeness of a prediction to an observed condition. We
include many ways of referring to this closeness in our search terms above, but we
needed additional search criteria to identify the implications of specific types of measures
in use. Answering the question “How good is good enough?” can have different answers
depending on the accuracy metric in use. Makridakis and Hibon (2000) and Hyndman
and Koehler (2006) served as a starting point to describe the contextual effects of the
choice of error metric on desired (or achievable) forecast accuracy. We therefore include
the search words “measure” and “metric”.
Overfitting and Misspecification: One noteworthy area for forecasting research
focuses on the difference between verification and validation. Forecast models typically
are fit on available data, and extrapolated forward. Modelers verify that their models fit
the available data, and the model is only validated when the extrapolation is sufficiently
close to the observed outcomes. A bedeviling reality for the modeler is that models that
fit very well to the past often do quite poorly in predicting the future (Armstrong 2002,
Kuhn and Johnson 2013). We include “over-fit” and its variations to capture the research
that defines the role of model misspecification on forecast accuracy.
Lead Time and Inventory Policy: With regards to inventory costs, demand
forecast inaccuracy likely has greater significance under conditions of longer lead time
and when paired with specific inventory policies (Silver et al. 1998, Fildes and Kingsman
2011). Some combinations of forecast inaccuracy and replenishment policy may also
significantly drive up transportation, handling, warehousing, ordering, and expiration or
obsolescence costs. As a result, we include terms related to “lead time” and “inventory”
in our search.
Forecast Horizon: As forecast horizons increase, demand forecast accuracy
diminishes. It is useful, however, to forecast to various lengths for different operational
and strategic purposes. We include “horizon” and multiple variants as search keywords
to include research that considers the need for accuracy at various horizons.
Product Inimitability and Supply Chain Flexibility: As described in Rajaram and
Tang (2001), forecast accuracy may be relatively more important for supply chains in
which there are few or no substitutes for a product or component. Conversely, accuracy
may be substituted for with a more agile or flexible supply chain (Christopher 2000,
Chopra and Sodhi 2004). As such, we include variants of “substitute”, “flexible” and
“agile” in our keyword search.
Tolerance, Resilience and Point Forecasts: Wieland and Wallenburg (2012)
describe a situation related to inimitability. Some supply chains are at greater risk, or
conversely have a greater tolerance for ambiguity. Demand forecasting in these cases
often includes not only extrapolating past events, but quantifying the probability of
demand factors as well. Forecasting for these purposes often requires more than just a
point forecast, particularly when the uncertain event is binomial in nature. We include
terms like “robust”, “resilient”, and “point” to capture this type of research.
Variability and Forecastability: The final set of search terms has to do with
achievable limits of accuracy. Increased demand variability, nonstationary demand
factors, lumpiness and intermittence all have been found to significantly limit the
possible forecast accuracy (Syntetos et al. 2005, Gardner Jr. and Acar 2016). To capture
research that systematically defines these conditions in the context of logistics and supply
chains, we include terms such as “variability” and “uncertainty”.
Scoping the Literature Search
Upon establishing the general search topics, our next step in developing criteria
for inclusion was to find ways to limit a sample so it could be tractably vetted for quality.
Initial searches in various databases returned thousands of articles, some from dubious
sources and of questionable relevance. Several means of limiting the sample of a
literature review have been suggested, including limiting the date range (Grimm et al.
2015), journal selection (Carter and Easton 2011), database selection (Winter and
Knemeyer 2013), and citation count (Tate et al. 2012, Ellram and Cooper 2014).
To ensure the quality of the sources (Crowther and Cook 2007), and to limit the
amount of unverified and un-vetted material we have to analyze (Eksoz et al. 2014), we
bounded our search to peer-reviewed work that has a focus on logistics or supply chain
management topics. In this way, we were able to define what the common academic
understanding of “How good is good enough?” was among logistics and supply chain
management researchers over a wide array of contexts.
Two previous systematic literature reviews of logistics and supply chain
management literature utilized Google Scholar, and scoped their search based on a
prescribed number of top cited results from their keyword search (Tate et al. 2012, Ellram
and Cooper 2014). We found three primary issues with this approach. First,
Google Scholar results tended to include a tremendous amount of noise: search
results that were not from peer-reviewed sources, and citation counts that included
non-peer-reviewed citations. The second problem is a bias toward older articles, as citation
counts (a heavily weighted criterion for relevance) for recent articles tended to be quite
low. Finally, this method undercuts the “completeness” of a systematic search. By
selecting an arbitrary number of highly cited or highly relevant articles, many practically
relevant works are omitted.
We instead decided to limit our search to journals that are widely recognized as
purveyors of high quality logistics and supply chain management research. We utilized
the EBSCO Business Source Complete database, as this provided the most
comprehensive collection of journals (and articles within journals) based on an initial
keyword search in multiple databases.
In their systematic review of sustainable supply chain management research,
Carter and Easton (2011) included a list of only seven journals as containing relevant
research. We found this to be too restrictive, and included also those journals from the
Supply Chain Management Journal list. This list, endorsed by over 300 university
professors conducting logistics and supply chain management research, expanded our
total number of journals to 12. After consulting a leading logistics and supply chain
management researcher who specializes in forecasting research, we expanded this list
further by ten additional journals that included the highest number of articles that met our
keyword search criteria, based on an unrestricted search.
The journals listed in Table 10 were included in the initial keyword search, which
we divide into journals where logistics or supply chain management is their primary
focus, and those that include relevant research, but have a primary focus elsewhere (such
as operations research or general management topics):
Logistics and Supply Chain Journals                                      Non-Logistics and Supply Chain Specific Journals
Manufacturing and Service Operations Management                          Computers and Industrial Engineering
International Journal of Logistics Management                            Decision Sciences
International Journal of Physical Distribution and Logistics Management  Decision Support Systems
Journal of Business Logistics                                            European Journal of Operational Research
Journal of Operations Management                                         Foresight: The International Journal of Applied Forecasting
Journal of Supply Chain Management                                       International Journal of Forecasting
Production and Operations Management                                     International Journal of Production Economics
Supply Chain Management: An International Journal                        International Journal of Production Research
Transportation Journal                                                   Journal of the Operational Research Society
Transportation Research: Part E                                          Management Science
                                                                         Omega: The International Journal of Management Science
                                                                         Operations Research
Table 10: Journals Included in Initial Keyword Search
We opted not to limit our search based on date range, as we found no basis for
deeming older work to be non-relevant, and were able to scope a tractable sample with
existing limiters.
Individual Article Criteria for the Literature Search
Once the scope of the search is established, each article must be reviewed for
relevance to the research question. Keyword searches often return false positives if
search terms are too general, or omit relevant work if search terms are too specific. In
response, we erred on the side of generality in the keyword search and subjected each
article (in both abstract and full article reviews) to the following criteria to ensure it
contributes to our intended research question. Each article:
1. Must help answer the question: "How good is good enough?"
2. Must at a minimum discuss the implications of differing levels of forecast
accuracy, and this should be a main focus of the paper.
3. Should explore unique (and in combination, comprehensive) contexts that may
affect requirements for accuracy that differ from simply "more accuracy is better".
4. Cannot simply assume greater accuracy is a sufficient goal across all contexts,
must be application-specific, and agnostic to forecasting method (assumes most
appropriate method used, and all reasonable control actions within the firm are
exercised).
5. Must deal with demand forecasts (or those directly relating to the downstream
flow of goods or services in a supply chain). This excludes economic, financial,
demographic, managerial, technological, supply (or yield), or returns (though
permits closed-loop systems) forecasts.
6. If forecast accuracy is not the main focus, then there must be some unexpected
application or context-based insight regarding accuracy.
Collecting Relevant Research and Applying Search Criteria
The first-round search returned 180 articles that met our keyword and journal
restrictions. We reviewed the abstract for each, applying the six criteria for individual
article inclusion, and categorized them on a five-point scale ranging from “clearly does
not meet criteria for inclusion based on abstract, and therefore removed” to “clearly
meets criteria for inclusion based on abstract, and therefore merits full review of article”.
Based on this scale, all but the lowest category were retained for a full review, resulting in
130 carried forward. We then applied the same criteria for individual inclusion reviewing
the full article. We found 107 articles met our criteria for inclusion after this second
round.
From these 107 remaining articles, we began our cascading process. We
reviewed all article titles cited in the bibliography and all citing articles identified in
Google Scholar, using the individual article inclusion criteria. If a cited or citing article
appeared to merit inclusion, and it was published in one of our previously identified
journals, then it was carried forward for a review of its abstract. If it appeared in an
alternate journal title, then the keyword search in Business Source Complete was
expanded to include these journals. The result was an expansion to 191 articles, reduced
to 181 articles after reviewing first abstracts, then full articles. Table 11 provides an
overview of the cascading and vetting process.
Journal                                                                  Initial Vetted Search  After Cascading
International Journal of Production Economics                            18                     23
International Journal of Production Research                             9                      16
Journal of Operations Management                                         15                     15
International Journal of Forecasting                                     5                      15
Foresight: The International Journal of Applied Forecasting              5                      14
Production Planning and Control                                          NA                     12
Production and Operations Management                                     10                     11
Journal of Business Logistics                                            9                      10
Decision Sciences                                                        2                      7
Omega: The International Journal of Management Science                   4                      6
European Journal of Operational Research                                 2                      6
Supply Chain Management: An International Journal                        5                      5
Computers and Industrial Engineering                                     4                      5
Management Science                                                       3                      5
Manufacturing and Service Operations Management                          3                      5
Journal of the Operational Research Society                              3                      4
Operations Research                                                      1                      4
International Journal of Physical Distribution and Logistics Management  3                      3
The International Journal of Logistics Management                        3                      3
Journal of Business Forecasting Methods and Systems                      NA                     3
Journal of Forecasting                                                   NA                     3
Naval Research Logistics                                                 NA                     3
Decision Support Systems                                                 1                      1
Journal of Supply Chain Management                                       1                      1
Transportation Research: Part E                                          1                      1
Transportation Journal                                                   0                      0
Total                                                                    107                    181
Table 11: Keyword Search Results in Initial and Cascaded Search
Overview of Thematic Findings
As we reviewed our sample of literature, we identified primary and alternate
themes (from the 13 general topics included in our search terms) that each article
included in its analysis. We noted the type of analysis conducted, the error
measurement each work included (if any), and the supply chain entity(s) that served as
their unit of analysis. Summary statistics for these characteristics can be found in Table
12, Table 13, and Table 14.
Type of Analysis No. of Articles

Simulation, Non-Empirical 46
Analytic, Non-Empirical 35
Statistical Forecast Model, Empirical Validation 28
Thought Leadership 22
Case Study 21
Analytic, Empirical Validation 14
Regression Analysis of Forecast Accuracy 11
Survey 8
Simulation, Empirical Validation 7
Literature Review 6
Statistical Forecast Model, Non-Empirical 4
Table 12: Types of Analysis in Article Sample
Kolassa (2016b) Classification    Accuracy Measure    No. of Articles
Cumulative 6
Bias Mean (Median) 23
Scaled Mean 2
Mean (Median) 24
Mean (Median) Relative 8
Absolute Error Geometric Mean 2
Geometric Mean Relative 3
Mean (Median) Scaled 5
Cumulative 3
Mean (includes minor corrections) 42
Absolute Percent Median (includes Relative) 5
Symmetric Mean (Median) 12
Weighted Mean 4
Percent Error Mean (includes Symmetric) 10
Mean Squared 1
Percent Squared Error
Root Mean (Median) Squared 1
Sum (includes Root Mean) 2
Mean (includes minor corrections) 27
Root Mean 8
Relative Mean 2
Squared
Relative Root Mean (Theil's U) 1
Relative Geometric Root Mean 2
Geometric Root Mean 4
Coefficient of Determination 2
Univariate Normal 40
Univariate Non-Normal/Bayesian Updated 9
Scaled
Multivariate 5
Coefficient of Variation 5
Loss Function Variants 5
Error Implication Statistics 3
Functional
Nonparametric (Ordered or Percent 3
Better/Best)
Self-Reported 4
NA
None 38
Table 13: Forecast Accuracy Measures Explored in Article Sample
Supply Chain Entity Studied                  No. of Articles   Supply Chain Echelons
General Demand Forecasting                   59                One-Tier
Distributor-Retailer                         27                Two-Tier
Manufacturer-Retailer                        23                Two-Tier
Supplier-Manufacturer                        10                Two-Tier
Manufacturer-Multiple Channels               2                 Two-Tier
Manufacturer-Distributor                     1                 Two-Tier
Supplier-Distributor                         1                 Two-Tier
Manufacturer-Distributor-Retailer            6                 Three-Tier
Supplier-Distributor-Retailer                1                 Three-Tier
Supplier-Manufacturer-Retailer               1                 Three-Tier
Supplier-Manufacturer-Distributor-Retailer   3                 Four-Tier
Production and Inventory Control             47                NA
Residual Demand                              10                NA
Table 14: Focal Supply Chain Entities in Article Sample
We also found that we needed to edit our themes slightly to better reflect the
extant literature and form a typology of drivers of “How good is good enough?”. Themes
can be divided generally into technical drivers of forecast accuracy bounds, and
managerial drivers to accuracy bounds. Table 15 includes the six edited themes that we
categorized as technical drivers and seven edited themes that fell under managerial
drivers, and the numbers of articles where each theme was either a primary or an alternate
focus. We then describe the state of logistics and supply chain management research
regarding each theme, state what that means for our research question, and propose
directions for future inquiry in academic research.
It is clear that many of the works reviewed focused on multiple dimensions we
identified as being important to the question of “How good is good enough?”. The
classification of each of these works as contributing to each dimension, while systematic,
is ultimately subjective. Some research included merely a cursory and tangential
discussion of the implications of accuracy. Some of the classifications could be viewed
as equally contributing to another of the dimensions we set out. For instance, a study
describing the bullwhip implications of forecast accuracy almost certainly includes
discussions of information sharing, the most prominent ameliorative tactic to reduce the
impact of supply chain demand distortion. In that vein, a discussion of information
sharing most certainly involves some focus on the degree of supply chain integration.
This is then related to issues of hierarchy, aggregation, and the inevitable cost tradeoffs
that occur within and between principals of a complex supply chain. Our judgement is
based on the subjective set of criteria set out above, and a qualitative assessment of which
dimension featured most prominently in a work.
Technical Drivers of Forecast Accuracy Bounds  No. of Articles  Managerial Drivers of Forecast Accuracy Bounds  No. of Articles
Forecastability                                19               Error Amplification                             8
Horizon                                        8                Cost Tradeoffs                                  43
Overfitting and Misspecification               4                Supply Chain Integration                        34
Tradeoffs of Metrics                           31               Supply Chain Flexibility                        9
Level of Aggregation and Hierarchy             21               Manual Adjustments                              11
Data Quality and Availability                  12               Risk                                            24
                                                                Inventory and Control Policy                    34
Table 15: Technical and Managerial Drivers of Forecast Accuracy Bounds
Technical Drivers of Forecast Accuracy Bounds
While many elements of demand forecasting are under the control of managers or
demand planners, technical drivers are either only partially endogenous or completely
exogenous. These themes primarily drive achievable demand forecast accuracy.
Literature findings, implications for “good enough”, and proposed future research are
discussed below for each technical driver of forecast accuracy bounds, summarized in
Table 16.
Forecastability
Forecastability refers to the inherent limits to generating accurate future
predictions from a particular demand pattern. In their survey of Midwestern
businesspeople, Mentzer and Cox (1984) describe this concept of forecastability as a
distinction between potential and achieved forecast accuracy. The reviewed literature
seems to split between statistical and situationally-based inherent limits to forecastability.
Statistical limits relate to characteristics of the time series itself that make that pattern
difficult to project. Situational limits refer to contexts likely to present these statistical
limitations. Mentzer and Cox (1984) note that most work in demand forecasting focuses
on methods to improve accuracy given an isolated context rather than a general
classification of what is possible, much less the situational characteristics that may
actually be under a manager’s control. Despite the intense focus on methods, the
discussion of what is forecastable remains underdeveloped. This is in part due to the
concept being a moving target.
Statistical forecastability depends on the current state of constantly evolving
forecasting methods and technology, the ability to gather data (covered more under a
separate theme), and numerous data characteristics such as time series trend, seasonality,
intermittence (Willemain et al. 2004, Hatzakis et al. 2010), (non)stationarity (Hosoda and
Disney 2006, Gaur et al. 2007, Pearson et al. 2010, Thiel et al. 2014) and variability in
both demand and supply (Lorentz et al. 2007, Fildes et al. 2009, Davis et al. 2016). As
Fildes et al. (2008) suggest, improvement in statistical forecasting methods is not nearly
as important as improving methods to match (particularly algorithmically) forecasting
methods to circumstance. Enhanced statistical sophistication has not led to universal
improvements in forecastability even though there is possibility for improvement.
Additionally, the focus of our review was not a technical discussion of forecasting
methods, but instead of demand forecasting applications. As such, we include only data
characteristics in our theme of forecastability and exclude a discussion of methods.
Situational forecastability is simply an examination of contexts in which the
aforementioned statistical characteristics are likely. High demand variability can occur in
any industry, and can be exacerbated by lead time or production instability, supply
uncertainty, and the bullwhip effect between echelons in the supply chain. Davis et al.
(2016) demonstrate the effect of supply uncertainty on forecastability by modeling food
bank operations, though this can apply to other contexts such as nonprofits, agricultural
supply chains, and extended supply chains with sourcing in unstable or poorly regulated
areas. Lorentz et al. (2007) demonstrate the effect from lead time uncertainty in
unreliable Eastern European logistics infrastructure. Intermittence is also notoriously
difficult to forecast owing to both variability in demand size and pattern (Syntetos and
Boylan 2001), and has been studied in such contexts as residual demand for service parts
(Willemain et al. 2004) and service demand (Hatzakis et al. 2010). Non-stationarity is
observed in instances of rapidly changing demand conditions, and can be the result of a
number of factors including firm (or product) age, level of market competition, industry,
forecast horizon, level of aggregation, and market (or regulatory) maturity (Winklhofer
1996). Thiel et al. (2014) explore this in the effects of a health scare in the French
poultry industry. This involves drastic and unpredictable reductions of demand in a
tainted product, rapid increases in alternatives, uncertain supply, and unpredictable
regulatory intervention. Short lived products with highly elastic consumer responses, like
those found in the fashion retail (Pearson et al. 2010) and consumer technology (Gaur et
al. 2007) industries also contribute heavily to non-stationarity. Even relatively stable and
predictable point of sale demand can have non-stationarity introduced at higher echelons
via bullwhip demand distortion (Hosoda and Disney 2006).
What these studies on statistical and situational forecastability indicate for
practitioners is that some demand patterns commonly found in the situations described
above will necessarily have lower achievable accuracy, meaning “good enough” is lower
in these contexts. This also implies that alternate strategies to investing in more
sophisticated forecasts will bear more fruit. Fildes et al. (2009) suggest that in situations
of high variance (though not low), manual adjustments to forecasts are effective. Babai
et al. (2014) suggest that introducing bias known to be incorrect results in better operational
outcomes than being able to correctly forecast intermittent demand complicated by such
factors as obsolescence. Chen and Blue (2010) suggest aggregating negatively correlated
demands to offset low forecastability. Kurtuluş et al. (2012) suggest greater operational
integration between supply chain echelons to limit variance inflation from bullwhip, but
concede that this also likely has Pareto returns. Morlidge (2014b) suggests adjusting the
manner in which forecastability is measured to prioritize how it is addressed. The most
common measure associated with forecastability is the coefficient of variation (CV), or
the ratio of standard deviation to mean in a time series. However, this is merely an
indicator of dispersion which happens to correlate with poor forecast accuracy in most
(but not all) cases (Morlidge 2013). Wallstrom and Segerstedt (2010) suggest also
including indications of intermittence as well as dispersion, and Morlidge (2013,
2014a,b) proposes measures like relative absolute error (RAE) as an alternative. This
measure, which compares error in forecasts to error in a naïve forecast, more
appropriately demonstrates the potential forecastability relative to a naïve approach. He
analytically demonstrates a theoretical lower bound of 0.7 and empirically finds a
practical lower bound of 0.5 across numerous demand signals. Using this metric and
some indicator of relative value contribution for each product, forecasters can better
target limited funds to improve on low forecastable items.
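As a minimal sketch of the two measures contrasted above, the following Python computes the CV of a demand series and an RAE against a one-step naïve forecast. The function names and the sample series are hypothetical, and the RAE is taken here as total absolute forecast error divided by total absolute naïve error, which is one common formulation and may differ in detail from the one Morlidge uses.

```python
from statistics import mean, pstdev

def coefficient_of_variation(demand):
    """Ratio of (population) standard deviation to mean of a demand series."""
    return pstdev(demand) / mean(demand)

def relative_absolute_error(actual, forecast):
    """Total absolute forecast error divided by the total absolute error of a
    naive forecast (the previous period's actual), over periods 2..n."""
    model_err = sum(abs(a - f) for a, f in zip(actual[1:], forecast[1:]))
    naive_err = sum(abs(a - prev) for a, prev in zip(actual[1:], actual[:-1]))
    return model_err / naive_err

# Hypothetical demand and forecast values, for illustration only
demand = [100, 120, 90, 110, 130, 95]
forecast = [100, 100, 110, 95, 110, 120]

cv = coefficient_of_variation(demand)
rae = relative_absolute_error(demand, forecast)
print(f"CV = {cv:.3f}, RAE = {rae:.2f}")
```

An RAE below 1 indicates that the forecast outperforms the naïve approach on this series, whereas the CV describes only the dispersion of demand and says nothing about how much of that dispersion a forecast could capture.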
Much academic work focuses on improving accuracy for demand predictions in
the low forecastable situations listed above, and ought to continue in the future. However,
future research on forecastability should also examine interactions between these
situations, and alternative means of both measuring and avoiding low forecastability.
Horizon
It should come as no surprise that previous logistics and supply chain
management research has indicated that forecast accuracy degrades at higher forecast
horizons, as demonstrated in self-reported responses from demand planners (Mentzer and Cox
1984), and in simulations of multi-echelon production systems as lead time increases
(Huang et al. 2011).
However, lower accuracy does not necessarily mean such forecasts are not useful, as some
research indicates the necessity of lower accuracy, longer range forecasts for point of sale
demand. Inaccurate forecasts over longer horizons are necessary in products with long
development lead times and short product life cycles (Fisher and Raman 1996, Li et al.
2015), and shortening lead time or increasing responsiveness to early indications of lead
time inaccuracy promise greater returns than attempting to improve accuracy at those
ranges. Clark (2005) and Amornpetchkul et al. (2015) show that inaccurate longer range
forecasts also work as coarse indicators to suppliers, permitting production smoothing
and more effective capacity planning. Miyaoka and Hausman (2004) show that the use
of “stale” forecasts (longer horizon from previous period) for setting downstream base
stock levels can help better align stock policy to production as forecasts are updated.
Such examples of using inaccurate long range forecasts include benefits that are
unequally shared and costs unequally borne (discussed further under the cost tradeoffs
theme), often requiring built-in incentives or penalties in contracts.
The accuracy of longer range forecasts is also affected by the intended purpose of
the forecast. Though disaggregated demand forecasts degrade at longer horizons,
temporal aggregation for long term operations, capacity, marketing or strategic planning
tends to dominate the negative effect of forecast degradation due to horizon. Willemain
et al. (2004) found certain methods for estimating intermittent demand (based on
temporal aggregation) actually performed better with longer horizons as a function of the
central limit theorem. This tradeoff between the accuracy benefit of aggregation and the
deleterious effects of increased horizon is explored by Zhao et al. (2002b) in a simulation
of a two-tier soft drink bottling production system. As forecast error increases, however,
the benefits to suppliers of early order commitment decrease.
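The temporal aggregation effect described above can be illustrated with a small sketch. The weekly series is hypothetical, and the reduction in relative variability shown here assumes that period-to-period deviations are not strongly positively correlated; with persistent shocks the benefit would be smaller.

```python
from statistics import mean, pstdev

def coefficient_of_variation(series):
    """Ratio of (population) standard deviation to mean."""
    return pstdev(series) / mean(series)

def aggregate(series, bucket):
    """Sum consecutive non-overlapping buckets (e.g., 4 weeks into one month)."""
    return [sum(series[i:i + bucket]) for i in range(0, len(series), bucket)]

# Hypothetical weekly demand, for illustration only
weekly = [8, 12, 9, 13, 13, 7, 10, 8, 6, 14, 11, 9]
monthly = aggregate(weekly, 4)

cv_weekly = coefficient_of_variation(weekly)
cv_monthly = coefficient_of_variation(monthly)
print(f"weekly CV = {cv_weekly:.3f}, monthly CV = {cv_monthly:.3f}")
```

In this example the aggregated series is relatively much less variable than the weekly series, which is the mechanism by which aggregation can offset some of the accuracy lost to a longer horizon.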
Increased horizon for disaggregated demand forecasts tends to lower achievable
accuracy, and so indicates “good enough” is lower for situations where longer forecasts
are warranted. Practitioners can offset this lower achievable accuracy by implementing
postponement, electronic data interchange with agile production, and expediting when
necessary (Li et al. 2015). Such investments, which can also be undertaken for short-
term operational forecasts, promise greater overall cost savings with longer horizon
forecasts (Rafuse 1995). However, these responses require a full understanding of both
the incremental cost of inaccuracy and the proportion of error attributable to forecast
horizon. This changes as the utility of the forecast is adjusted to longer range operational
or non-operational purposes. Due primarily to temporal aggregation, “good enough”
increases when longer range forecasts are combined for other planning purposes.
Future work regarding the effects of forecast horizon on demand forecast
accuracy needs to include comparisons of the relative effect of horizon across industries,
levels of hierarchy and various other contexts. It would also be useful to generate non-
linear models to characterize the degradation effect of increasing horizon on demand
forecast accuracy.
Overfitting and Misspecification
This topic was somewhat less explored in the logistics and supply chain
management literature, as it has a greater focus on pure statistical precision. Overfitting
revolves around the tradeoff that exists between model complexity and precision, with
regards to a training sample. As Kolassa (2016a) and Katsikopoulos and Syntetos (2016)
describe in their discussions of tradeoffs between bias and variance, models with higher
complexity are attractive as they tend to minimize bias and often “pass for intelligence”.
The problem is that unnecessary complexity attempts to explain variance that may be true
random error, particularly when the underlying signal is weak. Katsikopoulos and
Syntetos (2016) found more complex models increased out-of-sample forecast error by
an average of 27%, and Kolassa (2016a) often found misspecified (biased) models
outperforming correctly specified models. Even forecast model complexity increases that
have promised improvements tend to be based on extremely small validation samples,
and generalized claims from highly complex models should be viewed with skepticism
pending further validation (Goodwin 2011). Willemain et al. (2004) demonstrate
overfitting as a particular problem in intermittent derived demand situations. Empirically
derived distributions for spare parts demand were found to be unreliable in predicting
future demand when demand was non-stationary across 28,000 products. Their proposed
alternative to accuracy was aggregation via cumulative lead time demand.
This implies that “good enough” is often lower than what is achievable. This is a
more pernicious problem for demand planners, as overfitting is often a subjective
assessment a priori. That is, until the predictions are proven false, there is limited
knowledge of their quality. The surest way to protect against this is to follow good
modeling practices of including validation or holdout samples when fitting statistical
models, and to employ demand forecasters that have a fundamental understanding of
what is inevitably being optimized in the statistical models employed. As overfitting is
likely amplified when demand is non-stationary, it is also useful to provide manual
adjustments where possible to account for model deficiencies where it is known that
demand pattern environmental conditions will change.
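The holdout practice recommended above can be sketched as follows. This is an illustration under stated assumptions rather than a method drawn from the reviewed literature: the demand values are hypothetical, and the two models (a one-parameter mean and a six-parameter per-period profile) were chosen only to contrast a simple specification with a more complex one.

```python
from statistics import mean

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def fit_mean(train):
    """One-parameter model: forecast the training mean for every period."""
    level = mean(train)
    return lambda t: level

def fit_periodic_profile(train, period):
    """More complex model: one fitted parameter per position in the cycle."""
    profile = [mean(train[i::period]) for i in range(period)]
    return lambda t: profile[t % period]

def evaluate(model, train, holdout):
    """Return (in-sample MAE, holdout MAE) for a fitted model."""
    n = len(train)
    in_sample = mae(train, [model(t) for t in range(n)])
    out_sample = mae(holdout, [model(n + t) for t in range(len(holdout))])
    return in_sample, out_sample

# Hypothetical demand: noise around a stable level, with no true periodic signal
train = [104, 97, 101, 95, 103, 100, 98, 105, 99, 93, 107, 98]
holdout = [99, 96, 104, 102, 97, 103]

simple_in, simple_out = evaluate(fit_mean(train), train, holdout)
complex_in, complex_out = evaluate(fit_periodic_profile(train, 6), train, holdout)
print(f"mean model:    in-sample {simple_in:.2f}, holdout {simple_out:.2f}")
print(f"profile model: in-sample {complex_in:.2f}, holdout {complex_out:.2f}")
```

The more complex model fits the training data more closely but performs worse on the holdout periods, because its extra parameters have fitted random error rather than signal. Only the holdout comparison reveals this.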
Future research on the effect of overfitting on demand forecast accuracy should
focus on providing additional guidance for practitioners to detect overfitting. More work
also needs to be done to describe the effect of the size and quality of the training sample
on overfitting or misspecifying prediction models.
Tradeoffs of Metrics
Each metric of accuracy rewards and penalizes different aspects of closeness or
difference of a prediction from reality. It is critical to match the measure appropriately to
the circumstances. Unfortunately, in many cases, the circumstance is a demand planning
function with little expertise that utilizes a deeply flawed, but simple to calculate and
interpret, measure. This mistake of using a single simple but flawed metric to analyze
accuracy pervades the logistics and supply chain management academic literature as well,
as the plurality of reviewed articles chose mean absolute percent error (MAPE) to
evaluate forecasts. This metric suffers from inflation at low levels of demand, tends to
drive systematic under-forecasting, is susceptible to outliers and is not estimable when
demand is zero (Armstrong and Collopy 1992, Fildes 1992, Makridakis 1993, Hyndman
2006, Hyndman and Koehler 2006, Kolassa and Martin 2011). The primary concepts
discussed in our literature sample were the tradeoffs of measuring bias and accuracy,
comparability of measures and the insufficiency of individual traditional error metrics.
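Several of the MAPE weaknesses listed above can be seen in a short sketch. The function and the demand values are hypothetical and serve only as illustration. The last two calls show one commonly cited reason for the under-forecasting tendency: with non-negative forecasts, the percent error of an under-forecast is capped at 100%, while that of an over-forecast is unbounded.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent. Undefined if any actual is zero."""
    if any(a == 0 for a in actual):
        raise ZeroDivisionError("MAPE is undefined when actual demand is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)

# The same 5-unit miss at high and low demand levels
high_volume = mape([1000, 1000], [995, 1005])
low_volume = mape([10, 10], [5, 15])

# Worst possible under-forecast versus a large over-forecast of the same item
under_forecast = mape([100], [0])
over_forecast = mape([100], [300])

print(high_volume, low_volume, under_forecast, over_forecast)
```

A period with zero demand makes the measure inestimable, which the function signals by raising an error instead of returning a value.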
Though error metrics tend to command a greater portion of focus in both
forecasting practice and research, and there is some indication that small nonzero bias in
either direction can have positive effects on supply chain operations (Kolassa and Martin
2011, Sanders and Graman 2009), several papers indicate that reducing bias is relatively
more important than reducing error to logistics and supply chain performance. Ebrahim-
Khanjari et al. (2012) show that positive bias in shared forecasts deteriorates trust between
wholesalers and retailers to a greater extent than error. Barman et al. (1990), Ritzman
and King (1993), Flores et al. (1993) and Huang et al. (2011) show bias dominates error
in its effect on cost reduction and delivery performance in production control systems,
particularly with large lot sizes and small buffers, and that the positive effects of reducing
error are due primarily to the reduction of the bias component of error. The same relative
effect size is demonstrated in labor and inventory costs in a warehouse (Sanders and
Ritzman 2004a, Sanders and Graman 2009, 2016), distribution network costs (Lee et al.
1993, Zhao and Xie 2002), and manufacturing supply chains (Chang and Yeh 2012).
Bias measured cumulatively has been found to have a greater effect on logistics and
supply chain costs for intermittent demand items (Willemain et al. 2004), and under some
control policies in a production control system (Sourirajan et al. 2008). However,
Sourirajan et al. (2008) find that control policies that depend on cumulative information
are more sensitive to bias, whereas rate based control policies are more sensitive to error
outliers. Most of these works at least mention a tradeoff between bias and accuracy.
That is, maximizing a measure of accuracy at best does not produce optimal bias, and can
end up increasing negative effects of bias (Huang et al. 2011).
Barman et al. (1990), Syntetos and Boylan (2010) and Katsikopoulos and
Syntetos (2016) present the tradeoff of bias and variance more explicitly as a
reformulation of mean squared error (MSE), the only error measure that is tractably
decomposed into bias, forecast variance, and random error variance components. This
implies that there is a theoretical limit to how small MSE can be, as random error is by
definition unexplainable. It follows that logistics and supply chain costs are minimized
only by balancing bias and error terms (Katsikopoulos and Syntetos 2016).
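The decomposition referred to above is commonly written as follows. This is the standard textbook form and not a reproduction of the notation in the cited works; the symbols are assumptions introduced here. Demand is taken to be $y = f + \varepsilon$, where $f$ is the underlying signal and $\varepsilon$ is random error with mean zero and variance $\sigma^2$, independent of the forecast $\hat{y}$.

```latex
\begin{align*}
\mathrm{MSE} &= E\big[(y - \hat{y})^2\big] \\
             &= \underbrace{\big(E[\hat{y}] - f\big)^2}_{\text{bias}^2}
              + \underbrace{E\big[(\hat{y} - E[\hat{y}])^2\big]}_{\text{forecast variance}}
              + \underbrace{\sigma^2}_{\text{random error variance}}
\end{align*}
```

Because $\sigma^2$ does not depend on the forecast, MSE cannot fall below it, and a reduction in the bias term that comes at the cost of a larger increase in forecast variance raises MSE overall.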
While it may seem an attractive and simple way to benchmark performance, most
metrics cannot effectively be compared across products, forecasters, forecasting methods
or over time. This is particularly true of scale-bound metrics like mean error (ME), mean
absolute error (MAE) (Moon 2015) and MSE (Chatfield 1992, Armstrong and Fildes
1995, Hyndman and Koehler 2006), but this applies to other error (and bias) metrics as
well. Since the time series characteristics (not to mention other significant production
and distribution considerations) differ between scenarios, we should not expect a
common metric to hold the same meaning.
Three primary alternatives have been proposed to address limited comparability,
and some eschew the use of accuracy measures altogether. The first alternative measures
are nonparametric comparisons such as ordered or percent better measurements (Syntetos
and Boylan 2005, Hyndman and Koehler 2006). However, such nonparametric
comparisons can have low construct validity and deviate from the consensus of a number
of other measures (Armstrong and Collopy 1992), making them potentially misleading in
isolation. Accuracy implication metrics and loss functions are the second alternative that
try to estimate the direct impact on the firm or supply chain (Lee et al. 1987, 1993, Fildes
1992, Flores et al. 1993, Boylan and Syntetos 2006, Hyndman and Koehler 2006,
Syntetos et al. 2010), with forecast accuracy metrics possibly not even included as an
input. This may not be useful for diagnosing forecast misspecification, as accuracy
metrics often have little relation to loss function or cost performance (Lee et al. 1987), or
could have highly disproportionate effects on accuracy implication metrics with even
small changes in accuracy (Syntetos et al. 2010). A third suggestion by Moon (2015) is
to solely compare error rates over time, but even that admittedly assumes stationarity of
demand drivers.
Comparability of metrics is even lower between different principals in the supply
chain. Though many different principals may forecast demand, the level of aggregation, time
horizon, and ultimate use of the forecast require separate metrics and result in different
levels of both achievable and desirable accuracy. In a case study of a large industrial
production firm, Kerkkänen et al. (2009) demonstrate that different metrics ought to be
generated and utilized by different organizations generating forecasts. Sales, marketing,
finance, accounting, production, purchasing and logistics should all use metrics tailored
to their function. This all implies that the “goodness” of any metric is less important than
the metric being matched to the appropriate context. Hançerlioğulları et al. (2016) show
that optimization of metrics at one level can negatively impact metrics at another in a
study of financial data from 304 firms. They find that negative bias, or what they call “sales
surprise”, is a boon for investors in the short term by increasing inventory turnover, but that
this negatively impacts production and long term customer goodwill. Conversely, they
find increased forecast accuracy reduces turnover and short term shareholder value. This
means that “goodness” of a metric is relative to the affected principal, and a firm or a
coordinated supply chain should balance the “goodness” of the most appropriate metrics
between the multiple generators and beneficiaries of forecasts.
Finally, even metrics appropriately applied are found to be insufficient to
completely summarize a demand forecast’s quality. Knowing the level of error does not
indicate the cost of such error, the persistence of error, whether positive or negative
deviation affects the supply chain in different ways (Kolassa and Martin 2011), and
whether errors affect different parties in a supply chain the same way. Several works
present typologies of various forecast error metrics (Armstrong and Collopy 1992,
Mathews et al. 1994, Hyndman and Koehler 2006, Kolassa 2016b), and many others
describe potential defects and strengths of each measure. We loosely group our
discussion by the types identified in Kolassa’s (2016b) framework.
The first two types, absolute and squared errors, are both scale dependent, and so
have low reliability (comparability between forecasts) (Hyndman and Koehler 2006).
Additionally, absolute error metrics such as MAE and Median Absolute Error (MdAE)
reward point forecasts for approaching the median, and so are sensitive to outliers
(Kolassa 2016b). Squared terms such as the sum of squared errors (SSE), MSE, and root
MSE (RMSE) reward forecasts that approach the mean, are sensitive to outliers, and exhibit
non-normality and low construct validity (tend to disagree with the consensus of forecast
measures) (Armstrong and Collopy 1992, Fildes 1992, Kolassa 2016b). Percent error
measures like MAPE, median, symmetric and weighted variants of MAPE, as well as
mean percent error (MPE) each suffer in varying degrees from the limitations attributed
to MAPE at the beginning of this theme. Relative error metrics, which use a benchmark
forecast method (usually some variant of a naïve forecast) as a point of comparison, share
some inflation and definability shortcomings with percent errors (Makridakis 1993,
Hyndman and Koehler 2006). Scaled errors penalize over-forecasts more than
under-forecasts (useful for intermittent data) but increase reliability. Loss functions tend to
be highly context specific, complex (not easily interpreted) and non-generalizable
(Armstrong and Fildes 1995, Kolassa 2016b).
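The claim that absolute error metrics reward forecasts approaching the median while squared error metrics reward forecasts approaching the mean can be checked numerically. The sketch below is only an illustration under stated assumptions: the skewed demand history and the grid of candidate point forecasts are hypothetical, and a single constant forecast is evaluated against the whole history.

```python
from statistics import mean, median

def mae_of_constant(demand, point):
    """MAE obtained if the same point forecast is issued for every period."""
    return mean(abs(d - point) for d in demand)

def mse_of_constant(demand, point):
    """MSE obtained if the same point forecast is issued for every period."""
    return mean((d - point) ** 2 for d in demand)

# Hypothetical right-skewed demand history (one large spike).
demand = [0, 0, 1, 2, 12]

# Candidate point forecasts on a grid: 0.0, 0.5, ..., 12.0.
candidates = [x / 2 for x in range(0, 25)]

best_for_mae = min(candidates, key=lambda p: mae_of_constant(demand, p))
best_for_mse = min(candidates, key=lambda p: mse_of_constant(demand, p))

print("MAE-optimal forecast:", best_for_mae, "median:", median(demand))
print("MSE-optimal forecast:", best_for_mse, "mean:", mean(demand))
```

For skewed or intermittent series the two optima can sit far apart, which is one reason a forecast tuned to one family of metrics can look poor under the other.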
Insufficiency of traditional measures has led to two basic propositions. First is to
scrap traditional point forecasting methods in favor of predictive distributions, which then
require alternate means of reporting forecast quality. The second is to account for the
biases and informational deficiencies of single error metrics by including multiple
metrics to balance them.
Adopting the first proposal, Kolassa (2016b) demonstrates by forecasting over
2000 products in drug and grocery retailers that accuracy metrics in common use are
inadequate, particularly for items with intermittent demand. Further, the use of a normal
assumption when applying error metrics to determine safety stock ignores the asymmetry
of error effects from over and under-forecasting. Kolassa suggests distributional
forecasts rather than point forecasts to make up for the deficiencies he describes for each
individual error measure in his typology. Point forecasts of all types provide too little
information, particularly for very low and high demand values and skewed distributions.
Zhao and Xie (2002) support this call for using predictive distributions as they found the
type of forecast distribution, independent of either error or bias, significantly affected
costs in a simulated distribution network. Limits to using predictive distributions instead
of more traditional error metrics include the significant increase in data collection,
required demand planner skill, reduced interpretability (Kolassa 2016b) and bias
introduced from equal observation weighting in goodness of fit tests for distribution
(Boylan and Syntetos 2006).
Mathews and Diamantopoulos (1994) and Wallstrom and Segerstedt (2010) adopt
the second proposal. Mathews and Diamantopoulos find through principal components
analysis (PCA) that 14 error measures reduced to four distinct components accounting for
more than 85% of forecast variance. Ratio, volume, bias and fit-based measures (roughly
equivalent to percent, scale dependent, bias and the coefficient of determination
respectively) all account for different aspects of demand forecast variance. Also through
PCA of forecast error measures for smooth (low CV, frequent), intermittent (low CV,
infrequent), erratic (high CV, frequent), and lumpy items (high CV, infrequent),
Wallstrom and Segerstedt suggest traditional error measures (MAE and MSE
specifically) insufficiently describe the difference of predicted and observed phenomena.
Specifically, they find mathematically distinct and complementary explanatory power in
MSE, MAE, sMAPE (symmetric MAPE), CFE (cumulative forecast error), PIS (periods
in stock), and NOS (number of shortages). Morlidge (2014b) supports the approach of
including multiple metrics in his examination of 11,000 demand forecasts of items with
varying volumes and variance. He found that even though traditional measures of
accuracy tend to be lower among high volume (and cost) items, they perform the same as
low cost, low volume items on his proposed measure of RAE. This implies that
traditional measures motivate a misalignment of resources, as high volume, high RAE
items currently compose the largest possibility for cost improvements for demand
forecasters. Reliance on a single or small number of measures leaves demand planners
blind to potential areas for forecast improvement.
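The four demand categories named above (smooth, intermittent, erratic and lumpy) can be sketched as a simple classifier. This is a hedged illustration rather than the procedure of any cited study: it assumes that "CV" is measured as the squared coefficient of variation of non-zero demand sizes, that frequency is measured by the average inter-demand interval (ADI), and that the two cutoff constants are illustrative values of the kind used in the intermittent demand literature.

```python
from statistics import mean, pstdev

# Illustrative cutoffs (assumptions for this sketch, not taken from the text).
CV2_CUTOFF = 0.49   # squared coefficient of variation of non-zero demand sizes
ADI_CUTOFF = 1.32   # average number of periods per demand occurrence

def classify_demand(series):
    """Label a demand history as smooth, intermittent, erratic, or lumpy."""
    nonzero = [d for d in series if d > 0]
    if not nonzero:
        raise ValueError("series has no demand occurrences")
    # Average inter-demand interval: periods per period with demand.
    adi = len(series) / len(nonzero)
    # Squared coefficient of variation of the non-zero demand sizes.
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    frequent = adi < ADI_CUTOFF
    low_cv = cv2 < CV2_CUTOFF
    if frequent:
        return "smooth" if low_cv else "erratic"
    return "intermittent" if low_cv else "lumpy"

# Hypothetical demand histories, one per category.
examples = {
    "steady seller": [10, 11, 9, 10, 10, 11],
    "spare part": [0, 5, 0, 0, 5, 0],
    "promoted item": [1, 20, 2, 30, 1, 25],
    "project item": [0, 0, 50, 0, 1, 0, 0, 3],
}
for name, series in examples.items():
    print(name, "->", classify_demand(series))
```

Grouping items this way before choosing error measures is one way to act on the finding that no single metric describes all four patterns equally well.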
The tradeoffs between metrics mean that the question “How good is good
enough?” depends on the metric(s) in use and the type of forecast it is applied to. In
general, positive and negative deviation will generate distinct cost (and revenue) effects
that differ for each member of a supply chain. Extant literature suggests that error
minimization can increase bias, which may increase overall costs. It is also suggested
that regardless of what is considered “good enough” for a single product, that metric
ought not be applied across all products, as differing circumstances and demand patterns
make such comparisons inappropriate. “Good enough” should instead be based on more
comparable nonparametric functions, accuracy implication metrics, loss functions, or
solely be relative to past performance (assuming the demand pattern is more or less
stationary). Different metrics, and thereby different determinations of what is “good
enough” should be applied based on the creators and consumers of each demand forecast.
Marketers may be more interested in aggregate sales over a several month horizon,
whereas operations may be more interested in production and distribution demands for a
single product over several days. The negative impact of horizon and positive impact of
aggregation on achievable accuracy in such differing applications are discussed in
separate themes. Finally, research on the tradeoffs of metrics in logistics and supply
chain management literature indicates that traditional error metrics are insufficient to
communicate all of the relevant information regarding the difference between predictions
and reality. Predictions ought to capture distributions, rather than point forecasts, and
multiple measures are needed to convey all of the relevant information to decision
makers, each with their own level of “good enough”.
Future research on the tradeoffs of metrics must focus on quantifying the
additional costs and potential benefits of following the suggestions of extant research.
Shifting the focus of measurement from simple error to bias, or to a more holistic
complement of measures for forecasts with higher information density will inevitably
cost more in data collection, information sharing between supply chain partners,
information technology (IT) investment, and recruitment and training for demand
planners.
Level of Aggregation and Hierarchy
This theme can be viewed as a tradeoff of the level of aggregation at which the
forecast is generated, and the level of hierarchy for which the forecast is generated.
Aggregation can be accomplished by product characteristic (Fliedner and Lawrence
1995, Moon 2015, Paul et al. 2015), location (Mentzer and Cox 1984, Zotteri et al. 2005,
Williams and Waller 2011b), time (Zhao et al. 2002b, Rostami-Tabar et al. 2013, Jin et
al. 2015), and even by customer profile (Chung et al. 2012, Breiter and Huchzermeier
2015), and consistently has been found to have a positive effect on the level of achievable
accuracy. However, there exists a gap between theory and practice in aggregation
methods. Practitioners tend to aggregate by predetermined hierarchies rather than
covariance that would suggest effective grouping, which leads to mixed results in
aggregation in practice (Syntetos et al. 2016). Aggregation typically also implies that
forecasts are generated centrally (hierarchically), as upper echelons will have access to
greater resources, more concentrated talent, and increased data quantity and quality. This
also leads to greater degrees of accuracy in both demand and supply forecasts (Mentzer
and Cox 1984, Davis et al. 2016), though this effect degrades for very short term
forecasts and beyond the firm-level unit of analysis.
However, just because forecasts generated at a higher level of hierarchy or
aggregation are likely more accurate, does not mean that they are more useful. In the
end, the forecast must be tailored for the appropriate user, wherever they are in the
hierarchy. Reconciliation of forecasts between functions, as well as between hierarchical
levels remains a pressing issue for forecasters (Syntetos et al. 2016). A significant
amount of logistics and supply chain management research aims to determine whether an
automated means of reconciliation through a top-down strategy, in which aggregate
forecasts are developed and later apportioned for use at lower levels of hierarchy,
provides better accuracy (and supply chain performance) than a bottom-up strategy,
where disaggregated forecasts are later combined for use in higher levels of hierarchy.
Zotteri et al. (2005, 2014) suggest that the choice between these demand forecasting
strategies depends primarily on the relative import of sampling or specification error, and
that minimum error is achieved through some mixture of these approaches.
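The mechanics of the two reconciliation strategies can be sketched as follows. This is a minimal illustration under stated assumptions: the two-item product family and its demand history are hypothetical, a naive last-value forecast stands in for whatever forecasting method a planner would actually use, and the top-down split apportions the family forecast by historical demand share.

```python
# Hypothetical weekly demand for two items in one product family.
history = {
    "item_a": [20, 22, 18, 24],
    "item_b": [5, 0, 10, 1],
}

def naive_last_value(series):
    """Stand-in forecasting method: next period equals the last observation."""
    return series[-1]

def bottom_up(history):
    """Forecast each item separately, then sum to obtain the family forecast."""
    item_fc = {item: naive_last_value(series) for item, series in history.items()}
    return item_fc, sum(item_fc.values())

def top_down(history):
    """Forecast the family total, then apportion it by historical demand share."""
    totals = [sum(period) for period in zip(*history.values())]
    family_fc = naive_last_value(totals)
    grand_total = sum(totals)
    item_fc = {
        item: family_fc * sum(series) / grand_total
        for item, series in history.items()
    }
    return item_fc, family_fc

bu_items, bu_family = bottom_up(history)
td_items, td_family = top_down(history)

print("bottom-up:", bu_items, bu_family)
print("top-down: ", td_items, td_family)
```

In this example the family-level forecasts coincide, but the item-level forecasts do not: the bottom-up forecast for the noisy item follows its last observation, while the top-down forecast is pulled toward its long-run share of family demand.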
Sampling error occurs when time series samples are short, noisy, nonstationary,
contain errors, or include unrepresentative observations. Bottom-up approaches suffer
when this type of error is common. Widiarta et al. (2006), Chen and Blue (2010),
Hatzakis et al. (2010), Williams and Waller (2011b) and Rostami-Tabar et al. (2013) all
find examples of this limitation when aggregating positively autocorrelated (within a
demand pattern) or covarying (between demand patterns) demand signals. Williams and
Waller (2011b) and Jin et al. (2015) demonstrate that though bottom-up approaches more
often dominate top-down, limited disaggregated data can make bottom-up approaches
less accurate than top-down approaches.
Specification error occurs when disaggregated units of analysis contain
heterogeneous demand signals that are obfuscated by pooling demand, and thus top-down
approaches suffer in the presence of this type. Fliedner and Lawrence (1995)
demonstrate this in testing grouping mechanisms for diesel engine production spare parts
forecasting. Flores and Wichern (2005) demonstrate that this confounding has a
particularly acute effect on measures of bias. Caniato et al. (2005) show that a top-down
clustering approach to account for structural, managerial and random irregularities can
help save money on the (often) high costs of collecting diverse data, but that clustering
the random irregular data did not improve forecasts. This implies random effects wash
each other out and represent misspecification. Moon (2015) presents an alternate
example where relevant information is lost when grouping products that differ
substantially in profit contribution. In this case accuracy may indeed be improved, but
any sort of explanatory power for identifying causes of error in the relatively more
important products is masked.
“How good is good enough?” depends on both sides of the tradeoff in this theme.
Aggregation has been found to increase achievable accuracy, but forecasts must be
tailored to the proper level of hierarchy for use. Tailoring the proper level of aggregation
can be accomplished via top-down disaggregation or bottom-up aggregation, but the
effectiveness of each approach depends on the relative import of sampling and
specification error to a demand planner.
Future research on the effect of aggregation and hierarchy should attempt to
differentiate the effects of different types and interactions of sampling and specification
error on the effect of aggregation on demand forecast accuracy. For instance, if demand
signals are both noisy and heterogeneous, is specification error or sampling error more
significant? Also, we should expect the relative effect of these error types to differ based
on the mechanism of aggregation. Future work should estimate how temporal,
geographic, product and customer based forms of aggregation are affected differently by
sampling and specification error.
Data Quality and Availability
Data quality and availability have been shown to be critical limiters of achievable
demand forecast accuracy. The logistics and supply chain management literature on this
theme covers both aspects of data input with regard to their effect on demand forecast
accuracy.
Quality refers to errors in recording, undocumented substitutions, inventory
record inaccuracy, data dependent on already included information, and erroneous or
perhaps even dishonest reporting of data between supply chain principals. Quality has
been found to suffer when policies for collecting data are nonstandard, poorly enforced,
and involve judgement from personnel with low levels of familiarity or training. Such
conditions have been found to be prevalent in humanitarian logistics operations (van der
Laan et al. 2016), but varying degrees of these conditions have been found to be
prevalent in demand management functions across all kinds of industries and situations.
Sanders and Manrodt (2003) found that among 240 firms, forecasters were as likely
to pursue a qualitative forecasting strategy with subjective inputs as a quantitative one,
despite the qualitative approach achieving lower forecast accuracy regardless of industry,
size or marketing strategy.
They found the effect of data quality on accuracy was moderated by both forecasting
expertise and degree of supply chain integration. This implies that the level of
information sharing and coordination between members of a supply chain can also affect
data (and therefore forecast) quality. Cachon and Lariviere (2001) demonstrate this
analytically, and find that poor information quality not only negatively affects forecast
accuracy, but that inaccurate shared information in a supply chain can erode trust and
eliminate the possible benefits from forecast information sharing. Clemen and Winkler
(1985) find also that data quality (and its value to a forecast) deteriorates monotonically
with the level of dependence the information has on previous data.
Availability refers to the lack of a capability to collect certain types of
information, whether it be a technological, cost, access, or legal or regulatory limitation.
Fildes and Petropoulos (2015) and Moon (2015) indicate that practitioners cite
availability of or access to data as their most critical forecasting deficiency. This implies
that demand forecasting processes, rather than technical capabilities, promise the greatest
opportunities for improvement. As is the case with data quality, availability of data is
lower in rapidly changing situations with fewer standard policies for data gathering and
management (van der Laan et al. 2016). Kelle and Silver (1989) and de Brito and van der
Laan (2009) find that reverse channels in closed loop supply chains often suffer from data
availability, as data collection often depends on customers being incentivized to provide
records rather than trained employees being compelled. Data collection in this case may
also involve significant IT investment or the use of potentially costly tracking
technology, which has mixed results on forecast accuracy (de Brito and van der Laan
2009). Holmström et al. (2006) find that in complex extended supply chains, different
members of the supply chain may have different capabilities to generate and share
accurate data, particularly at different stages of product, industry and firm maturity, as
well as channel mix. Sanders and Manrodt (2003) note that differing levels of IT
investment in firms also lead to lower levels of available data for qualitative forecasts at
various levels of a supply chain.
For demand planners trying to answer “How good is good enough?”, this means
that there are limits to both achievable and desirable demand forecast accuracy as a result
of these two dimensions of data inputs. Data can be of a lower quality due to unstable
circumstances, which would not be possible to mitigate with a reasonable amount of
investment. Availability may be lower due to the lack of technology to collect certain
types of information. These would both decrease achievable forecast accuracy. On the
other hand, some levels of quality and availability can be controlled, and end up as a
management decision of what level of accuracy is desirable. Data quality that is driven
by levels of demand planner training, enforced collection standards, IT sophistication,
and quality control is strongly influenced by overall levels of investment. Availability
can also be increased by incentivizing accurate information sharing with supply chain
partners, or investments in technologies to increase data sources. It is up to the managers
in a firm and across a supply chain to weigh the costs and benefits of data quality and
availability on demand forecast accuracy.
Beyond estimating how less accurate and more limited data affect forecast
accuracy, the logistics and supply chain management literature has little that describes the
effect of data quality and availability on overall business or supply chain performance.
Future research in this area must attempt to quantify the effect of investments in quality
or availability on more than just forecast accuracy (particularly single measures). This
may include tying such investments into an integrated value model like the Economic
Value Added (EVA) model presented in Lambert and Pohlen (2001), or even just
counting it as a relevant cost in a decision model of tradeoffs.
Driver: Forecastability
Implications: Low forecastable demand patterns have lower achievable accuracy,
indicating “good enough” is also lower. May be better addressed with relative measures
to target limited resources to only those demand forecast accuracy increases that promise
the greatest return.
Proposed Future Research: Examine interactions between multiple low forecastable
situations, develop detection and avoidance mechanisms when forecastability is low.

Driver: Horizon
Implications: Where longer-range forecasts are required achievable accuracy is lowered,
implying a lower “good enough”. Where temporal aggregation is not possible to offset the
lower levels of accuracy, alternate strategies such as postponement may be a better use of
resources.
Proposed Future Research: Compare the effect of forecast horizon in more varied
contexts, and increase the estimation of nonlinear models.

Driver: Overfitting and Misspecification
Implications: Overfit models can seem more accurate than they end up being,
particularly when estimating (and extrapolating) weak effects. Lower desired accuracy
(by reducing training samples and increasing holdout samples) implies a lower level of
“good enough”. Resources are best spent training and recruiting forecasters with a sound
understanding of modeling practices who are able to recognize overfitting.
Proposed Future Research: Must focus on more effective detection of overfitting and of
mapping the tradeoff of training and holdout sample sizes, particularly in cases of
nonstationarity.

Driver: Tradeoffs of Metrics
Implications: “Good enough” depends on the metric in use, as each measure imposes
different penalties depending on the type of deviation. Single metrics are inappropriate
for comparison, and traditional error metrics insufficiently indicate the business impact of
accuracy.
Proposed Future Research: Increasing the number and complexity of measures to
evaluate likely increases cost of demand planning, so benefits and costs should be
quantified.

Driver: Level of Aggregation and Hierarchy
Implications: “Good enough” depends on the level of aggregation (which increases
accuracy) and level of hierarchy at which the forecast is used. Tailoring aggregation to
required level of hierarchy depends on the relative importance of sampling and
specification error.
Proposed Future Research: Should explore relative import of sampling and specification
error when both are significant, and over multiple mechanisms of aggregation.

Driver: Data Quality and Availability
Implications: Data quality and availability limitations make “good enough” lower.
Improving quality may be either impossible or cost prohibitive, and improving
availability may be limited by technology. Resources can best be used to train the
workforce on standard and accurate data collection to improve quality, and relationship
management to increase information sharing to improve availability.
Proposed Future Research: Should estimate the effect of data quality and availability on
business performance measures such as EVA rather than (especially singular) measures
of demand forecast accuracy.

Table 16: Summary of Technical Drivers of Demand Forecast Accuracy

Managerial Drivers of Forecast Accuracy Bounds
Managerial drivers are more under the control of managers and demand planners,
exhibiting a greater degree of endogeneity than technical drivers. They represent
differing degrees of capability or strategic emphasis on forecasting accuracy. They
include policy considerations, various interest tradeoffs, and inter-firm dynamics that
primarily affect the level of desired forecast accuracy. Literature findings, implications
for “good enough”, and proposed future research are discussed below for each managerial
driver of forecast accuracy bounds, summarized in Table 17.
Error Amplification
Error has long been known to amplify as a demand signal moves back up a supply
chain, in an effect known as the bullwhip effect. This means that at higher echelons,
achievable forecast accuracy is typically lower. While most research on the bullwhip
effect involves integration and information sharing efforts to mitigate this amplification,
we find three groupings within the theme of error amplification which either enhance or
diminish the importance of error between echelons.
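Error amplification between echelons is often summarized as the ratio of order variance to demand variance. The sketch below illustrates that ratio for a single echelon under stated assumptions: the demand series is hypothetical, the replenishment rule is a simple order-up-to policy driven by a moving-average forecast, and the lead time and window length are arbitrary illustrative parameters. It is not a model taken from any of the works cited in this theme.

```python
from statistics import mean, pvariance

def order_up_to_orders(demand, lead_time=2, window=2):
    """Orders from a simple order-up-to policy with a moving-average forecast.

    Each period the target stock is lead_time times the current forecast, and
    the order replaces the period's demand plus any change in the target.
    Returns one order for each period from index `window` onward.
    """
    # Moving-average forecasts for periods window-1, window, ..., len(demand)-1.
    forecasts = [
        mean(demand[t - window + 1 : t + 1])
        for t in range(window - 1, len(demand))
    ]
    targets = [lead_time * f for f in forecasts]
    orders = []
    for i in range(1, len(targets)):
        t = i + window - 1  # index of the matching period in the demand series
        orders.append(demand[t] + targets[i] - targets[i - 1])
    return orders

# Hypothetical end-customer demand seen by the downstream echelon.
demand = [10, 12, 9, 11, 13, 8, 10, 12]
window = 2
orders = order_up_to_orders(demand, lead_time=2, window=window)

# Compare variances over the periods for which orders exist.
bullwhip_ratio = pvariance(orders) / pvariance(demand[window:])
print("orders:", orders)
print("bullwhip ratio:", bullwhip_ratio)
```

A ratio above one indicates that the orders passed upstream vary more than the demand that triggered them, so an upstream forecaster working from order data faces a noisier signal than the downstream partner does.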
First, there are situations where the effects of error amplification are expected to
be particularly acute. This implies that requirements for “good enough” are perhaps less
achievable, and that alternate means of mitigating extreme bullwhip may be more
effective. The case of no information sharing or inaccurate information sharing is
discussed in separate themes. Lorentz et al. (2007) find that underdeveloped distribution
networks with unreliable infrastructure, carriers, and suppliers exacerbate the bullwhip
effect. The mitigating effect of information sharing is enhanced in this context, though
through informal channels and individual relationships (increasing risk of corruption and
extortion). van der Laan et al. (2016) find similar results in humanitarian logistics
operations due to obsolescence of shelf-life items, highly fractious supply chains with
unintegrated agencies, short working histories and unstable supply. This also results in
potential underreporting and hoarding due to scarcity.
Second, we found work that examined effects other than error amplification
apparent in bullwhip. This implies that reducing error or error amplification will not
fully eliminate the damaging effects of bullwhip, relaxing requirements for “good
enough”. Sanders and Graman (2016) find significant bias magnification in addition to
error in a retail distribution simulation corroborated with survey responses. Hosoda and
Disney (2006) find inventory variance increases alongside demand variance in a cost
model of a manufacturing supply chain, and find that error reduction does not reduce
supply chain costs when inventory variance inflation exceeds demand variance inflation.
Ma et al. (2013a) similarly find significant costs from inventory, not demand or
production, variance inflation, and discuss the differential effects between supply chain
members. Bullwhip costs from inventory oscillation are higher for downstream
members, while upstream members bear more costs from production oscillation. This
indicates that upstream supply chain members benefit unequally from efforts to reduce
demand variance inflation.
Third, several research papers examined the relative role of demand forecast error
reduction in combatting the ill effects of bullwhip. This indicates that “good enough”
may not be as important as other factors, and may therefore be relaxed. In testing the five
drivers of bullwhip proposed in Lee et al. (1997), Agrawal et al. (2009) find that
information sharing (and therefore forecast accuracy) is dominated by lead time in
bullwhip reduction, and that some level of bullwhip is unavoidable. Doganis et al. (2008)
explore the bullwhip effect in the Greek dairy industry, and find that forecast error
reduction outpaces cost reduction. This implies diminishing returns on forecast accuracy
investments for upstream partners. Williams and Waller (2010) found that in grocery
store replenishment, if non-turn volume or information distortion was high, distributors
benefitted more from using order (rather than POS) data. This implies forecast accuracy
has a natural limit in instances where internal dynamics that distort information are high.
Lackes et al. (2016) find analytically that contract penalties for downward order revisions
can replace upstream forecast accuracy via POS information sharing. In price or
buyback penalty contracts, however, suppliers are always found to benefit from increases in
forecast accuracy (Amornpetchkul et al. 2015).
Currently, the logistics and supply chain management literature indicates that
while error is important to the costs borne from bullwhip amplification, there are other
significant effects and potential solutions to the costs of bullwhip. Even in contexts that
are prone to error inflation, considerations like bias and inventory inflation must be
weighed with error inflation, and may even deserve higher priority depending on industry
context and level of hierarchy. Reducing the costs of bullwhip can also be achieved
through means other than demand forecast error reduction, such as through lead time
reduction, contract incentives, or realignment of internal drivers of information distortion.
“How good is good enough?” is unfortunately complicated by all of these issues, and
practitioners must assess their susceptibility to bullwhip, the relative import of the error
amplification component of bullwhip costs, and feasibility of alternate means (other than
error reduction) to combat these costs.
Future research should aim to assist this effort. Efforts in the logistics and
supply chain management literature to date identify at most a small number of
bullwhip cost drivers or mitigators at a time; they must shift to more comprehensive
diagnoses and normative responses.
Cost Tradeoffs
While many of the themes previously discussed involve cost tradeoffs, the papers
in this theme address costs explicitly, and across a more diverse array of potential sources
in the supply chain. This includes works that discuss costs associated with increasing
demand forecast accuracy, relative direct and indirect costs resulting from forecast error,
offsetting costs affected by error, compensatory costs which may dominate the effect of
error, major production policy considerations of error levels, non-pecuniary error costs
that have been considered, and the unequal distribution of costs and benefits between
supply chain members.
In any discussion of “How good is good enough?”, there must be an accounting of
what it would take to improve on the current state of demand forecast accuracy.
Improvements rarely come free, and tend to be nonlinear and discrete with each type and
level of forecasting investment. Despite overwhelming evidence that such investments as
IT integration and systems to generate statistical forecasts improve accuracy, nearly half
of firms use simple spreadsheets to store demand data and generate forecasts (Canitz
2016), and firms are as likely to use simple qualitative forecasting methods as more
accurate quantitative methods (Sanders and Manrodt 2003). Numerous costs related to
improving forecast accuracy act as barriers to pursuing more accurate forecasts.
Recruiting forecasters with technical talent or providing better training for forecasters
improves accuracy (Mentzer et al. 1984, Chung et al. 2012, Canitz 2016, Doering and
Suresh 2016, van der Laan et al. 2016), and such training is readily available through
professional organizations such as APICS, CSCMP, IBF and IIF. Accuracy can be improved
through investments to collect and store additional data such as customer profile
information (Chung et al. 2012), data collection technology, infrastructure or incentives
(Kelle and Silver 1989), exogenous environmental factors for integration into forecasts
(Trapero et al. 2012), product performance and failure data (Tibben-lembke and Amato
2001), and data storage capacity to store it all (Sanders and Manrodt 2003). Accuracy
investments may also include more sophisticated forecast support systems for generating,
tracking and managing forecasts (Trapero et al. 2012, Canitz 2016, Doering and Suresh
2016). In a supply chain, investments for sharing information are necessary for
improving forecasts upstream (Kelle and Silver 1989, Chung et al. 2012, Trapero et al.
2012, Babai et al. 2013, Canitz 2016, Lackes et al. 2016, Huang et al. 2017), and involve
potentially unbalanced costs between supply chain members. Each of these costs may
include some discrete up-front investment as well as maintenance, training and
continuation investments that differ by industry, product and supply chain relationship,
making it difficult for a firm to determine a marginal cost for accuracy improvement.
Additionally, such investments do not necessarily guarantee any measurable level of
improvement, merely greater forecasting capability. Actual accuracy improvement
depends on many (and in some cases all) of the themes we identify in this paper, though
increased capabilities have been shown to improve supply chain cost and service even
when accuracy is not improved (Doering and Suresh 2016).
To motivate investment in increased accuracy, what costs are relevant to
consider? We reviewed logistics and supply chain management literature that indicates
direct and indirect costs from demand forecast error, as well as costs that offset other
relevant costs but are also affected by levels of error. Forecast error has been found to
drive higher unfinished, buffer and finished inventory carrying costs (Schmidt 1984,
Wemmerlöv 1984, 1989, Sridharan and LaForge 1989, Ritzman and King 1993, Zhao
and Lee 1993, Tibben-lembke and Amato 2001, Kahn 2003, Sethi et al. 2007, Kerkkänen
et al. 2009, LeBlanc et al. 2009, Persona et al. 2011, Chang and Yeh 2012, Babai et al.
2013, Ma et al. 2013a, van der Laan et al. 2016), higher production and storage capacity
costs (Schmidt 1984, Kerkkänen et al. 2009), higher switching, freezing, lot sizing or
other production instability costs (Schmidt 1984, Sridharan and LaForge 1989, Zhao and
Lee 1993, Lin et al. 1994, Kahn 2003, Sethi et al. 2007, Yelland 2010), higher lost
primary and accessory sales, ordering, transportation, trans-shipping and expediting costs
due to shortages (Wemmerlöv 1989, Tibben-lembke and Amato 2001, Kahn 2003, Sethi
et al. 2007, LeBlanc et al. 2009, Persona et al. 2011, Chang and Yeh 2012, Canitz 2016),
higher trans-shipping and lower margins from overages (Kahn 2003, Chang and Yeh
2012), higher design and production costs from flexibility and postponement response
strategies (Wemmerlöv 1984, 1989, LeBlanc et al. 2009, Yelland 2010), higher
warehousing costs (Sanders and Ritzman 2004a, van der Laan et al. 2016), higher
spoilage or obsolescence costs (Kahn 2003, Chang and Yeh 2012, van der Laan et al.
2016), and higher information sharing costs (Yelland 2010).
Some indirect costs come as a result of higher forecast error reducing service
level, reducing customer satisfaction, increasing lead time, reducing value of information
sharing, and reducing growth and shareholder value (Wemmerlöv 1984, Kahn 2003,
Canitz 2016). Though these costs are harder to measure and are certainly influenced by
other factors, they are the most critical to securing top-level support for any major
investments in demand forecast accuracy. Without demonstrating the effect of
forecasting accuracy on shareholder value, such improvements will be up to operational
level leaders with fewer resources available.
The challenge of estimating the effect of a change in accuracy is the dynamic
responses from various offsetting costs. Both hard and soft constraints may drive higher
requirements for offsetting compensatory costs. Investments to increase accuracy can
reduce reliance on such compensatory measures, but likely in an asymmetrical manner.
For instance, increased service level requirements may be achievable through either
flexible production or increased inventory (Wemmerlöv 1984). Increased forecast
accuracy can also help achieve that service level, while simultaneously reducing the
required investment in flexibility or inventory. The difference is that flexibility
investments may be more capital intensive, and therefore have large discontinuities,
whereas inventory reductions would more likely be incremental. This implies that when
weighing the cost tradeoffs of forecast accuracy improvements, managers must consider
multiple real options for alternative compensatory investments. Realized cost savings,
however, will only be from the option exercised. Huang et al. (2011) demonstrate this
effect as they show lead time reduction replacing the need for accuracy to offset flexible
production or increased inventory costs. Sridharan and LaForge (1989), Tibben-lembke
and Amato (2001) and Persona et al. (2011) show this in offsetting costs of inventory and
lost sales, representing opposing signs of forecast bias, that are simultaneously reduced
with reductions in error. The tradeoff of buffer inventory costs and production switching
costs is shown to be moderated by forecast accuracy (Zhao et al. 2001). Yelland (2010)
demonstrates this in the tradeoffs of information sharing costs and flexible manufacturing
costs, as reliance on both are reduced by forecast accuracy.
Some studies have found these offsetting costs to be more effective than forecast
accuracy improvement at reducing overall costs. Sridharan and LaForge (1989) found
inventory savings to be greater from investments to reduce setup time than similar
investments to improve forecast accuracy. Ritzman and King (1993), Clark (1998) and
Venkataraman and D'Itri (2001) found that investments in matching lot sizing and
inventory policies were more effective than forecast accuracy investments in reducing
inventory and production costs, though this is not supported in all cases (Fildes and
Kingsman 2011). Sanders and Ritzman (2004a) indicate warehouse workplace flexibility
can offset accuracy, though this is not as effective against bias. Finally, some cost
savings typically associated with error reductions are truly a result of reductions in bias.
As Chang and Yeh (2012) note, over-forecasting costs such as excess inventory carrying
cost, trans-shipping cost, obsolescence, reduced margin and labor; and under-forecasting
costs like expediting, higher per-unit production costs, lost sales, lost accessory sales,
reduced customer satisfaction, and delayed delivery are most often conflated as simply
error costs.
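The distinction between error costs and bias costs can be illustrated with a minimal sketch. The per-unit cost figures and the function name below are hypothetical and are not drawn from Chang and Yeh (2012); the sketch only shows that two forecasts with identical absolute error can incur very different costs depending on the sign of the error.

```python
from statistics import mean

def error_summary(forecast, actual, over_cost=1.0, under_cost=3.0):
    """Separate bias from error magnitude and price each sign of error.

    over_cost and under_cost are hypothetical per-unit costs of
    over-forecasting (e.g. carrying, obsolescence) and under-forecasting
    (e.g. expediting, lost sales).
    """
    errors = [f - a for f, a in zip(forecast, actual)]
    bias = mean(errors)                  # signed: > 0 means over-forecasting
    mae = mean(abs(e) for e in errors)   # unsigned error magnitude
    cost = sum(over_cost * e if e > 0 else under_cost * -e for e in errors)
    return {"bias": bias, "mae": mae, "cost": cost}

actual = [100, 100, 100, 100]
over = error_summary([110, 110, 110, 110], actual)   # consistently too high
under = error_summary([90, 90, 90, 90], actual)      # consistently too low
```

Both forecasts share the same mean absolute error, yet under the assumed costs the under-forecast is three times as expensive, which is the effect that a single error measure conceals.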
Beyond moderating a tradeoff of two or a few offsetting costs, achievable
accuracy levels may even dictate production control methods, involving differing costs
associated with aggregate production, labor, material, lot size, holding, shortage and
switching costs. Such drastic changes in cost basis have been observed in multiple
contexts, often involving estimation of policy tradeoff curves. Barman et al. (1990) show
that linear decision rules for production shifts, optimal under low or no forecast error, are
dominated by a proposed production switch heuristic when error is nontrivial. Johnson
and Anderson (2000) generate tradeoff curves for transition to a postponement
production strategy in the HP printer supply chain, dependent on forecast accuracy.
Jeong (2011) estimates a supply chain decoupling point, or lateral position in a supply
chain where production shifts from make-to-stock to make-to-order. The further back in
the supply chain the decoupling point is, the greater reliance a supply chain has on
forecast accuracy. High costs of accuracy, low achievable accuracy, high inventory costs,
and greater demand for customization drive this decoupling point forward and thus
reduce the reliance on highly accurate forecasts. Wemmerlöv (1989) demonstrates that
high achievable accuracy is a prerequisite to just-in-time, inventory-minimizing
production strategies.
Tradeoffs do not necessarily need to be of a pecuniary nature, as Niakan and
Rahimi (2015) demonstrate in a healthcare supply distributor and Ji et al. (2014) in a
manufacturing system that public externalities in the form of greenhouse gas emissions,
energy, and toxic chemical consumption can factor into a firm’s decision to invest in
forecast accuracy. However, these are increasingly becoming real costs through
environmental regulation. van der Laan et al. (2016) demonstrate in humanitarian
logistics operations that a particularly difficult proposition is determining the value of the
potential to save a life by increasing forecast investment. Wacker and Sprague (1998)
show that different cultural values (as measured by Hofstede’s cultural dimensions) in
seven countries affect the required level of forecast accuracy and investment in
technology. For instance, greater power distance and individualistic cultures are more
likely to invest in demand forecasting technology and depend on more accurate statistical
forecasting methods, masculine and individualist cultures are more likely to depend on
qualitative factors and manual adjustments, and masculine cultures are more likely to
involve senior leadership in forecast development. These non-pecuniary biases, altruistic
impulses, cultural and societal pressures, and risk orientations (discussed separately) must
be considered with cost tradeoffs of forecast accuracy, even if they are not directly
implemented in a model of costs.
In a supply chain, the complex interdependent cost tradeoffs with forecast
accuracy are also unequally shared between different members of a supply chain. This
often introduces additional inducement, contracting, or information sharing costs to
balance this asymmetry, moderated by the balance of relational power and level of
information asymmetry. Hosoda and Disney (2009) and Chen and Xiao (2012) both
show that increasing forecast accuracy to minimize local costs can actually increase
overall supply chain costs, as additional costs from moral hazard manifest from
information asymmetry. Zhao et al. (2002b) and Sethi et al. (2007) both note that
downstream members of the supply chain typically bear higher costs from forecast
accuracy investments, and do not share equally in benefits of forecast accuracy.
Upstream members typically benefit from increased accuracy through reduced production
and distribution variance, whereas downstream partners benefit primarily from inventory
reductions (Hosoda and Disney 2009, Lackes et al. 2016). Taylor and Xiao (2010) find
this effect to be convex, and that manufacturers only enjoy cost benefits from accuracy
improvements when downstream demand planners already have acceptable forecast
accuracy. Miyaoka and Hausman (2008) show that cost benefits enjoyed by upstream
supply chain members from increased downstream demand forecast accuracy are only
realized if they set the wholesale price (either by enjoying greater market power or as the
Stackelberg leader). Downstream members are motivated to over-order (Chen and Xiao
2012, Lackes et al. 2016), particularly in cases where buybacks or cancellations are
permitted. In such situations, upstream supply chain members can bear the additional
cost of incentives, contract costs to implement buyback or cancellation penalties (Zhao et
al. 2002b, Sethi et al. 2007), or information sharing costs (Huang et al. 2017) in order to
motivate forecast accuracy investments by downstream members in order to avoid excess
production variance costs. Inducement costs can increase with information asymmetry
between supply chain members, the relative power of principals, and the risk orientation
of each (Lackes et al. 2016). Chang and Yeh (2012) propose a method for objectively
controlling cost sharing in demand planning collaboration in a steel manufacturing supply
chain through a prior agreement on exception thresholds defined by Taguchi loss function
for the supply chain, thus helping to minimize additional costs from moral hazard.
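As a rough illustration of how a pre-agreed exception threshold on a loss function might operate, consider the sketch below. It assumes the textbook nominal-the-best Taguchi form, loss = k * deviation^2; the exact formulation used by Chang and Yeh (2012) is not reproduced here, and the coefficient k, the threshold and the function names are hypothetical.

```python
def taguchi_loss(deviation, k=1.0):
    """Textbook nominal-the-best Taguchi loss: k * deviation**2 (assumed form)."""
    return k * deviation ** 2

def exceptions(forecast, actual, threshold, k=1.0):
    """Return indices of periods whose loss exceeds the agreed threshold."""
    return [i for i, (f, a) in enumerate(zip(forecast, actual))
            if taguchi_loss(f - a, k) > threshold]

# Three periods; only the second deviates enough to trigger an exception.
flagged = exceptions([100, 120, 95], [102, 100, 96], threshold=50.0, k=0.5)
```

Because the loss grows with the square of the deviation, small misses are tolerated while large misses are flagged quickly, so partners can agree in advance on which deviations warrant joint review.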
For implications on “How good is good enough?”, cost tradeoffs are the most
difficult to accurately (and completely) identify, but likely the most important in
determining a final level of desired accuracy. Improving forecast accuracy can involve
significant investments in forecaster skill, as well as IT tools for gathering data,
generating and tracking accurate forecasts, and sharing relevant information with supply
chain partners. Though even with such investments there is no guarantee of accuracy
improvement, there are significant potential cost reductions from pursuing these
capabilities. Managers must determine which costs are relevant to their situation,
accurately measure them, and determine the potential improvements from incremental
changes in forecast accuracy. Cost savings from increased accuracy can be direct, such
as through reduced inventory or production variance costs, or indirect, by reducing those
costs associated with lost shareholder value from lower customer satisfaction. Multiple
offsetting potential costs must be weighed against forecasting investments, and in some
cases alternate compensatory investments may be preferred to investments to improve
accuracy. The achievable level of accuracy may also dictate the policies and types of
investments necessary to meet demand at the lowest cost. Improvements in demand
forecast accuracy can reduce non-pecuniary costs as well, as social, cultural and
environmental considerations increase in importance to a firm. Finally, managers must
take into consideration that the costs and benefits of forecast accuracy improvements are
borne unequally between supply chain members. To maintain a healthy supply chain and
maximize the benefit for all, some supply chain members must bear additional costs to
share the burden of accuracy improvement costs.
As with other themes, the considerations for cost tradeoffs merely identify the
relevant costs and measured effects currently present in the logistics and supply chain
management literature. To act on these findings, a firm needs to assess their own context
and relevant costs. Estimating the cost tradeoffs of nonlinear and discontinuous accuracy
investments with separate nonlinear and potentially discontinuous functions for various
offsetting costs (or real options) with any degree of accuracy is a complex and difficult
endeavor. However, armed with the knowledge from the other themes discussed in this
work, managers can make an informed decision of desirable accuracy, given contextual
factors that drive achievable accuracy. One helpful suggestion is to focus what is likely
to be a significant effort on those products that promise the most return from forecasting
accuracy investment. Morlidge (2014b) suggests, in their examination of forecasts from
11,000 products, that as little as 6% of products make up nearly two thirds of the
potential improvement in forecasts due to forecaster skill (and 40% of forecasts would be
made better by simply implementing a naïve forecast).
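One simple way to screen products in this spirit is to compare each forecast against a naïve (last observed value) forecast. The sketch below is an assumption about how such a screen could be built, not a reproduction of the measure used by Morlidge (2014b); the product names and data are invented for illustration.

```python
def relative_absolute_error(actual, forecast):
    """Ratio of forecast absolute error to naive (last-value) absolute error.

    Values above 1.0 mean the naive forecast would have done better.
    The first period is skipped because the naive forecast needs a prior actual.
    Returns infinity when the naive forecast has zero error.
    """
    forecast_err = sum(abs(f - a) for f, a in zip(forecast[1:], actual[1:]))
    naive_err = sum(abs(prev - a) for prev, a in zip(actual[:-1], actual[1:]))
    return forecast_err / naive_err if naive_err else float("inf")

# Hypothetical (actual, forecast) histories for two products.
products = {
    "A": ([10, 12, 11, 13], [10, 11, 12, 12]),
    "B": ([20, 20, 21, 20], [20, 25, 15, 26]),
}
rae = {name: relative_absolute_error(a, f) for name, (a, f) in products.items()}
worse_than_naive = sorted(name for name, r in rae.items() if r > 1.0)
```

Products with a ratio above 1.0 are candidates for replacing the current forecast with the naïve one, while the remainder can be ranked by the size of their avoidable error to direct forecaster effort.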
Future research should focus on helping practitioners navigate the complex
interdependencies of estimating cost tradeoffs. Extant research seems somewhat myopic
in examining relatively few local cost tradeoffs. Researchers must strive to capture a
more holistic set of interdependent costs and opportunities. Canitz (2016) suggests that
forecasting investments will remain low until firms can demonstrate value exceeding
additional costs through such tools as the Du Pont Financial performance model, or we
would propose EVA, as suggested in Lambert and Pohlen (2001). Researchers must target
their work to aid this effort.
Supply Chain Integration
We found in our search of logistics and supply chain management literature that
the initially separate search topics of “information sharing”, “collaboration” and
“integration” were largely conflated, and so were not effectively separable. Therefore,
the emergent theme we call supply chain integration exists as a continuum spanning
simple information sharing through complete integration of forecasting efforts in a supply
chain. We separate the reviewed literature into work on the simple information sharing
side of the spectrum from work that involves more direct collaboration and in some cases
integrated demand planning.
First, we discuss the effects of various types of information sharing that have been
studied in the logistics and supply chain management literature. Ramanathan (2013)
describes multiple characteristics of information that can be shared between (and within)
firms for the purpose of forecast accuracy improvement, including factors of importance,
relevant forms of information shared between supply chain entities, and the implications
of external sources of information. We use this typology to examine the literature on
simple information sharing.
Factors of information importance include cost, which can be cost of analyzing,
sharing, gathering, and how these risks or costs (and resultant benefits) are shared
between supply chain members; usability, or the way it can or will be used; reliability, or
accuracy of input data, trustworthiness and persistence of the source; actionability, or
how readily it can be converted to immediate utility; and finally capability, or the
capacity of the information acquirer to transform the information to a valuable outcome.
Eksoz et al. (2014) find these factors to be important to both departmental (internal
collaboration) and group (external collaboration) information sharing.
As discussed in the previous theme of cost tradeoffs, costs and risks for forecast
accuracy improvements tend to fall more heavily on those downstream organizations
with direct access to demand. Cost savings also tend to be greater upstream in the supply
chain for improved accuracy. This makes various types of demand information or
demand forecasts valuable to upstream partners, who are then willing to share some
burden in order to gain access to that information in order to optimize supply chain
benefits. Ülkü et al. (2007), Zhu et al. (2011), Kurtuluş et al. (2012) and Bian et al.
(2016) generalize this to show that increased cost of forecast accuracy increases the value
of information to the supply chain, but disproportionately against whoever makes the
investment and whoever is primarily responsible for inventory costs. Yao et al. (2005)
provide an example of this more general view where the upstream manufacturer has
greater forecasting capability than their customer. They find that for manufacturers
employing a multichannel marketing approach with a direct channel, information sharing
from a customer-competitor only holds value if the customer’s forecasting capability
exceeds the manufacturer’s. In addition to the cost of accurate information, the level of
information asymmetry between supply chain principals increases the value of
information sharing. Kung and Chen (2014) find that upstream members of the supply
chain benefit from sharing forecast information when asymmetry is high, but only if all
independent downstream members are synchronized in both their accuracy and sharing.
Hartzel and Wood (2017) find support for the asymmetry argument when they observe
greater value from POS information sharing when order frequency is low, and shared
information is stationary and non-intermittent. This implies that upstream signals without
sharing have low fidelity as a function of distortion, and shared POS data has high
forecastability. Bian et al. (2016) also show that in industries with high competitive
intensity, competing supply chains also benefit from observing the actions of information
sharing supply chains.
Cachon and Lariviere (2001), Terwiesch et al. (2005) and Guo et al. (2006)
indicate that a lack of trust of information from downstream partners may hinder
information sharing, particularly if information is volatile or found to have (particularly
positive) bias. They find that upstream partners who normally benefit unequally from
information sharing are unwilling to pay for incentives to share information if trust is low,
and instead institute penalties (delayed delivery, reduced capacity dedication) for
untrustworthy downstream partners. Paradoxically, this increases volatility and positive
bias in downstream forecasts, and further reduces the demand for accurate forecast
information sharing.
Though the majority of logistics and supply chain management research on
information sharing has to do with either point of sale demand data or demand forecasts,
relevant forms can include sales data; order data; seasonal, discount, promotional and
historical sales; relevant regulations and policies; and local forecast information. Sharing
these types of information has been shown to simultaneously reduce the negative effects
of demand forecast error and increase the achievable forecast accuracy at higher
echelons, as discussed above in the error amplification (Williams and Waller 2010) and
aggregation (Williams and Waller 2011b) themes.
Sharing POS data has been found to lower upstream supply chain labor and
inventory costs through improved replenishment forecast accuracy (Trapero et al. 2012,
Babai et al. 2013) and reduced bias (Sanders and Graman 2016), and dampen the
inventory costs from positive bias introduced by increased product variety (Wan and
Sanders 2017). However, as Cui et al. (2015) note, POS information sharing alone may not
improve upstream forecasts unless alternate information such as replenishment policy is
also shared. Similarly, demand forecast sharing has been shown to reduce inventory and
production costs (Zhao et al. 2002a, Ali et al. 2012), reduce demand variance
amplification (Ma et al. 2013b), and can serve as a substitute for downstream demand
forecast accuracy (Yao et al. 2005), each moderated by the degree of demand forecast
accuracy.
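Demand variance amplification is commonly summarized as the ratio of order variance to demand variance. The sketch below uses that common ratio as an assumption; it is not necessarily the measure used by Ma et al. (2013b), and the demand and order series are invented for illustration.

```python
from statistics import pvariance

def amplification_ratio(orders, demand):
    """Variance of upstream orders relative to variance of end demand.

    A ratio above 1.0 indicates that variability is amplified upstream.
    Assumes demand is not constant (nonzero variance).
    """
    return pvariance(orders) / pvariance(demand)

demand = [100, 102, 98, 101, 99]    # hypothetical end-customer demand
orders = [100, 106, 92, 104, 98]    # hypothetical orders placed upstream
ratio = amplification_ratio(orders, demand)
```

Computing the ratio before and after a sharing arrangement begins gives a simple, if coarse, indication of whether the shared information is dampening amplification.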
Alternate forms of information to share include engineering and failure data
(Tibben-lembke and Amato 2001), additional information on demand or forecasts such as
the distribution (not simply the level) of uncertainty (Terwiesch et al. 2005), relevant
replenishment and stock policies (Williams and Waller 2011a, Cui et al. 2015), and inventory
and return data both between firms and between channels (Coronado Mondragon et al.
2011), all of which can serve as replacements for forecast accuracy. These alternate data forms
can improve upstream performance even when accurate demand or forecast information
is not shared.
External sources may include competitor and market data. Most of this would be
gathered from a third-party marketing firm or consultant, and can be expensive.
Achievable and desirable forecast accuracy improvements from external data such as this
would depend on its availability, quality and cost.
Beyond simple information sharing, there are numerous examples in the literature
of the effects of collaboration or full integration on forecast accuracy. McCarthy and
Golicic (2002) and Yao et al. (2013) identify some of the major difficulties in
collaborative planning, forecasting, and replenishment (CPFR) implementation. In the
long-run it promises forecast accuracy improvement and lower costs, but implementation
costs include software and technology, investments in coordination and information
exchange, time and personnel for set up and maintained coordination operations,
difficulty in scalability from pilot usage across suppliers and products, and, perhaps most
difficult, synchronous change efforts in multiple firms. Improvements in forecast
accuracy do not always align with inventory cost savings as the collaborative efforts are
harmonized, and this also depends on product life cycle stage. There is also risk, as these
costs do not guarantee successful implementation, and may disrupt and hinder
(particularly replenishment) operations initially. Moon (2015) notes that the
effectiveness of collaborative forecasting depends on the internal source of data, and both
the generator and consumer of a forecast. If all three of these groups share interests and
integrate effort, then collaborative forecasting can provide value. Nagashima et al.
(2015) demonstrate that collaboration intensity improves forecast accuracy, but only
among products that have low market competition (are inimitable). In one form of
integration, vendor managed inventory, Kannan et al. (2013) and Kurtuluş et al. (2012)
show that such an arrangement shifts the cost burden for order processing costs,
transportation costs, lot quantity costs, and in some cases inventory carrying costs
upstream to a supplier. This significantly affects the desirable level of forecast accuracy
based on the previously discussed sets of cost tradeoffs.
For the manager attempting to determine “How good is good enough?”, various
degrees of supply chain integration typically entail greater achievable accuracy at higher
echelons of the supply chain, and lower required accuracy at lower echelons. This effect
depends on several factors of information importance, such as the cost of the data, level
of information asymmetry, the trustworthiness of the shared data, and for shared forecasts
the accuracy of the downstream forecast. For organizations that are further along the
spectrum and collaborate or are truly integrated in their demand planning, “good enough”
can change substantially based on how the collaboration shifts cost burdens and on the
effectiveness and maturity of the collaborative relationship.
As most work on information sharing focuses on sharing POS data and demand
forecasts, it appears there is an opportunity to develop our understanding of variable
effects of sharing different types of data. To date, little has been done to differentiate the
effect by information type. Future research should also focus on how different
collaboration types shift the complex set of cost tradeoffs that dictate “good enough”.
Supply Chain Flexibility
Flexible strategies such as postponement or standardization of parts (as in a super
bill of materials) can serve as a substitute for forecast accuracy, and in some cases
improve forecast accuracy. The value of postponement increases with increased forecast
error (Johnson and Anderson 2000) and degree of product proliferation (Lee 1996), while
greater degrees of postponement increasingly diminish the effect of forecast accuracy
(Wemmerlöv 1984). Postponement and standardization of parts have been shown to
improve forecast accuracy when the firm desires greater product variety (Persona et al.
2007, Paul et al. 2015), but the success of such strategies depends on the maturity of an
industry, market, technology or product (Terwiesch et al. 2005), as well as the quality of
management (Khouja and Kumar 2002). Revolutionary products likely do not have the
market penetration to offer variants that permit postponement or standardization. By
either increasing accuracy or replacing the need for accuracy, flexible strategies shift the
relative cost tradeoffs with lower levels of forecast improvement investments (Khouja
and Kumar 2002, Graman and Sanders 2009) and both unfinished and finished goods
inventory (Persona et al. 2007, Graman and Sanders 2009), while increasing costs for
sourcing (Paul et al. 2015), engineering (Persona et al. 2007), redesign (Lee 1996),
postponement variety and volume capacity (Khouja 1998, Graman and Sanders 2009)
and manufacturing of common parts (Khouja and Kumar 2002).
Implementing flexible strategies can mean “good enough” is lower, when used to
replace relatively expensive improvements to forecast accuracy, or can alternatively
mean “good enough” is higher due to increased achievable accuracy. This will depend
on the perceived necessity of product variety.
Future research on flexible strategies in relation to forecast accuracy must address
many of the same issues identified for future research on cost tradeoffs. To date,
considerations of such policies are myopically focused on local costs. The potential
impact on value generation when shifting to flexible strategies means that research ought
to include more cross-functional perspectives. Operations and logistics costs are
currently examined in the research, but marketing and research and design considerations
are either ignored or remain untested. As with other themes, the scope of research ought
to span the supply chain rather than the firm, as accuracy implications of flexible
strategies are likely unequal between different supply chain principals.
Manual Adjustments
Though generally statistical forecasts are found to have greater accuracy than
subjective or qualitatively generated ones, subjective manual adjustments are found to
improve upon statistical forecasts. The effect of manual forecast adjustment is even
found to have an enhanced effect on statistical forecast improvement when incorporated
as automated adjustment heuristics (Hur et al. 2004, Fildes et al. 2009). Such
adjustments have limitations, however, as they can add considerable cost and time when
there are many forecasts to generate (Sanders and Ritzman 1995). Manual adjustments
can replace error with more costly bias (Fildes et al. 2009, Wan and Sanders 2017),
especially when products are newly introduced or in the decline phase of their lifecycle
(Petersen 2003), and lose effectiveness or can even increase error when adjustments are
small (Fildes et al. 2009) or frequent (Wacker and Sprague 1998). The effectiveness of
adjustments depends on degree of adjuster expertise, but also can differ by biases from
personal motivation (Eroglu and Croxton 2010), organizational motivation, information
access, level of procedural control (Oliva and Watson 2009), culture (Wacker and
Sprague 1998) and gender (Eroglu and Knemeyer 2010). Effectiveness may not be
measured as reductions in forecast error or bias, as Ebert and Lee (1995) show manual
adjustments can reduce production operation costs while decreasing accuracy.
Effectiveness can also depend on the type of information introduced by an adjuster.
Marmier and Cheikhrouhou (2010) note that time sensitivity, immediacy, impact and
persistence of information all can affect adjustment quality.
In determining “good enough”, this may mean accounting for the cost or
availability of incorporating manual adjustments. If error is high, manual adjustments are
more likely to improve accuracy. If there are numerous products to forecast with high
frequency, adjustments can be expensive and may require contextual knowledge that
might not be available. Though automated heuristic implementations of manual
adjustments could increase speed or reduce costs, Fildes et al. (2009) note that these
features are not yet common in forecast support systems, and may require significant
technology investments. Managers must assess their situation to determine if they have a
positive potential for forecast accuracy improvement from incorporating manual
adjustments, and weigh this against the realized costs from implementing these
adjustments.
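A manager could begin such an assessment by checking whether past adjustments actually reduced error, split by adjustment size. The sketch below is a minimal illustration under stated assumptions: the 10% cutoff separating small from large adjustments, the function name and the data are hypothetical, and statistical forecasts are assumed to be nonzero.

```python
def adjustment_gain(actual, statistical, adjusted, small_threshold=0.1):
    """Mean absolute-error reduction from adjustments, split by adjustment size.

    An adjustment counts as "small" when it changes the statistical forecast
    by less than small_threshold (a hypothetical 10% cutoff) in relative terms.
    Positive gains mean the adjustment reduced absolute error.
    Assumes nonzero statistical forecasts.
    """
    gains = {"small": [], "large": []}
    for a, s, j in zip(actual, statistical, adjusted):
        if j == s:
            continue  # no adjustment was made for this forecast
        size = "small" if abs(j - s) / abs(s) < small_threshold else "large"
        gains[size].append(abs(s - a) - abs(j - a))
    return {k: (sum(v) / len(v) if v else None) for k, v in gains.items()}

result = adjustment_gain(
    actual=[100, 100, 100, 100],
    statistical=[90, 95, 80, 100],
    adjusted=[92, 93, 100, 100],
)
```

In this invented example the small adjustments cancel out on average while the single large adjustment removes a substantial error, which mirrors the pattern attributed above to Fildes et al. (2009). The average gain per adjustment can then be set against the cost of making it.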
Research should support this effort to a greater extent. Manual adjustments ought
to be examined on a contextual basis. The effectiveness of adjustments has been shown
to depend on the size and frequency of an adjustment, and on numerous characteristics of
the adjuster, but what about forecasts generated for different purposes, at differing levels
of hierarchy, and over different horizons? Academic research will provide the most
benefit for practitioners if it can normatively prescribe action based on circumstance.
Risk
Risk occupies a separate theme from cost tradeoffs because most often cost
tradeoffs imply perfect information and rational action. Situations of actual risk involve a
significant lack or loss of information. Perceived risk drives irrational action based on
risk preference (for both seekers and avoiders). For this reason, the investigation of the
limits of forecast accuracy based on actual and perceived risks from forecast error
occupies a significant stream of logistics and supply chain management literature.
Sanders and Manrodt (2003) identify environments likely to be sensitive and
insensitive to demand forecast error that they call high and low uncertainty environments.
High uncertainty environments include frequent product changes, short product
lifecycles, market substitutes, global competition, low market power, and potential for
obsolescence, and are likely to not only have higher degrees of forecast error but to also
experience higher costs as a result. Low uncertainty environments include those with
monopoly protections such as patents, long product life cycles, and unique products with
high barriers to entry, and are unlikely to incur comparably high costs as a result of forecast
error.
Examples of high uncertainty environments include food supply chains (Eksoz et
al. 2014), which are susceptible to spoilage and waste as well as to public health risks and
regulatory intervention. Thiel (2014) describes the massive shifts in both demand and
supply when a health risk prompts a recall. In the case of a recall, hedging sources or a
flexible strategy of production dominates forecast accuracy, which may not be achievable amid
unpredictable public risk perception. Rapid innovation and heavy regulation in an
industry, as is experienced in the pharmaceutical industry, can also make rapid and
dramatic changes to demand conditions commonplace (Kiely 2004). Lorentz et al.
(2007) provide an example of poor infrastructure inducing variance in supply and
production, making stable and efficient operations impossible and exacerbating the
effects of forecast error. Wickramatillake et al. (2007) show that in the execution of
major projects like infrastructure construction, the interdependencies of different
suppliers, service providers, regulators and principals make such endeavors particularly
susceptible to forecast errors.
Responses to high uncertainty typically entail some hedging or robustness
strategy, as forecast investments likely have poor returns. Supply chains can be made
more robust to high uncertainty through diversification of supply (Cachon and Lariviere
2001), excess inventory or production capacity (Georgiadis and Vlachos 2006,
Kerkkänen et al. 2009), through investments in increased communication (Terwiesch et
al. 2005), or through agility investments like dynamic assortment planning (Rajesh and
Ravi 2015). As high uncertainty environments tend to have extremely low
forecastability, in both demand levels and for relevant costs, there is a much greater
potential for risk preference of decision makers to drive significant over and under
investment in forecast accuracy and other hedging strategies.
The negative effect from perceived risk differing from actual risk can manifest in
both risk seekers and risk avoiders (neutrality implies cost optimal decisions). Managers
who are risk seeking may choose not to hedge, instead seeking short term cost savings,
but in the end face major shortage costs. Risk averse managers instead overspend on
risk management over time so that the large, but rarer, shortages can be avoided. These
actions may actually cause greater uncertainty for other members of the supply chain.
Guo et al. (2006), Chen and Xiao (2012) and Lackes et al. (2016) demonstrate that a
more risk averse downstream partner is more likely to over-order (inducing bullwhip),
and will likely charge a higher premium to share POS or forecast information, which is more
valuable to their suppliers. Shin and Tunca (2010) find that firms can overinvest in
demand forecast error reduction when perceived market competition is high, or if barriers
to demand forecast improvements are low (implying competitors are likely to make such
investments). Li et al. (2015) show that risk aversion leads to overinvestment in lead-
time reduction with increasing forecast error, which can be somewhat mitigated with
revenue sharing contracts between supply chain echelons. Risk aversion can be
considered a significant, if difficult to measure, problem among logistics and supply
chain managers as Smith and Mentzer (2010) find strong belief among managers that
improved forecasts will improve decision making and logistics performance. While
generally true, this belief can lead to risk averse overinvestment when specific conditions
do not call for increased accuracy.
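The gap between perceived and actual risk described above can be illustrated with a minimal sketch. It assumes a deliberately simplified setting that is not drawn from any of the cited studies: a hedge costs a fixed amount each period and fully prevents a shortage, and risk preference is represented only as a distorted perceived shortage probability. All names and numbers are hypothetical.

```python
def realized_expected_cost(perceived_prob, actual_prob, hedge_cost, shortage_cost):
    """Expected per-period cost when the hedging decision is made on the
    perceived shortage probability but costs accrue at the actual one.

    Assumption (illustrative only): the hedge fully prevents the shortage.
    """
    # The manager hedges only if the hedge looks cheaper than the
    # shortage cost they *believe* they are exposed to.
    hedges = hedge_cost < perceived_prob * shortage_cost
    return hedge_cost if hedges else actual_prob * shortage_cost


HEDGE_COST = 10.0      # hypothetical cost of hedging per period
SHORTAGE_COST = 150.0  # hypothetical cost of one shortage

# Low actual risk (5%): the risk neutral manager correctly declines to hedge,
# while a risk averse manager who perceives 10% overspends on the hedge.
neutral_low = realized_expected_cost(0.05, 0.05, HEDGE_COST, SHORTAGE_COST)
averse_low = realized_expected_cost(0.10, 0.05, HEDGE_COST, SHORTAGE_COST)

# Higher actual risk (10%): the risk neutral manager hedges, while a risk
# seeking manager who perceives 3% skips the hedge and bears shortage costs.
neutral_high = realized_expected_cost(0.10, 0.10, HEDGE_COST, SHORTAGE_COST)
seeking_high = realized_expected_cost(0.03, 0.10, HEDGE_COST, SHORTAGE_COST)

print(neutral_low, averse_low, neutral_high, seeking_high)
```

In both cases the manager whose perception matches actual risk incurs the lower expected cost, which is the sense in which neutrality implies cost optimal decisions in this simplified setting.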
Correcting the gap between perceived and actual risk can be accomplished
through increased communication, and strengthening of relationships between supply
chain members. Ebrahim-Khanjari et al. (2012) and Gönül and Goodwin (2012) find that
perceived competence, benevolence and integrity between retailers and suppliers fosters
trust and reduces the risk from forecast inaccuracy. Similarly, Bian et al. (2016) find
trust to reduce the inequality of value from information sharing. That is, regardless of
forecast accuracy, benevolent forecasters are trusted and selfish forecasters are not trusted
(though error and bias affect future trust).
For the manager determining “How good is good enough?”, the key task from this
theme is determining the levels of actual and perceived risks they are exposed to. Higher
levels of uncertainty in their environment mean achievable accuracy is lower. What is
more important in these instances is knowing precisely how inaccurate their forecasts are.
This can help them quantify their potential costs from differences in perceived and actual
risk. Hedging strategies are found to effectively mitigate actual risks, while increased
communication and building of supply chain relationships have been shown to mitigate the
effects of perceived risk deviating from actual risk.
Future research on the effect of risk on forecast accuracy should explore the
differences between perceived and actual risks in how they affect total supply chain costs.
Current logistics and supply chain management research conflates these distinct types of
uncertainty, and tends to focus on remedies that can be enacted by a single firm. Supply
chains depend on the willing participation of multiple firms, each consisting of agents
with varying risk preferences, so the effects of these differences on supply chain costs
should be measured.
Production and Inventory Control Policy
Generally speaking, more accurate demand forecasts decrease production and
inventory costs, regardless of control system. However, the effects of forecast accuracy
on costs depend on the type of control policies in place, and have been found to have
Pareto returns.
Production and inventory replenishment policies that include small buffers and lot
sizes, and that have little excess volume or varietal capacity will tend to be more
sensitive to demand forecast errors and are likely to see the most benefit from increases
in accuracy. Such policies are driven by achievable demand forecast accuracy, but also
varying costs of production switching and setup, master production schedule (MPS)
replanning, flexible production capacity, production volume capacity, labor, buffer and
lead time inventory carrying, and shortage costs. While many of these costs offset each
other depending on inventory or production policy, error has been found to significantly
affect the choice of policy and the costs associated. Many significant examples exist
regarding the choice of lot sizing policy, MPS freezing and replanning periodicity, buffer
stock policy, and lead time replenishment policy.
Demand forecast error, along with material requirements planning (MRP) process
error (Fildes and Kingsman 2011) and relevant cost tradeoffs between setup and
inventory carrying costs (Ho and Ireland 2012) have been found to have a significant
effect on the choice of lot sizing policy. Error can drive more frequent orders and more
responsive lot size policies if setup costs are low, or it can motivate less responsive lot
sizing policies to take advantage of the effects of error aggregation if inventory holding
costs are low (Ho and Ireland 2012). The interaction of lot sizing policy and demand
forecast error has been found to have significant effects on overall production costs
(Wemmerlöv 1985, Venkataraman and D'Itri 2001, Fildes and Kingsman 2011),
increasingly as MRP structures become more complex (Lee and Adam Jr. 1986). Some
studies, however, have found cost savings from error reduction to be insensitive to lot sizing
policy (Jeunet 2006), particularly in cases of capacity smoothing (Harl and Ritzman
1985). Lot sizing policy has also been found to offset the production instability effects of
demand forecast error (Ho and Ireland 1993, 1998, 2012), except in cases of very high
error (De Bodt and Van Wassenhove 1983).
Forecast errors significantly impact optimal MPS planning horizon, freezing
proportion, freezing method (by period or order), and replanning periodicity depending
on direction of error bias (Zhao and Lee 1993, Lin et al. 1994, Venkataraman and Nathan
1999, Xie et al. 2004). Demand forecast error effects are shown to dominate the effect of
freezing proportion and replanning periodicity on MRP system costs (Yang and Jacobs
1999), particularly in push systems. In pull systems with frequent replanning and lower
degrees of MPS freezing, demand forecast error has a smaller relative effect dependent
on the relative costs of responsiveness and inventory holding (Masuchun et al. 2004).
There are some cases, such as in multilevel, multiproduct production systems that
experience hedging between product lines, in which error is found to have linear effects on
costs (Altendorfer et al. 2016). However, most research indicates inventory reductions
and unit costs are not proportional to error reductions in production control environments
(Doganis et al. 2008, Fildes and Kingsman 2011, Jeong 2011), and some production
control policies are relatively more asymmetrically sensitive to bias than error (Xie et al.
2004, Sourirajan et al. 2008). When shortage costs are accounted for, slight positive bias
(nonzero error) has been found to minimize costs (Biggs and Campion 1982, Lee and
Adam Jr. 1986, Xie et al. 2004) and improve delivery performance (Enns 2002),
particularly when capacity is tight or shortage costs are high, but this has also been shown
to increase MPS tardiness (Enns 2002).
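One way to see why a slight positive bias can minimize cost once shortage costs are accounted for is the textbook single-period (newsvendor) critical ratio. The sketch below is illustrative only and is not the model used in the cited studies; it assumes normally distributed forecast error, and the function name and cost values are hypothetical.

```python
from statistics import NormalDist


def cost_minimizing_bias(error_sd, shortage_cost, holding_cost):
    """Planned quantity minus expected demand that minimizes expected cost.

    Assumes a single period and normally distributed forecast error
    (the classic newsvendor critical ratio); illustrative only.
    """
    critical_ratio = shortage_cost / (shortage_cost + holding_cost)
    z = NormalDist().inv_cdf(critical_ratio)
    return z * error_sd


# When a unit short costs the same as a unit held, no bias is optimal.
symmetric = cost_minimizing_bias(error_sd=20.0, shortage_cost=1.0, holding_cost=1.0)

# When a unit short costs four times a unit held, planning above the
# expected demand (a positive bias) minimizes expected cost.
costly_shortage = cost_minimizing_bias(error_sd=20.0, shortage_cost=4.0, holding_cost=1.0)

print(symmetric, costly_shortage)
```

Under these assumptions the cost minimizing bias grows with both the size of the forecast error and the relative cost of a shortage, which is consistent with the direction of the findings summarized above.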
Demand forecast error drives required levels of safety or buffer stock, but
decreases in error provide Pareto returns on inventory cost savings (Zinn and
Marmorstein 1990), and may better be accomplished through reductions in bias
(Willemain et al. 2004). Buffer stock policy can also offset forecast accuracy in
reducing schedule instability (Ho and Ireland 1993, De Bodt and Van Wassenhove 1983,
Enns 2002), though buffering is insufficient to completely control “nervousness”
(Wemmerlöv 1985). Inventory costs are found to be more sensitive to increases in error
when safety stock follows a constant cycle versus a constant stock policy, and when error
is nonstationary (Campbell 1995), and stock performance depends on order frequency,
lead time, and error distribution.
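The link between forecast error and required buffer stock can be sketched with a common textbook approximation. This is an assumption made for illustration, not the formulation of the cited studies: forecast errors are taken to be unbiased, independent across periods, and normally distributed, and the function name and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(error_sd, lead_time_periods, service_level):
    """Buffer stock needed to cover forecast error over the lead time.

    Textbook approximation: z * sigma_e * sqrt(L), assuming unbiased,
    independent, normally distributed per-period forecast errors.
    """
    z = NormalDist().inv_cdf(service_level)
    return z * error_sd * sqrt(lead_time_periods)


# Hypothetical item: 4-period lead time, 95% cycle service level.
base = safety_stock(error_sd=50.0, lead_time_periods=4, service_level=0.95)
improved = safety_stock(error_sd=25.0, lead_time_periods=4, service_level=0.95)

print(round(base, 1), round(improved, 1))
```

In this simplified form buffer stock scales directly with the standard deviation of forecast error and with the square root of lead time. The sketch does not attempt to reproduce the Pareto returns, bias effects, or nonstationary error conditions reported in the studies above, which depend on factors it omits.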
Logistics and supply chain management research also indicates significant effects
from forecast accuracy on inventory control policy. Liao and Chang (2010) find
interesting results under periodic review order-up-to (s,T) and continuous review fixed-
order quantity (r,Q) varying lead time and error in an ant colony optimization. In (s,T)
systems, longer lead time costs were positively correlated with error, whereas in an (r,Q)
system shorter lead time costs were more positively correlated with error. This implies
achievable error can be substituted for with appropriate inventory replenishment policy
and lead time combinations, though in general longer lead times monotonically increase
forecast error and overall inventory costs (Jeffery et al. 2008). Higher error associated
with longer lead times can also moderate the cost tradeoffs between quantity discounts
and inventory carrying costs in an (r,Q) system (Kim et al. 2003). Tratar (2010) shows
that neither minimization of error nor bias under (s,T) replenishment minimizes inventory
costs, and suggests instead a joint optimization, though this could be improved upon by
expanding to a more holistic measure like EVA. Huang et al. (2011) show that in an (s,T)
system, order adjustments based on bias are more cost effective than those based on error.
Ganeshan et al. (2001) show that in a four tier supply chain (distribution requirements
planning system) for a chemical company, increased error is related to increases in cycle
time costs (inventory carrying costs, obsolescence, warehousing) and reductions in
service level and return on investment (due to stock-outs, on-time delivery performance
and transportation costs). Inventory planning in closed loop systems may face difficulty
achieving accurate forecasts due to the increased expense and reduced reliability of data
in reverse channels (Kelle and Silver 1989). Finally, the effect of forecast accuracy on
inventory costs can be shifted to other members of the supply chain through collaborative
planning and control arrangements like vendor managed inventory (Kannan et al. 2013).
In general, this theme would imply “good enough” depends on the lot sizing
policy, MPS freezing and replanning periodicity, buffer stock policy, and lead time
replenishment policy in use. The appropriate choice of each policy and the effect of
demand forecast error would then include a consideration of relevant cost tradeoffs. As
with previous themes, work relating to control policies indicates Pareto returns on forecast
accuracy investments, and that other metrics such as forecast bias may be more effective
indicators of overall costs than forecast error.
Current research on this theme is limited to production control policies in highly
specific contexts. Future work should attempt to generalize the effect of forecast accuracy
on production control policies under multiple circumstances to determine whether
product, industry or market factors are significant covariates. The majority of research
regarding the effect of forecast accuracy on inventory control policy costs focuses on two
control mechanisms. Future research in this area ought to include a greater focus on
collaborative inventory models, hybrid inventory models, and inclusion of stochastic
assumptions, as suggested by Williams and Tokar (2008).
Driver | Implications | Proposed Future Research
Error Amplification | “Good enough” depends on susceptibility to bullwhip, the relative import of the error amplification component of bullwhip costs, and feasibility of alternate means (other than error reduction) to combat these costs. | Must shift to more comprehensive diagnoses and normative responses.
Cost Tradeoffs | Desired accuracy may be lower depending on the complex tradeoffs, both pecuniary and non-pecuniary, within and between firms in a supply chain. To determine “good enough”, firms must assess their relevant tradeoffs and prioritize accuracy improvement investments to those situations that promise the greatest returns. | Should focus on direct and indirect effects of demand forecast accuracy improvements on more holistic long term value measures such as EVA.
Supply Chain Integration | “Good enough” can change substantially based on cost of shared data, level of information asymmetry, trustworthiness of the shared data, for shared forecasts the accuracy of the downstream forecast, how collaborating shifts cost burdens and on the effectiveness and maturity of the collaborative relationship. | The different effects of various types of information to be shared and types of collaboration between supply chain principals have yet to be explored.
Supply Chain Flexibility | Flexible strategies have been shown to both increase achievable accuracy and reduce reliance on accuracy, potentially meaning lower desired accuracy. Flexible strategies affect “good enough” by both replacing and augmenting demand forecast accuracy. | Current research on flexibility has too great a focus on local operations costs, must focus on holistic metrics like EVA.
Manual Adjustments | Manual adjustments can be expensive to implement, but have been shown to increase achievable accuracy in a variety of circumstances. Practitioners must weigh costs of updates against the potential benefit to determine “good enough”. | Effects of adjustments need to be studied over a greater variety of contexts to normatively prescribe action.
Risk | “Good enough” depends on both actual and perceived risk. Hedging against actual risk and increasing communication to limit the portion of perceived risk that differs from actual risk can replace higher required demand forecast accuracy. | Requires a greater understanding of the effect of differences between perceived and actual levels of risk over multiple supply chain principals.
Inventory and Control Policy | Inventory and production control policies introduce discontinuities in the cost tradeoff considerations of “good enough”. Practitioners should consider the effects of such policies explicitly when measuring potential tradeoffs, and include other metrics besides accuracy (particularly bias). | Should examine more generalizable production conditions, include more stochastic assumptions, and focus on more complex collaborative and hybrid inventory models.
Table 17: Summary of Managerial Drivers of Demand Forecast Accuracy

Limitations
Our search was limited to journals with either a focus on logistics or supply chain
management, or related journals that feature research that specifically mentions
implications of logistics or the supply chain. The term “logistics” only began to
popularly replace “physical distribution management” in the 1970s (Farris 1997,
Southern 2011). “Supply chain management” was coined in the 1980s, but did not gain
prominence until the later 1990s (Cooper et al. 1997). By including these terms, and not
more antiquated terms for similar concepts, we necessarily limited the search to more
recent articles. We also focused on peer reviewed journals written in English. The
limitation to English language articles reflects a limitation of the research team. Non-
English articles likely hold merit and would significantly improve our exploration of the
bounds of demand forecast accuracy, but our team did not possess the capabilities to
assess this. Omitting non-peer reviewed material admittedly eliminated several theses,
dissertations, books, and conference proceedings that certainly have merit and are closely
related to our topic. However, as these works were not subject to rigorous peer review,
our team had no verification of their rigor.
Conclusion
This review of logistics and supply chain management literature reveals a number
of themes that have been found to shape the bounds to achievable and desirable forecast
accuracy. Each theme has been explored and measured in academic research to varying
degrees, and the expected effects have been measured over a wide range of conditions.
For each theme, we provide a brief overview of extant academic work, relate what this
means for a demand manager, and recommend future directions for academic efforts to
aid practice. For the practitioner, this provides some indication, supported empirically,
analytically, or theoretically, of where to look when attempting to answer “How good is
good enough?”.
For technical drivers of demand forecast accuracy, the six themes indicate both
positive and negative forces on primarily achievable forecast accuracy. Research on
forecastability would suggest that some demand patterns, characterized by high variation,
intermittence, and other non-stationarity, make lower levels of forecast error
unachievable. “Good enough” is lowered, and alternative mitigating investments, or
metrics to prioritize efforts like RAE, better serve managers. Extant research indicates
increasing the horizon of a forecast also lowers achievable forecast accuracy, but that this
effect can be offset by aggregation effects when forecasts are generated for alternate
purposes. In these cases, managers must try to estimate what proportion of error comes
from the horizon. Overfitting and misspecification also lower “good enough”, as when
forecasters develop extrapolative models, they must be wary of replicating past patterns
too closely. This unfortunate ailment of all mathematical models implies that achievable
accuracy is higher than desirable accuracy when modelers adapt explanatory models for
use in prediction. In considering tradeoffs of metrics, findings in the literature indicate that
“good enough” may not matter unless the right type of deviation of prediction from
reality is measured. Depending on circumstance, some metrics are more important to
overall performance than others, and optimizing any one will cause others to suffer.
Multiple measures should be used, and “good enough” for any one metric should be
lower than what is achievable. Research on level of aggregation and hierarchy indicates
that increases in both of these dimensions drive “good enough” higher. The effect from
hierarchy results from a concentration of resources. However, there is a risk from
aggregating in misunderstanding the relative importance of sampling and specification
error. Higher achievable accuracy through aggregation may also result in lower utility of
a forecast. Finally, academic work regarding the effect of data quality and availability
indicates that limits to these drivers will lower “good enough”. Hard limits exist where
improvements on quality or availability are not possible, but these can often be affected
through investments in capabilities to collect information or incentives to share
information between organizations.
Among managerial drivers of demand forecast accuracy, the seven themes
indicate both positive and negative forces on primarily desirable forecast accuracy. Error
amplification (more commonly referred to as bullwhip) research demonstrates that the
effect of demand forecast accuracy on bullwhip-related costs is often smaller than other
factors. The literature also indicates signal amplification costs are only some of the
relevant bullwhip-related costs, and that these are unequally shared based on a firm’s
position in a supply chain. This suggests other remedies such as bias or lead time
reduction are more cost effective responses, and “good enough” may be lower. The most
complex and heavily investigated theme, cost tradeoffs, reveals the interdependent direct,
indirect and nonpecuniary costs associated with investment in additional forecast
accuracy or costs associated with lower levels of accuracy. Work reviewed under this
theme indicates the criticality for firms to accurately estimate their relevant costs, while
also revealing the incredible difficulty of generating cost (or value) functions for several
possible alternatives, requiring voluntary coordination from supply chain members who
bear costs and benefits unequally. Previous work indicates which costs will likely affect
“good enough”, but the combination of effects depends on individual circumstances.
Research on supply chain integration indicates “good enough” increases with increasing
integration among upstream supply chain members, but “good enough” is lower with
increasing integration downstream. This means that “good enough” becomes a function
of position in a supply chain, and of how effectively the supply chain members can share
relevant costs and benefits of information sharing. The literature would indicate supply
chain flexibility has some mixed effects on “good enough”. Flexible strategies
simultaneously lower “good enough” by reducing reliance on forecast accuracy, and
increase “good enough” by increasing achievable accuracy. Manual adjustments are
found to generally increase “good enough”, but this depends on the costs associated
with gathering inputs and have diminishing returns. Previous investigations on risk
indicate mixed effects on “good enough”. Actual risk can entail both greater levels of
demand uncertainty, which lowers “good enough”, and greater cost vulnerability to
uncertainty, which increases “good enough”. If substantially different from actual risk,
perceived risk introduces additional costs, but in this case the requirement for accurate
knowledge of the level of error is affected more than the actual level of forecast error.
Finally, work regarding the theme of production and inventory control policy indicates
mixed effects on “good enough”. While inventory and production cost savings from
increasing forecast accuracy would indicate higher levels of “good enough”, these have
Pareto returns, and choice of policy appears to have a greater effect on costs than forecast
accuracy.
Within each theme, we also recommend areas for future research. While the
recommendations for each theme were unique, some common deficiencies in extant
literature were apparent. Future work in most themes must include more holistic
measures of value, and include a greater scope of extra-firm considerations. Local costs
do not motivate high level investments to change how a business operates, so measures of
forecast accuracy must be associated with long term value to the greatest extent possible.
Firms also do not exist independently of their networks of suppliers or customers, so
considerations of how forecast accuracy at various levels affects relative costs and benefits
between members of the supply chain must be included in future research. This research
can also be extended by applying these identified themes in a case study evaluation of a
firm or supply chain’s demand management processes. Explicitly measuring these
themes would provide a template for practitioners in evaluating their own conditions
for achievable and desirable forecast accuracy.
Demand managers will still have to identify which of these themes hold relevance
to their situation, and attempt to measure the relevant tradeoffs present in these themes in
order to determine what level of demand forecast accuracy is “good enough”. However,
guided by research findings in these six technical and seven managerial themes, they are
better equipped to assess the nuanced and complicated set of considerations and tradeoffs
that shape the answer to “How good is good enough?”.
Chapter 5: Conclusions
These three essays examine interrelated concerns in a critical supply chain input.
Whether forecasts are used for replenishment, sales staffing, coordinating storage and
transportation, or for sourcing and production, accuracy is critical to a supply chain’s
health. The three questions posed by the 4PL firm we partnered with on this research:
“What is causing our replenishment forecast error?”, “What predictive factors can help
improve our demand forecast accuracy?”, and “How good is good enough?” helped us to
address three more general deficiencies in forecasting literature as it relates to logistics
and the supply chain.
By identifying the effects of a previously unexplored driver of upstream
(replenishment) forecast deviation and bias, we show that internally controllable factors
can affect the performance of upstream replenishment. In our examination of inclusion
of exogenous (weather) factors on demand forecast quality, we demonstrate the ability to
harness readily available external information to improve prediction. Finally, our review
of the bounding factors of forecast accuracy provides practitioners with the means to
identify the levels of forecast accuracy that are achievable and desirable for their firm or
supply chain.
In our first essay, we found that the franchise governance form does significantly
affect both replenishment forecast deviation and bias. While the effect on deviation was
consistent with previous research on the proclivities of franchisees (Brickley and Dark
1987, Norton 1988a, Bertagnoli 1989, Krueger 1991, Carney and Gedajlovic 1991,
Kaufmann and Lafontaine 1994, Michael 2000, Yin and Zajac 2004, de Leeuw, Holweg
and Williams 2011), our findings on the effect of governance form on bias would seem to
contrast with previous indications from research on the operational effect of agency in
organizations (Rubin 1978, Norton 1988a, Norton 1988b, Noren 1990, Krueger 1991).
These results suggest that governance form does indeed significantly drive behavior
potentially misaligned with parent firm incentives, but the explanation for these
differences may be more nuanced than previously identified.
In addition to these main findings, we developed a novel method to explore
extremely large datasets in order to quantify the heterogeneities of the effect of
governance form on replenishment forecast deviation and bias. The technique we call
HPD permits the parsimonious identification of regional, temporal and product category
differences in the effect so that firm resources can be effectively targeted. This is
important, as firms are gathering data faster than they can effectively utilize it, and are
urgently seeking means to leverage this resource into a competitive advantage.
Given the decentralized structure of the franchising governance form, there is a
continued need to research factors that drive differences in post-contractual performance.
Knowing these factors can guide firms in their efforts to align their mix of governance
form with their overall strategy, minimizing form-specific residual loss and maximizing
the advantages of the plural form (Bradach 1997, Yin and Zajac 2004, Barthélemy 2008).
If deviation or bias related to governance form is beneficial, the individual differences
can be benchmarked (Bradach 1997, Yin and Zajac 2004). If not, firms can use relational
governance to effect change in outlets where they have little coercive power (Bradach
1997, Paik and Choi 2007, Cochet et al. 2008).
In our second essay, we find multiple predicted weather factors that can
significantly improve demand forecast accuracy. We found that weather forecasts for
high and low temperatures, as well as for thunderstorms significantly improved demand
forecast accuracy. We also found that other predicted weather such as wind speed, cloud
cover, rain, snow and overall precipitation largely did not improve demand forecast
accuracy, and in some cases made it worse. Effects were similar, if slightly more positive,
for models that utilized perfect weather prediction (observed rather than predicted
weather). These effects differed by accuracy measure, product category, and weather
region.
These mixed results indicate that inclusion of exogenous information into demand
forecasts can prove a difficult and complicated matter, despite the promise of greater
precision for operational planning (Bertrand and Sinclair‐Desgagné 2011, Nikolopoulos
and Fildes 2013, Steinker et al. 2016). This means that potential differences in the effect
of inclusion of these exogenous factors depend not only on the identification of a
significant weather effect, but the correct specification of that effect, which can be
nonlinear and heterogeneous across a number of dimensions. There is additionally the
potential for confounding present in aggregation of any sort of effect for use in demand
forecasting. The reliability of the exogenous factors themselves is also a concern worth
monitoring, as each weather prediction differs in quality based on forecast horizon,
specific weather phenomena, the sensitivity of weather sensors, and their distance from a
relevant demand point. Finally, the type of measurement used for demand forecast error
can affect the perceived benefit of including external weather predictions. The fact that
most of the observed demand accuracy improvement was in percent error measures, and
not in absolute measures, indicates that inclusion of external weather predictions
increased the responsiveness of demand forecasts.
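The distinction between percent and absolute error measures can be made concrete with a small sketch. The demand values and forecasts below are invented for illustration and are not drawn from the data in the second essay; the function and variable names are likewise hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error, in demand units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percent error; assumes no actual value is zero."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand for one low volume and one high volume period.
actual = [10.0, 100.0]
baseline = [15.0, 100.0]      # misses the low volume period badly
with_weather = [11.0, 104.0]  # tracks both periods, with small misses on each

print(mae(actual, baseline), mae(actual, with_weather))
print(mape(actual, baseline), mape(actual, with_weather))
```

Here both forecasts have the same mean absolute error of 2.5 units, but the mean absolute percent error falls from 25 percent to roughly 7 percent. A forecast that follows relative movements in demand more closely can therefore improve on percent measures without changing absolute measures.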
This supports previous findings that predicted weather can significantly improve
demand forecasts (Nikolopoulos and Fildes 2013, Steinker et al. 2016), but extends these
findings to a new industry, and over a broader set of regional and product-based
circumstances. It also demonstrates some of the difficulties inherent in inclusion of
uncertain exogenous information in demand forecasts.
Our third essay discusses in detail the current state of logistics and supply chain
research on the bounds of forecast accuracy. Through our systematic literature review, we
identify six technical and seven managerial themes found to drive differential levels of
both achievable and desirable demand forecast accuracy. For each theme, we
comprehensively described the current state of logistics and supply chain research, the
implications of research for practitioners, and potential directions for future work. Our
review indicates the numerous factors that can shape the levels of demand forecast
accuracy are highly specific to the context, goals and capabilities of a firm or a supply
chain. Beyond identifying the state and future direction of research, our themes provide a
guide for practitioners in assessing their own demand planning situation to determine the
relevant factors that drive higher and, in some cases, lower forecast accuracy.
In addressing these three specific concerns from demand planners in a large 4PL,
we expand the academic understanding of three interrelated topics in forecast accuracy as
it relates to logistics and the supply chain. Our results should drive future research on
factors affecting upstream replenishment forecast accuracy, factors that may improve
downstream demand forecast accuracy, and themes that dictate achievable and desirable
levels of demand prediction accuracy.
References
Agrawal, S., Sengupta, R.N. and Shanker, K., 2009. Impact of information sharing and
lead time on bullwhip effect and on-hand inventory. European Journal of Operational
Research, 192, pp.576–593.
Ahlburg, D.A., 1992. A commentary on error measures: Error measures and the choice of
a forecast method. International Journal of Forecasting, 8, pp.99–100.
Ali, M.M., Boylan, J.E. and Syntetos, A.A., 2012. Forecast errors and inventory
performance under forecast information sharing. International Journal of Forecasting,
28(4), pp.830–841. Available at: http://dx.doi.org/10.1016/j.ijforecast.2010.08.003.
Altendorfer, K., Felberbauer, T. and Jodlbauer, H., 2016. Effects of forecast errors on
optimal utilisation in aggregate production planning with stochastic customer demand.
International Journal of Production Research, 54(12), pp.3718–3735. Available at:
http://dx.doi.org/10.1080/00207543.2016.1162918.
Amornpetchkul, T., Duenyas, I. and Şahin, Ö., 2015. Mechanisms to Induce Buyer
Forecasting: Do Suppliers Always Benefit from Better Forecasting? Production and
Operations Management, 24(11), pp.1724–1749.
AMS (American Meteorological Society), 2015. Weather Analysis and Forecasting: An
Information Statement of the American Meteorological Society. AMS Council.
Armstrong, J.S., 2002. Principles of Forecasting: A Handbook for Researchers and
Practitioners. (J.S. Armstrong, Ed.), International Series in Operations Research &
Management Science (Vol. 18). New York: Kluwer Academic Publishers.
Armstrong, J.S. and Collopy, F., 1992. Error Measures for Generalizing about
Forecasting Methods: Empirical Comparisons. International Journal of Forecasting,
8(1), pp.69–80. Available at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=662701.
Armstrong, J.S. and Fildes, R., 1995. Correspondence on the Selection of Error Measures
for Comparisons among Forecasting Methods. Journal of Forecasting, 14(August 1994),
pp.67–71.
Arrow, K.J., 1965. Aspects of the theory of risk-bearing. Yrjö Jahnssonin Säätiö.
Arunraj, N.S. and Ahrens, D., 2016. Estimation of non-catastrophic weather impacts for
retail industry. International Journal of Retail & Distribution Management, 44(7),
pp.731–753. Available at: http://dx.doi.org/10.1108/IJRDM-07-2015-0101.
Arunraj, N.S., Ahrens, D. and Fernandes, M., 2016. Application of SARIMAX Model to
Forecast Daily Sales in Food Retail Industry. International Journal of Operations
Research and Information Systems, 7(2), pp.1–21.
Auffhammer, M. and Mansur, E.T., 2014. Measuring climatic impacts on energy
consumption: A review of the empirical literature. Energy Economics, 46, pp.522–530.
Available at: http://dx.doi.org/10.1016/j.eneco.2014.04.017.
Babai, M.Z., Ali, M.M., Boylan, J.E. and Syntetos, A.A., 2013. Forecasting and
inventory performance in a two-stage supply chain with ARIMA (0,1,1) demand: Theory
and empirical analysis. International Journal of Production Economics, 143(2), pp.463–
471. Available at: http://dx.doi.org/10.1016/j.ijpe.2011.09.004.
Babai, M.Z., Syntetos, A.A. and Teunter, R., 2014. Intermittent demand forecasting: An
empirical study on accuracy and the risk of obsolescence. International Journal of
Production Economics, 157, pp.212–219. Available at:
http://dx.doi.org/10.1016/j.ijpe.2014.08.019.
Bahng, Y. and Kincade, D. H., 2012. The relationship between temperature and sales.
International Journal of Retail & Distribution Management, 40(6), 410–426.
https://doi.org/10.1108/09590551211230232.
Banker, S., 2009. “Nestle Waters and Weather-Driven Demand.” Logistics Viewpoints.
ARC Advisory Group. https://logisticsviewpoints.com/2009/07/09/nestle-and-weather-
driven-demand/.
Barman, S., Tersine, R.J. and Burch, E.E., 1990. Performance evaluation of the LDR and
the PSH with forecast errors. Journal of Operations Management, 9(4), pp.481–499.
Barthélemy, J. 2008. “Opportunism, Knowledge, and the Performance of Franchise
Chains”. Strat. Mgmt. J. 29 (13): 1451-1463. Doi:10.1002/smj.719.
Bassi, A., Colacito, R. and Fulghieri, P., 2013. ’O Sole Mio: An Experimental Analysis
of Weather and Risk Attitudes in Financial Decisions. The Review of Financial Studies,
(919), pp.1–29.
Bertagnoli, L., 1989. "McDonald’s: Company of the Quarter Century". Restaurants and
Institutions, 32.
Bertrand, J. L., Brusset, X., and Fortin, M., 2015. Assessing and hedging the cost of
unseasonal weather: Case of the apparel sector. European Journal of Operational
Research, 244(1), 261–276. https://doi.org/10.1016/j.ejor.2015.01.012.
Bertrand, J. and Sinclair‐Desgagné, B., 2011. Environmental Risks and Financial
Markets: A Two-Way Street. In P. Bansal and A. J. Hoffman (Eds.), The Oxford
Handbook of Business and the Natural Environment (pp. 1–26). Oxford University Press.
https://doi.org/10.1093/oxfordhb/9780199584451.003.0026.
Bian, W., Shang, J. and Zhang, J., 2016. Two-way information sharing under supply
chain competition. International Journal of Production Economics, 178, pp.82–94.
Available at: http://dx.doi.org/10.1016/j.ijpe.2016.04.025.
Biggs, J.R. and Campion, W.M., 1982. The Effect and Cost of Forecast Error Bias for
Multiple-Stage Production-Inventory Systems. Decision Sciences, 13, pp.570–584.
Bitner, M.J., 2017. Servicescapes: The Impact of Physical Surroundings on Customers
and Employees. Journal of Marketing, 56(2), pp.57–71.
Box, G.E., Jenkins, G.M., Reinsel, G.C. and Ljung, G.M., 2015. Time series analysis:
forecasting and control. John Wiley & Sons.
Bowersox, D.J., Closs, D.J., Cooper, M.B. and Bowersox, J.C., 2013. Supply chain
logistics management (4th ed). New York, NY: McGraw-Hill.
Boylan, J.E. and Syntetos, A.A., 2006. Accuracy and Accuracy Implication Metrics for
Intermittent Demand. Foresight: International Journal of Applied Forecasting, (4),
pp.39–42.
Bradach, J.L., 1997. Using the Plural Form in the Management of Restaurant Chains.
Administrative Science Quarterly 42 (2): 276. Doi: 10.2307/2393921.
Bratina, D. and Faganel, A., 2008. Forecasting the Primary Demand for a Beer Brand
Using Time Series Analysis. Organizacija, 41(3), pp.116–124.
Breiter, A. and Huchzermeier, A., 2015. Promotion planning and supply chain
contracting in a high-low pricing environment. Production and Operations Management,
24(2), pp.219–236.
Brenner, H. and Gefeller, O., 1997. Variation of sensitivity, specificity, likelihood ratios
and predictive values with disease prevalence. Statistics in Medicine, 16(9), pp.981–991.
Brickley, J.A. and Dark, F.H., 1987. The Choice of Organizational Form the Case of
Franchising. Journal of Financial Economics 18 (2): 401-420. Doi:10.1016/0304-
405x(87)90046-8.
Buchholz, W., 1962. Planning a Computer System: Project Stretch.
Bujisic, M., Bogicevic, V. and Parsa, H.G., 2016. The effect of weather factors on
restaurant sales. Journal of Foodservice Business Research, 0(0), pp.1–21. Available at:
http://dx.doi.org/10.1080/15378020.2016.1209723.
Burton, M., 2017. South Australia suffers another weather-induced blackout. Reuters.
http://in.reuters.com/article/australia-electricity-outages-idINKBN15N2QY.
Busse, M.R., Pope, D.G., Pope, J.C. and Silva-Risso, J., 2012. Projection bias in the
housing and car markets. Unpublished manuscript.
Cachon, G.P. and Fisher, M., 2000. Supply Chain Inventory Management and the Value
of Shared Information. Management Science, 46(8), pp.1032–1048.
Cachon, G. and Lariviere, M., 2001. Contracting to Assure Supply: How to Share
Demand Forecasts in a Supply Chain. Management Science, 47(5), pp.629–646.
Camargo, M.B.P. and Hubbard, K.G., 1999. Spatial and temporal variability of daily
weather variables in sub-humid and semi-arid areas of the United States high plains.
Agricultural and Forest Meteorology, 93(2), 141–148. https://doi.org/10.1016/S0168-
1923(98)00122-1.
Campbell, G.M., 1995. Establishing safety stocks for master production schedules.
Production Planning & Control, 6(5), pp.404–412. Available at:
http://www.tandfonline.com/doi/abs/10.1080/09537289508930297.
Caniato, F., Kalchschmidt, M., Ronchi, S., Verganti, R. and Zotteri, G., 2005. Clustering
customers to forecast demand. Production Planning & Control, 16(1), pp.32-43.
Canitz, H., 2016. Overcoming Barriers to Improving Forecast Capabilities. Foresight:
International Journal of Applied Forecasting, Spring, pp.26–35.
Carney, M. and Gedajlovic, E. 1991. “Vertical Integration in Franchise Systems: Agency
Theory and Resource Explanations”. Strategic Management Journal. 12 (8): 607-629.
Doi:10.1002/smj.4250120804.
Carter, C.R. and Easton, P.L., 2011. Sustainable supply chain management: evolution
and future directions. International Journal of Physical Distribution & Logistics
Management, 41(1), pp.46-62.
Caves, R.E. and Murphy II, W.F., 1976. Franchising: Firms, Markets, and Intangible
Assets. Southern Economic Journal 42 (4): 572. Doi:10.2307/1056250.
Cawthorn, C., 1998. Weather as a strategic element in demand chain planning. The
Journal of Business Forecasting Methods & Systems, 17(3), pp.18–21.
Census Bureau, 2012. Franchise Status for Selected Industries and States. Part of: Core
Business Statistics Series, 2012. Data Set: 2012 Economic Census of the United States.
Available at American FactFinder, http://factfinder.census.gov; Accessed: 1/3/17.
Chang, S.Y. and Yeh, T.Y., 2012. Applying TLFs to design the exception thresholds in
collaborative forecasting. International Journal of Production Research, 50(7), pp.1932–
1941. Available at: http://www.tandfonline.com/doi/abs/10.1080/00207543.2011.564669.
Chatfield, C., 1992. A commentary on error measures. International Journal of
Forecasting, 8, pp.100–102.
Chen, A. and Blue, J., 2010. Performance analysis of demand planning approaches for
aggregating, forecasting and disaggregating interrelated demands. International Journal
of Production Economics, 128(2), pp.586–602. Available at:
Chen, F., 1999. Decentralized Supply Chains Subject to Information Delays.
Management Science, 45(8), pp.1076–1090.
Chen, Y. and Xiao, W., 2012. Impact of Reseller’s Forecasting Accuracy on Channel
Member Performance. Production and Operations Management, 21(6), pp.1075–1089.
Choi, C., Kim, E. and Kim, C., 2011. A Way of Managing Weather Risks Considering
Apparel Consumer Behaviors. Working Paper. Available at:
http://ssrn.com/abstract=1909185.
Chopra, S. and Sodhi, M.S., 2004. Managing risk to avoid supply-chain breakdown. MIT
Sloan Management Review, 46(1), p.53.
Christopher, M., 2000. The agile supply chain: competing in volatile markets. Industrial
Marketing Management, 29(1), pp.37-44.
Chung, C., Niu, S.C. and Sriskandarajah, C., 2012. A sales forecast model for short-life-
cycle products: New releases at blockbuster. Production and Operations Management,
21(5), pp.851–873.
Clark, A.R., 1998. Batch sequencing and sizing with regular varying demand. Production
Planning & Control, 9(3), pp.260-266.
Clark, A.R., 2005. Rolling horizon heuristics for production planning and set-up
scheduling with backlogs and error-prone demand forecasts. Production Planning &
Control, 16(1), pp.81-97.
Clemen, R.T., 1989. Combining forecasts: A review and annotated bibliography.
International Journal of Forecasting, 5(4), pp.559-583.
Clemen, R.T. and Winkler, R.L., 1985. Limits for the Precision and Value of Information
from Dependent Sources. Operations Research, 33(2), pp.427–443.
Cochet, O., Dormann, J. and Ehrmann, T., 2008. Capitalizing On Franchisee Autonomy:
Relational Forms of Governance as Controls in Idiosyncratic Franchise Dyads*. Journal
of Small Business Management 46 (1): 50-72. Doi:10.1111/j.1540-627x.2007.00231.x.
Cohen, J., Cohen, P., West, S.G. and Aiken, L.S., 2013. Applied Multiple
Regression/Correlation Analysis For The Behavioral Sciences. Routledge.
Combs, J.G. and Ketchen, D.J., 1999. Can Capital Scarcity Help Agency Theory Explain
Franchising? Revisiting the Capital Scarcity Hypothesis. Academy of Management
Journal 42 (2): 196-207. Doi:10.2307/257092.
Combs, J.G. and Ketchen, D.J., 2003. Why Do Firms Use Franchising As An
Entrepreneurial Strategy?: A Meta-Analysis. Journal of Management 29 (3): 443-465.
Doi:10.1016/s0149-2063_03_00019-9.
Combs, J.G., Ketchen, D.J., Shook, C.L. and Short, J.C., 2010. Antecedents and
Consequences of Franchising: Past Accomplishments and Future Challenges. Journal of
Management 37 (1): 99-126. Doi:10.1177/0149206310386963.
Considine, T.J., 2000. The impacts of weather variations on energy demand and carbon
emissions. Resource and Energy Economics, 22, pp.295–314.
Cooper, M.C., Lambert, D.M. and Pagh, J.D., 1997. Supply chain management: more
than a new name for logistics. The International Journal of Logistics Management, 8(1),
pp.1-14.
Coronado Mondragon, A.E., Lalwani, C. and Coronado Mondragon, C.E., 2011.
Measures for auditing performance and integration in closed‐loop supply chains. Supply
Chain Management: An International Journal, 16(1), pp.43–56. Available at:
http://www.emeraldinsight.com/doi/10.1108/13598541111103494.
Cotteleer, M.J. and Wan, X., 2016. Does The Starting Point Matter? The Literature‐
Driven and The Phenomenon‐Driven Approaches of Using Corporate Archival Data In
Academic Research. Journal of Business Logistics, 37(1), pp.26-33.
Crowther, M.A. and Cook, D.J., 2007. Trials and tribulations of systematic reviews and
meta-analyses. ASH Education Program Book, 2007(1), pp.493-497.
Cui, R., Allon, G., Bassamboo, A. and Van Mieghem, J.A., 2015. Information Sharing in
Supply Chains: An Empirical and Theoretical Valuation Information. Management
Science, 61(11).
Cunningham, M.R., 1979. Weather, Mood, and Helping Behavior: Quasi Experiments
with the Sunshine Samaritan. Journal of Personality and Social Psychology, 37(11),
pp.1947–1956.
Dalrymple, D.J., 1975. Sales Forecasting Methods and Accuracy. Business Horizons,
(December).
Darlington, R.B. and Hayes, A.F., 1990. Regression and Linear Models. New York:
McGraw-Hill.
Davis, L.B. et al., 2016. Analysis and prediction of food donation behavior for a domestic
hunger relief organization. International Journal of Production Economics, 182, pp.26–
De Bodt, M.A. and Van Wassenhove, L.N., 1983. Cost Increases Due to Demand
Uncertainty in MRP Lot Sizing. Decision Sciences, 14, pp.345–362.
De Brito, M.P. and van der Laan, E.A., 2009. Inventory control with product returns: The
impact of imperfect information. European Journal of Operational Research, 194(1),
pp.105–121. Available at: http://dx.doi.org/10.1016/j.ejor.2007.11.063.
De Leeuw, S., Holweg, M. and Williams, G. 2011. The Impact of Decentralised Control
on Firm‐Level Inventory. International Journal of Physical Distribution & Logistics
Management 41 (5): 435-456. Doi:10.1108/09600031111138817.
Denyer, D. and Neely, A., 2004. Introduction to special issue: innovation and
productivity performance in the UK. International Journal of Management Reviews, 5(3‐
4), pp.131-135.
Dillow, C., 2011. In Brazil, an Explosion in Computing Power is Revolutionizing
Weather Prediction. Popular Science. Available at:
http://www.popsci.com/science/article/2011-05/better-weather-explosion-computing-
power-fueling-weather-modeling-revolution.
Divakar, S., Ratchford, B. and Shankar, V., 2005. CHAN4CAST: A Multichannel,
Multiregion Sales Forecasting Model and Decision Support System for Consumer
Packaged Goods. Marketing Science, 24(3), pp.334–350. Available at:
http://mktsci.journal.informs.org/content/24/3/334.abstract.
Doering, T. and Suresh, N.C., 2016. Forecasting and Performance: Conceptualizing
Forecasting Management Competence as a Higher-Order Construct. Journal of Supply
Chain Management, 52(4), pp.77–91.
Doganis, P., Aggelogiannaki, E. and Sarimveis, H., 2008. A combined model predictive
control and time series forecasting framework for production-inventory systems.
International Journal of Production Research, 46(24), pp.6841–6853. Available at:
http://www.scopus.com/inward/record.url?eid=2-s2.0-
55349105856&partnerID=tZOtx3y1.
Donovan, R.J., 1994. Store atmosphere and purchasing behavior. Journal of Retailing,
70(3), pp.283–294.
Durach, C.F., Kembro, J. and Wieland, A., 2017. Systematic Literature Reviews:
Addressing the Ontological and Epistemological Idiosyncrasies of Supply Chain
Management Research. Journal of Supply Chain Management. Forthcoming.
Dutton, J.A., 2002. Opportunities and Priorities in a New Era for Weather and Climate
Services. American Meteorological Society, (May), 1303–1311.
Ebert, R.J. and Lee, T.S., 1995. Production loss functions and subjective assessments of
forecast errors: untapped sources for effective master production scheduling.
International Journal of Production Research, 33(1), pp.137–159.
Ebrahim-Khanjari, N., Hopp, W. and Iravani, S.M.R., 2012. Trust and information
sharing in supply chains. Production and Operations Management, 21(3), pp.444–464.
Eksoz, C., Mansouri, S.A. and Bourlakis, M., 2014. Collaborative forecasting in the food
supply chain: A conceptual framework. International Journal of Production Economics,
158, pp.120–135. Available at: http://dx.doi.org/10.1016/j.ijpe.2014.07.031.
Ellram, L.M. and Cooper, M.C., 2014. Supply chain management: It’s all about the
journey, not the destination. Journal of Supply Chain Management, 50(1), pp.8-20.
Enns, S.T., 2002. MRP performance effects due to forecast bias and demand uncertainty.
European Journal of Operational Research, 138, pp.87–102.
Eroglu, C. and Croxton, K.L., 2010. Biases in judgmental adjustments of statistical
forecasts: The role of individual differences. International Journal of Forecasting, 26(1),
pp.116–133. Available at: http://dx.doi.org/10.1016/j.ijforecast.2009.02.005.
Eroglu, C. and Knemeyer, A.M., 2010. Exploring the Potential Effects of Forecaster
Motivational Orientation and Gender on Judgmental Adjustments of Statistical Forecasts.
Journal of Business Logistics, 31(1), pp.179–195.
Fama, E.F. and Jensen, M.C., 1983. Separation of Ownership and Control. Journal of
Law and Economics, 301-325.
Farris, M.T., 1997. Evolution of academic concerns with transportation and logistics.
Transportation Journal, 37(1), pp.42-50.
Fenimore, C., 2017. U.S. Climate Divisions. National Climatic Data Center, NOAA,
www.ncdc.noaa.gov/monitoring-references/maps/us-climate-divisions.php.
Fildes, R., 1992. The evaluation of extrapolative forecasting methods. International
Journal of Forecasting, 8, pp.81–98.
Fildes, R., Goodwin, P. and Onkal, D., 2015. Information use in supply chain forecasting.
(Dept. Management Science Working Paper 2015:2). Lancaster University.
Fildes, R. and Kingsman, B., 2011. Incorporating demand uncertainty and forecast error
in supply chain planning models. Journal of the Operational Research Society, 62(3),
pp.483–500. Available at: http://link.springer.com/10.1057/jors.2010.40.
Fildes, R., Nikolopoulos, K., Crone, S.F. and Syntetos, A.A., 2008. Forecasting and
Operational Research: A Review. Journal of the Operational Research Society. 59 (9):
1150-1172. Doi:10.1057/palgrave.jors.2602597.
Fildes, R., Goodwin, P., Lawrence, M. and Nikolopoulos, K., 2009. Effective forecasting
and judgmental adjustments: an empirical evaluation and strategies for improvement in
supply-chain planning. International Journal of Forecasting, 25(1), pp.3–23. Available
at: http://dx.doi.org/10.1016/j.ijforecast.2008.11.010.
Fildes, R. and Petropoulos, F., 2015. Improving Forecast Quality in Practice. Foresight:
International Journal of Applied Forecasting, Winter, pp.5–13.
Fisher, M. and Raman, A., 1996. Reducing the Cost of Demand Uncertainty through
Accurate Response to Early Sales. Operations Research, 44(1), pp.87–99.
Fliedner, E.B. and Lawrence, B., 1995. Forecasting system parent group formation: An
empirical application of cluster analysis. Journal of Operations Management, 12,
pp.119–130.
Floehr, E., 2015. Analysis of One- to Nine-Day-Out High Temperature Forecasts for
U.S., Europe, and Asia-Pacific (Calendar Year 2014).
Flores, B.E., Olson, D.L. and Pearce, S.L., 1993. Cost and Accuracy Measures in
Forecasting Method Selection: a Physical Distribution Example. International Journal of
Production Research, 31(1), pp.139–160. Available at:
http://cat.inist.fr/?aModele=afficheN&cpsidt=4228222.
Flores, B.E. and Wichern, D.W., 2005. Evaluating Forecasts: A Look at Aggregate Bias
and Accuracy Measures. Journal of Forecasting, 24, pp.433–451.
Forrester, J.W., 1958. Industrial dynamics-a major breakthrough for decision makers.
Harvard Business Review, 36(4), p.37.
Fox, J. and Weisberg, S., 2010. An R Companion to Applied Regression. Sage.
Fustier, J., 2011. PepsiCo, peu convaincu par la fiabilité des prévisions météo. Supply
Chain Magazine, Juillet-Ao.
Ganeshan, R., Boone, T. and Stenger, A.J., 2001. The impact of inventory and flow
planning parameters on supply chain performance: An exploratory study. International
Journal of Production Economics, 71(18).
Gardner, E.S., 1990. Evaluating forecast performance in an inventory control system.
Management Science 36, no. 4: 490-499.
Gardner, E.S. and Acar, Y., 2016. The forecastability quotient reconsidered. International
Journal of Forecasting, 32(4), pp.1208-1211.
Gardner, M.P., 1985. Mood States and Consumer Behavior: A Critical Review. Journal
of Consumer Research, 12(Dec), pp.281–300.
Gaur, V., Kesavan, S., Raman, A. and Fisher, M.L., 2007. Estimating Demand
Uncertainty Using Judgmental Forecasts. Manufacturing & Service Operations
Management, 9(4), pp.480–491.
Georgiadis, P. and Vlachos, D., 2006. The Impact of Product Lifecycle on Capacity
Planning of Closed‐Loop Supply Chains with Remanufacturing. Production and
Operations Management, 15(4), pp.514–527. Available at:
http://onlinelibrary.wiley.com/doi/10.1111/j.1937-5956.2006.tb00160.x/abstract.
Goldstein, K.M., 1972. Weather, mood, and internal–external control. Perceptual and
Motor Skills. 35 (August), 786.
Gönül, M.S. and Goodwin, P., 2012. Why Should I Trust Your Forecasts? Foresight:
International Journal of Applied Forecasting, pp.5–10.
Goodwin, P., 2011. High on Complexity, Low on Evidence: Are Advanced Forecasting
Methods Always as Good as They Seem? Foresight: International Journal of Applied
Forecasting, pp.10–13.
Graman, G.A. and Sanders, N.R., 2009. Modelling the tradeoff between postponement
capacity and forecast accuracy. Production Planning and Control, 20(3), pp.206-215.
Grimm, C., Knemeyer, M., Polyviou, M. and Ren, X., 2015. Supply chain management
research in management journals: A review of recent literature (2004-2013).
International Journal of Physical Distribution & Logistics Management, 45(5), pp.404-
458.
Gu, Q., Li, Z. and Han, J., 2012. Generalized Fisher Score For Feature Selection. Arxiv
Preprint Arxiv:1202.3725.
Guo, Z., Fang, F. and Whinston, A.B., 2006. Supply chain information sharing in a macro
prediction market. Decision Support Systems, 42(3), pp.1944–1958.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E. and Tatham, R.L., 2006.
Multivariate Data Analysis. Vol. 6. Upper Saddle River, NJ: Pearson Prentice Hall.
Hamjah, M.A., 2014. Temperature and Rainfall Effects on Spice Crops Production and
Forecasting the Production in Bangladesh: An Application of Box-Jenkins ARIMAX
Model. Mathematical Theory and Modeling. Shahjalal University of Science and
Technology.
Han, J., Pei, J. and Kamber, M., 2011. Data Mining: Concepts and Techniques. Elsevier.
Hançerlioğulları, G., Şen, A. and Aktunç, E.A., 2016. Demand uncertainty and inventory
turnover performance: An empirical analysis of the US retail industry. International
Journal of Physical Distribution & Logistics Management, 46(6/7), pp.681–708.
Hand, D., Mannila, H. and Smyth, P., 2001. Principles of Data Mining. MIT Press. Vol.
30. Doi:10.2165/00002018-200730070-00010.
Hardy, Q., 2015. IBM to Acquire the Weather Company. The New York Times.
https://www.nytimes.com/2015/10/29/technology/ibm-to-acquire-the-weather-
company.html?_r=1.
Harl, J.E. and Ritzman, L.P., 1985. A heuristic algorithm for capacity sensitive
requirements planning. Journal of Operations Management, 5(3), pp.309–326.
Hartzel, K.S. and Wood, C.A., 2017. Factors that affect the improvement of demand
forecast accuracy through point-of-sale reporting. European Journal of Operational
Research, 260(1), pp.171–182. Available at: http://dx.doi.org/10.1016/j.ejor.2016.11.047.
Hatzakis, E.D., Nair, S.K. and Pinedo, M.L., 2010. Operations in financial services – An
overview. Production and Operations Management, 19(6), pp.633–664. Available at:
http://www.scopus.com/inward/record.url?eid=2-s2.0-
79952036300&partnerID=40&md5=da6ff8eec20732321daf7d1b5fa2cd5a.
Heckman, J.J., 1976. The Common Structure of Statistical Models of Truncation, Sample
Selection and Limited Dependent Variables and a Simple Estimator for Such Models. In
Annals of Economic and Social Measurement, Volume 5, Number 4 (pp. 475-492).
NBER.
Henningsen, A., 2010. Estimating Censored Regression Models In R Using The Censreg
Package. University of Copenhagen.
Hijmans, R.J., Williams, E. and Vennes, C., 2016. Package ‘geosphere’: Spherical
Trigonometry.
Hirshleifer, D. and Shumway, T., 2017. Good Day
Sunshine: Stock Returns and the Weather. The Journal of Finance, 58(3), pp.1009–1032.
Ho, C.J. and Ireland, T.C., 1993. A diagnostic analysis of the impact of forecast errors on
production planning via MRP system nervousness. Production Planning & Control, 4(4),
p.311. Available at:
http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=5793096&lang=es&si
te=ehost-live.
Ho, C.J. and Ireland, T.C., 1998. Correlating MRP system nervousness with forecast
errors. International Journal of Production Research, 36(8), pp.2285–2299.
Ho, C. and Ireland, T.C., 2012. Mitigating forecast errors by lot-sizing rules in ERP-
controlled manufacturing systems. International Journal of Production Research, 50(11),
pp.3080–3094.
Holmström, J., Korhonen, H., Laiho, A. and Hartiala, H., 2006. Managing product
introductions across the supply chain: findings from a development project. Supply Chain
Management: An International Journal, 11(2), pp.121–130. Available at:
Hoover’s Inc., 2016. Fast-Food & Quick-Service Restaurants. Retrieved August 18, 2016
From Hoover’s Database.
Horn, J.L., 1965. A Rationale and Test for the Number of Factors in Factor Analysis.
Psychometrika, 30(2), pp.179-185.
Hosmer Jr, D.W., Lemeshow, S. and Sturdivant, R.X., 2013. Applied Logistic Regression
(Vol. 398). John Wiley & Sons.
Hosoda, T. and Disney, S.M., 2006. On variance amplification in a three-echelon supply
chain with minimum mean square error forecasting. Omega: International Journal of
Hosoda, T. and Disney, S.M., 2009. Impact of market demand mis-specification on a
two-level supply chain. International Journal of Production Economics, 121(2), pp.739–
Howells, T. and Morgan, E., 2017. Gross Domestic Product by Industry: Fourth Quarter
and Annual 2016. US Department of Commerce, Bureau of Economic Analysis.
Hu, Q.S. and Skaggs, K., 2009. Accuracy of 6-10 Day Precipitation Forecasts and Its
Improvement in the Past Six Years. In 7th NOAA Annual Climate Prediction Application
Science Workshop. pp. 1–2.
Huang, L.T., Hsieh, I.C. and Farn, C.K., 2011. On ordering adjustment policy under
rolling forecast in supply chain planning. Computers and Industrial Engineering, 60(3),
pp.397–410. Available at: http://dx.doi.org/10.1016/j.cie.2010.07.018.
Huang, Y.S., Hung, J.S. and Ho, J.W., 2017. A study on information sharing for supply
chains with multiple suppliers. Computers & Industrial Engineering, 104, pp.114–123.
Available at: http://dx.doi.org/10.1016/j.cie.2016.12.014.
Huntemann, T.L., Rudack, D.E., and Ruth, D.P., 2014. Forty Years of NWS Verification:
Past Performance and Future Advances.
Hur, D., Mabert, V.A. and Bretthauer, K.M., 2009. Real-Time Work Schedule
Adjustment Decisions: An Investigation and Evaluation. Production and Operations
Management, 13(4), pp.322–339. Available at: http://doi.wiley.com/10.1111/j.1937-
5956.2004.tb00221.x.
Hyndman, R.J., 2006. Another Look at Forecast Accuracy Metrics for Intermittent
Demand. Foresight: International Journal of Applied Forecasting, 4(June), pp.43–46.
Hyndman, R.J. and Athanasopoulos, G., 2014. Forecasting: principles and practice.
Otexts.
Hyndman, R.J. and Khandakar, Y., 2008. Automatic time series forecasting: The forecast
package for R. Journal of Statistical Software, 26(3).
Hyndman, R.J. and Koehler, A.B., 2006. Another look at measures of forecast accuracy.
International Journal of Forecasting, 22(1), pp.679–688.
Hyndman, R.J. and Kostenko, A.V., 2007. Minimum Sample Size Requirements for
Seasonal Forecasting Models. Foresight: The International Journal of Applied
Forecasting, (6), pp.12–15.
Hyndman, R.J., O’Hara-Wild, M., Bergmeir, C., Razbash, S. and Wang, E., 2017.
Package ‘forecast’: Forecasting Functions for Time Series and Linear Models.
IGD (The Institute of Grocery Distribution)., 2009. Case Study: Nestlé Waters – Demand
Planning Weather Project. The Institute of Grocery Distribution (GB).
https://www.igd.com/Research/Supply-chain/Waste-prevention/Six-to-fix-to-prevent-
waste/Forecast/Nestle-Waters---Demand-planning-weather-project/.
Jain, C., 2001. Forecasting Practices in Corporate America. The Journal of Business
Forecasting, 20(1).
Janis, M.J., Hubbard, K.G. and Redmond, K.T., 2004. Station density strategy for
monitoring long term climatic change in the contiguous United States. Journal of
Climate, 17(2001), pp.151–163. Available at:
http://climate.geog.udel.edu/~climate/publication_html/Pdf/JHR_JClim_04.pdf.
Jeffery, M.M. et al., 2008. Determining a cost-effective customer service level. Supply
Chain Management: An International Journal, 13(3), pp.225–232.
Jensen, M.C. and Meckling, W.H. 1976. Theory of the Firm: Managerial Behavior,
Agency Costs and Ownership Structure. Journal of Financial Economics 3 (4): 305-360.
Doi:10.1016/0304-405x(76)90026-x.
Jeong, I.J., 2011. A dynamic model for the optimization of decoupling point and
production planning in a supply chain. International Journal of Production Economics,
131(2), pp.561–567. Available at: http://dx.doi.org/10.1016/j.ijpe.2011.02.001.
Jeunet, J., 2006. Demand forecast accuracy and performance of inventory policies under
multi-level rolling schedule environments. International Journal of Production
Economics, 103, pp.401–419.
Ji, G., Gunasekaran, A. and Yang, G., 2014. Constructing sustainable supply chain under
double environmental medium regulations. International Journal of Production
Economics, 147(PART B), pp.211–219. Available at:
Jin, Y., Williams, B.D., Tokar, T. and Waller, M.A., 2015. Forecasting With Temporally
Aggregated Demand Signals in a Retail Supply Chain. Journal of Business Logistics,
36(2), pp.199–211.
Johnson, E.M. and Anderson, E., 2000. Postponement Strategies for Channel Derivatives.
The International Journal of Logistics Management, 11(1), pp.19–36. Available at:
Johnston, F.R. and Harrison, P.J., 1980. An application of forecasting in the alcoholic
drinks industry. Journal of the Operational Research Society, 31, pp.699–709.
Jöreskog, K.G., 2002. Censored Variables and Censored Regression. Retrieved
January 19, 2007.
Juselius, K., 1985. Modelling short- and long-term effects in the aggregate demand for
soft drinks. International Journal of Forecasting, 1, pp.253–272.
Kahn, K.B., 2003. How to measure the impact of a forecast error on an enterprise.
Journal of Business Forecasting Methods & Systems, Spring, pp.21–25.
Kannan, G., Grigore, M.C., Devika, K. and Senthilkumar, A., 2013. An analysis of the
general benefits of a centralized VMI system based on the EOQ model. International
Journal of Production Research, 51(1), pp.172–188. Available at:
http://www.tandfonline.com/doi/abs/10.1080/00207543.2011.653838.
Karl, T.R. and Koss, W.J., 1984. Historical Climatology Series 4-3: Regional and
National Monthly, Seasonal and Annual Temperature Weighted by Area, 1895-1983.
Katsikopoulos, K. and Syntetos, A.A., 2016. Bias-Variance Trade-offs in Demand
Forecasting. Foresight: The International Journal of Applied Forecasting, (40), pp.12–
20.
Katz, R.W. and Lazo, J.K., 2011. Economic Value of Weather and Climate Forecasts. In
Oxford Handbooks Online. pp. 1–32. Available at:
http://books.google.com/books?hl=en&lr=&id=FnTVdEfsY2oC&pgis=1.
Katz, R.W. and Murphy, A.H., 1990. Quality/value relationships for imperfect weather
forecasts in a prototype multistage decision-making model. Journal of Forecasting, 9(1),
pp.75–86.
Katz, R.W. and Murphy, A.H. eds., 1997. Economic Value of Weather and Climate
Forecasts. Cambridge University Press.
Kaufmann, P.J. and Lafontaine, F., 1994. Costs of Control: The Source of Economic
Rents for McDonald’s Franchisees. The Journal of Law and Economics 37 (2): 417-453.
doi:10.1086/467319.
Kelle, P. and Silver, E.A., 1989. Forecasting the Returns of Reusable Containers. Journal
of Operations Management, 8(1), pp.17–35.
Kerkkänen, A., Korpela, J. and Huiskonen, J., 2009. Demand forecasting errors in
industrial context: Measurement and impacts. International Journal of Production
Economics, 118(1), pp.43–48.
Khouja, M., 1998. An aggregate production planning framework for the evaluation of
volume flexibility. Production Planning & Control, 9(2), pp.127-137.
Khouja, M. and Kumar, R.L., 2002. Information technology investments and volume-
flexibility in production systems. International Journal of Production Research, 40(1),
pp.205–221.
Kiely, D., 2004. The State of Pharmaceutical Industry Supply Planning and Demand
Forecasting. Journal of Business Forecasting Methods & Systems, 23(3), pp.20–22.
Available at: http://search.proquest.com/docview/226914243?accountid=10297.
Kim, J.S., Shin, K.Y. and Ahn, S.E., 2003. A Multiple Replenishment Contract with
ARIMA Demand Processes. Journal of the Operational Research Society, 54(11),
pp.1189–1197.
King, M.A., Abrahams, A.S. and Ragsdale, C.T., 2014. Ensemble methods for advanced
skier days prediction. Expert Systems with Applications, 41, pp.1176–1188.
Kitchin, R. and McArdle, G., 2016. What makes Big Data, Big Data? Exploring the
ontological characteristics of 26 datasets. Big Data & Society, 3(1), pp.1–10. Available
at: http://bds.sagepub.com/lookup/doi/10.1177/2053951716631130.
Koksalan, M., Erkip, N. and Moskowitz, H., 1999. Explaining beer demand: A residual
modeling regression approach using statistical process control. International Journal of
Production Economics, 58(3), pp.265–276. Available at:
http://linkinghub.elsevier.com/retrieve/pii/S0925527398002072.
Kolassa, S., 2016. Sometimes It’s Better to Be Simple than Correct. Foresight:
International Journal of Applied Forecasting, Winter, pp.20–27.
Kolassa, S., 2016. Evaluating predictive count data distributions in retail sales
forecasting. International Journal of Forecasting, 32(3), pp.788–803. Available at:
http://dx.doi.org/10.1016/j.ijforecast.2015.12.004.
Kolassa, S. and Martin, R., 2011. Percentage errors can ruin your day (and rolling the
dice shows how). Foresight: International Journal of Applied Forecasting, Fall, pp.21–
27.
Krueger, A.B., 1991. Ownership, Agency, and Wages: An Examination of Franchising in
the Fast Food Industry. The Quarterly Journal of Economics 106 (1): 75-101.
doi:10.2307/2937907.
Kuhn, M. and Johnson, K., 2013. Applied Predictive Modeling. Springer.
doi:10.1007/978-1-4614-6849-3.
Kung, L.C. and Chen, Y.J., 2014. Impact of Reseller’s and Sales Agent’s Forecasting
Accuracy in a Multilayer Supply Chain. Naval Research Logistics, 61(3), pp.207–222.
Available at: http://www.interscience.wiley.com/jpages/0894-069X/.
Kurtuluş, M., Ülkü, S. and Toktay, B.L., 2012. The Value of Collaborative Forecasting in
Supply Chains. Manufacturing & Service Operations Management, 14(1), pp.82–98.
Available at: http://pubsonline.informs.org/doi/abs/10.1287/msom.1110.0351.
Lackes, R., Schlüter, P. and Siepermann, M., 2016. The impact of contract parameters on
the supply chain performance under different power constellations. International Journal
of Production Research, 54(1), pp.251–264. Available at:
http://www.tandfonline.com/doi/full/10.1080/00207543.2015.1076943.
Lafontaine, F., 1992. Agency Theory and Franchising: Some Empirical Results. The
RAND Journal of Economics 23 (2): 263. doi:10.2307/2555988.
Lazo, J.K., Lawson, M., Larsen, P.H. and Waldman, D.M., 2011. U.S. economic
sensitivity to weather variability. Bulletin of the American Meteorological Society, 92(6),
pp.709–720.
LeBlanc, L., Hill, J. and Harder, J., 2009. Modeling uncertain forecast accuracy in supply
chains with postponement. Journal of Business Logistics, 30(1), pp.19–31. Available at:
http://onlinelibrary.wiley.com/doi/10.1002/j.2158-1592.2009.tb00097.x/full.
Lee, H.L., 1996. Effective inventory and service management through product and
process redesign. Operations Research, 44(1), pp.151–159.
Lee, H.L., Padmanabhan, V. and Whang, S., 1997. Information distortion in a supply
chain: The bullwhip effect. Management Science, 43(4), pp.546–558.
Lee, H.L., So, K.C. and Tang, C.S., 2000. The Value of Information Sharing in a Two-
Level Supply Chain. Management Science, 46(5), pp.626–643. Available at:
http://www.jstor.org/stable/2661463.
Lee, T.S. and Adam Jr., E.E., 1986. Forecasting Error Evaluation in Material
Requirements Planning (MRP) Production-Inventory Systems. Management Science,
32(9), pp.1186–1205.
Lee, T.S., Adam Jr., E.E. and Ebert, R.J., 1987. An Evaluation of Forecast Error In
Master Production Scheduling for Material Requirements Planning Systems. Decision
Sciences, 18, pp.292–307.
Lee, T.S., Cooper, F.W. and Adam Jr., E.E., 1993. The Effects of Forecasting Errors on
the Total Cost of Operations. Omega: International Journal of Management Science,
21(5), pp.541–550.
Li, C., Luo, X., Zhang, C. and Wang, X., 2017. Sunny, Rainy, and Cloudy with a Chance
of Mobile Promotion Effectiveness. Marketing Science.
Li, Y., Ye, F. and Lin, Q., 2015. Optimal lead time policy for short life cycle products
under Conditional Value-at-Risk criterion. Computers and Industrial Engineering, 88,
pp.354–365. Available at: http://dx.doi.org/10.1016/j.cie.2015.07.011.
Liao, W.T. and Chang, P.C., 2010. Impacts of forecast, inventory policy, and lead time
on supply chain inventory: A numerical study. International Journal of Production
Economics, 128(2), pp.527–537.
Lin, N.P., Krajewski, L., Leong, G.K. and Benton, W.C., 1994. The effects of
environmental factors on the design of master production scheduling systems. Journal of
Lorentz, H., Wong, C.Y. and Hilmola, O.P., 2007. Emerging distribution systems in
Central and Eastern Europe: Implications from two case studies. International Journal of
Physical Distribution & Logistics Management, 37(8), pp.670–697.
Ma, Y., Wang, N., Che, A., Huang, Y. and Xu, J., 2013. The bullwhip effect under
different information-sharing settings: a perspective on price-sensitive demand that
incorporates price dynamics. International Journal of Production Research, 51(10),
pp.3085–3116.
Ma, Y., Wang, N., Che, A., Huang, Y. and Xu, J., 2013. The bullwhip effect on product
orders and inventory: a perspective of demand forecasting techniques. International
Journal of Production Research, 51(1), pp.281–302. Available at:
http://www.tandfonline.com/doi/abs/10.1080/00207543.2012.676682.
Mahoney, J.T., 2005. Economic Foundations of Strategy. Thousand Oaks, CA: Sage.
Makridakis, S.G., 1993. Accuracy measures: theoretical and practical concerns.
International Journal of Forecasting, 9, pp.527–529.
Makridakis, S.G. and Hibon, M., 2000. The M3-Competition: results, conclusions and
implications. International Journal of Forecasting, 16, pp.451–476.
Makridakis, S.G. and Wheelwright, S.C., 1989. Forecasting methods for management.
Marmier, F. and Cheikhrouhou, N., 2010. Structuring and integrating human knowledge
in demand forecasting: a judgmental adjustment approach. Production Planning &
Control, 21(4), pp.399–412.
Masuchun, W., Davis, S. and Patterson, J.W., 2004. Comparison of push and pull control
strategies for supply network management in a make-to-stock environment. International
Journal of Production Research, 42(20), pp.4401–4419.
Mathews, B.P. and Diamantopoulos, A., 1994. Towards a taxonomy of forecast error
measures: a factor-comparative investigation of forecast error dimensions. Journal of
Forecasting, 13(4), pp.409–416.
Mathewson, G.F. and Winter, R.A., 1985. The Economics of Franchise Contracts. The
Journal of Law and Economics 28 (3): 503-526. doi:10.1086/467099.
Maydeu-Olivares, A. and Garcia-Forero, C., 2010. Goodness-of-Fit Testing.
International Encyclopedia of Education, 7(1), pp.190–196.
McAfee, A. and Brynjolfsson, E., 2012. Big Data. The management revolution. Harvard
Business Review, 90(10), pp.61–68. Available at:
http://www.buyukverienstitusu.com/s/1870/i/Big_Data_2.pdf.
McCarthy, T.M. and Golicic, S.L., 2002. Implementing collaborative forecasting to
improve supply chain performance. International Journal of Physical Distribution &
Logistics Management, 32(6), pp.431–454.
Megahed, F.M. and Jones-Farmer, A., 2013. A Statistical Process Monitoring Perspective
on “Big Data.” Frontiers in Statistical Quality Control, 11th ed, p.21. Available at:
http://www.eng.auburn.edu/users/fmm0002/ISQC2013Paper.pdf.
Mena, C., Terry, L.A., Williams, A. and Ellram, L., 2014. Causes of waste across multi-
tier supply networks: Cases in the UK food sector. International Journal of Production
Economics, 152, pp.144–158. Available at: http://dx.doi.org/10.1016/j.ijpe.2014.03.012.
Mentzer, J.T. and Cox Jr., J.E., 1984. A Model of the Determinants of Achieved Forecast
Accuracy. Journal of Business Logistics, 5(2), pp.143–155.
Metters, R., 1997. Quantifying the bullwhip effect in supply chains. Journal of
Operations Management, 15(2), pp.89–100.
Michael, S.C., 2000. The Effect of Organizational Form on Quality: The Case of
Franchising. Journal of Economic Behavior & Organization 43 (3): 295-318.
doi:10.1016/s0167-2681(00)00125-6.
Minka, T.P., 2003. A Comparison of Numerical Optimizers for Logistic
Regression. Unpublished Draft.
Miyaoka, J. and Hausman, W., 2004. How a Base Stock Policy Using “Stale” Forecasts
Provides Supply Chain Benefits. Manufacturing & Service Operations Management,
6(2), pp.149–162.
Miyaoka, J. and Hausman, W.H., 2008. How Improved Forecasts Can Degrade
Decentralized Supply Chains. Manufacturing & Service Operations Management, 10(3),
pp.547–562.
Mjelde, J.W., Sonka, S.T. and Peel, D.S., 1989. Climate and Weather forecasting: A
Review. Water, 7495(August), p.22.
Mjelde, J.W. and Dixon, B.L., 1993. Valuing the lead time of periodic forecasts in
dynamic production systems. Agricultural Systems, 42(1–2), pp.41–55.
Moon, M.A., 2015. Commentary on Improving Forecast Quality in Practice. Foresight:
The International Journal of Applied Forecasting, 25(1), p.24. Available at:
http://content.wkhealth.com/linkback/openurl?sid=WKPTLP:landingpage&an=00001577
-201325010-00006.
Morlidge, S., 2014. Forecast Quality in the Supply Chain. Foresight: International
Journal of Applied Forecasting, Spring, pp.26–32.
Morlidge, S., 2013. How Good Is a “Good” Forecast? Forecast Errors and Their
Avoidability. Foresight: International Journal of Applied Forecasting, Summer, pp.5–
12.
Morlidge, S., 2014. Using Relative Error Metrics to Improve Forecast Quality in the
Supply Chain. Foresight: The International Journal of Applied Forecasting, Summer
(34), pp.39–47.
Murphy, A.H., 1977. The Value of Climatological, Categorical and Probabilistic
Forecasts in the Cost-Loss Ratio Situation. Monthly Weather Review, 105(7), pp.803–
816.
Murphy, A.H., 1993. What Is a Good Forecast? An Essay on the Nature of Goodness in
Weather Forecasting. Weather and Forecasting, 8(2), pp.281–293.
Murphy, A.H., 1997. Forecast Verification. Economic value of weather and climate
forecasts, pp.19-74.
Murray, K.B., Di Muro, F., Finn, A. and Leszczyc, P.P., 2010. The effect of weather on
consumer spending. Journal of Retailing and Consumer Services, 17(6), pp.512–520.
Available at: http://dx.doi.org/10.1016/j.jretconser.2010.08.006.
Nagashima, M., Wehrle, F.T., Kerbache, L. and Lassagne, M., 2015. Impacts of adaptive
collaboration on demand forecasting accuracy of different product categories throughout
the product life cycle. Supply Chain Management: An International Journal, 20(4),
pp.415–433.
National Research Council, 2013. Frontiers in Massive Data Analysis. Technical Paper
No. 201402. Board on Mathematical Sciences and Their Applications, National Research
Council. Washington, D.C.: National Academies Press.
NCEI (National Center for Environmental Information), 2017. U.S. Local Climatological
Data. Rep. Asheville, NC. Web. https://www.ncdc.noaa.gov/orders/qclcd/.
NGIA (National Geospatial Intelligence Agency), 2014. World Geodetic System 1984
(WGS 84). Office of Geomatics. http://earth-info.nga.mil/GandG/wgs84/index.html.
Niakan, F. and Rahimi, M., 2015. A multi-objective healthcare inventory routing
problem; a fuzzy possibilistic approach. Transportation Research Part E: Logistics and
Transportation Review, 80, pp.74–94. Available at:
http://dx.doi.org/10.1016/j.tre.2015.04.010.
Nikolopoulos, K. and Fildes, R., 2013. Adjusting supply chain forecasts for short-term
temperature estimates: A case study in a Brewing company. IMA Journal of Management
Mathematics, 24(1), 79–88. https://doi.org/10.1093/imaman/dps006.
Noren, D.L., 1990. The Economics Of The Golden Arches: A Case Study of the
McDonald’s System. The American Economist 34 (2) (Fall 1990): 60-64.
Norton, S.W., 1988. Franchising, Brand Name Capital, And The Entrepreneurial
Capacity Problem. Strategic Management Journal. 9 (S1): 105-114.
doi:10.1002/smj.4250090711.
Norton, S.W., 1988. An Empirical Look at Franchising As an Organizational Form. The
Journal of Business 61 (2): 197. doi:10.1086/296428.
Oliva, R. and Watson, N., 2009. Managing Functional Biases in Organizational
Forecasts: A Case Study of Consensus Forecasting in Supply Chain Planning. Production
and Operations Management, 18(2), pp.138–151.
OSC, 1987. Ohio Supercomputer Center. Columbus OH.
http://osc.edu/ark:/19495/f5s1ph73.
Oxenfeldt, A.R. and Kelly, A.O., 1969. Will Successful Franchise Systems Ultimately
Become Wholly-Owned Chains? Journal Of Retailing 44 (4): 69-83.
Paik, Y. and Choi, D.Y., 2007. Control, Autonomy and Collaboration in the Fast Food
Industry: A Comparative Study between Domestic and International
Franchising. International Small Business Journal 25 (5): 539-562.
doi:10.1177/0266242607080658.
Palmer, B., 2013. Long-Term Weather Forecasts Are A Long Way From Accurate.
Washington Post. https://www.washingtonpost.com/national/health-science/long-term-
weather-forecasts-are-a-long-way-from-accurate/2013/04/15/1f9a2ac8-a05b-11e2-be47-
b44febada3a8_story.html?utm_term=.8434156533de.
Parsons, A.G., 2001. The Association between Daily Weather and Daily Shopping
Patterns. Australasian Marketing Journal (AMJ), 9(2), pp.78–84. Available at:
http://dx.doi.org/10.1016/S1441-3582(01)70177-2.
Paul, A., Tan, Y. and Vakharia, A.J., 2015. Inventory Planning for a Modular Product
Family. Production and Operations Management, 24(7), pp.1033–1053.
Pearson, M., Masson, R. and Swain, A., 2010. Process control in an agile supply chain
network. International Journal of Production Economics, 128(1), pp.22–30.
Penrose, E.T., 1959. The Theory of the Growth of the Firm. Oxford: Oxford University
Press.
Persinger, M. and Levesque, B.F., 1983. Geophysical variables and behavior: XII. The
weather matrix accommodates large portions of variance of measured daily mood.
Perceptual and Motor Skills, 57, pp.868–870.
Persona, A., Battini, D., Manzini, R. and Pareschi, A., 2007. Optimal safety stock levels
of subassemblies and manufacturing components. International Journal of Production
Economics, 110(1–2), pp.147–159.
Peter, Ď. and Silvia, P., 2012. ARIMA vs. ARIMAX – which approach is better to
analyze and forecast macroeconomic time series? In Proceedings of 30th International
Conference Mathematical Methods in Economics. pp. 136–140.
Petersen, H., 2003. Integrating the Forecasting Process with the Supply Chain: Bayer
Healthcare’s Journey. Journal of Business Forecasting Methods & Systems, 22(4), pp.11–
16.
Potgieter, A.B., Everingham, Y.L. and Hammer, G.L., 2003. On measuring quality of a
probabilistic commodity forecast for a system that incorporates seasonal climate
forecasts. International Journal of Climatology, 23(10), pp.1195–1210.
Rajaram, K. and Tang, C.S., 2001. The impact of product substitution on retail
merchandising. European Journal of Operational Research, 135(3), pp.582-601.
Rajesh, R. and Ravi, V., 2015. Modeling enablers of supply chain risk mitigation in
electronic supply chains: A Grey-DEMATEL approach. Computers and Industrial
Engineering, 87, pp.126–139. Available at: http://dx.doi.org/10.1016/j.cie.2015.04.028.
Ramanathan, U., 2013. Aligning supply chain collaboration using Analytic Hierarchy
Process. Omega: International Journal of Management Science, 41(2), pp.431–440.
Available at: http://linkinghub.elsevier.com/retrieve/pii/S0305048312000552.
Ramanathan, U. and Muyldermans, L., 2010. Identifying demand factors for promotional
planning and forecasting: A case of a soft drink company in the UK. International
Journal of Production Economics, 128(2), pp.538–545.
Ritzman, L.P. and King, B.E., 1993. The relative significance of forecast errors in
multistage manufacturing. Journal of Operations Management, 11(1), pp.51–65.
Ritzman, L.P. and Sanders, N.R., 2001. Judgmental Adjustment of Statistical Forecasts.
In J. S. Armstrong, ed. Principles of Forecasting. New York: Springer Science+Business
Media.
Rodrigues, P.M.M. and Osborn, D.R., 1999. Performance of seasonal unit root tests for
monthly data. Journal of Applied Statistics, 26(8), pp.985–1004.
Rostami‐Tabar, B., Babai, M.Z., Syntetos, A.A. and Ducq, Y., 2013. Demand
Forecasting by Temporal Aggregation. Naval Research Logistics, 60(6), pp.479–498.
Available at: http://www.interscience.wiley.com/jpages/0894-069X/.
Rubin, P.H., 1978. The Theory of the Firm and the Structure of the Franchise
Contract. The Journal of Law and Economics 21 (1): 223-233. doi:10.1086/466918.
Samenow, J. and Fritz, A., 2015. Five Myths about Weather Forecasting. Washington
Post. Available at: https://www.washingtonpost.com/opinions/five-myths-about-weather-
forecasting/2015/01/02/e49e8950-8b86-11e4-a085-
34e9b9f09a58_story.html?utm_term=.18b10a1622a0.
Sanchez-Lugo, E., 2017. U.S. Climate Regions. National Climatic Data Center, NOAA,
www.ncdc.noaa.gov/monitoring-references/maps/us-climate-regions.php.
Sanders, J.L. and Brizzolara, M.S., 1982. Relationships between weather and mood. The
Journal of General Psychology, 107(1), pp.155-156.
Sanders, N.R. and Graman, G.A., 2009. Quantifying costs of forecast errors: A case study
of the warehouse environment. Omega: International Journal of Management Science,
37(1), pp.116–125.
Sanders, N.R. and Graman, G.A., 2016. Impact of Bias Magnification on Supply Chain
Costs: The Mitigating Role of Forecast Sharing. Decision Sciences, 47(5), pp.881–906.
Sanders, N.R. and Manrodt, K.B., 2003. The efficacy of using judgmental versus
quantitative forecasting methods in practice. Omega: International Journal of
Sanders, N.R. and Ritzman, L.P., 2004. Using Warehouse Workforce Flexibility to Offset
Forecast Errors. Journal of Business Logistics, 25(2), pp.251–270.
Sanders, N.R. and Ritzman, L.P., 2004. Integrating judgmental and quantitative forecasts:
methodologies for pooling marketing and operations information. International Journal
of Operations & Production Management, 24(5), pp.514-529.
Sanders, N.R. and Ritzman, L.P., 1995. Bringing judgment into combination forecasts.
Journal of Operations Management, 13(4), pp.311–321.
Schmitt, T.G., 1984. Resolving uncertainty in manufacturing systems. Journal of
Sethi, S.P., Yan, H., Zhang, H. and Zhou, J., 2007. A supply chain with a service
requirement for each market signal. Production and Operations Management, 16(3),
pp.322–342.
Sharma, A. and Paliwal, K.K., 2007. Fast Principal Component Analysis Using Fixed-
Point Algorithm. Pattern Recognition Letters, 28(10), pp.1151–1155.
Shelton, J.P., 1967. Allocative Efficiency vs. "X-Efficiency": Comment. The American
Economic Review 57 (5): 1252-1258.
Shin, H. and Tunca, T.I., 2010. Do Firms Invest in Forecasting Efficiently? The Effect of
Competition on Demand Forecast Investments and Supply Chain Coordination.
Operations Research, 58(6), pp.1592–1610.
Silver, E.A., Pyke, D.F. and Peterson, R., 1998. Inventory Management and Production
Planning and Scheduling (Vol. 3, p. 30). New York: Wiley.
Silver, N., 2012. The signal and the noise: Why so many predictions fail-but some don't.
Penguin.
Smith, C.D. and Mentzer, J.T., 2010. User Influence on the Relationship between
Forecast Accuracy, Application and Logistics Performance. Journal of Business
Logistics, 31(1), pp.159–177.
Smith, K., 1993. The influence of weather and climate on recreation and tourism.
Weather, 48(12), pp.398–404.
Smyth, K.B., Croxton, K.L., Franklin, R. and Knemeyer, A.M., 2017. Thirsty in an
Ocean of Data? Pitfalls and Practical Strategies when Partnering with Industry on Big
Data Supply Chain Research. Journal of Business Logistics. Manuscript submitted for
publication (copy on file with author).
Sourirajan, K., Ramachandran, B. and An, L., 2008. Application of control theoretic
principles to manage inventory replenishment in a supply chain. International Journal of
Production Research, 46(21), pp.6163–6188.
Southern, R.N., 2011. Historical perspective of the logistics and supply chain
management discipline. Transportation Journal, 50(1), pp.53-64.
Spies, K., Hesse, F. and Loesch, K., 1997. Store atmosphere, mood and purchasing
behavior. International Journal of Research in Marketing, 14, pp.1–17.
Sridharan, V. and LaForge, L.R., 1989. The impact of safety stock on schedule
instability, cost and service. Journal of Operations Management, 8(4), pp.327–347.
Starr-McCluer, M., 2000. The Effects of Weather on Retail Sales. Finance and
Economics Discussion Series Working Paper, 2000–8 (Federal Reserve Board of
Governors).
Steele, A.T., 1951. Weather’s Effect on the Sales of a Department Store. Journal of
Marketing, 15(4), pp.436–443.
Steinker, S., Hoberg, K. and Thonemann, U.W., 2016. The Value of Weather Information
for E‐commerce Operations. Production and Operations Management.
Stern, H. and Davidson, N.E., 2015. Trends in the skill of weather prediction at lead
times of 1–14 days. Quarterly Journal of the Royal Meteorological Society, (October),
pp.2726–2736.
Stockton, N., 2016. Deep Thunder can forecast the weather – down to a city block. Wired
Magazine. https://www.wired.com/2016/06/deep-thunder-can-forecast-weather-city-
block/.
Suddath, C., 2014. The Weather Channel's Secret: Less Weather, More Clickbait.
Bloomberg. https://www.bloomberg.com/news/articles/2014-10-09/weather-channels-
web-mobile-growth-leads-to-advertising-insights.
Syntetos, A.A., Babai, Z., Boylan, J.E., Kolassa, S. and Nikolopoulos, K., 2016. Supply
chain forecasting: Theory, practice, their gap and the future. European Journal of
Operational Research, 252(1), pp.1-26.
Syntetos, A.A. and Boylan, J.E., 2001. On the bias of intermittent demand estimates.
International Journal of Production Economics, 71, pp.457–466.
Syntetos, A.A. and Boylan, J.E., 2005. The accuracy of intermittent demand estimates.
International Journal of Forecasting, 21, pp.303–314.
Syntetos, A.A. and Boylan, J.E., 2010. On the variance of intermittent demand estimates.
International Journal of Production Economics, 128(2), pp.546–555.
Syntetos, A.A., Boylan, J.E. and Croston, J.D., 2005. On the categorization of demand
patterns. Journal of the Operational Research Society, 56(5), pp.495-503.
Syntetos, A.A., Boylan, J.E. and Disney, S.M., 2009. Forecasting For Inventory
Planning: A 50-Year Review. Journal of the Operational Research Society. 60: S149-
S160. doi:10.1057/jors.2008.173.
Syntetos, A.A., Nikolopoulos, K. and Boylan, J.E., 2010. Judging the judges through
accuracy-implication metrics: The case of inventory forecasting. International Journal of
Forecasting, 26(1), pp.134–143.
Tate, W.L., Ellram, L.M. and Dooley, K.J., 2012. Environmental purchasing and supplier
management (EPSM): Theory and practice. Journal of Purchasing and Supply
Management, 18(3), pp.173-188.
Taylor, T.A. and Xiao, W., 2010. Does a Manufacturer Benefit from Selling to a Better-
Forecasting Retailer? Management Science, 56(9), pp.1584–1598.
Terwiesch, C., Ren, Z.J., Ho, T.H. and Cohen, M.A., 2005. An Empirical Analysis of
Forecast Sharing in the Semiconductor Equipment Supply Chain. Management Science,
51(2), pp.208–220. Available at:
http://pubsonline.informs.org/doi/abs/10.1287/mnsc.1040.0317.
Thiel, D., Vo, T.L.H. and Hovelaque, V., 2014. Forecasts impacts on sanitary risk during
a crisis: a case study. The International Journal of Logistics Management, 25(2), pp.358–
378.
Theil, H., 1966. Applied economic forecasting. Chicago, IL: Rand McNally.
Thompson, J.C. and Brier, G.W., 1955. The Economic Utility of Weather Forecasts.
Monthly Weather Review, 83(11), pp.249–254.
Thompson, J.C., 1962. Economic Gains from Scientific Advances and Operational
Improvements in Meteorological Prediction. Journal of Applied Meteorology, 1(March).
Tibben-Lembke, R.S. and Amato, H.N., 2001. Replacement Parts Management: The
Value of Information. Journal of Business Logistics, 22(2), pp.149–165.
Toulis, P. and Airoldi, E.M., 2015. Scalable Estimation Strategies Based On Stochastic
Approximations: Classical Results and New Insights. Statistics and Computing, 25(4),
pp.781–795.
Tran, B. R., 2016. Blame it on the Rain - Weather Shocks and Retail Sales. Retrieved
from http://econweb.ucsd.edu/~brothtra/pdfs/BlameItOnTheRain.pdf.
Tranfield, D., Denyer, D. and Smart, P., 2003. Towards a methodology for developing
evidence‐informed management knowledge by means of systematic review. British
Journal of Management, 14(3), pp.207-222.
Trapero, J.R., Kourentzes, N. and Fildes, R., 2012. Impact of information exchange on
supplier forecasting performance. Omega: International Journal of Management Science,
40(6), pp.738–747. Available at: http://dx.doi.org/10.1016/j.omega.2011.08.009.
Tratar, L.F., 2010. Joint optimisation of demand forecasting and stock control
parameters. International Journal of Production Economics, 127(1), pp.173–179.
Available at: http://dx.doi.org/10.1016/j.ijpe.2010.05.009.
Tribbia, J.J., 1997. Weather prediction. Economic value of weather and climate forecasts,
pp.1-12.
Tucker, P. and Gilliland, J., 2007. The effect of season and weather on physical activity:
A systematic review. Public Health, 121, pp.909–922.
Ülkü, S., Toktay, L.B. and Yücesan, E., 2007. Risk Ownership in Contract
Manufacturing. Manufacturing & Service Operations Management, 9(3), pp.225–241.
Available at: http://pubsonline.informs.org/doi/abs/10.1287/msom.1060.0141.
van der Laan, E., van Dalen, J., Rohrmoser, M. and Simpson, R., 2016. Demand
forecasting and order planning for humanitarian logistics: An empirical assessment.
Journal of Operations Management, 45, pp.114–122. Available at:
http://dx.doi.org/10.1016/j.jom.2016.05.004.
Velicer, W.F., 1976. Determining the Number of Components from the Matrix of Partial
Correlations. Psychometrika, 41(3), pp.321–327.
Venkataraman, R. and D'Itri, M.P., 2001. Rolling horizon master production schedule
performance: a policy analysis. Production Planning & Control, 12(7), pp.669–679.
Venkataraman, R. and Nathan, J., 1999. Effect of forecast errors on rolling horizon
master production schedule cost performance for various replanning intervals. Production
Planning & Control, 10(7), pp.682-689.
Vose, R.S., Applequist, S., Durre, I., Menne, M.J., Williams, C.N., Fenimore, C.,
Gleason, K. and Arndt, D., 2014. Improved Historical Temperature and Precipitation
Time Series For U.S. Climate Divisions. Journal of Applied Meteorology and
Climatology. doi:10.1175/JAMC-D-13-0248.1.
Wacker, J.G. and Sprague, L.G., 1995. The impact of institutional factors on forecast
accuracy: manufacturing executives’ perspective. International Journal of Production
Research, 33(11), pp.2945–2958.
Wacker, J.G. and Sprague, L.G., 1998. Forecasting accuracy: comparing the relative
effectiveness of practices between seven developed countries. Journal of Operations
Management, 16, pp.271–290.
Waller, M.A. and Fawcett, S.E., 2013. Data science, predictive analytics, and big data: A
revolution that will transform supply chain design and management. Journal of Business
Logistics, 34(2), pp.77–84.
Waller, M.A. and Fawcett, S.E., 2013. Click Here for a Data Scientist: Big Data,
Predictive Analytics, and Theory Development in the Era of a Maker Movement Supply
Chain. Journal of Business Logistics, 34(4), pp.249–252.
Wallstrom, P. and Segerstedt, A., 2010. Evaluation of forecasting error measurements
and techniques for intermittent demand. International Journal of Production Economics,
128, pp.625–636.
Wan, X. and Sanders, N.R., 2017. The negative impact of product variety: Forecast bias,
inventory levels, and the role of vertical integration. International Journal of Production
Economics, 186(July 2016), pp.123–131.
Wemmerlöv, U., 1985. Comments on “Cost Increases Due To Demand Uncertainty in
MRP Lot Sizing”: A Verification of Ordering Probabilities. Decision Sciences, 16(4),
pp.410–419.
Wemmerlöv, U., 1989. The behavior of lot-sizing procedures in the presence of forecast
errors. Journal of Operations Management, 8(1), pp.37–47.
Wemmerlöv, U., 1984. Assemble-To-Order Manufacturing: Implications for Materials
Management. Journal of Operations Management, 4(4), pp.347–368.
Werdigier, J., 2009. British Grocery Chain Uses The Weather to Predict Sales. New York
Times, September 1, p. B6.
Wickramatillake, C.D., Koh, L.S.C., Gunasekaran, A. and Arunachalam, S., 2007.
Measuring performance within the supply chain of a large scale project. Supply Chain
Management: An International Journal, 12(1), pp.52–59.
Widiarta, H., Viswanathan, S. and Piplani, R., 2006. On the Effectiveness of Top-Down
Strategy for Forecasting Autoregressive Demands. Naval Research Logistics, 54(Mar
2007), pp.176–188. Available at: http://www.interscience.wiley.com/jpages/0894-069X/.
Wieland, A. and Wallenburg, C.M., 2012. Dealing with supply chain risks: Linking risk
management practices and strategies to performance. International Journal of Physical
Distribution & Logistics Management, 42(10), pp.887-905.
Willemain, T.R., Smart, C.N. and Schwarz, H.F., 2004. A new approach to forecasting
intermittent demand for service parts inventories. International Journal of Forecasting,
20(3), pp.375–387.
Williams, B.D. and Tokar, T., 2008. A review of inventory management research in
major logistics journals: Themes and future directions. The International Journal of
Logistics Management, 19(2), pp.212–232.
Williams, B.D. and Waller, M.A., 2011. Estimating a retailer’s base stock level: an
optimal distribution center order forecast policy. Journal of the Operational Research
Society, 62(4), pp.662–666.
Williams, B.D. and Waller, M.A., 2011. Top-down versus bottom-up demand forecasts:
The value of shared point-of-sale data in the retail Supply Chain. Journal of Business
Logistics, 32(1), pp.17–26.
Williams, B.D. and Waller, M.A., 2010. Creating Order Forecasts: Point-of-Sale or Order
History? Journal of Business Logistics, 31(2), pp.231–251. Available at:
http://doi.wiley.com/10.1002/j.2158-1592.2010.tb00150.x.
Winklhofer, H., Diamantopoulos, A. and Witt, S., 1996. Forecasting practice: A review
of the empirical literature and an agenda for future research. International Journal of
http://www.sciencedirect.com/science/article/pii/0169207095006478.
Winter, M. and Knemeyer, A.M., 2013. Exploring the integration of sustainability and
supply chain management: Current state and opportunities for future inquiry.
International Journal of Physical Distribution & Logistics Management, 43(1), pp.18–38.
Wooldridge, J.M., 2010. Econometric Analysis of Cross Section and Panel Data. MIT
Press.
Wooldridge, J.M., 2015. Introductory Econometrics: A Modern Approach. Nelson
Education.
Xie, J., Lee, T.S. and Zhao, X., 2004. Impact of forecasting error on the performance of
capacitated multi-item production systems. Computers & Industrial Engineering, 46,
pp.205–219.
Yang, K.K. and Jacobs, F.R., 1999. Replanning the Master Production Schedule for a
Capacity-Constrained Job Shop. Decision Sciences, 30(3), pp.719–748.
Yao, Y., Kohli, R., Sherer, S.A. and Cederlund, J., 2013. Learning curves in collaborative
planning, forecasting, and replenishment (CPFR) information systems: An empirical
analysis from a mobile phone manufacturer. Journal of Operations Management, 31(6),
pp.285–297.
Yao, D.Q., Yue, X., Wang, X. and Liu, J.J., 2005. The impact of information sharing on a
returns policy with the addition of a direct channel. International Journal of Production
Economics, 97(2), pp.196–209.
Yelland, P.M., 2010. Bayesian forecasting of parts demand. International Journal of
Yin, X. and Zajac, E.J., 2004. The Strategy/Governance Structure Fit Relationship:
Theory and Evidence in Franchising Arrangements. Strategic Management Journal, 25(4),
pp.365–383. doi:10.1002/smj.389.
Yokum, J.T. and Armstrong, J.S., 1995. Beyond accuracy: Comparison of criteria used to
select forecasting methods. International Journal of Forecasting, 11(4), pp.591–597.
Zhao, X., Lai, F. and Lee, T.S., 2001. Evaluation of safety stock methods in multilevel
material requirements planning (MRP) systems. Production Planning & Control, 12(8),
pp.794–803.
Zhao, X. and Lee, T.S., 1993. Freezing the master production schedule for material
requirements planning systems under demand uncertainty. Journal of Operations
Management, 11, pp.185–205.
Zhao, X. and Xie, J., 2002. Forecasting errors and the value of information sharing in a
supply chain. International Journal of Production Research, 40(2), pp.311–335.
Zhao, X., Xie, J. and Leung, J., 2002. The impact of forecasting model selection on the
value of information sharing in a supply chain. European Journal of Operational
Research, 142(2), pp.321–344.
Zhao, X., Xie, J. and Wei, J.C., 2002. The Impact of Forecast Errors on Early Order
Commitment in a Supply Chain. Decision Sciences, 33(2), pp.251–280. Available at:
http://doi.wiley.com/10.1111/j.1540-5915.2002.tb01644.x.
Zhu, X., Mukhopadhyay, S.K. and Yue, X., 2011. Role of forecast effort on supply chain
profitability under various information sharing scenarios. International Journal of
Production Economics, 129(2), pp.284–291. Available at:
Zinn, W. and Marmorstein, H., 1990. Comparing two alternative methods of determining safety
stock levels: The demand and the forecast systems. Journal of Business Logistics, 11(1),
pp.95–110.
Zivin, J.G. and Neidell, M., 2014. Temperature and the Allocation of Time: Implications
for Climate Change. Journal of Labor Economics, 32(1).
Zotteri, G., Kalchschmidt, M. and Caniato, F., 2005. The impact of aggregation level on
forecasting performance. International Journal of Production Economics, 94, pp.479–
491.
Zotteri, G., Kalchschmidt, M. and Saccani, N., 2014. Forecasting by Cross-Sectional
Aggregation. Foresight: The International Journal of Applied Forecasting, 35, pp.35–42.
Zwick, W.R. and Velicer, W.F., 1986. Comparison of Five Rules for Determining the
Number of Components to Retain. Psychological Bulletin, 99(3), p.432.
Appendix A: Boolean Keywords for EBSCO Business Source Complete
The following keywords were applied to Publication Name, Title, Subject, Keywords, or
Abstract within the EBSCO Business Source Complete database:
((forecast* N5 (error OR accura* OR quality OR deviation* OR performance OR
consistency OR reliability OR precision OR bias))
AND
((supply N5 chain*) OR logistic*)
AND
(((bull* N5 *whip) OR "industrial dynamics" OR "system dynamics")
OR ((trade* N5 *off) OR (break* N5 *even) OR cost* OR "economic impact")
OR (aggregat* OR hierarchical OR combination OR composite OR synthesis OR
consensus OR pooling)
OR (judgement* OR subject* OR adjust*)
OR (information AND shar*)
OR (metric* OR measure)
OR (over* N5 *fit*)
OR (((safety OR buffer) N5 (stock OR inventory)) OR "lead time" OR "stock policy" OR
inventory)
OR ("supply chain" N5 (collaborat* OR coordinat* OR manage*))
OR ("supply chain" N5 (integrat* OR synchroniz*))
OR ("service level" OR "fill rate" OR "ready rate" OR (stock* AND *out))
OR (horizon OR range)
OR (substitut* OR flexib* OR adapt* OR agil*)
OR (tolerance OR resilience OR robust* OR point)
OR (varia* OR uncertain* OR *predictab* OR *forecastab* OR volatil*)))
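As a rough illustration of how the two operators in this query behave, the sketch below approximates truncation (`*`) and the proximity operator (`N5`) in Python. It assumes that N5 means the two terms occur within five words of each other in either order, measured here as a token-position difference of at most 5. The function names and the sample sentence are hypothetical, and the database's own matching rules (stemming, stop words, field handling) are not reproduced.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any(token, patterns):
    """Return True if the token matches any term; '*' acts as truncation."""
    return any(fnmatchcase(token, p.lower()) for p in patterns)


def near(text, left, right, distance=5):
    """Approximate '(left terms) N<distance> (right terms)'.

    Assumption: 'within N words' is read as a token-position
    difference of at most `distance`, in either order.
    """
    tokens = tokenize(text)
    left_pos = [i for i, t in enumerate(tokens) if matches_any(t, left)]
    right_pos = [i for i, t in enumerate(tokens) if matches_any(t, right)]
    return any(
        i != j and abs(i - j) <= distance
        for i in left_pos
        for j in right_pos
    )


# Hypothetical abstract sentence, used only to exercise the sketch.
sample = "Weather data reduced forecast error across the supply chain."
accuracy_terms = ["error", "accura*", "bias"]

print(near(sample, ["forecast*"], accuracy_terms))  # True
print(near(sample, ["supply"], ["chain*"]))         # True
```

Under these assumptions, a record passes the first two AND-ed clauses of the query only if both calls return True; the third clause would be an OR over further term groups evaluated in the same way.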
