MBAv1 PDF
Patrick R. McMullen,
ISBN-10: 1523352825
ISBN-13: 978-1523352821
This book is dedicated to my Dad and the memory of my Mom.
I was lucky enough to have Mom and Dad sit through a class session of
mine, and it meant a lot to me to see the pride in their eyes.
Contents
Introduction ...................................................................................................... 6
Software........................................................................................................ 6
Course “Prerequisites”.................................................................................. 7
Acknowledgements ...................................................................................... 8
1. Why Statistics? .......................................................................................... 9
1.1 Tools for this Course ....................................................................... 12
1.2 Conclusions ..................................................................................... 13
1.3 Exercises.......................................................................................... 13
2. Describing Data ....................................................................................... 14
2.1 Categorical Data Analysis ................................................................ 14
2.2 Numerical Data Analysis ................................................................. 15
2.2.1 Expectation ............................................................................. 16
2.2.2 Variation.................................................................................. 17
2.2.3 Distribution ............................................................................. 19
2.3 Presenting Graphical Data .............................................................. 20
2.3.1 Histogram................................................................................ 21
2.3.2 Box Plot ................................................................................... 27
2.4 Putting it All Together ..................................................................... 28
2.5 Conclusions ..................................................................................... 29
2.6 Exercises.......................................................................................... 29
3. Probability ............................................................................................... 32
3.1 Basic Probability.............................................................................. 32
3.2 Probability Rules ............................................................................. 33
3.2.1 Probability Rule 1 .................................................... 34
3.2.2 Probability Rule 2 .................................................... 34
3.2.3 Probability Rule 3 .................................................... 34
3.2.4 Probability Rule 4 .................................................... 34
3.2.5 Probability Rule 5 .................................................... 35
3.2.6 Probability Rule 6 .................................................... 36
3.3 Contingency Tables .........................................................................36
3.4 Basic Counting .................................................................................39
3.4.1 The Product Rule .....................................................................40
3.4.2 Combinations ..........................................................................40
3.4.3 Permutations...........................................................................41
3.4.4 Excel “Counts” .........................................................................42
3.5 Conclusions .....................................................................................42
3.6 Exercises ..........................................................................................42
4. Random Variables ...................................................................................45
4.1 Discrete Random Variables .............................................................45
4.1.1 Discrete Distribution ...............................................................45
4.1.2 Binomial Distribution ..............................................................47
4.2 Continuous Random Variables ........................................................49
4.2.1 Uniform Distribution ...............................................................49
4.2.2 Normal Distribution ................................................................50
4.3 The Central Limit Theorem .............................................................54
4.4 Conclusions .....................................................................................57
4.5 Exercises ..........................................................................................57
5. Estimation ...............................................................................................59
5.1 Means..............................................................................................59
5.2 Proportions .....................................................................................61
5.3 Differences between means ...........................................................63
5.4 Conclusions .....................................................................................64
5.5 Exercises.......................................................................................... 64
6. Hypothesis Testing .................................................................................. 67
6.1 General Process .............................................................................. 67
6.1.1 Null and Alternative Hypotheses ............................................ 68
6.1.2 Steps of the Hypothesis Test .................................................. 72
6.2 Testing Means ................................................................................. 78
6.3 Testing Proportions......................................................................... 80
6.4 Testing Differences between Means .............................................. 83
6.5 Confidence Intervals and Two-Tailed Test...................................... 87
6.6 Conclusions ..................................................................................... 88
6.7 Exercises.......................................................................................... 89
7. Oneway Analysis of Variance .................................................................. 93
7.1 Variation and the F-Distribution ..................................................... 93
7.2 Testing Equality of Means from Multiple Populations ................... 98
7.3 Multiple Factor Analysis of Variance ............................................ 101
7.4 Conclusions ................................................................................... 102
7.5 Exercises........................................................................................ 102
8. Chi-Square Testing ................................................................................ 104
8.1 The Chi-Square Test ...................................................................... 104
8.2 Goodness of Fit Test ..................................................................... 105
8.3 Test for Independence .................................................................. 111
8.4 Conclusions ................................................................................... 114
8.5 Exercises........................................................................................ 115
9. Simple Linear Regression ...................................................................... 118
9.1 Slope and Intercept....................................................................... 118
9.2 Ordinary Least Squares Regression...............................................119
9.3 Statistical Inference for Slope and Intercept ................................121
9.3.1 Excel for Regression ..............................................................121
9.3.2 Testing the Slope and Intercept ............................................122
9.4 Estimation / Prediction .................................................................123
9.5 Conclusions ...................................................................................124
9.6 Exercises ........................................................................................125
10. Multiple Linear Regression ...............................................................128
10.1 Improving Predictive Ability ..........................................................128
10.1.1 Adding More Variables..........................................................128
10.1.2 F-Statistic and Regression Output .........................................129
10.2 Multicollinearity ............................................................................131
10.2.1 Correlation ............................................................................132
10.2.2 Remediation ..........................................................................133
10.3 Parsimony......................................................................................134
10.4 Conclusions ...................................................................................135
10.5 Exercises ........................................................................................135
11. Business Forecasting .........................................................................139
11.1 Time Series Analysis ......................................................................139
11.2 Simple Forecasting Tools ..............................................................139
11.2.1 Simple Moving Average ........................................................140
11.2.2 Weighted Moving Average ...................................................140
11.2.3 Differencing ...........................................................................140
11.3 Regression Based Forecasting .......................................................141
11.3.1 Linear Trends.........................................................................142
11.3.2 Nonlinear Trends...................................................................143
11.3.3 Microsoft Excel and Forecasting ...........................................145
11.4 Seasonality in Forecasting ............................................................ 146
11.4.1 Order of Seasonality ............................................................. 146
11.4.2 De-Seasonalize Data ............................................................. 148
11.4.3 Capture Trend ....................................................................... 148
11.4.4 Fit and Forecast..................................................................... 149
11.5 Parsimony ..................................................................................... 151
11.6 Conclusions ................................................................................... 152
11.7 Exercises........................................................................................ 152
12. Decision Tree Analysis ...................................................................... 154
12.1 Decision Trees ............................................................................... 154
12.2 Decision Strategies........................................................................ 155
12.2.1 Optimistic Strategy (“Maximax”) .......................................... 156
12.2.2 Pessimistic Strategy (“Maximin”) ......................................... 156
12.2.3 Expected Value Strategy ....................................................... 157
12.3 Expected Value of Perfect Information ........................................ 158
12.4 An Example ................................................................................... 159
12.5 Conclusions ................................................................................... 160
12.6 Exercises........................................................................................ 161
Appendix. Development of Seasonal Forecasting Problem......................... 164
References .................................................................................................... 167
Quantitative Methods for MBA Students
Introduction
The decision to write this book was made over time, and eventually out
of necessity. I have seen students and their families spend far too much
money on textbooks, on top of skyrocketing tuition costs. This problem
didn't reach a critical point with me until my own children became of
college age, when I experienced it myself.
Another reason for this book is to create more consistency between
my lectures and the content of the book. One common bit of feedback
from my students is that they don't like the assigned book. I don't think the
quality of the book is the problem; the way I use the book is the problem. With this
book, I will be able to better coordinate my lectures with the text, which will
benefit the students. There will be a stronger relationship between the book,
the lectures, and the homework assignments.
The homework problems in the book also refer to data sets which I
have created, which results in better coordination between lectures
and assignments.
Software
For this class, Microsoft Excel, version 2010 and later, is the software
package of choice. While I do not consider Microsoft Excel the best statistics
software available, it is the one software package to which essentially all
students will have access. I liken Microsoft Excel's statistical capability to a
Swiss Army knife – it does a lot of things well, but no one thing
exceptionally well. Microsoft Excel is flexible, and provides the tools for
successful data analysis in a classroom setting.
The JMP software package performs many statistical tasks with more ease
than Excel. JMP is not used as often as Microsoft Excel, however, because of
its limited availability beyond the university years. Nevertheless, JMP will be
used on occasion to show some software capabilities that Excel simply cannot
match. This is especially the case for certain types of graphics.
Course “Prerequisites”
For this course, a basic understanding of high-school or college algebra is
a must. Beyond that, some “mathematical maturity” is needed at times. In
statistics, we often sum quantities, and we use subscript notation to do so.
This notation is not meant to appear pretentious; it is needed to explain
things as briefly as possible.
Acknowledgements
Ironically, I have the textbook publishers and university bookstores to
thank for giving me the motivation to write this book. Because they form
partnerships and charge the students too much money, I now find an
affordable textbook more necessary than ever.
I would like to thank my friend and colleague, Jon Pinder, for selling me
on this point. Jon started doing the same thing a few years ago, with the intent
of saving the students money.
I would also like to thank Mike DiCello for his help with photography,
Karen Combs for her help with editing, and Vickie Whapham for her
administrative help. It is also important for me to thank Kevin Bender, Pat
Peacock, Carol Oliff and Chas Mansfield for their never-ending assistance in
helping the students better understand the importance of the learning
process.
Most importantly, I would like to thank Professor Larry Richards of the
University of Oregon. When I entered the University of Oregon, I had a real
fear of statistics. While there, I had several statistics classes with Larry. Under
his mentoring, I realized that I like statistics. I realized that it is the most
important mathematics class there is. I also learned that it would be fun to
teach statistics someday. Larry’s mentoring gave me the confidence to teach
statistics. Getting in front of a classroom full of students fearing statistics is
not an easy thing to do. Larry Richards gave me the confidence to do this.
1. Why Statistics?
Statistics is a feared entity in the business school – for both
undergraduate and MBA students. I personally fit this description, given that I
received a grade of “D” when I first took statistics as an undergraduate
engineering student. Assuming you’ve not thrown your book away over that
revelation, I will continue.
Prior to taking statistics for the first time, I was excited about the class.
The reason for this, as embarrassing as it may be, was because I thought I
already had a handle on what statistics was all about. As a schoolboy, my
friends and I would trade baseball cards, in our effort to have a complete
collection. If I had two 1974 cards of Pete Rose, but I was lacking a 1974 Henry
Aaron card, I would gladly forfeit one of my Pete Rose cards for the Henry
Aaron card.
From inspection of Table 1.1, and inspection of other data sets detailing career
hitting statistics, it becomes clear that Henry Aaron was inarguably one of the
very best hitters in the history of the game. In fact, he has more Runs Batted
In (2,297) than anyone in history, and his 755 career home runs were the most
ever until Barry Bonds broke the record in 2007 with 762 home runs.
Unfortunately for Barry Bonds, his record will always be in doubt due to his
use of performance-enhancing drugs.¹
The problem with this is pretty clear. The sample is not representative of
the US population. Brentwood, California is a very affluent area, and the
average home price there would be much higher than it would be for the
entire US. Instead, we have to randomly select homes from all over the US to
use for our analysis. When observations are randomly selected for our
analysis, any sort of bias is avoided, and we are speaking on behalf of the
entire population.
¹ Bonds has admitted to performance-enhancing drug use, but he clarifies this by
stating he never “knowingly” took performance-enhancing drugs.
Once we have gathered our random sample from the population, we can
make a proper analysis. Our intent is to learn something about our data, so
that we can “tell the world” about our findings. When we “tell the world”
about our findings, we are expected to use a very well-defined set of tools and
semantics to articulate our findings. Quite often, we must perform a
structured “test” prior to our telling the world about our findings. This formal
test essentially demonstrates that proper protocol has been followed in our
analyses.
Consider, for example, a company that has developed a new advertising
campaign for their product. Before spending the millions of dollars to launch
the campaign, they will need to “field test” the advertising campaign on focus
groups, to determine whether the focus groups respond favorably to the new
advertising campaign. In order to make this determination, formal statistical
analyses must be performed that convince senior management that the
investment in the campaign is worthwhile.
1.2 Conclusions
It is reasonable to think of statistics as a “toolbox” we use to better
understand our environment via a collection of data. When we better
understand our environment, we can improve and enhance the position of
our organization, whatever that particular organization’s purpose may be.
1.3 Exercises
1. Would Boston, Massachusetts be a good place to sample in order to
estimate the average SAT score for all high school students? Why or why not?
2. Would the state of Ohio be a good place to sample customers’ reception
of a new food product? Why or why not?
3. Why might the FDA’s use of statistics be more important than other
organizations’ use of statistics?
2. Describing Data
Statistics is all about studying data and articulating our findings to the
world. Prior to performing formal statistical tests, we need to describe
the data in its most basic form. There are three basic approaches to doing
this: Categorical Data Analysis, Numerical Data Analysis, and the presentation
of graphical data.
[Table: jeans worn by the concert-goers, tallied by gender (Male, Female)]
Quantitative Methods for MBA Students
2.2 Numerical Data Analysis
Using descriptive statistics, we can take a given set of data, calculate many
statistics, and then articulate our findings in a reasonable way. There are
several categories in which to describe data, but we concentrate on the two
most important: expectation and variation.
2.2.1 Expectation
Expectation is a measure of central tendency – a singular value we can
use to describe the expectation of a given data set. For example, when an
instructor administers an exam, they will often state the exam average for the
entire class. The instructor has summed the test scores, and divided this sum
by the number of exams taken. The instructor is articulating, in essence, how
the average student performed on the exam. Based upon a specific student’s
score then, any given student can compare their performance against the
average performance. In the world of statistics, the average value of some
entity is often called the mean.
The average isn’t the only metric we can use to articulate central
tendency. Consider a situation in which we are given an assignment to
estimate the expected home value in King County, Washington. We decide to
randomly sample 10 houses in the area for our estimation. It turns out that
one of the ten houses we use is the home of Microsoft founder Bill Gates. Mr.
Gates’ home is worth several million dollars – much higher than the other
home values in our collection of data. This will severely bias our average – it
would inflate our average and misrepresent the expected home value in King
County, Washington. Instead of calculating the average, we can sort our data
from low to high (or high to low) and select the “middle value” for an odd
number of observations, or average the “middle two” values for an even
number of observations. This value is called the median, and is often used on
socioeconomic data to eliminate any biases associated with extreme values.
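The pull an extreme value exerts on the mean, and the median's resistance to it, can be sketched in a few lines of Python. The home values below (in thousands of dollars) are invented for illustration; the final figure stands in for an extreme observation like the Gates home.

```python
from statistics import mean, median

# Hypothetical home values in thousands of dollars; the last entry stands in
# for an extreme observation such as the Gates home
homes = [450, 520, 480, 610, 390, 550, 470, 500, 430, 120_000]

avg = mean(homes)     # the single extreme value drags the average upward
mid = median(homes)   # the middle of the sorted data ignores the extreme
print(avg, mid)
```

The average is inflated into the tens of millions by one observation, while the median stays among the typical homes, which is exactly why the median is preferred for socioeconomic data.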
2.2.2 Variation
Variation is an underappreciated and often overlooked part of descriptive
statistics, but a very important one. In fact, it is just as important as the
measures of central tendency. The reason it is so underappreciated is that
it is not terribly well understood. In fact, when I didn’t do well the first time I
took statistics, my misunderstanding of variation was a large reason why.
Variation measures the dispersion of a data set. The most general measure
of dispersion is the range, which is the difference between the maximum
observed value (xmax) and the minimum observed value (xmin).
Mathematically, this is as follows:

Range = x_{max} - x_{min}
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}    (Eq. 2-3)
This formula computes the difference between each observation and the
sample mean, and that difference is squared. It is squared for two reasons: the
first reason is to eliminate negative numbers from the calculation, as any number
squared is non-negative; the second reason is to amplify big differences
between the observed value and the sample mean. These squared differences
are then summed, which accounts for the numerator of the equation. The
numerator is then divided by (n - 1). This division averages the squared
deviations, similar to dividing by n when a sample mean is calculated, and
taking the square root puts the calculated quantity back into the same units
as the observed data. The value of “1” is subtracted from n to
account for the fact that we have limited data – sample information instead
of population data. The subtraction of “1” from n adjusts the sample standard
deviation upward, “inflating” our estimate of the standard deviation, as
compared to the standard deviation of the population. Fortunately, this
formula is rarely computed by hand in practice, as Excel and other software
packages easily make the calculation.
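As a check on Eq. 2-3, the following sketch computes the sample standard deviation directly from the formula and compares it with the Python standard library's stdev function, which uses the same n - 1 definition. The data set is made up for illustration.

```python
import math
from statistics import stdev

# Made-up sample data for illustration
data = [4, 8, 6, 5, 3, 7]
n = len(data)
xbar = sum(data) / n

# Eq. 2-3: square each deviation from the mean, sum the squares,
# divide by n - 1, then take the square root
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

print(s)
print(stdev(data))  # the library's sample standard deviation agrees
```

Both lines print the same value, confirming the hand computation against the library implementation.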
Another quantity, the sample variance, is the squared value of the sample
standard deviation, and is shown here:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}    (Eq. 2-4)
The only difference between the two is that the sample standard deviation is
the square root of the sample variance; the standard deviation is more
commonly used because its value is in the same units as the observed data.
In Excel, these quantities are easily computed: “=max(X) - min(X)” returns
the range, “=stdev.s(X)” the sample standard deviation, and “=var.s(X)” the
sample variance, where X denotes the data range.
Now that we have the tools to describe central tendency and variation, a
brief time is spent discussing the distribution of the data set.
2.2.3 Distribution
Many observations comprise a data set. Each individual observation has
a “position” in the data set, so to speak. When the data is sorted, we can “see”
where a certain observation fits in with the rest of the data. Excel, JMP and
other software packages provide us with the position of an observation in
a data set without our having to sort or otherwise manipulate the data. In
particular, we can calculate percentiles and quantiles associated with a data
set.
The percentile of a data set basically breaks down the data into 100
pieces, or percentiles. The quartile of a data set basically breaks down the
data into four pieces. The first quartile is tantamount to the 25th percentile,
the second quartile is tantamount to the 50th percentile (which we also refer
to as the median), and the third quartile is tantamount to the 75th percentile.
The fourth quartile, which we never use, is essentially the maximum observed
value.
In Excel, these values are easily computed using the percentile and
quartile functions.
For example, if we have a data set called “X” with several observations, the
function “=percentile(X, 0.63)” will return the value in the data set associated
with the 63rd percentile. If we use the function “=quartile(X, 3),” the function
will return the value in the data set associated with the 3rd quartile (or 75th
percentile). Note that for the percentile function, our value must be between
0 and 1. The 0th percentile is the minimum value in the data set, while the
100th percentile is the maximum value in the data set. For quartile values, our
quartile value must be either 0, 1, 2, 3 or 4. The 0th quartile is the minimum
value in the data set, while the 4th quartile is the maximum value in the data
set. It should be noted, however, that the 0th quartile and the 4th quartile are
of no practical value, as they respectively imply the minimum and maximum
values.
It should be noted that the functions shown in Table 2.4 assume that we
intend for the highest value possible to occur – as in a test score, for example.
If we seek the lowest score possible, we simply replace “=percentile(data
range, value)” with “=percentile(data range, 1 - value).” For quartiles, we
simply switch the values for the first and third quartiles.
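For readers working outside Excel, Python's standard library offers a comparable tool: statistics.quantiles with method='inclusive' matches the interpolation used by Excel's percentile and quartile functions. A brief sketch with illustrative data:

```python
from statistics import quantiles

# Illustrative data: the integers 1 through 10
X = list(range(1, 11))

# Cutting the data into 4 pieces yields three cut points:
# Q1, Q2 (the median), and Q3
q1, q2, q3 = quantiles(X, n=4, method='inclusive')

# The 63rd percentile: 99 cut points split the data into 100 pieces,
# so index 62 is the 63rd cut point
p63 = quantiles(X, n=100, method='inclusive')[62]

print(q1, q2, q3, p63)
```

On this data set, “=quartile(X, 3)” in Excel and q3 above return the same value.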
2.3 Presenting Graphical Data
“A picture is worth a thousand words,” according to the old adage. Today’s
computer and software resources have greatly enhanced our ability to
visualize data. Thirty years ago, our ability to visualize data was essentially
constrained to a handheld calculator, pencil and paper. Bad memories.
While there are countless tools to present data in a graphical format, our
pursuit of presenting graphical data will be confined to the histogram and the
boxplot.
2.3.1 Histogram
Without question, the histogram is the most common way to graphically
present univariate (single variable) data. When we have a collection of data,
it is wise to organize the data based on similarity – we did this in Section 2.1,
where we categorized the jeans that the concert-goers wore by gender and by
the color of jeans. Now that we are dealing with numerical data, similarity can
be categorized much easier. Numbers that are “similar” can be numbers that
are close in value. In a histogram, we have two axes: the horizontal axis (often
referred to as the “x-axis”) is a listing of numerical outcomes in ascending
order. The vertical axis (often referred to as the “y-axis”) is the frequency
(how often) with which each outcome was observed. A histogram is sometimes
referred to as a frequency chart.
The point of the above is to suggest that some thought should go into
how outcomes are organized. One rule of thumb is that a good number of
categories is a function of the number of observations in the data set, which
we will call “n.” This particular rule of thumb suggests that √𝑛 is a good
number of categories to use. I agree with this rule of thumb, but I generalize
it further – for large data sets (~500 observations), I like to use about 25
categories, and for larger data sets (≥ 1000 observations), about 30 or so
categories.
Fortunately, most fully-dedicated statistical software packages like JMP and R
automatically choose how many categories to use, but this can be controlled
by the user. For Excel, the number of categories must be pre-determined.
Let’s build a histogram with some data. Let’s assume that it was “Las
Vegas Night” at the business school. Staff, students and faculty played a few
casino games for small amounts of money, and the “money lost” by the
players went to a charitable cause. There were 120 participants, and their
“winnings” are shown below in Table 2.5. Negative winnings are, of course,
losses and are preceded by a “minus sign.”
From close inspection of Table 2.5, the worst performance was a loss of
$10.00, and the best performance was a gain of $2.00, making for a range of -
10 to 2. It seems reasonable to have categories starting at -$10 and ending
with $2, and having them in $1 increments, for a total of 13 categories.
Incidentally, 13 total categories is reasonably close to the rule of thumb
number of categories of √𝑛.
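The √n rule of thumb, and the tallying of observations into equal-width categories, can be sketched briefly in Python. Note that this sketch labels each bin by its lower edge, a convention assumed here for illustration rather than taken from the book.

```python
import math
from collections import Counter

def suggested_categories(n):
    """Rule of thumb: about sqrt(n) histogram categories."""
    return round(math.sqrt(n))

def frequencies(data, width=1.0):
    """Count observations per bin, where bin k covers [k*width, (k+1)*width)."""
    return Counter(math.floor(x / width) for x in data)

print(suggested_categories(120))            # for the 120 Las Vegas Night players
print(frequencies([-3.75, -3.25, -0.50, 1.75]))
```

For n = 120 the rule suggests about 11 categories, which is indeed close to the 13 categories used for the winnings data.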
-3.75 -6.00 -8.50 -2.75 -0.50 -5.50 -6.00 -4.75 -6.50 -5.25
-7.50 -6.75 -4.75 -3.25 -5.75 -1.00 -1.00 -5.50 -5.75 -3.75
-1.75 0.75 -0.75 -0.50 -2.00 1.75 -2.75 1.75 -5.75 -2.50
-2.25 -1.25 -3.25 -5.75 -5.25 -3.00 -6.25 -5.25 -4.75 -7.75
-3.25 -2.25 -2.00 -4.50 -3.00 -8.25 -4.00 0.00 0.75 -1.50
-6.75 -2.25 -4.25 -6.25 -7.25 -2.50 1.25 -1.75 -3.00 -4.75
-9.50 -4.50 -4.25 -1.50 -4.25 -8.50 -3.00 -2.25 -3.00 -5.50
-5.00 -3.75 -4.25 -2.25 -3.75 -1.25 -6.50 -5.50 -6.50 -5.25
-0.75 -3.00 -4.50 -1.75 -3.00 -4.50 -2.50 -2.25 -4.50 -6.00
-3.50 -6.00 -2.25 -6.00 -3.50 -6.00 -0.75 -2.75 -1.50 -1.50
-3.25 -8.50 -2.75 -4.75 -6.25 -1.00 0.25 -5.00 -3.50 -10.00
-7.25 -2.75 -1.75 -2.00 -2.25 -3.25 2.00 -3.50 -0.50 -8.25
Table 2.5. Winnings from Las Vegas Night
A histogram using the frequency data from Table 2.6 is shown in Figure 2.2.
[Figure 2.2: column-chart histogram of the winnings data; horizontal axis:
Winnings, vertical axis: Frequency]
You will notice from Figure 2.2 that this histogram is nothing more than
a column chart, with the horizontal axis being the ordered categories, and the
vertical axis being the frequencies.
It should also be noticed that the rightmost column of Table 2.6 is entitled
“Relative Frequency.” For each outcome category, we can convert the
frequencies to relative frequencies by dividing the frequency value by the total
number of observations, which is 120 in this case. For example, the -$4
category has a frequency of 15. The relative frequency of this category is
12.50% (15/120). It would be of no particular benefit to show a histogram of
the relative frequencies, because the column heights would be exactly
proportional to those of the frequency histogram in Figure 2.2.
The only difference between a frequency and a relative frequency is that the
frequencies will sum to the number of observations, while the sum of the
relative frequencies will sum to 1.
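For readers who want to verify this arithmetic with a short program, the conversion from frequencies to relative frequencies can be sketched in Python. Only the -$4 category's count of 15 comes from the text; the other counts below are hypothetical placeholders.

```python
# Convert frequencies to relative frequencies, as in Table 2.6.
# The -$4 count of 15 is from the text; other counts are hypothetical.
frequencies = {-4.0: 15, -3.0: 12, -6.0: 9}  # hypothetical subset of categories
n = 120  # total observations in the Las Vegas Night data set

relative = {outcome: f / n for outcome, f in frequencies.items()}

print(relative[-4.0])  # 15/120 = 0.125, i.e., 12.50%
```

Summing every category's relative frequency over the full data set would give 1, just as the frequencies themselves sum to 120.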
In terms of the histogram itself, we see that a loss of about $3 is the most
frequent outcome. Without getting into descriptive statistics at present, we
can reasonably conclude some form of central tendency around an approximate $3
loss. This will be discussed further in the next section.
The histogram has a few properties worth noting. First, we might notice that
most outcomes reside on the left – lower incomes, with a decreasing
frequency of higher incomes, until we get to the $200K and more category.
The median, a measure of central tendency, resides in the “hump,” while
there is a dominant tail to the right. We use the term “skewed right” to
describe this type of distribution – the tail is to the right of the “hump.” A
distribution that is “skewed left” is where the dominant tail is to the left of
the “hump.” If a distribution is not skewed either way, we call the distribution
symmetric.
Any values below the lower threshold or above the upper threshold are
considered outliers – extreme observations. “Whiskers” are then drawn in.
The lower whisker is drawn from the first quartile to the smallest value in
the data set above the lower threshold. Similarly, the upper whisker is drawn
from the third quartile to the largest value in the data set below the upper
threshold. Values outside the two thresholds are highlighted as outliers,
although it is possible for a box plot not to have any. The figure below
shows a generalized box plot in the context described.
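The quartiles, thresholds, and whisker endpoints described above can be computed directly. The sketch below uses the conventional 1.5 × IQR rule for the thresholds and a simple linear-interpolation quartile; the sample data are hypothetical, and software such as JMP may use slightly different quartile conventions.

```python
# Box plot ingredients: quartiles, 1.5*IQR outlier thresholds, whiskers.
def box_plot_summary(data):
    s = sorted(data)
    n = len(s)

    def quartile(p):  # linear-interpolation percentile
        k = p * (n - 1)
        lo = int(k)
        hi = min(lo + 1, n - 1)
        return s[lo] + (k - lo) * (s[hi] - s[lo])

    q1, q2, q3 = quartile(0.25), quartile(0.50), quartile(0.75)
    iqr = q3 - q1
    lower_thresh = q1 - 1.5 * iqr
    upper_thresh = q3 + 1.5 * iqr
    # Whiskers reach the most extreme data values inside the thresholds
    lower_whisker = min(x for x in s if x >= lower_thresh)
    upper_whisker = max(x for x in s if x <= upper_thresh)
    outliers = [x for x in s if x < lower_thresh or x > upper_thresh]
    return q1, q2, q3, lower_whisker, upper_whisker, outliers

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]  # hypothetical sample
print(box_plot_summary(data))  # the 12 falls above the upper threshold
```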
Once the JMP software is started, we can import an Excel data set. From
there, asking JMP to perform the statistical analyses is very easy. While in
JMP, we select “Analyze,” then “Distribution.” From there, we select our
variable for the “Y, Columns” box and click “Go.” At this point, JMP returns
an immense amount of output, most of which we do not need. Once we are in
output mode, we can hide the output we do not wish to see, and focus on what
we do want to see, by unchecking various output options – selecting which
output to see is something that requires experimentation. The figure below is
a JMP screen capture of our Las Vegas Night analyses – all of these outputs
are standard with JMP.
The first thing I notice is that the average winnings are -$3.73. In other words,
I can expect to lose $3.73, which is quite similar to the median value of -$3.5.
I also notice a standard deviation of $2.49, and first and third quartiles of -$5.5
and -$2.06. The next thing I do is examine the histogram, which is essentially
the same as Figure 2.2. Here, the x-axis of the histogram was manipulated to
A very nice feature of JMP is that it places a box plot on top of the
histogram – this permits us to see the relationship between the two
graphical tools, providing us a better perspective of the data set. JMP offers
two box plot features not shown via other software packages. In the box plot,
you will notice a diamond-shaped figure. The part of the diamond where it is
at its maximum height is the mean value of the data set. The width of the
diamond shows us a 95% confidence interval, which we will talk about in
Chapter 5. The red bracket just above the box plot is referred to as the
“shortest half,” which displays the highest-density collection of 50% of the
observations in the data set.
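The core summary statistics JMP reports (mean, median, standard deviation, quartiles) can also be reproduced with Python's standard library. As a sketch, the sample below is just the first row of Table 2.5, not the full 120-observation data set, so the results will differ from JMP's figures quoted above.

```python
import statistics

# Summary statistics for the first row of Table 2.5 only (a small
# illustrative sample, not the full Las Vegas Night data set).
data = [-3.75, -6.00, -8.50, -2.75, -0.50, -5.50, -6.00, -4.75, -6.50, -5.25]

mean = statistics.mean(data)
median = statistics.median(data)
stdev = statistics.stdev(data)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles

print(mean, median, round(stdev, 2))
```

Note that `statistics.quantiles` uses its own interpolation convention, so its quartiles may differ slightly from JMP's.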
2.5 Conclusions
None of the concepts in this chapter are particularly difficult – I state this
from both a mathematical and conceptual perspective. Despite the relative
simplicity of this subject matter, none of it should be discounted as
unimportant. Descriptive statistics is perhaps the most important subject
covered in this book. When we talk about a data set, we should ALWAYS
summarize the data in numerical/statistical form, and use graphical support,
as many people in business, particularly those without a numerical
background, place more value on the pictures than they do the numbers.
2.6 Exercises
For problems 1 – 6, use the "ExamScoresData” data set. For problems 7 – 12,
use the “WireGaugeData” data set.
The “ExamScoresData” data set records scores for two exams taken by a group
of students. The students took Exam 1 prior to taking Exam 2.
2. Using Microsoft Excel, create a combined histogram for each exam using
a bin range in increments of 2 exam points.
6. Using JMP, construct box plots for Exams 1 and 2. Do your findings
support your answer in Problem 4?
For the “WireGaugeData” data set, data was taken from two shifts in a
factory where the diameter of 22-gauge wire was measured. It is intended
for the wire to have a diameter of 0.64mm.
10. Using your findings from Problems 8 and 9, talk about the distribution
for each shift.
11. Which shift does a better job in achieving the targeted diameter of
0.64mm?
12. Which shift is more consistent in terms of wire diameter?
3. Probability
We encounter probability every day in our lives. We watch the news on
TV and learn that there will be a slight chance of rain tomorrow afternoon.
Sometimes, a probability of rain will even be given explicitly. If we watch news
about politics, we will even see political pundits estimating probabilities of
candidates being elected to various forms of political office.
The probability rules in the next section provide some tools that assist us
in understanding slightly more complex probabilistic issues.
Independent events can happen at the same time – the outcome of one event
has no bearing on the outcome of the other. Let’s assume that the New York Yankees are
playing the Boston Red Sox tonight. Meanwhile, the Pittsburgh Pirates are
playing the Cincinnati Reds. It is possible that both the Yankees and the Pirates
win because they are not playing each other – they have other opponents.
The Yankees / Red Sox game and the Pirates / Reds games are independent of
each other.
It should also be stated that another way of stating P(A or B) is to say P(A ∪ B),
which implies the “union” of A and B. As an example, there is a 10% probability
that I will have pizza for dinner tonight, and a 15% probability that I will have
pasta for dinner. Therefore, there is a 25% probability (10% + 15%)
that I will have either pizza or pasta for dinner tonight.
We can also state P(A and B) as P(A ∩ B), which we call the “intersection” of
events A and B.
Figure 3.1. Venn Diagram of Events A and B
3.3.6 Probability Rule 6
For independent events, the probability of either event A or B occurring
is as follows:

P(A or B) = P(A) + P(B) − P(A)P(B)

Again, the notation P(A or B) can also be shown as P(A ∪ B). To continue with
our example, the probability of either the Yankees or Pirates winning their
game is (0.60 + 0.55 – (0.60 * 0.55)), which is 0.82, or 82%.
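The Yankees/Pirates arithmetic above is easy to check with a couple of lines of Python:

```python
# Probability Rule 6 for independent events:
# P(A or B) = P(A) + P(B) - P(A)*P(B)
p_yankees = 0.60   # probability the Yankees win, from the text
p_pirates = 0.55   # probability the Pirates win, from the text

p_either = p_yankees + p_pirates - p_yankees * p_pirates
print(p_either)  # 0.82
```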
house resolution – voting on whether or not a bill should become a law. Two
factors of interest here are first, the party affiliation – this factor has two
levels: Republican and Democrat. The second factor is how the
representative voted on the resolution. This factor also has two levels: Yes
and No. Because each factor has two levels, we have four combinations. For
our example, we study House Resolution 1599, which was brought to the
House of Representatives on July 23, 2015. The name of the resolution is the
“Safe and Accurate Food Labeling Act.” The contingency table of this vote is
as follows:
Yes No Total
Republican 230 12 242
Democrat 45 138 183
Total 275 150 425
Table 3.1. House Vote on HR 1599
The first order of business is to determine whether or not the bill passed the
house. It did, by a 275 – 150 margin. I look at the column total for “Yes” and
I compare it to the column total for “No” to make this determination. I also
notice that 242 Republicans voted, while 183 Democrats voted. I make this
determination by looking at the row totals. Finally, I arrive at a value of 425
total votes – this value can be determined by adding the row totals or the
column totals. This value can also be determined by adding the values of the
four factor-level combinations (230 + 12 + 45 + 138). 425 is the total number
of observations, which we can refer to as n.
Yes No Total
Republican 54.12% 2.82% 56.94%
Democrat 10.59% 32.47% 43.06%
Total 64.71% 35.29% 100%
Table 3.2. Relative Frequencies of HR 1599
From Table 3.2, we see that 56.94% of the votes were cast by Republicans, and
43.06% of the votes were cast by Democrats. We also see that 64.71% of the
vote was “Yes,” while 35.29% of the vote was “No.” The political party votes
will sum to 100%, while the “Yes” vs “No” votes will also sum to 100%. These
values are referred to as “marginal probabilities” because they are on the
“margins,” or edges of the contingency table. In essence, they tell us how each
of our factors is divided. We can state these marginal probabilities in
mathematical notation as well. For example, we can say the following about
Democrats: P(Democrat) = 43.06%.
Table 3.2 also provides us with “joint probabilities,” values showing us the
probability of specific combinations of our factors occurring. For example,
2.82% of all votes were cast by Republicans voting No on HR 1599. Note that
both factors are considered for this joint probability. We can state this
mathematically via the following: P(Republican and No) = 0.0282. The notation
P(Republican ∩ No) is also reasonable notation.
In this particular notation, the “|” means “given,” so we are implying the
probability of A given B. Said as simply as possible, we’d like to find the
probability of event A occurring knowing that event B has already occurred:

P(A | B) = P(A and B) / P(B)

For example, we can calculate the probability of a voter voting “No” with
the prior knowledge that they are a Democrat. In the context of this example,
the calculation is P(No ∩ Democrat) / P(Democrat) = 0.3247 / 0.4306 = 0.7541.
In other words, there is a 75.41% chance that a representative will vote “No”
provided they are a Democrat.
For this question, we employ 0.5412 / 0.6471, which results in 0.8364. In other
words, given that a representative votes “Yes,” the probability of them being
Republican is 83.64%.
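All of the marginal, joint, and conditional probabilities above come straight from the counts in Table 3.1, which makes them easy to reproduce programmatically:

```python
# Marginal, joint, and conditional probabilities from Table 3.1
# (House vote on HR 1599).
votes = {("Republican", "Yes"): 230, ("Republican", "No"): 12,
         ("Democrat", "Yes"): 45, ("Democrat", "No"): 138}
n = sum(votes.values())  # 425 total votes

p_democrat = (votes[("Democrat", "Yes")] + votes[("Democrat", "No")]) / n
p_dem_and_no = votes[("Democrat", "No")] / n

# P(No | Democrat) = P(Democrat and No) / P(Democrat)
p_no_given_dem = p_dem_and_no / p_democrat

print(round(p_democrat, 4))      # 0.4306
print(round(p_no_given_dem, 4))  # 0.7541
```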
Outcomes = ∏_{i=1}^{m} nᵢ (Eq. 3-11)

where nᵢ is the number of levels for factor i, and the upper-case “pi”
symbol (∏) means to multiply, similar to how the upper-case “sigma” symbol (∑)
means to sum.
As an example, let’s consider pizza options. There are four factors: crust,
cheese, topping and size. Table 3.3 details these factors. For this example, it
is assumed that for each factor, exactly one level is permitted.
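Eq. 3-11 can be sketched in one line of Python. Since Table 3.3 is not reproduced here, the level counts below are hypothetical; only the four factor names come from the text.

```python
import math

# Eq. 3-11: total outcomes = product of the number of levels per factor.
# Level counts are hypothetical; the factors are from the pizza example.
levels = {"crust": 3, "cheese": 2, "topping": 8, "size": 4}

outcomes = math.prod(levels.values())
print(outcomes)  # 3 * 2 * 8 * 4 = 192
```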
3.4.2 Combinations
In the previous paragraph, the word “combination” is used somewhat
informally. In this section, we use the word “combination” more formally.
Given we have n unique items in a set, and we wish to select a subset of size r
from the set, the number of unique combinations we have is often referred to
as “n choose r.” Another way to state this is C(n, r) or nCr. For the remainder
of this book, the C(n, r) notation will be used. With these definitions now
known, the number of combinations can be determined as follows:
Combinations = C(n, r) = n! / (r!(n − r)!) (Eq. 3-12)
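Eq. 3-12 translates directly into code; Python's standard library even provides it as `math.comb`. The ten-book example referenced in the next section (choosing 3 from 10) serves as a check:

```python
import math

# Eq. 3-12: C(n, r) = n! / (r! (n - r)!)
def combinations(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Choosing 3 books from 10 gives 120 combinations.
print(combinations(10, 3))  # 120
print(math.comb(10, 3))     # same result via the standard library
```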
3.4.3 Permutations
Combinations are insensitive to ordering. For our book example above,
we calculated how many possible combinations there are of size three from
ten total items. These 120 combinations are insensitive to any sort of order.
For example, books “A,” “B” and “C” are the same as books “C,” “A,” and “B.”
For permutations, however, we are sensitive to ordering. A permutation is a
unique ordering from a set of n items, when we take a subset of r items.
Notation for permutations is similar to that of combinations – we use nPr and
P(n, r) to mean the same thing. From a mathematical perspective, the number
of unique permutations for a subset of size r from a set of n items is as follows:
Permutations = P(n, r) = n! / (n − r)! (Eq. 3-14)
having forty digits. There is a 3-digit combination to this lock, and the three
digits must be unique. We have a set of size forty, and a subset of size three.
Using equation 3-14, we have 59,280 unique permutations. If you give the
matter serious thought, the term “combination lock” is inappropriate – a
“permutation lock” is more appropriate because the ordering of the three digits is
important.
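The lock arithmetic is a direct application of Eq. 3-14, which Python also ships as `math.perm`:

```python
import math

# Eq. 3-14: P(n, r) = n! / (n - r)!
def permutations(n, r):
    return math.factorial(n) // math.factorial(n - r)

# The "combination" lock: 40 digits, 3 unique digits, order matters.
print(permutations(40, 3))  # 40 * 39 * 38 = 59,280
print(math.perm(40, 3))     # standard-library equivalent
```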
3.5 Conclusions
As stated earlier, combinatorics and counting are topics that usually do
not find coverage in introductory statistics books. In my opinion they are
important because they tell us how many outcomes are possible, which is the
“denominator” in probability calculations. Often this number is surprisingly
large, which in turn reduces probability. Having this information is valuable,
and can help any sort of organization better understand the environment in
which it operates.
3.6 Exercises
1. I roll a pair of dice. Generate a table showing all possible outcomes and
the probability of each outcome.
2. From the table above, what is the probability of rolling either a five or a
nine?
3. Are the outcomes from Exercise 1 mutually exclusive? Why or why not?
7. Using your answers from problems 5 and 6, what is the probability you
draw a 10 of Aces? Does this answer make sense?
9. Tonight the Houston Astros play at the New York Yankees. Analysts give
the Yankees a 59% chance of winning. The Chicago Cubs play at the Pittsburgh
Pirates tonight as well. Analysts project Pittsburgh winning with a 55%
probability. Ties are not allowed – a team will either win or lose the game.
Using this information, what is the probability that both the Yankees and the
Cubs win tonight?
10. Using the information from Exercise 9, what is the probability that either
Houston or the Cubs win?
11. Using the information from Exercise 9, what is the probability that neither
the Yankees nor Pirates will win?
13. Using the information from Exercise 12, what percent of those surveyed
were men?
14. Using the information from Exercise 12, what percent of those surveyed
were women that preferred Wal-Mart?
15. Using the information from Exercise 12, of all women, what percent
preferred Target?
16. Using the information from Exercise 12, what percent were men given
those that preferred Wal-Mart?
17. Can you draw any general conclusions from the information in Exercise
12?
18. Given the pizza data in Table 3.3, how many topping combinations are
possible if up to two toppings were permitted? Assume double-toppings on any
single item (such as double pepperoni) are not permitted.
19. Given the pizza data in Table 3.3, how many topping combinations are
possible if up to two toppings were permitted? Assume double-toppings on any
single item (such as double pepperoni) are permitted.
20. I have 15 photos. I have been asked to submit 4 of them for a family
photo album. How many photo combinations are possible?
21. I have to visit 10 cities exactly once. The city from which I start my tour
must be the city in which I end my tour. How many different tours are possible?
4. Random Variables
Probability and statistics can be described in many ways. Dealing with
uncertainty is one aspect of the science. There are things in nature that
we never know with certainty. The closing daily stock price of a company, the
monthly sales of a specific chemical and legal issues can all be considered
sources of uncertainty. We refer to these types of things as random variables.
Random variables are entities having values that we don’t know with
certainty. Perhaps we have a general idea of their behavior, but we cannot
say with certainty what the outcome will be.
where xᵢ represents the value of outcome i. It is also worth noting that the
cumulative probability through the ith possible outcome is simply the
following:

P(X ≤ xᵢ) = ∑_{j=1}^{i} P(xⱼ)
Using the above equations, the kid can expect to sell 3.06 newspapers
with a standard deviation of 1.22 newspapers. Figure 4.1 shows both the
probability distribution and cumulative probability distributions for the seven
possible outcomes. The cumulative probability distribution can be interpreted
by stating the associated outcome or less has a probability of occurring equal
to that particular cumulative probability value. For example, the cumulative
probability value of 0.90 is associated with four bundles sold. This means that
there is a 0.90 probability that 4 bundles or less will be sold.
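The expected value, standard deviation, and cumulative probabilities of a discrete random variable can all be computed directly. The probability table below is hypothetical (the book's actual newspaper-sales table is not reproduced here), so the results will not exactly match the 3.06 and 1.22 quoted above.

```python
import math

# E(X) = sum(x_i * P(x_i)); Var(X) = sum((x_i - E(X))^2 * P(x_i)).
# The probabilities below are hypothetical, not the book's table.
outcomes = [0, 1, 2, 3, 4, 5, 6]
probs = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]  # sums to 1

mean = sum(x * p for x, p in zip(outcomes, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))
stdev = math.sqrt(var)

# Cumulative probability through each outcome
cumulative = []
running = 0.0
for p in probs:
    running += p
    cumulative.append(running)

print(mean, round(stdev, 2), cumulative[4])
```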
[Figure 4.1. Probability and cumulative probability distributions for the seven possible outcomes (x-axis: bundles sold, 0–6)]
P(k) = n! / ((n − k)! k!) · p^k q^(n−k) (Eq. 4-4)
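Eq. 4-4 is straightforward to implement. The parameters below are hypothetical (say, 10 job candidates, each with a 0.3 chance of turning out to be a successful hire); the book's actual hiring example may use different values.

```python
import math

# Eq. 4-4, the binomial distribution: P(k) = C(n, k) * p^k * q^(n-k)
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical: 10 trials, 0.3 success probability
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(round(dist[3], 4))    # probability of exactly 3 successes
print(round(sum(dist), 4))  # the probabilities sum to 1
```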
[Figure 4.2. Probability distribution of successful hires (x-axis: Successful Hires, 0–10; y-axis: Probability)]
From Figure 4.2, we can see details of the probability of successful hires
from three years ago. From this distribution, cumulative and inverse cumulative
distribution functions can be determined to answer additional questions.
Figure 4.3. Uniform Distribution of a Simple Random Number Generator (x-axis: Outcome, 0 to 1)
Note that Figure 4.3 shows all values between 0 and 1 to occur with equal
probability.
distribution of how much sleep someone got last night, etc. To speak frankly,
the normal distribution is a powerful force in nature – the distribution of many
things follow a normal distribution. In plotting the normal distribution, the
independent variable is “z,” the number of standard deviations from the
mean. The dependent variable is f(z), the probability, or density function
associated with z. In more practical terms, “z” can be thought of as an
outcome, whereas f(z) can be thought of as the probability associated with
that particular outcome. The mathematical relationship between z and its
density function is as follows:
f(z) = (1 ⁄ √(2π)) e^(−z²⁄2) (Eq. 4-5)
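Eq. 4-5 can be evaluated directly to see the familiar bell shape in numbers:

```python
import math

# Eq. 4-5, the standard normal density: f(z) = exp(-z^2/2) / sqrt(2*pi)
def f(z):
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

print(round(f(0), 4))   # 0.3989, the peak of the bell curve
print(round(f(1), 4))   # density one standard deviation from the mean
print(f(1) == f(-1))    # the distribution is symmetric
```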
[Figure 4.4. The standard normal distribution (x-axis: z; y-axis: f(z))]

z = (x − μ) ⁄ σ (Eq. 4-6)
The above formula provides us with a z-score, which can then be used to
determine probabilities.
This z-score can then be used to determine the area under the curve to
the “left” of z, or more specifically, the area under the curve between -∞ and
z, which is the integral of Eq. 4-5. Eq. 4-5 cannot be integrated by conventional
means, so a numerical method must be used. Fortunately, Excel has provided
a function for this. The “normsdist(z)” function provides the area under the
normal curve to the left of the z-score. Figure 4.5 illustrates this, where “p”
denotes that area.

[Figure 4.5. The standard normal distribution, with shaded area p to the left of z]
Conversely, we might also wish to know the exam score associated with
some specific percentile. There are tools available to help us with this
calculation. First of all, the “normsinv(p)” function returns the z-score
associated with the area under the curve equal to p, where the area is to the
left of the z-score. Once we have the z-score, we can use Eq. 4-6 to get the
actual value x associated with the z-score.
x = μ + zσ (Eq. 4-7)
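Python's standard library offers equivalents of Excel's normsdist and normsinv via `statistics.NormalDist`. The sketch below runs both directions of the calculation; the exam mean and standard deviation are hypothetical.

```python
from statistics import NormalDist

# cdf() plays the role of Excel's normsdist(z); inv_cdf() plays normsinv(p).
std_normal = NormalDist()          # mean 0, standard deviation 1

mu, sigma = 75, 8                  # hypothetical exam mean and std dev
x = 87                             # hypothetical exam score
z = (x - mu) / sigma               # z-score: 1.5 std devs above the mean
p = std_normal.cdf(z)              # area to the left of z

# Reverse direction: the score at the 90th percentile (Eq. 4-7)
z90 = std_normal.inv_cdf(0.90)
x90 = mu + z90 * sigma

print(round(z, 2), round(p, 4), round(x90, 1))
```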
Figure 4.6 below shows this graphically. Panel (a) shows a histogram as
the result of sampling from a population with a sample size of n = 10. Panel
(b) shows the histogram as the result of sampling from the same population
with a sample size of n = 100. Panels (c) and (d) show the same, but with
sample sizes of 1,000 and 10,000. As one can see, increasing the sample size
better resembles the population, providing us with less “predictive error.”
Figure 4.6a. Sample Size n = 10
[Figure 4.6b. Sample Size n = 100]
Figure 4.6c. Sample Size n = 1,000
[Figure 4.6d. Sample Size n = 10,000]
The standard error of the sample mean is se = s⁄√n, where s is the sample standard deviation, and n is the sample size.
t = (x̄ − μ) ⁄ (s⁄√n) (Eq. 4-9)
This particular equation has no application at this point in the book, but it will
when we explore hypothesis testing. At this point, it is reasonable just to
understand that the t-statistic is the result of sampling, and its value simply
represents how many standard errors our sample mean is from the population
mean.
4.4 Conclusions
Despite the fact that many different types of probability distributions
were covered in this chapter, we have only scratched the surface in terms of
the multitude of probability distributions that exist in nature. Fortunately, the
normal distribution is a powerful force in nature, and this particular distribution
serves us well for the remainder of this book.
4.5 Exercises
1. I roll a pair of dice. What is the expected value of the outcome?
2. I roll a pair of dice. What is the standard deviation associated with the
outcome?
5. Estimation
We have stated many times that statistics involves using sample data to
say things about the population. This will be said many more times before you
finish this book. This adage is especially true in this chapter. Here, we wish to
articulate a population characteristic based on sample data. Given that we
are saying something about the population with incomplete data, we are
taking a leap of faith, so to speak. In short, we are estimating a population
parameter. However, the confidence interval helps us articulate how
confident we are in our estimate of this particular population parameter.
5.1 Means
Estimating the population mean (μ) via sample data is the first order of
business. When we gather sample data, we know the sample size (n), and we
can calculate the sample mean (x̄) and sample standard deviation (s). We can
use this data, along with the appropriate t-statistic, to construct a confidence
interval, providing a lower bound (LB) and an upper bound (UB) of our
estimate of μ. When this boundary is established, we state a certain level of
confidence that the true population mean lies within the specified boundary.
The level of confidence we have in this interval, or boundary, is 1 − α, where
the value of α is either given or assumed. The value of α is typically referred
to as the level of significance and is an input into the size of the boundaries of
the interval. Mathematically, this interval is as follows:

P(LB ≤ μ ≤ UB) = 1 − α (Eq. 5-1)
In essence, Eq. 5-1 is telling us that we are (1 − α) confident that the true
population mean is between the lower and upper boundaries. We can re-
write Eq. 5-1 so that the statistics associated with our sample data are shown:

P(x̄ − t_{α/2} · s⁄√n ≤ μ ≤ x̄ + t_{α/2} · s⁄√n) = 1 − α (Eq. 5-2)
Here, the value of t_{α/2} represents the distance, expressed via the number of
standard errors, our boundary is from the population mean. This value is often
referred to as the “half-width.” Excel provides us this value via the following
function:

t_{α/2} = tinv(α, n − 1) (Eq. 5-3)
This particular value is also used in hypothesis testing for means, covered in
the next chapter.
For convenience, the lower and upper boundaries are often shown in the
following compressed notation:
x̄ ± t_{α/2} · s⁄√n (Eq. 5-4)
Here, the “±” symbol shows that we are both subtracting and adding the
specified number of standard errors from and to our sample mean, x̄.
In words, this is stated as follows: we are 95% confident that our true
population mean salary increase is between $1603.93 and $1996.07.
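The interval arithmetic of Eq. 5-4 can be sketched in a few lines of Python. The sample statistics below are hypothetical (they only loosely resemble the salary example above), and the t value is hard-coded; in practice t_{α/2} comes from Excel's tinv or a t table.

```python
import math

# Eq. 5-4 sketch: x_bar +/- t_{alpha/2} * s / sqrt(n)
x_bar = 1800.0   # hypothetical sample mean salary increase
s = 480.0        # hypothetical sample standard deviation
n = 25
t_half = 2.064   # approximate t_{alpha/2} for alpha = 0.05 with 24 df

half_width = t_half * s / math.sqrt(n)
lb, ub = x_bar - half_width, x_bar + half_width

print(round(lb, 2), round(ub, 2))
```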
5.2 Proportions
We can also estimate proportions via confidence intervals. In this
context, a proportion is the percent of a sample that meets some criterion.
Proportions are often used in understanding consumer preferences, as well as
in political science, where we try to gain a better understanding of where
candidates stand in terms of voters.
Term	Explanation
p̂	Estimate of the population proportion
p	Population proportion
n	Sample size
z_{α/2}	Half-width of interval
Table 5.1. Values for Proportion Confidence Interval
se = √( p̂(1 − p̂) ⁄ n ) (Eq. 5-7)
Using notation similar to that from Eq. 5-4, our confidence interval is as
follows:
p̂ ± z_{α/2} √( p̂(1 − p̂) ⁄ n ) (Eq. 5-8)
The similarities between Eq. 5-8 and Eq. 5-4 are pretty obvious, but there is
one major difference that needs to be discussed. The confidence interval for
means exploits a t-distribution because we are sampling from a population –
this seems consistent with what was discussed in Section 4.3. The confidence
interval for a proportion, however, exploits a z-distribution, despite the fact
that we are sampling from a population – this seems counter-intuitive.
Another way of saying the above is that there is a 99% probability that the true
proportion of consumers preferring Brand A over B is between 0.4968 and
0.5617.
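Eq. 5-8 can be sketched with Python's `statistics.NormalDist`, which supplies the z_{α/2} value directly. The sample proportion and size below are hypothetical, so the interval only approximates the Brand A example above.

```python
import math
from statistics import NormalDist

# Eq. 5-8 sketch: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)
p_hat = 0.53    # hypothetical sample proportion preferring Brand A
n = 1500        # hypothetical sample size
alpha = 0.01    # for a 99% confidence interval

z_half = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576
se = math.sqrt(p_hat * (1 - p_hat) / n)
lb, ub = p_hat - z_half * se, p_hat + z_half * se

print(round(lb, 4), round(ub, 4))
```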
se = √( s₁²⁄n₁ + s₂²⁄n₂ ) (Eq. 5-11)
The formula for the confidence interval of the difference between two
population means is as follows:
(x̄₁ − x̄₂) ± t_{α/2} √( s₁²⁄n₁ + s₂²⁄n₂ ) (Eq. 5-13)
Here, the t_{α/2} value is the same as that stated in Eq. 5-3.
Using the given values and formulae, our confidence interval is as follows:
In other words, the difference in population means between the two classes
is between -4.75 and -0.25.
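Eq. 5-13 follows the same pattern as the one-sample interval. In the sketch below, all sample statistics and the t value are hypothetical (chosen only to roughly resemble the two-class example above), since the underlying data set is not reproduced here.

```python
import math

# Eq. 5-13 sketch:
# (x1_bar - x2_bar) +/- t_{alpha/2} * sqrt(s1^2/n1 + s2^2/n2)
x1_bar, s1, n1 = 82.0, 4.0, 30   # hypothetical class 1 statistics
x2_bar, s2, n2 = 84.5, 4.5, 30   # hypothetical class 2 statistics
t_half = 2.0                     # approximate t_{alpha/2}, hard-coded

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
diff = x1_bar - x2_bar
lb, ub = diff - t_half * se, diff + t_half * se

print(round(lb, 2), round(ub, 2))
```

Because the interval here lies entirely below zero, it would suggest class 2's mean genuinely exceeds class 1's.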
5.4 Conclusions
Generating confidence intervals is simply a way to estimate values
associated with a population. The more confident we need to be, the wider
the interval must be. At the same time, if we use as large a sample size as
possible, the width of our intervals will narrow.
5.5 Exercises
1. The “NFLLinemen” data set contains data on the weight of randomly
sampled offensive linemen in the National Football League. Using α = 0.01,
construct the appropriate confidence interval for the population mean.
“PulseRate” file.
7. A food company just came up with two new products for diabetics:
Product X and Product Y. They are concerned about the amount of
carbohydrates in the product, as diabetics need to be careful about
carbohydrate intake. The file “DiabeticFoods” contains carbohydrate data (in
grams) for randomly selected items of the two different products. Construct
a 96% confidence interval for the difference in carbohydrates between the
two products.
6. Hypothesis Testing
In science or commerce, we are not permitted to make unsubstantiated
claims. For example, if I work for an electronics firm, I cannot tell my
colleagues and managers the following: “our Q55 electronic switches cannot
handle a 25-volt load.” What is wrong with this statement? We have made a
claim without any evidence of formal scientific testing. In order to make a
reasonable claim outside of a pub, we need to support it with scientific
proof of its authenticity. To provide the scientific proof, we need to
conduct a formal hypothesis test.
This chapter provides us the tools to make hypothesis tests. We will test
hypotheses involving means, proportions and differences between means. In
other words, we extend the work from the previous chapter with the intent of
affirming or disproving a claim.
These tests take on three different types. The first type of test is
interested in whether or not a population parameter is equal to some specific
value vs. not equal to some value. The second type of test is concerned with
determining whether or not some population parameter is equal to some
value vs. less than some value. The third type of test is concerned with
whether or not some population parameter is equal to some value vs. greater
than some value. These three types of tests are formalized in the subsequent
sections.
The null hypothesis is a benign hypothesis and states the status quo
– it states what is typically regarded as true. Here is an example of a null
hypothesis: the average weight of a box of Cheerios is 14 ounces.
Mathematically, we would state this as follows:

H0: μ = 14
Note that the population mean is involved in H0. This is the case because the
H0 is making a claim about the population. Also note that equality is involved
– the H0 will always involve equality, because we are claiming the status quo.
Finally, we should note that when making our conclusions about a hypothesis,
we always base our conclusion on the H0: we will either reject H0, or fail to
reject H0.
Figure 6-1. H0: μ = 14; HA: μ ≠ 14
The second type of HA is the “less than” hypothesis. In the context of our
Cheerios example, we would say the following:

H0: μ = 14; HA: μ < 14
For this example, an interested party might be consumer advocates who wish
to expose Cheerios for “skimping” on the amount of Cheerios they put into
their boxes, in the event that HA is true. Figure 6-2 graphically shows this
scenario, with the reject region on the left-hand side. Because we have reject
region on just one side of the distribution, we call this a “one-tail test.”
Figure 6-2. H0: μ = 14; HA: μ < 14
For this particular scenario, senior executives at General Mills (the company
who makes Cheerios) might be interested because if HA were true, cereal
would be given away, which is an unforgivable offense in the eyes of senior
management and shareholders. Figure 6-3 shows this scenario, where you
will note the reject region is on the right hand side. Because we have reject
region on just one side of the distribution, we call this a “one-tail test.”
Figure 6-3. H0: μ = 14; HA: μ > 14
Each of these statements is fair and clearly written. Statement “a” should
result in HA being of the “≠” variety, because putting too little cereal into
the box is just as bad as putting too much cereal into the box.
Statement “b” should result in HA being of the “<” variety because a direct
claim is made implying as much. In this case, the HA is explicitly stated, which
occasionally happens. Statement “c” is of the “>” variety because the dangers
of putting too much of the silver alloy are discussed. In summary, then, we have
the following hypotheses ready for testing:
There are two important rules to remember here. First, the H0 always involves
equality – writing H0 is easy. Second, the HA always involves inequality.
Another important thing to remember is that the HA is more interesting than
the benign H0. A final noteworthy item is that if you find the problem not
clearly written, where the “inequality” is vaguely worded, a “≠” HA is probably
the best choice.
For this section, we will use scenario “a” above as our example:
Additionally, for this example, we will use α = 0.05, and the Cheerios data set,
which contains 25 observed weights of randomly-selected Cheerios boxes
claiming to be 14 ounces.
This critical value will be compared with the test-statistic (discussed next) to
determine whether or not H0 should be rejected.
There is one final comment that needs to be made about the value of α.
Students often ask what value of α should be used. This is a good question. A
rule of thumb is to use α = 0.05, but this varies across industries. The medical
industry often uses α values of 0.01 or lower. Lower values of α result in more
conservative tests, while higher values of α result in more liberal tests. What
this means is that conservative tests make it harder to reject H0, while more
liberal tests make it easier to reject H0. If seeking a more liberal test, however,
an α value exceeding 0.10 is considered inappropriate. Never use an α value
in excess of 0.10.
t = (x̄ − μ) / (s/√n)   (Eq. 6-5)
From the Cheerios data set, we have the following statistics: the sample mean
(𝑥̅ ) is 14.03, with a sample standard deviation (s) of 0.1087. The data set
contains n = 25 observations. We also have, of course, our hypothesized
population mean of μ = 14. Substituting these values into Eq. 6-5, we have
the following test-statistic:
This value of 1.51 tells us that our sample mean is 1.51 standard errors above
the hypothesized population mean. It is above the hypothesized population
mean because it has a positive sign; in other words, x̄ exceeds μ.
This test statistic is compared to the critical value, so that we can make a
decision regarding H0.
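The mechanics of Eq. 6-5 can be sketched in Python rather than Excel. The function name below is mine, and the rounded summary statistics quoted above are used; rounding of x̄ and s can shift the computed t slightly from the 1.51 quoted, but the fail-to-reject decision is the same either way.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """Test statistic from Eq. 6-5: t = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Cheerios summary statistics quoted in the text
t_stat = one_sample_t(xbar=14.03, mu0=14, s=0.1087, n=25)

# two-tailed critical value t.inv.2t(0.05, 24), as quoted in the text
critical = 2.06
reject_h0 = abs(t_stat) > critical   # False: fail to reject H0
```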
[Figure: the test statistic (1.51) falls between the critical values of −2.06 and +2.06, outside the reject region.]
Figure 6.5. The p-value Associated with the Example Hypothesis Test (p = 0.1445; tails beyond ±1.51)
Because this is a two-tailed test, we must make the shaded region associated
with the test-statistic two-tailed as well. In order for us to calculate the p-
value, we must use the tdist functions, which are detailed in Table 6.3.
You will note that the p-value is greater than the value of α. This should
be the case because we failed to reject H0. If our test statistic found itself in
the reject region, we would reject H0, and our resultant p-value would be less
than our value of α. If we already knew that we weren't going to reject H0 in
this example, why did we calculate the p-value? The answer to this question
is that the p-value gives us more information: it tells us the "break-even"
value that could be used for α, the point of indifference between rejecting
and not rejecting H0. If p is less than α, we reject H0. If p is greater than α, we
fail to reject H0. Or more crudely put:
Comparisons between the p-value and α are not intended to confuse the
student. I have noticed over the years that the adage above puts things in
proper perspective for students. The p-value is more important than α,
because the p-value can be compared to any threshold (α). In fact, statistical
software packages don't even ask for values of α; they just return p-values.
Figure 6.6 shows JMP output for our example problem. You will only see a p-
value reported, nothing about the value of α.
² More specifically, the p-value is defined as the probability of obtaining a result more
extreme than what was observed, assuming H0 is true. "More extreme" means further
into the reject region. In my opinion, it is reasonable to consider the p-value the
probability of H0 being true.
section was a test concerning means. There are also tests concerning
proportions and differences between means. In this section, we will do
another hypothesis test involving means.
Since the t-statistic is more extreme than the critical value, we reject H0 and
claim a delivery time of less than 6 hours. Figure 6-7 shows the relationship
between the critical value and the test statistic.
[Figure 6-7. The critical value and the test statistic (−2.43 shown) in the left-tail reject region.]
The test statistic falls into the reject region. Using the tdist function to
calculate the p-value, we have the following:
Note that our p-value above is less than α = 0.01, supporting our decision to
reject H0.
It should be noted that Eq. 6-8 takes the absolute value of the t-statistic
as an argument, because this function will return an error message if the t-
statistic is negative. This caveat is also noted in Table 6-3.
for this is because for testing proportions, we always assume a large sample
size, rendering the t-distribution the same as the z-distribution. Because of
this, our critical values, given some value of α, are shown in Table 6-4.
Note that for the two-tailed test, our critical value takes on both +/- values.
As a reminder from our coverage of confidence intervals for proportions, “𝑝̂ ”
is our sample proportion, while “p” represents the true population proportion.
Given this, our test statistic is shown in Eq. 6-9:

z = (p̂ − p) / √(p̂(1 − p̂)/n)   (Eq. 6-9)
With the basic formulae in place, let’s explore an example problem. Hank
and Keenan are running for mayor of Louisville, Kentucky. The winner of the
election will receive a majority of the votes – more than 50% of the vote to be
specific. 352 people have been sampled, and 182 of them have voted for
Keenan. At the α = 0.03 level, can we conclude that Keenan is the winner?
This problem essentially asks if Keenan has more than 50% of the vote,
which is our HA. As such, we have the following:
The test statistic does not fall into the reject region, so we fail to reject the H0,
as we do not have sufficient evidence to claim Keenan as the winner. Figure
6-8 shows this graphically.

[Figure 6-8. The test statistic (0.64) falls short of the critical value (1.88), outside the reject region.]
In terms of the p-value for this type of test, we use the “normsdist”
function. Depending on the type of HA, we use the following functions:
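The mechanics of this election example can also be sketched in Python rather than Excel. The function name below is mine; the critical value 1.88 is normsinv(0.97) for α = 0.03, as quoted above.

```python
from math import sqrt

def prop_z(p_hat, p0, n):
    """Test statistic from Eq. 6-9: z = (p_hat - p0) / sqrt(p_hat(1 - p_hat)/n)."""
    return (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

p_hat = 182 / 352             # Keenan's sample proportion
z = prop_z(p_hat, 0.50, 352)  # about 0.64
critical = 1.88               # normsinv(0.97) for alpha = 0.03, one tail
reject_h0 = z > critical      # False: cannot claim Keenan the winner
```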
you watch the news regarding politics and elections, you will see lots of
polling. If you look at the details of the polling, you will sometimes see
something like a “margin of error of +/- 3.5%.” This “margin of error” that is
referred to is simply what we call the standard error, √𝑝̂ (1 − 𝑝̂ )⁄𝑛. Smaller
sample sizes result in larger “margins of error,” which should make sense given
what we have already learned about the central limit theorem.
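The arithmetic behind a quoted margin of error can be sketched briefly. The 204-respondent poll below is hypothetical, chosen because an even 50/50 split at that sample size reproduces a 3.5% standard error.

```python
from math import sqrt

def standard_error(p_hat, n):
    """Standard error of a sample proportion, sqrt(p_hat(1 - p_hat)/n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

# a hypothetical poll: an even 50/50 split with roughly 204 respondents
# gives about the 3.5% "margin of error" often quoted on the news
moe = standard_error(0.50, 204)
```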
Let us consider one other example. In the hot dog business, the USDA
places an upper limit on variety meats that can be put into hot dogs. “Variety
meats” are tongues, tails, snouts, etc. Hot dogs must have less than 5% variety
meats. A local hot dog maker decides to test whether or not they are in
compliance. They documented their last 200 batches, and found an average
of 1.5% variety meats in them. At the α = 0.05 level, are they in compliance?
Clearly, our hypotheses are as follows: H0: p = 0.05; HA: p < 0.05. Using
normsinv(0.05), we have a critical value of -1.65, which defines the reject
region. Using Eq. 6-9, our test statistic is as follows:
This test statistic is very small, and is in an extreme part of the reject region.
When normsdist(-4.07) is used to find a p-value, a very small number may
appear. In my case the number displayed was "2.33E-05." This means
(2.33)(10)⁻⁵, which is a very small number. When this happens, we simply state
that p < 0.0001.
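The same calculation can be sketched in Python; the critical value −1.65 comes from normsinv(0.05) for the "<" alternative, as stated above.

```python
from math import sqrt

p_hat, p0, n = 0.015, 0.05, 200
z = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)   # Eq. 6-9
critical = -1.65          # normsinv(0.05) for the "<" alternative
reject_h0 = z < critical  # True: deep in the reject region
```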
relevant terms.
When we test from two different populations for equality of means, we can
construct the following hypotheses:
For a specific difference in means, we can use the following, where "d"
represents the value of the specific difference:
Both sets of hypotheses above are two-tailed tests, but one-tailed tests are
also possible, where HA is of the "<" or ">" variety, although the one-tailed
tests are not as common as the two-tailed tests. As with other tests, we are
given or we assume a value for α. Table 6.6 shows the t.inv functions we use
to determine critical values for reject regions:
df = (s₁²/n₁ + s₂²/n₂)² / [ (1/(n₁ − 1))(s₁²/n₁)² + (1/(n₂ − 1))(s₂²/n₂)² ]   (Eq. 6-15)
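Eq. 6-15 translates directly into a small Python helper (the function name is mine). As a sanity check, with equal variances and equal sample sizes n it reduces to 2(n − 1).

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom from Eq. 6-15."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
```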
The general test statistic for the difference between means is as follows:
If we are testing for equality in means, Eq. 6-16 simplifies to the following,
because equality of population means implies μ₁ = μ₂:
The p-values for these types of tests are similar to those shown in Table 6-3.
They are modified here to show degrees of freedom specific to this problem
type:
Let’s do an example using the data set “ExamScores.” This data set
compares the scores from two different exams administered at a public
university with very large classes. Exam 1 was taken before Exam 2. We are
interested in seeing if the performances on the two exams were the same at
the α = 0.05 level. We have the following hypotheses:
The following descriptive statistics are captured from the data set:
We will use α = 0.03. Our data set "Time400m" yields the following statistics:
The degrees of freedom, as determined via Eq. 6-15, are 22.65. Our critical value
is t.inv.2t(0.03, 22.65), which is +/- 2.32. Our test statistic, as determined
via Eq. 6-18 is as follows:
Given that our test statistic is more extreme than our critical value, we reject
H0 and claim that Alberto is in fact more than one second faster than
Guillermo. As stated in the structure of the actual test statistic, Guillermo is
more than one second slower than Alberto. The p-value is tdist(2.32, 22.65,
2) = 0.0189.
³ Here, Alberto's hypothesized faster time is subtracted from Guillermo's hypothesized
slower time so as to yield a positive value of the test statistic, simplifying understanding.
We are then essentially saying that Guillermo is more than one second slower than
Alberto. Also note that our value of d = 1, the hypothesized difference in average
time.
where we are 1 − α confident the true population parameter lies. This action is very
similar to a two-tailed test. In fact, if we construct a confidence interval using
some value of α, and that interval contains the hypothesized population
parameter (such as μ), we know that we fail to reject H0 at that particular value
of α. Conversely, if the interval does not contain the population parameter,
we know to reject H0 at that specific value of α.
One will notice that the hypothesized population mean of 14 is included in this
interval, telling us that H0 is not to be rejected. Conversely, if the hypothesized
population mean was not in this interval, we would reject H0.
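A sketch of this check in Python, using the Cheerios summary statistics and the two-tailed critical value t.inv.2t(0.05, 24) ≈ 2.064 from earlier in the chapter:

```python
from math import sqrt

xbar, s, n, mu0 = 14.03, 0.1087, 25, 14
t_crit = 2.064                      # t.inv.2t(0.05, 24)
half_width = t_crit * s / sqrt(n)   # margin of the 95% confidence interval
low, high = xbar - half_width, xbar + half_width
contains_mu0 = low <= mu0 <= high   # True: fail to reject H0 at alpha = 0.05
```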
6.6 Conclusions
It is my opinion that hypothesis testing is the most difficult part of
learning statistics. It is also my belief that it is the most important part of
statistics. Making a proclamation to the world requires a formal test so as to
support the proclamation – at least this is true in the scientific community,
which is an important community in society. As such, it is imperative that we
understand hypothesis testing.
While the topic may be difficult, it is not unreasonably difficult. The most
important part is to understand and visualize the H0 and HA. Once this is
accomplished, the rest falls into place. Of course, practice is needed so that
the proper formulae and Excel functions are correctly used in the pursuit of
the problems.
6.7 Exercises
For problems involving actual hypothesis testing of data, the following steps
are required:
3. Mike Trout of the Anaheim Angels thinks his batting average exceeds
0.320. What are the appropriate hypotheses associated with this claim?
7. A courier service advertises that its average delivery time is less than 6
hours for local deliveries. A random sample of times for 12 deliveries to an
address across town was recorded. These data are shown below. Is this
sufficient evidence to support the courier’s advertisement at the 5% level of
significance? Delivery time data: 3.03, 6.33, 6.50, 5.22, 3.56, 6.76, 7.98, 4.82,
7.96, 4.54, 5.09, 6.46
8. Mike’s Bikes in Columbus, Georgia sells a great many road bikes. One of
the things that causes Mike Reynolds (the store’s proprietor) heartburn is
customers coming back in with tire problems. If too much air is put into the
tires, blowouts can occur, which are dangerous. In addition to blowouts, the
inner tubes must be replaced, which is difficult. If too little air is put into the
tires, poor bike performance results, specifically frequent tire replacement.
Mike is wondering if his mechanics are putting the correct amount of air in the
tires. The correct amount of air needed for the tires is 115 psi (pounds per
square inch). Mike randomly sampled 100 inflated tires, and
discovered the mean tire pressure to be 113.58 psi, with a standard deviation
of 8.1 psi. At the α = 0.08 level, what can we conclude about the tire pressure
at Mike’s?
10. Individuals filing Federal Income Tax returns prior to March 31 had an
average refund of $1,056. Consider the population of “last minute” filers who
mail their returns during the last five days of the income tax period (April 10-
15). A sample of 400 late filers was collected, and it was determined that their
average refund was $910 with a standard deviation of $1,600. Do late filers
receive a refund less than $1,056? Use α = 0.05.
11. Ten years ago, the A.C. Nielson service claimed that on average, an
American household watched 6.70 hours of television per day. An
independent market research group believes that more television is watched
now. To test this claim, 200 households were surveyed, and they found a mean
of 7.25 hours of television are watched now, with a standard deviation of 2.5
hours. Has the amount of television viewing increased over the last ten years?
Use α = 0.02.
15. Use the "DiabeticFoods" data set, and perform the appropriate
hypothesis test.
16. Use the "FinanceProfessors" data set, and perform the appropriate
hypothesis test.
17. Use the "NFLLinemen" data set, and perform the appropriate
hypothesis test.
18. Use the "PulseRate" data set, and perform the appropriate hypothesis test.
19. Use the data from Problem 5-4. Do a majority of Oregonians support
medicinal marijuana?
20. Use the data from Problem 5-3. Do a majority of North Carolinians
consider Wake Forest University the best school in the state?
This question is asking whether or not defects per million units produced is
related to the shift. If defects are related to shift, we can conclude that shift
does have an effect on defects. Otherwise, we must conclude that shift does
not have an effect on defects.
The reason this tool is called Analysis of Variance (“ANOVA” for short) is
because at the heart of our analysis, we are comparing two variations. The F-
statistic does this via a ratio of two measures of variation. Before presenting
this statistic, we need to define a few terms.
Term Definition
a number of groups, or populations
ni sample size of group i (i = 1, 2, …, a)
x̄i sample mean of group i
si sample standard deviation of group i
xit the tth observation of group i
x̿ mean of all sample data
Table 7.1. Terms Used for ANOVA
The values in Table 7.1 are used to determine the F-statistic, which is the ratio
of two variation measures. In its most general form, the F-statistic looks like
this:
What this statistic tells us is that F measures how much “distance” there is
between groups compared to how much variation each group has on its own.
If this ratio is large, we can claim that group means are in fact statistically
different (reject H0). Otherwise, we cannot claim a significant difference
between groups (fail to reject H0). In more specific terms, our F-statistic looks
like this:
accompanies the F-statistic, and like with other p-values we’ve pursued, a
smaller p-value implies we should reject the H0.
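While the book computes ANOVA through Excel, the ratio itself is straightforward to compute directly. Below is a sketch of the standard one-way form, the between-group mean square over the within-group mean square; the function name and the sample data in the usage note are illustrative, not from the text.

```python
from statistics import mean

def f_statistic(groups):
    """F = between-group variation / within-group variation (MSB / MSW)."""
    a = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand = mean(x for g in groups for x in g)   # mean of all sample data
    msb = sum(len(g) * (mean(g) - grand)**2 for g in groups) / (a - 1)
    msw = sum((x - mean(g))**2 for g in groups for x in g) / (n - a)
    return msb / msw
```

For example, two made-up classes with close means and wide spreads, `f_statistic([[80, 85, 90], [82, 87, 92]])`, yield a small F, while well-separated groups yield a large one.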
Consider a scenario where the average exam score for the Charlotte class is
about 86, and the average exam score for the Winston-Salem class is about 85.
standard deviation for both classes is about 5.5. The F-statistic for this
scenario is 0.07 (and an associated p-value of 0.7971), with a between group
variation of 39.20, and a within group variation of 25.8. Clearly, there is no
significant difference between the groups. Figure 7-1 supports this via a
combined histogram for the two classes.
[Figure 7-1. Combined histogram of exam scores (70–100) for the Charlotte and Winston-Salem classes.]
From a graphical standpoint, Figure 7.1 shows no separation between the two
groups. As such, we are unable to claim any differences between the groups.
Now we consider a scenario where the means of the two groups are the
same as they were before, but this time, the sample standard deviations are
about 4 for each group. This results in an F-statistic of 2.80, with a between
group variation of 45.00 and a within group variation of 16.05. The F-statistic
has an associated p-value of about 0.0947. Figure 7-2 shows the combined
histogram of this scenario.
[Figure 7-2. Combined histogram of exam scores (70–100) for the second scenario.]
Here, we may notice that there is some evidence that the Charlotte class has
done slightly better than the Winston-Salem class, because there is more
"blue" to the right and more "red" to the left. Nevertheless, our evidence is
less than compelling.
[Figure: combined histogram of exam scores (70–100) for the Charlotte and Winston-Salem classes.]
The first part of Figure 7-3 shows the descriptive statistics for each of the three
shifts. The average number of defects per shift is of greatest interest to us.
You will notice that the defects per million for the three shifts range from
around 44 to about 49. The ANOVA table below shows the F-statistic (F = 1.69)
and its associated p-value. You will see that the p-value is high (0.1871), which
prevents us from claiming that the defects per million units made differ
across shifts. Because of this, I am reluctant to present a combined
histogram of defects/million for each shift, as we'd be unable to show any
distinction between the histograms. For this problem, we can conclude that
shift does not have an effect on defects. Incidentally, the remainder of the
ANOVA table simply shows the derivation of the F-statistic.
As another example, let us look at the useful life of tools coated with
three different metals: nickel, phosphorus and zinc. We wish to study the
useful tool life for each group of tools.
[Figure 7.4. Combined histogram of useful tool life (39–89) for the three metal coatings.]
From the combined histogram in Figure 7.4, it appears that phosphorus gives
us the longest tool life, followed by zinc, with nickel providing us with the
shortest tool life. However, we cannot make this statement in any official
capacity until we perform the appropriate ANOVA test. Using Excel in a similar
fashion to the last example, we have the following ANOVA output, as shown
in Table 7.3.
As one can see from Figure 7.4 and Table 7.3, there is strong evidence for us
to claim that the three metal treatments do not result in similar useful tool
lives, with F = 6352.38 and a corresponding p-value < 0.0001. Despite the p-
value showing "0" above, we can never claim a p-value equal to 0, so we say
that it is < 0.0001. The reason for this is that we are using sample data from a
larger population, and the p-value will always have a value in excess of 0. For
this problem, we can conclude that the metal treatment does have an effect
on tool life.
This question has two factors (Gender and Day), where Gender has two levels
(Male and Female), and Day of Week has seven levels (each day of the week).
⁴ Excel displays the p-value here as "0," but it should be "< 0.0001." Excel does this
when confronted with p-values less than 10⁻¹²⁸.
7.4 Conclusions
ANOVA determines whether or not some factor has an effect on the value
of some other variable. This factor can be considered a categorical variable:
a variable we cannot precisely measure numerically. Certainly we can assign
a number to a shift, but this number is only relative to other shifts. We cannot
numerically measure specific metal treatments, the gender of a salesperson,
or a specific day of the week. Nevertheless, we are exploring a possible
relationship between
a categorical variable and a numerical variable. When we get to the chapter
on Simple Linear Regression, our categorical variable will be replaced with a
numerical variable. Because of this, there is a strong relationship between
Single-Factor ANOVA and Simple Linear Regression. Chapter 9 will pursue this
further.
7.5 Exercises
1. I am interested in pursuing the relationship between temperature and
the time of day. What is the categorical variable, and what is the numerical
variable? How many levels does the categorical variable have?
2. Use the "GenderSalesData" file for this problem. At the α = 0.05 level,
does Gender have an effect on sales volume? Report the mean sales for each
Gender and justify your response accordingly.
3. Use the "GenderSalesData" file for this problem. At the α = 0.07 level,
does Day of the week have an effect on sales volume? Report the mean sales
for each Day and justify your response accordingly.
4. Use the "ServerComplaints" data file for this problem. At the α = 0.03
level, does server have an effect on the percent of restaurant customers who
complain about the service? Report the mean complaint percentage for each
server and justify your response accordingly.
5. Use the “ExamDay” data file for this problem. Professor Fraud teaches
statistics on Monday, Wednesday and Friday. When he gives his students an
exam, it will be on one of these days. The data set shows randomly selected
exam scores for students taking exams on these days. At the α = 0.05 level,
does the day of the exam have an effect on the exam score? Present the
means for the exam days, and use this to justify your result.
6. Use the “WhisperingPines” data set for this question. Arnold, Jack and
Tiger are reputable golfers, and Whispering Pines is a famous golf course, with
a rich history. There are four Par 5s on the golf course. On a par 5, a
professional golfer is expected to get their ball on the green in three shots.
However, the best golfers try to get their ball on the green in two shots, so as
to increase their probability for a good score. If a golfer gets their ball on the
green of a par 5 in two shots, they have been successful at this challenge. The
data set includes randomly-selected success rates for Arnold, Jack and Tiger
for some of their rounds of golf played at par 5 holes at the Whispering Pines
golf course. Using α = 0.03, are the success rates unique for the three golfers?
8. For this question, use the “HousePricesFourCities” data set. There are
four cities involved here: Indianapolis, Boston, Rochester and San Diego. For
each city, randomly-selected real-estate transaction prices have been shown.
At the α = 0.01 level, can we say that the city has an effect on the transaction
price?
9. Using the same data set as above, and α = 0.01, can we claim a difference
in transaction price between San Diego and Boston?
8. Chi-Square Testing
In our study of statistics, there are often times when we are given a
distribution of data and we need to “tell the world” the type of data
distribution we have. One way to do this is by inspection. For example, if our
histogram shows a peak in the middle of the data, the histogram is symmetric,
and low frequencies of outcomes lie in the tails, a normal distribution is a good
guess. Unfortunately, it is not always that easy.
In this chapter, we will cover two types of chi-square tests. The first is
called a “goodness of fit” test to see if a given data distribution matches that
of an expected distribution. The second type of test is called a “test of
independence,” where we determine if the variables associated with a
marginal probability table are independent of each other.
H0: the given distribution fits the expected distribution (Eq. 8-1)
HA: the given distribution does not fit the expected
distribution
As is the case with ANOVA, these hypotheses are always the case, and the
need to formally state these hypotheses is not as necessary as it is for the tests
we performed in Chapter 6.
outcomes is the observed distribution and the other set of outcomes is the
expected distribution. With this stated, assuming we have n possible
outcomes, we compare the observed frequencies (fo) with the expected
frequencies, and derive our chi-square test-statistic as follows:
χ² = Σ (i = 1 to n) [ (fo(i) − fe(i))² / fe(i) ]   (Eq. 8-2)
This statistic takes the squared difference between the observed and expected
frequencies for each outcome and then divides by the expected frequency for
standardization purposes. The summation accounts for all possible outcomes.
If this value exceeds some critical value, we reject H0 and claim a difference
between observed and expected distributions. Table 8-1 shows the critical
value and p-values associated with a chi-square test for a level of significance
α and n possible outcomes.
Function Definition
chiinv(α, n − 1) critical value of a chi-square test
chidist(χ², n − 1) p-value of a chi-square test
Table 8-1. Excel Functions for Chi-Square Test
A good example of this is to roll a single die 100 times and study the
frequency of the six possible outcomes. Given that we expect each outcome
to occur with a 1/6 probability, the expected frequency of each outcome over
100 trials will be 100/6, which is 16.67. Table 8-2 shows the observed
frequencies, compared to the expected frequencies, when 100 rolls were
simulated. The table also shows the by-outcome detail of the chi-square
statistic shown in Eq. 8-2.
Outcome 1 2 3 4 5 6
Observed 21 13 9 21 17 19
Expected 16.67 16.67 16.67 16.67 16.67 16.67
(fo − fe)²/fe 1.13 0.81 3.53 1.13 0.01 0.33
Table 8-2. Chi-Square Test for Single Die Roll
When we sum the bottom row of this table, we have a χ² statistic of 6.92.
Using an α = 0.05 level of significance, we have a critical value of 11.07. Because
our χ² test statistic of 6.92 does not exceed our critical value of 11.07, we
cannot reject the H0. As such, we conclude the observed distribution is the
same as the given distribution, with a p-value of 0.2267. In layman’s terms,
we can claim the single die “is fair.” Figure 8-1 below shows the results via a
histogram – notice the small difference in the height of each bar.
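The die example translates directly into a Python sketch of Eq. 8-2; 11.07 is the critical value chiinv(0.05, 5), as noted above.

```python
observed = [21, 13, 9, 21, 17, 19]   # Table 8-2
expected = [100 / 6] * 6             # fair-die expectation over 100 rolls
chi_sq = sum((fo - fe)**2 / fe for fo, fe in zip(observed, expected))  # Eq. 8-2
critical = 11.07                     # chiinv(0.05, 5)
reject_h0 = chi_sq > critical        # False: the die appears fair
```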
[Figure 8-1. Die Roll: observed vs. expected frequencies for outcomes 1–6.]
We can also test a data set of observed data to see if they follow a normal
distribution. We learned in Chapter 4 that a normal distribution is a
continuous distribution (one with infinite possible outcomes), which doesn’t
coordinate well with the discrete nature of chi-square testing. Nevertheless,
we can use the “norm.dist” function in Excel to transform the continuous
nature of a normal distribution to one with a discrete nature. The “norm.dist”
function takes on the following form:
This function returns the normal density function associated with the given
arguments: μ is the hypothesized population mean; σ is the standard
deviation of the normal distribution; xi is a specific outcome; and "false"
requests the density function value associated with these arguments, as
opposed to the cumulative density function. We can convert this density
function to a probability, specifically the probability of outcome i occurring,
as follows:
At this point, we are able to conduct our chi-square test because we have
values of fo and fe for all possible outcomes.
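The discretization can be sketched in Python. Rather than the point-density approximation described above, an equivalent route takes differences of the cumulative normal probabilities over each bin (the function names are mine; math.erf supplies the normal CDF without any add-ins).

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Cumulative normal probability, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def expected_freq(lo, hi, mu, sigma, n):
    """Expected count of n observations falling in the bin [lo, hi)."""
    return n * (norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma))
```

Computing `expected_freq` for every histogram bin yields the fe values, which can then be compared against the observed bin counts fo with Eq. 8-2.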
Figure 8-2 shows a combined histogram for the chi-square test for normality.
[Figure 8-2. Combined histogram of observed and expected frequencies by weight (19.4–20.9).]
We can also test whether or not some given distribution is binomial. This
is easier than testing for normality, because the binomial distribution is
discrete by nature, and making an approximation from continuous to discrete
is not needed here. Given a data set of observed frequencies (fo), we can
estimate the probability of success (p), and then use the binomial distribution
to determine fe values for our chi-square test.
On-Time 0 1 2 3 4 5
Frequency (fo) 22 26 34 11 5 2
Table 8.4. Summary of Patients Seen On-Time
need to estimate this value. We can do this by first determining the number
of patients seen on time: 0(22) + 1(26) + 2(34) + 3(11) + 4(5) + 5(2), which is
157. Since there were 500 total patients seen (5 patients/day over 100 days),
the probability of a patient being seen on time is 157 / 500, which is 0.314.
This value is used in the “binom.dist” function, which will provide us with
probabilities associated with the number of “successes” from 0 to 5 patients
being seen on time. For a reminder of the binom.dist function, please refer to
Section 4.1.2. Using this function, along with multiplying each of these
binomial probabilities by 100 days, we have the following:
Successes 0 1 2 3 4 5
fo 22 26 34 11 5 2
Binomial Prob. 0.15 0.35 0.32 0.15 0.03 0.00
fe 15.19 34.77 31.83 14.57 3.33 0.31
(fo − fe)²/fe 3.05 2.21 0.15 0.87 0.83 9.41
Table 8.5. Chi-Square Test Results for On-Time Visits
The values in the bottom row of Table 8.5 result in a χ² statistic of 16.53, with
an associated p-value of 0.0055. At any credible value of α, we would reject
H0, and claim that the binomial distribution does not describe the given data
set. Figure 8.3 shows the non-matching histograms, supporting our decision
to reject H0.
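The whole goodness-of-fit test condenses into a short Python sketch; the variable names are mine, and p is estimated from the data exactly as above.

```python
from math import comb

observed = [22, 26, 34, 11, 5, 2]   # Table 8.4
days, trials = 100, 5
# estimated probability of an on-time visit: 157 / 500
p = sum(k * f for k, f in enumerate(observed)) / (days * trials)
# binomial expected frequencies over 100 days
expected = [days * comb(trials, k) * p**k * (1 - p)**(trials - k)
            for k in range(trials + 1)]
chi_sq = sum((fo - fe)**2 / fe for fo, fe in zip(observed, expected))  # Eq. 8-2
```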
[Figure 8.3. Observed vs. binomial-expected frequencies of on-time visits (0–5).]
Yes No Total
Republican 242 (56.94%)
Democrat 183 (43.06%)
Total 275 (64.71%) 150 (35.29%) 425
Table 8.6. House Vote Template on HR 1599
Because I know that 64.71% voted "Yes," and 56.94% are Republicans, I expect
that (64.71% × 56.94% × 425) = 156.59 of the 425 voters are Republicans who
voted "Yes."
Using Eq. 8-7, we can compute the remaining values of Table 8.6 as follows:
Yes No
Republican 156.59 85.41
Democrat 118.41 64.59
Table 8.7. “Expected” House Vote on HR 1599
The values in the four "filled" cells of Table 8.7 are based on expectation only;
they are independent of political party and the actual vote. The observed
values (fo) are the original values from Table 3.1:
Yes No
Republican 230 12
Democrat 45 138
Table 8.8. “Observed” House Vote on HR 1599
Revising our formula from Eq. 8-2 to determine the test statistic for a chi-
square test for independence, we have the following:
χ² = Σ (i = 1 to rows) Σ (j = 1 to cols) [ (fo(ij) − fe(ij))² / fe(ij) ]   (Eq. 8-8)
Using Eq. 8-8, we have the following values that need to be summed:
Yes No
Republican 34.42 63.10
Democrat 45.51 83.44
Table 8.9. Calculation for House Vote on HR 1599
These values sum to a χ² statistic of 226.47. A chi-square test for independence
has degrees of freedom determined as follows:
Our test, then, has (2 − 1)(2 − 1) = 1 degree of freedom. This yields a p-value <
0.0001, strongly suggesting that our two entities are not independent of each
other. In other words, political party and the actual vote are related to each
other, which, given our political climate of strong partisanship, should not
be a surprise.
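The independence test can be sketched in Python, building the expected counts from the row and column totals per Eq. 8-7 and summing per Eq. 8-8; variable names are mine.

```python
observed = [[230, 12],    # Republicans: Yes, No (Table 8.8)
            [45, 138]]    # Democrats:   Yes, No
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
total = sum(row_tot)
# fe(ij) = row total * column total / grand total, per Eq. 8-7
chi_sq = sum((observed[i][j] - row_tot[i] * col_tot[j] / total)**2
             / (row_tot[i] * col_tot[j] / total)
             for i in range(2) for j in range(2))          # Eq. 8-8
df = (len(observed) - 1) * (len(observed[0]) - 1)          # (2-1)(2-1) = 1
```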
Table 8-12 shows the components of the chi-square test statistic for
independence.
The χ² test statistic sums to 10.08. This test has (3 − 1)(4 − 1) = 6 degrees of
freedom, resulting in a p-value of 0.1215. As such, we fail to reject the H0 and
we claim that the entities of the student's opinion and class are independent;
they are not related.
8.4 Conclusions
The chi-square test is a powerful, yet simple statistical test. Because of
its simplicity, it can be used to make a variety of general determinations. The
general intent of this test is to compare the observed data we are provided to
the outcomes we expect. When we do this, we can address many important
issues of a statistical nature.
In the next chapter, we cover simple linear regression – one of the more
important concepts in statistics. With careful data preparation, we can
replace regression hypothesis testing with a chi-square test. The motivation
to do this is to simplify the process. This is only mentioned as an example of
the value of the chi-square test. The power of the chi-square test is its
simplicity.
Because of its simplicity and versatility, the chi-square test has broad
applicability.
8.5 Exercises
1. I flip a coin 103 times, and tails is the result 57 times. At the α = 0.05
level, is the coin fair?
2. I roll a pair of dice 1000 times, and the following outcomes result. At the
α = 0.05 level, are the dice fair?

Outcome 2 3 4 5 6 7 8 9 10 11 12
Freq. 25 56 83 112 145 172 136 108 86 51 26
Use the following data for Problems 3 – 7. I flip a coin 10 times, and I tally the
number of times I end up with tails. I repeat this experiment 1,000 times, and
end up with the following frequency for tails.

Tails 0 1 2 3 4 5 6 7 8 9 10
Freq. 1 5 47 115 207 248 202 120 41 12 2
9. Consider a large university with seven colleges. Each year, the number of
cheating cases for each college are published by the university. The relevant
data is as follows:
10. People were randomly selected across the US to ask if they would like to
see gun laws made more restrictive. Specifically, the question was “would you
like to see gun laws in the US made more restrictive?” Both men and women
were asked the question and the results are below:
Yes No
Men 74 51
Women 80 32
Using α = 0.03, are gender and the response to the question independent?
11. Several Americans were randomly selected and asked if they approved of
the US Supreme Court’s decision to permit same-sex marriage. The
respondents were broken into political party affiliation. The data is shown
below:
Using α = 0.06, are party affiliation and approval of same-sex marriage
independent?
y = a + bx (Eq. 9-1)
The intercept, represented by "a," is the value of y when x equals zero.
Graphically, this is where the line crosses the vertical axis (y-axis). The
slope, represented by "b," is the ratio of the change in y to each unit change
in x. Mathematically, this is often stated as follows:
b = Δy / Δx (Eq. 9-2)
Figure 9.1. Plot of the Line y = 3 + 2x
This example line has an intercept of "3" – notice how the line crosses the
vertical axis at y = 3 (y = 3 + 2(0) = 3). This example line has a slope of "2" –
the Δy / Δx ratio is equal to "2," as detailed in the "wedged" section of Figure
9.1.
Price 3.38 3.42 3.48 3.52 3.55 3.55 3.61 3.61 3.67 3.72
Demand 83 86 83 81 81 83 81 81 80 80
Table 9.1. Price vs. Demand Data for Diet Dr. Pepper
Figure 9.2. Scatter Plot for Diet Dr. Pepper Demand Data
Here we see points showing the demand for each price, as the result of our
random sampling effort. As one might expect, there seems to be an inverse
relationship between price and demand – as price increases, demand seems
to decrease.
In order to find the “best” line relating price to demand, we select a slope
and intercept that minimizes the sum of squared errors. Figure 9.2 also shows
a fitted line. This fitted line is the result of selecting a slope and intercept that
minimizes the total squared difference between the actual data points and the
fitted line. These error terms are shown via the thin vertical lines spanning
the fitted line and the actual data values. We square these error terms so
that positive and negative errors do not cancel each other out when summed.
The next section will detail how to perform ordinary least squares
regression in Microsoft Excel, but for our data set, we have a slope of -$14.13
and an intercept of 132.05 units of Diet Dr. Pepper. What this means is that
when the price is zero, we can expect a demand of 132.05 units. This result,
of course, should not be taken literally, but the intercept is where the line
crosses the vertical (“y”) axis. The slope tells is that for each $1 increase in the
price, we can expect demand to decrease 14.13 units. The negative sign tells
us it is a decrease.
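The slope and intercept can be computed directly from the least-squares formulas (b = Sxy / Sxx, a = ȳ − b·x̄). A sketch in Python using the Table 9.1 data as reproduced above; the figures printed here come out near the book's −14.13 and 132.05, with small differences that may reflect rounding in the reproduced table:

```python
# Ordinary least squares fit of Demand on Price (Table 9.1 data).
price = [3.38, 3.42, 3.48, 3.52, 3.55, 3.55, 3.61, 3.61, 3.67, 3.72]
demand = [83, 86, 83, 81, 81, 83, 81, 81, 80, 80]

n = len(price)
x_bar = sum(price) / n
y_bar = sum(demand) / n

# Sums of cross-products and squares
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, demand))
s_xx = sum((x - x_bar) ** 2 for x in price)

b = s_xy / s_xx          # slope: change in demand per $1 of price
a = y_bar - b * x_bar    # intercept: demand when price is zero

print(round(b, 2), round(a, 2))  # approximately -14.0 and 131.6
```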
The table shows the slope and intercept values that we learned earlier.
Y = α + βX (Eq. 9-3)
Here, the "x" and "y" terms are capitalized to imply data from the entire
population. This equation shows α, which represents the intercept for the
population5, while β represents the slope for the population. We need to
test the slope and intercept for statistical significance. These hypotheses are
as follows:
H0: α = 0; HA: α ≠ 0 (Eq. 9-4)

H0: β = 0; HA: β ≠ 0 (Eq. 9-5)
Eq. 9-4 is a test for the significance of the intercept, while Eq. 9-5 is a test for
the significance of the slope. The test for the intercept is not of importance
here – the intercept is important in terms of minimizing the sum of squared
error value, but for this book, the statistical significance of the intercept is not
a matter of concern. What is important, however, is the test for the statistical
significance of the slope. In Eq. 9-5, the H0 states that the slope is equal to
zero. This means that as X changes, Y does not change – Y is not sensitive to
X. The HA says that the slope is not equal to zero, meaning that as X changes,
Y changes as well – Y is sensitive to X.
These tests for the significance of the slope and intercept are always as
presented in these two equations. They are always two-tailed tests. Because
of this consistency, we never have to formally state them – they are always
5 Not to be confused with α, the level of significance, as covered previously. These
two entities are independent from each other.
the same. This consistency allows Excel, or whatever software is being used,
to present us the appropriate statistics to interpret. We have to, of course,
compare these hypothesized values to the estimated values, which we know to
be "a" for the intercept and "b" for the slope. The test statistic for the
intercept is as follows:
t = a / sa

Here, sa is the standard error of the intercept estimate. The test statistic for
the slope is analogous: t = b / sb, where sb is the standard error of the slope
estimate.
The results of these two-tailed tests are also shown in Table 9.2. When
the estimates of the slope and intercept are divided by their standard errors,
we have the appropriate t-statistics and p-values. You will notice that the p-
value associated with the intercept is < 0.0001. Here, we reject the H0, and
claim the intercept is statistically significant. The t-statistic associated with
the slope term is -3.90, with an associated p-value of 0.0045. This tells us that
the relationship between Demand and Price is statistically significant – there
is a meaningful relationship between price and demand.
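The slope's t-statistic can be reproduced by dividing b by its standard error, sb = √(MSE / Sxx). A sketch in Python on the Table 9.1 data as reproduced above (again, expect small differences from the book's −3.90 if the reproduced table rounds differently):

```python
import math

# Test statistic for the slope: t = b / s_b, using the Table 9.1 data.
price = [3.38, 3.42, 3.48, 3.52, 3.55, 3.55, 3.61, 3.61, 3.67, 3.72]
demand = [83, 86, 83, 81, 81, 83, 81, 81, 80, 80]

n = len(price)
x_bar, y_bar = sum(price) / n, sum(demand) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, demand))
s_xx = sum((x - x_bar) ** 2 for x in price)
s_yy = sum((y - y_bar) ** 2 for y in demand)

b = s_xy / s_xx
sse = s_yy - b * s_xy                # sum of squared errors
mse = sse / (n - 2)                  # mean squared error
s_b = math.sqrt(mse / s_xx)          # standard error of the slope

t = b / s_b
print(round(t, 2))  # a large negative t, near the book's -3.90
```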
There is one other statistic worth discussing, which is the R2 value. The
value of R2 tells us the percent of variation in our “y” variable that is explained
by our “x” variable. This measure of R2 is essentially a measure of the model’s
predictive ability. The R2 value can be as low as zero, and as high as one. We
want our R2 value to be as high as possible, as that would maximize the
percent variation in “y” that is explained by “x.” For our example problem, the
R2 value is 0.6553. This value is also included with our Excel output. This
means that 65.53% of the variation in Demand is explained by Price. This
implies, then, that 34.47% of the variation in Demand has not been explained.
As such, when using this model for prediction purposes, we should temper our
expectations on the quality of our predictive ability, because much of the
variation in demand is explained by entities other than price.
9.5 Conclusions
As stated at the beginning of this chapter, linear regression is the study
of relationships among variables. This means that our paramount interest is
to see if two variables are related. If they are related – meaning we reject the
H0 regarding a slope of zero – then, and only then, can we pursue the topic of
prediction. In short, significance is more important than prediction.
9.6 Exercises
1. Earlier in this chapter, while talking about the Diet Dr. Pepper example, I
mentioned that if we increased the sample size, we would decrease the p-
value associated with the slope term. Why is this the case?
2. Let us revisit the “ExamScores” data set that we used in Chapter 2. This
data set has two columns: the “Exam 1” column is the Exam 1 score for a
specific student. The “Exam 2” column is the Exam 2 score for the same
student. Explain how simple linear regression could be used as a tool to study
the student’s propensity for success.
3. Using the “ExamScores” data set, are Exam 1 scores and Exam 2 scores
related?
4. Using the “ExamScores” data set, report the R2 term with the Exam 1
score as the independent variable and the Exam 2 score as the dependent
variable.
5. Using the “ExamScores” data set, report the R2 term with the Exam 2
score as the independent variable and the Exam 1 score as the dependent
variable.
6. How do you reconcile your results from the above two questions?
7. Using the “ExamScores” data set, what score would you expect a student
to get on Exam 2 if they got an 83 on Exam 1?
15. Using the “BaseballSalaries2014” data set, describe the predictive ability
of your model.
16. Look at the two regression plots below. For each plot, the ordinary least
squares regression line is fitted through the data. Of the two plots, which one
has the higher R2 value? Justify your response.
(First plot: y-axis from $1,100.00 to $1,140.00; x-axis from $11.83 to $16.83)

(Second plot: y-axis from $1,100.00 to $1,180.00; x-axis from $11.83 to $16.83)
Let us continue our Diet Dr. Pepper example from the Simple Linear
Regression chapter. This time, however, we will add a predictor variable
"Advertising,"6 which represents the number of signs advertising the Diet Dr.
Pepper in the proximity of the store. The new data set, then, is as follows:

6 To "paint" the necessary predictor variables, they must be in adjacent columns.
Otherwise, Excel cannot process them.
Price 3.38 3.42 3.48 3.52 3.55 3.55 3.61 3.61 3.67 3.72
Adv. 28 30 28 26 26 28 26 26 26 27
Dem. 83 86 83 81 81 83 81 81 80 80
Table 10.1. Price and Advertising vs. Demand Data for Diet Dr. Pepper
We then ask Excel to perform the linear regression. This time, however, we
make certain that both Price and Advertising are treated as “x” variables.
This statistic should not look entirely unfamiliar. We saw something very
similar when we discussed the Analysis of Variance. If this ratio is high (the
associated p-value being less than some threshold value, such as 0.05), we
reject the H0 and claim that there is a meaningful relationship between the
predictor variables and the response variable.
Here we see that the intercept and both slope terms are statistically significant
– all p-values are very small. You will also notice that the values of the
intercept and the Price slope have changed from the simple linear regression
(“SLR”) problem. This should not be a surprise, because with the addition of
the new term (Advertisement) all other terms will have new values in the
pursuit of minimizing the sum of squared errors term.
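The coefficients are whatever values minimize the sum of squared errors over both predictors at once. A sketch in Python via the normal equations, using the Table 10.1 data as reproduced above (the book does this fitting in Excel, and its exact coefficients are not reproduced in this excerpt):

```python
# Multiple linear regression: Demand = a + b1*Price + b2*Advertising,
# fit by solving the normal equations (X'X) beta = X'y (Table 10.1 data).
price = [3.38, 3.42, 3.48, 3.52, 3.55, 3.55, 3.61, 3.61, 3.67, 3.72]
adv = [28, 30, 28, 26, 26, 28, 26, 26, 26, 27]
demand = [83, 86, 83, 81, 81, 83, 81, 81, 80, 80]

X = [[1.0, p, s] for p, s in zip(price, adv)]  # design matrix with intercept

# Build X'X and X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(X[r][i] * demand[r] for r in range(len(X))) for i in range(3)]

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system
    A = [row[:] + [v] for row, v in zip(A, b)]
    n = len(A)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

a0, b1, b2 = solve(xtx, xty)
fitted = [a0 + b1 * p + b2 * s for p, s in zip(price, adv)]
print(round(a0, 2), round(b1, 2), round(b2, 2))
```

Note how the price slope differs from the simple-regression slope: adding Advertising re-allocates explanatory weight between the two predictors.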
7 In truth, the F-statistic in multiple linear regression almost always shows
significance, because we are trying to optimize the predictive ability, which
strongly implies that a significant relationship between the "x" and "y" variables
already exists.
We must, however, bear in mind that when predicting, we use values of the
predictor variables that are in the range of the values from our data set.
Otherwise, we are extrapolating, which can provide unreliable results.
10.2 Multicollinearity
Let us consider another example, where I am interested in estimating
GMAT Score using the predictor variables of undergraduate GPA and Hours
Studying for the GMAT exam. This sounds like a reasonable experiment. It
seems reasonable to expect that as both undergraduate GPA and Hours
Studying increase, so would the associated GMAT score.
When looking at the table above, a problem becomes evident. The slope term
for Hours Studying doesn’t show statistical significance. We are being told
that there is no relationship between Hours Studying and GMAT score. This
doesn’t make sense. Upon further investigation, a simple linear regression is
performed between GMAT score and Hours studying – GPA is excluded from
this analysis. This regression reveals that Hours Studying is in fact related to
GMAT score (t = 39.94, p < 0.0001).
10.2.1 Correlation
When we define multicollinearity above, we state that it is the presence
of a strong correlation of predictor variables. Now correlation must be
defined. Correlation is a degree of relationship between variables. That
definition sounds eerily familiar – simple linear regression explores
relationships between variables. The difference between correlation and
simple linear regression is that simple linear regression explains the
relationship via a slope and an intercept, along with the results of associated
hypothesis tests, while correlation provides a single standardized statistic
describing the relationship between the two variables of interest.
This value of ρ, intended to measure the correlation between "x" and "y," is
determined as follows:

ρ = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]
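This standardized statistic is easy to compute directly. A sketch in Python on the Diet Dr. Pepper data from Chapter 9 (an illustration only; the GMAT correlation matrix discussed below is not reproduced in this excerpt):

```python
import math

# Pearson correlation between price and demand (Table 9.1 data).
price = [3.38, 3.42, 3.48, 3.52, 3.55, 3.55, 3.61, 3.61, 3.67, 3.72]
demand = [83, 86, 83, 81, 81, 83, 81, 81, 80, 80]

n = len(price)
x_bar, y_bar = sum(price) / n, sum(demand) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, demand))
s_xx = sum((x - x_bar) ** 2 for x in price)
s_yy = sum((y - y_bar) ** 2 for y in demand)

r = s_xy / math.sqrt(s_xx * s_yy)  # standardized: always between -1 and +1
print(round(r, 4))
```

Note that r squared equals the R2 of the simple regression of demand on price, which is how this statistic ties back to the previous chapter.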
10.2.2 Remediation
If we take a close look at our correlation matrix above, we see that we
have also included the response variable of GMAT Score. This will be discussed
further momentarily. For now, you should take notice of the correlation of
0.9894 – the standardized relationship between the two predictor variables,
GPA and hours studying. This means that Hours Studying and GPA are
essentially telling us the same thing. Because of this, we have
multicollinearity, and when two predictors are highly correlated, our
regression output can be counterintuitive. Such is the case here – our multiple
regression output tells us that Hours Studying is not meaningful, when simple
regression tells us Hours Studying is meaningful.
What do we do when we get conflicting results like this? There are many
opinions on this matter, but I strongly prefer the most parsimonious
approach. I prefer to remove one of the correlated variables, and let it be
explained by the remaining variable. In this situation, it makes the most sense
to remove Hours Studying and let it be explained by GPA, because its
correlation with GMAT score (the response variable) is less than GPA’s
correlation with GMAT score (0.7846 < 0.7894).
10.3 Parsimony
With having only one predictor variable now, we once again perform a
linear regression without Hours Studying as a predictor variable. The results
are as follows:
Of course, the lone slope of GPA is highly significant (t = 40.59, p < 0.0001).
Most importantly, however, is the R2 value, which is 0.6232 – 62.32% of the
variation in GMAT score is explained by GPA. Comparing that to the multiple
regression model, where GMAT score is explained by GPA and Hours studying,
we have an R2 of 0.6238. In other words, by removing Hours Studying from
our model, we only lose 0.0006 from our R2 term. A meaningless loss.
8 Charles Dickens often used the word "parsimony" in his work. In the Dickensian
context, it implies "thrift" – not spending more money than needed. In our work
here, it is important not to use more words and/or variables than needed.
This parsimonious model is preferred because we get the same predictive
power with one less term to "worry" about and/or discuss.
10.4 Conclusions
Simple Linear Regression is a tool to explore whether or not relationships
exist between variables. With only a single predictor variable, we would be
naïve to assume Simple Linear Regression as a good predictive tool. This is
where Multiple Linear Regression comes in. Multiple Linear Regression builds
on relationships (uncovered by Simple Linear Regression) and attempts to
optimize the predictive fit by adding variables. Those variables, however,
should be chosen wisely. When the appropriate variables are added, our R2
term approaches a value of 1.00, resulting in a good predictive model.
10.5 Exercises
1. A company that delivers equipment is attempting to understand what
determines annual maintenance cost for their fleet of nine trucks. A data
set named "TruckMaintainence" is to be used, where data is given for each
truck, including the annual maintenance cost, the number of miles on the
truck, and the age of the truck in years. Using α = 0.05, please address the
following:
data. These variables are “College Prob,” “Class Size,” and “SAT
Score.” Definitions of these variables are as follows:
College Prob: this is the probability that the student in question will attend
college.
Class Size: this is the number of students from the class in which the student
in question comes.
SAT Score: this is the SAT score that the student in question earned.
For example, the very first data point tells us that the student in question has
a 63.52% chance of attending college. This particular student came from a
class size of 21, and earned a score of 1232 on the SAT test.
Your job for this problem is to find the most parsimonious model that explains
the probability of attending college. When finding this model, explain the
steps you have taken without use of Excel terminology.
Day 1 2 3 4 5 6 7 8
Demand 12 12 13 15 16 16 19 20
Table 11.1. Time Series Data for Forecasting Problem
For these problems, we will introduce the terms Dt and Ft, which mean the
actual demand and the forecast demand for period t, respectively.
11.2.3 Differencing
Unfortunately, neither of the forecasting approaches above is reliable.
The reason is that whatever value we forecast into the future, that value will
always be somewhere between the minimum and maximum values that we
use for our average. In short, we are interpolating, when there are times we
need to be extrapolating.
have the average change per time unit. If we let Δt represent the difference
for period t (Δt = Dt − Dt−1), we have the following:

Δ̄ = (1 / (n − 1)) Σt=2..n Δt
For this notation, n represents the number of periods. For our example,
period differences are shown in the table below:
Day 1 2 3 4 5 6 7 8
Demand 12 12 13 15 16 16 19 20
Difference – 0 1 2 1 0 3 1
Table 11.2. Time Series Data for Forecasting Problem
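The differencing forecast extends the last observation by the average difference. A quick sketch in Python on the Table 11.2 data (the forecast values printed here are an illustration of the differencing approach, not figures taken from the text):

```python
# Differencing forecast: extend the last observation by the mean difference.
demand = [12, 12, 13, 15, 16, 16, 19, 20]

diffs = [demand[t] - demand[t - 1] for t in range(1, len(demand))]
avg_diff = sum(diffs) / len(diffs)  # average change per day

f9 = demand[-1] + avg_diff        # one period ahead
f10 = demand[-1] + 2 * avg_diff   # two periods ahead
print(round(avg_diff, 3), round(f9, 2), round(f10, 2))  # 1.143 21.14 22.29
```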
Comparing our values of actual demand, Dt, with those “fitted,” we can
calculate the error for each time period, which in forecasting, we call this the
absolute deviation (ADt):

ADt = |Dt − D̂t| (Eq. 11-6)
The mean absolute deviation (MAD) is simply the average of the AD values for
all time periods:

MAD = (1/n) Σt=1..n ADt (Eq. 11-7)
Knowing the forecast error (MAD), we can use our “fit” model above to
forecast into the future:
Day 1 2 3 4 5 6 7 8
𝑫𝒕 12 12 13 15 16 16 19 20
𝑫̂𝒕 11.17 12.37 13.57 14.77 15.98 17.18 18.38 19.58
ADt 0.83 0.37 0.57 0.23 0.02 1.18 0.62 0.42
Table 11.3. Regression Forecasting Results for Example Problem
Figure 11.1 shows a time series plot of the given data along with the
regression forecasting line, where we have forecast two periods into the
future (F9 = 20.79, F10 = 21.99).
Figure 11.1. Time Series Data with Regression Forecast (Dt vs. Period)
One might note how the forecast line is simply a continuation of the trend,
which we quantified via simple linear regression.
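The fit/forecast values in Table 11.3 come from an ordinary least squares fit of Dt on t. A sketch in Python reproducing the fitted values, the MAD, and the two forecasts:

```python
# Regression-based forecast: fit D_t = a + b*t, then extend t into the future.
demand = [12, 12, 13, 15, 16, 16, 19, 20]
periods = list(range(1, len(demand) + 1))

n = len(demand)
t_bar = sum(periods) / n
d_bar = sum(demand) / n

b = (sum((t - t_bar) * (d - d_bar) for t, d in zip(periods, demand))
     / sum((t - t_bar) ** 2 for t in periods))
a = d_bar - b * t_bar

fitted = [a + b * t for t in periods]                  # the D-hat column
mad = sum(abs(d - f) for d, f in zip(demand, fitted)) / n

f9, f10 = a + b * 9, a + b * 10                        # forecasts
print(round(mad, 2), round(f9, 2), round(f10, 2))  # 0.53 20.79 21.99
```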
Figure 11.2. Linear Forecast (R² = 0.9589)
Upon careful inspection, you might notice that the trend line overestimates
the first few periods, underestimates the middle periods, and once again
overestimates the latter periods. This suggests that the trend is nonlinear.
Perhaps we are “splitting hairs” here, but one can make a very reasonable
argument for a nonlinear trend. When this occurs, we can use multiple linear
regression as a means to achieve “nonlinear regression.” We can do this by
creating a new independent variable called t2, which is simply the period
multiplied by itself and used as a predictor variable, resulting in the following
general equation:

D̂t = a + b1t² + b2t (Eq. 11-10)
Using multiple regression to estimate the intercept and slope terms, we have
the following model we can use for forecasting:
Figure 11.3. Nonlinear Forecast (R² = 0.9893)
Two things should be noted from inspection of Figure 11.3. First, notice how
the nonlinear fit better captures the data points than does the line via the
simple regression model. Second, notice how the associated R2 term has
increased about 0.03 from the simple regression model.
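Creating the t² predictor and fitting by least squares can be sketched as follows. The data here is an invented, exactly quadratic series for illustration, since the series behind Figures 11.2 and 11.3 is not reproduced in the text:

```python
# "Nonlinear regression" via multiple linear regression: add t^2 as a
# second predictor and solve the normal equations (X'X) beta = X'y.
# Illustrative data only: an exactly quadratic series D_t = 10 + 0.5t + 0.1t^2.
periods = list(range(1, 11))
demand = [10 + 0.5 * t + 0.1 * t ** 2 for t in periods]

X = [[1.0, t ** 2, t] for t in periods]  # columns: intercept, t^2, t

xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(X[r][i] * demand[r] for r in range(len(X))) for i in range(3)]

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system
    A = [row[:] + [v] for row, v in zip(A, b)]
    n = len(A)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

a, b1, b2 = solve(xtx, xty)  # recovers 10, 0.1, and 0.5
```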
monthly data, the order of seasonality is twelve. If we have hourly data, the
order of seasonality is twenty-four. For our example, we have quarterly data,
so the order of seasonality is four. The order of seasonality can be determined
by either examining a time series plot or by understanding how many time
periods there are for each temporal cycle.
Given that we know the order of seasonality for our problem is four, we
compute the seasonal index for each season – quarters for our example. The
seasonal index for a specific season is the average for that specific season
divided by the average of all the given data. For example, the seasonal index
for January would be for the average for all January observations divided by
the average for all data. For quarters, the seasonal index for the first quarter
would be the average value of the first quarter data divided by the average of
all data. Mathematically, we can pursue this as follows: there are n rows (a
row for each quarter, and i is the row index), and there are m columns (a
column for each year, and j is the column index). The quantity demanded for
the ith quarter of the jth year will be represented by Dij. Seasonal indices
(SIi) are as follows:

SIi = [(Σj=1..m Dij) / m] / [(Σj=1..m Σi=1..n Dij) / (mn)] (Eq. 11-12)
Using this formula, we can calculate the seasonal indices for our example
problem, which are shown in Table 11.5.
DSij = Dij / SIi (Eq. 11-13)
The formula above filters out the seasonality, and isolates any trend which
may exist, and linear regression is the sensible tool of choice to capture a
trend.
t = nj + i, for all i = 1, …, n and j = 0, …, m − 1 (Eq. 11-14)
Do not place too much concern with the above formula – the appropriate
period numbers (t) are listed next to the actual demand values. We can fit the
linear model to capture the trend exposed by our de-seasonalized data:
D̂St = a + bt (Eq. 11-15)

Fitting this model with our de-seasonalized data, we have captured the trend,
with an intercept of 87.12 and a slope of 1.21, resulting in the following
fit/forecast model:

D̂St = 87.12 + 1.21t
Figure 11.4. De-Seasonalizing: Actual, De-Seasonalized, and Trend Values (Dt vs. Period)
Ft = SIi(D̂St) = SIi(87.12 + 1.21t)
Here, SIi is the appropriate seasonal index corresponding with period t. Simply
put, multiplying the fitted trend by the appropriate seasonal index puts the
seasonality “back in.” We can then determine the MAD by comparing the
back-forecasts to the given data:

MAD = (1/P) Σt=1..P |Dt − Ft| (Eq. 11-18)
For our example, the MAD is 1.55. On average, our back-forecast is 1.55 units
different from the actual.
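The full cycle – seasonal indices, de-seasonalizing, trend fit, re-seasonalized back-forecasts, and MAD – can be sketched in Python using the quarterly data from the Appendix. The MAD comes out near the text's 1.55; the small difference reflects whether the seasonal indices are rounded before de-seasonalizing:

```python
# Seasonal forecasting: indices, de-seasonalize, fit trend, back-forecast.
# Quarterly demand, three years (Appendix Table 1), listed in time order.
demand = [85, 118, 83, 69, 93, 130, 89, 74, 96, 133, 95, 75]
order = 4                                   # quarterly data
n = len(demand)

overall_avg = sum(demand) / n
seasonal_idx = [
    sum(demand[q::order]) / (n // order) / overall_avg for q in range(order)
]  # Eq. 11-12: season average over overall average

# Eq. 11-13: de-seasonalize, then fit a linear trend to the result
ds = [demand[t] / seasonal_idx[t % order] for t in range(n)]
periods = list(range(1, n + 1))
t_bar, ds_bar = sum(periods) / n, sum(ds) / n
b = (sum((t - t_bar) * (d - ds_bar) for t, d in zip(periods, ds))
     / sum((t - t_bar) ** 2 for t in periods))
a = ds_bar - b * t_bar

# Re-seasonalize the fitted trend and measure the error (Eq. 11-18)
fit = [seasonal_idx[(t - 1) % order] * (a + b * t) for t in periods]
mad = sum(abs(d - f) for d, f in zip(demand, fit)) / n

# seasonal indices [0.96, 1.34, 0.94, 0.76]; trend 87.12 + 1.21t
print([round(s, 2) for s in seasonal_idx], round(a, 2), round(b, 2))
print(round(mad, 2))
```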
At this point, to forecast, we only need to “extend” our line into the
future by adding new values of t, and using the appropriate SIi values to re-
seasonalize our data, along with our trend model. A comparison of our actual
data and our fit/forecast data one year into the future is shown below. Notice
how closely the fit/forecast line matches up with the actual values.
Figure 11.5. Actual v. Forecast (Actual and Fit/Forecast series, Dt vs. Period)
The notation in this section has been complicated due to the fact that we
must change from double-subscripted notation (“ij”) to time-series notation
(“t”). Because of this complicating force, the Appendix is provided for a more
deliberated pursuit of this example problem.
11.5 Parsimony
There is a very large arsenal of forecasting tools. In this chapter, we have
covered some simple, yet very useful tools. There is always a temptation to
increase the R2 or decrease the MAD term a little bit so as to boast a better
forecast. When forecasting, let us remember that we may have to explain the
details of our forecasting approach to someone else – perhaps someone who
is not quantitatively versed. Because of this, it is very important to keep our
forecasts as straightforward as possible. Your objective should not be to
“impress” the customer, but to “inform” the customer.
11.6 Conclusions
Forecasting is a very powerful topic. Even though this book is
introductory in nature, we have covered linear forecasting, nonlinear
forecasting and seasonal forecasting. With just a little bit of practice and
experience, these tools can provide immense value to organizations.
11.7 Exercises
1. The time series “SeriesG” documents domestic air travelers (in 1,000s)
from January, 1959 to December, 1970. Find the most appropriate forecasting
model and forecast domestic air travel for all months of 1971.
2. The time series “SnowShovels” shows demand for snow shovels from
January, 2006 through December, 2010. Find the most appropriate
forecasting model, and forecast snow shovel demand for all months of 2011.
3. The time series “TimeSeries1” shows Advertising dollars for a specific day,
and the sales the following day. Using the most appropriate forecasting
model, explain the relationship between Advertising dollars and the next day’s
sales.
Use the provided data set to forecast one year into the future.
Let’s take this example one step further. Job A offers more compensation
than does Job B, but Job B looks more promising in terms of professional
growth. What is the likelihood for growth regarding Job B? How much growth
is associated with Job B? All of a sudden this decision becomes less trivial
because of the uncertainty associated with future professional growth
opportunities. In short, we are confronted with a difficult decision.
Decision:
Umbrella – Rain (-8) / No Rain (-3)
No Umbrella – Rain (-25) / No Rain (0)
There are four outcomes for this problem: (1) we bring an umbrella and it
rains; (2) we bring an umbrella and it does not rain; (3) we do not bring an
umbrella and it rains; and (4) we do not bring an umbrella and it does not
rain. The payoffs associated with these outcomes are: -8, -3, -25 and 0. These
payoffs are in units of utility – a non-monetary unit associated with success.
"Utility" is often used in economics.
The “maximin” value here is -8, the payoff associated with bringing an
umbrella when it rains. This means that the best pessimistic decision is to
bring an umbrella because we think it will rain.
Our best alternative has an expected value of -4.25, and is associated with the
"umbrella" alternative (0.25(-8) + 0.75(-3) = -4.25, versus 0.25(-25) + 0.75(0)
= -6.25 for no umbrella). As such, using expected value, we should bring the
umbrella. We will call this value of -4.25 the expected value under
uncertainty.
For our umbrella problem, assume we have advance knowledge that it will
rain. With that knowledge, we bring an umbrella, as that payoff is better than
the payoff associated with not bringing an umbrella when it rains (max[-8, -
25] = -8). With advance knowledge that it will not rain, we would choose not
to bring an umbrella because of the higher payoff (max[-3, 0] = 0). Because
our advance knowledge of rain will occur 25% of the time, and advance
knowledge of no rain will occur 75% of the time, our expected value under
certainty is as follows:

EV under certainty = 0.25(-8) + 0.75(0) = -2
For our example, the EVPI is -2 - (-4.25) = 2.25. This value is never negative,
and numerically represents the difference between being certain and being
uncertain. Another way to interpret this is as the maximum amount you should
"pay" to eliminate uncertainty.
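The expected-value bookkeeping above is easy to script. A sketch in Python for the umbrella problem:

```python
# Expected value under uncertainty, under certainty, and EVPI
# for the umbrella problem (payoffs in units of utility).
p_rain, p_dry = 0.25, 0.75
payoffs = {"umbrella": (-8, -3), "no umbrella": (-25, 0)}  # (rain, no rain)

# Expected value of each alternative; the best one is the EV under uncertainty
ev = {alt: p_rain * r + p_dry * d for alt, (r, d) in payoffs.items()}
ev_uncertain = max(ev.values())

# With perfect foresight we pick the best payoff for each state of nature
ev_certain = (p_rain * max(r for r, d in payoffs.values())
              + p_dry * max(d for r, d in payoffs.values()))

evpi = ev_certain - ev_uncertain
print(ev, ev_certain, evpi)  # umbrella -4.25, certainty -2.0, EVPI 2.25
```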
12.4 An Example
Let us consider a monetary example with three investment alternatives:
Bonds, Stocks and CDs. Each alternative has three possible outcomes: growth,
stagnation and inflation. We assume that the probability of growth is 50%,
the probability of stagnation is 30%, and the probability of inflation is 20%.
Figure 10.2 shows the details of this problem, including payoffs in decision tree
format.
Decision:
Bonds – Growth ($12) / Stagnation ($6) / Inflation ($3)
Stocks – Growth ($15) / Stagnation ($3) / Inflation (-$2)
CDs – Growth ($6.50) / Stagnation ($6.50) / Inflation ($6.50)
Table 12.1 displays this data in table format for easy calculations.
Alternative Growth (p = 0.5) Stag. (p = 0.3) Inflation (p = 0.2) Max Min EVUU
Bonds $12 $6 $3 $12 $3 $8.4
Stocks $15 $3 -$2 $15 -$2 $8
CDs $6.5 $6.5 $6.5 $6.5 $6.5 $6.5
Max Val. $15 $6.5 $6.5 $15 $6.5 $8.4
Decision – Stocks (Max), CDs (Min), Bonds (EVUU)
Table 12.1. Details of Monetary Example Decision Tree Problem
For each alternative, the maximum, minimum and expected value payoffs
have been determined. The optimist would choose Stocks, because $15, the
best possible outcome, would occur via the Stocks decision. The pessimist
would choose CDs, because $6.5 is the best outcome assuming the worst will
happen. When pursuing expected value, Bonds is the best decision, because
they provide the highest expected value ($8.4).
Now that we understand the expected value under uncertainty is $8.4 and the
expected value under certainty is $10.75, we can calculate the expected value
of perfect information as ($10.75 – $8.4 = $2.35). To put this result into a
monetary context, we can say that the value of eliminating uncertainty is
$2.35 – if we hired someone with the ability to somehow remove uncertainty
from our problem, we should pay them no more than $2.35 for their services.
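The same calculation generalizes to the three-state investment example; a sketch in Python:

```python
# Maximax, maximin, expected value, and EVPI for the investment example.
probs = [0.5, 0.3, 0.2]  # Growth, Stagnation, Inflation
payoffs = {"Bonds": [12, 6, 3], "Stocks": [15, 3, -2], "CDs": [6.5, 6.5, 6.5]}

ev = {alt: sum(p * v for p, v in zip(probs, vals))
      for alt, vals in payoffs.items()}

optimist = max(payoffs, key=lambda alt: max(payoffs[alt]))   # best best-case
pessimist = max(payoffs, key=lambda alt: min(payoffs[alt]))  # best worst-case
ev_choice = max(ev, key=ev.get)                              # best expected value

# Perfect information: take the best payoff in each state of nature
ev_certain = sum(p * max(vals[i] for vals in payoffs.values())
                 for i, p in enumerate(probs))
evpi = ev_certain - ev[ev_choice]

# Stocks CDs Bonds 10.75 2.35
print(optimist, pessimist, ev_choice, round(ev_certain, 2), round(evpi, 2))
```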
12.5 Conclusions
Decision analysis is an important topic in that it bridges the theoretical
and applied worlds. I have personally never constructed a decision tree for a
decision that I have needed to make. Nevertheless, a decision tree does
structure a decision in terms of options, associated uncertainties and their
interrelationships, which is important.
Another important consideration is payoff interpretation. In our umbrella
example, we talk about how being in the rain
without an umbrella is somehow a “bad” thing. That is basically stated from
the perspective of someone traveling to work or classes at the university.
Imagine, however, a nervous farmer in South Dakota worried about a dried-
out corn crop in August. If that farmer is caught in a downpour without an
umbrella will they be upset? Of course not – they will be relieved and happy.
This point needs to be made because payoffs need to be valued from the
proper perspective.
12.6 Exercises
For these questions, address the following: state what the optimist would do;
state what the pessimist would do; state what one would do using expected
value as a strategy; and state the expected value of perfect information.
2. You have been given some money to invest, and you are considering
three different options for your investment choice: AlphaStuds, BetaStuds
and GammaStuds. Under a favorable market, AlphaStuds will return $100, but
lose $30 under an unfavorable market. Under a favorable market, BetaStuds
will return $75, but lose $25 under an unfavorable market. GammaStuds will
return $60 under a favorable market, but lose $15 under an unfavorable
market. A favorable market has a 60% chance of occurring, while an unfavorable
market has a 40% chance of occurring. All money must be invested in one of
the three options – distributing the investment among multiple investments
is not permitted.
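The same mechanical approach used in the chapter applies here. The sketch below simply organizes the exercise's data so that each criterion falls out of a one-line calculation, which the reader can run to check their answers:

```python
# Exercise data: two market states, P(favorable) = 0.6, P(unfavorable) = 0.4
probs = [0.6, 0.4]
payoffs = {
    "AlphaStuds": [100, -30],
    "BetaStuds":  [75, -25],
    "GammaStuds": [60, -15],
}

optimist  = max(payoffs, key=lambda d: max(payoffs[d]))   # maximax decision
pessimist = max(payoffs, key=lambda d: min(payoffs[d]))   # maximin decision

# expected value of each decision, and the EV-maximizing choice
ev = {d: sum(p * x for p, x in zip(probs, payoffs[d])) for d in payoffs}
ev_choice = max(ev, key=ev.get)

# expected value of perfect information
ev_certainty = sum(p * max(payoffs[d][s] for d in payoffs)
                   for s, p in enumerate(probs))
evpi = ev_certainty - max(ev.values())
```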
3. You have been given $1M to invest in one of two securities, AlphaTron or
OmegaTron. You must invest the entire $1M in one of the securities – mixing
the investment between the two securities is not permitted. Under good
market conditions, AlphaTron will
provide a return of $100K, while OmegaTron will provide a return of $60K.
Under average market conditions, AlphaTron will provide a return of $20K,
while OmegaTron will provide a return of $10K. Under bad market conditions,
AlphaTron will lose $60K, while OmegaTron will lose $30K. Good market
conditions are estimated to occur with a 40% probability, average and bad
market conditions are each estimated to occur with a 30% probability.
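As with the previous exercise, a short sketch organizes the three-state data (payoffs in $K, probabilities from the problem statement) so each criterion can be checked directly:

```python
# Exercise data: three market states -- good (0.4), average (0.3), bad (0.3);
# payoffs in thousands of dollars
probs = [0.4, 0.3, 0.3]
payoffs = {
    "AlphaTron": [100, 20, -60],
    "OmegaTron": [60, 10, -30],
}

optimist  = max(payoffs, key=lambda d: max(payoffs[d]))   # maximax
pessimist = max(payoffs, key=lambda d: min(payoffs[d]))   # maximin
ev = {d: sum(p * x for p, x in zip(probs, payoffs[d])) for d in payoffs}
ev_certainty = sum(p * max(payoffs[d][s] for d in payoffs)
                   for s, p in enumerate(probs))
evpi = ev_certainty - max(ev.values())
```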
Quantitative Methods for MBA Students
Quarter    t    Dt     t    Dt     t    Dt     Avg.      SI
   1       1    85     5    93     9    96     91.33    0.96
   2       2   118     6   130    10   133    127.00    1.34
   3       3    83     7    89    11    95     89.00    0.94
   4       4    69     8    74    12    75     72.67    0.76
Table 1. Given Data with Seasonal Indices Calculated
Period     SI     Dt      DSt
   1      0.96    85     88.54
   2      1.34   118     88.06
   3      0.94    83     88.30
   4      0.76    69     90.79
   5      0.96    93     96.88
   6      1.34   130     97.01
   7      0.94    89     94.68
   8      0.76    74     97.37
   9      0.96    96    100.00
  10      1.34   133     99.25
  11      0.94    95    101.06
  12      0.76    75     98.68
Table 2. De-Seasonalized Data
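Both tables can be reproduced programmatically: each quarter's seasonal index is that quarter's average demand divided by the overall average (rounded to two places, as in Table 1), and each observation is de-seasonalized by dividing by its quarter's index:

```python
# Quarterly demand for t = 1..12 (Table 1); quarters cycle 1,2,3,4,1,2,...
demand = [85, 118, 83, 69, 93, 130, 89, 74, 96, 133, 95, 75]

overall_avg = sum(demand) / len(demand)                  # 95.00
quarter_avg = [sum(demand[q::4]) / 3 for q in range(4)]  # approx. [91.33, 127.00, 89.00, 72.67]

# seasonal index: quarter average relative to the overall average
si = [round(avg / overall_avg, 2) for avg in quarter_avg]   # [0.96, 1.34, 0.94, 0.76]

# de-seasonalized demand: DSt = Dt / SI of its quarter (Table 2)
ds = [d / si[t % 4] for t, d in enumerate(demand)]          # 88.54, 88.06, ...
```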
The final step in this process is to re-incorporate our seasonality, and extend
our fitted trend into the future. This is shown below:
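A minimal sketch of that final step, assuming an ordinary least-squares line fitted to the de-seasonalized series (the seasonal indices come from Table 1; the slope and intercept here are computed from the data, not taken from the book's exhibit):

```python
# De-seasonalize the demand, fit a straight line by least squares, extend
# the line into periods 13-16, then multiply by the seasonal indices to
# re-incorporate seasonality.
demand = [85, 118, 83, 69, 93, 130, 89, 74, 96, 133, 95, 75]
si = [0.96, 1.34, 0.94, 0.76]                     # from Table 1
ds = [d / si[t % 4] for t, d in enumerate(demand)]
t = list(range(1, 13))

# ordinary least-squares slope and intercept for ds vs. t
n = len(t)
t_bar, ds_bar = sum(t) / n, sum(ds) / n
slope = (sum((ti - t_bar) * (di - ds_bar) for ti, di in zip(t, ds))
         / sum((ti - t_bar) ** 2 for ti in t))
intercept = ds_bar - slope * t_bar

# extend the trend into the next year (t = 13..16) and re-seasonalize
forecast = [(intercept + slope * ti) * si[(ti - 1) % 4] for ti in range(13, 17)]
```

The Quarter 2 forecast (t = 14) comes out highest and the Quarter 4 forecast (t = 16) lowest, as the seasonal indices would suggest.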
References
Having never written a book before, I’m not exactly sure how to cite
references. My knowledge of statistics has accumulated over the years to the
point where almost all of what is written here comes from experience. As
such, the references I cite below are from books that I like, books I have used
in the past, or books in my collection that history has deemed important.
Albright, C., Winston, W. and Zappe, C. “Data Analysis and Decision Making,
4th Edition.” Cengage Learning. Cincinnati, OH. 2011.
Bowerman, B., O’Connell, R. and Murphree, E. “Business Statistics in Practice.”
Irwin/McGraw-Hill. New York, New York. 2011.
Box, G. and Jenkins, G. “Time Series Analysis: Forecasting and Control.”
Holden-Day. San Francisco, CA. 1970.
Brightman, H. “Statistics in Plain English.” Southwestern Publishing.
Cincinnati, OH. 1986.
Gaither, N. and Frazier, G. “Operations Management.” Southwestern
Publishing. Cincinnati, OH. 2002.
Sharpe, N., DeVeaux, R. and Velleman, P. “Business Statistics: A First Course.”
Pearson Higher Education. New York, New York. 2014.
Sternstein, M. “Statistics.” Barron’s College Review Series. Hauppauge, New
York. 1996.
Wonnacott, T.H. and Wonnacott, R.J. “Introductory Statistics.” John Wiley &
Sons. New York, New York. 1990.