MITx SCX KeyConcept SC0x FV
MITx SCX KeyConcept SC0x FV
MITx SCX KeyConcept SC0x FV
These are meant to complement, not replace, the lesson videos and slides. They are intended to
be references for you to use going forward and are based on the assumption that you have learned
the concepts and completed the practice problems.
The draft was updated by Dr. Inma Borella in the Spring of 2019.
This is a draft of the material, so please post any suggestions, corrections, or recommendations to
the Discussion Forum under the topic thread “Key Concept Documents Improvements.”
Thanks,
Chris Caplice, Eva Ponce and the SC0x Teaching Community
Spring 2019
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
1
Table of Contents
Module 1: Analytics Basics
Supply Chain Management Overview................................................................................................... 3
Mathematical functions ....................................................................................................................... 6
Models......................................................................................................................................................6
Functions ..................................................................................................................................................6
Quadratic Functions .................................................................................................................................7
Convexity and Continuity..........................................................................................................................8
Data Management .............................................................................................................................. 9
Module 2: Probability
Probability ......................................................................................................................................... 10
Probability basics ...................................................................................................................................10
Discrete Distributions .............................................................................................................................13
Continuous Distributions ........................................................................................................................15
Module 3: Statistics
Statistics ............................................................................................................................................ 19
Statistical testing and the central limit theorem ....................................................................................19
Sampling and confidence intervals……………………………………………………………………………………….22
Hypothesis Testing .................................................................................................................................25
Multiple Random Variables ....................................................................................................................27
Regression ......................................................................................................................................... 29
Ordinary Least Squares Linear Regression .............................................................................................29
Module 4: Optimization
Optimization ...................................................................................................................................... 33
Unconstrained Optimization ..................................................................................................................33
Constrained Optimization ......................................................................................................................34
Integer and Mixed Integer Programs .....................................................................................................38
Networks and Non-Linear Programming ............................................................................................ 42
Network Models .....................................................................................................................................42
Non-Linear Optimization ........................................................................................................................44
Module 5: Algorithms, Approximations and Simulation
Algorithms and Approximations......................................................................................................... 46
Algorithms ..............................................................................................................................................46
Approximation Methods ........................................................................................................................52
Simulation ......................................................................................................................................... 59
References ......................................................................................................................................... 62
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2
Supply Chain Management Overview
Summary
Supply Chain Basics is an overview of the concepts of Supply Chain Management and logistics. It
demonstrates that product supply chains as varied as bananas to women’s shoes to cement have
common supply chain elements. There are many definitions of supply chain management. But
ultimately supply chains are the physical, financial, and information flow between trading
partners that ultimately fulfill a customer request. The primary purpose of any supply chain is to
satisfy a customer’s need at the end of the supply chain. Essentially supply chains seek to
maximize the total value generated as defined as: the amount the customer pays minus the cost
of fulfilling the need along the entire supply chain. All supply chains include multiple firms.
Key Concepts
While Supply Chain Management is a new term (first coined in 1982 by Keith Oliver from Booz
Allen Hamilton in an interview with the Financial Times), the concepts are ancient and date back
to ancient Rome. The term “logistics” has its roots in the Roman military. Additional definitions:
• Logistics involves… “managing the flow of information, cash and ideas through the
coordination of supply chain processes and through the strategic addition of place, period
and pattern values” – MIT Center for Transportation and Logistics
• “Supply Chain Management deals with the management of materials, information and
financial flows in a network consisting of suppliers, manufacturers, distributors, and
customers” - Stanford Supply Chain Forum
• “Call it distribution or logistics or supply chain management. By whatever name it is the
sinuous, gritty, and cumbersome process by which companies move materials, parts and
products to customers” – Fortune 1994
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3
Supply Chain Perspectives
Supply chains can be viewed in many different perspectives including process cycles (Chopra &
Meindl 2013) and the SCOR model (Supply Chain Council).
The Supply Chain Process has four Primary Cycles: Customer Order Cycle, Replenishment Cycle,
Manufacturing Cycle, and Procurement Cycle, Not every supply chain contains all four cycles.
The Supply Chain Operations Reference (SCOR) Model is another useful perspective. It shows
the four major operations in a supply chain: source, make, deliver, plan, and return. (See Figure
below)
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
4
Supply Chain as a System
It is useful to think of the supply chain as a complete system. This means one should:
• Look to maximize value across the supply chain rather than a specific function such
as transportation.
• Note that while this increases the potential for improvement, complexity and
coordination requirements increase as well.
• Recognize new challenges such as:
o Metrics — how will this new system be measured?
o Politics and power — who gains and loses influence, and what are the effects
o Visibility — where data is stored and who has access
o Uncertainty — compounds unknowns such as lead times, customer demand,
and manufacturing yield
o Global Operations — most firms source and sell across the globe
Supply chains must adapt by acting as both a bridge and a shock absorber to connect functions
as well as neutralize disruptions.
Learning Objectives
• Gain multiple perspectives of supply chains to include process and system views.
• Identify physical, financial, and information flows inherent to supply chains.
• Recognize that all supply chains are different but have common features.
• Understand importance of analytical models to support supply chain decision-making.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
5
Mathematical functions
Summary
This review provides an overview of the building blocks to the analytical models used frequently
in supply chain management for decision-making. Each model serves a role; it all depends on how
the techniques match with need. First, a classification of the types of models offers perspectives
on when the use a model and what type of output they generate. Second, a review of the main
components of models, beginning with an overview of types of functions, the quadratic and how
to find its root(s), logarithms, multivariate functions, and the properties of functions. These
“basics” will be used continuously throughout the remainder of the courses.
Key Concepts
Models
Decision-making is at the core of supply chain management. Analytical models can aid in decision-
making to questions such as “what transportation option should I use?” or “How much inventory
should I have?” They can be classified into several categories based on degree of abstraction,
speed, and cost.
Functions
Functions are one the main parts of a model. They are “a relation between a set of inputs and a
set of permissible outputs with the property that each input is related to exactly one
output.”(Wikipedia)
y=f(x)
Linear Functions
With Linear functions, “y changes linearly with x.” A graph of a linear function is a straight line
and has one independent variable and one dependent variable. (See figure below)
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
6
Figure 2. Components of a linear function
**Typically constants are denotes by letters from the start of the alphabet, variables are letters from the end of the
alphabet.
Quadratic Functions
A quadratic function takes the form of y=ax2 + bx + c, where a, b, and c are numbers and are
not equal to zero. In a graph of a quadratic function, the graph is a curve that is called a
parabola, forming the shape of a “U”. (See Figures)
a>0
Quadratic Functions y
This is the x
dependent
variable function, f(x).
a<0
y = ax 2 +bx +c y
x
independent constants
variable
Roots of Quadratic
A root, or solution, satisfies the quadratic. The equation can have 2, 1, or 0 roots. The roots must
be a real number. There are two methods for finding roots:
Quadratic equation
Factoring: Find r1 and r2 such that
ax2+bx+c = a(x-r1)(x-r2) −b ± b 2 − 4ac
r1, r2 =
2a
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
7
Other Common Functional Forms Euler’s number, or e, is a constant
Power Function number that is the base of the natural
A power function is a function where a ≠ zero, is a logarithm.
constant, and b is a real number. The shape of the curve
e = 2.7182818…
is dictated by the value of b.
Y=ex
y = f(x) = axb
Exponential Functions
Exponential functions have very fast growth. In Logarithms: A logarithm is a quantity
exponential functions, the variable is the power. representing the power to which a fixed
y=abx number must be raised to produce a Other Inv
given number. It is the inverse function
Multivariate Functions Logarithms of an exponential.
y=ax
y=a+x
Function with more than one independent x variable (x1, y=b x logb(y)=x
x2, x3).
y is the value of b raised x is the power that I need to
to the xth power. raise the base, b, to equal y.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
8
Data Management
In data management supply chain managers will be faced with immense complexity. This
complexity is influenced by the volume (how much), velocity (pace), variety (spread), and veracity
(accuracy). Each of these components will influence how data is treated and used in the supply
chain.
There are several reoccurring issues that supply chain managers must be aware of as they are
working with data:
• Is the data clean?
• Is the data complete?
• What assumptions are you making about the data?
• Are the results making sense? How can I check?
Cleaning data is one of the most important, yet time consuming processes in data analysis. It can
greatly influence the outcome of analysis if not completed properly. Therefore, SC professionals
should always plan enough time for basic data checks (meaning if you get garbage in, you will get
garbage out).
There are several typical checks you should always look for:
• Invalid values - negative, text, too small, too big, missing
• Mismatches between related data sets - # of rows, # of columns
• Duplication – unique identifiers
• Human error – wrong dates, invalid assumptions
• Always explore the outliers – they are the most interesting!
When cleaning data, you should be organized. This means you must make sure to version the
documents you are working with and keep track of data changes.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
9
Probability
Summary
We review two very important topics in supply chain management: probability and distributions.
Probability is an often-reoccurring theme in supply chain management due to the common
conditions of uncertainty. On a given day, a store might sell 2 units of a product, on another, 50.
To explore this, the probability review includes an introduction of probability theory, probability
laws, and proper notation. Summary or descriptive statistics are shown for capturing central
tendency and the dispersion of a distribution. We also introduce two theoretical discrete
distributions: Uniform and Poisson.
We then introduce three common continuous distributions: Uniform, Normal, and Triangle. The
review then goes through the difference between discrete vs. continuous distributions and how
to recognize these differences. The remainder of the review is a exploration into each type of
distribution, what they look like graphically and what are the probability density function and
cumulative density function of each.
Key Concepts
Probability basics
Probability defines the extent to which something is probable, or the likelihood of an event
happening. It is measured by the ratio of the case to the total number of cases possible.
Probability Theory
• Mathematical framework for analyzing random events or experiments.
• Experiments are events we cannot predict with certainty (e.g., weekly sales at a store,
flipping a coin, drawing a card from a deck, etc.).
• Events are a specific outcome from an experiment (e.g., selling less than 10 items in a
week, getting 3 heads in a row, drawing a red card, etc.)
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
10
Notation
• P (A) – the probability that event A occurs
• P (A’) = complement of P (A) – probability some other event that is not A occurs. This
is also the probability that something other than A happens.
Probability Laws
1. The probability of any event is between 0 and 1, that is 0 ≤ P(A)≤ 1
2. If A and B are mutually exclusive events, then P (A or B) = P (A U B) = P (A) + P (B)
3. If A and B are any two events, then
P(A and B) P ( A ∩ B)
P(A | B) = =
P(B) P(B)
Where P(A I B) is the conditional probability of A occurring given B has already occurred.
P(A | B) = P(A)
P(A and B) = P(A ∩ B) = P ( A | B ) P(B) = P ( A ) × P ( B )
Where events A and B are independent if knowing that B occurred does not influence the
probability of A occurring.
Summary statistics
Descriptive or summary statistics play a significant role in the interpretation, presentation, and
organization of data. It characterizes a set of data. There are many ways that we can characterize
a data set, we focused on two: Central Tendency and Dispersion or Spread.
Central Tendency
This is, in rough terms, the “most likely” value of the distribution. It can be formally measured in
a number of different ways to include:
• Mode – the specific value that appears most frequently
• Median – the value in the “middle” of a distribution that separates the lower from the
higher half. This is also called the 50th percentile value.
• Mean (μ) – the sum of values multiplied by their probability (called the expected value).
Also the sum of values divided by the total number of observations (called the average).
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
11
Expected value of X, E[X] is equal to:
n
E[X] = x = µ = ∑ pi xi
i=1
Dispersion or Spread
This captures the degree to which the observations “differ” from each other. The more common
dispersion metrics are:
• Range – the maximum value minus the minimum value.
• Inner Quartiles – 75th percentile value minus the 25th percentile value
- captures the
“central half” of the entire distribution.
• Variance (σ2) – the expected value of the squared deviation around the mean; also called
the Second Moment around the mean
n 2 n 2
Var[X] = σ = ∑ pi ( xi − x ) = ∑ pi ( xi − µ )
2
i=1 i=1
• Standard Deviation (σ) – the square root of the variance. This puts it in the same units as
the expected value or mean.
• Coefficient of Variation (CV) – the ratio of the standard deviation over the mean = σ/μ.
This is a common comparable metric of dispersion across different distributions. As a
general rule:
o 0 £ CV £ 0.75, low variability
o 0.75 £ CV £ 1.33, moderate variability
o CV > 1.33, high variability
The only differences between calculating the population versus the sample variances (and
thus their corresponding standard deviations) is that for the population variance, σ2, we
divide the sum of the observations by n (the number of observations) while for the sample
variance, s2, we divide by n-1.
n 2 n 2
σ2 =
∑ (x − µ)
i=1 i
s2 =
∑ (x − x )
i=1 i
n n −1
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
12
Note that the sample variance will be slightly larger than the population variance for small
values of n. As n gets larger, this difference essentially disappears. The reason for the use n-
1 is due to having to use a degree of freedom in calculating the average (x bar) from the same
sample that we are estimating the variance. It leads to an unbiased estimate of the
population variance. In practice, you should just use the sample variance and standard
deviation unless you are dealing with specific probabilities, like flipping a coin.
Discrete Distributions
Probability distributions can either be empirical (based on actual data) or theoretical (based on
a mathematical form). Determining which is best depends on the objective of the analysis.
Empirical distributions follow past history while theoretical distributions follow an underlying
mathematical function. Theoretical distributions do tend to allow for more robust modeling
since the empirical distributions can be thought of as a sampling of the population data. The
theoretical distribution can be seen as better describing the assumed population distribution.
Typically, we look for the theoretical distribution that best fits the data
We presented five distributions. Two are discrete (Uniform and Poisson) and three are
continuous (Uniform, Normal, and Triangle). Each is summarized in turn.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
13
Discrete Uniform Distribution ~U(a,b)
Finite number (N) of values observed with a minimum value of a and a maximum value of b. The
probability of each possible value is 1/N where N= b – a +1
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
14
Continuous Distributions
Probability Density Function (pdf)
The pdf is function of a continuous variable. The probability that X lies between values a and b
is equal to area under the curve between a and b. Total area under the curve equals 1, but the
P(X= t) = 0 for any specific value of t.
⎧ 1
⎪ if a ≤ t ≤ b
pdf: f (t | a,b) = ⎨ b − a
⎪ 0 Summary Metrics
⎩ otherwise • Mean = (a+b)/2
• Median = (a+b)/2
• Mode N/A all values equally likely
cdf: • Variance = (b-a)2/12
⎧ 0 if t < a
⎪
⎪ t−a
F(t | a,b) = ⎨ if a ≤ t ≤ b
⎪ b−a
⎪ 1 if t > b
⎩
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
15
Normal Distribution ~N(μ,σ)
Widely used bell-shaped, symmetric continuous distribution with mean μ and standard
deviation σ. Most commonly used distribution in practice.
Summary Metrics
• Mean = μ
• Median = μ
• Mode =μ
• Variance = σ2
⎡ ⎛ 2⎤
⎢− 1 ⎜ x−µ ⎞⎟ ⎥
1 ⎢ 2⎝ σ ⎠ ⎥
pdf: f (x | µ ,σ ) = 1/2
e ⎣ ⎦
(2π ) σ
istribution 2
(b - a)
ngle distribution
m of a, maximum
a c b x
of c, ~T(a, b, c)” Figure 6. Triangle distribution
x<a a+b+c
E éë x ùû =
pdf: 3
) ⎧
æ 1 ö 2 ⎪ 2 02 x<a
-a )
a£x£c
è 18 ø
(
Var éë xùû = ç ÷ a +⎪b +2c( x −-aab
⎪
) - ac - bc
a≤x≤c
)
⎪ (2b − a ) ( c − a )
æ f (x) = ⎨ ö
) c£x£b P éë x > d ùû = çç
b - d
⎪ ( 2 (b÷− x ) ) for cc≤£xd≤ b£ b
-c ) è ( ⎪
)(
ç b - a ⎪b(-b −ca)÷÷(b − c)
ø )
⎪
0 x>b
( )( )
⎩
x>b d = b - P éë x > d ùû b - a b - c for c £ d £ b
aracteristics
cdf:
• Good way to get a sense of an unknown distribution
• People tend to recall extreme and common values
• Handles asymmetric distributions 22
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
17
Summary Metrics
a+b+c
E ⎡⎣ x ⎤⎦ =
3
⎛1⎞ 2 2 2
(
Var ⎡⎣ x ⎤⎦ = ⎜ ⎟ a + b + c − ab − ac − bc
⎝ 18 ⎠
)
⎛ 2 ⎞
P ⎡⎣ x > d ⎤⎦ = ⎜
⎜ ( b− d) ⎟
for c ≤ d ≤ b
⎜ (b − a ) (b − c ) ⎟⎟
⎝ ⎠
( )(
d = b − P ⎡⎣ x > d ⎤⎦ b − a b − c ) for c ≤ d ≤ b
Learning Objectives
• Understand probabilities, importance and application in daily operations and extreme
circumstances.
• Understand and apply descriptive statistics.
• Understand difference between continuous vs. discrete random variable distributions.
• Review major distributions: Uniform (discrete and continuous), Poisson, Normal and
Triangle.
• Understand the difference between discrete vs. continuous distributions.
• Recognize and apply probability mass functions (pmf), probability density functions (pdf),
and cumulative density functions (cdf).
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
18
Statistics
Statistical testing and the central limit theorem
Central Limit Theorem
The central limit theorem states that the sample distribution of the mean of any independent
random variable will be (nearly) normally distributed, if the sample size is large enough. Large
enough is based on a few factors – one is accuracy (more sample points will be required) as well
as the shape of the underlying population. Many statisticians suggest that 30, sometimes 40, is
considered sufficiently large. The central limit theorem is important because it suggests that it
does not matter what distributions the random variable follows. If we have a sufficient sample
size we can assume a normal distribution.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
19
Figure 7. Convert a Normal distribution to standard unit normal
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
20
Normal Distribution
To Standard unit Normal
Category To find? N(μ, σ)
find? Z ~N (0, 1)
If n>= 30
If n >= 30
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
21
Sampling and confidence intervals
Sampling
An inference is a conclusion reached on the basis of evidence and reasoning. To make inferences
about the population we need to collect a sample and ask testable questions, such as, for
example, if data fits a specific distribution or if two variables are correlated? If sampling is done
correctly, the sample mean should be an estimator of the population mean as well as
corresponding parameters.
• Population: is the entire set of units of observation.
• Sample: subset of the population.
• Parameters: describe the distribution of random variable.
• Random Sample: is a sample selected from the population that ensures that results are
representative and not skewed from selection of the sample.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
22
Confidence Intervals
Confidence intervals are used to describe the uncertainty associated with a sample estimate of a
population parameter.
There are some important insights for confidence intervals around the mean.
Recall that: There are tradeoffs between interval (L), sample size (n) and confidence (c):
• When n is fixed, using a higher confidence level c leads to a wider interval, L.
• When confidence level is fixed (c), increasing sample size n, leads to smaller interval, L.
• When both n and confidence level are fixed, we can obtain a tighter interval, L, by
reducing the variability (i.e. s).
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
23
When interpreting confidence intervals, a few things to keep in mind:
• Repeatedly taking samples and finding confidence intervals leads to different intervals
each time,
• But c% of the resulting intervals would contain the true mean.
• To construct a c% confidence interval that is within (+/-) L of μ , the required sample size
is: n=z2*s2/ L2
After finding the z-value “c” using the table 8.1, use the formula below to derive the confidence
intervals:
⎡ cs cs ⎤
⎢x − ,x + ⎥
⎣ n n⎦
Student t- Distribution
If our sample is small (generally for samples n≤30) we use the student t-distribution instead of
the Normal distribution.
The t-distribution . . .
-is bell-shaped and symmetric around 0,
-is flatter and has fatter tails than the Normal distribution,
-its shape is a function of k, the degrees of freedom, where k=n-1, and
-its mean=0 and its standard deviation = √(k/k-2).
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
24
Find what? Excel Function
CDF of t-Distribution
=TDIST(X, k, num_tails)
(returns α probability)
Inverse of t Distribution
=T.INV.2T(α, k)
(returns the t-value)
Hypothesis Testing
Hypothesis testing is a method for making a choice between two mutually exclusive and
collectively exhaustive alternatives. In this practice, we make two hypotheses and only one can
be true. Null Hypothesis (H0) and the Alternative Hypothesis (H1). We test, at a specified
significance level, to see if we can reject the Null hypothesis, or we cannot reject the Null
Hypothesis.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
25
Figure 11. One-tailed and two-tailed predictions
Suppose, I want to test whether a new information system has decreased my order cycle time.
We know that historically, the average cycle time is 72.5 hours +/- 4.2 hours. We sampled 60
orders after the implementation and found the average to be 71.4 hours. We select a level of
significance to be α=5%.
1. Select the test statistic of interest:
mean cycle time in hours – use Normal distribution (z-statistic).
2. Determine whether this is a one or two tailed test:
One tailed test (test for direction).
3. Pick your significance level and critical value:
α= 5 percent, therefore z= NORM.S.INV(.05) = -1.6448.
1. Formulate your Null & Alternative Hypotheses:
H0: New cycle time is not shorter than the old cycle time,
H1: New cycle time is shorter than the old cycle time.
2. Calculate the test statistic:
z =(X` - μX` )/σX` =(X` - μ)/(σ/√n) = (71.4 – 72.5)/(4.2/√60) = -2.0287.
3. Compare the test statistic to the critical value :
z = -2.0287 < -1.6448 the test statistic < critical value, therefore, we can reject the null
hypothesis.
Rather than just reporting that H0 was rejected at a 5% significance level, we might want to let
people know how strongly we rejected it. The p-value is the smallest level of α(level of
significance) such that we would reject the Null Hypothesis with our current set of data. Always
report the p-value: p-value = NORM.S.DIST(-2.0287) = .0212.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
26
Chi square test
The Chi Square test is used to measure the goodness of fit and determine whether the data is,
for example, normally distributed. To use a chi square test, you typically will create a bucket of
categories, c, count the expected and observed (actual) values in each category, and calculate
the chi-square statistics and find the p-value. If the p=value is less than the level of significant,
you will then reject the null hypothesis.
"
(𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 − 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑)"
𝜒 = $% 5
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑
Spreadsheet Functions:
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
27
Correlation Coefficient: is used to standardize the covariance in order to better
interpret. It is a measure between -1 and +1 that indicates the degree and direction of
the relationship between two random variables or sets of data.
COV (X,Y )
CORR(X,Y) =
σ Xσ Y
Spreadsheet Functions
Covariance
n
n
Cov(X,Y) = ∑ P(X = x i , Y = y i )[(x i − µ X )(y i − µY )] =
∑ i=1
(x i − µ X )(y i − µY )
i=1 n
Expected value: E[W] = aμX + bμY These relations hold for any
Variance: VAR[W] = a2σ2X + b2σ2Y + 2abCOV(X,Y) distribution of X and Y. However,
= a2σ2X + b2σ2Y + 2abσXσYCORR(X,Y) if X and Y are ~N, then W is ~N as
Standard Deviation = σW = √VAR[W] well!
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
28
Regression
Summary
In this review, we expand our tool set of predictive models to include ordinary least squares
regression. This equips us with the tools to build, run and interpret a regression model. We first
introduce how to work with multiple variables and their interaction. This includes correlation and
covariance, which measures how two variables change together. As we review how to work with
multiple variables, it is important to keep in mind that the data sets supply chain managers will
deal with are largely samples, not a population. This means that the subset of data must be
representative of the population. The later part of the lesson introduces hypothesis testing,
which allows us to answer inferences about the data.
We then tackle linear regression. Regression is a very important practice for supply chain
professionals because it allows us to take multiple random variables and find relationships
between them. In some ways, regression becomes more of an art than a science. There are four
main steps to regression: choosing with independent variables to include, collecting data,
running the regression, and analyzing the output (the most important step).
Linear Model:
yi = β0 + β1 xi
Yi = β0 + β1 xi + εi for i = 1,2,...n
Residuals
Because a linear regression model is not always appropriate for the data, you should assess the
appropriateness of the model by defining residuals. The difference between the observed value
of the dependent variable and predicted value is called the residual.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
29
ŷi = b0 + b1 xi for i = 1,2,...n
ei = yi − ŷi = yi − b0 + b1 xi for i = 1,2,...n
n
∑ (xi − x )( yi − y)
= y − b1 x b =
1
i=1
n
∑ (xi − x ) 2
i=1
Multiple Variables
These relationships translate also to multiple variables.
n n 2 n 2
∑ (e ) = ∑ ( y − ŷ ) = ∑ ( y − b − b x
i=1
2
i i=1 i i i=1 i 0 1 1i
− ... − bk xki )
Validating a Model
All statistical software packages will provide statistics for evaluation (names and format will vary
by package). But the model output typically includes: model statistics (regression statistics or
summary of fit), analysis of variance (ANOVA), and parameter statistics (coefficient statistics).
Overall Fit
Overall fit = how much variation in the dependent variable (y), can we explain?
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
30
Make estimate for of Y for each x
⎛ RSS ⎞⎛ n −1 ⎞
adjR 2 = 1− ⎜ ⎟⎜ ⎟
⎝ TSS ⎠⎝ n − k −1 ⎠
Individual Coefficients
Each Independent variable (and b0) will have:
• An estimate of coefficient (b1),
• A standard error (sbi) 2
se =
∑( y − ŷ )
i i
• The t-statistic
o k = number of independent variables
o bi = estimate or coefficient of independent
variable
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
31
Corresponding p-value – Testing the Slope
o We want to see if there is a linear relationship, i.e. we want to see if the slope (b1)
is something other than zero. So: H0: b1 = 0 and H1 b1 ≠ 0
Autocorrelation is a characteristic of data in which the correlation between the values of the
same variables is based on related objects. It is typically a time series issue.
Heteroscedasticity is when the variability of a variable is unequal across the range of values of a
second variable that predicts it. Some tell-tale signs include: observations are supposed to have
the same variance. Examine scatter plots and look for “fan-shaped” distributions.
Learning Objectives
• Understand how to work with multiple variables.
• Be aware of data limitations with size and representation of population.
• Identify how to test a hypothesis.
• Review and apply the steps in the practice of regression.
• Be able to analyze regression and recognize issues.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
32
Optimization
Summary
This is an introduction and overview of optimization. It starts with an overview of unconstrained
optimization and how to find extreme point solutions, keeping in mind first order and second
order conditions. It also reviews rules in functions such as the power rule. Next the lesson review
constrained optimization that shares similar objectives of unconstrained optimization but adds
additional decision variables and constraints on resources. To solve constrained optimization
problems, the lesson introduces mathematical programs that are widely used in supply chain for
many practices such as designing networks, planning production, selecting transportation
providers, allocating inventory, scheduling port and terminal operations, fulfilling orders, etc. The
overview of linear programming includes how to formulate the problem, how to graphically
represent them, and how to analyze the solution and conduct a sensitivity analysis.
In real supply chains, you cannot have .5 bananas in an order or shipment. This means that we
must add additional constraints for integer programming where either all of the decision
variables must be integers, or in a mixed integer programming where some, but not all, variables
are restricted to be an integer. We review the types of numbers you will encounter. Then we
introduce integer programs and how they are different. We then review the steps to formulating
an integer program and conclude with conditions for working with binary variables.
Key Concepts
Unconstrained Optimization
Unconstrained optimization considers the problem of minimizing or maximizing an objective
function that depends on real variables with no restrictions on their values.
Extreme points
• Extreme points are when a function takes on an extreme value - a value that is much
smaller or larger in comparison to nearby values of the function.
• They are typically a min or a max (either global or local), or inflection points.
• Extreme points occur where slope (or rate of change) of function = 0.
• Test for Global vs. Local
o Global min/max – for whole range
o Local min/max – only in certain area
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
33
4. Take the first derivative of your function
5. Set it equal to zero, and
6. Solve for X*, the value of x at extreme point.
This is called the First Order Condition.
Constrained Optimization
Similarities with unconstrained optimization
• Requires a prescriptive model
• Uses an objective function
• Solution is an extreme value
Differences
• Multiple decision variables
• Constraints on resources
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
34
Math Programming: Math programming is a powerful family of optimization methods that is
widely used in supply chain analytics. It is readily available in software tools, but is only as good
as the data input. It is the best way to identify the “best” solution under limited resources.
Linear Programs
1. Decision Variables
• What are you trying to decide?
• What are their upper or lower bounds?
2. Formulate objective function
• What are we trying to minimize or maximize?
• Must include the decision variables and the form of the function determines the approach
(linear for LP)
3. Formulate each constraint
• What is my feasible region? What are my limits?
• Must include the decision variable and will almost always be linear functions
Solution
The solution of a linear program will always be in a “corner” of the Feasible Region:
• Linear constraints form a convex feasible region.
• The objective function determines in which corner is the solution.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
35
The Feasible Region is defined by the constraints and the bounds on the decision variables (See
Figure).
Additional Questions:
• Am I using all of my resources?
• Where do I have slack?
• Where I am constrained?
• How robust is my solution?
Sensitivity Analysis: what happens when data values are changed.
• Shadow Price or Dual Value of Constraint: What is the marginal gain in the profitability
for an increase of one on the right-hand side of the constraint?
• Slack Constraint – For a given solution, a constraint is not binding if total value of the left-
hand side is not equal to the right hand side value. Otherwise it is a binding constraint
• Binding Constraint – A constraint is binding if changing it also changes the optimal solution
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
36
Figure 14. Alternative of multiple optimal solutions
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
37
Integer and Mixed Integer Programs
Although in some cases a linear program can
provide an optimal solution, in many it cannot. Numbers
For example, in warehouse location selection,
batch orders, or scheduling, fractional answers • N = Natural, Whole or Counting numbers
are not acceptable. In addition, the optimal 1, 2, 3, 4
solution cannot always be found by rounding • Z-Integers = -3, -2, -1, 0, 1, 2, 3
the linear program solution. This is where • Q = Rational Number, continuous numbers =
integer programs are important. However, Any faction of integers ½ , -5/9
integer program solutions are never better than • R = Real Numbers = all Rational and
a linear program solution, they lower the Irrational Numbers, ex: e, pie, e
objective function. In general, formulating • Binary Integers = 0, 1
integer programs is much harder than
formulating linear program.
• To identify the solution in integer programs – the Feasible Region becomes a
collection of points, it is no longer a convex hull (see Figure)
• In addition, cannot rely on “corner” solutions anymore – the solution space is much
bigger
Mass enumeration - Unlike linear programs, integer programs can only take a finite number
of integer values. This means that one approach is to enumerate all of these possibilities –
calculating the objective function at each one and choosing the one with the optimal value.
As the problem size increases, this approach is not feasible.
s.t.
XHL + XSL ≤ 11
Plan
t 3XHL + 2XSL ≤ 30
Add.
A XHL + 3XSL ≤ 28
Add.
B XHL, XSL ≥ 0 Integers
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
38
Binary Variables
Suppose you had the following formulation of a minimization problem subject to capacity at
plants and meeting demand for individual products:
We could add binary variables to this formulation to be able to model several different logical
conditions. Binary variables are integer variables that can only take the values of 0 or 1.
Generally, a positive decision (do something) is represented by 1 and the negative decision (do
nothing) is represented by the value of 0.
s.t.
where:
∑x i ij
≤Cj ∀j xij = Number of units of product i made
in plant j
∑x j ij
≥ Di ∀i yj = 1 if plant j is opened; = 0 otherwise
cij = Cost per unit of product i made at
∑x i ij
− My j
≤ 0 ∀j plant j
fj = Fixed cost for producing at plant j
xij ≥ 0 ∀ij Cj = Capacity in units at plant j
Di = Demand for product i in units
y j = {0,1} M = a big number (such as Cj in this
case)
Note that the not only we have added the binary variable in the objective function, we have also
added a new constraint (the third one). This is known as a linking constraint or a logical
constraint. It is required to enforce an if-then condition in the model. Any positive value of xij
will force the yj variable to be equal to one. The “M” value is a big number – it should be as small
as possible, but at least as big as the values of what sum of the xij’s can be. There are also more
technical tricks that can be used to tighten this formulation.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
39
We can also introduce Either/Or Conditions- where there is a choice between two constraints,
only one of which has to hold; it ensures a minimum level, Lj, if yj=1.
∑xi ij
− My j ≤ 0 ∀j ∑x i ij
− L j y j ≥ 0 ∀j
For example:
where:
xij = Number of units of product i made in plant j
∑ x ≤C
i ij j
∀j
yj = 1 if plant j is opened; = 0 o.w.
∑ x ≥D
j ij i
∀i M = a big number (such as Cj in this case)
Cj = Maximum capacity in units at plant j
∑ x − My
i ij j
≤ 0 ∀j Lj = Minimum level of production at plant j
Di = Demand for product i in units
We need to add a constraint that ensures that if we DO use plant j, that the volume is between
the minimum allowable level, Lj, and the maximum capacity, Cj. This is sometimes called an
Either-Or condition.
where:
∑x i ij
≤ My j ∀j
xij = Number of units of product i made in plant j
∑x i ij
≥ Lj y j ∀j yj = 1 if plant j is opened; = 0 o.w.
M = a big number (such as Cj in this case)
Lj = Minimum level of production at plant j
Finally, we can create a Select From Condition – that allows us to select the best N choices. Note
that this can be formulated as “choose at least N” or “choose no more than N” by changing the
inequality sign on the second constraint.
∑x i ij
− My j ≤ 0 ∀j ∑ j
yj ≤ N
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
40
Learning Objectives
• Learn the role of optimization.
• Understand how to optimize in unconstrained conditions.
• Identify how to find Extreme Point Solutions.
• Understand how to formulate problems with decision variables and resource constraints
and graphically present them.
• Review how to interpret results and conduct sensitivity analysis.
• Understand the different “types” of numbers and how they change the approach to
problems.
• Review the approach of formulating integer and mixed integer program problems and
solving them.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
41
Networks and Non-Linear Programming
Summary
This review concludes the learning portion on optimization with an overview of some frequently
used advanced optimization models. Network models are key for supply chain professionals. The
review begins by first defining the terminology used frequently in these networks. It then
introduces common network problems including the Shortest Path, Traveling Salesman Problem
(TSP), and Flow problems. These are used frequent in supply chain management and
understanding when they arise and how to solve them is essential. We then introduce non-linear
optimization, highlighting its differences with linear programming, and an overview of how to
solve non-linear problems. The review concludes with practical recommendations of for
conducting optimization, emphasizing that supply chain professionals should: know their
problem, their team and their tool.
Key Concepts
Network Models
Network Terminology
• Node or vertices – a point (facility, DC, plant, region)
• Arc or edge – link between two nodes (roads, flows, etc.) may be directional
• Network or graph – a collection of nodes and arcs
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
42
• Importance: TSP is at the core of all vehicle routing problems; local routing and last mile
deliveries are both common and important.
• Challenges: It is exceptionally hard to solve exactly, due to its size; possible solutions
increase exponentially with number of nodes.
• Primary approach: special algorithms for exact solutions (smaller problems) – Heuristics
(many available).
o Two examples: Nearest Neighbor, Cheapest Insertion
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
43
Non-Linear Optimization
A nonlinear program is similar to a linear program in that it is composed of an objective function,
general constraints, and variable bounds. The difference is that a nonlinear program includes at
least one nonlinear function, which could be the objective function, or some or all of the
constraints.
Figure 16. Example of NLP with linear constraint and non-linear objective function (z=xy).
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
44
o Run multiple “what-if” scenarios changing uncertain input values and testing
different conditions.
• Models vs. People (models don’t make decisions, people do!)
o Optimization models are good at making trade-offs between complicated options
and uncovering unexpected insights and solutions.
o People are good at:
§ Considering intangible and non-quantifiable factors,
§ Identifying underlying patterns, and
§ Mining previous experience and insights.
§ Models should be used for Decision SUPPORT not for the decision.
Learning Objectives
• Introduction to advanced optimization methods.
• Understand the conditions and when to apply network models.
• Differentiate nonlinear optimization and when it should be used.
• Review recommendations for running optimization in practice – emphasizing
importance of knowing the problem, team and tool.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
45
Algorithms and Approximations
Summary
In this lesson, we will be reviewing Algorithms and approximations. The first half of the lesson
will be a review of algorithms – which you technically have already be introduced to, but perhaps
not in these terms. We will be reviewing the basics of algorithms, their components, and how
they are used in our everyday problem solving! To demonstrate these, we will be looking at a few
common supply chain problems such as the Shortest Path problem, Traveling Salesman Problem,
and Vehicle Routing Problem while applying the appropriate algorithm to solve them.
In this next part of the lesson, we will be reviewing approximations. Approximations are good
first steps in solving a problem because they require minimal data, allow for fast sensitivity
analysis, and enable quick scoping of the solution space. Recognizing how to use approximation
methods are important in supply chain management because commonly optimal solutions
require large amounts of data and are time consuming to solve. So, if that level of granularity is
not needed, approximation methods can provide a basis to work from and to see whether further
analysis is needed.
Algorithms
Algorithm - a process or set of rules to be followed in calculations or other problem-solving
operations, especially by a computer.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
46
Shortest Path Problem
Subject to:
∑ x =1
i ji
∀ j=s
∑ x −∑ x
i ji i ij
= 0 ∀ j ≠ s, j ≠ t
∑ x =1
i ij
∀ j=t
xij ≥ 0
where:
xij = Number of units flowing from node i to node j
cij = Cost per unit for flow from node i to node j
s = Source node – where flow starts
t = Terminal node – where flow ends
Dijkstra’s Algorithm
Djikstra's algorithm (named after its discover, E.W. Dijkstra) solves the problem of finding the
shortest path from a point in a graph (the source) to a destination.
Inputs:
• Connected graph with nodes and arcs with positive costs, d(ij)
• Source (s) and Terminal (t) nodes
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
47
Algorithm:
1. for all nodes in graph, set L()=∞, P()=Null, S()=0
2. set s to i, S(i)=1, and L(i)=0
3. For all nodes, j, directly connected (adjacent) to node i; if L(j) > L(i) + d(ij), then set L(j) =
L(i) + d(ij) and P(j)=i
4. For all nodes where S()=0, select the node with lowest L() and set it to i, set S(i)=1
5. Is this node t, the terminal node? If so, go to end. If not, go to step 3
6. end – return L(t)
Output:
L(t) and P array
2-Opt Heuristic
1. Identify pairs of arcs (i-j and k-l), where d(ij) + d(kl) > d(ik) + d(jl) – usually where they cross
2. Select the pair with the largest difference, and re-connect the arcs (i-k and j-l)
3. Continue until there are no more crossed arcs.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
48
Vehicle Routing Problem
Find minimum cost tours from single origin to multiple destinations with varying demand using
multiple capacitated vehicles.
Heuristics
• Route first Cluster second
o Any earlier TSP heuristic can be used
Optimal
• Mixed Integer Linear Program (MILP)
• Select optimal routes from potential set
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
49
Sweep Heuristic
1. Form a ray from the DC and select an angle and direction (CW vs CCW) to start
2. Select a new vehicle, j, that is empty, wj=0, and has capacity, cj.
3. Rotate the ray in selected direction until it hits a customer node, i, or reaches the starting
point (go to step 5).
4. If the demand at i (Di) plus current load already in the vehicle (wj) is less than the vehicle
capacity, add it to the vehicle, wj=Di + wj and go to step 3. Otherwise, close this vehicle,
and go to step 2 to start a new tour.
5. Solve the TSP for each independent vehicle tour.
Different starting points and directions can yield different solutions! Best to use a variety or
a stack of heuristics.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
50
Savings Heuristic
1. Calculate savings si,j = cO,i + cO,j - ci,j for every pair (i, j) of demand nodes.
2. Rank and process the savings si,j in descending order of magnitude.
3. For the savings si,j under consideration, include arc (i, j) in a route only if:
• No route or vehicle constraints will be violated by adding it in a route and
• Nodes i and j are first or last nodes to/from the origin in their current route.
4. If the savings list has not been exhausted, return to Step 3, processing the next entry in
the list; otherwise, stop.
Min ∑C Y j j
j
s.t.
J
∑a Y ij j
≥ Di ∀i
j=1
J
∑Y j
≤ V
j=1
Y j = {0,1} ∀j
Indices
Demand nodes i
Vehicle routes j
Input Data
Cj = Total cost of route j ($)
Di = Demand at node i (units)
V = Maximum vehicles
aij = 1 if node i is in route j;
= 0 otherwise
Decision Variables
Yj = 1 if route j is used,
=0 otherwise
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
51
Approximation Methods
In this second half, we will discuss examples of approximation and estimation. In particular we
will review estimation of One-to-Many Distribution through line haul distance, traveling salesman
and vehicle routing problems.
Quick Estimation
Simple Estimation Rules:
1. Break the problem into pieces that you can estimate or determine directly
2. Estimate or calculate each piece independently to within an order of magnitude
3. Combine the pieces back together paying attention to units
Example:
How many piano tuners are there in Chicago?"
• There are approximately 9,000,000 people living in Chicago.
• On average, there are two persons in each household in Chicago.
• Roughly one household in twenty has a piano that is tuned regularly.
• Pianos that are tuned regularly are tuned on average about once per year.
• It takes a piano tuner about two hours to tune a piano, including travel time.
• Each piano tuner works eight hours in a day, five days in a week, and 50 weeks in a year.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
52
Estimation of One to Many Distribution
Assumptions:
• Vehicles are homogenous
• Same capacity, QMAX
• Fleet size is constant
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
53
Route to serve a specific district:
• Line haul from origin to the 1st customer in the district
• Local delivery from 1st to last customer in the district
• Back haul (empty) from the last customer to the origin
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
54
Random Network: different approach
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
55
Estimating Local Route Distances
Traveling Salesman Problem
• Starting from an origin, find the minimum distance required to visit each destination once
and only once and return to origin.
• The expected TSP distance, dTSP, is proportional to √(nA) where n= number of stops and
A=area of district
• The estimation factor (kTSP) is a function of the topology
A = Area of district
n = Number of stops in district
δ = Density (# stops/Area)
kTSP = TSP network factor (unitless)
dTSP= Traveling Salesman Distance
dstop= Average distance per stop
⎛n⎞ ⎛ n ⎞
dTSP ≈ kTSP nA = kTSP n ⎜ ⎟ = kTSP ⎜ ⎟
⎝δ ⎠ ⎝ δ⎠
kTSP nA A
d stop ≈ = kTSP
n n
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
56
What values of kTSP should we use?
• Lots of research on this for L1 and L2 networks - depends on district shape, approach to
routing, etc.
• Euclidean (L2) Networks
o kTSP = 0.57 to 0.99 depending on clustering & size of N (MAPE~4%, MPE~-1%)
o kTSP=0.765 commonly used and is a good approximation!
• Grid (L1) Networks
o kTSP = 0.97 to 1.15 depending on clustering and partitioning of district
ckTSP
dTOUR = 2d LineHaul +
δ
nkTSP
d AllTours = ldTOUR = 2ld LineHaul +
δ
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
57
Key Points
• Review the basis and components of algorithms
• Recognize desired properties of an algorithm
• Review different network algorithms
• Recognize how to solve the Shortest Path Problem
• Recognize which algorithms to use for the Traveling Salesman Problem
• Recognize how to solve a Vehicle Routing Problem (Cluster First – Route Second)
• Review how to use approximations
• Recognize steps to quick estimation
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
58
Simulation
Summary
This review provides an overview of simulation. After a review of deterministic, prescriptive
modeling (optimization, math programming) and predictive models (regression), simulation
offers an approach for descriptive modeling. Simulations let you experiment with different
decisions and see their outcomes. Supply chain managers typically use simulations to assess the
likelihood of outcomes that may follow from different actions. This review outlines the steps in
simulation from defining the variables, creating the model, running the model, and refining the
model. It offers insight into the benefits of using simulation for open ended questions but warns
of its expensive and time-consuming nature.
Over the duration of the course we have reviewed several types of models including optimization,
regression, and simulation. Optimization (LP, IP, MILP, NLP) is a prescriptive form of model that
finds the “best solution” and/or provides a recommendation. Regression is a predictive form of
model that measures the impact of independent variables on dependent variables. We now
cover simulation, which captures the outcomes of different policies with an uncertain or
stochastic environment.
Simulation
Simulation can be used in a variety of contexts; it is most useful in capturing complex system
interactions, modeling system uncertainty, or generating data to analyze, describe and visualize
interactions, outcomes, and sensitivities. There are several classes of simulation models
including: system dynamics; Monte Carlo Simulation; discrete time simulation; and agent based
simulations.
There are five main steps in developing a simulation study. Formulate and plan the study; collect
data and define a model; construct model and validate; make experimental runs; and analyze
output. The following will review how each of those steps can be conducted.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
59
Collect data & define a model
Once the plan has been formulated, the data needs to be collected and a model defined.
If you are faced with a lack of sample data – you will need to determine the “range” of the
variable(s) by talking to stakeholders or experts to identify possible values. Some data can be
derived from known distributions such as Poisson or Uniform/Triangular Distributions when little
to no information is available.
If sample data is available, conduct steps as we have previously reviewed such as histograms,
calculating summary statistics. Then conduct a Chi-Square test to fit the sample to “traditional”
distributions. Or use a “custom” empirical distribution such as discrete empirical (use % of
observation as probabilities), or continuous – use histogram to compute probabilities of each
range and then “uniform” within the range.
In Excel:
=CHITEST(Observed_Range,Actual_Range) – returns p-value
=1-CHIDIST(Chi-square, Degrees of freedom) – returns the p-value
Make necessary inputs random, add a data table to automate runs of model, add summary
statistics based on results from data table.
Generating random variables with the underlying principle of generating a random (or pseudo-
random) number and transform it to fit the desired distribution:
• Manual techniques: rolling die, turning a roulette wheel, random number tables.
• Excel
o RAND () = continuous variable between 0 and 1
§ Generate random number u
§ For each random u, calculate a value y whose cumulative distribution
function is equal to u; assign value y as the generated number:
F(y) = P(y)=u
• Uniform Distribution ~U(a,b)
o ~U(a, b) = a + (b – a) * RAND()
• Normal Distribution ~N(μ, σ)
o ~N(μ, σ) =NORMINV( RAND(), μ, σ)
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
60
Validation of Model – is the process of determining the degree to which a model and its
associated data are an accurate representation of the real world for the intended use of the
model. Different ways of validating model including comparing to historical data or getting expert
input. One primary method is parameter variability and sensitivity analysis:
• Generate statistical parameters with confidence intervals.
• Hypothesis Testing (see Week 6).
Analyze output
Analyzing output deals with drawing inferences about the system model based on the simulation
output. Need to ensure that the model output has maintained a proper balance between
establishing model validation and statistical significance. Depending on the system examined,
simulation will potentially be able to provide valuable insight with the output. The ability to draw
inferences from results to make system improvements will be discussed further in future courses.
Learning Objectives
• Review the steps to developing a simulation model.
• Understand when to use a simulation, and when to not.
• Recognize different kinds of simulations and when to apply them.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
61
References
APICS Supply Chain Council - http://www.apics.org/sites/apics-supply-chain-council
Chopra, Sunil, and Peter Meindl. "Chapter 1." Supply Chain Management: Strategy, Planning, and
Operation. 5th edition, Pearson Prentice Hall, 2013.
Law & Kelton (2000). Simulation Modeling & Analysis, McGraw Hill.
Taha, H.A. (2010). Operations Research. An introduction. 9th edition. Pearson Prentice Hall.
Winston (2003) Operations Research: Applications and Algorithms, Cengage Learning. There are
many different books by Wayne Winston - they are all pretty good.
03-22-2019・CTL.SC0x – Supply Chain Analytics Key Concepts・MITx MicroMasters in Supply Chain Management
MIT Center for Transportation & Logistics・Cambridge, MA 02142 USA ・[email protected]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
62