Discrete-Event Simulation Input Process Modeling
Lawrence M. Leemis
Department of Mathematics
College of William & Mary
Williamsburg, VA 23187-8795, U.S.A.
[Figure: taxonomy of input models. Time-independent models branch into univariate and multivariate models, each discrete, continuous, or mixed; examples shown are binomial($n$, $p$), degenerate($c$), normal($\mu$, $\sigma^2$), exponential($\Lambda$), Bezier curve, independent binomial($n$, $p$), and bivariate exponential. Stochastic processes branch by discrete-time/continuous-time, discrete-state/continuous-state, and stationary/nonstationary; examples shown are Markov chain, ARMA($p$, $q$), ARIMA($p$, $d$, $q$), and Markov process.]
mass at one value. Examples of continuous distributions include the normal distribution, an exponential distribution with a random parameter $\Lambda$ (see, for example, Martz and Waller 1982), and Bezier curves (Flanigan-Wagner and Wilson 1993). Bezier curves offer a unique combination of the parametric and nonparametric approaches. An initial distribution is fitted to the data set, then the modeler decides whether differences between the empirical and fitted models represent sampling variability (chance variation) or an aspect of the distribution that should be included in the input model.
Examples of $k$-variable multivariate input models (see Johnson 1987) include a sequence of $k$ independent binomial random variables, a multivariate normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$, and a bivariate exponential distribution (Barlow and Proschan 1981).
The lower half of the taxonomy contains stochastic process models. These models are often used to solve problems at the system level, in addition to serving as input models for simulations with stochastic elements. Models are classified by how time is measured (discrete/continuous), the state space (discrete/continuous), and whether the model is stationary in time. For Markov models, the discrete-state/continuous-state branch typically determines whether the model will be called a "chain" or a "process", and the stationary/nonstationary branch typically determines whether the model will be preceded with the term "homogeneous" or "nonhomogeneous". Examples of discrete-time stochastic processes include homogeneous, discrete-time Markov chains (Ross 1993) and ARIMA time series models (Box and Jenkins 1976). Since point processes are counting processes, they have been placed on the continuous-time, discrete-space branch. Although the Poisson, renewal and nonhomogeneous Poisson processes are all pure birth processes, more general point processes, such as one to model the number of customers in a queue, can be placed on one of the continuous-time, discrete-space branches.
... simulation of a queuing system. The service times in seconds are
105.84  28.92   98.64  55.56  128.04  45.60
 67.80  105.12  48.48  51.84  173.40  51.96
 54.12  68.64   93.12  68.88   84.12  68.64
 41.52  127.92  42.12  17.88   33.00.
[Although these service times come from the life testing literature (Lieblein and Zelen 1956), the same principles apply to both input modeling and survival analysis.]
The first step is to assess whether the observations are independent and identically distributed (iid). The data must be given in the order collected for independence to be assessed. Situations where the iid assumption would not be valid include:
• A new teller has been hired at a bank and the 23 service times represent a task that has a steep learning curve. The expected service time is likely to decrease as the new teller learns how to perform the task more efficiently.
• The service times represent 23 completion times of a physically demanding task during an 8-hour shift. If fatigue is a significant factor, the expected time to complete the task is likely to increase with time.
If a simple linear regression of the service times on the observation numbers shows a significant nonzero slope, then the iid assumption is probably not appropriate.
Assume that there is a suspicion that a learning curve is present. An appropriate hypothesis test is
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 < 0$$
associated with the linear model (Neter, Wasserman, and Kutner 1989)
$$Y = \beta_0 + \beta_1 X + \epsilon.$$
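The regression check for a learning curve described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the author's computation: it assumes the 23 service times are listed in the order collected (read row by row), it assumes a 5% significance level, and the function name `slope_t_statistic` and the constant `T_CRIT` are hypothetical. The critical value is the lower 5% point of the t distribution with 21 degrees of freedom.

```python
import math

# Service times in seconds, assumed to be in the order collected (row by row).
SERVICE_TIMES = [
    105.84, 28.92, 98.64, 55.56, 128.04, 45.60,
    67.80, 105.12, 48.48, 51.84, 173.40, 51.96,
    54.12, 68.64, 93.12, 68.88, 84.12, 68.64,
    41.52, 127.92, 42.12, 17.88, 33.00,
]

def slope_t_statistic(y):
    """Regress y on observation number 1..n; return (slope, t statistic)."""
    n = len(y)
    x = list(range(1, n + 1))
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = y_bar - b1 * x_bar             # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b1, b1 / se_b1

b1, t = slope_t_statistic(SERVICE_TIMES)
T_CRIT = -1.721  # lower 5% point of t with 21 degrees of freedom (assumed level)
print(f"slope = {b1:.3f}, t = {t:.3f}, reject H0 in favor of H1: {t < T_CRIT}")
```

A one-sided test is used because the alternative of interest is a decreasing expected service time; a fatigue effect would call for the opposite tail.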
[Figure 2: Service Time Vs. Observation Number. Vertical axis: service time (0 to 150); horizontal axis: observation number.]
[Figure 3: Histogram of Service Times]
adjacent observations. For this particular example, assume that we are satisfied that the observations are truly iid in order to perform a classical statistical analysis.
The next step in the analysis of this data set includes plotting a histogram and calculating the values of some sample statistics. A histogram of the observations is shown in Figure 3. Although the data set is small, a skewed bell-shaped pattern is apparent. The largest observation lies in the far right-hand tail of the distribution, so care must be taken to ensure that it is representative of the population. The sample mean, standard deviation, coefficient of variation, and skewness are
$$\bar{x} = 72.22 \qquad s = 37.49 \qquad \frac{s}{\bar{x}} = 0.52 \qquad \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 = 0.88.$$
Examples of the interpretations of these sample statistics are:
• A coefficient of variation $s / \bar{x}$ close to 1, along with the appropriate histogram shape, indicates that the exponential distribution is a potential input model.
• A sample skewness close to 0 indicates that a symmetric distribution (e.g., a normal distribution) is a potential input model.
The next decision that needs to be made is whether a parametric or nonparametric input model should be used. One simple nonparametric model would repeatedly select one of the service times with probability 1/23. The small size of the data set, the tied value, 68.64 seconds, and the observation in the far right-hand tail of the distribution, 173.40 seconds, tend to indicate that a parametric analysis is more appropriate. Since the input model is for service times, the accurate modeling of the right-hand tail of the distribution is critical. These long service times significantly impact queuing statistics. For this particular data set, a parametric approach is chosen.
There are dozens of choices for a univariate parametric model for the service times. These include general families of scalar distributions, modified scalar distributions and commonly-used parametric distributions (see Schmeiser 1990). Since the data is drawn from a continuous population and the support of the distribution is positive, a time-independent, univariate, continuous input model is chosen. The shape of the histogram indicates that the gamma, inverse Gaussian, log logistic, log normal, and Weibull distributions (Lawless 1982) are good candidates. The Weibull distribution is analyzed in detail here. Similar approaches apply to the other distributions.
Parameter estimates for the Weibull distribution can be found by least squares, the method of moments, and maximum likelihood. Due to desirable statistical properties, maximum likelihood is emphasized here. The Weibull distribution has probability density function
$$f(x) = \kappa \lambda^{\kappa} x^{\kappa - 1} e^{-(\lambda x)^{\kappa}}, \qquad x > 0,$$
where $\lambda$ is a positive scale parameter and $\kappa$ is a positive shape parameter. Let $x_1, x_2, \ldots, x_n$ be the data values. The likelihood function is
$$L(\lambda, \kappa) = \prod_{i=1}^{n} f(x_i) = \kappa^{n} \lambda^{n \kappa} \left[ \prod_{i=1}^{n} x_i \right]^{\kappa - 1} e^{-\sum_{i=1}^{n} (\lambda x_i)^{\kappa}}.$$
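As a hedged illustration of maximum likelihood fitting for the service times, the sketch below assumes the Weibull density is parametrized as $f(x) = \kappa \lambda^{\kappa} x^{\kappa - 1} e^{-(\lambda x)^{\kappa}}$. Under that assumption, setting the partial derivative of $\log L$ with respect to $\lambda$ to zero gives $\lambda^{\kappa} = n / \sum x_i^{\kappa}$, and substituting this into the partial derivative with respect to $\kappa$ leaves one equation in $\kappa$ alone, which is solved here by bisection. The function name `weibull_mle` and the bracketing interval are illustrative choices, not taken from the paper.

```python
import math

# Service times in seconds, as listed in the text.
SERVICE_TIMES = [
    105.84, 28.92, 98.64, 55.56, 128.04, 45.60,
    67.80, 105.12, 48.48, 51.84, 173.40, 51.96,
    54.12, 68.64, 93.12, 68.88, 84.12, 68.64,
    41.52, 127.92, 42.12, 17.88, 33.00,
]

def weibull_mle(x, lo=0.1, hi=20.0, tol=1e-10):
    """Return (lam, kappa) maximizing the Weibull likelihood.

    Assumes f(x) = kappa * lam**kappa * x**(kappa - 1) * exp(-(lam * x)**kappa)
    and that the profile score changes sign on [lo, hi] (an assumed bracket).
    """
    n = len(x)
    sum_log = sum(math.log(v) for v in x)

    def profile_score(kappa):
        # d(log L)/d(kappa) after substituting lam**kappa = n / sum(x**kappa).
        s0 = sum(v ** kappa for v in x)
        s1 = sum(v ** kappa * math.log(v) for v in x)
        return n / kappa + sum_log - n * s1 / s0

    # Bisection: the profile score is positive at lo and negative at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid) > 0:
            lo = mid
        else:
            hi = mid
    kappa = 0.5 * (lo + hi)
    lam = (n / sum(v ** kappa for v in x)) ** (1.0 / kappa)
    return lam, kappa

lam_hat, kappa_hat = weibull_mle(SERVICE_TIMES)
print(f"lambda_hat = {lam_hat:.4f}, kappa_hat = {kappa_hat:.2f}")
```

Bisection is used instead of Newton's method only to keep the sketch short and free of derivative code; any one-dimensional root finder would serve.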
[Figure 5: 95% Confidence Region Based on the Likelihood Ratio Statistic. Horizontal axis: $\kappa$; vertical axis: $\lambda$.]
[Figure 6: A P-P Plot for the Service Times]
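A confidence region of the kind named in the Figure 5 caption can be sketched as a membership test. This is a sketch under stated assumptions: the Weibull density is taken to be $f(x) = \kappa \lambda^{\kappa} x^{\kappa - 1} e^{-(\lambda x)^{\kappa}}$, the region is taken to be the set of $(\lambda, \kappa)$ for which twice the drop in log likelihood from its maximum is below the 0.95 quantile of the chi-square distribution with 2 degrees of freedom (5.991), and the maximum likelihood estimates are supplied by the caller. The names `log_likelihood` and `in_confidence_region` are hypothetical.

```python
import math

# Service times in seconds, as listed in the text.
SERVICE_TIMES = [
    105.84, 28.92, 98.64, 55.56, 128.04, 45.60,
    67.80, 105.12, 48.48, 51.84, 173.40, 51.96,
    54.12, 68.64, 93.12, 68.88, 84.12, 68.64,
    41.52, 127.92, 42.12, 17.88, 33.00,
]

CHI2_2_95 = 5.991  # 0.95 quantile of chi-square with 2 degrees of freedom

def log_likelihood(lam, kappa, x):
    """Weibull log likelihood under the assumed parametrization."""
    n = len(x)
    return (n * math.log(kappa)
            + n * kappa * math.log(lam)
            + (kappa - 1.0) * sum(math.log(v) for v in x)
            - sum((lam * v) ** kappa for v in x))

def in_confidence_region(lam, kappa, lam_hat, kappa_hat, x, crit=CHI2_2_95):
    """True if (lam, kappa) is inside the likelihood ratio region around the MLE."""
    stat = 2.0 * (log_likelihood(lam_hat, kappa_hat, x)
                  - log_likelihood(lam, kappa, x))
    return stat < crit
```

Evaluating `in_confidence_region` over a grid of $(\lambda, \kappa)$ values and tracing the boundary between True and False is one way to draw such a region.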
Many of the discrete-event simulation packages exhibited at the Winter Simulation Conference have the capability of determining maximum likelihood estimators for several parametric distributions. If the
0.4499 0.5495 0.6921 3.643 4.357.
One preliminary statistical issue concerning this data is whether the three days represent processes
Weibull process has intensity function
$$\lambda(t) = \kappa \lambda^{\kappa} t^{\kappa - 1}, \qquad t > 0,$$
where $\lambda$ and $\kappa$ are positive parameters. This popular model would not be appropriate for this data
Maximum likelihood estimators can be determined by maximizing $L(\theta)$ or its logarithm with respect to all unknown parameters. Confidence intervals for the unknown parameters can be found in a similar manner to the service time example.