Statistical Estimation Methods in Hydrological Engineering


By
Talha Farooq
Abdul Rauf
Hafsa Ahmed
Hydrological Engineering
• Hydrological engineering is a branch of civil engineering chiefly
concerned with the flow and storage of water. Hydrological
engineering also focuses on preventing floods and lessening the
effects of floods and other natural disasters.
How can we use Statistical Estimation Methods in
civil Engineering ?

• In the design of civil engineering structures, use is made of probabilistic
calculation methods. Stress and load parameters are described by
statistical distribution functions. The parameters of these
distribution functions can be estimated by various methods.
Parameters
• Load tests help you understand how a system behaves under an
expected load.
• Stress tests help you understand the upper limits of a system's
capacity using a load beyond the expected maximum.
Interest of Research Paper

• The main point of interest is the behavior of each method for
predicting p-quantiles, where p ≪ 1.
• An extensive comparison of different estimation methods is given in
this paper.
• In this paper the performance of the parameter estimation methods
with respect to their small-sample behavior is analyzed with Monte
Carlo simulations, supplemented with mathematical proofs.
Parameter estimation methods
• The method of moments (Johann Bernoulli, 1667-1748)
• The method of maximum likelihood (Daniel Bernoulli, 1700-1782)
• The method of least squares (on the original or on the linearized data),
(Gauss, 1777- 1855)
• The method of Bayesian estimation (Bayes, 1763)
• The method of minimum cross entropy (Shannon, 1949)
• The method of probability weighted moments (Greenwood et al., 1979)
• The method of L-moments (Hosking, 1990)
Working
• In this paper, we will in particular investigate the performance of
the parameter estimation methods with respect to the following
criteria:
• (i) based on the relative bias and root mean squared error (RMSE);
• (ii) based on the over- and underdesign.
Terminologies
• Relative bias is defined as the estimate of the systematic error. In
practice, bias is usually determined as the difference between the
mean obtained from a large number of replicate measurements of
a sample and the reference value of that sample.
• Root Mean Square Error (RMSE) is the standard deviation of the
residuals.
RMSE = √( (1/n) Σᵢ₌₁ⁿ (x̂ᵢ − xᵢ)² )
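• As an illustration (not taken from the paper), a minimal Python sketch of how the relative bias and RMSE of an estimator can be evaluated with Monte Carlo simulation; the exponential distribution, the sample size and the true parameter value are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup: estimate the scale parameter of an exponential
# distribution (true value 2.0) from small samples of size n = 20.
true_theta = 2.0
n, n_replicates = 20, 10_000

# The sample mean is the MOM/ML estimator of the exponential scale.
estimates = np.array([
    rng.exponential(scale=true_theta, size=n).mean()
    for _ in range(n_replicates)
])

bias = estimates.mean() - true_theta                      # systematic error
relative_bias = bias / true_theta                         # bias relative to the true value
rmse = np.sqrt(np.mean((estimates - true_theta) ** 2))    # root mean squared error

print(f"relative bias = {relative_bias:.4f}, RMSE = {rmse:.4f}")
```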
Goal
• It is desirable that the quantile estimate be unbiased, i.e., its expected
value should be equal to the true value. It is also desirable that an
unbiased estimate be efficient, i.e., its variance should be as small
as possible.
• The problem of unbiased and efficient estimation of extreme
quantiles from small samples is commonly encountered in civil
engineering practice.
Goal
• The first step in quantile estimation involves fitting an analytical
probability distribution to represent adequately the sample
observations.
• To achieve this, the distribution type should be judged from the data,
and then the parameters of the selected distribution should be
estimated.
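• As an illustrative sketch of the quantile step (not taken from the paper): once a distribution type has been selected and its parameters estimated, the p-quantile follows from the inverse CDF. The Gumbel distribution and the parameter values below are assumptions for illustration only.

```python
import numpy as np

# Assumed, for illustration: a Gumbel distribution has been selected and its
# location (xi) and scale (alpha) have already been estimated from the sample.
xi, alpha = 10.0, 2.5

def gumbel_quantile(non_exceedance_p, xi, alpha):
    """Inverse CDF of the Gumbel distribution: F(x) = exp(-exp(-(x - xi)/alpha))."""
    return xi - alpha * np.log(-np.log(non_exceedance_p))

# Design value for a return period of T = 100 years, i.e. an exceedance
# probability p = 1/T << 1 (non-exceedance probability 1 - 1/T).
T = 100
x_design = gumbel_quantile(1.0 - 1.0 / T, xi, alpha)
print(f"estimated 100-year quantile: {x_design:.2f}")
```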
Methods for the selection of distribution

• In this paper three different methods for the selection of the
distribution type will be reviewed, extended and tested.
• The first method is based on Bayesian statistics.
• The second one on linear regression.
• The third one on L-moments.
L-moments
• Linear combinations of expectations of order statistics are referred to as
L-moments by Hosking.
• Hosking shows that they are very useful in statistical parameter
estimation.
• Being a linear combination of data, they are less influenced by
outliers, and the bias of their small sample estimates remains fairly
small.
4th L-moment
• A measure of kurtosis derived from L-moments, known as L-kurtosis, was
suggested as a useful indicator of distribution shape by Hosking.
• Hosking proposed a simple but effective approach to fit 3-parameter distributions.
• The approach involves the computation of three L-moments from a given sample.
• By matching the three L-moments, a set of 3-parameter distributions can be fitted
to the sample data.
• In this paper, the fitted distribution whose L-kurtosis is closest to that of the sample
is suggested to be the most representative distribution, which should be used for
quantile estimation.
Focus
• This paper will focus on evaluating the robustness of the L-kurtosis
measure in the distribution selection and extreme quantile
estimation from small samples.
Classical Estimation Methods
• To make statements about the population based on a sample, it is important to
understand in what way the sample relates to the population. In most cases the
following assumptions will be made:
• 1. Every sample observation x is the outcome of a random variable X which has
an identical distribution (either discrete or continuous) for every member of the
population.
• 2. The random variables X1, X2, ..., Xn corresponding to the different members
of the sample are independent.
• These two assumptions (abbreviated to i.i.d. (independent identically
distributed)) formalize what is meant by the statement of drawing a random
sample from a population.
Statement
• We have now reduced the problem to one which is
mathematically very simple to state: we have i.i.d. observations
x1, x2, ..., xn of a random variable X with probability function
(in the discrete case) or probability density function (in the
continuous case) f, and we want to estimate some aspect of this
population distribution (for instance the mean or the variance).
• If a statistic T(X) is used for the purpose of estimating a parameter,
then it is called an estimator, and its realized value T(x) is
called an estimate.
Question Arises
• How should we determine whether an estimator is good or not?
• There are, in fact, many such criteria: we will focus on the two
most widely used:
• Though we cannot hope to estimate a parameter perfectly, we
might hope that ‘on average’ the estimation procedure gives
the correct result.
• Estimators are to be preferred if they have small variability; in
particular, we may require the variability to diminish as we
take samples of a larger size.
• These concepts are formalized as follows.
• The estimator T(X) is unbiased for θ if : E(T(X)) = θ
• Otherwise, B(T) = E(T(X)) - θ is the bias of T.
• If B(T)→0 as the sample size n→∞, then T is said to be asymptotically unbiased for θ.
• The mean-squared error of an estimator is defined by:
• MSE(T) = E((T(X) - θ)^2) (eq 1)
Note that MSE(T) = var(T) + B^2(T). (eq 2)
Indeed MSE(T) = E(T^2(X) - 2θT(X) + θ^2)
= E(T^2(X)) - 2θE(T(X)) + θ^2
= E(T^2(X)) - 2θ(B(T) + θ) + θ^2
= E(T^2(X)) - 2θB(T) - θ^2
and var(T) = E(T^2(X)) - [E(T(X))]^2
= E(T^2(X)) - (B(T) + θ)^2
= E(T^2(X)) - B^2(T) - 2θB(T) - θ^2
Subtracting the two expressions gives MSE(T) - var(T) = B^2(T), which proves (eq 2).
• The root mean-squared error of an estimator is defined as:
• RMSE = √MSE
• An estimator T is said to be mean-squared consistent for θ if MSE(T) → 0 as the
sample size n → ∞
• Ideally, estimators are both unbiased and consistent. We also prefer estimators
to have as small variance as possible. In particular , given two estimators T1 and
T2 both being unbiased for θ, then T1 is said to be more efficient than T2 if :
• var(T1(X)) < var(T2(X))
• If estimators are not unbiased it is not so straightforward to determine efficiency:
we often have to make a choice between estimators that have low bias but high
mean squared error and estimators that have high bias but low mean squared
error.
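• As an illustration of the efficiency criterion (not taken from the paper), a short Python sketch comparing two unbiased estimators of a normal mean, the sample mean and the sample median, by their simulated variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two estimators of the mean of a normal population: both are unbiased,
# but the sample mean has the smaller variance (it is more efficient).
mu, sigma, n, reps = 0.0, 1.0, 30, 20_000
samples = rng.normal(mu, sigma, size=(reps, n))

t1 = samples.mean(axis=1)            # T1: sample mean
t2 = np.median(samples, axis=1)      # T2: sample median

print(f"var(T1) = {t1.var():.4f}")   # close to sigma**2 / n
print(f"var(T2) = {t2.var():.4f}")   # larger, roughly (pi/2) * sigma**2 / n
```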
Method of Moments (MOM)

• It is difficult to trace back who introduced the MOM, but Johann Bernoulli (1667-
1748) was one of the first to use the method in his work.
• With the MOM, the moments of a distribution function in terms of its parameters
are set equal to the moments of the observed sample.
• Analytical expressions can be derived quite easily but the estimators can be
biased and not efficient. The moment estimators, however, can be very well used
as a starting estimation in an iteration process.
Method of moment estimator
With the MOM, the sample moments are set equal to the theoretical moments,
mᵣ = μᵣ(θ),
and the resulting equations are solved for the parameters θ.
The sample mean is a natural estimator for μ. The higher sample moments mᵣ are
reasonable estimators of the μᵣ, but they are not unbiased. Unbiased estimators are
often used instead. In particular, the variance and the fourth cumulant are unbiasedly
estimated by the k-statistics
k₂ = n m₂ / (n − 1)
k₄ = n² [ (n + 1) m₄ − 3 (n − 1) m₂² ] / [ (n − 1)(n − 2)(n − 3) ]
where mᵣ = (1/n) Σᵢ (xᵢ − x̄)ʳ denotes the r-th sample central moment.
Finding theoretical moments as a function of θ is not easy for all probability
distributions.
The method is difficult to generalize to more complex situations (dependent data,
covariates, non-identically distributed data).
Sample covariances may be used to estimate parameters that determine
dependence. For some distributions (such as Cauchy), moments may not exist.
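• As an illustrative sketch (not taken from the paper) of the method of moments: fitting a two-parameter Gumbel distribution, a common model for annual maxima, by matching the sample mean and variance to the theoretical moments mean = ξ + γα and var = π²α²/6. The data below are synthetic.

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def gumbel_mom(x):
    """Method-of-moments fit of a Gumbel distribution.

    Matches the sample mean and variance to the theoretical moments
    mean = xi + gamma * alpha and var = (pi**2 / 6) * alpha**2.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.sqrt(6.0) * x.std(ddof=1) / np.pi
    xi = x.mean() - EULER_GAMMA * alpha
    return xi, alpha

# Synthetic annual-maximum data, for illustration only.
rng = np.random.default_rng(1)
data = rng.gumbel(loc=10.0, scale=2.5, size=30)
print(gumbel_mom(data))
```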
Method of Maximum Likelihood
(MML)
• Also, with the MML it is difficult to say who discovered the method, although
Daniel Bernoulli (1700-1782) was one of the first to report on it (Kendall,
1961).
• The likelihood function gives the relative likelihood of the obtained observations,
as a function of the parameters θ:
L(θ; x₁, ..., xₙ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ)
• With this method one chooses that value of θ for which the likelihood function is
maximized.
• The ML-method gives asymptotically unbiased parameter estimates, and of all
unbiased estimators it has the smallest mean squared error.
• The variances asymptotically approach the inverse of the Fisher information:
var(θ̂) → [I(θ)]⁻¹,   with I(θ) = −E( ∂² ln L(θ) / ∂θ² )
• Furthermore, these estimators are invariant, consistent and sufficient.

• The maximum likelihood estimator is unbiased, fully efficient (in that it achieves the
Cramér-Rao bound under regularity conditions), and normally distributed; all of this in
an asymptotic sense.
• Regularity conditions are not fulfilled if the range of the random variable X depends on
unknown parameters.
• This asymptotic approximation is extremely useful since it is often quite straightforward
to evaluate from the MLE and the observed information. Nonetheless it is an
approximation and should only be trusted for large values of n (though the quality of
the approximation will vary from model to model).
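• As an illustrative sketch (not taken from the paper) of numerical maximum likelihood for the same Gumbel model, assuming scipy is available; the method-of-moments estimates are used as starting values for the iteration, as suggested on the method-of-moments slide.

```python
import numpy as np
from scipy.optimize import minimize

def gumbel_negloglik(params, x):
    """Negative log-likelihood of the Gumbel distribution."""
    xi, alpha = params
    if alpha <= 0:
        return np.inf
    z = (x - xi) / alpha
    return np.sum(np.log(alpha) + z + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.gumbel(loc=10.0, scale=2.5, size=30)   # synthetic sample

# Moment estimates as starting values for the numerical maximization.
alpha0 = np.sqrt(6.0) * x.std(ddof=1) / np.pi
xi0 = x.mean() - 0.5772156649 * alpha0

result = minimize(gumbel_negloglik, x0=[xi0, alpha0], args=(x,), method="Nelder-Mead")
xi_ml, alpha_ml = result.x
print(f"ML estimates: xi = {xi_ml:.3f}, alpha = {alpha_ml:.3f}")
```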
Method of Least Squares (MLS)
• Least squares was introduced by Gauss (1777-1855). Given the observations
x = (x₁, x₂, ..., xₙ) and y = (y₁, y₂, ..., yₙ), a regression model can be fitted. For the
linear case:
Yᵢ = α + β xᵢ + εᵢ
With the assumed constant variance of Y around its regression line, the least-squares
parameter estimates are:
β̂ = Σᵢ (xᵢ − x̄)(Yᵢ − Ȳ) / Σᵢ (xᵢ − x̄)²,   α̂ = Ȳ − β̂ x̄
The estimators are linear functions of the Yᵢ's and they are unbiased.
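• As an illustrative sketch (not taken from the paper), the closed-form least-squares estimates above applied to synthetic straight-line data.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 25)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=x.size)  # synthetic observations

# Closed-form ordinary least-squares estimates for y = a + b*x.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```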
Method of L-Moments
• In statistics, L-moments are a sequence of statistics used to summarize the
shape of a probability distribution.
• They are linear combinations of order statistics (L-statistics) and can be used
to calculate quantities analogous to standard deviation, skewness
and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively (the
L-mean is identical to the conventional mean).
• Standardized L-moments are called L-moment ratios.
• Sample L-moments can be defined for a sample from the population and can
be used as estimators of the population L-moments.
• Since L-moment estimators are linear functions of the ordered data
values, they are virtually unbiased and have relatively small sampling
variance.
• L-moment ratio estimators also have small bias and variance,
especially in comparison with the classical coefficients of skewness
and kurtosis.
• Moreover, estimators of L-moments are relatively insensitive to
outliers.
• Finally, L-moments are in fact nothing other than summary statistics for
probability distributions and data samples. They are analogous to
ordinary moments: they provide measures of location, dispersion,
skewness, kurtosis, and other aspects of the shape of probability
distributions or data samples.
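• As an illustrative sketch (not taken from the paper): the first four sample L-moments, together with the L-skewness and L-kurtosis ratios, computed from probability weighted moments following Hosking's definitions. The Gumbel data are synthetic.

```python
import numpy as np
from math import comb

def sample_l_moments(x):
    """First four sample L-moments via probability weighted moments (Hosking, 1990)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size

    # Unbiased sample probability weighted moments b_0 .. b_3.
    b = [np.mean([comb(i, r) / comb(n - 1, r) * x[i] for i in range(n)])
         for r in range(4)]

    l1 = b[0]                                   # L-mean (= ordinary mean)
    l2 = 2 * b[1] - b[0]                        # L-scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2             # plus L-skewness and L-kurtosis ratios

rng = np.random.default_rng(4)
print(sample_l_moments(rng.gumbel(loc=10.0, scale=2.5, size=30)))
```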
