Inferential Statistics


INFERENTIAL STATISTICS

INTRODUCTION

Statistics deals with quantitative data.


Statistical analysis examines masses of numerical data so as to summarize the essential
features and relationships of the data, in order to generalize from the analysis and determine
patterns of behavior, particular outcomes, or future tendencies.

There are basically two branches of statistics in the business environment, and these are
Descriptive Statistics and Inferential Statistics.

Descriptive Statistics is concerned with describing the basic features of the
data under study and provides simple summaries about the sample and the measures.

Together with simple graphical analysis, these summaries form the basis of virtually
every quantitative analysis of data.

Statistical information seen in newspapers, magazines, reports, and other publications
consists of data that are summarized and presented in a form that is easy for the reader to
understand. Summaries of such data can be presented in tabular form, graphical form,
numerical form or text form; all of this is referred to as descriptive statistics. Descriptive
statistics are used to present quantitative descriptions in a clear and manageable form.
In this way, one finds that this type of statistics reduces lots of data into a simpler summary.

Descriptive statistics is distinguished from inferential statistics, whereby with the latter one is
concerned with trying to reach conclusions that extend beyond the immediate data alone.

INFERENTIAL STATISTICS

Anderson (1996) defines inferential statistics as a process of using data obtained
from a sample to make estimates or test claims about the characteristics of a population.

Wikipedia defines it as making an inference about a population from a random sample drawn from
it, or more generally about a random process from its observed behavior during a finite
period of time; this includes point estimation, interval estimation, hypothesis testing
(statistical significance testing) and prediction.

Lucey (2002) defines statistical inference as the process by which conclusions are drawn
about some measure or attribute of a population (for instance the mean or standard
deviation) based on analysis of sample data. Samples are taken in order to draw
conclusions about the whole population; in some cases the testing process is
destructive, which is why sampling is preferred.

The basis of statistical inference is to take a sample of a given population under study,
which is analyzed so that the properties of that population are estimated from the properties of the
sample.
That is why statistical inference uses sampling distributions, which show the
distribution of values expected in samples.
The purpose of sampling in statistical inference:
• To get reliable data by looking at a few observations rather than all possible
observations. The properties of a population are then estimated by looking at the
properties of the sample, and conclusions are drawn afterwards.
• Sampling helps in reducing costs, especially when much information must be
collected and analyzed and when many people are involved in the surveying
process.
• Sampling reduces the time needed to collect and analyze data, and also the
amount of effort put into data collection.
• Sampling in statistical inference can give better or more accurate results than
taking the whole population, because it is impractical to obtain consistent
responses from the entire population.
• It is often impossible to test the whole population anyway.

Though sampling is advantageous in statistical inference, it has some drawbacks:
it can be difficult to define a reliable sample that represents the whole population
fairly.

Applicability of Statistical Inference

• Statistical Inference uses values from samples to estimate values for the
population and this is based on simple random samples.
• Statistical Inference is used to draw conclusions that extend beyond the
immediate data alone. For instance, it is used to infer from the sample data
what the population might think.
• Inferential Statistics is used to make judgments of the probability that an
observed difference between two groups is a dependable one, or one that might
have happened by chance.
• Thus it is used to make inferences from our data to more general conditions.
• Statistical Inference is also useful in experimental and quasi-experimental
research design, or in program outcome evaluation. For example, one might
want to compare the average performance of two groups on a single measure
to see if there is a difference: whether eighth-grade boys and girls differ
in maths test scores, or whether a program group differs on the outcome
measure from a control group.
Where one wishes to compare the average performance between two groups, one
considers the t-test for differences between groups, as sketched below.
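For instance, a minimal sketch in Python of such a two-group comparison, using an independent-samples t-test; the scores below are made-up illustrative data, not values from the text:

from scipy import stats

# Hypothetical maths test scores for the two groups (illustrative only)
boys_scores = [72, 68, 75, 80, 66, 71, 69, 74]
girls_scores = [78, 82, 74, 77, 85, 79, 81, 76]

# Two-sided test of whether the two group means differ
t_stat, p_value = stats.ttest_ind(boys_scores, girls_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# A small p-value (e.g. below 0.05) suggests the observed difference
# is unlikely to have arisen by chance alone.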

Statistical inference about a population from a random sample drawn from it, or
more generally about a random process from its behavior observed during a finite
period of time, includes the following:
1. Point Estimation
2. Interval Estimation
3. Hypothesis Testing or Significance Testing
4. Prediction / forecasting

Below are some of the justifications of statistical inference; these are based
on the idea that real-world phenomena can be modeled probabilistically:
1. Frequency Probability
2. Bayesian Probability
3. Fiducial Probability
4. Eclectic Probability

Wikipedia lists the following topics as included in the area of statistical inference:


1. Statistical Assumptions
2. Likelihood Principle
3. Estimating Parameters
4. Statistical Hypothesis Testing
5. Revising Opinions in Statistics
6. Planning Statistical Research
7. Summarizing Statistical Data

Major inferential statistics come from a general family of statistical models
known as the General Linear Model. This includes the t-test, Analysis of
Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and
many other multivariate methods such as factor analysis, multidimensional
scaling, cluster analysis, and discriminant function analysis.

TYPES OF INFERENCES

Statistical Inference is divided into two types:


1. Estimation
2. Hypothesis Testing

ESTIMATION

Estimation deals with the estimation of population characteristics, such as the population
mean and standard deviation, from sample characteristics such as the sample mean
and standard deviation.
The population characteristics are known as Population Parameters,
and the sample characteristics are known as Sample Statistics.

This type of statistical inference has four properties which are classified as
properties of good estimators:

a) Unbiasedness
An estimate is said to be unbiased if the mean of the sample means x̄ of all
possible random samples of size n drawn from a population of size N equals
the population parameter μ. Thus the mean of the distribution of sample means
would equal the population mean.

b) Consistency
An estimate is said to be consistent if as the sample size increases, the
precision of the estimate of the population parameter also increases.

c) Efficiency
An estimate is said to be more efficient than another if in a repeated sampling
its variance is smaller.

d) Sufficiency
An estimate is said to be sufficient if it uses all the information in the sample
in estimating the required population parameter.

These are the symbols used for sample statistics and population parameters:

                     Sample Statistic    Population Parameter
Arithmetic Mean      x̄                   μ
Standard Deviation   s                   σ
Number of Items      n                   N

Estimation of Population Mean

Lucey (2002) identifies the use of the sample mean x̄ to make inferences about
the population mean as common; if a series of samples of size n (n >= 30) is taken
from a population, it will be found that (a short simulation after this list
illustrates these points):

• Each of the sample means is approximately equal to the population mean.
• Sample means cluster more closely around the population mean than the
original individual values.
• The larger the samples, the more closely their means cluster around the
population mean.
• The distribution of sample means follows a normal curve.
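The short simulation below illustrates these points, assuming an arbitrary (deliberately non-normal) population with mean 50; the numbers are illustrative only:

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=50, size=100_000)  # non-normal population, mean 50

for n in (30, 100, 500):
    sample_means = [rng.choice(population, size=n).mean() for _ in range(2000)]
    print(f"n={n}: mean of sample means = {np.mean(sample_means):.2f}, "
          f"spread = {np.std(sample_means):.2f}")

# The mean of the sample means stays close to the population mean (50),
# and their spread shrinks as n grows, as the bullet points describe.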

Estimation of Population Proportions

The estimation process described above for the arithmetic mean μ can be applied to
other types of statistical measures: the median, the standard deviation, and the
proportion, to mention but a few.
Here three essential elements can be analyzed:

a) The required measure as found from the sample


b) The standard error of the measure involved
c) The sampling distribution of the measure

A proportion in this context represents, for example, the ratio of defective to
good production, the proportion of consumers who plan to buy a given product, or
other similar pieces of information.

Statistical inference for proportions involves the binomial distribution, which brings
complex technical difficulties caused by the discreteness of the distribution and the
resulting asymmetry of confidence intervals.

But when n is large and np & nq are over 5, then the Binomial distribution can be
approximated by the normal distribution.

This simplifies the analysis, and the concepts outlined for the mean can be applied
directly to the proportion.

The formula for standard error of sample proportion

Sps = √ (pq/ n)

For example a random sample of 400 passengers is taken and 55% are in favor of
the proposed new timetables.

With 95% confidence, what proportion of all rail passengers is in favor of the new
timetable?

Solution

n = 400, p = 0.55, and q = 1 – 0.55 = 0.45

as np = 220, i.e. well over 5, the normal approximation can be used

Sps = √ (pq/ n) = √ (0.55*0.45/ 400) = 0.025

Therefore we are 95% confident that the population proportion is between the
two values:

ps ± 1.96 Sps
= 0.55 ± 1.96(0.025) = 0.55 ± 0.049
= 0.501 to 0.599
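The rail-passenger calculation above can be reproduced in a few lines of Python; the 1.96 multiplier corresponds to 95% confidence under the normal approximation:

from math import sqrt

n, p = 400, 0.55
q = 1 - p
se = sqrt(p * q / n)                      # standard error of the sample proportion
lower, upper = p - 1.96 * se, p + 1.96 * se
print(f"SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# SE = 0.025, 95% CI = (0.501, 0.599)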

In addition to this, a similar interval estimate can be constructed for μ1 – μ2 in the large-sample case.


Estimation from Small Samples

So far it has been assumed that all samples are large (n > 30). In this
case s, the sample standard deviation, is used as an estimate
of σ, the population standard deviation.

One also finds that the distribution of sample means is approximately normal,
so that the properties of the normal distribution can be used to calculate
confidence limits using the standard error of the mean.

If these properties and relationships do not hold because the sample size is small (n < 30),
the arithmetic means of small samples are not normally distributed; in
this scenario Student's t distribution must be used.

The t distribution

This is used when samples are small and it is necessary to make an estimate of σ,
the population standard deviation, based on s, the sample standard deviation.

Characteristics of t Distribution

• It is an exact distribution which is unimodal and symmetrical about 0.
• It is flatter than the normal distribution, i.e. the areas near the tails are greater
than for the normal distribution.
• As the sample size becomes larger, the t distribution approaches the
normal distribution.

The sample standard deviation s used in computing the t statistic is as follows:

s = √( ∑(x – x̄)² / (n – 1) )

To develop interval estimates for the two-population small-sample case, two
assumptions must be made about the populations and the samples selected from
them:

Both populations have normal probability distributions.

The variances of the populations are equal (σ₁² = σ₂² = σ²).

Whenever the sample sizes are equal, the procedure here provides acceptable
results even if the population variances are not equal.
Thus a researcher with control over the sample sizes should consider equal sample
sizes, where n1 = n2.

Sampling Distribution of x̄1 – x̄2

Given the assumptions above, the sampling distribution of x̄1 – x̄2 is normally
distributed regardless of sample size, and its expected value is μ1 – μ2. Because of
the equal-variance assumption, the standard error of x̄1 – x̄2 is, as indicated below:

σ(x̄1 – x̄2) = √(σ²/n1 + σ²/n2) = √(σ² (1/n1 + 1/n2))
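As a quick numeric check, the sketch below (with illustrative values for σ, n1 and n2, which are not given in the text) confirms that the two algebraic forms of the standard error above agree:

from math import sqrt

sigma, n1, n2 = 4.0, 30, 40
se = sqrt(sigma**2 / n1 + sigma**2 / n2)    # first form
se_alt = sigma * sqrt(1 / n1 + 1 / n2)      # factored form
print(round(se, 4), round(se_alt, 4))       # both print 0.9661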

Fig. 1: Sampling distribution of x̄1 – x̄2 when the populations have normal
distributions with equal variance.

Summary:

Statistical inference is the process of drawing conclusions about a population from
samples.
Estimation is concerned with estimating population parameters from sample statistics.
Where the sample size is large, the sampling distribution of means is a normal
distribution.
Finding population proportions from sample information follows the usual estimation
process, provided np and nq are over 5 so that the normal approximation can be used.
Where n < 30 the sample is small, and Student's t distribution must be used instead of
the normal distribution.
HYPOTHESIS TESTING

This is yet another type of statistical inference. Hypothesis testing requires decision
makers to formulate a position or make a claim regarding the decision environment they
are dealing with.

A sample is then selected and, based on its contents, the decision maker either affirms
that this position is correct or concludes that it is wrong.

In dealing with hypothesis testing one can learn how the predetermined positions or
claims are formulated and how data is used to substantiate or refute the position.

Two types of errors are going to be discussed, and decision-making
rules will be established in light of the chances of making each type of error.

HYPOTHESIS TESTING DEFINED

Lucey (2002) defines hypothesis testing as significance testing, similar to the process
of estimation. Random sampling is involved, and the properties of the distributions of
sample means and proportions are used.
A hypothesis is a belief or opinion, and hypothesis testing is the process by which the
belief is tested by statistical means. For example, from a large batch of components a
random sample may be taken to test the belief that the mean diameter of the population
of components is 50mm. Based on the results from the sample, the hypothesis would be
either accepted or rejected. The hypothesis to be tested is the NULL hypothesis, denoted Ho.

REASONS FOR TESTING HYPOTHESIS:

Statistical hypothesis testing gives managers a structured analytical method for making
decisions.

Hypothesis testing helps decision makers make decisions in such a way that the
chances of decision errors can be controlled, or at least identified.

It also provides techniques that help managers identify and control the level of
uncertainty, since hypothesis testing does not eliminate the uncertainty
in the managerial environment.

There are quite a number of questions like: what happens to a certain behavior if...? What
causes people to...? Can a person's behavior be influenced? Possible explanations for the
behavior can be studied based on previously gathered facts and theories, expressed as
predictions. A scientific hypothesis is not always true, but it is stated in such a way that it
can be proved false if it is indeed false.

For example, Victo and Viane noticed that there was a lot of violence on television and in
movies, and wondered whether television and film actors act as models for children. From this
question the hypothesis that was generated was: children who view a film depicting
aggressive adult models will exhibit more aggressive acts than children who see a film
depicting passive models. This hypothesis makes a prediction that should be easy to
verify.

In experiments, there are two mutually exclusive hypotheses, meaning that if one is true,
the other can't be.
The research hypothesis is the one that the researcher wishes to support; the null
hypothesis, on the other hand, is the one the researcher wants to reject, because it
proposes that there will be no change in behavior and no difference between the groups
being measured.
It is this null hypothesis that is tested in a research study. If the null hypothesis is
shown to be false, then the research hypothesis is supported.
Whether the null hypothesis is supported or refuted depends on the decision problem,
and the conclusions reached will depend on the hypothesis being tested.

Results of Hypothesis Testing

Lucey (2002) puts down four possible results when a hypothesis test is carried out, and
these are:
1. We accept a true hypothesis – a correct decision.
2. We reject a false hypothesis – a correct decision.
3. We reject a true hypothesis – an incorrect decision, known as a Type 1 error.
4. We accept a false hypothesis – an incorrect decision, known as a Type 2 error.

Groebner (1981) puts down the possible actions and possible states of nature associated
with any hypothesis-testing problem.
These give three possible outcomes:
(a) no error
(b) Type 1 error
(c) Type 2 error
Only one of the three outcomes will occur for each test of a null hypothesis.
Though everyone would wish to eliminate all chances of error, the decision maker may
make either a Type 1 or a Type 2 statistical error, depending upon which decision is
selected.

Consider the figure below. If the null hypothesis is true and an error is made, it must be
a Type 1 error; on the other hand, if the null hypothesis is false and an error is made, it
must be a Type 2 error.

            Ho True         Ho False
Reject Ho   Type 1 error    No error
Accept Ho   No error        Type 2 error

HYPOTHESIS-TESTING OUTCOME POSSIBILITIES


Establishing the Decision Rule

The objective of a hypothesis test is to use sample information to decide whether to accept
or reject the null hypothesis about the population value. How do decision makers
determine whether the sample information supports or rejects the null hypothesis?
The answer is to compare the sample results with a predetermined decision rule.

Decision Rule:
Based on the sample:

If x̄ >= A, conclude that the null hypothesis is correct and accept Ho.
If x̄ < A, conclude that the null hypothesis is false and reject Ho.

Where: x̄ = sample mean
A = critical value.

For example, Wabwire has been hired as the head of production for a crepes bottling
company. Some soft drink bottlers have been under pressure from consumer groups,
which claim that bottlers have been increasing the price of crepes while filling the bottles
with less than what has been advertised. Although Wabwire feels no manufacturer would
purposely short-fill the bottles, he knows that filling machines sometimes fail to operate
properly and fill the bottles less than full. Wabwire is responsible for making sure the
filling machines at the company operate correctly; he samples every hour and, based on
the sample results, decides whether to adjust the machines. Since he is not interested in
whether the bottles are filled with too much soft drink, he can identify two possible states
of nature for 35-ounce bottles.

State 1: the bottles are filled with 35 or more ounces of soft drink on the average.

State 2: the bottles are filled with less than 35 ounces on the average.

In the above scenario, if the null hypothesis is rejected, Wabwire will halt production and
have a maintenance crew adjust the filling machine to increase the average fill. On the
other hand, if x̄ is greater than or equal to A, he will accept the null hypothesis and
conclude that the filling machines are working properly. Making the decision therefore
involves determining the critical value, A.
Selecting the critical value:

Fig. 2: Sampling distribution of x̄ for the crepes bottling company, showing the possible
x̄ values centered at μx = 35.

Here the distribution of possible sample means will be approximately normal, with its
center at the population mean.
The null hypothesis in the crepes bottling company example is μx >= 35, but even if it is
true, we may get a sample mean less than 35 (sampling error).
In selecting a critical value, a hypothesis test requires the decision maker to answer the
question: which values of x̄ will tend to reject the null hypothesis? Values much
smaller than μx, values much larger than μx, or values both much smaller
and much larger than μx?

Fig. 3: Critical values for the crepes bottling company.


In the above figure, if the null hypothesis is actually true (μx >= 35), the area under
the normal curve to the left of A represents the maximum probability of rejecting a true
null hypothesis, which is a Type 1 error. This probability is called ALPHA (α).

The chance of committing a Type 1 error can be reduced by moving the critical value A
farther to the left of μx = 35, as above.

In order to determine the appropriate value for A, decision makers must determine how
large an α they want. They must select the value of α in light of the costs
involved in committing a Type 1 error.
For example, if Wabwire rejects the null hypothesis when it is true, he will shut down
production and incur the costs of machine adjustments. This can even affect future
production, so these costs should be calculated, and the probability of incurring them
determined, as part of the management decision.

These steps must be followed in order to test any null hypothesis (a code sketch follows
the list):

1. Determine the null hypothesis and alternative hypothesis.
2. Determine the desired alpha level.
3. Choose the sample size.
4. Determine the critical value A.
5. Establish the decision rule.
6. Select the sample and perform the test.
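A minimal Python sketch of these six steps applied to the bottle-filling example; the population standard deviation, sample size, and observed sample mean are assumed for illustration and are not given in the text:

from math import sqrt
from scipy.stats import norm

# 1. Hypotheses: Ho: mu >= 35 ounces; H1: mu < 35 ounces
mu0 = 35.0
# 2. Desired alpha level
alpha = 0.05
# 3. Sample size and (assumed) population standard deviation
n, sigma = 64, 1.0
# 4. Critical value A for this lower-tailed test; norm.ppf(0.05) is about -1.645
A = mu0 + norm.ppf(alpha) * sigma / sqrt(n)
# 5. Decision rule: reject Ho if the sample mean falls below A
# 6. Select the sample and perform the test (x_bar is an assumed sample result)
x_bar = 34.7
decision = "reject Ho (adjust machine)" if x_bar < A else "accept Ho"
print(f"A = {A:.3f}, x_bar = {x_bar}, decision: {decision}")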
Booklet II
ADVANCED TIME SERIES ANALYSIS

Time series analysis applies statistical analysis to past data arranged in a time series, for example:
– sales by month for the last ten years
– annual production of agricultural products over twelve months

Factors considered in time series analysis

a) Are the past data representative? For example, do they contain the results of a
recession, a boom, or a shift in taste?
b) Time series methods are appropriate where short-term forecasts are required.
c) Time series methods are best limited to stable situations. Where
fluctuations are common and conditions are expected to change, they may give poor results.

Importance of time series analysis

- Helps us understand past behavior, which gives a prediction for the future
that is statistically important for business planning.
- Gives information which acts as a base for comparing the values of different
phenomena at different times.
- Helps in the evaluation of current achievements.
- Helps in interpreting and evaluating changes in economic phenomena in the
hope of more correctly anticipating the course of future events.

Components of time series

1. Long term trend


2. Seasonal
3. Cyclical
4. Irregular or random

These are explained briefly;

Long-term trend component

The trend component is the long-term increase or decrease in the variable being measured
over time.
Today, organizations are faced with increased planning problems caused by changing
technology, government regulations and uncertain foreign competition. A combination of
these forces has left most organizations increasing the time span of their planning cycle.
And because long-term forecasting is increasing, the trend component in time
series analysis is important to all organizations.
Seasonal component

Some organizations or industries are affected by seasonal variations and not only long-term
trends.
The seasonal component represents changes that recur at the same time every year.
Organizations affected by seasonal variation need to identify and measure the seasonality
to help with planning for temporary increases or decreases in labor requirements,
inventory, training, periodic maintenance and many others.
Organizations need to know if the seasonal variations they experience occur at more or less
than the average rate.

Cyclical component
Cyclical effects in time series are represented by wave-like fluctuations around the long-term
trend. These fluctuations are caused by factors such as interest rates, supply, consumer
demand, inventory levels, national and international market conditions and government
policies.
The cyclical fluctuations repeat themselves but occur with differing frequencies and
intensities. So even though one knows what happened to the firm during the last cycle, he
or she has no guarantee the effect will be the same the next time.

Irregular or Random Component

These are fluctuations not attributed to any of the three previous components.

They are caused by factors such as weather, political events, and human and
non-human actions. Minor irregularities are not significant to the organization's long-term
operations, but major irregularities like wars and droughts are significant to the operations
of the organization.

TIME SERIES ANALYSIS

Two of the methods commonly considered are the moving average and exponential smoothing.

Moving Average Method

If the forecast for next month's sales, say for December, were simply the actual sales for
November, then the forecasts obtained would fluctuate with every random variation. If
forecasts are based on the sales of several preceding months, random fluctuations tend to
cancel each other out. This is the principle of the moving average method.
Illustration:

Past Sales

Month      Actual sales   3-monthly        6-monthly        12-monthly
           (units)        moving average   moving average   moving average
January    500
February   650
March      420
April      600            523.33
May        350            556.66
June       840            456.66
July       920            596.66           560
August     950            703.33           630
September  800            903.33           680
October    750            890              743.33
November   850            833.33           768.33
December   930            800              851.66
January    990            843.33           866.66

Any month's forecast is the average of the preceding n months' actual sales. For example,
the 3-month moving average forecasts were prepared as follows:

April's forecast = (January sales + February sales + March sales) / 3
                 = (500 + 650 + 420) / 3
                 = 1570 / 3
                 = 523.33

May's forecast = (February sales + March sales + April sales) / 3
               = (650 + 420 + 600) / 3
               = 1670 / 3
               = 556.66

And the six-month moving average forecasts were prepared as follows:

July's forecast = (Jan + Feb + Mar + April + May + June sales) / 6
                = (500 + 650 + 420 + 600 + 350 + 840) / 6
                = 3360 / 6
                = 560
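A small Python helper reproduces the 3-month moving-average forecasts above (Python rounds where the text truncates, so 556.66 appears as 556.67, and so on):

sales = [500, 650, 420, 600, 350, 840, 920, 950, 800, 750, 850, 930]

def moving_average_forecasts(data, periods):
    # Forecast for each month = average of the preceding `periods` months
    return [round(sum(data[i - periods:i]) / periods, 2)
            for i in range(periods, len(data) + 1)]

print(moving_average_forecasts(sales, 3))   # first value is April's forecast
# [523.33, 556.67, 456.67, 596.67, 703.33, 903.33, 890.0, 833.33, 800.0, 843.33]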

Characteristics of moving average

- Different moving averages produce different forecasts.

- The greater the number of periods in the moving average, the greater the
smoothing effect.
- If the underlying trend of the past data is thought to be fairly constant with
substantial randomness, a greater number of periods should be chosen.
- If there is a change in the underlying state of the data, more responsiveness is
needed; therefore fewer periods should be included in the moving average.

Limitations of moving average

- Equal weighting is given to each of the values used in the moving average
calculation, whereas it is reasonable to suppose that the most recent data is
more relevant to current conditions.
- The moving average calculation takes no account of data outside the period of the
average, so full use is not made of all the data available.
- The use of unadjusted moving averages as forecasts can cause misleading results
when there is an underlying seasonal variation.
- An n-period moving average requires the storage of n – 1 values, to which is
added the latest observation.
Exponential Smoothing Method

Lucey (2002) asserts that this is a frequently used forecasting technique which largely
overcomes the limitations of the moving average method. It involves the automatic weighting
of past data with weights that decrease exponentially with time, meaning that the most
current values receive the greatest weighting and the older observations receive a
decreasing weighting. The exponential smoothing method is a weighted moving average
system based on the principle that:

Forecast = old forecast + a proportion of the forecast error.

The simplest formula:

New forecast = old forecast + α (latest observation – old forecast)

Where: α (alpha) is the smoothing constant. (A code sketch of this recursion appears
after the illustration below.)

Illustration : Exponential smoothing


The table below shows forecasts prepared using α values of 0.1 and 0.5.

Actual Exponential Forecasts

Month      Sales (units)   α = 0.1    α = 0.5
January    450
February   440             450        450
March      460             449        445
April      410             450.10     452.50
May        380             445.69     431.25
June       400             439.12     405.63
July       370             435.21     402.82
August     360             428.69     386.41
September  410             421.82     373.21
October    470             420.64     391.61
November   490             423.57     420.81
December   460             428.21     445.41
January                    434.39     467.71

Note:
1. Because no previous forecast was available, January's sales were used as
February's forecast.
2. Formula used when α = 0.1:
March forecast = Feb forecast + 0.1 (Feb sales – Feb forecast)
= 450 + 0.1 (440 – 450)
= 449

3. In practice the forecasts are rounded to the nearest unit.

4. The higher α value, 0.5, produces a forecast which adjusts more readily to
the most recent sales.
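A Python sketch of the smoothing recursion, seeded with January's sales as February's forecast (note 1 above); the first few values match the α = 0.1 column of the table:

sales = [450, 440, 460, 410, 380, 400, 370, 360, 410, 470, 490, 460]

def exponential_smoothing(data, alpha):
    forecasts = [data[0]]                  # first forecast = first actual value
    for actual in data[1:]:
        prev = forecasts[-1]
        forecasts.append(prev + alpha * (actual - prev))
    return forecasts

print([round(f, 2) for f in exponential_smoothing(sales, 0.1)[:3]])
# [450, 449.0, 450.1] -> the February, March and April forecasts in the table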

The smoothing constant:


It is noted that the value of α can be between 0 and 1: the higher the value of α (i.e.
nearer to 1), the more sensitive the forecast becomes to current conditions, and the lower the
value, the more stable the forecast, reacting less sensitively to current conditions.
The value is chosen by the analyst by carrying out experiments with different values to
see which value gives the most realistic forecast.

Characteristics of Exponential Smoothing:

a) Greater weight is given to more recent data.

b) All past data are incorporated; there is no cut-off point as with moving averages.
c) Like the moving average, it is an adaptive forecasting system. That is, it adapts
continually as new data become available, and it is frequently incorporated as an
integral part of stock control and production control systems.
d) To cope with various problems (trend, seasonal factors) the basic model needs to be
modified.
e) Less data needs to be stored than with longer-period moving averages.
f) Whatever form of exponential smoothing is adopted, changes to the model to suit
changing conditions can simply be made by changing the α value.

Summary

- Time series analysis is based on data arranged in regular time periods, e.g. sales per month.
- The moving average system can be based on any number of periods, say 3, 6, or 12
months.
- The key factor in exponential smoothing is the choice of the smoothing constant α. The
higher the value, the more responsive the system is to current conditions.
PROBABILITY:

Definition

Saleemi (1997: 253) asserts that probability is the ratio of the number of favorable cases
to the total number of equally likely cases.
That is, if there are several equally likely events that may happen, the probability that
any one of these events will happen is the ratio of the number of cases favorable to its
happening to the total number of possible cases.
Lucey (2002: 8) asserts that probability can be considered as the quantification of
uncertainty, whereby uncertainty is expressed as 'likelihood', 'chance' or 'risk'.

Probability is denoted by P and takes values ranging from zero to one, where
zero is an impossibility and one is a certainty. For example, P(crossing the ocean unaided) = 0
and P(dying) = 1.

Applicable areas of probability

 Statistical quality control problems
 Personnel turnover problems
 Inventory control problems
 Queuing or waiting line problems
 Equipment replacement and maintenance

Inferential statistics is concerned with taking a sample from a given population,

studying the sample, and using the sample to draw conclusions.
Making inferences about the population based on a sample calls upon
business managers to make decisions in situations of uncertainty.
This subject matter is dealt with by the branch of statistics called probability, which is
defined above.

Approaches of probability

There is confusion about probability because it means different
things to different people. There are four basic methods or approaches to probability,
and these are:
i. Relative frequency of occurrence or empirical approach
ii. Subjective probability assessment or personalistic approach
iii. Classical probability assessment
iv. Axiomatic approach
Relative frequency of occurrence

This approach is based on actual observation and borrows the concept of relative
frequency, which implies that the probability of an event is given by the
frequency of that event relative to the total number of trials.

For example, to assess the probability that ten or fewer customers arrive before 8.00 am,
the probability assessment would be the ratio of days when ten or
fewer customers arrived to the total number of days observed.

Formula: RF(E1) = (number of times E1 occurs) / n

Where RF(E1) = relative frequency of E1 occurring

n = number of trials

Advantages of Relative Frequency Approach

 It is a valuable tool for decision making because it can be used to quantitatively
represent experience.
 It is more useful for practical problems.

Disadvantage of the Relative Frequency Approach

 If n is small in business situations, because of money and time constraints, the
estimated probability may be quite different from the true probability.

Subjective / Personalistic Probability Assessment

This approach is defined as a measure of the degree of confidence or belief that a
particular individual has in the occurrence of an event E (Saleemi: 257).
At times managers may not be able to use a relative frequency of occurrence as a starting
point for assessing the desired probability. In the absence of past experience, decision makers
must make a subjective probability assessment, which is a measure of personal
conviction that an outcome will occur. This subjective probability rests in a person's
mind and not in the physical event.

Classical probability / prior probability assessment:

This approach to probability is not as directly applicable to business decision making as the
subjective and relative frequency methods. If there are A possible outcomes favorable to the
occurrence of an event E and B possible outcomes unfavorable to the occurrence of E,
and all these possible outcomes are equally likely and mutually exclusive, then the
probability that the event E will occur is denoted by:

P(E) = A / (A + B)
Limitation of classical approach:

 The definition is not applicable when the assumption of equally likely outcomes does not
hold.
 The definition becomes vague when the number of possible outcomes is
infinite.
 It may be difficult to determine the values of the numerator and denominator.

Axiomatic approach:

Saleemi (1997: 256) asserts that, according to this approach, probability is defined as follows.

Let there be a sample space S defined on a random experiment and consisting of n simple
events E1, E2, …, En.
Then the function P defined on S will be called a probability function if it associates a real
number, denoted by P(E) and called the probability of E, with every event E defined on S,
and satisfies the following axioms:
i. Axiom of positiveness:
this means 0 ≤ P(E) ≤ 1 for every E in S.
ii. Axiom of certainty:
this means P(S) = 1.
iii. Axiom of additivity: this means P(E1 + E2) = P(E1) + P(E2) for any two mutually
exclusive events in S.

Having analyzed the approaches to probability assessment: regardless of how decision
makers arrive at a probability assessment, there are basic rules by which these probabilities
are used to assist the decision-making process.

Before these rules are considered, mutually exclusive events and independent events must
be understood by the reader.

Mutually exclusive events:


These are events which cannot happen at the same time; if one happens the other cannot
happen or occur. For example, female and male are mutually exclusive, as are fail and pass.

Independent event:
Two or more events are independent if the occurrence or non-occurrence of any one
event does not affect the occurrence or non-occurrence of the others. E.g. the outcome of
any throw of a die is independent of the outcome of any preceding or succeeding
throw.

Basic probability rules


These include the following:
 Zero to one probability rule
 Complementary rule of probability
 Conditional rule of probability
 Multiplication rule of probability
 Addition rule of probability

Zero to one rule of probability


This rule states that the probability of any event A occurring is a number lying
between 0 and 1, expressed as 0 ≤ P(A) ≤ 1.
That is to say, if the probability of an event A takes the least value of 0, expressed
P(A) = 0, the event A is an impossibility.

If the probability of an event takes the highest value of 1, expressed P(A) = 1, it means
that the event is certain.

Complementary rule of probability

This rule concerns the non-occurrence of an event. The complement of an event E,
represented by Ē, is the collection of all possible outcomes that are not part of
event E. Expressed as P(Ē) = 1 – P(E).

Conditional rule of probability:

Lucey (2002) asserts that this probability rule is associated with combinations of

events, given that some prior result has already been achieved with one of them.
When the events are independent of one another, the conditional probability is
the same as the unconditional probability of the remaining event.

Multiplication rule of probability.


This rule states that the probability of the combined occurrence of two events A and B
is the probability of A multiplied by the conditional probability of B given that A
has happened.

Expressed as follows:

P(A∩B) = P(A|B) × P(B)
P(B∩A) = P(B|A) × P(A)

This rule is used when there is a string of independent events for which each
individual probability is known and it is required to know the overall probability.

Addition Rules of probability

This rule is concerned with calculating the probability of two or more mutually
exclusive events, such that the probabilities of the separate events are added.
For example, the probability of throwing a 3 or a 6 with a die would be expressed:
P(throwing a 3) = 1/6 and P(throwing a 6) = 1/6
P(throwing a 3 or a 6) = 1/6 + 1/6 = 1/3
Other Probability Rules:

- Conditional probability for independent events E1 and E2:

This is expressed as P(E1|E2) = P(E1)
and P(E2|E1) = P(E2).
This rule states that two events are independent if one event occurring has no
bearing on whether the second event occurs; thus the two events are independent.
- Multiplication rule for independent events:

When events are independent, no conditional probability is needed (in contrast to,
say, card draws without replacement, where the result on the second draw depends
on the card selected on the first draw).
Expressed as:
P(E1 and E2) = P(E1) P(E2)

Bayes' Rule:

This rule or process involves working backwards from effect to cause.

It is used in the analysis of decision trees, where information is given in the
form of conditional probabilities and the reverse of these probabilities must be
found.
Formula:

P(A|B) = P(A) × P(B|A) / P(B)
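A worked illustration of Bayes' rule in Python, with made-up numbers: suppose 1% of items are defective (event A), and a test flags 95% of defective items but also 10% of good ones (event B is 'the test flags the item'):

p_a = 0.01                      # P(A): prior probability that an item is defective
p_b_given_a = 0.95              # P(B|A): test flags a defective item
p_b_given_not_a = 0.10          # P(B|not A): test flags a good item by mistake

# Total probability of a flag: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(A) x P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")       # about 0.088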

Summary:

Probability provides decision makers with a quantitative measure of the chance that an

environmental outcome will occur.
Probability helps or allows decision makers to quantify uncertainty.
The various approaches to probability have been dealt with, along with the basic rules that
govern probability operations.
Bayes' rule finds a conditional probability P(A|B) given its reverse P(B|A); the
general formula of Bayes' rule is:

P(A|B) = P(A) × P(B|A) / P(B)
Booklet III

Discrete Probability Distribution

• Discrete Random Variable


• Discrete probability Distribution
• Mean and Standard deviation of discrete probability distribution
• Mean of Discrete probability distribution
• Standard deviation of discrete probability
• Characteristics of binomial probability distribution
• Develop a binomial probability distribution
• Comments on binomial distribution
• Poisson probability distribution
• Characteristics of Poisson distribution
• Poisson distribution form
• Application of Poisson probability distribution
• Variance and Standard deviation of Poisson distribution
TOPIC: PROBABILITY DISTRIBUTION

DISCRETE RANDOM VARIABLE

In this topic, consideration is given to identifying the processes that are represented by
discrete distributions in general and by the binomial and Poisson distributions in particular.
The probabilities associated with particular outcomes in a discrete distribution are
considered, as well as how to determine the mean and standard deviation for general
discrete distributions and for the binomial and Poisson distributions.

These are the major concepts used in probability distributions and one needs to fully
understand them when dealing with this topic.
• Random Variable
• Probability function
• Expected value and variance

Random Variable
Saleemi (1997) defines a random variable as one that takes specified values with specified
probabilities, the probabilities being determined by the way in which the random experiment is
conducted and the way in which the variable is defined and observed on the random
experiment.
Capital letters are used to denote random variables, and corresponding small letters
represent any specified value of the random variable.
Groebner (1981: 123) asserts that a random variable is a variable whose numerical value
is determined by the outcome of a random experiment or trial.

Classes of Random Variables

• Discrete Random variable


• Continuous random variable

A discrete random variable is a random variable that assumes only distinct values. For
example, if a manager examines 10 accounts, the number of inaccurate balances can be
represented by the variable X; then X is a random variable with values {0, 1, 2, 3, …, 10}.

Continuous random variables are ones which can assume any value on a continuum. For
example, time is continuous.
It is noted that a discrete probability distribution is an extension of a relative frequency
distribution. For example, DELL Computers Limited each week offers specials on up to 5
specific computers as part of sales. For a period of 40 weeks, the sales manager recorded
how many of the 5 computers were sold each week, as shown in the table.

Computers Sold

Computers Sold   Frequency   Relative Frequency
0                10          10/40 = 0.25
1                8           8/40 = 0.2
2                12          12/40 = 0.3
3                3           3/40 = 0.075
4                4           4/40 = 0.1
5                3           3/40 = 0.075
                 ∑ = 40      ∑ = 1.00

In this aspect, the probability of an outcome (a value of the random variable) occurring can
be assessed by the relative frequency of that outcome.
The probabilities in the distribution must add to one (1), and the distribution can be shown
in graphical form as a plot of P(x) against x (the number of computers sold).

THE MEAN AND STANDARD DEVIATION OF A DISCRETE PROBABILITY DISTRIBUTION

Decision makers need to calculate a distribution's mean and standard deviation;
these values measure the central location and the spread respectively.

Mean of a discrete probability Distribution

This is also called the expected value of the discrete random variable. And the expected
value is the weighted average of the random variable values where the weights are the
probabilities assigned to the values.

Formula:
E(x) = ∑ x P(x)
Where E(x) = expected value of x
x = value of the random variable
P(x) = probability of each value of x

Standard deviation of a Discrete Probability Distribution

This measures the spread, or dispersion, in the values of a random variable. The
calculation of the standard deviation for a discrete probability distribution is as follows
(a code sketch applying both formulas to the table above follows):

Formula: σx = √( ∑ [x – E(x)]² P(x) )

Where x = value of the random variable

E(x) = expected value of x
P(x) = probability of each value of x
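Applying both formulas to the computer-sales distribution tabulated earlier gives a quick check in Python:

from math import sqrt

dist = {0: 0.25, 1: 0.20, 2: 0.30, 3: 0.075, 4: 0.10, 5: 0.075}

expected = sum(x * p for x, p in dist.items())                   # E(x) = sum of x P(x)
variance = sum((x - expected) ** 2 * p for x, p in dist.items())
print(f"E(x) = {expected:.3f}, standard deviation = {sqrt(variance):.3f}")
# E(x) = 1.800, standard deviation = 1.520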

Probability Distribution Commonly Used

1. Binomial Distribution
2. Poisson distribution
3. Normal Distribution

BINOMIAL DISTRIBUTION

This is a theoretical discrete distribution that has extensive application in business

decision making. The probability distribution is well defined, and the
probabilities associated with values of the random variable can be computed from a well-
established equation.
The process describes only two possible outcomes. For example, a quality control system
in a manufacturing firm labels each tested item either defective or acceptable; a marketing
research firm may receive responses to a questionnaire of either yes (will buy) or no (will
not buy).
Conditions / Assumptions of the Binomial Distribution

Groebner (1981) puts down the conditions necessary for the binomial distribution as follows:
1. The process has only two possible outcomes: success and failure.
2. There are n identical trials or experiments.
3. The trials or experiments are independent of each other; that is, the probability of an
outcome for any particular trial is not influenced by the outcomes of the other
trials. In a production process this means that if one item is defective, this fact
does not influence the chance of another being found defective.
4. The process must be stationary: the probability of success, p, remains constant from
trial to trial.
5. If p represents the probability of a success, then (1 – p) = q is the probability of a
failure.
Characteristics of Binomial Distribution
Saleemi (1997: 281) puts down the following characteristics of the binomial distribution:

1. The binomial distribution is a discrete probability distribution in which the random
variable x assumes the values 0, 1, 2, …, n, where n is finite. That is why it gives the
theoretical probabilities and frequencies of a discrete variable.
2. The distribution is symmetrical if p and q are equal and asymmetrical if p
and q are unequal.
3. The distribution depends on the parameters p (or q) and n (a positive integer).
4. The distribution can be presented graphically, taking the horizontal axis to represent
the number of successes and the vertical axis to represent probabilities or frequencies.

5. Since 0 < q < 1, variance = np × q < np = mean. Thus for the binomial distribution,
the variance is less than the mean.
6. The mode of the binomial distribution is the value of the variable which occurs with
the largest probability; the distribution may have either one or two modes.
7. If two independent random variables x and y follow binomial distributions with
parameters (n1, p) and (n2, p) respectively, their sum (x + y) will follow a binomial
distribution with parameters (n1 + n2, p).
8. When p is small (say 0.1) the binomial distribution is skewed to the right. As p
increases to (0.3) the skewness is less noticeable. When p = 0.5, the binomial
distribution is symmetrical, and when p is larger than 0.5, the distribution is
skewed to the left.
9. The basic statistical measures of the binomial distribution are given below:
n = number of independent events in a trial
p = probability of success
q = probability of failure

Fitting a Binomial Distribution

In order to fit a binomial distribution to observed data, the following procedure should be
adopted (a code sketch follows).

Determine the values of p and q. If one of these values is known, the other can be found
from the simple relationships:

p = (1 – q) and q = (1 – p)

When p and q are equal, the distribution is symmetrical, and p and q may be
interchanged without changing the value of any term.
If p and q are not equal, the distribution is skewed. If p is less than 0.5, the distribution is
positively skewed, and when p is more than 0.5 the distribution is negatively skewed.

Expand the binomial (q + p)^n. The power n is equal to one less than the number of terms in
the expanded binomial. Thus when n = 2, there will be three terms in the binomial, and when
n = 4, there will be five terms.

Multiply each term of the expanded binomial by N (the total frequency) in order to obtain
the expected frequency in each category.
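A minimal Python sketch of this fitting procedure, expanding (q + p)^n term by term and multiplying by N; the values of n, p and N are illustrative only:

from math import comb

n, p, N = 4, 0.5, 80
q = 1 - p

# With n = 4 there are five terms, as noted above
for x in range(n + 1):
    prob = comb(n, x) * p**x * q**(n - x)    # binomial probability of x successes
    print(f"x = {x}: P = {prob:.4f}, expected frequency = {N * prob:.1f}")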
Comments about the Binomial Distribution

The binomial distribution has many applications, such as in quality control and marketing
research; in decision making under uncertainty, the binomial distribution
can also be applied in other functional areas of the business.

If p, the probability of success, is 0.5, the binomial distribution is symmetrical regardless

of the sample size.

When the value of p differs from 0.5 in either direction, the binomial distribution is
skewed.

THE POISSON PROBABILITY DISTRIBUTION

When using the binomial distribution, one is able to count both the number of successes and
the number of failures. But in other applications, the number of successes may be countable
while the number of failures is difficult to count.
If the total number of possible outcomes cannot be determined, then the binomial
distribution cannot be applied as a decision-making aid.

This calls for the Poisson distribution, which can be applied in situations
where the total number of possible outcomes is unknown. In order to apply the Poisson
distribution, one needs to know only the average number of successes for a given segment.

Conditions / Assumptions of Poisson distribution

These are the conditions that must be satisfied before applying the Poisson distribution.

1. The variable is discrete.

2. Each event can only be either a success or a failure.
3. The number of trials n is finite and large.
4. p (the probability of occurrence of an event) is so small that q is almost equal to
unity.
5. The probability of an occurrence of the event is the same for any two intervals of
equal length.
6. Statistical independence is assumed, i.e. the occurrence or non-occurrence of the
event in any interval is independent of its occurrence or non-occurrence in any
other interval.

Characteristics of Poisson distribution

A physical situation must have certain characteristics before it can be described by the
Poisson distribution.

1. The physical event must be considered to be rare.

2. The physical events must be random and independent of each other. That is, an
event occurring must not be predictable, nor can it influence the chances of
another event occurring.
3. The Poisson distribution is a discrete probability distribution where the random
variable assumes a countably infinite number of values 0, 1, 2, …, n.
4. The Poisson distribution can be viewed as a limiting form of the binomial distribution when:
a. n, the number of trials, is indefinitely large, i.e. n → ∞
b. p, the constant probability of success for each trial, is indefinitely small, i.e.
p → 0
c. np = m is finite.
In practice the Poisson distribution can be used when n is equal to or more than 20 (n ≥ 20)
and p is equal to or less than 0.10 (p ≤ 0.10).

5. The main parameter of the Poisson distribution is the mean (m = np).
If the value of m is known, the values of the other constants can be ascertained as
follows, with x ~ P(m).
6. Values of the constants, where m = mean number of successes:

a. Mean = m; variance (σ²) = m
b. Standard deviation (σ) = √m
c. μ3 = m; μ4 = m + 3m²
d. β1 = μ3² / μ2³ = 1/m; β2 = μ4 / μ2² = 3 + 1/m
e. Moment measure of skewness:
γ1 = 1 / √m

Moment measure of kurtosis:
γ2 = 1 / m

7. The Poisson distribution has either one or two modes (like the binomial distribution).
When m is not an integer, the mode is the largest integer contained in m. When m is an
integer, there are two modes, namely m and m – 1.
8. The distribution is positively skewed (the longer tail is to the right). With an increase
in the value of the mean m, the distribution shifts to the right and the skewness diminishes.
The Poisson distribution differs from the binomial distribution in two ways:
a. The Poisson distribution operates continuously over a given period of time,
distance, or area.
b. The Poisson distribution produces successes which occur at random points in
the specified time, distance, or area; these successes are called
occurrences.

The formula of the Poisson distribution is given by:

f(x) = P(X = x) = e^(-m) m^x / x! ;  x = 0, 1, 2, …
Where:
m = the parameter of the distribution, the average number of
occurrences of the random event
x = the number of occurrences of the random event
e = the constant whose value is 2.7183
The distribution satisfies the properties f(x) > 0 and ∑ f(x) = 1.

Importance of Poisson distribution

1. Number of customers arriving at service facility in unit time for instance per hour
2. Number of telephone calls arriving at telephone switchboard per unit time for
instance per minute
3. Number of defects along a tape
4. Dimensional errors in engineering drawings
5. Number of radioactive particles decaying in a given interval of time
6. Number of printing mistakes per page in a book
7. Number of accidents on a particular road per day
8. Hospital emergencies per day
9. Number of defective materials of products, say pins
10. Number of goals in a football match

Example

A variable x follows a Poisson distribution with mean 6. Calculate:

i) P(x = 0) and
ii) P(x > 2)
Given that e^(-6) = 0.00248

Solution

i) P(x = 0) = P(0) = e^(-6) = 0.00248

ii) P(x > 2) = 1 – P(x = 0) – P(x = 1) – P(x = 2)
= 1 – P(0) – P(1) – P(2)
= 1 – 0.00248 – 6(0.00248) – 18(0.00248)
= 1 – 0.062 = 0.938
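The same example can be verified in Python directly from the Poisson formula:

from math import exp, factorial

m = 6
def poisson(x):
    # Poisson probability P(X = x) = e^(-m) m^x / x!
    return exp(-m) * m**x / factorial(x)

print(f"P(x = 0) = {poisson(0):.5f}")                                 # 0.00248
print(f"P(x > 2) = {1 - poisson(0) - poisson(1) - poisson(2):.3f}")   # 0.938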

Fitting a Poisson distribution

Here one needs to obtain the value of m and calculate the probability of zero
occurrences. The other probabilities are then calculated by the recurrence relation:

f(x) = e^(-m) m^x / x!

f(x+1) = e^(-m) m^(x+1) / (x+1)!

f(x+1) / f(x) = m / (x+1), or f(x+1) = [m / (x+1)] f(x)

When x = 0, f(1) = m f(0)

When x = 1, f(2) = (m/2) f(1) = (m²/2) f(0)
When x = 2, f(3) = (m/3) f(2) = (m³/6) f(0)

Variance and Standard Deviation of the Poisson distribution

The mean of the Poisson distribution is:

μx = λt

and the mean or expected value can be calculated using:
μx = E(x) = ∑ x P(x)

and the variance using the following:

σx² = ∑ (x – μx)² P(x)

N.B. The variance of the Poisson distribution is always equal to the mean:

σx² = λt

The standard deviation of the Poisson distribution is the square root of the mean:

σx = √(λt)

Where a Poisson distribution applies, the uncertainty can be controlled by controlling the
mean, which must be within the decision maker's control.

Summary:

In this chapter, discrete random variable concepts have been introduced, and it has been
shown how a probability distribution is developed for a discrete random variable. The
computation of the mean and standard deviation for discrete distributions has also been
considered.

Binomial and Poisson distributions represent two of the most commonly used theoretical
distributions. These distributions are used in a number of managerial applications.

Some concepts connected with discrete distributions from a managerial perspective have
been dealt with such as random variable, probability function and expected value and
variance.
NORMAL DISTRIBUTION

This is the most important continuous probability distribution; it is associated with the
names of Laplace and Gauss and is also called the Gaussian distribution.

In this aspect, one finds that whether or not p is equal to q, the binomial distribution
approaches a continuous curve as n becomes large. This correspondence between the binomial
and the normal curve is close even for low values of n, provided p and q are fairly near
equality. The limiting curve obtained as n becomes large is called the normal
frequency curve, or simply the normal curve.

A random variable x having a normal distribution with μ (mean) and σ² (variance)
has the density function:

y = f(x) = [1 / (σ√(2π))] e^(-½ ((x – μ)/σ)²)

Where y = computed height of an ordinate at distance x from the mean

σ = standard deviation of the given normal distribution
π = the constant 3.1416; √(2π) = 2.5066
e = the constant 2.7183
μ = mean of the random variable x. This can be expressed as X ~ N(μ, σ).

The normal curve y = f(x) is a bell-shaped curve, and the top of the bell is directly
above the mean μ. For large values of σ, the curve tends to flatten out,
and for small values of σ it has a sharp peak.

Properties of Normal distribution


Saleemi (1997: 293-294) considers the following to be the characteristics of the normal
distribution:
i) The curve is symmetrical about the line x = μ (z = 0) and is
bell shaped.
ii) The mean, median and mode have the same value, i.e. mean = median = mode.
iii) The height of the normal curve is maximum at the mean value, thus dividing the
curve into two equal parts.
iv) Since the curve is symmetrical, the first and the third quartiles are equidistant from
the median, i.e.
Q3 – Median = Median – Q1

Since there is only one point of maximum frequency (at the mean), the normal distribution
is uni-modal.
The mean deviation is approximately 4/5 of the standard deviation.
The quartiles are given by:
Q1 = μ – 0.6745 σ
Q3 = μ + 0.6745 σ

The points of inflexion (the points at which the curve changes its direction) are each at a
distance of one standard deviation from the mean.
The curve is asymptotic to the base line, i.e. it continues to approach but never touches the
base line. No portion of the curve lies below the base line.
The percentage distribution of the area under the standard normal curve is broadly shown below.

Here we have:

P(μ – 2σ < x < μ + 2σ) = P(–2 < z < 2) = 0.9545

Thus the area under the standard normal probability curve:

Between the ordinates at z = ±1 is 0.6827 or 68.27%

Between the ordinates at z = ±2 is 0.9545 or 95.45%

Between the ordinates at z = ±3 is 0.9973 or 99.73%
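These areas can be reproduced in Python using the error function, since the standard normal CDF is Φ(z) = ½(1 + erf(z/√2)):

from math import erf, sqrt

def phi(z):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (1, 2, 3):
    print(f"P(-{z} < Z < {z}) = {phi(z) - phi(-z):.4f}")
# 0.6827, 0.9545, 0.9973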

If X and Y are independent normal variates with means μ1 and μ2

and variances σ1² and σ2² respectively, then (X + Y) is also a normal variate, with
mean (μ1 + μ2) and variance σ1² + σ2².

Importance of the normal distribution

The normal distribution is of great importance in statistics for these reasons:

• The distributions of many physical characteristics, such as the heights and weights
of people, often have the shape of the normal curve.

• The normal distribution is useful as an approximation to other distributions, such as
the binomial and Poisson distributions, under certain limiting conditions.

• It is used in testing statistical hypotheses and tests of significance, in which the
assumption is that the population from which the samples have been drawn has a
normal distribution.

• It is useful in statistical quality control, where the control limits are set by using this
distribution.

• It is also used in sampling theory, helping to estimate parameters from
statistics and to find confidence limits for parameters.

• Reasonable results can be obtained by approximating many non-normal
distributions with the normal distribution, provided the distributions do not depart
too much from normality.

• By virtue of the Central Limit Theorem, the distribution of the means of samples
taken from any population (which need not be normal) tends towards the normal
distribution if the sample is large.
