Statistics


REGIONAL MARITIME UNIVERSITY

ACCRA GHANA

BSM 301

STATISTICS

LECTURE NOTES

COMPILED

BY

BAMFO ERNEST FREMPONG

REFERENCES

1 FREUND, J. E. (1967). MODERN ELEMENTARY STATISTICS; PRENTICE HALL INC., USA.

2 MILTON, J. S., CORBERT, J. J. AND MCTEER, P. M. (1986). INTRODUCTION TO STATISTICS; D. C. HEATH CO.

INTRODUCTION TO STATISTICS - Concept, Definitions And Relations

SAMPLING - Probability And Non – Probability Sampling Techniques


Selecting Appropriate Techniques For Different Studies
VARIABLES
SOURCES AND TYPES OF DATA
DATA ORGANISATION
Grouping Or Classifying Data
Simple Bar Chart
The Pie Chart
The Frequency Polygon
Stem - And - Leaf Plot
Scatter Diagram
Histogram
MEASURES OF CENTRAL TENDENCY - The Mean, Median And Mode
Percentiles, Quartiles

MEASURES OF DISPERSION - Variance And Standard Deviation

Absolute And Relative Dispersion

SKEWNESS / KURTOSIS

CORRELATION AND REGRESSION


Pearson’s Product Moment Correlation Coefficient
Spearman’s Coefficient Of Rank Correlation
Regression Analysis
PROBABILITY – Introduction To Probability

Classical, Relative Frequency And Subjective Definition Of Probability


Probability Of Independent Events
Conditional Probability/ Probability Of Dependent Events
Total Probability
Baye’s Theorem
Counting Rules: Permutations And Combinations
Multiplication Theorem
Permutations
Combinations

DISCRETE PROBABILITY DISTRIBUTIONS

Poisson Distribution

Binomial Distribution

Geometric and Hyper geometric Distribution

Expectation Of Random Variables

Variance Of A Random Variable

Expectation And Variance Of A Binomial Distribution

CONTINUOUS PROBABILITY DISTRIBUTIONS


Normal Distribution

Exponential Distribution

Gamma Distribution

INTRODUCTION TO STATISTICS

Concept, Definitions and Relations

Statistics is the science of data, involving the collection, classification, summarization,
organization, analysis and interpretation of numerical information.

It is widely used in different disciplines, both scientific and non-scientific, to make
decisions and draw appropriate conclusions based on credible data. For instance,
business leaders must frequently decide whom to offer their company’s products. A
company hoping to introduce new products onto the market could engage the services
of a statistician to assess the acceptability levels of the product among the population.

There are two main branches of statistics namely descriptive and inferential.

Descriptive Statistics: It utilizes numerical and graphical methods to look for patterns
in a data set, summarizes and presents the information in a convenient form useful in
making decisions. The idea of descriptive statistics is to describe a data set.
Descriptive statistics include both numerical measures, like mean and median and
graphical displays, like pie-charts.

Inferential Statistics: It utilizes sample data to make estimates, decisions, predictions
and other generalizations about a larger set of data. Examples of inferential tools
are the z- and t-statistics.

Experimental unit is an object upon which data is collected.

Population is a set of units that is of interest to a study.

Variable is a property of an individual experimental unit.

Sample is a subset of the units of a population.

The main target of inferential statistics is to make conclusions about a population based
on a sample of data from that population. One commonly used inferential technique is
hypothesis testing.

A Statistical Hypothesis is an educated guess about the relationship between two or
more variables. For instance, an educational leader may have a lingering question:
does a graduate of RMU have a better chance of securing a job compared to graduates
of other Universities in Ghana?

The hypothesis would be that graduates of RMU would have a better chance of
securing jobs since the graduates are of superior abilities. The processes for running
the test are executed, once the hypothesis is formed. In forming statistical hypotheses,
the variables are either dependent or independent.

Dependent Variables are variables which represent the effects that are being tested.

Independent Variables are variables which represent the inputs to the dependent
variable or the variable that could be manipulated to check if it is the cause. In the
above example where an educational leader seeks to find out whether it is graduates of
RMU or others who have greater chances of securing jobs, the dependent variable is
whether a graduate is able to secure a job. The independent variable is which university
completed, whether RMU or other.

A statistical test is done to evaluate the data collected on graduates who secured jobs
from RMU or others; to find out if the educational leader’s hypothesis is correct or
otherwise.

Elements of a Descriptive Statistical Problem

1 Define the population (sample) of interest

2 Select the variables to be studied

3 Select the tables, graphs or numerical summary tools

4 Identify patterns in the data

Elements of an Inferential Statistical Problem


1 Define the population of interest
2 Select the variables that are under study

3 Select a sample of the population unit

4 Run the statistical test on the sample

5 Generalize the result to your population and draw conclusions

SAMPLING
PROBABILITY AND NON – PROBABILITY SAMPLING TECHNIQUES

Probability sampling involves using random selection so that each unit in the population
has a known / equal chance of being selected. Probability sampling keeps sampling
error low and samples are seen to be representative.
Non – probability sampling does not involve random selection so some units in the
population may have had a higher chance of being selected.

Generalizability refers to being able to use sample results as if they applied to the
whole population.

Non-Response is a source of non-sampling error that arises when someone in the sample does
not respond (to a questionnaire or interview). A fair amount of this is normal and there are
many reasons for it to happen (for example, lack of interest).

A Representative sample reflects the population accurately, showing the same
distribution of characteristics or variables as the whole population.

Sampling Error is the difference in result between a sample and that of the population

Sampling Frame is a list of all units in the population from which a sample could be
selected.

Sampling Fraction is the number required for sample divided by number in total
sampling frame expressed as a fraction or percentage.

Non-probability sampling techniques include Convenience/accidental, Quota,
Snowball (Chain or Network) and Purposive Sampling.

Convenience: sampling chosen for ease rather than through random sampling. Used in
pilot studies or short term projects where there is insufficient time to construct a
probability sample. The results cannot be generalised to the population.

Quota: used in market research and opinion polling. The sample is chosen to include a
certain proportion of particular variables (gender, age group). There is no random
sampling stage; the choice of respondent is up to the interviewer, provided the quota is
accurate.

Snowball: an initial group of respondents relevant to the research topic is contacted
and this group is then used to contact others for the research.
There is no sampling frame, the selection is not random and it is sometimes difficult to pre-define the
population (e.g. creative ideas contributors in a company). This technique is mostly used
in qualitative approaches.

Purposive: One’s own judgment is used in selecting a sample and used with small
populations within qualitative research, especially case studies or grounded theory. This
approach cannot yield any statistical inference about the population.

Probability sampling techniques include Systematic, Stratified, Multi-stage, Simple
Random (Lottery, Random number and Computer methods), Cluster, Area, Panel and
Spatial sampling.

Systematic: the sample is chosen directly from the sampling frame at a fixed interval,
without drawing a random number for every unit. With a sampling proportion of,
for example, 1 in 10, start with a randomly chosen item among the first 10 in the list, then choose
every 10th name until the sample is complete.
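
The 1-in-10 procedure described above could be sketched in Python as follows. This is only an illustration: the frame of 100 unit names, the function name `systematic_sample` and the fixed seed are assumptions made for the example, not part of the notes.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th unit of `frame`, starting at a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start index in 0..k-1
    return frame[start::k]        # then every k-th unit

# Hypothetical sampling frame of 100 names
frame = [f"unit_{i:03d}" for i in range(1, 101)]
sample = systematic_sample(frame, k=10, seed=1)
print(len(sample), sample[:3])
```

With a frame of 100 units and k = 10 the sample always contains 10 units; only the starting point is random.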

Stratified: any characteristics which need to be proportionally represented in the
sample are specified, the population is divided into strata (groups) for each
characteristic and, within each stratum, random or systematic sampling could proceed.

Multi – Stage: it helps when drawing a sample from a geographically dispersed
population. The sampling frame is divided into clusters and a random or systematic sample
is taken from them. The randomly sampled units are then put together. This could introduce some bias, but
using both cluster and systematic sampling could usually produce effective samples.

SELECTING APPROPRIATE TECHNIQUES FOR DIFFERENT STUDIES

We often wonder how large a sample should be. There is no single right answer to
sample size. It is more important to look at the absolute size of a sample than its relative
size in relation to the total population. Imagine 10% of a population as a fine sample.
Then the sample size for a population of 1 million is 100 thousand, that for 10,000 is 1,000 and for 100 we
have 10.
This last sample could be quite unrepresentative of the total population by itself. So relative
sample size is not important but absolute size is. The larger the sample size, the more
the sample is likely to represent the population and the lower the sampling error. The
larger the absolute size of a sample, the more closely the distribution of its mean will approximate the
normal distribution.
For a statistical analysis on data, the minimum size of sample for any one category of
the data should be 30, as this is most likely to offer a reasonable chance of an approximately normal
distribution. If the sample frame is 30 or less, it is prudent to include the whole frame,
rather than sampling.

Margin of Error: the expected margin of error is affected by the absolute size of the sample
within a population. A 5% margin of error (95% certainty) is the maximum normally
appropriate for rigorous research. There is a diminishing need for larger samples at the
high population end of the table (the sample size needed to achieve 95% certainty for a population of
1 m is the same as for a population of 10 m).

Time and Cost


Precision in the data increases quickly up to a sample size of about 1000, after which the gains diminish,
making it less worthwhile to interview or survey more. Using a large sample size is time consuming and very
costly.

Variation in population
If population is highly varied the sample size will need to be larger than if the population
is less varied.

Central Limit Theorem


For any sequence of independent identically distributed random variables $x_1, x_2, x_3, \ldots, x_n$ with finite mean $\mu$ and non-zero variance $\sigma^2$, provided n is sufficiently large, $\bar{x}$ has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$, where

$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$

In symbols, $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Example: A continuous random variable has a probability density function, f(x), given by

$f(x) = \begin{cases} 2x & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

Find (a) the mean, $\mu$ (b) the variance, $\sigma^2$, of this distribution.
A random sample of 100 observations is taken from this distribution, and the mean, $\bar{x}$,
is found. Write down the distribution of $\bar{x}$.

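Solution
(a) $\mu = \int_0^1 x f(x)\,dx = \int_0^1 x(2x)\,dx = \int_0^1 2x^2\,dx = \left[\tfrac{2}{3}x^3\right]_0^1 = \tfrac{2}{3}$
(b) $\sigma^2 = \int_0^1 x^2 f(x)\,dx - \mu^2 = \int_0^1 x^2(2x)\,dx - \mu^2 = \int_0^1 2x^3\,dx - \mu^2 = \left[\tfrac{2}{4}x^4\right]_0^1 - \left(\tfrac{2}{3}\right)^2 = \tfrac{1}{2} - \tfrac{4}{9} = \tfrac{1}{18}$
By the Central Limit Theorem, the distribution of $\bar{x}$ is approximately $N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(\tfrac{2}{3}, \tfrac{1/18}{100}\right) = N\left(\tfrac{2}{3}, \tfrac{1}{1800}\right)$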
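
The result of this example could be checked informally by simulation. The sketch below is an illustration, not part of the original solution: it draws from the density f(x) = 2x by the inverse-CDF method (an assumption of the sketch, using F(x) = x² on [0, 1]), forms 2000 sample means of size 100, and compares their mean and variance with 2/3 and 1/1800. The seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw():
    # F(x) = x^2 on [0, 1], so X = sqrt(U) with U uniform on [0, 1) has density 2x
    return rng.random() ** 0.5

n, reps = 100, 2000
sample_means = [statistics.fmean([draw() for _ in range(n)]) for _ in range(reps)]

mean_of_means = statistics.fmean(sample_means)     # expected to be near 2/3
var_of_means = statistics.pvariance(sample_means)  # expected to be near 1/1800
print(round(mean_of_means, 4), round(var_of_means, 6))
```

A histogram of `sample_means` would also be expected to look roughly bell-shaped, even though the underlying density 2x is not symmetric.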

VARIABLES
Variables are classified into qualitative and quantitative. Qualitative variables are variables that
could be placed into distinct groups in accordance with some characteristics, with each
element belonging to only one category. Different types of data fall into four categories:
interval (quantifiable), ordinal, nominal and dichotomous (the last three referred to as
categorical).

Interval variables
This is the highest form of measurement and the easiest to manipulate and analyze. There
is a fixed, consistent interval (space) between successive values. Examples
are answers involving age, income and weight. There is an even more precise
form of this variable known as the ratio variable.

Ordinal variables
These can be rank-ordered but the space between the values is not equal across the
range. For example, asking for ages in the bands 1-5, 6-10, 11-15 and 16 and over. The last, open-ended category
changes the entire set into ordinal and constrains what we can do with the data.

Nominal variables
This cannot be rank-ordered at all. An example could be to offer alternative answers in
a multiple choice question such as ‘sometimes’, ‘occasionally’ and ‘often’.

Dichotomous variables
This answer can fall into only one of two categories, treated as special kind of nominal
variable. For example YES / NO, MALE / FEMALE, TRUE / FALSE.

SOURCES OF DATA

Data is the observed, measured and the recorded values pertaining to a variable.
It is a collection of raw facts and could be measurements, observations, words etc.
Data could include marks of students in statistics exams, ages of people.
The types of data or variables are qualitative and quantitative.
Qualitative variables cannot be measured on a natural numerical scale and could only
be classified into categories. For instance, gender and degree of satisfaction.

Quantitative data or variables are recorded on a naturally occurring scale, examples,


height and age.

There are primary and secondary sources of data.


Primary data is collected from original sources and it is raw in nature.
The methods or tools for obtaining primary data include questionnaires, interviews,
observations, experiments and focus groups.
Secondary data are processed in nature and might have been used in an earlier study.
Secondary data can be in the form of journals, books and articles.

DATA ORGANISATION

Grouping or classifying data


These conditions are necessary when classifying data into a frequency distribution
table:
1. The score base (the upper limit of the data elements or the total out of which the
scoring was done) should be more than 20
2. Data should be large enough, say, of size not less than 30.
3. Range of data should be more than 20.

Minimum Number of Classes Required


Let k be the required number of classes for any data of size N. Then $2^k \ge N$, so $k \log 2 \ge \log N$ and
$k \ge \log N / \log 2$. k must always be a whole number, so the result is rounded up.

Class interval (c) = Range of scores (R) / Number of classes (k).
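
As an illustration of the two rules above, the sketch below computes the minimum number of classes k and the class interval c. The function name `min_classes` and the values N = 50 and R = 49 are assumptions chosen for the example.

```python
import math

def min_classes(n):
    """Smallest whole number k with 2**k >= n."""
    k = math.ceil(math.log2(n))
    while 2 ** k < n:   # guard against floating-point rounding
        k += 1
    return k

n = 50            # hypothetical data size
data_range = 49   # hypothetical range of scores (highest - lowest)

k = min_classes(n)   # number of classes
c = data_range / k   # class interval, usually rounded up to a convenient width
print(k, round(c, 2))
```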

Qualitative data is often presented with bar charts, pie charts and frequency polygons,
while quantitative data comes with scatter diagrams, stem-and-leaf plots, histograms,
cumulative frequency curves (ogives), and frequency polygons or bar graphs (component
/ multiple).
SIMPLE BAR CHART
The pupils in a class are classified according to their favourite soft drinks, as in the table
below.
Fanta Coke Sprite Lemon Malt
8 10 15 6 12
Use a simple bar chart to illustrate the above

[Figure: BAR GRAPH. Simple bar chart of the number of pupils for each soft drink: FANTA, COKE, SPRITE, LEMON, MALT]

COMPONENT/SEGMENTED/STACKED BAR GRAPH

The following shows the distribution of students in various courses in a S.H.S.


Draw a component bar graph with it.
Gender    Science   Arts   Business   H/Econs
Male      35        35     40         5
Female    15        35     20         35
Total     50        70     60         40

[Figure: component bar graph with MALE and FEMALE segments for SCIENCE, ARTS, BUSINESS and HOME ECONOMICS]

MULTIPLE BAR GRAPH

Below is the distribution of males and females in five cities in Africa (in thousands).
Gender    Accra   Freetown   Monrovia   Abuja   Pretoria
Males     200     100        50         40      150
Females   300     150        60         60      200
Illustrate this using multiple bar graphs.

[Figure: multiple bar graph with MALES and FEMALES bars for each of the five cities]

THE PIE CHART

Below is the distribution of students in RMU statistics class. Illustrate on a pie chart.
Programs Number of students
BME 45
BEE 18
BCE 8

[Figure: pie chart titled STUDENTS IN RMU, with sectors BME 63%, BEE 26% and BCE 11%]

THE FREQUENCY POLYGON

Construct a frequency polygon or line chart for the following data.


Month                   June   July   August   September   October
Amount of rainfall/mm   80     100    50       35          70

[Figure: frequency polygon (line chart) of AMOUNT OF RAINFALL by month]

STEM - AND - LEAF - PLOT

It represents discrete quantitative data in a way that can be used to study the shape of a
frequency distribution as well as the range of the values. The plot could easily be used
to recreate the data.

Example: Construct a stem-and-leaf plot for the following data set;


11 10 42 45 16 18 25 33 51 12
37 41 28 46 19 56 51 46 25 16
24 59 22 33 35 50 56 36 48 39
27 53 54 21 17 27 32 38 55 15
37 13 44 44 46 37 42 58 40 50

STEM   LEAF
1      0123566789
2      12455778
3      2335677789
4      01224456668
5      00113456689
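
A plot like the one above could also be built programmatically. The following is a minimal sketch, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is an assumption of the example.

```python
from collections import defaultdict

data = [
    11, 10, 42, 45, 16, 18, 25, 33, 51, 12,
    37, 41, 28, 46, 19, 56, 51, 46, 25, 16,
    24, 59, 22, 33, 35, 50, 56, 36, 48, 39,
    27, 53, 54, 21, 17, 27, 32, 38, 55, 15,
    37, 13, 44, 44, 46, 37, 42, 58, 40, 50,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits) as a string."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return {stem: "".join(map(str, leaves)) for stem, leaves in plot.items()}

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, leaves)
```

A quick check is that the number of leaves equals the number of data values, here 50.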

SCATTER DIAGRAM

The dependent variable is on the y-axis while the independent variable is on the x-axis.
We look out for correlations on the scatter diagram. There could also be a line of best fit
using the “eye ball fitting method”.
Example: The following are the numbers of minutes it takes 8 typists to finish a piece of
secretarial work on Monday and on Friday. Construct a scatter diagram using the data
set and indicate a line of best fit using the eyeball fitting method.
Typist 1 2 3 4 5 6 7 8
Monday(x) 9 8 10 13 11 15 13 12
Friday (y) 8 12 11 15 11 14 16 15

[Figure: SCATTER DIAGRAM of Friday (y) against Monday (x), both axes running from 0 to 20]

HISTOGRAM

Example: Construct a histogram for the following data.


Family size       16-20   21-25   26-30   31-35   36-40
No. of families   10      13      8       11      19

UNEQUAL CLASS INTERVAL FREQUENCY DISTRIBUTION

When the class intervals are of different widths, the heights of the bars are
proportional to the frequency density:
frequency density = (class frequency / class width) x k, where k = height scale factor

Example: Represent the data set below with a histogram.
Height (cm) 140-144 145-149 150-159 160-164 165-174
Frequency 4 5 10 10 8

Class boundaries   Class width (w)   Frequency (f)   Frequency density (f/w x k)
139.5-144.5        5                 4               0.8k
144.5-149.5        5                 5               k
149.5-159.5        10                10              k
159.5-164.5        5                 10              2k
164.5-174.5        10                8               0.8k
Let k=10
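
The bar heights for this example, with k = 10, could be computed as in the sketch below. The variable names are assumptions made for the illustration.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [
    (139.5, 144.5, 4),
    (144.5, 149.5, 5),
    (149.5, 159.5, 10),
    (159.5, 164.5, 10),
    (164.5, 174.5, 8),
]
k = 10  # height scale factor

heights = []
for lower, upper, f in classes:
    width = upper - lower
    heights.append(f * k / width)   # frequency density times k

print(heights)
```

Drawing each bar over its class boundaries with these heights keeps the area of every bar proportional to its frequency.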

MEASURES OF CENTRAL TENDENCY

THE MEAN
Ungrouped Data;
Suppose we have n observations, $x_1, x_2, x_3, \ldots, x_n$. The mean or mean value is defined
as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Example
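The set of numbers $x^2$, 3, $3x - 4$, 7, 9, where x is a positive integer, has a mean of 8.6.
Find x.

Solution
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x^2 + 3 + 3x - 4 + 7 + 9}{5} = 8.6 \Rightarrow x^2 + 3x + 15 = 43 \Rightarrow x^2 + 3x - 28 = 0 \Rightarrow (x + 7)(x - 4) = 0 \Rightarrow x = 4$ (since x is positive)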

Simple Frequency Distribution

Let values $x_1, x_2, x_3, \ldots, x_n$ have corresponding frequencies $f_1, f_2, f_3, \ldots, f_n$. Then $\bar{x} = \frac{\sum fx}{\sum f}$

Example: Calculate the mean for the data set

x 1 2 3 4 5
f 2 3 4 5 6

Solution
x f fx
1 2 2
2 3 6
3 4 12
4 5 20
5 6 30
sum 20 70

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{70}{20} = 3.5$

Grouped Frequency Distribution


Find the class midpoint as x and use same procedure as in simple frequency
distribution.

Example: Consider the table below and calculate the mean

Class 16-20 21-25 26-30 31-35 36-40


frequency 10 13 8 11 20

Solution

Class midpoint(x) Frequency(f) fx


18 10 180
23 13 299
28 8 224
33 11 363
38 20 760
Sum 62 1826

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1826}{62} \approx 29.5$
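
The grouped-mean calculation above could be reproduced with a short script. This is a sketch only; the variable names are assumptions made for the illustration.

```python
# (lower limit, upper limit, frequency) for each class
classes = [(16, 20, 10), (21, 25, 13), (26, 30, 8), (31, 35, 11), (36, 40, 20)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

sum_f = sum(freqs)                                       # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))    # sum of f times midpoint
mean = sum_fx / sum_f
print(sum_f, sum_fx, round(mean, 1))
```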

The Weighted Mean


Weights are assigned to data values and the mean is calculated using the values and
their assigned weights: $W(\bar{x}) = \frac{\sum wx}{\sum w}$

Example: Find the weighted mean of three test results, 80, 85, and 75, where the first
test counts 20%, the second 30% and the third counts 50 %

Solution
$W(\bar{x}) = \frac{\sum wx}{\sum w} = \frac{20(80) + 30(85) + 50(75)}{20 + 30 + 50} = \frac{7900}{100} = 79$

Simplifying the Calculation of the Mean
Suppose we want to determine the mean of the set of numbers 507, 508, 498, 502, 497.
A direct method gives 502.4.
We could make these numbers smaller by subtracting 500 from each number, yielding
7, 8, -2, 2, -3. These add up to 12 and their mean is 12/5 = 2.4. To find the mean of
the original values, add 500 to 2.4 to give 502.4.

In general, for any convenient constant a, $\bar{x} = \frac{\sum(x - a)}{n} + a$

Example: The heights, y cm, of a sample of 90 students are summarized by the equation $\sum(y - 200) = 280$. Find the mean height of a student.
Solution
$\bar{y} = \frac{\sum(y - a)}{n} + a = \frac{\sum(y - 200)}{90} + 200 = \frac{280}{90} + 200 \approx 203.1$

Properties of the Mean

• It is unique, that is, there exists only one for any data set.
• It is more representative since every unit is considered.
• Extreme values affect the mean (hence use it for data without outliers).
• Applied quite often in hypothesis testing.
• Used for interval / ratio (not skewed) data.

THE MEDIAN

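Ungrouped data;
Odd number of observations: the median is the value in the $\left(\frac{n+1}{2}\right)$th position.

Even number of observations: the median is the average of the values in the $\left(\frac{n}{2}\right)$th and $\left(\frac{n+2}{2}\right)$th positions.

Grouped Data
$\text{Median} = L + h\left[\frac{\frac{\sum f}{2} - c}{f}\right]$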

L = lower class boundary of the median class, n = $\sum f$ = total frequency, f = frequency of the median
class, h = class interval, c = cumulative frequency up to the class preceding the
median class.

Example: Find the median for the data set below

Class 16-20 21-25 26-30 31-35 36-40


Frequency 10 13 8 11 20
Solution

Class Class Frequency Cumulative


Boundaries frequency
16-20 15.5-20.5 10 10
21-25 20.5-25.5 13 23
26-30 25.5-30.5 8 31
31-35 30.5-35.5 11 42
36-40 35.5-40.5 20 62

L = 25.5 (the median is at the 1/2 x 62 = 31st position, which falls in the class 26-30); n = 62, h = 5, c = 23, f = 8

$\text{Median} = 25.5 + \left[\frac{31 - 23}{8}\right] \times 5 = 25.5 + 5 = 30.5$
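
The grouped-median formula could be applied programmatically as in the sketch below, which locates the median class from the cumulative frequencies and then interpolates. The function name `grouped_median` is an assumption of the example.

```python
def grouped_median(boundaries, freqs):
    """Median = L + h * ((sum(f)/2 - c) / f) for the median class."""
    half = sum(freqs) / 2
    cumulative = 0                        # c: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:        # first class whose cumulative frequency reaches sum(f)/2
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f

boundaries = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [10, 13, 8, 11, 20]

median = grouped_median(boundaries, freqs)
print(median)
```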

Properties of the Median

• It is unique.
• Not affected by extreme values / outliers, hence preferred when these are present.
• Can be found for ordinal data, for which it is preferred, but not for nominal data.
• Not all units are involved in its calculation.
• Used for interval / ratio (skewed) data.

MODE

It is the value or class with the highest frequency. A bimodal data set has two modes while
a multimodal data set has more than 2 modes.

Grouped Frequency
$\text{Mode} = L + h\left[\frac{a}{a + b}\right]$

L = lower class boundary of the modal class, h = class interval, a = difference between
the modal frequency and the frequency of the class before it, b = difference between the modal frequency and
the frequency of the class after it.

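For the data above: L = 35.5, h = 5, a = 20 - 11 = 9 and b = 20 - 0 = 20 (there is no class after the modal class, so its frequency is taken as 0),

$\text{Mode} = 35.5 + \left[\frac{9}{9 + 20}\right] \times 5 \approx 35.5 + 1.55 \approx 37$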
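
The grouped-mode formula could be sketched in Python as follows. The function name `grouped_mode` is an assumption of the example, and a missing neighbouring class is treated as having frequency 0, as in the calculation above.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + h * a / (a + b) for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])       # index of the modal class
    lower, upper = boundaries[i]
    before = freqs[i - 1] if i > 0 else 0                    # frequency of the class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0        # frequency of the class after
    a = freqs[i] - before
    b = freqs[i] - after
    return lower + (upper - lower) * a / (a + b)

boundaries = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [10, 13, 8, 11, 20]

mode = grouped_mode(boundaries, freqs)
print(round(mode, 2))
```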

Properties of the Mode

• Not unique, could be bimodal or multimodal.
• Not affected by extreme values.
• For both qualitative and quantitative data.
• For mostly nominal data.

PERCENTILES

Arrange the data in ascending order and find the position of the pth percentile as (p/100)(n
+ 1), rounded to the nearest whole number.

For example, the 70th percentile of a data set of 30 units could be found as

70th percentile = [(70/100)(30 + 1)]th value = 21.7th value ≈ 22nd value

QUARTILES

The first quartile, Q1, is the 25th percentile; the second quartile, Q2, is the 50th percentile (median); and the third
quartile, Q3, is the 75th percentile.

MEASURES OF DISPERSION

The Range;

Range = Highest value – Lowest value

For 2, 2, 3, 3, 4, 4, range = 4 – 2 = 2.

MEAN DEVIATION

Mean Deviation = 1/N ∑[|𝑥 − 𝑋̅ |]

Example The data set 25, 26, 27, 28, and 29 are scores in a mid-semester exam. Find
the mean deviation.

Solution

𝑋̅ = (25 + 26 + 27 + 28 + 29) / 5 = 27

x |𝑥 -𝑋̅|
25 2
26 1
27 0
28 1
29 2
sum 6
Mean Deviation = 1/N ∑[|𝑥 − 𝑋̅ |] = 6/5 = 1.2

On the average, the test scores deviated by 1.2 marks from the mean mark

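VARIANCE AND STANDARD DEVIATION

Sample Variance, $S^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{X})^2}{N - 1}$

Sample Standard Deviation, $S = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{X})^2}{N - 1}}$

For data with values $x_1, x_2, \ldots, x_n$ and corresponding frequencies $f_1, f_2, \ldots, f_n$, where $N = \sum f$,

$S^2 = \frac{\sum f(x - \bar{X})^2}{N - 1} = \frac{\sum fx^2 - N\bar{X}^2}{N - 1}$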

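Simplifying the Calculation of the Variance
For any convenient constant a, $\sigma^2 = \frac{\sum(x - a)^2}{n} - \left(\frac{\sum(x - a)}{n}\right)^2$

Let $\sum(x - 200) = 280$ and $\sum(x - 200)^2 = 9000$ represent information on the heights of
90 people. Then $\sigma^2 = \frac{9000}{90} - \left(\frac{280}{90}\right)^2 = 100 - 9.68 = 90.32$ and therefore the standard
deviation is $\sqrt{90.32} \approx 9.5$.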

NB: The standard deviation remains same for both the original and new values.

For a grouped frequency distribution, it is assumed that all the values in a class are
concentrated at the class mid-point, which results in a grouping error. This is
corrected by Sheppard’s correction, (S2 – C2/12), where C = common class interval size.
Corrected Variance = $S^2 - \frac{C^2}{12}$ and Corrected Standard Deviation = $\sqrt{\text{Corrected Variance}}$

Standard deviation shows how data deviates from the centre (mean).

For standard deviations, 𝜎1 and 𝜎2, for samples, N1 and N2, if 𝜎1 > 𝜎2, then sample N1 is
more spread than N2.

ABSOLUTE AND RELATIVE DISPERSION

The actual variation or dispersion, as determined from the Standard Deviation or other
measures of dispersion, is called the absolute dispersion.

However, a variation of 100cm in measuring a distance of 1000m is quite
different in effect from a variation of 100cm in measuring 50m. We therefore require a relative measure, the
coefficient of variation.

Coefficient of variation = (𝜎 / 𝑋̅) x 100, always expressed in %.

Example, 𝜎 = 2kg, 𝑋̅= 5kg, Coefficient of Variation = 2/5 x 100 = 40%.

Example

A manufacturer of electrical gadgets has two devices A and B. The devices have respective mean
life spans of 578 days and 825 days with corresponding standard deviations of 80 days and 150
days. Which device is preferred?

Solution

Coefficient of Variation of A = 80/578 X 100 = 13.8%

Coefficient of Variation of B = 150/825 x 100 = 18.18%.

A is preferred because the data are more closely spread (with lower percentage) around
the mean.
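
The comparison of devices A and B could be sketched in Python as below. The function name `coefficient_of_variation` is an assumption made for the illustration.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(80, 578)    # device A
cv_b = coefficient_of_variation(150, 825)   # device B

preferred = "A" if cv_a < cv_b else "B"     # lower relative dispersion is preferred
print(round(cv_a, 2), round(cv_b, 2), preferred)
```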

Example

Calculate the standard deviation for 143, 155, 167, 171, 181, and 191.

SKEWNESS

Skewness shows the degree of departure from symmetry of a distribution. Data which is
not normal or symmetrical is skewed positively or negatively.

Skewness, $\gamma_1 = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$

Sample skewness = $\frac{\sum(X - \bar{X})^3}{(N - 1)S^3}$

Population skewness = $\frac{\sum(X - \bar{X})^3}{NS^3}$

For data with values $x_1, x_2, \ldots, x_n$ and corresponding frequencies $f_1, f_2, \ldots, f_n$,

Population skewness = $\frac{\sum f(X - \bar{X})^3}{NS^3}$

Example

Calculate the skewness for the data set below

Class 11-15 16-20 21-25 26-30 31-35 36-40 41-45


Frequency 20 18 28 21 25 29 19

The Normal Curve

A normal distribution is zero skewed and bell-shaped

The mean, median and mode are all located on the line of symmetry

Positive Skewness

Mean > median > mode

Negative Skewness

Mean < median< mode

KURTOSIS

Kurtosis is the level of peakedness of a distribution compared with the normal distribution. It indicates the closeness
of the data to the mode.

Sample kurtosis = $\frac{\sum(X - \bar{X})^4}{(N - 1)S^4}$

Population kurtosis = $\frac{\sum(X - \bar{X})^4}{NS^4}$

For data with values $x_1, x_2, \ldots, x_n$ and corresponding frequencies $f_1, f_2, \ldots, f_n$,

Population kurtosis = $\frac{\sum f(X - \bar{X})^4}{NS^4}$

A Platykurtic distribution has a lower peak than normal distribution and lighter tails.
It has negative kurtosis, meaning the data points are distributed closer to the extreme
values than the mode which lies at the middle. The graph looks a little flat with gentle
slope.

A Mesokurtic distribution has zero kurtosis relative to the normal distribution (zero excess kurtosis); its peak and tails are similar to those of the normal distribution.

A Leptokurtic distribution has higher peak than normal distribution and has heavier
tails. A leptokurtic distribution has positive kurtosis, meaning the data points are
gathered closer to the mode of the distribution, thereby making the peak of the graph
pointed with steep slope.

CORRELATION

Correlation is a statistical technique which shows whether variables are related and the
extent. For a positive correlation as the values of one variable increase, the values of
the other variable also increase and vice versa; example, voltage supplied and current
generated. There is a direct relationship. For a negative correlation as the values of the
first variable increase, the values of the second variable decrease or there is an inverse
relationship, example, supply of a product and its price. Zero correlation implies no
relationship, example skin colour and intelligence.

COEFFICIENT OF CORRELATION (r)


Coefficient of correlation measures the strength and direction of the relationship between
two variables, with -1 ≤ r ≤ 1. A value of 0 indicates no relationship, -1 a perfect negative relationship
and +1 a perfect positive relationship. Values between 0 and 0.3 indicate a weak positive relationship,
between 0.3 and 0.7 a moderately positive relationship, and between 0.7 and 1 a strong positive relationship.
Similar interpretations hold for negative values.

PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$

Example

Calculate Pearson’s Product Moment Correlation Coefficient for the data below and indicate its significance.

Age(x) 20 21 22 23 24 25
Mark (y) 25 30 35 37 28 29

Solution

x y xy x2 y2
20 25 500 400 625
21 30 630 441 900
22 35 770 484 1225
23 37 851 529 1369
24 28 672 576 784
25 29 725 625 841
Sum 135 184 4148 3055 5744

$r = \frac{6(4148) - 135(184)}{\sqrt{[6(3055) - 18225][6(5744) - 33856]}} = \frac{48}{\sqrt{105 \times 608}} \approx 0.2$

There is a weak positive relationship between x and y.
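
The column sums and the value of r in this example could be checked with the sketch below, which applies the formula directly. The variable names are assumptions made for the illustration.

```python
import math

x = [20, 21, 22, 23, 24, 25]   # age
y = [25, 30, 35, 37, 28, 29]   # mark

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, round(r, 2))
```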

SPEARMAN’S COEFFICIENT OF RANK CORRELATION

Spearman’s Coefficient of Rank Correlation, rho ($\rho$), is a non-parametric measure of
the degree of association between the numerical ranks of objects on
attributes x and y, presented in the data as P1(x1, y1), ..., Pn(xn, yn). The objects are
ranked with respect to x first and then y.

$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where d = difference between the ranks of each pair of values and n
= number of paired values.

When two or more observations have the same value, each is given the average of the ranks
they would otherwise occupy. For example, if the 5th position has been assigned and the next
two values to be ranked are equal (8 and 8), add the ranks they would occupy (6 + 7 = 13) and divide by the number of
tied values (13/2 = 6.5). Both values receive the rank 6.5 and the next value has the rank of 8.

Example

Calculate the spearman’s rank correlation for the data and explain the value.

Age (x) 25 26 28 32 35 40
Mark (y) 60 65 77 89 72 87
Solution

x y r(x) r(y) d= r(x)- r(y) d2


25 60 1 1 0 0
26 65 2 2 0 0
28 77 3 4 -1 1
32 89 4 6 -2 4
35 72 5 3 2 4
40 87 6 5 1 1
Sum of d2 = 10

$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6(10)}{6(6^2 - 1)} = 1 - \frac{10}{35} \approx 0.7$

Strong Positive Relationship
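
The ranking and the calculation of rho could be sketched as follows. The helper `ranks` is an assumption of the example; it ranks from the smallest value upward and gives tied values the average of the ranks they occupy. Note that the formula for rho used here is exact only when there are no ties.

```python
def ranks(values):
    """Rank values from smallest (1) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result

x = [25, 26, 28, 32, 35, 40]   # age
y = [60, 65, 77, 89, 72, 87]   # mark

rx, ry = ranks(x), ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(ry, sum_d2, round(rho, 2))
```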

REGRESSION ANALYSIS

A regression line is a straight line that describes how a dependent variable changes
with respect to an independent variable. The line is used to explain the change in a
dependent variable in terms of an independent variable and also to predict the values of
the dependent variable for a given value of the independent variable.

The best line is the one that minimizes the sum of the squared vertical distances between the data points and the line. The least squares regression
line is the line y = a + bx; y is the predicted response for any predictor x; a is the y-
intercept and b is the slope. There are simple and multiple linear regressions. Multiple
linear regression has two or more independent variables against one dependent
variable.

$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$ and $b = \frac{n(\sum xy) - \sum x \sum y}{n(\sum x^2) - (\sum x)^2}$

Example

X 11 12 13 14 15 16 17
y 10 13 16 10 18 11 18
Fit a regression line equation to the data above and predict the value of x when y is 15.

Example

Fit a regression line equation to the data below.

X 5 7 9 10 12 15
y 12 16 20 22 26 32
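
One way to carry out the fit for the data above is to apply the formulas for a and b directly, as in the following sketch. The variable names are assumptions made for the illustration.

```python
x = [5, 7, 9, 10, 12, 15]
y = [12, 16, 20, 22, 26, 32]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(u * v for u, v in zip(x, y))

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

def predict(x_value):
    """Predicted y on the fitted line y = a + b x."""
    return a + b * x_value

print(a, b, predict(8))
```

For this particular data set every point lies exactly on the fitted line, so the predicted values coincide with the observed ones.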

PROBABILITY

Introduction To Probability

The foundation for inferential statistics is probability.

Uncertainty; It is the study of something that is unknown.


It could pertain to an individual or a group of people. Uncertainty may not be universal.
People may have different assessments of the chances of an event happening or not.
Since uncertainty varies across different people, there must exist a way to quantify
uncertainty in order to be able to discuss it in precise terms.
The commonly used measurement of uncertainty is probability.

Thus probability is a measure of uncertainty or the likelihood that something would


happen.
We could express probability as decimals, fractions or percentages. For instance, if you
are told that there is a 98% chance of passing exams, then you almost probably need to
go to sleep.

A statistical experiment, which is an observation that leads to a single outcome that
cannot be predicted with certainty, is similar to any other kind of experiment.
There is a hypothesis to be tested without knowing the result. Flipping a coin or rolling a
die comes with uncertainty.
We might pull several cards out of a deck and ponder how many face cards (jack,
queen or king) would be pulled.

Sample Space is the collection of all the simple events for a statistical experiment,
denoted S.
There exist two simple and important rules to be observed while assigning probabilities
to simple events:

1. All simple event probabilities are between 0 and 1


2. The probabilities of all sample points must sum up to 1

Steps for Calculating the Probability of an Event:

1. Define the experiment
2. List the simple events
3. Assign probabilities to the simple events
4. Determine the collection of sample points in the event of interest
5. Sum up the sample point probabilities to get the event probability

An event is a single outcome or a group of outcomes of an experiment.

The empty set and the sample space are both events.

Mutually exclusive events cannot occur together, that is A ∩ B = ∅ and

P (A ∪ B ) = P (A) + P (B).

Classical Definition of Probability

Let n (A) be the number of outcomes in event A and n (S) the number of equally likely
outcomes in the sample space S. Then P (A) = n (A) / n (S).

Example

Let S = {1, 2, 3, 4, 5, 6} be the sample space when a die with n (S) = 6 is tossed once;
the event that the number showing is a factor of 6, E = {1, 2, 3, 6}, with n (E) = 4, has
probability, P (E) =n (E)/ n(S) = 4/6 = 2/3.

The probability that a teacher chooses a boy from a class of 40 girls is P (B) = 0/40 = 0,
which is an impossible event, and the probability that he chooses a girl is 40/40 = 1,
which is a sure event.

Example

Let two coins be tossed once or one coin be tossed two times. The sample space is S =
{HH, HT, TH, TT}, with n (S) = 4. The event of obtaining no head is E = {TT}, with
n (E) = 1, and hence P (E) = ¼.

The event that there is at least one tail is E = {HT, TH, TT} with n (E) = 3 and P (E) = ¾.

When a die is tossed twice or two dice are tossed once, the space is : {(1,1), (1,2), (1,3),
(1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3),
(6,4), (6,5), (6,6)} with n (S) = 36. The event that the sum of the values is 1 is E = { }
with P (E) = 0/36 = 0 and the event that the sum of the values is less than 13 is the entire
sample space, a sure event with P (E) = 36/36 =1
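
The classical definition can be sketched in code by listing the sample space and counting the outcomes in the event. The helper name classical_probability is an assumption made for this illustration, and it applies only when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(E) = n(E) / n(S) for equally likely outcomes; `event` is a predicate."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

one_die = list(range(1, 7))
two_dice = list(product(one_die, repeat=2))   # the 36 ordered pairs listed above

p_factor_of_6 = classical_probability(one_die, lambda k: 6 % k == 0)
p_sum_is_1 = classical_probability(two_dice, lambda pair: sum(pair) == 1)
p_sum_below_13 = classical_probability(two_dice, lambda pair: sum(pair) < 13)
print(p_factor_of_6, p_sum_is_1, p_sum_below_13)
```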

RELATIVE FREQUENCY DEFINITION OF PROBABILITY

P (E) = Relative Frequency = frequency of event / total frequency.

Example: The distribution below is for the ages of children.

Ages 6 7 8 9 10
Frequency 10 20 30 40 50
The probability that a child chosen at random is 9 years old is
P (9 years) = n (9 years) / n (F) = 40/150 = 4/15.

The probability that a child chosen at random is at most 9 years old is
P (at most 9) = n (at most 9) / n (F) = (10 + 20 + 30 + 40) / 150 = 100/150 = 2/3.

Subjective definition of probability

This is based on educated guesses or estimates.

Axioms (propositions taken as true) of probability

1) 0 ≤ P (E) ≤ 1
2) P (A ∪ B) = P (A) + P (B) – P (A ∩ B) and, for mutually exclusive events, P (A ∩ B) = 0,
so that P (A ∪ B) = P (A) + P (B)
3) P (A) + P (A’) = 1, that is, P (A’) = 1 – P (A)

Proofs

1  ∅ ⊆ E ⊆ S

n (∅) ≤ n (E) ≤ n (S)

n (∅)/n (S) ≤ n (E)/n (S) ≤ n (S)/n (S)

P (∅) ≤ P (E) ≤ 1

0 ≤ P (E) ≤ 1

2  Let x denote the region common to A and B, that is, x = A ∩ B. Counting elements region by region,

A ∪ B = (A – x) + x + (B – x)

A ∪ B = A + B – x

A ∪ B = A + B – (A ∩ B)

n (A ∪ B) = n (A) + n (B) – n (A ∩ B)

n (A ∪ B)/n (S) = n (A)/n (S) + n (B)/n (S) – n (A ∩ B)/n (S)

P (A ∪ B) = P (A) + P (B) – P (A ∩ B) and when A ∩ B = ∅,

P (A ∩ B) = 0 ⇒ P (A ∪ B) = P (A) + P (B)

3  A ∪ A’ = S, where A and A’ have no element in common

n (A) + n (A’) = n (S)

n (A)/n (S) + n (A’)/n (S) = n (S)/n (S)

P (A) + P (A’) = 1 ⇒ P (A’) = 1 – P (A)

Example

The probability that a boy with a catapult hits target A is 2/3 and that he hits
target B is ¾. Given that the probability of hitting both targets is 1/2, find the
probability that he
a) hits at least one of the targets b) does not hit any.

Solution

P (A) = 2/3, P (A’) = 1/3, P (B) = ¾, P (B’) = ¼, P (A ∩ B) = ½.
a) P (at least one) = P (either A or B or both) = P (A ∪ B)
= P (A) + P (B) – P (A ∩ B) = 2/3 + 3/4 – 1/2 = 11/12

b) P (neither of them) = P (A’ ∩ B’) = 1 – P (at least one) = 1 – 11/12 = 1/12.
The same value follows from the product P (A’) x P (B’) = 1/3 x ¼ = 1/12, which is valid
here because P (A ∩ B) = ½ = P (A) P (B), that is, the two events are independent.

PROBABILITY OF INDEPENDENT EVENTS

Two or more events are independent if the probability of one of them is not affected by
knowing whether or not the other(s) have occurred. Events A and B are independent if
P (A ∩ B) = P (A) P (B). Equivalent to this, when P (B) > 0, is the condition P (A / B) = P (A), that is, the
probability of A is the same as the conditional probability of A given B.

Example

A red die and a black die are thrown. Let R be the event that the red die shows 6 and B the
event that the black die shows 6; then P (R ∩ B) = P (R) P (B) = 1/6 x 1/6 = 1/36.

The Inclusion – Exclusion Formula for three events is P (A ∪ B ∪ C) = P (A) + P (B) +
P (C) – P (A ∩ B) – P (A ∩ C) – P (B ∩ C) + P (A ∩ B ∩ C)

Example

A bowl contains 13 red and 7 white identical balls. Balls are selected at random from the bowl one at a
time, each ball being replaced before the next is drawn; find the probability of selecting

a) 4 red balls b) 4 white balls

c) 4 balls of the same colour d) 2 balls of different colours

Solution

a) P (4 red balls) = P (R ∩ R ∩ R ∩ R) = 13/20 x 13/20 x 13/20 x 13/20 = 0.18

b) P (4 white balls) = P (W ∩ W ∩ W ∩ W) = 7/20 x 7/20 x 7/20 x 7/20 = 0.015

c) P (4 of R or 4 of W) = P (R ∩ R ∩ R ∩ R) + P (W ∩ W ∩ W ∩ W)

= 13/20 x 13/20 x 13/20 x 13/20 + 7/20 x 7/20 x 7/20 x 7/20 = 0.19

d) P (R and W or W and R) = P (R ∩ W) + P (W ∩ R)

= P (R) P (W) + P (W) P (R)

= 13/20 x 7/20 + 7/20 x 13/20 = 91/200

Example

The probability that A hits a target is ¾, that B hits it is 2/3 and that C hits it is 3/5. Given
that they fire together, find the probability that

a) they all miss the target b) exactly one hits the target
c) at least one shot hits d) A hits, given that exactly one hit is
recorded

Solution

P (A) = ¾, P (A’) = ¼, P (B) = 2/3, P (B’) = 1/3, P (C) = 3/5, P (C’) = 2/5

a) P (all miss) = P (A’ ∩ B’ ∩ C’) = ¼ x 1/3 x 2/5 = 1/30

b) P (exactly one) = P (A ∩ B’ ∩ C’) + P (A’ ∩ B ∩ C’) + P (A’ ∩ B’ ∩ C)

= ¾ x 1/3 x 2/5 + ¼ x 2/3 x 2/5 + ¼ x 1/3 x 3/5 = 13/60

c) P (at least one) = 1 – P (none hitting)

= 1 – 1/30 = 29/30

d) P (A hits and only one hit is recorded) = P (A ∩ B’ ∩ C’) = ¾ x 1/3 x 2/5 = 1/10, so

P (A hits / exactly one hit) = P (A ∩ B’ ∩ C’) / P (exactly one) = (1/10) / (13/60) = 6/13
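
One way to check a problem of this kind is to list all eight hit/miss patterns and add up the ones of interest. The sketch below assumes the three shooters are independent, as the solution does; the names hit and outcome_probability are illustrative only. The last quantity is the conditional probability, computed as joint probability over the probability of the condition.

```python
from fractions import Fraction
from itertools import product

hit = {"A": Fraction(3, 4), "B": Fraction(2, 3), "C": Fraction(3, 5)}

def outcome_probability(outcome):
    """Probability of one hit/miss pattern such as (True, False, False),
    assuming the shooters are independent."""
    p = Fraction(1)
    for shooter, did_hit in zip(hit, outcome):
        p *= hit[shooter] if did_hit else 1 - hit[shooter]
    return p

outcomes = list(product([True, False], repeat=3))   # 8 patterns for (A, B, C)
p_all_miss = sum(outcome_probability(o) for o in outcomes if sum(o) == 0)
p_exactly_one = sum(outcome_probability(o) for o in outcomes if sum(o) == 1)
p_at_least_one = 1 - p_all_miss
p_a_given_one = outcome_probability((True, False, False)) / p_exactly_one
print(p_all_miss, p_exactly_one, p_at_least_one, p_a_given_one)
```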

CONDITIONAL PROBABILITY/ PROBABILITY OF DEPENDENT EVENTS

Conditional probability is the probability of some event A, given that event B has occurred. For
any two events A and B, the conditional probability of event A given that B has occurred is

P (A/B) = P (A ∩ B) / P (B) and

P (B/A) = P (B ∩ A) / P (A)

[These require P (B) ≠ 0 and P (A) ≠ 0 respectively, since the conditioning event must be able to occur]

Example

Suppose that at a Goil Filling Station, 60% of drivers check their oil levels, 40% check
tyre pressure and 10% check both oil levels and tyre pressure. Suppose also that a
driver is selected at random without bias. What is the probability that a driver checked
his tyre pressure given that he had checked his oil levels?

Solution

P (O) = 60% = 0.6, P (T) = 40% = 0.4, P (O ∩ T) = 10% = 0.1

P (T/O) = P (T ∩ O) / P (O) = 0.1 / 0.6 =1/6.

Again, the probability that oil levels are checked given that tyre pressure is checked is
given by P (O/T) = P (O ∩ T) / P (T) = 0.1 / 0.4 = 1/4

Example

Students of a school were selected for an alcohol test. The table is a distribution of
results.

Chemical test            Positive test result   Negative test result   Totals

Subject is alcoholic              7                      70               77
Subject not alcoholic             9                      13               22
Totals                           16                      83               99

A student is selected at random, find the probability that

a) The student tested positive given that he/she is alcoholic


b) The student was alcoholic given that he/she tested positive.

Solution
a) P (positive / alcoholic) = P (positive and alcoholic) / P (alcoholic)

= (7/99) / (77/99) = 7/99 × 99/77 = 7/77 = 1/11

b) P (alcoholic / positive) = P (alcoholic and positive) / P (positive)

= (7/99) / (16/99) = 7/99 × 99/16 = 7/16
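
The same conditional probabilities can be read straight off the table of counts. The sketch below is an illustration only; the dictionary layout and the helper name conditional are assumptions, not part of the notes.

```python
from fractions import Fraction

# Counts from the alcohol-test table: (true status, test result) -> frequency.
counts = {
    ("alcoholic", "positive"): 7,
    ("alcoholic", "negative"): 70,
    ("not alcoholic", "positive"): 9,
    ("not alcoholic", "negative"): 13,
}

def conditional(counts, event, given):
    """P(event | given) = n(event and given) / n(given).
    `event` and `given` are predicates on a table cell key."""
    n_given = sum(n for key, n in counts.items() if given(key))
    n_both = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(n_both, n_given)

def is_alcoholic(key):
    return key[0] == "alcoholic"

def is_positive(key):
    return key[1] == "positive"

print(conditional(counts, is_positive, is_alcoholic))   # 7 of the 77 alcoholic subjects
print(conditional(counts, is_alcoholic, is_positive))   # 7 of the 16 positive results
```

Swapping the roles of event and condition changes the denominator, which is why the two answers differ.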

TOTAL PROBABILITY

If event A can be realized only when one of the mutually exclusive events B1, B2, B3, B4, . . . , Bn occurs,
then the probability of event A is

P (A) = P (B1) P (A/B1) + P (B2) P (A/B2) + P (B3) P (A/B3) + . . . + P (Bn) P (A/Bn)

that is, P (A) = ∑ P (Bi) P (A/Bi), summed over i = 1, 2, . . . , n. This is the Total Probability of A.

Example

In a used car garage, 45% of the cars are manufactured in the U.S.A. and 20% of these are
compact; 25% are manufactured in Europe and 30% of these are compact; and finally 30% are
manufactured in Japan and 70% of these are compact.

a) If a car is selected at random from the garage, find the probability that it is compact.
b) Given that the car is compact, find the probability that it was manufactured in
Europe.

Solution

Let A = Compact Car B1 = USA Car B2 = European Car B3 = Japan Car

P (B1) = 0.45 P (B2) = 0.25 P (B3) = 0.3

P (A/B1) = 0.2 P (A/B2) = 0.3 P (A/B3) = 0.7

(a) P(A) = P (B1 and A) + P (B2 and A) + P (B3 and A)


= P (B1) P (A/B1) + P (B2) P (A/B2) + P (B3) P (A/B3)

= 0.45 ×0.2 + 0.25×0.3 + 0.3×0.7 = 0.375


(b) P (B2/A) = P (A and B2) / P (A) = P (B2) P (A/B2) / P (A) = 0.25 × 0.3 / 0.375 = 0.2

Example

Three machines, X, Y and Z are used to produce greeting cards.

During a day’s production, X produces 1440 cards, Y produces 864 cards and Z does 576 cards.
The probability of X producing a defective card is 0.02, that of Y is 0.1 and that of Z is 0.05.
Find the probability that at the end of the day, one card selected at random will be defective.

Solution

Total Cards = 1440+864+576 = 2880

P(X) = 1440/2880 = 0.5 P(Y) = 864/2880 = 0.3 P (Z) = 576/2880 = 0.2

P (D) = P(X) P (D/X) + P(Y) P(D/Y) + P(Z)P(D/Z) = 0.5×0.02 +0.3×0.1+0.2×0.05 = 0.05
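
The total probability rule is a weighted sum, which the following sketch spells out for the greeting-card machines. The function name total_probability is an assumption for illustration, and exact fractions are used so that the check that the weights add up to 1 is reliable.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(Bi) * P(A | Bi) over mutually exclusive B1, ..., Bn."""
    assert sum(priors) == 1, "the P(Bi) must add up to 1"
    return sum(p * l for p, l in zip(priors, likelihoods))

# Machines X, Y, Z: share of the day's output and probability of a defect.
output = [1440, 864, 576]
priors = [Fraction(n, sum(output)) for n in output]
defect_rates = [Fraction(2, 100), Fraction(10, 100), Fraction(5, 100)]
p_defective = total_probability(priors, defect_rates)
print(p_defective, float(p_defective))
```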

BAYES’ THEOREM

Let E1, E2, E3, . . . , En be a collection of n mutually exclusive events such that E1 ∪ E2 ∪
E3 ∪ . . . ∪ En = S and Ei ∩ Ej = ∅ whenever i ≠ j. Let F be an event such that P (F) >
0. Then, for i = 1, 2, 3, . . . , n,

P (Ei/F) = P (Ei) P (F/Ei) / [P (E1) P (F/E1) + P (E2) P (F/E2) + . . . + P (En) P (F/En)]

Example
Suppose that Bob can decide to go to work by one of three modes of transportation,
car, bus, or commuter train. Because of high traffic, if he decides to go by car, there is a
50% chance he will be late. If he goes by bus, which has special reserved lanes but is
sometimes overcrowded, the probability of being late is only 20%. The commuter train is
almost never late, with a probability of only 1%, but is more expensive than the bus.

(a) Suppose that Bob is late one day, and his boss wishes to estimate the probability
that he drove to work that day by car. Since he does not know which mode of
transportation Bob usually uses, he gives a prior probability of 1/3 to each of the three
possibilities. What is the boss’ estimate of the probability that Bob drove to work?

(b) Suppose that a coworker of Bob’s knows that he almost always takes the commuter
train to work, never takes the bus, but sometimes, 10% of the time, takes the car. What
is the coworker’s probability that Bob drove to work that day, given that he was late?

Solution

(a) P (B) = P (C) = P (T) = 1/3

P (L|C) = 0.5, P (L|B) = 0.2, P (L|T) = 0.01

P (C|L) = P (C) P (L|C) / [P (C) P (L|C) + P (B) P (L|B) + P (T) P (L|T)]

= (1/3 × 0.5) / (1/3 × 0.5 + 1/3 × 0.2 + 1/3 × 0.01) = 0.5/0.71 = 0.70

(b) P (B) = 0, P (C) = 0.1 and P (T) = 0.9 ⇒ P (C|L) = (0.1 × 0.5) / (0.1 × 0.5 + 0 × 0.2 + 0.9 × 0.01) = 0.05/0.059 = 0.85

Example
In Orange County, 51% of the adults are males and the other 49% are females. One
adult is randomly selected for a survey involving credit card usage. It is later learned
that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke
cigars, whereas 1.7% of females smoke cigars. Find the probability that the selected
subject is a male.

Solution

P (M) = 0.51 P(F) = 0.49 P(C|M) = 0.095 P( C|F) = 0.017,

P (M|C) = P (M) P (C|M) / [P (M) P (C|M) + P (F) P (C|F)]

= (0.51 × 0.095) / (0.51 × 0.095 + 0.49 × 0.017) = 0.85

Example
Three boxes A, B and C contain red and black balls. Box A contains 2 red and 3 black
balls, box B contains 1 red and 4 black balls and box C contains 3 red balls and 1 black ball.
We choose randomly a box, and from this box we choose randomly one of the balls.
Assume that the drawn ball is red. Find the probability that the ball comes from box A.

Solution
P (A) = P (B) = P (C) = 1/3, P (R|A) = 2/5, P (R|B) = 1/5, P (R|C) = ¾
P (A|R) = P (A) P (R|A) / [P (A) P (R|A) + P (B) P (R|B) + P (C) P (R|C)]
= (1/3 × 2/5) / [1/3 × (2/5 + 1/5 + 3/4)] = 8/27 = 0.3

Example
Two boxes A1 and A2 contain w1 white and b1 black balls and w2 white and b2 black balls
respectively. We draw at random one ball from each one of the boxes and then at
random one of the two balls. Find the probability that this ball is white.

Solution
Let Ai denote the event that the ball finally chosen comes from box i and let A denote the event that this
ball is white. Since we draw 1 ball from each box and then choose one of the two at random, we get
P (Ai) = ½, i = 1, 2
P (A|Ai) = wi / (wi + bi), i = 1, 2
P (A) = P (A1) P (A/A1) + P (A2) P (A/A2) = ½ × w1/(w1 + b1) + ½ × w2/(w2 + b2)

Example
An information channel can transmit 0s and 1s, though some errors may occur. One
expects that a sent 0 is changed to a 1 with probability 1/5, and that a sent 1 is
changed to a 0 with probability 1/6. It is also given that, on average, 2/3 of all signals sent are
0s.
a) Assuming that we receive a 0, what is the probability that a 0 was sent?
b) Assuming that we receive a 1, what is the probability that a 1 was sent?

Solution

a) Ai = {i sent}, i = 0, 1 and A = {0 received}

P (0 sent|0 received) = P (0 sent) P (0 received|0 sent) / [P (0 sent) P (0 received|0 sent) + P (1 sent) P (0 received|1 sent)]

= (2/3 × 4/5) / (2/3 × 4/5 + 1/3 × 1/6) = 48/53

b) Ai = {i sent}, i = 0, 1 and A = {1 received}

P (1 sent|1 received) = P (1 sent) P (1 received|1 sent) / [P (1 sent) P (1 received|1 sent) + P (0 sent) P (1 received|0 sent)]

= (1/3 × 5/6) / (1/3 × 5/6 + 2/3 × 1/5) = 25/37

Example

A factory buys 1000 light bulbs of type A and 500 bulbs of type B, which are somewhat
more expensive. A randomly chosen bulb of type A has probability 0.6 of
lasting longer than 2 months. A randomly chosen bulb of type B has
probability 0.9 of lasting longer than 2 months. By mistake all the bulbs are mixed
together. A bulb is chosen at random from the 1500 bulbs.
a) Find the probability that this bulb will last longer than 2 months.
b) If a bulb lasts more than 2 months, what is the probability that it is of type A?

Solution

a) P (bulb lasts more than 2 months) = 1000/1500 × 0.6 + 500/1500 × 0.9 = 0.7

b) P (bulb comes from A|bulb lasts more than 2 months)

= P (bulb comes from A) P (bulb lasts more than 2 months|A) / P (bulb lasts more than 2 months)

= (2/3 × 0.6) / 0.7 = 4/7

Example
An aircraft emergency locator transmitter (ELT) is a device designed to transmit a signal
in the case of a crash. The Altigauge Manufacturing Company makes 80% of the ELTs,
the Bryant Company makes 15% of them, and the Chartair Company makes the other
5%. The ELTs made by Altigauge have a 4% rate of defects, the Bryant ELTs have a
6% rate of defects, and the Chartair ELTs have a 9% rate of defects.
If a randomly selected ELT is then tested and is found to be defective, find the
probability that it was made by the Altigauge Manufacturing Company.

Solution
P(A) = 0.80 P(B) = 0.15 P(C) = 0.05 P(D|A) = 0.04 P(D|B) = 0.06 P(D|C) = 0.09

P (A|D) = P (A) P (D|A) / [P (A) P (D|A) + P (B) P (D|B) + P (C) P (D|C)]

= (0.8 × 0.04) / (0.8 × 0.04 + 0.15 × 0.06 + 0.05 × 0.09) = 0.032/0.0455 = 0.7
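
All the Bayes examples above follow one pattern: multiply each prior by its likelihood, then divide by the sum of those products. A minimal sketch of that pattern is given below; the function name posterior is an assumption for illustration, and exact fractions are used so results can be compared with the worked answers.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the list of P(Ei | F), given P(Ei) and P(F | Ei)
    for mutually exclusive events E1, ..., En that make up S."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)            # P(F), by the total probability rule
    return [j / total for j in joint]

# ELT example: makers Altigauge, Bryant, Chartair and their defect rates.
priors = [Fraction(80, 100), Fraction(15, 100), Fraction(5, 100)]
defect = [Fraction(4, 100), Fraction(6, 100), Fraction(9, 100)]
p_a, p_b, p_c = posterior(priors, defect)
print(p_a, round(float(p_a), 2))
```

The three posteriors always add up to 1, which is a quick sanity check on any hand calculation.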

COUNTING RULES: PERMUTATIONS AND COMBINATIONS

n! = n (n - 1) (n - 2) . . . (3) (2) (1)

E.g. 7! = 7 x 6 x 5 x 4 x 3 x 2 x 1 = 5040

Note: 0! = 1! = 1

THE ADDITION THEOREM

If an operation could be done in m ways and a second operation, which cannot be done at the same time as the first, could be
performed in n ways, then either of the two could be performed in (m + n) ways.

Example

In how many ways can a number be chosen from 1 - 22 such that

a) It is a multiple of 3 or 8? (b) It is a multiple of 2 or 3?

Solution

a) Let n (A) = the number of multiples of 3 or 8.

E1 = {3, 6, 9, 12, 15, 18, 21}
E2 = {8, 16}
E1 and E2 have no number in common, so n (A) = n (E1) + n (E2) = 7 + 2 = 9

b) Let n (D) = the number of multiples of 2 or 3.

E3 = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22}
E4 = {3, 6, 9, 12, 15, 18, 21}
E3 ∩ E4 = {6, 12, 18}, so n (D) = n (E3) + n (E4) – n (E3 ∩ E4) = 11 + 7 – 3 = 15
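
The counts in this example can be confirmed with sets, where the union counts each number once. This is only an illustrative sketch; the helper name multiples is an assumption.

```python
numbers = range(1, 23)                       # 1 to 22 inclusive

def multiples(k):
    """Set of multiples of k among the numbers 1..22."""
    return {n for n in numbers if n % k == 0}

# a) multiples of 3 or 8: the two sets are disjoint, so the counts simply add.
e1, e2 = multiples(3), multiples(8)
print(len(e1 | e2), len(e1) + len(e2))

# b) multiples of 2 or 3: subtract the overlap so it is not counted twice.
e3, e4 = multiples(2), multiples(3)
print(len(e3 | e4), len(e3) + len(e4) - len(e3 & e4))
```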

MULTIPLICATION THEOREM
Suppose that event D1 could result in any one of n (D1) outcomes, and for
each outcome of the event D1 there are n (D2) outcomes of a second event D2; then together there will be n
(D1) x n (D2) outcomes for the two events.
n (D) = n (D1) x n (D2)

Example
One has 20 pairs of jeans and 16 shirts. In how many ways could the person
combine these clothes if he wears a pair of jeans and a shirt at a time?

Solution
n (A) = n (E1) x n (E2) = 20 x 16 = 320 ways.

PERMUTATIONS
A permutation is an arrangement of a given number of things, taking
some or all of them at a time.
In permutations, ‘order’ is the watchword. In general, the number of
permutations of n distinct things taking them all at a time is nPn = n!
nPn = n! / (n – n)! = n! / 0! = n! / 1 = n!
E.g. 4P4 = 4! / (4 – 4)! = 4! / 0! = 4 x 3 x 2 x 1 = 24
Again, the number of permutations of n distinct things taking r at a time is nPr = n! / (n – r)!
E.g. 6P3 = 6! / (6 – 3)! = 6! / 3! = 6 x 5 x 4 = 120

Example
How many ways can gold, silver, and bronze medals be awarded for a race run by 8
people?

Solution.
Using the permutation formula we find P (8, 3) = 8! / (8 − 3)! = 336 ways.

Example
How many five-digit zip codes can be made where all digits are unique? The possible
digits are the numbers 0 through 9.
Solution. P (10, 5) = 10! / (10 − 5)! = 30,240 zip codes.

Example

In how many ways could a supermarket manager display 10 brands of cereals in 6
spaces on a shelf?

Solution

An ordered arrangement with n = 10 and r = 6.

10P6 = 10! / (10 – 6)! = 10 x 9 x 8 x 7 x 6 x 5 = 151200

Example

How many different number plates for cars could be made if each number plate
contains four (4) of the digits from 0 – 9 followed by a letter A – Z, and prefixed with GT,
assuming that

a) no repetition allowed b) repetition allowed

Solution

The digits 0 – 9 give n = 10, r = 4
a) No repetition: 10P4 = 10! / (10 – 4)! = 10! / 6!

= 10 x 9 x 8 x 7 = 5040

The letters A – Z give 26 choices, hence 26 x 5040 = 131040 number plates

b) With repetition, 0000 – 9999 gives 10000 sets of four digits

26 x 10000 = 260000 number plates

If the all-zero number 0000 is not used, we have 260000 – 26 = 259974 plates

In general, the number of different permutations of n objects of which n1 are of one
kind, n2 are of a second kind, . . . , nk are of a k-th kind is n! / (n1! x n2! x . . . x nk!)

Example

In how many ways can the letters of the word ‘STATISTICS’ be arranged?

Solution
The word has n = 10 letters: S occurs 3 times, T occurs 3 times, A occurs once,
I occurs twice and C occurs once.
Number of arrangements = 10! / (3! 3! 1! 2! 1!) = 10! / (3! 3! 2!) = 50400 ways
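
The permutation counts in this section can be checked with Python's standard library, where math.perm(n, r) gives n! / (n − r)!. The helper arrangements below is an assumed name for the repeated-letters formula and is only a sketch.

```python
from collections import Counter
from math import factorial, perm

def arrangements(word):
    """Distinct orderings of the letters of `word`: n! / (n1! n2! ... nk!)."""
    result = factorial(len(word))
    for count in Counter(word).values():   # how often each letter occurs
        result //= factorial(count)
    return result

print(perm(8, 3))                  # gold, silver, bronze among 8 runners
print(perm(10, 6))                 # 10 cereal brands in 6 shelf spaces
print(26 * perm(10, 4))            # number plates without repeated digits
print(arrangements("STATISTICS"))  # letters of STATISTICS
```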

COMBINATIONS

The number of ways of selecting r items from a set of n distinct objects without regard to
order is the number of combinations, nCr.

nCr = nPr / r! = n! / [r! (n – r)!]

Example
How many ways are there to select a committee to develop a discrete mathematics
course at a school if the committee is to consist of 3 faculty members from the
Mathematics department and 4 from the computer science department, if there are 9
faculty members of the math department and 11 of the CS department?

Solution.
There are C (9, 3) · C (11, 4) = 9! / [3! (9 − 3)!] × 11! / [4! (11 − 4)!] = 84 × 330 = 27,720 ways.

Example How many combinations are there of 6 distinct things taking 4 at a time?

Solution 6C4 = 6! / [4! (6 – 4)!] = 6! / (4! 2!) = (6 x 5) / 2 = 15 ways

NB: nCk = nCn-k

Example
In how many different ways can 4 of 13 teachers be selected to assist with the
preparation of examinations?

Solution

n = 13, r = 4, hence 13C4 ways
13C4 = 13! / [4! (13 – 4)!] = 13! / (4! 9!) = (13 x 12 x 11 x 10) / 24 = 715

Example

A statistics lecturer sets 7 questions in an end of semester exam and students were
asked to attempt any 4 of them. Find the number of ways of selecting these questions.

Solution

n = 7, r = 4, hence 7C4 ways
7C4 = 7! / [4! (7 – 4)!] = 7! / (4! 3!) = (7 x 6 x 5) / 6 = 35 ways

Example

In how many ways could a committee comprising 7 men and 6 women be formed
from a group of 9 men and 8 women?

Solution;

7 from 9 is 9C7 and 6 from 8 is 8C6

Ways = 9C7 x 8C6

= 9! / [7! (9 – 7)!] x 8! / [6! (8 – 6)!] = 36 x 28 = 1008

Example

A box contains 12 red, 6 white and 10 blue balls. If three balls are drawn at random
simultaneously, find the probability that

a) all are red b) 2 are red and 1 white

c) at least one is red d) 1 of each colour is drawn

Solution

a) P (all red) = (ways of selecting 3 of the 12 red balls) / (ways of selecting 3 out of the 12 + 6 + 10 = 28 balls)

= 12C3 / 28C3 = 220/3276 = 55/819

b) P (2 red and 1 white) = (ways of selecting 2 of the 12 red and 1 of the 6 white) / 28C3

= (12C2 x 6C1) / 28C3 = (66 x 6) / 3276 = 11/91

c) P (at least 1 red) = 1 – P (no red) = 1 – 16C3 / 28C3

= 1 – 560/3276 = 1 – 20/117 = 97/117

d) P (one of each) = (12C1 x 6C1 x 10C1) / 28C3

= (12 x 6 x 10) / 3276 = 20/91

Example

In a group of 6 boys and 4 girls, four are to be selected. In how many ways can they be
selected such that at least one boy should be there?

Solution Ways = (6C4) + (6C3 x 4C1) + (6C2 x 4C2) + (6C1 x 4C3) = 209

Example

There are 6 periods in each working day of a school. In how many ways can one
organize 5 subjects such that each subject is allowed at least one period?

Solution

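In 6 periods, the 5 subjects can be organized in 6P5 ways and the remaining 1 period can be
given to any one of the 5 subjects in 5P1 ways. The subject that is taught twice occupies two
periods, and swapping these two periods gives the same timetable, so this product counts every
timetable twice.

Total ways = (6P5 x 5P1) / 2! = (720 x 5) / 2 = 1800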

Example
Given a class of 12 girls and 10 boys.
(a) In how many ways can a committee of 5 consisting of 3 girls and 2 boys be chosen?
(b) What is the probability that a committee of five, chosen at random from the class, consists of
3 girls and 2 boys?
(c) How many of the possible committees of 5 have no boys?(i.e. consists only of girls)
(d) What is the probability that a committee of five, chosen at random from the class, consists
only of girls?

Solution
(a) First note that the order of the children in the committee does not matter. From the 12 girls we
can choose C (12, 3) different groups of three girls. From the 10 boys we can choose C (10, 2)
different groups of two boys. Thus, by the Fundamental Principle of Counting, the total number of committees
is
C (12, 3) × C (10, 2) = 12! / (3! 9!) × 10! / (2! 8!) = (12 × 11 × 10) / (3 × 2 × 1) × (10 × 9) / (2 × 1) = 220 × 45 = 9900
(b) The total number of committees of 5 is C (22, 5) = 26,334. Using part
(a), we find the probability that a committee of five will consist of 3 girls and
2 boys to be
C (12, 3) × C (10, 2) / C (22, 5) = 9900/26334 = 0.38
(c) The number of ways to choose 5 girls from the 12 girls in the class is
C (10, 0) × C (12, 5) = C (12, 5) = (12 × 11 × 10 × 9 × 8) / (5 × 4 × 3 × 2 × 1) = 792
(d) The probability that a committee of five consists only of girls is
C (12, 5) / C (22, 5) = 792/26334 = 0.03
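
Combination counts and the probabilities built from them can be checked with math.comb(n, r), which returns nCr. The variable names below are illustrative only.

```python
from fractions import Fraction
from math import comb

# Class of 12 girls and 10 boys, committee of 5.
three_girls_two_boys = comb(12, 3) * comb(10, 2)
all_committees = comb(22, 5)
print(three_girls_two_boys, all_committees)
print(round(three_girls_two_boys / all_committees, 3))   # 3 girls and 2 boys
print(round(comb(12, 5) / all_committees, 3))            # girls only

# Box of 12 red, 6 white and 10 blue balls, three drawn together.
p_all_red = Fraction(comb(12, 3), comb(28, 3))
p_at_least_one_red = 1 - Fraction(comb(16, 3), comb(28, 3))
print(p_all_red, p_at_least_one_red)
```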

DISCRETE PROBABILITY DISTRIBUTIONS

1 POISSON DISTRIBUTION

Situations occur when the variable under consideration is the number of occurrences of a
particular event in a given interval of time or space. Examples are the number of cars passing a point on a
road in an hour and the number of phone calls a person receives in a day. The distribution used to
model these scenarios is the Poisson probability distribution, defined by

P (X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, 3, . . . , where λ is the mean number of occurrences.

Example
The number of particles emitted per second by a radioactive source has a Poisson distribution
with mean 5. Calculate the probabilities of
(a) 0 (b) 1 (c) 2 (d) 3 or more emissions in a time interval of 1 second

Solution
X ~ Po (5), so P (X = x) = e^(−λ) λ^x / x! = e^(−5) 5^x / x!

(a) P (X = 0) = e^(−5) 5^0 / 0! = 0.0067 (b) P (X = 1) = e^(−5) 5^1 / 1! = 0.0337 (c) P (X = 2) = e^(−5) 5^2 / 2! = 0.0842

(d) P (X ≥ 3) = 1 – [P (X = 0) + P (X = 1) + P (X = 2)] = 1 – [0.0067 + 0.0337 + 0.0842] = 0.875
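
Rounding each term before adding can shift the final digit, so it can help to evaluate the Poisson formula in full precision and round once at the end. The function name poisson_pmf is an assumption for this sketch.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 5                                   # mean emissions per second
p0, p1, p2 = (poisson_pmf(x, lam) for x in range(3))
p_three_or_more = 1 - (p0 + p1 + p2)
print(round(p0, 4), round(p1, 4), round(p2, 4), round(p_three_or_more, 4))
```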

The Variance of Poisson distribution

Mean = µ = E(X) = 𝜆

Variance = σ² = Var(X) = λ

2 BINOMIAL DISTRIBUTION

A single trial has exactly two mutually exclusive outcomes, success (with probability p) and
failure (with probability q). A fixed number of trials, n, takes place, with each trial being
independent of the outcome of all the other trials.

Let X, the random variable representing the number of successes in the n trials of an experiment,
have the probability distribution given by

P (X = x) = nCx p^x q^(n−x) for x = 0, 1, 2, 3, . . . , n with q = 1 – p.

In short, X ~ B(n, p).

E.g. A card is selected at random from a standard pack of 52 playing cards. The suit of the card
is recorded and the card is replaced. This process is repeated to give a total of 16 selections,
and on each occasion the card is replaced in the pack before another selection is made.
Calculate the probability that

a) exactly five hearts occur in the 16 selections b) at least three hearts occur

Solution
P (heart) = P (success) = 13/52 = 1/4 (hearts are one of the four suits), P (failure) = 1 – 1/4 = ¾, hence X ~
B(16, 1/4)
a) P (X = 5) = 16C5 (1/4)^5 (3/4)^11 = 0.18

b) P (at least three hearts) = P (X ≥ 3) = 1 – P (X ≤ 2)

= 1 – [16C0 (1/4)^0 (3/4)^16 + 16C1 (1/4)^1 (3/4)^15 + 16C2 (1/4)^2 (3/4)^14] = 0.80

Example Given that Z ~ B(9, 0.4), calculate P (4 or 5) and P (Z ≥ 7).

Solution
a) P (4 or 5) = 9C4 (0.4)^4 (0.6)^5 + 9C5 (0.4)^5 (0.6)^4 = 0.2508 + 0.1672 = 0.418
b) P (Z ≥ 7) = 9C7 (0.4)^7 (0.6)^2 + 9C8 (0.4)^8 (0.6)^1 + 9C9 (0.4)^9 (0.6)^0 = 0.025
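
Binomial probabilities are tedious by hand, so a short numerical check is useful. The sketch below evaluates the binomial formula directly; binomial_pmf is an assumed helper name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hearts in 16 draws with replacement: X ~ B(16, 1/4).
p_five = binomial_pmf(5, 16, 0.25)
p_at_least_three = 1 - sum(binomial_pmf(x, 16, 0.25) for x in range(3))

# Z ~ B(9, 0.4).
p_four_or_five = binomial_pmf(4, 9, 0.4) + binomial_pmf(5, 9, 0.4)
p_seven_or_more = sum(binomial_pmf(z, 9, 0.4) for z in range(7, 10))
print(round(p_five, 3), round(p_at_least_three, 3),
      round(p_four_or_five, 3), round(p_seven_or_more, 3))
```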

3 Geometric Distribution
Consider an experiment with only two outcomes, success, with probability p, and failure, with probability q, where
p + q = 1. Let X be the number of independent trials needed to obtain the first success. The
probability mass function (pmf) of this discrete probability distribution is

P (X = x) = p q^(x−1) for x = 1, 2, 3, . . .

E(X) = µ = 1/p and Var(X) = q/p² = (1 – p)/p²

Example

In a certain producing process it is known that, on the average, 1 in every 100 items is
defective. What is the probability that the fifth item inspected is the first defective item found ?

Solution

Let x = 5, p = 1/100 and q = 99/100

P (X = 5) = (1/100)(99/100)^(5−1) = 0.0096, or about 0.01

4 Hyper Geometric Distribution

Sampling for this distribution is carried out without replacement and hence the repeated trials
are not independent. Consider a population of size N composed of two categories, good, R,
and defective, N – R, from which a sample of size n is drawn.

The probability that the sample contains x items of the first category (successes) is given by

P (X = x) = p(x) = RCx × (N–R)C(n–x) / NCn, for x = 0, 1, 2, . . . , n with 0 ≤ x ≤ R and 0 ≤ n – x ≤ N – R

E(X) = µ = nR/N and Var(X) = [(N – n)/(N – 1)] × n × (R/N) × (1 – R/N)

Example
A class in Statistics has 25 students, 15 males and 10 females. A committee of 5 students is to
be selected. What is the probability mass function for the number, X, of females on the
committee?
Calculate the mean and standard deviation.

Solution

N = 25, n = 5, R = 10, and N – R = 15

P (X = x) = p(x) = 10Cx × 15C(5–x) / 25C5, x = 0, 1, 2, 3, 4, 5

x      0      1      2      3      4      5
p(x)   0.057  0.257  0.385  0.237  0.059  0.005

µ = (5 x 10)/25 = 2

Var(X) = [(25 – 5)/(25 – 1)] × 5 × (10/25) × (1 – 10/25) = 1

Standard deviation = √1 = 1
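
The table of probabilities, and the mean and standard deviation, can be reproduced from the pmf itself rather than from the shortcut formulas. This is a sketch only; hypergeom_pmf is an assumed name.

```python
from math import comb, sqrt

def hypergeom_pmf(x, N, R, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items of which R are successes."""
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)

N, R, n = 25, 10, 5          # 25 students, 10 females, committee of 5
pmf = [hypergeom_pmf(x, N, R, n) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
print([round(p, 3) for p in pmf])
print(round(mean, 6), round(sqrt(variance), 6))
```

Computing the mean and variance from the pmf should agree with the formulas nR/N and [(N – n)/(N – 1)] × n × (R/N) × (1 – R/N).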

EXPECTATION OF RANDOM VARIABLES

The mean, µ, also called the expectation or expected value of a random variable X, is the long-run
average value of X over a large number of repetitions of the experiment. E(X) = µ = ∑ xi pi

Example
Find the expected values of the random variables X, Y and W which have the following
probability distributions
(i)    x           0     1     2     3     4
       P (X = x)   1/8   3/8   1/8   ¼     1/8
(ii)   y           -2    -1    0     1     2     3
       P (Y = y)   0.15  0.25  0.3   0.05  0.2   0.05
(iii)  w           1     2     3     4     5     6     7
       P (W = w)   0.1   0.2   0.1   0.2   0.1   0.2   0.1

VARIANCE OF A RANDOM VARIABLE

Variance = σ² = Var(X) = ∑ (xi – µ)² pi = ∑ (xi² pi) – µ²

Standard Deviation equals square root of the variance

Example
Calculate the standard deviation of the random variable W above.
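
Answers to exercises of this kind can be checked with a few lines of code. The sketch below applies the two formulas to the distribution of W; the function names expectation and variance are assumptions, and fractions are used so that the probabilities add up to exactly 1.

```python
from fractions import Fraction
from math import sqrt

def expectation(values, probs):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = sum of x_i^2 * p_i, minus mu^2."""
    mu = expectation(values, probs)
    return sum(x * x * p for x, p in zip(values, probs)) - mu ** 2

w = [1, 2, 3, 4, 5, 6, 7]
pw = [Fraction(k, 10) for k in (1, 2, 1, 2, 1, 2, 1)]
assert sum(pw) == 1                      # a valid distribution sums to 1
print(expectation(w, pw), variance(w, pw), round(sqrt(variance(w, pw)), 3))
```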

Expectation And Variance Of A Binomial Distribution


E(X) = np and Var(X) = npq = n p (1-p)
Example
Nails are sold in packets of 100. Occasionally a nail is faulty. The number of faulty nails in a
randomly chosen pack is denoted by X. Assuming that faulty nails occur independently and at
random, calculate the mean and standard deviation of X, given that the probability of any nail
being faulty is 0.04.

Solution
E(X) = np=100(0.04) = 4 and Var(X) = npq = 100(0.04)(0.96)=3.84

S.D = √3.84 = 1.96

CONTINUOUS PROBABILITY DISTRIBUTIONS

1 NORMAL (Gaussian) DISTRIBUTION

The notation X ~ N(µ, σ²) is used to denote a continuous variable which is normally
distributed with mean µ and variance σ².

The curve of the distribution can be written as

f (x) = [1 / (σ√(2π))] e^(−(x − µ)² / (2σ²)) for all real values of x.

The Standard Normal Distribution

We standardize variables so that one normal distribution table can be used for all normal
distributions. The standardized value, Z, is obtained from the value of the variable X as
Z = (X − µ)/σ, and Z ~ N(0, 1), that is, Z has mean 0 and variance 1.

In symbols, P (a ≤ Z ≤ b) = P (Z ≤ b) – P (Z ≤ a) = Φ(b) – Φ(a)

Example P (1.20≤Z≤2.34) = P (Z ≤ 2.34)-P (Z≤1.20)

= Φ(2.34)-Φ(1.20)=0.9904 – 0.8849 = 0.1055

Again, Φ(−z) = 1 – Φ(z)

Example Φ(−1.2) = 1 – Φ(1.2) = 1 – 0.8849 = 0.1151

Z = (X − µ)/σ allows you to change a statement about a N(µ, σ²) distribution into an equivalent statement
about a N(0, 1) distribution.

Consider finding the probability P (X ≤ 230) where X ~ N(205, 20²)

We use Z = (X − 205)/20

P (X ≤ 230) = P (Z ≤ (230 − 205)/20) = P (Z ≤ 1.25) = Φ(1.25) = 0.894

NB: P (Z > a) = 1 – P (Z ≤ a)
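
When no table is at hand, Φ can be evaluated through the error function in Python's math module, using the identity Φ(z) = ½ [1 + erf(z/√2)]. The names phi and normal_cdf are assumptions for this sketch, and the results may differ from four-figure table values in the last digit.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function Φ(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the standardized value Z."""
    return phi((x - mu) / sigma)

print(round(phi(2.34) - phi(1.20), 4))        # P(1.20 <= Z <= 2.34)
print(round(phi(-1.2), 4))                    # equals 1 - phi(1.2)
print(round(normal_cdf(230, 205, 20), 4))     # P(X <= 230) for N(205, 20^2)
```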

2 Exponential distributions

If events occur spontaneously with intensity λ (so that the number of spontaneous
events in any unit time interval is Po(λ) distributed), then the waiting time, T, between two
successive events is exponentially distributed, T ~ Exp(λ).

F(x) = P (T ≤ x) = 1 − e^(−λx) for x ≥ 0

E(T) = µ = 1/λ and Var(T) = 1/λ²

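Example
A device contains two electrical components, A and B. The lifespans of A and B are both
exponentially distributed, with expected values of five years and ten years respectively. The
device works as long as both components work. What is the expected lifespan of the device?

Solution

A: E(X) = 1/λA = 5, so λA = 1/5

B: E(X) = 1/λB = 10, so λB = 1/10

The device fails as soon as the first component fails, so its lifespan is the smaller of the two
lifespans. Assuming the components fail independently, the smaller of two exponential lifespans
is again exponential, with rate λA + λB = 1/5 + 1/10 = 3/10.

A and B both work: E(X) = 1/(λA + λB) = 10/3, or about 3.3 years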

3 Gamma distribution

It is related to the normal and exponential distributions.

For a positive integer n, Γ(n) = (n − 1)!

In general terms, for a positive number α, Γ(α) is defined as

Γ(α) = ∫ x^(α−1) e^(−x) dx, integrated over x from 0 to ∞, for α > 0

Note that for α = 1,

Γ(1) = ∫ e^(−x) dx, integrated over x from 0 to ∞, = 1

Γ(α + 1) = α Γ(α) for α > 0

Γ(1/2) = √π

A continuous random variable X is said to have a gamma distribution with parameters
α, λ > 0, denoted X ~ Gamma(α, λ), when its probability density function, PDF, is
given by

fX(x) = λ^α x^(α−1) e^(−λx) / Γ(α) for x > 0, and fX(x) = 0 otherwise

Letting α = 1, we have the exponential distribution

fX(x) = λ e^(−λx) for x > 0, and fX(x) = 0 otherwise

E(X) = α/λ and Var(X) = α/λ²
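
The properties of the gamma function listed in this section can be spot-checked numerically with math.gamma. The density function gamma_pdf below is an assumed helper written for the rate parameter λ, and the test values (5, 3.5, λ = 0.3) are arbitrary choices for illustration.

```python
from math import exp, factorial, gamma, isclose, pi, sqrt

def gamma_pdf(x, alpha, lam):
    """Density of Gamma(alpha, lam), with lam as a rate parameter."""
    if x <= 0:
        return 0.0
    return lam ** alpha * x ** (alpha - 1) * exp(-lam * x) / gamma(alpha)

print(isclose(gamma(5), factorial(4)))          # Γ(n) = (n − 1)!
print(isclose(gamma(0.5), sqrt(pi)))            # Γ(1/2) = √π
print(isclose(gamma(3.5), 2.5 * gamma(2.5)))    # Γ(α + 1) = α Γ(α)
# With α = 1 the gamma density reduces to the exponential density λ e^(−λx).
print(isclose(gamma_pdf(2.0, 1, 0.3), 0.3 * exp(-0.3 * 2.0)))
```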
