Probability and Statistics Notes
Statistics Definition:
Statistics is the branch of mathematics for collecting, analysing and interpreting data.
Statistics deals with every aspect of data. Statistical knowledge helps in choosing a proper
method of collecting data and in applying the correct analysis to the resulting samples so that
the results are sound. In short, statistics is a crucial process that helps us make decisions
based on data.
Characteristics of Statistics
The important characteristics of Statistics are as follows:
Importance of Statistics
The important functions of statistics are:
Statistics Example
An example of statistical analysis is determining the number of people in a town who watch TV
out of the total population of the town. Instead of asking everyone, we survey a small group of
people. This small group is called the sample, and it is taken from the population.
Types of Statistics
Descriptive Statistics
Inferential Statistics
1. Descriptive Statistics – Through graphs or tables, or numerical calculations,
descriptive statistics uses the data to provide descriptions of the population.
2. Inferential Statistics – Based on the data sample taken from the population,
inferential statistics makes the predictions and inferences.
Descriptive statistics
Descriptive statistics summarize and organize characteristics of a data set. A data set is a
collection of responses or observations from a sample or entire population.
In quantitative research, after collecting data, the first step of statistical analysis is to describe
characteristics of the responses, such as the average of one variable (e.g., age), or the relation
between two variables (e.g., age and creativity).
1. Frequency distribution
A data set is made up of a distribution of values, or scores. In tables or graphs, you can
summarize the frequency of every possible value of a variable in numbers or percentages. This is
called a frequency distribution.
For the variable of gender, you list all possible answers in the left-hand column. You count the
number or percentage of responses for each answer and display it in the right-hand column.
Gender    Number
Male      182
Female    235
Other     27
From this table, you can see that more women than men or people with another gender identity
took part in the study.
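As a minimal sketch, the frequency distribution above can be turned into percentages as follows. The counts are the ones from the Gender table; the variable names are illustrative only.

```python
# Counts taken from the Gender table above.
counts = {"Male": 182, "Female": 235, "Other": 27}

total = sum(counts.values())

# Percentage of responses for each answer.
percentages = {answer: 100 * n / total for answer, n in counts.items()}

# Print one row per answer: label, count, percentage.
for answer, n in counts.items():
    print(f"{answer:<8}{n:>5}{percentages[answer]:>8.1f}%")
```

The percentages carry the same information as the counts, but they make groups easier to compare when sample sizes differ.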
2. Measures of central tendency
Here we will demonstrate how to calculate the mean, median, and mode using the first 6
responses of our survey.
Mean
Median
Mode
3. Measures of variability
Measures of variability give you a sense of how spread out the response values are. The range,
standard deviation and variance each reflect different aspects of spread.
Range
The range gives you an idea of how far apart the most extreme response scores are. To find the
range, simply subtract the lowest value from the highest value.
Range: 24 – 0 = 24
Standard deviation
The standard deviation (s or SD) is the average amount of variability in your dataset. It tells you,
on average, how far each score lies from the mean. The larger the standard deviation, the more
variable the data set is.
There are six steps for finding the standard deviation:
From learning that s = 9.18, you can say that on average, each score deviates from the mean by
9.18 points.
Variance
The variance is the average of squared deviations from the mean. Variance reflects the degree of
spread in the data set. The more spread the data, the larger the variance is in relation to the mean.
To find the variance, simply square the standard deviation. The symbol for variance is s².
s = 9.18
s² = 84.3
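The six survey responses themselves are not reproduced in these notes. As a sketch only, the hypothetical scores below were chosen because they reproduce the summary values quoted in this section (mean 9.5, median 7.5, mode 3, range 24, s = 9.18, s² = 84.3). The calculation uses Python's standard statistics module.

```python
import statistics

# Hypothetical scores (an assumption): the original six responses are not listed in the notes.
scores = [15, 3, 12, 0, 24, 3]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability.
value_range = max(scores) - min(scores)
s = statistics.stdev(scores)        # sample standard deviation (divides by n - 1)
s2 = statistics.variance(scores)    # sample variance, equal to s squared

print(mean, median, mode, value_range, round(s, 2), round(s2, 1))
```

Note that stdev and variance use the sample formulas (n - 1 in the denominator); pstdev and pvariance are the population versions.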
Univariate descriptive statistics focus on only one variable at a time. It's important to examine
data from each variable separately using multiple measures of distribution, central tendency and
spread. Programs like SPSS and Excel can be used to easily calculate these.
N 6
Mean 9.5
Median 7.5
Mode 3
Variance 84.3
Range 24
If you were to only consider the mean as a measure of central tendency, your impression of the
"middle" of the data set can be skewed by outliers, unlike the median or mode.
If you've collected data on more than one variable, you can use bivariate or multivariate
descriptive statistics to explore whether there are relationships between them.
In bivariate analysis, you simultaneously study the frequency and variability of two variables to
see if they vary together. You can also compare the central tendency of the two variables before
performing further statistical tests.
Multivariate analysis is the same as bivariate analysis but with more than two variables.
Q1 What is Statistics?
Statistics is the branch of mathematics for collecting, analysing and interpreting data. Statistics
can be used to predict the future, determine the probability that a specific event will happen, or
help answer questions about a survey. Statistics is used in many different fields such as business,
medicine, biology, psychology and social sciences.
Graphical Representation
A graphical representation is a visual display of data and statistical results. It is a way of
analysing numerical data that shows the relation between data, ideas, information and concepts
in a diagram. It is easy to understand and is one of the most important learning strategies. The
appropriate type of graph depends on the kind of information in a particular domain.
There are different types of graphical representation. Some of them are as follows:
1. Bar Graph
2. Pie Chart
3. Line Graph
4. Pictograph
5. Histogram
6. Frequency Distribution
7. Stem and Leaf Plot
8. Scatter Plot
A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first digit
or digits) and a "leaf" (usually the last digit).
Example: [Figure: stem and leaf plot, not reproduced]
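A minimal sketch of how a stem and leaf plot can be built is given below. It assumes non-negative whole numbers and a single-digit leaf; the data values are the nine-value data set used in the worked examples elsewhere in these notes, and the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [17, 10, 9, 14, 13, 17, 12, 20, 14]
plot = stem_and_leaf(data)

# One row per stem, leaves listed in ascending order.
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Each printed row keeps every original value recoverable, which is what distinguishes this plot from a histogram.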
Advantages of graphical representation
Attractive and impressive: Graphs are generally more attractive and impressive than tables or
figures.
Simple and understandable presentation of data: Graphs help to present complex data in a simple
and understandable way. This saves time and energy for both the statistician and the observer.
Universal utility: Graphs can be used in all fields such as trade, economics, government
departments, advertisements, etc.
Helpful in predictions: Graphs make it easier to spot tendencies that could continue in the near
future.
Disadvantages of graphical representation
Lack of secrecy: A graph presents the information in full, which may defeat the objective of
keeping something confidential.
Errors and mistakes: Graphs are more complex to construct, so there are more chances of errors,
which can hinder understanding.
More time: Preparing a graphical representation takes more time than a normal report.
Box-Cox plot
The Box-Cox linearity plot is a plot of the correlation between Y and the transformed X for
given values of λ. That is, λ is the coordinate for the horizontal axis variable and the value of
the correlation between Y and the transformed X is the coordinate for the vertical axis of the
plot.
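The notes do not write out the transformation itself. The sketch below assumes the standard Box-Cox form, T(X) = (X^λ - 1)/λ for λ ≠ 0 and ln X for λ = 0, and computes the correlation that would be plotted on the vertical axis for a small grid of λ values. The data and the grid are hypothetical.

```python
import math

def box_cox(x, lam):
    """Standard Box-Cox transform (assumed form; requires x > 0)."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical data in which Y depends on X quadratically.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [x ** 2 for x in xs]

# Horizontal axis: lambda. Vertical axis: correlation between Y and transformed X.
lambdas = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]
correlations = {lam: pearson([box_cox(x, lam) for x in xs], ys) for lam in lambdas}
best_lambda = max(correlations, key=correlations.get)
```

The λ with the highest correlation suggests the transformation of X that makes its relationship with Y closest to linear.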
Measures that describe the centre of the data are measures of central tendency. These include the
mean, median and mode. Measures that describe the spread of the data are measures of dispersion.
These include the range, upper and lower quartiles, variance, and standard deviation.
The mean of a set of data is the sum of all values in a data set divided by the number of values in
the set.
It is also often referred to as an arithmetic average. The Greek letter "mu" (μ) is used as the
symbol for the population mean and the symbol x̄ is used to represent the mean of a sample. To
determine the mean of a data set, add up all the values and divide the sum by the number of values.
The median of a set of data is the "middle element" when the data is arranged in ascending order.
To determine the median:
Example:
Consider the data set: 17, 10, 9, 14, 13, 17, 12, 20, 14
Step 1: Put the data in order from smallest to largest. 9, 10, 12, 13, 14, 14, 17, 17, 20
Step 2: Determine the absolute middle of the data. 9, 10, 12, 13, [14], 14, 17, 17, 20
Since the number of data points is odd (9), choose the one in the very middle, which is the 5th
value. The median is 14.
The mode of a set of data is the value that occurs most frequently.
Example:
Consider the data set: 17, 10, 9, 14, 13, 17, 12, 20, 14
Step 1: Put the data in order from smallest to largest. 9, 10, 12, 13, 14, 14, 17, 17, 20
Step 2: Look for any number that occurs more than once. 9, 10, 12, 13, 14, 14, 17, 17, 20
Step 3: Determine which of those occur most frequently. 14 and 17 both occur twice, so the data
set has two modes: 14 and 17.
Example:
Consider the data set: 17, 10, 9, 14, 13, 17, 12, 20, 14
Step 1: Put the data in order from smallest to largest. 9, 10, 12, 13, 14, 14, 17, 17, 20
Step 2: Identify your maximum. 9, 10, 12, 13, 14, 14, 17, 17, [20]
Step 3: Identify your minimum. [9], 10, 12, 13, 14, 14, 17, 17, 20
Step 4: Subtract the minimum from the maximum. Range: 20 – 9 = 11
1. Find the mean of the data (μ if calculating for a population, or x̄ if using a sample).
6. Divide the sum from Step 4 by the number n (if calculating for a population) or n – 1 (if using
a sample). This will give you the variance.
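The calculations described above can be sketched by hand, without a statistics library, for the data set used in the examples. This is an illustration rather than the notes' own procedure; in particular, the intermediate steps for the variance are the usual ones (deviations from the mean, squared and summed).

```python
import math
from collections import Counter

data = [17, 10, 9, 14, 13, 17, 12, 20, 14]   # data set from the examples above
ordered = sorted(data)
n = len(ordered)

mean = sum(ordered) / n

# Median: middle element for odd n, average of the two middle elements for even n.
mid = n // 2
median = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode(s): every value that reaches the highest frequency.
freq = Counter(ordered)
top = max(freq.values())
modes = [value for value, count in freq.items() if count == top]

data_range = ordered[-1] - ordered[0]

# Variance: sum of squared deviations divided by n (population) or n - 1 (sample).
squared_deviations = sum((x - mean) ** 2 for x in ordered)
population_variance = squared_deviations / n
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)
```

Dividing by n - 1 rather than n gives a slightly larger variance, which compensates for estimating the mean from the same sample.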
Definition
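Probability can be defined as the ratio of the number of favorable outcomes to the total
number of outcomes of an event. For an experiment having 'n' number of outcomes, the
number of favorable outcomes can be denoted by x. The formula to calculate the probability of
an event is:
Probability(Event) = Favorable Outcomes / Total Outcomes = x/n
where x is the number of favorable outcomes and n is the total number of outcomes.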
Probability Tree
The tree diagram helps to organize and visualize the different possible outcomes. A tree has
two main parts: branches and ends. The probability of each branch is written on the branch,
whereas the ends contain the final outcomes. Tree diagrams are used to figure out when to
multiply and when to add probabilities.
[Figure: tree diagram for a coin toss, not reproduced]
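As a sketch of the multiply-and-add idea, the code below builds the ends of a two-level tree for two tosses of a coin. It assumes a fair coin; probabilities are multiplied along each path and then added across the ends that belong to the event of interest.

```python
from fractions import Fraction
from itertools import product

# One level of the tree: a fair coin (assumed) with the probability written on each branch.
branches = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Two tosses give a two-level tree; each end of the tree is one path such as ("H", "T").
ends = {}
for path in product(branches, repeat=2):
    probability = Fraction(1)
    for outcome in path:
        probability *= branches[outcome]   # multiply along a path
    ends[path] = probability

# Add the ends that belong to the event "exactly one head".
p_one_head = sum(p for path, p in ends.items() if path.count("H") == 1)
```

Using Fraction keeps the results exact, so the probabilities of all ends add up to exactly 1.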
Types of Probability
There are three major types of probabilities:
1. Theoretical Probability
2. Experimental Probability
3. Axiomatic Probability
1. Theoretical Probability
It is based on the possible chances of something to happen. The theoretical probability is
mainly based on the reasoning behind probability. For example, if a coin is tossed, the
theoretical probability of getting a head will be ½.
2. Experimental Probability
3. Axiomatic Probability
In axiomatic probability, a set of rules or axioms is laid down that applies to all types of
probability. These axioms were set out by Kolmogorov and are known as Kolmogorov's three
axioms. With the axiomatic approach to probability, the chances of occurrence or
non-occurrence of events can be quantified.
Sample Space:
All the possible outcomes of an experiment together constitute a sample space. For
example, the sample space of tossing a coin is {head, tail}.
Event: A set of outcomes of a random experiment, that is, a subset of the sample space,
is called an event.
Equally Likely Events: Events that have the same chances or probability of occurring
are called equally likely events. For example, when we toss a fair coin, there are equal
chances of getting a head or a tail.
Exhaustive Events: When the set of all outcomes of a collection of events is equal to
the sample space, we call the events exhaustive.
Mutually Exclusive Events: Events that cannot happen simultaneously are called
mutually exclusive events. For example, the weather can be either hot or cold at a given
moment. We cannot experience both simultaneously.
Events in Probability
In probability theory, an event is a set of outcomes of an experiment or a subset of the sample
space. If P(E) represents the probability of an event E, then, we have,
P(E) = 0 if and only if E is an impossible event.
0 ≤ P(E) ≤ 1.
Suppose, we are given two events, "A" and "B", then the probability of event A, P(A) > P(B)
if and only if event "A" is more likely to occur than the event "B". Sample space(S) is the set
of all of the possible outcomes of an experiment and n(S) represents the number of outcomes
in the sample space.
P(E) = n(E)/n(S)
P(E') = (n(S) - n(E))/n(S) = 1 - (n(E)/n(S))
Calculating Probability
In an experiment, the probability of an event is the possibility of that event occurring. The
probability of any event is a value between (and including) "0" and "1". Follow the steps
below for calculating probability of an event A:
Step 1: Find the sample space of the experiment and count the elements. Denote it by n(S).
Here are some examples that well describe the process of finding probability.
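Example 1: Find the probability of getting a number less than 5 when a die is rolled by
using the probability formula.
Solution
To find: the probability of getting a number less than 5.
Sample space, S = {1, 2, 3, 4, 5, 6}
Therefore, n(S) = 6
Let A be the event of getting a number less than 5. Then A = {1, 2, 3, 4}
So, n(A) = 4
Using the probability equation,
P(A) = n(A)/n(S)
P(A) = 4/6
P(A) = 2/3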
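Example 2: What is the probability of getting a sum of 9 when two dice are thrown?
Solution:
There are 6 × 6 = 36 possible outcomes in total. To get the desired outcome i.e., 9, we can
have the following favorable outcomes: (3, 6), (4, 5), (5, 4) and (6, 3).
Therefore, P(sum of 9) = 4/36 = 1/9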
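Both dice examples can be checked by enumerating the sample space and applying P(E) = n(E)/n(S). The sketch below assumes fair dice, so that all outcomes are equally likely; the helper name probability is illustrative.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

one_die = list(range(1, 7))
two_dice = list(product(one_die, repeat=2))   # 36 ordered pairs

p_less_than_5 = probability(lambda face: face < 5, one_die)
p_sum_9 = probability(lambda roll: sum(roll) == 9, two_dice)
p_not_sum_9 = 1 - p_sum_9   # complement rule
```

Enumeration like this only works when the sample space is finite and its outcomes are equally likely.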
Probability Theorems
The following theorems of probability are helpful to understand the applications of
probability and also perform the numerous calculations involving probability.
Theorem 1: The sum of the probability of happening of an event and not happening of an
event is equal to 1. P(A) + P(A') = 1.
Theorem 4: The probability of happening of any event always lies between 0 and 1, both
inclusive. 0 ≤ P(A) ≤ 1
Theorem 5: If there are two events A and B, we can apply the formula of
the union of two sets and we can derive the formula for the probability of happening of event
A or event B as follows.
P(A∪B) = P(A) + P(B) - P(A∩B)
Also for two mutually exclusive events A and B, we have P(A∪B) = P(A) + P(B)
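The union formula can be verified on a small example. The sketch below assumes one roll of a fair die, and the events A, B, C and D are hypothetical choices made for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed)
A = {2, 4, 6}                       # hypothetical event: an even number
B = {4, 5, 6}                       # hypothetical event: a number greater than 3

def p(event):
    """Probability of an event as n(E) / n(S)."""
    return Fraction(len(event), len(sample_space))

# General addition rule: subtract the overlap so it is not counted twice.
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)

# Mutually exclusive events have an empty intersection, so the last term vanishes.
C, D = {1}, {2}
exclusive_lhs = p(C | D)
exclusive_rhs = p(C) + p(D)
```

Python's set operators | and & correspond directly to the union and intersection in the formula.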
For example, let us assume that there are three bags, each containing some blue, green, and
yellow balls. What is the probability of picking a yellow ball from the third bag? Since there
are also blue and green balls, we can arrive at the probability based on these conditions.
Such a probability is called conditional probability.
P(A1) + P(A2) + P(A3) + … + P(An) = 1
For example,
If we toss a fair coin, the probability of getting a head is 1/2. If we toss it 50 times, the
expected number of heads is 25. We call this the theoretical or expected frequency of
heads. But in practice, by tossing a coin, we may get 25, 30 or 35 heads, which we call the
observed frequency.
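The difference between expected and observed frequency can be illustrated with a small simulation. This is only a sketch: the seed value is arbitrary and the observed count will differ from run to run if the seed is changed.

```python
import random

tosses = 50
p_head = 0.5
expected_heads = tosses * p_head   # theoretical (expected) frequency

rng = random.Random(42)            # fixed seed (arbitrary) so the run is repeatable
observed_heads = sum(rng.random() < p_head for _ in range(tosses))

print(expected_heads, observed_heads)
```

As the number of tosses grows, the observed proportion of heads tends to move closer to the theoretical probability.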
Binomial Definition
The Latin prefix "bi-" means "two", the root "nom" means name, and the suffix "-ial" means
"of or relating to". The literal translation of the word binomial is "of or relating to two
names."
The algebraic expression which contains only two terms is called binomial. A binomial in a
single variable x can be written as
ax^m + bx^n
where a and b are numbers, and m and n are distinct non-negative integers. x takes the
form of an indeterminate or a variable.
In Laurent polynomials, binomials are expressed in the same manner; the only
difference is that the exponents m and n can be negative.
Examples of Binomial
4x² + 5y²
xy² + xy
0.75x + 10y²
x + y
x² + 3
Binomial distribution
The binomial distribution is the discrete probability distribution of the number of successes
in a series of trials, each of which has only two possible results: Success or Failure.
For example, a coin toss has only two possible outcomes, heads or tails, and taking a test
could have two possible outcomes, pass or fail.
Properties of Binomial Distribution
The properties of the binomial distribution are:
There are two possible outcomes: true or false, success or failure, yes or no.
There is a fixed number 'n' of independent, repeated trials.
The probability of success or failure remains the same for each trial.
Only the number of successes is counted out of n independent trials.
Every trial is an independent trial, which means the outcome of one trial does not
affect the outcome of another trial.
Binomial Distribution Mean and Variance
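For a binomial distribution with n trials, probability of success p and probability of failure
q = 1 – p, the mean, variance and standard deviation of the number of successes are
represented using the formulas
Mean, μ = np
Variance, σ² = npq
Standard deviation, σ = √(npq)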
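As a sketch, the binomial probabilities and the summary measures can be computed as below. The probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) is the standard one and is not written out in these notes; the values n = 10 and p = 0.5 are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5   # hypothetical example: 10 tosses of a fair coin
q = 1 - p

mean = n * p
variance = n * p * q
std_dev = math.sqrt(variance)

p_five_heads = binomial_pmf(5, n, p)
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```

Summing the probabilities over k = 0 to n should give 1, which is a quick sanity check on the function.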
Negative Binomial Distribution
The following quick examples help in a better understanding of the concept of the negative
binomial distribution.
Suppose we flip a coin repeatedly and count the number of heads (successes). If we continue
flipping the coin until it has landed 2 times on heads, we are conducting a negative binomial
experiment. The negative binomial random variable is the number of coin flips required to
achieve 2 heads. In this example, the number of coin flips is a random variable that can take
on any integer value between 2 and plus infinity.
[Table: negative binomial probability distribution for this example, not reproduced]
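The probabilities for the coin example can be sketched as follows. The formula used, P(X = n) = C(n - 1, r - 1) p^r (1 - p)^(n - r), is the standard one for the trial on which the r-th success occurs; it is not written out in these notes, and the coin is assumed to be fair.

```python
import math

def negative_binomial_pmf(n, r, p):
    """P(the r-th success occurs on trial n), with success probability p per trial."""
    if n < r:
        return 0.0
    return math.comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

r, p = 2, 0.5   # two heads with a fair coin (fairness is an assumption)

# Probability that the second head arrives on flip 2, 3, ..., 7.
table = {n: negative_binomial_pmf(n, r, p) for n in range(2, 8)}

for n, prob in table.items():
    print(n, prob)
```

The probabilities never reach exactly zero as n grows, which reflects the fact that the number of flips has no upper bound.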
Properties of Negative Binomial Distribution
A negative binomial distribution is a distribution that has the following properties.
A person is seeking new employment that is both challenging and fulfilling. What is
the probability that he will quit zero times, one time, two times so on until he finds
his ideal job?
A pharmaceutical company is designing a new drug to treat a certain disease that will
have minimal side effects. What is the probability that zero drugs fail the test, one
drug fails the test, two drugs fail the test and so on until they have designed the ideal
drug?
Poisson distribution
A Poisson distribution is a discrete probability distribution. It gives the probability of an
event happening a certain number of times (k) within a given interval of time or space.
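The Poisson distribution has only one parameter, λ (lambda), which is the mean number of
events. Its probability mass function is
P(X = k) = (λ^k × e^(–λ)) / k!
where k is the number of times the event occurs, λ is the mean number of events in the
interval, and e is Euler's number (approximately 2.718).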
Example 1: In a cafe, customers arrive at a mean rate of 2 per minute. Find the
probability of arrival of 5 customers in 1 minute using the Poisson distribution formula.
Solution:
Given: λ = 2, and x = 5.
P(X = 5) = (2^5 × e^(–2)) / 5! = (32 × 0.1353) / 120
P(X = 5) = 0.036
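The cafe example can be checked with a few lines of code. This is a minimal sketch using only the standard library; the function name poisson_pmf is illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Cafe example: mean rate of 2 customers per minute, probability of exactly 5 arrivals.
p_five = poisson_pmf(5, 2)
print(round(p_five, 3))

# The probabilities over all k sum to 1; the first 50 terms are already very close.
partial_sum = sum(poisson_pmf(k, 2) for k in range(50))
```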
Normal distribution
Normal distribution, also known as the Gaussian distribution, is a probability distribution that is
symmetric about the mean, showing that data near the mean are more frequent in occurrence than
data far from the mean. In graphical form, the normal distribution appears as a "bell curve".
The normal distribution follows the following formula. Note that only the values of the mean (μ)
and standard deviation (σ) are necessary.
f(x) = (1 / (σ√(2π))) × e^(–(x – μ)² / (2σ²))
Take height as an example: most people are close to the average height. Taller and shorter
people exist, but with decreasing frequency in the population. According to the empirical rule,
99.7% of all people will fall within +/- three standard deviations of the mean, or between 154 cm
(5' 0") and 196 cm (6' 5"). Those taller and shorter than this would be quite rare (just 0.15% of
the population each).
Normal Distribution Problems and Solutions
Question 1: Calculate the probability density function of normal distribution using the following
data. x = 3, μ = 4 and σ = 2.
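Solution: Given, x = 3,
Mean = 4 and
Standard deviation = 2
Substituting into the normal density formula,
f(3) = (1 / (2√(2π))) × e^(–(3 – 4)² / (2 × 2²)) = 0.1995 × e^(–0.125) ≈ 0.176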
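Question 1 and the empirical rule can both be checked numerically. The sketch below uses only the standard library; the cumulative probability is written in terms of the error function, which is a standard identity and not something stated in these notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def normal_cdf(x, mu, sigma):
    """P(X <= x), written with the error function from the standard library."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Question 1: x = 3, mean = 4, standard deviation = 2.
density = normal_pdf(3, 4, 2)

# Empirical rule: share of a normal population within three standard deviations.
within_3_sd = normal_cdf(3, 0, 1) - normal_cdf(-3, 0, 1)

print(round(density, 3), round(within_3_sd, 3))
```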
Applications
The normal distributions are closely associated with many things such as:
Gamma distribution
The gamma distribution is a two-parameter family of continuous probability distributions,
usually defined by a shape parameter and either a scale parameter or an inverse scale (rate)
parameter. It is related to the normal distribution, exponential distribution, chi-squared
distribution and Erlang distribution. 'Γ' denotes the gamma function.
Gamma distributions have two free parameters, named alpha (α) and beta (β), where:
α = Shape parameter
β = Scale parameter
The scale parameter β is used only to scale the distribution. This can be understood by remarking
that wherever the random variable x appears in the probability density, it is divided by β.
Since the scale parameter provides the dimensional data, it is seldom useful to work with the
"standard" gamma distribution, i.e., with β = 1.
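The density itself is not written out in these notes. As a sketch, the code below assumes the common shape-scale form f(x) = x^(α - 1) e^(-x/β) / (β^α Γ(α)) for x > 0, in which x appears divided by β as described above.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed shape-scale form), x > 0."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

# With alpha = 1 the gamma distribution reduces to an exponential distribution with mean beta.
value = gamma_pdf(1.0, alpha=1, beta=2)
exponential_value = (1 / 2) * math.exp(-1.0 / 2)

# Changing beta only rescales the curve: f(x; alpha, beta) = f(x / beta; alpha, 1) / beta.
scaled = gamma_pdf(3.0, alpha=2.5, beta=2.0)
standard = gamma_pdf(3.0 / 2.0, alpha=2.5, beta=1.0) / 2.0
```

If a rate (inverse scale) parameter is used instead, replace β with 1/rate throughout.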
Hypothesis Testing
Hypothesis testing is the process of using data to test a hypothesis about a population. A
hypothesis is a statement about a population parameter. For example, the statement that the
population mean equals 5 can serve as a null hypothesis. A test statistic is a number calculated
from the sample that is used for testing the hypothesis.
Types
The three major types of hypotheses are:
1. Null Hypothesis (H0): Represents the default assumption, stating that there is no
significant effect or relationship in the data.
2. Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific
effect or relationship that researchers want to investigate.
3. Non-directional Hypothesis: An alternative hypothesis that doesn't specify the direction
of the effect, leaving it open for both positive and negative possibilities.
Example
Let's consider a hypothesis test for the average height of women in the United States. Suppose
our null hypothesis is that the average height is 5'4". We gather a sample of 100 women and
determine that their average height is 5'5". The standard deviation of the population is 2".
z = (x̄ – μ0) / (σ / √n)
z = (5'5" – 5'4") / (2" / √100)
z = 1 / 0.2
z = 5
We will reject the null hypothesis as the z-score of 5 is very large and conclude that there is
evidence to suggest that the average height of women in the US is greater than 5'4".
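The test statistic for this example can be computed as sketched below, with the heights converted to inches (5'5" = 65, 5'4" = 64). The one-sided p-value and the 0.05 significance level are additions for illustration; the notes themselves only judge the size of z.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a sample mean when the population standard deviation is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Heights in inches: 5'5" = 65, 5'4" = 64.
z = one_sample_z(sample_mean=65, mu0=64, sigma=2, n=100)

# One-sided p-value P(Z >= z) under the null hypothesis.
p_value = 0.5 * math.erfc(z / math.sqrt(2))
reject_null = p_value < 0.05   # 0.05 is an assumed significance level

print(z, p_value, reject_null)
```

Working in a single unit (inches) avoids mistakes when subtracting heights written in feet and inches.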
Estimation
Estimation is any of numerous procedures used to calculate the value of some property of a
population from observations of a sample drawn from the population.
In other words, estimation is the process by which one makes inferences about a population, or
obtains the values of unknown population parameters, based on information obtained from a
sample.
Types of Estimation
Estimators are of two different types:
1. Point Estimation
2. Interval Estimation
Point Estimates
A point estimate is a sample statistic calculated using the sample data to estimate the most likely
value of the corresponding unknown population parameter. In other words, we derive a single
value from the sample and use it to estimate the population value. For the mean, this is
x̄ = Σx/n
For example,
62 is the average mark (x̄) achieved by a sample of 15 students randomly selected from a class
of 150 students, and it is taken as an estimate of the mean mark of the entire class. Since it is
a single numeric value, it is a point estimate.
The basic drawback of point estimates is that no information is available regarding their
reliability. In fact, it is very unlikely that a single sample statistic is exactly equal to the
population parameter.
Interval Estimates
A confidence interval estimate is a range of values constructed from sample data so that
the population parameter will likely occur within the range at a specified probability.
Accordingly, the specified probability is the level of confidence.
Broader and probably more accurate than a point estimate
Used with inferential statistics to develop a confidence interval – where we believe with a
certain degree of confidence that the population parameter lies.
Any parameter estimate that is based on a sample statistic has some amount of sampling
error.
In statistics, interval estimation uses sample data to calculate an interval of possible values of an
unknown population parameter.
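A minimal sketch of an interval estimate for a population mean is shown below. It assumes the population standard deviation is known and uses the normal distribution; the sample mean of 62 and the sample size of 15 come from the point estimate example in these notes, while sigma = 8 is a hypothetical value, since no standard deviation is given.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval estimate for a population mean when sigma is treated as known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Point estimate from the notes: mean mark 62 from 15 students.
# sigma = 8 is a hypothetical value; the notes do not give a standard deviation.
lower, upper = z_confidence_interval(sample_mean=62, sigma=8, n=15)

print(round(lower, 2), round(upper, 2))
```

With a sample this small and an unknown standard deviation, a t-based interval would normally be preferred; the z-based version is used here only to keep the sketch short.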
Properties of Estimation
We use sample measures to estimate the population measures; these statistics are the estimators.
Following are the properties of good estimators.
Variable
A variable is any characteristic, number, or quantity that can be measured or counted. A variable
may also be called a data item.
Quantitative
A quantitative variable is a variable that reflects a notion of magnitude, that is, if the values it can
take are numbers. A quantitative variable represents thus a measure and is numerical.
Quantitative variables are divided into two types: discrete and continuous. The difference
is explained in the following two sections.
1. Discrete
Quantitative discrete variables are variables for which the values they can take are countable
and have a finite number of possibilities. The values are often (but not always) integers. Here
are some examples of discrete variables:
Even if it would take a long time to count the citizens of a large country, it is still technically
doable. Moreover, for all examples, the number of possibilities is finite. Whatever the number of
children in a family, it will never be 3.58 or 7.912 so the number of possibilities is a finite
number and thus countable.
2. Continuous
On the other hand, quantitative continuous variables are variables for which the values are not
countable and have an infinite number of possibilities. For example:
Age
Weight
Height
For simplicity, we usually refer to years, kilograms (or pounds) and centimeters (or feet and
inches) for age, weight and height respectively. However, a 28-year-old man could actually be
28 years, 7 months, 16 days, 3 hours, 4 minutes, 5 seconds, 31 milliseconds, 9 nanoseconds old.
For all measurements, we usually stop at a standard level of granularity, but nothing (except our
measurement tools) prevents us from going deeper, leading to an infinite number of potential
values. The fact that the variable can take an infinite number of possible values makes it
uncountable.
Qualitative
In opposition to quantitative variables, qualitative variables (also referred to as categorical
variables or factors in R) are variables that are not numerical and whose values fit into categories.
In other words, a qualitative variable is a variable which takes as its values modalities, categories
or even levels, in contrast to quantitative variables which measure a quantity on each individual.
Qualitative variables are divided into two types: nominal and ordinal.
Nominal
A qualitative nominal variable is a qualitative variable where no ordering is possible or implied
in the levels.
For example, the variable gender is nominal because there is no order in the levels (no matter
how many levels you consider for the gender—only two with female/male, or more than two
with female/male/ungendered/others, levels are unordered). Eye color is another example of a
nominal variable because there is no order among blue, brown or green eyes.
A nominal variable can have two levels (e.g., do you smoke? Yes/No, or are you pregnant? Yes/No),
or a large number of levels (what is your college major? Each major is a level in that case).
Note that a qualitative variable with exactly 2 levels is also referred to as a binary or
dichotomous variable.
Ordinal
On the other hand, a qualitative ordinal variable is a qualitative variable with an order implied in
the levels. For instance, if the severity of road accidents has been measured on a scale such as
light, moderate and fatal accidents, this variable is a qualitative ordinal variable because there is
a clear order in the levels.
Another good example is health, which can take values such as poor, reasonable, good, or
excellent. Again, there is a clear order in these levels so health is in this case a qualitative ordinal
variable.
Variable transformations
Let's say we are interested in babies' ages. The data collected is the age of the babies, so a
quantitative continuous variable. However, we may work with only the number of weeks since
birth, thus transforming the age into a discrete variable. The variable age remains a
quantitative continuous variable but the variable we are working on (i.e., the number of weeks
since birth) can be seen as a quantitative discrete variable.
Let's say we are interested in the Body Mass Index (BMI). For this, a researcher collects data on
height and weight of individuals and computes the BMI. The BMI is a quantitative continuous
variable but the researcher may want to turn it into a qualitative variable by categorizing
individuals below a certain threshold as underweight, above a certain threshold as
overweight and the rest as normal weight. The raw BMI is a quantitative continuous
variable but the categorization of the BMI makes the transformed variable a qualitative (ordinal)
variable, where the levels are in this case underweight < normal weight < overweight.
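Both transformations can be sketched in code. The thresholds 18.5 and 25 below are commonly used cut-offs that are assumed here for illustration; the notes only speak of "a certain threshold". The function names are likewise illustrative.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Ordered levels of the transformed (ordinal) variable.
LEVELS = ["underweight", "normal weight", "overweight"]

def bmi_category(value, lower=18.5, upper=25.0):
    """Map a continuous BMI to an ordinal level; the thresholds are assumed cut-offs."""
    if value < lower:
        return LEVELS[0]
    if value > upper:
        return LEVELS[2]
    return LEVELS[1]

def age_in_weeks(age_in_days):
    """Turn a continuous age (in days) into a discrete count of completed weeks."""
    return int(age_in_days // 7)

print(bmi_category(bmi(70, 1.75)), age_in_weeks(45.5))
```

Keeping the levels in an ordered list preserves the ordering underweight < normal weight < overweight, which is what makes the new variable ordinal rather than nominal.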