Lecture 5 - Probability
Lecture 5
Probability Distributions
Sampling
PROBABILITY DISTRIBUTIONS
Uniform Distribution
[Figure: discrete uniform distribution; each outcome 1, 2, 3, 4, 5, 6 has probability 1/6]
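As a minimal sketch of the uniform distribution above, the probabilities can be written out exactly, assuming the six outcomes are the faces of a fair die (the variable names are illustrative only):

```python
from fractions import Fraction

# Outcomes shown on the slide.
outcomes = [1, 2, 3, 4, 5, 6]

# Discrete uniform: every outcome gets the same probability 1/n.
pmf = {x: Fraction(1, len(outcomes)) for x in outcomes}

total = sum(pmf.values())                   # probabilities must sum to 1
mean = sum(x * p for x, p in pmf.items())   # expected value of one throw

print(pmf[3], total, mean)
```

Using Fraction keeps the arithmetic exact, so the check that the probabilities sum to 1 does not depend on floating-point rounding.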
Binomial Distribution (1)
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd
edition). Dubuque, IA: McGraw-Hill Education, p. 156-157.
Binomial Distribution (2)
• In fact, in most cases, we are not even bothered about which items are defective or have over 50% fat
content.
• We only want to know the number with this characteristic.
Binomial Distribution - A Sample of 2
Outcome F,F,F: (0.8)(0.8)(0.8) = 0.512 = $q^3$ = P(0 successes)
$$P(r) = \binom{n}{r} p^{r} (1 - p)^{n - r}$$
This gives the probability of r successes in n trials
Provided that the binomial conditions are met
Tables of values are also available
or spreadsheets can be used to do the calculations
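The binomial formula, P(r) = C(n, r) p^r (1 - p)^(n - r), can also be evaluated in a few lines of code instead of tables. The sketch below is one possible way to do it; the function name `binomial_pmf` and the values n = 3, p = 0.2 (so q = 0.8) are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Assumed illustration: n = 3 trials, success probability p = 0.2 (q = 0.8).
n, p = 3, 0.2
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

for r, prob in enumerate(probs):
    print(f"P({r}) = {prob:.3f}")
```

The probabilities for r = 0, 1, ..., n should add up to 1, which is a quick sanity check on any table or spreadsheet calculation.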
Binomial Distribution - Example
• What is the probability of throwing a six six times in a row with a fair die?
• n = 6
• r = 6
• p = 1/6
• q = 1 - p = 5/6
$$P(6) = \binom{6}{6}\left(\frac{1}{6}\right)^{6}\left(1 - \frac{1}{6}\right)^{6-6} = \frac{1}{46656} \approx 0.0000214$$
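The arithmetic for this example can be checked exactly. The following is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, r = 6, 6            # six throws, six sixes
p = Fraction(1, 6)     # probability of a six on one throw
q = 1 - p              # probability of anything else: 5/6

# Binomial formula: C(n, r) * p^r * q^(n - r)
prob = comb(n, r) * p**r * q ** (n - r)

print(prob, float(prob))
```

Because r = n, the factor q^(n - r) equals 1 and the result reduces to (1/6)^6 = 1/46656, roughly 0.0000214.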
Discrete Probability Distributions and Function Names in Excel
• Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 162.
Basic Concepts Of Normal Distribution
• It is a symmetrical distribution
Why Is It important?
• occurs in nature
• occurs in production situation
• is the basis of sampling theory
• hence is the basis for the underlying usefulness of
statistics to allow us to draw implications about the
whole population from the results of a sample
Probability Density Function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$
where π = 3.14159... and e = 2.71828...
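The density function can be coded directly from its definition. The sketch below is illustrative only: the function name `normal_pdf` and the values µ = 100, σ = 10 are assumptions, and the result is compared against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(1/2) * ((x - mu) / sigma)^2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 100.0, 10.0                      # assumed values for illustration
peak = normal_pdf(mu, mu, sigma)             # the density is highest at x = mu
reference = NormalDist(mu, sigma).pdf(mu)    # standard-library value

print(peak, reference)
```

At x = µ the exponent is zero, so the peak height is simply 1 / (σ√(2π)).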
The Shape of the Normal Distribution
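[Figure: normal curve with 90, µ and 110 marked on the horizontal axis]
Why symmetrical? Let µ = 100. Suppose x = 110. Now suppose x = 90.
$$\left(\frac{110-100}{\sigma}\right)^{2} = \left(\frac{10}{\sigma}\right)^{2} \qquad \left(\frac{90-100}{\sigma}\right)^{2} = \left(\frac{-10}{\sigma}\right)^{2}$$
Both squared terms are equal, so f(110) = f(90).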
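The symmetry argument can be checked numerically. This is a sketch with µ = 100 as in the text; σ = 5 is an assumed value, since the argument holds for any σ.

```python
from statistics import NormalDist

mu = 100.0
sigma = 5.0                     # assumed; any positive value gives the same conclusion
dist = NormalDist(mu, sigma)

# Equal heights at equal distances either side of the mean.
left, right = dist.pdf(90), dist.pdf(110)

# Equal tail areas follow from the same symmetry.
area_below_90 = dist.cdf(90)
area_above_110 = 1 - dist.cdf(110)

print(left, right, area_below_90, area_above_110)
```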
Finding Normal Probabilities
$$z = \frac{X - \mu}{\sigma}$$
Fortunately there are tables of values for areas
under the Standard Normal Distribution
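In place of printed tables, the area under the standard normal curve can be computed. The sketch below standardises a value and looks up the cumulative probability; µ = 100, σ = 10 and x = 110 are assumed values for illustration, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 10.0                        # assumed values
z = z_score(110.0, mu, sigma)

p_below = NormalDist().cdf(z)                  # area to the left of z, standard normal
p_direct = NormalDist(mu, sigma).cdf(110.0)    # same area without standardising

print(z, round(p_below, 4), round(p_direct, 4))
```

Both routes give the same probability, which is why one table for the standard normal distribution is enough for every normal distribution.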
Probabilities
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 220.
Sampling
• Elements of sampling
• population
• sampling framework
• random number generator
• one vs. more samples
Source: Doane, D. P., Seward, L. E. (2016). Applied Statistics in Business and Economics. New York, NY:
McGraw-Hill Education, p. 293.
Types of Sampling
• Probability Sampling
• This is where every item has a calculable chance of selection
• Non-probability Sampling
• This is where someone has some choice in who or what is selected
• This means that some people or organisations have zero chance of selection
Types of Random Sample
• Systematic Sample
• Stratified Sampling
• uses information that we already have to try to make sure the sample
reflects the population
• Clusters
• Multi-stage Designs
• only practical method for national surveys
Simple Random Sample (1)
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 222.
Simple Random Sample (2)
• Subjects in the population are sampled by a random process, using either a random
number generator or a random number table, so that each person remaining in the
population has the same probability of being selected for the sample.
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 222.
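A simple random sample as described above can be drawn with a random number generator. The sketch below assumes a hypothetical sampling frame of 100 named subjects and a fixed seed so that the draw is repeatable.

```python
import random

rng = random.Random(42)   # fixed seed, assumed here only for reproducibility

# Hypothetical sampling frame: a complete list of the population.
population = [f"person_{i:03d}" for i in range(1, 101)]

# Draw 10 subjects without replacement; at each draw every remaining
# subject is equally likely to be chosen.
sample = rng.sample(population, k=10)

print(sample)
```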
Stratified Random Sampling (2)
• This sampling procedure separates the population into mutually exclusive sets (strata), and
then draws simple random samples from each stratum.
Stratified Random Sampling (3)
• a stratum is a subset of the population that shares at least one common characteristic (males
and females, or managers and non-managers...)
• the researcher first identifies the relevant strata and their actual representation in the
population
• random sampling is then used to select a sufficient number of subjects from each stratum
• often used when one or more of the strata in the population have a low incidence relative to
the other strata
• stratified sampling can reduce cost per observation and narrow the error bounds (reduces
sampling error)
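The steps above can be sketched as follows. This is one possible implementation with proportional allocation, which is an assumption (other allocation rules exist); the function `stratified_sample` and the manager / non-manager population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, rng):
    """Split units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one unit per stratum.
        k = max(1, round(n * len(members) / len(units)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

rng = random.Random(0)
# Hypothetical population: 20 managers and 180 non-managers.
population = [("manager", i) for i in range(20)] + \
             [("non-manager", i) for i in range(180)]

sample = stratified_sample(population, lambda u: u[0], n=20, rng=rng)
managers = sum(1 for u in sample if u[0] == "manager")
print(managers, len(sample) - managers)
```

With rounding, the stratum sizes need not add up to exactly n in general; here they do (2 managers and 18 non-managers). The low-incidence stratum is guaranteed to appear, which a simple random sample of 20 would not guarantee.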
Stratified Random Sampling (4)
• each stratum
• the relationships among strata.
• Advantages:
• It guarantees that the population subdivisions of interest are represented in the sample.
• The estimates of parameters produced from stratified random sampling have greater precision than
estimates obtained from simple random sampling.
Stratified Random Sample: An Example
Cluster Sampling (1)
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 223.
Cluster Sampling (2)
http://www.youtube.com/watch?v=QOxXy-I6ogs&feature=related
Stratified vs. Cluster Sampling (1)
Source: Jaggia, S., & Hawke, A. K. (2020). Essentials of Business Statistics: Communicating with Numbers (2nd edition).
Dubuque, IA: McGraw-Hill Education, p. 223.
Stratified vs. Cluster Sampling (2)
Stratified: The population is divided into homogeneous segments, and then the sample is randomly taken from the segments.
Cluster: The members of the population are selected at random, from naturally occurring groups called 'clusters'.
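For contrast with stratified sampling, the sketch below shows a common one-stage form of cluster sampling: whole clusters are chosen at random and every member of a chosen cluster is included. The one-stage design and the city-block data are assumptions made for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical population: 8 naturally occurring clusters (city blocks),
# each containing 5 households.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(5)]
    for b in range(8)
}

# Randomly select 2 whole clusters ...
chosen = rng.sample(sorted(clusters), k=2)

# ... and include every household in the chosen clusters.
cluster_sample = [unit for c in chosen for unit in clusters[c]]

print(chosen, len(cluster_sample))
```

In stratified sampling every group contributes some units; here only the selected groups contribute, but they contribute all of their units.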
Systematic Sampling
• Select every kth item from a list or sequence (e.g., restaurant customers)
• Systematic sampling is quick and convenient when you have a complete list of the
members of your population (for example, members of Congress). However, if there is
some kind of pattern in the original list, then bias may creep into your statistics.
• For example, if a list of people is ordered as MFMFMFMF, then choosing every 10th person will
give you a sample consisting entirely of females.
Source: Doane, D. P., Seward, L. E. (2016). Applied Statistics in Business and Economics. New York, NY:
McGraw-Hill Education, p. 37.
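The MFMFMFMF example above can be reproduced in code. This is a minimal sketch; the helper `systematic_sample` and the list of 100 people are assumptions, and in practice the starting position would usually be chosen at random.

```python
def systematic_sample(items, k, start):
    """Take every k-th item, beginning at 0-based index `start`."""
    return items[start::k]

# List ordered M, F, M, F, ... (100 people in total).
people = ["M", "F"] * 50

# Every 10th person (the 10th, 20th, 30th, ...): always an even position.
every_10th = systematic_sample(people, k=10, start=9)

# An odd step does not line up with the M/F pattern.
every_3rd = systematic_sample(people, k=3, start=2)

print(every_10th)
print(sorted(set(every_3rd)))
```

The step of 10 is a multiple of the pattern length 2, so the sample repeats the same position in the pattern; a step of 3 alternates between the two.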
Multistage Sampling
• Quota Sample
• most frequently used, especially in market research
• again uses information we already have about the population in order for the sample to
reflect this
• Judgmental Sampling
• Snowball Sampling
• Convenience Sampling
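Quota sampling, as described in the list above, uses known population proportions but does not select at random. The sketch below is one assumed way to picture it: respondents are accepted in order of arrival until each group's quota is full. The function `quota_sample`, the quotas and the arrivals are all hypothetical.

```python
def quota_sample(arrivals, quotas, group_of):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break   # all quotas are full
    return chosen

# Hypothetical stream of respondents: (group, id) in order of arrival.
arrivals = [("F", 1), ("F", 2), ("M", 3), ("F", 4), ("M", 5), ("M", 6), ("F", 7)]

picked = quota_sample(arrivals, {"F": 2, "M": 2}, lambda p: p[0])
print(picked)
```

The sample matches the quotas, yet later arrivals have no chance of selection once the quotas are filled, which is why this is a non-probability method.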
Quota Sampling
• Convenience Sampling
• used in exploratory research
• subjects are selected because they are convenient to reach
• the first available primary data source is used without additional requirements (e.g., Facebook polls).
• Judgment (Purposive) Sampling
• common nonprobability method
• researcher selects the sample based on judgment
• most effective in situations where only a restricted number of people in a population have the qualities that a
researcher expects from the target population
• extension of convenience sampling
• the researcher must be confident that the chosen sample is truly representative of the entire population
• Snowball Sampling
• used when the desired sample characteristic is rare
• relies on referrals from initial subjects to generate additional subjects
• may introduce bias
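The referral mechanism behind snowball sampling can be sketched as a walk through a referral network. The function `snowball_sample` and the names in the network are hypothetical, and a real study would stop on other criteria than a fixed size.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from initial subjects and follow their referrals."""
    sampled = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:      # do not recruit anyone twice
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral network: who each subject can refer.
referrals = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["ana", "eve"],
    "zed": ["yan"],
}

sample = snowball_sample(["ana"], referrals, max_size=5)
print(sample)
```

Subjects who are not connected to the initial subjects ("zed" and "yan" here) can never be reached, which illustrates how the method may introduce bias.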
Other Non-Random Sampling Methods
Sampling Methods: An Overview
Sources of Error: Sampling Bias vs. Sampling Error
• In sampling, the word bias does not refer to prejudice. Rather, it refers to a systematic
tendency to over- or underestimate a population parameter of interest.
• The word error generally refers to issues / characteristics of sample methodology that lead
to inaccurate estimates of a population parameter.
Source: Doane, D. P., Seward, L. E. (2016). Applied Statistics in Business and Economics. New York, NY:
McGraw-Hill Education, p. 41.
Sampling Error and Survey Bias (1)
• Two major types of error can arise when a sampling procedure or data collection is
performed.
• Sampling Error
• Sampling error refers to differences between the sample and the population, because of the specific
observations that happen to be selected.
• Sampling error is expected to occur when making a statement about the population based on the sample taken.
• Non-sampling Error
• Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a
sample.
• Increasing sample size will not reduce this type of error.
• Non-sampling errors have the potential to cause bias in polls, surveys or samples.
• There are three types of non-sampling errors:
• errors in data acquisition
• non-response errors
• selection bias.
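The difference between the two types of error can be illustrated by simulation. In the sketch below, the population (normal with mean 100 and standard deviation 15), the sample sizes and the biased frame are all assumptions chosen for illustration.

```python
import random
from statistics import fmean

rng = random.Random(7)

# Hypothetical population of 20,000 values.
population = [rng.gauss(100, 15) for _ in range(20_000)]
mu = fmean(population)   # the population parameter we want to estimate

def avg_abs_error(frame, n, reps=200):
    """Average distance between a sample mean and mu over repeated samples."""
    return fmean(abs(fmean(rng.sample(frame, n)) - mu) for _ in range(reps))

# Sampling error: shrinks as the sample size grows.
small = avg_abs_error(population, 25)
large = avg_abs_error(population, 400)

# Non-sampling error (selection bias): the frame misses all values of 90 or below.
biased_frame = [x for x in population if x > 90]
biased_large = avg_abs_error(biased_frame, 400)

print(round(small, 2), round(large, 2), round(biased_large, 2))
```

The larger sample reduces the error from the complete frame, but the error from the biased frame stays large however many observations are taken.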