Comm 215.MidtermReview

The document outlines the topics and format that will be covered on the Business Statistics midterm exam, including multiple choice questions, problems, and the use of concepts like descriptive statistics, probability, z-scores, and sampling.

The midterm exam will cover topics like descriptive statistics, probability, binomial distributions, z-scores, and data mining through multiple choice questions and problems. The problems will focus on areas like descriptive statistics, probability charts, and z-scores.

The different types of sampling methods discussed are probability sampling (such as simple random, stratified, cluster, and systematic sampling), convenience sampling, voluntary response sampling, and judgement sampling. Only probability sampling can be used to make statistical inferences.

Comm 215: Business Statistics
Midterm Review

Robert Anthony Pereira


Midterm Outline
• 20 to 25 multiple choice questions
• 10 easy, 10 medium, 5 hard
• Data mining accounts for about 5 questions, on either the midterm or the final
• 4 to 5 problems
• One on descriptive statistics (stem leaf, box and whiskers, Pareto)
• One on probability (probability chart)
• One on probability and binomial distributions
• One on z-scores (may include binomial distributions)
Descriptive Statistics
Definitions
• Statistics: the science that deals with the collection, classification, analysis, and
interpretation of numerical facts or data, and that, by use of mathematical theories of probability,
imposes order and regularity on aggregates of more or less disparate elements.
• Two types of statistics
• Descriptive Statistics: the science of describing the important aspects of a set
of measurements
• Inferential Statistics: the science of using samples to draw conclusions about
the population
Definitions
• Population: the set of all elements about which we wish to draw
conclusions.
• Sample: a subset of the elements of a population
• Statistical Inference: the science of using a sample of
measurements to make generalizations about the important aspects
of a population of measurements.
Definitions
• Random sample: when every element of a population has the same
chance of being in the sample
• Sample with replacement: once we pick an element from the
population, we put it back before picking again
• Sample without replacement: once an element is picked it can not be
picked again
Definitions: Type of sampling
• Probability sampling: select randomly, based on probability
1) Simple random (randomly selecting from a population)
2) Stratified (dividing the population into subgroups, then randomly selecting elements from each subgroup)
3) Cluster (dividing the population into clusters, randomly selecting some clusters, then sampling within the selected clusters)
4) Systematic (picking every nth element)
• Convenience sampling: select elements based on convenience, they were easy to
pick
• Voluntary response sampling: only selecting people who want to be selected
• Judgement Sampling: when someone selects a sample based on his or her own
judgement

***Only probability sampling may be used to make statistical inferences


Definitions: Ethical Issues
1) Improper sampling: purposely selecting a biased sample, selecting a
specific sample in order to get a desired response
2) Misleading charts, graphs and descriptive measures: manipulating
graphs or data to minimize or maximize any part of the gathered
information
3) Inappropriate statistical analysis or inappropriate interpretation of
statistical results : continuing to sample a population until a selected
sample has the desired characteristics
Definitions: types of quantitative variables
• Quantitative: represents quantities
• Ratio variable: a quantitative variable measured on a scale such that the
ratios of its values are meaningful and there is an inherently defined zero
value
• Interval variable: a quantitative variable where ratios of its value are not
meaningful and there is not an inherently defined zero.
Simplified definitions
Ratio variable: the values are mathematically comparable (value A can be half of value B)
and 0 means none of the quantity: weighing 0 pounds means there is no weight
Interval variable: ratios of the values are not meaningful and 0 is just an arbitrary
point on the scale. Think of temperature: 60 degrees F is not twice as hot
as 30 degrees F, and 0 degrees does not mean there is no temperature
Definitions
• Response rate: the proportion of all the people we attempted to contact
who actually responded to a survey.
• Target population: the entire population of interest to us in a particular
study.
• Sample frame: a list of sampling elements from which the sample will be
selected.
• Sampling error: the difference between a numerical descriptor of the
population and the corresponding descriptor of the sample.
• Undercoverage: when some population elements are excluded from the
process of selecting the sample.
• Nonresponse: whenever some of the individuals who were supposed to be
included in the sample are not.
Frequency Distributions
• Frequency Distribution: a table that summarizes the number (or frequency) of items in each of
several non-overlapping classes.
• Relative Frequency = $\frac{\text{frequency of the class}}{n}$
• Cumulative Relative Frequency = the relative frequency of a class added to the relative
frequencies of all the classes before it
• Sample: 1,10,14,16,17,22,26,29
Classes    Frequency   Relative Frequency   Cumulative Relative Frequency
[0-11[     2           2/8 = .25            .25
[11-21[    3           3/8 = .375           .25+.375 = .625
[21-31[    3           3/8 = .375           .625+.375 = 1
Total      8           1
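The table above can be rebuilt with a short Python sketch. The function name `frequency_table` and the way classes are written as (low, high) pairs are assumptions made for this illustration; each class includes its lower bound and excludes its upper bound, matching the [0-11[ notation.

```python
sample = [1, 10, 14, 16, 17, 22, 26, 29]
# Classes from the slide: [0-11[, [11-21[, [21-31[
classes = [(0, 11), (11, 21), (21, 31)]

def frequency_table(data, bins):
    """Return rows of (class, frequency, relative freq, cumulative relative freq)."""
    n = len(data)
    rows = []
    cumulative = 0.0
    for low, high in bins:
        freq = sum(1 for x in data if low <= x < high)  # lower bound included
        rel = freq / n
        cumulative += rel  # running total of the relative frequencies
        rows.append(((low, high), freq, rel, cumulative))
    return rows

table = frequency_table(sample, classes)
for (low, high), freq, rel, cum in table:
    print(f"[{low}-{high}[  {freq}  {rel:.3f}  {cum:.3f}")
```

The last cumulative relative frequency should always come out to 1, which is a quick way to check a table built by hand.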
Making classes
Step 1: Find the number of classes using $2^K > N$, K being the number of
classes and N the number of values (start at K=1, and go up until $2^K > N$)
Step 2: Find the class length using $\frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$
Step 3: Using the numbers found in steps 1 and 2, make the classes
Step 4: Count the number of elements in each class
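Steps 1 and 2 can be sketched in Python as follows. The function names are made up for this illustration, and the data is the twelve-student hours sample used in the stem and leaf example of this review.

```python
def number_of_classes(n):
    """Smallest K such that 2**K > n (Step 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_length(data):
    """(largest value - smallest value) / number of classes (Step 2)."""
    k = number_of_classes(len(data))
    return (max(data) - min(data)) / k

hours = [40, 25, 35, 30, 22, 43, 35, 24, 38, 10, 8, 15]
k = number_of_classes(len(hours))   # 2**4 = 16 > 12, so K = 4
length = class_length(hours)        # (43 - 8) / 4 = 8.75
print(k, length)
```

In practice the class length is usually rounded up to a convenient number before building the classes in Step 3.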
Histogram and Bar chart
• Histogram: the bars touch each other and it is used for quantitative
values. (Percentage, frequency, salaries)

• Bar chart: the bars do not touch and it is used for qualitative values. (eye
color)
Stem and Leaf graph (example on slide 29)
1) Place all numbers in order
2) Choose a stem unit, if one is not given
3) Find the leaf unit, leaf unit = stem unit divided by 10
- Equivalently, the stem unit is equal to leaf unit x10
4) Remember each leaf can only be a single digit
5) Build the graph
Other types of Charts
• Pareto chart
• A histogram or bar chart with a line showing cumulative frequency drawn
over it
• Pie Chart
• A circle with a slice colored for each class or type of value, with the size of
the slice depending on the frequency of the class or value
• Dot Plot
1) Draw a straight line
2) Write all the possible values in order on that line
3) Place 1 dot above a value for every time it appears in the sample
Describing Central Tendency
• Mean, median, mode
• All three are used to determine the distribution of a population or
sample
• Mean (µ for a population, $\bar{x}$ for a sample)
• Mean is the average: $\mu = \frac{\sum x}{N}$, $\bar{x} = \frac{\sum x}{n}$
• Median
• Median is the middle number or 50th percentile
• Mode
• Mode is the most frequent number
Types of distributions
• Normal/Bell Shaped/Symmetric
• when the mean, median and mode are all equal
• Most common type for this class
• Skewed to the left (Mean < Median)
• Skewed to the right (Mean > Median)

The words left and right indicate where the tail is. (tail is the skinny part)
Other less common types of Distribution
• Binomial

• Uniform
Standard deviation and variance
• Standard deviation measures the typical distance between the individual
values of a population or sample and the mean
• StD of a population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
• StD of a sample: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

• Variance is the same formula as standard deviation but without the square root
Coefficient of Variation
• Coefficient of variation indicates how much a population varies in
percentage, which allows us to compare one population to another
• Coefficient of variation = $\frac{\text{standard deviation}}{\text{mean}} \times 100$
• A lower coefficient of variation means that most of the values are
closer to the mean, which is normally considered better

• The higher the CV, the wider the distribution will be


Empirical rule
• Only useful for Normal/Bell Shaped/Symmetric

• 68.26% of the values lie between µ-σ and µ+σ
• 95.44% of the values lie between µ-2σ and µ+2σ
• 99.73% of the values lie between µ-3σ and µ+3σ
Chebyshev’s rule (for very skewed or non-normal distribution)
• The formula, Percentage = $100\left(1 - \frac{1}{K^2}\right)$
• K is the number of standard deviations
Two Options
1) Number of standard deviations (K) = Given
Probability = Not Given

Plug in and solve

Example: K = 2
Probability = $100\left(1 - \frac{1}{K^2}\right) = 100\left(1 - \frac{1}{2^2}\right) = 75\%$

2) Number of standard deviations (K) = Not Given
Probability = Given

Work backward to solve for K

Example on next slide
Chebyshev’s Rule Solve for K
Probability = 86%
Number of standard deviations (K) = Not Given
Probability = $100\left(1 - \frac{1}{K^2}\right)$
1) Plug in the probability
$86 = 100\left(1 - \frac{1}{K^2}\right)$
2) Divide both sides by 100
$\frac{86}{100} = \frac{100\left(1 - \frac{1}{K^2}\right)}{100}$, $.86 = 1 - \frac{1}{K^2}$
3) Bring over the 1
$.86 - 1 = -\frac{1}{K^2}$, $-.14 = -\frac{1}{K^2}$
4) Cancel the negatives
$-.14 = -\frac{1}{K^2}$, $.14 = \frac{1}{K^2}$
5) Cross multiply
$.14 = \frac{1}{K^2}$, $.14K^2 = 1$
6) Divide both sides by .14
$\frac{.14K^2}{.14} = \frac{1}{.14}$, $K^2 = \frac{1}{.14}$, $K^2 = 7.1428$
7) Square root both sides
$\sqrt{K^2} = \sqrt{7.1428}$
K = 2.6726
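Both directions of Chebyshev's rule can be checked with a small Python sketch. The function names are assumptions for this illustration; the second function is just the seven algebra steps above collapsed into one line.

```python
import math

def chebyshev_percentage(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    return 100 * (1 - 1 / k ** 2)

def chebyshev_k(percentage):
    """Number of standard deviations needed to cover at least the given percentage."""
    # From percentage/100 = 1 - 1/K^2, so K^2 = 1 / (1 - percentage/100)
    return math.sqrt(1 / (1 - percentage / 100))

print(chebyshev_percentage(2))       # option 1: K given
print(round(chebyshev_k(86), 4))     # option 2: percentage given
```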
Percentile
• Once all the numbers are placed in increasing order, the percentile
formula finds the value such that the given percentage of the population
or sample is less than or equal to it
• Percentile formula
• $i = \frac{\text{percentile}}{100} \times n$
• i is not the value, it is the position of the number that represents that given
percentile
• If i is a decimal, round up and take the number in that position
• If i is a whole number, take the average of the number in that position and the
number following it
Percentile examples
Ex.: Find the 30th percentile from the following data set: 1, 3, 4, 5, 6, 6,
6, 8, 8, 9.
$i = \frac{30}{100} \times 10 = 3$ (3rd value in ascending order)
The 30th percentile is $\frac{4+5}{2} = 4.5$. (Since i was a whole number, take the
average of the 3rd and 4th values)
Ex.: Find the 32nd percentile from the following data set: 1, 3, 4, 5, 6, 6,
6, 8, 8, 9.
$i = \frac{32}{100} \times 10 = 3.2$ (round up to the 4th value in ascending order)
The 32nd percentile is 5.
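The percentile rule from these slides can be sketched in Python as below. The function name `slide_percentile` is made up for this illustration, and it assumes a whole-number percentile strictly between 0 and 100. Statistical libraries often use other interpolation rules by default, so their answers may differ slightly from this course's method.

```python
def slide_percentile(data, p):
    """Percentile using the rule from the slides (assumes integer 0 < p < 100)."""
    ordered = sorted(data)
    n = len(ordered)
    # i = (p / 100) * n, kept in integer arithmetic to avoid rounding issues
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th values (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is a decimal: round up and take the value in that position (1-based)
    return ordered[whole]

data = [1, 3, 4, 5, 6, 6, 6, 8, 8, 9]
print(slide_percentile(data, 30))   # i = 3, average of 3rd and 4th values
print(slide_percentile(data, 32))   # i = 3.2, rounds up to the 4th value
```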
5 Number Summary or box plot
1) The Median (50th percentile)
2) The first Quartile (25th percentile)
3) The Third Quartile (75th percentile)
4) The smallest number in the dataset
5) The largest number in the dataset
Box and whiskers
1) Find the median or 50th percentile
2) Find the first quartile/25th percentile and the third quartile/75th
percentile
3) Find the Interquartile Range IQR= Q3-Q1
4) Find the fences
1) Lower fence = Q1 – (1.5 x IQR)
2) Upper Fence = Q3 + (1.5 x IQR)
5) Draw the Box and Whiskers
• The whiskers should extend to the fences if there are outliers beyond them, or to the
largest and smallest values if there are none
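The fence calculation in step 4 can be sketched in Python. The function name `fences` is an assumption for this illustration; the quartiles plugged in (Q1 = 18.5, Q3 = 36.5) are the ones found for the hours-worked sample in the stem and leaf example of this review.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

hours = [8, 10, 15, 22, 24, 25, 30, 35, 35, 38, 40, 43]
lower, upper = fences(18.5, 36.5)   # IQR = 18
outliers = [x for x in hours if x < lower or x > upper]
print(lower, upper, outliers)
```

Any value outside the fences is treated as an outlier; for this sample the list comes back empty, so the whiskers run to the smallest and largest values.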
Stem and Leaf Example
• A researcher has obtained the number of hours worked per week
during the summer for a sample of twelve students. The results are
shown below: 40 25 35 30 22 43 35 24 38 10 8 15
• a. Construct a stem-and-leaf display for the number of hours worked
per week.
• b. Provide the five numbers required in order to draw the boxplot.
• c. Compute the mean and standard deviation of the distribution.
• d. Determine the 40th percentile of the data set.
A) Construct a stem-and-leaf display for the
number of hours worked per week.
• 40 25 35 30 22 43 35 24 38 10 8 15
Step 1: Place elements in ascending order
8,10,15,22,24,25,30,35,35,38,40,43
Step 2: Pick a leaf or stem unit
Stem unit = 10, Leaf unit = 1
0 | 8
1 | 0 5
2 | 2 4 5
3 | 0 5 5 8
4 | 0 3
B) Provide the five numbers required in order
to draw the boxplot.
8,10,15,22,24,25,30,35,35,38,40,43
• Boxplot is the same as a 5 number summary
• 1-First quartile (25th percentile)
• $i = \frac{\text{percentile}}{100} \times n = \frac{25}{100} \times 12 = 3$
• 3 is a whole number, take the average of the 3rd and 4th value
• 25th percentile is equal to $\frac{15+22}{2} = 18.5$
• 2-Second quartile (50th percentile)
• $i = \frac{\text{percentile}}{100} \times n = \frac{50}{100} \times 12 = 6$
• 6 is a whole number, take the average of the 6th and 7th value
• 50th percentile is equal to $\frac{25+30}{2} = 27.5$
• 3-Third quartile (75th percentile)
• $i = \frac{\text{percentile}}{100} \times n = \frac{75}{100} \times 12 = 9$
• 9 is a whole number, take the average of the 9th and 10th value
• 75th percentile is equal to $\frac{35+38}{2} = 36.5$
• 4- largest value = 43
• 5- smallest value = 8
C) Compute the mean and standard deviation of the distribution.
• 8,10,15,22,24,25,30,35,35,38,40,43
• Mean $\bar{x} = \frac{\sum x}{n} = \frac{8+10+15+22+24+25+30+35+35+38+40+43}{12}$
• Mean = 27.0833
• Standard Deviation $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

x     (x-xbar)^2           result
8     (8-27.0833)^2        364.1723
10    (10-27.0833)^2       291.8391
15    (15-27.0833)^2       146.0061
22    (22-27.0833)^2       25.8399
24    (24-27.0833)^2       9.5067
25    (25-27.0833)^2       4.3401
30    (30-27.0833)^2       8.5071
35    (35-27.0833)^2       62.6741
35    (35-27.0833)^2       62.6741
38    (38-27.0833)^2       119.1743
40    (40-27.0833)^2       166.8411
43    (43-27.0833)^2       253.3413
Sum                        1514.9162

• $s = \sqrt{\frac{1514.9162}{12-1}} = 11.7354$
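A hand calculation like this one is easy to cross-check with Python's standard `statistics` module. This is only a verification sketch, not the method expected on the exam; `stdev` uses the sample formula (divides by n-1) and `pstdev` uses the population formula (divides by N).

```python
import statistics

hours = [40, 25, 35, 30, 22, 43, 35, 24, 38, 10, 8, 15]

mean = statistics.mean(hours)                  # sum of the values divided by n
sum_sq = sum((x - mean) ** 2 for x in hours)   # sum of squared deviations
sample_sd = statistics.stdev(hours)            # sample formula, divides by n - 1
population_sd = statistics.pstdev(hours)       # population formula, divides by N

print(round(mean, 4), round(sum_sq, 4), round(sample_sd, 4), round(population_sd, 4))
```

Small differences in the last decimals compared with a hand calculation usually come from rounding the mean before squaring.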
D) Determine the 40th percentile of the data set.

• 8,10,15,22,24,25,30,35,35,38,40,43
• $i = \frac{\text{percentile}}{100} \times n$
• $i = \frac{40}{100} \times 12 = 4.8$
• 4.8 has a decimal, round up and take the 5th number in the order
• The 40th percentile is equal to 24
Probability
Probability
• Probability: the chance or percentage of an outcome occurring
• Probability of any event = P(A) = $\frac{\text{all desired outcomes}}{\text{all possible outcomes}}$
• Probability of flipping heads = $\frac{\text{desired outcome}}{\text{possible outcomes}} = \frac{H}{H+T} = \frac{1}{2} = .50$

• The rule of complements


• The complement P(A’) is the probability that A does not happen
• Since all probabilities add up to one
• 1-P(A)=P(A’)
• P(A’)+P(A)=1
Intersection and union
• Intersection, P(A∩B): the probability that both events A and B will
occur simultaneously.
• Event A: rolling a die and getting an even number, {2,4,6} out of {1,2,3,4,5,6} = $\frac{3}{6}$ = .5
• Event B: rolling a die and getting a number less than 3, {1,2} out of {1,2,3,4,5,6} = $\frac{2}{6}$ = .3333
• P(A∩B) = $\frac{\text{desired outcomes they have in common}}{\text{all possible outcomes}}$, {2} out of {1,2,3,4,5,6} = $\frac{1}{6}$ = .1667
• Union, P(A∪B): the probability that A or B (or both) will occur.
• Event A: rolling a die and getting an even number, {2,4,6} out of {1,2,3,4,5,6} = $\frac{3}{6}$ = .5
• Event B: rolling a die and getting a number less than 3, {1,2} out of {1,2,3,4,5,6} = $\frac{2}{6}$ = .3333
• P(A∪B) = $\frac{\text{desired outcomes in either A or B}}{\text{all possible outcomes}}$, {1,2,4,6} out of {1,2,3,4,5,6} = $\frac{4}{6}$ = .6667
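The die example maps directly onto Python sets, where `&` is the intersection and `|` is the union. The helper name `prob` is an assumption for this sketch, and it relies on every outcome being equally likely.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {x for x in outcomes if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in outcomes if x < 3}        # less than 3: {1, 2}

def prob(event):
    """Desired outcomes over all possible outcomes (equally likely outcomes)."""
    return Fraction(len(event), len(outcomes))

p_intersection = prob(A & B)   # outcomes in both A and B: {2}
p_union = prob(A | B)          # outcomes in A or B or both: {1, 2, 4, 6}
print(p_intersection, p_union)
```

Counting the union directly gives the same answer as adding P(A) and P(B) and subtracting the intersection.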
Theoretical example
• From this slide on the following population will be referenced
• 1= Female, 12, likes bacon
• 2= Female, 25, likes bacon
• 3= Female, 42, likes bacon
• 4= Female, 58, likes bacon
• 5= Male, 11, likes bacon
• 6 = Male, 15, likes bacon
• 7 = Male, 27, dislikes bacon
• 8 = Male, 32, dislikes bacon
• 9 = Male, 49, dislikes bacon
• 10= Male, 51, dislikes bacon
The addition rule
• The addition rule
• P(A∪B)=P(A)+P(B)-P(A∩B)
• Example
• P(A) = someone liking bacon = $\frac{6}{10}$ = .6
• P(B) = someone being over 40 = $\frac{4}{10}$ = .4
• P(A∩B) = someone being over 40 and liking bacon = $\frac{2}{10}$ = .2
• P(A∪B) = someone liking bacon or being over 40 = .6+.4-.2 = .8
• If we did not subtract the people who are over 40 and like bacon, we would
double count persons 3 and 4, since they are counted in both event A and event B
Mutually exclusive events
• Mutually exclusive events: no possibility of events A and B occurring
at the same time
• P(A∩B) = 0
• Example
• P(A) = being female = $\frac{4}{10}$ = .4
• P(B) = not liking bacon = $\frac{4}{10}$ = .4
• P(A∩B) = being female and not liking bacon = $\frac{0}{10}$ = 0
• Addition rule for two mutually exclusive events
• P(A∪B)=P(A)+P(B)
• P(A∪B) = being female or not liking bacon = .4+.4 = .8
Conditional Probability
• Conditional probability is the probability of event A happening given that
event B has already happened, denoted as P(A|B)
• P(A|B) = $\frac{P(A \cap B)}{P(B)}$
• Example: the probability of someone not liking bacon given that the person
is male
• P(B) = being male = $\frac{6}{10}$ = .6
• P(A∩B) = being male and not liking bacon = $\frac{4}{10}$ = .4
• P(A|B) = not liking bacon given being male = $\frac{.4}{.6}$ = .6667
• Same as checking how many males out of all males do not like bacon = $\frac{4}{6}$ = .6667
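The bacon example can be reproduced by counting over the ten-person population. The tuple layout and the helper names in this Python sketch are assumptions made for illustration only.

```python
from fractions import Fraction

# (sex, age, likes_bacon) for persons 1 to 10 of the theoretical example
population = [
    ("F", 12, True), ("F", 25, True), ("F", 42, True), ("F", 58, True),
    ("M", 11, True), ("M", 15, True), ("M", 27, False), ("M", 32, False),
    ("M", 49, False), ("M", 51, False),
]

def prob(event):
    """Share of the population for which event(person) is true."""
    return Fraction(sum(1 for person in population if event(person)), len(population))

def is_male(person):
    return person[0] == "M"

def dislikes_bacon(person):
    return not person[2]

p_b = prob(is_male)                                               # P(B)
p_a_and_b = prob(lambda p: is_male(p) and dislikes_bacon(p))      # P(A and B)
p_a_given_b = p_a_and_b / p_b                                     # P(A | B)
print(p_b, p_a_and_b, p_a_given_b)
```

Multiplying P(B) by P(A|B) gives back the intersection, which is a handy way to check a conditional probability.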
General multiplication Rule
• P(A∩B)=P(B)P(A|B)
• Useful to have more than one way to find an intersection
• Can be used to check for errors
• Example
• P(B) = being male = $\frac{6}{10}$ = .6
• P(A|B) = not liking bacon given being male = $\frac{.4}{.6}$ = .6667
• P(A∩B) = P(B)P(A|B) = (.6)(.6667) = .4
• Same as checking who is male and does not like bacon = $\frac{4}{10}$ = .4
Independent events
• Independent events are two events whose probabilities have no effect
on each other
• Two events are considered independent if and only if
• P(A|B)=P(A) or
• P(B|A)=P(B)
• Example:
• Flipping a coin once and getting heads, has no effect on the outcome of a
second flip
• The probability on the second flip is still 50/50
Multiplication Rule for two independent
Events
• P(A∩B)=P(A)P(B)
• Example
• P(A) = probability of flipping heads = $\frac{1}{2}$
• P(B) = probability of rolling a six = $\frac{1}{6}$
• P(A∩B) = probability of rolling a six and flipping heads = $\frac{1}{2} \times \frac{1}{6} = \frac{1}{12}$
Example: Two percent (2%) of the customers of a store buy cigars. Half of the
customers who buy cigars buy beer. 25 percent who buy beer buy cigars. What is the
probability that the customer who buys beer will buy cigar?
            Beer          No Beer       Total
Cigars      .01           .01           .02
No Cigars                               1-.02=.98
Total                                   1

• 25% who buy beer buy cigars, P(A|B) = .25
• P(A∩B), people who buy cigars and beer = .01
• P(B), people who buy beer = ?
• P(A|B) = $\frac{P(A \cap B)}{P(B)}$, $.25 = \frac{.01}{?}$, ? = .04

            Beer          No Beer       Total
Cigars      .01           .01           .02
No Cigars   .04-.01=.03   .96-.01=.95   .98
Total       .04           1-.04=.96     1
Discrete Random Variable
Random variable
• A random variable is a variable whose value is numerical and is determined by the outcome of an
experiment. A random variable assigns one and only one numerical value to each experimental
outcome.
• Rolling a die, possible outcomes are 1,2,3,4,5,6
• Before rolling, X could be 1,2,3,4,5,6
• After rolling, X = 5
• Two types of random variable
1) Discrete random variable (Chapter 5)
• A die only has 6 possible outcomes
• Ex.: Number of students in this class
2) Continuous random variable (Chapter 6)
• The height of all Concordia students
• Ordered by classes
• Ex.: Weight
Probability distribution
• For a discrete random variable, the data is normally shown in 1 of 2 forms
Possible outcomes (x)   Probability of each outcome
1                       20%
2                       10%
3                       20%
4                       35%
5                       15%
• Important to remember that the sum of all probabilities must always
equal 1
Expected Value (mean)
• $\mu_x = \sum x \cdot p(x)$

Outcome (x)   Probability    Expected value
$5            20% or .20     5 x .20 = 1
$10           20% or .20     10 x .20 = 2
$20           20% or .20     20 x .20 = 4
$50           20% or .20     50 x .20 = 10
$100          20% or .20     100 x .20 = 20
Total         100% or 1      µ = 37
Variance and Standard Deviation
• Variance
• $\sigma^2 = \sum (x - \mu)^2 \cdot p(x)$
• Standard Deviation
• $\sigma = \sqrt{\sigma^2}$

Outcome (x)   Probability    Expected value        Variance
$5            20% or .20     5 x .20 = 1           (5 - 37)^2 x .20 = 204.8
$10           20% or .20     10 x .20 = 2          (10 - 37)^2 x .20 = 145.8
$20           20% or .20     20 x .20 = 4          (20 - 37)^2 x .20 = 57.8
$50           20% or .20     50 x .20 = 10         (50 - 37)^2 x .20 = 33.8
$100          20% or .20     100 x .20 = 20        (100 - 37)^2 x .20 = 793.8
Total         100% or 1      Expected Value = 37   Variance = 1236

Std = 35.1568
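The expected value, variance and standard deviation of this payout table can be computed in a few lines of Python. The variable names are assumptions for this sketch; the numbers are the five outcomes and probabilities from the table.

```python
import math

outcomes = [5, 10, 20, 50, 100]
probabilities = [0.20, 0.20, 0.20, 0.20, 0.20]

# The probabilities of a valid distribution must add up to 1
assert abs(sum(probabilities) - 1) < 1e-9

# Expected value: sum of x * p(x)
mean = sum(x * p for x, p in zip(outcomes, probabilities))
# Variance: sum of (x - mean)^2 * p(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probabilities))
std = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std, 4))
```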
Binomial
4 Rules for Binomials
1. Experiment consists of n identical trials

2. Each trial results in either “success” or “failure”

3. Probability of success, p, is constant from trial to trial


• The probability of failure, q, is 1 – p

4. Trials are independent (the probability of one event does not affect the
probability of another)
Binomial Formula
• P = $\frac{n!}{x!(n-x)!} \cdot p^x \cdot (1-p)^{n-x}$
• The first part is the combination formula $\frac{n!}{x!(n-x)!}$
• n is the number of trials
• x is the number of desired outcomes
• $p^x$, the probability of success to the power of x
• x being the number of desired outcomes
• $(1-p)^{n-x}$ or $q^{n-x}$
• The probability of the event not happening to the power of n-x
• n-x, being the number of trials minus the number of desired outcomes
Binomial Theoretical example
What are the chances of rolling a die 3 times and rolling a 4 exactly two times?
• Long way:
• P(1,4,4)+P(2,4,4)+P(3,4,4)+P(5,4,4)+P(6,4,4)+P(4,1,4)+P(4,2,4)+P(4,3,4)+P(4,5,4)+P(4,6,4)+P(4,4,1)+P(4,4,2)+P(4,4,3)+P(4,4,5)+P(4,4,6) = .06944 or 6.9444%
• Short way: P = $\frac{n!}{x!(n-x)!} \cdot p^x \cdot (1-p)^{n-x}$
• P = $\frac{3!}{2!(3-2)!} \cdot \left(\frac{1}{6}\right)^2 \cdot \left(1-\frac{1}{6}\right)^{3-2}$ = .06944 or 6.9444%
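The short way can be sketched in Python with `math.comb`, which computes the combination part of the formula. The function name `binomial_pmf` is an assumption for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) is n! / (x! (n - x)!), the combination formula
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Rolling a die 3 times and getting a 4 exactly two times
p_two_fours = binomial_pmf(2, 3, 1 / 6)
print(round(p_two_fours, 5))
```

The combination term equals 3 here, one for each position the non-4 roll can take, which is why the long way lists three groups of five outcomes.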
Mean, Variance, Standard Deviation for Binomials
• Mean
• $\mu_x = np$
• Variance
• $\sigma_x^2 = npq$
• Standard Deviation
• $\sigma_x = \sqrt{npq}$

n = number of trials, p = probability of positive outcome, q = probability of negative outcome (1-p)
Question 1: From the midterm review, under the
section called Discrete Probability Distributions/
Binomial Distribution
Before starting, build a chart with the
information given in the problem
                                        1st class         2nd class         3rd class         Total
Share of all riders                     .20 (given)       .30 (given)       1-.2-.3 = .50     .20+.30+.50 = 1

Regular Price
  Pay regular within the class (given)  90%               70%               80%
  Proportion of all riders              .90 x .20 = .18   .70 x .30 = .21   .80 x .50 = .40   .18+.21+.40 = .79
  Regular price (given)                 $4                $2                $0.50

Reduced Price
  Pay reduced within the class          1-.9 = .1         1-.7 = .3         1-.8 = .2
  Proportion of all riders              .1 x .2 = .02     .3 x .3 = .09     .2 x .5 = .10     .02+.09+.10 = .21
  Reduced price (given)                 $2                $1                $0.25
A) What proportion of riders pay regular fare?
• Look at chart
• 18% of all riders are in first class and pay regular
• 21% of all riders are in second class and pay regular
• 40% of all riders are in third class and pay regular
• .18 + .21 + .40 = .79

• Proportion of riders that pay regular fare is .79 or 79%


B)If a rider is selected at random, and found to have paid regular fare,
what is the probability that the rider paid a first-class fare?

• The word *if* hints that this is a conditional probability

• P(A|B) = $\frac{P(A \cap B)}{P(B)}$
• A being the event of being in first class
• B being the event of paying regular fare
• P(A|B) = $\frac{.18}{.79}$ = .228
C) (A)What is the mean amount of money paid per rider? (B)What is the
mean amount of money paid per regular-fare rider?

• (A)What is the mean amount of money paid per rider?


• Expected value $\mu_x = \sum x \cdot p(x)$
• .18($4) + .21($2) + .40($.50) + .02($2) + .09($1) + .10($.25) = $1.495

• (B) What is the mean amount of money paid per regular-fare rider?

• $\frac{.18(4) + .21(2) + .40(.50)}{.79} = \frac{1.34}{.79} = 1.696$, about 1.70 dollars per regular-fare rider
• Why divide by .79?
• In part A, there is technically an invisible 1 dividing the entire equation. In part B,
the percentages (.18, .21, .40) are proportions of the whole population (1), but the
question is only about the 79% of riders who pay regular fare, so we need to
divide by .79.
D) A transit-authority spot check is made by selecting eight riders at random. (A) What is the
probability that at least two of them are reduced-fare riders? (B) What is the expected number of
reduced-fare riders amongst the eight? (C)What is the standard deviation?

• (A) Use Binomial, since a rider is either reduced fare or not

• P = $\frac{n!}{x!(n-x)!} \cdot p^x \cdot (1-p)^{n-x}$
• It is faster to find the probability of 0 and 1 reduced-fare riders and take the
complement than to find the probability of 2,3,4,5,6,7,8 reduced-fare riders
• P(0) = $\frac{8!}{0!(8-0)!} \cdot .21^0 \cdot (1-.21)^{8-0}$ = .1517
• P(1) = $\frac{8!}{1!(8-1)!} \cdot .21^1 \cdot (1-.21)^{8-1}$ = .3226
• .1517+.3226=.4743
• 1-.4743= .5257
• The probability of at least two riders out of eight being reduced-fare riders equals
.5257 or 52.57%
D continued…
• (B) Expected value = $\mu_x = np$
• Number of trials (n) = 8
• Probability of any rider being reduced fare (p) = .21
• $\mu_x = 8(.21) = 1.68$
• (C) Standard Deviation = $\sqrt{npq}$
• Number of trials (n) = 8
• Probability of any rider being reduced fare (p) = .21
• The probability of any rider not being reduced fare (q) = 1-.21 = .79
• $\sqrt{8(.21)(.79)} = 1.15$
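All three parts of question D can be cross-checked with a short Python sketch. The function name `binomial_pmf` and the variable names are assumptions for this illustration; n = 8 and p = .21 come from the problem.

```python
from math import comb, sqrt

n = 8        # riders in the spot check
p = 0.21     # probability that a rider pays reduced fare
q = 1 - p    # probability that a rider pays regular fare

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# (A) At least two = 1 - P(0) - P(1), using the rule of complements
p_at_least_two = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
# (B) Expected number of reduced-fare riders
expected = n * p
# (C) Standard deviation
std = sqrt(n * p * q)

print(round(p_at_least_two, 4), round(expected, 2), round(std, 2))
```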
E) If there are seventy-five thousand riders on a randomly selected day,
what is an estimate for the total amount paid in fares on that day?

• How much does the average rider pay?


• Found in C, the average rider pays an expected price of $1.495
• Therefore, if there are 75,000 riders
• 75,000 x 1.495 = $112,125
• 75,000 riders should bring in an estimated $112,125 on any given day
Continuous Random Variable
Continuous random variables
• The curve f(x) is the continuous probability distribution of the random
variable x if the probability that x will be in a specified interval of
numbers is the area under the curve f(x) corresponding to the
interval. (also called probability curve or probability density function)
• In short: continuous random variables can be any number under a
distribution curve.
• In this chapter, the distribution will be bell shaped or normal
Normal Distribution
• Bell-Shaped
• Symmetrical around its mean
• Standard Normal Curve: σ=1 and µ=0.
• Mean = Mode = Median
• Half of the area under the curve (.5) lies on each side of the mean
Z Score
• A measure of how many standard deviations away your x is from the µ: $z = \frac{x - \mu}{\sigma}$
• After finding a z score, look it up on the table; it tells the probability of
any number in the distribution being below your x value.
• Always look at the area to the left

A z-score question will be asked in 1 of 4 ways.


Option 1: Find the probability on either side of X
Compute the probability that a randomly selected rod is longer
than 140.5mm. (σ=.2, µ=140)

1. $z = \frac{x - \mu}{\sigma} = \frac{140.5 - 140}{.2} = 2.5$

2. Find z=2.5 on the Z score table

3. Probability of the rod being shorter than 140.5mm is 99.38%. Therefore, the
probability of the rod being longer than 140.5mm is equal to 1-.9938 = .0062 or
.62%.
Option 2: Find the probability between two points (x)
Compute the probability that a randomly selected rod is between
140.2mm and 140.5mm. (σ=.2, µ=140)
$z = \frac{x - \mu}{\sigma} = \frac{140.5 - 140}{.2} = 2.5$
z = 2.5, find 2.5 on the Z score table
Probability of the rod being shorter than 140.5mm is 99.38%.

$z = \frac{x - \mu}{\sigma} = \frac{140.2 - 140}{.2} = 1$
z = 1, find 1 on the Z score table
Probability of the rod being shorter than 140.2mm is 84.13%.

99.38 – 84.13 = 15.25

The probability of a selected rod being between 140.2mm and 140.5mm is 15.25%.


Option 3:Find the probability at both ends of the distribution
A rod is considered defective if it is shorter than 139.8mm or longer than
140.2mm. Compute the probability that a randomly selected rod will be defective.
(σ=.2, µ=140)

$z = \frac{x - \mu}{\sigma} = \frac{139.8 - 140}{.2} = -1$
z = -1, find -1 on the Z score table
Probability of the rod being shorter than 139.8mm is 15.87%.

$z = \frac{x - \mu}{\sigma} = \frac{140.2 - 140}{.2} = 1$
z = 1, find 1 on the Z score table
Probability of the rod being shorter than 140.2mm is 84.13%. Therefore, the probability
of the rod being longer than 140.2mm equals 1-.8413 = 15.87%

Since we are looking for the probability of the rod being shorter than 139.8mm
or longer than 140.2mm, we can simply add the two probabilities found
above. 15.87 + 15.87 = 31.74%
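The first three options can be cross-checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later), whose `cdf` method returns the area to the left of a value, like the Z table. This is only a verification sketch; the answers can differ from the table in the last decimal because the table is rounded.

```python
from statistics import NormalDist

rods = NormalDist(mu=140, sigma=0.2)

# Option 1: area to the right of 140.5mm = 1 - area to the left
p_longer = 1 - rods.cdf(140.5)

# Option 2: area between 140.2mm and 140.5mm
p_between = rods.cdf(140.5) - rods.cdf(140.2)

# Option 3: area below 139.8mm plus area above 140.2mm
p_defective = rods.cdf(139.8) + (1 - rods.cdf(140.2))

print(round(p_longer, 4), round(p_between, 4), round(p_defective, 4))
```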
Z score Option 4 (special case)
• Normally in any Z-score problem, the x, µ, and 𝜎 are given.
• However, in some cases the σ, the µ, and a percentage are given and the
question asks to find the x.
• In order to solve
• 1- Find the given percentage on the inside of the Z score table and the
matching z score value.
• Remember the percentage represents the probability to the left of X and not the actual z
score
• 2- Plug in your z score, 𝜎, and µ into the z score formula and solve for X.
Option 4 Example
A population has a mean of 3 and a standard deviation of .5. Find the 60th percentile.

1) Understand that, for a continuous distribution, the 60th percentile is the value (x) that is
greater than 60% of the values.
2) Find 60% (.6000) inside the z-score table. It falls between the z-scores 0.25 and 0.26.
3) Take the average of both z scores: $\frac{.25+.26}{2} = .255$
4) Plug all the values into the z score formula: $z = \frac{x - \mu}{\sigma}$, $.255 = \frac{x - 3}{.5}$
5) Isolate x
• $.255 = \frac{x - 3}{.5}$
• x-3 = .1275
• x = 3.1275

6) Final statement: 60% of the values in the distribution are less than or equal to 3.1275.
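Working backward from a percentage to an x value can be checked with the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later). This sketch computes the z score exactly instead of averaging two table entries, so its answer is slightly different from the table-based 3.1275.

```python
from statistics import NormalDist

mu, sigma = 3, 0.5

# Step 1: z score with 60% of the area to its left (standard normal curve)
z = NormalDist().inv_cdf(0.60)

# Step 2: solve z = (x - mu) / sigma for x
x = mu + z * sigma

print(round(z, 4), round(x, 4))
```

On the exam, the table method shown above is the one to use; the code is only a way to confirm that the table answer is in the right place.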
Final tips
• Always start by putting the numbers of the sample in order
• If you forget the percentages of the empirical rule, look them up on the Z table
• Coefficient of variation is not on your formulae sheet: $\frac{\text{standard deviation}}{\text{mean}} \times 100$
• If the goal for this class is an A or A+, try to do the midterm review package 3 times
• Keep in mind that the exam is fairly long for the time given and time will be a factor
