Statistics Presentation 1
• WHAT IS STATISTICS?
Probability & Statistics
• WHY IS STATISTICS IMPORTANT FOR US?
• The field of statistics deals with the collection, presentation, analysis, and use of
data to make decisions, solve problems, and design products and processes.
• WHY IS STATISTICS IMPORTANT FOR US?
• Statistical methods are used to help us describe and understand
variability.
Of course not…
• WHY IS STATISTICS IMPORTANT FOR US?
• In fact, the distance a vehicle covers per liter of fuel sometimes varies
considerably. This observed variability in fuel performance depends on many
factors:
– the type of driving that has occurred most recently (city versus highway),
changes in the condition of the vehicle over time (which could include factors
such as tire inflation, engine compression, or valve wear), the brand and/or
octane number of the fuel used, or possibly even the weather conditions
that have been recently experienced.
• POPULATIONS AND SAMPLES
• A population is a well-defined collection of objects.
• POPULATIONS AND SAMPLES
Population                                   Sample
All students currently enrolled in school    Any department
All books in library                         Statistics books
All campus fast food restaurants             Burger King
• STEPS OF STATISTICAL PRACTICE
• Data collection: Make a plan of what data to collect and how to collect it.
• BRANCHES OF STATISTICS
• Descriptive Statistics - methods for organizing and summarizing data.
– Tables or graphs are used to organize data, and descriptive values such as the
average score are used to summarize data.
• BRANCHES OF STATISTICS
• Inferential Statistics - methods for using sample data to make general
conclusions (inferences) about populations.
– Because a sample is typically only a part of the whole population, sample data
provide only limited information about the population. As a result, sample
statistics are generally imperfect representatives of the corresponding population
parameters.
– Inferential statistics is often used when you do not have access to the entire population.
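The gap between a sample statistic and a population parameter can be sketched in Python. This is a minimal illustration under stated assumptions: the population of 1,000 exam scores is made up, and the sample size of 30 and the seed are arbitrary choices, none of which come from the slides.

```python
import random
from statistics import mean

# Hypothetical population: exam scores of 1,000 students (made-up values).
rng = random.Random(0)
population = [rng.randint(40, 100) for _ in range(1000)]
parameter = mean(population)           # population parameter (usually unknown)

# A sample is only a part of the population.
sample = rng.sample(population, 30)
statistic = mean(sample)               # sample statistic (what we can observe)

# The statistic is an imperfect representative of the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```

Re-running with a different seed gives a different sample and a different error, which is the limited information the slide refers to.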
• WHAT IS PROBABILITY?
• “Probability is the measure of the likeliness that an event will occur.”
(“Probability”. Webster’s Revised Unabridged Dictionary. G & C Merriam, 1913)
• Probability is a branch of statistics (mathematics) that is concerned with
developing and analyzing mathematical models of random (or statistical)
experiments.
• EXAMPLES
Comparison of Probability and Statistics
Example: A box of Bonibon (M&M's) candies contains 100 pieces, 15 of which are red.
A handful of 10 is selected.
Statistics question: What is the proportion of red ones in the entire box?
Fundamental relationship between probability and inferential statistics
• Suppose I know exactly the proportions of car makes in California. Then I can
find the probability that the first car I see in the street is a Ford. This is
probabilistic reasoning as I know the population and predict the sample.
• Now suppose that I do not know the proportions of car makes in California,
but would like to estimate them. I observe a random sample of cars in the
street and then I have an estimate of the proportions of the population. This is
statistical reasoning.
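The two directions of reasoning can be sketched in Python using the candy box from the comparison example (100 pieces, 15 red, a handful of 10). This is only a sketch: the function names are illustrative, and the observation of 2 red pieces in the handful is a hypothetical value, not data from the slides.

```python
from math import comb

def prob_red_in_handful(k, box_size=100, red_in_box=15, handful=10):
    """Probabilistic reasoning: the population is known, predict the sample.

    Returns the chance that a handful drawn without replacement contains
    exactly k red pieces (hypergeometric distribution).
    """
    return (comb(red_in_box, k) * comb(box_size - red_in_box, handful - k)
            / comb(box_size, handful))

def estimate_red_proportion(red_in_handful, handful=10):
    """Statistical reasoning: the population is unknown, estimate it from the sample."""
    return red_in_handful / handful

# The probabilities of 0, 1, ..., 10 red pieces cover every outcome, so they sum to 1.
total = sum(prob_red_in_handful(k) for k in range(11))

# Hypothetical observation: 2 red pieces in the handful of 10.
estimate = estimate_red_proportion(2)
print(round(total, 6), estimate)
```

The first function needs the contents of the whole box as input. The second needs only the handful, which is why its answer is an estimate.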
Learning Objectives
1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, & Statistic
What Is Statistics?
3. Characterizing Data (e.g., Average)
Statistical Methods
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
1. Involves
   Collecting Data
   Presenting Data
   Characterizing Data
2. Purpose
   Describe Data
[Figure: chart of $ (0 to 50) by quarter, Q1 to Q4]
X̄ = 30.5, S² = 113
Inferential Statistics
1. Involves
   Estimation
   Hypothesis Testing
2. Purpose
   Make Decisions About Population Characteristics
[Figure label: Population?]
Key Terms
Categorical Data
   1 Variable: Summary Table
   2 Variables: XTAB (Contingency) Table
   3+ Variables: Supertable
Major         Count
Accounting    130
Economics      20
Management     50
Total         200
(Each row is a category; each count is a tally of that category.)
Categorical Data Presentation
1. Shows Breakdown of Total Quantity into Categories
2. Useful for Showing Relative Differences
3. Angle Size: (360°)(Percent); e.g., (360°)(10%) = 36°
[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10% (36°)]
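The percentages and slice angles can be checked with a short Python sketch. It assumes the major counts given in the deck's summary table (Accounting 130, Economics 20, Management 50); the variable names are illustrative.

```python
# Counts per category, taken from the summary table of majors.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Percent of the total, then angle size = (360 degrees)(percent).
percents = {major: 100 * n / total for major, n in counts.items()}
angles = {major: 360 * pct / 100 for major, pct in percents.items()}

for major in counts:
    print(f"{major:<12}{counts[major]:>5}{percents[major]:>6.0f}%{angles[major]:>6.0f} deg")
```

The angles always sum to 360 degrees because the percentages sum to 100.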
Dot Chart
Bar Chart
[Figure: vertical bar chart of Major (Acct., Mgmt., Econ.) in descending order, with equal bar widths; vertical axis ticks at 0%, 33%, 67%]
Thinking Challenge
Market Share
Mfg.         Share
Lotus        15%
Microsoft    60%
Wordperf.    10%
Others       15%
Dot Chart Solution*
[Figure: dot chart of market share by Mfg. (Lotus, Microsoft, Wordperf., Others)]
               Gender
Residence      Male    Female   Total
On-Campus      4       1        5
Off-Campus     2       3        5
Total          6       4        10
Contingency Table (Row %)
               Gender
Residence      Male    Female   Total
On-Campus      4       1        5
               (80)    (20)     (100)
Off-Campus     2       3        5
               (40)    (60)     (100)
Total          6       4        10
               (60)    (40)     (100)
Row % = (Cell Count / Row Total)(100); e.g., (3/5)(100) = 60%
Contingency Table (Column %)
               Gender
Residence      Male    Female   Total
On-Campus      4       1        5
               (67)    (25)     (50)
Off-Campus     2       3        5
               (33)    (75)     (50)
Total          6       4        10
               (100)   (100)    (100)
Column % = (Cell Count / Column Total)(100); e.g., (3/4)(100) = 75%
Contingency Table (Total %)
               Gender
Residence      Male    Female   Total
On-Campus      4       1        5
               (40)    (10)     (50)
Off-Campus     2       3        5
               (20)    (30)     (50)
Total          6       4        10
               (60)    (40)     (100)
Total % = (Cell Count / Grand Total)(100); e.g., (3/10)(100) = 30%
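The three kinds of percentages differ only in the base each cell count is divided by. A minimal Python sketch, using the residence-by-gender counts from the tables above (the dictionary layout and the helper name `pct` are illustrative choices):

```python
# Cell counts from the residence-by-gender contingency table.
table = {
    "On-Campus": {"Male": 4, "Female": 1},
    "Off-Campus": {"Male": 2, "Female": 3},
}

row_totals = {r: sum(cells.values()) for r, cells in table.items()}
col_totals = {}
for cells in table.values():
    for c, n in cells.items():
        col_totals[c] = col_totals.get(c, 0) + n
grand_total = sum(row_totals.values())

def pct(count, base):
    """(count / base)(100), rounded to a whole percent."""
    return round(100 * count / base)

row_pct = {r: {c: pct(n, row_totals[r]) for c, n in cells.items()}
           for r, cells in table.items()}
col_pct = {r: {c: pct(n, col_totals[c]) for c, n in cells.items()}
           for r, cells in table.items()}
total_pct = {r: {c: pct(n, grand_total) for c, n in cells.items()}
             for r, cells in table.items()}

# The Off-Campus/Female cell (count 3) under each base.
print(row_pct["Off-Campus"]["Female"],
      col_pct["Off-Campus"]["Female"],
      total_pct["Off-Campus"]["Female"])
```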
Which Percentage to Use?
1. Compute % in Direction of Explanatory Variable
2. If Explanatory Variable Is Row, Use Row Total
3. In Example, Gender Is Explanatory Variable
   (Gender ‘Explains’ Residence Choice)
Thinking Challenge
You’re a marketing research analyst for Visa. You want to analyze data on
credit card use & annual income using XTABS.
Income (000): 12  20  32  45  72  46  18  55
Use:          Y   N   N   Y   Y   Y   N   Y
(Income Categories: Under $25,000; $25,000 & Over;
Use Categories: Y = Use Credit Cards, N = Do Not Use)
Solution*
                 Use
Income           No      Yes     Total
Under $25k       2       1       3
                 (67)    (33)    (100)
$25K & Over      1       4       5
                 (20)    (80)    (100)
Total            3       5       8
                 (38)    (62)    (100)
Income is the explanatory variable, so row percentages are shown; e.g., (4/5)(100) = 80%
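The cross-tabulation can be rebuilt from the raw data with a short Python sketch. It assumes the income and use values pair up position by position as listed in the challenge; the function and variable names are illustrative.

```python
incomes = [12, 20, 32, 45, 72, 46, 18, 55]        # annual income in $000
uses = ["Y", "N", "N", "Y", "Y", "Y", "N", "Y"]   # Y = uses credit cards

def income_category(income_thousands):
    """Assign an income (in $000) to one of the two income categories."""
    return "Under $25K" if income_thousands < 25 else "$25K & Over"

# Tally each (income category, use) pair into the cross-tabulation.
xtab = {"Under $25K": {"N": 0, "Y": 0}, "$25K & Over": {"N": 0, "Y": 0}}
for income, use in zip(incomes, uses):
    xtab[income_category(income)][use] += 1

# Income is the explanatory variable (rows), so divide by row totals.
row_pct = {}
for category, cells in xtab.items():
    row_total = sum(cells.values())
    row_pct[category] = {use: round(100 * n / row_total) for use, n in cells.items()}
    print(category, cells, row_pct[category])
```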
Using XTABS: Is There a Relationship?
                 Use
Income           No      Yes     Total
Under $25k       0       8       8
                 (0)     (100)   (100)
$25K & Over      0       0       0
                 (0)     (0)     (100)
Total            0       8       8
                 (0)     (100)   (100)
Using XTABS: Is There a Relationship?
                 Use
Income           No      Yes     Total
Under $25k       2       2       4
                 (50)    (50)    (100)
$25K & Over      2       2       4
                 (50)    (50)    (100)
Total            4       4       8
                 (50)    (50)    (100)
Using XTABS: Is There a Relationship?
                 Use
Income           No      Yes     Total
Under $25k       4       0       4
                 (100)   (0)     (100)
$25K & Over      0       4       4
                 (0)     (100)   (100)
Total            4       4       8
                 (50)    (50)    (100)
Supertable
                     Residence
Variables            On-Campus    Off-Campus
Gender (XTAB 1)
   Male              4 (67%)      2 (33%)
   Female            1 (25%)      3 (75%)
Work (XTAB 2)
   Yes               1 (20%)      4 (80%)
   No                4 (80%)      1 (20%)
Gender and Work are the explanatory variables. Row % only; counts are often omitted.
Numerical Data Presentation
Numerical Data
   Ordered Array
   Stem-&-Leaf Display
   Frequency Distributions: Histogram, Polygon, Ogive
Ordered Array
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Frequency Distribution Table
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class          Frequency
15 but < 25    3
25 but < 35    5
35 but < 45    2
Frequency Distribution Table Steps
1. Determine Range
2. Select Number of Classes
   Usually Between 5 & 15 Inclusive
3. Compute Class Intervals (Width)
4. Determine Class Boundaries (Limits)
5. Compute Class Midpoints
6. Count Observations & Assign to Classes
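The six steps can be sketched in Python on the deck's ten raw values. The number of classes (3), the width (10), and the first lower boundary (15) are assumptions chosen to match the deck's small example; a real data set would usually call for 5 to 15 classes.

```python
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

data_range = max(raw) - min(raw)   # step 1: determine range
n_classes = 3                      # step 2: assumed, to match the small example
width = 10                         # step 3: assumed convenient class width
lower = 15                         # step 4: assumed first lower class boundary

rows = []
for i in range(n_classes):
    lo = lower + i * width         # step 4: class boundaries
    hi = lo + width
    midpoint = (lo + hi) / 2       # step 5: class midpoint
    freq = sum(lo <= x < hi for x in raw)   # step 6: count observations in class
    rows.append((lo, hi, midpoint, freq))

for lo, hi, midpoint, freq in rows:
    print(f"{lo} but < {hi}   midpoint {midpoint:g}   frequency {freq}")
```

Each class includes its lower boundary and excludes its upper one, so every observation falls into exactly one class.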
Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class          Midpoint   Frequency
15 but < 25    20         3
25 but < 35    30         5
35 but < 45    40         2
(Class width: 10)
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
[Table: Class, Percentage; Cumulative Percentage = percentage less than lower class boundary]
Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2
[Figure: histogram; vertical axis: Count (0 to 5), which may also be Frequency, Relative Frequency, or Percent; horizontal axis: Lower Boundary (0, 15, 25, 35, 45, 55); bars touch]
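Polygon
Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2
[Figure: polygon; vertical axis: Count (0 to 5), which may also be Frequency, Relative Frequency, or Percent; horizontal axis: Midpoint (0, 10, 20, 30, 40, 50, 60); a fictitious class closes the polygon]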
Cumulative % Polygon (Ogive)
Class          Cum. %
15 but < 25    0%
25 but < 35    30%
35 but < 45    80%
45 but < 55    100%
[Figure: ogive; vertical axis: Cumulative % (0% to 100%); horizontal axis: Lower Boundary (0, 15, 25, 35, 45, 55); the 45 but < 55 class is fictitious]
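The cumulative percentages plotted in the ogive are the share of observations that fall below each lower class boundary. A minimal Python sketch, assuming the class frequencies 3, 5, and 2 from the frequency distribution table (variable names are illustrative):

```python
from itertools import accumulate

boundaries = [15, 25, 35, 45]   # lower class boundaries; 45 starts the fictitious class
frequencies = [3, 5, 2]         # counts for 15 but < 25, 25 but < 35, 35 but < 45
n = sum(frequencies)

# Nothing lies below the first boundary; after that, add up the
# frequencies of every class that lies entirely below the boundary.
running = [0] + list(accumulate(frequencies))
cum_pct = {b: 100 * c / n for b, c in zip(boundaries, running)}
print(cum_pct)
```

The fictitious class is what lets the curve reach 100%: its lower boundary (45) is the first point below which all ten observations lie.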
Errors in Presenting Data
No Zero Point on Vertical Axis
[Figure: two charts by quarter (Q1 to Q4), with vertical axes 0 to 100 and 0 to 25]