Statistics Presentation 1


Probability & Statistics

• WHAT IS STATISTICS?

“Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.”
Dodge, Y. (2006) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9

• WHY IS STATISTICS IMPORTANT FOR US?
• The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes.

• Specifically, statistical techniques can be a powerful aid in:

– designing new products and systems,
– improving existing designs,
– designing, developing, and improving production processes.

• Statistical methods are used to help us describe and understand
variability.

• We all encounter variability in our everyday lives, and statistical thinking can give us a useful way to incorporate this variability into our decision-making processes.

• For example, consider the fuel economy (distance traveled per unit of fuel) of your car. Do you always get exactly the same fuel economy on every tank of fuel?

Of course not…
• In fact, sometimes the fuel economy varies considerably. This observed variability depends on many factors, such as:
– the type of driving that has occurred most recently (city versus highway),
– changes in the condition of the vehicle over time (which could include factors such as tire inflation, engine compression, or valve wear),
– the brand and/or octane number of the fuel used,
– possibly even the weather conditions that have been recently experienced.

• These factors represent potential sources of variability in the system.

• Statistics gives us a framework for describing this variability and for learning which potential sources of variability are the most important or have the greatest impact on fuel economy.

• POPULATIONS AND SAMPLES
• A population is a well-defined collection of objects.

• A subset of the population is a sample.

– Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.

Example: A college dean is interested in learning about the average age of faculty. Identify the basic terms in this situation.
• The population is the ages of all faculty members at the college.
• A sample is any subset of that population. For example, we might select 10 faculty members and determine their ages.


Population                                   Sample
All students currently enrolled in school    Any department
All books in library                         Statistics books
All campus fast food restaurants             Burger King

• STEPS OF STATISTICAL PRACTICE

• Model building: Set clearly defined goals for the investigation.

• Data collection: Make a plan of what data to collect and how to collect it.

• Data analysis: Apply appropriate statistical methods to extract information from the data.

• Data interpretation: Interpret the information and draw conclusions.

• BRANCHES OF STATISTICS
• Descriptive Statistics - methods for organizing and summarizing data.

– Tables or graphs are used to organize data, and descriptive values such as the
average score are used to summarize data.

– For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. It is the number of shots made divided by the number of shots taken. For example, a player who shoots 33% is making approximately one shot in every three. The percentage summarizes or describes multiple discrete events.

– Possible only when the entire population can be accessed.

• Inferential Statistics - methods for using sample data to make general
conclusions (inferences) about populations.

– Because a sample is typically only a part of the whole population, sample data
provide only limited information about the population. As a result, sample
statistics are generally imperfect representatives of the corresponding population
parameters.

– It is often used when you do not have access to the entire population.

• WHAT IS PROBABILITY?
• “Probability is the measure of the likeliness that an event will occur.”
(“Probability”. Webster’s Revised Unabridged Dictionary. G & C Merriam, 1913)

• A numerical value expressing the degree of uncertainty regarding the occurrence of an event; a measure of uncertainty.

• Probability is expressed as a number between 0 and 1. Probability = 0 means the event never happens; probability = 1 means it always happens.

• The probabilities of all possible (mutually exclusive) outcomes always sum to 1.

• “The chance of rain today is 30% (P = 0.3)” is a statement that quantifies our feeling about the possibility of rain.
• WHY IS PROBABILITY IMPORTANT FOR US?
• “The term probability refers to the study of randomness and uncertainty.”

• Nothing in life is certain. In everything we do, we gauge the chances of successful outcomes, from business to medicine to the weather.

• A probability provides a quantitative description of the chances or likelihoods associated with various outcomes.

• Probability is a branch of mathematics that is concerned with developing and analyzing mathematical models of random (or statistical) experiments.

• Definition: A statistical (or random) experiment is an experiment whose outcomes are not certain.

• EXAMPLES

– Flipping one or more coins (heads or tails),
– Tossing one or more dice,
  For instance, rolling a die (singular of dice): the chance of rolling a 2 is 1/6, because there is a 2 on one face and a total of 6 faces. So, assuming the die is balanced, a 2 will come up on average 1 time in 6.
– Examining a manufactured item to determine whether it is defective or not.

• Comparison of Probability and Statistics
Example: A box of Bonibon (M&M's) candies contains 100 pieces, 15 of which are red. A handful of 10 is selected.

Probability question: What is the probability that 3 of the 10 selected are red?

Statistics question: What is the proportion of red ones in the entire box?
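
The probability question in this example can be worked out numerically. The sketch below assumes the handful is drawn at random without replacement and that "3 of the 10" means exactly 3, which makes it a hypergeometric probability; the function name `hypergeom_pmf` is illustrative and not part of the lecture.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

# Box of 100 candies, 15 of them red, handful of 10 (numbers from the example).
# Assumption: "3 of the 10 selected are red" means exactly 3 red.
p_three_red = hypergeom_pmf(3, population=100, successes=15, draws=10)
print(f"P(exactly 3 red) = {p_three_red:.4f}")
```

The statistics question runs the other way: the 15% proportion would be unknown, and the number of red candies seen in the handful would be used to estimate it.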


Fundamental relationship between probability and inferential statistics

• Suppose I know exactly the proportions of car makes in California. Then I can
find the probability that the first car I see in the street is a Ford. This is
probabilistic reasoning as I know the population and predict the sample.

• Now suppose that I do not know the proportions of car makes in California,
but would like to estimate them. I observe a random sample of cars in the
street and then I have an estimate of the proportions of the population. This is
statistical reasoning.

Learning Objectives

1. Define Statistics
2. Describe the Uses of Statistics
3. Distinguish Descriptive & Inferential Statistics
4. Define Population, Sample, Parameter, & Statistic
What Is Statistics?

1. Collecting Data
– e.g., Survey
2. Presenting Data
– e.g., Charts & Tables
3. Characterizing Data
– e.g., Average
Statistical Methods

Statistical Methods
– Descriptive Statistics
– Inferential Statistics
Descriptive Statistics

1. Involves
– Collecting Data
– Presenting Data
– Characterizing Data
2. Purpose
– Describe Data

[Figure: bar chart of quarterly values in $ (Q1 to Q4, vertical axis 0 to 50), annotated with X̄ = 30.5 and S² = 113]
Inferential Statistics

1. Involves
– Estimation
– Hypothesis Testing
2. Purpose
– Make Decisions About Population Characteristics

[Figure labeled “Population?”]
Key Terms

1. Population (Universe)
– All Items of Interest
2. Sample
– Portion of Population
3. Parameter
– Summary Measure about Population
4. Statistic
– Summary Measure about Sample

• P in Population & Parameter
• S in Sample & Statistic
GRAPHICAL METHODS FOR DESCRIBING DATA SETS
Learning Objectives

1. Construct Tables & Charts for Categorical Data
2. Create Contingency Tables
3. Interpret Contingency Tables
4. Explain Supertables
5. Organize Numerical Data
6. Describe Numerical Data Graphically
Categorical Data Presentation

Categorical Data
– 1 Variable: Summary Table
  – Bar Chart, Pie Chart, Dot Chart, Pareto Diagram
– 2 Variables: XTAB (Contingency) Table
– 3+ Variables: Supertable
Summary Table

1. Lists Categories & No. of Elements in Each Category
2. Obtained by Tallying Responses in Each Category
3. May Show Frequencies (Counts), %, or Both

Major         Count
Accounting      130
Economics        20
Management       50
Total           200

Each row is a category; each count comes from the tally of responses.
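
The tallying step behind a summary table can be sketched in a few lines. The raw response list below is hypothetical: it is constructed only so that its tallies match the counts shown in the table (130, 20, 50).

```python
from collections import Counter

# Hypothetical raw responses, built to reproduce the counts in the table.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

counts = Counter(responses)            # tally responses per category
total = sum(counts.values())

# Each category maps to (count, percent of total).
summary = {major: (n, 100 * n / total) for major, n in counts.items()}

for major, (n, pct) in summary.items():
    print(f"{major:<12}{n:>5}{pct:>7.1f}%")
print(f"{'Total':<12}{total:>5}{100.0:>7.1f}%")
```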
Bar Chart

– Horizontal Bars for Categorical Variables
– Bar Length Shows Count or %
– Equal Bar Widths
– Gaps of 1/2 to 1 Bar Width
– Zero Point on the Axis
– Axis Shows Frequency (Percent Used Also)

[Figure: horizontal bar chart of Major (Acct., Econ., Mgmt.), frequency axis from 0 to 150]
Pie Chart

1. Shows Breakdown of Total Quantity into Categories
2. Useful for Showing Relative Differences
3. Angle Size
– (360°)(Percent), e.g., (360°)(10%) = 36°

[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10% (36° slice)]
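
As a small worked example of the angle rule, the following sketch converts the percentages from the Majors pie chart into slice angles. The helper name `pie_angles` is illustrative.

```python
def pie_angles(percentages):
    """Map each category's percent share to its slice angle in degrees."""
    return {name: 360 * pct / 100 for name, pct in percentages.items()}

# Percent shares read from the Majors pie chart.
majors = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}
angles = pie_angles(majors)

for name, angle in angles.items():
    print(f"{name:<6} {angle:5.1f} degrees")
```

The angles add up to 360° whenever the percentages add up to 100%.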
Dot Chart

– Like a Horizontal Bar Chart
– Horizontal Lines for Categorical Variables
– Line Length Shows Count or %
– Equal Spacing
– Zero Point on the Axis
– Axis Shows Frequency (Percent Used Also)

[Figure: dot chart of Major (Acct., Econ., Mgmt.), frequency axis from 0 to 150]
Pareto Diagram

– Vertical Bar Chart with Bars in Descending Order
– Equal Bar Widths
– Cumulative Percent Polygon (Ogive) Plotted at Bar Midpoints
– Vertical Axis Is Always %

[Figure: Pareto diagram of Major (Acct., Mgmt., Econ.), vertical axis from 0% to 100%]
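
The two ingredients of a Pareto diagram, descending order and a running cumulative percent, can be sketched as follows. The percentages are the major shares from the pie chart slide (Acct. 65%, Mgmt. 25%, Econ. 10%); the variable names are illustrative.

```python
from itertools import accumulate

majors = {"Acct.": 65, "Econ.": 10, "Mgmt.": 25}   # percent of students

# Bars go in descending order of percent.
ordered = sorted(majors.items(), key=lambda item: item[1], reverse=True)

# The ogive plots the running total of those percents.
cumulative = list(accumulate(pct for _, pct in ordered))

pareto = [(name, pct, cum) for (name, pct), cum in zip(ordered, cumulative)]
for name, pct, cum in pareto:
    print(f"{name:<6} bar = {pct:>3}%   cumulative = {cum:>3}%")
```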
Thinking Challenge

You’re an analyst. You want to show the market shares held by Windows program manufacturers. Construct a bar chart, pie chart, & dot chart to describe the data.

Mfg.           Mkt. Share (%)
Lotus                15
Microsoft            60
WordPerfect          10
Others               15
Bar Chart Solution*

[Figure: horizontal bar chart of Market Share (%) by Mfg. (Lotus, Microsoft, WordPerfect, Others), axis from 0% to 60%]

Pie Chart Solution*

[Figure: pie chart of Market Share: Microsoft 60%, Lotus 15%, Others 15%, WordPerfect 10%]

Dot Chart Solution*

[Figure: dot chart of Market Share (%) by Mfg. (Lotus, Microsoft, WordPerfect, Others), axis from 0% to 60%]
Contingency Table

1. Shows # Observations Jointly in 2 Categorical Variables
– e.g., Male Accounting Student
– Gender Variable and Major Variable
– Can Use Categorized Numerical Variables
2. May Include Row, Column, or Total %
3. Helps Find Relationships
4. Used Widely in Marketing
Contingency Table Example

Residence: C C O O C C O O C O
Gender:    M F F M M M F M M F
(C = On-Campus, O = Off-Campus; M = Male, F = Female)

                  Gender
Residence     Male   Female   Total
On-Campus        4        1       5
Off-Campus       2        3       5
Total            6        4      10
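
The cell counts in this table can be reproduced from the raw residence and gender codes. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# Raw data from the example (C = On-Campus, O = Off-Campus; M = Male, F = Female).
residence = "C C O O C C O O C O".split()
gender = "M F F M M M F M M F".split()

# Count each (residence, gender) pair: these are the cells of the table.
cell_counts = Counter(zip(residence, gender))

print(f"{'Residence':<12}{'Male':>6}{'Female':>8}{'Total':>7}")
for code, label in (("C", "On-Campus"), ("O", "Off-Campus")):
    male, female = cell_counts[(code, "M")], cell_counts[(code, "F")]
    print(f"{label:<12}{male:>6}{female:>8}{male + female:>7}")
```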
Contingency Table (Row %)

                  Gender
Residence     Male   Female   Total
On-Campus        4        1       5
              (80)     (20)   (100)
Off-Campus       2        3       5
              (40)     (60)   (100)
Total            6        4      10
              (60)     (40)   (100)

Row % = (Cell Count / Row Total)(100), e.g., (3/5)(100) = 60%
Contingency Table (Column %)

                  Gender
Residence     Male   Female   Total
On-Campus        4        1       5
              (67)     (25)    (50)
Off-Campus       2        3       5
              (33)     (75)    (50)
Total            6        4      10
             (100)    (100)   (100)

Column % = (Cell Count / Column Total)(100), e.g., (3/4)(100) = 75%
Contingency Table (Total %)

                  Gender
Residence     Male   Female   Total
On-Campus        4        1       5
              (40)     (10)    (50)
Off-Campus       2        3       5
              (20)     (30)    (50)
Total            6        4      10
              (60)     (40)   (100)

Total % = (Cell Count / Grand Total)(100), e.g., (3/10)(100) = 30%
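
The three kinds of percentage differ only in the denominator. The sketch below recomputes them from the cell counts of the residence-by-gender table; the helper `percentages` and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Cell counts from the residence-by-gender contingency table.
counts = {
    "On-Campus": {"Male": 4, "Female": 1},
    "Off-Campus": {"Male": 2, "Female": 3},
}

row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

def percentages(denominator):
    """Rounded percentage table; denominator(row, col) picks the base."""
    return {
        row: {col: round(100 * n / denominator(row, col)) for col, n in cells.items()}
        for row, cells in counts.items()
    }

row_pct = percentages(lambda row, col: row_totals[row])     # cell / row total
col_pct = percentages(lambda row, col: col_totals[col])     # cell / column total
total_pct = percentages(lambda row, col: grand_total)       # cell / grand total

print("Row %:   ", row_pct)
print("Column %:", col_pct)
print("Total %: ", total_pct)
```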
Which Percentage to Use?

1. Compute % in Direction of Explanatory Variable
2. If Explanatory Variable Is Row, Use Row Total
3. In Example, Gender Is Explanatory Variable
– ‘Explains’ Residence Choice
Thinking Challenge

You’re a marketing research analyst for Visa. You want to analyze data on credit card use & annual income using XTABS.

Income (000): 12 20 32 45 72 46 18 55
Use:           Y  N  N  Y  Y  Y  N  Y
(Income Categories: Under $25,000; $25,000 & Over;
Use Categories: Y = Use Credit Cards, N = Do Not Use)
Solution*

                    Use
Income          No     Yes   Total
Under $25k       2       1       3
              (67)    (33)   (100)
$25k & Over      1       4       5
              (20)    (80)   (100)
Total            3       5       8
              (38)    (62)   (100)

Income is the explanatory variable, so row percentages are used, e.g., (4/5)(100) = 80%.
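
This solution relies on first categorizing a numerical variable (income) and then cross-tabulating. A minimal sketch of both steps, using the data from the challenge; the function name `income_class` is illustrative.

```python
from collections import Counter

# Data from the challenge: income in thousands of dollars and credit card use.
income_thousands = [12, 20, 32, 45, 72, 46, 18, 55]
use = ["Y", "N", "N", "Y", "Y", "Y", "N", "Y"]

def income_class(income):
    """Categorize a numerical income into the two classes used on the slide."""
    return "Under $25k" if income < 25 else "$25k & Over"

counts = Counter((income_class(x), u) for x, u in zip(income_thousands, use))

# Row percentages, because income (the row variable) is explanatory.
for row in ("Under $25k", "$25k & Over"):
    no, yes = counts[(row, "N")], counts[(row, "Y")]
    total = no + yes
    print(f"{row:<12} No {no} ({100 * no / total:.0f}%)  "
          f"Yes {yes} ({100 * yes / total:.0f}%)  Total {total}")
```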
Using XTABS: Is There a Relationship?

                    Use
Income          No     Yes   Total
Under $25k       0       8       8
               (0)   (100)   (100)
$25k & Over      0       0       0
               (0)     (0)   (100)
Total            0       8       8
               (0)   (100)   (100)
Using XTABS: Is There a Relationship?

                    Use
Income          No     Yes   Total
Under $25k       2       2       4
              (50)    (50)   (100)
$25k & Over      2       2       4
              (50)    (50)   (100)
Total            4       4       8
              (50)    (50)   (100)
Using XTABS: Is There a Relationship?

                    Use
Income          No     Yes   Total
Under $25k       4       0       4
             (100)     (0)   (100)
$25k & Over      0       4       4
               (0)   (100)   (100)
Total            4       4       8
              (50)    (50)   (100)
Supertable

                         Residence
Variables          On-Campus   Off-Campus
Gender (XTAB 1)
  Male                 4            2
                     (67%)        (33%)
  Female               1            3
                     (25%)        (75%)
Work (XTAB 2)
  Yes                  1            4
                     (20%)        (80%)
  No                   4            1
                     (80%)        (20%)

– Gender and Work are the explanatory variables.
– Row % only; counts are often omitted.
Numerical Data Presentation

Numerical Data
– Ordered Array
– Stem-&-Leaf Display
– Frequency Distributions: Histogram, Polygon, Ogive
Ordered Array

1. Organizes Data to Focus on Major Features
2. Data Placed in Rank Order
– Smallest to Largest
3. Data in Raw Form (as Collected)
– 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
4. Data in Ordered Array
– 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem-and-Leaf Display

1. Divide Each Observation into Stem Value and Leaf Value
– Stem Value Defines Class
– Leaf Value Defines Frequency (Count)
– e.g., 26 has stem 2 and leaf 6
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

2 | 144677
3 | 028
4 | 1
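
Both the ordered array and the stem-and-leaf display can be built from the raw data with a sort and an integer division. The sketch below assumes two-digit observations, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group sorted observations by tens digit (stem); units digits are leaves."""
    stems = defaultdict(list)
    for value in sorted(data):          # sorting gives the ordered array
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]   # raw data from the slides
ordered_array = sorted(raw)
display = stem_and_leaf(raw)

print(ordered_array)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```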
Frequency Distribution Table

Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class          Frequency
15 but < 25        3
25 but < 35        5
35 but < 45        2
Frequency Distribution Table Steps

1. Determine Range
2. Select Number of Classes
– Usually Between 5 & 15 Inclusive
3. Compute Class Intervals (Width)
4. Determine Class Boundaries (Limits)
5. Compute Class Midpoints
6. Count Observations & Assign to Classes
Frequency Distribution Table Example

Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class          Midpoint   Frequency
15 but < 25       20          3
25 but < 35       30          5
35 but < 45       40          2

Midpoint = (Upper + Lower Boundaries) / 2; the class width here is 10.
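
The table can be reproduced by counting how many observations fall in each half-open class. This sketch takes the starting boundary (15), the width (10), and the number of classes (3) from the slide rather than deriving them from the range; the variable names are illustrative.

```python
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Class choices taken from the slide: first lower boundary, width, class count.
first_lower, width, n_classes = 15, 10, 3

table = []
for i in range(n_classes):
    lower = first_lower + i * width
    upper = lower + width
    midpoint = (upper + lower) / 2
    frequency = sum(lower <= x < upper for x in raw)   # "lower but < upper"
    table.append((lower, upper, midpoint, frequency))

print(f"{'Class':<14}{'Midpoint':>9}{'Frequency':>11}")
for lower, upper, midpoint, frequency in table:
    print(f"{lower} but < {upper:<4}{midpoint:>9.0f}{frequency:>11}")
```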
Relative Frequency & % Distribution Tables

Relative Frequency Distribution      Percentage Distribution
Class          Prop.                 Class          %
15 but < 25     .3                   15 but < 25    30.0
25 but < 35     .5                   25 but < 35    50.0
35 but < 45     .2                   35 but < 45    20.0
Cumulative Percentage Distribution Table

Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class          Cumulative Percentage
               (Percentage Less than Lower Class Boundary)
15 but < 25        0.0
25 but < 35       30.0
35 but < 45       80.0   (30% + 50%)
45 but < 55      100.0   (80% + 20%)
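
The relative frequency, percentage, and cumulative percentage columns all follow from the class frequencies. A minimal sketch, assuming the frequencies 3, 5, and 2 from the frequency distribution table; `accumulate(..., initial=...)` requires Python 3.8 or later.

```python
from itertools import accumulate

# Class frequencies from the frequency distribution table.
frequencies = {"15 but < 25": 3, "25 but < 35": 5, "35 but < 45": 2}
n = sum(frequencies.values())

relative = {cls: f / n for cls, f in frequencies.items()}        # proportions
percent = {cls: 100 * f / n for cls, f in frequencies.items()}   # percentages

# Cumulative percentage *below* each lower boundary: start at 0 and add
# one class at a time, which yields one more entry than there are classes.
lower_boundaries = [15, 25, 35, 45]
cumulative = list(accumulate(percent.values(), initial=0.0))

for boundary, cum in zip(lower_boundaries, cumulative):
    print(f"Less than {boundary}: {cum:5.1f}%")
```

The extra boundary at 45 is why the cumulative table lists a fourth class, 45 but < 55, that contains no observations.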
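Histogram

Class          Freq.
15 but < 25      3
25 but < 35      5
35 but < 45      2

[Figure: histogram of the classes above. Vertical axis from 0 to 5 shows Count (Frequency, Relative Frequency, or Percent); horizontal axis marks the lower boundaries 0, 15, 25, 35, 45, 55; bars touch.]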
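Polygon

Class          Freq.
15 but < 25      3
25 but < 35      5
35 but < 45      2

[Figure: frequency polygon of the classes above. Vertical axis from 0 to 5 shows Count (Frequency, Relative Frequency, or Percent); horizontal axis marks the midpoints 0, 10, 20, 30, 40, 50, 60; a fictitious class is marked.]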
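Cumulative % Polygon (Ogive)

Class          Cum. %
15 but < 25      0%
25 but < 35     30%
35 but < 45     80%
45 but < 55    100%

[Figure: ogive of the cumulative percentages above. Vertical axis shows Cumulative % from 0% to 100%; horizontal axis marks the lower boundaries 0, 15, 25, 35, 45, 55; a fictitious class is marked.]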
Errors in Presenting Data

1. Using ‘Chart Junk’
2. Compressing the Vertical Axis
3. No Zero Point on the Vertical Axis
‘Chart Junk’

[Figure, Bad Presentation: Minimum Wage shown with chart junk, labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
[Figure, Good Presentation: plain chart of Minimum Wage ($, axis from 0 to 4) for 1960, 1970, 1980, 1990]
Compressing Vertical Axis

[Figure, Bad Presentation: Quarterly Sales ($) for Q1 to Q4 on a vertical axis from 0 to 200]
[Figure, Good Presentation: Quarterly Sales ($) for Q1 to Q4 on a vertical axis from 0 to 50]
No Zero Point on Vertical Axis

[Figure, Bad Presentation: Monthly Sales ($) for J, M, M, J, S, N on a vertical axis from 36 to 45]
[Figure, Good Presentation: Monthly Sales ($) for J, M, M, J, S, N on a vertical axis from 0 to 60]

END OF THE LECTURE…
