Sbe10 02


Chapter 2
Descriptive Statistics:
Tabular and Graphical Presentations

• Summarizing Qualitative Data
• Summarizing Quantitative Data
• Exploratory Data Analysis
• Crosstabulations and Scatter Diagrams

Summarizing Qualitative Data

• Frequency Distribution
• Relative Frequency Distribution
• Bar Graph
• Pie Chart

Slide 1 Slide 2

Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average
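The tally behind a frequency distribution can be sketched in a few lines of Python. This is only an illustration and not part of the original slides; the names `ratings`, `classes`, and `frequency` are chosen here for the example.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample, read row by row.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Classes listed from worst to best so the table prints in a natural order.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many ratings fall in each class.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Because every rating belongs to exactly one class, the class frequencies add up to the sample size of 20.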

Slide 3 Slide 4

Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20

Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Slide 5 Slide 6

Relative Frequency Distributions

                 Relative
Rating           Frequency
Poor               .10
Below Average      .15
Average            .25
Above Average      .45
Excellent          .05     (1/20 = .05)
Total             1.00

Bar Graph

A bar graph is a graphical device for depicting qualitative data.

On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.

A frequency or relative frequency scale can be used for the other axis (usually the vertical axis).

Using a bar of fixed width drawn above each class label, we extend the height appropriately.

The bars are separated to emphasize the fact that each class is a separate category.
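As a small illustration (not from the original slides), the relative frequencies can be computed by dividing each class frequency by the total. The dictionary names below are chosen for this sketch.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())

# Relative frequency = class frequency / total number of data items.
relative_frequency = {rating: freq / total for rating, freq in frequency.items()}

for rating, rel in relative_frequency.items():
    print(f"{rating:<15}{rel:.2f}")
print(f"{'Total':<15}{sum(relative_frequency.values()):.2f}")
```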

Slide 7 Slide 8

Bar Graph

[Figure: bar graph titled "Marada Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled 1 to 10.]

Pie Chart

The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.

First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.

Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
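The 360-degree rule for pie chart sectors can be sketched as follows. This is an illustrative calculation on the Marada Inn frequencies, with variable names invented for the example; it works from the counts rather than the rounded relative frequencies to avoid floating-point noise.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())

# Sector angle = relative frequency * 360 degrees.
sector_degrees = {rating: 360 * freq / total for rating, freq in frequency.items()}

for rating, degrees in sector_degrees.items():
    print(f"{rating:<15}{degrees:>6.1f} degrees")
```

The Average class, with relative frequency .25, gets the 90-degree sector mentioned on the slide, and the sectors together fill the full circle.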

Slide 9 Slide 10

Pie Chart

[Figure: pie chart titled "Marada Inn Quality Ratings". Sectors: Excellent 5%, Poor 10%, Below Average 15%, Average 25%, Above Average 45%.]

Example: Marada Inn

Insights Gained from the Preceding Pie Chart

One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.

For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

Slide 11 Slide 12

Summarizing Quantitative Data

• Frequency Distribution
• Relative Frequency Distributions
• Histogram
• Cumulative Distributions
• Ogive

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Slide 13 Slide 14

Example: Hudson Auto Repair

Sample of Parts Cost for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Frequency Distribution

Guidelines for Selecting Number of Classes
• Use between 5 and 15 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

Slide 15 Slide 16

Frequency Distribution

Guidelines for Selecting Width of Classes

Use classes of equal width.

Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes

Frequency Distribution

For Hudson Auto Repair, if we choose six classes:

Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10

Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
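The class-width guideline and the resulting frequency distribution can be reproduced with a short sketch. Two choices here are assumptions rather than something the slides state: the approximate width is rounded up with `math.ceil`, and the first class is taken to start at 50 to match the table.

```python
import math

# Parts cost for the 50 tune-ups in the Hudson Auto Repair sample.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6

# (largest value - smallest value) / number of classes = (109 - 52) / 6
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes

# Assumption: round the approximate width up to a convenient whole number.
class_width = math.ceil(approx_width)

# Assumption: the first class starts at 50, as in the slide's table.
first_lower_limit = 50

frequency = {}
for k in range(num_classes):
    lower = first_lower_limit + k * class_width
    upper = lower + class_width - 1
    # Count the costs falling inside this class (both limits inclusive).
    frequency[f"{lower}-{upper}"] = sum(lower <= cost <= upper for cost in parts_cost)

print("approximate width:", approx_width, "-> class width:", class_width)
for label, freq in frequency.items():
    print(f"{label:<10}{freq:>3}")
```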

Slide 17 Slide 18

Relative Frequency Distributions

Parts        Relative
Cost ($)     Frequency
50-59          .04     (2/50)
60-69          .26
70-79          .32
80-89          .14
90-99          .14
100-109        .10
Total         1.00

Relative Frequency Distributions

Insights Gained from the Relative Frequency Distribution

Only 4% of the parts costs are in the $50-59 class.

30% of the parts costs are under $70.

The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.

10% of the parts costs are $100 or more.

Slide 19 Slide 20

Histogram

A common graphical presentation of quantitative data is a histogram.

The variable of interest is placed on the horizontal axis.

A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency or relative frequency.

Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Histogram

[Figure: histogram titled "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), classes 50-59 through 100-109. Vertical axis: Frequency, scaled 2 to 18.]

Slide 21 Slide 22

Cumulative Distributions

Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.

Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.

Cumulative Distributions

Hudson Auto Repair

                            Cumulative
             Cumulative     Relative
Cost ($)     Frequency      Frequency
≤ 59             2             .04
≤ 69            15             .30
≤ 79            31             .62
≤ 89            38             .76
≤ 99            45             .90
≤ 109           50            1.00

For the ≤ 69 row, the cumulative frequency is 2 + 13 = 15 and the cumulative relative frequency is 15/50 = .30.
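The cumulative columns are running totals of the class frequencies. A minimal sketch using the standard library's `itertools.accumulate` (the list names are chosen for this example):

```python
from itertools import accumulate

# Upper class limits and class frequencies from the parts-cost distribution.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequency))

# Dividing by the total number of items gives the cumulative relative frequencies.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for limit, cum, rel in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit:<5}{cum:>4}{rel:>8.2f}")
```

The last cumulative frequency always equals the sample size, and the last cumulative relative frequency is 1.00.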

Slide 23 Slide 24

Ogive

An ogive is a graph of a cumulative distribution.

The data values are shown on the horizontal axis.

Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies.

The cumulative value for each class is plotted as a point.

The plotted points are connected by straight lines.

Ogive with Cumulative Percent Frequencies

[Figure: ogive titled "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), 50 to 110. Vertical axis: Cumulative Percent Frequency, 20 to 100. Labeled point: (89, 76).]

Slide 25 Slide 26

Exploratory Data Analysis

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.

One such technique is the stem-and-leaf display.

Stem-and-Leaf Display

◼ A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
◼ It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
◼ The first digits of each data item are arranged to the left of a vertical line.
◼ To the right of the vertical line we record the last digit for each item in rank order.
◼ Each line in the display is referred to as a stem.
◼ Each digit on a stem is a leaf.

Slide 27 Slide 28

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Example: Hudson Auto Repair

Sample of Parts Cost for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Slide 29 Slide 30

Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row of the display is a stem; each digit to the right of the vertical line is a leaf.

Crosstabulations and Scatter Diagrams

Thus far we have focused on methods that are used to summarize the data for one variable at a time.

Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.

Crosstabulation and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.
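The stem-and-leaf construction described earlier (leading digits as the stem, last digit as the leaf, leaves in rank order) can be sketched like this. It is an illustration only; splitting each value with `divmod(cost, 10)` assumes whole-dollar data such as the Hudson Auto Repair sample.

```python
from collections import defaultdict

# Parts cost for the 50 tune-ups in the Hudson Auto Repair sample.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Group the last digit (leaf) of each value under its leading digits (stem).
# Iterating over the sorted data keeps the leaves on each stem in rank order.
stems = defaultdict(list)
for cost in sorted(parts_cost):
    stem, leaf = divmod(cost, 10)
    stems[stem].append(leaf)

# One output line per stem: stem, vertical bar, then the leaves.
lines = [
    f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

The number of leaves on each stem equals the frequency of the matching class in the frequency distribution, which is why the display resembles a histogram turned on its side.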

Slide 31 Slide 32

Crosstabulation

A crosstabulation is a tabular summary of data for two variables.

Crosstabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.

The left and top margin labels define the classes for the two variables.

Crosstabulation

Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price Range is the quantitative variable; Home Style is the qualitative variable.

Price                    Home Style
Range        Colonial    Log    Split    A-Frame    Total
< $99,000       18        6      19        12        55
> $99,000       12       14      16         3        45
Total           30       20      35        15       100

Slide 33 Slide 34

Crosstabulation

Price                    Home Style
Range        Colonial    Log    Split    A-Frame    Total
< $99,000       18        6      19        12        55
> $99,000       12       14      16         3        45
Total           30       20      35        15       100

The Total column is the frequency distribution for the price variable. The Total row is the frequency distribution for the home style variable.

Crosstabulation: Row or Column Percentages

Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Slide 35 Slide 36

Crosstabulation: Row Percentages

Price                    Home Style
Range        Colonial    Log      Split    A-Frame    Total
< $99,000     32.73     10.91    34.55     21.82      100
> $99,000     26.67     31.11    35.56      6.67      100

Note: row totals are actually 100.01 due to rounding.

(Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100

Crosstabulation: Column Percentages

Price                    Home Style
Range        Colonial    Log      Split    A-Frame
< $99,000     60.00     30.00    54.29     80.00
> $99,000     40.00     70.00    45.71     20.00
Total          100       100      100       100

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100
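Row and column percentages both come from the same cell counts; only the divisor changes. A minimal sketch on the Finger Lakes Homes counts (the variable names are invented for this illustration):

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: divide each cell by its row (price range) total.
row_pct = {
    price: [round(100 * n / sum(row), 2) for n in row]
    for price, row in counts.items()
}

# Column percentages: divide each cell by its column (home style) total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    price: [round(100 * n / t, 2) for n, t in zip(row, col_totals)]
    for price, row in counts.items()
}

print("Row percentages")
for price, row in row_pct.items():
    print(price, dict(zip(styles, row)))

print("Column percentages")
for price, row in col_pct.items():
    print(price, dict(zip(styles, row)))
```

The Colonial cell in the "> $99,000" row illustrates the difference: it is 12/45 of its row but 12/30 of its column.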

Slide 37 Slide 38

Crosstabulation: Simpson’s Paradox

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.

We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

Crosstabulation: Simpson’s Paradox

Example: Kidney stone treatment

The table below shows the success rates and numbers of treatments for treatments involving both small and large kidney stones.

                 Treatment A        Treatment B
Small Stones     Group 1            Group 2
                 93% (81/87)        87% (234/270)
Large Stones     Group 3            Group 4
                 73% (192/263)      69% (55/80)
Both             78% (273/350)      83% (289/350)
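The reversal in the kidney stone table can be checked directly from the counts. The sketch below is illustrative; the dictionary layout and names are assumptions made for this example.

```python
# (successes, treatments) for each stone size and treatment, from the table.
results = {
    "Small Stones": {"A": (81, 87), "B": (234, 270)},
    "Large Stones": {"A": (192, 263), "B": (55, 80)},
}

# Success rates within each stone-size group (the unaggregated data).
group_rates = {
    size: {t: s / n for t, (s, n) in by_treatment.items()}
    for size, by_treatment in results.items()
}

# Aggregate over stone size: add successes and treatments separately.
combined = {}
for treatment in ("A", "B"):
    successes = sum(results[size][treatment][0] for size in results)
    treated = sum(results[size][treatment][1] for size in results)
    combined[treatment] = (successes, treated)

combined_rates = {t: s / n for t, (s, n) in combined.items()}

for size, rates in group_rates.items():
    print(f"{size}: A {rates['A']:.0%}, B {rates['B']:.0%}")
print(f"Both: A {combined_rates['A']:.0%}, B {combined_rates['B']:.0%}")
```

Treatment A has the higher success rate within each stone-size group, yet Treatment B has the higher rate in the aggregated row. The two treatments were applied to very different mixes of small and large stones, which is what allows the aggregated comparison to flip.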

Slide 39 Slide 40

Scatter Diagram and Trendline

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.

The general pattern of the plotted points suggests the overall relationship between the variables.

A trendline is an approximation of the relationship.

Example: Panthers Football Team

Scatter Diagram

The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of      y = Number of
Interceptions      Points Scored
      1                 14
      3                 24
      2                 18
      1                 17
      3                 30
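The slides do not say how a trendline is obtained. One common choice, assumed here purely for illustration, is the least-squares line through the plotted points. The sketch below fits it to the Panthers data using only basic arithmetic.

```python
# Panthers data: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope = sum of cross-deviations / sum of squared x-deviations.
# Assumption: least squares is used as the trendline method; the slides do not specify one.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope means the fitted line rises from left to right, so more interceptions go with more points scored in this sample.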

Slide 41 Slide 42

Scatter Diagram

[Figure: scatter diagram. Horizontal axis: x, Number of Interceptions, 0 to 4. Vertical axis: y, Number of Points Scored, 0 to 35.]

Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram

The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.

Higher points scored are associated with a higher number of interceptions.

The relationship is not perfect; not all of the plotted points in the scatter diagram lie on a straight line.

Slide 43 Slide 44

Tabular and Graphical Procedures

Data

Qualitative Data
  Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Crosstabulation
  Graphical Methods
    • Bar Graph
    • Pie Chart

Quantitative Data
  Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Cum. Freq. Dist.
    • Cum. Rel. Freq. Distribution
    • Stem-and-Leaf Display
    • Crosstabulation
  Graphical Methods
    • Histogram
    • Ogive
    • Scatter Diagram
•Crosstabulation

Slide 45
