Chapter 2 Organizing Data

GENERAL OBJECTIVE

In the last chapter, you learned how to obtain random samples. There are
many tools used in statistics to summarize and organize data. This chapter
will focus on how to obtain tables, graphs, pictures, and charts that are
commonly used to organize data. You should be familiar with Chapter 2 of
your textbook before beginning this chapter.

LESSON OUTLINE

2.1 Variables and Data
2.2 Organizing Qualitative Data
2.3 Organizing Quantitative Data
2.4 Distribution Shapes
2.5 Misleading Graphs
2.6 Problems

Copyright 2012 Pearson Education, Inc. Publishing as Addison-Wesley.

2.1 Variables and Data


A characteristic that varies from one person or thing to another is called a
variable. Examples of variables for humans are height, weight, number of
siblings, sex, marital status, and eye color. The first three of these variables
yield numerical information and are examples of quantitative variables; the
last three yield nonnumerical information and are examples of qualitative
variables, also called categorical variables.
Some of the statistical procedures that you will study are valid for only certain
types of data. This is one reason why you must be able to classify data. The
classifications we have discussed are sufficient for most applications, even
though statisticians sometimes use additional classifications.
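
As a rough illustration of the quantitative/qualitative distinction, the following Python sketch classifies the six example variables by whether their values are numbers. The record, its field names, and its values are hypothetical and chosen only for illustration; the rule "numeric means quantitative" is a simplifying assumption.

```python
# Hypothetical record for one person; field names and values are illustrative only.
person = {
    "height_cm": 172.5,
    "weight_kg": 68.0,
    "siblings": 2,
    "sex": "F",
    "marital_status": "single",
    "eye_color": "brown",
}


def classify(value):
    """Return 'quantitative' for numeric values and 'qualitative' otherwise."""
    # bool is excluded because Python treats True/False as integers.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


classification = {name: classify(value) for name, value in person.items()}
for name, kind in classification.items():
    print(f"{name:15} {kind}")
```

This rule is only a first approximation: a category stored as a numeric code (for example, 1 for single and 2 for married) would still be qualitative, so the classification should always be checked against what the variable means.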

2.2 Organizing Qualitative Data


Statistics is about organizing, summarizing, analyzing, and interpreting data.
When the data are qualitative (also called categorical data), we can organize
them by forming frequency distributions and relative-frequency distributions.
Graphs such as pie charts and bar charts can then be used to describe the data.

Constructing Frequency and Relative Frequency Distributions


Examples 2.5-2.6  Professor Weiss asked his introductory statistics students to
state their political-party affiliations as Democratic (D), Republican (R), or
Other (O). The responses are given in Table 2-1. Determine the frequency and
relative-frequency distributions for these data.

Table 2-1 Political-party affiliations of the students

D  R  O  R  R
R  R  R  D  O
R  D  O  O  R
D  D  R  O  D
R  R  O  R  D
O  D  D  D  R
O  D  O  R  D
R  R  R  R  D

Solution

To construct frequency and relative-frequency distributions,

1. Enter the data in Table 2-1 into a new data file named PARTY.
2. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box (Figure 2-1).
3. Paste the variable PARTY to the Variable(s) box.

Figure 2-1 Frequencies dialog box

4. Click the OK button and the Frequencies Table will open in the Viewer
window (Figure 2-2).
Figure 2-2 Frequency table for political party

PARTY
                                                     Cumulative
                Frequency   Percent   Valid Percent     Percent
Valid   D              13      32.5            32.5        32.5
        O               9      22.5            22.5        55.0
        R              18      45.0            45.0       100.0
        Total          40     100.0           100.0
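
The frequency table can be cross-checked outside SPSS. The following Python sketch counts the responses in Table 2-1 and prints each category's frequency, percent, and cumulative percent. The variable names and the output layout are choices made for this illustration; the sketch is not part of SPSS.

```python
from collections import Counter

# Responses from Table 2-1 (40 students), entered column by column.
party = (
    "D R R D R O O R "
    "R R D D R D D R "
    "O R O R O D O R "
    "R D O O R D R R "
    "R O R D D R D D"
).split()

counts = Counter(party)
n = len(party)

# table maps each category to (frequency, percent, cumulative percent).
table = {}
cumulative = 0.0
print(f"{'PARTY':6}{'Frequency':>10}{'Percent':>9}{'Cum. %':>9}")
for category in sorted(counts):
    percent = 100 * counts[category] / n
    cumulative += percent
    table[category] = (counts[category], percent, cumulative)
    print(f"{category:6}{counts[category]:>10}{percent:>9.1f}{cumulative:>9.1f}")
print(f"{'Total':6}{n:>10}{100.0:>9.1f}")
```

The relative frequency of a category is its frequency divided by the number of observations; multiplying by 100 gives the Percent column.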

Constructing Pie Charts and Bar Charts


Examples 2.7-2.8  Use the data in Table 2-1 to make a pie chart and a bar chart
for the data on political-party affiliations.

To construct a bar chart,

1. Enter the data in Table 2-1 into a new data file named PARTY.
2. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box (Figure 2-1).
3. Paste the variable PARTY to the Variable(s) box.
4. Select Charts to open the Frequencies: Charts dialog box (Figure 2-3).

Figure 2-3 Frequencies: Charts dialog box

5. Select Bar charts. Click on Continue to close the Frequencies: Charts
dialog.
6. Clicking the OK button will produce the bar chart (Figure 2-4) for the
variable, PARTY, in the Viewer window.

The results of every procedure are displayed in a window called the Viewer.

Figure 2-4 Bar Chart of Political Party Affiliation

To construct a pie chart,

1. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box (Figure 2-1).
2. Paste the variable PARTY to the Variable(s) box.
3. Select Charts to open the Frequencies: Charts dialog box (Figure 2-3).
4. Select Pie charts. Click on Continue to close the Frequencies: Charts
dialog.
5. Clicking the OK button will produce the pie chart (Figure 2-5) for the
variable, PARTY, in the Viewer window.

Figure 2-5 Pie Chart in the Viewer

The pie chart that is created will show a different color for each category. In
this example the categories are D (for Democratic), R (for Republican), and O
(for Other).
If the pie chart is printed in black and white, it is impossible to see the
colors. In this case, it is preferable to display each section with a different
pattern rather than a different color.
To make changes to a pie chart, double-click on the pie chart to open the Chart
Editor dialog box (Figure 2-6). Double-click on the pie chart in the Chart
Editor to open the Properties dialog box and select the tab for variables
(Figure 2-7).

Figure 2-6 Chart Editor dialog box

Figure 2-7 Properties dialog box for pie chart

Click on Style and select Style: Pattern. Click on APPLY. The pie chart will
now have a different pattern for each category rather than a different color
(Figure 2-8).

Figure 2-8 Pie Chart showing a different pattern for each category
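
To see what the two charts encode, the following Python sketch starts from the PARTY frequencies (13, 9, and 18) and computes the central angle of each pie slice, then prints a rough text bar chart. It is a plain-text illustration under these assumptions and does not reproduce SPSS's chart output.

```python
# Frequencies for the PARTY variable, taken from the frequency table.
freq = {"D": 13, "O": 9, "R": 18}
total = sum(freq.values())

# Pie chart: a slice's central angle is its relative frequency times 360 degrees.
angles = {category: 360 * count / total for category, count in freq.items()}

# Bar chart: the height (here, length) of each bar is the category's frequency.
for category, count in freq.items():
    print(f"{category} | {'#' * count} {count}")

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

Both charts display the same information. The bar chart makes frequencies easy to compare, while the pie chart emphasizes each category's share of the whole.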

2.3 Organizing Quantitative Data


It is said that a picture is worth a thousand words. Describing the distribution
of data can be accomplished much more succinctly with pictures than with
tables. When the data are quantitative (also called numerical data), we can
organize them by constructing frequency distributions and relative-frequency
distributions. We can also graph the data using histograms and
stem-and-leaf diagrams.

Frequency Distributions and Relative Frequency Distributions


For a first look at data, it is practical to begin by describing the distribution of
values in a data set. Many statistical tools have been developed for this
purpose. Common tools used for organizing and summarizing a large number
of data values are frequency distributions and relative frequency
distributions.

Constructing Frequency Distributions and Relative Frequency Distributions


Example 2.12  TVs per Household: Trends in Television, published by the
Television Bureau of Advertising, contains information on the number of
television sets owned by U.S. households. Data on the number of TV sets per
household for 50 randomly selected households is displayed in Table 2-2.

Table 2-2 Number of TVs per Household

1  1  1  2  6  3  3  4  2  4
3  2  1  5  2  1  3  6  2  2
3  1  1  4  3  2  2  2  2  3
0  3  1  2  1  2  3  1  1  3
3  2  1  2  1  1  3  1  5  1

Solution

Complete the following steps to obtain a frequency distribution and a relative
frequency distribution.

1. Enter the data from Table 2-2 into a new data file named TVs.
2. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box (Figure 2-9).
3. Highlight the variable TVs by clicking on it.
4. Click the Variable Paste button to copy the variable to the Variable(s)
box.

Figure 2-9 Frequencies dialog box

5. Click the OK button to produce the grouped-data table (Figure 2-10) for
the variable, TVs, in the Viewer window.

Figure 2-10 Grouped-data table for TVs per household

Since frequency distributions are often the first tool used to describe a data
set, SPSS provides buttons in the Frequencies dialog box (see Figure 2-9)
for calculating some common statistics and charts. The next two sections will
discuss some of these charts.
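
As a cross-check on the grouped-data table, the following Python sketch tallies the 50 observations in Table 2-2 and prints the frequency and relative frequency of each value. The variable names and layout are choices made for this illustration and are not SPSS output.

```python
from collections import Counter

# Number of TVs per household from Table 2-2 (50 households).
tvs = [
    1, 1, 1, 2, 6, 3, 3, 4, 2, 4,
    3, 2, 1, 5, 2, 1, 3, 6, 2, 2,
    3, 1, 1, 4, 3, 2, 2, 2, 2, 3,
    0, 3, 1, 2, 1, 2, 3, 1, 1, 3,
    3, 2, 1, 2, 1, 1, 3, 1, 5, 1,
]

counts = Counter(tvs)
n = len(tvs)

# Relative frequency = frequency / number of observations.
rel_freq = {value: counts[value] / n for value in sorted(counts)}

print("TVs  Frequency  Relative frequency")
for value, rf in rel_freq.items():
    print(f"{value:>3}  {counts[value]:>9}  {rf:>18.2f}")
```

Because each distinct value forms its own class here, this is a single-value grouping; data with many distinct values would first be grouped into class intervals.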

Constructing a Histogram
A histogram is a picture of the data that shows the shape, center, and spread
of the distribution.
Example 2.15

TVs per Household: Consider again the data in Table 2-2 of Example 2.12,
the number of TVs per household for 50 randomly selected households. Make
a histogram for this data.

Solution

To make a histogram, complete the following steps.

1. Enter the data in Table 2-2 into a new data file named TVs.
2. Choose Analyze > Descriptive Statistics > Frequencies to open
the Frequencies dialog box (Figure 2-1).
3. Paste the variable, TVs, into the Variable(s) list.
4. Click the Charts button to open the Frequencies: Charts dialog box
(Figure 2-11).
5. Choose the bullet for Histograms to make a histogram for the variables
listed in the Variable(s) box. This same dialog box can be used to make
bar charts and pie charts.
6. Click the Continue button to return to the Frequencies dialog box
(Figure 2-1).

Figure 2-11 Frequencies: Charts dialog box

SPSS is now set up to make a frequencies report and a histogram for the
variable, TVs. At this point, we could click the Statistics button in the
Frequencies dialog box (Figure 2-1) to have SPSS calculate some
descriptive statistics as well. We will discuss descriptive statistics in the next
chapter.
7. Click the OK button in the Frequencies dialog box and the
Viewer window (Figure 2-12) will open.
The results of a procedure run from a dialog box are
displayed in the Viewer window. In this case, a frequencies report and a
histogram are shown in the Viewer window (Figure 2-12).

Figure 2-12 Viewer window showing the output of the Frequencies procedure

The Viewer has its own menus. Notice that the menu bar for the Viewer is
different from the menu bar for the Data Editor. The Viewer window is
divided into two independent panes. The left pane (Outline pane) shows the
output in outline view. The right pane (Output pane) shows the output (the
tables, charts, and text output). You can use the scroll bars to browse the
results or select an item in the Outline pane to re-center the Output pane on
the selected item.
The Outline pane can be used to selectively display or hide individual results
in the Output pane. This is useful to shorten the amount of visible output in
the Output pane. For example, Notes, which has information about the
execution of a procedure, is hidden by default.
The histogram (Figure 2-13) shows the distribution of the data values to be
decidedly skewed to the right and centered at approximately two.

Figure 2-13 Histogram: Number of TVs in household
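
The shape described above can be roughed out in plain text. The following Python sketch uses the frequencies obtained by tallying Table 2-2, prints one asterisk per household, and compares the mean with the median. Using "mean greater than median" as a sign of right skew is a common rule of thumb and an assumption of this sketch, not a formal test.

```python
from statistics import mean, median

# Frequencies tallied from Table 2-2 (number of TVs -> number of households).
freq = {0: 1, 1: 16, 2: 14, 3: 12, 4: 3, 5: 2, 6: 2}

# Text histogram: one '*' per household.
for value, count in freq.items():
    print(f"{value} | {'*' * count}")

# Rebuild the raw observations so the mean and median can be compared.
data = [value for value, count in freq.items() for _ in range(count)]
data_mean = mean(data)
data_median = median(data)
print(f"mean = {data_mean:.2f}, median = {data_median}")
```

The long tail of bars toward 5 and 6 TVs pulls the mean above the median, which is consistent with the right skew seen in the SPSS histogram.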

Stem-and-Leaf Diagrams
Statisticians continue to invent ways to display data. One method, developed in
the 1960s by the late Professor John Tukey of Princeton University, is called a
stem-and-leaf diagram, or stemplot. This ingenious diagram is often easier to
construct than either a frequency distribution or a histogram and generally
displays more information.
With a stem-and-leaf diagram, we think of each observation as a stem,
consisting of all but the rightmost digit, and a leaf, the rightmost digit. In
general, stems may use as many digits as required, but each leaf must contain only
one digit.

Constructing a Stem-and-Leaf Diagram


Example 2.17  Table 2-3 gives the number of days to maturity for 40 short-term
investments. The data are from Barron's National Business and Financial
Weekly. Construct a stem-and-leaf diagram.

Table 2-3 Days to maturity for 40 short-term investments

70  64  99  55  64
89  87  65  62  38
67  70  60  69  78
39  75  56  71  51
99  68  95  86  57
53  47  50  55  81
80  98  51  36  63
66  85  79  83  70

Solution

To make a stem-and-leaf diagram, perform the following steps.

1. Enter the data in Table 2-3 into a new data file named MATURITY.
2. Choose Analyze > Descriptive Statistics > Explore to open the
Explore dialog box (Figure 2-14).
3. Paste the variable, MATURITY, into the Dependent List box.

Figure 2-14 Explore dialog box

The Explore procedure can produce both descriptive statistics and graphical
displays. We will discuss the descriptive statistics in the next chapter.
We can control what is displayed in the Viewer window by choosing one of
the bullets in the Display section.

3a. Choose the bullet for Plots to display the plots and suppress all descriptive
statistics, or
3b. Choose the bullet for Statistics to display the descriptive statistics and
suppress printing all plots, or
3c. Choose the bullet for Both to display both descriptive statistics and plots.
4. Click the Plots button to open the Explore: Plots dialog box
(Figure 2-15).
This dialog box can be used to make certain plots and diagrams, such as
boxplots, stem-and-leaf diagrams, and histograms. We will discuss the
boxplot in the next chapter.
5. Choose the checkbox for Stem-and-leaf to make a stem-and-leaf diagram
for each of the variables in the Dependent List in the Explore dialog box.
6. Choose the bullet for None to suppress making any boxplots.

Figure 2-15 Explore: Plots dialog box

Notice that the Explore: Plots dialog box has both bullets and checkboxes.
Checkboxes and bullets serve two different purposes. Within a grouping, only
a single bullet may be chosen while multiple checkboxes can be selected. For
example, there are two different types of Boxplots that can be made, but only
one type is appropriate for a set of data. Therefore, only one of the bullets in
the Boxplot grouping can be chosen.
Choosing the checkbox for Histogram and the checkbox for Stem-and-leaf
will display both the histogram and stem-and-leaf diagram in the Viewer
window. We have already seen that a histogram can also be made in the
Frequencies: Charts dialog box (Figure 2-11).

In this problem we want only a stem-and-leaf diagram; therefore,

7. Choose the checkbox for Stem-and-leaf. Click the Continue
button to return to the Explore dialog box (Figure 2-14).
8. Click the OK button in the Explore dialog box and the stem-and-leaf
diagram will appear in the Viewer window (Figure 2-17).

Figure 2-17 Stem-and-leaf diagram for days to maturity for 40 short-term
investments

Let us focus on the detail in the stem-and-leaf diagram (Figure 2-17). Below
the stem-and-leaf diagram we see "Stem width: 10" and "Each leaf: 1
case(s)". This tells us that each leaf in the stem-and-leaf diagram represents
one observation and that the difference between two consecutive stem values is
10. The first stem, 3, therefore represents 30. If the stem width had
been 100, the first stem, 3, would have represented 300.
The second column gives the stem, followed by a column of dots. To
the right of the dots are the leaves. Since the leaves on each line are ordered,
we see that SPSS displays an ordered stem-and-leaf diagram. The first
column is the frequency on each of the stems. When there are many values on
a stem, this column can be useful. The leaf 6 on the row associated with stem 3
tells us the minimum data value is 36.
Like a histogram, the stem-and-leaf diagram helps to show the shape of the
distribution.
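
The construction can also be sketched in a few lines of Python. The example below splits each value in Table 2-3 into a stem and a leaf using a stem width of 10, then prints the frequency, stem, and ordered leaves on each line. The layout only approximates the SPSS display and is an assumption of this illustration.

```python
from collections import defaultdict

# Days to maturity from Table 2-3 (40 short-term investments).
maturity = [
    70, 64, 99, 55, 64,
    89, 87, 65, 62, 38,
    67, 70, 60, 69, 78,
    39, 75, 56, 71, 51,
    99, 68, 95, 86, 57,
    53, 47, 50, 55, 81,
    80, 98, 51, 36, 63,
    66, 85, 79, 83, 70,
]

stem_width = 10
stems = defaultdict(list)

# Sorting first means the leaves on every stem come out in order.
for value in sorted(maturity):
    stem, leaf = divmod(value, stem_width)
    stems[stem].append(leaf)

print("Frequency  Stem & Leaf")
for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{len(stems[stem]):>9}  {stem:>4} . {leaves}")
print(f"Stem width: {stem_width}")
```

With a stem width of 10, `divmod(36, 10)` gives stem 3 and leaf 6, which corresponds to the minimum value of 36 discussed above.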

2.4 Distribution Shapes


The distribution of a data set is a table, graph, or formula that provides the
values of the observations and how often they occur. The distribution of a
data set can be portrayed by frequency distributions, relative frequency
distributions, histograms, stem-and-leaf diagrams, pie charts, and bar charts.
An important aspect of the distribution of a quantitative data set is its shape.
To identify the shape of a distribution, the best approach usually is to use a
smooth curve that approximates the overall shape.
When considering the shape of a distribution, you should also observe its
number of peaks (highest points). A distribution is unimodal if it has one
peak, bimodal if it has two peaks, and multimodal if it has three or more
peaks.
When describing the shape of a distribution you should also consider whether
the distribution is symmetric or skewed. If the distribution can be divided into
two pieces that are mirror images of each other, then we call the distribution
symmetric. A unimodal distribution that is not symmetric is either
right-skewed or left-skewed.
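
The definitions of peaks and symmetry can be made concrete with a small Python sketch. It counts local maxima in a list of class frequencies and checks whether the list reads the same in both directions. The function names are hypothetical. The peak rule is deliberately naive: real data are noisy, so in practice shape is judged from a smooth curve rather than from exact counts.

```python
def count_peaks(frequencies):
    """Count local maxima in a sequence of class frequencies."""
    if not frequencies:
        raise ValueError("at least one frequency is required")
    # Collapse runs of equal neighbors so a flat top counts as one peak.
    collapsed = [f for i, f in enumerate(frequencies)
                 if i == 0 or f != frequencies[i - 1]]
    # Pad both ends so the first and last classes can also be peaks.
    padded = [float("-inf")] + collapsed + [float("-inf")]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def describe_shape(frequencies):
    """Return (modality, symmetric) for a sequence of class frequencies."""
    peaks = count_peaks(frequencies)
    if peaks == 1:
        modality = "unimodal"
    elif peaks == 2:
        modality = "bimodal"
    else:
        modality = "multimodal"
    symmetric = list(frequencies) == list(frequencies)[::-1]
    return modality, symmetric


# Frequencies tallied from Table 2-2 (0 through 6 TVs per household).
tv_shape = describe_shape([1, 16, 14, 12, 3, 2, 2])
print(tv_shape)
```

For the TV data the sketch reports one peak and no symmetry, which agrees with the histogram of a unimodal, right-skewed distribution. Exact mirror symmetry is rare in sample data, so the symmetry check should be read as an idealization.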

2.5 Misleading Graphs


Graphs and charts are frequently misleading, sometimes intentionally and
sometimes inadvertently. Regardless of intent, we need to read and interpret
graphs and charts with a great deal of care. You should also be careful to
make graphs that clearly present the material.

2.6 Problems
Problem 2.11  Top Broadcast Shows: The following table gives the top five
television shows, as determined by the Nielsen Ratings for the week ending
October 19, 2008. Identify the type of data provided by the information in each
column of Table 2-4.

Table 2-4 Top Broadcast Shows

Rank   Show title                Network   Viewers (millions)
1      CSI                       CBS       19.3
2      NCIS                      CBS       18.0
3      Dancing with the Stars    ABC       17.8
4      Desperate Housewives      ABC       15.5
5      The Mentalist             CBS       14.9

Problem 2.18  Top Broadcast Shows: The networks for the top 20 television
shows, as determined by the Nielsen Ratings for the week ending October 26,
2008, are shown in Table 2-5.

Table 2-5 Networks for top 20 Television Shows

CBS  ABC  CBS  ABC  ABC
Fox  CBS  CBS  Fox  CBS
ABC  CBS  CBS  CBS  Fox
Fox  Fox  CBS  Fox  ABC

a. determine a frequency distribution.
b. obtain a relative-frequency distribution.
c. draw a pie chart.
d. construct a bar chart.

Problem 2.20  Colleges of Students: Table 2-6 provides data on college for the
students in one section of the course Introduction to Computer Science during
one semester at Arizona State University. In the table, we use the abbreviations
BUS for Business, ENG for Engineering and Applied Sciences, and LIB for
Liberal Arts and Sciences.

Table 2-6 College of the students

ENG  ENG  BUS  BUS  ENG
LIB  LIB  ENG  ENG  ENG
BUS  BUS  ENG  BUS  ENG
LIB  BUS  BUS  BUS  ENG
ENG  ENG  LIB  ENG  BUS

a. determine a frequency distribution.
b. obtain a relative-frequency distribution.
c. draw a pie chart.
d. construct a bar chart.

Problem 2.56  Residential Energy Consumption: The U.S. Energy Information
Administration collects data on residential energy consumption and
expenditures. Results are published in the document Residential Energy
Consumption Survey: Consumption and Expenditures. The following table
gives one year's energy consumption for a sample of 50 households in the
South. Data are in millions of BTUs. Use the data in Table 2-7 to determine
a frequency distribution, obtain a relative-frequency distribution, and construct
a histogram.

Table 2-7 Annual Energy Consumption

130   55   45   64  155   66   60   80  102   62
 58  101   75  111  151  139   81   55   66   90
 97   77   51   67  125   50  136   55   83   91
 54   86  100   78   93  113  111  104   96  113
 96   87  129  109   69   94   99   97   83   97

Problem 2.57  Early-Onset Dementia: Dementia is a person's loss of intellectual
and social abilities that is severe enough to interfere with judgment, behavior,
and daily functioning. Alzheimer's disease is the most common type of dementia.
In the article "Living with Early Onset Dementia: Exploring the Experience and
Developing Evidence-Based Guidelines for Practice" (Alzheimer's Care
Quarterly, Vol. 5, Issue 2, pp. 111-122), P. Harris and J. Keady explored the
experience and struggles of people diagnosed with dementia and their families.
A simple random sample of 21 people with early-onset dementia gave data on
age, in years, at diagnosis in Table 2-8.

Table 2-8 Age at Diagnosis

60  58  52  58  59  58  51
61  54  59  55  53  44  46
47  42  56  57  49  41  43

Problem 2.71  University Patents: The number of patents a university receives
is an indicator of the research level of the university. From a study titled
Science and Engineering Indicators issued by the National Science Foundation,
we found the number of U.S. patents awarded to a sample of 36 private and
public universities. The results are presented in Table 2-9. Construct a
stem-and-leaf diagram for these data.

Table 2-9 Number of Patents

93  27  11  30   9  30  35  20   9
35  24  19  14  29  11   2  55  15
35   2  15   4  16  79  16  22  49
 3  69  23  18  41  11   7  34  16
