Chapter 2 Organizing Data
CHAPTER 2
Organizing Data
GENERAL OBJECTIVE
In the last chapter, you learned how to obtain random samples. There are
many tools used in statistics to summarize and organize data. This chapter
will focus on how to obtain tables, graphs, pictures, and charts that are
commonly used to organize data. You should be familiar with Chapter 2 of
your textbook before beginning this chapter.
LESSON OUTLINE
2.1
2.2
2.3
2.4
2.5
2.6
D  R  R  D  R  O  O  R  R  R
D  D  R  D  D  R  O  R  O  R
O  D  O  R  R  D  O  O  R  D
R  R  R  O  R  D  D  R  D  D
Section 2.2
Solution
Figure 2-1: Frequencies dialog box
4. Click the OK button and the Frequencies table will open in the Viewer
window (Figure 2-2).
Figure 2-2: Frequency table for political party

PARTY
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   D            13        32.5        32.5               32.5
        O             9        22.5        22.5               55.0
        R            18        45.0        45.0              100.0
        Total        40       100.0       100.0
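The columns of the frequency table can also be checked by hand or with a few
lines of code outside SPSS. The following is a minimal Python sketch, not part
of the SPSS procedure; the names party, counts, and rows are chosen here only
for illustration, and the data are the 40 party values listed earlier in this
section.

```python
from collections import Counter

# The 40 party values from this section (D = Democrat, R = Republican, O = other).
party = list(
    "DRRDROORRR"
    "DDRDDROROR"
    "ODORRDOORD"
    "RRRORDDRDD"
)

counts = Counter(party)
n = len(party)

# Build one row per category: frequency, percent, cumulative percent.
rows = []
cumulative = 0.0
for category in sorted(counts):  # alphabetical order: D, O, R
    percent = 100.0 * counts[category] / n
    cumulative += percent
    rows.append((category, counts[category], round(percent, 1), round(cumulative, 1)))

print(f"{'PARTY':<8}{'Frequency':>10}{'Percent':>10}{'Cumulative':>12}")
for category, frequency, percent, cum in rows:
    print(f"{category:<8}{frequency:>10}{percent:>10.1f}{cum:>12.1f}")
print(f"{'Total':<8}{n:>10}{100.0:>10.1f}")
```

Because there are no missing values in this data set, the Percent and Valid
Percent columns agree, so the sketch computes only one of them.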
Figure 2-5: Pie Chart in the Viewer
The pie chart that is created will show a different color for each category. In
this example the categories are D (for Democrat), R (for Republican), and O
(for other).
If the pie chart is printed in black and white, it is impossible to see the
colors. In this case, it is preferable to display each section with a different
pattern rather than a different color.
To make changes to a pie chart, double click on the pie chart to open the Chart
Editor dialog box (Figure 2-6). Double click on the pie chart in the Chart
Editor to open the Properties dialog box and select the tab for variables
(Figure 2-7).
Figure 2-6: Chart Editor dialog box
Figure 2-7: Properties dialog box for pie chart
Click on Style and select Style: Pattern. Click on APPLY. The pie chart will
now have a different pattern for each category rather than a different color
(Figure 2-8).
Figure 2-8: Pie Chart showing a different pattern for each category
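Each slice of a pie chart takes up a share of the circle equal to the relative
frequency of its category. As a rough check on what the chart should look like,
the short Python sketch below (independent of SPSS; the name frequencies is an
assumption made for illustration) converts the political party frequencies into
slice angles.

```python
# Frequencies counted from the 40 political party values in this section.
frequencies = {"D": 13, "O": 9, "R": 18}
total = sum(frequencies.values())

# A slice's central angle is its relative frequency times 360 degrees.
angles = {category: 360.0 * count / total for category, count in frequencies.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The largest slice belongs to R, which accounts for 45 percent of the
observations and therefore a little less than half of the circle.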
Section 2.3
1  3  3  0  3  1  2  1  3  2
1  1  1  1  1  2  5  4  2  2
6  2  3  1  1  3  1  2  2  1
3  3  2  3  3  4  6  2  1  1
2  2  2  1  5  4  2  3  3  1
Figure 2-9: Frequencies dialog box
Since frequency distributions are often the first tool used to describe a data
set, SPSS provides buttons in the Frequencies dialog box (see Figure 2-9)
for calculating some common statistics and charts. The next two sections will
discuss some of these charts.
Constructing a Histogram
A histogram is a picture of the data that shows the shape, center, and spread
of the distribution.
Example 2.15
TVs per Household: Consider again the data in Table 2-2 of Example 2.12,
the number of TVs per household for 50 randomly selected households. Make
a histogram for these data.
Solution
Figure 2-11: Frequencies: Charts dialog box
SPSS is now set up to make a frequencies report and a histogram for the
variable, TVs. At this point, we could click the Statistics button in the
Frequencies dialog box (Figure 2-1) to have SPSS calculate some
descriptive statistics as well. We will discuss descriptive statistics in the next
chapter.
7. Click the OK button in the Frequencies dialog box and the
Viewer window (Figure 2-12) will open.
The results of a procedure run from a dialog box are displayed in the
Viewer window. In this case, a frequencies report and a histogram are
shown in the Viewer window (Figure 2-12).
Figure 2-12: Viewer window showing the output of the Frequencies procedure
The Viewer has its own menus. Notice that the menu bar for the Viewer is
different from the menu bar for the Data Editor. The Viewer window is
divided into two independent panes. The left pane (Outline pane) shows the
output in outline view. The right pane (Output pane) shows the output (the
tables, charts, and text output). You can use the scroll bars to browse the
results or select an item in the Outline pane to re-center the Output pane on
the selected item.
The Outline pane can be used to selectively display or hide individual results
in the Output pane. This is useful for reducing the amount of visible output in
the Output pane. For example, Notes, which has information about the
execution of a procedure, is hidden by default.
The histogram (Figure 2-13) shows that the distribution of the data values is
decidedly skewed to the right and centered at approximately two.
Figure 2-13: Histogram: Number of TVs in household
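Because the number of TVs takes only a few whole-number values, the bar heights
in a histogram of these data are simply the counts of each value. The following
Python sketch (not SPSS output; the names tvs and counts are illustrative
assumptions) tallies the 50 observations and prints a rough text version of the
histogram.

```python
from collections import Counter

# Number of TVs per household for the 50 households in this section.
tvs = [
    1, 3, 3, 0, 3, 1, 2, 1, 3, 2,
    1, 1, 1, 1, 1, 2, 5, 4, 2, 2,
    6, 2, 3, 1, 1, 3, 1, 2, 2, 1,
    3, 3, 2, 3, 3, 4, 6, 2, 1, 1,
    2, 2, 2, 1, 5, 4, 2, 3, 3, 1,
]

counts = Counter(tvs)

# One row per value; the bar length equals the frequency of that value.
for value in range(min(tvs), max(tvs) + 1):
    print(f"{value} | {'*' * counts[value]} ({counts[value]})")
```

The longest bars fall at 1, 2, and 3 TVs, and the counts taper off toward 6,
which is the long right tail seen in the histogram.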
Stem-and-Leaf Diagrams
Statisticians continue to invent ways to display data. One method, developed in
the 1960s by the late Professor John Tukey of Princeton University, is called a
stem-and-leaf diagram, or stemplot. This ingenious diagram is often easier to
construct than either a frequency distribution or a histogram and generally
displays more information.
With a stem-and-leaf diagram, we think of each observation as a stem,
consisting of all but the rightmost digit, and a leaf, the rightmost digit. In
general, stems may use as many digits as required, but each leaf must contain only
one digit.
Table 2-3: Days to maturity for 40 short-term investments
64  87  70  75  68  47  98  85
99  65  60  56  95  50  51  79
55  62  69  71  86  55  36  83
64  38  78  51  57  81  63  70
Solution
Figure 2-14: Explore dialog box
The Explore procedure can produce both descriptive statistics and graphical
displays. We will discuss the descriptive statistics in the next chapter.
We can control what is displayed in the Viewer window by choosing one of
the bullets in the Display section.
3a. Choose the bullet for Plots to display the plots and suppress all descriptive
statistics, or
3b. Choose the bullet for Statistics to display the descriptive statistics and
suppress printing all plots, or
3c. Choose the bullet for Both to display both descriptive statistics and plots.
4. Click the Plots button to open the Explore: Plots dialog box
(Figure 2-15).
This dialog box can be used to make certain plots and diagrams, such as
boxplots, stem-and-leaf diagrams, and histograms. We will discuss the
boxplot in the next chapter.
5. Choose the checkbox for Stem-and-leaf to make a stem-and-leaf diagram
for each of the variables in the Dependent List in the Explore dialog box.
6. Choose the bullet for None to suppress making any boxplots.
Figure 2-15: Explore: Plots dialog box
Notice that the Explore: Plots dialog box has both bullets and checkboxes.
Checkboxes and bullets serve two different purposes. Within a grouping, only
a single bullet may be chosen while multiple checkboxes can be selected. For
example, there are two different types of Boxplots that can be made, but only
one type is appropriate for a set of data. Therefore, only one of the bullets in
the Boxplot grouping can be chosen.
Choosing the checkbox for Histogram and the checkbox for Stem-and-leaf
will display both the histogram and stem-and-leaf diagram in the Viewer
window. We have already seen that a histogram can also be made in the
Frequencies: Charts dialog box (Figure 2-11).
Copyright 2012 Pearson Education, Inc. Publishing as Addison-Wesley.
Let us focus on the detail in the stem-and-leaf diagram (Figure 2-17). Below
the stem-and-leaf diagram we see Stem width: 10 and Each leaf: 1
case(s). This tells us that each leaf in the stem-and-leaf diagram represents
one observation and that the difference between two consecutive stem values is
10. The first stem, 3, therefore represents 30. If the stem width had
been 100, the first stem, 3, would have represented 300.
The first column gives the frequency on each of the stems, which can be useful
when there are many values on a stem. The second column gives the stem, and it
is followed by a column of dots. To the right of the dots are the leaves. Since
the leaves on each line are ordered, we see that SPSS displays an ordered
stem-and-leaf diagram. The leaf 6 on the row associated with stem 3 tells us
that the minimum data value is 36.
Like the histogram, the stem-and-leaf diagram helps to show the shape of the
distribution.
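The layout just described (frequency, stem, then ordered leaves, with a stem
width of 10) can be imitated in a few lines of Python. This is only a sketch of
the idea and does not reproduce SPSS's algorithm or exact formatting; the
function name stem_and_leaf is hypothetical, and the data are the days-to-maturity
values as they are listed in Table 2-3 of this chapter.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (frequency, stem, leaves) rows of an ordered stem-and-leaf diagram.

    Assumes non-negative whole numbers and a stem width of 10, so each leaf
    is the single rightmost digit and the stem is everything to its left.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):  # sorting first keeps each row's leaves ordered
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    return [
        (len(leaves), stem, "".join(str(leaf) for leaf in leaves))
        for stem, leaves in sorted(leaves_by_stem.items())
    ]

# Days-to-maturity values as listed in this chapter's Table 2-3.
days = [
    64, 87, 70, 75, 68, 47, 98, 85,
    99, 65, 60, 56, 95, 50, 51, 79,
    55, 62, 69, 71, 86, 55, 36, 83,
    64, 38, 78, 51, 57, 81, 63, 70,
]

rows = stem_and_leaf(days)
for frequency, stem, leaves in rows:
    print(f"{frequency:>5}  {stem} . {leaves}")
print("Stem width: 10")
print("Each leaf: 1 case(s)")
```

Sorting the values before splitting them is what makes the diagram ordered, and
the first leaf on the first row identifies the minimum data value.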
Section 2.5
2.6 Problems
Problem 2.11
Top Broadcast Shows: The following table gives the top five television shows,
as determined by the Nielsen Ratings for the week ending October 19, 2008.
Identify the type of data provided by the information in each column of
Table 2-4.

Table 2-4: Top Broadcast Shows
Rank   Show title                Network   Viewers (millions)
1      CSI                       CBS       19.3
2      NCIS                      CBS       18.0
3      Dancing with the Stars    ABC       17.8
4      Desperate Housewives      ABC       15.5
5      The Mentalist             CBS       14.9

Problem 2.18
Top Broadcast Shows: The networks for the top 20 television shows, as
determined by the Nielsen Ratings for the week ending October 26, 2008, are
shown in Table 2-5.

Table 2-5: Networks for top 20 Television Shows
CBS  Fox  ABC  Fox  ABC
CBS  CBS  Fox  CBS  CBS
CBS  CBS  ABC  Fox  CBS
Fox  ABC  CBS  Fox  ABC
Problem 2.20
Table 2-6: College of the students
ENG  LIB  BUS  BUS  ENG
BUS  ENG  ENG  BUS  LIB
BUS  ENG  BUS  BUS  ENG
ENG  ENG  ENG  ENG  BUS
Table 2-7: Annual Energy Consumption
130   58   97   54   96   55  101   77   86   87
 45   75   51  100  129   64  111   67   78  109
155  151  125   93   69   66  139   50  113   94
 60   81  136  111   99   80   55   55  104   97
102   66   83   96   83   62   90   91  113   97
Problem 2.57
Table 2-8: Age at Diagnosis
Problem 2.71
Table 2-9: Number of Patents
58
54
42
52
59
56
58
55
57
59
53
49
58
44
41
51
46
43
27
24
2
69
11
19
15
23
30
14
4
18
9
29
16
41
30
11
79
11
35
2
16
7
20
55
22
34
9
15
49
16