CHAPTER 2 Descriptive Statistics


CHAPTER 2, PART A - DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL DISPLAYS

SUMMARIZING CATEGORICAL DATA

 Frequency Distribution

 Relative Frequency Distribution


o The relative frequency of a class is the fraction or proportion of the total number of data items belonging to a
class.
o A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each
class.
o Relative frequency of a class = (Frequency of the class) / n, where n is the total number of data items.
 Percent Frequency Distribution
o The percent frequency of a class is the relative frequency multiplied by 100.
o A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each
class.
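
The three tabular summaries above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the sample data and the function name `frequency_table` are made up for the example.

```python
from collections import Counter

# Hypothetical sample of categorical data (ten soft drink purchases).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

def frequency_table(data):
    """Return {class: (frequency, relative frequency, percent frequency)}."""
    n = len(data)                      # total number of data items
    counts = Counter(data)             # frequency of each class
    return {label: (f, f / n, 100 * f / n) for label, f in counts.items()}

table = frequency_table(purchases)
for label, (freq, rel, pct) in table.items():
    print(f"{label:<8}{freq:>4}{rel:>8.2f}{pct:>8.1f}")
```

The relative frequencies sum to 1 and the percent frequencies sum to 100, which is a quick check on any table of this kind.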

 Bar Chart
o A bar chart is a graphical display for depicting qualitative data.
o On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
o A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical
axis).
o Using a bar of fixed width drawn above each class label, we extend the height appropriately.
o The bars are separated to emphasize the fact that each class is separate.

Using Excel’s Recommended Charts Tool to Construct a Bar Chart

Step 1 - Select cells C1:D6
Step 2 - Click Insert on the Ribbon
Step 3 - In the Charts group, click Recommended Charts (a preview of the bar chart appears)
Step 4 - Click OK (the bar chart will appear in a new worksheet)

 Pareto Diagram
o In quality control, bar charts are used to identify the most important causes of problems.
o When the bars are arranged in descending order of height from left to right (with the most frequently
occurring cause appearing first) the bar chart is called a Pareto diagram.
o This diagram is named after Vilfredo Pareto, an Italian economist.
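
As a rough sketch of the ordering a Pareto diagram uses, the snippet below sorts causes by descending frequency and prints a text version of the bars. The defect counts and the name `pareto_order` are hypothetical.

```python
# Hypothetical counts of defect causes from a quality-control log.
defect_counts = {"Scratch": 12, "Misalignment": 30, "Dent": 7, "Discoloration": 18}

def pareto_order(counts):
    """Sort (cause, frequency) pairs from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

ordered = pareto_order(defect_counts)
for cause, freq in ordered:
    # One '#' per occurrence stands in for the height of the bar.
    print(f"{cause:<14} {'#' * freq} {freq}")
```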

 Pie Chart
o The pie chart is a commonly used graphical display for presenting relative frequency and percent frequency
distributions for categorical data.
o First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the
relative frequency for each class.
o Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
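
The sector calculation can be written as a small helper. This is only a sketch; the function name `sector_angles` and the class labels are assumptions made for the example.

```python
def sector_angles(relative_frequencies):
    """Convert each class's relative frequency to degrees of a 360-degree circle."""
    return {label: rel * 360 for label, rel in relative_frequencies.items()}

# Hypothetical relative frequency distribution for three classes.
angles = sector_angles({"A": 0.25, "B": 0.50, "C": 0.25})
print(angles["A"])  # 90.0, matching the .25(360) example in the text
```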

SUMMARIZING QUANTITATIVE DATA

 Frequency Distribution
Step 1 - Determine the number of non-overlapping classes.
Step 2 - Determine the width of each class.
Step 3 - Determine the class limits.

Guidelines for Determining the Number of Classes

o Use between 5 and 20 classes.


o Data sets with a larger number of elements usually require a larger number of classes.
o Smaller data sets usually require fewer classes.
o The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a
few data items.

Guidelines for Determining the Width of Each Class


o Use classes of equal width.
o Approximate Class Width = (Largest data value − Smallest data value) / Number of classes
o Making the classes the same width reduces the chance of inappropriate interpretations.

Guidelines for Determining the Class Limits


o Class limits must be chosen so that each data item belongs to one and only one class.
o The lower class limit identifies the smallest possible data value assigned to the class.
o The upper class limit identifies the largest possible data value assigned to the class.
o The appropriate values for the class limits depend on the level of accuracy of the data.
o An open-end class requires only a lower class limit or an upper class limit.
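
The three steps can be tied together in a short sketch. It assumes integer-valued data, rounds the approximate class width up to a whole number, and starts the first class at the smallest data value; these are choices made for the example, and in practice the limits are often rounded to more convenient values. The data and the name `frequency_distribution` are hypothetical.

```python
import math

# Hypothetical integer-valued data set with 20 items.
data = [11, 14, 19, 18, 15, 16, 18, 17, 20, 26,
        22, 23, 24, 21, 35, 28, 14, 19, 16, 13]

def frequency_distribution(values, num_classes):
    """Build equal-width, non-overlapping classes and count items in each."""
    # Approximate class width = (largest - smallest) / number of classes,
    # rounded up here so the classes cover the whole range.
    width = math.ceil((max(values) - min(values)) / num_classes)
    classes = []
    lower = min(values)
    while lower <= max(values):
        upper = lower + width - 1      # limits suit data recorded as integers
        count = sum(lower <= x <= upper for x in values)
        classes.append((lower, upper, count))
        lower += width
    return width, classes

width, classes = frequency_distribution(data, num_classes=5)
for lower, upper, count in classes:
    print(f"{lower}-{upper}: {count}")
```

Because the width is only approximate, the loop can produce one more class than requested for some data sets; each data item still falls in exactly one class.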

Class Midpoint

o In some cases, we want to know the midpoints of the classes in a frequency distribution for quantitative data.
o The class midpoint is the value halfway between the lower and upper class limits.
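
As a one-line illustration of the definition, with hypothetical class limits of 10 and 14:

```python
def class_midpoint(lower_limit, upper_limit):
    """Value halfway between the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

midpoint = class_midpoint(10, 14)
print(midpoint)  # 12.0
```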

 Dot Plot
o One of the simplest graphical summaries of data is a dot plot.
o A horizontal axis shows the range of data values.
o Then each data value is represented by a dot placed above the axis.

 Histogram
o Another common graphical display of quantitative data is a histogram.
o The variable of interest is placed on the horizontal axis.
o A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative
frequency, or percent frequency.
o Unlike a bar chart, a histogram has no natural separation between rectangles of adjacent classes.

 Cumulative Distributions

o Cumulative frequency distribution - shows the number of items with values less than or equal to the upper limit of
each class.
o Cumulative relative frequency distribution - shows the proportion of items with values less than or equal to the
upper limit of each class.
o Cumulative percent frequency distribution - shows the percentage of items with values less than or equal to the
upper limit of each class.
o The last entry in a cumulative frequency distribution always equals the total number of observations.
o The last entry in a cumulative relative frequency distribution always equals 1.00.
o The last entry in a cumulative percent frequency distribution always equals 100.
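
A running total is all that is needed to build the three cumulative distributions. The sketch below uses hypothetical class frequencies, listed in order of increasing upper class limit.

```python
from itertools import accumulate

# Hypothetical frequency distribution (upper class limits and frequencies).
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)                               # total number of observations
cum_freq = list(accumulate(frequencies))           # running count of items
cum_rel = [c / n for c in cum_freq]                # running proportion
cum_pct = [100 * c / n for c in cum_freq]          # running percentage

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit}: {cf:>3} {cr:>6.2f} {cp:>6.1f}")
```

The final entries come out as n, 1.00, and 100, as the last three points above state.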

 Stem-and-Leaf Display
o A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
o It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
o The first digits of each data item are arranged to the left of a vertical line.
o To the right of the vertical line we record the last digit for each item in rank order.
o Each line (row) in the display is referred to as a stem.
o Each digit on a stem is a leaf.

o If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display
vertically by using two stems for each leading digit(s).
o Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value
corresponds to leaf values of 5 - 9.
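
A basic (unstretched) stem-and-leaf display can be sketched as follows. The example assumes non-negative integers, takes the last digit as the leaf, and omits stems that have no data; the scores and the name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its leaves (last digits) in rank order."""
    display = defaultdict(list)
    for value in sorted(values):       # sorting puts the leaves in rank order
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

# Hypothetical test scores.
scores = [68, 73, 75, 81, 72, 92, 85, 76, 69, 97]
display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```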

CHAPTER 2, PART B - DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL DISPLAYS

 Crosstabulation
o A crosstabulation is a tabular summary of data for two variables.
o Crosstabulation can be used when:
     one variable is categorical and the other is quantitative,
     both variables are categorical, or
     both variables are quantitative.
o The left and top margin labels define the classes for the two variables.
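
A crosstabulation amounts to counting how often each pair of classes occurs. The following sketch uses hypothetical restaurant records (a quality rating paired with a meal price class); the function name `crosstab` is also an assumption.

```python
from collections import Counter

# Hypothetical records: (quality rating, meal price class).
records = [("Good", "$10-19"), ("Very Good", "$20-29"), ("Good", "$10-19"),
           ("Excellent", "$30-39"), ("Very Good", "$10-19"), ("Excellent", "$20-29"),
           ("Good", "$20-29"), ("Very Good", "$20-29")]

def crosstab(pairs):
    """Return row labels, column labels, and a table of joint frequencies."""
    counts = Counter(pairs)            # a missing pair counts as 0
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(records)
print(f"{'':<10}" + "".join(f"{c:>8}" for c in cols))
for label, row in zip(rows, table):
    print(f"{label:<10}" + "".join(f"{v:>8}" for v in row))
```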

Crosstabulation: Simpson’s Paradox

o Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
o We must be careful in drawing conclusions about the relationship between the two variables in the aggregated
crosstabulation.
o In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the
unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.
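
The reversal is easiest to see with numbers. The counts below are invented purely for illustration: option A has the higher success rate within each group, yet option B has the higher rate once the two groups are combined.

```python
# Hypothetical (successes, total) counts for options A and B in two groups.
data = {
    "Group 1": {"A": (90, 100), "B": (800, 1000)},
    "Group 2": {"A": (600, 1000), "B": (50, 100)},
}

def rate(successes, total):
    """Proportion of successes."""
    return successes / total

def aggregate(groups, option):
    """Add up successes and totals for one option across all groups."""
    successes = sum(group[option][0] for group in groups.values())
    total = sum(group[option][1] for group in groups.values())
    return successes, total

for name, group in data.items():
    print(name, f"A={rate(*group['A']):.2f}", f"B={rate(*group['B']):.2f}")

agg_a = rate(*aggregate(data, "A"))
agg_b = rate(*aggregate(data, "B"))
print("Aggregated", f"A={agg_a:.2f}", f"B={agg_b:.2f}")
```

The reversal arises because the two options are spread very unevenly across the groups, so the aggregated rates are dominated by different groups.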

 Scatter Diagram and Trendline


o A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
o One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
o The general pattern of the plotted points suggests the overall relationship between the variables.
o A trendline provides an approximation of the relationship.
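o Positive relationship - the points trend upward from left to right: as one variable increases, the other tends to increase.
o Negative relationship - the points trend downward from left to right: as one variable increases, the other tends to decrease.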
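
The notes do not say how a trendline is fitted. One common choice, assumed here, is the least-squares line; the sign of its slope indicates a positive or negative relationship. The data points and the name `least_squares_line` are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations of two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
direction = "positive" if slope > 0 else "negative" if slope < 0 else "none"
print(f"trendline: y = {intercept:.2f} + {slope:.2f}x ({direction} relationship)")
```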

 Side-by-Side Bar Chart


o A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same display.
o Each cluster of bars represents one value of the first variable.
o Each bar within a cluster represents one value of the second variable.

 Stacked Bar Chart


o A stacked bar chart is another way to display and compare two variables on the same display.
o It is a bar chart in which each bar is broken into rectangular segments of a different color.
o If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100%
mark.
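
To see why every bar reaches the 100% mark when percent frequencies are used, the sketch below converts the segment counts within each bar to percentages. The counts and the name `percent_segments` are hypothetical.

```python
def percent_segments(counts_by_bar):
    """Convert each bar's segment counts to percent frequencies within that bar."""
    result = {}
    for bar, segments in counts_by_bar.items():
        total = sum(segments.values())
        result[bar] = {name: 100 * count / total for name, count in segments.items()}
    return result

# Hypothetical counts: quality ratings (segments) within each price class (bar).
counts = {
    "$10-19": {"Good": 42, "Very Good": 34, "Excellent": 2},
    "$20-29": {"Good": 40, "Very Good": 64, "Excellent": 14},
}

percents = percent_segments(counts)
for bar, segments in percents.items():
    print(bar, {name: round(p, 1) for name, p in segments.items()})
```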

 Choosing the Type of Graphical Display


 Displays used to show the distribution of data:

o Bar Chart to show the frequency distribution or relative frequency distribution for categorical data
o Pie Chart to show the relative frequency or percent frequency for categorical data
o Dot Plot to show the distribution for quantitative data over the entire range of the data
o Histogram to show the frequency distribution for quantitative data over a set of class intervals
o Stem-and-Leaf Display to show both the rank order and shape of the distribution for quantitative data

 Displays used to make comparisons:


o Side-by-Side Bar Chart to compare two variables
o Stacked Bar Chart to compare the relative frequency or percent frequency of two categorical variables

 Displays used to show relationships:


o Scatter Diagram to show the relationship between two quantitative variables
o Trendline to approximate the relationship of data in a scatter diagram

 Data Dashboards
o A data dashboard is a widely used data visualization tool.
o It organizes and presents key performance indicators (KPIs) used to monitor an organization or
process.
o It provides timely, summary information that is easy to read, understand, and interpret.
o Some additional guidelines include:
 Minimize the need for screen scrolling.
 Avoid unnecessary use of color or 3D.
 Use borders between charts to improve readability.
