CHAPTER 2 Descriptive Statistics
Frequency Distribution
Bar Chart
o A bar chart is a graphical display for depicting qualitative data.
o On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
o A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical
axis).
o Using a bar of fixed width drawn above each class label, we extend the height appropriately.
o The bars are separated to emphasize the fact that each class is separate.
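The three scales mentioned above (frequency, relative frequency, percent frequency) can be sketched in a few lines of Python. The ratings below are made-up sample data, and `frequency_table` is a hypothetical helper name; the text bars are drawn horizontally only because that is easier to print.

```python
from collections import Counter

# Hypothetical service ratings (made-up sample, for illustration only).
ratings = ["Good", "Excellent", "Good", "Poor", "Good",
           "Excellent", "Good", "Poor", "Excellent", "Good"]


def frequency_table(values):
    """Map each class label to (frequency, relative frequency, percent frequency)."""
    counts = Counter(values)
    n = len(values)
    return {label: (count, count / n, 100 * count / n)
            for label, count in counts.items()}


table = frequency_table(ratings)
for label, (freq, rel, pct) in table.items():
    # Crude horizontal bar: one '#' per observation in the class.
    print(f"{label:<10} {'#' * freq:<6} {freq:>2}  {rel:.2f}  {pct:.0f}%")
```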
Using Excel’s Recommended Charts Tool to Construct a Bar Chart
Pareto Diagram
o In quality control, bar charts are used to identify the most important causes of problems.
o When the bars are arranged in descending order of height from left to right (with the most frequently
occurring cause appearing first) the bar chart is called a Pareto diagram.
o This diagram is named for Vilfredo Pareto, an Italian economist.
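A minimal sketch of the Pareto ordering: count each cause, then list the causes from most to least frequent. The defect log is invented for illustration, and `pareto_order` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical defect causes logged during inspection (illustrative data only).
causes = ["scratch", "dent", "scratch", "misalignment",
          "scratch", "dent", "scratch"]


def pareto_order(values):
    """Return (cause, frequency) pairs sorted from most to least frequent."""
    # Counter.most_common() with no argument returns all items in descending order.
    return Counter(values).most_common()


ordered = pareto_order(causes)
for cause, freq in ordered:
    print(f"{cause:<14} {'#' * freq}")
```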
Pie Chart
o The pie chart is a commonly used graphical display for presenting relative frequency and percent frequency
distributions for categorical data.
o First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the
relative frequency for each class.
o Since there are 360 degrees in a circle, a class with a relative frequency of 0.25 would occupy 0.25 x 360 = 90
degrees of the circle.
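The sector arithmetic can be checked with a short sketch. The class labels and relative frequencies are assumed values chosen for illustration; `sector_angles` is a hypothetical helper name.

```python
def sector_angles(relative_frequencies):
    """Convert each class's relative frequency into a sector angle in degrees."""
    return {label: rel * 360 for label, rel in relative_frequencies.items()}


# Assumed relative frequencies; they must sum to 1 for the sectors to fill the circle.
angles = sector_angles({"A": 0.25, "B": 0.50, "C": 0.25})
print(angles)
```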
Frequency Distribution
Step 1 - Determine the number of non-overlapping classes.
Step 2 - Determine the width of each class.
Step 3 - Determine the class limits.
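The three steps can be sketched as follows, under stated assumptions: the data are integers, the number of classes is chosen by the analyst, and the width is the count of distinct possible values divided by the number of classes, rounded up so the last class always contains the largest value. The data values and the helper name `class_limits` are made up for illustration; in practice the width is often rounded further to a convenient number.

```python
import math

# Hypothetical integer-valued observations (illustrative only).
data = [12, 15, 21, 18, 25, 30, 14, 22, 27, 19, 16, 24]


def class_limits(values, num_classes):
    """Return inclusive (lower, upper) limits for integer data."""
    low, high = min(values), max(values)
    # Step 2: width, rounded up so num_classes classes cover low..high.
    width = math.ceil((high - low + 1) / num_classes)
    # Step 3: consecutive, non-overlapping limits starting at the minimum.
    return [(low + i * width, low + (i + 1) * width - 1)
            for i in range(num_classes)]


limits = class_limits(data, num_classes=4)  # Step 1: four classes (analyst's choice)
frequencies = [sum(lo <= v <= hi for v in data) for lo, hi in limits]
for (lo, hi), freq in zip(limits, frequencies):
    print(f"{lo}-{hi}: {freq}")
```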
Class Midpoint
In some cases, we want to know the midpoints of the classes in a frequency distribution for quantitative data.
The class midpoint is the value halfway between the lower and upper class limits.
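As a quick illustration, the midpoint is the average of the two class limits. The class limits below are assumed values, and `class_midpoint` is a hypothetical helper name.

```python
def class_midpoint(lower_limit, upper_limit):
    """Value halfway between the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2


# Assumed class limits, for illustration only.
midpoints = [class_midpoint(lo, hi) for lo, hi in [(10, 14), (15, 19), (20, 24)]]
print(midpoints)
```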
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.
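A text-only sketch of a dot plot, assuming small non-negative integer data so that each value fits in a three-character column. The helper name `dot_plot` and the sample values are made up for illustration.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: one '.' stacked above the axis per observation."""
    counts = Counter(values)
    low, high = min(values), max(values)
    axis_values = range(low, high + 1)
    rows = []
    # Build rows from the tallest stack down to the row just above the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(" . " if counts[v] >= level else "   "
                            for v in axis_values))
    # Horizontal axis showing the range of data values.
    rows.append("".join(f"{v:^3}" for v in axis_values))
    return "\n".join(rows)


plot = dot_plot([1, 2, 2, 3, 3, 3])  # made-up sample
print(plot)
```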
Histogram
o Another common graphical display of quantitative data is a histogram.
o The variable of interest is placed on the horizontal axis.
o A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative
frequency, or percent frequency.
o Unlike a bar chart, a histogram has no natural separation between rectangles of adjacent classes.
Cumulative Distributions
A cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of
each class.
A cumulative relative frequency distribution shows the proportion of items with values less than or equal to the
upper limit of each class.
A cumulative percent frequency distribution shows the percentage of items with values less than or equal to the
upper limit of each class.
The last entry in a cumulative frequency distribution always equals the total number of observations.
The last entry in a cumulative relative frequency distribution always equals 1.00.
The last entry in a cumulative percent frequency distribution always equals 100.
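The three cumulative distributions are running totals of the class frequencies. A minimal sketch, using assumed class limits and frequencies chosen only for illustration:

```python
from itertools import accumulate

# Assumed upper class limits and class frequencies (illustrative only).
upper_limits = [14, 19, 24, 29]
frequencies = [3, 7, 6, 4]

n = sum(frequencies)
cum_freq = list(accumulate(frequencies))      # running total of frequencies
cum_rel = [c / n for c in cum_freq]           # proportion at or below each upper limit
cum_pct = [100 * c / n for c in cum_freq]     # percentage at or below each upper limit

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit}: {cf:>2}  {cr:.2f}  {cp:.0f}%")
```

The last entries come out as the total count, 1.00, and 100, matching the three statements above.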
Stem-and-Leaf Display
o A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
o It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
o The first digits of each data item are arranged to the left of a vertical line.
o To the right of the vertical line we record the last digit for each item in rank order.
o Each line (row) in the display is referred to as a stem.
o Each digit on a stem is a leaf.
o If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display
vertically by using two stems for each leading digit(s).
o Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value
corresponds to leaf values of 5 - 9.
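A minimal sketch of both the basic and the stretched display, assuming non-negative integer data where the last digit is the leaf and the remaining leading digits form the stem. The helper name `stem_and_leaf` and the sample values are made up; stems with no leaves are simply omitted in this sketch.

```python
from collections import defaultdict


def stem_and_leaf(values, stretched=False):
    """Return display rows such as '7 | 2 3 5', with leaves in rank order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        # In a stretched display, leaves 0-4 and 5-9 go on separate rows.
        half = 1 if (stretched and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for (stem, _), leaves in sorted(rows.items())]


scores = [68, 72, 75, 81, 73, 69, 85]  # made-up sample
basic = stem_and_leaf(scores)
stretched = stem_and_leaf(scores, stretched=True)
print("\n".join(basic))
print()
print("\n".join(stretched))
```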
CHAPTER 2, PART B --- DESCRIPTIVE STATISTICS: TABULAR AND GRAPHICAL DISPLAYS
Crosstabulation
o A crosstabulation is a tabular summary of data for two variables.
o Crosstabulation can be used when:
  o one variable is categorical and the other is quantitative,
  o both variables are categorical, or
  o both variables are quantitative.
o The left and top margin labels define the classes for the two variables.
o Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
o We must be careful in drawing conclusions about the relationship between the two variables in the aggregated
crosstabulation.
o In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the
unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.
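Simpson's paradox can be seen in a small numeric sketch. The counts below are entirely hypothetical, chosen so that method A has the higher success rate within each group while method B has the higher rate once the groups are aggregated. The names `counts`, `rate`, and `aggregated_rate` are illustrative.

```python
# Crosstabulation of hypothetical (successes, trials) by method and case type.
counts = {
    "A": {"easy": (9, 10), "hard": (30, 90)},
    "B": {"easy": (80, 90), "hard": (3, 10)},
}


def rate(successes, trials):
    """Success rate within a single cell of the crosstabulation."""
    return successes / trials


def aggregated_rate(groups):
    """Success rate after summing successes and trials over all groups."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials


for group in ("easy", "hard"):
    a, b = rate(*counts["A"][group]), rate(*counts["B"][group])
    print(f"{group}: A = {a:.3f}, B = {b:.3f}")  # A is higher in each group

print(f"aggregated: A = {aggregated_rate(counts['A']):.2f}, "
      f"B = {aggregated_rate(counts['B']):.2f}")  # B is higher overall
```

The reversal arises because the two methods were applied to very different mixes of easy and hard cases, which the aggregated table hides.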
Data Dashboards
o A data dashboard is a widely used data visualization tool.
o It organizes and presents key performance indicators (KPIs) used to monitor an organization or
process.
o It provides timely, summary information that is easy to read, understand, and interpret.
o Some additional guidelines include . . .
Minimize the need for screen scrolling.
Avoid unnecessary use of color or 3D.
Use borders between charts to improve readability.