Strategic Data
Please send comments, corrections, or additions to the author, George Ludovici, at [email protected]. Thank you.
The author credits fellow participants at PCMI, especially Bill Thill, for help with the basic idea of this project.
Objective – To address statistical components of the NYS Integrated Algebra Regents level course with accommodations
for anticipated impact of the Common Core Standards.
Outline
Project and data collection
Frequency and Cumulative Frequency Histograms
Box and Whiskers Plots
Scatter Plot
Line of Best Fit
Correlation
Central Tendency
Materials –
16 plastic stackable/disposable cups per student group, 3 to 5 students per group
Two different color markers, for each group to write on the cups
Stop watch or other method of timing to the nearest second for each group
Tape measure (reach across student’s arm-span)
Standards – There is emphasis in the Common Core Mathematics Standards on deeper thinking and communication. To
this end students are often asked to explain their reasoning throughout this unit.
The two example histograms and the example scatter plot are borrowed from old NYS Regents exam questions.
1) Fill in the names, genders and arm spans. Arm span is fingertip to fingertip with arms outstretched to the
sides.
2) Number 16 cups, 1 to 16, using one color for even numbers and one color for odd numbers. Write the
numbers several times so they are visible on the sides of the cups no matter how the cup is turned. An
alternative is to use two different cup colors. What is important is being able to see that only one
cup at a time is being moved and to tell when all the cups have been moved.
3) Take turns being: a stacker, a timer, and a spotter. Stacker: holds the cups. Timer: operates the timer and
records the data. Spotter: makes sure stacker performs correctly.
4) Stand up and hold the stack of cups. Take one cup from the bottom of the stack with one hand and put it in
the top of the stack, then use your other hand to take the new bottom cup and put it at the top of the stack.
5) You must alternate hands throughout the experiment. You are done when the cup which started at the
bottom goes all the way through one cycle, returning to the bottom.
6) The stacker must be standing up.
7) The spotter is responsible for making sure every number 1-16 goes by. Specifically, the spotter checks that
only one cup at a time is being moved and announces when the end is reached.
8) Mistakes must be corrected by undoing and repeating the action correctly. Do not start a new trial.
9) The timer stops as soon as the original cup is back at the bottom of the stack and the stack is complete.
10) Make sure the stacker goes through this experiment five times in a row, then everyone in the group
switches to a new position.
Analysis
How would you describe the relationships you see in your data?
How is your description helpful to someone who has not seen your data?
What else might that person want or need to know about your data?
During this statistical unit, we are going to use the data you collected from your experiment to answer the
above questions.
As you do your work you may think of additional questions. Think about how you might use the statistical
tools that we are going to learn about to help answer those questions. If the statistical tools we use don’t
help to answer your questions, think about how you might create a different tool to answer the question.
Histogram – You are going to create frequency histograms to communicate your data graphically. Two example
histograms are discussed: a regular histogram on this page and then a cumulative histogram on the next page.
Notice how the vertical axis always represents frequency.
Write a sentence describing exactly what the fourth gray bar, above
190-199, for Student Heights means.
Using the above histogram, complete this table.
Height Interval    Total    Cumulative Total
160-169            _____    _____
170-179            _____    _____
180-189            _____    _____
190-199            _____    _____
200-209            _____    _____
Discuss your table answers. Should the “Total”
entry for 170-179 be 2 or 4 or 6? Why?
Describe exactly what the fourth gray bar, above 41-80, for Cumulative Test Scores means.
Draw a Cumulative Student Height histogram based on the data in the Student Heights histogram.
Draw a Test Scores histogram based on the data in the Cumulative Test Scores histogram.
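The two conversions above, from frequency to cumulative frequency and back, can be checked with a short sketch. The interval labels and frequencies below are hypothetical and are not read from either example histogram; substitute the bar heights you read from the graphs.

```python
from itertools import accumulate

# Hypothetical intervals and frequencies (assumed for illustration only).
intervals = ["0-9", "10-19", "20-29", "30-39", "40-49"]
frequency = [3, 5, 6, 2, 1]

# Frequency -> cumulative frequency: keep a running total of the bars so far.
cumulative = list(accumulate(frequency))

# Cumulative frequency -> frequency: subtract the previous running total.
recovered = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

for label, total, running in zip(intervals, frequency, cumulative):
    print(f"{label}: total {total}, cumulative total {running}")
```

The last cumulative bar always equals the number of data values, which is a quick way to check a cumulative histogram drawn by hand.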
We are now going to draw a frequency histogram based on the data you collected using the Total Time.
The width of each histogram bar represents an interval of your data. In our case this will be in seconds.
19) What does interval mean?
Your interval should be greater than or equal to one depending on your data.
20) What interval will you choose for your bars?
24) What does scale mean?
25) Write the scale you will use on your horizontal axis.
34) Complete this table based on your cup stacking data.
35) Will you need to fill in every line in this table?
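Once an interval is chosen, sorting the times into bars is a counting exercise. The sketch below shows one possible way to do it, assuming times recorded to the nearest second and an interval of 5 seconds. The `times` list and the `width` value are made up for illustration; they are not real class data.

```python
# Hypothetical Total Time values in seconds (assumed, not real class data).
times = [23, 31, 27, 35, 42, 29, 38, 26, 33, 30]
width = 5  # assumed interval width in seconds

# Start the first interval at a multiple of the width at or below the minimum.
start = (min(times) // width) * width

# Count how many times fall into each interval, keyed by the interval's low end.
counts = {}
for t in times:
    low = start + ((t - start) // width) * width
    counts[low] = counts.get(low, 0) + 1

# Print every interval from the first to the last, including empty ones.
running = 0
for low in range(start, max(times) + 1, width):
    total = counts.get(low, 0)
    running += total
    print(f"{low}-{low + width - 1}: total {total}, cumulative total {running}")
```

Changing `width` and rerunning shows how the choice of interval changes the shape of the histogram for the same data.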
Five numbers are used to create a Box and Whiskers plot. First put your data in order (sort) from smallest to largest.
2) The middle number in the lower half of your sorted data. This is called the first or lower quartile. Quartile
sounds like quarter, which is 1/4 or 25% of all the data points. If two numbers are in the middle then take
their average by adding them up and dividing by two. If there is an overall odd number of data values then
do not include the median in the upper or the lower half when determining quartiles.
3) The middle number in all of your sorted data. This is called the median; think of median as middle. For
example, a highway or a wide street often has a median running down the middle to separate cars going in
different directions. The median is the same as the second quartile. Half (50%) of the data comes before
the median and half (50%) of the data comes after the median. Like the lower quartile, if there is an even
number of data points so that two points are in the middle, then average them to find the median.
4) The middle number in the higher half of your sorted data. This is called the third or upper quartile.
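The rules above can be sketched in a few lines of code as a way to check hand calculations. This is a minimal sketch that follows the worksheet's rule of leaving the median out of both halves when the number of data values is odd; other textbooks and calculators may use slightly different quartile rules. The function names and the `times` list are hypothetical.

```python
def median(sorted_values):
    """Middle value of an already sorted list; averages the two middle values if needed."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (lowest, first quartile, median, third quartile, highest)."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    data = sorted(values)        # put the data in order first
    half = len(data) // 2
    lower = data[:half]          # values before the median
    upper = data[-half:]         # values after the median
    return data[0], median(lower), median(data), median(upper), data[-1]


# Hypothetical trial times in seconds (assumed for illustration only).
times = [31, 23, 27, 35, 42, 29, 38, 26, 33]
print(five_number_summary(times))  # (23, 26.5, 31, 36.5, 42)
```

With nine values the median (31) is the fifth value and is left out of both halves, so each quartile is the average of the two middle values in a half of four.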
Notice how in the above Box and Whiskers plot the quartiles or vertical lines are not evenly spaced.
7) What does this tell you about how the data are spread out?
8) The Interquartile Range shows how spread out the middle half of the data is. The Interquartile Range is
calculated by subtracting the first quartile from the third quartile. What is the Interquartile Range for the
box and whiskers plot on the previous page?
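To see why the Interquartile Range describes the middle half of the data, the sketch below compares it with the full range (highest minus lowest) when one value is unusually large. The data are hypothetical, and the quartiles follow the worksheet's rule of leaving the median out of both halves.

```python
def median(sorted_values):
    """Middle value of an already sorted list; averages the two middle values if needed."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def interquartile_range(values):
    """Third quartile minus first quartile."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])


# Hypothetical times in seconds; the second list has one very slow trial.
typical = [23, 26, 27, 29, 31, 33, 35, 38, 42]
with_slow_trial = [23, 26, 27, 29, 31, 33, 35, 38, 90]

for label, data in [("typical", typical), ("with slow trial", with_slow_trial)]:
    full_range = max(data) - min(data)
    print(f"{label}: range {full_range}, interquartile range {interquartile_range(data)}")
```

In this made-up example the range grows from 19 to 67 seconds while the Interquartile Range stays at 10 seconds, because the slow trial sits in a whisker and not in the box.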
1) Lowest: ___________
3) Median: ___________
5) Highest: ___________
The Box and Whiskers plot is drawn above part of a number line which is the scale for the plot.
Draw your cup-stacking Box and Whiskers plot. Be sure to include a title, the scale, the box around the three middle
numbers, and vertical lines on the first, third and fifth numbers. Connect the whiskers to the box with horizontal lines.
How does the spread of the data from your Cup Stacking Box and Whiskers plot compare with the spread of the data
from the sample Box and Whiskers plot that we have been using?
Scatter Plots
Sample data for the maximum height and speed of some roller coasters are shown in the table below and
graphed as a scatter plot.
The scatter plot has a data point graphed for every piece of data. Although the above table is sorted by height, it does
not need to be. The points are not connected with lines. Each axis has a scale and a label, and the whole graph has a
label. If there were more than one point with the same height value, each would be graphed. Notice how the scatter
plot does not use intervals; this is different from a histogram.
Describe the relationship you see between Max Height and Max Speed:
A scatter plot is useful for seeing relationships between two variables. Do you think there is a relationship between Arm
Length and Average Trial Time in your data?
Create a scatter plot with Average Trial Time on the horizontal and Arm Length on the vertical. Join with one or two
other groups so that you will have at least ten points to plot.
How did you determine the scale for the horizontal axis?
How did you determine the scale for the vertical axis?
Remember the Roller Coaster Scatter Plot? Here it is again, drawn twice. The one on the right has a “line of best fit”.
If your data is close to the line of best fit, then we can say there is a relationship or correlation between the data on the
x-axis and the data on the y-axis. The closer your data is to the line, the stronger the correlation is. The correlation is
either positive (positive slope), negative (negative slope) or zero (no slope, no correlation).
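A line of best fit is usually drawn by eye in this unit. For comparison, the sketch below uses the least-squares method, which is one common way (not necessarily the one intended by this worksheet) to compute a line of best fit, along with the correlation coefficient r. The function name and the height and speed values are hypothetical and are not taken from the roller coaster table.

```python
from math import sqrt


def best_fit_and_correlation(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # between -1 and 1; same sign as the slope
    return slope, intercept, r


# Hypothetical data (assumed for illustration only).
heights = [100, 150, 200, 250, 300]
speeds = [50, 58, 71, 80, 91]

slope, intercept, r = best_fit_and_correlation(heights, speeds)
print(f"line of best fit: y = {slope:.3f}x + {intercept:.1f}")
print(f"correlation coefficient r = {r:.3f}")
```

A value of r close to 1 or -1 means the points lie close to the line (strong correlation), and a value close to 0 means they do not. The function assumes the x-values are not all equal and the y-values are not all equal.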
Go back to the previous page and add a line of best fit to your scatter plot.
Describe how the correlation strengthens, weakens or changes your prior answer about the relationship between
Average Trial Time and Arm Length.