CH 1 Notes


AP Stats Ch 1 Notes Name: ____________________________

Chapter 1 Notes – Data Analysis

1.1 – Analyzing Categorical Data


Definitions:
Individuals:
 Objects described by a set of data. People, animals, or things.
 Examples:
Variables:
 Any characteristic of an individual. Takes different values for different individuals.
 Examples:
Categorical Variable:
 Places individuals into one of several groups or categories
 Examples:
Quantitative Variable:
 Takes on numerical values
 Examples:
Distribution:
 Tells us what values the variable takes and how often it takes these values

Check your Understanding:


Jake is a car buff who wants to find out more about the vehicles that students at his school drive. He gets permission to
go to the student parking lot and record some data. Later, he does some research about each model of a car online. He
makes a spreadsheet that includes each car's make, model, year, color, number of cylinders, gas mileage, weight,
and whether it has a backup camera.
1. What are the individuals in Jake's study?

2. What variables did Jake measure? Identify each as categorical or quantitative.

Bar Graphs:

Pie Charts:

(Area Principle)
Definitions:
Marginal Distribution/Relative Frequency:
 The distribution of values of a categorical variable in a two-way table among all individuals described by the
table.
Conditional Distribution:
 Describes the values of a variable among the individuals who have a specific value of another variable. There is a
separate conditional distribution for each value of the other variable.
Association:
 An association exists if knowing the value of one variable helps predict the value of the other. If knowing the
value of one variable does not help you predict the value of the other, there is no association between the
variables.
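
To make the three definitions above concrete, here is a minimal sketch that computes a marginal distribution and the conditional distributions from a small two-way table. The counts are hypothetical placeholders (they are not class data), and the function names are made up for this illustration.

```python
# Hypothetical counts for illustration only (NOT data collected in class).
# Rows are superpowers; columns are genders.
counts = {
    "Fly":          {"Male": 12, "Female": 18},
    "Time Travel":  {"Male": 20, "Female": 10},
    "Invisibility": {"Male": 8,  "Female": 12},
}


def marginal_distribution(table):
    """Percent of ALL individuals that fall in each row category."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {name: 100 * sum(row.values()) / grand_total
            for name, row in table.items()}


def conditional_distribution(table, column):
    """Percent in each row category among individuals in one column only."""
    column_total = sum(row[column] for row in table.values())
    return {name: 100 * row[column] / column_total
            for name, row in table.items()}


marginal = marginal_distribution(counts)
given_male = conditional_distribution(counts, "Male")
given_female = conditional_distribution(counts, "Female")

print("Marginal:", marginal)
print("Given male:", given_male)
print("Given female:", given_female)
```

With these made-up counts, 50% of males but only 25% of females choose Time Travel. Because knowing gender helps predict the preferred superpower, the hypothetical table shows an association.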

Examples:
We will collect data regarding our gender and ideal superpower in the table below:

Superpower Male Female Total

Fly

Time Travel

Invisibility

Super Strength

Telepathy

Total

1. Use the two-way table to calculate the marginal distributions (in percents) of superpower preferences.

2. Make a graph to display the marginal distribution. Describe what you see.

3. Find the conditional distributions between superpower preference and gender.

4. Is there an association between gender and preference of superpower? Give appropriate evidence to support your
answer.
1.2 – Displaying Quantitative Data
We are now switching to QUANTITATIVE variables from categorical variables.

Dotplots
How good was the 2012 U.S. women’s soccer team? With players like Abby Wambach,
Megan Rapinoe, and Hope Solo, the team put on an impressive showing en route to
winning the gold medal at the 2012 Olympics in London. Here are data on the number of goals scored by the team in the
12 months prior to the 2012 Olympics.

Create a dotplot displaying the data.

Stemplots
Also called Stem-and-Leaf plots
How many pairs of shoes does a typical teenager have? To find out, a group of AP Statistics students conducted a survey.
They selected a random sample of 20 female students from their school. Then they recorded the number of pairs of
shoes that each respondent reported having. Here are the data:
50 26 26 31 57 19 24 22 23 38

13 50 13 34 23 30 59 13 15 51
Create a stemplot of the data.
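
One way to check a hand-drawn stemplot is to build it in code. The sketch below uses the shoe data listed above, with tens digits as stems and ones digits as leaves. The function name `stemplot` is invented for this example, and the approach assumes whole-number data.

```python
from collections import defaultdict

# Pairs of shoes reported by the 20 students (from the data above).
shoes = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
         13, 50, 13, 34, 23, 30, 59, 13, 15, 51]


def stemplot(data):
    """Return stemplot rows: stem = tens digit, leaf = ones digit."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting puts leaves in increasing order
        leaves[value // 10].append(value % 10)
    rows = []
    # Include every stem from smallest to largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


for row in stemplot(shoes):
    print(row)
print("Key: 1 | 3 means 13 pairs of shoes")
# Prints:
# 1 | 3 3 3 5 9
# 2 | 2 3 3 4 6 6
# 3 | 0 1 4 8
# 4 |
# 5 | 0 0 1 7 9
```

The empty stem 4 is kept on purpose: skipping it would hide the gap between the 30s and the 50s.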

Back-to-Back Stemplots
Used when we want to compare the same quantitative variable for two groups
Who is taller? Males or females? A sample of 14-year-olds from the UK
was randomly selected. Here are the heights of the students (in cm):
Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174,
165, 165, 183, 180
Female: 160, 160, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161,
165, 165, 159, 168, 153, 166, 158, 158, 166

Here is a back-to-back stemplot comparing male and female heights:
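
As a rough sketch, a back-to-back stemplot of the heights listed above can be generated as follows. Stems are the first two digits (15 means 150 to 159 cm), male leaves go on the left, and female leaves go on the right. The function name `back_to_back` and the layout choices are assumptions made for this illustration.

```python
# Heights in cm, copied from the lists above.
male = [154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174,
        165, 165, 183, 180]
female = [160, 160, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161,
          165, 165, 159, 168, 153, 166, 158, 158, 166]


def back_to_back(left, right):
    """Rows of a back-to-back stemplot; leaves grow outward from the stem."""
    combined = left + right
    rows = []
    for stem in range(min(combined) // 10, max(combined) // 10 + 1):
        # Left-side leaves are reversed so the smallest sits next to the stem.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem),
                             reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem,
                     " ".join(map(str, right_leaves))))
    width = max(len(left_text) for left_text, _, _ in rows)
    return [f"{left_text:>{width}} | {stem} | {right_text}".rstrip()
            for left_text, stem, right_text in rows]


print("Male | stem | Female")
for row in back_to_back(male, female):
    print(row)
print("Key: 15 | 2 means 152 cm")
```

In the output, every female height falls on the 15 and 16 stems, while the male heights spread from the 15 stem up to the 18 stem.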


Discrete Data: Countable
Examples:
 Number of students present
 Number of pepperonis on a pizza
 Number of heads flipped when flipping three coins
 Student's grade level
Continuous Data: Value is obtained by measuring, can always be more specific than what is stated
Examples:
 Height of students in class
 Weight of students in class
 Time it takes to get to school
 Distance traveled between class
Histograms:
 Divide data into classes/bins of equal width
 Find the frequency or relative frequency of individuals in each class
 Label and scale your axes and draw your histogram.
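
The three steps above can be sketched in code. Since the state table is not reproduced here, this example reuses the shoe-count data from the stemplot example earlier in these notes. The starting value (10), the class width (10), and the function name `frequency_table` are choices made for this illustration.

```python
# Pairs of shoes for 20 students (same data as the stemplot example).
shoes = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
         13, 50, 13, 34, 23, 30, 59, 13, 15, 51]


def frequency_table(data, start, width, n_classes):
    """Rows of (low, high, frequency, relative frequency).

    Each class holds values with low <= value < high.
    Values outside the classes are ignored, so pick classes that cover the data.
    """
    counts = [0] * n_classes
    for value in data:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    table = []
    for i, count in enumerate(counts):
        low = start + i * width
        table.append((low, low + width, count, count / len(data)))
    return table


table = frequency_table(shoes, start=10, width=10, n_classes=5)

# A text histogram: one star per individual in the class.
for low, high, count, rel in table:
    print(f"{low} to <{high}: {'*' * count}  ({count}, {rel:.0%})")
```

Every class has the same width, so the bar lengths can be compared directly.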

Example:
What percent of your home state’s residents were born outside the United States? A few years ago, the country as a
whole had 12.5% foreign-born residents, but the states varied from 1.2% in West Virginia to 27.2% in California. The
following table presents the data for all 50 states.
a) Who/what is the individual?

b) What is the variable?

Create classes of equal width

Create frequency table

Create relative frequency table

Create histogram:
Describing graphs
 Look at the OVERALL pattern and any departures from that pattern
 SOCS

o SHAPE

o OUTLIERS

o CENTER

o SPREAD

Let's look at a dotplot of adolescent internet use:


 SOCS

o SHAPE

o OUTLIERS

o CENTER

o SPREAD
1.3 – Describing Quantitative Data with Numbers
Measuring Center
TWO ways to measure center - Mean and Median
Mean

Example:
Here is a stemplot of the travel times to work for the sample of 15 North Carolinians

a) Find the mean travel time for all 15 workers

b) Calculate the mean again, this time excluding the person who reported a 60-minute travel time to work. What do you
notice?

"Fair Share" value


Median

Comparing mean vs. median:


The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric,
the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than
is the median.
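
A quick check of this idea, as a sketch: the code below computes the mean and median of the shoe-count data from section 1.2 using Python's standard `statistics` module. That distribution has a long right tail (several students report 50 or more pairs), so we would expect the mean to be larger than the median.

```python
from statistics import mean, median

# Pairs of shoes for 20 students (data from the stemplot example in 1.2).
shoes = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
         13, 50, 13, 34, 23, 30, 59, 13, 15, 51]

shoe_mean = mean(shoes)      # sum of the values divided by how many there are
shoe_median = median(shoes)  # midpoint of the sorted values

print("Mean:", shoe_mean)      # 617 / 20 = 30.85
print("Median:", shoe_median)  # average of the 10th and 11th sorted values = 26
print("Mean is pulled toward the long tail:", shoe_mean > shoe_median)
```

The mean (30.85) sits above the median (26), in the direction of the right tail.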

Here is the stemplot of travel times to work for 20 randomly selected New Yorkers. Earlier, we found that the median
was 22.5 minutes.

1. Based only on the stemplot, would you expect the mean travel time to be less than, about the same as, or larger than
the median? Why?
2. Use your calculator to find the mean travel time. Was your answer to Question 1 correct?

3. Would the mean or the median be a more appropriate summary of the center of this distribution of drive times?
Justify your answer.

Measuring Spread

Range

Interquartile Range (IQR)

Identifying Outliers

The 1.5 x IQR Rule

Creating Boxplots

The five-number summary
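
Here is a minimal sketch of the five-number summary (minimum, Q1, median, Q3, maximum) and the 1.5 x IQR rule, applied to the shoe-count data from section 1.2. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. Calculators and software may use slightly different quartile rules. The function names are invented for this example.

```python
from statistics import median

# Pairs of shoes for 20 students (data from the stemplot example in 1.2).
shoes = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38,
         13, 50, 13, 34, 23, 30, 59, 13, 15, 51]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Assumes at least two values."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median's position
    upper = ordered[-half:]   # values above it (middle value skipped if n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print("Five-number summary:", five_number_summary(shoes))
print("Outliers:", outliers(shoes))
```

For these data, Q1 = 20.5 and Q3 = 44, so IQR = 23.5 and the fences are -14.75 and 79.25. No value falls outside them, so the boxplot whiskers would extend all the way to the minimum (13) and the maximum (59).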

Putting it all Together

Barry Bonds set the major league record by hitting 73 home runs in a single season in 2001. On August 7, 2007, Bonds hit
his 756th career home run, which broke Hank Aaron’s longstanding record of 755. By the end of the 2007 season when
Bonds retired, he had increased the total to 762. Here are data on the number of home runs that Bonds hit in each of his
21 complete seasons. Make a boxplot of these data.
