2018 Practice Exam - FR Scoring Guidelines

2018 AP® Statistics Practice Exam – Scoring Guidelines
Question 1
1. At a recent country music awards show, the diversity of the gender and ages of the award recipients
was observed. The ages of the first thirty males and the first thirty females to receive an award
(including group awards) were recorded. Boxplots of the ages by gender of these award recipients are
presented in the figure below.
(a) Write a few sentences to compare the distributions of ages for male and female award recipients.
(b) The male award winner who was 74 years old was identified as a potential outlier. If this award
winner had been 54 instead of 74 years old, what effect would this decrease have on the
following statistics? Justify your answers.
The interquartile range of male ages:
The standard deviation of male ages:
AP® is a trademark registered by the College Board, which was not involved in the production of, and does not endorse, this exam.
Question 1
Intent of Question
The primary goals of this question were to assess a student's ability to (1) compare features of two
distributions displayed in boxplots and (2) determine the effect of changing one data value on the
interquartile range and standard deviation.
Solution
Part (a):
Female award winners are typically younger than male award winners. The median age for the male
award winners was about 40 years old, whereas the median age for the female award winners was
much smaller, at around 28 years old. Because the boxes do not overlap, 75% of the males are older
than 75% of the females. Both age distributions have a lot of variability, with female ages ranging
from 16 to 70 and male ages ranging from 20 to 74. Though the two distributions have the same
range of 54 years, males have a larger interquartile range indicating that males have more variability
among the middle 50% of ages. The distribution of ages for males has one high outlier of 74 years,
but the distribution of ages for females has several high outliers of 50, 51, 64, and 70 years.
Part (b):
The interquartile range of male ages: If the male award winner who was 74 were instead 54 years old,
the interquartile range of male ages would not change because both 54 and 74 are above the third
quartile.
The standard deviation of male ages: If the male award winner who was 74 were instead 54 years
old, the standard deviation would decrease because this change would decrease the typical distance
to the mean age.
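The reasoning in part (b) can be checked numerically. The sketch below is only an illustration: the exam gives boxplots rather than the 30 individual ages, so the list `ages_original` is hypothetical data chosen so that the maximum is 74 and the third quartile is below 54. It compares the interquartile range and standard deviation before and after replacing 74 with 54.

```python
import statistics

def iqr(values):
    """Interquartile range based on the quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical ages, NOT the exam data: minimum 20, median 40, maximum 74.
ages_original = [20, 25, 30, 34, 38, 40, 42, 45, 48, 50, 74]
# Same data with the 74-year-old replaced by a 54-year-old.
ages_changed = [54 if age == 74 else age for age in ages_original]

iqr_original, iqr_changed = iqr(ages_original), iqr(ages_changed)
sd_original = statistics.stdev(ages_original)
sd_changed = statistics.stdev(ages_changed)

# Both 54 and 74 lie above Q3, so the quartiles (and the IQR) are unaffected,
# while the standard deviation shrinks because the largest value moves closer to the mean.
print(f"IQR: {iqr_original} -> {iqr_changed}")
print(f"SD:  {sd_original:.2f} -> {sd_changed:.2f}")
```

With any data set in which both the old and new values stay above the third quartile, the same pattern should appear.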
Scoring
Parts (a) and (b) are scored as essentially correct (E), partially correct (P), or incorrect (I).
Part (a) is scored as follows:
Essentially correct (E) if the response includes the following four components:
1. A correct comparison of center.
2. A correct comparison of spread.
3. A discussion of the outliers in both distributions.
4. The response is in context.
Partially correct (P) if the response includes only three of the four components.
Incorrect (I) if the response includes at most two of the four components.
Note: Any mention of shape should be ignored because complete shape information cannot be
determined from a boxplot.
Part (b) is scored as follows:
Essentially correct (E) if the response includes the following four components:
1. Comments that the interquartile range will change very little
2. Correctly justifies why the interquartile range will change very little
3. Comments that the standard deviation will decrease
4. Correctly justifies why the standard deviation will decrease
Partially correct (P) if the response includes only two or three of the four components.
Incorrect (I) if the response comments on at most one of the four components.
4 Complete Response
Both parts essentially correct
3 Substantial Response
One part essentially correct and one part partially correct
2 Developing Response
One part essentially correct and one part incorrect

OR
Both parts partially correct
1 Minimal Response
One part partially correct and one part incorrect

Question 2
2. An elementary school principal is interested in determining whether obesity is a common problem
among the children who attend the school. Body Mass Index (BMI) is a measure of relative weight
based on an individual's mass and height, and is considered a better measure of obesity than weight
alone. High values can be used to indicate obesity. The BMI assessment takes some time to complete
so it will not be feasible to assess every child. Some local university researchers are willing to help
complete the BMI assessments for a sample of children in the school.
The school is organized in wings as depicted below. Each wing contains four classrooms from the
same grade level. There are a total of 24 classrooms across grades K through 5. Each classroom
contains 20 children.
The Kindergarten Wing: classrooms 1, 2, 3, 4
The 1st Grade Wing: classrooms 5, 6, 7, 8
The 2nd Grade Wing: classrooms 9, 10, 11, 12
The 3rd Grade Wing: classrooms 13, 14, 15, 16
The 4th Grade Wing: classrooms 17, 18, 19, 20
The 5th Grade Wing: classrooms 21, 22, 23, 24

(a) For convenience, the researchers want to use a cluster sampling method, in which the classrooms
are clusters. Their goal is to assess 60 children. Describe a process for randomly selecting
classrooms and identifying the sample of children using this method.
(b) One of the researchers suggests that an alternative sampling method would be to select a
stratified random sample. Describe a process for randomly selecting 60 children using this
sampling strategy and justify your selection of the stratification variable.
(c) In the context of this situation, give one statistical advantage of using a stratified random
sampling method as opposed to a cluster sampling method that uses classrooms as clusters.
Question 2
Intent of Question
The primary goals of this question were to assess a student's ability to (1) describe a process for
implementing cluster sampling; (2) identify a plausible stratification variable; (3) describe a process for
implementing stratified random sampling; and (4) describe a statistical advantage of stratified sampling
over cluster sampling in a particular situation.
Solution
Part (a):
Three classrooms of 20 children are necessary to obtain a sample of 60 children. Note that the
classrooms have already been numbered 1 to 24. To select three of these classrooms, use a calculator
or computer to generate 3 random numbers between 1 and 24 without replacement. If a random
number generator is used that generates non-unique numbers, the repeated numbers are ignored until
3 unique numbers are obtained. The classrooms whose numbers correspond to the randomly
generated numbers are selected and all 20 children within each selected classroom are included in
the sample for a total of 60 children.
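The cluster sampling procedure described above could be carried out as in the following sketch. It uses Python's standard `random` module; the function name `select_cluster_sample`, the `seed` argument, and the `(classroom, seat)` child identifiers are illustrative assumptions, not part of the exam.

```python
import random

CLASSROOMS = range(1, 25)        # classrooms are already numbered 1 to 24
CHILDREN_PER_CLASSROOM = 20

def select_cluster_sample(n_classrooms=3, seed=None):
    """Pick classrooms at random without replacement; every child in a chosen classroom is sampled."""
    rng = random.Random(seed)
    # random.sample never repeats a value, so no repeated numbers need to be discarded.
    chosen = sorted(rng.sample(CLASSROOMS, n_classrooms))
    # Hypothetical child identifiers of the form (classroom number, seat number).
    children = [(room, seat)
                for room in chosen
                for seat in range(1, CHILDREN_PER_CLASSROOM + 1)]
    return chosen, children

chosen_rooms, cluster_sample = select_cluster_sample(seed=2018)
print("Selected classrooms:", chosen_rooms)
print("Children in sample:", len(cluster_sample))
```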
Part (b):
Because BMI could vary based on a child's age or grade level, it would be advantageous to have
children of all ages and grade levels represented in the sample. Obesity can be present in young
children, but it is generally more common as children develop and grow older. Age and grade level
are not exactly the same thing as children can repeat grades. However, age and grade level are
generally expected to be highly correlated. Therefore, I am selecting grade level as my stratification
variable.
To obtain a stratified random sample of 60 children using grade level as strata, we would need to
randomly select 10 children from each of the six grade levels. In each grade level, number the
children from 1 to 80. There are 80 children in each grade because each grade has 4 classrooms of 20
children. Within each grade, use a calculator or computer to generate 10 random numbers between 1
and 80 without replacement. If a random number generator is used that generates non-unique
numbers, the repeated numbers are ignored until 10 unique numbers are obtained. Select the children whose
numbers correspond to the randomly generated numbers.
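A minimal sketch of the stratified procedure, assuming the children in each grade have been numbered 1 to 80 as described. The grade labels, the function name `select_stratified_sample`, and the `seed` argument are illustrative choices rather than anything specified in the exam.

```python
import random

GRADES = ["K", "1st", "2nd", "3rd", "4th", "5th"]   # the six strata
CHILDREN_PER_GRADE = 80    # 4 classrooms of 20 children
SAMPLE_PER_GRADE = 10      # 60 children spread evenly over 6 grade levels

def select_stratified_sample(seed=None):
    """Draw a separate simple random sample of 10 child numbers within each grade level."""
    rng = random.Random(seed)
    sample = {}
    for grade in GRADES:
        numbers = range(1, CHILDREN_PER_GRADE + 1)
        # Sampling without replacement, so the 10 numbers in a grade are all distinct.
        sample[grade] = sorted(rng.sample(numbers, SAMPLE_PER_GRADE))
    return sample

stratified_sample = select_stratified_sample(seed=2018)
for grade, numbers in stratified_sample.items():
    print(grade, numbers)
```

Because the draw is repeated independently inside every grade, each grade level contributes exactly 10 children to the sample.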
Part (c):
The cluster sampling procedure in part (a) will produce a sample with no children in three or more of
the grade levels. For example, a cluster sample of three classrooms could include one fourth-grade
classroom and two fifth-grade classrooms and thus overrepresent older children, among whom
obesity is more common. Stratified random sampling, where the six strata are the six grade levels,
guarantees a sample that includes children from all grade levels. This method would therefore yield a
sample that is representative of all grade levels and represents each grade in its proper proportion.
Scoring
This question is scored in four sections. Section 1 consists of part (a), Section 2 consists of a justification
of the stratification variable used in part (b), Section 3 consists of a description for implementing
stratified random selection in part (b), and Section 4 consists of part (c). Each of the four sections is
scored as essentially correct (E), partially correct (P), or incorrect (I).
Section 1 is scored as follows:
Essentially correct (E) if the response correctly addresses the following two components:
1. Indication that three classrooms are randomly selected and that all 20 children in each of the
selected classrooms are included in the sample.
2. Description of a valid random sampling procedure for selecting three classrooms that could
be implemented after reading the response (so that two knowledgeable statistics users would
use the same method to select the classrooms).
Partially correct (P) if the response includes exactly one of the two components listed above.
Incorrect (I) if the response includes neither of the two components listed above OR the response
does not involve taking a random sample of three classrooms from the 24 total classrooms.
Note: Failing to explicitly deal with the issue of potentially repeated random numbers loses credit for
the second component.
Section 2 is scored as follows:
Essentially correct (E) if the response includes the following two components:
1. Selection of grade level as the stratification variable.
2. Explanation of how obesity is a problem that develops with age, or discussion of the need to
include all ages or grade levels because the grade levels might not have the same prevalence
of obesity.
Incorrect (I) if the response fails to meet the criteria for E or P.
Section 3 is scored as follows:
Essentially correct (E) if the response correctly addresses the following two components:
1. Indication that all children within each grade level will be numbered on separate lists, and
that a random selection process will be implemented separately for each of the six grade
levels.
2. Description of a valid random sampling procedure for selecting 10 children within each
grade level that could be implemented after reading the response (so that two knowledgeable
statistics users would use the same method to select the children).
Incorrect (I) if the response includes neither of the two components listed.
Note: Failing to explicitly deal with the issue of potentially repeated random numbers loses credit for
the second component.
Section 4 is scored as follows:
Essentially correct (E) if the response indicates the following two components:
1. Indication that stratified sampling ensures that each grade level will be represented in the sample.
2. Indication that cluster sampling could possibly overrepresent or underrepresent older children in
the sample.
Each essentially correct (E) section counts as 1 point. Each partially correct (P) section will count as
½ point.
4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response
If a response is between two scores (for example, 2½ points), use a holistic approach to decide
whether to score up or down, depending on the overall strength of the response and communication.
The process of randomly selecting the correct number of units and using a correct randomizing
method should be given more weight in rounding decisions than the handling of repeats.
Question 3
3. A softball player wants to investigate the number of times she can hit the ball in successive attempts.
The player plans to swing at every pitch so that she will either hit the ball or miss it on each attempt.
Assume the player hits the ball 25% of the time and that her attempts are independent.
(a) What is the probability the player gets her third hit on the fourth attempt?
The player continued her inquiry for several weeks. Each day the player was given up to six attempts
to get three hits. Let the random variable X represent the number of attempts required for the player
to get three hits given that she is successful at hitting the ball three times in her six attempts. The
table below gives the observed relative frequencies for each possible value of X.
x                           3     4     5     6
Observed Probability of x   0.09  0.21  0.31  0.39
(b) Use the given relative frequencies to find the mean and standard deviation of X.
(c) Two spectators who are watching the softball player decide to set up a bet between themselves.
Spectator A believes that the next time the player is successful at getting three hits in her six
attempts, it will take only 3 or 4 attempts, and spectator B believes that it will take 5 or 6
attempts. The spectators have agreed that spectator A will receive $20 if she is correct. Based on
the observed relative frequencies of X, how much should spectator B receive to ensure this is a
fair bet? A fair bet is one in which both parties have zero expected winnings.
Question 3
Intent of the Question
The primary goals of this question are to assess a student's ability to (1) perform probability calculations
involving independent events using the multiplication rule and (2) calculate and use expected values.
Solution
Part (a):
The player can get her third hit on the fourth attempt in three different ways: the one miss comes on
either the first attempt, the second attempt, or the third attempt. Thus, the probability the player gets
her third hit on the fourth attempt is:
$(0.75)(0.25)(0.25)(0.25) + (0.25)(0.75)(0.25)(0.25) + (0.25)(0.25)(0.75)(0.25) = 3(0.75)(0.25)^3 \approx 0.0352$
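The same probability can be obtained by counting the orderings directly. The sketch below is one way to check the arithmetic; the helper name `prob_kth_hit_on_attempt` is an illustrative assumption. It multiplies the number of ways to place the first two hits among the first three attempts by the probability of any one such ordering.

```python
from math import comb

P_HIT = 0.25   # probability of a hit on any attempt, as stated in the question

def prob_kth_hit_on_attempt(k, n, p):
    """P(the k-th hit occurs exactly on attempt n) for independent attempts with hit probability p."""
    # The first n - 1 attempts must contain exactly k - 1 hits, and attempt n must be a hit.
    orderings = comb(n - 1, k - 1)
    return orderings * p**k * (1 - p)**(n - k)

prob_third_hit_on_fourth = prob_kth_hit_on_attempt(3, 4, P_HIT)
print(round(prob_third_hit_on_fourth, 4))
```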
Part (b):
The mean number of attempts is 3(0.09) + 4(0.21) + 5(0.31) + 6(0.39) = 5.
The standard deviation of the number of attempts is:

$\sqrt{(3 - 5)^2 \times (0.09) + (4 - 5)^2 \times (0.21) + (5 - 5)^2 \times (0.31) + (6 - 5)^2 \times (0.39)} = \sqrt{0.96} \approx 0.9798$
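As a check on part (b), the mean and standard deviation of a discrete random variable can be computed directly from the table of relative frequencies. This is a minimal sketch; the variable names are illustrative.

```python
from math import sqrt

# Observed relative frequencies of X from the table in the question.
distribution = {3: 0.09, 4: 0.21, 5: 0.31, 6: 0.39}

# Mean: sum of each value times its probability.
mean_x = sum(x * p for x, p in distribution.items())
# Variance: probability-weighted squared deviations from the mean.
variance_x = sum((x - mean_x) ** 2 * p for x, p in distribution.items())
sd_x = sqrt(variance_x)

print(f"mean = {mean_x:.2f}, variance = {variance_x:.2f}, sd = {sd_x:.4f}")
```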
Part (c):
From the data collected over several weeks, the probability the player can get three hits in 3 or 4
attempts is 0.09 + 0.21 = 0.30 and the probability the player will take more than 4 attempts to get
three hits is 0.7. Let c represent the amount of money spectator B should receive. For the expected
winnings of both parties to be equal we need:
20(0.3) = c(0.7)
Solving this equation for c, we have that spectator B should receive $8.57 for this to be a fair bet.
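The fair-bet equation can be solved in a couple of lines. The sketch below assumes, as the solution does, that spectator A wins $20 with probability 0.30 and spectator B wins the unknown amount with probability 0.70; the variable names are illustrative.

```python
# Probabilities from the observed relative frequencies of X.
p_a_wins = 0.09 + 0.21    # three hits in 3 or 4 attempts
p_b_wins = 0.31 + 0.39    # three hits in 5 or 6 attempts
payout_a = 20.0           # dollars spectator A receives if correct

# Fair bet: payout_a * p_a_wins = payout_b * p_b_wins
payout_b = payout_a * p_a_wins / p_b_wins

# Spectator A's expected net winnings, which should be zero for a fair bet.
expected_net_a = payout_a * p_a_wins - payout_b * p_b_wins

print(f"Spectator B should receive ${payout_b:.2f}")
```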
Scoring
Parts (a), (b), and (c) are scored as essentially correct (E), partially correct (P), or incorrect (I).
Part (a) is scored as follows:
Essentially correct (E) if the response gives the correct probability with work shown.
Partially correct (P) if the response doesn't demonstrate that there are three different orderings for
how to get the third hit on the fourth attempt or completes the calculation for only one ordering;
OR
if the response demonstrates correct use of the multiplication rule using the correct values for the
probability of a hit or not but makes an error in performing the calculations.
Incorrect (I) if the response does not meet the criteria for E or P.
Part (b) is scored as follows:
Essentially correct (E) if both the mean and standard deviation are calculated correctly and work is
shown, with the exception of minor arithmetic errors.
Partially correct (P) if either the mean or the standard deviation is calculated correctly with work
shown but not both.
Part (c) is scored as follows:
Essentially correct (E) if the value to make this a fair bet is calculated correctly with work shown.
Partially correct (P) if the correct value to make this a fair bet is given but no work is shown
OR
if appropriate work is shown but the answer is incorrect or missing.
Incorrect (I) if the response does not meet the criteria for E or P.
4 Complete Response
All three parts essentially correct
3 Substantial Response
Two parts essentially correct and one part partially correct
2 Developing Response
Two parts essentially correct and one part incorrect

OR
One part essentially correct and one or two parts partially correct
OR
Three parts partially correct
1 Minimal Response
One part essentially correct and two parts incorrect

OR
Two parts partially correct and one part incorrect
Question 4
4. Harpy eagles live in the rain forests of Central and South America and are one of the largest species
of eagle. For many birds of prey, females are larger than males. A biologist is interested in
investigating whether this phenomenon is also true of the harpy eagle. She selects random samples
of 8 female harpy eagles and 9 male harpy eagles. The weights of the eagles, in pounds, were
recorded, as shown in the table below.
                   Weight of Eagle (pounds)                        Mean    Standard Deviation
Females (nF = 8)   13.5 14.1 15.1 16.0 17.6 14.9 14.8 13.6         14.95   1.353
Males (nM = 9)     13.4 13.9 13.5 12.8 14.5 12.7 13.6 13.1 12.2    13.30   0.689
Do the data provide convincing evidence that the mean weight of female harpy eagles is greater than
the mean weight of male harpy eagles?
Question 4
Intent of Question
The primary goal of this question was to assess a student's ability to identify, set up, perform, and
interpret the results of a significance test. More specific goals were to assess a student's ability to (1)
state appropriate hypotheses; (2) identify the appropriate statistical test procedure and check appropriate
conditions for inference; (3) calculate the appropriate test statistic and p-value; and (4) draw an
appropriate conclusion, in the context of the study.
Solution
Step 1: States a correct pair of hypotheses.
Let $\mu_F$ represent the population mean weight of female harpy eagles and $\mu_M$ represent the population
mean weight of all male harpy eagles.
The hypotheses to be tested are $H_0: \mu_F = \mu_M$ versus $H_a: \mu_F > \mu_M$.
Step 2: Identifies a correct test procedure (by name or formula) and checks appropriate conditions.
The appropriate procedure is a two-sample t-test.
The first condition is that the samples are independent random samples from the two populations.
This was stated in the question.
The second condition is that the population distributions of eagle weights are normal. The following
dotplots reveal no obvious departures from normality, so it appears reasonable to proceed with the
two-sample t-test.
Step 3: Demonstrates correct mechanics, including the value of the test statistic, df, and p-value.
The test statistic is:
$$t = \frac{\bar{x}_F - \bar{x}_M}{\sqrt{\frac{s_F^2}{n_F} + \frac{s_M^2}{n_M}}} = \frac{14.95 - 13.30}{\sqrt{\frac{1.353^2}{8} + \frac{0.689^2}{9}}} \approx 3.109$$
With df = 10.13, p-value = 0.0055.
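The mechanics in step 3 can be reproduced from the raw weights. The sketch below uses only the Python standard library: the test statistic and the Welch (unequal-variance) degrees of freedom are computed from the sample means and standard deviations, and the one-sided p-value is approximated by numerically integrating the t density with Simpson's rule. The function names and the choice of numerical integration are assumptions made for illustration; statistical software or a calculator would normally be used instead.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

# Eagle weights in pounds, from the table in the question.
females = [13.5, 14.1, 15.1, 16.0, 17.6, 14.9, 14.8, 13.6]
males = [13.4, 13.9, 13.5, 12.8, 14.5, 12.7, 13.6, 13.1, 12.2]

def welch_t_and_df(sample_1, sample_2):
    """Two-sample t statistic and Welch degrees of freedom (variances not pooled)."""
    v1 = stdev(sample_1) ** 2 / len(sample_1)
    v2 = stdev(sample_2) ** 2 / len(sample_2)
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(sample_1) - 1) + v2 ** 2 / (len(sample_2) - 1))
    return t, df

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on the t density over [0, t]."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps                      # steps must be even for Simpson's rule
    total = density(0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    # The density is symmetric, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3

t_stat, df = welch_t_and_df(females, males)
p_value = t_upper_tail(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df:.2f}, one-sided p-value = {p_value:.4f}")
```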

Step 4: States a correct conclusion in the context of the study, using the result of the statistical test.
Because the p-value is smaller than any conventional significance level (such as $\alpha = 0.05$ or $\alpha = 0.01$),
we reject the null hypothesis. The data provide convincing statistical evidence that the mean
weight of female harpy eagles is greater than the mean weight of male harpy eagles.
Scoring
Steps 1, 2, 3, and 4 are scored as essentially correct (E), partially correct (P), or incorrect (I).
Step 1 is scored as follows:
Essentially correct (E) if the response identifies correct parameters AND both hypotheses are labeled
and state the correct relationship between the parameters.
Partially correct (P) if the response identifies correct parameters OR states correct relationships, but
not both.
Note: Either defining the parameters in context, or simply using common parameter notation with
subscripts clearly relevant to the context, such as $\mu_F$ or $\mu_M$, is sufficient.
Step 2 is scored as follows:
Essentially correct (E) if the response correctly includes the following three components:
1. Identifies the correct test procedure (by name or by formula)
2. Checks for independent random samples
3. Checks for normality
Partially correct (P) if the response correctly includes only two of the three components.
Incorrect (I) if the response correctly includes at most one of the three components.
Notes:
• A two-sample z-test is not a correct test procedure in this case, but if both conditions are checked
correctly, this step is scored as partially correct.
• If a student chooses to conduct a pooled t-test, the plausibility of the equal variances assumption
must be addressed to get credit for choosing the appropriate test procedure.
• To get credit for the check of independent random samples, students must indicate that more
than one random sample was taken.
• To get credit for the normality condition, students must include correct graphs of both
distributions and include an appropriate comment about shape or outliers.
• Ignore additional conditions listed, as long as they are correct, such as "the sample sizes must be
less than 10 percent of the population sizes." However, if the student includes additional
incorrect conditions, such as np > 10, reduce the score in this step by one level (that is, from E to
P or P to I).
Step 3 is scored as follows:
Essentially correct (E) if the response correctly calculates both the test statistic and a p-value that is
consistent with the stated alternative hypothesis.
Partially correct (P) if the response correctly calculates the test statistic but not the p-value OR omits
the test statistic but correctly calculates the p-value.
Notes:
• It is acceptable for students to use the conservative df (df = 7) or use the t-table to get a p-value
between 0.005 and 0.01.
• Students who incorrectly choose a two-sample z-test lose credit for identifying the correct test
procedure in step 2 but can earn full credit in step 3 if they provide the correct z-statistic (z =
3.109) and p-value (p-value = .0009).
Step 4 is scored as follows:
Essentially correct (E) if the response provides a correct conclusion in context, with justification
based on linkage between the p-value and conclusion.
Partially correct (P) if the response provides a correct conclusion, with linkage to the p-value, but not
in context OR provides a correct conclusion in context, but without justification based on linkage to
the p-value.
Notes:
• The conclusion must be about the mean eagle weights to get credit for context.
• The conclusion must be related to the alternative hypothesis.
• If no significance level $\alpha$ is given, the solution must be explicit about the linkage by giving a
correct interpretation of the p-value or explaining how the conclusion follows from the p-value.
For example, stating that because the p-value is small, we reject the null hypothesis or stating
that because the p-value is large, we do not reject the null hypothesis.
• If the p-value is incorrect, then step 4 is scored as E if the response includes proper linkage and a
conclusion in context consistent with the p-value.
• If the p-value is less than 0.05, wording that states or implies that the alternative hypothesis is
proven lowers the score one level (that is, from E to P or P to I) in step 4.
• If the p-value is incorrect and greater than 0.05, wording that states or implies that the null
hypothesis is accepted lowers the score one level (that is, from E to P or P to I) in step 4.
Each essentially correct (E) step counts as 1 point. Each partially correct (P) step counts as ½ point.
4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response
If a response is between two scores (for example, 2½ points), score down.

Question 5
5. The U.S. speed skaters heading to the 2018 Pyeongchang Winter Olympics needed to investigate
reasons for the team's poor showing at the 2014 Sochi Winter Olympics. There were many potential
reasons presented to explain this poor performance, including that the U.S. speed skating team
trained in Collalbo, Italy at elevation 3,792 ft but Sochi, Russia is at sea level (elevation 0 ft).
(a) In the months leading up to the Sochi Olympics, the U.S. speed skaters competed in many
international skating competitions to prepare for the Olympic Games. One competition was
located in Salt Lake City, Utah (elevation 4,330 ft) in November 2013 and the other in Berlin,
Germany (elevation 164 ft) in December 2013. For these two competitions the times for 14
athletes in the men's 1500m were recorded in both races. The difference in finish times was
calculated for each athlete and summary statistics are shown below.
                          Mean            Standard Deviation
Salt Lake City – Berlin   −3.37 seconds   0.87 seconds
Assuming that all conditions for inference have been met, construct a 95% confidence interval
for the mean difference in 1500m times between the Salt Lake City and Berlin competitions.
Does this interval suggest that there is a difference in mean finish times when two competitions
are held at different elevations? Explain.
(b) The completion times for 15 athletes who competed in both the Berlin event and in the 1500m
final at the Sochi Olympics (two cities with similar elevation) were also compared and the
difference in finish times (Berlin – Sochi) was calculated for each athlete. From this data, a 95%
confidence interval for the mean difference in 1500m times between the Berlin and Sochi
competitions is 0.16 ± 0.68 seconds. Assuming that all conditions for inference have been met,
does this interval suggest that there is a difference in mean finish times when two competitions
are held at similar elevations? Explain.
A scatterplot showing the finishing times by country for the 10 athletes who competed in all three
events is shown below. Shani Davis, the U.S. speed skater whose position in the plot is circled,
won silver medals in the men's 1500m during both the 2006 and 2010 Olympics but finished 11th in
Sochi. He is also set to compete in the 2018 Pyeongchang Winter Olympics.
Scatterplot: Change in Times for Athletes Who Completed All Three Events
(c) Use the scatterplot to compare Shani's finish times across all three competitions (Salt Lake at
elevation 4,330 ft, Berlin at elevation 164 ft, and Sochi at elevation 0 ft) and comment on the
difference in Shani's performance when competing in cities with different elevations compared to
his performance when competing in cities with similar elevations.
(d) Using all the information in this question, what recommendation would you make to the U.S.
Speed Skating Association regarding the elevations of the cities where the team trains for
future Olympics?
Question 5
Intent of Question
The primary goals of this question were to assess a student's ability to (1) construct and interpret
confidence intervals for paired data and (2) use information from a scatterplot to draw an appropriate
conclusion.
Solution
Part (a):
A 95% confidence interval for the population mean difference is given by $\bar{d} \pm t^* \left( \frac{s}{\sqrt{n}} \right)$. The critical
value for 95% confidence, based on 14 – 1 = 13 degrees of freedom, is $t^* = 2.160$. The 95%
confidence interval for the population mean difference in race times (Salt Lake City – Berlin) is
$$-3.37 \pm 2.160 \left( \frac{0.87}{\sqrt{14}} \right)$$
–3.37 ± 0.50
–3.87 to –2.87 seconds.
Because this interval does not contain zero, it suggests there is a significant mean difference in the finish
times. Because the interval contains only negative values for the differences (Salt Lake City time minus
Berlin time), it suggests that completion times of 1500m speed skating races are shorter, that is, faster, at
the higher elevation.
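The interval in part (a) can be recomputed from the summary statistics. This sketch takes the critical value 2.160 from the solution rather than computing it, and the variable names are illustrative.

```python
from math import sqrt

n = 14              # athletes who raced in both Salt Lake City and Berlin
mean_diff = -3.37   # mean of (Salt Lake City - Berlin) differences, in seconds
sd_diff = 0.87      # standard deviation of the differences, in seconds
t_star = 2.160      # critical value for 95% confidence with 13 df, as given in the solution

margin = t_star * sd_diff / sqrt(n)
lower, upper = mean_diff - margin, mean_diff + margin
contains_zero = lower <= 0 <= upper

print(f"{mean_diff} ± {margin:.2f}  ->  ({lower:.2f}, {upper:.2f}) seconds")
print("Interval contains 0:", contains_zero)
```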
Part (b):
The 95% confidence interval for the population mean difference in race times (Berlin – Sochi) is
−0.52 to 0.84 seconds. As this interval does contain zero, it suggests that there is not a significant
mean difference in the finish times of the 1500m speed skate competitions when held in similar
elevations.
Part (c):
Shani Davis' finish time in Salt Lake was about 4.7 seconds faster than his finish time in Berlin.
However, he was a little faster in Sochi than he was in Berlin but only by 0.7 seconds. Thus, his
performance at a lower elevation was much slower than his performance at a higher elevation. His
performances at two events at a lower elevation were similar, but he did demonstrate improvement
from the first event to the second event. Shani's improvement from Berlin to Sochi suggests it is
possible that if he had more training time at lower elevations, his finish time at the Sochi Olympics
may have improved even more.
Part (d):
I would suggest that for future Olympics the team should train in cities with a similar elevation to
where the Olympics will be held.
Scoring
This question is scored in three sections. Section 1 consists of the calculation of the interval in part (a),
Section 2 consists of the interpretations of the intervals in parts (a) and (b), and Section 3 consists of
parts (c) and (d). Each of the three sections is scored as essentially correct (E), partially correct (P), or
incorrect (I).
Section 1 is scored as follows:
Essentially correct (E) if a 95% confidence interval is correctly computed with work shown.
Partially correct (P) if a correct method (paired t-interval) is used but either an incorrect t critical
value is used or there are errors in the calculation of the interval.
Incorrect (I) if the response does not meet the criteria for an E or P.
Section 2 is scored as follows:
Essentially correct (E) if the response includes the following two components:
1. Correctly determines whether or not the interval calculated in part (a) provides evidence of a
difference in mean finish times for competitions held at different elevations and links this
decision to whether or not the calculated interval contains 0.
2. Concludes that because the interval given in part (b) includes 0 it implies there is not a
significant difference in mean finish times for competitions held at similar elevations.
Partially correct (P) if the response only includes one of these two components.
Incorrect (I) if the response does not meet the criteria for an E or P.
Note: A response that incorrectly concludes that 0 is not in the interval given in part (b), perhaps because
the endpoints of the interval are calculated incorrectly, does not satisfy component 2.
Section 3 is scored as follows:
Essentially correct (E) if the response includes the following three components:
1. States that Shani's time was faster in Salt Lake than in Berlin and points out that his finish
times were quite different when competing in cities with differing elevations.
2. States that Shani's time was only slightly faster in Sochi than Berlin (or that the two times
were close) and points out that his finish times were very similar when competing in cities
with similar elevations.
3. Recommends that the team train in cities with elevation similar to the city in which the
competition will be held.
Partially correct (P) if the response only includes two of the three components.
Incorrect (I) if the response does not meet the criteria for E or P.
4 Complete Response

OR
OR
1 Minimal Response

OR
Question 6
6. A researcher is trying to estimate the unknown proportion p of individuals with a rare genetic trait in
a particular population. The researcher will take a sample of n individuals from this population and
count the number of individuals, X, in the sample that possess this genetic trait.
Suppose that a sample consists of n = 20 trials from this binomial process with success probability p.
In other words, let the random variable X, the number who possess this genetic trait, have a binomial
probability distribution with parameters n = 20 and unknown success probability p.
(a) Suppose the population proportion of individuals who possess this genetic trait is 0.03. A
simulation was conducted in which 1,000 random samples of size n = 20 were taken from this
population and the point estimate $\hat{p} = \frac{X}{n}$ calculated. The histogram below displays the
distribution of the 1,000 simulated sample statistics, $\hat{p}$.
[Histogram of simulated values of $\hat{p}$]
Summary Statistics
Mean of $\hat{p}$ values: 0.030
Std Dev of $\hat{p}$ values: 0.038
Based on these simulation results, does it appear that $\hat{p}$ is an unbiased estimator of the population
proportion p of individuals who possess this genetic trait? Explain.
(b) For each of the 1,000 sample proportions obtained in part (a), a 95% confidence interval for the
population proportion p of individuals who possess this genetic trait was constructed using the
usual one-proportion z-interval formula. Of the 1,000 intervals, 469 or 46.9% succeeded in
capturing the population proportion p of individuals who possess this genetic trait within the
endpoints of the interval. Explain why the proportion of intervals in the simulation that
succeeded in capturing the parameter p was much less than 95%.
Another estimator for p that can provide a statistical advantage over the conventional estimator p̂ in
certain situations is the following:
p̃ = (X + 2)/(n + 4).
(c) Carry out the calculations below to investigate the relationship between p̂ and p̃.
(i) Suppose that the sample results in 5 individuals who possess this genetic trait among the 20
trials. Determine the values of p̂ and p̃.
(ii) Suppose now that the sample results in 12 successes among the 20 trials. Determine the
values of p̂ and p̃.
(iii) Are there any sample results for which the values of p̂ and p̃ would be the same? Justify
your answer.
(d) A simulation was conducted in which 1,000 random samples of size n = 20 were taken from this
population and the point estimate p̃ = (X + 2)/(n + 4) calculated. The histogram below displays the
distribution of the 1,000 simulated sample statistics, p̃.
[Histogram: Simulated Values of p̃]
Summary Statistics
Mean of p̃ values: 0.107
Std Dev of p̃ values: 0.031
For each of the 1,000 sample proportions obtained, a 95% confidence interval for the population
proportion p of individuals who possess this genetic trait was constructed using p̃ in place of p̂ in
the usual one-proportion z-interval formula and n + 4 in place of n. Of the 1,000 intervals, 976 or
97.6% succeeded in capturing the population proportion p of individuals who possess this
genetic trait within the endpoints of the interval.
Based on these simulation results, does it appear that p̃ is an unbiased estimator of the population
proportion p of individuals who possess this genetic trait? Explain.
(e) Based on comparing the summary statistics for the simulation results in parts (a) and (d), state a
statistical advantage of the estimator p̂.
(f) Based on comparing the summary statistics for the simulation results in parts (a) and (d), state a
statistical advantage of the estimator p̃. Explain why this statistical advantage makes sense, given
that the new statistic p̃ is calculated by adding 2 to the numerator and 4 to the denominator of p̂.
Question 6
Intent of Question
The primary goals of this investigative task are to assess a student’s ability to: (1) recognize an unbiased
estimator and explain why the estimator is unbiased; (2) identify the reason a stated level of confidence
is not achieved; (3) perform calculations related to summary statistics not previously studied; and (4)
explain statistical advantages of competing estimators.
Solution
Part (a):
Since the mean of the distribution of simulated sample proportions is equal to the long-run process
probability of 0.03, p̂ is an unbiased estimator of the proportion of individuals in this population that
possess this genetic trait.
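The simulation described in part (a) can be reproduced approximately with a short script. This is a minimal sketch, not the code used to produce the histogram in the exam: the function name `simulate_p_hat`, the seed, and the use of Python's standard `random` module are all assumptions made for illustration.

```python
import random

def simulate_p_hat(p=0.03, n=20, reps=1000, seed=1):
    """Return `reps` simulated values of p_hat = X / n, with X ~ Binomial(n, p).

    The name, defaults, and seed are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Count successes among n independent trials with success probability p.
        x = sum(rng.random() < p for _ in range(n))
        values.append(x / n)
    return values

p_hats = simulate_p_hat()
mean_p_hat = sum(p_hats) / len(p_hats)
print(f"mean of simulated p_hat values: {mean_p_hat:.3f}")
```

The printed mean should land close to 0.03, which is the behavior expected of an unbiased estimator. The exact value depends on the seed.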
Part (b):
In order for a normal distribution to be an appropriate model for a binomial distribution, the
expected number of successes and failures should each be at least 10. With a success probability of
0.03 and a sample size of only 20, the expected number of successes, np, is 0.03(20) = 0.6, which is
far less than 10. A normal model is therefore not appropriate, and the z-interval cannot be expected
to capture p at the stated 95% rate.
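The low capture rate can also be checked exactly rather than by simulation. The sketch below sums binomial probabilities over every count X whose one-proportion z-interval contains p. The helper names `binom_pmf` and `wald_coverage` and the critical value z = 1.96 are assumptions for illustration. Note that when X = 0, the interval collapses to the single point 0 and cannot contain p = 0.03, and X = 0 occurs with probability 0.97^20 ≈ 0.54.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def wald_coverage(n=20, p=0.03, z=1.96):
    """Exact probability that p_hat +/- z*sqrt(p_hat(1-p_hat)/n) contains p."""
    coverage = 0.0
    for x in range(n + 1):
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            coverage += binom_pmf(x, n, p)
    return coverage

print(f"P(X = 0) = {binom_pmf(0, 20, 0.03):.3f}")
print(f"exact coverage of the z-interval = {wald_coverage():.3f}")
```

Under these assumptions the exact coverage comes out a little below one half, in line with the 46.9% observed in the simulation.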
Part (c):
(i) Values of X = 5 and n = 20 give p̂ = 5/20 = 0.25 and p̃ = 7/24 ≈ 0.29.
(ii) Values of X = 12 and n = 20 give p̂ = 12/20 = 0.6 and p̃ = 14/24 ≈ 0.58.
(iii) The values of p̂ and p̃ will be the same when
X/n = (X + 2)/(n + 4).
Solving for X gives nX + 4X = nX + 2n, or X = n/2. That is, the estimators will be the same
value when half the sample are successes.
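The arithmetic in part (c) can be verified with exact fractions. This is a small sketch; the function names `p_hat` and `p_tilde` are hypothetical and simply mirror the two estimators defined in the question.

```python
from fractions import Fraction

def p_hat(x, n):
    """Conventional estimator X / n, as an exact fraction."""
    return Fraction(x, n)

def p_tilde(x, n):
    """Alternative estimator (X + 2) / (n + 4), as an exact fraction."""
    return Fraction(x + 2, n + 4)

n = 20
for x in (5, 12):
    print(x, float(p_hat(x, n)), round(float(p_tilde(x, n)), 2))

# Every count X in 0..n for which the two estimators agree exactly.
equal_counts = [x for x in range(n + 1) if p_hat(x, n) == p_tilde(x, n)]
print("estimators agree when X is in", equal_counts)
```

For n = 20 the only count in the list is X = 10, which is n/2, matching the algebra in (iii).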
Part (d):
Since the mean of the distribution of simulated sample proportions, 0.107, is much different from the
population proportion of 0.03, p̃ is not an unbiased estimator of the proportion of individuals in this
population that possess this genetic trait.
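The size of the bias can be computed exactly from the binomial distribution, which gives a check on the simulated mean. The sketch below is illustrative only, and the function name `expected_p_tilde` is an assumption. Because E(X) = np, the expected value of (X + 2)/(n + 4) is (np + 2)/(n + 4).

```python
from math import comb

def expected_p_tilde(n=20, p=0.03):
    """Exact E[(X + 2) / (n + 4)] for X ~ Binomial(n, p), by direct summation."""
    return sum(
        (x + 2) / (n + 4) * comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )

expected = expected_p_tilde()
bias = expected - 0.03
print(f"E[p_tilde] = {expected:.4f}, bias = {bias:.4f}")
```

With n = 20 and p = 0.03 this gives (0.6 + 2)/24 ≈ 0.108, so the estimator is centered well above the true proportion of 0.03.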
Part (e):
A statistical advantage of the estimator p̂ is that it is an unbiased estimator of the population
proportion.
Part (f):
A statistical advantage of the estimator p̃ is that it has less variability. If an estimator is biased (not
centered at the target parameter), on average the values may still be closer to the unknown parameter
value of interest if the variability in the statistic from sample to sample is smaller.
It makes sense that p̃ has less variability because the denominator will be larger for p̃ than for p̂,
resulting in a smaller variance and a smaller standard deviation as observed in the simulation results.
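The effect of the larger denominator can be made concrete. Adding the constant 2 to X does not change its spread, so the standard deviation of (X + 2)/(n + 4) is the standard deviation of X divided by n + 4 instead of by n. The sketch below compares the two theoretical standard deviations; the function names are hypothetical.

```python
from math import sqrt

def sd_p_hat(n, p):
    """Theoretical SD of X / n for X ~ Binomial(n, p)."""
    return sqrt(n * p * (1 - p)) / n

def sd_p_tilde(n, p):
    """Theoretical SD of (X + 2) / (n + 4); the added 2 does not affect spread."""
    return sqrt(n * p * (1 - p)) / (n + 4)

n, p = 20, 0.03
ratio = sd_p_tilde(n, p) / sd_p_hat(n, p)
print(f"SD(p_hat) = {sd_p_hat(n, p):.4f}")
print(f"SD(p_tilde) = {sd_p_tilde(n, p):.4f}")
print(f"ratio = {ratio:.4f}")  # equals n / (n + 4)
```

The ratio is n/(n + 4) = 20/24 for any p, which is roughly the ratio of the simulated standard deviations, 0.031 and 0.038.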
Scoring
This question is scored in three sections. Section 1 consists of parts (a) and (b); Section 2 consists of
parts (c) and (d); and Section 3 consists of parts (e) and (f). Each of the three sections is scored as
essentially correct (E), partially correct (P), or incorrect (I).
Section 1 [parts (a) and (b)] is scored as follows:
Essentially correct (E) if the response includes the following four components:
1. Indicates that p̂ is an unbiased estimator in part (a)
2. Clearly demonstrates an understanding of what it means for an estimator to be unbiased in part (a)
3. States the condition necessary for the normal approximation to the binomial to be appropriate
in part (b)
4. Includes calculations to demonstrate that the condition required to use the normal
approximation is not met in part (b)
Partially correct (P) if the response includes two or three of the four components required for E.
Incorrect (I) if the response does not meet the criteria for E or P.
Section 2 [parts (c) and (d)] is scored as follows:
Essentially correct (E) if the response includes the following four components:
1. Correctly determines the values of p̂ and p̃ in (c-i) and (c-ii)
2. Correctly determines when p̂ and p̃ will be equal with work shown in (c-iii)
3. Indicates that p̃ is a biased estimator in part (d)
4. Clearly demonstrates an understanding of what it means for an estimator to be biased in
part (d)
Partially correct (P) if the response includes two or three of the four components required for E.
Section 3 [parts (e) and (f)] is scored as follows:
Essentially correct (E) if the response includes the following three components:
1. Identifies unbiasedness as the statistical advantage of p̂
2. Identifies smaller variability as the statistical advantage of p̃
3. Explains how the larger denominator in p̃ leads to smaller variability
Partially correct (P) if the response includes only two of the three components required for E.
4 Complete Response

OR
OR
1 Minimal Response

OR
