Sem 1 Review
1. Researchers looking at the relationship between the type of college attended (public or private) and
achievement gather the following data on 3265 people who graduated from college in the same year. The
variable “management level” describes their position 20 years after graduating from college.
                      Type of College
Management level    Public    Private
High                    75        107
Medium                 962        794
Low                    732        595
(a) Calculate the marginal distribution of management level in percents.
(b) Find the conditional distribution of management level for each college type, in percents.
(c) Write a brief description of what the information in (a) and (b) tells you about the relationship between
these variables.
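A short Python sketch of the arithmetic behind (a) and (b). Row totals divided by the grand total give the marginal distribution, and each cell divided by its column (college type) total gives the conditional distribution within that type. The variable names are illustrative only.

```python
# Counts from the Question 1 table: management level by type of college.
counts = {
    "High":   {"Public": 75,  "Private": 107},
    "Medium": {"Public": 962, "Private": 794},
    "Low":    {"Public": 732, "Private": 595},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # 3265

# (a) Marginal distribution of management level: row total / grand total.
marginal = {level: 100 * sum(row.values()) / grand_total
            for level, row in counts.items()}

# (b) Conditional distribution of management level within each college type:
# each cell divided by its column total.
col_totals = {c: sum(row[c] for row in counts.values()) for c in ("Public", "Private")}
conditional = {
    c: {level: 100 * row[c] / col_totals[c] for level, row in counts.items()}
    for c in col_totals
}

for level, pct in marginal.items():
    print(f"{level:<7}{pct:5.1f}%")
for c, dist in conditional.items():
    print(c, {level: round(pct, 1) for level, pct in dist.items()})
```

Rounded to one decimal, public graduates come out at about 4.2% high, 54.4% medium and 41.4% low, and private graduates at about 7.2%, 53.1% and 39.8%.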
2. Literary scholars sometimes use the distribution of word lengths in a work as a test of authenticity. Here are
the word lengths for the first 25 words on a randomly-selected page from Toni Morrison’s Song of Solomon.
2 3 4 10 2 11 2 8 4 3 7 2 7
5 3 6 4 4 2 5 8 2 3 4 4
(b) Describe the overall pattern of the distribution and any possible outliers.
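This sketch tallies the 25 word lengths into a text histogram and applies the 1.5 × IQR outlier rule. The quartile convention is an assumption: it takes the medians of the lower and upper halves and leaves out the middle value when n is odd, as TI-83/84 calculators do. Other software can give slightly different quartiles.

```python
from collections import Counter

# Word lengths of the first 25 words (Question 2).
lengths = [2, 3, 4, 10, 2, 11, 2, 8, 4, 3, 7, 2, 7,
           5, 3, 6, 4, 4, 2, 5, 8, 2, 3, 4, 4]

freq = Counter(lengths)
for length in range(min(lengths), max(lengths) + 1):
    # Text histogram: one '*' per word of this length.
    print(f"{length:>2} | {'*' * freq[length]}")

def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves,
    leaving out the overall median when n is odd."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

q1, q3 = quartiles(lengths)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in lengths if x > upper_fence or x < q1 - 1.5 * iqr]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, upper fence={upper_fence}, flagged={flagged}")
```

With this convention, Q1 = 2.5 and Q3 = 6.5, so the upper fence is 12.5 and neither 10 nor 11 is flagged. Those two values stand apart mainly because of the gap at 9, which is why they count only as possible outliers.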
3. The scores of a reference population on the Wechsler Intelligence Scale for Children (WISC) are
approximately Normally distributed with µ = 100 and σ = 15.
(b) A score in what range would represent the top 1% of the scores?
(c) What proportion of the reference population has WISC scores below 110?
(d) What proportion of the reference population has WISC scores between 80 and 110?
(e) What is the interquartile range of WISC scores for the reference population?
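For checking answers to (b) through (e), here is a sketch that uses Python's `statistics.NormalDist` (Python 3.8 or later) in place of a z-table. The `cdf` method gives areas to the left of a value, and `inv_cdf` gives percentiles.

```python
from statistics import NormalDist

wisc = NormalDist(mu=100, sigma=15)

# (b) Top 1%: scores above the 99th percentile.
cutoff_top1 = wisc.inv_cdf(0.99)

# (c) Proportion below 110.
p_below_110 = wisc.cdf(110)

# (d) Proportion between 80 and 110.
p_80_to_110 = wisc.cdf(110) - wisc.cdf(80)

# (e) IQR = Q3 - Q1 (75th percentile minus 25th percentile).
iqr = wisc.inv_cdf(0.75) - wisc.inv_cdf(0.25)

print(f"top 1%: above {cutoff_top1:.1f}")
print(f"P(X < 110) = {p_below_110:.4f}")
print(f"P(80 < X < 110) = {p_80_to_110:.4f}")
print(f"IQR = {iqr:.2f}")
```

The results are about 134.9 (so scores above roughly 135), 0.7475, 0.6563, and an IQR of about 20.2. Answers from a printed z-table can differ slightly in the last digit because z is rounded to two decimals there.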
4. Twenty students were asked to guess the age of a man in a photograph. Here are their guesses:
44 43 48 37 44 40 33 42 43 41
50 49 43 46 46 45 43 38 39 41
Are these guesses approximately Normally distributed? Provide evidence to support your answer.
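One way to gather evidence is to compare the guesses with the 68–95–99.7 rule. This sketch counts how many guesses fall within 1, 2, and 3 sample standard deviations of the mean.

```python
from statistics import mean, stdev

guesses = [44, 43, 48, 37, 44, 40, 33, 42, 43, 41,
           50, 49, 43, 46, 46, 45, 43, 38, 39, 41]

xbar = mean(guesses)
s = stdev(guesses)  # sample standard deviation (divides by n - 1)

def count_within(k):
    """Number of guesses within k standard deviations of the mean."""
    return sum(1 for x in guesses if abs(x - xbar) <= k * s)

for k in (1, 2, 3):
    pct = 100 * count_within(k) / len(guesses)
    print(f"within {k} SD: {count_within(k)}/{len(guesses)} = {pct:.0f}%")
```

For these guesses, the counts are 14, 19, and 20 out of 20 (70%, 95%, and 100%), which is close to the 68–95–99.7 percentages. A Normal probability plot is another common check.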
5. Of the 50 species of oaks in the United States, 28 grow on the Atlantic coast and 11 grow in California. We
are interested in the distribution of acorn volumes among oak species. Here are back-to-back stemplots on
the volumes of acorns (in cubic centimeters) for these 39 oak species:
Use the stemplots to compare the distribution of acorn sizes between Atlantic Coast and California oak
species.
6. The Candela brothers own two pizza restaurants, one on Park Street and one on Bridge Road. The
computer output below summarizes the distribution of weekly revenues at each restaurant: 26
weeks for Park Street and 40 weeks for Bridge Road.
(a) One week, Park Street’s revenue was $7500, which was the 15th highest revenue recorded for
that restaurant. In the same week, Bridge Road’s revenue was $7100, the 12th highest for that
restaurant. Use percentiles and z-scores to compare how successful each restaurant was that week,
relative to its typical weekly revenue.
(b) The weekly fixed operating costs for the Park Street restaurant are $3000, which means that net
weekly profit is weekly revenue minus $3000. Find the mean, median, standard deviation, and
interquartile range for net weekly profit.
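The percentile part of (a) can be sketched as follows. The sketch assumes the definition "percent of values less than the given value" and no ties. Some texts count "less than or equal to" instead, which gives about 46.2 and 72.5. The z-scores need the means and standard deviations from the computer output, so they are left out of this sketch.

```python
def percentile(rank_from_top, n_weeks):
    """Percent of weeks with revenue below the given week's revenue.
    The k-th highest of n values has n - k values below it (assumes no ties)."""
    return 100 * (n_weeks - rank_from_top) / n_weeks

park = percentile(15, 26)    # Park Street: 15th highest of 26 weeks
bridge = percentile(12, 40)  # Bridge Road: 12th highest of 40 weeks
print(f"Park Street: {park:.1f} percentile")
print(f"Bridge Road: {bridge:.1f} percentile")
```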
7. Below is some information about the first ten United States Presidents.
Name                 Political Party          Age at Inauguration   Age at Death   State of Birth
George Washington    Federalist               57                    67             Virginia
John Adams           Federalist               61                    90             Massachusetts
Thomas Jefferson     Democratic-Republican    57                    83             Virginia
James Madison        Democratic-Republican    57                    85             Virginia
James Monroe         Democratic-Republican    58                    73             Virginia
John Quincy Adams    Democratic-Republican    57                    80             Massachusetts
Andrew Jackson       Democrat                 61                    78             South Carolina
Martin Van Buren     Democrat                 54                    79             New York
William H. Harrison  Whig                     68                    68             Virginia
John Tyler           Whig                     51                    71             Virginia
(b) Identify the variables that were recorded, and indicate whether each one is categorical or quantitative.
(c) Here is a pie chart for the distribution of the variable “State of birth.” Fill in the blanks with the
appropriate values of the variable.
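The slices of a pie chart are the relative frequencies of each category. This sketch tallies "State of birth" from the table and converts each share to degrees of the circle.

```python
from collections import Counter

# State of birth for the first ten presidents, in the order of the table.
states = ["Virginia", "Massachusetts", "Virginia", "Virginia", "Virginia",
          "Massachusetts", "South Carolina", "New York", "Virginia", "Virginia"]

counts = Counter(states)
for state, n in counts.most_common():
    share = n / len(states)
    print(f"{state:<15} {n:>2}  {share:.0%}  ({360 * share:.0f} degrees of the pie)")
```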
3 22 35 43 49 57 70 92
13 25 35 43 50 59 70 98
15 31 37 45 50 63 74 157
19 33 37 46 53 65 80
21 35 38 48 56 66 82
(a) What measures would you use to describe the center and spread of these data? Justify your answer.
(d) For the oil well data on the previous page, how can you tell, without doing any calculations, that the mean
of these data is larger than the median?
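To check the reasoning in (a) and (d), this sketch computes the mean, median, quartiles, and 1.5 × IQR outliers for the 38 oil well values listed above. The values are listed in ascending order down each column. The quartiles use the medians of the lower and upper halves, which is an assumed convention.

```python
from statistics import mean, median

oil = [3, 13, 15, 19, 21, 22, 25, 31, 33, 35, 35, 35, 37, 37, 38,
       43, 43, 45, 46, 48, 49, 50, 50, 53, 56, 57, 59, 63, 65, 66,
       70, 70, 74, 80, 82, 92, 98, 157]

s = sorted(oil)
half = len(s) // 2
q1, q3 = median(s[:half]), median(s[-half:])  # medians of the two halves
iqr = q3 - q1
high_outliers = [x for x in s if x > q3 + 1.5 * iqr]

print(f"mean = {mean(oil):.2f}, median = {median(oil)}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, outliers above fence: {high_outliers}")
```

The mean (about 50.4) sits above the median (47), and 157 lies beyond the upper fence of 110. This is consistent with a right-skewed distribution, for which the median and IQR are the more resistant summaries.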
9. Below are cumulative frequency graphs for the age distributions of the populations of France and the
Philippines.
(a) Use the graphs to compare the median and interquartile range of ages in the two countries.
10. Below is a cumulative relative frequency graph for the length of time a group of 62 students spent on
a no-time-limit final exam in Algebra II.
(a) What are the median and interquartile range for the amount of time these students spent on the
exam? Draw lines on the graph to show how you arrived at your answers.
(b) According to these data, the mean time students spent on the exam was 94.1 minutes, and the
standard deviation was 24.23 minutes. Suppose the exam proctor realized after compiling these
data that he had used the wrong start time in his calculation, so that each value for time spent on the
exam needs to be reduced by 15 minutes. He also wants to express the times in hours, rather than
minutes. Find the mean and standard deviation of the transformed data.
(c) What are the mean and standard deviation of the z-scores of time spent on the exam for all the
students who took this exam? Justify your answer.
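The transformation in (b) is "subtract 15, then divide by 60." This sketch applies the usual rules to the given mean and standard deviation. For (c), it standardizes a small set of hypothetical times (made up purely for illustration, not the class data) to show that z-scores always have mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

# Summary statistics given in 10(b), in minutes.
mean_min, sd_min = 94.1, 24.23

# new value = (old value - 15) / 60
# Subtracting 15 shifts the mean down by 15 but leaves the SD unchanged;
# dividing by 60 then divides both the mean and the SD by 60.
mean_hr = (mean_min - 15) / 60
sd_hr = sd_min / 60
print(f"mean = {mean_hr:.4f} hours, SD = {sd_hr:.4f} hours")

# (c) Illustration with HYPOTHETICAL times (not the class data):
# standardizing any data set gives z-scores with mean 0 and SD 1.
times = [62, 81, 90, 97, 110, 125]
m, s = mean(times), stdev(times)
z = [(t - m) / s for t in times]
print(f"mean of z = {mean(z):.6f}, SD of z = {stdev(z):.6f}")
```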
11. A church group interested in promoting volunteerism in a community chooses an SRS of 200
community addresses and sends members to visit these addresses during weekday working hours to
inquire about the residents’ attitudes toward volunteer work. Sixty percent of all respondents say
that they would be willing to donate at least an hour a week to some volunteer organization. Bias is
present in this sample design. Identify the type of bias involved and state whether you think the
sample percent obtained is higher or lower than the true population percent.
12. The probability that a randomly selected person in the United States is left-handed is about 0.14.
(a) Use this probability to explain what the Law of Large Numbers says.
(b) Among the 28 students in Mr. Millar’s Calculus BC class, 8 are left-handed. Could this have
happened by chance alone? Describe how you would use a random number table to simulate the
proportion of left-handers in a class of 28 students if they were chosen randomly from a population
that is 14% left-handed. Do not perform the simulation.
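The question asks only for a description of the simulation. Once that design is written down, though, it can be checked by running it. This Python sketch mimics the random-digit-table scheme (two-digit groups 00 to 13 mean left-handed) for 10,000 simulated classes. For comparison, it also computes the exact binomial probability with `math.comb` (Python 3.8 or later). The seed and number of trials are arbitrary choices.

```python
import random
from math import comb

random.seed(1)  # fixed seed so the run is reproducible

N_STUDENTS = 28
TRIALS = 10_000

def simulate_class():
    """Mimic the random-digit-table method: read 28 two-digit groups
    (00-99); 00-13 means left-handed (14 of 100 = 14%), 14-99 means not."""
    return sum(1 for _ in range(N_STUDENTS) if random.randint(0, 99) <= 13)

at_least_8 = sum(1 for _ in range(TRIALS) if simulate_class() >= 8)
sim_prob = at_least_8 / TRIALS

# Exact binomial probability P(X >= 8) for comparison.
exact_prob = sum(comb(N_STUDENTS, k) * 0.14**k * 0.86**(N_STUDENTS - k)
                 for k in range(8, N_STUDENTS + 1))

print(f"simulated P(at least 8 lefties) ~ {sim_prob:.3f}")
print(f"exact binomial P(X >= 8) = {exact_prob:.3f}")
```

Both values come out near 0.03. So 8 or more left-handers in a class of 28 would be fairly unlikely by chance alone.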
13. A student wonders if there is a relationship between foot length and height. She measures herself and
six classmates and produces the following data (heights and foot lengths are in centimeters):
(b) Based on the scatterplot, describe the pattern, if any, in the relationship between the heights and
foot lengths of these students.
(c) Use your calculator to find the correlation r between the height and foot length. Do the data
show any evidence of a relationship between these two variables? Explain.
• heights were measured in inches rather than centimeters? (There are 2.54 centimeters in an inch.)
(e) Suppose another student with foot length 19 cm and height 167 cm were added to the data. How
would this influence r?
14. The school’s newspaper has asked you to contact 100 of the approximately 1100 students at the
school to gather information about student opinions regarding food at your school’s cafeteria.
(a) With as much precision as possible, describe the population for your study.
(b) You are pretty sure that there is a big difference between the opinions of males and females when
it comes to cafeteria food. Describe a study design that takes into account this potentially important
variable. Explain the advantage of this method.
(c) You decide to conduct a survey about the quality of food served in the school cafeteria by
randomly selecting students as they leave the cafeteria after lunch on a specific day next week.
Describe a source of bias that may result from using this method. Be sure to use the correct
terminology, and indicate the direction of the potential bias.
15. A couple has two sons and decides to have a third child. The husband says, “We’re bound to have a
daughter this time: things balance out.” The wife says, “Nonsense! Two boys in a row means we
are more likely to have another boy.” Comment on this disagreement, based on your
understanding of probability.
16. Below are some data on the relationship between the price of a certain manufacturer’s flat-panel LCD
televisions and the area of the screen. We would like to use these data to predict the price of
televisions based on size.
(a) Use your calculator to find the least-squares regression equation. Write the
equation below, defining any variables you use.
(b) This manufacturer also produces a television with a screen size of 943 square inches. Would it
be reasonable to use this equation to predict the price of that television? Explain.
(c) Calculate the residual for the television that has a screen area of 437 square inches. What does
this number suggest about the cost of this television, relative to the others?
17. Agricultural scientists for a chemical company want to determine if a newly developed fertilizer
produces heavier tomatoes than the fertilizer they currently manufacture. For their first pilot study,
they have 24 healthy young tomato plants growing in individual pots, numbered from 1 to 24.
Describe the design of a completely randomized, controlled experiment to test whether the new
fertilizer produces heavier tomatoes. Your answer should address all four basic principles of
experimental design.
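The random assignment step can be sketched with `random.sample`. The sketch chooses 12 of the 24 pot numbers without replacement for the new fertilizer and gives the other 12 the current fertilizer, which serves as the comparison (control) group. The seed is arbitrary. The other principles (control of other variables, comparison, and replication) still have to be addressed in the written design.

```python
import random

random.seed(2024)  # any seed; fixed only so the example is reproducible

plants = list(range(1, 25))                        # pots numbered 1 to 24
new_fertilizer = sorted(random.sample(plants, 12))  # random half gets the new fertilizer
current_fertilizer = sorted(set(plants) - set(new_fertilizer))  # the rest

print("New fertilizer:    ", new_fertilizer)
print("Current fertilizer:", current_fertilizer)
```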
18. A cookie manufacturer is trying to determine how long cookies stay fresh on store shelves, and the
extent to which the type of packaging and the store’s temperature influence how long the cookies
stay fresh. He designs a completely randomized experiment involving low (64 °F) and high (75 °F)
temperatures and two types of packaging, plastic and waxed cardboard. List the experimental
units, factors, and treatments in this experiment.
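With two factors at two levels each, the treatments are the 2 × 2 = 4 combinations of a temperature and a packaging type. `itertools.product` lists them in this sketch.

```python
from itertools import product

temperatures = ["low (64 °F)", "high (75 °F)"]   # factor 1: store temperature
packaging = ["plastic", "waxed cardboard"]       # factor 2: type of packaging

# Each treatment is one combination of a level of each factor.
treatments = list(product(temperatures, packaging))
for i, (temp, pack) in enumerate(treatments, start=1):
    print(f"Treatment {i}: {temp} temperature, {pack} packaging")
```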
19. Alana’s favorite exercise machine is a stair climber. On the “random” setting, it changes speeds at
regular intervals, so the total number of simulated “floors” she climbs varies from session to session.
She also exercises for different lengths of time each session. She decides to explore the relationship
between the number of minutes she works out on the stair climber and the number of floors it tells
her that she’s climbed. She records minutes of climbing time and number of floors climbed for six
exercise sessions. Computer output and a residual plot from a linear regression analysis of the data
are shown below.
(a) What is the equation of the least-squares line? Be sure to define any variables you use.
(b) Is a line an appropriate model for these data? Justify your answer.
(a) Explain clearly how you would use your calculator to choose a sample of 100 students for this
study.
(b) Explain how you could use a random digits table to choose a sample of 100 students for this
study.
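A sketch of the random-digits-table method, assuming for illustration a population of 1100 students labeled 0001 to 1100 (the size used in Question 14). Python's `random` module stands in for the printed table. The digits are read four at a time, skipping 0000, labels above 1100, and repeats. The calculator method in (a) works the same way, for example by using repeated randInt(1,1100) on a TI-83/84 and ignoring repeats.

```python
import random

random.seed(7)  # reproducible example only

POPULATION = 1100   # hypothetical: students labeled 0001-1100
SAMPLE_SIZE = 100

# A stand-in for a random digits table: a long string of random digits.
digits = "".join(random.choice("0123456789") for _ in range(20_000))

chosen = []
for start in range(0, len(digits) - 3, 4):
    label = int(digits[start:start + 4])  # read four digits at a time
    # Skip 0000, labels above 1100, and repeats.
    if 1 <= label <= POPULATION and label not in chosen:
        chosen.append(label)
    if len(chosen) == SAMPLE_SIZE:
        break

print(sorted(chosen))
```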
21. For each study described below, comment briefly on the extent to which results can be generalized to
some larger population, and the extent to which cause and effect has been established.
(a) A marketing executive who wants to gauge reactions to a new packaging design for a popular
brand of cookie places the new packages in 45 randomly-selected grocery stores in a large city and
compares sales of the cookies to sales of the same cookie (with the old packaging) in the previous
month.
(b) A consumer advocacy organization wants to determine if using premium gasoline in the engines
of cars improves gas mileage. They randomly select 40 makes and models of new cars and acquire
two of each. They run each car on a track for 1000 miles, one car of each pair on regular gasoline and the
other on premium. (Which car within each pair gets the premium gas is determined by a coin flip.) After
driving each car, they determine the difference in fuel consumption within each pair of cars.
(c) A high school student thinks that the longer a student has been at the school, the less they like the
food in the cafeteria. To test this theory, she gives a two-question survey to the first 100 people
who enter the cafeteria on a certain day. The first question is, “How long have you attended school
here?” The second question asks the student to rate the food in the cafeteria on a 1 to 5 scale.
22. Some days, Ramon drives to work. The rest of the time he rides his bike. Suppose we choose a
random work day. The following table gives the probabilities of several events.
Event Probability
Student participates in sports 0.20
Student participates in sports and graduates 0.18
Student graduates, given no participation in sports 0.82
(a) Find the probability that Ramon is late for work, given that he drives.
(b) Find the probability that Ramon is not late for work, given that he drives.
(c) Draw a tree diagram to summarize the given probabilities and those you determined above.
(d) Find the probability that Ramon drove to work, given that he is late.
23. Suppose a person was having two surgeries performed at the same time by different operating teams.
Assume (unrealistically) that the two operations are independent. If the chances of success for
surgery A are 85%, and the chances of success for surgery B are 90%, what is the probability that
both will fail?
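Because the operations are assumed independent, the probability that both fail is the product of the two failure probabilities, as in this sketch.

```python
p_success_a, p_success_b = 0.85, 0.90

# Independence lets us multiply: P(A fails and B fails) = P(A fails) * P(B fails).
p_both_fail = (1 - p_success_a) * (1 - p_success_b)
print(f"P(both fail) = {p_both_fail:.3f}")
```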
24. What age groups use social networking sites? A recent study produced the following data about
768 individuals who were asked their age and which of three social networking sites they used most
often. (People who did not use such sites were excluded from the study.) Suppose one of these 768 people is selected at random.
                      Age Group (Years)
Web site    0 – 24   25 – 44   45 – 64   Over 65   Totals
Facebook        77       105       114        12      308
Twitter         46       110        81         7      244
LinkedIn        15        97        95         9      216
Totals         138       312       290        28      768
(a) Find the probability that the selected subject preferred Twitter.
(b) Find the probability that the selected subject preferred Twitter, given that he or she was in the 45
– 64 age group.
(c) Are the events “preferred Twitter” and “age group 45 – 64” independent? Explain.
(d) Are the events “preferred Twitter” and “age group 45 – 64” mutually exclusive? Explain.
(e) If a random sample of two subjects were selected, what is the probability that neither preferred
Twitter?
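This sketch works (a) through (e) from the two-way table, using `fractions.Fraction` so that the comparison in (c) is exact. For (e), it assumes the two subjects are chosen without replacement. With replacement, the answer would be (524/768)² ≈ 0.4655.

```python
from fractions import Fraction

table = {  # rows: web site; columns: age groups 0-24, 25-44, 45-64, over 65
    "Facebook": [77, 105, 114, 12],
    "Twitter":  [46, 110, 81, 7],
    "LinkedIn": [15, 97, 95, 9],
}
total = sum(sum(row) for row in table.values())      # 768
twitter = sum(table["Twitter"])                       # 244
age_45_64 = sum(row[2] for row in table.values())     # 290
twitter_and_45_64 = table["Twitter"][2]               # 81

p_twitter = Fraction(twitter, total)                            # (a)
p_twitter_given_45_64 = Fraction(twitter_and_45_64, age_45_64)  # (b)
independent = p_twitter == p_twitter_given_45_64                # (c)
mutually_exclusive = twitter_and_45_64 == 0                     # (d)

# (e) Two subjects chosen without replacement, neither preferring Twitter.
non_twitter = total - twitter
p_neither = Fraction(non_twitter, total) * Fraction(non_twitter - 1, total - 1)

print(f"(a) {float(p_twitter):.4f}  (b) {float(p_twitter_given_45_64):.4f}")
print(f"(c) independent? {independent}  (d) mutually exclusive? {mutually_exclusive}")
print(f"(e) {float(p_neither):.4f}")
```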
25. The manager of a children’s puppet theatre has determined that the number of adult tickets he sells for a
Saturday afternoon show is a random variable with a mean of 28.3 tickets and a standard deviation
of 5.3 tickets. The mean number of children’s tickets he sells is 42.5, with a standard deviation of
8.1.
(a) The adult tickets sell for $10. Let A = the money he collects from adult tickets on a random
Saturday. What are the mean and standard deviation of A?
(b) The children’s tickets sell for $6. Let T = the money he collects from all ticket sales (adults and
children) on a random Saturday. Assume (unrealistically, perhaps) that the number of tickets sold
to adults is independent of the number sold to children. What are the mean and standard deviation
of T?
(c) It costs $300 for the manager to put on each puppet show. Let P = the profit from a random
Saturday’s show. What are the mean and standard deviation of P?
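This sketch applies the rules for linear transformations and sums of random variables. Means always add. Variances (not standard deviations) add only because the adult and child counts are assumed independent.

```python
from math import sqrt

mean_adult, sd_adult = 28.3, 5.3   # adult tickets sold
mean_child, sd_child = 42.5, 8.1   # children's tickets sold

# (a) A = 10 * (adult tickets): multiplying by 10 scales mean and SD by 10.
mean_A, sd_A = 10 * mean_adult, 10 * sd_adult

# (b) T = 10 * adult + 6 * child. Means add; variances add under independence.
mean_T = 10 * mean_adult + 6 * mean_child
sd_T = sqrt((10 * sd_adult) ** 2 + (6 * sd_child) ** 2)

# (c) P = T - 300: subtracting a constant shifts the mean, not the SD.
mean_P, sd_P = mean_T - 300, sd_T

print(f"A: mean ${mean_A:.2f}, SD ${sd_A:.2f}")
print(f"T: mean ${mean_T:.2f}, SD ${sd_T:.2f}")
print(f"P: mean ${mean_P:.2f}, SD ${sd_P:.2f}")
```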
26. Consider the following activity: The letters in the word AARDVARK are printed on identical
plastic cards with one letter per card. The eight cards are then placed in a hat, and one card is
randomly chosen (without looking) from the hat. The chance process we are interested in is what
letter is on the selected card.
(b) Make a table that shows the set of outcomes and the probability of each outcome:
Outcome
Probability
List the outcomes in each of the following events, and determine their probabilities:
V={ P(V) =
F={ P(F) =
V or F = { P(V or F) =
Fᶜ = { P(Fᶜ) =
V and F = { P(V and F) =
V given F = { P(V|F) =
(d) Are the events V and F independent? Explain.
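The outcome table in (b) can be checked by counting letters. This sketch builds the probability model for a single draw from the eight AARDVARK cards, using exact fractions.

```python
from collections import Counter
from fractions import Fraction

word = "AARDVARK"
counts = Counter(word)
probabilities = {letter: Fraction(n, len(word)) for letter, n in counts.items()}

for letter, p in probabilities.items():
    print(f"{letter}: {p}")
```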
(a) Draw a card from a standard deck of 52 playing cards, observe the card, return the card to the
deck, and shuffle. Count the number of times you draw a card in this manner until you observe a
jack.
(b) Joey buys a Virginia lottery ticket every week. X is the number of times in a year that he wins a
prize.
29. When a computerized generator is used to generate random digits, the probability that any particular digit
in the set {0, 1, 2, . . . , 9} is generated on any individual trial is 1/10 = 0.1. Suppose that we are
generating digits one at a time and are interested in tracking occurrences of the digit 0.
(a) Determine the probability that the first 0 occurs as the fifth random digit generated.
(b) How many random digits would you expect to have to generate in order to observe the first 0?
(c) Let X = number of digits selected until first zero is encountered. Construct a probability
distribution histogram for X = 1 through X = 5.
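X here has a geometric distribution with p = 0.1. This sketch evaluates P(X = k) = (1 − p)^(k − 1) p and the mean 1/p, and prints a rough text histogram for k = 1 to 5.

```python
p = 0.1  # probability that a generated digit is 0

def geometric_pmf(k, p):
    """P(first success on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

p_first_zero_fifth = geometric_pmf(5, p)  # (a)
expected_digits = 1 / p                   # (b) mean of a geometric distribution

# (c) Text histogram of P(X = k) for k = 1..5.
for k in range(1, 6):
    prob = geometric_pmf(k, p)
    print(f"X={k}: {prob:.5f} " + "#" * round(prob * 200))
```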
30. A fair coin is flipped 20 times.
(a) Determine the probability that the coin comes up tails exactly 15 times.
(b) Let X = the number of tails in the 20 flips. Find the mean and standard deviation of X.
(c) Find the probability that X takes a value within 1 standard deviation of its mean.
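X is binomial with n = 20 and p = 0.5. This sketch uses `math.comb` (Python 3.8 or later) for the binomial probabilities. For (c), it sums over the whole-number values within one standard deviation of the mean, which are 8 through 12.

```python
from math import comb, sqrt

n, p = 20, 0.5

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_15_tails = binom_pmf(15)                # (a)
mu, sigma = n * p, sqrt(n * p * (1 - p))  # (b)

# (c) Whole-number values within one SD of the mean (10 - 2.236 to 10 + 2.236).
within = [k for k in range(n + 1) if abs(k - mu) <= sigma]
p_within = sum(binom_pmf(k) for k in within)

print(f"(a) {p_15_tails:.4f}  (b) mean {mu}, SD {sigma:.3f}  (c) {within} -> {p_within:.4f}")
```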
31. The weights of Granny Smith apples from a large orchard are Normally distributed with a mean of
380 gm and a standard deviation of 28 gm.
(a) A single apple is selected at random from this orchard. What is the probability that it weighs
more than 400 gm?
(b) Three apples are selected at random from this orchard. What is the probability that their mean
weight is greater than 400 gm?
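This sketch contrasts (a) and (b) with `statistics.NormalDist`. A single apple uses σ = 28. Because the population is Normal, the mean of three apples is also Normal, with standard deviation 28/√3.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 380, 28  # apple weights in grams

# (a) One apple: P(X > 400).
p_one = 1 - NormalDist(mu, sigma).cdf(400)

# (b) Mean of 3 apples: x-bar is Normal (because the population is Normal)
# with SD sigma / sqrt(3).
p_mean_of_3 = 1 - NormalDist(mu, sigma / sqrt(3)).cdf(400)

print(f"(a) {p_one:.4f}  (b) {p_mean_of_3:.4f}")
```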
(b) Is it possible to calculate the standard deviation of x̄? If it is, do the calculation. If it isn’t,
explain why.
(c) Do you know the approximate shape of the sampling distribution of x̄? If so, describe the shape
and justify your answer. If not, explain why not.
33. A four-sided die shaped like an asymmetrical tetrahedron has the following roll probabilities.
Number on Die 1 2 3 4
Probability 0.4 0.3 0.2 0.1
(a) Find
(b) Find
Below is a copy of the table from the first page, showing the probability distribution of X = the
number rolled on an asymmetrical four-sided die.
X 1 2 3 4
P(X) 0.4 0.3 0.2 0.1
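The standard summaries of a discrete distribution like this one are its mean and standard deviation. This sketch computes them from the table with exact fractions, after checking that the probabilities sum to 1.

```python
from fractions import Fraction
from math import sqrt

dist = {1: Fraction(4, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}
assert sum(dist.values()) == 1  # a valid probability distribution

mu = sum(x * p for x, p in dist.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # Var(X)
sd = sqrt(var)

print(f"mean = {mu}, variance = {var}, SD = {sd}")
```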
(a) A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked
voters which candidate for the state’s attorney general they would vote for. 37% of the respondents
said they would vote for the Democratic candidate. On election day, 41% actually voted for the
Democratic candidate.
(b) The National Center for Health Statistics reports that the mean systolic blood pressure for males
35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large
company looks at the medical records of 72 executives in this age group and finds that the mean
systolic blood pressure for these executives is 126.07.
35. A large pet store that specializes in tropical fish has several thousand guppies. The store claims that
the guppies have a mean length of 5 cm and a standard deviation of 0.5 cm. You come to the store
and buy 10 randomly-selected guppies and find that the mean length of your 10 guppies is 4.8 cm.
This makes you suspect that the mean fish length is not what the store says it is. To explore this
further, you assume that the length of guppies is Normally distributed and use a computer to
simulate 200 samples of 10 guppies from the store’s claimed population. Below is a dotplot of the
means from these 200 samples.
(a) What is the population in this situation, and what population parameters have we been given?
(b) The distribution of one sample is described in the opening paragraph. What information have
we been given about this sample?
(d) Do you think the store is being honest about the length of its guppies? Justify your answer.
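This sketch repeats the simulation described above (200 samples of 10 guppies from a Normal population with mean 5 cm and standard deviation 0.5 cm). It also computes the exact probability that x̄ ≤ 4.8 from the sampling distribution of x̄, which is Normal with standard deviation 0.5/√10. The seed is arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(35)  # reproducible example only

MU, SIGMA, N = 5.0, 0.5, 10  # store's claim; sample size
OBSERVED = 4.8               # your sample mean

# Simulate 200 samples of 10 guppies from a Normal(5, 0.5) population.
sim_means = [mean(random.gauss(MU, SIGMA) for _ in range(N)) for _ in range(200)]
sim_prop = sum(1 for m in sim_means if m <= OBSERVED) / len(sim_means)

# Exact value: x-bar is Normal with mean 5 and SD 0.5 / sqrt(10).
exact_prop = NormalDist(MU, SIGMA / sqrt(N)).cdf(OBSERVED)

print(f"simulated P(x-bar <= 4.8) ~ {sim_prop:.3f}, exact = {exact_prop:.3f}")
```

The exact probability is about 0.10, so a sample mean of 4.8 cm or less would occur roughly one time in ten even if the store's claim is true. The simulated proportion from 200 samples varies around that value from run to run.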
36. According to a poll, 22% of high school students in the United Kingdom say that Dobby is their favorite
character in the Harry Potter books. Let’s assume this is the parameter value for the entire
population of high school students in the U.K. You take a sample of 150 high school students and
record the proportion, p̂, of individuals in your sample who say Dobby is their favorite character.
(a) What are the mean and standard deviation of the sampling distribution of p̂?
(b) What is the approximate shape of the sampling distribution? Justify your answer.
(c) Suppose our sample size was 36 instead of 150. Compare the shape, center, and spread of this
sampling distribution to the one in parts (a) and (b).
(d) A small town in the U.K. has only 600 high school students. What is the largest possible
sample you can take from this town and still be able to calculate the standard deviation of the
sampling distribution of p̂ using the method presented in the textbook? Explain.
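This sketch computes the mean p and standard deviation √(p(1 − p)/n) of p̂ for n = 150 and n = 36. It also checks the Large Counts (Normal) condition, np ≥ 10 and n(1 − p) ≥ 10. For (d), it applies the 10% condition that the standard deviation formula relies on.

```python
from math import sqrt

p = 0.22

def phat_summary(n):
    """Mean and SD of the sampling distribution of p-hat, and whether
    the Large Counts condition (np >= 10 and n(1 - p) >= 10) holds."""
    sd = sqrt(p * (1 - p) / n)
    large_counts = n * p >= 10 and n * (1 - p) >= 10
    return p, sd, large_counts

for n in (150, 36):
    mean_, sd, ok = phat_summary(n)
    print(f"n={n}: mean {mean_}, SD {sd:.4f}, approximately Normal? {ok}")

# (d) The SD formula requires the sample to be at most 10% of the population.
max_n = 600 // 10
print("largest sample from 600 students:", max_n)
```

For n = 36, np = 7.92 is below 10, so that sampling distribution is noticeably right-skewed rather than approximately Normal. Its spread is also larger (about 0.069 versus 0.034).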
37. Power companies severely trim trees growing near their lines to avoid power failures due to falling
limbs in storms. Applying a chemical to slow the growth of the trees is cheaper than trimming, but
the chemical kills some of the trees. Suppose that one such chemical would kill 20% of sycamore
trees. The power company tests the chemical on 250 sycamores. Consider these an SRS from the
population of all sycamore trees.
(a) What are the mean and standard deviation of the proportion of trees that are killed in samples of
250 trees?
(b) Calculate the probability that at least 24% of the trees in the sample are killed.
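This sketch uses the Normal approximation to the sampling distribution of p̂. It is justified because np = 50 and n(1 − p) = 200 are both at least 10, and 250 trees is far less than 10% of all sycamores.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.20, 250
mu_phat = p
sd_phat = sqrt(p * (1 - p) / n)  # about 0.0253

# Large Counts: np = 50 and n(1 - p) = 200 are both at least 10,
# so p-hat is approximately Normal.
p_at_least_24 = 1 - NormalDist(mu_phat, sd_phat).cdf(0.24)

print(f"mean {mu_phat}, SD {sd_phat:.4f}, P(p-hat >= 0.24) = {p_at_least_24:.4f}")
```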
1st Semester Final Exam Review
Answer Section
1. ANS:
(a) High: 5.6%; Medium: 53.8%; Low: 40.6%. (b) and (c) See the table of conditional distributions and
segmented bar graph below:
(d) The proportion of private college graduates in the medium and low management levels is about the same
as the proportion of public college graduates in those categories. But a higher proportion of private college
graduates than of public college graduates are at the high management level.
(a)
(b) The distribution is skewed to the right, with two peaks at 2 and 4 letters in length and a range of 11 – 2 = 9
letters. There are two possible outliers: words of 10 and 11 letters.
D.
B.
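C. No. From parts A. and B., P(Twitter) = 244/768 ≈ 0.318, but P(Twitter | 45 – 64) = 81/290 ≈ 0.279. Since these are not equal, the events are not independent.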
D. No. The occurrence of one event does not preclude the occurrence of the other; it’s possible
that a subject preferred Twitter and is also in the 45 – 64 age group. That is, P(Twitter and 45 – 64) = 81/768 ≠ 0.
B.
C. The mean weight of a random sample of three apples is less variable than the weight of a single
randomly-selected apple, so we are less likely to get a mean weight that is 20 gm above the mean
when we take a sample of three apples.
C. No. The population distribution is skewed, and n = 12, which is not large enough for the
central limit theorem to apply.