Statistics Final Review
Theoretical questions:
1. Explain the steps for computing the pth percentile. Give an example using the quartiles.
2. Prove the short-cut formula for a population variance.
3. State and prove Bayes' theorem.
4. Describe the binomial distribution X=B(n,p) and prove the formula for P(X=k).
5. Let X=N(μ,σ) and Y=(X-μ)/σ. Explain how you compute P(X<a), P(X>b) and
P(a<X<b) in terms of Y. Deduce that Y=N(0,1).
6. Describe the sampling distribution of sample means, give an example. State the
central limit theorem in this case.
7. Describe the sampling distribution of sample proportion, give an example. State the
central limit theorem in this case.
8. Explain how to obtain the confidence interval for a normal population mean.
9. Describe the formulae for the sample sizes required to obtain a confidence interval with a
specified margin of error, given a significance level α, for a normal mean and for a
population proportion.
Problems:
Descriptive statistics
P1. How much do users pay for Internet service? Here are the monthly fees (in dollars) paid
by a random sample of 50 users of commercial Internet service providers in August 2000:
20 40 22 22 21 21 20 10 20 20 20 13 18 50 20 18 15 8 22 25
22 10 20 22 22 21 15 23 30 12 9 20 40 22 29 19 15 20 20 20
20 15 19 21 14 22 21 35 20 22
(a) Make a stemplot of these data. Briefly describe the pattern you see. About how much do
you think America Online and its larger competitors were charging in August 2000?
(b) Which observations are suspected outliers by the 1.5 × IQR rule?
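As a rough check for part (b), the 1.5 × IQR rule can be sketched in Python. This sketch assumes the convention that each quartile is the median of the lower or upper half of the sorted data; other quartile conventions may give slightly different fences. The function names are illustrative only.

```python
from statistics import median

# Monthly fees (dollars) from P1.
fees = [
    20, 40, 22, 22, 21, 21, 20, 10, 20, 20, 20, 13, 18, 50, 20, 18, 15, 8, 22, 25,
    22, 10, 20, 22, 22, 21, 15, 23, 30, 12, 9, 20, 40, 22, 29, 19, 15, 20, 20, 20,
    20, 15, 19, 21, 14, 22, 21, 35, 20, 22,
]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves (assumed convention)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # when n is odd, leave the overall median out
    return median(lower), median(upper)

def suspected_outliers(values):
    """Return the observations lying outside the 1.5 x IQR fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(x for x in values if x < low_fence or x > high_fence)

q1, q3 = quartiles(fees)
print(q1, q3)                    # quartiles of the fee data
print(suspected_outliers(fees))  # observations flagged by the rule
```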
P2. Some companies grade on a bell curve to compare the performance of their managers
and professional workers. This forces the use of some low performance ratings, so that not
all workers are graded above average. Until the threat of lawsuits forced a change, Ford
Motor Company's performance management process assigned 10% A grades, 80% B
grades, and 10% C grades to the company's 18,000 managers. It isn't clear that the bell
curve of ratings is really a normal distribution. Nonetheless, suppose that Ford's
performance scores are normally distributed. One year, managers with scores less than 25
received C's and those with scores above 475 received A's. What are the mean and standard
deviation of the scores?
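One possible way to set up P2, sketched in Python: if 10% of scores fall below 25 and 10% fall above 475, symmetry puts the mean halfway between the cutoffs, and the upper cutoff sits at the 90th percentile. The variable names are illustrative, and the sketch relies on the normality assumption stated in the problem.

```python
from statistics import NormalDist

# z-score with 90% of the standard normal area to its left.
z90 = NormalDist().inv_cdf(0.90)

low_cut, high_cut = 25, 475

# By symmetry, the mean is the midpoint of the two cutoffs.
mean = (low_cut + high_cut) / 2

# The upper cutoff is z90 standard deviations above the mean.
sd = (high_cut - mean) / z90

print(round(mean, 1), round(sd, 1))
```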
P3. The median of any normal distribution is the same as its mean. We can use normal
calculations to find the quartiles and related descriptive measures for normal distributions.
(a) What is the area under the standard normal curve to the left of the first quartile? Use this
to find the value of the first quartile for a standard normal distribution. Find the third
quartile similarly.
Lecturer: Dr. Pho Duc Tai, Department of Mathematics, Vietnam National University, E-mail: [email protected]
(b) Your work in (a) gives the z-scores for the quartiles of any normal distribution. Scores
on the Wechsler Intelligence Scale for Children (WISC) are normally distributed with mean
100 and standard deviation 15. What are the quartiles of WISC scores?
(c) What is the value of the IQR for the standard normal distribution?
(d) What percent of the observations in the standard normal distribution are suspected
outliers according to the 1.5 × IQR rule? (This percent is the same for any normal
distribution.)
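The calculations in P3 can be checked numerically. The following is a minimal sketch using Python's standard library; the variable names are illustrative, and the WISC parameters (mean 100, standard deviation 15) are taken from part (b).

```python
from statistics import NormalDist

std = NormalDist()  # standard normal, mean 0 and standard deviation 1

# (a) Quartiles: the points with area 0.25 and 0.75 to their left.
q1 = std.inv_cdf(0.25)
q3 = std.inv_cdf(0.75)

# (b) The same z-scores rescaled to WISC scores.
wisc_q1 = 100 + 15 * q1
wisc_q3 = 100 + 15 * q3

# (c) Interquartile range of the standard normal distribution.
iqr = q3 - q1

# (d) Fraction of observations outside the 1.5 x IQR fences.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outlier_fraction = std.cdf(low_fence) + (1 - std.cdf(high_fence))

print(round(q1, 4), round(q3, 4), round(iqr, 4))
print(round(wisc_q1, 2), round(wisc_q3, 2))
print(round(100 * outlier_fraction, 2), "percent")
```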
P4. Many random number generators allow users to specify the range of the random
numbers to be produced. Suppose that you specify that the outcomes are to be distributed
uniformly between 0 and 2. Then the density curve of the outcomes has constant height
between 0 and 2, and height 0 elsewhere.
(a) What is the height of the density curve between 0 and 2? Draw a graph of the density
curve.
(b) Use your graph from (a) and the fact that areas under the curve are proportions of
outcomes to find the proportion of outcomes that are less than 1.
(c) Find the proportion of outcomes that lie between 0.5 and 1.3.
P5. Here is a two-way table of all suicides committed in a recent year by sex of the victim
and method used.
             Male    Female
Firearms   15,802     2,367
Poison      3,262     2,233
Hanging     3,822       856
Other       1,571       571
Total      24,457     6,027
(a) What is the probability that a randomly selected suicide victim is male?
(b) What is the probability that the suicide victim used a firearm?
(c) What is the conditional probability that a suicide used a firearm, given that it was a
man? Given that it was a woman?
(d) Describe in simple language (don't use the word probability) what your results in (a)
tell you about the difference between men and women with respect to suicide.
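The probabilities in parts (a) to (c) are ratios of table counts. A short Python sketch, with illustrative variable names, shows which count goes over which total:

```python
# Counts from the two-way table in P5.
counts = {
    "Firearms": {"Male": 15802, "Female": 2367},
    "Poison": {"Male": 3262, "Female": 2233},
    "Hanging": {"Male": 3822, "Female": 856},
    "Other": {"Male": 1571, "Female": 571},
}

# Column totals and the grand total.
total_by_sex = {
    sex: sum(row[sex] for row in counts.values()) for sex in ("Male", "Female")
}
grand_total = sum(total_by_sex.values())

# (a) Marginal probability that a victim is male.
p_male = total_by_sex["Male"] / grand_total

# (b) Marginal probability that a firearm was used.
p_firearm = sum(counts["Firearms"].values()) / grand_total

# (c) Conditional probabilities: divide by the column total, not the grand total.
p_firearm_given_male = counts["Firearms"]["Male"] / total_by_sex["Male"]
p_firearm_given_female = counts["Firearms"]["Female"] / total_by_sex["Female"]

print(round(p_male, 4), round(p_firearm, 4))
print(round(p_firearm_given_male, 4), round(p_firearm_given_female, 4))
```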
Problem 4. A random sample of 16 ATM transactions at the Last National Bank of Flatrock
revealed a mean transaction time of 2.8 minutes with a standard deviation of 1.2 minutes.
The width (in minutes) of the 95% confidence interval for the true mean transaction time is
0.639 / 0.588 / 0.300 / 2.131
Problem 5. To estimate the average annual expenses of students on books and class
materials a sample of size 36 is taken. The mean is $850 and the standard deviation is $54.
A 99% confidence interval for the population mean is
(a) $823.72 to $876.28
(b) $826.82 to $873.18
(c) $831.73 to $868.27
(d) $825.48 to $874.52
Problem 6. A poll showed that 48 out of 120 randomly chosen graduates of California
medical schools last year intended to specialize in family practice. What is the width of a
90% confidence interval for the proportion that plan to specialize in family practice?
.04472 / .07357 / .08765 / .00329
Problem 7. In a random sample of 810 women employees, it is found that 81 would prefer
working for a female boss. The width of the 95% confidence interval for the proportion of
women who prefer a female boss is
.0288 / .0105 / .0196 / .0207
Problem 8. Jolly Blue Giant Health Insurance (JBGHI) is concerned about rising lab test
costs and would like to know what proportion of the positive lab tests for prostate cancer
are actually proven correct through subsequent biopsy. JBGHI demands a sample large
enough to ensure an error of 2% with 90% confidence. What is the necessary sample size?
2,401 / 1,692 / 1,604 / 609
Problem 9. A financial institution wishes to estimate the mean balances owed by its credit
card customers. The population standard deviation is estimated to be $300. If a 98 percent
confidence interval is used and an interval of $75 is desired, how many cardholders should
be sampled?
3,382 / 62 / 629 / 87
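The sample-size questions in Problems 8 and 9 can be sketched as follows. This is an illustration under stated assumptions: the critical value is rounded to three decimals to mimic a printed z table (an unrounded quantile can shift the result by one), the unknown proportion in Problem 8 is taken as 0.5 (the most conservative guess), and the "interval of $75" in Problem 9 is read as a margin of error of $75. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value, rounded to 3 decimals like a z table (assumption)."""
    return round(NormalDist().inv_cdf(1 - (1 - confidence) / 2), 3)

def n_for_proportion(margin, confidence, p_guess=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin."""
    z = z_critical(confidence)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

def n_for_mean(margin, confidence, sigma):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    z = z_critical(confidence)
    return ceil((z * sigma / margin) ** 2)

print(n_for_proportion(0.02, 0.90))  # Problem 8 setup
print(n_for_mean(75, 0.98, 300))     # Problem 9 setup
```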
Problem 10. Landings and takeoffs at Schiphol, Holland, per month are (in 1,000s) as
follows:
26, 19, 27, 30, 18, 17, 21, 28, 18, 26, 19, 20, 23, 18, 25, 29, 30, 26, 24, 22, 31, 18, 30, 19
Assume a random sample of months. Give a 95% confidence interval for the average
monthly number of takeoffs and landings.
Answer: [21.507, 25.493]
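A sketch of how the interval in Problem 10 may be reproduced. Python's standard library has no t quantile function, so the critical value 2.069 (two-sided 95%, 23 degrees of freedom) is assumed here from a t table; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Monthly landings and takeoffs (in 1,000s) from Problem 10.
landings = [26, 19, 27, 30, 18, 17, 21, 28, 18, 26, 19, 20,
            23, 18, 25, 29, 30, 26, 24, 22, 31, 18, 30, 19]

# Assumed t-table value: 95% two-sided, n - 1 = 23 degrees of freedom.
T_CRIT_23_DF = 2.069

n = len(landings)
x_bar = mean(landings)
s = stdev(landings)  # sample standard deviation (divides by n - 1)

margin = T_CRIT_23_DF * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(round(ci[0], 3), round(ci[1], 3))
```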
Problem 11. The Java computer language, developed by Sun Microsystems, has the
advantage that its programs can run on types of hardware ranging from mainframe
computers all the way down to handheld computing devices or even smart phones. A test of
100 randomly selected programmers revealed that 71 preferred Java to their other most used
computer languages. Construct a 95% confidence interval for the proportion of all
programmers in the population from which the sample was selected who prefer Java.
Answer: [0.6211, 0.7989]
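The interval in Problem 11 follows the usual large-sample formula for a proportion, p̂ ± z·sqrt(p̂(1−p̂)/n). A minimal Python sketch, with a hypothetical helper name, is:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 71 of 100 programmers preferred Java.
low, high = proportion_ci(71, 100, 0.95)
print(round(low, 4), round(high, 4))
```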
Problem 12. According to the Wall Street Journal, an average of 44 tons of carbon dioxide
will be saved per year if new, more efficient lamps are used. Assume that this average is
based on a random sample of 15 test runs of the new lamps and that the sample standard
deviation was 18 tons. Give a 90% confidence interval for average annual savings.
Answer: [35.81417, 52.18583]
3
Problem 13. Sony's new optical disk system prototype was tested and claimed to be able to
record an average of 1.2 hours of high-definition TV. Assume n = 10 trials and s = 0.2 hour.
Give a 90% confidence interval.
Answer: [1.0841, 1.3159]
Problem 14. FinAid is a new, free Web site that helps people obtain information on 180,000
college tuition aid awards. A random sample of 500 such awards revealed that 368 were
granted for reasons other than financial need. They were based on the applicants'
qualifications, interests, and other variables. Construct a 95% confidence interval for the
proportion of all awards on this service made for reasons other than financial need.
Answer: [0.6974, 0.7746]
Problem 15. In May 2007, a banker was arrested and charged with insider trading after
government investigators had secretly looked at a sample of nine of his many trades and
found that on these trades he had made a total of $7.5 million. Compute the average earning
per trade. Assume also that the sample standard deviation was $0.5 million and compute a
95% confidence interval for the average earning per trade for all trades made by this banker.
Use the assumption that the nine trades were randomly selected. Suppose the confidence
interval contained the value 0.00. How could the banker's attorney use this information to
defend his client?
Answer: did benefit
Problem 16. A marketing manager wishes to estimate the proportion of customers who
prefer a new packaging of a product to the old. He guesses that 60% of the customers would
prefer the new packaging. The manager wishes to estimate the proportion to within 2% with
90% confidence. What is the minimum required sample size?
Answer:
Problem 17. According to Shape, on the average, 1/2 cup of edamame beans contains 6
grams of protein. If this conclusion is based on a random sample of 50 half-cups of
edamames and the sample standard deviation is 3 grams, construct a 95% confidence
interval for the average amount of protein in 1/2 cup of edamames.
Answer: [5.147, 6.853]
reveals that 298 use the company's product. Is there evidence to conclude that the
company's market share is no longer 56%, at the 0.01 level of significance?
Answer: z = 1.622, Do not reject H0 (p-value = 0.1048)
Problem 13. According to Money, the average amount of money that a typical person in the
United States would need to make him or her feel rich is $1.5 million. A researcher wants to
test this claim. A random sample of 100 people in the United States reveals that their mean
amount to feel rich is $2.3 million and the standard deviation is $0.5 million. Conduct the
test.
Answer: z = 16.0, Reject H0
Problem 14. Certain eggs are stated to have reduced cholesterol content, with an average of
only 2.5% cholesterol. A concerned health group wants to test whether the claim is true. The
group believes that more cholesterol may be found, on the average, in the eggs. A random
sample of 100 eggs reveals a sample average content of 5.2% cholesterol, and a sample
standard deviation of 2.8%. Does the health group have cause for action?
Answer: z = 9.643, Reject H0
Problem 15. The engine of the Volvo model S70 T-5 is stated to provide 246 horsepower.
To test this claim, believing it is too high, a competitor runs the engine n = 60 times,
randomly chosen, and gets a sample mean of 239 horsepower and standard deviation of 20
horsepower. Conduct the test, using α = 0.01.
Answer: z = -2.711, Reject H0
Problem 16. According to BusinessWeek, the Standard & Poor's 500 Index posted an
average gain of 13% for 2006. If a random sample of 50 stocks from this index reveals an
average gain of 11% and standard deviation of 6%, can you reject the magazine's claim in a
two-tailed test? What is your p-value?
Answer: z = -2.3570, Reject H0
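The large-sample tests in this group all use the same statistic, z = (x̄ − μ0)/(s/√n). A sketch for the two-tailed test in Problem 16, including the p-value the problem asks for, might look like this; the function name is hypothetical. The statistic comes out negative because the sample mean lies below the claimed mean.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, claimed_mean, sample_sd, n):
    """Large-sample z statistic for a test about a population mean."""
    return (sample_mean - claimed_mean) / (sample_sd / sqrt(n))

# Problem 16: claimed gain 13%, sample of 50 stocks with mean 11% and s = 6%.
z = one_sample_z(11, 13, 6, 50)

# Two-tailed p-value: area in both tails beyond |z|.
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(round(z, 4), round(p_two_tailed, 4))
```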
Problem 17. The null and alternative hypotheses of a t test for the mean are
H0: μ = 1,000, H1: μ < 1,000
Other things remaining the same, which of the following will result in an increase in
the p-value?
a. Increase in the sample size.
b. Increase in the sample mean.
c. Increase in the sample standard deviation.
d. Increase in α.
Problem 18. The null and alternative hypotheses of a test for population proportion are
H0: p = 0.25, H1: p > 0.25
Other things remaining the same, which of the following will result in an increase in
the p-value?
a. Increase in sample size.
b. Increase in sample proportion.
c. Increase in α.
Problem 2. Carver Memorial Hospital's surgeons have a new procedure that they think will
decrease the time to perform an appendectomy. A sample of 8 appendectomies using the old
method had a mean of 38 minutes with a variance of 36 minutes, while a sample of 10
appendectomies using the experimental method had a mean of 29 minutes with a variance
of 16 minutes.
(a) For a right-tail test of means (assume equal variances) the critical value for α = .10 is
1.746 / 1.337 / 2.120 / 2.754
(b) For a right-tail test of means (assume equal variances) the test statistic is
2.365 / 3.814 / 3.000 / 1.895
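For part (b), the pooled-variance (equal variances) two-sample t statistic can be sketched as below. The variable names are illustrative; the critical value in part (a) still has to be read from a t table with n1 + n2 − 2 degrees of freedom.

```python
from math import sqrt

# Summary statistics from Problem 2 (times in minutes).
n_old, mean_old, var_old = 8, 38, 36
n_new, mean_new, var_new = 10, 29, 16

# Degrees of freedom for the pooled test.
df = n_old + n_new - 2

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_old - 1) * var_old + (n_new - 1) * var_new) / df

# Standard error of the difference in sample means.
std_err = sqrt(pooled_var * (1 / n_old + 1 / n_new))

# Right-tail test of "old method takes longer": old minus new.
t_stat = (mean_old - mean_new) / std_err

print(df, round(pooled_var, 2), round(t_stat, 3))
```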
Problem 3. In a test of a new surgical procedure, the five most respected surgeons in
FlatBroke Township were invited to Carver Hospital. Each surgeon was assigned two
patients of the same age, gender, and overall health. One patient was operated upon in the
old way, and the other in the new way. Both procedures are considered equally safe. The
time (in minutes) to complete each procedure is shown:
Surgeon     Allen   Bob   Chloe   Daphne   Edgar
Old Way        36    55      28       40      62
New Way        31    45      28       35      57
(a) In a right-tail test for a difference of means at α = .05, the critical value is
(i) 3.162, paired t-test
(ii) 2.132, paired t-test
(iii) 1.645, independent samples t-test
(iv) 2.776, independent samples t-test
(b) In a right-tailed test for a difference of means, the test statistic is
3.162 / 1.645 / 1.860 / 2.132
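Because each surgeon operated once with each method, the data are matched pairs, and one reasonable approach is a paired t statistic computed on the per-surgeon differences. A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import mean, stdev

# Times in minutes, in the order Allen, Bob, Chloe, Daphne, Edgar.
old_way = [36, 55, 28, 40, 62]
new_way = [31, 45, 28, 35, 57]

# Per-surgeon differences (old minus new).
diffs = [o - n for o, n in zip(old_way, new_way)]

d_bar = mean(diffs)  # mean difference
s_d = stdev(diffs)   # sample standard deviation of the differences

# Paired t statistic with len(diffs) - 1 degrees of freedom.
t_stat = d_bar / (s_d / sqrt(len(diffs)))

print(diffs, round(t_stat, 3))
```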