Statistics
Stat 431
Homework 1
i/(n+1)
(b) Usually, λ is unknown. The typical solution is to obtain the theoretical quantiles from
the Exp(1) distribution function F(x) = 1 − exp(−x), and pair them with the sample
quantiles of the rescaled data. [Recall that in the normal case, we obtain the theoretical
quantiles from the N(0, 1) distribution, and pair them with the sample quantiles of the
z-scores of the data.] As before, we check whether the plotted points are close to the
line y = x.
What would be a proper rescaling for the exponential QQ plot? Why?
(Hint: If X follows an exponential distribution with parameter λ, then λX follows an
exponential distribution with parameter 1.)
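The hint can be checked numerically. The following is a minimal Python sketch, not part of the assignment: it simulates data with a rate we picked ourselves (true_rate is a hypothetical value), rescales by that known rate, and pairs the sorted values with Exp(1) quantiles. The plotting positions i/(n+1) and the helper names exp1_quantile and exp_qq_points are assumptions made for illustration. With real data λ is unknown, so the sketch does not say what to rescale by; that is the question asked above.

```python
import math
import random


def exp1_quantile(p):
    """Quantile of Exp(1): the x solving 1 - exp(-x) = p, for 0 <= p < 1."""
    return -math.log(1.0 - p)


def exp_qq_points(data, rate):
    """Return (theoretical, sample) quantile pairs for an exponential QQ plot.

    The data are multiplied by `rate` (the rescaling in the hint) and sorted;
    the i-th smallest value is paired with the Exp(1) quantile at i/(n+1).
    """
    n = len(data)
    rescaled = sorted(rate * x for x in data)
    theoretical = [exp1_quantile(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(theoretical, rescaled))


# Simulated stand-in data with a rate we chose ourselves (hypothetical value).
random.seed(0)
true_rate = 2.5
sample = [random.expovariate(true_rate) for _ in range(2000)]

points = exp_qq_points(sample, true_rate)
mean_rescaled = sum(s for _, s in points) / len(points)
print(f"mean of rescaled data: {mean_rescaled:.3f}")  # should be near 1
```

If the rescaling is right, the pairs in points should lie close to the line y = x when plotted.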
3. Three different random samples are generated from three different underlying distributions.
For each random sample, the histogram, the boxplot and the normal QQ plot are all made.
However, the plots have been shuffled, so the three plots in each column are not necessarily
from the same sample. Please group the three plots generated from the same sample together
for all three samples.
[Figure: nine shuffled plots arranged in three columns, consisting of histograms (vertical axis: Frequency), boxplots, and normal QQ plots (axes: Theoretical Quantiles and Sample Quantiles). The panels are labeled (A) to (C), (a) to (c), and (1) to (3).]
4. HousePrices.txt records the sale price [price] and square footage [BLDSQFT] for 439 houses.
(a) Make a normal QQ plot and a boxplot of the price variable. Is the empirical distribution
positively or negatively skewed?
(b) According to the direction of skewness, which transformations would be appropriate
for the price variable? [Please name at least two possible candidates.] Apply the two
possible transformations to the data, and use the normal QQ plot to judge which one is
more suitable for the current data.
(c) Make a scatter plot of the properly transformed price [as obtained in part (b)] vs. the
building square footage, with the least squares line added on top of the data cloud. Report
the correlation coefficient for this data cloud.
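The computations behind parts (a) to (c) can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names normal_qq_points and least_squares_line are hypothetical, the plotting positions i/(n+1) are one common convention, and the four toy points are made-up numbers, not values from HousePrices.txt. Reading the file and choosing the transformation are left out, since the file layout is not shown here and the choice is the point of part (b).

```python
import math
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair N(0, 1) quantiles at i/(n+1) with the sorted z-scores of the data."""
    n = len(values)
    m, s = mean(values), stdev(values)
    z_scores = sorted((v - m) / s for v in values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(theoretical, z_scores))


def least_squares_line(x, y):
    """Return (slope, intercept, r) for the least squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    return slope, intercept, r


# Toy numbers for illustration only (not from the data set).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r = least_squares_line(toy_x, toy_y)
qq = normal_qq_points(toy_y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

For the homework, x would be the BLDSQFT column and y the transformed price column; a QQ plot that bends away from y = x at one end indicates skewness in that direction.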
5. Let X1, ..., X10 be a random sample of size 10 from a N(μ, 1) distribution.
[X̄ − 1.96/√10, X̄ + 1.96/√10].
Show that the resulting test has significance level α = 0.05.
(b) Now suppose we have observed x1, ..., x10, and have computed the sample mean x̄ = 1.6
out of these numbers. We reject H0 when |x̄ − μ0| > 1.96/√10. Find all the values of
μ0 such that we do not reject H0: μ = μ0 based on the above rejection rule. Compare
the values of μ0 you find with the realization of the 95% CI for μ on this dataset.
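The rejection rule in part (b) can be checked numerically. The sketch below is a rough Python illustration, not a substitute for the written argument: the function names, the number of replications, and the seed are all assumptions, and mu0 = 0.0 in the simulation is an arbitrary hypothetical choice. It uses the fact that under H0 the standardized mean √10 (X̄ − μ0) follows N(0, 1).

```python
import math
import random
from statistics import NormalDist

N = 10
HALF_WIDTH = 1.96 / math.sqrt(N)  # critical distance between xbar and mu0


def rejects(xbar, mu0):
    """Rejection rule from part (b): reject H0 when |xbar - mu0| > 1.96/sqrt(10)."""
    return abs(xbar - mu0) > HALF_WIDTH


def exact_level():
    """P(|Z| > 1.96) for Z ~ N(0, 1), the rejection probability under H0."""
    return 2 * (1 - NormalDist().cdf(1.96))


def simulated_level(mu0, reps=20000, seed=1):
    """Monte Carlo estimate of the rejection probability when mu = mu0."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu0, 1.0) for _ in range(N)) / N
        count += rejects(xbar, mu0)
    return count / reps


print(f"half-width: {HALF_WIDTH:.4f}")
print(f"exact level: {exact_level():.4f}")
print(f"simulated level: {simulated_level(0.0):.4f}")
```

Calling rejects(1.6, mu0) for a grid of mu0 values gives a quick way to compare the non-rejected values with the interval centered at the observed sample mean.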