2023 Applied Stat Comp Qual Exam
Instructions: This is a closed book exam. You are not allowed a crib sheet or a calculator.
Please answer problems 1–2 and 3–4 in separate blue books. ALL answers need to include
an explanation, even if this is not explicitly asked in the question.
To pass the exam at the Master’s level, you need to answer 3 questions. If you answer
more than three problems, the lowest three scores will be used to compute your total. If
you do not want us to grade any part of your answer, please cross it out completely.
To pass the exam at the Ph.D. level, you need to answer all four questions.
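1. Consider a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, for $i = 1, \ldots, n$, where the
error terms $\epsilon_i$ are assumed independent and identically distributed (i.i.d.) $N(0, \sigma^2)$.
We assume that $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$. Let $\hat{\beta}_0, \hat{\beta}_1$ denote the least squares estimators of
$\beta_0$ and $\beta_1$ respectively. We know that
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$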
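The closed-form estimators above can be checked numerically. The following is a minimal sketch, not part of the exam: the helper name `simple_ols` and the simulated data (true intercept 2, true slope 3, noise standard deviation 0.5) are assumptions chosen only for illustration. It compares the formulas against NumPy's `np.polyfit`.

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form least squares estimates for y = b0 + b1 * x + error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    if sxx <= 0:
        # Mirrors the assumption sum (x_i - x_bar)^2 > 0 in the problem.
        raise ValueError("x must not be constant")
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Simulated data (illustrative assumption, not exam data).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

b0_hat, b1_hat = simple_ols(x, y)

# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)
print(b0_hat, b1_hat, intercept_ref, slope_ref)
```

Both routes should agree up to floating-point error, since `np.polyfit` with `deg=1` solves the same least squares problem.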
y = Xβ + ϵ.
y = Xβ + ϵ, (1)
where $h_i$ is the $i$-th leverage, and $\hat{\epsilon}_i$ is the $i$-th residual. Show that $t_i \sim T_{n-1-p}$.
3. Let $\phi$ be a random variable taking values in $[-\pi, \pi)$ with pdf
$$f(\phi) = \frac{\exp(\kappa \cos(\phi_i - \mu_i))}{2\pi I_0(\kappa)},$$
where $I_0(\kappa) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(\kappa \cos(x))\,dx$.
(a) Let $y_i = \begin{pmatrix} \cos(\phi) \\ \sin(\phi) \end{pmatrix}$ be a unit vector and $\theta = \begin{pmatrix} \kappa\cos(\mu) \\ \kappa\sin(\mu) \end{pmatrix}$. Show that this
distribution can be written in the form of the natural exponential family. Hint:
recall that cos(a − b) = cos(a)cos(b) + sin(a)sin(b).
(b) What are the natural parameter, dispersion parameter, and cumulant function
for this distribution? Make sure the cumulant function is expressed as a function
of the natural parameter.
(c) Express µ and κ as functions of the natural parameter.
(d) Write down an expression for the mean of y as a function of the natural parameter
and as a function of µi and κ.
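As a numerical sanity check on the setup of problem 3 (a sketch only, not a solution), one can confirm that the stated density integrates to 1 and that the inner product of θ and y reproduces κcos(ϕ − µ), as the hint in part (a) suggests. The values `mu = 0.7` and `kappa = 2.5` and the function name `von_mises_pdf` are arbitrary assumptions made for illustration. `np.i0` is NumPy's modified Bessel function of the first kind of order 0.

```python
import numpy as np

def von_mises_pdf(phi, mu, kappa):
    """Density exp(kappa * cos(phi - mu)) / (2 * pi * I0(kappa)) on [-pi, pi)."""
    return np.exp(kappa * np.cos(phi - mu)) / (2.0 * np.pi * np.i0(kappa))

mu, kappa = 0.7, 2.5  # arbitrary illustrative values (assumption)

# Trapezoid rule over one full period.
phi = np.linspace(-np.pi, np.pi, 20001)
pdf = von_mises_pdf(phi, mu, kappa)
total = np.sum((pdf[1:] + pdf[:-1]) / 2.0 * np.diff(phi))

# Inner-product form: theta^T y with y = (cos phi, sin phi), theta = kappa * (cos mu, sin mu).
y = np.stack([np.cos(phi), np.sin(phi)])  # shape (2, N)
theta = np.array([kappa * np.cos(mu), kappa * np.sin(mu)])
inner = theta @ y                         # shape (N,)

print(total)
print(np.max(np.abs(inner - kappa * np.cos(phi - mu))))
```

The first printed value should be close to 1 and the second close to 0, which is consistent with the exponent being linear in y.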
4. An analysis was performed to determine how the number of traffic accidents occurring
at various intersections through town are related to the average traffic volume at
those intersections. The number of traffic accidents occurring over the period of 1
year at each of 100 intersections, Accidents, were recorded along with a standardized
measure of traffic volume for each intersection, Vol.
A Poisson GLM with log link was fit to Accidents as a function of Vol and Vol^2.
Here is the summary of the fitted model:
(a) Interpret the coefficients for Vol and Vol^2. If this fitted model is correct,
how many accidents per year would we expect in an intersection with no traffic
volume? At what value of Vol is the expected number of accidents maximized?
What is the expected number of accidents at this maximizing value?
(b) The researchers are worried that there is overdispersion in this model. The
Pearson statistic for this fit has a value of 218.25. Use this to estimate a
dispersion parameter for this model. Is there significant overdispersion in this
model? Compute 95% confidence intervals for the parameters associated with
Vol and Vol^2, using your estimated dispersion.
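The arithmetic behind part (b) can be sketched as follows. This is a hedged illustration: it assumes the first model has three parameters (intercept, Vol, Vol^2), so the residual degrees of freedom are 100 − 3 = 97, and it assumes a normal quantile of 1.96 for the interval. The helper `scaled_wald_ci` is a hypothetical name introduced here.

```python
import math

pearson_stat = 218.25  # Pearson statistic given in part (b)
n_obs = 100            # number of intersections
n_params = 3           # ASSUMPTION: intercept, Vol, Vol^2

resid_df = n_obs - n_params
dispersion = pearson_stat / resid_df   # Pearson-based dispersion estimate
se_inflation = math.sqrt(dispersion)   # factor applied to Poisson standard errors

def scaled_wald_ci(estimate, std_error, dispersion, z=1.96):
    """Wald interval with the standard error inflated by sqrt(dispersion).

    ASSUMPTION: z = 1.96 (normal quantile); a t quantile on the residual
    degrees of freedom is a common alternative when dispersion is estimated.
    """
    half_width = z * std_error * math.sqrt(dispersion)
    return estimate - half_width, estimate + half_width

print(resid_df, dispersion, se_inflation)
```

Applying `scaled_wald_ci` to the Vol and Vol^2 rows of the first model's summary gives the dispersion-adjusted intervals. A dispersion estimate well above 1 points toward overdispersion.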
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -17.1760   1262.5576   -0.014   0.98915
Vol            0.6138      0.1671    3.673   0.00024
I(Vol^2)      -0.3116      0.1170   -2.664   0.00773
C2            16.0244   1262.5576    0.013   0.98987
C3            18.1158   1262.5576    0.014   0.98855
(c) Conduct a hypothesis test to determine whether this categorical predictor has
a statistically significant influence on Accidents. Explain why the estimated
Std. Error values in the summary table are so large.
(d) Assuming this new fitted model is correct, compute the expected number of
accidents in an intersection of type C1 with no traffic volume. What is the
maximum expected number of accidents in an intersection of type C3?
(e) The Pearson statistic for this fit has a value of 95. Is there evidence of
overdispersion in this model? Based on the fits from both of these models, draw a
conclusion about whether traffic volume has a significant influence on the
number of accidents occurring at each intersection.
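The quantities asked for in parts (d) and (e) follow from the log link, and a minimal sketch of the arithmetic is given below. It rests on assumptions: that the coefficient table with C2 and C3 is the "new fitted model", that C1 is the baseline level of the categorical predictor (it has no row in the table), and that the fit still uses all 100 intersections with 5 parameters. The function name `expected_accidents` is hypothetical.

```python
import math

# Coefficients copied from the summary table that includes C2 and C3.
coef = {
    "(Intercept)": -17.1760,
    "Vol": 0.6138,
    "I(Vol^2)": -0.3116,
    "C2": 16.0244,
    "C3": 18.1158,
}

def expected_accidents(vol, category="C1"):
    """Fitted Poisson mean exp(eta) under the log link.

    ASSUMPTION: C1 is the reference level, so it adds nothing to eta.
    """
    eta = coef["(Intercept)"] + coef["Vol"] * vol + coef["I(Vol^2)"] * vol ** 2
    if category != "C1":
        eta += coef[category]
    return math.exp(eta)

# The linear predictor is a concave quadratic in Vol, so the mean peaks at
# Vol* = -b1 / (2 * b2), independent of the category.
vol_star = -coef["Vol"] / (2.0 * coef["I(Vol^2)"])

c1_at_zero = expected_accidents(0.0, "C1")
c3_max = expected_accidents(vol_star, "C3")

# Pearson-based dispersion for part (e); ASSUMPTION: n = 100 and 5 parameters.
dispersion = 95 / (100 - 5)

print(vol_star, c1_at_zero, c3_max, dispersion)
```

Because exp is monotone, maximizing the expected count is the same as maximizing the quadratic linear predictor, which is why only the Vol and Vol^2 coefficients determine the maximizing volume.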