2023 Applied Stat Comp Qual Exam

Comprehensive/Qualifying Exam: Applied Statistics
Boston University, 2023
Instructions: This is a closed book exam. You are not allowed a crib sheet or a calculator.
Please answer problems 1–2 and 3–4 in separate blue books. ALL answers need to include
an explanation, even if this is not explicitly asked in the question.
To pass the exam at the Master’s level, you need to answer 3 questions. If you answer
more than three problems, the lowest three scores will be used to compute your total. If
you do not want us to grade any part of your answer, please cross it out completely.
To pass the exam at the Ph.D. level, you need to answer all four questions.
1. Consider a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, for $i = 1, \ldots, n$, where the
error terms $\epsilon_i$ are assumed independent and identically distributed (i.i.d.) $N(0, \sigma^2)$.
We assume that $\sum_{i=1}^n (x_i - \bar{x})^2 > 0$. Let $\hat{\beta}_0, \hat{\beta}_1$ denote the least squares estimators of
$\beta_0$ and $\beta_1$ respectively. We know that
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$
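We can also rewrite the model in matrix form. If we set
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \text{and} \quad X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},$$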
then, the simple linear regression model above is equivalent to
y = Xβ + ϵ.
And the least squares estimate of $(\beta_0, \beta_1)$ is then
$$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = (X'X)^{-1} X'y.$$
Show that the two expressions match.
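As a numerical sanity check (not a substitute for the algebraic proof the question asks for), the two expressions in problem 1 can be compared on a small data set. The sketch below uses made-up data and hypothetical helper names `closed_form` and `matrix_form`; it writes out the 2x2 inverse of X'X by hand so that it needs no external libraries.

```python
# Numerical sanity check: closed-form simple-regression estimates
# versus (X'X)^{-1} X'y. The data are invented for illustration only.

def closed_form(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def matrix_form(x, y):
    # X has rows (1, x_i), so X'X = [[n, sum x], [sum x, sum x^2]]
    # and X'y = (sum y, sum x*y)'.
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # equals n * sum (x_i - xbar)^2, assumed > 0
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

x = [1.0, 2.0, 4.0, 7.0, 8.0]
y = [2.1, 2.9, 5.2, 8.1, 8.8]
print(closed_form(x, y))
print(matrix_form(x, y))
```

The comment on `det` points at the key step of the proof: the determinant of X'X is n times the centered sum of squares, which is why the assumption that this sum is positive is needed.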
2. We consider the multiple linear regression model
$$y = X\beta + \epsilon, \tag{1}$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\beta \in \mathbb{R}^p$. We assume that $\epsilon \sim N(0, \sigma^2 I_n)$ and that $X$
has full column rank. We assume the model is correctly specified with true parameter
values $\beta_\star, \sigma_\star^2$. Here the intercept will not play any special role. Let $y_{(i)} \in \mathbb{R}^{n-1}$ be the
vector of responses obtained after removing the i-th response. Let $X_{(i)} \in \mathbb{R}^{(n-1) \times p}$ be
the explanatory matrix obtained after removing the i-th row of $X$, which we denote
$x_i$. Let $\hat{\beta}_{(i)}$ be the least squares estimate of the model $y_{(i)} = X_{(i)} \beta + \epsilon_{(i)}$. The i-th
deleted residual is defined as
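$$\hat{\epsilon}_{(i)} = y_i - x_i \hat{\beta}_{(i)}.$$
(a) Show that $\mathrm{Var}(\hat{\epsilon}_{(i)}) = \sigma_\star^2 \left(1 + x_i (X_{(i)}' X_{(i)})^{-1} x_i'\right)$.
(b) We know that $\hat{\epsilon}_{(i)} / \sqrt{\mathrm{Var}(\hat{\epsilon}_{(i)})}$ can be approximated by the i-th Studentized residual
of the model, defined as
$$t_i = \frac{\hat{\epsilon}_i}{\hat{\sigma}_{(i)} \sqrt{1 - h_i}}, \quad \text{where} \quad \hat{\sigma}_{(i)}^2 = \frac{\|y_{(i)} - X_{(i)} \hat{\beta}_{(i)}\|^2}{n - 1 - p},$$
where $h_i$ is the i-th leverage, and $\hat{\epsilon}_i$ is the i-th residual. Show that $t_i \sim T_{n-1-p}$.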
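The quantities in problem 2 can be explored numerically before attempting the proofs. The sketch below is an illustration under stated assumptions: it uses invented data, takes p = 2 with design rows (1, x_j), and relies on hypothetical helpers `xtx_inverse`, `ols` and `quad_form`. It compares the deleted residual with the ordinary residual divided by 1 - h_i, and the variance factor in part (a) with 1 / (1 - h_i), two standard leave-one-out identities that link parts (a) and (b).

```python
# Leave-one-out check for a design with rows (1, x_j), so p = 2.
# Data and helper names are illustrative assumptions.

def xtx_inverse(x):
    # Inverse of X'X = [[n, sum x], [sum x, sum x^2]].
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    return [[sxx / det, -sx / det], [-sx / det, n / det]]

def ols(x, y):
    # Least squares estimate (X'X)^{-1} X'y.
    inv = xtx_inverse(x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy)

def quad_form(inv, v):
    # (1, v) * inv * (1, v)' for a symmetric 2x2 matrix inv.
    return inv[0][0] + 2 * inv[0][1] * v + inv[1][1] * v * v

x = [1.0, 2.0, 4.0, 7.0, 8.0, 11.0]
y = [2.1, 2.9, 5.2, 8.1, 8.8, 12.5]
i = 2  # index of the observation to delete

# Full-data fit: ordinary residual and leverage of observation i.
b0, b1 = ols(x, y)
resid_i = y[i] - (b0 + b1 * x[i])
h_i = quad_form(xtx_inverse(x), x[i])

# Fit without observation i: deleted residual and variance factor.
x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
d0, d1 = ols(x_del, y_del)
deleted_resid = y[i] - (d0 + d1 * x[i])
var_factor = 1 + quad_form(xtx_inverse(x_del), x[i])

print(deleted_resid, resid_i / (1 - h_i))
print(var_factor, 1 / (1 - h_i))
```

Each printed pair should agree up to rounding, which is consistent with the Studentized residual in part (b) being the deleted residual scaled by an estimate of its standard deviation.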
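3. Let $\phi$ be a random variable taking values in $[-\pi, \pi)$ with pdf
$$f(\phi) = \frac{\exp(\kappa \cos(\phi_i - \mu_i))}{2\pi I_0(\kappa)},$$
where $I_0(\kappa) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(\kappa \cos(x))\,dx$.
(a) Let $y_i = \begin{pmatrix} \cos(\phi) \\ \sin(\phi) \end{pmatrix}$ be a unit vector and $\theta = \begin{pmatrix} \kappa \cos(\mu) \\ \kappa \sin(\mu) \end{pmatrix}$. Show that this
distribution can be written in the form of the natural exponential family. Hint:
recall that $\cos(a - b) = \cos(a)\cos(b) + \sin(a)\sin(b)$.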
(b) What are the natural parameter, dispersion parameter, and cumulant function
for this distribution? Make sure the cumulant function is expressed as a function
of the natural parameter.
(c) Express µ and κ as functions of the natural parameter.
(d) Write down an expression for the mean of y as a function of the natural parameter
and as a function of µi and κ.
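Two facts used in problem 3 can be checked numerically: that dividing by 2πI0(κ) normalizes the density, and that κcos(ϕ − µ) equals the inner product θ'y from the hint. The sketch below is only a check under assumptions: the values of µ and κ are arbitrary, and `periodic_integral` is a hypothetical midpoint-rule helper rather than a library routine.

```python
import math

def periodic_integral(g, n=2000):
    # Midpoint rule on [-pi, pi); accurate for smooth periodic integrands.
    h = 2 * math.pi / n
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(n))

def bessel_i0(kappa):
    # I0(kappa) = (1 / 2pi) * integral of exp(kappa * cos(x)) over [-pi, pi).
    return periodic_integral(lambda t: math.exp(kappa * math.cos(t))) / (2 * math.pi)

mu, kappa = 0.7, 1.5  # arbitrary illustrative values
theta = (kappa * math.cos(mu), kappa * math.sin(mu))
i0 = bessel_i0(kappa)

def density(phi):
    return math.exp(kappa * math.cos(phi - mu)) / (2 * math.pi * i0)

def inner(phi):
    # theta' y with y = (cos(phi), sin(phi))'.
    return theta[0] * math.cos(phi) + theta[1] * math.sin(phi)

# The density should integrate to 1 over one period.
total_mass = periodic_integral(density)

# theta' y should match kappa * cos(phi - mu) at any angle.
angles = (-3.0, -1.2, 0.0, 0.4, 2.5)
max_gap = max(abs(inner(p) - kappa * math.cos(p - mu)) for p in angles)

print(total_mass, max_gap)
```

The same helper could be used to approximate the mean of y in part (d) by integrating cos(ϕ) and sin(ϕ) against `density`.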
4. An analysis was performed to determine how the number of traffic accidents occurring
at various intersections through town is related to the average traffic volume at
those intersections. The number of traffic accidents occurring over the period of 1
year at each of 100 intersections, Accidents, were recorded along with a standardized
measure of traffic volume for each intersection, Vol.
A Poisson GLM with log link was fit to Accidents as a function of Vol and Vol^2.
Here is the summary of the fitted model:
             Estimate  Std. Error  z-value  Pr(>|z|)
(Intercept)   0.04699     0.13011    0.361   0.71799
Vol           0.61983     0.17143    3.616   0.00030
I(Vol^2)     -0.37810     0.14170   -2.668   0.00762
Residual deviance: 186.78 on 97 degrees of freedom
(a) Interpret the coefficients for Vol and Vol^2. If this fitted model is correct,
how many accidents per year would we expect in an intersection with no traffic
volume? At what value of Vol is the expected number of accidents maximized?
What is the expected number of accidents at this maximizing value?
(b) The researchers are worried that there is overdispersion in this model. The
Pearson statistic for this fit has a value of 218.25. Use this to estimate a
dispersion parameter for this model. Is there significant overdispersion in this
model? Compute 95% confidence intervals for the parameters associated with
Vol and Vol^2, using your estimated dispersion.
The researchers added a new categorical predictor placing each intersection
in one of three safety categories, with C1 being the safest, C2 being in-between,
and C3 being the least safe. A new Poisson GLM including this predictor was fit, yielding
the following summary:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -17.1760   1262.5576   -0.014   0.98915
Vol            0.6138      0.1671    3.673   0.00024
I(Vol^2)      -0.3116      0.1170   -2.664   0.00773
C2            16.0244   1262.5576    0.013   0.98987
C3            18.1158   1262.5576    0.014   0.98855
Residual deviance: 91.80 on 95 degrees of freedom
(c) Conduct a hypothesis test to determine whether this categorical predictor has
a statistically significant influence on Accidents. Explain why the estimated
Std. Error values in the summary table are so large.
(d) Assuming this new fitted model is correct, compute the expected number of
accidents in an intersection of type C1 with no traffic volume. What is the
maximum expected number of accidents in an intersection of type C3?
(e) The Pearson statistic for this fit has a value of 95. Is there evidence of
overdispersion in this model? Based on the fits from both of these models, draw a
conclusion about whether traffic volume has a significant influence on the
number of accidents occurring at each intersection.
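Since no calculator is allowed in the exam, it can help to see the arithmetic behind problem 4 laid out once. The sketch below is one possible reading, not an official solution. It assumes that "no traffic volume" means Vol = 0, that C1 is the baseline level of the categorical predictor (only C2 and C3 appear in the table), that a normal quantile of 1.96 is acceptable for the intervals, and that the deviance difference is compared with a chi-square distribution on 2 degrees of freedom, whose upper tail probability is exp(-x/2). The variable names are illustrative.

```python
import math

# Coefficients and standard errors copied from the two summary tables.
m1 = {"int": 0.04699, "vol": 0.61983, "vol2": -0.37810}
se1 = {"vol": 0.17143, "vol2": 0.14170}
m2 = {"int": -17.1760, "vol": 0.6138, "vol2": -0.3116,
      "C2": 16.0244, "C3": 18.1158}

# (a) Log link: expected count at Vol = v is exp(b0 + b1*v + b2*v^2).
mean_at_zero = math.exp(m1["int"])
vol_max = -m1["vol"] / (2 * m1["vol2"])  # vertex of the concave quadratic
mean_at_max = math.exp(m1["int"] + m1["vol"] * vol_max
                       + m1["vol2"] * vol_max ** 2)

# (b) Dispersion estimate: Pearson statistic over residual degrees of freedom.
phi1 = 218.25 / 97
z = 1.96  # assumption: normal quantile; a t quantile would be slightly wider
ci1 = {k: (m1[k] - z * math.sqrt(phi1) * se1[k],
           m1[k] + z * math.sqrt(phi1) * se1[k]) for k in se1}

# (c) Drop in deviance from adding the 2 category indicators.
dev_drop = 186.78 - 91.80
p_value = math.exp(-dev_drop / 2)  # chi-square upper tail with 2 df

# (d) C1 is taken as baseline, so its linear predictor at Vol = 0 is the intercept.
c1_at_zero = math.exp(m2["int"])
vol_max2 = -m2["vol"] / (2 * m2["vol2"])
c3_max = math.exp(m2["int"] + m2["C3"] + m2["vol"] * vol_max2
                  + m2["vol2"] * vol_max2 ** 2)

# (e) Dispersion estimate for the second model.
phi2 = 95 / 95

print(mean_at_zero, vol_max, mean_at_max)
print(phi1, ci1)
print(dev_drop, p_value)
print(c1_at_zero, vol_max2, c3_max)
print(phi2)
```

The written explanation still has to come from the reader, in particular the interpretation of the very small fitted mean for C1 and what it implies about the standard errors in part (c).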
