Exercises
Exercise 1
We aim to predict the price of cars through a linear relationship
∀i, Pi = α + Xi β + ui ,
where Xi contains the following continuous variables: wheelbase (X1 ), width (X2 ), height
(X3 ), curb weight (X4 ), engine size (X5 ), compression ratio (X6 ), horsepower (X7 ), peak-
rpm (X8 ). The variables are first centered and standardized by their standard deviation
(since they are not expressed in the same unit). We assume that the error term is
$u_i \mid X \overset{\text{i.i.d.}}{\sim} \mathcal{N}_{\mathbb{R}}(0, \sigma^2)$.
Moreover, we assume the standard assumptions for the multiple linear regression model
(exogeneity, homoskedasticity). The software provides the following estimation results
(’SE’ stands for standard errors):
(2) Provide the t-test for the estimator $\hat\beta_3$ corresponding to $X_3$, using the distribution of $u_i$. To do so, we will state the null hypothesis, provide an estimator of $\sigma^2$, and give the rejection region.
(4) The F-test tests for β = 0. Provide the F-statistic used to test for H0 : β = 0.
(5) The quantile of order $1 - 5\% = 95\%$ of the $F(8, 186)$ distribution is 1.99. Is the linear model significant at the 5% significance level?
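As an aside, a minimal sketch of the decision rule behind questions (4)-(5), assuming scipy is available; `f_hat` is a hypothetical value, since the regression output is not reproduced in this sheet:

```python
from scipy import stats

# 95% quantile of the F(8, 186) distribution; the exercise states it is 1.99.
crit = stats.f.ppf(0.95, dfn=8, dfd=186)

# f_hat is hypothetical: in the exercise it would be read off the software
# output, which is not reproduced here.
f_hat = 42.0

# Reject H0: beta = 0 at the 5% level when the statistic exceeds the quantile.
print(f"critical value = {crit:.2f}, reject H0: {f_hat > crit}")
```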
Exercise 2
Consider the multiple regression model:
∀i = 1, . . . , n, Yi = Xi β + ui ,
where Xi = (Xi1 , . . . , XiK ) is a 1×K vector of covariates and ui is the error term, centered
with variance σ 2 and Var(u|X) = σ 2 In . All the data are i.i.d.
(2) Write the statistical criterion defining the least squares estimation problem. Assuming $X^\top X \succ 0$, solve this problem.
(3) Under the exogeneity assumption, prove that the OLS estimator is unbiased. Then, show that $\mathrm{Var}(\hat\beta \mid X) = \sigma^2 (X^\top X)^{-1}$ under the assumption that $\mathbb{E}[u_i u_j \mid X] = 0$ for $i \neq j$ and $\mathrm{Var}(u_i \mid X) = \sigma^2$ for all $i$.
(5) In addition to the exogeneity assumption, $\mathbb{E}[u_i u_j \mid X] = 0$ for $i \neq j$, and $\mathrm{Var}(u_i \mid X) = \sigma^2$ for all $i$, assume that the moments $\mathbb{E}[|X_{il} X_{im}|]$ exist for any $l, m \leq K$ and that $\mathbb{E}[X_i^\top X_i]$ is non-singular. Prove
\[
\hat\beta \xrightarrow[n \to \infty]{\ \mathbb{P}\ } \beta, \qquad \sqrt{n}\,(\hat\beta - \beta) \xrightarrow[n \to \infty]{\ d\ } \mathcal{N}_{\mathbb{R}^K}\big(0,\ \sigma^2\, \mathbb{E}[X_i^\top X_i]^{-1}\big).
\]
(To prove these results, we will consider the OLS criterion as the sum averaged by the number of observations.)
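Though not part of the exercise, a minimal Monte Carlo sketch (with hypothetical parameter values) illustrating both limits:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5000, 3
beta, sigma = np.array([1.0, -2.0, 0.5]), 1.0

# One simulated sample from Y_i = X_i beta + u_i with homoskedastic errors.
X = rng.normal(size=(n, K))
u = rng.normal(scale=sigma, size=n)
Y = X @ beta + u

# Closed-form OLS estimator (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Consistency: beta_hat is close to beta; asymptotic normality: sqrt(n) times
# the error is on the scale of N(0, sigma^2 E[X_i' X_i]^{-1}), here E[X_i' X_i] = I_3.
print(beta_hat - beta)
print(np.sqrt(n) * (beta_hat - beta))
```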
Exercise 3
Consider the multiple regression model:
∀i = 1, . . . , n, Yi = Xi β + ui ,
We aim to test:
\[
H_0 : \beta_1 = \beta_2, \quad \beta_3 + \beta_4 = \alpha.
\]
(2) The sample size is $n = 200$. Based on the sample of realized data, $\hat F = 2.5$. With level $\alpha = 0.05$, the quantile is $q_{1-\alpha}(F(2, 196)) = 3.042$. What is the decision about $H_0$?
Consider now the model
\[
Y_i = \mu + \beta X_i + \gamma D_i + u_i.
\]
We assume exogeneity and homoskedasticity. We aim to test for a structural break at the $(m+1)$-th observation. Formally, the test is
\[
H_0 : \beta = \gamma = 0, \qquad H_1 : \beta \neq 0 \ \text{or} \ \gamma \neq 0.
\]
(5) The sample size is $n = 150$. Based on the realized sample, $\hat F = 120.7$. At $\alpha = 0.05$, the quantile is $q_{1-\alpha}(F(2, 147)) = 3.058$. What is the decision about $H_0$?
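A minimal simulation sketch of this structural-break F-test, with a hypothetical break point and hypothetical coefficients:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, m = 150, 75  # the break point m is hypothetical; the exercise keeps it generic

# Simulated data with D_i = 1 for the observations after the candidate break.
X = rng.normal(size=n)
D = (np.arange(n) >= m).astype(float)
Y = 1.0 + 0.5 * X + 2.0 * D + rng.normal(size=n)

# Unrestricted model Y = mu + beta X + gamma D + u ...
Z = np.column_stack([np.ones(n), X, D])
b = np.linalg.lstsq(Z, Y, rcond=None)[0]
rss_u = np.sum((Y - Z @ b) ** 2)

# ... against the restricted model under H0: beta = gamma = 0 (intercept only).
rss_r = np.sum((Y - Y.mean()) ** 2)

q, df = 2, n - 3  # 2 restrictions; n - K residual degrees of freedom, here 147
F = ((rss_r - rss_u) / q) / (rss_u / df)
print(F, F > stats.f.ppf(0.95, q, df))  # reject H0 when F exceeds the quantile
```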
Exercise 4
Consider the multiple linear regression model:
\[
Y_i = X_i \beta + u_i,
\]
(1) Define $M_{xu} = \mathbb{E}[X_i^\top u_i]$. Which condition on $M_{xu}$ is sufficient to have a consistent OLS estimator $\hat\beta$?
(2) Assume that $X_{2i}$ and $u_i$ are correlated. We rely on IV estimation. Define $\tilde Z_i = (X_{1i}, Z_i)$, where $Z_i$ is the instrument. Compute the IV estimator and prove its consistency.
(3) Now assume we have 3 instruments W1,i , W2,i , W3,i . Perform the 2SLS method.
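A minimal numerical sketch of the 2SLS procedure of question (3), under a hypothetical data-generating process:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical DGP: X2 is endogenous because its noise v also enters u;
# W1, W2, W3 shift X2 but are independent of u.
W = rng.normal(size=(n, 3))
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)
X1 = rng.normal(size=n)
X2 = W @ np.array([1.0, 0.5, -0.5]) + v
X = np.column_stack([X1, X2])
Y = X @ np.array([1.0, 2.0]) + u

# Stage 1: project the regressors on the exogenous variable and the instruments.
Zmat = np.column_stack([X1, W])
X_hat = Zmat @ np.linalg.lstsq(Zmat, X, rcond=None)[0]

# Stage 2: regress Y on the first-stage fitted values.
beta_2sls = np.linalg.lstsq(X_hat, Y, rcond=None)[0]
print(beta_2sls)  # close to (1, 2); plain OLS on (X1, X2) would be biased
```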
Exercise 5
Consider the multiple regression model:
\[
Y_i = X_i \beta + u_i,
\]
with $\alpha > 0$, $\mu \geq 0$, $\lambda \in \mathbb{R}$, and $Z_i$ a random variable formed from $X_i$. In matrix form, the heteroskedasticity structure is:
\[
\mathrm{Var}(u \mid X) = \begin{pmatrix}
\alpha + \mu \exp(\lambda Z_1) & 0 & \cdots & 0 \\
0 & \alpha + \mu \exp(\lambda Z_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \alpha + \mu \exp(\lambda Z_n)
\end{pmatrix} \in \mathbb{R}^{n \times n}.
\]
(1) Compute the OLS estimator $\hat\beta_{\mathrm{OLS}}$ and the GLS estimator $\hat\beta_{\mathrm{GLS}}$.
(3) Under the same conditions, derive the FGLS estimator when $\mathrm{Var}(u_i \mid X) = \gamma Z_i^2$, $\gamma > 0$.
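A minimal numerical sketch of the FGLS steps for question (3), under a hypothetical data-generating process:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical DGP for question (3): Var(u_i | X) = gamma * Z_i^2.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = 1.0 + rng.uniform(size=n)
u = np.sqrt(0.5) * Z * rng.normal(size=n)
Y = X @ np.array([1.0, 2.0]) + u

# Step 1: OLS residuals.
b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b_ols

# Step 2: estimate gamma from the moment condition E[e_i^2 / Z_i^2] = gamma.
gamma_hat = np.mean(e**2 / Z**2)

# Step 3: reweight by 1/sqrt(gamma_hat * Z_i^2) and rerun OLS. The scalar
# gamma_hat cancels in the weighting, so only Z_i matters for the estimate.
w = 1.0 / np.sqrt(gamma_hat * Z**2)
b_fgls = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)[0]
print(b_ols, b_fgls)
```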
Exercise 6
Compute the Fisher information matrix for the following statistical models:
Reminder: under standard regularity conditions, differentiation and integration can be exchanged, so that
\[
\int_{\mathcal{X}} \nabla_\theta\, p_{\theta_0}(x)\, dx = \nabla_\theta \int_{\mathcal{X}} p_{\theta_0}(x)\, dx = 0, \tag{0.1}
\]
and
\[
\int_{\mathcal{X}} \nabla^2_{\theta\theta^\top}\, p_{\theta_0}(x)\, dx = \nabla^2_{\theta\theta^\top} \int_{\mathcal{X}} p_{\theta_0}(x)\, dx = 0, \tag{0.2}
\]
which means in particular that the score evaluated at $\theta_0$ (the true parameter) is centered (its expectation is zero). Moreover, differentiating the score, we have:
\[
\begin{aligned}
\mathbb{E}_{\theta_0}\!\big[\nabla^2_{\theta\theta^\top} \log p_{\theta_0}(X)\big]
&= \mathbb{E}_{\theta_0}\!\Big[\nabla_{\theta^\top} \frac{\nabla_\theta\, p_{\theta_0}(X)}{p_{\theta_0}(X)}\Big]
= \mathbb{E}_{\theta_0}\!\Big[\frac{\nabla^2_{\theta\theta^\top}\, p_{\theta_0}(X)}{p_{\theta_0}(X)} - \frac{\nabla_\theta\, p_{\theta_0}(X)\, \nabla_{\theta^\top}\, p_{\theta_0}(X)}{p_{\theta_0}(X)^2}\Big] \\
&= \mathbb{E}_{\theta_0}\!\Big[\frac{\nabla^2_{\theta\theta^\top}\, p_{\theta_0}(X)}{p_{\theta_0}(X)} - \nabla_\theta \log p_{\theta_0}(X)\, \nabla_{\theta^\top} \log p_{\theta_0}(X)\Big]
= \int_{\mathcal{X}} \nabla^2_{\theta\theta^\top}\, p_{\theta_0}(x)\, dx - I(\theta_0) = -I(\theta_0),
\end{aligned}
\]
where the last equality uses (0.2).
(1) A Poisson distribution with parameter $\lambda > 0$, whose probability mass function is:
\[
\forall k \in \mathbb{N}, \quad \mathbb{P}(X = k) = \exp(-\lambda)\, \frac{\lambda^k}{k!}.
\]
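As a worked sketch for this first model (using $\mathbb{E}_\lambda[X] = \lambda$): since $\log \mathbb{P}_\lambda(X = k) = -\lambda + k \log \lambda - \log k!$,
\[
\partial^2_{\lambda\lambda} \log \mathbb{P}_\lambda(X = k) = -\frac{k}{\lambda^2}, \qquad I(\lambda) = -\mathbb{E}_\lambda\Big[-\frac{X}{\lambda^2}\Big] = \frac{1}{\lambda}.
\]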
(2) A Pareto distribution with parameters $\alpha, \gamma$, with $\alpha > 1$ and $\gamma > 0$, with density:
\[
p_{(\alpha,\gamma)^\top}(x) = \frac{\alpha - 1}{\gamma} \Big(\frac{\gamma}{x}\Big)^{\alpha} \mathbb{1}(x \geq \gamma).
\]
(3) A Weibull distribution with parameters α, γ with α > 0 and γ > 0, with density:
Exercise 7
All the random variables are i.i.d. Compute the MLE for the following models:
(1) $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{B}(p)$;
(2) $X_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}_{\mathbb{R}}(m, \sigma^2)$;
(3) Compute the MLE $(\hat\alpha, \hat\gamma)^\top$ of the $\alpha$-translated exponential distribution, whose density is:
\[
p_{(\alpha,\gamma)^\top}(x) = \frac{1}{\gamma} \exp\Big(-\frac{x - \alpha}{\gamma}\Big) \mathbb{1}(x \geq \alpha),
\]
and provide the distribution of $n(\hat\alpha - \alpha)$ (when $n$ is fixed). (Hint: use the cumulative distribution function.)
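Not part of the exercise, but a quick simulation check of the distributional claim, under the assumption (to be verified in (3)) that the MLE of $\alpha$ is the sample minimum; the constants are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, gamma, n, reps = 2.0, 3.0, 50, 10000

# The MLE of alpha is the sample minimum (this is what (3) asks you to verify).
X = alpha + rng.exponential(scale=gamma, size=(reps, n))
alpha_hat = X.min(axis=1)

# n * (alpha_hat - alpha) should follow an exponential law of scale gamma;
# compare empirically with a Kolmogorov-Smirnov test.
stat, pval = stats.kstest(n * (alpha_hat - alpha), "expon", args=(0, gamma))
print(stat, pval)  # a large p-value is consistent with Exp(scale = gamma)
```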
Exercise 8
Consider the i.i.d. sample of random variables X1 , . . . , Xn with probability density func-
tion defined as
\[
p_\theta(x) = \begin{cases} \dfrac{1}{\theta^2}\, x \exp\!\Big(-\dfrac{x}{\theta}\Big), & x \geq 0,\ \theta > 0, \\[6pt] 0, & \text{otherwise.} \end{cases}
\]
The ML estimator is defined as
\[
\hat\theta = \arg\max_\theta M_n(\theta), \qquad M_n(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(X_i; \theta), \qquad \ell(X_i; \theta) = \log(p_\theta(X_i)).
\]
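For orientation (the derivation itself is the exercise), a sketch of the resulting first-order condition: since $\ell(X_i; \theta) = -2\log\theta + \log X_i - X_i/\theta$,
\[
\partial_\theta M_n(\theta) = -\frac{2}{\theta} + \frac{1}{\theta^2} \cdot \frac{1}{n} \sum_{i=1}^n X_i = 0 \iff \hat\theta = \frac{\bar X_n}{2}.
\]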
Exercise 9
Consider the i.i.d. sample of random variables X1 , . . . , Xn . For any i, Xi follows the Pareto
distribution P(a, b), a, b > 0, whose probability density function is given as follows:
\[
p_\theta(x) = \frac{a\, b^a}{x^{a+1}}\, \mathbb{1}(x \geq b).
\]
The parameter of interest is $\theta = (a, b)^\top \in \mathbb{R}^2$. Using the ML method, provide the ML estimator $\hat\theta$.
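A quick numerical check, assuming the closed forms $\hat b = \min_i X_i$ and $\hat a = n / \sum_i \log(X_i/\hat b)$ that the exercise asks you to derive:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, n = 3.0, 2.0, 100_000

# Simulate Pareto(a, b) by inverse transform: if U ~ U(0,1), X = b * U^(-1/a).
X = b * rng.uniform(size=n) ** (-1.0 / a)

# Closed-form ML estimators (the formulas the exercise asks you to derive):
b_hat = X.min()
a_hat = n / np.sum(np.log(X / b_hat))
print(a_hat, b_hat)  # approaches (3, 2) as n grows
```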
Exercise 10
Consider a sample of i.i.d. random variables (X1 , · · · , Xn ) taking values in N∗ , whose law
is
\[
\forall i \leq n, \ \forall x \in \mathbb{N}^*, \quad \mathbb{P}(X_i = x) = p(1 - p)^{x-1},
\]
with $p \in (0, 1)$.
(1) Provide the log-likelihood function of the observations (X1 , · · · , Xn ) and the ML
statistical criterion.
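As a hedged sketch of what question (1) yields: the log-likelihood of the sample is
\[
\log L_n(p) = \sum_{i=1}^n \log\big(p (1-p)^{X_i - 1}\big) = n \log p + \Big(\sum_{i=1}^n X_i - n\Big) \log(1 - p).
\]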
Exercise 11
We consider the following non-linear regression model:
∀i = 1, . . . , n, Yi = µ + α exp(Xi β) + ϵi ,
(1) Using the distribution assumption ϵi |X ∼ NR (0, σ 2 ) and the i.i.d. assumption on
the observations (Yi , Xi ), provide the log-likelihood under the form
\[
M_n(\theta) = \frac{1}{n} \sum_{i=1}^n \log(p_\theta(Y_i \mid X_i)),
\]
where $\theta = (\mu, \alpha, \beta^\top, \sigma^2)^\top$.
(2) Provide the first-order conditions satisfied by the maximum likelihood estimator $\hat\theta$ (that is, in total, $3 + K$ equations, each of them corresponding to the partial derivative with respect to $\mu$, $\alpha$, $\beta$, $\sigma^2$).
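Not part of the exercise, but a minimal numerical sketch (hypothetical data, scalar $X_i$, i.e. $K = 1$) of maximizing this log-likelihood directly:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 500

# Hypothetical data with scalar X_i (K = 1): Y = mu + alpha exp(X beta) + eps.
X = rng.normal(size=n)
Y = 1.0 + 2.0 * np.exp(0.5 * X) + rng.normal(scale=0.3, size=n)

def neg_loglik(theta):
    mu, alpha, beta, log_s2 = theta
    s2 = np.exp(log_s2)  # reparametrize so that sigma^2 stays positive
    r = Y - mu - alpha * np.exp(X * beta)
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.sum(r**2) / s2

res = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.1, 0.0]), method="BFGS")
print(res.x[:3], np.exp(res.x[3]))  # estimates of (mu, alpha, beta) and sigma^2
```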
Exercise 12
Let $Y_i$ be a discrete random variable taking a finite number of values. Then we consider the non-linear regression model
\[
\forall i = 1, \ldots, n, \quad Y_i = \gamma + \alpha\, \frac{Z_i^{1-\rho} - 1}{1 - \rho} + V_i \beta + \epsilon_i,
\]
with parameter $\theta = (\gamma, \alpha, \rho, \beta^\top, \sigma^2)^\top \in \mathbb{R}^{K+4}$.
(1) Using the distribution assumption $\epsilon_i \mid X \sim \mathcal{N}_{\mathbb{R}}(0, \sigma^2)$ and the fact that $(Y_i, X_i)$, $i = 1, \ldots, n$, are i.i.d., provide the log-likelihood under the form
\[
M_n(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(Y_i \mid X_i; \theta).
\]
(2) Provide the first-order conditions (that is, in total, 5 equations, each of them corresponding to the gradient with respect to $\gamma$, $\alpha$, $\rho$, $\beta$, $\sigma^2$).
(3) Due to the non-linearity with respect to the parameters, we propose to use the
Newton-Raphson method. Provide the algorithm with the corresponding Hessian
and score. We will provide the explicit components of the Hessian.
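To fix ideas, a generic sketch of the Newton-Raphson update, with the score and Hessian left abstract (the exercise asks for their explicit components):

```python
import numpy as np

def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=100):
    """Maximize M_n via the update theta <- theta - H(theta)^{-1} s(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta), score(theta))
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Usage sketch: score(theta) and hessian(theta) are the gradient and Hessian
# of M_n derived in questions (2) and (3), evaluated at theta.
```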