
Exercises Econometrics I

Exercise 1
We aim to predict the price of cars through a linear relationship

∀i, Pi = α + Xi β + ui ,

where Xi contains the following continuous variables: wheelbase (X1 ), width (X2 ), height
(X3 ), curb weight (X4 ), engine size (X5 ), compression ratio (X6 ), horsepower (X7 ), peak-
rpm (X8 ). The variables are first centered and standardized by their standard deviation
(since they are not expressed in the same unit). We assume that the error term is

ui |X ∼i.i.d. NR (0, σ 2 ).

Moreover, we assume the standard assumptions for the multiple linear regression model
(exogeneity, homoskedasticity). The software provides the following estimation results
(’SE’ stands for standard errors):

(1) Explain the meaning of the tStat column.

(2) Provide the t-test for the estimator β̂3 corresponding to X3 , using the distribution of ui .
To do so, we will state the null hypothesis, provide an estimator of σ 2 and give the
rejection region.

(3) Construct a confidence interval of level 1 − 0.05 for β3 .

(4) The F-test tests for β = 0. Provide the F-statistic used to test for H0 : β = 0.

(5) The 95% quantile of F (8, 186) is 1.99. Is the linear model significant at the 5%
significance level?
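
As a complement, here is a minimal numerical sketch of questions (1), (4) and (5), assuming a hypothetical sample of n = 195 observations (so that the residual degrees of freedom are 186) and K = 8 standardized regressors; X and y below are simulated placeholders, not the actual car data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, K = 195, 8                                  # n - K - 1 = 186 residual degrees of freedom
    X = rng.normal(size=(n, K))                    # placeholder for the standardized covariates
    y = 1.0 + X @ rng.normal(size=K) + rng.normal(size=n)   # placeholder prices

    Z = np.column_stack([np.ones(n), X])           # add the intercept column
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - K - 1)               # unbiased estimator of sigma^2
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))
    t_stat = beta_hat / se                         # ratio estimate / SE, one value per coefficient

    # F statistic for H0: beta = 0 (all slopes zero, intercept left unrestricted)
    ssr_r = np.sum((y - y.mean()) ** 2)            # restricted SSR (intercept only)
    ssr_u = resid @ resid                          # unrestricted SSR
    F = ((ssr_r - ssr_u) / K) / (ssr_u / (n - K - 1))
    print(t_stat, F, stats.f.ppf(0.95, K, n - K - 1))   # the last value is close to 1.99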

Exercise 2
Consider the multiple regression model:

∀i = 1, . . . , n, Yi = Xi β + ui ,

where Xi = (Xi1 , . . . , XiK ) is a 1×K vector of covariates and ui is the error term, centered
with variance σ 2 and Var(u|X) = σ 2 In . All the data are i.i.d.

(1) Write the previous equation as a stacked regression model.

(2) Write the statistical criterion that solves the least squares estimation problem.
Assuming X⊤ X ≻ 0, solve this problem.

(3) Under the exogeneity assumption, prove that the OLS estimator is unbiased. Then,
show that Var(β̂|X) = σ 2 (X⊤ X)−1 under the assumption that E[ui uj |X] = 0, i ̸= j,
and Var(ui |X) = σ 2 , ∀i.

(4) Provide an estimator of σ 2 and prove that it is unbiased.

(5) In addition to the exogeneity assumption, E[ui uj |X] = 0, i ̸= j, and Var(ui |X) =
σ 2 , ∀i, assume that the moments E[|Xil Xim |] exist for any l, m ≤ K and that
E[Xi⊤ Xi ] is non-singular. Prove

β̂ −→ β in probability and √n (β̂ − β) −→ NRK (0, σ 2 E[Xi⊤ Xi ]−1 ) in distribution, as n → ∞.

(To prove these results, we will consider the OLS criterion written as the sum averaged
by the number of observations.)
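
A short simulation sketch that can be used to sanity-check the unbiasedness and consistency statements of questions (3) and (5); the data-generating process below (K = 3 regressors, homoskedastic Gaussian errors) is purely illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    K, beta, sigma = 3, np.array([1.0, -2.0, 0.5]), 1.0

    def ols(n):
        X = rng.normal(size=(n, K))
        u = rng.normal(scale=sigma, size=n)
        y = X @ beta + u
        return np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y

    # Unbiasedness: the average of beta_hat over many small samples is close to beta.
    print(np.mean([ols(50) for _ in range(2000)], axis=0))
    # Consistency: a single estimate gets closer to beta as n grows.
    for n in (100, 10_000, 1_000_000):
        print(n, ols(n))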

Exercise 3
Consider the multiple regression model:

∀i = 1, . . . , n, Yi = Xi β + ui ,

where Xi = (Xi1 , Xi2 , Xi3 , Xi4 ). The error term ui satisfies:

E[ui |X] = 0, Var(u|X) = σ 2 In .

We aim to test:
H0 : β1 = β2 , β3 + β4 = α,

where α is a known quantity.

(1) Provide the F statistic.

(2) The sample size is n = 200. Based on the sample of realized data, F̂ = 2.5. With
level α = 0.05, the quantile is q1−α (F (2, 196)) = 3.042. What is the decision about
H0?

Consider the following structural change model:

Yi = µ + βXi + γDi + ui .

Here, Xi ∈ R and Di is a dummy variable:

Di = 0 for i = 1, · · · , m, and Di = 1 for i = m + 1, · · · , n.

We assume exogeneity and homoskedasticity. We aim to test for a structural break at the
(m + 1)-th observation. Formally, the test is

H0 : β = γ = 0, H1 : β ̸= 0 or γ ̸= 0.

(3) Derive the OLS estimator θ̂ = (μ̂, β̂, γ̂)⊤ ∈ R3 .

(4) Provide the F test.

(5) The sample size is n = 150. Based on the realized sample, F̂ = 120.7. At α = 0.05,
the quantile is q1−α (F (2, 147)) = 3.058. What is the decision about H0?
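
A numerical sketch of the structural-break F test of questions (3)-(5), on hypothetical data with n = 150 and a break after observation m = 75; the statistic is formed from the restricted (intercept-only) and unrestricted sums of squared residuals:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, m = 150, 75
    X = rng.normal(size=n)
    D = (np.arange(1, n + 1) > m).astype(float)      # D_i = 0 for i <= m, 1 afterwards
    y = 1.0 + 0.5 * X + 2.0 * D + rng.normal(size=n) # illustrative DGP with a break

    Z = np.column_stack([np.ones(n), X, D])          # unrestricted regressors
    theta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # OLS estimate (mu_hat, beta_hat, gamma_hat)
    ssr_u = np.sum((y - Z @ theta_hat) ** 2)
    ssr_r = np.sum((y - y.mean()) ** 2)              # restricted model: Y_i = mu + u_i

    q, dof = 2, n - 3                                # 2 restrictions, n - 3 = 147
    F = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
    print(F, stats.f.ppf(0.95, q, dof))              # reject H0 if F exceeds the 95% quantile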

Exercise 4
Consider the multiple linear regression model:

Yi = Xi β + ui ,

Here, Xi = (X1i , X2i ) and β = (β1 , β2 )⊤ ∈ R2 . We assume that the sample of random
variables (Yi , Xi ) is i.i.d. and that the moments E[|Xki Xli |] exist. Moreover, E[ui |X] =
0, Var(ui |X) = σ 2 , with X ∈ Rn×2 .

(1) Define Mxu = E[Xi⊤ ui ]. Which condition with respect to Mxu is sufficient to have a
consistent OLS estimator β̂?

(2) Assume that X2i and ui are correlated. We rely on IV estimation. Define the
instrument vector Z̃i = (X1i , Zi ), where Zi is the instrument for X2i . Compute the
IV estimator and prove its consistency.

(3) Now assume we have 3 instruments W1,i , W2,i , W3,i . Perform the 2SLS method.
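
A sketch of the IV and 2SLS computations of questions (2) and (3) on simulated data; the instrument strength and the correlation between X2i and ui below are arbitrary choices made only so that the example runs end to end:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5000
    W = rng.normal(size=(n, 3))                      # three instruments W1, W2, W3
    v = rng.normal(size=n)
    x1 = rng.normal(size=n)                          # exogenous regressor
    x2 = W @ np.array([1.0, 0.5, -0.5]) + v          # endogenous regressor (driven by v)
    u = 0.8 * v + rng.normal(size=n)                 # error correlated with x2 through v
    y = 1.0 * x1 + 2.0 * x2 + u

    X = np.column_stack([x1, x2])
    Z = np.column_stack([x1, W])                     # exogenous regressor + instruments
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)        # first stage: fitted values Z(Z'Z)^{-1}Z'X
    beta_2sls = np.linalg.solve(P.T @ X, P.T @ y)    # second stage: (X_hat'X)^{-1} X_hat'y
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # inconsistent because E[X_i' u_i] != 0
    print(beta_2sls, beta_ols)                       # 2SLS is close to (1, 2), OLS is not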

Exercise 5
Consider the multiple regression model:

Yi = Xi β + ui ,

where Xi = (Xi1 , . . . , XiK ) is a 1 × K vector of covariates and ui is the error term,
centered. The covariates are stochastic. The first and second order moments of ui are as
follows:

∀i, E[ui |X] = 0, E[ui uj |X] = 0, j ̸= i, Var(ui |X) = α + µ exp(λZi ),

with α > 0, µ ≥ 0, λ ∈ R, and Zi a random variable formed from Xi . In matrix form,
the heteroskedasticity structure is:

Var(u|X) = diag(α + µ exp(λZ1 ), α + µ exp(λZ2 ), . . . , α + µ exp(λZn )) ∈ Rn×n .

(1) Compute the OLS estimator β̂OLS and the GLS estimator β̂GLS .

(2) Compute the FGLS estimator.

(3) Under the same conditions, derive the FGLS estimator when Var(ui |X) = γZi2 ,
γ > 0.
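
A sketch comparing OLS and FGLS on simulated heteroskedastic data, using the simpler variance specification Var(ui |X) = γZi2 of question (3); the construction of Zi from Xi and the value of γ are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(4)
    n, K = 2000, 2
    X = rng.normal(size=(n, K))
    Zi = np.abs(X[:, 0]) + 0.5                       # Z_i built from X_i (illustrative choice)
    u = np.sqrt(2.0) * Zi * rng.normal(size=n)       # Var(u_i|X) = 2 * Z_i^2
    y = X @ np.array([1.0, -1.0]) + u

    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # OLS: (X'X)^{-1} X'y
    resid = y - X @ beta_ols

    # FGLS: estimate gamma by regressing squared residuals on Z_i^2, then reweight
    # each observation by 1/sqrt(gamma_hat * Z_i^2), i.e. apply GLS with Omega_hat.
    gamma_hat = (Zi**2 @ resid**2) / (Zi**2 @ Zi**2)
    w = 1.0 / np.sqrt(gamma_hat * Zi**2)
    Xw, yw = X * w[:, None], y * w
    beta_fgls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    print(beta_ols, beta_fgls)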

Exercise 6
Compute the Fisher information matrix for the following statistical models:

Reminder:

Assume the following interchanges of differentiation and integration hold:

∫X ∇θ pθ0 (x) dx = ∇θ ∫X pθ0 (x) dx,     (0.1)

and

∫X ∇2θθ⊤ pθ0 (x) dx = ∇2θθ⊤ ∫X pθ0 (x) dx.     (0.2)

Then, using (0.1), we first have:

Eθ0 [∇θ log(pθ0 (X))] = Eθ0 [∇θ pθ0 (X)/pθ0 (X)] = ∫X ∇θ pθ0 (x) dx = ∇θ ∫X pθ0 (x) dx = ∇θ 1 = 0,

which means that the score evaluated at θ0 (the true parameter) is centered (its expectation
is zero). Moreover, differentiating the score, we have:

Eθ0 [∇2θθ⊤ log(pθ0 (X))] = Eθ0 [∇θ⊤ (∇θ pθ0 (X)/pθ0 (X))]
= Eθ0 [∇2θθ⊤ pθ0 (X)/pθ0 (X) − ∇θ pθ0 (X)∇θ⊤ pθ0 (X)/pθ0 (X)2 ]
= Eθ0 [∇2θθ⊤ pθ0 (X)/pθ0 (X)] − Eθ0 [∇θ log(pθ0 (X))∇θ⊤ log(pθ0 (X))]
= ∫X ∇2θθ⊤ pθ0 (x) dx − I(θ0 ) = ∇2θθ⊤ ∫X pθ0 (x) dx − I(θ0 ) = −I(θ0 ),

where we used (0.2). (A numerical check of these two identities on the Poisson model of
question (1) is sketched after the list of models below.)

(1) A Poisson distribution with parameter λ:

∀k ∈ N,  P(X = k) = exp(−λ) λ^k / k! .

(2) A Pareto distribution with parameters α, γ with α > 1 and γ > 0, with density:

p(α,γ)⊤ (x) = ((α − 1)/γ) (γ/x)^α 1(x ≥ γ).

(3) A Weibull distribution with parameters α, γ with α > 0 and γ > 0, with density:

p(α,γ)⊤ (x) = αγ x^(α−1) exp(−γ x^α ) 1(x ≥ 0).

(4) A uniform distribution on [0, θ] with θ > 0 unknown.
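
A Monte Carlo sketch checking the two identities of the reminder on the Poisson model of question (1): the score evaluated at the true parameter has mean zero, and its variance equals minus the expected Hessian, i.e. the Fisher information; the value λ = 2.5 is arbitrary:

    import numpy as np

    rng = np.random.default_rng(5)
    lam = 2.5
    x = rng.poisson(lam, size=1_000_000)

    score = x / lam - 1.0                    # d/dlambda log p_lambda(x) = x/lambda - 1
    hessian = -x / lam**2                    # d^2/dlambda^2 log p_lambda(x)

    print(score.mean())                      # approximately 0: the score is centered
    print(score.var(), -hessian.mean())      # both approximate the Fisher information I(lambda)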

Exercise 7
All the random variables are i.i.d.

(1) Compute the MLE p̂ of p in the model

Xi ∼i.i.d. B(p),

where B(p) is the Bernoulli distribution with parameter p.

(2) Compute the MLE (m̂, σ̂ 2 )⊤ of (m, σ 2 )⊤ in the model

Xi ∼i.i.d. NR (m, σ 2 ),

and provide the limiting distribution of

√n (m̂ − m, σ̂ 2 − σ 2 )⊤ .

(3) Compute the MLE (α̂, γ̂)⊤ of the α-translated exponential distribution, whose density
is:

p(α,γ)⊤ (x) = (1/γ) exp(−(x − α)/γ) 1(x ≥ α),

and provide the distribution of n(α̂ − α) (when n is fixed).
(Hint: use the cumulative distribution function.)
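
A Monte Carlo sketch for question (2): simulate √n (m̂ − m, σ̂ 2 − σ 2 )⊤ repeatedly and inspect its empirical mean and covariance, which can then be compared with the limiting distribution obtained analytically; the parameter values are arbitrary:

    import numpy as np

    rng = np.random.default_rng(6)
    m, sigma2, n, reps = 1.0, 4.0, 2000, 5000

    draws = np.empty((reps, 2))
    for r in range(reps):
        x = rng.normal(m, np.sqrt(sigma2), size=n)
        m_hat = x.mean()                      # MLE of m
        s2_hat = ((x - m_hat) ** 2).mean()    # MLE of sigma^2 (divides by n, not n - 1)
        draws[r] = np.sqrt(n) * np.array([m_hat - m, s2_hat - sigma2])

    print(draws.mean(axis=0))                 # close to (0, 0)
    print(np.cov(draws.T))                    # empirical covariance of the limiting law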

Exercise 8
Consider the i.i.d. sample of random variables X1 , . . . , Xn with probability density function
defined as

pθ (x) = (x/θ 2 ) exp(−x/θ) for x ≥ 0 (with θ > 0), and pθ (x) = 0 otherwise.

We denote by θ0 the true parameter (unique).

(1) Provide the statistical criterion under the form

θ̂ = arg maxθ Mn (θ),   Mn (θ) = (1/n) Σ_{i=1}^{n} ℓ(Xi ; θ),   ℓ(Xi ; θ) = log(pθ (Xi )).

Provide the ML estimator θ̂.

(2) Show that θ̂ is unbiased.

Exercise 9
Consider the i.i.d. sample of random variables X1 , . . . , Xn . For any i, Xi follows the Pareto
distribution P(a, b), a, b > 0, whose probability density function is given as follows:

pθ (x) = (a b^a / x^(a+1)) 1(x ≥ b).

The parameter of interest is θ = (a, b)⊤ ∈ R2 . Using the ML method, provide the ML
estimator θ̂.

Exercise 10
Consider a sample of i.i.d. random variables (X1 , · · · , Xn ) taking values in N∗ , whose law
is
∀i ≤ n, ∀x ∈ N∗ , P(Xi = x) = p(1 − p)x−1 ,

where p ∈ (0, 1) is the parameter of interest.

(1) Provide the log-likelihood function of the observations (X1 , · · · , Xn ) and the ML
statistical criterion.

(2) Compute p̂, the maximum likelihood estimator of p. Is it an unbiased estimator?

Use Jensen's inequality to answer the last question.
Jensen's inequality: if f (·) is convex on a real interval I and X is a random variable
taking values in I with E[|X|] < ∞, then

f (E[X]) ≤ E[f (X)].

The inequality is strict for a strictly convex function (and a non-degenerate X).
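
A small numerical illustration of the strict Jensen inequality for the strictly convex map x ↦ 1/x on (0, ∞), which is the kind of argument the hint points to in question (2); the values p = 0.3 and n = 10 are arbitrary:

    import numpy as np

    rng = np.random.default_rng(7)
    p, n, reps = 0.3, 10, 200_000
    # X_bar is the sample mean of n i.i.d. draws from P(X = x) = p(1-p)^(x-1), x in {1, 2, ...}.
    x_bar = rng.geometric(p, size=(reps, n)).mean(axis=1)

    print(1.0 / x_bar.mean())        # f(E[X_bar]) = 1 / E[X_bar]
    print((1.0 / x_bar).mean())      # E[f(X_bar)] = E[1 / X_bar], strictly larger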

Exercise 11
We consider the following non-linear regression model:

∀i = 1, . . . , n, Yi = µ + α exp(Xi β) + ϵi ,

where Yi ∈ R is the output, Xi = (Xi1 , . . . , XiK ) is a 1 × K vector of continuous random
variables. Here, µ ∈ R, α ∈ R and β ∈ RK . We assume ϵi |X ∼ NR (0, σ 2 ), that is, the
conditional distribution of ϵi given X is Gaussian with moments E[ϵi |X] = 0, Var(ϵi |X) = σ 2 . We
propose to estimate the vector of parameters θ using the maximum likelihood method,
where θ = (µ, α, β ⊤ , σ 2 )⊤ ∈ R3+K . We assume that (Yi , Xi ), i = 1, . . . , n, are i.i.d.

(1) Using the distribution assumption ϵi |X ∼ NR (0, σ 2 ) and the i.i.d. assumption on
the observations (Yi , Xi ), provide the log-likelihood under the form

n
1X
Mn (θ) = log(pθ (Yi |Xi )),
n i=1

and provide the conditional density pθ (Yi |Xi ).
(Carefully explain the assumptions you use and the change of variable method.)

(2) Provide the first order conditions satisfied by the maximum likelihood estimator
θ̂ (that is, in total, 3 + K equations, each of them corresponding to the partial
derivative with respect to µ, α, β, σ 2 ).

(3) Assume K = 4 so that Xi = (Xi1 , Xi2 , Xi3 , Xi4 ) is a 1 × 4 vector and β =
(β1 , β2 , β3 , β4 )⊤ ∈ R4 , so that θ = (µ, α, β1 , β2 , β3 , β4 , σ 2 )⊤ ∈ R7 . We want to test
H0 : β1 = β2 , β3^2 = β4 , using the Wald statistic

ζnW = n r(θ̂)⊤ [∇θ r(θ̂)⊤ H(θ̂)−1 J(θ̂) H(θ̂)−1 ∇θ r(θ̂)]−1 r(θ̂).

Provide the restriction function r(θ̂), the Jacobian ∇θ r(θ̂)⊤ and the distribution
of ζnW when n → ∞.
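
A sketch of how the Wald statistic of question (3) can be assembled once θ̂ and estimates of H(θ̂) and J(θ̂) are available; the numerical values below, including the identity-matrix stand-ins for the Hessian and the variance of the score, are placeholders that only illustrate the matrix algebra:

    import numpy as np
    from scipy import stats

    n = 500
    # Placeholder estimate, ordered as theta = (mu, alpha, beta1, beta2, beta3, beta4, sigma2).
    theta_hat = np.array([0.2, 1.1, 0.9, 1.0, 0.4, 0.5, 1.5])
    H_hat = -np.eye(7)                                   # placeholder Hessian estimate
    J_hat = np.eye(7)                                    # placeholder estimate of Var(score)

    def r(theta):
        # Encodes H0: beta1 = beta2 and beta3^2 = beta4 as r(theta) = 0.
        mu, alpha, b1, b2, b3, b4, s2 = theta
        return np.array([b1 - b2, b3**2 - b4])

    def jac_r(theta):
        # 2 x 7 Jacobian of r, i.e. the transpose of nabla_theta r(theta).
        mu, alpha, b1, b2, b3, b4, s2 = theta
        R = np.zeros((2, 7))
        R[0, 2], R[0, 3] = 1.0, -1.0
        R[1, 4], R[1, 5] = 2.0 * b3, -1.0
        return R

    Hinv = np.linalg.inv(H_hat)
    R = jac_r(theta_hat)
    V = R @ Hinv @ J_hat @ Hinv @ R.T                    # variance of sqrt(n) r(theta_hat)
    W = n * r(theta_hat) @ np.linalg.solve(V, r(theta_hat))
    print(W, stats.chi2.ppf(0.95, df=2))                 # compare with a chi-square(2) quantile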

Exercise 12
Let Yi be a discrete random variable taking a finite number of values. Then we consider the
non-linear regression model

∀i = 1, . . . , n,  Yi = γ + α (Zi^(1−ρ) − 1)/(1 − ρ) + Vi β + ϵi ,

where Zi is a univariate random variable and Vi is a 1 × K vector of continuous covariates.
Let Xi = (Zi , Vi ). We assume (Yi , Xi ), i = 1, . . . , n, are i.i.d. We propose to estimate the
parameters using the maximum likelihood method. To do so, we assume that the error
term ϵi |X is a Gaussian variable, with moments

∀i, E[ϵi |X] = 0, Var(ϵi |X) = σ 2 .

The parameter vector of interest is

θ = (γ, α, ρ, β ⊤ , σ 2 )⊤ ∈ RK+4 .

We denote by θ̂ the ML estimator and θ0 the true parameter (unique).

(1) Using the distribution ϵi |X ∼ NR (0, σ 2 ) and the fact that (Yi , Xi ), i = 1, . . . , n, are
i.i.d., provide the log-likelihood under the form

Mn (θ) = (1/n) Σ_{i=1}^{n} ℓ(Yi |Xi ; θ).

(2) Provide the first order conditions (that is in total, 5 equations, each of them corre-
sponding to the gradient with respect to γ, α, ρ, β, σ 2 ).

(3) Due to the non-linearity with respect to the parameters, we propose to use the
Newton-Raphson method. Provide the algorithm with the corresponding Hessian
and score. We will provide the explicit components of the Hessian.
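
A generic sketch of the Newton-Raphson iteration θk+1 = θk − H(θk )−1 s(θk ), where s and H denote the score and the Hessian of Mn ; the quadratic objective below is only a placeholder so that the update rule can be run end to end:

    import numpy as np

    def newton_raphson(score, hessian, theta0, tol=1e-10, max_iter=100):
        theta = np.asarray(theta0, dtype=float)
        for _ in range(max_iter):
            step = np.linalg.solve(hessian(theta), score(theta))
            theta = theta - step                 # theta_{k+1} = theta_k - H^{-1} s
            if np.linalg.norm(step) < tol:       # stop when the update is negligible
                break
        return theta

    # Placeholder objective M(theta) = -0.5 (theta - t0)' A (theta - t0), maximized at t0.
    A = np.array([[2.0, 0.3], [0.3, 1.0]])
    t0 = np.array([1.0, -2.0])
    score = lambda th: -A @ (th - t0)            # gradient of M
    hessian = lambda th: -A                      # Hessian of M
    print(newton_raphson(score, hessian, theta0=[0.0, 0.0]))   # converges to t0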

