
Calculus

Math Camp

ECMT

July 5, 2018



Overview

1 Differentiation

2 Integration

3 Multi-Variate Calculus



Differentiation



Differentiation

Differentiation is commonly used in Optimization.

Informally, it is a measure of the rate of change of a function.

Geometric representation of the derivative of a one-variable function:


Differentiation

Geometric representation of the derivative of a two-variable function:


Rules of Differentiation

For constants $b$, $c$ and $m$, we have:

If $f(x) = c$, then $f'(x) = 0$

If $f(x) = mx + b$, then $f'(x) = m$

If $f(x) = x^n$, then $f'(x) = nx^{n-1}$

If $y = e^x$, then $\frac{dy}{dx} = e^x$

If $y = \ln x$, then $\frac{dy}{dx} = \frac{1}{x}$

If $y = f(x)$ and $x = f^{-1}(y)$, then $\frac{dx}{dy} = \frac{1}{dy/dx}$


Rules of Differentiation

If $g(x) = cf(x)$, then $g'(x) = cf'(x)$

If $h(x) = g(x) \pm f(x)$, then $h'(x) = g'(x) \pm f'(x)$

If $h(x) = \sum_{i=1}^n g_i(x)$, then $h'(x) = \sum_{i=1}^n g_i'(x)$

If $h(x) = f(x)g(x)$, then $h'(x) = f'(x)g(x) + f(x)g'(x)$

If $h(x) = \frac{f(x)}{g(x)}$, $g(x) \neq 0$, then $h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}$

If $y = f(u)$ and $u = g(x)$, then $\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$


Continuity

Let $S \subset \mathbb{R}^n$ and $T \subset \mathbb{R}^l$. Then $f : S \to T$ is said to be continuous at $x \in S$ if for all sequences $x_k$ in $S$ such that $\lim_{k\to\infty} x_k = x$, we have $\lim_{k\to\infty} f(x_k) = f(x)$.

The function $f : S \to T$ is said to be continuous on $S$ if it is continuous at all points in $S$.

Example of a non-continuous function


Differentiability

Let $S \subset \mathbb{R}^n$ and $T \subset \mathbb{R}^l$. Then $f : S \to T$ is said to be differentiable at a point $x \in S$ if there exists an $l \times n$ matrix $A$ such that

$$\lim_{t \to x} \frac{\|f(t) - f(x) - A(t - x)\|}{\|t - x\|} = 0$$

that is, for all $\epsilon > 0$ there exists a $\delta > 0$ such that $t \in S$ and $0 < \|t - x\| < \delta$ implies $\|f(t) - f(x) - A(t - x)\| \leq \epsilon \|t - x\|$.

The matrix $A$ is called the derivative of $f$ at $x$ and is denoted $Df(x)$ (often $f'(x)$ or $\frac{df}{dx}$).


Differentiability

If $f$ is differentiable at all points in $S$, then $f$ is said to be differentiable on $S$, and the derivative $Df(x)$ itself forms a function from $S$ to $\mathbb{R}^{l \times n}$.

If $Df : S \to \mathbb{R}^{l \times n}$ is a continuous function, then $f$ is said to be continuously differentiable on $S$ and we say $f$ belongs to the class of $C^1$ functions.

Example

$$f(x) = \begin{cases} x^2 \sin\left(\frac{1}{x}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

is differentiable at $x = 0$ but not continuously differentiable.

However, thankfully most functions in economics are continuous real-valued functions, e.g. utility functions and cost functions.


Mean Value Theorem
Mean Value Theorem. If the function $f(x)$ is continuous on the closed interval $[a, b]$ and differentiable on $(a, b)$, then there must be a number $c \in (a, b)$ such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}$$

that is, the secant line connecting $A = (a, f(a))$ and $B = (b, f(b))$ and the tangent line at $c$ must be parallel.


Intermediate Value Theorem

Intermediate Value Theorem. Let $f : [a, b] \to \mathbb{R}$ be a continuous function. If $f(a) < f(b)$, and if $c$ is a real number such that $f(a) < c < f(b)$, then there exists $x \in (a, b)$ such that $f(x) = c$. A similar statement follows for $f(a) > f(b)$.

Intermediate Value Theorem in $\mathbb{R}^n$. Let $D \subset \mathbb{R}^n$ be a convex set and let $f : D \to \mathbb{R}$ be continuous on $D$. Suppose that $a$ and $b$ are points in $D$ such that $f(a) < f(b)$. Then for any $c$ such that $f(a) < c < f(b)$, there exists a $\lambda \in (0, 1)$ such that $f((1 - \lambda)a + \lambda b) = c$.

The Intermediate Value Theorem is useful for showing the existence of solutions to $f(x) = c$, $f((1 - \lambda)a + \lambda b) = c$, etc.
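The bisection root-finding method is a direct application of this theorem: if a continuous $f$ changes sign on $[a, b]$, a root must lie in between. A minimal sketch (the test function $f(x) = x^3 - 2$ is an illustrative choice):

```python
def bisect(f, a, b, tol=1e-10):
    """Find x in [a, b] with f(x) = 0, assuming f is continuous and
    f(a), f(b) have opposite signs (the IVT guarantees a root)."""
    fa = f(a)
    assert fa * f(b) <= 0, "f must change sign on [a, b]"
    while b - a > tol:
        m = (a + b) / 2
        if fa * f(m) <= 0:
            b = m                # root lies in [a, m]
        else:
            a, fa = m, f(m)      # root lies in [m, b]
    return (a + b) / 2

print(bisect(lambda x: x**3 - 2, 1, 2))  # ~1.259921, i.e. 2**(1/3)
```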


Intermediate Value Theorem

Intermediate Value Theorem for the Derivative. Let $f : [a, b] \to \mathbb{R}$ be a function that is differentiable everywhere on $[a, b]$. If $c$ is a real number such that $f'(a) < c < f'(b)$, then there is a point $x \in (a, b)$ such that $f'(x) = c$. A similar statement follows for $f'(a) > f'(b)$.

Note that the above theorem does not assume $f$ is $C^1$.


Implicit Function Theorem
Implicit Function Theorem.

If $f_1, \cdots, f_n$ are differentiable functions on a neighbourhood of the point $(x_0, y_0) = (x_1^0, \cdots, x_n^0, y_1^0, \cdots, y_m^0)$ in $\mathbb{R}^{n+m}$

And if $f_1(x_0, y_0) = f_2(x_0, y_0) = \cdots = f_n(x_0, y_0) = 0$

And if the following $n \times n$ matrix is non-singular

$$\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix}$$

then there is a neighbourhood $U$ of the point $y_0 = (y_1^0, \cdots, y_m^0)$ in $\mathbb{R}^m$, there is a neighbourhood $V$ of the point $x_0 = (x_1^0, \cdots, x_n^0)$ in $\mathbb{R}^n$, and there is a unique mapping $\varphi : U \to V$ such that $\varphi(y_0) = x_0$ and $f_1(\varphi(y), y) = \cdots = f_n(\varphi(y), y) = 0$ for all $y$ in $U$.
Implicit Function Theorem

That is, if we write

$$\varphi(y) = (g_1(y), \cdots, g_n(y))$$

where $g_1, \cdots, g_n$ are differentiable functions on $U$, then

$$x_1 = g_1(y_1, \cdots, y_m), \quad \cdots, \quad x_n = g_n(y_1, \cdots, y_m)$$

is the unique solution to the above system of equations near $y_0$.


Also

$$\begin{pmatrix} \frac{\partial \varphi_1(y)}{\partial y_1} & \cdots & \frac{\partial \varphi_1(y)}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial \varphi_n(y)}{\partial y_1} & \cdots & \frac{\partial \varphi_n(y)}{\partial y_m} \end{pmatrix} = - \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix}^{-1} \begin{pmatrix} \frac{\partial f_1(\varphi(y),y)}{\partial y_1} & \cdots & \frac{\partial f_1(\varphi(y),y)}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n(\varphi(y),y)}{\partial y_1} & \cdots & \frac{\partial f_n(\varphi(y),y)}{\partial y_m} \end{pmatrix}$$

The Implicit Function Theorem is the idea behind the Lagrange method.


Implicit Function Theorem

Example

We want to investigate the behaviour of $u$ and $v$ in terms of $x$ and $y$ in the neighbourhood of $(x, y, u, v) = (2, -1, 2, 1)$ for the following equations

$$x^2 - y^2 - u^3 + v^2 + 4 = 0$$
$$2xy + y^2 - 2u^2 + 3v^4 + 8 = 0$$

Let

$$f = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} x^2 - y^2 - u^3 + v^2 + 4 \\ 2xy + y^2 - 2u^2 + 3v^4 + 8 \end{pmatrix}$$

then

$$\begin{pmatrix} \frac{\partial f_1}{\partial u} & \frac{\partial f_1}{\partial v} \\ \frac{\partial f_2}{\partial u} & \frac{\partial f_2}{\partial v} \end{pmatrix} = \begin{pmatrix} -3u^2 & 2v \\ -4u & 12v^3 \end{pmatrix}$$


Implicit Function Theorem
and

$$\begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix} = \begin{pmatrix} 2x & -2y \\ 2y & 2x + 2y \end{pmatrix}$$

Hence

$$\begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{pmatrix} = - \begin{pmatrix} -3u^2 & 2v \\ -4u & 12v^3 \end{pmatrix}^{-1} \begin{pmatrix} 2x & -2y \\ 2y & 2x + 2y \end{pmatrix} = \frac{1}{8uv - 36u^2v^3} \begin{pmatrix} -12v^3 & 2v \\ -4u & 3u^2 \end{pmatrix} \begin{pmatrix} 2x & -2y \\ 2y & 2x + 2y \end{pmatrix}$$

That is, for example,

$$\frac{\partial u}{\partial x} = \frac{1}{8uv - 36u^2v^3}(-24xv^3 + 4vy)$$
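A quick numerical sanity check of this example, assuming numpy is available: the point $(2, -1, 2, 1)$ satisfies both equations, and the implicit-derivative matrix $-A^{-1}B$ can be compared against the closed form above.

```python
import numpy as np

x, y, u, v = 2.0, -1.0, 2.0, 1.0

# Jacobians of (f1, f2) w.r.t. (u, v) and w.r.t. (x, y) at the point
A = np.array([[-3 * u**2, 2 * v],
              [-4 * u,    12 * v**3]])
B = np.array([[2 * x, -2 * y],
              [2 * y, 2 * x + 2 * y]])

D = -np.linalg.solve(A, B)  # [[du/dx, du/dy], [dv/dx, dv/dy]]
print(D[0, 0])  # du/dx = 0.40625
print((-24 * x * v**3 + 4 * v * y) / (8 * u * v - 36 * u**2 * v**3))  # same
```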


Taylor Series

The Taylor Series expansion of the function $f(x)$ in a neighbourhood of the value $x = x_0$ is

$$f(x_1) = f(x_0) + \sum_{k=1}^{n-1} \frac{f^{(k)}(x_0)(x_1 - x_0)^k}{k!} + R_n$$

where $R_n = f^{(n)}(\xi)(x_1 - x_0)^n / n!$ and $\xi$ lies between $x_0$ and $x_1$. The function is assumed to possess derivatives up to the $n$th order.

Note $R_n$ can be made as small as one wishes by taking $n$ large.

The Taylor Series is useful as a linear (or higher-order) approximation of functions.


Taylor Series

Example

By Taylor Series expansion,

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{n-1}}{(n-1)!} + R_n$$

where $R_n = \frac{e^{\xi} x^n}{n!}$ and $\xi$ lies between $0$ and $x$.
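A short check of this expansion: summing the first $n$ terms and comparing against math.exp (the truncation point $n = 10$ is an arbitrary illustrative choice).

```python
import math

def taylor_exp(x, n=10):
    """Partial sum 1 + x + x^2/2! + ... + x^(n-1)/(n-1)!."""
    return sum(x**k / math.factorial(k) for k in range(n))

x = 1.5
print(taylor_exp(x), math.exp(x))  # both close to 4.4817
```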


Integration



Integration is commonly used in Probability and Statistics.

Informally, it is a measure of the area under a function.

Geometric representation of the integral:


Fundamental Theorem of Calculus

If the function $f(x)$ is continuous on the closed interval $[a, b]$ and if $F(x)$ is any antiderivative (indefinite integral) of $f(x)$, then

$$\int_a^b f(x)dx = F(b) - F(a)$$

where $F(b)$ is the antiderivative of $f(x)$ at the point $x = b$ and $F(a)$ is the antiderivative of $f(x)$ at the point $x = a$. The expression $[F(b) - F(a)]$ is often denoted as $[F(x)]_a^b$.

The Fundamental Theorem of Calculus establishes the relationship between the derivative and the anti-derivative (integral): if $f(x) = \frac{dF(x)}{dx}$, then $\int f(x)dx = F(x) + C$.
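A numerical illustration of the theorem, assuming scipy is available: the definite integral of $f(x) = 3x^2$ over $[1, 2]$ computed by quadrature should match $F(2) - F(1)$ with $F(x) = x^3$.

```python
from scipy.integrate import quad

f = lambda x: 3 * x**2   # f = F' with F(x) = x**3
F = lambda x: x**3

a, b = 1.0, 2.0
numeric, _ = quad(f, a, b)    # adaptive quadrature
print(numeric, F(b) - F(a))   # both 7.0
```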


Rules of Integration

$$\int x^n dx = \frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$

$$\int [f(x) \pm g(x)]dx = \int f(x)dx \pm \int g(x)dx$$

$$\int kf(x)dx = k \int f(x)dx$$

$$\int e^x dx = e^x + C$$

$$\int \frac{1}{x}dx = \ln|x| + C$$


Properties of Integration

$$\int_a^c f(x)dx = \int_a^b f(x)dx + \int_b^c f(x)dx$$

$$\int_a^a f(x)dx \equiv \lim_{c \to a} \int_a^c f(x)dx = 0$$

$$\int_a^c f(x)dx = -\int_c^a f(x)dx$$


Techniques of Integration

Substitution

If $F(u)$ is the antiderivative of $f(u)$ and $u = g(x)$, then

$$\int f(u)\frac{du}{dx}dx = \int f(g(x))g'(x)dx = \int f(u)du = F(u) + C$$

Example

$$\int (x^3 + e^x)(3x^2 + e^x)dx$$

Let $u = x^3 + e^x$, then $\frac{du}{dx} = \frac{d(x^3 + e^x)}{dx} = 3x^2 + e^x$, so we have

$$\int u\frac{du}{dx}dx = \int u\,du = \frac{u^2}{2} + C = \frac{(x^3 + e^x)^2}{2} + C$$
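The substitution example can be verified symbolically, assuming sympy is installed (sympy returns an antiderivative without the constant $C$):

```python
import sympy as sp

x = sp.symbols('x')
integrand = (x**3 + sp.exp(x)) * (3 * x**2 + sp.exp(x))
antideriv = sp.integrate(integrand, x)
# Differs from (x**3 + exp(x))**2 / 2 only by a constant (here 0):
print(sp.simplify(antideriv - (x**3 + sp.exp(x))**2 / 2))
```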


Techniques of Integration

Integration by Parts

Suppose that $u = f(x)$ and $v = g(x)$ are continuously differentiable functions, then

$$\int v\,du = uv - \int u\,dv$$

Example

$$\int xe^x dx$$

Let $u = e^x$ and $v = x$, then $\frac{du}{dx} = e^x \Rightarrow du = e^x dx$ and $\frac{dv}{dx} = 1 \Rightarrow dv = dx$, so

$$\int xe^x dx = e^x x - \int e^x dx = e^x(x - 1) + C$$


Fubini’s Theorem

Fubini's Theorem. Let $f(x, y)$ be continuous on a compact interval $I = [a, b] \times [c, d]$ where $x \in [a, b]$ and $y \in [c, d]$. Then

$$\int_{[a,b] \times [c,d]} f(x, y)d(x, y) = \int_a^b \left( \int_c^d f(x, y)dy \right) dx = \int_c^d \left( \int_a^b f(x, y)dx \right) dy$$

Useful for changing the order of integration.


Fubini’s Theorem

Example

For $A = [0, 1] \times [0, 1]$,

$$\int_A xe^{xy} dxdy = \int_0^1 \left( \int_0^1 xe^{xy} dy \right) dx = \int_0^1 \left( \int_0^x e^z dz \right) dx = \int_0^1 (e^x - 1)dx = e - 2$$
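A numerical check of the Fubini example, assuming scipy is available (dblquad integrates over $y$ first, then $x$, mirroring the inner/outer order above):

```python
import numpy as np
from scipy.integrate import dblquad

# dblquad(f, a, b, g, h) integrates f(y, x) with x in [a, b], y in [g(x), h(x)]
val, err = dblquad(lambda y, x: x * np.exp(x * y), 0, 1,
                   lambda x: 0, lambda x: 1)
print(val, np.e - 2)  # both ~0.71828
```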


Leibniz’s Rule

Leibniz's Rule. If $f(x, \theta)$, $a(\theta)$ and $b(\theta)$ are differentiable with respect to $\theta$, then

$$\frac{d}{d\theta} \int_{a(\theta)}^{b(\theta)} f(x, \theta)dx = f(b(\theta), \theta)\frac{d}{d\theta}b(\theta) - f(a(\theta), \theta)\frac{d}{d\theta}a(\theta) + \int_{a(\theta)}^{b(\theta)} \frac{\partial f(x, \theta)}{\partial \theta}dx$$

If $a(\theta)$ and $b(\theta)$ are constants, then we have

$$\frac{d}{d\theta} \int_a^b f(x, \theta)dx = \int_a^b \frac{\partial f(x, \theta)}{\partial \theta}dx$$

Useful for bringing the differentiation inside the integral. Also useful for finding integrals by differentiating first.


Leibniz’s Rule

Example

$$\int_0^1 \frac{x^\alpha - 1}{\ln x}dx \quad (\alpha \geq 0)$$

Let $F(\alpha) = \int_0^1 \frac{x^\alpha - 1}{\ln x}dx$. Differentiating both sides with respect to $\alpha$,

$$F'(\alpha) = \frac{d}{d\alpha}\int_0^1 \frac{x^\alpha - 1}{\ln x}dx = \int_0^1 \frac{\partial}{\partial \alpha}\frac{x^\alpha - 1}{\ln x}dx = \int_0^1 x^\alpha dx = \frac{1}{\alpha + 1}$$

Integrating both sides with respect to $\alpha$, we get $F(\alpha) = \ln(\alpha + 1) + C$. Since $F(0) = 0$, we have $C = 0$. So $F(\alpha) = \ln(\alpha + 1)$, that is

$$\int_0^1 \frac{x^\alpha - 1}{\ln x}dx = \ln(\alpha + 1) \quad (\text{for } \alpha \geq 0)$$
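This closed form is easy to spot-check numerically, assuming scipy is available; the integrand has removable singularities at $x = 0$ and $x = 1$, so the sketch below nudges the endpoints inward.

```python
import numpy as np
from scipy.integrate import quad

alpha = 2.0
f = lambda x: (x**alpha - 1) / np.log(x)
val, _ = quad(f, 1e-12, 1 - 1e-12)   # avoid the removable endpoints
print(val, np.log(alpha + 1))        # both ~1.0986
```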


Multi-Variate Calculus



Partial Derivative

The partial derivative of a function $y = f(x_1, x_2, \cdots, x_n)$ with respect to the variable $x_i$ is

$$\frac{\partial f}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \cdots, x_i + \Delta x_i, \cdots, x_n) - f(x_1, \cdots, x_i, \cdots, x_n)}{\Delta x_i}$$

Useful for finding the rate of change with respect to one variable keeping all others constant.


Partial Derivative

Example

Let

$$f(x_1, x_2) = Ax_1^\alpha x_2^\beta$$

then

$$\frac{\partial f}{\partial x_1} = \alpha A x_1^{\alpha-1} x_2^\beta$$

$$\frac{\partial f}{\partial x_2} = \beta A x_1^\alpha x_2^{\beta-1}$$


Young’s Theorem

Young's Theorem. If $f(x) : \mathbb{R}^n \to \mathbb{R}$ has continuous second partial derivatives, then the order of differentiation in computing the cross-partials is irrelevant, that is, for $i \neq j$

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$$

Useful for swapping the order of differentiation.

Corollary: The Hessian matrix is symmetric for functions with continuous second partial derivatives.


Directional Derivative

Let $f : \mathbb{R}^N \to \mathbb{R}^M$. Let $x_0 \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Then the directional derivative $\frac{\partial f}{\partial v}$ is defined as

$$\frac{\partial f}{\partial v} = \lim_{h \to 0, h \neq 0} \frac{f(x_0 + hv) - f(x_0)}{h}$$

Note that for $f : \mathbb{R}^N \to \mathbb{R}^M$ differentiable at $x_0$ and any vector $v \in \mathbb{R}^N$, we have

$$\frac{\partial f}{\partial v} = (Df(x_0))(v)$$

Useful for finding the rate of change with respect to a vector, e.g. a consumption bundle.


Directional Derivative

Example

Let $f(x, y) = xy$. Let $v_1 = (1, 1)$ and $v_2 = (2, 2)$.

Then the directional derivatives are given by

$$\frac{\partial f}{\partial v_1} = x + y$$

$$\frac{\partial f}{\partial v_2} = 2x + 2y$$

Note that normally the directional derivatives are computed in terms of unit vectors, i.e. $||v|| = 1$.
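A finite-difference check of this example (the step size $h$ and the evaluation point are arbitrary illustrative choices):

```python
import numpy as np

def directional_derivative(f, x0, v, h=1e-6):
    """Symmetric difference quotient for the directional derivative at x0."""
    v = np.asarray(v, dtype=float)
    return (f(x0 + h * v) - f(x0 - h * v)) / (2 * h)

f = lambda p: p[0] * p[1]          # f(x, y) = xy
x0 = np.array([3.0, 2.0])
print(directional_derivative(f, x0, [1, 1]))  # ~5  = x + y
print(directional_derivative(f, x0, [2, 2]))  # ~10 = 2x + 2y
```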


References

Hoy et al. (2011). Mathematics for Economics (3rd ed.). London: MIT Press.
Loomis, L. H. and Sternberg, S. (1990). Advanced Calculus. Boston: Jones and Bartlett Publishers.
Hallam, A. (2005). Retrieved from http://www2.econ.iastate.edu/classes/econ500/hallam/
Royster, D. C. (2009). Retrieved from www.ms.uky.edu/~droyster/courses/fall98/math4080/classnotes/
Yu, X. (2013). Retrieved from https://www.mathualberta.ca/~xinweiyu/217.1.13f/
Hastings, S. (2011). Retrieved from www.math.pitt.edu/~sph/1540/
Pervin, W. J. (2012). Retrieved from www.utdallas.edu/~pervin/ENGR3300/


Optimization

Math Camp

ECMT

July 17, 2018



Overview

1 Set Theory

2 Unconstrained Optimization

3 Constrained Optimization
Lagrange Method
Envelope Theorem
Kuhn-Tucker Theorem



Set Theory



Definitions

A set $S$ is a collection of elements that possess a certain property $P(x)$, written

$$S = \{x : P(x)\}$$

If $x$ is an element of set $S$, we write

$$x \in S$$

If $x$ is not an element of set $S$, we write

$$x \notin S$$

Example of a set

$$S = \{x : x \text{ is a positive integer and } x \leq 5\} = \{1, 2, 3, 4, 5\}$$


Number Sets

Number sets:
Natural numbers $\mathbb{N} = \{1, 2, 3, ...\}$
Integers $\mathbb{Z} = \{..., -2, -1, 0, 1, 2, ...\}$
Positive integers $\mathbb{Z}_+ = \{1, 2, 3, ...\}$
Negative integers $\mathbb{Z}_- = \{..., -3, -2, -1\}$
Rational numbers $\mathbb{Q} = \{\frac{p}{q} : p \in \mathbb{Z} \text{ and } q \in \mathbb{Z}, q \neq 0\}$
Real numbers $\mathbb{R} = (-\infty, \infty)$
Positive real numbers $\mathbb{R}_+ = [0, \infty)$
Strictly positive real numbers $\mathbb{R}_{++} = (0, \infty)$
Negative real numbers $\mathbb{R}_- = (-\infty, 0]$
Strictly negative real numbers $\mathbb{R}_{--} = (-\infty, 0)$


Sets and Subsets

If all the elements of set X are also elements of set Y , then X is a subset
of Y , written
X ⊆Y
If all the elements of set X are in set Y , but not all elements of set Y are
in set X , then X is a proper subset of Y , written

X ⊂Y

If X ⊆ Y and Y ⊆ X , then X and Y contain exactly the same elements,


i.e. they are equal
X =Y
The universal set U is the set that contains all the elements of every
possible set.

The empty set or the null set is the set with no elements, written ∅
Sets and Subsets

The intersection W of two sets X and Y is the set of elements that are in
both X and Y

W = X ∩ Y = {x : x ∈ X and x ∈ Y }

The union V of two sets X and Y is the set of elements that are in one or
other of the sets

V = X ∪ Y = {x : x ∈ X or x ∈ Y }

Example given X = {1, 2, 3} and Y = {3, 4, 5}, have

X ∩ Y = {3}

X ∪ Y = {1, 2, 3, 4, 5}



Sets and Subsets
The complement $X^C$ of set $X$ is the set of elements of the universal set $U$ that are not elements of $X$

$$X^C = \{x \in U : x \notin X\}$$

Note: $U^C = \emptyset$ and $\emptyset^C = U$

The relative difference $X - Y$ of $X$ and $Y$ is the set of elements of $X$ that are not also in $Y$

$$X - Y = \{x \in U : x \in X \text{ and } x \notin Y\}$$

Note: $X - Y = X \cap Y^C$

Example: given $U = \{1, 2, 3, 4, 5\}$, $X = \{1, 2, 3\}$ and $Y = \{3, 4, 5\}$,

$$X^C = \{4, 5\}$$
$$X - Y = \{1, 2\}$$
Sets and Subsets
Sets $X_1, X_2, ..., X_n$ form a partition of the universal set $U$ if $X_1, X_2, ..., X_n$ are pairwise disjoint and their union is $U$, that is

$$\bigcup_{i=1}^n X_i = U \quad \text{and} \quad X_i \cap X_j = \emptyset \text{ for } i, j = 1, ..., n \text{ and } i \neq j$$

Example: given $U = \{1, 2, 3, 4, 5\}$, then $X_1 = \{1, 2\}$, $X_2 = \{3, 4\}$, $X_3 = \{5\}$ is a partition.

The power set $\mathcal{P}(X)$ of a set $X$ is the set of all subsets of $X$

$$\mathcal{P}(X) = \{A : A \subseteq X\}$$

Example: if $X = \{1, 2, 3\}$, then

$$\mathcal{P}(X) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$$


Intervals and Euclidean Distance

Closed interval $[a, b] = \{x \in \mathbb{R} : a \leq x \leq b\}$
Half-open interval $(a, b] = \{x \in \mathbb{R} : a < x \leq b\}$
Half-open interval $[a, b) = \{x \in \mathbb{R} : a \leq x < b\}$
Open interval $(a, b) = \{x \in \mathbb{R} : a < x < b\}$

The Euclidean distance $d(a, b)$ between points $a = (a_1, ..., a_n)$ and $b = (b_1, ..., b_n)$ in $\mathbb{R}^n$ is

$$d(a, b) = \sqrt{\sum_{i=1}^n (a_i - b_i)^2}$$

Example

$$d[(2, 3, 4), (4, 1, -5)] = \sqrt{(2 - 4)^2 + (3 - 1)^2 + (4 - (-5))^2} = \sqrt{89}$$


Closed and Bounded Sets

An $\epsilon$-neighbourhood $N_\epsilon(x_0)$ of a point $x_0 \in \mathbb{R}^n$ is the set of points lying within a distance of $\epsilon$ of $x_0$

$$N_\epsilon(x_0) = \{x \in \mathbb{R}^n : d(x_0, x) < \epsilon\}$$

A set $X \subset \mathbb{R}^n$ is open if, for every $x \in X$, there exists an $\epsilon$ such that $N_\epsilon(x) \subset X$.

A boundary point of a set $X \subset \mathbb{R}^n$ is a point $x_0$ such that every $\epsilon$-neighbourhood $N_\epsilon(x_0)$ contains points that are in $X$ and points that are not in $X$.

A set $X \subset \mathbb{R}^n$ is closed if its complement $X^C \subset \mathbb{R}^n$ is an open set.

A set $X \subset \mathbb{R}^n$ is bounded if, for every $x_0 \in X$, there exists an $\epsilon < \infty$ such that $X \subset N_\epsilon(x_0)$.
Convex Sets
A convex combination of two points $x, x' \in \mathbb{R}^n$ is any point $\bar{x} \in \mathbb{R}^n$ given by

$$\bar{x} = \lambda x + (1 - \lambda)x'$$

for some $\lambda \in [0, 1]$.

A set $X \subset \mathbb{R}^n$ is convex if for every pair of points $x, x' \in X$, and any $\lambda \in [0, 1]$, the point $\bar{x} = \lambda x + (1 - \lambda)x'$ also belongs to the set $X$.

An interior point of a set $X \subset \mathbb{R}^n$ is a point $x_0 \in X$ for which there exists an $\epsilon$ such that $N_\epsilon(x_0) \subset X$.

A set $X \subset \mathbb{R}^n$ is strictly convex if, for every pair of points $x, x' \in X$, and every $\lambda \in (0, 1)$, $\bar{x} = \lambda x + (1 - \lambda)x'$ is an interior point of $X$.
Unconstrained Optimization



Global/Local Maximum/Minimum

At a global maximum $x^*$, we have for all $x$

$$f(x^*) \geq f(x)$$

At a local maximum $\hat{x}$, we have for all $x$ where $\hat{x} - \epsilon \leq x \leq \hat{x} + \epsilon$

$$f(\hat{x}) \geq f(x)$$

At a global minimum $x^*$, we have for all $x$

$$f(x^*) \leq f(x)$$

At a local minimum $\hat{x}$, we have for all $x$ where $\hat{x} - \epsilon \leq x \leq \hat{x} + \epsilon$

$$f(\hat{x}) \leq f(x)$$


First and Second-Order Conditions

If the differentiable function $f$ takes an extreme value (maximum or minimum) at a point $x^*$, then $f'(x^*) = 0$.

For a differentiable function $f$, a point $x^*$ at which $f'(x^*) = 0$ yields a stationary value of the function. Such stationary values may be extreme values or points of inflection. Every extreme value of a function is a stationary value, but not every stationary value need be an extreme value.

If $f'(x^*) = 0$ and $f''(x^*) > 0$, then $f$ has a local minimum at $x^*$

If $f'(x^*) = 0$ and $f''(x^*) < 0$, then $f$ has a local maximum at $x^*$


First and Second-Order Conditions

For matrices,

If $\nabla f(X^*) = 0$ and $H(f)(X^*)$ is positive definite, then $f$ has a local minimum at $X^*$.
If $\nabla f(X^*) = 0$ and $H(f)(X^*)$ is negative definite, then $f$ has a local maximum at $X^*$.

Note a matrix $A$ is positive definite if the $k$-th leading principal minors $A_k$ are such that $|A_1| > 0$, $|A_2| > 0$, $|A_3| > 0$, $|A_4| > 0$, and so on.
Similarly, a matrix $A$ is negative definite if the $k$-th leading principal minors $A_k$ are such that $|A_1| < 0$, $|A_2| > 0$, $|A_3| < 0$, $|A_4| > 0$, and so on.


First and Second-Order Conditions

Example

Given

$$f(x_1, x_2, x_3) = 5x_1^2 + 2x_2^2 + x_3^4 - 32x_3 + 6x_1x_2 + 5x_2$$

Solving $\nabla f(X^*) = 0$,

$$\nabla f = \begin{pmatrix} 10x_1 + 6x_2 \\ 6x_1 + 4x_2 + 5 \\ 4x_3^3 - 32 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

which gives

$$X^* = \begin{pmatrix} 7.5 \\ -12.5 \\ 2 \end{pmatrix}$$


First and Second-Order Conditions

The Hessian is given by

$$H = \begin{pmatrix} 10 & 6 & 0 \\ 6 & 4 & 0 \\ 0 & 0 & 12x_3^2 \end{pmatrix}$$

with

$$|H_1| = |10| = 10$$

$$|H_2| = \begin{vmatrix} 10 & 6 \\ 6 & 4 \end{vmatrix} = 4$$

$$|H_3| = \begin{vmatrix} 10 & 6 & 0 \\ 6 & 4 & 0 \\ 0 & 0 & 12(2)^2 \end{vmatrix} = 192$$

Hence $H(f)(X^*)$ is positive definite and $f$ has a local minimum at $X^*$.
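A numerical cross-check of this example, assuming scipy is available: a quasi-Newton minimizer started near the stationary point should recover $X^* = (7.5, -12.5, 2)$.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    x1, x2, x3 = x
    return 5*x1**2 + 2*x2**2 + x3**4 - 32*x3 + 6*x1*x2 + 5*x2

res = minimize(f, x0=[1.0, 1.0, 1.0])  # default BFGS
print(res.x)  # ~[ 7.5, -12.5,  2. ]

# Leading principal minors of the Hessian at the optimum are all positive:
H = np.array([[10, 6, 0], [6, 4, 0], [0, 0, 12 * res.x[2]**2]])
print([np.linalg.det(H[:k, :k]) for k in (1, 2, 3)])  # ~[10, 4, 192]
```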


Matrix Differentiation Identities

$$\frac{\partial x^T b}{\partial x} = \frac{\partial b^T x}{\partial x} = b$$

$$\frac{\partial Ax}{\partial x} = \frac{\partial x^T A}{\partial x} = A$$

$$\frac{\partial y^T Ax}{\partial x} = \frac{\partial x^T A^T y}{\partial x} = A^T y$$

$$\frac{\partial x^T Ax}{\partial x} = (A + A^T)x$$

$$\frac{\partial^2 x^T Ax}{\partial x \partial x^T} = A + A^T$$


Matrix Differentiation Identities

$$\frac{\partial a^T X b}{\partial X} = ab^T$$

$$\frac{\partial a^T X^n b}{\partial X} = \sum_{r=0}^{n-1} (X^r)^T ab^T (X^{n-1-r})^T$$

$$\frac{\partial a^T X^T b}{\partial X} = ba^T$$

$$\frac{\partial a^T X a}{\partial X} = \frac{\partial a^T X^T a}{\partial X} = aa^T$$

$$\frac{\partial a^T X^T X b}{\partial X} = X(ab^T + ba^T)$$

$$\frac{\partial a^T (X^n)^T X^n b}{\partial X} = \sum_{r=0}^{n-1} \left[ X^{n-1-r} ab^T (X^n)^T X^r + (X^r)^T X^n ab^T (X^{n-1-r})^T \right]$$


Weierstrass's Theorem

If $f$ is a continuous function, and $X$ is a nonempty, closed and bounded set, then $f$ has both a maximum and a minimum on $X$.


Constrained Optimization



Lagrange Method
Let $(x_1^*, x_2^*)$ be a solution to the constrained maximization problem

$$\max f(x_1, x_2) \quad \text{s.t.} \quad g(x_1, x_2) = 0$$

then the Lagrange method of finding $(x_1^*, x_2^*)$ consists of forming the Lagrange function

$$L(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda g(x_1, x_2)$$

and solving the following first-order conditions for the stationary point(s) of the Lagrange function

$$\frac{\partial L}{\partial x_1} = 0, \quad \frac{\partial L}{\partial x_2} = 0, \quad \frac{\partial L}{\partial \lambda} = 0$$
Lagrange Method
Example: given

$$\max y = x_1^{0.25} x_2^{0.75} \quad \text{s.t.} \quad 100 - 2x_1 - 4x_2 = 0$$

the Lagrange function is

$$L = x_1^{0.25} x_2^{0.75} + \lambda(100 - 2x_1 - 4x_2)$$

First-order conditions

$$\frac{\partial L}{\partial x_1} = 0.25x_1^{-0.75} x_2^{0.75} - 2\lambda = 0$$

$$\frac{\partial L}{\partial x_2} = 0.75x_1^{0.25} x_2^{-0.25} - 4\lambda = 0$$

$$\frac{\partial L}{\partial \lambda} = 100 - 2x_1 - 4x_2 = 0$$

which gives

$$x_1^* = \frac{600}{48} = 12.5 \quad \text{and} \quad x_2^* = \frac{300}{16} = 18.75$$
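A numerical check of this example, assuming scipy is available: maximization is done by minimizing $-f$ subject to the budget constraint (SLSQP handles equality constraints).

```python
from scipy.optimize import minimize

objective = lambda x: -(x[0]**0.25 * x[1]**0.75)   # maximize f = minimize -f
budget = {'type': 'eq', 'fun': lambda x: 100 - 2*x[0] - 4*x[1]}

res = minimize(objective, x0=[10.0, 10.0], method='SLSQP',
               constraints=[budget], bounds=[(1e-6, None)] * 2)
print(res.x)  # ~[12.5, 18.75]
```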
Lagrangian Method for Multiple Constraints

In the constrained maximization problem

$$\max f(x_1, \cdots, x_n) \quad \text{s.t.} \quad g^1(x_1, \cdots, x_n) = 0, \cdots, g^m(x_1, \cdots, x_n) = 0$$

where $m < n$, if $x^*$ is a solution to the problem, and if the $n \times m$ matrix $G = \left[ \frac{\partial}{\partial x_i} g^j(x_1^*, \cdots, x_n^*) \right]$ has rank $m$, then there exist real numbers $\lambda_1, \ldots, \lambda_m$ such that $(x_1^*, \cdots, x_n^*)$ satisfies the $n + m$ conditions

$$\frac{\partial}{\partial x_i} f(x_1^*, \cdots, x_n^*) + \sum_j \lambda_j \frac{\partial}{\partial x_i} g^j(x_1^*, \cdots, x_n^*) = 0$$

$$g^j(x_1^*, \cdots, x_n^*) = 0$$

where $i = 1, \cdots, n$ and $j = 1, \cdots, m$


Quasiconcavity

A level set of the function $y = f(x_1, x_2, ..., x_n)$ is the set

$$L = \{(x_1, ..., x_n) \in \mathbb{R}^n : f(x_1, x_2, ..., x_n) = c\}$$

The better or upper contour set of the point $(x_1^0, x_2^0, ..., x_n^0)$ is

$$B(x_1^0, x_2^0, ..., x_n^0) = \{(x_1, ..., x_n) \in X : f(x_1, ..., x_n) \geq f(x_1^0, x_2^0, ..., x_n^0)\}$$

A function $f$ with domain $X \subseteq \mathbb{R}^n$ is quasiconcave if for every point in $X$, the better set $B$ of that point is a convex set. It is strictly quasiconcave if $B$ is strictly convex.


Quasiconvexity

The worse or lower contour set of the point $(x_1^0, x_2^0, ..., x_n^0)$ is

$$W(x_1^0, x_2^0, ..., x_n^0) = \{(x_1, ..., x_n) \in X : f(x_1, ..., x_n) \leq f(x_1^0, x_2^0, ..., x_n^0)\}$$

A function $f$ with domain $X \subseteq \mathbb{R}^n$ is quasiconvex if for every point in $X$, the worse set $W$ of that point is a convex set. It is strictly quasiconvex if $W$ is strictly convex.


Global Optimum

In the constrained maximization problem

max f (x1 , · · · , xn ) s.t. g 1 (x1 , · · · , xn ) = 0, · · · , g m (x1 , · · · , xn ) = 0

if the function f is quasiconcave, and the functions g 1 , · · · , g m are all


quasiconvex, then any locally optimal solution to the problem is also
globally optimal.



Global Optimum

In the constrained maximization problem

$$\max f(x_1, \cdots, x_n) \quad \text{s.t.} \quad g^1(x_1, \cdots, x_n) = 0, \cdots, g^m(x_1, \cdots, x_n) = 0$$

if $f$ and $g$ are increasing functions of $x = (x_1, \cdots, x_n)$, and if either

(i) $f$ is strictly quasiconcave and the functions $g^j$ for $j = 1, \cdots, m$ are all quasiconvex

or

(ii) $f$ is quasiconcave and the functions $g^j$ for $j = 1, \cdots, m$ are all strictly quasiconvex

then a locally optimal solution is unique and also globally optimal.


General Method

Given a model of $n$ equations

$$f(x_1, \cdots, x_n; \alpha_1, \cdots, \alpha_m) = \begin{pmatrix} f_1(x_1, \cdots, x_n; \alpha_1, \cdots, \alpha_m) \\ \vdots \\ f_n(x_1, \cdots, x_n; \alpha_1, \cdots, \alpha_m) \end{pmatrix} = 0$$

where $(x_1, \cdots, x_n)$ are endogenous variables whose values the model is designed to explain, and $(\alpha_1, \cdots, \alpha_m)$ are exogenous variables whose values are taken as given from outside the model.

The functions are assumed to have continuous partial derivatives up to the $r$th order over some open sets of points in $\mathbb{R}^{n+m}$.


General Method
Let $F$ be the Jacobian $J(f)$ with respect to the $x_i$'s

$$F = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix}$$

Hence for $(x_1^*, \cdots, x_n^*; \alpha_1^0, \cdots, \alpha_m^0)$ a point satisfying the model of $n$ equations, we have in some neighbourhood of $(x_1^*, \cdots, x_n^*; \alpha_1^0, \cdots, \alpha_m^0)$, for $j = 1, \cdots, m$,

$$\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} \begin{pmatrix} \frac{\partial x_1^*}{\partial \alpha_j} \\ \frac{\partial x_2^*}{\partial \alpha_j} \\ \vdots \\ \frac{\partial x_n^*}{\partial \alpha_j} \end{pmatrix} = \begin{pmatrix} -\frac{\partial f_1}{\partial \alpha_j} \\ -\frac{\partial f_2}{\partial \alpha_j} \\ \vdots \\ -\frac{\partial f_n}{\partial \alpha_j} \end{pmatrix}$$


General Method

Assume that the determinant $|F| \neq 0$.

By Cramer's Rule

$$\frac{\partial x_i^*}{\partial \alpha_j} = \frac{|F_{ij}|}{|F|}$$

where $F_{ij}$ is given by replacing the $i$th column of $F$ by the $j$th column of the Jacobian $J$ of $f$ with respect to the $\alpha_j$'s

$$J = \begin{pmatrix} -\frac{\partial f_1}{\partial \alpha_1} & -\frac{\partial f_1}{\partial \alpha_2} & \cdots & -\frac{\partial f_1}{\partial \alpha_m} \\ -\frac{\partial f_2}{\partial \alpha_1} & -\frac{\partial f_2}{\partial \alpha_2} & \cdots & -\frac{\partial f_2}{\partial \alpha_m} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{\partial f_n}{\partial \alpha_1} & -\frac{\partial f_n}{\partial \alpha_2} & \cdots & -\frac{\partial f_n}{\partial \alpha_m} \end{pmatrix}$$


General Method

Example

Given

$$\max u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = m$$

applying the Lagrangian method, we get the following first-order conditions

$$f_1(x_1^*, x_2^*, \lambda^*) = \frac{\partial u}{\partial x_1} - \lambda^* p_1 = 0$$

$$f_2(x_1^*, x_2^*, \lambda^*) = \frac{\partial u}{\partial x_2} - \lambda^* p_2 = 0$$

$$f_3(x_1^*, x_2^*, \lambda^*) = m - p_1 x_1^* - p_2 x_2^* = 0$$


General Method

Ordering the endogenous variables from first to last as $x_1^*, x_2^*, \lambda^*$, we get

$$|F| = \begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ -p_1 & -p_2 & 0 \end{vmatrix}$$

Ordering the exogenous variables from first to last as $p_1, p_2, m$, we get

$$J = \begin{pmatrix} \lambda^* & 0 & 0 \\ 0 & \lambda^* & 0 \\ x_1^* & x_2^* & -1 \end{pmatrix}$$

We can then solve for the partial derivatives at equilibrium.


General Method


$$\frac{\partial x_1}{\partial p_1} = \frac{|F_{11}|}{|F|} = \frac{1}{|F|}\begin{vmatrix} \lambda^* & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ 0 & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ x_1^* & -p_2 & 0 \end{vmatrix}$$

$$\frac{\partial x_2}{\partial p_2} = \frac{|F_{22}|}{|F|} = \frac{1}{|F|}\begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & 0 & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & \lambda^* & -p_2 \\ -p_1 & x_2^* & 0 \end{vmatrix}$$


General Method


$$\frac{\partial x_1}{\partial m} = \frac{|F_{13}|}{|F|} = \frac{1}{|F|}\begin{vmatrix} 0 & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ 0 & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ -1 & -p_2 & 0 \end{vmatrix}$$

$$\frac{\partial x_2}{\partial m} = \frac{|F_{23}|}{|F|} = \frac{1}{|F|}\begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & 0 & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & 0 & -p_2 \\ -p_1 & -1 & 0 \end{vmatrix}$$


Envelope Theorem
General idea:

Consider the maximization problem

$$\max f(x_1, x_2; \alpha) \quad \text{s.t.} \quad g(x_1, x_2; \alpha) = 0$$

The Lagrangian is given by

$$L(x_1, x_2; \alpha) = f(x_1, x_2; \alpha) + \lambda g(x_1, x_2; \alpha)$$

with first-order conditions

$$\frac{\partial f(x_1^*, x_2^*; \alpha)}{\partial x_1} + \lambda^* \frac{\partial g(x_1^*, x_2^*; \alpha)}{\partial x_1} = 0$$

$$\frac{\partial f(x_1^*, x_2^*; \alpha)}{\partial x_2} + \lambda^* \frac{\partial g(x_1^*, x_2^*; \alpha)}{\partial x_2} = 0$$

$$g(x_1^*, x_2^*; \alpha) = 0$$
Envelope Theorem
Expressing the solutions as a function of $\alpha$,

$$x_1^* = x_1^*(\alpha), \quad x_2^* = x_2^*(\alpha) \quad \text{and} \quad \lambda^* = \lambda^*(\alpha)$$

and substituting back into $f$, we get

$$f(x_1^*, x_2^*; \alpha) = V(\alpha)$$

Differentiating $V(\alpha)$, we have

$$\frac{dV}{d\alpha} = \frac{df}{dx_1}\frac{dx_1}{d\alpha} + \frac{df}{dx_2}\frac{dx_2}{d\alpha} + \frac{df}{d\alpha}$$

Substituting the first two FOCs,

$$\frac{dV}{d\alpha} = -\lambda^* \left( \frac{dg}{dx_1}\frac{dx_1}{d\alpha} + \frac{dg}{dx_2}\frac{dx_2}{d\alpha} \right) + \frac{df}{d\alpha}$$


Envelope Theorem

Differentiating the third FOC, we get

$$\frac{dg}{dx_1}\frac{dx_1}{d\alpha} + \frac{dg}{dx_2}\frac{dx_2}{d\alpha} + \frac{dg}{d\alpha} = 0$$

So

$$\frac{dV}{d\alpha} = \frac{df}{d\alpha} + \lambda^* \frac{dg}{d\alpha}$$

Similarly,

$$L(\alpha) = f(x_1^*(\alpha), x_2^*(\alpha); \alpha) + \lambda^*(\alpha)g(x_1^*(\alpha), x_2^*(\alpha); \alpha)$$

Differentiating, we get

$$\frac{dL}{d\alpha} = \frac{d}{dx_1}(f + \lambda^* g)\frac{dx_1}{d\alpha} + \frac{d}{dx_2}(f + \lambda^* g)\frac{dx_2}{d\alpha} + \frac{df}{d\alpha} + \frac{d\lambda^*}{d\alpha}g + \lambda^* \frac{dg}{d\alpha}$$


Envelope Theorem

Substituting the FOCs,

$$\frac{dL}{d\alpha} = 0 \cdot \frac{dx_1}{d\alpha} + 0 \cdot \frac{dx_2}{d\alpha} + \frac{df}{d\alpha} + \frac{d\lambda^*}{d\alpha} \cdot 0 + \lambda^* \frac{dg}{d\alpha} = \frac{df}{d\alpha} + \lambda^* \frac{dg}{d\alpha}$$

Hence

$$\frac{dV}{d\alpha} = \frac{dL}{d\alpha} = \frac{df}{d\alpha} + \lambda^* \frac{dg}{d\alpha}$$


Envelope Theorem
Envelope Theorem

Given

$$\max f(x_1, \cdots, x_n; \alpha_1, \cdots, \alpha_m)$$

subject to

$$g_k(x_1, \cdots, x_n; \alpha_1, \cdots, \alpha_m) = 0 \quad \text{for } k = 1, \cdots, K$$

and the corresponding value function $V(\alpha_1, \cdots, \alpha_m)$, we have

$$\frac{dV}{d\alpha_j} = \frac{dL}{d\alpha_j} = \frac{df}{d\alpha_j} + \sum_{k=1}^K \lambda_k \frac{dg_k}{d\alpha_j}$$

The Lagrange multiplier measures the rate at which the value function changes when the corresponding constraint is tightened or relaxed slightly. If a constraint is nonbinding at the optimum, so that a small tightening or relaxing of it has no effect on the solution, then the associated Lagrange multiplier will take the value zero at the optimum.
Envelope Theorem

Example

$$\max Y = p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad x_i = a_i L_i^b \text{ for } i = 1, 2, \quad L_1 + L_2 = L_0$$

for $a_i > 0$ and $0 < b < 1$.

The Lagrange function is given by

$$\mathcal{L} = p_1 a_1 L_1^b + p_2 a_2 L_2^b + \lambda(L_0 - L_1 - L_2)$$

FOCs

$$bp_i a_i L_i^{b-1} - \lambda^* = 0 \quad \text{for } i = 1, 2$$

$$L_0 - L_1 - L_2 = 0$$


Envelope Theorem

Solving,

$$L_1 = c_1 L_0 \quad \text{and} \quad L_2 = c_2 L_0$$

where

$$c_1 = \left[1 + \left(\frac{p_1 a_1}{p_2 a_2}\right)^{1/(b-1)}\right]^{-1} \quad \text{and} \quad c_2 = 1 - c_1$$

Optimized value function

$$Y^* = p_1 a_1 [c_1 L_0]^b + p_2 a_2 [c_2 L_0]^b = V(p_1, p_2, L_0)$$

then by the Envelope Theorem we have

$$\frac{\partial V}{\partial L_0} = \frac{\partial \mathcal{L}}{\partial L_0} = \lambda^*$$

$$\frac{\partial V}{\partial p_1} = \frac{\partial \mathcal{L}}{\partial p_1} = x_1^*$$


Kuhn-Tucker Conditions

Given inequality constraints

$$\max f(x_1, x_2) \quad \text{s.t.} \quad g(x_1, x_2) \geq 0 \quad \text{for } x_1, x_2 \geq 0$$

where both $f$ and $g$ are concave and differentiable, the Lagrange function

$$L(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda g(x_1, x_2)$$

is maximized w.r.t. $x_1, x_2$ and minimized w.r.t. $\lambda$ subject to $x_1, x_2, \lambda \geq 0$.


Kuhn-Tucker Conditions

The Kuhn-Tucker conditions are

$$\frac{\partial L}{\partial x_i} = \frac{\partial f}{\partial x_i}(x_1^*, x_2^*) + \lambda^* \frac{\partial g}{\partial x_i}(x_1^*, x_2^*) \leq 0 \quad \text{where } x_i^* \geq 0$$

$$x_i^* \frac{\partial L}{\partial x_i} = 0$$

$$\frac{\partial L}{\partial \lambda} = g(x_1^*, x_2^*) \geq 0 \quad \text{where } \lambda^* \geq 0$$

$$\lambda^* \frac{\partial L}{\partial \lambda} = 0$$


Kuhn-Tucker Theorem

Kuhn-Tucker Theorem

Given

$$\max f(x_1, x_2) \quad \text{s.t.} \quad g(x_1, x_2) \geq 0 \quad \text{for } x_1, x_2 \geq 0$$

if $f$ and $g$ are concave and differentiable, and if there exists a point $(x_1^0, x_2^0)$ such that $g(x_1^0, x_2^0) > 0$, then there exists a Lagrange multiplier $\lambda^*$ such that the Kuhn-Tucker conditions are both necessary and sufficient for the point $(x_1^*, x_2^*)$ to be a solution to the problem.


Kuhn-Tucker Theorem
Example

$$\max u(x_1, x_2) \quad \text{s.t.} \quad m - p_1 x_1 - p_2 x_2 \geq 0 \quad \text{and} \quad x_1, x_2 \geq 0$$

Lagrange function

$$L = u(x_1, x_2) + \lambda(m - p_1 x_1 - p_2 x_2)$$

Kuhn-Tucker Conditions

$$\frac{\partial L}{\partial x_i} = \frac{\partial u}{\partial x_i} - \lambda^* p_i \leq 0 \quad \text{where } x_i^* \geq 0$$

$$x_i^* \left(\frac{\partial u}{\partial x_i} - \lambda^* p_i\right) = 0$$

$$\frac{\partial L}{\partial \lambda} = m - p_1 x_1^* - p_2 x_2^* \geq 0 \quad \text{where } \lambda^* \geq 0$$

$$\lambda^*(m - p_1 x_1^* - p_2 x_2^*) = 0$$
Kuhn-Tucker Theorem

If $x_1^* > 0$ and $x_2^* > 0$, we get $\frac{\partial u}{\partial x_i} = \lambda^* p_i$ and

$$\frac{\partial u / \partial x_1}{\partial u / \partial x_2} = \frac{p_1}{p_2}$$

If $x_1^* > 0$ and $x_2^* = 0$, we get

$$\frac{\partial u}{\partial x_1} = \lambda^* p_1, \quad \frac{\partial u}{\partial x_2} \leq \lambda^* p_2$$

and

$$\frac{\partial u / \partial x_1}{\partial u / \partial x_2} \geq \frac{p_1}{p_2}$$


Kuhn-Tucker Theorem

Kuhn-Tucker Theorem (General Case)

Given

$$\max f(x_1, \cdots, x_n)$$

subject to

$$g_j(x_1, \cdots, x_n) \geq 0 \quad \text{for } j = 1, \cdots, m$$

if all the functions $f$ and $g_j$ are concave and differentiable, and there exists a point $(x_1^0, \cdots, x_n^0)$ such that $g_j(x_1^0, \cdots, x_n^0) > 0$ for all $j$, then there exist $m$ Lagrange multipliers $\lambda_j^*$ such that the following conditions are necessary and sufficient for the point $(x_1^*, \cdots, x_n^*)$ to be a solution to the problem.


Kuhn-Tucker Theorem

Conditions

$$\frac{\partial f(x_1^*, \cdots, x_n^*)}{\partial x_i} + \sum_j \lambda_j^* \frac{\partial g_j(x_1^*, \cdots, x_n^*)}{\partial x_i} \leq 0 \quad \text{and} \quad x_i^* \geq 0$$

$$x_i^* \left( \frac{\partial f}{\partial x_i} + \sum_j \lambda_j^* \frac{\partial g_j}{\partial x_i} \right) = 0$$

$$g_j(x_1^*, \cdots, x_n^*) \geq 0 \quad \text{and} \quad \lambda_j^* \geq 0$$

$$\lambda_j^* g_j(x_1^*, \cdots, x_n^*) = 0$$


References

Miller, R. E. (2000). Optimization. New York: Wiley-Interscience Publication.
Hoy et al. (2011). Mathematics for Economics (3rd ed.). London: MIT Press.
Petersen, K. B. and Pedersen, M. S. (2012). The Matrix Cookbook. Retrieved from https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf


Statistics

Math Camp

ECMT

July 26, 2018



Overview

1 Sequences and Series

2 Probability

3 Convergence

4 Random Variables

5 Distribution Functions

6 Estimators



Sequences and Series



Sequences

A sequence is a function whose domain is the positive integers.

Example: $f(n) = 3n - 2$ gives the sequence $1, 4, 7, ...$, that is $a_1 = 1, a_2 = 4, a_3 = 7, ...$

A sequence is said to have the limit $L$ if for any $\epsilon > 0$, however small, there is some value $N$ such that $|a_n - L| < \epsilon$ whenever $n > N$.

Such a sequence is said to be convergent and we write

$$\lim_{n \to \infty} a_n = L$$

Example:

$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$$

If a sequence has no limit, it is divergent.
Properties of Sequences

For $\lim_{n \to \infty} a_n = L_a$ convergent, $\lim_{n \to \infty} b_n = L_b$ convergent and $c$ constant, we have

$$\lim_{n \to \infty} ca_n = cL_a$$

$$\lim_{n \to \infty} (a_n \pm b_n) = L_a \pm L_b$$

$$\lim_{n \to \infty} (a_n)(b_n) = L_a L_b$$

$$\lim_{n \to \infty} a_n / b_n = L_a / L_b \quad \text{for } L_b \neq 0$$


Properties of Sequences

For $\lim_{n \to \infty} a_n = L_a$ convergent, $\lim_{n \to \infty} b_n = +\infty$ definitely divergent and $c$ constant, we have

$$\lim_{n \to \infty} cb_n = +\infty \text{ for } c > 0 \quad \text{and} \quad \lim_{n \to \infty} cb_n = -\infty \text{ for } c < 0$$

$$\lim_{n \to \infty} (a_n \pm b_n) = \pm\infty$$

$$\lim_{n \to \infty} (a_n)(b_n) = +\infty \text{ for } L_a > 0 \quad \text{and} \quad \lim_{n \to \infty} (a_n)(b_n) = -\infty \text{ for } L_a < 0$$

$$\lim_{n \to \infty} a_n / b_n = 0$$

$$\lim_{n \to \infty} c / b_n = 0$$


Monotonicity and Boundedness

A sequence is monotonically increasing if $a_1 < a_2 < a_3 < ...$ and is monotonically decreasing if $a_1 > a_2 > a_3 > ...$
A sequence is bounded if and only if it has a lower bound and an upper bound.
A monotonic sequence is convergent if and only if it is bounded.


Series

If $a_t$ is a sequence, then $s_n = \sum_{t=1}^n a_t$ is a series.

Example: $a_t = 3t - 2$ with $a_1 = 1, a_2 = 4, a_3 = 7, ...$, then

$$s_n = \sum_{t=1}^n (3t - 2)$$

where $s_n$ gives the series $1, 5, 12, ...$, that is $s_1 = 1, s_2 = 5, s_3 = 12, ...$


Properties of Series
If $s_n = \sum_{t=1}^n a_t$ is the series associated with sequence $a_t$ and

$$\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = L$$

we have:
If $L < 1$, then the series $s_n$ converges
If $L > 1$, then the series $s_n$ diverges
If $L = 1$, then the series $s_n$ may converge or diverge

Example: for the geometric series

$$s_n = \sum_{t=1}^n a\rho^{t-1} = a + a\rho + a\rho^2 + ... + a\rho^{n-1}$$

computing $\left| \frac{a_{n+1}}{a_n} \right| = \left| \frac{a\rho^n}{a\rho^{n-1}} \right| = |\rho|$ allows us to conclude that $s_n$ converges if $|\rho| < 1$ and diverges if $|\rho| > 1$. If $|\rho| = 1$, we have $a_t = a$ and $s_n = na$, which diverges for any $a \neq 0$.
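A quick numerical illustration of the geometric-series case (the values $a = 1$, $\rho = 0.5$ are illustrative, so the partial sums converge to $a/(1 - \rho) = 2$):

```python
a, rho = 1.0, 0.5
s = 0.0
for t in range(1, 51):
    s += a * rho**(t - 1)   # partial sum s_n of the geometric series

print(s, a / (1 - rho))     # both ~2.0; |rho| < 1 so s_n converges
```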
Probability



σ-algebra

The collection of subsets $A$ of the sample space $\Omega$, that is $A \subseteq \Omega$, is a Borel $\sigma$-algebra denoted by $\mathcal{B}$ if
$\Omega \in \mathcal{B}$
If $A \in \mathcal{B}$, then $A^C \in \mathcal{B}$
If $A_1, \cdots \in \mathcal{B}$, then $\bigcup_{i=1}^\infty A_i \in \mathcal{B}$

Some identities:
If $A_1, \cdots, A_n \in \mathcal{B}$, then $\bigcap_{i=1}^n A_i \in \mathcal{B}$
If $B_1 \in \mathcal{B}$ and $B_2 \in \mathcal{B}$, then $B_1 \cap B_2 \in \mathcal{B}$
$\{\emptyset, \Omega\} \subseteq \mathcal{B}$
$\mathcal{B} \subseteq \mathcal{P}(\Omega)$, where $\mathcal{P}$ denotes the power set


Probability

Let $\mathcal{F}$ be the $\sigma$-algebra defined on $\Omega$, then the probability measure defined on $(\Omega, \mathcal{F})$ is a function $P : \mathcal{F} \to [0, 1]$ with the following properties:
$P(A) \geq 0$ for $\forall A \in \mathcal{F}$
$P(\Omega) = 1$
For a partition $A_1, \cdots \in \mathcal{F}$, we have $P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$

Some identities:
$P(\emptyset) = 0$
$P(A) \leq 1$
$P(A^C) = 1 - P(A)$
If $A \subseteq B$, then $P(A) \leq P(B)$


Probability

For $A_1 \subseteq A_2 \subseteq \cdots$, we have

$$\lim_{n \to \infty} P(A_n) = P\left(\bigcup_{i=1}^\infty A_i\right)$$

For $A_1 \supseteq A_2 \supseteq \cdots$, we have

$$\lim_{n \to \infty} P(A_n) = P\left(\bigcap_{i=1}^\infty A_i\right)$$


Convergence



Almost Sure Convergence

A sequence of random variables $X_1, X_2, \cdots$ converges almost surely to the random variable $X$ if

$$P\left( \left\{ s \in S : \lim_{n \to \infty} X_n(s) = X(s) \right\} \right) = 1$$

If $X_n$ converges almost surely to $X$, it is denoted

$$X_n \xrightarrow{a.s.} X$$


Almost Sure Convergence
Example: let $S = [0, 1]$ with uniform probability, and

$$X_n(s) = \begin{cases} 1 & \text{for } 0 \leq s < \frac{n+1}{2n} \\ 0 & \text{otherwise} \end{cases}$$

then let

$$X(s) = \begin{cases} 1 & \text{for } 0 \leq s < \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

For $0 \leq s < \frac{1}{2}$: since $\frac{n+1}{2n} > \frac{1}{2}$ $\forall n \geq 1$, we have $X_n(s) = 1 = X(s)$ $\forall n \geq 1$.

For $\frac{1}{2} < s \leq 1$: since $\frac{n+1}{2n} \downarrow \frac{1}{2}$, we have $X_n(s) = 0 = X(s)$ for all $n$ large enough.

So $\lim_{n \to \infty} X_n(s) = X(s)$ fails only at the single point $s = \frac{1}{2}$, which has probability zero, and

$$P\left( \lim_{n \to \infty} X_n(s) = X(s) \right) = 1$$

Hence $X_n \xrightarrow{a.s.} X$
Convergence in Probability (Probability Limit)

A sequence of random variables $\{X_n\}$ converges in probability towards the random variable $X$ if for all $\varepsilon > 0$

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0$$

Alternatively,

$$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1$$

If $X_n$ converges in probability to $X$, it is denoted

$$X_n \xrightarrow{p} X \quad \text{or} \quad \text{plim}_{n \to \infty} X_n = X$$


Convergence in Probability (Probability Limit)

Some identities:
$\text{plim } cX_n = c\,\text{plim } X_n$
$\text{plim}(X_n + Y_n) = \text{plim } X_n + \text{plim } Y_n$
$\text{plim } X_n Y_n = (\text{plim } X_n)(\text{plim } Y_n)$

Slutsky's Theorem

If the function $g$ is continuous at $\text{plim } X$, then

$$\text{plim } g(X) = g(\text{plim } X)$$


Convergence in Distribution

A sequence of random variables $X_1, X_2, \cdots$ converges in distribution to the random variable $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

If $X_n$ converges in distribution to $X$, it is denoted

$$X_n \xrightarrow{d} X$$

Cramér-Wold Device

If $X_n \xrightarrow{d} X$, then $c^T X_n \xrightarrow{d} c^T X$ for any vector $c$

Note: $X_n \xrightarrow{d} X$ does not imply $E(X_n) \to E(X)$


Convergence in Distribution
Example: for $n \geq 2$ let

$$F_{X_n}(x) = \begin{cases} 1 - \left(1 - \frac{1}{n}\right)^{nx} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

then let $X \sim \text{Exponential}(1)$, i.e.

$$F_X(x) = \begin{cases} 1 - e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

For $x \leq 0$, we have

$$F_{X_n}(x) = F_X(x) = 0 \quad \forall n \geq 2$$

For $x > 0$, we have

$$\lim_{n \to \infty} F_{X_n}(x) = \lim_{n \to \infty} \left[ 1 - \left(1 - \frac{1}{n}\right)^{nx} \right] = 1 - \lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^{nx} = 1 - e^{-x}$$

Hence $X_n \xrightarrow{d} X$
Convergence in r -th Mean

Given a real number $r \geq 1$, the sequence $X_n$ converges in the $r$-th mean (or in the $L^r$-norm) towards the random variable $X$ if

$$\lim_{n \to \infty} E(|X_n - X|^r) = 0$$

For $r = 1$, we say $X_n$ converges in mean to $X$.

For $r = 2$, we say $X_n$ converges in mean square to $X$.


Convergence in r -th Mean

Example: let

$$f_{X_n}(x) = \begin{cases} n & \text{for } 0 \leq x \leq \frac{1}{n} \\ 0 & \text{otherwise} \end{cases}$$

then

$$E(|X_n - 0|^r) = \int_0^{1/n} x^r n\, dx = \frac{1}{(r+1)n^r} \to 0$$

Hence $X_n \xrightarrow{L^r} 0$ for all $r \geq 1$


Useful Properties

Markov's Inequality

$$P(|X| \geq a) \leq \frac{E(|X|^n)}{a^n} \quad \text{for } a > 0$$

Chebychev's Inequality

$$P(|X - E(X)| \geq a) \leq \frac{Var(X)}{a^2}$$

Borel-Cantelli Lemma

If $\sum_{n=1}^\infty P(|X_n - c| > \epsilon) < \infty$, $\forall \epsilon > 0$, then

$$X_n \xrightarrow{a.s.} c$$


Useful Properties

$$X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{p} X \Rightarrow X_n \xrightarrow{d} X$$

For $r \geq 1$,

$$X_n \xrightarrow{L^r} X \Rightarrow X_n \xrightarrow{p} X$$

For $s \geq r \geq 1$,

$$X_n \xrightarrow{L^s} X \Rightarrow X_n \xrightarrow{L^r} X$$


Weak Law of Large Numbers

Weak LLN

Suppose $X_1, \cdots, X_n$ are a sequence of iid random variables with mean $\mu < \infty$ and variance $\sigma^2 < \infty$, then

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} \mu$$

Khinchine's Weak LLN

Suppose $X_1, \cdots, X_n$ are a sequence of iid random variables with mean $\mu < \infty$, then

$$\bar{X} \xrightarrow{p} \mu$$
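A simulation sketch of the Weak LLN, assuming numpy is available (the Exponential(1) draws, with $\mu = 1$, are an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    x_bar = rng.exponential(scale=1.0, size=n).mean()
    print(n, x_bar)   # sample means settle toward mu = 1 as n grows
```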


Weak Law of Large Numbers

Chebychev's Weak LLN

Suppose $X_1, \cdots, X_n$ are a sequence of independent (not necessarily identically distributed) random variables with $E(X_i) = \mu_i < \infty$ and $Var(X_i) = \sigma_i^2 < \infty$ such that $\frac{1}{n}\bar{\sigma}_n^2 \to 0$, where $\bar{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$, then

$$\bar{X} - \frac{1}{n}\sum_{i=1}^n \mu_i \xrightarrow{p} 0$$


Strong Law of Large Numbers

Kolmogorov's Strong LLN

Suppose $X_1, \cdots, X_n$ are a sequence of iid random variables with $E(X_i) = \mu < \infty$, then

$$\bar{X} - \mu \xrightarrow{a.s.} 0$$

Markov's Strong LLN

Suppose $X_1, \cdots, X_n$ are a sequence of independent (not necessarily identically distributed) random variables with $E(X_i) = \mu_i < \infty$ and $\exists \delta > 0$ s.t. $\sum_{i=1}^\infty E(|X_i - \mu_i|^{1+\delta})/i^{1+\delta} < \infty$, then

$$\bar{X} - \frac{1}{n}\sum_{i=1}^n \mu_i \xrightarrow{a.s.} 0$$


Strong Law of Large Numbers

Liapounov's Strong LLN

Suppose $X_1, \cdots, X_n$ are a sequence of independent (not necessarily identically distributed) random variables with $E(X_i) = \mu_i < \infty$ and $\exists \delta, \Delta > 0$ s.t. $E(|X_t|^{1+\delta}) < \Delta < \infty$ for all $t$, then

$$\bar{X} - \frac{1}{n}\sum_{i=1}^n \mu_i \xrightarrow{a.s.} 0$$

e.g. if we let $X_t = X_i - \mu_i$, we recover Markov's SLLN.


Central Limit Theorem

Lindeberg-Lévy CLT

Suppose $X_1, \cdots, X_n$ are a sequence of iid random variables with mean $\mu < \infty$ and variance $\sigma^2 \in (0, \infty)$, then

$$\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$

Liapounov CLT

Suppose $X_1, \cdots, X_n$ are a sequence of independent (not necessarily identically distributed) random variables with $E(X_i) = \mu_i < \infty$ and $\exists \delta, \Delta > 0$ s.t. $E[|X_i - \mu_i|^{2+\delta}] < \Delta < \infty$, then if $\lim \bar{\sigma}_n^2 > 0$ we have

$$\sqrt{n}(\bar{X} - \bar{\mu}_n) \xrightarrow{d} N(0, \lim \bar{\sigma}_n^2)$$
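A simulation sketch of the Lindeberg-Lévy CLT, assuming numpy is available: standardized means of skewed Exponential(1) draws ($\mu = \sigma^2 = 1$, an illustrative choice) should look standard normal for large $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000
draws = rng.exponential(scale=1.0, size=(reps, n))   # mu = 1, sigma^2 = 1
z = np.sqrt(n) * (draws.mean(axis=1) - 1.0)          # sqrt(n)(X_bar - mu)
print(z.mean(), z.std())   # ~0 and ~1, as N(0, sigma^2) predicts
```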


Central Limit Theorem

Lindeberg-Feller CLT

Suppose $X_1, \cdots, X_n$ are a sequence of independent (not necessarily identically distributed) random variables with $E(X_i) = \mu_i < \infty$ and $Var(X_i) = \sigma_i^2 \in (0, \infty)$. If

$$\lim_{n \to \infty} \frac{1}{n}\bar{\sigma}_n^{-2} \sum_{i=1}^n E\left( (X_i - \mu_i)^2 \mathbb{1}_{[(X_i - \mu_i)^2 > n\varepsilon\bar{\sigma}_n^2]} \right) = 0, \quad \forall \varepsilon > 0$$

then we have

$$\lim_{n \to \infty} \max_{i \in [1, n]} \frac{\sigma_i^2}{n\bar{\sigma}_n^2} = 0$$

$$\sqrt{n}(\bar{X} - \bar{\mu}_n) \xrightarrow{d} N(0, \lim \bar{\sigma}_n^2)$$


Delta Method

If $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is differentiable, then

$$\sqrt{n}\left(g(\bar{X}) - g(\mu)\right) \xrightarrow{d} N\left(0, \frac{\partial g(\mu)}{\partial \mu}\, \sigma^2\, \frac{\partial g(\mu)}{\partial \mu}\right)$$


Big ’O’ and Little ’o’ Notation

If $\lim_{n \to \infty} \frac{g(n)}{f(n)} = c$, we say that $g(n) = O(f(n))$, for example

$$a_1 n^2 + a_2 n + a_3 = O(n^2)$$

$$b_1 n^{-2} + b_2 n^{-1} = O(n^{-1})$$

that is, big 'O' means "$g(n)$ is of the same order as $f(n)$".

If $\lim_{n \to \infty} \frac{g(n)}{f(n)} = 0$, we say that $g(n) = o(f(n))$, for example

$$a_1 n^2 + a_2 n + a_3 = o(n^3)$$

$$b_1 n^{-2} + b_2 n^{-1} = o(1)$$

that is, little 'o' means "$g(n)$ is ultimately negligible compared to $f(n)$".


Big ’O’ and Little ’o’ Notation

If g (n) = O(f (n)), then cg (n) = O(f (n)) for any constant c
If g1 (n) = O(f (n)) and g2 (n) = O(f (n)), then
g1 (n) + g2 (n) = O(f (n))
If g1 (n) = O(f (n)) but g2 (n) = o(f (n)), then
g1 (n) + g2 (n) = O(f (n))
If g (n) = O(f (n)) but f (n) = o(b(n)), then g (n) = o(b(n))



Random Variables



Random Variable

A random variable $X$ is a mapping

$$X : (\Omega, \mathcal{B}(\Omega)) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$$

and $(\Omega, \mathcal{B}(\Omega), P)$ is the associated probability space.

Example: let $X$ be the number of heads in a three-coin toss; then we have

$$\Omega = \{\{HHH\}, \{HHT\}, \{HTH\}, \{HTT\}, \{THH\}, \{THT\}, \{TTH\}, \{TTT\}\}$$

$$X \in \{0, 1, 2, 3\}$$

$$P(X = 0) = 0.125, \quad P(X = 1) = 0.375, \quad P(X = 2) = 0.375, \quad P(X = 3) = 0.125$$


Cumulative Distribution Function

The Cumulative Distribution Function $F(x)$ of a random variable $X$ is given by

$$F(x) = P(X \leq x)$$

with the following properties:
$\lim_{x \to -\infty} F(x) = 0$
$\lim_{x \to \infty} F(x) = 1$
$F(x)$ is a non-decreasing function of $x$


Probability Density Function

The Probability Density Function (Probability Mass Function for $x$ discrete) $f(x)$ of a random variable $X$ is given by

$$f(x) = \frac{d}{dx}F(x)$$

with the following properties:
$f(x) \geq 0$
$\int_{-\infty}^\infty f(x)dx = 1$
$F(x) = \int_{-\infty}^x f(t)dt$


Expectation

The Expectation (or Mean) of a random variable $X$ is given by

$$E[X] = \int_{-\infty}^\infty x f(x)dx$$

Alternatively,

$$E[g(X)] = \int_{-\infty}^\infty g(x)f(x)dx$$

Some identities:
$E[ag_1(x) + bg_2(x) + c] = aE[g_1(x)] + bE[g_2(x)] + c$
If $g_1(x) \geq g_2(x)$ for all $x$, then $E[g_1(x)] \geq E[g_2(x)]$


Moment generating function

The Moment Generating Function of a random variable $X$ is given by

$$M_X(t) = E[e^{tX}] = \int e^{tx} f(x)dx$$

The $n$-th moment of $X$ is given by

$$E(X^n) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}$$

Example: for $n = 1$,

$$\frac{d}{dt}M_X(t) = \frac{d}{dt}\int e^{tx} f(x)dx = \int \frac{d}{dt} e^{tx} f(x)dx = \int xe^{tx} f(x)dx$$

$$\left. \frac{d}{dt}M_X(t) \right|_{t=0} = \left. \int xe^{tx} f(x)dx \right|_{t=0} = \int xf(x)dx = E[X]$$


Variance

The variance of a random variable $X$, sometimes denoted by $\sigma^2$, is given by

$$Var(X) = E[(X - E(X))^2] = \int_{-\infty}^\infty (x - \mu)^2 f(x)dx$$

Also

$$\begin{aligned} Var(X) &= E[(X - E(X))^2] \\ &= E[X^2 - 2XE(X) + (E(X))^2] \\ &= E(X^2) - 2E(X)E(X) + (E(X))^2 \\ &= E(X^2) - 2(E(X))^2 + (E(X))^2 \\ &= E(X^2) - (E(X))^2 \end{aligned}$$


Variance

Some identities:
$Var(X) \geq 0$
$Var(c) = 0$ for constant $c$
$Var(X + c) = Var(X)$ for constant $c$
$Var(cX) = c^2 Var(X)$ for constant $c$
$Var(cX \pm dY) = c^2 Var(X) + d^2 Var(Y) \pm 2cd\,Cov(X, Y)$ for constants $c, d$

$$\begin{aligned} Var(XY) &= E[X^2 Y^2] - [E(XY)]^2 \\ &= Cov(X^2, Y^2) + E(X^2)E(Y^2) - [E(XY)]^2 \\ &= Cov(X^2, Y^2) + \left(Var(X) + [E(X)]^2\right)\left(Var(Y) + [E(Y)]^2\right) - \left[Cov(X, Y) + E(X)E(Y)\right]^2 \end{aligned}$$


Variance

For sums of random variables,

$$Var\left(\sum_{i=1}^N X_i\right) = \sum_{i=1,j=1}^N Cov(X_i, X_j) = \sum_{i=1}^N Var(X_i) + \sum_{i \neq j} Cov(X_i, X_j)$$

$$Var\left(\sum_{i=1}^N a_i X_i\right) = \sum_{i=1,j=1}^N a_i a_j Cov(X_i, X_j) = \sum_{i=1}^N a_i^2 Var(X_i) + \sum_{i \neq j} a_i a_j Cov(X_i, X_j) = \sum_{i=1}^N a_i^2 Var(X_i) + 2\sum_{1 \leq i < j \leq N} a_i a_j Cov(X_i, X_j)$$


Variance

If $X_i$ and $X_j$ are uncorrelated, that is $Cov(X_i, X_j) = 0$ $\forall i \neq j$, we have

$$Var\left(\sum_{i=1}^N X_i\right) = \sum_{i=1}^N Var(X_i)$$


Conditional Probability

The Conditional Probability of $X$ given $Y$ is given as

$$P(X|Y) = \frac{P(X \cap Y)}{P(Y)}$$

Bayes' Rule

$$P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)}$$


Conditional Probability

Conditional distribution

$$f(x|y) = \frac{f(x, y)}{f(y)}$$

Conditional Expectation

$$E(X|Y) = \int x f(x|y)dx$$

Conditional Variance

$$Var(X|Y) = \int [x - E(X|Y)]^2 f(x|y)dx = E(X^2|Y) - [E(X|Y)]^2$$


Conditional Probability

Variance Decomposition

Var (X ) = E [Var (X |Y )] + Var (E [X |Y ])

Law of iterated or double expectations

E [E (X |Y )] = E (X )

E [E (X |Y , Z )|Y ] = E (X |Y )



Independence

If $X$ and $Y$ are mutually independent, we have

$$P(X \cap Y) = P(X)P(Y)$$

that is

$$P(X|Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{P(X)P(Y)}{P(Y)} = P(X)$$

Two random variables $X$ and $Y$ are identically distributed iff

$$P[X \geq x] = P[Y \geq x] \quad \forall x$$

Variables $X_1, \cdots, X_N$ are independent and identically distributed (denoted i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.


Likelihood estimators

Let $f(x|\theta)$ denote the joint pdf of $X = (X_1, \cdots, X_n)^T$. Then given $X = x$ is observed, the likelihood function of $\theta$ is given by

$$L(\theta|x) = f(x|\theta) = \prod_{i=1}^n f(x_i|\theta)$$

and the log-likelihood function of $\theta$ is given by

$$\ell(\theta|x) = \log L(\theta|x) = \sum_{i=1}^n \log f(x_i|\theta)$$

Example: the Maximum Likelihood Estimator can be obtained by setting

$$\frac{\partial}{\partial \theta} \ell(\theta|x) = 0$$
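A small sketch of maximum likelihood in practice, assuming numpy/scipy are available: for iid Exponential($\lambda$) data the log-likelihood is $\ell(\lambda) = n \log \lambda - \lambda \sum_i x_i$, whose maximizer is the closed form $\hat{\lambda} = 1/\bar{x}$; the numeric optimizer should agree.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=5_000)   # true lambda = 2

def neg_loglik(lam):
    # -l(lambda | x) = -(n log(lambda) - lambda * sum(x))
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method='bounded')
print(res.x, 1 / x.mean())   # numeric MLE vs closed form, both ~2
```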


Bi-variate Cumulative Distribution Function

The Joint Cumulative Distribution Function $F_{X,Y}(x, y)$ of random variables $X$ and $Y$ is given by

$$F_{X,Y}(x, y) = P(X \leq x, Y \leq y)$$

with the following properties:
$F_{X,Y}(-\infty, y) = \lim_{x \to -\infty} F_{X,Y}(x, y) = 0$
$F_{X,Y}(x, -\infty) = \lim_{y \to -\infty} F_{X,Y}(x, y) = 0$
$F_{X,Y}(\infty, \infty) = \lim_{x \to \infty, y \to \infty} F_{X,Y}(x, y) = 1$


Bi-variate Probability Density Function

Let $Z = (X, Y)^T$ be a bivariate random variable.

The Probability Density Function $f(x, y)$ of random variables $X$ and $Y$ is given by

$$f(x, y) = \frac{\partial^2}{\partial x \partial y}F(x, y)$$

with the following properties:
$f(x, y) \geq 0$
$\iint_{\mathbb{R}^2} f(x, y)\,dx\,dy = 1$
$F(x, y) = \int_{-\infty}^y \int_{-\infty}^x f(s, t)\,ds\,dt$


Expectation and MGF of Bivariate Random Variables
Let $t = (t_1, t_2)^T$.

The Expectation of $g(X, Y)$ is given by

$$E[g(X, Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x, y)f(x, y)dxdy$$

The Moment Generating Function of $Z = (X, Y)^T$ is given by

$$M_Z(t) = E_Z[e^{t^T Z}] = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{t_1 x + t_2 y} f(x, y)dxdy$$

The Marginal Probability Density Functions of $X$ and $Y$ are given by

$$f_X(x) = \int_{-\infty}^\infty f(x, y)dy$$

$$f_Y(y) = \int_{-\infty}^\infty f(x, y)dx$$
Variance of Bivariate Random Variables

Let the mean vector of $Z = (X, Y)^T$ be

$$\mu = E(Z) = \begin{pmatrix} E(X) \\ E(Y) \end{pmatrix}$$

Then the variance-covariance matrix is given by

$$Var(Z) = E[(Z - \mu)(Z - \mu)^T]$$

that is, with $\sigma_x^2 = Var(X)$, $\sigma_y^2 = Var(Y)$ and $\sigma_{xy} = Cov(X, Y)$,

$$Var(Z) = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$


Multivariate Random Variables

Similarly, we can define for $X = (X_1, \cdots, X_n)^T$

$$F_X(x) = P(X_1 \leq x_1, \cdots, X_n \leq x_n)$$

$$f(x) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n}F_X$$

$$E[g(X)] = \int \cdots \int_{\mathbb{R}^n} g(x)f(x)dx_1 \cdots dx_n$$

$$M_X(t) = E_X[e^{t^T X}] = \int_{\mathbb{R}^n} e^{t^T x} f(x)dx$$


Multivariate Random Variables

 
$$\mu = E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix}$$

$$Var(X) = \begin{pmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_n} \\ \sigma_{x_2 x_1} & \sigma_{x_2}^2 & \cdots & \sigma_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_n x_1} & \sigma_{x_n x_2} & \cdots & \sigma_{x_n}^2 \end{pmatrix}$$

where $\sigma_{x_i x_j} = Cov(X_i, X_j)$.

Some properties:
$E(a^T X) = a^T E(X)$
$Var(a^T X) = a^T Var(X)a$


Distribution Functions


Normal Distribution N(µ, σ 2 )

Probability Distribution Function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Cumulative Distribution Function

$$F(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x - \mu}{\sqrt{2\sigma^2}}\right)\right]$$

Moment Generating Function

$$M_X = e^{\mu t + \frac{\sigma^2 t^2}{2}}$$


Normal Distribution N(µ, σ 2 )

where

$$\text{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} dt$$

Mean and Variance

$$E(X) = \mu, \quad Var(X) = \sigma^2$$


Bi-variate Normal Distribution N(µ, Σ)

For a bivariate normal random variable

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right)$$

we have

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left( -\frac{\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2}{2(1 - \rho^2)} \right)$$
Math Camp (ECMT) Statistics July 26, 2018 58 / 72


Poisson Distribution Po(λ)
Probability Mass Function

$$f(k) = P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Cumulative Distribution Function

$$F(k) = e^{-\lambda} \sum_{i=0}^k \frac{\lambda^i}{i!}$$

Moment Generating Function

$$M_X = e^{\lambda(e^t - 1)}$$

Mean and Variance

$$E(X) = \lambda, \quad Var(X) = \lambda$$
Uniform Distribution U(a, b)

Probability Distribution Function

$$f(x) = \begin{cases} \frac{1}{b-a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

Cumulative Distribution Function

$$F(x) = \begin{cases} 0 & \text{for } x < a \\ \frac{x-a}{b-a} & \text{for } x \in [a, b) \\ 1 & \text{for } x \geq b \end{cases}$$


Uniform Distribution U(a, b)

Moment Generating Function

$$M_X = \begin{cases} \frac{e^{tb} - e^{ta}}{t(b-a)} & \text{for } t \neq 0 \\ 1 & \text{for } t = 0 \end{cases}$$

Mean and Variance

$$E(X) = \frac{1}{2}(a + b), \quad Var(X) = \frac{1}{12}(b - a)^2$$


Exponential Distribution Exponential(λ)

Probability Distribution Function

$$f(x) = \lambda e^{-\lambda x} \quad (x \geq 0)$$

Cumulative Distribution Function

$$F(x) = 1 - e^{-\lambda x}$$

Moment Generating Function

$$M_X = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda$$

Mean and Variance

$$E(X) = \lambda^{-1}, \quad Var(X) = \lambda^{-2}$$


Gamma Distribution Gamma(α, β)

Probability Distribution Function

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$$

Cumulative Distribution Function

$$F(x) = \frac{1}{\Gamma(\alpha)}\gamma(\alpha, \beta x)$$

Moment Generating Function

$$M_X = \left(1 - \frac{t}{\beta}\right)^{-\alpha} \quad \text{for } t < \beta$$


Gamma Distribution Gamma(α, β)

where

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} dt$$

$$\gamma(\alpha, \beta x) = \int_0^{\beta x} t^{\alpha-1} e^{-t} dt$$

Mean and Variance

$$E(X) = \frac{\alpha}{\beta}, \quad Var(X) = \frac{\alpha}{\beta^2}$$


Beta Distribution Beta(α, β)

Probability Distribution Function

$$f(x) = \frac{x^{\alpha-1}(1 - x)^{\beta-1}}{B(\alpha, \beta)}$$

Cumulative Distribution Function

$$F(x) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)}$$

Moment Generating Function

$$M_X = 1 + \sum_{k=1}^\infty \left( \prod_{r=0}^{k-1} \frac{\alpha + r}{\alpha + \beta + r} \right) \frac{t^k}{k!}$$


Beta Distribution Beta(α, β)

where

$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1 - t)^{\beta-1} dt$$

$$B(x; \alpha, \beta) = \int_0^x t^{\alpha-1}(1 - t)^{\beta-1} dt$$

Mean and Variance

$$E(X) = \frac{\alpha}{\alpha + \beta}, \quad Var(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$


Estimators



Estimators

Let $\hat{\beta}$ be the estimator for the true parameter $\beta$, then

$\hat{\beta}$ is unbiased if $E(\hat{\beta}) = \beta$

The bias of $\hat{\beta}$ is given by $E(\hat{\beta}) - \beta$

$\hat{\beta}$ is consistent if $\hat{\beta} \xrightarrow{p} \beta$

$\hat{\beta}$ is more efficient than another estimator $\tilde{\beta}$ if $Var(\tilde{\beta}) - Var(\hat{\beta})$ is positive definite

The asymptotic distribution of $\hat{\beta}$ is obtained by premultiplying $\hat{\beta} - \beta$ by some power of $n$ other than $0$ to get a meaningful (non-degenerate) distribution. For example,

$$\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$$


Time Series

The time series $\{X_t\}_{t=1}^T$ is covariance stationary if all of the following are met:
$E(X_t) = \mu < \infty$ $\forall t$
$Var(X_t) = \sigma^2 < \infty$ $\forall t$
$Cov(X_t, X_{t-j}) = \gamma_j$ $\forall t$ and $\forall j \neq 0$

Examples of time series (where the $e_t \sim N(0, \sigma_e^2)$ are i.i.d.):

AR(p): $X_t = \mu + \sum_{i=1}^p \phi_i X_{t-i} + e_t$

MA(q): $X_t = \mu + e_t - \sum_{i=1}^q \theta_i e_{t-i}$

ARMA(p,q): $X_t = \mu + \sum_{i=1}^p \phi_i X_{t-i} + e_t - \sum_{i=1}^q \theta_i e_{t-i}$
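A minimal simulation sketch of a stationary AR(1), assuming numpy is available (the values $\mu = 0$, $\phi = 0.8$, $\sigma_e = 1$ are illustrative; $|\phi| < 1$ keeps the process covariance stationary):

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi = 10_000, 0.8
e = rng.normal(0.0, 1.0, size=T)

x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + e[t]      # AR(1): X_t = phi X_{t-1} + e_t

# Sample variance vs theoretical sigma_e^2 / (1 - phi^2) ~ 2.78
print(x.var(), 1.0 / (1 - phi**2))
```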


Hypothesis Testing

Example: testing $H_0 : c^T \beta = r$ versus $H_1 : c^T \beta \neq r$, where $c$ is a $k \times 1$ vector and $r$ is a scalar, gives the following test statistic

$$T_n = \frac{c^T \hat{\beta} - r}{\sqrt{s^2 c^T (X^T X)^{-1} c}}$$

Student t distribution

A random variable $T$ follows the Student t distribution with $q$ degrees of freedom, written as $T \sim t(q)$, if $T = \frac{U}{\sqrt{V/q}}$ where $U \sim N(0, 1)$, $V \sim \chi^2(q)$ and $U \perp V$

Under $H_0$ we have $T_n \sim t(n - k)$


Hypothesis Testing

Example: testing $H_0 : R\beta = r$ versus $H_1 : R\beta \neq r$, where $R$ is a $q \times k$ matrix with $q < k$, and $r$ is a $q \times 1$ vector, gives the following test statistic

$$F_n = \frac{1}{q}\left(R\hat{\beta} - r\right)^T \left[s^2 R(X^T X)^{-1} R^T\right]^{-1} \left(R\hat{\beta} - r\right)$$

F distribution

A random variable $F$ follows the F distribution with $(p, q)$ degrees of freedom, written as $F \sim F(p, q)$, if $F = \frac{U/p}{V/q}$ where $U \sim \chi^2(p)$, $V \sim \chi^2(q)$ and $U \perp V$

Under $H_0$ we have $F_n \sim F(q, n - k)$


References

Hoy et al. (2011). Mathematics for Economics (3rd ed.). London: MIT Press.
(2018). Retrieved from https://www.probabilitycourse.com/chapter7
Shalizi, C. (2013). Retrieved from http://www.stat.cmu.edu/~cshalizi/uADA/13/lectures/app-b.pdf
