Math Camp
ECMT
July 5, 2018
1 Differentiation
2 Integration
3 Multi-Variate Calculus
If $y = e^x$, then $\frac{dy}{dx} = e^x$.

If $y = \ln x$, then $\frac{dy}{dx} = \frac{1}{x}$.

If $h(x) = \sum_{i=1}^n g_i(x)$, then $h'(x) = \sum_{i=1}^n g_i'(x)$.

Chain rule: if $y = f(u)$ and $u = g(x)$, then $\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$.
Example

$$f(x) = \begin{cases} x^2 \sin(\tfrac{1}{x}) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

is differentiable at $x = 0$ but not continuously differentiable: $f'(0) = \lim_{h \to 0} h \sin(\tfrac{1}{h}) = 0$ exists, but for $x \neq 0$, $f'(x) = 2x \sin(\tfrac{1}{x}) - \cos(\tfrac{1}{x})$, which has no limit as $x \to 0$.
$$x_1 = g_1(y_1, \cdots, y_m)$$
$$\vdots$$
$$x_n = g_n(y_1, \cdots, y_m)$$
The Implicit Function Theorem is the idea behind the Lagrange method.
Example

$$x^2 - y^2 - u^3 + v^2 + 4 = 0$$
$$2xy + y^2 - 2u^2 + 3v^4 + 8 = 0$$

Let

$$f = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} x^2 - y^2 - u^3 + v^2 + 4 \\ 2xy + y^2 - 2u^2 + 3v^4 + 8 \end{pmatrix}$$

then

$$\begin{pmatrix} \frac{\partial f_1}{\partial u} & \frac{\partial f_1}{\partial v} \\ \frac{\partial f_2}{\partial u} & \frac{\partial f_2}{\partial v} \end{pmatrix} = \begin{pmatrix} -3u^2 & 2v \\ -4u & 12v^3 \end{pmatrix}$$

Hence

$$\begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{pmatrix} = -\begin{pmatrix} -3u^2 & 2v \\ -4u & 12v^3 \end{pmatrix}^{-1} \begin{pmatrix} 2x & -2y \\ 2y & 2x + 2y \end{pmatrix} = \frac{1}{8uv - 36u^2 v^3} \begin{pmatrix} -12v^3 & 2v \\ -4u & 3u^2 \end{pmatrix} \begin{pmatrix} 2x & -2y \\ 2y & 2x + 2y \end{pmatrix}$$
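The implicit Jacobian can be verified symbolically; a minimal sketch using sympy (not part of the original slides, variable names are illustrative):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
F = sp.Matrix([x**2 - y**2 - u**3 + v**2 + 4,
               2*x*y + y**2 - 2*u**2 + 3*v**4 + 8])

# Implicit function theorem: D(u,v)/D(x,y) = -[dF/d(u,v)]^{-1} [dF/d(x,y)]
J_endog = F.jacobian([u, v])   # derivatives w.r.t. the endogenous variables
J_exog = F.jacobian([x, y])    # derivatives w.r.t. the exogenous variables
D = sp.simplify(-J_endog.inv() * J_exog)
print(D)
```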
Example

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{n-1}}{(n-1)!} + R_n$$

where $R_n = \frac{e^{\xi} x^n}{n!}$ (the Lagrange form of the remainder) and $\xi$ lies between $0$ and $x$.
If the function $f(x)$ is continuous on the closed interval $[a, b]$ and if $F(x)$ is any antiderivative (indefinite integral) of $f(x)$, then

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

where $F(b)$ is the antiderivative of $f(x)$ evaluated at $x = b$ and $F(a)$ is the antiderivative evaluated at $x = a$. The expression $F(b) - F(a)$ is often denoted $[F(x)]_a^b$.
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$

$$\int [f(x) \pm g(x)]\,dx = \int f(x)\,dx \pm \int g(x)\,dx$$

$$\int k f(x)\,dx = k \int f(x)\,dx$$

$$\int e^x\,dx = e^x + C$$

$$\int \frac{1}{x}\,dx = \ln(x) + C$$

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$$

$$\int_a^a f(x)\,dx \equiv \lim_{c \to a} \int_a^c f(x)\,dx = 0$$

$$\int_a^c f(x)\,dx = -\int_c^a f(x)\,dx$$
Substitution

Let $u = x^3 + e^x$; then $\frac{du}{dx} = \frac{d(x^3 + e^x)}{dx} = 3x^2 + e^x$, so we have

$$\int u \frac{du}{dx}\,dx = \int u\,du = \frac{u^2}{2} + C = \frac{(x^3 + e^x)^2}{2} + C$$
Integration by Parts

$$\int u\,dv = uv - \int v\,du$$

Example

$$\int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C$$
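Both manipulations are easy to verify symbolically; a minimal sketch with sympy (illustrative, not from the slides):

```python
import sympy as sp

x = sp.symbols('x')
u = x**3 + sp.exp(x)
# Substitution example: the integrand is u * du/dx
print(sp.expand(u**2 / 2 - sp.integrate(u * sp.diff(u, x), x)))  # -> 0
# Integration by parts example
print(sp.integrate(x * sp.exp(x), x))   # -> (x - 1)*exp(x), i.e. x e^x - e^x
```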
Leibniz's Rule. If $f(x, \theta)$, $a(\theta)$ and $b(\theta)$ are differentiable with respect to $\theta$, then

$$\frac{d}{d\theta} \int_{a(\theta)}^{b(\theta)} f(x, \theta)\,dx = f(b(\theta), \theta) \frac{d}{d\theta} b(\theta) - f(a(\theta), \theta) \frac{d}{d\theta} a(\theta) + \int_{a(\theta)}^{b(\theta)} \frac{\partial f(x, \theta)}{\partial \theta}\,dx$$

Useful for bringing the differentiation inside the integral. Also useful for finding integrals by differentiating first.
Example

$$\int_0^1 \frac{x^{\alpha} - 1}{\ln x}\,dx \quad (\alpha \geq 0)$$

Let $F(\alpha) = \int_0^1 \frac{x^{\alpha} - 1}{\ln x}\,dx$. Differentiating both sides with respect to $\alpha$,

$$F'(\alpha) = \frac{d}{d\alpha} \int_0^1 \frac{x^{\alpha} - 1}{\ln x}\,dx = \int_0^1 \frac{\partial}{\partial \alpha} \frac{x^{\alpha} - 1}{\ln x}\,dx = \int_0^1 x^{\alpha}\,dx = \frac{1}{\alpha + 1}$$

Integrating back and using $F(0) = 0$ gives $F(\alpha) = \ln(1 + \alpha)$.
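A quick numerical check of the result (a sketch using scipy; the endpoint singularities are integrable and quad handles them):

```python
import numpy as np
from scipy.integrate import quad

alpha = 2.5
val, err = quad(lambda x: (x**alpha - 1) / np.log(x), 0, 1)
print(val, np.log(1 + alpha))   # both approximately 1.2528
```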
Useful for finding the rate of change with respect to one variable keeping
all others constant.
Example

Let

$$f(x_1, x_2) = A x_1^{\alpha} x_2^{\beta}$$

then

$$\frac{\partial f}{\partial x_1} = \alpha A x_1^{\alpha - 1} x_2^{\beta}$$

$$\frac{\partial f}{\partial x_2} = \beta A x_1^{\alpha} x_2^{\beta - 1}$$

Cross-partials are symmetric (Young's theorem):

$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$$

The directional derivative of $f$ at $x_0$ in direction $v$ is

$$\frac{\partial f}{\partial v} = \lim_{h \to 0,\, h \neq 0} \frac{f(x_0 + hv) - f(x_0)}{h}$$
Example: for $f(x, y) = x^2 + y^2$ and direction $v = (1, 1)$,

$$\frac{\partial f}{\partial v} = \nabla f \cdot v = 2x + 2y$$
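A finite-difference check of the directional-derivative definition (a sketch; the function and direction are the ones assumed in the example above):

```python
import numpy as np

f = lambda p: p[0]**2 + p[1]**2
x0 = np.array([1.0, 2.0])
v = np.array([1.0, 1.0])

h = 1e-6
num = (f(x0 + h * v) - f(x0)) / h      # definition with a small step h
grad_dot_v = 2 * x0[0] + 2 * x0[1]     # closed form: grad f . v = 2x + 2y
print(num, grad_dot_v)                 # both approximately 6.0
```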
Math Camp
ECMT
1 Set Theory
2 Unconstrained Optimization
3 Constrained Optimization
Lagrange Method
Envelope Theorem
Kuhn-Tucker Theorem
$x \in S$ denotes that $x$ is an element of the set $S$; $x \notin S$ denotes that it is not.
Number sets:
Natural numbers N = {1, 2, 3, ...}
Integers Z = {..., −2, −1, 0, 1, 2, ...}
Positive integers Z+ = {1, 2, 3, ...}
Negative integers Z− = {..., −3, −2, −1}
Rational numbers Q = {p/q : p, q ∈ Z and q ≠ 0}
Real numbers R = (−∞, ∞)
Positive real numbers R+ = [0, ∞)
Strictly positive real numbers R++ = (0, ∞)
Negative real numbers R− = (−∞, 0]
Strictly negative real numbers R−− = (−∞, 0)
If all the elements of set X are also elements of set Y , then X is a subset
of Y , written
X ⊆Y
If all the elements of set X are in set Y , but not all elements of set Y are
in set X , then X is a proper subset of Y , written
X ⊂Y
The empty set or the null set is the set with no elements, written ∅
Sets and Subsets
The intersection W of two sets X and Y is the set of elements that are in
both X and Y
W = X ∩ Y = {x : x ∈ X and x ∈ Y }
The union V of two sets X and Y is the set of elements that are in one or
other of the sets
V = X ∪ Y = {x : x ∈ X or x ∈ Y }
Example: for X = {1, 2, 3} and Y = {3, 4, 5},

X ∩ Y = {3}

X ∪ Y = {1, 2, 3, 4, 5}

The power set of X is the set of all its subsets, P(X) = {A : A ⊆ X}; for X = {1, 2, 3},

P(X) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
Example

$$d[(2, 3, 4), (4, 1, -5)] = \sqrt{(2 - 4)^2 + (3 - 1)^2 + (4 - (-5))^2} = \sqrt{89}$$
A point $x^*$ is a global maximum if $f(x^*) \geq f(x)$ for all $x$ in the domain, and $\hat{x}$ is a local maximum if $f(\hat{x}) \geq f(x)$ for all $x$ in a neighborhood of $\hat{x}$. Reversing the inequalities gives a global minimum, $f(x^*) \leq f(x)$, and a local minimum, $f(\hat{x}) \leq f(x)$.
For functions of several variables, the second-order conditions are checked through the leading principal minors of the Hessian.

Example

Given

$$f(x_1, x_2, x_3) = 5x_1^2 + 2x_2^2 + x_3^4 - 32x_3 + 6x_1 x_2 + 5x_2$$

Solving $\nabla f(X^*) = 0$:

$$\nabla f = \begin{pmatrix} 10x_1 + 6x_2 \\ 6x_1 + 4x_2 + 5 \\ 4x_3^3 - 32 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

which gives

$$X^* = \begin{pmatrix} 7.5 \\ -12.5 \\ 2 \end{pmatrix}$$

with leading principal minors of the Hessian (evaluated at $X^*$)

$$|H_1| = |10| = 10, \qquad |H_2| = \begin{vmatrix} 10 & 6 \\ 6 & 4 \end{vmatrix} = 4, \qquad |H_3| = \begin{vmatrix} 10 & 6 & 0 \\ 6 & 4 & 0 \\ 0 & 0 & 12(2)^2 \end{vmatrix} = 192$$

All three minors are positive, so the Hessian is positive definite at $X^*$ and the point is a local minimum.
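A numerical cross-check of the critical point (a sketch using scipy.optimize; the starting point is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (5*x[0]**2 + 2*x[1]**2 + x[2]**4 - 32*x[2]
               + 6*x[0]*x[1] + 5*x[1])
res = minimize(f, x0=np.zeros(3))
print(res.x)   # approximately [ 7.5, -12.5, 2.0 ]
```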
$$\frac{\partial x^T b}{\partial x} = \frac{\partial b^T x}{\partial x} = b$$

$$\frac{\partial Ax}{\partial x} = \frac{\partial x^T A}{\partial x} = A$$

$$\frac{\partial y^T Ax}{\partial x} = \frac{\partial x^T A^T y}{\partial x} = A^T y$$

$$\frac{\partial x^T Ax}{\partial x} = (A + A^T)x$$

$$\frac{\partial^2 x^T Ax}{\partial x \partial x^T} = A + A^T$$

$$\frac{\partial a^T X b}{\partial X} = ab^T$$

$$\frac{\partial a^T X^n b}{\partial X} = \sum_{r=0}^{n-1} (X^r)^T ab^T (X^{n-1-r})^T$$

$$\frac{\partial a^T X^T b}{\partial X} = ba^T$$

$$\frac{\partial a^T X a}{\partial X} = \frac{\partial a^T X^T a}{\partial X} = aa^T$$

$$\frac{\partial a^T X^T X b}{\partial X} = X(ab^T + ba^T)$$

$$\frac{\partial a^T (X^n)^T X^n b}{\partial X} = \sum_{r=0}^{n-1} \left[ X^{n-1-r} ab^T (X^n)^T X^r + (X^r)^T X^n ab^T (X^{n-1-r})^T \right]$$
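Identities like these are easy to spot-check symbolically; a minimal sketch verifying $\partial(x^T A x)/\partial x = (A + A^T)x$ with sympy (dimensions chosen arbitrarily):

```python
import sympy as sp

n = 3
x = sp.Matrix(sp.symbols(f'x0:{n}'))
A = sp.Matrix(n, n, sp.symbols(f'a0:{n*n}'))

quad_form = (x.T * A * x)[0, 0]
grad = sp.Matrix([sp.diff(quad_form, xi) for xi in x])
print(sp.simplify(grad - (A + A.T) * x))   # zero vector
```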
then the Lagrange method of finding $(x_1^*, \cdots, x_n^*)$ consists of forming the Lagrange function $L = f(x_1, \cdots, x_n) + \sum_j \lambda_j g^j(x_1, \cdots, x_n)$ and solving the first-order conditions

$$\frac{\partial}{\partial x_i} f(x_1^*, \cdots, x_n^*) + \sum_j \lambda_j \frac{\partial}{\partial x_i} g^j(x_1^*, \cdots, x_n^*) = 0$$

$$g^j(x_1^*, \cdots, x_n^*) = 0$$

where $i = 1, \cdots, n$ and $j = 1, \cdots, m$.
The better or upper contour set of the point $(x_1^0, x_2^0, \ldots, x_n^0)$ is

$$B(x_1^0, x_2^0, \ldots, x_n^0) = \{(x_1, \ldots, x_n) \in X : f(x_1, \ldots, x_n) \geq f(x_1^0, x_2^0, \ldots, x_n^0)\}$$

The worse or lower contour set of the point $(x_1^0, x_2^0, \ldots, x_n^0)$ is

$$W(x_1^0, x_2^0, \ldots, x_n^0) = \{(x_1, \ldots, x_n) \in X : f(x_1, \ldots, x_n) \leq f(x_1^0, x_2^0, \ldots, x_n^0)\}$$
Totally differentiating the first-order conditions $f_i(x_1^*, \cdots, x_n^*; \alpha_1, \cdots, \alpha_m) = 0$ with respect to $\alpha_j$, $j = 1, \cdots, m$:

$$\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} \begin{pmatrix} \frac{\partial x_1^*}{\partial \alpha_j} \\ \frac{\partial x_2^*}{\partial \alpha_j} \\ \vdots \\ \frac{\partial x_n^*}{\partial \alpha_j} \end{pmatrix} = \begin{pmatrix} -\frac{\partial f_1}{\partial \alpha_j} \\ -\frac{\partial f_2}{\partial \alpha_j} \\ \vdots \\ -\frac{\partial f_n}{\partial \alpha_j} \end{pmatrix}$$
By Cramer's Rule

$$\frac{\partial x_i^*}{\partial \alpha_j} = \frac{|F_{ij}|}{|F|}$$

where $F_{ij}$ is given by replacing the $i$th column of $F$ by the $j$th column of the Jacobian $J(f)$ with respect to the $\alpha_j$'s:
$$J = \begin{pmatrix} -\frac{\partial f_1}{\partial \alpha_1} & -\frac{\partial f_1}{\partial \alpha_2} & \cdots & -\frac{\partial f_1}{\partial \alpha_m} \\ -\frac{\partial f_2}{\partial \alpha_1} & -\frac{\partial f_2}{\partial \alpha_2} & \cdots & -\frac{\partial f_2}{\partial \alpha_m} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{\partial f_n}{\partial \alpha_1} & -\frac{\partial f_n}{\partial \alpha_2} & \cdots & -\frac{\partial f_n}{\partial \alpha_m} \end{pmatrix}$$
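A small numeric illustration of the Cramer's rule formula (a sketch; F and the right-hand side are arbitrary invertible examples):

```python
import numpy as np

F = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])           # stands in for the -df/d(alpha_j) column

x = np.linalg.solve(F, b)          # direct solve of F dx = b
for i in range(2):
    Fi = F.copy()
    Fi[:, i] = b                   # replace the i-th column by b
    print(np.linalg.det(Fi) / np.linalg.det(F), x[i])   # identical values
```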
Example

Given

$$\max u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = m$$

Ordering the endogenous variables from first to last as $x_1^*, x_2^*, \lambda^*$, we get

$$|F| = \begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ -p_1 & -p_2 & 0 \end{vmatrix}$$

$$\frac{\partial x_1^*}{\partial p_1} = \frac{|F_{11}|}{|F|} = \frac{1}{|F|} \begin{vmatrix} \lambda^* & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ 0 & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ x_1^* & -p_2 & 0 \end{vmatrix}$$

$$\frac{\partial x_2^*}{\partial p_2} = \frac{|F_{22}|}{|F|} = \frac{1}{|F|} \begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & 0 & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & \lambda^* & -p_2 \\ -p_1 & x_2^* & 0 \end{vmatrix}$$

$$\frac{\partial x_1^*}{\partial m} = \frac{|F_{13}|}{|F|} = \frac{1}{|F|} \begin{vmatrix} 0 & \frac{\partial^2 u}{\partial x_1 \partial x_2} & -p_1 \\ 0 & \frac{\partial^2 u}{\partial x_2 \partial x_2} & -p_2 \\ -1 & -p_2 & 0 \end{vmatrix}$$

$$\frac{\partial x_2^*}{\partial m} = \frac{|F_{23}|}{|F|} = \frac{1}{|F|} \begin{vmatrix} \frac{\partial^2 u}{\partial x_1 \partial x_1} & 0 & -p_1 \\ \frac{\partial^2 u}{\partial x_2 \partial x_1} & 0 & -p_2 \\ -p_1 & -1 & 0 \end{vmatrix}$$
$$\frac{dV}{d\alpha} = \frac{\partial f}{\partial x_1}\frac{dx_1}{d\alpha} + \frac{\partial f}{\partial x_2}\frac{dx_2}{d\alpha} + \frac{\partial f}{\partial \alpha}$$

Substituting the first two FOCs ($\partial f/\partial x_i = -\lambda^* \partial g/\partial x_i$),

$$\frac{dV}{d\alpha} = -\lambda^* \left( \frac{\partial g}{\partial x_1}\frac{dx_1}{d\alpha} + \frac{\partial g}{\partial x_2}\frac{dx_2}{d\alpha} \right) + \frac{\partial f}{\partial \alpha}$$

Now consider the Lagrangian along the solution path,

$$L(\alpha) = f(x_1^*(\alpha), x_2^*(\alpha); \alpha) + \lambda^*(\alpha)\, g(x_1^*(\alpha), x_2^*(\alpha); \alpha)$$

Differentiating, we get

$$\frac{dL}{d\alpha} = \frac{\partial}{\partial x_1}(f + \lambda^* g)\frac{dx_1}{d\alpha} + \frac{\partial}{\partial x_2}(f + \lambda^* g)\frac{dx_2}{d\alpha} + \frac{\partial f}{\partial \alpha} + \frac{d\lambda^*}{d\alpha} g + \lambda^* \frac{\partial g}{\partial \alpha}$$

Substituting the FOCs ($\partial(f + \lambda^* g)/\partial x_i = 0$ and $g = 0$),

$$\frac{dL}{d\alpha} = 0 \cdot \frac{dx_1}{d\alpha} + 0 \cdot \frac{dx_2}{d\alpha} + \frac{\partial f}{\partial \alpha} + \frac{d\lambda^*}{d\alpha} \cdot 0 + \lambda^* \frac{\partial g}{\partial \alpha} = \frac{\partial f}{\partial \alpha} + \lambda^* \frac{\partial g}{\partial \alpha}$$

Since $g(x_1^*(\alpha), x_2^*(\alpha); \alpha) = 0$ identically in $\alpha$, differentiating the constraint gives $\frac{\partial g}{\partial x_1}\frac{dx_1}{d\alpha} + \frac{\partial g}{\partial x_2}\frac{dx_2}{d\alpha} = -\frac{\partial g}{\partial \alpha}$, so the two expressions agree. Hence

$$\frac{dV}{d\alpha} = \frac{dL}{d\alpha} = \frac{\partial f}{\partial \alpha} + \lambda^* \frac{\partial g}{\partial \alpha}$$
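A numeric sanity check of $dV/d\alpha = \partial L/\partial \alpha$ on a toy problem (a sketch: max $x_1 x_2$ subject to $x_1 + x_2 = \alpha$, so $V(\alpha) = \alpha^2/4$ and $\lambda^* = \alpha/2$; the problem is illustrative, not from the slides):

```python
import numpy as np

alpha, h = 3.0, 1e-6
V = lambda a: (a / 2) ** 2                     # value function of the toy problem
dV = (V(alpha + h) - V(alpha - h)) / (2 * h)   # central-difference derivative
lam = alpha / 2                                # multiplier at the optimum
# f has no direct alpha-dependence and g(x; a) = a - x1 - x2,
# so the envelope theorem predicts dV/da = lambda* * dg/da = lambda*
print(dV, lam)                                 # both approximately 1.5
```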
gk (x1 , · · · , xn ; α1 , · · · , αm ) = 0 for k = 1, · · · , K
The Lagrange multiplier measures the rate at which the value function
changes when the corresponding constraint is tightened or relaxed slightly.
If a constraint is nonbinding at the optimum, so that a small tightening or
relaxing of it has no effect on the solution, then the associated Lagrange
multiplier will take the value zero at the optimum.
Envelope Theorem
Example

Allocate a fixed labor supply $L_0$ across two activities to maximize $p_1 a_1 L_1^b + p_2 a_2 L_2^b$ subject to $L_0 - L_1 - L_2 = 0$ (the problem implied by the FOCs below).

FOCs

$$b p_i a_i L_i^{b-1} - \lambda^* = 0 \quad \text{for } i = 1, 2$$

$$L_0 - L_1 - L_2 = 0$$

Solving,

$$L_1 = c_1 L_0 \quad \text{and} \quad L_2 = c_2 L_0$$

where

$$c_1 = \left[ 1 + \left( \frac{p_1 a_1}{p_2 a_2} \right)^{1/(b-1)} \right]^{-1} \quad \text{and} \quad c_2 = 1 - c_1$$

Optimized value function

$$V(p_1, p_2, L_0) = p_1 a_1 (c_1 L_0)^b + p_2 a_2 (c_2 L_0)^b$$
where both $f$ and $g$ are concave and differentiable, the Lagrange function is $L = f(x_1, x_2) + \lambda g(x_1, x_2)$.
Kuhn-Tucker Theorem
Given

$$\max f(x_1, x_2) \quad \text{s.t.} \quad g(x_1, x_2) \geq 0 \quad \text{for } x_1, x_2 \geq 0$$

if $f$ and $g$ are concave and differentiable, and if there exists a point $(x_1^0, x_2^0)$ such that $g(x_1^0, x_2^0) > 0$, then there exists a Lagrange multiplier $\lambda^*$ such that the Kuhn-Tucker conditions are both necessary and sufficient for the point $(x_1^*, x_2^*)$ to be a solution to the problem.
Lagrange function
$$L = u(x_1, x_2) + \lambda(m - p_1 x_1 - p_2 x_2)$$

Kuhn-Tucker Conditions

$$\frac{\partial L}{\partial x_i} = \frac{\partial u}{\partial x_i} - \lambda^* p_i \leq 0 \quad \text{where } x_i^* \geq 0$$

$$x_i^* \left( \frac{\partial u}{\partial x_i} - \lambda^* p_i \right) = 0$$

$$\frac{\partial L}{\partial \lambda} = m - p_1 x_1^* - p_2 x_2^* \geq 0 \quad \text{where } \lambda^* \geq 0$$

$$\lambda^*(m - p_1 x_1^* - p_2 x_2^*) = 0$$
Kuhn-Tucker Theorem
At an interior solution ($x_1^*, x_2^* > 0$), the conditions imply

$$\frac{\partial u / \partial x_1}{\partial u / \partial x_2} = \frac{p_1}{p_2}$$
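A numerical version of this consumer problem (a sketch assuming Cobb-Douglas utility $u = \sqrt{x_1 x_2}$ and made-up prices; scipy's SLSQP handles the inequality constraint):

```python
import numpy as np
from scipy.optimize import minimize

p, m = np.array([2.0, 3.0]), 12.0
neg_u = lambda x: -np.sqrt(x[0] * x[1])                  # scipy minimizes
budget = {'type': 'ineq', 'fun': lambda x: m - p @ x}    # g(x) = m - p.x >= 0
res = minimize(neg_u, x0=[1.0, 1.0], bounds=[(0, None)] * 2,
               constraints=[budget])
print(res.x)        # approximately [3, 2]: half of income on each good
print(p @ res.x)    # budget binds at 12.0, consistent with lambda* > 0
```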
Given

$$\max f(x_1, \cdots, x_n)$$

subject to

$$g_j(x_1, \cdots, x_n) \geq 0 \quad \text{for } j = 1, \cdots, m$$

if all the functions $f$ and $g_j$ are concave and differentiable, and there exists a point $(x_1^0, \cdots, x_n^0)$ such that $g_j(x_1^0, \cdots, x_n^0) > 0$ for all $j$, then there exist $m$ Lagrange multipliers $\lambda_j^*$ such that the following conditions are necessary and sufficient for the point $(x_1^*, \cdots, x_n^*)$ to be a solution to the problem.

Conditions

$$\frac{\partial f(x_1^*, \cdots, x_n^*)}{\partial x_i} - \sum_j \lambda_j^* \frac{\partial g_j(x_1^*, \cdots, x_n^*)}{\partial x_i} \leq 0 \quad \text{and} \quad x_i^* \geq 0$$

$$x_i^* \left( \frac{\partial f}{\partial x_i} - \sum_j \lambda_j^* \frac{\partial g_j}{\partial x_i} \right) = 0$$

$$g_j(x_1^*, \cdots, x_n^*) \geq 0 \quad \text{and} \quad \lambda_j^* \geq 0$$

$$\lambda_j^* g_j(x_1^*, \cdots, x_n^*) = 0$$
Math Camp
ECMT
1 Sequences and Series
2 Probability
3 Convergence
4 Random Variables
5 Distribution Functions
6 Estimators
A sequence is said to have the limit $L$ if for any $\varepsilon > 0$, however small, there is some value $N$ such that $|a_n - L| < \varepsilon$ whenever $n > N$. We then write

$$\lim_{n \to \infty} a_n = L$$

Example:

$$\lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n = e$$

If a sequence has no limit, it is divergent.
Properties of Sequences
If $a_t$ is a sequence, then $s_n = \sum_{t=1}^n a_t$ is a series.

Ratio test: if $L = \lim_{t \to \infty} |a_{t+1}/a_t|$ exists, we have:

If $L < 1$, then the series $s_n$ converges

If $L > 1$, then the series $s_n$ diverges

If $L = 1$, then the series $s_n$ may converge or diverge
Some properties:

If $A_1, A_2, \cdots \in \mathcal{B}$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{B}$

If $B_1 \in \mathcal{B}$ and $B_2 \in \mathcal{B}$, then $B_1 \cap B_2 \in \mathcal{B}$

$\{\emptyset, \Omega\}$ is a $\sigma$-algebra (the smallest one)

$\mathcal{P}(\Omega)$ is a $\sigma$-algebra (the largest one), where $\mathcal{P}$ denotes the power set
Some identities:
P(∅) = 0
P(A) ≤ 1
P(AC ) = 1 − P(A)
If A ⊆ B, then P(A) ≤ P(B)
For $A_1 \subseteq A_2 \subseteq \cdots$,

$$\lim_{n \to \infty} P(A_n) = P\left( \bigcup_{i=1}^{\infty} A_i \right)$$

For $A_1 \supseteq A_2 \supseteq \cdots$,

$$\lim_{n \to \infty} P(A_n) = P\left( \bigcap_{i=1}^{\infty} A_i \right)$$
then let

$$X(s) = \begin{cases} 1 & \text{for } 0 \leq s < \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

For $0 \leq s < \frac{1}{2}$: since $\frac{n+1}{2n} > \frac{1}{2}$ $\forall n \geq 1$, we have $X_n(s) = 1$ $\forall n \geq 1$, and

$$P\left( \lim_{n \to \infty} X_n(s) = X(s) \right) = 1$$

For $\frac{1}{2} \leq s \leq 1$: we have $X_n(s) = 0$ $\forall n \geq 1$ and

$$P\left( \lim_{n \to \infty} X_n(s) = X(s) \right) = 1$$

Hence $X_n \xrightarrow{a.s.} X$
Convergence in Probability (Probability Limit)
Alternatively
$$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1$$

Some identities (assuming the probability limits on the right exist):

plim $cX_n = c\,$plim $X_n$

plim $(X_n + Y_n) =$ plim $X_n +$ plim $Y_n$

plim $X_n Y_n =$ plim $X_n \cdot$ plim $Y_n$

Slutsky's Theorem: for continuous $g$,

$$\text{plim } g(X_n) = g(\text{plim } X_n)$$
Example: let

$$F_{X_n}(x) = \begin{cases} 1 - (1 - \frac{1}{n})^{nx} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad F_X(x) = \begin{cases} 1 - e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

For $x \leq 0$, we have

$$F_{X_n}(x) = F_X(x) = 0 \quad \forall n \geq 2$$

For $x > 0$, we have

$$\lim_{n \to \infty} F_{X_n}(x) = \lim_{n \to \infty} \left[ 1 - \left( 1 - \frac{1}{n} \right)^{nx} \right] = 1 - \lim_{n \to \infty} \left( 1 - \frac{1}{n} \right)^{nx} = 1 - e^{-x}$$

Hence $X_n \xrightarrow{d} X$
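A quick numeric look at the pointwise convergence of the CDFs (a sketch; $x$ fixed at 1):

```python
import numpy as np

x = 1.0
for n in [2, 10, 100, 10000]:
    print(n, 1 - (1 - 1/n)**(n*x))   # approaches 1 - exp(-1)
print('limit', 1 - np.exp(-x))       # 0.6321...
```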
Convergence in r -th Mean
$$\lim_{n \to \infty} E(|X_n - X|^r) = 0$$

Example: let

$$f_{X_n}(x) = \begin{cases} n & \text{for } 0 \leq x \leq \frac{1}{n} \\ 0 & \text{otherwise} \end{cases}$$

then

$$E(|X_n - 0|^r) = \int_0^{1/n} x^r n\,dx = \frac{1}{(r+1)n^r} \to 0$$

Hence $X_n \xrightarrow{L^r} 0$ for all $r \geq 1$
Markov's Inequality (for a nonnegative random variable $X$ and $a > 0$)

$$P(X \geq a) \leq \frac{E(X^n)}{a^n}$$

Chebyshev's Inequality

$$P(|X - E(X)| \geq a) \leq \frac{Var(X)}{a^2}$$
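A Monte Carlo sanity check of Chebyshev's inequality (a sketch using a standard normal, where $E(X) = 0$ and $Var(X) = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
a = 2.0
print((np.abs(x) >= a).mean())   # approximately 0.0455
print(1 / a**2)                  # Chebyshev bound 0.25, comfortably larger
```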
Borel-Cantelli Lemma

If $\sum_{n=1}^{\infty} P(|X_n - c| > \varepsilon) < \infty$ $\forall \varepsilon > 0$, then

$$X_n \xrightarrow{a.s.} c$$
$$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{p} X \;\Rightarrow\; X_n \xrightarrow{d} X$$

For $r \geq 1$,

$$X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{p} X$$

For $s \geq r \geq 1$,

$$X_n \xrightarrow{L^s} X \;\Rightarrow\; X_n \xrightarrow{L^r} X$$
Weak LLN

$$\bar{X} - \frac{1}{n} \sum_{i=1}^n \mu_i \xrightarrow{p} 0$$

(the strong LLN strengthens the convergence to almost-sure.)
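A quick simulation of the LLN (a sketch; iid exponential draws with mean 1):

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [10, 1000, 100000]:
    print(n, rng.exponential(1.0, n).mean())   # drifts toward mu = 1
```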
Lindeberg-Levy CLT

Lyapunov CLT

Lindeberg-Feller CLT: if, with $\bar{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^n \sigma_i^2$,

$$\lim_{n \to \infty} \max_{i \in [1, n]} \frac{\sigma_i^2}{n \bar{\sigma}_n^2} = 0$$

then

$$\sqrt{n}(\bar{X} - \bar{\mu}_n) \xrightarrow{d} N(0, \lim \bar{\sigma}_n^2)$$
Delta method: if $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is differentiable, then

$$\sqrt{n}\left( g(\bar{X}) - g(\mu) \right) \xrightarrow{d} N\left( 0, \frac{\partial g(\mu)}{\partial \mu} \sigma^2 \frac{\partial g(\mu)}{\partial \mu} \right)$$
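A simulation check of the delta method (a sketch with $g(x) = x^2$ and exponential(1) draws, so $\mu = \sigma^2 = 1$ and the asymptotic variance is $[g'(1)]^2 \cdot 1 = 4$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 5000
xbar = rng.exponential(1.0, (reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar**2 - 1.0)    # sqrt(n) * (g(xbar) - g(mu)), g(x) = x^2
print(z.var())                      # approximately 4 = [g'(mu)]^2 * sigma^2
```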
If $\lim_{n \to \infty} \frac{g(n)}{f(n)} = c$ for some finite $c$, we say that $g(n) = O(f(n))$; if the limit is zero, $g(n) = o(f(n))$. For example,

$$a_1 n^2 + a_2 n + a_3 = O(n^2)$$

$$a_1 n^2 + a_2 n + a_3 = o(n^3)$$

If $g(n) = O(f(n))$, then $cg(n) = O(f(n))$ for any constant $c$

If $g_1(n) = O(f(n))$ and $g_2(n) = O(f(n))$, then $g_1(n) + g_2(n) = O(f(n))$

If $g_1(n) = O(f(n))$ but $g_2(n) = o(f(n))$, then $g_1(n) + g_2(n) = O(f(n))$

If $g(n) = O(f(n))$ but $f(n) = o(b(n))$, then $g(n) = o(b(n))$
Example: let $X$ be the number of heads in three tosses of a fair coin. Then $X$ takes values in $\{0, 1, 2, 3\}$ with

$$P(X = 0) = 0.125, \quad P(X = 1) = 0.375, \quad P(X = 2) = 0.375, \quad P(X = 3) = 0.125$$
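These probabilities are binomial; a one-line check (a sketch):

```python
from math import comb

for k in range(4):
    print(k, comb(3, k) * 0.5**3)   # 0.125, 0.375, 0.375, 0.125
```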
$$f(x) = \frac{d}{dx} F(x)$$

with the following properties:

$f(x) \geq 0$ (a density can exceed 1, unlike a probability)

$\int_{-\infty}^{\infty} f(x)\,dx = 1$

$F(x) = \int_{-\infty}^{x} f(t)\,dt$
Alternatively,

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

Some identities:

$E[a g_1(X) + b g_2(X) + c] = a E[g_1(X)] + b E[g_2(X)] + c$

If $g_1(x) \geq g_2(x)$ for all $x$, then $E[g_1(X)] \geq E[g_2(X)]$
$$E(X^n) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}$$

Example for $n = 1$:

$$\frac{d}{dt} M_X(t) = \frac{d}{dt} \int e^{tx} f(x)\,dx = \int \frac{d}{dt} e^{tx} f(x)\,dx = \int x e^{tx} f(x)\,dx$$

$$\left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. \int x e^{tx} f(x)\,dx \right|_{t=0} = \int x f(x)\,dx = E[X]$$
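A symbolic check of the moment-from-MGF recipe, using the normal MGF given later in the deck (a sketch with sympy):

```python
import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', positive=True)
M = sp.exp(mu*t + sigma**2 * t**2 / 2)   # normal MGF
m1 = sp.diff(M, t).subs(t, 0)            # first moment
m2 = sp.diff(M, t, 2).subs(t, 0)         # second moment
print(m1, sp.simplify(m2 - m1**2))       # mu and sigma**2
```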
Some identities:

$Var(X) \geq 0$

$Var(c) = 0$ for constant $c$

$Var(X + c) = Var(X)$ for constant $c$

$Var(cX) = c^2 Var(X)$ for constant $c$

$Var(cX + dY) = c^2 Var(X) + d^2 Var(Y) + 2cd\,Cov(X, Y)$ for constants $c, d$

For sums of random variables,

$$Var\left( \sum_{i=1}^N X_i \right) = \sum_{i=1}^N \sum_{j=1}^N Cov(X_i, X_j) = \sum_{i=1}^N Var(X_i) + \sum_{i \neq j} Cov(X_i, X_j)$$

$$Var\left( \sum_{i=1}^N a_i X_i \right) = \sum_{i=1}^N \sum_{j=1}^N a_i a_j Cov(X_i, X_j) = \sum_{i=1}^N a_i^2 Var(X_i) + \sum_{i \neq j} a_i a_j Cov(X_i, X_j) = \sum_{i=1}^N a_i^2 Var(X_i) + 2 \sum_{1 \leq i < j \leq N} a_i a_j Cov(X_i, X_j)$$

If the $X_i$ are pairwise uncorrelated (for example, independent),

$$Var\left( \sum_{i=1}^N X_i \right) = \sum_{i=1}^N Var(X_i)$$
$$P(X|Y) = \frac{P(X \cap Y)}{P(Y)}$$

Bayes' Rule

$$P(X|Y) = \frac{P(Y|X)P(X)}{P(Y)}$$

Conditional distribution

$$f(x|y) = \frac{f(x, y)}{f(y)}$$

Conditional Expectation

$$E(X|Y = y) = \int x f(x|y)\,dx$$

Conditional Variance

$$Var(X|Y = y) = \int [x - E(X|Y = y)]^2 f(x|y)\,dx = E(X^2|Y) - [E(X|Y)]^2$$

Law of Iterated Expectations

$$E[E(X|Y)] = E(X)$$

$$E[E(X|Y, Z)|Y] = E(X|Y)$$

Variance Decomposition

$$Var(X) = E[Var(X|Y)] + Var(E(X|Y))$$

If $X$ and $Y$ are independent events, then

$$P(X|Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{P(X)P(Y)}{P(Y)} = P(X)$$
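A simulation check of the law of iterated expectations (a sketch: $Y$ uniform on $[0,1]$, $X$ normal with conditional mean $Y$):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.uniform(0, 1, 500_000)
x = rng.normal(loc=y, scale=1.0)   # E(X|Y) = Y
print(x.mean(), y.mean())          # both approximately 0.5 = E[E(X|Y)]
```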
Two random variables $X$ and $Y$ are identically distributed iff

$$P[X \leq x] = P[Y \leq x] \quad \forall x$$
The maximum likelihood estimator solves the score equation

$$\frac{\partial}{\partial \theta} \ell(\theta|x) = 0$$
FX ,Y (x, y ) = P(X ≤ x, Y ≤ y )
$$Var(Z) = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$
$$f(x) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_X$$

$$E[g(X)] = \int \cdots \int_{R^n} g(x) f(x)\,dx_1 \cdots dx_n$$

$$M_X(t) = E_X[e^{t^T X}] = \int_{R^n} e^{t^T x} f(x)\,dx$$
$$\mu = E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix}$$

$$Var(X) = \begin{pmatrix} \sigma_{x_1}^2 & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_n} \\ \sigma_{x_2 x_1} & \sigma_{x_2}^2 & \cdots & \sigma_{x_2 x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_n x_1} & \sigma_{x_n x_2} & \cdots & \sigma_{x_n}^2 \end{pmatrix}$$

Some properties

$$E(a^T X) = a^T E(X)$$

$$Var(a^T X) = a^T Var(X) a$$
Probability Density Function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Cumulative Distribution Function

$$F(x) = \frac{1}{2}\left[ 1 + \text{erf}\left( \frac{x - \mu}{\sqrt{2\sigma^2}} \right) \right]$$

Moment Generating Function

$$M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$$

where

$$\text{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$$

Mean and Variance

$$E(X) = \mu, \qquad Var(X) = \sigma^2$$
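A numeric check that the erf expression matches the normal CDF (a sketch with scipy):

```python
import numpy as np
from scipy.special import erf
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 2.5
via_erf = 0.5 * (1 + erf((x - mu) / np.sqrt(2 * sigma**2)))
print(via_erf, norm.cdf(x, loc=mu, scale=sigma))   # identical values
```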
we have

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left( -\frac{\left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2}{2(1 - \rho^2)} \right)$$
Probability Mass Function

$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k = 0, 1, 2, \ldots$$

Cumulative Distribution Function

$$F(k) = e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^i}{i!}$$
Exponential distribution:

$$f(x) = \lambda e^{-\lambda x}, \qquad F(x) = 1 - e^{-\lambda x} \quad (x \geq 0)$$

Gamma distribution (shape $\alpha$, rate $\beta$):

$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \qquad F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$$

where

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\,dt, \qquad \gamma(\alpha, \beta x) = \int_0^{\beta x} t^{\alpha - 1} e^{-t}\,dt$$

Mean and Variance

$$E(X) = \frac{\alpha}{\beta}, \qquad Var(X) = \frac{\alpha}{\beta^2}$$
Beta distribution:

$$f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$$

where

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt, \qquad B(x; \alpha, \beta) = \int_0^x t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt$$

Mean and Variance

$$E(X) = \frac{\alpha}{\alpha + \beta}, \qquad Var(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$
For testing $H_0: c^T \beta = r$ in the classical linear model,

$$T_n = \frac{c^T \hat{\beta} - r}{\sqrt{s^2\, c^T (X^T X)^{-1} c}}$$

follows a Student t distribution (with $n - k$ degrees of freedom, where $k$ is the number of regressors).
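A numerical sketch of the statistic on simulated data (illustrative; it tests the true hypothesis $\beta_2 = 1$, so $T_n$ should look like a single t draw):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([0.5, 1.0])
y = X @ beta + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y                    # OLS estimate
resid = y - X @ b_hat
s2 = resid @ resid / (n - k)                 # unbiased error variance
c, r = np.array([0.0, 1.0]), 1.0             # test H0: beta_2 = 1
T = (c @ b_hat - r) / np.sqrt(s2 * c @ XtX_inv @ c)
print(T)    # a draw from t(n - k); typically within about +/- 2
```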