MLE Stuff
2 Large-Sample Theory
Central Limit Theorem: if $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$, then:
$$\sqrt{n}\,(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$$
Slutsky's Theorem: if $A_n \xrightarrow{P} a$, $B_n \xrightarrow{P} b$, and $Y_n \xrightarrow{d} Y$, then $A_n + B_n Y_n \xrightarrow{d} a + bY$.
Delta Method: suppose $\sqrt{n}\,(X_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $f$ is differentiable at $\theta$. Then:
$$\sqrt{n}\,\big(f(X_n) - f(\theta)\big) \xrightarrow{d} N\big(0, [f'(\theta)]^2 \sigma^2\big)$$
Asymptotic normality of the MLE (canonical exponential family with log-partition function $A$):
$$\sqrt{n}\,(\hat\eta_{MLE} - \eta) \xrightarrow{d} N\!\left(0, \frac{1}{A''(\eta)}\right)$$
The MLE achieves the Cramér-Rao bound in an asymptotic sense. Except not quite, because of superefficiency (e.g., Hodges' estimator).
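As a quick numerical sanity check of the asymptotic normality above, a minimal simulation sketch (my own example, not from the notes, assuming an Exponential model with rate $\theta$, for which the MLE is $1/\bar X_n$ and $I(\theta) = 1/\theta^2$):

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, reps = 2.0, 500, 2000

    # Exponential(rate=theta): the MLE is 1/mean(X) and I(theta) = 1/theta^2,
    # so sqrt(n) * (MLE - theta) should be approximately N(0, theta^2).
    draws = rng.exponential(scale=1 / theta, size=(reps, n))
    mle = 1.0 / draws.mean(axis=1)
    z = np.sqrt(n) * (mle - theta)
    print(z.mean(), z.std())  # should be near 0 and near theta = 2.0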
Theorem 6. Keener 8.18. Let $(X_i)_{i \ge 1}$ be i.i.d. rvs with common CDF $F$, let $\beta \in (0, 1)$, and let $\hat\theta_n$ be the $\lfloor n\beta \rfloor$'th order statistic of $X_1, X_2, \ldots, X_n$. If $F(\theta_\beta) = \beta$, and if $F'(\theta_\beta)$ exists and is finite and positive, then:
$$\sqrt{n}\,(\hat\theta_n - \theta_\beta) \xrightarrow{d} N\!\left(0, \frac{\beta(1 - \beta)}{[F'(\theta_\beta)]^2}\right)$$
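For example, for the sample median ($\beta = 1/2$) of $N(0, 1)$ data, $F'(\theta_{1/2}) = \varphi(0) = 1/\sqrt{2\pi}$, so the asymptotic variance is $\frac{1/4}{1/(2\pi)} = \pi/2$.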
2.1 Random Functions and Uniform Convergence
Definition 3. Random Element. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(E, \mathcal{E})$ a measurable space. A random element with values in $E$ is a function $X : \Omega \to E$ which is $(\mathcal{F}, \mathcal{E})$-measurable.
A random function is an example of a random element. Let $K$ be a compact set and let $W_i(t) = h(t, X_i)$, $t \in K$. Assume $h(t, x)$ is continuous in $t$, $\forall x$. Then the $W_i$ are random functions taking values in $C(K)$, the set of continuous functions on $K$.
Definition 4. Supremum Norm. For $w \in C(K)$, the supremum norm of $w$ is defined as:
$$\|w\|_\infty = \sup_{t \in K} |w(t)|$$
Convergence in $C(K)$ is convergence in this norm: $w_n \to w$ means $\|w_n - w\|_\infty \to 0$.
Lemma 2. Keener p.152. Let $W$ be a random function in $C(K)$. Define $\mu(t) = EW(t)$, $t \in K$. If $E\|W\|_\infty < \infty$, then $\mu$ is continuous. Also:
$$\sup_{t \in K} E\left[\sup_{s : \|s - t\| < \delta} |W(s) - W(t)|\right] \to 0$$
as $\delta \to 0$.
Theorem 7. Dini's Theorem. Suppose $f_1 \ge f_2 \ge \cdots$ are continuous functions on a compact set $K$ with $f_n(x) \downarrow 0$ for every $x \in K$; then $\sup_{x \in K} f_n(x) \to 0$.
Dini's theorem turns a pointwise statement into a uniform one. A more general form, from empirical process theory, is:
Theorem 8. Uniform Weak Law. Keener p.153. Let $W, W_1, W_2, \ldots$ be i.i.d. random functions in $C(K)$, $K$ compact, with mean $\mu$ and $E\|W\|_\infty < \infty$. Let $\bar W_n = \frac{1}{n}\sum_{i=1}^n W_i$. Then:
$$\|\bar W_n - \mu\|_\infty \xrightarrow{P} 0$$
Theorem 9. Let $G_n$, $n \ge 1$, be random functions in $C(K)$, $K$ compact, and suppose $\|G_n - g\|_\infty \xrightarrow{P} 0$ with $g$ a nonrandom function in $C(K)$.
1. If $t_n$, $n \ge 1$, are random variables converging in probability to a constant $t^* \in K$, $t_n \xrightarrow{P} t^*$, then $G_n(t_n) \xrightarrow{P} g(t^*)$.
2. If $g$ achieves its maximum at a unique value $t^*$, and if $t_n$ are random variables maximizing $G_n$, so that $G_n(t_n) = \sup_{t \in K} G_n(t)$, then $t_n \xrightarrow{P} t^*$.
3. If $K \subset \mathbb{R}$ and $g(t) = 0$ has a unique solution $t^*$, and if $t_n$ are random variables solving $G_n(t_n) = 0$, then $t_n \xrightarrow{P} t^*$.
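A small numerical illustration of part 2 (my own sketch, not from Keener): take $g(t) = -(t - 2)^2$ on $K = [0, 4]$, which has a unique maximizer $t^* = 2$, and perturb it by noise shrinking at rate $1/\sqrt{n}$; the maximizer of the perturbed function approaches $t^*$.

    import numpy as np

    rng = np.random.default_rng(1)
    t_grid = np.linspace(0, 4, 401)       # a grid over the compact set K = [0, 4]
    g = -(t_grid - 2.0) ** 2              # nonrandom limit with unique argmax t* = 2

    for n in [10, 100, 10000]:
        # noise whose sup-norm shrinks in probability, so ||G_n - g|| -> 0
        G_n = g + rng.normal(scale=1 / np.sqrt(n), size=t_grid.size)
        print(n, t_grid[G_n.argmax()])    # maximizer of G_n, approaching 2.0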
2.2 Consistency of the MLE
For this subsection, let $X, X_1, X_2, \ldots$ be i.i.d. with common density $f_\theta$, $\theta \in \Omega$, and let $l_n$ be the log-likelihood function for the first $n$ observations:
$$l_n(\omega) = \sum_{i=1}^n \log f_\omega(X_i)$$
Define $W(\omega) = \log \frac{f_\omega(X)}{f_\theta(X)}$, with $W_i$ the analogous random functions of the $X_i$.
Theorem 10. Suppose $\Omega$ is compact, $E_\theta\|W\|_\infty < \infty$, $f_\omega(x)$ is a continuous function of $\omega$ for a.e. $x$, and $P_\omega \ne P_\theta$, $\forall \omega \ne \theta$. Then under $P_\theta$, $\hat\theta_n \xrightarrow{P} \theta$.
Note that here $\hat\theta_n$ is the MLE for $n$ observations, which means it maximizes $l_n(\omega)$, or equivalently the average $\frac{1}{n}\sum_{i=1}^n W_i(\omega)$. This theorem establishes the consistency of the MLE.
Theorem 11. Suppose $\Omega = \mathbb{R}^p$, $f_\omega(x)$ is a continuous function of $\omega$ for a.e. $x$, $P_\omega \ne P_\theta$ for all $\omega \ne \theta$, and $f_\omega(x) \to 0$ as $\|\omega\| \to \infty$. If $E_\theta\|1_K W\|_\infty < \infty$ for every compact set $K \subset \mathbb{R}^p$, and if $E_\theta \sup_{\|\omega\| > a} W(\omega) < \infty$ for some $a > 0$, then under $P_\theta$, $\hat\theta_n \xrightarrow{P} \theta$.
2.3 Asymptotic Normality of the MLE
Theorem 12. Suppose $\theta$ is an interior point of $\Omega \subset \mathbb{R}$, the MLE $\hat\theta_n$ is consistent, $E_\theta W''(\theta) = -I(\theta)$ with $I(\theta) \in (0, \infty)$, and $E_\theta\|1_{[\theta - \epsilon,\, \theta + \epsilon]} W''\|_\infty < \infty$ for some $\epsilon > 0$. Then as $n \to \infty$:
$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\!\left(0, \frac{1}{I(\theta)}\right)$$
2.4 Confidence Intervals
Definition 6. Statistics $\hat\theta_0$ and $\hat\theta_1$ form a $1 - \alpha$ confidence interval for $g(\theta)$ if:
$$P_\theta\big(g(\theta) \in (\hat\theta_0, \hat\theta_1)\big) \ge 1 - \alpha$$
for all $\theta \in \Omega$. Also, a random set $S = S(X)$ constructed from data $X$ is called a $1 - \alpha$ confidence region for $g(\theta)$ if:
$$P_\theta\big(g(\theta) \in S\big) \ge 1 - \alpha$$
for all $\theta \in \Omega$.
Definition 7. A variable which depends on both the data and the parameter, but whose distribution does not depend on the parameter, is called a pivot.
Suppose $\sqrt{nI(\theta)}\,(\hat\theta_n - \theta) \Rightarrow N(0, 1)$. Then:
$$P\left[\sqrt{nI(\theta)}\,\big|\hat\theta_n - \theta\big| \le z_{\alpha/2}\right] \to 1 - \alpha$$
It is often difficult to calculate the Fisher information. We will discuss strategies to approximate it.
1. We can use $I(\hat\theta_n)$ instead. If $I(\cdot)$ is continuous, then:
$$\sqrt{\frac{I(\hat\theta_n)}{I(\theta)}} \xrightarrow{P} 1$$
so
$$\sqrt{nI(\hat\theta_n)}\,(\hat\theta_n - \theta) = \sqrt{\frac{I(\hat\theta_n)}{I(\theta)}}\,\sqrt{nI(\theta)}\,(\hat\theta_n - \theta) \Rightarrow N(0, 1)$$
2. We can also use results from empirical process theory. Remember that $l(\theta) = \sum_{i=1}^n \log f_\theta(X_i)$. Thus by the law of large numbers:
$$-\frac{l''(\hat\theta_n)}{n} \xrightarrow{P} I(\theta)$$
And thus by Slutsky's theorem:
$$\sqrt{-l''(\hat\theta_n)}\,(\hat\theta_n - \theta) \Rightarrow N(0, 1)$$
And thus, expanding $l_n$ around $\hat\theta_n$:
$$2l_n(\hat\theta_n) - 2l_n(\theta) = \left[\sqrt{-l_n''(\hat\theta_n)}\,(\hat\theta_n - \theta)\right]^2 + o_P(1) \Rightarrow \chi_1^2$$
Since $P\big[Z^2 \le z_{\alpha/2}^2\big] = P\big[-z_{\alpha/2} \le Z \le z_{\alpha/2}\big] = 1 - \alpha$:
$$P\big(2l_n(\hat\theta_n) - 2l_n(\theta) \le z_{\alpha/2}^2\big) \to 1 - \alpha$$
3 Hypothesis Testing
3.1 Most Powerful Tests
Consider testing a hypothesis $H : \theta \in \Omega_H$ against the alternative $K : \theta \in \Omega_K$, using a test that rejects $H$ when $X$ falls in a critical region $S_1$. It is customary to fix a level $\alpha$ and require:
$$P_\theta(X \in S_1) \le \alpha \text{ for all } \theta \in \Omega_H \quad (1)$$
and, subject to this, to maximize the probability of rejection:
$$P_\theta(X \in S_1) \text{ for } \theta \in \Omega_K \quad (2)$$
Typically:
$$\sup_{\theta \in \Omega_H} P_\theta(X \in S_1) = \alpha \quad (3)$$
The LHS in (3) is called the size of the test or critical region $S_1$. The probability of rejection in (2) is called the power of the test against the alternative $\theta$. Considered as a function of $\theta$ over $\Omega_K$, this probability is called the power function of the test and is denoted $\beta(\theta)$.
Now, for each $X = x$, instead of choosing 1 or 0 deterministically, one can do it randomly, as a Bernoulli trial with success rate $\varphi(x)$. The randomized test is therefore completely characterized by the function $\varphi(x)$. The set of points $x$ for which $\varphi(x) = 1$ is the region of rejection. The probability of rejection is:
$$E_\theta \varphi(X) = \int \varphi(x)\, dP_\theta(x) \quad (4)$$
The problem now is to select $\varphi$ to maximize the power $E_\theta \varphi(X)$ for $\theta \in \Omega_K$, subject to:
$$E_\theta \varphi(X) \le \alpha, \ \forall \theta \in \Omega_H \quad (5)$$
Now if $K$ has more than one element, the test that maximizes the power for each alternative will generally be different, so things are more complicated. But if $K$ has only one element, things simplify. Sometimes, when there are many alternatives, we can get lucky and have a single test that maximizes the power against all alternatives in $K$. This is called a uniformly most powerful (UMP) test.
Theorem 13. Let $P_0$ and $P_1$ be probability distributions possessing densities $p_0$ and $p_1$ respectively w.r.t. a measure $\mu$.
(i) Existence. For testing $H : p_0$ against the alternative $K : p_1$ there exist a test $\varphi$ and a constant $k$ such that:
$$E_0 \varphi(X) = \alpha \quad (6)$$
and
$$\varphi(x) = \begin{cases} 1 & \text{when } p_1(x) > k\, p_0(x) \\ 0 & \text{when } p_1(x) < k\, p_0(x) \end{cases} \quad (7)$$
(ii) Sufficient condition for a most powerful test. If a test satisfies (6) and (7) for some $k$, then it is most powerful for testing $p_0$ against $p_1$ at level $\alpha$.
(iii) Necessary condition for a most powerful test. If $\varphi$ is most powerful at level $\alpha$ for testing $p_0$ against $p_1$, then for some $k$ it satisfies (7) a.e. $\mu$. It also satisfies (6) unless there exists a test of size $< \alpha$ and with power 1.
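A minimal sketch of (6)-(7) for two simple Gaussians (my own example, not from the notes): testing $p_0 = N(0, 1)$ against $p_1 = N(1, 1)$, the likelihood ratio $p_1(x)/p_0(x) = e^{x - 1/2}$ is increasing in $x$, so the most powerful test rejects when $x$ exceeds a cutoff chosen to satisfy (6).

    import numpy as np
    from scipy.stats import norm

    alpha = 0.05
    # (7) reduces to rejecting when x > c, with c chosen so that
    # E_0[phi(X)] = P_0[X > c] = alpha, which is (6).
    c = norm.ppf(1 - alpha)           # cutoff under N(0, 1), ~1.645
    k = np.exp(c - 0.5)               # the constant k in (7)
    power = 1 - norm.cdf(c - 1.0)     # P_1[X > c] under N(1, 1), ~0.26
    print(c, k, power)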
3.2 Monotone Likelihood Ratio
Now when the set $K$ has multiple elements, again in general the most powerful test of $H$ against an alternative $\theta_1 > \theta_0$ (in contrast to $H : \theta \le \theta_0$) depends on $\theta_1$ and is then not UMP. However, a UMP test does exist if an additional assumption is satisfied.
Definition 8. Monotone Likelihood Ratio. The real-parameter family of densities $p_\theta(x)$ is said to have monotone likelihood ratio if there exists a real-valued function $T(x)$ such that for any $\theta < \theta'$ the distributions $P_\theta$ and $P_{\theta'}$ are distinct, and the ratio $p_{\theta'}(x)/p_\theta(x)$ is a nondecreasing function of $T(x)$.
Theorem 14. Let $\theta$ be a real parameter, and let the random variable $X$ have density $p_\theta(x)$ with monotone likelihood ratio in $T(x)$.
(i) For testing $H : \theta \le \theta_0$ against $K : \theta > \theta_0$, there exists a UMP test, which is given by:
$$\varphi(x) = \begin{cases} 1 & \text{when } T(x) > C \\ \gamma & \text{when } T(x) = C \\ 0 & \text{when } T(x) < C \end{cases} \quad (8)$$
where $C$ and $\gamma$ are determined by:
$$E_{\theta_0} \varphi(X) = \alpha \quad (9)$$
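For example, the $N(\theta, 1)$ family with an i.i.d. sample has monotone likelihood ratio in $T(x) = \sum_i x_i$, so the test that rejects $H : \theta \le \theta_0$ when $\bar x > \theta_0 + z_\alpha/\sqrt{n}$ is UMP at level $\alpha$ (no randomization is needed since $T$ is continuous).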
4 Concentration Bounds
Theorem 15. Markov's inequality. For a nonnegative RV $X$:
$$P[X \ge t] \le \frac{EX}{t}, \ \forall t > 0$$
Theorem 16. Chebyshev's inequality. For a RV $X$ with mean $\mu$:
$$P[|X - \mu| \ge t] \le \frac{E(X - \mu)^2}{t^2}$$
Theorem 17. Chernoff bound. If the MGF of $X - \mu$ exists for $\lambda \in [0, b]$:
$$P[X - \mu \ge t] \le \inf_{\lambda \in [0, b]} \left\{ e^{-\lambda t}\, E\left[e^{\lambda(X - \mu)}\right] \right\}$$
Definition 9. Sub-Gaussian. A RV $X$ with mean $\mu$ is sub-Gaussian with parameter $\sigma$ if:
$$E\left[e^{\lambda(X - \mu)}\right] \le e^{\sigma^2 \lambda^2 / 2}, \ \forall \lambda \in \mathbb{R}$$
Thus: $P[X - \mu \ge t] \le \exp\left(-\frac{t^2}{2\sigma^2}\right)$. If $X$ is sub-Gaussian, so is $-X$. Thus:
$$P[|X - \mu| \ge t] \le 2 \exp\left(-\frac{t^2}{2\sigma^2}\right)$$
Theorem 18. Hoeffding-type bound. Suppose $X_1, \ldots, X_n$ are independent, and $X_i$ is sub-Gaussian with parameter $\sigma_i$ and mean $\mu_i$. Then $\forall t \ge 0$:
$$P\left[\sum_{i=1}^n (X_i - \mu_i) \ge t\right] \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2}\right)$$
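A quick simulation check of this bound (my own sketch) for Rademacher variables, which are sub-Gaussian with $\sigma_i = 1$ and mean 0:

    import numpy as np

    rng = np.random.default_rng(3)
    n, reps, t = 100, 50000, 20.0

    # Sums of n Rademacher variables; the theorem gives exp(-t^2 / (2n)).
    s = rng.choice([-1.0, 1.0], size=(reps, n)).sum(axis=1)
    print((s >= t).mean(), np.exp(-t**2 / (2 * n)))  # empirical tail vs bound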
Theorem 19. Equivalent characterizations of sub-Gaussian variables. Given any zero-mean RV $X$, the following properties are equivalent:
(i) $\exists \sigma \ge 0 : E[\exp\{\lambda X\}] \le \exp\left(\frac{\sigma^2 \lambda^2}{2}\right), \ \forall \lambda \in \mathbb{R}$
(ii) $\exists c \ge 1$ and $Z \sim N(0, \tau^2)$ such that $P[|X| \ge s] \le c\, P[|Z| \ge s], \ \forall s \ge 0$
(iii) $\exists \theta \ge 0 : E\left[X^{2k}\right] \le \frac{(2k)!}{2^k k!}\, \theta^{2k}, \ \forall k = 1, 2, \ldots$
(iv) $E\left[\exp\left(\frac{\lambda X^2}{2\sigma^2}\right)\right] \le \frac{1}{\sqrt{1 - \lambda}}, \ \forall \lambda \in [0, 1)$
Definition 10. Sub-exponential. A RV $X$ with mean $\mu$ is sub-exponential with parameters $(\nu, b)$ if:
$$E[\exp\{\lambda(X - \mu)\}] \le \exp\left(\frac{\nu^2 \lambda^2}{2}\right), \ \forall |\lambda| < \frac{1}{b}$$
Theorem 20. Sub-exponential tail bound. Suppose that $X$ is sub-exponential with parameters $(\nu, b)$. Then:
$$P[X \ge \mu + t] \le \begin{cases} \exp\left(-\frac{t^2}{2\nu^2}\right), & \forall t \in \left[0, \frac{\nu^2}{b}\right] \\ \exp\left(-\frac{t}{2b}\right), & \forall t > \frac{\nu^2}{b} \end{cases}$$
Definition 11. Bernstein's condition. Given a RV $X$ with mean $\mu$ and variance $\sigma^2$, Bernstein's condition with parameter $b$ holds if:
$$\left|E(X - \mu)^k\right| \le \frac{1}{2} k!\, \sigma^2 b^{k-2}, \ \forall k = 3, 4, \ldots$$
Theorem 21. Bernstein-type bound. For any RV satisfying the Bernstein condition above, we have:
$$E[\exp\{\lambda(X - \mu)\}] \le \exp\left(\frac{\sigma^2 \lambda^2 / 2}{1 - b|\lambda|}\right), \ \forall |\lambda| < \frac{1}{b}$$
$$P[|X - \mu| \ge t] \le 2 \exp\left(-\frac{t^2}{2(\sigma^2 + bt)}\right)$$
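Note that any bounded RV with $|X - \mu| \le b$ a.s. satisfies Bernstein's condition with this $b$: for $k \ge 3$, $|E(X - \mu)^k| \le E\left[(X - \mu)^2\right] b^{k-2} = \sigma^2 b^{k-2} \le \frac{1}{2} k!\, \sigma^2 b^{k-2}$.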
Multivariate sub-exponential. Suppose RVs $X_1, \ldots, X_n$ are independent, and $X_k$ is sub-exponential with parameters $(\nu_k, b_k)$ and mean $\mu_k = E[X_k]$. The MGF of the centered sum factorizes and satisfies:
$$E\left[\exp\left\{\lambda \sum_{k=1}^n (X_k - \mu_k)\right\}\right] = \prod_{k=1}^n E[\exp\{\lambda(X_k - \mu_k)\}] \le \exp\left(\frac{\lambda^2 \sum_{k=1}^n \nu_k^2}{2}\right), \ \forall |\lambda| < \frac{1}{\max_{k=1,\ldots,n} b_k}$$
so the sum is sub-exponential with parameters $\left(\sqrt{\sum_{k=1}^n \nu_k^2},\ \max_k b_k\right)$.
Example. Let $Z_k \overset{\text{i.i.d.}}{\sim} N(0, 1)$. Each $Z_k^2$ is sub-exponential with parameters $(\nu, b) = (2, 4)$, and:
$$P\left[\left|\frac{1}{n}\sum_{k=1}^n Z_k^2 - 1\right| \ge t\right] \le 2 \exp\left(-\frac{n t^2}{8}\right), \ \forall t \in (0, 1)$$
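A simulation sketch checking this bound (my own, at the arbitrary choices $n = 50$, $t = 0.5$):

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps, t = 50, 100000, 0.5

    # Deviation of the average of n squared standard normals from its mean 1.
    dev = np.abs((rng.standard_normal((reps, n)) ** 2).mean(axis=1) - 1.0)
    print((dev >= t).mean(), 2 * np.exp(-n * t**2 / 8))  # empirical tail vs bound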
Theorem 22. Equivalent characterizations of sub-exponential variables. For a zero-mean RV $X$, the following statements are equivalent:
(i) $\exists \nu, b \ge 0 : E[\exp\{\lambda X\}] \le \exp\left(\frac{\nu^2 \lambda^2}{2}\right), \ \forall |\lambda| < \frac{1}{b}$