Lecture 14: Logistic Regression
Spring 2020
Stanley Chan
Note that the sigmoid $h(x) = \frac{1}{1+e^{-a(x-x_0)}}$ satisfies
$$h(x) \to 1 \ \text{ as } x \to \infty, \qquad h(x) \to 0 \ \text{ as } x \to -\infty.$$
Its derivative is
$$\begin{aligned}
\frac{d}{dx}\,\frac{1}{1+e^{-a(x-x_0)}}
&= -\left(1+e^{-a(x-x_0)}\right)^{-2} e^{-a(x-x_0)}(-a)\\
&= a\left(\frac{1}{1+e^{-a(x-x_0)}}\right)\left(\frac{e^{-a(x-x_0)}}{1+e^{-a(x-x_0)}}\right)\\
&= a\left(\frac{1}{1+e^{-a(x-x_0)}}\right)\left(1-\frac{1}{1+e^{-a(x-x_0)}}\right)\\
&= a\,[1-h(x)]\,[h(x)].
\end{aligned}$$
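As a quick numerical sanity check of this identity, here is a small sketch (not from the lecture; the values $a=2$ and $x_0=0.5$ are arbitrary choices) that compares a finite-difference estimate of $h'(x)$ against $a[1-h(x)]h(x)$:

```python
import numpy as np

def h(x, a=2.0, x0=0.5):
    """Sigmoid h(x) = 1 / (1 + exp(-a (x - x0)))."""
    return 1.0 / (1.0 + np.exp(-a * (x - x0)))

a, x0 = 2.0, 0.5
x = np.linspace(-5, 5, 11)

# Analytic derivative from the identity h'(x) = a [1 - h(x)] h(x)
analytic = a * (1 - h(x, a, x0)) * h(x, a, x0)

# Central finite-difference approximation of h'(x)
eps = 1e-5
numeric = (h(x + eps, a, x0) - h(x - eps, a, x0)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be on the order of 1e-10 or smaller
```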
$$\frac{1}{N}\left\|
\begin{bmatrix} \boldsymbol{x}_1^T & 1 \\ \vdots & \vdots \\ \boldsymbol{x}_N^T & 1 \end{bmatrix}
\begin{bmatrix} \boldsymbol{w} \\ w_0 \end{bmatrix}
-
\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}
\right\|^2
= \frac{1}{N}\,\|\boldsymbol{A}\boldsymbol{\theta} - \boldsymbol{y}\|^2.$$
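The following sketch shows one way this matrix form can be assembled, using hypothetical data and the names $\boldsymbol{A}$, $\boldsymbol{\theta}$, $\boldsymbol{y}$ from the equation above:

```python
import numpy as np

# Hypothetical data: N samples of dimension d
N, d = 5, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(N, d))          # rows are x_n^T
y = rng.integers(0, 2, size=N)       # labels y_n in {0, 1}

# A has rows [x_n^T, 1]; theta stacks [w; w_0]
A = np.hstack([X, np.ones((N, 1))])
w, w0 = rng.normal(size=d), 0.0
theta = np.append(w, w0)

# L2 training loss (1/N) * ||A theta - y||^2
loss = np.sum((A @ theta - y) ** 2) / N
print(loss)
```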
Training Loss for Logistic Regression
$$\begin{aligned}
J(\boldsymbol{\theta}) &= \sum_{n=1}^N L\big(h_{\boldsymbol{\theta}}(\boldsymbol{x}_n),\, y_n\big)\\
&= \sum_{n=1}^N -\Big\{ y_n \log h_{\boldsymbol{\theta}}(\boldsymbol{x}_n) + (1-y_n)\log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big) \Big\}
\end{aligned}$$
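A minimal sketch of evaluating this training loss, assuming the logistic hypothesis $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = 1/(1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}})$ and hypothetical data (the clipping constant `eps` is a numerical safeguard, not part of the lecture):

```python
import numpy as np

def h(theta, X):
    """Logistic hypothesis h_theta(x_n) = 1 / (1 + exp(-theta^T x_n)) for each row of X."""
    return 1.0 / (1.0 + np.exp(-X @ theta))

def J(theta, X, y, eps=1e-12):
    """Training loss J(theta) = sum_n -{ y_n log h + (1 - y_n) log(1 - h) }."""
    p = np.clip(h(theta, X), eps, 1 - eps)   # clip to avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical data with a bias column appended
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(20, 2)), np.ones((20, 1))])
y = rng.integers(0, 2, size=20)
print(J(rng.normal(size=3), X, y))
```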
[Figure: training loss $J(\theta)$ versus $\theta$ for the L2 loss (left panel, "L2") and the logistic loss (right panel, "Logistic").]
So the L2 loss is not convex, but the logistic loss is concave (its negative is convex).
If you run gradient descent on the L2 loss, you can get trapped in a local minimum.
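One way to see this numerically (an illustration with made-up 1-D data, not from the lecture) is to sweep a scalar $\theta$ over a grid and check the discrete second differences of each loss curve, which are nonnegative for a convex curve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data (scalar theta, no bias), purely for illustration
x = np.array([1.0, 2.0, -1.0])
y = np.array([1.0, 1.0, 0.0])

thetas = np.linspace(-5, 5, 201)
l2_loss = np.array([np.sum((sigmoid(t * x) - y) ** 2) for t in thetas])
logistic_loss = np.array([
    -np.sum(y * np.log(sigmoid(t * x)) + (1 - y) * np.log(1 - sigmoid(t * x)))
    for t in thetas
])

# Discrete convexity check: second differences of a convex curve are >= 0
def min_second_difference(curve):
    return np.min(curve[2:] - 2 * curve[1:-1] + curve[:-2])

print("L2 loss      :", min_second_difference(l2_loss))        # negative here -> not convex
print("logistic loss:", min_second_difference(logistic_loss))  # nonnegative on this grid
```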
$$\begin{aligned}
\operatorname*{argmin}_{\boldsymbol{\theta}}\; J(\boldsymbol{\theta})
&= \operatorname*{argmin}_{\boldsymbol{\theta}} \sum_{n=1}^N -\Big\{ y_n \log h_{\boldsymbol{\theta}}(\boldsymbol{x}_n) + (1-y_n)\log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big) \Big\}\\
&= \operatorname*{argmin}_{\boldsymbol{\theta}}\, -\log\left( \prod_{n=1}^N h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)^{y_n}\,\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big)^{1-y_n} \right)\\
&= \operatorname*{argmax}_{\boldsymbol{\theta}} \prod_{n=1}^N \Big\{ h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)^{y_n}\,\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big)^{1-y_n} \Big\}.
\end{aligned}$$
This is a maximum-likelihood estimation for Bernoulli variables, with $h_{\boldsymbol{\theta}}(\boldsymbol{x}_n) = p$ and $1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}_n) = 1 - p$.
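As a small numerical illustration of this equivalence (hypothetical 1-D data, scalar $\theta$), the grid point that minimizes the negative log-likelihood is the same one that maximizes the product of likelihood terms:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data, scalar theta for simplicity
x = np.array([0.5, 1.5, -1.0, 2.0])
y = np.array([1.0, 1.0, 0.0, 1.0])

thetas = np.linspace(-3, 3, 601)
nll = []          # negative log-likelihood J(theta)
likelihood = []   # product of Bernoulli terms h^y (1 - h)^(1 - y)
for t in thetas:
    p = sigmoid(t * x)
    nll.append(-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    likelihood.append(np.prod(p ** y * (1 - p) ** (1 - y)))

# The grid minimizer of J and the grid maximizer of the likelihood coincide
print(thetas[np.argmin(nll)], thetas[np.argmax(likelihood)])
```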
Equivalently, the loss can be rewritten as
$$J(\boldsymbol{\theta}) = \sum_{n=1}^N -\left\{ y_n \log \frac{h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)}{1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)} + \log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big) \right\}.$$
In statistics, the term $\log \frac{h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)}{1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)}$ is called the log-odds.
If we put $h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \frac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}$, we can show that
$$\log \frac{h_{\boldsymbol{\theta}}(\boldsymbol{x})}{1-h_{\boldsymbol{\theta}}(\boldsymbol{x})}
= \log \frac{\;\dfrac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}\;}{\;\dfrac{e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}\;}
= \log e^{\boldsymbol{\theta}^T\boldsymbol{x}} = \boldsymbol{\theta}^T\boldsymbol{x}.$$
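A quick numerical check of this identity, with an arbitrary $\boldsymbol{\theta}$ and $\boldsymbol{x}$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=4)
x = rng.normal(size=4)

h = 1.0 / (1.0 + np.exp(-theta @ x))

# The log-odds should equal theta^T x
print(np.log(h / (1 - h)), theta @ x)   # the two numbers agree
```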
Since $1-h_{\boldsymbol{\theta}}(\boldsymbol{x}) = \frac{e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}$, the gradient of the term $-\log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x})\big)$ in the loss is
$$\begin{aligned}
-\nabla_{\boldsymbol{\theta}} \log \frac{e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}
&= -\nabla_{\boldsymbol{\theta}}\Big[\log e^{-\boldsymbol{\theta}^T\boldsymbol{x}} - \log\big(1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}\big)\Big]\\
&= \boldsymbol{x} + \left(\frac{-e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}\right)\boldsymbol{x}
= h_{\boldsymbol{\theta}}(\boldsymbol{x})\,\boldsymbol{x}.
\end{aligned}$$
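The closed-form gradient $h_{\boldsymbol{\theta}}(\boldsymbol{x})\boldsymbol{x}$ can be compared against a finite-difference gradient of $-\log(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}))$; the $\boldsymbol{\theta}$ and $\boldsymbol{x}$ below are arbitrary:

```python
import numpy as np

def neg_log_one_minus_h(theta, x):
    """-log(1 - h_theta(x)) with h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return -np.log(1.0 - 1.0 / (1.0 + np.exp(-theta @ x)))

rng = np.random.default_rng(2)
theta = rng.normal(size=3)
x = rng.normal(size=3)

# Closed-form gradient h_theta(x) * x
h = 1.0 / (1.0 + np.exp(-theta @ x))
grad_closed = h * x

# Finite-difference gradient, one coordinate at a time
eps = 1e-6
grad_fd = np.zeros_like(theta)
for i in range(len(theta)):
    e = np.zeros_like(theta)
    e[i] = eps
    grad_fd[i] = (neg_log_one_minus_h(theta + e, x) - neg_log_one_minus_h(theta - e, x)) / (2 * eps)

print(np.max(np.abs(grad_closed - grad_fd)))   # should be tiny
```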
Taking the gradient once more gives the Hessian
$$\nabla_{\boldsymbol{\theta}}^2\big[-\log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x})\big)\big]
= \frac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}\left(1-\frac{1}{1+e^{-\boldsymbol{\theta}^T\boldsymbol{x}}}\right)\boldsymbol{x}\boldsymbol{x}^T
= h_{\boldsymbol{\theta}}(\boldsymbol{x})\,[1-h_{\boldsymbol{\theta}}(\boldsymbol{x})]\,\boldsymbol{x}\boldsymbol{x}^T,$$
which is positive semi-definite because $h_{\boldsymbol{\theta}}(\boldsymbol{x})[1-h_{\boldsymbol{\theta}}(\boldsymbol{x})] \ge 0$.
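Likewise, the rank-one Hessian $h_{\boldsymbol{\theta}}(\boldsymbol{x})[1-h_{\boldsymbol{\theta}}(\boldsymbol{x})]\boldsymbol{x}\boldsymbol{x}^T$ can be formed directly and its eigenvalues inspected to confirm positive semi-definiteness (arbitrary $\boldsymbol{\theta}$ and $\boldsymbol{x}$):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.normal(size=3)
x = rng.normal(size=3)

# Closed-form Hessian of -log(1 - h_theta(x)): h (1 - h) x x^T
h = 1.0 / (1.0 + np.exp(-theta @ x))
H = h * (1 - h) * np.outer(x, x)

# Eigenvalues are nonnegative (up to rounding); the matrix is rank one
print(np.linalg.eigvalsh(H))
```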
Therefore,
$$J(\boldsymbol{\theta}) = \sum_{n=1}^N -\left\{ y_n \log \frac{h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)}{1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)} + \log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big) \right\}$$
is convex in $\boldsymbol{\theta}$: the first term inside the braces is linear in $\boldsymbol{\theta}$ (the log-odds equals $\boldsymbol{\theta}^T\boldsymbol{x}_n$), and the second has the positive semi-definite Hessian above.
So we can use convex optimization algorithms to find θ.
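One simple such algorithm is gradient descent. The sketch below is an illustration, not the lecture's implementation: combining the per-sample gradient terms above gives $\nabla J(\boldsymbol{\theta}) = \sum_n (h_{\boldsymbol{\theta}}(\boldsymbol{x}_n) - y_n)\boldsymbol{x}_n$, and the data and step size here are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_J(theta, X, y):
    """Gradient of J(theta): sum_n (h_theta(x_n) - y_n) x_n."""
    return X.T @ (sigmoid(X @ theta) - y)

# Hypothetical data drawn from a known logistic model, with a bias column
rng = np.random.default_rng(4)
X = np.hstack([rng.normal(size=(100, 2)), np.ones((100, 1))])
true_theta = np.array([2.0, -1.0, 0.5])
y = (rng.uniform(size=100) < sigmoid(X @ true_theta)).astype(float)

# Gradient descent with an arbitrary fixed step size
theta = np.zeros(3)
step = 0.01
for _ in range(5000):
    theta -= step * grad_J(theta, X, y)

print(theta)   # approaches the minimizer of J; a noisy estimate of true_theta with 100 samples
```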
Convex Optimization for Logistic Regression
We can use CVX to solve the logistic regression problem.
But it requires some reorganization of the equations:
$$\begin{aligned}
J(\boldsymbol{\theta})
&= \sum_{n=1}^N -\Big\{ y_n \boldsymbol{\theta}^T\boldsymbol{x}_n + \log\big(1-h_{\boldsymbol{\theta}}(\boldsymbol{x}_n)\big) \Big\}\\
&= \sum_{n=1}^N -\left\{ y_n \boldsymbol{\theta}^T\boldsymbol{x}_n + \log\left(1 - \frac{e^{\boldsymbol{\theta}^T\boldsymbol{x}_n}}{1+e^{\boldsymbol{\theta}^T\boldsymbol{x}_n}}\right) \right\}\\
&= \sum_{n=1}^N -\Big\{ y_n \boldsymbol{\theta}^T\boldsymbol{x}_n - \log\big(1+e^{\boldsymbol{\theta}^T\boldsymbol{x}_n}\big) \Big\}\\
&= -\left(\sum_{n=1}^N y_n \boldsymbol{x}_n\right)^{\!T} \boldsymbol{\theta} + \sum_{n=1}^N \log\big(1+e^{\boldsymbol{\theta}^T\boldsymbol{x}_n}\big).
\end{aligned}$$
The last term is a sum of log-sum-exp functions: $\log\big(e^{0} + e^{\boldsymbol{\theta}^T\boldsymbol{x}_n}\big)$.
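CVX itself is a MATLAB toolbox; as a hedged illustration in Python, the same reorganized objective can be expressed in CVXPY, whose atom `cp.logistic(z)` computes $\log(1+e^{z})$, exactly the log-sum-exp term above. The data here is synthetic.

```python
import cvxpy as cp
import numpy as np

# Synthetic data: rows of X are x_n^T (with a bias column), noisy labels y in {0, 1}
rng = np.random.default_rng(5)
X = np.hstack([rng.normal(size=(50, 2)), np.ones((50, 1))])
w_true = np.array([1.0, -1.0, 0.2])
y = (X @ w_true + rng.normal(size=50) > 0).astype(float)

theta = cp.Variable(3)

# J(theta) = -(sum_n y_n x_n)^T theta + sum_n log(1 + exp(theta^T x_n))
c = y @ X                                           # sum_n y_n x_n
J = -c @ theta + cp.sum(cp.logistic(X @ theta))     # cp.logistic(z) = log(1 + e^z)

prob = cp.Problem(cp.Minimize(J))
prob.solve()
print(theta.value)
```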
Convex Optimization for Logistic Regression
[Figure: logistic regression fit on the interval 0 to 10, showing the data points, the estimated curve, and the true curve.]
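A sketch of how such a figure could be generated, using synthetic 1-D data with made-up true parameters and a few Newton steps built from the gradient and Hessian derived earlier (an illustration, not the lecture's code):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic 1-D problem: true curve is sigmoid(a (x - x0)) with made-up parameters
rng = np.random.default_rng(6)
a_true, x0_true = 1.5, 5.0
x = rng.uniform(0, 10, size=200)
y = (rng.uniform(size=200) < sigmoid(a_true * (x - x0_true))).astype(float)

# Fit theta = (slope, intercept) with a few Newton steps,
# using the gradient and Hessian formulas from the derivation above
A = np.column_stack([x, np.ones_like(x)])
theta = np.zeros(2)
for _ in range(20):
    p = sigmoid(A @ theta)
    g = A.T @ (p - y)                                   # gradient of J
    H = A.T @ (A * (p * (1 - p))[:, None])              # Hessian of J
    theta -= np.linalg.solve(H + 1e-6 * np.eye(2), g)   # small ridge for numerical stability

# Plot the data, the estimated curve, and the true curve
xs = np.linspace(0, 10, 200)
plt.scatter(x, y, s=10, label="Data")
plt.plot(xs, sigmoid(theta[0] * xs + theta[1]), label="Estimated")
plt.plot(xs, sigmoid(a_true * (xs - x0_true)), label="True")
plt.legend()
plt.show()
```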