Lecture 2.3.7-2.3.9
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING (CSE)
COURSE OUTCOMES
On completion of this course, the students shall be able to
What is a sequence?
• Ordered set of elements: s = a1, a2, ..., an
• Each element ai could be
• Numerical
• Categorical: domain is a finite set of symbols S, |S| = m
• Multiple attributes
• The length n of a sequence is not fixed
• Order is determined by time or position and can be regular or irregular
Real-life sequences
• Classical applications
• Speech: sequence of phonemes
• Language: sequence of words and delimiters
• Handwriting: sequence of strokes
• Newer applications
• Bioinformatics:
• Genes: Sequence of 4 possible nucleotides, |S|=4
• Example: AACTGACCTGGGCCCAATCC
• Proteins: Sequence of 20 possible amino-acids, |S|=20
• Linear-time algorithms exist for constructing such probabilistic suffix trees (PSTs) from training data [Apostolico 2000]
Hidden Markov Models
• Doubly stochastic models
[Figure: a four-state HMM with states S1-S4; each state has its own emission distribution over the symbols A and C (e.g. A 0.6 / C 0.4) and the edges between states carry transition probabilities (e.g. 0.9, 0.5, 0.8, 0.1, 0.2)]
• Efficient dynamic programming algorithms exist for
• Finding Pr(S)
• The highest-probability path P that maximizes Pr(S|P) (Viterbi)
• Training the model (Baum-Welch algorithm)
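As an illustration of the Viterbi step, here is a minimal sketch in Python, assuming a toy two-state HMM over the alphabet {A, C}; the state names, probabilities, and helper names are illustrative, not taken from the slides.

# Viterbi: most likely state path for an observed symbol sequence.
# Toy 2-state HMM over {A, C}; all numbers are illustrative.
import math

states = ["S1", "S2"]
start = {"S1": 0.5, "S2": 0.5}
trans = {"S1": {"S1": 0.9, "S2": 0.1},
         "S2": {"S1": 0.5, "S2": 0.5}}
emit = {"S1": {"A": 0.6, "C": 0.4},
        "S2": {"A": 0.9, "C": 0.1}}

def viterbi(seq):
    # delta[s] = best log-probability of any path ending in state s
    delta = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back-pointers, one dict per position after the first
    for sym in seq[1:]:
        ptr, new_delta = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p] + math.log(trans[p][s]))
            ptr[s] = prev
            new_delta[s] = (delta[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][sym]))
        back.append(ptr)
        delta = new_delta
    last = max(states, key=lambda s: delta[s])  # best final state
    path = [last]
    for ptr in reversed(back):                  # trace back-pointers
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

print(viterbi("AACCA"))  # (most likely state path, its log-probability)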
Discriminative training of HMMs
• Models trained to maximize the likelihood of the data might perform badly when
• the model is not representative of the data
• the training data is insufficient
• Alternatives to Maximum-likelihood/EM
• Objective functions:
• Minimum classification error
• Maximum posterior probability of actual label Pr(c|x)
• Maximum mutual information with class
• These objective functions are harder to train, so a number of alternatives to EM have been proposed
• Generalized probabilistic descent [Katagiri 98]
• Deterministic annealing [Rao 01]
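For concreteness, here is a small sketch of the posterior objective Pr(c|x), assuming one HMM per class whose forward log-likelihoods have already been computed; the class names, numbers, and the class_posteriors helper are hypothetical.

# Posterior Pr(c|x) from class-conditional HMM log-likelihoods: the
# quantity that discriminative training maximizes for the true class.
# The log-likelihoods stand in for per-class HMM forward scores.
import math

def class_posteriors(log_liks, log_priors):
    # log_liks, log_priors: dicts mapping class -> log Pr(x|c), log Pr(c)
    scores = {c: log_liks[c] + log_priors[c] for c in log_liks}
    m = max(scores.values())  # log-sum-exp for numerical stability
    z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
    return {c: math.exp(v - z) for c, v in scores.items()}

# Illustrative numbers for two classes:
post = class_posteriors({"normal": -12.3, "attack": -15.1},
                        {"normal": math.log(0.9), "attack": math.log(0.1)})
print(post)  # posterior distribution over the two classes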
HMMs for profiling system calls
• Training:
• Initial number of states = 40 (roughly equals number of distinct system calls)
• Train on normal traces
• Testing:
• Need to handle variable-length, online data
• For each call, find the probability of the model emitting that call given all calls before it
• If this probability is below a threshold, mark the call abnormal
• A trace is abnormal if the fraction of abnormal calls is high
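A minimal sketch of this online test, assuming the HMM parameters are given as dictionaries and using an incrementally normalized forward vector to obtain Pr(call_t | earlier calls); the threshold values and all names here are illustrative.

# Online per-call anomaly scoring with an HMM: the normalized forward
# vector alpha is updated one call at a time, and its normalizer at
# step t equals Pr(call_t | call_1..call_{t-1}).
def score_trace(seq, states, start, trans, emit,
                sym_thresh=1e-3, frac_thresh=0.1):
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    cond = [sum(alpha.values())]               # Pr(first call)
    alpha = {s: v / max(cond[0], 1e-300) for s, v in alpha.items()}
    for sym in seq[1:]:
        new = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][sym]
               for s in states}
        p = sum(new.values())                  # Pr(call_t | prefix)
        cond.append(p)
        alpha = {s: v / max(p, 1e-300) for s, v in new.items()}
    abnormal = sum(p < sym_thresh for p in cond)
    return abnormal / len(seq) > frac_thresh, cond  # (is abnormal?, scores)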
More realistic experiments

             STIDE                  RIPPER                 HMM
             threshold  %false-pos  threshold  %false-pos  threshold  %false-pos
Site-1 lpr   12         0.0         3          0.0016      10^-7      0.0003
Site-2 lpr   12         0.0013      4          0.0265      10^-7      0.0015
named        20         0.0019     10          0.0         10^-7      0.0
xlock        20         0.00008    10          0.0         10^-7      0.0
Example: IE
[Figure: two nested HMMs, each with Prefix and Suffix states around inner states S1-S4: one built for the tag "Building name" and one combined model over all tags]
Naïve Model: One state per element
A C T G G T T T A C C C C T G T G
Simpler problem: Segmenting a 0/1 sequence
• Players A and B
• A has a set of coins with different biases
• A repeatedly
• picks an arbitrary coin
• tosses it an arbitrary number of times
• B observes the sequence of heads/tails
• B guesses the transition points and the biases
[Figure: A picks, tosses, and returns coins; B observes the bit string 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 0 1]
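To make the setup concrete, here is a short simulation of player A's generative process; the biases, run lengths, seed, and the generate helper are all illustrative.

# Player A: repeatedly pick a coin (a bias), toss it a few times,
# return it; player B sees only the resulting bit string.
import random

def generate(biases=(0.1, 0.9), n_runs=5, max_len=6, seed=0):
    rng = random.Random(seed)
    bits, switches = [], []
    for _ in range(n_runs):
        switches.append(len(bits))        # ground-truth transition point
        bias = rng.choice(biases)         # pick an arbitrary coin
        for _ in range(rng.randint(1, max_len)):  # toss it several times
            bits.append(1 if rng.random() < bias else 0)
    return bits, switches

bits, switches = generate()
print(bits, switches)  # B sees only bits and must guess switches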
An MDL-based approach
•Given n head/tail observations
• Can assume n different coins with bias 0 or 1
• Data fits perfectly (with probability one)
• Many coins needed
• Or assume one coin
• May fit data poorly
• “Best segmentation” is a compromise between
model length and data fit
[Figure: segmenting 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 0 1 cast as a shortest-path problem over boundary positions; each edge spans one candidate segment and its cost = model cost + data cost]
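A sketch of this shortest-path formulation in Python, assuming a simple per-segment cost of a fixed number of model bits (for one coin's bias) plus data bits equal to the segment's empirical entropy; the model_bits constant and the helper names are assumptions.

# MDL segmentation as a shortest path: node i is the boundary before
# bit i, edge (i, j) encodes bits[i:j] with one coin, and the cheapest
# path from 0 to n is the best segmentation.
import math

def seg_cost(ones, length, model_bits=8.0):
    p = ones / length                     # empirical bias of the segment
    if p in (0.0, 1.0):
        data_bits = 0.0                   # segment fits perfectly
    else:
        data_bits = -length * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return model_bits + data_bits         # edge cost = model + data cost

def segment(bits):
    n = len(bits)
    prefix = [0]                          # prefix[i] = ones in bits[:i]
    for b in bits:
        prefix.append(prefix[-1] + b)
    best = [0.0] + [math.inf] * n         # best[j] = cheapest code of bits[:j]
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(prefix[j] - prefix[i], j - i)
            if c < best[j]:
                best[j], back[j] = c, i
    cuts, j = [], n                       # recover internal boundaries
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts), best[n]

bits = [0,0,1,0,1,1,0,1,0,1,1,0,1,0,0,0,1]
print(segment(bits))                      # (boundary positions, total bits)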
Non-independent models
• In the previous method each segment is assumed to be independent of the others; this does not allow model reuse of the form shown in the figure below
[Figure: the sequence 0 0 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 segmented so that two shared models M1 and M2 are reused across non-adjacent segments]
2. Solve (n, h) to get H models
• Example: 2-medians
3. Assign each segment to the best of the H models (see the sketch below)
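A sketch of steps 2 and 3, under the assumption that each segment is summarized by its fraction of 1s and that "solve" means exact one-dimensional 2-medians over those fractions; the boundaries, H = 2, and all helper names are illustrative (step 1, which produces the segment boundaries, is assumed done).

# Steps 2-3: summarize each segment by its fraction of 1s, solve
# 2-medians over those fractions to get two shared models, then
# assign every segment to its nearest model.
import statistics

def two_medians(vals):
    # Exact 1-D 2-medians: optimal clusters are contiguous in sorted
    # order, so try every split point of the sorted values.
    vs = sorted(vals)
    best_cost, best = float("inf"), None
    for i in range(1, len(vs)):
        left, right = vs[:i], vs[i:]
        m1, m2 = statistics.median(left), statistics.median(right)
        cost = (sum(abs(v - m1) for v in left)
                + sum(abs(v - m2) for v in right))
        if cost < best_cost:
            best_cost, best = cost, (m1, m2)
    return best

def fit_shared_models(bits, boundaries):
    cuts = [0] + list(boundaries) + [len(bits)]
    fracs = [sum(bits[a:b]) / (b - a) for a, b in zip(cuts, cuts[1:])]
    models = two_medians(fracs)                       # step 2
    assign = [min((0, 1), key=lambda k: abs(f - models[k])) for f in fracs]
    return models, assign                             # step 3

bits = [0,0,1,0,1,1,0,1,0,1,1,0,0,0,0,0,1,0]
print(fit_shared_models(bits, boundaries=[6, 12]))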
THANK YOU
For queries
Email: [email protected]