Ilovepdf Merged

Computer Vision
FindingFeatures, Affine
Invariance, SIFT
1
Outline
• Concept of Scale
– Pyramids
– Scale-space approaches briefly
• Scale invariant region selection
• SIFT: an image regiondescriptor
2
Reminder: Motivation
• Image Matching
– Fundamentalaspect of many problems
• Object Recognition
• 3D Structures
• Stereo Correspondence
• Motion Tracking
• ….
3
Reminder: Motivation
• Image Matching
– Fundamental aspect of many problems
• Object Recognition
• 3D Structures
• Stereo Correspondence
• Motion Tracking
• ….
What are the desired features to conduct these tasks?
4
Review of the CornerDetection
Corners are invariant to
translation
rotation
scaling
5
SCALE
• The extrema in a signal and its first a few

derivatives provide a useful general purpose
description for many kinds of signals
– Edges
– Corners
6
SCALE

derivatives provide a useful general purpose
description for many kinds of signals
– Edges
– Corners
• In physics, objects live on a range of scales.
7
SCALE

derivatives provide a usefulgeneral purpose
descriptionfor many kinds of signals
– Edges
– Corners
• In physics, objects live on a range of scales.
• Neighborhood operations can only extract local
features (at a scale of at most a few pixels),
however, images contain information at larger
scales
8
Scale-Space
• Any given image can
contain objects that exist
at scales different from
other objects in the same
image
9
Scale-Space
image
• Or, that even existat
multiple scales
simultaneously
10
Scale-Space
image
• Or, that even existat
multiple scales
simultaneously
11
Multi-scale Representation
Small Scale Smaller Filter
Features Masks
Image
Large Scale Larger Filter
Features Masks
12
Multi-scale Representation
Small Scale Smaller Filter
Features Masks
Image
Large Scale Larger Filter
Features Masks
Computational Doubling the scale leads to four-fold

Cost increases! increase in the number of operations in 2D.
13
Multi-Scale Representation: Pyramid
• Pyramid is one way to represent images in

multi-scale
– Pyramid is built by using multiple copies of image.
– Each level in the pyramid is 1/4 of the size of
previous level.
– The lowest level is of the highest resolution.
– The highest level is of the lowest resolution.
14
Multi-Scale Representation: Pyramid
15
Pyramid can capture global and local
features
16
Gaussian Pyramids
17
Source: Forsyth
Laplacian Pyramids
18
Source: Forsyth
Laplacian Pyramids
1
-2 1 -2 1
1
Laplacian of Gaussian (LoG): Gaussian first to smooth images then perform Laplacian operation
19
Laplacian Pyramids
20
Laplacian Pyramids
Laplacian Pyramid Construction
22
Laplacian Pyramid Construction
23
Decimation and Interpolation
Lowpass filter
(i.e., Gaussian)
24
Difference of Gaussian(DoG)
• It is a common approximation of LoG (better run
time).
• It is the difference between a blurredcopy of image I

and an even more blurred copy ofI.
25
Scale Selection-Automated
• Find scale that gives local maxima of some function
f in both position and scale.
f f
Image 1 Image 2
s1 region size s2 region size

Good Function?
• A “good” function for scale detection:
has one stable sharp peak
f f f
Good !
bad bad
region size region size region size
• For usual images: a good function would be a one

which responds to contrast (sharp local intensity
change)
27
Patch Size Corresponding toScale
How to find corresponding patch sizes?
28
29
30
31
f (I i1! i m (x,  )) f (I i1! im (x,))

32
f (I i1! i m (x,  )) f (I i1! im (x,))

33
• Function responses for increasing scale (scale signature)
f (I i1! i m (x,  )) f (I i1! im (x,)

34
Normalization
42
Scale Invariant Feature Transform (SIFT)
• Lowe., D. 2004, IJCV
4
3
• Shows existence, importance, and value of

invariant types of detector
• Demonstrates the richness that feature
descriptors can bring to featurematching
Instead of using LoG (Laplacian of Gaussian) like

in Harris and Hessian-based operators, SIFT uses
DoG (Difference of Gaussian).
37
• Image content is transformed into local feature

coordinates that are invariant to translation,
rotation, scale, and other imaging parameters
38
• Image content is transformed into local feature
coordinates that are invariant to translation,
rotation, scale, and other imaging parameters
39
Overall Procedure at a High Level
Scale-Space Search over multiple scales and image
Extrema locations
Detection
Keypoint Fit a model to determine location and

scale. Select Keypoints based on a
Localization measure of stability.
Orientation Compute best orientation(s) for each

Assignment keypoint region.
Keypoint Use local image gradients at selected

scale and rotation to describe each
Description keypoint region. 47
Basic idea:
SIFT Descriptor
• Take n x n (i.e., n=16) square window (around a feature/interest point)
• Divide them into m x m (i.e, m=4) cells
• Compute gradient orientation for each cell
• Create histogram over edge orientations weighted by magnitude
0 2
angle histogram
41
SIFT Descriptor
16 histograms x 8 orientations
= 128 features
42
SIFT Descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
• Threshold normalize the descriptor:
such that:
0.2
43
Properties of SIFT
Extraordinarily robust matching technique
– Can handle changes in viewpoint

– Can handle significant changes in illumination
– Efficient (realtime)
– Lots of code available
• http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
44
Revisit SIFT Steps
(1) Scale-space extrema detection
– Extract scale and rotation invariant interest points (i.e.,
keypoints).
(2) Keypoint localization
– Determine location and scale for each interest point.
– Eliminate “ weak ” keypoints
(3) Orientation assignment
– Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
– Use local image gradients at the selected scale.
45
(1) Scale-Space ExtremaDetection
We want to find points that give us information about the

objects in the image.
The information about the objects is in the object’s edges.
We will represent the image in a way that gives us these edges

as this representations extrema points.
46
scale
• Harris-Laplace
LoG
Find local maximaof:
– Harris detector in space y
– LoG in scale
 Harris → x
scale
• SIFT
DoG
Find local maximaof:
– Hessian in space y
– DoG in scale
Hessian x
47
48
• Extract local extrema
(i.e., minima or maxima)
( )
D k 2
in DoG pyramid.
-Compare each point to
its 8 neighbors at the
same level, 9 neighbors
in the level above, and 9
D(k ) neighbors in the level
below (i.e., 26 total).
D( )
49
• Laplacian of Gaussian kernel

– Scale normalized (x by scale2)
– Proposed by Lindeberg
• Scale-space detection
– Find local maxima across scale/space
– A good “blob” detector
50
(2) Keypoint Localization
• There are still a lot of points, some of them are
not good enough.
• The locations of keypoints may be not
accurate.
• Eliminating edge points.
51
—Reject (1) points with low contrast(flat)
(2) poorly localized along an edge (edge)
—Reject (1) points with low contrast(flat)
(2) poorly localized along an edge (edge)
Inaccurate KeypointLocalization
Poor contrast
True Extrema
Detected Extrema
Sampling x
61
Inaccurate KeypointLocalization
The Solution:
Taylor expansion:
Minimize to find accurate extrema:
If offset from sampling point is larger than 0.5 -

Keypoint should be in a differentsampling point.
62
Brown & Lowe 2002
Local extremas
56
Local extremas Remove low contrast features
57
Local extremas Remove low edges
58
SIFT Descriptor
59
(3) Orientation Assignment
Create histogram of gradient directions, within a region around the
keypoint, at selected scale:
L(x, y, ) = G(x, y, )* I (x, y)

m(x, y) = (L(x +1,y) − L(x −1,y))2 + (L(x,y +1)− L(x, y −1))2
(x, y) = atan2((L(x,y +1)− L(x, y −1))/ (L(x +1,y)− L(x −1,y)))
36 bins (i.e., 10o per bin)
0 2
•Histogram entries are weighted by (i) gradientmagnitude and (ii) a
Gaussian function with σ equal to 1.5 times the scale of the key point.
61
62
63
64
65
(4) KeypointDescriptor
• Partial Voting: distribute histogram entries into adjacent bins
(i.e., additional robustness to shifts)
– Each entry is added to all bins, multiplied by a weight of1-d,
where d is the distance from the bin it belongs.
66
(4) KeypointDescriptor
• Descriptordepends on two main parameters:
– (1) number of orientations
– n x n array of orientation histograms
67
Evaluating Results
How can we measure the performance of a feature matcher?
the matcher correctly

found a match 0.7
true
# true positives matched positive
# true positives rate
features that really do have a match
0 0.1 false positive rate 1

the matcher said yes when
# false positives matched the right answer was no
# true negatives
68
Evaluating Results
How can we measure the performance of a feature matcher?
ROC curve (“Receiver Operator Characteristic”)
1
the matcher correctly

found a match 0.7
true
# true positives matched positive
# true positives rate
features that really do have a match
0 0.1 false positive rate 1

the matcher said yes when
# false positives matched the right answer was no
# true negatives
69
Change of Illumination
Change of brightness => doesn’t effect gradients
(difference of pixels value).
Change of contrast => doesn’t effect gradients
(up to normalization).
Saturation (non-linear change of illumination) =>
affects magnitudes much more than orientation.
=> Threshold gradient magnitudes to 0.2 and
renormalize.
70
Illumination invariance?
1. Before you start, you can normalize the images.

2. Difference based metrics can be used (Haar, SIFT,…)
71
72
• Extract features
73
• Compute putative matches
74
• Compute putative matches
• Loop:
– HypothesizetransformationT (small group of
putative matches that are related by T)
75
• Compute putativematches
• Loop:
– Hypothesize transformation T (small groupof
– Verify transformation (search for other matches
consistent with T) 83
• Compute putativematches
• Loop:
– Hypothesize transformation T (small groupof
– Verify transformation (search for other matches
consistent with T) 84
Object Recognition
For training images:
Extracting keypoints by SIFT. Creating
descriptors database.
For query images:
Extracting keypoints by SIFT.
For each descriptor - finding nearest neighbor in DB.
Finding cluster of at-least3 keypoints.

Performing detailed geometric fit check for each cluster.
78
Recognition/Matching
79
Image Registration
80
THE END
Slide Credits: Ulas Bagci
INTRODUCTION
• Recognition of object deals with labelling of
an object present in an image in the same
way as a human will understand it.
• For computers to identify/recognize an
object there should be prior knowledge
database.
• Machine learning deals with training of
computers by giving them to prior
knowledge about the object shape, size,
color etc. which leads to recognise the
object.
Like human understanding , object recognition includes:
• Detection: of separate objects.

• Description: of their geometry and position in 3D
• Classification: as being one of known class
• Identification: of the particular instance
• Understanding: of spatial relationship between objects
What does object recognition involves?
Identification: Is it Taj mahal?
Object categorization
sky
tree
people
Scene and context categorization
The whole image says a scene where people come to visit Taj mahal
Instance level recognition:
The image contains Taj mahal in it. It is obvious.

Generic categorization
All of the images contain a common object i.e. a building like structure
Object categorization problem
▪ Given a number of training images of a category,

recognize a priori unknown instances of that category
and assign the correct category label.
▪ It deals with the categorization of an object based upon

prior knowledge database.
▪ Problem : A single object can not have an unique label.

It may belongs to various categories.
Red rose Rose Flower Plant
Which category is best suited for the above image?

For categorization problem first we have to understand how
human brain is able to categorize the object.
Brain deals with following task when it sees an image.
• The highest level at which category members have similar

perceived shape. Ex- In the previous image the shape highly
resemblance to rose.
• The highest level at which single mental image reflects the
entire category. Ex- when we see the image a prior image of
rose comes to our mind.
• The level at which human subjects are usually fastest at
identifying category numbers. Ex- What does the image
represent: flower or rose? We obviously choose rose.
• The highest label at which a person uses a similar motor actions
for interaction with category members. Here the highest
category is red rose.
Steps of object recognition
Start
Collect data
Prior knowledge
Choose features
Choose model
Train classifier
Evaluate classifier
End
Challenges in object recognition
• Robustness: Illumination, Occlusions, Clutter, Object

pose, View point
• Importance of context: Some part of the picture have

same intensity value but represent different object.
So context information may be incorporated.
• Complexity of image: Number of pixels, Large number

of categories, Large number of database.
Applications: Assisted driving
Pedestrian and car detection
Ped
meters
Ped
Car
meters
Lane detection • Collision warning

systems with adaptive
cruise control
• Lane departure warning
systems
• Rear object detection
systems
Applications: image search
Challenges: viewpoint variation
Challenges: illumination variation
slide credit: S. Ullman

Challenges: occlusion
Challenges: scale
Scaling affects
the image .
Challenges: deformation
Challenges: intra-class variation
Past achievements using Object
recognition
• Finger print recognition
• Bank cheque number recognition
• Facebook person finder
• Google net
• Recognition of flat structured objects

A brief introduction to Pattern Recognition
“The assignment of a physical object or event to one of a several
pre-specified categories”
Duda & Hart
• A pattern is an object, process or event that can be given a

name.
• A pattern class (or category) is a set of patterns sharing common
attributes and usually originating from the same source
• During recognition, given objects are assigned to the prescribed
classes.
• A classifier is a machine which performs classification.
Basic component of pattern recognition system
Training data
Physical environment
Pre processing
Data acquisition
Feature extraction
Pre processing
Feature selection
Feature extraction
Features
Features
Classification Model learning and
estimation
Post processing
Decision
Pattern representation
• A pattern is represented by a set of d features, or

attributes viewed as d-dimensional feature vector.
𝑋 = (𝑥1 , 𝑥2 , … . 𝑥𝑑 )𝑇
Feature vector
• A vector of observations (measurements).
Pattern • Feature vector (𝑥 ∈ 𝑋)
𝑥1 • 𝑥 is a point in feature
𝑥2
. =𝑥
space 𝑋
. • Feature vector is the set
𝑦 𝑥𝑑 of observations.
• Pattern (𝑦 ∈ 𝑌)
Task
• Our task is to develop a classifier(or decision rule) which can
map the features with respective patterns.
𝑞: 𝑋 → 𝑌
Feature extraction
Task: to extract features which are good for classification.
Good features:
• Objects from same classes have similar feature values

• Objects from different classes have different values.
Good features Bad features

Feature selection
• The problem is optimization of parameters of feature
extractor, i.e. which features to be selected such that
the classifier can perform the task efficiently as well as
dimension of feature vector can be reduced.
• These can be resolved by using PCA (unsupervised) or

LDA(supervised).
Classifier
• A classifier partitions feature space X in to class labeled
regions such that
𝑋 = (𝑋1 ∪ 𝑋2 ∪ ⋯ ∪ 𝑋 𝑌 ) and (𝑋1 ∩ 𝑋2 ∩ ⋯ ∩ 𝑋 𝑌 ) = ∅
• The classification task is to determine the region to which the
feature vector belongs.
• Boarders between decision boundaries are called decision
regions.
The efficiency of this classifier depend upon two
factors:
• What mistakes it does?
• Cost associated with these mistakes
To avoid misclassification two strategies are applied:
• Use the training data to build representative probability

model; separately model class conditional densities and
priors(Generative)
• Directly construct a good decision boundary, model the

posterior(Discriminative)
Types Data Modelling
• Generative methods
• Probabilistically model the classes
• model the “shape” of each class
• histograms, PCA, mixtures of Gaussians, graphical
models (HMM’s, belief networks, etc.)
• Discriminative methods
• Probabilistically model the decision boundaries
• model boundaries between classes
• perceptron, neural networks, support vector machines
(SVM’s)
Generative classification Example
• Skin pixels have a distinctive range of
colors corresponds to region(s) in RGB
color space
• For visualization, only R and G skin
components are shown in the image.
• Task: To determine whether the pixel
belongs to skin region or not.
• A pixel X = (R,G,B) is skin if it is in the
skin region
How to solve this problem?
Skin detection
X
• Learn the skin region from examples
• Manually label pixels in one or more
“training images” as skin or not skin
• Plot the training data in RGB space
• Skin pixels shown in orange, non-skin
pixels shown in red
• Some skin pixels may be outside the
region, non-skin pixels inside. Why?
Skin classification techniques
Given X = (R,G,B): how to determine if
it is skin or not?
• Nearest neighbor
– find labeled pixel closest to X
– choose the label for that pixel
• Data modeling
– fit a model (curve, surface, or
volume) to each class
• Probabilistic data modeling
– fit a probability model to each
class
Basic Probability
• X is a random variable
• P(X) is the probability that X achieves a certain
value
• For continuous X
∞
‫׬‬−∞ 𝑃
𝑋 𝑑𝑋 = 1
0≤𝑃 𝑋 ≤1
• For discrete X
෍𝑃 𝑋 = 1
• Conditional probability: P(X | Y)

• probability of X given that we already know Y
Probabilistic skin classification
• Each pixel has a probability of being skin or not skin

𝑃 ~𝑠𝑘𝑖𝑛 𝑅 = 1 − 𝑃(𝑠𝑘𝑖𝑛|𝑅)
• Choose interpretation of highest probability
• set X to be a skin pixel if and only if
𝑅1 < 𝑋 ≤ 𝑅2
Learning conditional PDF’s
• We can calculate P(R | skin) from a set of training images. It is

simply a histogram over the pixels in the training images. Each
bin Ri contains the proportion of skin pixels with color Ri
• This doesn’t work as well in higher-dimensional spaces.

Approach: fit parametric PDF functions
• common choice is rotated Gaussian
• center
𝑐 = 𝑋ത
• Covariance
ത
෍(𝑋 − 𝑋)(𝑋 ത 𝑇
− 𝑋)
𝑋
But this isn’t quite what we want

• How to determine if a pixel is skin?
• We want P(skin | R) not P(R | skin)
• How can we get it?
Bayes rule
what we measure domain knowledge

(likelihood) (prior)
what we want
normalization term
(posterior)
• The prior: P(skin)
• P(skin) may be larger if we know the image contains a
person
• For a portrait, P(skin) may be higher for pixels in the center
• P(skin) may be proportion of skin pixels in training set
• Suppose the prior is uniform: P(skin) = P(~skin) = 0.5
– in this case P(skin|R)= cP(R|skin)

P( ̴skin|R)= cP(R| s̴ kin) ,
• Maximizing the posterior is equivalent to maximizing the likelihood

P(skin|R)> P( s̴ kin|R) if and only if P(R|skin)>P(R| s̴ kin)
This is called Maximum Likelihood (ML) estimation

Bayesian estimation
likelihood posterior (un-normalized)
Goal : To choose the label (skin or ~skin) that maximizes the

posterior(minimizes the probability of misclassification).
This is called Maximum A Posteriori (MAP) estimation
General classification
• This same procedure applies in more general circumstances
– More than two classes
– More than one dimension
Example: face detection

• Here, X is an image region
– dimension = # pixels
– each face can be
thought of as a point in
a high dimensional
space
H. Schneiderman and T.Kanade
Principal Component Analysis (PCA)
• Problems arise when performing recognition in a high-dimensional
space (curse of dimensionality).
• Significant improvements can be achieved by first mapping the data
into a lower-dimensional sub-space.
• The goal of PCA is to reduce the dimensionality of the data while
retaining as much information as possible in the original dataset.
𝑎1 𝑏1
𝑎2 𝑏2
𝑥 = . −→ reduce dimensionality −→ y = . (K ≪ N)
. .
𝑎𝑁 𝑏𝑁
44
• Principle Component Analysis is all about finding the direction
in a feature space along which all the points have the greatest
variance.
Uses:
• Data Visualization
• Data Reduction
• Data Classification
• Trend Analysis
• Factor Analysis
• Noise Reduction
• Suppose we have a population measured on p random variables
X1,…,Xp.
• Note that these random variables represent the p-axes of the
Cartesian coordinate system in which the population resides.
• Our goal is to develop a new set of p axes (linear combinations of the
original p axes) in the directions of greatest variability.
X2
X1
This is accomplished by rotating the axes.

Algebraic Interpretation
• Given m points in a n dimensional space, for large n, how does one

project on to a low dimensional space while preserving broad
trends in the data and allowing it to be visualized?
Algebraic Interpretation – 1D
• Given m points in a n dimensional space, for large n, how
does one project on to a 1 dimensional space?
• Choose a line that fits the data so the points are spread out
well along the line
Algebraic Interpretation – 1D
• Formally, minimize sum of squares of distances to the line.
• Why sum of squares? Because it allows fast minimization,

assuming the line passes through 0
• Minimizing sum of squares of distances of the point to the line is
the same as maximizing the sum of squares of the projections on
that line, using Pythagoras theorem.
𝑥Ԧ 𝑇 𝑃
Origin
• 𝑥Ԧ 𝑖𝑠 𝑎 𝑢𝑛𝑖𝑡 𝑣𝑒𝑐𝑡𝑜𝑟 𝑡ℎ𝑟𝑜𝑢𝑔ℎ 𝑜𝑟𝑖𝑔𝑖𝑛

• 𝑥Ԧ 𝑇 𝑃 is the projection from point P to the line
• If there are more than one point present, then their total projection
on line can be found out using matrix multiplication
Point 1
P P P… P Point 2 L
Line * t t t … t *
Point 3 * i = ෍(𝑥Ԧ 𝑇 𝑃𝑖 )2
1 2 3… m : n 𝑖
Point m e
xT BT B x
• Our goal is to maximize 𝑥 𝑇 𝐵 𝑇 𝐵𝑥 , subject to constrain 𝑥 𝑇 𝑥 = 1

• Max( 𝑥 𝑇 𝐵 𝑇 𝐵𝑥 ), subject to constrain 𝑥 𝑇 𝑥 = 1
• Maximize 𝐸 = 𝑥 𝑇 𝑀𝑥 subject to 𝑥 𝑇 𝑥=1 (𝑀 = 𝐵 𝑇 𝐵)

• By applying Lagrange’s multiplier we can write
𝐸 ′ = 𝑥 𝑇 𝑀𝑥 + 𝜆 1 − 𝑥 𝑇 𝑥
• To maximize 𝐸′
𝑑𝐸 ′
= 2𝑀𝑥 + 2𝜆𝑥 = 0
𝑑𝑥
𝑀𝑥 = 𝜆𝑥(ℎ𝑒𝑟𝑒 𝑠𝑖𝑔𝑛 𝑖𝑠 𝑛𝑒𝑔𝑙𝑒𝑐𝑡𝑒𝑑)
• 𝑥 is an eigen vector of 𝐵 𝑇 𝐵
𝑻
How to calculate 𝑩 𝑩?
2
σ𝑛𝑖=1 𝑥𝑖 σ𝑛𝑖=1 𝑥𝑖 𝑦𝑖
𝐵𝑇 𝐵= 2
σ𝑛𝑖=1 𝑥𝑖 𝑦𝑖 σ𝑛𝑖=1 𝑦𝑖
• Principle components are the orthogonal directions of the

covariance matrix of a set of points.
• 𝐵 𝑇 𝐵 = σ 𝑥𝑥 𝑇 if about origin
• 𝐵 𝑇 𝐵 = σ(𝑥 − 𝑥)(𝑥
Ԧ Ԧ 𝑇 if otherwise outer product
− 𝑥)
Dimensionality Reduction
• x is the mean of all the orange
G
points.
• v1 represents the position along
the line
• v2 represents the distance from
the points to the line
• To represent all the orange
point as 𝑣1 co-ordinates one
has to minimize 𝑣2 and
R
maximize 𝑣1
Consider the variation along direction 𝑣 among all of the orange
points:
T 2
var v = ෍ (x − x) . v
orange point x
= ෍ vT (x − x)(x − x)T v
x
= vT ෍ (x − x)(x − x)T
x
= vT Av
• v1 is eigenvector of A with largest eigenvalue
• v2 is eigenvector of A with smallest eigenvalue
𝑣1 = 𝑚𝑎𝑥𝑣 𝑣𝑎𝑟(𝑣) 𝑣2 = 𝑚𝑖𝑛𝑣 𝑣𝑎𝑟(𝑣)

Higher Dimension
• Suppose each data point is N-dimensional
T 2
var v = ෍ (x − x) . v
x
= 𝑥 𝑇 𝐴𝑥
• The eigenvectors of A define a new coordinate system.
• Eigenvector with largest eigenvalue captures the most

variation among training vectors x.
• Eigenvector with smallest eigenvalue has least variation.

Face recognition
• When viewed as vectors of
pixel values, face images are
extremely high dimensional.
• 100*100 image=10000
dimensions
• However only a few among
them corresponds a valid
face images
• We want to effectively model
a low dimensional linear
subspace which effectively
defines the variation of face
images.
• Given : M points x1,x2,...,xM in 𝑅𝑑 where d is very big.
• We want some direction in 𝑅𝑑 that captures most of the variations of
𝑥𝑖 . The coefficient would be
𝑢 𝑥𝑖 = 𝑢𝑇 𝑥𝑖 − μ
𝜇:mean of the points
• Direction that maximizes the variance of the projected data:
𝑀
1
= ෍ 𝑢𝑇 𝑥𝑖 − μ (𝑢𝑇 𝑥𝑖 − μ )𝑇
𝑀
𝑖=1
𝑀
𝑇
1 𝑇
=𝑢 ෍ 𝑥𝑖 − μ 𝑥𝑖 − μ u
𝑀
𝑖=1
= 𝑢𝑇 Σ𝑢
• Since u is a unit vector that can be expressed in terms of some
linear sum of eigenvectors of ∑, then direction of u that maximizes
the variance is the eigenvector associated with the largest
eigenvalue of ∑.
• The dimension that captures the maximum variation of data is the

eigenvector corresponding to the largest eigenvalue of the data
covariance matrix.
• Furthermore, the top k orthogonal directions that capture the most

variance of the data are the k eigenvectors corresponding to the k
largest eigenvalues.
PCA Example –STEP 1
• Subtract the mean from each of the data

dimensions.
• Subtracting the mean makes variance and covariance

calculation easier by simplifying their equations. The
variance and co-variance values are not affected by
the mean value.
DATA: ZERO MEAN DATA:
x y x y
2.5 2.4 .69 .49
0.5 0.7 -1.31 -1.21
2.2 2.9 .39 .99
1.9 2.2
.09 .29
3.1 3.0
1.29 1.09
2.3 2.7
2 1.6 .49 .79
1 1.1 .19 -.31
1.5 1.6 -.81 -.81
1.1 0.9 -.31 -.31
-.71 -1.01
• Calculate the covariance matrix
cov = .616555556 .615444444
.615444444 .716555556
• since the non-diagonal elements in this covariance

matrix are positive, we should expect that both the x
and y variable increase together.
• Calculate the eigenvectors and eigenvalues of

the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
•Eigenvectors are plotted
as diagonal dotted lines on
the plot.
• They are perpendicular
to each other.
•One of the eigenvectors
goes through the middle of
the points, like drawing a
line of best fit.
•The second eigenvector
gives us the other, less
important, pattern in the
data, that all the points
follow the main line, but
are off to the side of the
main line by some
amount.
• Reduce dimensionality and form feature vector: The

eigenvector with the highest eigenvalue is the principal
component of the data set.
• In our example, the eigenvector with the larges eigenvalue

was the one that pointed down the middle of the data.
• Once eigenvectors are found from the covariance matrix, the

next step is to order them by eigenvalue, highest to lowest.
This gives you the components in order of significance.
• Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)
We can either form a feature vector with both of the
eigenvectors:
-.677873399 -.735178656
-.735178656 .677873399
or, we can choose to leave out the smaller, less
significant component and only have a single
column:
- .677873399
- .735178656
• Deriving the new data

Final Data = Row Feature Vector * Row Zero Mean Data
• Row Feature Vector is the matrix with the eigenvectors in the
columns transposed so that the eigenvectors are now in the
rows, with the most significant eigenvector at the top
• Row Zero Mean Data is the mean-adjusted data transposed,
i.e. the data items are in each column, with each row holding
a separate dimension.
Final Data transpose: dimensions along columns
x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
Limitations of PCA
• Global appearance method : Not robust to misalignment,
background variation.
• The direction of maximum variance is not always good for
classification.
Some challenges for generative models
• Generative approaches were some of the first methods in pattern
recognition.
• Easy to model analytically and could be done with modest
amounts of moderate dimensional data.
There are some liabilities:
• Many signals are high dimensional and representing the complete

density of class is data-hard.
• Problem in model the hard cases-the ones near the boundaries!!
• We don’t typically know which features of instances actually
discriminative between classes.
Discriminative classification assumptions
• There are a fixed number of known classes.

• Ample number of training examples of each classes.
• Equal cost of making mistakes-what matter is getting the label right.
• Need to construct a representation of the instance but we don’t
know what features are diagnostic of the class label.
Basic framework
Train:
• Build an object model-a representation describe training
instances.
• Learn/train a classifier
Test:
• Generate candidates in new image
• Score the candidate
Discriminative methods(window method)
• Object detection and recognition is formulated as a classification
problem.
• The image is partitioned into a set of overlapping windows
Bag of image patches

• The image is partitioned into a set of overlapping windows and a
decision is taken at each window about if it contains a target object
or not.
Where are the screens?

Background Decision
boundary
Computer screen
Bag of image patches In some feature space
Generative vs. Discriminative
Generative Approach Discriminative Approach

model individual classes, priors model posterior directly
Feature or kernel space clustering
• Kernel functions are used to transform the data in the

image plane into a feature plane of higher dimension
(possibly infinite) known as kernel (or feature) space.
• Nonlinear mapping functions i.e. Φ transforms the
nonlinear separation problem in the image plane into a
linear separation problem in kernel space facilitating
clustering in the feature space.
Feature space clustering
• Due to high and possibly infinite feature dimension it is

unrealistic to measure the Euclidian distance between the
transformed variables.
• However, as per Mercer’s theorem working directly on the

transformed variables can be avoided.
• It can be used to calculate the distance between the pixel

feature values in the kernel space without knowing the
transformation function φ(.).
Mercers Theorem
• Any continuous, symmetric, positive semi definite kernel

function can be expressed as a dot product in a higher
dimension”. Thus if Φ(.) is the mapping function then using
Mercer’s Theorem k (x, y) is given by
where x and y be any two point in the image plane and Φ(x) and
Φ (y) are the corresponding points in the feature plane
respectively. ” ・ ” is the dot product in the kernel space and
k (x, y) represents kernel function.
Hematological Image Analysis for

108 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Standard Kernels
9/15/16
Computer Vision
Optical Flow
1
Readings
• Szeliski, R.
• Slide Credits : Ulas Bagci
2
Motion
3
Reasons
• the patterns only move when you blink or
move your eyes
• the arrangement of the backgrounds of the
‘rabbits’ determines which way the patterns
rotate.
4
Motion
Perception of change in the outside world is of the
highest importance for all living beings.
A great majority of changes in sensory excitation in

the image result from motion.
Images also have remarkable changes when a

light source is turned on or off-but this change
cannot be interpreted as motion.
5
Why Estimate Visual Motion?
• Visual Motion can be due to problems
– Camera instabilities, jitter
6
• Visual Motion Indicates dynamics in the scene
– Moving objects, behavior, tracking objects,
analyze trajectories
7
• Visual Motion Indicates dynamics in the scene
– Moving objects, behavior, tracking objects,
analyze trajectories
• Visual Motion reveals spatial layout
– Motion parallax
8
Motion Analysis
• Motion Analysis is connected with real-time
analysis; Ex: for robot navigation, gesture
identification etc.
• Prior Knowledge is necessary that includes:

➢ Information about the camera motion
➢ Information about the time interval between
consecutive images
Motion Analysis
Problems:
• Moving object detection and location; (i) static
camera and moving object (ii) Static object and
moving camera (iii) Moving object and moving
camera.
• Derivation of 3-D object properties from a set of
2-D projections acquired at different time
instants of object motion.
Motion Analysis
2-D representation of 3-D motion => Motion Field
Each point is assigned a velocity vector
corresponding to:
1. Motion direction
2. Velocity
3. Distance from an observer at an appropriate
image location
Visual Motion Estimation
• Patch-based motion
– Optical Flow
• Lucas-Kanade
• Horn-Schunck
• Motion Models
12
Applications
Motion
based
segmentation
Deformation Structure
Analysis from motion
Alignment
Tumor
(e.g., UAV
Tracking analysis)
Video
Compression
13
Video
• A video is a sequence of frames captured over
time
– Image data is a function of space (x,y) and time (t)
I(x,y,t)
Temporal
change t 15
Apparent Motion
Present dots on three movie frames See one moving dot
15
Describing Motion
16
Describing Motion
• Simplest way: Image Differencing (intensity values)
t
t+1
17
Optical Flow
• Refers to the problem of estimating a vector
field of local displacement in a sequence of
images.
18
Optical Flow
images.
• When we fix our attention to a single point
and measure velocities flowing through that
location, then the problem is called optical
flow.
19
Optical Flow
images.
• When we fix our attention to a single point
and measure velocities flowing through that
location, then the problem is called optical
flow.
– Stereo matching, image matching, tracking,…
20
Optical Flow
• Optical flow refers the image changes due to
motion during a time interval dt.
• The optical flow field is the velocity field that
represents the 3-D motion of object points across
a 2-D image. It represents only motion-related
intensity changes in the image.
• Optical flow should not be sensitive to
illumination changes and motion of unimportant
objects (shadows)
21
Estimating Optical Flow
• Assume the image intensity I is constant
Time = t Time = t+dt
I (x, y,t ) I (x + dx, y + dy,t + dt )

22
Assumption:
• The observed brightness of any object point

is constant over time (Colour constancy /
brightness constancy)
• Nearby points in the image plane move in a

similar manner (small motion)
23
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )

Let spatial displacement (dx, dy) occurs
during time interval dt with velocity (u, v)
24
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )

I(x(t ) + u.∆t, y(t) + v.∆t) — I(x(t ), y(t), t) ≈ 0
25
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt)
I (x ( t ) + u.∆t, y(t) + v.∆t) — I ( x ( t ) , y(t), t) ≈ 0
Assuming I is differentiable function, and expand the first term using Taylor’s series:
26
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )
I(x(t ) + u.∆t, y(t) + v.∆t) — I(x(t ), y(t), t) ≈ 0
Assuming I is differentiable function, and expand the first term using Taylor’s series:
Compact I x u + Iy v + I t = 0 Brightness constancy

constraint
representation 27
Assumption: The Brightness Constraint
• Expresses the idea of similar brightness for the

same objects observed in a sequence
28

• When we follow with a given location and
trace their position in consecutive images of a
sequence, then the problem is called “feature
tracking”
29

tracking”
30

tracking”
• Trying to solve a single equation for two
variables (u and v) →ill-posed
I x u + Iy v + I t = 0 Aperture Problem
31
Aperture Problem
The components of the flow parallel to an edge is
unknown.
32
Second Assumption: Gradient Constraint
Velocity vector is constant within a small
neighborhood (LUCAS AND KANADE)
How to generate more equations for a pixel?

• Impose additional constraints.
• Assume that the flow field is smooth locally
• Assume the pixel’s neighbourhood have the same
(u,v)
• For a 5x5 window, 25 equations per pixel are
generated
33
34
35
Velocity vector is constant within a small neighborhood (LUCAS AND KANADE)
Type equation here.
2
E ( u , v) = (I x u + I y v + I t ) dxdy
x,y
E(u, v) @E(u, v)
= = 0
@u @v
2(I x u + I y v + I t ) I x = 0
2(I x u + I y v + I t )I y = 0
36
Lucas-Kanade
Type2 P Σ Σ
Minimum least square solution:
P
I
equation
x I x I y u I x I t
P P 2 = — P
I x
here. I y I y v Iy I t
37
Lucas-Kanade
P 2 P Σ Σ P
I I xIy u I x I t
P x P 2 = — P
IxIy Iy v Iy I t
Σ Σ
T xx Txy u T xt
Structural
= —
Tensor
representation Txy Tyy v Tyt
38
40
How does Lucas-Kanade behave?
At edges, matrix T becomes singular!
Textured region
Homog.
region
Eigenvalues of
Eigenvalues of
matrix T are high
matrix T are very small
41
Pisalls & Alternatives
• Brightness constancy is not satisfied
– Correlation based method could be used
42
• A point may not move like its neighbors
– Regularization based methods
43
• A point may not move like its neighbors
– Regularization based methods
• The motion may not be small (Taylor does not
hold!)
– Multi-scale estimation could be used
44
Multi-Scale Flow Estimation
u=1.25 pixels
u=2.5 pixels
u=5 pixels
u=10 pixels
image It-1 image It+1
Gaussian pyramid of image It Gaussian pyramid of image It+1

45
Example: Optical Flow in 1D
I(x,t) I(x,t + 1)
→
v?
p
46
I(x,t) I(x,t + 1)
It
Temporal derivative
p →
v
x
Ix
Spatial derivative
Assumptions:
I →
I t = I
I
Ix = V − t • Brightness constancy
x t t x= p
Ix • Small motion
47
Iterating helps refining the velocity vector
I(x,t) I(x,t + 1)
Temporal derivative at 2nd iteration
It
p
x
Ix
Can keep the same estimate for spatial derivative
→ → − It
vv previous
Converges in about 5 iterations
Ix 48
Lukas-Kanade without Pyramid
49
Lukas-Kanade with Pyramid
50
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
51
Horn & Schunck
• Minimize a global energy function
52
Horn & Schunck
53
Horn & Schunck
• Take partial derivatives w.r.t. u and v:
54
Horn & Schunck
• Take partial derivatives w.r.t. u and v:

2
(I x u + I y v + I t ) I x —𝛼 𝛻 u = 0
2
(I x u + I y v + I t )I y — 𝛼 𝛻 v = 0
55
Horn & Schunck
(u,v)
• Iterative scheme v (u,v)
I x (I x u k + I y v k + I t )
u k +1 = u k − (I x , I y )
 +I +I
2 2
x
2
y
I y (I xu k + I y v k + I t )
v k +1 = v k −
 2 +I 2x+I 2y u
• Yields high density flow

• Fill in missing information in the homogenous
regions
• More sensitive to noise than local methods
56
Optical Flow Matlab/C++/Python Code
• http://people.csail.mit.edu/celiu/OpticalFlow/
• http://opencv-python-
tutroals.readthedocs.org/en/latest/
py_tutorials/py_video/py_lucas_kanade/
py_lucas_kanade.html
• https://github.com/Itseez/opencv_attic/blob/
a6078cc8477ff055427b67048a95547b3efe92a
5/opencv/samples/python2/lk_track.py
57
Applications: Target Tracking
58
Applications: Action Recognition
− + − +
Fx , Fy Fx , Fx , Fy , Fy blurred F −, F +, F −, F +
x x y y
59
Recognition actions at a distance, Efros et al.
Applications: Motion Modeling
Flipping between image 1 and 2. Estimated flow field Second image is warped!
(hue indicates orientation
and saturation indicates magnitude)
60
Applications: Motion Segmentation
61
Global Motion Models (Parametric)
All pixels are considered to summarize global
motion!
• 2D Models
– Affine
– Quadratic
– Planar projective (homography)
• 3D Models
– Inst. Camera motion models
– Homography+epipole
– Plane+parallalx
62
Motion Models
Translation Affine Perspective 3D rotation
2 unknowns 6 unknowns 8 unknowns 3 unknowns
63
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
64
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
I x u + Iy v + I t ≈ 0
65
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
I x (a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0
66
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)
67
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
I x ( a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0
Over all pixels, minimize the least square to find unknowns!
68
Computer Vision
Motion Models, Feature Tracking and

Alignment
1
Readings
• Szeliski, R.
• Slide Credits : Ulas Bagci
2
Recap: Estimating Optical Flow
• Assume the image intensity I is constant
Time = t Time = t+dt
I (x, y,t ) I (x + dx, y + dy,t + dt )

3
First Assumption: Brightness Constraint
4
Velocity vector is constant within a small neighborhood (LUCAS AND KANADE)
2(I x u + I y v + I t )I x = 0
2(I x u + I y v + I t )I y = 0
5
Recap: Lucas-Kanade
T yt T xy —T xt T yy T xt T xy —T yt T xx
u= 2 and v = 2
T x x T y y —T x y T x x T y y —T x y
6
Recap: Horn & Schunck
7
Global Motion Models(Parametric)
All pixels are considered to summarize global
motion!
• 2D Models
– Affine
– Quadratic
– Planar projective (homography)
• 3D Models
– Inst. Camera motion models
– Homography+epipole
– Plane+parallalx
8
Motion Models
Translation Affine Perspective 3D rotation
2 unknowns 6 unknowns 8 unknowns 3 unknowns
9
Example: Affine
Motion u(x, y) = a1 +
a 2 x + a 3 y v(x, y) = a 4 +
a 5 x + a6y
10
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
I x u + Iy v + I t ≈ 0
11
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
12
Example: Affine
a 2 x + a 3 y v(x, y) = a 4 +
a 5 x + a6y
13
Example: Affine
a 2 x + a 3 y v(x, y) = a 4 +
I x ( aa1 5+xa 2+x +a a6 y
3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0
Over all pixels, minimize the least square to find unknowns!
14
Example: Affine Motion
u(x, y) = a1 + a2 x + a3y
v(x, y) = a4 + a5 x + a6y
I x (a 1 + a2 x + a3 y) + I y (a4 + a5 x + a6 y) + I t 0
15
Locally Affine MotionModel
13
• E(A) should be computed within a small

neighborhood (i.e. locally affine term, let say
D)
17
D)
• From Taylor’s expansion
18

D)
19

D)
20

D)
18
22
23
24
25
26
27
Smoothness Constraint on Locally
Affine Motion Model
• We define a better error function by
28
Smoothness Constraint on
Locally Affine Motion Model
• We define a better error function by

E ( A) = E b (A) + E s (A)
29
Affine Motion Model
30
Affine Motion Model
31
Minimizing ErrorFunction
dE(A) dE b (A) dE s (A)
= +
da da da
32
dE(A) dE b (A) dE s (A)
= +
da da da
dE b (A) ∑ --- 2c[k ------

T
=
da c a]
x ,y €D
33
34
Minimizing Error Function-Solution Vector
36
Global Motion
Estimate motion using all

pixels in the image
37
Global Motion
Global Motion can be
Estimate motion using all used to
pixels in the image • Remove camera motion
• Object-based
segmentation
• generate mosaics
38
Global Motion
Global Motion can be

Estimate motion using all used to
pixels in the image • Remove camera motion
• Object-based
segmentation
• generate mosaics
39
Recap: Object Tracking
• Track an object over a sequence of images
40
Challenges in ObjectTracking
• Which features to track?

• Efficient tracking
• Appearance constraint violation
• …
41
Challenges in ObjectTracking
• Which features to track?

• Efficient tracking
• Appearance constraint violation
• …
42
Shi-Tomasi FeatureTracker
• Good Features to Track
43

– Find good features using eigenvalues of Hessian
matrix (threshold on the smallest eigenvalue
when computingHarris corner detection)
44

– Track from frame to frame with LK
45

– Track from frame to frame with LK
– Check consistencyof tracks by “affine registration”
to the first observed instance of the feature
46
Shi and Tomasi

CVPR 1994
Good Features
To Track.
47
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
48
KLT Tracking
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
49
KLT Tracking
around a scene
• Object tracking (car, airplane, person)
50
KLT Tracking
around a scene
• Feature tracking (Harris corners)
51
KLT Tracking
around a scene
• Multiple object tracking
52
KLT Tracking
around a scene
• Tracking in single/multiple camera(s)
53
KLT Tracking
around a scene
• Tracking in single/multiple camera(s)
• Tracking in fixed/moving camera 54
KLT Tracking Algorithm
• Find GoodFeaturesToTrack
– Harris Corners (thresholdedon smallest
eigenvalues)
• Use LK algorithm to find optical flows
• Use Coarse-to-Fine strategy to deal with large
movements
• When creating long tracks, check appearance
of registered patch against appearance of
initial patch to find points that have drifted
55
Recent Developments at Optical Flow
• Start with LK or similar methods

+Gradientconsistency
+Energyminimization with smoothing term
+Region matching
+Keypointmatching
Region-based +Pixel-based +Keypoint-based

53
Large displacement optical flow, Brox et al., CVPR 2009
Very Recent Developments at Optical Flow
• Use of Machine Learning
– Deep Learning (ICCV 2015, Fischer et al., FlowNet)
57
9/22/16
DeepFlow (Large Displacement Optical Flow)
• Basically it is a matching algorithm with variational approach

[Weinzaepfel et al., ICCV 2013].
• Dense correspondence (matching)

• Self-smooth matching
• Large displacement optical
flowhttps://www.youtube.com/watch?v=k_wkDLJ8lJ
58
E
SIFT Tracking
Frame 0 → Frame 100
59
Practice: Horn-SchunckAlgorithm
60
61
62
Optical Flow - Quantitative Evaluation
63
Interpretation of Optical Flow Fields
64
Object features all have

Zero velocity.
65
66
Object is moving to the

Right.
67
68
Camera is moving into

the scene, and an object
moving passed the camera
69
70
Object is moving
Directly toward the camera
that is stationary
71
72
Object is rotating about

the line of sight to the camera
73
74
Object is rotating about an axis

perpendicular to the line of
sight.
75
Application in ImageAlignment
• Motion can be used for image alignment
73
Practice: HomogenousCoordinates
Q: How can we represent translation as a 3x3 matrix?
x' = x + t x
y' = y + t y
77
Practice: HomogenousCoordinates
Q: How can we represent translation as a 3x3 matrix?
x' = x + t x
y' = y + t y
78
Practice: Homogenous Coordinates
Practice: Homogenous Coordinates
Affine Transformation
Affine Transformation
Face Recognition and
Detection
4/22/2024 1
Eigenfaces for recognition
The Space of Faces
An image is a point in a high dimensional space

• An N x M image is a point in RNM
• We can define vectors in this space as we did in the 2D case
Key Idea
The set of faces is a “subspace” of the set of images

• We can find the best subspace using PCA
• This is like fitting a “hyper-plane” to the set of faces
– spanned by vectors v1, v2, ..., vK
– any face
• Images in the possible set  = {xRL } are highly correlated.

ˆ P
• So, compress them to a low-dimensional subspace that

captures key appearance characteristics of the visual DOFs.
Eigenfaces
PCA extracts the eigenvectors of A
• Gives a set of vectors v1, v2, v3, ...
• Each vector is a direction in face space
– what do these look like?
4/22/2024 5
Eigenfaces
Eigenfaces look somewhat like generic faces.

Linear subspaces
convert x into v 1, v 2 coordinates
What does the v 2 coordinate measure?

- distance to line
- use it for classification—near 0 for orange pts
What does the v 1 coordinate measure?
- position along line
- use it to specify which orange point it is
Classification can be expensive:

• Big search prob (e.g., nearest neighbors) or store large PDF’s
Suppose the data points are arranged as above
• Idea—fit a line, classifier measures distance to line
4/22/2024 7
Dimensionality reduction
Dimensionality reduction
• We can represent the orange points with only their v1 coordinates
(since v2 coordinates are all essentially 0)
• This makes it much cheaper to store and compare points
• A bigger deal for higher dimensional problems
4/22/2024 8
Linear subspaces
Consider the variation along direction v
among all of the orange points:
What unit vector v minimizes var?
What unit vector v maximizes var?
Solution: v 1 is eigenvector of A with largest eigenvalue

v 2 is eigenvector of A with smallest eigenvalue
4/22/2024 9
Principal component analysis
Suppose each data point is N-dimensional
• Same procedure applies:
• The eigenvectors of A define a new coordinate system

– eigenvector with largest eigenvalue captures the most variation
among training vectors x
– eigenvector with smallest eigenvalue has least variation
• We can compress the data using the top few eigenvectors
– corresponds to choosing a “linear subspace”
» represent points on a line, plane, or “hyper-plane”
– these eigenvectors are known as the principal components
4/22/2024 10
Higher Dimensions
Suppose each data point is N-dimensional
• Same procedure applies:
• The eigenvectors of A define a new coordinate system

– eigenvector with largest eigenvalue captures the most variation
among training vectors x
– eigenvector with smallest eigenvalue has least variation
• We can compress the data by only using the top few
eigenvectors
– corresponds to choosing a “linear subspace”
» represent points on a line, plane, or “hyper-plane”
– these eigenvectors are known as the principal components
Problem: Size of Covariance Matrix A
Suppose each data point is N-dimensional (N pixels)

2 2
• The size of covariance matrix A is N x N

• The number of eigenfaces is N
• Example: For N = 256 x 256 pixels,

Size of A will be 65536 x 65536 !
Number of eigenvectors will be 65536 !
Typically, only 20-30 eigenvectors suffice. So, this

method is very inefficient!
Efficient Computation of Eigenvectors
If B is MxN and M<<N then A=BTB is NxN >> MxM

• M → number of images, N → number of pixels
• use BBT instead, eigenvector of BBT is easily
converted to that of BTB
(BBT) y = e y
=> BT(BBT) y = e (BTy)
=> (BTB)(BTy) = e (BTy)
=> BTy is the eigenvector of BTB
Eigenfaces – summary in words
Eigenfaces are
the eigenvectors of
the covariance matrix of
the probability distribution of
the vector space of
human faces
Eigenfaces are the ‘standardized face ingredients’ derived

from the statistical analysis of many pictures of human
faces
A human face may be considered to be a combination of

these standardized faces
Generating Eigenfaces – in words
1. Large set of images of human faces is taken.

2. The images are normalized to line up the eyes,
mouths and other features.
3. The eigenvectors of the covariance matrix of
the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
Eigenfaces for Face Recognition
When properly weighted, eigenfaces can be summed

together to create an approximate gray-scale
rendering of a human face.
Remarkably few eigenvector terms are needed to give

a fair likeness of most people's faces.
Hence eigenfaces provide a means of applying data

compression to faces for identification purposes.
Dimensionality Reduction
The set of faces is a “subspace” of the set

of images
• Suppose it is K dimensional
• We can find the best subspace using

PCA
• This is like fitting a “hyper-plane” to the

set of faces
– spanned by vectors v1, v2, ..., vK
Any face:
Eigenfaces
PCA extracts the eigenvectors of A
• Gives a set of vectors v1, v2, v3, ...
• Each one of these vectors is a direction in face space
– what do these look like?
Projecting onto the Eigenfaces
The eigenfaces v1, ..., vK span the space of faces
• A face is converted to eigenface coordinates by

Choosing the Dimension K
eigenvalues
i= K NM
How many eigenfaces to use?
Look at the decay of the eigenvalues

• the eigenvalue tells you the amount of variance “in
the direction” of that eigenface
• ignore eigenfaces with low variance
Is this a face or not?
Recognition with Eigenfaces
Algorithm
1. Process the image database (set of images with labels)
– Run PCA—compute eigenfaces
– Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate K

coefficients
3. Detect if x is a face
4. If it is a face, who is it?

• Find closest labeled face in database
• nearest-neighbor in K-dimensional space
Key Property of Eigenspace Representation
Given
• 2 images xˆ1 , xˆ2 that are used to construct the Eigenspace

• ĝ1 is the eigenspace projection of image x̂1
• ĝ 2 is the eigenspace projection of image x̂2
Then,
|| gˆ 2 − gˆ1 ||  || xˆ2 − xˆ1 ||
That is, distance in Eigenspace is approximately equal to the

correlation between two images.
(M’ is the number of eigenfaces used)
Appearance-based Recognition
• Directly represent appearance (image brightness), not geometry.
• Why?
Avoids modeling geometry, complex interactions
between geometry, lighting and reflectance.
• Why not?
Too many possible appearances!
m “visual degrees of freedom” (eg., pose, lighting, etc)
R discrete samples for each DOF
How to discretely sample the DOFs?
How to PREDICT/SYNTHESIS/MATCH with novel views?

Appearance-based Recognition
• Example:
• Visual DOFs: Object type P, Lighting Direction L, Pose R
• Set of R * P * L possible images:
 = {xˆ } P
RL
• Image as a point in high dimensional space:
Pixel 2 gray value

x̂
x̂ is an image of N pixels and
A point in N-dimensional space
Pixel 1 gray value

Appearance from different view points
COIL Database, CAVE Lab, Columbia

Parametric Eigenspace
Estimating orientation of object
CAVE Lab, Columbia Univ

Estimating orientation of the object
CAVE Lab, Columbia Univ

View-Based and Modular
Eigenspaces for Face Recognition
Part-based eigenfeatures
Learn a separate
eigenspace for each
face feature
Boosts performance
of regular
eigenfaces
4/22/2024 35
Bayesian Face Recognition
4/22/2024 36
Bayesian Face Recognition
4/22/2024 37
Robust real-time face detection
Scan classifier over locs. & scales
4/22/2024 39
“Learn” classifier from data
Training Data
• 5000 faces (frontal)
• 108 non faces
• Faces are normalized
• Scale, translation
Many variations
• Across individuals
• Illumination
• Pose (rotation both in plane and out)
4/22/2024 40
Characteristics of algorithm
• Feature set (…is huge about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection
➢ Fastest known face detector for gray scale

images
4/22/2024 41
Image features
• “Rectangle filters”
• Similar to Haar wavelets
• Differences between
sums of pixels in
adjacent rectangles
4/22/2024 42
Integral Image
Partial sum
Any rectangle is
D = 1+4-(2+3)
Also known as:

• summed area tables [Crow84]
• boxlets [Simard98]
4/22/2024 43
Huge library of filters
4/22/2024 44
Constructing the classifier
Perceptron yields a sufficiently powerful classifier
Use AdaBoost to efficiently choose best features

• add a new hi(x) at each round hi(x)
• each hi(xk) is a “decision stump” b=Ew(y [x> q])
a=Ew(y [x< q])
 x
4/22/2024 45
Constructing the classifier
For each round of boosting:
• Evaluate each rectangle filter on each example
• Sort examples by filter values
• Select best threshold for each filter (min error)
• Use sorting to quickly scan for optimal threshold
• Select best filter/threshold combination
• Weight is a simple function of error rate
• Reweight examples
• (There are many tricks to make this more efficient.)
4/22/2024 46
Good reference on boosting
Friedman, J., Hastie, T. and Tibshirani, R.
Additive Logistic Regression: a Statistical
View of Boosting
http://www-stat.stanford.edu/~hastie/Papers/boost.ps
“We show that boosting fits an additive logistic
regression model by stagewise optimization of a
criterion very similar to the log-likelihood, and present
likelihood based alternatives. We also propose a
multi-logit boosting procedure which appears to have
advantages over other methods proposed so far.”
4/22/2024 47
Trading speed for accuracy
Given a nested set of classifier hypothesis
classes
Computational Risk Minimization
4/22/2024 48
Sample results
4/22/2024 49
Summary (Viola-Jones)
• Fastest known face detector for gray images
• Three contributions with broad applicability:
❖Cascaded classifier yields rapid
classification
❖AdaBoost as an extremely efficient feature
selector
❖Rectangle Features + Integral Image can
be used for rapid image analysis
4/22/2024 50
Schneiderman
Viola Kanade
Jones
4/22/2024 51
Acute Lymphoblastic Leukemia
Dr. Dipti Patra

Seminar Outline
• Introduction
• Preliminaries of hematological disorders
• Leukemia and its classification
• Acute lymphoblastic leukemia (ALL)
• Diagnosis of ALL
• Lymphocyte image segmentation
• Automated characterization of Lymphocytes for ALL Diagnosis
• Automated FAB classification of lymphoblast subtypes
• Automated immunological classification of lymphoblast subtypes
• Image morphometry for lymphoid and myeloid blast
classification
• Conclusion
• References

Preliminaries of hematological disorders
• The study of blood disorders are commonly known as

hematology and are treated by medical experts known as
hematologists.
• Hematological disorders can be broadly classified in three ways,
i.e. by the type of blood cell which is affected, according to
functional disorders of the blood and lymphoid organs and
neoplastic disorders of blood and lymphoid organs.

Type of blood disorders

Leukemia
• Leukemia is a cancer of the blood tissue characteristically

associated with increased numbers of white cells in the bone
marrow and / or peripheral blood.
• Symptoms: fever, anemia, bleeding and/or bruising, persistent

weakness or tiredness, achiness in the bones or joints,
recurrent infections, difficulty breathing or swollen lymph nodes.
• Causes: No particular cause; include natural or artificial ionizing

radiation, certain kinds of chemicals, some viruses, and genetic
predispositions.

Leukemia classification

• Acute lymphoblastic leukemia (ALL) is a malignant disease

caused by the genetic alterations of the lymphocyte precursor
cells of the bone marrow.
• ALL is characterized by excessive production of immature

lymphocytes (lymphoblast) in the bone marrow preventing
normal hematopoiesis.
• Two popular ALL classification schemes presently in use are

French–American–British (FAB) classification and immunological
classification by world health organization (WHO).

ALL classification
• Two popular ALL classification schemes presently in use are

French–American–British (FAB) classification and immunological
classification by world health organization (WHO).
• As per FAB classification, there are three subtypes of ALL i.e. L1,
L2 and L3, and each has a distinct blast morphology.
• Immunological methods are based upon cell surface antigen

and can be broadly subdivided as:
• Precursor B–lymphoblastic leukemia or pre–B

• Precursor T–lymphoblastic leukemia or pre–T
• Mature B–lymphoblastic leukemia or mature–B

Diagnosis of ALL
• Techniques
• Light microscopic examination
• Cytochemistry
• Immunophenotyping
• Cytochemistry and immunophenotyping is expensive, and

requires additional evaluation of blood samples by molecular
analysis and flow cytometry.
• Microscopic examination of peripheral blood samples is still

considered as a standard economical procedure for confirmative
screening and subtyping of ALL in most of the countries including
India.

Conventional diagnosis of ALL
Peripheral Blood Microscopic

Smear Staining Examination and
Collection Decision
Blood Analysis
Report by
Pathologist

Limitations of conventional diagnosis
• Variability in diagnosis: It arises due to improper staining

process, intraobserver and interobserver differences.
• Time consuming
• Need for repeated sample collection

Need for automation
• The motivation to automate comes from the fact that besides

being time consuming, the results of manual diagnosis varies
with the hematologist’s skill, experience, work load and stress
level.
• Manual analysis becomes questionable not only because of the

amount of work, but also with regard to precision and the
reproducibility of the results.
• Automated systems can help overcome the dearth of trained

personnel.

Quantitative microscopy
Peripheral
Blood Smear
Image
Sample Blood Digital
Smear Staining Microscope with a
Collection Computer
Hematological
Image Analysis and
Decision
(Computer)

Research objective
In particular, the objectives are narrowed to —
i. Devise improved segmentation schemes for lymphocyte images.
ii. Utilize morphological, texture and color features in peripheral blood

smear images to classify between mature lymphocytes and
lymphoblasts.
iii. Develop a system for the classification of lymphoblasts.
iv. Establish an improved machine learning system for the classification

of lymphoid and myeloid blasts in peripheral blood smear images.

14 EE Department of NIT Rourkela
Proposed model using image processing
and machine learning

Materials and methods
• Histology
• Hematological Image Acquisition
• Sub Imaging
• Color Space Conversion
• Preprocessing

16 Acute Lymphoblastic Leukemia EE Department of NIT Rourkela
Hematological Image Acquisition
• Blood microscopic images of Lieshman stained peripheral blood

samples were optically grabbed by Zeiss Observer microscope
(Carl Zeiss, Germany) under 100X oil immersed setting and
with an effective magnification of 1000 at Ispat General
Hospital, Rourkela, India.

Sub imaging
• Peripheral blood smear images are relatively larger with more

than one leukocyte per image.
• The desired region of interest (ROI) must contain a single
lymphocyte only for ALL detection.
a) Entire Blood Image b) Nucleus Centroid c) Detected Subimages

Color space conversion
• Blood microscopic images are acquired in RGB color space.

• To perform image segmentation, the features of objects of
interest might seem to be more decorrelated in certain color
space than in others.
• Due to strong correlation among the individual color planes,
RGB color model is unsuitable for stained peripheral blood
smear image segmentation.
• L*a*b* color model is a suitable alternative for color feature
based image segmentation as the color dimension is reduced
and the color channels (a∗ and b∗) are uncorrelated.
• Transforming the blood microscopic images from RGB to CIELAB
reduces the color dimension of the problem from three (RGB) to
two (a∗ and b∗) and facilitates color based image segmentation.

Segmentation as a pixel classification
problem
• Segmentation of lymphocyte images is considered as a pixel
classification problem in a supervised framework.
• It is performed by measuring a set of CIELAB color features of
each pixel which defines a decision surface in the feature
space.
• Each pixel of the lymphocyte image is classified as belonging
to one of the regions i.e. cytoplasm, nucleus or background
(including RBC) using an improved neural classifier .
• The use of Functional Link Artificial Neural Network (FLANN)
as a classifier is introduced here for lymphocyte image
segmentation

problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)

problem
(FLANNLS)
• To train the FLANN, the input–output patterns (Color features)–
Label) are generated for different sample lymphocyte images.
• a*, b* color values of each pixel is fed as input to the FLANN,
and the label of each pixel location is computed.
• Using a set of input–output pair (training data set) the network
parameters are optimized.
• In order to calculate the error, the actual label output of the
FLANN is compared with the desired label output provided by
the human expert.
• Depending on this error value, the weight matrix between input
and output layers is updated using back propagation algorithm
(BPA).

problem
(FLANNLS)
• Training patterns generated from a particular image IGH24HS is

listed below.

problem
(FLANNLS)
• The training convergence characteristics of FLANN.

problem
(FLANNLS)
• To validate the prediction of the FLANN, six patterns from six
different images other than the training images were tested and
listed below.

Segmentation as a pixel clustering problem
• Clustering is a special kind of unsupervised classification and

can be broadly classified as hard clustering (or crisp clustering)
and soft clustering.
• The popular clustering algorithms i.e. K–means, K–medoid
belong to the first category and each pixel is assumed to
belong to one and only one cluster.
• Soft clustering based clustering algorithms were developed,
and offer a principal alternative to crisp approaches with pixels
having partial membership to different classes.
• Algorithms such as fuzzy c–means (FCM), Rough c–means and
Shadow c–means (SCM) are generally used for the
segmentation of such images, where the class separation is
not well defined.

Need for feature (or kernel) space clustering
• Even though, the soft clustering approaches endow efficient

handling of overlapping partitions for spherical data, it fails
dramatically when the structure of input patterns is non–
spherical and complex.
• An alternative approach for solving such problems is to adopt
the strategy of nonlinearly transforming the data into a higher
dimensional feature space and then performing the clustering
within this feature space.
• Two novel kernel based clustering algorithms have been
proposed here for the segmentation of human lymphocyte
images.

Soft Partitive Clustering Algorithms
• Fuzzy c–means (FCM)

• Rough c–means (RCM)
• Rough–Fuzzy c–Means
• Shadowed c–Means (SCM)

Soft partitive clustering
Fuzzy c-means algorithm
• The first algorithm in the soft clustering arena was Fuzzy c–

means (FCM), which was developed in 1973 by Dunn and
improved by Bezdek in 1981.
• In FCM each data point is associated with every cluster using a
membership function, which gives degree of belongingness to
the clusters. The partition matrix is obtained by minimizing the
following objective function:
where, 1 ≤ m < ∞ is the degree of fuzziness, Xk is the kth data

pattern, vi is the ith cluster center, μ ∈ [0, 1] is the membership of
the kth data pattern to it, and ||.|| is the Euclidean distance norm.

Fuzzy c-means algorithm
• The partition matrix is obtained by minimizing the following

objective function:
where, 1 ≤ m < ∞ is the degree of fuzziness, Xk is the kth data

pattern, vi is the ith cluster center, μ ∈ [0, 1] is the membership of
the kth data pattern to it, and ||.|| is the Euclidean distance norm.
Where as
∀i with dik= ||Xk− vi||2, subject to σ𝑐𝑖=1 μ𝑖𝑘 = 1, ∀k, and 0<σ𝑐𝑖=1 μ𝑖𝑘 <N, ∀i.

Fuzzy c-means algorithm (FCM)

Fuzzy c-means algorithm (FCM)
• Even though, the membership concept of fuzzy sets endow

efficient handling of overlapping partitions in FCM algorithm,
issues like uncertainty, vagueness and incompleteness still
persists.
• In view of this, there was a necessity to use an alternative tool
like rough sets to handle such issues.
• To achieve such robustness in clustering problems rough sets
were incorporated in the c–means framework and was termed
as Rough c–means (RCM) algorithm.

Rough sets
• The principle of rough set is based on representation of rough or

imprecise information in terms of exact concepts i.e., lower and
upper approximation.
• The difference of upper and lower approximation will result with
objects in the rough boundaries.

Rough–c–Means
• In Rough c–means (RCM) clustering, the idea of standard c–

means is extended by visualizing each class as an interval or
rough set.
• A rough set Yis characterized by its lower and upper
approximations BYand B Y respectively.
• In rough context an object Xkcan be a member of at most one

lower approximation. If Xk∈ BYof cluster Y, then concurrently
Xk∈ BYof the same cluster. Whereas it will never belong to other
clusters. If Xkis not a member of any lower approximation, then
it will belong to two or more upper approximations.
Hematological Image Analysis for Subrajeet Mohapatra, Roll#509EE108

Rough c-Means (RCM)
• Updated centroid vi of cluster Ui is computed as

Rough c-Means (RCM) algorithm

Rough c-Means (RCM)
• Rough c–means algorithm is completely governed by three

parameters such as wlow, wup and T. The parameter threshold
can be defined as relative distance of a data member Xk from a
pair of cluster centroids vi and vj. These parameters each has
to be suitably tuned for proper segmentation.

Rough Fuzzy-c-Means (RFCM) algorithm

Rough Fuzzy-c-Means (RFCM) algorithm

Rough Fuzzy-c-Means (RFCM)
• Even though the clustering performance is improved with rough

set based approaches they are limited by issues like fine tuning
of upper and lower approximation parameters and
determination of threshold.

Shadow sets
• Fuzzy logic strives to model vagueness using membership
function, which indicates the degree of belongingness to a
concept which is desired to be represented.
• Membership values are accurate numerical quantities

representing excessive precision for describing imprecise
phenomena.
• Such excessive precision is undesirable under imprecise
phenomenon and a possible solution to it is shadowed sets.
• Most of the uncertainty arises in the determination of the
membership grades around 0.5 in contrast to assigning grades
close to 1 or 0.
• Confusion of assigning the belongingness around 0.5 sparked the
need of shadowed sets.
Shadow sets
• For each fuzzy set few membership values beyond a particular

threshold are elevated and reduced those are below a
substantially low value.
• Such process eliminate disambiguate property of the fuzzy sets
and thereby reducing the number of computations.
• The overall level of vagueness is maintained by defining a new
region termed zone of vagueness.
• Suitable membership assignment is made such that this
particular area of the universe of discourse will have values
between [0, 1],

Shadow sets
• The transformation of fuzzy set to shadowed set is achieved

using a particular threshold.
• Effectively such development transforms the domain of
discourse into clearly marked region of vagueness.
• Such mapping is termed as shadowed set and is defined as
A : X → {0, 1, [0, 1]}.
• The elements of X for which A attains the value 1 constitute its
core, whereas the elements with A(x) = [0, 1] lies in the
shadow region of the mapping; the rest forms the exclusion
region.

Shadow sets

Shadow sets
• A particular threshold is desired for partitioning the distribution

into core, shadowed and exclusion regions.
• A suitable threshold λ is obtained for the quantification process
using the relation:
where λ ∈ (0, 1/2) such that θ(λi) = 0

• A discrete version of the above equation can be defined as,

Shadow c- means (SCM) algorithm

Shadow c- means (SCM) algorithm
where,
• Data patterns belonging to core region have no fuzzy weight

factor, whereas elements belonging to shadowed region are
treated as in FCM.
• Data members belonging to the exclusion region, the fuzzy weight
factor m is raised to itself i.e. mm.
• Unlike FCM, it does not attempt fuzzification for elements having
membership values above the threshold reducing computational
burden.
Comparison of SCM with FCM and RCM
clustering algorithms
• Unlike FCM, it does not attempt fuzzification for elements having
membership values above the threshold reducing computational
burden.
• As compared to RCM there is absence of external user defined
parameters.
• Removal of this initial trial and error factor makes SCM more
robust , as well as insensitive to the fluctuations in the incoming
data.
• Analogous to K-means, FCM and RCM algorithm SCM is effective
only in clustering crisp, spherical, and non–overlapping type of
data.
• In order to alleviate this problem, feature space transformation
using non–linear kernels is proposed for the clustering of
lymphocyte image data leading to improvement in segmentation
performance.
Feature or kernel space clustering
• Kernel functions are used to transform the data in the
image plane into a feature plane of higher dimension
(possibly infinite) known as kernel (or feature) space.
• Nonlinear mapping functions i.e. Φ transforms the
nonlinear separation problem in the image plane into a
linear separation problem in kernel space facilitating
clustering in the feature space.

• Due to high and possibly infinite feature dimension it is

unrealistic to measure the Euclidian distance between the
transformed variables.
• However, as per Mercer’s theorem working directly on the

transformed variables can be avoided.
• It can be used to calculate the distance between the pixel

feature values in the kernel space without knowing the
transformation function φ(.).

Mercers Theorem
• Any continuous, symmetric, positive semi definite kernel

function can be expressed as a dot product in a higher
dimension”. Thus if Φ(.) is the mapping function then using
Mercer’s Theorem k (x, y) is given by
where x and y be any two point in the image plane and Φ(x) and
Φ (y) are the corresponding points in the feature plane
respectively. ” ・ ” is the dot product in the kernel space and
k (x, y) represents kernel function.
Hematological Image Analysis for Subrajeet Mohapatra, Roll#509EE108

Standard Kernels

Kernel Induced Rough Fuzzy c–means for
Lymphocyte Image Segmentation (KIRFCMLS)

Kernel Induced Shadowed c–means for
Lymphocyte Image Segmentation (KISCMLS)

Lymphocyte Image Segmentation as a
Pixel Labeling Problem
• Image segmentation using spatial interaction models like
Markov Random Field (MRF) and Gibbs Random Field (GRF) to
model the images was inspired by the computational techniques
developed in statistical mechanics.
• Such modeling started with the influential work of Geman and
Geman who linked via statistical mechanics between mechanical
systems and probability theory.

Markov Random Field (MRF)
A random field X = {Xi,j} defined over lattice S is a Markov

Random Field (MRF) with respect to the neighborhood system η if
and only if
1. All of its realizations have non zero probabilities P(X = x)>0
for all x (property of positivity).
2. Its conditional distribution satisfies the following property
where xij is the configuration corresponding to the random

variable Xi j and so on.

Clique
A clique of the pair (S, η) denoted by c is a subset of S such that
1. c consists of a single pixel, or

2. for (i, j) (k, l), (i, j) ∈ c and (k, l) ∈ c implies that (i, j) ∈ η(k, l)
The collection of all cliques of (S, η) is defined by C(S, η).

Gibbs random field
Considering η be a neighborhood system defined over a finite
lattice S. A random field X is said to be a Gibbs Random Field
(GRF) of lattice S with respect to a neighborhood system η if and
only if its configuration obey a Gibbs distribution which has the
following form
where Z is the partition function defined as:

Gibbs random field
T is a constant analogous to temperature which shall be assumed
to be 1 unless otherwise stated and U(x) is the energy function
expressed as
Energy is sum of clique potentials Vc(x) over all possible cliques C.

Vc(x) are a set of potential functions depending on the values of x
at the sites in the clique c.

Markov-Gibbs equivalence
• A MRF is defined in terms of local properties (the classification
label assigned to a pixel is affected only by its neighbors),
whereas a GRF is characterized by its global property (the Gibbs
distribution).
• The popular Hammersley–Cliffords theorem states that given the
neighborhood structure η of the model, for any set of sites within
the lattice S, their associated contribution to the Gibbs energy
function should be non zero, if and only if the sites form a clique;
a random field’s having the Markov property is equivalent to its
having a Gibbs distribution.
• This theorem establishes the equivalence of these two types of
properties and provides a very general basis for the specification
of MRF joint distribution function.

MRF image model
• Let the images are assumed to be defined on a discrete
rectangular lattice S =N × N. Assuming that X denotes the
random field associated to the noise free image and Z denotes
the corresponding label process. Let z be a realization of Z and
the label process Z is modeled as MRF. The observed image y is
assumed to be a realization of the random field. Moreover, the
label process Z is assumed to be a MRF with respect to a
neighborhood system η and is described by its local
characteristics
As Z is MRF, or equivalently Gibbs distributed, the joint

distribution can be expressed as

MRF image model

Problem addressed
The following image model is considered:
Y=X+N
• Y= Random field associated with the observed image
• X=Random field associated with noise free label process of the

original image which is assumed to be MRF having a priori
values of the model parameter.
• N= Random field associated with Gaussian noise process with
zero mean and known variance.

Image label estimation
• In a supervised framework the number of regions M and the

model parameters are assumed to be known, it is required to
estimate the pixel labels using the associated model parameter
vector θ.
• The label process Z, of the image is modeled as MRF and the
objective is to obtain the optimal estimate of the realization of
the scene labels z∗ and hence achieve segmentation.
• The following optimality criterion is adopted.
Where θ denote the parameter vector, which is assumed to be

known a priori,𝑧is
Ƹ the MAP estimate of the labels. Since z is
unknown the posterior probability cannot be evaluated here.

Hence using Bayes rule,
Since y is known, the denominator of the above equation is constant.

Using the assumptions i.e. the noise is a white Gaussian

sequence with zero mean, is independent of z in the degradation
model, P(Y = y | Z = z, θ) can be expressed as
1
P(Z=z|ϴ)= 𝑒 −𝑈(𝑧,Φ)
𝑍

We can further simplify this optimization problem by taking the

negative and minimizing the resultant to obtain the following
relation:
This optimization is computationally an enormous task. In order

to reduce the computational burden, a memory based search
algorithm is proposed here by exploiting the notion of simulated
annealing for obtaining the MAP estimate of image labels 𝑧.Ƹ

Memory Based Search Algorithm for Lymphocyte Image
Segmentation (MBSLS)
• In this algorithm the notion of annealing is employed with a view
to examine every point of the search space with finite probability
and hence achieve global optimal solution.
• The next move of the memory based search is achieved by using
the notion of neighborhood search.
• During implementation, an image is considered as a point in the
multidimensional search space.
• The next move is another image in the neighborhood that has
energy less than all the previous moves and is attained by
perturbing the point in the neighborhood structure.
• Following, this approach the revisiting of earlier points are
avoided and an array of image is created. In order to overcome
the local minima trapping, a criterion is introduced. The criterion
here is to accept moves of higher energy with a probability. This
guides the algorithm to overcome the local minima


Simulation and results
• The four proposed segmentation schemes (FLANNLS, KIRFCMLS,

KISCMLS, MBSLS) are implemented using Matlab 7.8 and
experimental simulation was performed using an Intel Core i5
3.20GHz PC, along with 2GB RAM running on Windows 7
professional operating system.
• A total of 270 lymphocyte sub images which include mature
lymphocytes and lymphoblasts constitute the entire image data
set and were used for the experimental evaluation of the
proposed schemes.

Simulation results
• Three experiments were conducted on the test images to

demonstrate the efficacy of all the four proposed schemes.
• In the first experiment each individual proposed scheme is
compared with four standard leukocyte segmentation schemes
such as Fuzzy Divergence (FD), Gaussian Mixture Model
(GMM), Modified Fuzzy C Means (MFCM) and Multiple Pixel
Classifier (MPC).
• The misclassification error percentage (ε) is evaluated in the
second experiment by comparing the segmentation results with
the available manual segmented test images using the
following relation:

Simulation results

Simulation results

Simulation results

Segmentation results for lymphoblasts
using MBSLS algorithm

Manual segmented images

Manual segmented images

Comparison of misclassification error percentage
for different methods.

Variation of computational time in seconds

Automated characterization of
lymphocytes for ALL diagnosis
• A novel image processing and machine

learning based approach is developed to
characterize a lymphocyte image for the
quantitative diagnosis of ALL in PBS images.
• The proposed method is used to differentiate

each lymphocyte image into a mature
lymphocyte or a lymphoblast.

Proposed model for automated characterization
of lymphocytes for ALL diagnosis
82
Histology
• The total data set used for the development of the model,
comprises of peripheral blood smear (PBS) samples, and were
collected from 63 patients diagnosed with ALL and 55 control
subjects.
• 150 stained subimages of lymphocyte were obtained by the
image acquisition and sub imaging process.

Morphological diagnosis of ALL
• The criteria during diagnosis or in the follow up of ALL are based
on the percentage of lymphoblast present in the peripheral blood
or bone marrow samples.
• Patients with presence of more than 20% of lymphoblast in
peripheral blood samples are diagnosed with ALL .
• Morphologically, lymphoblast is characterized by large nucleus,
having an irregular size, shape and the nucleoli are prominent.
• The cytoplasm is scarce and intensely colored in blast cell
images.
• Nucleus and cytoplasm of lymphoblast reflects morphological and
functional changes in comparison to lymphocytes, and plays a
main role in assessment of malignancy in blood samples.

Visual criteria for diagnosis of ALL

Quantitative measurements for diagnosis
of ALL
• The basis for the differentiation of lymphocyte from

lymphoblast can be grouped into two types of
characteristics i.e. nuclear changes (variation in shape and
size, chromatin pattern) and cytoplasmic changes (amount
of cytoplasm and protein accumulation).
• Quantitative features for nucleus and cytoplasm region of a
lymphocyte is suggested which is correlated directly with
the actual cytological features, and aides in the computer
processing of lymphocyte images.

Computed shape features
87
Computed texture and color features
88
Shape measurements
• Form Factor (F6): Is a shape parameter derived from the basic

cellular measurements i.e. area and perimeter. It can be
mathematically defined as
• Roundness (F7): Is the degree to which the nucleus shape differs

from that of a circle and can be defined as
• Length–Diameter ratio (F8): Length to diameter (L/D) ratio is

the ratio of the major axis length and minor axis length of the
nucleus region.
• Compactness (F9): Is a numerical measure representing the
degree to which a shape is compact and is mathematically
represented as.

Shape measurements
• Length–Diameter ratio (F8): Length to diameter (L/D) ratio is

the ratio of the major axis length and minor axis length of the
nucleus region.
• Compactness (F9): Is a numerical measure representing the

degree to which a shape is compact and is mathematically
represented as.

Nucleus boundary roughness
measurements
• Nuclear boundary irregularity is an

important diagnostic feature of ALL.
• To measure such deformation accurately

in quantitative manner fractal geometry
and contour signature can be used.

Fractal geometry
• Even though many fractal properties have been

defined, Hausdorff dimension (HD) is one of the most
important since it provides an accurate objective
measure of boundary irregularity.
• Several approaches for the estimation of HD or Df
(F10) is available in the literature, however the most
common among them used in biological sciences may
be summarized as follows.
• Modified pixel dilation

• Perimeter–area
• Ruler counting
• Box counting

Box counting method
• The box counting method is more popular due to its easier

implementation and is based on self–similarity.
• In this method, we cover boxes of different pixel length over the
digitized version of the segmented nucleus image.
• The Hausdorff dimension Df of a bounded set A in Euclidean n-
space can be derived from the relation
log N
Df =
log N (s )

Box counting method
Boxes of different pixel length superimposed over the

segmented nucleus image.

Box counting method example

Contour signature
• The dimensionality of the representation of the contour is

reduced from two to one by converting from a coordinate–based
representation to distances from each contour point to a
reference point.
• A suitable reference point is centroid or center of mass of the
contour, whose coordinates can be defined as
where (x, y) are the coordinates of the pixels along the contour
and M is the total no of pixels on the nucleus contour.

Contour signature
• The signature can be defined by the variance, skewness and

kurtosis of all the distances between centroid and nucleus
contour points that will be significantly different in benign and
malignant samples.

Texture feature
• Changes in the chromatin distribution, reflects the organization

of DNA in lymphocyte nucleus, and is an essential diagnostic
descriptor for classifying malignant lymphocytes (lymphoblast)
from benign ones.
• Leishman staining of blood samples enables the visualization of
chromatin distribution of lymphocyte nucleus in form of
texture.
• Genetic modifications are responsible for textural changes and
are visible during the transition from normal to malignant.
• Such textural transformation can be quantified using Haar
wavelet and Haralick feature based methods and is presented
below.

Texture feature
Wavelet texture feature: A texture feature vector for each gray scale
version of lymphocyte nucleus image consists of wavelet coefficients
obtained by taking mean and variance of An, Hn and Vn components.
Haralick texture feature: Statistical measures i.e. contrast, correlation,

homogeneity, energy and entropy are computed from the co-occurrence
matrices using different offsets.
Fourier descriptors: Feature descriptors used here for texture

quantification is based on two–dimensional DFT (Discrete Fourier
Transform). Statistics i.e. mean, standard deviation, skewness and
kurtosis are computed over the nucleus image in the frequency domain
obtained using discrete fourier transform.

Color feature
• Excessive pigmentation in lymphocyte nucleus results with

hyperchromatism and is an important characteristic appearing
in malignant lymphocytes.
• Chromatin abnormality results in increased staining capacity of
nuclei.
• Such modification in DNA content of nuclei is visible in form of
change in color intensity in ALL.
• This change in color during transition from normal to malignant
is measured as mean color intensity in RGB and HSV color
space.

Data Normalization
• Prior to classification, it is necessary to normalize the dataset

with dissimilar range of values.
• Combination of variables with nonuniform magnitudes results
with masking of lower magnitude data by higher magnitude
data due to the sheer magnitude of the inputs which generates
larger weights associated with them.
• Normalization is an essential procedure to transform the input
features into a similar range so that true influence of variables
can be ascertained.
• Using the below relation each input variable or feature can be
normalized as

Feature selection
• Selection of an appropriate set of features is important and

strongly affects classifier performance.
• Independent sample “t”–test is one such popular approach to
determine the statistical significance of the extracted features.
• Out of all the 44 extracted features, 32 features were found to

be statistically significant (p value < 0.05) using t–test and
participate in the classification process.

Classification
• We propose the use of an ensemble of classifiers for labeling

each lymphocyte subimage as benign or malignant sample
based upon a set of measured features.
• Performance of the extracted features in classification is also
tested with five other standard classifiers i.e. Naive Bayesian,
K–Nearest Neighbor, Multilayer Perceptron, Radial Basis
Functional Neural Network and Support Vector Machines.

Ensemble of classifiers
• Ensemble based systems or multiple classifier systems are more

preferable than their single classifier counterparts due to several
reasons and a few important among them are

An ensemble of classifiers for feature
classification

Proposed ensemble classifier for lymphocyte
characterization

K-fold cross validation
• In view of the fact that the data set used in this study is small,
k–fold cross validation resampling technique is employed for
the training and testing of the classifiers for the extracted
lymphocyte features.
• Considering the value of k as 5 the whole data set is randomly
divided into five parts, in which each class is represented in
approximately the same proportions as in the original data set.
• Three parts of the data is used for classifier training (training
set) and the rest two parts are considered for evaluation
(testing set).
• The procedure is executed a total of five times with different
combination of training and testing data.
• Finally, the five performance estimates are averaged to yield an
overall estimate of the classifier performance in terms of
accuracy, specificity and sensitivity.
Performance analysis
• Performance metrics i.e. accuracy, specificity and sensitivity are

calculated from a confusion matrix which represents the
differences in opinion between the hematologist and the
classifier.

Simulation results

Simulation results

Simulation results

Simulation results

Simulation results
Plot between feature index and p–value for showing feature significance.

Simulation results
Average classification accuracy of ensemble methods along with
standard classifiers over 5–fold.

Simulation results
Average sensitivity of ensemble methods along with standard
classifiers over 5–fold.

Simulation results
Average specificity of ensemble methods along with standard
classifiers over 5–fold.

Simulation results
Computational time consumed by different classifiers for lymphocyte
characterization.

Conclusion
• The proposed segmentation schemes along with a number of
reported schemes are simulated for lymphocyte and myeloblast
images.
• Performance measure like segmentation error is used to
evaluate the segmentation accuracy.
• Segmentation results are also evaluated visually.
• Altogether, the proposed segmented schemes exhibit superior
performance to their counterparts.
• For all the proposed schemes NB, KNN, MLP, RBFN, SVM and
EOC classifiers for feature classification are simulated in different
situations.
• It was found from our experimental evaluation that, the
performance of EOC is better than other classifiers in all
situations.
Scope for further research
• The research findings made out of this work has opened several
auxiliary research directions, which can be further investigated.
• The segmentation scheme can be enhanced by including
techniques that can lead to segmentation of overlapping cells as
well.
• The proposed schemes, which mostly deal with computer aided
diagnosis and sub classification of ALL, can be extended to AML.
• In ensemble learning classifier system, there exists multiple
classifier processes which can be executed in parallel for better
response time.
• Promising research direction to pursue is to develop an
automated prognostic scoring system for ALL.


Ilovepdf Merged

Uploaded by

Copyright:

Available Formats

Ilovepdf Merged

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Ilovepdf Merged

Uploaded by

Copyright:

Available Formats

Computer Vision

What are the desired features to conduct these tasks?

• The extrema in a signal and its first a few

• The extrema in a signal and its first a few

• The extrema in a signal and its first a few

Computational Doubling the scale leads to four-fold

• Pyramid is one way to represent images in

• It is the difference between a blurredcopy of image I

s1 region size s2 region size

• For usual images: a good function would be a one

How to find corresponding patch sizes?

f (I i1! i m (x,  )) f (I i1! im (x,))

f (I i1! i m (x,  )) f (I i1! im (x,))

f (I i1! i m (x,  )) f (I i1! im (x,)

• Lowe., D. 2004, IJCV

• Shows existence, importance, and value of

Instead of using LoG (Laplacian of Gaussian) like

• Image content is transformed into local feature

Keypoint Fit a model to determine location and

Orientation Compute best orientation(s) for each

Keypoint Use local image gradients at selected

– Can handle changes in viewpoint

We want to find points that give us information about the

The information about the objects is in the object’s edges.

We will represent the image in a way that gives us these edges

• Laplacian of Gaussian kernel

Minimize to find accurate extrema:

If offset from sampling point is larger than 0.5 -

L(x, y, ) = G(x, y, )* I (x, y)

36 bins (i.e., 10o per bin)

the matcher correctly

features that really do have a match

0 0.1 false positive rate 1

the matcher correctly

features that really do have a match

0 0.1 false positive rate 1

1. Before you start, you can normalize the images.

Finding cluster of at-least3 keypoints.

• Detection: of separate objects.

The image contains Taj mahal in it. It is obvious.

▪ Given a number of training images of a category,

▪ It deals with the categorization of an object based upon

▪ Problem : A single object can not have an unique label.

Which category is best suited for the above image?

• The highest level at which category members have similar

• Robustness: Illumination, Occlusions, Clutter, Object

• Importance of context: Some part of the picture have

• Complexity of image: Number of pixels, Large number

Lane detection • Collision warning

slide credit: S. Ullman

• Bank cheque number recognition

• Facebook person finder

• Recognition of flat structured objects

• A pattern is an object, process or event that can be given a

• A pattern is represented by a set of d features, or

• Objects from same classes have similar feature values

Good features Bad features

• These can be resolved by using PCA (unsupervised) or

To avoid misclassification two strategies are applied: