Ilovepdf Merged

Download as pdf or txt
Download as pdf or txt
You are on page 1of 510

Computer Vision

FindingFeatures, Affine
Invariance, SIFT

1
Outline
• Concept of Scale
– Pyramids
– Scale-space approaches briefly
• Scale invariant region selection
• SIFT: an image regiondescriptor

2
Reminder: Motivation
• Image Matching
– Fundamentalaspect of many problems

• Object Recognition
• 3D Structures
• Stereo Correspondence
• Motion Tracking
• ….

3
Reminder: Motivation
• Image Matching
– Fundamental aspect of many problems

• Object Recognition
• 3D Structures
• Stereo Correspondence
• Motion Tracking
• ….

What are the desired features to conduct these tasks?

4
Review of the CornerDetection
Corners are invariant to

translation

rotation

scaling

5
SCALE

• The extrema in a signal and its first a few


derivatives provide a useful general purpose
description for many kinds of signals
– Edges
– Corners

6
SCALE

• The extrema in a signal and its first a few


derivatives provide a useful general purpose
description for many kinds of signals
– Edges
– Corners
• In physics, objects live on a range of scales.

7
SCALE

• The extrema in a signal and its first a few


derivatives provide a usefulgeneral purpose
descriptionfor many kinds of signals
– Edges
– Corners
• In physics, objects live on a range of scales.
• Neighborhood operations can only extract local
features (at a scale of at most a few pixels),
however, images contain information at larger
scales
8
Scale-Space
• Any given image can
contain objects that exist
at scales different from
other objects in the same
image

9
Scale-Space
• Any given image can
contain objects that exist
at scales different from
other objects in the same
image
• Or, that even existat
multiple scales
simultaneously

10
Scale-Space
• Any given image can
contain objects that exist
at scales different from
other objects in the same
image
• Or, that even existat
multiple scales
simultaneously

11
Multi-scale Representation
Small Scale Smaller Filter
Features Masks
Image
Large Scale Larger Filter
Features Masks

12
Multi-scale Representation
Small Scale Smaller Filter
Features Masks
Image
Large Scale Larger Filter
Features Masks

Computational Doubling the scale leads to four-fold


Cost increases! increase in the number of operations in 2D.

13
Multi-Scale Representation: Pyramid

• Pyramid is one way to represent images in


multi-scale
– Pyramid is built by using multiple copies of image.
– Each level in the pyramid is 1/4 of the size of
previous level.
– The lowest level is of the highest resolution.
– The highest level is of the lowest resolution.

14
Multi-Scale Representation: Pyramid

15
Pyramid can capture global and local
features

16
Gaussian Pyramids

17
Source: Forsyth
Laplacian Pyramids

18
Source: Forsyth
Laplacian Pyramids

1
-2 1 -2 1
1

Laplacian of Gaussian (LoG): Gaussian first to smooth images then perform Laplacian operation

19
Laplacian Pyramids

20
Laplacian Pyramids
Laplacian Pyramid Construction

22
Laplacian Pyramid Construction

23
Decimation and Interpolation

Lowpass filter
(i.e., Gaussian)

24
Difference of Gaussian(DoG)
• It is a common approximation of LoG (better run
time).

• It is the difference between a blurredcopy of image I


and an even more blurred copy ofI.

25
Scale Selection-Automated
• Find scale that gives local maxima of some function
f in both position and scale.

f f
Image 1 Image 2

s1 region size s2 region size


Good Function?
• A “good” function for scale detection:
has one stable sharp peak
f f f
Good !
bad bad
region size region size region size

• For usual images: a good function would be a one


which responds to contrast (sharp local intensity
change)

27
Patch Size Corresponding toScale

How to find corresponding patch sizes?

28
Patch Size Corresponding toScale

29
Patch Size Corresponding toScale

30
Patch Size Corresponding toScale

31
Patch Size Corresponding toScale

f (I i1! i m (x,  )) f (I i1! im (x,))


32
Patch Size Corresponding toScale

f (I i1! i m (x,  )) f (I i1! im (x,))


33
Patch Size Corresponding toScale
• Function responses for increasing scale (scale signature)

f (I i1! i m (x,  )) f (I i1! im (x,)


34
Normalization

42
Scale Invariant Feature Transform (SIFT)

• Lowe., D. 2004, IJCV

4
3
Scale Invariant Feature Transform (SIFT)

• Shows existence, importance, and value of


invariant types of detector
• Demonstrates the richness that feature
descriptors can bring to featurematching

Instead of using LoG (Laplacian of Gaussian) like


in Harris and Hessian-based operators, SIFT uses
DoG (Difference of Gaussian).
37
Scale Invariant Feature Transform (SIFT)

• Image content is transformed into local feature


coordinates that are invariant to translation,
rotation, scale, and other imaging parameters

38
Scale Invariant Feature Transform (SIFT)
• Image content is transformed into local feature
coordinates that are invariant to translation,
rotation, scale, and other imaging parameters

39
Overall Procedure at a High Level
Scale-Space Search over multiple scales and image
Extrema locations
Detection

Keypoint Fit a model to determine location and


scale. Select Keypoints based on a
Localization measure of stability.

Orientation Compute best orientation(s) for each


Assignment keypoint region.

Keypoint Use local image gradients at selected


scale and rotation to describe each
Description keypoint region. 47
Basic idea:
SIFT Descriptor
• Take n x n (i.e., n=16) square window (around a feature/interest point)
• Divide them into m x m (i.e, m=4) cells
• Compute gradient orientation for each cell
• Create histogram over edge orientations weighted by magnitude

0 2
angle histogram

41
SIFT Descriptor
16 histograms x 8 orientations
= 128 features

42
SIFT Descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
• Threshold normalize the descriptor:

such that:

0.2

43
Properties of SIFT
Extraordinarily robust matching technique

– Can handle changes in viewpoint


– Can handle significant changes in illumination
– Efficient (realtime)
– Lots of code available
• http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

44
Revisit SIFT Steps
(1) Scale-space extrema detection
– Extract scale and rotation invariant interest points (i.e.,
keypoints).
(2) Keypoint localization
– Determine location and scale for each interest point.
– Eliminate “ weak ” keypoints
(3) Orientation assignment
– Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
– Use local image gradients at the selected scale.

45
(1) Scale-Space ExtremaDetection

We want to find points that give us information about the


objects in the image.

The information about the objects is in the object’s edges.

We will represent the image in a way that gives us these edges


as this representations extrema points.

46
(1) Scale-Space ExtremaDetection
scale
• Harris-Laplace

LoG
Find local maximaof:
– Harris detector in space y
– LoG in scale
 Harris → x

scale
• SIFT

DoG
Find local maximaof:
– Hessian in space y
– DoG in scale

Hessian x
47
(1) Scale-Space ExtremaDetection

48
(1) Scale-Space ExtremaDetection
• Extract local extrema
(i.e., minima or maxima)

( )
D k 2
in DoG pyramid.
-Compare each point to
its 8 neighbors at the
same level, 9 neighbors
in the level above, and 9
D(k ) neighbors in the level
below (i.e., 26 total).

D( )
49
(1) Scale-Space ExtremaDetection

• Laplacian of Gaussian kernel


– Scale normalized (x by scale2)
– Proposed by Lindeberg
• Scale-space detection
– Find local maxima across scale/space
– A good “blob” detector

50
(2) Keypoint Localization
• There are still a lot of points, some of them are
not good enough.
• The locations of keypoints may be not
accurate.
• Eliminating edge points.

51
(2) Keypoint Localization
—Reject (1) points with low contrast(flat)
(2) poorly localized along an edge (edge)
(2) Keypoint Localization
—Reject (1) points with low contrast(flat)
(2) poorly localized along an edge (edge)
Inaccurate KeypointLocalization
Poor contrast

True Extrema

Detected Extrema

Sampling x
61
Inaccurate KeypointLocalization
The Solution:
Taylor expansion:

Minimize to find accurate extrema:

If offset from sampling point is larger than 0.5 -


Keypoint should be in a differentsampling point.
62
Brown & Lowe 2002
Local extremas

56
Local extremas Remove low contrast features

57
Local extremas Remove low edges

58
SIFT Descriptor

59
(3) Orientation Assignment
Create histogram of gradient directions, within a region around the
keypoint, at selected scale:

L(x, y, ) = G(x, y, )* I (x, y)


m(x, y) = (L(x +1,y) − L(x −1,y))2 + (L(x,y +1)− L(x, y −1))2
(x, y) = atan2((L(x,y +1)− L(x, y −1))/ (L(x +1,y)− L(x −1,y)))

36 bins (i.e., 10o per bin)

0 2
•Histogram entries are weighted by (i) gradientmagnitude and (ii) a
Gaussian function with σ equal to 1.5 times the scale of the key point.
(3) Orientation Assignment

61
(3) Orientation Assignment

62
(3) Orientation Assignment

63
(3) Orientation Assignment

64
(3) Orientation Assignment

65
(4) KeypointDescriptor
• Partial Voting: distribute histogram entries into adjacent bins
(i.e., additional robustness to shifts)
– Each entry is added to all bins, multiplied by a weight of1-d,
where d is the distance from the bin it belongs.

66
(4) KeypointDescriptor
• Descriptordepends on two main parameters:
– (1) number of orientations
– n x n array of orientation histograms

67
Evaluating Results
How can we measure the performance of a feature matcher?

the matcher correctly


found a match 0.7

true
# true positives matched positive
# true positives rate

features that really do have a match

0 0.1 false positive rate 1


the matcher said yes when
# false positives matched the right answer was no
# true negatives

68
Evaluating Results
How can we measure the performance of a feature matcher?
ROC curve (“Receiver Operator Characteristic”)
1

the matcher correctly


found a match 0.7

true
# true positives matched positive
# true positives rate

features that really do have a match

0 0.1 false positive rate 1


the matcher said yes when
# false positives matched the right answer was no
# true negatives

69
Change of Illumination
Change of brightness => doesn’t effect gradients
(difference of pixels value).
Change of contrast => doesn’t effect gradients
(up to normalization).
Saturation (non-linear change of illumination) =>
affects magnitudes much more than orientation.
=> Threshold gradient magnitudes to 0.2 and
renormalize.

70
Illumination invariance?

1. Before you start, you can normalize the images.


2. Difference based metrics can be used (Haar, SIFT,…)

71
72
• Extract features

73
• Extract features
• Compute putative matches

74
• Extract features
• Compute putative matches
• Loop:
– HypothesizetransformationT (small group of
putative matches that are related by T)
75
• Extract features
• Compute putativematches
• Loop:
– Hypothesize transformation T (small groupof
putative matches that are related by T)
– Verify transformation (search for other matches
consistent with T) 83
• Extract features
• Compute putativematches
• Loop:
– Hypothesize transformation T (small groupof
putative matches that are related by T)
– Verify transformation (search for other matches
consistent with T) 84
Object Recognition
For training images:
Extracting keypoints by SIFT. Creating
descriptors database.
For query images:
Extracting keypoints by SIFT.
For each descriptor - finding nearest neighbor in DB.

Finding cluster of at-least3 keypoints.


Performing detailed geometric fit check for each cluster.

78
Recognition/Matching

79
Image Registration

80
THE END
Slide Credits: Ulas Bagci
INTRODUCTION
• Recognition of object deals with labelling of
an object present in an image in the same
way as a human will understand it.
• For computers to identify/recognize an
object there should be prior knowledge
database.
• Machine learning deals with training of
computers by giving them to prior
knowledge about the object shape, size,
color etc. which leads to recognise the
object.
Like human understanding , object recognition includes:

• Detection: of separate objects.


• Description: of their geometry and position in 3D
• Classification: as being one of known class
• Identification: of the particular instance
• Understanding: of spatial relationship between objects
What does object recognition involves?
Identification: Is it Taj mahal?
Object categorization

sky
tree

people
Scene and context categorization

The whole image says a scene where people come to visit Taj mahal
Instance level recognition:

The image contains Taj mahal in it. It is obvious.


Generic categorization

All of the images contain a common object i.e. a building like structure
Object categorization problem

▪ Given a number of training images of a category,


recognize a priori unknown instances of that category
and assign the correct category label.

▪ It deals with the categorization of an object based upon


prior knowledge database.

▪ Problem : A single object can not have an unique label.


It may belongs to various categories.
Red rose Rose Flower Plant

Which category is best suited for the above image?


For categorization problem first we have to understand how
human brain is able to categorize the object.
Brain deals with following task when it sees an image.

• The highest level at which category members have similar


perceived shape. Ex- In the previous image the shape highly
resemblance to rose.
• The highest level at which single mental image reflects the
entire category. Ex- when we see the image a prior image of
rose comes to our mind.
• The level at which human subjects are usually fastest at
identifying category numbers. Ex- What does the image
represent: flower or rose? We obviously choose rose.
• The highest label at which a person uses a similar motor actions
for interaction with category members. Here the highest
category is red rose.
Steps of object recognition
Start

Collect data
Prior knowledge
Choose features

Choose model

Train classifier

Evaluate classifier

End
Challenges in object recognition

• Robustness: Illumination, Occlusions, Clutter, Object


pose, View point

• Importance of context: Some part of the picture have


same intensity value but represent different object.
So context information may be incorporated.

• Complexity of image: Number of pixels, Large number


of categories, Large number of database.
Applications: Assisted driving
Pedestrian and car detection

Ped

meters
Ped

Car

meters

Lane detection • Collision warning


systems with adaptive
cruise control
• Lane departure warning
systems
• Rear object detection
systems
Applications: image search
Challenges: viewpoint variation
Challenges: illumination variation

slide credit: S. Ullman


Challenges: occlusion
Challenges: scale
Scaling affects
the image .
Challenges: deformation
Challenges: intra-class variation
Past achievements using Object
recognition
• Finger print recognition

• Bank cheque number recognition

• Facebook person finder

• Google net

• Recognition of flat structured objects


A brief introduction to Pattern Recognition
“The assignment of a physical object or event to one of a several
pre-specified categories”
Duda & Hart

• A pattern is an object, process or event that can be given a


name.
• A pattern class (or category) is a set of patterns sharing common
attributes and usually originating from the same source
• During recognition, given objects are assigned to the prescribed
classes.
• A classifier is a machine which performs classification.
Basic component of pattern recognition system

Training data
Physical environment
Pre processing
Data acquisition
Feature extraction
Pre processing
Feature selection
Feature extraction
Features
Features
Classification Model learning and
estimation
Post processing
Decision
Pattern representation

• A pattern is represented by a set of d features, or


attributes viewed as d-dimensional feature vector.
𝑋 = (𝑥1 , 𝑥2 , … . 𝑥𝑑 )𝑇

Feature vector
• A vector of observations (measurements).
Pattern • Feature vector (𝑥 ∈ 𝑋)
𝑥1 • 𝑥 is a point in feature
𝑥2
. =𝑥
space 𝑋
. • Feature vector is the set
𝑦 𝑥𝑑 of observations.
• Pattern (𝑦 ∈ 𝑌)

Task
• Our task is to develop a classifier(or decision rule) which can
map the features with respective patterns.
𝑞: 𝑋 → 𝑌
Feature extraction
Task: to extract features which are good for classification.

Good features:

• Objects from same classes have similar feature values


• Objects from different classes have different values.

Good features Bad features


Feature selection
• The problem is optimization of parameters of feature
extractor, i.e. which features to be selected such that
the classifier can perform the task efficiently as well as
dimension of feature vector can be reduced.

• These can be resolved by using PCA (unsupervised) or


LDA(supervised).
Classifier
• A classifier partitions feature space X in to class labeled
regions such that
𝑋 = (𝑋1 ∪ 𝑋2 ∪ ⋯ ∪ 𝑋 𝑌 ) and (𝑋1 ∩ 𝑋2 ∩ ⋯ ∩ 𝑋 𝑌 ) = ∅
• The classification task is to determine the region to which the
feature vector belongs.
• Boarders between decision boundaries are called decision
regions.
The efficiency of this classifier depend upon two
factors:
• What mistakes it does?
• Cost associated with these mistakes

To avoid misclassification two strategies are applied:

• Use the training data to build representative probability


model; separately model class conditional densities and
priors(Generative)

• Directly construct a good decision boundary, model the


posterior(Discriminative)
Types Data Modelling
• Generative methods
• Probabilistically model the classes
• model the “shape” of each class
• histograms, PCA, mixtures of Gaussians, graphical
models (HMM’s, belief networks, etc.)
• Discriminative methods
• Probabilistically model the decision boundaries
• model boundaries between classes
• perceptron, neural networks, support vector machines
(SVM’s)
Generative classification Example
• Skin pixels have a distinctive range of
colors corresponds to region(s) in RGB
color space
• For visualization, only R and G skin
components are shown in the image.
• Task: To determine whether the pixel
belongs to skin region or not.
• A pixel X = (R,G,B) is skin if it is in the
skin region
How to solve this problem?
Skin detection
X
• Learn the skin region from examples
• Manually label pixels in one or more
“training images” as skin or not skin
• Plot the training data in RGB space
• Skin pixels shown in orange, non-skin
pixels shown in red
• Some skin pixels may be outside the
region, non-skin pixels inside. Why?
Skin classification techniques
Given X = (R,G,B): how to determine if
it is skin or not?
• Nearest neighbor
– find labeled pixel closest to X
– choose the label for that pixel
• Data modeling
– fit a model (curve, surface, or
volume) to each class
• Probabilistic data modeling
– fit a probability model to each
class
Basic Probability
• X is a random variable
• P(X) is the probability that X achieves a certain
value
• For continuous X

‫׬‬−∞ 𝑃
𝑋 𝑑𝑋 = 1
0≤𝑃 𝑋 ≤1
• For discrete X
෍𝑃 𝑋 = 1

• Conditional probability: P(X | Y)


• probability of X given that we already know Y
Probabilistic skin classification

• Each pixel has a probability of being skin or not skin


𝑃 ~𝑠𝑘𝑖𝑛 𝑅 = 1 − 𝑃(𝑠𝑘𝑖𝑛|𝑅)
• Choose interpretation of highest probability
• set X to be a skin pixel if and only if
𝑅1 < 𝑋 ≤ 𝑅2
Learning conditional PDF’s

• We can calculate P(R | skin) from a set of training images. It is


simply a histogram over the pixels in the training images. Each
bin Ri contains the proportion of skin pixels with color Ri

• This doesn’t work as well in higher-dimensional spaces.


Approach: fit parametric PDF functions
• common choice is rotated Gaussian
• center
𝑐 = 𝑋ത
• Covariance

෍(𝑋 − 𝑋)(𝑋 ത 𝑇
− 𝑋)
𝑋

But this isn’t quite what we want


• How to determine if a pixel is skin?
• We want P(skin | R) not P(R | skin)
• How can we get it?
Bayes rule

what we measure domain knowledge


(likelihood) (prior)

what we want
normalization term
(posterior)
• The prior: P(skin)
• P(skin) may be larger if we know the image contains a
person
• For a portrait, P(skin) may be higher for pixels in the center
• P(skin) may be proportion of skin pixels in training set

• Suppose the prior is uniform: P(skin) = P(~skin) = 0.5

– in this case P(skin|R)= cP(R|skin)


P( ̴skin|R)= cP(R| s̴ kin) ,

• Maximizing the posterior is equivalent to maximizing the likelihood


P(skin|R)> P( s̴ kin|R) if and only if P(R|skin)>P(R| s̴ kin)

This is called Maximum Likelihood (ML) estimation


Bayesian estimation

likelihood posterior (un-normalized)

Goal : To choose the label (skin or ~skin) that maximizes the


posterior(minimizes the probability of misclassification).
This is called Maximum A Posteriori (MAP) estimation
General classification
• This same procedure applies in more general circumstances
– More than two classes
– More than one dimension

Example: face detection


• Here, X is an image region
– dimension = # pixels
– each face can be
thought of as a point in
a high dimensional
space
H. Schneiderman and T.Kanade
Principal Component Analysis (PCA)
• Problems arise when performing recognition in a high-dimensional
space (curse of dimensionality).
• Significant improvements can be achieved by first mapping the data
into a lower-dimensional sub-space.
• The goal of PCA is to reduce the dimensionality of the data while
retaining as much information as possible in the original dataset.

𝑎1 𝑏1
𝑎2 𝑏2
𝑥 = . −→ reduce dimensionality −→ y = . (K ≪ N)
. .
𝑎𝑁 𝑏𝑁

44
• Principle Component Analysis is all about finding the direction
in a feature space along which all the points have the greatest
variance.

Uses:
• Data Visualization
• Data Reduction
• Data Classification
• Trend Analysis
• Factor Analysis
• Noise Reduction
• Suppose we have a population measured on p random variables
X1,…,Xp.
• Note that these random variables represent the p-axes of the
Cartesian coordinate system in which the population resides.
• Our goal is to develop a new set of p axes (linear combinations of the
original p axes) in the directions of greatest variability.

X2

X1

This is accomplished by rotating the axes.


Algebraic Interpretation

• Given m points in a n dimensional space, for large n, how does one


project on to a low dimensional space while preserving broad
trends in the data and allowing it to be visualized?
Algebraic Interpretation – 1D
• Given m points in a n dimensional space, for large n, how
does one project on to a 1 dimensional space?

• Choose a line that fits the data so the points are spread out
well along the line
Algebraic Interpretation – 1D
• Formally, minimize sum of squares of distances to the line.

• Why sum of squares? Because it allows fast minimization,


assuming the line passes through 0
• Minimizing sum of squares of distances of the point to the line is
the same as maximizing the sum of squares of the projections on
that line, using Pythagoras theorem.

𝑥Ԧ 𝑇 𝑃

Origin

• 𝑥Ԧ 𝑖𝑠 𝑎 𝑢𝑛𝑖𝑡 𝑣𝑒𝑐𝑡𝑜𝑟 𝑡ℎ𝑟𝑜𝑢𝑔ℎ 𝑜𝑟𝑖𝑔𝑖𝑛


• 𝑥Ԧ 𝑇 𝑃 is the projection from point P to the line
• If there are more than one point present, then their total projection
on line can be found out using matrix multiplication

Point 1
P P P… P Point 2 L
Line * t t t … t *
Point 3 * i = ෍(𝑥Ԧ 𝑇 𝑃𝑖 )2
1 2 3… m : n 𝑖
Point m e

xT BT B x

• Our goal is to maximize 𝑥 𝑇 𝐵 𝑇 𝐵𝑥 , subject to constrain 𝑥 𝑇 𝑥 = 1


• Max( 𝑥 𝑇 𝐵 𝑇 𝐵𝑥 ), subject to constrain 𝑥 𝑇 𝑥 = 1

• Maximize 𝐸 = 𝑥 𝑇 𝑀𝑥 subject to 𝑥 𝑇 𝑥=1 (𝑀 = 𝐵 𝑇 𝐵)


• By applying Lagrange’s multiplier we can write
𝐸 ′ = 𝑥 𝑇 𝑀𝑥 + 𝜆 1 − 𝑥 𝑇 𝑥
• To maximize 𝐸′
𝑑𝐸 ′
= 2𝑀𝑥 + 2𝜆𝑥 = 0
𝑑𝑥
𝑀𝑥 = 𝜆𝑥(ℎ𝑒𝑟𝑒 𝑠𝑖𝑔𝑛 𝑖𝑠 𝑛𝑒𝑔𝑙𝑒𝑐𝑡𝑒𝑑)

• 𝑥 is an eigen vector of 𝐵 𝑇 𝐵
𝑻
How to calculate 𝑩 𝑩?
2
σ𝑛𝑖=1 𝑥𝑖 σ𝑛𝑖=1 𝑥𝑖 𝑦𝑖
𝐵𝑇 𝐵= 2
σ𝑛𝑖=1 𝑥𝑖 𝑦𝑖 σ𝑛𝑖=1 𝑦𝑖

• Principle components are the orthogonal directions of the


covariance matrix of a set of points.
• 𝐵 𝑇 𝐵 = σ 𝑥𝑥 𝑇 if about origin
• 𝐵 𝑇 𝐵 = σ(𝑥 − 𝑥)(𝑥
Ԧ Ԧ 𝑇 if otherwise outer product
− 𝑥)
Dimensionality Reduction
• x is the mean of all the orange
G
points.
• v1 represents the position along
the line
• v2 represents the distance from
the points to the line
• To represent all the orange
point as 𝑣1 co-ordinates one
has to minimize 𝑣2 and
R
maximize 𝑣1
Consider the variation along direction 𝑣 among all of the orange
points:
T 2
var v = ෍ (x − x) . v
orange point x

= ෍ vT (x − x)(x − x)T v
x

= vT ෍ (x − x)(x − x)T
x
= vT Av
• v1 is eigenvector of A with largest eigenvalue
• v2 is eigenvector of A with smallest eigenvalue

𝑣1 = 𝑚𝑎𝑥𝑣 𝑣𝑎𝑟(𝑣) 𝑣2 = 𝑚𝑖𝑛𝑣 𝑣𝑎𝑟(𝑣)


Higher Dimension
• Suppose each data point is N-dimensional
T 2
var v = ෍ (x − x) . v
x
= 𝑥 𝑇 𝐴𝑥

• The eigenvectors of A define a new coordinate system.

• Eigenvector with largest eigenvalue captures the most


variation among training vectors x.

• Eigenvector with smallest eigenvalue has least variation.


Face recognition
• When viewed as vectors of
pixel values, face images are
extremely high dimensional.
• 100*100 image=10000
dimensions
• However only a few among
them corresponds a valid
face images
• We want to effectively model
a low dimensional linear
subspace which effectively
defines the variation of face
images.
• Given : M points x1,x2,...,xM in 𝑅𝑑 where d is very big.
• We want some direction in 𝑅𝑑 that captures most of the variations of
𝑥𝑖 . The coefficient would be
𝑢 𝑥𝑖 = 𝑢𝑇 𝑥𝑖 − μ
𝜇:mean of the points
• Direction that maximizes the variance of the projected data:
𝑀
1
= ෍ 𝑢𝑇 𝑥𝑖 − μ (𝑢𝑇 𝑥𝑖 − μ )𝑇
𝑀
𝑖=1
𝑀
𝑇
1 𝑇
=𝑢 ෍ 𝑥𝑖 − μ 𝑥𝑖 − μ u
𝑀
𝑖=1
= 𝑢𝑇 Σ𝑢
• Since u is a unit vector that can be expressed in terms of some
linear sum of eigenvectors of ∑, then direction of u that maximizes
the variance is the eigenvector associated with the largest
eigenvalue of ∑.

• The dimension that captures the maximum variation of data is the


eigenvector corresponding to the largest eigenvalue of the data
covariance matrix.

• Furthermore, the top k orthogonal directions that capture the most


variance of the data are the k eigenvectors corresponding to the k
largest eigenvalues.
PCA Example –STEP 1

• Subtract the mean from each of the data


dimensions.

• Subtracting the mean makes variance and covariance


calculation easier by simplifying their equations. The
variance and co-variance values are not affected by
the mean value.
PCA Example –STEP 1
DATA: ZERO MEAN DATA:
x y x y
2.5 2.4 .69 .49
0.5 0.7 -1.31 -1.21
2.2 2.9 .39 .99
1.9 2.2
.09 .29
3.1 3.0
1.29 1.09
2.3 2.7
2 1.6 .49 .79
1 1.1 .19 -.31
1.5 1.6 -.81 -.81
1.1 0.9 -.31 -.31
-.71 -1.01
PCA Example –STEP 2
• Calculate the covariance matrix
cov = .616555556 .615444444
.615444444 .716555556

• since the non-diagonal elements in this covariance


matrix are positive, we should expect that both the x
and y variable increase together.
PCA Example –STEP 3

• Calculate the eigenvectors and eigenvalues of


the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
PCA Example –STEP 3
•Eigenvectors are plotted
as diagonal dotted lines on
the plot.
• They are perpendicular
to each other.
•One of the eigenvectors
goes through the middle of
the points, like drawing a
line of best fit.
•The second eigenvector
gives us the other, less
important, pattern in the
data, that all the points
follow the main line, but
are off to the side of the
main line by some
amount.
PCA Example –STEP 4

• Reduce dimensionality and form feature vector: The


eigenvector with the highest eigenvalue is the principal
component of the data set.

• In our example, the eigenvector with the larges eigenvalue


was the one that pointed down the middle of the data.

• Once eigenvectors are found from the covariance matrix, the


next step is to order them by eigenvalue, highest to lowest.
This gives you the components in order of significance.
PCA Example –STEP 4
• Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)
We can either form a feature vector with both of the
eigenvectors:
-.677873399 -.735178656
-.735178656 .677873399
or, we can choose to leave out the smaller, less
significant component and only have a single
column:
- .677873399
- .735178656
PCA Example –STEP 5

• Deriving the new data


Final Data = Row Feature Vector * Row Zero Mean Data
• Row Feature Vector is the matrix with the eigenvectors in the
columns transposed so that the eigenvectors are now in the
rows, with the most significant eigenvector at the top
• Row Zero Mean Data is the mean-adjusted data transposed,
i.e. the data items are in each column, with each row holding
a separate dimension.
PCA Example –STEP 5
Final Data transpose: dimensions along columns
x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
Limitations of PCA
• Global appearance method : Not robust to misalignment,
background variation.
• The direction of maximum variance is not always good for
classification.
Some challenges for generative models
• Generative approaches were some of the first methods in pattern
recognition.
• Easy to model analytically and could be done with modest
amounts of moderate dimensional data.

There are some liabilities:

• Many signals are high dimensional and representing the complete


density of class is data-hard.
• Problem in model the hard cases-the ones near the boundaries!!
• We don’t typically know which features of instances actually
discriminative between classes.
Discriminative classification assumptions

• There are a fixed number of known classes.


• Ample number of training examples of each classes.
• Equal cost of making mistakes-what matter is getting the label right.
• Need to construct a representation of the instance but we don’t
know what features are diagnostic of the class label.
Basic framework
Train:
• Build an object model-a representation describe training
instances.
• Learn/train a classifier

Test:
• Generate candidates in new image
• Score the candidate
Discriminative methods(window method)
• Object detection and recognition is formulated as a classification
problem.
• The image is partitioned into a set of overlapping windows

Bag of image patches


• The image is partitioned into a set of overlapping windows and a
decision is taken at each window about if it contains a target object
or not.

Where are the screens?


Background Decision
boundary

Computer screen
Bag of image patches In some feature space
Generative vs. Discriminative

Generative Approach Discriminative Approach


model individual classes, priors model posterior directly
Feature or kernel space clustering

• Kernel functions are used to transform the data in the


image plane into a feature plane of higher dimension
(possibly infinite) known as kernel (or feature) space.
• Nonlinear mapping functions i.e. Φ transforms the
nonlinear separation problem in the image plane into a
linear separation problem in kernel space facilitating
clustering in the feature space.
Feature space clustering

• Due to high and possibly infinite feature dimension it is


unrealistic to measure the Euclidian distance between the
transformed variables.

• However, as per Mercer’s theorem working directly on the


transformed variables can be avoided.

• It can be used to calculate the distance between the pixel


feature values in the kernel space without knowing the
transformation function φ(.).
Feature space clustering

Mercers Theorem

• Any continuous, symmetric, positive semi definite kernel


function can be expressed as a dot product in a higher
dimension”. Thus if Φ(.) is the mapping function then using
Mercer’s Theorem k (x, y) is given by

where x and y be any two point in the image plane and Φ(x) and
Φ (y) are the corresponding points in the feature plane
respectively. ” ・ ” is the dot product in the kernel space and
k (x, y) represents kernel function.

Hematological Image Analysis for


108 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Feature space clustering

Standard Kernels
9/15/16

Computer Vision
Optical Flow

1
Readings
• Szeliski, R.
• Slide Credits : Ulas Bagci

2
Motion

3
Reasons
• the patterns only move when you blink or
move your eyes
• the arrangement of the backgrounds of the
‘rabbits’ determines which way the patterns
rotate.

4
Motion
Perception of change in the outside world is of the
highest importance for all living beings.

A great majority of changes in sensory excitation in


the image result from motion.

Images also have remarkable changes when a


light source is turned on or off-but this change
cannot be interpreted as motion.

5
Why Estimate Visual Motion?
• Visual Motion can be due to problems
– Camera instabilities, jitter

6
Why Estimate Visual Motion?
• Visual Motion can be due to problems
– Camera instabilities, jitter
• Visual Motion Indicates dynamics in the scene
– Moving objects, behavior, tracking objects,
analyze trajectories

7
Why Estimate Visual Motion?
• Visual Motion can be due to problems
– Camera instabilities, jitter
• Visual Motion Indicates dynamics in the scene
– Moving objects, behavior, tracking objects,
analyze trajectories
• Visual Motion reveals spatial layout
– Motion parallax

8
Motion Analysis
• Motion Analysis is connected with real-time
analysis; Ex: for robot navigation, gesture
identification etc.

• Prior Knowledge is necessary that includes:


➢ Information about the camera motion
➢ Information about the time interval between
consecutive images
Motion Analysis
Problems:
• Moving object detection and location; (i) static
camera and moving object (ii) Static object and
moving camera (iii) Moving object and moving
camera.
• Derivation of 3-D object properties from a set of
2-D projections acquired at different time
instants of object motion.
Motion Analysis
2-D representation of 3-D motion => Motion Field
Each point is assigned a velocity vector
corresponding to:
1. Motion direction
2. Velocity
3. Distance from an observer at an appropriate
image location
Visual Motion Estimation
• Patch-based motion
– Optical Flow
• Lucas-Kanade
• Horn-Schunck

• Motion Models

12
Applications
Motion
based
segmentation

Deformation Structure
Analysis from motion

Alignment
Tumor
(e.g., UAV
Tracking analysis)

Video
Compression

13
Video
• A video is a sequence of frames captured over
time
– Image data is a function of space (x,y) and time (t)

I(x,y,t)

Temporal
change t 15
Apparent Motion

Present dots on three movie frames See one moving dot

15
Describing Motion

16
Describing Motion
• Simplest way: Image Differencing (intensity values)
t

t+1

17
Optical Flow
• Refers to the problem of estimating a vector
field of local displacement in a sequence of
images.

18
Optical Flow
• Refers to the problem of estimating a vector
field of local displacement in a sequence of
images.
• When we fix our attention to a single point
and measure velocities flowing through that
location, then the problem is called optical
flow.

19
Optical Flow
• Refers to the problem of estimating a vector
field of local displacement in a sequence of
images.
• When we fix our attention to a single point
and measure velocities flowing through that
location, then the problem is called optical
flow.
– Stereo matching, image matching, tracking,…

20
Optical Flow
• Optical flow refers the image changes due to
motion during a time interval dt.
• The optical flow field is the velocity field that
represents the 3-D motion of object points across
a 2-D image. It represents only motion-related
intensity changes in the image.
• Optical flow should not be sensitive to
illumination changes and motion of unimportant
objects (shadows)
21
Estimating Optical Flow
• Assume the image intensity I is constant

Time = t Time = t+dt

I (x, y,t ) I (x + dx, y + dy,t + dt )


22
Estimating Optical Flow

Assumption:

• The observed brightness of any object point


is constant over time (Colour constancy /
brightness constancy)

• Nearby points in the image plane move in a


similar manner (small motion)

23
Estimating Optical Flow

I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )


Let spatial displacement (dx, dy) occurs
during time interval dt with velocity (u, v)

24
Estimating Optical Flow

I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )


I(x(t ) + u.∆t, y(t) + v.∆t) — I(x(t ), y(t), t) ≈ 0

25
Estimating Optical Flow
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt)
I (x ( t ) + u.∆t, y(t) + v.∆t) — I ( x ( t ) , y(t), t) ≈ 0

Assuming I is differentiable function, and expand the first term using Taylor’s series:

26
Estimating Optical Flow
I (x, y,t ) ≅ I (x + dx, y + dy,t + dt )
I(x(t ) + u.∆t, y(t) + v.∆t) — I(x(t ), y(t), t) ≈ 0

Assuming I is differentiable function, and expand the first term using Taylor’s series:

Compact I x u + Iy v + I t = 0 Brightness constancy


constraint
representation 27
Assumption: The Brightness Constraint

• Expresses the idea of similar brightness for the


same objects observed in a sequence

28
Assumption: The Brightness Constraint

• Expresses the idea of similar brightness for the


same objects observed in a sequence
• When we follow with a given location and
trace their position in consecutive images of a
sequence, then the problem is called “feature
tracking”

29
Assumption: The Brightness Constraint

• Expresses the idea of similar brightness for the


same objects observed in a sequence
• When we follow with a given location and
trace their position in consecutive images of a
sequence, then the problem is called “feature
tracking”

30
Assumption: The Brightness Constraint

• Expresses the idea of similar brightness for the


same objects observed in a sequence
• When we follow with a given location and
trace their position in consecutive images of a
sequence, then the problem is called “feature
tracking”
• Trying to solve a single equation for two
variables (u and v) →ill-posed
I x u + Iy v + I t = 0 Aperture Problem
31
Aperture Problem
The components of the flow parallel to an edge is
unknown.

32
Second Assumption: Gradient Constraint
Velocity vector is constant within a small
neighborhood (LUCAS AND KANADE)

How to generate more equations for a pixel?


• Impose additional constraints.
• Assume that the flow field is smooth locally
• Assume the pixel’s neighbourhood have the same
(u,v)
• For a 5x5 window, 25 equations per pixel are
generated
33
34
35
Second Assumption: Gradient Constraint
Velocity vector is constant within a small neighborhood (LUCAS AND KANADE)
Type equation here.
2
E ( u , v) = (I x u + I y v + I t ) dxdy
x,y

E(u, v) @E(u, v)
= = 0
@u @v
2(I x u + I y v + I t ) I x = 0
2(I x u + I y v + I t )I y = 0
36
Lucas-Kanade
Type2 P Σ Σ
Minimum least square solution:
P
I
equation
x I x I y u I x I t
P P 2 = — P
I x
here. I y I y v Iy I t

37
Lucas-Kanade

P 2 P Σ Σ P
I I xIy u I x I t
P x P 2 = — P
IxIy Iy v Iy I t
Σ Σ
T xx Txy u T xt
Structural
= —
Tensor
representation Txy Tyy v Tyt

38
40
How does Lucas-Kanade behave?
At edges, matrix T becomes singular!

Textured region
Homog.
region

Eigenvalues of
Eigenvalues of
matrix T are high
matrix T are very small
41
Pisalls & Alternatives
• Brightness constancy is not satisfied
– Correlation based method could be used

42
Pisalls & Alternatives
• Brightness constancy is not satisfied
– Correlation based method could be used
• A point may not move like its neighbors
– Regularization based methods

43
Pisalls & Alternatives
• Brightness constancy is not satisfied
– Correlation based method could be used
• A point may not move like its neighbors
– Regularization based methods
• The motion may not be small (Taylor does not
hold!)
– Multi-scale estimation could be used

44
Multi-Scale Flow Estimation

u=1.25 pixels

u=2.5 pixels

u=5 pixels

u=10 pixels
image It-1 image It+1

Gaussian pyramid of image It Gaussian pyramid of image It+1


45
Example: Optical Flow in 1D

I(x,t) I(x,t + 1)


v?
p

46
Example: Optical Flow in 1D

I(x,t) I(x,t + 1)

It
Temporal derivative
p →
v

x
Ix

Spatial derivative
Assumptions:
I →
I t = I
I
Ix = V − t • Brightness constancy
x t t x= p
Ix • Small motion
47
Example: Optical Flow in 1D
Iterating helps refining the velocity vector

I(x,t) I(x,t + 1)
Temporal derivative at 2nd iteration

It
p

x
Ix

Can keep the same estimate for spatial derivative

→ → − It
vv previous
Converges in about 5 iterations
Ix 48
Lukas-Kanade without Pyramid

49
Lukas-Kanade with Pyramid

50
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem

51
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
• Minimize a global energy function

52
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
• Minimize a global energy function

53
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
• Minimize a global energy function

• Take partial derivatives w.r.t. u and v:

54
Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
• Minimize a global energy function

• Take partial derivatives w.r.t. u and v:


2
(I x u + I y v + I t ) I x —𝛼 𝛻 u = 0
2
(I x u + I y v + I t )I y — 𝛼 𝛻 v = 0
55
Horn & Schunck
(u,v)
• Iterative scheme v (u,v)
I x (I x u k + I y v k + I t )
u k +1 = u k − (I x , I y )
 +I +I
2 2
x
2
y

I y (I xu k + I y v k + I t )
v k +1 = v k −
 2 +I 2x+I 2y u

• Yields high density flow


• Fill in missing information in the homogenous
regions
• More sensitive to noise than local methods
56
Optical Flow Matlab/C++/Python Code
• http://people.csail.mit.edu/celiu/OpticalFlow/
• http://opencv-python-
tutroals.readthedocs.org/en/latest/
py_tutorials/py_video/py_lucas_kanade/
py_lucas_kanade.html
• https://github.com/Itseez/opencv_attic/blob/
a6078cc8477ff055427b67048a95547b3efe92a
5/opencv/samples/python2/lk_track.py

57
Applications: Target Tracking

58
Applications: Action Recognition

− + − +
Fx , Fy Fx , Fx , Fy , Fy blurred F −, F +, F −, F +
x x y y
59
Recognition actions at a distance, Efros et al.
Applications: Motion Modeling

Flipping between image 1 and 2. Estimated flow field Second image is warped!
(hue indicates orientation
and saturation indicates magnitude)

60
Applications: Motion Segmentation

61
Global Motion Models (Parametric)
All pixels are considered to summarize global
motion!

• 2D Models
– Affine
– Quadratic
– Planar projective (homography)
• 3D Models
– Inst. Camera motion models
– Homography+epipole
– Plane+parallalx

62
Motion Models

Translation Affine Perspective 3D rotation

2 unknowns 6 unknowns 8 unknowns 3 unknowns

63
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y

64
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y

I x u + Iy v + I t ≈ 0

65
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y

I x (a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0

66
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y

I x (a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0

Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)

67
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y
I x ( a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0

Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)

Over all pixels, minimize the least square to find unknowns!

68
Computer Vision

Motion Models, Feature Tracking and


Alignment

1
Readings
• Szeliski, R.
• Slide Credits : Ulas Bagci

2
Recap: Estimating Optical Flow

• Assume the image intensity I is constant

Time = t Time = t+dt

I (x, y,t ) I (x + dx, y + dy,t + dt )


3
First Assumption: Brightness Constraint

4
Second Assumption: Gradient Constraint

Velocity vector is constant within a small neighborhood (LUCAS AND KANADE)

2(I x u + I y v + I t )I x = 0
2(I x u + I y v + I t )I y = 0
5
Recap: Lucas-Kanade

T yt T xy —T xt T yy T xt T xy —T yt T xx
u= 2 and v = 2
T x x T y y —T x y T x x T y y —T x y
6
Recap: Horn & Schunck
• Global method with smoothness constraint to
solve aperture problem
• Minimize a global energy function

7
Global Motion Models(Parametric)
All pixels are considered to summarize global
motion!

• 2D Models
– Affine
– Quadratic
– Planar projective (homography)
• 3D Models
– Inst. Camera motion models
– Homography+epipole
– Plane+parallalx

8
Motion Models

Translation Affine Perspective 3D rotation

2 unknowns 6 unknowns 8 unknowns 3 unknowns

9
Example: Affine
Motion u(x, y) = a1 +
a 2 x + a 3 y v(x, y) = a 4 +
a 5 x + a6y

10
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
v(x, y) = a 4 + a 5 x + a 6 y

I x u + Iy v + I t ≈ 0

11
Example: Affine Motion
u(x, y) = a 1 + a 2 x + a 3 y
I x (a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0
v(x, y) = a 4 + a 5 x + a 6 y

12
Example: Affine
Motion u(x, y) = a1 +
a 2 x + a 3 y v(x, y) = a 4 +
a 5 x + a6y
I x (a 1 + a 2 x + a 3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0

Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)

13
Example: Affine
Motion u(x, y) = a1 +
a 2 x + a 3 y v(x, y) = a 4 +
I x ( aa1 5+xa 2+x +a a6 y
3 y) + I y (a 4 + a 5 x + a 6 y) + I t ≈ 0

Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)

Over all pixels, minimize the least square to find unknowns!

14
Example: Affine Motion

u(x, y) = a1 + a2 x + a3y
v(x, y) = a4 + a5 x + a6y
I x (a 1 + a2 x + a3 y) + I y (a4 + a5 x + a6 y) + I t 0

Each pixel provides 1 linear constraint in 6 global unknowns (a1,..,a6)

15
Locally Affine MotionModel

13
Locally Affine MotionModel

• E(A) should be computed within a small


neighborhood (i.e. locally affine term, let say
D)

17
Locally Affine MotionModel
• E(A) should be computed within a small
neighborhood (i.e. locally affine term, let say
D)
• From Taylor’s expansion

18
Locally Affine MotionModel

• E(A) should be computed within a small


neighborhood (i.e. locally affine term, let say
D)

19
Locally Affine MotionModel

• E(A) should be computed within a small


neighborhood (i.e. locally affine term, let say
D)
• From Taylor’s expansion

20
Locally Affine MotionModel

• E(A) should be computed within a small


neighborhood (i.e. locally affine term, let say
D)
• From Taylor’s expansion

18
Locally Affine MotionModel

22
Locally Affine MotionModel

23
Locally Affine MotionModel

24
Locally Affine MotionModel

25
Locally Affine MotionModel

26
Locally Affine MotionModel

27
Smoothness Constraint on Locally
Affine Motion Model

• We define a better error function by

28
Smoothness Constraint on
Locally Affine Motion Model

• We define a better error function by


E ( A) = E b (A) + E s (A)

29
Smoothness Constraint on Locally
Affine Motion Model

30
Smoothness Constraint on Locally
Affine Motion Model

31
Minimizing ErrorFunction
dE(A) dE b (A) dE s (A)
= +
da da da

32
Minimizing ErrorFunction
dE(A) dE b (A) dE s (A)
= +
da da da

dE b (A) ∑ --- 2c[k ------


T
=
da c a]
x ,y €D

33
Minimizing ErrorFunction

34
Minimizing Error Function-Solution Vector

36
Global Motion

Estimate motion using all


pixels in the image

37
Global Motion
Global Motion can be
Estimate motion using all used to
pixels in the image • Remove camera motion
• Object-based
segmentation
• generate mosaics

38
Global Motion

Global Motion can be


Estimate motion using all used to
pixels in the image • Remove camera motion
• Object-based
segmentation
• generate mosaics

39
Recap: Object Tracking

• Track an object over a sequence of images

40
Challenges in ObjectTracking

• Which features to track?


• Efficient tracking
• Appearance constraint violation
• …

41
Challenges in ObjectTracking

• Which features to track?


• Efficient tracking
• Appearance constraint violation
• …

42
Shi-Tomasi FeatureTracker

• Good Features to Track

43
Shi-Tomasi FeatureTracker

• Good Features to Track


– Find good features using eigenvalues of Hessian
matrix (threshold on the smallest eigenvalue
when computingHarris corner detection)

44
Shi-Tomasi FeatureTracker

• Good Features to Track


– Find good features using eigenvalues of Hessian
matrix (threshold on the smallest eigenvalue
when computingHarris corner detection)
– Track from frame to frame with LK

45
Shi-Tomasi FeatureTracker

• Good Features to Track


– Find good features using eigenvalues of Hessian
matrix (threshold on the smallest eigenvalue
when computingHarris corner detection)
– Track from frame to frame with LK
– Check consistencyof tracks by “affine registration”
to the first observed instance of the feature

46
Shi-Tomasi FeatureTracker

Shi and Tomasi


CVPR 1994
Good Features
To Track.
47
KLT Tracking
• KLT: Kanade-Lucas-Tomasi

48
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene

49
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
• Object tracking (car, airplane, person)

50
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
• Object tracking (car, airplane, person)
• Feature tracking (Harris corners)

51
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
• Object tracking (car, airplane, person)
• Feature tracking (Harris corners)
• Multiple object tracking

52
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
• Object tracking (car, airplane, person)
• Feature tracking (Harris corners)
• Multiple object tracking
• Tracking in single/multiple camera(s)
53
KLT Tracking
• KLT: Kanade-Lucas-Tomasi
• Tracking deals with estimating the trajectory
of an object in the image plane as it moves
around a scene
• Object tracking (car, airplane, person)
• Feature tracking (Harris corners)
• Multiple object tracking
• Tracking in single/multiple camera(s)
• Tracking in fixed/moving camera 54
KLT Tracking Algorithm
• Find GoodFeaturesToTrack
– Harris Corners (thresholdedon smallest
eigenvalues)
• Use LK algorithm to find optical flows
• Use Coarse-to-Fine strategy to deal with large
movements
• When creating long tracks, check appearance
of registered patch against appearance of
initial patch to find points that have drifted
55
Recent Developments at Optical Flow

• Start with LK or similar methods


+Gradientconsistency
+Energyminimization with smoothing term
+Region matching
+Keypointmatching

Region-based +Pixel-based +Keypoint-based


53
Large displacement optical flow, Brox et al., CVPR 2009
Very Recent Developments at Optical Flow
• Use of Machine Learning
– Deep Learning (ICCV 2015, Fischer et al., FlowNet)

57
9/22/16
DeepFlow (Large Displacement Optical Flow)

• Basically it is a matching algorithm with variational approach


[Weinzaepfel et al., ICCV 2013].

• Dense correspondence (matching)


• Self-smooth matching
• Large displacement optical
flowhttps://www.youtube.com/watch?v=k_wkDLJ8lJ
58
E
SIFT Tracking

Frame 0 → Frame 100

59
Practice: Horn-SchunckAlgorithm

60
Practice: Horn-SchunckAlgorithm

61
Practice: Horn-SchunckAlgorithm

62
Optical Flow - Quantitative Evaluation

63
Interpretation of Optical Flow Fields

64
Interpretation of Optical Flow Fields

Object features all have


Zero velocity.

65
Interpretation of Optical Flow Fields

66
Interpretation of Optical Flow Fields

Object is moving to the


Right.

67
Interpretation of Optical Flow Fields

68
Interpretation of Optical Flow Fields

Camera is moving into


the scene, and an object
moving passed the camera

69
Interpretation of Optical Flow Fields

70
Interpretation of Optical Flow Fields

Object is moving
Directly toward the camera
that is stationary

71
Interpretation of Optical Flow Fields

72
Interpretation of Optical Flow Fields

Object is rotating about


the line of sight to the camera

73
Interpretation of Optical Flow Fields

74
Interpretation of Optical Flow Fields

Object is rotating about an axis


perpendicular to the line of
sight.

75
Application in ImageAlignment
• Motion can be used for image alignment

73
Practice: HomogenousCoordinates
Q: How can we represent translation as a 3x3 matrix?
x' = x + t x
y' = y + t y

77
Practice: HomogenousCoordinates
Q: How can we represent translation as a 3x3 matrix?
x' = x + t x
y' = y + t y

78
Practice: Homogenous Coordinates
Practice: Homogenous Coordinates
Affine Transformation
Affine Transformation
Face Recognition and
Detection

4/22/2024 1
Eigenfaces for recognition
The Space of Faces

An image is a point in a high dimensional space


• An N x M image is a point in RNM
• We can define vectors in this space as we did in the 2D case
Key Idea

The set of faces is a “subspace” of the set of images


• We can find the best subspace using PCA
• This is like fitting a “hyper-plane” to the set of faces
– spanned by vectors v1, v2, ..., vK
– any face

• Images in the possible set  = {xRL } are highly correlated.


ˆ P

• So, compress them to a low-dimensional subspace that


captures key appearance characteristics of the visual DOFs.
Eigenfaces
PCA extracts the eigenvectors of A
• Gives a set of vectors v1, v2, v3, ...
• Each vector is a direction in face space
– what do these look like?

4/22/2024 5
Eigenfaces

Eigenfaces look somewhat like generic faces.


Linear subspaces
convert x into v 1, v 2 coordinates

What does the v 2 coordinate measure?


- distance to line
- use it for classification—near 0 for orange pts
What does the v 1 coordinate measure?
- position along line
- use it to specify which orange point it is

Classification can be expensive:


• Big search prob (e.g., nearest neighbors) or store large PDF’s
Suppose the data points are arranged as above
• Idea—fit a line, classifier measures distance to line
4/22/2024 7
Dimensionality reduction

Dimensionality reduction
• We can represent the orange points with only their v1 coordinates
(since v2 coordinates are all essentially 0)
• This makes it much cheaper to store and compare points
• A bigger deal for higher dimensional problems
4/22/2024 8
Linear subspaces
Consider the variation along direction v
among all of the orange points:

What unit vector v minimizes var?

What unit vector v maximizes var?

Solution: v 1 is eigenvector of A with largest eigenvalue


v 2 is eigenvector of A with smallest eigenvalue

4/22/2024 9
Principal component analysis
Suppose each data point is N-dimensional
• Same procedure applies:

• The eigenvectors of A define a new coordinate system


– eigenvector with largest eigenvalue captures the most variation
among training vectors x
– eigenvector with smallest eigenvalue has least variation
• We can compress the data using the top few eigenvectors
– corresponds to choosing a “linear subspace”
» represent points on a line, plane, or “hyper-plane”
– these eigenvectors are known as the principal components

4/22/2024 10
Higher Dimensions
Suppose each data point is N-dimensional
• Same procedure applies:

• The eigenvectors of A define a new coordinate system


– eigenvector with largest eigenvalue captures the most variation
among training vectors x
– eigenvector with smallest eigenvalue has least variation
• We can compress the data by only using the top few
eigenvectors
– corresponds to choosing a “linear subspace”
» represent points on a line, plane, or “hyper-plane”
– these eigenvectors are known as the principal components
Problem: Size of Covariance Matrix A

Suppose each data point is N-dimensional (N pixels)


2 2

• The size of covariance matrix A is N x N


• The number of eigenfaces is N

• Example: For N = 256 x 256 pixels,


Size of A will be 65536 x 65536 !
Number of eigenvectors will be 65536 !

Typically, only 20-30 eigenvectors suffice. So, this


method is very inefficient!
Efficient Computation of Eigenvectors

If B is MxN and M<<N then A=BTB is NxN >> MxM


• M → number of images, N → number of pixels
• use BBT instead, eigenvector of BBT is easily
converted to that of BTB

(BBT) y = e y
=> BT(BBT) y = e (BTy)
=> (BTB)(BTy) = e (BTy)
=> BTy is the eigenvector of BTB
Eigenfaces – summary in words

Eigenfaces are
the eigenvectors of
the covariance matrix of
the probability distribution of
the vector space of
human faces

Eigenfaces are the ‘standardized face ingredients’ derived


from the statistical analysis of many pictures of human
faces

A human face may be considered to be a combination of


these standardized faces
Generating Eigenfaces – in words

1. Large set of images of human faces is taken.


2. The images are normalized to line up the eyes,
mouths and other features.
3. The eigenvectors of the covariance matrix of
the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
Eigenfaces for Face Recognition

When properly weighted, eigenfaces can be summed


together to create an approximate gray-scale
rendering of a human face.

Remarkably few eigenvector terms are needed to give


a fair likeness of most people's faces.

Hence eigenfaces provide a means of applying data


compression to faces for identification purposes.
Dimensionality Reduction

The set of faces is a “subspace” of the set


of images

• Suppose it is K dimensional

• We can find the best subspace using


PCA

• This is like fitting a “hyper-plane” to the


set of faces

– spanned by vectors v1, v2, ..., vK

Any face:
Eigenfaces
PCA extracts the eigenvectors of A
• Gives a set of vectors v1, v2, v3, ...
• Each one of these vectors is a direction in face space
– what do these look like?
Projecting onto the Eigenfaces

The eigenfaces v1, ..., vK span the space of faces

• A face is converted to eigenface coordinates by


Choosing the Dimension K

eigenvalues

i= K NM

How many eigenfaces to use?

Look at the decay of the eigenvalues


• the eigenvalue tells you the amount of variance “in
the direction” of that eigenface
• ignore eigenfaces with low variance
Is this a face or not?
Recognition with Eigenfaces
Algorithm
1. Process the image database (set of images with labels)
– Run PCA—compute eigenfaces
– Calculate the K coefficients for each image

2. Given a new image (to be recognized) x, calculate K


coefficients

3. Detect if x is a face

4. If it is a face, who is it?


• Find closest labeled face in database
• nearest-neighbor in K-dimensional space
Key Property of Eigenspace Representation

Given

• 2 images xˆ1 , xˆ2 that are used to construct the Eigenspace


• ĝ1 is the eigenspace projection of image x̂1
• ĝ 2 is the eigenspace projection of image x̂2
Then,
|| gˆ 2 − gˆ1 ||  || xˆ2 − xˆ1 ||

That is, distance in Eigenspace is approximately equal to the


correlation between two images.
(M’ is the number of eigenfaces used)
Appearance-based Recognition

• Directly represent appearance (image brightness), not geometry.

• Why?
Avoids modeling geometry, complex interactions
between geometry, lighting and reflectance.

• Why not?
Too many possible appearances!
m “visual degrees of freedom” (eg., pose, lighting, etc)
R discrete samples for each DOF

How to discretely sample the DOFs?

How to PREDICT/SYNTHESIS/MATCH with novel views?


Appearance-based Recognition
• Example:

• Visual DOFs: Object type P, Lighting Direction L, Pose R

• Set of R * P * L possible images:

 = {xˆ } P
RL
• Image as a point in high dimensional space:

Pixel 2 gray value



x̂ is an image of N pixels and
A point in N-dimensional space

Pixel 1 gray value


Appearance from different view points

COIL Database, CAVE Lab, Columbia


Parametric Eigenspace
Estimating orientation of object

CAVE Lab, Columbia Univ


Estimating orientation of the object

CAVE Lab, Columbia Univ


View-Based and Modular
Eigenspaces for Face Recognition
Part-based eigenfeatures
Learn a separate
eigenspace for each
face feature
Boosts performance
of regular
eigenfaces

4/22/2024 35
Bayesian Face Recognition

4/22/2024 36
Bayesian Face Recognition

4/22/2024 37
Robust real-time face detection
Scan classifier over locs. & scales

4/22/2024 39
“Learn” classifier from data
Training Data
• 5000 faces (frontal)
• 108 non faces
• Faces are normalized
• Scale, translation
Many variations
• Across individuals
• Illumination
• Pose (rotation both in plane and out)
4/22/2024 40
Characteristics of algorithm
• Feature set (…is huge about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection

➢ Fastest known face detector for gray scale


images

4/22/2024 41
Image features
• “Rectangle filters”
• Similar to Haar wavelets
• Differences between
sums of pixels in
adjacent rectangles

4/22/2024 42
Integral Image
Partial sum

Any rectangle is
D = 1+4-(2+3)

Also known as:


• summed area tables [Crow84]
• boxlets [Simard98]

4/22/2024 43
Huge library of filters

4/22/2024 44
Constructing the classifier
Perceptron yields a sufficiently powerful classifier

Use AdaBoost to efficiently choose best features


• add a new hi(x) at each round hi(x)
• each hi(xk) is a “decision stump” b=Ew(y [x> q])
a=Ew(y [x< q])
 x

4/22/2024 45
Constructing the classifier
For each round of boosting:
• Evaluate each rectangle filter on each example
• Sort examples by filter values
• Select best threshold for each filter (min error)
• Use sorting to quickly scan for optimal threshold
• Select best filter/threshold combination
• Weight is a simple function of error rate
• Reweight examples
• (There are many tricks to make this more efficient.)
4/22/2024 46
Good reference on boosting
Friedman, J., Hastie, T. and Tibshirani, R.
Additive Logistic Regression: a Statistical
View of Boosting
http://www-stat.stanford.edu/~hastie/Papers/boost.ps
“We show that boosting fits an additive logistic
regression model by stagewise optimization of a
criterion very similar to the log-likelihood, and present
likelihood based alternatives. We also propose a
multi-logit boosting procedure which appears to have
advantages over other methods proposed so far.”

4/22/2024 47
Trading speed for accuracy
Given a nested set of classifier hypothesis
classes

Computational Risk Minimization

4/22/2024 48
Sample results

4/22/2024 49
Summary (Viola-Jones)
• Fastest known face detector for gray images
• Three contributions with broad applicability:
❖Cascaded classifier yields rapid
classification
❖AdaBoost as an extremely efficient feature
selector
❖Rectangle Features + Integral Image can
be used for rapid image analysis

4/22/2024 50
Schneiderman
Viola Kanade
Jones

4/22/2024 51
Hematological Image Analysis for
Acute Lymphoblastic Leukemia
Diagnosis and Classification

Dr. Dipti Patra


Seminar Outline
• Introduction
• Preliminaries of hematological disorders
• Leukemia and its classification
• Acute lymphoblastic leukemia (ALL)
• Diagnosis of ALL
• Lymphocyte image segmentation
• Automated characterization of Lymphocytes for ALL Diagnosis
• Automated FAB classification of lymphoblast subtypes
• Automated immunological classification of lymphoblast subtypes
• Image morphometry for lymphoid and myeloid blast
classification
• Conclusion
• References

Hematological Image Analysis for


2 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Preliminaries of hematological disorders

• The study of blood disorders are commonly known as


hematology and are treated by medical experts known as
hematologists.
• Hematological disorders can be broadly classified in three ways,
i.e. by the type of blood cell which is affected, according to
functional disorders of the blood and lymphoid organs and
neoplastic disorders of blood and lymphoid organs.

Hematological Image Analysis for


3 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Type of blood disorders

Hematological Image Analysis for


4 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Leukemia

• Leukemia is a cancer of the blood tissue characteristically


associated with increased numbers of white cells in the bone
marrow and / or peripheral blood.

• Symptoms: fever, anemia, bleeding and/or bruising, persistent


weakness or tiredness, achiness in the bones or joints,
recurrent infections, difficulty breathing or swollen lymph nodes.

• Causes: No particular cause; include natural or artificial ionizing


radiation, certain kinds of chemicals, some viruses, and genetic
predispositions.

Hematological Image Analysis for


5 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Leukemia classification

Hematological Image Analysis for


6 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Acute Lymphoblastic Leukemia

• Acute lymphoblastic leukemia (ALL) is a malignant disease


caused by the genetic alterations of the lymphocyte precursor
cells of the bone marrow.

• ALL is characterized by excessive production of immature


lymphocytes (lymphoblast) in the bone marrow preventing
normal hematopoiesis.

• Two popular ALL classification schemes presently in use are


French–American–British (FAB) classification and immunological
classification by world health organization (WHO).

Hematological Image Analysis for


7 Acute Lymphoblastic Leukemia
Diagnosis and Classification
ALL classification

• Two popular ALL classification schemes presently in use are


French–American–British (FAB) classification and immunological
classification by world health organization (WHO).

• As per FAB classification, there are three subtypes of ALL i.e. L1,
L2 and L3, and each has a distinct blast morphology.

• Immunological methods are based upon cell surface antigen


and can be broadly subdivided as:

• Precursor B–lymphoblastic leukemia or pre–B


• Precursor T–lymphoblastic leukemia or pre–T
• Mature B–lymphoblastic leukemia or mature–B

Hematological Image Analysis for


8 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Diagnosis of ALL

• Techniques
• Light microscopic examination
• Cytochemistry
• Immunophenotyping

• Cytochemistry and immunophenotyping is expensive, and


requires additional evaluation of blood samples by molecular
analysis and flow cytometry.

• Microscopic examination of peripheral blood samples is still


considered as a standard economical procedure for confirmative
screening and subtyping of ALL in most of the countries including
India.

Hematological Image Analysis for


9 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Conventional diagnosis of ALL

Peripheral Blood Microscopic


Smear Staining Examination and
Collection Decision

Blood Analysis
Report by
Pathologist

Hematological Image Analysis for


10 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Limitations of conventional diagnosis

• Variability in diagnosis: It arises due to improper staining


process, intraobserver and interobserver differences.

• Time consuming

• Need for repeated sample collection

Hematological Image Analysis for


11 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Need for automation

• The motivation to automate comes from the fact that besides


being time consuming, the results of manual diagnosis varies
with the hematologist’s skill, experience, work load and stress
level.

• Manual analysis becomes questionable not only because of the


amount of work, but also with regard to precision and the
reproducibility of the results.

• Automated systems can help overcome the dearth of trained


personnel.

Hematological Image Analysis for


12 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Quantitative microscopy

Peripheral
Blood Smear
Image
Sample Blood Digital
Smear Staining Microscope with a
Collection Computer

Hematological
Image Analysis and
Decision
(Computer)

Hematological Image Analysis for


13 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Research objective

In particular, the objectives are narrowed to —

i. Devise improved segmentation schemes for lymphocyte images.

ii. Utilize morphological, texture and color features in peripheral blood


smear images to classify between mature lymphocytes and
lymphoblasts.

iii. Develop a system for the classification of lymphoblasts.

iv. Establish an improved machine learning system for the classification


of lymphoid and myeloid blasts in peripheral blood smear images.

Hematological Image Analysis for


14 EE Department of NIT Rourkela
Acute Lymphoblastic Leukemia
Diagnosis and Classification
Proposed model using image processing
and machine learning

Hematological Image Analysis for


15 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Materials and methods

• Histology

• Hematological Image Acquisition

• Sub Imaging

• Color Space Conversion

• Preprocessing

Hematological Image Analysis for


16 Acute Lymphoblastic Leukemia EE Department of NIT Rourkela
Diagnosis and Classification
Hematological Image Acquisition

• Blood microscopic images of Lieshman stained peripheral blood


samples were optically grabbed by Zeiss Observer microscope
(Carl Zeiss, Germany) under 100X oil immersed setting and
with an effective magnification of 1000 at Ispat General
Hospital, Rourkela, India.

Hematological Image Analysis for


17 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Sub imaging

• Peripheral blood smear images are relatively larger with more


than one leukocyte per image.
• The desired region of interest (ROI) must contain a single
lymphocyte only for ALL detection.

a) Entire Blood Image b) Nucleus Centroid c) Detected Subimages

Hematological Image Analysis for


18 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Color space conversion

• Blood microscopic images are acquired in RGB color space.


• To perform image segmentation, the features of objects of
interest might seem to be more decorrelated in certain color
space than in others.
• Due to strong correlation among the individual color planes,
RGB color model is unsuitable for stained peripheral blood
smear image segmentation.
• L*a*b* color model is a suitable alternative for color feature
based image segmentation as the color dimension is reduced
and the color channels (a∗ and b∗) are uncorrelated.
• Transforming the blood microscopic images from RGB to CIELAB
reduces the color dimension of the problem from three (RGB) to
two (a∗ and b∗) and facilitates color based image segmentation.

Hematological Image Analysis for


19 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
• Segmentation of lymphocyte images is considered as a pixel
classification problem in a supervised framework.
• It is performed by measuring a set of CIELAB color features of
each pixel which defines a decision surface in the feature
space.
• Each pixel of the lymphocyte image is classified as belonging
to one of the regions i.e. cytoplasm, nucleus or background
(including RBC) using an improved neural classifier .
• The use of Functional Link Artificial Neural Network (FLANN)
as a classifier is introduced here for lymphocyte image
segmentation

Hematological Image Analysis for


20 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)

Hematological Image Analysis for


21 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)
• To train the FLANN, the input–output patterns (Color features)–
Label) are generated for different sample lymphocyte images.
• a*, b* color values of each pixel is fed as input to the FLANN,
and the label of each pixel location is computed.
• Using a set of input–output pair (training data set) the network
parameters are optimized.
• In order to calculate the error, the actual label output of the
FLANN is compared with the desired label output provided by
the human expert.
• Depending on this error value, the weight matrix between input
and output layers is updated using back propagation algorithm
(BPA).

Hematological Image Analysis for


22 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)

• Training patterns generated from a particular image IGH24HS is


listed below.

Hematological Image Analysis for


23 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)
• The training convergence characteristics of FLANN.

Hematological Image Analysis for


24 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel classification
problem
Proposed lymphocyte image segmentation algorithm using FLANN
(FLANNLS)
• To validate the prediction of the FLANN, six patterns from six
different images other than the training images were tested and
listed below.

Hematological Image Analysis for


25 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel clustering problem

• Clustering is a special kind of unsupervised classification and


can be broadly classified as hard clustering (or crisp clustering)
and soft clustering.
• The popular clustering algorithms i.e. K–means, K–medoid
belong to the first category and each pixel is assumed to
belong to one and only one cluster.
• Soft clustering based clustering algorithms were developed,
and offer a principal alternative to crisp approaches with pixels
having partial membership to different classes.
• Algorithms such as fuzzy c–means (FCM), Rough c–means and
Shadow c–means (SCM) are generally used for the
segmentation of such images, where the class separation is
not well defined.

Hematological Image Analysis for


26 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel clustering problem
Need for feature (or kernel) space clustering

• Even though, the soft clustering approaches endow efficient


handling of overlapping partitions for spherical data, it fails
dramatically when the structure of input patterns is non–
spherical and complex.
• An alternative approach for solving such problems is to adopt
the strategy of nonlinearly transforming the data into a higher
dimensional feature space and then performing the clustering
within this feature space.
• Two novel kernel based clustering algorithms have been
proposed here for the segmentation of human lymphocyte
images.

Hematological Image Analysis for


27 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation as a pixel clustering problem
Soft Partitive Clustering Algorithms

• Fuzzy c–means (FCM)


• Rough c–means (RCM)
• Rough–Fuzzy c–Means
• Shadowed c–Means (SCM)

Hematological Image Analysis for


28 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Fuzzy c-means algorithm

• The first algorithm in the soft clustering arena was Fuzzy c–


means (FCM), which was developed in 1973 by Dunn and
improved by Bezdek in 1981.
• In FCM each data point is associated with every cluster using a
membership function, which gives degree of belongingness to
the clusters. The partition matrix is obtained by minimizing the
following objective function:

where, 1 ≤ m < ∞ is the degree of fuzziness, Xk is the kth data


pattern, vi is the ith cluster center, μ ∈ [0, 1] is the membership of
the kth data pattern to it, and ||.|| is the Euclidean distance norm.

Hematological Image Analysis for


29 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Fuzzy c-means algorithm

• The partition matrix is obtained by minimizing the following


objective function:

where, 1 ≤ m < ∞ is the degree of fuzziness, Xk is the kth data


pattern, vi is the ith cluster center, μ ∈ [0, 1] is the membership of
the kth data pattern to it, and ||.|| is the Euclidean distance norm.
Where as

∀i with dik= ||Xk− vi||2, subject to σ𝑐𝑖=1 μ𝑖𝑘 = 1, ∀k, and 0<σ𝑐𝑖=1 μ𝑖𝑘 <N, ∀i.

Hematological Image Analysis for


30 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Fuzzy c-means algorithm (FCM)

Hematological Image Analysis for


31 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Fuzzy c-means algorithm (FCM)

• Even though, the membership concept of fuzzy sets endow


efficient handling of overlapping partitions in FCM algorithm,
issues like uncertainty, vagueness and incompleteness still
persists.
• In view of this, there was a necessity to use an alternative tool
like rough sets to handle such issues.
• To achieve such robustness in clustering problems rough sets
were incorporated in the c–means framework and was termed
as Rough c–means (RCM) algorithm.

Hematological Image Analysis for


32 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough sets

• The principle of rough set is based on representation of rough or


imprecise information in terms of exact concepts i.e., lower and
upper approximation.
• The difference of upper and lower approximation will result with
objects in the rough boundaries.

Hematological Image Analysis for


33 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough–c–Means

• In Rough c–means (RCM) clustering, the idea of standard c–


means is extended by visualizing each class as an interval or
rough set.
• A rough set Yis characterized by its lower and upper
approximations BYand B Y respectively.

• In rough context an object Xkcan be a member of at most one


lower approximation. If Xk∈ BYof cluster Y, then concurrently
Xk∈ BYof the same cluster. Whereas it will never belong to other
clusters. If Xkis not a member of any lower approximation, then
it will belong to two or more upper approximations.

Hematological Image Analysis for Subrajeet Mohapatra, Roll#509EE108


34 Acute Lymphoblastic Leukemia EE Department of NIT Rourkela
Diagnosis and Classification
Soft partitive clustering
Rough c-Means (RCM)

• Updated centroid vi of cluster Ui is computed as

Hematological Image Analysis for


35 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering

Rough c-Means (RCM) algorithm

Hematological Image Analysis for


36 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough c-Means (RCM)

• Rough c–means algorithm is completely governed by three


parameters such as wlow, wup and T. The parameter threshold
can be defined as relative distance of a data member Xk from a
pair of cluster centroids vi and vj. These parameters each has
to be suitably tuned for proper segmentation.

Hematological Image Analysis for


37 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough Fuzzy-c-Means (RFCM) algorithm

Hematological Image Analysis for


38 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough Fuzzy-c-Means (RFCM) algorithm

Hematological Image Analysis for


39 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Rough Fuzzy-c-Means (RFCM)

• Even though the clustering performance is improved with rough


set based approaches they are limited by issues like fine tuning
of upper and lower approximation parameters and
determination of threshold.

Hematological Image Analysis for


40 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow sets
• Fuzzy logic strives to model vagueness using membership
function, which indicates the degree of belongingness to a
concept which is desired to be represented.

• Membership values are accurate numerical quantities


representing excessive precision for describing imprecise
phenomena.
• Such excessive precision is undesirable under imprecise
phenomenon and a possible solution to it is shadowed sets.
• Most of the uncertainty arises in the determination of the
membership grades around 0.5 in contrast to assigning grades
close to 1 or 0.
• Confusion of assigning the belongingness around 0.5 sparked the
need of shadowed sets.
Hematological Image Analysis for
41 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow sets

• For each fuzzy set few membership values beyond a particular


threshold are elevated and reduced those are below a
substantially low value.
• Such process eliminate disambiguate property of the fuzzy sets
and thereby reducing the number of computations.
• The overall level of vagueness is maintained by defining a new
region termed zone of vagueness.
• Suitable membership assignment is made such that this
particular area of the universe of discourse will have values
between [0, 1],

Hematological Image Analysis for


42 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow sets

• The transformation of fuzzy set to shadowed set is achieved


using a particular threshold.
• Effectively such development transforms the domain of
discourse into clearly marked region of vagueness.
• Such mapping is termed as shadowed set and is defined as
A : X → {0, 1, [0, 1]}.
• The elements of X for which A attains the value 1 constitute its
core, whereas the elements with A(x) = [0, 1] lies in the
shadow region of the mapping; the rest forms the exclusion
region.

Hematological Image Analysis for


43 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow sets

Hematological Image Analysis for


44 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow sets

• A particular threshold is desired for partitioning the distribution


into core, shadowed and exclusion regions.
• A suitable threshold λ is obtained for the quantification process
using the relation:

where λ ∈ (0, 1/2) such that θ(λi) = 0


• A discrete version of the above equation can be defined as,

Hematological Image Analysis for


45 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow c- means (SCM) algorithm

Hematological Image Analysis for


46 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Soft partitive clustering
Shadow c- means (SCM) algorithm
where,

• Data patterns belonging to core region have no fuzzy weight


factor, whereas elements belonging to shadowed region are
treated as in FCM.
• Data members belonging to the exclusion region, the fuzzy weight
factor m is raised to itself i.e. mm.
• Unlike FCM, it does not attempt fuzzification for elements having
membership values above the threshold reducing computational
burden.
Hematological Image Analysis for
47 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Comparison of SCM with FCM and RCM
clustering algorithms
• Unlike FCM, it does not attempt fuzzification for elements having
membership values above the threshold reducing computational
burden.
• As compared to RCM there is absence of external user defined
parameters.
• Removal of this initial trial and error factor makes SCM more
robust , as well as insensitive to the fluctuations in the incoming
data.
• Analogous to K-means, FCM and RCM algorithm SCM is effective
only in clustering crisp, spherical, and non–overlapping type of
data.
• In order to alleviate this problem, feature space transformation
using non–linear kernels is proposed for the clustering of
lymphocyte image data leading to improvement in segmentation
performance.
Hematological Image Analysis for
48 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Feature or kernel space clustering
• Kernel functions are used to transform the data in the
image plane into a feature plane of higher dimension
(possibly infinite) known as kernel (or feature) space.
• Nonlinear mapping functions i.e. Φ transforms the
nonlinear separation problem in the image plane into a
linear separation problem in kernel space facilitating
clustering in the feature space.

Hematological Image Analysis for


49 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Feature space clustering

• Due to high and possibly infinite feature dimension it is


unrealistic to measure the Euclidian distance between the
transformed variables.

• However, as per Mercer’s theorem working directly on the


transformed variables can be avoided.

• It can be used to calculate the distance between the pixel


feature values in the kernel space without knowing the
transformation function φ(.).

Hematological Image Analysis for


50 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Feature space clustering
Mercers Theorem

• Any continuous, symmetric, positive semi definite kernel


function can be expressed as a dot product in a higher
dimension”. Thus if Φ(.) is the mapping function then using
Mercer’s Theorem k (x, y) is given by

where x and y be any two point in the image plane and Φ(x) and
Φ (y) are the corresponding points in the feature plane
respectively. ” ・ ” is the dot product in the kernel space and
k (x, y) represents kernel function.

Hematological Image Analysis for Subrajeet Mohapatra, Roll#509EE108


51 Acute Lymphoblastic Leukemia EE Department of NIT Rourkela
Diagnosis and Classification
Feature space clustering
Standard Kernels

Hematological Image Analysis for


52 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Kernel Induced Rough Fuzzy c–means for
Lymphocyte Image Segmentation (KIRFCMLS)

Hematological Image Analysis for


53 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Kernel Induced Shadowed c–means for
Lymphocyte Image Segmentation (KISCMLS)

Hematological Image Analysis for


54 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Lymphocyte Image Segmentation as a
Pixel Labeling Problem
• Image segmentation using spatial interaction models like
Markov Random Field (MRF) and Gibbs Random Field (GRF) to
model the images was inspired by the computational techniques
developed in statistical mechanics.
• Such modeling started with the influential work of Geman and
Geman who linked via statistical mechanics between mechanical
systems and probability theory.

Hematological Image Analysis for


55 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Markov Random Field (MRF)

A random field X = {Xi,j} defined over lattice S is a Markov


Random Field (MRF) with respect to the neighborhood system η if
and only if
1. All of its realizations have non zero probabilities P(X = x)>0
for all x (property of positivity).
2. Its conditional distribution satisfies the following property

where xij is the configuration corresponding to the random


variable Xi j and so on.

Hematological Image Analysis for


56 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Clique
A clique of the pair (S, η) denoted by c is a subset of S such that

1. c consists of a single pixel, or


2. for (i, j) (k, l), (i, j) ∈ c and (k, l) ∈ c implies that (i, j) ∈ η(k, l)

The collection of all cliques of (S, η) is defined by C(S, η).

Hematological Image Analysis for


57 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Gibbs random field
Considering η be a neighborhood system defined over a finite
lattice S. A random field X is said to be a Gibbs Random Field
(GRF) of lattice S with respect to a neighborhood system η if and
only if its configuration obey a Gibbs distribution which has the
following form

where Z is the partition function defined as:

Hematological Image Analysis for


58 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Gibbs random field
T is a constant analogous to temperature which shall be assumed
to be 1 unless otherwise stated and U(x) is the energy function
expressed as

Energy is sum of clique potentials Vc(x) over all possible cliques C.


Vc(x) are a set of potential functions depending on the values of x
at the sites in the clique c.

Hematological Image Analysis for


59 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Markov-Gibbs equivalence
• A MRF is defined in terms of local properties (the classification
label assigned to a pixel is affected only by its neighbors),
whereas a GRF is characterized by its global property (the Gibbs
distribution).
• The popular Hammersley–Cliffords theorem states that given the
neighborhood structure η of the model, for any set of sites within
the lattice S, their associated contribution to the Gibbs energy
function should be non zero, if and only if the sites form a clique;
a random field’s having the Markov property is equivalent to its
having a Gibbs distribution.
• This theorem establishes the equivalence of these two types of
properties and provides a very general basis for the specification
of MRF joint distribution function.

Hematological Image Analysis for


60 Acute Lymphoblastic Leukemia
Diagnosis and Classification
MRF image model
• Let the images are assumed to be defined on a discrete
rectangular lattice S =N × N. Assuming that X denotes the
random field associated to the noise free image and Z denotes
the corresponding label process. Let z be a realization of Z and
the label process Z is modeled as MRF. The observed image y is
assumed to be a realization of the random field. Moreover, the
label process Z is assumed to be a MRF with respect to a
neighborhood system η and is described by its local
characteristics

As Z is MRF, or equivalently Gibbs distributed, the joint


distribution can be expressed as

Hematological Image Analysis for


61 Acute Lymphoblastic Leukemia
Diagnosis and Classification
MRF image model

Hematological Image Analysis for


62 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Problem addressed

The following image model is considered:

Y=X+N

• Y= Random field associated with the observed image

• X=Random field associated with noise free label process of the


original image which is assumed to be MRF having a priori
values of the model parameter.
• N= Random field associated with Gaussian noise process with
zero mean and known variance.

Hematological Image Analysis for


63 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Image label estimation

• In a supervised framework the number of regions M and the


model parameters are assumed to be known, it is required to
estimate the pixel labels using the associated model parameter
vector θ.
• The label process Z, of the image is modeled as MRF and the
objective is to obtain the optimal estimate of the realization of
the scene labels z∗ and hence achieve segmentation.
• The following optimality criterion is adopted.

Where θ denote the parameter vector, which is assumed to be


known a priori,𝑧is
Ƹ the MAP estimate of the labels. Since z is
unknown the posterior probability cannot be evaluated here.

Hematological Image Analysis for


64 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Image label estimation

Hence using Bayes rule,

Since y is known, the denominator of the above equation is constant.

Hematological Image Analysis for


65 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Image label estimation

Using the assumptions i.e. the noise is a white Gaussian


sequence with zero mean, is independent of z in the degradation
model, P(Y = y | Z = z, θ) can be expressed as

1
P(Z=z|ϴ)= 𝑒 −𝑈(𝑧,Φ)
𝑍

Hematological Image Analysis for


66 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Image label estimation

We can further simplify this optimization problem by taking the


negative and minimizing the resultant to obtain the following
relation:

This optimization is computationally an enormous task. In order


to reduce the computational burden, a memory based search
algorithm is proposed here by exploiting the notion of simulated
annealing for obtaining the MAP estimate of image labels 𝑧.Ƹ

Hematological Image Analysis for


67 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Memory Based Search Algorithm for Lymphocyte Image
Segmentation (MBSLS)
• In this algorithm the notion of annealing is employed with a view
to examine every point of the search space with finite probability
and hence achieve global optimal solution.
• The next move of the memory based search is achieved by using
the notion of neighborhood search.
• During implementation, an image is considered as a point in the
multidimensional search space.
• The next move is another image in the neighborhood that has
energy less than all the previous moves and is attained by
perturbing the point in the neighborhood structure.
• Following, this approach the revisiting of earlier points are
avoided and an array of image is created. In order to overcome
the local minima trapping, a criterion is introduced. The criterion
here is to accept moves of higher energy with a probability. This
guides the algorithm to overcome the local minima
Hematological Image Analysis for
68 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Memory Based Search Algorithm for Lymphocyte Image
Segmentation (MBSLS)

Hematological Image Analysis for


69 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Memory Based Search Algorithm for Lymphocyte Image
Segmentation (MBSLS)

Hematological Image Analysis for


70 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation and results

• The four proposed segmentation schemes (FLANNLS, KIRFCMLS,


KISCMLS, MBSLS) are implemented using Matlab 7.8 and
experimental simulation was performed using an Intel Core i5
3.20GHz PC, along with 2GB RAM running on Windows 7
professional operating system.
• A total of 270 lymphocyte sub images which include mature
lymphocytes and lymphoblasts constitute the entire image data
set and were used for the experimental evaluation of the
proposed schemes.

Hematological Image Analysis for


71 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

• Three experiments were conducted on the test images to


demonstrate the efficacy of all the four proposed schemes.
• In the first experiment each individual proposed scheme is
compared with four standard leukocyte segmentation schemes
such as Fuzzy Divergence (FD), Gaussian Mixture Model
(GMM), Modified Fuzzy C Means (MFCM) and Multiple Pixel
Classifier (MPC).
• The misclassification error percentage (ε) is evaluated in the
second experiment by comparing the segmentation results with
the available manual segmented test images using the
following relation:

Hematological Image Analysis for


72 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


73 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


74 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


75 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Segmentation results for lymphoblasts
using MBSLS algorithm

Hematological Image Analysis for


76 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Manual segmented images

Hematological Image Analysis for


77 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Manual segmented images

Hematological Image Analysis for


78 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Comparison of misclassification error percentage
for different methods.

Hematological Image Analysis for


79 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Variation of computational time in seconds

Hematological Image Analysis for


80 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Automated characterization of
lymphocytes for ALL diagnosis

• A novel image processing and machine


learning based approach is developed to
characterize a lymphocyte image for the
quantitative diagnosis of ALL in PBS images.

• The proposed method is used to differentiate


each lymphocyte image into a mature
lymphocyte or a lymphoblast.

Hematological Image Analysis for


81 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Proposed model for automated characterization
of lymphocytes for ALL diagnosis

82
Histology

• The total data set used for the development of the model,
comprises of peripheral blood smear (PBS) samples, and were
collected from 63 patients diagnosed with ALL and 55 control
subjects.
• 150 stained subimages of lymphocyte were obtained by the
image acquisition and sub imaging process.

Hematological Image Analysis for


83 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Morphological diagnosis of ALL
• The criteria during diagnosis or in the follow up of ALL are based
on the percentage of lymphoblast present in the peripheral blood
or bone marrow samples.
• Patients with presence of more than 20% of lymphoblast in
peripheral blood samples are diagnosed with ALL .
• Morphologically, lymphoblast is characterized by large nucleus,
having an irregular size, shape and the nucleoli are prominent.
• The cytoplasm is scarce and intensely colored in blast cell
images.
• Nucleus and cytoplasm of lymphoblast reflects morphological and
functional changes in comparison to lymphocytes, and plays a
main role in assessment of malignancy in blood samples.

Hematological Image Analysis for


84 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Visual criteria for diagnosis of ALL

Hematological Image Analysis for


85 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Quantitative measurements for diagnosis
of ALL

• The basis for the differentiation of lymphocyte from


lymphoblast can be grouped into two types of
characteristics i.e. nuclear changes (variation in shape and
size, chromatin pattern) and cytoplasmic changes (amount
of cytoplasm and protein accumulation).
• Quantitative features for nucleus and cytoplasm region of a
lymphocyte is suggested which is correlated directly with
the actual cytological features, and aides in the computer
processing of lymphocyte images.

Hematological Image Analysis for


86 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Computed shape features

87
Computed texture and color features

88
Shape measurements

• Form Factor (F6): Is a shape parameter derived from the basic


cellular measurements i.e. area and perimeter. It can be
mathematically defined as

• Roundness (F7): Is the degree to which the nucleus shape differs


from that of a circle and can be defined as

• Length–Diameter ratio (F8): Length to diameter (L/D) ratio is


the ratio of the major axis length and minor axis length of the
nucleus region.
• Compactness (F9): Is a numerical measure representing the
degree to which a shape is compact and is mathematically
represented as.

Hematological Image Analysis for


89 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Shape measurements

• Length–Diameter ratio (F8): Length to diameter (L/D) ratio is


the ratio of the major axis length and minor axis length of the
nucleus region.

• Compactness (F9): Is a numerical measure representing the


degree to which a shape is compact and is mathematically
represented as.

Hematological Image Analysis for


90 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Nucleus boundary roughness
measurements

• Nuclear boundary irregularity is an


important diagnostic feature of ALL.

• To measure such deformation accurately


in quantitative manner fractal geometry
and contour signature can be used.

Hematological Image Analysis for


91 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Fractal geometry

• Even though many fractal properties have been


defined, Hausdorff dimension (HD) is one of the most
important since it provides an accurate objective
measure of boundary irregularity.
• Several approaches for the estimation of HD or Df
(F10) is available in the literature, however the most
common among them used in biological sciences may
be summarized as follows.

• Modified pixel dilation


• Perimeter–area
• Ruler counting
• Box counting

Hematological Image Analysis for


92 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Box counting method

• The box counting method is more popular due to its easier


implementation and is based on self–similarity.
• In this method, we cover boxes of different pixel length over the
digitized version of the segmented nucleus image.
• The Hausdorff dimension Df of a bounded set A in Euclidean n-
space can be derived from the relation

log N
Df =
log N (s )

Hematological Image Analysis for


93 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Box counting method

Boxes of different pixel length superimposed over the


segmented nucleus image.

Hematological Image Analysis for


94 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Box counting method example

Hematological Image Analysis for


95 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Contour signature

• The dimensionality of the representation of the contour is


reduced from two to one by converting from a coordinate–based
representation to distances from each contour point to a
reference point.
• A suitable reference point is centroid or center of mass of the
contour, whose coordinates can be defined as

where (x, y) are the coordinates of the pixels along the contour
and M is the total no of pixels on the nucleus contour.

Hematological Image Analysis for


96 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Contour signature

• The signature can be defined by the variance, skewness and


kurtosis of all the distances between centroid and nucleus
contour points that will be significantly different in benign and
malignant samples.

Hematological Image Analysis for


97 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Texture feature

• Changes in the chromatin distribution, reflects the organization


of DNA in lymphocyte nucleus, and is an essential diagnostic
descriptor for classifying malignant lymphocytes (lymphoblast)
from benign ones.
• Leishman staining of blood samples enables the visualization of
chromatin distribution of lymphocyte nucleus in form of
texture.
• Genetic modifications are responsible for textural changes and
are visible during the transition from normal to malignant.
• Such textural transformation can be quantified using Haar
wavelet and Haralick feature based methods and is presented
below.

Hematological Image Analysis for


98 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Texture feature

Wavelet texture feature: A texture feature vector for each gray scale
version of lymphocyte nucleus image consists of wavelet coefficients
obtained by taking mean and variance of An, Hn and Vn components.

Haralick texture feature: Statistical measures i.e. contrast, correlation,


homogeneity, energy and entropy are computed from the co-occurrence
matrices using different offsets.

Fourier descriptors: Feature descriptors used here for texture


quantification is based on two–dimensional DFT (Discrete Fourier
Transform). Statistics i.e. mean, standard deviation, skewness and
kurtosis are computed over the nucleus image in the frequency domain
obtained using discrete fourier transform.

Hematological Image Analysis for


99 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Color feature

• Excessive pigmentation in lymphocyte nucleus results with


hyperchromatism and is an important characteristic appearing
in malignant lymphocytes.
• Chromatin abnormality results in increased staining capacity of
nuclei.
• Such modification in DNA content of nuclei is visible in form of
change in color intensity in ALL.
• This change in color during transition from normal to malignant
is measured as mean color intensity in RGB and HSV color
space.

Hematological Image Analysis for


100 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Data Normalization

• Prior to classification, it is necessary to normalize the dataset


with dissimilar range of values.
• Combination of variables with nonuniform magnitudes results
with masking of lower magnitude data by higher magnitude
data due to the sheer magnitude of the inputs which generates
larger weights associated with them.
• Normalization is an essential procedure to transform the input
features into a similar range so that true influence of variables
can be ascertained.
• Using the below relation each input variable or feature can be
normalized as

Hematological Image Analysis for


101 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Feature selection

• Selection of an appropriate set of features is important and


strongly affects classifier performance.
• Independent sample “t”–test is one such popular approach to
determine the statistical significance of the extracted features.

• Out of all the 44 extracted features, 32 features were found to


be statistically significant (p value < 0.05) using t–test and
participate in the classification process.

Hematological Image Analysis for


102 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Classification

• We propose the use of an ensemble of classifiers for labeling


each lymphocyte subimage as benign or malignant sample
based upon a set of measured features.
• Performance of the extracted features in classification is also
tested with five other standard classifiers i.e. Naive Bayesian,
K–Nearest Neighbor, Multilayer Perceptron, Radial Basis
Functional Neural Network and Support Vector Machines.

Hematological Image Analysis for


103 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Ensemble of classifiers

• Ensemble based systems or multiple classifier systems are more


preferable than their single classifier counterparts due to several
reasons and a few important among them are

Hematological Image Analysis for


104 Acute Lymphoblastic Leukemia
Diagnosis and Classification
An ensemble of classifiers for feature
classification

Hematological Image Analysis for


105 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Proposed ensemble classifier for lymphocyte
characterization

Hematological Image Analysis for


106 Acute Lymphoblastic Leukemia
Diagnosis and Classification
K-fold cross validation
• In view of the fact that the data set used in this study is small,
k–fold cross validation resampling technique is employed for
the training and testing of the classifiers for the extracted
lymphocyte features.
• Considering the value of k as 5 the whole data set is randomly
divided into five parts, in which each class is represented in
approximately the same proportions as in the original data set.
• Three parts of the data is used for classifier training (training
set) and the rest two parts are considered for evaluation
(testing set).
• The procedure is executed a total of five times with different
combination of training and testing data.
• Finally, the five performance estimates are averaged to yield an
overall estimate of the classifier performance in terms of
accuracy, specificity and sensitivity.
Hematological Image Analysis for
107 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Performance analysis

• Performance metrics i.e. accuracy, specificity and sensitivity are


calculated from a confusion matrix which represents the
differences in opinion between the hematologist and the
classifier.

Hematological Image Analysis for


108 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


109 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


110 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


111 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results

Hematological Image Analysis for


112 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results
Plot between feature index and p–value for showing feature significance.

Hematological Image Analysis for


113 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results
Average classification accuracy of ensemble methods along with
standard classifiers over 5–fold.

Hematological Image Analysis for


114 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results
Average sensitivity of ensemble methods along with standard
classifiers over 5–fold.

Hematological Image Analysis for


115 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results
Average specificity of ensemble methods along with standard
classifiers over 5–fold.

Hematological Image Analysis for


116 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Simulation results
Computational time consumed by different classifiers for lymphocyte
characterization.

Hematological Image Analysis for


117 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Conclusion
• The proposed segmentation schemes along with a number of
reported schemes are simulated for lymphocyte and myeloblast
images.
• Performance measure like segmentation error is used to
evaluate the segmentation accuracy.
• Segmentation results are also evaluated visually.
• Altogether, the proposed segmented schemes exhibit superior
performance to their counterparts.
• For all the proposed schemes NB, KNN, MLP, RBFN, SVM and
EOC classifiers for feature classification are simulated in different
situations.
• It was found from our experimental evaluation that, the
performance of EOC is better than other classifiers in all
situations.
Hematological Image Analysis for
118 Acute Lymphoblastic Leukemia
Diagnosis and Classification
Scope for further research

• The research findings made out of this work has opened several
auxiliary research directions, which can be further investigated.
• The segmentation scheme can be enhanced by including
techniques that can lead to segmentation of overlapping cells as
well.
• The proposed schemes, which mostly deal with computer aided
diagnosis and sub classification of ALL, can be extended to AML.
• In ensemble learning classifier system, there exists multiple
classifier processes which can be executed in parallel for better
response time.
• Promising research direction to pursue is to develop an
automated prognostic scoring system for ALL.

Hematological Image Analysis for


119 Acute Lymphoblastic Leukemia
Diagnosis and Classification

You might also like