AD8703 BCV Unit II 2023


Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
AD8703
BASICS OF COMPUTER
VISION
UNIT II
Department: AI&DS

Batch/Year : 2020 - 2024 /IV

Created by : Dr. V. Seethalakshmi

Date : 26.07.2023
Table of Contents

1. Contents
2. Course Objectives
3. Pre Requisites (Course Names with Code)
4. Syllabus (With Subject Code, Name, LTPC details)
5. Course Outcomes
6. CO-PO/PSO Mapping
7. Lecture Plan
8. Activity Based Learning
9. Lecture Notes (with Lecture Slides and Lecture Videos)
10. Assignments
11. Part A (Q & A)
12. Part B Qs
13. Supportive Online Certification Courses
14. Real time Applications in day to day life and to Industry
15. Contents Beyond the Syllabus
16. Assessment Schedule
17. Prescribed Text Books & Reference Books
18. Mini Project Suggestions

Course Objectives
AD8703 BASICS OF COMPUTER VISION

COURSE OBJECTIVES
To review image processing techniques for computer vision.
To understand various features and recognition techniques.
To learn about histogram and binary vision.
To apply three-dimensional image analysis techniques.
To study real world applications of computer vision algorithms.
Prerequisite
PREREQUISITE

NIL
Syllabus
AD8703 - BASICS OF COMPUTER VISION

SYLLABUS (L T P C: 3 0 0 3)

UNIT I INTRODUCTION

Image Processing, Computer Vision, What is Computer Vision - Low-level, Mid-level, High-level, Fundamentals of Image Formation, Transformation: Orthogonal, Euclidean, Affine, Projective, Fourier Transform, Convolution and Filtering, Image Enhancement, Restoration, Histogram Processing.

UNIT II FEATURE EXTRACTION AND FEATURE SEGMENTATION

Feature Extraction -Edges - Canny, LOG, DOG; Line detectors (Hough Transform),
Corners -Harris and Hessian Affine, Orientation Histogram, SIFT, SURF, HOG, GLOH,
Scale-Space Analysis- Image Pyramids and Gaussian derivative filters, Gabor Filters
and DWT. Image Segmentation -Region Growing, Edge Based approaches to
segmentation, Graph-Cut, Mean-Shift, MRFs, Texture Segmentation.

UNIT III IMAGES, HISTOGRAMS, BINARY VISION

Simple pinhole camera model – Sampling – Quantisation – Colour images – Noise – Smoothing – 1D and 3D histograms - Histogram/Image Equalisation - Histogram Comparison - Back-projection - k-means Clustering – Thresholding - Threshold Detection Methods - Variations on Thresholding - Mathematical Morphology – Connectivity.

UNIT IV 3D VISION AND MOTION

Methods for 3D vision – projection schemes – shape from shading – photometric stereo – shape from texture – shape from focus – active range finding – surface representations – point-based representation – volumetric representations – 3D object recognition – 3D reconstruction – introduction to motion – triangulation – bundle adjustment – translational alignment – parametric motion – spline-based motion – optical flow – layered motion.

UNIT V APPLICATIONS

Overview of Diverse Computer Vision Applications: Document Image Analysis, Biometrics, Object Recognition, Tracking, Medical Image Analysis, Content-Based Image Retrieval, Video Data Processing, Virtual Reality and Augmented Reality.
Course Outcomes
COURSE OUTCOMES

CO1: Recognise and describe how mathematical and scientific concepts are applied in computer vision.

CO2: Identify and interpret appropriate sources of information relating to computer vision.

CO3: Apply knowledge of computer vision to real life scenarios.

CO4: Reflect on the relevance of current and future computer vision applications.

CO5: Discuss principles of computer vision using appropriate language and terminology. Implement various I/O and file management techniques.
CO – PO/ PSO Mapping
CO-PO MAPPING

COs   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1    3   2   2   2   2   -   -   -   2   -    -    2    2    -    -
CO2    3   3   2   2   2   -   -   -   2   -    -    2    2    -    -
CO3    2   2   1   1   1   -   -   -   1   -    -    1    2    -    -
CO4    3   3   1   1   1   -   -   -   1   -    -    1    2    -    -
CO5    3   3   1   1   1   -   -   -   1   -    -    1    3    1    -

1 – Low, 2 – Medium, 3 – Strong


Lecture Plan
LECTURE PLAN

(S. No – Topic – No. of periods – Proposed date – Pertaining CO – Taxonomy level – Mode of delivery; actual lecture dates to be filled in.)

1. FEATURE EXTRACTION AND FEATURE SEGMENTATION – 1 period – 24.08.2023 – CO2 – K1 – Lecture
2. Edges: Canny, LOG, DOG – 1 period – 24.08.2023 – CO2 – K2 – Lecture
3. Line detectors (Hough Transform) – 1 period – 30.08.2023 – CO2 – K2 – Lecture
4. Corners: Harris and Hessian Affine – 1 period – 30.08.2023 – CO2 – K2 – Lecture
5. Orientation Histogram, SIFT, SURF, HOG, GLOH – 1 period – 31.08.2023 – CO2 – K2 – Lecture
6. Scale-Space Analysis & Mean-Shift, MRFs, Texture Segmentation – 1 period – 04.09.2023 – CO2 – K2 – Lecture
7. Image Pyramids and Gaussian derivative filters, Gabor Filters and DWT – 1 period – 04.09.2023 – CO2 – K2 – Lecture
8. Image Segmentation: Region Growing, Edge Based approaches to segmentation – 1 period – 07.09.2023 – CO2 – K2 – Lecture
9. Revision – Quiz Activity – 1 period – 07.09.2023 – CO2 – K2 – ICT Tools
Activity Based Learning
ACTIVITY BASED LEARNING

UiPath.CV.Activities.CVGetTextWithDescriptor
https://scholarworks.calstate.edu/downloads/hh63sx58j
Lecture Notes
2. FEATURE EXTRACTION

2.1 EDGE DETECTION

Feature extraction using edge detection techniques such as Canny, Laplacian of Gaussian (LOG), and Difference of Gaussians (DOG) can be useful in various computer vision and image processing applications. Let's discuss each of these methods:

2.1.1 Canny Edge Detection:

The Canny edge detection algorithm is widely used due to its effectiveness in detecting edges with low error rates. It consists of the following steps:

a. Gaussian Smoothing: The input image is convolved with a Gaussian filter to reduce noise.

b. Gradient Computation: The gradient magnitude and direction are calculated using derivative operators (usually Sobel operators) to determine the intensity change across the image.

c. Non-maximum Suppression: Only the local maxima in the gradient direction are retained as potential edge pixels. This step helps to thin out the edges.

d. Double Thresholding: Two threshold values are used to identify strong and weak edges. Weak edges that are connected to strong edges are also considered as edges.

e. Edge Tracking by Hysteresis: Weak edges are either discarded or connected to strong edges based on connectivity. This step helps to eliminate weak noise edges.

The Canny edge detection algorithm provides accurate and well-connected edges.
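
As an illustration, these steps map almost directly onto OpenCV. The sketch below is a minimal example; the file name, blur parameters and hysteresis thresholds are assumptions chosen for demonstration, and cv2.Canny performs steps (b)-(e) internally.

```python
import cv2

# Hypothetical input file; any image that can be loaded as grayscale works.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Step (a): explicit Gaussian smoothing to reduce noise.
blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)

# Steps (b)-(e): gradient computation, non-maximum suppression, double
# thresholding and hysteresis edge tracking are handled inside cv2.Canny.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("canny_edges.jpg", edges)
```

Lowering threshold1 keeps more weak edges for the hysteresis stage, while raising threshold2 keeps only the strongest gradients as seeds.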
2.1.2 Laplacian of Gaussian (LOG):

The Laplacian of Gaussian is an edge detection method that combines Gaussian smoothing and the Laplacian operator. The steps involved in LOG edge detection are as follows:

a. Gaussian Smoothing: The input image is convolved with a Gaussian filter to reduce noise.

b. Laplacian Operator: The Laplacian operator is applied to the smoothed image to detect regions of rapid intensity changes.

c. Zero Crossing: Zero-crossings in the Laplacian response are identified as potential edges.

LOG edge detection can be sensitive to noise, and it often produces thicker edges compared to the Canny edge detector.
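
A rough LoG sketch using OpenCV and NumPy is shown below; the file name, kernel sizes and the simple neighbour-based zero-crossing test are illustrative assumptions rather than a production implementation.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# a. Gaussian smoothing to suppress noise before differentiation.
smoothed = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)

# b. Laplacian of the smoothed image (the LoG response).
log = cv2.Laplacian(smoothed, ddepth=cv2.CV_64F, ksize=3)

# c. Crude zero-crossing test: mark pixels where the LoG response changes
# sign between horizontal or vertical neighbours.
sign = np.sign(log)
zero_cross = ((sign[:, :-1] * sign[:, 1:] < 0)[:-1, :] |
              (sign[:-1, :] * sign[1:, :] < 0)[:, :-1])
cv2.imwrite("log_edges.jpg", (zero_cross * 255).astype(np.uint8))
```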
2.1.3 Difference of Gaussians (DOG):

The Difference of Gaussians method is another edge detection technique that involves subtracting two differently scaled Gaussian-blurred versions of the input image. The DOG steps are as follows:

a. Gaussian Smoothing: The input image is convolved with two Gaussian filters of different standard deviations.

b. Subtract Images: The smoothed images are subtracted to obtain the Difference of Gaussians.

c. Zero Crossing: Zero-crossings in the Difference of Gaussians response are identified as potential edges.

DOG edge detection is similar to LOG but can be computationally faster. It is commonly used for real-time applications.
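
A small sketch of the DoG computation is given below, assuming OpenCV; the sigmas and the final magnitude threshold (used here instead of a true zero-crossing search) are illustrative choices.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # hypothetical file

# a. Smooth with two Gaussians of different standard deviations;
# ksize (0, 0) lets OpenCV derive the kernel size from sigma.
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=2.0)

# b. Subtract the two smoothed images to obtain the band-pass DoG response.
dog = g1 - g2

# c. Simplification: threshold the response magnitude rather than
# locating zero-crossings explicitly.
edges = (np.abs(dog) > 2.0).astype(np.uint8) * 255
cv2.imwrite("dog_edges.jpg", edges)
```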

These edge detection techniques can be used as a pre-processing step for various computer vision tasks such as object detection, image segmentation, and feature matching. The choice of the method depends on the specific requirements of the application, the characteristics of the input images, and the trade-off between accuracy and computational efficiency.
2.2 LINE DETECTION
LINE DETECTION is a fundamental technique in computer vision and image
processing used to identify straight lines in an image. The Hough Transform is a
popular method for line detection that was developed by Richard Duda and Peter
Hart in 1972. It is widely used due to its simplicity and robustness.
The Hough Transform works by representing lines in an image as points in a
parameter space called the Hough space. Each point in the Hough space
corresponds to a line in the image. The Hough Transform algorithm follows these
steps:
2.2.1. Edge Detection:
The first step is typically to perform edge detection on the input image using
techniques such as the Canny edge detector or the Sobel operator. This step
highlights the edges in the image, which are important for line detection.
2.2.2 HOUGH SPACE INITIALIZATION:

Create an accumulator array, or the Hough space, which is a two-dimensional parameter space. The dimensions of this space correspond to the parameters of a line equation. The most common representation is the Hough space with polar coordinates, where one dimension represents the distance from the origin (rho) and the other represents the angle with respect to a reference axis (theta).

2.2.3. VOTING:

For each edge pixel in the edge image, calculate the corresponding lines
in the Hough space by varying the parameters rho and theta. Increment
the accumulator array cells that correspond to these lines. This process is
known as voting.

2.2.4. THRESHOLDING:

After the voting process, the Hough space contains peaks that represent lines in
the input image. Apply a threshold to identify the most significant peaks, which
correspond to the most likely lines.

2.2.5. LINE EXTRACTION:

Convert the significant peaks back to the parameter space of the input image, and
extract the corresponding lines. This step involves finding the intersection points of
the peaks in the Hough space and converting them into lines in the original image.

2.2.6. POST-PROCESSING:

Finally, you can perform additional post-processing steps such as line merging, line
filtering, or line fitting to improve the accuracy and quality of the detected lines.
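
For illustration, OpenCV's cv2.HoughLines wraps steps 2.2.2-2.2.4 (accumulator construction, voting and peak thresholding); the sketch below assumes a hypothetical input image and example parameter values.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Step 1: edge detection supplies the pixels that cast votes.
edges = cv2.Canny(img, 50, 150)

# Steps 2-4: rho/theta set the accumulator resolution; the last argument
# is the minimum number of votes for a peak to count as a line.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)

# Step 5: convert each (rho, theta) peak back into a line in image space.
out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for rho, theta in lines[:, 0]:
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        p1 = (int(x0 - 1000 * b), int(y0 + 1000 * a))
        p2 = (int(x0 + 1000 * b), int(y0 - 1000 * a))
        cv2.line(out, p1, p2, (0, 0, 255), 2)
cv2.imwrite("hough_lines.jpg", out)
```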
The Hough Transform is widely used for applications such as lane
detection in autonomous driving, shape detection, and feature
extraction in image analysis. It is a powerful technique for line
detection, although it may not be the most efficient method for real-
time applications due to its computational complexity.

2.3 CORNERS

Corners are important features in computer vision and image processing. They represent points in an image where the intensity or color changes significantly in multiple directions. Corner detection algorithms aim to identify these points, as they are often useful for various tasks such as image stitching, object recognition, and tracking. Two popular corner detection algorithms are the Harris Corner Detector and the Hessian Affine Detector.

2.3.1 HARRIS CORNER DETECTOR:

The Harris Corner Detector, introduced by Chris Harris and Mike Stephens in 1988, is a widely used algorithm for corner detection. It operates by analyzing the local intensity variations in different directions within a small neighborhood of each pixel in the image. The steps involved in the Harris Corner Detector are as follows:

a. Preprocessing: Convert the image to grayscale if necessary.

b. Compute gradients: Calculate the x- and y-derivatives of the image using techniques like the Sobel operator.

c. Structure tensor: Construct the structure tensor for each pixel in the image. It measures the local intensity variations in different directions.

d. Corner response function: Calculate a corner response function based on the eigenvalues of the structure tensor. The eigenvalues indicate the strength and directionality of the intensity variations.

e. Non-maximum suppression: Suppress non-maximum corner responses to eliminate multiple corner candidates in close proximity.

f. Thresholding: Apply a threshold to the corner responses to select the most significant corners.

g. Optional: Perform additional steps such as corner refinement or sub-pixel accuracy estimation.

The Harris Corner Detector is known for its simplicity and effectiveness in
detecting corners. However, it may struggle with scale and rotation
changes in images.
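
OpenCV exposes the whole pipeline through cv2.cornerHarris, which returns the corner response map of step (d); the sketch below uses assumed parameter values and a hypothetical input file.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# blockSize: neighbourhood used for the structure tensor, ksize: Sobel
# aperture, k: empirical constant in R = det(M) - k * trace(M)^2.
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# Steps (e)-(f) are reduced to a single relative threshold here for brevity.
corners = response > 0.01 * response.max()

out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
out[corners] = (0, 0, 255)  # mark corner pixels in red
cv2.imwrite("harris_corners.jpg", out)
```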
2.4 HESSIAN AFFINE DETECTOR

The Hessian Affine Detector, proposed by Krystian Mikolajczyk and Cordelia Schmid in 2004, is a corner detection algorithm that aims to handle scale and affine transformations. It is an extension of the Harris Corner Detector that incorporates information about the local structure of image features. The main steps involved in the Hessian Affine Detector are as follows:

a. Scale-space extrema detection: Build a scale-space representation of the image using techniques like Gaussian blurring at multiple scales. Detect the local extrema in the scale-space to identify potential feature locations.

b. Hessian matrix computation: Compute the Hessian matrix for each detected extremum, which represents the local image structure. The Hessian matrix captures the second-order derivatives of the image intensity.

c. Affine adaptation: Adapt the detected extrema to affine transformations by analyzing the eigenvalues and eigenvectors of the Hessian matrix.

d. Affine region selection: Select the affine regions based on criteria such as stability, contrast, and edge responses.

e. Orientation assignment: Assign an orientation to each affine region to make them invariant to rotation.

f. Optional: Perform additional steps like feature descriptor computation or matching for further analysis.
The Hessian Affine Detector is well-suited for detecting corners under various
transformations, making it useful for applications like object recognition and
image matching.
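
OpenCV does not ship a ready-made Hessian Affine detector, so the sketch below only illustrates step (b), the Hessian matrix and its determinant at a single assumed scale; the affine adaptation and region selection steps are omitted.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # hypothetical file
sigma = 2.0  # assumed detection scale

# Smooth at the chosen scale, then take second-order derivatives.
smoothed = cv2.GaussianBlur(img, (0, 0), sigma)
Ixx = cv2.Sobel(smoothed, cv2.CV_64F, 2, 0, ksize=3)
Iyy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 2, ksize=3)
Ixy = cv2.Sobel(smoothed, cv2.CV_64F, 1, 1, ksize=3)

# Determinant of the Hessian: blob- and corner-like structures give large
# responses; Hessian-based detectors keep its local maxima over scales.
det_hessian = Ixx * Iyy - Ixy ** 2
candidates = det_hessian > 0.5 * det_hessian.max()  # crude threshold
```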

Both the Harris Corner Detector and the Hessian Affine Detector are widely used
corner detection algorithms, each with its strengths and limitations. The choice of
algorithm depends on the specific requirements of the application at hand.
2.5 ORIENTATION HISTOGRAM:
An orientation histogram is a representation of the distribution of gradient
orientations in an image or a local image patch. It is commonly used in feature
extraction algorithms to capture the dominant orientations of image features. The
orientation histogram is constructed by dividing the range of gradient orientations
into multiple bins and accumulating the gradient magnitudes into these bins based
on their respective orientations. This histogram provides information about the
spatial distribution and strength of different orientations within an image or a
feature region.
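
A minimal NumPy/OpenCV sketch of building such a histogram for an image patch is shown below; the patch file, the 36-bin layout and the Sobel-based gradients are assumptions for illustration.

```python
import cv2
import numpy as np

patch = cv2.imread("patch.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # hypothetical patch

# Gradient magnitudes and orientations of the patch.
gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1, ksize=3)
magnitude = np.sqrt(gx ** 2 + gy ** 2)
orientation = np.rad2deg(np.arctan2(gy, gx)) % 360  # 0..360 degrees

# Accumulate gradient magnitudes into 36 bins of 10 degrees each.
hist, _ = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
dominant = 10 * int(np.argmax(hist))  # lower edge of the strongest bin
```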
2.5.1 SIFT (Scale-Invariant Feature Transform):

SIFT is a popular feature extraction algorithm introduced by David Lowe in 1999. It is widely used for tasks such as object recognition, image stitching, and image retrieval. SIFT features are distinctive and invariant to changes in scale, rotation, and affine transformations. The main steps of the SIFT algorithm include:
a. Scale-space extrema detection: Identify keypoint candidates at multiple scales by detecting local extrema in the difference-of-Gaussian (DoG) pyramid, which is constructed by convolving the image with Gaussian filters at different scales.

b. Keypoint localization: Refine the keypoint locations by eliminating unstable keypoints using a threshold on the DoG response and rejecting keypoints on edges.

c. Orientation assignment: Assign an orientation to each keypoint based on the gradient orientations in its local neighborhood. This step contributes to the rotational invariance of SIFT features.

d. Feature descriptor computation: Construct a descriptor for each keypoint by considering the gradient magnitudes and orientations in a local image patch around the keypoint. The descriptor captures information about the local image structure and is robust to changes in scale and rotation.
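
OpenCV (version 4.4 and later in the main package) provides an implementation of this pipeline; the sketch below assumes a hypothetical input image and shows detection, description and visualisation in a few lines.

```python
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

sift = cv2.SIFT_create()

# detectAndCompute runs all four stages: DoG extrema detection, keypoint
# refinement, orientation assignment and 128-dimensional descriptors.
keypoints, descriptors = sift.detectAndCompute(img, None)

out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("sift_keypoints.jpg", out)
```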
2.6 SURF (SPEEDED-UP ROBUST FEATURES):

SURF is a feature extraction algorithm introduced by Herbert Bay et al. in 2006. It is designed to be efficient and robust to various image transformations. SURF features are computed based on the sum of Haar-like wavelet responses, allowing for fast computation. The main steps of the SURF algorithm include:

1. Scale-space extrema detection: Like SIFT, SURF detects keypoints at multiple scales, but instead of a DoG pyramid it approximates the determinant of the Hessian using box filters evaluated efficiently on an integral image.

2. Keypoint localization: Refine the keypoints by considering the Hessian determinant at each candidate location. Keypoints with low Hessian responses or high curvature are eliminated.

3. Orientation assignment: Assign an orientation to each keypoint by computing the Haar wavelet responses in the keypoint's neighborhood. This step provides rotational invariance.

4. Feature descriptor computation: Construct a descriptor for each keypoint by considering the Haar wavelet responses in a local region around the keypoint. The descriptor captures information about the gradient magnitudes and orientations in different directions.
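
SURF is only available in OpenCV builds that include the contrib modules with the non-free algorithms enabled; assuming such a build, a minimal usage sketch looks like this (the Hessian threshold is an example value).

```python
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# hessianThreshold filters weak keypoints; descriptors are 64-dimensional
# by default (128 if extended=True).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)
```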
2.7 HOG (Histogram of Oriented Gradients):

HOG is a feature descriptor introduced by Navneet Dalal and Bill Triggs in 2005. It is commonly used for object detection and pedestrian detection tasks. HOG features capture the distribution of gradient orientations in an image or a local image patch. The main steps of the HOG algorithm include:

2.7.1 Gradient computation: Compute the gradient magnitudes and orientations of the image using techniques like the Sobel operator.

2.7.2 Block division: Divide the image or the local region into regular blocks.

2.7.3 Orientation histogram computation: Construct an orientation histogram for each block by accumulating the gradient magnitudes into multiple bins based on their orientations.

2.7.4 Normalization: Normalize the histograms within each block or a larger spatial region.
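
OpenCV's cv2.HOGDescriptor implements these steps with the classic 64x128-pixel detection window; the sketch below computes a descriptor for one window and then runs the pre-trained pedestrian detector that combines HOG with a linear SVM (the input file is hypothetical).

```python
import cv2

img = cv2.imread("person.jpg")  # hypothetical file

# Default layout: 64x128 window, 8x8 cells, 16x16 blocks, 9 orientation bins.
hog = cv2.HOGDescriptor()

# Descriptor for a single resized window (3780 values for the default layout).
window = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (64, 128))
features = hog.compute(window)

# HOG + pre-trained linear SVM: the classic Dalal-Triggs pedestrian detector.
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
```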
2.8 IMAGE PYRAMIDS AND GAUSSIAN DERIVATIVE FILTERS

Image pyramids and Gaussian derivative filters are two common techniques used in computer vision and image processing. Let's discuss each of them in more detail:

2.8.1 Image Pyramids:

An image pyramid is a multi-scale representation of an image that consists of a series of images at different scales. The idea behind image pyramids is to create a set of images where each subsequent image is a downsampled version of the previous one. This creates a hierarchical structure where the top-level image is the original image, and the subsequent levels represent images at reduced scales.

Image pyramids are useful for various tasks, such as image blending,
image resizing, feature extraction, and object detection. By working with
images at different scales, it becomes possible to handle objects of
different sizes and capture both fine details and global context.

There are two types of image pyramids: Gaussian pyramids and Laplacian pyramids. Gaussian pyramids are created by applying a Gaussian smoothing filter to the original image and then downsampling it. Each level of the pyramid represents a blurred and downsampled version of the previous level. Laplacian pyramids, on the other hand, are formed by taking the difference between each level of the Gaussian pyramid and its expanded version.
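
A short OpenCV sketch of both pyramid types is given below; the file name and the number of levels are assumptions, and the saturating uint8 subtraction is a simplification (a float representation would preserve negative Laplacian values).

```python
import cv2

img = cv2.imread("input.jpg")  # hypothetical file

# Gaussian pyramid: repeated blur + downsample by a factor of two.
gaussian = [img]
for _ in range(3):
    gaussian.append(cv2.pyrDown(gaussian[-1]))

# Laplacian pyramid: difference between each level and the expanded
# (upsampled) version of the next, coarser level.
laplacian = []
for i in range(len(gaussian) - 1):
    size = (gaussian[i].shape[1], gaussian[i].shape[0])
    expanded = cv2.pyrUp(gaussian[i + 1], dstsize=size)
    laplacian.append(cv2.subtract(gaussian[i], expanded))
```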
2.8.2 GAUSSIAN DERIVATIVE FILTERS

Gaussian derivative filters, also known as derivative-of-Gaussian or Gaussian gradient filters, are used for detecting edges and computing image gradients. They are derived from the Gaussian function, which is a widely used smoothing filter in image processing. (Note that the derivative of Gaussian is not the same operation as the Difference of Gaussians described earlier, even though both start from Gaussian smoothing.)

The Gaussian derivative filters are obtained by taking the derivative of the Gaussian
function with respect to the image coordinates. The filters are commonly used to
compute the first-order derivatives (gradients) of an image in the horizontal and
vertical directions. By convolving an image with these filters, you can estimate the
image gradients, which provide information about the intensity changes and edges in
the image.

The Gaussian derivative filters are often used in various computer vision tasks, such as
edge detection, feature extraction, and image enhancement. The magnitude and
orientation of the gradients obtained from these filters can be used to detect edges,
corners, and other image features.
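
As a concrete sketch (assuming SciPy is available), gaussian_filter with order=1 along one axis applies exactly such a derivative-of-Gaussian filter; the sigma value and file name below are illustrative.

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)  # hypothetical file

# First-order Gaussian derivative along each axis (order=1 differentiates).
gx = gaussian_filter(img, sigma=2.0, order=[0, 1])  # derivative along x (columns)
gy = gaussian_filter(img, sigma=2.0, order=[1, 0])  # derivative along y (rows)

# Gradient magnitude and orientation derived from the filter responses.
magnitude = np.hypot(gx, gy)
orientation = np.arctan2(gy, gx)
```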

In summary, image pyramids and Gaussian derivative filters are important tools in
computer vision and image processing. Image pyramids provide a multi-scale
representation of an image, while Gaussian derivative filters are used for edge
detection and gradient computation. Both techniques have applications in various
image analysis and computer vision tasks.
2.8.3 Gabor filters and the discrete wavelet transform (DWT)
Gabor filters and the discrete wavelet transform (DWT) are two
important techniques used in signal and image processing. Let's discuss
each of them in more detail:

2.8.3.1. Gabor Filters:

Gabor filters are linear filters used for analyzing and processing signals,
particularly in the field of image analysis. They are named after Dennis
Gabor, who introduced them in the 1940s. Gabor filters are widely used in
various computer vision tasks, including texture analysis, feature
extraction, and object recognition.

A Gabor filter is a combination of a Gaussian envelope and a sinusoidal wave, which makes it both localized in space and frequency. The filter is defined by its center frequency, bandwidth, orientation, and spatial aspect ratio. By varying these parameters, Gabor filters can be designed to respond selectively to specific patterns or features in an image.

Gabor filters are particularly effective in capturing texture information in images. They are often used in tasks like texture classification, texture segmentation, and image denoising. Gabor filter responses can be computed by convolving the filter with the input image, resulting in response maps that highlight the presence of specific spatial-frequency patterns.
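
OpenCV can generate Gabor kernels directly; the sketch below builds a small filter bank at four orientations and stacks the responses into a per-pixel texture feature vector (all parameter values and the input file are assumptions).

```python
import cv2
import numpy as np

img = cv2.imread("texture.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

responses = []
for theta in np.arange(0, np.pi, np.pi / 4):  # four orientations
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))

# Per-pixel feature vector: one Gabor response per orientation.
features = np.stack(responses, axis=-1)
```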
2.8.3.2 Discrete Wavelet Transform (DWT):

The discrete wavelet transform (DWT) is a mathematical technique used for analyzing signals and images in both time and frequency domains. It decomposes a signal or image into a set of wavelet coefficients, which represent different scales and orientations in the data.

Unlike the Fourier transform, which provides a global frequency analysis, the DWT offers a multi-resolution analysis by decomposing the signal or image into different frequency bands. It captures both low-frequency and high-frequency details of the data, making it suitable for tasks like image compression, denoising, and feature extraction.

The DWT operates by passing the signal or image through a series of low-
pass and high-pass filters, followed by down sampling. The low-pass filters
capture the coarse-scale information, while the high-pass filters capture
the fine-scale details. This process is recursively applied to the resulting
approximation coefficients, creating a hierarchical structure of wavelet
coefficients at different scales.

The DWT has several advantages, including its ability to represent both
localized features and global properties of the data. It allows for efficient
compression by achieving high data compression ratios while preserving
important features. The DWT has found applications in various fields, such
as image and video compression, signal denoising, feature extraction, and
data analysis.
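
Assuming the PyWavelets package (pywt) is installed, a single-level and a multi-level 2-D DWT of an image can be sketched as follows; the Haar wavelet and the number of levels are example choices.

```python
import cv2
import pywt  # PyWavelets

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Single-level 2-D DWT: one approximation band (cA) and three detail bands
# (horizontal, vertical, diagonal).
cA, (cH, cV, cD) = pywt.dwt2(img, "haar")

# Multi-level decomposition and reconstruction (inverse transform).
coeffs = pywt.wavedec2(img, "haar", level=3)
reconstructed = pywt.waverec2(coeffs, "haar")
```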
In summary, Gabor filters and the discrete wavelet transform (DWT) are
powerful techniques used in signal and image processing. Gabor filters are
effective for texture analysis and feature extraction, while the DWT
provides a multi-resolution analysis of signals and images, enabling tasks
like compression, denoising, and feature extraction.

2.9 IMAGE SEGMENTATION

Image segmentation is the process of partitioning an image into meaningful regions or segments. It is a fundamental task in computer vision and image processing, with applications in various fields such as object recognition, image understanding, and medical imaging. Two common approaches to image segmentation are region growing and edge-based segmentation. Let's discuss each of them:

2.9.1 Region Growing:

Region growing is a bottom-up approach to image segmentation that starts with a seed point or a set of seed points and iteratively grows regions by adding neighboring pixels that satisfy certain similarity criteria. The basic idea is to group together pixels that have similar characteristics, such as intensity, color, texture, or other image features.

The region growing algorithm typically proceeds as follows:

- Initialize the seed points and create an empty region for each seed point.
- Select a seed point and add it to its corresponding region.
- Examine the neighboring pixels of the seed point and check if they meet the similarity criteria to be added to the region.
- If a neighboring pixel satisfies the criteria, add it to the region and recursively check its neighbors.
- Repeat the process until no more pixels can be added to any region.
Region growing can produce accurate segmentations when the regions
have distinct and homogeneous characteristics. However, it can be
sensitive to the initial seed points and may result in over-segmentation or
under-segmentation if the similarity criteria are not properly defined.
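
A minimal region-growing sketch in Python/NumPy is shown below. It grows a single region from one seed using a fixed intensity tolerance against the seed value; the tolerance, the 4-connectivity and the breadth-first strategy are illustrative design choices.

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10):
    """Grow a region from seed (row, col); a 4-connected neighbour is added
    when its intensity differs from the seed value by at most tol."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = int(gray[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(int(gray[nr, nc]) - seed_val) <= tol:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

# Example call with a hypothetical seed location and tolerance:
# region = region_grow(gray_image, (120, 200), tol=12)
```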

2.9.2 Edge-Based Approaches:

Edge-based segmentation techniques focus on detecting and capturing the boundaries or edges between different regions in an image. The underlying assumption is that the intensity or color transitions across regions tend to be abrupt, leading to pronounced edges.

Edge-based segmentation algorithms typically involve the following steps:

- Detect edges in the image using edge detection techniques such as the
Canny edge detector or the Sobel operator.

- Enhance and refine the detected edges using techniques like edge
thinning, edge linking, or edge preserving smoothing.

- Apply a region growing or region merging algorithm to group the pixels along the detected edges into coherent regions.
- Post-process the resulting regions to remove noise or small spurious regions and refine the boundaries.

Edge-based segmentation methods can be effective when the regions of interest have clear and well-defined boundaries. However, they may struggle when regions have similar or gradual intensity transitions, or when there are complex textures or cluttered backgrounds.
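
One simple edge-based sketch combines Canny edges, morphological closing to link gaps, and contour extraction to turn closed boundaries into filled segments; the OpenCV calls below use assumed thresholds, kernel size and input file.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Detect edges and close small gaps so boundaries enclose regions.
edges = cv2.Canny(img, 50, 150)
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

# Treat each closed boundary as a segment by filling its contour.
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
segments = np.zeros_like(img)
cv2.drawContours(segments, contours, -1, color=255, thickness=cv2.FILLED)
```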
In practice, a combination of region growing and edge-based approaches,
along with other techniques like clustering, graph cuts, or machine
learning, is often used to achieve more accurate and robust image
segmentation results. The choice of the segmentation method depends on
the specific characteristics of the images and the segmentation
requirements of the application at hand.

Graph-Cut, Mean-Shift, MRFs (Markov Random Fields), and Texture Segmentation are all popular techniques used in image processing and computer vision for image segmentation tasks. Let's briefly discuss each of these techniques:
2.10 GRAPH-CUT:

Graph-Cut is an algorithm that uses graph theory to partition an image into multiple segments based on certain criteria. It represents the image as a graph, where each pixel is a node, and the edges between nodes represent the similarity between pixels. Graph-Cut aims to find the optimal cut in the graph that minimizes the energy function, which is defined based on the similarity of neighboring pixels and the dissimilarity between different segments. It efficiently combines local and global information to achieve accurate segmentation results.
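
OpenCV's GrabCut is one widely available graph-cut-based segmenter (not the only formulation of the idea described above); the sketch below assumes a hypothetical image and a rough bounding rectangle around the object of interest.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")          # hypothetical file
rect = (50, 50, 300, 400)              # hypothetical box around the object

mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# GrabCut repeatedly solves a graph cut over the pixel graph to separate
# foreground from background inside the rectangle (5 iterations here).
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
foreground = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                      255, 0).astype(np.uint8)
```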

2.11 MEAN-SHIFT:

Mean-Shift is an iterative clustering algorithm that is commonly used for image segmentation and object tracking. It operates by shifting each pixel in the image towards the mode (peak) of the pixel's feature space. The feature space can include attributes such as color, texture, or other relevant image properties. Pixels that converge to the same mode are considered part of the same segment. Mean-Shift is particularly effective in handling non-parametric and irregularly shaped distributions.
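
A quick way to see the effect is OpenCV's mean-shift filtering, which runs the mode-seeking step in the joint spatial-colour space; the spatial and colour radii below are example values.

```python
import cv2

img = cv2.imread("input.jpg")  # hypothetical file

# sp: spatial window radius, sr: colour window radius. Pixels that converge
# to the same mode receive the same colour, which groups them into segments.
segmented = cv2.pyrMeanShiftFiltering(img, sp=21, sr=51)
cv2.imwrite("meanshift_segments.jpg", segmented)
```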
2.12 MRFs (Markov Random Fields):

Markov Random Fields are probabilistic models used for image segmentation
and other computer vision tasks. MRFs model the relationship between pixels
as a graph, where each pixel is a node, and edges represent the dependencies
between neighboring pixels. MRFs define an energy function that quantifies
the compatibility between neighboring pixels and assigns higher energy to less
likely configurations. The segmentation problem is then formulated as finding
the configuration of labels (segments) that minimizes the energy function.
MRFs provide a framework for incorporating both local and global information
into the segmentation process.

2.13 Texture Segmentation:

Texture Segmentation aims to partition an image based on its underlying texture properties. Texture refers to the visual patterns and structures in an image. Texture segmentation techniques focus on analyzing the statistical properties of local image regions to distinguish between different textures. Common approaches include using texture descriptors, such as co-occurrence matrices or Gabor filters, to capture texture characteristics. These descriptors are then used in clustering or classification algorithms to group similar texture regions together.

These techniques are widely used in various applications, including medical imaging, object recognition, and scene understanding, to extract meaningful information from images by dividing them into coherent and semantically meaningful regions or objects.
Lecture Slides

Lecture Videos

Assignments
1. Gaussian filtering, gradient magnitude, Canny edge detection.
2. Detecting interest points; simple matching of features.
3. Stereo correspondence analysis.
4. Photometric stereo.

Each assignment submission contains:
1. A paper discussing the theory, task, methods, and results.
2. An src folder with the code and a README file giving instructions on how to run the code.
PART A (Q & A)

1. What is the key property of the Canny edge detector?


The Canny edge detector is known for detecting edges while
simultaneously providing good localization and minimal response to
noise.

2. What is the primary use of the Harris corner detector?


The Harris corner detector is used to identify and locate corners or
interest points in an image, which is important for various computer
vision tasks like object recognition and tracking.

3. Define the Scale-Invariant Feature Transform (SIFT).


SIFT is a feature extraction technique that identifies scale-invariant
keypoints in an image, providing a robust representation for matching
and recognizing objects in different scales and orientations.

4. What is the purpose of image pyramids in scale-space analysis?


Image pyramids are used to create a multi-scale representation of an
image, which allows for detecting features at different scales and helps
in handling objects of varying sizes in the image.

5. Explain the concept of Graph-Cut in image segmentation.


Graph-Cut is a segmentation technique that models an image as a
graph and finds the optimal cuts in the graph to separate regions based
on certain criteria, often leading to accurate and coherent
segmentations.


6. Name one edge-based approach to image segmentation.

One edge-based approach to image segmentation is the Canny edge detector.
7. What is the primary purpose of the Canny edge detector?
The primary purpose of the Canny edge detector is to accurately detect
edges in an image while minimizing noise.

8. What property of corners does the Harris corner detector exploit?


The Harris corner detector exploits the property that corners have
significant intensity variations in all directions.

9. Briefly explain the concept of the Scale-Invariant Feature Transform (SIFT).
SIFT is a feature extraction technique that identifies and describes
distinctive local features in an image, which are invariant to scale and
rotation.

10. What does HOG stand for and how is it used in computer vision?
HOG stands for Histogram of Oriented Gradients. It is used in computer
vision for object detection by describing the distribution of gradient
orientations in an image.


11. What is the primary purpose of image pyramids in scale-space analysis?
Image pyramids are used to create a multi-scale representation of an image, which helps in detecting features of different sizes in the image.

12. How does the Mean-Shift algorithm contribute to image segmentation?
The Mean-Shift algorithm is used for clustering data points, which in the context of image segmentation helps in grouping pixels with similar characteristics into regions.

13. Name one edge-based approach to image segmentation.


The Canny edge detector is an example of an edge-based approach to
image segmentation.

14. Define the term "Graph-Cut" in the context of image segmentation.


Answer: Graph-Cut is a segmentation technique that involves modeling
an image as a graph and finding optimal cuts in the graph to separate
regions.

15. In what type of image segmentation scenario is Region Growing suitable?
Region Growing is suitable for segmenting images where regions have similar characteristics and can be grown from a seed point based on similarity criteria.

16. Explain the role of Gabor filters in scale-space analysis.


Gabor filters are used in scale-space analysis to capture both frequency
and orientation information from an image, making them suitable for
tasks like texture analysis.
17. Mention one application scenario where Region Growing is a
suitable image segmentation method.
Region Growing is suitable for segmenting images where regions have
similar characteristics and can be grown from a seed point based on
similarity criteria. One application scenario is medical image analysis,
such as segmenting tumors in medical scans.
18. Describe the primary difference between Gaussian derivative filters
and Gabor filters in image analysis.
Gaussian derivative filters capture the gradient information at different
scales, primarily for edge detection, while Gabor filters capture both
frequency and orientation information, making them suitable for tasks
such as texture analysis and feature detection.
19. Explain the concept of the Discrete Wavelet Transform (DWT) in the
context of scale-space analysis.
DWT is a technique in scale-space analysis that decomposes an image
into different scales (frequencies) using wavelet functions. It allows
the analysis of signal content at different resolutions and is useful for
tasks such as image compression and denoising.

PART-B
1. List the types of noise. Consider that an image is corrupted by Gaussian noise. Suggest a suitable method to minimize Gaussian noise in the image and explain that method. (K3) (CO2)

2. List the types of edge detection and explain the Sobel operator to detect edges. (K3) (CO2)

3. What is edge detection? Explain the Canny edge detection algorithm and write a MATLAB code to implement this algorithm. (K3) (CO2)

4. What is a descriptor? Explain the SIFT descriptor in detail. (K3) (CO2)

5. Explain shape context descriptors and HOG descriptors. (K2) (CO2)

6. What is corner detection? Explain the Moravec corner detection algorithm and write a MATLAB code to implement this algorithm. (K3) (CO2)

7. What are morphological operations and their significance? Explain any three of them with examples. (K3) (CO2)

8. Discuss the Harris corner detection method in detail. (K2) (CO2)


9. Discuss the active contour technique for segmentation. (K2) (CO2)

10. What is segmentation? Define normalized cut and explain graph-based segmentation in detail. (K3) (CO2)

11. Explain region splitting and region merging in image segmentation. (K2) (CO2)

12. Discuss region splitting and region merging image segmentation methods in brief. (K2) (CO2)

13. Explain mean shift and model finding. (K2) (CO2)
Supportive Online Certification Courses

SUPPORTIVE ONLINE COURSES

1. Computer Vision – Udemy – https://www.udemy.com/topic/computer-vision/
2. Introduction to Computer Vision – Udacity – https://www.udacity.com/course/introduction-to-computer-vision--ud810
3. Advanced Computer Vision with TensorFlow – Coursera – https://www.coursera.org/learn/advanced-computer-vision-with-tensorflow
4. Computer Vision and Image Processing Fundamentals – edX – https://www.edx.org/learn/computer-programming/ibm-computer-vision-and-image-processing-fundamentals
REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY

1. Explain the role of computer vision applications in the most prominent industries, including agriculture, healthcare, transportation, manufacturing, and retail. (K4, CO2)
Contents Beyond the Syllabus

Basics of Computer Vision

Reference Video – Content Beyond Syllabus

https://www.youtube.com/watch?v=2w8XIskzdFw
Assessment Schedule
ASSESSMENT SCHEDULE

FIAT – Proposed date: 04.09.2023
Prescribed Text books &
Reference books
PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS

TEXT BOOKS
1. D. A. Forsyth, J. Ponce, “Computer Vision: A Modern Approach”, Pearson Education, 2003.
2. Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer-Verlag London Limited, 2011.

REFERENCE BOOKS
1. B. K. P. Horn, “Robot Vision”, McGraw-Hill.
2. Simon J. D. Prince, “Computer Vision: Models, Learning, and Inference”, Cambridge University Press, 2012.
3. Mark Nixon and Alberto S. Aguado, “Feature Extraction & Image Processing for Computer Vision”, Third Edition, Academic Press, 2012.
4. E. R. Davies, “Computer & Machine Vision”, Fourth Edition, Academic Press, 2012.
5. Reinhard Klette, “Concise Computer Vision: An Introduction into Theory and Algorithms”, 2014.
Mini Project
Suggestions
MINI PROJECT SUGGESTIONS

1. Real-Time Edge Detection using OpenCV


Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of RMK Group of
Educational Institutions. If you have received this document through email in error, please notify the
system manager. This document contains proprietary information and is intended only to the
respective group / learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you
have received this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.
