
Computer Vision System Toolbox - MATLAB

Built-in Detector Classes:

1. vision.BlobAnalysis
2. vision.CascadeObjectDetector
3. vision.ForegroundDetector
4. vision.PeopleDetector

1. vision.BlobAnalysis
Definition
A blob is a collection of connected pixels. The BlobAnalysis object computes statistics for connected regions in a binary
image.

We use the step() function to compute descriptions of blobs in a binary image. The step method computes and returns
statistics of the input binary image, depending on the property values specified.

Syntax of Construction of Blob Detector Object:

H = vision.BlobAnalysis

It returns a blob analysis System object, H, used to compute statistics for connected regions in a binary image.

Syntax of Using Step Function:

[AREA, CENTROID, BBOX] = step(H, BW)

where H is the blob detector object and BW is the binary image.

• The step function returns the area, centroid, and bounding box of the blobs when the AreaOutputPort,
CentroidOutputPort, and BoundingBoxOutputPort properties are set to true.
• These are the only properties that are set to true by default.
• If you set any additional properties to true, the corresponding outputs come after the AREA, CENTROID, and BBOX
outputs, e.g. [AREA, CENTROID, BBOX, 4th output, 5th output, and so on].

Properties
We can set some properties of the blob detector object to true to get additional information along with AREA, CENTROID, and
BBOX. For example, if we set the PerimeterOutputPort property to true, the blob perimeters are returned as an additional
output: [AREA, CENTROID, BBOX, PERIMETER] = step(H, BW). (Note that Connectivity is not an output port; it specifies
whether pixels are 4- or 8-connected.) A short sketch after the table below shows this in use.

Some important properties are:

Property                     Description
AreaOutputPort               Return blob areas. The default is true.
CentroidOutputPort           Return the coordinates of blob centroids. The default is true.
BoundingBoxOutputPort        Return the coordinates of bounding boxes. The default is true.
MajorAxisLengthOutputPort    Return a vector whose values represent the lengths of the ellipses' major axes. The default is false.
MinorAxisLengthOutputPort    Return a vector whose values represent the lengths of the ellipses' minor axes. The default is false.
OrientationOutputPort        Return a vector whose values represent the angles between the ellipses' major axes and the x-axis. The default is false.
PerimeterOutputPort          Return a vector whose values represent estimates of the perimeter lengths, in pixels, of each blob. The default is false.
Connectivity                 Which pixels are connected to each other, specified as 4 or 8. The default is 8.
MinimumBlobArea              Minimum blob area in pixels. The default is 0. This property is tunable.
MaximumBlobArea              Maximum blob area in pixels. The default is intmax('uint32'). This property is tunable.
ExcludeBorderBlobs           Exclude blobs that contain at least one image border pixel. The default is false.
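
As a quick illustration of how the output ports extend the step() call, here is a minimal sketch; the property values and
the input image (the text.png sample that ships with MATLAB) are assumed for the example:

hblob = vision.BlobAnalysis('PerimeterOutputPort', true, ...
                            'MinimumBlobArea', 50);   % ignore blobs under 50 px
bw = logical(imread('text.png'));                     % any binary image works
[area, centroid, bbox, perimeter] = step(hblob, bw);  % extra output comes last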

Example
Find the centroid of a blob.

hblob = vision.BlobAnalysis;
hblob.AreaOutputPort = false;
hblob.BoundingBoxOutputPort = false;
img = logical([0 0 0 0 0 0; ...
               0 1 1 1 1 0; ...
               0 1 1 1 1 0; ...
               0 1 1 1 1 0; ...
               0 0 0 0 0 0]);
centroid = step(hblob, img); % [x y] coordinates of the centroid
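
For this image, the blob of ones spans columns 2 through 5 and rows 2 through 4, so the returned centroid is [3.5 3].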

2. vision.CascadeObjectDetector
Description
The cascade object detector uses the Viola-Jones algorithm to detect people’s faces, noses, eyes, mouth, or upper body.

To detect facial features or upper body in an image:

1. Create the vision.CascadeObjectDetector object and set its properties, e.g. detector = vision.CascadeObjectDetector;
2. Call the object with arguments, as if it were a function, e.g. bboxes = detector(I);

We can train the cascade detector to detect any type of object. By default, MATLAB provides models trained to detect facial features.

Syntaxes
detector = vision.CascadeObjectDetector

-It creates a detector to detect faces using the Viola-Jones algorithm.

detector = vision.CascadeObjectDetector(model)

-It creates a detector configured to detect objects defined by the input model, e.g. ‘Nose’, ‘Mouth’, etc.

detector = vision.CascadeObjectDetector(XMLFILE)

-It creates a detector and configures it to use the custom classification model specified with the XMLFILE input.

detector = vision.CascadeObjectDetector(Name,Value)

-It creates a cascade object detector object, detector, with additional options specified by one or more
Name,Value pair arguments, e.g. vision.CascadeObjectDetector('ClassificationModel','UpperBody')
Classification Models
There are some models pre-trained by MATLAB to detect facial features. By default, the detector is configured to detect
faces. Pass the model name when creating the detector object, e.g. detector = vision.CascadeObjectDetector('EyePairBig').

Classification Model           Model Description
'FrontalFaceCART' (default)    Detects faces that are upright and forward facing.
'FrontalFaceLBP'               Detects faces that are upright and forward facing.
'UpperBody'                    Detects the upper-body region, which is defined as the head and shoulders area.
'EyePairBig', 'EyePairSmall'   Detect a pair of eyes ('EyePairBig' is trained on larger images than 'EyePairSmall').
'LeftEye', 'RightEye'          Detect the left and right eye separately.
'ProfileFace'                  Detects upright face profiles.
'Mouth'                        Detects the mouth.
'Nose'                         Detects the nose.

Properties
There are some properties we can set to change the behavior of the detection algorithm:

MinSize (default: the size of the image used to train the classification model)
    Size of the smallest detectable object, specified as a two-element vector [height width], in pixels.

MaxSize (default: the size of the input image)
    Size of the largest detectable object, specified as a two-element vector [height width], in pixels. Use this property
    to reduce computation time when you know the maximum object size prior to processing the image.

ScaleFactor (default: 1.1)
    Specified as a value greater than 1.0001. The scale factor incrementally scales the detection resolution between
    MinSize and MaxSize. You can use this formula to find a suitable ScaleFactor value: size(img)/(size(img)-0.5).

MergeThreshold (default: 4)
    Defines the criteria needed to declare a final detection in an area where there are multiple detections around an
    object, e.g. the same face detected twice. Increasing this threshold may help suppress false detections.

UseROI (default: false)
    Use a region of interest, specified as true or false. Set this property to true to detect objects within a
    rectangular region of interest within the input image.
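
A minimal sketch combining these properties; the image and the region of interest are assumed for illustration. With
UseROI set to true, the detector takes the ROI as a second input:

detector = vision.CascadeObjectDetector('MergeThreshold', 8, ...
                                        'UseROI', true);  % stricter merging
I = imread('visionteam.jpg');               % sample image shipped with MATLAB
roi = [1 1 round(size(I,2)/2) size(I,1)];   % [x y width height]: left half
bboxes = detector(I, roi);                  % search only inside the ROI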

Output of Detector
The CascadeObjectDetector gives only one output: an M-by-4 matrix. Each row of the output matrix contains a
four-element vector, [x y width height], that specifies, in pixels, the upper-left corner and size of a bounding
box.
Usage/Example
faceDetector = vision.CascadeObjectDetector;
I = imread('visionteam.jpg');
bboxes = faceDetector(I);
IFaces = insertObjectAnnotation(I, 'rectangle', bboxes, 'Face');
figure;
imshow(IFaces);
title('Detected faces');

3. Camera Calibration
Geometric camera calibration, also referred to as camera resectioning, estimates the parameters of the lens and image
sensor of an image or video camera.

You can use these parameters to:

• Correct lens distortion
• Measure the size of an object in world units
• Determine the location of the camera in the scene

These tasks are used in applications such as:

• Machine vision, to detect and measure objects
• Robotics
• Navigation systems
• 3-D scene reconstruction

Camera parameters include 3 entities:

1. Intrinsic parameters
2. Extrinsic parameters
3. Distortion coefficients

How to Calibrate
We calibrate a camera using multiple images of a calibration pattern, such as a checkerboard. The detected pattern points
in each image, together with their known positions on the physical pattern, provide the correspondences; using these
correspondences, you can solve for the camera parameters. After you calibrate a camera, you can evaluate the accuracy of
the estimated parameters in several ways:

• Plot the relative locations of the camera and the calibration pattern.
• Calculate the reprojection errors.
• Calculate the parameter estimation errors.

Camera Model
The Computer Vision System Toolbox™ calibration algorithm uses the camera model proposed by Jean-Yves Bouguet.

The model includes two things:

• The pinhole camera model (no lens)
• Lens distortion (a lens is added)
The Pinhole Camera Model
A pinhole camera is a simple camera without a lens and with a single small aperture. Light rays pass through the
aperture and project an inverted image on the opposite side of the camera.

The pinhole camera parameters are represented in a 4-by-3 matrix called the camera matrix. This matrix maps the 3-D
world scene into the image plane. The calibration algorithm calculates the camera matrix using the extrinsic and intrinsic
parameters. The extrinsic parameters represent the location of the camera in the 3-D scene. The intrinsic parameters
represent the optical center and focal length of the camera.

• The world points are transformed to camera coordinates using the extrinsic parameters.
• The camera coordinates are mapped into the image plane using the intrinsic parameters.
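
Combining the two steps in the toolbox's row-vector convention, where image and world points are row vectors and the
4-by-3 camera matrix is the product of the extrinsics and the intrinsic matrix K (defined in the next section):

$$w\,[x \;\; y \;\; 1] = [X \;\; Y \;\; Z \;\; 1] \begin{bmatrix} R \\ \mathbf{t} \end{bmatrix} K$$

Here w is an arbitrary scale factor, R is the 3-by-3 rotation, and t is the 1-by-3 translation.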

Camera Calibration Parameters Explained


The calibration algorithm calculates the camera matrix using the extrinsic and intrinsic parameters.

• The extrinsic parameters represent a rigid transformation from 3-D world coordinate system to the 3-D camera’s
coordinate system.
• The intrinsic parameters represent a projective transformation from the 3-D camera’s coordinates into the 2-D
image coordinates.
Extrinsic Parameters:
The extrinsic parameters consist of a rotation, R, and a translation, t. The origin of the camera's coordinate system is at
its optical center, and its x- and y-axes define the image plane.

Intrinsic Parameters:
The intrinsic parameters include the focal length, the optical center, also known as the principal point, and the skew
coefficient. The camera intrinsic matrix, K, is defined as:
$$K = \begin{bmatrix} f_x & 0 & 0 \\ s & f_y & 0 \\ c_x & c_y & 1 \end{bmatrix}$$

• $[c_x \; c_y]$ — Optical center (the principal point), in pixels.
• $(f_x, f_y)$ — Focal length in pixels, where $f_x = F / P_x$ and $f_y = F / P_y$.
• $F$ — Focal length in world units, typically expressed in millimeters.
• $(P_x, P_y)$ — Size of the pixel in world units.
• $s$ — Skew coefficient, which is non-zero if the image axes are not perpendicular: $s = f_x \tan \alpha$, where $\alpha$ is the skew angle between the image axes.
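
As an illustration with assumed values: a lens with focal length $F = 4$ mm and square pixels of size $P_x = P_y = 0.002$ mm gives $f_x = f_y = 4 / 0.002 = 2000$ pixels.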
Distortion in Camera Calibration
The camera matrix does not account for lens distortion because an ideal pinhole camera does not have a lens. To
accurately represent a real camera, the camera model includes:

• Radial lens distortion
• Tangential lens distortion

Radial Distortion
Radial distortion occurs when light rays bend more near the edges of a lens than they do at its optical center. The
smaller the lens, the greater the distortion.

The radial distortion coefficients model this type of distortion.

The distorted points are denoted as $(x_{\mathrm{distorted}},\, y_{\mathrm{distorted}})$:

$$x_{\mathrm{distorted}} = x \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y_{\mathrm{distorted}} = y \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

• x, y — Undistorted pixel locations. x and y are in normalized image coordinates. Normalized image coordinates
are calculated from pixel coordinates by translating to the optical center and dividing by the focal length in
pixels. Thus, x and y are dimensionless.
• $k_1$, $k_2$, $k_3$ — Radial distortion coefficients of the lens.
• $r^2 = x^2 + y^2$

Typically, two coefficients are sufficient for calibration. For severe distortion, such as in wide-angle lenses, you can
select three coefficients to include $k_3$. A short sketch of the model follows.
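
A minimal sketch of the radial model in MATLAB, with assumed coefficient values and an assumed normalized point:

k = [0.12 -0.04 0];        % [k1 k2 k3], assumed values
x = 0.30;  y = -0.20;      % undistorted point, normalized image coordinates
r2 = x^2 + y^2;            % r^2; r^4 and r^6 are its powers
radial = 1 + k(1)*r2 + k(2)*r2^2 + k(3)*r2^3;
x_distorted = x * radial;
y_distorted = y * radial;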

Tangential Distortion
Tangential distortion occurs when the lens and the image plane are not parallel. The tangential distortion coefficients
model this type of distortion.
The distorted points are denoted as $(x_{\mathrm{distorted}},\, y_{\mathrm{distorted}})$:

$$x_{\mathrm{distorted}} = x + [\,2 p_1 x y + p_2 (r^2 + 2 x^2)\,]$$
$$y_{\mathrm{distorted}} = y + [\,p_1 (r^2 + 2 y^2) + 2 p_2 x y\,]$$

• x, y — Undistorted pixel locations. x and y are in normalized image coordinates. Normalized image coordinates
are calculated from pixel coordinates by translating to the optical center and dividing by the focal length in
pixels. Thus, x and y are dimensionless.
• $p_1$, $p_2$ — Tangential distortion coefficients of the lens.
• $r^2 = x^2 + y^2$
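
Continuing the radial sketch above, the tangential terms are added to the same normalized point ($p_1$ and $p_2$ are
assumed values), matching the combined radial-plus-tangential model:

p = [0.001 -0.0005];       % [p1 p2], assumed values
x_distorted = x_distorted + (2*p(1)*x*y + p(2)*(r2 + 2*x^2));
y_distorted = y_distorted + (p(1)*(r2 + 2*y^2) + 2*p(2)*x*y);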

4. Single Camera Calibrator App in MATLAB


Single Camera Calibration Steps

Follow this workflow to calibrate your camera using the app:

1. Prepare images, camera, and calibration pattern.
2. Add images and select a standard or fisheye camera model.
3. Calibrate the camera.
4. Evaluate calibration accuracy.
5. Adjust parameters to improve accuracy (if necessary).
6. Export the parameters object.

For best results, use between 10 and 20 images of the calibration pattern. The calibrator requires at least three
images. Use uncompressed images or a lossless compression format such as PNG.
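
The same workflow can also be scripted. A minimal sketch, assuming a folder calib_images of checkerboard photos and a
25 mm checkerboard square size:

files = dir(fullfile('calib_images', '*.png'));          % assumed folder
names = fullfile({files.folder}, {files.name});
[imagePoints, boardSize] = detectCheckerboardPoints(names);
worldPoints = generateCheckerboardPoints(boardSize, 25); % square size, mm
params = estimateCameraParameters(imagePoints, worldPoints, ...
                                  'WorldUnits', 'mm');
showReprojectionErrors(params);                          % evaluate accuracy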

5. Viola-Jones Algorithm for Face Detection


The Viola-Jones algorithm is a widely used mechanism for object detection. The main property of this algorithm is that
training is slow, but detection is fast. The algorithm uses Haar basis feature filters, which can be evaluated with
additions and subtractions alone, so it does not require multiplications.

The algorithm has four stages:

1. Haar Feature Selection
2. Creating an Integral Image
3. AdaBoost Training
4. Cascading Classifiers

The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image, in
which each location holds the sum of all pixels above and to the left of it. The integral image allows the sums needed
by the Haar extractors to be calculated by combining only four numbers: the integral of a rectangular area ABCD is
calculated as II(yA, xA) − II(yB, xB) − II(yC, xC) + II(yD, xD), where A, B, C, and D are the rectangle's corners.

[Figure: image area integration using the integral image]
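
A minimal sketch of the idea in MATLAB; the image and the rectangle are assumed, and a zero row and column are
prepended so that every rectangle sum needs exactly four lookups:

I  = magic(8);                               % stand-in for an image
II = zeros(size(I) + 1);
II(2:end, 2:end) = cumsum(cumsum(I, 1), 2);  % padded integral image
r1 = 3; r2 = 6; c1 = 2; c2 = 5;              % rectangle rows r1:r2, cols c1:c2
s  = II(r2+1, c2+1) - II(r1, c2+1) - II(r2+1, c1) + II(r1, c1);
isequal(s, sum(sum(I(r1:r2, c1:c2))))        % returns true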

Detection happens inside a detection window. A minimum and maximum window size is chosen, and for each size a
sliding step size is chosen. Then the detection window is moved across the image as follows:

1. Set the minimum window size, and the sliding step corresponding to that size.
2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set
of N face recognition filters is applied. If one filter gives a positive answer, a face is detected in the current
window.
3. If the size of the window is the maximum size, stop the procedure. Otherwise, increase the size of the window
and the corresponding sliding step to the next chosen size and go to step 2.

A compact sketch of this scan follows the list.
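
In the sketch below, the window sizes, the step rule, and the detectFilters stub (standing in for the set of N cascaded
filters, which the toolbox implements internally) are all assumed for illustration:

img = rgb2gray(imread('visionteam.jpg'));   % sample image
[rows, cols] = size(img);
detectFilters = @(w) false;                 % hypothetical stand-in for the N filters
for wsize = [24 48 96]                      % assumed window sizes
    stepSz = max(1, round(0.1 * wsize));    % assumed: step grows with window size
    for r = 1:stepSz:(rows - wsize + 1)
        for c = 1:stepSz:(cols - wsize + 1)
            window = img(r:r+wsize-1, c:c+wsize-1);
            if detectFilters(window)
                % a face was detected in the current window
            end
        end
    end
end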

Each face recognition filter (from the set of N filters) contains a set of cascade-connected classifiers. Each classifier looks
at a rectangular subset of the detection window and determines if it looks like a face. If it does, the next classifier is
applied. If all classifiers give a positive answer, the filter gives a positive answer and the face is recognized. Otherwise
the next filter in the set of N filters is run.

Each classifier is composed of Haar feature extractors (weak classifiers). Each Haar feature is the weighted sum of 2-D
integrals of small rectangular areas attached to each other. The weights may take values ±1. Figure below shows
examples of Haar features relative to the enclosing detection window. Gray areas have a positive weight and white
areas have a negative weight. Haar feature extractors are scaled with respect to the detection window size.

[Figure: example rectangle features shown relative to the enclosing detection window]


The classifier decision is defined as:

$$F_{m,i} = \begin{cases} \alpha_{m,i}, & f_{m,i} > t_{m,i} \\ \beta_{m,i}, & \text{otherwise} \end{cases}$$

The $m^{\text{th}}$ classifier gives a positive answer if $\sum_i F_{m,i} > \theta_m$. Here $f_{m,i}$ is the weighted sum
of the 2-D integrals, $t_{m,i}$ is the decision threshold for the $i^{\text{th}}$ feature extractor, $\alpha_{m,i}$ and
$\beta_{m,i}$ are constant values associated with the $i^{\text{th}}$ feature extractor, and $\theta_m$ is the decision
threshold for the $m^{\text{th}}$ classifier.

[Figure: object detection with a Viola-Jones filter]

The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of
the cascade, minimizing the total required computation. The most popular algorithm for features training is AdaBoost.

Characteristics of Viola-Jones Algorithm


The characteristics of Viola–Jones algorithm which make it a good detection algorithm are:

• Robust – very high detection rate (true-positive rate) and very low false-positive rate.
• Real time – for practical applications, at least 2 frames per second must be processed.
• Face detection only (not recognition) – the goal is to distinguish faces from non-faces (detection is the first step
in the recognition process).

----------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPILED BY: SYED TEHSEEN UL HASAN SHAH
