Computer Vision System Toolbox - MATLAB
1. vision.BlobAnalysis
2. vision.CascadeObjectDetector
3. Camera Calibration
4. Viola-Jones Face Detection Algorithm
1. vision.BlobAnalysis
Definition
A blob is a collection of connected pixels. The BlobAnalysis object computes statistics for connected regions in a binary
image.
We use the step() function to compute descriptions of blobs in a binary image. The step method computes and returns
statistics of the input binary image depending on the property values specified.
H = vision.BlobAnalysis
It returns a blob analysis System object, H, used to compute statistics for connected regions in a binary image.
• Step function returns the area, centroid and the bounding box of the blobs when the AreaOutputPort,
CentroidOutputPort and BoundingBoxOutputPort properties are set to true.
• These are the only properties that are set to true by default.
• If you set any additional properties to true, the outputs come after the AREA, CENTROID, and BBOX outputs.
e.g. [AREA, CENTROID, BBOX, 4th output, 5th output, and so on]
Properties
We can set some properties of the blob analysis object to true to get additional information along with AREA, CENTROID, and
BBOX. For example, if we set the PerimeterOutputPort property to true, we also get each blob's perimeter:
[AREA, CENTROID, BBOX, PERIMETER] = step(H, BW)
Property                     Description
AreaOutputPort               Return blob area. The default is true.
CentroidOutputPort           Return coordinates of blob centroids. The default is true.
BoundingBoxOutputPort        Return coordinates of bounding boxes. The default is true.
MajorAxisLengthOutputPort    Return a vector whose values represent the lengths of the ellipses' major axes. The default is false.
MinorAxisLengthOutputPort    Return a vector whose values represent the lengths of the ellipses' minor axes. The default is false.
OrientationOutputPort        Return a vector whose values represent the angles between the ellipses' major axes and the x-axis. The default is false.
PerimeterOutputPort          Return a vector whose values represent estimates of the perimeter lengths, in pixels, of each blob. The default is false.
Connectivity                 Which pixels are connected to each other, specified as 4 or 8. The default is 8.
MinimumBlobArea              Minimum blob area in pixels. The default is 0. This property is tunable.
MaximumBlobArea              Maximum blob area in pixels. The default is intmax('uint32'). This property is tunable.
ExcludeBorderBlobs           Exclude blobs that contain at least one image border pixel. The default is false.
Example
Find the centroid of a blob.
hblob = vision.BlobAnalysis;
hblob.AreaOutputPort = false;
hblob.BoundingBoxOutputPort = false;
img = logical([0 0 0 0 0 0; ...
               0 1 1 1 1 0; ...
               0 1 1 1 1 0; ...
               0 1 1 1 1 0; ...
               0 0 0 0 0 0]);
centroid = step(hblob, img); % [x y] coordinates of the centroid
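If additional output ports are enabled, the extra outputs follow AREA, CENTROID, and BBOX in the order given in the table above. A minimal sketch along those lines (reusing img from the example; this code is not from the original notes):

hblob2 = vision.BlobAnalysis;
hblob2.PerimeterOutputPort = true; % request blob perimeters as a 4th output
hblob2.MinimumBlobArea = 4; % tunable: ignore blobs smaller than 4 pixels
[area, centroid, bbox, perimeter] = step(hblob2, img); % PERIMETER comes after BBOX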
2. vision.CascadeObjectDetector
Description
The cascade object detector uses the Viola-Jones algorithm to detect people’s faces, noses, eyes, mouth, or upper body.
1. Create the vision.CascadeObjectDetector object and set its properties: detector = vision.CascadeObjectDetector;
2. Call the object with arguments, as if it were a function: bbox = step(detector, I);
We can train the cascade detector to detect any type of object. By default, MATLAB provides models trained to detect facial features.
Syntaxes
detector = vision.CascadeObjectDetector
detector = vision.CascadeObjectDetector(model)
-It creates a detector configured to detect objects defined by the input model, e.g. 'Nose', 'Mouth', etc.
detector = vision.CascadeObjectDetector(XMLFILE)
-It creates a detector and configures it to use the custom classification model specified with the XMLFILE input.
detector = vision.CascadeObjectDetector(Name,Value)
-It creates a cascade object detector object, detector, with additional options specified by one or more
Name,Value pair arguments, e.g. vision.CascadeObjectDetector('ClassificationModel','UpperBody')
Classification Models
There are some pre-trained models provided by MATLAB to detect facial features. By default, the detector is configured to detect
faces. Pass the model name when creating the detector object, e.g. detector = vision.CascadeObjectDetector('EyePairBig')
Properties
There are some properties we can set to change the behavior of the detection algorithm, such as ScaleFactor (how the search window grows between scales), MergeThreshold (how many overlapping detections are required before a target is declared), and MinSize/MaxSize (bounds on the size of the objects to search for).
Output of Detector
The CascadeObjectDetector gives only one output: an M-by-4 matrix. Each row of the output matrix contains a
four-element vector, [x y width height], that specifies, in pixels, the upper-left corner and size of a bounding
box.
Usage/Example
faceDetector = vision.CascadeObjectDetector;
I = imread('visionteam.jpg');
bboxes = faceDetector(I);
IFaces = insertObjectAnnotation(I,'rectangle',bboxes,'Face');
figure;
imshow(IFaces);
title('Detected faces');
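The detector can likewise be configured with one of the other pre-trained models and tuned through its properties. A hedged sketch (the 'UpperBody' model ships with MATLAB; visionteam.jpg is the same demo image used above):

bodyDetector = vision.CascadeObjectDetector('UpperBody');
bodyDetector.MergeThreshold = 10; % require more overlapping detections before declaring a target
bodyDetector.MinSize = [80 60]; % [height width] lower bound on the search window
I = imread('visionteam.jpg');
bboxBody = step(bodyDetector, I); % M-by-4 rows of [x y width height]
IBody = insertObjectAnnotation(I, 'rectangle', bboxBody, 'Upper body');
figure; imshow(IBody); title('Detected upper bodies');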
3. Camera Calibration
Geometric camera calibration, also referred to as camera resectioning, estimates the parameters of the lens and image
sensor of an image or video camera. Calibration estimates three groups of parameters:
1. Intrinsic parameters
2. Extrinsic parameters
3. Distortion coefficients
How to Calibrate
We calibrate a camera using multiple images of a calibration pattern, such as a checkerboard. Using the correspondences
between the pattern's known world points and their detected image points, you can solve for the camera parameters. After you
calibrate a camera, to evaluate the accuracy of the estimated parameters, you can (as sketched after this list):
• Plot the relative locations of the camera and the calibration pattern.
• Calculate the reprojection errors.
• Calculate the parameter estimation errors.
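A minimal calibration sketch using the toolbox's checkerboard functions (the image file names and the 25 mm square size are assumptions, not from the notes):

imageFileNames = {'calib1.png', 'calib2.png', 'calib3.png'}; % assumed file names
[imagePoints, boardSize] = detectCheckerboardPoints(imageFileNames);
squareSize = 25; % checkerboard square size in millimeters (assumed)
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
cameraParams = estimateCameraParameters(imagePoints, worldPoints); % intrinsics, extrinsics, distortion
showExtrinsics(cameraParams); % plot camera/pattern locations
showReprojectionErrors(cameraParams); % plot reprojection errors
cameraParams.MeanReprojectionError % overall accuracy figure, in pixels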
Camera Model
The Computer Vision System Toolbox™ calibration algorithm uses the camera model proposed by Jean-Yves Bouguet.
The pinhole camera parameters are represented in a 4-by-3 matrix called the camera matrix. This matrix maps the 3-D
world scene into the image plane. The calibration algorithm calculates the camera matrix using the extrinsic and intrinsic
parameters. The extrinsic parameters represent the location of the camera in the 3-D scene. The intrinsic parameters
represent the optical center and focal length of the camera.
• The world points are transformed to camera coordinates using the extrinsic parameters.
• The camera coordinates are mapped into the image plane using the intrinsic parameters.
• The extrinsic parameters represent a rigid transformation from 3-D world coordinate system to the 3-D camera’s
coordinate system.
• The intrinsic parameters represent a projective transformation from the 3-D camera’s coordinates into the 2-D
image coordinates.
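Putting the two steps together, the pinhole model in the toolbox's row-vector convention can be written as:
$$w \begin{bmatrix} x & y & 1 \end{bmatrix} = \begin{bmatrix} X & Y & Z & 1 \end{bmatrix} P, \qquad P = \begin{bmatrix} R \\ t \end{bmatrix} K$$
where $(X, Y, Z)$ is a world point, $(x, y)$ its projection in pixels, $w$ an arbitrary scale factor, and $P$ the 4-by-3 camera matrix described above.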
Extrinsic Parameters:
The extrinsic parameters consist of a rotation, R, and a translation, t. The origin of the camera's coordinate system is at
its optical center, and its x- and y-axes define the image plane.
Intrinsic Parameters:
The intrinsic parameters include the focal length, the optical center, also known as the principal point, and the skew
coefficient. The camera intrinsic matrix, K, is defined as:
$$K = \begin{bmatrix} f_x & 0 & 0 \\ s & f_y & 0 \\ c_x & c_y & 1 \end{bmatrix}$$
Radial Distortion
Radial distortion occurs when light rays bend more near the edges of a lens than they do at its optical center. The
smaller the lens, the greater the distortion.
Typically, two coefficients are enough for calibration. For severe distortion, such as in wide-angle lenses, you can select 3
coefficients to include k3.
Tangential Distortion
Tangential distortion occurs when the lens and the image plane are not parallel. The tangential distortion coefficients
model this type of distortion.
The distorted points are denoted as $(x_{distorted}, y_{distorted})$:
$$x_{distorted} = x + [2 p_1 x y + p_2 (r^2 + 2 x^2)]$$
$$y_{distorted} = y + [p_1 (r^2 + 2 y^2) + 2 p_2 x y]$$
where $x$, $y$ are the undistorted pixel locations, $p_1$ and $p_2$ are the tangential distortion coefficients of the lens, and $r^2 = x^2 + y^2$.
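Once the distortion coefficients have been estimated, lens distortion can be removed from an image. A small sketch (cameraParams as estimated earlier; the image file name is assumed):

I = imread('distortedView.png'); % assumed file name
J = undistortImage(I, cameraParams); % corrects radial and tangential distortion
imshowpair(I, J, 'montage'); % compare original and corrected images side by side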
For better results, use between 10 and 20 images of the calibration pattern. The calibrator requires at least three
images. Use uncompressed images or lossless compression formats such as PNG.
4. Viola-Jones Face Detection Algorithm
The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image.
The integral image allows the sums needed by the Haar feature extractors to be calculated by adding only four numbers. For example,
the integral of area ABCD (Fig. 1) is calculated as II(y_A, x_A) - II(y_B, x_B) - II(y_C, x_C) + II(y_D, x_D).
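A minimal sketch of that four-lookup sum using the toolbox's integralImage function (the rectangle bounds are arbitrary):

I = imread('cameraman.tif'); % any grayscale test image
II = integralImage(I); % zero-padded (M+1)-by-(N+1) integral image
r1 = 50; r2 = 80; c1 = 60; c2 = 100; % rectangle spanning rows r1:r2, columns c1:c2
s = II(r2+1, c2+1) - II(r1, c2+1) - II(r2+1, c1) + II(r1, c1); % sum from four reads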
Detection happens inside a detection window. A minimum and maximum window size is chosen, and for each size a
sliding step size is chosen. Then the detection window is moved across the image as follows:
1. Set the minimum window size, and sliding step corresponding to that size.
2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set
of N face recognition filters is applied. If one filter gives a positive answer, a face is detected in the current
window.
3. If the size of the window is the maximum size, stop the procedure. Otherwise, increase the size of the window
and the corresponding sliding step to the next chosen size and go to step 2. A schematic sketch of this loop appears below.
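A schematic sketch of that loop (illustrative only; applyCascade is a hypothetical stage evaluator, and the window sizes and 10% step are assumptions):

winSizes = 24:24:96; % assumed minimum to maximum window sizes
detections = []; % collects rows of [x y width height]
for w = winSizes
    s = max(1, round(0.1 * w)); % sliding step tied to the window size
    for r = 1:s:(size(img,1) - w + 1)
        for c = 1:s:(size(img,2) - w + 1)
            if applyCascade(img(r:r+w-1, c:c+w-1)) % hypothetical: true if all stages pass
                detections(end+1, :) = [c r w w]; %#ok<AGROW>
            end
        end
    end
end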
Each face recognition filter (from the set of N filters) contains a set of cascade-connected classifiers. Each classifier looks
at a rectangular subset of the detection window and determines if it looks like a face. If it does, the next classifier is
applied. If all classifiers give a positive answer, the filter gives a positive answer and a face is detected. Otherwise,
the next filter in the set of N filters is run.
Each classifier is composed of Haar feature extractors (weak classifiers). Each Haar feature is the weighted sum of 2-D
integrals of small rectangular areas attached to each other. The weights may take values ±1. The figure below shows
examples of Haar features relative to the enclosing detection window. Gray areas have a positive weight and white
areas have a negative weight. Haar feature extractors are scaled with respect to the detection window size.
$f_{m,i}$ is the weighted sum of the 2-D integrals. $t_{m,i}$ is the decision threshold for the $i$-th feature extractor. $\alpha_{m,i}$ and $\beta_{m,i}$
are constant values associated with the $i$-th feature extractor. $\theta_m$ is the decision threshold for the $m$-th classifier.
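These symbols combine in the standard decision-stump form of a cascade stage (the exact formula is not given in the notes; this is the usual formulation):
$$h_{m,i} = \begin{cases} \alpha_{m,i}, & f_{m,i} > t_{m,i} \\ \beta_{m,i}, & \text{otherwise} \end{cases} \qquad \text{classifier } m \text{ passes if } \sum_i h_{m,i} \ge \theta_m$$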
The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of
the cascade, minimizing the total required computation. The most popular algorithm for feature training is AdaBoost. The Viola-Jones framework has these characteristics:
• Robust – consistently high detection rate (true-positive rate) and very low false-positive rate.
• Real time – For practical applications at least 2 frames per second must be processed.
• Face detection only (not recognition) - The goal is to distinguish faces from non-faces (detection is the first step
in the recognition process).
----------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPILED BY: SYED TEHSEEN UL HASAN SHAH