Design and Implementation of a Cloud Particle Shape Recognition Algorithm Based on MATLAB

ABSTRACT
This study addresses the significance of clouds in influencing climate change, emphasizing the
crucial role of cloud microphysical characteristics. The airborne two-dimensional stereo (2D-S)
optical array probe is extensively employed in meteorological observations in China for cloud
microphysics research. Recognizing the limitations of subjective, time-consuming, and
inconsistent manual cloud particle shape classification, this research proposes an innovative
approach. A cloud particle shape recognition method is introduced, leveraging a 15-layer
convolutional neural network (CNN). The CNN model incorporates a lightweight convolution
module, streamlining the automatic recognition of cloud particle shapes. This methodology
offers a more objective and efficient alternative to manual classification, enhancing the accuracy
of cloud microphysics studies. The proposed CNN-based approach contributes to advancing our
understanding of cloud dynamics and, consequently, refining climate change models.
Keywords: cloud particles; shape recognition; convolutional neural network; 2D-S
CHAPTER 1
INTRODUCTION
Clouds, with their diverse and intricate particle structures, play a pivotal role in Earth's
atmospheric processes, impacting weather patterns, climate dynamics, and overall environmental
conditions. Understanding the morphological characteristics of cloud particles is crucial for
advancing our knowledge of atmospheric science and improving climate models. In recent years,
the integration of advanced technologies, particularly computer vision and deep learning, has
opened new avenues for the analysis and recognition of cloud particle shapes. This research
endeavors to contribute to this burgeoning field by proposing and implementing a Cloud Particle
Shape Recognition Algorithm utilizing Convolutional Neural Networks (CNN), designed and
executed within the MATLAB environment.
1. Background:
The study of cloud particle shapes has traditionally relied on manual observations and
measurements, limiting the scope and efficiency of data collection. With the advent of
computational techniques and machine learning, there is an opportunity to enhance our ability to
analyze vast amounts of cloud imagery rapidly. MATLAB, a versatile numerical computing
environment, serves as an ideal platform for the development and implementation of
sophisticated algorithms. In this research, we focus on leveraging the power of CNNs to
automate cloud particle shape recognition, enabling more efficient and accurate analysis.
2. Significance of Cloud Particle Shapes:
Clouds exhibit a wide range of particle shapes, including ice crystals, water droplets, and various
combinations thereof. These shapes influence the optical properties of clouds, affecting radiative
processes and, consequently, the Earth's energy budget. Understanding the distribution and
characteristics of cloud particle shapes is essential for climate studies, weather prediction, and
atmospheric modeling. The proposed algorithm aims to automate the identification and
classification of cloud particle shapes, facilitating a deeper understanding of their impact on the
Earth's climate system.
3. The MATLAB Environment:
MATLAB provides a robust and user-friendly environment for algorithm development, data
analysis, and visualization. Its extensive set of tools and libraries, combined with a
straightforward syntax, make it an ideal choice for implementing complex algorithms,
particularly in the field of image processing. The research leverages MATLAB's capabilities to
design, implement, and evaluate the proposed Cloud Particle Shape Recognition Algorithm,
ensuring a seamless integration of advanced techniques into a user-friendly framework.
4. Research Objectives:
c. Comparison of the proposed CNN-based approach with traditional methods to highlight the
advantages and advancements achieved.
5. Dataset Description:
The effectiveness of any deep learning model is contingent on the quality and diversity of the
dataset used for training and evaluation. In this study, we utilize a comprehensive dataset
comprising a vast collection of fundus images acquired from diabetic patients at different stages
of retinopathy. The dataset is carefully curated to ensure a representative distribution of images
across various stages, allowing the model to learn and generalize from a diverse set of cases.
This section also discusses the challenges associated with obtaining and annotating medical
image datasets, emphasizing the importance of data quality and ethical considerations in the
development of models for healthcare applications.
6. Preprocessing Techniques:
Prior to feeding the images into the InceptionNet V3 model, a series of preprocessing steps are
employed to enhance the quality and relevance of the input data. These preprocessing techniques
may include image resizing, normalization, and augmentation to mitigate issues such as class
imbalance and variations in illumination. This section provides a detailed account of the
preprocessing pipeline, explaining the rationale behind each step and its impact on the model's
performance.
7. Transfer Learning:
Transfer learning plays a pivotal role in the successful application of deep learning models to
medical image analysis. Leveraging pre-trained models on large-scale datasets for general image
recognition tasks, such as ImageNet, allows the model to learn rich hierarchical features that can
be repurposed for specific medical imaging tasks. In this study, we delve into the transfer
learning strategy employed, highlighting how InceptionNet V3 is fine-tuned on the diabetic
retinopathy dataset to adapt its features for accurate stage classification.
The training of the InceptionNet V3 model involves optimizing its parameters using appropriate
loss functions and regularization techniques. This section provides insights into the training
process, detailing the choice of optimization algorithms, learning rates, and the convergence
criteria. Additionally, we discuss the evaluation metrics employed to assess the model's
performance, including sensitivity, specificity, and the area under the receiver operating
characteristic curve (AUC-ROC).
MATLAB Implementation:
MATLAB, renowned for its versatility and extensive toolboxes, serves as the primary platform
for implementing the proposed CNN-based algorithm. Leveraging MATLAB's intuitive
interface, image processing capabilities, and compatibility with deep learning frameworks, the
implementation process is expected to be seamless and efficient.
Deep Learning Toolbox: Utilizing MATLAB's Deep Learning Toolbox to construct, train, and
evaluate the CNN architecture. This toolbox provides a user-friendly environment for designing
and fine-tuning neural networks.
CHAPTER 2
LITERATURE SURVEY
[1] V. Ramanathan, R. D. Cess, E. F. Harrison, et al. Cloud radiative forcing and climate:
Results from the Earth Radiation Budget Experiment. Science, 1989, 243(4887): 57-63.
The study of climate and climate change is hindered by a lack of information on the effect of
clouds on the radiation balance of the earth, referred to as the cloud-radiative forcing.
Quantitative estimates of the global distributions of cloud-radiative forcing have been obtained
from the spaceborne Earth Radiation Budget Experiment (ERBE) launched in 1984. For the
April 1985 period, the global shortwave cloud forcing [-44.5 watts per square meter (W/m2)] due
to the enhancement of planetary albedo, exceeded in magnitude the longwave cloud forcing
(31.3 W/m2) resulting from the greenhouse effect of clouds. Thus, clouds had a net cooling
effect on the earth. This cooling effect is large over the mid- and high-latitude oceans, with
values reaching -100 W/m2. The monthly averaged longwave cloud forcing reached maximum
values of 50 to 100 W/m2 over the convectively disturbed regions of the tropics. However, this
heating effect is nearly cancelled by a correspondingly large negative shortwave cloud forcing,
which indicates the delicately balanced state of the tropics. The size of the observed net cloud
forcing is about four times as large as the expected value of radiative forcing from a doubling of
CO2. The shortwave and longwave components of cloud forcing are about ten times as large as
those for a CO2 doubling. Hence, small changes in the cloud-radiative forcing fields can play a
significant role as a climate feedback mechanism. For example, during past glaciations a
migration toward the equator of the field of strong, negative cloud-radiative forcing, in response
to a similar migration of cooler waters, could have significantly amplified oceanic cooling and
continental glaciation.
Summary: This study contributes to the growing body of knowledge surrounding cloud
radiative forcing and its implications for climate.
[2] Wang Lei, Li Chengcai, Zhao Zengliang, et al. Application of two-dimensional particle
shape classification technology in cloud microphysical feature analysis. Atmospheric Science,
2014, 38(02): 201-212.
The airborne two-dimensional stereo (2D-S) optical array probe has been operating for more
than 10 yr, accumulating a large amount of cloud particle image data. However, due to the lack
of reliable and unbiased classification tools, our ability to extract meaningful morphological
information related to cloud microphysical processes is limited. To solve this issue, we propose a
novel classification algorithm for 2D-S cloud particle images based on a convolutional neural
network (CNN), named CNN-2DS. A 2D-S cloud particle shape dataset was established by using
the 2D-S cloud particle images observed from 13 aircraft detection flights in 6 regions of China
(Northeast, Northwest, North, East, Central, and South China). This dataset contains 33,300
cloud particle images with 8 types of cloud particle shape (linear, sphere, dendrite, aggregate,
graupel, plate, donut, and irregular). The CNN-2DS model was trained and tested based on the
established 2D-S dataset. Experimental results show that the CNN-2DS model can accurately
identify cloud particles with an average classification accuracy of 97%. Compared with other
common classification models [e.g., Vision Transformer (ViT) and Residual Neural Network
(ResNet)], the CNN-2DS model is lightweight (few parameters) and fast in calculations, and has
the highest classification accuracy. In a word, the proposed CNN-2DS model is effective and
reliable for the classification of cloud particles detected by the 2D-S probe.
Summary: This study explores the integration of two-dimensional particle shape classification
technology into the analysis of cloud microphysical features.
[3] H. Letu, H. Ishimoto, J. Riedi, et al. Investigation of ice particle habits to be used for ice
cloud remote sensing for the GCOM-C satellite mission. Atmospheric Chemistry and
Physics, 2015, 15(21): 31665-31703.
In this study, various ice particle habits are investigated in conjunction with inferring the optical
properties of ice clouds for use in the Global Change Observation Mission-Climate (GCOM-C)
satellite programme. We develop a database of the single-scattering properties of five ice habit
models: plates, columns, droxtals, bullet rosettes, and Voronoi. The database is based on the
specification of the Second-Generation Global Imager (SGLI) sensor on board the GCOM-C
satellite, which is scheduled to be launched in 2017 by the Japan Aerospace Exploration Agency.
A combination of the finite-difference time-domain method, the geometric optics integral
equation technique, and the geometric optics method is applied to compute the single-scattering
properties of the selected ice particle habits at 36 wavelengths, from the visible to the infrared
spectral regions. This covers the SGLI channels for the size parameter, which is defined as a
single-particle radius of an equivalent volume sphere, ranging between 6 and 9000 µm. The
database includes the extinction efficiency, absorption efficiency, average geometrical cross
section, single-scattering albedo, asymmetry factor, size parameter of a volume-equivalent
sphere, maximum distance from the centre of mass, particle volume, and six nonzero elements of
the scattering phase matrix. The characteristics of calculated extinction efficiency, single-
scattering albedo, and asymmetry factor of the five ice particle habits are compared.
Furthermore, size-integrated bulk scattering properties for the five ice particle habit models are
calculated from the single-scattering database and microphysical data. Using the five ice particle
habit models, the optical thickness and spherical albedo of ice clouds are retrieved from the
Polarization and Directionality of the Earth's Reflectances-3 (POLDER-3) measurements,
recorded on board the Polarization and Anisotropy of Reflectances for Atmospheric Sciences
coupled with Observations from a Lidar (PARASOL) satellite. The optimal ice particle habit for
retrieving the SGLI ice cloud properties is investigated by adopting the spherical albedo
difference (SAD) method. It is found that the SAD is distributed stably as the scattering
angle increases for bullet rosettes with an effective diameter (Deff) of 10 µm and Voronoi
particles with Deff values of 10, 60, and 100 µm. It is confirmed that the SAD of small bullet-
rosette particles and all sizes of Voronoi particles has a low angular dependence, indicating that a
combination of the bullet-rosette and Voronoi models is sufficient for retrieval of the ice cloud's
spherical albedo and optical thickness as effective habit models for the SGLI sensor. Finally,
SAD analysis based on the Voronoi habit model with moderate particle size (Deff = 60 µm) is
compared with the conventional general habit mixture model, inhomogeneous hexagonal
monocrystal model, five-plate aggregate model, and ensemble ice particle model. The Voronoi
habit model is found to have an effect similar to that found in some conventional models for the
retrieval of ice cloud properties from space-borne radiometric observations.
Summary: This research lays the groundwork for future advancements in satellite-based ice
cloud remote sensing.
[4] E. W. Holroyd. Some techniques and uses of 2D-C habit classification software for snow
particles. Journal of Atmospheric and Oceanic Technology, 1987, 4: 498-511.
A technique has been designed that uses observable properties of images from a 2D-C optical
array probe (size, linearity, area, perimeter, and image density) to classify unsymmetrical ice
particles into nine habit classes. Concentrations are calculated by requiring that the center of
each accepted particle appear to be within the field of view of the probe. Once the size and habit
are estimated, a generic mass and terminal velocity can be assigned to each particle to calculate
its contribution to ice water content and to precipitation rate. Examples are given to indicate the
value of a habit classifier in analyzing the structure of storms, showers, orographic clouds, and
seeded clouds. Though the techniques work well for most natural snowfalls, some examples of
imperfections are given to remind the analyst to look at the images and to understand how the
classifier will treat them.
Summary: This study explores the application of 2D-C habit classification software to the
analysis of snow particle habits in storms and clouds.
[5] A. Korolev, B. Sussman. A technique for habit classification of cloud particles. Journal of
Atmospheric and Oceanic Technology, 2000, 17:1048-1057.
A new algorithm was developed to classify populations of binary (black and white) images of
cloud particles collected with Particle Measuring Systems (PMS) Optical Array Probes (OAPs).
The algorithm classifies images into four habit categories: “spheres,” “irregulars,” “needles,” and
“dendrites.” The present algorithm derives the particle habits from an analysis of dimensionless
ratios of simple geometrical measures such as the x and y dimensions, perimeter, and image area.
For an ensemble of images containing a mixture of different habits, the distribution of a
particular ratio will be a linear superposition of basis distributions of ratios of the individual
habits. The fraction of each habit in the ensemble is found by solving the inverse problem. One
of the advantages of the suggested scheme is that it provides recognition analysis of both
“complete” and “partial” images, that is, images that are completely or partially contained within
the sample area of the probe. The ability to process “partial” images improves the statistics of the
recognition by approximately 50% when compared with retrievals that use “complete” images
only. The details of this algorithm are discussed in this study.
Summary: This study presents a novel technique for the habit classification of cloud particles.
[6] Huang Minsong, Lei Hengchi, Wang Xiujuan. Ice particle habit classification method
with improved thresholds and its application. Climate and Environment Research, 2020,
25(04): 419-428.
A versatile method to automatically classify ice particle habit from various airborne optical array
probes is presented. The classification is achieved using a multinomial logistic regression model.
For each airborne probe, the model determines the particle habit (among six classes) based on a
large set of geometrical and textural descriptors extracted from the two-dimensional image of a
particle. The technique is applied and evaluated using three probes with significantly different
specifications: the high-volume precipitation spectrometer, the two-dimensional stereo probe,
and the cloud particle imager. Performance and robustness of the method are assessed using
standard machine learning tools on the basis of thousands of images manually labeled for each of
the considered probes. The three classifiers show good performance characterized by overall
accuracies and Heidke skill scores above 90%. Depending on the application and user
preferences, the classification scheme can be easily adapted. For a more precise output, intraclass
subclassification can be achieved in a nested fashion, illustrated here with columnar crystals and
aggregates. A comparative study of the classification output obtained with the three probes is
presented for two aircraft flight periods selected when the three probes were operating together.
Results are globally consistent in terms of the proportions of habits identified (once blurry and
partial images have been automatically discarded). Perfect agreement is not expected, as the
three considered probes are sensitive to different particle size ranges.
Summary: The study's findings contribute to the advancement of atmospheric science and
meteorological research.
CHAPTER 3
EXISTING METHOD
Support Vector Machines
Support vector machines (SVMs) are a set of supervised learning methods used for
classification, regression, and outlier detection. Primarily, however, they are used for
classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put the new data point in the correct
category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.
(Figure: two categories separated by a decision boundary, or hyperplane.)
Support Vector Machines, often referred to as SVMs, are a class of supervised machine learning
algorithms. They are primarily used for classification tasks, where the goal is to assign each
input data point to one of two or more classes based on its features. SVMs are also used for
regression tasks, where the goal is to predict a continuous numerical value based on input
features.
SVMs are particularly well-suited for problems where the data is high-dimensional and may not
be linearly separable. Linear separation means that it's possible to draw a straight line (or
hyperplane in higher dimensions) that effectively separates the data points into their respective
classes. However, SVMs can handle non-linear separation as well, thanks to a mathematical
concept known as the kernel trick.
SVMs are widely used in various domains, including image classification, text categorization,
bioinformatics, and many other fields where pattern recognition is essential.
SVMs work by finding an optimal hyperplane that best separates the data into different classes.
Let's break down the key concepts involved in SVMs:
Linear Separation:
At the core of SVMs is the concept of finding a hyperplane that linearly separates data points
belonging to different classes. In a 2D space, this hyperplane is a straight line. In higher
dimensions, it's a hyperplane. The objective is to maximize the margin between this hyperplane
and the nearest data points from each class.
Margin and Support Vectors:
The margin is the region between the hyperplane and the nearest data points from each class.
These nearest data points, which touch the margin, are called support vectors. SVMs get their
name from these crucial support vectors because they support or define the decision boundary.
Soft Margin:
In real-world data, perfect linear separation is often not possible due to noise and outliers. To
account for this, SVMs use a soft margin. The soft margin allows for some misclassifications,
but it seeks to minimize them while maximizing the margin. The balance between achieving a
wider margin and allowing misclassifications is controlled by a hyperparameter, denoted as 'C.'
Kernel Functions:
SVMs can handle non-linearly separable data by mapping the data to a higher-dimensional space
using kernel functions. Kernel functions transform the original feature space into a higher-
dimensional space, where the data becomes linearly separable. Common kernel functions include
linear, polynomial, radial basis function (RBF), and sigmoid kernels. The choice of kernel
function depends on the nature of the data and the problem at hand.
The mathematical foundations of SVMs involve optimization and duality. Here are the key
components:
Objective Function:
The objective of an SVM is to find the optimal hyperplane that maximizes the margin while
minimizing classification errors. This optimization problem can be expressed as a convex
quadratic programming problem. The goal is to maximize:
- The margin, which is proportional to the inverse of the norm (magnitude) of the weight vector
associated with the hyperplane.
- Subject to constraints that enforce that data points are correctly classified within the margin or
on the correct side of the hyperplane.
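In standard soft-margin notation (w is the weight vector, b the bias, xi_i the slack variables, and C the penalty hyperparameter), this optimization can be sketched as:

\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad \text{subject to} \quad
y_{i}\left(w^{\top}x_{i} + b\right) \ge 1 - \xi_{i}, \qquad \xi_{i} \ge 0.

The geometric margin is 2/||w||, so minimizing ||w|| maximizes the margin.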
Dual Problem:
In practice, the primal problem is usually solved through its Lagrangian dual, in which the
optimization is expressed in terms of pairwise inner products of the training points; this
reformulation is what makes the kernel trick possible.
Kernel Trick:
The kernel trick is a significant advancement in SVMs. It allows SVMs to implicitly compute the
transformation of input data into a higher-dimensional space without explicitly calculating the
transformation. This is important because it avoids the computational cost and storage
requirements associated with working in high-dimensional spaces. The kernel trick makes it
possible to use linear classifiers in high-dimensional feature spaces.
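For reference, a sketch of the dual formulation that the kernel trick exploits, in standard notation (alpha_i are the Lagrange multipliers and K(.,.) the kernel function):

\max_{\alpha}\; \sum_{i=1}^{N}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,K(x_{i},x_{j})
\quad \text{subject to} \quad 0 \le \alpha_{i} \le C, \qquad \sum_{i=1}^{N}\alpha_{i}y_{i} = 0.

Because the data enter only through K(x_i, x_j), the mapping to the higher-dimensional space never has to be computed explicitly.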
Training an SVM:
Data Preprocessing:
Data preprocessing is crucial to ensure that the input data is in a suitable form for the SVM. This
may include tasks such as data cleaning, handling missing values, and feature extraction.
Feature Scaling:
SVMs are sensitive to the relative scale of the input features, so each feature is typically
standardized to zero mean and unit variance (or rescaled to a fixed range) before training;
otherwise, features with large numeric ranges dominate the margin.
Kernel Selection:
Choosing the appropriate kernel function is an important decision. Linear kernels are used when
the data is linearly separable. For non-linearly separable data, selecting the right kernel function
(e.g., RBF, polynomial) is crucial.
Hyperparameter Tuning:
SVMs have hyperparameters that need to be tuned for optimal performance. The most important
hyperparameter is 'C,' which balances the trade-off between maximizing the margin and
minimizing classification errors. Techniques such as cross-validation can help in selecting
suitable hyperparameters.
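As an illustration only (not part of the proposed method), the following MATLAB sketch, which assumes the Statistics and Machine Learning Toolbox and uses synthetic toy data, trains an RBF-kernel SVM and uses 5-fold cross-validation to pick the box constraint C:

rng(0);                                      % reproducible toy data
X = [randn(50,2)+1.5; randn(50,2)-1.5];      % two synthetic 2-D clusters
Y = [ones(50,1); -ones(50,1)];               % class labels +1 / -1
bestC = NaN; bestLoss = Inf;
for C = [0.1 1 10 100]                       % candidate box-constraint values
    mdl   = fitcsvm(X, Y, 'KernelFunction', 'rbf', ...
                    'BoxConstraint', C, 'KernelScale', 'auto', ...
                    'Standardize', true);    % feature scaling built in
    cvmdl = crossval(mdl, 'KFold', 5);       % 5-fold cross-validation
    L     = kfoldLoss(cvmdl);                % estimated misclassification rate
    if L < bestLoss, bestLoss = L; bestC = C; end
end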
SVM in Classification:
SVMs are often used for binary classification tasks, where the goal is to classify data points into
one of two classes. The optimal hyperplane learned during training is used to classify new data
points. The class of a new data point is determined by which side of the hyperplane it falls on.
SVM in Regression:
SVMs can also be used for regression tasks. Support Vector Regression (SVR) aims to predict a
continuous value instead of class labels. SVR is used when the output variable is continuous and
can be thought of as finding a function that predicts the target value.
SVMs are naturally binary classifiers, but they can be extended to handle multi-class
classification tasks. Common approaches for multi-class classification using SVMs include one-
vs-rest (OvR) and one-vs-one (OvO) strategies. In the OvR approach, a separate SVM is trained
for each class against the rest. In the OvO approach, a binary classifier is trained for each pair of
classes. These binary classifiers vote on the final class assignment.
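A minimal sketch of multi-class SVM classification in MATLAB using the one-vs-one coding described above (fitcecoc wraps binary SVM learners; the toy data below are placeholders):

rng(0);
X = [randn(40,2)+3; randn(40,2); randn(40,2)-3];              % three synthetic clusters
Y = [repmat({'A'},40,1); repmat({'B'},40,1); repmat({'C'},40,1)];
t    = templateSVM('KernelFunction', 'rbf', 'KernelScale', 'auto');
mdl  = fitcecoc(X, Y, 'Learners', t, 'Coding', 'onevsone');   % or 'onevsall'
pred = predict(mdl, randn(5,2));                              % classify new points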
When using SVMs in practice, several challenges and considerations should be addressed:
Data Preprocessing: Proper data preprocessing is crucial to ensure that the input data is clean
and appropriately formatted for the SVM.
Feature Selection: Selecting the right features is essential, as not all features are equally
informative. Feature selection methods can help.
Hyperparameter Tuning: Careful selection of hyperparameters, such as 'C' and kernel
parameters, can significantly affect the performance of the SVM.
Imbalanced Data: Dealing with imbalanced datasets, where one class has significantly fewer
samples than others, requires appropriate techniques.
Scalability: SVMs may not scale well to large datasets, and training can be time-consuming.
Linear SVM and stochastic gradient descent variants can help with scalability.
The field of machine learning is constantly evolving, and SVMs are no exception. Future trends
in SVM and machine learning may include:
Deep Learning: Deep learning techniques, particularly convolutional neural networks (CNNs)
and recurrent neural networks (RNNs), are gaining popularity. These models can automatically
learn features from raw data, potentially reducing the need for manual feature engineering.
Hybrid Models: Researchers are exploring hybrid models that combine SVMs with deep
learning techniques to leverage the strengths of both approaches.
Privacy and Ethics: As machine learning applications become more integrated into daily life,
concerns regarding data privacy and ethical considerations are growing. Ensuring the responsible
use of machine learning technology is crucial.
Multi-Modal Data: The integration of data from various sources, such as audio, video, and
environmental sensors, is becoming more common. SVMs can be adapted to handle multi-modal
data.
Transfer Learning: Transfer learning techniques allow models trained on one dataset to be
adapted to a new dataset with fewer labeled examples. This is useful when collecting labeled
data is expensive or time-consuming.
Support Vector Machines (SVMs) are a versatile and effective class of machine learning
algorithms used for classification and regression tasks. They are particularly well-suited for high-
dimensional data and can handle both linear and non-linear separation through the use of kernel
functions. Despite their computational costs and sensitivity to hyperparameters, SVMs have
found applications in various fields, from image classification and text analysis to biomedical
data analysis and anomaly detection.
The mathematical foundations of SVMs, including the concept of margins, support vectors, and
the kernel trick, are key to their success. SVMs offer a balance between simplicity and
flexibility, making them a valuable tool in the machine learning toolbox.
As the field of machine learning continues to advance, SVMs will coexist with newer techniques,
and their usage will depend on the specific requirements and constraints of the problem at hand.
Whether used as standalone models or as part of hybrid systems, SVMs will continue to play a
role in pattern recognition, classification, and regression tasks in the years to come.
Disadvantages:
Slow for Large Datasets: SVMs can be slow if you have lots and lots of pictures to
classify because they need to analyse each image carefully.
Picky with Parameters: They are sensitive to settings like "kernel" and "C," so you need
to choose these values carefully to get good results.
Not So Great with Huge Images: SVMs might struggle if your images are super big, as
they work better with smaller ones.
CHAPTER 4
PROPOSED METHOD
The proposed method introduces a novel approach for cloud particle shape recognition,
addressing limitations in traditional methods that rely on subjective human observation.
Leveraging the efficiency and accuracy of convolutional neural networks (CNNs), a specialized
21-layer model is constructed. This model incorporates a lightweight convolution module
designed for optimal performance in cloud microphysics research. Unlike manual classification
methods, which are prone to subjectivity and time constraints, the CNN-based approach offers
automated and objective cloud particle shape recognition. By exploiting the power of deep
learning, the method ensures consistent and reliable results, overcoming the challenges
associated with manual classification. This innovation holds promise for advancing our
understanding of cloud microphysical characteristics and their implications for climate change,
contributing to more accurate meteorological observations and climate modeling.
(Block diagram: grayscale image → image enhancement → CNN network built from the dataset, CNN layers, and training options → CNN classification → classified matrices.)
In a grayscale image:
1. Pixel Values: Each pixel in the image is represented by a single numeric value, usually an 8-
bit value ranging from 0 to 255. In this range, 0 typically represents black, 255 represents white,
and values in between represent various shades of gray.
2. Intensity: The pixel value corresponds to the brightness or intensity of the pixel. Higher
values indicate brighter areas, while lower values represent darker areas.
3. Image Representation: Grayscale images are often used in scenarios where color information
is not essential or can be omitted without losing critical details. Examples include medical
imaging, document processing, and certain types of photography.
4. File Size: Grayscale images usually have smaller file sizes compared to their color
counterparts because they contain only one channel of information.
5. Image Processing: Grayscale images are commonly used in image processing tasks, such as
edge detection, image enhancement, and segmentation. The simplicity of grayscale images
makes certain types of image analysis more straightforward.
6. Printing and Display: Grayscale images are compatible with devices that support black-and-
white printing or display. They are also commonly used for printed documents, where color may
not be necessary or cost-effective.
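As a small illustrative sketch (Image Processing Toolbox; 'peppers.png' is a built-in demo image standing in for a particle image), an input can be reduced to a single grayscale channel before further processing:

I = imread('peppers.png');          % built-in RGB demo image used as a stand-in
if size(I, 3) == 3                  % RGB input: convert to grayscale
    G = rgb2gray(I);
else                                % already single-channel
    G = I;
end
imshow(G);                          % display the grayscale result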
Enhanced Image:
Image enhancement is the process of improving the visual appearance of an image to make it
more suitable for analysis or presentation. Histograms play a crucial role in understanding and
enhancing the contrast and brightness of an image.
Here's a brief explanation of how histograms are used in image enhancement:
1. Understanding Histograms:
In the context of image processing, a histogram is a graphical representation of the
distribution of pixel intensities in an image.
The x-axis of the histogram represents the pixel intensity values (e.g., from 0 to 255 for
an 8-bit grayscale image), and the y-axis represents the frequency of occurrence of each
intensity.
2. Contrast Enhancement:
The histogram provides valuable information about the image's overall contrast. A
stretched histogram often indicates better contrast.
Histogram equalization is a common technique used to enhance contrast. It redistributes
the intensity values to cover the entire available range, resulting in a more balanced
distribution.
3. Brightness Adjustment:
Histograms can also be used to adjust the brightness of an image. Shifting the histogram
to the left or right corresponds to changing the overall brightness level.
Histogram stretching or normalization is a simple technique that involves scaling the
intensities to fill the entire dynamic range.
4. Local Histogram Equalization:
Global histogram equalization may lead to over-enhancement in some regions of the
image. Local histogram equalization techniques, like adaptive histogram equalization
(AHE), can be applied to smaller regions for more localized enhancement.
5. Color Image Enhancement:
For color images, histograms can be separately analysed for each color channel (e.g., red,
green, and blue) or transformed into other color spaces (e.g., HSV or LAB) for better
control over enhancement.
Histograms are powerful tools in image processing for understanding and enhancing the
distribution of pixel intensities in an image. Techniques like histogram equalization and
stretching can be applied to improve contrast and brightness, leading to visually enhanced
images for various applications.
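For illustration, a minimal MATLAB sketch of these operations (Image Processing Toolbox; 'pout.tif' is one of the toolbox's example images) might look like:

G    = imread('pout.tif');      % low-contrast example grayscale image
figure; imhist(G);              % inspect the histogram of the original intensities
Geq  = histeq(G);               % global histogram equalization
Gahe = adapthisteq(G);          % adaptive (local) histogram equalization
montage({G, Geq, Gahe});        % compare original and enhanced versions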
When it comes to machine learning, artificial neural networks perform very well. Artificial
neural networks are used in various classification tasks involving images, audio, and text.
Different types of neural networks are used for different purposes: for predicting a sequence of
words we use recurrent neural networks (more precisely, an LSTM), while for image
classification we use a convolutional neural network. Here we describe the basic building blocks
of a CNN. A convolutional neural network can consist of one or multiple convolutional layers,
and the number of convolutional layers depends on the amount and complexity of the data.
Before diving into the Convolution Neural Network, let us first revisit some concepts of
Neural Network. In a regular Neural Network, there are three types of layers:
1. Input Layer: This is the layer through which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our data (the number of pixels in
the case of an image).
2. Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be
many hidden layers depending on the model and data size. Each hidden layer can have a
different number of neurons, generally greater than the number of features. The output of each
layer is computed by multiplying the output of the previous layer by that layer's learnable
weights, adding learnable biases, and applying an activation function, which makes the network
nonlinear.
3. Output Layer: The output from the hidden layer is then fed into a logistic function like
sigmoid or softmax which converts the output of each class into probability score of each class.
The data is then fed into the model and the output of each layer is obtained; this step is called
the feed-forward pass. We then calculate the error using an error function; common error
functions include cross-entropy and squared loss. After that, we backpropagate through the
model by calculating derivatives. This step, called backpropagation, is used to minimize the loss.
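In compact standard notation (a^(l) is the activation of layer l, W^(l) and b^(l) its weights and biases, f the activation function, y the network output, and t the one-hot target), the feed-forward step and the cross-entropy error can be written as:

a^{(l)} = f\!\left(W^{(l)}\,a^{(l-1)} + b^{(l)}\right), \qquad
E = -\sum_{k} t_{k}\,\ln y_{k}.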
A convolutional neural network (CNN) is a neural network that has one or more
convolutional layers and is used mainly for image processing, classification, segmentation, and
other autocorrelated data. A convolution is essentially sliding a filter over the input. One
helpful way to think about convolutions is this quote from Dr Prasad Samarakoon: “A
convolution can be thought as “looking at a function’s surroundings to make better/accurate
predictions of its outcome.” Rather than looking at an entire image at once to find certain
features it can be more effective to look at smaller portions of the image. The most common use
for CNNs is image classification, for example identifying satellite images that contain roads or
classifying handwritten letters and digits. There are other quite mainstream tasks, such as image
segmentation and signal processing, at which CNNs perform well. CNNs have also been used
for natural language processing (NLP) and speech recognition, although recurrent neural
networks (RNNs) are often used for NLP.
A CNN can also be implemented as a U-Net architecture, which is essentially two
almost mirrored CNNs, resulting in a network whose architecture can be drawn in a U shape.
U-Nets are used where the output needs to be of similar size to the input, such as in segmentation
and image improvement. Each convolutional layer contains a series of filters known as
convolutional kernels. A filter is a matrix of weights that is applied to a subset of the input pixel values, the
same size as the kernel. Each pixel is multiplied by the corresponding value in the kernel, then
the result is summed up for a single value for simplicity representing a grid cell, like a pixel, in
the output channel/feature map. These are linear transformations; each convolution is a type of
affine function. In computer vision the input is often a 3 channel RGB image. For simplicity, if
we take a greyscale image that has one channel (a two-dimensional matrix) and a 3x3
convolutional kernel (a two-dimensional matrix). The kernel strides over the input matrix of
numbers moving horizontally column by column, sliding/scanning over the first rows in the
matrix containing the image's pixel values. Then the kernel strides down vertically to subsequent
rows.
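As a small illustrative example (Image Processing Toolbox; 'cameraman.tif' is a grayscale demo image shipped with MATLAB, and the kernel shown is just an example edge filter), a 3x3 kernel can be slid over a grayscale image with conv2:

I = im2double(imread('cameraman.tif'));   % grayscale demo image, values in [0,1]
K = [1 0 -1; 2 0 -2; 1 0 -1];             % example 3x3 (Sobel-style) kernel
F = conv2(I, K, 'same');                  % 'same' keeps the output the size of I
imshow(mat2gray(abs(F)));                 % rescale the response for display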
Padding:
Reflection padding is often considered the best approach: the pixels needed for the
convolutional kernel to process the edge pixels are added onto the outside by copying
(mirroring) the pixels at the edge of the image. For a 3x3 kernel, one pixel needs to be added
around the outside; for a 7x7 kernel, three pixels would be reflected around the outside. The
number of pixels added on each side is the kernel dimension halved and rounded down.
Traditionally, in many research papers the edge pixels are simply ignored, which loses a
small proportion of the data, and this gets increasingly worse when there are many deep
convolutional layers.
With padding, the output from an input of width w and height h would be width w and
height h (the same as the input with a single input channel), assuming the kernel takes a stride of
one pixel at a time.
Strides:
It is common to use a stride two convolution rather than a stride one convolution, where
the convolutional kernel strides over 2 pixels at a time, for example our 3x3 kernel would start at
position (1, 1), then stride to (1, 3), then to (1, 5) and so on, halving the size of the output
channel/feature map, compared to the convolutional kernel taking strides of one. With padding,
the output from an input of width w, height h and depth 3 would be the ceiling of width w/2,
height h/2 and depth 1, as the kernel outputs a single summed output from each stride.
For example, with an input of 3x64x64 (say a 64x64 RGB three channel image), one kernel
taking strides of two with padding the edge pixels, would produce a channel/feature map of
32x32.
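In general (standard convolution arithmetic, not a formula specific to this work), for an input of width W, kernel size K, padding P, and stride S, the output width is:

W_{\text{out}} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1.

For the example above, W = 64, K = 3, P = 1, and S = 2 give W_out = floor(63/2) + 1 = 32, matching the 32x32 feature map.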
The first step of creating and training a new convolutional neural network (ConvNet) is to define
the network architecture. This topic explains the details of ConvNet layers and the order in which
they appear in a ConvNet. For a complete list of deep learning layers and how to create them, see List
of Deep Learning Layers. To learn about LSTM networks for sequence classification and
regression, see Long Short-Term Memory Networks. To learn how to create your own custom
layers, see Define Custom Deep Learning Layers. The network architecture can vary depending
on the types and numbers of layers included.
Create an image input layer using imageInputLayer. An image input layer inputs images to a
network and applies data normalization. Specify the image size using the InputSize argument.
The size of an image corresponds to the height, width, and number of color channels of that
image. For example, for a grayscale image the number of channels is 1, and for a color image it
is 3.
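As a hedged sketch only (not the exact proposed architecture), a small MATLAB layer stack of the kind described in this chapter could be assembled as follows; the 64x64 grayscale input size and the 8 output classes are assumptions based on the 2D-S shape categories cited in the literature survey:

layers = [
    imageInputLayer([64 64 1])                     % assumed 64x64 grayscale input
    convolution2dLayer(3, 16, 'Padding', 'same')   % 16 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)              % downsample by a factor of 2
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(8)                         % assumed 8 cloud particle shape classes
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam', 'MaxEpochs', 20, ...
    'Shuffle', 'every-epoch', 'InitialLearnRate', 1e-3);
% net = trainNetwork(imdsTrain, layers, options);  % imdsTrain: a labeled imageDatastore (placeholder)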
Convolutional layer:
A 2-D convolutional layer applies sliding convolutional filters to the input. Create a 2-D
convolutional layer using convolution2dLayer. The convolutional layer consists of various
components. Filters and Stride: a convolutional layer consists of neurons that connect to
subregions of the input images or the outputs of the previous layer. The layer learns the features
localized by these regions while scanning through an image. When creating a layer using the
convolution2dLayer function, you can specify the size of these regions using the FilterSize input
argument.
Dilated Convolution:
A dilated convolution is a convolution in which the filters are expanded by spaces inserted
between the elements of the filter. Specify the dilation factor using the 'DilationFactor' property.
Use dilated convolutions to increase the receptive field (the area of the input the layer can see)
without increasing the number of parameters or the computation. The layer expands the filters by
inserting zeros between each filter element. The dilation factor determines the step size for
sampling the input or, equivalently, the upsampling factor of the filter. It corresponds to an
effective filter size of (FilterSize – 1) .* DilationFactor + 1. For example, a 3-by-3 filter with a
dilation factor of [2 2] is equivalent to a 5-by-5 filter with zeros between the elements.
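A minimal sketch of creating such a layer in MATLAB (the filter count of 32 is only an example):

layer = convolution2dLayer(3, 32, 'DilationFactor', 2, 'Padding', 'same');
% Effective filter size: (3 - 1) * 2 + 1 = 5, with no additional learnable parameters.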
Feature Maps
As a filter moves along the input, it uses the same set of weights and the same bias for the
convolution, forming a feature map. Each feature map is the result of a convolution using a
different set of weights and a different bias. Hence, the number of feature maps is equal to the
number of filters. The total number of parameters in a convolutional layer is ((h*w*c +
1)*NumberOfFilters), where h, w, and c are the filter height, width, and number of input
channels, and the 1 accounts for the bias.
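As a quick worked example of this formula (using the illustrative layer sizes from the sketch above, not figures reported by the study):

(3 \times 3 \times 1 + 1) \times 16 = 160

parameters for a layer with 16 filters of size 3x3 applied to a single-channel input.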
Zero Padding
You can also apply zero padding to the input image borders vertically and horizontally using the
'Padding' name-value pair argument. Padding consists of rows or columns of zeros added to the
borders of the image input. By adjusting the padding, you can control the output size of the
layer. For example, a 3-by-3 filter with padding of size 1 produces an output the same size as its
input.
A Rectified Linear Unit (ReLU) is used as a non-linear activation function: if a value is less than
zero, it is set to zero. Create a ReLU layer using reluLayer. A ReLU layer performs a threshold
operation on each element of the input, where any value less than zero is set to zero.
Convolutional and batch normalization layers are usually followed by a nonlinear activation
function such as a rectified linear unit (ReLU), specified by a ReLU layer. A ReLU layer
performs a threshold operation on each element, where any input value less than zero is set to
zero, that is, f(x) = max(0, x). The ReLU layer does not change the size of its input. There are other nonlinear
activation layers that perform different operations and can improve the network accuracy for
some applications. For a list of activation layers, see Activation Layers.
Batch normalization has the benefits of helping a network produce more stable predictions,
reducing overfitting through regularization, and speeding up training by an order of magnitude.
Batch normalization is the process of normalizing the activations of the current batch,
subtracting the mean of the batch's activations and dividing by the standard deviation of the
batch's activations. Create a batch normalization layer using
batchNormalizationLayer. A batch normalization layer normalizes each input channel across a
mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to
network initialization, use batch normalization layers between convolutional layers and
nonlinearities, such as ReLU layers.
One example of such a system is the work of Liu et al. (2020), who proposed a CNN-based
approach for detecting forgery in handwritten Chinese characters. The system first extracts
features from the input image using a pre-trained CNN for character recognition, and then uses
these features to train a separate CNN for forgery detection. The forgery detection CNN is
trained using a dataset of genuine and forged characters, and is able to achieve high accuracy in
detecting forged characters. Another potential approach would be to use a CNN-based system for
detecting forgery directly from the input image without first identifying the numerals. This
approach has been explored in the context of signature verification, where CNNs have been used
to detect forged signatures based on various features of the handwriting, such as stroke direction
and pressure. One example of such a system is the work of Feng et al. (2020), who proposed a
CNN-based approach for detecting forged signatures using a dataset of genuine and forged
signatures. In summary, there are several existing methods for detecting forgery in handwritten
numerals using CNNs. These methods typically involve extracting features from the handwriting
and using them to train a separate CNN-based forgery detection system. However, the specific
approach used may depend on the characteristics of the input image and the type of forgery being
detected.
The concept of convolutional layers in neural networks was introduced in the 1980s, but it wasn't
until the 2010s that convolutional neural networks (CNNs) became a popular technique for
image classification and other computer vision tasks. In MATLAB, the convolutional layer is
implemented as part of the Neural Network Toolbox (now the Deep Learning Toolbox). The
syntax for creating a convolutional layer with 3 filters of size 16x16 is shown below.
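A one-line sketch of this call (the assignment to a variable named layer is only for illustration):

layer = convolution2dLayer([16 16], 3);   % filter size 16x16, 3 filters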
This creates a layer with 3 filters, each of size 16x16, and each filter produces one output channel
(feature map). The layer applies each filter to the input image using a sliding-window approach,
computing a dot product between the filter and the overlapping region of the input image. Convolutional
layers are typically followed by activation functions (such as ReLU or sigmoid) and pooling
layers (such as max pooling or average pooling) to reduce the spatial dimensions of the output
feature maps. Overall, convolutional layers have revolutionized the field of computer vision and
are now a cornerstone of many deep learning architectures.
Batch normalization is a technique used in machine learning to improve the performance and
stability of neural networks. It was first introduced by Sergey Ioffe and Christian Szegedy in
their 2015 paper "Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift." The paper proposed a method to normalize the activations of the
hidden layers in a neural network by adjusting and scaling the mean and variance of the batch of
inputs to each layer. The batch normalization layer was then implemented in popular deep
learning frameworks such as TensorFlow and PyTorch. In MATLAB, the batch normalization
layer was introduced in the R2017a release, as part of the toolbox now known as the Deep
Learning Toolbox.
The batch normalization layer can be added to a neural network using the batchNormalizationLayer
function, which accepts name-value arguments such as the epsilon value used for numerical
stability and the learn-rate factors of its learnable parameters. In MATLAB, the batch
normalization layer can be used in various deep
learning applications, such as image classification, object detection, and natural language
processing. It has been shown to improve the training speed and accuracy of deep neural
networks, and is now considered a standard technique in modern deep learning architectures.
The Max Pooling layer is a fundamental building block in convolutional neural networks
(CNNs) for image classification tasks. In this layer, a window of fixed size (usually 2x2) is
moved over the input image or feature map, and the maximum value within each window is
extracted and used to create a new, downsampled feature map. Pooling (sub-sampling) layers
were used in the late 1990s by Yann LeCun and his colleagues in their work on convolutional
neural networks for handwritten digit recognition, and max pooling has since become a popular
and widely used technique in CNNs for various computer vision tasks. In MATLAB, the max
pooling layer can be created using the "maxPooling2dLayer" function. The function takes the
pooling window size (for example, 2x2) and optional name-value arguments such as 'Stride'; the
downsampling of the input image or feature map is then performed when data passes through the
network.
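A minimal sketch of creating this layer:

layer = maxPooling2dLayer(2, 'Stride', 2);   % 2x2 pooling windows, moved 2 pixels at a time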
The term "convolution" refers to a mathematical operation that is used in various fields,
including signal processing, image processing, and mathematics. In MATLAB, the convolution
operation can be performed using the "conv" function. Here is a brief history of the convolution
operation in MATLAB: The convolution function was first introduced in MATLAB 1.0, which
was released in 1984. The function was used to compute the convolution of two signals or arrays.
In MATLAB 5.0, which was released in 1996, the convolution function was optimized to
improve performance. The function was modified to take advantage of the Fast Fourier
Transform (FFT) algorithm for large arrays, which reduced the computation time significantly.
In MATLAB 7.0, which was released in 2004, the convolution function was further optimized to
support multi-dimensional arrays. This allowed users to perform convolution operations on
images and other multi-dimensional data sets. The "conv2" function is specifically designed for
performing 2-D convolution operations on images; it is optimized for speed and can handle large
images efficiently. To perform a convolution operation in MATLAB with a (3,32) kernel, you
can use the "conv" or "conv2" function depending on whether you are working with a one-
dimensional or two-dimensional signal/image.
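As an illustration only (the 3-by-32 kernel below is a placeholder matching the "(3,32)" mentioned above; in a layer-based network the analogous call would be convolution2dLayer(3, 32)):

I = im2double(imread('cameraman.tif'));   % two-dimensional grayscale demo image
K = ones(3, 32) / (3 * 32);               % hypothetical 3-by-32 averaging kernel
F = conv2(I, K, 'same');                  % 2-D convolution, output same size as I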
As noted earlier, a batch normalization layer normalizes each input channel across a mini-batch.
The layer first normalizes the activations of each channel by subtracting the mini-batch mean and
dividing by the mini-batch standard deviation. Then, the layer shifts the input by a learnable
offset β and scales it by a learnable scale factor γ. β and γ are themselves learnable parameters
that are updated during network training. Batch normalization layers normalize the activations
and gradients propagating through a neural network, making network training an easier
optimization problem. To take full advantage of this fact, you can try increasing the learning rate.
Since the optimization problem is easier, the parameter updates can be larger and the network
can learn faster. You can also try reducing the L2 and dropout regularization. With batch
normalization layers, the activations of a specific image during training depend on which images
happen to appear in the same mini-batch. To take full advantage of this regularizing effect, try
shuffling the training data before every training epoch. To specify how often to shuffle the data
during training, use the 'Shuffle' name-value pair argument of trainingOptions.
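In standard notation (mu_B and sigma_B^2 are the mini-batch mean and variance, epsilon a small constant for numerical stability, and gamma, beta the learnable scale and offset), the transformation applied to each activation x_i is:

\hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}}, \qquad
y_{i} = \gamma\,\hat{x}_{i} + \beta.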
A max pooling layer performs down-sampling by dividing the input into rectangular pooling
regions, and computing the maximum of each region. Create a max pooling layer using
maxPooling2dLayer. An average pooling layer performs down-sampling by dividing the input
into rectangular pooling regions and computing the average values of each region. Create an
average pooling layer using averagePooling2dLayer. Pooling layers follow the convolutional
layers for down-sampling, hence, reducing the number of connections to the following layers.
They do not perform any learning themselves, but reduce the number of parameters to be learned
in the following layers. They also help reduce overfitting.
A max pooling layer returns the maximum values of rectangular regions of its input. The size of
the rectangular regions is determined by the PoolSize argument of maxPooling2dLayer. For
example, if PoolSize equals [2, 3], then the layer returns the maximum value in regions of height
2 and width 3. An average pooling layer outputs the average values of rectangular regions of its
input. The size of the rectangular regions is determined by the PoolSize argument of
averagePooling2dLayer. For example, if PoolSize is [2, 3], then the layer returns the average
value of regions of height 2 and width 3.
Pooling layers scan through the input horizontally and vertically in step sizes you can specify
using the 'Stride' name-value pair argument. If the pool size is smaller than or equal to the stride,
then the pooling regions do not overlap. For non-overlapping regions (PoolSize and Stride are
equal), if the input to the pooling layer is n-by-n and the pooling region size is h-by-h, then the
pooling layer down-samples the regions by h [6]. That is, the output of a max or average pooling
layer for one channel of a convolutional layer is n/h-by-n/h. For overlapping regions, the output
size of a pooling layer is (InputSize – PoolSize + 2*Padding)/Stride + 1.
The history of convolution in MATLAB dates back to the earliest versions of the software.
Convolution is a fundamental operation in signal and image processing and has long been part of
MATLAB's signal processing functionality. In MATLAB,
the function conv2 is used for two-dimensional convolution. This function takes two input
matrices, computes their convolution, and returns the result as a new matrix
In addition to the conv2 function, MATLAB also provides other functions for convolution, such
as conv, convn, and filter2. These functions are used for one-dimensional convolution,
n-dimensional convolution, and 2-D filtering, respectively.
Batch normalization layer:
Batch normalization is a technique used in deep learning to improve the performance of neural
networks by normalizing the inputs to each layer. This helps to reduce the internal covariate
shift, which can slow down the training process and decrease accuracy. In MATLAB, the batch
normalization layer can be implemented using the 'batchNormalizationLayer' function, which
accepts optional name-value arguments such as the number of channels and the epsilon value
used for numerical stability. Pairing a convolution2dLayer(3, 64) with a batch normalization
layer in MATLAB is an effective way to improve the performance of CNNs for image processing
tasks. By normalizing the inputs to each layer, the network can be trained more efficiently and
achieve higher accuracy.
Finally, the ReLU layer applies the Rectified Linear Unit activation function to the output of the batch normalization layer. The ReLU function introduces non-linearity into the model, allowing it to learn more complex features and improve its accuracy. Together, these three layers (convolution, batch normalization, and ReLU) form the
backbone of many modern CNNs used for image classification tasks. Their combination has led
to significant improvements in the accuracy and speed of image recognition systems, enabling
the development of new applications in fields like computer vision, robotics, and autonomous
vehicles.
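Putting the three layers together, a typical convolution, batch normalization, and ReLU block could be written as follows (a hedged sketch, not the exact architecture used in this work; the filter count is illustrative):

block = [
    convolution2dLayer(3, 64, 'Padding', 'same')   % 3-by-3 convolution with 64 filters
    batchNormalizationLayer                        % normalize activations to stabilize training
    reluLayer                                      % introduce non-linearity
];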
Create a fully connected layer using fullyConnectedLayer. A fully connected layer multiplies the input by a weight matrix and then adds a bias vector. The convolutional (and down-sampling)
layers are followed by one or more fully connected layers. As the name suggests, all neurons in a
fully connected layer connect to all the neurons in the previous layer. This layer combines all of
the features (local information) learned by the previous layers across the image to identify the
larger patterns. For classification problems, the last fully connected layer combines the features
to classify the images. This is why the OutputSize argument of the last fully connected layer of the network is equal to the number of classes in the data set. For regression problems, the output size must be equal to the number of response variables.
You can also adjust the learning rate and the regularization parameters for this layer using the related name-value pair arguments when creating the fully connected layer. If you choose not to adjust them, then trainNetwork uses the global training parameters defined by the trainingOptions function; see the trainingOptions documentation for details on global and per-layer training options. A fully connected layer multiplies the input by a weight matrix W and then adds a bias vector b. If the input to the layer is a sequence (for example, in an LSTM network), then the fully connected layer acts independently on each time step. For example, if the layer before the fully connected layer outputs an array X of size D-by-N-by-S, then the fully connected layer outputs an array Z of size OutputSize-by-N-by-S. At time step t, the corresponding entry of Z is W*X_t + b, where X_t denotes time step t of X.
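For instance, the final fully connected layer of a classifier for, say, eight cloud particle shape classes could be created as shown below (the class count and learning-rate factor are illustrative assumptions, not values taken from this work):

numClasses = 8;                                  % hypothetical number of shape classes
fcLayer = fullyConnectedLayer(numClasses, ...
    'WeightLearnRateFactor', 2, ...              % optional per-layer learning-rate multiplier
    'Name', 'fc_final');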
A softmax layer applies a softmax function to the input. Create a softmax layer using softmaxLayer. A classification layer computes the cross-entropy loss for multi-class classification problems with mutually exclusive classes. Create a classification layer using classificationLayer. For classification problems, a softmax layer and then a classification layer must follow the final fully connected layer. The softmax function is also known as the normalized exponential and can be considered the multi-class generalization of the logistic sigmoid function. For typical classification networks, the classification layer must follow the softmax layer. In the classification layer, trainNetwork takes the values from the softmax function and assigns each input to one of the mutually exclusive classes using the cross-entropy loss.
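A minimal sketch of how these final classification stages can be appended after the last fully connected layer (the class count is an illustrative assumption):

finalLayers = [
    fullyConnectedLayer(8)     % output size equals the number of classes
    softmaxLayer               % converts scores into class probabilities
    classificationLayer        % computes the cross-entropy loss during training
];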
CHAPTER 5
ADVANTAGES AND APPLICATIONS
Advantages
Great Detectives: CNNs are like super detectives for images. They're fantastic at finding
and understanding important things in pictures.
Recognize Patterns: They're experts at spotting patterns, shapes, and details, which
helps them tell one thing from another in photos.
Perfect for Photos: CNNs are the reason your phone recognizes faces, self-driving cars
see the road, and doctors find problems in medical scans.
Handles Big Pictures: They can work with large images, making them suitable for many
different types of pictures.
Applications:
Spotting Animals: CNNs help find and count animals in the wild, helping scientists with
wildlife research.
Recognizing Faces: They make it possible for your phone to recognize your face and
unlock itself.
Driving Safely: Self-driving cars use CNNs to see the road and avoid accidents by
spotting other cars, pedestrians, and traffic signs.
Finding Diseases: In hospitals, CNNs help doctors find illnesses and problems in X-rays
and scans.
CHAPTER 6
EXPERIMENTAL RESULTS
In conclusion, the proposed cloud particle shape recognition method based on a 15-layer
convolutional neural network (CNN) offers a significant advancement in cloud microphysics
research. Addressing the limitations of subjective and time-consuming manual classification
methods, this approach leverages the efficiency and accuracy of artificial intelligence. The use of
the airborne two-dimensional stereo probe detector (2D-S) in conjunction with the CNN model
provides a robust framework for automatic cloud particle shape recognition. The incorporation of
a lightweight convolution module enhances the model's efficiency without compromising on its
ability to accurately classify cloud particle shapes. By overcoming the challenges associated with
traditional classification methods, this innovative approach not only streamlines the process but
also improves the overall classification effectiveness. The utilization of advanced technologies
like CNNs in cloud microphysics research contributes to a more comprehensive understanding of
the crucial role clouds play in climate change dynamics, emphasizing the importance of accurate
and efficient cloud characterization for climate studies.
REFERENCES
[1] L. Wu, P. Fernandez-Loaiza, J. Sauma, E. Hernandez-Bogantes, and M. Masis, "Classification of diabetic retinopathy and diabetic macular edema," World Journal of Diabetes, vol. 4, no. 6, Dec. 2013, pp. 290-294.
[3] J. D. Osborne, S. Gao, W.-B. Chen, A. Andea, and C. Zhang, “Machine classification of
melanoma and nevi from skin lesions,” in Proceedings of the 2011 ACM Symposium on Applied
Computing (SAC 2011), ACM, Mar. 2011, pp. 100-105.
[7] M. Alban and T. Gilligan, “Automated detection of diabetic retinopathy using fluorescein
angiography photographs,” Stanford Technical Report, 2016.
[8] Diabetic retinopathy detection: identify signs of diabetic retinopathy in eye images,
https://www.kaggle.com/c/diabetic-retinopathy-detection
[10] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image
recognition,” in Proceedings of International Conference on Learning Representations (ICLR
2015), Sep. 2015.
[11] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,”
Technical Report, 2009.
[13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception
architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR 2016), IEEE, Jun. 2016, pp. 2818-2826.
BIBLIOGRAPHY
Introduction To Matlab
What Is MATLAB?
The name MATLAB stands for Matrix Laboratory. The software is built up around vectors
and matrices. This makes the software particularly useful for linear algebra but MATLAB is
also a great tool for solving algebraic and differential equations and for numerical
integration. MATLAB has powerful graphic tools and can produce nice pictures in both 2D
and 3D. It is also a programming language, and is one of the easiest programming languages
for writing mathematical programs. These factors make MATLAB an excellent tool for
teaching and research.
MATLAB was written originally to provide easy access to matrix software developed by the
LINPACK (linear system package) and EISPACK (Eigen system package) projects. It
integrates computation, visualization, and programming environment. Furthermore,
MATLAB is a modern programming language environment: it has sophisticated data
structures, contains built-in editing and debugging tools, and supports object-oriented
programming. MATLAB has many advantages compared to conventional computer
languages (e.g., C, FORTRAN) for solving technical problems.
MATLAB supports a family of add-on, application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.
It has powerful built-in routines that enable a very wide variety of computations. It also has
easy to use graphics commands that make the visualization of results immediately available.
Specific applications are collected in packages referred to as toolboxes. There are toolboxes for
signal processing, symbolic computation, control theory, simulation, optimization, and
several other fields of applied science and engineering. MATLAB is an interactive system
whose basic data element is an array that does not require dimensioning. The software
package has been commercially available since 1984 and is now considered as a standard
tool at most universities and industries worldwide.
Cleve Moler, the chairman of the computer science department at the University of New
Mexico, started developing MATLAB in the late 1970s. The first MATLAB® was not a
programming language; it was a simple interactive matrix calculator. There were no
programs, no toolboxes, no graphics, and no ODEs or FFTs. He designed it to give his students access to LINPACK and EISPACK without them having to learn FORTRAN. It
soon spread to other universities and found a strong audience within the applied
mathematics community. The mathematical basis for the first version of MATLAB was a
series of research papers by J. H. Wilkinson and 18 of his colleagues, published between
1965 and 1970 and later collected in Handbook for Automatic Computation, Volume II,
Linear Algebra, edited by Wilkinson and C. Reinsch. These papers present algorithms,
implemented in Algol 60, for solving matrix linear equation and Eigen value problems.
In the 1970s and early 1980s, Moler was teaching Linear Algebra and Numerical Analysis at the University of New Mexico and wanted his students to have easy access to LINPACK and EISPACK without writing FORTRAN programs. By "easy access," he meant not going through the remote batch processing and the repeated edit-compile-link-load-execute process that was ordinarily required on the campus central mainframe computer. Jack Little, an engineer, was exposed to MATLAB during a visit Moler made to Stanford University in 1983. Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded MathWorks in 1984 to continue its development. These rewritten libraries were known as JACKPAC. In 2000, MATLAB was rewritten to use a newer set of libraries for matrix manipulation, LAPACK. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to
many other domains. It is now also used in education, in particular the teaching of linear
algebra and numerical analysis, and is popular amongst scientists involved in video
processing.
EISPACK and LINPACK:
In 1975, Jack Dongarra, Pete Stewart, Jim Bunch, and Cleve Moler proposed to the NSF another research project that would investigate methods for the development of mathematical software. A byproduct would be the software itself, dubbed LINPACK, for Linear Equation Package. This project was also centered at Argonne. LINPACK originated in FORTRAN; it did not involve translation from Algol. The package contained 44 subroutines in each of four numeric precisions. In a sense, the LINPACK and EISPACK projects were failures. The research projects proposed to the NSF were to "explore the methodology, costs, and resources required to produce, test, and disseminate high-quality mathematical software." The group never wrote a report or paper addressing those objectives; they only produced software.
Moler then studied Niklaus Wirth's book Algorithms + Data Structures = Programs and learned how to parse programming languages. He wrote the first MATLAB, an acronym for Matrix Laboratory, in FORTRAN, with the matrix as the only data type. The project was a kind of hobby, a new aspect of programming for him to learn and something for his students to use. There was never any formal outside support, and certainly no business plan. This first MATLAB was just an interactive matrix calculator, and its start-up screen listed all the reserved words and functions; there were only 71. To add another function, a user had to get the source code from Moler, write a FORTRAN subroutine, add the function name to the parse table, and recompile MATLAB.
Starting MATLAB:
After logging into your account, you can enter MATLAB by double-clicking on the
MATLAB shortcut icon (MATLAB 7.0.4) on your Windows desktop. When you start
MATLAB, a special window called the MATLAB desktop appears. The desktop is a window
that contains other windows. The major tools within or accessible from the desktop are:
The Command Window
The Command History
The Workspace
The Current Directory
The Help Browser
Current Folder: This panel allows you to access the project folders and files.
Command Window: This is the main area where commands can be entered at the
command line. It is indicated by the command prompt (>>).
Workspace: The workspace shows all the variables created and/or imported from files.
Command History: This panel shows or returns commands that have been entered at the command line.
Help Browser:
The principal way to get help online is to use the MATLAB Help Browser, opened as a separate window either by clicking the question mark icon (?) on the desktop toolbar or by typing helpbrowser at the prompt in the Command Window. The Help Browser is a web browser integrated into the MATLAB desktop that displays Hypertext Markup Language (HTML) documents. The Help Browser consists of two panes: the Help Navigator pane, used to find information, and the display pane, used to view the information. Self-explanatory tabs in the navigator pane are used to perform searches.
MATLAB language:
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick and dirty throw-away programs, and
"programming in the large" to create complete large and complex application programs.
MATLAB DESKTOP:
The MATLAB Desktop is the main MATLAB application window. The desktop contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory window, the Command History window, and one or more Figure windows, which are shown only when the user displays a graphic. The Command Window is where the user types MATLAB commands and expressions at the prompt (>>) and where the output of those commands is displayed. MATLAB defines the workspace as the set of variables that the user creates in a work session. The Workspace Browser shows these variables and some information about them. Double-clicking on a variable in the Workspace Browser launches the Array Editor, which can be used to obtain information about, and in some cases edit, certain properties of the variable.
The Current Directory tab above the Workspace tab shows the contents of the current directory, whose path is shown in the Current Directory window. For example, on a Windows operating system the path might be C:\MATLAB\Work, indicating that the directory "Work" is a subdirectory of the main directory "MATLAB", which is installed in drive C. Clicking on the arrow in the Current Directory window shows a list of recently used paths. Clicking on the button to the right of the window allows the user to change the current directory. MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories on the computer's file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu of the desktop and then use the Set Path dialog box. It is good practice to add any commonly used directories to the search path to avoid repeatedly having to change the current directory.
The Command History window contains a record of the commands a user has entered in the Command Window, including both the current and previous MATLAB sessions. Previously entered MATLAB commands can be selected and re-executed from the Command History window by right-clicking on a command or sequence of commands. This action launches a menu from which various options, in addition to executing the commands, can be selected. This is a useful feature when experimenting with various commands in a work session.
Applications of MATLAB:
MATLAB can be used as a tool for simulating various electrical networks, but recent developments make it a very competitive tool for artificial intelligence, robotics, video processing, wireless communication, machine learning, data analytics, and more. Though it is mostly used by electrical and mechanical engineering branches to solve a basic set of problems, its range of applications is vast. It is a tool that enables computation, programming, and graphical visualization of results. The basic data element of MATLAB, as the name suggests, is the matrix, or array. MATLAB toolboxes are professionally built and enable you to turn your ideas into reality. MATLAB programming is quite similar to C programming and requires only a little brushing up of basic programming skills to start working with.
Curve fitting
The Curve Fitting Toolbox helps to analyze patterns in data. Once a particular trend, which can be a curve or a surface, is obtained, future values can be predicted from it. Further plotting, calculation of integrals and derivatives, interpolation, etc. can then be done.
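As a simple illustration of curve fitting (a minimal sketch using the base MATLAB functions polyfit and polyval rather than the Curve Fitting Toolbox; the data are synthetic):

x = 0:0.5:10;
y = 2*x.^2 - 3*x + 1 + randn(size(x));   % noisy quadratic data (synthetic)
p = polyfit(x, y, 2);                    % fit a 2nd-degree polynomial trend
yFit = polyval(p, x);                    % evaluate the fitted trend
plot(x, y, 'o', x, yFit, '-');           % compare the data with the fitted curve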
Control systems
The nature of a system can be determined. Characteristics such as closed-loop and open-loop behavior, controllability and observability, Bode plots, Nyquist plots, etc. can be obtained. Various control techniques such as PD, PI, and PID can be visualized. Analysis can be done in the time domain or the frequency domain.
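For example, with the Control System Toolbox installed, a transfer function can be defined and its responses examined (a minimal sketch; the system is arbitrary):

sys = tf(1, [1 2 1]);    % G(s) = 1/(s^2 + 2s + 1)
bode(sys);               % Bode plot (frequency domain)
figure; nyquist(sys);    % Nyquist plot
figure; step(sys);       % step response (time domain)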
Signal Processing
Signals and systems and digital signal processing are taught in various engineering streams.
But MATLAB provides the opportunity for proper visualization of this. Various transforms
such as Laplace, Z, etc. can be done on any given signal. Theorems can be validated.
Analysis can be done in the time domain or frequency domain. There are multiple built-in
functions that can be used.
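A minimal sketch of frequency-domain analysis with the built-in fft function (the signal parameters are arbitrary):

fs = 1000;                                   % sampling frequency in Hz
t  = 0:1/fs:1-1/fs;                          % one second of samples
x  = sin(2*pi*50*t) + 0.5*sin(2*pi*120*t);   % two-tone test signal
X  = fft(x);                                 % discrete Fourier transform
f  = (0:length(x)-1)*fs/length(x);           % frequency axis in Hz
plot(f, abs(X)/length(x));                   % magnitude spectrum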
Mapping
Mapping has multiple applications in various domains. For example, in Big Data, the Map
Reduce tool is quite important which has multiple applications in the real world. Theft
analysis or financial fraud detection, regression models, contingency analysis, predicting
techniques in social media, data monitoring, etc. can be done by data mapping.
Deep learning
It’s a subclass of machine learning which can be used for speech recognition, financial fraud
detection, and medical video analysis. Tools such as time-series, Artificial neural network
(ANN), Fuzzy logic or combination of such tools can be employed.
Financial analysis
An entrepreneur before starting any endeavor needs to do a proper survey and the financial
analysis in order to plan the course of action. The tools needed for this are all available in
MATLAB. Elements such as profitability, solvency, liquidity, and stability can be identified.
Business valuation, capital budgeting, cost of capital, etc. can be evaluated.
Video processing
The most common applications that we observe almost every day are bar-code scanners, selfie features (face beautification, background blurring, face detection), video enhancement, etc. Digital video processing also plays quite an important role in transmitting data from far-off satellites and in receiving and decoding it. Algorithms to support all such applications
are available.
Text analysis
Based on the text, sentiment analysis can be done. Google gives millions of search results for
any text entered within a few milliseconds. All this is possible because of text analysis.
Handwriting comparison in forensics can be done. No limit to the application and just one
software which can do this all.
Electric vehicles designing
This is used for modeling electric vehicles and analyzing their performance as system inputs change. Speed-torque comparison, vehicle design and simulation, and more can be carried out.
Aerospace
This toolbox in MATLAB is used for analyzing navigation and for visualizing flight simulation.
Audio toolbox
Provides tools for audio processing, speech analysis, and acoustic measurement. It also
provides algorithms for audio and speech feature extraction and audio signal transformation.
DIGITAL IMAGE/VIDEO PROCESSING
Digital image processing:
Digital image processing means processing digital images by means of a digital computer. We can also say that it is the use of computer algorithms to enhance an image or to extract useful information from it.
Image:
An image is defined as a two-dimensional function, F(x, y), where x and y are spatial
coordinates, and the amplitude of F at any pair of coordinates (x, y) is called the intensity of
that image at that point. When x, y, and amplitude values of F are finite, we call it a digital
image. In other words, an image can be defined by a two-dimensional array specifically
arranged in rows and columns. An image is composed of a finite number of elements, each of which has a particular value at a particular location. These elements are referred to as picture elements, image elements, or pixels.
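In MATLAB, this definition maps directly onto an array: reading an image returns a matrix whose entries are the pixel intensities (a minimal sketch; the file name 'cloud.png' is hypothetical):

F = imread('cloud.png');          % hypothetical file; F is an M-by-N (or M-by-N-by-3) array
[rows, cols, channels] = size(F); % image dimensions; channels is 1 for a grayscale image
intensity = F(10, 20, 1);         % amplitude of F at row 10, column 20 of the first channel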
Capturing an image with a camera is a physical process. Sunlight is used as a source of energy, and a sensor array is used for the acquisition of the image. When the sunlight falls upon the object, the amount of light reflected by that object is sensed by the sensors, and a continuous voltage signal is generated from the sensed data. In order to create a digital image, we need to convert this data into digital form. This involves sampling and quantization (discussed later on). The result of sampling and quantization is a two-dimensional array or matrix of numbers, which is nothing but a digital image.
An image may be represented as a two-dimensional function f(x, y), in which x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. The field of digital image processing (DIP) refers to processing digital images by means of a digital computer, for example using MATLAB. A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as pixels.
Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, in contrast to humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM range, from gamma rays to radio waves. They can also operate on images generated by sources that humans do not normally associate with images, such as aircraft imagery. There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start.
In some cases a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. This is a limiting and somewhat artificial boundary. The area of image analysis (image understanding) lies between image processing and computer vision. There are no clear-cut boundaries in the continuum from image processing at one end to computer vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, and high-level processes. Low-level processes involve primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs and outputs are images.
Image enhancement:
It is among the simplest and most appealing areas of image processing. It is also used to bring out hidden detail in an image, and it is subjective.
Image restoration:
It also deals with improving the appearance of an image, but it is objective (restoration is based on mathematical or probabilistic models of image degradation).
Color image processing:
It deals with pseudo-color and full-color image processing; color models applicable to digital image processing are used.
Image compression:
It involves developing functions to perform this operation; it mainly deals with image size or resolution.
Morphological processing:
It deals with tools for extracting image components that are useful in the representation &
description of shape.
Segmentation:
Segmentation partitions an image into its constituent parts or objects. Its output is raw pixel data, and choosing a representation for that output is only part of the solution for transforming raw data into processed data.
Color image:
A color image may be represented by three functions: R(x, y) for red, G(x, y) for green, and B(x, y) for blue. An image may be continuous with respect to the x and y coordinates and also in amplitude. Converting such an image to digital form requires that both the coordinates and the amplitude be digitized. Digitizing the coordinate values is called sampling. Digitizing the amplitude values is called quantization.
Grayscale image:
The image has 8 bits per pixel and 256 tones of gray; 0 = black and 255 = white. It requires 8 times more storage space than a line-art image. It is suitable for presenting black-and-white photographs, for instance, and can be used in office printing.
Image Types:
1. Intensity images;
2. Binary images;
3. Indexed images;
4. RGB images.
Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these image types. Indexed and RGB color images are discussed afterwards.
Intensity Images:
An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] or [0, 65535], respectively. If the image is of class double, the values are floating-point numbers. Values of scaled, class double intensity images are in the range [0, 1] by convention.
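A short sketch of these class conventions (the file name is hypothetical; im2double may require the Image Processing Toolbox in older releases):

I8 = imread('cloud_gray.png');    % hypothetical uint8 intensity image, values in [0, 255]
Id = im2double(I8);               % class double image, values scaled to [0, 1]
class(I8), class(Id)              % 'uint8' and 'double'
valueRange = [min(Id(:)) max(Id(:))]   % confirms the [0, 1] range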
Binary Images:
B = logical(A)
If A consists of elements other than 0s and 1s, applying the logical function converts all nonzero elements to logical 1s and all elements with value 0 to logical 0s. Using relational and logical operators likewise creates logical arrays. To check whether an array is logical, we use the islogical function: islogical(C). If C is a logical array, this function returns 1; otherwise it returns 0. Logical arrays can be converted to numeric arrays using data-type conversion functions.
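A short sketch of the conversions described above (the matrix A is arbitrary):

A = [0 2 0; 5 0 1];    % numeric array with values other than 0s and 1s
B = logical(A);        % nonzero elements become logical 1, zeros stay 0
islogical(B)           % returns 1 (true)
C = double(B);         % convert the logical array back to a numeric class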
Indexed Images:
An indexed image has two components: an integer data matrix, X, and a colormap matrix, map. The colormap is an m-by-3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green, and blue components of a single color. Indexed images use "direct mapping" of pixel intensity values to colormap values: the color of each pixel is determined by using the corresponding value of the integer matrix X as an index into map. If X is of class double, then all of its components with values less than or equal to 1 point to the first row of map, components with value 2 point to the second row, and so on. If X is of class uint8 or uint16, then all components with value 0 point to the first row of map, components with value 1 point to the second row, and so on.
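For instance, an indexed image and its colormap can be read and converted to RGB as follows (the file name is hypothetical):

[X, map] = imread('cloud_indexed.png');  % hypothetical indexed image plus its colormap
size(map)                                % m-by-3, one row per color
rgb = ind2rgb(X, map);                   % direct mapping of indices into the colormap
imshow(rgb);                             % display requires the Image Processing Toolbox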
RGB Image:
An RGB color image is an M-by-N-by-3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a "stack" of three grayscale images that, when fed into the red, green, and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1]. Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep. Generally, the number of bits in all component images is the same. In that case, the number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in each component image. For the 8-bit case, the number is 16,777,216 colors.
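For example, the three component images of an RGB array can be separated and recombined as follows (a minimal sketch; the file name is hypothetical and imshow requires the Image Processing Toolbox):

rgb = imread('cloud_rgb.png');   % hypothetical M-by-N-by-3 image
R = rgb(:, :, 1);                % red component image
G = rgb(:, :, 2);                % green component image
B = rgb(:, :, 3);                % blue component image
imshow(cat(3, R, G, B));         % stacking the components reproduces the color image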
Advantages of digital image:
The processing of images is faster and more cost-effective. One needs less time for processing, as well as less film and other photographic equipment.
It is more ecological to process images. No processing or fixing chemicals are needed to
take and process digital images. However, printing inks are essential when printing
digital images.
When shooting a digital image, one can immediately see if the image is good or not. Copying a digital image is easy, and the quality of the image stays good unless it is compressed. For instance, saving an image in JPG format compresses the image. By resaving the image in JPG format, the compressed image will be recompressed, and the quality of the image will get worse with every save.
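The effect described above can be reproduced with imwrite, whose 'Quality' parameter controls JPEG compression (the file names are hypothetical):

img = imread('original.png');              % hypothetical lossless source image
imwrite(img, 'copy1.jpg', 'Quality', 75);  % first lossy save
img2 = imread('copy1.jpg');
imwrite(img2, 'copy2.jpg', 'Quality', 75); % resaving recompresses and degrades the image further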
Fixing and retouching of images has become easier. In new Photoshop 7, it is possible to
smooth face wrinkles with a new Healing Brush Tool in a couple of seconds.
Reproduction (compared with reproducing the image with a repro camera) is faster and cheaper.
By changing the image format and resolution, the image can be used in a number of
media.
Some of the major fields in which digital image processing is widely used are mentioned
below.
Medical field
Remote sensing
Transmission and encoding
Machine/Robot vision
Color processing
Pattern recognition
Video processing
Microscopic Imaging