UNIT 2 Self Notes

UNIT :- 2
 What is Computer Vision?

 Computer vision is a field of artificial intelligence (AI) that enables computers and
systems to derive meaningful information from digital images, videos, and other
visual inputs — and take actions or make recommendations based on that
information.
 Deep learning, a subset of machine learning, has been particularly successful in
computer vision tasks due to its ability to automatically learn hierarchical
representations of data. Convolutional Neural Networks (CNNs) are a type of deep
neural network commonly used in computer vision tasks.
 Here are some common computer vision tasks in deep learning:

1. Image Classification
2. Object Detection
3. Image Segmentation
4. Facial Recognition
5. Gesture Recognition
6. Scene Recognition
7. OCR (Optical Character Recognition)
 Convolutional Neural Network

 CNN stands for Convolutional Neural Network.
 It is a specialized type of artificial neural network (ANN) that is designed to
automatically learn patterns and features from input images.
 Put another way, it is a type of deep neural network designed for processing
structured grid data, such as images and videos, and it is commonly used in
computer vision tasks.
 CNNs have been particularly successful in image-related tasks because of their
ability to automatically learn hierarchical representations of features.
 Main parts of CNN architecture.
 There are three types of layers that make up the CNN which are the convolutional
layers, pooling layers, and fully connected (FC) layers.
 When these layers are stacked, a CNN architecture will be formed.
1. Convolution Layer
2. Pooling Layer
3. Fully Connected Layer
1. Convolution Layer
 This is the first layer, and it is used to extract features from the input
images.
 In this layer, the mathematical operation of convolution is performed between
the input image and a filter (kernel) of a particular size MxM.
 The filter slides over the input image, and at each position the dot product is
taken between the filter and the MxM patch of the image that it covers.
 The output is termed the feature map, which gives us information about the image
such as its corners and edges.
 Later, this feature map is fed to other layers to learn several other features of the
input image.
 After applying the convolution operation to its input, the convolution layer
passes the result to the next layer.
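The sliding dot product described above can be sketched in plain Python. This is a minimal illustration, assuming no padding and a stride of 1; the function name `convolve2d`, the toy image, and the 2x2 edge filter are made up for this example. Like most deep learning libraries, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Dot product between the filter and the image patch it currently covers.
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, which is where the filter sits on top of the edge. The 4x4 input also shrinks to 3x3 because no padding is used.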
2. Pooling Layer
 A pooling layer is a crucial component of Convolutional Neural Networks
(CNNs) that is used to reduce the spatial dimensions of the input data.
 The primary purpose of pooling is to progressively reduce the spatial size of
the representation, and with it the number of parameters and computations in the
network, thereby helping to control overfitting and computational complexity.
 Pooling is typically applied after convolutional layers in a CNN.
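 There are two types of pooling layer:

i. Max Pooling: keeps the largest value in each pooling window.
ii. Average Pooling: keeps the mean of the values in each pooling window.
Advantages of Pooling Layer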

1. Dimensionality Reduction
2. Translation Invariance
3. Feature Selection
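Max pooling and average pooling can be compared side by side with a small sketch. It assumes non-overlapping 2x2 windows (the stride equals the window size); the function name `pool2d` and the sample values are illustrative only.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2D feature map."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current pooling window.
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[7, 2], [2, 8]]
print(avg_pooled)  # [[4.0, 1.0], [1.0, 5.0]]
```

Both variants reduce the 4x4 map to 2x2 (dimensionality reduction). Max pooling keeps only the strongest response in each window, which is why it is often described as a form of feature selection.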
 Padding
• Padding is a technique used to preserve the spatial dimensions of the input image after
convolution operations on a feature map, and it can improve the performance of the model.
• Without padding, every convolution shrinks the feature map, and pixels at the border are
covered by the filter less often than pixels in the centre.
• Padding addresses this by adding extra pixels, usually layers of zeros, around the border of
the input feature map before convolution.
• This can be done in two ways:

• Valid Padding: In valid padding, no padding is added to the input feature map, so the
output feature map is smaller than the input feature map. This is useful when we want to reduce
the spatial dimensions of the feature maps.
• Same Padding: In same padding, padding is added to the input feature map such that the
size of the output feature map is the same as that of the input feature map (for a stride of 1).
This is useful when we want to preserve the spatial dimensions of the feature maps.
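Zero padding itself is a small operation, sketched below in plain Python. The helper name `zero_pad` and the 3x3 sample are assumptions made for illustration.

```python
def zero_pad(feature_map, pad):
    """Surround a 2D feature map with `pad` rows and columns of zeros on every side."""
    width = len(feature_map[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]               # zero rows on top
    for row in feature_map:
        padded.append([0] * pad + list(row) + [0] * pad)     # zeros left and right
    padded += [[0] * width for _ in range(pad)]              # zero rows at the bottom
    return padded


feature_map = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
padded = zero_pad(feature_map, pad=1)
for row in padded:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 0]
# [0, 4, 5, 6, 0]
# [0, 7, 8, 9, 0]
# [0, 0, 0, 0, 0]
```

With one layer of zeros the 3x3 map becomes 5x5, so a 3x3 filter with stride 1 again produces a 3x3 output. That is the "same" case. With `pad=0` (the "valid" case) the same filter would produce a single 1x1 output.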
 Striding
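A strided convolution is another basic building block used in Convolutional Neural
Networks. The stride is the number of pixels the filter moves at each step; with a stride
greater than 1 the filter skips positions, so the output feature map is smaller.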
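The effect of filter size, padding, and stride on the output size follows the standard formula output = floor((n + 2p - m) / s) + 1, where n is the input size, m the filter size, p the padding, and s the stride. The symbols n, p, and s and the function name below are notation chosen for this sketch; only MxM comes from the notes.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output width (or height) of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


print(conv_output_size(6, 3))                          # 4   (valid padding shrinks the map)
print(conv_output_size(6, 3, padding=1))               # 6   (same padding keeps the size)
print(conv_output_size(7, 3, stride=2))                # 3   (a stride of 2 roughly halves it)
print(conv_output_size(224, 7, padding=3, stride=2))   # 112
```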
3. Fully Connected Layer
 After several convolutional and pooling layers, fully connected layers are
often used to make predictions based on the learned features. In these layers,
every neuron is connected to every neuron in the previous layer.
 These layers are usually placed just before the output layer and form the last few layers
of a CNN architecture.
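A fully connected layer can be sketched as a flatten step followed by one weighted sum per output neuron. The weights, biases, and the two-class setup below are made-up values for illustration, not learned ones.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Each output neuron is a weighted sum over every input value, plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


pooled = [[[7, 2], [2, 8]]]      # a single 2x2 feature map
features = flatten(pooled)        # [7, 2, 2, 8]
weights = [
    [0.5, 0.0, 0.0, 0.25],        # hypothetical weights for class 0
    [0.0, 0.5, 0.5, 0.0],         # hypothetical weights for class 1
]
biases = [0.0, -1.0]
scores = dense(features, weights, biases)
print(scores)  # [5.5, 1.0]
```

Every output score depends on all four input values, which is what "fully connected" means. In a real network the scores would then go through an activation such as softmax.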
4. Dropout
• When all the features are connected to the FC layer, the model can overfit
the training dataset.
• Overfitting occurs when a model fits the training data so closely that its
performance suffers when it is used on new data.
• To overcome this problem, a dropout layer is used, in which a few neurons are
randomly dropped from the neural network during training, so that each
training step uses a smaller, thinned version of the network.
• Dropout improves the performance of a machine learning model on new data because it
prevents overfitting by stopping the network from relying too heavily on any single neuron.
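A minimal sketch of dropout follows. It uses the "inverted dropout" variant, an assumption on top of the notes, in which the surviving activations are scaled up during training so that nothing needs to change at test time. The function name and the rate of 0.5 are illustrative.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each activation with probability `rate` during training."""
    if not training or rate == 0:
        return list(activations)      # at test time every neuron is kept
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected activation stays the same.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                # seeded so that a run is repeatable
activations = [1.0] * 10
train_out = dropout(activations, rate=0.5, rng=rng)
test_out = dropout(activations, rate=0.5, training=False)
print(train_out)   # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
print(test_out)    # unchanged: ten values of 1.0
```

The neurons are only switched off temporarily, and a different random subset is dropped at every training step. The full network is used for prediction.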
5. Activation Functions
• Finally, one of the most important components of a CNN model is the
activation function.
• It adds non-linearity to the network.
• Several activation functions are commonly used, such as ReLU,
Softmax, tanh, and Sigmoid.
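The four activation functions named above have short standard definitions, sketched here with only the Python standard library. ReLU, sigmoid, and tanh act on a single value, while softmax acts on a whole list of scores and turns it into probabilities.

```python
import math


def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(x)


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]   # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))     # 0.0 3.0
print(sigmoid(0.0))              # 0.5
print(tanh(0.0))                 # 0.0
print(softmax([2.0, 1.0, 0.1]))  # three probabilities, largest first
```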
 Transfer Learning
 Transfer learning is a technique in deep learning where a pre-trained neural
network model (a model that has already been trained) is used as a starting point
for a new, related task, instead of training a neural network from scratch for that
task.
 The idea is that the knowledge learned from one task can be transferred to another
task, potentially saving a lot of training time and data.
 Working of Transfer learning

i. Pre-trained Model:
 Start with a pre-trained CNN model, such as VGG, ResNet, Inception, or
MobileNet, that has been trained on a large dataset like ImageNet for image
classification.
ii. Remove Last Layers:
 Remove the final classification layers (output layers) of the pre-trained model,
which are specific to the original task.
iii. Add New Layers:
 Add new layers to the model. These layers should be tailored to your specific task.
The number and structure of these layers depend on the complexity of your task.
iv. Fine-tuning:
 Optionally, you can choose to fine-tune some of the layers of the pre-trained model
on your task. Fine-tuning allows the model to adapt to the new data while retaining
some of the knowledge from the pre-trained model.
v. Training:
 Train the modified model on your dataset, which is typically smaller and more task-
specific than the original dataset.
Transfer learning is especially beneficial when you have limited data because the pre-
trained model has already learned useful features from a large dataset, which can be
applied to your smaller dataset.
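The workflow above can be imitated with a toy example in plain Python: a frozen base plus a new, trainable output layer. This is only a sketch under strong assumptions. `PRETRAINED_WEIGHTS` is a made-up stand-in for a real pre-trained model such as VGG or ResNet, and the four-sample dataset is invented. In practice the base would be loaded from a deep learning framework.

```python
import math

# Hypothetical stand-in for a pre-trained base network (step i). In practice these
# weights would come from a model trained on a large dataset such as ImageNet.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def base_features(x):
    """Frozen base: a linear layer followed by ReLU. Its weights are never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_new_head(inputs, labels, epochs=200, lr=0.5):
    """Steps iii and v: add a new output layer and train only that layer."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            feats = base_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = pred - y   # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    feats = base_features(x)
    return 1 if sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias) >= 0.5 else 0


# Tiny task-specific dataset: label 1 when the first value is the larger one.
inputs = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head_weights, head_bias = train_new_head(inputs, labels)
predictions = [predict(x, head_weights, head_bias) for x in inputs]
print(predictions)
```

Only the three numbers of the new head are learned, which is why so little data is needed. Fine-tuning (step iv) would additionally allow some of the base weights to change, usually with a small learning rate.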
Advantages of Transfer Learning

1. Improved Learning Speed
2. Better Generalisation
3. State-of-the-Art Performance
4. Lower Data Requirements
5. Domain Adaptation
• Fine-tuning: Fine-tuning is the process of further training a pre-trained model on a new task. It
can be thought of as a specific application of transfer learning.
• Steps in Fine-Tuning

i. Select a Pre-trained Model
ii. Remove the Top Layers
iii. Add New Layers
iv. Freeze or Unfreeze Layers
v. Data Augmentation
vi. Choose a Learning Rate
vii. Loss Function and Metrics
viii. Training
ix. Hyperparameter Tuning
x. Evaluate and Fine-Tune
xi. Testing
• Feature extraction: Here, the developer uses pre-trained models to extract features
from new data. Then they use the best features to train a new classifier.
• Domain adaptation: It works by adapting a pre-trained model to a new domain by
fine-tuning it on the target domain data.
• Multi-task learning: The focus is to train a single model on multiple tasks to improve
performance on all tasks.
• Zero-shot learning: It involves the use of pre-trained models to make predictions on
new classes without any training data for those classes.
 Image Classification
 Step 1: Data Preparation
o Collect and Prepare Your Dataset: Gather a labeled dataset of images for
training and testing. Ensure that the dataset is balanced and representative of the
classes you want to classify.
o Data Preprocessing: Preprocess the images by resizing them to a consistent
size (e.g., 224x224 pixels), normalizing pixel values (usually to the range [0, 1]),
and augmenting the data if needed (applying random transformations like
rotation, flipping, and cropping to increase dataset diversity).
o Split the Dataset: Divide the dataset into training, validation, and test sets.
Typically, you allocate a larger portion to training (e.g., 70-80%) and the rest to
validation and testing.
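Two of the preparation steps above, pixel normalization and the dataset split, are sketched below. The 70/15/15 split, the fixed seed, and the file names are assumptions chosen for illustration.

```python
import random


def normalize(pixels):
    """Scale 8-bit pixel values (0 to 255) to the range [0, 1]."""
    return [p / 255.0 for p in pixels]


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the samples, then cut them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)     # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever remains is the test set
    return train, val, test


# Hypothetical labelled dataset: (file name, class label) pairs.
samples = [(f"img_{i}.png", i % 2) for i in range(20)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))   # 14 3 3
print(normalize([0, 255]))               # [0.0, 1.0]
```

Shuffling before splitting matters when the files are stored class by class; otherwise one of the sets could end up containing a single class.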
 Step 2: Build the CNN Model
o Choose a Pre-trained Model (Optional): Consider using a pre-trained CNN
model like VGG, ResNet, Inception, or MobileNet as a starting point. These
models are trained on large datasets (e.g., ImageNet) and have learned useful
features. You can fine-tune these models for your specific task.
o Custom CNN Architecture (Alternative): If you prefer to build your own
CNN architecture, design a stack of convolutional layers, pooling layers, and
fully connected layers. Ensure that the architecture suits the complexity of your
classification problem.
o Compile the Model: Define the loss function (typically categorical cross-
entropy for classification), the optimizer (e.g., Adam, SGD), and the evaluation
metric (e.g., accuracy).
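To make the loss and metric concrete, here is a small sketch of categorical cross-entropy for one sample, together with accuracy. The function names, the clipping constant `eps`, and the sample probabilities are assumptions for illustration; frameworks provide their own implementations.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: y_true is one-hot, y_pred holds predicted probabilities."""
    # max(p, eps) avoids log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


y_true = [0, 1, 0]                                    # the true class is class 1
good = categorical_cross_entropy(y_true, [0.1, 0.8, 0.1])
bad = categorical_cross_entropy(y_true, [0.7, 0.2, 0.1])
print(good < bad)                                     # True
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))           # 0.75
```

The loss is small when the model puts high probability on the correct class and grows quickly as that probability falls. This is the quantity the optimizer tries to minimize.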
 Step 3: Training the CNN Model
o Training: Feed the training data into the CNN model and start training. During
training, the model adjusts its weights to minimize the loss function. This
process may take several epochs (passes over the entire training set). Monitor
performance on the validation set during training to detect overfitting.
 Step 4: Evaluate and Fine-Tune
o Validation: After training, evaluate the model's performance on the validation
set. This helps you tune hyperparameters, such as learning rate, batch size, and
model architecture, for better results.
o Fine-Tuning (Optional): Depending on validation results, you may decide to
fine-tune the model by adjusting layers, adding regularization (e.g., dropout), or
training for more epochs.
 Step 5: Testing and Deployment
o Testing: Once you are satisfied with the model's performance, evaluate it on the
separate test dataset to assess its generalization to unseen data.
o Deployment: If the model performs well, deploy it in your application for real-
time image classification. This could be in the form of a web app, mobile app, or
integration into an existing system.
o Monitoring and Maintenance: Continuously monitor the model's performance
in the production environment and retrain it periodically with new data if
necessary.
 Text Classification
 Text classification is the process of categorizing text into organised groups.
 Text classification is becoming an important part of business because it allows us to easily get
insights from data and automate business processes.
 The process of text classification typically involves several steps, including text pre-
processing, feature extraction, and machine learning model training.
 There are different ways to pre-process text:

 Stop word removal
 Stop words are common words in a language that are usually removed from text data
during pre-processing because they do not carry significant meaning and do not
contribute to the overall understanding of the text.
 In other words, stop words are the words that are generally filtered out before
natural-language text is processed.
 Examples of stop words in English include "the", "a", "an", "and", "in", "of", "to", etc.
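Stop word removal is a simple filter, sketched below. The stop word set contains only the examples listed above; real toolkits ship much longer lists.

```python
# Only the example stop words from the notes; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "in", "of", "to"}


def remove_stop_words(tokens):
    """Drop every token that is a stop word, ignoring upper and lower case."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


tokens = "The quick brown fox jumps over the lazy dog".split()
filtered = remove_stop_words(tokens)
print(filtered)  # ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```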
 Tokenization
 Tokenization is the process of breaking down a piece of
text into smaller units called tokens, which are usually words or sub-words.
 For example, the sentence "The quick brown fox jumps over the lazy dog" would be
broken down into the following tokens: "The", "quick", "brown", "fox", "jumps",
"over", "the", "lazy", "dog".
 Kinds Of Tokenization
1. Word Tokenization
2. Character Tokenization
3. Sub-Word Tokenization
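The three kinds of tokenization can be sketched as follows. The word and character versions are straightforward. The sub-word version is a simplified greedy longest-match over a tiny, hypothetical vocabulary; real sub-word tokenizers such as BPE or WordPiece learn their vocabulary from data.

```python
import re


def word_tokenize(text):
    """Split text into runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text)


def char_tokenize(text):
    """Split text into individual characters."""
    return list(text)


def subword_tokenize(word, vocab):
    """Greedy longest-match split into pieces from `vocab`, falling back to single characters."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


print(word_tokenize("The quick brown fox jumps over the lazy dog"))
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
print(char_tokenize("fox"))                           # ['f', 'o', 'x']
print(subword_tokenize("jumping", {"jump", "ing"}))   # ['jump', 'ing']
```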
 Stemming
 Stemming is the process of reducing a word to its base or root form, called a stem,
by removing its suffixes and prefixes. The stem need not be a dictionary word;
reducing words to their dictionary forms, known as "lemmas", is the related
process of lemmatization.
 For example, the word "running" can be stemmed to "run", and the word "cats" can be stemmed
to "cat".
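A deliberately naive stemmer shows the idea of suffix stripping. The suffix list and the doubled-consonant rule below are simplifying assumptions, and the sketch will mis-stem many words. Established algorithms such as the Porter stemmer use a much larger rule set.

```python
def simple_stem(word):
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run" (but keep "fall", "pass").
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word


for word in ["running", "cats", "jumped", "dog"]:
    print(word, "->", simple_stem(word))
# running -> run
# cats -> cat
# jumped -> jump
# dog -> dog
```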
 Different Types of frequencies
i. Document Frequency
ii. Global Frequency
iii. Term Frequency
iv. IDF
v. TF-IDF
TF-IDF = TF * IDF
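The formula TF-IDF = TF * IDF can be worked through on a tiny corpus. The sketch assumes one common set of definitions: TF is the term's share of the tokens in a document, and IDF is the natural logarithm of (number of documents / document frequency). Other variants, for example with smoothing, are also in use. The three-document corpus is invented.

```python
import math


def term_frequency(term, doc_tokens):
    """Share of the document's tokens that are `term`."""
    return doc_tokens.count(term) / len(doc_tokens)


def document_frequency(term, corpus):
    """Number of documents that contain `term` at least once."""
    return sum(1 for doc in corpus if term in doc)


def inverse_document_frequency(term, corpus):
    """log(N / DF); assumes the term occurs in at least one document."""
    return math.log(len(corpus) / document_frequency(term, corpus))


def tf_idf(term, doc_tokens, corpus):
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
print(tf_idf("the", corpus[0], corpus))   # 0.0, because "the" appears in every document
print(tf_idf("cat", corpus[0], corpus))   # (1/3) * ln(3/2), about 0.135
print(tf_idf("dog", corpus[1], corpus))   # (1/3) * ln(3), about 0.366
```

A word that appears everywhere gets a score of zero, while a word that is frequent in one document but rare in the corpus gets the highest score. That is why TF-IDF is useful as a feature for text classification.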