AIDS II (1)
1. Explain various components of CNN and their working with example. (CO4)
A convolutional neural network (CNN) consists of an input layer, hidden layers, and an output layer. The middle layers are called
hidden layers because their inputs and outputs are governed by the activation function and the convolution operation.
In a CNN, the input is a tensor with shape:
(number of inputs) x (input height) x (input width) x (input channels).
After passing through a convolutional layer, the image becomes a feature map, also called an activation map,
with shape:
(number of inputs) x (feature map height) x (feature map width) x (feature map channels).
The input is convolved in convolutional layers and the result is forwarded to the next layer. Each convolutional
neuron processes data only for its receptive field.
LAYERS
Input Layer: The input layer is where the raw data, such as an image, is fed into the network. It passes the data directly to the
first hidden layer, where the data is multiplied by that layer's weights and passed through its activation function. In a CNN the
input is generally an image or a sequence of images, represented as a multi-dimensional array; for example, this layer may hold a
raw image of width 32, height 32, and depth 3.
Example: For a 64x64 pixel RGB image, the input would be a 3D array of size 64x64x3, where 64x64 are the
height and width, and 3 corresponds to the color channels (Red, Green, Blue).
Fully Connected Layer
• Purpose: The fully connected (FC) layer is where the high-level reasoning and decision-making occur. It
combines all the features extracted by the convolutional and pooling layers to classify the input data.
• Working: The output from the last pooling layer is flattened into a 1D vector, which is then fed into one
or more fully connected layers. Each neuron in a fully connected layer is connected to every neuron in
the previous layer, enabling the network to learn complex patterns and relationships between features.
• Example: If the flattened vector represents features like edges, textures, and shapes from an image, the
fully connected layer uses this information to determine what the image represents, such as whether it's a
cat or a dog.
Output Layer
• Purpose: The output layer is the final layer in the CNN that provides the final prediction, usually in the
form of probabilities for different classes.
• Working: In a classification task, the output layer typically uses a Softmax activation function to convert
the raw scores from the fully connected layer into probabilities. Each output value represents the
likelihood that the input data belongs to a specific class.
• Example: If the network is classifying between cats and dogs, the output layer might produce probabilities
like [0.9, 0.1], indicating a 90% probability that the image is of a cat and a 10% probability that it is a
dog.
Complete Example Flow (a code sketch follows this list)
1. Input: The raw image (e.g., a 64x64x3 RGB array) is fed into the network.
2. Convolution: Learnable filters slide over the image and produce feature maps.
3. Activation: A non-linearity such as ReLU is applied to the feature maps.
4. Pooling: Down-samples the feature maps, reducing size but retaining important features.
5. Flattening and Fully Connected Layers: The feature maps are flattened into a 1D vector and combined for high-level reasoning.
6. Dropout: (Optional during training) Prevents overfitting by randomly turning off neurons.
7. Output: A Softmax layer produces class probabilities (e.g., cat vs. dog).
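The flow above can be sketched as a minimal model, assuming TensorFlow/Keras is installed; the filter count, dense-layer size, and two-class (cat vs. dog) output are illustrative choices, not prescribed by the question.

```python
# Minimal CNN sketch for a 64x64 RGB input (assumed sizes are illustrative).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),               # input layer: 64x64 RGB image
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + activation -> feature maps
    layers.MaxPooling2D((2, 2)),                   # pooling: down-sample the feature maps
    layers.Flatten(),                              # flatten to a 1D vector
    layers.Dense(64, activation="relu"),           # fully connected layer
    layers.Dropout(0.5),                           # optional dropout during training
    layers.Dense(2, activation="softmax"),         # output layer: probabilities for cat vs. dog
])
model.summary()
```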
2. Explain the need of RNN to process sequential data. State variants of RNN with example
application. (CO4)
A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous
step is fed as input to the current step. In traditional neural networks, all the inputs and outputs
are independent of each other. However, in cases where it is required to predict the next word of a
sentence, the previous words are needed, and hence there is a need to remember them. Thus the
RNN came into existence, which solved this issue with the help of a hidden state.
The main and most important feature of an RNN is its hidden state, which remembers some
information about a sequence. The state is also referred to as the memory state since it remembers
the previous inputs to the network. The RNN uses the same parameters for each input, as it performs
the same task on all the inputs or hidden states to produce the output. This reduces the complexity
of the parameters, unlike other neural networks.
NEED OF RNN
Traditional neural networks (like feedforward networks and CNNs) are excellent at handling
fixed-size inputs (e.g., images) but struggle with sequential data, where the order of the data
points is important. Examples of sequential data include time series, natural language, and video
frames.
1. Memory of Previous Inputs: RNNs have a built-in memory mechanism that allows them
to retain information from previous inputs. This is crucial for understanding the context in
sequences where the current output depends on prior inputs.
2. Shared Weights: RNNs use the same set of weights for each time step, which makes them
efficient for learning patterns across sequences of varying lengths.
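A small NumPy sketch of the recurrence makes these two points concrete: the same weight matrices are reused at every time-step, and the hidden state h carries information from earlier inputs forward. The sizes and random data below are purely illustrative assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the same weights are reused at every
    time-step, and h carries a memory of all previous inputs."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5                          # hypothetical sizes
W_xh = rng.normal(size=(n_hid, n_in))
W_hh = rng.normal(size=(n_hid, n_hid))
b_h = np.zeros(n_hid)

h = np.zeros(n_hid)                         # initial hidden (memory) state
for x_t in rng.normal(size=(4, n_in)):      # a 4-step input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # h now summarizes the sequence so far
```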
Variants of RNN:
The most widely used variant is the Long Short-Term Memory (LSTM) network. Its gating mechanism (forget, input, and output gates;
the output gate determines what part of the memory is sent to the output and the next hidden state) allows LSTMs to maintain and
utilize information over long sequences, making them particularly effective for tasks requiring the understanding of context spread
over time.
Example Application:
- Language Translation: In tasks like translating a sentence from English to French, understanding the entire
sentence is necessary to generate a correct translation. LSTM networks are capable of capturing the long-term
dependencies required to maintain the meaning of the sentence across multiple words. For instance, in the
translation of "I am going to school" to "Je vais à l'école," the LSTM maintains the context of the subject ("I")
and the action ("going to school") across the sequence, ensuring an accurate translation.
To solve the problem of vanishing and exploding gradients in a deep recurrent neural network,
many variations were developed. One of the most famous of them is the Long Short-Term
Memory network (LSTM). In concept, an LSTM recurrent unit tries to "remember" all the past
knowledge that the network has seen so far and to "forget" irrelevant data. This is done by
introducing different activation-function layers called "gates" for different purposes. Each LSTM
recurrent unit also maintains a vector called the internal cell state, which conceptually describes
the information that was chosen to be retained by the previous LSTM recurrent unit.
LSTM networks are the most commonly used variation of Recurrent Neural Networks (RNNs).
The critical components of the LSTM are the memory cell and the gates (including the forget gate
and the input gate); the inner contents of the memory cell are modulated by the input and
forget gates. Assuming that both of these gates are closed, the contents of the memory cell will
remain unmodified between one time-step and the next. This gating structure allows
information to be retained across many time-steps, and consequently also allows gradients to
flow across many time-steps. This allows the LSTM model to overcome the vanishing gradient
problem that occurs with most recurrent neural network models.
A Long Short Term Memory Network consists of four different gates for different purposes as described below:-
1. Forget Gate (f): At the forget gate, the input is combined with the previous output to generate a fraction
between 0 and 1 that determines how much of the previous state needs to be preserved (or, in other words,
how much of the state should be forgotten). This output is then multiplied with the previous state. Note:
an activation output of 1.0 means "remember everything" and an activation output of 0.0 means "forget
everything." From a different perspective, a better name for the forget gate might be the "remember gate."
2. Input Gate (i): The input gate operates on the same signals as the forget gate, but here the objective is to
decide which new information is going to enter the state of the LSTM. The output of the input gate (again a
fraction between 0 and 1) is multiplied with the output of the tanh block that produces the new values to be
added to the previous state. This gated vector is then added to the previous state to generate the current state.
3. Input Modulation Gate (g): It is often considered a sub-part of the input gate, and much literature on
LSTMs does not even mention it, assuming it is inside the input gate. It is used to modulate the
information that the input gate will write onto the internal cell state by adding non-linearity to the
information and making it zero-mean. This is done to reduce the learning time, as zero-mean
input has faster convergence. Although this gate's actions are less important than the others and it is
often treated as a finesse-providing concept, it is good practice to include it in the structure of the
LSTM unit.
4. Output Gate (o): At the output gate, the input and previous state are gated as before to generate another
scaling fraction, which is combined with the output of a tanh block applied to the current state to produce the
output. This output is then given out, and the output and state are fed back into the LSTM unit.
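A rough NumPy sketch of a single LSTM time-step, written directly from the gate descriptions above; the parameter layout (the four gates stacked into one weight matrix) and all sizes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time-step. W, U, b hold stacked parameters for the
    forget (f), input (i), modulation (g) and output (o) gates."""
    z = W @ x_t + U @ h_prev + b              # pre-activations for all four gates
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                       # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2*H])                     # input gate: how much new info to write
    g = np.tanh(z[2*H:3*H])                   # input modulation: candidate values
    o = sigmoid(z[3*H:4*H])                   # output gate: how much state to expose
    c_t = f * c_prev + i * g                  # new internal cell state
    h_t = o * np.tanh(c_t)                    # new hidden state / output
    return h_t, c_t

# Tiny usage example with random parameters (hypothetical sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):        # 5-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```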
Autoencoders
At the heart of deep learning lies the neural network, an intricate interconnected system of nodes
that mimics the human brain’s neural architecture. Neural networks excel at discerning intricate
patterns and representations within vast datasets, allowing them to make predictions, classify
information, and generate novel insights. Autoencoders emerge as a fascinating subset of neural
networks, offering a unique approach to unsupervised learning. Autoencoders are an adaptable
and strong class of architectures for the dynamic field of deep learning, where neural networks
develop constantly to identify complicated patterns and representations. With their ability to learn
effective representations of data, these unsupervised learning models have received considerable
attention and are useful in a wide variety of areas, from image processing to anomaly detection.
Autoencoders are a specialized class of algorithms that can learn efficient representations of input
data with no need for labels. They are a class of artificial neural networks designed for unsupervised
learning. Learning to compress and effectively represent input data without explicit labels is the
essential principle of an autoencoder. This is accomplished using a two-fold structure that
consists of an encoder and a decoder. The encoder transforms the input data into a reduced-
dimensional representation, which is often referred to as the "latent space" or "encoding". From that
representation, a decoder rebuilds the original input. This process of encoding and decoding forces
the network to learn meaningful patterns and the essential features of the data.
Input Layer:
• The input layer takes in the original data. For example, if the input data is a 28x28 pixel
grayscale image (like from the MNIST dataset), the input layer has 784 nodes (28x28).
Encoder
• The hidden layers progressively reduce the dimensionality of the input, capturing
important features and patterns. These layers compose the encoder.
• The bottleneck layer (latent space) is the final hidden layer, where the dimensionality is
significantly reduced. This layer represents the compressed encoding of the input data.
• This is the layer that contains the compressed representation of the input data. It has a much
smaller number of neurons compared to the input layer.
• Purpose: The bottleneck forces the network to learn the most important features of the
data, effectively encoding the input.
Decoder
• The decoder takes the encoded representation from the bottleneck layer and expands it back to the
dimensionality of the original input.
• The hidden layers progressively increase the dimensionality and aim to reconstruct
the original input.
• The output layer produces the reconstructed output, which ideally should be as close
as possible to the input data.
Output Layer:
• The output layer produces the reconstructed version of the input data. Ideally, this output
should be as close as possible to the original input.
Working of Autoencoder
Step-by-Step Process:
1. Encoding:
o The input data is passed through the encoder, where the data is compressed.
o Each layer in the encoder reduces the dimensionality, transforming the input into a
compact form in the latent space.
o Example: For an image, the encoder might capture the most significant features,
such as edges or shapes, while discarding less important details.
2. Latent Representation:
o The compressed representation in the latent space acts as a "summary" of the input
data. This representation should ideally capture the essential features of the input.
3. Decoding:
o The latent space representation is then passed through the decoder, where the
dimensionality is gradually increased.
o The goal is to reconstruct the original input from this compressed form as accurately
as possible.
4. Reconstruction:
o The difference between the input and the output (reconstruction error) is minimized
during training, allowing the network to learn an efficient encoding and decoding
process.
5. Training Process:
o The autoencoder is trained using backpropagation, with the loss function typically
being the Mean Squared Error (MSE) between the input and reconstructed output.
o The training process adjusts the weights in the encoder and decoder to minimize this
loss, ensuring the output closely resembles the input.
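A minimal Keras sketch of this encoder-bottleneck-decoder structure and MSE training setup, assuming TensorFlow/Keras; the 784-dimensional input matches the 28x28 MNIST example above, while the other layer sizes are arbitrary illustrative choices.

```python
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Input(shape=(784,)),                 # flattened 28x28 grayscale image
    layers.Dense(128, activation="relu"),       # encoder hidden layer
    layers.Dense(32, activation="relu"),        # bottleneck / latent space
    layers.Dense(128, activation="relu"),       # decoder hidden layer
    layers.Dense(784, activation="sigmoid"),    # reconstruction of the input
])
# Train to minimize reconstruction error (MSE) with the input used as the target:
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)  # x_train: scaled images
```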
5. How does class imbalance affect classification? How is it handled? Explain with suitable
examples. (CO5)
Effect of Class Imbalance
When one class greatly outnumbers the other(s), most classifiers become biased toward the majority class: the model can achieve
high overall accuracy while misclassifying most minority-class instances, which are often the cases of real interest (e.g., fraud or
disease). Plain accuracy therefore becomes a misleading measure of performance.
Handling Class Imbalance
1. Resampling Techniques
a. Oversampling:
• Description: Increase the number of minority-class instances, either by duplicating existing samples or by
generating synthetic ones with techniques such as SMOTE (Synthetic Minority Over-sampling Technique).
• Example: In fraud detection, if "Fraud" cases are rare, SMOTE can be used to generate
additional synthetic "Fraud" samples to balance the dataset.
b. Undersampling:
• Description: Reduce the number of instances in the majority class by randomly removing
samples. This balances the dataset but can lead to loss of important information.
• Example: In a credit scoring application with many more "Good Credit" cases than "Bad
Credit," undersampling the "Good Credit" cases can help balance the classes.
2. Class Weighting
• Description: Modify the learning algorithm to give more importance (higher weight) to
the minority class. Many machine learning algorithms, such as logistic regression, SVM,
and neural networks, allow you to assign weights to classes.
• Example: In a medical diagnosis problem, you can assign a higher weight to the minority
class (e.g., "Disease Positive") to penalize the model more for misclassifying these cases,
encouraging the model to focus more on accurately predicting the minority class.
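A short scikit-learn sketch of class weighting; class_weight="balanced" reweights classes inversely to their frequency so that minority-class errors are penalized more. The dataset here is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
# Equivalent explicit form (illustrative values): class_weight={0: 1.0, 1: 19.0}
```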
3. Ensemble Methods
• Description: Use ensemble methods like Random Forest or XGBoost, which can handle
imbalance by focusing on misclassified instances. Techniques like Balanced Random
Forest or EasyEnsemble combine undersampling with ensemble learning.
4. Evaluation Metrics
• Description: Use metrics that reflect performance on the minority class instead of plain accuracy.
o Precision: Measures the proportion of predicted positive instances that are actually positive.
o Recall: Measures the ability of the model to identify all positive instances.
o F1-Score: The harmonic mean of precision and recall, balancing the two.
o ROC-AUC: Measures the trade-off between true positive rate and false positive
rate.
• Example: In a spam detection task, using F1-Score or ROC-AUC as the evaluation metric
ensures that both false positives and false negatives are considered, giving a better
indication of the model's performance on the minority class.
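The gap between plain accuracy and minority-aware metrics can be illustrated with scikit-learn on hypothetical labels, predictions, and scores (all numbers below are made up for illustration).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true = [0]*95 + [1]*5                             # 95 negatives, 5 positives (imbalanced)
y_pred = [0]*95 + [1, 0, 0, 0, 0]                   # model finds only 1 of the 5 positives
y_score = [0.1]*95 + [0.9, 0.4, 0.3, 0.2, 0.2]      # hypothetical predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))   # high despite poor minority recall
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))
print("roc-auc  :", roc_auc_score(y_true, y_score))
```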
6. State various ensemble learning techniques and explain any one in detail (CO5).
Ensemble learning is a machine learning technique that enhances accuracy and resilience in
forecasting by merging predictions from multiple models. It aims to mitigate errors or biases that
may exist in individual models by leveraging the collective intelligence of the ensemble.
Ensemble learning involves combining multiple models to improve the overall performance of a
machine learning system. The key idea is that a group of weak models, when combined, can
produce a more accurate and robust model. There are several ensemble learning techniques,
including:
1. Bagging (Bootstrap Aggregating)
2. Boosting
3. Stacking
4. Voting
5. Random Forest
Bagging is one of the most popular ensemble learning techniques, primarily used to reduce
variance and prevent overfitting. The idea behind bagging is to create multiple subsets of the
original dataset by sampling with replacement, train a model on each of these subsets, and then
combine their predictions.
1. Working of Bagging
• Step 1: Bootstrap Sampling
o Multiple subsets of the training data are created by randomly selecting samples from
the original dataset. Each subset is created with replacement, meaning that the same
data point can appear multiple times in a subset.
o Example: If you have a dataset with 1000 instances, you might create 10 subsets,
each with 1000 instances. Some instances from the original dataset will appear more
than once in a subset, while others might not appear at all.
• Step 2: Training Models
o A separate model (usually of the same type, like decision trees) is trained on each of
these subsets.
o Example: If you use decision trees as your base model, each subset of the data will
be used to train a different decision tree.
• Step 3: Aggregating Predictions
o The predictions from all the trained models are aggregated to produce the final
output. For classification tasks, this is usually done by majority voting (the class
predicted most frequently is chosen). For regression tasks, the predictions are
averaged.
o Example: If you have trained 10 decision trees and 7 of them predict class A while
3 predict class B for a particular instance, the final prediction will be class A.
2. Benefits of Bagging
• Reduced Variance: Bagging helps to reduce the variance of the model, making it less
likely to overfit the training data. By averaging the predictions, the model becomes more
robust to noise in the data.
• Stability and Accuracy: The aggregation of multiple models improves the stability and
accuracy of the final prediction compared to using a single model.
3. Bagging and Random Forest
• Random Forest is a popular ensemble technique that uses bagging as its core mechanism
but with an additional twist: it introduces randomness not only in the data sampling but
also in the feature selection.
• In Random Forest, each decision tree is trained on a different subset of the data (using
bagging), and during the training of each tree, only a random subset of features is
considered for splitting at each node. This further decorrelates the trees and improves the
performance of the ensemble.
4. Practical Example
Consider a scenario where you want to predict whether a customer will churn based on various
features like age, account balance, and customer service calls:
• Bagging Process:
1. Bootstrap Sampling: Create, say, 100 different subsets of the data by randomly
sampling with replacement from the original dataset.
2. Train Models: Train 100 decision trees, each on a different subset of the data.
3. Aggregate Predictions: For a new customer, each of the 100 trees makes a
prediction on whether the customer will churn. The final prediction is based on the
majority vote of all the trees.
Outcome: The final model is more stable, has lower variance, and is less prone to overfitting
compared to a single decision tree trained on the entire dataset.
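This bagging workflow can be sketched with scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the synthetic dataset and the 100 estimators below mirror the example but are otherwise arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

bag = BaggingClassifier(n_estimators=100, random_state=0)  # 100 bootstrap-trained decision trees
bag.fit(X_tr, y_tr)                                        # each tree sees its own bootstrap sample
print("test accuracy:", bag.score(X_te, y_te))             # final prediction = majority vote
```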
Other ensemble techniques in brief:
1. Boosting:
o Boosting works by training models sequentially, with each new model focusing on
correcting the errors made by the previous models. Popular algorithms include
AdaBoost, Gradient Boosting, and XGBoost.
2. Stacking:
o In stacking, multiple models (of different types) are trained, and their predictions
are then used as input features for a meta-model, which makes the final prediction.
3. Voting:
o Voting involves training multiple models and then combining their predictions by
majority vote (for classification) or averaging (for regression). It can use different
types of models or the same model type with different parameters.
4. Random Forest:
o Random Forest applies bagging to decision trees and, in addition, considers only a random
subset of features at each split, which further decorrelates the trees (see the description above).
Conclusion
Ensemble learning techniques like bagging are powerful methods to improve the performance
and robustness of machine learning models by combining the strengths of multiple models.
Bagging, in particular, is effective in reducing variance and preventing overfitting, making it a
widely used approach in various applications, including Random Forest.
7. Numerical on calculating various performance metrics like precision, recall, accuracy,
specificity and sensitivity given the confusion matrix (CO5).
SUMS
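As an illustration of the kind of calculation expected here, the sketch below works through one assumed confusion matrix; the counts are hypothetical, not taken from any specific question paper.

```python
# Assumed confusion matrix (illustrative counts):
#                 Predicted Positive   Predicted Negative
# Actual Positive        TP = 40             FN = 10
# Actual Negative        FP = 20             TN = 30
TP, FN, FP, TN = 40, 10, 20, 30

accuracy    = (TP + TN) / (TP + TN + FP + FN)               # (40+30)/100 = 0.70
precision   = TP / (TP + FP)                                # 40/60  ~ 0.667
recall      = TP / (TP + FN)                                # 40/50  = 0.80 (sensitivity)
specificity = TN / (TN + FP)                                # 30/50  = 0.60
f1          = 2 * precision * recall / (precision + recall) # ~ 0.727
print(accuracy, precision, recall, specificity, f1)
```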
Holdout method
Hold-out is when you split up your dataset into a ‘train’ and ‘test’ set. The training set is what the
model is trained on, and the test set is used to see how well that model performs on unseen data.
A common split when using the hold-out method is using 80% of data for training and the
remaining 20% of the data for testing.
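A one-line holdout split with scikit-learn, illustrating the 80/20 split described above; the dataset is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), "training samples,", len(X_test), "test samples")
```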
Random subsampling
Random subsampling repeats the hold-out method several times: the dataset is randomly split into training and test sets multiple
times, the model is evaluated on each split, and the results are averaged to estimate performance.
Cross-validation
Cross-Validation is used to estimate the test error associated with a model to evaluate its
performance.
Validation set approach:
This is the most basic approach. It simply involves randomly dividing the dataset into two
parts: first a training set and second a validation set or hold-out set. The model is fit on the
training set and the fitted model is used to make predictions on the validation set.
Leave-one-out-cross-validation:
LOOCV is a better option than the validation set approach. Instead of splitting the entire dataset
into two halves, only one observation is used for validation and the rest is used to fit the model.
k-fold cross-validation:
‘k-fold cross-validation’ is when the dataset is randomly split up into ‘k’ groups. One of the
groups is used as the test set and the rest are used as the training set. The model is trained on the
training set and scored on the test set. Then the process is repeated until each unique group has
been used as the test set.
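A k-fold sketch with scikit-learn; LeaveOneOut can be dropped in the same way by passing cv=LeaveOneOut(). The model and data below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("fold scores:", scores, "mean:", scores.mean())   # average test-set score over 5 folds
```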
Bootstrapping
Bootstrapping creates multiple training sets by repeatedly sampling from the original dataset with replacement; each bootstrap
sample is typically the same size as the original data. The observations that are never selected for a given sample (the out-of-bag
samples) can be used as a test set, and the evaluation results are averaged over many bootstrap samples.
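A tiny NumPy sketch of one bootstrap sample and its out-of-bag set; the dataset size is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                          # pretend we have 10 data points
boot_idx = rng.integers(0, n, size=n)           # indices drawn with replacement
oob_mask = ~np.isin(np.arange(n), boot_idx)     # rows never selected -> out-of-bag
print("bootstrap sample indices:", boot_idx)
print("out-of-bag indices:", np.arange(n)[oob_mask])
```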
9. What is multimodal application? Explain any Multimodal data science application.
(CO6)
A multimodal application refers to a system or tool that integrates and processes multiple types
of data (modalities) simultaneously to provide more comprehensive insights or functionalities. In
the context of data science, multimodal applications combine different types of data, such as text,
images, audio, and video, to enhance analysis, prediction, or interaction capabilities. A representative
application is an autonomous vehicle, which fuses camera images and video with GPS and text-based traffic data to make driving
decisions; this example is described in detail under "Multimodal Applications" at the end of the next answer.
10. Application of Data science for text/images/videos with real time example. (CO6)
1. Text
Real-Time Example:
• Service: A company uses sentiment analysis to monitor customer reviews and feedback on
social media platforms and review sites.
• How It Works: Natural Language Processing (NLP) algorithms analyze the text of
customer reviews to determine the sentiment (positive, negative, or neutral).
• Impact: This analysis helps the company understand customer satisfaction trends, identify
potential issues, and make improvements to products or services based on real-time
feedback.
Tools/Technologies: NLP libraries (like SpaCy, NLTK), sentiment analysis models, machine
learning frameworks.
Chatbots and Virtual Assistants
Description:
• How It Works:
o Machine Learning Models can improve over time by learning from user
interactions.
• Impact: Enhances customer service efficiency, reduces response times, and provides 24/7
support. Companies can handle a higher volume of inquiries with fewer human agents.
Email Filtering
Description:
• How It Works:
o Content Analysis: Emails are analyzed for certain keywords, patterns, or phrases
commonly associated with spam.
o Machine Learning Models: These models are trained on large datasets to identify
spam based on email content and sender behavior.
o User Feedback: Spam filters can learn from user actions, such as marking emails
as spam or not spam.
• Impact: Improves user experience by reducing the amount of spam, thereby increasing
productivity and protecting against phishing attacks.
Content Moderation
Description:
• How It Works:
o Text Analysis: Algorithms analyze text for harmful language, hate speech, or
offensive content.
o Flagging and Review: Content that triggers certain flags may be reviewed by
human moderators.
o AI and NLP Models: Continuous learning from flagged content helps improve the
accuracy of content moderation.
• Impact: Maintains a safe and respectful online environment, preventing the spread of
harmful content and reducing the risk of legal issues.
2. Images
Real-Time Example:
• Service: A security system in a public area uses object detection algorithms to monitor
video feeds from CCTV cameras.
• How It Works: Computer vision models analyze images from the cameras to detect and
identify objects or people, such as recognizing faces or detecting suspicious behavior.
• Impact: The system can trigger alerts if it detects unusual activity or unauthorized
individuals, enhancing security and allowing for faster responses to potential threats.
Facial Recognition
Description:
o How It Works:
▪ Feature Extraction: Unique facial features are extracted and converted into
a biometric template.
Medical Imaging
Description:
o Functionality: Medical imaging tools analyze medical scans to detect and diagnose
conditions.
o How It Works:
▪ Integration: Results are integrated into electronic health records (EHR) for
comprehensive patient management.
o Impact: Improves diagnostic accuracy, speeds up the analysis process, and supports
early detection of diseases.
3. Videos
Real-Time Example:
• Service: A sports analytics platform uses action recognition to analyze video footage of
games.
• How It Works: Machine learning models analyze video frames to recognize and
categorize actions, such as a player shooting a basketball or making a pass.
• Impact: Coaches and analysts can use this information to review player performance,
identify patterns, and develop strategies based on the detailed analysis of in-game actions.
Video Surveillance
Description:
• How It Works:
o Video Analytics: Algorithms analyze video streams for motion detection, object
recognition, and anomaly detection.
• Impact: Enhances security and monitoring capabilities, allowing for quick responses to
potential security threats.
Content Recommendation
Description:
• How It Works:
o User Data Analysis: Viewing history, search queries, and interactions are analyzed.
Sports Analysis
Description:
• Functionality: Platforms analyze sports video footage to provide insights into player
performance and game strategy.
• How It Works:
o Action Tracking: Algorithms track player movements, game events, and key
actions.
• Impact: Provides detailed analysis and feedback to enhance team performance, optimize
strategies, and improve training.
Multimodal Applications
Real-Time Example:
• Service: An autonomous vehicle system combines text, image, and video data to navigate
and make driving decisions.
• How It Works:
o Images/Videos: Cameras capture real-time visual data to detect road signs, lane
markings, and other vehicles.
o Text: GPS data provides location-based information, and real-time traffic updates
can be integrated into the system.
o Action: The vehicle processes this multimodal data using machine learning
algorithms to make driving decisions, such as adjusting speed or changing lanes.
• Impact: This integrated approach allows the vehicle to operate safely and effectively in
diverse driving conditions by utilizing and processing multiple data types in real-time.