Unit - DL
The generative model captures the data distribution and is trained to maximise
the probability of the discriminator making a mistake, so that the
discriminator cannot tell which data is real and which is artificial. On the
other side, the discriminator is trained to tell which data comes from the
training set and which comes from the generator. The Generative Adversarial
Network is trained as a minimax game over a value function V(D, G): the
discriminator works to maximize V(D, G), while the generator works to minimize
it, i.e. to maximize the discriminator's loss.
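A minimal numpy sketch of the standard minimax value V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], assuming the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Minimax value V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples (probabilities).
    d_fake: discriminator outputs on generated samples.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (real -> ~1, fake -> ~0) achieves a high value;
# the generator succeeds when d_fake rises toward 0.5 and V drops.
good_d = gan_value(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
fooled_d = gan_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The discriminator's updates push V up; the generator's updates push it down, which is the minimax game described above.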
What are the types of Generative Adversarial Network (GAN) models?
There are several types of GAN models, or Generative Adversarial Network
models. These are given below -
1. Conditional GAN:
This is one type of GAN. The Conditional GAN is also written CGAN. It is a
deep learning method in which some conditional parameters are added. When
data "X" is fed to the generator together with a condition, it generates the
corresponding data. In a CGAN, labels are also put into the discriminator's
input to help it distinguish real data from fake data.
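A minimal sketch of the conditioning step, assuming the condition is a one-hot class label concatenated onto the input vector (the dimensions here are illustrative):

```python
import numpy as np

def condition_input(z, label, num_classes):
    """Concatenate a one-hot class label onto the vector z, as done for
    both the generator and discriminator inputs in a CGAN."""
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.randn(100)                      # latent noise vector
g_in = condition_input(z, label=3, num_classes=10)
```

The same concatenation is applied to the discriminator's input so that it judges "real vs. fake *for this label*".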
2. Vanilla GAN:
It is one of the simplest types of GAN. The Vanilla GAN also represent a
VGAN. In the vanilla GAN, the Generator and the Discriminator are very
simple and have multilayer perceptron's. The algorithm of the VGAN is also
very easy. The VGAN is used to optimization of a mathematical equation which
is used in the stochastic gradient descent.
3. Laplacian Pyramid GAN:
It is another type of GAN. The Laplacian Pyramid GAN also represent a
LAPGAN. The LAPGAN represents the linearly invertible image. This consists
of a set of bandpass images which is also residual of low frequency. The images
are also spaced an octave apart. In the Laplacian Pyramid GAN, we use
multiple numbers of generators and Discriminators. Here we also used the
Laplacian pyramid of different levels. High-quality images produced by the
LAPGAN. In the first step, the image is always down-sampled in Laplacian
Pyramid GAN. After that, the images are upscaled in each layer of the pyramid
and then pass the images which have some noise from the CGAN. The images
belong in CGAN until it gets the original size.
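The pyramid itself can be sketched in numpy; the 2x2 average-pool downsample and nearest-neighbour upsample here are simplifying assumptions standing in for the usual Gaussian blur:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling (a stand-in for the
    blur-and-subsample step of a Gaussian pyramid)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Band-pass residuals between each level and its upsampled coarse version."""
    pyramid = []
    for _ in range(levels):
        small = downsample(img)
        pyramid.append(img - upsample(small))   # band-pass residual
        img = small
    pyramid.append(img)                         # lowest-frequency residue
    return pyramid

img = np.arange(16.0).reshape(4, 4)
pyr = laplacian_pyramid(img, levels=2)

# The pyramid is linearly invertible: summing upsampled levels recovers the image.
recon = pyr[-1]
for band in reversed(pyr[:-1]):
    recon = upsample(recon) + band
```

In the LAPGAN, each band-pass residual is produced by a conditional GAN instead of being computed from a known high-resolution image.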
4. Super Resolution GAN:
It is another type of Generative Adversarial Network. The Super Resolution
GAN also represent an SRGAN. The SRGAN is produce a high-resolution
image. In this GAN, the deep neural network is used with the adversarial
network. The SRGAN is very much useful for upscaling low-resolution images
into high-resolution images. And it is also used to errors minimizing in the
images.
5. Deep Convolutional GAN:
The last type of Generative Adversarial Network is the Deep Convolutional
GAN, also written DCGAN. This GAN is more powerful than the others and is
very popular. The Deep Convolutional GAN uses ConvNets in place of multilayer
perceptrons. The ConvNets are implemented without max pooling (strided
convolutions are used instead), and the layers are not fully connected.
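As a small illustration, the spatial sizes produced by strided and transposed convolutions (the kernel/stride/padding values follow the common DCGAN choice of 4/2/1) can be computed directly:

```python
# DCGAN replaces pooling with strided convolutions. The spatial output size
# of a convolution is (in + 2*pad - kernel) // stride + 1; a transposed
# convolution inverts this: (in - 1) * stride - 2*pad + kernel.
def conv_out(size, kernel=4, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    return (size - 1) * stride - 2 * pad + kernel

# A DCGAN-style generator grows feature maps 4 -> 8 -> 16 -> 32 -> 64
# with kernel 4, stride 2, padding 1 transposed convolutions:
sizes = [4]
for _ in range(4):
    sizes.append(deconv_out(sizes[-1]))
```

The discriminator mirrors this with strided convolutions, e.g. `conv_out(64)` returns 32, halving the resolution without any pooling layer.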
Zero-shot learning
Zero-shot learning (ZSL) is a machine learning scenario in which an AI
model is trained to recognize and categorize objects or concepts without having
seen any examples of those categories or concepts beforehand.
Most state-of-the-art deep learning models for classification or regression are
trained through supervised learning, which requires many labelled examples of
relevant data classes. Models “learn” by making predictions on a labelled
training dataset; data labels provide both the range of possible answers and the
correct answers (or ground truth) for each training example. “Learning,” here,
means adjusting model weights to minimize the difference between the model’s
predictions and that ground truth. This process requires enough labelled samples
for many rounds of training and updates.
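Supervised learning in miniature, as a sketch with a one-weight linear model and a toy labelled dataset (both are illustrative assumptions):

```python
# Labelled pairs (x, y) provide the ground truth; "learning" adjusts the
# weight w to shrink the squared error between predictions w * x and labels.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # labels follow y = 2x

w = 0.0
lr = 0.02
for _ in range(200):                      # many rounds of training and updates
    for x, y in data:
        grad = 2.0 * (w * x - y) * x      # d/dw of (w*x - y)^2
        w -= lr * grad
```

After enough updates, w approaches 2, the value that minimizes the difference between predictions and ground truth on this data.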
While powerful, supervised learning is impractical in some real-world
scenarios. Annotating large amounts of data samples is costly and time-
consuming, and in cases like rare diseases and newly discovered species,
examples may be scarce or non-existent. Consider image recognition tasks:
according to one study, humans can recognize approximately 30,000
individually distinguishable object categories [1]. It’s not feasible, in terms of
time, cost and computational resources, for artificial intelligence models to
remotely approach human capabilities if they must be explicitly trained on
labelled data for each class.
The need for machine learning models to be able to generalize quickly to a large
number of semantic categories with minimal training overhead has given rise
to n-shot learning: a subset of machine learning that also includes few-shot
learning (FSL) and one-shot learning. Few-shot learning typically uses transfer
learning and meta learning-based methods to train models to quickly recognize
new classes with only a few labelled training examples—or, in one-shot
learning, a single labelled example.
Zero-shot learning, like all n-shot learning, refers not to any specific algorithm
or neural network architecture, but to the nature of the learning problem itself:
in ZSL, the model is not trained on any labelled examples of the unseen classes
it is asked to make predictions on post-training.
This problem setup doesn’t account for whether that class was present (albeit
unlabelled) in training data. For example, some large language models
(LLMs) are well-suited for ZSL tasks, as they are pre-trained through self-
supervised learning on a massive corpus of text that may contain incidental
references to or knowledge about unseen data classes. Without labeled
examples to draw upon, ZSL methods all rely on the use of such auxiliary
knowledge to make predictions.
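One classic ZSL recipe, attribute-based classification, can be sketched as follows; the classes and attribute vectors are illustrative assumptions, not from a real dataset:

```python
import numpy as np

# Auxiliary knowledge: each unseen class is described by an attribute vector
# [striped, four_legged, flies]. No labelled images of these classes exist.
class_attributes = {
    "zebra": np.array([1.0, 1.0, 0.0]),
    "horse": np.array([0.0, 1.0, 0.0]),
    "eagle": np.array([0.0, 0.0, 1.0]),
}

def zero_shot_classify(predicted_attributes):
    """Assign the unseen class whose attribute vector is nearest to the
    attributes predicted for the sample."""
    return min(class_attributes,
               key=lambda c: np.linalg.norm(class_attributes[c] - predicted_attributes))

# A sample whose predicted attributes say "striped, four-legged, doesn't fly":
label = zero_shot_classify(np.array([0.9, 0.8, 0.1]))
```

The attribute predictor itself would be trained on seen classes; only the side knowledge links it to the unseen ones.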
Given its versatility and wide range of use cases, zero-shot learning has become
an increasingly notable area of research in data science, particularly in the fields
of computer vision and natural language processing (NLP).
An intuitive example for multiple instance learning (MIL) is a situation where several people have a specific
key chain that contains keys. Some of these people are able to enter a certain
room, and some aren’t. The task is then to predict whether a certain key or a
certain key chain can get you into that room.
For solving this, we need to find the exact key that is common for all the
“positive” keychains – the green key. We can then correctly classify an entire
keychain – positive if it contains the required key, or negative if it doesn’t.
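Under the standard MIL assumption, bag classification reduces to checking for the required instance; a sketch using the hypothetical green key:

```python
# Standard multiple-instance assumption: a bag (key chain) is positive iff
# it contains at least one positive instance (the key that opens the room).
def classify_bag(keychain, required_key="green"):
    return required_key in keychain

positive_bag = {"green", "red", "blue"}
negative_bag = {"red", "yellow"}
```

The learning problem is hard precisely because the "green key" is not given; it must be inferred from bag-level labels alone.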
This standard assumption can be slightly modified to accommodate problems
where positive bags cannot be identified by a single instance, but by their
accumulation. For example, in the classification of desert, sea and beach
images, images of beaches contain both sand and water segments. Several
positive instances are required to distinguish a “beach” from “desert”/”sea”.
Characteristics of Multiple Instance Learning Problems
Task/Prediction: Instance level vs Bag Level
In some applications, like object localization in images (in content retrieval, for
instance), the objective is not to classify bags but individual instances; the
bag label only indicates the presence of the target entity somewhere in the image.
Note that the bag classification performance of a method often is not
representative of its instance classification performance. For example, when
considering negative bags, a single False Positive causes a bag to be
misclassified. On the other hand, in positive bags, it does not change the label,
which shouldn’t affect the loss at bag-level.
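A small sketch of this asymmetry, taking the bag prediction to be the max over binary instance predictions:

```python
def bag_prediction(instance_predictions):
    """Bag-level label as the max over binary instance predictions."""
    return max(instance_predictions)

# In a negative bag, a single false-positive instance flips the whole bag:
misclassified = bag_prediction([0, 0, 0, 1])   # truly negative bag -> predicted 1
clean = bag_prediction([0, 0, 0])              # -> predicted 0
# In a positive bag, instance-level errors need not change the bag label:
still_positive = bag_prediction([1, 1, 0])
```

Good bag-level accuracy can thus hide many instance-level mistakes inside positive bags.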
Bag Composition
Most existing MIL methods assume that positive and negative instances are
sampled independently from a positive and a negative distribution. This is often
not the case, due to the co-occurrence of several relations:
i) Intra Bag Similarities
The instances belonging to the same bag share similarities that instances from
other bags do not. In Computer Vision applications, it is likely that all segments
share some similarities related to the capture condition (e.g. illumination).
Another option is overlapping patches in an extraction process, as represented
below.
(Figure: the problem of ambiguous negative classes in Multiple Instance
Learning, where the positive concept can be marginally represented in a
negative bag.)
ii) Instance Co-Occurrence
Instances co-occur in bags when they share a semantic relation. This type of
correlation happens when the subject of a picture is more likely to be seen in
some environment than in another, or when some objects are often found
together.
Label noise occurs as well when you have different bags with different densities
of positive events. For instance, we have an audio recording (R1) of 10 seconds
containing only a total of 1 second of the tagged event in it and another audio
recording (R2) of the same duration in which the tagged event is present for a
total of 5 seconds. R1 is a weaker representation of the event compared to R2.
Highway network
In machine learning, the Highway Network was the first working very
deep feedforward neural network with hundreds of layers, much deeper than
previous artificial neural networks. It uses skip connections modulated by
learned gating mechanisms to regulate information flow, inspired by Long
Short-Term Memory (LSTM) recurrent neural networks. The advantage of a
Highway Network over the common deep neural networks is that it solves or
partially prevents the vanishing gradient problem, thus leading to easier to
optimize neural networks. The gating mechanisms facilitate information flow
across many layers ("information highways"). Highway Networks have been
used as part of text sequence labelling and speech recognition tasks.
In addition to the plain transform H(WH, x), the model has two gates: the
transform gate T(WT, x) and the carry gate C(WC, x). These two gates are
non-linear transfer functions (by convention a sigmoid), while H(WH, x) can
be any desired transfer function. The carry gate is defined as
C(WC, x) = 1 - T(WT, x), and the transform gate is simply a gate with a
sigmoid transfer function.
Before talking about Highway Networks, let's start with a plain network
consisting of L layers, where the l-th layer (omitting the layer index)
computes:

y = H(WH, x)

A highway layer adds the two gates:

y = H(WH, x) · T(WT, x) + x · C(WC, x)

In particular, with C = 1 - T:

y = H(WH, x) · T(WT, x) + x · (1 - T(WT, x))
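A single highway layer can be sketched in numpy; the tanh choice for H and the weight shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), with the carry gate C = 1 - T.

    H is an ordinary tanh transform; T is the sigmoid transform gate.
    """
    H = np.tanh(W_H @ x + b_H)
    T = sigmoid(W_T @ x + b_T)
    return H * T + x * (1.0 - T)

x = np.array([0.5, -0.3])
# With a strongly negative transform-gate bias, T ~ 0 and the layer simply
# carries its input through unchanged, which eases gradient flow across depth.
carry = highway_layer(x, np.eye(2), np.zeros(2), np.eye(2), -20.0 * np.ones(2))
```

Initializing the transform-gate bias to a negative value is what lets very deep stacks start out close to the identity mapping.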
Fractal Network
Advantages :
Siamese Networks :
A Siamese Neural Network is a class of neural network architectures
that contain two or more identical subnetworks. ‘Identical’ here means they
have the same configuration with the same parameters and weights, and
parameter updates are mirrored across the sub-networks. The network finds
the similarity of the inputs by comparing their feature vectors, so these
networks are used in many applications.
In this context, the network's output is the distance between the two
encodings, Distance(x₁, x₂) = ‖f(x₁) − f(x₂)‖, where:
x₁, x₂ are the two inputs,
f(x) represents the output of the encoding, and
Distance denotes the distance function.
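A minimal numpy sketch of this setup; the tiny linear encoder is an illustrative stand-in for the shared twin subnetworks:

```python
import numpy as np

# Shared weights: the SAME W encodes both inputs, mirroring the identical
# subnetworks of a Siamese architecture.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def f(x):
    """Shared encoder applied to either input."""
    return W @ x

def distance(x1, x2):
    """Euclidean distance between the two encodings."""
    return np.linalg.norm(f(x1) - f(x2))

d_same = distance(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
d_diff = distance(np.array([1.0, 2.0]), np.array([4.0, 6.0]))
```

Because the encoder is shared, a gradient step on one branch is by construction mirrored on the other.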
Numerous loss functions cater to diverse problem types. For instance, mean
squared error is apt for regression challenges, while cross-entropy loss suits
classification tasks.
Contrastive Loss
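One common convention of the contrastive loss (Y = 0 for a similar pair, Y = 1 for a dissimilar pair, margin m) can be sketched as:

```python
# Contrastive loss for a pair with embedding distance D and pair label Y:
#   L = (1 - Y) * 0.5 * D^2 + Y * 0.5 * max(0, m - D)^2
# Similar pairs are pulled together; dissimilar pairs are pushed apart,
# but only until they are at least the margin m apart.
def contrastive_loss(D, Y, margin=1.0):
    return (1 - Y) * 0.5 * D ** 2 + Y * 0.5 * max(0.0, margin - D) ** 2

similar_close = contrastive_loss(0.1, Y=0)     # small penalty: pair is close
dissimilar_close = contrastive_loss(0.1, Y=1)  # large penalty: should separate
dissimilar_far = contrastive_loss(2.0, Y=1)    # beyond the margin: no penalty
```

Note that some texts swap the meaning of Y (1 for similar pairs); the formula above follows the Y = 1 dissimilar convention.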