Discriminative Vs Generative Algorithms
Most machine learning and deep learning problems can be framed in terms of either
generative or discriminative models. The two modelling types are easy to tell apart:
Classifying an image as a dog or a cat is a discriminative modelling problem.
Producing a realistic image of a dog or a cat is a generative modelling problem.
Generative Modeling
Generative modelling describes how a dataset is generated. It tries to capture the
distribution of the data points, providing a probabilistic model of how the data actually
arises. (By contrast, discriminative methods such as support vector machines or the perceptron
algorithm give a separating decision boundary but no model for generating synthetic data
points.) The aim is to generate new samples that follow the same distribution as the training
data.
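As a minimal sketch of this idea, the example below fits a one-dimensional Gaussian to some training data and then draws new samples from the fitted model. The Gaussian form, the function names, and the synthetic training data are all assumptions made for illustration; real generative models are far more expressive. The point is only that a model of the data distribution can produce new points, which a decision boundary alone cannot.

```python
import random
import statistics

def fit_gaussian(samples):
    """Estimate the mean and standard deviation of a 1-D Gaussian model of p(x)."""
    return statistics.mean(samples), statistics.stdev(samples)

def sample_gaussian(mean, std, n, rng):
    """Draw n new data points from the fitted Gaussian."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)

# Hypothetical training data: the model only sees these points,
# not the parameters (5.0, 2.0) used to create them.
training_data = [rng.gauss(5.0, 2.0) for _ in range(1000)]

mean, std = fit_gaussian(training_data)
new_samples = sample_gaussian(mean, std, 1000, rng)

print("fitted mean and std:", mean, std)
print("mean of generated samples:", statistics.mean(new_samples))
```

The generated samples are not copies of the training points; they are fresh draws whose statistics should be close to those of the training data.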
Assume you have an autonomous driving dataset with an urban-scene setting. You now want
to generate images from it that are semantically and spatially similar. To do this, the
generative model must understand the data’s underlying structure and learn a realistic,
generalized representation of the dataset: for example, that the sky is blue, buildings are
usually tall, and pedestrians use sidewalks.
To generate new samples, the model first needs a training dataset, which consists of
unlabelled data points. Each data point has its own features, such as individual pixel values
(image domain) or words from a vocabulary (text domain). The generation process is stochastic:
a random element influences each individual sample the model produces.
Generative models learn to approximate p(x), the probability of observing a data point x.
This helps them represent the data more realistically. In the figure above, the generative
model learns to generate urban-scene images, taking random noise, in the form of a matrix or
vector, as input. Its task is to generate realistic samples x whose probability distribution
is similar to p_data, the distribution of the original training data. The noise adds
randomness to the model and ensures that the generated images are diverse.
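The role of the noise input can be sketched with a toy generator. Here a fixed linear map x = W z + b stands in for a trained network; the names `toy_generator`, `WEIGHTS`, and `BIAS`, and their values, are hypothetical and chosen only for illustration. A real generator would learn its parameters so that its outputs resemble the training data.

```python
import random

def sample_noise(dim, rng):
    """Draw a noise vector z with independent standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(z, weights, bias):
    """Map noise z to a sample x = W z + b (a stand-in for a trained network)."""
    return [sum(w * zi for w, zi in zip(row, z)) + b
            for row, b in zip(weights, bias)]

# Hypothetical fixed parameters; a real generator would learn these from data.
WEIGHTS = [[1.0, 0.0], [0.5, 2.0], [-1.0, 0.3]]
BIAS = [0.0, 1.0, -2.0]

rng = random.Random(42)
z1 = sample_noise(2, rng)
z2 = sample_noise(2, rng)

# The generator itself is deterministic; all variety comes from the noise.
x1 = toy_generator(z1, WEIGHTS, BIAS)
x2 = toy_generator(z2, WEIGHTS, BIAS)

print("sample from z1:", x1)
print("sample from z2:", x2)
```

Feeding the same noise vector in twice gives the same output, while different noise vectors give different outputs, which is where the diversity of the generated samples comes from.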