MLT - MKC


MUTHAYAMMAL ENGINEERING COLLEGE

(An Autonomous Institution)

(Approved by AICTE, New Delhi, Accredited by NAAC & Affiliated to Anna University)
Rasipuram - 637 408, Namakkal Dist., Tamil Nadu
MKC
MUST KNOW CONCEPTS
CSE 2020-2021

SUBJECT: 16CSE14 / MACHINE LEARNING TECHNIQUES


S.No. / Term / Notation (Symbol) / Concept / Definition / Meaning / Units / Equation / Expression
UNIT - I INTRODUCTION TO SUPERVISED LEARNING
1. Machine Learning: An application of AI concerned with programming systems to learn automatically and improve with experience, without being explicitly programmed. E.g., robots.

2. Types of machine learning: Supervised learning, unsupervised learning & reinforcement learning.

3. Supervised Learning: Learns from labelled training data and predicts the output for new inputs.

4. Unsupervised Learning: Predicts output from hidden patterns in the data, without any externally labelled training data.

5. Reinforcement Learning: The learner is a decision-making agent that takes actions in an environment and receives a reward (or penalty) for its actions while trying to solve a problem.

6. Types of Supervised learning: Classification & regression.

7. Classification: A supervised learning technique that categorizes data into a desired and distinct number of classes. Example: male and female.

8. Regression: A problem in which the output variable is a real or continuous value, such as "salary" or "weight".
9. Examples of Classification: Pattern recognition, optical character recognition, face recognition, medical diagnosis, speech recognition, biometrics, etc.

10. Noise: Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting models. Effectively dealing with noise is therefore a key aspect of supervised learning for obtaining reliable models from data.

11. Multiclass classification in supervised learning: A classification task that consists of more than two classes.

12. Model Selection: The process of selecting one final machine learning model from among a collection of candidate models for a training dataset. It can be applied across different types of models (e.g., logistic regression, SVM, KNN).

13. Generalization: A model's ability to adapt properly to new, previously unseen data, drawn from the same distribution as the data used to create the model.

14. Bayesian machine learning: We can think of machine learning as learning models of data. The Bayesian framework states that you start by enumerating all reasonable models of the data and assigning your prior belief P(M) to each of these models.
15. Decision tree in machine learning: A non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

16. Bayesian decision theory: A decision theory informed by Bayesian probability. It is a statistical framework that quantifies the tradeoff between various decisions, making use of probabilities and costs.

17. Overfitting: A model that fits its (possibly noisy) training data too closely, so that the noise leads it to produce inaccurate outputs on new data.

18. Underfitting: A model trained on too little data (or too simple a model), which fails to generalize to new data.
19. Posterior probability: posterior = (likelihood × prior) / evidence; equivalently, P(M|D) = P(D|M) P(M) / P(D).
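A quick numeric illustration of this formula, as a minimal Python sketch (the disease prevalence and test accuracies below are invented for illustration, not from the source):

    # Bayes' rule: posterior = likelihood * prior / evidence
    # Hypothetical example: probability of disease given a positive test.
    prior = 0.02              # P(disease): assumed 2% prevalence
    likelihood = 0.95         # P(positive | disease): assumed sensitivity
    false_positive = 0.10     # P(positive | no disease): assumed

    # evidence = total probability of observing a positive test
    evidence = likelihood * prior + false_positive * (1 - prior)
    posterior = likelihood * prior / evidence
    print(f"P(disease | positive) = {posterior:.3f}")   # ~0.162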

20. Bias: The difference between the model's predicted output and the actual value.

21. Variance: Measures the spread of a variable around its mean; in modelling, how much predictions change across different training sets.

22. Covariance: Used to analyse the linear relationship between two attributes, i.e., how two variables vary together.
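The three quantities can be illustrated numerically; a minimal Python sketch with made-up arrays:

    import numpy as np

    actual = np.array([3.0, 5.0, 7.0, 9.0])
    predicted = np.array([2.5, 5.5, 6.0, 9.5])

    bias = np.mean(predicted - actual)            # average prediction error
    variance = np.var(actual)                     # spread of one variable
    covariance = np.cov(actual, predicted)[0, 1]  # joint variation of two

    print(bias, variance, covariance)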

23. Ill-posed problem: A problem where the data by itself is not sufficient to find a unique solution.

24. Generalization: A model trained on the training set that predicts the right output for new instances is said to generalize.

25. Validation set: A held-out set used to test the generalization ability of a model.
UNIT II PARAMETRIC AND SEMI-PARAMETRIC METHODS
26. Parametric classification: Parametric methods, like discriminant analysis classification, fit a parametric model to the training data and interpolate to classify test data. Nonparametric methods, like classification and regression trees, use other means to determine classifications.

27. Parametric regression: For example, polynomial regression consists of performing multiple regression with polynomial terms of the variables in order to find the polynomial coefficients (parameters). Such regressions are known as parametric regression since they are based on models that require the estimation of a finite number of parameters.

28. Model complexity: Can be characterized by many things and is somewhat subjective. In machine learning, it often refers to the number of features or terms included in a given predictive model, as well as whether the chosen model is linear, nonlinear, and so on.

29. Parametric models: Models that are well-defined in a finite-dimensional parameter space.

30. Non-parametric models: Models whose parameters can span an infinite-dimensional space. A semi-parametric model has one component that is finite-dimensional (i.e., easy to study and understand) and another that is infinite-dimensional.

31. Model selection: The second step of the machine learning process, following variable selection and data cleansing. Selecting the right machine learning model is a critical step, as a model which does not appropriately fit the data will yield inaccurate results.
32. Parameter estimates: Parameter estimates (also called coefficients) are the change in the response associated with a one-unit change in a predictor, all other predictors being held constant.

33. Multivariate Regression: A method used to measure the degree to which more than one independent variable (predictors) and more than one dependent variable (responses) are linearly related.

34. Binary classification: The task of classifying the elements of a set into two groups on the basis of a classification rule.

35. Clustering: The task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.

36. Types of clustering: Hierarchical clustering, K-means clustering.

37. K-means algorithm: If k is given, the K-means algorithm can be executed in the following steps: partition the objects into k non-empty subsets; compute the distance from each point to each cluster centroid and assign each point to the cluster whose centroid is nearest; recompute the centroids and repeat until the assignments stabilize.
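A minimal Python sketch of these steps, assuming two loose point clouds and k = 2 (the data and iteration count are illustrative, and the sketch assumes no cluster ever goes empty):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    X[50:] += 4.0                     # two loose groups of points
    k = 2
    centroids = X[rng.choice(len(X), k, replace=False)]  # initial picks

    for _ in range(10):               # repeat until (roughly) stable
        # assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    print(centroids)                  # one centroid per group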
38. Hierarchical Clustering: Minimum-distance clustering is also called single-linkage hierarchical clustering or nearest-neighbour clustering.

39. Maximum likelihood estimation: A method that determines values for the parameters of a model by maximizing the likelihood of the observed data.

40. Bernoulli density: Describes a single trial of a Bernoulli experiment. The closed form of the probability mass function of the Bernoulli distribution is P(x) = p^x (1 − p)^(1−x), for x ∈ {0, 1}.
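Evaluating this density at both outcomes serves as a quick check; p = 0.3 below is an arbitrary choice:

    p = 0.3                          # P(x = 1) for the Bernoulli trial

    def bernoulli(x, p):
        # closed-form pmf: p^x * (1 - p)^(1 - x) for x in {0, 1}
        return p**x * (1 - p)**(1 - x)

    print(bernoulli(1, p))           # 0.3
    print(bernoulli(0, p))           # 0.7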
41. Prior distribution: Your belief about the model before observing data; it is combined with the likelihood function, which tells you what information is contained in your observed data (the "new evidence"), to form the posterior.

42. Posterior distribution: Summarizes what you know after the data has been observed.

43. Independent variables: Independent variables (also referred to as features) are the inputs to the process being analyzed.

44. Dependent variables: Dependent variables are the outputs of the process.

45. Least squares method: A statistical procedure that finds the best fit for a set of data points by minimizing the sum of the squared offsets (residuals) of the points from the fitted curve.

46. Least squares regression: Used to predict the behavior of dependent variables.
47. Polynomial Regression: A form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial.

48. Relative Squared Error: The squared error relative to what it would have been if a simple predictor had been used. More specifically, this simple predictor is just the average of the actual values.

49. Cross-validation: A resampling procedure used to evaluate machine learning models on a limited data sample.
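A sketch of the k-fold splitting logic behind cross-validation, using only the Python standard library (n = 20 samples and k = 5 folds are illustrative; the model-training step is deliberately omitted):

    import random

    def k_fold_indices(n, k):
        # split indices 0..n-1 into k roughly equal folds
        idx = list(range(n))
        random.shuffle(idx)
        return [idx[i::k] for i in range(k)]

    n, k = 20, 5
    for fold, test_idx in enumerate(k_fold_indices(n, k)):
        test = set(test_idx)
        train_idx = [i for i in range(n) if i not in test]
        # train the model on train_idx, evaluate on test_idx (omitted)
        print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")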
50. Regularization: The process of adding information in order to solve an ill-posed problem or to prevent overfitting.
UNIT III ARTIFICIAL NEURAL NETWORKS
51. Artificial neuron: A mathematical function conceived as a model of biological neurons in a neural network. Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function.

52. Neural network learning: An artificial neural network learning algorithm (or neural network, or just neural net) is a computational learning system that uses a network of functions to understand and translate a data input of one form into a desired output, usually in another form.

53. Perceptron: An algorithm used for supervised learning of binary classifiers. Binary classifiers decide whether an input, usually represented by a vector of numbers, belongs to a specific class.

54. Perceptron Learning Rule: States that the algorithm automatically learns the optimal weight coefficients.
55. Gradient descent: A first-order iterative optimization algorithm for finding a local minimum of a differentiable function.

56. Delta rule: In machine learning and neural network environments, a specific type of backpropagation that helps refine connectionist ML/AI networks, making connections between inputs and outputs with layers of artificial neurons.

57. Multilayer networks: Solve the classification problem for non-linear sets by employing hidden layers, whose neurons are not directly connected to the output.

58. Backpropagation algorithm: Looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent.

59. Gradient descent: An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model.
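A minimal sketch of gradient descent minimizing f(w) = (w − 3)², whose gradient is 2(w − 3); the starting point and learning rate are arbitrary:

    w = 0.0            # initial parameter
    lr = 0.1           # learning rate (step size)

    for step in range(50):
        grad = 2 * (w - 3)    # derivative of (w - 3)^2
        w -= lr * grad        # move against the gradient
    print(w)                  # approaches the minimum at w = 3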
60. Multilayer networks: Solve the classification problem for non-linear sets by employing hidden layers, whose neurons are not directly connected to the output.

61. Multilayer perceptron: A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). An MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish it from a linear perceptron; it can distinguish data that is not linearly separable.

62. Activation function: Helps the perceptron to learn when it is part of a multilayer perceptron (MLP). Certain properties of the activation function, especially its non-linear nature, make it possible to train complex neural networks.

63. Representation of Neural Networks: The connections between neurons are represented by the edges connecting nodes in the graph representation of the artificial neural network. They are called weights and are typically denoted w_ij. The weights of a neural network are a particular case of the parameters of a parametric model.

64. Threshold unit: A linear threshold unit (LTU) is a simple artificial neuron whose output is its thresholded total net input: an LTU with threshold T calculates the weighted sum of its inputs, then outputs 0 if this sum is less than T and 1 if it is greater than T.

65. Need for Backpropagation: Backpropagation simplifies the network structure by removing weighted links that have a minimal effect on the trained network. It is especially useful for deep neural networks working on error-prone projects, such as image or speech recognition.
66. Difference between cost and loss function: The two terms refer to almost the same thing. The loss function is a value calculated at every instance, while the cost function is an average of loss functions; so, within a single training cycle the loss is calculated numerous times, but the cost function only once.

67. Error-Correction Learning: Used with supervised learning, this is the technique of comparing the system output to the desired output value and using that error to direct the training.

68. Difference between neuron and perceptron: A neuron is (in cytology) a cell of the nervous system that conducts nerve impulses, consisting of an axon and several dendrites, with neurons connected by synapses; a perceptron is an element of an artificial neural network, analogous to a neuron, consisting of one or more layers of artificial neurons.
69. Perceptron algorithms: Can be categorized into single-layer and multi-layer perceptrons. The single-layer type organizes neurons in a single layer, while the multi-layer type arranges neurons in multiple layers.

70. Problem in Neural Networks: If you accept that most classes of problems can be reduced to functions, then a neural network can, in principle, address them by approximating those functions.

71. Advantages of Neural Networks: Neural networks have the ability to learn by themselves and produce output that is not limited to the input provided to them. The input is stored in the network's own weights rather than in a database, so partial loss of data does not stop it from working.

72. Applications of Neural Networks: Recognizing handwritten characters; image compression (neural networks can receive and process vast amounts of information at once, making them useful in image compression).

73. Preventing overfitting in a neural network:
 Early stopping: a form of regularization applied while training a model with an iterative method, such as gradient descent.
 Data augmentation
 Regularization
 Dropout
74. Types of Neural Networks: Feedforward neural network (artificial neuron); radial basis function neural network; Kohonen self-organizing neural network; recurrent neural network (RNN) / long short-term memory; convolutional neural network; modular neural network.

75. Recurrent neural networks: RNNs are the state-of-the-art algorithm for sequential data and are used by Apple's Siri and Google's voice search. It is the first algorithm that remembers its input, due to an internal memory, which makes it well suited for machine learning problems that involve sequential data.
UNIT IV INSTANCE BASED LEARNING
76. Instance-based learning: A family of techniques for classification and regression which produce a class label/prediction based on the similarity of the query to its nearest neighbour(s) in the training set.

77. Why instance-based learning is called lazy learning: Instance-based learning includes nearest neighbour, locally weighted regression and case-based reasoning methods. Instance-based methods are sometimes referred to as lazy learning methods because they delay processing until a new instance must be classified.

78. Lazy learner technique: A lazy learner simply stores the training data and starts generalization only when it sees a test tuple, classifying the tuple based on its similarity to the stored training tuples.

79. Lazy algorithm: An algorithm that generalizes the data only after a query is made. The best example of this is KNN.

80. Why the KNN algorithm is used: KNN is one of the simplest classification algorithms and one of the most used learning algorithms. It is a non-parametric, lazy learning algorithm.

81. Why KNN is a lazy learner: K-NN doesn't learn a discriminative function from the training data but "memorizes" the training dataset instead.
82. What does k mean in KNN: 'k' is a parameter that refers to the number of nearest neighbours included in the majority-voting process.

83. Is K-means supervised or unsupervised: The 'K' in K-means clustering has nothing to do with the 'k' in the KNN algorithm. K-means clustering is an unsupervised learning algorithm used for clustering, whereas KNN is a supervised learning algorithm used for classification.

84. Nearest Neighbour rule: Selects the class for x under the assumption that if x' and x were overlapping (at the same point), they would share the same class.

85. Disadvantage of KNN: The main disadvantage of the KNN algorithm is that it is a lazy learner: it does not learn anything from the training data and simply uses the training data itself for classification.

86. Advantages of KNN: Very simple implementation; robust with regard to the search space (for instance, classes don't have to be linearly separable); the classifier can be updated online at very little cost as new instances with known classes are presented.

87. K Nearest Neighbor algorithm in machine learning: A simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. It is easy to implement and understand, but has the major drawback of becoming significantly slower as the size of the data in use grows.
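A minimal sketch of KNN classification with Euclidean distance and majority voting; the toy training points and k = 3 are invented for illustration:

    from collections import Counter
    import math

    train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
             ((4.0, 4.0), 'B'), ((4.2, 3.9), 'B'), ((3.8, 4.1), 'B')]

    def knn_predict(query, train, k=3):
        # sort training points by Euclidean distance to the query
        by_dist = sorted(train, key=lambda item: math.dist(item[0], query))
        votes = Counter(label for _, label in by_dist[:k])
        return votes.most_common(1)[0][0]   # majority vote of k nearest

    print(knn_predict((3.5, 3.5), train))   # 'B'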
88. Locally weighted regression: LWR attempts to fit the training data only in a region around the location of a query example. It is a type of lazy learning: the processing of training data is often postponed until the target value of a query example needs to be predicted.
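A sketch of LWR at a single query point, using a Gaussian kernel to weight nearby training points in a weighted least squares fit (the data, bandwidth tau and query point are illustrative):

    import numpy as np

    x = np.linspace(0, 6, 30)
    y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=30)

    def lwr_predict(x_query, x, y, tau=0.5):
        # Gaussian weights: points near the query count more
        w = np.exp(-(x - x_query) ** 2 / (2 * tau ** 2))
        A = np.column_stack([x, np.ones_like(x)])
        W = np.diag(w)
        # solve the weighted normal equations A^T W A beta = A^T W y
        beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        return beta[0] * x_query + beta[1]

    print(lwr_predict(2.0, x, y))   # local linear fit near x = 2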
89. Weighted kNN: In weighted kNN, the nearest k points are given a weight using a function called the kernel function.

90. Remarks on locally weighted regression:
 A broad range of methods for distance-weighting the training examples.
 A range of methods for locally approximating target functions.

91. Radial basis functions: A means to approximate multivariable (also called multivariate) functions by linear combinations of terms based on a single univariate function (the radial basis function). This function is radialised so that it can be used in more than one dimension.
92. Case-based reasoning: CBR is a paradigm of artificial intelligence and cognitive science that models the reasoning process as primarily memory-based. Case-based reasoners solve new problems by retrieving stored 'cases' describing similar prior problem-solving episodes and adapting their solutions to fit new needs.

93. Lazy Learning: A lazy learning algorithm is simply an algorithm that generalizes the data only after a query is made. The best example of this is KNN.

94. Eager Learning: In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input-independent target function during training, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system.

95. Euclidean distance: The Euclidean distance between two points in the plane or in 3-dimensional space measures the length of the segment connecting them. It is the most obvious way of representing the distance between two points.
96. Why do we use Euclidean distance: Euclidean distance gives the distance from each cell in the raster to the closest source.

97. Why Euclidean distance is used in KNN: Usually, the Euclidean distance is used as the distance metric. The algorithm then assigns the point to the class most common among its k nearest neighbours (where k is an integer).

98. Radial basis function: An RBF is a function that changes with distance from a location. For example, suppose the radial basis function is simply the distance from each location; it then forms an inverted cone over each location.

99. What is a Gaussian radial basis function: A radial basis function (RBF) is a real-valued function whose value depends only on the distance between the input and some fixed point: either the origin, so that φ(x) = φ(‖x‖), or some other fixed point c, called a center, so that φ(x) = φ(‖x − c‖).
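A sketch of the Gaussian RBF φ(x) = exp(−‖x − c‖² / (2σ²)) evaluated at a few distances; the center and width are arbitrary:

    import math

    def gaussian_rbf(x, center=0.0, sigma=1.0):
        # the value depends only on the distance |x - center|
        return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

    for x in [0.0, 1.0, 2.0, 3.0]:
        print(x, round(gaussian_rbf(x), 4))  # decays with distance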
100. Can KNN be used for prediction: The KNN algorithm can be used for both classification and regression problems. It uses 'feature similarity' to predict the values of any new data points: the average of the values of the k nearest neighbours is taken as the final prediction.
UNIT V ADVANCED LEARNING
101. Bayesian network: A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

102. Directed Acyclic Graph (DAG): In computer science and mathematics, a graph that is directed and contains no cycles.

103. Causal graph: Depicts whatever assumptions you are making about the relationships between the variables.

104. Conditional Independence: Two events A and B are conditionally independent given an event C with P(C) > 0 if P(A ∩ B | C) = P(A | C) P(B | C). Recall from the definition of conditional probability that P(A | B) = P(A ∩ B) / P(B), if P(B) > 0.
105. Diagnostic Inference: Diagnostic, or bottom-up, inference: reasoning from observed effects back to their causes.

106. Probabilistic Database: An uncertain database in which the possible worlds have associated probabilities.

107. Hidden Variables: Confounding, in statistics, is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable.

108. Direct Influence: Direct influence means that we can take specific steps to try to get the thing done.

109. Multinomial Variable: Multinomial logistic regression is used to predict a nominal dependent variable given one or more independent variables.

110. Generative Model: A powerful way of learning any kind of data distribution using unsupervised learning; it has achieved tremendous success in just a few years.
111. Phylogenetic Tree: A diagram that represents evolutionary relationships among organisms. Phylogenetic trees are hypotheses, not definitive facts.

112. Hidden Markov Model (HMM): HMMs have proven to be one of the most widely used tools for learning probabilistic models of time-series data.

113. Kalman Filter: A Kalman filter can be applied to take in the GPS data from a car; however, GPS devices are not always entirely accurate.

114. Bayes' ball: An efficient algorithm for computing d-separation by passing simple messages between nodes of the graph.

115. Junction Trees: The junction tree algorithm (also known as 'clique tree') is a method used in machine learning to extract marginalization in general graphs.
116. Markov random field: A graphical model of a joint probability distribution.

117. Maximal Clique: A clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique.

118. Factor Graph: A factor graph is a type of probabilistic graphical model.

119. Sum-Product Algorithm: Operates in a factor graph and attempts to compute various marginal functions associated with the global function.

120. Max-Product Algorithm: Max-product is a standard belief propagation algorithm on factor graph models.

121. Decision Node: A node in an activity at which the flow branches into several optional flows.
122. Sensor Fusion: Data from different sensors are integrated to extract more information for a specific application.

123. Random Subspace: The random subspace method is used for constructing decision forests.

124. Error-Correcting Output Codes: ECOC is an ensemble method designed for multi-class classification problems.

125. Bayesian network: A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).
GATE QUESTIONS
126. Multiple Expert: Multiple-expert classification methods rely on a large training dataset in order to be properly utilized.

127. Ensemble: An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions.

128. Linear Opinion Pools: An important question when eliciting opinions from experts is how to aggregate the reported opinions.

129. Hamming distance: A metric for comparing two binary data strings.
130. Bagging: Used when the goal is to reduce the variance of a decision tree classifier.

131. Boosting: The term 'boosting' refers to a family of algorithms which convert weak learners into strong learners.

132. AdaBoost: An ensemble learning method (also known as "meta-learning") which was initially created to increase the efficiency of binary classifiers.

133. Decision Stump: A machine learning model consisting of a one-level decision tree.

134. Mixture of experts: A machine learning technique where multiple experts (learners) are used to divide the problem space into homogeneous regions.

135. Dynamic Classifier Selection: Dynamic classifier selection based on multiple-classifier ensembles, using accuracy and diversity measures.

136. Stacked Generalization: A scheme for minimizing the generalization error rate of one or more generalizers.
137. Cascading: A multistage method.

138. Spoofing: The act of disguising a communication from an unknown source as being from a known, trusted source.

139. Multiple kernel learning: MKL algorithms aim to find the best convex combination of a set of kernels to form the best classifier.

140. k-armed bandit: In the classical k-armed bandit problem, there are k alternative arms, each with a stochastic reward whose probability distribution is initially unknown.

141. Markov decision process: An MDP is a discrete-time stochastic control process.

142. Finite horizon: A stopping rule problem has a finite horizon if there is a known upper bound on the number of stages at which one may stop.

143. Infinite horizon: Infinite-horizon problems are further characterized by the fact that the number of stages N is infinite.

144. Optimal Policy: A policy where you always choose the action that maximizes the "return"/"utility" of the current state.
145. Bellman's equation: Among the most important equations in reinforcement learning, it expresses the value of a state recursively, as the immediate reward plus the discounted value of the successor state.

146. Value iteration: A method of computing an optimal MDP policy and its value. Value iteration starts at the "end" and then works backward, refining an estimate of either Q* or V*.
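A minimal value iteration sketch on a tiny deterministic chain MDP; the four states, reward scheme and γ = 0.9 are invented for illustration:

    # Tiny deterministic MDP: states 0..3 on a line, actions move left/right.
    # Reaching state 3 yields reward 1; everything else yields 0.
    n_states, gamma = 4, 0.9
    V = [0.0] * n_states

    def step(s, a):                   # a = -1 (left) or +1 (right)
        s2 = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s2 == n_states - 1 else 0.0
        return s2, reward

    for _ in range(100):              # V converges toward V*
        V = [max(r + gamma * V[s2]
                 for s2, r in (step(s, a) for a in (-1, +1)))
             for s in range(n_states)]

    print([round(v, 3) for v in V])   # values increase toward the goal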
147. Policy Iteration: You randomly select a policy and find the value function corresponding to it, then find a new policy based on the previous value function, and so on; this leads to the optimal policy.

148. Temporal difference: TD learning is an approach to learning how to predict a quantity that depends on future values of a given signal.

149. Greedy search: An algorithm that uses a heuristic to make locally optimal choices at each stage, with the hope of finding a global optimum.

150. Q-learning: An off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state.
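A minimal sketch of the tabular Q-learning update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s′,·) − Q(s,a)] on a toy chain environment; all parameters and the environment are invented for illustration:

    import random

    n_states, n_actions = 4, 2        # chain; actions: 0 = left, 1 = right
    alpha, gamma, epsilon = 0.5, 0.9, 0.2
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def env_step(s, a):
        s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        return s2, r, s2 == n_states - 1   # next state, reward, done

    for episode in range(500):
        s = 0
        while True:
            # epsilon-greedy action selection
            a = random.randrange(n_actions) if random.random() < epsilon \
                else max(range(n_actions), key=lambda act: Q[s][act])
            s2, r, done = env_step(s, a)
            # off-policy TD update using the max over next actions
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break

    print([max(q) for q in Q])    # 'right' becomes the preferred action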
Faculty Team Prepared:
1. Dr. G. Kavitha, Prof & Head
2. Dr. N. Naveen Kumar, ASP/CSE

HoD
