Machine Learning Notes


Mission Machine Learning

Pre-requisite Maths topics for learning Machine Learning


Linear Algebra WHY: Most of the machine learning that we do deals
with scalars, vectors, and matrices -- vectors of features, matrices of weights, and so on.
You do vector-matrix multiplication in, say, logistic regression or neural networks, or
you do a matrix transpose first and then multiplication (for example, in error backpropagation
in neural networks). Sometimes you need to cluster the input, maybe using spectral
clustering techniques, which requires you to know what eigenvalues and
eigenvectors are. Sometimes you need to take inverses of matrices, say when computing the
inverse of a covariance matrix for fitting a Gaussian distribution.
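The operations above can be sketched in NumPy; the matrices and values here are illustrative, not taken from any real model:

```python
import numpy as np

# A tiny logistic-regression-style step: weight matrix times feature vector.
W = np.array([[0.2, -0.5],
              [0.7,  0.1]])        # 2x2 weight matrix
x = np.array([1.0, 2.0])           # feature vector

z = W @ x                          # vector-matrix multiplication
zt = W.T @ x                       # transpose first, then multiply (as in backprop)

# Eigenvalues and eigenvectors, used e.g. in spectral clustering.
vals, vecs = np.linalg.eig(W)

# Inverse of a covariance matrix, as needed when fitting a Gaussian.
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
cov_inv = np.linalg.inv(cov)
```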

Optimization Theory WHY: How do you train the weights of your model so that the
training error is minimized? Answer: optimization.

You may need to know how to take derivatives of a loss function with respect to some
parameter so that you can carry out gradient descent optimization. You may need to
know what gradients mean, and what Hessians are if you are doing second-order
optimization such as L-BFGS. You may need to learn what Newton steps are, maybe to
solve line searches. You will need to understand functional derivatives to better
understand gradient-boosted decision trees. You will need to
understand the convergence properties of various optimization methods to get an idea of
how fast or slow your algorithm will run.
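A minimal sketch of gradient descent on a toy one-parameter loss, loss(w) = (w - 3)^2, whose derivative is 2(w - 3); the loss, learning rate, and step count are illustrative:

```python
def grad(w):
    """Derivative of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0          # initial weight
lr = 0.1         # learning rate
for _ in range(100):
    w -= lr * grad(w)   # step opposite the gradient

# w converges toward 3.0, the minimizer of the loss.
```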

Probability and Statistics WHY: When you are doing machine learning, you are
primarily after some kind of distribution. What is the probability of an output given
my input? Why do I need this? When your machine learning model assigns high
enough probability to known observations, you know you have a good model at
hand; it is a goodness criterion. Statistics helps you count well, normalize well, obtain
distributions, and find the mean and standard deviation of your input features. Why
do you need these things? You need means and variances to better normalize your input
data before you feed it into your machine learning system. This helps with faster
convergence (an optimization-theory concept).
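The normalization step mentioned above can be sketched as standardization (subtract the mean, divide by the standard deviation); the data here is made up for illustration:

```python
import numpy as np

# Rows are examples, columns are features with very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation

# Standardized data: each feature now has zero mean and unit variance.
X_norm = (X - mu) / sigma
```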

Signal Processing WHY: You usually do not feed raw input to your machine learning
systems; you do some kind of preprocessing. For instance, you might want to extract
some features from an input speech signal or an image. Extracting these features
requires you to know the properties of the underlying signals; digital signal processing or
image processing will help you gain that expertise. You would be in a better position to
know which feature extraction works and which does not. You would want to learn what
a Fourier transform is, because maybe you would like to apply it to a speech signal, or
maybe apply the discrete cosine transform to images before using them as features for
your machine learning system.
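As a sketch of the Fourier-transform idea, here is the discrete Fourier transform of a toy "speech-like" signal (a pure tone); the sample rate and frequency are illustrative assumptions:

```python
import numpy as np

fs = 8000                            # sample rate in Hz (illustrative)
t = np.arange(0, 1, 1 / fs)          # one second of samples
signal = np.sin(2 * np.pi * 440 * t) # a pure 440 Hz tone

# Magnitude spectrum via the real-input discrete Fourier transform.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

# The strongest frequency component recovers the tone's frequency.
peak = freqs[np.argmax(spectrum)]
```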
Pre-requisites for Machine Learning:
1. Vectors Basics
Click on the link below:

vectors basics.pdf

2. Matrix, Row Vector and Column vector


Understanding how matrices are categorized by dimension is the trick to seeing the difference between
column and row vectors. Vectors can be viewed as a special type of matrix where one of their two
dimensions is always equal to 1. Depending on which dimension is set to 1, you'll get either a column or a row
vector. A column vector is an nx1 matrix, because it always has 1 column and some number of rows. A row
vector is a 1xn matrix, as it has 1 row and some number of columns. This is the major difference between a column
and a row vector.
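The shape distinction above can be checked directly in NumPy; the values are illustrative:

```python
import numpy as np

# A column vector is an n x 1 matrix; a row vector is a 1 x n matrix.
col = np.array([[1], [2], [3]])   # shape (3, 1): 3 rows, 1 column
row = np.array([[1, 2, 3]])       # shape (1, 3): 1 row, 3 columns

# Transposing one gives the other.
col_t = col.T
```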
3. Identity Matrix
It has elements Iij that equal 1 if i = j and 0 if i ≠ j
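This definition can be illustrated in NumPy, along with the identity matrix's defining property that multiplying by it leaves a matrix unchanged:

```python
import numpy as np

I = np.eye(3)    # 3x3 identity: ones where i == j, zeros elsewhere

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# I @ A leaves A unchanged, as does A @ I.
product = I @ A
```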

4. Root Mean Square Deviation or Root Mean Square Error


The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) (or
sometimes root-mean-squared error) is a frequently used measure of the differences between
values (sample and population values) predicted by a model or an estimator and the values
actually observed. The RMSD represents the sample standard deviation of the differences
between predicted values and observed values. These individual differences are
called residuals when the calculations are performed over the data sample that was used for
estimation, and are called “prediction errors” when computed out-of-sample. The RMSD serves to
aggregate the magnitudes of the errors in predictions for various times into a single measure of
predictive power. RMSD is a measure of accuracy used to compare the forecasting errors of
different models for a particular dataset, not between datasets, as it is scale-dependent.
RMSD is the square root of the average of squared errors. The effect of each error on RMSD is
proportional to the size of the squared error; thus larger errors have a disproportionately large
effect on RMSD.
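The definition above, the square root of the average of squared errors, can be sketched as follows; the observed and predicted values are made up for illustration:

```python
import numpy as np

observed = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

# Residuals: differences between observed and predicted values.
residuals = observed - predicted

# RMSD/RMSE: square root of the mean of the squared residuals.
rmse = np.sqrt(np.mean(residuals ** 2))
```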

Important Terminologies in Machine learning


 Proliferation
 Training Set
 Generalization
 Supervised Learning
 Regression
 Clustering
 Density Estimation
 Exploration/Exploitation
 Polynomial Curve fitting
 Error Function
 Overfitting
 Regularized Error Function
