ML PDF
This page will help you brush up on your machine learning skills so you can crack the
interview.
Here, our focus will be on real-world ML interview questions asked at
Microsoft, Amazon, etc., and how to answer them.
Let’s get started!
Firstly, machine learning refers to the process of training a computer program to
build a statistical model based on data. The goal of machine learning (ML) is to
identify the key patterns in data and to extract key insights from it.
For example, if we have a historical dataset of actual sales figures, we can train
machine learning models to predict future sales.
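As a minimal sketch of the sales example, the Python snippet below fits a straight line to a few made-up monthly sales figures with ordinary least squares and extrapolates one month ahead. The numbers, the names months, sales and fit_line, and the choice of a linear model are all assumptions for illustration; real sales data would usually need a richer model.

```python
# Hypothetical monthly sales figures (made up for illustration).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the two parameters from the historical data.
slope, intercept = fit_line(months, sales)

# "Prediction": extrapolate to month 7, which is not in the dataset.
forecast_month_7 = slope * 7 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, forecast={forecast_month_7:.1f}")
```

The same fit_line function works unchanged on any other list of paired numbers, which is the point of learning from data rather than writing rules by hand.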
The simplest answer is to make our lives easier. In the early days of “intelligent”
applications, many systems used hardcoded “if” and “else” rules to
process data or respond to user input. Think of a spam filter whose job is to move the
appropriate incoming email messages to a spam folder.
With machine learning algorithms, however, we supply ample data and let the
algorithm learn and identify the patterns in it.
Unlike with hand-written rules, we don’t need to write new rules for each problem in
machine learning; we just need to use the same workflow with a different
dataset.
Let’s talk about Alan Turing. In his 1950 paper, “Computing Machinery and
Intelligence”, Turing asked, “Can machines think?”
The paper describes the “Imitation Game”, which includes three participants:
A human acting as a judge,
Another human, and
A computer attempting to convince the judge that it is human.
The judge asks the other two participants to talk. While they respond, the judge needs
to decide which response came from the computer. If the judge cannot tell the
difference, the computer wins the game.
The test continues today as an annual competition in artificial intelligence. The aim
is simple enough: convince the judge that they are chatting to a human instead of a
computer chatbot program.
The different naive Bayes classifiers differ mainly in the assumptions they make
regarding the distribution of $P(x_i \mid y)$, which can be Bernoulli, binomial,
Gaussian, and so on.
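To make the role of that distributional assumption concrete, here is a small sketch of a Gaussian naive Bayes classifier written with only the Python standard library. It assumes each feature is normally distributed within each class; the function names fit_gaussian_nb and predict, the toy data, and the small variance floor are illustrative choices, not part of any particular library.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, the assumed form of P(x_i | y).
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy data: two well-separated classes with two features each.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.1]), predict(model, [5.1, 5.9]))
```

Swapping the Gaussian log-density line for a Bernoulli or multinomial likelihood would give the other variants; the rest of the classifier stays the same.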
Thus making the dataset easier to visualize. PCA is used in finance, neuroscience, and
pharmacology.
It is very useful as a preprocessing step, especially when there are linear correlations
between features.
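The effect of PCA on linearly correlated features can be sketched for the two-feature case, where the eigen-decomposition of the covariance matrix has a closed form. The snippet below is an illustration only: the function name pca_2d and the toy points are assumptions, and it expects at least two distinct points. In practice a numerical library would be used for higher dimensions.

```python
import math


def pca_2d(points):
    """First principal component of 2-D data.

    Returns (direction, explained_ratio, projected), where projected holds
    the 1-D coordinate of each centered point along the direction.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; fall back to an axis if the features are uncorrelated.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, largest / (a + c), projected


# Two perfectly correlated features: the second is always twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, explained, projected = pca_2d(points)
print(direction, explained, projected)
```

Because the two features here are perfectly correlated, a single component captures all of the variance, so the two columns can be replaced by one without losing information.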
Suppose we are given some data points that each belong to one of two classes, and
the goal is to separate the two classes based on a set of examples.
In SVM, a data point is viewed as a p-dimensional vector (a list of p numbers), and we
want to know whether we can separate such points with a (p-1)-dimensional
hyperplane. This is called a linear classifier.
There are many hyperplanes that could classify the data. We choose the hyperplane
that represents the largest separation, or margin, between the two classes.
If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the
linear classifier it defines is known as a maximum-margin classifier. The best
hyperplane that divides the data is H3.
We have data $(x_1, y_1), \ldots, (x_n, y_n)$, where each point $x_i$ has $p$ features
$(x_{i1}, \ldots, x_{ip})$ and each label $y_i$ is either 1 or -1.
The equation of the hyperplane H3 is the set of points $x$ satisfying:
$w \cdot x - b = 0$
where $w$ is the normal vector of the hyperplane. The parameter $\frac{b}{\lVert w \rVert}$ determines the
offset of the hyperplane from the origin along the normal vector $w$.
So for each $i$, $x_i$ lies on the side of the hyperplane for class 1 or for class -1. That is, $x_i$ satisfies:
$w \cdot x_i - b \ge 1$ (if $y_i = 1$) or $w \cdot x_i - b \le -1$ (if $y_i = -1$)
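The hyperplane conditions can be checked numerically with a few lines of Python. This is only a sketch: the weight vector w, offset b, and the four data points are hand-picked for illustration rather than learned, and the helper names are hypothetical. The margin width of 2 / ||w|| is the standard result for a hyperplane scaled so that the closest points satisfy the conditions with equality.

```python
import math


def decision_value(w, x, b):
    """Compute w . x - b for one data point."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def classify(w, x, b):
    """Predict 1 or -1 from the side of the hyperplane the point falls on."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def satisfies_margin(w, b, X, y):
    """True if y_i * (w . x_i - b) >= 1 for every i.

    This single inequality combines the two cases: w . x_i - b >= 1 when
    y_i = 1, and w . x_i - b <= -1 when y_i = -1.
    """
    return all(yi * decision_value(w, xi, b) >= 1 for xi, yi in zip(X, y))


def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Hand-picked hyperplane and toy data (assumed, not trained).
w, b = (1.0, 1.0), 3.0
X = [(1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
y = [-1, -1, 1, 1]

print(satisfies_margin(w, b, X, y))
print(classify(w, (0.0, 0.0), b), classify(w, (3.0, 3.0), b))
print(margin_width(w))
```

Training an SVM amounts to searching for the w and b that keep satisfies_margin true while making margin_width as large as possible.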
10. What is Cross-Validation?
Cross-validation is a method of estimating how well a model generalizes by repeatedly
splitting the data into training and testing parts. The data is split into k subsets,
and the model is trained on k-1 of those subsets.
The remaining subset is held out for testing. This is repeated so that each subset
serves as the test set once. This is k-fold cross-validation. Finally, the scores from
all k folds are averaged to produce the final score.
Figure: Cross-validation
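The k-fold procedure can be sketched in plain Python as follows. The helper names k_fold_indices and cross_validate, the toy "predict the training mean" model, and the mean-absolute-error score are assumptions chosen to keep the example short; the folds are contiguous and unshuffled, whereas real data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Toy "model": always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


# Toy score: mean absolute error on the held-out fold (lower is better).
def mean_abs_error(model, xs, ys):
    return sum(abs(model - target) for target in ys) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
average_score = cross_validate(xs, ys, 3, fit_mean, mean_abs_error)
print(average_score)
```

Every sample lands in exactly one test fold, so each data point is used for both training and testing without ever being scored by a model that saw it.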
Recall answers the question, “Out of all the items that are truly relevant,
how many are found by the classifier?”
In general, precision means being exact and accurate, and the same holds for a
machine learning model. Precision answers the question, “Out of all the items that
the model predicted to be relevant, how many are truly relevant?”
The figure below shows a Venn diagram of precision and recall.
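Both quantities can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up labels, and the function name precision_recall and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Precision: of everything predicted relevant, how much truly is.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything truly relevant, how much was found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 4 truly relevant items, 3 predicted as relevant.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here two of the three predicted items are truly relevant (precision 2/3), while only two of the four relevant items were found (recall 1/2).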
In the underfitting case, by contrast, the model fails to capture the patterns in the
data. Here we need to change the algorithm, typically to a more expressive one, or
give the model more informative features; feeding it more data points alone rarely helps.
17. What are Loss Functions and Cost Functions? Explain the Key
Difference Between Them.
When calculating the error for only a single data point, we use the term loss
function.
Whereas, when calculating the aggregate error over multiple data points, we use the
term cost function. Apart from this difference in scope, the two are conceptually the same.
In other words, the loss function captures the difference between the actual and
predicted values for a single record, whereas the cost function aggregates that
difference over the entire training dataset.
The most commonly used loss functions are mean squared error and hinge loss.
Mean Squared Error (MSE): In simple words, it measures how far our model’s predicted
values are from the actual values, averaged over $n$ data points:
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{predicted value}_i - \text{actual value}_i)^2$
Hinge loss: It is used to train maximum-margin classifiers such as SVMs, and is defined as
$L(\hat{y}) = \max(0, 1 - y \cdot \hat{y})$
where $y = -1$ or $1$ indicates the two classes and $\hat{y}$ represents the output of the
classifier. A common choice of cost function is the average of the per-record losses over
the training set; for a linear model y = mx + b, for instance, it is the MSE of the predictions mx + b.
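The distinction between a per-record loss and a dataset-level cost can be sketched in a few lines of Python. The snippet uses the standard definitions, with MSE as the mean of squared differences and hinge loss as max(0, 1 - y * output); the function names and the sample numbers are assumptions for illustration.

```python
def squared_error(actual, predicted):
    """Loss for a single record: squared difference."""
    return (predicted - actual) ** 2


def hinge_loss(label, output):
    """Loss for a single record; label is -1 or 1, output is the raw classifier score."""
    return max(0.0, 1.0 - label * output)


def cost(loss, targets, outputs):
    """Cost over a dataset: the average of the per-record losses."""
    return sum(loss(t, o) for t, o in zip(targets, outputs)) / len(targets)


# Regression example: averaging the squared errors gives the MSE.
actual_values = [3.0, 5.0, 2.0]
predicted_values = [2.5, 5.0, 4.0]
mse = cost(squared_error, actual_values, predicted_values)

# Classification example with labels in {-1, 1} and raw classifier outputs.
labels = [1, -1, 1]
scores = [2.0, -0.5, -1.0]
mean_hinge = cost(hinge_loss, labels, scores)

print(mse, mean_hinge)
```

Note that the hinge loss is zero for the first record (correct with a margin of at least 1), positive for the second (correct but inside the margin), and largest for the third (misclassified).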
Based on the above observations, select the best-fit algorithm for the particular
dataset.
23. What is Clustering?
Clustering is the process of grouping a set of objects into a number of groups. Objects
should be similar to one another within the same cluster and dissimilar to those in
other clusters.
A few types of clustering are:
Hierarchical clustering
K-means clustering
Density-based clustering
Fuzzy clustering, etc.
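As one concrete instance, K-means clustering can be sketched in plain Python. This is a simplified illustration: it initializes the centroids naively from the first k points and runs a fixed number of iterations, whereas practical implementations use better initialization and a convergence check. The function name k_means and the toy points are assumptions, and math.dist requires Python 3.8 or later.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and update steps."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Toy data: three points near (1, 1) and three points near (8.5, 9).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Points within each resulting cluster are close to one another and far from the other cluster, which is the similarity criterion described above.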
© Copyright by Interviewbit
Machine Learning Interview Questions
29. What is P-value?
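P-values are used to make a decision about a hypothesis test. The p-value is the
probability, assuming the null hypothesis is true, of observing a result at least as
extreme as the one obtained; equivalently, it is the minimum significance level at
which you can reject the null hypothesis. The lower the p-value, the stronger the
evidence against the null hypothesis.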
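A small worked example may help here. The sketch below computes an exact one-sided p-value for a coin-flip experiment under the null hypothesis that the coin is fair. The scenario, the function name binomial_p_value, and the 0.05 significance level are assumptions for illustration; math.comb requires Python 3.8 or later.

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


alpha = 0.05  # significance level chosen before looking at the data

# Null hypothesis: the coin is fair. Two hypothetical outcomes of 10 flips.
p_nine_heads = binomial_p_value(9, 10)
p_six_heads = binomial_p_value(6, 10)

print(p_nine_heads, p_nine_heads < alpha)
print(p_six_heads, p_six_heads < alpha)
```

Nine heads out of ten gives a p-value of 11/1024 (about 0.011), which is below 0.05, so the null hypothesis is rejected; six heads gives 386/1024 (about 0.377), so it is not.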
Parametric models have a fixed, limited number of parameters; to predict new data, you
only need to know the parameters of the model.
Non-parametric models place no limit on the number of parameters, which allows for
more flexibility; to predict new data, you need to know both the model parameters
and the state of the data that has been observed.
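The practical difference shows up in what must be kept around at prediction time. In the sketch below, a least-squares line stands in for a parametric model and k-nearest-neighbors regression stands in for a non-parametric one; both choices, the function names, and the toy data are assumptions for illustration.

```python
def fit_linear(xs, ys):
    """Parametric: compress the data into two numbers (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def predict_linear(params, x):
    """Prediction needs only the fitted parameters, not the training data."""
    slope, intercept = params
    return slope * x + intercept


def predict_knn(xs, ys, x, k=2):
    """Non-parametric: prediction needs the stored training data itself."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in nearest) / k


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

params = fit_linear(xs, ys)           # the training data can be discarded after this
print(predict_linear(params, 5.0))    # uses the two parameters only
print(predict_knn(xs, ys, 2.4, k=2))  # uses the full training set
```

The linear model stays the same size no matter how much data it was trained on, while the nearest-neighbor predictor grows with the dataset.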
Conclusion
The questions listed above cover the basics of machine learning. Machine learning is
advancing so fast that new concepts will keep emerging. To stay up to date, join
communities, attend conferences, and read research papers. Doing so will help you
crack any ML interview.