Summer Training Report
ON
MACHINE LEARNING
I would especially like to thank Mr. Ankush Singla, Project Mentor &
Faculty Head, without whose guidance and support this training would
not have been possible. His encouragement and experience helped me
realize the practical aspects of programming. He gave me ample support
and help in accomplishing my project. I am grateful to him for giving me
the opportunity to gain practical experience in this field. His knowledge
and immense work experience helped me greatly in making this six-week
Practical Summer Training Program a great learning experience.
VISHAL KUMAR
ABSTRACT
The area of Machine Learning deals with the design of programs that
can learn rules from data, adapt to changes, and improve performance
with experience. In addition to being one of the initial dreams of
Computer Science, Machine Learning has become crucial as
computers are expected to solve increasingly complex problems and
become more integrated into our daily lives. This is a hard problem, since
making a machine learn from its computational tasks requires work at
several levels, and complexities and ambiguities arise at each of those
levels.
Here we study how machine learning takes place and what its methods are,
discuss various projects implemented during the training and their
applications, and consider the present and future status of machine learning.
TABLE OF CONTENTS
Projects
REFERENCES
Introduction to Machine Learning
What is Learning?
Learning is a phenomenon and a process which manifests itself in various
aspects. Roughly speaking, a learning process includes one or more of the
following:
2) To solve the same problem more effectively and give better quality
solutions.
After the data is pre-processed, we get well-structured data, and this data
now becomes the input for machine learning. But is this a one-time job?
Of course not; the process has to be iterative, and it keeps iterating as
long as data keeps becoming available. In machine learning, the major chunk
of time is spent on this process: working on the data to make it structured,
clean, ready, and available. Once the data is available, the algorithms can
be applied to it. Besides pre-processing tools, machine learning products
also offer a large number of machine learning algorithms. The result of
applying an algorithm to the data is a model, but the question now is
whether this is the final model we need.
No, it is only a candidate model, meaning the first reasonably appropriate
model we obtain, and it still needs to be refined. Do we get only one
candidate model? Of course not. Since this is an iterative process, we do
not actually know what the best candidate model is until we have produced
several candidate models through repeated iterations. We iterate until we
get a model that is good enough to be deployed. Once the model is deployed,
applications start making use of it, so there is iteration at the small
level as well as at the largest level. We need to repeat the entire process
and re-create the model at regular intervals. The reason is simple:
scenarios and factors change, and we need to keep our model up to date and
relevant at all times. This may eventually also mean processing new data or
applying new algorithms altogether.
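As a minimal sketch of this iterative loop (assuming scikit-learn, a synthetic dataset, and two arbitrarily chosen candidate algorithms; a real project would substitute its own data, models, and evaluation criterion), the workflow might look like this:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for cleaned, structured data coming out of pre-processing
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each fit produces a candidate model; we keep iterating until one is good enough
candidates = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    make_pipeline(StandardScaler(), GradientBoostingClassifier()),
]

best_model, best_score = None, 0.0
for model in candidates:
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    if score > best_score:
        best_model, best_score = model, score

print("candidate chosen for deployment, accuracy:", best_score)
# In production this loop would be re-run at regular intervals as new data
# arrives, re-creating the model to keep it up to date.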
Supervised Machine Learning: The task of the supervised learner is to predict the value of the function for
any valid input object after having seen a number of training examples (i.e.
pairs of input and target output). To achieve this, the learner has to
generalize from the presented data to unseen situations in a "reasonable"
way. “Supervised learning is a machine learning technique whereby the
algorithm is first presented with training data which consists of examples
which include both the inputs and the desired outputs; thus enabling it to
learn a function. The learner should then be able to generalize from the
presented data to unseen examples.” By Tom M. Mitchell
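A brief hedged illustration of this definition (assuming scikit-learn and its built-in Iris dataset as the labeled training data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Training data: examples that include both the inputs and the desired outputs
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = KNeighborsClassifier()
clf.fit(X_train, y_train)            # learn a function from (input, output) pairs
print(clf.predict(X_test[:5]))       # generalize to unseen examples
print(clf.score(X_test, y_test))     # how well the learner generalizes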
Unsupervised Machine Learning: Unsupervised learning is a type of
machine learning where manual labels of inputs are not used. It is
distinguished from supervised learning approaches which learn how to
perform a task, such as classification or regression, using a set of human
prepared examples. Unsupervised learning means we are given only the X
(feature vector), with at most some (ultimate) feedback function on our
performance. We simply have a training set of vectors without function
values for them. The problem in this case, typically, is to partition the
training set into subsets in some appropriate way. Input data is not labeled
and does not have a known result. A model is prepared by deducing
structures present in the input data. This may be to extract general rules. It
may be through a mathematical process to systematically reduce
redundancy, or it may be to organize data by similarity.
Semi-Supervised Learning: Semi-Supervised learning uses both labeled
and unlabeled data to perform an otherwise supervised learning or
unsupervised learning task. There is a desired prediction problem but the
model must learn the structures to organize the data as well as make
predictions. The goal is to learn a predictor that predicts future test data
better than the predictor learned from the labeled training data alone.
Semi-supervised learning finds applications in cognitive psychology as a
computational model for human learning. In human categorization and
concept forming, the environment provides unsupervised data (e.g., a child
watching surrounding objects by herself) in addition to labeled data from a
teacher (e.g., Dad points to an object and says “bird!”). There is evidence
that human beings can combine labeled and unlabeled data to facilitate
learning.
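A small sketch of this idea (assuming scikit-learn, whose semi-supervised estimators expect unlabeled points to be marked with -1; the dataset and the fraction of hidden labels are made up):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelPropagation

X, y = make_classification(n_samples=200, random_state=0)

# Hide 70% of the labels to simulate scarce "teacher" labels
rng = np.random.RandomState(0)
y_partial = y.copy()
unlabeled = rng.rand(len(y)) < 0.7
y_partial[unlabeled] = -1                   # -1 means "no label available"

model = LabelPropagation(kernel='knn', n_neighbors=7)
model.fit(X, y_partial)                     # learns from labeled and unlabeled points together
recovered = model.transduction_[unlabeled]  # labels inferred for the unlabeled points
print((recovered == y[unlabeled]).mean())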
Y = f(X)
The goal is to approximate the mapping function f so well that when we
have new input data X, we can predict the output variable Y for that
data.
For linear regression, the cost function to be minimized is

J(Θ0, Θ1) = (1/2m) Σi=1..m (h(x(i)) − y(i))²

where h(x) = Θ0 + Θ1x is the hypothesis. Here, m is the number of training
examples. To make the math a little bit easier, we put in a factor of 1/2;
it does not change the values of Θ that minimize the cost.
Gradient Descent
Gradient descent is an algorithm that is used to minimize a function.
Gradient descent is used not only in linear regression; it is a more general
algorithm.
We will now learn how the gradient descent algorithm is used to minimize
some arbitrary function f and, later on, we will apply it to a cost function
to determine its minimum.
We will start off with some initial guesses for the values of Θ0 and Θ1 and
then keep on changing the values according to the update rule

Θj := Θj − α (∂/∂Θj) J(Θ0, Θ1)   for j = 0 and j = 1

Here, α is called the learning rate, and it determines how big a step is
taken when updating the parameters. The learning rate is always a
positive number.
We want to simultaneously update Θ0 and Θ1, that is, calculate the right-
hand side of the above equation for both Θ0 and Θ1 and only then
update the values of the parameters to the newly calculated ones. This
process is repeated till convergence is achieved.
If α is too small, then we will end up taking tiny baby steps, which means a
lot of steps before we get anywhere near the global minimum. If α is
too large, then there is a possibility that we miss the minimum entirely: the
algorithm may fail to converge, or it can even diverge.
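A minimal numerical sketch of these updates (plain NumPy, a made-up one-feature dataset, and an assumed learning rate α = 0.1):

import numpy as np

# Toy data: y is roughly 2*x + 1, so gradient descent should recover Θ0 ≈ 1, Θ1 ≈ 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.1])
m = len(x)

theta0, theta1 = 0.0, 0.0          # initial guesses
alpha = 0.1                        # learning rate (assumed)

for _ in range(1000):
    h = theta0 + theta1 * x                     # hypothesis h(x)
    grad0 = (1 / m) * np.sum(h - y)             # ∂J/∂Θ0
    grad1 = (1 / m) * np.sum((h - y) * x)       # ∂J/∂Θ1
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous update

cost = (1 / (2 * m)) * np.sum((theta0 + theta1 * x - y) ** 2)
print(theta0, theta1, cost)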
and since y can take only the values 0 and 1, the probability of the other
value is 1 minus the hypothesis value.
With the above interpretation we can safely decide the decision boundary
with the following rule: y=1 if g(ΘTX)≥0.5, else y=0. Note that
g(ΘTX)≥0.5 implies ΘTX≥0, and similarly for the less-than condition.
Cost function
With the modified hypothesis function, taking a squared error cost function
won't work, as it is no longer convex in nature and is tedious to minimize.
We take up a new form of cost function, which is as follows:
E(g(Θ,X),y) = −log(g(Θ,X)) if y=1
E(g(Θ,X),y) = −log(1−g(Θ,X)) if y=0
The parameters are then learned by gradient descent on this cost, with
updates of the form

Θi := Θi − p (∂E/∂Θi)

for each i = 1, ..., n, where p is the learning rate at which we move along the
slope of the curve to minimize the cost function. (Some formulations write
the parameter vector as β, where β is equal to Θ.)
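A plain-NumPy sketch of this hypothesis, cost, and update rule (the toy data and the learning rate p = 0.1 are assumptions):

import numpy as np

def g(z):                       # sigmoid hypothesis g(ΘTX)
    return 1.0 / (1.0 + np.exp(-z))

# Toy labeled data: 4 samples with 2 features, plus an added bias column for Θ0
X = np.array([[0.5, 1.0], [1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])
X = np.hstack([np.ones((len(X), 1)), X])

theta = np.zeros(X.shape[1])
p = 0.1                                       # learning rate (assumed)

for _ in range(2000):
    h = g(X @ theta)
    # cross-entropy cost: −log(h) when y=1, −log(1−h) when y=0
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / len(y)             # ∂E/∂Θ
    theta -= p * grad                         # Θ := Θ − p ∂E/∂Θ

print(theta, cost)
print((g(X @ theta) >= 0.5).astype(int))      # decision rule: y=1 if g(ΘTX) ≥ 0.5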
Figure: a set of objects, each classified as GREEN or RED.
To demonstrate the concept of Naïve Bayes Classification, consider the
example displayed in the illustration above. As indicated, the objects can
be classified as either GREEN or RED. Our task is to classify new cases as
they arrive, i.e., decide to which class label they belong, based on the
currently existing objects.
Since there are twice as many GREEN objects as RED, it is reasonable to
believe that a new case (which hasn't been observed yet) is twice as likely
to have membership GREEN rather than RED. In the Bayesian analysis,
this belief is known as the prior probability. Prior probabilities are based
on previous experience, in this case the percentage of GREEN and RED
objects, and are often used to predict outcomes before they actually happen.
Thus, we can write:

Prior probability of GREEN = number of GREEN objects / total number of objects
Prior probability of RED = number of RED objects / total number of objects
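As a tiny worked sketch of this counting argument (the exact counts, 40 GREEN and 20 RED, are assumed; the text only states the two-to-one ratio):

# Hypothetical counts: twice as many GREEN objects as RED
n_green, n_red = 40, 20
total = n_green + n_red

prior_green = n_green / total    # prior probability of GREEN = 2/3
prior_red = n_red / total        # prior probability of RED   = 1/3
print(prior_green, prior_red)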
For a linear support vector machine, the two margin hyperplanes passing
through the positive and negative support vectors can be written as:

w0+wTxpos = 1 (1)
w0+wTxneg = −1 (2)

If we subtract those two linear equations (1) and (2) from each other, we
get:

wT(xpos−xneg) = 2
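A short hedged sketch (assuming scikit-learn's linear SVC and a synthetic, roughly separable dataset) that recovers w and w0 and the resulting margin width 2/‖w‖:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs as a stand-in for linearly separable data
X, y = make_blobs(n_samples=60, centers=2, random_state=6)

clf = SVC(kernel='linear', C=1000)   # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                     # the weight vector w
w0 = clf.intercept_[0]               # the bias term w0
print(w, w0)
print(2 / np.linalg.norm(w))         # margin width between the two hyperplanes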
Random forest: Random forest is just an improvement on top of the
decision tree algorithm. The core idea behind Random Forest is to generate
multiple small decision trees from random subsets of the data (hence the
name "Random Forest"). Each of the decision trees gives a biased classifier
(as it only considers a subset of the data), and they each capture different
trends in the data. This ensemble of trees is like a team of experts, each with
a little knowledge of the overall subject but thorough in their own area of
expertise. In the case of classification, the majority vote is used to assign a
class: in the expert analogy, it is like asking the same multiple-choice
question to each expert and taking the answer that most of the experts vote
as correct. In the case of regression, we can use the average of all trees as
our prediction. In addition to this, we can also weight the more decisive
trees higher relative to others by testing on the validation data.
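A hedged sketch of the classification case (assuming scikit-learn and a synthetic dataset):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Many small trees, each trained on a random bootstrap sample of the data
forest = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=1)
forest.fit(X_train, y_train)

print(forest.predict(X_test[:5]))     # class chosen by majority vote of the trees
print(forest.score(X_test, y_test))   # accuracy on held-out data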
Unsupervised Learning Algorithms: Unsupervised learning is where you
only have input data (X) and no corresponding output variables. The goal
for unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.
Soft Clustering: In soft clustering, instead of putting each data point into
a separate cluster, a probability or likelihood of that data point to be in
those clusters is assigned. For example, in the above scenario each
customer is assigned a probability of being in any of the 10 clusters of the
retail store.
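One common way to obtain such soft assignments is a Gaussian mixture model; a small sketch (assuming scikit-learn and synthetic data with 3 clusters rather than 10, purely to keep the output small):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic "customers" described by two features
X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(X)

probs = gmm.predict_proba(X[:5])   # one probability per cluster for each data point
print(np.round(probs, 3))          # each row sums to 1 (a soft assignment)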
1 Specify the desired number of clusters K: Let us choose K=2 for these 5
data points in 2-D space.
2 Randomly assign each data point to a cluster: Let's assign three points to
cluster 1, shown using red color, and two points to cluster 2, shown using
grey color.
3 Compute cluster centroids: The centroid of the data points in the red
cluster is shown using a red cross and that of the grey cluster using a grey
cross.
4 Re-assign each point to the closest cluster centroid: Note that the data
point at the bottom, although currently in the red cluster, is closer to the
centroid of the grey cluster. Thus, we re-assign that data point to the grey
cluster. A plain-NumPy sketch of these steps is given after this list.
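The steps above, written out in plain NumPy (the five 2-D points are made-up coordinates, since the report's figure is not reproduced here):

import numpy as np

# Five hypothetical data points in 2-D space, with K = 2 clusters
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0]])
K = 2

# Step 2: three points in one cluster, two in the other (0-indexed labels)
labels = np.array([0, 0, 0, 1, 1])

for _ in range(10):   # repeat steps 3 and 4 until the assignments stop changing
    # Step 3: compute cluster centroids
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    # Step 4: re-assign each point to the closest cluster centroid
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)

print(labels)
print(centroids)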
Single Neuron(Perceptron):
The basic unit of computation in a neural network is the neuron, often
called a node or unit. It receives input from some other nodes, or from an
external source and computes an output. Each input has an
associated weight (w), which is assigned on the basis of its relative
importance to other inputs. The node applies a function f (defined below)
to the weighted sum of its inputs as shown in Figure 1 below:
Sigmoid: σ(x) = 1 / (1 + exp(−x))
Tanh: tanh(x) = 2σ(2x) − 1
ReLU: f(x) = max(0, x)
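A single neuron's forward pass, sketched in NumPy (the inputs, weights, and bias below are arbitrary illustrative values):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.5, -1.0, 2.0])     # values arriving from other nodes
weights = np.array([0.8, 0.2, -0.5])    # weight w for each input
bias = 0.1

z = np.dot(weights, inputs) + bias      # weighted sum of the inputs
output = sigmoid(z)                     # apply the activation function f
print(z, output)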
• Set of states, S
• Set of actions, A
• Reward function, R
• Policy, π
• Value, V
We have to take actions (A) to transition from our start state to our end
state (S), and in return we get a reward (R) for each action we take. Our
actions can lead to a positive reward or a negative reward.
The set of actions we take defines our policy (π), and the rewards we get in
return define our value (V). Our task here is to maximize our rewards by
choosing the correct policy; that is, we have to maximize the total reward
collected over the states we visit under that policy.
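A toy sketch of these pieces (the states, actions, rewards, and policy below are all made up for illustration):

# A tiny made-up decision process with states S, actions A, rewards R, and a policy π
rewards = {                      # R(s, a)
    ("start", "right"): 1, ("start", "left"): -1,
    ("middle", "right"): 5, ("middle", "left"): -1,
}
transitions = {                  # next state reached by taking action a in state s
    ("start", "right"): "middle", ("start", "left"): "start",
    ("middle", "right"): "end", ("middle", "left"): "start",
}
policy = {"start": "right", "middle": "right"}   # π: the action chosen in each state

# Value V of following the policy from the start state: the total reward collected
state, value = "start", 0
while state != "end":
    action = policy[state]
    value += rewards[(state, action)]
    state = transitions[(state, action)]
print(value)   # the rewards obtained define the value of this policy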
These are the basic libraries that transform Python from a general purpose
programming language into a powerful and robust tool for data analysis
and visualization. Sometimes called the SciPy Stack, they’re the
foundation that the more specialized tools are built on.
3.) Pandas adds data structures and tools that are designed for practical
data analysis in finance, statistics, social sciences, and engineering.
Pandas works well with incomplete, messy, and unlabeled data
(i.e., the kind of data you’re likely to encounter in the real world),
and provides tools for shaping, merging, reshaping, and slicing
datasets.
4.) IPython(Jupyter Notebook) extends the functionality of Python’s
interactive interpreter with a souped-up interactive shell that adds
introspection, rich media, shell syntax, tab completion, and
command history retrieval. It also acts as an embeddable
interpreter for your programs that can be really useful for
debugging. If you’ve ever used Mathematica or MATLAB, you
should feel comfortable with IPython.
5.) matplotlib is the standard Python library for creating 2D plots and
graphs. It’s pretty low-level, meaning it requires more commands
to generate nice-looking graphs and figures than with some more
advanced libraries. However, the flip side of that is flexibility.
With enough commands, you can make just about any kind of
graph you want with matplotlib.
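A short combined sketch (a made-up DataFrame, just to show pandas handling a missing value and matplotlib drawing a basic plot):

import pandas as pd
import matplotlib.pyplot as plt

# A small, messy, made-up dataset
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "sales": [120, 135, None, 160],
})

df["sales"] = df["sales"].fillna(df["sales"].mean())   # handle the incomplete entry
print(df.describe())                                   # quick summary statistics

df.plot(x="month", y="sales", kind="bar")              # matplotlib drives the plot
plt.title("Monthly sales (toy data)")
plt.show()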
Project Code:
Best Estimator learned through GridSearch
SVC(C=3, cache_size=200, class_weight=None, coef0=0.0, degree=3,
    gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
    random_state=None, shrinking=True, tol=0.001, verbose=False)
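The project code itself is not reproduced in this extract; a sketch of the kind of grid search that would report a best estimator like the one above (the dataset, here scikit-learn's digits data, and the parameter grid are assumptions) might look like:

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"C": [1, 3, 10], "gamma": [0.01, 0.001, 0.0001], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best Estimator learned through GridSearch")
print(search.best_estimator_)          # e.g. SVC(C=3, gamma=0.001)
print(search.score(X_test, y_test))    # accuracy of the tuned model on held-out data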
Other Projects Implemented During the Summer Training:
Books:
Links:
https://www.medium.com/
https://www.analyticsvidhya.com
http://www.tutorialspoint.com/numpy
http://www.tutorialpoint.com/pandas