Basics of the SVM Algorithm
Support Vector Machines (SVMs) are among the most popular machine learning algorithms: with a little tuning they deliver high performance, and they represent one of the most robust prediction methods. SVM is implemented quite differently from most other ML algorithms. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
SVM is a supervised learning algorithm that can be used for both classification and regression problems, although in machine learning it is most often applied to classification. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the kernel trick, which implicitly maps the inputs into high-dimensional feature spaces. Although SVM itself is a supervised method, its ideas have also been extended to the unsupervised setting: when data is unlabelled, supervised learning is not possible, and an unsupervised approach is required, one which attempts to find a natural clustering of the data into groups and then maps new data to these groups.
The support-vector clustering algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machine algorithm, to categorize unlabelled data, and is one of the most widely used clustering algorithms in industrial applications. Here, however, we will stick to the supervised learning model. A Support Vector Machine is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
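As a concrete illustration, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the data are the quality-of-life points from the worked example later in this section, with the class labels that example implies):

```python
from sklearn.svm import SVC

# Quality-of-life data from the worked example below (x = before, y = after)
X = [[1, 1], [2, 1], [1, -1], [2, -1],   # class -1
     [4, 0], [5, 1], [5, -1], [6, 0]]    # class +1
y = [-1, -1, -1, -1, 1, 1, 1, 1]

# kernel="linear" finds a separating hyperplane directly;
# kernel="rbf" would instead use the kernel trick mentioned above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

print(clf.predict([[5, 0], [1, 0]]))  # expected: [ 1 -1]
```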
In simple terms, think of SVM as a model that represents the data points in space, mapped such that the data points of the separate categories are divided by a clear gap that is as wide as possible. For 1-dimensional data, the support vector classifier is a point; for 2-dimensional data, it is a line; and for 3-dimensional data, it is a plane. In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space: if the space is 3-dimensional, its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyperplanes are the 1-dimensional lines. This notion can be used in any general space in which the concept of the dimension of a subspace is defined.
Mathematics of support vector machines
The key points for you to understand about support vector machines are:
1. Support vector machines find a hyperplane (that classifies data) by maximizing the distance between the plane and the nearest input data points (called support vectors).
2. This is done by minimizing the norm of the weight vector w, which defines the hyperplane.
3. Step 2 relies on optimization theory and certain assumptions (which are detailed below).
We know that the equation of a hyperplane is w·x + b = 0, where w is a vector normal to the hyperplane and b is an offset. Drawing this in a 2-dimensional space gives us all the possible solutions that satisfy this equation: w defines the orientation and b the position in relation to the origin. Note that the parameter vector w is perpendicular to the line itself. If a point x satisfies w·x + b = 0, we know it lies on the boundary itself. If the result is larger than 0, the point is on the positive side of the boundary; if it is less than 0, it is on the negative side. This is now our decision rule.
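A minimal numpy sketch of this decision rule (the w and b used here are the ones the worked example below arrives at):

```python
import numpy as np

def decide(w, b, x):
    """Sign of w.x + b tells us which side of the hyperplane x lies on."""
    score = np.dot(w, x) + b
    return int(np.sign(score))  # +1, -1, or 0 (exactly on the boundary)

w, b = np.array([1.0, 0.0]), -3.0   # hyperplane found in the worked example
print(decide(w, b, [5.0, 1.0]))     # 1: positive side
print(decide(w, b, [2.0, -1.0]))    # -1: negative side
print(decide(w, b, [3.0, 0.5]))     # 0: on the boundary x = 3
```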
Now, let's add a few constraints. First, if we have two classes, they are denoted as yi = +1 or yi = -1. Second, I want every sample to have a value indicative of its class relative to the boundary:

w·xi + b ≥ +1 for yi = +1
w·xi + b ≤ -1 for yi = -1

As a matter of fact, if we are not interested in the sign we can rewrite both conditions as a single one:

yi (w·xi + b) ≥ 1
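Combining this constraint with the goal of maximizing the margin 2/||w|| gives the standard hard-margin optimization problem, stated here in full for reference:

```latex
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
```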
Step 3: Explore the data to figure out what it looks like
The data set below records quality of life before chemotherapy (x) and after chemotherapy (y); the class labels shown are the ones implied by the worked solution that follows.

S No   x    y   Class
 1     1    1    -1
 2     2    1    -1
 3     4    0    +1
 4     5    1    +1
 5     5   -1    +1
 6     6    0    +1
 7     1   -1    -1
 8     2   -1    -1
[Figure: scatter plot of the data, Quality of Life After Chemotherapy (vertical axis) against Quality of Life Before Chemotherapy (horizontal axis), with each point labelled by its class, +1 or -1.]

[Figure: Quality of Life Before and After Chemotherapy: the same data with the three support vectors highlighted, S1 = (2, 1), S2 = (2, -1), S3 = (4, 0).]
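The figures can be reproduced with a minimal matplotlib sketch along these lines (assuming matplotlib is installed; the dashed line anticipates the hyperplane x = 3 derived below):

```python
import matplotlib.pyplot as plt

# Data from the table: negative class and positive class
neg = [(1, 1), (2, 1), (1, -1), (2, -1)]
pos = [(4, 0), (5, 1), (5, -1), (6, 0)]

plt.scatter(*zip(*neg), marker="_", s=100, label="class -1")
plt.scatter(*zip(*pos), marker="+", s=100, label="class +1")
plt.axvline(x=3, linestyle="--", color="gray", label="hyperplane x = 3")
plt.xlabel("Quality of Life Before Chemotherapy")
plt.ylabel("Quality of Life After Chemotherapy")
plt.title("Quality of Life Before and After Chemotherapy")
plt.legend()
plt.show()
```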
The three support vectors, each augmented with a bias entry of 1, are:

s̃1 = (2, 1, 1)
s̃2 = (2, -1, 1)
s̃3 = (4, 0, 1)
Now we need to find the three parameters α1, α2 and α3 from three linear equations:
α1 (s̃1 · s̃1) + α2 (s̃2 · s̃1) + α3 (s̃3 · s̃1) = -1
α1 (s̃1 · s̃2) + α2 (s̃2 · s̃2) + α3 (s̃3 · s̃2) = -1
α1 (s̃1 · s̃3) + α2 (s̃2 · s̃3) + α3 (s̃3 · s̃3) = +1

Substituting the augmented vectors and evaluating the dot products reduces this to:

6α1 + 4α2 + 9α3 = -1
4α1 + 6α2 + 9α3 = -1
9α1 + 9α2 + 17α3 = 1
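These three equations can also be solved numerically; here is a minimal numpy sketch (the matrix entries are exactly the dot products evaluated above):

```python
import numpy as np

# Augmented support vectors, one per row, and their class targets
S = np.array([[2.0,  1.0, 1.0],    # s~1, target -1
              [2.0, -1.0, 1.0],    # s~2, target -1
              [4.0,  0.0, 1.0]])   # s~3, target +1
t = np.array([-1.0, -1.0, 1.0])

A = S @ S.T                  # Gram matrix of pairwise dot products
print(A)                     # [[ 6.  4.  9.] [ 4.  6.  9.] [ 9.  9. 17.]]

alphas = np.linalg.solve(A, t)
print(alphas)                # [-3.25 -3.25  3.5 ]
```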
Solving this system gives α1 = α2 = -3.25 and α3 = 3.5. The hyperplane that discriminates the positive class from the negative class is then given by

w̃ = Σ αi s̃i  (summing over the support vectors i)
   = (-3.25)(2, 1, 1) + (-3.25)(2, -1, 1) + (3.5)(4, 0, 1)
   = (1, 0, -3)
The first two components of w̃ form the weight vector and the last component is the offset in the hyperplane equation w·x + b = 0, so

w = (1, 0) and b = -3

which means the separating hyperplane is the vertical line x = 3.
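As a sanity check, a short sketch that evaluates w·x + b for every point in the table and compares the sign with the class labels assumed above:

```python
import numpy as np

w, b = np.array([1.0, 0.0]), -3.0
points = np.array([[1, 1], [2, 1], [4, 0], [5, 1],
                   [5, -1], [6, 0], [1, -1], [2, -1]], dtype=float)
labels = np.array([-1, -1, 1, 1, 1, 1, -1, -1])

scores = points @ w + b                 # positive right of x = 3, negative left
print(np.sign(scores).astype(int))      # [-1 -1  1  1  1  1 -1 -1]
print(bool(np.all(np.sign(scores) == labels)))  # True: every point correctly classified
```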
[Figure: the same scatter plot with the separating hyperplane x = 3 drawn between the two classes and the support vectors S1, S2 and S3 marked; horizontal axis: Quality of Life Before Chemotherapy.]