10

Using SVM with sklearn library, I would like to plot the data with each labels representing its color. I don't want to color the points but filling area with colors.

I have now :

d_pred, d_train_std, d_test_std, l_train, l_test

d_pred are the labels predicted. I would plot d_pred with d_train_std with shape : (70000,2) where X-axis are the first column and Y-Axis the second column.

Thank you.

5
  • what exactly do you want to plot? do you only need a scatter plot or do you want to plot the decision surface/boundaries ?? Also what is d_train_std ?
    – seralouk
    Commented Jul 24, 2018 at 17:38
  • Thank you for your answer. I also want to plot the decision boundary. d_train_std is the training data that I have normalized.
    – anthonya
    Commented Jul 25, 2018 at 7:55
  • see my answer and let me know
    – seralouk
    Commented Jul 25, 2018 at 10:00
  • Thank you very much
    – anthonya
    Commented Jul 25, 2018 at 11:46
  • I did it, thank you
    – anthonya
    Commented Jul 25, 2018 at 13:28

3 Answers 3

15

You cannot visualize the decision surface for a lot of features. This is because the dimensions will be too many and there is no way to visualize an N-dimensional surface.

However, you can use 2 features and plot nice decision surfaces as follows.

I have also written an article about this here: https://towardsdatascience.com/support-vector-machines-svm-clearly-explained-a-python-tutorial-for-classification-problems-29c539f3ad8?source=friends_link&sk=80f72ab272550d76a0cc3730d7c8af35

Case 1: 2D plot for 2 features and using the iris dataset

from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

model = svm.SVC(kernel='linear')
clf = model.fit(X, y)

fig, ax = plt.subplots()
# title for the plots
title = ('Decision surface of linear SVC ')
# Set-up grid for plotting.
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_ylabel('y label here')
ax.set_xlabel('x label here')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
ax.legend()
plt.show()

enter image description here

Case 2: 3D plot for 3 features and using the iris dataset

from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from mpl_toolkits.mplot3d import Axes3D

iris = datasets.load_iris()
X = iris.data[:, :3]  # we only take the first three features.
Y = iris.target

#make it binary classification problem
X = X[np.logical_or(Y==0,Y==1)]
Y = Y[np.logical_or(Y==0,Y==1)]

model = svm.SVC(kernel='linear')
clf = model.fit(X, Y)

# The equation of the separating plane is given by all x so that np.dot(svc.coef_[0], x) + b = 0.
# Solve for w3 (z)
z = lambda x,y: (-clf.intercept_[0]-clf.coef_[0][0]*x -clf.coef_[0][1]*y) / clf.coef_[0][2]

tmp = np.linspace(-5,5,30)
x,y = np.meshgrid(tmp,tmp)

fig = plt.figure()
ax  = fig.add_subplot(111, projection='3d')
ax.plot3D(X[Y==0,0], X[Y==0,1], X[Y==0,2],'ob')
ax.plot3D(X[Y==1,0], X[Y==1,1], X[Y==1,2],'sr')
ax.plot_surface(x, y, z(x,y))
ax.view_init(30, 60)
plt.show()

enter image description here

0

It can be difficult to get the function in 3D. An easy way to get a visualization is to get a large amount of points that cover your point space and run them through your learned function (my_model.predict), keep the points that hit inside the function, and visualize them. The more you add the more defined the boundary will be.

1
0

Here's my code that does what @Christian Tuchez describes:

outputs = my_clf.predict(1_test)

hits = []
for i in range(outputs.size):
    if outputs[i] == 1:
        hits.append(i)  # save the index where it's 1

This saves the index of all the points that hit in the function (saved in the "hits" list). You can probably accomplish this without a loop, I just found it easiest for me.

Then to display just those points, you'd write something like this:

ax.scatter(1_test[hits[:], 0], 1_test[hits[:], 1], 1_test[hits[:], 2], c="cyan", s=2, edgecolor=None)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.