scikit-learn : Radial Basis Function kernel, RBF
"In Euclidean geometry linearly separable is a geometric property of a pair of sets of points. This is most easily visualized in two dimensions (the Euclidean plane) by thinking of one set of points as being colored blue and the other set of points as being colored red. These two sets are linearly separable if there exists at least one line in the plane with all of the blue points on one side of the line and all the red points on the other side. This idea immediately generalizes to higher dimensional Euclidean spaces if line is replaced by hyperplane." - wiki : Linear separability
"Some supervised learning problems can be solved by very simple models (called generalized linear models) depending on the data. Others simply don't." - Machine Learning 101 - General Concepts
For a better understanding, we'll run svm_gui.py, which is under the sklearn_tutorial/examples directory. We can download the tutorial from Tutorial Setup and Installation:
git clone https://github.com/astroML/sklearn_tutorial
It's an interactive example.
$ python $SKL_HOME/examples/svm_gui.py
[Figure: svm_gui.py output - Accuracy: 95.8333333333]
[Figure: svm_gui.py output - Accuracy: 66.6666666667]
The two pictures above used a linear Support Vector Machine (SVM) trained to separate two sets of data points, labeled as white and black, in a 2D space. Note that we used a hyperplane (a straight line in 2D) as the separator.
An SVM with a Gaussian RBF (Radial Basis Function) kernel is trained to separate two sets of data points, labeled as white and black, in a 2D space. This dataset cannot be separated by a simple linear model.
However, as we can see from the picture below, linear SVMs can be easily kernelized to solve nonlinear classification problems, and that's one of the reasons why SVMs enjoy such high popularity.
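To see this kernelization in action with scikit-learn (a minimal sketch, separate from the svm_gui.py pictures above; make_circles and its parameter values are assumptions chosen only for demonstration), we can train SVC once with a linear kernel and once with an RBF kernel on data that is not linearly separable:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# two concentric rings: no straight line can separate the classes in 2D
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    print(kernel, 'test accuracy:', clf.score(X_test, y_test))

The linear kernel should stay near chance level on this data, while the RBF kernel should classify the two rings almost perfectly.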
"In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification." - Radial basis function kernel
Let's see what a nonlinear classification problem looks like, using a sample dataset created with the XOR logical operation (which outputs true only when the inputs differ - one is true, the other is false).
In the code below, we create an XOR gate dataset (1,000 samples, each with a class label of either 1 or -1) using NumPy's logical_xor function:
import numpy as np
import matplotlib.pyplot as plt

# 1,000 random 2D points; label 1 where exactly one of the two features is positive (XOR), -1 otherwise
np.random.seed(0)
X_xor = np.random.randn(1000, 2)
y_xor = np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0)
y_xor = np.where(y_xor, 1, -1)

# plot the two classes with different colors and markers
plt.scatter(X_xor[y_xor == 1, 0], X_xor[y_xor == 1, 1], c='b', marker='x', label='1')
plt.scatter(X_xor[y_xor == -1, 0], X_xor[y_xor == -1, 1], c='r', marker='s', label='-1')
plt.ylim(-3.0)
plt.legend()
plt.show()
Here is the plot:
As we can see from the plot, we cannot separate the samples with a linear hyperplane as the decision boundary, whether we use a linear SVM or logistic regression.
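To check this claim numerically, here is a minimal sketch (an addition, not part of the original tutorial) that recreates the XOR data from the snippet above and fits both linear models; each should score close to the 50% chance level:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# recreate the XOR dataset from the previous snippet
np.random.seed(0)
X_xor = np.random.randn(1000, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

# both linear models stay close to chance accuracy on this data
for name, clf in [('linear SVM', SVC(kernel='linear')),
                  ('logistic regression', LogisticRegression())]:
    clf.fit(X_xor, y_xor)
    print(name, 'training accuracy:', clf.score(X_xor, y_xor))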
The idea behind kernel methods for dealing with such linearly inseparable data is to create nonlinear combinations of the original features, projecting the dataset onto a higher-dimensional space via a mapping function in which the classes become linearly separable.
As shown in the picture below, we can transform a two-dimensional dataset onto a new three-dimensional feature space where the classes become separable via the following projection:
$$\phi(x_1, x_2) = (z_1, z_2, z_3) = (x_1, x_2, x_1^2+x_2^2)$$
Picture credit : Python Machine Learning by Sebastian Raschka
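As a concrete illustration of this projection (a sketch under the assumption of a concentric-circles dataset like the one in the picture; the real kernel trick computes such mappings implicitly rather than by hand), we can apply the mapping explicitly and check that an ordinary linear SVM separates the classes in the new 3D feature space:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# explicit mapping phi(x1, x2) = (x1, x2, x1^2 + x2^2)
Z = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

clf = SVC(kernel='linear')
print('accuracy in the original 2D space:', clf.fit(X, y).score(X, y))   # poor
print('accuracy in the projected 3D space:', clf.fit(Z, y).score(Z, y))  # near perfect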
The solution using a nonlinear kernel is available at SVM II - SVM with nonlinear decision boundary for the XOR dataset.
Machine Learning with scikit-learn
scikit-learn installation
scikit-learn : Features and feature extraction - iris dataset
scikit-learn : Machine Learning Quick Preview
scikit-learn : Data Preprocessing I - Missing / Categorical data
scikit-learn : Data Preprocessing II - Partitioning a dataset / Feature scaling / Feature Selection / Regularization
scikit-learn : Data Preprocessing III - Dimensionality reduction via Sequential feature selection / Assessing feature importance via random forests
scikit-learn : Data Compression via Dimensionality Reduction I - Principal component analysis (PCA)
scikit-learn : Data Compression via Dimensionality Reduction II - Linear Discriminant Analysis (LDA)
scikit-learn : Data Compression via Dimensionality Reduction III - Nonlinear mappings via kernel principal component (KPCA) analysis
scikit-learn : Logistic Regression, Overfitting & regularization
scikit-learn : Supervised Learning & Unsupervised Learning - e.g. Unsupervised PCA dimensionality reduction with iris dataset
scikit-learn : Unsupervised_Learning - KMeans clustering with iris dataset
scikit-learn : Linearly Separable Data - Linear Model & (Gaussian) radial basis function kernel (RBF kernel)
scikit-learn : Decision Tree Learning I - Entropy, Gini, and Information Gain
scikit-learn : Decision Tree Learning II - Constructing the Decision Tree
scikit-learn : Random Decision Forests Classification
scikit-learn : Support Vector Machines (SVM)
scikit-learn : Support Vector Machines (SVM) II
Flask with Embedded Machine Learning I : Serializing with pickle and DB setup
Flask with Embedded Machine Learning II : Basic Flask App
Flask with Embedded Machine Learning III : Embedding Classifier
Flask with Embedded Machine Learning IV : Deploy
Flask with Embedded Machine Learning V : Updating the classifier
scikit-learn : Sample of a spam comment filter using SVM - classifying a good one or a bad one
Machine learning algorithms and concepts
Batch gradient descent algorithm
Single Layer Neural Network - Perceptron model on the Iris dataset using Heaviside step activation function
Batch gradient descent versus stochastic gradient descent
Single Layer Neural Network - Adaptive Linear Neuron using linear (identity) activation function with batch gradient descent method
Single Layer Neural Network : Adaptive Linear Neuron using linear (identity) activation function with stochastic gradient descent (SGD)
Logistic Regression
VC (Vapnik-Chervonenkis) Dimension and Shatter
Bias-variance tradeoff
Maximum Likelihood Estimation (MLE)
Neural Networks with backpropagation for XOR using one hidden layer
minHash
tf-idf weight
Natural Language Processing (NLP): Sentiment Analysis I (IMDb & bag-of-words)
Natural Language Processing (NLP): Sentiment Analysis II (tokenization, stemming, and stop words)
Natural Language Processing (NLP): Sentiment Analysis III (training & cross validation)
Natural Language Processing (NLP): Sentiment Analysis IV (out-of-core)
Locality-Sensitive Hashing (LSH) using Cosine Distance (Cosine Similarity)
Artificial Neural Networks (ANN)
[Note] Sources are available at Github - Jupyter notebook files
1. Introduction
2. Forward Propagation
3. Gradient Descent
4. Backpropagation of Errors
5. Checking gradient
6. Training via BFGS
7. Overfitting & Regularization
8. Deep Learning I : Image Recognition (Image uploading)
9. Deep Learning II : Image Recognition (Image classification)
10. Deep Learning III : Theano, TensorFlow, and Keras