scikit-learn : Radial Basis Function kernel, RBF
"In Euclidean geometry linearly separable is a geometric property of a pair of sets of points. This is most easily visualized in two dimensions (the Euclidean plane) by thinking of one set of points as being colored blue and the other set of points as being colored red. These two sets are linearly separable if there exists at least one line in the plane with all of the blue points on one side of the line and all the red points on the other side. This idea immediately generalizes to higher dimensional Euclidean spaces if line is replaced by hyperplane." - wiki : Linear separability
"Some supervised learning problems can be solved by very simple models (called generalized linear models) depending on the data. Others simply don't." - Machine Learning 101 - General Concepts
For a better understanding, we'll run svm_gui.py, which is under the sklearn_tutorial/examples directory. We can download the tutorial from Tutorial Setup and Installation:
git clone https://github.com/astroML/sklearn_tutorial
It's an interactive example.
$ python $SKL_HOME/examples/svm_gui.py
[Figure: svm_gui.py output - Accuracy: 95.8333333333]
[Figure: svm_gui.py output - Accuracy: 66.6666666667]
The two pictures above used a linear Support Vector Machine (SVM) trained to separate two sets of data points, labeled as white and black, in a 2D space. Note that we used a hyperplane (a straight line in 2D) as the separator.
An SVM with a Gaussian RBF (Radial Basis Function) kernel is trained to separate two sets of data points, labeled as white and black, in a 2D space. This dataset cannot be separated by a simple linear model.
However, as we can see from the picture below, linear SVMs can be easily kernelized to solve nonlinear classification problems, and that's one of the reasons why SVMs enjoy such high popularity.
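To see this kernelization in action with scikit-learn (a minimal sketch, separate from the svm_gui.py pictures above; make_circles and its parameter values are assumptions chosen only for demonstration), we can train SVC once with a linear kernel and once with an RBF kernel on data that is not linearly separable:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# two concentric rings: no straight line can separate the classes in 2D
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    print(kernel, 'test accuracy:', clf.score(X_test, y_test))

The linear kernel should stay near chance level on this data, while the RBF kernel should classify the two rings almost perfectly.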
"In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification." - Radial basis function kernel
Let's see what a nonlinear classification problem looks like, using a sample dataset created with the XOR logical operation (which outputs true only when the inputs differ - one is true, the other is false).
In the code below, we create an XOR gate dataset (1,000 samples, each with a class label of either 1 or -1) using NumPy's logical_xor function:
import numpy as np
import matplotlib.pyplot as plt

# 1,000 random 2D points; label 1 where exactly one of the two features is positive (XOR), -1 otherwise
np.random.seed(0)
X_xor = np.random.randn(1000, 2)
y_xor = np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0)
y_xor = np.where(y_xor, 1, -1)

# plot the two classes with different colors and markers
plt.scatter(X_xor[y_xor == 1, 0], X_xor[y_xor == 1, 1], c='b', marker='x', label='1')
plt.scatter(X_xor[y_xor == -1, 0], X_xor[y_xor == -1, 1], c='r', marker='s', label='-1')
plt.ylim(-3.0)
plt.legend()
plt.show()
Here is the plot:
As we can see from the plot, we cannot separate the samples with a linear hyperplane as the decision boundary, whether we use a linear SVM or logistic regression.
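To check this claim numerically, here is a minimal sketch (an addition, not part of the original tutorial) that recreates the XOR data from the snippet above and fits both linear models; each should score close to the 50% chance level:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# recreate the XOR dataset from the previous snippet
np.random.seed(0)
X_xor = np.random.randn(1000, 2)
y_xor = np.where(np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0), 1, -1)

# both linear models stay close to chance accuracy on this data
for name, clf in [('linear SVM', SVC(kernel='linear')),
                  ('logistic regression', LogisticRegression())]:
    clf.fit(X_xor, y_xor)
    print(name, 'training accuracy:', clf.score(X_xor, y_xor))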
The idea behind kernel methods for dealing with such linearly inseparable data is to create nonlinear combinations of the original features, projecting the dataset onto a higher-dimensional space via a mapping function in which the classes become linearly separable.
As shown in the picture below, we can transform a two-dimensional dataset onto a new three-dimensional feature space where the classes become separable via the following projection:
$$\phi(x_1, x_2) = (z_1, z_2, z_3) = (x_1, x_2, x_1^2+x_2^2)$$
Picture credit : Python Machine Learning by Sebastian Raschka
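As a concrete illustration of this projection (a sketch under the assumption of a concentric-circles dataset like the one in the picture; the real kernel trick computes such mappings implicitly rather than by hand), we can apply the mapping explicitly and check that an ordinary linear SVM separates the classes in the new 3D feature space:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# explicit mapping phi(x1, x2) = (x1, x2, x1^2 + x2^2)
Z = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

clf = SVC(kernel='linear')
print('accuracy in the original 2D space:', clf.fit(X, y).score(X, y))   # poor
print('accuracy in the projected 3D space:', clf.fit(Z, y).score(Z, y))  # near perfect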
The solution using a nonlinear kernel is available at SVM II - SVM with nonlinear decision boundary for the XOR dataset.
Machine Learning with scikit-learn
scikit-learn installation
scikit-learn : Features and feature extraction - iris dataset
scikit-learn : Machine Learning Quick Preview
scikit-learn : Data Preprocessing I - Missing / Categorical data
scikit-learn : Data Preprocessing II - Partitioning a dataset / Feature scaling / Feature Selection / Regularization
scikit-learn : Data Preprocessing III - Dimensionality reduction via Sequential feature selection / Assessing feature importance via random forests
scikit-learn : Data Compression via Dimensionality Reduction I - Principal component analysis (PCA)
scikit-learn : Data Compression via Dimensionality Reduction II - Linear Discriminant Analysis (LDA)
scikit-learn : Data Compression via Dimensionality Reduction III - Nonlinear mappings via kernel principal component (KPCA) analysis
scikit-learn : Logistic Regression, Overfitting & regularization
scikit-learn : Supervised Learning & Unsupervised Learning - e.g. Unsupervised PCA dimensionality reduction with iris dataset
scikit-learn : Unsupervised_Learning - KMeans clustering with iris dataset
scikit-learn : Linearly Separable Data - Linear Model & (Gaussian) radial basis function kernel (RBF kernel)
scikit-learn : Decision Tree Learning I - Entropy, Gini, and Information Gain
scikit-learn : Decision Tree Learning II - Constructing the Decision Tree
scikit-learn : Random Decision Forests Classification
scikit-learn : Support Vector Machines (SVM)
scikit-learn : Support Vector Machines (SVM) II
Flask with Embedded Machine Learning I : Serializing with pickle and DB setup
Flask with Embedded Machine Learning II : Basic Flask App
Flask with Embedded Machine Learning III : Embedding Classifier
Flask with Embedded Machine Learning IV : Deploy
Flask with Embedded Machine Learning V : Updating the classifier
scikit-learn : Sample of a spam comment filter using SVM - classifying a good one or a bad one
Machine learning algorithms and concepts
Batch gradient descent algorithm
Single Layer Neural Network - Perceptron model on the Iris dataset using Heaviside step activation function
Batch gradient descent versus stochastic gradient descent
Single Layer Neural Network - Adaptive Linear Neuron using linear (identity) activation function with batch gradient descent method
Single Layer Neural Network : Adaptive Linear Neuron using linear (identity) activation function with stochastic gradient descent (SGD)
Logistic Regression
VC (Vapnik-Chervonenkis) Dimension and Shatter
Bias-variance tradeoff
Maximum Likelihood Estimation (MLE)
Neural Networks with backpropagation for XOR using one hidden layer
minHash
tf-idf weight
Natural Language Processing (NLP): Sentiment Analysis I (IMDb & bag-of-words)
Natural Language Processing (NLP): Sentiment Analysis II (tokenization, stemming, and stop words)
Natural Language Processing (NLP): Sentiment Analysis III (training & cross validation)
Natural Language Processing (NLP): Sentiment Analysis IV (out-of-core)
Locality-Sensitive Hashing (LSH) using Cosine Distance (Cosine Similarity)
Artificial Neural Networks (ANN)
[Note] Sources are available at Github - Jupyter notebook files
1. Introduction
2. Forward Propagation
3. Gradient Descent
4. Backpropagation of Errors
5. Checking gradient
6. Training via BFGS
7. Overfitting & Regularization
8. Deep Learning I : Image Recognition (Image uploading)
9. Deep Learning II : Image Recognition (Image classification)
10. Deep Learning III : Theano, TensorFlow, and Keras