scikit-learn : Installation on Ubuntu
The fact that you're here tells me you already know what scikit-learn is: Python's machine learning library, built on top of SciPy.
A general guide for installation can be found at Installing scikit-learn.
Installing from source requires you to have installed Python (>= 2.6), NumPy (>= 1.3), SciPy (>= 0.7), setuptools, Python development headers and a working C++ compiler. Under Debian-based operating systems, which include Ubuntu, we can install all these requirements by issuing:
$ sudo apt-get install build-essential python-dev python-numpy \
    python-setuptools python-scipy libatlas-dev libatlas3-base
We will need matplotlib as well:
$ sudo apt-get install python-matplotlib
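Before building anything, it can help to confirm that the prerequisites are actually importable and new enough. The following is only a minimal sketch: the helper names `parse_version` and `check` are made up for this example, the minimum versions are the ones quoted above, and it assumes each package exposes a `__version__` attribute (packages that do not are simply reported as present).

```python
import importlib

# Minimum versions quoted in the text; None means no minimum was stated.
REQUIRED = {"numpy": (1, 3), "scipy": (0, 7), "setuptools": None, "matplotlib": None}


def parse_version(text):
    """Turn '1.8.2' or '1.8.2rc1' into a tuple of leading integers, e.g. (1, 8, 2)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component with no leading digits, e.g. 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def check(name, minimum):
    """Return (status, version) for one package without raising on a missing import."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "missing", None
    version = getattr(module, "__version__", None)
    if minimum is None or version is None:
        # Nothing to compare against, so only report that the import worked.
        return "ok", version
    status = "ok" if parse_version(version) >= minimum else "too old"
    return status, version


if __name__ == "__main__":
    for name, minimum in sorted(REQUIRED.items()):
        status, version = check(name, minimum)
        print("%s: %s (version %s)" % (name, status, version))
```

Run it with the same interpreter you plan to use for scikit-learn; a package installed for a different Python will show up as missing.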
Using pip or easy_install is usually the fastest way to install the latest stable release. If either one is available, we can install or update with the command:
$ pip install -U scikit-learn
or:
$ easy_install -U scikit-learn
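A quick way to see whether the install worked, independent of the full test suite, is to import the package and fit a tiny model. This is just a sketch under my own assumptions: the function name `smoke_test` is hypothetical, and the iris dataset with a k-nearest-neighbors classifier is an arbitrary choice for a fast check, not something the installation guide prescribes.

```python
def smoke_test():
    """Return the installed scikit-learn version, or None if it cannot be imported."""
    try:
        import sklearn
        from sklearn.datasets import load_iris
        from sklearn.neighbors import KNeighborsClassifier
    except ImportError:
        return None

    # Fit a small classifier on the bundled iris data to exercise the compiled parts.
    iris = load_iris()
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(iris.data, iris.target)

    # Predicting a few samples is enough to show that fit/predict run end to end.
    predictions = model.predict(iris.data[:5])
    assert len(predictions) == 5
    return sklearn.__version__


if __name__ == "__main__":
    version = smoke_test()
    if version is None:
        print("scikit-learn could not be imported")
    else:
        print("scikit-learn %s imported and a small model was fitted" % version)
```

If this prints a version number, the basic import and fit/predict path is working even when a test in the full suite fails.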
Testing requires having the nose library. After installation, the package can be tested by executing from outside the source directory:
$ sudo apt-get install python-nose
$ nosetests sklearn --exe
Initially, I got error messages:
Ran 1720 tests in 262.153s
FAILED (SKIP=14, errors=1)
I could not get the failing test to pass, but I moved on since the installation seemed to work, at least for the materials from the tutorial.
I got the tutorial files from Tutorial Setup and Installation.
$ git clone https://github.com/astroML/sklearn_tutorial
Then fetch the datasets, where $TUTORIAL_HOME is sklearn_tutorial/doc:
$ cd $TUTORIAL_HOME/data/sdss_colors
$ python fetch_data.py
$ cd $TUTORIAL_HOME/data/sdss_photoz
$ python fetch_data.py
$ cd $TUTORIAL_HOME/data/sdss_spectra
$ python fetch_data.py
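The three fetch steps are identical apart from the directory, so they can also be driven from one small script. This is a sketch rather than part of the tutorial: `dataset_dirs` and `fetch_all` are names I made up, and it assumes each dataset directory contains a `fetch_data.py` as in the commands above.

```python
import os
import subprocess
import sys

# The three dataset directories used in the commands above.
DATASETS = ["sdss_colors", "sdss_photoz", "sdss_spectra"]


def dataset_dirs(tutorial_home):
    """Build the path of each dataset directory under <tutorial_home>/data."""
    return [os.path.join(tutorial_home, "data", name) for name in DATASETS]


def fetch_all(tutorial_home):
    """Run fetch_data.py inside each dataset directory, stopping at the first failure."""
    for directory in dataset_dirs(tutorial_home):
        # Equivalent to: cd <directory> && python fetch_data.py
        subprocess.check_call([sys.executable, "fetch_data.py"], cwd=directory)
```

Calling `fetch_all("sklearn_tutorial/doc")` from the directory where the repository was cloned should have the same effect as the six commands. `check_call` raises an error if any download script exits with a failure, so a broken fetch does not go unnoticed.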