- Machine Learning 2020
Arthur Samuel (1959)
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998)
Well-posed learning problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
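Mitchell's definition can be made concrete with a toy problem. The following is only a minimal sketch under my own assumptions: the task, the training points, and the 1-nearest-neighbour rule are all made up for illustration and are not part of the original notes. The task T is to label a number x as 1 when x > 0.5, the performance measure P is the accuracy on ten held-out points, and the experience E is a set of labeled examples.

```octave
% Task T: label x in [0, 1] as 1 when x > 0.5, else 0.
% Performance P: fraction of held-out test points labeled correctly.
% Experience E: labeled training examples.
% All numbers below are hypothetical, chosen only for illustration.

test_x = (0.5:1:9.5) / 10;        % ten held-out points: 0.05, 0.15, ..., 0.95
test_y = test_x > 0.5;            % their true labels

% Two amounts of experience: a small and a larger training set.
train_sets = {[0.2 0.6], [0.2 0.48 0.52 0.6]};
acc = zeros(1, numel(train_sets));

for s = 1:numel(train_sets)
  train_x = train_sets{s};
  train_y = train_x > 0.5;        % the "right answers" given with the data
  pred = zeros(size(test_x));
  for i = 1:numel(test_x)
    % 1-nearest-neighbour rule: copy the label of the closest training point.
    [dist, idx] = min(abs(train_x - test_x(i)));
    pred(i) = train_y(idx);
  end
  acc(s) = mean(pred == test_y);  % performance measure P
end

acc_small = acc(1);
acc_large = acc(2);
printf("accuracy with 2 examples: %.2f\n", acc_small);
printf("accuracy with 4 examples: %.2f\n", acc_large);
```

With these particular points the accuracy goes from 0.90 to 1.00 as the training set grows, which is the "improves with experience E" part of the definition. Real data will rarely behave this neatly.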
- Supervised learning: the right answers (labels) are given with the training data, and the algorithm learns to predict them. Examples: regression (continuous output) and classification (discrete output).
- Unsupervised learning
- Others: reinforcement learning, recommender systems
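As a small illustration of supervised learning, here is a hedged sketch of regression in Octave. The data points are invented (they lie exactly on y = 2x + 1), and using polyfit for the fit is my choice for brevity, not something the notes above prescribe.

```octave
% Training examples with their "right answers": y = 2*x + 1 exactly.
% (Hypothetical data, for illustration only.)
x = [1 2 3 4];
y = [3 5 7 9];

% Regression: fit a first-degree polynomial by least squares.
p = polyfit(x, y, 1);      % p(1) is the slope, p(2) the intercept

% Predict the output for an input that was not in the training data.
y_new = polyval(p, 5);

printf("slope %.2f, intercept %.2f, prediction at x = 5: %.2f\n", p(1), p(2), y_new);
```

Because the points are noise-free, the fitted slope and intercept come out as 2 and 1 up to rounding, and the prediction at x = 5 is 11.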
"If an expert system--brilliantly designed, engineered and implemented--cannot learn not to repeat its mistakes, it is not as intelligent as a worm or a sea anemone or a kitten."
- Oliver G. Selfridge, from The Gardens of Learning.
"Find a bug in a program, and fix it, and the program will work today. Show the program how to find and fix a bug, and the program will work forever."
- Oliver G. Selfridge, in AI's Greatest Trends and Controversies, Marti A. Hearst and Haym Hirsh, Editors. IEEE Intelligent Systems (January/February 2000). A timely and thought provoking collection of views from AI scholars and practitioners.
Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables.
A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases. Artificial intelligence is a closely related field, as are probability theory and statistics, data mining, pattern recognition, adaptive control, computational neuroscience and theoretical computer science.
- from wiki
Though I've been keenly following the developments in machine learning for a while, I only recently started picking up the real stuff. I refer to my work as tutorials; however, they are just my notes.
scikit-learn is a machine learning library based on SciPy.
I wrote a tutorial: scikit-learn.
Well, I mostly just followed the official guide.
Anyway, I am a moderator of the scikit-learn group on LinkedIn. Please join the scikit-learn group.
I haven't gone deeply into OpenCV's machine learning features, but it has some. I wrote a tutorial for C and Python with OpenCV (though it's mostly on video and digital image processing). Please take a look at OpenCV.
Matlab is THE ONE that everyone knows. Though there are so many good guides out there, I also wrote my own notes on it. Probably, it is a sort of 'Machine Learning in 24 days'. Please visit my Matlab pages. They are mostly on signal processing (image/audio/video).
Octave is an open-source alternative to Matlab. I wrote a brief introduction in another section of this page.
This is for Natural Language Processing with a Python tool called NLTK. I've just started the tutorials, which are very basic. Please visit NLTK (Natural Language Toolkit).
Here I'll list some of the interesting posts from other sources.
From machinelearningmastery.com - Jan. 2014
- Put Machine Learning on a pedestal - Don't put machine learning on a pedestal
- Write Machine Learning Code - Don't write machine learning code
- Doing Things Manually - Don't do things manually
- Reinvent Solutions to Common Problems - Don't reinvent solutions to common problems
- Ignoring the Math - Don't ignore the math
Several tools can help us learn machine learning algorithms, such as Matlab, NumPy, R, and Octave. Among them, we're going to use Octave, which seems to offer the best balance of learning curve and features.
So, let's install Octave from sourceforge.net.
To customize the prompt, we use PS1(). In this case, we want to switch it to ">> ".
PS1 (">> ")
To check current directory:
>> pwd
ans = C:\Octave\3.2.4_gcc-4.4.0\bin
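To draw a histogram, with 50 bins, of 1000 samples from a Gaussian distribution with mean 5 and variance 10 (the trailing semicolon suppresses printing of all 1000 values):
>> w = 5 + sqrt(10)*(randn(1,1000));
>> hist(w,50)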
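To check that the samples behave as intended, one can compare the sample mean and variance with the targets (5 and 10) and ask hist for the bin counts instead of a plot. This is a minimal sketch; the variable names and the fixed generator state are my own additions so that repeated runs give the same numbers.

```octave
randn("state", 1);                       % fix the generator state (illustrative choice)
mu = 5;                                  % target mean
sigma2 = 10;                             % target variance
w = mu + sqrt(sigma2) * randn(1, 1000);  % 1000 Gaussian samples

sample_mean = mean(w);
sample_var = var(w);

% With output arguments, hist returns bin counts and centers and draws nothing.
[counts, centers] = hist(w, 50);

printf("sample mean %.2f, sample variance %.2f\n", sample_mean, sample_var);
printf("%d bins holding %d samples in total\n", numel(counts), sum(counts));
```

The sample statistics should land near 5 and 10 but will not match them exactly, since they are estimated from a finite sample.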
Then, we get this picture:
[Figure: histogram of w with 50 bins]
>> % Matrix
>> m = [1 2; 3 4; 5 6]
m =
   1   2
   3   4
   5   6
>> size(m)
ans =
   3   2
>> m(2,:) % 2nd row only
ans =
   3   4
>> m(:,2) % 2nd column only
ans =
   2
   4
   6
>> m([1:3],:) % all
ans =
   1   2
   3   4
   5   6
>> m([1 3],:) % 1st & 3rd rows only
ans =
   1   2
   5   6
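A few related indexing forms are handy alongside the ones in the session. The sketch below reuses the same 3x2 matrix; the variable names are mine and only illustrative.

```octave
m = [1 2; 3 4; 5 6];

last_row = m(end, :);        % "end" stands for the last index: [5 6]
corner = m(end, end);        % bottom-right element: 6
first_two = m(1:2, :);       % a range of rows: [1 2; 3 4]

% Logical indexing: keep only the rows whose first column exceeds 2.
mask = m(:, 1) > 2;          % [0; 1; 1]
big_rows = m(mask, :);       % [3 4; 5 6]

[n_rows, n_cols] = size(m);  % 3 and 2
```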
To list the variables in the current scope:
>> who
Variables in the current scope:
ans  m    w
>> whos % for more info
Variables in the current scope:
  Attr Name        Size       Bytes  Class
  ==== ====        ====       =====  =====
       ans         1x29          29  char
       m           3x2           48  double
       w           1x1000      8000  double
Total is 1035 elements using 8077 bytes
Here is how to load a data file:
>> cd 'c:\test'
>> load('population.txt')
>> who
Variables in the current scope:
ans         m           population  w
>> whos
Variables in the current scope:
  Attr Name            Size       Bytes  Class
  ==== ====            ====       =====  =====
       ans             1x7            7  char
       m               3x2           48  double
       population     19x2          304  double
       w               1x1000      8000  double
Total is 1051 elements using 8359 bytes
>> population
population =
      1    200
   1000    310
   1750    791
   1800    978
   1850   1262
   1900   1650
   1950   2519
   1955   2756
   1960   2982
   1965   3335
   1970   3692
   1975   4068
   1980   4435
   1985   4831
   1990   5263
   1995   5674
   2000   6070
   2005   6454
   2010   6972
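Once a two-column file is loaded, the columns can be split into separate vectors. The following is a hedged, self-contained sketch: it writes a temporary file holding three rows copied from the listing above, so it does not depend on c:\test existing. The names fname, data, year, count and growth are illustrative, and the second column is treated as a plain count because the notes do not state its unit.

```octave
% Write a small two-column text file so the sketch is self-contained.
fname = [tempname() ".txt"];
fid = fopen(fname, "w");
fprintf(fid, "%d %d\n", [2000 6070; 2005 6454; 2010 6972]');
fclose(fid);

% With an output argument, load returns the matrix instead of creating
% a variable named after the file.
data = load(fname);
year = data(:, 1);                % first column
count = data(:, 2);               % second column

% Change between the first and last rows.
growth = count(end) - count(1);   % 6972 - 6070 = 902

delete(fname);                    % remove the temporary file
```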
Here is how to remove data:
>> whos
Variables in the current scope:
  Attr Name            Size       Bytes  Class
  ==== ====            ====       =====  =====
       ans             1x7            7  char
       m               3x2           48  double
       population     19x2          304  double
       w               1x1000      8000  double
Total is 1051 elements using 8359 bytes
>> clear w
>> whos
Variables in the current scope:
  Attr Name            Size       Bytes  Class
  ==== ====            ====       =====  =====
       ans             1x7            7  char
       m               3x2           48  double
       population     19x2          304  double
Total is 51 elements using 359 bytes
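Here is how to save data: take the most recent entries, from the 11th row to the 19th row, and save them to the file recent.data. Note that population(11:19) uses a single (linear, column-major) index, so it returns only the first-column values of those rows, the years, as a 1x9 row vector. To keep both columns of those rows, use population(11:19,:) instead.
>> v = population(11:19)
v =
   1970   1975   1980   1985   1990   1995   2000   2005   2010
>> save recent.data v;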
The saved recent.data file looks like this:
# Created by Octave 3.2.4, Tue Jul 02 15:53:33 2013 Pacific Daylight Time
# name: v
# type: matrix
# rows: 1
# columns: 9
 1970 1975 1980 1985 1990 1995 2000 2005 2010
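The difference between a single index and a row range is easy to miss, so here is a minimal sketch on a small stand-in matrix (the last four rows of the population listing). It also round-trips a variable through save and load. The names p, years_only, both_cols and fname are mine, and a temporary file is used instead of recent.data.

```octave
% A small stand-in for the population matrix (last four rows only).
p = [1995 5674; 2000 6070; 2005 6454; 2010 6972];

% A single index walks the matrix column by column (linear indexing),
% so p(3:4) picks two entries of the first column as a row vector.
years_only = p(3:4);        % [2005 2010]

% A row range plus ":" keeps every column of those rows.
both_cols = p(3:4, :);      % [2005 6454; 2010 6972]

% Round trip: save to a file, clear the variable, and load it back.
fname = tempname();
save(fname, "both_cols");
clear both_cols;
load(fname);                % recreates the variable both_cols
delete(fname);              % remove the temporary file
```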
>> m(:,2) = [10; 11; 12;] % assign new values to 2nd column
m =
    1   10
    3   11
    5   12
>> m = [m, [100; 101; 102]] % append another column
m =
     1    10   100
     3    11   101
     5    12   102
>> n = [1000; 1001; 1002]
n =
   1000
   1001
   1002
>> r = [m n] % column-wise append
r =
      1     10    100   1000
      3     11    101   1001
      5     12    102   1002
>> n = [1000 1001 1002]
n =
   1000   1001   1002
>> s = [m; n] % row-wise append
s =
      1     10    100
      3     11    101
      5     12    102
   1000   1001   1002
>> s(:) % put all elements of s into a column vector
ans =
      1
      3
      5
   1000
     10
     11
     12
   1001
    100
    101
    102
   1002
>> clear % clear all
>> % Matrix Multiplication
>>
>> A = [1 2; 3 4; 5 6]
A =
   1   2
   3   4
   5   6
>> size(A)
ans =
   3   2
>> B = [10 11; 12 13;]
B =
   10   11
   12   13
>> size(B)
ans =
   2   2
>> C = A * B
C =
    34    37
    78    85
   122   133
>> size(C)
ans =
   3   2
>> % Element-wise multiplication using dot(.)
>> B = [10 11; 12 13; 14 15;]
B =
   10   11
   12   13
   14   15
>> A.*B
ans =
   10   22
   36   52
   70   90
>> A
A =
   1   2
   3   4
   5   6
>> A.^2 % square each element of A
ans =
    1    4
    9   16
   25   36
>> A
A =
   1   2
   3   4
   5   6
>> 1./A % reciprocal of each element of A
ans =
   1.00000   0.50000
   0.33333   0.25000
   0.20000   0.16667
>> A' % transpose of A
ans =
   1   3   5
   2   4   6
>> max(A) % column-wise max
ans =
   5   6
>> A <= 4 % element-wise comparison
ans =
   1   1
   1   1
   0   0
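The session mixes two kinds of products, so a short sketch of the size rules may help. It reuses A and the 2x2 B from above; the other names are illustrative additions of mine.

```octave
A = [1 2; 3 4; 5 6];          % 3x2
B = [10 11; 12 13];           % 2x2

% A * B is defined because the number of columns of A equals the
% number of rows of B; the result has rows(A) x columns(B) = 3x2 entries.
inner_ok = (size(A, 2) == size(B, 1));
C = A * B;

% Element-wise operators need operands of the same size instead.
D = A .* A;                   % same values as A .^ 2

% max(A) works column by column; flatten with A(:) for the overall maximum.
[col_max, row_idx] = max(A);  % col_max = [5 6], row_idx = [3 3]
overall_max = max(A(:));      % 6

% A comparison yields a logical mask that can be used as an index.
small = A(A <= 4);            % column vector [1; 3; 2; 4] (column-major order)
```

Trying A * A here would fail, since a 3x2 matrix cannot be multiplied by another 3x2 matrix, while A .* A is fine.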
For more on Octave, please visit http://www.bogotobogo.com/WebTechnologies/gnu_octave.php.