Python is arguably the most popular machine learning language, and plenty of resources for it can be found online. Thinking about getting started with machine learning in Python? This tutorial can take you from zero to one; from one to one hundred is up to you. The original tutorial is in two parts (suo.im/KUWgl and suo.im/96wD3), which Machine Heart has combined into this single article. The tutorial was written by Matthew Mayo, associate editor and data scientist at KDnuggets.
"Getting started" is often the hardest part, especially when there are so many options that it becomes difficult to commit to any one of them. The purpose of this tutorial is to help novices with little Python or machine learning background grow into knowledgeable practitioners, using only free materials and resources along the way. Its main goal is to guide you through the vast number of resources available: there is no doubt that many resources exist, but which are the best? Which complement each other? In what order should you study them?
First, I’m going to assume that you’re not an expert on:
- Machine learning
- Python
- Any Python library for machine learning, scientific computing, or data analysis
Of course, some basic understanding of the first two topics would help, but it isn't necessary; you will just need to spend a little more time in the early stages.
Part I: The basics
Step 1: Basic Python skills
A basic understanding of Python is essential if we are going to use it for machine learning. Fortunately, because Python is a widely used general-purpose programming language with applications in scientific computing and machine learning, beginner tutorials are not hard to find. Where you start should depend on your prior experience with Python and with programming in general.
First, you need to install Python. Because we will use scientific computing and machine learning packages later, I recommend installing Anaconda. It is an industrial-strength Python distribution for Linux, OS X, and Windows, complete with the packages required for machine learning, including NumPy, scikit-learn, and Matplotlib. It also includes IPython Notebook, the interactive environment used in many of our tutorials. I recommend installing Python 2.7.
If you don’t know how to program, I suggest you start with the following free online books and then move on to subsequent material:
- Learn Python the Hard Way by Zed A. Shaw: https://learnpythonthehardway.org/book/
If you have programming experience but don't know Python, or your knowledge of it is very shallow, I recommend the following two courses:
- Google Developers Python Course (highly recommended for visual learners): http://suo.im/toMzq
- Introduction to Scientific Computing in Python (by M. Scott Shell of UCSB Engineering) (a good introduction, about 60 pages): http://suo.im/2cXycM
If you need a quick 30-minute tutorial on getting started with Python, see the following:
- Learn X in Y Minutes (X = Python): http://suo.im/zm6qX
Of course, if you are already an experienced Python programmer, you can skip this step. Even so, I suggest keeping the official Python documentation close at hand: https://www.python.org/doc/
Step 2: Basic machine learning skills
Zachary Lipton of KDnuggets has already pointed out that people evaluate what a "data scientist" is by many different standards. This is actually a reflection of the machine learning field, because most of what data scientists do involves using machine learning algorithms to varying degrees. Is it necessary to be intimately familiar with kernel methods to effectively build and gain insight from a support vector machine? Of course not. As with almost everything in life, the required depth of theoretical understanding is relative to the practical application. An in-depth treatment of machine learning algorithms is beyond the scope of this article; it usually requires devoting substantial time to more academic coursework, or at least to intensive self-study.
The good news is that for practical use, you don’t need a PhD in machine learning to understand theory — you don’t need to study computer science theory to be an effective programmer.
Andrew Ng's machine learning course on Coursera frequently receives rave reviews. My suggestion, however, is to start from the lecture notes that a previous student compiled online. Skip the Octave-specific notes (Octave is a Matlab-like language that is irrelevant to your Python learning). Be aware that these are not official notes, but they are a good reference for Ng's course material. Of course, if you have the time and interest, you can take the course on Coursera right now: http://suo.im/2o1uD
- Unofficial notes for Andrew Ng's course: http://www.holehouse.org/mlclass/
In addition to Ng's course mentioned above, many other courses are available online if you want more. I like Tom Mitchell's, for example; videos of his recent lectures (together with Maria-Florina Balcan) are very approachable.
- Tom Mitchell's machine learning course: http://suo.im/497arw
You don't need to work through all of the notes and videos right now. An effective approach is to proceed directly to the specific exercises below as you see fit, referring back to the appropriate sections of the notes and videos above as needed.
Step 3: Overview of scientific computing Python packages
OK, so we've covered Python programming and learned a bit about machine learning. Beyond Python itself, there are open source software libraries that are commonly used to perform actual machine learning. Broadly speaking, a number of so-called scientific Python libraries can be used to perform basic machine learning tasks (this list is admittedly somewhat subjective; a short sketch using all four follows the list):
- NumPy: mainly useful for its n-dimensional array objects: http://www.numpy.org/
- pandas: a Python data analysis library, notable in particular for its DataFrame data structure: http://pandas.pydata.org/
- Matplotlib: a 2D plotting library that produces publication-quality figures: http://matplotlib.org/
- scikit-learn: machine learning algorithms for data analysis and data mining tasks: http://scikit-learn.org/stable/
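To make the division of labor concrete, here is a minimal sketch that touches all four libraries at once. The synthetic data, the linear model, and every parameter value are purely illustrative choices, not part of any referenced tutorial.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# NumPy: generate noisy linear data as n-dimensional arrays
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 50)
y = 2.5 * x + rng.normal(0, 2, 50)

# pandas: wrap the arrays in a DataFrame for inspection
df = pd.DataFrame({"x": x, "y": y})
print(df.describe())

# scikit-learn: fit a simple model (it expects a 2D feature array)
model = LinearRegression().fit(df[["x"]], df["y"])
print("estimated slope:", model.coef_[0])

# Matplotlib: visualize the data and the fitted line
plt.scatter(df["x"], df["y"], label="data")
plt.plot(df["x"], model.predict(df[["x"]]), color="red", label="fit")
plt.legend()
plt.show()
```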
A good way to learn about these libraries is to study the following material:
- Scipy Lecture Notes by Gael Varoquaux, Emmanuelle Gouillart, and Olav Vahtras: http://www.scipy-lectures.org/
- 10 Minutes to Pandas: http://suo.im/4an6gY
You’ll see some other packages later in this tutorial, such as the Matplotlib-based data visualization library Seaborn. The packages mentioned above are just a few of the core libraries commonly used in Python machine learning, but understanding them should keep you from getting confused when you come across other packages later.
Let’s get started!
Step 4: Learn machine learning with Python
First, check your preparation:
- Python: ready
- Basic machine learning material: ready
- NumPy: ready
- Pandas: ready
- Matplotlib: ready
It is now time to implement machine learning algorithms with Python's standard machine learning library, scikit-learn.
(Figure: the scikit-learn algorithm flowchart)
Many of the following tutorials and exercises use IPython (Jupyter) Notebook, an interactive environment for executing Python statements. IPython Notebook can be used online or easily installed on your local computer.
- An overview of IPython Notebook from Stanford: http://cs231n.github.io/ipython-tutorial/
Also note that the tutorials below draw on a number of online resources. If you have questions about any piece of course material, contact its author. Our first tutorial starts with scikit-learn; I recommend reading the following articles in order before continuing.
scikit-learn is Python's most commonly used general-purpose machine learning library. The introduction below covers the k-nearest neighbors algorithm:
- An introduction to scikit-learn by Jake VanderPlas: http://suo.im/3bMdEd
The following is a deeper, more extended introduction, including starting a project with a well-known dataset:
- Example machine learning notebook by Randal Olson: http://suo.im/RcPR6
The next article focuses on strategies for evaluating different models in scikit-learn, including the train/test split method:
- Model evaluation by Kevin Markham: http://suo.im/2HIXDD
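Before continuing, it is worth internalizing the train/test-split workflow that this kind of model evaluation relies on. The sketch below is my own minimal illustration; the k-nearest-neighbors model, the iris dataset, and the 30% split are arbitrary assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()

# Hold out 30% of the data so the model is scored on examples it never saw
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Accuracy on the held-out test set estimates out-of-sample performance
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```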
Step 5: Implementing basic machine learning algorithms in Python
With the basics of scikit-learn in hand, we can explore some more general and practical algorithms. Let's start with the well-known k-means clustering algorithm, a simple and effective way to attack unsupervised learning problems:
- K-means clustering: http://suo.im/40R8zf
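As a preview of what the tutorial covers, here is a minimal k-means sketch in scikit-learn; the synthetic blobs and the choice of k = 3 are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three natural groupings; the true labels are
# discarded, since clustering is unsupervised
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster centers:\n", kmeans.cluster_centers_)
print("first 10 assignments:", labels[:10])
```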
Next we can return to classification and look at one of the most popular classification algorithms around:
- Decision trees: http://thegrimmscientist.com/tutorial-decision-trees/
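Here is a minimal decision tree sketch to accompany the tutorial; the iris dataset, the depth limit, and the cross-validation setup are illustrative choices of mine.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Limiting tree depth is a simple guard against overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# 5-fold cross-validation gives a more stable accuracy estimate
scores = cross_val_score(tree, iris.data, iris.target, cv=5)
print("mean CV accuracy:", scores.mean())
```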
With classification under our belt, we can move on to predicting continuous numeric values:
- Linear regression: http://suo.im/3EV4Qn
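A minimal linear regression sketch, assuming a synthetic one-feature dataset rather than the data used in the tutorial:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# A synthetic regression problem with a single informative feature
X, y = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=0)

reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("R^2 on training data:", reg.score(X, y))
```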
We can also apply the ideas of regression to classification problems, via logistic regression:
- Logistic regression: http://suo.im/S2beL
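And a matching logistic regression sketch; note how the model now outputs class probabilities rather than a continuous value (the iris dataset and the iteration limit are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# max_iter raised so the solver converges on this multi-class problem
logreg = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

print("predicted class:", logreg.predict(iris.data[:1]))
print("class probabilities:", logreg.predict_proba(iris.data[:1]))
```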
Step 6: Implementing advanced machine learning algorithms in Python
Now that we are familiar with scikit-learn, we can look at some more advanced algorithms. First up are support vector machines, nonlinear classifiers that rely on transforming the data into a higher-dimensional space.
- Support vector machines: http://suo.im/2iZLLa
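The following sketch illustrates the higher-dimensional mapping idea with an RBF kernel on data that no straight line can separate; the dataset and hyperparameter values are illustrative assumptions, not taken from the tutorial.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not separable by a straight line
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional
# space in which a linear separator exists
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X, y)
print("training accuracy:", svm.score(X, y))
```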
Next, we can examine random forests, an ensemble classifier, through the Kaggle Titanic competition:
- Kaggle Titanic competition (using random forests): http://suo.im/1o7ofe
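The Titanic tutorial works with real competition data. As a smaller warm-up, here is a minimal random forest sketch on a built-in dataset; the dataset and hyperparameters are my own illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()

# An ensemble of 100 decision trees, each trained on a bootstrap sample
rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(rf, iris.data, iris.target, cv=5)
print("mean CV accuracy:", scores.mean())
```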
Dimensionality reduction algorithms are often used to reduce the number of variables in a problem. Principal component analysis is a particular form of unsupervised dimensionality reduction:
- Dimensionality reduction algorithms: http://suo.im/2k5y2E
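A minimal PCA sketch, assuming the built-in iris dataset and an arbitrary choice of two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()  # 4 features per sample

# Project the 4-dimensional data onto its 2 highest-variance directions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(iris.data)

print("reduced shape:", X_reduced.shape)
print("variance explained:", pca.explained_variance_ratio_)
```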
Before moving on to step 7, we can take a moment to consider some of the progress that has been made in a relatively short period of time.
Starting from Python and its machine learning libraries, we have covered some of the most common and well-known machine learning algorithms (k-nearest neighbors, k-means clustering, support vector machines, and so on), as well as a powerful ensemble technique (random forests) and some additional machine learning tasks (dimensionality reduction algorithms and model validation techniques). Along with some basic machine learning skills, we have begun to assemble a useful toolkit.
Next, we will get to know one more set of essential tools.
Step 7: Python deep learning
(Figure: a neural network is built from many layers)
Deep learning is everywhere. It builds on neural network research that goes back decades, but recent advances, dating back only a few years, have dramatically increased the perceived power of deep neural networks and generated widespread interest. If you are unfamiliar with neural networks, KDnuggets has many articles detailing the recent innovations, achievements, and accolades of deep learning.
The final step is not intended to review all types of deep learning, but rather to explore a few simple network implementations in two advanced contemporary Python deep learning libraries. For readers interested in digging deeper into deep learning, I suggest starting with these free online books:
- Neural Networks and Deep Learning by Michael Nielsen: http://neuralnetworksanddeeplearning.com/
1. Theano
Link: http://deeplearning.net/software/theano/
Theano is the first Python deep learning library we will look at. Here is how the Theano authors describe it:
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
The following introductory tutorial on deep learning with Theano is somewhat long, but it is good enough, vividly presented, and highly rated:
- Theano deep learning tutorial by Colin Raffel: http://suo.im/1mPGHe
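To give a flavor of Theano's define-then-compile style before you dive into the tutorial, here is a minimal sketch using only core Theano calls; it builds a symbolic expression and its gradient rather than a full neural network.

```python
import theano
import theano.tensor as T

# Define a symbolic expression: Theano builds a computation graph
# instead of evaluating immediately
x = T.dvector("x")
y = T.sum(x ** 2)

# Theano can differentiate the graph symbolically
grad = T.grad(y, x)

# Compile the graph into callable functions
f = theano.function([x], y)
df = theano.function([x], grad)

print(f([1.0, 2.0, 3.0]))   # 14.0
print(df([1.0, 2.0, 3.0]))  # [2. 4. 6.]
```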
2. Caffe
Link: http://caffe.berkeleyvision.org/
The other library we will test drive is Caffe. Again, let's start with the authors' own words:
Caffe is a deep learning framework made with expression, speed, and modularity in mind. Caffe was developed by the Berkeley Vision and Learning Center and by community contributors.
This tutorial is the real highlight of this article. We have looked at several interesting examples above, but none can compete with the following one, which implements Google's DeepDream using Caffe. It is quite marvelous! Once you have mastered the tutorial, try letting your processor dream on its own, just for fun.
- Implementing Google's DeepDream with Caffe: http://suo.im/2cUSXS
I’m not promising that this will be quick or easy, but if you put in the time and complete the seven steps above, you’ll get pretty good at understanding a lot of machine learning algorithms and implementing them in Python through popular libraries, including some that are at the forefront of current deep learning research.
Part II: Advanced
(Figure: machine learning algorithms)
This is the second installment of the seven-step series on mastering machine learning with Python. If you have completed the first part of the series, you should have reached a satisfactory level of pace and proficiency; if not, you may want to review the previous post, depending on your current level of understanding. I promise it will be worth it. After a quick review, this article focuses more explicitly on several sets of machine-learning-related tasks. Since we can safely skip the foundational modules (Python basics, machine learning basics, and so on), we can jump straight into the different machine learning algorithms. This time, the tutorials are better categorized by topic.
Step 1: Fundamentals of machine learning review & a new perspective
The previous part covered the following steps:
1. Basic Python skills
2. Basic machine learning skills
3. Overview of Python packages
4. Getting started with machine learning in Python: introduction & model evaluation
5. Machine learning topics in Python: k-means clustering, decision trees, linear regression & logistic regression
6. Advanced machine learning topics in Python: support vector machines, random forests, PCA dimensionality reduction
7. Deep learning in Python
As mentioned above, if you are starting from scratch, I suggest working through the previous part in order. It also lists all of the materials beginners need to get started, and installation instructions were included there.
However, if you’ve read it, I’ll start with the basics below:
- An explanation of key machine learning terms by Matthew Mayo. Address: http://suo.im/2URQGm
- Wikipedia entry: Statistical classification. Address: http://suo.im/mquen
- Machine Learning: A Complete and Detailed Overview by Alex Castrounis. Address: http://suo.im/1yjSSq
If you are looking for an alternative or complementary approach to learning the basics of machine learning, I can recommend Shai Ben-David's video lectures and the textbook he wrote with Shai Shalev-Shwartz:
- Shai Ben-David's Introduction to Machine Learning video lectures, University of Waterloo. Address: http://suo.im/1TFlK6
- Understanding Machine Learning: From Theory to Algorithms by Shai Ben-David & Shai Shalev-Shwartz. Address: http://suo.im/1NL0ix
Keep in mind that you don't need to read all of this introductory material to begin the series. You can return to the video lectures, textbook, and other resources when you are implementing models with machine learning algorithms, or when the relevant concepts are actually applied in later steps. Judge for yourself.
Step 2: More classification
We start with the new material by first consolidating our classification techniques and introducing a few additional algorithms. While Part 1 covered decision trees, support vector machines, logistic regression, and the ensemble classifier random forest, here we add k-nearest neighbors, the naive Bayes classifier, and the multilayer perceptron.
(Figure: scikit-learn classifiers)
K-nearest neighbors (kNN) is an example of a simple classifier and a lazy learner: all computation happens at classification time, rather than ahead of time in a training step. kNN is non-parametric and decides how to classify a data instance by comparing it with its k nearest stored instances.
- K-nearest neighbor classification using Python. Address: http://suo.im/2zqW0t
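Since kNN's main modeling decision is the value of k itself, here is a minimal sketch that compares a few values by cross-validation; the dataset and the candidate values of k are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# kNN has no real training step; "fitting" just stores the data.
# The key choice is k, so compare a few values with cross-validation:
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, iris.data, iris.target, cv=5).mean()
    print("k=%2d  mean CV accuracy: %.3f" % (k, score))
```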
Naive Bayes is a classifier based on Bayes' theorem. It assumes that features are independent: the presence of any particular feature in a class is unrelated to the presence of any other feature in that class.
- Document classification using scikit-learn by Zac Stewart. Address: http://suo.im/2uwBm3
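In the spirit of Stewart's document-classification tutorial, here is a minimal naive Bayes sketch on a hypothetical four-document corpus; the toy documents and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A toy document-classification problem (hypothetical mini-corpus)
docs = ["free money offer now", "meeting agenda attached",
        "win a free prize", "project schedule update"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Bag-of-words counts; naive Bayes treats each word as independent
vec = CountVectorizer()
X = vec.fit_transform(docs)

nb = MultinomialNB().fit(X, labels)
print(nb.predict(vec.transform(["free prize inside"])))  # expect spam (1)
```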
The multilayer perceptron (MLP) is a simple feedforward neural network consisting of multiple layers of nodes, where each layer is fully connected to the layer that follows it. The multilayer perceptron was introduced in scikit-learn version 0.18.
First read an overview of the MLP classifier in the scikit-learn documentation, then practice an implementation with a tutorial.
- Neural network models (supervised), scikit-learn documentation. Address: http://suo.im/3oR76l
- A Beginner's Guide to Neural Networks with Python and scikit-learn 0.18! by Jose Portilla. Address: http://suo.im/2tX6rG
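A minimal MLP sketch, assuming scikit-learn >= 0.18 and illustrative choices of dataset, hidden layer size, and iteration limit; standardizing the features first matters because neural networks are sensitive to feature scale:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier  # requires scikit-learn >= 0.18
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Standardize features: zero mean, unit variance
scaler = StandardScaler().fit(X_train)

# One hidden layer of 10 units, fully connected to input and output
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)
print("test accuracy:", mlp.score(scaler.transform(X_test), y_test))
```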
Step 3: More clustering
Now we move on to clustering, a form of unsupervised learning. In Part 1 we discussed the k-means algorithm; here we introduce DBSCAN and expectation-maximization (EM).
(Figure: scikit-learn clustering algorithms)
First, read these introductory articles. The first is a quick comparison of k-means and EM clustering techniques and a good lead-in to the new forms of clustering; the second is an overview of the clustering techniques available in scikit-learn:
- Comparing clustering techniques: a concise technical overview by Matthew Mayo. Address: http://suo.im/4ctIvI
- Comparing different clustering algorithms on toy datasets, scikit-learn documentation. Address: http://suo.im/4uvbbM
Expectation-maximization (EM) is a probabilistic clustering algorithm: it determines the probability that an instance belongs to a particular cluster. EM approximates the maximum likelihood or maximum a posteriori estimates of the parameters of a statistical model (Han, Kamber & Pei). The EM process iterates over a set of parameters until clustering is maximized with respect to the k clusters.
First read the tutorial on the EM algorithm. Next, take a look at the relevant scikit-learn documentation. Finally, follow the tutorial to implement EM clustering yourself in Python.
- Expectation-maximization (EM) algorithm tutorial by Elena Sharova. Address: http://suo.im/33ukYd
- Gaussian mixture models, scikit-learn documentation. Address: http://suo.im/20C2tZ
- A quick introduction to building Gaussian mixture models with Python by Tiago Ramalho. Address: http://suo.im/4oxFsj
If Gaussian mixture models seem confusing at first glance, this excerpt from the scikit-learn documentation should allay any unnecessary concerns:
The GaussianMixture object implements the expectation-maximization (EM) algorithm for fitting mixtures of Gaussian models.
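Here is a minimal GaussianMixture sketch (scikit-learn >= 0.18); the synthetic blobs and the number of components are illustrative assumptions. Note the soft, probabilistic assignments that distinguish EM from k-means:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture  # scikit-learn >= 0.18

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# EM alternates between assigning soft cluster responsibilities (E-step)
# and re-estimating each Gaussian's parameters (M-step)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Unlike k-means, we get per-cluster membership probabilities
print("hard assignments:", gmm.predict(X[:5]))
print("soft assignments:\n", gmm.predict_proba(X[:5]).round(3))
```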
Density-based spatial clustering of applications with noise (DBSCAN) works by grouping densely packed data points together and marking low-density points as outliers.
First read and follow the example implementation of DBSCAN in the scikit-learn documentation, then work through this concise tutorial:
- Demo of the DBSCAN clustering algorithm, scikit-learn documentation. Address: http://suo.im/1l9tvX
- The density-based clustering algorithm (DBSCAN) and its implementation. Address: http://suo.im/1LEoXC
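A minimal DBSCAN sketch on data where density-based clustering shines; the two-moons dataset and the eps/min_samples values are illustrative and would normally need tuning:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# DBSCAN recovers non-spherical clusters that k-means would split badly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps sets the neighborhood radius; min_samples sets the density threshold
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

labels = db.labels_
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points (label -1):", np.sum(labels == -1))
```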
Step 4: More ensemble methods
The previous part covered only a single ensemble method: random forests (RF). RF has been enormously successful as a top-tier classifier over the past few years, but it is certainly not the only ensemble classifier. Here we will look at bagging, boosting, and voting.
(Figure: boosting)
First, read these overviews of ensemble learners. The first is general-purpose; the second covers them as they relate to scikit-learn:
- An introduction to ensemble learners by Matthew Mayo. Address: http://suo.im/cLESw
- Ensemble methods in scikit-learn, scikit-learn documentation. Address: http://suo.im/yFuY9
Then, before continuing on to the new ensemble methods, quickly revisit random forests with a fresh tutorial:
- Random forests in Python, from Yhat. Address: http://suo.im/2eujI
Bagging, boosting, and voting are all different forms of ensemble classifiers, and all involve building multiple models. However, the algorithms the models are built from, the data those models use, and how the results are ultimately combined vary from scheme to scheme.
- Bagging: builds multiple models with the same classification algorithm, training each on a different (independent) sample of data drawn from the training set. Scikit-learn implements BaggingClassifier.
- Boosting: builds multiple models with the same classification algorithm, chaining the models one after another so that each subsequent model's learning is boosted. Scikit-learn implements AdaBoost.
- Voting: builds multiple models with different classification algorithms and uses a criterion to decide how best to combine their outputs. Scikit-learn implements the voting classifier.
So why combine models? To approach this question from one specific angle, here is an overview of the bias-variance tradeoff, specifically as it relates to boosting, from the scikit-learn documentation:
- Single estimator versus bagging: bias-variance decomposition, scikit-learn documentation. Address: http://suo.im/3izlRB
Now that you have read some introductory material on ensemble learners and have a basic understanding of several specific ensemble classifiers, here is how to implement ensemble classifiers in Python with scikit-learn, from Machine Learning Mastery:
- Ensemble machine learning algorithms in Python with scikit-learn by Jason Brownlee. Address: http://suo.im/9WEAr
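To tie the three schemes together, here is a minimal sketch that runs one of each side by side; the dataset, base estimators, and hyperparameters are all illustrative choices of mine rather than those used in Brownlee's tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

ensembles = {
    # Bagging: same algorithm, independent bootstrap samples
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    # Boosting: same algorithm, models chained to fix predecessors' errors
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Voting: different algorithms, predictions combined by majority vote
    "voting": VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB())], voting="hard"),
}

for name, clf in ensembles.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```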
Step 5: Gradient boosting
Next we continue with ensemble classifiers by exploring one of the most popular machine learning algorithms of the moment. Gradient boosting has recently made a significant impact in machine learning, becoming one of the most popular and successful algorithms in Kaggle competitions.
(Figure: gradient boosting)
First, read an overview of gradient boosting:
- Wikipedia entry: Gradient boosting. Address: http://suo.im/TslWi
Next, learn why gradient boosting is the "most winning" approach in Kaggle competitions:
- Why does gradient boosting solve so many Kaggle problems so well? Quora: http://suo.im/3rS6ZO
- A Kaggle master explains gradient boosting by Ben Gorman. Address: http://suo.im/3nXlWR
While scikit-learn has its own implementation of gradient boosting, we will change things up slightly and use the XGBoost library, which we have mentioned as a faster implementation.
The following links provide some additional information about the XGBoost library, as well as about gradient boosting (where needed):
- Wikipedia entry: XGBoost. Address: http://suo.im/2UlJ3V
- The XGBoost library on GitHub. Address: http://suo.im/2JeQI8
- XGBoost documentation. Address: http://suo.im/QRRrm
Now follow this tutorial to bring it all together:
- A guide to gradient boosted trees with XGBoost in Python by Jesse Steinweg-Woods. Address: http://suo.im/4FTqD5
You can also reinforce the material with these more concise examples:
- An example of XGBoost on Kaggle (in Python). Address: http://suo.im/4F9A1J
- A simple tutorial on the Iris dataset and XGBoost by Ieva Zarina. Address: http://suo.im/2Lyb1a
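A minimal XGBoost sketch, assuming the library is installed (pip install xgboost) and using an illustrative dataset and hyperparameters rather than those from the tutorials above:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import xgboost as xgb  # pip install xgboost

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Each new tree is fit to the errors of the current ensemble;
# learning_rate shrinks each tree's contribution
model = xgb.XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```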
Step 6: More dimensionality reduction
Dimensionality reduction uses a procedure to obtain a set of principal variables, reducing the variables used for model building from an initial large number to a much smaller one.
There are two main forms of dimensionality reduction:
1. Feature selection: selecting a subset of the relevant features. Address: http://suo.im/4wlkrj
2. Feature extraction: constructing an informative and non-redundant set of derived feature values. Address: http://suo.im/3Gf0Yw
The following is a pair of common feature extraction methods.
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The transformation is defined so that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible).
The above definitions are taken from the PCA Wikipedia entry and can be read further if you are interested. However, the following overview/tutorial is quite thorough:
- Principal Component Analysis in 3 Simple Steps by Sebastian Raschka. Address: http://suo.im/1ahFdW
Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant. It is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can be used as a linear classifier or, more commonly, as a dimensionality reduction step before subsequent classification.
LDA is closely related to analysis of variance (ANOVA) and regression analysis, which likewise attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e., the class label).
The above definition also comes from Wikipedia. Read the full piece here:
- Linear Discriminant Analysis, Bit by Bit by Sebastian Raschka. Address: http://suo.im/gyDOb
Are you confused about the actual difference between PCA and LDA for dimensionality reduction? Sebastian Raschka clarifies as follows:
Both linear discriminant analysis (LDA) and principal component analysis (PCA) are linear transformation techniques commonly used for dimensionality reduction. PCA can be described as an "unsupervised" algorithm, since it "ignores" class labels and its goal is to find the directions (the so-called principal components) that maximize the variance in a dataset. In contrast to PCA, LDA is "supervised" and computes the directions ("linear discriminants") that represent the axes maximizing the separation between multiple classes.
For a brief explanation of this, read the following:
- What is the difference between LDA and PCA for dimensionality reduction? by Sebastian Raschka. Address: http://suo.im/2IPt0U
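The supervised/unsupervised distinction shows up directly in the scikit-learn API: LDA's fit requires the class labels, while PCA's does not. A minimal sketch, with the iris dataset and two components as illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X, y = iris.data, iris.target

# PCA is unsupervised: it never sees the class labels
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it uses y to find class-separating directions
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print("PCA output shape:", X_pca.shape)
print("LDA output shape:", X_lda.shape)
```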
Step 7: More deep learning
The previous chapter provided a gateway to learning neural networks and deep learning. If your learning is going well so far and you want to consolidate your understanding of neural networks and practice implementing a few common neural network models, please read on.
First, take a look at some deep learning basics:
- Deep Learning Key Terms, Explained by Matthew Mayo
- 7 Steps to Understanding Deep Learning by Matthew Mayo. Address: http://suo.im/3QmEfV
Next, try some concise overviews/tutorials for TensorFlow, Google's open source machine intelligence library (an effective deep learning framework, and just about the best neural network tool available today):
- Introduction to TensorFlow that anyone can understand (parts 1 and 2)
- Introduction to TensorFlow (parts 3 and 4)
Finally, try these tutorials straight from the TensorFlow site, which implement some of the most popular and common neural network models:
- Recurrent neural networks, Google TensorFlow tutorial. Address: http://suo.im/2gtkze
- Convolutional neural networks, Google TensorFlow tutorial. Address: http://suo.im/g8Lbg
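For a first taste of TensorFlow's build-then-run style, here is a minimal sketch assuming the TensorFlow 1.x graph API that was current when this tutorial was written; the toy data and learning rate are illustrative.

```python
import tensorflow as tf  # assumes the TensorFlow 1.x graph API

# Build a graph: a single linear neuron y = w*x + b
x = tf.placeholder(tf.float32)
y_true = tf.placeholder(tf.float32)
w = tf.Variable(0.0)
b = tf.Variable(0.0)
y_pred = w * x + b

loss = tf.reduce_mean(tf.square(y_pred - y_true))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Nothing runs until the graph is executed inside a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op, feed_dict={x: [1, 2, 3, 4], y_true: [2, 4, 6, 8]})
    print(sess.run([w, b]))  # w should approach 2, b should approach 0
```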
In addition, an article on 7 steps to mastering deep learning is currently in the works, focusing on using the high-level APIs that sit on top of TensorFlow to make model implementation easier and more flexible. I will add a link here once it is finished.
Related:
- 5 e-books you should read before getting into machine learning. Address: http://suo.im/SlZKt
- 7 steps to understanding deep learning. Address: http://suo.im/3QmEfV
- Machine learning key terms and explanations. Address: http://suo.im/2URQGm