This article was originally published by AI Frontier.

Written by James Le, translated by Shang Jian, edited by Emily

“There is no doubt that machine learning and AI have grown enormously in popularity over the past few years. Big data is hot in the tech industry, and machine learning, which uses large amounts of data to make predictions or recommendations, is undoubtedly very powerful. Two of the most common examples are Netflix’s algorithms, which recommend movies based on movies you’ve watched before, and Amazon’s algorithms, which recommend books based on books you’ve bought before.”


So if you want to learn more about machine learning, how do you get started? For me, the introduction was the artificial intelligence course I took while studying abroad in Copenhagen. My lecturer was a full professor of applied mathematics and computer science at the Technical University of Denmark, whose research focused on logic and artificial intelligence, with an emphasis on using logic to model human planning, reasoning, and problem solving. The course combined discussion of theory and core concepts with hands-on problem solving. The textbook we used is one of the AI classics, Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, which covers agents, problem solving by search, adversarial search, probability theory, multi-agent systems, social AI, and the philosophy/ethics/future of AI, among other topics. At the end of the course, a team of three of us implemented a simple programming project: a search-based agent that solves transportation tasks in a virtual environment.

I had already learned a lot in that course and decided to continue studying related topics. Over the past few weeks, I’ve been in San Francisco to attend several technical talks on deep learning, neural networks, and data structures, as well as a machine learning conference attended by some of the most prominent academics in the field. On top of that, I took an Intro to Machine Learning online course on Udacity in early June and finished it just the other day. In this article, I want to share some of the most commonly used machine learning algorithms I learned from the course.

Machine learning algorithms can be divided into three broad categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is useful when a particular attribute (label) is available for one data set (the training set) but is missing, and must be predicted, for other instances. Unsupervised learning is useful when the goal is to discover implicit relationships in a given unlabeled data set (items are not pre-assigned labels). Reinforcement learning falls somewhere in between: there is some form of feedback for each prediction or action, but no precise label or error signal. Since this was an introductory course, I didn’t study reinforcement learning, but I hope the following 10 supervised and unsupervised learning algorithms are interesting enough.

1. Decision Trees

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility.

From a business decision point of view, a decision tree is the minimum number of yes/no questions one has to ask to assess the probability of making a correct decision most of the time. As a method, it allows you to approach a problem in a structured and systematic way and arrive at a logical conclusion.
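To make this concrete, here is a minimal sketch of a decision tree classifier using scikit-learn; the library, the Iris dataset, and the depth limit are my own illustrative assumptions, not something from the original article:

```python
# Minimal decision tree sketch with scikit-learn (illustrative assumptions only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# max_depth caps how many yes/no questions the tree may ask before deciding.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```

Each internal node of the fitted tree is exactly one of those yes/no questions (a threshold on a single feature), and each leaf is a decision.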

2. Naive Bayesian Classification

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features. Bayes’ theorem states that P(A|B) = P(B|A) P(A) / P(B), where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.

Some real-world application examples (a short code sketch follows the list):

  • Detecting spam
  • Classifying news articles by topic, such as technology, politics, or sports
  • Determining whether a piece of text expresses positive or negative sentiment
  • Face recognition
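As a concrete illustration of the spam-filtering example, here is a minimal naive Bayes sketch with scikit-learn; the tiny message corpus and its labels are invented purely for demonstration:

```python
# Minimal naive Bayes text classifier sketch (toy data invented for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",              # spam
    "limited offer claim your reward",   # spam
    "meeting rescheduled to monday",     # not spam
    "lunch tomorrow with the team",      # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # word-count features

# MultinomialNB combines per-word likelihoods with class priors via Bayes' theorem,
# under the naive assumption that words are independent given the class.
model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["claim your free prize"])
print(model.predict(test), model.predict_proba(test))
```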

3. Ordinary Least Squares Regression

If you know any statistics, you’ve probably heard of linear regression before. Ordinary least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are several possible ways to do this, and the “least squares” strategy goes like this: draw a line, then for each data point measure the vertical distance from the point to the line, square it, and add these squared distances up; the fitted line is the one for which this sum is as small as possible.

Linear refers to the kind of model you use to fit the data, while least squares refers to the kind of error metric you minimize.
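Here is a minimal sketch of that procedure with NumPy; the noisy line y = 2x + 1 is synthetic data made up for illustration:

```python
# Minimal ordinary least squares sketch with NumPy (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)  # noisy points around a line

# Design matrix [x, 1]; lstsq finds the slope and intercept that minimize
# the sum of squared vertical distances between the points and the line.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted line: y = {slope:.2f} x + {intercept:.2f}")
```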

4. Logistic Regression

Logistic regression is a powerful statistical method for modeling a binomial outcome with one or more explanatory variables. It measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities with a logistic function, which is the cumulative logistic distribution.

In general, logistic regression is used in real-world applications such as the following (a short code sketch follows the list):

  • Credit scoring
  • Measuring the success rate of marketing campaigns
  • Predicting the revenue of a product
  • Predicting whether there will be an earthquake on a particular day
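For instance, here is a minimal logistic regression sketch with scikit-learn; the breast-cancer dataset is only a stand-in for any binary (yes/no) outcome:

```python
# Minimal logistic regression sketch with scikit-learn (example dataset choice).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model estimates P(y = 1 | x) by passing a linear combination of the
# features through the logistic (sigmoid) function.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("probabilities for three cases:", clf.predict_proba(X_test[:3]))
```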

5. Support Vector Machine (SVM)

SVM is a binary classification algorithm. Given points of two types in an N-dimensional space, an SVM generates an (N-1)-dimensional hyperplane that separates the points into two groups. Say you have points of two types on a plane that are linearly separable: the SVM will find a straight line that divides the points into the two types while staying as far away from all of them as possible.

In terms of scale, some of the biggest problems solved using SVMs (with suitably modified implementations) include display advertising, human splice site recognition, image-based gender detection, large-scale image classification, and more.
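Here is a minimal linear SVM sketch with scikit-learn; the two synthetic blobs are invented to stand in for two linearly separable classes:

```python
# Minimal linear SVM sketch with scikit-learn (synthetic, linearly separable data).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A linear kernel searches for the separating line with the largest margin,
# i.e. the line that stays as far as possible from both groups of points.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)
print("support vectors per class:", svm.n_support_)
print("training accuracy:", svm.score(X, y))
```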

6. Ensemble Methods

Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a weighted vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, bagging, and boosting.

So how do ensemble methods work, and why are they better than individual models? (A short sketch of two of them follows the list below.)

  • They average out the biases of individual models: if you average Democratic-leaning and Republican-leaning polls together, you get a result that doesn’t lean either way.
  • They reduce variance: the aggregate opinion of a set of models is less noisy than the opinion of any single model. In finance, this is called diversification; a portfolio of many stocks is less variable than a single stock, which is also why your models perform better with more data.
  • They are unlikely to overfit: if the individual models do not overfit, then simply combining their predictions (by averaging, weighted averaging, or logistic regression) leaves no room for the combined model to overfit.
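Here is a minimal sketch of two of these ideas, bagging and boosting, using scikit-learn; the dataset and hyperparameters are arbitrary illustrative choices:

```python
# Minimal bagging and boosting sketch with scikit-learn (illustrative choices).
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: train many trees on bootstrap samples and average them to cut variance.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: fit trees sequentially, each one focusing on the previous ones' errors.
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, "mean CV accuracy:", scores.mean().round(3))
```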

7. Clustering Algorithms

Clustering is the task of grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups.

Every clustering algorithm is different. Here are some examples (a sketch of one of them follows the list):

  • Centroid-based algorithms
  • Connectivity-based algorithms
  • Density-based algorithms
  • Probabilistic algorithms
  • Dimensionality reduction
  • Neural networks / deep learning
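As one concrete example from the list above, here is a minimal sketch of centroid-based clustering (k-means) with scikit-learn; the three synthetic blobs stand in for data whose group structure is unknown:

```python
# Minimal k-means (centroid-based clustering) sketch with scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # synthetic data

# k-means assigns each point to the nearest of k centroids, moves each centroid
# to the mean of its assigned points, and repeats until the assignments settle.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("first ten labels:", labels[:10])
```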

8. Principal Component Analysis (PCA)

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Some applications of PCA include compression, simplifying data to make learning easier, and visualization. Note that domain knowledge is very important when deciding whether to use PCA: it is not suitable for noisy data, where all the components have high variance.
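Here is a minimal PCA sketch with scikit-learn, projecting the 64-dimensional digits data onto its top two principal components; the dataset and the number of components are just illustrative choices:

```python
# Minimal PCA sketch with scikit-learn (example dataset, 2 components for visualization).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per sample

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # orthogonal projection onto the top 2 components

print("original shape:", X.shape, "reduced shape:", X_2d.shape)
print("variance explained:", pca.explained_variance_ratio_)
```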

9. Singular Value Decomposition (SVD)

In linear algebra, SVD is a factorization of a real or complex matrix. For a given m × n matrix M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, Σ is a diagonal matrix of singular values, and V* is the conjugate transpose of V (simply the transpose Vᵀ for real matrices).

In fact, PCA is a simple application of SVD. In computer vision, the first face recognition algorithms used PCA and SVD to represent faces as a linear combination of “eigenfaces”, perform dimensionality reduction, and then match faces to identities with simple methods. Although modern methods are more sophisticated, many still rely on similar techniques.
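Here is a minimal sketch of the decomposition itself with NumPy, using a small random matrix purely for illustration:

```python
# Minimal singular value decomposition sketch with NumPy (random example matrix).
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))              # an m x n matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)                       # diagonal matrix of singular values

# Verify the factorization M = U @ Sigma @ V^T (up to floating-point error).
print("reconstruction error:", np.max(np.abs(M - U @ Sigma @ Vt)))
print("singular values:", s)
```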

10. Independent Component Analysis (ICA)

ICA is a statistical technique for revealing the hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are called the independent components of the observed data.

ICA is related to PCA, but it is a more powerful technique that can find the underlying factors of the sources when these classical methods fail completely. Its applications include digital images, document databases, economic indicators, and psychometric measurements.
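Here is a minimal sketch of ICA using scikit-learn’s FastICA: two synthetic source signals are mixed by a made-up matrix and then recovered from the mixtures (all data here is invented for illustration):

```python
# Minimal ICA sketch with scikit-learn's FastICA (synthetic source signals).
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # first independent source
s2 = np.sign(np.sin(3 * t))               # second independent source
S = np.column_stack([s1, s2])

A = np.array([[1.0, 0.5], [0.5, 2.0]])    # unknown mixing matrix (made up)
X = S @ A.T                               # observed mixed signals

# FastICA assumes non-Gaussian, mutually independent sources and estimates
# both the unmixing transform and the independent components.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)              # estimated independent components
print("estimated mixing matrix:\n", ica.mixing_)
```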

Now use your understanding of these algorithms to create machine learning applications that make the experience better for people around the world.

Original English article:

www.kdnuggets.com/2016/08/10-…
