Machine Learning is Fun! Part 1 — The World’s Easiest Introduction to Machine Learning

By Adam Geitgey

Translation: Kaiser


This article is also available in other languages: 日本語, Português, Türkçe, Français, 한국어, العربية, Español (México), Español (España), and Polski.

Have you ever heard people talk about machine learning and only caught a vague sense of what it means? Are you tired of nodding along while your colleagues discuss it? This is your chance to change that.


This tutorial is for anyone who is curious about machine learning but doesn’t know where to start. I suspect a lot of people have tried to read the Wikipedia page, grown more and more lost, and wished someone would just give them a high-level explanation. If that sounds like you, you’ve come to the right place.

The goal is to make these ideas accessible to everyone, which means generalizing in places. But if this article gets even one person interested in machine learning, it will have done its job.


What is machine learning?

The core idea of machine learning is that generic algorithms can find interesting patterns in data without you writing any problem-specific code. Instead of writing the logic yourself, you “feed” data to a generic algorithm, and it builds its own logic from that data.

For example, one kind of algorithm is a classification algorithm, which can sort data into different groups. The same classification algorithm that recognizes handwritten digits can also be used to flag spam email without changing a line of code. The algorithm is identical; fed different training data, it learns different classification logic.
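To make the “same algorithm, different data” point concrete, here is a toy sketch (not from the original article) of a 1-nearest-neighbor classifier. The feature values and labels are invented; note that the only thing that changes between the two uses is the training data.

```python
# A toy 1-nearest-neighbor classifier: the algorithm never changes,
# only the training data does.

def train(examples):
    # "Training" here is just remembering the labeled examples.
    return list(examples)

def classify(model, point):
    # Predict the label of the closest remembered example.
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda ex: distance(ex[0], point))[1]

# Fed handwriting-like features, it sorts digits...
digit_model = train([((0.1, 0.9), "zero"), ((0.9, 0.1), "one")])
print(classify(digit_model, (0.2, 0.8)))   # -> zero

# ...fed email-like features instead, the same code sorts spam.
spam_model = train([((0.9, 0.9), "spam"), ((0.1, 0.2), "ham")])
print(classify(spam_model, (0.8, 0.95)))   # -> spam
```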

“Machine learning” is an umbrella term that covers all of these kinds of generic algorithms.


Two machine learning algorithms

Machine learning falls into two main categories — supervised and unsupervised — and the distinction is simple but crucial.

Supervised learning

Think of yourself as a real estate agent. Business is booming, so you hire a team of trainee agents to help out. Here’s the problem: you can tell what a house is worth at a glance, but your trainees don’t have the experience to do the same.

To help the trainees (and free yourself up for a vacation), you decide to write a small program that estimates the value of a house in your area based on its size, neighborhood, the prices of similar houses, and so on.

So you write down every sale you closed in your city over the last three months. For each house, you record a ton of details: the number of bedrooms, the size, the neighborhood and, most importantly, the final sale price:

Using this training data, we want to create a program that can estimate the price of any other house in the area:

This is supervised learning. You know how much each house actually sold for; in other words, you know the answer to the question, so the logic can be worked out backwards from it.

To build the program, you feed the training data for each house into a machine learning algorithm, and the algorithm tries to figure out what kind of math is needed to make the numbers work out.

It’s a bit like an answer sheet to a math test with all the arithmetic symbols erased:

From the picture above, can you work out what the original problems looked like? You know you’re supposed to “do something” with the numbers on the left to get the answers on the right.

In supervised learning, you let the computer work out that relationship for you instead of doing it in your head. And once you know what math it takes to solve one problem of this kind, you can solve any other problem of the same kind!

Unsupervised learning

Let’s go back to the real estate example. What if you didn’t know the sale price of each house? Even if all you know is each house’s size, location, and so on, you can still do something interesting. This is called unsupervised learning.

It’s as if someone handed you a sheet of paper with a bunch of numbers on it and said, “I don’t know what these mean, but maybe you can figure out if there’s a pattern — good luck!”

So what can you do with this data? For starters, you could build an algorithm that automatically identifies market segments in it. You might discover that buyers near the local university prefer small houses with many bedrooms, while buyers in the suburbs prefer large three-bedroom houses. Knowing that these different kinds of customers exist can guide your marketing.

Another thing you could do is automatically identify outlier houses that don’t resemble the rest. Perhaps those unusual properties are giant mansions, and you could assign your best salespeople to them, since they bring in the biggest deals.
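As an illustration of segment discovery, here is a minimal k-means clustering sketch in plain Python. The house data (square footage, bedrooms) is invented, and in practice you would use a library implementation rather than this toy version.

```python
# A minimal k-means sketch: group houses by (sqft, bedrooms) into market
# segments without using any price labels. All numbers are made up.

def kmeans(points, k, iterations=10):
    # Start with the first k points as the initial cluster centers.
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

houses = [(600, 1), (650, 2), (700, 1),      # small homes near campus
          (2400, 3), (2600, 4), (2500, 3)]   # large suburban homes
centers, clusters = kmeans(houses, k=2)
print(sorted(len(c) for c in clusters))  # -> [3, 3]
```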

The rest of this article focuses on supervised learning, but not because unsupervised learning is less useful or interesting. In fact, unsupervised learning is becoming increasingly important as algorithms improve, because it doesn’t require the data to be labeled with the correct answers beforehand.

Note: there are many other kinds of machine learning algorithms, but supervised and unsupervised learning are the place to start.


Good for you, but is it really possible to “learn” what houses are worth?

As a human, your brain can take in almost any situation and learn how to deal with it without explicit instructions. If you sell houses for a long time, you gradually develop a “feel” for the right price, the best way to market a house, how to read a customer, and so on. The goal of Strong AI research is to give computers this same ability.

But today’s machine learning algorithms aren’t that good yet; they only work when focused on a very specific, limited problem. Perhaps a better definition of “learning” here is “working out an equation to solve a specific problem based on example data.”

Unfortunately, “letting machines work out an equation to solve a specific problem based on example data” isn’t a great name, so we ended up with “machine learning.”

Of course, if you’re reading this article 50 years from now, when Strong AI is everywhere, this will all seem rather quaint. Stop reading, future human, and have your robot fetch you a sandwich.


Let’s write that program!

So, how would you write the house-price-estimation program from the example above? Think about it for a second before reading on.

If you didn’t know anything about machine learning, you might try to write out the basic rules for estimating house prices yourself, like this:

[amalthea_exercise lang="python" executable="false" writable="false"]
[amalthea_sample_code]
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # The average price in this area is $200 per square foot
    price_per_sqft = 200

    if neighborhood == "hipsterton":
        # Some neighborhoods are more expensive
        price_per_sqft = 400

    elif neighborhood == "skid row":
        # Some neighborhoods are cheaper
        price_per_sqft = 100

    # Start with a base estimate based on the size of the house
    price = price_per_sqft * sqft

    # Adjust the estimate based on the number of bedrooms
    if num_of_bedrooms == 0:
        # Studio apartments are slightly cheaper
        price = price - 20000
    else:
        # Houses with more bedrooms cost more
        price = price + (num_of_bedrooms * 1000)

    return price
[/amalthea_sample_code]
[/amalthea_exercise]

If you tinker with this for hours, you might end up with a program that sort of works. But it will have pitfalls, and it won’t be able to cope as prices change.

Wouldn’t it be much better if the computer could figure out how to implement this function for you? Who cares exactly what the function does internally, as long as it returns the correct number?

[amalthea_exercise lang="python" executable="false" writable="false"]
[amalthea_sample_code]
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = <computer, do some math for me>
    return price
[/amalthea_sample_code]
[/amalthea_exercise]

One way to think about this problem: the price is a delicious stew, and the ingredients are the number of bedrooms, the square footage, and the neighborhood. If you could just figure out how much each ingredient contributes to the final price, there might be an exact ratio in which the ingredients get “mixed” to produce the final price.

That would reduce your original program (with all its crazy if/else statements) to something as simple as this:

[amalthea_exercise lang="python" executable="false" writable="false"]
[amalthea_sample_code]
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # a little pinch of this
    price += num_of_bedrooms * .841231951398213

    # and a big pinch of that
    price += sqft * 1231.1231231

    # maybe a handful of this
    price += neighborhood * 2.3242341421

    # and finally, just a little extra salt
    price += 201.23432095

    return price
[/amalthea_sample_code]
[/amalthea_exercise]

Notice the magic numbers: .841231951398213, 1231.1231231, 2.3242341421 and 201.23432095. These are our weights. If we could just find the right weights, our function could predict house prices!

A dumb, brute-force way to figure out the weights would be something like this:

Step 1:

Start with every weight set to 1.0:

[amalthea_exercise lang="python" executable="false" writable="false"]
[amalthea_sample_code]
def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):
    price = 0

    # a little pinch of this
    price += num_of_bedrooms * 1.0

    # and a big pinch of that
    price += sqft * 1.0

    # maybe a handful of this
    price += neighborhood * 1.0

    # and finally, just a little extra salt
    price += 1.0

    return price
[/amalthea_sample_code]
[/amalthea_exercise]

Step 2:

Run every house you know about through your function and see how far off its estimate is from each house’s actual sale price:

For example, if the first house actually sold for $250,000, but your function guessed it would sell for $178,000, you’re off by $72,000 for that one house.

Now add up the squared error for every house in your data set. Let’s say you had 500 home sales, and the sum of the squared errors came to $86,123,373. Divide that by 500 to get the average error per house. Call this average error the cost of your function. (Squaring the errors before adding them keeps positive and negative errors from canceling each other out.)
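The cost calculation in Step 2 can be sketched in a few lines. The sales data and weights below are invented for illustration; each sale is a (features, actual_price) pair.

```python
def predict(weights, bias, features):
    # Weighted sum of the features plus a constant "salt" term.
    return sum(w * f for w, f in zip(weights, features)) + bias

def cost(weights, bias, sales):
    # Mean of the squared errors over every known sale.
    squared_errors = [(predict(weights, bias, features) - actual) ** 2
                      for features, actual in sales]
    return sum(squared_errors) / len(sales)

# (bedrooms, sqft) -> actual sale price; numbers are made up.
sales = [((3, 2000), 250000), ((2, 800), 120000), ((4, 2500), 320000)]

naive = cost([1.0, 1.0], 1.0, sales)          # Step 1 weights: huge cost
better = cost([10000.0, 100.0], 0.0, sales)   # closer weights: lower cost
print(better < naive)  # -> True
```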

If you could get the cost down to zero by tweaking the weights, your function would be perfect: for every house, it would guess the sale price exactly from the input data. That’s our goal: try different weights and drive the cost as low as possible.

Step 3:

Repeat Step 2 over and over, trying every possible combination of weight values. Whichever combination brings the cost closest to zero is the one to use. When you find that set of weights, you’re done!
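Taken literally, Step 3 is a brute-force search. Here is a sketch with a single weight (price per square foot) and invented sales data, a toy case where trying whole-number weights is actually feasible:

```python
def cost(w, sales):
    # Average squared error of the guess "price = w * sqft".
    return sum((w * sqft - price) ** 2 for sqft, price in sales) / len(sales)

# Invented data where the true price is exactly $200 per square foot.
sales = [(1000, 200000), (1500, 300000), (800, 160000)]

# "Try every number": test every whole-number weight from 0 to 499.
best_w = min(range(500), key=lambda w: cost(w, sales))
print(best_w)  # -> 200
```

With several weights and non-integer values, the combinations explode, which is exactly the problem a later section addresses.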


Imagination time

Pretty simple, right? Think about what you just did: you took some data, fed it through three generic, very simple steps, and got a function that can guess house prices. Watch out, Zillow!

But here are a few exciting facts:

  1. Research over the past 40 years in many fields (like linguistics and translation) has shown that generic learning algorithms that “stir the number stew” (a phrase I just made up) outperform approaches in which real people try to write explicit rules themselves. The “dumb” brute-force approach of machine learning eventually beats the human experts.

  2. The function you ended up with is totally dumb. It doesn’t even know what “square feet” or “bedrooms” are. All it knows is how much of each number to stir in to get the right answer.

  3. You very likely have no idea why one particular set of weights works. You’ve written a function you don’t really understand, but that you can prove works.

  4. Imagine that instead of “square feet” and “number of bedrooms,” your function took in a sequence of numbers, each representing the brightness of one pixel in an image from a camera on the roof of a car. And instead of predicting “house price,” it predicted “the angle to turn the steering wheel.” You’d have a function that could steer a car by itself!

Crazy, right?


What about that “try every number” bit in Step 3?

Of course, you can’t actually try every possible combination of weights to find the optimal one. You’d literally never finish.

To avoid that, mathematicians have figured out lots of clever ways to find good weights quickly. Here’s one of them:

First, write an equation that represents step 2 above:

Then rewrite the same equation in machine learning math notation (which you can ignore for now):

This equation represents how wrong our price-estimating function is for the weights we currently have set.

If you graph this cost equation for all possible values of the weights for number of bedrooms and square feet, you get something like the following:

The lowest point in blue is where the cost is lowest, where the function is least wrong; the highest points are where it is most wrong. So if we can find the set of weights corresponding to the lowest point, we have our answer!

So we just need to adjust the weights so that we “walk downhill” on this graph toward the lowest point. If every small adjustment moves us toward the bottom, we’ll get there sooner or later.

If you remember any calculus: the derivative of a function is the slope of its tangent line. In other words, it tells us which way is downhill from any point on the graph.

So if we compute the partial derivative of the cost function with respect to each weight and then subtract that value from the corresponding weight, we take one step closer to the bottom. Repeat, and eventually we reach the bottom of the graph and have the best possible values for the weights. (Don’t worry if that didn’t quite make sense; keep reading.)
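The update loop just described (subtract the partial derivative of the cost from the weight, repeat) can be sketched for the single-weight version of the problem. The learning rate, step count, and data below are invented for illustration:

```python
def gradient_descent(sales, steps=200, learning_rate=1e-7):
    w = 1.0  # Step 1: start from a deliberately bad weight
    n = len(sales)
    for _ in range(steps):
        # Partial derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * sqft - price) * sqft for sqft, price in sales) / n
        # Subtract a small multiple of the slope: one step downhill.
        w -= learning_rate * grad
    return w

# Invented data where the true price is exactly $200 per square foot.
sales = [(1000, 200000), (1500, 300000), (800, 160000)]
print(round(gradient_descent(sales)))  # -> 200
```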

This high-level technique for finding the best weights for a function is called batch gradient descent. If you’re interested, don’t be afraid to dig into the details.

When you use a machine learning library to solve a real problem, all of this is done for you automatically. But it’s still useful to have a good idea of what’s going on under the hood.


What else did I skip?

The three-step algorithm described above is called multivariate linear regression. You are estimating the equation of a line that passes through your housing data, then using that equation to guess the sale price of houses you’ve never seen. It’s a powerful idea, and it’s very effective on real problems.
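For the very simplest case, one feature and no constant term, the best least-squares weight even has a closed form: w = Σ(x·y) / Σ(x²). A sketch with invented sales data:

```python
def best_weight(sales):
    # Closed-form least-squares solution for "price ≈ w * sqft".
    return (sum(sqft * price for sqft, price in sales)
            / sum(sqft * sqft for sqft, _ in sales))

# Invented sales hovering around $200 per square foot.
sales = [(1000, 205000), (1500, 295000), (800, 162000)]
print(round(best_weight(sales), 1))  # -> 199.8
```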

But the approach I showed you may only work in simple cases; it isn’t a cure-all. One reason is that house prices aren’t always simple enough to lie on a straight line.

Fortunately, there are many other approaches. Lots of machine learning algorithms can handle non-linear data (such as neural networks, or support vector machines with kernels). There are also ways to use linear regression more cleverly to fit more complicated lines. In every case, though, the fundamental idea of finding the best weights still applies.

I also ignored the problem of overfitting. It’s not hard to find a set of weights that works great on your existing data but fails on new houses outside the training set. There are many ways to deal with this (such as regularization and using a cross-validation data set), and handling it well is key to applying machine learning successfully.
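The standard guard against overfitting can be sketched with a holdout split: fit the weight on some sales, then measure the error only on sales the model never saw. The data and the error threshold below are invented:

```python
def best_weight(sales):
    # Closed-form least-squares weight for "price ≈ w * sqft".
    return (sum(sqft * price for sqft, price in sales)
            / sum(sqft * sqft for sqft, _ in sales))

def mean_abs_error(w, sales):
    # Average dollar amount the guesses are off by.
    return sum(abs(w * sqft - price) for sqft, price in sales) / len(sales)

sales = [(1000, 201000), (1500, 298000), (800, 161000), (1200, 239000)]
train, holdout = sales[:3], sales[3:]  # fit on 3 sales, validate on 1

w = best_weight(train)
print(mean_abs_error(w, holdout) < 10000)  # -> True on this toy data
```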

In other words, while the basic concept is pretty simple, it takes skill and experience to get useful results out of machine learning. But it’s a skill any developer can learn!


Is machine learning magic?

Once you see how easily machine learning solves problems that seem really hard (like handwriting recognition), you might start to feel that, given enough data, no problem is really a problem. Just feed in the data and wait for the computer to magically produce an equation that fits it!

But keep in mind that machine learning only works if the problem is actually solvable with the data you have.

For example, a model that predicts house prices based on what kind of plants are growing in each house will never work. There is simply no relationship there, and no matter how hard it tries, the computer can’t find one.

So if a human expert couldn’t solve the problem with the same data, a computer probably can’t either. The computer’s advantage is solving problems that humans can solve, but much faster.


How to learn more machine learning

In my opinion, the biggest problem with machine learning today is that it still lives mostly in academia, and there isn’t enough accessible material for people who want to understand a bit more without becoming experts. But that’s getting better every day.

Andrew Ng’s Machine Learning class on Coursera is amazing, and I highly recommend it as a place to start. Anyone with a CS background who remembers a little bit of math should be able to follow it.

You can also try out a host of machine learning algorithms for yourself by downloading and installing scikit-learn, a Python framework that provides “black box” versions of all the standard algorithms.



Recommended reading

PaddlePaddle mail swindler (Final)

This comment is toxic! — The general routine of text classification

Build your own AlphaZero with Python and Keras