A digression

I became interested in artificial intelligence very early on. My undergraduate thesis solved a classic pathfinding problem with a genetic algorithm. I have always held classic human ideas in awe, for example traditional data structures and algorithms such as the classic sorting algorithms and dynamic programming. A seemingly complex problem can sometimes be solved by a for loop of barely ten lines. I find a kind of beauty in that, and it makes me admire the greatness of human thought.

But traditional algorithms are still written by people, who solve each problem with a complete set of problem-solving ideas of their own. Wouldn’t it be cool if a machine could have a mind of its own and “learn” how to solve a problem by itself?

As I currently understand it, however, artificial intelligence is more like a tool, a “mathematical tool” or a “statistical tool”, that extracts “rules” from a large amount of data in order to solve practical problems. That is a long way from a computer actually thinking, and at the moment the two may not even be the same thing. Making machines think may require breakthroughs in other disciplines, such as the study of human cognition and brain science. Haha, I digress.

Let’s start with some background.

Linearity

  1. What does linear mean?

There is a class of geometric objects, such as lines, planes and cubes, that all look angular and “straight”. In mathematics they are called linear.

Problems involving them are very simple to handle. For example, you learned in high school that two lines can be described by two linear equations. Suppose you want to find where they intersect:

$$a_1 x + b_1 y = c_1, \qquad a_2 x + b_2 y = c_2$$

The intersection point is obtained by solving the two equations simultaneously.
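As a small illustration of solving two linear equations simultaneously, here is a sketch using NumPy. The helper name `intersect` and the two example lines (y = x and x + y = 2) are made up for this illustration; each line is assumed to be written as a*x + b*y = c.

```python
import numpy as np

def intersect(a1, b1, c1, a2, b2, c2):
    """Intersection of the lines a1*x + b1*y = c1 and a2*x + b2*y = c2.

    Hypothetical helper for illustration only.
    """
    A = np.array([[a1, b1], [a2, b2]], dtype=float)
    c = np.array([c1, c2], dtype=float)
    # np.linalg.solve raises LinAlgError when the lines are parallel,
    # because the coefficient matrix is then singular.
    return np.linalg.solve(A, c)

# Example lines: x - y = 0 (that is, y = x) and x + y = 2.
point = intersect(1, -1, 0, 1, 1, 2)
print(point)
```

Because both equations are linear, the whole problem reduces to one call to a linear solver, which is the sense in which linear problems are simple.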

  2. Why do we study linearity?

(1) Our world and universe are so complex that many phenomena cannot be understood, let alone described mathematically.

(2) Some complex problems that satisfy specific conditions can be transformed into simple linear problems, and linear problems can be fully understood and fully described mathematically.

Back to the topic

As far as I currently know, the main tasks of machine learning fall into two categories. The first is classification, for example:

  • Determine whether a picture shows a cat or a dog (binary classification, because I define the target as either a cat or a dog)
  • Determine whether a stock will go up or down tomorrow
  • Determine which digit a picture shows (ten classes, because I define the target as one of the digits 0 to 9)

In other words, the result of classification is one of a set of categories that people have defined in advance.

The second type of task is regression, which yields a continuous numeric value rather than a category. For example:

  • Predict a house price
  • Predict a stock price

What is machine learning

This is only my superficial understanding at present: I think of machine learning as a mathematical tool. We feed the machine a large amount of learning material (training data) and run a machine learning algorithm, which trains a model. We then hand a new problem to the machine, and it uses the model to compute the result.

An intuitive first look at linear regression

For example, suppose I have collected paired data for two variables x and y (such as age and height) and want to know whether they are linearly related. I draw a scatter plot with one variable on the x-axis and the other on the y-axis.

Then I can look for a straight line through the points. The defining feature of this line is that it is as close as possible to all the scattered points; in other words, the total deviation between the points and the line is as small as possible. With this line, I can predict an unknown y value from a known x value fairly well. If x and y really are linearly related, the prediction is very good. So the main task of linear regression is to find this line.

Univariate linear regression

Let’s start with univariate linear regression, where x has only one feature (such as nitric oxide concentration) and y is the house price. Following the intuition above, our goal is to find the best straight-line equation:

$$y = ax + b$$

That is just the process of finding the parameters a and b. Writing the m samples as (x_i, y_i), what we actually want is to make the sum of squared differences between each y_i and the value predicted by the line,

$$J(a, b) = \sum_{i=1}^{m} \left( y_i - a x_i - b \right)^2$$

as small as possible. This expression is called the loss function. You might ask why we minimize the sum of squared differences rather than the sum of absolute differences, or the sum of their third or fourth powers. Minimizing the sum of squared differences is known in mathematics as the least squares method. Here is a link: www.zhihu.com/question/24… I will not go into detail here.
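To make the loss function concrete, here is a minimal sketch that evaluates the sum of squared differences for a candidate line y = a*x + b. The function name `squared_error_loss` and the four sample points are invented for illustration; the points lie exactly on y = 2x + 1.

```python
import numpy as np

def squared_error_loss(a, b, x, y):
    """Sum of squared differences between y and the line a*x + b."""
    residuals = y - (a * x + b)
    return float(np.sum(residuals ** 2))

# Made-up sample points that lie exactly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

loss_exact = squared_error_loss(2.0, 1.0, x, y)  # the true line
loss_off = squared_error_loss(1.5, 1.0, x, y)    # a slope that is too small
print(loss_exact, loss_off)
```

The line that passes through every point has zero loss, and any other choice of a and b has a larger one. Finding the a and b with the smallest loss is exactly the fitting problem.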

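So the basic idea behind one class of machine learning algorithms is to define a loss function for the problem and then obtain the model by optimizing that loss function. How do we minimize the loss function here, that is, J(a, b), the sum over all m samples of the squared differences (y_i - a x_i - b)^2, and so find a and b? We take the partial derivatives with respect to a and b; the point where both are 0 is the extreme point. Let’s first take the derivative with respect to b (using the chain rule for composite functions):

$$\frac{\partial J}{\partial b} = \sum_{i=1}^{m} 2\left( y_i - a x_i - b \right)(-1) = 0$$

Simplifying this, with \bar{x} and \bar{y} denoting the means of the x_i and the y_i, gives:

$$b = \bar{y} - a\bar{x}$$

The same process applied to a gives the following (the simplification is omitted):

$$a = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m} (x_i - \bar{x})^2}$$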

Next we implement this in Python. In short, I need to define two methods:

  • fit: the fitting method, also called the training method. Passing the training data to this method yields the parameters of the model.
  • predict: the prediction method. You pass in an x value and get back the predicted value.
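The two methods can be sketched as a small class. This is an illustrative version, not necessarily the author's original code: the class name `SimpleLinearRegression` and the attribute names `a_` and `b_` are assumptions. It uses the standard least-squares solution, where the slope a is the sum of (x_i - x_mean)(y_i - y_mean) divided by the sum of (x_i - x_mean)^2, and the intercept b is y_mean - a * x_mean.

```python
import numpy as np

class SimpleLinearRegression:
    """Univariate least-squares regression, y = a * x + b (illustrative sketch)."""

    def __init__(self):
        self.a_ = None  # slope, set by fit
        self.b_ = None  # intercept, set by fit

    def fit(self, x_train, y_train):
        x_train = np.asarray(x_train, dtype=float)
        y_train = np.asarray(y_train, dtype=float)
        if x_train.ndim != 1 or x_train.shape != y_train.shape:
            raise ValueError("x_train and y_train must be 1-D arrays of equal length")
        x_mean, y_mean = x_train.mean(), y_train.mean()
        dx = x_train - x_mean
        # Numerator and denominator of the slope, each written as a dot product.
        self.a_ = dx.dot(y_train - y_mean) / dx.dot(dx)
        self.b_ = y_mean - self.a_ * x_mean
        return self

    def predict(self, x):
        if self.a_ is None:
            raise RuntimeError("call fit before predict")
        return self.a_ * np.asarray(x, dtype=float) + self.b_

# Made-up training data lying exactly on y = 2x + 1.
model = SimpleLinearRegression().fit([1, 2, 3, 4], [3, 5, 7, 9])
pred = model.predict([5.0])
print(model.a_, model.b_, pred)
```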

One thing to note here is that we use vectorization rather than a loop to compute a. The numerator and the denominator of a could each be computed with a loop, but each can also be viewed as a dot product of two vectors: the components of one vector are multiplied by the corresponding components of the other and the products are summed. This has two advantages:

  • Clearer code
  • Vectorized operations run in optimized routines that process many elements at once (and, with suitable libraries, can run in parallel on a GPU), which is much faster than a Python loop on the CPU

Once we have the parameters a and b, we have a model (in this case y = ax + b). We can then make a prediction: plugging x into this equation gives the predicted y value.
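The difference between the two styles can be seen side by side in the sketch below, which computes the slope a once with an explicit loop and once with dot products. The function names and the random test data are made up for illustration, and the example runs NumPy on the CPU only.

```python
import numpy as np

def slope_loop(x, y):
    """Slope a computed with an explicit Python loop."""
    x_mean, y_mean = x.mean(), y.mean()
    num = 0.0
    den = 0.0
    for xi, yi in zip(x, y):
        num += (xi - x_mean) * (yi - y_mean)
        den += (xi - x_mean) ** 2
    return num / den

def slope_vectorized(x, y):
    """The same slope, with numerator and denominator as dot products."""
    dx = x - x.mean()
    return dx.dot(y - y.mean()) / dx.dot(dx)

# Synthetic data: roughly y = 3x + 4 plus noise.
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = 3.0 * x + 4.0 + rng.normal(size=x.size)

print(slope_loop(x, y), slope_vectorized(x, y))
```

Both functions return the same value up to floating-point rounding. Timing them, for instance with the standard `timeit` module, is a simple way to see the speed difference on your own machine.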

Multiple linear regression

Now that we understand univariate linear regression, the next question is how to make predictions when there are multiple features. That is multiple linear regression. It looks for an equation of the following form, where n is the number of features:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

That is, each feature is multiplied by a constant coefficient, and a constant (the intercept) is added. We arrange these coefficients into a vector:

$$\theta = (\theta_0, \theta_1, \theta_2, \dots, \theta_n)^T$$

For convenience, we introduce x_0 and set x_0 = 1, so the prediction becomes the dot product of two vectors:

$$\hat{y} = (x_0, x_1, x_2, \dots, x_n) \cdot \theta$$

Then we stack all of the x vectors (samples) as the rows of a matrix X_b and keep theta as a column vector, so that \hat{y} holds the predicted values for all samples:

$$\hat{y} = X_b \theta$$

This uses matrix-vector multiplication (so review linear algebra if you have forgotten it).
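The matrix form can be checked with a tiny example. In this sketch the sample matrix `X`, the parameter vector `theta`, and the name `X_b` for the matrix with the extra column of ones are all made up for illustration.

```python
import numpy as np

# Three made-up samples with two features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Made-up parameters: intercept first, then one coefficient per feature.
theta = np.array([0.5, 2.0, -1.0])

# Prepend a column of ones so that x0 = 1 multiplies the intercept.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# One matrix-vector product yields the predictions for all samples.
y_hat = X_b @ theta
print(y_hat)
```

Each entry of `y_hat` equals the intercept plus the weighted sum of that sample's features, so the column of ones is only a bookkeeping device that folds the intercept into the same product.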

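According to the least squares method, our goal is to make

$$\sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 = (y - X_b\theta)^T (y - X_b\theta)$$

as small as possible, where X_b is the sample matrix with a leading column of ones and theta is the parameter vector. The matrix derivation is omitted here. The final solution for theta is:

$$\theta = (X_b^T X_b)^{-1} X_b^T y$$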

In other words, we can obtain a closed-form solution for the parameters directly through mathematical derivation. Generally speaking, however, few machine learning methods have a closed-form solution, and other methods such as gradient descent may be needed to find the parameters.

Implementing multiple linear regression

Then we implement it based on this closed-form solution.
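One possible implementation is sketched below. It is not necessarily the author's original code: the class name `NormalEquationRegression` and the attribute names `intercept_` and `coef_` are assumptions, and the training data is synthetic and noiseless. Instead of forming the matrix inverse explicitly, the sketch solves the equivalent linear system (X_b^T X_b) theta = X_b^T y with `np.linalg.solve`, which gives the same theta and is usually better behaved numerically.

```python
import numpy as np

class NormalEquationRegression:
    """Multiple linear regression via the closed-form solution (illustrative sketch)."""

    def __init__(self):
        self.intercept_ = None
        self.coef_ = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones for the intercept term (x0 = 1).
        X_b = np.hstack([np.ones((X.shape[0], 1)), X])
        # Solve (X_b^T X_b) theta = X_b^T y; this fails if X_b^T X_b is singular.
        theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
        self.intercept_ = theta[0]
        self.coef_ = theta[1:]
        return self

    def predict(self, X):
        if self.coef_ is None:
            raise RuntimeError("call fit before predict")
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

# Synthetic, noiseless data generated from known parameters.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5])

model = NormalEquationRegression().fit(X, y)
print(model.intercept_, model.coef_)
```

Because the data was generated without noise, the fitted intercept and coefficients should match the generating values up to rounding error.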

Simple linear regression (Boston housing price forecast)

The Boston housing price dataset shipped as a built-in dataset in scikit-learn (sklearn), a machine learning framework, until it was removed in version 1.2.

In fact, I was confused when I first saw this dataset. Is this example supposed to teach us to predict housing prices? To forecast tomorrow’s housing prices in Shenzhen? I think it can be understood this way: given some features (the learning material) like those shown below and the average house price in certain areas of Boston (the target), you can estimate what price you or the realtor should ask when selling a house. We can also use this dataset to understand which factors have a greater impact on housing prices.

Introduction to the data

The dataset contains housing information for the suburbs of Boston, Massachusetts. It comes from the UCI Machine Learning Repository (where the dataset has since been taken offline), dates from 1978, and includes 506 samples, each containing 12 feature variables and the average house price for the area.

Field meaning

As you can see, the researchers wanted to identify the most important factors affecting housing prices, such as environmental factors (nitric oxide concentration), location factors (weighted distance to 5 central areas of Boston) and so on (though I believe the factors affecting housing prices in China are much more complex).

After solving for (or learning) the value of each parameter, a property developer who wants to set a price can collect these features and use the model’s predict method to get a reference value for the house price.

We can also see which factors are positively or negatively correlated with house prices from the signs of their parameters. Moreover, the larger the absolute value of a parameter, the more that factor affects the house price (provided the features are on comparable scales). This is the interpretability that linear regression offers for its results, which some machine learning methods do not provide.
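Reading the parameters this way can be sketched as follows. Everything in the example is synthetic: the feature names (`pollution`, `rooms`, `distance`), the generating coefficients, and the data itself are made up, and the features are drawn on the same scale so that coefficient magnitudes are comparable. It is not a result from the Boston dataset.

```python
import numpy as np

# Synthetic data: three hypothetical features on the same scale.
rng = np.random.default_rng(0)
names = ["pollution", "rooms", "distance"]
X = rng.normal(size=(200, 3))
true_coef = np.array([-3.0, 5.0, 0.5])  # made-up generating coefficients
y = 20.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

# Least-squares fit with an intercept column of ones.
X_b = np.hstack([np.ones((X.shape[0], 1)), X])
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
coef = theta[1:]

# Rank features by the absolute value of their coefficient.
ranked = sorted(zip(names, coef), key=lambda pair: abs(pair[1]), reverse=True)
for name, c in ranked:
    direction = "positive" if c > 0 else "negative"
    print(f"{name}: {c:.2f} ({direction})")
```

The sign of each coefficient gives the direction of the relationship and its magnitude gives the strength. With real data the features usually need to be standardized first, otherwise a feature measured in small units gets a misleadingly small coefficient.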