
Representation of a neural network

Why learn neural networks when you have linear regression and logistic regression?

Let me give you some examples.

This is a supervised learning classification problem with only two features.

If you use logistic regression and want to fit the following curve, you might need a polynomial with many high-order terms to separate the positive and negative samples.


$$g\left(\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\theta_{3} x_{1} x_{2}+\theta_{4} x_{1}^{2} x_{2}+\theta_{5} x_{1}^{3} x_{2}+\theta_{6} x_{1} x_{2}^{2}+\ldots\right)$$
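As a minimal sketch, here is what that hypothesis looks like in code, assuming a hand-picked set of polynomial features and a hypothetical, untrained parameter vector `theta`:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x1, x2):
    """Hypothesis with hand-built polynomial features,
    matching the expansion above (7 terms shown here)."""
    features = np.array([1.0, x1, x2, x1 * x2,
                         x1**2 * x2, x1**3 * x2, x1 * x2**2])
    return sigmoid(theta @ features)

# Hypothetical parameter vector, just to show the call:
theta = np.zeros(7)
print(hypothesis(theta, 0.5, -1.2))  # 0.5 before any training
```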

Just two features are already this much of a hassle, so what about 100?

Even with only the quadratic terms, such as $x_1^2, x_2^2, \ldots, x_{100}^2, x_1x_2, x_1x_3, \ldots, x_{99}x_{100}$, you end up with more than 5,000 terms, approximately $\frac{n^2}{2}$. Multiplying $x_1$ by each of $x_1$ through $x_{100}$ gives 100 terms, and doing the same all the way down to $x_{100}$ gives 10,000 terms in total. But $x_1x_{100}$ is the same as $x_{100}x_1$, so you end up with about half.

With more than 5,000 features, the final model is likely to overfit, and the computation becomes very expensive.

Now do the same for the cubic terms: $x_1^3, x_2^3, \ldots, x_{100}^3, x_1^2x_2, x_1^2x_3, \ldots, x_{99}x_{100}^2$, which comes to about 170,000 terms in the end. (Check it yourself; see the sketch below.)
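You can verify both counts with a few lines of Python. This sketch counts each monomial as an unordered combination with repetition, which is exactly the "keep about half" argument above:

```python
from itertools import combinations_with_replacement

n = 100  # number of raw features x_1 ... x_100

# Quadratic terms: all unordered pairs (i, j) with repetition,
# i.e. x_i * x_j where order does not matter.
quadratic = sum(1 for _ in combinations_with_replacement(range(n), 2))
print(quadratic)  # 5050 -- "more than 5,000", roughly n^2 / 2

# Cubic terms: all unordered triples with repetition.
cubic = sum(1 for _ in combinations_with_replacement(range(n), 3))
print(cubic)  # 171700 -- roughly 170,000
```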

As the example above shows, when the number of features is large, the feature space expands rapidly. It is therefore unwise to build a nonlinear classifier by adding polynomial features when n is large.

However, for many practical machine learning problems, the number of features n is usually very large.

Take computer vision, for example: you want a machine learning system to determine whether an image is a car.

You can tell it is a car the moment you see it, but the computer cannot; it does not see a car at all.

Zoom in on a small patch: a grid that looks like a door handle to you is, to the computer, just pixel intensity values; it only knows the brightness value of each pixel. So the computer vision problem is to take this matrix of pixel brightness values and conclude that the matrix represents the door handle of a car.
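To make "the computer only sees brightness values" concrete, here is a minimal sketch using Pillow and NumPy. The file name `car.png` is hypothetical, standing in for any image of a car:

```python
import numpy as np
from PIL import Image

# Hypothetical file path, just for illustration.
img = Image.open("car.png").convert("L")  # "L" = 8-bit grayscale
pixels = np.asarray(img)

# This matrix of brightness values is all the computer sees.
print(pixels.shape)    # e.g. (height, width)
print(pixels[:3, :3])  # top-left 3x3 patch: plain integers 0-255
```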

To build a car recognizer with machine learning, all we have to do is provide labeled samples: some are cars and some are not. This sample set is fed into a learning algorithm, which produces a classifier; given a new image, the classifier then determines what it is. A sketch of this pipeline follows.
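Here is a minimal sketch of that pipeline, assuming scikit-learn and random stand-in data in place of real labeled images (a logistic regression model is used here purely as a placeholder classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: each row is a flattened image (pixel brightness
# values), each label is 1 for "car", 0 for "not a car".
rng = np.random.default_rng(0)
X_train = rng.random((200, 2500))  # 200 samples of 50x50 = 2500 pixels
y_train = rng.integers(0, 2, 200)  # random stand-in labels

# Feed the labeled sample set into a learning algorithm
# to produce a classifier.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A new image, flattened the same way, gets a prediction.
new_image = rng.random((1, 2500))
print(clf.predict(new_image))  # array([0]) or array([1])
```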