Introduction
- To learn the principles of machine learning, it helps to have a basic knowledge of mathematics (calculus, linear algebra, probability and statistics). I will not go into the derivation of the mathematical formulas here; that material is best studied on your own. Instead, I will mainly explain concepts and applications, starting with the basics and trying to keep everything simple and understandable. Many of the machine learning courses I have seen open with a dazzling wall of formulas, which immediately shuts out the majority of students who are not strong in mathematics. I don't like that approach.
- Of course, I won't jump straight into TensorFlow code and applications either; we first need to understand some concepts and fundamentals, otherwise beginners will just be confused. Machine learning builds on itself step by step: if you don't understand the earlier material, you can't learn what comes later, and it only gets harder.
- I will not cover deep learning for the moment. I will start with the most basic learning algorithms and work my way up to deep learning gradually.
- Machine learning falls broadly into two categories: supervised learning and unsupervised learning.
- Supervised learning is divided into two kinds of problems: regression problems and classification problems. We'll start with regression.
- We'll cover unsupervised learning and deep learning later.
Linear regression
I chose to start with regression because it is the foundation of the foundations in machine learning: loss functions, gradient descent, and learning rates all start here. So let's begin with basic univariate linear regression.
There are two ways to do linear regression. You could solve for the slope and intercept directly with the closed-form math, and that might even be easier, but it's not what we want here. I'm going to do linear regression the machine learning way, because deep learning is a systematic discipline, and formulas alone cannot solve every problem.
Predicting housing prices
This is a classic regression problem. For example, we have two sets of data: one is the area of each house, the other is its price. To predict the price of a house for a given area, we have to solve for y given x. We can plot the data as follows:
On the X-axis is the area of the house, and on the Y-axis is the price of the house. PS: The hand drawing is a little ugly, please don't mind.
1. Determine the model for calculating the price
Let the area of the house be x and the selling price be y. We need to build an expression that computes the output y from the input x; that expression is the model. As the name implies, linear regression assumes a linear relationship between the output and each input, which gives the following formula:
y = x * w + b
In ordinary math, w would be the slope and b the intercept. But in machine learning we like to call w the weight and b the bias. Our goal is to solve for w and b.
A quick aside: what would multiple linear regression look like? For example, the location of a house also affects its price, so there could be multiple inputs x, each with its own corresponding weight w, defined as: y = x1 * w1 + x2 * w2 + b. I'll come back to this in a later article; this is just a preview.
2. Determine the loss function
Why do we need a loss function? Our goal is to solve for w and b, but we don't know their values to begin with, so machine learning typically starts by randomly initializing w and b. Those random values are almost certainly not the right answer, so how do we optimize w and b to make them approach the correct one? First we need a way to measure how good or bad the current parameters are. That is exactly what the loss function does: it measures the gap between the predicted price and the true value. The smaller the value the loss function returns, the better our current estimate of w and b. But how do we define this measurement?
See figure
What linear regression ultimately gives us is the blue line in the figure, but we can't just eyeball it; the machine has to find it. So we compute, for each data point, its distance to the blue line (the green segments below represent these errors).
How do we calculate the error? This is where the mean squared error (MSE) comes in; it serves as the metric for measuring the quality of the predictions. The formula is:

Loss(w, b) = (1 / 2m) * Σᵢ (ŷ(xᵢ) − yᵢ)²,  i = 1 … m

Here ŷ(xᵢ) is the value the model computes for input xᵢ, and yᵢ is the actual value, also known as the reference value. Simply put: take the difference between each prediction and its reference value, sum the squares, and divide by the number of samples m (the extra 2 in the denominator just makes the derivative cleaner, and it matches the code later on); a small worked example follows below. Generally, we randomly initialize w and b, and the loss function judges whether they are any good. If they are not, how do we optimize them? This brings us to the most important optimization algorithm in machine learning: gradient descent.
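Before moving on, here is the formula worked by hand on two made-up points, just as a sanity check (the numbers are purely for illustration):

With samples (x, y) = (1, 2) and (2, 3), so m = 2, and a model with w = 1, b = 0:
ŷ(1) = 1 * 1 + 0 = 1, ŷ(2) = 1 * 2 + 0 = 2
Loss = ((1 − 2)² + (2 − 3)²) / (2 * 2) = (1 + 1) / 4 = 0.5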
3. Gradient descent
The mean squared error above only evaluates our initial w and b; it tells us their loss, not how to improve them. So what should we do to find the optimal w and b? We said before that the job of the loss function is to expose the gap between the actual output and the ideal one. Once we know the gap, how do we shrink it? If we plot the loss function as a surface, as in the figure below, gradient descent becomes a downhill problem: step by step, we move in the steepest direction toward the bottom (the minimum).
Mathematically, let me use this diagram to illustrate
J is a function of θ, and we start at θ0. From there we have to walk down to the lowest point of J, the bottom of the hill. First we determine the direction to move in, which is the opposite of the gradient. Then we move a certain distance, the step size, which is governed by the learning rate. After that step, we arrive at θ1.
Alpha (α) is called the learning rate, or step size, in gradient descent algorithms. It is a hyperparameter, meaning we control it ourselves: α determines the distance of each step, so we must not take steps so big that we overshoot the lowest point, but also not walk so slowly that training takes forever. The choice of α is therefore very important in gradient descent, and it usually takes trial and error. Later I'll cover some optimization algorithms for gradient descent that help with this problem.
You might ask: what exactly is J(θ), and how do we descend it fastest? J(θ) is our loss function, and the direction of steepest descent is given by its gradient, i.e. the partial derivatives of the loss with respect to w and b. Taking the partial derivatives of the loss above gives:

∂Loss/∂w = (1 / m) * Σᵢ (ŷ(xᵢ) − yᵢ) * xᵢ
∂Loss/∂b = (1 / m) * Σᵢ (ŷ(xᵢ) − yᵢ)

and each step updates the parameters as:

w := w − α * ∂Loss/∂w
b := b − α * ∂Loss/∂b

I'm not going to explain how these formulas are derived, because that would take us off focus. If you're curious, look up the chain rule on your own.
4. Back propagation
One more concept I want to mention in passing is back propagation. This term comes up a lot in deep learning. When we compute a gradient step, we start from the w and b left by the previous step, apply the partial derivatives, and update the parameters; the next gradient update then starts from those latest values. To put it plainly: save the updated w and b, and run the next gradient update from them, otherwise the parameters never actually improve. You'll see this in my demo code later.
Use native JS for linear regression
Using the principles above, let's look at an example of implementing machine learning style linear regression in native javascript.
- Define the training data set and hyperparameters
First, we need to define the x and y data sets. Ideally this would be normally distributed data, but JS has no built-in method for sampling a normal distribution, so let's define a fixed set of data just for the demonstration (see the aside after the code). You could of course use random data over a range; it would just be very scattered and less intuitive.
const x = [13, 18, 23, 36, 42, 48, 58, 72, 85, 94] // The X-axis: the area of each house
const y = [18, 25, 39, 47, 32, 59, 73, 87, 83, 94] // The Y-axis: the corresponding price of each house
const M = x.length; // Number of samples
const LEARNING_RATE = 0.0003; // Learning rate
let w = 0; // weight
let b = 0; // bias
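As an aside: if you do want normally distributed random data in JS, you can generate it yourself with the Box-Muller transform. Here is a minimal sketch (randNormal is just a name I made up for illustration):

function randNormal(mean = 0, std = 1) {
  // Box-Muller transform: turns two uniform random numbers into one normal sample
  const u1 = 1 - Math.random(); // shift away from 0 so Math.log stays finite
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + std * z;
}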
- Determine a linear model
const hypothesis = x => w * x + b;
- Define the loss function; translating the MSE formula gives the following code
const LossFuc = () => {
  let sum = 0;
  for (let i = 0; i < M; i++) {
    // MSE: accumulate the squared error of each sample
    sum += Math.pow(hypothesis(x[i]) - y[i], 2);
  }
  return sum / (2 * M);
}
- Define a method that applies one gradient update to a parameter
const gradient = (arg, deriv) => arg - LEARNING_RATE * (deriv / M) // new value = old value - learning rate * averaged partial derivative
- Define the training method
const training = () => {
  let bSum = 0;
  let wSum = 0;
  // Accumulate the partial derivatives over all samples
  for (let i = 0; i < M; i++) {
    // Partial derivative with respect to w
    wSum += (hypothesis(x[i]) - y[i]) * x[i];
    // Partial derivative with respect to b
    bSum += hypothesis(x[i]) - y[i];
  }
  // Apply the gradient updates; the updated parameters carry over to the next step (back propagation)
  w = gradient(w, wSum);
  b = gradient(b, bSum);
}
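To actually run this, we call training() repeatedly and watch the loss fall. A minimal driver loop might look like this (the step count of 1000 is arbitrary):

for (let step = 0; step < 1000; step++) {
  training(); // one gradient descent step: updates w and b in place
  if (step % 100 === 0) {
    console.log(`step ${step}: loss = ${LossFuc()}, w = ${w}, b = ${b}`);
  }
}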
To sum up: we have walked through the whole machine learning process above, and the flow chart below summarizes it:
I want you to keep this whole process in mind. We'll use it again and again; it will get more and more complicated, but everything builds on it.
The code has been uploaded to codesandbox (demo address).
As we've seen, writing machine learning code in native JS means encapsulating all these mathematical formulas yourself. It's like slash-and-burn farming. Once deep learning is involved, with neural networks such as CNNs, RNNs, or GANs, the code becomes very complicated and hard to understand. So we need a framework to solve this problem. For the front end, is there a machine learning framework in javascript? TensorflowJS comes to mind here. Let's see whether TensorflowJS can solve our problem well.
Linear regression with TensorflowJs
Now let's modify the code above and use TensorflowJs for linear regression, to see if it improves things. First I went to study the official documentation, and immediately ran into a tricky problem: in TensorflowJs everything is a tensor type, so how do we convert tensors back into normal JS primitives? It's not like Python, where you can convert to numpy and play from there. Then I discovered that tensors have a dataSync method that helps us out.
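For example, a tiny sketch of the conversion (the values are arbitrary):

const t = tf.tensor1d([1, 2, 3]);
const arr = t.dataSync(); // synchronously copies the tensor's values into a typed array
console.log(arr[0]); // 1

There is also an asynchronous variant, t.data(), which returns a promise instead of blocking.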
With that solved, let's port our original native JS ideas over. First, define the training data and hyperparameters, and randomly initialize w and b:
const trainX = [13, 18, 23, 36, 42, 48, 58, 72, 85, 94]
const trainY = [18, 25, 39, 47, 32, 59, 73, 87, 83, 94]
const LEARNING_RATE = 0.0003;
const w = tf.variable(tf.scalar(Math.random())); // tf.variable marks a tensor as a trainable parameter
const b = tf.variable(tf.scalar(Math.random()));
Some background knowledge may be needed here. Machine learning involves matrix operations and data transformations: what scalars, tensors, vectors, and matrices are, plus matrix addition, subtraction, multiplication, division, transposition, broadcasting, and so on. If you are not familiar with these, they deserve a separate article.
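Just to make the terms concrete, here is a small sketch of these types in TensorflowJs (the values are arbitrary):

const s = tf.scalar(2); // scalar: a single number
const v = tf.tensor1d([1, 2, 3]); // vector: a rank-1 tensor
const m = tf.tensor2d([[1, 2], [3, 4]]); // matrix: a rank-2 tensor
v.mul(s).print(); // broadcasting: the scalar multiplies every element -> [2, 4, 6]
m.transpose().print(); // transpose -> [[1, 3], [2, 4]]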
Define the linear model calculation:
function predict(x) {
  return tf.tidy(function() { // tf.tidy cleans up intermediate tensors to avoid memory leaks
    return w.mul(x).add(b); // tensors provide calculation methods such as mul and add
  });
}
We can see that TensorflowJS encourages a functional, chained programming style.
Define the loss function, MSE (mean squared error):
// Loss function: MSE
function loss(prediction, labels) {
  const error = prediction
    .sub(labels) // difference between prediction and label
    .square() // squared
    .mean(); // averaged over all samples
  return error;
}
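Note that this version divides by m rather than 2m; the extra factor of 2 earlier was only there to make the derivative tidy, and it doesn't change where the minimum is. To peek at the loss as a plain number, dataSync comes in handy again (a quick sketch):

const currentLoss = loss(predict(tf.tensor1d(trainX)), tf.tensor1d(trainY));
console.log(currentLoss.dataSync()[0]); // the scalar loss as a JS number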
Finally, define the main training method:
let stepLoss; // keep the latest loss around so we can inspect it
function training() {
  // SGD: stochastic gradient descent
  const optimizer = tf.train.sgd(LEARNING_RATE);
  // minimize computes the gradients automatically and updates the tf.variables
  optimizer.minimize(function() {
    const predsYs = predict(tf.tensor1d(trainX));
    stepLoss = loss(predsYs, tf.tensor1d(trainY));
    return stepLoss;
  });
}
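As with the native version, we can drive this with a simple loop (a sketch; the step count is arbitrary):

for (let step = 0; step < 1000; step++) {
  training();
  if (step % 100 === 0) {
    console.log(`step ${step}: loss = ${stepLoss.dataSync()[0]}, w = ${w.dataSync()[0]}, b = ${b.dataSync()[0]}`);
  }
}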
Comparing the loss function and the gradient descent optimization above with the native implementation, this is much simpler: no hand-written accumulation loops, and no taking derivatives ourselves. TensorflowJS provides automatic differentiation and ready-made optimizers, and the code is far more readable than the pure JS approach. So it does provide real convenience.
The final step is to render it with React. As per international practice, here is the demo code: codesandbox
Using a neural network to solve linear regression
The real power of TensorflowJS is deep learning with neural networks. If you don't know much about neural networks yet, that's okay; let's take a small look ahead, just to form a simple impression of how a neural network can solve a linear regression problem. It's a bit of overkill, but it shows that neural networks can solve not only big problems but simple ones too. Many deep learning courses start directly with neural networks; I don't like that approach and prefer to build up gradually.
So without further ado, let's see how to solve linear regression with a neural network. The code is much cleaner, so I won't break it down; I'll post it all at once and explain the details slowly in future articles.
const model = tf.sequential(); // define the model, using the sequential neural network model
const nr_epochs = 10;
const x = [13, 18, 23, 36, 42, 48, 58, 72, 85, 94]
const y = [18, 25, 39, 47, 32, 59, 73, 87, 83, 94]
const xs = tf.tensor2d(x, [10, 1]);
const ys = tf.tensor2d(y, [10, 1]);
const LEARNING_RATE = 0.0001; // learning rate
let w = 0
let b = 0
function initModel(cb) {
  model.add(tf.layers.dense({ units: 1, inputShape: [1] })); // our problem is very simple, requiring only a single dense layer
  model.setWeights([tf.tensor2d([w], [1, 1]), tf.tensor1d([b])]); // initialize the layer's weight and bias with our starting w and b
  const optimizer = tf.train.sgd(LEARNING_RATE); // gradient descent optimizer
  model.compile({ loss: 'meanSquaredError', optimizer: optimizer });
  model.fit(xs, ys, { // start training
    epochs: nr_epochs,
    callbacks: {
      onEpochEnd: (epoch, logs) => { // handle the callback after each epoch
        w = model.getWeights()[0].dataSync()[0];
        b = model.getWeights()[1].dataSync()[0];
        cb()
      }
    }
  });
}
The training of this neural network runs continuously, so we can't control it step by step from a button; instead, the onEpochEnd callback reports the progress of each epoch. Neural network linear regression example: codesandbox
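Once trained, the model can predict directly. For example (a sketch, using an arbitrary input area of 50):

const result = model.predict(tf.tensor2d([[50]], [1, 1]));
console.log(result.dataSync()[0]); // the predicted price for an area of 50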
Questions for you to think about
- If our training sample contains a lot of bad data, what should we do?
- What happens if the learning rate is set too high or too low? How can we optimize the learning rate?
- If we have a very large sample, how do we train, how do we optimize?
- How should polynomial regression be done?
- How do you do logistic regression?
Last thing I want to say about TensorflowJS
TensorflowJS is better suited to transfer learning; it is not suitable for large-scale model training. In maturity and third-party support it is not as convenient as Python, and in terms of syntax, matrix manipulation is still more pleasant in Python. TensorflowJS is also a bit of a backwater, with very little learning material, unlike Python, which has plenty. But I think it's fine for doing something small on top of an existing model, and it can run in NodeJS. So in general, if you are very familiar with JS and the model is simple, consider it. Okay, that's it. Thank you for reading all the way to the end.