The previous article briefly introduced some basic concepts of machine learning, including its definition, advantages and disadvantages, and the categories of machine learning tasks.

Next, I plan to introduce the implementation steps of a complete machine learning project over several articles, and finally walk through the corresponding code using the example from “Hands-On Machine Learning with Scikit-Learn and TensorFlow”.

So first, let's look at how to build a complete machine learning project!

The main steps of a complete machine learning project are as follows:

  1. Project overview.
  2. Get the data.
  3. Explore and visualize the data to gain insights.
  4. Prepare the data for machine learning algorithms.
  5. Select a model and train it.
  6. Fine-tune the model.
  7. Present the solution.
  8. Deploy, monitor, and maintain the system.

This first article covers the first step: what you need to determine when starting a project, including how to choose an appropriate loss function.


1. Project Overview

1.1 Framing the Problem

When we start a machine learning project, we need to know two things:

  1. What is the business objective? What does the company expect to gain from the algorithm or model? This determines which algorithms to choose and which performance metrics to evaluate.
  2. How effective are the current solutions?

With these two questions answered, we can begin to design the system, that is, the solution.

But first, there are a few things to be clear about:

  • Is it supervised, unsupervised, or reinforcement learning?
  • Is it a classification problem, a regression problem, or some other type of problem?
  • Should it use batch learning or online learning?

1.2 Selecting Performance Indicators

Selecting a performance indicator usually means measuring the accuracy of the model. In machine learning, an algorithm's accuracy is improved by reducing its loss, so an appropriate loss function must be chosen to train the model.

Generally, loss functions can be divided into two categories by the type of learning task: regression losses and classification losses, corresponding to regression problems and classification problems respectively.
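Since the book's examples are built on Scikit-Learn, it is worth noting that most of the losses discussed below are also available as ready-made metric functions in sklearn.metrics. A minimal sketch (the binary labels and probabilities in the last line are made-up values for illustration):

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, log_loss

y_true = np.array([0.000, 0.254, 0.998])
y_pred = np.array([0.000, 0.166, 0.333])

print(mean_squared_error(y_true, y_pred))   # MSE, for regression
print(mean_absolute_error(y_true, y_pred))  # MAE, for regression

# log_loss is cross entropy: true labels plus predicted probabilities
print(log_loss([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2]))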

Regression losses
Mean Squared Error (MSE) / Squared Loss / L2 Loss

The mean squared error (MSE) measures the average of the squared differences between the predicted values and the true values. It only considers the average magnitude of the error, not its direction. However, because the errors are squared, predictions that deviate further from the true values are penalized much more heavily. MSE also has nice mathematical properties: it is especially easy to differentiate, which makes computing gradients straightforward.

The mathematical formula of MSE is as follows:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

The code implementation is as follows:

import numpy as np

def rmse(predictions, targets):
    # Differences between predicted and true values
    differences = predictions - targets
    differences_squared = differences ** 2
    mean_of_differences_squared = differences_squared.mean()
    # Take the square root of the mean squared error
    rmse_val = np.sqrt(mean_of_differences_squared)
    return rmse_val

Note that the above code actually implements the root mean squared error (RMSE), i.e. the square root of MSE. A simple test example is as follows:

y_hat = np.array([0.000, 0.166, 0.333])
y_true = np.array([0.000, 0.254, 0.998])

print("d is: " + str(["%.8f" % elem for elem in y_hat]))
print("p is: " + str(["%.8f" % elem for elem in y_true]))
rmse_val = rmse(y_hat, y_true)
print("rms error is: " + str(rmse_val))

The output is:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
rms error is: 0.387284994115
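For comparison, a minimal sketch of plain MSE, which is simply the same computation without the final square root (mse here is a hypothetical helper, not from the book):

def mse(predictions, targets):
    # Mean of the squared differences, without the square root
    return ((predictions - targets) ** 2).mean()

print("mse error is: " + str(mse(y_hat, y_true)))
# mse error is: ~0.1499897 (the square of the RMSE above)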
Mean Absolute Error (MAE) / L1 Loss

The mean absolute error (MAE) measures the average of the absolute differences between the predicted and observed values. Like MSE, it measures the magnitude of the error without considering its direction. Unlike MSE, however, its gradient is not as simple to handle, and computing it may require more sophisticated tools such as linear programming. MAE is also more robust to outliers, because it does not square the errors.

The mathematical formula is as follows:

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

The MAE code is also not difficult to implement, as follows:

def mae(predictions, targets):
    # Absolute differences between predicted and true values
    differences = predictions - targets
    absolute_differences = np.absolute(differences)
    mean_absolute_differences = absolute_differences.mean()
    return mean_absolute_differences

We can reuse the test data from the MSE example directly; the output is as follows:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
mae error is: 0.251
Mean Bias Error (MBE)

This loss function is used much less often in machine learning; it was actually the first time I had seen it. It is the same as MAE except that it does not take absolute values, so positive and negative errors can cancel each other out. Although less accurate in practice, it can tell you whether the model has a positive or negative bias.

The mathematical formula is as follows:

MBE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)

To implement it, just remove the absolute value step from the MAE code, as shown below:

def mbe(predictions, targets):
    # Signed differences: positive and negative errors can cancel out
    differences = predictions - targets
    mean_differences = differences.mean()
    return mean_differences

Using the same test example, the results are as follows:

d is: ['0.00000000', '0.16600000', '0.33300000']
p is: ['0.00000000', '0.25400000', '0.99800000']
mbe error is: -0.251

We can see that our simple test sample has a negative bias: the predictions are systematically lower than the true values (and since all the errors here have the same sign, the magnitude of the MBE equals the MAE).

Classification losses
Hinge Loss / Multi-class SVM Loss

Hinge loss is often used for maximum-margin classification: within a certain safety margin (usually 1), the score of the correct category should be higher than the score of every wrong category. It is most commonly associated with support vector machines (SVMs). Although it is not differentiable everywhere, it is a convex function, so it can be used with the convex optimizers commonly employed in machine learning.

The mathematical formula is as follows:

L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)

In the formula, s_j is the predicted score for a wrong category and s_{y_i} is the score for the true category, while the 1 is the margin. The difference between the true category's score and the other scores expresses how far apart the predictions are, and the margin is a safety factor set by hand: we want the score of the correct category to exceed the score of every wrong category by at least the margin. In other words, the higher s_{y_i} and the lower s_j, the better, and the computed loss then tends toward 0.

To illustrate with a simple example, suppose we have the following three training samples (images) and need to predict among three categories. The table below lists the score each image receives for each category:

         image1   image2   image3
dog       -0.39    -4.61     1.03
cat        1.49     3.28    -2.37
horse      4.21     1.46    -2.27

Each column holds one image's scores for every category, and the true categories of the three columns are dog, cat, and horse respectively. A simple code implementation is as follows:

def hinge_loss(predictions, label):
    '''
    hinge_loss = sum over j != label of max(0, s_j - s_yi + 1)
    :param predictions: scores for each category
    :param label: index of the true category
    :return: hinge loss for this sample
    '''
    result = 0.0
    pred_value = predictions[label]
    for i, val in enumerate(predictions):
        if i == label:
            continue
        tmp = val - pred_value + 1
        result += max(0, tmp)
    return result

Test examples are as follows:

image1 = np.array([-0.39, 1.49, 4.21])
image2 = np.array([-4.61, 3.28, 1.46])
image3 = np.array([1.03, -2.37, -2.27])
result1 = hinge_loss(image1, 0)
result2 = hinge_loss(image2, 1)
result3 = hinge_loss(image3, 2)
print('image1,hinge loss={}'.format(result1))
print('image2,hinge loss={}'.format(result2))
print('image3,hinge loss={}'.format(result3))

# output result
# image1, hinge loss = 8.48
# image2, hinge loss = 0.0
# image3, hinge loss = 5.199999999999999


Writing out the calculations makes the process clearer:

## 1st training example
max(0, (1.49) - (-0.39) + 1) + max(0, (4.21) - (-0.39) + 1)
max(0, 2.88) + max(0, 5.6)
2.88 + 5.6
8.48 (high loss: a very wrong prediction)
## 2nd training example
max(0, (-4.61) - (3.28) + 1) + max(0, (1.46) - (3.28) + 1)
max(0, -6.89) + max(0, -0.82)
0 + 0
0 (zero loss: a correct prediction)
## 3rd training example
max(0, (1.03) - (-2.27) + 1) + max(0, (-2.37) - (-2.27) + 1)
max(0, 4.3) + max(0, 0.9)
4.3 + 0.9
5.2 (high loss: a very wrong prediction)

As the calculations show, the higher the hinge loss, the more inaccurate the prediction.
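The same computation can also be vectorized with NumPy. A minimal sketch (hinge_loss_vectorized is a hypothetical helper, equivalent to the loop version above):

def hinge_loss_vectorized(predictions, label):
    # Margin of every category relative to the true category's score
    margins = np.maximum(0, predictions - predictions[label] + 1)
    # The true category itself does not contribute to the loss
    margins[label] = 0
    return margins.sum()

print(hinge_loss_vectorized(image1, 0))  # 8.48, same as the loop version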

Cross Entropy Loss / Negative Log-Likelihood

Cross entropy loss is the most commonly used loss function in classification algorithms.

The mathematical formula is as follows:

L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

According to the formula, if the actual label y_i is 1, only the first half of the expression remains; if it is 0, only the second half. Simply put, cross entropy takes the negative logarithm of the predicted probability of the true category, so it heavily punishes predictions that are confident but wrong.

The code implementation is as follows:

def cross_entropy(predictions, targets, epsilon=1e-10):
    # Clip predictions so that log() never receives 0 or 1 exactly
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    N = predictions.shape[0]
    # Average the negative log-probabilities of the true (one-hot) categories
    ce_loss = -np.sum(targets * np.log(predictions + 1e-5)) / N
    return ce_loss

The test sample is as follows:

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
targets = np.array([[0, 0, 0, 1],
                    [0, 0, 0, 1]])
cross_entropy_loss = cross_entropy(predictions, targets)
print("Cross entropy loss is: " + str(cross_entropy_loss))

# output result
# Cross entropy loss is: 0.713532969914
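Note that the formula given above is the binary form, while cross_entropy() handles one-hot multi-class targets. A minimal sketch of the binary version, assuming the predictions are probabilities of the positive class (binary_cross_entropy is a hypothetical helper, and the sample values are made up):

def binary_cross_entropy(predictions, targets, epsilon=1e-10):
    # Clip to avoid log(0); for each sample only one of the two terms
    # is active, depending on whether y_i is 1 or 0
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    return -np.mean(targets * np.log(predictions)
                    + (1 - targets) * np.log(1 - predictions))

y_label = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.1, 0.8, 0.35])
print(binary_cross_entropy(y_prob, y_label))  # ~0.371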

The source code for the above examples is available at:

Github.com/ccc013/Code…

1.3 Verifying Assumptions

Verifying assumptions means confirming the actual inputs and outputs of the system you are designing. A machine learning project is rarely just an algorithm or a model; there are usually front-end pages to display results, back-end services, and so on, so you need to communicate with the people in charge of the upstream and downstream components and verify the interfaces between them.

For example, the example given in “Hands-On Machine Learning with Scikit-Learn and TensorFlow” is a system that predicts house prices, with the price value as the output. But if the front end needs to display categories instead, namely cheap, medium, or expensive, then a system that outputs exact prices is meaningless: the task becomes a classification problem, not a regression problem.

Therefore, when working on a machine learning project, you need to maintain good communication with the colleagues whose work connects to yours, and check with them whenever needed to confirm the interface details.


Summary

This first article briefly introduced how to start a machine learning project: first clarify the business objective and the existing solutions, determine what type of task the machine learning system addresses, and on that basis select an appropriate performance metric, namely the loss function.


Reference:

  • “Hands-On Machine Learning with Scikit-Learn and TensorFlow”, Chapter 2
  • www.jiqizhixin.com/articles/09…
  • blog.csdn.net/fendegao/ar…
  • blog.csdn.net/xg123321123…

Welcome to follow my WeChat official account, Machine Learning and Computer Vision, so we can communicate, learn, and make progress together!

Previous articles

Learning notes
  • Introduction to Machine Learning Series 1: An Overview of Machine Learning
  • Getting to Know GAN
  • GAN Learning Series 2: The Origin of GAN
  • [GAN Learning Series 3] Image Restoration Using Deep Learning and TensorFlow
Math study notes
  • Programmer's Math Notes 1: Base Conversion
  • Programmer's Math Notes 2: Remainders
  • Programmer's Math Notes 3: Iterative Methods
Github projects & Resource tutorials recommended
  • [Github Project recommends] a better site for reading and finding papers
  • TensorFlow is now available in Chinese
  • Must-read AI and Deep learning blog
  • An easy-to-understand TensorFlow tutorial
  • Recommend some Python books and tutorials, both beginner and advanced!