Recently I have been studying neural networks in machine learning. There are two excellent references for this section. The links are as follows: colah.github.io, a series of articles about neural networks and deep learning, and Neural Networks and Deep Learning, which introduces the basic concepts of neural networks and their formulas, walks through a handwritten digit recognition algorithm, and is very friendly to beginners. Before reading them, though, I recommend a video that is better suited for getting started: Getting Started with Deep Learning

In this article I intend to convert the author's Python 2 implementation of the algorithm to Python 3, adding my own understanding along the way to deepen my impression of the material.

Basic structure of the neural network

import random
import numpy as np


class Network(object):
    def __init__(self, sizes):
        # sizes lists the number of neurons per layer, e.g. [784, 30, 10]
        self.num_layers = len(sizes)
        self.sizes = sizes
        # one bias column vector per non-input layer
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # one weight matrix per pair of adjacent layers, shape (next layer, previous layer)
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

sizes describes the layer structure; for example, [2, 3, 1] means a three-layer network with an input layer, a hidden layer and an output layer. For the MNIST data set, each image is 28 * 28 = 784 pixels, and the output layer has 10 neurons, the largest of which is taken as the predicted digit. Therefore net = Network([784, 30, 10]) builds a three-layer network with 784 neurons in the input layer, 30 in the hidden layer and 10 in the output layer.
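As a quick sanity check (a small snippet of my own, not from the original code), the shapes of the randomly initialised parameters for this network look like this:

net = Network([784, 30, 10])
print([b.shape for b in net.biases])   # [(30, 1), (10, 1)]
print([w.shape for w in net.weights])  # [(30, 784), (10, 30)]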

Loading the data

import gzip
import pickle
import numpy as np


def load_data():
    f = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    f.close()
    return (training_data, validation_data, test_data)

load_data reads the raw data set from the compressed pickle file; load_data_wrapper then reshapes it into a more convenient format:

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    # reshape each image into a 784 x 1 column vector
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    # turn each digit label into a 10 x 1 one-hot column vector
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

After these two steps, training_data contains 50,000 pairs of vectors: the first element of each pair holds the grayscale values of the 784 pixels, and the second holds 10 values encoding the correct classification of the image.

# the label vector for the digit 5 looks like this:
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
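The vectorized_result helper that produces this one-hot vector is not listed in this post; a minimal sketch consistent with the output above would be:

def vectorized_result(j):
    # 10 x 1 column vector with a 1.0 in the j-th position and zeros elsewhere
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e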

A pixel vector can be converted back into a picture as follows:

    from PIL import Image

    c = tr_d[0][0]                 # first training image: 784 floats in [0, 1]
    img = np.reshape(c, (28, 28))  # back to a 28 x 28 grid
    # scale to 0-255 so PIL can display it as an 8-bit grayscale image
    new_im = Image.fromarray((img * 255).astype(np.uint8))
    new_im.show()

With that, the data is ready.

Basic concepts such as the sigmoid activation function, stochastic gradient descent, the feedforward (forward propagation) pass and the backpropagation algorithm all appear in the code below; for the theory you can refer to the references above.
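The sigmoid and sigmoid_prime helpers used by the code below are not listed in this post; a minimal sketch consistent with how they are called would be:

def sigmoid(z):
    # logistic activation, applied elementwise to the vector z
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # derivative of the sigmoid, used when backpropagating the error
    return sigmoid(z) * (1 - sigmoid(z))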

Stochastic gradient descent

    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        """Desc: Random gradient descent :param training_data: list of tuples (x, Y) : Param epochs: Param mini_batch_size: Random minimum set :param eta: learning rate: learning rate :param test_data: test data, if any, will evaluate the algorithm, but will reduce the running speed :return:"""
        if test_data:
            test_data = list(test_data)
            n_test = len(test_data)
        training_data = list(training_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k: k + mini_batch_size]
                for k in range(0, n, mini_batch_size)
            ]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {}: {} / {}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {} complete".format(j))

The parameters are described in the docstring. In each epoch the training data is first shuffled, then split into mini-batches of the given size. Each mini-batch is used to update the weights and biases with the learning rate eta. If a test set is supplied, the accuracy of the network is evaluated after every epoch. The evaluation code is as follows:

    def evaluate(self, test_data):
        """Evaluating test set accuracy :param test_data: :return:"""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
        
    def feedforward(self, a):
        """return the output of the network if "a" is input"""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

For each test example, feedforward computes the network's 10 output values from the current biases and weights. np.argmax() returns the index of the largest of those 10 values, which is read as the predicted digit. The predictions are compared with the labels and the matches are counted to see how many are correct. (We will come back to the update_mini_batch() function shortly.)
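For instance (a toy illustration with made-up numbers), if the output column of the network looked like this, the predicted digit would be 3:

output = np.array([[0.01], [0.02], [0.05], [0.90], [0.10],
                   [0.02], [0.01], [0.03], [0.04], [0.02]])
print(np.argmax(output))  # prints 3, the index of the largest output

Putting everything together, training and evaluating the network looks like this: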

if __name__ == '__main__':
    # mnist_loader is the module containing load_data() and load_data_wrapper()
    import mnist_loader
    training_data, validation_data, test_data = \
        mnist_loader.load_data_wrapper()
    net = Network([784, 30, 10])
    net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

The neural network is initialized with three layers: 784 neurons in the input layer, 30 in the hidden layer and 10 in the output layer. Stochastic gradient descent then runs for 30 epochs with a mini-batch size of 10 and a learning rate of 3.0. The results on the test set are as follows:

Epoch 0: 9080 / 10000
Epoch 1: 9185 / 10000
Epoch 2: 9327 / 10000
Epoch 3: 9348 / 10000
Epoch 4: 9386 / 10000
Epoch 5: 9399 / 10000
Epoch 6: 9391 / 10000
Epoch 7: 9446 / 10000
Epoch 8: 9427 / 10000
Epoch 9: 9478 / 10000
Epoch 10: 9467 / 10000
Epoch 11: 9457 / 10000
Epoch 12: 9453 / 10000
Epoch 13: 9440 / 10000
Epoch 14: 9452 / 10000
Epoch 15: 9482 / 10000
Epoch 16: 9470 / 10000
Epoch 17: 9483 / 10000
Epoch 18: 9488 / 10000
Epoch 19: 9484 / 10000
Epoch 20: 9476 / 10000
Epoch 21: 9496 / 10000
Epoch 22: 9469 / 10000
Epoch 23: 9503 / 10000
Epoch 24: 9495 / 10000
Epoch 25: 9499 / 10000
Epoch 26: 9510 / 10000
Epoch 27: 9495 / 10000
Epoch 28: 9487 / 10000
Epoch 29: 9478 / 10000

As the number of training epochs increases, the accuracy of the model generally improves, ending up at roughly 95% on the test set.

Updating biases and weights with stochastic gradient descent

    def update_mini_batch(self, mini_batch, eta):
        """
        梯度下降更新weights和biases, 用到backpropagation反向传播。
        :param mini_batch:
        :param eta:
        :return:
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]

        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]

First, zero matrices with the same shapes as the biases and weights are created. For each pair (x, y) in the mini_batch, where x is the 784-element pixel vector and y is the 10-element label vector, the call delta_nabla_b, delta_nabla_w = self.backprop(x, y) uses backpropagation to compute the gradient for that single example. These per-example gradients are summed to obtain the gradient of the whole mini-batch, which is then used to update the weights and biases.
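In symbols (my own notation, not spelled out in the original post), the update performed for a mini-batch of m examples with learning rate eta is:

$$
w \leftarrow w - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial w},
\qquad
b \leftarrow b - \frac{\eta}{m} \sum_{x} \frac{\partial C_x}{\partial b}
$$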

The backpropagation algorithm

    def backprop(self, x, y):
        """:param x: :param y: :return: (nabla_b, nabla_W): gradient for loss function, similar to biaes, weight. """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x]  # store all the activations, layer by layer
        zs = []  # store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)

        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
                sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())

        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        return (nabla_b, nabla_w)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives of the quadratic cost
        with respect to the output activations, i.e. (output_activations - y).
        """
        return (output_activations - y)

This is the most important and most complex part: the backpropagation algorithm. It takes a single (x, y) pair from the mini_batch as parameters, first initializes zero matrices of the same shapes as the biases and weights, and then stores all intermediate z vectors and activations during the forward pass.

        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)

This loop performs the forward pass: for each layer it computes z from the weights and biases, and obtains the activation by applying the sigmoid function. The error and gradients for the last (output) layer are then computed from cost_derivative and sigmoid_prime.
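In symbols (standard backpropagation notation, added by me rather than taken from the original post), the output-layer computation is:

$$
\delta^{L} = (a^{L} - y) \odot \sigma'(z^{L}),
\qquad
\frac{\partial C}{\partial b^{L}} = \delta^{L},
\qquad
\frac{\partial C}{\partial w^{L}} = \delta^{L}\,(a^{L-1})^{\top}
$$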

        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())

This loop starts from the second-to-last layer and works backwards, computing the gradient of each layer in turn, and the function finally returns the complete gradients (nabla_b, nabla_w).
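Each iteration of the loop applies the same two gradient formulas one layer further back (again in the notation above), propagating the error with:

$$
\delta^{l} = \left((w^{l+1})^{\top} \delta^{l+1}\right) \odot \sigma'(z^{l}),
\qquad
\frac{\partial C}{\partial b^{l}} = \delta^{l},
\qquad
\frac{\partial C}{\partial w^{l}} = \delta^{l}\,(a^{l-1})^{\top}
$$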

That completes all the code for the algorithm. For the full code, see GitHub: code

Conclusion:

  • Python 3
  • Introduction to neural networks
  • Stochastic gradient descent
  • Backpropagation algorithm

Todo: apply this approach to the handwritten digit recognition problem on Kaggle.