Author: Li Xiaowen, who has worked successively in data analysis and data mining, programming mainly in Python, and is now an algorithm engineer at a small Internet company.

Github: github.com/tushushu

1. Principles

Let’s talk about what a fully connected neural network is in plain words rather than with big mathematical formulas.

1.1 Network Structure

The author, a self-styled “soul artist”, drew a rough network structure diagram in PPT, as follows:

1.2 Sigmoid Function

The expression for the Sigmoid function is:

y = 1 / (1 + e^(-x))

Differentiating it, it is not difficult to conclude:

dy/dx = y * (1 - y)

So the Sigmoid function has a range of (0, 1), and its derivative is y times (1 minus y).
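
As a quick sanity check (my own snippet, not part of the network code below), the derivative identity can be verified numerically with a central difference:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
y = sigmoid(x)
eps = 1e-6
# Central-difference approximation of dy/dx.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
# The analytic derivative y * (1 - y) should agree closely.
print(np.allclose(numeric, y * (1 - y)))  # True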

1.3 Chain Rule

z = f(y), y = g(x)

dz / dy = f'(y), dy / dx = g'(x)

dz / dx = dz / dy * dy / dx = f'(y) * g'(x)
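
A concrete example (my own, for illustration): take f(y) = y^2 and g(x) = 3x, so z = (3x)^2 and dz / dx = 2y * 3 = 18x. A few lines of Python confirm it:

def g(x): return 3 * x
def f(y): return y ** 2

x = 2.0
eps = 1e-6
# Central-difference approximation of dz/dx at x = 2.
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
analytic = 2.0 * g(x) * 3  # f'(y) * g'(x) = 2y * 3 = 18x
print(round(numeric, 3), analytic)  # 36.0 36.0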

1.4 Forward Propagation

Each node computes its value from the values of all of its input nodes; that value then serves as input for the node's output nodes.

1.5 Back Propagation

Each node collects the gradient of the loss with respect to itself from its output nodes, multiplies it by the partial derivative of its output with respect to each input node, and passes the product back as the gradient of the loss for that input node.
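
A minimal sketch of both passes on a toy graph (my own illustration with scalar values, not the article's code): y = sigmoid(w * x) and loss = (t - y)^2.

import numpy as np

x, w, t = 1.5, 0.8, 1.0

# Forward pass: each node computes its value from its inputs.
lin = w * x
y = 1.0 / (1.0 + np.exp(-lin))
loss = (t - y) ** 2

# Backward pass: multiply the gradient coming from the output node
# by the local partial derivative and pass it to the input nodes.
d_loss_d_y = -2 * (t - y)
d_loss_d_lin = d_loss_d_y * y * (1 - y)  # sigmoid'(lin) = y * (1 - y)
d_loss_d_w = d_loss_d_lin * x            # d(lin)/dw = x
print(d_loss_d_w)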

1.6 Topological Sorting

Assuming our neural network has k nodes, and any node may have multiple inputs, the order in which nodes are executed has to be considered. The principle: a node can be executed only after all of its input nodes have been executed.
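
For example (a toy graph of my own): if node C takes A and B as inputs, and node D takes C as its input, then A and B may run in either order, but C must wait for both, and D must wait for C. Valid orders include [A, B, C, D] and [B, A, C, D].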

2. Implementation

I use the simplest programming language in the universe – Python – to implement a fully connected neural network; it is easy to learn and use. Below is a brief description of the implementation; for more detailed comments, please refer to my GitHub code.

2.1 Creating the BaseNode Abstract Class

Use BaseNode as the parent class of every node type. It includes the following attributes:

  1. name – node name
  2. value – node data
  3. inbound_nodes – input nodes
  4. outbound_nodes – output nodes
  5. gradients – gradients for the input nodes

from abc import ABC, abstractmethod
from copy import copy
from typing import Tuple, Union

import numpy as np
from numpy import ndarray
from numpy.random import choice


class BaseNode(ABC):
    def __init__(self, *inbound_nodes, name=None):
        self.name = name
        self._value = None
        self.inbound_nodes = [x for x in inbound_nodes]
        self.outbound_nodes = []
        self.gradients = dict()
        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)

    def __str__(self):
        size = str(self.value.shape) if self.value is not None else "null"
        return "<Node name: %s, Node size: %s>" % (self.name, size)

    @property
    def value(self)->ndarray:
        return self._value

    @value.setter
    def value(self, value):
        err_msg = "'value' has to be a number or a numpy array!"
        assert isinstance(value, (ndarray, int, float)), err_msg
        self._value = value

    @abstractmethod
    def forward(self):
        return

    @abstractmethod
    def backward(self):
        return
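
A quick illustration of how the constructor wires the graph (a hypothetical subclass of my own, just for demonstration):

class DummyNode(BaseNode):
    """Minimal concrete subclass; forward and backward do nothing."""
    def forward(self):
        pass

    def backward(self):
        pass

a = DummyNode(name="a")
b = DummyNode(a, name="b")  # passing a as an inbound node...
print(a.outbound_nodes[0])  # ...registers b on a automatically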

2.2 Creating an InputNode Class

Used to store the training and test data. The indexes attribute stores the indices of the data in each batch.

class InputNode(BaseNode):
    def __init__(self, value: ndarray, name=None):
        BaseNode.__init__(self, name=name)
        self.value = value
        self.indexes = None

    @property
    def value(self):
        err_msg = "Indexes is None!"
        assert self.indexes is not None, err_msg
        return self._value[self.indexes]

    @value.setter
    def value(self, value: ndarray):
        BaseNode.value.fset(self, value)

    def forward(self):
        return

    def backward(self):
        self.gradients = {self: 0}
        for node in self.outbound_nodes:
            self.gradients[self] += node.gradients[self]

2.3 Creating the LinearNode Class

Used to perform linear operations.

  1. Y = WX + Bias
  2. dY / dX = W
  3. dY / dW = X
  4. dY / dBias = 1
class LinearNode(BaseNode):
    def __init__(self, data: BaseNode, weights: "WeightNode",
                 bias: "WeightNode", name=None):
        BaseNode.__init__(self, data, weights, bias, name=name)

    def forward(self):
        data, weights, bias = self.inbound_nodes
        self.value = np.dot(data.value, weights.value) + bias.value

    def backward(self):
        data, weights, bias = self.inbound_nodes
        self.gradients = {node: np.zeros_like(node.value) for node in self.inbound_nodes}
        for node in self.outbound_nodes:
            grad_cost = node.gradients[self]
            self.gradients[data] += np.dot(grad_cost, weights.value.T)
            self.gradients[weights] += np.dot(data.value.T, grad_cost)
            self.gradients[bias] += np.sum(grad_cost, axis=0, keepdims=False)
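
The three partial derivatives above can be sanity-checked with finite differences. This is a standalone numpy snippet of my own, independent of the node classes, checking dY/dW:

import numpy as np

np.random.seed(0)
X = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.random.randn(2)
G = np.random.randn(4, 2)  # stand-in for the upstream gradient dLoss/dY

def loss(W):
    return np.sum((np.dot(X, W) + b) * G)  # chosen so that dLoss/dY = G

analytic = np.dot(X.T, G)  # matches the backward() code above
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_hi, W_lo = W.copy(), W.copy()
        W_hi[i, j] += eps
        W_lo[i, j] -= eps
        numeric[i, j] = (loss(W_hi) - loss(W_lo)) / (2 * eps)
print(np.allclose(analytic, numeric))  # True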

2.4 Creating the MseNode Class

Used to calculate the difference between predicted and actual values.

  1. MSE = (label – prediction) ^ 2 / n_label
  2. dMSE / dLabel = 2 * (label – prediction) / n_label
  3. dMSE / dPrediction = -2 * (label – prediction) / n_label
class MseNode(BaseNode):
    def __init__(self, label: InputNode, pred: LinearNode, name=None):
        BaseNode.__init__(self, label, pred, name=name)
        self.n_label = None
        self.diff = None

    def forward(self):
        label, pred = self.inbound_nodes
        self.n_label = label.value.shape[0]
        self.diff = (label.value - pred.value).reshape(-1, 1)
        self.value = np.mean(self.diff**2)

    def backward(self):
        label, pred = self.inbound_nodes
        self.gradients[label] = (2 / self.n_label) * self.diff
        self.gradients[pred] = -self.gradients[label]
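
Note the sign: because the loss is written with (label - prediction), the gradient with respect to the prediction is the negative of the gradient with respect to the label. A quick numeric check of my own:

import numpy as np

label = np.array([1.0, 2.0, 3.0])
pred = np.array([1.5, 1.0, 2.0])
n = label.shape[0]

def mse(p):
    return np.mean((label - p) ** 2)

eps = 1e-6
step = np.array([eps, 0.0, 0.0])
# Finite-difference gradient with respect to the first prediction.
numeric = (mse(pred + step) - mse(pred - step)) / (2 * eps)
analytic = -2 / n * (label[0] - pred[0])
print(round(numeric, 4), round(analytic, 4))  # 0.3333 0.3333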

2.5 Creating the SigmoidNode Class

Used to calculate the Sigmoid value.

  1. Y = 1 / (1 + e^(-X))
  2. dY / dX = Y * (1 – Y)
class SigmoidNode(BaseNode):
    def __init__(self, input_node: LinearNode, name=None):
        BaseNode.__init__(self, input_node, name=name)

    @staticmethod
    def _sigmoid(arr: ndarray) -> ndarray:
        return 1. / (1. + np.exp(-arr))

    @staticmethod
    def _derivative(arr: ndarray) -> ndarray:
        return arr * (1 - arr)

    def forward(self):
        input_node = self.inbound_nodes[0]
        self.value = self._sigmoid(input_node.value)

    def backward(self):
        input_node = self.inbound_nodes[0]
        self.gradients = {input_node: np.zeros_like(input_node.value)}
        for output_node in self.outbound_nodes:
            grad_cost = output_node.gradients[self]
            self.gradients[input_node] += self._derivative(self.value) * grad_cost

2.6 Creating the WeightNode Class

Used to store and update weights.

class WeightNode(BaseNode):
    def __init__(self, shape: Union[Tuple[int, int], int],
                 name=None, learning_rate=None):
        BaseNode.__init__(self, name=name)
        if isinstance(shape, int):
            self.value = np.zeros(shape)
        if isinstance(shape, tuple):
            self.value = np.random.randn(*shape)
        self.learning_rate = learning_rate

    def forward(self):
        pass

    def backward(self):
        self.gradients = {self: 0}
        for node in self.outbound_nodes:
            self.gradients[self] += node.gradients[self]
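        # Vanilla gradient descent: step against the accumulated gradient.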
        partial = self.gradients[self]
        self.value -= partial * self.learning_rate
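
Note the initialization: a bias (integer shape) starts at zeros, while a weight matrix (tuple shape) starts from a standard normal distribution; the update in backward() is plain gradient descent.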

2.7 Creating a Fully connected Neural network Class

class MLP:
    def __init__(self):
        self.nodes_sorted = []
        self._learning_rate = None
        self.data = None
        self.prediction = None
        self.label = None

2.8 Network Structure

def __str__(self):
    if not self.nodes_sorted:
        return "Network has not be trained yet!"
    print("Network informantion:\n")
    ret = ["learning rate:", str(self._learning_rate), "\n"]
    for node in self.nodes_sorted:
        ret.append(node.name)
        ret.append(str(node.value.shape))
        ret.append("\n")
    return "".join(ret)

2.9 Learning Rate

Store the learning rate and assign it to all the weight nodes.

@property
def learning_rate(self) -> float:
    return self._learning_rate

@learning_rate.setter
def learning_rate(self, learning_rate):
    self._learning_rate = learning_rate
    for node in self.nodes_sorted:
        if isinstance(node, WeightNode):
            node.learning_rate = learning_rate

2.10 Topological Sorting

Implement topological sorting so that the nodes are arranged in the order in which they must be updated.

def topological_sort(self, input_nodes):
    nodes_sorted = []
    que = copy(input_nodes)
    unique = set()
    while que:
        node = que.pop(0)
        nodes_sorted.append(node)
        unique.add(node)
        for outbound_node in node.outbound_nodes:
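            # A node is ready only after all of its input nodes have been visited.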
            if all(x in unique for x in outbound_node.inbound_nodes):
                que.append(outbound_node)
    self.nodes_sorted = nodes_sorted

2.11 Forward and Backward Propagation

def forward(self):
    assert self.nodes_sorted, "nodes_sorted is empty!"
    for node in self.nodes_sorted:
        node.forward()

def backward(self):
    assert self.nodes_sorted, "nodes_sorted is empty!"
    for node in self.nodes_sorted[::-1]:
        node.backward()

def forward_and_backward(self):
    self.forward()
    self.backward()

2.12 Establishing a fully connected neural network

def build_network(self, data: ndarray, label: ndarray, n_hidden: int, n_feature: int):
    weight_node1 = WeightNode(shape=(n_feature, n_hidden), name="W1")
    bias_node1 = WeightNode(shape=n_hidden, name="b1")
    weight_node2 = WeightNode(shape=(n_hidden, 1), name="W2")
    bias_node2 = WeightNode(shape=1, name="b2")
    self.data = InputNode(data, name="X")
    self.label = InputNode(label, name="y")
    linear_node1 = LinearNode(
        self.data, weight_node1, bias_node1, name="l1")
    sigmoid_node1 = SigmoidNode(linear_node1, name="s1")
    self.prediction = LinearNode(
        sigmoid_node1, weight_node2, bias_node2, name="prediction")
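    # MseNode registers itself with its inputs, so no reference needs to be kept.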
    MseNode(self.label, self.prediction, name="mse")
    input_nodes = [weight_node1, bias_node1,
                    weight_node2, bias_node2, self.data, self.label]
    self.topological_sort(input_nodes)
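
The resulting computation graph, written out as text (my own sketch of what the code above wires together):

X, W1, b1 -> l1 -> s1
s1, W2, b2 -> prediction
prediction, y -> mse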

2.13 Training the Model

The model is trained with stochastic gradient descent, sampling a random batch at each step.

def train_network(self, epochs: int, n_sample: int, batch_size: int):
    steps_per_epoch = n_sample // batch_size
    for i in range(epochs):
        loss = 0
        for _ in range(steps_per_epoch):
            indexes = choice(n_sample, batch_size, replace=True)
            self.data.indexes = indexes
            self.label.indexes = indexes
            self.forward_and_backward()
            loss += self.nodes_sorted[-1].value
        print("Epoch: {}, Loss: {:.3f}".format(i + 1, loss / steps_per_epoch))
    print()

2.14 Removing Unnecessary Nodes

After the model is trained, remove the MSE and label nodes.

def pop_unused_nodes(self):
    for _ in range(len(self.nodes_sorted)):
        node = self.nodes_sorted.pop(0)
        if node.name in ("mse", "y"):
            continue
        self.nodes_sorted.append(node)

2.15 Fitting the Model

def fit(self, data: ndarray, label: ndarray, n_hidden: int, epochs: int,
        batch_size: int, learning_rate: float):
    label = label.reshape(-1, 1)
    n_sample, n_feature = data.shape
    self.build_network(data, label, n_hidden, n_feature)
    self.learning_rate = learning_rate
    print("Total number of samples = {}".format(n_sample))
    self.train_network(epochs, n_sample, batch_size)
    self.pop_unused_nodes()

2.16 Predicting Multiple Samples

def predict(self, data: ndarray) -> ndarray:
    self.data.value = data
    self.data.indexes = range(data.shape[0])
    self.forward()
    return self.prediction.value.flatten()
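
Usage looks like this (assuming a fitted model reg and a test array data_test, as in section 3.1 below):

predictions = reg.predict(data_test)  # 1-D array, one prediction per row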

3. Effect Evaluation

3.1 The Main Function

Using the famous Boston house-price data set, split into a training set and a test set at a 7:3 ratio, we train the model and measure its goodness of fit.

@run_time
def main():
    print("Tesing the performance of MLP....")
    data, label = load_boston_house_prices()
    data = min_max_scale(data)
    data_train, data_test, label_train, label_test = train_test_split(
        data, label, random_state=20)
    reg = MLP()
    reg.fit(data=data_train, label=label_train, n_hidden=8,
            epochs=1000, batch_size=8, learning_rate=0.0008)
    get_r2(reg, data_test, label_test)
    print(reg)

3.2 Effect Display

The goodness of fit is 0.803 and the running time is 6.9 seconds.

3.3 Tool Functions

I have written some utility functions, which can be viewed on GitHub:

https://github.com/tushushu/imylu/tree/master/imylu/utils

  1. run_time – tests a function's running time
  2. load_boston_house_prices – loads the Boston house-price data
  3. train_test_split – splits data into a training set and a test set

Conclusion

What does a fully connected neural network boil down to? Matrix multiplication.
