Author: Li Xiaowen, who has worked in data analysis and data mining, programming mainly in Python, and now works as an algorithm engineer at a small Internet company.
Github: github.com/tushushu
1. Principles
Let's talk about what a fully connected neural network is in plain words rather than big mathematical formulas.
1.1 Network Structure
This soul artist drew a rough network structure diagram in PPT, as follows:
1.2 Sigmoid Function
The expression for the Sigmoid function is:

y = 1 / (1 + e^(-x))

It is not difficult to conclude that:

dy / dx = e^(-x) / (1 + e^(-x))^2 = y * (1 - y)

So the range of the Sigmoid function is (0, 1), and its derivative is y * (1 - y).
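As a quick sanity check, here is a minimal snippet (my own illustration, not part of the original implementation) that compares the analytic derivative y * (1 - y) with a numerical finite difference:

import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

x = 0.5
y = sigmoid(x)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = y * (1 - y)
print(round(numeric, 4), round(analytic, 4))  # both print 0.235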
1.3 Chain Rule
Suppose z = f(y) and y = g(x). Then:

dz / dy = f'(y)
dy / dx = g'(x)
dz / dx = dz / dy * dy / dx = f'(y) * g'(x)
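For example, if f(y) = y ** 2 and g(x) = 3 * x, then z = 9 * x ** 2 and dz / dx = 2y * 3 = 18x. A tiny numerical check (illustration only, not from the original article):

def g(x): return 3 * x        # y = g(x)
def f(y): return y ** 2       # z = f(y)

x = 2.0
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
print(round(numeric, 4), 18 * x)  # 36.0 36.0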
1.4 Forward Propagation
Each node takes the values of all of its input nodes and computes its own value from them; that value in turn serves as an input to the node's output nodes.
1.5 Back Propagation
The gradient of the loss with respect to the current node is collected from its output nodes; multiplying it by the partial derivative of the current node with respect to each of its input nodes gives the gradient of the loss with respect to that input node.
1.6 Topology Sorting
Assume there are k nodes in the neural network. Any node may have multiple input nodes, so the order in which nodes are executed has to be considered. The rule is that a node can be executed only after all of its input nodes have been executed, as illustrated by the small sketch below.
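Here is a generic sketch of this principle using in-degree counting (illustration only; the implementation in section 2.10 below takes a slightly different but equivalent approach, and the node names simply mirror the ones used later in the article):

from collections import deque

# Each node maps to the nodes that consume its output.
edges = {"X": ["l1"], "W1": ["l1"], "b1": ["l1"], "l1": ["s1"], "s1": []}
indegree = {n: 0 for n in edges}
for outs in edges.values():
    for out in outs:
        indegree[out] += 1

que = deque(n for n, d in indegree.items() if d == 0)
order = []
while que:
    node = que.popleft()
    order.append(node)
    for out in edges[node]:
        indegree[out] -= 1
        if indegree[out] == 0:   # all inputs of `out` have now been executed
            que.append(out)

print(order)  # ['X', 'W1', 'b1', 'l1', 's1'] -- 'l1' comes only after all of its inputs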
2. Implementation
I implement a fully connected neural network in Python, the simplest programming language in the universe, easy to learn and use. Below is a brief walkthrough of the implementation; for more detailed comments please refer to my GitHub code.
2.1 Creating the BaseNode Abstract class
Use BaseNode as the parent of each type of Node. It includes the following attributes:
- name – node name
- value – node data
- inbound_nodes – input nodes
- outbound_nodes – output nodes
- gradients – gradients of the loss with respect to the input nodes
# Imports used by the code throughout this article.
from abc import ABC, abstractmethod
from copy import copy
from typing import Tuple, Union

import numpy as np
from numpy import ndarray
from numpy.random import choice


class BaseNode(ABC):
    def __init__(self, *inbound_nodes, name=None):
        self.name = name
        self._value = None
        self.inbound_nodes = [x for x in inbound_nodes]
        self.outbound_nodes = []
        self.gradients = dict()
        # Register this node as an output of each of its input nodes.
        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)

    def __str__(self):
        size = str(self.value.shape) if self.value is not None else "null"
        return "<Node name: %s, Node size: %s>" % (self.name, size)

    @property
    def value(self) -> ndarray:
        return self._value

    @value.setter
    def value(self, value):
        err_msg = "'value' has to be a number or a numpy array!"
        assert isinstance(value, (ndarray, int, float)), err_msg
        self._value = value

    @abstractmethod
    def forward(self):
        return

    @abstractmethod
    def backward(self):
        return
2.2 Creating an InputNode Class
Used to store the training and test data. The indexes attribute stores the indices of the samples in the current batch.
class InputNode(BaseNode):
    def __init__(self, value: ndarray, name=None):
        BaseNode.__init__(self, name=name)
        self.value = value
        self.indexes = None

    @property
    def value(self):
        err_msg = "Indexes is None!"
        assert self.indexes is not None, err_msg
        # Only return the rows belonging to the current batch.
        return self._value[self.indexes]

    @value.setter
    def value(self, value: ndarray):
        BaseNode.value.fset(self, value)

    def forward(self):
        return

    def backward(self):
        self.gradients = {self: 0}
        for node in self.outbound_nodes:
            self.gradients[self] += node.gradients[self]
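A minimal usage sketch (my own example, not from the article) showing how the indexes attribute slices out one batch:

import numpy as np

X = np.arange(12).reshape(6, 2)     # 6 samples, 2 features
node = InputNode(X, name="X")
node.indexes = [0, 3, 5]            # indices of the current batch
print(node.value)                   # rows 0, 3 and 5 of X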
2.3 Creating the LinearNode class
Used to perform linear operations.
- Y = WX + Bias
- dY / dX = W
- dY / dW = X
- dY / dBias = 1
class LinearNode(BaseNode):
    def __init__(self, data: BaseNode, weights: "WeightNode", bias: "WeightNode", name=None):
        BaseNode.__init__(self, data, weights, bias, name=name)

    def forward(self):
        data, weights, bias = self.inbound_nodes
        self.value = np.dot(data.value, weights.value) + bias.value

    def backward(self):
        data, weights, bias = self.inbound_nodes
        self.gradients = {node: np.zeros_like(node.value) for node in self.inbound_nodes}
        for node in self.outbound_nodes:
            grad_cost = node.gradients[self]
            self.gradients[data] += np.dot(grad_cost, weights.value.T)
            self.gradients[weights] += np.dot(data.value.T, grad_cost)
            self.gradients[bias] += np.sum(grad_cost, axis=0, keepdims=False)
2.4 Creating the MseNode class
Used to calculate the difference between predicted and actual values.
- MSE = (label – prediction) ^ 2 / n_label
- dMSE / dLabel = 2 * (label – prediction) / n_label
- dMSE / dPrediction = -2 * (label – prediction) / n_label
class MseNode(BaseNode):
    def __init__(self, label: InputNode, pred: LinearNode, name=None):
        BaseNode.__init__(self, label, pred, name=name)
        self.n_label = None
        self.diff = None

    def forward(self):
        label, pred = self.inbound_nodes
        self.n_label = label.value.shape[0]
        self.diff = (label.value - pred.value).reshape(-1, 1)
        self.value = np.mean(self.diff ** 2)

    def backward(self):
        label, pred = self.inbound_nodes
        self.gradients[label] = (2 / self.n_label) * self.diff
        self.gradients[pred] = -self.gradients[label]
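A quick finite-difference check of the dMSE / dPrediction formula (illustration only, not part of the original code):

import numpy as np

label = np.array([1.0, 2.0, 3.0])
pred = np.array([1.5, 1.5, 2.0])
n_label = label.shape[0]

def mse(p):
    return np.mean((label - p) ** 2)

analytic = -2 * (label - pred) / n_label    # dMSE / dPrediction
eps = 1e-6
numeric = np.array([(mse(pred + eps * np.eye(3)[i]) - mse(pred - eps * np.eye(3)[i])) / (2 * eps)
                    for i in range(3)])
print(np.allclose(analytic, numeric))       # True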
2.5 Creating the SigmoidNode class
Used to calculate the Sigmoid value.
- Y = 1 / (1 + e^(-X))
- dY / dX = Y * (1 – Y)
class SigmoidNode(BaseNode):
    def __init__(self, input_node: LinearNode, name=None):
        BaseNode.__init__(self, input_node, name=name)

    @staticmethod
    def _sigmoid(arr: ndarray) -> ndarray:
        return 1. / (1. + np.exp(-arr))

    @staticmethod
    def _derivative(arr: ndarray) -> ndarray:
        return arr * (1 - arr)

    def forward(self):
        input_node = self.inbound_nodes[0]
        self.value = self._sigmoid(input_node.value)

    def backward(self):
        input_node = self.inbound_nodes[0]
        self.gradients = {input_node: np.zeros_like(input_node.value)}
        for output_node in self.outbound_nodes:
            grad_cost = output_node.gradients[self]
            self.gradients[input_node] += self._derivative(self.value) * grad_cost
2.6 Creating the WeightNode class
Used to store and update weights.
class WeightNode(BaseNode):
    def __init__(self, shape: Union[Tuple[int, int], int], name=None, learning_rate=None):
        BaseNode.__init__(self, name=name)
        # Biases (int shape) start at zero; weight matrices (tuple shape) start random.
        if isinstance(shape, int):
            self.value = np.zeros(shape)
        if isinstance(shape, tuple):
            self.value = np.random.randn(*shape)
        self.learning_rate = learning_rate

    def forward(self):
        pass

    def backward(self):
        self.gradients = {self: 0}
        for node in self.outbound_nodes:
            self.gradients[self] += node.gradients[self]
        partial = self.gradients[self]
        # Gradient descent update.
        self.value -= partial * self.learning_rate
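With WeightNode defined, here is a small sketch (my own example, not from the article) that wires a LinearNode to an InputNode and two WeightNodes and runs its forward pass to check the output shape:

import numpy as np

X = np.random.randn(4, 3)                    # 4 samples, 3 features
data = InputNode(X, name="X")
data.indexes = range(4)                      # use all 4 samples
weights = WeightNode(shape=(3, 2), name="W")
bias = WeightNode(shape=2, name="b")

linear = LinearNode(data, weights, bias, name="l")
linear.forward()
print(linear.value.shape)                    # (4, 2)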
2.7 Creating a Fully Connected Neural Network Class
class MLP:
    def __init__(self):
        self.nodes_sorted = []
        self._learning_rate = None
        self.data = None
        self.prediction = None
        self.label = None
2.8 Network Information
    def __str__(self):
        if not self.nodes_sorted:
            return "Network has not been trained yet!"
        print("Network information:\n")
        ret = ["learning rate:", str(self._learning_rate), "\n"]
        for node in self.nodes_sorted:
            ret.append(node.name)
            ret.append(str(node.value.shape))
            ret.append("\n")
        return "".join(ret)
2.9 Learning Rate
Store the learning rate and assign it to all the weight nodes.
    @property
    def learning_rate(self) -> float:
        return self._learning_rate

    @learning_rate.setter
    def learning_rate(self, learning_rate):
        self._learning_rate = learning_rate
        for node in self.nodes_sorted:
            if isinstance(node, WeightNode):
                node.learning_rate = learning_rate
2.10 Topology Sorting
Implement topological sorting so that the nodes are arranged in update order.
    def topological_sort(self, input_nodes):
        nodes_sorted = []
        que = copy(input_nodes)
        unique = set()
        while que:
            node = que.pop(0)
            nodes_sorted.append(node)
            unique.add(node)
            # A node joins the queue only once all of its input nodes have been visited.
            for outbound_node in node.outbound_nodes:
                if all(x in unique for x in outbound_node.inbound_nodes):
                    que.append(outbound_node)
        self.nodes_sorted = nodes_sorted
2.11 Forward and Back Propagation
    def forward(self):
        assert self.nodes_sorted, "nodes_sorted is empty!"
        for node in self.nodes_sorted:
            node.forward()

    def backward(self):
        assert self.nodes_sorted, "nodes_sorted is empty!"
        for node in self.nodes_sorted[::-1]:
            node.backward()

    def forward_and_backward(self):
        self.forward()
        self.backward()
2.12 Establishing a fully connected neural network
    def build_network(self, data: ndarray, label: ndarray, n_hidden: int, n_feature: int):
        weight_node1 = WeightNode(shape=(n_feature, n_hidden), name="W1")
        bias_node1 = WeightNode(shape=n_hidden, name="b1")
        weight_node2 = WeightNode(shape=(n_hidden, 1), name="W2")
        bias_node2 = WeightNode(shape=1, name="b2")
        self.data = InputNode(data, name="X")
        self.label = InputNode(label, name="y")
        linear_node1 = LinearNode(
            self.data, weight_node1, bias_node1, name="l1")
        sigmoid_node1 = SigmoidNode(linear_node1, name="s1")
        self.prediction = LinearNode(
            sigmoid_node1, weight_node2, bias_node2, name="prediction")
        MseNode(self.label, self.prediction, name="mse")
        input_nodes = [weight_node1, bias_node1,
                       weight_node2, bias_node2, self.data, self.label]
        self.topological_sort(input_nodes)
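As a sanity check (hypothetical usage, not in the original article), you can print the names of the sorted nodes after building the network; l1 should appear only after X, W1 and b1, and mse should come last:

import numpy as np

data = np.random.randn(10, 3)
label = np.random.randn(10, 1)
mlp = MLP()
mlp.build_network(data, label, n_hidden=8, n_feature=3)
print([node.name for node in mlp.nodes_sorted])
# ['W1', 'b1', 'W2', 'b2', 'X', 'y', 'l1', 's1', 'prediction', 'mse']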
2.13 Training model
The model is trained with stochastic gradient descent.
    def train_network(self, epochs: int, n_sample: int, batch_size: int):
        steps_per_epoch = n_sample // batch_size
        for i in range(epochs):
            loss = 0
            for _ in range(steps_per_epoch):
                # Sample one batch with replacement.
                indexes = choice(n_sample, batch_size, replace=True)
                self.data.indexes = indexes
                self.label.indexes = indexes
                self.forward_and_backward()
                loss += self.nodes_sorted[-1].value
            print("Epoch: {}, Loss: {:.3f}".format(i + 1, loss / steps_per_epoch))
        print()
2.14 Removing Unnecessary Nodes
After the model is trained, remove the MSE and label nodes, which are not needed for prediction.
    def pop_unused_nodes(self):
        for _ in range(len(self.nodes_sorted)):
            node = self.nodes_sorted.pop(0)
            if node.name in ("mse", "y"):
                continue
            self.nodes_sorted.append(node)
2.15 Fitting the Model
    def fit(self, data: ndarray, label: ndarray, n_hidden: int, epochs: int,
            batch_size: int, learning_rate: float):
        label = label.reshape(-1, 1)
        n_sample, n_feature = data.shape
        self.build_network(data, label, n_hidden, n_feature)
        self.learning_rate = learning_rate
        print("Total number of samples = {}".format(n_sample))
        self.train_network(epochs, n_sample, batch_size)
        self.pop_unused_nodes()
2.16 Prediction of multiple samples
    def predict(self, data: ndarray) -> ndarray:
        self.data.value = data
        self.data.indexes = range(data.shape[0])
        self.forward()
        return self.prediction.value.flatten()
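Before evaluating on the Boston data, here is a minimal end-to-end sketch on synthetic data (my own example; the hyperparameters are arbitrary):

import numpy as np

np.random.seed(0)
data = np.random.rand(200, 2)                              # 200 samples, 2 features
label = 3 * data[:, 0] - 2 * data[:, 1] + 0.01 * np.random.randn(200)

reg = MLP()
reg.fit(data=data, label=label, n_hidden=4, epochs=10,
        batch_size=16, learning_rate=0.01)
pred = reg.predict(data[:5])
print(pred.shape)                                          # (5,)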
3. Effect Evaluation
3.1 The main Function
We use the famous Boston house price data set, split it into a training set and a test set at a 7:3 ratio, train the model, and then evaluate its accuracy.
@run_time
def main():
    # run_time, load_boston_house_prices, min_max_scale, train_test_split and
    # get_r2 are helper functions from imylu/utils (see section 3.3).
    print("Testing the performance of MLP...")
    data, label = load_boston_house_prices()
    data = min_max_scale(data)
    data_train, data_test, label_train, label_test = train_test_split(
        data, label, random_state=20)
    reg = MLP()
    reg.fit(data=data_train, label=label_train, n_hidden=8,
            epochs=1000, batch_size=8, learning_rate=0.0008)
    get_r2(reg, data_test, label_test)
    print(reg)
3.2 Effect Display
The goodness of fit (R²) is 0.803, and the running time is 6.9 seconds.
3.3 Tool Functions
I wrote some utility functions myself, which can be viewed on GitHub:
https://github.com/tushushu/imylu/tree/master/imylu/utils
1. run_time – measures a function's running time
2. load_boston_house_prices – loads the Boston house price data
3. train_test_split – splits the data into a training set and a test set
4. min_max_scale – scales features to the [0, 1] range
5. get_r2 – computes the goodness of fit (R²)
Conclusion
What is a fully connected neural network, after all? Matrix multiplication.