Deep Learning From Scratch II: Perceptrons
Translation: Sun Yimeng
- Chapter 1: Computational Graphs
- Chapter 2: Perceptrons
- Chapter 3: Training Criteria
- Chapter 4: Gradient Descent and Backpropagation
- Chapter 5: Multilayer Perceptrons
- Chapter 6: TensorFlow
Perceptrons
An illustrative example
A perceptron is a simple form of neural network and a basic building block of more complex architectures.
Before we go into the details, let's look at an illustrative example. Suppose we have a dataset of 100 points in the plane, half of which are red and half of which are blue.
Run the code below and look at how the points are distributed.
import numpy as np
import matplotlib.pyplot as plt
# Create 50 red points centered at (-2, -2)
red_points = np.random.randn(50, 2) - 2*np.ones((50, 2))
# Create 50 blue points centered at (2, 2)
blue_points = np.random.randn(50, 2) + 2*np.ones((50, 2))
# Draw both the red and blue dots on the graph
plt.scatter(red_points[:,0], red_points[:,1], color='red')
plt.scatter(blue_points[:,0], blue_points[:,1], color='blue')
In the plot, the red points cluster around $(-2, -2)$ and the blue points cluster around $(2, 2)$. Looking at the data, do you think there is a way to tell whether a point is red or blue?
If I asked you what color the point $(3, 2)$ is, you would immediately say blue. Even though this exact point is not in the data above, we can still infer its color from the region it lies in (the blue one).
But is there a more general way to conclude that it is more likely to be blue? Clearly, we can draw a line through the plot above that perfectly divides the space into a red region and a blue region.
# Draw a line y = -x
x_axis = np.linspace(-4, 4, 100)
y_axis = -x_axis
plt.plot(x_axis, y_axis)
Instead of drawing the line explicitly, we can represent it implicitly with a **weight vector** $w$ and a **bias** $b$: the line is then the set of points $x$ that satisfy

$$w^T x + b = 0$$

Plugging in the data from the example above gives $w = (1, 1)^T$ and $b = 0$, so the line $y = -x$ is the set of points where $x_1 + x_2 = 0$.

Now, to determine whether a point is red or blue, we only need to check whether it lies above or below the line: plug the point $x$ into $w^T x + b$ and look at the sign of the result. If it is positive, $x$ lies above the line; if it is negative, $x$ lies below it.

For the point $(3, 2)$ from above: $w^T x + b = 1 \cdot 3 + 1 \cdot 2 + 0 = 5 > 0$, so it lies above the line and is therefore blue.
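As a quick sanity check, here is the same computation in plain NumPy (a minimal sketch using the values from above):

import numpy as np

w = np.array([1, 1])   # weight vector of the line y = -x
b = 0                  # bias

x = np.array([3, 2])   # the point in question
score = w.dot(x) + b   # w^T x + b
print(score)           # 5 -> positive, so the point lies above the line: blue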
Definition of a perceptron
In general, a **classifier** is a function $\hat{c} : \mathbb{R}^d \to \{1, 2, \dots, C\}$ that maps a point onto one of $C$ categories. A **binary classifier** is a classifier with exactly two categories ($C = 2$).

The perceptron we used to separate the red and blue points is a binary classifier with weights $w \in \mathbb{R}^d$ and bias $b \in \mathbb{R}$:

$$\hat{c}(x) = \begin{cases} 1 & \text{if } w^T x + b \geq 0 \\ 2 & \text{if } w^T x + b < 0 \end{cases}$$

The hyperplane $w^T x + b = 0$ divides the space into two halves, each corresponding to one of the two categories.

In the red/blue example the points are two-dimensional ($d = 2$), so the space is divided along a line. Generalized to $d$ dimensions, the partition is always along a hyperplane of $d - 1$ dimensions.
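To make the definition concrete, here is a minimal sketch of such a binary perceptron as a plain Python function (the function name is ours, for illustration only; it is not part of the graph framework used below):

import numpy as np

def perceptron_classify(x, w, b):
    """Return category 1 if w^T x + b >= 0, otherwise category 2."""
    return 1 if w.dot(x) + b >= 0 else 2

# The red/blue perceptron: category 1 = blue, category 2 = red
print(perceptron_classify(np.array([3, 2]), np.array([1, 1]), 0))    # 1 (blue)
print(perceptron_classify(np.array([-4, -1]), np.array([1, 1]), 0))  # 2 (red)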
From categories to probabilities
In practice, we often don't just want to know which category a point most likely falls into; we also want to know the probability with which it belongs to that category.
In the red/blue case: if we plug the data of a point $x$ into $w^T x + b$, then the larger the resulting value, the further the point lies from the dividing line, and the more confident we are that it is blue.
But for such a value on its own there is no absolute sense of "big" or "small". So, to turn this value into a probability, we squash it so that the results lie between 0 and 1.
This can be done with the **sigmoid** function $\sigma$:

$$\sigma(a) = \frac{1}{1 + e^{-a}}$$

where $a = w^T x + b$.
Let’s look at the implementation of the sigmoid function:
import matplotlib.pyplot as plt
import numpy as np
# Create an interval from -5 to 5 with step 0.01
a = np.arange(-5, 5, 0.01)
# Calculate the value of the corresponding sigmoid function
s = 1 / (1 + np.exp(-a))
# Draw the result
plt.plot(a, s)
plt.grid(True)
plt.show()
As the figure shows, when $a = 0$, i.e. when the point lies exactly on the dividing line, the sigmoid returns a probability of 0.5. As $a$ grows toward $+\infty$, $\sigma(a)$ approaches 1; as $a$ falls toward $-\infty$, $\sigma(a)$ approaches 0.
This matches our expectations.
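For the example point $(3, 2)$ from before, $a = w^T x + b = 5$, so the perceptron's confidence that the point is blue is $\sigma(5)$. A quick check:

import numpy as np
print(1 / (1 + np.exp(-5)))  # ~0.9933 -> very confident that (3, 2) is blue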
To compute the sigmoid as part of our computational graph, we first need the framework from the previous chapter (`Operation`, `Graph`, `placeholder`, `Variable`, `add`, `matmul`, `Session`), reproduced here:
class Operation:
    """Represents a graph node that performs a computation.

    An `Operation` is a node in a `Graph` that takes zero or more objects
    as input, and produces zero or more objects as output.
    """

    def __init__(self, input_nodes=[]):
        """Construct Operation"""
        self.input_nodes = input_nodes

        # Initialize list of consumers (i.e. nodes that receive this
        # operation's output as input)
        self.consumers = []

        # Append this operation to the list of consumers of all input nodes
        for input_node in input_nodes:
            input_node.consumers.append(self)

        # Append this operation to the list of operations in the currently
        # active default graph
        _default_graph.operations.append(self)

    def compute(self):
        """Computes the output of this operation.

        Must be implemented by the particular operation.
        """
        pass

class Graph:
    """Represents a computational graph"""

    def __init__(self):
        """Construct Graph"""
        self.operations = []
        self.placeholders = []
        self.variables = []

    def as_default(self):
        global _default_graph
        _default_graph = self

class placeholder:
    """Represents a placeholder node that has to be provided with a value
    when computing the output of a computational graph
    """

    def __init__(self):
        """Construct placeholder"""
        self.consumers = []

        # Append this placeholder to the list of placeholders in the
        # currently active default graph
        _default_graph.placeholders.append(self)

class Variable:
    """Represents a variable (i.e. an intrinsic, changeable parameter of a
    computational graph).
    """

    def __init__(self, initial_value=None):
        """Construct Variable

        Args:
          initial_value: The initial value of this variable
        """
        self.value = initial_value
        self.consumers = []

        # Append this variable to the list of variables in the currently
        # active default graph
        _default_graph.variables.append(self)

class add(Operation):
    """Returns x + y element-wise."""

    def __init__(self, x, y):
        """Construct add

        Args:
          x: First summand node
          y: Second summand node
        """
        super().__init__([x, y])

    def compute(self, x_value, y_value):
        """Compute the output of the add operation

        Args:
          x_value: First summand value
          y_value: Second summand value
        """
        return x_value + y_value

class matmul(Operation):
    """Multiplies matrix a by matrix b, producing a * b."""

    def __init__(self, a, b):
        """Construct matmul

        Args:
          a: First matrix
          b: Second matrix
        """
        super().__init__([a, b])

    def compute(self, a_value, b_value):
        """Compute the output of the matmul operation

        Args:
          a_value: First matrix value
          b_value: Second matrix value
        """
        return a_value.dot(b_value)

class Session:
    """Represents a particular execution of a computational graph."""

    def run(self, operation, feed_dict={}):
        """Computes the output of an operation

        Args:
          operation: The operation whose output we'd like to compute.
          feed_dict: A dictionary that maps placeholders to values for
            this session
        """
        # Perform a post-order traversal of the graph to bring the nodes
        # into the right order
        nodes_postorder = traverse_postorder(operation)

        # Iterate all nodes to determine their value
        for node in nodes_postorder:
            if type(node) == placeholder:
                # Set the node value to the placeholder value from feed_dict
                node.output = feed_dict[node]
            elif type(node) == Variable:
                # Set the node value to the variable's value attribute
                node.output = node.value
            else:  # Operation
                # Get the input values for this operation from the outputs
                # of its input nodes
                node.inputs = [input_node.output for input_node in node.input_nodes]

                # Compute the output of this operation
                node.output = node.compute(*node.inputs)

            # Convert lists to numpy arrays
            if type(node.output) == list:
                node.output = np.array(node.output)

        # Return the requested node value
        return operation.output

def traverse_postorder(operation):
    """Performs a post-order traversal, returning a list of nodes in the
    order in which they have to be computed

    Args:
      operation: The operation to start traversal at
    """
    nodes_postorder = []

    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)

    recurse(operation)
    return nodes_postorder
With this framework in place, we can now define the sigmoid operation, which we will use later:
class sigmoid(Operation):
    """Returns the sigmoid of a element-wise."""

    def __init__(self, a):
        """Construct sigmoid

        Args:
          a: Input node
        """
        super().__init__([a])

    def compute(self, a_value):
        """Compute the output of the sigmoid operation

        Args:
          a_value: Input value
        """
        return 1 / (1 + np.exp(-a_value))
Example
Now we can build a perceptron in Python to solve the red/blue problem from before, and use it to compute the probability that the point $(3, 2)$ is blue:
# Create a new graph
Graph().as_default()
x = placeholder()
w = Variable([1, 1])
b = Variable(0)
p = sigmoid( add(matmul(w, x), b) )
session = Session()
print(session.run(p, {
x: [3, 2]
}))
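If everything is wired up correctly, this prints approximately 0.9933, i.e. $\sigma(5)$, matching the hand computation above.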
Multi-class perceptron
So far, we have only used the perceptron as a binary classifier to compute the probability $p$ that a point belongs to one of two categories; the probability of the other category is then simply $1 - p$.
But more often than not there are more than two categories. When classifying images, for example, there may be many output categories (dog, chair, person, house, and so on).
We therefore extend the perceptron so that it outputs a probability for each of the categories.
We’re still taking constantsAs the number of categories. But I’m not going to use the binaryWeight vector
But rather to introduceWeight matrix
.
Each column of the weight matrix contains the weights in a separate linear classifier, one for each category.
In the binary case, we’re going to calculate 的The dot productAnd now we have to calculate. To calculateReturns a locationThe vector of theta, its terms can be viewed asWeight matrix
The dot product of different columns.
And then we take the vector 1, 2addThe offset vector
. vectorAn item of. Corresponds to a category.
This generates a location atEach term of this vector represents points belonging to a certain category (total).
It may look complicated, but this matrix multiplication, in parallel, isFor each of the categories, their respective correspondingLinear classifier
Well, each of them has its own line, and that line can still be divided, as we did in the red and blue problem, by the sum of the given weightsbias
Implicitly, except in this case,Weight vector
byWeight matrix
Each column is provided, whilebias
It isThe terms of the vector.
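A minimal NumPy sketch of the shapes involved (the concrete numbers here are made up for illustration):

import numpy as np

d, C = 2, 3                  # input dimension, number of categories
x = np.array([3.0, 2.0])     # one point, a row vector of shape (d,)
W = np.random.randn(d, C)    # weight matrix: one column per category
b = np.random.randn(C)       # bias vector: one entry per category

a = x.dot(W) + b             # xW + b: one score per category
print(a.shape)               # (3,)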
1. Softmax
The original perceptron generates a single scalar, and with sigmoid, we compress that scalar to get a probability between zero and one.
The perceptron generalized to multiple categories instead generates a vector $a \in \mathbb{R}^C$. Likewise, the higher the $i$-th entry of $a$, the more confident we are that the input point belongs to the $i$-th category.
So we want to transform the vector $a$ into a vector in which each entry represents the probability that the input belongs to the corresponding category, with every entry between 0 and 1 and all entries summing to 1.
The usual way to do this is the softmax function, which is a generalization of the sigmoid to multi-class output:
$$\sigma(a)_i = \frac{e^{a_i}}{\sum_{j=1}^C e^{a_j}}$$
class softmax(Operation):
    """Returns the softmax of a."""

    def __init__(self, a):
        """Construct softmax

        Args:
          a: Input node
        """
        super().__init__([a])

    def compute(self, a_value):
        """Compute the output of the softmax operation

        Args:
          a_value: Input value
        """
        return np.exp(a_value) / np.sum(np.exp(a_value), axis=1)[:, None]
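As a quick standalone check (plain NumPy, independent of the graph classes), each row of the softmax output behaves like a probability distribution:

import numpy as np

a = np.array([[2.0, 1.0, 0.1]])                     # one row of category scores
s = np.exp(a) / np.sum(np.exp(a), axis=1)[:, None]  # same formula as above
print(s)        # approximately [[0.659 0.242 0.099]]
print(s.sum())  # 1.0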
2. Batch computation
Rather than passing points through the perceptron one at a time, we can pass in many points at once in the form of a matrix. In other words, where before we fed in a single point, we now feed in a matrix $X \in \mathbb{R}^{N \times d}$, each of whose $N$ rows contains one $d$-dimensional point.

We call such a matrix a **batch**.

In that case, we compute $XW$ rather than $xW$. Computing $XW$ returns an $N \times C$ matrix, each row of which contains $xW$ for one point $x$.

To each row we then add the **bias vector** $b$, which is now a $1 \times C$ row vector.

The whole computation is thus the function $\sigma(XW + b)$, where $\sigma$ is applied to each row. (The original figure here shows the computational graph: $X$ and $W$ feed into a matmul node, $b$ is added via an add node, and a softmax node produces the output.)
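A small NumPy sketch of the batched shapes (illustrative values only):

import numpy as np

N, d, C = 100, 2, 2
X = np.random.randn(N, d)   # a batch: one point per row
W = np.random.randn(d, C)   # one weight column per category
b = np.zeros((1, C))        # 1 x C row vector

A = X.dot(W) + b            # broadcasting adds b to every row
print(A.shape)              # (100, 2)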
3. Example
Let's generalize the earlier red/blue example to batch computation and multi-class output:
# Create a new graph
Graph().as_default()
X = placeholder()
# Create a weight matrix for the two output categories:
# The blue weight vector is (1, 1), and the red weight vector is (-1, -1).
W = Variable([
[1, -1],
[1, -1]
])
b = Variable([0, 0])
p = softmax( add(matmul(X, W), b) )
# Create a Session and run perceptron for our blue/red dots
session = Session()
output_probabilities = session.run(p, {
X: np.concatenate((blue_points, red_points))
})
# Print the probabilities of the first 10 rows, i.e. the first 10 points
print(output_probabilities[:10])
Since the first 10 points in the dataset are all blue, the perceptron assigns a higher probability to blue (left column) than to red (right column) for each of them.
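For a blue point near $(2, 2)$, the scores with the weights above are roughly $(4, -4)$, so typical output rows look something like $[0.9997, 0.0003]$; the exact values vary because the points are generated randomly.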
If you have any questions, feel free to ask them in the comments section of the original post.