1/ XOR Problem
The XOR problem: if the two Boolean inputs are different, the output is True (represented by 1), and if the two inputs are the same, the output is False (represented by 0).
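For reference, the XOR truth table can be produced directly in Julia with the built-in ⊻ (xor) operator:

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
a .⊻ b    # == [0, 1, 1, 0]: the output is 1 exactly when the two inputs differ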
A single-layer perceptron cannot solve the XOR problem, but a multilayer perceptron (MLP) can handle it, such as the one below.
First, let us introduce a question from the fourth chapter of the book Neural Networks and Deep Learning by Professor Qiu Xipeng, exercise 4-2:
Exercise 4-2: Design a feedforward neural network to solve the XOR problem. The network is required to have two hidden neurons and one output neuron, and to use ReLU as the activation function.
One possible solution is the following (these are also the values used to initialize the custom network in section 2.1 below):

$$\boldsymbol{W}^{(1)}=\left[\begin{array}{ll} 1 & 1 \\ 1 & 1 \end{array}\right],\quad \boldsymbol{b}^{(1)}=\left[\begin{array}{l} 0 \\ -1 \end{array}\right],\quad \boldsymbol{w}^{(2)}=\left[\begin{array}{l} 1 \\ -2 \end{array}\right],\quad b^{(2)}=0 \tag{1}$$

Therefore, the calculation of the entire network is:

$$y=\boldsymbol{w}^{(2)\top}\operatorname{ReLU}\left(\boldsymbol{W}^{(1)}\boldsymbol{x}+\boldsymbol{b}^{(1)}\right)+b^{(2)}$$

Substituting the four inputs $(0,0)^{\top},(0,1)^{\top},(1,0)^{\top},(1,1)^{\top}$ gives:

$$\boldsymbol{y}=\left[\begin{array}{llll} 0 & 1 & 1 & 0 \end{array}\right]$$
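As a quick sanity check, here is a minimal sketch in plain Julia (no Flux required) that evaluates the network of equation (1) on all four inputs:

# Weights and biases from equation (1)
W1 = [1 1; 1 1];  b1 = [0, -1]
w2 = [1, -2];     b2 = 0

X = [0 1 0 1; 0 0 1 1]        # the four XOR inputs, one per column
h = max.(W1 * X .+ b1, 0)     # hidden layer: ReLU(W⁽¹⁾x + b⁽¹⁾)
y = w2' * h .+ b2             # output layer
# y == [0 1 1 0]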
2/ Training with Flux
2.1/ Two hidden neurons
In fact, if this network structure is trained from a random initialization, it is hard to train successfully. The reason is the ReLU activation function required in the hidden layer; with other activation functions it trains well.
Use the following code:
using Flux
# The four XOR inputs (one per column) and the corresponding targets
data = [0 1 0 1;
        0 0 1 1];
y = [0 1 1 0];

# Two hidden ReLU neurons and one linear output neuron
mlp = Chain(Dense(2, 2, relu), Dense(2, 1));

function loss()
    ŷ = mlp(data)
    Flux.mse(ŷ, y)
end

# Print the loss after every training step
cb = function ()
    println(loss())
end

ps = Flux.params(mlp);
opt = ADAM(0.01)
@time Flux.train!(loss, ps, Iterators.repeated((), 1000), opt, cb=cb)
After training, the loss remains high and all of the outputs are close to 0.5.
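One way to see this, after training, is to evaluate the trained network on the data again:

ŷ = mlp(data)    # in such a failed run, all four outputs stay near 0.5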
If we instead set the hidden-layer parameters to the values in equation (1) and then train only the output-layer weights $\boldsymbol{w}^{(2)}$, we recover that solution:
# Custom initialization: all weight matrices start as ones and the hidden bias is
# fixed to [0, -1], i.e. the hidden-layer parameters of equation (1)
mlp = Chain(Dense(2, 2, relu, bias=[0; -1], init=ones),
            Dense(2, 1, bias=zeros(1), init=ones))

# Take out only the third parameter group, i.e. the output-layer weights, for training
ps = Flux.params(Flux.params(mlp)[3])
opt = ADAM(0.1)
@time Flux.train!(loss, ps, Iterators.repeated((), 1000), opt, cb=cb)
The result is $\boldsymbol{w}^{(2)}=\left[\begin{array}{ll} 0.9999\ldots & -1.9999\ldots \end{array}\right]$, which is the same as the solution in equation (1).
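The trained value can be read off directly from the same parameter group that was trained (a small sketch, assuming training has just finished):

Flux.params(mlp)[3]    # the output-layer weight matrix, approximately [1 -2]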
2.2/ Three hidden neurons
In fact, if we use three neurons in the hidden layer, the problem can be solved. This suggests that with only two hidden neurons the model does not have enough capacity to be trained reliably from a random initialization, and that adding one more neuron is enough to solve it:
# Three hidden ReLU neurons and one output neuron
mlp = Chain(Dense(2, 3, relu), Dense(3, 1));
ps = Flux.params(mlp)
opt = ADAM(0.01)
@time Flux.train!(loss, ps, Iterators.repeated((), 1000), opt, cb=cb)
# loss = 0.22230548
# loss = 0.21818444
# ...
# loss = 0.0

ŷ = mlp(data)
The final solution is correct, as you can verify by inspecting the parameters of each layer.
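For example, assuming a Flux version in which a Dense layer exposes its parameters through the weight and bias fields, the layers of the Chain can be inspected like this:

mlp[1].weight, mlp[1].bias    # hidden layer: W⁽¹⁾ and b⁽¹⁾
mlp[2].weight, mlp[2].bias    # output layer: w⁽²⁾ and b⁽²⁾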
With two hidden neurons, training from randomly initialized weights very rarely finds a solution, but with three hidden neurons the model has enough capacity to solve the XOR problem.
However, if the Sigmoid function is used in the hidden layer instead of ReLU as the activation function, the XOR problem can be solved even when only two hidden neurons are used.
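A minimal sketch of this variant, reusing the data, loss, and callback defined above (the learning rate and number of iterations may need some tuning):

# Two hidden neurons again, but with a sigmoid activation instead of ReLU
mlp = Chain(Dense(2, 2, sigmoid), Dense(2, 1));
ps = Flux.params(mlp)
opt = ADAM(0.1)
@time Flux.train!(loss, ps, Iterators.repeated((), 1000), opt, cb=cb)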
The main problem is likely that ReLU truncates everything below zero, meaning the neuron is simply not activated; this easily causes the neuron to “die”, after which it receives no gradient and training cannot continue to update it.
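To see this effect in isolation, here is a small sketch of a hypothetical single ReLU neuron (not part of the network above) whose pre-activation is negative for the given input, so no gradient flows back to its parameters:

using Flux   # for relu and gradient

x = [1.0, 1.0]
w = [-1.0, -1.0]    # chosen so that w'x + b < 0 for this input
b = 0.0

g = gradient((w, b) -> relu(w' * x + b), w, b)
# Both returned gradients are zero: the neuron outputs 0 and gets no update signal,
# so gradient descent cannot pull it back out of the inactive region.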