Ng Machine Learning (4): Neural Network Fundamentals

This week mainly covers the basics of neural networks:

  • Non-linear hypotheses
  • Neurons and the brain
  • Model representation
  • Features and intuitive understanding
  • Multiclass classification problem

Non-linear Hypotheses

A disadvantage of linear regression and logistic regression: when there are too many features, the computational cost becomes very large.

Suppose we want to train a model to recognize visual objects (for example, whether a picture contains a car). How could we do so? One approach is to take many pictures of cars and many pictures of non-cars, and use the value (e.g. saturation or brightness) of each pixel in those images as a feature.

Even a small 50-by-50-pixel image, with every pixel taken as a feature, already gives 2,500 features; to capture non-linear structure we would also need feature combinations, and including all quadratic terms pushes this to roughly 3 million features. An ordinary logistic regression model cannot handle this, so a neural network is needed.
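As a quick sanity check on those numbers, here is a minimal sketch (assuming grayscale pixels, so one feature per pixel) that counts the raw and quadratic features:

```python
# Feature counts for a 50 x 50 grayscale image used directly as input.
n_pixels = 50 * 50                            # 2,500 raw pixel features
quadratic = n_pixels * (n_pixels + 1) // 2    # all x_i * x_j terms with i <= j

print(n_pixels)    # 2500
print(quadratic)   # 3126250 -- roughly 3 million
```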

Neurons and the brain

Model Representation

Model Representation 1

Each neuron can be considered a processing unit (nucleus), which mainly has:

  • Multiple inputs (dendrites)
  • One output (axon)

A neural network is a network of large numbers of interconnected neurons that communicate via electrical impulses.

  1. A neural network model is built from many neurons, each of which is a learning model
  2. These neurons are called activation units; in a neural network, the parameters may also be called weights
  3. The network as a whole is modeled after groups of interconnected neurons

The neural network

Below, a single neuron is taken as its own learning model, following the logistic regression model.

(Figure: a neural network that resembles a neuron)

  • $x_1, x_2, x_3$ are the input units, to which the raw data is fed

  • A few basic concepts:

    • Input layer: the layer where the data nodes reside
    • Network layer: produces the output $h_i$ using its parameters $w, b$
    • Hidden layer: the middle layers of the network
    • Output layer: the last layer
    • Bias unit: a bias unit (a constant input of 1) is added to each layer

    The activation units and the output of the model above are expressed as follows.

    Expressions for the three activation units:


    $$a^{(2)}_1 = g(\Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3)$$

    $$a^{(2)}_2 = g(\Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3)$$

    $$a^{(2)}_3 = g(\Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3)$$

    The output expression is:

    $$h_\Theta(x) = g(\Theta^{(2)}_{10}a^{(2)}_0 + \Theta^{(2)}_{11}a^{(2)}_1 + \Theta^{(2)}_{12}a^{(2)}_2 + \Theta^{(2)}_{13}a^{(2)}_3)$$

    Each row of the feature matrix (one training example) is fed to the neural network; eventually the whole training set needs to be fed through.

    This left-to-right computation is called forward propagation.
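To make forward propagation concrete, here is a minimal sketch for the 3-input, 3-hidden-unit, 1-output network above, computing each activation unit explicitly. The weight values are made-up placeholders, not from the course:

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example weights: Theta1 is 3x4 (layer 1 -> 2), Theta2 is 1x4 (layer 2 -> 3).
Theta1 = np.array([[-1.0,  0.5,  0.5,  0.5],
                   [ 0.2, -0.3,  0.8,  0.1],
                   [ 0.7,  0.4, -0.6,  0.9]])
Theta2 = np.array([[-0.5,  1.0,  1.0,  1.0]])

x = np.array([1.0, 0.0, 1.0])   # raw inputs x_1, x_2, x_3
x = np.insert(x, 0, 1.0)        # prepend the bias unit x_0 = 1

# Each hidden activation: a_i^(2) = g(Theta_i0 x_0 + ... + Theta_i3 x_3)
a2 = np.array([g(Theta1[i] @ x) for i in range(3)])
a2 = np.insert(a2, 0, 1.0)      # bias unit a_0^(2) = 1

h = g(Theta2[0] @ a2)           # output h_Theta(x)
print(h)
```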

How to remember the model notation

$a^{(j)}_i$ denotes the $i$-th activation unit in layer $j$.

$\Theta^{(j)}$ denotes the weight matrix that maps from layer $j$ to layer $j+1$; for example, $\Theta^{(1)}$ in the neural network shown above has size 3×4. In general, its size is determined as follows:

  • the number of activation units in layer $j+1$ gives the number of rows
  • the number of activation units in layer $j$, plus 1 (for the bias unit), gives the number of columns
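A one-line sketch of that dimension rule (the function name is my own, for illustration):

```python
def theta_shape(s_j, s_j_plus_1):
    """Shape of Theta^(j): s_{j+1} rows, s_j + 1 columns (the +1 is the bias unit)."""
    return (s_j_plus_1, s_j + 1)

print(theta_shape(3, 3))  # (3, 4) -- matches Theta^(1) in the network above
```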

Model Representation 2

Compared with computing each unit in an explicit loop, vectorization makes the calculation more convenient.

Suppose we now have:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad z^{(2)} = \begin{bmatrix} z^{(2)}_1 \\ z^{(2)}_2 \\ z^{(2)}_3 \end{bmatrix}$$

where $z$ satisfies:

$$z^{(2)} = \Theta^{(1)} x$$

i.e. $z^{(2)}$ is exactly the part inside the parentheses of the three activation units above. Then:

$$a^{(2)} = g(z^{(2)})$$

More generally, writing $a^{(1)} = x$ for the input layer, the computation proceeds layer by layer:

$$z^{(2)} = \Theta^{(1)} a^{(1)}$$

$$a^{(2)} = g(z^{(2)})$$

Then, after adding the bias unit $a^{(2)}_0 = 1$:

$$z^{(3)} = \Theta^{(2)} a^{(2)}$$

and the output $h$ can be expressed as:

$$h_\Theta(x) = a^{(3)} = g(z^{(3)})$$
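The same forward pass as the earlier sketch, now vectorized as matrix-vector products (same made-up placeholder weights as before):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta1 = np.array([[-1.0,  0.5,  0.5,  0.5],
                   [ 0.2, -0.3,  0.8,  0.1],
                   [ 0.7,  0.4, -0.6,  0.9]])   # 3x4, layer 1 -> 2
Theta2 = np.array([[-0.5,  1.0,  1.0,  1.0]])   # 1x4, layer 2 -> 3

a1 = np.insert(np.array([1.0, 0.0, 1.0]), 0, 1.0)  # a^(1) = x, with bias x_0 = 1
z2 = Theta1 @ a1                                    # z^(2) = Theta^(1) a^(1)
a2 = np.insert(g(z2), 0, 1.0)                       # a^(2) = g(z^(2)), add bias a_0^(2) = 1
z3 = Theta2 @ a2                                    # z^(3) = Theta^(2) a^(2)
h = g(z3)                                           # h_Theta(x) = a^(3) = g(z^(3))
print(h)
```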

Features and intuitive understanding

In neural networks, the computation of a single layer of neurons (with no hidden layer) can represent logical operations such as logical AND and logical OR; a small sketch follows the truth tables below.

Implementing logical AND

| x_1 | x_2 | h |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

Implementing logical OR

| x_1 | x_2 | h |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Implementing logical NOT

Logical NOT takes a single input and is implemented with a large negative weight on that input, e.g. $h = g(10 - 20x_1)$, which gives $h \approx 1$ when $x_1 = 0$ and $h \approx 0$ when $x_1 = 1$.
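A minimal sketch of all three gates as single sigmoid neurons, using the well-known weight choices from the lecture ($\Theta = [-30, 20, 20]$ for AND, $[-10, 20, 20]$ for OR, $[10, -20]$ for NOT):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(theta, inputs):
    """Single sigmoid neuron: g(theta . [1, x_1, ..., x_n])."""
    x = np.insert(np.asarray(inputs, dtype=float), 0, 1.0)  # prepend bias x_0 = 1
    return g(np.dot(theta, x))

AND_W = [-30.0, 20.0, 20.0]   # g(-30 + 20*x1 + 20*x2)
OR_W  = [-10.0, 20.0, 20.0]   # g(-10 + 20*x1 + 20*x2)
NOT_W = [ 10.0, -20.0]        # g( 10 - 20*x1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a = int(neuron(AND_W, [x1, x2]) > 0.5)
    o = int(neuron(OR_W,  [x1, x2]) > 0.5)
    print(f"{x1} {x2} | AND={a} OR={o}")   # reproduces both truth tables

for x1 in (0, 1):
    print(f"NOT {x1} = {int(neuron(NOT_W, [x1]) > 0.5)}")  # 0 -> 1, 1 -> 0
```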

Multiclass classification problem

A multiclass problem arises when the output has more than two categories, for example using a neural network to recognize pedestrians, cars, motorcycles, etc.

  • The input vector has three dimensions, and there are two hidden layers
  • The output layer has four neurons representing the four classes: each sample produces an output vector $[a, b, c, d]^T$ in which exactly one of $a, b, c, d$ is 1, indicating the predicted class (see the sketch below for reading the class off this vector)
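To read the predicted class off such an output vector, one can take the index of the largest activation (a hypothetical example, not from the course):

```python
import numpy as np

output = np.array([0.1, 0.05, 0.8, 0.05])  # activations [a, b, c, d]
predicted_class = int(np.argmax(output))    # index of the largest activation
print(predicted_class)                      # 2
```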

Solution in TF

The multiclass classification problem above is similar to the handwritten-digit problem in TensorFlow (TF); the approach is as follows:

  • Set the output to a vector with $d_{out}$ output nodes, where $d_{out}$ equals the number of categories
  • Let the $i$-th output value, $i \in [1, d_{out}]$, represent the probability $P$ that the current sample belongs to category $i$
  • If the sample belongs to class $i$, the component with index $i$ is set to 1 and the rest to 0
  • For example, for all cat images the numeric encoding is 0 and the one-hot encoding is [1,0,0,0]; the other classes follow by analogy

  1. Handwritten digit image data

The total number of categories is 10, i.e. the number of output nodes is $d_{out} = 10$. Suppose a sample belongs to class $i$ (the digit in the picture is $i$); then a vector $y$ of length 10 is needed, with the position at index $i$ set to 1 and the rest set to 0.

  • The one-hot encoding of 0 is [1,0,0,0…
  • The one-hot encoding of 1 is [0,1,0,0…
  • The rest follow by analogy
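In TensorFlow this encoding is a one-liner via `tf.one_hot`, shown here for a small batch of digit labels:

```python
import tensorflow as tf

labels = tf.constant([0, 1, 9])     # digit classes of three samples
y = tf.one_hot(labels, depth=10)    # depth = d_out = number of categories
print(y.numpy())
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
```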