Weng’s work can be roughly divided into two periods: 1) CCIPCA + IHDR; 2) CCILCA + DN. IHDR, as an attempt to build a robotic brain, has clearly been abandoned, since there is little continuity between the two periods [nevertheless, IHDR remains an excellent incremental learning model worth further study and understanding]. Why can’t IHDR be the ultimate developmental model (brain) for robots? Weng’s answer is that the human brain has no central controller, whereas IHDR has a central controller that regulates all of its learning. The implication is that the Developmental Network (DN) has no central regulator. This post introduces DN based on Professor Weng’s lectures and related papers. It first presents the idea behind developmental networks systematically, then the method itself, then an implementation example, the Where-What Network, to deepen the understanding of developmental networks, and finally analyzes the experimental results and summarizes the whole post. The example Python program has been uploaded to GitHub.
@[toc]
1. The brain
1.1 Context from muscle
Weng believes that autonomous intent is learned from top-down context; further, the context of intention comes from the muscles (motors). Here are some of Professor Weng’s examples to illustrate this claim.
1. Raise a newborn kitten for a period in an environment containing only vertical edges, then move it to a normal environment: its V1 neurons do not respond to horizontal edges (in other words, it never attends to horizontal edges). 【Blakemore & Cooper, Nature 1970】
2. The figure below shows how neurons in the left visual cortex are dominated by input from the right or the left eye. The normal situation is shown on the left: most neurons in the left visual cortex are dominated by input from the right eye, and a significant portion by the left eye. Now cover the right eye of a 10-day-old kitten; after 21 days (i.e., 31 days after birth) the situation is as shown in the middle: dominance by the right eye has decreased. After another 6 days the result is shown on the right: almost all neurons are dominated by the left eye. [Note: under normal circumstances, the left eye projects mainly to the right hemisphere and the right eye to the left hemisphere.]
3. Cut the optic nerve and, over time, it reconnects with neurons in the auditory area; the animal then “sees” with its auditory area. 【Sur, Angelucci and Sharma, Nature 1999】
1.2 The brain is not a cascading structure
Deep learning uses deep neural networks, which are cascade structures, as its learning model. The brain, however, is not a cascade structure: the synaptic connections between neurons are intricate.
1.3 Unity of symbol and connection
The brain learns and stores knowledge through connections between neurons, and is also capable of logical reasoning. This shows that the brain is the unity of symbolism and connectionism.
1.4 Autonomous development
Human beings develop from a fertilized egg into a baby, and from a baby into an adult. This process is autonomous development. The natural idea is that if you give a machine a body and an autonomous development program, the robot will develop by interacting autonomously with the environment and other agents just as humans do, learning more and more and getting smarter. This is also the core driving goal of Professor Weng’s research on the theory of autonomous mind development.
2. Emergent Universal Turing Machines
2.1 Finite automaton (FA)
Finite automata are also called finite-state machines. Let us first look at finite-state machines (FSM), which are essentially the same thing: an FSM consists of a finite set of states and the transition relationships between them. The FSM in the figure below has six states, and the transition conditions between states are given explicitly in the figure; for example, if the current state is 2, reading the input symbol I moves it to state 6. More formally, a finite automaton consists of a finite set of internal states and a set of control rules that determine which state to move to after reading an input symbol in the current state.
The control unit of a Turing machine is a finite automaton.
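To make the transition-rule idea concrete, here is a minimal sketch of a finite automaton as a transition table. The states and input symbols are illustrative, not the exact ones from the figure.

```python
# A finite automaton as a (state, input) -> next_state transition table.
# States and symbols are illustrative, not those of the figure.
TRANSITIONS = {
    (1, 'a'): 2,
    (2, 'i'): 6,   # e.g. "if the current state is 2, input I moves to state 6"
    (6, 'b'): 1,
}

def run_fa(start_state, inputs):
    """Feed a sequence of input symbols through the automaton."""
    state = start_state
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state
```

For instance, `run_fa(2, ['i'])` follows the single rule for state 2 and ends in state 6.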
2.2 The Turing machine
The Turing machine is an abstract model proposed by Alan Turing, at the age of 24, in his paper “On Computable Numbers, with an Application to the Entscheidungsproblem”. The model largely mimics how a human computes: a read-write head, like the human eye and hand, reads and outputs information; an infinite paper tape supplies a constant stream of information and records output; and a controller, like our brain, does different things depending on the problem.
Humans can also be abstracted as Turing machines. Every deciding, thinking person can be regarded abstractly as a Turing machine, each with their own operating system. The input set is all the information you can see, hear, smell, and feel in your environment; the possible output set is your words and actions, as well as the facial expressions you can make. The internal state set is far more complicated: since any combination of nerve-cell states can be regarded as an internal state, the number of possible combinations is astronomical. These internal states are human memory; as long as a Turing machine has internal states, it has memory.
If we carry the analogy between Turing machines and humans through, the human brain serves as the Turing machine’s internal state storage and its control program.
2.3 Universal Turing machine
We can encode any Turing machine M described above as a string ⟨M⟩. We can then construct a special Turing machine that takes the code ⟨M⟩ of an arbitrary Turing machine M as input (an ordinary Turing machine takes data as input; this one takes a Turing machine as input) and simulates the operation of M. Such a machine is called a universal Turing machine. A modern computer is a simulation of a universal Turing machine: it can take a program describing another Turing machine and run it, implementing the algorithm the program describes.
2.4 Emergent UTM
The figure above shows a traditional finite automaton, which can equivalently be expressed as a symbolic lookup table with the same meaning. If we further encode the states as A-00, B-01, C-10, D-11, we obtain the lookup table below. This lookup table is an emergent finite automaton.
We take Z as the state and X as the input. From the initial state 00, input 01 leads to state 01; input 10 then jumps to state 10; a further input 10 stays in the absorbing state 10; input 11 then jumps to state 11. This is already very close to the protagonist of this post, the Developmental Network (DN).
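The trace above can be reproduced with a small lookup table over the 2-bit codes. Only the transitions mentioned in the text are included; the full table is in the figure, so the entries below are an assumption consistent with that trace.

```python
# Transitions from the trace in the text, with states and inputs as 2-bit codes.
# Only the entries described in the text are reproduced (assumed from the trace).
LOOKUP = {
    ('00', '01'): '01',
    ('01', '10'): '10',
    ('10', '10'): '10',   # state 10 absorbs input 10
    ('10', '11'): '11',
}

def emergent_fa(state, inputs):
    """Run the coded lookup table on a sequence of coded inputs."""
    for x in inputs:
        state = LOOKUP[(state, x)]
    return state
```

Running `emergent_fa('00', ['01', '10', '10', '11'])` reproduces the trace and ends in state 11.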
3. Developmental Network
3.1 From emergent UTM to Developmental network
Keep the lookup table and the transition example above in mind. We take the state Z as the human’s perceptual input and the input X as the human’s muscle-movement information. We bind X and Z together as a node in a memory graph (the developmental network). Y is implicit in the human’s or robot’s own body model and does not need to be learned.
[Note: the transition relations (the controller) of a well-designed robot or of a human need not be learned, because for a particular muscle action the robot or human must move according to its own motion model, thereby automatically entering the next state, perceiving new information in that state, and binding it with the output muscle-action information.]
The learning process can be described as follows: assume the robot or human can acquire, by instruction or by autonomous learning, the muscle-action information X (top) appropriate to the new perceptual input Z (bottom). We bind X and Z together to form a memory node in the developmental network.
[Note: to better relate emergent Turing machines to developmental networks, this simplifies the network’s learning and use. The real developmental network’s learning process takes the similarity between inputs into account and consists of (1) a clustering process, lobe component analysis (LCA), and (2) updating the network parameters according to the Hebbian rule.]
A vivid example is shown in the figure below. [The key to understanding the developmental network is that the control rules of the Turing machine are implicit in the human or robot body, and there is no need to learn them. The human or robot is itself a model: taking a certain action naturally has an explicit effect on the environment.]
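The simplified learn-and-use cycle described above can be sketched as a growing table of bound (Z, X) pairs with nearest-neighbor retrieval. As the note says, this deliberately ignores LCA and Hebbian updating; the class and its names are illustrative.

```python
import numpy as np

class SimplifiedDN:
    """Illustrative sketch of the simplified learning described above:
    each training step binds a percept Z and a motor context X into a node;
    use looks up the stored percept nearest to the current one and returns
    its bound motor information."""
    def __init__(self):
        self.nodes = []   # list of (z_vector, x_label) pairs

    def learn(self, z, x):
        # bind perceptual input Z with motor information X as one memory node
        self.nodes.append((np.asarray(z, dtype=float), x))

    def use(self, z):
        # retrieve the motor bound to the nearest stored percept
        z = np.asarray(z, dtype=float)
        dists = [np.linalg.norm(z - zi) for zi, _ in self.nodes]
        return self.nodes[int(np.argmin(dists))][1]
```

For example, after `learn([0, 0], 'left')` and `learn([1, 1], 'right')`, the percept `[0.9, 1.0]` retrieves `'right'`.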
3.2 Characteristics of developmental networks
According to Weng, the autonomous development program of a robot should have the following eight characteristics:
- 1. It perceives and acts upon the physical world;
- 2. Emergent: internal representations are generated automatically;
- 3. Natural: encoding is natural, not manual;
- 4. Incremental: no batch learning, no pre-collected data sets;
- 5. Cranial closure: humans cannot alter intracranial parameters;
- 6. Attention: it has attention mechanisms for both perceptual and motor features;
- 7. Motivated: avoidance of pain, pursuit of pleasurable things, curiosity, etc.;
- 8. It is able to abstract concepts and rules from context.
There is also no doubt that developmental networks unify supervised and unsupervised learning.
3.3 Autonomic development program
From the introduction above, we roughly know the structure of individual nodes in the developmental network and the characteristics of the network as a whole. The robot needs an autonomous development program to build the developmental network online by itself, including adding network nodes and updating network parameters. The autonomous development program proposed by Professor Weng has the following characteristics:
- In-place learning: each neuron carries out its own learning through its own internal physiological mechanisms and its interactions with other neurons. Every neuron has the same developmental program and the same developmental capacity. In-place learning is neuron-centered learning.
[Note: thus, no matter how complex the developmental network, it suffices to describe the learning process of a single nerve cell. A neuron receives: a) the bottom-up response $y$ (input from the layer below to the current layer); b) the top-down response $a$ (input from the layer above to the current layer); c) the inhibitory response $h$ from other neurons in the same layer. Its output is $z = g(w_b y + w_p a - w_h h)$, where $g$ is an arbitrary activation function. In the figure below, a green circle with its thick line represents one complete neuron: solid lines without arrows connected to the circle are bottom-up inputs, solid lines connected to the thick line are top-down inputs, and dotted lines connected to the thick line are inhibitory inputs from same-layer neurons.]
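As a sketch, the single-neuron response described in the note might be computed as follows; `tanh` stands in for the unspecified activation $g$, and the weight names are illustrative.

```python
import numpy as np

def neuron_response(y, a, h, w_b, w_p, w_h, g=np.tanh):
    """Single-neuron response z = g(w_b.y + w_p.a - w_h.h):
    bottom-up input y, top-down input a, same-layer inhibition h,
    with g an arbitrary activation (tanh used here as a placeholder)."""
    return g(np.dot(w_b, y) + np.dot(w_p, a) - np.dot(w_h, h))
```

With no inhibition (h = 0), the response is just the activated sum of the excitatory inputs.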
- Lateral inhibition: in biological neural networks such as the human retina, there exists a phenomenon of “lateral inhibition”: once one nerve cell is excited, it inhibits the surrounding nerve cells through its branches. Lateral inhibition is the mechanism by which neurons in the same layer compete. To make lateral inhibition effective in the developmental network: a) strongly responding neurons must effectively inhibit weakly responding neurons; b) weakly responding neurons must not affect strongly responding neurons. Professor Weng uses the top-K competition rule: the neurons in a layer are ranked by response strength, the top K neurons receive a non-zero weight update, and the weights of all other neurons remain unchanged.
[Note: for a network whose number of neurons is fixed in advance, this competition mechanism causes previously learned patterns to be replaced by the latest ones, that is, forgetting. Communication with Professor Weng confirmed that this forgetting exists; he considers it reasonable because it closely resembles the human brain. I differ from Professor Weng on this point: I maintain that forgetting is not a desirable property, since not every human trait is a good one.]
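The top-K rule above can be sketched in a few lines: rank the responses, keep the K strongest as (binary) winners, and suppress the rest. This is a simplified version of the competition used in the example code later; the inhibition-range logic is omitted.

```python
import numpy as np

def top_k_inhibit(responses, k):
    """Simplified lateral inhibition: only the k strongest neurons keep a
    (binary) response; all weaker neurons are suppressed to zero."""
    responses = np.asarray(responses, dtype=float)
    winners = np.argsort(-responses)[:k]   # indices of the k largest responses
    out = np.zeros_like(responses)
    out[winners] = 1.0
    return out
```

Only the winners returned here would then receive a Hebbian weight update.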
- Hebbian learning: Hebbian theory describes a basic principle of synaptic plasticity: continuous, repeated stimulation of a postsynaptic neuron by a presynaptic neuron increases the efficiency of synaptic transmission. The theory was put forward by Donald Hebb in 1949 and is also known as Hebb’s rule. It can explain “associative learning”, in which repeated stimulation of neurons strengthens the synapses between them; this style of learning is called Hebbian learning. Hebb’s theory has also become a biological basis for unsupervised learning.
[Note: assuming the weight vector of the neuron that wins the current competition is $v_j^{t-1}$ and the current input is $y(t)$, the winning neuron is updated according to:
$$v_j^{t} = \alpha\, v_j^{t-1} + (1-\alpha)\, y(t) \tag{1}$$
where $1-\alpha$ is the learning rate; the smaller $\alpha$ is, the faster old information is forgotten.]
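Update rule (1) is a one-liner; this sketch treats $\alpha$ as a scalar for simplicity.

```python
import numpy as np

def hebbian_update(v_old, y, alpha):
    """Update rule (1): the winning neuron's weight vector moves toward the
    current input, v_new = alpha * v_old + (1 - alpha) * y."""
    return alpha * np.asarray(v_old, dtype=float) + (1 - alpha) * np.asarray(y, dtype=float)
```

With `alpha = 0.9`, the weight keeps 90% of its old value and moves 10% of the way toward the current input.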
- Lobe components: lobe component analysis divides the sample space into C non-overlapping regions, known as lobe regions. As the name suggests, they are like several lobes growing from the same bud on a branch. The lobe components of a two-dimensional space are shown in the figure below:
[Note: for a detailed derivation and application of the lobe component analysis algorithm, see this blog. Lobe component analysis is a clustering algorithm in which each cluster center is determined by its direction vector. Put more plainly, the distance between two vectors in this clustering algorithm is expressed by $\cos(\theta)$, where $\theta$ is the angle between the two vectors. We know that the distance metric has a great effect on a clustering algorithm’s performance, and a given metric may only suit certain data. Professor Weng uses this angular measure in lobe component analysis, and it works well, mainly because the input data are images of very high dimension; in such a high-dimensional input space, the data essentially lie on a hypersphere, so lobe components are appropriate. For low-dimensional data, or data spread across the whole state space, they are not. The figure below shows a 2-D data set that lobe component analysis cannot cluster correctly.]
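The $\cos(\theta)$ distance described in the note can be sketched as a lobe-region assignment step. The cluster centers are assumed given here; real LCA also updates them with the Hebbian rule.

```python
import numpy as np

def lobe_assign(x, centers):
    """Assign input x to the lobe region whose center direction is closest
    in angle, i.e. the center maximizing cos(theta) = x.c / (|x||c|)."""
    x = np.asarray(x, dtype=float)
    sims = [np.dot(x, c) / (np.linalg.norm(x) * np.linalg.norm(c))
            for c in centers]
    return int(np.argmax(sims))
```

Because only the angle matters, scaling an input by a constant does not change its lobe assignment, which is exactly why this metric suits data lying on a hypersphere.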
Summary: the main features of the autonomous development program are given above, each explained in detail. Lateral inhibition, Hebbian learning, and lobe components are not completely independent; they are interdependent. The developmental program of an individual neuron can be regarded as lobe component analysis, with the cluster-center vectors (the neuron weight parameters) of the laterally winning neurons updated by the Hebbian rule. This is a self-contained algorithm whose effectiveness has been verified by experiments. I have two main reservations about it: 1) is forgetting reasonable? 2) is lobe-component clustering universal across data distributions? On the choice of clustering distance metric there is a saying: which one to use depends on what type of data we have and what our notion of similar is. Of course, I am also trying to incorporate other distance measures into the autonomous development program. However, the current program is built on lobe component analysis, so replacing the distance metric means deriving a new autonomous development program, which is a bit of a challenge.
3.4 Examples of developmental networks
Where-What Networks (WWNs)
4. An example of DN: the Where-What Network
4.1 Data Preparation
Learning: the Where-What Network uses image data (Input), the object’s type label (What motor), and the object’s location label within the image (Where motor) as training data to develop the network. Use: for samples similar to those seen in training, the developmental network can output both the type of the object and its location in the picture.
4.1.1 Description of the construction process
First, we describe how to construct the data needed to train the Where-What Network. We could take photos in real time and tag them with location and type labels, but that process is cumbersome. The main purpose of this part is to visualize the developmental network; description of real applications comes later. Therefore this part trains the DN on artificially constructed data: with manually constructed images, the object’s type and location labels come for free, avoiding manual labeling. A simple way to construct training data manually follows.
- Background images: to simulate a real scene, an object usually has a background; for example, for a cat on a sofa, the sofa is the cat’s background. The backgrounds used here are 19×19 patches randomly cropped from the 13 real scene pictures on the left of the image above.
- Object images: this part uses five low-resolution object images of size 11×11 as objects: 1-cat, 2-dog, 3-elephant, 4-pig and 5-truck.
- Input: randomly select an object image and place it at a random position in a randomly selected background image (there are 25 possible positions, and the blue square in the middle of the figure above marks the center position of the object image, as shown in the middle of the figure).
- Where-What motors: the Where motor and What motor take the constructed position and type as labels.
Each training step randomly generates a group of training samples [image, type, position] by the above method. The Python program for sample construction is shown below.
4.1.2 Sample construction procedure
- 1. Randomly crop a real picture to obtain a 19×19 background image.
import copy
from math import sqrt
import numpy as np
from PIL import Image

def get_background(background_height, background_width):
    # small offset to avoid division by zero when normalizing
    epsilon_error = 1.0 / (256 * sqrt(12))
    # backgrounds_to_use.txt: first line holds the file extension,
    # the following lines the background image names
    f = open('backgrounds/backgrounds_to_use.txt')
    splitData = f.read().split("\n")
    curr_num = np.random.randint(len(splitData) - 2) + 1
    img_file = 'backgrounds/' + splitData[curr_num] + '.' + splitData[0]
    curr_img = np.array(Image.open(img_file))  # grayscale image assumed
    rows, cols = curr_img.shape
    # random top-left corner of the crop
    ul_row = np.random.randint(rows - background_height)
    ul_col = np.random.randint(cols - background_width)
    bg = curr_img[ul_row:ul_row + background_height, ul_col:ul_col + background_width]
    bg1 = copy.deepcopy(bg)
    samplemin = np.min(bg1)
    samplemax = np.max(bg1)
    # normalize the crop to [0, 1)
    bg = (bg - samplemin) / (samplemax - samplemin + epsilon_error)
    # plt.imshow(bg)
    # plt.show()
    return bg
- 2. Select a random object as the foreground image and add it to the background image.
def add_foreground(background, training_type):  # 1:cat; 2:dog; 3:elephant; 4:pig; 5:truck
    # foregrounds_to_use.txt: first line holds the file extension,
    # the following lines the object image names
    f = open('foregrounds/foregrounds_to_use.txt')
    splitData = f.read().split("\n")
    curr_num = np.random.randint(training_type) + 1
    img_type = curr_num - 1
    img_file = 'foregrounds/' + splitData[curr_num] + '.' + splitData[0]
    curr_img = Image.open(img_file, 'r')
    fg = np.array(curr_img.convert('L'))  # convert to grayscale
    obj_height, obj_width = fg.shape
    fg1 = copy.deepcopy(fg)
    samplemin = np.min(fg1)
    samplemax = np.max(fg1)
    # normalize the object to [0, 1]
    fg = (fg - samplemin) / (samplemax - samplemin)
    # plt.imshow(fg)
    # plt.show()
    # pick one of the 5 x 5 = 25 possible object positions
    p_r = np.random.randint(5)
    p_c = np.random.randint(5)
    r = p_r * 2
    c = p_c * 2
    position = p_r * 5 + p_c
    # paste the object into the background
    background[r:r + obj_height, c:c + obj_width] = fg
    # plt.imshow(background)
    # plt.show()
    return background, img_type, position
- 3. Use the constructed input picture to further obtain the Where and What motors, and package everything into the standard sample data format.
def get_image(input_dim, z_neuron_num):
    """ 1:cat; 2:dog; 3:elephant; 4:pig; 5:truck
        position: row * 5 + col """
    background_width = input_dim[0]
    background_height = input_dim[1]
    bg = get_background(background_height, background_width)
    training_type = 1
    if len(z_neuron_num) != 1:
        # the first z area encodes the object type
        training_type = z_neuron_num[0]
    training_image, img_type, position = add_foreground(bg, training_type)
    true_z = []
    if len(z_neuron_num) != 1:
        true_z.append(img_type)
        true_z.append(position)
    else:
        true_z.append(position)
    true_z = np.array(true_z)
    return training_image, true_z
4.1.3 Sample examples of construction
The following image shows a randomly selected background, a randomly selected object, and a randomly chosen position, together with the resulting input image and the Where and What motors. [Note: the images are normalized and displayed with Matplotlib’s plt.imshow, so they look different from the true grayscale images.]
4.2 Introduction to the Where-What Network
A simple example of the Where-What Network introduced in this post is shown in the figure below [it is probably the simplest possible form, but, small as the sparrow is, it has all the vital organs]. The network consists of three layers: Input, DN, and Motors. The core is the DN marked by the red box, which stores all the learned knowledge. Moreover, in this demo the number of neurons in the DN grows adaptively: the developmental layer has 25 × 3^split_times neurons, where split_times is the number of times the splitting criterion has been met [a little like cell mitosis, except that instead of one cell dividing into two, one node here divides into however many we set, which is three]. The input layer has 19×19 = 361 neurons, the vectorized length of the input image. The What motor has 5 neurons and the Where motor has 25. The numbers of neurons in the input and motor layers are determined directly by the samples.
4.3 Where-What Network training
The first feature of the DN is in-place learning. In-place means that the network is local: the update of a single neuron depends only on itself and the surrounding neurons, and every neuron shares the same developmental program (much as the cells of the human body share the same DNA). Therefore we only need to explain clearly the parameters, output computation, and weight update of one neuron in the DN; the method and procedure are identical for all neurons.
4.3.1 Weight Initialization
A DN neuron has five groups of parameters: ① bottom-up weight; ② top-down weight; ③ lateral weight; ④ inhibit weight; ⑤ synapse factor. The figure below shows the first three. Ignoring the last two groups, a neuron’s output is computed from the first three as shown in the figure below [please read the figure below carefully]: first the responses to X and Z are computed, and their weighted sum gives a temporary response Temp Y; then the lateral parameters give the response contributed to this neuron by same-layer neurons, and its weighted sum with Temp Y gives the final neuron response Y. The fourth group, the inhibit weight, is the same-layer inhibition parameter: when a neuron is activated, only nearby neurons are proportionally activated, and neurons slightly farther away are inhibited (their parameters are not updated for the current sample). The fifth group, the synapse factor, is an attention-selection factor: different input dimensions matter differently to different neurons, so for a particular input X or Z, each Y neuron carries a factor vector of the same dimension as the input; each element of the factor vector indicates the importance of the corresponding input element, and the larger the value, the more important that element. [This is similar to feature selection.]
In this example, the fourth group of parameters is used in the top-k competition function; the fifth is used only in Y’s attentional selection over the input X, enabling the developmental network to adaptively select the important dimensions of the perceptual data [feature selection].
Here is the network parameter definition and the initial function for this example.
def dn_create(self):
    self.y_lsn_flag = np.zeros((1, self.y_neuron_num))
    self.y_firing_age = np.zeros((1, self.y_neuron_num))
    self.y_inhibit_age = np.zeros((1, self.y_neuron_num))
    self.y_bottom_up_weight = np.ones((self.x_neuron_num, self.y_neuron_num))
    self.y_top_down_weight = []
    for i in range(self.z_area_num):
        self.y_top_down_weight.append(np.ones((self.z_neuron_num[i], self.y_neuron_num)))
    self.y_lateral_weight = np.zeros((self.y_neuron_num, self.y_neuron_num))
    self.y_inhibit_weight = np.ones((self.y_neuron_num, self.y_neuron_num))
    # y_synapse_flag: 1: only bottom-up; 2: + top-down;
    # 3: + lateral; 4: + inhibit
    self.y_synapse_flag = 1
    self.y_synapse_coefficient = [0.8, 1.2, 5.0]
    self.y_synapse_age = 20
    self.y_bottom_up_synapse_diff = np.zeros(self.y_bottom_up_weight.shape)
    self.y_bottom_up_synapse_factor = np.ones(self.y_bottom_up_weight.shape)
    self.y_top_down_synapse_diff = []
    self.y_top_down_synapse_factor = []
    for i in range(self.z_area_num):
        self.y_top_down_synapse_diff.append(np.zeros(self.y_top_down_weight[i].shape))
        self.y_top_down_synapse_factor.append(np.ones(self.y_top_down_weight[i].shape))
    self.y_lateral_synapse_diff = np.zeros(self.y_lateral_weight.shape)
    self.y_lateral_synapse_factor = np.ones(self.y_lateral_weight.shape)
    self.y_inhibit_synapse_diff = np.zeros(self.y_inhibit_weight.shape)
    self.y_inhibit_synapse_factor = np.ones(self.y_inhibit_weight.shape)
    # z weights
    self.z_bottom_up_weight = []
    self.z_firing_age = []
    for i in range(self.z_area_num):
        self.z_bottom_up_weight.append(np.zeros((self.y_neuron_num, self.z_neuron_num[i])))
        self.z_firing_age.append(np.zeros((1, self.z_neuron_num[i])))
    # responses
    self.x_response = np.zeros((1, self.x_neuron_num))
    # pre-lateral response is bottom-up + top-down, used to get lateral
    # pre-response is bottom-up + top-down + lateral
    self.y_bottom_up_percent = 1/2
    self.y_top_down_percent = 1/2
    self.y_lateral_percent = 1/2
    self.y_bottom_up_response = np.zeros((1, self.y_neuron_num))
    self.y_top_down_response = np.zeros((self.z_area_num, self.y_neuron_num))
    self.y_pre_lateral_response = np.zeros((1, self.y_neuron_num))
    self.y_lateral_response = np.zeros((1, self.y_neuron_num))
    self.y_pre_response = np.zeros((1, self.y_neuron_num))
    self.y_response = np.zeros((1, self.y_neuron_num))
    self.z_response = []
    for i in range(self.z_area_num):
        self.z_response.append(np.zeros((1, self.z_neuron_num[i])))
4.3.2 Response calculation
Based on the figure above and the previous section, a complete response-computation function takes as input: the input vector input_vec, the weight vector weight_vec, and the attention factor synapse_factor. First each element of the input vector is multiplied by the corresponding element of the attention factor, and the result is normalized to a unit vector. The final response is then the dot product of this normalized, attention-weighted input vector with the normalized weight vector (the dot product of two unit vectors is the cosine $\cos(\theta)$ of the angle between the original vectors).
def compute_response(input_vec, weight_vec, synapse_factor):
    """ input_vec is of shape 1 x input_dim
        weight_vec is of shape input_dim x neuron_num
        synapse_factor is of shape input_dim x neuron_num """
    _, neuron_num = weight_vec.shape
    _, input_dim = input_vec.shape
    # reshape input to neuron_num x input_dim
    temp_input = np.tile(input_vec, (neuron_num, 1))
    temp_input = temp_input*synapse_factor.T
    # normalize input
    temp_input_norm = np.sqrt(np.sum(temp_input*temp_input, axis=1))
    temp_input_norm[temp_input_norm == 0] = 1
    temp_input = temp_input/np.tile(temp_input_norm.reshape(-1, 1), (1, input_dim))
    # normalize weight
    weight_vec_normalized = weight_vec*synapse_factor
    # plt.imshow(weight_vec_normalized)
    # plt.show()
    weight_vec_norm = np.sqrt(np.sum(weight_vec_normalized*weight_vec_normalized, axis=0))
    weight_vec_norm[weight_vec_norm == 0] = 1
    weight_vec_normalized = weight_vec_normalized/np.tile(weight_vec_norm, (input_dim, 1))
    output_vec = np.zeros((1, neuron_num))
    for i in range(neuron_num):
        # cosine of the angle between the normalized input and weight vectors
        output_vec[0, i] = np.dot(temp_input[i, :].reshape(1, -1),
                                  weight_vec_normalized[:, i].reshape(-1, 1))[0, 0]
    return output_vec
4.3.3 Top-K competition response
We use the above response-computation function to obtain the response to each input separately, and then take their weighted sum to obtain the final neuron response. [The relevant code follows.]
self.x_response = training_image.reshape(1, -1)
for i in range(self.z_area_num):
    self.z_response[i] = np.zeros(self.z_response[i].shape)
    self.z_response[i][0, true_z[i]] = 1
self.x_response = preprocess(self.x_response)
# compute response
self.y_bottom_up_response = compute_response(self.x_response,
                                             self.y_bottom_up_weight,
                                             self.y_bottom_up_synapse_factor)
for i in range(self.z_area_num):
    self.y_top_down_response[i] = compute_response(self.z_response[i],
                                                   self.y_top_down_weight[i],
                                                   self.y_top_down_synapse_factor[i])
# top-down + bottom-up response
self.y_pre_lateral_response = (self.y_bottom_up_percent*self.y_bottom_up_response +
                               self.y_top_down_percent*np.mean(self.y_top_down_response, axis=0).reshape(1, -1))/(self.y_bottom_up_percent + self.y_top_down_percent)
# lateral response
self.y_lateral_response = compute_response(self.y_pre_lateral_response,
                                           self.y_lateral_weight,
                                           self.y_lateral_synapse_factor)
self.y_pre_response = ((self.y_bottom_up_percent + self.y_top_down_percent)*self.y_pre_lateral_response +
                       self.y_lateral_percent*self.y_lateral_response)
self.y_response = top_k_competition(self.y_pre_response,
                                    self.y_top_down_response,
                                    self.y_inhibit_weight,
                                    self.y_inhibit_synapse_factor,
                                    self.y_top_k)
We further need to determine the activated neurons through top-K competition, so as to know which neurons may use the current sample for a parameter update. The top-K competition computation makes use of the inhibit weight. The procedure is as follows:
def top_k_competition(response_input, top_down_response, inhibit_weight, inhibit_synapse_factor, top_k):
    """ TODO: there are two ways to do things
        1: if a neuron is within the synapse, then include that neuron in top-k
        2: if a neuron is within the synapse and the weight is > 0.5, then
           include that neuron in top-k
        this version does things in the 1st way
        response_input is of size 1 x neuron_num """
    response_output = np.zeros(response_input.shape)
    _, neuron_num = response_input.shape
    top_down_flag = np.ones((1, neuron_num))
    for i in range(len(top_down_response)):
        top_down_flag = top_down_flag*top_down_response[i]
    for i in range(neuron_num):
        curr_response = response_input[0, i]
        # only compete with neurons inside this neuron's inhibition range
        curr_mask = (inhibit_synapse_factor[:, i] > 0)
        compare_response = response_input*curr_mask.T.reshape(1, -1)
        compare_response[0, i] = curr_response
        neuron_id = np.argsort(-compare_response.reshape(-1))
        for j in range(top_k):
            if len(top_down_response) != 0:
                if neuron_id[j] == i and top_down_flag[0, i] > 0:
                    response_output[0, i] = 1
                    break
            elif neuron_id[j] == i:
                response_output[0, i] = 1
                break
    return response_output
4.3.4 Updating Parameters
From the top-K competition function, the currently activated neurons can be determined, and each weight can then be updated according to the Hebbian learning rules.
Two points need to be made here:
- Only the attention selection over X, i.e., the bottom_up_synapse_factor, is meaningful in this example; it is updated online via the synapse diff intermediate variable. Z has no attention selection, so its synapse_factor is fixed at 1.
- Learning rate LR is a function of access frequency and forgotten parameters, which is adjusted adaptively with learning. The reason why static and fixed learning rate is not used is to reasonably allocate the weights of historical data and current data, so as to better track the dynamics of input environment. In this case, the learning rate can be obtained as a function (firing_age is the activation times of this neuron) :
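The exact formula lives in the program; as a sketch, the amnesic-mean learning rate from Weng's LCA literature has the following shape (the thresholds `t1`, `t2` and constants `c`, `r` below are hypothetical values, not the ones used in this program):

```python
def get_learning_rate(firing_age, t1=20, t2=200, c=2.0, r=10000):
    # Amnesic-mean sketch: lr = (1 + mu(n)) / n, where mu grows with the
    # neuron's age n, so recent samples keep a nonzero weight (forgetting).
    # t1, t2, c, r are hypothetical values, not the program's actual ones.
    n = firing_age + 1  # firing_age counts previous activations
    if n <= t1:
        mu = 0.0
    elif n <= t2:
        mu = c*(n - t1)/(t2 - t1)
    else:
        mu = c + (n - t2)/r
    return (1.0 + mu)/n

print(get_learning_rate(0))  # a brand-new neuron copies the input (lr = 1)
```

The key property is that lr starts at 1 (a new neuron simply memorizes its first input), decays roughly like 1/n, but stays above 1/n for old neurons so the network never stops adapting.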
The procedure is as follows:
```python
# Hebbian learning and synapse maintenance
for i in range(self.y_neuron_num):
    if self.y_response[0,i] == 1:  # firing neuron, response currently set to 1
        if self.y_lsn_flag[0,i] == 0:
            self.y_lsn_flag[0,i] = 1
            self.y_firing_age[0,i] = 0
        lr = get_learning_rate(self.y_firing_age[0,i])  # learning rate
        # bottom-up weight and synapse factor
        self.y_bottom_up_weight[:,i] = (1-lr)*self.y_bottom_up_weight[:,i] + lr*self.x_response.reshape(-1)
        self.y_bottom_up_synapse_diff[:,i] = ((1-lr)*self.y_bottom_up_synapse_diff[:,i] +
            lr*np.abs(self.y_bottom_up_weight[:,i] - self.x_response.reshape(-1)))
        if self.y_synapse_flag > 0 and self.y_firing_age[0,i] > self.y_synapse_age:
            self.y_bottom_up_synapse_factor[:,i] = get_synapse_factor(self.y_bottom_up_synapse_diff[:,i],
                                                                     self.y_bottom_up_synapse_factor[:,i],
                                                                     self.y_synapse_coefficient)
        # top-down weight and synapse factor
        for j in range(self.z_area_num):
            self.y_top_down_weight[j][:,i] = (1-lr)*self.y_top_down_weight[j][:,i] + lr*self.z_response[j].reshape(-1)
            self.y_top_down_synapse_diff[j][:,i] = ((1-lr)*self.y_top_down_synapse_diff[j][:,i] +
                lr*np.abs(self.y_top_down_weight[j][:,i] - self.z_response[j].reshape(-1)))
            if (self.y_synapse_flag > 1) and (self.y_firing_age[0,i] > self.y_synapse_age):
                self.y_top_down_synapse_factor[j][:,i] = get_synapse_factor(self.y_top_down_synapse_diff[j][:,i],
                                                                           self.y_top_down_synapse_factor[j][:,i],
                                                                           self.y_synapse_coefficient)
        # lateral weight and synapse factor
        # lateral excitation connections only exist between firing neurons
        self.y_lateral_weight[:,i] = (1-lr)*self.y_lateral_weight[:,i] + lr*self.y_response.reshape(-1)
        self.y_lateral_synapse_diff[:,i] = ((1-lr)*self.y_lateral_synapse_diff[:,i] +
            lr*np.abs(self.y_lateral_weight[:,i] - self.y_response.reshape(-1)))
        if (self.y_synapse_flag > 2) and (self.y_firing_age[0,i] > self.y_synapse_age):
            self.y_lateral_synapse_factor[:,i] = get_synapse_factor(self.y_lateral_synapse_diff[:,i],
                                                                   self.y_lateral_synapse_factor[:,i],
                                                                   self.y_synapse_coefficient)
        self.y_firing_age[0,i] = self.y_firing_age[0,i] + 1
    elif self.y_lsn_flag[0,i] == 0:  # initialization-stage neurons always update
        lr = get_learning_rate(self.y_firing_age[0,i])
        normed_input = self.x_response.reshape(-1, 1)*self.y_bottom_up_synapse_factor[:,i].reshape(-1, 1)
        self.y_bottom_up_weight[:,i] = (1-lr)*self.y_bottom_up_weight[:,i] + lr*normed_input.reshape(-1)
        self.y_bottom_up_weight[:,i] = self.y_bottom_up_weight[:,i]*self.y_bottom_up_synapse_factor[:,i]
        self.y_bottom_up_synapse_diff[:,i] = ((1-lr)*self.y_bottom_up_synapse_diff[:,i] +
            lr*np.abs(self.y_bottom_up_weight[:,i] - normed_input.reshape(-1)))
        if self.y_synapse_flag > 0 and self.y_firing_age[0,i] > self.y_synapse_age:
            self.y_bottom_up_synapse_factor[:,i] = get_synapse_factor(self.y_bottom_up_synapse_diff[:,i],
                                                                     self.y_bottom_up_synapse_factor[:,i],
                                                                     self.y_synapse_coefficient)
        # top-down weight and synapse factor
        for j in range(self.z_area_num):
            self.y_top_down_weight[j][:,i] = (1-lr)*self.y_top_down_weight[j][:,i] + lr*self.z_response[j].reshape(-1)
            self.y_top_down_synapse_diff[j][:,i] = ((1-lr)*self.y_top_down_synapse_diff[j][:,i] +
                lr*np.abs(self.y_top_down_weight[j][:,i] - self.z_response[j].reshape(-1)))
            if (self.y_synapse_flag > 1) and (self.y_firing_age[0,i] > self.y_synapse_age):
                self.y_top_down_synapse_factor[j][:,i] = get_synapse_factor(self.y_top_down_synapse_diff[j][:,i],
                                                                           self.y_top_down_synapse_factor[j][:,i],
                                                                           self.y_synapse_coefficient)
        # lateral weight and synapse factor
        # lateral excitation connections only exist between firing neurons
        self.y_lateral_weight[:,i] = (1-lr)*self.y_lateral_weight[:,i] + lr*self.y_response.reshape(-1)
        self.y_lateral_synapse_diff[:,i] = ((1-lr)*self.y_lateral_synapse_diff[:,i] +
            lr*np.abs(self.y_lateral_weight[:,i] - self.y_response.reshape(-1)))
        if (self.y_synapse_flag > 2) and (self.y_firing_age[0,i] > self.y_synapse_age):
            self.y_lateral_synapse_factor[:,i] = get_synapse_factor(self.y_lateral_synapse_diff[:,i],
                                                                   self.y_lateral_synapse_factor[:,i],
                                                                   self.y_synapse_coefficient)
    else:
        # non-firing neuron past initialization: update inhibitory weights
        lr = get_learning_rate(self.y_inhibit_age[0,i])
        temp = np.zeros(self.y_inhibit_synapse_factor.shape)
        for j in range(self.y_neuron_num):
            temp[:,j] = self.y_pre_lateral_response.reshape(-1)*self.y_inhibit_synapse_factor[:,j]
            temp[:,j] = (temp[:,j] > self.y_pre_lateral_response[0,i])
        self.y_inhibit_weight[:,i] = (1-lr)*self.y_inhibit_weight[:,i] + lr*temp[:,i]
        self.y_inhibit_synapse_diff[:,i] = ((1-lr)*self.y_inhibit_synapse_diff[:,i] +
            lr*np.abs(self.y_inhibit_weight[:,i] - temp[:,i]))
        if (self.y_synapse_flag > 3) and (self.y_firing_age[0,i] > self.y_synapse_age):
            self.y_inhibit_synapse_factor[:,i] = get_synapse_factor(self.y_inhibit_synapse_diff[:,i],
                                                                   self.y_inhibit_synapse_factor[:,i],
                                                                   self.y_synapse_coefficient)
        self.y_inhibit_age[0,i] = self.y_inhibit_age[0,i] + 1

# z neuron learning
for area_idx in range(self.z_area_num):
    for i in range(self.z_neuron_num[area_idx]):
        if self.z_response[area_idx][0,i] == 1:
            lr = get_learning_rate(self.z_firing_age[area_idx][0,i])
            self.z_bottom_up_weight[area_idx][:,i] = ((1-lr)*self.z_bottom_up_weight[area_idx][:,i] +
                lr*self.y_response.reshape(-1))
            self.z_firing_age[area_idx][0,i] = self.z_firing_age[area_idx][0,i] + 1
```
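`get_synapse_factor` implements synapse maintenance: a connection whose running deviation |w − x| stays large relative to the others is treated as unstable (background-driven) and cut. The actual function is in the program; the following is only a hypothetical sketch of the idea:

```python
import numpy as np

def synapse_factor_sketch(synapse_diff, synapse_factor, synapse_coefficient):
    # Hypothetical sketch, not the program's exact rule: compare each
    # connection's running deviation to the mean deviation across the
    # neuron's connections; cut (factor 0) those whose relative deviation
    # exceeds synapse_coefficient, keep the rest unchanged.
    mean_diff = np.mean(synapse_diff) + 1e-12
    return np.where(synapse_diff/mean_diff > synapse_coefficient,
                    0.0, synapse_factor)
```

Connections to stable, object-related pixels keep a low deviation and survive; connections to randomly changing background pixels accumulate a high deviation and are pruned, which is exactly the attention-selection effect discussed in Section 4.6.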
4.3.5 Using the DN
If we believe the DN is trained, or we want to test its learning progress midway, we need to check whether, for a given input X, the DN outputs the correct Where motor and What motor. If the accuracy is high, the DN has learned the current samples well and can be used for prediction.
Given the input X, the procedure for computing the output Z is as follows:
```python
def dn_test(self, test_image):
    self.x_response = test_image.reshape(1, -1)
    self.x_response = preprocess(self.x_response)
    self.y_bottom_up_response = compute_response(self.x_response,
                                                 self.y_bottom_up_weight,
                                                 self.y_bottom_up_synapse_factor)
    self.y_pre_lateral_response = self.y_bottom_up_response
    self.y_lateral_response = compute_response(self.y_pre_lateral_response,
                                               self.y_lateral_weight,
                                               self.y_lateral_synapse_factor)
    self.y_pre_response = ((self.y_bottom_up_percent*self.y_pre_lateral_response +
                            self.y_lateral_percent*self.y_lateral_response) /
                           (self.y_bottom_up_percent + self.y_lateral_percent))
    self.y_response = top_k_competition(self.y_pre_response,
                                        [],  # no top-down context at test time
                                        self.y_inhibit_weight,
                                        self.y_inhibit_synapse_factor,
                                        self.y_top_k)
    z_output = []
    for i in range(self.z_area_num):
        self.z_response[i] = compute_response(self.y_response,
                                              self.z_bottom_up_weight[i],
                                              np.ones(self.z_bottom_up_weight[i].shape))
        z_output_i = np.argmax(self.z_response[i])
        z_output.append(z_output_i)
    return np.array(z_output)
```
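Both `dn_test` and the earlier training code rely on `compute_response`, whose definition is not repeated in this excerpt. A minimal sketch, assuming the cosine-similarity match ($\cos(\theta)$) discussed in this article, with the synapse factor masking both the input and the weights:

```python
import numpy as np

def compute_response(input_response, weight, synapse_factor):
    # Hypothetical sketch of the helper used throughout this section
    # (the real definition appears earlier in the program): mask both the
    # input and the weights by the synapse factor, then score each neuron
    # by the cosine similarity between the masked vectors.
    x = input_response.reshape(-1, 1)*synapse_factor   # one column per neuron
    w = weight*synapse_factor
    x_norm = np.linalg.norm(x, axis=0) + 1e-12
    w_norm = np.linalg.norm(w, axis=0) + 1e-12
    return (np.sum(x*w, axis=0)/(x_norm*w_norm)).reshape(1, -1)
```

Because the input is re-masked per neuron, a neuron with a sparse synapse factor is scored only on its attended region of X, which is what lets it ignore background pixels.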
4.3.6 Adaptive growth of DN neurons
In this example, the developmental program adaptively increases the number of neurons in the developmental network. When a neuron is activated more than a certain number of consecutive times (20 in the program), it undergoes mitosis: one neuron splits into three. The splitting function is as follows:
```python
def dn_split(dn, split_num, split_firing_age):
    input_dim = dn.x_neuron_num
    y_top_k = dn.y_top_k
    z_neuron_num = dn.z_neuron_num
    y_neuron_num = dn.y_neuron_num*split_num
    new_to_old_index = np.zeros(y_neuron_num, dtype=int)
    for i in range(dn.y_neuron_num):
        start_ind = i*split_num
        end_ind = (i+1)*split_num
        new_to_old_index[start_ind:end_ind] = i
    new_to_old_index = new_to_old_index.tolist()
    new_dn = DN(input_dim, y_neuron_num, y_top_k, z_neuron_num)
    for i in range(new_dn.y_neuron_num):
        j = new_to_old_index[i]
        new_dn.y_lsn_flag[0,i] = dn.y_lsn_flag[0,j]
        new_dn.y_firing_age[0,i] = split_firing_age
        new_dn.y_inhibit_age[0,i] = split_firing_age
        # copy the parent's weights, mutate, then renormalize
        new_dn.y_bottom_up_weight[:,i] = (dn.y_bottom_up_weight[:,j] +
                                          generate_rand_mutate(dn.y_bottom_up_weight[:,j].shape))
        new_dn.y_bottom_up_weight[:,i] = new_dn.y_bottom_up_weight[:,i]/np.max(new_dn.y_bottom_up_weight[:,i])
        for z_ind in range(new_dn.z_area_num):
            new_dn.y_top_down_weight[z_ind][:,i] = (dn.y_top_down_weight[z_ind][:,j] +
                                                    generate_rand_mutate(dn.y_top_down_weight[z_ind][:,j].shape))
            new_dn.y_top_down_weight[z_ind][:,i] = new_dn.y_top_down_weight[z_ind][:,i]/np.max(new_dn.y_top_down_weight[z_ind][:,i])
        new_dn.y_lateral_weight[:,i] = dn.y_lateral_weight[new_to_old_index, j]
        new_dn.y_inhibit_weight[:,i] = (dn.y_inhibit_weight[new_to_old_index, j] +
                                        generate_rand_mutate(dn.y_inhibit_weight[new_to_old_index, j].shape))
        new_dn.y_inhibit_weight[:,i] = new_dn.y_inhibit_weight[:,i]/np.max(new_dn.y_inhibit_weight[:,i])
        # carry over deviations; reset synapse factors to all-ones
        new_dn.y_bottom_up_synapse_diff[:,i] = dn.y_bottom_up_synapse_diff[:,j]
        new_dn.y_bottom_up_synapse_factor[:,i] = np.ones(dn.y_bottom_up_synapse_factor[:,j].shape)
        for z_ind in range(new_dn.z_area_num):
            new_dn.y_top_down_synapse_diff[z_ind][:,i] = dn.y_top_down_synapse_diff[z_ind][:,j]
            new_dn.y_top_down_synapse_factor[z_ind][:,i] = np.ones(dn.y_top_down_synapse_factor[z_ind][:,j].shape)
        new_dn.y_lateral_synapse_diff[:,i] = dn.y_lateral_synapse_diff[new_to_old_index, j]
        new_dn.y_lateral_synapse_factor[:,i] = np.ones(dn.y_lateral_synapse_factor[new_to_old_index, j].shape)
        new_dn.y_inhibit_synapse_diff[:,i] = dn.y_inhibit_synapse_diff[new_to_old_index, j]
        new_dn.y_inhibit_synapse_factor[:,i] = np.ones(dn.y_inhibit_synapse_factor[new_to_old_index, j].shape)
        for z_ind in range(new_dn.z_area_num):
            new_dn.z_bottom_up_weight[z_ind][i,:] = dn.z_bottom_up_weight[z_ind][j,:]
        for z_ind in range(new_dn.z_area_num):
            new_dn.z_firing_age[z_ind] = np.ones(dn.z_firing_age[z_ind].shape)
    return new_dn
```
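The copy indexing in `dn_split` can be checked with a standalone toy of the `new_to_old_index` construction (each old neuron j becomes `split_num` children):

```python
import numpy as np

def split_index_map(old_neuron_num, split_num):
    # Mirrors the new_to_old_index loop in dn_split:
    # child i descends from old neuron i // split_num.
    idx = np.zeros(old_neuron_num*split_num, dtype=int)
    for j in range(old_neuron_num):
        idx[j*split_num:(j+1)*split_num] = j
    return idx

# with split_num = 3, two splits grow the layer 25 -> 75 -> 225 neurons
```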
4.4 Results
The training performance at each step is shown below. There were two splitting operations along the way, the final number of DN neurons was 25×3×3 = 225, and the final accuracy was [0.99 0.962]. Notice that whenever the network's performance hit a bottleneck, a split raised it again.

```
500 training, Current Performance: [0.226 0.054]
1000 training, Current Performance: [0.278 0.068]
1500 training, Current Performance: [0.362 0.164]
2000 training, Current Performance: [0.41 0.17]
2500 training, Current Performance: [0.4 0.172]
3000 training, Current Performance: [0.442 0.182]
3500 training, Current Performance: [0.476 0.212]
4000 training, Current Performance: [0.48 0.226]
4500 training, Current Performance: [0.486 0.206]
5000 training, Current Performance: [0.466 0.242]
5500 training, Current Performance: [0.462 0.204]
6000 training, Current Performance: [0.48 0.232]
6500 training, Current Performance: [0.436 0.174]
7000 training, Current Performance: …
Splitting at 7263
7500 training, Current Performance: [0.444 0.292]
8000 training, Current Performance: [0.776 0.558]
8500 training, Current Performance: [0.8 0.558]
9000 training, Current Performance: [0.78 0.59]
9500 training, Current Performance: [0.8 0.574]
10000 training, Current Performance: [0.784 0.6]
10500 training, Current Performance: [0.794 0.586]
11000 training, Current Performance: [0.82 0.596]
11500 training, Current Performance: [0.824 0.614]
12000 training, Current Performance: [0.826 0.602]
12500 training, Current Performance: [0.818 0.58]
Splitting at 12804
13000 training, Current Performance: [0.688 0.444]
13500 training, Current Performance: [0.802 0.696]
14000 training, Current Performance: [0.922 0.882]
14500 training, Current Performance: [0.934 0.908]
15000 training, Current Performance: [0.92 0.9]
15500 training, Current Performance: [0.924 0.876]
16000 training, Current Performance: [0.96 0.922]
16500 training, Current Performance: [0.974 0.928]
17000 training, Current Performance: [0.988 0.954]
17500 training, Current Performance: [0.97 0.956]
18000 training, Current Performance: [0.984 0.95]
18500 training, Current Performance: [0.992 0.984]
19000 training, Current Performance: [0.988 0.956]
19500 training, Current Performance: [0.988 0.97]
20000 training, Current Performance: [0.996 0.97]
20500 training, Current Performance: [0.986 0.958]
21000 training, Current Performance: [0.99 0.956]
21500 training, Current Performance: [0.994 0.972]
22000 training, Current Performance: [0.994 0.978]
22500 training, Current Performance: [0.992 0.982]
23000 training, Current Performance: [0.992 0.978]
23500 training, Current Performance: [0.992 0.966]
24000 training, Current Performance: [0.99 0.958]
24500 training, Current Performance: [0.99 0.962]
Testing accuracy is [0.996 0.975]
```

The DN's y_bottom_up_weight is visualized as 15×15 matrix points, each matrix point being a small 19×19 matrix.
4.5 Program [uploaded to GitHub]
Disclaimer: this example program is based on the MATLAB code written by Professor Weng's student Zejia Zheng.
Source code: github/ZejiaZheng/multiresolutionDN [MATLAB]
Rewriting the MATLAB code in Python gave me a deeper understanding of the developmental network. My Python program, written interactively in a Jupyter Notebook, has been uploaded to GitHub: github/huyanmingtoby/multiresolutionDN [Python]
4.6 Analysis and discussion
This section (Section 4) supplements the description of developmental networks with an implementation example [WHat-network]. Constructed image data, together with each object's location and type, are used as input to train the WWN. During training, the developmental network itself split the neurons in its developmental layer twice, eventually forming 225 neurons; each split greatly improved the test accuracy. Finally, y_bottom_up_weight was printed: the weights connecting each neuron to the X input, rendered as 19×19 patches, turn out to be a series of patterns very similar to the input pictures. All possible positions of all objects number 5×25 = 125, so 125 neurons would ideally suffice to store all possible patterns. The DN starts with 25 neurons, which is clearly not enough. After one mitosis it has 25×3 = 75, still not enough; after another, 75×3 = 225, which is theoretically enough [the experimental results confirm this: the test accuracy is very high, [0.99 0.962]]. Note, however, that the backgrounds are chosen at random, so even the same object in the same location appears against countless different backgrounds. That 225 neurons suffice indicates the developmental network grasps the important information (the object-related pixels) and ignores the background pixels, which is the effect of attention selection: a neuron's attention-selection values (synapse factors) let it respond only to the corresponding region of the input, avoiding the influence of background noise.
Personal opinion: after reading papers, especially machine-learning papers, I often feel I understand them, when in fact many important ideas are hidden in the actual implementation. If you find a paper truly valuable, it is best to find the author's implementation, read it, and then implement it yourself. Many hard-to-articulate but important ideas are left out of papers for readability, yet are sure to be hiding in the programs. If you find the developmental network valuable, I recommend cloning the complete program with git, reading it, writing your own version, and running it.
5. Conclusion
This paper systematically introduced the idea, method, algorithm, and program of the developmental network. The developmental network has the in-place property: each neuron contains the entire developmental program. We therefore focused on the initialization, training, and use of a single neuron; the development of each neuron is a CCILCA. To describe the idea of DN clearly, this paper pulled the lobe component analysis apart, so the complete LCA is hard to see; I believe this benefits the introduction of DN's ideas and methods, and I will restore this fragmented LCA to the full algorithm in another blog post. Finally, a small WWN built on the developmental network was used to deepen the understanding of DN and reveal more of its details: the number of neurons in the developmental layer is adjusted adaptively, and the final learning result is also very good. This article also mixes in many personal opinions, which may not all be appropriate; discussion is welcome. My remaining doubts about DN are as follows:
- Is the forgetting in DN justified? Is forgetting really a human trait worth copying?
- Can $\cos(\theta)$ be replaced by other distance measures without changing the autonomous developmental procedure?
- In the final analysis, DN stores its learned knowledge in the form of a lookup table, namely ⟨x, z⟩ pairs. Does knowledge in the human brain really exist as one memory state after another? Deep learning and RBF networks fit the relationship between x and z with a nonlinear function, which is clearly more efficient [setting aside human incremental, sequential learning]: a functional representation encodes the information more effectively, so that for the same amount of information the model needs less capacity.
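To make the lookup-table contrast concrete, here is a toy of my own (not DN code): every learned ⟨x, z⟩ pair is stored verbatim, and recall returns the z of the best-matching stored x. Capacity grows with the number of stored pairs, whereas a fitted function would share a fixed set of parameters across all inputs.

```python
import numpy as np

class LookupTableMemory:
    # Toy illustration of DN-style storage: each learned <x, z> pair is a
    # stored state; recall returns the z of the nearest stored x (cosine match).
    def __init__(self):
        self.xs, self.zs = [], []

    def learn(self, x, z):
        self.xs.append(np.asarray(x, dtype=float))
        self.zs.append(z)

    def recall(self, x):
        x = np.asarray(x, dtype=float)
        sims = [float(np.dot(x, s)/(np.linalg.norm(x)*np.linalg.norm(s) + 1e-12))
                for s in self.xs]
        return self.zs[int(np.argmax(sims))]
```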