Adaptive Resonance Theory (ART) was proposed by S. Grossberg of Boston University in 1976. For many years Grossberg had been trying to establish a unified mathematical theory for human psychological and cognitive activities, and ART is the core of that theory. G. A. Carpenter and S. Grossberg later developed the ART network together. After years of research and continuous development, the ART network now has three forms: ART I processes bipolar or binary signals; ART II is an extension of ART I for continuous analog signals; ART III is a hierarchical search model that is compatible with the functions of the first two structures and extends the two-layer neural network to an arbitrary multilayer network. Because ART III incorporates the bioelectrochemical reaction mechanism of biological neurons into its neuron model, it is powerful and extensible. This article introduces the working principle of the ART I network and closes with a simple example and the corresponding program.
1. Catastrophic forgetting
A supervised learning network can form stable memories through repeated presentation of sample patterns. But if new samples are added and training continues, the earlier results are gradually disturbed: old data and old knowledge are forgotten. The weight-adjustment formula of an unsupervised learning network contains a learning term for new data and a forgetting term for old data, and some compromise can be reached by tuning the learning coefficient and the forgetting coefficient; however, there is no general method for choosing these coefficients. Both types of network therefore forget old samples, which degrades classification performance, and expanding the network without limit is not a realistic remedy. The ART network solves this problem well: while increasing the network scale only moderately, it strikes a compromise between past memorized patterns and new input patterns, accepting as much new knowledge as possible (plasticity) while disturbing past pattern samples as little as possible (stability).
ART networks and algorithms thus adapt flexibly to new input patterns while avoiding modification of previously learned ones. The idea is as follows: when the network receives a new input, it checks the degree of match between the input pattern and the typical vector of every stored pattern class against a preset reference threshold. Among all classes whose similarity exceeds the threshold, the most similar one is chosen as the representative class for the input, and the weights associated with that class are adjusted so that later inputs resembling this pattern will match it with even greater similarity. If no similarity exceeds the threshold, a new pattern class is created in the network, and the weights connected to it are established to represent and store the pattern and all similar patterns entered later.
2. ART I network
The network is introduced in terms of the three elements of a neural network: the neuron model, the network structure, and the learning algorithm.
2.1 Network Structure
2.1.1 Overall structure
The ART I network consists of two subsystems built from two layers of neurons: the comparison layer C and the recognition layer R. In addition it contains three control signals: the reset signal Reset and the logical control signals $G_1$ and $G_2$.
2.1.2 Structure of layer C
Layer C has n neurons, each of which receives signals from three sources: the external input signal $x_i$, the feedback signal $t_{ij}$ returned from the winning R-layer neuron along its outstar weight vector, and the control signal $G_1$. The output of a C-layer neuron is generated by the 2/3 majority-voting rule: the output takes the value shared by the majority of the three signals. When the network starts running, $G_1 = 1$; the recognition layer has not yet produced a competition winner, so the feedback signal is 0, and by the 2/3 rule the C-layer output depends on the input signal alone, i.e. $C = X$. When the recognition layer returns a feedback signal, $G_1 = 0$, and by the 2/3 rule the output of layer C depends on the comparison between the input signal and the feedback signal: if $x_i = t_{ij}$ then $c_i = x_i$, otherwise $c_i = 0$. The control signal $G_1$ thus lets the comparison layer distinguish the stages of network operation: in the initial stage $G_1 = 1$ makes layer C pass the input signal straight through, and afterwards $G_1 = 0$ makes layer C perform its comparison function, in which $c_i$ is the comparison of $x_i$ and $t_{ij}$ and equals 1 only when both are 1, otherwise 0. The feedback signal from layer R therefore modulates the output of layer C.
2.1.3 Structure of layer R
Layer R is functionally equivalent to a feedforward competitive network. It has m neurons, representing m input pattern classes, and m can be increased dynamically to set up new pattern classes. The output vector C of layer C reaches the R-layer neurons along their instar weight vectors, and after the competition the winning neuron indicates the category of the input pattern: its output is 1 and all other outputs are 0. Each R-layer neuron carries two weight vectors: an instar weight vector that collects the feedforward signal from layer C to layer R, and an outstar weight vector that sends the feedback signal from layer R back to layer C.
2.1.4 Control signals
The signal $G_2$ detects whether the input pattern $X$ is all zero; it equals the logical OR of the components of $X$: if all $x_i$ are 0 then $G_2 = 0$, otherwise $G_2 = 1$. Let $R_0$ be the logical OR of the components of the R-layer output vector; then $G_1 = G_2 \wedge \overline{R_0}$. That is, $G_1 = 1$ only when all components of the R-layer output vector are 0 and the input vector $X$ is not all zero; otherwise $G_1 = 0$. The function of $G_1$ is to let the comparison layer distinguish the stages of network operation: in the initial stage $G_1$ makes layer C output the input signal directly, and afterwards $G_1$ makes layer C perform the comparison function, where $c_i$ is the comparison signal of $x_i$ and $t_{ij}$ and equals 1 only when both are 1, otherwise 0. If $T_j$ and $X$ do not reach the preset similarity under the chosen measure, the two are not close enough, and the system issues the Reset signal to invalidate the competition winner.
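As a minimal sketch of this control logic (the function names and array conventions are ours, assuming binary numpy vectors):

import numpy as np

def control_signals(x, r):
    # G2 = logical OR of the input components: 0 only for an all-zero input
    g2 = int(np.any(x))
    # R0 = logical OR of the R-layer outputs; G1 = G2 AND (NOT R0)
    r0 = int(np.any(r))
    g1 = int(g2 and not r0)
    return g1, g2

def c_layer_output(x, t_feedback, g1):
    # 2/3 rule: each C neuron outputs the majority value of (x_i, t_ij, G1)
    if g1:  # start-up stage: no feedback yet, so C = X
        return x.copy()
    return np.logical_and(x, t_feedback).astype(int)  # comparison stage: c_i = 1 iff x_i = t_ij = 1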
2.2 Network Operation Principle
At run time the network accepts input patterns from the environment and checks, at layer R, the match between the input pattern and every stored pattern class; each stored class is reflected by the outstar weight vector of the corresponding R-layer neuron. For the winning neuron with the highest matching degree, the network then examines the similarity between its stored pattern class and the current input pattern. Similarity is judged against a pre-designed reference threshold (the vigilance threshold), and the following situations may occur:
A. If the similarity exceeds the threshold, the current input pattern is assigned to this class. The weight-adjustment rule is that the neuron whose similarity exceeded the threshold adjusts its instar and outstar weight vectors, so that samples close to the current input will later obtain an even greater similarity; all other weights remain unchanged.
B. If the similarity does not exceed the threshold, the class represented by the R-layer neuron with the next-highest matching degree is examined. If its similarity exceeds the threshold, the network returns to situation A; otherwise it remains in situation B. If the similarity of every stored pattern class to the current input finally fails the test, a neuron representing a new pattern class is set up at the output end of the network to represent and store the pattern and to take part in subsequent matching.
The network treats every newly accepted input sample in this way. For each input pattern, the operation can be summarized in four phases:
2.2.1 Matching Phase
Before any pattern is applied, the network waits with $X = 0$. When an input pattern $X$ is not all zero, $G_1 = 1$ lets it pass directly through layer C and feed forward to layer R, where it is matched against the instar weight vector $B_j$ of every R-layer neuron:

$$net_j = B_j^T X = \sum_{i=1}^{n} b_{ij} x_i$$

The competition winner is the neuron with the largest match (the largest dot product):

$$net_{j^*} = \max_j \{ net_j \}$$

The winning neuron outputs $r_{j^*} = 1$; all other neurons output 0.
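A sketch of this step under the same conventions (weight_b holds the instar vectors $B_j$ as columns, as in the sample program below):

import numpy as np

def matching_phase(x, weight_b):
    # net_j = B_j^T X for every R-layer neuron
    net = weight_b.T @ x
    j_star = int(np.argmax(net))  # winner-take-all competition
    return j_star, net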
2.2.2 Comparison phase
The outstar weight vector $T_{j^*}$ of the winning R-layer neuron is activated, and its n weight signals return to the n neurons of layer C. The R-layer output is now not all zero, so the new C-layer output depends on the comparison between the outstar vector returned by layer R and the network input pattern $X$. Since the outstar vector is the typical vector of the winning R-layer pattern class, the comparison reflects the similarity between the class that won the matching stage and the current input $X$. The similarity is measured by

$$N_0 = X^T T_{j^*} = \sum_{i=1}^{n} t_{ij^*} x_i = \sum_{i=1}^{n} c_i$$

Because the inputs $x_i$ are binary, $N_0$ is in fact the number of positions where the typical vector of the winning class and the input sample are both 1. Together with the number of non-zero components of the input,

$$N_1 = \sum_{i=1}^{n} x_i$$

the ratio $N_0/N_1$ is compared against a vigilance threshold $\rho$ set between 0 and 1, to check whether the similarity between the input pattern and the class's typical vector falls below the threshold. If

$$N_0 / N_1 > \rho$$

then $X$ is very close to the class of the winning neuron and is said to resonate with $T_{j^*}$; the matching result of the first stage is valid and the network enters the learning phase. Otherwise the network issues the Reset signal and enters the search phase.
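A sketch of the vigilance test (assuming binary numpy vectors; t_winner stands for the outstar vector $T_{j^*}$ of the winning neuron):

import numpy as np

def vigilance_test(x, t_winner, rho):
    n0 = np.sum(t_winner * x)  # N0: positions where both vectors are 1
    n1 = np.sum(x)             # N1: non-zero components of the input
    return n0 / n1 > rho       # True: resonance; False: send Reset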
2.2.3 Search Phase
The network enters the search phase when the Reset signal is issued. Reset suppresses the neuron that won the first competition, and the suppression persists until the next new pattern is presented. With the winner suppressed, $R_0 = 0$ and $G_1 = 1$, so the network returns to the initial matching state; since the previous winner remains suppressed, the new winner is necessarily the second-best match. The network then enters the comparison phase again and computes the similarity between the new outstar vector $T_{j^*}$ and the input pattern. If no pattern class stored at layer R meets the requirement in these similarity checks, the current input belongs to no existing class, and a neuron must be added to the output layer of the network to represent and store the new pattern class: its instar weight vector $B_{j^*}$ is set from the current input pattern vector, and all components of its outstar weight vector $T_{j^*}$ are set to 1. A sketch of this node-creation step follows below.
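A sketch of adding the new class node once the search fails (column layout as in the sample program below):

import numpy as np

def add_new_class(x, weight_b, weight_t):
    # new instar vector taken from the current input pattern
    weight_b = np.column_stack((weight_b, x))
    # new outstar vector with all components set to 1
    weight_t = np.column_stack((weight_t, np.ones_like(x)))
    return weight_b, weight_t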
2.2.4 Learning stage
In the learning phase, the pattern class of the resonating winner is reinforced so that an even greater resonance is obtained when similar input samples appear later. The outstar weight vector $T_{j^*}$ and the instar weight vector $B_{j^*}$ are adjusted during operation to further strengthen the memory. After learning, the memory of the sample is retained in these two sets of weight vectors; the weights persist even when the input changes, which is why they are called long-term memory. When a later input resembles a memorized sample, the two sets of long-term memories recall the R-layer output to the state of that memorized sample.
2.3 Network learning algorithm
An ART I network can be implemented with a learning algorithm or in hardware. Training proceeds in the following steps:

(1) Initialization. Every instar weight vector $B_j$ from layer C to layer R is given the same small value, for example

$$b_{ij}(0) = \frac{1}{1+n}$$

and every component of the outstar weight vectors $T_j$ from layer R to layer C is set to 1. The initial weights strongly influence the whole algorithm: setting the instar weights by the formula above ensures that an input vector converges to its proper category rather than capturing an unused neuron, and setting the outstar components to 1 ensures that the similarity can be computed correctly during the similarity measurement. The vigilance threshold $\rho$ is set to a number between 0 and 1; it expresses how close two patterns must be to be considered similar, so its size directly affects the classification accuracy.

(2) Apply an input pattern $X = (x_1, x_2, \dots, x_n)$, $x_i \in \{0, 1\}$.

(3) Matching. Compute the matching degree of the input $X$ with every instar weight vector $B_j$ of layer R:

$$B_j^T X = \sum_{i=1}^{n} b_{ij} x_i$$

(4) Selection. Within the set of effective output neurons of layer R, choose the best-matching winner $j^*$: $r_{j^*} = 1$, else $r_j = 0$.

(5) Similarity calculation. The winning neuron $j^*$ sends the typical vector of its stored class back through its outstar weights; the C-layer outputs give the comparison of $T_{j^*}$ and $X$, $c_i = t_{ij^*} x_i$, from which the similarity of the two vectors is computed with

$$N_1 = \sum_{i=1}^{n} x_i, \qquad N_0 = \sum_{i=1}^{n} c_i$$

(6) Vigilance test. Test the similarity $N_0/N_1$ against the preset threshold $\rho$.

(7) Search. If the test fails, search among the pattern classes by the method introduced above.

(8) Weight adjustment. Modify the weight vectors of the R-layer neuron $j^*$. Two rules are used. The outstar weight vector is adjusted by

$$t_{ij^*}(t+1) = t_{ij^*}(t)\, x_i$$

so the outstar vector remains the typical vector (cluster center) of the corresponding pattern class. The instar weight vector is adjusted by

$$b_{ij^*}(t+1) = \frac{t_{ij^*}(t)\, x_i}{0.5 + \sum_{i=1}^{n} t_{ij^*}(t)\, x_i} = \frac{t_{ij^*}(t+1)}{0.5 + \sum_{i=1}^{n} t_{ij^*}(t+1)}$$

As can be seen, without the constant 0.5 in the denominator this rule would simply normalize the outstar weight vector.
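As a worked illustration of step (8) (the numbers are ours): let $n = 4$, let the winner's outstar vector be $T_{j^*}(t) = (1, 1, 0, 1)$ and the input be $X = (1, 0, 0, 1)$. The outstar rule gives $T_{j^*}(t+1) = (1, 0, 0, 1)$, and since $\sum_i t_{ij^*}(t+1) = 2$, the instar vector becomes $B_{j^*}(t+1) = (1, 0, 0, 1)/(0.5 + 2) = (0.4, 0, 0, 0.4)$.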
2.4 ART I sample program
import numpy as np

def ART_learn(train_data_active, weight_t_active):
    # Outstar update: t_ij(t+1) = t_ij(t) * x_i
    weight_t_update = train_data_active * weight_t_active
    # Instar update: b_ij(t+1) = t_ij(t+1) / (0.5 + sum_i t_ij(t+1))
    weight_b_update = weight_t_update / (0.5 + np.sum(weight_t_update))
    return weight_t_update, weight_b_update

def ART_core(train_data, R_node_num, weight_b, weight_t, threshold_ro):
    data_length, data_num = train_data.shape
    result = np.zeros(data_num)
    for i in range(data_num):
        x = train_data[:, i]
        R_node = np.zeros(R_node_num)      # marks R-layer neurons suppressed by Reset
        matched = False
        for _ in range(R_node_num):        # search phase: try each stored class at most once
            net = weight_b.T @ x           # matching phase: net_j = B_j^T X
            net[R_node == 1] = -np.inf     # suppress neurons that already failed the test
            j_max = int(np.argmax(net))    # competition winner
            R_node[j_max] = 1
            Similarity_N0 = np.sum(weight_t[:, j_max] * x)
            Similarity_N1 = np.sum(x)
            if Similarity_N0 / Similarity_N1 > threshold_ro:
                # resonance: adjust the winner's outstar and instar weight vectors
                weight_t[:, j_max], weight_b[:, j_max] = ART_learn(x, weight_t[:, j_max])
                print('Sample %d belongs to class %d' % (i, j_max))
                result[i] = j_max
                matched = True
                break
        if not matched:
            # no stored class passed the vigilance test: create a new R-layer neuron
            R_node_num = R_node_num + 1
            if R_node_num == data_num + 1:
                print('Error: current category number %d exceeds the number of samples' % R_node_num)
                return R_node_num, weight_b, weight_t, result
            weight_b = np.column_stack((weight_b, x))                      # new instar vector = input
            weight_t = np.column_stack((weight_t, np.ones(data_length)))  # new outstar vector = all ones
            print('Sample %d belongs to class %d' % (i, R_node_num - 1))
            result[i] = R_node_num - 1
    return R_node_num, weight_b, weight_t, result

# Each column of train_data is one 6-dimensional binary sample (7 samples in total)
train_data = np.array([[0, 0, 0, 1, 1, 1, 0],
                       [0, 0, 0, 1, 1, 0, 0],
                       [0, 0, 0, 1, 0, 1, 1],
                       [1, 0, 1, 0, 1, 0, 1],
                       [1, 1, 1, 0, 1, 0, 0],
                       [1, 1, 0, 0, 0, 0, 1]])
data_length, data_num = train_data.shape
N = 100
R_node_num = 3                                     # initial number of R-layer neurons (pattern classes)
weight_b = np.ones([data_length, R_node_num]) / N  # instar weights: the same small initial value
weight_t = np.ones([data_length, R_node_num])      # outstar weights: all components 1
threshold_ro = 0.5                                 # vigilance threshold rho
result_pre = np.zeros(data_num)
for epoch in range(10):
    R_node_num, weight_b, weight_t, result = ART_core(train_data, R_node_num, weight_b, weight_t, threshold_ro)
    IsOver = True
    for i in range(min(len(result), len(result_pre))):
        if result[i] != result_pre[i]:
            IsOver = False
            break
    if IsOver:
        print('Sample classification iteration completed!')
        break
    if R_node_num == data_num + 1:
        print('Classification error: number of sample categories greater than number of samples')
    print('-' * 25)
    result_pre = result
Output result:
Sample 0 belongs to class 0
Sample 1 belongs to class 0
Sample 2 belongs to class 1
Sample 3 belongs to class 2
Sample 4 belongs to class 1
Sample 5 belongs to class 2
Sample 6 belongs to class 4
Sample 0 belongs to class 3
Sample 1 belongs to class 0
Sample 2 belongs to class 1
Sample 3 belongs to class 2
Sample 4 belongs to class 1
Sample 5 belongs to class 2
Sample 6 belongs to class 3
Sample 0 belongs to class 0
Sample 1 belongs to class 0
Sample 2 belongs to class 1
Sample 3 belongs to class 2
Sample 4 belongs to class 1
Sample 5 belongs to class 2
Sample 6 belongs to class 3
Sample 0 belongs to class 0
Sample 1 belongs to class 0
Sample 2 belongs to class 1
Sample 3 belongs to class 2
Sample 4 belongs to class 1
Sample 5 belongs to class 2
Sample 6 belongs to class 3
Sample classification iteration completed!
3. Summary
A distinguishing feature of the ART network is that learning is not offline: the network does not need repeated training over the input samples before it can run, but learns in real time while running. Each output neuron can be regarded as the representative of a class of similar samples, and at most one output neuron wins at a time. When an input sample is close enough to some instar weight vector, the corresponding output neuron responds. The number of pattern classes can be tuned through the vigilance threshold: when ρ is small there are fewer pattern classes, and when ρ is large there are more. When the ART I model is realized in hardware, the neurons of layers C and R are implemented as circuits, and the long-term memory weights are realized with CMOS circuits.