I. Overview

Single hidden layer feedforward neural networks (SLFNs) have been widely used in many fields because of their strong learning ability, but traditional learning algorithms such as BP have inherent drawbacks that have become the main bottleneck restricting their development. Most feedforward neural networks are trained with the gradient descent method, which suffers from the following shortcomings:

1. Slow training speed. The gradient descent method needs many iterations to correct the weights and thresholds, so the training process takes a long time.

2. It easily falls into a local minimum and cannot reach the global minimum.

3. Performance is sensitive to the choice of the learning rate η, which has a great influence on the network. When the learning rate is too small, the algorithm converges slowly and training takes a long time; when it is too large, the training process may become unstable.

This paper introduces a new SLFN learning algorithm, the extreme learning machine (ELM). The algorithm randomly generates the connection weights between the input layer and the hidden layer as well as the thresholds of the hidden-layer neurons, and they are not adjusted during training; only the number of hidden-layer neurons needs to be set, and a unique optimal solution is obtained. Compared with traditional training methods, ELM has the advantages of fast learning speed and good generalization performance.





A typical single hidden layer feedforward neural network is shown in the figure above. The input layer is fully connected with the hidden layer, and the hidden layer is fully connected with the output layer. The number of neurons in the input layer is determined by the number of features of the samples, while the number of neurons in the output layer is determined by the number of sample classes.
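As a concrete example, here is a minimal sketch of how the layer sizes follow from the data, using the iris dataset that the source code below also loads (the column layout, features first with a 1..3 class label in the last column, is an assumption matching that code):

% Minimal sketch (assumed data layout: feature columns first, class label 1..3 last)
data   = csvread('iris.csv');
X      = data(:, 1:end-1);             % N x 4 feature matrix
labels = data(:, end);                 % N x 1 class indices
n_in   = size(X, 2);                   % input-layer size  = number of features (4)
n_out  = numel(unique(labels));        % output-layer size = number of classes  (3)
T      = zeros(size(X, 1), n_out);     % one-hot target matrix, one row per sample
T(sub2ind(size(T), (1:size(X, 1))', labels)) = 1;   % mark each sample's class column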

















When the number of hidden-layer neurons equals the number of samples, Equation (10) has a unique solution; that is, the training samples are approximated with zero error. In conventional learning algorithms, W and b must be adjusted continually, but the research results show that in fact they do not need to be adjusted and can even be assigned arbitrarily. Adjusting them takes time and brings little benefit. (There is some doubt here; this may be taken out of context, and the conclusion may hold only under certain premises.)
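The equations referenced as (10) and (11) are not reproduced above, so for reference here is the standard ELM formulation they presumably correspond to (a reconstruction, not copied from the original). For $N$ training samples $(\mathbf{x}_j, \mathbf{t}_j)$ and $L$ hidden nodes with activation function $g$:

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \quad j = 1, \dots, N \;\;\Longleftrightarrow\;\; H\boldsymbol{\beta} = T \quad \text{(presumably Equation (10))}$$

where $H_{ji} = g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i)$ is the hidden-layer output matrix. With $\mathbf{w}_i$ and $b_i$ fixed at random values, the output weights are obtained in a single step through the Moore-Penrose generalized inverse,

$$\hat{\boldsymbol{\beta}} = H^{\dagger} T \quad \text{(presumably formula (11))}.$$

When $L = N$, $H$ is square and, with randomly chosen $\mathbf{w}_i$ and $b_i$, invertible with probability one, which is why the training samples can be fitted with zero error.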







To sum up: ELM and BP are both based on the feedforward neural network architecture, but their learning methods differ. BP uses gradient descent with back-propagation and needs continual iteration to update the weights and thresholds, while ELM achieves learning by increasing the number of hidden-layer nodes. The number of hidden-layer nodes is generally determined by the number of samples, which ingeniously links the hidden layer to the sample set; in fact, in many feedforward networks the default maximum number of hidden-layer nodes is the number of samples (as in RBF networks). ELM does not need to iterate, so it is much faster than BP. The essence of ELM lies in the two theorems it relies on, which determine how it learns: the weights W between the input layer and the hidden layer and the thresholds b of the hidden-layer nodes are randomly initialized and need no adjustment. In general, the number of hidden-layer nodes equals the number of samples (when the number of samples is small).

The MATLAB code of ELM can be downloaded directly online, and the principle is very simple. ELMtrain computes the weights between the hidden layer and the output layer, which are obtained from formula (11) using the label matrix T. ELMpredict then computes the output T using those hidden-to-output weights. Of course, the randomly initialized weights between the input nodes and the hidden nodes and the hidden-node thresholds from ELMtrain are copied over and not re-randomized (otherwise the hidden-to-output weights would have been computed in vain, because they are based on the values randomly initialized in ELMtrain). So the whole extreme learning machine is very simple.
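A minimal sketch of what such an ELMtrain/ELMpredict pair essentially does is given below. This is an illustrative reconstruction, not the original downloadable file: the function names follow the text, a sigmoid hidden layer is assumed, and in practice each function would live in its own .m file.

function [IW, B, LW] = elmtrain(P, T, L)
% P: features x samples, T: targets x samples, L: number of hidden nodes
IW = rand(L, size(P, 1)) * 2 - 1;                            % random input weights, fixed after this
B  = rand(L, 1);                                             % random hidden-layer thresholds, fixed
H  = 1 ./ (1 + exp(-(IW * P + repmat(B, 1, size(P, 2)))));   % hidden-layer output (sigmoid)
LW = pinv(H') * T';                                          % output weights via the generalized inverse
end

function Y = elmpredict(P, IW, B, LW)
% reuse the SAME IW and B produced by elmtrain -- they must not be re-randomized
H = 1 ./ (1 + exp(-(IW * P + repmat(B, 1, size(P, 2)))));
Y = (H' * LW)';                                              % predicted outputs
end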

The extreme learning machine is one type of learning method, and deep learning can also be combined with it, for example in ELM-AE.

The extreme learning machine shares its core idea with the RBF neural network: by increasing the number of hidden-layer neurons, linearly inseparable samples can be mapped into a higher-dimensional space where they become linearly separable, and a linear classifier between the hidden layer and the output layer then completes the classification. In the RBF network, however, the activation function is a radial basis function, which is equivalent to the kernel function in SVM (the radial basis kernel).
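A minimal sketch of that mapping idea follows (the toy data, the choice of one centre per sample, and the kernel width are assumptions for illustration): each hidden neuron applies a Gaussian radial basis activation centred on a training sample, and the output layer is then solved as a linear system, just as in ELM and RBF networks.

% Minimal sketch (assumed toy data and kernel width)
X     = rand(20, 2);                  % 20 toy samples with 2 features
T     = double(sum(X, 2) > 1);        % toy binary targets
C     = X;                            % one RBF centre per sample (the strict RBF case)
sigma = 0.5;                          % assumed kernel width
H     = zeros(size(X, 1), size(C, 1));% hidden-layer output matrix
for j = 1:size(C, 1)
    d2      = sum((X - repmat(C(j, :), size(X, 1), 1)).^2, 2);  % squared distance to centre j
    H(:, j) = exp(-d2 / (2 * sigma^2));                         % Gaussian radial basis activation
end
beta = pinv(H) * T;                   % output weights by solving the linear system, as in ELM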

After reading about radial basis neural networks, one finds that the extreme learning machine must have been influenced by the RBF idea. RBF also uses as many hidden-layer nodes as possible, and a strict radial basis network requires the number of hidden-layer nodes to equal the number of input samples; the final output is a linear weighting between the hidden-layer and output-layer neurons, and the hidden-to-output weights of RBF are obtained by solving linear equations. The extreme learning machine likewise obtains them by solving linear equations, through the generalized inverse of the hidden-layer output matrix.

Theorem one corresponds to the regularized network RN (a general approximator) in RBF, where the number of hidden-layer nodes equals the number of input samples; theorem two corresponds to the generalized network GN (a pattern classifier) in RBF, where the number of hidden-layer nodes is smaller than the number of input samples. My personal feeling is that the extreme learning machine is a learning idea combining RBF and BP: between the input layer and the hidden layer it resembles BP, between the hidden layer and the output layer it resembles RBF, and the hidden-layer node count follows the two RBF forms, namely RN and GN.

II. Source code

function [time, Et]=ANFISELMELM(adaptive_mode)
adaptive_mode = 'none';
tic;
clc;
clearvars -except adaptive_mode;
close all;
 
% load dataset
data = csvread('iris.csv');
input_data = data(:, 1:end-1);
output_data = data(:, end);
 
% Parameter initialization
[center,U] = fcm(input_data, 3, [2 100 1e-6]); % center = cluster centers, U = membership degrees; options = [exponent, max iterations, min improvement]
[total_examples, total_features] = size(input_data);
class = 3; % [Changeable]
epoch = 0;
epochmax = 400; % [Changeable]
Et = zeros(epochmax, 1); % best error per epoch (plotted at the end of each iteration)
[Yy, Ii] = max(U); % Yy = maximum membership value, Ii = index of the cluster with the highest membership for each sample
% Population initialization
pop_size = 10;
population = zeros(pop_size, 3, class, total_features); % particles: population size * 3 premise parameters (a, b, c) * total classes * total features
velocity = zeros(pop_size, 3, class, total_features); % velocity matrix of an iteration
c1 = 1.2;
c2 = 1.2;
original_c1 = c1;
original_c2 = c2;
r1 = 0.4;
r2 = 0.6;
max_c1c2 = 2;
% adaptive c1 c2
% adaptive_mode = 'none';
% class(adaptive_mode)
iteration_tolerance = 50;
iteration_counter = 0;
change_tolerance = 10;
is_first_on = 1;
is_trapped = 0;
%out_success = 0;
for particle=1:pop_size
    a = zeros(class, total_features);
    b = repmat(2, class, total_features);
    c = zeros(class, total_features);
    for k =1:class
        for i = 1:total_features % looping for all features
            % premise parameter: a
            aTemp = (max(input_data(:, i))-min(input_data(:, i)))/(2*sum(Ii' == k) - 2); % width heuristic based on the size of cluster k (exact denominator assumed)
            aLower = aTemp*0.5;
            aUpper = aTemp*1.5;
            a(k, i) = (aUpper-aLower).*rand()+aLower;
            %premise parameter: c
            dcc = (2.1-1.9).*rand()+1.9; % random value in the interval [1.9, 2.1]
            cLower = center(k,total_features)-dcc/2;
            cUpper = center(k,total_features)+dcc/2;
            c(k,i) = (cUpper-cLower).*rand()+cLower;
        end
    end
    population(particle, 1, :, :) = a;
    population(particle, 2, :, :) = b;
    population(particle, 3, :, :) = c;
end
% initialize pBest
pBest_fitness = repmat(100, pop_size, 1);
pBest_position = zeros(pop_size, 3, class, total_features);
% calculate fitness function
for i=1:pop_size
    particle_position = squeeze(population(i, :, :, :));
    e = get_fitness(particle_position, class, input_data, output_data);
    if e < pBest_fitness(i)
        pBest_fitness(i) = e;
        pBest_position(i, :, :, :) = particle_position;
    end
end
% find gBest
[gBest_fitness, idx] = min(pBest_fitness);
gBest_position = squeeze(pBest_position(idx, :, :, :));
% ITERATION
while epoch < epochmax
    epoch = epoch + 1;
    
    % calculate velocity and update particle
    % vi(t + 1) = wvi(t) + c1r1(pbi(t) - pi(t)) + c2r2(pg(t) - pi(t))
    % pi(t + 1) = pi(t) + vi(t + 1)
    r1 = rand();
    r2 = rand();
    for i=1:pop_size
        velocity(i, :, :, :) = squeeze(velocity(i, :, :, :)) + ((c1 * r1) .* (squeeze(pBest_position(i, :, :, :)) - squeeze(population(i, :, :, :)))) + ((c2 * r2) .* (gBest_position(:, :, :) - squeeze(population(i, :, :, :))));
        population(i, :, :, :) = population(i, :, :, :) + velocity(i ,:, :, :);
    end
    
    
    % re-evaluate fitness and update pBest / gBest for the new positions
    for i=1:pop_size
        particle_position = squeeze(population(i, :, :, :));
        e = get_fitness(particle_position, class, input_data, output_data);
        if e < pBest_fitness(i)
            pBest_fitness(i) = e;
            pBest_position(i, :, :, :) = particle_position;
        end
    end
    [gBest_fitness, idx] = min(pBest_fitness);
    gBest_position = squeeze(pBest_position(idx, :, :, :));
    Et(epoch) = gBest_fitness; % record the best error of this epoch
    
    % Draw the error curve
    plot(1:epoch, Et(1:epoch));
    title(['Epoch ' int2str(epoch) ' -> MSE = ' num2str(Et(epoch))]);
    grid
    pause(0.001);
end
%[out output out-output]
% ----------------------------------------------------------------
time = toc;
end
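Note that the listing above is not self-contained: it expects iris.csv in the working directory and a get_fitness helper function that is not shown here. A typical call (assumed usage) would be:

[elapsed_time, Et] = ANFISELMELM('none');   % returns the run time and the per-epoch error curve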

III. Operation results

IV. Remarks

MATLAB version: 2014a