The paper proposes NASH, a method for neural architecture search. The core idea is similar to the earlier EAS method: network morphisms are used to generate a series of more complex child networks that preserve the parent's outputs and inherit its weights. NASH supports a richer set of network morphisms and completes the search with nothing more than a simple hill-climbing algorithm, taking about 0.5 GPU days.

Paper: Simple And Efficient Architecture Search for Convolutional Neural Networks

  • Paper address: Arxiv.org/pdf/1711.04…

Introduction


The goal of the paper is to greatly reduce the computation required for architecture search while maintaining the high performance of the resulting networks. The core idea is similar to the EAS algorithm; the main contributions are as follows:

  • A baseline method is provided: randomly constructed networks trained with SGDR reach a 6%-7% error rate on CIFAR-10, which is competitive with many NAS methods.
  • It extends EAS's research on network morphisms so that popular network building blocks, such as skip connections and batch normalization, can be expressed.
  • In each iteration, a series of network morphisms is applied to the current network to obtain multiple new networks, which are then rapidly optimized with cosine annealing to find better-performing candidates. On CIFAR-10, NASH needs just 12 hours on a single GPU to reach the accuracy of the baseline.

Network Morphism


For a family of networks $\mathcal{N}(\mathcal{X})$ defined on $\mathcal{X}$, a network morphism is a mapping $M: \mathcal{N}(\mathcal{X}) \times \mathbb{R}^k \to \mathcal{N}(\mathcal{X}) \times \mathbb{R}^j$ that converts a network $f^{w} \in \mathcal{N}(\mathcal{X})$ with parameters $w \in \mathbb{R}^k$ into a network $g^{\tilde{w}} \in \mathcal{N}(\mathcal{X})$ with parameters $\tilde{w} \in \mathbb{R}^j$ while satisfying Formula 1, that is, for the same input the output of the network remains unchanged:

$f^{w}(x) = g^{\tilde{w}}(x)$ for every $x \in \mathcal{X}$ (Formula 1)

Below are examples of network morphisms for several standard network structures:

Network morphism Type I

Replace $f_i^{w_i}$ with Formula 2:

$\tilde{f}_i^{\tilde{w}_i}(x) = A f_i^{w_i}(x) + b$, with $\tilde{w}_i = (w_i, A, b)$ (Formula 2)

To satisfy Formula 1, set $A = I$ and $b = 0$. This can be used to add a fully-connected layer.

A more complex strategy is Formula 3:

$\tilde{f}_i^{\tilde{w}_i}(x) = C\big(A f_i^{w_i}(x) + b\big) + d$, with $\tilde{w}_i = (w_i, C, d)$ (Formula 3)

Setting $C = A^{-1}$ and $d = -Cb$ satisfies Formula 1. This form can express a batch-normalization layer, where $A$ and $b$ represent the batch statistics, and $C$ and $d$ are the learnable scale and shift $\gamma$ and $\beta$.
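To make the identity initialization concrete, here is a minimal numpy sketch (my illustration, not the paper's code) of the Formula 2 case: a fully-connected layer is appended with $A = I$ and $b = 0$, so the morphed network reproduces the original outputs exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original network: a single fully-connected layer f(x) = W x.
W = rng.normal(size=(4, 8))
f = lambda x: W @ x

# Type I morphism (Formula 2): append a new layer A f(x) + b,
# initialized as the identity so Formula 1 holds.
A = np.eye(4)
b = np.zeros(4)
g = lambda x: A @ f(x) + b

x = rng.normal(size=8)
assert np.allclose(f(x), g(x))  # outputs unchanged at initialization
```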

Network morphism Type II

Assume that $f_i^{w_i}$ can be expressed in terms of an arbitrary function $h^{w_h}$, that is,

$f_i^{w_i}(x) = A h^{w_h}(x) + b$

Then $f_i^{w_i}$, together with an arbitrary function $\tilde{h}^{\tilde{w}_h}(x)$, can be replaced by Formula 4:

$\tilde{f}_i^{\tilde{w}_i}(x) = \begin{pmatrix} A & \tilde{A} \end{pmatrix} \begin{pmatrix} h^{w_h}(x) \\ \tilde{h}^{\tilde{w}_h}(x) \end{pmatrix} + b$, with $\tilde{w}_i = (w_i, \tilde{w}_h, \tilde{A})$ (Formula 4)

Setting $\tilde{A} = 0$ satisfies Formula 1. This morphism can be used in two ways:

  • Increasing layer width: think of $h^{w_h}$ as the layer to be widened; setting $\tilde{h} = h$ doubles its width (see the sketch after this list).
  • Skip connections of the concatenation type: assume $h^{w_h}$ is itself a sequence of layer operations $h = h_n \circ h_{n-1} \circ \cdots \circ h_0$; setting $\tilde{h}(x) = x$ realizes the skip connection.
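A minimal numpy sketch of the widening case (my illustration, under the assumption of plain matrix layers): the units of $h$ are duplicated and the following weight matrix is extended with $\tilde{A} = 0$, so the network's output is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network: h(x) = W1 x, followed by A h(x) with A = W2.
W1 = rng.normal(size=(4, 8))   # layer to be widened (4 units)
W2 = rng.normal(size=(3, 4))   # following layer, plays the role of A
net = lambda x: W2 @ (W1 @ x)

# Type II morphism (Formula 4): set h~ = h (duplicate the units)
# and A~ = 0, doubling the width while preserving the output.
W1_wide = np.vstack([W1, W1])                # 8 units now
W2_wide = np.hstack([W2, np.zeros((3, 4))])  # (A  A~) with A~ = 0
net_wide = lambda x: W2_wide @ (W1_wide @ x)

x = rng.normal(size=8)
assert np.allclose(net(x), net_wide(x))
```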

Network morphism Type III

Any idempotent function $f_i$ can be replaced by Formula 5, initialized with $\tilde{w}_i = w_i$:

$\tilde{f}_i^{(w_i, \tilde{w}_i)} = f_i^{\tilde{w}_i} \circ f_i^{w_i}$, with $\tilde{w}_i = w_i$ (Formula 5)

Formula 5 also holds for idempotent functions without weights, such as ReLU, since $\mathrm{ReLU} \circ \mathrm{ReLU} = \mathrm{ReLU}$.
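For the weight-free case, idempotency is easy to check in numpy: stacking a second ReLU after an existing one changes nothing, which is exactly why Formula 5 permits the insertion.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

x = np.random.default_rng(0).normal(size=100)
# ReLU is idempotent: relu(relu(x)) == relu(x), so a second ReLU
# can be inserted (Formula 5) without changing the network's output.
assert np.allclose(relu(relu(x)), relu(x))
```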

Network morphism Type IV

Any layer $f_i^{w_i}$, together with an arbitrary function $h^{w_h}$, can be replaced by Formula 6, initialized with $\lambda = 1$:

$\tilde{f}_i^{\tilde{w}_i}(x) = \lambda f_i^{w_i}(x) + (1 - \lambda) h^{w_h}(x)$, with $\tilde{w}_i = (w_i, \lambda, w_h)$ (Formula 6)

This can be used to combine arbitrary functions, especially non-linearities, and also to add additive skip connections.
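A minimal numpy sketch of Formula 6 (my illustration, assuming a toy matrix layer): with $\lambda = 1$ the new branch contributes nothing at initialization; choosing $h(x) = x$ makes the branch an additive skip connection that training can later blend in by learning $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(8, 8))
f = lambda x: W @ x   # existing layer
h = lambda x: x       # arbitrary function; identity gives an additive skip

# Type IV morphism (Formula 6): lambda * f(x) + (1 - lambda) * h(x).
lam = 1.0             # initialization satisfying Formula 1
g = lambda x: lam * f(x) + (1.0 - lam) * h(x)

x = rng.normal(size=8)
assert np.allclose(f(x), g(x))  # unchanged at initialization
```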

In addition, combinations of network morphisms generate new morphisms. For example, a "Conv-BatchNorm-ReLU" block can be inserted after a ReLU layer by combining Formulas 2, 3, and 5.
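Here is a numpy sketch of such a combination (my own illustration, using a fully-connected layer in place of the convolution): after an existing ReLU, insert a linear layer initialized as the identity (Formula 2), a BN-style layer whose learnable scale and shift invert the normalization (Formula 3), and a second ReLU (Formula 5, valid because the incoming activations are already non-negative).

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# Existing network output after a ReLU (a batch of activations, all >= 0).
a = relu(rng.normal(size=(32, 4)))

# Formula 2: linear layer initialized as the identity (stand-in for conv).
y = a @ np.eye(4)

# Formula 3: BN-style layer. A, b are the batch statistics; the learnable
# scale/shift (C, d) are initialized to invert them: C = A^{-1}, d = -Cb.
mu, sigma = y.mean(axis=0), y.std(axis=0) + 1e-8
gamma, beta = sigma, mu   # identity initialization
y = gamma * (y - mu) / sigma + beta

# Formula 5: a second ReLU is a no-op on non-negative inputs.
y = relu(y)

assert np.allclose(y, a)  # the inserted block is an exact identity
```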

Architecture Search by Network Morphisms


The NASH method is based on a hill-climbing algorithm: it starts from a small network and generates larger child networks through network morphisms. Because of the constraint in Formula 1, each child network starts with exactly the same performance as its parent.

Figure 1 visualizes one step of the NASH method. ApplyNetMorph(model, n) in Algorithm 1 applies n network morphism operations, each chosen at random from the following:

  • Deepen the network, e.g., by inserting a "Conv-BatchNorm-ReLU" block; the insertion position and kernel size are chosen at random, and the number of channels matches the closest preceding convolution.
  • Widen the network, e.g., use network morphism Type II to widen the output channels by a random factor.
  • Add a skip connection from layer $i$ to layer $j$, using network morphism Type II (concatenation) or Type IV (addition); the insertion location is chosen at random.

Because network morphisms are used, each child network inherits the weights of its parent and starts with identical performance, so NASH can evaluate candidate networks very quickly. The paper uses a simple hill-climbing algorithm, though other optimization strategies could also be chosen.
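A minimal Python sketch of the resulting search loop (my paraphrase of Algorithm 1; `apply_net_morph`, `train_sgdr`, and `evaluate` are hypothetical helpers standing in for the paper's components):

```python
import copy

def nash_search(model0, n_steps, n_neighbours, n_morphs,
                apply_net_morph, train_sgdr, evaluate):
    """Hill climbing over architectures via network morphisms
    (a paraphrase of Algorithm 1, not the authors' code)."""
    best = model0
    for _ in range(n_steps):
        candidates = [best]
        for _ in range(n_neighbours):
            # Child inherits the parent's weights; morphisms keep outputs unchanged.
            child = apply_net_morph(copy.deepcopy(best), n_morphs)
            train_sgdr(child)  # short cosine-annealing (SGDR) training run
            candidates.append(child)
        # Keep the best-performing network (hill climbing).
        best = max(candidates, key=evaluate)
    return best
```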

Experiments


Baselines

Retraining from Scratch

CIFAR-10

CIFAR-100

Conclusion


The core idea of the NASH method is similar to the earlier EAS method: network morphisms are used to generate a series of more complex child networks that preserve the parent's outputs and inherit its weights. The network morphisms in this paper are richer, and the search can be completed with nothing more than a simple hill-climbing algorithm, taking about 0.5 GPU days.




