
InfoGAN principle

The original GAN can produce meaningful output, but its drawback is that its attributes are uncontrollable. For example, there is no way to explicitly ask the generator to produce the face of a female celebrity with dark hair, fair skin, brown eyes, and a smile. The fundamental reason is that the 100-dim noise vector entangles all of the salient attributes of the generator output.

If the original GAN can be modified so that its representation is separated into an entangled noise vector and a set of disentangled, interpretable latent code vectors, it becomes possible to tell the generator what to synthesize.

A GAN with disentangled representations can be optimized in the same way as a vanilla GAN. The output of the generator can be expressed as:


$$G(z, c) = G(z)$$

The input code $z = (z, c)$ comprises two elements: $z$, the entangled (incompressible) noise representation, and $c = c_1, c_2, \dots, c_L$, the disentangled latent code representations. To enforce the disentanglement of the codes, InfoGAN proposed a regularizer to the original loss function that maximizes the mutual information between the latent codes $c$ and $G(z, c)$:


$$I(c; G(z, c))$$

The regularizer forces the generator to take the latent codes into account. In information theory, the mutual information between the latent codes $c$ and $G(z, c)$ is defined as:


$$I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$$

$H(c)$ is the entropy of the latent code $c$, and $H(c \mid G(z, c))$ is the conditional entropy of $c$ after observing the generator output $G(z, c)$. Maximizing the mutual information means minimizing $H(c \mid G(z, c))$, that is, reducing the uncertainty in the latent code given the generated output.

However, estimating $H(c \mid G(z, c))$ requires the posterior distribution $p(c \mid G(z, c)) = p(c \mid x)$, which is intractable, so $H(c \mid G(z, c))$ is difficult to compute directly.

The solution is to estimate a lower bound of the mutual information by approximating the posterior with an auxiliary distribution $Q(c \mid x)$. The lower bound of the mutual information is:


$$I(c; G(z, c)) \ge L_I(G, Q) = \mathbb{E}_{c \sim p(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c)$$

In InfoGAN, $H(c)$ is assumed to be constant, so maximizing the mutual information reduces to maximizing the expectation term. This forces the generator to be confident that it has generated an output with the specified attributes. The maximum value of this expectation is zero, so the maximum of the lower bound of the mutual information is $H(c)$. For the discrete latent code, $Q(c \mid x)$ can be represented by a softmax output, and the expectation is then the negative categorical_crossentropy loss in tf.keras.
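As a quick numeric sketch (the values here are illustrative, not from the original post), the expectation for a one-hot discrete code is simply $\log Q(c \mid x)$ evaluated at the sampled code, and it peaks at zero when $Q$ assigns probability 1 to the correct code:

import numpy as np

c = np.array([0., 1., 0.])                  # sampled one-hot latent code
q_of_c_given_x = np.array([0.1, 0.8, 0.1])  # softmax output of Q(c|x)
expectation = np.sum(c * np.log(q_of_c_given_x))  # = log 0.8, at most 0
# maximizing this expectation is equivalent to minimizing the categorical
# cross-entropy -sum(c * log q), which is what tf.keras computes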

For a one-dimensional continuous code, the expectation is a double integral over $c$ and $x$, since the expectation samples from both the disentangled code distribution and the generator distribution. One way to estimate the expectation is to assume that the samples are a good measure of the continuous data, in which case the loss is estimated as $c \log Q(c \mid x)$.

To complete the InfoGAN network, we therefore need an implementation of $\log Q(c \mid x)$. For simplicity, the network $Q$ is an auxiliary network attached to the discriminator.

Discriminator loss function:


$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{z, c} \log\left[1 - D(G(z, c))\right] - \lambda I(c; G(z, c))$$

Generator loss function:


$$\mathcal{L}^{(G)} = -\mathbb{E}_{z, c} \log D(G(z, c)) - \lambda I(c; G(z, c))$$

where $\lambda$ is a positive constant.
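In the implementation below, $\lambda$ corresponds to the loss_weights of 0.5 assigned to the two mi_loss outputs when the discriminator and adversarial models are compiled.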

InfoGAN implementation

When applied to the MNIST dataset, InfoGAN can learn disentangled discrete and continuous codes that modify attributes of the generator output. As in CGAN and ACGAN, a discrete code in the form of a 10-dimensional one-hot label is used to specify the digit to generate. In addition, two continuous codes are added: one controlling the angle of the writing style and one adjusting the stroke width. A smaller-dimensional entangled code represents all the other attributes.
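As a concrete sketch, the four generator inputs can be sampled as follows (the sizes match the training code later in this post):

import numpy as np

batch_size, latent_size, num_labels, code_std = 64, 100, 10, 0.5
# entangled noise z
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, latent_size])
# 10-dim one-hot label selecting the digit to generate
labels = np.eye(num_labels)[np.random.choice(num_labels, batch_size)]
# two 1-dim continuous codes: writing angle and stroke width
code1 = np.random.normal(scale=code_std, size=[batch_size, 1])
code2 = np.random.normal(scale=code_std, size=[batch_size, 1])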

The generator

from tensorflow import keras

def generator(inputs, image_size, activation='sigmoid', labels=None, codes=None):
    image_resize = image_size // 4
    kernel_size = 5
    layer_filters = [128, 64, 32, 1]
    # concatenate the entangled noise z, the one-hot label, and the two codes
    inputs = [inputs, labels] + codes
    x = keras.layers.concatenate(inputs, axis=1)
    
    x = keras.layers.Dense(image_resize*image_resize*layer_filters[0])(x)
    x = keras.layers.Reshape((image_resize,image_resize,layer_filters[0]))(x)
    for filters in layer_filters:
        if filters > layer_filters[-2]:
            strides = 2
        else:
            strides = 1
        x = keras.layers.BatchNormalization()(x)
        x = keras.layers.Activation('relu')(x)
        x = keras.layers.Conv2DTranspose(filters=filters,
                kernel_size=kernel_size,
                strides=strides,
                padding='same')(x)
    if activation is not None:
        x = keras.layers.Activation(activation)(x)
    return keras.Model(inputs,x,name='generator')
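A minimal usage sketch, assuming the same input shapes used by build_and_train_models below:

z = keras.layers.Input(shape=(100,), name='z_input')
y = keras.layers.Input(shape=(10,), name='labels')
c1 = keras.layers.Input(shape=(1,), name='code1')
c2 = keras.layers.Input(shape=(1,), name='code2')
gen = generator(z, image_size=28, labels=y, codes=[c1, c2])
gen.summary()  # final output: a 28x28x1 image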

The discriminator

def discriminator(inputs, activation='sigmoid', num_labels=None, num_codes=None):
    kernel_size = 5
    layer_filters = [32, 64, 128, 256]
    x = inputs
    for filters in layer_filters:
        if filters == layer_filters[-1]:
            strides = 1
        else:
            strides = 2
        x = keras.layers.LeakyReLU(0.2)(x)
        x = keras.layers.Conv2D(filters=filters,
                kernel_size=kernel_size,
                strides=strides,
                padding='same')(x)
    x = keras.layers.Flatten()(x)
    outputs = keras.layers.Dense(1)(x)
    if activation is not None:
        outputs = keras.layers.Activation(activation)(outputs)
    if num_labels:
        layer = keras.layers.Dense(layer_filters[-2])(x)
        labels = keras.layers.Dense(num_labels)(layer)
        labels = keras.layers.Activation('softmax',name='label')(labels)
        # 1-dim continuous Q of 1st c given x
        code1 = keras.layers.Dense(1)(layer)
        code1 = keras.layers.Activation('sigmoid',name='code1')(code1)
        # 1-dim continuous Q of 2nd c given x
        code2 = keras.layers.Dense(1)(layer)
        code2 = keras.layers.Activation('sigmoid',name='code2')(code2)
        outputs = [outputs,labels,code1,code2]
    return keras.Model(inputs,outputs,name='discriminator')
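And a matching sketch for the discriminator, whose auxiliary Q network shows up as three extra outputs:

x_in = keras.layers.Input(shape=(28, 28, 1), name='discriminator_input')
disc = discriminator(x_in, num_labels=10, num_codes=2)
# outputs: [real/fake probability, 10-way label softmax, code1, code2]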

Model building

import numpy as np
from tensorflow.keras import backend as K

# mutual-information loss applied to both the discrete and continuous codes
def mi_loss(c, q_of_c_given_x):
    """mi_loss = -c * log(Q(c|x))"""
    return K.mean(-K.sum(K.log(q_of_c_given_x + K.epsilon()) * c, axis=1))
    
def build_and_train_models(latent_size=100):
    """Load the dataset, build the InfoGAN models, and call the train routine."""
    (x_train,y_train),_ = keras.datasets.mnist.load_data()
    image_size = x_train.shape[1]
    x_train = np.reshape(x_train,[-1,image_size,image_size,1])
    x_train = x_train.astype('float32') / 255.
    num_labels = len(np.unique(y_train))
    y_train = keras.utils.to_categorical(y_train)
    
    # hyperparameters
    model_name = 'infogan_mnist'
    batch_size = 64
    train_steps = 40000
    lr = 2e-4
    decay = 6e-8
    input_shape = (image_size,image_size,1)
    label_shape = (num_labels,)
    code_shape = (1,)

    # discriminator model
    inputs = keras.layers.Input(shape=input_shape,name='discriminator_input')
    #discriminator with 4 outputs
    discriminator_model = discriminator(inputs,num_labels=num_labels,num_codes=2)
    optimizer = keras.optimizers.RMSprop(lr=lr,decay=decay)
    loss = ['binary_crossentropy', 'categorical_crossentropy', mi_loss, mi_loss]
    loss_weights = [1.0, 1.0, 0.5, 0.5]
    discriminator_model.compile(loss=loss,
            loss_weights=loss_weights,
            optimizer=optimizer,
            metrics=['acc'])
    discriminator_model.summary()
    input_shape = (latent_size,)
    inputs = keras.layers.Input(shape=input_shape,name='z_input')
    labels = keras.layers.Input(shape=label_shape,name='labels')
    code1 = keras.layers.Input(shape=code_shape,name='code1')
    code2 = keras.layers.Input(shape=code_shape, name='code2')

    # generator model
    generator_model = generator(inputs, image_size, labels=labels, codes=[code1, code2])
    generator_model.summary()

    # adversarial model: generator followed by the frozen discriminator
    optimizer = keras.optimizers.RMSprop(lr=lr * 0.5, decay=decay * 0.5)
    discriminator_model.trainable = False
    inputs = [inputs,labels,code1,code2]
    adversarial_model = keras.Model(inputs,
            discriminator_model(generator_model(inputs)),
            name=model_name)
    adversarial_model.compile(loss=loss,loss_weights=loss_weights,
            optimizer=optimizer,
            metrics=['acc'])
    adversarial_model.summary()

    # train the discriminator and adversarial networks
    models = (generator_model, discriminator_model, adversarial_model)
    data = (x_train, y_train)
    params = (batch_size, latent_size, train_steps, num_labels, model_name)
    train(models, data, params)
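With all of the pieces defined, a single call builds the three models and starts training:

build_and_train_models(latent_size=100)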

Model training

def train(models, data, params):
    generator,discriminator,adversarial = models
    x_train,y_train = data
    batch_size,latent_size,train_steps,num_labels,model_name = params

    save_interval = 500
    code_std = 0.5
    # fixed inputs for visualizing generator output every save_interval steps
    noise_input = np.random.uniform(-1.0, 1.0, size=[16, latent_size])
    noise_label = np.eye(num_labels)[np.arange(0, 16) % num_labels]
    noise_code1 = np.random.normal(scale=code_std, size=[16, 1])
    noise_code2 = np.random.normal(scale=code_std, size=[16, 1])
    train_size = x_train.shape[0]
    for i in range(train_steps):
        rand_indexes = np.random.randint(0,train_size,size=batch_size)
        real_images = x_train[rand_indexes]
        real_labels = y_train[rand_indexes]
        #random codes for real images
        real_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        real_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        # generate fake images, labels, and codes
        noise = np.random.uniform(-1.0, 1.0, size=[batch_size, latent_size])
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels,batch_size)]
        fake_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        fake_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        inputs = [noise,fake_labels,fake_code1,fake_code2]
        fake_images = generator.predict(inputs)
        x = np.concatenate((real_images,fake_images))
        labels = np.concatenate((real_labels,fake_labels))
        codes1 = np.concatenate((real_code1,fake_code1))
        codes2 = np.concatenate((real_code2,fake_code2))
        y = np.ones([2 * batch_size,1])
        y[batch_size:,:] = 0
        #train discriminator network
        outputs = [y,labels,codes1,codes2]
        # metrics = ['loss', 'activation_1_loss', 'label_loss',
        # 'code1_loss', 'code2_loss', 'activation_1_acc',
        # 'label_acc', 'code1_acc', 'code2_acc']
        metrics = discriminator.train_on_batch(x, outputs)
        fmt = "%d: [dis: %f, bce: %f, ce: %f, mi: %f, mi:%f, acc: %f]"
        log = fmt % (i, metrics[0], metrics[1], metrics[2], metrics[3], metrics[4], metrics[6])
        #train the adversarial network
        noise = np.random.uniform(-1.0, 1.0, size=[batch_size, latent_size])
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels,batch_size)]
        fake_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        fake_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        y = np.ones([batch_size,1])
        inputs = [noise,fake_labels,fake_code1,fake_code2]
        outputs = [y,fake_labels,fake_code1,fake_code2]
        metrics = adversarial.train_on_batch(inputs, outputs)
        fmt = "%s [adv: %f, bce: %f, ce: %f, mi: %f, mi:%f, acc: %f]"
        log = fmt % (log, metrics[0], metrics[1], metrics[2], metrics[3], metrics[4], metrics[6])
        print(log)
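After training, each disentangled code can be swept on its own to check that it controls a single attribute. A minimal sketch, assuming generator is the trained generator model (the sweep range here is an assumption, not from the original post):

import numpy as np

z = np.random.uniform(-1.0, 1.0, size=[16, 100])
label = np.eye(10)[np.full(16, 7)]                 # generate the digit 7
code1 = np.linspace(-2.0, 2.0, 16).reshape(-1, 1)  # sweep the 'angle' code
code2 = np.zeros([16, 1])                          # hold stroke width fixed
images = generator.predict([z, label, code1, code2])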

Results

Generator output at steps = 500.

Generator output at steps = 16000.

Generator output when the disentangled code controlling the writing angle is modified.