An autoencoder (a self-encoding neural network) is an unsupervised learning algorithm whose goal is to make its output as similar as possible to its input.
1. Network structure
The simplest autoencoder is a three-layer neural network consisting of an input layer, a hidden layer, and an output (reconstruction) layer.
In practice, the model is usually organized into two parts:
- Encoder (coding layer)
- Decoder (decoding layer)
The encoder reads the input and applies a series of transformations (linear layers followed by nonlinear activations) to compress each sample into the low-dimensional hidden layer.
The decoder maps the hidden representation back to the input space, producing a reconstruction that is as similar as possible to the original input.
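To make the encoder/decoder split concrete, here is a minimal NumPy sketch with untrained random weights. The helper names `init_layers`, `encode`, and `decode` are hypothetical; the layer sizes (784 -> 128 -> 64 -> 12 -> 3 and back) mirror the PyTorch model defined later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Random (untrained) weight/bias pairs for consecutive layer sizes."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Same layer sizes as the PyTorch model later in the article
enc_layers = init_layers([784, 128, 64, 12, 3])
dec_layers = init_layers([3, 12, 64, 128, 784])

def encode(x):
    # tanh after every layer except the last, which stays linear
    for i, (W, b) in enumerate(enc_layers):
        x = x @ W + b
        if i < len(enc_layers) - 1:
            x = np.tanh(x)
    return x

def decode(h):
    # tanh on hidden layers, sigmoid on the output so pixels land in (0, 1)
    for i, (W, b) in enumerate(dec_layers):
        h = h @ W + b
        h = np.tanh(h) if i < len(dec_layers) - 1 else 1 / (1 + np.exp(-h))
    return h

batch = rng.random((100, 784))   # stand-in for 100 flattened 28x28 images
codes = encode(batch)            # compressed: 3 numbers per image
recon = decode(codes)            # reconstructed: 784 numbers per image
print(codes.shape, recon.shape)  # (100, 3) (100, 784)
```

Training would adjust the weights so that `recon` matches `batch`; the sketch only shows how the shapes flow through the bottleneck.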
2. Function
Since the goal of an autoencoder is to make the output as similar as possible to the input, a crucial step is “compressing” the input: the input features are mapped to the hidden layer, whose dimension is much smaller than that of the input samples. This gives the autoencoder its first function, feature dimensionality reduction. Because the input must be reconstructed from this narrow hidden layer, the network is pushed to learn the most important features, which paves the way for the subsequent decoding step. After the encoder, we therefore obtain the main feature vector of the input data.
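One way to see why a bottleneck keeps only the most important features is the purely linear case. Without activations or bias terms, the best rank-k linear autoencoder under squared error projects onto the top-k right singular vectors of the data (for mean-centered data, exactly the PCA subspace). The NumPy sketch below uses that SVD shortcut instead of gradient training, on hypothetical toy data that only varies along 2 directions; the article's nonlinear network is not limited to this linear solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 samples in 10-D that really only vary along 2 directions
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))

def linear_autoencoder_error(X, k):
    """MSE of the best rank-k linear autoencoder without biases (via truncated SVD)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W_enc = Vt[:k].T            # 10 -> k: compress into the hidden layer
    W_dec = Vt[:k]              # k -> 10: reconstruct the input
    X_hat = (X @ W_enc) @ W_dec
    return np.mean((X - X_hat) ** 2)

err_2 = linear_autoencoder_error(X, 2)  # bottleneck matches the true structure
err_1 = linear_autoencoder_error(X, 1)  # bottleneck too small, information is lost
print(f"k=2: {err_2:.2e}, k=1: {err_1:.2e}")
```

With a 2-unit bottleneck the reconstruction is exact up to floating-point error, while a 1-unit bottleneck must discard one of the two directions of variation.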
The second function is what the autoencoder was designed for: approximating its input. Once the model has learned to encode and decode, we can first encode an input to obtain its main features and then decode them to get an output that approximates the original input. We can also set some encoded features by hand and decode them, which can produce some “surprising” outputs. Generative models work in a similar spirit: once a GAN is well trained, its generator can produce images from random noise. StyleGAN goes further: its mapping network transforms the input latent code into an intermediate latent space, and thanks to the mapping network and AdaIN, the synthesis network can even ignore the initial input (it starts from a learned constant). The features of the generated image are then controlled by the styles fed to AdaIN, while random noise injected after each convolution, before AdaIN, adds stochastic detail.
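AdaIN (adaptive instance normalization) normalizes each channel of a feature map to zero mean and unit standard deviation over its spatial positions, then rescales and shifts it with per-channel style values: AdaIN(x_i, y) = y_s,i * (x_i - mean(x_i)) / std(x_i) + y_b,i. In StyleGAN the style values come from a learned affine transform of the intermediate latent w; in this minimal NumPy sketch they are simply hand-picked, and `eps` is a small constant added to avoid division by zero.

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-5):
    """AdaIN for one feature map x of shape (channels, height, width).

    Each channel is normalized to zero mean / unit std over its spatial
    positions, then re-scaled and shifted by per-channel style values.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))  # hypothetical conv output
scale = np.array([1.0, 0.5, 2.0, 0.1])                     # hypothetical style y_s
bias = np.array([0.0, -1.0, 1.0, 5.0])                     # hypothetical style y_b
out = adain(features, scale, bias)
print(out.mean(axis=(1, 2)).round(3))  # per-channel means now equal the style bias
```

Whatever statistics the incoming features had, each output channel ends up with the mean and standard deviation dictated by the style, which is how the style controls the image's features.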
3. Code
The demo code is written in PyTorch; a TensorFlow version may be added later.
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torch.autograd import Variable  # legacy wrapper; since PyTorch 0.4 it simply returns a Tensor
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import save_image
import warnings
plt.rcParams['axes.unicode_minus'] = False
warnings.filterwarnings('ignore')
# Jupyter magic; remove this line when running as a plain Python script
%matplotlib inline
# set parameters
batch_size = 100
learning_rate=1e-2
num_epoches=3
# import data set
train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=1, shuffle=False)
# Define the network
class autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder: 784 -> 128 -> 64 -> 12 -> 3
        self.encoder = nn.Sequential(
            nn.Linear(28*28, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 12),
            nn.Tanh(),
            nn.Linear(12, 3),   # 3-D code, kept linear so it can be plotted in 3-D
        )
        # decoder: 3 -> 12 -> 64 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.Tanh(),
            nn.Linear(12, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 28*28),
            nn.Sigmoid(),       # pixel values in (0, 1), same range as ToTensor()
        )

    def forward(self, x):
        encode = self.encoder(x)
        decode = self.decoder(encode)
        return encode, decode
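As a quick sanity check on this architecture, the parameter count follows directly from the layer sizes: each `nn.Linear(n_in, n_out)` holds an `n_in x n_out` weight matrix plus `n_out` biases. The helper names below are illustrative; on the real model, `sum(p.numel() for p in net.parameters())` should report the same total.

```python
def linear_params(n_in, n_out):
    # an nn.Linear layer has a weight matrix plus one bias per output
    return n_in * n_out + n_out

encoder_sizes = [28 * 28, 128, 64, 12, 3]
decoder_sizes = [3, 12, 64, 128, 28 * 28]

def count(sizes):
    # sum over consecutive (n_in, n_out) pairs
    return sum(linear_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

enc, dec = count(encoder_sizes), count(decoder_sizes)
print(enc, dec, enc + dec)                 # 109555 110336 219891
print(f"compression: {28 * 28 / 3:.1f}x")  # compression: 261.3x
```

Almost all parameters sit in the first encoder layer and the last decoder layer, where the 784-pixel image is connected to 128 units.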
# instantiation
net = autoencoder().cuda()
# Loss function and optimization function
loss_func = nn.MSELoss().cuda()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
# Visualize the training process
def list_img(i, img, title):
    img = img.reshape(28, 28)
    plt.subplot(2, 5, i + 1)
    plt.imshow(img, cmap='gray')
    plt.title('%s' % (title))

def generate_test(inputs, title=' '):
    plt.figure(figsize=(15, 6))
    with torch.no_grad():  # inference only, no gradients needed
        for i in range(len(inputs)):
            img = inputs[i].view(-1, 28*28).cuda()
            hidden, outputs = net(img)
            list_img(i, outputs.cpu().numpy(), title)
    plt.show()
# Training section
result = []       # loss recorded every 100 batches
test_inputs = []  # 5 fixed test images used for previews
hiddens = []      # 3-D codes produced during training
plt.figure(figsize=(15, 6))
for i, (img, _) in enumerate(test_loader):
    if i > 4:
        break
    test_inputs.append(img)
    list_img(i, img.numpy(), 'truth')
plt.show()

for e in range(num_epoches):
    for i, (inputs, _) in enumerate(train_loader):
        inputs = inputs.view(-1, 28*28).cuda()
        optimizer.zero_grad()
        hidden, outputs = net(inputs)
        # detach so the stored codes do not keep the whole computation graph alive
        hiddens.append(hidden.detach().cpu())
        loss = loss_func(outputs, inputs)
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            result.append(float(loss))
        if i % 500 == 0:
            generate_test(test_inputs, 'generation')
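How often this loop logs the loss and draws previews can be worked out from the sizes alone. This sketch assumes the standard MNIST training split of 60,000 images and the `DataLoader` default `drop_last=False` (a smaller final batch would be kept, hence the ceiling division).

```python
train_size = 60_000   # MNIST training images
batch_size = 100
num_epoches = 3

batches_per_epoch = -(-train_size // batch_size)  # ceiling division
log_steps = [i for i in range(batches_per_epoch) if i % 100 == 0]
preview_steps = [i for i in range(batches_per_epoch) if i % 500 == 0]

print(batches_per_epoch)                 # 600
print(len(log_steps) * num_epoches)      # 18 loss values appended to `result`
print(len(preview_steps) * num_epoches)  # 6 preview figures from generate_test
print(batches_per_epoch * num_epoches)   # 1800 code batches stored in `hiddens`
```

Because `i` restarts at 0 every epoch, each epoch logs at batches 0, 100, ..., 500 and previews at batches 0 and 500.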
In the end, the images reconstructed by the trained model are very close to the real input images, with a reconstruction error (MSE loss) of about 0.03.
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib

# Visualization results: encode the first 500 training images
# Scale pixels to [0, 1] to match transforms.ToTensor() used during training
view_data = (train_dataset.data[:500].type(torch.FloatTensor).view(-1, 28*28) / 255.).cuda()
with torch.no_grad():
    encode, _ = net(view_data)  # extract the compressed 3-D features
fig = plt.figure(2)
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
# x, y, z coordinates of each code
X = encode[:, 0].cpu().numpy()
Y = encode[:, 1].cpu().numpy()
Z = encode[:, 2].cpu().numpy()
values = train_dataset.targets[:500].numpy()  # label values
for x, y, z, s in zip(X, Y, Z, values):
    c = cm.rainbow(int(255*s/9))  # color by digit
    ax.text(x, y, z, s, backgroundcolor=c)  # draw the digit label at its code position
ax.set_xlim(X.min(), X.max())
ax.set_ylim(Y.min(), Y.max())
ax.set_zlim(Z.min(), Z.max())
plt.show()
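The network was trained on `transforms.ToTensor()` output, which divides raw pixels by 255 into [0, 1], so images encoded for plotting should use the same scale. The common `(x / 255 - 0.5) / 0.5` convention maps pixels to [-1, 1] instead, a range the encoder never saw during training. A small NumPy comparison of the two scalings:

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype=np.uint8)  # example raw MNIST pixel values

zero_to_one = pixels / 255.0                     # same scaling as transforms.ToTensor()
minus_one_to_one = (pixels / 255.0 - 0.5) / 0.5  # [-1, 1] convention

print(zero_to_one)       # 0 maps to 0.0, 255 maps to 1.0
print(minus_one_to_one)  # 0 maps to -1.0, 255 maps to 1.0
```

Feeding the [-1, 1] version would turn every black background pixel into -1, shifting all the codes away from where the training images actually land.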
In three dimensions, the digits are distributed as shown in the resulting plot.
Finally, based on the positions in the plot, we hand-pick a code and feed it directly to the decoder to see whether it produces the digit we want.
In our run, the digit 4 sits at roughly x ≈ 0, y ≈ -0.5, and z between -0.5 and 0 (these coordinates change from one training run to another).
code = torch.tensor([[0.02, -0.543, -0.012]]).cuda()  # hand-picked point near the '4' cluster
with torch.no_grad():
    decode = net.decoder(code)
decode_img = decode.reshape(28, 28).cpu().numpy() * 255
plt.imshow(decode_img.astype('uint8'), cmap='gray')  # show the generated image
plt.show()
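Instead of reading coordinates off the plot by eye, one could average the codes of all plotted samples that carry the desired label. The helper `class_centroid` below is hypothetical and is shown with toy arrays standing in for `np.stack([X, Y, Z], axis=1)` and `values` from the visualization code; the resulting point could then be wrapped with `torch.tensor(...)` and passed to `net.decoder` in the same way as the hand-picked code.

```python
import numpy as np

def class_centroid(codes, labels, digit):
    """Mean 3-D code of all samples whose label equals `digit`."""
    mask = labels == digit
    if not mask.any():
        raise ValueError(f"no samples with label {digit}")
    return codes[mask].mean(axis=0)

# Toy stand-ins for the (500, 3) code array and the label array
codes = np.array([[0.0, -0.5, -0.1],
                  [0.1, -0.6, -0.3],
                  [0.9,  0.4,  0.2]])
labels = np.array([4, 4, 7])
print(class_centroid(codes, labels, 4))  # midpoint of the two '4' codes
```

A centroid is only a rough target: if a digit's codes form an elongated or split cluster, its mean may fall in a region the decoder rarely saw.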
What if we feed the decoder a code that lies outside the region occupied by the data points?
It turns out to produce something like a question mark, which is rather interesting.