TCN basic structure

The Temporal Convolutional Network (TCN) was proposed by Shaojie Bai et al. in 2018 and can be used for time-series data processing; see the paper in the references for details.

1. Causal Convolution

Causal convolution is shown in the figure above. The value at time t in a layer depends only on the values at time t and earlier in the layer below. The difference from a traditional convolutional neural network is that causal convolution cannot see future data; it is a one-way rather than a two-way structure. In other words, the cause must come before the effect. Because it is a strictly time-constrained model, it is called causal convolution.
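In Keras this behaviour can be reproduced with the built-in padding='causal' option of Conv1D. The snippet below is only an illustrative sketch (the layer sizes and the toy input are my own choices, not from the paper): changing the input at step 4 leaves the outputs at steps 0-3 untouched, which is exactly the causal constraint.

# Minimal illustration of causal convolution in Keras (illustrative only)
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv1D

inputs = Input(shape=(8, 1))                       # sequence of length 8 with 1 feature
# padding='causal' left-pads the input so that output t only sees inputs 0..t
x = Conv1D(filters=1, kernel_size=3, padding='causal')(inputs)
model = Model(inputs=inputs, outputs=x)

a = np.zeros((1, 8, 1))
b = a.copy()
b[0, 4, 0] = 1.0                                   # perturb the input only at step 4
# the difference is zero at steps 0-3: information cannot flow backwards in time
print((model.predict(b) - model.predict(a))[0, :, 0])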

2. Dilated Convolution

Dilated convolution is also called atrous convolution.

Pure causal convolution still has the problem of a traditional convolutional neural network: the length of history it can model is limited by the size of the convolution kernel, so capturing longer dependencies requires linearly stacking many layers. To solve this problem, the researchers used dilated convolution, as shown in the figure below.

Unlike traditional convolution, dilated convolution samples the input at intervals during convolution, with the sampling rate controlled by the dilation factor d in the figure. d = 1 in the bottom layer means every input point is sampled, and d = 2 in the middle layer means every second point is sampled as input. In general, the higher the layer, the larger the value of d. Dilated convolution therefore makes the size of the effective window grow exponentially with the number of layers, so a convolutional network can obtain a large receptive field with relatively few layers.
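The exponential growth of the effective window can be checked with a few lines of arithmetic. In the sketch below (kernel size and dilation rates are illustrative values, not taken from the figure), each layer with kernel size k and dilation d adds (k - 1) * d time steps of history, so the receptive field is 1 + (k - 1) * (sum of the dilation rates).

# Receptive field of stacked dilated causal convolutions (illustrative values)
kernel_size = 3
dilations = [1, 2, 4, 8]          # dilation rate doubles with each layer

# each layer adds (kernel_size - 1) * dilation steps of history
receptive_field = 1 + sum((kernel_size - 1) * d for d in dilations)
print(receptive_field)            # 31 time steps with only 4 layers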

3. Residual Connections

Residual connections have proven to be an effective way to train deep networks; they allow information to be passed across layers.

A residual block is constructed to replace the plain convolution layer. As shown in the figure above, a residual block contains two layers of dilated convolution and non-linear mapping, with WeightNorm and Dropout added to each layer to regularize the network.
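The example code later in this post uses a simplified residual block without weight normalization or dropout. For reference, a variant closer to the block described in the paper might look like the following sketch; the use of SpatialDropout1D, causal padding and a 1x1 shortcut convolution are assumptions made here, and WeightNorm is omitted because plain Keras has no built-in weight normalization layer.

# Residual block sketch closer to the paper: two dilated causal convolutions,
# each followed by ReLU and dropout; a 1x1 convolution adjusts the channel
# number on the shortcut when needed. Weight normalization is omitted.
from keras.layers import add, Conv1D, Activation, SpatialDropout1D

def residual_block(x, filters, kernel_size, dilation_rate, dropout=0.2):
    r = Conv1D(filters, kernel_size, padding='causal',
               dilation_rate=dilation_rate, activation='relu')(x)
    r = SpatialDropout1D(dropout)(r)
    r = Conv1D(filters, kernel_size, padding='causal',
               dilation_rate=dilation_rate, activation='relu')(r)
    r = SpatialDropout1D(dropout)(r)
    if x.shape[-1] == filters:
        shortcut = x                                       # identity shortcut
    else:
        shortcut = Conv1D(filters, 1, padding='same')(x)   # 1x1 convolution
    return Activation('relu')(add([r, shortcut]))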

TCN summary

Advantages

(1) Parallelism. Given a sentence, a TCN can process it in parallel rather than sequentially as an RNN does.

(2) Flexible receptive field. The receptive field of a TCN is determined by the number of layers, the kernel size and the dilation factor, so it can be tailored flexibly to the characteristics of different tasks (a small helper for choosing the depth is sketched after this list).

(3) Gradient stability. RNNs often suffer from vanishing and exploding gradients, which is mainly caused by sharing parameters across time steps. Like a traditional convolutional neural network, a TCN does not have these problems.

(4) Lower memory usage. An RNN needs to save the information of every step when it runs, which takes up a lot of memory. In a TCN the convolution kernels are shared within a layer, so memory usage is lower.
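As a small illustration of advantage (2), the helper below estimates how many residual levels are needed (with kernel size 3 and dilation doubling per level) before the receptive field covers a given history length; the function name and the example value are illustrative, not taken from the paper.

# Choose the network depth so the receptive field covers a given history length
def levels_needed(history, kernel_size=3):
    levels, receptive_field = 0, 1
    while receptive_field < history:
        receptive_field += (kernel_size - 1) * (2 ** levels)  # dilation doubles per level
        levels += 1
    return levels

print(levels_needed(784))   # levels needed to see 784 past time steps (prints 9)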

Disadvantages

(1) A TCN may be less adaptable in transfer learning, because the amount of historical information required for prediction may differ between domains. When a model is transferred from a problem that needs little memory to one that needs much longer memory, the TCN may perform poorly because its receptive field is not large enough.

(2) The TCN described in the paper is still unidirectional. For tasks such as speech recognition and speech synthesis a purely unidirectional structure is perfectly usable, but most text tasks use a bidirectional structure. A TCN can easily be extended to a bidirectional form by replacing the causal convolution with an ordinary convolution (see the one-line sketch after this list).

(3) A TCN is, after all, a variant of the convolutional neural network. Although dilated convolution enlarges the receptive field, the receptive field is still finite, and its ability to capture dependencies of arbitrary length is weaker than that of a Transformer. The use of TCNs on text remains to be tested.
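For disadvantage (2), the bidirectional extension amounts to replacing the causal convolution with an ordinary (non-causal) one. A minimal Keras sketch, assuming the blocks are built from Conv1D layers as in the code below:

# Causal vs. non-causal convolution: switching padding from 'causal' to 'same'
# lets each output position see both past and future time steps
from keras.layers import Conv1D

causal_conv    = Conv1D(32, 3, padding='causal', dilation_rate=2)  # past only
two_sided_conv = Conv1D(32, 3, padding='same', dilation_rate=2)    # past and future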

TCN application

MNIST handwritten digit classification

Multiple features correspond to one label, i.e. (x_i1, x_i2, x_i3, ..., x_in) → y_i.

Local Environment:

Python 3.6; IDE: PyCharm

Library version:

Keras 2.2.0, numpy 1.16.2, tensorflow 1.9.0

1. Download the dataset

MNIST dataset

2. Create tcn.py and enter the following code

Keras-based TCN

# TCN for minst data
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Model
from keras.layers import add, Input, Conv1D, Activation, Flatten, Dense


# Load data
def read_data(path):
    mnist = input_data.read_data_sets(path, one_hot=True)
    train_x, train_y = mnist.train.images.reshape(-1, 28, 28), mnist.train.labels
    valid_x, valid_y = mnist.validation.images.reshape(-1, 28, 28), mnist.validation.labels
    test_x, test_y = mnist.test.images.reshape(-1, 28, 28), mnist.test.labels
    return train_x, train_y, valid_x, valid_y, test_x, test_y


# Residual block
def ResBlock(x, filters, kernel_size, dilation_rate):
    # padding='same' is used here for simplicity; a strict TCN would use padding='causal'
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate,
               activation='relu')(x)  # first convolution
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate)(r)  # second convolution
    if x.shape[-1] == filters:
        shortcut = x  # identity shortcut when the channel numbers already match
    else:
        shortcut = Conv1D(filters, kernel_size, padding='same')(x)  # shortcut convolution
    o = add([r, shortcut])
    o = Activation('relu')(o)  # activation function
    return o


# Sequence Model
def TCN(train_x, train_y, valid_x, valid_y, test_x, test_y, classes, epoch):
    inputs = Input(shape=(28, 28))
    x = ResBlock(inputs, filters=32, kernel_size=3, dilation_rate=1)
    x = ResBlock(x, filters=32, kernel_size=3, dilation_rate=2)
    x = ResBlock(x, filters=16, kernel_size=3, dilation_rate=4)
    x = Flatten()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=x)
    # View network structure
    model.summary()
    # Compile model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Train model
    model.fit(train_x, train_y, batch_size=500, epochs=epoch, verbose=2, validation_data=(valid_x, valid_y))
    # Evaluate model
    pre = model.evaluate(test_x, test_y, batch_size=500, verbose=2)
    print('test_loss:', pre[0], '- test_acc:', pre[1])

# MNIST digits range from 0 to 9, i.e. 10 classes
classes = 10
epoch = 30
train_x, train_y, valid_x, valid_y, test_x, test_y = read_data('MNIST_data')
#print(train_x, train_y)

TCN(train_x, train_y, valid_x, valid_y, test_x, test_y, classes, epoch)

3. Results

test_loss: 0.05342669463425409 - test_acc: 0.987100002169609
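The script above only prints the test loss and accuracy. If the TCN helper is modified to return the trained model (adding return model as its last line, which is an assumption and not part of the original code), individual digits can be predicted as in this usage sketch:

# Usage sketch: assumes TCN(...) has been changed to end with `return model`
import numpy as np

model = TCN(train_x, train_y, valid_x, valid_y, test_x, test_y, classes, epoch)
probs = model.predict(test_x[:5])         # class probabilities for 5 test images
print(np.argmax(probs, axis=1))           # predicted digits
print(np.argmax(test_y[:5], axis=1))      # true digits (labels are one-hot)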

Multiple labels

Multiple features correspond to multiple labels, e.g. (x_i1, x_i2, x_i3, ..., x_in) → (y_i1, y_i2).

Based on the code above, you only need to rebuild the training and test data and set the corresponding input and output dimensions, parameters and other details.

Local Environment:

Python 3.6; IDE: PyCharm

Library version:

Keras 2.2.0, numpy 1.16.2, pandas 0.24.1, scikit-learn 0.20.1, tensorflow 1.9.0

Specific code:

# TCN for indoor location
import math
from keras.models import Model
from keras.layers import add, Input, Conv1D, Activation, Flatten, Dense
import numpy as np
import pandas
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Create sequence data
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        dataY.append(dataset[i + look_back, -2:])  # last two columns are the labels
    return np.array(dataX), np.array(dataY)

# Residual block
def ResBlock(x, filters, kernel_size, dilation_rate):
    # padding='same' is used here for simplicity; a strict TCN would use padding='causal'
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate,
               activation='relu')(x)  # first convolution
    r = Conv1D(filters, kernel_size, padding='same', dilation_rate=dilation_rate)(r)  # second convolution
    if x.shape[-1] == filters:
        shortcut = x  # identity shortcut when the channel numbers already match
    else:
        shortcut = Conv1D(filters, kernel_size, padding='same')(x)  # shortcut convolution
    o = add([r, shortcut])
    o = Activation('relu')(o)  # activation function
    return o

# Sequence Model
def TCN(train_x, train_y, test_x, test_y, look_back, n_features, n_output, epoch):
    inputs = Input(shape=(look_back, n_features))
    x = ResBlock(inputs, filters=32, kernel_size=3, dilation_rate=1)
    x = ResBlock(x, filters=32, kernel_size=3, dilation_rate=2)
    x = ResBlock(x, filters=16, kernel_size=3, dilation_rate=4)
    x = Flatten()(x)
    # note: for continuous location targets a linear activation with an MSE loss
    # would be more appropriate than softmax with categorical cross-entropy
    x = Dense(n_output, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=x)
    # View network structure
    model.summary()
    # Compile model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    # Train model
    model.fit(train_x, train_y, batch_size=500, epochs=epoch, verbose=2)
    # Evaluate model
    pre = model.evaluate(test_x, test_y, batch_size=500, verbose=2)
    # print(pre)
    print('test_loss:', pre[0], '- test_acc:', pre[1])
 

# public parameters
np.random.seed(7)
features = 24
output = 2
EPOCH = 30
look_back = 5

trainPath = '../data/train.csv'
testPath = '../data/test.csv'

trainData = pandas.read_csv(trainPath, engine='python')
testData = pandas.read_csv(testPath, engine='python')

# features = 1
dataset = trainData.values
dataset = dataset.astype('float32')

datatestset = testData.values
datatestset = datatestset.astype('float32')
# print(dataset)

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
datatestset = scaler.transform(datatestset)  # reuse the scaler fitted on the training data

trainX, trainY = create_dataset(dataset, look_back)
testX, testY = create_dataset(datatestset, look_back)
# print(trainX)
print(len(trainX), len(testX))
print(testX.shape)
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], features))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], features))

# train_x, train_y, valid_x, valid_y, test_x, test_y = read_data('MNIST_data')
print(trainX, trainY)

TCN(trainX, trainY,  testX, testY, look_back, features, output, EPOCH)


To be continued…

References

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

TCN - Temporal Convolutional Network

Keras-based temporal convolutional network (TCN)

Keras-TCN

[Tensorflow] Implementing Temporal Convolutional Networks