Speed Comparison of Mainstream Deep Learning Hardware (CPU, GPU, TPU)

Personal home page: www.yansongsong.cn

Welcome to Xiao Song's public account "Minimalist AI", where he teaches deep learning:

The account shares deep learning theory as well as application development techniques, and the author regularly posts practical deep learning content. If you have any questions while learning or applying deep learning, you are also welcome to contact me through the page above.

By CSDN blog expert & Zhihu deep learning columnist @Xiaosong yes

Related reading:

· How PyTorch uses GPU acceleration (moving data from CPU to GPU)

· TensorFlow & Keras GPU usage

 

We implemented CIFAR-10 classification with a CNN, ran the same code on different mainstream deep learning hardware, and collected comparative training-speed data.

Mainstream deep learning hardware speed comparison

(Colab TPU) speed 382s/epoch

(CPU i5-8250U) speed 320s/epoch

(CPU i7-9700K) speed 36s/epoch

(GPU MX150) speed 36s/epoch

(Colab GPU) speed 16s/epoch

(GPU GTX 1060) speed 9s/epoch

(GPU GTX 1080 Ti) speed 4s/epoch

Compared with an ordinary laptop CPU (i5-8250U), an entry-level graphics card (GPU MX150) speeds up training by roughly 8x, while a high-performance graphics card (GPU GTX 1080 Ti) speeds it up by about 80x. With multiple GPUs, training would be faster still. For regular training, a GPU is therefore recommended.
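As a sanity check, the speedups quoted above can be recomputed directly from the table. The snippet below uses only the numbers reported in this article:

```python
# Seconds per epoch, taken from the comparison table above.
epoch_seconds = {
    "Colab TPU": 382,
    "CPU i5-8250U": 320,
    "CPU i7-9700K": 36,
    "GPU MX150": 36,
    "Colab GPU": 16,
    "GPU GTX 1060": 9,
    "GPU GTX 1080 Ti": 4,
}

# Use the ordinary laptop CPU as the baseline.
baseline = epoch_seconds["CPU i5-8250U"]
for device, seconds in epoch_seconds.items():
    print(f"{device}: {baseline / seconds:.1f}x vs i5-8250U")
```

Note that 320/36 is closer to 9x than 8x; the "about 8 times" in the text is a rough figure.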

You are also welcome to run the following code on your own machine and compare speeds. My laptop CPU runs at 320s/epoch. Code:


from tensorflow import keras
from tensorflow.keras.datasets import cifar10
import numpy as np

batch_size = 100
num_classes = 10
epochs = 10

# Data loading
(x_train, train_labels), (x_test, test_labels) = cifar10.load_data()

print(x_train.shape)

train_images = x_train.reshape([-1, 32, 32, 3]) / 255.0
test_images = x_test.reshape([-1, 32, 32, 3]) / 255.0


model = keras.Sequential([
    # (N, 32, 32, 3) -> (N, 32, 32, 32)
    keras.layers.Conv2D(input_shape=(32, 32, 3), filters=32, kernel_size=3, strides=1, padding='same'),
    # (N, 32, 32, 32) -> (N, 32, 32, 32)
    keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, padding='same'),
    # (N, 32, 32, 32) -> (N, 16, 16, 32)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (N, 16, 16, 32) -> (N, 16, 16, 64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (N, 16, 16, 64) -> (N, 16, 16, 64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (N, 16, 16, 64) -> (N, 8, 8, 64)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (N, 8, 8, 64) -> (N, 8, 8, 128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (N, 8, 8, 128) -> (N, 8, 8, 128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (N, 8, 8, 128) -> (N, 8*8*128)
    keras.layers.Flatten(),
    keras.layers.Dropout(0.3),
    # (N, 8*8*128) -> (N, 128)
    keras.layers.Dense(128, activation="relu"),
    # (N, 128) -> (N, 10)
    keras.layers.Dense(10, activation="softmax")])

print(model.summary())

model.compile(optimizer="adam",
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs,
          validation_data=(test_images[:1000], test_labels[:1000]))

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(np.argmax(model.predict(test_images[:20]), axis=1), test_labels[:20])
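If you want to reproduce the s/epoch figures precisely on your own hardware, a per-epoch wall-clock timer is handy. Below is a minimal sketch; the EpochTimer class is my own illustrative helper, not part of Keras. To plug it into model.fit, inherit from keras.callbacks.Callback (the method names below already match the Keras callback hooks) and pass an instance via callbacks=[timer].

```python
import time

class EpochTimer:
    """Minimal per-epoch wall-clock timer (hypothetical helper).
    For real training, subclass keras.callbacks.Callback and pass
    an instance to model.fit(..., callbacks=[timer])."""

    def on_train_begin(self, logs=None):
        self.times = []  # seconds per completed epoch

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.perf_counter() - self._start)

# Exercise the hooks directly, in the order Keras would call them:
timer = EpochTimer()
timer.on_train_begin()
timer.on_epoch_begin(0)
timer.on_epoch_end(0)
print(f"epoch 0 took {timer.times[0]:.4f}s")
```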

 

Output (GPU GTX 1080 Ti):

 

python demo.py
Using TensorFlow backend.
(50000, 32, 32, 3)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 32, 32, 32)        896
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 64)        18496
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 128)         73856
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 128)         147584
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dropout (Dropout)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 128)               1048704
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 1,337,002
Trainable params: 1,337,002
Non-trainable params: 0
_________________________________________________________________
None
Train on 50000 samples, validate on 1000 samples
Epoch 1/10
2019-03-15 17:07:34.477745: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-15 17:07:34.552699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-15 17:07:34: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325 pciBusID: 0000:01:00.0 totalMemory: 10.92GiB freeMemory: 10.68GiB
2019-03-15 17:07:34.553049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-03-15 17:07:34.737306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-15 17:07:34.737335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-03-15 17:07:34.737340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10327 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
50000/50000 [==============================] - 5s 103us/step - loss: 1.3343 - acc: 0.5256 - val_loss: 1.0300 - val_acc: 0.6450
Epoch 2/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.9668 - acc: 0.6660 - val_loss: 0.8930 - val_acc: 0.6820
Epoch 3/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.8349 - acc: 0.7097 - val_loss: 0.8486 - val_acc: 0.7130
Epoch 4/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.7496 - acc: 0.7412 - val_loss: 0.8823 - val_acc: 0.7040
Epoch 5/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6805 - acc: 0.7643 - val_loss: 0.8710 - val_acc: 0.7060
Epoch 6/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6256 - acc: 0.7833 - val_loss: 0.9150 - val_acc: 0.7020
Epoch 7/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.5715 - acc: 0.8000 - val_loss: 0.8586 - val_acc: 0.7140
Epoch 8/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.5312 - acc: 0.8143 - val_loss: 0.9455 - val_acc: 0.7030
Epoch 9/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.4878 - acc: 0.8287 - val_loss: 1.0063 - val_acc: 0.7360
Epoch 10/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.4474 - acc: 0.8438 - val_loss: 1.0609 - val_acc: 0.7030
10000/10000 [==============================] - 1s 54us/step
[3 8 8 0 6 6 1 3 1 4 9 4 7 9 5 5 8 8 6] [[3] [8] [8] [0] [6] [6] [1] [6] [3] [1] [0] [9] [5] [7] [9] [8] [5] [7] [8] [6]]