In this article, we explore an application of deep learning in medicine: brain tumor recognition. Brain tumors, also called intracranial tumors, are the main cause of intracranial space-occupying lesions, and among the malignant diseases that afflict children they rank second only to leukemia. Data show that China sees 7,000 to 8,000 new cases of pediatric brain tumors every year, and 70% to 80% of them are malignant. The younger the patient, the faster the tumor develops and the higher its degree of malignancy, so early detection and treatment have become one of the most important ways to reduce the harm of the disease.

We use 253 brain scan images: 155 from patients with brain tumors and 98 from normal subjects. The model is MobileNetV2, which reached a best validation accuracy of 90.0% and an AUC of 0.869.

What's new this time: compared with the previous cases in the “100 Cases of Deep Learning” series, we add AUC as an evaluation metric for brain tumor recognition. AUC (Area Under the ROC Curve) is the area under the ROC curve and is a standard measure of how good a binary classification model is.

My environment:

  • Language: Python 3.8
  • IDE: Jupyter Lab
  • Deep learning environment: TensorFlow 2.4.1

The code is organized into the following steps:

1. Set GPU

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]                                        # If there are multiple GPUs, use only the first one
    tf.config.experimental.set_memory_growth(gpu0, True)  # Allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")          # Make only that GPU visible to TensorFlow

import matplotlib.pyplot as plt
import os, PIL, pathlib
import numpy as np
import pandas as pd
import warnings
from tensorflow import keras

warnings.filterwarnings("ignore")             # Ignore warning messages
plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # Display the minus sign correctly
2. Import data

1. Import data

import pathlib

data_dir = "./35-day-brain_tumor_dataset"
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*')))   # files one level down: <class>/<image>
print("Total number of images:", image_count)
Total number of images: 253
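The total above counts both classes together. To check the class balance quoted at the start (155 tumor scans and 98 normal ones), a small helper can count each class sub-folder separately. This is a sketch that assumes one sub-folder per class (here 'no' and 'yes'), which is the layout image_dataset_from_directory expects; count_images_per_class is a hypothetical helper, not part of any library.

```python
import pathlib


def count_images_per_class(data_dir):
    """Count the files in each class sub-folder (hypothetical helper).

    Assumes a layout like data_dir/no/*.jpg and data_dir/yes/*.jpg.
    """
    data_dir = pathlib.Path(data_dir)
    counts = {}
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # Same idea as glob('*/*') above, restricted to one class folder
        counts[class_dir.name] = sum(1 for f in class_dir.glob("*") if f.is_file())
    return counts


# Usage with the dataset path from the article:
# print(count_images_per_class("./35-day-brain_tumor_dataset"))
```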
batch_size = 16
img_height = 224
img_width  = 224

# For a detailed introduction to image_dataset_from_directory(), see the earlier article in this series.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,                      # any fixed seed works, but it must match val_ds
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 253 files belonging to 2 classes.
Using 203 files for training.

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 253 files belonging to 2 classes.
Using 50 files for validation.

class_names = train_ds.class_names
print(class_names)
['no', 'yes']
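The 203/50 split in the logs follows from a validation fraction of 0.2 (a value inferred from the logged counts): Keras truncates validation_split * num_files to an integer for the validation subset and gives the remaining files to training. The hypothetical helper below reproduces that arithmetic in plain Python.

```python
def split_sizes(num_files, validation_split):
    """Return (num_train, num_val) for a given validation fraction.

    The validation count is truncated to an integer and the training
    subset receives the remainder (hypothetical helper for illustration).
    """
    num_val = int(validation_split * num_files)
    return num_files - num_val, num_val


print(split_sizes(253, 0.2))  # (203, 50)
```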
2. Check the data

for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    break
(16, 224, 224, 3)
3. Configure the data set

  • shuffle(): shuffles the data. This function is covered in detail in an earlier article.
  • prefetch(): prefetches data to speed up training. See my two previous articles for a detailed introduction.
  • cache(): caches the dataset in memory to speed up training.
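Dataset.shuffle(buffer_size) does not shuffle the whole dataset at once: it keeps a buffer of buffer_size elements, emits a random element from the buffer and refills that slot with the next input. The pure-Python generator below is a conceptual model of that behaviour, not TensorFlow's code. Note that train_ds returned by image_dataset_from_directory is already batched, so shuffle(1000) in this pipeline reorders whole batches of 16 images; image_dataset_from_directory already shuffles individual files by default (shuffle=True).

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Conceptual model of Dataset.shuffle(buffer_size).

    A buffer at least as large as the input gives a full uniform shuffle;
    buffer_size=1 leaves the order unchanged.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    # 1) Fill the buffer
    for item in it:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # 2) Emit a random buffered element and replace it with the next input
    for item in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # 3) Drain the remaining elements in random order
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With a small buffer, elements can only move a limited distance from their original position, which is why the buffer is usually set at least as large as the number of elements being shuffled.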

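AUTOTUNE = tf.data.experimental.AUTOTUNE

def train_preprocessing(image, label):
    return (image / 255.0, label)      # Scale pixel values to [0, 1]

train_ds = (
    train_ds.cache()
    .shuffle(1000)
    .map(train_preprocessing)          # The preprocessing function is applied here
    # .batch(batch_size)               # Not needed: batch_size was already set in image_dataset_from_directory
    .prefetch(buffer_size=AUTOTUNE)
)

val_ds = (
    val_ds.cache()
    .shuffle(1000)
    .map(train_preprocessing)          # The preprocessing function is applied here
    # .batch(batch_size)               # Not needed: batch_size was already set in image_dataset_from_directory
    .prefetch(buffer_size=AUTOTUNE)
)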
4. Data visualization

plt.figure(figsize=(10, 8))  # Figure width 10, height 8

class_names = ["Brain tumor patients", "Normal"]

for images, labels in train_ds.take(1):
    for i in range(15):
        plt.subplot(4, 5, i + 1)

        # Display the image
        plt.imshow(images[i])
        # Display the label
        plt.title(class_names[labels[i]])
        plt.axis("off")

plt.show()
3. Build the model

from tensorflow.keras import layers, models, Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout,BatchNormalization,Activation

# Load the pretrained model
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(weights='imagenet',
                                                            include_top=False,
                                                            input_shape=(img_height, img_width, 3),
                                                            pooling='max')  # global pooling (assumed 'max') gives a flat feature vector

for layer in base_model.layers:
    layer.trainable = True

X = base_model.output
# Note: the plain MobileNetV2 model overfits here, and adding a Dropout layer noticeably
# reduces the overfitting. Try commenting the Dropout line out and back in to compare.
X = Dropout(0.4)(X)

output = Dense(len(class_names), activation='softmax')(X)
model = Model(inputs=base_model.input, outputs=output)
# model.summary()
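To make the Dropout(0.4) layer less of a black box, here is a plain-Python sketch of inverted dropout, the behaviour documented for the Keras Dropout layer: during training each input is set to 0 with probability rate and the surviving inputs are scaled by 1 / (1 - rate), so the expected sum is unchanged; at inference the inputs pass through untouched. The function name and list-based interface are illustrative only.

```python
import random


def dropout(values, rate, training, seed=None):
    """Inverted dropout on a list of floats (illustrative sketch).

    training=True:  zero each value with probability `rate`, scale the rest
                    by 1 / (1 - rate).
    training=False: return the values unchanged.
    """
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    if not training:
        return list(values)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, 0.4, training=True, seed=0))
print(dropout(x, 0.4, training=False))
```

Because a different random subset of features is dropped at every step, the classifier head cannot rely on any single MobileNetV2 feature, which is why dropout tends to reduce overfitting on a small dataset like this one.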
4. Compile the model
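Before training, the model has to be compiled. Given the two-unit softmax output, the integer labels produced by image_dataset_from_directory and the loss/accuracy metrics shown in the training log, a plausible configuration is model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]); treat the optimizer in particular as an assumption. The loss named there is the mean negative log-probability assigned to the true class. The hand-written version below (hypothetical helper, plain Python) shows what that number means.

```python
import math


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(p[true class]) over a batch (illustrative sketch).

    labels: integer class indices, e.g. [0, 1, 1]
    probs:  per-sample softmax outputs, e.g. [[0.9, 0.1], ...]
    """
    eps = 1e-7  # clip probabilities to avoid log(0)
    losses = [-math.log(min(max(p[y], eps), 1 - eps)) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a 50/50 guess costs log(2)
print(sparse_categorical_crossentropy([0], [[0.99, 0.01]]))
print(sparse_categorical_crossentropy([1], [[0.5, 0.5]]))
```

A training loss close to 0 together with a much larger validation loss, as in the log of the next step, is the signature of overfitting.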

5. Train the model

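from tensorflow.keras.callbacks import ModelCheckpoint, Callback, EarlyStopping, ReduceLROnPlateau, LearningRateScheduler

NO_EPOCHS = 50   # matches "Epoch 1/50" in the log below
PATIENCE  = 5    # assumed value
VERBOSE   = 1

# Dynamic learning rate: lr = 1e-3 * 0.99 ** (epoch + NO_EPOCHS)
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.99 ** (x + NO_EPOCHS))

# Early stopping. In TF 2.x the metric is named 'val_accuracy'; the run logged
# below monitored 'val_acc', which does not exist, so early stopping never fired.
earlystopper = EarlyStopping(monitor='val_accuracy', patience=PATIENCE, verbose=VERBOSE)

# Save the model whenever validation accuracy improves
checkpointer = ModelCheckpoint('best_model.h5',
                               monitor='val_accuracy',
                               verbose=VERBOSE,
                               save_best_only=True)

train_model = model.fit(train_ds,
                        validation_data=val_ds,
                        epochs=NO_EPOCHS,
                        verbose=1,
                        callbacks=[earlystopper, checkpointer, annealer])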
Epoch 1/50
13/13 [==============================] - 7s 145ms/step - loss: 3.1000 - accuracy: 0.6700 - val_loss: 1.7745 - val_accuracy: 0.6400
WARNING:tensorflow:Early stopping conditioned on metric `val_acc` which is not available. Available metrics are: loss,accuracy,val_loss,val_accuracy

Epoch 00001: val_accuracy improved from -inf to 0.64000, saving model to best_model.h5
......
Epoch 49/50
13/13 [==============================] - 1s 60ms/step - loss: 3.0536e-08 - accuracy: 1.0000 - val_loss: 2.6647 - val_accuracy: 0.8800
WARNING:tensorflow:Early stopping conditioned on metric `val_acc` which is not available. Available metrics are: loss,accuracy,val_loss,val_accuracy

Epoch 00049: val_accuracy did not improve from 0.90000
Epoch 50/50
13/13 [==============================] - 1s 60ms/step - loss: 1.4094e-08 - accuracy: 1.0000 - val_loss: 2.6689 - val_accuracy: 0.8800
WARNING:tensorflow:Early stopping conditioned on metric `val_acc` which is not available. Available metrics are: loss,accuracy,val_loss,val_accuracy

Epoch 00050: val_accuracy did not improve from 0.90000
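Because the exponent adds NO_EPOCHS, the learning-rate schedule does not start at 1e-3. The snippet below evaluates the same formula for a few epochs; Keras passes a 0-based epoch index to the schedule, and NO_EPOCHS = 50 matches the 50 epochs in the log. scheduled_lr is a hypothetical name for the lambda used above.

```python
NO_EPOCHS = 50  # matches "Epoch 1/50" in the log


def scheduled_lr(epoch):
    """Same formula as the LearningRateScheduler lambda above."""
    return 1e-3 * 0.99 ** (epoch + NO_EPOCHS)


for epoch in (0, 1, 25, 49):
    print(epoch, "%.3e" % scheduled_lr(epoch))
```

The learning rate therefore starts near 6.05e-4 and decays to about 3.70e-4 by the last epoch; dropping the "+ NO_EPOCHS" term would start the decay from 1e-3 instead.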

6. Model evaluation

1. Confusion matrix

from sklearn.metrics import confusion_matrix
import seaborn as sns
import pandas as pd

# Define a function that plots the confusion matrix
def plot_cm(labels, predictions):
    # Compute the confusion matrix (rows: true labels, columns: predicted labels)
    conf_numpy = confusion_matrix(labels, predictions)
    # Convert the matrix to a DataFrame so the axes carry the class names
    conf_df = pd.DataFrame(conf_numpy, index=class_names, columns=class_names)

    plt.figure()
    sns.heatmap(conf_df, annot=True, fmt="d", cmap="BuPu")
    plt.title('Confusion matrix', fontsize=15)
    plt.ylabel('True value', fontsize=14)
    plt.xlabel('Predicted value', fontsize=14)
    plt.show()
val_pre   = []
val_label = []

for images, labels in val_ds:   # Part of the validation data (.take(1)) could be used instead
    for image, label in zip(images, labels):
        # Add a batch dimension to the image
        img_array = tf.expand_dims(image, 0)
        # Predict the class of the image
        prediction = model.predict(img_array)

        val_pre.append(np.argmax(prediction))
        val_label.append(int(label))

plot_cm(val_label, val_pre)
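The heatmap follows scikit-learn's convention: rows are true labels and columns are predicted labels, so the diagonal holds the correct predictions. The counting behind confusion_matrix can be sketched in a few lines of plain Python; confusion_counts is a hypothetical helper and the labels are toy data, not the article's results.

```python
def confusion_counts(labels, predictions, num_classes=2):
    """matrix[true][pred] = number of samples with that (true, predicted) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(labels, predictions):
        matrix[int(true)][int(pred)] += 1
    return matrix


# Hypothetical toy labels and predictions
print(confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))  # [[1, 1], [1, 2]]
```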
2. Evaluation metrics

from sklearn import metrics

def test_accuracy_report(model):
    print(metrics.classification_report(val_label, val_pre, target_names=class_names))
    score = model.evaluate(val_ds, verbose=0)
    print('Loss function: %s, accuracy:' % score[0], score[1])

test_accuracy_report(model)
                      precision    recall  f1-score   support

Brain tumor patients       0.94      0.89      0.92        37
              Normal       0.73      0.85      0.79        13

            accuracy                           0.88        50
           macro avg       0.84      0.87      0.85        50
        weighted avg       0.89      0.88      0.88        50

Loss function: 2.668877601623535, accuracy: 0.8799999952316284
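Each row of the report comes from three counts per class: true positives, false positives and false negatives for that class. The sketch below recomputes precision, recall, F1 and support by hand; macro avg is the plain mean of the per-class values and weighted avg weights them by support. per_class_report is a hypothetical helper (not scikit-learn's implementation) and the labels are toy data.

```python
def per_class_report(labels, predictions, num_classes=2):
    """Precision, recall, F1 and support for each class index."""
    report = {}
    pairs = list(zip(labels, predictions))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn}
    return report


# Hypothetical toy labels and predictions
for c, row in per_class_report([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]).items():
    print(c, {k: round(v, 2) for k, v in row.items()})
```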

3. AUC

In short, AUC (Area Under the ROC Curve) summarizes the whole ROC curve in a single number: the closer it is to 1, the better the binary classifier separates the two classes.

  • AUC = 1: a perfect classifier. In most real prediction tasks, no perfect classifier exists.
  • 0.5 < AUC < 1: better than random guessing.
  • AUC = 0.5: no better than random guessing (e.g., a coin toss); the model has no predictive value.
  • AUC < 0.5: worse than random guessing.
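The thresholds above are easier to read through an equivalent definition: AUC equals the probability that the classifier scores a randomly chosen positive sample higher than a randomly chosen negative one (ties count as half). Under this reading 0.5 is a coin toss and 1 is a perfect ranking. The quadratic-time sketch below uses a hypothetical helper name and toy scores.

```python
def auc_by_pairs(labels, scores):
    """AUC as P(score of random positive > score of random negative).

    Ties count as 0.5. O(P * N), for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```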

The x-axis of the ROC curve is the False Positive Rate (FPR) and the y-axis is the True Positive Rate (TPR). Two complementary quantities are the True Negative Rate (TNR) and the False Negative Rate (FNR). The four rates are defined as follows:

  • False positive rate (FPR): the fraction of actually negative samples that are wrongly predicted as positive.
  • True positive rate (TPR): the fraction of actually positive samples that are correctly predicted as positive.
  • False negative rate (FNR): the fraction of actually positive samples that are wrongly predicted as negative.
  • True negative rate (TNR): the fraction of actually negative samples that are correctly predicted as negative.
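With label 1 taken as positive (the default for roc_curve when labels are {0, 1}), the four rates can be read directly off a 2x2 confusion matrix laid out the way scikit-learn returns it. A minimal sketch with a hypothetical helper and made-up counts:

```python
def rates_from_matrix(cm):
    """FPR, TPR, FNR and TNR from a 2x2 confusion matrix.

    cm = [[TN, FP],
          [FN, TP]]   rows: true label 0/1, columns: predicted label 0/1
    """
    (tn, fp), (fn, tp) = cm
    return {
        "FPR": fp / (fp + tn),
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),  # = 1 - TPR
        "TNR": tn / (fp + tn),  # = 1 - FPR
    }


print(rates_from_matrix([[9, 1], [2, 8]]))  # made-up counts
```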
val_pre   = []
val_label = []
for images, labels in val_ds:    # Part of the validation data (.take(1)) could be used instead
    for image, label in zip(images, labels):
        # Add a batch dimension to the image
        img_array = tf.expand_dims(image, 0)
        # Predict the class of the image
        prediction = model.predict(img_array)

        val_pre.append(np.argmax(prediction))
        val_label.append(int(label))

train_pre   = []
train_label = []
for images, labels in train_ds:  # Part of the training data (.take(1)) could be used instead
    for image, label in zip(images, labels):
        # Add a batch dimension to the image
        img_array = tf.expand_dims(image, 0)
        # Predict the class of the image
        prediction = model.predict(img_array)

        train_pre.append(np.argmax(prediction))
        train_label.append(int(label))

sklearn.metrics.roc_curve(): computes the points of the ROC curve.

Main parameters:

  • y_true: the true sample labels, by default in {0, 1} or {-1, 1}. For other label values, set pos_label to the positive label; for example, if the labels are {1, 2} and 2 marks the positive class, use pos_label=2.
  • y_score: the predicted score for each sample, typically the probability of the positive class.
  • pos_label: the label of the positive class.

Return values:

  • fpr: false positive rates.
  • tpr: true positive rates.
  • thresholds: the decreasing score thresholds at which each (fpr, tpr) point was computed.
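Conceptually, roc_curve lowers a decision threshold through the distinct scores and records one (FPR, TPR) point per threshold; the AUC is then the area under the resulting polyline, computed with the trapezoidal rule. The sketch below mirrors that idea in plain Python with hypothetical helper names and toy scores. It is not scikit-learn's code (scikit-learn also drops collinear intermediate points by default, which does not change the area).

```python
def roc_points(labels, scores):
    """(FPR, TPR) after lowering the threshold past each distinct score."""
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need both positive and negative samples")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def trapezoid_auc(points):
    """Area under the polyline through the (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```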
def plot_roc(name, labels, predictions, **kwargs):
    fp, tp, _ = metrics.roc_curve(labels, predictions)

    plt.plot(fp, tp, label=name, linewidth=2, **kwargs)
    plt.plot([0, 1], [0, 1], color='gray', linestyle='--')   # random-guess diagonal
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    ax = plt.gca()
    ax.set_aspect('equal')

plot_roc("Train Baseline", train_label, train_pre, color="green", linestyle=':')
plot_roc("Val Baseline", val_label, val_pre, color="red", linestyle='--')

plt.legend(loc='lower right')
plt.show()

auc_score = metrics.roc_auc_score(val_label, val_pre)
print("The AUC value is:", auc_score)
The AUC value is: 0.869022869022869
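If the predictions passed to roc_curve and roc_auc_score are hard class labels (0 or 1, e.g. np.argmax of the softmax output) rather than probabilities, the ROC "curve" has a single interior point (FPR, TPR) joined to (0, 0) and (1, 1). Summing the two trapezoids shows that the AUC then reduces to the mean of TPR and TNR, i.e. balanced accuracy:

```latex
\begin{aligned}
\mathrm{AUC} &= \tfrac{1}{2}\,\mathrm{FPR}\cdot\mathrm{TPR} + \tfrac{1}{2}\,(1-\mathrm{FPR})(\mathrm{TPR}+1) \\
             &= \tfrac{1}{2}\left(1 + \mathrm{TPR} - \mathrm{FPR}\right) \\
             &= \tfrac{1}{2}\left(\mathrm{TPR} + \mathrm{TNR}\right), \qquad \mathrm{TNR} = 1 - \mathrm{FPR}.
\end{aligned}
```

The reported 0.869 is consistent with this: up to rounding it equals the mean of the two per-class recalls in the classification report, (0.89 + 0.85) / 2 ≈ 0.87, which suggests hard labels were used here. Passing the predicted probability of the positive class instead (for example prediction[0][1], the softmax output for label 1) would give one ROC point per distinct score and an AUC that does not depend on a single 0.5 decision threshold.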