Using PaddleX to quickly train an image classifier

Links: aistudio.baidu.com/aistudio/co…

PaddleX version 1.3 documentation: paddlex.readthedocs.io/zh_CN/relea…

Use paddle directly for testing

Verify the paddle installation
python -c "import paddle; paddle.utils.run_check()"
If the check reports that the installation succeeded, paddle is ready to use.

Preparation

Install PaddleX with pip

pip install paddlex==1.3.7 -i https://mirror.baidu.com/pypi/simple

Install pycocotools with pip

pip install pycocotools -i https://mirror.baidu.com/pypi/simple

Verify the PaddleX installation

python -c "import paddlex as pdx; print(pdx.__version__)"

If the version number is printed, the installation is successful.

PaddleX provides a concise set of APIs and comes with a graphical development client that can be downloaded and installed with one click. Training an image classifier with PaddleX is fast and takes very little code.

Data processing

unzip cat_data_sets_models.zip

Extraction complete.

Let’s go into data_sets and look at the data:

Construct the required data

labels.txt (the names of the category labels)

Create train_list.txt and val_list.txt

We shuffle the entries of the original train_list.txt (read by the script as train_list_origin.txt) and use 70% of them as training images and 30% as validation images.
A Python script is needed for the split:

import random

rate = 0.7  # fraction of the entries used for training

# Read the original list, one entry per line
with open("train_list_origin.txt", "r") as f:
    datas = f.readlines()

# Shuffle, then split into 70% training and 30% validation
random.shuffle(datas)
index = int(len(datas) * rate)
train_list = datas[:index]
val_list = datas[index:]

with open("train_list.txt", "w") as f:
    f.writelines(train_list)

with open("val_list.txt", "w") as f:
    f.writelines(val_list)

train_list.txt (path and label of each training image)

val_list.txt (path and label of each validation image)
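
Because random.shuffle produces a different order on every run, the split above changes each time the script is executed, which makes validation accuracy harder to compare between runs. The following is a minimal sketch, not the original author's script, that wraps the same 70/30 split in a function with a fixed seed. The function name split_entries, the seed value, and the sample entries are assumptions made for illustration.

```python
import random


def split_entries(entries, rate=0.7, seed=42):
    """Shuffle a copy of entries and split it into (train, val)."""
    shuffled = list(entries)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable order
    index = int(len(shuffled) * rate)
    return shuffled[:index], shuffled[index:]


# Hypothetical entries in the "image_path label" form used by train_list.txt
entries = [f"cat_12_train/img_{i}.jpg {i % 12}\n" for i in range(100)]
train_list, val_list = split_entries(entries)
print(len(train_list), len(val_list))
```

Calling the function twice with the same seed returns the same two lists, so a later training run sees the same validation images.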

Create test_list.txt

A Python script is needed for this as well:

import os

# List every file in the test image directory
files = os.listdir("cat_12_test")
with open("test_list.txt", "w") as f:
    for file in files:
        # One image path per line
        f.write(os.path.join("cat_12_test", file) + "\n")

test_list.txt

The data is now fully prepared.

Image classification with PaddleX

The program runs from the data_sets directory. Create a new output directory under it.

The program code is as follows:

from paddlex.cls import transforms
import paddlex as pdx 

train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()
])

train_dataset = pdx.datasets.ImageNet(
    data_dir='cat_12',
    file_list='cat_12/train_list.txt',
    label_list='cat_12/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='cat_12',
    file_list='cat_12/val_list.txt',
    label_list='cat_12/labels.txt',
    transforms=eval_transforms)

num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_small_ssld(num_classes=num_classes)

model.train(num_epochs=30,
            train_dataset=train_dataset,
            train_batch_size=32,
            eval_dataset=eval_dataset,
            lr_decay_epochs=[4, 6, 8],
            save_dir='output/mobilenetv3_small_ssld',
            use_vdl=True)

Three points to note from the training output:
- First, the program runs successfully on its first execution.
- Second, the log shows the key values: epoch, loss, acc, and lr.
- Third, cv2 failed to read an image.
  - A closer look shows that yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg is faulty; remove it from train_list.txt or val_list.txt.

After careful analysis, we found problems with 5 images in the training and validation sets and 1 image in the test set.

Training and validation sets

yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg

tO6cKGH8uPEayzmeZJ51Fdr2Tx3fBYSn.jpg

YfsxcFB9D3LvkdQyiXlqnNZ4STwope2r.jpg

5nKsehtjrXCZqbAcSW13gxB8E6z2Luy7.jpg

3yMZzWekKmuoGOF60ICQxldhBEc9Ra15.jpg

The test set

Qt29gPjYZwv3B6RJh5yiTWXrVImue1FH.jpg
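
The faulty training and validation images listed above can be removed from the list files with a short script. This is a minimal sketch under stated assumptions: the helper name drop_bad_entries is hypothetical, the sample lines are made up, and each list line is assumed to start with the image path, followed by the label.

```python
import os

# File names reported above as faulty in the training and validation sets
BAD_IMAGES = {
    "yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg",
    "tO6cKGH8uPEayzmeZJ51Fdr2Tx3fBYSn.jpg",
    "YfsxcFB9D3LvkdQyiXlqnNZ4STwope2r.jpg",
    "5nKsehtjrXCZqbAcSW13gxB8E6z2Luy7.jpg",
    "3yMZzWekKmuoGOF60ICQxldhBEc9Ra15.jpg",
}


def drop_bad_entries(lines, bad_names=BAD_IMAGES):
    """Return only the list lines whose image file name is not in bad_names."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field is the image path; compare only its file name
        if os.path.basename(fields[0]) not in bad_names:
            kept.append(line)
    return kept


# Hypothetical list lines for illustration
sample = [
    "cat_12_train/yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg 3\n",
    "cat_12_train/example_good_image.jpg 5\n",
]
print(drop_bad_entries(sample))
```

To apply it, read train_list.txt and val_list.txt with readlines(), pass the lines through the function, and write the kept lines back to the same files.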

Observe the training

Epoch 1

This is epoch 1 of 30. After training, the accuracy on the training set ranges from a low of 77.24% to a high of 98.9%, and the accuracy on the validation set is 77.24%. This epoch currently holds the best parameters.

Epoch 15

This is epoch 15 of 30. After training, the accuracy on the training set ranges from a low of 87.92% to a high of 99.26%, and the accuracy on the validation set is 88.54%. We can also see that the learning rate has dropped to lr = 2.5e-05, which is a result of the learning rate decay we configured.

Epoch 30

This is epoch 30 of 30. After training, the accuracy on the training set ranges from a low of 88.54% to a high of 99.07%, and the accuracy on the validation set is 89.0%.

Predict on the test set

The prediction code is as follows:

import os

import paddlex as pdx

# Load the best model saved during training
model = pdx.load_model('output/mobilenetv3_small_ssld/best_model')

# Opening both files in one "with" ensures result.csv is flushed and closed
with open("./cat_12/test_list.txt", "r") as f, open("result.csv", "w") as cf:
    for t_data in f:
        image_path = t_data.strip()
        result = model.predict(os.path.join("cat_12", image_path))
        # Write one "file_name,category_id" row per test image
        cf.write(image_path.split("/")[1] + "," + str(result[0]["category_id"]) + "\n")
print("Prediction completed")

Submit result.csv

Download result.csv from the Paddle environment and upload it for official scoring, then adjust your own training process based on the returned result.

We’ll get the test results in a few minutes.

This time the score was 85.83%, which could serve as a simple baseline.

Improvements

Modify the learning rate decay schedule

In the previous code, lr dropped to 2.5e-05 after the 8th epoch. That is too small: the later gradient descent updates could barely change the parameters, so we adjusted the decay schedule.
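
The value 2.5e-05 follows from step-wise decay: each time training reaches an epoch listed in lr_decay_epochs, the learning rate is multiplied by a decay factor. The sketch below reproduces that arithmetic in plain Python. The base rate of 0.025 and the factor of 0.1 are assumptions; the training call above does not set them, and they are used here because they are consistent with the reported 2.5e-05. The function name and the second set of milestones are hypothetical and are not the schedule the author switched to.

```python
def piecewise_lr(epoch, decay_epochs, base_lr=0.025, gamma=0.1):
    """Learning rate at a given epoch index under step-wise decay."""
    # Count how many decay milestones have been reached by this epoch
    passed = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * gamma ** passed


# Schedule from the training code above: three decays are done by epoch 8
for epoch in (0, 4, 6, 8, 29):
    print(epoch, piecewise_lr(epoch, [4, 6, 8]))

# Hypothetical later milestones: only two decays have happened at epoch 12
print(piecewise_lr(12, [5, 10, 20]))
```

With milestones at 4, 6, and 8, the rate is cut three times within the first third of a 30-epoch run, so roughly 22 epochs train at about 2.5e-05. Spreading the milestones over the whole run keeps the rate higher for longer.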

Take a look at the adjusted training results

By the 12th epoch, the result was already better than after all 30 epochs of the previous run. The lr at this point is 2.5e-04, which shows that the adjustment helped.

In the end, the accuracy on the validation set is 92.57%. Submit it and see the effect.
