Quick image classification training with PaddleX
Links: aistudio.baidu.com/aistudio/co…
PaddleX version 1.3 documentation: paddlex.readthedocs.io/zh_CN/relea…
Use paddle directly for testing
Verify paddle installation
python -c "import paddle; paddle.utils.run_check()"
If the following figure is displayed, the installation is successful
Preparation
Install PaddleX with pip install
pip install paddlex==1.3.7 -i https://mirror.baidu.com/pypi/simple
Install pycocotools with pip install
pip install pycocotools -i https://mirror.baidu.com/pypi/simple
Verify the PaddleX installation
python -c "import paddlex as pdx; print(pdx.__version__)"
If the following figure is displayed, the installation is successful
PaddleX offers a more concise set of APIs and comes with a graphical development client that can be downloaded and installed with one click. Training an image classifier with PaddleX is quick to set up and takes very little code.
Data processing
unzip cat_data_sets_models.zip
Extraction complete
Let's go into data_sets and look at the data:
Construct the required data
labels.txt (the names of the category labels)
Create train_list.txt and val_list.txt
- We shuffle the entries of the original train_list.txt (saved as train_list_origin.txt) and use 70% of them for training and 30% for validation
- A Python script does the partitioning
import random

rate = 0.7  # fraction of entries used for training

# Read every "path label" entry from the original list
with open("train_list_origin.txt", "r") as f:
    datas = f.readlines()

index = int(len(datas) * rate)
random.shuffle(datas)
train_list = datas[:index]
val_list = datas[index:]

# Write the two partitions to their own list files
with open("train_list.txt", "w") as f:
    for data in train_list:
        f.write(data)
with open("val_list.txt", "w") as f:
    for data in val_list:
        f.write(data)
- train_list.txt (path and label of each training image)
- val_list.txt (path and label of each validation image)
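The split script above produces a different partition on every run because the shuffle is unseeded. A minimal sketch of a reproducible variant follows, assuming a fixed seed is acceptable for your experiments. The helper name `split_lines`, the seed value, and the sample entries are illustrative and not part of the original script.

```python
import random


def split_lines(lines, rate=0.7, seed=42):
    """Shuffle a copy of `lines` with a fixed seed and split it at `rate`."""
    shuffled = list(lines)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, so the split is repeatable
    index = int(len(shuffled) * rate)
    return shuffled[:index], shuffled[index:]


# Made-up entries in the "path label" format, for illustration only
lines = [f"cat_12_train/img_{i}.jpg {i % 12}\n" for i in range(10)]
train_list, val_list = split_lines(lines)
print(len(train_list), len(val_list))
```

With a fixed seed, rerunning the split yields the same train_list.txt and val_list.txt, which makes accuracy numbers from different training runs easier to compare.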
Construct test_list.txt
- You need a Python script to do this
import os

# List every file in the test folder and write one relative path per line
files = os.listdir("cat_12_test")
with open("test_list.txt", "w") as f:
    for file in files:
        f.write(os.path.join("cat_12_test", file) + "\n")
- test_list.txt
The data processing is now complete
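Before training, it can help to check that every entry in train_list.txt and val_list.txt has the expected shape. The sketch below assumes each line is an image path followed by an integer label, separated by whitespace. The helper name `check_list_lines`, the `num_classes` value, and the sample lines are hypothetical.

```python
def check_list_lines(lines, num_classes):
    """Return (line_number, reason) pairs for entries that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 2:
            problems.append((number, "expected '<path> <label>'"))
            continue
        label = parts[1]
        # Labels are assumed to be integers in the range 0 .. num_classes - 1
        if not label.isdigit() or int(label) >= num_classes:
            problems.append((number, f"label out of range: {label}"))
    return problems


# Made-up sample: the second line has no label, the third has label 12 of 12 classes
sample = [
    "cat_12_train/a.jpg 0\n",
    "cat_12_train/b.jpg\n",
    "cat_12_train/c.jpg 12\n",
]
for number, reason in check_list_lines(sample, num_classes=12):
    print(number, reason)
```

An empty result means every line parsed cleanly; it does not prove that the image files themselves are readable.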
Image classification with PaddleX
- The main path where the program runs is
data_sets
Create a new output directory under this directory
The following is the code of the program
from paddlex.cls import transforms
import paddlex as pdx

# Augmentation applied to the training images
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
# Deterministic preprocessing applied to the validation images
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()
])

train_dataset = pdx.datasets.ImageNet(
    data_dir='cat_12',
    file_list='cat_12/train_list.txt',
    label_list='cat_12/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='cat_12',
    file_list='cat_12/val_list.txt',
    label_list='cat_12/labels.txt',
    transforms=eval_transforms)

num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_small_ssld(num_classes=num_classes)
# The learning rate decays after epochs 4, 6 and 8
model.train(num_epochs=30,
            train_dataset=train_dataset,
            train_batch_size=32,
            eval_dataset=eval_dataset,
            lr_decay_epochs=[4, 6, 8],
            save_dir='output/mobilenetv3_small_ssld',
            use_vdl=True)
- Three points to note from the picture above
- First, the program runs successfully on its first execution
- Second, the log shows the key values: epoch, loss, acc and lr
- Third, cv2 failed to read an image
- Reading the log carefully shows that the image yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg is faulty, so it is removed from train_list.txt or val_list.txt
After checking all the data, we found 5 faulty images in the training and validation sets and one in the test set
- Training and validation sets
- yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg
- tO6cKGH8uPEayzmeZJ51Fdr2Tx3fBYSn.jpg
- YfsxcFB9D3LvkdQyiXlqnNZ4STwope2r.jpg
- 5nKsehtjrXCZqbAcSW13gxB8E6z2Luy7.jpg
- 3yMZzWekKmuoGOF60ICQxldhBEc9Ra15.jpg
- The test set
- Qt29gPjYZwv3B6RJh5yiTWXrVImue1FH.jpg
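Removing the faulty training and validation images by hand is error-prone, so the cleanup can be scripted. The sketch below assumes each list line starts with the image path followed by whitespace and the label. The helper name `drop_bad_entries` and the sample lines are illustrative; the file names are the five listed above.

```python
import os

# The five unreadable training/validation images listed above
BAD_IMAGES = {
    "yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg",
    "tO6cKGH8uPEayzmeZJ51Fdr2Tx3fBYSn.jpg",
    "YfsxcFB9D3LvkdQyiXlqnNZ4STwope2r.jpg",
    "5nKsehtjrXCZqbAcSW13gxB8E6z2Luy7.jpg",
    "3yMZzWekKmuoGOF60ICQxldhBEc9Ra15.jpg",
}


def drop_bad_entries(lines, bad_names=BAD_IMAGES):
    """Keep only the list lines whose image file name is not in `bad_names`."""
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if os.path.basename(parts[0]) in bad_names:
            continue  # skip entries that point at a faulty image
        kept.append(line)
    return kept


# Made-up list content: one faulty entry, one good entry
sample = [
    "cat_12_train/yGcJHV8Uuft6grFs7QWnK5CTAZvYzdDO.jpg 3\n",
    "cat_12_train/ok_image.jpg 1\n",
]
print(drop_bad_entries(sample))
```

Applying the function to the lines of train_list.txt and val_list.txt and writing the result back removes the entries before the next training run.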
Observe the training
The first epoch
This is the first of 30 epochs. After it, the accuracy on the training set ranges from a low of 77.24% to a high of 98.9%, and the accuracy on the validation set is 77.24%. The current epoch holds the best parameters so far.
The fifteenth epoch
This is the 15th of 30 epochs. After it, the accuracy on the training set ranges from a low of 87.92% to a high of 99.26%, and the accuracy on the validation set is 88.54%. The learning rate has also dropped to LR = 2.5E-05, which is the result of the learning rate decay we configured.
The thirtieth epoch
This is the last of 30 epochs. After it, the accuracy on the training set ranges from a low of 88.54% to a high of 99.07%, and the accuracy on the validation set is 89.0%.
Predict on the test set
The prediction code is as follows:
import paddlex as pdx
import os

# Load the best checkpoint saved during training
model = pdx.load_model('output/mobilenetv3_small_ssld/best_model')

with open("./cat_12/test_list.txt", "r") as f:
    test_datas = f.readlines()

# Write one "file_name,category_id" row per test image
with open("result.csv", "w") as cf:
    for t_data in test_datas:
        result = model.predict(os.path.join("cat_12", t_data.strip()))
        cf.write(t_data.strip().split("/")[1] + "," + str(result[0]["category_id"]) + "\n")
print("Prediction completed")
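The row format of result.csv can be checked without loading a model. The sketch below uses the standard csv module and a made-up prediction that mimics the `result[0]["category_id"]` structure used above. The helper name `write_predictions` and the sample data are hypothetical.

```python
import csv
import io
import os


def write_predictions(rows, out):
    """Write one 'file_name,category_id' row per (image_path, result) pair."""
    writer = csv.writer(out, lineterminator="\n")
    for image_path, result in rows:
        # result is expected to look like the list returned by model.predict
        writer.writerow([os.path.basename(image_path), result[0]["category_id"]])


# Made-up prediction, for illustration only
fake_rows = [("cat_12_test/a.jpg", [{"category_id": 3}])]
buffer = io.StringIO()
write_predictions(fake_rows, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the StringIO buffer writes the same rows to disk.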
Submit result.csv
Download result.csv and upload it for official scoring, then adjust your own training process based on the score you get back.
We’ll get the test results in a few minutes
This time the score was 85.83%, which could serve as a simple baseline.
Improvements
Modify the learning rate decay schedule
In the previous code, the LR dropped to 2.5E-05 after the 8th epoch. That is too small: the later gradient descent steps could barely change the weights, so we adjusted the decay schedule.
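The effect of lr_decay_epochs can be worked out by hand. The sketch below models a piecewise decay in which the rate is multiplied by a fixed factor at each listed epoch. The base rate 0.025 and the factor 0.1 are assumptions: they are not set in the training code above, but they reproduce the 2.5E-05 seen in the log. The function name `piecewise_lr` is illustrative.

```python
def piecewise_lr(epoch, base_lr=0.025, decay_epochs=(4, 6, 8), gamma=0.1):
    """Learning rate in effect once `epoch` epochs have been completed."""
    # Count how many decay boundaries have already been reached
    passed = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return base_lr * gamma ** passed


# Rate at the start, at each boundary, and at the end of the 30-epoch run
for epoch in (0, 4, 6, 8, 29):
    print(epoch, f"{piecewise_lr(epoch):.1e}")
```

Under these assumptions the rate has already shrunk by a factor of 1000 after epoch 8, leaving the remaining epochs with very small updates. Moving the boundaries later in the run keeps the rate higher for longer.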
Take a look at the adjusted training results
By the 12th epoch, the result was already better than after all 30 epochs of the previous run. The LR at this point is 2.5E-04, which confirms the effect of the change.
In the end, the accuracy on the validation set reached 92.57%. Submit the result and see how it scores.