Dataset download link:

Link: pan.baidu.com/s/1l1AnBgkA… Extraction code: 2xQ4

In the training set, the cat and dog images are mixed together. PyTorch offers two ways to read a dataset. The first is to put the images of each category in a folder named after that class. The second is to subclass the Dataset class (torch.utils.data.Dataset) and override __getitem__ and __len__.
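As a rough illustration of the second approach, the sketch below subclasses Dataset and overrides the two methods. The class name CatDogDataset, the 0/1 label mapping, and the choice to return the file path instead of a decoded image are assumptions made for this example and are not part of the original tutorial. It relies on the cat.N.jpg / dog.N.jpg file naming of this dataset. The fallback import only lets the sketch run where PyTorch is not installed.

```python
import glob
import os

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs without PyTorch installed.
    Dataset = object


class CatDogDataset(Dataset):
    """Hypothetical map-style dataset over a folder of cat.N.jpg / dog.N.jpg files."""

    # Assumed label mapping; the original post does not specify one.
    LABELS = {"cat": 0, "dog": 1}

    def __init__(self, image_dir):
        # Keep only files whose name starts with a known class prefix.
        self.samples = []
        for img_path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
            prefix = os.path.basename(img_path).split(".")[0]
            if prefix in self.LABELS:
                self.samples.append((img_path, self.LABELS[prefix]))

    def __len__(self):
        # Number of labelled images found.
        return len(self.samples)

    def __getitem__(self, index):
        # A real dataset would open the image here and apply transforms;
        # this sketch returns the path and the integer label only.
        return self.samples[index]
```

The folder-per-class layout of the first approach is the one that torchvision.datasets.ImageFolder reads, which is presumably why the rest of this post sorts the images into cat and dog folders.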

 

Separate the cat and dog images in the training set and move them into the cat and dog folders:

    import glob
    import os
    import shutil

    # Dataset root directory
    path = "./ml/dogs-vs-cats"
    # Training set directory
    train_path = path + "/train"
    # Test set directory
    test_path = path + "/test"


    # Move every image into the folder of its class
    def img_to_file(path):
        print("========= Start moving images =========")
        # Create the dog and cat folders if they do not exist yet
        if not os.path.exists(path + "/dog"):
            os.makedirs(path + "/dog")
        if not os.path.exists(path + "/cat"):
            os.makedirs(path + "/cat")
        print("Total: {} images".format(len(glob.glob(path + "/*.jpg"))))
        # Iterate over all .jpg files found by glob
        for imgPath in glob.glob(path + "/*.jpg"):
            # Normalise separators and split the path on "/"
            img = imgPath.strip("\n").replace("\\", "/").split("/")
            # The file name starts with its class, e.g. cat.0.jpg
            if img[-1].split(".")[0] == "cat":
                shutil.move(imgPath, path + "/cat")
            if img[-1].split(".")[0] == "dog":
                shutil.move(imgPath, path + "/dog")
        print("========= Finished moving images =========")


    img_to_file(train_path)
    print("Cats in training set: {} images".format(len(glob.glob(train_path + "/cat/*.jpg"))))
    print("Dogs in training set: {} images".format(len(glob.glob(train_path + "/dog/*.jpg"))))

Then 1,250 images are randomly drawn from each of the dog and cat folders (10% of each class), giving a test set of 2,500 images in total.

    import os
    import random
    import shutil


    # train_path and test_path are the directories defined in the previous snippet
    def split_train_test(fileDir, tarDir):
        if not os.path.exists(tarDir):
            os.makedirs(tarDir)
        pathDir = os.listdir(fileDir)  # file names of the source images
        filenumber = len(pathDir)
        rate = 0.1  # fraction of images to draw, e.g. 10 out of 100 is 0.1
        picknumber = int(filenumber * rate)  # number of images to take at this rate
        sample = random.sample(pathDir, picknumber)  # randomly pick picknumber images
        print("========= Start moving images =========")
        for name in sample:
            shutil.move(fileDir + name, tarDir + name)
        print("========= Finished moving images =========")


    split_train_test(train_path + "/dog/", test_path + "/dog/")
    split_train_test(train_path + "/cat/", test_path + "/cat/")
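Because random.sample draws a different subset on every run, the split is not reproducible as written. A minimal sketch of a seeded variant follows. The helper name pick_test_files and its seed parameter are hypothetical additions, and the function only selects file names rather than moving anything.

```python
import random


def pick_test_files(names, rate=0.1, seed=0):
    """Return a reproducible random subset of `names` (hypothetical helper)."""
    # Sort first so the result does not depend on directory listing order.
    ordered = sorted(names)
    pick_number = int(len(ordered) * rate)
    # A private Random instance leaves the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(ordered, pick_number)


# Example with made-up file names in the dataset's naming pattern.
names = ["dog.{}.jpg".format(i) for i in range(100)]
picked = pick_test_files(names, rate=0.1, seed=42)
print(len(picked))  # 10
```

Passing the same seed again returns the same file names, so the train/test split can be recreated later.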

Finally, we have the following structure:

train contains 22,500 images, 11,250 each of dogs and cats. test contains 2,500 images, 1,250 each of dogs and cats.
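One way to confirm these counts is to list each class folder and count its .jpg files. The sketch below assumes the train and test directories each hold a cat and a dog subfolder, as described here; the helper name count_images is made up for illustration.

```python
import os


def count_images(split_dir, classes=("cat", "dog")):
    """Count .jpg files in each class subfolder of split_dir (hypothetical helper)."""
    counts = {}
    for cls in classes:
        cls_dir = os.path.join(split_dir, cls)
        if os.path.isdir(cls_dir):
            counts[cls] = sum(
                1 for name in os.listdir(cls_dir) if name.lower().endswith(".jpg")
            )
        else:
            # A missing class folder is reported as zero images.
            counts[cls] = 0
    return counts
```

Calling count_images on the training directory after the first split should report 11,250 for each class if the numbers above hold.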

If the test set still seems too small, run the split a second time.

After the second pass, train contains 20,250 images, 10,125 each of dogs and cats. test contains 4,750 images, 2,375 each of dogs and cats.
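These totals follow from applying the 10% rate twice. The short calculation below reproduces them, assuming each class starts with 12,500 training images (the figure implied by 11,250 + 1,250 after the first split). The function name split_counts is only illustrative.

```python
def split_counts(train_per_class, test_per_class=0, rate=0.1, rounds=1):
    """Per-class train/test sizes after moving int(train * rate) images per round."""
    for _ in range(rounds):
        moved = int(train_per_class * rate)
        train_per_class -= moved
        test_per_class += moved
    return train_per_class, test_per_class


print(split_counts(12500, rounds=1))  # (11250, 1250)
print(split_counts(12500, rounds=2))  # (10125, 2375)
```

Doubling the per-class figures for the two classes gives 20,250 training images and 4,750 test images.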