
Environment

  • Windows 10 64-bit
  • VOC dataset

Introduction

Sometimes, with a large dataset such as VOC2012, we only need to train a model for a single target class, such as person. In that case, we need to extract the images of that class, together with their annotations, into a new dataset. This article shows how to do that.

Steps

Here we use the VOC2012 dataset as an example. Go to Roboflow's official public datasets and select the Pascal VOC2012 dataset.

When downloading, select the YOLO Darknet format.

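Once the download is complete and the archive is extracted, the images and their .txt annotation files sit side by side in the train folder, which is the folder the script further down reads from.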

The full VOC2012 dataset contains 20 classes, which are listed in the _darknet.labels file.
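To look up a class ID without counting lines by hand, a small helper can read the labels file. This is a minimal sketch: it assumes the file holds one class name per line and that a class ID is the zero-based position of its line, which is the usual Darknet convention. The function name find_class_id is hypothetical.

```python
from pathlib import Path


def find_class_id(labels_file, class_name):
    """Return the zero-based line index of class_name in a Darknet labels file.

    Assumes one class name per line, with the line order defining the IDs.
    """
    names = Path(labels_file).read_text(encoding="utf-8").splitlines()
    for index, name in enumerate(names):
        # Compare case-insensitively and ignore surrounding whitespace
        if name.strip().lower() == class_name.lower():
            return index
    raise ValueError("class {!r} not found in {}".format(class_name, labels_file))
```

Called from the dataset root as find_class_id('_darknet.labels', 'person'), it should return the ID used in the rest of this article, provided the file follows the assumed layout.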

Suppose we want to extract the person class. Create a new script in the root directory of the dataset and enter the following code:

import os
import shutil

# Counter for the number of images kept
counter = 0

# Create the two target folders: images for the pictures, labels for the annotations
if not os.path.exists('images'):
    os.makedirs('images')

if not os.path.exists('labels'):
    os.makedirs('labels')

for file in os.listdir('train'):
    if not file.endswith('.txt'):
        continue

    # flag becomes True as soon as a non-person annotation is found
    flag = False
    with open('train/{}'.format(file), 'r') as f:
        for line in f:
            tmp = line.split(' ')[0]
            # Class ID 14 is person; skip the file if it contains anything else
            if tmp != "14":
                print('NOT person.')
                flag = True
                break

    if not flag:
        counter += 1
        # Strip the '.txt' extension to get the matching image name
        prefix = file[0:-4]
        shutil.copy2('train/{}.jpg'.format(prefix), 'images')
        shutil.copy2('train/{}'.format(file), 'labels')

print('total number image: {}'.format(counter))

After running the script, the qualifying images are stored in the images folder and their annotations in the labels folder.
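A quick consistency check can confirm that every copied image has an annotation file and vice versa. The sketch below assumes the images use the .jpg extension and that an image and its annotation share the same file name apart from the extension. The helper name unpaired_files is hypothetical.

```python
import os


def unpaired_files(image_dir="images", label_dir="labels"):
    """Return (images_without_label, labels_without_image) as sorted lists of file stems."""
    image_stems = {os.path.splitext(name)[0] for name in os.listdir(image_dir)
                   if name.lower().endswith(".jpg")}
    label_stems = {os.path.splitext(name)[0] for name in os.listdir(label_dir)
                   if name.endswith(".txt")}
    # Set differences give the stems present on one side only
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

If both returned lists are empty, the two folders line up one to one.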

The next step is to modify the class ID in the annotations. In VOC2012, the ID of person is 14 (counting from 0). Because the new dataset contains only this one class, its ID must be 0, so the task is to change the first column of every line in the .txt files from 14 to 0.
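One scripted way to make this change, as an alternative to the GUI tool described below, is to split each annotation line into fields and rewrite only the first one. This is a sketch rather than the author's method: the helper names remap_class_id and remap_folder are hypothetical, and the code assumes whitespace-separated fields, as in YOLO Darknet annotations. It also normalizes the spacing between fields to single spaces.

```python
import os


def remap_class_id(text, old_id="14", new_id="0"):
    """Replace old_id with new_id in the first column of each annotation line."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        # Only the first field is the class ID; coordinates are left untouched
        if fields and fields[0] == old_id:
            fields[0] = new_id
        out.append(" ".join(fields))
    return "\n".join(out) + ("\n" if out else "")


def remap_folder(label_dir="labels", old_id="14", new_id="0"):
    """Rewrite every .txt file in label_dir in place; return how many files were rewritten."""
    changed = 0
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(label_dir, name)
        with open(path, "r", encoding="utf-8") as f:
            original = f.read()
        updated = remap_class_id(original, old_id, new_id)
        if updated != original:
            with open(path, "w", encoding="utf-8") as f:
                f.write(updated)
            changed += 1
    return changed
```

Because only the first field is compared, a coordinate such as 0.514 is never altered. Working on a backup copy of the labels folder is a sensible precaution, since the files are rewritten in place.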

This is essentially a find-and-replace operation, and there are many ways to do it. Here we use grepWin, an open source tool for Windows with a graphical interface that is very simple to operate. The download address is

Github.com/stefankueng…

Once installed, right-click the labels folder and select Search with grepWin to open it

Searches can use either regular expressions or plain text matching.

In this example we use plain text matching to replace 14 with 0. Note that the search string ends with a space ("14 "), and the replacement should too ("0 "). The trailing space filters out many unwanted matches.

Some .txt files contain multiple matches. Click the Matches column header at the top of the results to sort the files by number of matches.

Opening one of these .txt files shows that other columns can also match "14 " (for example, a coordinate value such as 0.514 followed by a space). Files like this need to be edited by hand. Fortunately, there are not many of them.
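To find the affected files without opening them one by one, the matches that fall outside the first column can be counted with a short script. This is a sketch under the assumption that the search string is "14 " (14 followed by a space). The helper names stray_matches and files_needing_review are hypothetical.

```python
import os


def stray_matches(text, needle="14 "):
    """Count occurrences of needle that are not the class ID at the start of a line."""
    total = 0
    for line in text.splitlines():
        hits = line.count(needle)
        if line.startswith(needle):
            hits -= 1  # the match in the class-ID column is the intended one
        total += hits
    return total


def files_needing_review(label_dir="labels", needle="14 "):
    """Return the sorted names of .txt files in which a plain text replace would hit other columns."""
    flagged = []
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name), "r", encoding="utf-8") as f:
            if stray_matches(f.read(), needle) > 0:
                flagged.append(name)
    return sorted(flagged)
```

Run it before the replacement: once the class IDs have been changed, the first-column matches are gone, but any damage to coordinate values has already been done.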

If a file has only a single match, you can simply run the text replacement.

At this point, the dataset for the single target class has been processed and is ready for training.
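Before training, it may be worth confirming that no old class ID is left in the annotations. The sketch below tallies the first column of every line in the labels folder. The helper name class_id_counts is hypothetical, and the code assumes whitespace-separated YOLO Darknet annotations.

```python
import os
from collections import Counter


def class_id_counts(label_dir="labels"):
    """Count how often each class ID (first column) appears across all .txt files."""
    counts = Counter()
    for name in os.listdir(label_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name), "r", encoding="utf-8") as f:
            for line in f:
                fields = line.split()
                if fields:  # ignore blank lines
                    counts[fields[0]] += 1
    return counts
```

For a finished single-class dataset, the only key in the result should be "0".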