Environment
- Windows 10 64-bit
- VOC dataset
Introduction
Sometimes, with a large dataset (such as VOC2012), we only need to train a model on one class (such as person). In that case, we need to pick out the images and corresponding annotations for that class and assemble them into a new dataset. This article shows how to do that.
Steps
Here we take VOC2012 as an example. Go to Roboflow's public datasets page and select the Pascal VOC 2012 dataset.
When downloading, select the YOLO Darknet format.
Once the download is complete, extract the archive. The images and their TXT annotation files sit together in the train folder.
The full VOC2012 dataset contains 20 classes, as listed in the _darknet.labels file.
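To confirm which ID a class has before filtering, the labels file can be read directly. The following is a minimal sketch, assuming that _darknet.labels holds one class name per line in ID order; the function name `class_index` is hypothetical and not part of any library.

```python
def class_index(labels_path, class_name):
    """Return the zero-based ID of class_name.

    Assumes the labels file lists one class name per line, in ID order.
    Raises ValueError if the class is not listed.
    """
    with open(labels_path, "r") as f:
        names = [line.strip() for line in f if line.strip()]
    return names.index(class_name)
```

Called with the path to the _darknet.labels file and the name person, this should return the class ID that the extraction script needs to look for.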
Suppose we want to extract the person class. Create a new script in the root of the dataset with the following code:
    import os
    import shutil

    # Number of images copied
    counter = 0

    # Create the two target folders: images holds the pictures, labels holds the annotations
    os.makedirs('images', exist_ok=True)
    os.makedirs('labels', exist_ok=True)

    for file in os.listdir('train'):
        if not file.endswith('.txt'):
            continue

        # flag becomes True as soon as an annotation of another class is found
        flag = False
        with open('train/{}'.format(file), 'r') as f:
            for line in f:
                tmp = line.split(' ')[0]
                # Class ID 14 is person
                if tmp != '14':
                    print('NOT person.')
                    flag = True
                    break

        # Copy the image and its label only if every annotation is a person
        if not flag:
            counter += 1
            prefix = os.path.splitext(file)[0]
            shutil.copy2('train/{}.jpg'.format(prefix), 'images')
            shutil.copy2('train/{}'.format(file), 'labels')

    print('total number image: {}'.format(counter))
After running the script, the qualifying images (those whose annotations are all person) are stored in the images folder and their labels in the labels folder.
The next step is to change the class ID in the annotations. In VOC2012, the ID of person is 14 (counting from 0). Since the new dataset contains only the person class, its ID should be 0, so the task is to change the 14 in the first column of every TXT file to 0.
This is essentially a find-and-replace operation, and there are many ways to do it. Here we use grepWin, an open source tool for Windows. It has a graphical interface and is very simple to use. It can be downloaded from:
Github.com/stefankueng…
Once installed, right-click the labels folder and select Search with grepWin to open it
Searches can use either regular expressions or plain text.
In this example we use plain text matching, replacing "14 " with "0 ". Note the trailing space after the number: it keeps the search from matching longer numbers that merely start with 14.
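The difference between a plain text match and an anchored regular expression can be illustrated outside grepWin. The sketch below uses Python's `re` module on two made-up annotation lines (the coordinate values are invented for illustration); whether the same pattern behaves identically in grepWin's regex mode is an assumption that has not been verified here.

```python
import re

# Two made-up YOLO annotation lines: class ID followed by four coordinates
sample = "14 0.514 0.3 0.14 0.25\n14 0.2 0.714 0.1 0.1\n"

# Plain text replacement of "14 " also hits coordinates that end in "14"
naive = sample.replace("14 ", "0 ")

# Anchoring the pattern at the start of each line limits it to the class ID column
anchored = re.sub(r"^14 ", "0 ", sample, flags=re.MULTILINE)

print(naive)
print(anchored)
```

In the plain text result the coordinates 0.514, 0.14 and 0.714 are damaged, while the anchored version only touches the first column.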
Some TXT files contain multiple matches. Click Matches at the top of the results list to sort the files by number of matches.
Opening one of these TXT files shows that other columns may also match 14. Such files have to be edited by hand; fortunately, there are not many of them.
If a file has only a single match, the text can simply be replaced.
At this point, the dataset for the single class is complete and ready for training.
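The manual edits can be avoided by rewriting only the first column with a short script instead of a text search. This is a minimal sketch and not the method used above; the function name `remap_class_id` and its parameters are hypothetical, and it assumes every annotation line has the form "class_id x y w h" separated by single spaces.

```python
import os


def remap_class_id(labels_dir, old_id="14", new_id="0"):
    """Rewrite the first column of every .txt file in labels_dir.

    Returns the number of files that were changed.
    """
    changed = 0
    for name in sorted(os.listdir(labels_dir)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(labels_dir, name)
        with open(path, "r") as f:
            lines = f.read().splitlines()

        new_lines = []
        for line in lines:
            # Split off the class ID only; the coordinates stay untouched
            parts = line.split(" ", 1)
            if parts[0] == old_id:
                parts[0] = new_id
            new_lines.append(" ".join(parts))

        # Only write the file back if something actually changed
        if new_lines != lines:
            with open(path, "w") as f:
                f.write("\n".join(new_lines) + "\n")
            changed += 1
    return changed
```

Calling `remap_class_id("labels")` from the dataset root would process the labels folder created earlier. Because the files are modified in place, it is worth keeping a backup copy of the folder first.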
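Before training, a quick consistency check can catch leftovers from the replacement step. The sketch below assumes the images and labels folders produced earlier, .jpg images, and a target class ID of 0; the function name `check_dataset` is hypothetical.

```python
import os


def check_dataset(images_dir="images", labels_dir="labels",
                  class_id="0", image_ext=".jpg"):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for name in sorted(os.listdir(labels_dir)):
        if not name.endswith(".txt"):
            continue
        stem = os.path.splitext(name)[0]

        # Every label file should have an image with the same base name
        if not os.path.exists(os.path.join(images_dir, stem + image_ext)):
            problems.append("{}: missing image".format(name))

        # Every non-empty annotation line should start with the expected class ID
        with open(os.path.join(labels_dir, name), "r") as f:
            for number, line in enumerate(f, start=1):
                fields = line.split()
                if fields and fields[0] != class_id:
                    problems.append(
                        "{}:{}: class ID {}".format(name, number, fields[0]))
    return problems
```

Running `check_dataset()` from the dataset root and printing the returned list shows which files, if any, still need attention.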