Author | Rahul Varma  Compiled by | VK  Source | Towards Data Science
The most important step in training and testing an effective machine learning model is collecting a large amount of data and using it to train the model effectively. Mini-batches help here: each iteration trains on a small batch of the data rather than on the whole set.
However, with a large number of machine learning tasks now performed on video datasets, batching videos of unequal length efficiently becomes a problem. Most methods crop the videos to equal length so that the same number of frames is extracted in each iteration. This is not particularly useful when information from every frame is needed to make a prediction, especially for self-driving cars and motion recognition.
We can create a processing method that can handle videos of different lengths.
In Glenn Jocher’s Yolov3 (github.com/ultralytics…
Class initialization
# Imports used by the methods shown in this article
import os
import time
from threading import Thread

import cv2
import magic  # python-magic, detects video files by MIME type
import numpy as np


def __init__(self, sources='streams.txt', img_size=416, batch_size=2, subdir_search=False):
    self.mode = 'images'
    self.img_size = img_size
    self.def_img_size = None

    # Collect video paths from a directory or from a text file of paths
    videos = []
    if os.path.isdir(sources):
        if subdir_search:
            for subdir, dirs, files in os.walk(sources):
                for file in files:
                    if 'video' in magic.from_file(subdir + os.sep + file, mime=True):
                        videos.append(subdir + os.sep + file)
        else:
            for elements in os.listdir(sources):
                # Join with sources so the directory check does not depend on the working directory
                if not os.path.isdir(sources + os.sep + elements) and 'video' in magic.from_file(sources + os.sep + elements, mime=True):
                    videos.append(sources + os.sep + elements)
    else:
        with open(sources, 'r') as f:
            videos = [x.strip() for x in f.read().splitlines() if len(x.strip())]

    n = len(videos)
    curr_batch = 0
    self.data = [None] * batch_size
    self.cap = [None] * batch_size
    self.sources = videos
    self.n = n
    self.cur_pos = 0

    # Start a thread to read frames from each video stream
    for i, s in enumerate(videos):
        if curr_batch == batch_size:
            break
        print('%g/%g: %s... ' % (self.cur_pos+1, n, s), end=' ')
        self.cap[curr_batch] = cv2.VideoCapture(s)
        try:
            assert self.cap[curr_batch].isOpened()
        except AssertionError:
            print('Failed to open %s' % s)
            self.cur_pos+=1
            continue
        w = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = self.cap[curr_batch].get(cv2.CAP_PROP_FPS) % 100
        frames = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_COUNT))
        # Index by batch slot (curr_batch), not by source index i, so skipped videos do not shift the slot
        _, self.data[curr_batch] = self.cap[curr_batch].read()  # guarantee first frame
        thread = Thread(target=self.update, args=([curr_batch, self.cap[curr_batch], self.cur_pos+1]), daemon=True)
        print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
        curr_batch+=1
        self.cur_pos+=1
        thread.start()
    print(' ')  # new line

    if all( v is None for v in self.data ):
        return

    # Check for common shapes (empty slots are skipped)
    s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.data if x is not None], 0)  # inference shapes
    self.rect = np.unique(s, axis=0).shape[0] == 1
    if not self.rect:
        print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')
The *__init__* function accepts four arguments. img_size is the same as in the original; the other three parameters are defined as follows:
- sources: a directory path or a text file, taken as input.
- batch_size: the required batch size.
- subdir_search: toggle this option to search all subdirectories for relevant files when a directory is passed as the sources argument.
I first check whether the sources argument is a directory or a text file. If it is a directory, I read everything in it (including subdirectories if subdir_search is True); otherwise, I read the video paths listed in the text file. The video paths are stored in a list, and cur_pos tracks the current position in that list.
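The source-gathering step described above can be sketched on its own. This is a minimal sketch rather than the author's code: it swaps python-magic's content-based detection for the standard library's mimetypes.guess_type, which only looks at the file extension, and the helper name collect_videos is hypothetical.

```python
import mimetypes
import os


def collect_videos(sources, subdir_search=False):
    """Return video paths from a directory or from a text file of paths.

    Hypothetical helper: detection is by file extension (mimetypes),
    not by file content as python-magic would do.
    """
    def is_video(path):
        mime, _ = mimetypes.guess_type(path)
        return mime is not None and mime.startswith('video')

    if not os.path.isdir(sources):
        # Text file: one video path per non-empty line
        with open(sources, 'r') as f:
            return [line.strip() for line in f if line.strip()]

    videos = []
    if subdir_search:
        # Walk the whole tree below `sources`
        for subdir, _dirs, files in os.walk(sources):
            videos += [os.path.join(subdir, name) for name in files
                       if is_video(os.path.join(subdir, name))]
    else:
        # Only look at files directly inside `sources`
        for name in os.listdir(sources):
            path = os.path.join(sources, name)
            if os.path.isfile(path) and is_video(path):
                videos.append(path)
    return sorted(videos)
```

Sorting the result is a choice made for this sketch so that the order of videos is reproducible between runs; the article's code keeps whatever order the file system returns.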
The list is iterated up to a maximum of batch_size entries, skipping videos that are faulty or do not exist. The first frame of each video is passed to the letterbox function, which resizes the image. This part is unchanged from the original version, except that initialization returns early if all videos are faulty or unavailable.
def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test images)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # width, height padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = new_shape
        ratio = new_shape[0] / shape[1], new_shape[1] / shape[0]  # width, height ratios

    dw /= 2  # divide the padding between the two sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
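To make the padding arithmetic in letterbox easier to follow, here is a small worked sketch that reproduces only the geometry, without OpenCV. The function name letterbox_geometry and its stride parameter are hypothetical; the article's code hard-codes the value 64 and does not handle the scaleFill or scaleup options covered here.

```python
def letterbox_geometry(shape, new_shape=416, auto=True, stride=64):
    """Return (resized (width, height), (top, bottom, left, right) padding)
    for an image of `shape` = (height, width)."""
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale so the longer side fits inside new_shape
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # (width, height)
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    if auto:
        # Keep only the remainder of the padding, giving the minimum rectangle
        dw, dh = dw % stride, dh % stride
    dw, dh = dw / 2, dh / 2  # split the padding between the two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return new_unpad, (top, bottom, left, right)


resized, pad = letterbox_geometry((720, 1280))
out_h = resized[1] + pad[0] + pad[1]
out_w = resized[0] + pad[2] + pad[3]
print(resized, pad, (out_h, out_w))
```

For a 720x1280 frame the scale ratio is 0.325, so the frame is resized to 416x234 (width x height). The height padding of 182 pixels is reduced to 182 mod 64 = 54 and split into 27 pixels at the top and 27 at the bottom, giving a 288x416 output. With auto=False the full 182 pixels are kept and the output is 416x416.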
Function to retrieve frames at fixed intervals
The update function has one small change: we also store the default image size. It is needed when all videos have been pulled for processing but, because of their unequal lengths, one video finishes before the others. This will become clearer when I explain the next part of the code, the *__next__* function.
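def update(self, index, cap, cur_pos):
    # Read the next stream frame in a daemon thread
    n = 0
    while cap.isOpened():
        n += 1
        # _, self.imgs[index] = cap.read()
        cap.grab()
        if n == 4:  # retrieve every 4th frame
            ok, frame = cap.retrieve()
            if not ok:
                # End of video: mark the slot empty so __next__ can refill it,
                # then stop this thread so it cannot overwrite the next video
                self.data[index] = None
                cap.release()
                break
            self.data[index] = frame
            if self.def_img_size is None:
                self.def_img_size = frame.shape
            n = 0
        time.sleep(0.01)  # wait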
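The grab-then-retrieve pattern in update can be tried out without a video file. The sketch below is an illustration under stated assumptions: FakeCapture is a hypothetical stand-in that mimics only the grab() and retrieve() methods of cv2.VideoCapture and returns integers instead of images, and sample_every_nth is a hypothetical helper that runs the sampling loop without a thread or a sleep.

```python
class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that yields integers as frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pos = -1  # index of the most recently grabbed frame

    def grab(self):
        # Advance to the next frame without decoding it
        self.pos += 1
        return self.pos < self.num_frames

    def retrieve(self):
        # Decode the most recently grabbed frame
        if 0 <= self.pos < self.num_frames:
            return True, self.pos
        return False, None


def sample_every_nth(cap, step=4):
    """Collect every `step`-th frame until the capture runs out."""
    kept = []
    n = 0
    while cap.grab():
        n += 1
        if n == step:
            ok, frame = cap.retrieve()
            if ok:
                kept.append(frame)
            n = 0
    return kept


print(sample_every_nth(FakeCapture(10)))  # prints [3, 7]
```

Only the retrieved frames are decoded; the three grabbed frames in between are skipped, which is what keeps the sampling cheap.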
The iterator
If the frame exists, it is passed to the letterbox function as usual. If the frame is None, which means that video has been fully processed, we check whether all the videos in the list have been processed. If there are more videos to process, the cur_pos pointer is used to get the location of the next available video.
If no more videos remain in the list but some videos are still being processed, blank frames are sent to fill the remaining batch slots. In other words, the finished slots are padded dynamically to match the frames still remaining in the other slots.
def __next__(self):
    self.count += 1  # assumes self.count is initialized elsewhere in the class, e.g. in __iter__
    img0 = self.data.copy()
    img = []
    for i, x in enumerate(img0):
        if x is not None:
            img.append(letterbox(x, new_shape=self.img_size, auto=self.rect)[0])
        else:
            if self.cur_pos == self.n:
                # No videos left in the list
                if all( v is None for v in img0 ):
                    cv2.destroyAllWindows()
                    raise StopIteration
                else:
                    # Pad the finished slot with a blank frame (uint8, like real frames)
                    img0[i] = np.zeros(self.def_img_size, dtype=np.uint8)
                    img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
            else:
                # Refill the slot with the next video in the list
                print('%g/%g: %s... ' % (self.cur_pos+1, self.n, self.sources[self.cur_pos]), end=' ')
                self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                fldr_end_flg = 0
                while not self.cap[i].isOpened():
                    print('Failed to open %s' % self.sources[self.cur_pos])
                    self.cur_pos+=1
                    if self.cur_pos == self.n:
                        img0[i] = np.zeros(self.def_img_size, dtype=np.uint8)
                        img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                        fldr_end_flg = 1
                        break
                    self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                if fldr_end_flg:
                    continue
                # Query the capture stored in this slot (a bare `cap` is not defined here)
                w = int(self.cap[i].get(cv2.CAP_PROP_FRAME_WIDTH))
                h = int(self.cap[i].get(cv2.CAP_PROP_FRAME_HEIGHT))
                fps = self.cap[i].get(cv2.CAP_PROP_FPS) % 100
                frames = int(self.cap[i].get(cv2.CAP_PROP_FRAME_COUNT))
                _, self.data[i] = self.cap[i].read()  # guarantee the first frame
                img0[i] = self.data[i]
                img.append(letterbox(self.data[i], new_shape=self.img_size, auto=self.rect)[0])
                thread = Thread(target=self.update, args=([i, self.cap[i], self.cur_pos+1]), daemon=True)
                print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
                self.cur_pos+=1
                thread.start()
                print(' ')  # new line

    # Stack
    img = np.stack(img, 0)

    # Convert
    img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
    img = np.ascontiguousarray(img)
    return self.sources, img, img0, None
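The slot-refilling idea behind *__next__* can also be sketched without OpenCV or threads. The following is a minimal sketch under stated assumptions, not the author's implementation: each video is already an in-memory NumPy array of shape (frames, H, W, C), all videos share one frame shape, and the function name variable_length_batches and the returned is_real flags are hypothetical additions for illustration.

```python
import numpy as np


def variable_length_batches(videos, batch_size=2):
    """Yield (batch, is_real) pairs from a list of arrays shaped (frames, H, W, C).

    Each of the `batch_size` slots plays one video. When a slot's video ends,
    the slot takes the next unread video; once none are left, it is filled
    with blank frames until every slot has finished.
    """
    if not videos:
        return
    # Assumes all videos share one frame shape and dtype
    blank = np.zeros(videos[0].shape[1:], dtype=videos[0].dtype)
    queue = iter(videos)
    slots = [next(queue, None) for _ in range(batch_size)]  # current video per slot
    cursor = [0] * batch_size                                # next frame index per slot

    while True:
        frames, is_real = [], []
        for i in range(batch_size):
            # Refill a slot whose video is exhausted (skipping empty videos)
            while slots[i] is not None and cursor[i] >= len(slots[i]):
                slots[i], cursor[i] = next(queue, None), 0
            if slots[i] is None:
                frames.append(blank)       # pad the finished slot
                is_real.append(False)
            else:
                frames.append(slots[i][cursor[i]])
                is_real.append(True)
                cursor[i] += 1
        if not any(is_real):
            return  # every slot is finished
        yield np.stack(frames, 0), is_real


# Three videos of 4, 1 and 1 frames, filled with 10, 20 and 30 respectively
videos = [np.full((n, 2, 2, 3), fill, dtype=np.uint8)
          for n, fill in [(4, 10), (1, 20), (1, 30)]]
for batch, is_real in variable_length_batches(videos):
    print(batch.shape, is_real, batch[:, 0, 0, 0].tolist())
```

In this run the second slot plays the 1-frame videos one after the other and is then padded with blank frames while the 4-frame video finishes in the first slot. The is_real flags are one possible way to let downstream code ignore predictions made on padding; the article's code does not return such a mask.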
Conclusion
With so much time spent on data collection and preprocessing, I believe this can help reduce the time it takes to fit the videos to the model, so that we can concentrate on fitting the model to the data.
I’ve attached the full source code here. Hope this helps!
Original link: towardsdatascience.com/variable-si…