AI smart barrage (also known as masked barrage): barrages (danmaku, or bullet comments) float over the video but never cover the people in it. This "black technology" originated on Bilibili's Web client and is implemented here in apps on both iOS and Android. It is now used in short video, live streaming, and other media products, where it significantly improves the user experience.
Besides the cross-platform implementation built on Flutter, this article also explains how to process an arbitrary video stream into a mask data source with opencv-python, giving a complete front-end and back-end AI system built from zero to one. The overall approach is outlined below:
- Python backend:
  - Extract the key frames of the video stream in order and save them as images
  - Pass every key frame to a neural network model that erases everything except the people, and save the resulting frame
  - Convert each people-only frame to a grayscale image, then to an inverted black-and-white image
  - Detect the contour coordinates of the inverted black-and-white image and generate a time: path configuration file for the front end
- Flutter front end:
  - Implement a barrage scheduling animation group
  - Following the configuration file, clip the outer barrage container into a shape with a hole that reveals exactly the people, also known as the mask
  - Introduce a player, play the video stream, and render the matching mask shape in sync with each key frame
- Extensions:
  - Web front-end implementation
  - Video on demand and live streaming
  - Summary and optimization
1. The Python backend
1.1 Extracting key frames
    # config.py -- configuration file
    import os
    import cv2

    VIDEO_NAME = 'source.mp4'  # name of the video file to process
    FACE_KEY = '*****'         # AI recognition key
    FACE_SECRET = '*****'      # AI recognition secret

    dirPath = os.path.dirname(os.path.abspath(__file__))
    cap = cv2.VideoCapture(os.path.join(dirPath, VIDEO_NAME))
    FPS = round(cap.get(cv2.CAP_PROP_FPS), 0)

    # Take one key frame every FRAME_CD frames
    FRAME_CD = max(1, round(FPS / 30))

    # Read the frame count from the capture; cv2.CAP_PROP_FRAME_COUNT itself is only a property id
    if cap.get(cv2.CAP_PROP_FRAME_COUNT) / FRAME_CD >= 900:
        raise Warning('The number of keyframes in your video has exceeded 900. Reduce the video length or FPS frame rate!')
This configuration file first reads the frame rate of the video. For a 30 FPS video, every frame is treated as a key frame, while for a 60 FPS video every second frame is (FRAME_CD is 2). This keeps the mask-drawing load on Flutter uniform. Note also that because the DEMO is completely offline, both the video and the final mask file are packed into the app, so the video file should not be too large.
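The sampling rule can be checked without opening a video. The following is a minimal sketch of the same arithmetic; the helper names `frame_cd`, `keyframe_count`, and `check_video` are hypothetical and do not appear in the project, and the 30 FPS target and 900-frame limit are taken from the configuration file above.

```python
def frame_cd(fps, target_fps=30):
    """Number of source frames per key frame (1 means keep every frame)."""
    return max(1, round(fps / target_fps))


def keyframe_count(total_frames, fps, target_fps=30):
    """How many key frames the extraction step would write."""
    return total_frames // frame_cd(fps, target_fps)


def check_video(total_frames, fps, limit=900):
    """Reject videos that would produce too many key frames for the offline DEMO."""
    count = keyframe_count(total_frames, fps)
    if count >= limit:
        raise ValueError('too many key frames: %d' % count)
    return count
```

For a 20-second clip, `check_video(600, 30)` and `check_video(1200, 60)` both yield 600 key frames, which is the point of the rule: the front end sees the same number of masks per second whatever the source frame rate.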
    # Extract key frames
    import os
    import shutil
    import cv2
    import config

    dirPath = os.path.dirname(os.path.abspath(__file__))
    images_path = dirPath + '/images'
    cap = cv2.VideoCapture(os.path.join(dirPath, config.VIDEO_NAME))
    count = 1

    # Start from an empty images directory
    if os.path.exists(images_path):
        shutil.rmtree(images_path)
    os.makedirs(images_path)

    # Loop to read each frame of the video
    while True:
        ret, frame = cap.read()
        if ret:
            if count % config.FRAME_CD == 0:
                print('the number of frames: ' + str(count))
                # Save the key frame image
                cv2.imwrite(images_path + '/frame' + str(count) + '.jpg', frame)
            count += 1
        else:
            print('frames were created successfully')
            break

    cap.release()
Here, OpenCV extracts the key-frame images of the video and saves them in the images folder under the current directory.
1.2 Character extraction through AI model
Extracting the people from an image is a job for a convolutional neural network. How well the model is trained strongly affects classification accuracy, which directly determines the final effect. Large companies have algorithm teams to train their own models. Our DEMO uses the open test interface provided by Face++. Its accuracy is the same as the paid service, but requests are rate-limited, and the failure rate is as high as 80%.
    import os
    import shutil
    import base64
    import re
    import json
    import threading
    import requests
    import config

    dirPath = os.path.dirname(os.path.abspath(__file__))
    clip_path = dirPath + '/clip'

    if not os.path.exists(clip_path):
        os.makedirs(clip_path)

    # Image recognition class
    class multiple_req:
        reqTimes = 0
        filename = None
        data = {
            'api_key': config.FACE_KEY,
            'api_secret': config.FACE_SECRET,
            'return_grayscale': 1
        }

        def __init__(self, filename):
            self.filename = filename

        def once_again(self):
            # Success rate is about 10%, so record how many times this image has failed
            self.reqTimes += 1
            print(self.filename + ' fail times:' + str(self.reqTimes))
            return self.reqfaceplus()

        def reqfaceplus(self):
            abs_path_name = os.path.join(dirPath, 'images', self.filename)
            try:
                # Submit the image as a binary file; the handle is closed after each attempt
                with open(abs_path_name, 'rb') as image_file:
                    response = requests.post(
                        '',  # recognition API URL
                        data=self.data,
                        files={'image_file': image_file})
                res_data = json.loads(response.text)
                # The free API is very likely to be rate-limited and return a failure,
                # so call recursively until this image is recognized
                if 'error_message' in res_data:
                    return self.once_again()
                else:
                    # Recognition succeeded, return the result
                    return res_data
            except requests.exceptions.RequestException as e:
                return self.once_again()

    # Function run in parallel by each thread
    def thread_req(n):
        # Create the image recognition instance
        multiple_req_ins = multiple_req(filename=n)
        res = multiple_req_ins.reqfaceplus()
        # The result contains a base64-encoded color image and grayscale image
        img_data_color = base64.b64decode(res['body_image'])
        img_data = base64.b64decode(res['result'])

        with open(dirPath + '/clip/clip-color-' + n, 'wb') as f:
            # Save the color image
            f.write(img_data_color)
        with open(dirPath + '/clip/clip-' + n, 'wb') as f:
            # Save the grayscale image
            f.write(img_data)
        print(n + ' clip saved.')

    # Read all the key-frame images prepared earlier and recognize them
    image_list = os.listdir(os.path.join(dirPath, 'images'))
    image_list_sort = sorted(image_list, key=lambda name: int(re.sub(r'\D', '', name)))
    has_cliped_list = os.listdir(clip_path)
    for n in image_list_sort:
        # Skip frames whose grayscale and color results already exist
        if 'clip-' + n in has_cliped_list and 'clip-color-' + n in has_cliped_list:
            continue
        # A separate thread recurses for each frame to achieve a parallel effect.
        # The main process exits after all images are recognized and saved, which takes several minutes.
        # (Each thread is constantly recursing on network requests, hanging, waiting, and writing IO, not using CPU.)
        t = threading.Thread(target=thread_req, name=n, args=[n])
        t.start()
First, the script lists all key frames in the images directory and starts a separate thread for each one. Each thread creates an instance of the recognition class multiple_req, which recursively resubmits the current image until recognition succeeds (please apply for your own free KEY; I'm afraid Face++ will ban my account). The recognized image is then returned and saved in the clip directory. Because the interface's success rate is very low, the same image may be submitted dozens of times, but most of that time is spent waiting on network transfer and IO reads and writes, so hundreds of threads will not saturate a CPU core. After a few minutes, once all results have returned, the script exits automatically.
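The retry-per-thread pattern can be tried without a network or an API key. The sketch below replaces the recursion with a bounded loop, which is a different technique from the script above and is chosen only so that a frame cannot retry forever. Everything in it is hypothetical: `make_flaky_call` is a stand-in for the remote interface that fails a fixed number of times, and the error string is a placeholder rather than a real response.

```python
import threading


def retry_until_success(call, max_attempts=100):
    """Call `call()` until the result has no 'error_message'; return (result, attempts)."""
    for attempt in range(1, max_attempts + 1):
        result = call()
        if 'error_message' not in result:
            return result, attempt
    raise RuntimeError('gave up after %d attempts' % max_attempts)


def make_flaky_call(name, failures_before_success):
    """Hypothetical stand-in for the recognition API: fails N times, then succeeds."""
    state = {'calls': 0}

    def call():
        state['calls'] += 1
        if state['calls'] <= failures_before_success:
            return {'error_message': 'RATE_LIMITED (simulated)'}
        return {'result': 'mask-for-' + name}

    return call


def recognize_all(frames, failures_before_success=3):
    """Run one worker thread per frame and collect {frame: (result, attempts)}."""
    results = {}
    lock = threading.Lock()

    def worker(name):
        res, attempts = retry_until_success(make_flaky_call(name, failures_before_success))
        with lock:  # the dict is shared by all worker threads
            results[name] = (res['result'], attempts)

    threads = [threading.Thread(target=worker, name=n, args=[n]) for n in frames]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every frame before returning
    return results
```

Calling `recognize_all(['frame1.jpg', 'frame2.jpg'])` returns one entry per frame, each recording that it took four attempts. With a real interface the waits are network-bound, which is why one thread per frame is affordable here.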
1.3 Pixel conversion and contour path generation
The algorithm has already extracted the people-only key frames for us. Next we use OpenCV to convert the pixels: each people-only key frame is turned first into a grayscale image, then into an inverted black-and-white image, and finally into contour JSON.
    import os
    import json
    import re
    import shutil
    import cv2
    import config

    dirPath = os.path.dirname(os.path.abspath(__file__))
    clip_path = os.path.join(dirPath, 'mask')
    cap = cv2.VideoCapture(os.path.join(dirPath, config.VIDEO_NAME))
    frame_width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)    # resolution (width)
    frame_height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # resolution (height)
    FPS = round(cap.get(cv2.CAP_PROP_FPS), 0)          # video FPS
    mask_cd = int(1000 / FPS * config.FRAME_CD)        # time of the first key frame
    milli_seconds_plus = mask_cd                       # time added for each following key frame
    jsonTemp = {  # the JSON configuration that is finally stored
        'mask_cd': mask_cd,
        'frame_width': frame_width,
        'frame_height': frame_height
    }

    if os.path.exists(clip_path):
        shutil.rmtree(clip_path)
    os.makedirs(clip_path)

    # Output the black-and-white image and the set of contour coordinates
    def output_clip(filename):
        global mask_cd
        # Read the source image (already grayscale in our case)
        img = cv2.imread(os.path.join(dirPath, 'clip', filename))
        # Convert to a single-channel grayscale image for the next step
        gray_in = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Inverse color transform: gray_in is a 2D matrix of values 0-255, and we swap black and white
        gray = 255 - gray_in
        # Convert the grayscale image to pure black and white, either 0 or 255
        _, binary = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)
        # Save the black-and-white image for reference
        cv2.imwrite(clip_path + '/invert-' + filename, binary)
        # Find the enclosing shapes in the black-and-white image to form contour data
        contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        # Parse the contour data
        clip_list = []
        for item in contours:
            if item.size > 0:
                # Each contour has shape (n, 1, 2). n is the number of coordinates that make up the shape,
                # 1 is meaningless, and 2 stands for the x and y coordinates
                rows, _, __ = item.shape
                clip = []
                clip_list.append(clip)
                for i in range(rows):
                    # Change np.ndarray to list, otherwise it cannot be serialized to JSON
                    clip.append(item[i, 0].tolist())

        time_key = str(mask_cd)
        # Store the contours of this frame under the key of the frame's time
        jsonTemp[time_key] = clip_list
        print(filename + ' time(' + time_key + ') data.')
        mask_cd += milli_seconds_plus

    # List the grayscale images returned by the algorithm
    clipFrame = []
    for name in os.listdir(os.path.join(dirPath, 'clip')):
        if not re.match(r'^clip-frame', name):
            continue
        clipFrame.append(name)

    # Sort the file names by frame order
    clipFrameSort = sorted(clipFrame, key=lambda name: int(re.sub(r'\D', '', name)))
    for name in clipFrameSort:
        output_clip(name)

    # After all coordinates are extracted, write the JSON for Flutter
    jsObj = json.dumps(jsonTemp)
    fileObject = open(os.path.join(dirPath, 'res.json'), 'w')
    fileObject.write(jsObj)
    fileObject.close()
    print('calc done')
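To make the two pixel steps in the script concrete, here is what the inversion and the threshold do to each value, spelled out in plain Python on a tiny hand-made grid. This is an illustration only: the function names are hypothetical, and the script itself performs the same arithmetic on whole matrices through OpenCV and NumPy rather than through Python loops.

```python
def invert(gray):
    """Swap black and white: every value v becomes 255 - v."""
    return [[255 - v for v in row] for row in gray]


def to_binary(gray, threshold=220, maxval=255):
    """Same rule as cv2.THRESH_BINARY: values above the threshold become maxval, the rest 0."""
    return [[maxval if v > threshold else 0 for v in row] for row in gray]


# A 2x3 "image": the bright pixel (250) stands for a person, the dark ones for background.
gray_in = [
    [0, 10, 0],
    [0, 250, 0],
]
binary = to_binary(invert(gray_in))
```

After both steps the person pixel is 0 (black) and every background pixel is 255 (white), so the white region is the area where barrages may be drawn.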
Each people-only key frame goes through several layers of pixel manipulation. OpenCV loads an image as a NumPy matrix (3D for a color image, 2D once it is converted to grayscale), which is fast to compute on and easy to work with. For example, to swap black and white in the grayscale matrix gray_in, gray = 255 - gray_in yields a new matrix without any Python-level loop. Finally, the closed contour paths computed for each frame are converted into ordinary nested lists and stored in the configuration file as a map. The key is the video progress time in ms, and the value is the set of closed paths (the paths enclosing the white areas in the figure, excluding the black person area). The value is a list of paths, each itself a list of [x, y] points, because one frame can contain n closed paths. The video information is stored in the configuration file as well: mask_cd tells Flutter how many ms to wait before switching to the next frame's mask, and the width and height of the video are used to initialize Flutter's adaptive player layout. The detailed JSON data structure can be seen in the image above. Now that we have a res.json configuration file containing the clipping coordinates for every key frame of the video, we can let Flutter do the paper cutting.
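As a rough illustration of that layout, the sketch below builds a two-frame configuration by hand and serializes it. The helper `build_mask_config` and all coordinates are made up for the example; only the key names mask_cd, frame_width, and frame_height and the time-to-paths mapping come from the script above.

```python
import json


def build_mask_config(mask_cd, frame_width, frame_height, frames):
    """Store video info plus one entry per key frame, keyed by its time in ms."""
    config = {
        'mask_cd': mask_cd,
        'frame_width': frame_width,
        'frame_height': frame_height,
    }
    for index, contours in enumerate(frames, start=1):
        config[str(index * mask_cd)] = contours
    return config


example = build_mask_config(33, 1280.0, 720.0, [
    # Frame at 33 ms: one closed path around the whole picture.
    [[[0, 0], [1279, 0], [1279, 719], [0, 719]]],
    # Frame at 66 ms: two closed paths, one on each side of a person.
    [[[0, 0], [600, 0], [600, 719], [0, 719]],
     [[700, 0], [1279, 0], [1279, 719], [700, 719]]],
])
text = json.dumps(example)
```

Because JSON object keys are strings, the front end has to look frames up with the time converted to a string, for example `cfg['66']`.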
2. The Flutter front end
2.1 Bullet screen scheduling animation group
Barrage scheduling is implemented in almost the same way on every platform; only the animation library's API differs. In Flutter, a single barrage can be animated with SlideTransition.
    // core.dart
    class Barrage extends StatefulWidget {
      final BarrageController barrageController;

      Barrage(this.barrageController, {Key key}) : super(key: key);

      @override
      _BarrageState createState() => _BarrageState();
    }

    class _BarrageState extends State<Barrage> with TickerProviderStateMixin {
      AnimationController _animationController;
      Animation<Offset> _offsetAnimation;
      _PlayPauseState _playPauseState;

      void _initAnimation() {
        final barrageController = widget.barrageController;
        _animationController = AnimationController(
          value: barrageController.value.scrollRate,
          duration: barrageController.duration,
          vsync: this,
        );
        _animationController.addListener(() {
          barrageController.setScrollRate(_animationController.value);
        });
        // Slide from just off the right edge to just off the left edge
        _offsetAnimation = Tween<Offset>(
          begin: const Offset(1.0, 0.0),
          end: const Offset(-1.0, 0.0),
        ).animate(_animationController);
        _playPauseState = _PlayPauseState(barrageController)
          ..init()
          ..addListener(() {
            _playPauseState.isPlaying
                ? _animationController.forward()
                : _animationController.stop(canceled: false);
          });
        if (_playPauseState.isPlaying) {
          _animationController.forward();
        }
      }

      void _disposeAnimation() {
        _animationController.dispose();
        _playPauseState.dispose();
      }

      @override
      void initState() {
        super.initState();
        _initAnimation();
      }

      @override
      void didUpdateWidget(Barrage oldWidget) {
        super.didUpdateWidget(oldWidget);
        _disposeAnimation();
        _initAnimation();
      }

      @override
      void deactivate() {
        _disposeAnimation();
        super.deactivate();
      }

      @override
      Widget build(BuildContext context) {
        return SlideTransition(
          position: _offsetAnimation,
          child: SizedBox(
            width: double.infinity,
            child: widget.barrageController.content,
          ),
        );
      }
    }
When a large number of barrages arrive, we first create multiple barrage lanes in a Container layered above the player, then use an algorithm to schedule which lane each barrage should appear in and initialize its animation. Once a barrage has left the screen, its animation is disposed and its Widget is removed.
On top of this, some randomness is added to the timing so that the travel time of each barrage animation differs slightly, which improves the look of the whole barrage flow. For the detailed barrage scheduling code, refer to the project's core.dart file. I will not go into details here.
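The scheduling idea can be sketched in a few lines, written in Python here to stay independent of the Flutter animation API. This is not the project's algorithm (that lives in core.dart): `LaneScheduler`, its parameters, and the rule of picking the lane that frees up first are all assumptions made for illustration.

```python
import random


class LaneScheduler:
    """Hypothetical scheduler: each barrage goes to the lane whose entrance frees up first."""

    def __init__(self, lane_count, base_duration_ms=8000, jitter_ms=500,
                 entry_ms=1500, rng=None):
        self.free_at = [0] * lane_count        # when each lane's entrance is free again
        self.base_duration_ms = base_duration_ms
        self.jitter_ms = jitter_ms             # random spread added to the travel time
        self.entry_ms = entry_ms               # assumed time a barrage needs to fully enter
        self.rng = rng or random.Random()

    def schedule(self, now_ms):
        """Return (lane, start_ms, duration_ms) for a barrage arriving at now_ms."""
        lane = min(range(len(self.free_at)), key=lambda i: self.free_at[i])
        start_ms = max(now_ms, self.free_at[lane])
        duration_ms = self.base_duration_ms + self.rng.uniform(
            -self.jitter_ms, self.jitter_ms)
        self.free_at[lane] = start_ms + self.entry_ms
        return lane, start_ms, duration_ms
```

With two lanes, three barrages arriving at the same moment land in lane 0, lane 1, and then lane 0 again after a delay of entry_ms. The jitter should stay small relative to the base duration, otherwise a fast barrage can catch up with a slower one in the same lane.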
2.2 Clipping mask containers
    class Index extends StatefulWidget {
      //...
    }

    class IndexState extends State<Index> with WidgetsBindingObserver {
      //...
      Map cfg;

      @override
      void initState() {
        super.initState();
        WidgetsBinding.instance.addObserver(this);
        Future<String> loadString =
            DefaultAssetBundle.of(context).loadString("py/res.json");
        loadString.then((String value) {
          setState(() {
            cfg = json.decode(value);
          });
        });
      }
      //...
    }
A production environment would have to fetch this data in real time over an HTTP long connection or a socket. Since this is an offline DEMO, for convenience the res.json mask file output by the backend is packaged into the app and loaded during initialization.
    // barrage.dart
    class BarrageInit extends StatefulWidget {
      final Map cfg;

      const BarrageInit({Key key, this.cfg}) : super(key: key);

      @override
      BarrageInitState createState() => BarrageInitState();
    }

    class BarrageInitState extends State<BarrageInit> {
      //...
      BarrageWallController _controller;
      List curMaskData;
      //...

      @override
      Widget build(BuildContext context) {
        num scale = MediaQuery.of(context).size.width / widget.cfg['frame_width'];
        return ClipPath(
          clipper: curMaskData != null ? MaskPath(curMaskData, scale) : null,
          child: Container(
            color: Colors.transparent,
            child: _controller.buildView(),
          ),
        );
      }
    }

    class MaskPath extends CustomClipper<Path> {
      List<dynamic> curMaskData;
      num scale;

      MaskPath(this.curMaskData, this.scale);

      @override
      Path getClip(Size size) {
        var path = Path();
        curMaskData.forEach((maskEach) {
          for (var i = 0; i < maskEach.length; i++) {
            if (i == 0) {
              path.moveTo(maskEach[i][0] * scale, maskEach[i][1] * scale);
            } else {
              path.lineTo(maskEach[i][0] * scale, maskEach[i][1] * scale);
            }
          }
        });
        return path;
      }

      @override
      bool shouldReclip(CustomClipper<Path> oldClipper) {
        return true;
      }
    }
This is the heart of how Flutter achieves the mask effect. The CustomClipper class lets us define a clipping path through a Path object with custom coordinates (similar to drawing on a Canvas). We create a MaskPath that draws the path for a frame from the configuration file we just loaded, and by wrapping the barrage container in ClipPath we get the clipping mask effect:
In order to see the Container more clearly, we set the background color of the Container to Colors.transparent.
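The coordinate handling inside getClip is easy to check in isolation. The following Python sketch mirrors it under one assumption taken from the build method above: the mask is scaled by screen width divided by frame_width. The function name `scale_contours` and the tuple-based command list are hypothetical stand-ins for calls on a Flutter Path.

```python
def scale_contours(contours, screen_width, frame_width):
    """Turn mask contours into (op, x, y) drawing commands in screen coordinates."""
    scale = screen_width / frame_width
    commands = []
    for contour in contours:
        for i, (x, y) in enumerate(contour):
            # The first point of each contour starts a new sub-path; the rest extend it.
            op = 'moveTo' if i == 0 else 'lineTo'
            commands.append((op, x * scale, y * scale))
    return commands


# A 720-pixel-wide frame shown on a 360-pixel-wide screen is scaled by 0.5.
commands = scale_contours([[[0, 0], [720, 0], [720, 400]]], 360, 720)
```

Each contour becomes its own sub-path, which is how several separate white regions of one frame end up in a single clipping path.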
2.3 Video stream mask synchronization
First we need to introduce a player. Considering the stability of the iOS and Android plugins, we use video_player, the player plugin officially provided by Flutter.
    class VedioBg extends StatefulWidget {
      //...
    }

    class VedioBgState extends State<VedioBg> {
      VideoPlayerController _controller;
      Future _initializeVideoPlayerFuture;
      bool _playing;
      num inMilliseconds = 0;
      Timer timer;
      //...

      @override
      void initState() {
        super.initState();
        int cd = widget.cfg['mask_cd'];
        _controller = VideoPlayerController.asset('py/source.mp4')
          ..setLooping(true)
          ..addListener(() {
            final bool isPlaying = _controller.value.isPlaying;
            final int nowMilliseconds = _controller.value.position.inMilliseconds;
            // Playback has just started, or the video has looped back to the beginning
            if ((inMilliseconds == 0 && nowMilliseconds > 0) ||
                nowMilliseconds < inMilliseconds) {
              timer?.cancel();
              int stepsTime = (nowMilliseconds / cd).round() * cd;
              timer = Timer.periodic(Duration(milliseconds: cd), (timer) {
                stepsTime += cd;
                //... notify the mask container to redraw the path stored under key stepsTime
              });
            }
            inMilliseconds = nowMilliseconds;
            _playing = isPlaying;
          });
        _initializeVideoPlayerFuture = _controller.initialize().then((_) {
          //...
        });
      }
      //...
    }
After the video is initialized, use addListener to listen to the playback progress. When the progress changes, take the current progress in milliseconds and find the key stepsTime in the configuration file that is closest to it. The mask stored under that key is the clipping mask for the frame currently playing. Then immediately notify the mask container to redraw with the path array whose key is stepsTime, which calibrates the mask. There are two problems in practice:
- How do we determine which frame's data set the current progress is closest to?
- A: The interval mask_cd was calculated and written into the configuration when the data was prepared. It is the interval at which the key frames were originally extracted, so the nearest key can be calculated directly: int stepsTime = (nowMilliseconds / mask_cd).round() * mask_cd;
- The player's progress callback fires only about once every 500 milliseconds, but for the best experience we cannot accept such a long delay, otherwise the picture and the mask cannot be kept in sync.
- A: Each time a progress change is triggered, create a new periodic timer whose period is the mask_cd mentioned above, and save the current progress time. During the next 500 milliseconds, even though the player does not report the progress, we keep accumulating the time ourselves and notify the mask to redraw and calibrate from inside the timer. Remember to clear the timer when the video finishes and restarts in loop mode.
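Both answers come down to integer arithmetic, sketched below in Python with hypothetical function names. One detail is an assumption worth stating: the rounding is written so that halves round up, to match Dart's round() for non-negative values, because Python's built-in round() rounds halves to the nearest even number.

```python
def snap_to_mask_key(now_ms, mask_cd):
    """Nearest multiple of mask_cd to the reported playback position."""
    return (now_ms + mask_cd // 2) // mask_cd * mask_cd


def keys_until_next_callback(last_key, mask_cd, callback_gap_ms=500):
    """Mask keys the periodic timer steps through before the player reports again."""
    ticks = callback_gap_ms // mask_cd
    return [last_key + mask_cd * k for k in range(1, ticks + 1)]


def mask_for(cfg, key_ms):
    """Look up the contours for a key; JSON keys are strings. Returns None if absent."""
    return cfg.get(str(key_ms))
```

With mask_cd = 33, a reported position of 500 ms snaps to key 495, and the timer then advances through 528, 561, 594 and so on, roughly fifteen steps, before the next 500 ms callback arrives to recalibrate.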
At this point, a Flutter AI barrage player is basically complete.
3. Extensions
3.1 Web front-end implementation
The Web front-end implementation is simpler than the native one, so it is only mentioned briefly here. The data flow on the server is the same, but if you only need to serve the Web front end, there is no need to convert the grayscale image to a JSON configuration. The WebKit browser kernel does a lot of the work for us.
3.2 Video on Demand and live broadcast
In essence, there is no difference between the two for masked barrages. A video site cannot encode a whole video as a single MP4 file and play it to the user; it returns m4s or FLV video slices to the user over a long connection, so live streaming and video on demand work the same way. The mask configuration, whether base64 images on the Web side or the JSON coordinate points required by the app, has to be encoded into the binary stream together with the video slice. The client pulls the stream and decodes it, feeds the video part to the player, and extracts the mask information separately. The two parts must travel in the same packet; if they are transmitted separately, the picture and the mask fall out of sync. In the live scenario, as the video is uploaded to the cloud, key frames have to be extracted in real time, run through image recognition and classification, and finally encoded and pushed to the client. This process takes time, so a delay in live rooms with a first-version masked barrage is normal.
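To illustrate what carrying both parts in one packet can look like, here is a minimal sketch of a made-up framing: a 4-byte big-endian length, the mask JSON, then the video bytes. This layout is purely hypothetical and is not the format used by any real streaming protocol or by Bilibili. It only shows that the client can split one packet back into the two parts it needs.

```python
import json
import struct


def pack_segment(mask, video_bytes):
    """Prefix the video slice with its mask data: [mask length][mask JSON][video]."""
    mask_bytes = json.dumps(mask).encode('utf-8')
    return struct.pack('>I', len(mask_bytes)) + mask_bytes + video_bytes


def unpack_segment(packet):
    """Split a packet back into (mask, video_bytes)."""
    (mask_len,) = struct.unpack('>I', packet[:4])
    mask = json.loads(packet[4:4 + mask_len].decode('utf-8'))
    return mask, packet[4 + mask_len:]


packet = pack_segment({'33': [[[0, 0], [10, 0], [10, 10]]]}, b'\x00\x01video-slice')
mask, video = unpack_segment(packet)
```

Since the mask arrives in the same packet as the slice it describes, the player and the mask layer always receive matching data, which is the synchronization property described above.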
3.3 Summary and optimization
Flutter currently lacks a stable, open-source, full-featured player plugin. The official plugin provides only basic functions and cannot handle, for example, sliced live streams. Some third-party plugins are unreliable and fail to keep up with new Flutter versions. As a result, commercial use requires a substantial R&D investment in native plugins. This AI barrage DEMO also has details that could be optimized, such as adding progress control for the masked player, portrait and landscape switching, special-effect barrages, and so on. This article shows only fragments of the code. For the complete front-end and back-end code, please refer to: