Qiwu Tech Guide
This article introduces how to develop a real-time interactive AI video application with WebRTC. Based on the initial research results and implementation constraints, the author's team tried two solutions.
This article is reprinted from Qiwu Weekly
Project background
Recently, I was in charge of a large-screen visualization project for face and gesture recognition. The front end's main tasks were to obtain a real-time video stream from the camera, play it on a canvas, extract a frame every 1000 ms, compress it, and send it to the back-end server over a persistent WebSocket connection. After the server-side AI vision model processes the frame, the front end receives structured data such as face and gesture recognition results, and uses it to drive human-computer interaction and data display according to the business scenario.
Based on the initial research results and implementation constraints, we tried two solutions in succession.
Solutions
Solution 1: Obtain a network video stream over WebSocket
An IP camera (IPC) is used, and its standard video output format is RTSP (Real Time Streaming Protocol). Native browser video players cannot play this directly, so the source video stream has to be transcoded and decoded. After some research we adopted JSMpeg, a lightweight pure-JS decoder: the camera's HEVC-encoded RTSP stream is transcoded into MPEG1 video, which JSMpeg can decode in the browser. This project does not analyze audio, so the audio stream is simply discarded and not encoded. JSMpeg then connects to a WebSocket server that sends binary MPEG-TS data; the front end keeps this long-lived connection open, receives the TS fragments of the video stream, and plays them on a canvas. In our tests, JSMpeg could handle 720p video at 30fps.
Solution 2: Obtain local media with WebRTC
Use a USB or built-in camera and obtain the local media MediaStream directly through the APIs provided by WebRTC, then capture the video stream and play it on a canvas. Since the media stream is acquired locally, no network transmission or decoding is required. Compared with Solution 1, the front-end processing work is greatly reduced; what remains is to keep extracting frames and sending them to the server.
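As a rough illustration, here is a minimal sketch of Solution 2, assuming the page already contains a video and a canvas element (the selectors and sizes are placeholders, not values from the project):

// Minimal sketch of Solution 2: capture the local camera with WebRTC
// and mirror it onto a canvas.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

navigator.mediaDevices.getUserMedia({ video: true, audio: false })
  .then((stream) => {
    video.srcObject = stream;          // play the local MediaStream
    return video.play();
  })
  .then(() => {
    // Draw the current video frame onto the canvas on every animation frame.
    (function draw() {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      requestAnimationFrame(draw);
    })();
  })
  .catch((err) => console.error('getUserMedia failed:', err));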
It is not hard to see that the difference between Solution 1 and Solution 2 lies in the source of the video stream: in Solution 1 the stream is pushed to the front end through JSMpeg's built-in Node relay server, while in Solution 2 the stream is captured locally through WebRTC. Solution 2 looks simpler, and it is; in practice it also works better than Solution 1.
Let's now take a closer look at JSMpeg and WebRTC, the two key technologies behind these solutions.
JSMpeg
JSMpeg is a video decoder written in JavaScript. It consists of an MPEG-TS demuxer, MPEG1 video and MP2 audio decoders, WebGL and Canvas2D renderers, and WebAudio sound output. It can load static videos via Ajax and supports low-latency streaming (~50ms) via WebSocket.
The decoder performs quite well: the author claims it can decode 720p video at 30fps on an iPhone 5S, it works in any modern browser (Chrome, Firefox, Safari, Edge), and it weighs in at only about 20KB gzipped. However, JSMpeg only supports MPEG-TS streams using the MPEG1 video codec and the MP2 audio codec. The video decoder cannot handle B-frames correctly, and the width of the video must be a multiple of 2. JSMpeg ships with a WebSocket relay implemented in Node.js, which accepts an MPEG-TS source over HTTP and broadcasts it to all connected browsers via WebSocket. The main code is as follows:
// The HTTP server receives the MPEG-TS stream from FFmpeg and
// broadcasts each chunk to all connected WebSocket clients.
// (socketServer and STREAM_PORT are defined elsewhere in websocket-relay.js)
var streamServer = http.createServer(function (request, response) {
  request.on('data', function (data) {
    socketServer.broadcast(data);
  });
  request.on('end', function () {
    console.log('close');
  });
}).listen(STREAM_PORT);
For pushing the stream, you can use FFmpeg, GStreamer, or other tools to generate the incoming HTTP stream; here we use FFmpeg:
ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -i /dev/video0 -f mpegts -codec:v mpeg1video -s 640x480 -b:v 1000k -bf 0 http://127.0.0.1:8081
In short, here’s how it works:
- Use Node.js to create the relay service websocket-relay.js (see the JSMpeg source);
- Run FFmpeg and send its video output to the relay's HTTP port;
- JSMpeg in the browser connects to the relay's WebSocket port (see the sketch after this list);
- The relay broadcasts the MPEG-TS data to all connected clients over WebSocket.
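For reference, a minimal sketch of step 3 on the browser side, assuming the relay's WebSocket listens on port 8082 and the page contains a canvas element (both are placeholder values, not taken from the original setup):

// JSMpeg in the browser: connect to the relay's WebSocket port and
// render the decoded MPEG1 frames onto a canvas.
var canvas = document.getElementById('video-canvas');     // placeholder id
var player = new JSMpeg.Player('ws://localhost:8082', {   // assumed relay port
  canvas: canvas,
  audio: false // the audio stream is discarded in this project
});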
However, having just introduced JSMpeg, this is also where we gave up on it. The approach above is essentially a live-streaming pipeline: the process is tedious and the result is not ideal, so we eventually abandoned it and moved on to a new plan.
There are three reasons:
- The video has a delay of about 1000ms.
- MPEG1 encoding is very inefficient, so the video quality is low.
- CPU and power consumption: the client browser has to decode the MPEG1 stream in JavaScript for playback, which uses too much CPU.
Because of the low video quality, several rounds of code-level optimization achieved little. So we turned to the hardware level and replaced the camera with a USB camera, which dramatically improved video sharpness. With the original wired network camera abandoned, the front end can obtain the local media stream and capture video directly through the APIs provided by WebRTC; the remaining work is the periodic frame extraction.
WebRTC
WebRTC (Web Real-Time Communications) is a technology tailored for real-time audio and video communication in the browser. It allows web applications or sites to establish peer-to-peer connections between browsers, without an intermediary, and transmit video streams, audio streams, or arbitrary data.
WebRTC comprises several related APIs and protocols that work together to achieve real-time communication. The following sections mainly cover the MediaDevices API, which is what this project uses.
1. Get accessible media devices
MediaDevices.enumerateDevices() requests a list of the available media input and output devices, such as microphones, cameras, headsets, and so on. The returned Promise resolves with an array of MediaDeviceInfo objects describing the devices.
Use enumerateDevices to print a list of device IDs with their labels (if any):
if (!navigator.mediaDevices || !navigator.mediaDevices.enumerateDevices) {
  console.log("enumerateDevices() not supported.");
  return;
}
// List cameras and microphones.
navigator.mediaDevices.enumerateDevices()
  .then(function (devices) {
    devices.forEach(function (device) {
      console.log(device.kind + ": " + device.label +
        " id = " + device.deviceId);
    });
  })
  .catch(function (err) {
    console.log(err.name + ": " + err.message);
  });
enumerateDevices returns a Promise. When it resolves, it receives an array of MediaDeviceInfo objects, each describing an available media input/output device. If enumeration fails, the Promise is rejected.
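For example, a small sketch (purely illustrative) that keeps only the cameras; their deviceIds can later be passed to getUserMedia:

// Collect the deviceIds of all cameras (kind === 'videoinput').
navigator.mediaDevices.enumerateDevices()
  .then(function (devices) {
    var cameraIds = devices
      .filter(function (device) { return device.kind === 'videoinput'; })
      .map(function (device) { return device.deviceId; });
    console.log('available cameras:', cameraIds);
  });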
2. Get a usable media stream
1. MediaDevices.getUserMedia()
The getUserMedia API has moved from the original navigator.getUserMedia (now deprecated) to navigator.mediaDevices.getUserMedia. When called, getUserMedia first prompts the user to grant permission to use the media input, and then produces a MediaStream.
A MediaStream consists of zero or more MediaStreamTrack objects, representing video tracks (from hardware or virtual video sources, such as cameras, video capture devices, screen-sharing services, etc.) and audio tracks (from hardware or virtual audio sources, such as microphones, A/D converters, etc.).
Each MediaStreamTrack may have one or more channels. A channel is the smallest unit of a media stream, such as the audio signal for a single speaker, for example the left or right channel in a stereo track.
var video = document.createElement('video');
var constraints = {
  audio: false,
  video: true
};
function successCallback(stream) {
  window.stream = stream; // MediaStream object
  video.srcObject = stream;
}
function errorCallback(error) {
  console.log('navigator.getUserMedia error: ', error);
}
function getMedia(constraints) {
  if (window.stream) {
    // Release the previous stream before requesting a new one.
    video.srcObject = null;
    window.stream.getVideoTracks()[0].stop();
  }
  navigator.mediaDevices.getUserMedia(constraints)
    .then(successCallback, errorCallback);
}
Note: since Chrome 47, getUserMedia only accepts audio and video requests from "secure origins", such as pages served over HTTPS and localhost. If the page's script is loaded from an insecure origin, there is no mediaDevices object on navigator, and Chrome throws an error.
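A simple guard for this situation might look like the following sketch:

// getUserMedia is only exposed on secure origins (HTTPS or localhost);
// on an insecure origin navigator.mediaDevices is undefined.
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
  console.warn('getUserMedia is unavailable; is this page served over HTTPS?');
}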
- constraints

The constraints parameter is a MediaStreamConstraints object that specifies the requested media types and their parameters. It contains two members, video and audio, and at least one of them must be specified.
Note: If true is set for a media type, the resulting stream needs to have tracks of that type. If one of these is not available for some reason, getUserMedia() will generate an error.
Say you want to use the 1280×720 camera resolution:
{
  audio: true,
  video: { width: 1280, height: 720 }
}
Note: The browser tries to satisfy the request parameters, but it may return other resolutions if the request parameters are not exactly satisfied or if the user chooses to override the request parameters.
Use the keywords min, max, or exact (exact means min == max) when you require a specific size. For example, to require a minimum resolution of 1280×720:
{
  audio: true,
  video: {
    width: { min: 1280 },
    height: { min: 720 }
  }
}
If no camera supports the requested resolution or better, the returned Promise is rejected with NotFoundError as the rejection callback argument, and the user will not even be prompted for authorization.
The keywords min, max, and exact are inherently mandatory, whereas ideal is not. When a request contains an ideal value, that value carries a higher weight, and the browser first tries to find the setting, or the camera (if there are multiple cameras), closest to the ideal:
{
  audio: true,
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 }
  }
}
When your device has multiple cameras, prefer the front-facing one:
{ audio: true, video: { facingMode: "user" } }
Or force the use of a rear camera:
{
  audio: true,
  video: {
    facingMode: {
      exact: "environment"
    }
  }
}
You may want only one particular device; in that case, constrain it with deviceId (which you can obtain from enumerateDevices beforehand):
{ video: { deviceId: myPreferredCameraDeviceId } }
This returns the media device you asked for, if it is available. When you use exact with deviceId and the specified device does not exist or the constraint cannot be met, the browser raises an OverconstrainedError:
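For illustration, reusing the placeholder myPreferredCameraDeviceId from the snippet above:

{
  video: {
    // With exact, the browser rejects with OverconstrainedError instead of
    // falling back to another camera.
    deviceId: { exact: myPreferredCameraDeviceId }
  }
}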
- Common exceptions
var promise = navigator.mediaDevices.getUserMedia({
  video: true,
  audio: false
});
promise.then(function (mediaStream) {
  video.srcObject = mediaStream;
}).catch(err => {
  if (err.name == 'NotFoundError' || err.name == 'DeviceNotFoundError') {
    // No media type satisfying the requested parameters could be found
    console.log(err.name, 'required track is missing');
  } else if (err.name == 'NotReadableError' || err.name == 'TrackStartError') {
    // Permission was granted, but a hardware, browser, or page-level error prevents access to the device
    console.error(err.name, 'webcam or mic are already in use');
  } else if (err.name == 'OverconstrainedError' || err.name == 'ConstraintNotSatisfiedError') {
    // The device cannot satisfy the specified constraints; the error is an OverconstrainedError object
    console.error(err.name, 'constraints can not be satisfied by available devices');
  } else if (err.name == 'NotAllowedError' || err.name == 'PermissionDeniedError') {
    // The user denied access in this browser instance
    console.error(err.name, 'permission denied in browser');
  } else if (err.name == 'TypeError') {
    // The constraints object is empty, or both members are set to false
    console.error(err.name, 'empty constraints object');
  } else {
    // Other errors
    console.error(err.name, 'other errors');
  }
});
2. MediaDevices.getDisplayMedia()
The getDisplayMedia method of the MediaDevices interface prompts the user to select and grant permission to capture the contents of the display, or part of it (such as a window), as a MediaStream. The stream contains a video track (whose content comes from the user-selected screen area) and, optionally, an audio track; it can then be recorded with the MediaStream Recording API or transmitted as part of a WebRTC session.
navigator.mediaDevices.getDisplayMedia({ video: true })
  .then(stream => {
    // On success, assign the captured stream to the video element.
    videoElement.srcObject = stream;
  }, error => {
    console.log("Unable to acquire screen capture", error);
  });
- constraints

Like getUserMedia, getDisplayMedia accepts an optional MediaStreamConstraints object that specifies requirements for the returned MediaStream. The difference is that getDisplayMedia requires a video track, which is present in the returned stream even if the constraints do not explicitly request it (a small sketch follows after these notes).
- Exceptions

As with getUserMedia, rejection of the returned Promise is signaled by passing a DOMException error object to the Promise's failure handler.
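As a small sketch combining these notes (videoElement is assumed to exist; the width value is illustrative):

// getDisplayMedia with a constraints object and Promise-based error handling;
// the video track is always present even though only a width preference is given.
navigator.mediaDevices.getDisplayMedia({ video: { width: { ideal: 1920 } } })
  .then(stream => {
    videoElement.srcObject = stream;
  })
  .catch(err => {
    // e.g. NotAllowedError if the user dismisses the screen-picker dialog
    console.error(err.name + ': ' + err.message);
  });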
3. Compare getUserMedia with getDisplayMedia
We won't go into much more detail about getDisplayMedia, because most of the operations are the same as getUserMedia, apart from the following differences:
- getUserMedia

The MediaStream can include video tracks (from hardware or virtual video sources such as cameras, video capture devices, and screen-sharing services), audio tracks (from hardware or virtual audio sources such as microphones and A/D converters), and possibly other track types.

Constraints are supported: the MediaStreamConstraints parameter can be used to restrict the MediaStream that is captured.
- getDisplayMedia

The MediaStream object has only one MediaStreamTrack for the captured video stream, and no MediaStreamTrack for captured audio.

Constraints cannot be enforced in the same way; the constraints parameter does not accept MediaTrackConstraints values for selecting the capture source.

Permission is not retained, and the shared screen content cannot be changed unless the page reloads and screen capture is requested again.
Other
Having covered the video stream, let's talk briefly about frame extraction. Here, frame extraction means grabbing an image from the video at a fixed interval and sending the real-time face and gesture data in front of the large screen to the back end. As mentioned earlier, we ended up drawing the video stream on a canvas. Drawing the current frame with the drawImage API provided by Canvas, and sending it to the back end over WebSocket on a timer, achieves the frame extraction we need.
var drawImageRate = 1000; // extract one frame every 1000 ms
function drawImage() {
  // Draw the current video frame onto the canvas.
  context.drawImage(video, 0, 0, width, height);
  // Quality ranges from 0 to 1 when the format is image/jpeg or image/webp.
  let base64Image = canvas.toDataURL('image/jpeg', 1);
  ... // after compression
  window.rws.send(JSON.stringify({ image: base64Image }));
}
window.drawInter = setInterval(drawImage, drawImageRate);
Besides Base64, you can also convert the canvas image to a Blob, which is binary and friendlier to the back end:
canvas.toBlob(callback, mimeType, qualityArgument)
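For example, a minimal sketch of the Blob route, reusing the window.rws WebSocket connection assumed in the earlier snippet (the quality value is illustrative):

// Send the current canvas frame as a binary Blob instead of a Base64 string.
canvas.toBlob(function (blob) {
  if (blob) {
    window.rws.send(blob); // WebSocket.send() accepts Blob data directly
  }
}, 'image/jpeg', 0.8);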
A small frame-extraction demo is available here: https://chhxin.github.io/webrtc-demo/
Experience with WebRTC
To sum up, it has several advantages:
- Browser-based real-time audio and video communication.
- Free and open source (and already incorporated into the HTML5 standard by the W3C).
- Low cost and plugin-free; works across platforms, browsers, and mobile applications.
However, nothing is absolutely perfect in this world, and WebRTC still has some shortcomings:
- Compatibility issues. On the web side there are compatibility differences between browsers. Although the WebRTC organization provides a WebRTC adapter on GitHub, inconsistent browser behavior is still a problem.
- Unstable transmission quality. Because WebRTC uses peer-to-peer transmission, quality in cross-carrier, cross-region, low-bandwidth, or high-packet-loss scenarios is largely left to chance.
- Poor mobile adaptation. Different device models need individual adaptation, so a consistent user experience is hard to achieve.
That's it. WebRTC really showed me its power in the audio and video field. In terms of browser support, apart from IE, all mainstream browsers (Chrome, Firefox, Safari, Microsoft Edge) now support WebRTC. It is already widely used in audio and video scenarios such as online classrooms and remote screen sharing. I hope it brings us even more surprises in the future!
Reference links
https://webrtc.org
https://developer.mozilla.org/en-US/docs/Web/API/Screen_Capture_API/Using_Screen_Capture
https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API
https://www.w3.org/TR/webrtc/
https://www.jianshu.com/p/57fd3b5d2f80