
This article was published by Tencent Game Cloud in the Cloud + Community column.

You are watching a great Bundesliga match when the referee suddenly blows the whistle, the match stops, and the screen automatically starts playing a video advertisement: “Eat chicken McNuggets, watch the Bundesliga.”

So the question is: how do you seamlessly insert a video file into a live stream on demand?

This article describes how QQ Music implemented interstitial animations in interactive live broadcasts on top of Tencent Cloud AVSDK, and the pitfalls encountered along the way.

01

Start with a requirement from the product manager

“The opening animation? A commercial break?”

Not long ago, the product manager said we were going to add an opening animation to our audio and video live broadcasts.

How do you play an interstitial animation? For a video broadcast, what do you do with the current live picture? For an audio broadcast, how do you feed in a video stream at all?

02

Sorting Out the Technical Solution

In interactive live broadcasting, the anchor's picture is pushed to the audience, and the picture on the anchor side can come either from the camera or from another input stream. So if Tencent Cloud's AVSDK supports playing an input stream, the anchor side can decode a video file locally and push that stream to the audience, letting everyone in the room see the interstitial animation. Fortunately, AVSDK does support this. There are two specific approaches:

The first: replace the video picture

/*!
 @abstract    Callback for preprocessing locally captured video.
 @discussion  Called back on the main thread; the business side can also render the video directly in this callback.
 @param       frameData  Locally captured video frame. Image processing applied to the frame data (beauty, filters, special effects, etc.) is passed back to the SDK for encoding and sending, and takes effect in the video received by the remote end.
 @see         QAVVideoFrame
 */
- (void)OnLocalVideoPreProcess:(QAVVideoFrame *)frameData;

After the anchor side collects data from the camera, and before encoding it and sending it upstream to the server, the SDK provides this callback so the business side can preprocess the image. For a video live broadcast we can therefore use this interface to replace the outgoing video frames with frames of the interstitial animation; from the audience's point of view, the live picture is interrupted by the animation video.
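As a rough illustration, the replacement can happen directly in the callback above. This is only a sketch: the QAVVideoFrame property names used here (data, dataSize) and the isPlayingInterstitial / nextAnimationFrame helpers are assumptions for illustration, not the SDK's or QQ Music's actual code, and the animation frame is assumed to already match the camera frame's pixel format and size.

// Sketch only: QAVVideoFrame's data/dataSize properties and the helpers below are
// assumed names for illustration; check the actual SDK header before relying on them.
- (void)OnLocalVideoPreProcess:(QAVVideoFrame *)frameData
{
    if (!self.isPlayingInterstitial) {
        return; // not playing an interstitial: leave the camera frame untouched
    }
    // Hypothetical helper returning one decoded animation frame in the same
    // pixel format and size as the captured frame.
    NSData *animationFrame = [self nextAnimationFrame];
    if (animationFrame.length == frameData.dataSize) {
        // Overwrite the captured frame before the SDK encodes and sends it,
        // so the audience sees the animation instead of the camera picture.
        memcpy(frameData.data, animationFrame.bytes, frameData.dataSize);
    }
}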

The second: use an external input stream

/*!
 @abstract  When external video capture is enabled, passes external video frames to the SDK.
 @return    QAV_OK on success. QAV_ERR_ROOM_NOT_EXIST: the room does not exist; the call only takes effect after entering the room. QAV_ERR_DEVICE_NOT_EXIST: the video device does not exist. QAV_ERR_FAIL: failure.
 @see       QAVVideoFrame
 */
- (int)fillExternalCaptureFrame:(QAVVideoFrame *)frame;

At first I wrongly assumed that only the second approach could meet the requirement of playing an interstitial animation in both audio and video live broadcasts. In practice, however, I found that to play an external input stream the camera picture must be closed first. This operation triggers a video position switch in the Tencent Cloud backend, which is notified to the audience through the following function:

/*!
 @abstract    Notification of room member state changes.
 @discussion  When the status of room members changes (e.g. whether they are sending audio or video), the SDK notifies the business side through this function.
 @param       eventID    Status change ID; see the definition of QAVUpdateEvent for details.
 @param       endpoints  List of member IDs whose status has changed.
 */
- (void)OnEndpointsUpdateInfo:(QAVUpdateEvent)eventID endpointlist:(NSArray *)endpoints;

Switching the video position back and forth within a short time can lead to timing problems, so after discussion with the SDK team this path was not recommended. In the end, QQ Music adopted both approaches side by side.

03

Video Format Selection

For the interstitial animation's video file, if streaming playback with a low bit rate and high picture quality is needed, an H.264 elementary stream hardware-decoded with VideoToolbox can be used; if only local files need to be played, an H.264-encoded MP4 decoded with AVURLAsset is enough. The latter seemed more reasonable here, since there is no streaming requirement yet and the designers handed over an MP4 file directly. Out of personal interest, the author implemented both schemes anyway and ran into the pitfalls below; they are summarized here in the hope of saving other developers some detours:

1. Configure the resolution and frame rate

The resolution of the video should match the uplink resolution specified in the SPEAR engine configuration in the Tencent Cloud backend. The uplink configuration QQ Music selected is 960×540 at 15 fps. In actual playback, however, the result was not ideal, so data had to be pushed at a higher resolution. This can be achieved by changing the AVSDK role (RoleName), which is not expanded on here.

Another issue is that the image captured by the camera has orientation 1 and is rotated 90 degrees by default when rendered; when replacing the video picture, the replacement frames must stay consistent with this. The camera data is also in NV12 format, while the locally filled picture can be in I420 format, so when drawing you can decide whether to rotate the image based on its data format.
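A minimal sketch of that decision, assuming (as described above) that camera frames arrive as NV12 and locally filled animation frames as I420, and using Core Video pixel-format constants for illustration:

#import <CoreVideo/CoreVideo.h>

// Sketch: decide whether to rotate when drawing, based on the pixel format.
// Assumption (as described above): NV12 frames come from the camera and need the
// default 90-degree rotation; I420 frames are locally filled animation frames.
static BOOL ShouldRotateForDisplay(OSType pixelFormat)
{
    // NV12 (bi-planar) => camera capture => rotate when drawing.
    // kCVPixelFormatType_420YpCbCr8Planar (I420) => local fill => draw as-is.
    return pixelFormat == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
        || pixelFormat == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange;
}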

2. Problems decoding the H.264 elementary stream converted with ffmpeg

Starting with iOS 8, Apple opened up VideoToolbox, giving apps the ability to hardware-decode H.264. For a concrete implementation and analysis, you can refer to the article "iOS H264 hard decoding". Because the designers provided an MP4 file, it first needs to be converted into an H.264 elementary stream before decoding. Here ffmpeg is used for the conversion:

ffmpeg -i test.mp4 -codec copy -bsf:v h264_mp4toannexb -f h264 output.264

Annex B is the H.264 elementary stream format. In an elementary stream the SPS and PPS are not packaged separately but attached in front of the I frame, generally like this:

00 00 00 01 sps 00 00 00 01 pps 00 00 00 01 [I frame]

Hardware decoding with VideoToolbox generally goes through the following steps:

1. Read the video stream.
2. Find the SPS and PPS, create a CMVideoFormatDescriptionRef, and pass it to the next step as a parameter.
3. VTDecompressionSessionCreate: create the decoding session.
4. VTDecompressionSessionDecodeFrame: decode a video frame.
5. VTDecompressionSessionInvalidate: release the decoding session.
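A condensed sketch of steps 2 and 3, assuming sps/pps (and their sizes) already hold the raw parameter sets extracted from the stream with their start codes stripped, and DidDecompressFrame is the app's decompression output callback:

#import <VideoToolbox/VideoToolbox.h>

// Sketch of steps 2 and 3: sps/pps are assumed to be the parameter sets already
// extracted from the stream, with the start codes stripped off.
CMVideoFormatDescriptionRef formatDesc = NULL;
const uint8_t *parameterSets[2]     = { sps, pps };
const size_t   parameterSetSizes[2] = { spsSize, ppsSize };

OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(
    kCFAllocatorDefault,
    2,                  // number of parameter sets
    parameterSets,
    parameterSetSizes,
    4,                  // length of the NAL unit length prefix used when decoding
    &formatDesc);

VTDecompressionSessionRef session = NULL;
if (status == noErr) {
    VTDecompressionOutputCallbackRecord callback = { DidDecompressFrame, NULL };

    // Ask for NV12 output so decoded frames can be handed to the renderer directly.
    NSDictionary *attrs = @{ (id)kCVPixelBufferPixelFormatTypeKey :
                                 @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange) };

    status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                          formatDesc,
                                          NULL,                        // decoder specification
                                          (__bridge CFDictionaryRef)attrs,
                                          &callback,
                                          &session);
    // If the parameter sets are malformed, creation fails here with a negative OSStatus.
}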

However, when decoding the elementary stream produced by the conversion above, the decoder kept running into data it could not decode. Analysis of the converted file shows that it is not a pure elementary stream: ffmpeg has inserted some extra, irrelevant information:

That in itself is not a problem: with the tool H264Naked you can locate those bytes in the binary and delete them entirely. Trying again, the stream still would not play, because creating the decoding session in step 3 above failed with OSStatus = -5. Unfortunately, no description of this error code can be found on OSStatus.com. Comparing the working and failing files reveals that the start code before the PPS in the failing file does not begin with three zero bytes, but looks like this:

00 00 00 01 sps 00 00 01 pps 00 00 00 01 [I frame]

In fact, a look at the official H.264 specification shows that both forms are valid: a start code may be either the four-byte 00 00 00 01 or the three-byte 00 00 01.

My parsing code only considered the first case and ignored the second, so it extracted the wrong PPS data. This can be solved by manually inserting a 00, or better, by making the parsing compatible with both start codes. Even so, the whole approach is not very intuitive, which brings us to the second method.
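For illustration, a start-code check that accepts both forms could look like this (a simple helper written for this article, not the SDK's code):

// Returns the length of the Annex B start code at p (4 for 00 00 00 01,
// 3 for 00 00 01), or 0 if there is no start code at this position.
static int StartCodeLength(const uint8_t *p, size_t remaining)
{
    if (remaining >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 1) {
        return 4;
    }
    if (remaining >= 3 && p[0] == 0 && p[1] == 0 && p[2] == 1) {
        return 3;
    }
    return 0;
}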

3. Decoding video with AVAssetReader

Decoding YUV with AVAssetReader is relatively easy, so the code is given directly below:

    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:[[NSURL alloc] initFileURLWithPath:path] options:nil];
    NSError *error;
    AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:asset error:&error];
    NSArray *videoTracks = [asset tracksWithMediaType:AVMediaTypeVideo];
    AVAssetTrack *videoTrack = [videoTracks objectAtIndex:0];

    int m_pixelFormatType = kCVPixelFormatType_420YpCbCr8Planar;
    NSDictionary *options = [NSDictionary dictionaryWithObject:[NSNumber numberWithInt:(int)m_pixelFormatType]
                                                         forKey:(id)kCVPixelBufferPixelFormatTypeKey];
    AVAssetReaderTrackOutput *videoReaderOutput = [[AVAssetReaderTrackOutput alloc] initWithTrack:videoTrack outputSettings:options];
    [reader addOutput:videoReaderOutput];
    [reader startReading];

    // Read the video on a background queue and process each sample buffer
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        while ([reader status] == AVAssetReaderStatusReading && videoTrack.nominalFrameRate > 0) {
            CMSampleBufferRef sampleBuff = [videoReaderOutput copyNextSampleBuffer];
            // ... process sampleBuff (e.g. extract the YUV planes) ...
            if (sampleBuff) {
                CFRelease(sampleBuff); // copyNextSampleBuffer returns a retained buffer
            }
        }
    });

Again, only the pitfalls encountered are covered here. When some MP4 videos were decoded and drawn, a mysterious green bar appeared, like the one below.

Why? The code was implemented as follows: first take out the Y-plane data, then the UV-plane data. It looks fine, but this is not actually how our video's pixel format stores its data.

// Get the pixel buffer (CVImageBufferRef) from the sample buffer
CVImageBufferRef cvBufferRef = CMSampleBufferGetImageBuffer(sampleBuff);
// Lock the base address so the pixel data can be read from main memory
CVPixelBufferLockBaseAddress(cvBufferRef, kCVPixelBufferLock_ReadOnly);
// Get the Y plane data
unsigned char *y_frame = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(cvBufferRef, 0);
// Get the UV plane data (this assumes a bi-planar layout)
unsigned char *uv_frame = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(cvBufferRef, 1);

This code treats cvBufferRef as if its data were stored in the bi-planar layout below:

struct CVPlanarPixelBufferInfo_YCbCrBiPlanar {
  CVPlanarComponentInfo  componentInfoY;
  CVPlanarComponentInfo  componentInfoCbCr;
};
typedef struct CVPlanarPixelBufferInfo_YCbCrBiPlanar   CVPlanarPixelBufferInfo_YCbCrBiPlanar;

In the first code snippet, however, the pixelFormatType requested was kCVPixelFormatType_420YpCbCr8Planar, which stores its data in the tri-planar layout below:

struct CVPlanarPixelBufferInfo_YCbCrPlanar {
  CVPlanarComponentInfo  componentInfoY;
  CVPlanarComponentInfo  componentInfoCb;
  CVPlanarComponentInfo  componentInfoCr;
};
typedef struct CVPlanarPixelBufferInfo_YCbCrPlanar   CVPlanarPixelBufferInfo_YCbCrPlanar;

In other words, the YUV data should be read as three planes (Y, Cb and Cr), not two. Reading the planes correctly eliminates the green bar.
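A corrected sketch of the extraction for kCVPixelFormatType_420YpCbCr8Planar, reading three planes instead of two; each plane may be padded, so its own bytes-per-row should be used when copying:

// Corrected extraction for kCVPixelFormatType_420YpCbCr8Planar: three planes (Y, Cb, Cr).
CVImageBufferRef cvBufferRef = CMSampleBufferGetImageBuffer(sampleBuff);
CVPixelBufferLockBaseAddress(cvBufferRef, kCVPixelBufferLock_ReadOnly);

unsigned char *y_frame = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(cvBufferRef, 0);
unsigned char *u_frame = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(cvBufferRef, 1);
unsigned char *v_frame = (unsigned char *)CVPixelBufferGetBaseAddressOfPlane(cvBufferRef, 2);

// Each plane can be row-padded, so use its own bytes-per-row when copying it out.
size_t yStride = CVPixelBufferGetBytesPerRowOfPlane(cvBufferRef, 0);
size_t uStride = CVPixelBufferGetBytesPerRowOfPlane(cvBufferRef, 1);
size_t vStride = CVPixelBufferGetBytesPerRowOfPlane(cvBufferRef, 2);

// ... copy y_frame / u_frame / v_frame row by row into the I420 buffer handed to the renderer ...

CVPixelBufferUnlockBaseAddress(cvBufferRef, kCVPixelBufferLock_ReadOnly);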

That covers all the pitfalls encountered, and the final effect turned out well.

Finally, I hope this article is helpful and saves you some detours when developing live broadcast features.

