Author: Agora Cavan

In today's increasingly diverse live-streaming scene, if you follow any game streamer, there is one kind of live stream you are surely familiar with: screen sharing, which is what we are going to talk about today.

Screen sharing in a live-broadcast scenario requires not only sharing the picture shown on the current display with the remote end, but also transmitting the sound, including the audio played by the application and the voice of the host. Based on these two requirements, we can break down the media streams needed for a screen-sharing live broadcast as follows:

  1. A video stream of the display's screen
  2. An audio stream of the app's sound
  3. An audio stream of the host's voice

ReplayKit is Apple's framework for screen recording on iOS.

First, let's take a look at ReplayKit's data callback interface for screen recording:

override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
    DispatchQueue.main.async {
        switch sampleBufferType {
        case .video:
            AgoraUploader.sendVideoBuffer(sampleBuffer)
        case .audioApp:
            AgoraUploader.sendAudioAppBuffer(sampleBuffer)
        case .audioMic:
            AgoraUploader.sendAudioMicBuffer(sampleBuffer)
        @unknown default:
            break
        }
    }
}

From the sampleBufferType enumeration, we can see that the callback provides exactly the three media streams we need.
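For context, this callback is implemented in the broadcast upload extension's RPBroadcastSampleHandler subclass. Below is a minimal sketch of that class; AgoraUploader is the article's own helper object, and the startBroadcast/stopBroadcast method names here are illustrative assumptions, not the SDK's API:

import ReplayKit

class SampleHandler: RPBroadcastSampleHandler {

    override func broadcastStarted(withSetupInfo setupInfo: [String: NSObject]?) {
        // Set up the Agora engine and join the channel before samples start arriving
        // (startBroadcast(to:) is a hypothetical name for the article's helper).
        AgoraUploader.startBroadcast(to: "screen-share-channel")
    }

    override func broadcastFinished() {
        // Leave the channel and release resources when the user stops the broadcast.
        AgoraUploader.stopBroadcast()
    }

    // processSampleBuffer(_:with:) from the snippet above also lives in this class.
}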

Video

Format

guard let videoFrame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
    return
}
        
let type = CVPixelBufferGetPixelFormatType(videoFrame)
type = kCVPixelFormatType_420YpCbCr8BiPlanarFullRange

Through CVPixelBufferGetPixelFormatType, we can see that each video frame is in a YUV420 format (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, i.e. bi-planar, full range).

Frame rate

By printing and counting the number of callbacks in this interface, we can see that about 30 video frames are delivered per second, i.e. the frame rate is 30 fps.
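As a rough illustration, the callback rate can be checked with a simple counter called from the .video branch of processSampleBuffer; the helper below is hypothetical and only meant for measurement:

import QuartzCore

// Hypothetical counter for measuring how many video callbacks arrive per second.
private var videoFrameCount = 0
private var windowStart = CACurrentMediaTime()

func countVideoCallback() {
    videoFrameCount += 1
    let now = CACurrentMediaTime()
    if now - windowStart >= 1.0 {
        // Prints roughly 30 on the devices tested in the article
        print("ReplayKit video callbacks in the last second: \(videoFrameCount)")
        videoFrameCount = 0
        windowStart = now
    }
}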

The format and frame rate are within the range that Agora RTC can receive, so the video can be shared to the remote end using Agora RTC’s pushExternalVideoFrame.

agoraKit.pushExternalVideoFrame(frame)
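For reference, here is a minimal sketch of what AgoraUploader.sendVideoBuffer might look like. It assumes agoraKit is an AgoraRtcEngineKit already configured as an external, push-mode video source, that the SDK module is named AgoraRtcKit (the name varies across SDK versions), and that format 12 is the SDK's identifier for CVPixelBuffer-based frames:

import AgoraRtcKit
import CoreMedia

func sendVideoBuffer(_ sampleBuffer: CMSampleBuffer, with agoraKit: AgoraRtcEngineKit) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    let frame = AgoraVideoFrame()
    frame.format = 12                                           // CVPixelBuffer-based frame (assumed)
    frame.textureBuf = pixelBuffer
    frame.time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    agoraKit.pushExternalVideoFrame(frame)
}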

Just a little tidbit

The frame shown on the display comes from a frame buffer, usually double- or triple-buffered. When the screen finishes displaying a frame, it emits a v-sync signal telling the system to switch to the next frame buffer, and the display begins reading the new frame.

These frame buffers are system-level and cannot be read or written by ordinary developers. But if Apple's own recording framework, ReplayKit, can read the frames that have already been rendered and are about to be shown on the display, and do so without interfering with the rendering process or causing dropped frames, then no extra rendering pass is needed just to produce the callback data for ReplayKit.

Audio

ReplayKit provides two types of audio: the audio stream recorded by the microphone, and the audio stream being played by the app. (Hereafter referred to as AudioMic and AudioApp.)

The audio format can be obtained with the following two lines of code:

CMAudioFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *description = CMAudioFormatDescriptionGetStreamBasicDescription(format);
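The Swift equivalent, and how to read the fields discussed below (sample rate and channel count), is roughly:

import CoreMedia

// Read the sample rate and channel count of an audio sample buffer.
func audioFormat(of sampleBuffer: CMSampleBuffer) -> (sampleRate: Float64, channels: UInt32)? {
    guard let format = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(format)?.pointee else {
        return nil
    }
    return (asbd.mSampleRate, asbd.mChannelsPerFrame)   // e.g. (44100, 2) for AudioApp on some devices
}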

AudioApp

AudioApp has a different number of channels on different models. For example, on iPads or on models below iPhone 7, the device does not support stereo playback, so the AudioApp data is mono; otherwise it is stereo.

The sample rate is 44100 on the models we tested, but we cannot rule out other sample rates on models we have not tested.

AudioMic

For AudioMic, on the models we tested, the sample rate is 32000 and the stream is mono.

Audio preprocessing

If we send AudioApp and AudioMic as two separate audio streams, the traffic is inevitably higher than with a single stream. To save the bandwidth of one audio stream, we need to mix (merge) the two streams into one.

However, as seen above, the two audio streams have different formats, and there is no guarantee that other formats will not show up on other models. Testing also showed that the length of the audio data delivered in each callback varies with the OS version. So before mixing the two streams, we need to unify their formats to cope with whatever ReplayKit gives us. We take the following important steps.
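One note before the steps: the snippets below operate on a raw PCM pointer (dataPointer), together with totalBytes and totalSamples. The article's own processing code is C; the Swift sketch below (a hypothetical helper, assuming 16-bit interleaved PCM) just shows where those values might come from:

import CoreMedia

// Hypothetical helper: get the 16-bit PCM pointer and sizes that the steps below work on.
func rawPCM(from sampleBuffer: CMSampleBuffer)
    -> (data: UnsafeMutablePointer<Int16>, totalBytes: Int, totalSamples: Int)? {
    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return nil }

    var totalBytes = 0
    var dataPointer: UnsafeMutablePointer<CChar>?
    let status = CMBlockBufferGetDataPointer(blockBuffer,
                                             atOffset: 0,
                                             lengthAtOffsetOut: nil,
                                             totalLengthOut: &totalBytes,
                                             dataPointerOut: &dataPointer)
    guard status == kCMBlockBufferNoErr, let bytes = dataPointer else { return nil }

    // Reinterpret the raw bytes as interleaved Int16 samples.
    let samples = UnsafeMutableRawPointer(bytes).assumingMemoryBound(to: Int16.self)
    return (samples, totalBytes, totalBytes / MemoryLayout<Int16>.size)
}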

// If the incoming data is mono, duplicate each sample into the left and
// right channels so that the buffer becomes interleaved stereo.
if (channels == 1) {
    int16_t* intData = (int16_t*)dataPointer;
    int16_t newBuffer[totalSamples * 2];

    for (int i = 0; i < totalSamples; i++) {
        newBuffer[2 * i] = intData[i];
        newBuffer[2 * i + 1] = intData[i];
    }
    totalSamples *= 2;
    memcpy(dataPointer, newBuffer, sizeof(int16_t) * totalSamples);
    totalBytes *= 2;
    channels = 2;
}
  • Whether it is AudioMic or AudioApp, if the incoming stream is mono, we convert it to stereo;
if (sampleRate != resampleRate) {
    int inDataSamplesPer10ms = sampleRate / 100;
    int outDataSamplesPer10ms = (int)resampleRate / 100;

    int16_t* intData = (int16_t*)dataPointer;

    switch (type) {
        case AudioTypeApp:
            totalSamples = resampleApp(intData, dataPointerSize, totalSamples,
                                       inDataSamplesPer10ms, outDataSamplesPer10ms,
                                       channels, sampleRate, (int)resampleRate);
            break;
        case AudioTypeMic:
            totalSamples = resampleMic(intData, dataPointerSize, totalSamples,
                                       inDataSamplesPer10ms, outDataSamplesPer10ms,
                                       channels, sampleRate, (int)resampleRate);
            break;
    }

    totalBytes = totalSamples * sizeof(int16_t);
}
  • Whether it is AudioMic or AudioApp, if the sample rate of the incoming stream is not 48000, we resample it to 48000 (a simple sketch of the resampling idea is given after this list of steps);
  memcpy(appAudio + appAudioIndex, dataPointer, totalBytes);
  appAudioIndex += totalSamples;
	memcpy(micAudio + micAudioIndex, dataPointer, totalBytes);
  micAudioIndex += totalSamples;
  • Through steps 1 and 2, we ensure that both audio streams are in the same format. But since each ReplayKit callback delivers only one type of data, we have to use two buffers to cache the two streams before mixing;
// Mix only as many samples as both buffers currently hold.
int64_t mixIndex = appAudioIndex > micAudioIndex ? micAudioIndex : appAudioIndex;

// Start from the AudioApp data, then average in the AudioMic data.
int16_t pushBuffer[appAudioIndex];

memcpy(pushBuffer, appAudio, appAudioIndex * sizeof(int16_t));

for (int i = 0; i < mixIndex; i++) {
    pushBuffer[i] = (appAudio[i] + micAudio[i]) / 2;
}
  • ReplayKit lets the user turn microphone recording on or off; when it is off, there is only the AudioApp stream. So we treat AudioApp as the main stream: we read the length of the data in the AudioMic buffer, compare the lengths of the two buffers, and take the smaller one as the mix length. We then merge that much data from the two buffers to get the mixed audio, and write it into a new mix buffer (or directly back into the AudioApp buffer).
[AgoraAudioProcessing pushAudioFrame:(unsigned char *)pushBuffer
                       withFrameSize:appAudioIndex * sizeof(int16_t)];
  • Finally, we copy the mixed data into Agora RTC's C++ recording callback interface. At this point, both the sound recorded by the microphone and the sound played by the application can be transmitted to the remote end.
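As promised above, here is a simple sketch of the resampling idea behind resampleApp/resampleMic (their implementations are not shown in this article). It uses plain linear interpolation on interleaved 16-bit samples; a production resampler would normally add proper filtering:

// Linear-interpolation resampling of interleaved 16-bit PCM (illustrative only).
func linearResample(_ input: [Int16], channels: Int, from inRate: Int, to outRate: Int) -> [Int16] {
    guard inRate > 0, outRate > 0, channels > 0, input.count >= channels else { return input }

    let inFrames = input.count / channels
    let outFrames = inFrames * outRate / inRate
    var output = [Int16](repeating: 0, count: outFrames * channels)

    for frame in 0..<outFrames {
        // Position of this output frame inside the input, in fractional input frames.
        let pos = Double(frame) * Double(inRate) / Double(outRate)
        let i = min(Int(pos), inFrames - 1)
        let j = min(i + 1, inFrames - 1)
        let t = pos - Double(i)

        for ch in 0..<channels {
            let a = Double(input[i * channels + ch])
            let b = Double(input[j * channels + ch])
            output[frame * channels + ch] = Int16(a + (b - a) * t)
        }
    }
    return output
}

For example, linearResample(buffer, channels: 2, from: 44100, to: 48000) would convert stereo AudioApp data to the 48000 sample rate used for mixing.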

By processing the audio and video streams as above and combining them with the Agora RTC SDK, we have completed the implementation of a screen-sharing live-streaming scenario.

For details on the implementation, please refer to github.com/AgoraIO/Adv…