Introduction to VideoCore
VideoCore is an open-source video processing library that supports capture, compositing, encoding, and RTMP streaming. Repository: github.com/jgh-/VideoC…
The code structure
VideoCore's processing flow is as follows:
Source (Camera) -> Transform (Composite) -> Transform (H.264 Encode) -> Transform (RTMP Packetize) -> Output (RTMP)
VideoCore code structure
videocore/
  sources/
    videocore::ISource
    videocore::IAudioSource : videocore::ISource
    videocore::IVideoSource : videocore::ISource
    videocore::Watermark : videocore::IVideoSource
    iOS/
      videocore::iOS::CameraSource : videocore::IVideoSource
    Apple/
      videocore::Apple::MicSource : videocore::IAudioSource
    OSX/
      videocore::OSX::DisplaySource : videocore::IVideoSource
      videocore::OSX::SystemAudioSource : videocore::IAudioSource
  outputs/
    videocore::IOutput
    videocore::ITransform : videocore::IOutput
    iOS/
      videocore::iOS::H264Transform : videocore::ITransform
      videocore::iOS::AACTransform : videocore::ITransform
    OSX/
      videocore::OSX::H264Transform : videocore::ITransform
      videocore::OSX::AACTransform : videocore::ITransform
    RTMP/
      videocore::rtmp::H264Packetizer : videocore::ITransform
      videocore::rtmp::AACPacketizer : videocore::ITransform
  mixers/
    videocore::IMixer
    videocore::IAudioMixer : videocore::IMixer
    videocore::IVideoMixer : videocore::IMixer
    videocore::AudioMixer : videocore::IAudioMixer
    iOS/
      videocore::iOS::GLESVideoMixer : videocore::IVideoMixer
    OSX/
      videocore::OSX::GLVideoMixer : videocore::IVideoMixer
  rtmp/
    videocore::RTMPSession : videocore::IOutput
  stream/
    videocore::IStreamSession
    Apple/
      videocore::Apple::StreamSession : videocore::IStreamSession
Taking VCSimpleSession, the Objective-C convenience API, as an example, it builds the following processing chains out of videocore nodes:
CameraSource -> AspectTransform -> PositionTransform -> GLESVideoMixer -> Split -> PixelBufferOutput
MicSource -> AudioMixer
Audio and video capture in VideoCore both follow this pattern. Once the camera and microphone start capturing, pushBuffer is called to hand each buffer to the next node for further processing. When streaming starts, the addEncodersAndPacketizers method adds further nodes to the chain: before the RTMP push, the audio and video data must be encoded and packetized, and the packetized data is then pushed into the streaming queue:
GLESVideoMixer -> H264EncodeApple -> Split -> H264Packetizer -> RTMPSession
AudioMixer -> AACEncode -> Split -> AACPacketizer -> RTMPSession
The Split node does nothing more than forward the data to its downstream nodes (a sketch of this forwarding pattern follows the IOutput listing below).
IOutput base class
The IOutput class, as the base class for all outputs, defines two virtual functions; pushBuffer is the entry point through which data is handed to the next node.
class IOutput
{
public:
    // Shared epoch (reference start time) used for relative timestamps; the default implementation ignores it.
    virtual void setEpoch(const std::chrono::steady_clock::time_point epoch) {};
    // Receive a buffer and its metadata from the upstream node.
    virtual void pushBuffer(const uint8_t* const data, size_t size, IMetadata& metadata) = 0;
    virtual ~IOutput() {};
};
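To make the chaining concrete, here is a minimal pass-through node written only against the IOutput interface above. It forwards every buffer to the nodes registered with it, which is essentially the forwarding behaviour described for Split. This is an illustrative sketch, not VideoCore's code: the addOutput helper and the weak_ptr bookkeeping are assumptions (the library's transforms register their downstream node through ITransform).

#include <memory>
#include <vector>
// ...plus the videocore IOutput/IMetadata headers shown above.

class PassThroughSketch : public videocore::IOutput
{
public:
    // Hypothetical registration helper used only by this sketch.
    void addOutput(std::shared_ptr<videocore::IOutput> output) {
        m_outputs.push_back(output);
    }

    // Forward the buffer unchanged to every downstream node that is still alive.
    void pushBuffer(const uint8_t* const data, size_t size, videocore::IMetadata& metadata) override {
        for(auto& weakOutput : m_outputs) {
            if(auto output = weakOutput.lock()) {
                output->pushBuffer(data, size, metadata);
            }
        }
    }

private:
    std::vector<std::weak_ptr<videocore::IOutput>> m_outputs;
};

Every chain shown earlier (CameraSource -> ... -> RTMPSession) moves data this way: each node does its own work and then calls pushBuffer on the next one.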
Video capture
Video capture is handled by the CameraSource class, which uses the iOS AVFoundation framework; the core setup code is in the setupCamera method.
The captured frames arrive in the AVCaptureVideoDataOutput delegate callback:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    auto source = m_source.lock();
    if(source) {
        source->bufferCaptured(CMSampleBufferGetImageBuffer(sampleBuffer));
    }
}
The source here is the CameraSource instance:
void CameraSource::bufferCaptured(CVPixelBufferRef pixelBufferRef)
{
    auto output = m_output.lock();
    if(output) {
        VideoBufferMetadata md(1.f / float(m_fps));
        md.setData(1, m_matrix, false, shared_from_this());

        auto pixelBuffer = std::make_shared<Apple::PixelBuffer>(pixelBufferRef, true);
        pixelBuffer->setState(kVCPixelBufferStateEnqueued);
        output->pushBuffer((uint8_t*)&pixelBuffer, sizeof(pixelBuffer), md);
    }
}
In bufferCaptured, the CVPixelBufferRef is wrapped in a PixelBuffer, its state is set to kVCPixelBufferStateEnqueued, and the PixelBuffer is passed on to the next node (CameraSource -> AspectTransform -> PositionTransform -> GLESVideoMixer).
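Note that the pushBuffer call above passes the address of the shared_ptr itself (sizeof(pixelBuffer) bytes), not the raw pixel data. A downstream video node therefore recovers the buffer by casting the pointer back. The sketch below illustrates only that cast; VideoSinkSketch is a hypothetical receiver, and the real GLESVideoMixer does considerably more:

#include <memory>
// ...plus the videocore IOutput and Apple::PixelBuffer headers.

// Hypothetical receiving node, shown only to illustrate how the shared_ptr
// pushed by CameraSource::bufferCaptured is recovered on the other side.
class VideoSinkSketch : public videocore::IOutput
{
public:
    void pushBuffer(const uint8_t* const data, size_t size, videocore::IMetadata& metadata) override
    {
        // `data` points at a std::shared_ptr<Apple::PixelBuffer>, so no pixel
        // data is copied while the buffer travels down the chain.
        auto pixelBuffer = *reinterpret_cast<const std::shared_ptr<videocore::Apple::PixelBuffer>*>(data);
        // ...lock the buffer, upload it to a texture, composite, and so on.
    }
};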
AspectTransform and PositionTransform adjust the video frame, for example scaling it and repositioning it within the output.
GLESVideoMixer renders the video into a texture and passes the rendered data on to the next node (H264EncodeApple).
Audio capture
Audio capture lives in the MicSource class, which uses the system AudioToolbox framework; the core setup code is in the MicSource() constructor. In the audio input callback handleInputBuffer, the captured samples are pulled in with AudioUnitRender, wrapped in an AudioBuffer together with the relevant format parameters, and passed to the next node (AudioMixer).
static OSStatus handleInputBuffer(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
videocore::iOS::MicSource* mc = static_cast<videocore::iOS::MicSource*>(inRefCon);
AudioBuffer buffer;
buffer.mData = NULL;
buffer.mDataByteSize = 0;
buffer.mNumberChannels = 2;
AudioBufferList buffers;
buffers.mNumberBuffers = 1;
buffers.mBuffers[0] = buffer;
OSStatus status = AudioUnitRender(mc->audioUnit(),
ioActionFlags,
inTimeStamp,
inBusNumber,
inNumberFrames,
&buffers);
if(!status) {
mc->inputCallback((uint8_t*)buffers.mBuffers[0].mData, buffers.mBuffers[0].mDataByteSize, inNumberFrames);
}
return status;
}
void MicSource::inputCallback(uint8_t *data, size_t data_size, int inNumberFrames)
{
auto output = m_output.lock();
if(output) {
videocore::AudioBufferMetadata md (0.);
md.setData(m_sampleRate,
16,
m_channelCount,
kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
m_channelCount * 2,
inNumberFrames,
false,
false,
shared_from_this());
output->pushBuffer(data, data_size, md);
}
}
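For context, the handleInputBuffer callback above is registered on a RemoteIO audio unit, which is what the MicSource constructor sets up. The sketch below shows the typical AudioToolbox calls involved; the helper name, the refCon wiring and the setup order are illustrative assumptions rather than the library's exact code.

#include <AudioToolbox/AudioToolbox.h>

// Sketch: register handleInputBuffer (the static callback above) as the input
// callback of a RemoteIO audio unit. `refCon` is the MicSource*, which the
// callback later reads back through inRefCon.
static void registerInputCallback(AudioUnit audioUnit, void* refCon)
{
    // Enable capture on the input bus (bus 1) of the RemoteIO unit.
    UInt32 enable = 1;
    AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &enable, sizeof(enable));

    // Route captured samples to handleInputBuffer.
    AURenderCallbackStruct callback;
    callback.inputProc       = handleInputBuffer;
    callback.inputProcRefCon = refCon;
    AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_SetInputCallback,
                         kAudioUnitScope_Global, 1, &callback, sizeof(callback));

    AudioUnitInitialize(audioUnit);
    AudioOutputUnitStart(audioUnit);
}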
When AudioMixer receives the data, it resamples the audio and places it in a queue of buffers awaiting encoding (a linked-list structure). Once start() has been called, AudioMixer's mixing loop takes the audio off this queue and passes it to the next node (AACEncode).
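The queue-plus-thread arrangement can be pictured with the small pattern sketch below. It only illustrates the producer/consumer idea described above, with invented names (MixQueueSketch, enqueue, mixLoop); AudioMixer's real implementation also resamples and mixes the buffers.

#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Pattern sketch only: buffers are queued by the capture side and a worker
// thread started by start() drains the queue and forwards audio downstream.
class MixQueueSketch {
public:
    void start() {
        m_running = true;
        m_thread = std::thread([this] { mixLoop(); });
    }
    void enqueue(std::vector<uint8_t> resampled) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push_back(std::move(resampled));   // pending-audio list
        m_cond.notify_one();
    }
private:
    void mixLoop() {
        while(m_running) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cond.wait(lock, [this] { return !m_queue.empty() || !m_running; });
            if(!m_running) break;
            std::vector<uint8_t> buffer = std::move(m_queue.front());
            m_queue.pop_front();
            lock.unlock();
            // ...mix/forward `buffer` to the next node (AACEncode) here.
        }
    }
    std::list<std::vector<uint8_t>> m_queue;     // queue of buffers awaiting encoding
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::thread m_thread;
    std::atomic<bool> m_running{false};          // shutdown/join handling omitted for brevity
};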
Video encoding
When H264EncodeApple receives data from the previous node, it compresses the frame with the H.264 encoder:
void H264EncodeApple::pushBuffer(const uint8_t *const data, size_t size, videocore::IMetadata &metadata)
{
#if VERSION_OK
if(m_compressionSession) {
m_encodeMutex.lock();
VTCompressionSessionRef session = (VTCompressionSessionRef)m_compressionSession;
CMTime pts = CMTimeMake(metadata.timestampDelta + m_ctsOffset, 1000.); // timestamp is in ms.
CMTime dur = CMTimeMake(1, m_fps);
VTEncodeInfoFlags flags;
CFMutableDictionaryRef frameProps = NULL;
if(m_forceKeyframe) {
s_forcedKeyframePTS = pts.value;
frameProps = CFDictionaryCreateMutable(kCFAllocatorDefault, 1,&kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
CFDictionaryAddValue(frameProps, kVTEncodeFrameOptionKey_ForceKeyFrame, kCFBooleanTrue);
}
VTCompressionSessionEncodeFrame(session, (CVPixelBufferRef)data, pts, dur, frameProps, NULL, &flags);
if(m_forceKeyframe) {
CFRelease(frameProps);
m_forceKeyframe = false;
}
m_encodeMutex.unlock();
}
#endif
}
Hardware encoding provided by iOS (VideoToolbox) is used here; the encoded data is passed to the next node from the encoder's output callback:
void vtCallback(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
{
    CMBlockBufferRef block = CMSampleBufferGetDataBuffer(sampleBuffer);
    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
    CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
    CMTime dts = CMSampleBufferGetDecodeTimeStamp(sampleBuffer);
    //printf("status: %d\n", (int) status);

    bool isKeyframe = false;
    if(attachments != NULL) {
        CFDictionaryRef attachment;
        CFBooleanRef dependsOnOthers;
        attachment = (CFDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        dependsOnOthers = (CFBooleanRef)CFDictionaryGetValue(attachment, kCMSampleAttachmentKey_DependsOnOthers);
        isKeyframe = (dependsOnOthers == kCFBooleanFalse);
    }

    if(isKeyframe) {
        // Send the SPS and PPS.
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t spsSize, ppsSize;
        size_t parmCount;
        const uint8_t* sps, *pps;

        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sps, &spsSize, &parmCount, nullptr);
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pps, &ppsSize, &parmCount, nullptr);

        std::unique_ptr<uint8_t[]> sps_buf(new uint8_t[spsSize + 4]);
        std::unique_ptr<uint8_t[]> pps_buf(new uint8_t[ppsSize + 4]);

        memcpy(&sps_buf[4], sps, spsSize);
        spsSize += 4;
        memcpy(&sps_buf[0], &spsSize, 4);
        memcpy(&pps_buf[4], pps, ppsSize);
        ppsSize += 4;
        memcpy(&pps_buf[0], &ppsSize, 4);

        ((H264EncodeApple*)outputCallbackRefCon)->compressionSessionOutput((uint8_t*)sps_buf.get(), spsSize, pts.value, dts.value);
        ((H264EncodeApple*)outputCallbackRefCon)->compressionSessionOutput((uint8_t*)pps_buf.get(), ppsSize, pts.value, dts.value);
    }

    char* bufferData;
    size_t size;
    CMBlockBufferGetDataPointer(block, 0, NULL, &size, &bufferData);

    ((H264EncodeApple*)outputCallbackRefCon)->compressionSessionOutput((uint8_t*)bufferData, size, pts.value, dts.value);
}

void H264EncodeApple::compressionSessionOutput(const uint8_t *data, size_t size, uint64_t pts, uint64_t dts)
{
#if VERSION_OK
    auto l = m_output.lock();
    if(l && data && size > 0) {
        videocore::VideoBufferMetadata md(pts, dts);
        l->pushBuffer(data, size, md);
    }
#endif
}
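The VTCompressionSessionEncodeFrame and vtCallback pair above only works once a compression session exists. The sketch below shows roughly how such a session is created and configured with VideoToolbox; the helper name and the specific property values are illustrative assumptions, not H264EncodeApple's exact setup code.

#include <VideoToolbox/VideoToolbox.h>

// Sketch: create a VTCompressionSession that delivers encoded frames to
// vtCallback above. Width/height/fps/bitrate values are illustrative.
static VTCompressionSessionRef createH264Session(void* refCon,
                                                 int32_t width, int32_t height,
                                                 int32_t fps, int32_t bitrate)
{
    VTCompressionSessionRef session = nullptr;
    OSStatus err = VTCompressionSessionCreate(kCFAllocatorDefault,
                                              width, height,
                                              kCMVideoCodecType_H264,
                                              nullptr,          // encoder specification
                                              nullptr,          // source image buffer attributes
                                              nullptr,          // compressed data allocator
                                              vtCallback,       // output callback (see above)
                                              refCon,           // comes back as outputCallbackRefCon
                                              &session);
    if(err != noErr) return nullptr;

    // Typical real-time streaming properties.
    VTSessionSetProperty(session, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTSessionSetProperty(session, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);

    CFNumberRef br = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitrate);
    VTSessionSetProperty(session, kVTCompressionPropertyKey_AverageBitRate, br);
    CFRelease(br);

    CFNumberRef fr = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &fps);
    VTSessionSetProperty(session, kVTCompressionPropertyKey_ExpectedFrameRate, fr);
    CFRelease(fr);

    VTCompressionSessionPrepareToEncodeFrames(session);
    return session;
}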
Audio encoding
When AACEncode receives the audio data, it encodes it to AAC. VideoCore uses the iOS AudioToolbox framework (AudioConverter) for the conversion.
OSStatus AACEncode::ioProc(AudioConverterRef audioConverter, UInt32 *ioNumDataPackets, AudioBufferList* ioData, AudioStreamPacketDescription** ioPacketDesc, void* inUserData)
{
UserData* ud = static_cast<UserData*>(inUserData);
UInt32 maxPackets = ud->size / ud->packetSize;
*ioNumDataPackets = std::min(maxPackets, *ioNumDataPackets);
ioData->mBuffers[0].mData = ud->data;
ioData->mBuffers[0].mDataByteSize = ud->size;
ioData->mBuffers[0].mNumberChannels = 1;
return noErr;
}
void AACEncode::pushBuffer(const uint8_t* const data, size_t size, IMetadata& metadata)
{
const size_t sampleCount = size / m_bytesPerSample;
const size_t aac_packet_count = sampleCount / kSamplesPerFrame;
const size_t required_bytes = aac_packet_count * m_outputPacketMaxSize;
if(m_outputBuffer.total() < (required_bytes)) {
m_outputBuffer.resize(required_bytes);
}
uint8_t* p = m_outputBuffer();
uint8_t* p_out = (uint8_t*)data;
for ( size_t i = 0 ; i < aac_packet_count ; ++i ) {
UInt32 num_packets = 1;
AudioBufferList l;
l.mNumberBuffers=1;
l.mBuffers[0].mDataByteSize = m_outputPacketMaxSize * num_packets;
l.mBuffers[0].mData = p;
std::unique_ptr<UserData> ud(new UserData());
ud->size = static_cast<int>(kSamplesPerFrame * m_bytesPerSample);
ud->data = const_cast<uint8_t*>(p_out);
ud->packetSize = static_cast<int>(m_bytesPerSample);
AudioStreamPacketDescription output_packet_desc[num_packets];
m_converterMutex.lock();
AudioConverterFillComplexBuffer(m_audioConverter, AACEncode::ioProc, ud.get(), &num_packets, &l, output_packet_desc);
m_converterMutex.unlock();
p += output_packet_desc[0].mDataByteSize;
p_out += kSamplesPerFrame * m_bytesPerSample;
}
const size_t totalBytes = p - m_outputBuffer();
auto output = m_output.lock();
if(output && totalBytes) {
if(!m_sentConfig) {
output->pushBuffer((const uint8_t*)m_asc, sizeof(m_asc), metadata);
m_sentConfig = true;
}
output->pushBuffer(m_outputBuffer(), totalBytes, metadata);
}
}
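The AudioConverterFillComplexBuffer call above needs an AudioConverterRef configured for PCM-to-AAC conversion. A sketch of how such a converter is typically created follows; the 16-bit interleaved PCM input matches the parameters pushed by MicSource, but the helper itself and its values are illustrative rather than the library's exact setup.

#include <AudioToolbox/AudioToolbox.h>

// Sketch: create an AAC converter for 16-bit interleaved PCM input.
static AudioConverterRef createAACConverter(Float64 sampleRate, UInt32 channels, UInt32 bitrate)
{
    AudioStreamBasicDescription in = {0};
    in.mSampleRate       = sampleRate;
    in.mFormatID         = kAudioFormatLinearPCM;
    in.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    in.mChannelsPerFrame = channels;
    in.mBitsPerChannel   = 16;
    in.mFramesPerPacket  = 1;
    in.mBytesPerFrame    = channels * 2;           // 16-bit samples
    in.mBytesPerPacket   = in.mBytesPerFrame;

    AudioStreamBasicDescription out = {0};
    out.mSampleRate       = sampleRate;
    out.mFormatID         = kAudioFormatMPEG4AAC;
    out.mChannelsPerFrame = channels;
    out.mFramesPerPacket  = 1024;                  // one AAC packet covers 1024 PCM frames

    AudioConverterRef converter = nullptr;
    if(AudioConverterNew(&in, &out, &converter) != noErr) {
        return nullptr;
    }
    // Optional: request a target bitrate from the encoder.
    AudioConverterSetProperty(converter, kAudioConverterEncodeBitRate,
                              sizeof(bitrate), &bitrate);
    return converter;
}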
The audio data is encoded and passed to the next node (Split -> AACPacketizer).
Pushing the stream
The logic for pushing the stream is handled in the last node (RTMPSession), and the key code is as follows:
void RTMPSession::pushBuffer(const uint8_t* const data, size_t size, IMetadata& metadata)
{
if(m_ending) {
return ;
}
// make the lambda capture the data
std::shared_ptr<Buffer> buf = std::make_shared<Buffer>(size);
buf->put(const_cast<uint8_t*>(data), size);
const RTMPMetadata_t inMetadata = static_cast<const RTMPMetadata_t&>(metadata);
m_jobQueue.enqueue([=]() {
if(!this->m_ending) {
static int c_count = 0;
c_count ++;
auto packetTime = std::chrono::steady_clock::now();
std::vector<uint8_t> chunk;
chunk.reserve(size+64);
size_t len = buf->size();
size_t tosend = std::min(len, m_outChunkSize);
uint8_t* p;
buf->read(&p, buf->size());
uint64_t ts = inMetadata.getData<kRTMPMetadataTimestamp>() ;
const int streamId = inMetadata.getData<kRTMPMetadataMsgStreamId>();
#ifndef RTMP_CHUNK_TYPE_0_ONLY
auto it = m_previousChunkData.find(streamId);
if(it == m_previousChunkData.end()) {
#endif
// Type 0.
put_byte(chunk, ( streamId & 0x1F));
put_be24(chunk, static_cast<uint32_t>(ts));
put_be24(chunk, inMetadata.getData<kRTMPMetadataMsgLength>());
put_byte(chunk, inMetadata.getData<kRTMPMetadataMsgTypeId>());
put_buff(chunk, (uint8_t*)&m_streamId, sizeof(int32_t)); // msg stream id is little-endian
#ifndef RTMP_CHUNK_TYPE_0_ONLY
} else {
// Type 1.
put_byte(chunk, RTMP_CHUNK_TYPE_1 | (streamId & 0x1F));
put_be24(chunk, static_cast<uint32_t>(ts - it->second)); // timestamp delta
put_be24(chunk, inMetadata.getData<kRTMPMetadataMsgLength>());
put_byte(chunk, inMetadata.getData<kRTMPMetadataMsgTypeId>());
}
#endif
m_previousChunkData[streamId] = ts;
put_buff(chunk, p, tosend);
len -= tosend;
p += tosend;
while(len > 0) {
tosend = std::min(len, m_outChunkSize);
p[-1] = RTMP_CHUNK_TYPE_3 | (streamId & 0x1F);
put_buff(chunk, p-1, tosend+1);
p+=tosend;
len-=tosend;
}
this->write(&chunk[0], chunk.size(), packetTime, inMetadata.getData<kRTMPMetadataIsKeyframe>() );
}
});
}
The data is packed into RTMP protocol chunks and written to the stream.
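The put_byte, put_be24 and put_buff helpers used in pushBuffer append bytes to the chunk vector, with the multi-byte RTMP header fields written big-endian (hence "be24"). A minimal sketch of what they amount to, assuming exactly those semantics:

#include <cstdint>
#include <vector>

// Append a single byte to the chunk.
static inline void put_byte(std::vector<uint8_t>& v, uint8_t b) {
    v.push_back(b);
}

// Append a 24-bit big-endian value (RTMP timestamps and message lengths).
static inline void put_be24(std::vector<uint8_t>& v, uint32_t value) {
    v.push_back((value >> 16) & 0xFF);
    v.push_back((value >> 8)  & 0xFF);
    v.push_back( value        & 0xFF);
}

// Append an arbitrary byte buffer (payload, or the little-endian stream id).
static inline void put_buff(std::vector<uint8_t>& v, const uint8_t* data, size_t size) {
    v.insert(v.end(), data, data + size);
}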