This article was first published on the WeChat official account ByteFlow.
Previous installments in the FFmpeg development series:
FFmpeg Development (01): FFmpeg compilation and integration
FFmpeg Development (02): FFmpeg + ANativeWindow video decoding and playback
FFmpeg Development (03): FFmpeg + OpenSL ES audio decoding and playback
FFmpeg Development (04): FFmpeg + OpenGL ES audio visualization
FFmpeg Development (05): FFmpeg + OpenGL ES video decoding, playback, and video filters
In the previous articles, we used FFmpeg to decode video and audio and rendered them with OpenGL ES and OpenSL ES respectively. This article implements the last essential feature of a player: audio and video synchronization.
As the saying goes, there is no absolute, static synchronization between audio and video playback, only relative, dynamic synchronization; in practice it is a continuous game of "you catch up with me".
There are three common synchronization strategies: synchronizing both audio and video to the system clock, synchronizing audio to video, and synchronizing video to audio.
Audio and video decoder structure
Before implementing synchronization, let us briefly outline the overall structure of the player; it makes the different synchronization strategies easy to implement.
As shown in the figure above, audio decoding and video decoding each run on their own thread. Each thread contains a decoding loop that continuously decodes the encoded audio or video data. The decoded frames are not cached in a buffer but rendered immediately, which greatly simplifies audio and video synchronization.
This player model, with independent audio and video decoding threads, is simple and flexible, involves little code, and makes audio and video synchronization straightforward to implement, which suits beginners well.
The audio and video decoding processes are very similar, so we can abstract the decoders for both as a base class:
class DecoderBase : public Decoder {
public:
    DecoderBase() {};
    virtual ~DecoderBase() {};

    // Start playback
    virtual void Start();

    // Pause playback
    virtual void Pause();

    // Stop playback
    virtual void Stop();

    // Get the total duration
    virtual float GetDuration() {
        // ms to s
        return m_Duration * 1.0f / 1000;
    }

    // Seek to a given position
    virtual void SeekToPosition(float position);

    // Current playback position, used to update the progress bar and for A/V sync
    virtual float GetCurrentPosition();

    virtual void ClearCache() {};

    virtual void SetMessageCallback(void* context, MessageCallback callback) {
        m_MsgContext = context;
        m_MsgCallback = callback;
    }

    // Set the A/V sync callback
    virtual void SetAVSyncCallback(void* context, AVSyncCallback callback) {
        m_AVDecoderContext = context;
        m_AVSyncCallback = callback;
    }

protected:
    void *m_MsgContext = nullptr;
    MessageCallback m_MsgCallback = nullptr;

    virtual int Init(const char *url, AVMediaType mediaType);
    virtual void UnInit();
    virtual void OnDecoderReady() = 0;
    virtual void OnDecoderDone() = 0;

    // Callback delivering a decoded frame
    virtual void OnFrameAvailable(AVFrame *frame) = 0;

    AVCodecContext *GetCodecContext() {
        return m_AVCodecContext;
    }

private:
    int InitFFDecoder();
    void UnInitDecoder();

    // Start the decoding thread
    void StartDecodingThread();

    // Audio/video decoding loop
    void DecodingLoop();

    // Update the presentation timestamp
    void UpdateTimeStamp();

    // Audio and video synchronization
    void AVSync();

    // Decode one packet of encoded data
    int DecodeOnePacket();

    // Thread entry function
    static void DoAVDecoding(DecoderBase *decoder);

    // Demuxing (format) context
    AVFormatContext *m_AVFormatContext = nullptr;

    // Decoder context
    AVCodecContext *m_AVCodecContext = nullptr;

    // Decoder
    AVCodec *m_AVCodec = nullptr;

    // Encoded packet
    AVPacket *m_Packet = nullptr;

    // Decoded frame
    AVFrame *m_Frame = nullptr;

    // Media type of the stream
    AVMediaType m_MediaType = AVMEDIA_TYPE_UNKNOWN;

    // File path
    char m_Url[MAX_PATH] = {0};

    // Current playback timestamp (ms)
    long m_CurTimeStamp = 0;

    // Playback start time (ms)
    long m_StartTimeStamp = -1;

    // Total duration (ms)
    long m_Duration = 0;

    // Stream index
    int m_StreamIndex = -1;

    // Lock and condition variable
    mutex m_Mutex;
    condition_variable m_Cond;
    thread *m_Thread = nullptr;

    // Seek position
    volatile float m_SeekPosition = 0;
    volatile bool m_SeekSuccess = false;

    // Decoder state
    volatile int m_DecoderState = STATE_UNKNOWN;

    void* m_AVDecoderContext = nullptr;
    AVSyncCallback m_AVSyncCallback = nullptr; // Used for audio and video synchronization
};
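To show how the base class is used, here is a minimal, hypothetical sketch of a video subclass. The VideoRender interface, its RenderVideoFrame() method, and the m_VideoRender member are assumptions for illustration, not the article's exact code:

// Hypothetical sketch of a subclass; names are illustrative, not the exact project code.
class VideoDecoder : public DecoderBase {
public:
    VideoDecoder(const char *url) {
        Init(url, AVMEDIA_TYPE_VIDEO); // Open the file and pick the video stream
    }

    virtual ~VideoDecoder() {
        UnInit();
    }

    void SetVideoRender(VideoRender *render) {
        m_VideoRender = render;
    }

protected:
    virtual void OnDecoderReady() override {
        // e.g. pass the frame size from GetCodecContext() to the renderer
    }

    virtual void OnDecoderDone() override {
        // e.g. notify the UI that playback has finished
    }

    // Called once per decoded frame by the base class
    virtual void OnFrameAvailable(AVFrame *frame) override {
        if(m_VideoRender != nullptr)
            m_VideoRender->RenderVideoFrame(frame); // Assumed renderer interface
    }

private:
    VideoRender *m_VideoRender = nullptr; // Assumed member type
};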
Space is limited and too much code causes visual fatigue, so only a few key functions are posted here; the full implementation is available through the "Read the original" link.
The decoding loop:
void DecoderBase::DecodingLoop() {
    LOGCATE("DecoderBase::DecodingLoop start, m_MediaType=%d", m_MediaType);
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_DecoderState = STATE_DECODING;
        lock.unlock();
    }

    for(;;) {
        while (m_DecoderState == STATE_PAUSE) {
            std::unique_lock<std::mutex> lock(m_Mutex);
            LOGCATE("DecoderBase::DecodingLoop waiting, m_MediaType=%d", m_MediaType);
            m_Cond.wait_for(lock, std::chrono::milliseconds(10));
            // While paused, keep shifting the start time so the playback clock does not advance
            m_StartTimeStamp = GetSysCurrentTime() - m_CurTimeStamp;
        }

        if(m_DecoderState == STATE_STOP) {
            break;
        }

        if(m_StartTimeStamp == -1)
            m_StartTimeStamp = GetSysCurrentTime();

        if(DecodeOnePacket() != 0) {
            // Finished decoding (or hit an error): pause the decoder
            std::unique_lock<std::mutex> lock(m_Mutex);
            m_DecoderState = STATE_PAUSE;
        }
    }

    LOGCATE("DecoderBase::DecodingLoop end");
}
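The bodies of StartDecodingThread() and the thread entry DoAVDecoding() are not among the posted excerpts; a plausible sketch of how they fit together, with the error handling details assumed, is:

// Plausible sketch; the real implementation may differ in details such as error handling.
void DecoderBase::StartDecodingThread() {
    m_Thread = new thread(DoAVDecoding, this); // Run the decoder on its own thread
}

void DecoderBase::DoAVDecoding(DecoderBase *decoder) {
    if(decoder->InitFFDecoder() == 0) { // Open the stream, find the codec, allocate contexts
        decoder->OnDecoderReady();      // Let the subclass prepare its renderer
        decoder->DecodingLoop();        // Runs until the state becomes STATE_STOP
    }
    decoder->UnInitDecoder();
    decoder->OnDecoderDone();
}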
Updating the current timestamp:
void DecoderBase::UpdateTimeStamp() {
    LOGCATE("DecoderBase::UpdateTimeStamp");
    // Reference: ffplay
    std::unique_lock<std::mutex> lock(m_Mutex);
    if(m_Frame->pkt_dts != AV_NOPTS_VALUE) {
        m_CurTimeStamp = m_Frame->pkt_dts;
    } else if(m_Frame->pts != AV_NOPTS_VALUE) {
        m_CurTimeStamp = m_Frame->pts;
    } else {
        m_CurTimeStamp = 0;
    }

    // Convert from stream time_base units to milliseconds
    m_CurTimeStamp = (int64_t)((m_CurTimeStamp * av_q2d(m_AVFormatContext->streams[m_StreamIndex]->time_base)) * 1000);
}
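As a concrete, worked example of the conversion above (the values are illustrative): with a video time_base of 1/90000, a raw pts of 180000 maps to 180000 × (1/90000) × 1000 = 2000 ms:

#include <cstdint>
extern "C" {
#include <libavutil/rational.h>
}

// Illustrative values, not taken from a real stream.
AVRational timeBase = {1, 90000}; // 90 kHz clock, typical for video streams
int64_t pts = 180000;             // Raw timestamp in time_base units
int64_t ms  = (int64_t)(pts * av_q2d(timeBase) * 1000); // av_q2d() == num/den
// ms == 2000: this frame should be presented 2 seconds into playback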
Decoding one packet of encoded data:
int DecoderBase::DecodeOnePacket() {
    int result = av_read_frame(m_AVFormatContext, m_Packet);
    while(result == 0) {
        if(m_Packet->stream_index == m_StreamIndex) {
            if(avcodec_send_packet(m_AVCodecContext, m_Packet) == AVERROR_EOF) {
                // End of decoding
                result = -1;
                goto __EXIT;
            }

            // One packet may contain several frames
            int frameCount = 0;
            while (avcodec_receive_frame(m_AVCodecContext, m_Frame) == 0) {
                // Update the timestamp
                UpdateTimeStamp();

                // Synchronize
                AVSync();

                // Render
                LOGCATE("DecoderBase::DecodeOnePacket 000 m_MediaType=%d", m_MediaType);
                OnFrameAvailable(m_Frame);
                LOGCATE("DecoderBase::DecodeOnePacket 0001 m_MediaType=%d", m_MediaType);

                frameCount++;
            }
            LOGCATE("BaseDecoder::DecodeOneFrame frameCount=%d", frameCount);

            // If at least one frame was decoded, this packet is done
            if(frameCount > 0) {
                result = 0;
                goto __EXIT;
            }
        }
        av_packet_unref(m_Packet);
        result = av_read_frame(m_AVFormatContext, m_Packet);
    }

__EXIT:
    av_packet_unref(m_Packet);
    return result;
}
Synchronizing audio and video to the system clock
As the name implies, synchronizing to the system clock means taking the steadily advancing system time as the reference and aligning each decoded audio and video frame against it.
In short, whenever the timestamp of the current audio or video frame is ahead of the system clock, the decoding thread sleeps until the timestamp and the system clock line up.
Synchronizing audio and video to the system clock:
void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    long curSysTime = GetSysCurrentTime();

    // Elapsed time since playback started, measured by the system clock
    long elapsedTime = curSysTime - m_StartTimeStamp;

    // Align with the system clock
    if(m_CurTimeStamp > elapsedTime) {
        // Sleep until the clocks line up
        auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime); // ms
        av_usleep(sleepTime * 1000);
    }
}
Synchronizing both audio and video to the system clock keeps frame drops and frame skips to a minimum, but it only works well when the system clock is not disturbed by other time-consuming tasks.
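GetSysCurrentTime() is not shown in the posted code; assuming it returns milliseconds, a plausible implementation is a thin wrapper over the C++ monotonic clock:

#include <chrono>

// A plausible implementation, assuming the function returns milliseconds;
// steady_clock is monotonic, so it is immune to wall-clock adjustments.
static long GetSysCurrentTime() {
    using namespace std::chrono;
    return (long) duration_cast<milliseconds>(
            steady_clock::now().time_since_epoch()).count();
}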
Synchronizing audio to video
Audio-to-video synchronization aligns the audio timestamp against the video timestamp. Since video is rendered at a fixed refresh rate, its FPS, we can derive the presentation time of each frame from the FPS and use that as the video timestamp.
When the audio timestamp runs ahead of the video timestamp beyond a certain threshold, the audio player typically inserts silent frames, sleeps, or slows playback down; when it falls behind, the player skips samples, drops frames, or speeds playback up.
void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    if(m_AVSyncCallback != nullptr) {
        // Audio synchronizes to video: m_AVSyncCallback returns the video timestamp
        long elapsedTime = m_AVSyncCallback(m_AVDecoderContext);
        LOGCATE("DecoderBase::AVSync m_CurTimeStamp=%ld, elapsedTime=%ld", m_CurTimeStamp, elapsedTime);

        if(m_CurTimeStamp > elapsedTime) {
            // Sleep until the timestamps line up
            auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime); // ms
            av_usleep(sleepTime * 1000);
        }
    }
}
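The code above takes the simplest option of sleeping the audio decoding thread. If the audio renderer must be fed continuously, as with some OpenSL ES buffer-queue setups, a hypothetical alternative is to enqueue silence for the gap instead; EnqueuePcm() and the 16-bit PCM format below are assumptions for illustration:

#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed renderer entry point, declared here only so the sketch is self-contained.
void EnqueuePcm(const void *data, std::size_t bytes);

// Hypothetical helper: feed silence for the gap instead of blocking the decode thread.
void EnqueueSilence(long gapMs, int sampleRate, int channels) {
    std::size_t sampleCount = (std::size_t)(gapMs * sampleRate / 1000) * channels;
    std::vector<int16_t> silence(sampleCount, 0); // Zeroed 16-bit samples are silence
    EnqueuePcm(silence.data(), silence.size() * sizeof(int16_t));
}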
Decoder setup when audio is synchronized to video:
// Create the decoders
m_VideoDecoder = new VideoDecoder(url);
m_AudioDecoder = new AudioDecoder(url);

// Set the renderers
m_VideoDecoder->SetVideoRender(OpenGLRender::GetInstance());
m_AudioRender = new OpenSLRender();
m_AudioDecoder->SetVideoRender(m_AudioRender);

// Set the video-timestamp callback
m_AudioDecoder->SetAVSyncCallback(m_VideoDecoder, VideoDecoder::GetVideoDecoderTimestampForAVSync);
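GetVideoDecoderTimestampForAVSync() itself is not among the posted excerpts. Given the SetAVSyncCallback(context, callback) contract above, a plausible sketch casts the context back to the decoder and returns its current timestamp; the audio-side counterpart GetAudioDecoderTimestampForAVSync() would be symmetric:

// Plausible sketch based on the SetAVSyncCallback(context, callback) contract above;
// GetCurrentTimeStamp() is an assumed accessor for m_CurTimeStamp.
long VideoDecoder::GetVideoDecoderTimestampForAVSync(void *context) {
    long timestamp = 0;
    if(context != nullptr) {
        auto *decoder = static_cast<VideoDecoder *>(context);
        timestamp = decoder->GetCurrentTimeStamp();
    }
    return timestamp;
}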
The advantage of audio-to-video synchronization is that every video frame is rendered, so picture smoothness is optimal.
However, because the human ear is far more sensitive to sound than the eye is to images, the silent frames, dropped samples, or speed changes on the audio side are easily noticed by users, which makes for a poor experience.
Synchronizing video to audio
Synchronizing video to audio is the common approach, and it exploits the same fact: the human ear is more sensitive to changes in sound than the eye is to changes in images.
Audio plays at its fixed sample rate and thus provides the alignment baseline for video. When the video timestamp runs ahead of the audio timestamp, the renderer waits or repeats the previous frame; when it falls behind, frames are skipped.
void DecoderBase::AVSync() {
    LOGCATE("DecoderBase::AVSync");
    if(m_AVSyncCallback != nullptr) {
        // Video synchronizes to audio: m_AVSyncCallback returns the audio timestamp
        long elapsedTime = m_AVSyncCallback(m_AVDecoderContext);
        LOGCATE("DecoderBase::AVSync m_CurTimeStamp=%ld, elapsedTime=%ld", m_CurTimeStamp, elapsedTime);

        if(m_CurTimeStamp > elapsedTime) {
            // Sleep until the timestamps line up
            auto sleepTime = static_cast<unsigned int>(m_CurTimeStamp - elapsedTime); // ms
            av_usleep(sleepTime * 1000);
        }
    }
}
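Note that the function above only handles the case where video runs ahead of audio, by sleeping. A hypothetical sketch of the opposite branch, skipping a frame when video falls too far behind, might look like this; the 100 ms threshold and the helper itself are assumptions:

// Illustrative sketch; the threshold and this helper are assumptions, not the article's code.
static const long kDropThresholdMs = 100; // How far video may lag before skipping

static bool ShouldDropFrame(long videoTimestampMs, long audioTimestampMs) {
    // Rendering a frame this far behind the audio clock would only add lag,
    // so skip it and move straight on to decoding the next frame.
    return audioTimestampMs - videoTimestampMs > kDropThresholdMs;
}

In the decoding loop, such a check would sit just before OnFrameAvailable(): when it returns true, the frame is skipped rather than rendered.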
Decoder setup when video is synchronized to audio:
// Create the decoders
m_VideoDecoder = new VideoDecoder(url);
m_AudioDecoder = new AudioDecoder(url);

// Set the renderers
m_VideoDecoder->SetVideoRender(OpenGLRender::GetInstance());
m_AudioRender = new OpenSLRender();
m_AudioDecoder->SetVideoRender(m_AudioRender);

// Set the audio-timestamp callback
m_VideoDecoder->SetAVSyncCallback(m_AudioDecoder, AudioDecoder::GetAudioDecoderTimestampForAVSync);
Conclusion
Which of the three synchronization strategies is appropriate depends on the usage scenario. For example, if picture smoothness matters most, choose audio-to-video synchronization; if you only need to play video or audio on its own, synchronizing directly to the system clock is simpler.
Contact and exchange
For technical discussion or to get the full source code, add my WeChat: byte-flow. Audio and video development video tutorials are available there as well.