List of SDL2 articles

Introduction to SDL2

SDL2 event processing

SDL2 texture rendering

SDL2 audio playback

FFmpeg+SDL2 video streaming

FFmpeg+SDL2 audio stream playback

The previous two articles implemented video playback and audio playback respectively. To build a complete, if simple, player we must also synchronize audio and video, and audio-video synchronization is a very important topic in audio/video development, so this post records my understanding of it.

Introduction to audio and video synchronization

As the earlier articles showed, audio and video exist as separate streams in a media file and do not interfere with each other. In theory, the presentation time of any video frame (or audio sample) can be computed independently from the video's frame rate and the audio's sample rate, so playback should stay in sync. In practice, factors such as machine speed and decoding efficiency easily push audio and video out of sync: for example, a person on screen is clearly talking, the mouth moves, but no sound is heard, which badly hurts the viewing experience.

How do we synchronize audio and video? Understand that synchronization is a dynamic process: being in sync is momentary, being out of sync is the norm. We need a reference value that increases linearly over time; both video and audio pace their playback against it, slowing down when they run ahead and speeding up when they fall behind, chasing each other into a synchronized state. There are three main approaches:

  • Synchronize both video and audio to an external clock: pick an external clock as the reference, and pace both playback speeds against it.
  • Synchronize audio to video: adjust the audio to the video's playback speed.
  • Synchronize video to audio: adjust the video to the audio's playback speed.

The mainstream choice is the third: sync video to audio. Why not the first two? Generally speaking, people are far more sensitive to sound than to images. Frequently adjusting the audio produces audible artifacts that sound harsh and unpleasant, while small adjustments to the picture go largely unnoticed, so the third method is generally adopted.

Review DTS, PTS and time base

  • PTS: Presentation Time Stamp, the timestamp used for rendering; it tells us when the frame needs to be displayed
  • DTS: Decode Time Stamp; it tells us when the frame needs to be decoded

PTS and DTS are generally the same in audio. But in video, PTS and DTS may be different due to the presence of B frames.

Display (actual) frame order: I, B, B, P

Stored (decode) frame order: I, P, B, B

DTS, in stored order: 1, 2, 3, 4

PTS, in stored order: 1, 4, 2, 3

The P frame must be decoded before the B frames that reference it, so it is stored and decoded ahead of them even though it is displayed last.
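Sorting decoded frames by PTS recovers the display order. A minimal self-contained sketch (the `DemoPacket` struct and its values are illustrative stand-ins, not an FFmpeg API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical decoded frame with its timestamps (illustration only). */
typedef struct { char type; int dts; int pts; } DemoPacket;

static int by_pts(const void *a, const void *b) {
    return ((const DemoPacket *)a)->pts - ((const DemoPacket *)b)->pts;
}

/* Sort frames (which arrive in decode order) by PTS to get display order. */
void reorder_by_pts(DemoPacket *pkts, size_t n) {
    qsort(pkts, n, sizeof(DemoPacket), by_pts);
}
```

Feeding in the stored order I, P, B, B from the table above yields the display order I, B, B, P.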

  • The time base
/**
 * This is the fundamental unit of time (in seconds) in terms
 * of which frame timestamps are represented.
 */
typedef struct AVRational{
    int num; ///< Numerator
    int den; ///< Denominator
} AVRational;

The time base is a fraction, in units of seconds, such as 1/50 of a second. What does that mean? In terms of frame rate: a time base of 1/50 of a second means one frame of data is displayed every 1/50 of a second, that is, 50 frames per second, a frame rate of 50 FPS.
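The relationship can be checked with a tiny self-contained sketch (the `Rational` type is a local stand-in for FFmpeg's AVRational shown above):

```c
#include <assert.h>

/* Local stand-in for FFmpeg's AVRational, for a self-contained example. */
typedef struct { int num, den; } Rational;

static double q2d(Rational a) { return a.num / (double) a.den; }

/* One tick of the time base is the nominal duration of one frame:
   a time base of 1/50 s means 0.02 s per frame, i.e. 50 FPS. */
double frame_duration_seconds(Rational time_base) {
    return q2d(time_base);
}
```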

Each frame of data has a corresponding PTS, and when playing video or audio we need to convert that PTS into a time in seconds for display. How do we compute where a frame sits, in time, within the whole video?

static inline double av_q2d(AVRational a){
    return a.num / (double) a.den;
}

// Calculate the time position of a frame in the entire video:
// Timestamp (seconds) = PTS * av_q2d(st->time_base);
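As a concrete example, with the common 90 kHz MPEG time base (an assumed value for illustration), a PTS of 180000 corresponds to 2 seconds. A self-contained sketch using local stand-ins for AVRational and av_q2d:

```c
#include <assert.h>

/* Local stand-ins so the example compiles without FFmpeg headers. */
typedef struct { int num, den; } Rational;

static double q2d(Rational a) { return a.num / (double) a.den; }

/* Timestamp (seconds) = PTS * av_q2d(time_base). */
double pts_to_seconds(long long pts, Rational time_base) {
    return pts * q2d(time_base);
}
```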

Audio_Clock

Audio_Clock is the elapsed time from the start of the audio up to now. Obtaining the Audio_Clock:

if (pkt->pts != AV_NOPTS_VALUE) {
    state->audio_clock = av_q2d(state->audio_st->time_base) * pkt->pts;
}

Since one packet can contain multiple frames, the PTS in the packet may be much earlier than the PTS actually being played. The playback duration of the data in the packet can be computed from the sample rate and sample format, and the Audio_Clock is then updated accordingly:

// Bytes of audio played per second = sample rate * number of channels * bytes per sample
n = 2 * state->audio_ctx->channels;
state->audio_clock += (double) data_size /
                   (double) (n * state->audio_ctx->sample_rate);
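For instance, at a 44100 Hz sample rate, stereo, with 16-bit (2-byte) samples (assumed values for illustration), one second of audio is 176400 bytes. A small self-contained helper:

```c
#include <assert.h>

/* Bytes of audio consumed per second = sample_rate * channels * bytes per sample.
   Assumes 16-bit (2-byte) samples, as in the snippet above. */
int audio_bytes_per_sec(int sample_rate, int channels) {
    return sample_rate * channels * 2;
}

/* Seconds of audio represented by data_size bytes. */
double audio_duration(int data_size, int sample_rate, int channels) {
    return (double) data_size / audio_bytes_per_sec(sample_rate, channels);
}
```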

Finally, even when we read the Audio_Clock, the audio buffer most likely still holds data that has not finished playing, so we must subtract the duration of that unplayed data from the Audio_Clock to get the real Audio_Clock.

double get_audio_clock(VideoState *state) {
    double pts;
    int buf_size, bytes_per_sec;

    // PTS obtained in the previous step
    pts = state->audio_clock;
    // Bytes in the audio buffer that have not been played yet
    buf_size = state->audio_buf_size - state->audio_buf_index;
    // The number of bytes of audio played per second
    bytes_per_sec = state->audio_ctx->sample_rate * state->audio_ctx->channels * 2;
    pts -= (double) buf_size / bytes_per_sec;
    return pts;
}
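The buffer correction can be checked in isolation. The sketch below mirrors the field names of the VideoState snippet above, but the struct and numbers here are hypothetical stand-ins:

```c
#include <assert.h>

/* Stand-in state for the buffer correction in get_audio_clock above. */
typedef struct {
    double audio_clock;  /* seconds of audio handed to the device so far */
    int audio_buf_size;  /* bytes in the current buffer */
    int audio_buf_index; /* bytes of it already consumed */
    int bytes_per_sec;   /* sample_rate * channels * 2 */
} AudioClockState;

/* Subtract the duration of the not-yet-played part of the buffer. */
double corrected_audio_clock(const AudioClockState *s) {
    int remaining = s->audio_buf_size - s->audio_buf_index;
    return s->audio_clock - (double) remaining / s->bytes_per_sec;
}
```

With a clock of 10 s and half of a one-second buffer unplayed, the real clock is 9.5 s.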

get_audio_clock returns the final Audio_Clock: how long the audio has currently been playing.

Video_Clock

Video_Clock is how long the video has been playing when it reaches the current frame. Obtaining it:

avcodec_send_packet(state->video_ctx, packet);
while (avcodec_receive_frame(state->video_ctx, pFrame) == 0) {
    if ((pts = pFrame->best_effort_timestamp) == AV_NOPTS_VALUE) {
        pts = 0;
    }
    pts *= av_q2d(state->video_st->time_base); // Time base conversion, in seconds

    pts = synchronize_video(state, pFrame, pts);
    
    av_packet_unref(packet);
}

Older versions of FFmpeg used the av_frame_get_best_effort_timestamp function to obtain the most appropriate PTS for the video; newer versions fill in best_effort_timestamp during decoding. Either way it is still possible that no valid PTS is obtained, which is handled in synchronize_video.

double synchronize_video(VideoState *state, AVFrame *src_frame, double pts) {

    double frame_delay;

    if (pts != 0) {
        state->video_clock = pts;
    } else {
        pts = state->video_clock; // The PTS was invalid; reuse the previous value
    }
    // Calculate the interval time of each frame according to the time base
    frame_delay = av_q2d(state->video_ctx->time_base);
    // The time to delay the decoded frame
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
    state->video_clock += frame_delay; // Advance video_clock: effectively the predicted time of the next frame
    return pts;
}
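The frame-delay arithmetic in synchronize_video can be worked through with concrete numbers. A self-contained sketch, assuming a codec time base of 1/25 s (a 25 FPS assumption for illustration):

```c
#include <assert.h>

/* Per-frame delay as computed in synchronize_video above: the codec time base
   gives the nominal frame interval, and repeat_pict extends it in half-frame
   steps (extra_delay = repeat_pict * (frame_delay * 0.5)). */
double frame_delay_seconds(int tb_num, int tb_den, int repeat_pict) {
    double frame_delay = tb_num / (double) tb_den;
    return frame_delay + repeat_pict * (frame_delay * 0.5);
}
```

At 1/25 s with repeat_pict = 0 the delay is 0.04 s; with repeat_pict = 1 it grows to 0.06 s.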

Synchronization

The previous two steps gave us the Audio_Clock and the Video_Clock: we know the display time of each frame in the video stream, and we have the audio clock as the reference time, so we can synchronize the video to the audio:

  1. Subtract the previous frame's PTS from the current frame's PTS to get a nominal delay
  2. Compare the current frame's PTS with the Audio_Clock to determine whether the video is playing too fast or too slow
  3. Based on step 2, set the delay before displaying the next frame
#define AV_SYNC_THRESHOLD 0.01 // Minimum sync threshold (seconds)
#define AV_NOSYNC_THRESHOLD 10.0 // Beyond this difference the clocks are considered out of sync; no adjustment is made
double actual_delay, delay, sync_threshold, ref_clock, diff;

// Subtract the current Frame time from the last Frame time to get the delay between two frames
delay = vp->pts - is->frame_last_pts;
if (delay <= 0 || delay >= 1.0) { 
    // A delay below 0 or above 1 second (too long) is abnormal; fall back to the previous frame's delay
    delay = is->frame_last_delay;
}

// Get audio Audio_Clock
ref_clock = get_audio_clock(is);
// Get the difference between the current PTS and Audio_Clock
diff = vp->pts - ref_clock;

sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;

// Adjust the delay for playing the next frame to achieve synchronization
if (fabs(diff) < AV_NOSYNC_THRESHOLD) {
    if (diff <= -sync_threshold) { // Video lags audio: set the delay to 0
        delay = 0;
    } else if (diff >= sync_threshold) { // Video leads audio: double the delay
        delay = 2 * delay;
    }
}
is->frame_timer += delay;
// Finally the actual delay time
actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
if (actual_delay < 0.010) {
    // If the delay time is too small, set the minimum value
    actual_delay = 0.010;
}
// Refresh the video according to the delay time
schedule_refresh(is, (int) (actual_delay * 1000 + 0.5));
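The core decision above can be isolated into one testable function. A self-contained sketch of the delay adjustment (given the nominal inter-frame delay and diff = video PTS minus audio clock):

```c
#include <assert.h>
#include <math.h>

#define AV_SYNC_THRESHOLD 0.01
#define AV_NOSYNC_THRESHOLD 10.0

/* Decide how long to wait before showing the next frame, mirroring the
   logic above: within the no-sync window, drop the delay to 0 when video
   lags audio, and double it when video runs ahead. */
double adjust_delay(double delay, double diff) {
    double sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
    if (fabs(diff) < AV_NOSYNC_THRESHOLD) {
        if (diff <= -sync_threshold)     /* video behind audio: show at once */
            delay = 0;
        else if (diff >= sync_threshold) /* video ahead of audio: wait longer */
            delay = 2 * delay;
    }
    return delay;
}
```

For a 0.04 s nominal delay: video 0.2 s behind gives 0, video 0.2 s ahead gives 0.08 s, and a diff beyond 10 s leaves the delay untouched.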

Conclusion

With video synchronized to audio, basic audio-video synchronization is complete. The overall dynamic is: whoever runs fast waits, whoever runs slow speeds up, the two chasing each other into synchronized playback.

A later post will actually implement a player with audio-video synchronization.