After the 2018 reshuffle, competition in the live streaming industry has become increasingly fierce. The period of wild growth has passed; what remains is a constant pursuit of experience. Recently I worked on first-frame ("instant open") optimization for our live streaming player and, through a combination of schemes applied together, brought the first-frame time below 500 ms. I hope it can serve as a reference.

Background: ijkplayer, based on FFmpeg, at the latest version 0.8.8.

The pull protocol is HTTP-FLV. HTTP-FLV is more stable, and most domestic live streaming companies basically use it; according to our actual measurements it is also slightly faster. That said, the optimizations below also account for RTMP sources.

IP direct connection

Simply put: replace the domain name with an IP address. For example, https://www.baidu.com/ can be requested directly as 14.215.177.39. The goal is to save DNS resolution time. Especially on a poor network, accessing a domain means waiting for the resolution round trip, and besides the resolution delay there is also the problem of DNS hijacking by small operators. The usual approach is to pre-resolve the pull-stream domain when the application starts, save the result locally, and then use the cached IP directly when the stream is actually opened. A typical choice is HTTPDNS, implementations of which are available on GitHub.
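As a minimal sketch of the pre-resolve idea (assuming a single cached IPv4 address; prefetch_dns and g_cached_ip are illustrative names, not ijkplayer or HTTPDNS APIs):

#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

static char g_cached_ip[INET_ADDRSTRLEN];  /* filled once at app startup */

static int prefetch_dns(const char *hostname)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;           /* IPv4 only, for simplicity */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(hostname, NULL, &hints, &res) != 0 || !res)
        return -1;                         /* fall back to the domain name */

    struct sockaddr_in *sa = (struct sockaddr_in *)res->ai_addr;
    inet_ntop(AF_INET, &sa->sin_addr, g_cached_ip, sizeof(g_cached_ip));
    freeaddrinfo(res);
    return 0;
}

When the real pull URL is built, substitute g_cached_ip for the hostname (HTTP only; see the caveat below). Note that ijkplayer's tcp_open, shown later, also maintains its own DNS cache via add_dns_cache_entry.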

Note that this scheme fails with HTTPS: the hostname no longer matches the certificate during HTTPS certificate verification, so the SSL/TLS handshake fails.

Server GOP cache

In addition to client-side work, we can also optimize on the streaming media server. The image frames in a live stream are divided into I, P, and B frames, and only I frames can be decoded independently of other frames. This means that when the player receives an I frame it can render immediately, whereas when it receives P or B frames it has to wait for the frames they depend on and cannot complete decoding and rendering right away. To the viewer, this waiting period is a "black screen".

Therefore, the server can cache a GOP (in H.264 the GOP is closed: a sequence of image frames beginning with an I frame) so that a player joining the live broadcast is guaranteed to obtain an I frame and render a picture immediately, optimizing the first-screen experience.

This is where IDR frames come in. All IDR frames are I frames, but not all I frames are IDR frames: IDR frames are a subset of I frames. An I frame is, strictly speaking, an intra-coded frame, compressed using only its own data, which is why I frames are commonly called "key frames". An IDR frame is an I frame extended with control logic: when the decoder reaches an IDR frame, it immediately empties the reference-frame queue, outputs or discards all decoded data, reads the parameter sets again, and starts a new sequence. This provides an opportunity to resynchronize if the previous sequence contained a major error, and frames after an IDR frame never reference frames before it. In H.264 coding the GOP is closed, and the first frame of a GOP is an IDR frame.
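To make the server-side idea concrete, here is a conceptual sketch of a GOP cache, purely illustrative and not code from any real streaming server: the cache restarts at every keyframe, so a newly joined player can be fed a sequence that starts on the latest I frame.

#include <stdlib.h>

typedef struct Pkt {
    int is_keyframe;        /* 1 if this packet carries an IDR/I frame */
    struct Pkt *next;       /* payload fields omitted for brevity */
} Pkt;

typedef struct GopCache {
    Pkt *head, *tail;
} GopCache;

static void gop_cache_clear(GopCache *c)
{
    Pkt *p = c->head;
    while (p) { Pkt *n = p->next; free(p); p = n; }
    c->head = c->tail = NULL;
}

/* On each incoming video packet: restart the cache at every keyframe,
 * otherwise append, so the cache always begins with the latest I frame. */
static void gop_cache_push(GopCache *c, Pkt *p)
{
    if (p->is_keyframe)
        gop_cache_clear(c);
    p->next = NULL;
    if (c->tail) c->tail->next = p;
    else         c->head = p;
    c->tail = p;
}

/* When a player joins, replay c->head..c->tail before the live packets,
 * so the first thing its decoder sees is an I frame. */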

Push-side settings

Generally, a player needs a complete GOP before it can begin to play, and the GOP is set on the push side. For example, I dumped a stream and looked at its GOP: the GOP size is 50 and the FPS is 25, i.e. 25 frames are displayed per second, so 50 frames is exactly a 2-second GOP for the live broadcast. The FPS does not actually need to be that high for live shows; dump the stream of any live streaming company and you will find that an FPS between 15 and 18 is enough.
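As a rough sketch of how those numbers map onto encoder settings via the FFmpeg API (gop_size, framerate, and time_base are real AVCodecContext fields; the helper function itself is hypothetical):

#include <libavcodec/avcodec.h>

/* Hypothetical push-side helper: a 2-second GOP at 15 fps. */
static void configure_live_encoder(AVCodecContext *enc_ctx)
{
    enc_ctx->time_base = (AVRational){1, 15};   /* 15 fps timing */
    enc_ctx->framerate = (AVRational){15, 1};
    enc_ctx->gop_size  = 30;                    /* 15 fps * 2 s = 30 frames per GOP */
}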

Player-related time

After a source is set on the player, the player needs to open the stream, establish a long-lived connection with the server, demux, decode, and render. We can optimize each of the four pieces below:

  • Data request time

  • Demultiplexing time

  • Decoding time

  • Rendering time

Data request

This part is about the network and the protocol. Both HTTP-FLV and RTMP ride on TCP, so there is always a TCP three-way handshake. Opening tcp.c for analysis, I added logging to several functions, such as tcp_open; after the changes it looks like this:

/* return non zero if error */
static int tcp_open(URLContext *h, const char *uri, int flags)
{
    av_log(NULL, AV_LOG_INFO, "tcp_open begin");
    /* ... some code omitted ... */
    if (!dns_entry) {
#ifdef HAVE_PTHREADS
        av_log(h, AV_LOG_INFO, "ijk_tcp_getaddrinfo_nonblock begin.\n");
        ret = ijk_tcp_getaddrinfo_nonblock(hostname, portstr, &hints, &ai,
                                           s->addrinfo_timeout, &h->interrupt_callback,
                                           s->addrinfo_one_by_one);
        av_log(h, AV_LOG_INFO, "ijk_tcp_getaddrinfo_nonblock end.\n");
#else
        if (s->addrinfo_timeout > 0)
            av_log(h, AV_LOG_WARNING, "Ignore addrinfo_timeout without pthreads support.\n");
        av_log(h, AV_LOG_INFO, "getaddrinfo begin.\n");
        if (!hostname[0])
            ret = getaddrinfo(NULL, portstr, &hints, &ai);
        else
            ret = getaddrinfo(hostname, portstr, &hints, &ai);
        av_log(h, AV_LOG_INFO, "getaddrinfo end.\n");
#endif
        if (ret) {
            av_log(h, AV_LOG_ERROR,
                   "Failed to resolve hostname %s: %s\n",
                   hostname, gai_strerror(ret));
            return AVERROR(EIO);
        }
        cur_ai = ai;
    } else {
        av_log(NULL, AV_LOG_INFO, "Hit DNS cache hostname = %s\n", hostname);
        cur_ai = dns_entry->res;
    }

 restart:
#if HAVE_STRUCT_SOCKADDR_IN6
    // workaround for IOS9 getaddrinfo in IPv6 only network use hardcode IPv4 address can not resolve port number.
    if (cur_ai->ai_family == AF_INET6) {
        struct sockaddr_in6 *sockaddr_v6 = (struct sockaddr_in6 *)cur_ai->ai_addr;
        if (!sockaddr_v6->sin6_port) {
            sockaddr_v6->sin6_port = htons(port);
        }
    }
#endif

    fd = ff_socket(cur_ai->ai_family, cur_ai->ai_socktype, cur_ai->ai_protocol);
    if (fd < 0) {
        ret = ff_neterrno();
        goto fail;
    }

    /* Set the socket's send or receive buffer sizes, if specified.
       If unspecified or setting fails, system default is used. */
    if (s->recv_buffer_size > 0) {
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &s->recv_buffer_size, sizeof(s->recv_buffer_size));
    }
    if (s->send_buffer_size > 0) {
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &s->send_buffer_size, sizeof(s->send_buffer_size));
    }

    if (s->listen == 2) {
        // multi-client
        if ((ret = ff_listen(fd, cur_ai->ai_addr, cur_ai->ai_addrlen)) < 0)
            goto fail1;
    } else if (s->listen == 1) {
        // single client
        if ((ret = ff_listen_bind(fd, cur_ai->ai_addr, cur_ai->ai_addrlen,
                                  s->listen_timeout, h)) < 0)
            goto fail1;
        // Socket descriptor already closed here. Safe to overwrite to client one.
        fd = ret;
    } else {
        ret = av_application_on_tcp_will_open(s->app_ctx);
        if (ret) {
            av_log(NULL, AV_LOG_WARNING, "terminated by application in AVAPP_CTRL_WILL_TCP_OPEN");
            goto fail1;
        }

        if ((ret = ff_listen_connect(fd, cur_ai->ai_addr, cur_ai->ai_addrlen,
                                     s->open_timeout / 1000, h, !!cur_ai->ai_next)) < 0) {
            if (av_application_on_tcp_did_open(s->app_ctx, ret, fd, &control))
                goto fail1;
            if (ret == AVERROR_EXIT)
                goto fail1;
            else
                goto fail;
        } else {
            ret = av_application_on_tcp_did_open(s->app_ctx, 0, fd, &control);
            if (ret) {
                av_log(NULL, AV_LOG_WARNING, "terminated by application in AVAPP_CTRL_DID_TCP_OPEN");
                goto fail1;
            } else if (!dns_entry && strcmp(control.ip, hostname_bak)) {
                add_dns_cache_entry(hostname_bak, cur_ai, s->dns_cache_timeout);
                av_log(NULL, AV_LOG_INFO, "Add dns cache hostname = %s, ip = %s\n", hostname_bak, control.ip);
            }
        }
    }

    h->is_streamed = 1;
    s->fd = fd;

    if (dns_entry) {
        release_dns_cache_reference(hostname_bak, &dns_entry);
    } else {
        freeaddrinfo(ai);
    }

    av_log(NULL, AV_LOG_INFO, "tcp_open end");
    return 0;
    // omit some code
}

In tcp_open, hints.ai_family = AF_UNSPEC is the original design, a configuration compatible with both IPv4 and IPv6. If it is changed to AF_INET, no AAAA (IPv6) query packet is sent, so if you only have IPv4 requests you can change it to AF_INET. If IPv6 is involved, leave it as it is. To check whether any AAAA queries occur, use a packet capture tool.
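The change itself is a one-liner; in context it looks roughly like this in tcp_open (a sketch, not a full diff):

struct addrinfo hints = { 0 }, *ai, *cur_ai;
/* ... */
hints.ai_family   = AF_INET;      /* was AF_UNSPEC (IPv4 + IPv6); skips the AAAA query */
hints.ai_socktype = SOCK_STREAM;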

tcp_read is a blocking call and accounts for a lot of the time, and we cannot simply set its interrupt timeout shorter: if it is too short, the read is interrupted before the data arrives and subsequent playback fails outright, so we can only let it wait. The real optimization point is in the next part:

static int tcp_read(URLContext *h, uint8_t *buf, int size)
{
    av_log(NULL, AV_LOG_INFO, "tcp_read begin %d\n", size);
    TCPContext *s = h->priv_data;
    int ret;

    if (!(h->flags & AVIO_FLAG_NONBLOCK)) {
        ret = ff_network_wait_fd_timeout(s->fd, 0, h->rw_timeout, &h->interrupt_callback);
        if (ret)
            return ret;
    }
    ret = recv(s->fd, buf, size, 0);
    if (ret == 0)
        return AVERROR_EOF;
    //if (ret > 0)
    //    av_application_did_io_tcp_read(s->app_ctx, (void*)h, ret);
    av_log(NULL, AV_LOG_INFO, "tcp_read end %d\n", ret);
    return ret < 0 ? ff_neterrno() : ret;
}

We can comment out those two lines, as shown above: once ff_network_wait_fd_timeout returns, the data can be read into buf, and av_application_did_io_tcp_read is unnecessary; that callback used to execute on every read with ret > 0.

Demultiplexing time

The logs show that when the requested data arrives and audio/video separation begins, the matching demuxer has to be found first. Here FFmpeg's av_find_input_format and avformat_find_stream_info are the time sinks. The former, simply put, opens the input and requests data. The latter probes the stream: it reads a certain amount of bitstream data, analyzes the basic stream information, and fills in the AVStream structure for each media stream in the video; internally it also searches for a suitable decoder, opens it, reads some audio and video frames, and tries to decode them, essentially walking the whole decode path. It is a synchronous call, and when the format of the incoming data is unclear and good compatibility is required, it is time-consuming, which hurts the player's first-screen time. Both calls are in the read_thread function of ff_ffplay.c:

if (ffp->iformat_name) {
    av_log(ffp, AV_LOG_INFO, "av_find_input_format normal begin");
    is->iformat = av_find_input_format(ffp->iformat_name);
    av_log(ffp, AV_LOG_INFO, "av_find_input_format normal end");
} else if (av_stristart(is->filename, "rtmp", NULL)) {
    av_log(ffp, AV_LOG_INFO, "av_find_input_format rtmp begin");
    is->iformat = av_find_input_format("flv");
    av_log(ffp, AV_LOG_INFO, "av_find_input_format rtmp end");
    ic->probesize = 4096;
    ic->max_analyze_duration = 2000000;
    ic->flags |= AVFMT_FLAG_NOBUFFER;
}
av_log(ffp, AV_LOG_INFO, "avformat_open_input begin");
err = avformat_open_input(&ic, is->filename, is->iformat, &ffp->format_opts);
av_log(ffp, AV_LOG_INFO, "avformat_open_input end");
if (err < 0) {
    print_error(is->filename, err);
    ret = -1;
    goto fail;
}
ffp_notify_msg1(ffp, FFP_MSG_OPEN_INPUT);

if (scan_all_pmts_set)
    av_dict_set(&ffp->format_opts, "scan_all_pmts", NULL, AV_DICT_MATCH_CASE);

if ((t = av_dict_get(ffp->format_opts, "", NULL, AV_DICT_IGNORE_SUFFIX))) {
    av_log(NULL, AV_LOG_ERROR, "Option %s not found.\n", t->key);
#ifdef FFP_MERGE
    ret = AVERROR_OPTION_NOT_FOUND;
    goto fail;
#endif
}
is->ic = ic;

if (ffp->genpts)
    ic->flags |= AVFMT_FLAG_GENPTS;

av_format_inject_global_side_data(ic);

if (ffp->find_stream_info) {
    AVDictionary **opts = setup_find_stream_info_opts(ic, ffp->codec_opts);
    int orig_nb_streams = ic->nb_streams;

    do {
        if (av_stristart(is->filename, "data:", NULL) && orig_nb_streams > 0) {
            for (i = 0; i < orig_nb_streams; i++) {
                if (!ic->streams[i] || !ic->streams[i]->codecpar ||
                    ic->streams[i]->codecpar->profile == FF_PROFILE_UNKNOWN) {
                    break;
                }
            }
            if (i == orig_nb_streams) {
                break;
            }
        }
        ic->probesize = 100 * 1024;
        ic->max_analyze_duration = 5 * AV_TIME_BASE;
        ic->fps_probe_size = 0;
        av_log(ffp, AV_LOG_INFO, "avformat_find_stream_info begin");
        err = avformat_find_stream_info(ic, opts);
        av_log(ffp, AV_LOG_INFO, "avformat_find_stream_info end");
    } while (0);
    ffp_notify_msg1(ffp, FFP_MSG_FIND_STREAM_INFO);

The final change, for RTMP, is to specify the format directly as "flv" and to set the probe sizes. At the same time, you can set two parameters externally, probesize and analyzeduration, to limit how much data the function reads and how long it analyzes, reducing the avformat_find_stream_info time and speeding up the first screen. However, if these two parameters are set too small, the pre-read data may be insufficient to parse the stream information, playback fails, and there is no audio or video. Therefore, standardize transcoding on the server side so that the video format is fixed, then work out the minimal probesize and analyzeduration that avformat_find_stream_info needs to analyze the stream, in order to maximize the first-screen optimization while preserving the playback success rate.
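For reference, here is a sketch of feeding those two options in through an FFmpeg options dictionary; the values are placeholders, since the safe minimum depends on your own transcoded streams:

#include <libavutil/dict.h>

AVDictionary *format_opts = NULL;
/* probesize is in bytes, analyzeduration in microseconds */
av_dict_set(&format_opts, "probesize", "16384", 0);
av_dict_set(&format_opts, "analyzeduration", "100000", 0);  /* 100 ms */
/* then pass it along: avformat_open_input(&ic, url, NULL, &format_opts) */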

Inside avformat_find_stream_info, in FFmpeg's utils.c, there is a line of code: int fps_analyze_framecount = 20;. Roughly, if no value is set externally, avformat_find_stream_info needs to fetch at least 20 frames of video data to estimate the frame rate, which costs a lot of first-screen time, usually about 1 second. A live broadcast also needs to be real-time, so there is no need to fetch 20 frames; initialize this value to 2 and see what happens:

/* check if one codec still needs to be handled */
for (i = 0; i < ic->nb_streams; i++) {
    int fps_analyze_framecount = 2;
    st = ic->streams[i];
    if (!has_codec_parameters(st, NULL))
        break;
    if (ic->metadata) {
        AVDictionaryEntry *t = av_dict_get(ic->metadata, "skip-calc-frame-rate", NULL, AV_DICT_MATCH_CASE);
        if (t) {
            int fps_flag = (int) strtol(t->value, NULL, 10);
            if (!st->r_frame_rate.num &&
                st->avg_frame_rate.num > 0 && st->avg_frame_rate.den > 0 &&
                fps_flag > 0) {
                int avg_fps = st->avg_frame_rate.num / st->avg_frame_rate.den;
                if (avg_fps > 0 && avg_fps <= 120) {
                    st->r_frame_rate.num = st->avg_frame_rate.num;
                    st->r_frame_rate.den = st->avg_frame_rate.den;
                }
            }
        }
    }
}

Thus, the avformat_find_stream_info time can be reduced to less than 100ms.

Finally there are the decoding and rendering of the picture. The optimization space in this part is very small; the big costs are in the earlier stages.

Some will ask: your start is fast, but what about stuttering when the network turns bad later? Lag in a live broadcast mainly appears when the network jitters and there is not enough data to keep playing. ijkplayer then activates its buffering mechanism, which is controlled mainly by several macros:

  • DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS: threshold for waking the read_thread function to read data the first time when the network is poor.

  • DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS: threshold for waking the read_thread function to read data the second time.

  • DEFAULT_LAST_HIGH_WATER_MARK_IN_MS: the last threshold for waking the read_thread function to read data.

DEFAULT_LAST_HIGH_WATER_MARK_IN_MS can be set to 1 * 1000, meaning buffering stops after 1 second of data has been read; the default is 5 seconds, and if the buffer is too large the user may wait too long. DEFAULT_HIGH_WATER_MARK_IN_BYTES can be lowered to 30 * 1024 from the default 256 * 1024, and BUFFERING_CHECK_PER_MILLISECONDS set to 50 from the default 500:

#define DEFAULT_HIGH_WATER_MARK_IN_BYTES        (30 * 1024)
#define DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS     (100)
#define DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS      (1 * 1000)
#define DEFAULT_LAST_HIGH_WATER_MARK_IN_MS      (1 * 1000)
#define BUFFERING_CHECK_PER_BYTES               (512)
#define BUFFERING_CHECK_PER_MILLISECONDS        (50)

Here is where these macros are used:

inline static void ffp_reset_demux_cache_control(FFDemuxCacheControl *dcc)
{
    dcc->min_frames                    = DEFAULT_MIN_FRAMES;
    dcc->max_buffer_size               = MAX_QUEUE_SIZE;
    dcc->high_water_mark_in_bytes      = DEFAULT_HIGH_WATER_MARK_IN_BYTES;
    dcc->first_high_water_mark_in_ms   = DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS;
    dcc->next_high_water_mark_in_ms    = DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS;
    dcc->last_high_water_mark_in_ms    = DEFAULT_LAST_HIGH_WATER_MARK_IN_MS;
    dcc->current_high_water_mark_in_ms = DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS;
}

The last optimization point is tuning parameter values, several of which still have room. In fact, many live apps reach instant open by using a low resolution of 240p or even 360p, which can be expanded into another time-saving angle: the lower the resolution, the less data, and the faster the first frame.

mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "opensles", 0); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "framedrop", 1); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "start-on-prepared", 1); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "http-detect-range-support", 0); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "fflags", "nobuffer"); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "flush_packets", 1); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "max_delay", 0); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_CODEC, "skip_loop_filter", 48); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "packet-buffering", 0); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "max-buffer-size", 4 * 1024); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "min-frames", 50); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "probsize", "1024"); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "analyzeduration", "100"); mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "dns_cache_clear", 1); // mute //mediaPlayer.setOption(ijkMediaplayer. OPT_CATEGORY_PLAYER, "an", 1); SetOption (ijkMediaPlayer. OPT_CATEGORY_FORMAT, "reconnect", 1);

Copy the code

With all of the above in place, here is some test data; at a resolution of 540p the stream basically opens within a second, tested on a 4G network:

1. Hebei Satellite TV broadcast source: tested 10 runs with an average of 300 ms. One run at 386 ms, as follows:

11-17 14:17:46.659  9896 10147 D IJKMEDIA: ijkmediaplayer_native_setup
11-17 14:17:46.663  9896 10147 V IJKMEDIA: setDataSource: path http://weblive.hebtv.com/live/hbws_bq/index.m3u8
11-17 14:17:46.666  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_open_input begin
11-17 14:17:46.841  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_open_input end
11-17 14:17:46.841  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_find_stream_info begin
11-17 14:17:46.894  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_find_stream_info end
11-17 14:17:47.045  9896 10190 D IJKMEDIA: Video: first frame decoded
11-17 14:17:47.046  9896 10175 D IJKMEDIA: FFP_MSG_VIDEO_DECODED_START:

2. Inke live show source: tested 10 runs with an average of 400 ms. One run at 418 ms, as follows:

11-17 14:21:32.908 11464 11788 D IJKMEDIA: ijkmediaplayer_native_setup
11-17 14:21:32.952 11464 11788 V IJKMEDIA: setDataSource: path http://14.215.100.45/hw.pull.inke.cn/live/1542433669916866_0_ud.flv?ikDnsOp=1001&ikHost=hw&ikOp=0&codecInfo=8192&ikLog=1&ikSyncBeta=1&dpSrc=6&push_host=trans.push.cls.inke.cn&ikMinBuf=2900&ikMaxBuf=3600&ikSlowRate=0.9&ikFastRate=1.1
11-17 14:21:32.996 11464 11818 I FFMPEG  : [FFPlayer @ 0xc2575c00] avformat_open_input begin
11-17 14:21:33.161 11464 11818 I FFMPEG  : [FFPlayer @ 0xc2575c00] avformat_open_input end
11-17 14:21:33.326 11464 11829 D IJKMEDIA: Video: first frame decoded

3. Panda TV game source: tested 10 runs with an average of 350 ms. One run at 373 ms, as follows:

11-17 14:29:17.615 15801 16053 D IJKMEDIA: ijkmediaplayer_native_setup
11-17 14:29:17.645 15801 16053 V IJKMEDIA: setDataSource: path http://flv-live-qn.xingxiu.panda.tv/panda-xingxiu/dc7eb0c2e78c96646591aae3a20b0686.flv
11-17 14:29:17.649 15801 16079 I FFMPEG  : [FFPlayer @ 0xeb5ef000] avformat_open_input begin
11-17 14:29:17.731 15801 16079 I FFMPEG  : [FFPlayer @ 0xeb5ef000] avformat_open_input end
11-17 14:29:17.988 15801 16090 D IJKMEDIA: Video: first frame decoded

The above logs are shared on my Knowledge Planet (知识星球) group, where I record what I have done in the ten days since creating it. I hope this article is helpful to you.