[TOC]
Audio & video & streaming media
What prompted me to write this primer on audio and video streaming? I made a bet with a girl about the concept of bitrate and lost. For a technician, not fully understanding something as basic as bitrate is a real shame (just kidding). With that as the premise, it turned out I needed to understand a bit more about the principles and the basics.
Streaming media background
At present, audio, video and streaming media are ubiquitous, and live streaming has been popular for several years. In the coming years people will chat not only with text but more and more "face to face", perceiving the other party's expressions and actions in real time. It is therefore worth keeping up with the times and walking through the landscape of streaming media.
What is streaming media? Streaming media refers to audio, video or other multimedia formats that are played continuously and in real time over the network using streaming transmission technology, which is also called streaming media technology. Audio and video are therefore the core of streaming media.
Definitions of common audio and video terms
Audio and video composition
A complete video file contains audio, video and basic metadata. Common video files such as MP4, MOV, FLV, AVI and RMVB are container (wrapper) formats: they hold both an audio part and a video part, each compressed with a specific encoding algorithm.
H264 and Xvid are video encoding formats, while MP3 and AAC are audio encoding formats. For example, wrapping an Xvid-encoded video stream and an MP3-encoded audio stream according to the AVI encapsulation standard produces a video file with the .avi suffix.
So, in essence, a video conversion needs to:
- Set the required video encoding
- Set the desired audio encoding
- Select the desired container wrapper
A complete video conversion setup includes at least the above three steps, as the sketch below illustrates.
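For illustration, a minimal sketch of those three choices using the ffmpeg command-line tool, assuming ffmpeg is installed and on the PATH; the file names, codec choices and the helper name are placeholders, not a recommendation:

```python
import subprocess

def convert(src: str, dst: str, vcodec: str = "libx264", acodec: str = "aac") -> None:
    """Re-encode `src` into `dst`; the container is chosen by dst's file extension."""
    subprocess.run(
        ["ffmpeg", "-y",      # overwrite the output file if it already exists
         "-i", src,           # input file
         "-c:v", vcodec,      # 1. set the video encoding (e.g. H.264 via libx264)
         "-c:a", acodec,      # 2. set the audio encoding (e.g. AAC)
         dst],                # 3. pick the container by extension (.mp4, .mkv, ...)
        check=True,
    )

if __name__ == "__main__":
    convert("input.avi", "output.mp4")
```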
Coding format
Audio encoding format
The audio encoding formats are as follows
- AAC
- AMR
- PCM
- Ogg (OGG Vorbis Audio)
- AC3 (DVD audio encoding)
- DTS (DVD-specific audio encoding)
- APE (Monkey's Audio)
- AU (Sun audio format)
- WMA
A rough comparison of sound quality among common audio encoding schemes (AAC, MP3, WMA, etc.): AAC+ > MP3PRO > AAC > RealAudio > WMA > MP3.
At present, the most common audio formats are MP3, AC-3 and AAC. MP3 enjoys the broadest support, AC-3 is Dolby's technology, and AAC is the audio standard in MPEG-4 and currently the most technically advanced of the three. For a beginner, it is enough to know these few most common audio formats.
Video encoding format
Video coding standards come from two major families, MPEG and ITU-T; in other words, two organizations in the world define video codec standards. One is the International Telecommunication Union (ITU-T), which produced H.261, H.263, H.263+, H.264, etc.; the other is the International Organization for Standardization (ISO), whose MPEG group produced MPEG-1, MPEG-2, MPEG-4, and so on.
Common encoding formats are:
- Xvid(MPEG4)
- H264 (currently the most commonly used encoding format)
- H263
- MPEG1 and MPEG2
- VC-1
- RM, RMVB
- H.265 (not yet in wide use at the time of writing)
At present, the rough quality ranking of the most common video encodings, from low to high, is: MPEG-1/-2 < WMV7/8 < RM/RMVB < Xvid/DivX < AVC/H.264 (this may not be completely accurate).
Before H.265 came out, H.264 was the video compression format with the highest compression ratio, with the following advantages:
- Low bit rate: compared with MPEG-2 and MPEG-4 ASP compression, at the same image quality the amount of data produced by H.264 is only about 1/8 of MPEG-2's and 1/3 of MPEG-4's.
- High quality graphics: H.264 provides continuous, smooth high quality graphics (DVD quality).
- Fault tolerance: H.264 provides the necessary tools to resolve errors such as packet loss that are prone to occur on unstable networks.
- Network adaptability: H.264 provides Network Abstraction Layer which makes it easy to transfer H.264 files over different networks (e.g. Internet, CDMA, GPRS, WCDMA, CDMA2000, etc.).
The biggest advantage of H.264 is its very high data compression ratio. At the same image quality, H.264's compression ratio is more than twice that of MPEG-2 and 1.5 to 2 times that of MPEG-4. For example, if the original material is 88 GB, compressing it with the MPEG-2 standard at a 25:1 ratio reduces it to about 3.5 GB, while the H.264 standard reduces it to 879 MB. From 88 GB to 879 MB, H.264 reaches a remarkable compression ratio of about 102:1. H.264's low bit rate plays a major part in this: compared with MPEG-2 and MPEG-4 ASP, H.264 greatly reduces users' download time and data charges. And because the high compression ratio comes together with high-quality, smooth images, H.264-compressed video needs less bandwidth and is more economical to transmit over the network.
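As a quick arithmetic check of the figures quoted above (the sizes are the article's example numbers; 1 GB is taken as 1024 MB here):

```python
original_mb = 88 * 1024                 # 88 GB of source material
print(original_mb / 25 / 1024)          # MPEG-2 at 25:1  -> ~3.5 GB
print(original_mb / 879)                # 88 GB vs 879 MB -> ~102.5, i.e. roughly 102:1
```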
All of these common video coding formats are in fact lossy, including H.264 and H.265. Lossy coding achieves a much higher compression ratio and a much smaller file size while keeping the quality acceptable.
Storage encapsulation format
At present, common storage encapsulation formats in the market are as follows:
- AVI (.avi)
- ASF (.asf)
- WMV (.wmv)
- QuickTime ( .mov)
- MPEG (.mpg / .mpeg)
- MP4 (.mp4)
- M2ts (.m2ts /.mts)
- Matroska (.mkv /.mks /.mka)
- RM ( .rm / .rmvb)
- TS/PS
Typical video codecs carried by each container:
- AVI: MPEG-2, DivX, Xvid, WMV3, WMV4, VC-1, H.264
- RM/RMVB: RV40, RV50, RV60, RM8, RM9, RM10
- MOV: MPEG-2, MPEG-4 ASP (Xvid), H.264
- MKV: practically all of the above
Video bit rate, frame rate, resolution
Bit rate
Bit rate (also called data rate, code rate or bit stream rate) is the amount of data a video uses per unit of time, and it is the most important parameter controlling picture quality in video encoding. It is usually expressed in Kb/s or Mb/s. Generally speaking, at the same resolution, the higher the bit rate of the video stream, the lower the compression ratio and the higher the picture quality: more data per unit of time means the stream follows the original source more closely, the picture is clearer, and more decoding power is demanded of the playback device.
Of course, the higher the bit rate, the larger the file, since file size = duration × bit rate / 8. For example, a 90-minute 720P RMVB file at a 1 Mb/s bit rate, common on the network, has a size of 5400 s × 1 Mb/s / 8 = 675 MB.
Generally speaking, a video file contains both pictures (video) and sound (audio). An RMVB file, for example, carries both video and audio information, and the two are sampled and encoded separately, so within the same file the audio bit rate and the video bit rate differ. The bit rate of a video file usually refers to the sum of the audio and video bit rates.
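As a quick check of the file-size formula, here is the 90-minute, 1 Mb/s example above in a few lines of Python (the function name is just for illustration; the mega- prefixes on both sides cancel, so the result comes out directly in MB):

```python
def file_size_mb(duration_s: float, bitrate_mbps: float) -> float:
    return duration_s * bitrate_mbps / 8   # the /8 converts megabits to megabytes

print(file_size_mb(90 * 60, 1.0))          # 675.0 MB, matching the figure above
```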
Frame rate
The frame rate is measured in frames per second (FPS): the number of frames displayed, or rendered by the graphics processor, each second. The higher the frame rate (the more frames per second), the smoother and more lifelike the motion appears.
Here are some basic statistics about frame rates:
- The higher the frame rate, the higher the CPU consumption
- Live show (entertainment) streams usually run at around 20 fps
- Ordinary live video usually runs at around 15 fps
Resolution
Video resolution is the size, in pixels, of the image produced by a video device. Common resolutions are 352×288, 176×144, 640×480 and 1024×768. In each pair of numbers, the first is the width of the image and the second is the height; multiplying the two gives the number of pixels, and the aspect ratio is typically 4:3.
- 480P: 640 × 480 pixels
- 720P: 1280 × 720 pixels
- 1080P: 1920 × 1080 pixels
Next, we need to look at how each pixel is stored, i.e. how many bytes each pixel occupies.
Image storage format YUV
What are the basic elements of a color image?
1. Width: how many pixels are in a row?
2. Height: how many pixels are in a column?
3. How many lines are in a frame?
4. How many bytes are in a line?
5. What is the image size?
6. What is the resolution of the image?
To put it bluntly, an image is ultimately binary data, and its capacity is essentially the amount of that data. A 1920×1080 image in YUV422 takes 1920 × 1080 × 2 = 4,147,200 bytes, about 3.95 MB. The size depends on the number of pixels and on how the data for each pixel is stored.
Relationship between YUV and pixel:
The YUV format, like the RGB we are familiar with, is a colour encoding method, used mainly in television systems and analog video. It separates the luminance information (Y) from the chrominance information (UV); with only the Y information a complete, though black-and-white, image can still be displayed, a design that elegantly solved the compatibility problem between colour and black-and-white television. Moreover, unlike RGB, YUV does not require three independent video signals to be transmitted simultaneously, so it takes up much less bandwidth.
YUV formats come in two broad categories: planar and packed. In a planar YUV format, the Y values of all pixels are stored consecutively, followed by the U values of all pixels, followed by the V values. In a packed YUV format, the Y, U and V of each pixel are stored interleaved and consecutively.
YUV has three components: "Y" is the luminance (luma), i.e. the grey level, while "U" and "V" are the chrominance (chroma), which describe the colour and saturation and specify the colour of a pixel. YUV uses one luminance value (Y) and two colour-difference values (U, V) in place of the traditional RGB primaries to compress the image. Traditional RGB uses red, green and blue primaries to represent a pixel, each primary occupying one byte (8 bits), so an RGB pixel requires 8 × 3 = 24 bits.
If YUV is used to represent this pixel, assume the sampling is 4:2:0: every pixel samples the luminance Y, while the chrominance components U and V are each sampled once per 2×2 block of pixels (half the rate of Y horizontally and half vertically), so on average each chroma component is sampled at a quarter of the luma rate. If three adjacent pixels are represented in RGB, they occupy 8 × 3 × 3 = 72 bits; represented in YUV (4:2:0) they need on average only 8 × 3 (Y) + 8 × 3 × 0.25 (U) + 8 × 3 × 0.25 (V) = 36 bits. Half the space represents essentially the same image: the data is halved with almost no visible change.
So, how do we calculate the number of bytes a YUV image occupies?
YUV image format memory size
- 4:4:4 means the chroma (UV) components are not subsampled at all: one byte each for Y, U and V per pixel; with an additional alpha byte that makes 4 bytes per pixel. Without alpha it is equivalent in size to 24bpp RGB.
- 4:2:2 means the UV sampling is halved: for example, the first pixel samples Y and U, the second samples Y and V, and so on, so each pixel occupies 2 bytes on average. Two pixels form one macro-pixel.
  - Memory required: W × H × 2
- 4:2:0 does not mean that only Y and Cb are sampled and Cr is not; the 0 means that the U and V components are sampled on alternating rows: 4:2:0 on one row, 4:0:2 on the next, and so on. In this mode each pixel occupies 12 bits on average.
  - Planar memory layout: YYYYYYYY UU VV
  - Memory required: W × H × 3/2
- 4:1:1 can be understood by analogy with 4:2:2, compressed further: U and V are each sampled once every four pixels. Typically pixel 1 samples Y and U, pixel 2 samples only Y, pixel 3 samples Y and V, pixel 4 samples only Y, and so on.
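To summarise the figures above, here is a small Python helper; the table values and names are illustrative, assuming 8-bit samples and no alpha channel:

```python
BYTES_PER_PIXEL = {
    "RGB24":  3.0,   # 8 bits each for R, G, B
    "YUV444": 3.0,   # no chroma subsampling: Y, U, V for every pixel
    "YUV422": 2.0,   # U and V shared by every 2 pixels: W x H x 2
    "YUV420": 1.5,   # U and V shared by every 2x2 block: W x H x 3/2
    "YUV411": 1.5,   # U and V shared by every 4 pixels in a row
}

def frame_bytes(width: int, height: int, fmt: str) -> int:
    return int(width * height * BYTES_PER_PIXEL[fmt])

# The 1080p YUV422 example from earlier in the text:
print(frame_bytes(1920, 1080, "YUV422"))   # 4147200 bytes (~3.95 MB)
print(frame_bytes(1920, 1080, "YUV420"))   # 3110400 bytes (~2.97 MB)
```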
The relationship between frame rate, bit rate and resolution
- Bit rate has nothing directly to do with frame rate
- Bit rate is related to bandwidth and file size
- Frame rate is related to smoothness and CPU consumption
- Resolution is related to image size and sharpness
Example: a video file is 5.86 MB in size and its playback time is 3 minutes 7 seconds.

1. The bit rate of the file is 5.86 × 1024 × 1024 × 8 / (3 × 60 + 7) ≈ 262,873 bps (about 257 Kb/s).
2. The number of concurrent online users a dedicated 10 Mb bandwidth can support is 10 × 1024 × 1024 / 262,873 ≈ 39.9, i.e. roughly 39 viewers.
3. The minimum bandwidth required to support 1,000 users online at the same time is 262,873 × 1000 / (1024 × 1024) ≈ 250.7 Mb/s.

Another example: 10 minutes of playback consumes 41,587 KB of traffic. Then 41,587 KB / (10 × 60 s) ≈ 69 KB/s, and 69 KB/s × 8 ≈ 554 Kb/s, so the bit rate is roughly 554 Kb/s.
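Both calculations, reproduced in Python (binary prefixes, as in the article):

```python
MB = 1024 * 1024          # bytes per MB
Mb = 1024 * 1024          # bits per Mb

def bitrate_bps(size_mb: float, duration_s: float) -> float:
    return size_mb * MB * 8 / duration_s

rate = bitrate_bps(5.86, 3 * 60 + 7)
print(rate)                          # ~262873 bps, i.e. ~257 Kb/s
print(10 * Mb / rate)                # ~39.9 viewers on a dedicated 10 Mb link
print(rate * 1000 / Mb)              # ~250.7 Mb/s for 1000 concurrent viewers

# The second example: 41587 KB of traffic over 10 minutes.
print(41587 / (10 * 60))             # ~69.3 KB/s
print(41587 * 8 / (10 * 60))         # ~554 Kb/s
```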
Output file size formula
A file whose audio is encoded at 128 Kb/s and whose video is encoded at 800 Kb/s has a total bit rate of 928 Kb/s, meaning that 928 Kbits of encoded data are produced per second.
File size formula: file size (MB) = (audio bit rate (Kbit/s) / 8 + video bit rate (Kbit/s) / 8) × duration (seconds) / 1024
Size of a frame image
The raw size of one frame is width in pixels × height in pixels × the number of bytes per pixel, and the bytes per pixel depend on the data format: RGB or YUV variants such as RGB32, YUV420, YUV422, and so on. The most commonly used is YUV420. The general formula is:
File bytes = image resolution × quantization bits / 8
Image resolution = number of pixels in the X direction × number of pixels in the Y direction
Quantization bits = the number of bits used to represent each pixel's colour
- RGB24: one frame = width × height × 3 bytes
- RGB32: one frame = width × height × 4 bytes
- YUV420: one frame = width × height × 1.5 bytes

For example, for a 1024×768 image, one frame of YUV422 data is 1024 × 768 × 2 = 1,572,864 bytes. A short sketch of these raw sizes follows.
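The sketch assumes the bytes-per-pixel figures above; the raw data rate at 25 fps shows why video is essentially always compressed:

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: float) -> float:
    return width * height * bytes_per_pixel

w, h, fps = 1920, 1080, 25
for name, bpp in [("RGB24", 3), ("RGB32", 4), ("YUV420", 1.5)]:
    frame = raw_frame_bytes(w, h, bpp)
    print(name, frame, "bytes/frame,", frame * fps * 8 / 1_000_000, "Mb/s raw")
# YUV420 at 1080p25 is already ~622 Mb/s before compression, versus the
# roughly 1 Mb/s figures quoted earlier for encoded streams.
```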
Audio sampling rate and quantization bits
1. Number of channels: the channel count is an important audio parameter; the common options today are mono and stereo (two channels). Stereo occupies two channels in hardware and sounds fuller and better, but a stereo recording takes up twice as much storage as a mono one.
2. Quantization bits: quantization digitizes the amplitude axis of the analog audio signal and determines the dynamic range of the digitized signal. Since computers work in bytes, the usual quantization depths are 8 and 16 bits. The more quantization bits, the larger the dynamic range and the closer the digitized audio is to the original signal, at the cost of more storage.
3. Sampling rate: also called sampling speed or sampling frequency, it is the number of samples per second taken from a continuous signal to form a discrete signal, expressed in hertz (Hz). In other words, it is how many points are sampled per unit of time when an analog signal is converted to digital. Each sample is then represented with a certain number of bits, and the bit rate is the number of bits transmitted per second, in bps (bits per second); the higher the bit rate, the more data and the better the sound quality.
The choice of sampling rate follows the Nyquist sampling theorem: when an analog signal is sampled, only frequencies up to half the sampling rate can be recovered; equivalently, the original signal can be reconstructed from the samples as long as the sampling rate is more than twice the highest frequency in the input. By this theorem, a CD sampled at 44.1 kHz can record audio up to about 22 kHz, a quality close enough to the original to be called hi-fi. Digital telephony usually samples at 8 kHz, matching the roughly 4 kHz bandwidth of voice.
Bit rate (audio) = sampling rate x Number of bits x number of channels.
Take the telephone as an example: 3,000 samples per second at 7 bits each gives a bit rate of 21,000 bit/s. A CD in this example takes 44,100 samples per second in two channels with 13-bit PCM per sample, so its bit rate is 44,100 × 2 × 13 = 1,146,600 bit/s, i.e. roughly 144 KB of data per second; a 74-minute CD (4,440 seconds) therefore holds about 639,360 KB ≈ 640 MB.
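Applying the formula to the examples above, plus the parameters of an actual audio CD (44.1 kHz, 16-bit PCM, stereo), in a few lines of Python:

```python
def audio_bitrate_bps(sample_rate_hz: int, bits: int, channels: int) -> int:
    return sample_rate_hz * bits * channels

print(audio_bitrate_bps(3000, 7, 1))      # 21000 bps   (the telephone example above)
print(audio_bitrate_bps(44100, 13, 2))    # 1146600 bps (the 13-bit CD example above)
print(audio_bitrate_bps(44100, 16, 2))    # 1411200 bps (a real audio CD: 16-bit PCM)
```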
I frame, P frame, B frame, IDR frame
I frame: intra-frame coding frame
I frame features:
- It is an intra-coded, full-frame compressed frame: the whole image is compressed and transmitted using JPEG-like intra-frame compression;
- The whole image can be reconstructed only with the data of I frame.
- I frame describes the details of the image background and the moving subject;
- I frames are generated without reference to other frames;
- I frame is the reference frame of P frame and B frame (its quality directly affects the quality of subsequent frames in the same group);
- I frame is the base frame (the first frame) of frame group GOP, there is only one I frame in a group;
- I frames do not need to consider motion vectors;
- I frame occupies a large amount of data information.
P frame: forward prediction coding frame
Prediction and reconstruction of a P frame: a P frame uses the I frame as its reference. Encoding finds, in the I frame, the predicted value and the motion vector for each point of the P frame, and transmits the prediction difference together with the motion vector. At the receiving end, the predicted value of each point is located in the I frame according to the motion vector and added to the difference to obtain the sample value, thereby reconstructing the complete P frame. A P frame is also called a predictive frame: an encoded picture that reduces the amount of transmitted data by exploiting the temporal redundancy with the previously encoded frame in the sequence.
P frame features:
- P frame is the coding frame separated by 1~2 frames after I frame;
- P frame adopts the method of motion compensation to transmit the difference between it and the previous I or P frame and the motion vector (prediction error).
- During decoding, the P frame image can be reconstructed only by adding the prediction error to the predicted value taken from the reference (I) frame;
- P frames belong to forward prediction interframe coding. It only refers to the nearest I frame or P frame;
- The P frame can be the reference frame of the P frame behind it, or the reference frame of the B frame before and after it.
- As P frame is a reference frame, it may cause the spread of decoding errors.
- Because of differential transmission, the compression of P frames is relatively high.
B frame: bidirectional predictive interpolation coded frame.
Prediction and reconstruction of a B frame: a B frame uses the preceding I or P frame and the following P frame as its references. Encoding finds the predicted value of each point of the B frame and two motion vectors, and transmits the prediction difference together with the motion vectors. The receiver locates (computes) the predicted value in the two reference frames according to the motion vectors and adds it to the difference to obtain the sample value, thereby reconstructing the complete B frame. A B frame is also called a bidirectionally interpolated prediction frame: an encoded picture that reduces the amount of transmitted data by exploiting the temporal redundancy with both the preceding and the following encoded frames.
B frame features:
- B frame is predicted by the I or P frame before and the P frame after;
- B frame transmits the prediction error and motion vector between it and the preceding I or P frame and the following P frame;
- B frame is bidirectional predictive coding frame;
- B frames have the highest compression ratio, because they only describe how the moving parts change relative to the two reference frames, so the prediction is more accurate;
- B frames are not reference frames and will not cause the proliferation of decoding errors.
IDR frame
IDR stands for Instantaneous Decoding Refresh.
Both I and IDR frames use intra-frame prediction; in essence they are the same thing. To make encoding and decoding easier to control, the first I frame of a sequence is distinguished from the other I frames and called an IDR frame. The purpose of an IDR frame is to refresh the decoder immediately so that errors cannot propagate: starting from an IDR frame, a new sequence is begun and encoding starts afresh. Ordinary I frames do not provide random access; that role is taken by IDR frames. An IDR frame causes the DPB (Decoded Picture Buffer, i.e. the reference frame list, which is the key point) to be emptied, while an ordinary I frame does not. An IDR picture must be an I picture, but an I picture is not necessarily an IDR picture. There can be many I pictures in a sequence, and pictures after an ordinary I picture may still use pictures before that I picture as motion references.
For an IDR frame, no frame after it may reference any frame before it. In contrast, for an ordinary I frame, the B and P frames that follow it may still reference frames located before that I frame. When randomly accessing a video stream, the player can always start playback from an IDR frame, because no later frame references anything before it; without IDR frames you cannot simply start from an arbitrary point, because later frames may reference earlier ones.
summary
An I frame is the key frame, which you can think of as a complete snapshot of that frame; decoding needs only this frame's data (since it contains the whole picture).
P frame represents the difference between this frame and the previous key frame (or P frame). When decoding, the difference defined in this frame needs to be superimposed on the cached picture to generate the final picture. (That is, the difference frame, P frame does not have the full frame data, only the difference from the previous frame data).
A B frame is a bidirectional difference frame: it records the differences between itself and both the previous and the next frame. In other words, to decode a B frame you need both the previously decoded picture and the following decoded picture; the final picture is obtained by combining the data of the preceding and following frames with this frame's own data. B frames give a high compression ratio, but they make the decoder's CPU work harder.
PTS: Presentation Time Stamp. The PTS indicates when a decoded video frame should be displayed.
DTS: Decoding Time Stamp. The DTS indicates when the bitstream that has been read into memory should be fed to the decoder for decoding.
DTS is used during decoding, while PTS is used for display and for audio/video synchronization. When there are no B frames, the DTS order and the PTS order are the same.
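A small, purely illustrative example of the difference (the timestamps are made up): with B frames, the forward reference (P) must be decoded before the B frames that depend on it, even though it is displayed after them.

```python
frames = [
    # (type, DTS, PTS) -- timestamps in arbitrary ticks
    ("I", 0, 0),
    ("P", 1, 3),   # decoded before the B frames it anchors, displayed after them
    ("B", 2, 1),
    ("B", 3, 2),
]

decode_order  = [f[0] for f in sorted(frames, key=lambda f: f[1])]
display_order = [f[0] for f in sorted(frames, key=lambda f: f[2])]
print(decode_order)    # ['I', 'P', 'B', 'B']
print(display_order)   # ['I', 'B', 'B', 'P']
```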
GOP
A GOP (group of pictures) is the set of frames between two I frames. In x264 the number of B frames can be set with a parameter (bframes), i.e. the number of B frames between an I frame and a P frame, or between two P frames. If B frames are used, the last frame of a GOP must be a P frame.
Generally speaking, the compression rate of I is 7 (similar to JPG), P is 20, and B can reach 50. It can be seen that using B frame can save a lot of space, which can be used to save more I frame, so that at the same bit rate, it can provide better picture quality. On the premise of constant bit rate, the larger the GOP value is, the more P and B frames will be, the more bytes will be occupied by each I, P and B frame on average, and it will be easier to obtain better image quality. The larger the Reference is, the more B frames there are, and the easier it is to obtain better image quality.
If an I frame is lost within a GOP, the following P and B frames become useless and must be discarded. In general, the delivery strategy ensures that I frames are not lost (for example by sending them over TCP); if UDP is used, additional strategies are needed to guarantee that I frames arrive correctly.
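A toy sketch, not an encoder, of what a display-order GOP pattern looks like under these rules; the parameter names and defaults are illustrative:

```python
def gop_pattern(gop: int = 12, bf: int = 2) -> str:
    """One I frame, then groups of `bf` B frames each followed by a P frame."""
    frames = ["I"]
    while len(frames) < gop:
        frames += ["B"] * bf + ["P"]
    frames = frames[:gop]
    if "B" in frames and frames[-1] == "B":   # never end a GOP on a B frame
        frames[-1] = "P"
    return "".join(frames)

print(gop_pattern(12, 2))   # IBBPBBPBBPBP
```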
codec
Hardware codec
Encoding and decoding are performed by dedicated hardware, such as a GPU, which relieves the CPU of the computational load.
Software codec
Codec algorithms such as H.264, H.265 and MPEG-4 are run in software on the CPU, which consumes considerably more CPU.
Data optimization
Data optimization is closely tied to the codec algorithm. Roughly speaking:
Video frame size
- Generally, the compression ratio of I frames is about 7 (similar to JPEG), P frames about 20, and B frames up to 50 (rough figures, not precise).
- P frame is about 3~4KB (480P, 1200K bit rate, baseline profile)
Audio frame size
- (Sampling frequency (Hz) * Sampling bits (bit) * Number of sound channels)/ 8
- At a 48,000 Hz sampling rate, audio compressed with AAC comes to roughly 12 KB/s (see the sketch below)
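The sketch below applies the formula to raw PCM and also shows the duration of a single AAC frame (an AAC frame carries 1024 samples per channel), as context for the rough 12 KB/s figure:

```python
def pcm_bytes_per_second(sample_rate: int, bits: int, channels: int) -> float:
    return sample_rate * bits * channels / 8

print(pcm_bytes_per_second(48000, 16, 1))   # 96000 bytes/s of raw mono PCM
print(1024 / 48000 * 1000)                  # ~21.3 ms of audio per AAC frame
```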
Streaming media transfer protocol
Commonly used streaming approaches fall mainly into HTTP progressive download and real-time streaming based on RTSP/RTP, which are fundamentally different things.
The streaming protocols commonly used for CDN live broadcasting are RTMP, HLS and HTTP-FLV.
RTP and RTCP
The Real-time Transport Protocol (RTP) is commonly used in streaming media systems (together with RTCP), in video conferencing, and in push-to-talk systems (together with H.323 or SIP), making it a technical foundation of the IP telephony industry. RTP is used together with its control protocol, RTCP, and is typically carried over UDP.
The Real-Time Transport Control Protocol (RTP Control Protocol, or RTCP for short) is the sister protocol of RTP. RTCP provides out-of-band control for an RTP media stream. RTCP does not transport the media data itself; it works alongside RTP, periodically exchanging control packets between the participants of a streaming session. Its main function is to provide feedback on the quality of service delivered by RTP.
RTSP+RTP is often used in IPTV. Because it uses UDP to carry audio and video, it supports multicast and is very efficient. Its drawback is that on poor networks packet loss degrades the viewing quality.
RTMP
RTMP (Real-Time Messaging Protocol) is an open protocol developed by Adobe Systems for transmitting audio, video and data between a Flash player and a server.
It comes in three varieties:
- Plain RTMP: a cleartext protocol that works on top of TCP, using port 1935;
- RTMPT is encapsulated in HTTP requests and traverses firewalls.
- RTMPS is similar to RTMPT, but uses HTTPS connections.
Summary: the RTMP protocol is implemented on top of TCP. Data is forwarded the moment it is received, with a typical latency of 1-3 s.
HLS
HTTP Live Streaming (HLS) is an HTTP-based streaming protocol defined by Apple Inc. It supports both live streaming and on-demand playback. HLS on demand is essentially ordinary segmented HTTP on-demand, except that the segments are very short. The basic principle is to slice the video or stream into small pieces (TS segments) and build an index for them (the M3U8 playlist).
Compared with common live-streaming protocols such as RTMP, RTSP and MMS, the biggest difference of HLS is that the client never obtains a single complete data stream. Instead, the HLS server stores the live stream as a sequence of consecutive, very short media files (in MPEG-TS format), and the client keeps downloading and playing these small files; since the server continually produces new small files from the latest live data, the client only has to play the files in order as it fetches them from the server to get, in effect, a live stream. In other words, HLS implements live streaming using what is technically a video-on-demand mechanism. Because the data travels over HTTP, firewalls and proxies are not a concern, and because the segments are very short, the client can quickly select and switch bit rates to adapt to different bandwidth conditions. The downside of this approach is that HLS latency is generally higher than that of ordinary live-streaming protocols.
Summary: the HLS protocol is implemented over plain HTTP (short connections). The server accumulates data for a while, produces a TS segment file, and then updates the M3U8 playlist (the HLS index file). Latency is typically more than 10 s.
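For illustration, a minimal sketch of what an HLS media playlist (M3U8) looks like; the segment names and durations are made up:

```python
segments = [("seg000.ts", 6.0), ("seg001.ts", 6.0), ("seg002.ts", 5.5)]

playlist_lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:6",
    "#EXT-X-MEDIA-SEQUENCE:0",
]
for name, duration in segments:
    playlist_lines += [f"#EXTINF:{duration:.3f},", name]
# A VOD playlist would end with "#EXT-X-ENDLIST"; a live playlist omits it and
# is simply republished with newer segments instead.

print("\n".join(playlist_lines))
```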
HTTP-FLV
HTTP-FLV is based on a long-lived HTTP connection. As with RTMP, data is forwarded the moment it is received, only over HTTP, with a typical latency of 1-3 s.
CDN
CDN architecture design is complicated. Different CDN vendors are also constantly optimizing their architectures, so the architectures cannot be unified. This is just a quick look at some of the basic architectures.
CDN mainly includes: source site, cache server, intelligent DNS, client and so on.
Source site (origin): the site that originally publishes the content. Adding, deleting and modifying files is done on the source site, and everything the cache servers fetch ultimately comes from it. For live streaming, the source is the broadcaster's (host's) client.
Cache server: the servers that directly serve user requests, consisting of one or more machines. When a user initiates a request, the intelligent DNS directs it to a nearby cache server; if the requested content happens to be in the cache, it is returned to the user directly; if not, the cache server fetches it from a neighbouring cache server or directly from the source site, and then returns it to the user.
Intelligent DNS: the core of the whole CDN. Based on where the user is coming from and on the current load of the cache servers, it directs the user's request to a nearby, lightly loaded cache server. By resolving users to a low-load server of the same network operator, it eliminates slow cross-network access and speeds up the experience.
Client: a common user who initiates access. For live streaming, it’s the audience client.
Weak network optimization
Weak network optimization strategies include the following:
- Player buffer
- Frame loss policy (P frames first, I frames second, audio last)
- Adaptive bit rate algorithm
- Bidirectional link optimization
- Audio FEC redundancy (around 20% packet loss rate)
Frame dropping
Under weak network conditions, frames may be dropped to preserve the overall experience. But which frames should be dropped, audio or video? Video frames are relatively large, and neighbouring video frames depend on each other; audio frames are small, but they are sampled continuously, and dropping them produces obvious glitches in the sound. For these reasons, the usual strategy is to drop video frames.
Adaptive bit rate
Under weak network conditions another strategy is the adaptive bit rate algorithm: the bit rate is divided into several levels, and when the network is poor the sender steps down one level. It is a cyclic probing process: if, after stepping down a level, the playout buffer remains sufficient, the current network can sustain that bit rate and it is kept; the same logic applies when stepping back up. The details, however, are each vendor's core algorithm.
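A heavily simplified sketch of such a ladder-style algorithm; the bit rate ladder, thresholds and buffer readings are made-up illustrative values, not any vendor's actual algorithm:

```python
LADDER_KBPS = [400, 800, 1200, 2000]        # available encodings, low to high

def next_level(level: int, buffer_s: float, low: float = 5.0, high: float = 15.0) -> int:
    if buffer_s < low and level > 0:
        return level - 1                     # buffer draining: step down one level
    if buffer_s > high and level < len(LADDER_KBPS) - 1:
        return level + 1                     # buffer healthy: probe one level up
    return level                             # otherwise keep the current bit rate

level = 2
for buffer_s in [12.0, 4.0, 3.0, 9.0, 18.0]:   # simulated buffer readings
    level = next_level(level, buffer_s)
    print(buffer_s, "->", LADDER_KBPS[level], "Kb/s")
```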
The challenges of real-time chat
Let us make a simple estimate of network latency. The speed of light in vacuum is about 300,000 km/s, and in other media it is significantly lower; for ordinary optical fiber, engineering practice assumes a propagation speed of about 200,000 km/s. A rough delay estimate based on that figure is sketched below.
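A propagation-only estimate from the 200,000 km/s rule of thumb (queuing, routing and processing delays come on top); the distances are arbitrary examples:

```python
FIBER_KM_PER_S = 200_000

def one_way_delay_ms(distance_km: float) -> float:
    return distance_km / FIBER_KM_PER_S * 1000

for km in (1_000, 2_000, 8_000, 20_000):     # e.g. in-country to intercontinental
    print(km, "km ->", one_way_delay_ms(km), "ms one way")
# 1000 km is ~5 ms one way; a 20,000 km round trip already approaches 200 ms,
# which is why distance matters for the ~600 ms real-time budget listed below.
```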
The challenges of real-time chat lie in the following:
- Real-time interaction: end-to-end latency within 600 ms
- The asymmetry of the network
- Distance
FAQs and solutions
- Corrupted (garbled) or green picture
  - Capture problems, codec problems, or frame loss in transmission (e.g. in the Agora/声网 SDK)
- Audio and video out of sync
  - Capture problems, or problems in the public cloud SDK
- The picture is sometimes a little blurry
  - Weak network; bit rate adaptation has kicked in
- Sound but no picture
  - Weak network; the frame-dropping policy was triggered
- The picture sometimes stutters
  - CPU usage is too high, for example because of an AR module
  - Weak network
- The network connection drops
  - Weak network
  - Either our code has a bug, or the public cloud SDK does
- Mosaic artifacts?
  - Essentially the same symptom as a corrupted (garbled) picture
TODO
Other common metrics and problem solutions
Welcome to follow my WeChat official account, "Linux server system development"; I will keep publishing quality articles there.