HLS (HTTP Live Streaming)

HLS is a streaming media protocol developed by Apple that delivers audio and video over standard HTTP.

How HLS works

HLS works by breaking a stream into a sequence of small media segments. At the start of a streaming session, the client downloads a playlist file with the extension M3U (or M3U8 if UTF-8 character encoding is used). The playlist contains the locations of one or more media files and is typically used to point the media player to the audio and video sources.

M3U files are plain text files that specify the location of one or more media files.

A file with the M3U extension is an audio playlist file (the name is short for "MP3 URL") and is not an actual audio file itself. M3U files only point to audio (and sometimes video) files so that the media player can queue them up for playback. These text-based files can contain URLs and/or absolute or relative pathnames of media files and/or folders. UTF-8-encoded M3U files are saved with the M3U8 extension.

How to build an M3U file

M3U files are not usually built from scratch. In a media player such as VLC, for example, you can use the Media > Save Playlist to File… option to save the currently open list of songs as an M3U file.

However, if you do build your own M3U files, it is important to use the correct syntax. Here is an example of an M3U file:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:15
#EXTINF:6.000,
31972.ts
#EXTINF:10.000,
31975.ts
#EXTINF:10.000,
31975.ts
#EXTINF:10.000,
31975.ts
#EXT-X-ENDLIST
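
To show how a player consumes such a playlist, here is a minimal Python sketch of a parser for the tags and segment URIs shown above. It is illustrative only (the parse_m3u8 helper is hypothetical, not part of any HLS library) and handles just the simple media-playlist syntax in the example:

# Minimal M3U8 media-playlist parser sketch (illustrative only; real players
# use full HLS client libraries).
def parse_m3u8(text):
    """Return (tags, segments) where segments are (duration, uri) pairs."""
    tags = {}
    segments = []
    pending_duration = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line.startswith("#EXT-X-"):
            name, _, value = line[1:].partition(":")
            tags[name] = value
        elif not line.startswith("#"):
            # A URI line belongs to the preceding #EXTINF duration.
            segments.append((pending_duration, line))
            pending_duration = None
    return tags, segments

playlist = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:15
#EXTINF:6.000,
31972.ts
#EXTINF:10.000,
31975.ts
#EXT-X-ENDLIST
"""

tags, segments = parse_m3u8(playlist)
print(tags["EXT-X-TARGETDURATION"])   # "15"
print(segments)                       # [(6.0, '31972.ts'), (10.0, '31975.ts')]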

HLS uses HTTP for transport

Because the playlists and segments are delivered over standard HTTP, HLS streams can be served by ordinary web servers and CDNs and pass through firewalls and proxies without special configuration.

The basics of digital video

Video is fast becoming a major part of an enterprise's traffic mix. Both streaming media and pre-positioned video place load on the network and can severely affect overall performance. Understanding the structure of video datagrams and their network requirements helps network administrators implement media-ready networks.

Different types of video

Several broad attributes can be used to describe a video. For example, video can be classified as live or pre-recorded, streamed or pre-positioned, and high or low resolution. The network load depends on the type of video being sent. Pre-recorded, pre-positioned, low-resolution video is essentially just a file transfer, while live streaming video requires a high-performance network. Many general-purpose video applications fall somewhere in between, which is what allows non-live streaming video to work acceptably over the public Internet. Tuning the network and the media encoders is an important aspect of deploying video over IP networks.

H.264 encoding and decoding

Video codecs have evolved over the past 15 years, and today's codecs take advantage of increased processing power to better optimize stream sizes. The general approach, however, has not changed much since the original MPEG-1 standard. An image consists of a matrix of pixels grouped into blocks. Blocks are grouped into macroblocks, a row of macroblocks forms a slice, slices form pictures, and pictures are combined into groups of pictures (GOPs).

Each pixel has a red, green, and blue component. The coding process begins by converting the RGB samples into one luminance and two chrominance components, commonly known as YCrCb. Some of the color information can be discarded during coding and interpolated back at playback. Once in YCrCb form, each component goes through a transform. The transform is reversible and does not itself compress data; instead, it represents the data differently so that quantization and compression can work more efficiently. Quantization then rounds away small details in the data, and this rounding is what sets the quality: lower quality yields better compression. After quantization, lossless compression is applied by replacing common bit sequences with shorter binary codes. Every macroblock in the picture goes through this process, producing an elementary bitstream.

The stream is divided into 188-byte packets, forming a packetized elementary stream (PES), which is then loaded into IP packets. Because IP packets have a 1500-byte MTU and the transport packets are fixed at 188 bytes, only seven of them fit in one IP packet. The resulting IP payload is 1316 bytes, excluding headers, so IP fragmentation is not a problem. An entire HD video frame may require up to 100 IP packets to carry all of its elementary-stream packets, although 45 to 65 packets are more common. Quantization and image complexity are the major factors in determining the number of packets required for transmission.

Forward error correction can be used to reconstruct some missing information. In many cases, however, several IP packets in a row are discarded, which makes the frame almost impossible to decode, and the packets that did arrive represent wasted bandwidth. RTCP can be used to request a new frame; without a valid initial (reference) frame, subsequent frames will not decode correctly.
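
As a rough illustration of the packetization arithmetic described above, the following Python sketch computes how many 188-byte transport packets fit in a 1500-byte MTU and how many IP packets a frame of a given size needs. The 40-byte IP/UDP/RTP header overhead is an assumption made for the example:

# Packetization arithmetic for MPEG transport over IP (illustrative sketch).
TS_PACKET = 188           # bytes per 188-byte transport packet (from the text)
IP_MTU = 1500             # typical Ethernet MTU in bytes
HEADER_OVERHEAD = 40      # assumed IP (20) + UDP (8) + RTP (12) header bytes

per_datagram = (IP_MTU - HEADER_OVERHEAD) // TS_PACKET
payload = per_datagram * TS_PACKET
print(per_datagram, payload)        # 7 packets -> 1316-byte payload, no fragmentation

def ip_packets_for_frame(frame_bytes):
    """How many full IP packets are needed to carry a frame of this size?"""
    return -(-frame_bytes // payload)   # ceiling division

print(ip_packets_for_frame(64 * 1024))  # ~50 packets for a 64 KB HD I-frame
# Real frames commonly need 45 to 65 packets, as noted above; encoders may
# also emit smaller, partially filled packets.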

Frame types

The current generation of video coding goes by three names: H.264, MPEG-4 Part 10, and Advanced Video Coding (AVC). Like earlier codecs, H.264 uses both spatial and temporal compression. As mentioned earlier, spatial compression works within a single video frame; frames compressed this way are called I-frames, and an I-frame is the first picture in a GOP. Temporal compression takes advantage of the fact that little information changes between consecutive frames. Although a zoom or camera movement may cause almost every pixel to change, the change is the result of motion, so vectors are used to describe that motion and are applied to blocks. If the encoder determines that all pixels are moving together, as in a camera pan, a global vector is used, and a differential (error) signal is used to fine-tune any remaining mismatch. H.264 allows variable block sizes and can encode motion with quarter-pixel precision. The decoder uses this information to reconstruct the current frame from the appearance of the previous frame. Frames that contain motion vectors and error signals are called P-frames. A missing P-frame usually produces artifacts that carry over into subsequent frames, so if an artifact persists over time, the likely cause is a lost P-frame.

H.264 also implements B-frames. This type of frame fills in information between reference frames, which means a B-frame must be held until the next reference frame arrives before its information can be used. B-frames are not used in all H.264 modes; the encoder decides which frame type is best, and there are usually many more P-frames than I-frames. Laboratory analysis showed that TelePresence I-frames were typically around 64 KB (about 50 packets of 1316 bytes), while P-frames averaged around 8 KB (about 9 packets of 900 bytes).
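
To make the effect of the frame mix on bandwidth concrete, here is a rough Python sketch that estimates a bit rate from the I- and P-frame sizes quoted above. The GOP length and frame rate are assumed values chosen for illustration, not figures from the text:

# Rough bit-rate estimate from GOP structure (illustrative sketch).
I_FRAME_BYTES = 64 * 1024   # ~64 KB per I-frame (lab figure above)
P_FRAME_BYTES = 8 * 1024    # ~8 KB per P-frame (lab figure above)
GOP_LENGTH = 30             # assumed: one I-frame followed by 29 P-frames
FPS = 30                    # assumed frame rate

bytes_per_gop = I_FRAME_BYTES + (GOP_LENGTH - 1) * P_FRAME_BYTES
gops_per_second = FPS / GOP_LENGTH
bits_per_second = bytes_per_gop * gops_per_second * 8
print(f"{bits_per_second / 1e6:.1f} Mb/s")   # roughly 2.4 Mb/s for this mix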

Motion JPEG (MJPEG)

Another type of video compression is MJPEG. This encoding does not use temporal compression, which has both advantages and disadvantages. The resulting video stream is larger, but packet sizes are more consistent at 1316 bytes of payload, and quantization can be used to offset the increased bandwidth at the expense of image quality. The advantage of MJPEG is that each video frame is independent of the previous one: if one or more packets from a particular frame are discarded, the artifact does not persist into later frames. Another advantage is that a single frame can easily be extracted from the stream without reference to an I-frame and the preceding P- and B-frames. This is useful in applications such as video surveillance, where a single frame can be extracted and e-mailed as a JPG.

Voice and Video

Voice and video are often considered close relatives. Although both are Real-time Transport Protocol (RTP) applications, the similarities stop there. They do not even use the same codecs to encode audio information: voice typically uses G.711 or G.729, while video soundtracks use codecs such as MP3 or AAC. Voice is generally considered well-behaved because each packet is a fixed size sent at a fixed rate, whereas a video frame is spread over multiple packets that are transmitted as a burst. Because a single missing packet can ruin a P-frame, and a bad P-frame can lead to persistent artifacts, video generally has more stringent loss requirements than audio. Video is also not symmetric, while voice usually is: IP phones send and receive the same amount of traffic even when the caller is silent, but video uses a separate camera and viewer, so symmetry cannot be assumed. In the case of broadcast video, the asymmetric load on the network can be large, and network policies may be needed to manage potential senders. For example, if a branch office is sourcing video, the WAN aggregation router may be receiving more data than it is sending.

Video increases the average real-time packet size and can quickly change a network's traffic profile. Without planning, this can adversely affect network performance.

Resolution

The sending station determines the resolution of the video and therefore the load on the network. This is independent of the size of the monitor used to display the video, so watching the video is not a reliable way to estimate load. Common HD formats are 720p, 1080i, 1080p, and so on, where the numeric value is the number of lines in the frame. The high-definition aspect ratio is 16:9, which for 1,080 lines results in 1,920 columns. Work is currently under way on 2160p resolution and UHDV (7,680×4,320); NHK first demonstrated that format over an IP network using 600 Mb/s of bandwidth. The video load on networks is likely to increase over time because of the demand for high-quality images. In addition to high-resolution video, there is a large amount of low-quality video that is often tunnelled over HTTP or, in some cases, over HTTPS and SSL. Typical resolutions include CIF (352×288) and 4CIF (704×576). These dimensions are chosen to be whole multiples of the 16×16 macroblocks used by the DCT, giving grids of 22×18 and 44×36 macroblocks respectively (see the sketch after the table below).

Format            Resolution       Sample bandwidth
QCIF (1/4 CIF)    176×144          260 Kb/s
CIF               352×288          512 Kb/s
4CIF              704×576          1 Mb/s
SD NTSC           720×480          Analog, 4.2 MHz
720 HD            1280×720         1-8 Mb/s
1080 HD           1920×1080        5-8 Mb/s (H.264), 12+ Mb/s (MPEG-2)
CUPC              640×480 max
YouTube           320×240          Flash (H.264)
Skype             Camera limits    128-512 Kb/s+
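
The macroblock arithmetic behind the CIF and 4CIF dimensions can be checked with a short Python sketch (illustrative only):

# Macroblock arithmetic: CIF and 4CIF dimensions are exact multiples of the
# 16x16 macroblock, giving the 22x18 and 44x36 grids mentioned above.
MACROBLOCK = 16

for name, (width, height) in {
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "720 HD": (1280, 720),
}.items():
    mb_cols, mb_rows = width // MACROBLOCK, height // MACROBLOCK
    print(f"{name}: {mb_cols}x{mb_rows} macroblocks "
          f"({mb_cols * mb_rows} per frame)")
# CIF: 22x18 macroblocks (396 per frame)
# 4CIF: 44x36 macroblocks (1584 per frame)
# 720 HD: 80x45 macroblocks (3600 per frame)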

Network load

The effect of resolution on network load is roughly a square term: an image with twice the linear dimensions requires four times the bandwidth. Color sampling, quantization, and frame rate also affect network traffic. The standard rate is 30 frames per second (actually 29.97), a somewhat arbitrary value originally chosen to match the frequency of the AC power supply; in Europe, analog video has traditionally been 25 FPS, and cinema film is shot at 24 FPS. As the frame rate decreases, so does the network load, but motion becomes less smooth, and above roughly 24 FPS additional frames show little perceived improvement in motion. Finally, the complexity of the encoder has a significant impact on the video load. H.264 encoders have great flexibility in deciding how best to encode the video, and that flexibility adds complexity; for example, MPEG-4 Part 10 allows the encoder to choose the most appropriate block size based on the surrounding pixels. Because efficient encoding is harder than decoding, and because the sender determines the load on the network, low-cost encoders typically require more bandwidth than high-end encoders. Encoding real-time CIF video with H.264 will tax all but the most powerful laptops, reaching around 90% CPU utilization without a dedicated media processor.
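
The square-law relationship between resolution and load, and the linear effect of frame rate, can be illustrated with a small Python sketch. The CIF/30 FPS baseline is an arbitrary choice for the example:

# Relative load scaling with resolution and frame rate (illustrative sketch).
# Bandwidth scales roughly with pixels per frame (a square term in the linear
# dimensions) and linearly with frame rate.
def relative_load(width, height, fps, base=(352, 288, 30)):
    bw, bh, bf = base
    return (width * height) / (bw * bh) * (fps / bf)

print(relative_load(704, 576, 30))    # 4.0  -> doubling each dimension = 4x load
print(relative_load(352, 288, 15))    # 0.5  -> halving the frame rate = half load
print(relative_load(1280, 720, 30))   # ~9.1 -> 720p vs CIF at the same frame rate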

Transport

MPEG-4 uses the same transport as MPEG-2: the elementary stream is carried in 188-byte packets that are loaded into IP. The video packets can be carried over RTP/UDP/IP or over HTTP(S)/TCP/IP.

Video over UDP is typically found in dedicated real-time applications such as video conferencing or TelePresence. In this case an RTCP channel can be established from the receiver back to the sender; it is used to manage the video session and is implementation specific. RTCP can be used to request I-frames or to report reception statistics to the sender. Both UDP and RTP provide ways to multiplex channels together: audio and video typically use different UDP ports and also have distinct RTP payload types. Deep packet inspection (DPI) can be used in the network to identify which video and audio types are present. Note that H.264 also provides a mechanism for multiplexing video layers together; this can be used, for example, to handle a scrolling ticker at the bottom of the screen without sending a continuous stream of motion vectors.
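
As a sketch of how transport packets ride inside RTP/UDP, the following Python example builds a 12-byte RTP header (per RFC 3550) in front of seven 188-byte transport packets. Payload type 33 is the static RTP payload type for MPEG-2 transport streams; the receiver address and port in the commented-out send are hypothetical:

# Minimal RTP packetization sketch for MPEG transport packets over UDP
# (illustrative only; real implementations use RTP stacks).
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=33):
    header = struct.pack(
        "!BBHII",
        0x80,                 # V=2, no padding, no extension, no CSRCs
        payload_type & 0x7F,  # marker bit clear, 7-bit payload type
        seq & 0xFFFF,         # sequence number
        timestamp & 0xFFFFFFFF,
        ssrc,
    )
    return header + payload

# Seven 188-byte transport packets per datagram, as described above.
ts_payload = bytes(188 * 7)          # placeholder transport packets
pkt = rtp_packet(ts_payload, seq=0, timestamp=0, ssrc=0x12345678)
print(len(pkt))                      # 12 + 1316 = 1328 bytes before UDP/IP headers

# To actually send it (hypothetical receiver address and port):
# import socket
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(pkt, ("203.0.113.10", 5004))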

Buffering

Jitter and latency exist on all IP networks. Jitter is variation in delay, and latency is usually caused by interface queuing. Video decoders can use a playback buffer to smooth out the jitter found in the network, but the depth of this buffer is limited: if it is too shallow, packets that arrive late are dropped; if it is too deep, the video is delayed, which is a problem in real-time applications such as TelePresence. Another limitation of a deep playback buffer is how it handles discarded packets: if a new I-frame has to be requested via RTCP, more frames are skipped during resynchronization, so a discarded packet degrades the video slightly more than it would if the loss were detected earlier. Most decoders employ dynamic playback buffers.
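
A minimal fixed-delay playout buffer, sketched below in Python, shows the trade-off: packets that arrive after their scheduled play-out time are dropped, while a deeper buffer drops fewer packets at the cost of added delay. The class and its timing values are illustrative, not taken from any real decoder:

# Minimal fixed-delay playout (jitter) buffer sketch (illustrative only).
import heapq

class PlayoutBuffer:
    def __init__(self, depth_ms):
        self.depth_ms = depth_ms          # buffer depth = added end-to-end delay
        self.heap = []                    # (playout_time_ms, packet)

    def arrive(self, packet, send_time_ms, now_ms):
        playout_time = send_time_ms + self.depth_ms
        if now_ms > playout_time:
            return False                  # arrived too late: dropped
        heapq.heappush(self.heap, (playout_time, packet))
        return True

    def due(self, now_ms):
        """Pop every packet whose play-out time has been reached."""
        out = []
        while self.heap and self.heap[0][0] <= now_ms:
            out.append(heapq.heappop(self.heap)[1])
        return out

buf = PlayoutBuffer(depth_ms=60)
buf.arrive("frame-1", send_time_ms=0,  now_ms=20)    # 20 ms of network delay: kept
buf.arrive("frame-2", send_time_ms=33, now_ms=110)   # 77 ms of delay: too late, dropped
print(buf.due(now_ms=60))                            # ['frame-1']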
