iOS Audio and Video (1) AVFoundation core class

iOS Audio and Video (2) AVFoundation video capture

iOS Audio and Video (3) AVFoundation playback and recording

iOS Audio and Video (43) AVFoundation Audio Session

iOS Audio Queue Services for AVFoundation

iOS Audio and Video (45) HTTPS self-signed certificates implement play-while-downloading

iOS Audio and Video (46) Offline and online speech recognition solution

1. Overview of play-while-downloading

If the video file is very large, the user has to wait a long time before the video starts, which is a poor experience. To solve this problem, the iOS app implements play-while-downloading: while the video data is streamed for playback, the same data is saved locally at the same time. When playback finishes, the whole video has been downloaded; the downloaded file is in .mp4 format and can be exported and played directly. The second time the user watches the same video, it is no longer fetched from the robot but read directly from the local cache, which also means it can be watched offline.

This saves data traffic, and the user can watch the video recorded by the robot in real time and seek by dragging the progress bar.

This functionality meets the following requirements:

  • Supports all functions of a normal player, including pause, play, and seeking. It can play locally cached videos or videos recorded by the robot in real time.
  • If the video finishes loading completely, the file is saved to the local cache; the next time that video is played, it is read from the local cache and no network data is requested.
  • If the video is not fully loaded (playback was closed or dragged partway), the cached part is played first while downloading is enabled at the same time, continuing from where the previous download stopped.
  • Since the robot uses HTTPS with a self-signed certificate, real-time playback must also solve the certificate-trust problem.

2. Implementation schemes for play-while-downloading

  • There are many ways for an iOS client to implement play-while-downloading. I have found three solutions so far, and their principles are described in detail below. JimuPro already uses an open-source player, VGPlayer, which essentially implements scheme 3; the remaining problem is that it does not implement HTTPS self-signed certificate authentication.

  • In iOS projects, I recommend the third scheme for implementing play-while-downloading.

2.1 Scheme 1

  • By analyzing the MP4 format, the MP4 data can be downloaded directly and written to a file, and the player then plays the local video file directly.

The idea is to download the video to a local file first and pass the local file path to the player, so the player actually plays a local file. If the playback position gets ahead of the downloaded (playable) portion, the player pauses and waits until enough data is cached before resuming. The download in this scheme has nothing to do with the player at all: it simply writes the video data sent by the server to a local file in order and lets the player read it.

In the existing implementation, the cached file path is handed to the player once 500 KB has been cached; files smaller than 500 KB are downloaded completely before playback, which is slow and needs improvement. Playback is allowed only when the download progress is at least 5 seconds ahead of the playback progress; otherwise the player pauses. If a seek lands in a position that has no cache, playback switches to the network stream and the current download stops, wasting some traffic. Each download also saves a configuration file recording whether the download completed; if it did not, the next session resumes sequential downloading from the current cached file size.

In general, the first scheme has the following disadvantages:

  1. The user may wait a long time before the video starts to play
  2. Traffic is wasted (a seek switches playback to the network stream and stops the download)
  3. It takes a lot of logic to control playback and is heavily coupled to the player code
  4. After a seek the source has to be switched, so every seek is slow

2.2 Scheme 2

  • Use a local proxy server: the server side (the robot) supports ranged (chunked) downloads, and the app embeds an HTTP proxy server. The proxy caches the data locally, and the player obtains its playback data from the proxy. This implementation is more complex, and if it is not handled carefully it can easily cause crashes.

This proxy server can also be done on the robot side, with one interface for playback and one interface for downloading.

Using an HTTPServer library, an HTTP server is started locally and the cached request address points to the local server, which holds the real URL. Whether or not caching is actually used, the HTTPServer has to be started silently whenever the application launches, which is a significant drain on app performance, and bundling the HTTPServer library also increases the package size.

2.2.1 Technical points

The features of this scheme are as follows:

  1. The proxy server intercepts the player's data requests at the socket level;
  2. Based on the intercepted Range information, it requests the video data from the network server;
  3. The video data is written to a local file, and writing and playback can continue from the seek position;
  4. Playing while downloading speeds up playback;
  5. It is completely decoupled from the player logic; the player just receives an address.

This scheme inserts a proxy server layer between the player and the video source server. It intercepts the requests sent by the player, requests data from the network server accordingly, and writes it locally. The local proxy server then reads the data from the file and returns it to the player for playback. As shown below:

As shown in the figure above, the details of the process are as follows:

  1. Start the local proxy server.
  2. The video source address is passed to the local proxy server.
  3. The video source address is translated into a local proxy address, which is used as the player's video source address.
  4. The player sends its request to the local proxy server.
  5. The local proxy server intercepts the request and, based on the parsed request information, sends a request to the real server.
  6. The local proxy server receives the data, writes it to the file, and returns the data to the player.
  7. The player receives the data and plays it.
  8. On seek, the above steps are repeated.

The above process mainly describes the real-time playback process implemented by the proxy server. The following focuses on the download process of the proxy server.

  • Download process implementation

Consider that while the video is playing, the user may drag the progress bar to seek; the video then has to be downloaded from the position the user dragged to, which leaves many holes in the video file, as shown below:

fragment = [start, end]; array = [fragment0, fragment1, fragment2, fragment3];
  1. Here a fragment is a downloaded clip; start is where the segment begins and end is where it ends.
  2. array stores the fragments; each fragment is inserted by ascending start, so the array stays sorted.
  3. The downloaded fragments are recorded in the array: array = [fragment0, fragment1, fragment2, fragment3]. (A minimal sketch of this bookkeeping follows the list.)
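To make the bookkeeping concrete, here is a minimal Swift sketch of the fragment array, assuming an exclusive end offset; the type and method names are illustrative and not taken from VGPlayer or VIMediaCache:

struct Fragment {
    var start: Int64
    var end: Int64   // exclusive end of the downloaded range (assumed for simplicity)
}

struct FragmentList {
    private(set) var fragments: [Fragment] = []   // always kept sorted by start

    // Insert a newly downloaded range and merge it with any neighbours it touches.
    mutating func insert(_ new: Fragment) {
        var merged = new
        var result: [Fragment] = []
        for f in fragments {
            if f.end < merged.start || f.start > merged.end {
                result.append(f)                      // no overlap or adjacency: keep as-is
            } else {                                   // overlapping or adjacent: merge
                merged.start = min(merged.start, f.start)
                merged.end = max(merged.end, f.end)
            }
        }
        result.append(merged)
        fragments = result.sorted { $0.start < $1.start }
    }

    // All data is cached once a single fragment covers [0, totalLength).
    func isComplete(totalLength: Int64) -> Bool {
        return fragments.count == 1 && fragments[0].start == 0 && fragments[0].end >= totalLength
    }
}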

The download is divided into two stages: a seek stage and a hole-filling stage.

  • Seek stage: downloading is driven by the position the user seeks to during playback.

There are two cases according to the position of seek:

  • Case 1: If the seek position falls inside an existing fragment (e.g. seek1 in the figure, where data already exists), data is requested from the end of that fragment (end1) to the start of the next fragment (fragment2's start2), i.e. range1 = (end1, start2). When this section finishes downloading, if the new clip is fragment1.1, then fragment1, fragment1.1 and fragment2 are merged into a single fragment1-2, so array = [fragment0, fragment1-2, fragment3]. The state after this download is shown in Figure 2:

    Downloading then continues until array = [fragment0, fragment1-3]. Then check whether fragment1-3 reaches the end of the file; if it does, the file is fully downloaded. If not, downloading continues from fragment3's end (end3) until the end of the file.

  • Case 2: If the seek position is not inside an existing fragment (e.g. seek2 in Figure 1), data is downloaded from the seek position until the next fragment starts (fragment2's start2). If that clip is labeled fragment1.1, it is merged with fragment2, giving array = [fragment0, fragment1, fragment1.1-2, fragment3]; the result after the merge is shown in Figure 3 below. The next step is to continue downloading until the end of the file.

    If a clip is too small, saving it causes the player to send an extra request the next time it plays, which is costly. For example, as shown in Figure 3, if fragment1 is only 1 KB, then reading the data between fragment0 and fragment1.1-2 requires two requests; sending requests so frequently wastes resources. So when a downloaded fragment is too small (e.g. less than 20 KB) it is not stored in the fragment array, which saves one request per playback without wasting much traffic (this controls the granularity of the fragments). The problem is that when a hole smaller than 20 KB sits in the middle of the file, that clip never gets filled. That is where the second stage, the hole-filling stage, comes in.

  • Hole-filling stage: on the second playback, if the file still has holes, every fragment is saved no matter how small. Finally, when the configuration array stores only {0, length}, where length is the total length of the video, all the data has been downloaded.

2.3 Scheme 3

A better solution on iOS is to use the native API: AVAssetResourceLoader lets us cache the audio and video being played without changing how the AVPlayer API is used.

Scheme 3 is similar to Scheme 2, but relies on the native iOS API.

  • Use the native iOS API to implement play-while-downloading:

Instead of opening a separate download thread, the data fetched for playback is saved locally at the same time. In short, one pass of traffic both plays and saves the video.

The specific implementation scheme is as follows:

  1. A proxy-like layer is added between the video player and the server. The player no longer accesses the server directly; instead it accesses a proxy object, which fetches the data from the server, returns it to the player, and at the same time caches it according to some policy.
  2. This mechanism is implemented with the resourceLoader of AVURLAsset, whose delegate is the proxy object described above (a minimal wiring sketch follows this list).
  3. Before playing a video, the player checks whether the video already exists in the local cache. If not, the player obtains the data through the proxy; if it does, the player plays the locally cached video directly.
  4. If HTTP is used, the three steps above are enough to implement play-while-downloading. If HTTPS is used with a server certificate signed by a certificate authority, it can be handled the same way as HTTP. However, with HTTPS plus a self-signed certificate, the certificate must be verified for every request issued by the resourceLoader, which is covered in the HTTPS section below.
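As a reference for steps 1 and 2, here is a minimal Swift sketch of the wiring: the URL scheme is swapped so AVPlayer cannot load the resource itself and has to ask our delegate. The class name and scheme are illustrative, not taken from VGPlayer:

import AVFoundation

final class ResourceLoaderDelegate: NSObject, AVAssetResourceLoaderDelegate {
    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool {
        // Restore the real scheme, check the local cache, or start a download here (see section 3).
        return true
    }
}

final class CachingPlayerFactory {
    let loaderDelegate = ResourceLoaderDelegate()
    let loaderQueue = DispatchQueue(label: "resource.loader.queue")

    func makePlayer(for originalURL: URL) -> AVPlayer {
        // Swap the scheme so AVPlayer must route every data request through our delegate.
        var components = URLComponents(url: originalURL, resolvingAgainstBaseURL: false)!
        components.scheme = "streamcache"              // any custom, non-HTTP scheme
        let asset = AVURLAsset(url: components.url!)
        asset.resourceLoader.setDelegate(loaderDelegate, queue: loaderQueue)
        return AVPlayer(playerItem: AVPlayerItem(asset: asset))
    }
}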

2.3.1 The AVPlayer play-while-downloading flow

Let’s first refer to the following flow chart of playing QQ music online:

First, observe and guess Penguin (QQ) Music's cache strategy (it is of course not played with AVPlayer):
  1. Start playing and download the complete file at the same time; when the download finishes, save it to the cache folder.
  2. On seek: if the seek position has already been downloaded, the seek succeeds directly; if it has not, a new download is started from the seek position to the end of the file (for example, with the download at 60% and a seek to 70%, the new download range is 70%-100%; with the download at 60% and a seek to 50%, the seek succeeds directly).
  3. If there is another seek, step 2 is repeated; in the example, a later seek to 40% starts a new download covering 40%-100%.
  4. The next time the same song is played, if it exists in the cache folder, the cached file is played directly.

The general flow of implementing play-while-downloading with AVPlayer is similar to the QQ Music caching mechanism above, and it relies on AVAssetResourceLoader. The rough flow is as follows:

As shown in the figure above, the AVPlayer play-while-downloading process can be briefly described as follows:

  1. When a video is about to play, the system first checks, based on the video URL, whether the video is already in the local cache; if it is, the cached video is played directly
  2. If it is not in the local cache, the video player requests data from the proxy
  3. While the video is loading, a loading indicator (spinner) is shown
  4. If the video loads and plays normally, the loading indicator is removed; if loading fails, the indicator is removed and an error is shown
  5. If playback stalls because of a slow network or a seek, the loading indicator is shown again and the flow returns to step 4

Caching proxy policy:

  1. When the player sends a dataRequest to the proxy, the proxy checks whether it has already issued a request to the server. If not, it issues one, downloading the whole video file from the requested position. If the proxy already has a connection to the server, it checks whether the offset of the current dataRequest is greater than the offset of the data cached so far; if so, it cancels the current request and requests the server again from that offset to the end of the file (this happens when the player seeks forward beyond the cached data).
  2. If the offset of the current dataRequest is less than the offset of the cached data but greater than the offset at which the proxy's current server request started, part of the cached data can be handed to the player, and that part is returned (this happens when the player seeks forward into data that has already been cached).
  3. If the offset of the current dataRequest is less than the offset at which the proxy's current server request started, the current server request is cancelled and a new one is made from that offset to the end of the file (this typically happens when the player seeks to a position before the range currently being downloaded).
  4. As soon as the proxy issues a new request to the server, the cached data becomes discontinuous, so the data cached in that session is not moved into the local cache when loading finishes.
  5. If the connection between the proxy and the server times out, it retries; if it still fails, the player is notified of a network error.
  6. If the server returns any other error, the proxy notifies the player of a network error.
A small sketch of the offset comparisons behind rules 1-3 follows.
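The following Swift sketch illustrates those comparisons; the names and the simplified single-session model are assumptions for illustration only:

enum ProxyAction {
    case startNewRequest(fromOffset: Int64)   // cancel and (re)connect to the server from this offset
    case serveFromCache(range: Range<Int64>)  // the requested data is already cached locally
}

func decideAction(requestedOffset: Int64,
                  cachedLength: Int64,        // contiguous bytes cached since the current server request began
                  sessionStartOffset: Int64,  // offset at which the current server request started
                  fileLength: Int64) -> ProxyAction {
    let cachedEnd = sessionStartOffset + cachedLength
    if requestedOffset > cachedEnd || requestedOffset < sessionStartOffset {
        // Rules 1 and 3: the player jumped outside what is cached or being fetched,
        // so cancel the current request and fetch from the new offset to the end of the file.
        return .startNewRequest(fromOffset: requestedOffset)
    } else {
        // Rule 2: the request falls inside the cached part, so serve it locally.
        return .serveFromCache(range: requestedOffset..<cachedEnd)
    }
}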

2.3.2 AVPlayer API Introduction

In iOS we generally use AVPlayer from the AVFoundation framework to build a custom player for network video. However, AVPlayer's APIs are highly encapsulated, so when playing network video we often cannot control its internal loading logic: seeks may fail during playback, for example, and once the data is loaded we cannot get at the data file for other uses. To make up for these shortcomings we turn to AVAssetResourceLoader, and it is also what the play-while-downloading feature relies on here.

AVAssetResourceLoader lets you control AVPlayer's data loading, including knowing what data AVPlayer needs and deciding how much data to hand to it.

Let’s take a look at the AVPlayer component diagram:

AVAssetResourceLoader: a tool introduced in iOS 6 specifically for handling AVAsset loading, which fully covers JimuPro's requirement of running on iOS 10 and above.

AVAssetResourceLoader has an AVAssetResourceLoaderDelegate, and the delegate has two important methods:

  • The method that asks the delegate to load a resource: we need to keep the loadingRequest, read or download the specified data, and finish the loadingRequest when the read or download completes.
- (BOOL)resourceLoader:(AVAssetResourceLoader *)resourceLoader 
shouldWaitForLoadingOfRequestedResource:(AVAssetResourceLoadingRequest *)loadingRequest;
  • The method that cancels loading a resource: we need to cancel the read or download operation specified by the loadingRequest.
- (void)resourceLoader:(AVAssetResourceLoader *)resourceLoader 
didCancelLoadingRequest:(AVAssetResourceLoadingRequest *)loadingRequest;

As long as we find an object that implements the AVAssetResourceLoaderDelegate protocol, hand it to the asset, and then hand the asset to AVPlayer, AVPlayer will ask the delegate during playback: "Hey, can you handle this URL?" The following method is then triggered: - (BOOL)resourceLoader:(AVAssetResourceLoader *)resourceLoader shouldWaitForLoadingOfRequestedResource:(AVAssetResourceLoadingRequest *)loadingRequest

In this method we check whether the URL in the request is supported and return YES. We can then happily download the video data while feeding it to AVPlayer to display the video.

AVURLAsset requests resources with a custom URL scheme through its AVAssetResourceLoader instance. The loader is a property of AVURLAsset, declared as: var resourceLoader: AVAssetResourceLoader { get }

AVAssetResourceLoader passes the relevant requests (AVAssetResourceLoadingRequest) to the AVAssetResourceLoaderDelegate implementation (if there is one). We can save these requests, construct our own NSURLRequest to fetch the data, and when the response arrives, set the response data on the AVAssetResourceLoadingRequest while also caching it. That completes play-while-downloading. The whole process is roughly as follows:

(Figure: the AVAssetResourceLoadingDataRequest and its currentOffset drive this data flow.)

Below we describe in detail how to implement play-while-downloading with AVPlayer and AVAssetResourceLoaderDelegate.

3 Implementation details of HTTP play-while-downloading for MP4 files

There is a lot of code on the Internet for iOS play-while-downloading. The principle is always the same, but the implementations and details differ. Two good open-source projects are recommended here:

  • Objective-C version: VIMediaCache, currently 642 stars on GitHub, which is pretty good.
  • Swift version: VGPlayer, currently 363 stars on GitHub, with relatively complete functionality; this is the one I recommend.

3.1 The principle of play-while-downloading

The principle of play-while-downloading has already been described in detail in the three schemes above. Here it is implemented with AVPlayer following the third scheme. HTTPS signature authentication is set aside for now, and play-while-downloading is explained over plain HTTP first. The main flow diagram is as follows:

The whole process is divided into two chunks, one is real-time playback video, one is caching strategy download video.

3.1.1 Mechanism of Real-time Playback

Let’s take a look at the first piece, real-time video playback (forget downloading and caching). In terms of implementation, we can divide it into two steps:

  1. Know how to request data: what the URL is and how much data to download.
  2. Know how to feed the downloaded data to AVPlayer.

3.1.1.1 Requesting Data

In the callback method above we get an AVAssetResourceLoadingRequest object. It does not have many properties and methods; to reduce noise, I have trimmed its header to keep only the properties and methods we will use and need to explain:

@interface AVAssetResourceLoadingRequest : NSObject

 @property (nonatomic, readonly) NSURLRequest *request;

 @property (nonatomic, readonly, nullable) AVAssetResourceLoadingContentInformationRequest *contentInformationRequest NS_AVAILABLE(10_9, 7_0);

 @property (nonatomic, readonly, nullable) AVAssetResourceLoadingDataRequest *dataRequest NS_AVAILABLE(10_9, 7_0);

 - (void)finishLoading NS_AVAILABLE(10_9, 7_0);

 - (void)finishLoadingWithError:(nullable NSError *)error;

 @end

The request in AVAssetResourceLoadingRequest represents the original request. Because AVPlayer triggers a chunked download strategy, we also need the range information from dataRequest. With the request URL and the Range, we can create a new NSURLRequest with the Range header set and let the downloader fetch the data within that range of the file.

3.1.1.2 Feeding data to AVPlayer

When AVPlayer triggers the download, it always starts with a data request whose Range is 0-2. This request is used to confirm information about the video, such as the file type and the file's data length. When the downloader issues this request and receives the server's response, we have to fill the video information into the contentInformationRequest property of the AVAssetResourceLoadingRequest, telling the loader the format and length of the video to be downloaded.
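For illustration, a minimal Swift sketch of filling contentInformationRequest could look like this, assuming the total length has already been read from the Content-Range header (see section 3.1.2.1) and that the content is MP4:

import AVFoundation
import MobileCoreServices

func fillContentInformation(for loadingRequest: AVAssetResourceLoadingRequest,
                            totalLength: Int64) {
    guard let info = loadingRequest.contentInformationRequest else { return }
    info.contentType = kUTTypeMPEG4 as String   // UTI for MP4; the robot records H.264/MP4
    info.contentLength = totalLength            // total size of the file on the server
    info.isByteRangeAccessSupported = true      // lets AVPlayer seek by issuing Range requests
}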

When -(void)finishLoading is called on the AVAssetResourceLoadingRequest, the loader decides how to proceed based on the information in contentInformationRequest. For example, if the contentType obtained for the file pointed to by the AVURLAsset's URL is not supported by the system, the AVURLAsset will not play normally.

After the video information has been provided, we receive the couple of bytes specified by that first request. What do we do with downloaded data? We hand it to the dataRequest of the AVAssetResourceLoadingRequest: -(void)respondWithData:(NSData *)data; is designed to receive downloaded data and can be called multiple times with incremental, sequential chunks.

When all the data required by the AVAssetResourceLoadingRequest has been downloaded, call -(void)finishLoading to finish it; AVAssetResourceLoader will then issue the request for the next data segment. If the request fails, call -(void)finishLoadingWithError:(nullable NSError *)error; to end the download.
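A minimal Swift sketch of pushing data and finishing the request from URLSession delegate callbacks might look like the following; the class name is hypothetical, and the tasks dictionary mirrors the one used in the request-creation code in section 3.1.2.1:

import Foundation
import AVFoundation

final class LoaderDataForwarder: NSObject, URLSessionDataDelegate {
    var tasks: [URLSessionTask: AVAssetResourceLoadingRequest] = [:]

    func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
        guard let loadingRequest = tasks[dataTask] else { return }
        loadingRequest.dataRequest?.respond(with: data)   // may be called repeatedly with sequential chunks
        // ...append `data` to the on-disk cache here as well...
    }

    func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) {
        guard let loadingRequest = tasks[task] else { return }
        if let error = error {
            loadingRequest.finishLoading(with: error)     // report the failure to AVAssetResourceLoader
        } else {
            loadingRequest.finishLoading()                // done; the loader will request the next segment
        }
        tasks[task] = nil
    }
}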

3.1.1.3 Retry Mechanism

In actual testing, AVAssetResourceLoader was found to sometimes cancel an in-progress load via - (void)resourceLoader:(AVAssetResourceLoader *)resourceLoader didCancelLoadingRequest:(AVAssetResourceLoadingRequest *)loadingRequest and then re-issue the load request. If part of the data has already been downloaded, the re-issued request starts from the part that has not been downloaded yet.

AVAssetResourceLoaderDelegate provides three other methods for handling special scenarios, but they are rarely needed in this environment, so you can choose not to implement them.

3.1.2 Mechanism of Download Caching

Through the real-time playback principle introduced above, we already understand the AVAssetResourceLoaderDelegate mechanism: through the delegate, AVAsset tells the outside world when it needs to load data, and the outside world takes over the whole video download process.

When we take over video downloads, we can do anything with video data. For example: caching, recording download speed, getting download progress, and so on.

To implement a downloader, start a DataTask request with URLSession, feed the received data to the dataRequest, and write it to the local disk. There are three main points to note when implementing the downloader: 1. Range requests; 2. Cancellable downloads; 3. Fragment caching.

3.1.2.1 Range request

  • Supporting ranged (Range) requests is the key to both real-time playback and downloading.

Each LoadingRequest is returned with information about the data Range of the request, such as the expected 100 to 500 bytes of the request. The Range value of the HTTPHeader needs to be set when creating the URLRequest.

NSString *range = [NSString stringWithFormat:@"bytes=%lld-%lld", fromOffset, endOffset];
[request setValue:range forHTTPHeaderField:@"Range"];

The most complex part of chunked downloading is handling the contentOffset of the response data; fortunately AVAssetResourceLoader does a lot of that work for us, and we only need to use AVAssetResourceLoadingRequest properly.

For example, here is the code section, which starts with getting the original request and sending the new one

func resourceLoader(_ resourceLoader: AVAssetResourceLoader, shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool {
    if self.session == nil {
        // Construct the session
        let configuration = URLSessionConfiguration.default
        configuration.requestCachePolicy = .reloadIgnoringLocalAndRemoteCacheData
        configuration.networkServiceType = .video
        configuration.allowsCellularAccess = true
        self.session = URLSession(configuration: configuration, delegate: self, delegateQueue: nil)
    }
    // Construct the request
    var urlRequst = URLRequest.init(url: self.initalUrl!, cachePolicy: .reloadIgnoringLocalCacheData, timeoutInterval: 20) // 20 s timeout
    urlRequst.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    urlRequst.httpMethod = "GET"
    // Set the Range request header
    guard let wrappedDataRequest = loadingRequest.dataRequest else {
        // This loading request carries no data request
        return true
    }
    let range: NSRange = NSMakeRange(Int.init(truncatingBitPattern: wrappedDataRequest.requestedOffset), wrappedDataRequest.requestedLength)
    // Note: HTTP Range end offsets are inclusive, so strictly the end should be range.location + range.length - 1
    let rangeHeaderStr = "bytes=\(range.location)-\(range.location + range.length)"
    urlRequst.setValue(rangeHeaderStr, forHTTPHeaderField: "Range")
    urlRequst.setValue(self.initalUrl?.host, forHTTPHeaderField: "Referer")
    guard let task = session?.dataTask(with: urlRequst) else {
        fatalError("cant create task for url")
    }
    task.resume()
    self.tasks[task] = loadingRequest
    return true
}

After the response is received, capture the packets and check the response headers. The figure below shows the response headers of two such requests:

The important headers are Content-Length and Content-Range. Content-Range has the form "bytes start-end/total", where total is the full file size, and Content-Length = end - start + 1.
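For illustration, a small Swift helper that recovers both lengths from such a response might look like this (an assumption-level sketch, not code from VGPlayer):

import Foundation

// Content-Range looks like "bytes 100-499/1000"; the part after "/" is the total file size.
func parseContentRange(from response: HTTPURLResponse) -> (rangeLength: Int64, totalLength: Int64)? {
    guard let contentRange = response.allHeaderFields["Content-Range"] as? String,
          let rangePart = contentRange.components(separatedBy: " ").last else { return nil }
    let pieces = rangePart.components(separatedBy: "/")
    guard pieces.count == 2, let total = Int64(pieces[1]) else { return nil }
    let startEnd = pieces[0].components(separatedBy: "-")
    guard startEnd.count == 2,
          let start = Int64(startEnd[0]), let end = Int64(startEnd[1]) else { return nil }
    return (end - start + 1, total)   // Content-Length of this response = end - start + 1
}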

3.1.2.2 Download can be cancelled

When loading a video, AVAsset often cancels a data request before it has completed and then issues a new LoadingRequest. This mechanism is a black box inside AVAsset; the exact logic is unknown, and it looks more like an internal retry mechanism. As the downloader, you need to stop downloading immediately when you receive the cancellation notification. Since cancelling a DataTask is asynchronous, the next LoadingRequest may arrive before the cancel completes, so you must ensure that only one downloader is downloading a given URL at any time, otherwise the data will get mixed up.
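A sketch of the cancellation side, extending the LoaderDataForwarder sketch from section 3.1.1.2, could look like this:

import AVFoundation

extension LoaderDataForwarder: AVAssetResourceLoaderDelegate {
    // Cancel the in-flight network task that belongs to the cancelled loading request.
    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        didCancel loadingRequest: AVAssetResourceLoadingRequest) {
        for (task, request) in tasks where request === loadingRequest {
            task.cancel()      // cancellation is asynchronous: a new LoadingRequest may arrive first,
            tasks[task] = nil  // so only one downloader per URL must ever be active
        }
    }
}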

3.1.2.3 Fragment Cache

If you’re just downloading a video and the data is monotonically increasing, caching is easier. However, the reality is that users’ seek operations on Player bring great challenges to video cache management. Once user operations are involved, there will be more possibilities and higher complexity.

Without seek: when the network speed is adequate, the cached data stays ahead of the playback position and playback is smooth; when the network is slow, the player keeps loading until there is enough data to play; when the network speed fluctuates, the player alternates between playing for a few seconds and loading.

There are three possibilities when you join Seek:

  • In the first case, the video is completely downloaded and seek only needs to read the corresponding cache. In this case, it is easiest to read the data directly from the cache.

  • In the second case, the video is partly downloaded and the user seeks into the undownloaded part, so the range requested by the LoadingRequest is entirely undownloaded data. At this point the data currently being downloaded must be cancelled, and downloading restarts from the seek position. To support seek, the downloader needs to support fragment caching. The current solution is that downloaded video data is written into the file at the offset given by the requested Range, and each video file keeps a companion download-information file. This information file records how much data has been downloaded, the total size, which fragments have been downloaded, and so on; later cache management depends heavily on this configuration file.

  • In the third case, the video has been sought several times, and the user seeks to a point where the range requested by the LoadingRequest contains both downloaded and undownloaded parts. This is the most complicated case. The simple approach is to treat it like the previous case and re-download everything; the logic is simple, but the same data gets downloaded multiple times, so it is not optimal. My goal is the better, though considerably more complex, solution.

    After receiving the LoadingRequest's requested range, the downloader first looks at the downloaded-data information, creates an action for each already-downloaded fragment, and creates an action for each fragment that still has to be downloaded remotely. The final combination might be LocalAction(50-100 bytes) + RemoteAction(101-200 bytes) + LocalAction(201-300 bytes) + RemoteAction(300-400 bytes). Each action fetches its data in order and returns it to the LoadingRequest, as in the diagram below. (A small sketch of this splitting follows.)
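A minimal Swift sketch of that local/remote split, using half-open byte ranges and illustrative names, might look like this:

enum CacheAction {
    case local(Range<Int64>)    // bytes already on disk, read from the cache file
    case remote(Range<Int64>)   // bytes still missing, fetch from the server
}

// `cached` is the sorted, non-overlapping list of downloaded ranges for this file.
func actions(for requested: Range<Int64>, cached: [Range<Int64>]) -> [CacheAction] {
    var result: [CacheAction] = []
    var cursor = requested.lowerBound
    for fragment in cached where fragment.upperBound > requested.lowerBound && fragment.lowerBound < requested.upperBound {
        if cursor < fragment.lowerBound {                        // a hole before this fragment
            result.append(.remote(cursor..<fragment.lowerBound))
        }
        let end = min(fragment.upperBound, requested.upperBound) // the part of this fragment we can use
        if cursor < end {
            result.append(.local(max(cursor, fragment.lowerBound)..<end))
        }
        cursor = max(cursor, end)
    }
    if cursor < requested.upperBound {                           // trailing hole after the last fragment
        result.append(.remote(cursor..<requested.upperBound))
    }
    return result
}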

3.2 Implementation details of play-while-downloading

  • When downloading a video, errors can easily occur and the download may fail. In our own AVAssetResourceLoaderDelegate implementation, if an error is thrown on the first request the player immediately shows an error state; but if data has already been returned and an error occurs afterwards, AVAssetResourceLoader ignores the error and keeps loading until it times out. In the VIMediaCache implementation mentioned above, VIResourceLoaderManager provides a delegate that surfaces internal errors so the calling code can decide how to handle them.

  • The same URL cannot be downloaded more than once at the same time: because the cache internally shares one download configuration file per URL, downloading the same URL simultaneously would modify the download information concurrently and corrupt it. VIMediaCache does a simple check inside MediaCache: if a URL is being downloaded and you try to download the same URL again, an error is thrown saying the download cannot start.

  • In fact, VGPlayer is essentially a Swift port of VIMediaCache; VIMediaCache is the original Objective-C version and is worth studying carefully.

  • Given that our JimuPro project is pure Swift and we prefer not to pull Objective-C code into it via third-party libraries, I chose VGPlayer to implement play-while-downloading from the robot to the iOS app.

  • Since VGPlayer does not implement HTTPS certificate validation, I implement the certificate validation code here in a simple way; HTTPS certificate authentication itself is explained below. In short, I added a URLSession delegate implementation to the VGPlayerDownloadURLSessionManager class in VGPlayerDownloadURLSessionManager.swift:

  • Even with the source code above, there are still details that need attention. For example, play-while-downloading of MP4 files depends not only on the implementation described above, but also on the MP4 file's internal layout: if the MP4 file's metadata is placed at the end of the file, the file needs to be converted on the server side before play-while-downloading can work.

Next, we will explain the mp4 format processing problem in detail.

3.3 Watch out for the mp4 file format

It is important to be clear that even with the caching approach above, not every MP4 supports play-while-downloading; you need to understand how play-while-downloading works to see why.

The mp4 video header contains some metadata. Metadata includes video width, height, video duration, and encoding format. Mp4 metadata is usually in the header of the video file, so the player reads the metadata of the video first when reading the file, and then starts playing the video.

However, there are cases where the metadata of the MP4 video is at the end of the video file. Then, when the player loads the file, it has to read to the end before it can obtain the video information and start playing; the same is true if the metadata is missing. As a result, such MP4 videos do not support loading and playing at the same time.

  • Why will appear above said this kind of situation, let’s briefly analyze the principle below:

The request header carries a Range: bytes field that tells the media server which part of the file is being requested. In an MP4 file, all data is encapsulated in boxes (atoms), and two atoms are particularly important: the moov atom and the mdat atom.

  • moov atom: a data structure containing the media's metadata, including the box information of the media, format specification, and so on.
  • mdat atom: the actual media data; for video, this is the video pictures themselves.

If you send a request on iOS with NSURLSession and fetch the video resource directly, a video whose metadata is at the head of the file can be played while downloading, whereas a video whose metadata is at the end of the file plays only after it has fully downloaded. Why?

The answer: there is only one moov and one mdat, but since an MP4 file is made up of several such boxes (atoms), the order in which these two atoms appear can differ between files. To speed up streaming, one optimization we can make is to move moov in front of mdat. AVPlayer can only start playing once the item reaches AVPlayerItemStatusReadyToPlay, and a necessary condition for entering that state is that the player has read the media's moov box.

If mdat comes before moov, such an MP4 file cannot be played while downloading. For an MP4 to support play-while-downloading, moov must be at the head of the file, before mdat, as shown below:

So, if an MP4 file's metadata is placed at the end of the file, the file has to be converted on the server side before play-while-downloading will work.
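Before deciding whether a file needs conversion, the top-level atom order can be checked directly; the following is a rough Swift sketch for local files only (it reads the whole file into memory, which is fine for a quick check but not for huge files):

import Foundation

// Report whether `moov` appears before `mdat`, i.e. whether the file is suitable
// for progressive playback as-is.
func moovPrecedesMdat(fileURL: URL) -> Bool? {
    guard let data = try? Data(contentsOf: fileURL) else { return nil }
    var offset = 0
    while offset + 8 <= data.count {
        // Every top-level atom starts with a 4-byte big-endian size and a 4-byte type code.
        var size = (0..<4).reduce(0) { ($0 << 8) | Int(data[offset + $1]) }
        let type = String(bytes: data[(offset + 4)..<(offset + 8)], encoding: .ascii) ?? ""
        if type == "moov" { return true }        // metadata first: progressive playback works
        if type == "mdat" { return false }       // media data first: moov is presumably at the end
        if size == 1 {                           // 64-bit atoms keep the real size in the next 8 bytes
            guard offset + 16 <= data.count else { return nil }
            size = (8..<16).reduce(0) { ($0 << 8) | Int(data[offset + $1]) }
        } else if size == 0 {
            break                                // size 0 means "atom extends to end of file"
        }
        offset += max(size, 8)                   // guard against corrupt zero/short sizes
    }
    return nil                                   // neither atom found
}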

One possible approach is the qt-faststart tool. qt-faststart can move the moov atom metadata from the end of an MP4 file to the front, but it can only process files whose moov atom is at the end. If we want to process all files uniformly, the overall idea is: first run the MP4 through FFmpeg, which re-muxes it and leaves the moov atom at the end, then use qt-faststart to move it to the front.

3.3.1 Special processing of MP4 metadata

  • To compile FFmpeg, download it here.
  1. Unpack the FFmpeg package: tar -jxvf ffmpeg-3.3.3.tar.bz2
  2. Configure: ./configure --enable-shared --prefix=/usr/local/ffmpeg (prefix sets the install location; the default is /usr/local).
  3. Install:
make
make install

Compilation and installation take quite a while, roughly 10 minutes; afterwards you can check the installation directory. That is not the end of it: at this point, running ffmpeg usually reports the error:

ffmpeg: error while loading shared libraries: libavfilter.so.1: cannot open shared object file: No such file or directory
  1. You need to edit the /etc/ld.so.conf file, add /usr/local/ffmpeg/lib to it, save and exit, then run the ldconfig command:
echo "/usr/local/ffmpeg/lib" >> /etc/ld.so.conf
# Note: this is the FFmpeg install location chosen earlier
ldconfig
  • Qt-faststart installs as described aboveqt-faststartIn fact, the tool is in ffMPEG source, because ffMPEG decompression end file in the existence of qt-Faststart source, so directly used, location in the decompression path/tools/qt-faststart.c

If you want to download it separately, click here: qt-faststart download. In the unpacked FFmpeg directory, run make tools/qt-faststart; a qt-faststart executable (alongside the .c file) appears under tools.

Re-mux the original MP4 with FFmpeg:

cd <ffmpeg install directory>/bin
./ffmpeg -i /opt/mp4test.mp4 -acodec copy -vcodec copy /opt/1.mp4
# /opt/mp4test.mp4 is the original mp4 file path; /opt/1.mp4 is the generated file path

Then qt-faststart moves the metadata to the beginning of the file:

cd <ffmpeg source directory>/tools
./qt-faststart /opt/1.mp4 /opt/2.mp4

4 HTTPS self-signed certificate authentication for play-while-downloading

  • For details on how HTTPS self-signed certificates work, see our previous blog post: IOS uses self-signed Certificates to develop HTTPS file transfers
  • HTTPS SSL encryption process for establishing a connection

The diagram below:

Process details:

  1. ① The client's browser sends a request to the server, carrying the client's SSL protocol version number, supported encryption algorithms, a generated random number, and other information the server and client need in order to communicate.
  2. ② The server sends back its SSL protocol version number, chosen encryption algorithm, a random number, and other related information; at the same time, the server sends its certificate to the client.
  3. ③ The client uses the information sent by the server to verify the server's legitimacy: whether the certificate has expired, whether the CA that issued the server certificate is trusted, whether the issuer's public key correctly verifies the "digital signature" on the server certificate, and whether the domain name on the certificate matches the server's actual domain name. If the verification fails, communication is terminated; if it passes, the process continues with step 4.
  4. ④ The client randomly generates a symmetric "pre-master" secret for the session, encrypts it with the server's public key (obtained from the server certificate in step ②), and sends the encrypted pre-master secret to the server.
  5. ⑤ If the server requires client authentication (optional during the handshake), the client generates a random number, signs it, and sends the signed random number together with the client's own certificate and the encrypted pre-master secret to the server.
  6. ⑥ If the server requires client authentication, it must verify the client certificate and the signed random number: whether the certificate is within its validity period, whether the CA that issued it is trusted, whether the issuing CA's public key verifies the signature, and whether the certificate appears on the Certificate Revocation List (CRL). If any check fails, communication is terminated immediately; if they pass, the server decrypts the encrypted pre-master secret with its own private key and derives the master session key through a series of steps (the client derives the same master key in the same way).
  7. ⑦ The server and client use the same master session key, a symmetric key, for the encryption and decryption of secure data communication in the SSL protocol. SSL also ensures the integrity of the data so that any change in the communication is detected.
  8. ⑧ The client tells the server that subsequent data communication will use the master key from step ⑦ as the symmetric key, and notifies the server that the client-side handshake is finished.
  9. ⑨ The server tells the client that subsequent data communication will use the master key from step ⑦ as the symmetric key, and notifies the client that the server-side handshake is finished.
  10. ⑩ The SSL handshake is complete and data communication over the SSL secure channel begins; the client and server use the same symmetric key for data communication and check its integrity.
  • Here I only give the HTTPS certificate authentication code used with VGPlayer in my project; only two simple steps are needed:
  1. Add the server's self-signed certificate to the project:
  2. Add a URLSession delegate implementation to the VGPlayerDownloadURLSessionManager class in VGPlayerDownloadURLSessionManager.swift:
public func urlSession(_ session: URLSession, didReceive challenge: URLAuthenticationChallenge, completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void) {
    let method = challenge.protectionSpace.authenticationMethod
    if method == NSURLAuthenticationMethodServerTrust {
        // Verify the server: either trust it directly or verify its certificate.
        // Verifying the certificate is recommended because it is more secure.
        let (disposition, credential) = HTTPSManager.trustServerWithCer(challenge: challenge)
        completionHandler(disposition, credential)
    } else if method == NSURLAuthenticationMethodClientCertificate {
        // Authenticate with the client certificate
        let (disposition, credential) = HTTPSManager.sendClientCer()
        completionHandler(disposition, credential)
    } else {
        // In other cases, do not authenticate
        completionHandler(.cancelAuthenticationChallenge, nil)
    }
}
  1. The authentication class HTTPSManager is implemented as follows:
//
// HTTPSManager.swift
// JimuPro
//
// Created by yulu kong on 2019/10/28.
// Copyright © 2019 UBTech. All rights reserved.
//

import UIKit


class HTTPSManager: NSObject {
    
//    // MARK: - SSL certificate processing
//    static func setKingfisherHTTPS() {
//        // Obtain the downloader singleton
//        let downloader = KingfisherManager.shared.downloader
//        // Trusted server IP address
//        downloader.trustedHosts = Set([ServerTrustHost.fileTransportIP])
//    }
//
//    static func setAlamofireHttps() {
//
//        SessionManager.default.delegate.sessionDidReceiveChallenge = { (session: URLSession, challenge: URLAuthenticationChallenge) in
//
//            let method = challenge.protectionSpace.authenticationMethod
//            if method == NSURLAuthenticationMethodServerTrust {
//                // Authenticate the server. You can trust it directly or verify the certificate;
//                // certificate verification is recommended for greater security.
//                return HTTPSManager.trustServerWithCer(challenge: challenge)
////                return HTTPSManager.trustServer(challenge: challenge)
//
//            } else if method == NSURLAuthenticationMethodClientCertificate {
//                // Authenticate with the client certificate
//                return HTTPSManager.sendClientCer()
//
//            } else {
//                // In other cases, do not authenticate
//                return (.cancelAuthenticationChallenge, nil)
//            }
//        }
//    }
    
    // Trust the server directly without doing any validation
    static private func trustServer(challenge: URLAuthenticationChallenge) -> (URLSession.AuthChallengeDisposition, URLCredential?) {
        let disposition = URLSession.AuthChallengeDisposition.useCredential
        let credential = URLCredential(trust: challenge.protectionSpace.serverTrust!)
        return (disposition, credential)
    }
    
    // Verify the server certificate
    static func trustServerWithCer(challenge: URLAuthenticationChallenge) -> (URLSession.AuthChallengeDisposition, URLCredential?) {
        var disposition: URLSession.AuthChallengeDisposition = .performDefaultHandling
        var credential: URLCredential?

        // Get the certificate sent by the server
        let serverTrust: SecTrust = challenge.protectionSpace.serverTrust!
        let certificate = SecTrustGetCertificateAtIndex(serverTrust, 0)!
        let remoteCertificateData = CFBridgingRetain(SecCertificateCopyData(certificate))!

        // Load the local CA certificate
//        let cerPath = Bundle.main.path(forResource: "oooo", ofType: "cer")!
//        let cerUrl = URL(fileURLWithPath: cerPath)
        let cerUrl = Bundle.main.url(forResource: "server", withExtension: "cer")!
        let localCertificateData = try! Data(contentsOf: cerUrl)

        if remoteCertificateData.isEqual(localCertificateData) {
            // The server certificate passed verification
            disposition = URLSession.AuthChallengeDisposition.useCredential
            credential = URLCredential(trust: serverTrust)
        } else {
            // The server certificate failed verification: reject the connection
            disposition = URLSession.AuthChallengeDisposition.cancelAuthenticationChallenge
            credential = nil
        }

        return (disposition, credential)
    }
    
    // Send the client certificate to the server for verification
    static func sendClientCer() -> (URLSession.AuthChallengeDisposition, URLCredential?) {
        let disposition = URLSession.AuthChallengeDisposition.useCredential
        var credential: URLCredential?

        // Obtain the path of the P12 certificate file
        let path: String = Bundle.main.path(forResource: "clientp12", ofType: "p12")!
        let PKCS12Data = NSData(contentsOfFile: path)!
        let key: NSString = kSecImportExportPassphrase as NSString
        let options: NSDictionary = [key: "123456"] // Client certificate password

        var items: CFArray?
        let error = SecPKCS12Import(PKCS12Data, options, &items)

        if error == errSecSuccess {
            let itemArr = items! as! [[String: Any]]
            let item = itemArr.first!

            let secIdentityRef = item["identity"] as! SecIdentity
            let chainRef = item["chain"] as? [Any]

            credential = URLCredential(identity: secIdentityRef, certificates: chainRef, persistence: URLCredential.Persistence.forSession)
        }

        return (disposition, credential)
    }
}



6 Basic principles of the player

6.1 Introduction to Video Formats

  • Mp4 is also called MPEG-4.
  1. MP4 is a compression coding standard for audio and video information. It was developed by the International Organization for Standardization (ISO) and the Moving Picture Experts Group (MPEG) under the International Electrotechnical Commission (IEC). The first version was approved in October 1998. The second edition was adopted in December 1999. The mPEG-4 format is mainly used for streaming, CD, voice (video telephony), and television broadcasting.
  2. Mpeg-4 includes most of the functionality of mpeg-1 and mpeg-2 as well as the benefits of other formats. It also adds and extends support for VirtualReality Modeling Language (VRML), object-oriented composite files (including sound effects, Video and VRML objects) as well as digital rights management (DRM) and other interactive features. One of mPEG-4’s more advanced features than MPEG-2 is that it no longer uses macro blocks for image analysis, but instead records individual changes on the image, so even though the image changes quickly and the bit rate is insufficient, there is no square picture.
  • MP4 standard MPEG-4 code stream mainly includes basic code stream and system stream. Basic code stream includes encoding stream representation of audio and video and scene description. Each basic code stream contains only one data type and is decoded by its own decoder. The system flow specifies the method of generating interactive modes based on encoded audio-visual information and related scene description information, and describes its interactive communication system.

  • MP4 can be understood as a video container (encapsulation) format: a storage container for video information that holds the video data, audio data, and the related configuration information the encapsulated file needs (for example how the audio and video are related and how to decode them). The direct manifestation of a video container format is the corresponding video file format.

Common encapsulation formats are as follows:

Encapsulation format: It is to compress the encoded video data and audio data into a file according to a certain format. This file can be called a container. Of course it’s just a shell.

We usually store not only audio data and video data, but also metadata for video synchronization. For example, subtitles. These kinds of data will be handled by different programs, but they are bound together when they are transferred and stored.

  • Common video container formats:
  1. AVI: introduced to compete with the QuickTime format (MOV); it only supports audio at a fixed constant bit rate (CBR)
  2. MOV: Quicktime encapsulation
  3. WMV: Launched by Microsoft as a market competition
  4. MKV: universal wrapper, good compatibility and cross-platform, error correction, can take external subtitles
  5. FLV: This encapsulation method can protect the original address and is not easy to be downloaded. At present, some video sharing websites adopt this encapsulation method
  6. MP4: Mainly used for MPEG4 encapsulation, mainly used on mobile phones.
  • Video codec mode

Video codec refers to compressing or decompressing digital video. When choosing a codec we need to balance several factors: the quality of the video, the amount of data needed to represent it (commonly called the bit rate), the complexity of the encoding and decoding algorithms, robustness against data loss and errors, ease of editing, random access, the maturity of the encoding algorithm design, end-to-end latency, and more.

  • Common video coding methods:
  • The H.26x series, led by the ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union), includes H.261, H.262, H.263, H.264 and H.265
  1. H.261, mainly used in older video conferencing and video telephony systems. It was the first digital video compression standard in use. Virtually all subsequent standard video codecs are based on it.
  2. H.262, equivalent to MPEG-2 Part II, is used in DVD, SVCD, and most digital video broadcast systems and wired distribution systems.
  3. H.263, mainly used in video conferencing, video telephony and network video related products. H.263 represents a significant performance improvement over its predecessors in terms of compression of progressive video sources. Especially in the low bit rate end, it can greatly save bit rate on the premise of guaranteeing certain quality.
  4. H.264, equivalent to MPEG-4 Part 10, also known as Advanced Video Coding (AVC), is a Video compression standard. It is a widely used high-precision Video recording, compression, and publishing format. This standard introduces a series of new technologies that can greatly improve compression performance and greatly surpass previous standards at both high and low bit rates.
  5. H.265, known as High Efficiency Video Coding (HEVC), is a Video compression standard that is the successor to H.264. HEVC is believed to not only improve image quality, but also achieve two times the compression rate of H.264 (equivalent to 50% reduction of bit rate under the same picture quality), and can support 4K resolution and even ultra high definition TV, the highest resolution up to 8192×4320 (8K resolution), which is the current development trend.
  • The MPEG series, developed by the Moving Picture Experts Group (MPEG) under the International Organization for Standardization (ISO).
  1. MPEG-1 Part 2, mainly used on VCD and in some online videos. Its codec quality is roughly equivalent to the original VHS videotape.
  2. MPEG-2 Part 2, equivalent to H.262, used in DVD, SVCD, and most digital video broadcast and cable distribution systems.
  3. MPEG-4 Part 2, usable for network transmission, broadcasting, and media storage. It improves on the compression performance of MPEG-2 Part 2 and the first version of H.263.
  4. MPEG-4 Part 10, equivalent to H.264, a standard developed jointly by the two coding organizations.

Think of the "video container format" as a box that holds the video, the audio, the "video codec" used, and other information. The QuickTime File Format (.mov) supports almost every video codec, and MPEG-4 (.mp4) also supports a wide range of codecs. Knowing that a file is a .mov only tells you that its container is QuickTime File Format.

It does not tell you how the content is encoded. The usual notation is A/B, where A is the "video codec" and B is the "video container format". For example, an H.264/MOV video file has a QuickTime File Format container and H.264-encoded video.

The robot here records video as H.264/MP4, so the play-while-downloading scheme I implement targets exactly this: MP4 container files with H.264-encoded video.

The biggest advantage of H264 is that it has a very high data compression ratio. Under the same image quality,H264’s compression ratio is more than 2 times that of MPEG-2 and 1.5~2 times that of MPEG-4.

If the original file size is 88GB, using MPEG-2 compression standard compressed to 3.5GB, compression ratio of 25∶1, and using H.264 compression standard compressed to 879MB, from 88GB to 879MB, H.264 compression ratio of 102∶1

  • Normally, after our robot captures the video stream, the YUV 4:2:0 data is encoded with H264 hardware encoding or FFmpeg H264 software encoding to produce the encoded H264 stream.
  • When playing on iOS, we actually take that H264 stream frame by frame and decode it with hardware decoding (the VideoToolbox framework APIs) or FFmpeg software decoding (the H264 decoder in FFmpeg). Decoding yields raw YUV data, which is converted to RGB and rendered as a texture onto the view's layer with OpenGL ES or Metal.
  • In fact, all the low-level code for decoding and playback is encapsulated by AVPlayer in the AVFoundation framework; none of these details are exposed to us, and we only need to pass in a URL to play a video.

In order to better understand the principle of playing video, I also briefly introduce the relevant knowledge of H264 decoding

6.2 the H264 profile

  • H264 stream structure: an H264 video is compressed into a sequence of frames. A frame contains an image, which is divided into many slices; each slice can be divided into macroblocks, and each macroblock consists of many sub-blocks, as shown below:

    In the H264 structure, the encoded data of a video image is called a frame. A frame is composed of one slice or more slices, one slice is composed of one or more macro blocks (MB), and one macro block is composed of 16×16 YUV data. Macroblock is the basic unit of H264 coding.

  • Field and Frame: A scene or frame of a video can be used to produce an encoded image. In television, to reduce large areas of flickering, a frame is divided into two interlaced fields.

  • Slice: within each image, the macroblocks are arranged into slices. Slices are divided into I slices, B slices, P slices, and some other types.

  1. An I slice contains only I macroblocks, a P slice can contain P and I macroblocks, and a B slice can contain B and I macroblocks.
  2. I macroblocks perform intra-frame prediction using already-decoded pixels of the current slice as the reference.
  3. P macroblocks use previously encoded images as reference images for inter-frame prediction.
  4. B macroblocks use bidirectional reference images (the preceding and following frames) for inter-frame prediction.
  5. The purpose of slices is to limit the spread of bit errors and keep slices independent of each other.
  • H264 code stream hierarchical structure diagram:

NAL unit: NALU header + NALU body; the body is composed of slices. Slice: slice header + slice data. Slice data: macroblock type + PCM data, or macroblock type + macroblock prediction mode + residual data. Residual data: residual blocks.

  • A NAL unit consists of a NALU header plus a slice, and a slice can be further divided into a slice header and slice data. As we saw above, one H264 frame is composed of multiple slices, because one frame of data may be transmitted in several pieces. The diagram below:

  • Each slice consists of a slice header + slice data, and the slice data contains many macroblocks. Each macroblock contains the macroblock type, the macroblock prediction, and the residual data. See the diagram below:

A compressed H264 frame can contain multiple slices, and contains at least one slice, as shown below:
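To make the NALU structure concrete, here is a minimal sketch in plain C. It assumes the buffer is an Annex-B elementary stream with 4-byte 00 00 00 01 start codes (MP4 containers actually store length-prefixed NAL units, and 3-byte start codes also exist; both are ignored here for brevity). It walks the stream and prints each NAL unit type:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

// Minimal sketch: scan an Annex-B H264 buffer for 4-byte start codes (00 00 00 01)
// and print each NAL unit type (1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS).
static void scanNalUnits(const uint8_t *data, size_t length) {
    for (size_t i = 0; i + 4 < length; i++) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 0 && data[i + 3] == 1) {
            uint8_t nalType = data[i + 4] & 0x1F;    // low 5 bits of the NALU header
            printf("NAL unit at offset %zu, type %u\n", i, nalType);
            i += 4;                                  // skip past the start code
        }
    }
}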

After understanding some basic concepts of H264 code stream above, we can better understand the principle of H264 coding and decoding, as well as the realization principle of image rendering and video player.

Decoding an H264 stream involves a great many frames of data, and there are three important concepts: the I frame, the P frame, and the B frame.

  • I frame: key frame, compressed with intra-frame compression.

For example, if a camera is pointed at you, very little about the scene actually changes within a second. A camera typically captures dozens of frames per second: animation is usually 25 frames per second, most video files are around 30, and for demanding scenarios that need to capture fast, precise motion, high-end cameras capture 60 frames per second. Within such a group of frames the changes are very small, so to compress the data the first frame is saved completely as a key frame; the frames behind it cannot be decoded without it. That is why I frames are particularly critical.

  • P frame: forward reference frame. Compression refers only to the previous frame. It’s interframe compression.

The first frame of the video is saved as a key frame, and the following frames reference the frames before them: frame 2 depends on frame 1, and each subsequent frame stores only its difference from the previous frame. This greatly reduces the data and achieves a high compression rate.

  • B frame: bidirectional reference frame, which refers to the previous frame as well as the next frame during compression. Interframe compression technique.
  1. A B frame references both the previous frame and the following frame, so it compresses better and stores less data; the more B frames there are, the higher the compression rate. That is the advantage of B frames. Their biggest drawback is that, for real-time interactive live streaming, a B frame can only be decoded after the following frame arrives, so the decoder has to wait for that frame to come over the network. This depends on network conditions: if the network is good, decoding is faster; if not, decoding is slower, and lost packets must be retransmitted. For interactive live streaming, B frames are therefore generally not used.

When we play video in real time, each request to the server asks for a range of video frames, and what the server actually returns is a group of H264 frames, also called a GOF (Group of Frames) or GOP: the data from one I frame up to the next I frame, including the B and P frames in between. As shown below:

  • In the H264 bit stream, we use SPS/PPS to store GOP parameters.

  • SPS: Sequence Parameter Set. It stores the frame count, the number of reference frames, the decoded image size, the frame/field coding mode selection flag, and so on.

  • PPS: Picture Parameter Set. It stores the entropy coding mode selection flag, the number of slice groups, the initial quantization parameter, the deblocking filter coefficient adjustment flag, and so on.

Before a group of frames, the first thing we should receive is the SPS/PPS data; without these parameters we cannot decode. One problem we ran into earlier with WebRTC video was that the iOS side sometimes showed a black screen during picture transmission. The cause was that the I frame lacked the SPS/PPS information, so decoding failed and the screen stayed black.

  • If decoding fails, first check whether the SPS/PPS are present. If not, it is because the peer did not send them or they were lost in transit. SPS/PPS data is treated like an I frame: these two parameter sets must not be lost. A quick sanity check along these lines is sketched below.
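As a rough illustration of that check (building on the NAL-type scan above, with a hypothetical function name and again assuming an Annex-B buffer):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

// Rough sketch: returns true only if SPS (type 7) and PPS (type 8) both appear
// before the first IDR slice (type 5). Without them the decoder cannot be
// configured and the picture stays black.
static bool hasParameterSetsBeforeIDR(const uint8_t *data, size_t length) {
    bool sawSPS = false, sawPPS = false;
    for (size_t i = 0; i + 4 < length; i++) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 0 && data[i + 3] == 1) {
            uint8_t nalType = data[i + 4] & 0x1F;
            if (nalType == 7) sawSPS = true;
            if (nalType == 8) sawPPS = true;
            if (nalType == 5) return sawSPS && sawPPS;   // first key frame reached
            i += 4;
        }
    }
    return false;    // no IDR frame found at all
}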

  • Causes of picture corruption (artifacts) and stuttering: when watching video we sometimes see a corrupted picture or stuttering, and both are related to the GOP just described.

  1. If a P frame in the GOP is lost, the decoded image will be wrong. If the incorrectly decoded picture is displayed anyway, we see the corrupted-screen (artifact) effect.
  2. To avoid corruption, once a P frame or I frame is found to be lost, none of the remaining frames in that GOP are displayed; the image is only refreshed at the next I frame.
  3. Because the screen is not refreshed while the frames affected by the packet loss are thrown away, the image freezes in place. That is where the stuttering comes from.
  4. To sum up: corruption comes from losing P frames or I frames and still displaying the bad output; stuttering comes from discarding the whole damaged GOP to avoid corruption and waiting for the next correct GOP to refresh, and that time gap is what we perceive as a stall.
  • Soft coding and hard coding
  • Hard coding: encoding with dedicated hardware other than the CPU, such as the GPU or a hardware codec chip.
  1. High performance; at low bit rates the quality is usually lower than that of a software encoder, but some products have ported excellent software encoding algorithms (such as x264) to GPU hardware platforms, achieving essentially the same quality as soft coding.
  2. Hard coding uses the GPU (or other dedicated hardware) to compute the result, so it is fast and efficient.
  3. On iOS, hardware video encoding uses the VideoToolBox framework and hardware audio encoding uses the AudioToolBox framework (a minimal VideoToolBox session-creation sketch follows this list).
  • Soft coding: Use CPU to perform coding calculations.
  1. It is direct and simple, parameters are easy to adjust, and upgrades are easy, but it loads the CPU and its performance is lower than hard coding; at low bit rates the quality is usually better than hard coding.
  2. Soft coding computes the result on the CPU.
  3. On iOS, video soft coding generally uses FFmpeg or the x264 algorithm to encode YUV/RGB source data into H264, and audio uses fdk_aac to convert PCM audio data into AAC.
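For reference, creating a hardware H264 encoder with VideoToolBox looks roughly like this. This is a minimal sketch: the output callback body is omitted, and the width/height parameters are placeholders to be filled in by the caller.

#import <VideoToolbox/VideoToolbox.h>

// Placeholder VTCompressionOutputCallback that would receive the encoded
// CMSampleBuffers (i.e. the H264 stream data); NAL extraction is omitted.
static void encodedFrameCallback(void *refCon, void *frameRefCon, OSStatus status,
                                 VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    // Extract SPS/PPS and NAL units from sampleBuffer here (omitted).
}

static VTCompressionSessionRef makeHardwareEncoder(int32_t width, int32_t height) {
    VTCompressionSessionRef session = NULL;
    OSStatus status = VTCompressionSessionCreate(kCFAllocatorDefault,
                                                 width, height,
                                                 kCMVideoCodecType_H264,
                                                 NULL, NULL, NULL,
                                                 encodedFrameCallback, NULL,
                                                 &session);
    if (status != noErr) return NULL;

    // Real-time encoding with no frame reordering (no B frames), which is
    // friendlier for live streaming, as discussed above.
    VTSessionSetProperty(session, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTSessionSetProperty(session, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);
    return session;
}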

If you want to explore the underlying principles of players further, you can refer to these two open source players, both built on top of the FFmpeg framework: ijkplayer and kxmovie.

  • ijkplayer: a video player based on FFmpeg produced by Bilibili, with 25.7K stars on GitHub. It is very powerful and worth further study.
  • kxmovie: also has 2.7K stars on GitHub, a good indication of its quality, and likewise worth learning from.

6.3 MP4 format

MP4 (MPEG-4 Part 14) is a common multimedia container format. It is defined in the ISO/IEC 14496-14 standard and belongs to MPEG-4. It is an implementation of the media format defined in ISO/IEC 14496-12 (MPEG-4 Part 12, ISO Base Media File Format), which defines a general standard for the structure of media files. MP4 is a fairly comprehensive container format: it is considered able to embed almost any form of data, including video and audio in various encodings, but most of the MP4 files we commonly see store AVC (H.264) or MPEG-4 (Part 2) video and AAC audio. The official file extension for MP4 is ".mp4", and there are other extended or reduced variants of the format, including M4V, 3GP, F4V, and so on.

First, let’s take a look at the software’s parsing of MP4 files as shown below:

[Figure: an MP4 file parsed by an analysis tool into its top-level boxes: ftyp, moov, free, mdat]
The basic structure of a box is shown in Figure 6.3.3 below: size indicates the size occupied by the whole box, including the header, and type indicates the type of box. If the box is large (for example, the mdat box that stores the actual video data) and its size exceeds the maximum value of a uint32, size is set to 1 and the following 8 bytes (a uint64) are used to store the real size.
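A minimal sketch of walking the top-level boxes using only the header fields described above (the function name is hypothetical):

#import <Foundation/Foundation.h>

// Minimal sketch: walk the top-level boxes of an MP4 file and print size + type.
// Handles the size == 1 case, where the real size follows the type as a 64-bit value.
static void listTopLevelBoxes(NSString *path) {
    NSData *file = [NSData dataWithContentsOfFile:path];
    const uint8_t *bytes = file.bytes;
    uint64_t offset = 0;

    while (offset + 8 <= file.length) {
        uint64_t size = ((uint64_t)bytes[offset] << 24) | (bytes[offset + 1] << 16) |
                        (bytes[offset + 2] << 8)  |  bytes[offset + 3];
        char type[5] = {0};
        memcpy(type, bytes + offset + 4, 4);                 // e.g. "ftyp", "moov", "mdat"

        uint64_t headerSize = 8;
        if (size == 1 && offset + 16 <= file.length) {       // 64-bit largesize follows the type
            size = 0;
            for (int k = 0; k < 8; k++) size = (size << 8) | bytes[offset + 8 + k];
            headerSize = 16;
        }
        NSLog(@"box '%s', size %llu at offset %llu", type, size, offset);

        if (size < headerSize) break;                        // malformed box, stop
        offset += size;
    }
}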

An MP4 file may contain a great many boxes, which considerably increases the complexity of parsing. The site http://mp4ra.org/atoms.html records the currently registered box types. Seeing so many boxes, trying to support and parse every one of them would be overwhelming; fortunately, most MP4 files do not contain that many box types. A simplified structure of a common MP4 file is shown in Figure 6.3.4 below.

[Figure 6.3.4: simplified structure of a common MP4 file, down to the stbl box]

6.4 Moving the moov box of an MP4 file to the front with native iOS code

As explained above, the qt-faststart tool that ships with FFmpeg can move the moov box of an MP4 file to the front so that the file can be played while it is still downloading. Below is a way to move the moov box of an MP4 file from the back of the file to the front using native iOS code.

However, this approach is generally not used on the client: partly for efficiency reasons, and partly because in a typical play-while-downloading implementation this kind of work is done on the server side.

The specific code is as follows:

// Patch the chunk offsets inside a moov box so that it can be moved to the front
// of the file: every stco (32-bit) / co64 (64-bit) chunk offset is shifted forward
// by the size of the moov box itself. toType: / toSize: are helpers defined
// elsewhere in the original source that convert 4 big-endian bytes into a type
// string / integer value.
- (NSData *)exchangestco:(NSMutableData *)moovdata {
    int i, atom_size, offset_count;
    long long current_offset;
    NSString *atom_type;
    long long moov_atom_size = moovdata.length;

    // 4-byte scratch buffer (plus a trailing NUL for string conversion)
    Byte *buffer = (Byte *)malloc(5);
    buffer[4] = 0;

    // Working copy of the whole moov box; offsets are patched in place here
    Byte *buffer01 = (Byte *)malloc(moov_atom_size);
    [moovdata getBytes:buffer01 length:moov_atom_size];

    for (i = 4; i < moov_atom_size - 4; i++) {
        NSRange range;
        range.location = i;
        range.length = 4;
        [moovdata getBytes:buffer range:range];
        atom_type = [self toType:buffer];   // 4-byte box type, e.g. "stco" or "co64"

        if ([atom_type isEqualToString:@"stco"]) {
            // 32-bit chunk offset table: the box size sits just before the type
            range.location = i - 4;
            range.length = 4;
            [moovdata getBytes:buffer range:range];
            atom_size = [self toSize:buffer];
            if (i + atom_size - 4 > moov_atom_size) {
                WBLog(LOG_ERROR, @"error i + atom_size - 4 > moov_atom_size");
                free(buffer);
                free(buffer01);
                return nil;
            }

            // the entry count follows the 4-byte version/flags field
            range.location = i + 8;
            range.length = 4;
            [moovdata getBytes:buffer range:range];
            offset_count = [self toSize:buffer];

            // every chunk offset moves forward by the size of the moov box
            for (int j = 0; j < offset_count; j++) {
                range.location = i + 12 + j * 4;
                range.length = 4;
                [moovdata getBytes:buffer range:range];
                current_offset = [self toSize:buffer];
                current_offset += moov_atom_size;

                buffer01[i + 12 + j * 4 + 0] = (Byte)((current_offset >> 24) & 0xFF);
                buffer01[i + 12 + j * 4 + 1] = (Byte)((current_offset >> 16) & 0xFF);
                buffer01[i + 12 + j * 4 + 2] = (Byte)((current_offset >> 8) & 0xFF);
                buffer01[i + 12 + j * 4 + 3] = (Byte)((current_offset >> 0) & 0xFF);
            }
            i += atom_size - 4;
        } else if ([atom_type isEqualToString:@"co64"]) {
            // 64-bit chunk offset table: 8-byte entries. As in the original
            // implementation, only the first 4 bytes of each entry are read.
            range.location = i - 4;
            range.length = 4;
            [moovdata getBytes:buffer range:range];
            atom_size = [self toSize:buffer];
            if (i + atom_size - 4 > moov_atom_size) {
                WBLog(LOG_ERROR, @"error i + atom_size - 4 > moov_atom_size");
                free(buffer);
                free(buffer01);
                return nil;
            }

            range.location = i + 8;
            range.length = 4;
            [moovdata getBytes:buffer range:range];
            offset_count = [self toSize:buffer];

            for (int j = 0; j < offset_count; j++) {
                range.location = i + 12 + j * 8;
                range.length = 4;
                [moovdata getBytes:buffer range:range];
                current_offset = [self toSize:buffer];
                current_offset += moov_atom_size;

                buffer01[i + 12 + j * 8 + 0] = (Byte)((current_offset >> 56) & 0xFF);
                buffer01[i + 12 + j * 8 + 1] = (Byte)((current_offset >> 48) & 0xFF);
                buffer01[i + 12 + j * 8 + 2] = (Byte)((current_offset >> 40) & 0xFF);
                buffer01[i + 12 + j * 8 + 3] = (Byte)((current_offset >> 32) & 0xFF);
                buffer01[i + 12 + j * 8 + 4] = (Byte)((current_offset >> 24) & 0xFF);
                buffer01[i + 12 + j * 8 + 5] = (Byte)((current_offset >> 16) & 0xFF);
                buffer01[i + 12 + j * 8 + 6] = (Byte)((current_offset >> 8) & 0xFF);
                buffer01[i + 12 + j * 8 + 7] = (Byte)((current_offset >> 0) & 0xFF);
            }
            i += atom_size - 4;
        }
    }

    NSData *moov = [NSData dataWithBytes:buffer01 length:moov_atom_size];
    free(buffer);
    free(buffer01);
    return moov;
}

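For completeness, here is a hypothetical sketch of how the patched moov data might be used to rebuild the file. It assumes the moov box sits at the end of the original file, and that srcPath, dstPath, ftypRange and moovRange were determined beforehand by a prior pass over the box headers (for example with the box-walking sketch in section 6.3):

// Hypothetical usage sketch: rebuild the MP4 as ftyp + patched moov + the rest.
// ftypRange and moovRange are assumed to have been located beforehand, and the
// moov box is assumed to be the last box in the source file.
NSData *file = [NSData dataWithContentsOfFile:srcPath];
NSData *ftyp = [file subdataWithRange:ftypRange];
NSMutableData *moov = [[file subdataWithRange:moovRange] mutableCopy];
NSData *patchedMoov = [self exchangestco:moov];      // chunk offsets shifted by moov's size

NSMutableData *output = [NSMutableData dataWithData:ftyp];
[output appendData:patchedMoov];                     // moov now sits right after ftyp
NSRange middle = NSMakeRange(NSMaxRange(ftypRange),
                             moovRange.location - NSMaxRange(ftypRange));
[output appendData:[file subdataWithRange:middle]];  // free/mdat and the rest, unchanged
[output writeToFile:dstPath atomically:YES];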
