In Android development we often ship raw resource files in the assets directory and read them on demand. The open(filename)/openFd(filename) methods on AssetManager make it easy to obtain an InputStream or FileDescriptor. Reading asset files with FFmpeg, however, is harder. FFmpeg selects its IO protocol from the URL it is given, so to make FFmpeg read an asset file correctly you must construct a URL with an appropriate protocol and pass it to avformat_open_input(...). In practice, this turns out to be problematic.

AssetFileDescriptor

openFd(filename) returns an AssetFileDescriptor. Can FFmpeg open a media file through a file descriptor (fd)? Yes. But asset files have a catch: the AssetFileDescriptor returned for an asset carries an mStartOffset. In other words, the actual valid data begins at mStartOffset, not at the start of the file.

The same question has been asked on Stack Overflow:

How to properly pass an asset FileDescriptor to FFmpeg using JNI in Android

Implementation

  1. After obtaining the raw fd in the native layer, assemble a URL using the pipe or file protocol:

    int fd = jniGetFDFromFileDescriptor(env, fileDescriptor);
    char path[32];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
  2. Before calling avformat_open_input(...), manually create the AVFormatContext and set its skip_initial_bytes field:

    fmtCtx = avformat_alloc_context();
    fmtCtx->skip_initial_bytes = offset;
    fmtCtx->iformat = av_find_input_format("mp3");
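As a quick sanity check of the /proc/self/fd trick in step 1, here is a standalone sketch (not the project's actual JNI code; the helper names are mine): a file reopened through its /proc/self/fd path yields the same bytes as the original fd.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// Given an already-open fd, build the /proc/self/fd/<fd> path that
// FFmpeg's file protocol can open like any other URL.
static void fd_to_proc_path(int fd, char *path, size_t len) {
    snprintf(path, len, "/proc/self/fd/%d", fd);
}

// Self-check (no FFmpeg involved): write a file, reopen it through
// /proc/self/fd, and confirm the same bytes come back.
int proc_fd_roundtrip(void) {
    char name[] = "/tmp/afd_demo_XXXXXX";
    int fd = mkstemp(name);
    if (fd < 0) return -1;
    write(fd, "ftypisom", 8);
    lseek(fd, 0, SEEK_SET);

    char path[32];
    fd_to_proc_path(fd, path, sizeof(path));
    int fd2 = open(path, O_RDONLY);   // reopens the same underlying file
    char buf[9] = {0};
    ssize_t n = read(fd2, buf, 8);

    close(fd2);
    close(fd);
    unlink(name);
    return (n == 8 && strcmp(buf, "ftypisom") == 0) ? 0 : -1;
}
```

This only works on Linux-like systems (including Android), since it relies on the procfs fd symlinks.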

Note: avformat_open_input is used to open a media file and read its header information. A more detailed analysis of this function can be found in Lei Xiaohua's FFmpeg source-code walkthrough of avformat_open_input(). In fact, iformat does not have to be specified; the advantage of specifying it is that FFmpeg skips probing the media file's format.

The problem

Running the implementation above on an mp4 file in assets fails to decode. FFmpeg outputs the following:

// Only key logs are posted here
[FFMPEG]--:type:'ftyp' parent:'root' sz: 32 8 15677739
[FFMPEG]--:ISO: File Type Major Brand: isom
[FFMPEG]--:type:'free' parent:'root' sz: 8 40 15677739
[FFMPEG]--:type:'mdat' parent:'root' sz: 7606147 48 15677739
[FFMPEG]--:type:'moov' parent:'root' sz: 7384 7606195 15677739
// Mapping between sample and chunk
[FFMPEG]--:AVIndex stream 0, sample 0, offset 30, dts 0, size 83337, distance 0, keyframe 1
[FFMPEG]--:AVIndex stream 0, sample 1, offset 1494b, dts 512, size 118, distance 1, keyframe 
....
[FFMPEG]--:nal_unit_type: 7(SPS), nal_ref_idc: 3
[FFMPEG]--:nal_unit_type: 8(PPS), nal_ref_idc: 3
[FFMPEG]--:stream 0, sample 0, dts 0
[FFMPEG]--:stream 1, sample 0, dts 0
// NAL is used to store video stream (H264) data
[FFMPEG]--:Invalid NAL unit size (1920229220 > 83333).
[FFMPEG]--:Error splitting the input into NAL units.
....

From FFmpeg's logs we can see that it parses the mp4 header without error: the key atoms ftyp, mdat and moov are identified, and the sample-to-chunk mapping (stsc) is parsed correctly as well. In the decoding phase, however, an obvious Invalid NAL unit size error is reported.

Analysis

Does Android specially process media files in assets?

The first suspicion is that files in the assets directory are specially processed by Android, which would explain why FFmpeg's fd-based reads decode incorrectly. Judging from the official description of AssetFileDescriptor, however, Android does not seem to do anything special to asset files:

File descriptor of an entry in the AssetManager. This provides your own opened FileDescriptor that can be used to read the data, as well as the offset and length of that entry’s data in the file.

However, in Android audio and video development we often pass media data to MediaCodec through MediaExtractor's setDataSource(fd) method, and MediaExtractor reads asset files just fine. Does the file descriptor from an AssetFileDescriptor need special handling?

How does MediaCodec handle setDataSource(fd)?

Searching the source code, the native implementation behind MediaExtractor#setDataSource is NuMediaExtractor::setDataSource, which ultimately wraps the fd in a FileSource object.

NuMediaExtractor.cpp

status_t NuMediaExtractor::setDataSource(int fd, off64_t offset, off64_t size) {
    ...
    if (mImpl != NULL) {
        return -EINVAL;
    }
    // Create a FileSource, preserving the offset
    sp<FileSource> fileSource = new FileSource(dup(fd), offset, size);
    ...
    return OK;
}

FileSource reads by calling the ordinary Linux kernel read function, doing nothing exotic with the fd; note, however, that every read first seeks to offset + mOffset:

FileSource.cpp

ssize_t FileSource::readAt_l(off64_t offset, void *data, size_t size) {
    // Seek to the requested offset, shifted by the stored mOffset
    off64_t result = lseek64(mFd, offset + mOffset, SEEK_SET);
    if (result == -1) {
        ALOGE("seek to %lld failed", (long long)(offset + mOffset));
        return UNKNOWN_ERROR;
    }
    // Read size bytes from fd into data
    return ::read(mFd, data, size);
}
So how does FFmpeg read from a file descriptor? Does it differ from MediaCodec?

libavformat/file.c

// 4.3.1 source code, line 109
static int file_read(URLContext *h, unsigned char *buf, int size)
{
    FileContext *c = h->priv_data;
    int ret;
    size = FFMIN(size, c->blocksize);
    // The same read function is called
    ret = read(c->fd, buf, size);
    if (ret == 0 && c->follow)
        return AVERROR(EAGAIN);
    if (ret == 0)
        return AVERROR_EOF;
    return (ret == -1) ? AVERROR(errno) : ret;
}

Unlike FileSource::readAt_l, file_read in libavformat/file.c contains no seek logic of its own. FFmpeg encapsulates every IO protocol as a URLProtocol structure; a brief look at the file protocol's definition:

const URLProtocol ff_file_protocol = {
    .name              = "file",
    .url_open          = file_open,
    .url_read          = file_read,   // read function pointer
    .url_write         = file_write,
    .url_seek          = file_seek,   // seek function pointer
    .url_close         = file_close,
    ...
    .default_whitelist = "file,crypto,data"
};

When reading an AssetFileDescriptor, both FFmpeg and MediaCodec ultimately call the plain read function, so from this alone we cannot tell whether FFmpeg's internal seek logic is at fault. The suspicion, then, is that FFmpeg does not properly handle the AssetFileDescriptor's startOffset.

Testing whether skip_initial_bytes in AVFormatContext is the problem
  • First test the normal case: an fd with offset 0. On Android we can use ParcelFileDescriptor to convert a local path into an fd:

Android application layer:

val fd = ParcelFileDescriptor.open(File(videoPath), ParcelFileDescriptor.MODE_READ_ONLY)

Native layer:

fmtCtx = avformat_alloc_context();
fmtCtx->skip_initial_bytes = 0;
fmtCtx->iformat = av_find_input_format("mp4");

Result: Normal decoding

  • To test a nonzero offset, open the video file with a hex editor and manually add a few bytes at the head of the file to simulate the offset.

The Android application-layer call is the same as above.

The Native layer needs to set the skip_initial_bytes variable:

fmtCtx = avformat_alloc_context();
fmtCtx->skip_initial_bytes = 3;  // The number of bytes manually added to the file header
fmtCtx->iformat = av_find_input_format("mp4");

Result: The file cannot be decoded properly, although basic information about the media file can still be obtained. The logs match those in the problem section above.

(If you cannot reproduce this yourself, you can verify the result with my test branch; the cloned code runs directly.)
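If you prefer to script the padding step of the second experiment instead of using a hex editor, here is a small sketch (file names are hypothetical, the junk byte value is arbitrary) that prepends bytes to a copy of a file:

```c
#include <stdio.h>
#include <string.h>

// Prepend `pad` junk bytes to src, writing the result to dst.  The
// padded file then requires skip_initial_bytes = pad to decode.
int prepend_bytes(const char *src, const char *dst, int pad) {
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    if (!in || !out) { if (in) fclose(in); if (out) fclose(out); return -1; }
    for (int i = 0; i < pad; i++)
        fputc(0xAB, out);          // arbitrary junk byte
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

// Self-check: pad a 4-byte file with 3 bytes and verify the layout.
int prepend_demo(void) {
    FILE *f = fopen("/tmp/pp_src.bin", "wb");
    if (!f) return -1;
    fputs("ftyp", f);
    fclose(f);
    if (prepend_bytes("/tmp/pp_src.bin", "/tmp/pp_dst.bin", 3) != 0) return -1;
    f = fopen("/tmp/pp_dst.bin", "rb");
    char buf[8] = {0};
    size_t n = fread(buf, 1, 7, f);
    fclose(f);
    return (n == 7 && memcmp(buf + 3, "ftyp", 4) == 0) ? 0 : -1;
}
```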

From the experiments above, it is almost certain that FFmpeg cannot correctly read and decode an mp4 (MOV) media file when AVFormatContext's skip_initial_bytes is set.

Why

Looking at the avformat_open_input function, you can find the following code related to skip_initial_bytes:

...
if ((ret = init_input(s, filename, &tmp)) < 0)
    goto fail;
...
avio_skip(s->pb, s->skip_initial_bytes);
...
if (!(s->flags & AVFMT_FLAG_PRIV_OPT) && s->iformat->read_header)
    if ((ret = s->iformat->read_header(s)) < 0)
        goto fail;
...

After the important init_input call, which selects the IO protocol and probes the file format (when iformat has not been set), avformat_open_input calls avio_skip(s->pb, s->skip_initial_bytes); this simply seeks to the specified offset, with no further handling of the offset anywhere else.

Then the read_header function pointer of the AVInputFormat is called; it points to the header-parsing function of the matched file format, which for MOV is mov_read_header. The mov_read_header function is very important: internally it calls mov_read_default(), which reads data in 8-byte cycles and matches atoms against the mov_default_parse_table. When a trak atom is found (a media stream of the file, such as the video stream or audio stream), mov_build_index() is called to further parse the stsc (the sample-to-chunk mapping mentioned above). When av_read_frame() is called later in the decoding phase, mov.c relies on this index to locate and read the specified chunk.

// Version 4.3.1, mov.c
static void mov_build_index(MOVContext *mov, AVStream *st) {
    ...
    // Around line 3935
    AVIndexEntry *e;
    ...
    e->pos = current_offset;  // Directly stores the chunk position derived from parsing stsc
    e->timestamp = current_dts;
    e->size = sample_size;
    e->min_distance = distance;
    e->flags = keyframe ? AVINDEX_KEYFRAME : 0;
    // This produces the "AVIndex stream ..." log analyzed above, remember?
    av_log(mov->fc, AV_LOG_TRACE, "AVIndex stream %d, sample %u, offset %"PRIx64", dts %"PRId64", "
           "size %u, distance %u, keyframe %d\n", st->index, current_sample,
           current_offset, current_dts, sample_size, distance, keyframe);
    ...
}

// av_read_frame eventually calls the read_packet function of the matched format
static int mov_read_packet(AVFormatContext *s, AVPacket *pkt)
{
    ...
    // Around line 7887
    if (st->discard != AVDISCARD_ALL) {
        // Seeks to sample->pos with SEEK_SET
        int64_t ret64 = avio_seek(sc->pb, sample->pos, SEEK_SET);
        ...
    }
    ...
}
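The 8-byte atom walk that mov_read_default performs can be illustrated with a toy parser (a simplified sketch of the box layout only: it ignores 64-bit sizes, nesting, and everything FFmpeg actually does beyond the header format):

```c
#include <stdint.h>
#include <string.h>

// Each MOV/mp4 atom starts with an 8-byte header: a 32-bit big-endian
// size (which includes the header itself) followed by a 4-character
// type code.  Returns the offset of the named top-level atom within
// buf, or -1 if absent.
long find_atom(const uint8_t *buf, long len, const char *type) {
    long pos = 0;
    while (pos + 8 <= len) {
        uint32_t size = (uint32_t)buf[pos] << 24 | (uint32_t)buf[pos + 1] << 16 |
                        (uint32_t)buf[pos + 2] << 8 | buf[pos + 3];
        if (memcmp(buf + pos + 4, type, 4) == 0)
            return pos;
        if (size < 8) break;          // malformed size; avoid an infinite loop
        pos += size;                  // skip to the next sibling atom
    }
    return -1;
}

// Self-check on a synthetic two-atom buffer.
int atom_demo(void) {
    const uint8_t buf[] = {
        0, 0, 0, 8, 'f', 'r', 'e', 'e',   // free atom, size 8
        0, 0, 0, 8, 'm', 'o', 'o', 'v',   // moov atom, size 8
    };
    return (find_atom(buf, sizeof(buf), "moov") == 8 &&
            find_atom(buf, sizeof(buf), "mdat") == -1) ? 0 : -1;
}
```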

So avformat_open_input does seek past skip_initial_bytes before read_header. However, the chunk offsets that read_header parses out do not have skip_initial_bytes added to them, and in the subsequent read_packet calls mov.c seeks (with SEEK_SET) to exactly those parsed positions. As a result the data in the AVPacket returned by av_read_frame is wrong, and the packets fed to the decoder cannot be decoded.
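The mismatch boils down to a few lines (an illustrative sketch, not FFmpeg code): the positions in the index are relative to the start of the embedded media data, so a read through the outer fd only lands correctly after adding skip_initial_bytes.

```c
// The index position parsed from stsc/stco is relative to the start of
// the embedded media data; reads go through the outer fd, so the
// asset's start offset must be added to land on the real sample.
long fix_index_pos(long parsed_pos, long skip_initial_bytes) {
    return parsed_pos + skip_initial_bytes;
}

// Self-check with an in-memory "fd view": 3 bytes of preceding asset
// data, then the media payload whose sample sits at parsed offset 2.
int index_pos_demo(void) {
    const char *fd_view = "xyzABCDEF";   // "xyz" = data before mStartOffset
    long skip = 3, parsed = 2;
    // Seeking to the raw parsed offset reads the wrong byte...
    if (fd_view[parsed] != 'z') return -1;
    // ...while adding skip_initial_bytes lands on the real sample byte.
    return fd_view[fix_index_pos(parsed, skip)] == 'C' ? 0 : -1;
}
```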

Solution

The fix is relatively simple. Since mov.c only receives an AVIOContext while parsing atoms, I added a matching skip_initial_bytes field to AVIOContext; when mov_build_index assigns pos to the AVIndexEntry (the structure holding the sample mapping), the corresponding skip_initial_bytes is added in. I have pushed this fix to GitHub with a detailed description of the modified files, and I also wrote a new demo based on the WhatTheCodec project; in my recent tests no other problems surfaced. If you have any other questions, you are welcome to discuss and we can learn from each other.

GitHub: github.com/YiChaoLove/…

Demo: github.com/YiChaoLove/…

The pipe protocol

Using the pipe protocol was mentioned in the Stack Overflow answer above, and it does work, but I ran into a few issues while implementing it that I will describe here directly, given space constraints.

Using the pipe protocol requires attention to the buffer size

The pipe is commonly used for interprocess communication on Linux, but its buffer is limited in size. Understanding the Linux Kernel mentions that the pipe buffer is 16 pages of 4KB each, i.e. 64KB under Linux's rules. Android is a Linux-based operating system, so when passing data through a pipe on Android you must likewise follow the pipe's calling conventions. We can create a separate thread in the application layer to write data into one end of the pipe, with FFmpeg as the reading end.

For generality, I create the pipe manually in the native layer: one end's fd is handed to FFmpeg for reading, while the other end's fd is held by the application layer and written to on an IO thread. This lets us feed data through the pipe flexibly; you can even pass in-memory video data directly to FFmpeg. For details, see the from(inputStream: InputStream, shortFormatName: String) method in MediaFileBuilder.kt.
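A minimal sketch of that setup in plain POSIX (outside of any JNI glue; in the real project the read end would be handed to FFmpeg as a pipe URL): a dedicated writer thread feeds the write end, which matters because write() blocks once the roughly 64KB pipe buffer fills up.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

// The writer thread plays the role of the application-layer IO thread.
static void *writer_thread(void *arg) {
    int wfd = *(int *)arg;
    const char data[] = "mdat-payload";   // stand-in for real media bytes
    write(wfd, data, sizeof(data));
    close(wfd);                           // close => EOF for the reader
    return NULL;
}

int pipe_demo(void) {
    int fds[2];                           // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return -1;

    pthread_t t;
    pthread_create(&t, NULL, writer_thread, &fds[1]);

    char buf[32] = {0};
    ssize_t n = read(fds[0], buf, sizeof(buf));   // FFmpeg would read here
    pthread_join(t, NULL);
    close(fds[0]);
    return (n > 0 && strcmp(buf, "mdat-payload") == 0) ? 0 : -1;
}
```

Keeping the writer on its own thread is what prevents a deadlock: if the same thread both wrote and then read, a payload larger than the pipe buffer would block the write forever.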

The pipe protocol fails for some video (MOV-format) files?

When using the pipe, you may encounter videos that cannot be read or decoded correctly. Here is a typical error log:

....
[FFMPEG]--:Before avformat_find_stream_info() pos: 7613571 bytes read:7613571 seeks:0 nb_streams:2
....
[FFMPEG]--:format: start_time: 0 duration: 7.234 (estimate from stream) bitrate=0 kb/s
[FFMPEG]--:Could not find codec parameters for stream 0 (Video: h264 (avc1 / 0x31637661), none, 720x960, 8285 kb/s): unspecified pixel format
[FFMPEG]--:Could not find codec parameters for stream 1 (Audio: aac (mp4a / 0x6134706D), 44100 Hz, 2 channels, 128 kb/s): unspecified sample format
[FFMPEG]--:After avformat_find_stream_info() pos: 7613571 bytes read:7613571 seeks:0 frames:0
....

After analyzing and comparing these files, I found that videos whose moov atom comes after mdat cannot be read and decoded through the pipe protocol. After moving moov ahead of mdat with the faststart flag, they read and decode normally:

ffmpeg -i video.mp4 -c copy -movflags +faststart output.mp4

The faststart flag is mostly a streaming optimization: mdat is the atom that actually stores the media data, while moov stores the file's metadata. If moov sits at the end, the player has to search all the way past the data to find it before playback. Putting moov at the head of the file (right after ftyp) lets the player obtain the metadata quickly, whereas moov at the tail forces reading all the preceding data first. For FFmpeg, this is exactly the process described earlier:

The mov_read_header function is very important: internally it calls mov_read_default(), which reads data in 8-byte cycles and matches atoms against the mov_default_parse_table.

If a file's moov comes after mdat, then by the time avformat_open_input has read the file's metadata, FFmpeg has in fact consumed the data to the end. Furthermore, the pipe protocol does not support seeking, and data already drained from the buffer cannot be read again, so such files cannot be read and decoded correctly.

Conclusion

This article analyzed the problems encountered when feeding data to FFmpeg through an AssetFileDescriptor, which turn out to be caused by FFmpeg mishandling skip_initial_bytes while parsing MOV-format files. It then shared the problems and solutions I ran into when using the pipe protocol. Addressing these two issues makes FFmpeg much easier to use on Android. Although MediaCodec has gradually replaced FFmpeg in some Android scenarios, FFmpeg's versatility, stability and compatibility mean it is likely to stay relevant in Android audio and video development for a long time.

The views expressed in this article are personal and may be incorrect. If you have any questions about the content, feel free to open an issue. If this article helps your development or learning, a star on FFmpegForAndroidAssetFileDescriptor is welcome. For the MOV format mentioned in this article, see:

  • docs.fileformat.com/video/mov/
  • Developer.apple.com/library/arc…