In the previous articles we covered the AudioRecord, AudioTrack, Camera, MediaExtractor, MediaMuxer, and MediaCodec APIs. That is already enough to combine them into something slightly more complicated.

I. Process analysis

1.1 Requirements

The goal is to connect the whole audio and video recording pipeline: capture audio and video, encode them, and mux the result into an MP4 file.

1.2 Implementation Mode

Use MediaCodec to encode and compress the streams (video to H.264, audio to AAC), then use MediaMuxer to combine the audio and video into an MP4.

II. Implementation process

2.1 Collecting Camera data, encoding it to H.264, and storing it in a file

The example here uses the Camera API because it is relatively simple to use. The idea is much the same with Camera2 and CameraX; only the API differs, so the code needs just minor changes.

Before data collection, set some parameters for the Camera to facilitate data processing after collection:

val parameter = camera?.parameters
parameter?.previewFormat = ImageFormat.NV21
parameter?.setPreviewSize(1280, 720)
// getParameters() returns a copy, so the changes have to be written back
camera?.parameters = parameter
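To make the preview settings concrete: an NV21 frame holds one full-resolution Y plane followed by an interleaved V/U plane at a quarter of the resolution, so each frame takes width * height * 3 / 2 bytes. The sketch below only does this arithmetic; nv21FrameSize is a hypothetical helper name, not part of the Camera API.

```kotlin
/**
 * Size in bytes of one NV21 (YUV 4:2:0 semi-planar) frame.
 * Hypothetical helper for illustration; assumes even width and height.
 */
fun nv21FrameSize(width: Int, height: Int): Int {
    require(width > 0 && height > 0) { "dimensions must be positive" }
    val ySize = width * height        // one luma byte per pixel
    val vuSize = width * height / 2   // one V,U pair per 2x2 pixel block
    return ySize + vuSize
}
```

For the 1280 x 720 preview size used here, nv21FrameSize(1280, 720) gives 1382400 bytes per frame, which is why the queue between camera and encoder is kept short.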

Then set the PreviewCallback to retrieve the Camera’s raw NV21 data:

camera?.setPreviewCallback { bytes, camera ->
    // bytes holds one raw NV21 frame
}
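The callback can deliver frames faster than an encoder consumes them, so the frames are usually handed over through a small bounded queue. The following is a minimal sketch of a queue that drops the oldest frame instead of blocking the producer; FrameQueue is a hypothetical name and uses only java.util.concurrent, so it can be tried without a device.

```kotlin
import java.util.concurrent.ArrayBlockingQueue

/** Bounded frame queue that discards the oldest frame rather than blocking the producer. */
class FrameQueue(capacity: Int) {
    private val queue = ArrayBlockingQueue<ByteArray>(capacity)

    /** Intended for the preview callback; never blocks. */
    fun put(frame: ByteArray) {
        // offer() returns false when the queue is full: drop the oldest frame and retry.
        while (!queue.offer(frame)) {
            queue.poll()
        }
    }

    /** Intended for the encoder thread; returns null when no frame is waiting. */
    fun poll(): ByteArray? = queue.poll()

    val size: Int get() = queue.size
}
```

Dropping old frames keeps latency bounded at the cost of a lower effective frame rate when the encoder falls behind.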

Create an H264VideoEncoder class that does the encoding and writes the encoded data to a file:

import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.media.MediaMuxer
import android.os.Environment
import java.io.File
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.TimeUnit
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch

// ALog is a logging helper that is not shown in this article.
class H264VideoEncoder(private val width: Int, private val height: Int, private val frameRate: Int) {
    private val mediaCodec: MediaCodec
    private val mediaMuxer: MediaMuxer
    private val yuv420Queue = ArrayBlockingQueue<ByteArray>(10)
    private var videoTrack = -1

    @Volatile
    var isRunning = false

    init {
        val mediaFormat = MediaFormat.createVideoFormat("video/avc", width, height)
        mediaFormat.setInteger(
            MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar
        )
        mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, width * height * 5)
        mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)
        mediaFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
        mediaCodec = MediaCodec.createEncoderByType("video/avc")
        mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        mediaCodec.start()

        val path = Environment.getExternalStorageDirectory().absolutePath +
            File.separator + "temp_video.mp4"
        // The output container is MP4
        mediaMuxer = MediaMuxer(path, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
    }

    /** Start encoding */
    fun startEncoder() {
        GlobalScope.launch(Dispatchers.IO) {
            isRunning = true
            var generateIndex = 0L
            val bufferInfo = MediaCodec.BufferInfo()
            while (isRunning) {
                // Wait with a timeout so the loop notices stopEncoder() even when no frames arrive
                val input = yuv420Queue.poll(100, TimeUnit.MILLISECONDS) ?: continue
                val yuv420sp = ByteArray(width * height * 3 / 2)
                NV21ToNV12(input, yuv420sp, width, height)
                try {
                    // A timeout of 0 returns immediately; the index is negative when no input buffer is free
                    val inputBufferIndex = mediaCodec.dequeueInputBuffer(0)
                    if (inputBufferIndex >= 0) {
                        val inputBuffer = mediaCodec.getInputBuffer(inputBufferIndex)
                        if (inputBuffer != null) {
                            val pts = computePresentationTime(generateIndex)
                            // Put the frame into the input buffer and queue it
                            inputBuffer.clear()
                            inputBuffer.put(yuv420sp)
                            mediaCodec.queueInputBuffer(inputBufferIndex, 0, yuv420sp.size, pts, 0)
                            generateIndex += 1
                        }
                    }

                    var outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                    if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                        videoTrack = mediaMuxer.addTrack(mediaCodec.outputFormat)
                        ALog.e("xiao", "format change, videoTrack: $videoTrack")
                        if (videoTrack >= 0) {
                            mediaMuxer.start()
                            ALog.e("xiao", "Start mixing")
                        }
                    }
                    while (outputBufferIndex >= 0) {
                        if (videoTrack >= 0) {
                            mediaCodec.getOutputBuffer(outputBufferIndex)?.let {
                                mediaMuxer.writeSampleData(videoTrack, it, bufferInfo)
                            }
                        }
                        // Always return the buffer, otherwise the codec runs out of output buffers
                        mediaCodec.releaseOutputBuffer(outputBufferIndex, false)
                        outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                    }
                } catch (e: Exception) {
                    ALog.e("xiao", "error: ${e.message}")
                    e.printStackTrace()
                }
            }
            try {
                mediaMuxer.stop()
                mediaMuxer.release()
            } catch (e: Exception) {
                e.printStackTrace()
            }
            try {
                mediaCodec.stop()
                mediaCodec.release()
            } catch (e: Exception) {
                e.printStackTrace()
            }
        }
    }

    /** Stop encoding */
    fun stopEncoder() {
        isRunning = false
    }

    /** Generate a timestamp in microseconds from the frame index */
    private fun computePresentationTime(frameIndex: Long): Long {
        return 132 + frameIndex * 1000000 / frameRate
    }

    /** NV21 stores chroma as V,U pairs; NV12 expects U,V pairs, so each pair is swapped */
    private fun NV21ToNV12(nv21: ByteArray, nv12: ByteArray, width: Int, height: Int) {
        val frameSize = width * height
        System.arraycopy(nv21, 0, nv12, 0, frameSize)
        for (j in 0 until frameSize / 2 step 2) {
            nv12[frameSize + j] = nv21[frameSize + j + 1]
            nv12[frameSize + j + 1] = nv21[frameSize + j]
        }
    }

    fun putData(buffer: ByteArray) {
        // Drop the oldest frame when the queue is full so the camera thread never blocks
        while (!yuv420Queue.offer(buffer)) {
            yuv420Queue.poll()
        }
    }

    companion object {
        private const val TIMEOUT_USEC = 12000L
    }
}
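The NV21 to NV12 step is easy to get wrong by one index, so it helps to check it in isolation. The sketch below is a standalone version that returns a new array; nv21ToNv12 is a hypothetical name, and the only assumption is the usual layout: a Y plane of width * height bytes followed by interleaved chroma pairs (V,U in NV21, U,V in NV12).

```kotlin
/**
 * Converts one NV21 frame (Y plane + interleaved V,U) into NV12 (Y plane + interleaved U,V).
 * Standalone sketch for illustration.
 */
fun nv21ToNv12(nv21: ByteArray, width: Int, height: Int): ByteArray {
    val frameSize = width * height
    val totalSize = frameSize * 3 / 2
    require(nv21.size >= totalSize) { "input too small for ${width}x$height" }
    val nv12 = ByteArray(totalSize)
    // The Y plane is identical in both layouts.
    System.arraycopy(nv21, 0, nv12, 0, frameSize)
    // Swap every V,U pair into U,V order.
    var i = frameSize
    while (i + 1 < totalSize) {
        nv12[i] = nv21[i + 1]   // U
        nv12[i + 1] = nv21[i]   // V
        i += 2
    }
    return nv12
}
```

For a 2 x 2 frame with Y bytes 1, 2, 3, 4 followed by V = 5 and U = 6, the result keeps the Y bytes and ends with 6, 5.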

Next we record the audio, and then mux the audio and video together.

Audio recording

import android.media.AudioRecord
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.media.MediaMuxer
import android.media.MediaRecorder
import android.os.Environment
import java.io.File
import kotlinx.coroutines.Dispatchers.IO
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch

class H264AudioEncoder(
    private val sampleRateInHz: Int,
    private val channelConfig: Int,
    private val audioFormat: Int
) {
    private val recordBufSize: Int =
        AudioRecord.getMinBufferSize(sampleRateInHz, channelConfig, audioFormat)
    private var audioRecord = AudioRecord(
        MediaRecorder.AudioSource.MIC, sampleRateInHz, channelConfig, audioFormat, recordBufSize
    )
    private var mediaCodec: MediaCodec =
        MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
    private val path = Environment.getExternalStorageDirectory().absolutePath +
        File.separator + "temp_audio.mp4"
    private val mediaMuxer = MediaMuxer(path, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
    private var audioTrack = -1

    @Volatile
    private var isRunning = false
    private val bufferInfo = MediaCodec.BufferInfo()

    /** Start encoding */
    fun startEncoder() {
        isRunning = true
        GlobalScope.launch(IO) {
            val format =
                MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AAC, sampleRateInHz, 1)
            format.setInteger(
                MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC
            )
            format.setInteger(MediaFormat.KEY_BIT_RATE, 64 * 1000)
            format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1)
            format.setInteger(MediaFormat.KEY_SAMPLE_RATE, sampleRateInHz)
            mediaCodec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
            mediaCodec.start()
            audioRecord.startRecording()

            val buffer = ByteArray(recordBufSize)
            while (isRunning) {
                val readBytes = audioRecord.read(buffer, 0, recordBufSize)
                ALog.e("xiao", "Encode audio data: $readBytes")
                try {
                    encode(buffer, readBytes)
                } catch (e: java.lang.Exception) {
                    ALog.e("xiao", "Failed to encode audio data")
                    e.printStackTrace()
                }
            }
            try {
                audioRecord.stop()
                audioRecord.release()
            } catch (e: Exception) {
                ALog.e("xiao", e.message)
                e.printStackTrace()
            }
            try {
                mediaMuxer.stop()
                mediaMuxer.release()
            } catch (e: Exception) {
                ALog.e("xiao", e.message)
                e.printStackTrace()
            }
            try {
                mediaCodec.stop()
                mediaCodec.release()
            } catch (e: Exception) {
                ALog.e("xiao", e.message)
                e.printStackTrace()
            }
        }
    }

    private fun encode(byteArray: ByteArray, readBytes: Int) {
        // Waits up to TIMEOUT_USEC for a free input buffer; 0 would return immediately
        val inputBufferIndex = mediaCodec.dequeueInputBuffer(TIMEOUT_USEC)
        ALog.e("xiao", "inputBufferIndex: $inputBufferIndex")
        if (inputBufferIndex < 0) return
        val inputBuffer = mediaCodec.getInputBuffer(inputBufferIndex)
        if (inputBuffer != null) {
            if (readBytes <= 0) {
                ALog.e("xiao", "send BUFFER_FLAG_END_OF_STREAM")
                mediaCodec.queueInputBuffer(
                    inputBufferIndex, 0, 0, System.nanoTime() / 1000,
                    MediaCodec.BUFFER_FLAG_END_OF_STREAM
                )
            } else {
                // Copy only the bytes that were actually read
                inputBuffer.clear()
                inputBuffer.put(byteArray, 0, readBytes)
                mediaCodec.queueInputBuffer(
                    inputBufferIndex, 0, readBytes, System.nanoTime() / 1000, 0
                )
            }
        }

        // Fetch the encoded data
        var outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
        if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
            audioTrack = mediaMuxer.addTrack(mediaCodec.outputFormat)
            ALog.e("xiao", "format change, audioTrack: $audioTrack")
            if (audioTrack >= 0) {
                mediaMuxer.start()
            }
        }
        while (outputBufferIndex >= 0) {
            if (audioTrack >= 0) {
                val outBuffer = mediaCodec.getOutputBuffer(outputBufferIndex)
                // Codec config data is already part of the track format, so skip it
                if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != 0) {
                    bufferInfo.size = 0
                }
                if (bufferInfo.size != 0 && outBuffer != null) {
                    mediaMuxer.writeSampleData(audioTrack, outBuffer, bufferInfo)
                }
            }
            // Always return the buffer to the codec
            mediaCodec.releaseOutputBuffer(outputBufferIndex, false)
            outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
        }
    }

    /** Stop encoding */
    fun stopEncoder() {
        isRunning = false
    }

    companion object {
        private const val TIMEOUT_USEC = 12000L
    }
}
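The audio encoder stamps each chunk with System.nanoTime() / 1000. One possible alternative, shown here only as a sketch, derives the timestamp from the number of PCM bytes read so far, which keeps audio timestamps evenly spaced regardless of thread scheduling. pcmPresentationTimeUs is a hypothetical helper, and the default of 2 bytes per sample assumes 16-bit PCM.

```kotlin
/**
 * Presentation time in microseconds of the PCM chunk that starts after
 * [totalBytesRead] bytes have already been consumed.
 * Hypothetical helper; bytesPerSample = 2 assumes ENCODING_PCM_16BIT.
 */
fun pcmPresentationTimeUs(
    totalBytesRead: Long,
    sampleRateInHz: Int,
    channelCount: Int,
    bytesPerSample: Int = 2
): Long {
    val bytesPerFrame = channelCount * bytesPerSample   // one sample for every channel
    val framesRead = totalBytesRead / bytesPerFrame
    return framesRead * 1_000_000L / sampleRateInHz
}
```

With mono 16-bit audio at 44100 Hz, 88200 bytes correspond to exactly one second, so the chunk that follows them would be stamped 1000000 microseconds.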

Now we have two files, an MP4 with the audio and an MP4 with the video. All that remains is to mux them with MediaMuxer (see section 5):

import android.media.MediaCodec
import android.media.MediaExtractor
import android.media.MediaFormat
import android.media.MediaMuxer
import android.os.Environment
import java.io.File
import java.nio.ByteBuffer

class H264Muxer(
    private val videoPath: String = Environment.getExternalStorageDirectory().absolutePath +
        File.separator + "temp_video.mp4",
    private val audioPath: String = Environment.getExternalStorageDirectory().absolutePath +
        File.separator + "temp_audio.mp4"
) {
    private val outputPath = Environment.getExternalStorageDirectory().absolutePath +
        File.separator + "video_output.mp4"
    private val videoExtractor = MediaExtractor()
    private val audioExtractor = MediaExtractor()
    private var mediaMuxer: MediaMuxer? = null

    @Throws(RuntimeException::class)
    fun muxer() {
        mediaMuxer = MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
        val muxer = mediaMuxer ?: return

        // Audio information
        var audioTrackIndex = -1
        var audioMuxerTrackIndex = -1
        var audioMaxInputSize = 0
        audioExtractor.setDataSource(audioPath)
        val audioTrackCount = audioExtractor.trackCount
        for (i in 0 until audioTrackCount) {
            val format = audioExtractor.getTrackFormat(i)
            val mime = format.getString(MediaFormat.KEY_MIME) ?: continue
            if (mime.startsWith("audio/")) {
                ALog.e("xiao", "found the audio track")
                audioTrackIndex = i
                // Add the audio track to the MediaMuxer and remember the muxer's track index
                audioMuxerTrackIndex = muxer.addTrack(format)
                audioMaxInputSize = format.getInteger(MediaFormat.KEY_MAX_INPUT_SIZE)
                break
            }
        }

        // Video information
        var videoTrackIndex = -1
        var videoMuxerTrackIndex = -1
        var videoMaxInputSize = 0
        videoExtractor.setDataSource(videoPath)
        val videoTrackCount = videoExtractor.trackCount
        for (i in 0 until videoTrackCount) {
            val format = videoExtractor.getTrackFormat(i)
            val mime = format.getString(MediaFormat.KEY_MIME) ?: continue
            if (mime.startsWith("video/")) {
                ALog.e("xiao", "found the video track")
                videoTrackIndex = i
                // Add the video track to the MediaMuxer and remember the muxer's track index
                videoMuxerTrackIndex = muxer.addTrack(format)
                videoMaxInputSize = format.getInteger(MediaFormat.KEY_MAX_INPUT_SIZE)
                break
            }
        }

        if (audioTrackIndex == -1) throw RuntimeException("No audio track found")
        if (videoTrackIndex == -1) throw RuntimeException("No video track found")

        muxer.start()

        // Select the audio track and copy its samples
        audioExtractor.selectTrack(audioTrackIndex)
        val audioMediaInfo = MediaCodec.BufferInfo()
        val audioBuffer = ByteBuffer.allocate(audioMaxInputSize)
        while (true) {
            // Read the current encoded sample into the buffer; a negative size means end of track
            val sampleSize = audioExtractor.readSampleData(audioBuffer, 0)
            if (sampleSize < 0) {
                audioExtractor.unselectTrack(audioTrackIndex)
                break
            }
            // Describe the sample for the muxer
            audioMediaInfo.offset = 0
            audioMediaInfo.size = sampleSize
            audioMediaInfo.flags = audioExtractor.sampleFlags
            audioMediaInfo.presentationTimeUs = audioExtractor.sampleTime
            muxer.writeSampleData(audioMuxerTrackIndex, audioBuffer, audioMediaInfo)
            audioExtractor.advance()
        }

        // Select the video track and copy its samples
        videoExtractor.selectTrack(videoTrackIndex)
        val videoMediaInfo = MediaCodec.BufferInfo()
        val videoBuffer = ByteBuffer.allocate(videoMaxInputSize)
        while (true) {
            val sampleSize = videoExtractor.readSampleData(videoBuffer, 0)
            if (sampleSize < 0) {
                videoExtractor.unselectTrack(videoTrackIndex)
                break
            }
            // Take flags and timestamp from the extractor: marking every sample as a
            // key frame or rebuilding timestamps from a frame rate would misdescribe the stream
            videoMediaInfo.offset = 0
            videoMediaInfo.size = sampleSize
            videoMediaInfo.flags = videoExtractor.sampleFlags
            videoMediaInfo.presentationTimeUs = videoExtractor.sampleTime
            muxer.writeSampleData(videoMuxerTrackIndex, videoBuffer, videoMediaInfo)
            videoExtractor.advance()
        }

        audioExtractor.release()
        videoExtractor.release()
        muxer.stop()
        muxer.release()
        ALog.e("xiao", "done")
    }
}
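The muxer class writes the whole audio track first and the whole video track afterwards. A possible variation is to write the samples of both tracks in timestamp order. The ordering logic on its own is a plain merge of two sorted lists, sketched below without any Android classes; Sample and interleave are hypothetical names used only for illustration.

```kotlin
/** Minimal stand-in for an encoded sample; only the fields needed for ordering. */
data class Sample(val track: String, val presentationTimeUs: Long)

/** Merges two lists, each already sorted by timestamp, into one list sorted by timestamp. */
fun interleave(audio: List<Sample>, video: List<Sample>): List<Sample> {
    val out = ArrayList<Sample>(audio.size + video.size)
    var a = 0
    var v = 0
    while (a < audio.size && v < video.size) {
        // On a tie the audio sample goes first; this choice is arbitrary.
        if (audio[a].presentationTimeUs <= video[v].presentationTimeUs) {
            out.add(audio[a++])
        } else {
            out.add(video[v++])
        }
    }
    // One of the lists is exhausted; append whatever remains of the other.
    while (a < audio.size) out.add(audio[a++])
    while (v < video.size) out.add(video[v++])
    return out
}
```

In a real muxing loop the same comparison would be made between audioExtractor.sampleTime and videoExtractor.sampleTime before deciding which extractor to read from and advance.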

Why can’t I output the full MP4 directly?

The approach above is a bit roundabout: you record two separate MP4 files and then merge them into one. Writing the complete MP4 directly is possible, but the code needs some changes.

The idea is to cache the encoded output of the audio and video encoders and hand it to a single MediaMuxer. Here is the code:

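import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.media.MediaMuxer
import android.media.MediaRecorder
import android.os.Environment
import java.io.File
import java.nio.ByteBuffer
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import kotlinx.coroutines.Dispatchers.IO
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

class H264Encode {

    // Video parameters
    private val width = 1280
    private val height = 720
    private val frame = 30

    private var videoCodec: MediaCodec? = null
    private val yuv420Queue = ArrayBlockingQueue<ByteArray>(10)

    // Audio parameters
    private val sampleRateInHz: Int = 44100
    // Deprecated constant; it is a channel configuration for AudioRecord, not a channel count
    private val channelConfig: Int = AudioFormat.CHANNEL_CONFIGURATION_MONO
    // Despite its name, this is the PCM sample format, not a bit rate
    private val encodingBitRate: Int = AudioFormat.ENCODING_PCM_16BIT
    private var audioCodec: MediaCodec? = null
    private var audioRecord: AudioRecord? = null

    // Written by the caller's thread and read by the worker coroutines
    @Volatile
    private var isRunning = false
    private var outputPath =
        Environment.getExternalStorageDirectory().absolutePath + File.separator + "xiao.mp4"
    private val muxerDateQueue = LinkedBlockingQueue<MuxerData>()
    private var mediaMuxer: MediaMuxer? = null

    @Volatile
    private var isVideoTrackAdd = false
    @Volatile
    private var isAudioTrackAdd = false

    fun startEncoder() {
        isRunning = true
        isVideoTrackAdd = false
        isAudioTrackAdd = false
        GlobalScope.launch(IO) {
            mediaMuxer = MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            launch { catch({ onVideoEncoder() }, "onVideoEncoder") }
            launch { catch({ onAudioEncoder() }, "onAudioEncoder") }
            launch { catch({ onMuxer() }, "onMuxer") }
        }
    }

    fun stopEncoder() {
        isRunning = false
    }

    fun putVideoData(buffer: ByteArray) {
        // Drop the oldest frame when the queue is full
        if (yuv420Queue.size >= 10) {
            yuv420Queue.poll()
        }
        // offer() never blocks the camera thread
        yuv420Queue.offer(buffer)
    }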

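    private fun onVideoEncoder() {
        val muxer = mediaMuxer ?: return

        var videoTrack = -1
        val mediaFormat =
            MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height)
        // The input frames are NV12, so request a semi-planar YUV 4:2:0 input format
        mediaFormat.setInteger(
            MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar
        )
        mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, width * height * 5)
        mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, frame)
        mediaFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)

        videoCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
        val videoCodec = videoCodec ?: throw NullPointerException("videoEncoder is not initialized")
        videoCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        videoCodec.start()

        while (isRunning) {
            // Wait with a timeout so the loop notices stopEncoder() even when no frames arrive
            val input = yuv420Queue.poll(100, TimeUnit.MILLISECONDS) ?: continue
            val yuv420sp = ByteArray(width * height * 3 / 2)
            // The format must be converted, otherwise the recording plays back as a green screen
            NV21ToNV12(input, yuv420sp, width, height)

            try {
                // A timeout of 0 returns immediately; the index is negative when no buffer is free
                val inputBufferIndex = videoCodec.dequeueInputBuffer(0)
                if (inputBufferIndex >= 0) {
                    val inputBuffer = videoCodec.getInputBuffer(inputBufferIndex)
                    if (inputBuffer != null) {
                        val nanoTime = System.nanoTime() / 1000
                        inputBuffer.clear()
                        inputBuffer.put(yuv420sp)
                        videoCodec.queueInputBuffer(inputBufferIndex, 0, yuv420sp.size, nanoTime, 0)
                    }
                }

                val bufferInfo = MediaCodec.BufferInfo()
                var outputBufferIndex = videoCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                    videoTrack = muxer.addTrack(videoCodec.outputFormat)
                    isVideoTrackAdd = true
                }
                while (outputBufferIndex >= 0) {
                    if (videoTrack >= 0) {
                        videoCodec.getOutputBuffer(outputBufferIndex)?.let {
                            if (isMuxerStart()) {
                                // The codec reuses its buffers after release, so queue a copy
                                it.limit(bufferInfo.offset + bufferInfo.size)
                                it.position(bufferInfo.offset)
                                val data = ByteBuffer.allocate(bufferInfo.size).put(it)
                                data.flip()
                                val info = MediaCodec.BufferInfo()
                                info.set(0, bufferInfo.size, bufferInfo.presentationTimeUs, bufferInfo.flags)
                                muxerDateQueue.put(MuxerData(MuxerData.Type.Video, videoTrack, data, info))
                            }
                        }
                    }
                    videoCodec.releaseOutputBuffer(outputBufferIndex, false)
                    outputBufferIndex = videoCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                }
            } catch (e: Exception) {
                ALog.e("xiao", "onVideoEncoder error: ${e.message}")
            }
        }

        try {
            videoCodec.stop()
            videoCodec.release()
            this.videoCodec = null
        } catch (e: Exception) {
            e.printStackTrace()
        }
    }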

    private fun onAudioEncoder() {
        val muxer = mediaMuxer ?: return
        var audioTrack = -1

        // createAudioFormat expects a channel count (1 for mono), not a channel-config constant
        val audioFormat =
            MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AAC, sampleRateInHz, 1)
        audioFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC)
        audioFormat.setInteger(MediaFormat.KEY_BIT_RATE, 64 * 1000)
        audioCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
        audioCodec?.configure(audioFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)

        val recordBufSize = AudioRecord.getMinBufferSize(sampleRateInHz, channelConfig, encodingBitRate)
        audioRecord = AudioRecord(MediaRecorder.AudioSource.MIC, sampleRateInHz, channelConfig, encodingBitRate, recordBufSize)

        val audioCodec = audioCodec ?: throw NullPointerException()
        val audioRecord = audioRecord ?: throw NullPointerException()
        audioCodec.start()
        audioRecord.startRecording()

        val bufferInfo = MediaCodec.BufferInfo()
        val buffer = ByteArray(recordBufSize)
        while (isRunning) {
            val readBytes = audioRecord.read(buffer, 0, recordBufSize)
            try {
                val inputBufferIndex = audioCodec.dequeueInputBuffer(TIMEOUT_USEC)
                // No free input buffer: skip this chunk instead of leaving the function,
                // so the cleanup after the loop still runs
                if (inputBufferIndex < 0) continue
                val inputBuffer = audioCodec.getInputBuffer(inputBufferIndex)
                if (inputBuffer != null) {
                    if (readBytes <= 0) {
                        ALog.e("xiao", "send BUFFER_FLAG_END_OF_STREAM")
                        audioCodec.queueInputBuffer(inputBufferIndex, 0, 0, System.nanoTime() / 1000, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                    } else {
                        inputBuffer.clear()
                        inputBuffer.put(buffer, 0, readBytes)
                        audioCodec.queueInputBuffer(inputBufferIndex, 0, readBytes, System.nanoTime() / 1000, 0)
                    }
                }

                var outputBufferIndex = audioCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                    audioTrack = muxer.addTrack(audioCodec.outputFormat)
                    ALog.e("xiao", "format changed, audioTrack: $audioTrack")
                    if (audioTrack >= 0) {
                        isAudioTrackAdd = true
                    }
                }

                while (outputBufferIndex >= 0) {
                    if (audioTrack >= 0) {
                        val outBuffer = audioCodec.getOutputBuffer(outputBufferIndex)
                        // Codec config data is already part of the track format, so skip it
                        if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != 0) {
                            bufferInfo.size = 0
                        }
                        if (bufferInfo.size != 0 && outBuffer != null) {
                            if (isMuxerStart()) {
                                // The codec reuses its buffers after release, so queue a copy
                                outBuffer.limit(bufferInfo.offset + bufferInfo.size)
                                outBuffer.position(bufferInfo.offset)
                                val data = ByteBuffer.allocate(bufferInfo.size).put(outBuffer)
                                data.flip()
                                val info = MediaCodec.BufferInfo()
                                info.set(0, bufferInfo.size, bufferInfo.presentationTimeUs, bufferInfo.flags)
                                muxerDateQueue.put(MuxerData(MuxerData.Type.Audio, audioTrack, data, info))
                            }
                        }
                    }
                    audioCodec.releaseOutputBuffer(outputBufferIndex, false)
                    outputBufferIndex = audioCodec.dequeueOutputBuffer(bufferInfo, TIMEOUT_USEC)
                }
            } catch (e: Exception) {
                ALog.e("xiao", "Failed to encode audio data: ${e.message}")
            }
        }

        try {
            audioRecord.stop()
            audioRecord.release()
            this.audioRecord = null
        } catch (e: Exception) {
            ALog.e("xiao", e.message)
        }

        try {
            audioCodec.stop()
            audioCodec.release()
            this.audioCodec = null
        } catch (e: Exception) {
            ALog.e("xiao", e.message)
        }
    }

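    private suspend fun onMuxer() {
        // Wait until both encoders have added their tracks
        while (!isMuxerStart()) { delay(100) }
        val muxer = mediaMuxer ?: throw NullPointerException()
        // start() must come after every addTrack() call and before the first writeSampleData()
        muxer.start()

        while (isRunning || muxerDateQueue.size > 0) {
            // poll() with a timeout, so the loop can end once recording has stopped
            muxerDateQueue.poll(100, TimeUnit.MILLISECONDS)?.apply {
                muxer.writeSampleData(trackIndex, buffer, bufferInfo)
            }
        }

        try {
            muxer.stop()
            muxer.release()
        } catch (e: Exception) {
            ALog.e("xiao", e.message)
        }
        mediaMuxer = null
    }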

    /** NV21 stores chroma as V,U pairs; NV12 expects U,V pairs, so each pair is swapped */
    private fun NV21ToNV12(nv21: ByteArray, nv12: ByteArray, width: Int, height: Int) {
        val frameSize = width * height
        System.arraycopy(nv21, 0, nv12, 0, frameSize)
        for (j in 0 until frameSize / 2 step 2) {
            nv12[frameSize + j] = nv21[frameSize + j + 1]
            nv12[frameSize + j + 1] = nv21[frameSize + j]
        }
    }

    private fun isMuxerStart() = isAudioTrackAdd && isVideoTrackAdd

    private suspend fun catch(action: (suspend () -> Unit), key: String) {
        try {
            action()
        } catch (e: Exception) {
            ALog.e("xiao", "$key ${e.message}")
        }
    }

    data class MuxerData(
        val type: Type,
        val trackIndex: Int,
        val buffer: ByteBuffer,
        val bufferInfo: MediaCodec.BufferInfo
    ) {
        enum class Type {
            Video, Audio
        }
    }

    companion object {
        private const val TIMEOUT_USEC = 12000L
    }
}
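One detail matters when encoded data is passed to another thread through a queue: a MediaCodec output buffer is only valid until releaseOutputBuffer() is called, so whatever goes into the queue has to be an independent copy. The sketch below shows such a copy with plain java.nio, so it can be tried without a device; copyRegion is a hypothetical helper name.

```kotlin
import java.nio.ByteBuffer

/**
 * Returns an independent copy of the [size] bytes that start at [offset] in [source].
 * The position and limit of [source] are left untouched.
 */
fun copyRegion(source: ByteBuffer, offset: Int, size: Int): ByteBuffer {
    val view = source.duplicate()   // shares the bytes but has its own position and limit
    view.clear()                    // reset position/limit to cover the whole capacity
    view.limit(offset + size)
    view.position(offset)
    val copy = ByteBuffer.allocate(size)
    copy.put(view)
    copy.flip()                     // make the copy readable from position 0
    return copy
}
```

In the encoder loops, offset and size would come from MediaCodec.BufferInfo, and a fresh BufferInfo would be filled for every queued item for the same reason.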

2.2 Use Camera2/CameraX to collect data and output as MP4

Camera delivers NV21 directly, while CameraX and Camera2 deliver YUV as three separate planes (Y, U, and V), so the data has to be converted first:

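// Note: this shortcut relies on the U and V planes being interleaved in memory
// (pixel stride 2) and on rows without padding; other layouts need a per-pixel copy.
fun yuv420ToNv21(image: ImageProxy): ByteArray {
    val planes = image.planes
    val yBuffer: ByteBuffer = planes[0].buffer
    val uBuffer: ByteBuffer = planes[1].buffer
    val vBuffer: ByteBuffer = planes[2].buffer
    val ySize: Int = yBuffer.remaining()
    val uSize: Int = uBuffer.remaining()
    val vSize: Int = vBuffer.remaining()

    val nv21 = ByteArray(ySize + vSize + 1)
    // Y plane first, then the V buffer, which already holds V,U,V,U... up to the last V
    yBuffer.get(nv21, 0, ySize)
    vBuffer.get(nv21, ySize, vSize)

    // Write the U samples into the odd chroma positions, including the final one
    val u = ByteArray(uSize)
    uBuffer.get(u)
    var pos = ySize + 1
    for (i in 0 until uSize) {
        if (i % 2 == 0) {
            nv21[pos] = u[i]
            pos += 2
        }
    }
    return nv21
}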
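The plane layout is not the same on every device, so a more general conversion reads each chroma sample through its row stride and pixel stride. The following is a minimal sketch of that idea on plain byte arrays, so it runs without Android classes; packNv21 and its parameters are hypothetical names, and it assumes the Y plane has no row padding.

```kotlin
/**
 * Packs separate Y, U and V planes into one NV21 array.
 * [chromaRowStride] is the number of bytes per chroma row, [chromaPixelStride] the
 * distance in bytes between two neighbouring chroma samples (1 = planar, 2 = interleaved).
 * Sketch only: assumes the Y plane is tightly packed (row stride == width).
 */
fun packNv21(
    y: ByteArray, u: ByteArray, v: ByteArray,
    width: Int, height: Int,
    chromaRowStride: Int, chromaPixelStride: Int
): ByteArray {
    val frameSize = width * height
    val nv21 = ByteArray(frameSize * 3 / 2)
    System.arraycopy(y, 0, nv21, 0, frameSize)
    var pos = frameSize
    for (row in 0 until height / 2) {
        for (col in 0 until width / 2) {
            val index = row * chromaRowStride + col * chromaPixelStride
            nv21[pos++] = v[index]   // NV21 stores V first
            nv21[pos++] = u[index]   // then U
        }
    }
    return nv21
}
```

With an ImageProxy, the stride values would come from planes[1].rowStride and planes[1].pixelStride, and the arrays from the plane buffers.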

We will cover YUV in more detail in the next article.

