# PCM and AAC

What is PCM?

PCM (Pulse Code Modulation) is one of the coding methods used in digital communication. Analog signals such as voice or images are sampled at regular intervals to discretize them. Each sample value is then rounded to the nearest quantization level, and the quantized value is represented by a group of binary digits that encodes the amplitude of the sampled pulse.
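The two steps described above, sampling at regular intervals and rounding to a quantization level, can be sketched in a few lines. This is only an illustration: the class `PcmQuantizer`, its method names, and the choice of a 16-bit signed range and a sine-wave input are assumptions made for the example, not part of any SDK.

```java
/** Hypothetical helper illustrating sampling and quantization; not part of any SDK. */
class PcmQuantizer {

    /** Rounds an amplitude in [-1.0, 1.0] to the nearest signed 16-bit level. */
    static short quantize(double amplitude) {
        double clamped = Math.max(-1.0, Math.min(1.0, amplitude));
        return (short) Math.round(clamped * Short.MAX_VALUE);
    }

    /** Samples a sine tone at regular intervals and quantizes every sample. */
    static short[] sampleSine(double frequencyHz, int sampleRateHz, int sampleCount) {
        short[] samples = new short[sampleCount];
        for (int n = 0; n < sampleCount; n++) {
            double t = (double) n / sampleRateHz; // sampling instant in seconds
            samples[n] = quantize(Math.sin(2 * Math.PI * frequencyHz * t));
        }
        return samples;
    }

    public static void main(String[] args) {
        // 1 kHz tone sampled at 16 kHz: one period spans 16 samples
        for (short s : sampleSine(1000.0, 16000, 16)) {
            System.out.println(s);
        }
    }
}
```

Each printed number is one PCM sample; writing these values out as binary is all that "PCM encoding" amounts to.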

# PCM audio format

PCM data layout: the samples of the channels are interleaved, so in a stereo stream the data for each sample point alternates between the left and right channels (L, R, L, R, ...).
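As a small illustration of this interleaved layout, the sketch below splits a stereo buffer back into its two channels. It assumes the samples have already been decoded to `short` values; the class and method names are made up for the example.

```java
import java.util.Arrays;

/** Hypothetical helper: separates interleaved stereo PCM into two channels. */
class PcmChannels {

    /** Splits interleaved samples (L0, R0, L1, R1, ...) into {left, right}. */
    static short[][] deinterleave(short[] interleaved) {
        int frames = interleaved.length / 2; // one frame = one left + one right sample
        short[] left = new short[frames];
        short[] right = new short[frames];
        for (int i = 0; i < frames; i++) {
            left[i] = interleaved[2 * i];
            right[i] = interleaved[2 * i + 1];
        }
        return new short[][] {left, right};
    }

    public static void main(String[] args) {
        short[][] channels = deinterleave(new short[] {1, -1, 2, -2, 3, -3});
        System.out.println("left  = " + Arrays.toString(channels[0]));
        System.out.println("right = " + Arrays.toString(channels[1]));
    }
}
```

A mono stream, such as the one used for speech recognition later in this post, has no interleaving: every sample belongs to the single channel.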

Generally speaking, when developing a receiver you have to consider the network transmission load, so the audio (data) signal is often down-sampled or denoised, and both operations involve filtering the audio signal. Whether the filtering is done in the time domain or in the frequency domain, the audio byte stream output by the receiver cannot be used directly. The byte stream first has to be converted back into the original time-domain samples, and to organize the data correctly you must know the PCM encoding format. This comes down to two basic questions: how many bytes one PCM sample occupies, and in which order the high and low bytes are stored (endianness). Once the time-domain data has been recovered, a filter can be applied to it. After this conversion the interference noise in the audio signal can be filtered out, making the audio clearer, ideally to the point where the interference is no longer audible.
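The two questions above (bytes per sample and byte order) are exactly what the conversion code needs to know. The sketch below assumes 16-bit little-endian samples, i.e. two bytes per sample with the low byte first, and uses a plain moving average as a stand-in for "the filter"; the real filter used in a receiver would be chosen for the noise in question. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: PCM byte stream <-> time-domain samples, plus a toy filter. */
class PcmBytes {

    /** Decodes 16-bit little-endian PCM bytes into time-domain samples. */
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Encodes samples back into 16-bit little-endian PCM bytes. */
    static byte[] toBytes(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (short s : samples) {
            buffer.putShort(s);
        }
        return buffer.array();
    }

    /** Toy low-pass filter: averages each sample with up to window-1 predecessors. */
    static short[] movingAverage(short[] samples, int window) {
        short[] out = new short[samples.length];
        long sum = 0;
        for (int i = 0; i < samples.length; i++) {
            sum += samples[i];
            if (i >= window) {
                sum -= samples[i - window]; // drop the sample that left the window
            }
            out[i] = (short) (sum / Math.min(i + 1, window));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80};
        short[] filtered = movingAverage(toSamples(pcm), 2);
        System.out.println(toBytes(filtered).length + " bytes after filtering");
    }
}
```

If the stream were big-endian instead, only the `ByteOrder` constant would change, which is why the byte order has to be known up front.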


AAC (Advanced Audio Coding) is an audio file format.

What’s the difference between MP3 and AAC?

1. Different compression technology. MP3 exploits the fact that the human ear is not sensitive to high-frequency sound. It transforms the time-domain waveform into a frequency-domain signal, splits it into multiple frequency bands, and applies a different compression ratio to each band: high frequencies are compressed more heavily (or even discarded), while low frequencies are compressed lightly so that the signal is not audibly distorted. This amounts to discarding the high frequencies that the human ear can barely hear and keeping the lower frequencies that it hears well, which lets MP3 reach compression ratios of 1:10 or even 1:12.

AAC uses a newer, more efficient coding algorithm and offers better "cost performance". With AAC, files can be made more compact without listeners perceiving a significant drop in sound quality.

2. Different audio quality. AAC at a bit rate of 96 kbps performs better than MP3 at 128 kbps, and at 128 kbps AAC offers significantly better sound quality than MP3. AAC is the only webcast format that was rated “outstanding” in all EBU listening tests.
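To put the 1:10 to 1:12 ratios and the 96/128 kbps figures in context, the arithmetic below computes the bit rate of uncompressed PCM and what remains after a given compression ratio. The CD-style parameters (44.1 kHz, 16-bit, stereo) are an assumption chosen for the example, and the class name is made up.

```java
/** Hypothetical helper for bit-rate arithmetic. */
class BitrateMath {

    /** Bit rate of uncompressed PCM in bits per second. */
    static long pcmBitsPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample * channels;
    }

    /** Bit rate in kbps after compressing by the given ratio (e.g. 12 for 1:12). */
    static double compressedKbps(long pcmBitsPerSecond, double ratio) {
        return pcmBitsPerSecond / ratio / 1000.0;
    }

    public static void main(String[] args) {
        long cd = pcmBitsPerSecond(44100, 16, 2);
        System.out.println("PCM: " + cd + " bit/s");
        System.out.println("1:10 -> " + compressedKbps(cd, 10) + " kbps");
        System.out.println("1:12 -> " + compressedKbps(cd, 12) + " kbps");
    }
}
```

Under these assumptions uncompressed stereo PCM runs at about 1411 kbps, so a 1:12 ratio lands close to the familiar 128 kbps MP3 setting.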

# Baidu Voice Recognition

Recently I worked on a feature that uses speech recognition to convert speech to text and, once the conversion is complete, keeps both the text and the voice recording. The text is entered intermittently and the voice files are spliced together, so in the end I get one string of text and one .aac file.

How is this done?

Baidu voice recognition documentation for mobile

Searching the Baidu voice recognition documentation shows that the audio input format for speech-to-text is as follows:

The default is microphone input. Alternatively, you can set a parameter to supply an audio stream in PCM format: 16 kHz sampling rate, 16-bit, little-endian, mono.
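Because raw PCM has no header, these parameters are all that is needed to interpret a file. As a small worked example, the sketch below derives the playback duration from a file's size, assuming exactly the format quoted above (16 kHz, 16-bit, mono); the class name and constants are made up for illustration.

```java
/** Hypothetical helper: duration of a raw 16 kHz / 16-bit / mono PCM recording. */
class PcmDuration {
    static final int SAMPLE_RATE_HZ = 16000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit
    static final int CHANNELS = 1;         // mono

    /** Returns the duration in milliseconds of a PCM stream of the given size. */
    static long durationMillis(long pcmByteCount) {
        long bytesPerSecond = (long) SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS; // 32000
        return pcmByteCount * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(durationMillis(96000) + " ms");
    }
}
```

In other words, one second of audio in this format occupies 32000 bytes.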

Getting the audio file generated by Baidu voice recognition:

    /**
     * Integration based on SDK step 2.2: send the start event.
     * Called when the start button is clicked.
     * Fill in the test parameters here.
     */
    public void start() {

        Map<String, Object> params = new LinkedHashMap<String, Object>();
        String event = null;
        event = SpeechConstant.ASR_START; // Replace the event for the test

        if (enableOffline) {
            params.put(SpeechConstant.DECODER, 2);
        }
        // Set the recognition parameters (SDK integration step 2.1)
        params.put(SpeechConstant.ACCEPT_AUDIO_VOLUME, false); // whether to receive volume callbacks
        // When online, the Mandarin search model or far-field model can also be used. The Baidu server then performs semantic analysis on the recognized text to obtain its intent and word slots.
        params.put(SpeechConstant.PID, 15373); // Mandarin input-method model, with punctuation (commas)
        params.put(SpeechConstant.VAD_ENDPOINT_TIMEOUT, 3000); // Long speech and VAD tail detection: a positive value is the number of milliseconds of silence that marks the end of speech. The recommended value is 800 ms to 3000 ms
        // params.put(SpeechConstant.NLU, "enable");
        params.put(SpeechConstant.DISABLE_PUNCTUATION, false);
        // params.put(SpeechConstant.IN_FILE, "res:///com/baidu/android/voicedemo/16k_test.pcm");
        params.put(SpeechConstant.VAD, SpeechConstant.VAD_DNN);
        // Whether to output voice files
        params.put(SpeechConstant.ACCEPT_AUDIO_DATA, true);
        // Output file directory
        params.put(SpeechConstant.OUT_FILE, voicePcmUrl + "outfile.pcm");
        // First test with a demo screen such as "online recognition" to generate the recognition parameters. params is the same as in myRecognizer.start(params) of the ActivityRecog class.
        // Copy this section to detect configuration errors automatically
        (new AutoCheck(mContext, new Handler() {
            public void handleMessage(Message msg) {
                if (msg.what == 100) {
                    AutoCheck autoCheck = (AutoCheck) msg.obj;
                    synchronized (autoCheck) {
                        String message = autoCheck.obtainErrorMessage(); // or autoCheck.obtainAllMessage();
                    }
                }
            }
        }, enableOffline)).checkAsr(params);
        String json = null; // You can replace it with your own JSON
        json = new JSONObject(params).toString(); // This can be replaced with the json you need to test
        wakeup.send(event, json, null, 0, 0);
    }

Now that I have my .pcm file, the question is how to convert PCM to AAC and concatenate the pieces.
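On the concatenation half of that question: when the AAC data is written as an ADTS stream, every frame carries its own header, so segments can in principle be joined by simply appending their bytes. The sketch below shows this under the assumption that all segments were encoded with the same sample rate, channel count and profile; the class `AacSegments` is hypothetical, and the bytes in `main` are stand-ins rather than real AAC frames.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: joins ADTS AAC segments by plain byte concatenation. */
class AacSegments {

    /** Appends the segments in order and returns the combined stream. */
    static byte[] concat(List<byte[]> segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] segment : segments) {
            out.write(segment, 0, segment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = {(byte) 0xFF, (byte) 0xF9, 0x01};  // stand-in bytes, not real frames
        byte[] second = {(byte) 0xFF, (byte) 0xF9, 0x02};
        System.out.println(concat(Arrays.asList(first, second)).length + " bytes");
    }
}
```

The same idea applies to files: appending each newly encoded segment to the end of the existing .aac file keeps one continuous recording.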

After a round of frantic searching through CSDN, Baidu, the official documentation and so on…

Ways to convert PCM to AAC on Android:

  • MediaCodec: configure an encoder and use it for transcoding
  • ffmpeg

Fundamentals of Android MediaCodec

Encapsulate a utility class:

  • Obtain the path for saving the output file after the conversion
  • Initialize the MediaCodec encoder
  • Store the .pcm byte[] data in the cache queue
  • Start a thread that reads data from the cache queue and performs the conversion
  • Listen for conversion events

    /**
     * Sets the input and output file locations.
     *
     * @param srcPath
     * @param dstPath
     */
    public void setIOPath(String srcPath, String dstPath) {
        this.srcPath = srcPath;
        this.dstPath = dstPath;
    }

    /**
     * This class is encapsulated: calling prepare() initializes the decoder,
     * the encoder, the input and output streams, etc.
     */
    public void prepare() {
        codeOver = false;
        if (srcPath == null) {
            throw new IllegalArgumentException("srcPath can't be null");
        }
        if (dstPath == null) {
            throw new IllegalArgumentException("dstPath can't be null");
        }
        try {
            File file = new File(srcPath);
            fileTotalSize = file.length();
            outFile = new File(dstPath);
            if (!file.exists()) {
                file.createNewFile();
            }
            fileOutSize = outFile.length();
            // fos = new FileOutputStream(outFile);
            // bos = new BufferedOutputStream(fos, (int) fileTotalSize);
            queue = new ArrayBlockingQueue<byte[]>(10);
            initAACMediaEncode(); // AAC encoder
            FileInputStream inputStream = new FileInputStream(file);
            int available = inputStream.available();
            byte[] pcm = new byte[available];
            // Assumed continuation: read the file and put its bytes on the cache queue
            inputStream.read(pcm);
            inputStream.close();
            queue.put(pcm);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    /**
     * Initialize the AAC encoder.
     */
    private void initAACMediaEncode() {
        try {
            key_sample_rate = 16000; // 16 kHz, matching the PCM input
            key_channel_count = 1;   // mono
            // KEY_BIT_RATE is expressed in bits per second; 16 is kept from the original,
            // but a value such as 64000 is more typical for AAC
            key_bit_rate = 16;
            sampleRateType = ADTSUtils.getSampleRateType(16000);
            LogUtils.d(key_bit_rate + " " + key_channel_count + " " + key_sample_rate + " " + sampleRateType);
            MediaFormat encodeFormat = MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AAC,
                    key_sample_rate, key_channel_count); // MIME type, sample rate, number of channels
            encodeFormat.setInteger(MediaFormat.KEY_BIT_RATE, key_bit_rate); // bit rate
            encodeFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
            encodeFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, (int) fileTotalSize);
            mediaEncode = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC);
            mediaEncode.configure(encodeFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if (mediaEncode == null) {
            LogUtils.e("create mediaEncode failed");
            return;
        }
        mediaEncode.start(); // the codec must be started before its buffers can be requested
        encodeInputBuffers = mediaEncode.getInputBuffers();
        encodeOutputBuffers = mediaEncode.getOutputBuffers();
        encodeBufferInfo = new MediaCodec.BufferInfo();
    }

/**
 * Author: Eric
 * CreateDate: 2018/AP 15:28
 * Email: [email protected]
 * Version: 2.0
 * Desc:
 * Modified:
 */

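public class ADTSUtils {
    private static Map<String, Integer> SAMPLE_RATE_TYPE;

    static {
        SAMPLE_RATE_TYPE = new HashMap<>();
        // Sample rates in ADTS sampling-frequency-index order (index 0 is 96000 Hz)
        int[] rates = {96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000, 7350};
        for (int i = 0; i < rates.length; i++) {
            SAMPLE_RATE_TYPE.put(String.valueOf(rates[i]), i);
        }
    }

    public static int getSampleRateType(int sampleRate) {
        return SAMPLE_RATE_TYPE.get(String.valueOf(sampleRate));
    }
}
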
    /**
     * Start transcoding: converts the audio data to the
     * {@link MediaFormat#MIMETYPE_AUDIO_AAC} audio format.
     * MP3 -> PCM -> AAC
     */
    public void startAsync() {
        new Thread(new EncodeRunnable()).start();
    }

    /**
     * Encodes PCM data into a {@link MediaFormat#MIMETYPE_AUDIO_AAC} audio file
     * and saves it to {@link #dstPath}.
     */
    private void dstAudioFormatFromPCM() {

        int inputIndex;
        ByteBuffer inputBuffer;
        int outputIndex;
        ByteBuffer outputBuffer;
        byte[] chunkAudio;
        int outBitSize;
        int outPacketSize;
        byte[] chunkPCM;

        for (int i = 0; i < encodeInputBuffers.length - 1; i++) {
            chunkPCM = getPCMData(); // get the next chunk of queued PCM data
            if (chunkPCM == null) {
                break;
            }
            inputIndex = mediaEncode.dequeueInputBuffer(-1); // wait for a free input buffer
            inputBuffer = encodeInputBuffers[inputIndex];
            inputBuffer.clear();
            inputBuffer.put(chunkPCM); // fill the input buffer with PCM data
            mediaEncode.queueInputBuffer(inputIndex, 0, chunkPCM.length, 0, 0); // hand the buffer to the encoder
        }

        outputIndex = mediaEncode.dequeueOutputBuffer(encodeBufferInfo, 10000); // timeout in microseconds
        while (outputIndex >= 0) {
            outBitSize = encodeBufferInfo.size;
            outPacketSize = outBitSize + 7; // 7 is the size of the ADTS header
            outputBuffer = encodeOutputBuffers[outputIndex]; // get the output buffer
            outputBuffer.position(encodeBufferInfo.offset);
            outputBuffer.limit(encodeBufferInfo.offset + outBitSize);
            chunkAudio = new byte[outPacketSize];
            addADTStoPacket(chunkAudio, outPacketSize); // write the ADTS header (code shown below)
            outputBuffer.get(chunkAudio, 7, outBitSize); // copy the encoded AAC data into byte[] at offset 7
            try {
                // Append to the output file
                RandomAccessFile randomFile = new RandomAccessFile(dstPath, "rw");
                // File length in bytes
                long fileLength = randomFile.length();
                // Move the write pointer to the end of the file
                randomFile.seek(fileLength);
                randomFile.write(chunkAudio, 0, chunkAudio.length);
                randomFile.close();
                LogUtils.d("write " + chunkAudio.length);
            } catch (IOException e) {
                e.printStackTrace();
            }
            mediaEncode.releaseOutputBuffer(outputIndex, false);
            outputIndex = mediaEncode.dequeueOutputBuffer(encodeBufferInfo, 10000);
            codeOver = true;
        }
    }

    /**
     * Add the ADTS header.
     *
     * @param packet
     * @param packetLen
     */
    private void addADTStoPacket(byte[] packet, int packetLen) {
        int profile = 2; // AAC LC
        int freqIdx = sampleRateType; // ADTS sampling frequency index (8 for 16 kHz, 4 for 44.1 kHz)
        int chanCfg = key_channel_count; // must match the encoder: 1 = mono, 2 = stereo (CPE)

        // fill in ADTS data
        packet[0] = (byte) 0xFF;
        packet[1] = (byte) 0xF9;
        packet[2] = (byte) (((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
        packet[3] = (byte) (((chanCfg & 3) << 6) + (packetLen >> 11));
        packet[4] = (byte) ((packetLen & 0x7FF) >> 3);
        packet[5] = (byte) (((packetLen & 7) << 5) + 0x1F);
        packet[6] = (byte) 0xFC;
    }
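The bit packing in `addADTStoPacket` is easy to get wrong, so it can help to check it outside Android. The sketch below repeats the same seven-byte layout in a standalone class and reads the fields back out. The class `AdtsHeader` and its method names are hypothetical, and the values in `main` (index 8 for 16 kHz, one channel, a 100-byte payload) are just sample inputs.

```java
/** Hypothetical standalone check of the 7-byte ADTS header layout. */
class AdtsHeader {

    /** Builds the header with the same bit layout as addADTStoPacket. */
    static byte[] build(int profile, int freqIdx, int chanCfg, int packetLen) {
        byte[] h = new byte[7];
        h[0] = (byte) 0xFF;
        h[1] = (byte) 0xF9;
        h[2] = (byte) (((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
        h[3] = (byte) (((chanCfg & 3) << 6) + (packetLen >> 11));
        h[4] = (byte) ((packetLen & 0x7FF) >> 3);
        h[5] = (byte) (((packetLen & 7) << 5) + 0x1F);
        h[6] = (byte) 0xFC;
        return h;
    }

    /** Reads the 13-bit frame length (header + payload) back from a header. */
    static int frameLength(byte[] h) {
        return ((h[3] & 0x03) << 11) | ((h[4] & 0xFF) << 3) | ((h[5] & 0xE0) >> 5);
    }

    /** Reads the 4-bit sampling frequency index. */
    static int sampleRateIndex(byte[] h) {
        return (h[2] & 0x3C) >> 2;
    }

    /** Reads the 3-bit channel configuration. */
    static int channelConfig(byte[] h) {
        return ((h[2] & 0x01) << 2) | ((h[3] & 0xC0) >> 6);
    }

    public static void main(String[] args) {
        byte[] header = build(2, 8, 1, 100 + 7); // AAC LC, 16 kHz, mono, 100-byte payload
        System.out.println("frame length = " + frameLength(header));
        System.out.println("rate index   = " + sampleRateIndex(header));
        System.out.println("channels     = " + channelConfig(header));
    }
}
```

Since the frame length field is 13 bits wide, a single ADTS frame, header included, cannot exceed 8191 bytes.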
