Author: Xiong Yunyang

Preface

Audio visualization, as the name implies, is the presentation of sound in a visual way. How do we plot an audio signal? How do we clearly show the changes in a sound visually, so that what we see matches what we hear? How is this done on Android? This article answers these questions and aims to give a comprehensive introduction to implementing audio visualization on Android.

Fourier transform

The general flow of Android audio playback is as follows:

  1. The player loads encoded audio data from a local audio file or the network, decodes it into PCM data, and writes it to AudioTrack
  2. AudioTrack writes the PCM data to a FIFO
  3. AudioFlinger's MixerThread reads the data from the FIFO via AudioMixer, mixes it, and writes it to the HAL output device for playback

In this pipeline, PCM data is what directly represents the audio signal and is available for visual rendering. However, PCM represents the intensity of the audio signal at each sampling point; plotted directly it looks chaotic and does not reflect the changes we perceive by ear. PCM data alone can only drive effects that reflect changes in the average signal intensity. Most other effects need frequency-domain data, obtained by applying a Fourier transform to the PCM data, which reflects how the signal intensity changes at each frequency.

A quick review of the Fourier transform: it converts a signal from the time domain to the frequency domain and is used to analyze the spectrum of a signal and determine its composition. The conversion result looks as follows:

PCM data is discrete in time, so the discrete Fourier transform (DFT) is used. It converts a sequence of N complex numbers $\{x_n\} := x_0, x_1, \ldots, x_{N-1}$ into another complex sequence $\{X_k\} := X_0, X_1, \ldots, X_{N-1}$, computed as:


$$X_k=\sum_{n=0}^{N-1}x_n \cdot e^{-i2\pi\frac{kn}{N}}=\sum_{n=0}^{N-1}x_n \cdot \left(\cos\left(2\pi\frac{kn}{N}\right)-i\sin\left(2\pi\frac{kn}{N}\right)\right)$$

Computing the DFT of a length-N sequence directly from the formula above has time complexity $O(N^2)$, which is slow. In practice, the fast Fourier transform (FFT) is generally used instead, reducing the complexity to $O(N\log N)$.

The formula looks complicated, but we do not need to fully understand it to implement audio visualization: the FFT can be computed with an existing library rather than implemented from scratch. However, to derive usable data from the FFT results, it is necessary to understand the following DFT properties:

  • When all inputs are real numbers, the output satisfies conjugate symmetry: $X_{N-k}=X_k^*$, so implementations generally return only the first half of the results
  • If the original signal's sampling rate is $f_s$ and the sequence length is N, the output frequency resolution is $f_s/N$ and the frequency of the k-th point is $kf_s/N$; this can be used to locate the positions corresponding to a given frequency range in the results
  • If the real and imaginary parts of an output bin are re and im, its modulus is $M=\sqrt{re^2+im^2}$, and the amplitude of the original signal is $A = M/N$ for the DC component and $A = 2M/N$ otherwise; this is used to calculate decibels and to scale data

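To illustrate these properties, here is a minimal Java sketch (class and method names are my own, not from the article) that converts one FFT output bin into a frequency and an amplitude:

```java
// Illustrative helpers for the DFT properties listed above.
class FftMath {
    /** Frequency in Hz of the k-th FFT bin, given sample rate fs and window size n. */
    static double binFrequency(int k, double fs, int n) {
        return k * fs / n;
    }

    /** Modulus M of an FFT bin with real part re and imaginary part im. */
    static double modulus(double re, double im) {
        return Math.sqrt(re * re + im * im);
    }

    /** Amplitude of the original signal: M/N for the DC bin (k == 0), 2M/N otherwise. */
    static double amplitude(double modulus, int k, int n) {
        return k == 0 ? modulus / n : 2 * modulus / n;
    }
}
```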
Data sources

There are two data sources that can provide FFT results for the PCM data being played. One is the Visualizer class provided by the Android system, which has compatibility problems, so we introduce a second, self-implemented data source. We also support switching between the two data sources without modifying the data processing and rendering logic of the upper-layer effects, as shown in the figure below:

Android Visualizer

The system Visualizer provides a convenient API to obtain the waveform or FFT data of the audio being played. The general usage is:

  1. Create a Visualizer object with an audio session ID: use session 0 to get visual data for the mixed output, or pass the audio session ID of a specific player or AudioTrack to get visual data for the audio it plays
  2. Call setCaptureSize to set the size of the data captured each time, and call setDataCaptureListener to set the data callback, specifying the capture rate (that is, the callback frequency) and the data type (waveform or FFT)
  3. Call setEnabled to start capturing data, and call release to free resources when callbacks are no longer needed

More detailed API information can be found in the official documentation.

The output value of Visualizer is proportional to the volume. If the volume is 0, the output value is also 0. The visualization effect varies with the volume.

The system Visualizer has compatibility problems: on some models it silences the system sound. To display effects on all models without side effects, a custom Visualizer is needed.

Custom Visualizer

As a data source functionally equivalent to the system Visualizer, the custom Visualizer must provide the following:

  • Obtain PCM data and compute the FFT
  • Send FFT data at the specified frequency and size

To implement the first function, we first need to obtain the PCM data of the playing audio, which requires a player that can expose PCM data. Our player is self-implemented and meets this requirement; we extended it with the ability to collect decoded PCM data for FFT computation.

Because different audio files have different sampling rates while the FFT uses a fixed window size, the callback frequency of the FFT results varies with the audio being played, and the externally requested data size may differ from the size of the computed results. To implement the second function, the results therefore need fixed-frequency and resampling processing.

In addition, our player runs in the player process, while the effect page that actually uses the FFT data runs in the main process, so the data needs to be transferred across processes.

To sum up, the overall flow of the custom Visualizer is: compute the FFT in the native layer of the player process, deliver the results to the Java layer through JNI, then pass the FFT data to the main process through AIDL for subsequent processing and sending. As shown below:

Fixed frequency

The variable FFT result callback frequency needs to be converted to the externally set Visualizer callback frequency, as shown below:

Depending on the difference between the required data send interval and the FFT callback interval, we take two different approaches.

When the interval difference is less than or equal to the callback interval, the data is discarded once for every $t/\Delta t$ callbacks, where $t$ is the FFT callback interval and $\Delta t$ is the interval difference, as shown in the figure below:

When the interval difference is greater than the callback interval, data is sent once for every $t_1/t$ callbacks, where $t_1$ is the required data sending interval and $t$ is the FFT callback interval, as shown in the following figure:
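The two cases above can be sketched as a small rate limiter in Java (a minimal illustration of the described logic; the class and its integer-division rounding are my own assumptions, not the article's actual code):

```java
// Converts the variable FFT callback rate to the requested send rate.
class RateLimiter {
    private final long fftIntervalMs;   // t: interval between FFT result callbacks
    private final long sendIntervalMs;  // t1: required data sending interval
    private long count = 0;

    RateLimiter(long fftIntervalMs, long sendIntervalMs) {
        this.fftIntervalMs = fftIntervalMs;
        this.sendIntervalMs = sendIntervalMs;
    }

    /** Called on every FFT callback; returns whether this result should be sent. */
    boolean shouldSend() {
        count++;
        long diff = sendIntervalMs - fftIntervalMs;       // Δt: interval difference
        if (diff <= 0) return true;                       // callbacks already slow enough
        if (diff <= fftIntervalMs) {
            long dropEvery = fftIntervalMs / diff;        // discard one in every t/Δt callbacks
            return count % dropEvery != 0;
        }
        long sendEvery = sendIntervalMs / fftIntervalMs;  // send one in every t1/t callbacks
        return count % sendEvery == 0;
    }
}
```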

Sampling extracts data from the original FFT output at appropriate intervals when the externally requested data size is smaller than the size of the FFT result.

To ensure that the value range returned by the custom Visualizer is consistent with the system Visualizer, enabling seamless data source switching, we need to scale the FFT data. The value range of the decoded PCM data is [-1, 1], so the amplitude range of the original signal is [0, 1], that is, the value range of $2M/N$ is [0, 1] (the DC component is not used in drawing, so it is not considered here). The FFT data returned by the system Visualizer is a byte array whose real and imaginary parts range over [-128, 128], so the modulus ranges over $[0, 128\sqrt{2}]$. Computing $2M/N \times 128\sqrt{2}$ therefore yields values in the same range as the FFT output of the system Visualizer. Since phase information is not used for drawing, we can use the value scaled in this manner as the real part of the output FFT data and set the imaginary part to 0.
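A minimal sketch of this scaling, assuming the amplitude $2M/N$ has already been computed (the clamping to the signed byte range is my own addition, since a byte cannot hold the full $128\sqrt{2}$ upper bound when everything is packed into the real part):

```java
// Scales a normalized amplitude (2M/N in [0, 1]) to the system Visualizer's byte range.
class FftScaler {
    static final double VISUALIZER_MAX_MODULUS = 128 * Math.sqrt(2);

    /** Returns the scaled real part; the caller sets the imaginary part to 0. */
    static byte toVisualizerReal(double amplitude) {
        double scaled = amplitude * VISUALIZER_MAX_MODULUS;
        // Clamp into the signed byte range before narrowing.
        return (byte) Math.max(-128, Math.min(127, Math.round(scaled)));
    }
}
```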

Because the data arrives at high frequency, we use a pool to hold the data array objects to avoid memory jitter caused by frequent object creation: each time, an array object of the required size is taken from the pool, filled with the sampled data, and enqueued for sending; after the consumer finishes with the data, the array object is returned to the pool.
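A simple array pool along these lines might look like the following Java sketch (illustrative, not the article's actual code; arrays are grouped by size so that `obtain` can reuse a matching idle array):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Pools float arrays by size to avoid per-callback allocations.
class FloatArrayPool {
    private final Map<Integer, ArrayDeque<float[]>> idle = new HashMap<>();

    /** Obtain an array of the required size, reusing an idle one when possible. */
    synchronized float[] obtain(int size) {
        ArrayDeque<float[]> q = idle.get(size);
        return (q != null && !q.isEmpty()) ? q.poll() : new float[size];
    }

    /** Return an array to the pool after the consumer is done with it. */
    synchronized void recycle(float[] array) {
        idle.computeIfAbsent(array.length, k -> new ArrayDeque<>()).offer(array);
    }
}
```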

Data processing

Different effects process the data differently but, ignoring the details, the existing effects in Cloud Music, except for Cosmic Dust and Lonely Planet, share basically the same processing pipeline, as shown in the figure below:

First, the index positions of the required frequency data in the FFT array are calculated according to the frequency range selected by the effect:


$$f_r=f_s/N,\quad start=\lceil MIN/f_r \rceil,\quad end=\lfloor MAX/f_r \rfloor$$

where $f_s$ is the sampling rate, $N$ is the FFT window size, $f_r$ is the frequency resolution, $MIN$ is the start of the frequency range, and $MAX$ is the end of the frequency range.

Then, according to the number of data points the effect requires, the FFT data within the frequency range is sampled, or a single FFT value is used to represent multiple data points.
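The index calculation above can be sketched in a few lines of Java (names are illustrative):

```java
// Locates the FFT array indices covering a frequency range [minHz, maxHz].
class FreqRange {
    /** Returns {start, end} given sample rate fs and FFT window size n. */
    static int[] indices(double minHz, double maxHz, double fs, int n) {
        double fr = fs / n;                       // frequency resolution f_r = f_s / N
        int start = (int) Math.ceil(minHz / fr);  // ceil(MIN / f_r)
        int end = (int) Math.floor(maxHz / fr);   // floor(MAX / f_r)
        return new int[]{start, end};
    }
}
```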

Then calculate the decibels:


$$db=20\log_{10}M$$

where $M$ is the modulus of the FFT data.

Then the decibels are converted into heights:


$$h=\frac{db}{MAX\_DB} \cdot maxHeight$$

where $MAX\_DB$ is the preset maximum decibel value and $maxHeight$ is the maximum height required by the current effect.

Finally, the calculated height is smoothed.
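The decibel and height conversions can be sketched as follows (the clamping of the height to [0, maxHeight] is my own addition for values outside the preset range):

```java
// Converts an FFT modulus to decibels and then to a drawable height.
class HeightMapper {
    /** db = 20 * log10(M); the modulus is assumed to be positive. */
    static double toDecibels(double modulus) {
        return 20 * Math.log10(modulus);
    }

    /** h = db / MAX_DB * maxHeight, clamped to [0, maxHeight]. */
    static double toHeight(double db, double maxDb, double maxHeight) {
        double h = db / maxDb * maxHeight;
        return Math.max(0, Math.min(maxHeight, h));
    }
}
```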

Smoothing

Smoothing the final data used to draw results in softer curves and a better visual effect, as shown below:

There are many data smoothing algorithms; weighing effect quality against computational cost, we chose Savitzky-Golay filtering. The formulas below correspond to window sizes of 5, 7, and 9 respectively; different window sizes can be chosen as needed.


$$Y_i={1 \over 35}(-3y_{i-2}+12y_{i-1}+17y_i+12y_{i+1}-3y_{i+2})$$

$$Y_i={1 \over 21}(-2y_{i-3}+3y_{i-2}+6y_{i-1}+7y_i+6y_{i+1}+3y_{i+2}-2y_{i+3})$$

$$Y_i={1 \over 231}(-21y_{i-4}+14y_{i-3}+39y_{i-2}+54y_{i-1}+59y_i+54y_{i+1}+39y_{i+2}+14y_{i+3}-21y_{i+4})$$
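As an example, the 5-point window above can be implemented as follows (a minimal sketch; copying the two edge samples on each side unchanged is my own simplification, since the article does not specify edge handling):

```java
// Savitzky-Golay smoothing with the 5-point window (-3, 12, 17, 12, -3) / 35.
class Smoother {
    private static final double[] W5 = {-3, 12, 17, 12, -3};

    static double[] smooth5(double[] y) {
        double[] out = y.clone();  // edge samples are kept as-is
        for (int i = 2; i < y.length - 2; i++) {
            double s = 0;
            for (int j = -2; j <= 2; j++) {
                s += W5[j + 2] * y[i + j];
            }
            out[i] = s / 35.0;
        }
        return out;
    }
}
```

Note that Savitzky-Golay filtering leaves polynomial trends intact: smoothing a linear ramp returns the ramp unchanged, which is why it softens curves without flattening peaks as much as a moving average would.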

The data changes after smoothing are shown in the figure below:

BufferQueue

Some effects require quite complex data processing. To improve parallelism and reduce main-thread time consumption, we borrow the idea of BufferQueue from the system graphics framework and implement a simple BufferQueue that carries the effect drawing data and connects data processing with drawing. Its working process is shown in the following figure:

When a rendering class that uses a BufferQueue is initialized, a BufferQueue of the appropriate size is created as needed and a Looper thread is started to perform data processing.

The data processing part corresponds to the Producer of the BufferQueue. When the FFT data arrives, the Handler bound to the Looper thread sends the data to the Looper thread for data processing. During data processing, the Producer’s dequeue method is first called to obtain the idle Buffer from BufferQueue, and then the FFT data is processed to generate the required data and fill the Buffer. Finally, the Buffer is added to the Queued queue of the BufferQueue by calling Producer’s queue method.

Drawing corresponds to the Consumer part of the BufferQueue. Calling the Producer's queue method triggers the ConsumerListener's onBufferAvailable callback, and the main thread consumes Buffers in the callback via a Handler bound to the main thread. The Consumer's acquire method is first called to get a Buffer from the BufferQueue's queued queue, then the required data is read from the Buffer for drawing, and finally the Consumer's release method is called to return the previous Buffer to the BufferQueue.
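Stripped of the threading and listener machinery, the queue itself reduces to two deques, which the following Java sketch illustrates (a simplified illustration, not the article's actual implementation; real use would add the Handler dispatch and onBufferAvailable callback described above):

```java
import java.util.ArrayDeque;

// Minimal BufferQueue: a free list for the producer and a queued list for the consumer.
class SimpleBufferQueue {
    private final ArrayDeque<float[]> free = new ArrayDeque<>();
    private final ArrayDeque<float[]> queued = new ArrayDeque<>();

    SimpleBufferQueue(int bufferCount, int bufferSize) {
        for (int i = 0; i < bufferCount; i++) free.offer(new float[bufferSize]);
    }

    /** Producer: take an idle buffer to fill; null if none is free. */
    synchronized float[] dequeue() { return free.poll(); }

    /** Producer: hand a filled buffer to the consumer side. */
    synchronized void queue(float[] buf) { queued.offer(buf); }

    /** Consumer: take the next filled buffer; null if none is queued. */
    synchronized float[] acquire() { return queued.poll(); }

    /** Consumer: return a drained buffer to the free list. */
    synchronized void release(float[] buf) { free.offer(buf); }
}
```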

Drawing

The main work of the drawing part is calling the system Canvas API to render the processed data into the desired effect. The specific drawing calls vary from effect to effect and are not described here; instead, this section shares some optimization lessons on experience and performance that matter for rendering.

Since the FFT data callback interval is greater than 16 ms, drawing only when data arrives produces visible stutter. For a better visual effect, we need to insert transition frames between two callbacks to achieve a gradual animation. The implementation computes the current height by interpolating, within the interval between one data arrival and the next, according to the time elapsed since the latest data arrived, and redraws repeatedly, as shown in the figure below:
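A linear version of this interpolation might look like the following sketch (illustrative; an easing function could replace the linear fraction for a smoother feel):

```java
// Interpolates the drawn height between two data callbacks to create transition frames.
class Interpolator {
    /**
     * Height to draw elapsedMs after the latest data arrived, where intervalMs is
     * the expected interval between callbacks; clamps at the target once elapsed.
     */
    static float heightAt(float prevHeight, float targetHeight,
                          long elapsedMs, long intervalMs) {
        float fraction = Math.min(1f, (float) elapsedMs / intervalMs);
        return prevHeight + (targetHeight - prevHeight) * fraction;
    }
}
```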

Batching and caching are two performance optimization methods that also apply to effect rendering. For effects that draw many lines or points, call drawLines or drawPoints to batch the work instead of calling drawLine or drawPoint in a loop, to reduce execution time.

Conclusion

This article introduced the background knowledge and implementation process of Android audio visualization and offered some solutions and optimization ideas. It focuses on the general scheme and does not cover the implementation of any specific effect; we hope it inspires readers to build their own dazzling effects.

References

  • Fourier transform
  • Discrete Fourier transform
  • Fast Fourier transform
  • Data smoothing
  • Savitzky-Golay filtering
  • Android BufferQueue

This article was published by the NetEase Cloud Music big front-end team. Reproduction in any form without authorization is prohibited. We recruit front-end, iOS, and Android engineers all year round. If you are ready for a job change and you like Cloud Music, join us at grp.music-fe(at)corp.netease.com!