Using PyTorch for data manipulation and transformation in audio signal processing

🌊 Author's home page: Haichong 🌊 Author profile: 🥇 HDZ core group member, 🏆 full-stack quality creator, 🌊 ranked in the top ten of the C station weekly list for the second year running. Fan benefits: four books are given away to fans every week, and various small gifts every month (enamel cup, pillow, mouse pad, mug, etc.)

Torchaudio: PyTorch’s audio library

Torchaudio aims to apply PyTorch to the audio domain. By building on PyTorch, Torchaudio follows the same philosophy: strong GPU acceleration, a focus on features that are trainable through the autograd system, and a consistent style (tensor names and dimension names). It is therefore primarily a machine learning library rather than a general signal processing library. The benefits of PyTorch carry over to Torchaudio because all computations are PyTorch operations, which makes the library easy to use and feel like a natural extension.

Support for audio I/O (load files, save files)
- Load the following formats into a torch Tensor using SoX
  - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms,
  - aiff, au, amr, mp2, mp4, ac3, avi, wmv,
  - mpeg, ircam, and any other format supported by libsox.
  - Kaldi (ark/scp)
Dataloaders for common audio datasets (VCTK, YesNo)
Common audio transforms
- Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample
Compliance interfaces: use PyTorch to run code that aligns with other libraries
- Kaldi: spectrogram, fbank, mfcc, resample_waveform
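To make one of the transforms listed above concrete, here is a minimal plain-Python sketch of the kind of calculation an AmplitudeToDB-style transform performs on each value. The helper name `to_db` and its parameters are hypothetical and chosen for this example. It is not Torchaudio's implementation, which works on whole tensors.

```python
import math

def to_db(value, stype="power", ref=1.0, amin=1e-10):
    """Convert one power or amplitude value to decibels (hypothetical helper)."""
    # Power values use 10 * log10, amplitude values use 20 * log10.
    multiplier = 10.0 if stype == "power" else 20.0
    # Clamp to amin so that log10 never receives zero.
    clamped = max(value, amin)
    return multiplier * (math.log10(clamped) - math.log10(max(ref, amin)))

print(to_db(100.0))                    # power of 100 relative to ref=1.0
print(to_db(10.0, stype="amplitude"))  # amplitude of 10 relative to ref=1.0
```

Clamping to a small positive floor is a common design choice here: silent samples would otherwise map to negative infinity.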

Dependencies

PyTorch (see below for the compatible versions)
libsox v14.3.2 or later (only needed when building from source)
[optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above

The table below lists the corresponding Torchaudio versions and the supported Python versions.

torch	torchaudio	python
`master` / `nightly`	`master` / `nightly`	`>=3.6`
`1.7.0`	`0.7.0`	`>=3.6`
`1.6.0`	`0.6.0`	`>=3.6`
`1.5.0`	`0.5.0`	`>=3.5`
`1.4.0`	`0.4.0`	`==2.7`, `>=3.5`, `<=3.8`
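The pairing in the table can be expressed as a small lookup, which is handy when pinning versions in a script. This is only a sketch: the dictionary copies the released pairs from the table, while the function name `matching_torchaudio` and the handling of local build suffixes such as `+cu101` are assumptions made for this example.

```python
# Pairs copied from the table above: torch release -> torchaudio release.
COMPATIBLE = {
    "1.7.0": "0.7.0",
    "1.6.0": "0.6.0",
    "1.5.0": "0.5.0",
    "1.4.0": "0.4.0",
}

def matching_torchaudio(torch_version):
    """Return the torchaudio release listed for a torch release, or None."""
    # Assumption: drop a local build suffix such as "+cu101" before the lookup.
    base = torch_version.split("+")[0]
    return COMPATIBLE.get(base)

print(matching_torchaudio("1.6.0"))
print(matching_torchaudio("1.7.0+cu101"))
```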

Installation

Binary distributions

To install the latest version using Anaconda, run:

conda install -c pytorch torchaudio

To install the latest pip wheels, run:

pip install torchaudio -f https://download.pytorch.org/whl/torch_stable.html

(If you do not have torch installed already, this will install torch from PyPI by default. If you need a different torch configuration, install torch before running this command.)
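After installing, it can be useful to confirm which versions actually ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints None for a package that is not installed in this environment.
for name in ("torch", "torchaudio"):
    print(name, installed_version(name))
```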

Nightly build

Note that nightly builds are built on top of PyTorch's nightly builds. Therefore, when you use a nightly build of Torchaudio, you also need to install the latest PyTorch.

pip

pip install numpy
pip install --pre torchaudio -f https://download.pytorch.org/whl/nightly/torch_nightly.html

conda

conda install -y -c pytorch-nightly torchaudio

From source

If your system configuration is not among the supported configurations above, you can build Torchaudio from source.

This requires libsox v14.3.2 or later.

Examples of how to install SoX

OSX (Homebrew):

brew install sox

Linux (Ubuntu):

sudo apt-get install sox libsox-dev libsox-fmt-all

Anaconda:

conda install -c conda-forge sox

# Linux
python setup.py install

# OSX 
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

Alternatively, the build process can build libsox and some optional codecs statically, and Torchaudio can link against them, if you set the environment variable BUILD_SOX=1. The build process will fetch and build libmad, lame, flac, vorbis, opus, and libsox before building the extension. This process requires cmake and pkg-config.

# Linux
BUILD_SOX=1 python setup.py install

# OSX
BUILD_SOX=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install

This is known to work on Linux and Unix distributions such as Ubuntu and CentOS 7, and on macOS. If you try this on a new system and find a solution that makes it work, feel free to share it by opening an issue.

Troubleshooting

checking build system type... ./config.guess: unable to guess system type

New environments, such as Jetson Aarch, cannot be detected correctly because the codecs' configuration files are old. The config.guess file in ./third_party/tmp/lame-3.99.5/config.guess and ./third_party/tmp/libmad-0.15.1b/config.guess needs to be updated or replaced with a newer one: github.com/gcc-mirror/…

Quick usage

import torchaudio

waveform, sample_rate = torchaudio.load('foo.wav')  # load tensor from file
torchaudio.save('foo_save.wav', waveform, sample_rate)  # save tensor to file

Backend dispatch

By default, on OSX and Linux, Torchaudio uses SoX as the backend to load and save files. You can switch the backend to SoundFile with the following code. See SoundFile for installation instructions.

import torchaudio
torchaudio.set_audio_backend("soundfile")  # switch backend

waveform, sample_rate = torchaudio.load('foo.wav')  # load tensor from file, as usual
torchaudio.save('foo_save.wav', waveform, sample_rate)  # save tensor to file, as usual

Unlike SoX, SoundFile does not currently support MP3.
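Because of the MP3 limitation, a script that handles mixed file types may want to decide on a backend per file. The following is a minimal sketch of that decision only. The helper `pick_backend` is hypothetical, and the string "sox" is an assumed backend name for illustration (only "soundfile" appears in the text above), so check the names your Torchaudio version accepts before passing the result to `torchaudio.set_audio_backend`.

```python
from pathlib import Path

def pick_backend(path):
    """Suggest a backend name for a file path (hypothetical helper)."""
    # SoundFile cannot read MP3, so route MP3 files to SoX.
    # "sox" is an assumed backend name used only for illustration.
    if Path(path).suffix.lower() == ".mp3":
        return "sox"
    return "soundfile"

print(pick_backend("speech.MP3"))
print(pick_backend("foo.wav"))
```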

API reference

The API reference is here: pytorch.org/audio/

Conventions

Since Torchaudio is a machine learning library built on top of PyTorch, it is standardized around the following naming conventions. Tensors are assumed to have "channel" as the first dimension and time as the last dimension (when applicable). This keeps them consistent with PyTorch's dimensions. Size names use the prefix n_ (for example, "a tensor of size (n_freq, n_mel)"), while dimension names do not (for example, "a tensor of dimension (channel, time)").

waveform: a tensor of audio samples with dimensions (channel, time)
sample_rate: the rate of the audio (number of samples per second)
specgram: a spectrogram tensor with dimensions (channel, freq, time)
mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
hop_length: the number of samples between the starts of consecutive frames
n_fft: the number of Fourier bins
n_mel, n_mfcc: the number of mel and MFCC bins
n_freq: the number of bins in a linear spectrogram
min_freq: the lowest frequency of the lowest band in a spectrogram
max_freq: the highest frequency of the highest band in a spectrogram
win_length: the length of the STFT window
window_fn: a function that creates a window, for example torch.hann_window
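To see how these size names relate to each other, the sketch below computes the frequency and frame counts of a spectrogram from n_fft and hop_length. It assumes a one-sided STFT and a signal padded so that frames are centered; other settings give different counts. The helper names and the example values are hypothetical.

```python
def n_freq_bins(n_fft):
    """Number of bins in a one-sided linear spectrogram (assumed setting)."""
    return n_fft // 2 + 1

def n_frames(n_samples, hop_length):
    """Number of frames when the signal is padded so frames are centered (assumed setting)."""
    return n_samples // hop_length + 1

# Hypothetical example: one second of 16 kHz audio.
sample_rate = 16000
n_fft, hop_length = 400, 200

print(n_freq_bins(n_fft))                 # 201
print(n_frames(sample_rate, hop_length))  # 81
```

Under these assumptions, a waveform of dimension (1, 16000) would yield a specgram of dimension (1, 201, 81).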

Transforms expect and return the following dimensions.

Spectrogram: (channel, time) -> (channel, freq, time)
AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
MelScale: (channel, freq, time) -> (channel, mel, time)
MelSpectrogram: (channel, time) -> (channel, mel, time)
MFCC: (channel, time) -> (channel, mfcc, time)
MuLawEncode: (channel, time) -> (channel, time)
MuLawDecode: (channel, time) -> (channel, time)
Resample: (channel, time) -> (channel, time)
Fade: (channel, time) -> (channel, time)
Vol: (channel, time) -> (channel, time)
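As an illustration of a transform that keeps the (channel, time) layout, here is a plain-Python sketch of mu-law companding applied sample by sample to a nested list. It follows the standard mu-law formula with 256 quantization levels. The function names are hypothetical, and the sketch makes no claim to match Torchaudio's tensor implementation.

```python
import math

def mu_law_encode(x, quantization_channels=256):
    """Map a sample in [-1, 1] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    # Compress the magnitude logarithmically while keeping the sign.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer.
    return int((compressed + 1) / 2 * mu + 0.5)

def mu_law_decode(code, quantization_channels=256):
    """Map an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    y = code / mu * 2 - 1
    # Invert the logarithmic compression.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# One channel, four samples: dimension (channel, time) = (1, 4).
waveform = [[0.0, 0.5, -0.5, 1.0]]
encoded = [[mu_law_encode(s) for s in channel] for channel in waveform]
decoded = [[mu_law_decode(c) for c in channel] for channel in encoded]

print(encoded)
print(decoded)
```

The decoded values are close to the originals but not identical, because quantizing to 256 levels loses information.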

Complex numbers are supported via tensors of dimension (…, 2), and Torchaudio provides complex_norm and angle to convert such a tensor into its magnitude and phase. Here and in the documentation, the ellipsis "…" is used as a placeholder for the remaining dimensions of a tensor, such as optional batch and channel dimensions.
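The (real, imaginary) pair layout can be illustrated without any tensors. The sketch below computes magnitude and phase for a single pair in plain Python, which is the per-element calculation that functions such as complex_norm and angle perform. The helper names `magnitude` and `phase` are hypothetical.

```python
import math

def magnitude(pair):
    """Magnitude of a complex number stored as (real, imag)."""
    real, imag = pair
    return math.hypot(real, imag)

def phase(pair):
    """Phase in radians of a complex number stored as (real, imag)."""
    real, imag = pair
    return math.atan2(imag, real)

# A list of (real, imag) pairs plays the role of a tensor of dimension (time, 2).
values = [(3.0, 4.0), (0.0, 1.0), (-1.0, 0.0)]
print([magnitude(v) for v in values])
print([phase(v) for v in values])
```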

Contributing guidelines

Please refer to CONTRIBUTING.md

Dataset disclaimer

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use them. It is your responsibility to determine whether you have permission to use a dataset under its license.

If you are a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

GitHub

Github.com/pytorch/aud…

Final words

The author is determined to build a fishing site with 100 small games, update progress: 40/100

I've been blogging about technology for a long time, mostly on Juejin, and this is my article on data manipulation and transformation with PyTorch for audio signal processing. I like to share technology and happiness through my articles. You can visit my blog at juejin.cn/user/204034… for more information. I hope you like it! 😊

💌 Your comments and suggestions are welcome in the comments section! 💌