It has been more than a year since Mirco Ravanelli announced a new speech toolkit, and SpeechBrain has now arrived.

Heart of the Machine, editor: Mayonnaise.

Progress in speech processing is an important part of how artificial intelligence is changing people’s lives, and the rise of deep learning has brought rapid advances to the field in recent years. In the past, the common approach was to develop a different toolkit for each task, which cost users a lot of time: learning different programming languages and getting familiar with different code styles and standards. Today, most of these tasks can be handled with deep learning techniques.

Previously, developers commonly relied on speech toolkits such as Kaldi, ESPnet, CMU Sphinx, and HTK, each of which has its own shortcomings. Kaldi, for example, depends on a large amount of scripting glue while its core algorithms are written in C++, and modifying the structure of its neural networks is cumbersome; even experienced engineers can find debugging painful.

To make life easier for speech developers, Mirco Ravanelli, a member of Yoshua Bengio’s team, and others had earlier developed an open-source framework that tried to combine Kaldi’s efficiency with PyTorch’s flexibility. But by the developers’ own admission, “it’s not perfect enough.”

So, a little over a year ago, Mirco Ravanelli announced the creation of a new all-in-one speech toolkit called SpeechBrain, whose main mission is to be simple, flexible, and user-friendly.

Project address: github.com/speechbrain…

As an open-source, all-in-one speech toolkit based on PyTorch, SpeechBrain can be used to develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, and multi-microphone signal processing, with competitive performance. The team describes its features as “easy to use,” “easy to customize,” “flexible,” “modular,” and so on.

For machine learning researchers, SpeechBrain makes it easy to plug other models into speech research. For beginners, SpeechBrain is also easy to pick up: according to the team, the average developer needs only a few hours to get comfortable with the tool. In addition, the development team has released a number of tutorials for reference (speechbrain.github.io/tutorial_ba…).

Overall, SpeechBrain has the following highlights:

  • The development team provides HuggingFace-hosted pre-trained models with interfaces for running inference (a minimal usage sketch follows this list). If a HuggingFace model is not available, the team provides a Google Drive folder containing all the corresponding experiment results;

  • PyTorch data parallelism or distributed data parallelism can be used for multi-GPU training and inference;

  • Mixed-precision training to speed up training;

  • Transparent and fully customizable data input and output pipelines. SpeechBrain follows the PyTorch DataLoader and Dataset style, allowing users to customize their I/O pipelines.
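
As an illustration of the pre-trained model interface mentioned in the first point, the following minimal sketch loads an ASR model hosted on HuggingFace and transcribes an audio file. The class name, model identifier, and transcribe_file call follow the examples published with SpeechBrain; treat the exact names as assumptions and check the official tutorials for the current API.

# Minimal inference sketch (assumes a SpeechBrain LibriSpeech model published on HuggingFace).
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",        # HuggingFace model id (assumed)
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",  # local cache directory
)
# "path/to/audio.wav" is a placeholder for any 16 kHz speech recording.
print(asr_model.transcribe_file("path/to/audio.wav"))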

Fast installation

Developers can currently install SpeechBrain through PyPI, or use a local installation to run experiments and modify/customize the toolkit.

SpeechBrain supports Linux-based distributions and macOS (a workaround for Windows users is described at github.com/speechbrain…).

SpeechBrain supports both CPU and GPU, but for most recipes a GPU is required during training. Note that CUDA must be properly installed to use a GPU.
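
Before launching a GPU recipe, a quick sanity check in plain PyTorch (nothing SpeechBrain-specific) confirms that CUDA is visible:

# Plain PyTorch check that a CUDA device is available before running a GPU recipe.
import torch

print(torch.cuda.is_available())   # True only if CUDA is properly installed
print(torch.cuda.device_count())   # number of visible GPUs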

Installation tutorial: speechbrain.readthedocs.io/en/latest/i…

Install using PyPI

Once you have created the Python environment, simply type the following:

pip install speechbrain

You can then access SpeechBrain using the following command:

import speechbrain as sb

Local installation

Once you have created the Python environment, simply type the following:

git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .

You can then access SpeechBrain by:

import speechbrain as sb

Any changes made to the SpeechBrain source will be picked up automatically when the package is installed with the --editable flag.
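
One way to confirm that the editable install points at the local clone (a generic Python check, not part of the SpeechBrain documentation) is to look at where the package resolves from:

# Confirm the editable install resolves to the cloned source tree,
# not to a copy under site-packages.
import speechbrain
print(speechbrain.__file__)   # should print a path inside the cloned speechbrain/ directory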

SpeechBrain is not affiliated with any single organization, but its team includes members from labs and companies such as Mila, Nuance, Dolby Labs, Nvidia, Samsung, and ViaDialog. The two original leads are Mirco Ravanelli, a postdoctoral fellow at Mila, and Titouan Parcollet, a doctoral student at Avignon University. The SpeechBrain project is still a work in progress, and more developers are welcome.

Does this make Kaldi feel a little stressed out?