speechbrain / speechbrain
- Wednesday, April 21, 2021, 00:29:56
A PyTorch-based Speech Toolkit
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.
The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, multi-microphone signal processing and many others.
SpeechBrain is currently in beta.
SpeechBrain provides various useful tools to speed up and facilitate research on speech technologies:
The Brain class is a fully customizable tool for managing training and evaluation loops over data. The annoying details of training loops are handled for you while retaining complete flexibility to override any part of the process when needed.
SpeechBrain supports state-of-the-art methods for end-to-end speech recognition:
dataset to facilitate the training over a large text dataset.
SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic feature extraction.
SpeechBrain provides different models for speaker recognition, identification, and diarization on different datasets.
Combining multiple microphones is a powerful approach to achieve robustness in adverse acoustic environments.
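The idea behind the Brain class mentioned earlier, a training loop that is handled for you but whose individual steps can still be overridden, can be sketched in plain Python. This is not SpeechBrain's implementation: MiniBrain, its single scalar weight, the finite-difference update, and the toy data are all assumptions made for illustration, and the method names are only meant to suggest which steps a subclass might override.

```python
"""Toy illustration of an overridable training loop (NOT SpeechBrain's Brain)."""
from typing import List, Sequence, Tuple


class MiniBrain:
    """Hypothetical base class: owns the loop, exposes overridable steps."""

    def __init__(self, weight: float = 0.0, lr: float = 0.05) -> None:
        self.weight = weight  # single trainable parameter of the toy model
        self.lr = lr          # learning rate for plain gradient descent

    def compute_forward(self, x: float) -> float:
        """Model prediction; override to change the model."""
        return self.weight * x

    def compute_objectives(self, prediction: float, target: float) -> float:
        """Loss for one example; override to change the objective."""
        return (prediction - target) ** 2

    def _loss_at(self, weight: float, x: float, target: float) -> float:
        """Evaluate the (possibly overridden) loss at a trial weight."""
        saved, self.weight = self.weight, weight
        try:
            return self.compute_objectives(self.compute_forward(x), target)
        finally:
            self.weight = saved  # always restore the real weight

    def fit_batch(self, x: float, target: float) -> float:
        """One update step using a central finite-difference gradient."""
        eps = 1e-6
        loss = self._loss_at(self.weight, x, target)
        grad = (
            self._loss_at(self.weight + eps, x, target)
            - self._loss_at(self.weight - eps, x, target)
        ) / (2 * eps)
        self.weight -= self.lr * grad
        return loss

    def fit(self, epochs: int, data: Sequence[Tuple[float, float]]) -> List[float]:
        """Run the loop and return the mean loss of every epoch."""
        history = []
        for _ in range(epochs):
            losses = [self.fit_batch(x, y) for x, y in data]
            history.append(sum(losses) / len(losses))
        return history


class AbsErrorBrain(MiniBrain):
    """Overrides only the objective; the loop itself is inherited unchanged."""

    def compute_objectives(self, prediction: float, target: float) -> float:
        return abs(prediction - target)


if __name__ == "__main__":
    toy_data = [(1.0, 2.0), (2.0, 4.0)]  # targets follow y = 2 * x
    brain = MiniBrain()
    history = brain.fit(epochs=50, data=toy_data)
    print(f"learned weight: {brain.weight:.3f}, final loss: {history[-1]:.6f}")
```

The design point is that AbsErrorBrain changes one method and reuses fit and fit_batch as they are, which is the kind of flexibility the description of the Brain class refers to.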
The recipes released with SpeechBrain implement speech processing systems with competitive or state-of-the-art performance. Below, we report the best performance achieved on some popular benchmarks:
Dataset | Task | System | Performance |
---|---|---|---|
LibriSpeech | Speech Recognition | CNN + Transformer | WER=2.46% (test-clean) |
TIMIT | Speech Recognition | CRDNN + distillation | PER=13.1% (test) |
TIMIT | Speech Recognition | wav2vec2 + CTC/Att. | PER=8.04% (test) |
CommonVoice (French) | Speech Recognition | CRDNN | WER=17.7% (test) |
VoxCeleb2 | Speaker Verification | ECAPA-TDNN | EER=0.69% (vox1-test) |
AMI | Speaker Diarization | ECAPA-TDNN | DER=2.13% (lapel-mix) |
VoiceBank | Speech Enhancement | MetricGAN+ | PESQ=3.08 (test) |
WSJ2MIX | Speech Separation | SepFormer | SDRi=22.6 dB (test) |
WSJ3MIX | Speech Separation | SepFormer | SDRi=20.0 dB (test) |
For more details, take a look at the corresponding implementation in recipes/dataset/.
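The WER and PER figures in the table are error rates. Assuming the usual definition (substitutions, deletions, and insertions from the minimum edit distance, divided by the reference length), the metric can be sketched as follows. The function names and the example sentences are illustrative and are not taken from the SpeechBrain recipes.

```python
"""Word/phone error rate via edit distance (illustrative sketch)."""
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of ref_tok
                    curr[j - 1] + 1,     # insertion of hyp_tok
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    hypothesis = "the cat sit on mat".split()
    print(f"WER = {error_rate(reference, hypothesis):.2f}%")
```

The same function gives a phone error rate when the sequences contain phone labels instead of words.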
SpeechBrain is designed to speed up research and development of speech technologies. Hence, our code is backed up by three different levels of documentation.
We are currently working towards integrating DNN-HMM for speech recognition and machine translation.
SpeechBrain is constantly evolving. New features, tutorials, and documentation will appear over time.
SpeechBrain can be installed via PyPI to rapidly use the standard library. Moreover, a local installation can be used by those who want to run experiments and modify/customize the toolkit. SpeechBrain supports both CPU and GPU computations. For most of the recipes, however, a GPU is necessary during training. Please note that CUDA must be properly installed to use GPUs.
To install via PyPI, once you have created your Python environment (Python 3.8+), simply type:
pip install speechbrain
Then you can access SpeechBrain with:
import speechbrain as sb
For a local installation, once you have created your Python environment (Python 3.8+), simply type:
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
Then you can access SpeechBrain with:
import speechbrain as sb
Any modification made to the speechbrain package will be automatically picked up, as we installed it with the --editable flag.
Please run the following commands to make sure your installation is working:
pytest tests
pytest --doctest-modules speechbrain
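Before running the test suite, it can help to confirm the prerequisites mentioned above: Python 3.8+, an importable speechbrain package, and (for GPU use) a working CUDA setup. The following is a minimal sketch using only the standard library plus an optional PyTorch query; the helper names check_environment and cuda_available are hypothetical and not part of SpeechBrain.

```python
"""Quick prerequisite check (illustrative helper, not part of SpeechBrain)."""
import importlib.util
import sys
from typing import Dict, Iterable, Optional

MIN_PYTHON = (3, 8)  # minimum version stated in the installation notes


def check_environment(packages: Iterable[str] = ("torch", "speechbrain")) -> Dict[str, bool]:
    """Report whether Python is recent enough and each package can be found."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in packages:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def cuda_available() -> Optional[bool]:
    """Return torch.cuda.is_available(), or None when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check works without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
    print(f"CUDA available: {cuda_available()}")
```

A missing entry only means the package was not found in the active environment; it says nothing about whether the pytest checks above would pass.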
In SpeechBrain, you can run experiments in this way:
> cd recipes/<dataset>/<task>/
> python experiment.py params.yaml
The results will be saved in the output_folder specified in the yaml file. The folder is created by calling sb.core.create_experiment_directory() in experiment.py. Both detailed logs and experiment outputs are saved there. Furthermore, less verbose logs are output to stdout.
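The logging behaviour described here (detailed logs in the output folder, a shorter stream on stdout) can be approximated with the standard logging module. The sketch below is not how SpeechBrain implements it: the function setup_experiment, the logger name, and the file name log.txt are assumptions chosen for illustration.

```python
"""Sketch of per-experiment logging (illustrative; not SpeechBrain's code)."""
import logging
import sys
from pathlib import Path


def setup_experiment(output_folder: str, name: str = "experiment_sketch") -> logging.Logger:
    """Create the output folder and a logger with two verbosity levels."""
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let handlers decide what to keep
    logger.propagate = False        # avoid duplicate lines via the root logger
    for handler in list(logger.handlers):
        logger.removeHandler(handler)  # make repeated calls idempotent
        handler.close()

    # Detailed log: everything from DEBUG upwards goes to a file in the folder.
    file_handler = logging.FileHandler(out_dir / "log.txt", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )

    # Less verbose log: only INFO and above is printed to stdout.
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(logging.Formatter("%(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With this setup, a call such as logger.debug(...) reaches only the file, while logger.info(...) reaches both the file and the terminal.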
Instead of a long and boring README, we prefer to provide you with different resources that can be used to learn how to customize SpeechBrain to adapt it to your needs.
SpeechBrain is released under the Apache License, version 2.0. The Apache license is a popular BSD-like license. SpeechBrain can be redistributed for free, even for commercial purposes, although you cannot remove the license headers (and under some circumstances, you may have to distribute a license document). Unlike the GPL, Apache is not a viral license that forces you to release your modifications to the source code. Also note that this project has no connection to the Apache Foundation, other than that we use the same license terms.