This is the first day of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge of 2021
Project address: NVIDIA/tacotron2. Clone it first:
git clone https://github.com/NVIDIA/tacotron2
Configure the environment
My experimental environment (Ubuntu):
python==3.6.12
numpy==1.17.0
matplotlib==2.1.0
scipy==1.0.0
numba==0.48.0
librosa==0.6.0
tensorflow==1.15.2
pytorch==1.1.0
torchvision==0.3.0
inflect==0.2.5
Unidecode==1.0.22
Since the CUDA version on our lab server is 9.0, we can only use PyTorch 1.1.0; otherwise we can't use the GPU. However, the source code of this project uses some features introduced in PyTorch 1.3 and later, so I first have to modify parts of the source code (skip this step if your PyTorch version is 1.3 or above).
The first place to change is line 9 of utils.py:
# mask = (ids < lengths.unsqueeze(1)).bool()
mask = (ids < lengths.unsqueeze(1)).to(torch.bool)
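What this utility computes is a padding mask over a batch of variable-length sequences. Here is a sketch of the same logic with plain Python lists (the real get_mask_from_lengths does this with torch tensors on the GPU; the sample lengths are made up for illustration):

```python
# Each sequence in the batch has a number of valid timesteps; positions
# beyond that length are padding and must be masked out.
lengths = [3, 1, 2]          # valid timesteps per sequence (hypothetical)
max_len = max(lengths)
ids = list(range(max_len))   # [0, 1, 2]

# True where the timestep is real, False where it is padding --
# the list-comprehension analogue of `ids < lengths.unsqueeze(1)`.
mask = [[i < n for i in ids] for n in lengths]
print(mask)  # [[True, True, True], [True, False, False], [True, True, False]]
```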
The second place to change is lines 401 and 488 of model.py:
# memory, mask=~get_mask_from_lengths(memory_lengths)
memory, mask = get_mask_from_lengths(memory_lengths) <= 0
# mask=~get_mask_from_lengths(output_lengths)
mask = get_mask_from_lengths(output_lengths) <= 0
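The `.bool()` method and `~` on boolean tensors only arrived with the bool dtype in PyTorch 1.2, so each logical inversion above is rewritten as a comparison. A plain-integer sketch of why a comparison, rather than bitwise NOT, is the safe way to invert a 0/1 mask:

```python
# On a 0/1 byte value, bitwise NOT does not produce the opposite mask
# value, while a `<= 0` comparison does:
mask_value = 1
print(0xFF & ~mask_value)    # 254 -- bitwise NOT of a byte, unusable as a mask
print(int(mask_value <= 0))  # 0   -- clean logical inversion
```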
Those are changes only some people need to make; the change everyone needs to make is to adapt the contents of the three files in the filelists/ directory.
Each file has two columns: the first column is the path of an audio file, and the second column is the text corresponding to that audio.
The first column is what we need to change, depending on where you downloaded the LJSpeech dataset to. For example, I put LJSpeech-1.1/ under tacotron2/, at the same level as tacotron2/train.py, so my paths have to be changed to match that location.
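The project README suggests doing this substitution with a sed one-liner over the filelists. A self-contained demo on a throwaway file (the LJSpeech-1.1/wavs path is an assumption based on the layout described above; point it at wherever your dataset actually lives):

```shell
# Recreate a sample filelist line with the repo's DUMMY placeholder:
mkdir -p filelists
printf 'DUMMY/LJ001-0001.wav|Printing, in the only sense.\n' > filelists/demo.txt

# Replace the placeholder with the real wavs directory in every filelist:
sed -i 's,DUMMY,LJSpeech-1.1/wavs,g' filelists/*.txt

cat filelists/demo.txt
```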
Start training
Single GPU
If you only have one GPU, run the following command to start training
python train.py --output_directory=outdir --log_directory=logdir
Multi-GPU
For multi-GPU training, first install Apex:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./
Then manually create a new directory:
mkdir tacotron2/logs
Finally, run the following command
python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
Test
The code for testing is already provided, namely the inference.ipynb file.
The first thing to notice is that the waveglow/ folder in the cloned project is empty, so we need to clone the WaveGlow code into it before denoiser can be imported:
git clone https://github.com/NVIDIA/waveglow.git
Then make sure your TensorFlow version is 1.x; version 2 will throw errors. If unidecode is missing, pip install it yourself.
Those are the pitfalls you will hit when running the notebook; below are the parts of the code that need to be changed.
First, in "Load model from checkpoint", checkpoint_path can point to a model you trained yourself instead of the official pre-trained model; mine, for example, is the checkpoint_59000 file in my output directory.
Loading WaveGlow for mel-to-audio synthesis and the denoiser requires the waveglow_256channels_universal_v5.pt file.
Once you’ve done all the above steps, you’re ready to run
Solutions to common errors reported during training
CUDA out of memory
Reduce the batch_size parameter in hparams.py.
No module named numba.decorators
First uninstall numba:
pip uninstall numba
Then install version 0.48.0:
pip install numba==0.48.0
numpy.core.multiarray failed to import
Make sure the numpy version you have installed is less than 1.19 and greater than 1.15.
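A small helper of my own (not part of the repo) to check whether an installed numpy version falls inside that working range; 1.17.0, the version from the environment list, satisfies it:

```python
def numpy_version_ok(version: str) -> bool:
    """Return True if `version` is greater than 1.15 and less than 1.19."""
    major, minor = map(int, version.split(".")[:2])
    return (1, 15) < (major, minor) < (1, 19)

print(numpy_version_ok("1.17.0"))  # True
print(numpy_version_ok("1.19.2"))  # False -- reinstall, e.g. pip install numpy==1.17.0
```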