This is the first day of my participation in the November Gengwen Challenge. Check out the event details: The Last Gengwen Challenge of 2021.

Project address: NVIDIA/tacotron2. Clone it first:

git clone https://github.com/NVIDIA/tacotron2

Configure the environment

My experimental environment (Ubuntu):

python==3.6.12, numpy==1.17.0, matplotlib==2.1.0, scipy==1.0.0, numba==0.48.0, librosa==0.6.0, tensorflow==1.15.2, torch==1.1.0, torchvision==0.3.0, inflect==0.2.5, Unidecode==1.0.22
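For reproducibility, the pinned versions above can be collected into a requirements file (version numbers are taken directly from the list above; install with `pip install -r requirements.txt`):

```text
numpy==1.17.0
matplotlib==2.1.0
scipy==1.0.0
numba==0.48.0
librosa==0.6.0
tensorflow==1.15.2
torch==1.1.0
torchvision==0.3.0
inflect==0.2.5
Unidecode==1.0.22
```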

Since the CUDA version on our lab server is 9.0, we can only use PyTorch 1.1.0; otherwise the GPU is unusable. However, this project's source code uses some features introduced in PyTorch 1.3 and above, so I first have to modify parts of the source (skip this step if your PyTorch version is 1.3 or later).

The first change is on line 9 of utils.py:

# mask = (ids < lengths.unsqueeze(1)).bool()
mask = (ids < lengths.unsqueeze(1)).to(torch.bool)

The second change is on lines 401 and 488 of model.py:

# memory, mask=~get_mask_from_lengths(memory_lengths)
memory, mask = get_mask_from_lengths(memory_lengths) <= 0

# mask=~get_mask_from_lengths(output_lengths)
mask = get_mask_from_lengths(output_lengths) <= 0
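The idea behind these replacements: in old PyTorch, comparison operators return a 0/1 uint8 mask rather than a bool tensor, so the `~` mask inversion used by newer versions can be emulated with `<= 0`, which flips every 0 to True and every 1 to False. A minimal illustration of the same trick, written in NumPy only so the sketch runs without any particular PyTorch version:

```python
import numpy as np

# get_mask_from_lengths produces a 0/1 mask: position j is valid if j < length.
ids = np.arange(5)
lengths = np.array([3, 5])
mask = (ids[None, :] < lengths[:, None]).astype(np.uint8)

# On a 0/1 mask, `mask <= 0` selects exactly the padded positions,
# i.e. it behaves like the `~mask` of newer PyTorch versions.
inverted = mask <= 0
```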

Those are the changes that only some people need to make. The change everyone needs to make is to adjust the contents of the three files in the filelists/ directory.

Each line in these files has two fields: the first is the path to an audio file, and the second is the text corresponding to that audio.

The first field is the one we need to change, depending on where you put the downloaded LJSpeech dataset. For example, I placed LJSpeech-1.1/ under tacotron2/, in the same directory as tacotron2/train.py, so each audio path needs to be updated to match that location.
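As a sketch of that edit in code (the `DUMMY/` placeholder prefix and the `LJSpeech-1.1/wavs/` target are assumptions based on where you unpacked the dataset; adjust to your own layout), each filelist line can be rewritten like this:

```python
import os

def fix_filelist_line(line, prefix="LJSpeech-1.1/wavs/"):
    """Swap the directory of the wav path, keeping the filename and transcript.

    Filelist lines look like 'path/to/audio.wav|transcript text'.
    """
    wav_path, text = line.split("|", 1)
    return prefix + os.path.basename(wav_path) + "|" + text

# Example with a made-up line in the filelist format:
fixed = fix_filelist_line("DUMMY/LJ001-0001.wav|Printing, in the only sense.")
```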

Start training

Single GPU

If you have only one GPU, run the following command to start training:

python train.py --output_directory=outdir --log_directory=logdir

Multiple GPUs

For multi-GPU training, first install Apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./

Then manually create a log directory: mkdir tacotron2/logs

Finally, run the following command

python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
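The --hparams flag takes comma-separated key=value overrides of the defaults in hparams.py. As an illustration of that syntax only (this is a simplified stand-in I wrote for clarity, not the project's actual parser):

```python
def parse_overrides(s):
    """Parse a 'key=value,key=value' override string into a dict,
    converting 'True'/'False' to booleans and leaving other values as strings."""
    out = {}
    for pair in s.split(","):
        key, value = pair.split("=", 1)
        if value in ("True", "False"):
            out[key] = value == "True"
        else:
            out[key] = value
    return out

overrides = parse_overrides("distributed_run=True,fp16_run=True")
```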

Testing

The test code is already provided, in the inference.ipynb file.

The first thing to notice is that the waveglow/ folder in the original project is empty; we need to clone the WaveGlow code before the denoiser can be imported:

git clone https://github.com/NVIDIA/waveglow.git
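After cloning, the notebook also has to be able to import from the new directory. The notebook handles this by appending waveglow/ to the module search path before importing the denoiser (run from the tacotron2/ root), along the lines of:

```python
import sys

# Make the cloned waveglow/ directory importable so that
# `from denoiser import Denoiser` resolves when run from tacotron2/.
sys.path.append('waveglow/')
```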

Then make sure the TensorFlow version is 1.x; version 2 will throw an error. If unidecode is missing, install it yourself with pip install unidecode.

Those are some of the pitfalls you will hit when running the program; below are the parts of the code that need to be changed.

First of all, in the Load model from checkpoint cell, checkpoint_path can point to your own trained model instead of the official pre-trained one, for example a checkpoint_59000 file under your output directory.

The Load WaveGlow for mel2audio synthesis and denoiser cell needs the waveglow_256channels_universal_v5.pt file, which you have to download.

Once you’ve done all the above steps, you’re ready to run

Solutions to common errors reported during training

  1. CUDA out of memory: change the batch_size parameter in hparams.py to a smaller value.
  2. No module named numba.decorators: first uninstall numba (pip uninstall numba), then install version 0.48.0 (pip install numba==0.48.0).
  3. numpy.core.multiarray failed to import: make sure your installed numpy version is greater than 1.15 and less than 1.19.
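A quick way to check item 3 against the installed copy (the helper name here is mine, just for illustration):

```python
def numpy_version_ok(version):
    """True if a numpy version string is greater than 1.15 and less than 1.19."""
    major, minor = (int(x) for x in version.split(".")[:2])
    return (1, 15) < (major, minor) < (1, 19)

# Usage against your environment:
# import numpy
# print(numpy_version_ok(numpy.__version__))
```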