Audio samples of "Chinese Text-To-Speech(TTS) based on Deep Learning"
Abstract: Tacotron2, a groundbreaking end-to-end speech synthesis system, is currently available only for English.
This paper improves Tacotron2 in several respects
and designs a Chinese speech synthesis scheme, which mainly includes:
adding a pre-processing module that converts Chinese text into phonetic characters, to handle properties of Chinese characters
such as non-phonetic writing, tone sandhi, and polyphonic characters;
using a pre-trained decoder to obtain better sound quality from a small corpus when Chinese training data are insufficient;
weighting the cross-entropy loss and replacing the linear transformation with a multi-layer perceptron to predict the stop token,
which effectively mitigates the abrupt-stop problem in Chinese speech synthesis;
and adding a multi-head attention mechanism, which further improves the quality of the synthesized Chinese speech.
Experimental comparisons of Mel spectrograms and Mel cepstral distance show that our work is effective and
enables Tacotron2 to better meet the requirements of Chinese speech synthesis.
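
As a rough illustration of the weighted cross-entropy loss for stop prediction mentioned in the abstract, the sketch below computes a binary cross-entropy in which frames labelled as "stop" carry their own weight. The function name `weighted_stop_loss`, the parameter `pos_weight`, and the value 5.0 are assumptions made for this example; the abstract does not state which class is weighted or by how much.

```python
import math

def weighted_stop_loss(probs, targets, pos_weight=5.0):
    """Mean binary cross-entropy with a separate weight on stop frames.

    probs:      predicted stop probabilities, one per decoder frame
    targets:    1 for a stop frame, 0 otherwise
    pos_weight: hypothetical weight on stop frames (assumed value, not from the paper)
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and of equal length")
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

# One utterance of five frames; only the last frame is a stop frame.
targets = [0, 0, 0, 0, 1]
missed_stop = [0.1, 0.1, 0.1, 0.1, 0.1]  # the model fails to signal the stop
good_stop = [0.1, 0.1, 0.1, 0.1, 0.9]    # the model signals the stop

print(weighted_stop_loss(missed_stop, targets))
print(weighted_stop_loss(good_stop, targets))
```

Because stop frames are rare (one per utterance here), an unweighted loss is dominated by the non-stop frames; with `pos_weight` above 1, a missed stop frame contributes more to the loss.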
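
For readers unfamiliar with the Mel cepstrum distance used in the evaluation, the sketch below shows one common way to compute it (often called Mel cepstral distortion, MCD). It assumes the two sequences of mel-cepstral frames are already time-aligned and skips the 0th (energy) coefficient, as is common; the function name `mel_cepstral_distance` is hypothetical, and the paper's exact formulation may differ.

```python
import math

def mel_cepstral_distance(ref, syn):
    """Mean MCD in dB between two time-aligned sequences of mel-cepstral frames.

    ref, syn: lists of frames; each frame is a list of cepstral coefficients.
    The 0th coefficient of each frame is skipped (an assumption of this sketch).
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, s in zip(ref, syn):
        squared_diff = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        total += scale * math.sqrt(squared_diff)
    return total / len(ref)

# Toy frames (made-up values, for illustration only).
reference = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
synthesized = [[1.0, 0.4, -0.2], [0.9, 0.4, 0.0]]

print(mel_cepstral_distance(reference, reference))
print(mel_cepstral_distance(reference, synthesized))
```

A lower value indicates that the synthesized cepstra are closer to the reference.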