A Brief Introduction to Open Source Chinese TTS

(updated on Sep 16, 2025)

eSpeak is a compact open-source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents.
eSpeak's voice is quite robotic, but the engine is small and fast. It is the default TTS engine for the NVDA and Orca screen readers.
In September 2007, eSpeak added support for Cantonese (see also its successor, eSpeak NG).
In November 2007, eSpeak added support for Mandarin Chinese.
The standard eSpeak installation does not include full support for either Cantonese or Mandarin. We provide an archive package for eSpeak-Chinese.
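
As a rough illustration of selecting a Chinese voice, here is a minimal Python sketch that builds an eSpeak NG command line. It assumes the `espeak-ng` binary is on the PATH and that the Mandarin and Cantonese voices are named `cmn` and `yue`, as in eSpeak NG (classic eSpeak used different voice names). The helper names `build_command` and `speak` are hypothetical and exist only for this example.

```python
import shutil
import subprocess

# Voice identifiers as used by eSpeak NG (assumption: your build ships these voices).
VOICES = {"mandarin": "cmn", "cantonese": "yue"}


def build_command(text, language="mandarin", wav_path=None, binary="espeak-ng"):
    """Return the argument list for speaking `text` with a Chinese voice."""
    cmd = [binary, "-v", VOICES[language]]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text, language="mandarin", wav_path=None):
    """Run eSpeak NG if it is installed; return False when the binary is missing."""
    cmd = build_command(text, language, wav_path)
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

Calling `speak("你好", "cantonese", "hello.wav")` would write the synthesized audio to `hello.wav` when eSpeak NG is available, and simply return `False` otherwise.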

In July 2008, Ekho was released. It supports Cantonese and Mandarin Chinese. Its voice quality is better than that of eSpeak.
In December 2012, Ekho added support for Tibetan. Tibetan voice data is not packaged with Ekho by default; see the instructions for installing the Tibetan voice for Ekho.

As deep learning has advanced, increasingly capable open-source Chinese TTS engines have appeared. Below are some examples.

In November 2020, zhtts, which runs on a CPU in real time, was released.

In December 2019, PaddleSpeech was released.
In February 2023, PaddleSpeech added support for Cantonese. However, the output of the pretrained model is not very clear or fluent.
Here is an installation guide for PaddleSpeech.

In November 2023, EmotiVoice was released. It is a multi-voice, prompt-controlled TTS engine that speaks both English and Mandarin Chinese.

In 2021, Coqui TTS, which supports Mandarin Chinese, was released.
In September 2025, a Cantonese model for Coqui TTS was released.