eGuideDog
free software for the blind

A Brief Introduction to Open Source Chinese TTS

(updated on May 12, 2026)

eSpeak is a compact open-source text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents.
eSpeak's voice sounds quite robotic, but the engine is small and fast. It is the default TTS engine for the NVDA and Orca screen readers.
In September 2007, eSpeak added support for Cantonese (see also its successor, eSpeak NG).
In November 2007, eSpeak added support for Mandarin Chinese.
The standard eSpeak installation package does not include full support for Cantonese and Mandarin, so we provide an eSpeak-Chinese archive package.

In July 2008, Ekho was released. It supports Cantonese and Mandarin Chinese. Its voice quality is better than that of eSpeak.
In December 2012, Ekho added support for Tibetan. The Tibetan voice data is not included in the default Ekho package; see the installation instructions for Ekho's Tibetan voice.
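eSpeak NG and Ekho are both command-line programs, so a quick way to try their Chinese voices is to call them from a script. The Python sketch below builds the command line and then runs it with `subprocess`. It makes two assumptions:

- eSpeak NG uses the voice IDs `cmn` (Mandarin) and `yue` (Cantonese).
- Ekho selects its `Mandarin`, `Cantonese` and `Tibetan` voices with `-v`.

Please check these names against `espeak-ng --voices` and Ekho's help output on your own system. The Tibetan voice also needs its separately installed voice data.

```python
"""Minimal sketch: build and run command lines for eSpeak NG and Ekho.

Assumptions (verify against your installed versions):
  * eSpeak NG voice IDs "cmn" (Mandarin) and "yue" (Cantonese);
  * Ekho voice names "Mandarin", "Cantonese" and "Tibetan", selected with -v.
"""
import shutil
import subprocess
from typing import List

# Assumed mapping from engine -> language -> voice name passed to -v.
VOICES = {
    "espeak-ng": {"mandarin": "cmn", "cantonese": "yue"},
    "ekho": {"mandarin": "Mandarin", "cantonese": "Cantonese", "tibetan": "Tibetan"},
}


def build_command(engine: str, language: str, text: str) -> List[str]:
    """Return the argv list asking `engine` to speak `text` in `language`."""
    try:
        voice = VOICES[engine][language]
    except KeyError:
        raise ValueError(f"no {language!r} voice known for {engine!r}") from None
    return [engine, "-v", voice, text]


def speak(engine: str, language: str, text: str) -> None:
    """Speak `text` aloud, failing clearly if the engine is not installed."""
    if shutil.which(engine) is None:
        raise RuntimeError(f"{engine} is not installed or not on PATH")
    # check=True raises CalledProcessError if the engine exits with an error.
    subprocess.run(build_command(engine, language, text), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without audio.
    print(build_command("ekho", "cantonese", "你好"))
```

If Ekho is installed, calling `speak("ekho", "cantonese", "你好")` should read the text aloud. Keeping command construction separate from execution means the command lines can be checked without a sound card or either engine installed.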

As deep learning has advanced, a growing number of higher-quality open-source Chinese TTS solutions have emerged. Below are some examples.

In November 2020, zhtts was released; it can run in real time on a CPU.

In December 2019, PaddleSpeech was released.
In February 2023, PaddleSpeech added support for Cantonese, but the pretrained model does not produce very clear or fluent speech.
Here is an installation guide for PaddleSpeech.

In November 2023, EmotiVoice was released. It is a multi-voice, prompt-controlled TTS engine that supports both English and Mandarin Chinese.

In 2021, Coqui TTS was released with support for Mandarin Chinese.
In September 2025, a Cantonese model for Coqui TTS was released.

In January 2026, Piper TTS, a fast, local neural text-to-speech engine, added support for Mandarin Chinese.