Ekho Developer's Guide

(updated on May 7, 2011)

Source Code

Ekho uses SVN for source code version control. The source code can be checked out with the following command:

svn co https://e-guidedog.svn.sourceforge.net/svnroot/e-guidedog/eGuideDog_TTS e-guidedog/eGuideDog_TTS

Voice data is not in the directory above. Jyutping (Cantonese) is in e-guidedog/jyutping, while Pinyin (Mandarin) is in e-guidedog/ssb22/yali-voice. Hangul (Korean) and other voices are not in version control. It is not necessary to check out the voice data, because it is fairly large. Instead, we can copy it from a release package.

We can also browse the source code at http://e-guidedog.svn.sourceforge.net/viewvc/e-guidedog/

Here is the Doxygen documentation for version 4.5.2: Ekho doxygen

Ekho is written in C++, and we recommend this C++ Coding Standard, although the current code does not fully comply with it.

Software Architecture

Here is the component graph:

+----------------+
| Web TTS Client |
+----------------+-------+------------------+---------------+---------+
| Web TTS Server | SAPI5 |     OpenTTS/     | Command Line/ | Android |
|                |       | SpeechDispatcher |      Gtk+     |         |
+----------------+-------+------------------+---------------+---------+
|                                Ekho Library                         |
+----------+------------+------------+---------+--------+-------------+
|   Dict   |  Festival  | SoundTouch | Sndfile |  Lame  | Pulseaudio  |
+----------+------------+      /     +---------+    /   +-------------+
|  utfcpp  | sr-convert |    Sonic   |   GSM   | Vorbis |
+----------+------------+------------+---------+--------+

Here are some comments about the components:

Web TTS Client: This is WebSpeech, a JavaScript library that lets Web developers write pages with voice. It gets audio data from the Web TTS Server and plays it with SoundManager2.
Web TTS Server: This is a CGI script (ekho.pl) that calls Ekho through the Web.
SAPI5: This is the standard speech API on the Windows platform. Before supporting SAPI5, we built Ekho with MinGW. Now we build the SAPI5 version with VS2005.
OpenTTS: It's forked from Speech Dispatcher, which is a standard speech API on the Linux platform.
Command Line: ekho.cpp
Gtk+: "-g" option in ekho.cpp
Android: Exports a Java API for Android.
Ekho Library: libekho.cpp, ekho.h
Dict: ekho_dict.cpp, ekho_dict.h
Festival: Ekho speaks English through Festival TTS.
SoundTouch: A library for changing audio pitch, speed, etc.
Sonic: Sonic is a simple algorithm for speeding up or slowing down speech. (`git clone git://vinux-project.org/sonic`)
Sndfile: A library for reading/writing different kinds of audio formats.
LAME: An MP3 library. We need to pay attention to patent/legal issues when dealing with MP3.
Vorbis: A library for dealing with the OGG file format.
Pulseaudio: A sound system for POSIX OSes. We used Portaudio before, but Pulseaudio seems more compatible.
utfcpp: A C++ library for dealing with UTF encoding.
sr-convert: A sample-rate conversion utility for WAV files. It is used to convert Festival's output (16000Hz) to other sample rates.
GSM: A high-compression format for voice audio.
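
To illustrate what the sample-rate conversion step has to do, here is a minimal sketch that resamples mono 16-bit PCM by linear interpolation. It is not sr-convert's algorithm and not code from Ekho; the function name resampleLinear and its signature are hypothetical. It also applies no low-pass filter, so it should only be read as an illustration of the idea.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ekho or sr-convert):
// resample mono 16-bit PCM from srcRate to dstRate by linear interpolation.
std::vector<std::int16_t> resampleLinear(const std::vector<std::int16_t>& in,
                                         int srcRate, int dstRate) {
    assert(srcRate > 0 && dstRate > 0);
    if (in.empty()) return {};

    // The output length scales with the ratio of the two rates.
    const std::size_t outLen = static_cast<std::size_t>(
        static_cast<std::uint64_t>(in.size()) * dstRate / srcRate);

    std::vector<std::int16_t> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i) {
        // Position of output sample i on the input time axis.
        const double pos = static_cast<double>(i) * srcRate / dstRate;
        const std::size_t i0 = static_cast<std::size_t>(pos);
        // Clamp the right neighbour at the end of the input.
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        const double frac = pos - static_cast<double>(i0);
        // Weighted average of the two neighbouring input samples.
        out[i] = static_cast<std::int16_t>(
            std::lround(in[i0] * (1.0 - frac) + in[i1] * frac));
    }
    return out;
}
```

For example, resampleLinear(festivalPcm, 16000, 44100) would stretch Festival's 16000Hz output to 44100Hz; the target rate here is only an example value. A production converter would add proper filtering, especially when converting to a lower rate.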

Here is a typical procedure:

Get input
Query each character's phonetic symbols from ekho::Dict
Get the PCM audio data for the phonetic symbols from the WAV voice data files with Sndfile
Join all characters' PCM data
Change pitch and speed with SoundTouch
Play the audio data with Pulseaudio, or save it as WAV with Sndfile, MP3 with LAME, or OGG with Vorbis
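
The first four steps of the procedure can be sketched as follows. This is a simplified illustration, not Ekho's actual code: SymbolDict, VoiceBank and synthesize are hypothetical names, in-memory maps stand in for ekho::Dict and for the WAV voice files read through Sndfile, and each character is assumed to map to a single phonetic symbol. The SoundTouch and output steps are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Pcm = std::vector<std::int16_t>;

// Hypothetical stand-ins for illustration only:
// SymbolDict plays the role of ekho::Dict (character -> phonetic symbol),
// VoiceBank plays the role of the WAV voice files (phonetic symbol -> PCM).
using SymbolDict = std::map<char32_t, std::string>;
using VoiceBank = std::map<std::string, Pcm>;

// Look up each character's symbol, fetch its PCM data, and join the results.
Pcm synthesize(const std::u32string& text,
               const SymbolDict& dict,
               const VoiceBank& voice) {
    Pcm out;
    for (char32_t ch : text) {
        auto sym = dict.find(ch);
        if (sym == dict.end()) continue;    // assumption: skip unknown characters
        auto pcm = voice.find(sym->second);
        if (pcm == voice.end()) continue;   // assumption: skip symbols without voice data
        assert(!pcm->second.empty());       // a voice entry is expected to hold samples
        out.insert(out.end(), pcm->second.begin(), pcm->second.end());
    }
    return out;
}
```

With a dictionary that maps two Cantonese characters to the Jyutping symbols "nei5" and "hou2", and a voice bank that holds PCM data for both symbols, synthesize returns the two recordings joined in text order. The joined buffer would then go on to the pitch/speed and output steps.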