Ekho Developer's Guide
(updated on May 7, 2011)
Source Code
Ekho uses SVN for version control. The source code can be checked out with the following command:
svn co https://e-guidedog.svn.sourceforge.net/svnroot/e-guidedog/eGuideDog_TTS e-guidedog/eGuideDog_TTS
Voice data is not in the directory above. Jyutping (Cantonese) voice data is in e-guidedog/jyutping, while Pinyin (Mandarin) voice data is in e-guidedog/ssb22/yali-voice. Hangul (Korean) and other voices are not in version control. It is not necessary to check out the voice data, because it is fairly large. Instead, we can copy it from a release package.
The source code can also be browsed at http://e-guidedog.svn.sourceforge.net/viewvc/e-guidedog/
Doxygen documentation for version 4.5.2 is available here: Ekho doxygen
Ekho is written in C++, and we recommend this C++ Coding Standard, although the current code does not fully comply with it.
Software Architecture
Here is the component graph:
+----------------+
| Web TTS Client |
+----------------+-------+------------------+---------------+---------+
| Web TTS Server | SAPI5 | OpenTTS/         | Command Line/ | Android |
|                |       | SpeechDispatcher | Gtk+          |         |
+----------------+-------+------------------+---------------+---------+
|                            Ekho Library                             |
+----------+------------+------------+---------+--------+-------------+
| Dict     | Festival   | SoundTouch | Sndfile | Lame   | Pulseaudio  |
+----------+------------+     /      +---------+   /    +-------------+
| utfcpp   | sr-convert | Sonic      | GSM     | Vorbis |
+----------+------------+------------+---------+--------+
Here are some comments about the components:
- Web TTS Client: Called WebSpeech, it is written in JavaScript so that web developers can add speech to their pages. It fetches audio data from the Web TTS Server and plays it with SoundManager2.
- Web TTS Server: A CGI script (ekho.pl) that calls Ekho over the Web.
- SAPI5: The standard speech API on the Windows platform. Before SAPI5 was supported, we built Ekho with MinGW. Now we build the SAPI5 version with VS2005.
- OpenTTS: A fork of Speech Dispatcher, which is a standard speech API on the Linux platform.
- Command Line: ekho.cpp
- Gtk+: "-g" option in ekho.cpp
- Android: exports a Java API for Android
- Ekho Library: libekho.cpp, ekho.h
- Dict: ekho_dict.cpp, ekho_dict.h
- Festival: Ekho speaks English through the Festival TTS engine.
- SoundTouch: A library for changing audio pitch, speed, etc.
- Sonic: Sonic is a simple algorithm for speeding up or slowing down speech. (`git clone git://vinux-project.org/sonic`)
- Sndfile: A library for reading and writing many kinds of audio formats.
- LAME: An MP3 encoding library. We need to pay attention to patent and legal issues when dealing with MP3.
- Vorbis: A library for encoding audio in the Ogg Vorbis format.
- Pulseaudio: A sound system for POSIX operating systems. We used PortAudio before, but Pulseaudio seems more compatible.
- utfcpp: A C++ library for dealing with UTF encoding.
- sr-convert: A sample-rate conversion utility for WAV files, used to convert Festival's 16000 Hz output to other sample rates.
- GSM: An audio format with a high compression rate for speech.
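To illustrate the kind of work sr-convert does (converting Festival's 16000 Hz output to another rate), here is a minimal sketch of sample-rate conversion for 16-bit mono PCM using plain linear interpolation. This is a deliberately simple, swapped-in technique, not sr-convert's own algorithm, and the function name `resampleLinear` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ekho or sr-convert): resample 16-bit mono
// PCM from srcRate to dstRate by linear interpolation between neighbours.
std::vector<int16_t> resampleLinear(const std::vector<int16_t>& in,
                                    unsigned srcRate, unsigned dstRate) {
    std::vector<int16_t> out;
    if (in.empty() || srcRate == 0 || dstRate == 0) return out;

    // Number of output samples that cover the same duration as the input.
    const std::size_t outLen = static_cast<std::size_t>(
        static_cast<uint64_t>(in.size()) * dstRate / srcRate);
    out.reserve(outLen);

    for (std::size_t i = 0; i < outLen; ++i) {
        // Position of output sample i on the input time axis.
        const double pos = static_cast<double>(i) * srcRate / dstRate;
        const std::size_t j = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(j);
        const double a = in[j];
        // Past the last input sample, hold its value.
        const double b = (j + 1 < in.size()) ? in[j + 1] : a;
        out.push_back(static_cast<int16_t>(std::lround(a + (b - a) * frac)));
    }
    return out;
}
```

Linear interpolation keeps the sketch short, but it does not low-pass filter the signal, so downsampling with it can alias; a dedicated converter such as sr-convert is preferable for real output.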
Here is a typical procedure:
- Get input
- Query each character's phonetic symbols from ekho::Dict
- Read the PCM audio data for those phonetic symbols from the WAV voice data files with Sndfile
- Join all characters' PCM data
- Change pitch and speed with SoundTouch
- Play the audio data with Pulseaudio, or save it to WAV with Sndfile, MP3 with LAME or OGG with Vorbis
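The procedure above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not Ekho's actual code: the hypothetical `ToyVoice` struct stands in for ekho::Dict and the WAV voice data files, the input is assumed to be already decoded into Unicode code points (utfcpp's job), the SoundTouch step is omitted, and the hypothetical `toWav` writes a canonical 44-byte WAV header into memory instead of calling Sndfile. Audio is assumed to be 16-bit mono PCM.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Pcm = std::vector<int16_t>;

// Hypothetical in-memory stand-ins: in Ekho, ekho::Dict maps characters to
// phonetic symbols, and the samples come from per-symbol WAV files.
struct ToyVoice {
    std::map<char32_t, std::string> dict;  // character -> phonetic symbol
    std::map<std::string, Pcm> samples;    // phonetic symbol -> PCM samples
};

// Look up each character, fetch its PCM data and join the pieces.
// Characters without a dictionary entry or voice sample are skipped.
Pcm synthesize(const ToyVoice& voice, const std::u32string& text) {
    Pcm out;
    for (char32_t ch : text) {
        auto sym = voice.dict.find(ch);
        if (sym == voice.dict.end()) continue;     // unknown character
        auto pcm = voice.samples.find(sym->second);
        if (pcm == voice.samples.end()) continue;  // missing voice data
        out.insert(out.end(), pcm->second.begin(), pcm->second.end());
    }
    return out;
}

// Serialize mono 16-bit PCM as a WAV file in memory (little-endian fields).
std::vector<uint8_t> toWav(const Pcm& pcm, uint32_t sampleRate) {
    std::vector<uint8_t> b;
    auto put16 = [&b](uint16_t v) {
        b.push_back(static_cast<uint8_t>(v));
        b.push_back(static_cast<uint8_t>(v >> 8));
    };
    auto put32 = [&b](uint32_t v) {
        for (int i = 0; i < 4; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
    };
    auto tag = [&b](const char* t) { b.insert(b.end(), t, t + 4); };

    const uint32_t dataSize = static_cast<uint32_t>(pcm.size() * 2);
    tag("RIFF"); put32(36 + dataSize); tag("WAVE");
    tag("fmt "); put32(16);                 // size of the fmt chunk
    put16(1); put16(1);                     // PCM format, 1 channel
    put32(sampleRate); put32(sampleRate * 2);  // sample rate, byte rate
    put16(2); put16(16);                    // block align, bits per sample
    tag("data"); put32(dataSize);
    for (int16_t s : pcm) put16(static_cast<uint16_t>(s));
    return b;
}
```

Skipping unknown characters keeps the sketch short; a real engine needs an explicit policy for them, and would apply pitch and speed changes to the joined PCM before output.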