Saturday, August 26, 2006

Speech Processing Links

Largest Chinese source code site: http://www.programsalon.com/

English Translation: http://www.google.com/translate?u=http%3A%2F%2Fwww.pudn.com%2F&langpair=zh%7Cen&hl=en&ie=UTF8

webcast.berkeley

Podcasts and Webcasts of UC Berkeley current and archived courses.

CS 224S/LINGUIST 136/236 Speech Recognition and Synthesis Winter 2005

CS 224S / LINGUIST 281 Speech Recognition and Synthesis Winter 2006

Speech, Music and Hearing

Speech Processing Group

Connexions is a rapidly growing collection of free scholarly materials and a powerful set of free software tools.

Examples of Synthesized Speech

Friday, August 25, 2006

Arabic Speech Synthesis

Introduction

Speech is the primary means of communication between people. Speech synthesis is the artificial production of human speech, that is, the automatic generation of speech waveforms.

A diphone can be defined as a speech fragment that runs roughly from the midpoint of one phoneme to the midpoint of the next phoneme.
In this way the transition between two consecutive speech sounds is captured inside the diphone and need not be calculated.
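
As a rough illustration of the definition above, the sketch below turns a phoneme sequence into the diphone names that would cover it. The function name `to_diphones`, the `_` silence marker, and the phoneme symbols are assumptions made for this example and are not taken from any particular system.

```python
def to_diphones(phonemes, silence="_"):
    """Return the diphone names covering a phoneme sequence.

    Each diphone spans the second half of one phoneme and the first half
    of the next, so n phonemes padded with silence on both sides give
    n + 1 diphones.
    """
    padded = [silence] + list(phonemes) + [silence]
    # Pair every unit with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    # Hypothetical phoneme symbols for a three-phoneme word.
    print(to_diphones(["k", "a", "t"]))
```

Because each name covers a transition rather than a whole sound, the joins between recorded units fall in the comparatively stable middle of each phoneme.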

Overview of speech synthesis technology

A text-to-speech system consists of two parts: a front-end and a back-end. The front-end has two major tasks.
  • First, it takes the raw text and converts things like numbers and abbreviations into their written-out word equivalents. This process is called text normalization or pre-processing.
  • Then it assigns a phonetic transcription to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called letter-to-sound conversion. The output of the front-end is called the symbolic linguistic representation.

The other part, the back-end, takes the symbolic linguistic representation and converts it into actual sound output. The back-end is often referred to as the synthesizer.
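
The text normalization step of the front-end can be sketched as follows. This is a minimal English-only illustration, not a real front-end: the abbreviation table, the function names, and the restriction to numbers from 0 to 99 are all assumptions made to keep the example short.

```python
import re

# Hypothetical abbreviation table; a real front-end would need a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Expand known abbreviations and spell out one- and two-digit numbers."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Longer numbers are left untouched in this sketch.
    return re.sub(r"\b\d{1,2}\b",
                  lambda match: number_to_words(int(match.group())), text)


if __name__ == "__main__":
    print(normalize("Dr. Smith lives at 42 Main St."))
```

The normalized string would then be passed on to letter-to-sound conversion, which is not shown here.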

The Arabic language

Modern Standard Arabic is generally adopted as the common medium of communication throughout the Arab world today.
The Arabic alphabet contains 28 letters. There are 6 vowels in Arabic: 3 short and 3 long.

Research Challenges

We can classify the challenges into two categories.
First, text-to-speech challenges:

  1. Memory requirements: in the concatenation method, memory requirements become significant mainly when a large text has to be converted into speech.
  2. Limited to only one speaker: because the diphones must be recorded in advance, and because of the memory requirements.
  3. Intelligibility: whether the system produces intelligible speech.
  4. Naturalness: whether the voice sounds natural and familiar to the listener.

Second, Arabic language challenges:

  1. The absence of diacritics in modern Arabic text is one of the most critical problems facing computer processing of Arabic text.
  2. Arabic has a special phonological system, phonotactics, and syllabic structure.
  3. Converting Arabic script into its phonetic form using rules.
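
The diacritics problem in the first item can be made concrete with a small sketch. It relies on the fact that the Arabic short-vowel marks are Unicode combining characters; the helper names `strip_diacritics` and `is_diacritized` are assumptions for this example. Two different words, kataba ("he wrote") and kutub ("books"), collapse to the same letter sequence once the marks are removed, which is why a synthesizer cannot know the pronunciation of undiacritized text without restoring them first.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (such as fatha, damma, kasra) from the text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def is_diacritized(text):
    """Return True if the text contains at least one combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


# kaf + fatha, ta + fatha, ba + fatha: kataba, "he wrote"
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"
# kaf + damma, ta + damma, ba: kutub, "books"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"

if __name__ == "__main__":
    # Both words reduce to the same three bare letters.
    print(strip_diacritics(KATABA) == strip_diacritics(KUTUB))
```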

Research target

The main target is to build a text-to-speech system using diphone concatenation. To achieve that goal, several sub-goals emerged, such as:

  1. Automatic diacritization of Arabic text, to save the user time.
  2. Implementing phonological rules (letter-to-sound conversion), which are responsible for automatically determining the phonetic transcription of the incoming text.
  3. Determining syllable types. Syllables in Arabic can be classified either by the length of the syllable or by how the syllable ends.
  4. Converting syllables into diphones, which are mapped to pre-recorded sounds.
  5. Concatenating the diphone wave files into one file that will be played.
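
The two ways of classifying syllables mentioned in step 3 can be sketched as below. The syllable is given as a consonant/vowel pattern such as "CVC". The pattern inventory and the short/medium/long labels are one commonly used scheme and are an assumption here, as are all names in the code; the scheme actually used in this project may differ.

```python
# Assumed length classes for common Arabic syllable patterns.
SYLLABLE_LENGTH = {
    "CV": "short",
    "CVV": "medium",
    "CVC": "medium",
    "CVVC": "long",
    "CVCC": "long",
}


def classify_syllable(pattern):
    """Classify a C/V pattern by its ending and by its length.

    Returns a tuple (ending, length), where ending is "open" for a
    syllable that ends in a vowel and "closed" for one that ends in
    a consonant.
    """
    if pattern not in SYLLABLE_LENGTH:
        raise ValueError(f"unsupported syllable pattern: {pattern!r}")
    ending = "open" if pattern.endswith("V") else "closed"
    return ending, SYLLABLE_LENGTH[pattern]


if __name__ == "__main__":
    for pattern in ("CV", "CVV", "CVC", "CVCC"):
        print(pattern, classify_syllable(pattern))
```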


Friday, August 18, 2006

Concatenation Wave Files using C# 2005

The WAVE file format is a subset of Microsoft's RIFF specification for the storage of multimedia files. A RIFF file starts out with a file header followed by a sequence of data chunks. A WAVE file is often just a RIFF file with a single "WAVE" chunk which consists of two sub-chunks -- a "fmt " chunk specifying the data format and a "data" chunk containing the actual sample data. Call this form the "Canonical form".
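
To make the canonical form concrete, here is a minimal Python sketch that unpacks the 44-byte header of a canonical WAVE file (RIFF header, "fmt " sub-chunk, and the start of the "data" sub-chunk). The function and key names are assumptions for this example, and it deliberately rejects files that carry extra chunks, since those are not in canonical form.

```python
import struct

# Little-endian layout of the 44-byte canonical header:
# "RIFF", chunk size, "WAVE", "fmt ", fmt size, audio format, channels,
# sample rate, byte rate, block align, bits per sample, "data", data size.
CANONICAL_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def parse_canonical_header(header_bytes):
    """Unpack a canonical WAVE header into a dictionary of its fields."""
    (riff_id, chunk_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = CANONICAL_HEADER.unpack(header_bytes[:44])
    if (riff_id, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical WAVE header")
    return {
        "chunk_size": chunk_size,
        "fmt_size": fmt_size,
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }
```

In a canonical file the chunk size equals 36 plus the data size, because it counts everything after the first 8 bytes.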

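The main idea is to write a single header for all the WAV files to be concatenated, and then write the sample data of each file, one after another, into a single output file. The size fields in that header must describe the combined length of the data.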
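
The same idea can be sketched in Python with the standard `wave` module. This is not the C# code from the article: it is a minimal illustration that assumes all inputs share the same channel count, sample width, and sample rate, and the function name `concatenate_wav` is hypothetical.

```python
import wave


def concatenate_wav(inputs, output):
    """Write the frames of every input WAV file into one output WAV file.

    `inputs` is a list of file names or binary file objects, and `output`
    is a file name or a seekable binary file object.
    """
    if not inputs:
        raise ValueError("no input files given")
    with wave.open(output, "wb") as out:
        expected = None
        for source in inputs:
            with wave.open(source, "rb") as src:
                # (channels, sample width in bytes, frame rate)
                fmt = src.getparams()[:3]
                if expected is None:
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("input files have different audio formats")
                out.writeframes(src.readframes(src.getnframes()))
    # Closing the writer patches the single header with the total data size.
```

A call such as `concatenate_wav(["a.wav", "b.wav"], "joined.wav")`, with hypothetical file names, would produce one file with one header followed by the samples of both inputs.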

More details on CodeProject:
http://www.codeproject.com/useritems/Concatenation_Wave_Files.asp

References
http://ccrma.stanford.edu/courses/422/projects/WaveFormat/
http://www.sonicspot.com/guide/wavefiles.html
http://www.planetsourcecode.com/vb/scripts/ShowCode.asp?txtCodeId=31678&lngWId=1