Wednesday, September 06, 2006

Voice Conversion

The key purpose of a voice conversion system is to transform the voice of one speaker into that of another. Given two speakers, the goal is to design a transformation that makes the speech of the first speaker sound as though it were uttered by the second.

A general voice conversion system works as follows. The system analyzes speech samples from two speakers, collecting the voice characteristics of the original speaker and of the desired (target) speaker. After learning the characteristics of each speaker, the system automatically creates a conversion rule that maps the original speaker's voice characteristics onto those of the target speaker. This rule is then applied to the original speech to produce converted speech that exhibits the target speaker's voice characteristics.
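The analyze, learn, and apply steps above can be illustrated with a minimal sketch. It assumes each speaker's voice characteristics are already available as time-aligned feature frames (one row per frame), and it uses a least-squares affine map as the conversion rule. Both the feature representation and the affine rule are assumptions chosen for illustration, not the method of any particular system; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def learn_conversion_rule(source_frames, target_frames):
    """Fit an affine map so that [source, 1] @ weights approximates target.

    Assumes the two arrays are time-aligned: row i of each describes
    the same spoken content from the two speakers.
    """
    ones = np.ones((source_frames.shape[0], 1))
    augmented = np.hstack([source_frames, ones])
    weights, *_ = np.linalg.lstsq(augmented, target_frames, rcond=None)
    return weights


def convert(source_frames, weights):
    """Apply the learned rule to the original speaker's feature frames."""
    ones = np.ones((source_frames.shape[0], 1))
    return np.hstack([source_frames, ones]) @ weights


# Synthetic stand-in for real speech features (hypothetical data).
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 4))
true_map = rng.normal(size=(4, 4))
target = source @ true_map + 0.5

weights = learn_conversion_rule(source, target)
converted = convert(source, weights)
max_error = float(np.max(np.abs(converted - target)))
```

In practice the frames would come from a speech analysis step and would need to be aligned in time before training. A single affine map is only the simplest possible stand-in for a conversion rule.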


This technology is useful in many fields. In speech synthesis, voice conversion can enrich the output speech. It can also be used in telephone voice translation, low-bit-rate speech coding, speaker adaptation, and so on. Voice conversion depends heavily on research into voice individuality.

Research indicates that the factors relevant to voice individuality fall into two types. The first is acoustic parameters, such as pitch frequency and formant frequencies and bandwidths, which reflect the voice source and vocal tract of each speaker. The second is prosodic parameters, such as the timing, rhythm, and pauses of speech, which usually depend on the speaker's social conditions.
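As a concrete example of an acoustic parameter, pitch frequency can be estimated from a short stretch of voiced speech. The sketch below uses a plain autocorrelation peak search. That method is a common textbook approach chosen here for illustration and is not one named in this post. The function name, the 50 to 500 Hz search range, and the synthetic test tone are all assumptions.

```python
import numpy as np


def estimate_pitch_hz(signal, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate pitch as the lag with the strongest autocorrelation."""
    signal = signal - np.mean(signal)
    # Keep only non-negative lags of the full autocorrelation.
    autocorr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag = min_lag + int(np.argmax(autocorr[min_lag:max_lag + 1]))
    return sample_rate / best_lag


# Synthetic 200 Hz tone standing in for a voiced speech frame.
sample_rate = 16000
num_samples = 800
t = np.arange(num_samples) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)

pitch = estimate_pitch_hz(tone, sample_rate)
```

Real speech is noisier than a pure tone, so a practical estimator would also need voicing detection and some protection against octave errors.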

The UPC Voice Conversion Toolkit