The objective is to synthesize speech from text by numerically simulating the human speech production process, i.e. its articulatory, aerodynamic and acoustic aspects.
Corpus-based approaches have come to dominate text-to-speech synthesis.
They exploit speech databases of very high acoustic quality that cover a large number of expressions and phonetic contexts.
This is sufficient to produce intelligible speech. However, these approaches face almost insurmountable obstacles as soon as parameters intimately related to the physical process of speech production have to be modified. In contrast, an approach that rests on simulating the physical speech production process makes explicit use of source parameters, of the anatomy and geometry of the vocal tract, and of a temporal supervision strategy. It thus offers direct control over the nature of the synthetic speech.
The project is organized into five work packages:

  1. Aerodynamic and acoustic simulations, to produce an acoustic speech signal from the cross-sectional area at every point of all the cavities of the vocal tract,
  2. Source and coordination scenarios, to coordinate the sources with the temporal evolution of the vocal tract, which is crucial for producing consonants that human listeners can identify,
  3. Supervision of the temporal evolution of the vocal tract geometry, to anticipate the production of upcoming sounds and generate realistic articulatory gestures,
  4. Acquisition of speech production data, essential for determining vocal fold activation, aerodynamic parameters, and the geometrical shape of the vocal tract (via MRI at a high sampling rate),
  5. General architecture to incorporate the different levels and synthesize an acoustic signal.
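
To illustrate what the first work package means by going from cross-sectional areas to acoustics, here is a minimal sketch. It is not the project's simulation method: it assumes a classical lossless model of concatenated cylindrical tubes, a closed glottis and an ideally open lip end, and it ignores the aerodynamic and source aspects entirely. The function names (`chain_d`, `resonances`) and the physical constants are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air
AIR_DENSITY = 1.14      # kg/m^3, assumed value


def chain_d(areas, section_length, freq):
    """Return the D term of the glottis-to-lips chain (ABCD) matrix.

    Each lossless section has the matrix [[cos, j*z*sin], [j*sin/z, cos]].
    Products keep that shape, so only the real coefficients are tracked.
    With zero pressure at the lips, U_lips / U_glottis = 1 / D.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    cs, sn = math.cos(k * section_length), math.sin(k * section_length)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for area in areas:  # ordered from glottis to lips
        z = AIR_DENSITY * SPEED_OF_SOUND / area  # characteristic impedance
        b2, c2 = z * sn, sn / z
        a, b, c, d = (a * cs - b * c2, a * b2 + b * cs,
                      c * cs + d * c2, d * cs - c * b2)
    return d


def resonances(areas, section_length, max_freq, step=5.0):
    """Locate resonances (zeros of D) by a coarse scan plus bisection."""
    found = []
    f_lo = step
    d_lo = chain_d(areas, section_length, f_lo)
    while f_lo + step <= max_freq:
        f_hi = f_lo + step
        d_hi = chain_d(areas, section_length, f_hi)
        if d_lo != 0.0 and d_lo * d_hi <= 0.0:
            lo, hi = f_lo, f_hi
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if d_lo * chain_d(areas, section_length, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found


# A uniform 17.5 cm tube: 35 sections of 5 mm, each 4 cm^2 in area.
uniform_tract = [4.0e-4] * 35
print(resonances(uniform_tract, 0.005, 3000.0))
```

For the uniform tube the resonances should fall close to the quarter-wavelength values c/(4L), 3c/(4L) and 5c/(4L), i.e. about 500, 1500 and 2500 Hz with the constants assumed here. Changing individual entries of the area list shifts these resonances, which is the kind of direct geometric control a corpus-based system does not offer.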

The consortium consists of four complementary research teams with leading international theoretical and practical expertise in the domains of:

  • aerodynamic and acoustic simulation of speech production, and modeling of the source and the geometry of the vocal tract,
  • magnetic resonance imaging and other acquisition techniques of speech production data.