Many studies on speech acoustics and production use articulatory synthesis as a framework to investigate the relationship between articulatory gestures and acoustic features. Although supraglottal articulatory models are available, usually built from vocal tract imaging acquisitions, glottal gestures are commonly modeled with simple geometric primitives which do not necessarily reflect reality. This study is a first step towards the development of a database of realistic glottal gestures which will be used to design the glottal opening dynamics in articulatory synthesis paradigms. The experimental setup is presented: glottal opening dynamics are measured in VCV and VCCV sequences uttered by real subjects, using a specifically designed external photoglottographic device (ePGG). The corpus was designed to highlight the differences in glottis opening between fricatives and stops.
The results show that the pattern of glottal opening differs according to the class of the consonant.
The first experiments we carried out concerned the glottis opening strategies used in the production of voiced and unvoiced fricatives [1]. The measurements made on this occasion showed that the glottis opening is larger for unvoiced fricatives than for all other sounds, except for breathing. Since the amount of light depends on the position of the larynx, we looked for a way to normalize the measurements. Glottal opening for /asa/ has the double advantage of being sufficiently large without being the greatest opening, and of being stable enough. During the construction of the corpus we therefore introduced /asa/ as a normalization sequence before and after each item. Each utterance thus has the following form: /asa/ item /asa/. The corpus is made up of the following items:
- VCV where V is a cardinal vowel and C belongs to {p t k b d g f s S v z Z l m n K},
- aC1C2a where V is a cardinal vowel, C1 belongs to {b d g} and C2 to {l K},
- asCa where C belongs to {b d g p t k},
- geminated stops in /pap papa/ (« pape papa »), /pat tatue/ (« patte tatouée »), /sak kaKe/ (« sac carré »), /kKabbagaKœK/ (« crabe bagarreur »), /pad dat/ (« pas de date »), /blag gaKãti/ (« blague garantie »),
- 4 sentences of variable length.
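As an illustration of the utterance form /asa/ item /asa/, the VCV part of the list above could be enumerated as in the following minimal sketch. The vowel set /a i u/, the use of the same vowel on both sides of the consonant, and the names `vcv_items` and `utterance` are assumptions made for this example only; they are not taken from the recording protocol.

```python
# Consonant symbols copied verbatim from the corpus description.
CONSONANTS = "p t k b d g f s S v z Z l m n K".split()
# Assumption: the cardinal vowels are /a i u/, the three vowels discussed in the text.
VOWELS = ["a", "i", "u"]
# Normalization sequence placed before and after each item.
CARRIER = "asa"


def vcv_items(vowels=VOWELS, consonants=CONSONANTS):
    """Return symmetric VCV items (same vowel on both sides, an assumption)."""
    return [v + c + v for v in vowels for c in consonants]


def utterance(item, carrier=CARRIER):
    """Wrap an item in the normalization sequence: /asa/ item /asa/."""
    return f"/{carrier}/ {item} /{carrier}/"


prompts = [utterance(item) for item in vcv_items()]
```

Under these assumptions the VCV part of the corpus contains one prompt per vowel and consonant pair; the clusters, geminated stops and sentences would be wrapped in the same way.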
This small corpus covers all the consonants, some of the most frequent clusters in French, especially those with /K/, geminated stops and some sentences. The corpus has been recorded by 3 female and male French speakers.
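The normalization by the flanking /asa/ sequences could be sketched as follows. This is only one possible scheme, not necessarily the one applied to the data: it assumes that each segment (item, preceding /asa/, following /asa/) has already been cut out of the ePGG signal, that the minimum of a segment is a usable baseline for the closed glottis, and that the reference is the mean of the two /asa/ amplitudes. The function names are hypothetical.

```python
def opening_amplitude(segment):
    """Peak-to-baseline amplitude of an ePGG segment.

    Assumption: the segment minimum is taken as the closed-glottis baseline.
    """
    return max(segment) - min(segment)


def normalize_item(item, asa_before, asa_after):
    """Express the item's ePGG samples relative to the /asa/ opening.

    The reference is the mean amplitude of the two flanking /asa/
    sequences (an assumption); a value of 1.0 then means "as open as
    the /s/ of /asa/".
    """
    reference = (opening_amplitude(asa_before) + opening_amplitude(asa_after)) / 2.0
    if reference <= 0:
        raise ValueError("flat /asa/ reference, cannot normalize")
    baseline = min(item)
    return [(sample - baseline) / reference for sample in item]
```

Using both the preceding and the following /asa/ gives some protection against a slow drift of the larynx position within the utterance, which is the motivation stated above for normalizing at all.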
Figure 1: ePGG signal (bottom) and acoustic signal (top) for /asa aSa afa/ and zoom on /aSa/.
Figure 2 shows that the glottis opening is considerably smaller for unvoiced stops than for unvoiced fricatives. The complete data confirm this trend. The first explanation is that the production of fricatives requires maintaining a turbulent flow throughout the duration of the fricative, and consequently a large opening to ensure a sufficient air flow. In contrast, the production of unvoiced stops only requires stopping the vibration of the vocal folds and filling the cavity behind the constriction in order to achieve an overpressure with respect to atmospheric pressure.
The peak of the opening is reached approximately in the middle of the segment formed by the closure and burst, i.e. the interval during which there is no voicing. In the three examples /papa kaka titi/, the opening is larger for /kaka/ than for /papa/, and larger for /titi/ than for /kaka/. This increase of the opening coincides with the existence of an increasingly intense frication noise following the release burst.
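The position of the opening peak within the voiceless interval can be quantified with a sketch such as the one below. It assumes that the boundaries of the closure and burst segment (in seconds) are already known, for instance from a manual annotation of the acoustic signal, and that the ePGG samples are available as a plain list; the function name and its parameters are hypothetical.

```python
def peak_position(epgg, sample_rate_hz, start_s, end_s):
    """Locate the ePGG maximum inside the interval [start_s, end_s).

    Returns the peak time in seconds and its relative position in the
    interval (0.0 = start of closure, 1.0 = first voiced period).
    """
    first = int(round(start_s * sample_rate_hz))
    last = int(round(end_s * sample_rate_hz))
    if not 0 <= first < last <= len(epgg):
        raise ValueError("interval outside the signal or empty")
    window = epgg[first:last]
    # Index of the first maximum within the window.
    offset = max(range(len(window)), key=window.__getitem__)
    peak_time = (first + offset) / sample_rate_hz
    relative = (peak_time - start_s) / (end_s - start_s)
    return peak_time, relative
```

A relative position close to 0.5 corresponds to the observation above that the peak falls approximately in the middle of the voiceless segment.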
Overall, the same trend can be observed for /u/ and /i/ compared to /a/, i.e. a larger opening during the closure.
The larger glottis opening for the vowels /i/ and /u/ can be explained as follows: the tongue has to anticipate the position it will have after the occlusion is released. The cavity volume behind the constriction is therefore larger, and a larger opening is required to bring in a sufficient amount of air. Compared to /t/ and /k/, /p/ does not impose any constraint on the tongue, which can therefore fully anticipate its position for the following vowel.
Since the vowel /i/, and to a lesser extent /u/, is characterized by a back cavity of larger volume than /a/, a larger opening must be maintained to allow enough air to enter. The volume of air behind the constriction and the vocal tract shape also explain the duration of the global burst, i.e. from the transient noise corresponding to the release of the constriction to the first voiced period.
Figure 2: ePGG signal for /asa papa sa/, /asa kaka sa/ and /asa titi sa/.