1 |
A text to speech synthesis system for Maltese. Micallef, Paul, January 1997
The subject of this thesis spans a varied, multidisciplinary area that must be addressed to build a high-quality text-to-speech synthesis system in any language. This is the first such system built for Maltese, so there was the additional problem that no computerised sources or corpora existed. However, many of the problems, and much of the system design, are common to all languages. This thesis focuses on two general problems. The first is the automatic labelling of phonemic data, which is crucial for setting up Maltese speech corpora that can in turn be used to improve the system. A novel approach to such automatic segmentation was investigated: it uses a mixed-parameter model with maximum-likelihood training of the first derivative of the features across a set of phonetic class boundaries. This was found to give good results even for continuous speech, provided that a phonemic labelling of the text is available. The second general problem is segment concatenation, since the end of one diphone and the beginning of the next can be mismatched in amplitude, frequency, phase and spectral envelope. The use of intermediate frames, built up from the last and first frames of two concatenated diphones, to achieve smoother continuity was analysed in both the time and the frequency domains. The use of wavelet theory to separate the spectral envelope from the excitation was also investigated. The system's linguistic modules were built as part of this thesis. In particular, a rule-based grapheme-to-phoneme conversion system that is serial rather than hierarchical was developed. The morphological analysis required a design that allows two dissimilar lexical structures (Semitic and Romance) to be integrated into one overall morphological analyser. The appendices give the detailed rules of the linguistic modules developed.
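A serial (non-hierarchical) grapheme-to-phoneme converter can be pictured as an ordered cascade of rewrite rules, where each rule is applied to the output of the one before. The Python sketch below is only an illustration of that idea, not the thesis's rule set (those rules are in its appendices). The names `RULES` and `to_phonemes` are hypothetical, and the rules are a handful of context-free Maltese letter-to-sound correspondences, such as ⟨x⟩ → /ʃ/ and ⟨q⟩ → the glottal stop /ʔ/.

```python
import re
from typing import List, Tuple

# Hypothetical, illustrative rule list (NOT the thesis's rules).
# Each entry is (regex pattern, replacement); rules run strictly in order.
RULES: List[Tuple[str, str]] = [
    ("z", "ts"),   # <z> -> /ts/; must run before the <ż> rule below
    ("ż", "z"),    # <ż> -> /z/
    ("x", "ʃ"),    # <x> -> /ʃ/
    ("ċ", "tʃ"),   # <ċ> -> /tʃ/
    ("ġ", "dʒ"),   # <ġ> -> /dʒ/
    ("q", "ʔ"),    # <q> -> glottal stop /ʔ/
]


def to_phonemes(word: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Serial cascade: each rule rewrites the output of the previous rule."""
    s = word.lower()
    for pattern, replacement in rules:
        s = re.sub(pattern, replacement, s)
    return s
```

Because the rules run in sequence, their order matters. The ⟨z⟩ → /ts/ rule must come before ⟨ż⟩ → /z/. Otherwise, every ⟨ż⟩ would first become ⟨z⟩ and then be rewritten again as /ts/. With the order shown, `to_phonemes("żz")` returns `"zts"`. With the list reversed, it returns `"tsts"`.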
The present system gives satisfactory intelligibility and can modify duration, but it does not yet include a prosodic module.
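One simple way to build the intermediate frames described in this abstract is to interpolate between the last frame of one diphone and the first frame of the next. The sketch below makes two assumptions that the abstract does not confirm: each frame is a vector of synthesis parameters (for example, spectral-envelope coefficients), and the interpolation is linear. The function names `intermediate_frames` and `concatenate_with_smoothing` are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def intermediate_frames(last: Sequence[float], first: Sequence[float],
                        n_frames: int) -> List[Frame]:
    """Linearly interpolate n_frames frames strictly between `last` and `first`.

    Linear interpolation is an assumption made for this sketch.
    """
    if len(last) != len(first):
        raise ValueError("frames must have the same number of parameters")
    bridge: List[Frame] = []
    for k in range(1, n_frames + 1):
        w = k / (n_frames + 1)  # weight in (0, 1); endpoints are excluded
        bridge.append([(1.0 - w) * a + w * b for a, b in zip(last, first)])
    return bridge


def concatenate_with_smoothing(diphone_a: Sequence[Sequence[float]],
                               diphone_b: Sequence[Sequence[float]],
                               n_frames: int = 3) -> List[Frame]:
    """Join two diphones (sequences of parameter frames), inserting
    intermediate frames built from the last frame of `diphone_a` and the
    first frame of `diphone_b`."""
    bridge = intermediate_frames(diphone_a[-1], diphone_b[0], n_frames)
    return ([list(map(float, f)) for f in diphone_a] + bridge
            + [list(map(float, f)) for f in diphone_b])
```

For example, joining `[[0, 0], [1, 1]]` and `[[5, 9], [6, 10]]` with three intermediate frames inserts `[2, 3]`, `[3, 5]` and `[4, 7]` at the boundary. The sketch interpolates parameter values only. It does not address the phase mismatches that the abstract also lists.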
|
2 |
Prosody modelling for a Sesotho text-to-speech system using the Fujisaki model. Mohasi, Lehlohonolo, 03 1900
Thesis (PhD), Stellenbosch University, 2015.
English abstract: please refer to the full text.
|
3 |
A Design of Multi-session Text-independent Digital Camcorder Audio-Video Database for Speaker Recognition. Chen, Chun-chi, 05 September 2008
In this thesis, an audio-video database for speaker recognition is constructed using a digital camcorder. Video of fifteen hundred speakers is recorded in three separate sessions. For each speaker, 20 still images per session are also extracted from the video. The database is intended to provide suitable training and testing material for person identification using both voice and face features.
|