Spelling suggestions: "subject:"conversation speech""
1 |
Fonologisk bedömning baserad på bildbenämning jämfört med spontantal av barn med fonologisk språkstörningJohansson, Siri, Lethagen, Elin January 2012 (has links)
In a phonological assessment, the aim is to obtain systematic and reliable data of a child's speech output, which can then serve as a basis for a decision on an appropriate intervention (Wolk & Meisler, 1998). Whether phonological assessment should be derived from an analysis of picture-naming or a conversation with the child, and whether the two methods for elicitation generate equivalent results, has been debated among clinicians and researchers for an extended time (Masterson, Bernhardt & Hofheintz, 2005). The aim of the present study was to compare two methods of speech elicitation for phonological assessment: spontaneous conversation and picture-naming, respectively. In the study, the procedures have been used when assessing children with phonological disorders as well as children with typical language development. The results are presented using two levels of phonological analysis; degree of phonological impairment, in terms of percentage of phonemes correct (PPC), and type of phonological impairment, in terms of phonological simplification processes. Eighteen (18) children participated in the study, nine (9) with phonological impairment (age 3;10 – 5;11), and nine with typical phonologic development (age 3;2 – 4;6). No significant differences were found regarding the percentage of phonemes correct between the two elicitation methods, neither for the group of children with phonological impairment, nor for the group of children with typical phonological development. Thus, the degree of speech difficulties was the same regardless of elicitation method. In assessing the type of impairment, however, a comparison between the sensitivity and the specificity obtained in the two tests indicate that there is a difference in how well the two elicitation methods intercept the phonological simplification processes. In the two elicitating methods, exactly the same processes could not be found in the speech of any child. The discussion includes the consequences of word structure, position and context of phonemes in the different speech samples. Furthermore, advantages and disadvantages of using the different elicitation methods in phonological assessment are discussed. The present study contributes to an increased knowledge about the ability to capture phonological problems sing picture-naming and conversational speech samples, respectively, in assessing a child’s speech. Furthermore, the study presents input to the on-going debate on phonological assessment, and may contribute to reflectance when selecting a clinical assessment tool.
|
2 |
Towards robust conversational speech recognition and understandingWeng, Chao 12 January 2015 (has links)
While significant progress has been made in automatic speech recognition (ASR) during the last few decades, recognizing and understanding unconstrained conversational speech remains a challenging problem. In this dissertation, five methods/systems are proposed towards a robust conversational speech recognition and understanding system.
I. A non-uniform minimum classification error (MCE) approach is proposed which can achieve consistent and significant keyword spotting performance gains on both English and Mandarin large-scale spontaneous conversational speech tasks (Switchboard and HKUST Mandarin CTS).
II. A hybrid recurrent DNN-HMM system is proposed for robust acoustic modeling and a new way of backpropagation through time (BPTT) is introduced. The proposed system achieves state-of-the-art performances on two benchmark datasets, the 2nd CHiME challenge (track 2) and Aurora-4, without front-end preprocessing, speaker adaptive training or multiple decoding passes.
III. To study the specific case of conversational speech recognition in the presence of competing talkers, several multi-style training setups of DNNs are investigated and a joint decoder operating on multi-talker speech is introduced. The proposed combined system improves upon the previous state-of-the-art IBM superhuman system by 2.8% absolute on the 2006 speech separation challenge dataset.
IV. Latent semantic rational kernels (LSRKs) are proposed for spotting the semantic notions on conversational speech. The proposed framework is generalized using tf-idf weighting, latent semantic analysis, WordNet, probabilistic topic models and neural network learned representations and is shown to achieve substantial topic spotting performance gains on two conversational speech tasks, Switchboard and AT&T HMIHY initial collection.
V. Non-uniform sequential discriminative training (DT) of DNNs with LSRKs is proposed which directly links the information of the proposed LSRK framework to the objective function of the DT. The experimental results on the subset of Switchboard show the proposed method can lead the acoustic modeling to a more robust system with respect to the semantic decoder.
|
3 |
Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous SpeechTyson, Na'im R. 26 June 2012 (has links)
No description available.
|
4 |
Analyse et détection automatique de disfluences dans la parole spontanée conversationnelle / Disfluency analysis and automatic detection in conversational spontaneous speechDutrey, Camille 16 December 2014 (has links)
Extraire de l'information de données langagières est un sujet de plus en plus d'actualité compte tenude la quantité toujours croissante d'information qui doit être régulièrement traitée et analysée, etnous assistons depuis les années 90 à l'essor des recherches sur des données de parole également. Laparole pose des problèmes supplémentaires par rapport à l'écrit, notamment du fait de la présence dephénomènes propres à l'oral (hésitations, reprises, corrections) mais aussi parce que les donnéesorales sont traitées par un système de reconnaissance automatique de la parole qui génèrepotentiellement des erreurs. Ainsi, extraire de l'information de données audio implique d'extraire del'information tout en tenant compte du « bruit » intrinsèque à l'oral ou généré par le système dereconnaissance de la parole. Il ne peut donc s'agir d'une simple application de méthodes qui ont faitleurs preuves sur de l'écrit. L'utilisation de techniques adaptées au traitement des données issues del'oral et prenant en compte à la fois leurs spécificités liées au signal de parole et à la transcription –manuelle comme automatique – de ce dernier représente un thème de recherche en pleindéveloppement et qui soulève de nouveaux défis scientifiques. Ces défis sont liés à la gestion de lavariabilité dans la parole et des modes d'expressions spontanés. Par ailleurs, l'analyse robuste deconversations téléphoniques a également fait l'objet d'un certain nombre de travaux dans lacontinuité desquels s'inscrivent ces travaux de thèse.Cette thèse porte plus spécifiquement sur l'analyse des disfluences et de leur réalisation dans desdonnées conversationnelles issues des centres d'appels EDF, à partir du signal de parole et destranscriptions manuelle et automatique de ce dernier. Ce travail convoque différents domaines, del'analyse robuste de données issues de la parole à l'analyse et la gestion des aspects liés àl'expression orale. L'objectif de la thèse est de proposer des méthodes adaptées à ces données, quipermettent d'améliorer les analyses de fouille de texte réalisées sur les transcriptions (traitement desdisfluences). Pour répondre à ces problématiques, nous avons analysé finement le comportement dephénomènes caractéristiques de l'oral spontané (disfluences) dans des données oralesconversationnelles issues de centres d'appels EDF, et nous avons mis au point une méthodeautomatique pour leur détection, en utilisant des indices linguistiques, acoustico-prosodiques,discursifs et para-linguistiques.Les apports de cette thèse s'articulent donc selon trois axes de recherche. Premièrement, nousproposons une caractérisation des conversations en centres d'appels du point de vue de l'oralspontané et des phénomènes qui le caractérisent. Deuxièmement, nous avons mis au point (i) unechaîne d'enrichissement et de traitement des données orales effective sur plusieurs plans d'analyse(linguistique, prosodique, discursif, para-linguistique) ; (ii) un système de détection automatique desdisfluences d'édition adapté aux données orales conversationnelles, utilisant le signal et lestranscriptions (manuelles ou automatiques). Troisièmement, d'un point de vue « ressource », nousavons produit un corpus de transcriptions automatiques de conversations issues de centres d'appelsannoté en disfluences d'édition (méthode semi-automatique). / Extracting information from linguistic data has gain more and more attention in the last decades inrelation with the increasing amount of information that has to be processed on a daily basis in the world. Since the 90’s, this interest for information extraction has converged to the development of researches on speech data. In fact, speech data involves extra problems to those encountered on written data. In particular, due to many phenomena specific to human speech (e.g. hesitations, corrections, etc.). But also, because automatic speech recognition systems applied on speech signal potentially generates errors. Thus, extracting information from audio data requires to extract information by taking into account the "noise" inherent to audio data and output of automatic systems. Thus, extracting information from speech data cannot be as simple as a combination of methods that have proven themselves to solve the extraction information task on written data. It comes that, the use of technics dedicated for speech/audio data processing is mandatory, and epsecially technics which take into account the specificites of such data in relation with the corresponding signal and transcriptions (manual and automatic). This problem has given birth to a new area of research and raised new scientific challenges related to the management of the variability of speech and its spontaneous modes of expressions. Furthermore, robust analysis of phone conversations is subject to a large number of works this thesis is in the continuity.More specifically, this thesis focuses on edit disfluencies analysis and their realisation in conversational data from EDF call centres, using speech signal and both manual and automatic transcriptions. This work is linked to numerous domains, from robust analysis of speech data to analysis and management of aspects related to speech expression. The aim of the thesis is to propose appropriate methods to deal with speech data to improve text mining analyses of speech transcriptions (treatment of disfluencies). To address these issues, we have finely analysed the characteristic phenomena and behavior of spontaneous speech (disfluencies) in conversational data from EDF call centres and developed an automatic method for their detection using linguistic, prosodic, discursive and para-linguistic features.The contributions of this thesis are structured in three areas of research. First, we proposed a specification of call centre conversations from the prespective of the spontaneous speech and from the phenomena that specify it. Second, we developed (i) an enrichment chain and effective processings of speech data on several levels of analysis (linguistic, acoustic-prosodic, discursive and para-linguistic) ; (ii) an system which detect automaticcaly the edit disfluencies suitable for conversational data and based on the speech signal and transcriptions (manual or automatic). Third, from a "resource" point of view, we produced a corpus of automatic transcriptions of conversations taken from call centres which has been annotated in edition disfluencies (using a semi-automatic method).
|
5 |
The Aerodynamic, Glottographic, and Acoustic Effects of Clear Speech.Tahamtan, Mahdi 06 September 2022 (has links)
No description available.
|
Page generated in 0.0908 seconds