21 |
Recognizing emotions in spoken dialogue with acoustic and lexical cues
Tian, Leimin, January 2018
Automatic emotion recognition has long been a focus of Affective Computing. It has become increasingly apparent that awareness of human emotions in Human-Computer Interaction (HCI) is crucial for advancing related technologies, such as dialogue systems. However, performance of current automatic emotion recognition is disappointing compared to human performance. Current research on emotion recognition in spoken dialogue focuses on identifying better feature representations and recognition models from a data-driven point of view. The goal of this thesis is to explore how incorporating prior knowledge of human emotion recognition in the automatic model can improve state-of-the-art performance of automatic emotion recognition in spoken dialogue. Specifically, we study this by proposing knowledge-inspired features representing occurrences of disfluency and non-verbal vocalisation in speech, and by building a multimodal recognition model that combines acoustic and lexical features in a knowledge-inspired hierarchical structure. In our study, emotions are represented with the Arousal, Expectancy, Power, and Valence emotion dimensions. We build unimodal and multimodal emotion recognition models to study the proposed features and modelling approach, and perform emotion recognition on both spontaneous and acted dialogue. Psycholinguistic studies have suggested that DISfluency and Non-verbal Vocalisation (DIS-NV) in dialogue are related to emotions. However, these affective cues in spoken dialogue are overlooked by current automatic emotion recognition research. Thus, we propose features for recognizing emotions in spoken dialogue which describe five types of DIS-NV in utterances, namely filled pause, filler, stutter, laughter, and audible breath. Our experiments show that this small set of features is predictive of emotions.
Our DIS-NV features achieve better performance than benchmark acoustic and lexical features for recognizing all emotion dimensions in spontaneous dialogue. Consistent with Psycholinguistic studies, the DIS-NV features are especially predictive of the Expectancy dimension of emotion, which relates to speaker uncertainty. Our study illustrates the relationship between DIS-NVs and emotions in dialogue, which contributes to Psycholinguistic understanding of them as well. Note that our DIS-NV features are based on manual annotations, yet our long-term goal is to apply our emotion recognition model to HCI systems. Thus, we conduct preliminary experiments on automatic detection of DIS-NVs, and on using automatically detected DIS-NV features for emotion recognition. Our results show that DIS-NVs can be automatically detected from speech with stable accuracy, and auto-detected DIS-NV features remain predictive of emotions in spontaneous dialogue. This suggests that our emotion recognition model can be applied to a fully automatic system in the future, and holds the potential to improve the quality of emotional interaction in current HCI systems. To study the robustness of the DIS-NV features, we conduct cross-corpora experiments on both spontaneous and acted dialogue. We identify how dialogue type influences the performance of DIS-NV features and emotion recognition models. DIS-NVs contain additional information beyond acoustic characteristics or lexical contents. Thus, we study the gain of modality fusion for emotion recognition with the DIS-NV features. Previous work combines different feature sets by fusing modalities at the same level using two types of fusion strategies: Feature-Level (FL) fusion, which concatenates feature sets before recognition; and Decision-Level (DL) fusion, which makes the final decision based on outputs of all unimodal models. However, features from different modalities may describe data at different time scales or levels of abstraction. 
Moreover, Cognitive Science research indicates that when perceiving emotions, humans make use of information from different modalities at different cognitive levels and time steps. Therefore, we propose a HierarchicaL (HL) fusion strategy for multimodal emotion recognition, which incorporates features that describe data at a longer time interval or which are more abstract at higher levels of its knowledge-inspired hierarchy. Compared to FL and DL fusion, HL fusion incorporates both inter- and intra-modality differences. Our experiments show that HL fusion consistently outperforms FL and DL fusion on multimodal emotion recognition in both spontaneous and acted dialogue. The HL model combining our DIS-NV features with benchmark acoustic and lexical features improves current performance of multimodal emotion recognition in spoken dialogue. To study how other emotion-related tasks of spoken dialogue can benefit from the proposed approaches, we apply the DIS-NV features and the HL fusion strategy to recognize movie-induced emotions. Our experiments show that although designed for recognizing emotions in spoken dialogue, DIS-NV features and HL fusion remain effective for recognizing movie-induced emotions. This suggests that other emotion-related tasks can also benefit from the proposed features and model structure.
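The three fusion strategies named in the abstract above (FL, DL and HL) can be contrasted in a minimal Python sketch. Everything in it is an illustrative assumption rather than the thesis's implementation: the toy linear models, the weights, the feature values, the averaging rule used for DL, and in particular the ordering of modalities in the hierarchy (acoustic lowest, DIS-NV highest), which the abstract does not specify.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def linear_model(weights: Sequence[float]) -> Model:
    """Toy regressor (assumption): dot product of fixed weights and the input."""
    def predict(features: Sequence[float]) -> float:
        if len(features) != len(weights):
            raise ValueError("feature/weight length mismatch")
        return sum(w * x for w, x in zip(weights, features))
    return predict


def feature_level_fusion(feature_sets: Sequence[Sequence[float]], model: Model) -> float:
    """FL: concatenate all feature sets, then run one model on the result."""
    fused: List[float] = [x for fs in feature_sets for x in fs]
    return model(fused)


def decision_level_fusion(feature_sets: Sequence[Sequence[float]],
                          models: Sequence[Model]) -> float:
    """DL: run one model per modality, then average the unimodal outputs."""
    outputs = [m(fs) for m, fs in zip(models, feature_sets)]
    return sum(outputs) / len(outputs)


def hierarchical_fusion(feature_sets: Sequence[Sequence[float]],
                        layer_models: Sequence[Model]) -> float:
    """HL: the first layer sees only the lowest-level features; each later
    layer sees the previous layer's output plus the next feature set."""
    if not feature_sets:
        raise ValueError("need at least one feature set")
    carried: List[float] = []
    for fs, model in zip(feature_sets, layer_models):
        carried = [model(carried + list(fs))]
    return carried[0]


# Hypothetical feature values for one utterance.
acoustic = [0.5, 1.5]
lexical = [2.0]
dis_nv = [1.0, 1.0]
modalities = [acoustic, lexical, dis_nv]  # assumed order: low level -> high level

fl = feature_level_fusion(modalities, linear_model([1.0] * 5))
dl = decision_level_fusion(
    modalities,
    [linear_model([1.0, 1.0]), linear_model([1.0]), linear_model([1.0, 1.0])],
)
hl = hierarchical_fusion(
    modalities,
    [linear_model([1.0, 1.0]), linear_model([0.5, 0.5]), linear_model([0.5, 0.25, 0.25])],
)
print(fl, dl, hl)
```

The point of the sketch is structural: FL and DL treat all modalities at the same level, whereas HL lets each feature set enter at a different depth of the model.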
|
22 |
Hesitações na fala semi-espontanea : analise por series temporais / Hesitation phenomena in semi-spontaneous speech : a time series analysis
Merlo, Sandra, 1979-, 20 February 2006
Advisor: Plinio Almeida Barbosa / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem
Previous issue date: 2006 / Abstract: The focus of this experimental research is the temporal behavior of hesitation phenomena in semi-spontaneous speech. It examined whether hesitation phenomena occur periodically in spoken texts, and how they relate to text type, picture support and declarative knowledge. The subjects were five young male adults, university students, native speakers of Brazilian Portuguese with no history of communication impairments.
Each subject produced four texts: a state description from a picture of a bedroom, a state description of his own bedroom, a narrative from a cartoon, and a narrative about an experienced event. Hesitation phenomena were classified as silent hesitation pauses, filled pauses, hesitative repetitions, hesitative prolongations and false starts (retraced and unretraced); signs that were not considered hesitation phenomena include fluent silent pauses, fluent or reformulative repetitions, fluent or reformulative prolongations, paraphrases, corrections and discourse markers. The texts were transcribed and the intervals with and without hesitation phenomena were distinguished. Hesitation intervals received the number "0" and non-hesitation intervals received the number "1". The number codes were sampled at intervals of 200 milliseconds to generate time series. Descriptive statistics indicated that the duration of hesitation intervals was consistent with a gamma distribution (the null hypothesis of a gamma distribution was not rejected), with mean and median around 1 second, a minimum of 120 milliseconds and a maximum of 5 seconds. Relative to total text duration, the mean and median proportion of hesitation were around 20%. Spectral analysis detected hesitation periodicities in all texts, with mean and median around 10 seconds, a minimum of 2 seconds and a maximum of 78 seconds. The periodic organization supports the notion that hesitation phenomena do not occur at random in time, because their oscillations repeat through time, which points to dynamically stable phenomena that can be anticipated. The texts usually presented more than one periodicity; these were attributed to macroplanning, microplanning, the grammatical encoder and the phonological encoder, and no periodicity was attributed to articulation. The attribution of the observed periodicities to linguistic-cognitive processes supports the notion of hesitation phenomena as a characteristic of ongoing processing.
The presence of more than one periodicity in the same text suggests that spoken language is processed in parallel in working memory, with resources being shared by different linguistic-cognitive processes at the same time. Non-parametric statistics did not indicate significant differences when periodicities were compared with regard to text type (descriptive versus narrative texts), presence or absence of picture support (picture description and cartoon narrative versus personal description and personal narrative) and declarative knowledge type (semantic knowledge in picture description, personal description and cartoon narrative versus episodic knowledge in personal narrative), suggesting that hesitation phenomena are also a characteristic of the speaker and not just of ongoing processing. / Master's / Phonetics and Phonology / Master in Linguistics
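The coding and sampling steps described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's procedure: the interval data are invented, times are held as integer milliseconds, and the spectral step is a plain discrete Fourier periodogram (the abstract says only "spectral analysis"). The function names `sample_codes` and `dominant_period_s` are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def sample_codes(hesitations_ms: Sequence[Tuple[int, int]],
                 total_ms: int, step_ms: int = 200) -> List[int]:
    """Code each sample as 0 (inside a hesitation interval) or 1 (outside),
    sampling every `step_ms` milliseconds as in the abstract."""
    return [
        0 if any(start <= t < end for start, end in hesitations_ms) else 1
        for t in range(0, total_ms, step_ms)
    ]


def dominant_period_s(series: Sequence[int], step_ms: int = 200) -> float:
    """Return the period (in seconds) of the strongest non-zero frequency
    in a simple DFT periodogram (assumed method)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the constant component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("series is constant; no periodicity to report")
    return n * step_ms / 1000 / best_k


# Invented example: a 60 s text with a 1 s hesitation starting every 10 s.
hesitations = [(k * 10_000, k * 10_000 + 1_000) for k in range(6)]
series = sample_codes(hesitations, total_ms=60_000)
period = dominant_period_s(series)
print(len(series), series.count(0), period)
```

Integer milliseconds are used so that interval boundaries fall exactly on sample times, which avoids floating-point drift when stepping in units of 0.2 s.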
|
23 |
Marcadores metadiscursivos, fluidez y participación conversacional en español L2 : La evolución de la competencia comunicativa durante la estancia en una comunidad de la lengua meta / Metadiscourse markers, fluency and conversational participation in L2 Spanish : The development of communicative competence during the stay in a target language community
Lindqvist, Helena, January 2017
This study investigates the acquisition and use of metadiscourse markers in learners/users of L2 Spanish and the role these markers play in the development of fluency and conversational participation during a five-month stay in Spain as exchange students of business administration. The study has been conducted in three steps. The first part focuses on the theory and categorization of metadiscourse markers, followed by an analysis of the use and development of these markers in learners of L2 Spanish. The second part deals with the categorization and operationalization of aspects of fluency and conversational participation that can be associated with the use of metadiscourse markers, followed by an analysis of these aspects in the performance of the learners. The third part of the study is a summary of the results obtained and a discussion of the relationship between the use of metadiscourse markers and the development of fluency and conversational participation. The data underlying the current study consists of a selection of 17 recorded conversations between learners of L2 Spanish and native speakers of Spanish taken from the AKSAM database. The conversations belong to two activity types: discussions and simulated negotiations. The selected sample has a duration of approx. 10 hours and comprises 87 683 words. The study focuses on nine learners who were recorded at the beginning and at the end of their five-month study-abroad stay. Results show that the frequency of use of metadiscourse markers increased considerably by the end of the stay for the majority of the learners under study. A qualitative development can also be found, through which the metadiscourse markers that characterize the learners’ L1 and/or interlanguage have been substituted by more target-like expressions. Furthermore, both their fluency and level of conversational participation have generally increased. Within this development, however, a notable individual variation can be found.
The learners who show the strongest development as regards fluency and conversational participation are also found to exhibit the most salient development of metadiscourse markers. Since disfluency is reduced to a lesser degree in those participants who also exhibit a less developed use of metadiscourse markers, it is argued that the development of metadiscourse markers in the L2 learner runs parallel to the development of discourse skills, but also that acquiring an adequate use of metadiscourse markers helps develop these skills.
|
24 |
Examining the Influence of Disfluencies on Reaction Time : An Exploratory Study Investigating the Impact of Entropy in Language
Jansson, Alexander, January 2023
The current study aimed to investigate the effects of disfluencies, specifically filled pauses (FP) and unfilled pauses (UP), on reaction time (RT) to target words and hyponyms of target hypernyms (targets = target words + hyponyms of target hypernyms). Two experiments were conducted, with the first experiment examining the impact of disfluencies on RTs to target words in utterances, while the second experiment explored their effect on hyponyms of target hypernyms. The experiments were conducted on six participants, comprising two females and four males, following a within-subject design.

Unlike previous studies that have examined disfluencies in the Swedish language, this study distinguished between two categories of filled pauses, namely ehm (E:m, @:m, œ:m, or æ:m) and öh (E:, @:, œ:, or æ:). However, the results revealed that this distinction had no significant effect on RTs. Conversely, a significant difference in RTs was observed between genders, with women exhibiting faster reaction times compared to men. Participants generally reacted more swiftly to target words than to hyponyms of target hypernyms.

Interestingly, filled pauses were found to reduce reaction times to target hypernyms compared to unfilled pauses. However, they did not demonstrate a similar effect on reaction times to target words. Caution should be exercised in interpreting these results due to the limited sample size. Nevertheless, these findings have intriguing implications for the entropy hypothesis (EH) and attention-heightening hypothesis (AHH) concerning filled pause production.
|
25 |
Analyse et détection automatique de disfluences dans la parole spontanée conversationnelle / Disfluency analysis and automatic detection in conversational spontaneous speech
Dutrey, Camille, 16 December 2014
Extracting information from linguistic data has attracted growing attention in recent decades, given the ever-increasing amount of information that has to be processed and analysed on a regular basis. Since the 1990s, this interest in information extraction has extended to research on speech data. Speech raises additional problems compared with written text, both because of phenomena specific to spoken language (e.g. hesitations, restarts, corrections) and because spoken data are processed by an automatic speech recognition system that can introduce errors. Extracting information from audio data therefore requires taking into account the "noise" that is inherent to speech or generated by the speech recognition system. It cannot be a simple application of methods that have proven themselves on written data. The use of techniques suited to speech data, which take into account the specificities of such data in relation to both the speech signal and its transcription (manual or automatic), is a fast-developing research area that raises new scientific challenges. These challenges relate to handling the variability of speech and spontaneous modes of expression. Furthermore, the robust analysis of telephone conversations has been the subject of a number of studies, and this thesis continues that line of work.

More specifically, this thesis focuses on the analysis of disfluencies and their realisation in conversational data from EDF call centres, using the speech signal and both manual and automatic transcriptions. This work draws on several domains, from the robust analysis of speech data to the analysis and management of aspects related to spoken expression. The aim of the thesis is to propose methods suited to these data that improve the text mining analyses performed on transcriptions (treatment of disfluencies). To address these issues, we analysed in detail the behaviour of phenomena characteristic of spontaneous speech (disfluencies) in conversational data from EDF call centres, and developed an automatic method for their detection using linguistic, acoustic-prosodic, discursive and para-linguistic features.

The contributions of this thesis are structured along three lines of research. First, we propose a characterisation of call centre conversations from the perspective of spontaneous speech and the phenomena that characterise it. Second, we developed (i) a chain for enriching and processing speech data that operates on several levels of analysis (linguistic, prosodic, discursive, para-linguistic); (ii) a system for the automatic detection of edit disfluencies suited to conversational speech data, using the signal and the transcriptions (manual or automatic). Third, from a "resource" point of view, we produced a corpus of automatic transcriptions of call centre conversations annotated for edit disfluencies (using a semi-automatic method).
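As a rough illustration of what one purely lexical cue for edit disfluencies might look like, the Python sketch below flags word sequences that are immediately repeated in a transcript. It is an assumption-laden toy, not the detection system described in the abstract, which also draws on acoustic-prosodic, discursive and para-linguistic features. The function name `find_repetitions`, the `max_len` parameter and the example sentences are all hypothetical.

```python
from typing import List, Sequence, Tuple


def find_repetitions(tokens: Sequence[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, length) for each token span that is immediately
    followed by an identical copy of itself, preferring longer spans."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(max_len, 0, -1):
            # Skip span lengths whose repeated copy would run past the end.
            if i + 2 * n > len(tokens):
                continue
            if list(tokens[i:i + n]) == list(tokens[i + n:i + 2 * n]):
                spans.append((i, n))
                i += n  # jump to the repeated copy and continue from there
                matched = True
                break
        if not matched:
            i += 1
    return spans


# Invented transcript fragments, already tokenised on whitespace.
single_words = find_repetitions("i i want to to go to the the station".split())
two_words = find_repetitions("we can we can start".split())
print(single_words, two_words)
```

Each returned span marks the first copy of a repeated sequence, which is the part a clean-up step would typically remove; restarts, substitutions and filled pauses would need additional cues.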
|
26 |
Infusing Dysfluency into Rhetoric and Composition: Overcoming the Stutter
Meyer, Craig A., 25 September 2013
No description available.
|