Global ETD Search

1	Collaborative Communication Interruption Management System (C-CIMS): Modeling Interruption Timings via Prosodic and Topic Modelling for Human-Machine Teams
Peters, Nia S.
01 December 2017
Human-machine teaming aims to meld human cognitive strengths and the unique capabilities of smart machines to create intelligent teams adaptive to rapidly changing circumstances. One major contributor to the problem of human-machine teaming is a lack of communication skills on the part of the machine. This research focuses on a machine's interruption timings, that is, when a machine should share and communicate information with human teammates within human-machine teaming interactions. Previous work addresses interruption timings from the perspective of single-human, multitasking interactions and multiple-human, single-task interactions. The primary aim of this dissertation is to augment this area by approaching the same problem from the perspective of a multiple-human, multitasking interaction. The proposed machine is the Collaborative Communication Interruption Management System (C-CIMS), which is tasked with leveraging speech information from a human-human task and making inferences on when to interrupt with information related to an orthogonal human-machine task. This study and previous literature both suggest monitoring task boundaries and engagement as candidate moments of interruptibility within multiple-human, multitasking interactions. The goal then becomes designing an intermediate step between human teammate communication and points of interruptibility within these interactions. The proposed intermediate step is the mapping of low-level speech information, such as prosodic and lexical information, onto higher constructs indicative of interruptibility. C-CIMS is composed of a Task Boundary Prosody Model, a Task Boundary Topic Model, and a Task Engagement Topic Model.
Each of these components is evaluated separately in terms of its performance within two different simulated human-machine teaming scenarios, its speed vs. accuracy tradeoffs, and its other limitations. Overall, the Task Boundary Prosody Model is tractable within a real-time system because of the low latency in processing prosodic information, but it is less accurate at predicting task boundaries, even within human-machine interactions with simple dialogue. Conversely, the Task Boundary and Task Engagement Topic Models do well at inferring task boundaries and engagement respectively, but are intractable in a real-time system because of the bottleneck in producing automatic speech recognition transcriptions to make interruption decisions. The overall contribution of this work is a novel approach to predicting interruptibility within human-machine teams by modeling higher constructs indicative of interruptibility using low-level speech information.
Keywords: human machine teaming, interruption management system, prosody modeling, topic modeling
2	Génération de parole expressive dans le cas des langues à tons / Generation the expressive speech in case of tonal languages
Mac, Dang Khoa
15 June 2012
Human-computer interaction is becoming more natural and increasingly similar to human-human interaction, including in its expressiveness (especially emotions and attitudes). In spoken communication, attitudes, and social affects more generally, are conveyed mainly through prosody. In tonal languages, prosody is also used to encode semantic information via tones. This thesis presents a study of social affects in Vietnamese, a tonal and under-resourced language, in order to apply the results to a high-quality synthesis system capable of producing expressive speech in Vietnamese. The first task of this thesis is the construction of the first audio-visual corpus of Vietnamese attitudes, which contains sixteen attitudes. This corpus is then used to study the audio-visual and intercultural perception of Vietnamese attitudes. A series of perceptual tests was carried out with native and non-native listeners (French speakers for the non-native group). The results show that the factors influencing the perception of attitudes are the attitudinal expression itself and the modality of presentation (audio, visual, and audio-visual).
These results also allowed us to identify social affects that are common to, or cross-cultural between, Vietnamese and French. Another perception test was carried out using sentences with tonal variation to study the influence of the Vietnamese tone system on the perception of attitudes. The results show that non-native listeners can process and separate the local cues of tones and the global, salient prosodic cues of attitude patterns. After presenting our studies on Vietnamese social affects, we describe our modelling of attitude prosody for Vietnamese expressive speech synthesis. Based on the model of superposition of functional contours, we propose a method to model and generate expressive prosody in Vietnamese. This method was applied to generate Vietnamese expressive speech and then evaluated in perceptual tests on synthetic utterances. The perception results validate the performance of our model and confirm that the superposition of functional contours approach can be used to model complex prosody, as in the expressive speech of a tonal language.
Keywords: Expressive speech, Speech synthesis, Vietnamese, Social affects, Prosodic contours, Prosody modeling
