11 |
Revisiting user simulation in dialogue systems : do we still need them ? : will imitation play the role of simulation ? Chandramohan, Senthilkumar 25 September 2012 (has links) (PDF)
Recent advances in spoken language processing and the wide adoption of portable devices have attracted significant interest in spoken dialogue systems. These conversational systems are man-machine interfaces that use natural language (speech) as the medium of interaction. To conduct dialogues, computers must be able to decide when, and what, information has to be exchanged with the users. The dialogue management module is responsible for making these decisions so that the intended task (such as ticket booking or appointment scheduling) can be achieved. Learning a good dialogue management strategy is therefore a critical task. In recent years, reinforcement-learning-based optimization of dialogue management has become the state of the art. Most of the algorithms used for this purpose need vast amounts of training data, but generating dialogue data is an expensive and time-consuming process. To cope with this, and also to evaluate the learnt dialogue strategies, user modelling was introduced in dialogue systems. These models simulate real users in order to generate synthetic data. Being computational models, they introduce some degree of modelling error. In spite of this, system designers are forced to employ user models because of the data requirements of conventional reinforcement learning algorithms. This work shows that, with a careful choice of algorithms, optimal dialogue strategies can be learnt from a limited amount of training data compared with conventional algorithms. As a consequence, user models are no longer required for the purpose of optimization, yet they continue to provide a fast and easy means of quantifying the quality of dialogue strategies. Since existing user models are less realistic than real user behaviour, the focus then shifts to user modelling by means of inverse reinforcement learning.
Experimental results demonstrate the ability of the proposed method to learn computational models with real-user-like qualities.
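The abstract argues that conventional reinforcement learning needs so much data that designers fall back on user simulators. The sketch below illustrates only that baseline setup: tabular Q-learning trained entirely against a simulated user on a toy two-slot task. The task, the simulator (`simulated_user`, which understands a request with an assumed probability of 0.8) and the reward values are hypothetical choices for illustration; they are not the thesis's sample-efficient algorithms or its inverse-reinforcement-learning user model.

```python
import random
from collections import defaultdict

ACTIONS = ["ask_slot_0", "ask_slot_1", "close"]


def simulated_user(state, action, rng, p_understood=0.8):
    """Hypothetical user simulator: the requested slot gets filled with probability p_understood."""
    filled = list(state)
    if action.startswith("ask_slot_"):
        slot = int(action[-1])
        if rng.random() < p_understood:
            filled[slot] = 1
    return tuple(filled)


def step(state, action, rng):
    """One system turn: -1 per question, +20/-20 for closing with all/not all slots filled."""
    if action == "close":
        return state, (20.0 if all(state) else -20.0), True
    return simulated_user(state, action, rng), -1.0, False


def q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, max_turns=10, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state = (0, 0)  # no slot filled at the start of a dialogue
        for _ in range(max_turns):
            # Epsilon-greedy exploration over system actions.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, rng)
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}
```

Every training episode here is generated by the simulator, so any mismatch between `simulated_user` and real users leaks directly into the learnt policy, which is the modelling-error concern raised in the abstract.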
|
12 |
Open-ended Spoken Language Technology: Studies on Spoken Dialogue Systems and Spoken Document Retrieval Systems / 拡張可能な音声言語技術: 音声対話システムと音声文書検索システムにおける研究 Kanda, Naoyuki 24 March 2014 (has links)
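Kyoto University / 0048 / New system, doctorate by coursework / Doctor (Informatics) / Kō No. 18415 / Informatics Doctorate No. 530 / 新制||情||94 (University Library) / 31273 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hiroshi Okuno, Professor Tatsuya Kawahara, Professor Naofumi Takagi, Lecturer Kazuyoshi Yoshii / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM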
|
13 |
Engagement Recognition based on Multimodal Behaviors for Human-Robot Dialogue / ロボットとの対話におけるマルチモーダルなふるまいに基づくエンゲージメント認識 Inoue, Koji 25 September 2018 (has links)
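Kyoto University / 0048 / New system, doctorate by coursework / Doctor (Informatics) / Kō No. 21392 / Informatics Doctorate No. 678 / 新制||情||117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Toyoaki Nishida, Professor Takayuki Kanda / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM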
|
14 |
Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems / Apprentissage automatique des paramètres de l'apprentissage par renforcement pour les systèmes de dialogues adaptatifs Asri, Layla El 21 January 2016 (has links)
This document proposes to learn the behaviour of the dialogue manager of a spoken dialogue system from a set of rated dialogues. This learning is performed through reinforcement learning. The method requires neither the definition of a state-space representation nor a reward function: these two high-level parameters are learnt from the corpus of rated dialogues. It is shown that a spoken dialogue system designer can optimise dialogue management simply by defining the dialogue logic and a criterion to maximise (e.g. user satisfaction). The methodology first considers the dialogue parameters needed to compute a representation of the state space relevant to the criterion to be maximised. For instance, if the chosen criterion is user satisfaction, it is important to account for parameters such as dialogue duration and the average speech recognition confidence score. The state space is represented as a sparse distributed memory. The Genetic Sparse Distributed Memory for Reinforcement Learning (GSDMRL) accommodates many dialogue parameters and, through genetic evolution, selects those that are most important for learning. The resulting state space and the policy learnt on it are easily interpretable by the system designer. Second, the rated dialogues are used to learn a reward function that teaches the system to optimise the criterion.
Two algorithms, reward shaping and distance minimisation, are proposed to learn this reward function; both treat the criterion as the return of the entire dialogue. They are compared on simulated dialogues, and the resulting reward functions are shown to enable faster learning than using the criterion directly as the final reward. A spoken dialogue system for appointment scheduling was designed during this thesis, based on previous systems, and a corpus of rated dialogues was collected with it. This corpus illustrates the scalability of the state-space representation and is a good example of an industrial spoken dialogue system to which the methodology could be applied.
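Both reward-learning algorithms treat the developer's criterion as the return of the whole dialogue. To make that idea concrete, here is a minimal sketch of one possible reading of distance minimisation, which is our assumption rather than the thesis's exact formulation: learn one scalar reward per (hypothetical) dialogue state by gradient descent, so that the discounted sum of state rewards along each rated dialogue approaches that dialogue's rating. It does not model GSDMRL or reward shaping.

```python
def dialogue_return(dialogue, reward, gamma=0.9):
    """Discounted return of a dialogue given one reward per visited state."""
    return sum(gamma ** t * reward[s] for t, s in enumerate(dialogue))


def fit_state_rewards(dialogues, scores, gamma=0.9, lr=0.05, epochs=2000):
    """Minimise the mean squared distance between each dialogue's return and its rating.

    dialogues: list of state-id sequences (hypothetical state labels).
    scores:    dialogue-level ratings, e.g. user satisfaction.
    """
    states = sorted({s for d in dialogues for s in d})
    reward = {s: 0.0 for s in states}
    for _ in range(epochs):
        grad = {s: 0.0 for s in states}
        for dialogue, score in zip(dialogues, scores):
            err = dialogue_return(dialogue, reward, gamma) - score
            # d(err^2)/d reward[s] accumulates 2 * err * gamma^t for every visit of s at time t.
            for t, s in enumerate(dialogue):
                grad[s] += 2 * err * gamma ** t
        for s in states:
            reward[s] -= lr * grad[s] / len(dialogues)
    return reward
```

Because the learnt rewards are spread over intermediate states rather than given only at the end, a reinforcement learner receives feedback earlier in each dialogue, which is the intuition behind the faster learning reported in the abstract.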
|
15 |
Nové metody generování promluv v dialogových systémech / Novel Methods for Natural Language Generation in Spoken Dialogue Systems Dušek, Ondřej January 2017 (has links)
Title: Novel Methods for Natural Language Generation in Spoken Dialogue Systems
Author: Ondřej Dušek
Department: Institute of Formal and Applied Linguistics
Supervisor: Ing. Mgr. Filip Jurčíček, Ph.D., Institute of Formal and Applied Linguistics
Abstract: This thesis explores novel approaches to natural language generation (NLG) in spoken dialogue systems (i.e., generating system responses to be presented to the user), aiming to simplify the adaptivity of NLG in three respects: domain portability, language portability, and user-adaptive outputs. Our generators improve over the state of the art in all of them. First, our generators, which are based on statistical methods (A* search with perceptron ranking and sequence-to-sequence recurrent neural network architectures), can be trained on data without fine-grained semantic alignments, which simplifies retraining the generator for a new domain compared with previous approaches. Second, we enhance the neural-network-based generator so that it takes the preceding dialogue context (i.e., the user's way of speaking) into account, thus producing user-adaptive outputs. Third, we evaluate several extensions to the neural-network-based generator designed for producing output in morphologically rich languages, showing improvements in Czech generation. In...
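The first generator ranks candidate outputs with a perceptron inside an A* search. The sketch below shows only the perceptron-ranking part, applied to a fixed candidate list instead of a search; the feature map (pairs of dialogue-act slot value and output token), the `food=...` dialogue acts and the candidate sentences are hypothetical illustrations, not the thesis's features or data.

```python
from collections import Counter


def features(dialogue_act, candidate):
    """Hypothetical feature map: counts of (slot=value, output token) pairs."""
    return Counter((slot, tok) for slot in dialogue_act for tok in candidate.split())


def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def rank(weights, dialogue_act, candidates):
    """Candidates sorted from highest to lowest perceptron score."""
    return sorted(candidates, key=lambda c: score(weights, features(dialogue_act, c)), reverse=True)


def train_perceptron(examples, epochs=10):
    """examples: list of (dialogue_act, candidates, gold_candidate) triples."""
    weights = {}
    for _ in range(epochs):
        for da, candidates, gold in examples:
            predicted = max(candidates, key=lambda c: score(weights, features(da, c)))
            if predicted != gold:
                # Standard perceptron update: promote gold features, demote predicted ones.
                for f, v in features(da, gold).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(da, predicted).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

Only mistakes trigger updates, so features shared by the gold and predicted outputs cancel out and the weights end up on the tokens that actually distinguish a correct realisation of the dialogue act.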
|
16 |
Context-dependent voice commands in spoken dialogue systems for home environments : A study on the effect of introducing context-dependent voice commands to a spoken dialogue system for home environments Dahlgren, Karl January 2013 (has links)
This thesis investigates the effect that context could have on the interaction between a user and a spoken dialogue system. It was assumed that using context-dependent voice commands instead of absolute semantic voice commands would make the dialogue more natural and increase usability. The thesis also investigates whether introducing context could affect the user's privacy and whether, from the user's perspective, it could pose a threat. Based on an extended literature review of spoken dialogue systems, voice recognition, ambient intelligence, human-computer interaction and privacy, a spoken dialogue system was designed and implemented to test the assumption. The study comprised two steps: an experiment and an interview. Participants carried out different scenarios in which a spoken dialogue system could be used with both context-dependent and absolute semantic commands. The qualitative results regarding naturalness, usability and privacy validated the author's hypothesis to some extent. They indicated that the interaction between users and spoken dialogue systems was more natural, and usability higher, when context was used. Participants did not feel more monitored by the spoken dialogue system when context was used. Some participants stated that there could be theoretical privacy issues, but only if not all security measures were in place. The thesis concludes with suggestions for future work in the field.
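The contrast between the two command styles can be illustrated with a toy resolver: an absolute semantic command names everything ("turn off the kitchen lamp"), whereas a context-dependent command ("turn it on") is completed from the dialogue context. The room and device inventory, the `Context` fields and the keyword matching below are hypothetical simplifications, not the system built in the thesis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical home inventory.
ROOMS = {"kitchen", "bedroom"}
DEVICES = {"lamp", "radio"}
INSTALLED = {("kitchen", "lamp"), ("kitchen", "radio"), ("bedroom", "lamp")}


@dataclass
class Context:
    room: Optional[str] = None    # last room referred to
    device: Optional[str] = None  # last device referred to


def resolve(command: str, ctx: Context):
    """Return (action, room, device), or None if the command cannot be grounded."""
    words = command.lower().split()
    action = "off" if "off" in words else "on" if "on" in words else None
    # Explicit mentions win; otherwise fall back on the dialogue context.
    room = next((w for w in words if w in ROOMS), ctx.room)
    device = next((w for w in words if w in DEVICES), None)
    if device is None and "it" in words:
        device = ctx.device
    if action is None or (room, device) not in INSTALLED:
        return None  # a real system would ask a clarification question here
    ctx.room, ctx.device = room, device
    return action, room, device
```

For example, after "turn off the kitchen lamp", the follow-up "turn it on" resolves to the kitchen lamp and "turn on the radio" to the kitchen radio, while "turn it off" in a fresh context returns None. Storing the last referents is also what raises the privacy question studied in the thesis, since the system must retain information about recent interactions.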
|
17 |
Spoken Dialogue System for Information Navigation based on Statistical Learning of Semantic and Dialogue Structure / 意味・対話構造の統計的学習に基づく情報案内のための音声対話システム Yoshino, Koichiro 24 September 2014 (has links)
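Kyoto University / 0048 / New system, doctorate by coursework / Doctor (Informatics) / Kō No. 18614 / Informatics Doctorate No. 538 / 新制||情||95 (University Library) / 31514 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Tatsuya Kawahara, Professor Sadao Kurohashi, Professor Hisashi Kashima / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM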
|
18 |
Deep Learning basierte Sprachinteraktion für Social Assistive Robots / Deep-learning-based voice interaction for socially assistive robots Guhr, Oliver 25 September 2024 (has links)
In this dissertation, a Voice User Interface (VUI) for Socially Assistive Robots (SARs) was designed and developed, with the aim of enabling voice-based interaction in care applications. This work fills a research gap by developing a VUI that operates with natural, everyday German. The focus was on using advances in deep-learning-based speech and language processing to meet the requirements of robotics and of the user groups. Two central research questions were addressed: what requirements a VUI for SARs in care must meet, and how such a VUI can be designed and implemented. The work discusses the specific requirements that robotics and users place on a VUI. Furthermore, the planned application scenarios and user groups of the developed VUI, including its use in dementia therapy and in care homes, are described in detail. The main part of the thesis presents the designed VUI, which is characterised by its offline capability and by the integration of the robot's external sensors and actuators into the VUI. The thesis also covers the central building blocks for implementing the VUI, including speech recognition, processing of transcribed text, sentiment analysis and text segmentation. The dialogue management model developed in this work and an evaluation of current speech synthesis systems are also discussed. In a user study, the applicability of the VUI was tested: without specific training, participants successfully solved 93% of the tasks. Future research and development activities include long-term evaluations of the VUI in robotics and the development of a digital assistant. The integration of LLMs and multimodal models into VUIs is another important research focus, as is increasing the efficiency of deep learning models for mobile robots.
Contents:
Summary (German)
Abstract
1 Introduction
1.1 Motivation
1.2 Problem statement
1.2.1 Robotics in care
1.2.2 Ambient Assisted Living
1.3 Objectives and research questions
1.4 Structure of the thesis
2 Fundamentals
2.1 Socially Assistive Robotics
2.2 Voice User Interfaces
2.3 Deep learning for speech and language processing
3 Concept
3.1 Requirements of older people for VUIs
3.2 Requirements of robotics for VUIs
3.3 Application context
3.3.1 Robot-assisted MAKS therapy
3.3.2 AAL apartment
3.4 User analysis
3.5 Development goals for the VUI
4 System architecture
4.1 Architecture decisions
4.2 Components of the VUI
4.2.1 System context
4.2.2 General component model
4.2.3 Detailed component model
4.3 Modular, extensible interaction
4.3.1 Interaction through skills
4.3.2 Interface model of the skills
4.3.3 Implementation model of the skills
5 Speech recognition
5.1 From the spoken word to text
5.2 Voice Activity Detection
5.3 Automatic Speech Recognition
5.3.1 Evaluation
5.3.2 Optimising the models for CPU inference
6 Language processing
6.1 From text to intent
6.2 Intent recognition and named entity recognition
6.3 Segmentation of utterances
6.3.1 Dataset
6.3.2 Models
6.3.3 Training
6.3.4 Ablation studies
6.3.5 Results
6.4 Text-based sentiment analysis
6.4.1 Data
6.4.2 Models
6.4.3 Results
7 Dialogue management
7.1 From intent to action
7.2 Methods for dialogue management
7.3 A modular dialogue manager
7.4 Speech synthesis
8 Evaluation of the framework
8.1 Setup and objectives
8.2 Test design and methodology
8.3 Study participants
8.4 Analysis and interpretation of the results
8.5 Limitations of the study
8.6 Conclusion
9 Summary and outlook
9.1 Summary
9.2 Outlook
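Section 4.3 of this thesis covers modular, extensible interaction through skills, with intent recognition and named entity recognition feeding the dialogue manager. A minimal sketch of how a recognised intent and its entities might be routed to a skill is shown below; the `SkillRegistry` class, the intent names and the reply strings are hypothetical and do not reproduce the thesis's actual skill interface.

```python
from typing import Callable, Dict

# A skill receives the extracted entities and returns the text to be spoken.
SkillHandler = Callable[[Dict[str, str]], str]


class SkillRegistry:
    """Hypothetical intent-to-skill dispatcher."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillHandler] = {}

    def register(self, intent: str) -> Callable[[SkillHandler], SkillHandler]:
        """Decorator that binds a handler function to an intent name."""
        def decorator(handler: SkillHandler) -> SkillHandler:
            self._skills[intent] = handler
            return handler
        return decorator

    def dispatch(self, intent: str, entities: Dict[str, str]) -> str:
        handler = self._skills.get(intent)
        if handler is None:
            return "Sorry, I did not understand that."  # fallback when no skill matches
        return handler(entities)


registry = SkillRegistry()


@registry.register("greet")
def greet(entities: Dict[str, str]) -> str:
    return "Hello! How can I help you?"


@registry.register("set_reminder")
def set_reminder(entities: Dict[str, str]) -> str:
    return f"I will remind you at {entities.get('time', 'the requested time')}."
```

Registering skills through a decorator keeps each skill self-contained, so new interactions can be added without touching the dispatcher, which matches the modular, extensible design goal named in the outline.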
|
19 |
Hierarchical reinforcement learning for spoken dialogue systems Cuayáhuitl, Heriberto January 2009 (has links)
This thesis focuses on the problem of scalable optimization of dialogue behaviour in speech-based conversational systems using reinforcement learning. Most previous investigations in dialogue strategy learning have proposed flat reinforcement learning methods, which are more suitable for small-scale spoken dialogue systems. This research formulates the problem in terms of Semi-Markov Decision Processes (SMDPs), and proposes two hierarchical reinforcement learning methods to optimize sub-dialogues rather than full dialogues. The first method uses a hierarchy of SMDPs, where every SMDP ignores irrelevant state variables and actions in order to optimize a sub-dialogue. The second method extends the first one by constraining every SMDP in the hierarchy with prior expert knowledge. The latter method proposes a learning algorithm called 'HAM+HSMQ-Learning', which combines two existing algorithms in the literature of hierarchical reinforcement learning. Whilst the first method generates fully-learnt behaviour, the second one generates semi-learnt behaviour. In addition, this research proposes a heuristic dialogue simulation environment for automatic dialogue strategy learning. Experiments were performed on simulated and real environments based on a travel planning spoken dialogue system. Experimental results provided evidence to support the following claims: First, both methods scale well at the cost of near-optimal solutions, resulting in slightly longer dialogues than the optimal solutions. Second, dialogue strategies learnt with coherent user behaviour and conservative recognition error rates can outperform a reasonable hand-coded strategy. Third, semi-learnt dialogue behaviours are a better alternative (because of their higher overall performance) than hand-coded or fully-learnt dialogue behaviours. 
Last, hierarchical reinforcement learning dialogue agents are feasible and promising for the (semi-)automatic design of adaptive behaviours in larger-scale spoken dialogue systems. This research makes the following contributions to spoken dialogue systems that learn their dialogue behaviour. First, the Semi-Markov Decision Process (SMDP) model was proposed to learn spoken dialogue strategies in a scalable way. Second, the concept of 'partially specified dialogue strategies' was proposed for simultaneously integrating hand-coded and learnt spoken dialogue behaviours into a single learning framework. Third, hierarchical reinforcement learning dialogue agents were evaluated with real users, which was essential to validate their effectiveness in a realistic environment.
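HSMQ-Learning, one of the two algorithms combined in HAM+HSMQ-Learning, rests on the SMDP form of Q-learning, where a single action (for example a whole sub-dialogue) can last τ turns. The update is Q(s,a) ← Q(s,a) + α[R + γ^τ max_a' Q(s',a') − Q(s,a)], with R the discounted reward accumulated during those τ turns. The function below is a minimal sketch of that single update on a dictionary-based Q-table; the hierarchy of SMDPs and the HAM constraints are not shown, and any state or action names used with it are hypothetical.

```python
def smdp_q_update(q, state, action, rewards, next_state, next_actions,
                  alpha=0.1, gamma=0.99, terminal=False):
    """Apply one SMDP Q-learning update in place and return the new Q(state, action).

    `rewards` holds the per-turn rewards r_t, ..., r_{t+tau-1} collected while
    `action` (e.g. a sub-dialogue) was executing, so tau = len(rewards).
    """
    tau = len(rewards)
    # Discounted reward accumulated during the temporally extended action.
    accumulated = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Bootstrap from the best next action, discounted by gamma^tau.
    if terminal:
        bootstrap = 0.0
    else:
        bootstrap = gamma ** tau * max(q.get((next_state, a), 0.0) for a in next_actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (accumulated + bootstrap - old)
    return q[(state, action)]
```

For example, with α = 1 and γ = 0.5, a three-turn sub-dialogue with rewards −1, −1, −1 that ends in a state whose best Q-value is 10 receives −1.75 + 0.125 × 10 = −0.5. Discounting by γ^τ rather than γ is what lets one Q-table compare sub-dialogues of different lengths.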
|
20 |
Revisiting user simulation in dialogue systems : do we still need them ? : will imitation play the role of simulation ? / Revisiter la simulation d'utilisateurs dans les systèmes de dialogue parlé : est-elle encore nécessaire ? : est-ce que l'imitation peut jouer le rôle de la simulation ? Chandramohan, Senthilkumar 25 September 2012 (has links)
Recent progress in language processing has drawn significant interest to the implementation of spoken dialogue systems. These systems are interfaces that use natural language as the medium of interaction between the system and the user. The dialogue management module decides when the information it selects should be exchanged with the user. In recent years, optimising spoken dialogue with reinforcement learning has become the reference approach. However, many of the algorithms used require a large amount of data to be effective. To address this problem, user simulations were introduced; these models, however, introduce errors. With a judicious choice of algorithms, the amount of training data can be reduced and user modelling thereby avoided. This work constitutes one part of the contributions presented. The other part consists of modelling users from real user data by means of inverse reinforcement learning.
|