Global ETD Search

21	対話事例を利用した音声対話システム / A spoken dialogue system using dialogue examples Inagaki, Yasuyoshi, Matsubara, Shigeki, Kawaguchi, Nobuo, Murao, Hiroya, 稲垣, 康善, 松原, 茂樹, 河口, 信夫, 村尾, 浩也 01 December 2000 (has links) 情報処理学会研究報告. SLP, 音声言語情報処理; 2000-SLP-34-34 Car environment Language understanding Speech recognition Spoken dialogue 自動車 言語理解 音声認識 音声対話
22	実走行車内音声対話コーパスの設計と特徴 / Design and characteristics of a spoken dialogue corpus collected in a moving car 河口, 信夫, Kawaguchi, Nobuo, 松原, 茂樹, Matsubara, Shigeki, 若松, 佳広, Wakamatsu, Yoshihiro, 梶田, 将司, Kajita, Masashi, 武田, 一哉, Takeda, Kazuya, 板倉, 文忠, Itakura, Fumitada, 稲垣, 康善, Inagaki, Yasuyoshi 12 1900 (has links) No description available. 音声コーパス 車内対話 実走行環境 ドライバー発話 speech corpus spoken dialogue in-car speech moving car environment
23	Reinforcement learning and reward estimation for dialogue policy optimisation Su, Pei-Hao January 2018 (has links) Modelling dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for goal-oriented applications, which usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue systems, the reward is hard to measure, because the goal of the conversation is often known only to the user. Certainly, the system can ask the user if the goal has been satisfied, but this can be intrusive. Furthermore, in practice, the reliability of the user’s response has been found to be highly variable. In addition, due to the sparsity of the reward signal and the large search space, reinforcement learning-based dialogue policy optimisation is often slow. This thesis presents several approaches to address these problems. To better evaluate a dialogue for policy optimisation, two methods are proposed. First, a recurrent neural network-based predictor pre-trained from off-line data is proposed to estimate task success during subsequent on-line dialogue policy learning to avoid noisy user ratings and problems related to not knowing the user’s goal. Second, an on-line learning framework is described where a dialogue policy is jointly trained alongside a reward function modelled as a Gaussian process with active learning. This mitigates the noisiness of user ratings and minimises user intrusion. It is shown that both off-line and on-line methods achieve practical policy learning in real-world applications, while the latter provides a more general joint learning system directly from users. To enhance the policy learning speed, the use of reward shaping is explored and shown to be effective and complementary to the core policy learning algorithm. 
Furthermore, as deep reinforcement learning methods have the potential to scale to very large tasks, this thesis also investigates the application to dialogue systems. Two sample-efficient algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are introduced. In addition, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning to handle the cold start problem. Combining these two methods, a practical approach is demonstrated to effectively learn deep reinforcement learning-based dialogue policies in a task-oriented information seeking domain. Overall, this thesis provides solutions which allow truly on-line and continuous policy learning in spoken dialogue systems.
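The reward shaping mentioned in the abstract of entry 23 can be illustrated with the generic potential-based form, in which a term gamma * phi(s') - phi(s) is added to the environment reward. This is a minimal sketch under stated assumptions, not the method of the thesis: the potential function `slots_filled_potential`, the dictionary state layout, and the discount values are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict

State = Dict[str, int]


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Return reward + gamma * potential(next_state) - potential(state)."""
    return reward + gamma * potential(next_state) - potential(state)


def slots_filled_potential(state: State) -> float:
    """Hypothetical potential: how many user-goal slots are already filled."""
    return float(state["slots_filled"])


if __name__ == "__main__":
    before = {"slots_filled": 1}  # assumed state before the system action
    after = {"slots_filled": 2}   # assumed state after the system action
    # A per-turn penalty of -1 combined with the shaping term for one new slot.
    print(shaped_reward(-1.0, before, after, slots_filled_potential, gamma=1.0))
```

The shaping term gives the learner a denser signal on turns where the sparse task reward alone would say nothing about progress.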
24	Data-driven language understanding for spoken dialogue systems Mrkšić, Nikola January 2018 (has links) Spoken dialogue systems provide a natural conversational interface to computer applications. In recent years, the substantial improvements in the performance of speech recognition engines have helped shift the research focus to the next component of the dialogue system pipeline: the one in charge of language understanding. The role of this module is to translate user inputs into accurate representations of the user goal in the form that can be used by the system to interact with the underlying application. The challenges include the modelling of linguistic variation, speech recognition errors and the effects of dialogue context. Recently, the focus of language understanding research has moved to making use of word embeddings induced from large textual corpora using unsupervised methods. The work presented in this thesis demonstrates how these methods can be adapted to overcome the limitations of language understanding pipelines currently used in spoken dialogue systems. The thesis starts with a discussion of the pros and cons of language understanding models used in modern dialogue systems. Most models in use today are based on the delexicalisation paradigm, where exact string matching supplemented by a list of domain-specific rephrasings is used to recognise users' intents and update the system's internal belief state. This is followed by an attempt to use pretrained word vector collections to automatically induce domain-specific semantic lexicons, which are typically hand-crafted to handle lexical variation and account for a plethora of system failure modes. The results highlight the deficiencies of distributional word vectors which must be overcome to make them useful for downstream language understanding models. The thesis next shifts focus to overcoming the language understanding models' dependency on semantic lexicons. 
To achieve that, the proposed Neural Belief Tracking (NBT) model forsakes the use of standard one-hot n-gram representations used in Natural Language Processing in favour of distributed representations of user utterances, dialogue context and domain ontologies. The NBT model makes use of external lexical knowledge embedded in semantically specialised word vectors, obviating the need for domain-specific semantic lexicons. Subsequent work focuses on semantic specialisation, presenting an efficient method for injecting external lexical knowledge into word vector spaces. The proposed Attract-Repel algorithm boosts the semantic content of existing word vectors while simultaneously inducing high-quality cross-lingual word vector spaces. Finally, NBT models powered by specialised cross-lingual word vectors are used to train multilingual belief tracking models. These models operate across many languages at once, providing an efficient method for bootstrapping language understanding models for lower-resource languages with limited training data.
25	MHNSS: um Middleware para o Desenvolvimento de Aplicações Móveis com Interações Baseada na Fala / MHNSS: a middleware for development of mobile application with interactions based speech Ferreira, Arikleyton de Oliveira 04 August 2014 (has links) Conselho Nacional de Desenvolvimento Científico e Tecnológico / Applications for mobile computing environments usually have several accessibility limitations because they depend on interaction with the user through the device display, which hinders their use by people who have difficulty reading or writing (typing) or who have little fluency with technology. In this master's thesis we propose a middleware that supports the development of mobile applications with accessibility features based on spoken dialogue systems. These systems can hold a conversation with the user, providing a natural interaction interface that requires no prior learning. Mobile applications can therefore use the middleware to offer the user a form of access that removes the need for physical or visual contact with the device. The proposed middleware was developed in the context of the MobileHealthNet project, where it will help mobile applications focused on the health domain reach users with different profiles, with particular attention to underserved and remote communities. To evaluate the middleware, we used a case study based on a mobile application for evaluating the health condition of patients with atrial fibrillation. The evaluation involved 10 individuals, and the results were very positive. 
/ Aplicações para ambientes computacionais móveis usualmente apresentam diversas limitações de acessibilidade por dependerem da interação com o usuário através da tela dos dispositivos móveis, o que dificulta seu uso às pessoas que possuem limitações para ler, escrever (digitar) e que tenham pouca fluência no uso de tecnologias. Neste trabalho de mestrado propomos um middleware que fornece suporte ao desenvolvimento de aplicações móveis com recurso de acessibilidade através do diálogo falado. Essa modalidade de acesso é capaz de manter uma conversa com o usuário, proporcionando uma interface de interação natural que não requer prévio aprendizado. Assim, aplicações móveis podem utilizar o middleware para proporcionar acessibilidade ao usuário que supera a necessidade do contato físico ou visual, pois eles podem apenas dialogar entre si. O middleware proposto está inserido no contexto do projeto MobileHealthNet, onde auxiliará aplicações móveis focadas ao domínio da saúde a atingir usuários com diferentes perfis, com especial atenção a moradores de comunidades carentes e distantes. No processo de avaliação do middleware proposto foi utilizado um estudo de caso de uma aplicação dedicada a acompanhar o estado de saúde de pacientes portadores de fibrilação atrial, realizando-se uma avaliação com 10 sujeitos na qual obteve-se resultados bastante positivos. Acessibilidade Aplicações Móveis Interações Baseada na Fala Sistemas de Diálogo Falado Accessibility Mobile Applications Speech-Based Interactions Spoken Dialogue Systems
26	Den svenska callcenterbranschen och de tekniska lösningar som används : Branschanalys samt identifiering av problematiska dialogsystemsyttranden med hjälp av maskininlärning / The Swedish call center industry and the technologies it utilizes : Industry analysis and identification of problematic system utterances using machine learning Wirström, Li, Huledal, Mattias January 2015 (has links) Detta arbete består av två delar. Den första delen syftar till att beskriva och analysera callcenterbranschen i Sverige samt vilka faktorer som påverkar branschen och dess utveckling. Analysen grundar sig i två modeller: Porters fempunktsmodell och PEST. Fokus ligger på den del av branschen som består av kundtjänstverksamhet för att koppla till arbetets andra del. Analysen visar att branschen främst påverkas av hög konkurrens och företagens, som behöver tillhandahålla kundtjänst, val mellan interna eller externa kundtjänstlösningar. Analysen indikerar även att branschen kommer fortsätta växa och att det finns en trend att företag i större utsträckning väljer att outsourca sin kundtjänst. Utvecklingen hos de tekniska lösningar som används i callcenter, till exempel dialogsystem, är efterfrågade av företagen då dessa är viktiga verktyg för att skapa en väl fungerande kundtjänst. Dagens digitala system har uppenbara utvecklingsområden. Det är ofta stora internationella företag eller internationella arbetslag som utvecklar de digitala systemen. Dock sträcker sig användningsområdet för dessa system långt utanför endast callcenterbranschen. Den andra delen handlar om att identifiera problematiska dialogsystemyttranden med hjälp av maskininlärning och inspireras av SpeDial, ett EU-projekt med syfte att förbättra dialogsystem. Yttranden från dialogsystemet kan anses problematiska beroende på till exempel att systemet missuppfattat användarens avsikt. 
Syftet med arbetets andra del är att undersöka vilken eller vilka maskininlärningsmetoder i verktyget WEKA som lämpar sig bäst för att identifiera problematiska dialogsystemyttranden. De data som använts i arbetet kommer från en kundtjänstentré baserad på fritt tal, vilket innebär att användaren själv uppmanas beskriva sitt ärende för att kunna kopplas vidare till rätt avdelning inom kundtjänsten. Våra data har tillhandahållits av företaget Voice Provider som utvecklar, implementerar och underhåller kundtjänstsystem. Voice Provider kom vi i kontakt med via Institutionen för tal, musik och hörsel (TMH), vid Kungliga Tekniska högskolan, som deltar i SpeDial-projektet. Arbetet gick initialt ut på att förbereda tillhandahållen data för att kunna användas av maskininlärningsverktyget WEKAs inbyggda klassificerare, varefter sex klassificerare valdes ut för vidare utvärdering. Resultaten visar att ingen av klassificerarna lyckades utföra uppgiften på ett fullt ut tillfredsställande sätt. Den som lyckades bäst var dock metoden Random Forest. Det är svårt att dra några ytterligare slutsatser från resultaten. / This work consists of two parts. The first part aims to describe and analyze the call center industry in Sweden and the factors that affect the industry and its development. The analysis is based on two models: Porter’s five forces and PEST. The focus is mainly on the part of the industry that consists of customer service operations. The analysis shows that the industry is mainly affected by intense competition and by the choice that businesses needing to provide customer service make between internal and external customer service operations. The analysis also indicates that the industry will continue to grow and that companies increasingly choose to outsource their customer service. 
Companies are asking for further development of the technological solutions used in call centers, such as dialogue systems, because these are important tools for creating well-functioning customer service. Today's digital systems have obvious areas for improvement. It is often large international companies or international teams that develop the digital systems used. However, the area of use for these systems extends far beyond the call center industry. The second part involves identifying problematic dialogue system utterances using machine learning and is inspired by SpeDial, an EU project aimed at improving dialogue systems. A dialogue system utterance can be considered problematic when, for example, the system has misinterpreted the user's intention. The aim of the second part is to investigate which machine learning method or methods in the WEKA tool are best suited to identifying problematic dialogue system utterances. The data used in this work comes from a customer service entrance based on free speech, which means that users are asked to describe their case in their own words so that they can be transferred to the right department within customer service. Our data has been provided by the company Voice Provider, which develops, implements and maintains customer service systems. We came into contact with Voice Provider through the Department of Speech, Music and Hearing (TMH) at the Royal Institute of Technology, which is involved in the SpeDial project. The work initially consisted of preparing the supplied data so that it could be used by the built-in classifiers of the machine learning tool WEKA, after which six classifiers were selected for further evaluation. The results show that none of the classifiers managed to accomplish the task in a fully satisfactory manner. The most successful method, however, was Random Forest. It is difficult to draw any further conclusions from the results. 
Call center Classification Data mining Problematic dialogue system utterances Spoken dialogue system Sweden Callcenter Data mining Dialogsystem Klassificering Problematiska systemyttranden Sverige
27	Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent Stiff, Adam January 2020 (has links) No description available. Computer Science Educational Software Linguistics Artificial Intelligence question-answering dialogue agent spoken dialogue virtual patient standardized patient semantic classification data scarcity data sparsity neural network machine learning
28	[pt] DESENVOLVIMENTO DE MODELOS PARA PREVISÃO DE QUALIDADE DE SISTEMAS DE RECONHECIMENTO DE VOZ / [en] DEVELOPMENT OF PREDICTION MODELS FOR THE QUALITY OF SPOKEN DIALOGUE SYSTEMS BERNARDO LINS DE ALBUQUERQUE COMPAGNONI 12 November 2021 (has links) [pt] Spoken Dialogue Systems (SDSs) são sistemas baseados em computadores desenvolvidos para fornecerem informações e realizar tarefas utilizando o diálogo como forma de interação. Eles são capazes de reconhecimento de voz, interpretação, gerenciamento de diálogo e são capazes de ter uma voz como saída de dados, tentando reproduzir uma interação natural falada entre um usuário humano e um sistema. SDSs provém diferentes serviços, todos através de linguagem falada com um sistema. Mesmo com todo o desenvolvimento nesta área, há escassez de informações sobre como avaliar a qualidade de tais sistemas com o propósito de otimização do mesmo. Com dois destes sistemas, BoRIS e INSPIRE, usados para reservas de restaurantes e gerenciamento de casas inteligentes, diversos experimentos foram conduzidos no passado, onde tais sistemas foram utilizados para resolver tarefas específicas. Os participantes avaliaram a qualidade do sistema em uma série de questões. Além disso, todas as interações foram gravadas e anotadas por um especialista. O desenvolvimento de métodos para avaliação de performance é um tópico aberto de pesquisa na área de SDSs. Seguindo a idéia do modelo PARADISE (PARAdigm for DIalogue System Evaluation – desenvolvido por Walker e colaboradores na AT&T em 1998), diversos experimentos foram conduzidos para desenvolver modelos de previsão de performance de sistemas de reconhecimento de voz e linguagem falada. 
O objetivo desta dissertação de mestrado é desenvolver modelos que permitam a previsão de dimensões de qualidade percebidas por um usuário humano, baseado em parâmetros instrumentalmente mensuráveis utilizando dados coletados nos experimentos realizados com os sistemas BoRIS e INSPIRE, dois sistemas de reconhecimento de voz (o primeiro para busca de restaurantes e o segundo para Smart Homes). Diferentes algoritmos serão utilizados para análise (Regressão linear, Árvores de Regressão, Árvores de Classificação e Redes Neurais) e para cada um dos algoritmos, uma ferramenta diferente será programada em MATLAB, para poder servir de base para análise de experimentos futuros, sendo facilmente modificado para sistemas e parâmetros novos em estudos subsequentes. A idéia principal é desenvolver ferramentas que possam ajudar na otimização de um SDS sem o envolvimento direto de um usuário humano ou servir de ferramenta para estudos futuros na área. / [en] Spoken Dialogue Systems (SDSs) are computer-based systems developed to provide information and carry out tasks using speech as the interaction mode. They are capable of speech recognition, interpretation and dialogue management, and they produce speech output, trying to reproduce a more or less natural spoken interaction between a human user and the system. SDSs provide several different services, all through spoken language. Even with all this development, there is a scarcity of information on ways to assess and evaluate the quality of such systems for the purpose of optimization. With two of these SDSs, BoRIS and INSPIRE (used for restaurant booking and smart home control, respectively), extensive experiments were conducted in the past, in which the systems were used to resolve specific tasks. The evaluators rated the quality of the system on a multitude of scales. In addition, the interactions were recorded and annotated by an expert. 
The development of methods for performance evaluation is an open research issue in the area of SDSs. Following the idea of PARADISE (PARAdigm for DIalogue System Evaluation), the best-known model for this purpose, developed by Walker and co-workers at AT&T in 1998, several experiments were conducted to develop predictive models of spoken dialogue performance. The objective of this dissertation is to develop and assess models which allow the prediction of quality dimensions as perceived by the human user, based on instrumentally measurable variables, using all the data collected from the BoRIS and INSPIRE systems. Different types of algorithms will be compared with respect to their prediction performance and how generic they are. Four approaches will be used for these analyses: linear regression, regression trees, classification trees and neural networks. For each of these methods, a separate tool will be programmed in MATLAB that can carry out all experiments from this work and be easily modified for new experiments with data from new systems or new variables in future studies. All the MATLAB programs used will be made available on the attached CD, with an operation manual for future users and a guide to modifying the existing programs to work on new data. The main idea is to develop tools that help optimize a spoken dialogue system without the direct involvement of a human user, or that serve as tools for future studies in this area. [pt] RECONHECIMENTO DE VOZ [pt] SPOKEN DIALOGUE SYSTEMS [pt] LINGUAGEM FALADA [pt] ARVORES DE CLASSIFICACAO [pt] REDES NEURAIS [pt] ENGENHARIA ELETRICA - TESES [pt] REGRESSAO LINEAR [pt] ARVORES DE REGRESSAO [en] SPEECH RECOGNITION [en] SYSTEM PERFORMANCE EVALUATION [en] SPOKEN DIALOGUE SYSTEMS [en] SPOKEN LANGUAGE [en] CLASSIFICATION TREES [en] NEURAL NETWORKS [en] ELECTRICAL ENGINEERING - THESIS [en] LINEAR REGRESSION [en] REGRESSION TREES
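The linear-regression approach named in the abstract of entry 28 predicts a user's quality rating as a weighted sum of instrumentally measured interaction parameters, in the spirit of PARADISE. The following is a minimal Python sketch rather than the dissertation's MATLAB tooling: the two predictors (task success and number of turns), the function names, and the toy ratings are all hypothetical and are not data from the BoRIS or INSPIRE experiments.

```python
import numpy as np


def fit_linear_quality_model(params: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """Fit ratings ~ w0 + params @ w by ordinary least squares.

    Returns the weight vector [w0, w1, ..., wk], intercept first.
    """
    design = np.column_stack([np.ones(len(params)), params])
    weights, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    return weights


def predict_quality(weights: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Predict ratings for new dialogues from their interaction parameters."""
    design = np.column_stack([np.ones(len(params)), params])
    return design @ weights


if __name__ == "__main__":
    # Hypothetical dialogues: columns are task success (0/1) and number of turns.
    toy_params = np.array([[1, 10], [0, 20], [1, 20], [0, 10], [1, 15]], dtype=float)
    # Hypothetical user ratings for those dialogues.
    toy_ratings = np.array([3.0, 0.0, 2.0, 1.0, 2.5])
    w = fit_linear_quality_model(toy_params, toy_ratings)
    print(w)
    print(predict_quality(w, np.array([[1.0, 12.0]])))
```

The fitted weights indicate how strongly each measurable parameter is associated with perceived quality, which is what allows a system to be tuned without collecting new user ratings for every change.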
29	Developing Multimodal Spoken Dialogue Systems : Empirical Studies of Spoken Human–Computer Interaction Gustafson, Joakim January 2002 (has links) This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow pages of Stockholm; the August system, which was a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent to fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora. 
Spoken dialogue system multimodal speech GUI animated agents embodied conversational characters talking heads empirical user studies speech corpora system evaluation system development Wizard of Oz simulations system architecture linguis TECHNOLOGY TEKNIKVETENSKAP
30	WOZシステムのログ情報を利用した事例ベース音声対話システムの開発 / Development of an example-based spoken dialogue system using log information from a WOZ system INAGAKI, Yasuyoshi, YAMAGUCHI, Yukiko, MATSUBARA, Shigeki, KAWAGUCHI, Nobuo, MURAO, Hiroya, 稲垣, 康善, 山口, 由紀子, 松原, 茂樹, 河口, 信夫, 村尾, 浩也 19 December 2002 (has links) 情報処理学会研究報告音声言語情報処理;2002-SLP-44-23 Dialogue corpus Car environment Reply generation Speech understanding Speech recognition Spoken dialogue 対話コーパス 自動車 応答生成 意図理解 Wizard of OZ 音声認識 音声対話

Search results