Global ETD Search

71	Learning from Synthetic Data: Towards Effective Domain Adaptation Techniques for Semantic Segmentation of Urban Scenes / Lärande från Syntetiska Data: Mot Effektiva Domänanpassningstekniker för Semantisk Segmentering av Urbana Scener Valls I Ferrer, Gerard January 2021 Semantic segmentation is the task of predicting predefined class labels for each pixel in a given image. It is essential in autonomous driving, but also challenging because training accurate models requires large and diverse datasets, which are difficult to collect due to the high cost of annotating images at pixel-level. This raises interest in using synthetic images from simulators, which can be labelled automatically. However, models trained directly on synthetic data perform poorly in real-world scenarios due to the distributional misalignment between synthetic and real images (domain shift). This thesis explores the effectiveness of several techniques for alleviating this issue, employing Synscapes and Cityscapes as the synthetic and real datasets, respectively. Some of the tested methods exploit a few additional labelled real images (few-shot supervised domain adaptation), some have access to plentiful real images but not their associated labels (unsupervised domain adaptation), and others do not take advantage of any image or annotation from the real domain (domain generalisation). After extensive experiments and a thorough comparative study, this work shows the severity of the domain shift problem by revealing that a semantic segmentation model trained directly on the synthetic dataset scores a poor mean Intersection over Union (mIoU) of 33.5% when tested on the real dataset. This thesis also demonstrates that such performance can be boosted by 25.7% without accessing any annotations from the real domain and by 17.3% without leveraging any information from the real domain. 
Nevertheless, these gains are still inferior to the 31.0% relative improvement achieved with as few as 25 supplementary labelled real images, which suggests that there is still room for improvement in the fields of unsupervised domain adaptation and domain generalisation. Future work should focus on developing better algorithms and creating synthetic datasets with a greater diversity of shapes and textures in order to reduce the domain shift. Semantic Segmentation Synthetic Data Autonomous Driving Domain Shift Domain Adaptation Domain Generalisation Computer and Information Sciences
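The mIoU metric and the relative gains quoted in the abstract above can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names `confusion_matrix`, `mean_iou` and `relative_gain` are hypothetical, and the convention of skipping classes that never occur is an assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels for every (true class, predicted class) pair."""
    flat = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(y_true, y_pred, num_classes):
    """Mean Intersection over Union across classes.

    Assumption: classes absent from both the labels and the
    predictions (union == 0) are left out of the average.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())

def relative_gain(baseline, improved):
    """Relative improvement over a baseline score, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Toy 2x2 label maps with two classes.
labels = np.array([[0, 0], [1, 1]])
preds = np.array([[0, 1], [1, 1]])
print(mean_iou(labels, preds, num_classes=2))
```

In this toy case class 0 has one correct pixel out of a union of two, and class 1 has two correct pixels out of a union of three, so the per-class IoU values are averaged into a single score.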
72	Be More with Less: Scaling Deep-learning with Minimal Supervision Yaqing Wang (12470301) 28 April 2022 Large-scale deep learning models have reached previously unattainable performance on various tasks. However, the ever-growing resource consumption of neural networks generates a large carbon footprint, makes it difficult for academics to engage in research, and keeps emerging economies from enjoying the growing benefits of Artificial Intelligence (AI). To scale AI further and bring more benefits, two major challenges need to be solved. Firstly, even though large-scale deep learning models have achieved remarkable success, their performance is still unsatisfactory when fine-tuning with only a handful of examples, which hinders widespread adoption in real-world applications where large amounts of labeled data are difficult to obtain. Secondly, current machine learning models are still mainly designed for tasks in closed environments where testing datasets are highly similar to training datasets. When the deployment data exhibit a distribution shift relative to the collected training data, the performance of the developed models generally degrades. How to build adaptable models therefore becomes another critical challenge. To address these challenges, this dissertation focuses on two topics: few-shot learning and domain adaptation, where few-shot learning aims to learn tasks with limited labeled data and domain adaptation addresses the discrepancy between training data and testing data. Part 1 presents our few-shot learning studies. The proposed few-shot solutions are built upon large-scale language models and evolve from improving supervision signals, to incorporating unlabeled data, to improving few-shot learning abilities with a lightweight fine-tuning design that reduces deployment costs. Part 2 introduces our domain adaptation studies. 
We develop a progressive series of domain adaptation approaches that transfer knowledge across domains efficiently to handle distribution shifts, including capturing common patterns across domains, adaptation with weak supervision, and adaptation to thousands of domains with limited labeled and unlabeled data. Computer Engineering Applied Computer Science Pattern Recognition and Data Mining Minimally-supervised Learning Semi-supervised Learning Data Mining Deep Learning Fake News Detection Natural Language Processing Domain Adaptation
73	Integrative approaches to single cell RNA sequencing analysis Johnson, Travis Steele 21 September 2020 No description available. Biomedical Research Bioinformatics
74	Real-time Unsupervised Domain Adaptation / Oövervakad domänanpassning i realtid Botet Colomer, Marc January 2023 Machine learning systems have been demonstrated to be highly effective in various fields, such as in vision tasks for autonomous driving. However, the deployment of these systems poses a significant challenge in terms of ensuring their reliability and safety in diverse and dynamic environments. Online Unsupervised Domain Adaptation (UDA) aims to address the issue of continuous domain changes that may occur during deployment, such as sudden weather changes. Although these methods possess a remarkable ability to adapt to unseen domains, they are hindered by the high computational cost associated with constant adaptation, making them unsuitable for real-world applications that demand real-time performance. In this work, we focus on the challenging task of semantic segmentation. We present a framework for real-time domain adaptation that utilizes novel strategies to enable online adaptation at a rate of over 29 FPS on a single GPU. We propose a partial backpropagation scheme in conjunction with a lightweight domain-shift detector that identifies when adaptation is needed and appropriately adapts domain-specific hyperparameters to enhance performance. To validate our proposed framework, we conduct experiments in various storm scenarios using different rain intensities and evaluate our results under different domain shifts, such as fog visibility, using the SHIFT dataset. Our results demonstrate that our framework achieves an optimal trade-off between accuracy and speed, surpassing state-of-the-art results, while the introduced strategies enable it to run more than six times faster at a minimal performance loss. Unsupervised Domain Adaptation Real-Time applications Online Learning Self-Learning Semantic Segmentation Reinforcement Learning Computer and Information Sciences
75	MULTI-SOURCE AND SOURCE-PRIVATE CROSS-DOMAIN LEARNING FOR VISUAL RECOGNITION Qucheng Peng (12426570) 12 July 2022 Domain adaptation is one of the most active directions for addressing the annotation insufficiency problem of deep learning. However, general domain adaptation is not consistent with practical scenarios in industry. In this thesis, we focus on the two concerns below. The first is that labeled data are generally collected from multiple domains. In other words, multi-source adaptation is a more common situation. Simply extending single-source approaches to the multi-source case could cause sub-optimal inference, so specialized multi-source adaptation methods are essential. The main challenge in the multi-source scenario is a more complex divergence situation. Not only does the divergence between the target and each source play a role, but the divergences among distinct sources matter as well. However, the significance of maintaining consistency among multiple sources has not gained enough attention in previous work. In this thesis, we propose an Enhanced Consistency Multi-Source Adaptation (EC-MSA) framework to address it from three perspectives. First, we mitigate feature-level discrepancy by cross-domain conditional alignment, narrowing the divergence between each source and the target domain class-wise. Second, we enhance multi-source consistency via dual mix-up, diminishing the disagreements among different sources. Third, we deploy a target distilling mechanism to handle the uncertainty of target prediction, aiming to provide high-quality pseudo-labeled target samples to benefit the previous two aspects. Extensive experiments are conducted on several common benchmark datasets and demonstrate that our model outperforms the state-of-the-art methods. The second is that data privacy and security are necessary in practice. That is, we hope to keep the raw data stored locally while still obtaining a satisfactory model. 
In such a case, the risk of data leakage greatly decreases. Therefore, it is natural for us to combine the federated learning paradigm with domain adaptation. Under the source-private setting, the main challenge is to expose information from the source domain to the target domain while making sure that the communication process is safe enough. In this thesis, we propose a method named Fourier Transform-Assisted Federated Domain Adaptation (FTA-FDA) to alleviate the difficulties in two ways. We apply the Fast Fourier Transform to the raw data and transfer only the amplitude spectra during communication. Frequency-space interpolations between the two domains are then conducted, minimizing the discrepancies while keeping the domains in contact and the raw data safe. In addition, we perform prototype alignment by using the model weights together with target features, aiming to reduce the discrepancy at the class level. Experiments on Office-31 demonstrate the effectiveness and competitiveness of our approach, and further analyses indicate that our algorithm can help protect privacy and security. Digital processor architectures Transfer learning (TL) Domain Adaptation Deep Learning Theory image classification methods Computer Engineering
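The amplitude-spectrum exchange described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the FTA-FDA implementation: the function names, the mixing weight `lam`, mixing over the full spectrum, and keeping the receiving image's phase are all hypothetical choices.

```python
import numpy as np

def amplitude_and_phase(image):
    """Split the 2D Fourier spectrum of an image into amplitude and phase."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)

def interpolate_amplitude(local_image, remote_amplitude, lam=0.5):
    """Blend a received amplitude spectrum into a local image.

    Assumption: only `remote_amplitude` crosses the network; the local
    phase, which carries most of the spatial structure, is kept as is.
    `lam` = 0 returns the local image, `lam` = 1 uses the remote amplitude.
    """
    local_amp, local_phase = amplitude_and_phase(local_image)
    mixed_amp = (1.0 - lam) * local_amp + lam * remote_amplitude
    mixed_spectrum = mixed_amp * np.exp(1j * local_phase)
    # The imaginary part is numerical noise for real-valued inputs.
    return np.real(np.fft.ifft2(mixed_spectrum, axes=(0, 1)))

rng = np.random.default_rng(0)
source = rng.random((8, 8))   # stays on the source client
target = rng.random((8, 8))   # stays on the target client
source_amp, _ = amplitude_and_phase(source)  # the only thing shared
stylised = interpolate_amplitude(target, source_amp, lam=0.5)
print(stylised.shape)
```

Sharing only the amplitude means the receiver cannot directly rebuild the sender's image, since the phase is never transmitted; how much protection this gives in practice is a question for the thesis itself.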
76	[pt] ADAPTAÇÃO DE DOMINIO BASEADO EM APRENDIZADO PROFUNDO PARA DETECÇÃO DE MUDANÇAS EM FLORESTAS TROPICAIS / [en] DEEP LEARNING-BASED DOMAIN ADAPTATION FOR CHANGE DETECTION IN TROPICAL FORESTS PEDRO JUAN SOTO VEGA 20 July 2021 [en] Earth observation data are frequently affected by the domain shift phenomenon. Changes in environmental conditions, geographical variability and different sensor properties typically make it almost impossible to employ previously trained classifiers for new data without a significant drop in classification accuracy. Domain adaptation (DA) techniques based on Deep Learning models have proven useful in alleviating domain shift. Recent improvements in DA rely on adversarial training to align features extracted from images of the different domains in a common latent space. Another way to tackle the problem is to employ image translation techniques and adapt images from one domain in such a way that the transformed images contain characteristics that are similar to the images from the other domain. In this work, two different DA approaches for change detection tasks are proposed, which are based on a particular image translation technique, the Cycle-Consistent Generative Adversarial Network (CycleGAN), and on a representation matching strategy, the Domain Adversarial Neural Network (DANN). In particular, additional constraints in the training phase of the original CycleGAN model components are proposed, as well as an unsupervised pseudo-labeling procedure to mitigate the negative impact of class imbalance in the DANN-based approach. The proposed approaches were evaluated on a deforestation detection application, considering different sites in the Amazon rainforest and in the Brazilian Cerrado (savanna) biomes. 
In the experiments, each site corresponds to a domain, and the accuracy of a classifier trained with images and references from one (source) domain is measured in the classification of another (target) domain. The experimental results show that the proposed approaches are successful in alleviating the domain shift problem. [en] DEEP LEARNING [en] CYCLEGAN [en] DOMAIN ADAPTATION [en] CHANGE DETECTION [en] DEFORESTATION DETECTION
77	Data Synthesis in Deep Learning for Object Detection / Syntetiskt Data i Djupinlärning för Objektdetektion Haddad, Josef January 2021 Deep neural networks typically require large amounts of labeled data for training, but collecting such data can be expensive. Our study aims to reveal how training with synthetic data affects performance in real-world object detection tasks. This is achieved by synthesising annotated image data in the automotive domain using a car simulator for the task of detecting cars in images from the real world. We furthermore perform experiments in the aviation domain, where we combine synthetic images extracted from an airplane simulator with real-world data for detecting runways. In our experiments, the synthetic data sets are leveraged by pre-training a deep learning based object detector, which is then fine-tuned and evaluated on real-world data. We evaluate this approach on three real-world data sets across the two domains and furthermore evaluate how the classification performance scales as the amount of synthetic and real-world data varies in the automotive domain. In the automotive domain, we additionally perform image-to-image translation both from the synthetic domain to the real-world domain, and the other way around, as a means of domain adaptation to assess whether it further improves performance. The results show that adding synthetic data improves performance in the automotive domain and that pre-training with more synthetic data results in further performance improvements, but that the performance boost from adding more real-world data exceeds that from adding more synthetic data. We cannot conclude that using CycleGAN for domain adaptation further improves the performance. Deep Learning Computer vision Object detection Synthetic data Domain Adaptation Machine Learning Computer and Information Sciences
78	Handling Domain Shift in 3D Point Cloud Perception Saltori, Cristiano 10 April 2024 This thesis addresses the problem of domain shift in 3D point cloud perception. In the last decades, there has been tremendous progress in within-domain training and testing. However, the performance of perception models suffers when they are trained on a source domain and tested on a target domain sampled from a different data distribution. As a result, a change in sensor or geo-location can lead to a harmful drop in model performance. While solutions exist for image perception, the problem remains unresolved for point clouds. The focus of this thesis is the study and design of solutions for mitigating domain shift in 3D point cloud perception. We identify several settings differing in the level of target supervision and the availability of source data. We conduct a thorough study of each setting and introduce a new method to address domain shift in each configuration. In particular, we study three novel settings in domain adaptation and domain generalization and propose five new methods for mitigating domain shift in 3D point cloud perception. Our methods are used by the research community, and at the time of writing, some of the proposed approaches hold state-of-the-art results. In conclusion, this thesis provides a valuable contribution to the computer vision community, laying the groundwork for future work in cross-domain conditions.
79	Action Recognition with Knowledge Transfer Choi, Jin-Woo 07 January 2021 Recent progress on deep neural networks has shown remarkable action recognition performance from videos. This performance is often achieved by transfer learning: training a model on a large-scale labeled dataset (source) and then fine-tuning the model on the small-scale labeled datasets (targets). However, existing action recognition models do not always generalize well to new tasks or datasets, for the following two reasons. i) Current action recognition datasets have a spurious correlation between action types and background scene types. The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor generalization performance. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. For the first problem, I propose to learn scene-invariant action representations to mitigate the scene bias in action recognition models. Specifically, I augment the standard cross-entropy loss for action classification with 1) an adversarial loss for the scene types and 2) a human mask confusion loss for videos where the human actors are invisible. These two losses encourage learning representations unsuitable for predicting 1) the correct scene types and 2) the correct action types when there is no evidence. I validate the efficacy of the proposed method by transfer learning experiments. I transfer the pre-trained model to three different tasks, including action classification, temporal action localization, and spatio-temporal action detection. 
The results show consistent improvement over the baselines for every task and dataset. I formulate human action recognition as an unsupervised domain adaptation (UDA) problem to handle the second problem. In the UDA setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene, to learn domain-invariant action representations. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Then I explore the semi-supervised video action recognition, where we have a lot of labeled videos as source data and sparsely labeled videos as target data. The semi-supervised setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject photometric, geometric, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks. / Doctor of Philosophy / Recent progress on deep learning has shown remarkable action recognition performance. The remarkable performance is often achieved by transferring the knowledge learned from existing large-scale data to the small-scale data specific to applications. However, existing action recognition models do not always work well on new tasks and datasets because of the following two problems. i) Current action recognition datasets have a spurious correlation between action types and background scene types. 
The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor performance on the new datasets and tasks. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. To tackle the first problem, I propose to learn scene-invariant action representations to mitigate the background scene bias of human action recognition models. Specifically, the proposed method learns representations that cannot predict the scene types and the correct actions when there is no evidence. I validate the proposed method's effectiveness by transferring the pre-trained model to multiple action understanding tasks. The results show consistent improvement over the baselines for every task and dataset. To handle the second problem, I formulate human action recognition as an unsupervised learning problem on the target data. In this setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Here, we have many labeled videos as source data and sparsely labeled videos as target data. 
The setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject color, spatial, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks. Computer Vision Machine learning Deep learning (Machine learning) Convolutional Neural Networks Representation Learning Action Recognition Domain Adaptation Bias Reduction Semi-Supervised Learning Unsupervised Learning
80	Recurrent neural network language generation for dialogue systems Wen, Tsung-Hsien January 2018 Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue become tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations. A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long Short-Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; it also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe. 
Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features suggest comprehension and fast learning. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better.
