  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Refinements in hierarchical phrase-based translation systems

Pino, Juan Miguel January 2015 (has links)
The hierarchical phrase-based translation model for statistical machine translation (SMT), though proposed relatively recently, has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses of various modules in hierarchical phrase-based systems. We take advantage of the increasing amounts of training data available for machine translation, as well as of existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar in the HFile format, which offers a competitive trade-off between efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models during extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation and demonstrate translation improvements in Chinese-to-English translation. This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for cross-domain adaptation of grammars and language models. We also study interactions between the first-pass and second-pass language models in terms of size and n-gram order.
Finally, we analyse two smoothing methods for large 5-gram language model rescoring. The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we consider as a means to study the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.
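The grammar extraction and estimation pipeline described above can be sketched, at toy scale, as a pair of map and reduce functions. This is an illustrative sketch of relative-frequency estimation over pre-extracted phrase pairs, not the thesis's Hadoop/HFile implementation; the input pairs are invented.

```python
from collections import defaultdict

def map_extract(sentence_pairs):
    """Map step: emit one (source phrase, target phrase) key with count 1
    per occurrence. Real phrase extraction from aligned sentences is
    stubbed out; we simply re-emit pre-extracted pairs."""
    for src, tgt in sentence_pairs:
        yield (src, tgt), 1

def reduce_estimate(mapped):
    """Reduce step: aggregate counts and turn them into relative-frequency
    translation probabilities p(target | source)."""
    counts = defaultdict(int)
    src_totals = defaultdict(int)
    for (src, tgt), c in mapped:
        counts[(src, tgt)] += c
        src_totals[src] += c
    return {(s, t): c / src_totals[s] for (s, t), c in counts.items()}

# Toy "extracted" phrase pairs standing in for MapReduce input.
pairs = [("la maison", "the house"), ("la maison", "the house"),
         ("la maison", "the home"), ("le chat", "the cat")]
grammar = reduce_estimate(map_extract(pairs))
print(grammar[("la maison", "the house")])  # 2 of 3 occurrences
```

In the real pipeline each function would run distributed over many shards, with the reducer output written to HFiles for random-access retrieval at decoding time.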
22

A Computational Approach to the Analysis and Generation of Emotion in Text

Keshtkar, Fazel January 2011 (has links)
Sentiment analysis is a field of computational linguistics involving the identification, extraction, and classification of opinions, sentiments, and emotions expressed in natural language. Sentiment classification algorithms aim to identify whether the author of a text has a positive or a negative opinion about a topic. One of the main indicators that help detect this opinion is the set of words used in the text. Needless to say, the sentiments expressed in a text also depend on its syntactic structure and discourse context. Supervised machine learning approaches to sentiment classification have been shown to achieve good results. Classifying texts by emotion requires finer-grained analysis than sentiment classification. In this thesis, we explore the task of emotion and mood classification for blog postings. We propose a novel approach that uses the hierarchy of possible moods to achieve better results than a standard flat classification approach. We also show that using sentiment orientation features improves the performance of classification. We used the LiveJournal blog corpus as a dataset to train and evaluate our method. Another contribution of this work is extracting paraphrases for emotion terms based on the six basic emotions proposed by Ekman (happiness, anger, sadness, disgust, surprise, fear). Paraphrases are different ways of expressing the same information. Algorithms to extract and automatically identify paraphrases are of interest from both linguistic and practical points of view. Our paraphrase extraction method is based on a bootstrapping algorithm that starts with seed words. Unlike previous work, our algorithm does not need a parallel corpus. In Natural Language Generation (NLG), paraphrasing is employed to create more varied and natural text. 
In our research, we extract paraphrases for emotions with the goal of using them to automatically generate emotional texts (such as friendly or hostile texts) for conversations between intelligent agents and characters in educational games. Nowadays, online services are popular in many domains, such as e-learning, interactive games, educational games, stock markets and chat rooms. NLG methods can be used to generate more interesting and natural texts for such applications. Generating text with emotions is one of the contributions of our work. In the last part of this thesis, we give an overview of NLG from an applied system's point of view. We discuss when NLG techniques can be used, and we explain the requirements analysis and specification of NLG systems. We also describe the main NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realisation. Moreover, we describe the Authoring Tool that we developed to allow writers without programming skills to automatically generate texts for educational games. We develop an NLG system that can generate text with different emotions. To do this, we introduce our pattern-based model for generation. Our model starts with initial patterns, then constructs extended patterns from which we choose "final" patterns that are suitable for generating emotional sentences. A user can generate sentences expressing the desired emotions by using our patterns. Alternatively, the user can use our Authoring Tool to generate sentences with emotions. The acquired paraphrases are employed by the tool to generate more varied outputs.
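The seed-based bootstrapping idea for paraphrase extraction can be illustrated with a toy loop that learns context patterns around known emotion words, then accepts any new word found in a learned pattern. The corpus, seeds, and one-word-context pattern definition below are simplified stand-ins for the thesis method, which needs no parallel corpus either.

```python
import re

def bootstrap(corpus, seeds, rounds=2):
    """Alternate two phases: (1) collect (left, right) context patterns
    around currently known emotion words; (2) add any word that occurs
    in one of those patterns to the known set."""
    known = set(seeds)
    for _ in range(rounds):
        patterns = set()
        for sent in corpus:
            toks = re.findall(r"\w+", sent.lower())
            for i, t in enumerate(toks):
                if t in known and 0 < i < len(toks) - 1:
                    patterns.add((toks[i - 1], toks[i + 1]))
        for sent in corpus:
            toks = re.findall(r"\w+", sent.lower())
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in patterns:
                    known.add(toks[i])
    return known

corpus = ["I feel so happy today", "I feel so glad today",
          "She was very angry here", "She was very furious here"]
found = bootstrap(corpus, {"happy", "angry"})
print(sorted(found))  # the seeds plus words sharing their contexts
```

Here "glad" is accepted because it appears in the same ("so", "today") context as the seed "happy"; real systems score candidates rather than accepting every match.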
23

A Decision Theoretic Approach to Natural Language Generation

McKinley, Nathan D. 21 February 2014 (has links)
No description available.
24

Algorithms and Resources for Scalable Natural Language Generation

Pfeil, Jonathan W. 01 September 2016 (has links)
No description available.
25

Determinação de conteúdo para geração de língua natural baseada em personalidade / Content planning for natural language generation based on personality

Ramos, Ricelli Moreira Silva 25 June 2018 (has links)
This thesis addresses content determination in the document planning phase of the Natural Language Generation (NLG) pipeline, using personality factors from the Big Five Factors (BFF) model. Its main objective is to build computational models of content determination based on the BFF personality factors. The work applies existing NLG techniques for content determination, taking into account the personality factors mapped by the BFF model. Concepts described by nouns, and concepts described by adjectives related to those nouns, are used in a scene description task for content determination. The main contributions of this work are an investigation of whether and how the content determination of textual descriptions is influenced by the personality of the author, together with a personality-based content determination model for the concepts to which the work was applied, among other deliverables. The thesis presents the theoretical background, covering the fundamental concepts of NLG and, more specifically, of content determination, the focus of this research. In addition, models of human personality are presented, with emphasis on the BFF model and BFF inventories, which were used for data collection and for carrying out this proposal. The main studies related to NLG and the BFF model are also presented, even when they do not specifically address the influence of the BFF factors on content determination. An experiment to collect the corpus used in the research is described, as well as the models for content determination over concepts representing visual entities and their predicates, and the evaluation of these models. Finally, the conclusions drawn from the developed models and experiments are presented.
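One way to picture personality-driven content determination is a selector that keeps every noun concept but attaches its adjectives only for sufficiently "expressive" author profiles. The mapping from Big Five factors to verbosity below is an invented illustration, not the model developed in the thesis.

```python
def select_content(concepts, personality, threshold=0.5):
    """Content determination stub: keep all noun concepts; include their
    adjectives only when an (invented) expressiveness score, averaged
    from two Big Five factors, reaches the threshold."""
    expressive = (personality.get("openness", 0.0) +
                  personality.get("extraversion", 0.0)) / 2
    selected = []
    for noun, adjectives in concepts:
        if expressive >= threshold:
            selected.append((noun, adjectives))
        else:
            selected.append((noun, []))  # terse profile: bare nouns only
    return selected

# A toy scene: noun concepts with their related adjective concepts.
scene = [("house", ["old", "red"]), ("tree", ["tall"])]
print(select_content(scene, {"openness": 0.9, "extraversion": 0.7}))
print(select_content(scene, {"openness": 0.2, "extraversion": 0.3}))
```

The thesis learns such selection behaviour from corpus data per factor; this sketch only shows where personality scores would enter the document planning step.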
27

Nové metody generování promluv v dialogových systémech / Novel Methods for Natural Language Generation in Spoken Dialogue Systems

Dušek, Ondřej January 2017 (has links)
Title: Novel Methods for Natural Language Generation in Spoken Dialogue Systems Author: Ondřej Dušek Department: Institute of Formal and Applied Linguistics Supervisor: Ing. Mgr. Filip Jurčíček, Ph.D., Institute of Formal and Applied Linguistics Abstract: This thesis explores novel approaches to natural language generation (NLG) in spoken dialogue systems (i.e., generating system responses to be presented to the user), aiming to simplify the adaptivity of NLG in three respects: domain portability, language portability, and user-adaptive outputs. Our generators improve over the state of the art in all three: First, our generators, which are based on statistical methods (A* search with perceptron ranking and sequence-to-sequence recurrent neural network architectures), can be trained on data without fine-grained semantic alignments, thus simplifying the process of retraining the generator for a new domain compared to previous approaches. Second, we enhance the neural-network-based generator so that it takes the preceding dialogue context into account (i.e., the user's way of speaking), thus producing user-adaptive outputs. Third, we evaluate several extensions to the neural-network-based generator designed for producing output in morphologically rich languages, showing improvements in Czech generation. In...
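A common ingredient of trainable NLG for dialogue systems, and a reasonable illustration of why no fine-grained alignments are needed, is delexicalization: concrete slot values are replaced with placeholders before training and restored in the generator's output. The sketch below assumes simple string-valued slots and an invented placeholder scheme; it is not taken from the thesis code.

```python
def delexicalize(utterance, slots):
    """Replace concrete slot values with placeholders so that one
    training example generalizes over every restaurant name, cuisine,
    etc., instead of memorizing specific strings."""
    for slot, value in slots.items():
        utterance = utterance.replace(value, f"X-{slot}")
    return utterance

def relexicalize(template, slots):
    """Inverse step, applied to the generator's output template."""
    for slot, value in slots.items():
        template = template.replace(f"X-{slot}", value)
    return template

slots = {"name": "Golden Wok", "food": "Chinese"}
t = delexicalize("Golden Wok serves Chinese food", slots)
print(t)                       # X-name serves X-food food
print(relexicalize(t, slots))  # round-trips to the original
```

A seq2seq generator then only ever sees and produces the placeholder tokens, which is what makes retraining for a new domain a matter of new data rather than new alignment annotation.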
28

Automatic movie analysis and summarisation

Gorinski, Philip John January 2018 (has links)
Automatic movie analysis is the task of applying Machine Learning methods to screenplays, movie scripts, and motion pictures to facilitate or enable various tasks throughout the entirety of a movie’s life-cycle. From helping with making informed decisions about a new movie script with respect to aspects such as its originality, similarity to other movies, or even commercial viability, all the way to offering consumers new and interesting ways of viewing the final movie, many stages in the life-cycle of a movie stand to benefit from Machine Learning techniques that promise to reduce human effort, time, or both. Within this field of automatic movie analysis, this thesis addresses the task of summarising the content of screenplays, enabling users at any stage to gain a broad understanding of a movie from greatly reduced data. The contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale data set of original movie scripts, annotated with additional meta-information such as genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase is the largest data set of its kind, containing scripts and information for almost 1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the screenplay domain, which allows for extraction of highly informative and important scenes from movie scripts. The extracted summaries allow for the content of the original script to stay largely intact and provide the user with its important parts, while greatly reducing the script-reading time. (iii) We extend our summarisation model to capture additional modalities beyond the screenplay text. The model is rendered multi-modal by introducing visual information obtained from the actual movie and by extracting scenes from the movie, allowing users to generate visual summaries of motion pictures. (iv) We devise a novel end-to-end neural network model for generating natural language screenplay overviews. 
This model enables the user to generate short descriptive and informative texts that capture certain aspects of a movie script, such as its genres, approximate content, or style, allowing them to gain a fast, high-level understanding of the screenplay. Multiple automatic and human evaluations were carried out to assess the performance of our models, demonstrating that they are well-suited for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the ScriptBase data set has started to gain traction, and is currently used by a number of other researchers in the field to tackle various tasks relating to screenplays and their analysis.
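Extractive screenplay summarisation can be caricatured as a budgeted selection problem: choose high-importance scenes whose total length fits a page budget, then present them in script order. The greedy selector below, with invented scores and lengths, stands in for the thesis's summarisation model, which scores scenes from their content rather than taking scores as given.

```python
def summarise(scenes, budget):
    """Greedy stand-in for screenplay summarisation: pick the
    highest-scoring scenes whose lengths fit the page budget, then
    restore script order so the summary still reads as a story."""
    chosen, used = [], 0
    for idx, score, length in sorted(scenes, key=lambda s: -s[1]):
        if used + length <= budget:
            chosen.append(idx)
            used += length
    return sorted(chosen)

# (scene index, importance score, length in pages) -- invented values.
scenes = [(0, 0.9, 3), (1, 0.2, 2), (2, 0.8, 4), (3, 0.7, 2)]
print(summarise(scenes, budget=6))  # scenes 0 and 3 fit the budget
```

Greedy selection is not optimal for this knapsack-like problem, but it makes the trade-off visible: scene 2 scores highly yet is skipped because it would blow the budget.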
29

Tell me why : uma arquitetura para fornecer explicações sobre revisões / Tell me why : an architecture to provide rich review explanations

Woloszyn, Vinicius January 2015 (has links)
What other people think has always been an important part of decision-making. For instance, people usually consult their friends to get an opinion about a book, a movie or a restaurant. Nowadays, users publish their opinions on collaborative reviewing sites such as IMDB for movies, Yelp for restaurants and TripAdvisor for hotels. Over time, these sites have built a massive database connecting users, items and opinions, expressed as a numeric rating plus a free-text review that explains why they like or dislike a specific item. But this vast amount of data can hamper the user trying to form an opinion. Several related works provide review interpretations to users, offering different advantages for various types of summaries. However, they all share the same limitation: they provide neither personalized summaries nor contrasting reviews written by different segments of reviewers. Understanding and contrasting reviews written by different segments of reviewers is still an open research problem. This work proposes a new architecture, called Tell Me Why (TMW), a project developed at the Grenoble Informatics Laboratory in cooperation with the Federal University of Rio Grande do Sul to give users a better understanding of reviews. We propose a combination of text analysis of reviews with mining of the structured data that results from crossing the reviewer and item dimensions. Additionally, this work investigates summarization methods used in the review domain. The output of the architecture consists of personalized textual statements, produced with Natural Language Generation, composed of item attributes and summarized comments that explain people's opinion about a particular item. The results of a comparative evaluation against Amazon's Most Helpful Review indicate that this is a promising approach that users find useful.
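Contrasting reviewer segments, in its most minimal form, amounts to grouping ratings by segment and comparing aggregates. The sketch below uses invented segments and ratings and is far simpler than the TMW architecture, which crosses many reviewer and item dimensions and verbalizes the result with NLG.

```python
from collections import defaultdict
from statistics import mean

def contrast_segments(reviews):
    """Average the rating inside each reviewer segment: a minimal
    version of crossing the reviewer dimension with an item's ratings."""
    by_segment = defaultdict(list)
    for segment, rating in reviews:
        by_segment[segment].append(rating)
    return {seg: mean(rs) for seg, rs in by_segment.items()}

# Invented (segment, rating) pairs for a single item.
reviews = [("critics", 2), ("critics", 3), ("fans", 5), ("fans", 4)]
avg = contrast_segments(reviews)
print(f"Fans rate it {avg['fans']:.1f}, critics only {avg['critics']:.1f}.")
```

The final f-string hints at the NLG step: once the segment aggregates disagree, a template or generator can turn that contrast into an explanation for the user.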
30

Fixed Verse Generation using Neural Word Embeddings

January 2016 (has links)
abstract: For the past three decades, the design of an effective strategy for generating poetry that matches a human’s creative capabilities and complexity has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed verse poetry generation using neural word embeddings. During the course of generation, a two-layered poetry classifier is developed. The first layer uses a lexicon-based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content, with an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space. The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% of the computer-generated poems were correctly identified by non-experts, while 65% were correctly identified by experts. Although the system does not pass the Turing test, the results suggest an improvement of over 17% compared to previous methods that use Turing tests to evaluate poetry generators. / Dissertation/Thesis / Masters Thesis Computer Science 2016
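Syllable counting and syllable-constrained line filling, two of the evaluation ingredients mentioned above, can be approximated with a crude vowel-group heuristic. Real systems would use a pronouncing dictionary such as CMUdict; the heuristic and the candidate words below are illustrative only.

```python
def count_syllables(word):
    """Approximate syllables as maximal groups of consecutive vowels.
    This heuristic is wrong for many English words; a pronouncing
    dictionary is the usual fix."""
    groups, prev = 0, False
    for ch in word.lower():
        is_vowel = ch in "aeiouy"
        if is_vowel and not prev:
            groups += 1
        prev = is_vowel
    return max(groups, 1)

def fill_line(words, target):
    """Greedily append candidate words until the line hits the target
    syllable count exactly (e.g. 5 for the first haiku line); return
    None when the greedy pass cannot land on the target."""
    line, total = [], 0
    for w in words:
        s = count_syllables(w)
        if total + s <= target:
            line.append(w)
            total += s
        if total == target:
            return " ".join(line)
    return None

print(count_syllables("autumn"))                  # 2
print(fill_line(["an", "old", "silent", "pond"], 5))
```

A full generator would rank candidate words by embedding similarity to the poem's topic before applying the syllable constraint; here the candidate list is fixed to keep the constraint logic visible.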
