About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world; those who manage a university, consortium, or country archive and want to be added can find details on the NDLTD website.

Story Engineering: A Study of Automatic Story Generation and Telling

Guerra, Fabio Wanderley 29 May 2008
This dissertation investigates the problem of story generation and telling, whose increasingly recognized relevance is largely due to the popularization of interactive media such as digital TV and video games. The work begins with a state-of-the-art survey covering the major story representation models and the techniques most used in the production of literary works. The term story engineering is proposed to emphasize that story generation and telling should be treated as an engineering process. The fundamental problem is divided into three subproblems: how to generate stories, how to tell them to an audience, and how to build, store, and query the knowledge base used in story engineering. Finally, as a case study, a prototype capable of automatically generating and telling stories was designed and implemented. Generation is performed by a planner using the Hierarchical Task Network algorithm; storytelling uses a natural language generation tool. The knowledge base is stored as XML documents, and a tool was implemented to simplify its preparation.
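The HTN planning step described in this abstract can be sketched in a few lines: compound tasks are recursively decomposed by methods into subtasks until only primitive actions remain, yielding a linear plot. The domain, task names, and single-method decomposition below are illustrative assumptions, not the prototype's actual story domain.

```python
# Minimal HTN (Hierarchical Task Network) planning sketch, as used for
# story generation: compound tasks decompose into ordered subtasks until
# only primitive actions (plot events) remain.

METHODS = {  # compound task -> ordered subtasks (toy domain)
    "tell_hero_story": ["introduce_hero", "create_conflict", "resolve_conflict"],
    "create_conflict": ["villain_appears", "villain_threatens"],
}

PRIMITIVES = {"introduce_hero", "villain_appears",
              "villain_threatens", "resolve_conflict"}

def htn_plan(task):
    """Recursively decompose a task into a flat sequence of primitive actions."""
    if task in PRIMITIVES:
        return [task]
    plan = []
    for subtask in METHODS[task]:
        plan.extend(htn_plan(subtask))
    return plan

plot = htn_plan("tell_hero_story")
```

A real HTN planner would also check preconditions and choose among several methods per task; the recursion above shows only the decomposition skeleton.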

On the Application of Extensible Dependency Grammar to Natural Language Generation: Generality, Instantiability and Complexity Issues

Pelizzoni, Jorge Marques 29 August 2008
Natural Language Generation (NLG) concerns assigning linguistic form to data in non-linguistic representation (Reiter & Dale, 2000); Linguistic Realization (LR), in turn, comprises the NLG subtasks that depend strictly on the specifics of the target language. This work investigates LR, one of whose most prominent applications is the construction of target-language generation modules for machine translation based on semantic transfer. We start from three fundamental requirements for LR models, namely generality, instantiability, and complexity, and from the tension between these requirements in the state of the art. We argue for the formal evaluation of models in the literature against these criteria and focus on constraint-based models (Schulte, 2002) as promising tools to reconcile all three. Within this class of models, we identify Debusmann's (2006) Extensible Dependency Grammar (XDG) and its implementation, the XDG Development Toolkit (XDK), as an especially promising platform for LR work, despite never having been used as such. Our practical contributions comprise a successful effort to make the XDK more efficient and a formulation of the disjunction inherent to lexicalization that is suitable to XDG, illustrating its potential advantages in a full-fledged NLG system.
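The appeal of constraint-based realization can be illustrated with a toy lexicalization problem: each semantic node has several candidate word forms, and a well-formedness constraint (here, number agreement) prunes the combinations. The lexicon, the single constraint, and the brute-force search below are illustrative stand-ins, not XDG's or the XDK's actual constraint machinery.

```python
# Sketch of lexicalization as constraint satisfaction, in the spirit of
# constraint-based realization models: enumerate candidate word choices
# per semantic node and keep only combinations satisfying agreement.
from itertools import product

CANDIDATES = {  # semantic node -> (word form, number feature) candidates
    "DOG":  [("dog", "sg"), ("dogs", "pl")],
    "BARK": [("barks", "sg"), ("bark", "pl")],
}

def realize(nodes):
    """Return all lexical choices in which every word agrees in number."""
    solutions = []
    for choice in product(*(CANDIDATES[n] for n in nodes)):
        numbers = {num for _, num in choice}
        if len(numbers) == 1:  # agreement constraint: one shared number value
            solutions.append(" ".join(word for word, _ in choice))
    return solutions

sents = realize(["DOG", "BARK"])
```

A real constraint solver propagates such constraints to prune the search space before enumeration rather than filtering the full Cartesian product, which is what keeps complexity manageable as grammars grow.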

Surface Realization Using a Featurized Syntactic Statistical Language Model

Packer, Thomas L. 13 March 2006
An important challenge in natural language surface realization is the generation of grammatical sentences from incomplete sentence plans. Realization can be broken into a two-stage process: an over-generating rule-based module followed by a ranker that outputs the most probable candidate sentence according to a statistical language model. Thus far, an n-gram language model has been evaluated in this context; more sophisticated syntactic knowledge is expected to improve such a ranker. In this thesis, a new language model based on featurized functional dependency syntax was developed and evaluated. Generation accuracy and cross-entropy for the new language model did not improve on the comparison bigram language model.
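The two-stage overgenerate-and-rank pipeline described above can be sketched as follows. The tiny corpus, the candidate set, and the unsmoothed bigram model are illustrative assumptions; the thesis's actual models are trained on far larger data and, in the featurized case, on syntactic structure.

```python
# Sketch of overgenerate-and-rank surface realization: a rule module
# proposes candidate orderings, and a bigram language model picks the
# most probable one.
from collections import Counter
import math

corpus = ["the dog barked", "the cat slept", "the dog slept"]
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def score(sentence):
    """Log-probability under an unsmoothed bigram model (toy: unseen = -inf)."""
    toks = ["<s>"] + sentence.split()
    total = 0.0
    for a, b in zip(toks, toks[1:]):
        if bigrams[(a, b)] == 0:
            return float("-inf")
        total += math.log(bigrams[(a, b)] / unigrams[a])
    return total

candidates = ["dog the barked", "the dog barked"]  # over-generated orderings
best = max(candidates, key=score)
```

In practice the language model is smoothed so unseen bigrams get small nonzero probability instead of eliminating a candidate outright.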

Abstractive Summary Generation (Génération de résumés par abstraction)

Genest, Pierre-Étienne 05 1900
This Ph.D. thesis is the result of several years of research on automatic text summarization. Three major contributions, presented in the form of published and submitted papers, form its core. They trace a path that moves away from extractive summarization and toward abstractive summarization. The first article describes the HexTac experiment, conducted to evaluate how well humans summarize text when restricted to extracting sentences. The results show a wide performance gap between human summaries written by sentence extraction and those written without restriction. This empirically observed ceiling on sentence extraction demonstrates the need for other automatic approaches to summarization. We then developed and implemented a system, the subject of the second article, using the Fully Abstractive Summarization approach. Though the name suggests otherwise, this approach is better categorized as semi-extractive, along with sentence compression and sentence fusion. Building and evaluating the system brought to light the great challenge of generating easily readable summaries without extracting sentences: in this approach, the level of understanding of the source text remains insufficient to guide content selection, as it does in extractive approaches. As the third contribution, a knowledge-based approach to abstractive summarization called K-BABS is proposed. Relevant content is identified by pattern matching on an analysis of the source text, and rules are applied to generate summary sentences directly. The approach is implemented in a system called ABSUM, which produces very short but content-rich summaries. An evaluation according to today's standards shows that hybrid summaries formed by adding extracted sentences to ABSUM's output carry significantly more informative content than a state-of-the-art extractive summarizer.
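The K-BABS idea of matching patterns in an analysis of the source text and generating summary sentences directly can be sketched as follows. The regular-expression patterns and string templates below are toy stand-ins: ABSUM's actual rules operate over a deeper linguistic analysis, not raw text.

```python
# Sketch of knowledge-based abstractive summarization: task-specific
# rules match content patterns and generate new sentences directly,
# rather than extracting sentences from the source.
import re

RULES = [  # (pattern over the analyzed text, generation template)
    (re.compile(r"(\w+) was arrested"), "Police arrested {0}."),
    (re.compile(r"(\w+) died"),         "{0} is dead."),
]

def summarize(text):
    """Apply each rule to the text and emit one generated sentence per match."""
    sentences = []
    for pattern, template in RULES:
        for match in pattern.finditer(text):
            sentences.append(template.format(*match.groups()))
    return " ".join(sentences)

summary = summarize("Smith was arrested on Monday. Later, Jones died.")
```

Note how the output sentences ("Police arrested Smith.") never occur in the source: the rules abstract over the content instead of copying it, which is what distinguishes this approach from extraction.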

Integrating VerbNet into a Deep Realizer (Intégration de VerbNet dans un réalisateur profond)

Galarreta-Piquette, Daniel 08 1900
No description available.

Recurrent neural network language generation for dialogue systems

Wen, Tsung-Hsien January 2018
Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue become tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations. A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long-Short Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; it also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe. 
Continuing the success of end-to-end learning, the second part of the thesis frames the construction of an end-to-end dialogue system as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses, a design that favours comprehension and fast learning. The model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can converse naturally with human judges in a sample application domain, and further suggest that introducing a stochastic latent variable helps the system model intrinsic variation in communicative intention much better.
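The gating mechanism over input semantics described above can be illustrated with a toy, hand-coded sketch: a dialogue-act vector marks which slots still need to be realized, and the gate zeroes out each slot once the generator has emitted it. In the thesis the gate is learned inside an LSTM-like cell; the slot names, phrases, and fixed generation order below are illustrative assumptions.

```python
# Toy sketch of semantically gated generation: a dialogue-act (DA)
# vector tracks unrealized slots; realizing a slot closes its gate so
# the same information is not repeated.

SLOTS = ["name", "food", "area"]

def generate(dialogue_act):
    """Realize the slots present in the DA, gating each off once emitted."""
    da = [1.0 if slot in dialogue_act else 0.0 for slot in SLOTS]
    words = []
    for step_slot, phrase in [("name", "Loch Fyne"),
                              ("food", "serves seafood"),
                              ("area", "in the city centre")]:
        i = SLOTS.index(step_slot)
        if da[i] > 0.5:      # slot still active -> realize it
            words.append(phrase)
            da[i] = 0.0      # gate closes the slot
    return " ".join(words), da

sentence, final_da = generate({"name": "Loch Fyne", "food": "seafood"})
```

A fully consumed dialogue-act vector (all zeros at the end of generation) is exactly the training signal the thesis's model exploits: the gate should have "spent" every required slot by the end of the sentence.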

Language, Engagement and Emotions: The Resources of Linguistic Generation and Emotional Integration in Scientific Discourse (Langage, engagement et émotions : les ressources de la génération linguistique et de l'intégration émotionnelle dans le discours scientifique)

Pichard, Hugues 03 December 2012
Emotion and scientific discourse are traditionally considered incompatible, owing to the subjective nature of the former and the objectivity and neutrality requirements of the latter. The thesis studies the processes at work in the generation of emotions in relation to the constitution of discourse, and then the strategies or modes by which manifestations of emotion are integrated into the final discourse. The study combines psychological and linguistic approaches to emotion and revolves around the transition between the mental domain and that of language elaboration, leading to the expression in discourse of the previously generated emotions (the link between cognitive appraisal and Appraisal theory in linguistics). The thesis synthesizes the broad categories of inclusion modes, as well as the main types of global emotional-affective load manifestations. This synthesis results from the search for, and analysis of, emotion manifestations deliberately or accidentally included in a corpus of articles from the English-language peer-reviewed scientific press. One goal was to determine whether scientific discourse displays manifestations of the authors' subjectivity, and how these phenomena of subjectivity and emotion are encoded in texts that, by norm and convention, are meant to be as objective and neutral as possible, regardless of discipline. This raises the broader question of the share taken by emotion in discourse in general, from its constitution to its expression.

A Comparative Study of the Quality of Formality Style Transfer of Sentences in Swedish and English, Leveraging the BERT Model

Lindblad, Maria January 2021
Formality Style Transfer (FST) is the task of automatically transforming a piece of text from one level of formality to another. Previous research has investigated different methods of performing FST on English text, but at the time of this project there were, to the author's knowledge, no previous studies analysing the quality of FST on Swedish text. The purpose of this thesis was to investigate how a model trained for FST in Swedish performs. This was done by comparing the quality of a model trained for FST on Swedish text to an equivalent model trained on English text. Both models were implemented as encoder-decoder architectures, warm-started using two pre-existing Bidirectional Encoder Representations from Transformers (BERT) models, pre-trained on Swedish and English text respectively. The two FST models were fine-tuned for both the informal-to-formal and the formal-to-informal task, using Grammarly's Yahoo Answers Formality Corpus (GYAFC). The Swedish version of GYAFC was created through automatic machine translation of the original English version. The Swedish corpus was then evaluated on three criteria: meaning preservation, formality preservation and fluency preservation. The results indicated that the Swedish model had the capacity to match the quality of the English model but was held back by the inferior quality of the Swedish corpus. The study also highlighted the need for task-specific corpora in Swedish.
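The FST task itself can be illustrated with a toy rule-based informal-to-formal baseline; the thesis's actual models are BERT-warm-started encoder-decoders, and the lookup table below is a purely illustrative stand-in that handles only isolated contractions and slang tokens.

```python
# Toy informal-to-formal baseline for Formality Style Transfer (FST):
# a token-level substitution table. Real FST models must also handle
# rephrasing, word order, and context, which a lookup cannot.

INFORMAL_TO_FORMAL = {
    "gonna": "going to",
    "wanna": "want to",
    "u": "you",
    "thx": "thanks",
}

def formalize(sentence):
    """Replace known informal tokens; leave everything else unchanged."""
    tokens = sentence.split()
    return " ".join(INFORMAL_TO_FORMAL.get(tok, tok) for tok in tokens)

out = formalize("thx , u are gonna like it")
```

A baseline like this also makes the thesis's evaluation criteria concrete: it trivially preserves meaning and fluency but changes formality only when a table entry fires, whereas a learned model must balance all three.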

Handling Idioms in Multilingual Natural Language Generation (Le traitement des locutions en génération automatique de texte multilingue)

Dubé, Michaelle 08 1900
Idioms are rarely studied in natural language generation (NLG). Syntactically, an idiom forms a phrase, while semantically it corresponds to a single unit. This master's thesis proposes a treatment of idioms in multilingual NLG that isolates their constituents while preserving their global meaning. To do so, we developed a flexible solution based on universal templates of syntactic dependency trees, onto which we map French-specific idiom patterns (Pausé, 2017). Our work was implemented in the multilingual deep text realizer GenDR using data from the Réseau lexical du français (RL-fr). This resulted in 36 language-independent template-based lexicalization rules and a lexical dictionary for French idioms. Our implementation covers 2,846 idioms of the RL-fr (i.e., 97.5%), with a precision of 97.7%. The thesis is divided into five chapters, which describe: 1) the classical NLG architecture and the handling of idioms by different symbolic systems; 2) the architecture of GenDR (mainly its grammar, dictionaries, semantics-syntax interface, and lexicalization strategies); 3) the place of idioms in phraseology according to Meaning-Text Theory, along with the RL-fr and its linearized syntactic patterns; 4) our implementation of template-based lexicalization of idioms in GenDR; and 5) our evaluation of the coverage and precision of our implementation.
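The core idea, one semantic unit expanding into a multi-word syntactic subtree whose constituents remain separately accessible, can be sketched as follows. The lexicon entry, the dependency labels, and the naive linearization below are illustrative assumptions, not GenDR's actual rule format or the RL-fr's encoding.

```python
# Sketch of template-based idiom lexicalization: a single semantic sense
# maps to a small dependency subtree (head verb + dependents), so the
# idiom's parts stay syntactically distinct while the meaning is atomic.

IDIOM_LEXICON = {
    # sense -> syntactic template (toy entry for "kick the bucket" = DIE)
    "DIE": {"head": "kicked", "deps": [("obj", "the bucket")]},
}

def lexicalize(sense, subject):
    """Expand an idiom sense into a linearized clause; None if unknown."""
    entry = IDIOM_LEXICON.get(sense)
    if entry is None:
        return None
    # Naive linearization: subject + verbal head + dependents in order.
    parts = [subject, entry["head"]] + [dep for _, dep in entry["deps"]]
    return " ".join(parts)

phrase = lexicalize("DIE", "the old man")
```

Because the template keeps the head and its dependents as separate nodes, a realizer can still inflect the verb or insert modifiers inside the idiom, which is exactly what treating the idiom as an unanalyzable string would prevent.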
