  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Supporting Process Model Validation through Natural Language Generation

Leopold, Henrik, Mendling, Jan, Polyvyanyy, Artem 29 May 2014 (has links) (PDF)
The design and development of process-aware information systems is often supported by specifying requirements as business process models. Although this approach is generally accepted as an effective strategy, it remains a fundamental challenge to adequately validate these models given the diverging skill set of domain experts and system analysts. As domain experts often do not feel confident in judging the correctness and completeness of process models that system analysts create, the validation often has to regress to a discourse using natural language. In order to support such a discourse appropriately, so-called verbalization techniques have been defined for different types of conceptual models. However, there is currently no sophisticated technique available that is capable of generating natural-looking text from process models. In this paper, we address this research gap and propose a technique for generating natural language texts from business process models. A comparison with manually created process descriptions demonstrates that the generated texts are superior in terms of completeness, structure, and linguistic complexity. An evaluation with users further demonstrates that the texts are very understandable and effectively allow the reader to infer the process model semantics. Hence, the generated texts represent a useful input for process model validation.
2

DEXTER: Generating Documents by means of computational registers

Oldham, Joseph D. 01 January 2000 (has links)
Software is often capable of efficiently storing and managing data on computers. However, even software systems that store and manage data efficiently often do an inadequate job of presenting data to users. A prototypical example is the display of raw data in the tabular results of SQL queries. Users may need a presentation that is sensitive to data values and to domain conventions. One way to enhance presentation is to generate documents that correctly convey the data to users, taking into account the needs of the user and the values in the data. I have designed and implemented a software approach to generating human-readable documents in a variety of domains. The software to generate a document is called a computational register, or "register" for short. A register system is a software package for authoring and managing individual registers. Registers generating documents in various domains may be managed by one register system. In this thesis I describe computational registers at an architectural level and discuss registers as implemented in DEXTER, my register system. Input to DEXTER registers is a set of SQL query results. DEXTER registers use a rule-based approach to create a document outline from the input. A register creates the output document by using flexible templates to express the document outline. The register approach is unique in several ways. Content determination and structural planning are carried out sequentially rather than simultaneously. Content planning itself is broken down into data re-representation followed by content selection. No advanced linguistic knowledge is required to understand the approach. Register authoring follows a course very similar to writing a single document.
The internal data representation and content planning steps allow registers to use flexible templates, rather than more abstract grammar-based approaches, to render the final document. Computational registers are applicable in a variety of domains. What registers can be written is restricted not by domain but by the original data representation. Finally, DEXTER shows that a single software suite can assist in authoring and managing a variety of registers.
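The rule-then-template pipeline the abstract describes (data re-represented as an outline, then rendered via flexible templates) can be sketched as follows. The node types, template strings, and data fields below are hypothetical stand-ins, not DEXTER's actual representation.

```python
def render(outline, templates):
    """Register-style rendering sketch: a rule has already mapped the
    input data to an outline; each outline node's type selects a
    template, which is filled from that node's data fields.
    (Node types and templates here are illustrative, not DEXTER's own.)
    """
    parts = []
    for node in outline:
        template = templates[node["type"]]      # rule: node type -> template
        parts.append(template.format(**node["data"]))
    return " ".join(parts)

# Example: SQL-query-like results already re-represented as an outline
templates = {
    "greeting": "Dear {name},",
    "balance": "your account balance is {amount}.",
}
outline = [
    {"type": "greeting", "data": {"name": "Ana"}},
    {"type": "balance", "data": {"amount": "$120.50"}},
]
document = render(outline, templates)
```

Separating content planning (building the outline) from rendering (filling templates) is what lets the approach avoid grammar-based generation entirely.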
3

Um método para a fusão automática de sentenças similares em português / A method for the automatic fusion of similar sentences in Portuguese

Seno, Eloize Rossi Marques 24 May 2010 (has links)
In recent years, there has been increasing interest in applications of Natural Language Processing (NLP) that process a collection of texts on the same subject and generate a new output text, for instance a summary or an answer to a given question. In order to generate quality texts, these applications need to cope with various phenomena such as information redundancy, contradiction, and complementarity. In this context, a process that is able to identify common information in a set of related sentences and generate a new sentence by merging information from the input sentences, without redundancies and contradictions, is of great relevance for applications that process multiple texts. Automatic sentence fusion is a relatively new research topic in the NLP literature, and for Portuguese in particular we are not aware of any such work. This work proposes a new method for fusing similar sentences in Portuguese, based on a symbolic and domain-independent approach, and presents Zíper, a sentence fusion system that implements the proposed method. Zíper is the first such system to generate sentences that express all the information from the input sentences, i.e., the union of the input set. Moreover, it allows generating sentences that express only the redundant information of the set (considered more important), i.e., the intersection of the input sentences. The system was evaluated intrinsically, and the results show that, in general, the generated sentences are well formed and preserve the original message of the set (i.e., the entire message in fusion by union, and only the main message in fusion by intersection). Zíper was also evaluated extrinsically in the context of a Portuguese multi-document summarizer.
The results suggest that it can improve the quality of summaries by reducing redundancy, which often causes loss of cohesion and coherence.
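The union/intersection distinction at the heart of the method can be illustrated with a toy sketch. Zíper itself operates over symbolic linguistic analyses of the sentences; the plain word-set representation below is a hypothetical simplification used only to show the two fusion modes.

```python
def fuse(sentences, mode="union"):
    """Toy illustration of the two fusion modes over bags of words.

    - "intersection": keep only the redundant (shared) content,
      i.e. the main message common to all input sentences.
    - "union": keep all content from all sentences, deduplicated.

    Real sentence fusion works over symbolic analyses, not raw word
    sets; this simplification only demonstrates the set semantics.
    """
    token_sets = [set(s.lower().split()) for s in sentences]
    if mode == "intersection":
        return set.intersection(*token_sets)
    return set.union(*token_sets)

a = "the senate approved the budget on monday"
b = "the senate approved the new budget"
shared = fuse([a, b], mode="intersection")   # redundant content only
everything = fuse([a, b], mode="union")      # all content, deduplicated
```

The intersection keeps what both sentences assert (the approval of the budget), while the union also carries each sentence's complementary details (the date, the budget being new).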
5

Natural Language Generation for descriptive texts in interactive games

Eliasson, Christopher January 2014 (has links)
Context. Game development is a costly process, and with today's advanced hardware customers are asking for more playable content at higher quality. For many years this content has been provided procedurally for level creation, modeling, and animation. However, some games require content in other forms, such as executable quests that progress the game forward. Quests have been procedurally generated to some extent, but not in enough detail to be usable for game development without providing a handwritten description of the quest. Objectives. In this study we combine a procedural content generation structure for quests with a natural language generation approach to generate a descriptive summarized text for quests, and examine whether the resulting texts are viable as quest prototypes for use in game development. Methods. A number of articles on natural language generation are used to determine an appropriate way of validating the generated texts produced in this study, leading to the conclusion that a user case study evaluating each text against a set of statements is appropriate. Results. 30 texts were generated and evaluated from ten different quest structures, and the majority of the texts were found to be good enough to be used for game development purposes. Conclusions. We conclude that quests can be procedurally generated in more detail by incorporating natural language generation. However, the quest structure used in this study needs to be expanded in more detail at certain structural components in order to fully support an automated system in a flexible manner. Furthermore, because semantics and grammar are key components of the flow and usability of a text, a more sophisticated system would need to be implemented using more advanced natural language generation techniques.
6

Abusive and Hate Speech Tweets Detection with Text Generation

Nalamothu, Abhishek 06 September 2019 (has links)
No description available.
7

Regularized Fine-tuning Strategies for Neural Language Models : Application of entropy regularization on GPT-2

Hong, Jae Eun January 2022 (has links)
Deep neural language models like GPT-2 are undoubtedly strong at text generation, but they often require special decoding strategies to prevent degenerate output, namely repetition. The maximum likelihood training objective results in a peaked probability distribution, leading to over-confident neural networks. In this thesis, we explore entropy regularization for a neural language model, which can smooth a peaked output distribution during the fine-tuning process, employing GPT-2. We first define the models in three ways: (1) an out-of-the-box model without fine-tuning, (2) a fine-tuned model without entropy regularization, and (3) a fine-tuned model with entropy regularization. To investigate the effect of domain on the model, we also split the data in three ways: (1) fine-tuned on a heterogeneous dataset, tested on a heterogeneous dataset; (2) fine-tuned on a homogeneous dataset, tested on a homogeneous dataset; and (3) fine-tuned on a heterogeneous dataset, tested on a homogeneous dataset. For entropy regularization, we experiment with controlling the entropy strength parameter (𝛽) over the range [0.5, 1.0, 2.0, 4.0, 6.0] and with annealing the parameter during fine-tuning. Our findings show that entropy-based regularization during fine-tuning improves text generation models by significantly reducing the repetition rate without tuning the decoding strategies. Comparing the probabilities of human-generated sentence tokens, we observed that entropy regularization compensates for the shortcomings of deterministic decoding methods (beam search), which mostly select a few high-probability words. Various studies have explored entropy regularization in the cold-start training of neural networks, but few cover its effect in the fine-tuning stage of text generation tasks that employ large-scale pre-trained language models.
Our findings present strong evidence that significant improvement in text generation can be achieved by utilizing entropy regularization, a highly cost-effective approach, during the fine-tuning process.
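The objective described above can be sketched, for a single predicted token, as cross-entropy minus a 𝛽-weighted entropy bonus. This is a minimal illustration of the idea of penalizing peaked distributions; the exact per-token formulation used in the thesis may differ.

```python
import math

def entropy_regularized_loss(probs, target_index, beta=1.0):
    """Cross-entropy minus beta * entropy for one token prediction.

    Subtracting the entropy term rewards flatter (less over-confident)
    output distributions, counteracting the peaked distributions that
    maximum likelihood training produces.
    """
    cross_entropy = -math.log(probs[target_index])
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return cross_entropy - beta * entropy

# A peaked and a flatter prediction over a 4-word vocabulary,
# both ranking the correct word (index 0) first:
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.40, 0.20, 0.20, 0.20]
```

With beta = 0 this reduces to plain cross-entropy and the peaked distribution scores lower; as beta grows, the flatter distribution is increasingly favored, which is the smoothing effect that reduces repetition under deterministic decoding.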
8

Summarizing User-generated Discourse

Syed, Shahbaz 04 July 2024 (has links)
Automatic text summarization is a long-standing task with its origins in summarizing scholarly documents by generating their abstracts. While older approaches mainly focused on generating extractive summaries, recent approaches using neural architectures have helped the task advance towards generating more abstractive, human-like summaries. Yet the majority of research in automatic text summarization has focused on summarizing professionally written news articles, due to the easier availability of large-scale datasets with ground-truth summaries in this domain. Moreover, the inverted-pyramid writing style enforced in news articles places crucial information in the top sentences, essentially summarizing the article, which allows for more reliable identification of ground truth when constructing datasets. In contrast, user-generated discourse, such as social media forums or debate portals, has received comparatively little attention, despite its evident importance. Possible reasons include the challenges posed by the informal nature of user-generated discourse, which often lacks the rigid structure of news articles, and the difficulty of obtaining high-quality ground-truth summaries for this text register.
This thesis aims to address this gap by delivering the following novel contributions in the form of datasets, methodologies, and evaluation strategies for automatically summarizing user-generated discourse: (1) three new datasets for the registers of social media posts and argumentative texts, containing author-provided ground-truth summaries as well as crowdsourced summaries for argumentative texts, created by adapting theoretical definitions of high-quality summaries; (2) methodologies for creating informative as well as indicative summaries for long discussions of controversial topics; (3) user-centric evaluation processes that emphasize the purpose and provenance of the summary for qualitative assessment of summarization models; and (4) tools for facilitating the development and evaluation of summarization models that leverage visual analytics and interactive interfaces to enable fine-grained inspection of automatically generated summaries in relation to their source documents.
Table of contents:
1 Introduction (1.1 Understanding User-Generated Discourse; 1.2 The Role of Automatic Summarization; 1.3 Research Questions and Contributions; 1.4 Thesis Structure; 1.5 Publication Record)
2 The Task of Text Summarization (2.1 Decoding Human Summarization Practices; 2.2 Exploring Automatic Summarization Methods; 2.3 Evaluation of Automatic Summarization and its Challenges; 2.4 Summary)
3 Defining Good Summaries: Examining News Editorials (3.1 Key Characteristics of News Editorials; 3.2 Operationalizing High-Quality Summaries; 3.3 Evaluating and Ensuring Summary Quality; 3.4 Automatic Extractive Summarization of News Editorials; 3.5 Summary)
4 Mining Social Media for Author-provided Summaries (4.1 Leveraging Human Signals for Summary Identification; 4.2 Constructing a Corpus of Abstractive Summaries; 4.3 Insights from the TL;DR Challenge; 4.4 Summary)
5 Generating Conclusions for Argumentative Texts (5.1 Identifying Author-provided Conclusions; 5.2 Enhancing Pretrained Models with External Knowledge; 5.3 Evaluating Informative Conclusion Generation; 5.4 Summary)
6 Frame-Oriented Extractive Summarization of Argumentative Discussions (6.1 Importance of Summaries for Argumentative Discussions; 6.2 Employing Argumentation Frames as Anchor Points; 6.3 Extractive Summarization of Argumentative Discussions; 6.4 Evaluation of Extractive Summaries via Relevance Judgments; 6.5 Summary)
7 Indicative Summarization of Long Discussions (7.1 Table of Contents as an Indicative Summary; 7.2 Unsupervised Summarization with Large Language Models; 7.3 Comprehensive Analysis of Prompt Engineering; 7.4 Purpose-driven Evaluation of Summary Usefulness; 7.5 Summary)
8 Summary Explorer: Visual Analytics for the Qualitative Assessment of the State of the Art in Text Summarization (8.1 Limitations of Automatic Evaluation Metrics; 8.2 Designing Interfaces for Visual Exploration of Summaries; 8.3 Corpora, Models, and Case Studies; 8.4 Summary)
9 SummaryWorkbench: Reproducible Models and Metrics for Text Summarization (9.1 Addressing the Requirements for Summarization Researchers; 9.2 A Unified Interface for Applying and Evaluating State-of-the-Art Models and Metrics; 9.3 Models and Measures; 9.4 Curated Artifacts and Interaction Scenarios; 9.5 Interaction Use Cases; 9.6 Summary)
10 Conclusion (10.1 Key Contributions of the Thesis; 10.2 Open Problems and Future Work)
9

Automatizované generování příkladů do předmětu Asemblery / Auto-Generation of Examples for Assembly Languages Course

Tomeček, Aleš January 2013 (has links)
This study analyses approaches to generating unique assignments for teaching purposes and their potential usefulness in the computer labs of an assembly languages course. Based on that research, we design and implement a system for creating pseudo-unique assignments. The work also includes a web application for use directly during the course, and other tools that aid further work with the system itself.
10

Synthetic Data Generation Using Transformer Networks / Textgenerering med transformatornätverk : Skapa text från ett syntetiskt dataset i tabellform

Campos, Pedro January 2021 (has links)
One of the areas propelled by advances in deep learning is Natural Language Processing. These continuous advances allowed the emergence of new language models such as the Transformer [1], a deep learning model based on attention mechanisms that takes a sequence of symbols as input and outputs another sequence, attending to the input during generation. This model is often used in translation, text summarization, and text generation, outperforming previously used methods such as Recurrent Neural Networks and Generative Adversarial Networks. The problem statement provided by the company Syndata for this thesis is related to this new architecture: given a tabular dataset, create a model based on the Transformer that can generate text fields considering the underlying context from the rest of the accompanying fields. Syndata has previously implemented a recurrent model for this task; nevertheless, they are confident that a Transformer could perform better. Their goal is to improve the existing solution with a model based on the Transformer architecture, which should then be compared to the previous recurrent model and is expected to outperform it. Since there are not many published research articles in which Transformers are used for synthetic tabular data generation, this problem is fairly original. Four different models were implemented: a model based on the GPT architecture [2], an LSTM [3], a Bidirectional LSTM with an Encoder-Decoder structure, and the Transformer. The first two are autoregressive models, and the latter two are sequence-to-sequence models with an Encoder-Decoder architecture.
We evaluated each of them on three different aspects: the distribution similarity between the real and generated datasets; how well each model was able to condition name generation on the information contained in the accompanying fields; and how much real data the model compromised after generation, which addresses a privacy-related issue. We found that Encoder-Decoder models such as the Transformer and the Bidirectional LSTM seem to perform better for this type of synthetic data generation, where the output (the field to be predicted) has to be conditioned on the rest of the accompanying fields. They outperformed the GPT and RNN models in the aspects that matter most to Syndata: keeping customer data private and correctly conditioning the output on the information contained in the accompanying fields.
