241

Segmenting Electronic Theses and Dissertations By Chapters

Manzoor, Javaid Akbar 18 January 2023 (has links)
Master of Science / Electronic theses and dissertations (ETDs) are structured documents in which chapters are major components. No existing repository stores chapter boundary details alongside these structured documents, and exposing those details can help increase accessibility. This research explores manipulating ETDs marked up in LaTeX to derive chapter boundaries, which we use to create a data set of 1,459 ETDs and their chapter boundaries. Additionally, for the task of automatically segmenting unseen documents, we prototype three deep learning models trained on this data set. We hope to encourage researchers to apply similar LaTeX manipulation techniques to create comparable data sets.
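To make the LaTeX manipulation step concrete, here is a minimal sketch of how chapter boundaries might be recovered from a thesis source, assuming a single .tex file and standard \chapter{...} commands; the file name and helper function are illustrative and not taken from the thesis.

```python
import re
from pathlib import Path

# Matches \chapter{...} and \chapter*{...}; real ETDs may also split chapters across \include'd files.
CHAPTER_CMD = re.compile(r"\\chapter\*?\{([^}]*)\}")

def chapter_boundaries(tex_path: str) -> list[dict]:
    """Return title/start_line/end_line records for each \chapter in a .tex file."""
    lines = Path(tex_path).read_text(encoding="utf-8", errors="ignore").splitlines()
    starts = [(i, m.group(1)) for i, line in enumerate(lines)
              if (m := CHAPTER_CMD.search(line))]
    chapters = []
    for idx, (start, title) in enumerate(starts):
        # A chapter ends just before the next \chapter, or at end of file.
        end = starts[idx + 1][0] - 1 if idx + 1 < len(starts) else len(lines) - 1
        chapters.append({"title": title, "start_line": start, "end_line": end})
    return chapters

if __name__ == "__main__":
    for ch in chapter_boundaries("thesis.tex"):  # hypothetical input file
        print(ch)
```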
242

Computational models of coherence for open-domain dialogue

Cervone, Alessandra 08 October 2020 (has links)
Coherence is the quality that gives a text its conceptual unity, making a text a coordinated set of connected parts rather than a random group of sentences (turns, in the case of dialogue). Hence, coherence is an integral property of human communication, necessary for a meaningful discourse both in text and dialogue. As such, coherence can be regarded as a requirement for conversational agents, i.e. machines designed to converse with humans. Though there has recently been a proliferation in the usage and popularity of conversational agents, dialogue coherence is still a relatively neglected area of research, and coherence across multiple turns of a dialogue remains an open challenge for current conversational AI research. As conversational agents progress from handling a single application domain to multiple ones and finally any domain (open-domain), the range of possible dialogue paths increases, and thus the problem of maintaining multi-turn coherence becomes especially critical. In this thesis, we investigate two aspects of coherence in dialogue and how they can be used to design modules for an open-domain coherent conversational agent. In particular, our approach focuses on modeling intentional and thematic information patterns of distribution as proxies for a coherent discourse in open-domain dialogue. For modeling intentional information we employ Dialogue Acts (DA) theory (Bunt, 2009); for modeling thematic information we rely on open-domain entities (Barzilay and Lapata, 2008). We find that DAs and entities play a fundamental role in modeling dialogue coherence both independently and jointly, and that they can be used to model different components of an open-domain conversational agent architecture, such as Spoken Language Understanding, Dialogue Management, Natural Language Generation, and open-domain dialogue evaluation. The main contributions of this thesis are: (I) we present an open-domain modular conversational agent architecture based on entity and DA structures designed for coherence and engagement; (II) we propose a methodology for training an open-domain DA tagger compliant with the ISO 24617-2 standard (Bunt et al., 2012) by combining multiple resources; (III) we propose different models, and a corpus, for predicting open-domain dialogue coherence using DA and entity information trained with weakly supervised techniques, first at the conversation level and then at the turn level; (IV) we present supervised approaches for automatic evaluation of open-domain conversation exploiting DA and entity information, both at the conversation level and at the turn level; (V) we present experiments with Natural Language Generation models that generate text from Meaning Representation structures composed of DAs and slots for an open-domain setting.
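As a rough illustration of the entity side of this thematic modeling, the sketch below builds a binary entity grid over dialogue turns and counts entity transition patterns, in the spirit of Barzilay and Lapata (2008); the toy turns and feature choices are ours, not the thesis's models.

```python
from collections import Counter

def entity_grid(turns: list[set[str]]) -> dict[str, list[int]]:
    """Binary entity grid: for each entity, 1 if it is mentioned in a turn, else 0."""
    entities = set().union(*turns)
    return {e: [int(e in turn) for turn in turns] for e in sorted(entities)}

def transition_profile(grid: dict[str, list[int]]) -> Counter:
    """Count (present, present), (present, absent), ... transitions between adjacent turns."""
    counts = Counter()
    for history in grid.values():
        for a, b in zip(history, history[1:]):
            counts[(a, b)] += 1
    return counts

# Toy dialogue: entity mentions per turn (normally produced by NER / coreference).
turns = [{"restaurant"}, {"restaurant", "pizza"}, {"pizza"}, {"weather"}]
profile = transition_profile(entity_grid(turns))
total = sum(profile.values())
print({t: round(c / total, 2) for t, c in profile.items()})
```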
243

Improving Access to ETD Elements Through Chapter Categorization and Summarization

Banerjee, Bipasha 07 August 2024 (has links)
The field of natural language processing and information retrieval has made remarkable progress since the 1980s. However, most of the theoretical investigation and applied experimentation is focused on short documents like web pages, journal articles, or papers in conference proceedings. Electronic Theses and Dissertations (ETDs) contain a wealth of information. These book-length documents describe research conducted in a variety of academic disciplines. While current digital library systems can be used directly to find a document of interest, they do not help users discover which specific parts or segments are of particular interest. This research aims to improve access to ETD components by providing users with chapter-level classification labels and summaries that help them easily find portions of interest. We explore the challenges such documents pose, especially when dealing with a highly specialized academic vocabulary. We use large language models (LLMs) and fine-tune pre-trained models for these downstream tasks. We also develop a method to connect the ETD discipline and department information to an ETD-centric classification system. To help guide the summarization model toward better chapter summaries, for each chapter we try to identify relevant sentences of the document abstract, plus the titles of cited references from the bibliography. We leverage human feedback to evaluate models qualitatively on top of traditional metrics. We provide users with chapter classification labels and summaries to improve access to ETD chapters. We generate the top three classification labels for each chapter, reflecting the interdisciplinarity of the work in ETDs. Our evaluation shows that our ensemble methods yield summaries that are preferred by users. Our summaries also perform better than summaries generated by a single method when evaluated on several metrics using an LLM-based evaluation methodology. / Doctor of Philosophy / Natural language processing (NLP) is a field in computer science that focuses on creating artificially intelligent models capable of processing text and audio similarly to humans. We make use of various NLP techniques, ranging from machine learning to language models, to provide users with a much more granular view of the information stored in Electronic Theses and Dissertations (ETDs). ETDs are documents submitted by students conducting research at the culmination of their degree. Such documents comprise research work in various academic disciplines and thus contain a wealth of information. This work aims to make the information stored in chapters of ETDs more accessible to readers through the addition of chapter-level classification labels and summaries. We provide users with chapter classification labels and summaries to improve access to ETD chapters. We generate the top three classification labels for each chapter, reflecting the interdisciplinarity of the work in ETDs. Alongside human evaluation of automatically generated summaries, we use an LLM-based approach that scores summaries on several metrics. Our evaluation shows that our methods yield summaries that users prefer to summaries generated by a single method.
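As a rough sketch of what producing top-three chapter labels could look like, the snippet below uses an off-the-shelf zero-shot classifier from the Hugging Face transformers library (assumed installed) with a placeholder label set; it is not the fine-tuned classification system the dissertation describes.

```python
from transformers import pipeline  # assumes the transformers library is installed

# Placeholder label set; the dissertation maps ETD discipline/department data to its own taxonomy.
LABELS = ["computer science", "electrical engineering", "physics",
          "biology", "economics", "education"]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def top3_labels(chapter_text: str) -> list[tuple[str, float]]:
    """Return the three highest-scoring discipline labels for one chapter."""
    # Crude truncation; long chapters would need chunking in practice.
    result = classifier(chapter_text[:4000], candidate_labels=LABELS, multi_label=True)
    return list(zip(result["labels"], result["scores"]))[:3]

print(top3_labels("We train a convolutional network to detect defects in solar cells ..."))
```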
244

Linguistic Cues to Deception

Connell, Caroline 05 June 2012 (has links)
This study replicated a common experiment, the Desert Survival Problem, and attempted to add data to the body of knowledge on deception cues. Participants wrote truthful and deceptive essays arguing why items salvaged from the wreckage were useful for survival. Cues to deception considered here fit into four categories: those caused by a deceiver's negative emotions, those tied to verbal immediacy, those linked to a deceiver's attempt to appear truthful, and those resulting from a deceiver's high cognitive load. Cues caused by negative emotions were mostly absent from the results, although deceivers did use fewer first-person pronouns than truth tellers, indicating that deceivers were less willing to take ownership of their statements. Cues stemming from deceivers' attempts to appear truthful were present: deceivers used more words and more exact language than truth tellers. Deceivers' language was also simpler than that of truth tellers, which indicated a higher cognitive load. Future research should include manipulation checks on motivation and emotion, which are tied to cue display. The type of cue displayed, be it emotional leakage, verbal immediacy, attempts to appear truthful, or cognitive load, might be associated with particular deception tasks. Future research, including meta-analyses, should attempt to determine which deception tasks produce which cue types. Revised file, GMc 5/28/2014 per Dean DePauw / Master of Arts
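As a toy illustration of the kinds of surface cues compared here (word count, first-person pronoun rate, a crude lexical complexity proxy), the sketch below computes a few of them for two invented essays; the token list and measures are our simplifications, not the study's coding scheme.

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}

def cue_features(essay: str) -> dict[str, float]:
    """Simple surface cues of the kind compared between truthful and deceptive essays."""
    tokens = re.findall(r"[a-z']+", essay.lower())
    n = len(tokens) or 1
    return {
        "word_count": float(len(tokens)),
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
        "avg_word_length": sum(len(t) for t in tokens) / n,  # crude complexity proxy
    }

truthful = "I think the mirror is useful because I can signal planes with it."
deceptive = "The mirror provides exceptional utility for signalling aircraft at considerable distances."
print(cue_features(truthful))
print(cue_features(deceptive))
```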
245

Information Retrieval Models for Software Test Selection and Prioritization

Gådin, Oskar January 2024 (has links)
There are many software systems currently in use for different applications. To make sure that these systems function, they need to be properly tested and maintained. As a system grows in scope it becomes more difficult to test and maintain, so test selection and prioritization tools that incorporate artificial intelligence, information retrieval, and natural language processing are useful. In this thesis, different information retrieval models were implemented and evaluated on multiple datasets built with different filters and pre-processing methods. The data were provided by Westermo Network Technologies AB and represent one of their systems; the datasets contained information about test results and about the data used in each test. The results showed that, for models not trained on this data, it is more beneficial to give them less data, restricted to what relates to test failures. When the models had access to more data, they made inaccurate connections because the data were unrelated. The results also showed that when a model is not adjusted to the data, a simple model can be more effective than a more advanced one.
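A minimal sketch of the information-retrieval framing: treat each test's textual description as a document, a failure description as the query, and rank tests by TF-IDF cosine similarity. The test names and texts below are invented placeholders, not Westermo data, and this is only one of the simpler model families such a thesis might compare.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical test descriptions (logs, names, tags); the real data came from Westermo's system.
tests = {
    "test_vlan_tagging": "configure vlan trunk ports verify tagged frames forwarded",
    "test_dhcp_lease": "dhcp server lease renewal client reboot",
    "test_firewall_rules": "firewall drop rule tcp port blocked traffic",
}

query = "frames not forwarded on tagged vlan trunk"  # e.g., a recent failure report

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(tests.values())
scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]

# Rank tests by similarity to the failure description, most relevant first.
for name, score in sorted(zip(tests, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {name}")
```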
246

Konzeption eines dreistufigen Transfers für die maschinelle Übersetzung natürlicher Sprachen

Laube, Annett, Karl, Hans-Ulrich 14 December 2012 (has links) (PDF)
0 PREFACE The analysis and synthesis algorithms needed for translating programming languages have for quite some time been formulated relatively well in a language-independent way. This is reflected, among other things, in a multitude of generators that allow the translation process to be fully or partially automated. The syntax of the language to be processed is usually available in data form (graphs, lists) based on formal description tools (e.g. BNF). In the field of natural language translation, the separation of language and processing algorithms has been achieved, if at all, only in rudimentary form. The reasons are obvious: natural languages are more powerful, and their formal representation is difficult. If translation is also to cover spoken communication, i.e. to replace the human interpreter at an international conference or on a phone call with a partner who speaks another language, real-time requirements are added that will force highly parallel approaches to be pursued. Even when no real-time requirements apply, the translation process is extraordinarily complex. Solutions are sought using the interlingua and transfer approaches, increasingly employing formal description tools from relatively well-researched subfields of computer science (operations on decorated trees, tree-to-tree translation strategies), in the hope that the results will lead further than the spectacular prototypes already on the market, which are often derived from heuristic approaches. [...]
247

Tense, aspect and temporal reference

Moens, Marc January 1988 (has links)
English exhibits a rich apparatus of tense, aspect, time adverbials and other expressions that can be used to order states of affairs with respect to each other, or to locate them at a point in time with respect to the moment of speech. Ideally one would want a semantics for these expressions to demonstrate that an orderly relationship exists between any one expression and the meanings it conveys. Yet most existing linguistic and formal semantic accounts leave something to be desired in this respect, describing natural language temporal categories as being full of ambiguities and indeterminacies, apparently escaping a uniform semantic description. It will be argued that this anomaly stems from the assumption that the semantics of these expressions is directly related to the linear conception of time familiar from temporal logic or physics - an assumption which can be seen to underlie most of the current work on tense and aspect. According to these theories, the cognitive work involved in the processing of temporal discourse consists of the ordering of events as points or intervals on a time line or a set of time lines. There are, however, good reasons for wondering whether this time concept really is the one that our linguistic categories are most directly related to; it will be argued that a semantics of temporally referring expressions and a theory of their use in defining the temporal relations of events require a different and more complex structure underlying the meaning representations than is commonly assumed. A semantics will be developed, based on the assumption that categories like tense, aspect, aspectual adverbials and propositions refer to a mental representation of events that is structured on other than purely temporal principles, and to which the notion of a nucleus or consequentially related sequence of preparatory process, goal event and consequent state is central. It will be argued that the identification of the correct ontology is a logical preliminary to the choice of any particular formal representation scheme, as well as being essential in the design of natural language front-ends for temporal databases. It will be shown how the ontology developed here can be implemented in a database that contains time-related information about events and that is to be queried by means of natural language utterances.
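As a small illustration of the nucleus structure described here (a preparatory process, a goal event or culmination, and a consequent state), the sketch below renders it as a plain data structure; the field names and example are our gloss, not the thesis's formalism.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Nucleus:
    """Event structure: preparatory process -> culmination (goal event) -> consequent state."""
    preparatory_process: Optional[str]
    culmination: str
    consequent_state: Optional[str]

# "Harry climbed to the top" construed as a culminated process:
climb = Nucleus(
    preparatory_process="Harry climbing",
    culmination="Harry reaches the top",
    consequent_state="Harry is at the top",
)

# A progressive ("Harry was climbing ...") foregrounds only the preparatory process,
# so the culmination and consequent state need not actually hold.
print(climb)
```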
248

Automatic movie analysis and summarisation

Gorinski, Philip John January 2018 (has links)
Automatic movie analysis is the task of applying Machine Learning methods to screenplays, movie scripts, and motion pictures to facilitate or enable various tasks throughout the entirety of a movie’s life-cycle. From helping with making informed decisions about a new movie script with respect to aspects such as its originality, similarity to other movies, or even commercial viability, all the way to offering consumers new and interesting ways of viewing the final movie, many stages in the life-cycle of a movie stand to benefit from Machine Learning techniques that promise to reduce human effort, time, or both. Within this field of automatic movie analysis, this thesis addresses the task of summarising the content of screenplays, enabling users at any stage to gain a broad understanding of a movie from greatly reduced data. The contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale data set of original movie scripts, annotated with additional meta-information such as genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase is the largest data set of its kind, containing scripts and information for almost 1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the screenplay domain, which allows for extraction of highly informative and important scenes from movie scripts. The extracted summaries allow the content of the original script to stay largely intact and provide the user with its important parts, while greatly reducing the script-reading time. (iii) We extend our summarisation model to capture additional modalities beyond the screenplay text. The model is rendered multi-modal by introducing visual information obtained from the actual movie and by extracting scenes from the movie, allowing users to generate visual summaries of motion pictures. (iv) We devise a novel end-to-end neural network model for generating natural language screenplay overviews. This model enables the user to generate short descriptive and informative texts that capture certain aspects of a movie script, such as its genres, approximate content, or style, allowing them to gain a fast, high-level understanding of the screenplay. Multiple automatic and human evaluations were carried out to assess the performance of our models, demonstrating that they are well-suited for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the ScriptBase data set has started to gain traction, and is currently used by a number of other researchers in the field to tackle various tasks relating to screenplays and their analysis.
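As a toy sketch of extractive screenplay summarisation, the snippet below scores scenes by how many main characters they mention and keeps the top-k in script order; the scenes are invented and the thesis's models use far richer features than this.

```python
def score_scene(scene_text: str, main_characters: set[str]) -> int:
    """Toy importance score: how many main characters appear in the scene."""
    return sum(name.lower() in scene_text.lower() for name in main_characters)

def summarise(scenes: list[str], main_characters: set[str], k: int = 2) -> list[str]:
    """Keep the k highest-scoring scenes, preserving script order."""
    ranked = sorted(range(len(scenes)),
                    key=lambda i: score_scene(scenes[i], main_characters),
                    reverse=True)
    keep = sorted(ranked[:k])
    return [scenes[i] for i in keep]

scenes = [
    "INT. DINER - NIGHT. VINCENT and JULES argue about breakfast.",
    "EXT. STREET - DAY. A passer-by walks a dog.",
    "INT. APARTMENT - DAY. JULES confronts BRETT while VINCENT waits.",
]
print(summarise(scenes, {"Vincent", "Jules"}))
```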
249

Tell me why : uma arquitetura para fornecer explicações sobre revisões / Tell me why : an architecture to provide rich review explanations

Woloszyn, Vinicius January 2015 (has links)
What other people think has always been an important part of the process of decision-making. For instance, people usually consult their friends to get an opinion about a book, a movie, or a restaurant. Nowadays, users publish their opinions on collaborative reviewing sites such as IMDB for movies, Yelp for restaurants, and TripAdvisor for hotels. Over time, these sites have built a massive database that connects users, items, and opinions expressed as a numeric rating and a free-text review that explains why they like or dislike a specific item. But this vast amount of data can hamper the user trying to form an opinion. Several related works provide review interpretations to users, offering different advantages for various types of summaries. However, they all have the same limitation: they provide neither personalized summaries nor contrasts between reviews written by different segments of reviewers. Understanding and contrasting reviews written by different segments of reviewers is still an open research problem. Our work proposes a new architecture, called Tell Me Why, a project developed at the Grenoble Informatics Laboratory in cooperation with the Federal University of Rio Grande do Sul to provide users with a better understanding of reviews. We propose a combination of text analysis of reviews with mining of the structured data resulting from crossing reviewer and item dimensions. Additionally, this work investigates summarization methods used in the review domain. The output of our architecture consists of personalized text statements, produced with Natural Language Generation, composed of item attributes and summarized comments that explain people's opinions about a particular item. The results of a comparative evaluation against Amazon's Most Helpful Review indicate that this is a promising approach that users find helpful.
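As a toy sketch of the contrast-between-reviewer-segments idea, the snippet below aggregates ratings per segment and renders the contrast as a templated sentence; the segments, numbers, and template are invented, and the thesis's NLG component is far more elaborate.

```python
from statistics import mean

# Hypothetical ratings of one item, keyed by a reviewer segment (e.g., age group).
ratings = {
    "under_25": [5, 4, 5, 4],
    "over_50": [2, 3, 2, 3],
}

def explain(item: str, ratings_by_segment: dict[str, list[int]]) -> str:
    """Template-based statement contrasting the highest- and lowest-rating segments."""
    avg = {seg: mean(r) for seg, r in ratings_by_segment.items()}
    best, worst = max(avg, key=avg.get), min(avg, key=avg.get)
    return (f"Reviewers in the '{best}' segment like {item} (avg {avg[best]:.1f}/5), "
            f"while the '{worst}' segment tends to dislike it (avg {avg[worst]:.1f}/5).")

print(explain("the movie", ratings))
```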
250

Answering Deep Queries Specified in Natural Language with Respect to a Frame Based Knowledge Base and Developing Related Natural Language Understanding Components

January 2015 (has links)
abstract: Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success in Jeopardy! and the spread of digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana to every smartphone and browser. However, most research in Question Answering targets factual questions rather than deep ones such as "How" and "Why" questions. In this dissertation, I suggest a different approach to tackling this problem. We believe that the answers to deep questions need to be formally defined before they can be found. Because these answers must be defined with respect to something more structured than raw natural language text, I define Knowledge Description Graphs (KDGs), graphical structures containing information about events, entities, and classes. We then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers to various "How" and "Why" questions with respect to KDGs, and show how to obtain those answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information when constructing KDGs from an under-specified knowledge base, and how to answer many factual question types with respect to the knowledge base. Having defined the answers to various questions with respect to a knowledge base, I extend this research to use natural language text for specifying deep questions and the knowledge base, and to generate natural language text from those specifications. Toward these goals, I developed NL2KR, a system that helps translate natural language to formal language. I show NL2KR's use in translating "How" and "Why" questions and in generating simple natural language sentences from natural language KDG specifications. Finally, I discuss applications of the components I developed in Natural Language Understanding. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2015
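To give a feel for the graph idea (without the frame-based knowledge base or the Answer Set Programming machinery), here is a toy sketch in which events are nodes, edges carry relations, and a "Why" question is answered by walking incoming "causes" edges; the relations and example are invented, not the dissertation's KDG formalism.

```python
import networkx as nx  # assumes the networkx library is installed

# Toy knowledge description graph: nodes are events/entities, edges carry relation labels.
kdg = nx.DiGraph()
kdg.add_edge("rain fell", "ground is wet", relation="causes")
kdg.add_edge("ground is wet", "match failed to light", relation="causes")
kdg.add_edge("camper", "match failed to light", relation="agent")

def answer_why(event: str) -> list[str]:
    """Answer 'Why did <event> happen?' by collecting ancestors linked via 'causes' edges."""
    causes = [u for u, v, d in kdg.in_edges(event, data=True) if d["relation"] == "causes"]
    return causes + [c for cause in causes for c in answer_why(cause)]

print(answer_why("match failed to light"))  # ['ground is wet', 'rain fell']
```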
