631 |
Turn of Phrase: Contrastive Pre-Training for Discourse-Aware Conversation Models. Laboulaye, Roland, 16 August 2021 (has links)
Understanding long conversations requires recognizing a discourse flow unique to conversation. Recent advances in unsupervised representation learning of text have been attained primarily through language modeling, which models discourse only implicitly and within a small window. These representations are in turn evaluated chiefly on sentence pair or paragraph-question pair benchmarks, which measure only local discourse coherence. In order to improve performance on discourse-reliant, long conversation tasks, we propose Turn-of-Phrase pre-training, an objective designed to encode long conversation discourse flow. We leverage tree-structured Reddit conversations in English: relative to a chosen conversation path through the tree, we select paths of varying degrees of relatedness. The final utterance of the chosen path is appended to the related paths and the model learns to identify the most coherent conversation path. We demonstrate that our pre-training objective encodes conversational discourse awareness by improving performance on a dialogue act classification task. We then demonstrate the value of transferring discourse awareness with a comprehensive array of conversation-level classification tasks evaluating persuasion, conflict, and deception.
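A minimal sketch of the kind of path-contrastive objective this abstract describes, assuming a generic Hugging Face encoder and a toy batch; the model name, scoring head, and example utterances are illustrative choices, not the authors' exact setup.

```python
# Sketch: score candidate conversation paths and train the encoder to prefer
# the coherent one (index 0). Model name, head, and data are placeholders.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = nn.Linear(encoder.config.hidden_size, 1)

# One training example: the true path plus two distractor paths, each with the
# final utterance of the chosen path appended (as described in the abstract).
final_utterance = "So you agree the referee's call decided the game?"
candidates = [
    "The match was close. | The referee made a bad call late. | " + final_utterance,  # coherent path
    "I just adopted a puppy. | She chews on everything. | " + final_utterance,        # distractor
    "Best sourdough recipe? | Use a mature starter. | " + final_utterance,            # distractor
]

batch = tokenizer(candidates, padding=True, truncation=True, return_tensors="pt")
cls = encoder(**batch).last_hidden_state[:, 0]      # [num_candidates, hidden]
logits = score_head(cls).squeeze(-1).unsqueeze(0)   # [1, num_candidates]
target = torch.zeros(1, dtype=torch.long)           # the coherent path is index 0
loss = nn.functional.cross_entropy(logits, target)
loss.backward()  # a real run would loop over batches and step an optimizer
```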
|
632 |
Improving Eligibility Prescreening for Alzheimer’s Disease and Related Dementias Clinical Trials with Natural Language Processing. Idnay, Betina Ross Saldua, January 2022 (has links)
Alzheimer’s disease and related dementias (ADRD) are among the leading causes of disability and mortality among the older population worldwide and a costly public health issue, yet there is still no treatment for prevention or cure. Clinical trials are available, but successful recruitment has been a longstanding challenge. One strategy to improve recruitment is conducting eligibility prescreening, a resource-intensive process where clinical research staff manually go through electronic health records to identify potentially eligible patients. Natural language processing (NLP), an informatics approach used to extract relevant data from various structured and unstructured data types, may improve eligibility prescreening for ADRD clinical trials.
Guided by the Fit between Individuals, Task, and Technology framework, this dissertation research aims to optimize eligibility prescreening for ADRD clinical research by evaluating the sociotechnical factors influencing the adoption of NLP-driven tools. A systematic review of the literature was conducted to identify NLP systems that have been used for eligibility prescreening in clinical research. Following this, three NLP-driven tools were evaluated in ADRD clinical research eligibility prescreening: Criteria2Query, i2b2, and Leaf. We conducted an iterative mixed-methods usability evaluation with twenty clinical research staff using a cognitive walkthrough with a think-aloud protocol, the Post-Study System Usability Questionnaire, and a directed deductive content analysis. Moreover, we conducted a cognitive task analysis with sixty clinical research staff to assess the impact of cognitive complexity on the usability of NLP systems and to identify the sociotechnical gaps and cognitive support needed in using NLP systems for ADRD clinical research eligibility prescreening.
The results show that understanding the role of NLP systems in improving eligibility prescreening is critical to the advancement of clinical research recruitment. All three systems are generally usable and accepted by a group of clinical research staff. The cognitive walkthrough and think-aloud protocol informed iterative system refinement, resulting in high system usability. Cognitive complexity has no significant effect on system usability; however, the system, order of evaluation, job position, and computer literacy are associated with system usability. Key recommendations for system development and implementation include improving system intuitiveness and overall user experience through comprehensive consideration of user needs and task completion requirements, and implementing focused training on database querying to improve clinical research staff’s aptitude in eligibility prescreening and advance workforce competency.
Finally, this study contributes to our understanding of the conduct of electronic eligibility prescreening for ADRD clinical research by clinical research staff. Findings from this study highlighted the importance of leveraging human-computer collaboration in conducting eligibility prescreening using NLP-driven tools, which provide an opportunity to identify and enroll participants of diverse backgrounds who are eligible for ADRD clinical research and accelerate treatment development.
|
633 |
Syntax-based Concept Extraction For Question Answering. Glinos, Demetrios, 01 January 2006 (has links)
Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents. After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
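The WordNet lexical relations mentioned in this abstract can be queried directly with NLTK; the snippet below only illustrates that lookup step under that assumption and is not a reconstruction of SEMEX itself.

```python
# Illustration: expanding a query term with the WordNet synonym, hypernym,
# hyponym, and antonym relations the abstract says SEMEX leveraged.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand_term(word):
    synonyms, hypernyms, hyponyms, antonyms = set(), set(), set(), set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            antonyms.update(a.name() for a in lemma.antonyms())
        hypernyms.update(l.name() for h in synset.hypernyms() for l in h.lemmas())
        hyponyms.update(l.name() for h in synset.hyponyms() for l in h.lemmas())
    return {"synonyms": synonyms, "hypernyms": hypernyms,
            "hyponyms": hyponyms, "antonyms": antonyms}

# Candidate substitutions when matching a question pattern against stored propositions.
print(expand_term("invade"))
```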
|
634 |
The Hermeneutics Of The Hard Drive: Using Narratology, Natural Language Processing, And Knowledge Management To Improve The Effectiveness Of The Digital Forensic Process. Pollitt, Mark, 01 January 2013 (has links)
In order to protect the safety of our citizens and to ensure a civil society, we ask our law enforcement, judiciary and intelligence agencies, under the rule of law, to seek probative information which can be acted upon for the common good. This information may be used in court to prosecute criminals or it can be used to conduct offensive or defensive operations to protect our national security. As the citizens of the world store more and more information in digital form, and as they live an ever-greater portion of their lives online, law enforcement, the judiciary and the Intelligence Community will continue to struggle with finding, extracting and understanding the data stored on computers. But this trend affords greater opportunity for law enforcement. This dissertation describes how several disparate approaches (knowledge management, content analysis, narratology, and natural language processing) can be combined in an interdisciplinary way to positively impact the growing difficulty of developing useful, actionable intelligence from the ever-increasing corpus of digital evidence. After exploring how these techniques might apply to the digital forensic process, I will suggest two new theoretical constructs, the Hermeneutic Theory of Digital Forensics and the Narrative Theory of Digital Forensics, linking existing theories of forensic science, knowledge management, content analysis, narratology, and natural language processing together in order to identify and extract narratives from digital evidence. An experimental approach will be described and prototyped. The results of these experiments demonstrate the potential of natural language processing techniques for digital forensics.
|
635 |
Enhanced Content-Based Fake News Detection Methods with Context-Labeled News Sources. Arnfield, Duncan, 01 December 2023 (has links) (PDF)
This work examined the relative effectiveness of multilayer perceptron, random forest, and multinomial naïve Bayes classifiers, trained using bag-of-words and term frequency-inverse document frequency (TF-IDF) transformations of documents in the Fake News Corpus and the Fake and Real News Dataset. The goal of this work was to help meet the formidable challenges posed by the proliferation of fake news in society, including the erosion of public trust, disruption of social harmony, and endangerment of lives. The training included the use of context-categorized fake news in an effort to enhance the tools’ effectiveness. It was found that TF-IDF provided more accurate results than bag of words across all evaluation metrics for identifying fake news instances, and that the Fake News Corpus yielded much higher result metrics than the Fake and Real News Dataset. In comparison to state-of-the-art methods, the models performed as expected.
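A minimal sketch of the bag-of-words versus TF-IDF comparison described above, written with scikit-learn; the toy documents and labels stand in for the actual corpora, and the naïve Bayes pipeline mirrors only one of the reported configurations.

```python
# Sketch: compare bag-of-words and TF-IDF features with multinomial naive
# Bayes. Toy documents and labels stand in for the fake news corpora.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

docs = ["shocking cure doctors hate", "senate passes budget bill",
        "aliens endorse candidate", "court upholds ruling on appeal"] * 25
labels = [1, 0, 1, 0] * 25  # 1 = fake, 0 = real (illustrative)

for name, vectorizer in [("bag of words", CountVectorizer()),
                         ("tf-idf", TfidfVectorizer())]:
    model = make_pipeline(vectorizer, MultinomialNB())
    scores = cross_val_score(model, docs, labels, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```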
|
636 |
TSPOONS: Tracking Salience Profiles of Online News Stories. Paterson, Kimberly Laurel, 01 June 2014 (links) (PDF)
News space is a relatively nebulous term that describes the general discourse concerning events that affect the populace. Past research has focused on qualitatively analyzing news space in an attempt to answer big questions about how the populace relates to the news and how they respond to it. We want to ask: when do stories begin? What stories stand out among the noise? In order to answer the big questions about news space, we need to track the course of individual stories in the news. By analyzing the specific articles that comprise stories, we can synthesize the information gained from several stories to see a more complete picture of the discourse. The individual articles, the groups of articles that become stories, and the overall themes that connect stories together all complete the narrative about what is happening in society.
TSPOONS provides a framework for analyzing news stories and answering two main questions: what were the important stories during a given time frame, and what were the important stories involving a given topic? Drawing technical news stories from Techmeme.com, TSPOONS generates profiles of each news story, quantitatively measuring the importance, or salience, of news stories as well as quantifying the impact of these stories over time.
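As one illustration of the kind of salience profile described above, the sketch below assumes salience can be approximated by daily article counts per story; the actual measure used by TSPOONS may differ, and the story identifiers and dates are invented.

```python
# Illustration only: approximate a story's salience profile as the number of
# articles published about it per day. Data are invented placeholders.
from collections import Counter, defaultdict
from datetime import date

articles = [  # (story_id, publication_date) pairs, stand-ins for clustered articles
    ("apple-event", date(2014, 6, 2)), ("apple-event", date(2014, 6, 2)),
    ("apple-event", date(2014, 6, 3)), ("net-neutrality", date(2014, 6, 2)),
    ("net-neutrality", date(2014, 6, 4)), ("net-neutrality", date(2014, 6, 4)),
]

profiles = defaultdict(Counter)
for story, day in articles:
    profiles[story][day] += 1

for story, counts in profiles.items():
    total = sum(counts.values())
    print(story, "total salience:", total, "by day:", dict(sorted(counts.items())))
```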
|
637 |
Predicting Music Genre Preferences Based on Online Comments. Sinclair, Andrew J, 01 June 2014 (has links) (PDF)
Communication Accommodation Theory (CAT) states that individuals adapt to each other’s communicative behaviors. This adaptation is called “convergence.” In this work we explore the convergence of writing styles of users of the online music distribution platform SoundCloud.com. In order to evaluate our system we created a corpus of over 38,000 comments retrieved from SoundCloud in April 2014. The corpus represents comments from 8 distinct musical genres: Classical, Electronic, Hip Hop, Jazz, Country, Metal, Folk, and World. Our corpus contains short comments, frequent misspellings, little sentence structure, hashtags, emoticons, and URLs. We adapt techniques used by researchers analyzing other short web-text corpora in order to deal with these problems. We use a supervised machine learning approach to classify the genre of comments in our corpus. We examine the effects of different feature sets and supervised machine learning algorithms on classification accuracy. In total we ran 180 experiments in which we varied the number of genres, feature set composition, and machine learning algorithm. In experiments with all 8 genres we achieve up to 40% accuracy using either a Naive Bayes classifier or a C4.5-based classifier with a feature set consisting of 1262 token unigrams and bigrams. This represents a threefold improvement over chance levels.
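A rough sketch of the classification setup described above, using scikit-learn in place of the thesis's exact toolchain; the comments and genre labels are invented placeholders, and the unigram-plus-bigram feature cap only mirrors the description in the abstract.

```python
# Sketch: genre classification of short comments from token unigrams and
# bigrams with naive Bayes. Comments and labels are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = ["sick drop #bass", "beautiful adagio", "fire verse fr", "love that banjo line"]
genres = ["Electronic", "Classical", "Hip Hop", "Folk"]

model = make_pipeline(
    # unigrams + bigrams, capped to echo the 1262-feature set in the abstract
    CountVectorizer(ngram_range=(1, 2), max_features=1262),
    MultinomialNB(),
)
model.fit(comments, genres)
print(model.predict(["that cello solo was gorgeous"]))
```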
|
638 |
Improving Relation Extraction from Unstructured Genealogical Texts Using Fine-Tuned Transformers. Parrolivelli, Carloangello, 01 June 2022 (has links) (PDF)
Though exploring one’s family lineage through genealogical family trees can be insightful for developing one’s identity, this knowledge is typically held behind closed doors by private companies or requires expensive technologies, such as DNA testing, to uncover. With the ever-growing volume of data on the world wide web, many unstructured text documents, both old and new, are being discovered, written, and processed which contain rich genealogical information. Access to this immense amount of data, however, entails a costly process whereby people, typically volunteers, have to read large amounts of text to find relationships between people. This delays making genealogical information open and accessible to all.
This thesis explores state-of-the-art methods for relation extraction across the genealogical and biomedical domains and bridges new and old research by proposing an updated three-tier system for parsing unstructured documents. This system makes use of recently developed and massively pretrained transformers and fine-tuning techniques to take advantage of these deep neural models’ inherent understanding of English syntax and semantics for classification.
With only a fraction of the labeled data typically needed to train large models, fine-tuning a LUKE relation classification model with minimal added features can identify genealogical relationships with macro precision, recall, and F1 scores of 0.880, 0.867, and 0.871, respectively, in data sets with scarce (∼10%) positive relations. Furthermore, with the advent of a modern coreference resolution system utilizing SpanBERT embeddings and a modern named entity parser, our end-to-end pipeline can extract and correctly classify relationships within unstructured documents with macro precision, recall, and F1 scores of 0.794, 0.616, and 0.676, respectively. This thesis also evaluates individual components of the system and discusses future improvements to be made.
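A hedged sketch of running a LUKE relation classifier with the Hugging Face transformers library; the checkpoint shown is the publicly released TACRED-tuned model rather than the genealogical model trained in the thesis, and the sentence and entity mentions are invented.

```python
# Sketch: classify the relation between two entity mentions with LUKE.
# The TACRED checkpoint and example text are stand-ins, not the thesis model.
import torch
from transformers import LukeTokenizer, LukeForEntityPairClassification

name = "studio-ousia/luke-large-finetuned-tacred"
tokenizer = LukeTokenizer.from_pretrained(name)
model = LukeForEntityPairClassification.from_pretrained(name)

text = "Mary Alice Brown was born in 1882 to John Brown and Elizabeth Brown of Salem."
head, tail = "Mary Alice Brown", "John Brown"
entity_spans = [(text.index(head), text.index(head) + len(head)),
                (text.index(tail), text.index(tail) + len(tail))]

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted = model.config.id2label[logits.argmax(-1).item()]
print(predicted)  # a TACRED-style label; a genealogical model would map this to a family relation
```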
|
639 |
Design Extractor: A ML-based Tool for Capturing Software Design Decisions. Söderström, Petrus, January 2023 (has links)
Context: A software project’s success, involving a larger group of individuals, relies on efficient team communication. Part of efficient communication is avoiding miscommunication, misunderstandings, and losing knowledge. These consequences of poor communication can lead to negative repercussions such as loss of time, money, and customer approval. Much effort has been put into creating tools and systems to aid software engineers in retaining knowledge and decisions made during meetings, but many existing solutions require additional manual intervention on the part of software meeting participants. The objective of this thesis is to explore and develop a tool called Design Extractor (DE) which creates concise summaries of design meetings from recorded voice conversations. These summaries include both the design decisions made during a meeting as well as the rationale behind them. This thesis used readily available Python frameworks for machine learning to train two transformer models based on DistilBert and Google’s BERT. Fine-tuning these models with data sourced from six different software design meetings found that the best base model was DistilBert, which resulted in a fine-tuned model reporting an F1 score of 82.63%. This study created a simple Python tool, built upon many publicly available Python frameworks and the fine-tuned transformer model, that takes in voice recordings and outputs sentence-label pairs that can be used to quickly notate a design meeting. Short summaries are also provided by the tool through the use of pre-existing text summarisation machine learning models such as BART. Design Extractor therefore provides a simple, quick way to review longer meeting recordings in the context of software engineering decisions.
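A rough sketch of the two model components the thesis combines, written with the Hugging Face transformers library; the checkpoints are off-the-shelf stand-ins for the thesis's fine-tuned DistilBert model, and the labels and meeting sentences are invented.

```python
# Sketch: (1) classify meeting sentences (e.g. design decision vs. rationale vs.
# other) with a DistilBERT classifier, (2) summarise the transcript with BART.
# Checkpoints, labels, and example sentences are placeholders.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased",  # stand-in; the thesis fine-tunes its own checkpoint
)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = [
    "We will store session data in Redis instead of the relational database.",
    "Mostly because lookups need to stay under five milliseconds.",
    "Anyway, who is taking notes today?",
]

for sentence in transcript:
    print(sentence, "->", classifier(sentence)[0]["label"])

print(summarizer(" ".join(transcript), max_length=40, min_length=10)[0]["summary_text"])
```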
|
640 |
PROMPT-ASSISTED RELATION FUSION IN KNOWLEDGE GRAPH ACQUISITION. Xiaonan Jing (14230196), 08 December 2022 (has links)
Knowledge Base (KB) systems have been studied for decades. Various approaches have been explored in acquiring accurate and scalable KBs. Recently, many studies focus on Knowledge Graphs (KGs), which use a simple triple representation. A triple consists of a head entity, a predicate, and a tail entity. The head entity and the tail entity are connected by the predicate, which indicates a certain relation between them. Three main research fields can be identified in KG acquisition. First, relation extraction aims at extracting triples from raw data. Second, entity linking addresses mapping mentions of the same entity together. Last, knowledge fusion integrates heterogeneous sources into one. This dissertation focuses on relation fusion, which is a sub-process of knowledge fusion. More specifically, this dissertation investigates whether the currently popular prompt-based learning method can assist with relation fusion. A framework to acquire a KG is proposed and applied to a real-world dataset. The framework contains a Preprocessing module, which annotates raw sentences and links known entities to the triples; a Prompting module, which generates and processes prompts for prediction with Pretrained Language Models (PLMs); and a Relation Fusion module, which creates predicate representations, clusters embeddings, and derives cluster labels. A series of experiments with comparison prompting groups are conducted. The results indicate that prompt-based learning, if applied appropriately, can help with grouping similar predicates. The framework proposed in this dissertation can be used effectively to assist human experts with the creation of relation types during knowledge acquisition.
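A small sketch of the prompting-plus-clustering idea described above: a cloze-style prompt run through a masked language model yields a representation for each predicate, and similar predicates are grouped by clustering those representations. The model choice, prompt template, and predicate list are illustrative assumptions, not the dissertation's exact framework.

```python
# Sketch: embed predicates through a cloze-style prompt with a masked LM,
# then cluster the [MASK]-position embeddings to group similar predicates.
# Prompt template, model, and predicates are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

predicates = ["founded", "established", "was born in", "birthplace of", "created"]
prompts = [f"The head entity {p} the tail entity, i.e. the relation is [MASK]." for p in predicates]

embeddings = []
with torch.no_grad():
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state[0]
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
        embeddings.append(hidden[mask_pos].numpy())

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for predicate, cluster in zip(predicates, clusters):
    print(cluster, predicate)  # predicates sharing a cluster are candidates for fusion
```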
|