
Leveraging Transformer Models and Elasticsearch to Help Prevent and Manage Diabetes through EFT Cues

Shah, Aditya Ashishkumar 16 June 2023 (has links)
Diabetes in humans is a long-term (chronic) illness that affects how the body converts food into energy. Approximately one in ten individuals residing in the United States is affected by diabetes, and more than 90% of those have type 2 diabetes (T2D). In type 1 diabetes, the body fails to produce insulin, so patients must take insulin to survive. With type 2 diabetes, however, the body cannot use insulin well. A proven way to manage diabetes is through a positive mindset and a healthy lifestyle. Several studies have been conducted at Virginia Tech and the University of Buffalo on discovering different helpful characteristics in a person's day-to-day life, which relate to important events. They consider Episodic Future Thinking (EFT), where participants identify several events/actions that might occur at multiple future time frames (1 month to 10 years) in text-based descriptions (cues). This research aims to detect content characteristics from these EFT cues. However, class imbalance often presents a challenging issue when dealing with such domain-specific data. To mitigate this issue, this research employs Elasticsearch to address data imbalance and enhance the machine learning (ML) pipeline for improved prediction accuracy. By leveraging Elasticsearch and transformer models, this study constructs classifiers and regression models that can be used to identify various content characteristics from the cues. To the best of our knowledge, this work represents the first such attempt to employ natural language processing (NLP) techniques to analyze EFT cues and establish a correlation between those characteristics and their impacts on decision-making and health outcomes. / Master of Science / Diabetes is a serious and long-term illness that impacts how the body converts food into energy. It affects around one in ten individuals residing in the United States, and over 90% of these individuals have type 2 diabetes (T2D).
While a positive attitude and healthy lifestyle can help with management of diabetes, it is unclear exactly which mental attitudes most affect health outcomes. To gain a better understanding of this relationship, researchers from Virginia Tech and the University of Buffalo conducted multiple studies on Episodic Future Thinking (EFT), where participants identify several events or actions that could take place in the future. This research uses natural language processing (NLP) to analyze the descriptions of these events (cues) and identify different characteristics that relate to a person's day-to-day life. With the help of Elasticsearch and transformer models, this work handles the data imbalance and improves the model predictions for different categories within cues. Overall, this research has the potential to provide valuable insights into how such characteristics relate to a person's diabetes risk, potentially leading to better management and prevention strategies and treatments.
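This record pairs Elasticsearch-based handling of data imbalance with transformer classifiers. As a minimal, hypothetical illustration of the imbalance problem it targets (the labels below are invented, and this is not the thesis's actual method), the following sketch computes inverse-frequency class weights of the kind commonly passed to a weighted training loss:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: rare classes get larger weights,
    so a weighted loss pays more attention to minority examples."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # weight_c = total / (n_classes * count_c) -- the "balanced" heuristic
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Hypothetical EFT-cue labels, with "health" as the minority class.
labels = ["other"] * 90 + ["health"] * 10
weights = class_weights(labels)
```

With 90 majority and 10 minority examples, the minority class receives a weight of 5.0 versus roughly 0.56 for the majority, pushing the model to attend to rare cue categories.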

Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media

Gunturi, Uma Sushmitha 11 July 2023 (has links)
Experiences of interpersonal racism persist as a prevalent reality for BIPOC (Black, Indigenous, People of Color) in the United States. One form of racism that often goes unnoticed is racial microaggressions. These are subtle acts of racism that leave victims questioning the intent of the aggressor. The line of offense is often unclear, as these acts are disguised through humor or seemingly harmless intentions. In this study, we analyze the language used in online racial microaggressions ("Acts") and compare it to personal narratives recounting experiences of such aggressions ("Recalls") by Black social media users. We curated a corpus of acts and recalls from social media discussions on platforms like Reddit and Tumblr. Additionally, we collaborated with Black participants in a workshop to hand-annotate and verify the corpus. Using natural language processing techniques and qualitative analysis, we examine the language underlying acts and recalls of racial microaggressions. Our goal is to understand the lexical patterns that differentiate the two in the context of racism in the U.S. Our findings indicate that neural language models can accurately classify acts and recalls, revealing contextual words that associate Blacks with objects that perpetuate negative stereotypes. We also observe overlapping linguistic signatures between acts and recalls, serving different purposes, which have implications for current challenges in social media content moderation systems. / Master of Science / Racial Microaggressions are expressions of human biases that are subtly disguised. The differences in language and themes used in instances of Racial Microaggressions ("Acts") and the discussions addressing them ("Recalls") on online communities have made it difficult for researchers to automatically quantify and extract these differences. In this study, we introduce a tool that can effectively distinguish acts and recalls of microaggressions. 
We utilize Natural Language Processing techniques to classify and identify key distinctions in language usage and themes. Additionally, we employ qualitative methods and engage in workshop discussions with Black participants to interpret the classification results. Our findings reveal common linguistic patterns between acts and recalls that serve opposing purposes. Acts tend to stereotype and degrade Black people, while recalls portray victims' discomfort and seek validation for their experiences. These findings highlight why recalls are often considered toxic in online communities. This also represents an initial step towards creating a socio-technical system that safeguards the experiences of racial minority groups.

Role of Premises in Visual Question Answering

Mahendru, Aroma 12 June 2017 (has links)
In this work, we make a simple but important observation: questions about images often contain premises -- objects and relationships implied by the question -- and reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer based purely on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant question detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning. / Master of Science / There has been substantial recent work on the Visual Question Answering (VQA) problem, in which an automated agent is tasked with answering questions about images posed in natural language. In this work, we make a simple but important observation: questions about images often contain premises -- objects and relationships implied by the question -- and reasoning about premises can help VQA models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer based purely on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image).
We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant question detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.
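The premise test stated above -- a question is irrelevant if at least one premise is not depicted -- can be caricatured in a few lines. This sketch uses invented object annotations and is only an illustration of the definition, not of the QRPE construction itself:

```python
def question_relevant(premises, image_objects):
    """A visual question is irrelevant to an image if at least one
    of its premises is false, i.e. not depicted in the image."""
    return all(p in image_objects for p in premises)

# Hypothetical question: "What color is the dog's collar?"
# Its premises: a dog is present, and the dog has a collar.
premises = {"dog", "collar"}
image_objects = {"cat", "sofa", "lamp"}  # invented image annotations
relevant = question_relevant(premises, image_objects)
```

Here the question is flagged irrelevant because the "dog" premise is false for the image; a model that checks premises first can decline to answer instead of guessing from language biases.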

Computational models of coherence for open-domain dialogue

Cervone, Alessandra 08 October 2020 (has links)
Coherence is the quality that gives a text its conceptual unity, making a text a coordinated set of connected parts rather than a random group of sentences (turns, in the case of dialogue). Hence, coherence is an integral property of human communication, necessary for a meaningful discourse both in text and dialogue. As such, coherence can be regarded as a requirement for conversational agents, i.e. machines designed to converse with humans. Though recently there has been a proliferation in the usage and popularity of conversational agents, dialogue coherence is still a relatively neglected area of research, and coherence across multiple turns of a dialogue remains an open challenge for current conversational AI research. As conversational agents progress from being able to handle a single application domain to multiple ones, and eventually any domain (open-domain), the range of possible dialogue paths increases, and thus the problem of maintaining multi-turn coherence becomes especially critical. In this thesis, we investigate two aspects of coherence in dialogue and how they can be used to design modules for an open-domain coherent conversational agent. In particular, our approach focuses on modeling intentional and thematic information patterns of distribution as proxies for a coherent discourse in open-domain dialogue. For modeling intentional information we employ Dialogue Acts (DA) theory (Bunt, 2009), while for modeling thematic information we rely on open-domain entities (Barzilay and Lapata, 2008). We find that DAs and entities play a fundamental role in modeling dialogue coherence both independently and jointly, and that they can be used to model different components of an open-domain conversational agent architecture, such as Spoken Language Understanding, Dialogue Management, Natural Language Generation, and open-domain dialogue evaluation.
The main contributions of this thesis are: (I) we present an open-domain modular conversational agent architecture based on entity and DA structures designed for coherence and engagement; (II) we propose a methodology for training an open-domain DA tagger compliant with the ISO 24617-2 standard (Bunt et al., 2012) combining multiple resources; (III) we propose different models, and a corpus, for predicting open-domain dialogue coherence using DA and entity information trained with weakly supervised techniques, first at the conversation level and then at the turn level; (IV) we present supervised approaches for automatic evaluation of open-domain conversation exploiting DA and entity information, both at the conversation level and at the turn level; (V) we present experiments with Natural Language Generation models that generate text from Meaning Representation structures composed of DAs and slots for an open-domain setting.
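As a toy illustration of the entity side of this kind of coherence modeling (the thesis's actual models are far richer; the turns and entity sets below are invented), one can score a dialogue by how often adjacent turns share at least one entity:

```python
def entity_coherence(turns):
    """Fraction of adjacent turn pairs sharing at least one entity --
    a crude proxy for the thematic continuity described above."""
    if len(turns) < 2:
        return 1.0
    shared = sum(1 for a, b in zip(turns, turns[1:]) if set(a) & set(b))
    return shared / (len(turns) - 1)

# Invented dialogue: turns 1-2 stay on topic, turn 3 jumps away.
dialogue = [{"movie", "actor"}, {"actor", "award"}, {"weather"}]
score = entity_coherence(dialogue)
```

The abrupt topic shift in the last turn halves the score, which matches the intuition that thematic discontinuity hurts perceived coherence.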

Improving Access to ETD Elements Through Chapter Categorization and Summarization

Banerjee, Bipasha 07 August 2024 (has links)
The field of natural language processing and information retrieval has made remarkable progress since the 1980s. However, most of the theoretical investigation and applied experimentation is focused on short documents like web pages, journal articles, or papers in conference proceedings. Electronic Theses and Dissertations (ETDs) contain a wealth of information. These book-length documents describe research conducted in a variety of academic disciplines. While current digital library systems can be directly used to find a document of interest, they do not facilitate discovering what specific parts or segments are of particular interest. This research aims to improve access to ETD components by providing users with chapter-level classification labels and summaries to help easily find portions of interest. We explore the challenges such documents pose, especially when dealing with a highly specialized academic vocabulary. We use large language models (LLMs) and fine-tune pre-trained models for these downstream tasks. We also develop a method to connect the ETD discipline and department information to an ETD-centric classification system. To help guide the summarization model to create better chapter summaries, for each chapter we identify relevant sentences of the document abstract, plus the titles of cited references from the bibliography. We leverage human feedback that helps us evaluate models qualitatively on top of using traditional metrics. We provide users with chapter classification labels and summaries to improve access to ETD chapters. We generate the top three classification labels for each chapter that reflect the interdisciplinarity of the work in ETDs. Our evaluation shows that our ensemble methods yield summaries that are preferred by users. Our summaries also perform better than summaries generated by using a single method when evaluated on several metrics using an LLM-based evaluation methodology.
/ Doctor of Philosophy / Natural language processing (NLP) is a field in computer science that focuses on creating artificially intelligent models capable of processing text and audio similarly to humans. We make use of various NLP techniques, ranging from machine learning to language models, to provide users with a much more granular view of the information stored in Electronic Theses and Dissertations (ETDs). ETDs are documents submitted by students conducting research at the culmination of their degree. Such documents comprise research work in various academic disciplines and thus contain a wealth of information. This work aims to make the information stored in chapters of ETDs more accessible to readers through the addition of chapter-level classification labels and summaries. We generate the top three classification labels for each chapter that reflect the interdisciplinarity of the work in ETDs. Alongside human evaluation of automatically generated summaries, we use an LLM-based approach that scores summaries on several metrics. Our evaluation shows that our methods yield summaries that users prefer to summaries generated by using a single method.
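The abstract mentions guiding the summarizer with chapter-relevant sentences drawn from the document abstract. A crude stand-in for that selection step -- ranking abstract sentences by token overlap with a chapter, using invented text and not the thesis's actual method -- might look like:

```python
def relevant_sentences(abstract_sentences, chapter_text, k=2):
    """Rank abstract sentences by word overlap with a chapter's text;
    a simple proxy for the guidance signal described above."""
    chapter_tokens = set(chapter_text.lower().split())
    def overlap(sent):
        toks = set(sent.lower().split())
        return len(toks & chapter_tokens) / max(len(toks), 1)
    return sorted(abstract_sentences, key=overlap, reverse=True)[:k]

# Invented abstract sentences and chapter text.
abstract = ["We classify chapters with fine-tuned models.",
            "We also summarize chapters.",
            "Funding was provided by the library."]
chapter = "this chapter describes how we summarize chapters using extractive methods"
top = relevant_sentences(abstract, chapter, k=1)
```

The sentence about summarization ranks highest for a chapter about summarization, so it would be the one passed along to guide the summarizer.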

AN ANALYSIS ON SHORT-FORM TEXT AND DERIVED ENGAGEMENT

Ryan J Schwarz (19178926) 22 July 2024 (has links)
Short text has historically proven challenging to work with in many Natural Language Processing (NLP) applications. Traditional tasks such as authorship attribution benefit from having longer samples of work to derive features from. Even newer tasks, such as synthetic text detection, struggle to distinguish between authentic and synthetic text in the short-form. Due to the widespread usage of social media and the proliferation of freely available Large Language Models (LLMs), such as the GPT series from OpenAI and Bard from Google, there has been a deluge of short-form text on the internet in recent years. Short-form text has either become or remained a staple in several ubiquitous areas such as schoolwork, entertainment, social media, and academia. This thesis seeks to analyze this short text through the lens of NLP tasks such as synthetic text detection, LLM authorship attribution, derived engagement, and predicted engagement. The first focus explores the task of detection in the binary case of determining whether tweets are synthetically generated or not and proposes a novel feature extraction technique to improve classifier results. The second focus further explores the challenges presented by short-form text in determining authorship, a cavalcade of related difficulties, and presents a potential workaround to those issues. The final focus attempts to predict social media engagement based on the NLP representations of comments, and results in some new understanding of the social media environment and the multitude of additional factors required for engagement prediction.
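Feature extraction from short text, as in the first focus, typically starts from simple surface statistics before any learned representations. This sketch computes a few generic ones (these are standard features, not the thesis's novel technique):

```python
def short_text_features(text):
    """A few surface features often used on short-form text:
    length, mean word length, and type-token ratio (lexical diversity)."""
    words = text.split()
    if not words:
        return {"n_words": 0, "mean_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "n_words": len(words),
        "mean_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }

feats = short_text_features("the cat sat on the mat")
```

On very short samples such statistics are noisy -- exactly the difficulty the thesis highlights -- which is why detection and attribution on tweets remain hard.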

Automatic generation of natural language documentation from statecharts

Garibay, Ivan Ibarguen 01 April 2000 (has links)
No description available.

Linguistic Cues to Deception

Connell, Caroline 05 June 2012 (has links)
This study replicated a common experiment, the Desert Survival Problem, and attempted to add data to the body of knowledge for deception cues. Participants wrote truthful and deceptive essays arguing why items salvaged from the wreckage were useful for survival. Cues to deception considered here fit into four categories: those caused by a deceiver's negative emotion, verbal immediacy, those linked to a deceiver's attempt to appear truthful, and those resulting from a deceiver's high cognitive load. Cues caused by a deceiver's negative emotions were mostly absent in the results, although deceivers did use fewer first-person pronouns than truth tellers. That indicated deceivers were less willing to take ownership of their statements. Cues stemming from deceivers' attempts to appear truthful were present: deceivers used more words and more exact language than truth tellers. Deceivers' language was simpler than that of truth tellers, which indicated a higher cognitive load. Future research should include manipulation checks on motivation and emotion, which are tied to cue display. The type of cue displayed, be it emotional leakage, verbal immediacy, attempts to appear truthful, or cognitive load, might be associated with particular deception tasks. Future research, including meta-analyses, should attempt to determine which deception tasks produce which cue type. Revised file, GMc 5/28/2014 per Dean DePauw / Master of Arts
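One cue reported above, deceivers' lower use of first-person pronouns, is straightforward to operationalize. A minimal sketch (invented example sentences, simplistic tokenization; a real study would use a validated lexicon such as LIWC):

```python
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def first_person_rate(text):
    """Proportion of first-person pronouns in a text. The study found
    deceivers used fewer, suggesting reduced ownership of statements."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in FIRST_PERSON for w in words) / len(words)

# Invented essay fragments, truthful vs. deceptive in style.
truthful = "I think my best choice is the mirror because I can signal planes."
deceptive = "The mirror is the best choice because it signals planes."
```

The truthful fragment takes explicit ownership ("I", "my") while the deceptive one is impersonal, mirroring the pronoun-rate difference the study observed.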

Information Retrieval Models for Software Test Selection and Prioritization

Gådin, Oskar January 2024 (has links)
There are a lot of software systems currently in use for different applications. To make sure that these systems function, there is a need to properly test and maintain them. When a system grows in scope, it becomes more difficult to test and maintain, and so test selection and prioritization tools that incorporate artificial intelligence, information retrieval, and natural language processing are useful. In this thesis, different information retrieval models were implemented and evaluated using multiple datasets based on different filters and pre-processing methods. The data was provided by Westermo Network Technologies AB and represents one of their systems. The datasets contained data with information about the test results and what data was used for the tests. The results showed that for models that are not trained for this data, it is more beneficial to give them less data, related only to test failures. Allowing the models to have access to more data showed that they made connections that were inaccurate, as the data were unrelated. The results also showed that if a model is not adjusted to the data, a simple model could be more effective than a more advanced model. / There are many software systems currently in use for different services. To ensure that these systems function correctly, it is necessary to test and maintain them properly. When a system grows in scope, it becomes harder to test and maintain, and test selection and prioritization tools that integrate artificial intelligence, information retrieval, and natural language processing are therefore useful. In this report, different information retrieval models were evaluated using datasets based on different filters and pre-processing methods. The data was provided by Westermo Network Technologies AB and represents one of their systems. The datasets contained data with information about the test results and what data was used for the tests.
The results showed that for models not trained on this data, it is more beneficial to give them less data, related only to test failures. Giving the models access to more data showed that they made incorrect connections, since the data was unrelated. The results also showed that, when a model was not adjusted to the data, a simpler model could be more effective than a more advanced one.
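The abstract does not spell out which information retrieval models were used, but a generic illustration of IR-based test selection (hypothetical test names, minimal TF-IDF scoring rather than any model from the thesis) would rank past tests against a change description:

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents (e.g. past test descriptions) against a query
    (e.g. a change description) with a minimal TF-IDF dot product."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()               # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    def score(toks):
        tf = Counter(toks)
        return sum(tf[w] * math.log(n / df[w])
                   for w in query.lower().split() if w in tf)
    return sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)

# Invented test descriptions for a networked system.
tests = ["ethernet port link failure test",
         "dhcp lease renewal test",
         "ethernet cable unplug recovery"]
ranking = tfidf_rank("ethernet link failure", tests)
```

The highest-ranked indices would be run first, which is the essence of IR-driven test prioritization: spend the limited test budget on the tests most lexically related to the change.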

Predicting Personality from LinkedIn Profiles Using Natural Language Processing

Tavoosi, Saba 01 January 2024 (has links) (PDF)
LinkedIn profiles are increasingly serving as supplements to or substitutes for traditional resumes. Beyond the explicit information in LinkedIn profiles, research indicates that recruiters and hiring managers can infer additional applicant characteristics. The Big Five personality dimensions are particularly valuable for organizations to glean from these profiles due to their relationship with job performance. Although previous research has attempted to predict personality from LinkedIn, it has severe limitations, including limited practical utility due to relying on manual coding of profiles, inconsistent and largely non-significant findings, and a tendency to overlook the text data within profiles. This study addresses the first issue by developing an automated computer coding process, which significantly reduces profile coding time. The other limitations are tackled by drawing on the Realistic Accuracy Model and literature suggesting natural language contains personality cues to create a more comprehensive prediction model by incorporating the text data of LinkedIn profiles. Machine learning was used to analyze the profiles of 960 employees recruited through CloudResearch Connect and MTurk. Cross-validated and tested on out-of-sample data, the results indicate that all the Big Five personality dimensions can be validly and reliably predicted from LinkedIn profiles when text data is incorporated and analyzed through open-vocabulary approaches, but generally not when text is not included. Additionally, the built models result in fewer subgroup differences than a traditional self-report personality assessment. This research provides a more efficient and accurate approach to predicting personality from LinkedIn profiles. The implications and limitations of the developed approach are discussed.
