251

Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media

Gunturi, Uma Sushmitha 11 July 2023 (has links)
Experiences of interpersonal racism persist as a prevalent reality for BIPOC (Black, Indigenous, People of Color) in the United States. One form of racism that often goes unnoticed is the racial microaggression: a subtle act of racism that leaves victims questioning the intent of the aggressor. The line of offense is often unclear, as these acts are disguised through humor or seemingly harmless intentions. In this study, we analyze the language used in online racial microaggressions ("Acts") and compare it to personal narratives recounting experiences of such aggressions ("Recalls") by Black social media users. We curated a corpus of acts and recalls from social media discussions on platforms such as Reddit and Tumblr, and collaborated with Black participants in a workshop to hand-annotate and verify the corpus. Using natural language processing techniques and qualitative analysis, we examine the language underlying acts and recalls of racial microaggressions, aiming to understand the lexical patterns that differentiate the two in the context of racism in the U.S. Our findings indicate that neural language models can accurately classify acts and recalls, revealing contextual words that associate Black people with objects that perpetuate negative stereotypes. We also observe overlapping linguistic signatures between acts and recalls that nonetheless serve different purposes, a finding with implications for current challenges in social media content moderation systems. / Master of Science / Racial microaggressions are subtly disguised expressions of human bias. The differences in language and themes between instances of racial microaggressions ("Acts") and the online discussions addressing them ("Recalls") have been difficult for researchers to quantify and extract automatically. In this study, we introduce a tool that effectively distinguishes acts and recalls of microaggressions. We utilize natural language processing techniques to classify them and to identify key distinctions in language usage and themes, and we employ qualitative methods, engaging in workshop discussions with Black participants, to interpret the classification results. Our findings reveal common linguistic patterns between acts and recalls that serve opposing purposes: acts tend to stereotype and degrade Black people, while recalls convey victims' discomfort and seek validation for their experiences. These findings highlight why recalls are often considered toxic by online content moderation. This work also represents an initial step towards creating a socio-technical system that safeguards the experiences of racial minority groups.
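The abstract describes a binary text-classification task over a hand-annotated corpus. As a rough illustration of that setup only — the thesis itself uses neural language models, and the corpus below is replaced by placeholder strings — a hypothetical TF-IDF baseline might look like this:

```python
# Hypothetical baseline for the acts-vs-recalls distinction. The thesis
# uses neural language models; this TF-IDF sketch only illustrates the
# task setup, with placeholder strings where the curated corpus would be.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

texts = [
    "placeholder act example one", "placeholder act example two",
    "placeholder act example three",
    "placeholder recall narrative one", "placeholder recall narrative two",
    "placeholder recall narrative three",
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = act, 1 = recall

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=1 / 3, stratify=labels, random_state=0
)

# Word unigrams and bigrams stand in for the lexical patterns that
# differentiate the two classes.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```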
252

Computational models of coherence for open-domain dialogue

Cervone, Alessandra 08 October 2020 (has links)
Coherence is the quality that gives a text its conceptual unity, making it a coordinated set of connected parts rather than a random group of sentences (turns, in the case of dialogue). Coherence is thus an integral property of human communication, necessary for meaningful discourse in both text and dialogue, and it can be regarded as a requirement for conversational agents, i.e. machines designed to converse with humans. Despite the recent proliferation of conversational agents, dialogue coherence is still a relatively neglected area of research, and coherence across multiple turns of a dialogue remains an open challenge for current conversational AI research. As conversational agents progress from handling a single application domain to multiple ones, and eventually to any domain (open-domain), the range of possible dialogue paths increases, and the problem of maintaining multi-turn coherence becomes especially critical. In this thesis, we investigate two aspects of coherence in dialogue and how they can be used to design modules for an open-domain coherent conversational agent. In particular, our approach models the distribution patterns of intentional and thematic information as proxies for coherent discourse in open-domain dialogue. To model intentional information we employ Dialogue Act (DA) theory (Bunt, 2009); to model thematic information we rely on open-domain entities (Barzilay and Lapata, 2008). We find that DAs and entities play a fundamental role in modelling dialogue coherence, both independently and jointly, and that they can be used to model different components of an open-domain conversational agent architecture, such as Spoken Language Understanding, Dialogue Management, Natural Language Generation, and open-domain dialogue evaluation. The main contributions of this thesis are: (I) an open-domain modular conversational agent architecture based on entity and DA structures, designed for coherence and engagement; (II) a methodology for training an open-domain DA tagger compliant with the ISO 24617-2 standard (Bunt et al., 2012) by combining multiple resources; (III) several models, and a corpus, for predicting open-domain dialogue coherence from DA and entity information, trained with weakly supervised techniques, first at the conversation level and then at the turn level; (IV) supervised approaches for the automatic evaluation of open-domain conversation exploiting DA and entity information, at both the conversation and turn levels; (V) experiments with Natural Language Generation models that generate text from Meaning Representation structures composed of DAs and slots in an open-domain setting.
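The entity-based side of this approach follows the entity-grid idea of Barzilay and Lapata (2008): track which entities appear in which turns and score how smoothly attention shifts between them. A minimal, self-contained sketch of such a grid over dialogue turns — with entities assumed to be pre-identified, which in practice would require an entity-extraction or coreference step — might look like:

```python
# Minimal entity-grid sketch over dialogue turns (Barzilay & Lapata style),
# assuming entities have already been extracted for each turn.
from collections import Counter
from itertools import pairwise  # Python 3.10+

turns = [
    {"weather", "paris"},   # turn 1: entities mentioned
    {"paris", "museum"},    # turn 2
    {"museum", "tickets"},  # turn 3
    {"pizza"},              # turn 4: abrupt topic shift
]

entities = set().union(*turns)

# Grid: for each entity, a binary column of presence across turns.
grid = {e: [e in turn for turn in turns] for e in entities}

# Count local transitions (present->present, present->absent, ...).
transitions = Counter()
for column in grid.values():
    for prev, curr in pairwise(column):
        transitions[(prev, curr)] += 1

# A crude coherence proxy: the fraction of entity mentions that are
# carried over into the next turn.
carried = transitions[(True, True)]
total = sum(n for (prev, _), n in transitions.items() if prev)
print(f"entity continuity: {carried}/{total} = {carried / total:.2f}")
```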
253

Linguistic Cues to Deception

Connell, Caroline 05 June 2012 (has links)
This study replicated a common experiment, the Desert Survival Problem, and attempted to add data to the body of knowledge on deception cues. Participants wrote truthful and deceptive essays arguing why items salvaged from the wreckage were useful for survival. The cues to deception considered here fit into four categories: those caused by a deceiver's negative emotions, those related to verbal immediacy, those linked to a deceiver's attempt to appear truthful, and those resulting from a deceiver's high cognitive load. Cues caused by negative emotions were mostly absent in the results, although deceivers did use fewer first-person pronouns than truth tellers, indicating that deceivers were less willing to take ownership of their statements. Cues stemming from deceivers' attempts to appear truthful were present: deceivers used more words and more exact language than truth tellers. Deceivers' language was also simpler than that of truth tellers, which indicated a higher cognitive load. Future research should include manipulation checks on motivation and emotion, which are tied to cue display. The type of cue displayed, be it emotional leakage, verbal immediacy, attempts to appear truthful, or cognitive load, might be associated with particular deception tasks; future research, including meta-analyses, should attempt to determine which deception tasks produce which cue type. / Master of Arts
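The cues named above (pronoun use, verbosity, lexical complexity) are surface features that are easy to extract from essay text. A small illustrative sketch of such feature extraction — not a reproduction of the study's actual coding scheme — could look like this:

```python
# Sketch of surface-level deception-cue features of the kind discussed
# above. Illustrative only; the study's actual coding scheme differs.
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}

def cue_features(essay: str) -> dict:
    words = re.findall(r"[a-z']+", essay.lower())
    n = len(words)
    return {
        "word_count": n,  # deceivers in this study used more words
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
        "avg_word_length": sum(map(len, words)) / n,  # crude complexity proxy
    }

truthful = "I chose the mirror because I knew we could signal planes with it."
deceptive = "The mirror reflects sunlight over considerable distances, ensuring rescue."
print(cue_features(truthful))
print(cue_features(deceptive))
```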
254

Information Retrieval Models for Software Test Selection and Prioritization

Gådin, Oskar January 2024 (has links)
Many software systems are currently in use across different applications, and keeping them working requires proper testing and maintenance. As a system grows in scope it becomes more difficult to test and maintain, so test selection and prioritization tools that incorporate artificial intelligence, information retrieval, and natural language processing are useful. In this thesis, different information retrieval models were implemented and evaluated on multiple datasets built with different filters and pre-processing methods. The data was provided by Westermo Network Technologies AB and represents one of their systems; the datasets contained information about test results and the data used in each test. The results showed that for models not trained on this data, it is more beneficial to provide less data, restricted to what relates to test failures: given access to more data, the models drew inaccurate connections between unrelated items. The results also showed that when a model is not adjusted to the data, a simple model can be more effective than a more advanced one.
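To make the retrieval framing concrete, here is a minimal TF-IDF sketch of test prioritization — hypothetical test names and descriptions, not Westermo's system or the thesis's exact models — that indexes test descriptions and ranks them against a new failure report:

```python
# Hypothetical TF-IDF sketch for test prioritization: rank test cases by
# textual similarity to a failure report. Not the thesis's exact setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tests = {
    "test_link_failover": "bring down primary ethernet link, verify failover",
    "test_dhcp_lease": "request dhcp lease, verify address assignment",
    "test_vlan_tagging": "send tagged frames, verify vlan separation",
}

failure_report = "switch lost connectivity after ethernet link went down"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(tests.values())
query = vectorizer.transform([failure_report])

# Higher cosine similarity = run this test earlier.
scores = cosine_similarity(query, doc_matrix).ravel()
for name, score in sorted(zip(tests, scores), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {name}")
```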
255

Automatic generation of natural language documentation from statecharts

Garibay, Ivan Ibarguen 01 April 2000 (has links)
No description available.
256

Event Centric Approaches in Natural Language Processing

Huang, Yin Jou 26 July 2021 (has links)
Kyoto University / Doctoral degree by coursework (new system) / Doctor of Informatics / Degree No. 甲第23438号 / Report No. 情博第768号 / Library call no. 新制||情||131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examiners: Prof. Sadao Kurohashi (chief), Prof. Tatsuya Kawahara, Prof. Takayuki Ito / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
257

Joke Recommender System Using Humor Theory

Soumya Agrawal (9183053) 29 July 2020 (has links)
Every individual's sense of humor varies greatly from one person to another, which makes learning any individual's humor preferences a challenge. Humor is much more than a source of entertainment; it is an essential tool that aids communication, and understanding humor preferences can lead to improved social interactions and bridge existing social or economic gaps.

In this study, we propose a methodology for building a joke recommendation system that analyzes a joke's text. Researchers have proposed different theories of humor depending on their area of focus; this exploratory study focuses mainly on Attardo and Raskin's (1991) General Theory of Verbal Humor and uses the knowledge resources it defines to annotate the jokes. These annotations capture the characteristics of the jokes and play an important role in determining how alike the jokes are, a similarity we capture computationally with Lin's similarity metric (Lin, 1998). The jokes are then clustered hierarchically based on their similarity values, and the clusters are used for recommendation. We also compare our joke recommendations to those produced by the Eigentaste algorithm (Goldberg, Roeder, Gupta, & Perkins, 2001), an existing joke recommendation system that does not consider the content of the joke.
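To make the clustering step concrete, here is a minimal sketch — with toy similarity values standing in for the Lin scores computed over GTVH knowledge-resource annotations — that converts a pairwise similarity matrix into distances and clusters jokes hierarchically with SciPy:

```python
# Sketch: hierarchical clustering of jokes from a pairwise similarity
# matrix. The similarity values here are toy numbers; in the study they
# would come from Lin's metric over GTVH knowledge-resource annotations.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

jokes = ["joke_a", "joke_b", "joke_c", "joke_d"]

similarity = np.array([
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
])

# Convert similarity to distance and condense it for SciPy's linkage.
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)
condensed = squareform(distance, checks=True)

tree = linkage(condensed, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 clusters

for joke, label in zip(jokes, labels):
    print(joke, "-> cluster", label)
```

Recommendations could then be drawn from the cluster containing jokes the user already rated highly.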
258

Giant Pigeon and Small Person: Prompting Visually Grounded Models about the Size of Objects

Yi Zhang (12438003) 22 April 2022 (has links)
Empowering machines to understand our physical world should go beyond models with only natural language and models with only vision. Vision-and-language is a growing field of study that attempts to bridge the gap between the natural language processing and computer vision communities by enabling models to learn visually grounded language. However, as an increasing number of pre-trained visual-linguistic models focus on the alignment between visual regions and natural language, it is difficult to claim that these models capture certain properties of objects, such as size, in their latent space. Inspired by recent trends in prompt learning, this study designs a prompt learning framework for two visual-linguistic models, ViLBERT and ViLT, and uses different manually crafted prompt templates to evaluate how consistently these models compare the size of objects. The results show that ViLT is more consistent in prediction accuracy for the given task across six pairs of objects under four prompt designs. However, the overall prediction accuracy falls short of expectations: even the better model in this study, ViLT, outperforms the proposed random-chance baseline in only 16 out of 24 cases. As this is a preliminary study exploring the potential of pre-trained visual-linguistic models for object size comparison, there are many directions for future work, such as investigating more models, choosing more object pairs, and trying different methods for feature engineering and prompt engineering.
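The probing idea can be illustrated with a text-only masked language model as a stand-in — the thesis probes the visually grounded ViLBERT and ViLT, and the template wording below is hypothetical: fill a comparison slot in a prompt template and compare the scores the model assigns to opposing answers.

```python
# Illustrative prompt probe using a text-only masked LM as a stand-in for
# the visually grounded models (ViLBERT/ViLT) probed in the thesis.
# Template wording and object pairs are hypothetical.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "a {a} is [MASK] than a {b}.",
    "compared to a {b}, a {a} is [MASK].",
]
pairs = [("pigeon", "person"), ("truck", "cup")]

for a, b in pairs:
    for template in templates:
        prompt = template.format(a=a, b=b)
        # Restrict scoring to the two opposing answers.
        results = fill(prompt, targets=["larger", "smaller"])
        best = max(results, key=lambda r: r["score"])
        print(f"{prompt!r} -> {best['token_str']} ({best['score']:.3f})")
```

Consistency can then be measured by checking whether the preferred answer stays the same across templates for each object pair.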
259

Comments as reviews: Predicting answer acceptance by measuring sentiment on Stack Exchange

William Chase Ledbetter IV (12261440) 16 June 2023 (has links)
Online communication has increased the need to rapidly interpret complex emotions due to the volatility of the data involved; machine learning tasks that process text, such as sentiment analysis, can help address this challenge by automatically classifying text as positive, negative, or neutral. While much research has focused on detecting offensive or toxic language online, there is also a need to explore and understand the ways in which people express positive emotions and support for one another in online communities. This is where sentiment dictionaries and other computational methods are useful: they can analyze the language used to express support and identify common patterns or themes.

This research compiled data from question-and-answer discussions about machine learning on Stack Exchange and constructed a classification model using binary logistic regression. The objective was to discover whether accepted solutions can be predicted accurately by treating the comments on answers as reviews. Measuring such collaboration signals may help capture the nuances of language around support and assistance, with implications for how people understand and respond to expressions of help online. By exploring this topic further, researchers can gain a more complete understanding of the ways in which people communicate and connect online.
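A minimal sketch of that pipeline — toy comment data, with VADER as one possible sentiment scorer rather than necessarily the one used in the thesis — could score each answer's comments and feed the result to a logistic regression:

```python
# Sketch: predict answer acceptance from comment sentiment. Toy data;
# VADER is an example scorer, not necessarily the thesis's choice.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.linear_model import LogisticRegression

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# (comments on an answer, was the answer marked as the solution?)
answers = [
    (["This worked perfectly, thanks!", "Great explanation."], 1),
    (["This throws an error for me.", "Did not help."], 0),
    (["Exactly what I needed."], 1),
    (["The link is dead and the code is wrong."], 0),
]

def features(comments):
    # Mean compound sentiment of the comments, treated like review scores.
    scores = [sia.polarity_scores(c)["compound"] for c in comments]
    return [sum(scores) / len(scores)]

X = [features(comments) for comments, _ in answers]
y = [accepted for _, accepted in answers]

model = LogisticRegression().fit(X, y)

new = features(["Very helpful, solved my issue."])
print(model.predict_proba([new]))  # [P(not accepted), P(accepted)]
```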
260

Harmful content online: Toxic language on TikTok

Wester, Linn, Stenvall, Elin January 2024 (has links)
Toxic language on the internet, often referred to in everyday terms as cyberbullying, includes insults, threats, and offensive language, and is particularly noticeable on social media. It is possible to detect toxic language on the internet with the help of machine learning, including Natural Language Processing (NLP) techniques that automatically recognize the typical characteristics of toxic language. Previous Swedish research has investigated the presence of toxic language on social media using machine learning, but there is still a lack of research on the increasingly popular platform TikTok. This study investigates the prevalence and characteristics of toxic comments on TikTok using both machine learning and manual methods, and is meant to provide a better understanding of what young people encounter in the comments on TikTok. The study applies a mixed method in a document survey of 69,895 comments. The machine learning model Hatescan was used to automatically classify the likelihood that toxic language appears in each comment; based on this probability, a sample of the comments was manually analyzed, leading to both quantitative and qualitative findings. The results showed that the prevalence of toxic language was relatively small, with 0.24% of the 69,895 comments considered toxic according to the combined automatic and manual assessment. The type of toxic language that occurred most often was obscene language, the majority of which contained swear words.
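The two-stage design (automatic probability scoring, then manual review of a probability-based sample) can be sketched as follows; the scoring function here is a placeholder standing in for Hatescan, whose interface is not shown in the abstract, so treat it as an assumption:

```python
# Sketch of the two-stage pipeline: automatic toxicity scoring followed by
# manual review of a probability-based sample. `score_toxicity` is a
# placeholder standing in for Hatescan, whose real interface is not public.
import random

def score_toxicity(comment: str) -> float:
    # Placeholder: a real model would return P(toxic) for the comment.
    flagged_words = {"dummy_swear_1", "dummy_swear_2"}
    words = comment.lower().split()
    return min(1.0, 5 * sum(w in flagged_words for w in words) / max(len(words), 1))

comments = ["nice video!", "you dummy_swear_1 idiot", "love this", "ok"]

scored = [(c, score_toxicity(c)) for c in comments]

# Stage 1: keep only comments the model considers likely toxic.
THRESHOLD = 0.5
candidates = [c for c, p in scored if p >= THRESHOLD]

# Stage 2: draw a sample of candidates for manual annotation.
random.seed(0)
sample = random.sample(candidates, k=min(2, len(candidates)))
print("flagged for manual review:", sample)
```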
