Global ETD Search

601	Automatic Detection of Section Title and Prose Text in HTML Documents Using Unsupervised and Supervised Learning
Mysore Gopinath, Abhijith Athreya. January 2018.
No description available.
Keywords: Computer Science; HTML Structure Analysis; Natural Language Processing; Topicality Detection in HTML; Machine Learning; Privacy Policies
602	QED: A Fact Verification and Evidence Support System
Luken, Jackson. 28 August 2019.
No description available.
Keywords: Computer Science; FEVER; Natural Language Processing; Fact Verification; Machine Learning
603	Unsupervised Interpretable Feature Extraction for Binary Executables using LIBCAISE
Greer, Jeremiah. 21 October 2019.
No description available.
Keywords: Computer Science; Natural Language Processing; Cybersecurity; Program Analysis; Software; Assembly; Unsupervised
604	Natural Language Processing, Statistical Inference, and American Foreign Policy
Lauretig, Adam M. 06 November 2019.
No description available.
Keywords: Political Science; Text as Data; Natural Language Processing; Variational Bayesian; Word Embeddings; Securitization
605	Using Sentence Embeddings for Word Sense Induction
Tallo, Philip T. January 2020.
No description available.
Keywords: Computer Science; Natural Language Processing; Artificial Intelligence; Word Sense Induction; Sentence Embedding; Polysemy; Word Sense Disambiguation
606	Classification of User Stories Using an NLP and Deep Learning Based Approach
Kandikari, Bhavesh. January 2023.
No description available.
Keywords: Natural Language Processing; Word Embedding; Requirement Refining; User Story; Deep Learning; Computer Sciences
607	Sentiment Analysis for Swedish: The Impact of Emojis on Sentiment Analysis of Swedish Informal Texts
Berggren, Lovisa. January 2023.
This study investigates the use of emojis in sentiment analysis for the Swedish language, with the objective of assessing whether emojis improve the performance of the model. Sentiment analysis is an NLP classification task aimed at extracting people's opinions, sentiments, and attitudes from language. Although sentiment analysis as a research area has made considerable progress recently, some challenges remain. This work considers two of them: the analysis of a non-English language and the impact of emojis. These areas were explored by creating a sentiment-annotated dataset of Swedish texts containing emojis and a Swedish sentiment analysis model for evaluation. The model, SweVADER, was based on the English lexicon-based model VADER. The best-performing SweVADER model achieved an accuracy of 0.53 and an F1-score of 0.47. The presence of emojis improved the analysis for most models, but only slightly. The results indicate that emojis can improve sentiment analysis, although other features also affected the results: the sentiment lexicon used plays a key role, and pre-processing techniques such as stemming could affect performance too. One takeaway from this study is that emojis carry important sentiment information and should not be disregarded. Emojis are also useful when analyzing texts in a language for which linguistic resources are scarce.
Keywords: sentiment analysis; NLP; natural language processing; emojis; Swedish; Computer Sciences; Engineering and Technology
608	Automatic evaluation of the effectiveness of communication between software developers: NLP/AI
Haapasaari Lindgren, Marcus; Persson, Jon. January 2023.
Communication is one of the most demanding and important parts of effective software development. The effectiveness of software development communication can be measured along three collaborative interpersonal problem-solving conversation dimensions: Active Discussion, Creative Conflict, and Conversation Management.
Previous work that used these dimensions to analyze communication relied on manually labeling the communication, a process that is time-consuming and not applicable to real-time use.
In this study, natural language processing and supervised machine learning were investigated for the automatic classification and measurement of collaborative interpersonal problem-solving conversation dimensions in transcribed software development communication. This approach enables the evaluation of communication and provides suggestions for improving software development efficiency.
To determine the optimal classification approach, this work examined nine different classifiers. Random Forest scored highest, followed by Decision Tree and SVM. When trained and tested with stratified 10-fold cross-validation, Random Forest achieved accuracy, precision, and recall of up to 93.66%, 93.76%, and 93.63%, respectively.
Keywords: Machine Learning; Natural Language Processing; Conversation; Supervised Learning; Software Engineering
609	Document Expansion for Swedish Information Retrieval Systems
Hagström, Tobias. January 2023.
Information retrieval systems have changed how users interact with computerized systems and locate information. A major challenge when designing these systems is handling the vocabulary mismatch problem, i.e. the tendency of users to formulate queries with different words than those present in the relevant documents that should be retrieved. With recent advances in artificial intelligence and the emergence of transformer-based language models, new methods have been proposed to alleviate this problem. One such method uses document expansion models, which append to each document words that are likely to appear in users' queries. As previous research on document expansion models has focused on English-language applications, this thesis investigates the effectiveness of one such model for Swedish applications. Although no improvement was found when using this method, the result is likely a consequence of dataset quality and domain rather than of the method itself.
Keywords: Information retrieval; Natural language processing; Deep learning; Language technology; Computer and Information Sciences
610	Analyzing Toxicity in YouTube Comments with the Help of Machine Learning
Dehkhoda, Sasan; Gunica, Jasmyn Ali. January 2023.
Toxic comments are likely to make people feel uncomfortable and leave a discussion, and are therefore potentially problematic. Toxic comments occur on various social media sites and, depending on the site, are detected manually, by machine learning algorithms, or both, and are removed depending on their severity and other factors. There is little research on toxic comments on Swedish YouTube channels, which means that content creators, especially new ones, will be unfamiliar with and unprepared for such comments. We aim to expand research in this area by determining not only what proportion of comments on Swedish YouTube channels is toxic, but also what types of toxic comments occur and which types are the most common. A survey of documents was chosen as the research strategy, combined with mixed methods: qualitative and quantitative data analysis, with more focus on the quantitative aspect. A random sample of 79,577 YouTube comments was collected, and the machine learning program Hatescan was used to generate a toxicity score for each comment. This allowed us to sort the comments by score and to sample them for manual analysis of the type of toxicity. The results show that 0.643% of the analyzed comments were toxic. Most toxic comments were directed toward someone in the video. Personal insults and comments about someone's intelligence or competence were by far the most common types of toxic comment.
Keywords: Toxic comments; YouTube; Machine Learning; Natural Language Processing; Computer Sciences
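Entry 607 describes SweVADER, a lexicon-based model adapted from VADER, and tests whether emojis add sentiment information. The Python below is a minimal sketch of that idea, assuming a hypothetical toy lexicon in which emojis are scored exactly like words. The valences of known tokens are summed and squashed into (-1, 1) with VADER's compound normalization x/√(x² + α), α = 15, and then labelled with VADER's commonly used ±0.05 thresholds. The lexicon values and the names `TOY_LEXICON`, `compound`, and `label` are invented for illustration and are not SweVADER's. VADER's negation, intensifier, capitalization, and punctuation heuristics are omitted.

```python
import math

# HYPOTHETICAL toy lexicon: valence scores for a few Swedish words and emojis.
# The values are invented for illustration; they are not SweVADER's or VADER's.
TOY_LEXICON = {
    "bra": 1.9,     # "good"
    "dålig": -2.5,  # "bad"
    "älskar": 3.0,  # "love"
    "😀": 2.0,
    "😡": -2.5,
}
EMOJI_TOKENS = {"😀", "😡"}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)


def compound(text: str, use_emojis: bool = True) -> float:
    """Sum lexicon valences of whitespace-separated tokens and normalize the total.

    Assumes emojis are separated from words by spaces; VADER's negation,
    intensifier, capitalization and punctuation rules are omitted.
    """
    total = 0.0
    for token in text.lower().split():
        token = token.strip(".,!?")
        if not use_emojis and token in EMOJI_TOKENS:
            continue  # ablation: ignore emojis entirely
        total += TOY_LEXICON.get(token, 0.0)  # unknown tokens contribute nothing
    return normalize(total)


def label(text: str, use_emojis: bool = True) -> str:
    """Map the compound score to a class using VADER's usual +/-0.05 thresholds."""
    score = compound(text, use_emojis)
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


print(label("Filmen var bra 😀"))                 # "The film was good"  -> positive
print(label("Jag är arg 😡"))                     # "I am angry"         -> negative
print(label("Jag är arg 😡", use_emojis=False))   # emoji ignored        -> neutral
```

Ignoring the emoji in "Jag är arg 😡" turns a negative prediction into a neutral one, because the toy lexicon has no entry for "arg". In miniature, this mirrors the abstract's point that emojis are useful when a language lacks linguistic resources.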
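Entry 608 reports scores from stratified 10-fold cross-validation. The stdlib-only Python below is a minimal sketch of what that procedure involves. It splits labelled examples into ten folds that keep each class's proportion, then averages accuracy and macro-averaged precision and recall over the folds. Several details are assumptions for illustration: the macro averaging, the function names, the `majority_baseline` placeholder classifier, and the toy data. The thesis does not say how its scores were averaged. Its best model, Random Forest, is not implemented here. In practice, scikit-learn's `StratifiedKFold` and `cross_validate` provide this machinery.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[str], k: int = 10, seed: int = 0
                      ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so each test fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        idxs = by_class[cls]
        rng.shuffle(idxs)
        # Deal this class's examples round-robin, continuing where the previous class stopped.
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(fold)


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Return accuracy plus macro-averaged precision and recall (assumed averaging scheme)."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        actual = sum(t == c for t, _ in pairs)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(classes), sum(recalls) / len(classes)


FitPredict = Callable[[list, list, list], list]


def cross_validate(X: Sequence, y: Sequence[str], fit_predict: FitPredict,
                   k: int = 10, seed: int = 0) -> Tuple[float, float, float]:
    """Average the fold scores of any fit_predict(X_train, y_train, X_test) function."""
    totals = [0.0, 0.0, 0.0]
    for train, test in stratified_k_fold(y, k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        fold_scores = macro_scores([y[i] for i in test], preds)
        totals = [t + s for t, s in zip(totals, fold_scores)]
    return totals[0] / k, totals[1] / k, totals[2] / k


def majority_baseline(X_train: list, y_train: list, X_test: list) -> list:
    """Placeholder classifier (NOT Random Forest): predicts the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


# Hypothetical toy data using the three conversation dimensions as class labels.
labels = (["Active Discussion"] * 50 + ["Creative Conflict"] * 30
          + ["Conversation Management"] * 20)
features = list(range(len(labels)))  # stand-in features; the baseline ignores them
print(cross_validate(features, labels, majority_baseline))
```

With this toy data, every test fold holds 5, 3, and 2 examples of the three classes. The majority baseline therefore reaches an accuracy of exactly 0.5 in each fold, which is a useful floor when comparing real classifiers.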
