Global ETD Search

1	Leverage Fusion of Sentiment Features and Bert-based Approach to Improve Hate Speech Detection Cheng, Kai Hsiang 23 June 2022 Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is a kind of harmful content in which people use abusive language to attack or promote hatred towards a specific group or individual. With online hate speech on the rise these days, researchers have explored ways to recognize it automatically. Among the approaches studied, the BERT-based approach is promising and dominated SemEval-2019 Task 6, a hate speech detection competition. In this work, a method that fuses sentiment features with a BERT-based approach is proposed. The classic BERT architecture for hate speech detection is modified to fuse additional sentiment features, provided by an extractor pre-trained on Sentiment140. The proposed model is compared with the top-3 models in SemEval-2019 Task 6 Subtask A and achieves an 83.1% F1 score, which is better than the models in the competition. Also, to see whether additional sentiment features benefit the detection of hate speech, the features are fused with three kinds of deep learning architectures. The results show that the models with sentiment features perform better than those without sentiment features. / Master of Science / Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is a kind of harmful content in which people use abusive language to attack or promote hatred towards a specific group or individual. 
With online hate speech on the rise these days, people have explored ways to recognize hate speech automatically, and among the approaches studied, BERT is a promising one. BERT is a deep learning model for natural language processing (NLP) that originated from the Transformer, developed by Google in 2017. BERT has been applied to many NLP tasks, such as text classification, semantic similarity between pairs of sentences, question answering over a given paragraph, and text summarization, and has achieved astonishing results. In this study, BERT is therefore adopted to learn the meaning of a given text and automatically distinguish hate speech from large numbers of tweets. To help BERT better capture hate speech, the approach in this work modifies BERT to take an additional source of sentiment-related features for learning the patterns of hate speech, on the premise that the emotion is negative when people post abusive speech. For evaluation, our model is compared against those in SemEval-2019 Task 6, a well-known hate speech detection competition, and the results show that the proposed model achieves an 83.1% F1 score, better than the models in the competition. Also, to see whether additional sentiment features benefit the detection of hate speech, the features are fused with three different kinds of deep learning architectures, and the results show that the models with sentiment features perform better than those without sentiment features. Keywords: hate speech detection; sentiment features; BERT
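The idea of feeding a classifier both a text representation and sentiment features can be illustrated with a minimal sketch. The abstract does not say how the fusion is performed, so fusion by concatenation is an assumption here, and the function names, vector sizes, and weights are all hypothetical stand-ins rather than the thesis's model.

```python
import math


def fuse(text_embedding, sentiment_features):
    """Fuse by concatenation (an assumption, not the thesis's stated method)."""
    return list(text_embedding) + list(sentiment_features)


def predict_proba(fused, weights, bias):
    """Logistic classifier over the fused vector; returns P(hate speech)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    z = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins: a real BERT-base sentence embedding has 768 dimensions.
text_embedding = [0.2, -0.1, 0.4]
# Hypothetical sentiment extractor output, e.g. [negative, positive] scores.
sentiment_features = [0.9, 0.1]
# Made-up weights; in practice these would be learned during training.
weights = [0.5, -0.3, 0.8, 1.2, -1.0]

fused = fuse(text_embedding, sentiment_features)
p = predict_proba(fused, weights, bias=-0.5)
print(f"fused length = {len(fused)}, P(hate) = {p:.3f}")
```

In a trained model the classifier weights over the sentiment dimensions would be learned jointly with the rest of the network; the sketch only shows where the extra features enter.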
2	Can Hatescan Detect Antisemitic Hate Speech Nyrén, Olle January 2023 This thesis examines how well Hatescan, a hate speech detector built on the same natural language processing and AI algorithms used in most online hate speech detectors, can detect different categories of antisemitism, and whether it is worse at detecting implicit antisemitism than explicit antisemitism. The ability of hate speech detectors to detect antisemitic hate speech is a pressing issue. Jews have persevered through unparalleled historical oppression, and antisemitism remains very much alive online, posing a direct threat not only to individual Jews (since there is a clear link between antisemitic expressions and antisemitic violence) but also to the idea of liberal democracy itself. This thesis evaluated the efficacy of Hatescan in detecting antisemitism expressed in Swedish and assessed whether it was better or worse at detecting explicit antisemitism than implicit antisemitism. The research questions posed were: 1. How well does Hatescan detect antisemitism? 2. Is Hatescan equally efficient at detecting different categories of antisemitism? 3. Is Hatescan equally efficient at detecting implicit antisemitism and explicit antisemitism? To answer these questions, the thesis used experiment as the research strategy, documents as the data collection method, qualitative analysis (discourse analysis) for annotation, and quantitative analysis (descriptive statistics) for calculating performance metrics (precision, recall, F1-score, and accuracy). A dataset was created from three previously existing datasets containing hate speech expressed in Swedish on Reddit, Flashback, and Twitter. The data were collected using search terms presumed to appear in antisemitic content. 
The datasets were created by the supervisor of this thesis and her research team for use in previous studies. These datasets were combined into a single dataset (in a spreadsheet). Duplicates were deleted, and each remaining sentence was annotated for hatefulness, category of antisemitism, and explicit versus implicit antisemitism. Each sentence was manually run through Hatescan’s web interface, and the resulting output was recorded in the spreadsheet. Based on a threshold of 70% for the Hatescan output, the output for each sentence was labelled as a true positive, false positive, false negative, or true negative using IFS formulas in the spreadsheet. Precision, recall, and F1-score were calculated for the dataset as a whole, and accuracy rates were calculated for each category of antisemitism as well as for explicit and implicit antisemitism. Results showed that while performance metrics on the antisemitic dataset (precision 0.93, recall 0.85, F1-score 0.89) were similar to those reported in the development of Hatescan (precision 0.89, recall 0.94, F1-score 0.91), there were significant differences in accuracy between the annotated categories in the dataset (accuracy ranging from 27 percent to 92 percent). Keywords: Artificial intelligence; hate speech detection; hate speech; antisemitism; Computer Sciences; Datavetenskap (datalogi)
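The scoring procedure described in this abstract (thresholding the detector output at 70%, labelling each sentence as a true/false positive/negative, then computing precision, recall, F1-score, and accuracy) can be sketched roughly as follows. This is an illustration only, not the thesis's spreadsheet: the function names and sample rows are hypothetical, and treating a score exactly at the threshold as a detection is an assumption.

```python
from collections import Counter

THRESHOLD = 0.70  # the 70% cut-off named in the abstract


def outcome(score, is_hateful, threshold=THRESHOLD):
    """Label one sentence as TP, FP, FN, or TN.

    Assumption: a score at or above the threshold counts as a detection.
    """
    predicted = score >= threshold
    if predicted and is_hateful:
        return "TP"
    if predicted:
        return "FP"
    if is_hateful:
        return "FN"
    return "TN"


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def metrics(rows):
    """rows: iterable of (detector_score, annotated_as_hateful) pairs."""
    rows = list(rows)
    counts = Counter(outcome(score, label) for score, label in rows)
    tp, fp, fn, tn = (counts[k] for k in ("TP", "FP", "FN", "TN"))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(rows) if rows else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": accuracy,
    }


# Hypothetical sample: (score, manually annotated as hateful?)
sample = [(0.95, True), (0.80, True), (0.40, True), (0.75, False), (0.10, False)]
print(metrics(sample))
```

As a consistency check, `f1_from(0.93, 0.85)` rounds to 0.89 and `f1_from(0.89, 0.94)` rounds to 0.91, matching the F1-scores reported in the abstract.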
3	A Tale of Two Domains: Automatic Identification of Hate Speech in Cross-Domain Scenarios / Automatisk identifikation av näthat i domänöverföringsscenarion Gren, Gustaf January 2023 As our lives become more and more digital, our exposure to certain phenomena increases, one of which is hate speech. Thus, automatic hate speech identification is needed. This thesis explores three strategies for hate speech detection in cross-domain scenarios: using a model trained on annotated data from a previous domain, using a model trained on data from a novel methodology of automatic data derivation (designed with cross-domain scenarios in mind), and using ChatGPT as a domain-agnostic classifier. Results showed that cross-domain scenarios remain a challenge for hate speech detection. These results are discussed from both technical and ethical perspectives. / As our lives become increasingly digital, our exposure to certain phenomena increases, one of which is online hate. Automatic identification of online hate is therefore needed. This thesis explores three strategies for detecting hate rhetoric in cross-domain scenarios: using the inferences of a model trained on annotated data from a previous domain, using the inferences of a model trained on data from a new methodology for automatic data derivation proposed in this thesis, and using ChatGPT as a classifier. The results showed that cross-domain scenarios still pose a challenge for the detection of online hate, results that are discussed from both technical and ethical perspectives. Keywords: NLP; hate speech detection; transformers; BERT; ChatGPT; Språkteknologi (Language Technology); näthat (online hate); hatretorik (hate rhetoric)
