1 |
Leverage Fusion of Sentiment Features and Bert-based Approach to Improve Hate Speech Detection. Cheng, Kai Hsiang, 23 June 2022
Social media has become an important place for modern people to conveniently share and exchange their ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive speech to attack or promote hate towards a specific group or an individual. With online hate speech on the rise these days, people have explored ways to automatically recognize hate speech, and among the approaches studied, the Bert-based approach is promising and thus dominated SemEval-2019 Task 6, a hate speech detection competition. In this work, a method fusing sentiment features with a Bert-based approach is proposed. The classic Bert architecture for hate speech detection is modified to fuse with additional sentiment features, provided by an extractor pre-trained on Sentiment140. The proposed model is compared with the top-3 models in SemEval-2019 Task 6 Subtask A and achieves an 83.1% F1 score, better than the models in the competition. Also, to see whether additional sentiment features benefit the detection of hate speech, the features are fused with three kinds of deep learning architectures respectively. The results show that the models with sentiment features perform better than those without sentiment features. / Master of Science / Social media has become an important place for modern people to conveniently share and exchange their ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive speech to attack or promote hate towards a specific group or an individual.
With online hate speech on the rise these days, people have explored ways to automatically recognize hate speech, and among the approaches studied, Bert is one of the most promising for automatic hate speech recognition. Bert is a deep learning model for natural language processing (NLP) that originated from the Transformer, developed by Google in 2017. Bert has been applied to many NLP tasks and achieved astonishing results in text classification, semantic similarity between pairs of sentences, question answering over a given paragraph, and text summarization. So in this study, Bert is adopted to learn the meaning of a given text and distinguish hate speech from tons of tweets automatically. In order to let Bert better capture hate speech, the approach in this work modifies Bert to take an additional source of sentiment-related features for learning the pattern of hate speech, given that the emotion tends to be negative when people put out abusive speech. For evaluation of the approach, our model is compared against those in SemEval-2019 Task 6, a famous hate speech detection competition, and the results show that the proposed model achieves an 83.1% F1 score, better than the models in the competition. Also, to see if additional sentiment features benefit the detection of hate speech, the features are fused with three different kinds of deep learning architectures respectively, and the results show that the models with sentiment features perform better than those without sentiment features.
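A minimal sketch of the fusion idea described above: a pooled text embedding is concatenated with sentiment features before a final linear layer and sigmoid. All vectors and weights here are toy numbers standing in for the pooled Bert embedding and the Sentiment140-trained extractor's output; the real model learns its weights end-to-end.

```python
import math

def fuse_and_classify(text_vec, sentiment_vec, weights, bias):
    """Concatenate the encoder output with sentiment features and
    apply a linear layer followed by a sigmoid, yielding P(hate)."""
    fused = text_vec + sentiment_vec            # late fusion by concatenation
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid

# Hypothetical pooled embedding (4-dim) and sentiment features (2-dim).
text_vec = [0.2, -0.1, 0.4, 0.05]
sentiment_vec = [0.9, -0.7]                     # e.g. negativity scores
weights = [0.5, -0.3, 0.8, 0.1, 1.2, -0.4]      # one weight per fused dim
prob = fuse_and_classify(text_vec, sentiment_vec, weights, bias=0.0)
```

The key design point is that the sentiment extractor is trained separately, so its features inject emotion information that the text encoder would otherwise have to rediscover from the hate speech labels alone.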
|
2 |
Detection and handling of overlapping speech for speaker diarization. Zelenák, Martin, 31 January 2012
For the last several years, speaker diarization has been attracting substantial research attention as one of the spoken
language technologies applied for the improvement, or enrichment, of recording transcriptions. Recordings of meetings,
compared to other domains, exhibit an increased complexity due to the spontaneity of speech, reverberation effects, and also
due to the presence of overlapping speech.
Overlapping speech refers to situations when two or more speakers are speaking simultaneously. In meeting data, a
substantial portion of errors of the conventional speaker diarization systems can be ascribed to speaker overlaps, since usually
only one speaker label is assigned per segment. Furthermore, simultaneous speech included in training data can eventually
lead to corrupt single-speaker models and thus to a worse segmentation.
This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker
diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on
distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component
analysis, linear discriminant analysis, or by a multi-layer perceptron.
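As an illustration of the PCA-based fusion, the sketch below projects per-frame cross-correlation features from several microphone pairs onto their first principal component, found by power iteration. The feature values are invented for illustration; the thesis also considers LDA and MLP fusion, which are not shown.

```python
def top_principal_component(data, iters=200):
    """Power iteration on the sample covariance matrix to find the
    leading eigenvector, i.e. the direction of maximal variance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v

def fuse(row, means, v):
    """Project one frame's per-pair features onto the first principal
    component, giving a single fused score per frame."""
    return sum((x - m) * c for x, m, c in zip(row, means, v))

# Hypothetical per-frame cross-correlation features from three mic pairs.
frames = [[0.9, 0.8, 0.85], [0.2, 0.1, 0.15],
          [0.7, 0.75, 0.8], [0.1, 0.2, 0.1]]
means, pc1 = top_principal_component(frames)
scores = [fuse(row, means, pc1) for row in frames]
```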
In addition, we also investigate the possibility of employing long-term prosodic information. The most suitable subset from a set
of candidate prosodic features is determined in two steps. Firstly, a ranking according to mRMR criterion is obtained, and then,
a standard hill-climbing wrapper approach is applied in order to determine the optimal number of features.
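The two-step selection can be sketched as follows. The feature names and scores are hypothetical, and this simplified wrapper walks the mRMR ranking and stops at the first drop in score rather than exploring further, so it is only an illustration of the idea, not the thesis's exact procedure.

```python
def hill_climb(ranked_features, evaluate):
    """Greedy wrapper selection: follow the mRMR ranking and keep
    adding the next-ranked feature while the evaluation score improves."""
    selected, best = [], float("-inf")
    for feat in ranked_features:
        score = evaluate(selected + [feat])
        if score > best:
            selected.append(feat)
            best = score
        else:
            break
    return selected, best

# Hypothetical ranking and a stand-in evaluator (detector F1 by subset size).
ranking = ["f0_range", "energy_var", "voicing", "jitter", "shimmer"]
toy_scores = {1: 0.61, 2: 0.66, 3: 0.69, 4: 0.68, 5: 0.64}
subset, f1 = hill_climb(ranking, lambda feats: toy_scores[len(feats)])
# subset keeps the top-3 ranked features, where the toy F1 peaks at 0.69
```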
The novel spatial as well as prosodic parameters are used in combination with spectral-based features suggested previously in
the literature. In experiments conducted on AMI meeting data, we show that the newly proposed features do contribute to the
detection of overlapping speech, especially on data originating from a single recording site.
In speaker diarization, for segments including detected speaker overlap, a second speaker label is picked, and such segments
are also discarded from the model training. The proposed overlap labeling technique is integrated in Viterbi decoding, a part of
the diarization algorithm. During the system development it was discovered that it is favorable to do an independent
optimization of overlap exclusion and labeling with respect to the overlap detection system.
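Picking a second speaker label for an overlap segment can be sketched as ranking speaker posteriors and keeping the runner-up. The speaker names and posteriors below are hypothetical, and the actual system performs this labeling inside Viterbi decoding rather than per segment as shown here.

```python
def label_segment(posteriors, overlap_detected):
    """Rank speakers by posterior probability for a segment; return the
    top speaker, plus the runner-up when overlap was detected."""
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    return ranked[:2] if overlap_detected else ranked[:1]

# Hypothetical per-segment speaker posteriors from the diarization model.
post = {"spk1": 0.55, "spk2": 0.35, "spk3": 0.10}
single = label_segment(post, overlap_detected=False)   # ["spk1"]
double = label_segment(post, overlap_detected=True)    # ["spk1", "spk2"]
```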
We report improvements over the baseline diarization system on both single- and multi-site AMI data. Preliminary experiments
with NIST RT data show DER improvement on the RT '09 meeting recordings as well.
The addition of beamforming and a TDOA feature stream into the baseline diarization system, which was aimed at improving the
clustering process, results in slightly higher effectiveness of the overlap labeling algorithm. A more detailed analysis of the
overlap exclusion behavior reveals large contrasts in improvement between individual meeting recordings as well as between
various settings of the overlap detection operation point. However, a high performance variability across different recordings is
also typical of the baseline diarization system, without any overlap handling.
|
3 |
Can Hatescan Detect Antisemitic Hate Speech? Nyrén, Olle, January 2023
This thesis focuses on how well Hatescan, a hate speech detector built on the same Natural Language Processing and AI algorithms used in most online hate speech detectors, can detect different categories of antisemitism, as well as whether or not it is worse at detecting implicit antisemitism than explicit antisemitism. The ability of hate speech detectors to detect antisemitic hate speech is a pressing issue. Jews have not only persevered through unparalleled historical oppression; antisemitism is also very much alive and kicking online, which poses a direct threat not only to individual Jews themselves (since there is a clear link between antisemitic expressions and antisemitic violence) but to the idea of liberal democracy itself. This thesis evaluated the efficacy of the hate speech detector Hatescan regarding its ability to detect antisemitism expressed in Swedish, and assessed whether it was better or worse at detecting explicit or implicit antisemitism. Thus, the research questions posed for this thesis were: 1. How well does Hatescan detect antisemitism? 2. Is Hatescan equally efficient at detecting different categories of antisemitism? 3. Is Hatescan equally efficient at detecting implicit antisemitism and explicit antisemitism? To answer these questions, this thesis used an experiment as its research strategy, documents as its data collection method, qualitative analysis methods (discourse analysis) for annotation, and quantitative analysis methods (descriptive statistics) for calculating performance metrics (precision, recall, F1-score, and accuracy). A dataset was created from three previously existing datasets containing hate speech expressed in Swedish on Reddit, Flashback, and Twitter. The data was collected using search terms presumed to appear in antisemitic content. The datasets were created by the supervisor of this thesis and her research team for use in previous studies.
These datasets were combined into one dataset (in a spreadsheet). Duplicates were deleted, and each remaining sentence was annotated according to hatefulness, category of antisemitism, and explicit versus implicit antisemitism. Each sentence was manually run through Hatescan’s web interface to generate a Hatescan output, and this output was documented in the spreadsheet containing the data. Based on a threshold of 70% for the generated Hatescan output, the output for each sentence was annotated as either a true positive, false positive, false negative, or true negative using IFS formulas in the spreadsheet. Precision, recall, and F1-score were calculated for the dataset as a whole, and accuracy rates were calculated for all categories of antisemitism as well as for explicit and implicit antisemitism. Results showed that while performance metrics on the antisemitic dataset (precision 0.93, recall 0.85, F1-score 0.89) were similar to the performance metrics reported in the development of Hatescan (precision 0.89, recall 0.94, F1-score 0.91), there were significant differences in accuracy between the different annotated categories in the dataset (accuracy ranging from 27 percent to 92 percent).
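The threshold-based annotation and metric calculations can be sketched as follows; the scores and labels are invented for illustration, and the spreadsheet's IFS formulas are replaced by equivalent Python comparisons.

```python
def evaluate(scores, labels, threshold=0.70):
    """Binarize detector outputs at the given threshold and compute
    precision, recall, F1 and accuracy against manual annotations."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(labels)
    return precision, recall, f1, accuracy

# Hypothetical detector scores and gold annotations (True = antisemitic).
scores = [0.95, 0.80, 0.40, 0.10, 0.75, 0.20]
labels = [True, True, True, False, False, False]
p, r, f1, acc = evaluate(scores, labels)
```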
|
4 |
Speech Detection Using Gammatone Features And One-class Support Vector Machine. Cooper, Douglas, 01 January 2013
A network gateway is a mechanism which provides protocol translation and/or validation of network traffic using the metadata contained in network packets. For media applications such as Voice-over-IP, the portion of the packets containing speech data cannot be verified and can provide a means of maliciously transporting code or sensitive data undetected. One solution to this problem is Voice Activity Detection (VAD). Many VADs rely on time-domain features and simple thresholds for efficient speech detection; however, this says little about the signal being passed. More sophisticated methods employ machine learning algorithms but train on specific noises intended for a target environment. Validating speech under a variety of unknown conditions must be possible, as must differentiating between speech and nonspeech data embedded within the packets. A real-time speech detection method is proposed that relies only on a clean speech model for detection. Through the use of Gammatone filter bank processing, the cepstrum and several frequency-domain features are used to train a One-Class Support Vector Machine, which provides a clean-speech model irrespective of environmental noise. A Wiener filter is used to provide improved operation for harsh noise environments. Greater than 90% detection accuracy is achieved for clean speech, with approximately 70% accuracy for SNR as low as 5 dB.
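A rough sketch of the clean-speech-model idea: fit a model on clean speech only, then accept a frame as speech if it lies close enough to that model. As a simplified stand-in for the one-class SVM, the sketch uses a per-dimension Gaussian novelty score, and all feature values are invented.

```python
def fit_clean_model(frames):
    """Fit per-dimension mean and standard deviation on clean-speech
    feature frames -- a simplified stand-in for a one-class SVM."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[j] for f in frames) / n for j in range(d)]
    sd = [((sum((f[j] - mu[j]) ** 2 for f in frames) / n) ** 0.5) or 1.0
          for j in range(d)]
    return mu, sd

def is_speech(frame, mu, sd, radius=3.0):
    """Accept a frame as speech if its normalized distance to the
    clean-speech model falls inside the decision radius."""
    dist = (sum(((x - m) / s) ** 2 for x, m, s in zip(frame, mu, sd))
            / len(frame)) ** 0.5
    return dist <= radius

# Hypothetical 2-dim cepstral features: clean speech clusters tightly,
# while nonspeech content lands far from the model.
clean = [[1.0, 0.5], [1.1, 0.4], [0.9, 0.6], [1.0, 0.45]]
mu, sd = fit_clean_model(clean)
```

The appeal of the one-class formulation is the same as in the thesis: no noise or nonspeech examples are needed at training time, only clean speech.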
|
5 |
A Tale of Two Domains: Automatic Identification of Hate Speech in Cross-Domain Scenarios / Automatisk identifikation av näthat i domänöverföringsscenarion. Gren, Gustaf, January 2023
As our lives become more and more digital, our exposure to certain phenomena increases, one of which is hate speech. Thus, automatic hate speech identification is needed. This thesis explores three strategies for hate speech detection in cross-domain scenarios: using a model trained on annotated data for a previous domain, using a model trained on data from a novel methodology of automatic data derivation (with cross-domain scenarios in mind), and using ChatGPT as a domain-agnostic classifier. Results showed that cross-domain scenarios remain a challenge for hate speech detection, results which are discussed out of both technical and ethical considerations. / As our lives become increasingly digital, our exposure to certain phenomena increases, one of which is online hate. Therefore, automatic identification of online hate is needed. This thesis explores three strategies for detecting hate speech in cross-domain scenarios: using the inferences of a model trained on annotated data for a previous domain, using the inferences of a model trained on data from a novel methodology for automatic data derivation proposed for this thesis, and using ChatGPT as a classifier. The results showed that cross-domain scenarios still pose a challenge for the detection of online hate, results which are discussed from both technical and ethical considerations.
|
6 |
Multi-speaker Speech Activity Detection From Video. Wejdelind, Marcus; Wägmark, Nils, January 2020
A conversational robot will in many cases have to deal with multi-party spoken interaction in which one or more people could be speaking simultaneously. To do this, the robot must be able to identify the speakers in order to attend to them. Our project has approached this problem from a visual point of view where a Convolutional Neural Network (CNN) was implemented and trained using video stream input containing one or more faces from an already existing data set (AVA-Speech). The goal for the network has then been, for each face and at each point in time, to detect the probability of that person speaking. Our best result using an added Optical Flow function was 0.753, while we reached 0.781 using another pre-processing method for the data. These numbers corresponded surprisingly well with the existing scientific literature in the area, where 0.77 proved to be an appropriate benchmark level. / A social robot will in many cases be forced to handle conversations with several interlocutors where different people speak simultaneously. To achieve this, it is important that the robot can identify the speaker in order to then assist or interact with them. This project has investigated the problem from a visual standpoint where a Convolutional Neural Network (CNN) was implemented and trained with video input from an already existing data set (AVA-Speech). The goal for the network has been, for each face and at each point in time, to detect the probability that the person is speaking. Our best result when using Optical Flow was 0.753, while we managed to reach 0.781 with another type of pre-processing of the data. These results corresponded surprisingly well with the existing scientific literature in the area, where 0.77 has proven to be an appropriate benchmark. / Bachelor's thesis in electrical engineering 2020, KTH, Stockholm
|
7 |
How accuracy of estimated glottal flow waveforms affects spoofed speech detection performance. Deivard, Johannes, January 2020
In the domain of automatic speaker verification, one of the challenges is to keep malevolent people out of the system. One way to do this is to create algorithms that detect spoofed speech. There are several types of spoofed speech and several ways to detect them, one of which is to look at the glottal flow waveform (GFW) of a speech signal. This waveform is often estimated using glottal inverse filtering (GIF), since creating the ground-truth GFW requires special invasive equipment. To the author’s knowledge, no research has investigated the correlation between GFW accuracy and spoofed speech detection (SSD) performance. This thesis tries to find out whether that correlation exists. First, the performance of different GIF methods is evaluated; then simple SSD machine learning (ML) models are trained and evaluated based on their macro average precision. The ML models use different datasets composed of parametrized GFWs estimated with the GIF methods from the previous step. Results from the previous tasks are then combined in order to spot any correlations. The evaluations of the different methods showed that they created GFWs of varying accuracy. The different machine learning models also showed varying performance depending on what type of dataset was being used. However, when combining the results, no obvious correlations between GFW accuracy and SSD performance were detected. This suggests that the overall accuracy of a GFW is not a substantial factor in the performance of machine learning-based SSD algorithms.
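The final correlation step can be illustrated with a Pearson coefficient over per-method results. All numbers below are hypothetical; the thesis does not specify which correlation measure it used, so this is only one reasonable way to spot such a relationship.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Hypothetical numbers: one GFW-accuracy score per GIF method, paired with
# the macro average precision of the SSD model trained on its output.
gfw_accuracy = [0.72, 0.80, 0.65, 0.90]
ssd_precision = [0.61, 0.58, 0.63, 0.60]
r = pearson(gfw_accuracy, ssd_precision)
# a value of r near zero would match the thesis's "no obvious correlation"
```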
|
8 |
Machine Learning Approaches for Speech Forensics. Amit Kumar Singh Yadav (19984650), 31 October 2024
<p dir="ltr">Several incidents report misuse of synthetic speech for impersonation attacks, spreading misinformation, and supporting financial fraud. To counter such misuse, this dissertation focuses on developing methods for speech forensics. First, we present a method to detect compressed synthetic speech. The method uses 33 times less information from the compressed bit stream than existing methods and achieves high performance. Second, we present a transformer neural network method that uses a 2D spectral representation of speech signals to detect synthetic speech. The method shows high performance in detecting both compressed and uncompressed synthetic speech. Third, we present a method using an interpretable machine learning approach known as disentangled representation learning for synthetic speech detection. Fourth, we present a method for synthetic speech attribution. It identifies the source of a speech signal. If the speech is spoken by a human, we classify it as authentic/bona fide. If the speech signal is synthetic, we identify the generation method used to create it. We examine both closed-set and open-set attribution scenarios. In a closed-set scenario, we evaluate our approach only on the speech generation methods present in the training set. In an open-set scenario, we also evaluate on methods which are not present in the training set. Fifth, we propose a multi-domain method for synthetic speech localization. It processes multi-domain features obtained from a transformer using a ResNet-style MLP. We show that with relatively few parameters, the proposed method performs better than existing methods. Finally, we present a new direction of research in speech forensics, <i>i.e.</i>, bias and fairness of synthetic speech detectors. By bias, we refer to an action in which a detector unfairly targets a specific demographic group of individuals and falsely labels their bona fide speech as synthetic. 
We show that existing synthetic speech detectors are biased with respect to gender, age, and accent. They also show bias against bona fide speech from people with speech impairments such as stuttering. We propose a set of augmentations that simulate stuttering in speech. We show that synthetic speech detectors trained with the proposed augmentations have less bias relative to detectors trained without them.</p>
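One way such an augmentation might look, as a simplified sketch rather than the dissertation's actual implementation: simulate a sound repetition by duplicating a short span of waveform samples at its original position.

```python
def add_stutter(samples, start, length, repeats=2):
    """Simulate a stutter-like sound repetition by inserting `repeats`
    extra copies of a short span of the waveform at its position."""
    span = samples[start:start + length]
    return samples[:start] + span * (repeats + 1) + samples[start + length:]

# Toy waveform (one value per sample); repeat samples 2..3 twice more.
wave = [0, 1, 2, 3, 4, 5]
stuttered = add_stutter(wave, start=2, length=2, repeats=2)
# stuttered is [0, 1, 2, 3, 2, 3, 2, 3, 4, 5]
```

Real stuttering augmentations would operate on audio at the syllable or phoneme level and smooth the splice points, but the core transformation is this kind of localized repetition.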
|
9 |
Machine Learning for Speech Forensics and Hypersonic Vehicle Applications. Emily R Bartusiak (6630773), 06 December 2022
<p>Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed set and open set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we explore machine learning for an objective in the aerospace domain.</p>
<p><br></p>
<p>Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as “words” and HGV trajectories as “sentences.” Next, we learn a “grammar” from the HGV trajectories that describes their flight phase transition patterns. Given “words” from the initial part of an HGV trajectory and the “grammar,” we predict future “words” in the “sentence” (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen before during training.</p>
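The "grammar" idea can be sketched as a bigram model over flight-phase sequences: count which phase follows which, then predict the most frequent successor. The phase names and trajectories below are invented, and the dissertation's actual model is richer than a simple bigram count.

```python
from collections import Counter, defaultdict

def learn_grammar(trajectories):
    """Count flight-phase bigrams across training trajectories to learn
    the transition patterns (the 'grammar')."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, phase):
    """Predict the most frequent successor of the current phase."""
    return counts[phase].most_common(1)[0][0]

# Hypothetical phase 'sentences' for HGV trajectories.
train = [["boost", "glide", "skip", "glide", "dive"],
         ["boost", "glide", "dive"],
         ["boost", "glide", "skip", "glide", "dive"]]
grammar = learn_grammar(train)
next_phase = predict_next(grammar, "glide")   # "dive" in this toy data
```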
|