Global ETD Search

71	Anomaly Detection and Security Deep Learning Methods Under Adversarial Situation Miguel Villarreal-Vasquez (9034049) 27 June 2020 (has links) Advances in Artificial Intelligence (AI), or more precisely in Neural Networks (NNs), and in fast processing technologies (e.g., Graphics Processing Units, or GPUs) in recent years have positioned NNs as one of the main machine learning algorithms used to solve a diversity of problems in both academia and industry. Although they have proved effective in solving many tasks, the lack of security guarantees and of understanding of their internal processing hinders their wide adoption in general and cybersecurity-related applications. In this dissertation, we present the findings of a comprehensive study aimed at enabling the adoption of state-of-the-art NN algorithms in the development of enterprise solutions. Specifically, this dissertation focuses on (1) the development of defensive mechanisms to protect NNs against adversarial attacks and (2) the application of NN models for anomaly detection in enterprise networks. In this context, this work makes the following contributions. First, we performed a thorough study of the different adversarial attacks against NNs. We concentrate on the attacks referred to as trojan attacks and introduce a novel model hardening method that removes any trojan (i.e., misbehavior) inserted into NN models at training time. We carefully evaluate our method and establish the correct metrics to test the efficiency of defensive methods against these types of attacks: (1) accuracy with benign data, (2) attack success rate, and (3) accuracy with adversarial data. Prior work evaluates its solutions using the first two metrics only, which do not suffice to guarantee robustness against untargeted attacks. We compare our method with the state of the art, and the results show that our method outperforms it. 
Second, we proposed a novel approach to detect anomalies using LSTM-based models. Our method analyzes at runtime the event sequences generated by the Endpoint Detection and Response (EDR) system of a renowned security company and efficiently detects uncommon patterns. The new detection method is compared with the EDR system. The results show that our method achieves a higher detection rate. Finally, we present a Moving Target Defense technique that smartly reacts upon the detection of anomalies so as to also mitigate the detected attacks. The technique efficiently replaces the entire stack of virtual nodes, making ongoing attacks in the system ineffective. Applied Computer Science Computer Vision Computer System Security Deep Neural Network (DNN) security content-focus data augmentation trojan or backdoor attacks insider threats Long Short-Term Memory (LSTM) anomaly detection variable-length sequence analysis
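The three evaluation metrics named in entry 71 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the function name `trojan_defense_metrics`, its arguments, and the exact metric definitions (in particular, scoring triggered inputs only when their true label differs from the attacker's target) are hypothetical choices made for the example.

```python
def trojan_defense_metrics(clean_preds, clean_labels,
                           trig_preds, trig_labels, target_label):
    """Return (benign accuracy, attack success rate, adversarial accuracy).

    clean_*: predictions and true labels on benign inputs.
    trig_*:  predictions and true labels on inputs carrying the trojan trigger.
    target_label: the class the attacker wants triggered inputs to receive.
    All definitions here are assumptions made for illustration.
    """
    # (1) Accuracy with benign data: fraction of clean inputs classified correctly.
    benign_acc = sum(p == y for p, y in zip(clean_preds, clean_labels)) / len(clean_labels)

    # Skip triggered inputs that already belong to the target class,
    # since predicting the target for them is not evidence of an attack.
    pairs = [(p, y) for p, y in zip(trig_preds, trig_labels) if y != target_label]

    # (2) Attack success rate: triggered inputs pushed to the target class.
    attack_success = sum(p == target_label for p, _ in pairs) / len(pairs)

    # (3) Accuracy with adversarial data: triggered inputs still classified correctly.
    adversarial_acc = sum(p == y for p, y in pairs) / len(pairs)

    return benign_acc, attack_success, adversarial_acc
```

Under these definitions the second and third metrics are not complementary: a hardened model can stop predicting the target label for triggered inputs and still misclassify them, which gives a low attack success rate together with a low adversarial accuracy. This is one way to read the abstract's remark that the first two metrics alone do not suffice.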
72	Comparative Study of Classification Methods for the Mitigation of Class Imbalance Issues in Medical Imaging Applications Kueterman, Nathan 22 June 2020 (has links) No description available. Medical Imaging Electrical Engineering Computer Engineering medical imaging machine learning deep learning class imbalance data augmentation, sampling methods diabetic retinopathy leukemia CIFAR-10 CIFAR10 leukocyte ensemble networks neural networks comparative study
73	Myaamia Translator: Using Neural Machine Translation With Attention to Translate a Low-resource Language Baaniya, Bishal 06 April 2023 (has links) No description available. Computer Engineering Computer Science Artificial Intelligence Linguistics Language Native American Studies
74	Exploring State-of-the-Art Machine Learning Methods for Quantifying Exercise-induced Muscle Fatigue Afram, Abboud, Sarab Fard Sabet, Danial January 2023 (has links) Muscle fatigue is a severe problem for elite athletes because of the long and variable resting times it entails. Various mechanisms can cause muscle fatigue, which signifies that the specific muscle has reached its maximum force and cannot continue the task. This thesis surveys state-of-the-art methods and systematically tests, both theoretically and practically, the applicability and performance of more recent machine learning methods on an existing EMG-to-muscle-fatigue pipeline. Several challenges exist within the EMG domain, such as inadequate data and finding the most suitable model, and it is unclear how they should be addressed to achieve reliable prediction. Addressing these problems required combining and comparing various state-of-the-art methodologies, such as data augmentation techniques for upsampling, spectrogram methods for signal processing, and transfer learning with various pre-trained CNN models to obtain reliable predictions. The approach in this study was to conduct seven experiments consisting of a classification task that aims to predict the stage of muscle fatigue. The stages are divided into seven classes, 0 to 6, with higher classes representing a more fatigued muscle. In the tabular part of the experiments, Decision Tree, Random Forest, and Support Vector Machine (SVM) classifiers were trained, and their accuracy was determined. A similar approach was taken for the spectrogram part, where the signals were converted to spectrogram images and the limited dataset was enlarged with a combination of traditional and intelligent data augmentation techniques, such as noise and DCGAN. 
A comparison between the performance of the AlexNet, VGG16, DenseNet, and InceptionV3 pre-trained CNN models was made to predict differences in jump heights. The results were evaluated by implementing baseline classifiers on tabular data and pre-trained CNN model classifiers for CWT and STFT spectrograms with and without data augmentation. The evaluation of the various state-of-the-art methodologies for the classification problem showed that DenseNet and VGG16 gave a reliable accuracy of 89.8% on CWT images with intelligent data augmentation. The intelligent data augmentation applied to CWT images allows the pre-trained CNN models to learn features that generalize to unseen data. This shows that a combination of state-of-the-art methods can be introduced to address the challenges within the EMG domain. EMG SEMG STFT CWT SVM CNN GAN DCGAN BCE SGD deep learning machine learning muscle fatigue DCGAN spectrogram CNN models transfer learning data augmentation feature extraction Computer and Information Sciences Data- och informationsvetenskap
75	Point Cloud Data Augmentation for 4D Panoptic Segmentation / Punktmolndataförstärkning för 4D-panoptisk Segmentering Jin, Wangkang January 2022 (has links) 4D panoptic segmentation is an emerging topic in the field of autonomous driving, which jointly tackles 3D semantic segmentation, 3D instance segmentation, and 3D multi-object tracking based on point cloud data. However, the difficulty of collection limits the size of existing point cloud datasets. Therefore, data augmentation is employed to expand the amount of existing data for better generalization and prediction ability. In this thesis, we built a new point cloud dataset, named the VCE dataset, from scratch. In addition, we adopted a neural network model for the 4D panoptic segmentation task and proposed a simple geometric method based on a translation operation. Compared to the baseline model, better results were obtained after augmentation, with an increase of 2.15% in LSTQ. / 4D-panoptisk segmentering är ett framväxande ämne inom området autonom körning, som gemensamt tar itu med semantisk 3D-segmentering, 3D-instanssegmentering och 3D-spårning av flera objekt baserat på punktmolnsdata. Svårigheten att samla in begränsar dock storleken på befintliga punktmolnsdatauppsättningar. Därför används dataökning för att utöka mängden befintliga data för bättre generalisering och förutsägelseförmåga. I det här examensarbetet byggde vi en ny punktmolndatauppsättning med namnet VCE-datauppsättning från grunden. Dessutom antog vi en neural nätverksmodell för 4D-panoptisk segmenteringsuppgift och föreslog en enkel geometrisk metod baserad på översättningsoperation. Jämfört med baslinjemodellen erhölls bättre resultat efter förstärkning, med en ökning på 2.15% i LSTQ. 
Point Cloud Data Augmentation 4D panoptic segmentation Deep Learning 3D Perception Autonomous Driving Punktmoln Dataökning 4D panoptisk segmentering Djup lärning 3D Perception 3D Uppfattning Autonom körning Computer and Information Sciences Data- och informationsvetenskap
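The translation-based augmentation mentioned in entry 75 can be illustrated with a short sketch. The abstract does not give the details of the method, so everything here is an assumption: the function name `translate_point_cloud`, the `max_shift` parameter, and the choice of one random rigid offset applied to every point in the cloud.

```python
import random


def translate_point_cloud(points, max_shift=0.5, rng=None):
    """Shift every (x, y, z) point by the same random offset.

    points: list of (x, y, z) tuples.
    max_shift: largest absolute offset per axis (hypothetical parameter).
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    # Draw one offset per axis and reuse it for all points, so the
    # shape of the cloud (all pairwise distances) is preserved.
    dx, dy, dz = (rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [(x + dx, y + dy, z + dz) for x, y, z in points]
```

Because the offset is shared by all points, per-point semantic and instance labels can be reused unchanged for the augmented copy, which is what makes this kind of geometric augmentation cheap.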
76	Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline / Uttalsfelsdetektering med SpeechBlender data-förstärkning Elkheir, Yassine January 2023 (has links) The rise of multilingualism has fueled the demand for computer-assisted pronunciation training (CAPT) systems for language learning. CAPT systems make use of speech technology advancements and offer features such as learner assessment and curriculum management. Mispronunciation detection (MD) is a crucial aspect of CAPT, aimed at identifying and correcting mispronunciations in second language learners’ speech. One of the significant challenges in developing MD models is the limited availability of labeled second-language speech data. To overcome this, the thesis introduces SpeechBlender, a fine-grained data augmentation pipeline designed to generate mispronunciations. SpeechBlender targets different regions of a phonetic unit and blends raw speech signals through linear interpolation, resulting in erroneous pronunciation instances. This method provides more effective sample generation than traditional cut/paste methods. The thesis also explores the use of pre-trained automatic speech recognition (ASR) systems for MD, and examines various phone-level features that can be extracted from pre-trained ASR models and utilized for MD tasks. A deep neural model is proposed that enhances the representations of extracted acoustic features combined with positional phoneme embeddings. The efficacy of the augmentation technique is demonstrated through a phone-level pronunciation quality assessment task using only non-native good pronunciation speech data. Our proposed technique achieves state-of-the-art results on the Speechocean762 dataset [54] for ASR-dependent MD models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) compared to the previous state of the art [17]. 
Additionally, we demonstrate a 5.0% improvement at the phoneme level compared to our baseline. In this thesis, we also developed the first Arabic pronunciation learning corpus, AraVoiceL2, to demonstrate the generality of our proposed model and augmentation technique. We used the corpus to evaluate the effectiveness of our approach in improving mispronunciation detection for non-native learners of Arabic. Our experiments showed promising results, with a 4.6% increase in F1-score on the Arabic AraVoiceL2 test set, demonstrating the effectiveness of our model and augmentation technique in improving pronunciation learning for non-native speakers of Arabic. / Den ökande flerspråkigheten har ökat efterfrågan på datorstödda CAPT-system (Computer-assisted pronunciation training) för språkinlärning. CAPT-systemen utnyttjar taltekniska framsteg och erbjuder funktioner som bedömning av inlärare och läroplanshantering. Upptäckt av felaktigt uttal är en viktig aspekt av CAPT som syftar till att identifiera och korrigera felaktiga uttal i andraspråkselevernas tal. En av de stora utmaningarna när det gäller att utveckla MD-modeller är den begränsade tillgången till märkta taldata för andraspråk. För att övervinna detta introduceras SpeechBlender i avhandlingen - en finkornig dataförstärkningspipeline som är utformad för att generera feluttalningar. SpeechBlender är inriktad på olika regioner i en fonetisk enhet och blandar råa talsignaler genom linjär interpolering, vilket resulterar i felaktiga uttalsinstanser. Denna metod ger en effektivare provgenerering jämfört med traditionella cut/paste-metoder. I avhandlingen undersöks användningen av förtränade system för automatisk taligenkänning (ASR) för upptäckt av felaktigt uttal. I avhandlingen undersöks olika funktioner på fonemnivå som kan extraheras från förtränade ASR-modeller och användas för att upptäcka felaktigt uttal. 
En LSTM-modell föreslogs som förbättrar representationen av extraherade akustiska egenskaper i kombination med positionella foneminbäddningar. Effektiviteten hos förstärkningstekniken demonstreras genom en uppgift för bedömning av uttalskvaliteten på fonemnivå med hjälp av taldata som endast innehåller taldata som inte är av inhemskt ursprung och som ger ett bra uttal. Vår föreslagna teknik uppnår toppresultat med Speechocean762-dataset [54], på ASR-beroende modeller för upptäckt av felaktigt uttal på fonemnivå, med en ökning av Pearsonkorrelationskoefficienten (PCC) med 2,0% jämfört med den tidigare toppnivån [17]. Dessutom visar vi en förbättring på 5,0% på fonemnivå jämfört med vår baslinje. Vi observerade också en ökning av F1-poängen med 4,6% med arabiska AraVoiceL2-testset. Automatic Speech Recognition (ASR) Datorstödd uttalsträning (CAPT) automatisk taligenkänning (ASR) Elektroteknik och elektronik
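The region-wise linear interpolation described in the SpeechBlender abstract (entry 76) can be sketched as below. This is only an illustration of blending two raw signals over a chosen region. The function name `blend_region`, the equal-length requirement, and the single mixing weight `lam` are assumptions, and the actual pipeline may select regions and weights differently.

```python
def blend_region(target, donor, start, end, lam):
    """Linearly interpolate two equal-length sample sequences on [start, end).

    target: samples of the phone being modified (list of floats).
    donor:  samples of another phone, aligned to the same length.
    lam:    weight kept from `target`; (1 - lam) is taken from `donor`.
    Samples outside the region are copied from `target` unchanged.
    """
    if len(target) != len(donor):
        raise ValueError("target and donor must have the same length")
    if not 0 <= start <= end <= len(target):
        raise ValueError("region must lie inside the signal")
    blended = list(target)
    for i in range(start, end):
        # Standard linear interpolation between the two signals.
        blended[i] = lam * target[i] + (1.0 - lam) * donor[i]
    return blended
```

With `lam` equal to 1 the region is left untouched, and with `lam` equal to 0 it is replaced outright by the donor, which corresponds to the cut/paste baseline the abstract contrasts with. Intermediate values give partially corrupted phones.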
77	AI-based Quality Inspection for Short-Series Production : Using synthetic dataset to perform instance segmentation for quality inspection / AI-baserad kvalitetsinspektion för kortserieproduktion : Användning av syntetiska dataset för att utföra instans segmentering för kvalitetsinspektion Russom, Simon Tsehaie January 2022 (has links) Quality inspection is an essential part of almost any industrial production line. However, designing customized solutions for defect detection for every product can be costly for the production line. This is especially the case for short-series production, where the production time is limited. That is because collecting and manually annotating the training data takes time. Therefore, a possible method for defect detection using only synthetic training data focused on geometrical defects is proposed in this thesis work. The method is partially inspired by previous related work. The proposed method makes use of an instance segmentation model and pose-estimator. However, this thesis work focuses on the instance segmentation part while using a pre-trained pose-estimator for demonstration purposes. The synthetic data was automatically generated using different data augmentation techniques from a 3D model of a given object. Moreover, Mask R-CNN was primarily used as the instance segmentation model and was compared with a rival model, HTC. The trials show promising results in developing a trainable general-purpose defect detection pipeline using only synthetic data. Synthetic Training Dataset Geometrical Defect Detection Instance Segmentation Data Augmentation Techniques Mask R-CNN Transformers Syntetisk Träningsdataset Detektion av Geometriska Defekter Instanssegmentering Tekniker för Dataaugmentering Mask R-CNN Transformers
78	Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN Nord, Sofia January 2021 (has links) Large datasets are a crucial requirement to achieve high performance, accuracy, and generalisation for any machine learning task, such as prediction or anomaly detection. However, it is not uncommon for datasets to be small or imbalanced since gathering data can be difficult, time-consuming, and expensive. In the task of collecting vehicle sensor time series data, in particular when the vehicle has an abnormal behaviour, these struggles are present and may hinder the automotive industry in its development. Synthetic data generation has attracted growing interest among researchers in several fields as a way to handle the struggles with data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data that are similar to vehicle sensor readings from the air pressures in the brake system of vehicles with an abnormal behaviour, meaning there is a leakage somewhere in the system. A novel GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results obtained showed that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved, and the work can serve as a foundation for future work in this field. / När man applicerar en modell för att utföra en maskininlärningsuppgift, till exempel att förutsäga utfall eller upptäcka avvikelser, är det viktigt med stora dataset för att uppnå hög prestanda, noggrannhet och generalisering. 
Det är dock inte ovanligt att dataset är små eller obalanserade eftersom insamling av data kan vara svårt, tidskrävande och dyrt. När man vill samla tidsserier från sensorer på fordon är dessa problem närvarande och de kan hindra bilindustrin i dess utveckling. Generering av syntetisk data har blivit ett växande intresse bland forskare inom flera områden som ett sätt att hantera problemen med datainsamling. Bland de metoder som undersökts för att generera data har generative adversarial networks (GANs) blivit ett populärt tillvägagångssätt i forskningsvärlden på grund av dess breda applikationsdomän och dess framgångsrika resultat. Denna avhandling fokuserar på att generera flerdimensionell tidsseriedata som liknar fordonssensoravläsningar av lufttryck i bromssystemet av fordon med onormalt beteende, vilket innebär att det finns ett läckage i systemet. En ny GAN modell kallad TimeGAN tränades för att genera sådan data och utvärderades sedan både kvalitativt och kvantitativt. Två versioner av denna modell testades och jämfördes. De erhållna resultaten visade att båda modellerna lärde sig distributionen och den underliggande informationen inom de olika signalerna i den verkliga datan. Målet med denna avhandling uppnåddes och kan lägga grunden för framtida arbete inom detta område. Time Series Data Generation Generative Adversarial Network Deep Neural Network Data Augmentation Synthetic Data Generation Generering av Tidsseriedata Generativa Motstridande Nätverk Djupa Neurala Nätverk Dataökning Syntetisk Datagenerering Computer and Information Sciences Data- och informationsvetenskap
79	Generative Adversarial Networks for Image-to-Image Translation on Street View and MR Images Karlsson, Simon, Welander, Per January 2018 (has links) Generative Adversarial Networks (GANs) are a deep learning method that has been developed for synthesizing data. One application for which they can be used is image-to-image translation. This could prove to be valuable when training deep neural networks for image classification tasks. Two areas where deep learning methods are used are automotive vision systems and medical imaging. Automotive vision systems are expected to handle a broad range of scenarios, which demands training data with high diversity. The scenarios in the medical field are fewer, but the problem is instead that it is difficult, time-consuming, and expensive to collect training data. This thesis evaluates different GAN models by comparing synthetic MR images produced by the models against ground truth images. A perceptual study is also performed by an expert in the field. The study shows that the implemented GAN models can synthesize visually realistic MR images. It also shows that models producing more visually realistic synthetic images do not necessarily have better results in quantitative error measurements when compared to ground truth data. Along with the investigations on medical images, the thesis explores the possibilities of generating synthetic street view images of different resolution, light, and weather conditions. Different GAN models have been compared, implemented with our own adjustments, and evaluated. The results show that it is possible to create visually realistic images for different translations and image resolutions. Deep learning Image processing Artificial intelligence Neural networks MRI Generative adversarial networks Data augmentation Image-to-image translation Street view Biomedical engineering Electrical engineering Medical Image Processing Medicinsk bildbehandling Elektroteknik och elektronik
80	On the Keyword Extraction and Bias Analysis, Graph-based Exploration and Data Augmentation for Abusive Language Detection in Low-Resource Settings Peña Sarracén, Gretel Liz de la 07 April 2024 (has links) Tesis por compendio / [ES] La detección del lenguaje abusivo es una tarea que se ha vuelto cada vez más importante en la era digital moderna, donde la comunicación se produce a través de diversas plataformas en línea. El aumento de las interacciones en estas plataformas ha provocado un aumento de la aparición del lenguaje abusivo. Abordar dicho contenido es crucial para mantener un entorno en línea seguro e inclusivo. Sin embargo, esta tarea enfrenta varios desafíos que la convierten en un área compleja y que demanda de continua investigación y desarrollo. En particular, detectar lenguaje abusivo en entornos con escasez de datos presenta desafíos adicionales debido a que el desarrollo de sistemas automáticos precisos a menudo requiere de grandes conjuntos de datos anotados. En esta tesis investigamos diferentes aspectos de la detección del lenguaje abusivo, prestando especial atención a entornos con datos limitados. Primero, estudiamos el sesgo hacia palabras clave abusivas en modelos entrenados para la detección del lenguaje abusivo. Con este propósito, proponemos dos métodos para extraer palabras clave potencialmente abusivas de colecciones de textos. Luego evaluamos el sesgo hacia las palabras clave extraídas y cómo se puede modificar este sesgo para influir en el rendimiento de la detección del lenguaje abusivo. El análisis y las conclusiones de este trabajo revelan evidencia de que es posible mitigar el sesgo y que dicha reducción puede afectar positivamente el desempeño de los modelos. Sin embargo, notamos que no es posible establecer una correspondencia similar entre la variación del sesgo y el desempeño de los modelos cuando hay escasez datos con las técnicas de reducción del sesgo estudiadas. 
En segundo lugar, investigamos el uso de redes neuronales basadas en grafos para detectar lenguaje abusivo. Por un lado, proponemos una estrategia de representación de textos diseñada con el objetivo de obtener un espacio de representación en el que los textos abusivos puedan distinguirse fácilmente de otros textos. Por otro lado, evaluamos la capacidad de redes neuronales convolucionales basadas en grafos para clasificar textos abusivos. La siguiente parte de nuestra investigación se centra en analizar cómo el aumento de datos puede influir en el rendimiento de la detección del lenguaje abusivo. Para ello, investigamos dos técnicas bien conocidas basadas en el principio de minimización del riesgo en la vecindad de instancias originales y proponemos una variante para una de ellas. Además, evaluamos técnicas simples basadas en el reemplazo de sinónimos, inserción aleatoria, intercambio aleatorio y eliminación aleatoria de palabras. Las contribuciones de esta tesis ponen de manifiesto el potencial de las redes neuronales basadas en grafos y de las técnicas de aumento de datos para mejorar la detección del lenguaje abusivo, especialmente cuando hay limitación de datos. Estas contribuciones han sido publicadas en conferencias y revistas internacionales. / [CA] La detecció del llenguatge abusiu és una tasca que s'ha tornat cada vegada més important en l'era digital moderna, on la comunicació es produïx a través de diverses plataformes en línia. L'augment de les interaccions en estes plataformes ha provocat un augment de l'aparició de llenguatge abusiu. Abordar este contingut és crucial per a mantindre un entorn en línia segur i inclusiu. No obstant això, esta tasca enfronta diversos desafiaments que la convertixen en una àrea complexa i contínua de recerca i desenvolupament. 
En particular, detectar llenguatge abusiu en entorns amb escassetat de dades presenta desafiaments addicionals pel fet que el desenvolupament de sistemes automàtics precisos sovint requerix de grans conjunts de dades anotades. En esta tesi investiguem diferents aspectes de la detecció del llenguatge abusiu, prestant especial atenció a entorns amb dades limitades. Primer, estudiem el biaix cap a paraules clau abusives en models entrenats per a la detecció de llenguatge abusiu. Amb este propòsit, proposem dos mètodes per a extraure paraules clau potencialment abusives de col·leccions de textos. Després avaluem el biaix cap a les paraules clau extretes i com es pot modificar este biaix per a influir en el rendiment de la detecció de llenguatge abusiu. L'anàlisi i les conclusions d'este treball revelen evidència que és possible mitigar el biaix i que esta reducció pot afectar positivament l'acompliment dels models. No obstant això, notem que no és possible establir una correspondència similar entre la variació del biaix i l'acompliment dels models quan hi ha escassetat dades amb les tècniques de reducció del biaix estudiades. En segon lloc, investiguem l'ús de xarxes neuronals basades en grafs per a detectar llenguatge abusiu. D'una banda, proposem una estratègia de representació textual dissenyada amb l'objectiu d'obtindre un espai de representació en el qual els textos abusius puguen distingir-se fàcilment d'altres textos. D'altra banda, avaluem la capacitat de models basats en xarxes neuronals convolucionals basades en grafs per a classificar textos abusius. La següent part de la nostra investigació se centra en analitzar com l'augment de dades pot influir en el rendiment de la detecció del llenguatge abusiu. Per a això, investiguem dues tècniques ben conegudes basades en el principi de minimització del risc en el veïnatge d'instàncies originals i proposem una variant per a una d'elles. 
A més, avaluem tècniques simples basades en el reemplaçament de sinònims, inserció aleatòria, intercanvi aleatori i eliminació aleatòria de paraules. Les contribucions d'esta tesi destaquen el potencial de les xarxes neuronals basades en grafs i de les tècniques d'augment de dades per a millorar la detecció del llenguatge abusiu, especialment quan hi ha limitació de dades. Estes contribucions han sigut publicades en revistes i conferències internacionals. / [EN] Abusive language detection is a task that has become increasingly important in the modern digital age, where communication takes place via various online platforms. The increase in online interactions has led to an increase in the occurrence of abusive language. Addressing such content is crucial to maintaining a safe and inclusive online environment. However, this task faces several challenges that make it a complex and ongoing area of research and development. In particular, detecting abusive language in environments with sparse data poses an additional challenge, since the development of accurate automated systems often requires large annotated datasets. In this thesis we investigate different aspects of abusive language detection, paying particular attention to environments with limited data. First, we study the bias toward abusive keywords in models trained for abusive language detection. To this end, we propose two methods for extracting potentially abusive keywords from datasets. We then evaluate the bias toward the extracted keywords and how this bias can be modified in order to influence abusive language detection performance. The analysis and conclusions of this work reveal evidence that it is possible to mitigate the bias and that such a reduction can positively affect the performance of the models. However, we notice that it is not possible to establish a similar correspondence between bias mitigation and model performance in low-resource settings with the studied bias mitigation techniques. 
Second, we investigate the use of models based on graph neural networks to detect abusive language. On the one hand, we propose a text representation framework designed with the aim of obtaining a representation space in which abusive texts can be easily distinguished from other texts. On the other hand, we evaluate the ability of models based on convolutional graph neural networks to classify abusive texts. The next part of our research focuses on analyzing how data augmentation can influence the performance of abusive language detection. To this end, we investigate two well-known techniques based on the principle of vicinal risk minimization and propose a variant for one of them. In addition, we evaluate simple techniques based on the operations of synonym replacement, random insertion, random swap, and random deletion. The contributions of this thesis highlight the potential of models based on graph neural networks and data augmentation techniques to improve abusive language detection, especially in low-resource settings. These contributions have been published in several international conferences and journals. / This research work was partially funded by the Spanish Ministry of Science and Innovation under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). The authors thank also the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025. This work was done in the framework of the research project on Fairness and Transparency for equitable NLP applications in social media, funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making EuropePI. FairTransNLP research project (PID2021-124361OB-C31) funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making Europe. 
Part of the work presented in this article was performed during the first author’s research visit to the University of Mannheim, supported through a Contact Fellowship awarded by the DAAD scholarship program “STIBET Doktoranden”. / Peña Sarracén, GLDL. (2024). On the Keyword Extraction and Bias Analysis, Graph-based Exploration and Data Augmentation for Abusive Language Detection in Low-Resource Settings [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/203266 / Compendio Redes neuronales de grafos Optimización de recursos Detección de lenguaje abusivo Extracción de palabras clave Análisis de sesgos Aumento de datos Keyword Extraction Bias Analysis Abusive language detection Graph neural networks Data augmentation Low resource settings LENGUAJES Y SISTEMAS INFORMATICOS
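The four simple text augmentation operations listed in the English abstract of entry 80 (synonym replacement, random insertion, random swap, and random deletion) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the toy `SYNONYMS` table, the function names, and the parameters `n` and `p` are all hypothetical, and a real system would draw synonyms from a lexical resource.

```python
import random

# Toy synonym table, purely illustrative (an assumption, not from the thesis).
SYNONYMS = {"bad": ["awful", "terrible"], "people": ["folks"]}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """Insert n synonyms of known words at random positions."""
    out = list(words)
    known = [w for w in words if w in SYNONYMS]
    for _ in range(n):
        if not known:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]
```

A typical use is to call one or more of these on a tokenized training sentence, for example `random_swap("these people are bad".split(), 1, random.Random(0))`, and add the result to the training set with the original label. Passing an explicit `random.Random` instance keeps the augmentation reproducible.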
