  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

An analysis of text-based machine learning models for vulnerability detection

Napier, Kollin Ryne 12 May 2023
With an increase in the complexity of software, developers rely more on reuse and dependencies in their source code via code snippets. As a result, it is becoming harder to identify and mitigate vulnerabilities. Although traditional analysis tools are still utilized, machine learning models are being adopted to expand efforts and combat such threats. Given the possibilities such models offer, research in this area has introduced various approaches that vary in usability and predictive performance. In generalizing models toward a more natural-language approach, researchers have opted to train models on source code to identify existing and potential vulnerabilities. Exploratory research has been performed by treating source code as plain text, creating "text-based" models. Motivated by the goal of preventing vulnerable code snippets, we present a dissertation on the effectiveness of text-based machine learning models for vulnerability detection. We utilize datasets composed of open-source projects and vulnerability types to generate our own training and testing data via extracted function pairings. Using this data, we evaluate a series of text-based machine learning models, coupled with natural language processing (NLP) techniques and our own data processing methods. Through empirical research, we demonstrate the effectiveness of such models based on statistical evidence. From these results, we determine negative correlations and identify "cross-cutting" features. Finally, we present an analysis of models with "cross-cutting" feature removal to improve performance while providing explainability for model decisions.
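The "text-based" approach the abstract describes — treating source code as plain text and vectorizing its tokens — can be sketched with a minimal TF-IDF encoder. The snippets and weighting below are illustrative assumptions, not taken from the dissertation's datasets or pipeline:

```python
import math
import re
from collections import Counter

def tokenize(code: str) -> list[str]:
    # Treat source code as plain text: pull out identifier-like tokens.
    return re.findall(r"[A-Za-z_]\w*", code)

def tfidf(snippets: list[str]) -> list[dict[str, float]]:
    # Classic TF-IDF over a corpus of code snippets.
    token_lists = [tokenize(s) for s in snippets]
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        total = len(toks)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical function pairing: a vulnerable call and its patched form.
snippets = [
    "strcpy(buf, user_input);",
    "strncpy(buf, user_input, sizeof(buf));",
]
vecs = tfidf(snippets)
# Tokens shared by both snippets ("buf", "user_input") get zero weight;
# the distinguishing calls (strcpy vs. strncpy) carry the signal a
# downstream classifier would learn from.
```

A real text-based model would feed such vectors (or learned embeddings) into a classifier trained on labeled vulnerable/fixed function pairs.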

A Critical View on the Interpretability of Machine Learning Models

Santo, Jorge Luiz Cataldo Falbo 29 July 2019
As machine learning models penetrate critical areas like medicine, the criminal justice system, and financial markets, their opacity, which hampers humans' ability to interpret most of them, has become a problem to be solved. In this work, we present a new taxonomy for classifying any method, approach, or strategy that deals with the problem of interpretability of machine learning models. The proposed taxonomy fills a gap in current taxonomy frameworks regarding the subjective perception that different interpreters have of the same model. To evaluate the proposed taxonomy, we classified the contributions of relevant scientific articles in the area.

Predicting future mental states from digital phenotyping data

Jean, Thierry 12 1900
Digital phenotyping leverages the numerous sensors of smartphones (e.g., accelerometer, GPS, Bluetooth, call metadata) to measure daily human behavior without interference and link it to psychiatric symptoms and mental health indicators. Machine learning is an integral component of processing raw signals into intelligible information for clinicians. This approach emerges from a desire to characterize symptom profiles and their temporal variations at an individual level. This project consisted of predicting mental health variables (e.g., stress, mood, sociability, hallucination) up to seven days in the future from smartphone data for patients with a diagnosis of schizophrenia. The CrossCheck dataset, with a sample of 62 participants, was used. It includes 23,551 days of phone sensor data with 29 features and 6,364 mental state self-reports on 4-point ordinal scales. Ordinal predictive models were used to generate discrete predictions that can be interpreted directly on the clinical data collection scale. In total, 240 machine learning models were trained, covering the combinations of 10 mental health variables, 3 forecast horizons (same day, next day, next week), 2 algorithms (XGBoost, LSTM), and 4 learning tasks (binary classification, continuous regression, multiclass classification, ordinal regression).
The ordinal and binary models performed significantly better than the baseline and the two other tasks, with a macro-averaged mean absolute error between 0.767 and 1.436 and a balanced accuracy between 58% and 73%. The results show a dominant effect of class imbalance on predictive performance and highlight that metrics that do not account for it systematically overestimate performance. This analysis anchors a series of broader considerations about the use of artificial intelligence in healthcare. In particular, assessing the clinical value of machine learning solutions presents distinctive challenges compared to conventional treatments. The growing role of digital technologies in mental health has implications for a person's autonomy, sense-making, and agency over their own experience.
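The imbalance-aware metrics this abstract relies on — macro-averaged mean absolute error and balanced accuracy — can be sketched in a few lines. The toy labels below are illustrative; the CrossCheck data itself is not reproduced here:

```python
from collections import defaultdict

def macro_mae(y_true, y_pred):
    # Per-class mean absolute error, averaged over classes, so rare
    # classes weigh as much as common ones.
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    return sum(sum(e) / len(e) for e in errors.values()) / len(errors)

def balanced_accuracy(y_true, y_pred):
    # Mean per-class recall: a majority-class predictor cannot inflate it.
    hits, counts = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)

# Imbalanced 4-point ordinal scale (0-3): always predicting the majority
# class scores 8/11 (~73%) on plain accuracy, but the imbalance-aware
# metrics expose the failure on the minority classes.
y_true = [0] * 8 + [1, 2, 3]
y_pred = [0] * 11
print(macro_mae(y_true, y_pred))          # 1.5
print(balanced_accuracy(y_true, y_pred))  # 0.25
```

This is exactly the overestimation effect the abstract points to: a metric that ignores imbalance rewards the degenerate predictor.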

Explaining Generative Adversarial Network Time Series Anomaly Detection using Shapley Additive Explanations

Cher Simon 10 July 2024
Anomaly detection is an active research field with wide commercial application in detecting unusual patterns or outliers. Time series anomaly detection provides valuable insights into mission- and safety-critical applications using ever-growing temporal data, including continuous streaming time series from the Internet of Things (IoT), sensor networks, healthcare, stock prices, computer metrics, and application monitoring. While Generative Adversarial Networks (GANs) demonstrate promising results in time series anomaly detection, the opaque nature of generative deep learning models lacks explainability and hinders broader adoption. Understanding the rationale behind model predictions and providing human-interpretable explanations are vital for increasing confidence and trust in machine learning (ML) frameworks such as GANs. This study conducted a structured and comprehensive assessment of post-hoc local explainability in GAN-based time series anomaly detection using SHapley Additive exPlanations (SHAP). Using publicly available benchmarking datasets approved by Purdue's Institutional Review Board (IRB), this study evaluated state-of-the-art GAN frameworks, identifying their advantages and limitations for time series anomaly detection. This study demonstrated a systematic approach to quantifying the extent of GAN-based time series anomaly explainability, providing insights for businesses considering the adoption of generative deep learning models. The presented results show that GANs capture complex temporal distributions in time series and are applicable for anomaly detection. The analysis shows that SHAP can identify the significance of contributing features within time series data and derive post-hoc explanations that quantify GAN-detected time series anomalies.
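SHAP's core idea — Shapley additive attributions, where each feature's contribution is its marginal effect averaged over all feature orderings — can be computed exactly for a small, hypothetical anomaly score. The score function here is an illustrative stand-in for a GAN's reconstruction-based anomaly score, which is not reproduced from the study:

```python
import itertools

def shapley_values(score, baseline, instance):
    # Exact Shapley attribution for a handful of features: average each
    # feature's marginal contribution over all orderings, filling
    # "absent" features with their baseline values.
    n = len(instance)
    phi = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for order in perms:
        x = list(baseline)
        prev = score(x)
        for i in order:
            x[i] = instance[i]
            cur = score(x)
            phi[i] += cur - prev
            prev = cur
    return [p / len(perms) for p in phi]

# Hypothetical anomaly score over a 3-step time-series window: squared
# deviation from the expected level.
expected = [1.0, 1.0, 1.0]
score = lambda x: sum((xi - ei) ** 2 for xi, ei in zip(x, expected))

anomaly = [1.0, 4.0, 1.0]  # the spike sits at the middle time step
phi = shapley_values(score, expected, anomaly)
# The middle feature receives essentially all of the attribution, and the
# attributions sum to score(anomaly) - score(expected) (SHAP's
# "additivity" property).
```

Practical SHAP implementations approximate this sum (e.g., via sampling) because exact enumeration grows factorially with the number of features.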

Explainable Artificial Intelligence for Radio Resource Management Systems: A diverse feature importance approach

Marcu, Alexandru-Daniel January 2022
The field of wireless communications is arguably one of the most rapidly developing technological fields, and with each new advancement, the complexity of wireless systems can grow significantly. This phenomenon is most visible in mobile communications, where the current 5G and 6G radio access networks (RANs) have reached unprecedented complexity levels to satisfy diverse, increasing demands. In such increasingly complex environments, managing resources becomes ever more challenging, so experts have employed performant artificial intelligence (AI) techniques to aid radio resource management (RRM) decisions. However, these AI techniques are often difficult for humans to understand, and they may receive unimportant inputs that unnecessarily increase their complexity. In this work, we propose an explainability pipeline meant to increase humans' understanding of AI models for RRM, as well as to reduce the complexity of these models without loss of performance. To achieve this, the pipeline generates diverse feature importance explanations of the models with the help of three explainable artificial intelligence (XAI) methods — Kernel SHAP, CERTIFAI, and Anchors — and performs an importance-based feature selection using one of three different strategies. In the case of Anchors, we formulate and utilize a new way of computing feature importance scores, since no current publication in the XAI literature suggests a way to do this. Finally, we applied the proposed pipeline to a reinforcement learning (RL)-based RRM system. Our results show that we could reduce the complexity of the RL model by between ~27.5% and ~62.5% according to different metrics, without loss of performance. Moreover, we showed that the explanations produced by our pipeline can be used to answer some of the most common XAI questions about our RL model, thus increasing its understandability. Lastly, we achieved an unprecedented result showing that our RL agent could be completely replaced with Anchors rules when taking RRM decisions, without a significant loss of performance, but with a considerable gain in understandability.
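One simple strategy for the importance-based feature selection step described above is to rank features by their importance scores and keep only those that account for most of the total importance mass. The feature names and scores below are purely illustrative placeholders for an RRM agent's inputs, not values from the thesis:

```python
def select_features(importances: dict[str, float], keep_fraction: float = 0.9) -> list[str]:
    # Keep the highest-scoring features until they account for
    # keep_fraction of the total importance mass; drop the rest.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(v for _, v in ranked)
    kept, acc = [], 0.0
    for name, value in ranked:
        if acc >= keep_fraction * total:
            break
        kept.append(name)
        acc += value
    return kept

# Hypothetical per-feature scores, e.g. averaged |Kernel SHAP| values.
scores = {"sinr": 40, "buffer": 30, "cqi": 20, "time_of_day": 7, "noise": 3}
print(select_features(scores))  # ['sinr', 'buffer', 'cqi']
```

Here two of five inputs are dropped (a 40% reduction), and the retrained model's performance on the reduced input set determines whether the cut was safe — the kind of trade-off the pipeline's metrics are meant to verify.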
