  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Machine Learning Survival Models : Performance and Explainability

Alabdallah, Abdallah January 2023
Survival analysis is an essential field of statistics and machine learning, with critical applications such as medical research and predictive maintenance. In these domains, understanding a model's predictions is paramount. While machine learning techniques are increasingly applied to improve the predictive performance of survival models, they also sacrifice transparency and explainability. Unlike regression and classification models, which predict point estimates, survival models predict functions. This makes it challenging to explain such models with off-the-shelf machine learning explanation techniques such as Shapley values and counterfactual examples. Censoring is also a major issue in survival analysis: the target time variable is not fully observed for all subjects. Moreover, in predictive maintenance settings, recorded events do not always map to actual failures, since a component may be replaced because an expert considers it faulty or about to fail. Censoring and noisy labels create modeling and evaluation problems that must be addressed during the development and evaluation of survival models. Considering these challenges and the differences from regular machine learning models, this thesis aims to bridge the gap by facilitating the use of machine learning explanation methods to produce plausible and actionable explanations for survival models. It also aims to enhance survival modeling and evaluation, giving better insight into the differences among the compared survival models. In this thesis, we propose two methods for explaining survival models. Both rely on discovering survival patterns in the model's predictions that group the studied subjects into significantly different survival groups.
Each pattern reflects a specific survival behavior common to all the subjects in its group. We use these patterns to explain the predictions of the studied model in two ways. In the first, we employ a classification proxy model that captures the relationship between the descriptive features of subjects and the learned survival patterns. Explaining this proxy model using Shapley values provides insight into how features contribute to membership in a specific survival pattern. In the second method, we address the "what if?" question by generating plausible and actionable counterfactual examples that would change the predicted pattern of the studied subject. Such counterfactual examples show which actionable changes are required to improve the survivability of subjects. We also propose a variational-inference-based generative model for estimating the time-to-event distribution. The model relies on a regression-based loss function that can handle censored cases, and on sampling to estimate the conditional probability of event times. Moreover, we propose a decomposition of the C-index into a weighted harmonic mean of two quantities: the concordance among the observed events and the concordance between observed and censored cases. These two quantities, weighted by a factor representing the balance between them, can reveal differences between survival models that remain hidden when only the total concordance index is used. This gives insight into the performance of different models and its relation to the characteristics of the studied data. Finally, as part of enhancing survival modeling, we propose an algorithm that can correct erroneous event labels in predictive maintenance time-to-event data. We adopt an expectation-maximization-like approach that uses a genetic algorithm to find better labels, that is, labels that maximize the survival model's performance.
Over the iterations, the algorithm builds confidence in the event assignments, which improves the search in subsequent iterations until convergence. We performed experiments on real and synthetic data showing that our proposed methods improve performance in survival modeling and can reveal the underlying factors that explain survival models' behavior and performance.
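
The C-index decomposition described in the abstract above can be checked numerically. The sketch below is an illustration, not the thesis's code. It assumes a simplified Harrell-style C-index in which pairs with tied times are skipped and tied risk scores count as half concordant. The function names, the toy data, and the weight `alpha` (the share of concordant pairs that are event-event pairs) are assumptions, chosen so that the weighted harmonic mean reproduces the total C-index exactly. The abstract does not state the exact weighting used in the thesis.

```python
from itertools import combinations


def concordance_parts(times, events, risks):
    """Count [concordant, comparable] pairs, split by pair type.

    events[i] is 1 if the event was observed and 0 if censored.
    A higher risk score is expected to go with a shorter time.
    "ee": both subjects have observed events.
    "ec": the earlier subject has an event, the later one is censored.
    """
    counts = {"ee": [0.0, 0], "ec": [0.0, 0]}
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is only comparable
        # when the earlier subject actually experienced the event.
        if times[a] == times[b] or not events[a]:
            continue
        kind = "ee" if events[b] else "ec"
        counts[kind][1] += 1
        if risks[a] > risks[b]:
            counts[kind][0] += 1.0
        elif risks[a] == risks[b]:
            counts[kind][0] += 0.5
    return counts


def decomposed_c_index(times, events, risks):
    """Return total C-index, its two parts, alpha, and the harmonic form."""
    parts = concordance_parts(times, events, risks)
    (conc_ee, n_ee), (conc_ec, n_ec) = parts["ee"], parts["ec"]
    ci_ee = conc_ee / n_ee  # concordance among observed events
    ci_ec = conc_ec / n_ec  # concordance between events and censored cases
    total = (conc_ee + conc_ec) / (n_ee + n_ec)
    # Assumed weight: fraction of concordant pairs that are event-event.
    alpha = conc_ee / (conc_ee + conc_ec)
    harmonic = 1.0 / (alpha / ci_ee + (1.0 - alpha) / ci_ec)
    return total, ci_ee, ci_ec, alpha, harmonic


# Toy data (hypothetical): six subjects, two of them censored.
times = [2, 4, 5, 7, 9, 11]
events = [1, 1, 0, 1, 0, 1]
risks = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]

total, ci_ee, ci_ec, alpha, harmonic = decomposed_c_index(times, events, risks)
print(f"total={total:.4f} ee={ci_ee:.4f} ec={ci_ec:.4f} alpha={alpha:.4f}")
```

With this choice of `alpha`, `harmonic` equals `total` up to floating-point error, because alpha/CI_ee and (1 - alpha)/CI_ec reduce to the comparable-pair counts of each type divided by the total number of concordant pairs. Two models with the same total C-index can still differ in `ci_ee` and `ci_ec`, which is the kind of difference the abstract says the decomposition reveals.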
2

Computationally Efficient Explainable AI: Bayesian Optimization for Computing Multiple Counterfactual Explanations / Beräkningsmässigt Effektiv Förklarbar AI: Bayesiansk Optimering för Beräkning av Flera Motfaktiska Förklaringar

Sacchi, Giorgio January 2023
In recent years, advanced machine learning (ML) models have revolutionized industries ranging from healthcare to retail and e-commerce. However, these models have become increasingly complex, making it difficult even for domain experts to understand and retrace a model's decision-making process. To address this challenge, several frameworks for explainable AI have been proposed and developed. This thesis focuses on counterfactual explanations (CFEs), which provide actionable insights by informing users how to modify inputs to achieve desired outputs. However, computing CFEs for a general black-box ML model is computationally expensive, since it hinges on solving a challenging optimization problem. To solve this optimization problem efficiently, we propose using Bayesian optimization (BO) and introduce the novel algorithm Separated Bayesian Optimization (SBO). SBO exploits the formulation of the counterfactual function as a composite function. Additionally, we propose warm-starting SBO, which addresses the computational challenges of computing multiple CFEs. By decoupling the generation of a surrogate model for the black-box model from the computation of specific CFEs, warm-starting SBO allows us to reuse previous data and computations, resulting in computational savings and improved efficiency for large-scale applications. Through numerical experiments, we demonstrate that BO is a viable optimization scheme for computing CFEs for black-box ML models. BO achieves computational efficiency while maintaining good accuracy. SBO improves upon this by requiring fewer evaluations while achieving accuracies comparable to the best conventional optimizer tested. Both BO and SBO handle various classes of ML decision models better than the tested baseline optimizers.
Finally, warm-starting SBO significantly enhances the performance of SBO, reducing function evaluations and errors when computing multiple sequential CFEs. The results indicate a strong potential for large-scale industry applications. / In recent years, advanced machine learning (ML) models have seen great success in many parts of industry, from the healthcare sector to retail and e-commerce. Alongside this development, however, the complexity of these ML models has grown, so that even domain experts now struggle to understand and interpret the models' decision processes. To address this problem, several explainable AI frameworks have been developed. This thesis focuses on counterfactual explanations (CFEs), a type of explanation that tells users how to modify their input data to obtain a particular model decision. For a general black-box ML model, however, computing CFEs is computationally expensive because it requires solving a challenging optimization problem. To solve this optimization problem, we propose using Bayesian optimization (BO) and present the new algorithm Separated Bayesian Optimization (SBO). SBO exploits the composite formulation of the counterfactual function. We further explore the computation of multiple sequential CFEs, for which we present warm-started SBO. Warm-started SBO reuses data and computations from previous CFEs by separating the surrogate model of the black-box ML model from the computation of individual CFEs. This property reduces computational cost and increases efficiency for large-scale applications. In the experiments conducted, we show that BO is a suitable optimization method for computing CFEs for black-box ML models, thanks to good computational efficiency combined with high accuracy.
SBO performed even better, with fewer function evaluations on average and error levels comparable to the best conventional optimization method tested. Both BO and SBO showed a better capacity to handle different classes of ML models than the other methods tested. Finally, we observed that warm-started SBO gave further performance gains, with fewer function evaluations and lower errors when multiple CFEs were computed. These results point to great potential for large-scale industry applications.
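
The abstract above describes the counterfactual objective as a composite function: an expensive black-box model wrapped in a cheap, known outer function. The following sketch illustrates that structure only. It does not implement BO or SBO: a plain random search stands in for the optimizer. The toy classifier `black_box`, the outer function `outer`, the penalty weight `lam`, and all numeric values are hypothetical choices made for illustration.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for an opaque classifier; returns P(class 1)."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.5
    return 1.0 / (1.0 + math.exp(-z))


def outer(p, x, x0, target=1.0, lam=0.1):
    """Cheap, known outer function: prediction loss plus distance penalty."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
    return (p - target) ** 2 + lam * dist


def counterfactual_objective(x, x0):
    # Composite form f(x) = g(h(x), x): only h (black_box) is costly to call.
    return outer(black_box(x), x, x0)


def random_search(x0, n_iter=2000, scale=1.0, seed=0):
    """Random search around x0 (a simple substitute for BO/SBO)."""
    rng = random.Random(seed)
    best_x, best_f = list(x0), counterfactual_objective(x0, x0)
    for _ in range(n_iter):
        cand = [v + rng.gauss(0.0, scale) for v in x0]
        f = counterfactual_objective(cand, x0)
        if f < best_f:
            best_x, best_f = cand, f
    return best_x, best_f


x0 = [-1.0, 1.0]  # original input, predicted as class 0 by the toy model
cf, value = random_search(x0)
print("original prediction:", round(black_box(x0), 3))
print("counterfactual:", [round(v, 3) for v in cf])
print("counterfactual prediction:", round(black_box(cf), 3))
```

Because the outer function is known in closed form, a method could model only `black_box` with a surrogate and reuse that surrogate for other inputs `x0`. This is one plausible reading of the decoupling that warm-starting relies on, though the abstract does not give the details.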
