  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Towards Fairness-Aware Online Machine Learning from Imbalanced Data Streams

Sadeghi, Farnaz 10 August 2023
Online supervised learning from fast-evolving imbalanced data streams has applications in many areas. In particular, developing techniques that can handle highly skewed class distributions (or 'class imbalance') is an important area of research in domains such as manufacturing, the environment, and health. Solutions should be able to analyze large repositories in near real-time and provide accurate models to describe rare classes that may appear infrequently or in bursts while continuously accommodating new instances. Although numerous online learning methods have been proposed to handle binary class imbalance, solutions suitable for evolving multi-class streams with varying degrees of imbalance have received limited attention. To address this knowledge gap, the first contribution of this thesis introduces the Online Learning from Imbalanced Multi-Class Streams through Dynamic Sampling (DynaQ) algorithm for learning in such multi-class imbalanced settings. Our approach uses a queue-based learning method that dynamically creates an instance queue for each class. The number of instances is balanced by maintaining a queue threshold and removing older samples during training. In addition, new and rare classes are dynamically added to the training process as they appear. Our experimental results confirm a noticeable improvement in minority-class detection and classification performance. A comparative evaluation shows that the DynaQ algorithm outperforms state-of-the-art approaches. Our second contribution in this thesis focuses on fairness-aware learning from imbalanced streams. Our work is motivated by the observation that the decisions made by online learning algorithms may negatively impact individuals or communities. Indeed, the development of approaches to handle these concerns is an active area of research in the machine learning community.
However, most existing methods process the data in offline settings and are not directly suitable for online learning from evolving data streams. Further, these techniques fail to take the effects of class imbalance on fairness-aware supervised learning into account. In addition, recent fairness-aware online supervised learning approaches focus on one sensitive attribute only, which may lead to subgroup discrimination. In a fair classification, the equality of fairness metrics across multiple overlapping groups must be considered simultaneously. In our second contribution, we thus address the combined problem of fairness-aware online learning from imbalanced evolving streams, while considering multiple sensitive attributes. To this end, we introduce the Multi-Sensitive Queue-based Online Fair Learning (MQ-OFL) algorithm, an online fairness-aware approach that maintains valid and fair models over evolving streams. MQ-OFL changes the training distribution in an online fashion based on both the stream imbalance and the discriminatory behavior of the model evaluated over the historical stream. We compare our MQ-OFL method with state-of-the-art studies on real-world datasets and present comparative insights on the performance. Our final contribution focuses on explainability and interpretability in fairness-aware online learning. This research is guided by concerns about the black-box nature of models, which conceals their internal logic from users. This lack of transparency poses practical and ethical challenges, particularly when these algorithms make decisions in domains such as finance, healthcare, and marketing. Systems that rely on complex machine learning algorithms and sensitive data may introduce biases and prejudices during the learning phase. Consequently, decision models trained on such data may make unfair decisions, and it is important to identify such issues before deploying the models.
To address this issue, we introduce techniques for interpreting the outcomes of fairness-aware online learning. Through a case study predicting income based on features such as ethnicity, biological sex, age, and education level, we use global and local surrogate models to demonstrate how our fairness-aware learning process (MQ-OFL) balances the trade-off between accuracy and discrimination.
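
The queue-based sampling idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's DynaQ implementation: the class name `PerClassQueues`, the `threshold` parameter, and the toy stream are all assumptions made for illustration. It only shows one bounded first-in-first-out queue per class, with a queue created when a new label first appears and the oldest instance dropped once the threshold is reached.

```python
from collections import deque


class PerClassQueues:
    """Hypothetical per-class FIFO buffer; not the thesis's implementation."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed: maximum instances kept per class
        self.queues = {}            # class label -> deque of recent instances

    def add(self, x, y):
        # A label seen for the first time gets its own queue on the fly.
        if y not in self.queues:
            self.queues[y] = deque(maxlen=self.threshold)
        # With maxlen set, appending to a full deque drops the oldest item.
        self.queues[y].append(x)

    def training_set(self):
        # Flatten all queues into (instance, label) pairs for the learner.
        return [(x, y) for y, q in self.queues.items() for x in q]


# Toy stream: ten majority-class instances followed by one rare-class instance.
buffer = PerClassQueues(threshold=3)
stream = [(i, "majority") for i in range(10)] + [(100, "rare")]
for x, y in stream:
    buffer.add(x, y)

counts = {y: len(q) for y, q in buffer.queues.items()}
print(counts)
```

In this sketch the majority class is capped at the threshold while the rare class keeps every instance seen so far, so the set handed to the learner is far less skewed than the raw stream. How the real algorithm chooses the threshold and schedules training is not stated in the abstract.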
2

Benchmarking bias mitigation algorithms in representation learning through fairness metrics

Reddy, Charan 07 1900
The rapid adoption and success of deep learning models in various application domains have raised significant questions about the fairness of these models when deployed in the real world.
Recent research has revealed the biases encoded by representation learning algorithms, raising doubts about the reliability of such approaches for decision-making. As a result, there is a growing interest in identifying the sources of bias in learning algorithms and developing bias-mitigation techniques. Bias-mitigation algorithms aim to reduce the influence of sensitive data features on eligibility decisions. Sensitive features are private and protected features of a dataset, such as gender or race, that should not influence output eligibility decisions, i.e., the criteria that determine whether or not an individual is qualified for a particular task, such as lending or hiring. Bias-mitigation models are designed to make eligibility decisions on dataset samples without bias toward sensitive attributes of the input data. The difficulty of bias-mitigation tasks is often determined by the dataset distribution, which in turn is a function of the potential label and feature imbalance, the correlation of potentially sensitive features with other features in the data, the distribution shift from training to the development phase, and other factors. Without evaluating bias-mitigation models in various challenging setups, their merits remain unclear. Consequently, a systematic analysis is required that compares different bias-mitigation approaches under various fairness metrics to ensure that the reported results can be replicated. To this end, this thesis proposes a unified framework for comparing bias-mitigation methods. To better understand how these methods work, we evaluate different fairness methods trained with deep neural networks on a common synthetic dataset and a real-world dataset. In particular, we train around 3000 distinct models in various setups, including imbalanced and correlated data configurations, to probe the limits of current models and better understand in which setups they are prone to failure.
Our findings show that model bias increases as datasets become more imbalanced or dataset attributes become more correlated, that the level of dominance of correlated sensitive dataset features influences bias, and that sensitive information remains in the latent representation even after bias-mitigation algorithms are applied. In summary, we present a dataset, propose multiple challenging evaluation setups, rigorously evaluate recent promising bias-mitigation algorithms in a common framework, and publicly release this benchmark as a common entry point for fair deep learning.
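
The abstract does not say which fairness metrics the benchmark uses. As one common example of such a metric, the sketch below computes a demographic parity gap, the largest difference in positive-prediction rate between groups of a sensitive attribute. The function names and the toy predictions are assumptions for illustration only.

```python
def positive_rate(predictions, groups, group):
    # Share of positive (1) predictions among members of one group.
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    # Gap between the highest and lowest group-wise positive rate;
    # 0.0 means every group receives positive predictions equally often.
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Toy data: binary predictions and a two-valued sensitive attribute.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, groups)
print(gap)  # → 0.5 (rate 0.75 for group "a" versus 0.25 for group "b")
```

A gap near zero on this metric does not by itself show that sensitive information is absent from a learned representation, which is consistent with the abstract's finding that such information can persist after mitigation.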
