About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Evaluation of Different Imputation Methods for Preparing Large Data Sets, Using the 2013 SrV Study as an Example

Meister, Romy 09 March 2016 (has links)
Missing values are a serious problem in surveys. The literature suggests replacing them with realistic values by means of imputation methods. This master's thesis examines four imputation techniques with regard to their ability to handle missing data: mean imputation, conditional mean imputation, the Expectation-Maximization (EM) algorithm, and the Markov chain Monte Carlo (MCMC) method. The first three of these methods were additionally evaluated in a simulation based on a large real data set. To assess the quality of the techniques, a metric variable of the original data set was selected in which missing values were generated, covering different percentages of missingness and the common missing-data mechanisms. After the simulated missing values had been replaced, several statistical parameters of the completed data sets, such as quantiles, the arithmetic mean, and the variance, were calculated and compared with the corresponding parameters of the original data set. The results of the empirical analysis show that the Expectation-Maximization algorithm estimates all considered statistical parameters of the complete data set far better than the other imputation methods examined, even though the assumption of a multivariate normal distribution was not met. Both mean imputation and conditional mean imputation yield reliable estimates of the arithmetic mean under the assumption of missing completely at random (MCAR), whereas other parameters, such as the variance, are not recovered comparably well. In general, the accuracy of the estimators from all three imputation methods decreases as the percentage of missingness increases. These results lead to the conclusion that the Expectation-Maximization algorithm should be preferred over mean and conditional mean imputation.
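The evaluation design described in this abstract lends itself to a small demonstration. The following Python sketch is an illustration of that design, not the thesis code: scikit-learn's SimpleImputer stands in for mean imputation, and IterativeImputer serves as a stand-in for a model-based method in the spirit of the EM approach (it is not an exact EM implementation); the synthetic data, missingness rates, and target column are assumptions made for the example.

```python
# Illustrative sketch (not the thesis code): simulate MCAR missingness in one
# metric column, impute it with two methods, and compare summary statistics
# with those of the complete data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the survey data: three correlated metric variables.
n = 5_000
latent = rng.normal(size=n)
X_full = np.column_stack([
    latent + rng.normal(scale=0.5, size=n),   # target column for missingness
    2.0 * latent + rng.normal(scale=0.7, size=n),
    -latent + rng.normal(scale=0.9, size=n),
])

for rate in (0.05, 0.20, 0.40):               # varying shares of missingness
    X = X_full.copy()
    mcar = rng.random(n) < rate               # MCAR: independent of all values
    X[mcar, 0] = np.nan

    for name, imputer in [
        ("mean imputation", SimpleImputer(strategy="mean")),
        ("iterative (EM-like)", IterativeImputer(random_state=0)),
    ]:
        col = imputer.fit_transform(X)[:, 0]
        print(f"{rate:4.0%} missing | {name:<20} | "
              f"mean {col.mean():+.3f} (true {X_full[:, 0].mean():+.3f}) | "
              f"var {col.var():.3f} (true {X_full[:, 0].var():.3f})")
```

Consistent with the findings reported above, mean imputation keeps the column mean roughly unbiased under MCAR while visibly shrinking the variance, and the estimates from both methods degrade as the share of missingness grows.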
2

Identifying Induced Bias in Machine Learning

Chowdhury Mohammad Rakin Haider (18414885) 22 April 2024 (has links)
<p dir="ltr">The last decade has witnessed an unprecedented rise in the application of machine learning in high-stake automated decision-making systems such as hiring, policing, bail sentencing, medical screening, etc. The long-lasting impact of these intelligent systems on human life has drawn attention to their fairness implications. A majority of subsequent studies targeted the existing historically unfair decision labels in the training data as the primary source of bias and strived toward either removing them from the dataset (de-biasing) or avoiding learning discriminatory patterns from them during training. In this thesis, we show label bias is not a necessary condition for unfair outcomes from a machine learning model. We develop theoretical and empirical evidence showing that biased model outcomes can be introduced by a range of different data properties and components of the machine learning development pipeline.</p><p dir="ltr">In this thesis, we first prove that machine learning models are expected to introduce bias even when the training data doesn’t include label bias. We use the proof-by-construction technique in our formal analysis. We demonstrate that machine learning models, trained to optimize for joint accuracy, introduce bias even when the underlying training data is free from label bias but might include other forms of disparity. We identify two data properties that led to the introduction of bias in machine learning. They are the group-wise disparity in the feature predictivity and the group-wise disparity in the rates of missing values. The experimental results suggest that a wide range of classifiers trained on synthetic or real-world datasets are prone to introducing bias under feature disparity and missing value disparity independently from or in conjunction with the label bias. We further analyze the trade-off between fairness and established techniques to improve the generalization of machine learning models such as adversarial training, increasing model complexity, etc. We report that adversarial training sacrifices fairness to achieve robustness against noisy (typically adversarial) samples. We propose a fair re-weighted adversarial training method to improve the fairness of the adversarially trained models while sacrificing minimal adversarial robustness. Finally, we observe that although increasing model complexity typically improves generalization accuracy, it doesn’t linearly improve the disparities in the prediction rates.</p><p dir="ltr">This thesis unveils a vital limitation of machine learning that has yet to receive significant attention in FairML literature. Conventional FairML literature reduces the ML fairness task to as simple as de-biasing or avoiding learning discriminatory patterns. However, the reality is far away from it. Starting from deciding on which features collect up to algorithmic choices such as optimizing robustness can act as a source of bias in model predictions. It calls for detailed investigations on the fairness implications of machine learning development practices. In addition, identifying sources of bias can facilitate pre-deployment fairness audits of machine learning driven automated decision-making systems.</p>
