A STUDY ON THE IMPACT OF PREPROCESSING STEPS ON MACHINE LEARNING MODEL FAIRNESS

<p dir="ltr">The success of machine learning techniques in widespread applications has taught us that, with respect to accuracy, more data yields a better model. For fairness, however, data quality is perhaps more important than quantity. Existing studies have considered the impact of data preprocessing on the accuracy of ML models, but its impact on the fairness of the downstream model has been neither studied nor well understood. In this thesis, we conduct a systematic study of how data quality issues and data preprocessing steps affect model fairness. Our study evaluates several preprocessing techniques across several machine learning models, trained on datasets with different characteristics and assessed with several fairness metrics. It examines data preparation techniques such as encoding categorical values as numbers, imputing missing values, and treating outliers. Fairness is measured with metrics that check whether the model treats all groups equally, predicts outcomes fairly, and gives similar chances to everyone. By testing these methods on various types of data, the thesis identifies which combinations of techniques can make models both accurate and fair. The empirical analysis demonstrated that preprocessing steps such as one-hot encoding, imputation of missing values, and outlier treatment significantly influence fairness metrics. Specifically, models preprocessed with median imputation and robust scaling exhibited the most balanced performance across fairness and accuracy metrics, suggesting a potential best-practice guideline for equitable ML model preparation. This work thus sheds light on the importance of data preparation in ML and emphasizes the need for careful handling of data to support fair and ethical use of ML in society.</p>
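As a rough illustration of the two preprocessing steps singled out in the abstract, the sketch below applies median imputation and robust (median/IQR) scaling to a toy feature, then measures one group-fairness gap. It is not the thesis's code: the function names, the toy data, the threshold "model", and the choice of demographic parity difference as the metric are all assumptions made for this example, since the abstract does not name its metrics or datasets.

```python
from statistics import median, quantiles


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid division by zero on constant features
    center = median(values)
    return [(v - center) / iqr for v in values]


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest positive-prediction rate per group."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy feature with missing values and one extreme outlier.
incomes = [30.0, None, 50.0, 70.0, None, 1000.0]
groups = ["a", "a", "a", "b", "b", "b"]

imputed = median_impute(incomes)
scaled = robust_scale(imputed)

# Stand-in for a trained model: predict positive when the scaled value is > 0.
predictions = [1 if s > 0 else 0 for s in scaled]
gap = demographic_parity_difference(predictions, groups)
print(f"demographic parity difference: {gap:.3f}")
```

Swapping `median_impute` or `robust_scale` for another strategy (for example mean imputation or min-max scaling) and re-running the last lines is one simple way to see how a preprocessing choice alone can move a fairness metric.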

10.25394/pgs.25608363.v1

Data engineering and data science

Data quality

data preprocessing workflow

ML pipeline

ML Fairness

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/25608363
Date	17 April 2024
Creators	Sathvika Kotha (18370548)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/A_STUDY_ON_THE_IMPACT_OF_PREPROCESSING_STEPS_ON_MACHINE_LEARNING_MODEL_FAIRNESS/25608363
