
Data-Centric Methods for Machine Learning on QSAR Data

Machine learning has increasingly shifted towards data-centric approaches, recognizing that data quality is crucial to the effectiveness of the resulting models. One significant threat to data quality is the presence of outliers. This study therefore investigates the impact of various outlier detection algorithms on the performance of machine learning models applied to QSAR datasets. Using methods such as Isolation Forest (IF), Local Outlier Factor (LOF), and One-Class Support Vector Machine (OCSVM), the aim was to explore and evaluate these methods, identify potential outliers, and assess their influence on model predictions. The study used both synthetic data and real-world datasets, including data obtained from a pharmaceutical company and benchmark datasets. The methodology involved preprocessing the data, applying the outlier detection algorithms, and evaluating the models using traditional metrics such as Mean Squared Error (MSE), together with conformal prediction for uncertainty estimation. The results showed no major improvement from any of the algorithms, and excessive data removal led to a decline in model performance. While OCSVM performed inconsistently across datasets, LOF showed promise as a method worth further investigation. The study also highlighted several challenges, including high dimensionality, the need for hyperparameter tuning, and the limitations of current outlier detection methods. It underscores the complexity of outlier detection in QSAR data and suggests directions for future research to improve model robustness and accuracy.
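The workflow described in the abstract (flag outliers, drop them from the training set, retrain, compare MSE) could be sketched roughly as follows. This is an illustration only, not the thesis code: it assumes scikit-learn, and the synthetic descriptor matrix, the random forest regressor, and every hyperparameter value (contamination, nu, n_neighbors) are placeholders chosen for the example. Conformal prediction is not shown.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Assumption: a synthetic stand-in for a QSAR table
# (300 "compounds", 20 numeric descriptors, one continuous endpoint).
X = rng.normal(size=(300, 20))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=300)
# Perturb the descriptors of a few rows so there is something to detect.
X[:15] += rng.normal(scale=6.0, size=(15, 20))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumption: illustrative settings, not tuned and not taken from the thesis.
detectors = {
    "IF": IsolationForest(contamination=0.05, random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
    "OCSVM": OneClassSVM(nu=0.05, gamma="scale"),
}


def fit_and_score(X_tr, y_tr):
    """Train a regressor on the given rows and return MSE on the fixed test set."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    return mean_squared_error(y_test, model.predict(X_test))


# Baseline: no rows removed.
results = {"baseline": (len(X_train), fit_and_score(X_train, y_train))}

for name, detector in detectors.items():
    # fit_predict labels each training row: +1 = inlier, -1 = outlier.
    labels = detector.fit_predict(X_train)
    keep = labels == 1
    results[name] = (int(keep.sum()), fit_and_score(X_train[keep], y_train[keep]))

for name, (n_kept, mse) in results.items():
    print(f"{name:8s} kept={n_kept:3d}  test MSE={mse:.3f}")
```

Outliers are removed from the training split only, and the test set is left untouched, so the MSE values are compared on the same held-out compounds. Whether filtering helps at all depends on the dataset and the settings.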

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-532770
Date: January 2024
Creators: Sawas, Hala
Publisher: Uppsala universitet, Institutionen för farmaceutisk biovetenskap
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
