Machine learning has attracted considerable attention and has been extended to the field of drug discovery. However, because of uncertainties in the data, the reliability of predictions should be analyzed quantitatively. Conformal prediction is a powerful method for quantifying these uncertainties: for a predefined confidence level, it generates a corresponding interval within which the true target is expected to fall. This paper explores how the chemical representation of SMILES structures used for training (chemical descriptors, Morgan fingerprints), the machine learning algorithm (k-nearest neighbor, support vector machine, random forest, extreme gradient boosting, and artificial neural network), and the normalization method (k-nearest neighbor, Mondrian regression) influence conformal prediction results. We find that Morgan fingerprints outperform chemical descriptors and that Mondrian regression outperforms k-nearest neighbor on one or more of the evaluation measures: coverage and the mean, median, and standard deviation of the output interval. None of the investigated machine learning methods substantially outperforms the others. Conformal predictive systems, an alternative form of conformal prediction, were also investigated to explore their usefulness in drug discovery.
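As an illustration of the interval construction described in the abstract, the following is a minimal sketch of split (inductive) conformal regression. It is not the thesis's code: the function name `split_conformal_interval`, the synthetic data, and the straight-line model are assumptions made for this example, and the sketch uses plain absolute residuals without the k-nearest neighbor or Mondrian normalization that the thesis compares.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, test_pred, confidence=0.9):
    """Return (lower, upper) bounds around test_pred.

    Hypothetical helper: uses unnormalized absolute residuals on a
    held-out calibration set as nonconformity scores.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    # Nonconformity scores: absolute calibration errors, sorted ascending.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample corrected rank: the ceil((n + 1) * confidence)-th smallest score.
    rank = int(np.ceil((n + 1) * confidence))
    if rank > n:
        # Too few calibration points for this confidence level.
        half_width = np.inf
    else:
        half_width = scores[rank - 1]

    return test_pred - half_width, test_pred + half_width


if __name__ == "__main__":
    # Synthetic data (assumption): a noisy linear relationship.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, 300)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, 300)

    # Fit a straight line on the first 100 points (proper training set).
    coef = np.polyfit(x[:100], y[:100], 1)
    pred = np.polyval(coef, x)

    # Calibrate on the next 100 points, evaluate on the last 100.
    lower, upper = split_conformal_interval(
        y[100:200], pred[100:200], pred[200:], confidence=0.9
    )
    coverage = np.mean((y[200:] >= lower) & (y[200:] <= upper))
    print(f"empirical coverage: {coverage:.2f}")
    print(f"interval width: {float(upper[0] - lower[0]):.2f}")
```

Every test interval in this sketch has the same width, because the residuals are not scaled per sample. Normalization methods such as those named in the abstract aim to vary the width per sample; coverage and interval statistics are the kind of measures the abstract reports.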
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-531352 |
Date | January 2024 |
Creators | Chen, Yuhang |
Publisher | Uppsala universitet, Institutionen för farmaceutisk biovetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |