Drug development is generally a long and costly process, and one major factor is estimating protein-drug binding affinity. Machine learning is a promising approach in this field because it can streamline prediction and reduce the need for expensive experimental methods. Machine learning methods have already enabled significant advances in predicting protein-drug binding affinity, yet there remains room for improvement. The primary challenge is the quality of the data used to train these models. In this work, two ensemble machine learning models, Random Forest and Extreme Gradient Boosting Machine, were tested and compared on a recent database of protein-ligand complex features calculated from molecular dynamics simulations. Additional features were also extracted from the PDB database using PLIP (Protein-Ligand Interaction Profiler), with the aim of improving the predictions further. The results indicate that while the features from the PDB database provided strong predictive power, including features from molecular dynamics simulations did not improve the models’ performance.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-130089 |
Date | January 2024 |
Creators | Guttormsson, Guðmundur Andri, Le Gallo, Léa |
Publisher | Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |