Return to search

Data mining methods for the prediction of intestinal absorption using QSAR

Oral administration is the most common route for administration of drugs. With the growing cost of drug discovery, the development of Quantitative Structure-Activity Relationships (QSAR) as computational methods to predict oral absorption is highly desirable for cost effective reasons. The aim of this research was to develop QSAR models that are highly accurate and interpretable for the prediction of oral absorption. In this investigation the problems addressed were datasets with unbalanced class distributions, feature selection and the effects of solubility and permeability towards oral absorption prediction. Firstly, oral absorption models were obtained by overcoming the problem of unbalanced class distributions in datasets using two techniques, under-sampling of compounds belonging to the majority class and the use of different misclassification costs for different types of misclassifications. Using these methods, models with higher accuracy were produced using regression and linear/non-linear classification techniques. Secondly, the use of several pre-processing feature selection methods in tandem with decision tree classification analysis – including misclassification costs – were found to produce models with better interpretability and higher predictive accuracy. These methods were successful to select the most important molecular descriptors and to overcome the problem of unbalanced classes. Thirdly, the roles of solubility and permeability in oral absorption were also investigated. This involved expansion of oral absorption datasets and collection of in vitro and aqueous solubility data. This work found that the inclusion of predicted and experimental solubility in permeability models can improve model accuracy. However, the impact of solubility on oral absorption prediction was not as influential as expected. Finally, predictive models of permeability and solubility were built to predict a provisional Biopharmaceutic Classification System (BCS) class using two multi-label classification techniques, binary relevance and classifier chain. The classifier chain method was shown to have higher predictive accuracy by using predicted solubility as a molecular descriptor for permeability models, and hence better final provisional BCS prediction. Overall, this research has resulted in predictive and interpretable models that could be useful in a drug discovery context.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:638485
Date January 2014
CreatorsNewby, Danielle Anne
ContributorsGhafourian, Taravat; Freitas, Alex
PublisherUniversity of Kent
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttps://kar.kent.ac.uk/47600/

Page generated in 0.0109 seconds