Statistical contribution to the virtual multicriteria optimisation of combinatorial molecules libraries and to the validation and application of QSAR models

This thesis develops an integrated methodology, based on the desirability index and QSAR models, for the virtual optimisation of molecules. Statistical and algorithmic tools are proposed to search huge collections of compounds obtained by combinatorial chemistry for the most promising ones.
First, once the drugability properties of interest have been precisely defined, QSAR models are developed to mimic the relationship between the properties to be optimised and the chemical descriptors of the molecules. The literature on QSAR models is reviewed, and the statistical tools used to validate the models and to analyse their fit and predictive power are detailed.
Even when a QSAR model has been validated and appears highly predictive, we emphasise the importance of measuring extrapolation by defining its applicability domain, and of quantifying the prediction error for a given molecule. Indeed, QSAR models are often applied on a massive scale to predict drugability properties for libraries of new compounds, without regard for the reliability of each individual prediction.
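As an illustration of what an applicability-domain check can look like, the sketch below uses the simplest common definition: a molecule is inside the domain if each of its descriptors lies within the range seen in the training set. This range-based rule, the function names and the descriptor values are assumptions made for the example; the abstract does not say which definition the thesis adopts.

```python
def descriptor_ranges(training):
    """Return the (min, max) of each descriptor over the training molecules."""
    columns = list(zip(*training))
    return [(min(col), max(col)) for col in columns]


def in_domain(molecule, ranges):
    """True if every descriptor lies inside the training range (assumed rule)."""
    return all(low <= value <= high for value, (low, high) in zip(molecule, ranges))


# Hypothetical training set: two descriptors per molecule.
training = [[1.0, 200.0], [2.5, 350.0], [4.0, 500.0]]
ranges = descriptor_ranges(training)

inside = in_domain([3.0, 300.0], ranges)   # both descriptors within range
outside = in_domain([6.0, 300.0], ranges)  # first descriptor is extrapolated
```

A prediction for the second molecule would be an extrapolation under this rule and should be treated with caution, whatever the model's overall validation statistics.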
Then, a desirability index measures the compromise between the multiple estimated drugability properties and makes it possible to rank the molecules of the combinatorial library in order of preference. The propagation of the models' prediction error to the desirability index is quantified by a confidence interval that can be constructed under general conditions for linear regression, PLS regression or regression tree models. This fills an important gap in the desirability index literature, which treats the index as exact.
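To make the ranking step concrete, the following sketch computes a desirability index as the geometric mean of individual desirabilities, using a one-sided "larger is better" transformation in the style of Derringer and Suich. The choice of transformation, the bounds, the property names and the predicted values are all assumptions for illustration; the thesis may use different desirability functions.

```python
import math


def larger_is_better(y, low, high, shape=1.0):
    """One-sided desirability: 0 below `low`, 1 above `high`, a power curve between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** shape


def desirability_index(desirabilities):
    """Geometric mean of the individual desirabilities (0 if any property fails)."""
    if any(d == 0.0 for d in desirabilities):
        return 0.0
    return math.exp(sum(math.log(d) for d in desirabilities) / len(desirabilities))


# Hypothetical QSAR predictions: (potency, solubility) for three molecules.
predictions = {"A": (7.0, 40.0), "B": (8.0, 10.0), "C": (6.0, 60.0)}

indexes = {
    name: desirability_index([
        larger_is_better(potency, low=5.0, high=9.0),
        larger_is_better(solubility, low=0.0, high=100.0),
    ])
    for name, (potency, solubility) in predictions.items()
}

# Rank the molecules from most to least desirable.
ranking = sorted(indexes, key=indexes.get, reverse=True)
```

Because the index is a geometric mean, a molecule that fails completely on one property gets an index of zero regardless of how well it does on the others. Note that this sketch treats the predictions as exact, which is precisely the simplification the thesis argues against.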
Finally, a new efficient algorithm (WEALD) is proposed to virtually screen the combinatorial library and retain the molecules with the highest desirability indexes.
Each explored molecule is checked for membership in the applicability domain of each QSAR model.
In addition, the uncertainty of the desirability index of each explored molecule is taken into account by gathering the molecules that cannot be distinguished from the optimal one because of the propagation of the QSAR models' prediction error. Those molecules do not have a significantly smaller desirability than the optimal molecule found by WEALD.
This constitutes another important improvement in the use of the desirability index as a tool to compare solutions in a multicriteria optimisation problem.
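The idea of gathering molecules that cannot be distinguished from the optimum can be sketched as follows, assuming each explored molecule already carries a desirability index and a confidence interval. The overlap rule used here (keep a molecule if its upper bound reaches the lower bound of the best molecule) is a simple stand-in chosen for the example, not the significance criterion of the thesis, and the data are invented.

```python
def equivalent_to_best(candidates):
    """Names of molecules whose interval overlaps that of the best molecule."""
    best = max(candidates, key=lambda c: c["index"])
    return [c["name"] for c in candidates if c["upper"] >= best["lower"]]


# Hypothetical desirability indexes with confidence bounds.
candidates = [
    {"name": "A", "index": 0.80, "lower": 0.70, "upper": 0.88},
    {"name": "B", "index": 0.75, "lower": 0.62, "upper": 0.85},
    {"name": "C", "index": 0.50, "lower": 0.40, "upper": 0.60},
]

retained = equivalent_to_best(candidates)
```

Here molecule B is retained alongside the best molecule A because their intervals overlap, whereas C is clearly less desirable and is dropped.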
This integrated methodology has been developed in the context of lead optimisation and is illustrated on a real combinatorial library provided by Eli Lilly and Company. This is the main application of the thesis. Nevertheless, as the results on desirability index uncertainty hold under general conditions, they can be applied to any multicriteria optimisation problem, such as those that often arise in industry.

Identifier: oai:union.ndltd.org:BICfB/oai:ucl.ac.be:ETDUCL:BelnUcetd-01032008-172816
Date: 07 January 2008
Creators: Le Bailly de Tilleghem, Céline
Publisher: Universite catholique de Louvain
Source Sets: Bibliothèque interuniversitaire de la Communauté française de Belgique
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-01032008-172816/
Rights: unrestricted. I agree that the text of the thesis (hereafter "the work"), subject to the parts covered by confidentiality, may be published in the UCL electronic thesis collection. To this end, I grant UCL a licence covering: - the right to fix and reproduce the work on an electronic medium: ETD/db software; - the right to communicate the work to the public. This licence, free of charge and non-exclusive, is valid for the entire duration of the literary and artistic property rights, including any extensions, and for the whole world. I retain all other rights for the reproduction and communication of the thesis, as well as the right to use it in future work. I certify that I have obtained, in accordance with copyright legislation and the requirements of image rights, all authorisations necessary for the reproduction in my thesis of images, texts and/or any work protected by copyright, and that I have obtained the authorisations necessary for their communication to third parties. Where a third party holds an intellectual property right over all or part of my thesis, I certify that I have obtained that party's written authorisation to exercise the rights mentioned above.
