
Chemical informatics of prohibited substances

This thesis lies in the field of chemoinformatics, in particular Quantitative Structure-Activity Relationships (QSAR), data mining and machine learning algorithms. The work is divided into seven chapters. The first chapter gives an overview of the categories of substances on the Prohibited List, together with a history of doping in sport over the last century and an account of how anti-doping agencies were set up to combat doping and punish offenders. Chapter 2 introduces chemoinformatics, the concept of <i>“molecular similarity”</i>, descriptors, chemical space and feature selection, and shows how classification algorithms can be used to partition this chemical space into islands of bioactivity. The basic approach outlined in this chapter is followed throughout much of the thesis. Chapter 3 presents the WADA 2005 dataset, with pictorial representations of the most and least likely structures in each of the ten prohibited classes. The objective of this chapter is to test whether industry-standard two-dimensional chemical descriptors and classification algorithms can correctly categorise the different classes of substance in the WADA 2005 Prohibited List. Chapter 4 focuses on the development of an ultrafast hybrid chemical descriptor that takes into account both two- and three-dimensional information. The descriptor is used in a virtual screening study to rank molecules in a database taken from the National Cancer Institute (NCI) by their likelihood of being active. Chapter 5 introduces a novel classification algorithm, SVILP (Support Vector Inductive Logic Programming), which is based on inductive logic programming and a Support Vector Machine. SVILP is compared to MOLPRINT 2D on a well-known benchmark dataset of around one hundred thousand molecules with eleven bioactivity classes. In Chapter 6 the concept of rule-based learning is applied to the WADA 2005 Prohibited List.
In the first half of this chapter the PART rule-based learner is used to classify the prohibited substances and to generate rules based on two-dimensional chemical descriptors, giving some insight into why the substances belong to a specific class. These results add more meaning to the simple <i>“yes/no”</i> classification responses obtained in earlier chapters. In the second half of this chapter rule learning is extended to identify novel, statistically unusual subgroups of WADA substances using the CN2-SD algorithm.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:597272
Date: January 2008
Creators: Cannon, E. O.
Publisher: University of Cambridge
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
