Analysis of the impact of mutations and prediction of their pathogenicity

Inherited diseases and cancer are often characterized by single DNA base mutations that can result in altered gene expression, altered mRNA splicing, or changes to protein structure. Of these, the effect of structural changes on protein function, and its relation to disease, is the easiest to understand. Pathogenic deviations (PDs) are mutations reported to be disease-causing, while true single nucleotide polymorphisms (SNPs) are understood to have a negligible effect on phenotype. Recent developments in biotechnology, most notably the increased reliability and speed of sequencing, have yielded a wealth of information on SNPs and PDs. Quite apart from the challenge of analysing this information with a view to identifying novel therapies and targets for disease, simply storing, mapping, and processing these data is a significant challenge in itself. This thesis builds on earlier work in the Martin group in which a database (SAAPdb) was developed to map mutation data to protein structure and allow the likely local structural effects of a mutation to be evaluated. The thesis begins with a general introduction to the relevant biology (Chapter 1) and to bioinformatics tools and resources (Chapter 2). Chapter 3 describes the Single Amino Acid Polymorphism database (SAAPdb) and outlines the work done to fix bugs and update the data. Despite this work, owing to continuous maintenance problems identified when updating the program, the Martin group has now switched to a ‘pipeline’ version that no longer relies on any pre-calculated data stored in a database. Earlier work performed during a Master's project showed that some of the analyses were extremely sensitive to structural details. These analyses have been updated and extended, confirming the earlier results. Consequently, some of the analyses were updated to replace Boolean True/False (Good/Bad) assignments with energy or pseudo-energy values.
A pseudo-energy potential was developed for evaluating the effects of mutations to proline or from glycine (Chapter 4), and a new full-energy method for assessing the effects of side-chain clashes was evaluated (Chapter 5). A method that uses the structural analysis data together with random forests to predict whether a mutation will be damaging was then developed (Chapter 6). This method was demonstrated to be better than all competing individual methods. A variation of this approach was used to distinguish between two phenotypes, hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM), caused by mutations in the cardiac beta-myosin gene (MYH7; Chapter 7). The thesis finishes with a general discussion and conclusions (Chapter 8).

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:631945
Date: January 2014
Creators: Al-Numair, N. S.
Publisher: University College London (University of London)
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://discovery.ucl.ac.uk/1435701/
