Global ETD Search

201	Deterministic Parallel Global Parameter Estimation for a Model of the Budding Yeast Cell Cycle Panning, Thomas D. 18 August 2006. Two parallel deterministic direct search algorithms are combined to find improved parameters for a system of differential equations designed to simulate the cell cycle of budding yeast. Comparing the model simulation results to experimental data is difficult because most of the experimental data is qualitative rather than quantitative. An algorithm to convert simulation results to mutant phenotypes is presented. Vectors of the 143 parameters defining the differential equation model are rated by a discontinuous objective function. Parallel results on a 2200 processor supercomputer are presented for a global optimization algorithm, DIRECT, a local optimization algorithm, MADS, and a hybrid of the two. A second formulation is presented that uses a system of smooth inequalities to evaluate the phenotype of a mutant. Preliminary results of this formulation are given. / Master of Science computational biology MADS algorithm direct search DIRECT algorithm
202	A computational framework for protein-DNA binding discovery. January 2010. Wong, Ka Chun. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 109-121). / Abstracts in English and Chinese. / Abstract / Acknowledgements / List of Figures / List of Tables
/ Chapter 1 --- Introduction / Chapter 1.1 --- Motivation / Chapter 1.2 --- Objective / Chapter 1.3 --- Methodology / Chapter 1.4 --- Bioinformatics / Chapter 1.5 --- Computational Methods / Chapter 1.5.1 --- Evolutionary Algorithms / Chapter 1.5.2 --- Data Mining for TF-TFBS bindings
/ Chapter 2 --- Background / Chapter 2.1 --- Gene Transcription / Chapter 2.1.1 --- Protein-DNA Binding / Chapter 2.1.2 --- Existing Methods / Chapter 2.1.3 --- Related Databases / Chapter 2.1.3.1 --- TRANSFAC - Experimentally Determined Database / Chapter 2.1.3.2 --- cisRED - Computational Determined Database / Chapter 2.1.3.3 --- ORegAnno - Community Driven Database / Chapter 2.2 --- Evolutionary Algorithms / Chapter 2.2.1 --- Representation / Chapter 2.2.2 --- Parent Selection / Chapter 2.2.3 --- Crossover Operators / Chapter 2.2.4 --- Mutation Operators / Chapter 2.2.5 --- Survival Selection / Chapter 2.2.6 --- Termination Condition / Chapter 2.2.7 --- Discussion / Chapter 2.2.8 --- Examples / Chapter 2.2.8.1 --- Genetic Algorithm / Chapter 2.2.8.2 --- Genetic Programming / Chapter 2.2.8.3 --- Differential Evolution / Chapter 2.2.8.4 --- Evolution Strategy / Chapter 2.2.8.5 --- Swarm Intelligence / Chapter 2.3 --- Association Rule Mining / Chapter 2.3.1 --- Objective / Chapter 2.3.2 --- Apriori Algorithm / Chapter 2.3.3 --- Partition Algorithm / Chapter 2.3.4 --- DHP / Chapter 2.3.5 --- Sampling / Chapter 2.3.6 --- Frequent Pattern Tree
/ Chapter 3 --- Discovering Protein-DNA Binding Sequence Patterns Using Association Rule Mining / Chapter 3.1 --- Materials and Methods / Chapter 3.1.1 --- Association Rule Mining and Apriori Algorithm / Chapter 3.1.2 --- Discovering associated TF-TFBS sequence patterns / Chapter 3.1.3 --- Data Preparation / Chapter 3.2 --- Results and Analysis / Chapter 3.2.1 --- Rules Discovered / Chapter 3.2.2 --- Quantitative Analysis / Chapter 3.2.3 --- Annotation Analysis / Chapter 3.2.4 --- Empirical Analysis / Chapter 3.2.5 --- Experimental Analysis / Chapter 3.3 --- Verifications / Chapter 3.3.1 --- Verification by PDB / Chapter 3.3.2 --- Verification by Homology Modeling / Chapter 3.3.3 --- Verification by Random Analysis / Chapter 3.4 --- Discussion
/ Chapter 4 --- Designing Evolutionary Algorithms for Multimodal Optimization / Chapter 4.1 --- Introduction / Chapter 4.2 --- Problem Definition / Chapter 4.2.1 --- Minimization / Chapter 4.2.2 --- Maximization / Chapter 4.3 --- An Evolutionary Algorithm with Species-specific Explosion for Multimodal Optimization / Chapter 4.3.1 --- Background / Chapter 4.3.1.1 --- Species Conserving Genetic Algorithm / Chapter 4.3.2 --- Evolutionary Algorithm with Species-specific Explosion / Chapter 4.3.2.1 --- Species Identification / Chapter 4.3.2.2 --- Species Seed Delta Evaluation / Chapter 4.3.2.3 --- Stage Switching Condition / Chapter 4.3.2.4 --- Species-specific Explosion / Chapter 4.3.2.5 --- Calculate Explosion Weights / Chapter 4.3.3 --- Experiments / Chapter 4.3.3.1 --- Performance measurement / Chapter 4.3.3.2 --- Parameter settings / Chapter 4.3.3.3 --- Results / Chapter 4.3.4 --- Conclusion / Chapter 4.4 --- A Crowding Genetic Algorithm with Spatial Locality for Multimodal Optimization / Chapter 4.4.1 --- Background / Chapter 4.4.1.1 --- Crowding Genetic Algorithm / Chapter 4.4.1.2 --- Locality of Reference / Chapter 4.4.2 --- Crowding Genetic Algorithm with Spatial Locality / Chapter 4.4.2.1 --- Motivation / Chapter 4.4.2.2 --- Offspring generation with spatial locality / Chapter 4.4.3 --- Experiments / Chapter 4.4.3.1 --- Performance measurements / Chapter 4.4.3.2 --- Parameter setting / Chapter 4.4.3.3 --- Results / Chapter 4.4.4 --- Conclusion
/ Chapter 5 --- Generalizing Protein-DNA Binding Sequence Representations and Learning using an Evolutionary Algorithm for Multimodal Optimization / Chapter 5.1 --- Introduction and Background / Chapter 5.2 --- Problem Definition / Chapter 5.3 --- Crowding Genetic Algorithm with Spatial Locality / Chapter 5.3.1 --- Representation / Chapter 5.3.2 --- Crossover Operators / Chapter 5.3.3 --- Mutation Operators / Chapter 5.3.4 --- Fitness Function / Chapter 5.3.5 --- Distance Metric / Chapter 5.4 --- Experiments / Chapter 5.4.1 --- Parameter Setting / Chapter 5.4.2 --- Search Space Estimation / Chapter 5.4.3 --- Experimental Procedure / Chapter 5.4.4 --- Results and Analysis / Chapter 5.4.4.1 --- Generalization Analysis / Chapter 5.4.4.2 --- Verification By PDB / Chapter 5.5 --- Conclusion
/ Chapter 6 --- Predicting Protein Structures on a Lattice Model using an Evolutionary Algorithm for Multimodal Optimization / Chapter 6.1 --- Introduction / Chapter 6.2 --- Problem Definition / Chapter 6.3 --- Representation / Chapter 6.4 --- Related Works / Chapter 6.5 --- Crowding Genetic Algorithm with Spatial Locality / Chapter 6.5.1 --- Motivation / Chapter 6.5.2 --- Customization / Chapter 6.5.2.1 --- Distance metrics / Chapter 6.5.2.2 --- Handling infeasible conformations / Chapter 6.6 --- Experiments / Chapter 6.6.1 --- Performance Metrics / Chapter 6.6.2 --- Parameter Settings / Chapter 6.6.3 --- Results / Chapter 6.7 --- Conclusion
/ Chapter 7 --- Conclusion and Future Work / Chapter 7.1 --- Thesis Contribution / Chapter 7.2 --- Future Work / Chapter A --- Appendix / Chapter A.1 --- Problem Definition in Chapter 3 / Bibliography / Author's Publications
DNA-binding proteins--Analysis Computer algorithms Computational biology--Methodology DNA-Binding Proteins--analysis Algorithms Computational Biology--methods
203	Evaluation of Annotation Performances between Automated and Curated Databases of E.COLI Using the Correlation Coefficient Marpuri, ReddySalilaja 01 August 2009. This project compared the performance of the correlation coefficient to show similarities in annotations between a predictive automated bacterial annotation database and the curated EcoCyc database. EcoCyc is a conservative multidimensional annotation system that is exclusively based on experimentally validated findings from over 15,000 publications. The automated annotation system used in the comparison was BASys. It is often used as a first-pass annotation tool that tries to add as many annotations as possible by drawing upon over 30 information sources. Gene ontology served as one basis of comparison between these databases because of the limited common terms in the ontology annotations. Translation libraries were used to extend the number of BASys terms that could be compared to the gene ontology terms in EcoCyc. Additional non-ontology terms and metadata in BASys were compared to EcoCyc terms after parsing them into root words. The different term sources were quantitatively compared by using the correlation coefficient as the evaluation metric. The direct gene ontology comparison gave the lowest correlation coefficient. The addition of gene ontology terms to BASys by using translation tables of metadata greatly increased the correlation coefficient, which was comparable to the parsed word comparison. The combination of enhanced gene ontology and parsed word methods gave the highest correlation coefficient of 0.16. The controlled vocabulary system of gene ontology was not sufficient to compare two annotated databases. The addition of gene ontology terms from translation libraries greatly increased the performance of these comparisons. In general, as the number of comparison terms increased, the correlation coefficient increased.
Future comparisons should include the enhanced gene ontology dataset in order to monitor the organization pertaining to formal nomenclature, and the datasets generated from word parsing can be used to monitor the degree to which additional terms might be incorporated with translation libraries. EcoCyc gene ontology BASys biological databases computational biology genome biology Computational Biology Genomics Molecular genetics
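As a rough illustration of the comparison described in record 203, the sketch below scores the agreement between two annotation sources with a correlation coefficient. The abstract does not say which coefficient was used; Pearson correlation over binary term-presence vectors (the phi coefficient) is assumed here, and the function name, gene names, and terms are hypothetical.

```python
from math import sqrt


def phi_coefficient(source_a, source_b, universe):
    """Pearson correlation of two binary presence vectors.

    `source_a` and `source_b` are sets of (gene, term) pairs; `universe`
    is every (gene, term) pair under consideration. This is an assumed
    stand-in for the unspecified coefficient in the abstract.
    """
    n11 = n10 = n01 = n00 = 0
    for item in universe:
        in_a, in_b = item in source_a, item in source_b
        if in_a and in_b:
            n11 += 1
        elif in_a:
            n10 += 1
        elif in_b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant vector has no defined correlation; report 0.0 in that case.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical toy data: two genes, three candidate terms.
genes = ["geneA", "geneB"]
terms = ["kinase", "membrane", "transport"]
universe = [(g, t) for g in genes for t in terms]
curated = {("geneA", "kinase"), ("geneA", "membrane"), ("geneB", "transport")}
automated = {("geneA", "kinase"), ("geneB", "transport"), ("geneB", "membrane")}
score = phi_coefficient(curated, automated, universe)
```

Enlarging the set of comparable terms (for example through translation tables) changes `universe` and both sets, which is the kind of effect on the coefficient that the abstract reports.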
204	Prediction of secondary structures for large RNA molecules Mathuriya, Amrita 12 January 2009. The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA can be executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and an improved thermodynamic model in GTfold. GTfold is written in C/C++ and freely available as open source from our website. Computational biology Parallel algorithms Viral RNA Proof Molecular biology Computer simulation Computational biology RNA viruses Protein folding Algorithms
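To give a feel for the dynamic programming behind secondary structure prediction in record 204, here is a minimal sketch of the much simpler Nussinov base-pair maximization recurrence. It is not GTfold's minimum free energy algorithm and has no thermodynamic model or internal loop handling; the function name and the `min_loop` parameter are assumptions made for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (Nussinov-style DP).

    Runs in O(n^3) time. `min_loop` is the minimum number of unpaired
    bases enclosed by any pair (a hypothetical parameter name).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = nussinov_max_pairs("GGGAAAUCC")
```

Each anti-diagonal of `dp` depends only on shorter spans, so cells with the same span can be filled independently; that independence is the sort of concurrency a shared-memory implementation can exploit.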
205	Smart Sequence Similarity Search (S⁴) system Chen, Zhuo 01 January 2004. Sequence similarity searching is commonly used to help clarify the biochemical and physiological features of newly discovered genes or proteins. An efficient similarity search relies on the choice of tools and their associated subprograms and numerous parameter settings. To assist researchers in selecting optimal programs and parameter settings for efficient sequence similarity searches, the web-based expert system Smart Sequence Similarity Search (S⁴) was developed. Smart (Computer file) Sequential analysis -- Methods Computer programs Computational biology DNA -- Analysis Proteins -- Analysis Computational Biology Computer Sciences
206	Computational protein design: assessment and applications Li, Zhixiu January 2015. Indiana University-Purdue University Indianapolis (IUPUI) / Computational protein design aims at designing amino acid sequences that can fold into a target structure and perform a desired function. Many computational design methods have been developed, and their applications have been successful during the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques to assess several state-of-the-art computational techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively. We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of predicted sequences is comparable to RosettaDesign in terms of sequence identity to wild-type sequences. In the last two application chapters, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and de novo designed barstar. Several peptides were confirmed to inhibit EcMetAP at micromolar-range 50% inhibitory concentrations. Meanwhile, the assessment of designed barstar sequences indicates the improvement of OSCAR-design over RosettaDesign. Computational protein design Energy function Machine learning Self-inhibitory peptide Sequence profile Inhibitor Protein engineering Protein engineering -- Methods Proteins -- Conformation Protein folding Computational biology Computational biology Computational biology -- Methods Machine learning -- Technique
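Record 206 compares design methods by sequence identity to wild-type sequences. A minimal sketch of that measure follows, assuming the designed and wild-type sequences are already aligned and of equal length; the function name and the example sequences are hypothetical and are not taken from the dissertation.

```python
def sequence_identity(designed, wild_type):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(wild_type):
        raise ValueError("sequences must be aligned to the same length")
    if not designed:
        return 0.0
    matches = sum(a == b for a, b in zip(designed, wild_type))
    return matches / len(designed)


# Hypothetical eight-residue example differing at one position.
identity = sequence_identity("MKTAYIAK", "MKTAYLAK")
```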
207	Analysis of macromolecular structure through experiment and computation Gossett, John Jared 08 April 2013. This thesis covers a wide variety of projects within the domain of computational structural biology. Structural biology is concerned with the molecular structure of proteins and nucleic acids, and the relationship between structure and biological function. We used molecular modeling and simulation, a purely computational approach, to study DNA-linked molecular nanowires. We developed a computational tool that allows potential designs to be screened for viability, and then we used molecular dynamics (MD) simulations to test their stability. As an example of using molecular modeling to create experimentally testable hypotheses, we were able to suggest a new design based on pyrrylene vinylene monomers. In another project, we combined experiments and molecular modeling to gain insight into factors that influence the kinetic binding dynamics of fibrin "knob" peptides and complementary "holes." Molecular dynamics simulations provided helpful information about potential peptide structural conformations and intrachain interactions that may influence binding properties. The remaining projects discussed in this thesis all deal with RNA structure. The underlying approach for these studies is a recently developed chemical probing technology called selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE). One study focuses on ribosomal RNA, specifically the 23S rRNA from T. thermophilus. We used SHAPE experiments to show that Domain III of the T. thermophilus 23S rRNA is an independently folding domain. This first required the development of our own data processing program for generating quantitative and interpretable data from our SHAPE experiments, due to limitations of existing programs and modifications to the experimental protocol.
In another study, we used SHAPE chemistry to study the in vitro transcript of the RNA genome of satellite tobacco mosaic virus (STMV). This involved incorporating the SHAPE data into a secondary structure prediction program. The SHAPE-directed secondary structure of the STMV RNA was highly extended and considerably different from that proposed for the RNA in the intact virion. Finally, analyzing SHAPE data requires navigating a complex data processing pipeline. We review some of the various ways of running a SHAPE experiment, and how this affects the approach to data analysis. Computational biology Structural biology Molecular nanowire Ribosomal RNA Satellite tobacco mosaic virus SHAPE chemistry Macromolecular modeling Data analysis Macromolecules Analysis Computational biology Biomolecules Structure Molecular dynamics Computer simulation
208	Generalized pattern matching applied to genetic analysis. / 通用性模式匹配在基因序列分析中的應用 / CUHK electronic theses & dissertations collection / Digital dissertation consortium / Tong yong xing mo shi pi pei zai ji yin xu lie fen xi zhong de ying yong January 2011. The approximate pattern matching problem is, given a reference sequence T, a pattern (query) Q, and a maximum allowed error e, to find all the substrings in the reference such that the edit distance between the substring and the pattern is smaller than or equal to the maximum allowed error. Though it is a well-studied problem in computer science, it has seen a resurgence in bioinformatics in recent years, largely due to the emergence of next-generation high-throughput sequencing technologies. This thesis contributes a novel generalized pattern matching framework, and applies it to solve pattern matching problems in general and alternative splicing (AS) detection in particular. AS detection maps a large amount of next-generation sequencing short-read data to a reference human genome, which is the first and an important step in analyzing the sequenced data for further biological analysis. The four parts of my research are as follows. / In the first part of my research work, we propose a novel deterministic pattern matching algorithm which applies Agrep, a well-known bit-parallel matching algorithm, to a truncated suffix array. Due to the linear cost of Agrep, the cost of our approach is linear in the number of characters processed in the truncated suffix array. We analyze the matching cost theoretically, and obtain empirical costs from experiments. We carry out experiments using both synthetic and real DNA sequence data (queries) and search for them in Chromosome X of a reference human genome. The experimental results show that our approach achieves a speed-up of several orders of magnitude over the standard Agrep algorithm. / In the fourth part, we focus on the seeding strategies for alternative splicing detection.
We review the history of seeding-and-extending (SAE), and assess both theoretically and empirically the seeding strategies adopted in existing splicing detection tools, including Bowtie's heuristic and ABMapper's exact seedings, against the novel complementary quad-seeding strategy we propose and the corresponding novel splice detection tool, CS4splice, which can handle inexact seeding (with errors) and all three types of errors: mismatch (substitution), insertion, and deletion. We carry out experiments using short reads (queries) of length 105 bp, drawn from several data sets with various levels of errors, and align them back to a reference human genome (hg18). On average, CS4splice can align 88.44% (recall rate) of 427,786 short reads perfectly back to the reference, while the other existing tools achieve much smaller recall rates: SpliceMap 48.72%, MapSplice 58.41%, and ABMapper 51.39%. The accuracies of CS4splice are also the highest or very close to the highest in all the experiments carried out. However, because of the complementary quad-seeding that CS4splice uses, it takes more computational resources, about twice as much (or more) as the other alternative splicing detection tools, a cost we consider acceptable and worthwhile. / In the second part, we define a novel generalized pattern (query) and a framework of generalized pattern matching, for which we propose a heuristic matching algorithm. Simply speaking, a generalized pattern is $Q_1 G_1 Q_2 \ldots Q_{c-1} G_{c-1} Q_c$, which consists of several substrings $Q_i$ and gaps $G_i$, each gap occurring between two substrings. The prototypes of the generalized pattern come from several real biological problems that can all be modeled as generalized pattern matching problems. Based on a well-known seeding-and-extending heuristic, we propose a dual-seeding strategy, with which we solve the matching problem effectively and efficiently. We also develop a specialized matching tool called Gpattern-match.
We carry out experiments using 10,000 generalized patterns and search for them in a reference human genome (hg18). Over 98.74% of them can be recovered from the reference. It takes 1-2 seconds on average to recover a pattern, and peak memory usage is slightly more than 1 GB. / In the third part, a natural extension of the second part, we model a real biological problem, alternative splicing detection, as a generalized pattern matching problem, and solve it using a proposed bi-directional seeding-and-extending algorithm. Unlike the other tools, which depend on third-party tools, our mapping tool, ABMapper, is not only stand-alone but also performs unbiased alignments. We carry out experiments using 427,786 real next-generation sequencing short reads (queries) and align them back to a reference human genome (hg18). ABMapper achieves 98.92% accuracy and a 98.17% recall rate, and is much better than the other state-of-the-art tools: SpliceMap achieves 94.28% accuracy and a 78.13% recall rate, while TopHat achieves 88.99% accuracy and a 76.33% recall rate. When the seed length is set to 12 in ABMapper, the whole searching and alignment process takes about 20 minutes, and peak memory usage is slightly more than 2 GB. / Ni, Bing. / Adviser: Kwong-Sak Leung. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 151-161). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
Combinatorial analysis Computational biology Computer algorithms DNA--Analysis--Data processing Genetics--Methodology Matching theory Proteins--Analysis--Data processing Computational Biology--methods Sequence Analysis, DNA Sequence Analysis, Protein
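The approximate pattern matching problem stated in record 208 (find substrings of T within edit distance e of Q) can be sketched with the classic column-wise dynamic program often attributed to Sellers. This is a textbook baseline, not the thesis's Agrep-on-suffix-array method or its seeding strategies; the function and variable names are hypothetical.

```python
def approx_match_ends(text, pattern, max_err):
    """Return (end_position, distance) for every 1-based end position in
    `text` where some substring ending there is within `max_err` edits
    (substitution, insertion, deletion) of `pattern`.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern characters
    # against nothing costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 is always 0: a match may start at any text position.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # missing pattern character
        if cur[m] <= max_err:
            hits.append((j, cur[m]))
        prev = cur
    return hits


exact_hits = approx_match_ends("ACGTACGT", "ACGT", 0)
fuzzy_hits = approx_match_ends("AAGCTT", "AGT", 1)
```

The scan costs O(|T|·|Q|) time, which is why practical tools on genome-scale references rely on indexes and seeding heuristics such as those the abstract describes.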
209	Graphical representation of biological sequences and its applications. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2010. Among all existing alignment-free methods for comparing biological sequences, the sequence graphical representation provides a simple approach to view, sort, and compare gene structures. The aim of graphical representation is to display DNA or protein sequences graphically so that we can easily see how similar or how different they are. Visual comparison of sequences alone, however, is not enough for follow-up research work; a more accurate comparison is needed. This leads us to develop applications of the graphical representation for biological sequences. / In this thesis, we have two main contributions: (1) We construct a protein map with the help of our proposed new graphical representation for protein sequences. Each protein sequence can be represented as a point in this map, and cluster analysis of proteins can be performed for comparison between the points. This protein map can be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence. (2) We construct a novel genome space with biological geometry, which is a subspace of $\mathbb{R}^N$. In this space each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Our genome space will provide a new powerful tool for analyzing the classification of genomes and their phylogenetic relationships. / Yu, Chenglong. / Adviser: Luk Hing Sun. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 59-64). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader.
Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Amino acid sequence--Mathematical models Computational biology Nucleotide sequence--Mathematical models Base Sequence Computational Biology Mathematics Sequence Alignment Sequence Analysis, DNA Sequence Analysis, Protein
210	Probing Protein Dynamics Through Mutational and Computational Studies of HIV-1 Protease: A Dissertation Murzycki, Jennifer E. 15 September 2006. How proteins undergo conformational changes to bind a ligand is one of the most fundamental questions of protein biology. Molecular dynamics (MD) simulations provide a useful computational tool for studying the theoretical movements of protein in solution on nanosecond timescales. The results of these simulations can be used to guide experimental design. By correlating the theoretical models with the results of experimental studies, we can obtain a significant amount of information about protein dynamics. This study represents the application of both computational and traditional experimental techniques to study protein dynamics in HIV-1 protease. The results provide a novel mechanism for the conformational changes in proteins and address the role of residues outside the active site in protein dynamics. Additionally, these results are applied to the complex role of non-active site mutations in the development of drug resistance. Chapter II examines an invariant Thr80 at the apex of the P1 loop of HIV-1, HIV-2, and simian immunodeficiency virus protease. Sequence variability associated with human immunodeficiency virus type 1 (HIV-1) is useful for inferring structural and/or functional constraints at specific residues within the viral protease. Positions that are invariant even in the presence of drug selection define critically important residues for protease function. Three protease variants (T80V, T80N, and T80S) were examined for changes in structure, dynamics, enzymatic activity, affinity for protease inhibitors, and viral infectivity. While all three variants were structurally similar to the wild type, only T80S was functionally similar. T80V significantly decreased the ability of the enzyme to cleave a peptide substrate but maintained infectivity, while T80N abolished both activity and viral infectivity.
Additionally, T80N decreased the conformational flexibility of the flap region, as observed by simulations of molecular dynamics. Taken together, these data indicate that HIV-1 protease functions best when residue 80 is a small polar residue and that mutations to other amino acids significantly impair enzyme function, possibly by affecting the flexibility of the flap domain. Chapter III focuses on residues within the hydrophobic core of each monomer in HIV-1 protease. Many hydrophobic residues located in the core of this dimeric enzyme frequently mutate in patients undergoing protease inhibitor therapy. The mechanism by which these mutations aid the development of drug resistance is not well understood. Using MD simulations, this study suggests that the hydrophobic residues outside the active site facilitate the conformational change that occurs in HIV-1 protease upon binding substrates and inhibitors. In these simulations, the core of each monomer significantly rearranges to assist in the expansion of the active site as hydrophobic core residues slide by each other, exchanging one hydrophobic contact for another. Such hydrophobic sliding may represent a general mechanism by which proteins undergo conformational changes. Mutation of these hydrophobic core residues would alter the packing of the hydrophobic core. Thus, these residues could facilitate drug resistance in HIV-1 protease by altering dynamic properties of HIV-1 protease preferentially affecting the relative affinity for inhibitors versus substrates. Chapter IV concentrates on a residue in the flap region, Ile54, which is significantly correlated with the development of drug resistance. A series of patient sequences containing the mutation I54A were evaluated for the most frequently occurring co-mutations. I54A was found to occur with mutations that were previously correlated with I54V mutations, including L10I, G48V, and V82A. 
Based on the results of this evaluation, the binding properties of five variant proteases were investigated: MDRI54V, MDRI54A, I54V, I54A, and G48V. MDRI54V and MDRI54A each contained the mutations L10I, G48V, and V82A, and either I54V or I54A, respectively. The other variants contained only the mutation indicated. Mutations at Ile54 were able to significantly impact the thermodynamics of binding to saquinavir, amprenavir, and the recently approved darunavir. The magnitude of this impact depended on the presence or absence of other drug resistance mutations, including another mutation in the flap region, G48V. Therefore, while residues 48 and 54 are not in contact with each other, mutations at both sites had a cooperative effect that varied between inhibitors. The results demonstrate that residues outside the active site of HIV-1 protease are clearly important to enzyme function, possibly through their role in the dynamic properties of protease. Mutations outside the active site of protease that are known to cause drug resistance could alter the conformational flexibility of protease. While the role of protein dynamics in molecular recognition is still not fully understood, the results of this study indicate that altering the dynamic properties of a protein affects its ability to recognize ligands. Therefore, to design better inhibitors we will have to develop a more thorough understanding of protein dynamics. Protein Conformation HIV Protease HIV-1 Computational Biology Amino Acids, Peptides, and Proteins Chemical Actions and Uses Computational Biology Genetic Phenomena Therapeutics Viruses
