Global ETD Search

Return to search

Search and Analysis of the Sequence Space of a Protein Using Computational Tools

A new approach to the process of Directed Evolution is proposed, which utilizes different machine learning algorithms. Directed Evolution is a process of improving a protein for catalytic purposes by introducing random mutations in its sequence to create variants. Through these mutations, Directed Evolution explores the sequence space, which is defined as all the possible sequences for a given number of amino acids. Each variant sequence is divided into one of two classes, positive or negative, according to their activity or stability. By employing machine learning algorithms for feature selection on the sequence of these variants of the protein, attributes or amino acids in its sequence important for the classification into positive or negative, can be identified. Support Vector Machines (SVMs) were utilized to identify the important individual amino acids for any protein, which have to be preserved to maintain its activity. The results for the case of beta-lactamase show that such residues can be identified with high accuracy while using a small number of variant sequences. Another class of machine learning problems, Boolean Learning, was used to extend this approach to identifying interactions between the different amino acids in a proteins sequence using the variant sequences. It was shown through simulations that such interactions can be identified for any protein with a reasonable number of variant sequences. For experimental verification of this approach, two fluorescent proteins, mRFP and DsRed, were used to generate variants, which were screened for fluorescence. Using Boolean Learning, an interacting pair was identified, which was shown to be important for the fluorescence. It was also shown through experiments and simulations that knowing such pairs can increase the fraction active variants in the library. A Boolean Learning algorithm was also developed for this application, which can learn Boolean functions from data in the presence of classification noise.

http://hdl.handle.net/1853/14115

OCAT

Interacting residues of a protein

Important residues of a protein

Boolean learning

Support vector machines

Machine learning

Directed evolution

Identifer	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/14115
Date	25 August 2006
Creators	Dubey, Anshul
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Dissertation
Format	14804478 bytes, application/pdf

Page generated in 0.0017 seconds

Search and Analysis of the Sequence Space of a Protein Using Computational Tools

Description

Links & Downloads

Tags

Additional Fields