Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-1 protease substrates. These representations are used as input to an array of six different machine learning models, which predict whether a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to perform better than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to capture functionally important peptide characteristics more accurately. / Singapore-MIT Alliance (SMA)
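The contrast between a string-based and a physiochemical representation can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which properties their three representations use, so the two properties here (the Kyte-Doolittle hydropathy index and a simplified formal side-chain charge), the function names, and the 8-residue example peptide are all assumptions chosen only for illustration.

```python
# Illustrative sketch only: the property choices below are assumptions,
# not the representations developed in the work described above.

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acids

# Kyte-Doolittle hydropathy index (a standard published scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified formal side-chain charge near neutral pH.
# Assumption: histidine and all unlisted residues are treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def one_hot(peptide):
    """String-style encoding: 20 indicator values per position."""
    vec = []
    for aa in peptide:
        vec.extend(1.0 if aa == letter else 0.0 for letter in ALPHABET)
    return vec


def physiochemical(peptide):
    """Property-style encoding: (hydropathy, charge) per position."""
    vec = []
    for aa in peptide:
        vec.append(HYDROPATHY[aa])
        vec.append(CHARGE.get(aa, 0.0))
    return vec


# Hypothetical 8-residue peptide, used only to show the vector shapes.
peptide = "SQNYPIVQ"
string_features = one_hot(peptide)
property_features = physiochemical(peptide)

print(len(string_features), len(property_features))
```

Either vector could then be passed to a classifier. The one-hot form treats every pair of distinct residues as equally different, whereas the property form places chemically similar residues close together, which is the intuition behind the claim of greater biological relevance.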

http://hdl.handle.net/1721.1/30388

Machine learning

peptides

modeling

physio-chemical properties

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/30388
Date	01 1900
Creators	Jensen, Kyle, Styczynski, Mark, Stephanopoulos, Gregory
Source Sets	M.I.T. Theses and Dissertation
Language	English
Detected Language	English
Type	Article
Format	331891 bytes, application/pdf
Relation	Molecular Engineering of Biological and Chemical Systems (MEBCS)
