Global ETD Search

Feature Identification and Reduction for Improved Generalization Accuracy in Secondary-Structure Prediction Using Temporal Context Inputs in Machine-Learning Models

A protein's properties are influenced by both its amino-acid sequence and its three-dimensional conformation. Ascertaining a protein's sequence is relatively easy using modern techniques, but determining its conformation requires much more expensive and time-consuming techniques. Consequently, it would be useful to identify a method that can accurately predict a protein's secondary-structure conformation using only the protein's sequence data. This problem is not trivial, however, because identical amino-acid subsequences in different contexts sometimes have disparate secondary structures, while highly dissimilar amino-acid subsequences sometimes have identical secondary structures. We propose (1) to develop a set of metrics that facilitates better comparisons between dissimilar subsequences and (2) to design a custom set of inputs for machine-learning models that can harness contextual dependence information between the secondary structures of successive amino acids in order to achieve better secondary-structure prediction accuracy.
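As a rough illustration of proposal (2), the sketch below builds one input vector per residue from a window of neighboring amino acids plus the secondary-structure labels of the preceding residues. It is not the thesis's actual feature set: the one-hot encoding, the three-state label alphabet (H, E, C), the names `build_inputs`, `window`, and `context`, and the default sizes are all assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_LABELS = "HEC"  # helix, strand, coil (three-state alphabet is an assumption)


def one_hot(symbol, alphabet):
    """Return a 0/1 list marking symbol's position; all zeros if absent."""
    return [1 if symbol == a else 0 for a in alphabet]


def build_inputs(sequence, prev_labels, window=3, context=2):
    """Build one feature vector per residue (hypothetical helper).

    Each vector concatenates:
      * one-hot codes for the residues in a window centered on position i
      * one-hot codes for the secondary-structure labels of the `context`
        preceding residues (the "temporal context" inputs)
    Positions that fall outside the sequence are encoded as all zeros.
    """
    half = window // 2
    rows = []
    for i in range(len(sequence)):
        features = []
        for j in range(i - half, i + half + 1):
            symbol = sequence[j] if 0 <= j < len(sequence) else None
            features.extend(one_hot(symbol, AMINO_ACIDS))
        for k in range(1, context + 1):
            label = prev_labels[i - k] if i - k >= 0 else None
            features.extend(one_hot(label, SS_LABELS))
        rows.append(features)
    return rows


rows = build_inputs("ACD", "HHE")
print(len(rows), len(rows[0]))
```

During training, `prev_labels` could come from the known structure; at prediction time it would have to come from the model's own earlier outputs, which is one plausible reading of "contextual dependence between successive amino acids".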

Bioinformatics

machine learning

secondary-structure prediction

amino-acid properties

Computer Sciences

Identifier	oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-6266
Date	01 May 2015
Creators	Seeley, Matthew Benjamin
Publisher	BYU ScholarsArchive
Source Sets	Brigham Young University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
Rights	http://lib.byu.edu/about/copyright/
