COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences

The consensus pattern problem (CPP) aims at finding conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently, a new scoring scheme and a randomized algorithm based on a substitution matrix are proposed here. Any practical solution to a bioinformatics problem must observe two principles: (1) the problem that it solves accurately describes the real problem; in CPP, this requires that the scoring scheme be able to distinguish a real motif from background; (2) it provides an efficient algorithm to solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem in EM algorithms for solving CPP is how to find good starting points to reach the global optimum. These two questions were both well addressed under this scoring scheme, which made the randomized algorithm both fast and accurate in practice. A software package, COPIA (COnsensus Pattern Identification and Analysis), has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF) family showed that it worked well on finding multiple motifs and repeats. COPIA's ability to find repeats also makes it useful in illustrating the internal structures of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than the commonly used motif-finding programs.

http://hdl.handle.net/10012/1050

Computer Science

bioinformatics software

multiple alignment

motif-finding

consensus pattern problem

Identifier	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OWTU.10012/1050
Date	January 2001
Creators	Liang, Chengzhi
Publisher	University of Waterloo
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	English
Detected Language	English
Type	Thesis or Dissertation
Format	application/pdf, 439052 bytes
Rights	Copyright: 2001, Liang, Chengzhi. All rights reserved.
