With recent and continuing advances in bioinformatics, the volume
of sequence data has increased tremendously. Along with this
increase, there is a growing need to develop efficient algorithms
to process such data in order to make useful and important
discoveries. Careful analysis of genomic data will benefit science
and society in numerous ways, including the understanding of
protein sequence functions, early detection of diseases, and
finding evolutionary relationships that exist among various
organisms.
Most sequence analysis problems arising from computational
genomics and evolutionary biology fall into the class of
NP-complete problems. Advances in exact and
approximate algorithms to address these problems are critical. In
this thesis, we investigate a novel graph theoretical model that
deals with fundamental evolutionary problems. The model allows
incorporation of the evolutionary operations ``insertion',
``deletion', and ``substitution', and various parameters such as
relative distances and weights. By varying appropriate parameters
and weights within the model, several important combinatorial
problems can be represented, including the weighted supersequence,
weighted superstring, and weighted longest common sequence
problems. Consequently, our model provides a general computational
framework for solving a wide variety of important and difficult
biological sequencing problems, including the multiple sequence
alignment problem, and the problem of finding an evolutionary
ancestor of multiple sequences.
In this thesis, we develop large scale combinatorial optimization
techniques to solve our graph theoretical model. In particular, we
formulate the problem as two distinct but related models:
constrained network flow problem and weighted node packing
problem. The integer programming models are solved in a branch and
bound setting using simultaneous column and row generation. The
methodology developed will also be useful to solve large scale
integer programming problems arising in other areas such as transportation and logistics.
Identifer | oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/26676 |
Date | 25 August 2008 |
Creators | Gupta, Kapil |
Publisher | Georgia Institute of Technology |
Source Sets | Georgia Tech Electronic Thesis and Dissertation Archive |
Detected Language | English |
Type | Dissertation |
Page generated in 0.0018 seconds