351 |
Finding conserved patterns in biological sequences, networks and genomes
Yang, Qingwu. 15 May 2009
Biological patterns are widely used for identifying biologically interesting regions
within macromolecules, classifying biological objects, predicting functions and studying
evolution. Good pattern finding algorithms will help biologists to formulate and
validate hypotheses in an attempt to obtain important insights into the complex
mechanisms of living things.
In this dissertation, we aim to improve and develop algorithms for five biological
pattern finding problems. For the multiple sequence alignment problem, we propose
an alternative formulation in which a final alignment is obtained by preserving pairwise
alignments specified by edges of a given tree. In contrast with traditional NP-hard
formulations, our preserving alignment formulation can be solved in polynomial
time without using a heuristic, while still achieving very good accuracy.
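Why tree-specified pairwise alignments can be combined without conflict can be shown with a short sketch. It assumes each tree edge carries an order-preserving pairwise alignment, given as a list of matched residue index pairs (a representation chosen here for illustration). Residues aligned along any edge are merged into one column with union-find, and the columns are then ordered by a topological sort. Because the edges form a tree, valid pairwise alignments never create a cycle, so each one is preserved exactly. The function name `merge_tree_alignments` is hypothetical. The sketch covers only the merging step and is not the dissertation's algorithm.

```python
from collections import defaultdict


def merge_tree_alignments(lengths, edge_alignments):
    """Combine pairwise alignments given on tree edges into one multiple alignment.

    lengths: dict sequence id -> number of residues.
    edge_alignments: dict tree edge (a, b) -> list of (i, j) pairs meaning
        residue i of sequence a is aligned with residue j of sequence b.
    Returns columns in left-to-right order; each column maps sequence id ->
    residue index (a sequence missing from a column has a gap there).
    """
    parent = {}

    def find(x):
        # Union-find with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Residues aligned on any tree edge must share a column.
    for (a, b), pairs in edge_alignments.items():
        for i, j in pairs:
            union((a, i), (b, j))

    # Group residues into columns (one column per equivalence class).
    columns = defaultdict(dict)
    for s, n in lengths.items():
        for i in range(n):
            columns[find((s, i))][s] = i

    # Consecutive residues of a sequence force an order between their columns.
    succ = defaultdict(set)
    indeg = {c: 0 for c in columns}
    for s, n in lengths.items():
        for i in range(n - 1):
            u, v = find((s, i)), find((s, i + 1))
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1

    # Kahn's topological sort; a cycle means the pairwise alignments conflict.
    ready = [c for c, d in indeg.items() if d == 0]
    order = []
    while ready:
        c = ready.pop()
        order.append(c)
        for v in succ[c]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(columns):
        raise ValueError("pairwise alignments cannot be merged consistently")
    return [columns[c] for c in order]
```

For the tree A–B–C, with A and B aligned residue for residue and C aligned to B at residues 0 and 2, the result has three columns. In the middle column, C has a gap.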
For the path matching problem, we take advantage of the linearity of the query
path to reduce the problem to finding a longest weighted path in a directed acyclic
graph. Given a query path, we can find the k highest-scoring matching paths in a network in
polynomial time. Because many biological pathways are not linear, our graph matching
approach also accepts non-linear graph queries. Our graph matching formulation
overcomes a common weakness of previous approaches, which offer no
guarantee on the quality of the results.
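The reduction to a longest weighted path can be illustrated under a simplifying assumption that the abstract does not state. Suppose the query path q_1, ..., q_m must match a walk in a directed network vertex by vertex, with no insertions or deletions, and a caller-supplied function `sim` scores each query/vertex pair. The candidate matches then form a layered DAG, in which layer i holds the network vertices that could match q_i. The best match is a longest weighted path through this DAG, found by dynamic programming one layer at a time. `best_path_match` and `sim` are hypothetical names. The sketch returns only the single best match and does not reproduce the dissertation's top-k search or its handling of gaps.

```python
def best_path_match(query, graph, sim):
    """Highest-scoring walk in `graph` that matches `query` position by position.

    query: list of query labels q_1..q_m (m >= 1).
    graph: dict vertex -> list of successor vertices (directed edges);
        every vertex must appear as a key.
    sim: function (query_label, vertex) -> numeric score.
    Returns (score, walk) or None if no walk with m vertices exists.
    Note: if the network has cycles, the walk may revisit a vertex.
    """
    # Layer 0 of the implicit DAG: any vertex may match q_1.
    score = {v: sim(query[0], v) for v in graph}
    back = [{}]
    for i in range(1, len(query)):
        new_score, new_back = {}, {}
        # DAG edges run from layer i-1 to layer i along network edges.
        for u, s in score.items():
            for v in graph[u]:
                cand = s + sim(query[i], v)
                if v not in new_score or cand > new_score[v]:
                    new_score[v], new_back[v] = cand, u
        score = new_score
        back.append(new_back)
    if not score:
        return None
    # Trace the best path back through the stored predecessors.
    end = max(score, key=score.get)
    walk = [end]
    for i in range(len(query) - 1, 0, -1):
        walk.append(back[i][walk[-1]])
    walk.reverse()
    return score[end], walk
```

Each DAG edge is relaxed once, so the running time is O(m·|E|) for a query of length m.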
For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that
allow direct comparisons of clusters of different sizes. We explore both a restricted
version, which requires orthologous genes to be strictly ordered within each cluster,
and the unrestricted problem, which allows paralogous genes within a genome and clusters
that do not appear in every genome. We solve the first problem in polynomial
time and develop practical exact algorithms for the second one.
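A toy calculation shows why significance must account for cluster size. It uses a hypothetical null model, not the estimate developed in the dissertation: on a linear genome of n genes with random gene order, the positions of k chosen genes form a uniformly random k-subset. The probability that all k genes fall within some window of w consecutive genes grows with w. A raw count of clustered genes therefore cannot, by itself, rank clusters of different sizes. `prob_span_at_most` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def prob_span_at_most(n, k, w):
    """P(k given genes lie within some window of w consecutive genes), assuming
    (hypothetical null model) their positions are a uniformly random k-subset
    of a linear genome with n genes."""
    if not 1 <= k <= n or w < 1:
        raise ValueError("need 1 <= k <= n and w >= 1")
    if k == 1:
        return Fraction(1)
    # Count k-subsets whose first-to-last span is exactly s positions:
    # (n - s + 1) choices for the first position, comb(s - 2, k - 2) for the interior.
    favourable = sum((n - s + 1) * comb(s - 2, k - 2) for s in range(k, min(w, n) + 1))
    return Fraction(favourable, comb(n, k))
```

For example, two genes in a 10-gene genome lie within a 3-gene window with probability 17/45: 17 of the 45 position pairs are at most two apart.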
For the gene cluster querying problem, we propose an efficient query-based approach
that, given a gene cluster, investigates how the related genes cluster across multiple
genomes. By analyzing gene clustering in 400 bacterial
genomes, we show that our algorithm is efficient enough to study gene clusters across
hundreds of genomes.
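A single query step might look like the sketch below, under assumptions not taken from the abstract. Each genome is represented as a list of gene-family labels in chromosomal order, and a query asks for the shortest stretch of consecutive genes that contains every family of the query cluster. A standard sliding window answers this in linear time per genome. Querying hundreds of genomes is then a loop over them. `smallest_window` is a hypothetical name. The sketch treats chromosomes as linear, although many bacterial chromosomes are circular, and it ignores partial matches.

```python
from collections import Counter


def smallest_window(genome, query):
    """Shortest run of consecutive genes containing every family in `query`.

    genome: list of gene-family labels in chromosomal order (treated as linear).
    query: iterable of gene-family labels (the query cluster).
    Returns (start, end) indices, inclusive, or None if some family is absent.
    """
    need = set(query)
    if not need:
        raise ValueError("query cluster is empty")
    counts = Counter()
    covered = 0  # distinct query families currently inside the window
    best = None
    left = 0
    for right, family in enumerate(genome):
        if family in need:
            counts[family] += 1
            if counts[family] == 1:
                covered += 1
        # Shrink from the left while the window still covers the whole query.
        while covered == len(need):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            leaving = genome[left]
            if leaving in need:
                counts[leaving] -= 1
                if counts[leaving] == 0:
                    covered -= 1
            left += 1
    return best
```

Applied to many genomes, this becomes `{name: smallest_window(order, cluster) for name, order in genomes.items()}`.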
|
352 |
Structural, Functional and Evolutionary Characterization of Sense-Antisense Transcripts in Mammals
Dickens, Charles. 2009 May 1900
Sense-antisense transcripts (SATs) are messenger RNA (mRNA) transcripts that contain regions complementary to regions of other mRNA transcripts. SATs may play an influential role in regulating gene expression. One evolutionary event that has had a dramatic impact on many genomes is the widespread dispersal of repetitive sequences, which include transposable elements (TEs) as well as simple and tandem repeats. Approximately 45% of the human genome and 37.5% of the mouse genome are composed of repeats derived from transposable elements. A group of SATs was identified as resulting from transposable elements integrating into the coding strand of some genes and into the template strand of the coding region of other genes. These SATs may add to the complexity of an organism's regulatory network, or they may be the result of relatively recent TE activity that has yet to succumb to sequence divergence.
The human, mouse and bovine genomes were analyzed for SATs using publicly available datasets and bioinformatics analysis tools. Each sense-antisense binding region (SABR) was aligned to transposable elements from the RepBase repeat database, revealing many SABRs in which TE-derived sequence makes up a large portion of the region. A Gene Ontology analysis of subsets of the data showed enrichment for the functional category "DNA repair" and the component category "cytoplasm". An analysis of substitution rates in human and mouse across the 3' UTRs of transcripts containing SABRs at the 5' end of their 3' UTRs showed that the substitution rate in the SABR region was lower than at the beginning of the 3' UTR. The lower percent GC composition found at the 3' end of the 3' UTRs could be attributed to conserved poly-A signals in this region.
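The defining property of a SAT, a region complementary to another transcript, can be screened for with exact k-mer matches. The sketch below compares transcript `a` against the reverse complement of transcript `b`. It assumes uppercase RNA sequences (for cDNA, swap U for T); other characters such as N pass through the translation unchanged. The function names and the choice of k are hypothetical. This is an illustration of the idea, not the analysis pipeline used in the dissertation.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an uppercase RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_kmers(a, b, k):
    """Pairs (i, j) such that a[i:i+k] can base-pair antiparallel with b[j:j+k]."""
    rc_b = reverse_complement(b)
    # Index every k-mer of the reverse complement of b by its start position.
    index = {}
    for j in range(len(rc_b) - k + 1):
        index.setdefault(rc_b[j:j + k], []).append(j)
    hits = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            # rc_b[j:j+k] is the reverse complement of b[len(b)-j-k : len(b)-j].
            hits.append((i, len(b) - j - k))
    return hits
```

Runs of adjacent hits would outline a candidate binding region. A real analysis would also tolerate mismatches and G·U pairs.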
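The region-by-region comparison described above can be approximated as follows. Assume a human/mouse 3' UTR alignment given as two equal-length gapped strings, with region boundaries given as alignment columns. The abstract does not name its substitution-rate estimator, so the sketch uses the simple p-distance (fraction of differing ungapped sites) as a crude proxy, alongside GC percent. All names and coordinates are hypothetical.

```python
def p_distance(x, y):
    """Proportion of differing sites among ungapped columns of two aligned sequences."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(x.upper(), y.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def gc_percent(seq):
    """Percentage of G and C among non-gap characters."""
    bases = [b for b in seq.upper() if b != "-"]
    if not bases:
        raise ValueError("no bases to count")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def compare_regions(human, mouse, regions):
    """regions: dict name -> (start, end) alignment columns, end exclusive.
    Returns dict name -> (p-distance, human GC percent) for each region."""
    return {
        name: (p_distance(human[s:e], mouse[s:e]), gc_percent(human[s:e]))
        for name, (s, e) in regions.items()
    }
```

A lower p-distance in the SABR columns than in the first columns of the 3' UTR would correspond to the pattern reported above. Uncorrected p-distance underestimates divergence when sites have been hit by multiple substitutions.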
|
353 |
Validation of a novel expressed sequence tag (EST) clustering method and development of a phylogenetic annotation pipeline for livestock gene families
Venkatraman, Anand. 2008 December 1900
Predicting the functions of genes in a genome is a key step in every genome sequencing project. Sequences that carry out important functions are likely to be conserved between evolutionarily distant species and can be identified by cross-species comparison. In the absence of completed genomes and accompanying high-quality annotations, expressed sequence tags (ESTs) from random cDNA clones are the primary tools for functional genomics. EST datasets are fragmented and redundant, so ESTs must be clustered into groups that are likely to derive from the same genes. EST clustering reduces the search space for sequence homology searches and improves the accuracy of function predictions made from EST datasets. This dissertation is a case study that describes the clustering of Bos taurus and Sus scrofa EST datasets and uses the resulting EST clusters to make computational function predictions with a comparative genomics approach. We used a novel EST clustering method, TAMUClust, to cluster bovine ESTs and compared its performance with the bovine EST clusters from the TIGR Gene Indices (TGI), using bovine ESTs aligned to the bovine genome assembly as a gold standard. This comparison shows that TAMUClust and TGI perform similarly. Comparisons of TAMUClust and TGI with predicted bovine gene models show that the two datasets have similar transcript coverage.
We describe here the design and implementation of an annotation pipeline for predicting the functions of the Bos taurus (cattle) and Sus scrofa (pig) transcriptomes. EST datasets were clustered into gene families using Ensembl protein family clusters as a framework. After clustering, each EST consensus sequence was assigned a predicted function by transferring the annotations of the Ensembl vertebrate protein(s) with which it was grouped by sequence homology searches and phylogenetic analysis. The annotations benefit the livestock community by helping narrow the range of direct experiments needed to verify function.
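The abstract does not say which measure was used to compare a clustering with the genome-alignment gold standard. One common option is pair counting, sketched below. Each clustering is a mapping from EST id to cluster id, and both mappings are assumed to cover the same ESTs (for example, only ESTs that align to the assembly). Precision is the fraction of predicted same-cluster pairs that the gold standard confirms. Recall is the fraction of gold-standard pairs that the prediction recovers. The function names are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(assignment):
    """assignment: dict est_id -> cluster_id.
    Returns the set of unordered EST pairs that share a cluster."""
    by_cluster = {}
    for est, cid in assignment.items():
        by_cluster.setdefault(cid, []).append(est)
    pairs = set()
    for members in by_cluster.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, gold):
    """Pair-counting (precision, recall) of a predicted clustering versus a gold standard."""
    pred = co_clustered_pairs(predicted)
    true = co_clustered_pairs(gold)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    return precision, recall
```

Enumerating pairs is quadratic in cluster size. For very large clusters, the counts can instead be derived from a contingency table of cluster overlaps.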
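The homology-transfer step of such a pipeline can be reduced to a small sketch. It assumes search hits arrive as (consensus id, protein id, e-value) tuples and that protein annotations are held in a dictionary. Each consensus sequence then receives the annotation of its best annotated hit below an e-value cutoff. The cutoff, the names and the data layout are hypothetical. The sketch omits the phylogenetic analysis and the Ensembl family framework described above.

```python
def transfer_annotations(hits, annotations, max_evalue=1e-10):
    """hits: iterable of (consensus_id, protein_id, evalue) tuples.
    annotations: dict protein_id -> annotation string.
    Returns dict consensus_id -> (annotation, protein_id), taken from the
    lowest-e-value annotated hit at or below `max_evalue`."""
    best = {}
    for query, protein, evalue in hits:
        if evalue > max_evalue or protein not in annotations:
            continue  # skip weak hits and unannotated proteins
        if query not in best or evalue < best[query][1]:
            best[query] = (protein, evalue)
    return {q: (annotations[p], p) for q, (p, _) in best.items()}
```

Consensus sequences with no hit under the cutoff are left unannotated rather than forced onto a weak match.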
|
354 |
A multi-agent-based distributed computing environment for bioinformatics applications
Ke, Hung-i. 27 July 2009
Bioinformatics computing consumes huge computing resources. Because algorithms are
difficult to improve and mainframes are expensive, many scholars choose distributed
computing as a way to reduce computation time. When distributed computing is used for
bioinformatics, an important issue is finding a proper strategy for allocating tasks among
computing nodes so that the load stays balanced. By adopting a multi-agent system as a tool,
system developers can design task allocation strategies from an intuitive viewpoint and keep
the load balanced among computing nodes.
The purpose of our research is to use a multi-agent system as the underlying tool for
developing a distributed computing environment that assists scholars in solving bioinformatics
computing problems. In contrast to public computing projects such as BOINC, our
work focuses on computing nodes deployed inside an organization and
connected by a local area network.
|
355 |
Combinatorial approaches for problems in bioinformatics
Meneses, Cláudio N. January 2005
Thesis (Ph. D.)--University of Florida, 2005. / Title from title page of source document. Document formatted into pages; contains 105 pages. Includes vita. Includes bibliographical references.
|
356 |
The evolutionary history and genomic impact of mammalian DNA transposons
Pace, John Kelly. January 2008
Thesis (Ph.D.)--University of Texas at Arlington, 2008.
|
357 |
The structural and functional landscape of protein superfamilies: From the thioredoxin fold to parasite peptidases
Atkinson, Holly J. January 2009
Thesis (Ph. D.)--University of California, San Francisco, 2009. / Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3484. Adviser: Patricia C. Babbitt.
|
358 |
Visualization of protein 3D structures in reduced representation with simultaneous display of intra- and inter-molecular interactions
Sheth, Vrunda. January 2009
Thesis (M.S.)--Rochester Institute of Technology, 2009. / Typescript. Includes bibliographical references (leaves 34-36).
|
359 |
Evolutionary coupling in multisubunit membrane protein complexes
Natarajan, Shreedhar. January 2008
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2008. / Source: Dissertation Abstracts International, Volume: 69-05, Section: B, page: 2701. Adviser: Eric Jakobsson. Includes bibliographical references (leaves 120-147). Available on microfilm from ProQuest Information and Learning.
|
360 |
Development of a bioinformatics and statistical framework to integrate biological resources for genome-wide genetic mapping and its applications
Li, Miaoxin. January 2009
Thesis (Ph. D.)--University of Hong Kong, 2010. / Includes bibliographical references (p. 169-186). Also available in print.
|