Global ETD Search

341	Annotation Concept Synthesis and Enrichment Analysis: a Logic-Based Approach to the Interpretation of High-Throughput Biological Experiments Jiline, Mikhail 26 January 2011 Annotation Enrichment Analysis (AEA) is a widely used analytical methodology to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information for sets of genes identified by the previous processing stages (e.g., a set of differentially expressed genes, a cluster). Enrichment analysis algorithms attach annotations to the genes and then discover statistical fluctuations of individual annotation terms in a given gene subset. The annotation terms represent different aspects of biological knowledge and come from databases such as GO, BIND, KEGG. Typical statistical models used to detect enrichments or depletions of annotation terms are hypergeometric, binomial and χ². At the end, the discovered information is utilized by human experts to find biological interpretations of the experiments. The main drawback of AEA is that it tests for overrepresentation of isolated individual annotation terms or groups of similar terms. As a result, AEA is limited in its ability to uncover complex phenomena involving relationships between multiple annotation terms from various knowledge bases. Also, AEA assumes that annotations describe the whole object of interest, which makes it difficult to apply it to sets of compound objects (e.g., sets of protein-protein interactions) and to sets of objects having an internal structure (e.g., protein complexes). To overcome these shortcomings, we propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) approach. In this approach, the source annotation information, experimental data and uncovered enriched annotations are represented as First-Order Logic (FOL) statements. 
ACSEA uses the fusion of inductive logic reasoning with statistical inference to uncover more complex phenomena captured by the experiments. The proposed paradigm allows a synthesis of enriched annotation concepts that better describe the observed biological processes. The methodological advantage of Annotation Concept Synthesis and Enrichment Analysis is six-fold. Firstly, it is easier to represent complex, structural annotation information. Information already captured and formalized in OWL and RDF knowledge bases can be directly utilized. Secondly, it is possible to synthesize and analyze complex annotation concepts. Thirdly, it is possible to perform the enrichment analysis for sets of aggregate objects (such as sets of genetic interactions, physical protein-protein interactions or sets of protein complexes). Fourthly, annotation concepts are straightforward to interpret by a human expert. Fifthly, the logic data model and logic induction are a common platform that can integrate specialized analytical tools (e.g. tools for numerical, structural and sequential analysis). Sixthly, the statistical inference methods used are robust on noisy and incomplete data, scalable, and trusted by human experts in the field. In this thesis we develop and implement the ACSEA approach. We evaluate it on large-scale datasets from several microarray experiments and on a clustered genome-wide genetic interaction network using different biological knowledge bases. Also, we define a statistical model of experimental and annotation data and evaluate ACSEA on synthetic datasets. The discovered interpretations are more enriched in terms of P- and Q-values than the interpretations found by AEA, are highly integrative in nature, and include analysis of quantitative and structured information present in the knowledge bases. The results suggest that ACSEA can significantly boost the effectiveness of the processing of high-throughput experiment data. Bioinformatics ILP Enrichment Analysis
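The hypergeometric model named in the abstract above as a typical enrichment test can be illustrated with a minimal sketch. This is not the ACSEA method or the thesis code; the function name and the toy numbers are assumptions made for illustration. It computes the one-sided probability of seeing at least a given number of annotated genes in a subset drawn without replacement.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a term carried by `annotated` of `population` genes,
    when `sample` genes are drawn without replacement (hypothetical helper)."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 10 genes, 5 annotated, a subset of 5 with all 5 annotated.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

A small p-value indicates that the term is overrepresented in the subset relative to the background; in practice the values from many terms would also need multiple-testing correction, which this sketch omits.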
342	The Philosophy of Bioinformatics Mikhael, Joseph January 2007 The development of bioinformatics as an influential biological field should interest philosophers of biology and philosophers of science in general. Bioinformatics contributes significantly to the development of biological knowledge using a variety of scientific methods. Particular tools used by bioinformaticists, such as BLAST, phylogenetic tree creation software, and DNA microarrays, will be shown to utilize the scientific methods of extended cognition, analogical reasoning, and representations of mechanisms. Extended cognition is found in bioinformatics through the use of computer databases and algorithms in the representation and development of scientific theories in bioinformatics. Analogical reasoning is found in bioinformatics through particular analogical comparisons that are made between biological sequences and operations. Lastly, scientific theories that are created using certain bioinformatics tools are often representations of mechanisms. These methods are found in other scientific fields, but it will be shown that these methods are expanded in bioinformatics research through the use of computers to make the methods of analogical reasoning and representation of mechanisms more powerful. bioinformatics philosophy Philosophy
343	Purification and characterisation of Plasmodium falciparum hypoxanthine phosphoribosyltransferase Murungi, Edwin Kimathi January 2007 Malaria remains the most important parasitic disease worldwide. It is estimated that over 500 million infections and more than 2.7 million deaths arising from malaria occur each year. Most (90%) of the infections occur in Africa, with the most affected groups being children of less than five years of age and women. This dire situation is exacerbated by the emergence of drug-resistant strains of Plasmodium falciparum. The work reported in this thesis focuses on improving the purification of PfHPRT by investigating the characteristics of anion exchange DE-52 chromatography (the first stage of purification), developing an HPLC gel filtration method for examining the quaternary structure of the protein and possible end stage purification, and initial crystallization trials. A homology model of the open, unliganded PfHPRT is constructed using the atomic structures of human, T. cruzi and S. typhimurium HPRT as templates. Plasmodium falciparum Bioinformatics Malaria.
344	Deriving executable models of biochemical network dynamics from qualitative data January 2009 Progress in advancing our understanding of biological systems is limited by their sheer complexity, the cost of laboratory materials and equipment, and limitations of current laboratory technology. Computational and mathematical modeling provide ways to address these obstacles through hypothesis generation and testing without experimentation, allowing researchers to analyze system structure and dynamics in silico and then design lab experiments that yield desired information about phenomena of interest. These models, however, are only as accurate and complete as the data used to build them. Currently, most models are constructed from quantitative experimental data. However, since accurate quantitative measurements are hard to obtain and difficult to adapt from literature and online databases, new sources of data for building models need to be explored. In my work, I have designed methods for building and executing computational models of cellular network dynamics based on qualitative experimental data, which are more abundant, easier to obtain, and reliably reproducible. Such executable models allow for in silico perturbation, simulation, and exploration of biological systems. In this thesis, I present two general strategies for building and executing tokenized models of biochemical networks using only qualitative data. Both methods have been successfully used to model and predict the dynamics of signaling networks in normal and cancer cell lines, rivaling the accuracy of existing methods trained on quantitative data. I have implemented these methods in the software tools PathwayOracle and Monarch, making the new techniques I present here accessible to experimental biologists and other domain experts in cellular biology. Biology Bioinformatics Computer Science
345	Mapping the structural landscape of protein families with geometric feature vectors January 2010 This thesis describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine family-wide sub-group organization by quantifying the substructural variation within a protein family. The results demonstrate examples of automatically determined sub-groups that can be linked to phylogenetic distance between family members, segregation by ligation state, and organization by ancestry among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative template for each of the subgroups determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing templates. This work provides an unbiased, automated assessment of the structural variability of identified substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. Biology Bioinformatics Computer Science
346	Methods for detecting multi-locus genotype-phenotype association January 2010 Solutions to the genotype-phenotype problem seek to identify the set of genetic mutations and interactions between them which modify risk for and severity of a trait of interest. I propose association graph reduction (AGR), a novel algorithm to detect such genetic lesions in genome-wide data, particularly in the presence of high-order interactions. I describe several existing methods and evaluate their performance in terms of computational cost and power to detect associations. An objective comparison of the results shows that AGR successfully combines high power with computational efficiency, while providing a detailed account of interactions present in the data. No other known method combines these three properties. When applied to real data, AGR can be used to discover genetic causes of common diseases such as arthritis, hypertension, diabetes, asthma, and many others, which will facilitate the discovery of novel diagnostic tools and treatment protocols. Biology Bioinformatics Computer Science
347	Inference of parsimonious species phylogenies from multi-locus data January 2010 The main focus of this dissertation is the inference of species phylogenies, i.e. evolutionary histories of species. Species phylogenies allow us to gain insights into the mechanisms of evolution and to hypothesize past evolutionary events. They also find applications in medicine, for example, the understanding of antibiotic resistance in bacteria. The reconstruction of species phylogenies is, therefore, of both biological and practical importance. In the traditional method for inferring species trees from genetic data, we sequence a single locus in species genomes, reconstruct a gene tree, and report it as the species tree. Biologists have long acknowledged that a gene tree can be different from a species tree, thus implying that this traditional method might infer the wrong species tree. Moreover, reticulate events such as horizontal gene transfer and hybridization make the evolution of species no longer tree-like. The availability of multi-locus data provides us with excellent opportunities to resolve those long-standing problems. In this dissertation, we present parsimony-based algorithms for reconciling species/gene tree incongruence that is assumed to be due solely to lineage sorting. We also describe a unified framework for detecting hybridization despite lineage sorting. To address the first problem of species/gene tree incongruence caused by lineage sorting, we present three algorithms. In Chapter 3, we present an algorithm based on an integer-linear programming (ILP) formula to infer the species tree's topology and divergence times from multiple gene trees. In Chapter 4, we describe two methods that infer the species tree by minimizing deep coalescences (MDC), a criterion introduced by Maddison in 1997. The first method is also based on an ILP formula, but it eliminates the enumeration phase of candidate species trees of the algorithm in Chapter 3. 
The second algorithm further eliminates the dependence on external ILP solvers by employing dynamic programming. We ran those methods on both biological and simulated data, and experimental results demonstrate their high accuracy and speed in species tree inference, which makes them suitable for analyzing multi-locus data. The second problem this dissertation deals with is reticulation (e.g., horizontal gene transfer, hybridization) detection despite lineage sorting. The phylogeny-based approach compares the evolutionary histories of different genomic regions and tests them for incongruence that would indicate hybridization. However, since species tree and gene tree incongruence can also be due to lineage sorting, phylogeny-based hybridization methods might overestimate the amount of hybridization. We present in this dissertation a framework that can handle both hybridization and lineage sorting simultaneously. In this framework, we extend the MDC criterion to phylogenetic networks, and use it to propose a heuristic to detect hybridization despite lineage sorting. Empirical results on a simulated and a yeast data set show promising performance and suggest several directions for future research. Biology Bioinformatics Computer Science
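The MDC criterion mentioned in the abstract above scores a candidate species tree by the number of extra gene lineages it forces on its branches. The following is a minimal sketch of that score for one gene tree, not the ILP or dynamic-programming algorithms of the dissertation. It assumes rooted trees written as nested tuples of species names, with every species sampled exactly once in the gene tree; all function names are hypothetical.

```python
def leaf_set(tree):
    """Leaves below a node; a tree is a species name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lineages_entering(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie inside `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_entering(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum of (lineages - 1) over every species-tree branch below the root."""
    if isinstance(species_tree, str):
        return 0
    total = 0
    for child in species_tree:
        lineages = lineages_entering(gene_tree, leaf_set(child))
        total += max(lineages - 1, 0)
        total += extra_lineages(child, gene_tree)
    return total


species = (("A", "B"), "C")
congruent_cost = extra_lineages(species, (("A", "B"), "C"))
incongruent_cost = extra_lineages(species, (("A", "C"), "B"))
```

Under these assumptions a gene tree that matches the species tree costs nothing, while a conflicting one adds extra lineages on the branch where coalescence is delayed. Summing the cost over all gene trees and choosing the candidate species tree with the smallest total is the parsimony idea the abstract describes.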
348	Automated annotation of protein families / Automatiserad annotering av proteinfamiljer Elfving, Eric January 2011 Introduction: The great challenge in bioinformatics is data integration. The amount of available data is always increasing and there are no common unified standards of where, or how, the data should be stored. The aim of this work is to build an automated tool to annotate the different member families within the protein superfamily of medium-chain dehydrogenases/reductases (MDR), by finding common properties among the member proteins. The goal is to increase the understanding of the MDR superfamily as well as the different member families. This will add to the amount of knowledge gained for free when a new, unannotated, protein is matched as a member to a specific MDR member family. Method: The different types of data available all needed different handling. Textual data was mainly compared as strings while numeric data needed some special handling such as statistical calculations. Ontological data was handled as tree nodes where ancestry between terms had to be considered. This was implemented as a plugin-based system to make the tool easy to extend with additional data sources of different types. Results: The biggest challenge was data incompleteness yielding little (or no) results for some families and thus decreasing the statistical significance of the results. Results show that all the human and mouse MDR members have a Pfam ADH domain (ADH_N and/or ADH_zinc_N) and take part in an oxidation-reduction process, often with NAD or NADP as cofactor. Many of the proteins contain zinc and are expressed in liver tissue. Conclusions: A Python-based tool for automatic annotation has been created to annotate the different MDR member families. The tool is easily extendable to be used with new databases and many of the results agree with information found in the literature. 
The utility and necessity of this system, as well as the quality of its produced results, are expected to only increase over time, even if no additional extensions are produced, as the system itself is able to make further and more detailed inferences as more and more data become available. data integration Bioinformatics Bioinformatik
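The handling of ontological data described in the abstract above, where ancestry between terms has to be considered, can be illustrated with a small sketch. This is not the thesis tool; the parent map, the term labels, and the function names are toy assumptions, and a real ontology such as GO is a graph in which a term may have several parents rather than the single-parent tree assumed here.

```python
def ancestors(term, parent):
    """Return the term followed by all its ancestors up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent.get(term)
    return chain


def shared_terms(protein_terms, parent):
    """Terms, direct or inherited, that annotate every protein in a family."""
    closures = []
    for terms in protein_terms.values():
        closure = set()
        for term in terms:
            closure.update(ancestors(term, parent))
        closures.append(closure)
    return set.intersection(*closures) if closures else set()


# Toy hierarchy (assumed labels, not real ontology identifiers).
parent = {"adh": "oxidoreductase", "qor": "oxidoreductase", "oxidoreductase": "root"}
family = {"P1": ["adh"], "P2": ["qor"]}
common = shared_terms(family, parent)
```

Comparing the two proteins by their direct terms alone would find nothing in common; expanding each term to its ancestors first recovers the shared parent, which is the kind of property a family-level annotation would report.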
350	Finding conserved patterns in biological sequences, networks and genomes Yang, Qingwu 15 May 2009 Biological patterns are widely used for identifying biologically interesting regions within macromolecules, classifying biological objects, predicting functions and studying evolution. Good pattern finding algorithms will help biologists to formulate and validate hypotheses in an attempt to obtain important insights into the complex mechanisms of living things. In this dissertation, we aim to improve and develop algorithms for five biological pattern finding problems. For the multiple sequence alignment problem, we propose an alternative formulation in which a final alignment is obtained by preserving pairwise alignments specified by edges of a given tree. In contrast with traditional NP-hard formulations, our preserving alignment formulation can be solved in polynomial time without using a heuristic, while having very good accuracy. For the path matching problem, we take advantage of the linearity of the query path to reduce the problem to finding a longest weighted path in a directed acyclic graph. We can find k paths with top scores in a network from the query path in polynomial time. As many biological pathways are not linear, our graph matching approach allows a non-linear graph query to be given. Our graph matching formulation overcomes the common weakness of previous approaches that there is no guarantee on the quality of the results. For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparisons of clusters of different sizes. We explore both a restricted version which requires that orthologous genes are strictly ordered within each cluster, and the unrestricted problem that allows paralogous genes within a genome and clusters that may not appear in every genome. 
We solve the first problem in polynomial time and develop practical exact algorithms for the second one. In the gene cluster querying problem, based on a querying strategy, we propose an efficient approach for investigating clustering of related genes across multiple genomes for a given gene cluster. By analyzing gene clustering in 400 bacterial genomes, we show that our algorithm is efficient enough to study gene clusters across hundreds of genomes. biological pattern bioinformatics
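The abstract above reduces path matching to finding a longest weighted path in a directed acyclic graph. A minimal sketch of that core step follows; it is not the dissertation's algorithm, which also builds the graph from the query path and the network and returns the top k paths. The function name, the edge-list input format, and the toy weights are assumptions.

```python
from collections import defaultdict, deque


def longest_weighted_path(edges):
    """edges: iterable of (u, v, weight) for a DAG. Returns (score, path)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm: visit nodes in topological order.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    best = {n: 0.0 for n in nodes}   # best score of a path ending at n
    prev = {n: None for n in nodes}  # predecessor on that best path
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(nodes):
        raise ValueError("graph contains a cycle")
    end = max(nodes, key=lambda n: best[n])
    score = best[end]
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return score, path[::-1]


toy_edges = [("a", "b", 2.0), ("b", "c", 3.0), ("a", "c", 4.0), ("c", "d", 1.0)]
score, path = longest_weighted_path(toy_edges)
```

Because each edge is relaxed once in topological order, the running time is linear in the number of nodes and edges, which is what makes the reduction attractive compared with general longest-path search.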

Search results