141 |
Bayesian learning in bioinformatics
Gold, David L. 15 May 2009
Life sciences research is advancing in breadth and scope, affecting many areas of life
including medical care and government policy. The field of Bioinformatics, in particular,
is growing very rapidly with the help of computer science, statistics, applied
mathematics, and engineering. New high-throughput technologies are making it possible
to measure genomic variation across phenotypes in organisms at costs that were
once inconceivable. In conjunction, and partly as a consequence, massive amounts
of information about the genomes of many organisms are becoming accessible in the
public domain. One of the important and exciting questions in the post-genomics
era is how to integrate all of the information available from diverse sources.
Learning in complex systems biology requires that information be shared in a natural
and interpretable way, to integrate knowledge and data. The statistical sciences can
support the advancement of learning in Bioinformatics in many ways, not the least
of which is by developing methodologies that can support the synchronization of efforts
across sciences, offering real-time learning tools that can be shared across many
fields, from basic science to clinical applications. This research is an introduction
to several current research problems in Bioinformatics that address the integration
of information, and it discusses statistical methodologies from the Bayesian school of
thought that may be applied.
Bayesian statistical methodologies are proposed to integrate biological knowledge and
improve statistical inference for three relevant Bioinformatics applications: gene expression
arrays, BAC and aCGH arrays, and real-time gene expression experiments.
A unified Bayesian model is proposed to perform detection of genes and gene classes,
defined from historical pathways, with gene expression arrays. A novel Bayesian statistical
method is proposed to infer chromosomal copy number aberrations in clinical
populations with BAC or aCGH experiments. A theoretical model is proposed, motivated
from historical work in mathematical biology, for inference with real-time gene
expression experiments, and fit with Bayesian methods. Simulation and case studies
indicate that Bayesian methodologies show great promise for improving the way we learn
from high-throughput Bioinformatics experiments.
|
142 |
Bayesian methods in bioinformatics
Baladandayuthapani, Veerabhadran 25 April 2007
This work is directed towards developing flexible Bayesian statistical methods
in the semi- and nonparametric regression modeling framework, with a special focus on
analyzing data from biological and genetic experiments. This dissertation attempts to
solve two such problems in this area. In the first part, we study penalized regression
splines (P-splines), which are low-order basis splines with a penalty to avoid
undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have
trouble when functions are varying rapidly. We model the penalty parameter inherent
in the P-spline method as a heteroscedastic regression function. We develop a full
Bayesian hierarchical structure to do this and use Markov Chain Monte Carlo
techniques for drawing random samples from the posterior for inference. We show that
the approach achieves very competitive performance as compared to other methods.
The second part focuses on modeling DNA microarray data. Microarray technology
enables us to monitor the expression levels of thousands of genes simultaneously and
hence to obtain a better picture of the interactions between the genes. In order to
understand the biological structure underlying these gene interactions, we present a
hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression
Splines (MARS) to capture the functional relationship between genes and also
between genes and disease status. The novelty of the approach lies in the attempt to
capture the complex nonlinear dependencies between the genes which could otherwise
be missed by linear approaches. The Bayesian model is flexible enough to identify
significant genes of interest as well as model the functional relationships between the
genes. The effectiveness of the proposed methodology is illustrated on leukemia and
breast cancer datasets.
|
143 |
GPCR-Directed Libraries for High Throughput Screening
Poudel, Sagar January 2006
<p>Guanine nucleotide-binding protein (G-protein) coupled receptors (GPCRs), the largest receptor family, are enormously important for the pharmaceutical industry, as they are the target of 50-60% of all existing medicines. The discovery of many new GPCRs by the “human genome project” opens up new opportunities for developing novel therapeutics. High throughput screening (HTS) of chemical libraries is a well-established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of more focused and specifically targeted libraries. To improve hit rates and to fully exploit the potential of current corporate screening collections, this thesis identifies and analyzes the critical drug-binding positions within the GPCRs, based on their overall sequence, their transmembrane regions, and their drug-binding fingerprints. A proper classification based on drug-binding fingerprints, as the basis for successful pharmacophore modelling and virtual screening, was carried out, which facilitates the development of more specific and focused targeted libraries for HTS.</p>
|
144 |
Analysis of transmembrane and globular protein depending on their solvent energy
Wakadkar, Sachin January 2009
<p>The number of experimentally determined protein structures in the Protein Data Bank (PDB) is continuously increasing. Common features such as cellular location, function, topology, primary structure, secondary structure, tertiary structure, domains, or fold are used to classify them, so various methods are available for the classification of proteins. In this work we attempt an additional basis for classification: solvent energy. Solvation is one of the most important properties of macromolecules and biological membranes, by which they remain stabilized in different environments. The energy required for solvation can be measured in terms of solvent energy. Proteins from similar environments are investigated for similar solvent energy; that is, solvent energy can be used as a measure to analyze and classify proteins. In this project the solvent energy of proteins present in the PDB was calculated using Jones’ algorithm. The proteins were classified into two classes: transmembrane and globular. The results of statistical analysis showed that the values of solvent energy obtained for the two main classes (globular and transmembrane) came from different populations. Thus, adopting a classification based on solvent energy will help in predicting cellular placement.</p>
|
145 |
A bioinformaticians view on the evolution of smell perception
Anders, Patrizia January 2006
<p>Background:</p><p>The origin of vertebrate sensory systems still holds many mysteries and thus challenges for bioinformatics. The evolution of the sense of smell in particular poses important puzzles, notably the question of whether the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study to determine their relationships among several different vertebrate species.</p><p>Results:</p><p>Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in alignment tools allow for better results and high confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting maximum-likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of tetrapods.</p><p>Conclusions:</p><p>The connections between the two systems of smell perception might be much more fundamental than the common architecture of receptors. A better understanding of these parallels is desirable, not only with respect to our view on evolution, but also in the context of the further exploration of the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.</p>
|
146 |
Integrating Prior Knowledge into the Fitness Function of an Evolutionary Algorithm for Deriving Gene Regulatory Networks
Birkmeier, Bettina January 2006
<p>The topic of gene regulation is a major research area in the bioinformatics community. In this thesis prior knowledge from Gene Ontology in the form of templates is integrated into the fitness function of an evolutionary algorithm to predict gene regulatory networks. The resulting multi-objective fitness functions are then tested with MAPK network data taken from KEGG to evaluate their respective performances. The results are presented and analyzed. However, a clear tendency cannot be observed. The results are nevertheless promising and can provide motivation for further research in that direction. Therefore different ideas and approaches are suggested for future work.</p>
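As a rough illustration of the idea in this abstract, and not the thesis's actual method, a fitness function might reward both agreement with expression data and agreement with prior-knowledge templates. In the sketch below, the edge-set representation of a network, the weighted-sum combination, the `weight` parameter, and all function names are assumptions made for illustration; the abstract only states that template knowledge from Gene Ontology enters a multi-objective fitness function.

```python
def data_fit(predicted, observed):
    """Negative mean squared error: higher is better (hypothetical data-fit objective)."""
    return -sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def template_agreement(network_edges, template_edges):
    """Fraction of prior-knowledge template edges present in the candidate network."""
    if not template_edges:
        return 0.0
    return len(network_edges & template_edges) / len(template_edges)


def fitness(predicted, observed, network_edges, template_edges, weight=0.5):
    """Weighted sum of the two objectives.

    A weighted sum is only one assumed way to combine objectives;
    weight=0 ignores prior knowledge, weight=1 ignores the data.
    """
    return ((1 - weight) * data_fit(predicted, observed)
            + weight * template_agreement(network_edges, template_edges))
```

A candidate network that reproduces the data exactly but contains only one of two template edges would score `0.25` with `weight=0.5`, so the evolutionary algorithm would still be pushed toward networks consistent with the prior knowledge.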
|
147 |
Using an ontology to enhance metabolic or signaling pathway comparisions by biological and chemical knowledge
Pohl, Matin January 2006
<p>Motivation:</p><p>As genome-scale efforts to investigate the metabolic networks of various organisms continue, the amount of pathway data is growing. At the same time, an increasing amount of gene expression data from microarrays is becoming available for reverse engineering, yielding, for example, hypothetical regulatory pathway data. To keep pace with this growth and to keep track of genuinely new information, analysis tools are needed. One vital task is the comparison of pathways to detect similar functionalities, overlaps, or, in the case of reverse engineering, known data corroborating a hypothetical pathway. A comparison method that uses ontological knowledge about molecules and reactions offers a more biological point of view, which graph-theoretical approaches have so far lacked. Such an ontology-based comparison approach is described in this report.</p><p>Results:</p><p>An algorithm is introduced that compares pathways component by component. The method was applied to two selected databases, and the results showed it to be unsatisfactory as a stand-alone method. Possibilities for further development are suggested, and steps toward an integrated method using several approaches are recommended.</p><p>Availability:</p><p>The source code, the database snapshots used, and pictures can be requested from the author.</p>
|
148 |
Time course simulation replicability of SBML-supporting biochemical network simulation tools
Sentausa, Erwin January 2006
<p>Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge about the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not yet certain whether the actual usage and exchange of SBML models among tools with different purposes and interfaces is assessable. In particular, it is not clear whether dynamic simulations of SBML models using different modelling and simulation packages are replicable.</p><p>Results: Time series simulations of published biological models in SBML format are performed using four SBML-supporting modelling and simulation tools to evaluate whether the tools correctly replicate the simulation results. Some of the tools do not successfully integrate some models. In the time series output of the successful simulations, there are differences between the tools.</p><p>Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This inability to replicate simulation results may harm the peer-review process of biological modelling and simulation activities and should be addressed accordingly, for example by specifying in the SBML model the exact algorithm or simulator used for replicating the simulation result.</p>
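One simple way to check whether two tools replicate a time course is to compare their outputs point by point against a tolerance. The sketch below is only an illustration under stated assumptions: the abstract does not say which comparison metric the thesis used, so the relative-difference measure, the `tolerance` default, and the function names are hypothetical, and both series are assumed to be sampled at the same time points.

```python
def max_relative_difference(series_a, series_b, floor=1e-12):
    """Largest pointwise relative difference between two time courses.

    `floor` avoids division by zero when both values are (near) zero.
    """
    if len(series_a) != len(series_b):
        raise ValueError("time courses must have the same number of points")
    return max(
        (abs(a - b) / max(abs(a), abs(b), floor) for a, b in zip(series_a, series_b)),
        default=0.0,
    )


def replicates(series_a, series_b, tolerance=1e-6):
    """True if the two time courses agree within the (assumed) relative tolerance."""
    return max_relative_difference(series_a, series_b) <= tolerance
```

For example, `replicates([1.0, 0.5, 0.25], [1.0, 0.5, 0.3])` is `False`, flagging the kind of between-tool difference the abstract reports. The choice of tolerance matters: differences from solver step-size settings may be acceptable where differences from model misinterpretation are not.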
|
149 |
A method for extracting pathways from Scansite-predicted protein-protein interactions
Simu, Tiberiu January 2006
<p>Protein interaction is an important mechanism for cellular functionality. Computational methods for predicting protein interactions are available in many publicly accessible resources (for example, Scansite). These predictions can be further combined with other information sources to generate hypothetical pathways. However, building pathways with computational methods can become time-consuming, as it requires multiple iterations and the consolidation of data from different sources. We have tested whether it is possible to generate graphs of protein-protein interaction using only domain-motif interaction data, and the degree to which this process can be automated, by developing a program that aggregates, under user guidance, query results from different information sources. The data sources used are Scansite and SwissProt. Visualisation of the graphs is done with Osprey, an external program freely available for academic purposes. The graphs obtained by running the software show that although it is possible to combine publicly available data and theoretical protein-protein interaction predictions from Scansite, further efforts are needed to increase the biological plausibility of these collections of data. It is possible, however, to reduce the dimensionality of the obtained graphs by focusing the searches on a certain tissue of interest.</p>
|
150 |
A method to identify the non-coding RNA gene for U1 RNA in species in which it has not yet been found
Mathew, Sumi January 2007
<p>Background</p><p>Non-coding RNAs are RNA molecules that do not code for proteins but play structural, catalytic, or regulatory roles in the organisms in which they are found. These RNAs generally conserve their secondary structure more than their primary sequence. Protein-coding genes can be found using sequence signals such as promoters, terminators, and start and stop codons. This is not the case for non-coding RNAs, in which these signals are only weakly conserved, so identifying them is more challenging. A protocol is therefore devised to identify U1 RNA in species not previously known to have it.</p><p>Results</p><p>Covariance models are sufficient to identify non-coding RNAs, but they are very slow, so a filtering step is needed before they are applied in order to reduce the search space. The protocol for identifying U1 RNA genes uses the pattern matcher RNABOB, which can conduct secondary structure pattern searches, for this filtering step. The descriptor for RNABOB is generated automatically such that it can also represent the bulges and interior loops in RNA helices. The protocol is compared with the Rfam and Weinberg & Ruzzo approaches and has been able to identify new U1 RNA homologues in the Apicomplexan group, where U1 RNA had not previously been found.</p><p>Conclusions</p><p>The method has been used to identify the gene for U1 RNA in certain species in which it had not been detected previously. The identified genes may be further analyzed by wet-laboratory techniques to confirm their existence.</p>
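The filter-then-score structure described in this abstract can be sketched generically: a cheap pattern search narrows the genome to a few candidate windows, and only those are passed to a slow, accurate scorer. The code below is a toy illustration under loud assumptions. A regular expression stands in for the RNABOB descriptor search, a simple count of Watson-Crick pairs stands in for a covariance model, and every function name and parameter is hypothetical; none of this reflects the real interfaces of RNABOB or of covariance-model software.

```python
import re

# Watson-Crick pairs used by the stand-in scorer (assumption: RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def cheap_filter(genome, pattern, window):
    """Stage 1 (stand-in for a fast pattern matcher): yield candidate windows."""
    for match in re.finditer(pattern, genome):
        start = match.start()
        yield start, genome[start:start + window]


def expensive_score(candidate):
    """Stage 2 (stand-in for a slow, accurate model).

    Counts complementary pairs when the candidate is folded end to end.
    """
    n = len(candidate)
    return sum((candidate[i], candidate[n - 1 - i]) in PAIRS for i in range(n // 2))


def search(genome, pattern, threshold, window=40):
    """Run the expensive scorer only on windows that pass the cheap filter."""
    hits = []
    for start, candidate in cheap_filter(genome, pattern, window):
        score = expensive_score(candidate)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

The design point is that the expensive scorer runs once per filtered candidate rather than once per genome position, which is where the reduction in search space comes from. The trade-off is sensitivity: any true homologue the filter misses can never be recovered by the second stage.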
|