Escherichia coli proteomics and bioinformatics. Niu, Lili. 15 May 2009.
Many changes occur in the proteins of Escherichia coli cells as they enter stationary phase, including changes in protein abundance, post-translational modifications, conformation, and the composition of protein complexes. Proteomics, the study of the complete protein complement, can be used to examine the products of the genome and the physiology of Escherichia coli cells under different conditions. By comparing proteomes from different growth phases, such as exponential and stationary phase, many proteins that change can be identified simultaneously, providing a starting point for further mechanistic studies. Current global proteomic studies have identified about 27% of the annotated proteins of E. coli, most of which are predicted to be abundant proteins. Subproteomics, the study of specific subsets of the proteome, can be used to study particular functional classes of proteins and low-abundance proteins. In this dissertation, differences between exponential- and stationary-phase E. coli cells were studied across the whole soluble proteome using a non-denaturing anion exchange column combined with 2D SDS-PAGE and tandem mass spectrometry. In addition, heparin-binding proteins from exponential and stationary phases were identified and analyzed using a heparin column with tandem mass spectrometry. To manage and display the data generated by these proteomic studies, a web-based database, Experiments in E. coli Proteomics (EEP), was constructed; it includes NonDeLC, Heparome, AIX/2D PAGE and other proteomic studies.
Bayesian learning in bioinformatics. Gold, David L. 15 May 2009.
Life sciences research is advancing in breadth and scope, affecting many areas of life including medical care and government policy. The field of Bioinformatics, in particular, is growing very rapidly with the help of computer science, statistics, applied mathematics, and engineering. New high-throughput technologies are making it possible to measure genomic variation across phenotypes in organisms at costs that were once inconceivable. In conjunction, and partly as a consequence, massive amounts of information about the genomes of many organisms are becoming accessible in the public domain. One of the important and exciting questions in the post-genomics era is how to integrate all of the information available from diverse sources. Learning in complex systems biology requires that information be shared in a natural and interpretable way, integrating knowledge and data. The statistical sciences can support the advancement of learning in Bioinformatics in many ways, not least by developing methodologies that support the synchronization of efforts across the sciences, offering real-time learning tools that can be shared across many fields, from basic science to clinical application. This research is an introduction to several current research problems in Bioinformatics that address the integration of information, and discusses statistical methodologies from the Bayesian school of thought that may be applied to them. Bayesian statistical methodologies are proposed to integrate biological knowledge and improve statistical inference for three relevant Bioinformatics applications: gene expression arrays, BAC and aCGH arrays, and real-time gene expression experiments. A unified Bayesian model is proposed to detect genes and gene classes, defined from historical pathways, with gene expression arrays. A novel Bayesian statistical method is proposed to infer chromosomal copy-number aberrations in clinical populations from BAC or aCGH experiments.
A theoretical model, motivated by historical work in mathematical biology, is proposed for inference with real-time gene expression experiments and is fit with Bayesian methods. Simulation and case studies show that Bayesian methodologies hold great promise for improving the way we learn from high-throughput Bioinformatics experiments.
Bayesian methods in bioinformatics. Baladandayuthapani, Veerabhadran. 25 April 2007.
This work is directed towards developing flexible Bayesian statistical methods in the semi- and nonparametric regression modeling framework, with special focus on analyzing data from biological and genetic experiments. This dissertation attempts to solve two such problems in this area. In the first part, we study penalized regression splines (P-splines), which are low-order basis splines with a penalty to avoid undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have trouble when functions vary rapidly. We model the penalty parameter inherent in the P-spline method as a heteroscedastic regression function. We develop a full Bayesian hierarchical structure to do this and use Markov chain Monte Carlo techniques to draw random samples from the posterior for inference. We show that the approach achieves very competitive performance compared to other methods. The second part focuses on modeling DNA microarray data. Microarray technology enables us to monitor the expression levels of thousands of genes simultaneously, and hence to obtain a better picture of the interactions between the genes. In order to understand the biological structure underlying these gene interactions, we present a hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression Splines (MARS) to capture the functional relationships between genes, and between genes and disease status. The novelty of the approach lies in the attempt to capture complex nonlinear dependencies between the genes which could otherwise be missed by linear approaches. The Bayesian model is flexible enough to identify significant genes of interest as well as to model the functional relationships between the genes. The effectiveness of the proposed methodology is illustrated on leukemia and breast cancer datasets.
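The basic P-spline fit mentioned in this abstract can be sketched with a single global penalty. This is only the non-adaptive starting point, using a piecewise-linear basis and synthetic data; the dissertation instead models the penalty as a heteroscedastic regression function and fits it by MCMC, which is not reproduced here.

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) basis evaluated at the points x."""
    B = np.empty((len(x), len(knots)))
    for j in range(len(knots)):
        e = np.zeros(len(knots))
        e[j] = 1.0
        B[:, j] = np.interp(x, knots, e)  # linear interpolation of a unit vector = hat function
    return B

def pspline_fit(x, y, knots, lam):
    """P-spline fit: least squares plus lam * squared second differences of coefficients."""
    B = hat_basis(x, knots)
    D = np.diff(np.eye(len(knots)), n=2, axis=0)  # second-difference penalty matrix
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef

# Synthetic example: recover a sine curve from noisy observations
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2.0 * np.pi * x)
y = truth + rng.normal(scale=0.3, size=x.size)
fitted = pspline_fit(x, y, np.linspace(0.0, 1.0, 25), lam=1.0)
```

In the spatially adaptive version studied in the dissertation, the single `lam` is replaced by a smoothly varying penalty function with its own prior, and inference proceeds by sampling from the posterior rather than solving a closed-form system.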
GPCR-Directed Libraries for High Throughput Screening. Poudel, Sagar. January 2006.
<p>Guanine nucleotide-binding protein (G-protein) coupled receptors (GPCRs), the largest receptor family, are enormously important to the pharmaceutical industry, as they are the targets of 50-60% of all existing medicines. The discovery of many new GPCRs by the Human Genome Project opens up new opportunities for developing novel therapeutics. High-throughput screening (HTS) of chemical libraries is a well-established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of more focused and specifically targeted libraries. To improve hit rates and to fully exploit the potential of current corporate screening collections, in this thesis work the critical drug-binding positions within the GPCRs were identified and analyzed on the basis of their overall sequence, their transmembrane regions, and their drug-binding fingerprints. A classification based on drug-binding fingerprints was carried out as the basis for pharmacophore modelling and virtual screening, which facilitates the development of more specific and focused targeted libraries for HTS.</p>
Analysis of transmembrane and globular protein depending on their solvent energy. Wakadkar, Sachin. January 2009.
<p>The number of experimentally determined protein structures in the Protein Data Bank (PDB) is continuously increasing. Common features such as cellular location, function, topology, primary structure, secondary structure, tertiary structure, domains, or fold are used to classify them, so various methods are available for the classification of proteins. In this work we investigate an additional property for making such a classification: solvent energy. Solvation is one of the most important properties by which macromolecules and biological membranes remain stabilized in different environments, and the energy required for solvation can be measured in terms of solvent energy. Proteins from similar environments are expected to have similar solvent energies; that is, solvent energy can be used as a measure to analyze and classify proteins. In this project the solvent energy of proteins in the PDB was calculated using Jones' algorithm, and the proteins were classified into two classes: transmembrane and globular. Statistical analysis showed that the solvent energy values obtained for the two classes come from different populations. Thus, a classification based on solvent energy can help predict a protein's cellular placement.</p>
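The statistical step in this abstract, checking that the two classes' solvent energies come from different populations, can be sketched as follows. The energy values below are synthetic placeholders in arbitrary units, not output of Jones' algorithm, and the particular test (Mann-Whitney U) is an illustrative choice rather than necessarily the one used in the thesis.

```python
import numpy as np
from scipy import stats

# Hypothetical solvent-energy values (arbitrary units); in the thesis these
# would come from applying Jones' algorithm to PDB structures.
rng = np.random.default_rng(0)
globular = rng.normal(loc=-40.0, scale=8.0, size=200)
transmembrane = rng.normal(loc=-15.0, scale=8.0, size=200)

# Nonparametric two-sample test: do the two classes come from different populations?
u_stat, p_value = stats.mannwhitneyu(globular, transmembrane)
```

A small p-value supports treating solvent energy as a discriminating feature; a classifier could then threshold a new structure's energy against the two empirical distributions.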
A bioinformatician's view on the evolution of smell perception. Anders, Patrizia. January 2006.
<p>Background:</p><p>The origin of vertebrate sensory systems still holds many mysteries, and thus challenges for bioinformatics. The evolution of the sense of smell in particular poses important puzzles, notably the question of whether the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study, to determine their relationships across several vertebrate species.</p><p>Results:</p><p>Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge for multiple sequence alignment. However, recent dramatic improvements in alignment tools allow for better results and higher confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting maximum-likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of the tetrapods.</p><p>Conclusions:</p><p>The connections between the two systems of smell perception may be much more fundamental than the common architecture of their receptors. A better understanding of these parallels is desirable, not only with respect to our view of evolution, but also in the context of further exploring the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.</p>
Integrating Prior Knowledge into the Fitness Function of an Evolutionary Algorithm for Deriving Gene Regulatory Networks. Birkmeier, Bettina. January 2006.
<p>The topic of gene regulation is a major research area in the bioinformatics community. In this thesis prior knowledge from Gene Ontology in the form of templates is integrated into the fitness function of an evolutionary algorithm to predict gene regulatory networks. The resulting multi-objective fitness functions are then tested with MAPK network data taken from KEGG to evaluate their respective performances. The results are presented and analyzed. However, a clear tendency cannot be observed. The results are nevertheless promising and can provide motivation for further research in that direction. Therefore different ideas and approaches are suggested for future work.</p>
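A multi-objective fitness function of the kind this abstract describes can be sketched as a weighted combination of a data-fit term and a prior-knowledge term derived from template matches. The weighting scheme, the template-match score, and all numbers below are illustrative assumptions, not the thesis's actual formulation.

```python
def combined_fitness(data_fit, template_matches, n_templates, alpha=0.7):
    """Multi-objective fitness for a candidate gene regulatory network.

    data_fit:         how well the network reproduces the expression data, in [0, 1]
    template_matches: number of Gene Ontology-derived templates the network satisfies
    n_templates:      total number of templates available
    alpha:            weight on the data-fit objective
    """
    prior_score = template_matches / n_templates if n_templates else 0.0
    return alpha * data_fit + (1.0 - alpha) * prior_score

# Rank two hypothetical candidate networks: one fits the data slightly worse
# but satisfies more prior-knowledge templates.
a = combined_fitness(0.80, 4, 5)
b = combined_fitness(0.85, 1, 5)
```

With this weighting the first candidate ranks higher, showing how prior knowledge can steer the evolutionary search away from networks that fit the data but contradict known biology.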
Using an ontology to enhance metabolic or signaling pathway comparisons by biological and chemical knowledge. Pohl, Matin. January 2006.
<p>Motivation:</p><p>As genome-scale efforts to investigate the metabolic networks of various organisms are ongoing, the amount of pathway data is growing. Simultaneously, an increasing amount of gene expression data from microarrays becomes available for reverse engineering, yielding, for example, hypothetical regulatory pathway data. To keep this data from becoming unmanageable, and to keep track of genuinely new information, analysis tools are needed. One vital task is the comparison of pathways, to detect similar functionality and overlaps or, in the case of reverse engineering, to detect known data corroborating a hypothetical pathway. A comparison method using ontological knowledge about molecules and reactions offers a more biological point of view than graph-theoretical approaches have so far. Such a comparison approach based on an ontology is described in this report.</p><p>Results:</p><p>An algorithm is introduced that compares pathways component by component. The method was applied to two selected databases, and the results showed that it is not satisfactory as a stand-alone method. Possibilities for further development are suggested, and steps toward an integrated method combining several approaches are recommended.</p><p>Availability:</p><p>The source code, the database snapshots used, and pictures can be requested from the author.</p>
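The component-by-component comparison described in this abstract can be sketched as matching pathway components by the overlap of their ontology annotations. The annotations, component names, similarity measure (Jaccard), and threshold below are invented illustrations, not the report's actual algorithm or data.

```python
# Hypothetical ontology annotations for pathway components (GO/ChEBI-style terms)
annotations = {
    "glucose": {"monosaccharide", "hexose"},
    "fructose": {"monosaccharide", "hexose"},
    "hexokinase": {"kinase", "transferase"},
    "phosphofructokinase": {"kinase", "transferase"},
}

def component_similarity(a, b):
    """Jaccard overlap of the ontology terms annotating two components."""
    ta, tb = annotations[a], annotations[b]
    return len(ta & tb) / len(ta | tb)

def compare_pathways(p1, p2, threshold=0.5):
    """Greedy component-by-component matching between two pathways."""
    matches = []
    unmatched = list(p2)
    for c1 in p1:
        best = max(unmatched, key=lambda c2: component_similarity(c1, c2), default=None)
        if best is not None and component_similarity(c1, best) >= threshold:
            matches.append((c1, best))
            unmatched.remove(best)
    return matches

pairs = compare_pathways(["glucose", "hexokinase"], ["fructose", "phosphofructokinase"])
```

Matching on shared ontology terms lets chemically analogous components align even when their names and graph positions differ, which is exactly what purely graph-theoretical comparison misses.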
Time course simulation replicability of SBML-supporting biochemical network simulation tools. Sentausa, Erwin. January 2006.
<p>Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge about the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not yet certain whether SBML models can actually be used and exchanged among tools with different purposes and interfaces. In particular, it is not clear whether dynamic simulations of SBML models are replicable across different modelling and simulation packages.</p><p>Results: Time series simulations of published biological models in SBML format were performed using four SBML-supporting modelling and simulation tools, to evaluate whether the tools correctly replicate the simulation results. Some of the tools failed to integrate some models, and the time series output of the successful simulations differed between tools.</p><p>Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This inability to replicate simulation results may harm the peer-review process of biological modelling and simulation work and should be addressed accordingly, for example by specifying in the SBML model the exact algorithm or simulator used to produce the simulation result.</p>
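The replicability comparison this abstract describes can be sketched without any SBML tooling: integrate one small reaction model with two different numerical methods (standing in for two simulators) and quantify how far the resulting time courses diverge. The model, tolerances, and choice of integrators are illustrative assumptions, not the thesis's test setup.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A stand-in for an SBML model: irreversible Michaelis-Menten conversion S -> P
def rates(t, y, vmax=1.0, km=0.5):
    s, p = y
    v = vmax * s / (km + s)
    return [-v, v]

t_eval = np.linspace(0.0, 10.0, 101)
runs = {}
for method in ("RK45", "LSODA"):  # two integrators standing in for two simulators
    sol = solve_ivp(rates, (0.0, 10.0), [1.0, 0.0], method=method,
                    t_eval=t_eval, rtol=1e-8, atol=1e-10)
    runs[method] = sol.y

# Replicability check: worst-case pointwise disagreement between the two runs
max_diff = np.max(np.abs(runs["RK45"] - runs["LSODA"]))
```

At tight tolerances the two integrators agree closely; the cross-tool discrepancies the thesis reports arise when simulators differ in algorithm, default settings, or their interpretation of the SBML model itself.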
A method for extracting pathways from Scansite-predicted protein-protein interactions. Simu, Tiberiu. January 2006.
<p>Protein interaction is an important mechanism for cellular functionality. Computational predictions of protein interactions are available in many publicly accessible resources (for example, Scansite), and these predictions can be combined with other information sources to generate hypothetical pathways. However, building pathways with computational methods can become time consuming, as it requires multiple iterations and the consolidation of data from different sources. We have tested whether it is possible to generate graphs of protein-protein interactions using only domain-motif interaction data, and to what degree this process can be automated, by developing a program that aggregates, under user guidance, query results from different information sources. The data sources used are Scansite and SwissProt. The graphs are visualised with Osprey, an external program freely available for academic purposes. The graphs obtained by running the software show that although it is possible to combine publicly available data with theoretical protein-protein interaction predictions from Scansite, further efforts are needed to increase the biological plausibility of these data collections. It is possible, however, to reduce the dimensionality of the obtained graphs by focusing the searches on a particular tissue of interest.</p>
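The graph-building step this abstract describes can be sketched as turning a list of predicted domain-motif interaction pairs into an adjacency structure. The protein names and pairs below are hypothetical examples, not actual Scansite or SwissProt output, and the real program additionally merges annotations from both sources.

```python
from collections import defaultdict

# Hypothetical Scansite-style predictions: (domain-containing protein, motif-containing protein)
predicted_pairs = [
    ("SRC", "EGFR"), ("GRB2", "EGFR"), ("GRB2", "SOS1"),
    ("SOS1", "HRAS"), ("PIK3R1", "EGFR"),
]

def build_graph(pairs):
    """Undirected adjacency map built from predicted interaction pairs."""
    g = defaultdict(set)
    for a, b in pairs:
        g[a].add(b)
        g[b].add(a)
    return g

def neighbors_of(g, protein):
    """Sorted interaction partners of a protein (empty list if unknown)."""
    return sorted(g.get(protein, ()))

graph = build_graph(predicted_pairs)
```

A graph in this form can be exported for visualisation, and the tissue-focusing step the abstract mentions amounts to filtering `predicted_pairs` by a tissue-expression annotation before building the graph.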