1 |
Finding functional groups of genes using pairwise relational data: methods and applications (Brumm, Jochen)
Genes, the fundamental building blocks of life, act together (often through their derived proteins) in modules such as protein complexes and molecular pathways to achieve a cellular function such as DNA repair and cellular transport. A current emphasis in genomics research is to identify gene modules from gene profiles, which are measurements (such as a mutant phenotype or an expression level) associated with the individual genes under conditions of interest; genes in modules often have similar gene profiles. Clustering groups of genes with similar profiles can hence deliver candidate gene modules.
Pairwise similarity measures derived from these profiles are used as input to the popular hierarchical agglomerative clustering algorithms; however, these algorithms offer little guidance on how to choose candidate modules and how to improve a clustering as new data becomes available. As an alternative, there are methods based on thresholding the similarity values to obtain a graph; such a graph can be analyzed through (probabilistic) methods developed in the social sciences. However, thresholding the data discards valuable information and choosing the threshold is difficult.
Extending binary relational analysis, we exploit ranked relational data as the basis for two distinct approaches for identifying modules from genomic data, both based on the theory of random graph processes. We propose probabilistic models for ranked relational data that allow candidate modules to be accompanied by objective confidence scores and that permit an elegant integration of external information on gene-gene relationships.
We first follow theoretical work by Ling to objectively select exceptionally isolated groups as candidate gene modules. Second, inspired by stochastic block models used in the social sciences, we construct a novel model for ranked relational data in which all genes have hidden module parameters that govern the strength of all gene-gene relationships. Adapting a classical likelihood often used for the analysis of horse races, clustering is performed by estimating the module parameters using standard Bayesian methods. The method allows the incorporation of prior information on gene-gene relationships; the utility of prior information in the form of protein-protein interaction data is demonstrated in the clustering of yeast mutant phenotype profiles.
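The "horse race" likelihood referred to above is commonly identified with the Plackett-Luce ranking model, in which each item carries a positive worth parameter and a ranking is generated by successive choices among the items not yet ranked. A minimal sketch of that likelihood (the function name and toy inputs are ours, not the thesis's):

```python
import numpy as np

def plackett_luce_loglik(ranking, strengths):
    """Log-likelihood of one observed ranking under the Plackett-Luce model.

    ranking: item indices ordered from first-ranked to last-ranked.
    strengths: positive 'worth' parameter per item, analogous to the hidden
    module parameters that govern gene-gene relationship strength.
    """
    s = np.asarray(strengths, dtype=float)
    ll = 0.0
    remaining = list(ranking)
    for winner in ranking:
        # Probability that the next-ranked item "wins" against all still unranked.
        ll += np.log(s[winner]) - np.log(s[remaining].sum())
        remaining.remove(winner)
    return ll
```

With equal strengths every ordering of three items is equally likely, so the log-likelihood of any ranking is log(1/3!) = -log 6.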
|
4 |
Bayesian hierarchical models for linear networks (Al-Kaabawi, Zainab A. A., January 2018)
A motorway network can be treated as a linear network. The purpose of this study is to highlight dangerous motorways by estimating the intensity of accidents and studying its pattern across the UK motorway network. Two mechanisms are adopted to achieve this aim. In the first, the motorway-specific intensity is estimated by modelling the point pattern of the accident data using a homogeneous Poisson process. The homogeneous Poisson process is used to model all intensities, but heterogeneity across motorways is incorporated using two-level hierarchical models. The data structure is multilevel, since each motorway consists of junctions that are joined by grouped segments. In the second mechanism, the segment-specific intensity is estimated by modelling the point pattern of the accident data. The homogeneous Poisson process is used to model accident data within segments, but heterogeneity across segments is incorporated using three-level hierarchical models. A Bayesian method via Markov chain Monte Carlo simulation algorithms is used to estimate the unknown parameters in the models, and sensitivity to the choice of prior is assessed. The performance of the proposed models is checked through a simulation study and an application to traffic accidents in 2016 on the UK motorway network. The performance of the three-level frequentist model was poor. The deviance information criterion (DIC) and the widely applicable information criterion (WAIC) are employed to choose between the two-level and three-level Bayesian hierarchical models; the results showed that the best-fitting model was the three-level Bayesian hierarchical model.
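The lowest level of such a model is the homogeneous Poisson process: accident counts on a segment of length L with intensity lambda are Poisson(lambda * L), and a conjugate Gamma prior gives the posterior in closed form. The thesis uses MCMC for the full hierarchical model; the sketch below illustrates only this conjugate building block, and the prior values and toy counts are made up for illustration:

```python
import numpy as np

def intensity_posterior(counts, lengths, a0=1.0, b0=1.0):
    """Posterior for a homogeneous Poisson intensity shared by road segments.

    counts: accident counts per segment; lengths: segment lengths (km).
    With a Gamma(a0, b0) prior (shape, rate), conjugacy gives the posterior
    Gamma(a0 + sum(counts), b0 + sum(lengths)) in closed form.
    """
    a = a0 + float(np.sum(counts))
    b = b0 + float(np.sum(lengths))
    return a, b  # posterior mean is a / b accidents per km

a, b = intensity_posterior(counts=[3, 0, 5], lengths=[2.0, 1.0, 4.0])
print(a / b)  # posterior mean intensity in accidents per km
```

Hierarchical versions replace the single shared lambda with segment- or motorway-level intensities tied together by higher-level priors, which is what the MCMC machinery then samples.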
|
5 |
Bayesian mediation analysis for partially clustered designs (Chu, Yiyi, 5 December 2013)
Partially clustered designs are common in medicine, the social sciences, and intervention and psychological research. With some participants clustered and others not, the structure of partially clustered data is not parallel. Despite its common occurrence in practice, limited attention has been given to the evaluation of intervention effects in partially clustered data. Mediation analysis is used to identify the mechanism underlying the relationship between an independent variable and a dependent variable via a mediator variable. While most of the literature focuses on conventional frequentist mediation models, no research has yet studied a Bayesian mediation model in the context of a partially clustered design. The primary objectives of this paper are therefore to address conceptual considerations in estimating mediation effects in partially clustered randomized designs, and to examine the performance of the proposed model using both simulated data and real data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 (ECLS-K). A small-scale simulation study was also conducted; the results indicate that under large sample sizes, negligible relative parameter bias was found in the Bayesian estimates of the indirect effects and of the covariance between the components of the indirect effect. Coverage rates for the 95% credible interval for these two estimates were close to the nominal level. These results support use of the proposed Bayesian model for partially clustered mediation when the sample size is moderately large.
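A convenience of the Bayesian approach to mediation is that the indirect effect, the product of the treatment-to-mediator path a and the mediator-to-outcome path b, has a posterior that is simply the elementwise product of the posterior draws, with credible intervals read off directly. A sketch with fabricated draws standing in for MCMC output (the means, spreads, and seed are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for the two mediation paths:
# a (treatment -> mediator) and b (mediator -> outcome).
a_draws = rng.normal(0.5, 0.1, size=10_000)
b_draws = rng.normal(0.3, 0.1, size=10_000)

# The posterior of the indirect effect a*b is the elementwise product of
# the draws; this automatically propagates the joint uncertainty in (a, b).
indirect = a_draws * b_draws
lo, hi = np.percentile(indirect, [2.5, 97.5])
print(f"posterior mean {indirect.mean():.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

This avoids the normality assumption of the classical Sobel standard error for a*b, which is one motivation for interval estimates based on draws.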
|
6 |
Bayesian methods to improve the assessment and management advice of anchovy in the Bay of Biscay (Contreras, Leire Ibaibarriaga, January 2012)
No description available.
|
7 |
Understanding and predicting global leaf phenology using satellite observations of vegetation (Caldararu, Silvia, January 2013)
Leaf phenology refers to the timing of leaf life cycle events and is essential to our understanding of the earth system, as it impacts the terrestrial carbon and water cycles and, indirectly, global climate through changes in surface roughness and albedo. Traditionally, leaf phenology is described as a response to higher temperatures in spring and lower temperatures in autumn for temperate regions. With the advent of carbon ecosystem models, however, we need a better representation of seasonal cycles, one that is able to explain phenology in different areas around the globe, including tropical regions, and has the capacity to predict phenology under future climates. We propose a global phenology model based on the hypothesis that phenology is a strategy through which plants reach optimal carbon assimilation. We fit this 14-parameter model to five years of spaceborne leaf area index data using a Bayesian fitting algorithm, and we use it to simulate leaf seasonal cycles across the globe. We explain the observed increase in leaf area over the Amazon basin during the dry season through an increase in available direct solar radiation. Seasonal cycles in dry tropical areas are explained by the variation in water availability, while phenology at higher latitudes is driven by changes in temperature and daylength. We explore the hypothesis that phenological traits can be explained at the biome (plant functional group) level, and we show that some characteristics can only be explained at the species level due to local factors such as water and nutrient availability. We anticipate that our work can be incorporated into larger earth system models and used to predict future phenological patterns.
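Fitting a parametric seasonal-cycle model to satellite leaf area index (LAI) observations, as described above, can be done with standard posterior sampling. Below is a heavily simplified, one-parameter stand-in (the thesis fits a 14-parameter optimality model; the curve shape, noise level, and sampler settings here are all illustrative) using random-walk Metropolis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "observations": a sinusoidal annual LAI cycle at an 8-day
# composite cadence, plus Gaussian observation noise.
days = np.arange(0.0, 365.0, 8.0)
true_amp = 2.5
lai_obs = true_amp * (1 + np.sin(2 * np.pi * days / 365)) / 2 \
          + rng.normal(0, 0.1, days.size)

def log_post(amp):
    if amp <= 0:
        return -np.inf                       # flat prior on amp > 0
    pred = amp * (1 + np.sin(2 * np.pi * days / 365)) / 2
    return -0.5 * np.sum((lai_obs - pred) ** 2) / 0.1 ** 2

# Random-walk Metropolis over the single amplitude parameter.
amp, lp = 1.0, log_post(1.0)
samples = []
for _ in range(5000):
    prop = amp + rng.normal(0, 0.05)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
        amp, lp = prop, lp_prop
    samples.append(amp)
post = np.array(samples[1000:])               # discard burn-in
print(post.mean())  # should sit near the true amplitude of 2.5
```

The real problem differs mainly in dimension and in the mechanistic form of the predicted LAI curve, not in the structure of the sampler.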
|
8 |
Bayesian methods in music modelling (Peeling, Paul Halliday, January 2011)
This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from musical theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and the alignment of the timings to the underlying score of the music. A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov Chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate. We emphasize the links between the models and inference algorithms developed in this thesis with that in existing and parallel work, and demonstrate the effects of making modifications to these models both theoretically and by means of experimental results.
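At the lowest level described above, a frame of audio is modelled as a sum of sinusoidal basis functions at harmonics of a fundamental frequency. A non-Bayesian sketch of that signal model, fitting the partial amplitudes by least squares (the function, sample rate, and synthetic note are our illustration, not the thesis's inference machinery):

```python
import numpy as np

def fit_harmonic_amplitudes(frame, sr, f0, n_partials=5):
    """Least-squares fit of sine/cosine amplitudes at harmonics of f0.

    The frame is approximated as a sum of sinusoids at integer multiples
    of the fundamental f0; returns the fitted amplitude of each partial.
    """
    t = np.arange(len(frame)) / sr
    cols = []
    for k in range(1, n_partials + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    basis = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    # Amplitude of partial k combines its cosine and sine coefficients.
    return np.hypot(coef[0::2], coef[1::2])

# A synthetic note: 440 Hz fundamental plus a weaker second partial.
sr = 8000
t = np.arange(1024) / sr
frame = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
amps = fit_harmonic_amplitudes(frame, sr, f0=440.0)
print(amps[:2])  # ≈ [1.0, 0.3]
```

The thesis's models place priors over these amplitudes (and over f0, inharmonicity, modulation, and so on) and infer them jointly, rather than fixing f0 and fitting by least squares as done here.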
|
9 |
Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure (September 2015)
Feature selection is demanded in many modern scientific research problems that use high-dimensional data. A typical example is to find the genes that are most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. There are tremendous difficulties in eliminating a large number of useless or redundant features. The expression levels of genes have structure; for example, a group of co-regulated genes that have similar biological functions tend to have similar mRNA expression levels. Many statistical methods have been proposed to take the grouping structure into consideration in feature selection and regression, including Group LASSO, Supervised Group LASSO, and regression on group representatives.
In this thesis, we propose to use a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors to perform selection in features with grouping structure. We refer to this method as fully Bayesian T-probit. Its main feature is that it can make feature selection within groups automatically, without pre-specification of the grouping structure, and discard noise features more efficiently than LASSO (Least Absolute Shrinkage and Selection Operator). Therefore, the feature subsets selected by fully Bayesian T-probit are significantly more sparse than subsets selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret or understand based on existing biological knowledge and further experimental investigation. We use simulated and real datasets to demonstrate that the predictive performance of the more sparse feature subsets selected by fully Bayesian T-probit is comparable with that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression and the t-test. In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than feature subsets of the same size taken from the top features selected by the aforementioned methods.
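The target density that such an MCMC scheme explores combines a probit likelihood with heavy-tailed Student-t priors on the coefficients: the heavy tails shrink noise coefficients toward zero while leaving large, genuine effects nearly unpenalized. A sketch of that unnormalized log-posterior (the function name, default degrees of freedom, and toy data are ours; the thesis's actual model and sampler are more elaborate):

```python
import numpy as np
from scipy.stats import norm, t as student_t

def tprobit_log_posterior(beta, X, y, df=1.0, scale=1.0):
    """Unnormalized log-posterior of a probit model with Student-t priors.

    This is the kind of density Hamiltonian Monte Carlo would explore;
    df=1 gives Cauchy priors, an example of a heavy-tailed choice.
    """
    eta = X @ beta
    # Probit likelihood: P(y=1 | x) = Phi(x' beta).
    loglik = np.sum(np.where(y == 1, norm.logcdf(eta), norm.logcdf(-eta)))
    logprior = np.sum(student_t.logpdf(beta, df=df, scale=scale))
    return loglik + logprior

# Toy evaluation at beta = 0, where Phi(0) = 0.5 for every observation.
X = np.ones((4, 2))
y = np.array([0, 1, 0, 1])
print(tprobit_log_posterior(np.zeros(2), X, y))
```

At beta = 0 the value is checkable by hand: 4 log(0.5) from the likelihood plus twice the Cauchy log-density at zero.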
|
10 |
Bayesian Spatial Quantile Regression (Reich, B. J.; Fuentes, M.; Dunson, D. B.)
Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Because of its strong dependence on weather conditions, ozone may be sensitive to climate change, and there is great interest in studying the potential effect of climate change on ozone and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site to site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, so we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997-2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that, holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast.
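The building block of quantile regression is the check (pinball) loss: the value that minimizes it over constants is the sample tau-quantile, which is the frequentist analogue of the quantile likelihood such Bayesian spatial models build on. A sketch with simulated data (a toy illustration, not the paper's spatial model):

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Check (pinball) loss minimized by the tau-th conditional quantile."""
    u = y - pred
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

# Minimizing the pinball loss over a grid of constants recovers the
# sample 0.9-quantile of the data.
rng = np.random.default_rng(1)
y = rng.normal(size=10_000)
q90 = np.quantile(y, 0.9)
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, c, 0.9) for c in grid]
best = grid[int(np.argmin(losses))]
print(best, q90)  # the grid minimizer sits near the sample 0.9-quantile
```

Replacing the constant with a site-specific linear predictor, and smoothing the coefficients with a spatial prior, gives the flavor of the model described in the abstract.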
|