1

Bayes linear covariance matrix adjustment

Wilkinson, Darren James January 1995
In this thesis, a Bayes linear methodology for the adjustment of covariance matrices is presented and discussed. A geometric framework for quantifying uncertainties about covariance matrices is set up, and an inner-product for spaces of random matrices is motivated and constructed. The inner-product on this space captures aspects of belief about the relationships between covariance matrices of interest, providing a structure rich enough to adjust beliefs about unknown matrices in the light of data such as sample covariance matrices, exploiting second-order exchangeability and related specifications to obtain representations that allow analysis. Adjustment is associated with orthogonal projection and is illustrated by examples for some common problems. The difficulties of adjusting the covariance matrices underlying exchangeable random vectors are tackled and discussed. Learning about the covariance matrices associated with multivariate time series dynamic linear models is shown to be amenable to a similar approach. Diagnostics for matrix adjustments are also discussed.
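For context, the standard Bayes linear adjustment of a collection of quantities B by observed data D is given below; the thesis lifts this adjusted expectation and adjusted variance, via orthogonal projection in the constructed inner-product space, to the case where the quantities of interest are covariance matrices. The relations below are the generic Bayes linear formulas, not the thesis's matrix-specific construction.

```latex
% Generic Bayes linear adjusted expectation and adjusted variance of B by D.
\[
  \mathrm{E}_D(B) \;=\; \mathrm{E}(B) \;+\;
    \mathrm{Cov}(B, D)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr),
\]
\[
  \mathrm{Var}_D(B) \;=\; \mathrm{Var}(B) \;-\;
    \mathrm{Cov}(B, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, B).
\]
```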
2

RVD2: An ultra-sensitive variant detection model for low-depth heterogeneous next-generation sequencing data

He, Yuting 29 April 2014
Motivation: Next-generation sequencing technology is increasingly being used for clinical diagnostic tests. Unlike research cell lines, clinical samples are often genomically heterogeneous due to low sample purity or the presence of genetic subpopulations. Therefore, a variant calling algorithm for calling low-frequency polymorphisms in heterogeneous samples is needed. Results: We present a novel variant calling algorithm that uses a hierarchical Bayesian model to estimate allele frequency and call variants in heterogeneous samples. We show that our algorithm improves upon current classifiers and has higher sensitivity and specificity over a wide range of median read depths and minor allele frequencies. We apply our model and identify twelve mutations in the PAXP1 gene in a matched clinical breast ductal carcinoma tumor sample, two of which are loss-of-heterozygosity events.
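As a rough illustration of the kind of calculation involved (not the hierarchical model of the thesis, which pools information across positions and control samples), a single-position Beta-Binomial sketch of calling a low-frequency variant from read counts might look as follows; the error rate, prior, and posterior cutoff are hypothetical values chosen for the example.

```python
import numpy as np
from scipy import stats

def call_variant(alt_reads, depth, error_rate=0.001,
                 alpha=1.0, beta=1.0, posterior_cutoff=0.95):
    """Toy Beta-Binomial variant call at one position (illustrative only).

    alt_reads / depth : non-reference read count and total read depth
    error_rate        : assumed sequencing error rate (hypothetical value)
    alpha, beta       : Beta prior on the minor allele frequency
    """
    # Posterior over the allele frequency under a Beta(alpha, beta) prior
    posterior = stats.beta(alpha + alt_reads, beta + depth - alt_reads)
    # Probability that the true frequency exceeds the assumed error rate
    p_variant = posterior.sf(error_rate)
    return p_variant > posterior_cutoff, p_variant

# Example: 8 non-reference reads out of 4000 (about a 0.2% minor allele frequency)
print(call_variant(alt_reads=8, depth=4000))
```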
3

Bayesian Information Fusion for Precision Indoor Location

Cavanaugh, Andrew F 07 February 2011
This thesis documents work which is part of the ongoing effort by the Worcester Polytechnic Institute (WPI) Precision Personnel Locator (PPL) project to track and locate first responders in urban/indoor settings. Specifically, the project intends to produce a system which can accurately determine the floor that a person is on, as well as where on the floor that person is, with sub-meter accuracy. The system must be portable, rugged, fast to set up, and require no pre-installed infrastructure. Several recent advances have enabled us to get closer to meeting these goals: the development of the Transactional Array Reconciliation Tomography (TART) algorithm and corresponding locator hardware, as well as the integration of barometric sensors and a new antenna deployment scheme. To fully utilize these new capabilities, a Bayesian fusion algorithm has been designed. The goal of this thesis is to present the necessary methods for incorporating diverse sources of information, in a constructive manner, to improve the performance of the PPL system. While the conceptual methods presented within are meant to be general, the experimental results focus on the fusion of barometric height estimates and RF data. These information sources are processed with our existing Singular Value Array Reconciliation Tomography (σART) algorithm and the new TART algorithm, using a Bayesian fusion algorithm to estimate indoor locations more accurately.
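In the simplest Gaussian case, the fusion step amounts to a precision-weighted combination of an RF-derived height estimate with a barometric estimate. The sketch below is a generic textbook fusion under assumed independent Gaussian errors, not the PPL system's TART/σART code, and the numbers are hypothetical.

```python
def fuse_gaussian(mean_rf, var_rf, mean_baro, var_baro):
    """Fuse two independent Gaussian height estimates (illustrative sketch).

    The fused estimate is again Gaussian, with precision equal to the sum of
    the individual precisions and mean equal to the precision-weighted average.
    """
    precision = 1.0 / var_rf + 1.0 / var_baro
    mean = (mean_rf / var_rf + mean_baro / var_baro) / precision
    return mean, 1.0 / precision

# Hypothetical numbers: RF says 7.2 m (sigma 1.5 m), barometer says 6.4 m (sigma 0.5 m)
print(fuse_gaussian(7.2, 1.5 ** 2, 6.4, 0.5 ** 2))
```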
4

Bayesian Logistic Regression with Spatial Correlation: An Application to Tennessee River Pollution

Marjerison, William M 15 December 2006
"We analyze data (length, weight and location) from a study done by the Army Corps of Engineers along the Tennessee River basin in the summer of 1980. The purpose is to predict the probability that a hypothetical channel catfish at a location studied is toxic and contains 5 ppm or more DDT in its filet. We incorporate spatial information and treate it separetely from other covariates. Ultimately, we want to predict the probability that a catfish from the unobserved location is toxic. In a preliminary analysis, we examine the data for observed locations using frequentist logistic regression, Bayesian logistic regression, and Bayesian logistic regression with random effects. Later we develop a parsimonious extension of Bayesian logistic regression and the corresponding Gibbs sampler for that model to increase computational feasibility and reduce model parameters. Furthermore, we develop a Bayesian model to impute data for locations where catfish were not observed. A comparison is made between results obtained fitting the model to only observed data and data with missing values imputed. Lastly, a complete model is presented which imputes data for missing locations and calculates the probability that a catfish from the unobserved location is toxic at once. We conclude that length and weight of the fish have negligible effect on toxicity. Toxicity of these catfish are mostly explained by location and spatial effects. In particular, the probability that a catfish is toxic decreases as one moves further downstream from the source of pollution."
5

An investigation of a Bayesian decision-theoretic procedure in the context of mastery tests

Hsieh, Ming-Chuan 01 January 2007
The purpose of this study was to extend Glas and Vos's (1998) Bayesian procedure to the 3PL IRT model by using the MCMC method. In the context of fixed-length mastery tests, the Bayesian decision-theoretic procedure was compared with two conventional procedures (conventional-Proportion Correct and conventional-EAP) across different simulation conditions. Several simulation conditions were investigated, including two loss functions (linear and threshold loss), three item pools (high discrimination, moderate discrimination, and a real item pool), and three test lengths (20, 40, and 60 items). Different loss parameters were manipulated in the Bayesian decision-theoretic procedure to examine the effectiveness of controlling false positive and false negative errors. The degree of decision accuracy for the Bayesian decision-theoretic procedure using both the 3PL and 1PL models was also compared. Four criteria, including the percentages of correct classifications, false positive error rates, false negative error rates, and phi correlations between the true and observed classification status, were used to evaluate the results of this study. According to these criteria, the Bayesian decision-theoretic procedure appeared to effectively control false negative and false positive error rates. The differences in the percentages of correct classifications and phi correlations between true and predicted status for the Bayesian decision-theoretic procedure and the conventional procedures were quite small. The results also showed that there was no consistent advantage for either the linear or the threshold loss function. In relation to the four criteria used in this study, the values produced by these two loss functions were very similar. One of the purposes of this study was to extend the Bayesian procedure from the 1PL to the 3PL model. The results showed that when the datasets were simulated to fit the 3PL model, using the 1PL model in the Bayesian procedure yielded less accurate results. However, when the datasets were simulated to fit the 1PL model, using the 3PL model in the Bayesian procedure yielded reasonable classification accuracies in most cases. Thus, the use of the Bayesian decision-theoretic procedure with the 3PL model seemed quite promising in the context of fixed-length mastery tests.
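The decision rule under a threshold loss function can be sketched as a comparison of posterior expected losses computed from MCMC draws of ability, as below; the loss parameters and the posterior are hypothetical, and the sketch omits the 3PL measurement model itself.

```python
import numpy as np

def mastery_decision(theta_draws, cutoff, loss_false_positive=2.0,
                     loss_false_negative=1.0):
    """Threshold-loss mastery decision from posterior ability draws (illustrative).

    theta_draws : MCMC draws of the examinee's ability
    cutoff      : mastery cut score on the theta scale
    The loss parameters are hypothetical; raising loss_false_positive makes
    the rule more conservative about declaring mastery.
    """
    p_master = np.mean(theta_draws >= cutoff)          # posterior P(true master)
    expected_loss_pass = (1.0 - p_master) * loss_false_positive
    expected_loss_fail = p_master * loss_false_negative
    return "pass" if expected_loss_pass < expected_loss_fail else "fail"

# Hypothetical posterior: ability centred just above a cutoff of 0.0
rng = np.random.default_rng(0)
print(mastery_decision(rng.normal(0.3, 0.4, size=5000), cutoff=0.0))
```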
6

Municipal-level estimates of child mortality for Brazil : a new approach using Bayesian statistics

McKinnon, Sarah Ann 14 December 2010
Current efforts to measure child mortality for municipalities in Brazil are hampered by the relative rarity of child deaths, which often results in unstable and unreliable estimates. As a result, it is not possible to accurately assess true levels of child mortality for many areas, hindering efforts towards constructing and implementing effective policy initiatives for the reduction of child mortality. However, with a spatial smoothing process based upon Bayesian statistics it is possible to “borrow” information from neighboring areas in order to generate more stable and accurate estimates of mortality in smaller areas. The objective of this study is to use this spatial smoothing process to derive estimates of child mortality at the level of the municipality in Brazil. Using data from the 2000 Brazil Census, I derive both Bayesian and non-Bayesian estimates of mortality for each municipality. In comparing the smoothed and raw estimates of this parameter, I find that the Bayesian estimates yield a clearer spatial pattern of child mortality with smaller variances in less populated municipalities, thus more accurately reflecting the true mortality situation of those municipalities. These estimates can then be used, ultimately, to inform more effective policies and health initiatives in the fight to reduce child mortality in Brazil.
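The "borrowing" idea can be illustrated by shrinking a municipality's raw rate toward the pooled rate of its neighbors, with the amount of shrinkage determined by the number of births observed. This is only a stand-in for the full Bayesian spatial model; the prior strength and the counts are hypothetical.

```python
import numpy as np

def smoothed_rate(deaths, births, neighbor_deaths, neighbor_births,
                  prior_strength=500.0):
    """Illustrative shrinkage of a municipal child-mortality rate toward the
    pooled rate of its neighbors (a stand-in for the full Bayesian model).

    prior_strength : pseudo-births assigned to the neighborhood rate, a
                     hypothetical tuning value; larger values mean more smoothing.
    """
    neighbor_rate = np.sum(neighbor_deaths) / np.sum(neighbor_births)
    # Small municipalities (few births) are pulled strongly toward their
    # neighbors; large ones keep a rate close to their raw estimate.
    return (deaths + prior_strength * neighbor_rate) / (births + prior_strength)

# Hypothetical small municipality: 2 deaths in 60 births, neighbors near 25 per 1000
print(smoothed_rate(2, 60, neighbor_deaths=[30, 18, 22],
                    neighbor_births=[1200, 700, 900]))
```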
7

Bayesian inference for models with infinite-dimensionally generated intractable components

Villalobos, Isadora Antoniano January 2012
No description available.
8

Bayesian matrix factorisation : inference, priors, and data integration

Brouwer, Thomas Alexander January 2017
In recent years the amount of biological data has increased exponentially. Most of these data can be represented as matrices relating two different entity types, such as drug-target interactions (relating drugs to protein targets), gene expression profiles (relating drugs or cell lines to genes), and drug sensitivity values (relating drugs to cell lines). Not only is the size of these datasets increasing, but so is the number of different entity types that they relate. Furthermore, not all values in these datasets are typically observed, and some datasets are very sparse. Matrix factorisation is a popular group of methods that can be used to analyse these matrices. The idea is that each matrix can be decomposed into two or more smaller matrices, such that their product approximates the original one. This factorisation of the data reveals patterns in the matrix, and gives us a lower-dimensional representation. Not only can we use this technique to identify clusters and other biological signals, we can also predict the unobserved entries, allowing us to prune biological experiments. In this thesis we introduce and explore several Bayesian matrix factorisation models, focusing on how best to use them for predicting these missing values in biological datasets. Our main hypothesis is that matrix factorisation methods, and in particular Bayesian variants, are an extremely powerful paradigm for predicting values in biological datasets, as well as in other applications, especially for sparse and noisy data. We demonstrate the competitiveness of these approaches compared to other state-of-the-art methods, and explore the conditions under which they perform best. We consider several aspects of the Bayesian approach to matrix factorisation. Firstly, we study the effect of the inference approach used to find the factorisation on predictive performance. Secondly, we identify different likelihood and Bayesian prior choices that we can use for these models, and explore when they are most appropriate. Finally, we introduce a Bayesian matrix factorisation model that can be used to integrate multiple biological datasets, and hence improve predictions. This model combines different matrix factorisation models and Bayesian priors in a hybrid fashion. Through these models and experiments we support our hypothesis and provide novel insights into the best ways to use Bayesian matrix factorisation methods for predictive purposes.
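A minimal non-Bayesian skeleton of the technique, alternating least squares on the observed entries of a matrix with the product of the two factors used to predict the missing entries, is sketched below. The Bayesian variants studied in the thesis instead place priors on the factors and infer them by Gibbs sampling or variational methods, but the prediction step is analogous.

```python
import numpy as np

def factorise(R, mask, rank=5, reg=0.1, iters=50, seed=0):
    """Minimal alternating-least-squares factorisation R ~ U @ V.T using only
    the observed entries (mask == 1); illustrative sketch, not the thesis code.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(iters):
        for i in range(n):                           # update each row factor
            obs = mask[i] == 1
            A = V[obs].T @ V[obs] + reg * np.eye(rank)
            U[i] = np.linalg.solve(A, V[obs].T @ R[i, obs])
        for j in range(m):                           # update each column factor
            obs = mask[:, j] == 1
            A = U[obs].T @ U[obs] + reg * np.eye(rank)
            V[j] = np.linalg.solve(A, U[obs].T @ R[obs, j])
    return U @ V.T                                   # predictions for all entries
```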
9

Uncertainty in inverse elasticity problems

Gendin, Daniel I. 27 September 2021
The non-invasive differential diagnosis of breast masses through ultrasound imaging motivates the following class of elastic inverse problems: given one or more measurements of the displacement field within an elastic material, determine the material property distribution within the material. This thesis is focused on uncertainty quantification in inverse problem solutions, with application to inverse problems in linear and nonlinear elasticity. We consider the inverse nonlinear elasticity problem in the context of Bayesian statistics. We show the well-known result that computing the maximum a posteriori (MAP) estimate is consistent with previous optimization formulations of the inverse elasticity problem. We show further that certainty in this estimate may be quantified using concepts from information theory, specifically, information gain as measured by the Kullback-Leibler (K-L) divergence and mutual information. A particular challenge in this context is the computational expense associated with computing these quantities. A key contribution of this work is a novel approach that exploits the mathematical structure of the inverse problem and properties of the conjugate gradient method to make these calculations feasible. A focus of this work is estimating the spatial distribution of the elastic nonlinearity of a material. Measurement sensitivity to the nonlinearity is much higher for large (finite) strains than for smaller strains, and so large strains tend to be used for such measurements. Measurements of larger deformations, however, tend to show greater levels of noise. A key finding of this work is that, when identifying nonlinear elastic properties, information gain can be used to characterize a trade-off between larger strains with higher noise levels and smaller strains with lower noise levels. These results can be used to inform experimental design. An approach often used to estimate both linear and nonlinear elastic property distributions is to do so sequentially: use a small-strain deformation to estimate the linear properties, and a large-strain deformation to estimate the nonlinearity. A key finding of this work is that accurate characterization of the joint posterior probability distribution over both linear and nonlinear elastic parameters requires that the estimates be performed jointly rather than sequentially. All the methods described above are demonstrated in applications to problems in elasticity for both simulated data and clinically measured data (obtained in vivo). For the clinical data, we evaluate the repeatability of measurements and parameter reconstructions.
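In the simplest one-dimensional Gaussian case, information gain reduces to a closed-form Kullback-Leibler divergence between posterior and prior, as in the sketch below; the numbers illustrating the large-strain versus small-strain trade-off are hypothetical, and the thesis computes the analogous quantities for high-dimensional elasticity parameter fields.

```python
import numpy as np

def gaussian_information_gain(prior_mean, prior_sd, post_mean, post_sd):
    """Information gain as KL(posterior || prior) for one-dimensional Gaussians.

    Illustrative only: the closed-form expression is
    log(s0/s1) + (s1^2 + (m1 - m0)^2) / (2 s0^2) - 1/2.
    """
    return (np.log(prior_sd / post_sd)
            + (post_sd ** 2 + (post_mean - prior_mean) ** 2) / (2.0 * prior_sd ** 2)
            - 0.5)

# Hypothetical trade-off: a noisier large-strain measurement that shifts and
# sharpens the posterior more can still yield a larger information gain than
# a cleaner small-strain measurement.
print(gaussian_information_gain(0.0, 1.0, 0.8, 0.4))   # large strain (hypothetical)
print(gaussian_information_gain(0.0, 1.0, 0.3, 0.6))   # small strain (hypothetical)
```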
10

Making Sense of the Noise: Statistical Analysis of Environmental DNA Sampling for Invasive Asian Carp Monitoring Near the Great Lakes

Song, Jeffery W. 01 May 2017
Sensitive and accurate detection methods are critical for monitoring and managing the spread of aquatic invasive species, such as invasive Silver Carp (SC; Hypophthalmichthys molitrix) and Bighead Carp (BH; Hypophthalmichthys nobilis) near the Great Lakes. A new detection tool called environmental DNA (eDNA) sampling, the collection and screening of water samples for the presence of the target species’ DNA, promises improved detection sensitivity compared to conventional surveillance methods. However, the application of eDNA sampling for invasive species management has been challenging due to the potential for false positives, i.e., detections of a species’ eDNA in the absence of live organisms. In this dissertation, I study the sources of error and uncertainty in eDNA sampling and develop statistical tools to show how eDNA sampling should be utilized for monitoring and managing invasive SC and BH in the United States. In chapter 2, I investigate the environmental and hydrologic variables, e.g., reverse flow, that may be contributing to positive eDNA sampling results upstream of the electric fish dispersal barrier in the Chicago Area Waterway System (CAWS), where live SC are not expected to be present. I used a beta-binomial regression model, which showed that reverse flow volume across the barrier has a statistically significant positive relationship with the probability of SC eDNA detection upstream of the barrier from 2009 to 2012, while other covariates, such as water temperature, season, and chlorophyll concentration, do not. This is a potential alternative explanation for why SC eDNA has been detected upstream of the barrier but intact SC have not. In chapter 3, I develop and parameterize a statistical model to evaluate how changes made to the US Fish and Wildlife Service (USFWS)’s eDNA sampling protocols for invasive BH and SC monitoring from 2013 to 2015 have influenced their sensitivity. The model shows that changes to the protocol have caused the sensitivity to fluctuate. Overall, when assuming that eDNA is randomly distributed, the sensitivity of the current protocol is higher for BH eDNA detection and similar for SC eDNA detection compared to the original protocol used from 2009 to 2012. When assuming that eDNA is clumped, the sensitivity of the current protocol is slightly higher for BH eDNA detection but worse for SC eDNA detection. In chapter 4, I apply the model developed in chapter 3 to estimate the BH and SC eDNA concentration distributions in two pools of the Illinois River where BH and SC are considered to be present, one pool where they are absent, and upstream of the electric barrier in the CAWS, given eDNA sampling data and knowledge of the eDNA sampling protocol used in 2014. The results show that the estimated mean eDNA concentrations in the Illinois River are highest in the invaded pools (La Grange; Marseilles) and are lower in the uninvaded pool (Brandon Road). The estimated eDNA concentrations in the CAWS are much lower than the concentrations in the Marseilles pool, which indicates that the few eDNA detections in the CAWS (3% of samples positive for SC and 0.4% of samples positive for BH) do not signal the presence of live BH or SC. The model shows that >50% of samples positive for BH or SC eDNA would be needed to infer Asian carp presence in the CAWS, i.e., estimated concentrations similar to those found in the Marseilles pool.
Finally, in chapter 5, I develop a decision tree model to evaluate the value of information that monitoring provides for making decisions about BH and SC prevention strategies near the Great Lakes. The optimal prevention strategy depends on prior beliefs about the expected damage of an Asian carp invasion, the probability of invasion, and whether or not BH and SC have already invaded the Great Lakes (which is informed by monitoring). Given no monitoring, the optimal strategy is to stay with the status quo of operating electric barriers in the CAWS for low probabilities of invasion and low expected invasion costs. However, if the probability of invasion is greater than 30% and the cost of invasion is greater than $100 million a year, the optimal strategy changes to installing an additional barrier in the Brandon Road pool. Greater risk aversion (i.e., aversion to monetary losses) causes less prevention (e.g., the status quo instead of additional barriers) to be preferred. Given monitoring, the model shows that monitoring provides value for this decision only if the monitoring tool has perfect specificity (false positive rate = 0%).
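A sketch of the beta-binomial regression idea from chapter 2, an overdispersed binomial likelihood for the number of positive samples in a sampling event with the mean detection probability linked to reverse-flow volume, is given below; the coefficients, concentration parameter, and counts are hypothetical placeholders rather than the dissertation's fitted values.

```python
import numpy as np
from scipy import stats

def detection_probability(reverse_flow, b0=-3.0, b1=0.8):
    """Mean eDNA detection probability as a function of reverse-flow volume
    (logit link). The coefficients are hypothetical, not fitted values."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * reverse_flow)))

def loglik_event(positives, n_samples, reverse_flow, concentration=10.0):
    """Beta-binomial log-likelihood for one sampling event, allowing
    overdispersion relative to a plain binomial (illustrative sketch)."""
    p = detection_probability(reverse_flow)
    a, b = p * concentration, (1.0 - p) * concentration
    return stats.betabinom(n_samples, a, b).logpmf(positives)

# Hypothetical event: 3 positive samples out of 60 after a reverse-flow event
print(loglik_event(positives=3, n_samples=60, reverse_flow=2.0))
```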
