Methods in Hypothesis Testing, Markov Chain Monte Carlo and Neuroimaging Data Analysis. Xu, Xiaojin, 25 September 2013.
This thesis presents three distinct topics: a modified K-S test for autocorrelated data, improving MCMC convergence rates with residual augmentations, and resting-state fMRI data analysis. In Chapter 1, we present a modified K-S test that adjusts for sample autocorrelation. We first demonstrate that the original K-S test does not attain the nominal Type I error rate when applied to autocorrelated samples. We then review the notion of mixing conditions and Billingsley's theorem, and based on these results we propose an effective sample size formula to adjust for sample autocorrelation. Extensive simulation studies demonstrate that the modified K-S test has the nominal Type I error rate as well as reasonable power for various autocorrelated samples. An application to an fMRI data set concludes the chapter. In Chapter 2 of this thesis, we present work on MCMC sampling. Inspired by a toy example of a random effects model, we find two ways to boost the efficiency of MCMC algorithms: direct and indirect residual augmentations. We first report theoretical investigations under a class of normal/independence models, where we find an intriguing phase-transition type of phenomenon. We then present an application of the direct residual augmentation to probit regression, including a numerical comparison with other existing algorithms.
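The effective sample size correction can be sketched in a few lines of Python. The sketch below assumes the common AR(1)-style adjustment n_eff = n(1 - rho)/(1 + rho); the thesis derives its own formula from mixing conditions and Billingsley's theorem, so this particular adjustment is an illustrative stand-in, not the thesis's result.

```python
import numpy as np
from scipy import stats, special

def ks_test_autocorrelated(x, cdf=stats.norm.cdf):
    """K-S test with an effective-sample-size correction for autocorrelation.

    Illustrative sketch only: uses the AR(1)-style adjustment
    n_eff = n * (1 - rho) / (1 + rho), one common choice and not
    necessarily the formula derived in the thesis.
    """
    x = np.asarray(x)
    n = len(x)
    # Lag-1 sample autocorrelation.
    xc = x - x.mean()
    rho = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
    n_eff = n * (1 - rho) / (1 + rho)
    # Ordinary K-S statistic against the hypothesized CDF...
    d = stats.kstest(x, cdf).statistic
    # ...but with the asymptotic p-value evaluated at n_eff instead of n.
    p_value = special.kolmogorov(np.sqrt(n_eff) * d)
    return d, p_value
```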
Advances in Empirical Bayes Modeling and Bayesian Computation. Stein, Nathan Mathes, 14 August 2013.
Chapter 1 of this thesis focuses on accelerating perfect sampling algorithms for a Bayesian hierarchical model. A discrete data augmentation scheme, together with two different parameterizations, yields two Gibbs samplers for sampling from the posterior distribution of the hyperparameters of the Dirichlet-multinomial hierarchical model under a default prior distribution. The finite-state-space nature of this data augmentation permits us to construct two perfect samplers using bounding chains that take advantage of monotonicity and anti-monotonicity in the target posterior distribution, but both are impractically slow. We demonstrate, however, that a composite algorithm that strategically alternates between the two samplers' updates can be substantially faster than either individually. We theoretically bound the expected time until coalescence for the composite algorithm, and show via simulation that the theoretical bounds can be close to actual performance. Chapters 2 and 3 introduce a strategy for constructing scientifically sensible priors in complex models. We call these priors catalytic priors to suggest that adding such prior information catalyzes our ability to use richer, more realistic models. Because they depend on observed data, catalytic priors are a tool for empirical Bayes modeling. The overall perspective is data-driven: catalytic priors have a pseudo-data interpretation, and the building blocks are alternative plausible models for the observations, yielding behavior similar to hierarchical models but with a conceptual shift away from distributional assumptions on parameters. The posterior under a catalytic prior can be viewed as an optimal approximation to a target measure, subject to a constraint on the posterior distribution's predictive implications. In Chapter 3, we apply catalytic priors to several familiar models and investigate the performance of the resulting posterior distributions. We also illustrate the application of catalytic priors in a preliminary analysis of the effectiveness of a job training program, an analysis complicated by the need to account for noncompliance, partially defined outcomes, and missing outcome data.
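The pseudo-data interpretation of catalytic priors lends itself to a short sketch: fit a simpler model, generate pseudo-observations from it, and fit the richer model on the weighted, augmented data. The sketch below uses logistic regression with an intercept-only generating model; the weights, models, and sizes are illustrative assumptions, not the construction used in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Observed data: n real observations, p predictors (synthetic for illustration).
n, p, M, tau = 50, 4, 400, 4.0   # M pseudo-points, total prior weight tau
X = rng.normal(size=(n, p))
y = (X[:, 0] + rng.logistic(size=n) > 0).astype(int)

# Step 1: fit a simpler "generating" model (here, intercept-only).
p0 = y.mean()

# Step 2: draw pseudo-covariates by resampling rows of X, and
# pseudo-responses from the simple model's predictive distribution.
X_pseudo = X[rng.integers(0, n, size=M)]
y_pseudo = rng.binomial(1, p0, size=M)

# Step 3: the posterior mode under this catalytic-style prior is the weighted
# MLE on the augmented data, with each pseudo-point down-weighted to tau/M.
X_aug = np.vstack([X, X_pseudo])
y_aug = np.concatenate([y, y_pseudo])
w = np.concatenate([np.ones(n), np.full(M, tau / M)])

# penalty=None requires scikit-learn >= 1.2.
fit = LogisticRegression(penalty=None, max_iter=1000).fit(X_aug, y_aug, sample_weight=w)
print(fit.intercept_, fit.coef_)
```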
Complications in Causal Inference: Incorporating Information Observed After Treatment is Assigned. Watson, David Allan, 6 June 2014.
Randomized experiments are the gold standard for inferring the causal effects of treatments. However, complications often arise when trying to incorporate additional information that is observed after the treatment has been randomly assigned. The principal stratification framework has brought clarity to these problems by explicitly considering the potential outcomes of all information observed after treatment is randomly assigned. Principal stratification is a powerful general framework, but it is best understood in the context of specific applied problems (e.g., non-compliance in experiments and "censoring due to death" in clinical trials). This thesis considers three examples of the principal stratification framework, each focusing on different aspects of statistics and causal inference.
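The non-compliance example mentioned above is the classic entry point to principal stratification: under one-sided non-compliance, units split into compliers and never-takers, and the complier average causal effect is identified by the standard instrumental-variables moment estimator. A minimal sketch of that textbook estimator (not any specific analysis from the thesis):

```python
import numpy as np

# One-sided noncompliance: Z = random assignment, D = treatment received
# (D = 0 whenever Z = 0), Y = outcome. Under the standard IV assumptions
# the complier average causal effect (CACE) is the intent-to-treat effect
# divided by the compliance rate.
def cace(z, d, y):
    z, d, y = map(np.asarray, (z, d, y))
    itt = y[z == 1].mean() - y[z == 0].mean()   # intent-to-treat effect
    pc = d[z == 1].mean()                        # P(complier)
    return itt / pc
```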
Dilemmas in Design: From Neyman and Fisher to 3D Printing. Sabbaghi, Arman, 6 June 2014.
This manuscript addresses three dilemmas in experimental design.
Novel Computational and Statistical Approaches in Metagenomic Studies. Sohn, Michael B., January 2015.
Metagenomics has great potential to uncover previously unattainable information about microbial communities. The simplest, yet extremely powerful, approach for studying the characteristics of a microbial community is the analysis of differential abundance, which seeks to identify differentially abundant features (e.g., species or genes) across communities. For instance, detecting differentially abundant microbes across healthy and diseased groups can enable us to identify potential pathogens or probiotics. However, the analysis of differential abundance can mislead us about the characteristics of microbial communities if the counts or abundances of features on different scales are not properly normalized within and between communities. An important prerequisite for the analysis of differential abundance is to accurately estimate the composition of microbial communities, commonly known as the analysis of taxonomic composition. Most prevalent approaches rely solely on the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree. In this study, two novel methods are developed: one for the analysis of taxonomic composition, called Taxonomic Analysis by Elimination and Correction (TAEC), and the other for the analysis of differential abundance, called Ratio Approach for Identifying Differential Abundance (RAIDA). TAEC utilizes the alignment similarity between known genomes in addition to the similarity between query sequences and sequences of known genomes. It is comprehensively tested on simulated datasets spanning diverse bacterial community structures. Compared with other available methods designed to estimate taxonomic composition at relatively low taxonomic ranks, TAEC demonstrates greater accuracy in estimating the abundance of bacteria in a given microbial sample. RAIDA utilizes an invariant property of ratios between feature abundances: the ratio between the relative abundances of two features is the same as the ratio between their absolute abundances. In comprehensive simulation studies, RAIDA is consistently powerful, and in some situations it greatly surpasses existing methods for the analysis of differential abundance in metagenomic studies.
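The ratio invariance that RAIDA exploits is easy to state in code: the unknown per-sample scaling factor cancels from any ratio of two features' counts. The sketch below builds a naive two-group test on log-ratios against a fixed reference feature; it is a simplification for illustration, as RAIDA models the ratios more carefully and handles zero counts.

```python
import numpy as np
from scipy import stats

# The invariance: for features i and j within one sample,
# rel_i / rel_j == abs_i / abs_j, since the unknown library-size
# factor cancels. A minimal two-group test on log-ratios of feature i
# against a fixed reference feature j (a simplification of RAIDA).
def ratio_test(counts_a, counts_b, i, j, pseudo=0.5):
    """counts_a, counts_b: (samples x features) count matrices per group."""
    la = np.log(counts_a[:, i] + pseudo) - np.log(counts_a[:, j] + pseudo)
    lb = np.log(counts_b[:, i] + pseudo) - np.log(counts_b[:, j] + pseudo)
    return stats.ttest_ind(la, lb, equal_var=False)
```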
Probabilities of Ruin in Economics and Insurance under Light- and Heavy-tailed Distributions. Kim, Hyeonju, January 2015.
This research addresses ruin problems in two fields. First, the ruin or survival of an economic agent over finite and infinite time horizons is explored for a one-good economy. A recursive relation derived for the intractable ruin distribution is used to compute its moments. A new system of Chebyshev inequalities, using an optimal allocation of different orders of moments over different ranges of the initial stock, provides good conservative estimates of the true ruin distribution. The second part of the research is devoted to the study of ruin probabilities in the general renewal model of insurance under both light- and heavy-tailed claim size distributions. Recent results on the dual problem of equilibrium of the Lindley-Spitzer Markov process provide clues to the orders of magnitude of finite-time ruin probabilities in insurance. Extensive empirical studies show the disparity between the performance of light- and heavy-tailed theoretical asymptotics vis-à-vis actual probabilities in finite time and/or with finite initial assets.
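For intuition, finite-time ruin probabilities in a renewal risk model are straightforward to estimate by Monte Carlo. The sketch below uses exponential inter-arrival times and Pareto (heavy-tailed) claims with illustrative parameter values; it is a generic simulation, not the moment-based Chebyshev machinery developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo estimate of the finite-time ruin probability in a renewal
# risk model: premiums accrue at rate c; claim inter-arrival times and
# claim sizes come from the supplied samplers. Parameters are illustrative.
def ruin_prob(u0, c, horizon, n_paths=2_000,
              interarrival=lambda k: rng.exponential(1.0, k),
              claim=lambda k: rng.pareto(2.5, k) + 1.0):  # heavy-tailed
    ruined = 0
    for _ in range(n_paths):
        t, u = 0.0, u0
        while True:
            w = interarrival(1)[0]
            t += w
            if t > horizon:
                break
            u += c * w - claim(1)[0]   # premium income minus the new claim
            if u < 0:
                ruined += 1
                break
    return ruined / n_paths

print(ruin_prob(u0=10.0, c=1.8, horizon=100.0))
```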
Scaling variances, correlation and principal components with multivariate geostatistics. Vargas-Guzman, Jose Antonio, January 1998.
A new concept of dispersion (cross) covariance is introduced for modeling spatially scale-dependent multivariate correlations. Such correlations between attributes depend on the spatial size of the domain and the size of samples in the population, and they are modeled for the first time in this research. The modeled correlations are used to introduce a new scale-dependent principal component analysis (PCA) method, based on computing eigenvalues and eigenvectors from dispersion covariance matrices or scale-dependent correlations, which can be modeled from integrals of matrix variograms. For second-order stationary random functions, this PCA converges for large domains to classic PCA. A new technique for computing variograms from spatial variances has also been developed using derivatives. For completeness, a deeper analysis of the linear model of coregionalization, widely used in multivariate geostatistics, is included as well; this last part leads to a new, more sophisticated model we term the "linear combinations coregionalization model." The research as a whole explains the relationship between different average states and the micro-state of vector random functions in the framework of geostatistics. Examples illustrate the practical application of the theory. This approach will be useful in all earth sciences, and particularly in soil and environmental sciences.
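The dependence of correlations and principal components on support size can be illustrated with a toy computation: average point-support values over blocks, form the covariance of the block means, and diagonalize it. This is only a discrete stand-in for the thesis's construction from integrals of matrix variograms.

```python
import numpy as np

# Scale-dependent PCA sketch: average co-located attributes over blocks of
# a chosen size, compute the covariance of the block means across the
# domain, and take its eigendecomposition. Changing `block` changes the
# support, and hence the correlations and principal components.
def block_pca(field, block):
    """field: (n, n, k) array of k co-located attributes on an n x n grid."""
    n = field.shape[0] - field.shape[0] % block
    f = field[:n, :n].reshape(n // block, block, n // block, block, -1)
    means = f.mean(axis=(1, 3)).reshape(-1, field.shape[-1])
    cov = np.cov(means, rowvar=False)   # dispersion (cross) covariance proxy
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals[::-1], eigvecs[:, ::-1]   # descending order
```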
Investigating Potential Socioeconomic and Behavioral Factors Influencing Mosquito Net Ownership in Three Countries in Sub-Saharan Africa. Pope, Benjamin, January 2014.
As of 2012, malaria was responsible for an estimated 207 million illnesses per year. One of the main methods used to combat mosquito-borne malaria is the use of mosquito nets. Many previous studies have examined factors affecting malaria incidence and bed net ownership and usage, but few have made cross-country comparisons. In this study we used multilevel hierarchical regression to examine the factors that affect net ownership in Kenya, Malawi, and Tanzania, simultaneously accounting for effects at the individual household and regional levels. Factors identified include wealth index and bicycle ownership (p-values below 0.05). In Malawi, an effect modification between bicycle ownership and altitude was observed, so the models were stratified by bicycle ownership.
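A two-level logistic model of this kind can be sketched with statsmodels' Bayesian mixed GLM; the file and column names below are hypothetical, and the variational fit is one of several options, so this is a schematic of the modeling approach rather than the paper's exact specification.

```python
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as bmg

# Sketch of a two-level logistic model for net ownership with a regional
# random intercept, in the spirit of multilevel hierarchical regression.
# The file and column names are hypothetical.
df = pd.read_csv("dhs_households.csv")

model = bmg.BinomialBayesMixedGLM.from_formula(
    "owns_net ~ wealth_index + owns_bicycle + altitude",
    {"region": "0 + C(region)"},     # random intercept by region
    df,
)
fit = model.fit_vb()                  # variational Bayes fit
print(fit.summary())
```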
A consensus based Bayesian sample size criterion. Cámara Hagen, Luis Tomás, January 2000.
The objective of this thesis is to offer a new criterion for Bayesian sample size determination, intended for situations where there is a diversity of prior (pre-experiment) opinions. The new criterion ensures that the posterior densities derived from such diverse prior opinions are close to a pre-specified degree. We rigorously investigate the mathematical properties of the criterion and include closed-form formulae for the sample sizes. Tables of sample size values and examples from clinical trials are given.
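The flavor of a consensus criterion can be conveyed with a toy binomial example: choose the smallest n at which two analysts with different Beta priors would report posterior means within a tolerance d for every possible outcome. The distance measure and the worst-case-over-data requirement here are illustrative stand-ins for the thesis's actual criterion.

```python
import numpy as np

# Smallest n at which two Beta priors yield posterior means within d of
# each other for every possible count y out of n (Beta-binomial model).
def consensus_n(prior1, prior2, d=0.02, n_max=10_000):
    (a1, b1), (a2, b2) = prior1, prior2
    for n in range(1, n_max):
        y = np.arange(n + 1)
        gap = np.abs((a1 + y) / (a1 + b1 + n) - (a2 + y) / (a2 + b2 + n))
        if gap.max() <= d:
            return n
    return None

print(consensus_n((1, 1), (5, 5)))   # uniform vs. moderately informative prior
```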
Nonparametric maximum likelihood estimation of the cumulative distribution function with multivariate interval censored data: computation, identifiability and bounds. Liu, Xuecheng, January 2002.
This thesis addresses nonparametric maximum likelihood (NPML) estimation of the cumulative distribution function (CDF) from multivariate interval-censored data (MICD). The methodology consists of applying graph theory to the intersection graph of the censored data. The maximal cliques of this graph and their real representations contain all the information needed to find NPML estimates (NPMLEs). In this thesis, a new algorithm to determine the maximal cliques of an MICD set is introduced. The concepts of diameter and semi-diameter of the polytope formed by all NPMLEs are introduced, and a simulation study investigating the properties of the non-uniqueness polytope of the CDF NPMLEs for bivariate censored data is described. Also presented is an a priori bounding technique for the total mass attributed to a set of maximal cliques by a self-consistent estimate of the CDF (including the NPMLE).
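The first step, enumerating the maximal cliques of the intersection graph, can be sketched with networkx's generic clique enumeration (the thesis introduces a new, specialized algorithm for this step). A toy bivariate example with three censoring rectangles:

```python
import networkx as nx

# Build the intersection graph of bivariate censoring rectangles and
# enumerate its maximal cliques, whose real representations carry the
# information needed for the NPMLE. Rectangles are ((x0, x1), (y0, y1)).
rects = [((0, 2), (0, 2)), ((1, 3), (1, 4)), ((2.5, 5), (0, 1.5))]

def intersects(r, s):
    # Two rectangles overlap iff their intervals overlap in both coordinates.
    return all(r[k][0] <= s[k][1] and s[k][0] <= r[k][1] for k in range(2))

G = nx.Graph()
G.add_nodes_from(range(len(rects)))
G.add_edges_from((i, j) for i in range(len(rects))
                 for j in range(i + 1, len(rects))
                 if intersects(rects[i], rects[j]))

print(list(nx.find_cliques(G)))   # maximal cliques of the intersection graph
```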