141 |
Mixed Methods for Mixed Models. Dorie, Vincent J. January 2014 (has links)
This work bridges the frequentist and Bayesian approaches to mixed models by borrowing the best features from both camps: point estimation procedures are combined with priors to obtain accurate, fast inference while posterior simulation techniques are developed that approximate the likelihood with great precision for the purposes of assessing uncertainty. These allow flexible inferences without the need to rely on expensive Markov chain Monte Carlo simulation techniques. Default priors are developed and evaluated in a variety of simulation and real-world settings with the end result that we propose a new set of standard approaches that yield superior performance at little computational cost.
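To make "point estimation combined with priors" concrete, a hedged sketch of the general idea (illustrative notation, not necessarily the thesis's exact formulation): the covariance parameters $\theta$ of a mixed model are estimated by maximizing a prior-penalized likelihood, i.e. a posterior mode,

$$\hat{\theta} \;=\; \arg\max_{\theta}\; \log L(\theta \mid y) \;+\; \log p(\theta),$$

where $p(\theta)$ is one of the proposed default priors; uncertainty is then assessed by simulating from a precise approximation to the likelihood around $\hat{\theta}$ rather than by full MCMC.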
|
142 |
Unbiased Penetrance Estimates with Unknown Ascertainment Strategies. Gore, Kristen January 2014 (has links)
Allelic variation in the genome leads to variation in individuals' production of proteins. This, in turn, leads to variation in traits and development, and, in some cases, to diseases. Understanding the genetic basis for disease can aid in the search for therapies and in guiding genetic counseling. Thus, it is of interest to discover the genes with mutations responsible for diseases and to understand the impact of allelic variation at those genes.
A subject's genetic composition is commonly referred to as the subject's genotype. Subjects who carry the gene mutation of interest are referred to as carriers. Subjects who are afflicted with a disease under study (that is, subjects who exhibit the phenotype) are termed affected carriers. The age-specific probability that a given subject will exhibit a phenotype of interest, given mutation status at a gene, is known as penetrance.
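In symbols, age-specific penetrance as described above can be written (an illustrative formulation) as

$$\mathrm{pen}(t) \;=\; P\big(\text{subject is affected by age } t \,\big|\, \text{subject carries the mutation}\big),$$

the conditional, age-dependent probability of exhibiting the phenotype given carrier status.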
Understanding penetrance is an important facet of genetic epidemiology. Penetrance estimates are typically calculated via maximum likelihood from family data. However, penetrance estimates can be biased if the nature of the sampling strategy is not correctly reflected in the likelihood. Unfortunately, sampling of family data may be conducted in a haphazard fashion or, even if conducted systematically, might be reported in an incomplete fashion. Bias is possible in applying likelihood methods to reported data if (as is commonly the case) some unaffected family members are not represented in the reports.
The purpose here is to present an approach to find efficient and unbiased penetrance estimates in cases where there is incomplete knowledge of the sampling strategy and incomplete information on the full pedigree structure of families included in the data. The method may be applied with different conjectural assumptions about the ascertainment strategy to balance the possibly biasing effects of wishful assumptions about the sampling strategy with the efficiency gains that could be obtained through valid assumptions.
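For context, the standard device for removing sampling bias of this kind is to condition the likelihood on the event that a family was ascertained; a hedged sketch of the usual correction (illustrative notation, not the specific estimator developed here) is

$$L(\theta) \;=\; \prod_i P_\theta\big(\text{data}_i \mid \text{family } i \text{ ascertained}\big) \;=\; \prod_i \frac{P_\theta(\text{data}_i)}{P_\theta(\text{family } i \text{ ascertained})},$$

where the second equality applies when the reported data determine whether the family would have been ascertained. The difficulty addressed in this work is that the ascertainment probability in the denominator is unknown when the sampling strategy itself is unknown.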
|
143 |
Limit Theory for Spatial Processes, Bootstrap Quantile Variance Estimators, and Efficiency Measures for Markov Chain Monte Carlo. Yang, Xuan January 2014 (has links)
This thesis contains three topics: (I) limit theory for spatial processes, (II) asymptotic results on the bootstrap quantile variance estimator for importance sampling, and (III) an efficiency measure of MCMC.
(I) First, central limit theorems are obtained for sums of observations from a $\kappa$-weakly dependent random field. In particular, the observations are assumed to be made from a random field at irregularly spaced and possibly random locations. The sums of these samples, as well as sums of functions of pairs of the observations, are objects of interest; the latter have applications in covariance estimation, composite likelihood estimation, etc. Moreover, examples of $\kappa$-weakly dependent random fields are explored and a method for the evaluation of $\kappa$-coefficients is presented.
Next, statistical inference is considered for stochastic heteroscedastic processes (SHP), which generalize the stochastic volatility time series model to space. A composite likelihood approach is adopted for parameter estimation, where the composite likelihood function is formed by a weighted sum of pairwise log-likelihood functions. In addition, the observation sites are assumed to be distributed according to a spatial point process. Sufficient conditions are provided for the maximum composite likelihood estimator to be consistent and asymptotically normal.
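A hedged sketch of the weighted pairwise composite log-likelihood described above (illustrative notation): with observations $y(s_i)$ at sites $s_i$, pairwise densities $f$, and nonnegative weights $w_{ij}$,

$$c\ell(\theta) \;=\; \sum_{i < j} w_{ij} \, \log f\big(y(s_i), y(s_j); \theta\big),$$

and the maximum composite likelihood estimator maximizes $c\ell(\theta)$ over the SHP parameters.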
(II) It is often difficult to provide an accurate estimate of the variance of the weighted sample quantile. Its asymptotic approximation requires the value of the density function, which may be hard to evaluate in complex systems. To circumvent this problem, the bootstrap estimator is considered. Theoretical results are established for the exact convergence rate and asymptotic distributions of the bootstrap variance estimators for quantiles of weighted empirical distributions. Under regularity conditions, it is shown that the bootstrap variance estimator is asymptotically normal and has a relative standard deviation of order $O(n^{-1/4})$.
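As a concrete illustration of the object under study, here is a minimal Python sketch (not code from the thesis; the function names and the toy importance-sampling example are assumptions for illustration) of the bootstrap variance estimator for a weighted sample quantile:

```python
import numpy as np

def weighted_quantile(x, w, p):
    """p-th quantile of the weighted empirical distribution of samples x with weights w."""
    order = np.argsort(x)
    x, w = x[order], w[order]
    cdf = np.cumsum(w) / np.sum(w)
    return x[np.searchsorted(cdf, p, side="left")]

def bootstrap_quantile_variance(x, w, p, n_boot=2000, seed=None):
    """Bootstrap the variance of the weighted sample quantile by resampling (x_i, w_i) pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        reps[b] = weighted_quantile(x[idx], w[idx], p)
    return reps.var(ddof=1)

# Toy importance-sampling example: estimate a N(0,1) quantile from N(0, sd=2) proposals.
rng = np.random.default_rng(0)
x = rng.normal(scale=2.0, size=5000)
w = np.exp(-0.5 * x**2) / (0.5 * np.exp(-0.125 * x**2))  # unnormalized density ratio N(0,1)/N(0,2)
print("bootstrap variance of the weighted 0.9-quantile:",
      bootstrap_quantile_variance(x, w, 0.9, seed=1))
```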
(III) A new performance measure is proposed to evaluate the efficiency of Markov chain Monte Carlo (MCMC) algorithms. More precisely, the large deviations rate of the probability that the Monte Carlo estimator deviates from the true value by a given distance is used as a measure of the efficiency of a particular MCMC algorithm. Numerical methods are proposed for computing the rate function based on samples of the renewal cycles of the Markov chain. Furthermore, the efficiency measure is applied to an array of MCMC schemes to determine their optimal tuning parameters.
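A hedged sketch of the quantity involved (illustrative notation): for an MCMC estimator $\hat{\mu}_n = n^{-1}\sum_{t=1}^{n} g(X_t)$ of $\mu = E_\pi[g(X)]$, the efficiency measure is based on the large deviations rate

$$I(\epsilon) \;=\; -\lim_{n \to \infty} \frac{1}{n} \log P\big(|\hat{\mu}_n - \mu| > \epsilon\big),$$

so that a larger rate means the deviation probability decays faster and the chain is deemed more efficient; the rate function is estimated numerically from the chain's renewal cycles.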
|
144 |
Efficiency in Lung Transplant Allocation Strategies. Zou, Jingjing January 2015 (has links)
Currently in the United States, lungs are allocated to transplant candidates based on the Lung Allocation Score (LAS). The LAS is an empirically derived score aimed at increasing total life span pre- and post-transplantation for patients on lung transplant waiting lists. The goal here is to develop efficient allocation strategies in the context of lung transplantation.
In this study, patient and organ arrivals to the waiting list are modeled as independent homogeneous Poisson processes. Patients' health status prior to allocation is modeled as evolving according to independent and identically distributed finite-state inhomogeneous Markov processes, in which death is treated as an absorbing state. The expected post-transplantation residual life is modeled as depending on time on the waiting list and on current health status. For allocation strategies satisfying certain minimal fairness requirements, the long-term limit of expected average total life exists and is used as the standard for comparing allocation strategies.
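To make the arrival and health-status model concrete, here is a small Python sketch (purely illustrative; the rates, the three-state health space, and the transition probabilities are assumptions, and a time-homogeneous discrete-time chain is used for simplicity in place of the inhomogeneous processes of the study):

```python
import numpy as np

rng = np.random.default_rng(1)
horizon = 365.0                      # days
lam_patient, lam_organ = 2.0, 1.2    # assumed Poisson arrival rates per day

def poisson_arrival_times(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process via exponential inter-arrival gaps."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return np.array(times)
        times.append(t)

patient_times = poisson_arrival_times(lam_patient, horizon, rng)
organ_times = poisson_arrival_times(lam_organ, horizon, rng)

# Illustrative health states: 0 = stable, 1 = critical, 2 = dead (absorbing).
P = np.array([[0.97, 0.02, 0.01],
              [0.05, 0.90, 0.05],
              [0.00, 0.00, 1.00]])

def health_path(days, rng, start=0):
    """Simulate daily health-state transitions; state 2 (death) is absorbing."""
    state, path = start, [start]
    for _ in range(days):
        state = rng.choice(3, p=P[state])
        path.append(state)
    return path

print(len(patient_times), "patient arrivals,", len(organ_times), "organ arrivals")
print("one patient's first 10 days of health states:", health_path(10, rng))
```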
Via the Hamilton-Jacobi-Bellman equations, upper bounds on the long-term expected average total life are derived as a function of the ratio of the organ arrival rate to the patient arrival rate, and corresponding to each upper bound is an allocable set of (state, time) pairs at which patients would be optimally transplanted. As the availability of organs increases, the allocable set expands monotonically, and ranking members of the waiting list according to the availability at which they enter the allocable set provides an allocation strategy whose long-term expected average total life is close to the upper bound.
Simulation studies are conducted with model parameters estimated from national lung transplantation data from the United Network for Organ Sharing (UNOS). Results suggest that, compared to the LAS, the proposed allocation strategy could provide a 7% increase in average total life.
|
145 |
Automatic Shape-Constrained Non-Parametric Regression. Gao, Zhikun 02 May 2019 (has links)
We propose an automatic shape-constrained non-parametric estimation methodology in least squares and quantile regression, where the regression function and its shape are simultaneously estimated and identified.

We build the estimation on a quadratic B-spline expansion, with penalization on its first and second derivatives at the spline knots in a group manner. By penalizing the positive and negative parts of the introduced group derivatives, the shape of the estimated regression curve is determined according to the sparsity of the parameters considered. In the quadratic B-spline expansion, the parameters referring to the shape can be written as simple linear combinations of the basis coefficients, which makes it convenient to impose the penalization; the resulting procedure is computationally efficient and flexible in identifying various shapes. In both the least squares and quantile regression scenarios, under some regularity conditions, we show that the proposed method identifies the correct shape of the regression function with probability approaching one, and the resulting non-parametric estimator achieves the optimal convergence rate. A simulation study shows that the proposed method gives more stable curve estimates and more accurate curve shape classification than the conventional unconstrained B-spline estimator in both mean and quantile regression, and that it is competitive in estimation accuracy with the artificial shape-constrained estimator built with prior knowledge of the curve shape. In addition, across multiple quantile levels, the proposed estimator shows less crossing between the estimated quantile curves than the unpenalized counterpart.
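One plausible form such a penalized objective can take (a hedged sketch in illustrative notation, not the exact formulation of the dissertation): with quadratic B-spline basis $B(\cdot)$ and coefficients $\beta$, let $d(\beta)$ and $s(\beta)$ collect the first- and second-derivative values of the fitted spline at the knots, each a linear function of $\beta$; the estimator minimizes

$$\sum_{i=1}^{n} \rho\big(y_i - B(x_i)^\top \beta\big) \;+\; \lambda_1 \Big( \|d(\beta)_{+}\|_2 + \|d(\beta)_{-}\|_2 \Big) \;+\; \lambda_2 \Big( \|s(\beta)_{+}\|_2 + \|s(\beta)_{-}\|_2 \Big),$$

where $\rho$ is the squared-error or quantile check loss and $(\cdot)_{+}$, $(\cdot)_{-}$ denote positive and negative parts; group sparsity of, say, $d(\beta)_{-}$ (no negative first derivatives at the knots) identifies a monotone increasing fit, and sparsity of $s(\beta)_{-}$ a convex one.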
|
146 |
Fused Lasso and Tensor Covariance Learning with Robust Estimation. Kunz, Matthew Ross 31 January 2019 (has links)
With the increase in computation and data storage, a vast amount of information is now collected by scientific measurement devices. However, with this increase in data and in the variety of domain applications, statistical methodology must be tailored to specific problems. This dissertation focuses on analyzing chemical information with an underlying structure.

Robust fused lasso leverages information about the neighboring regression coefficient structure to create blocks of coefficients. Robust modifications are made to the mean to account for gross outliers in the data. This method is applied to near-infrared spectral measurements for prediction of an aqueous analyte concentration and is shown to improve prediction accuracy.

Expanding on the robust estimation and structure analysis, graph structures are examined within a clustered tensor. The tensor is subjected to wavelet smoothing and robust sparse precision matrix estimation for a detailed look into the covariance structure. This methodology is applied to catalytic kinetics data, where the graph structure estimates the elementary steps of the reaction mechanism.
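For reference, a hedged sketch of a robust fused lasso objective of the kind described above (the Huber loss $\rho$ shown here is an assumption for illustration):

$$\min_{\beta}\; \sum_{i=1}^{n} \rho\big(y_i - x_i^\top \beta\big) \;+\; \lambda_1 \sum_{j=1}^{p} |\beta_j| \;+\; \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|,$$

where the fusion penalty on adjacent coefficients (neighboring wavelengths in a spectrum) creates blocks of equal coefficients, and the robust loss downweights gross outliers.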
|
147 |
Bootstrapped Information-Theoretic Model Selection with Error Control (BITSEC). January 2018 (has links)
abstract: Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. Despite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown degree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theoretical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions is given, along with a bootstrap approach that approximates the procedure. Results are presented for simulated experiments using normal distributions, random walk models, nested linear regression, and non-nested regression including nonlinear models. Tests are performed using an R package developed by the author, which will be made publicly available upon journal publication of the research results. / Dissertation/Thesis / Masters Thesis Statistics 2018
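As a rough illustration of the bootstrap idea (a minimal sketch under stated assumptions, not the author's R package: the Gaussian-AIC formula, the parametric bootstrap under the simpler model, and all names below are assumptions for illustration), one can approximate the null distribution of an AIC difference and use its upper quantile as a Type-I-error-controlled selection cutoff:

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC (up to an additive constant) of an OLS fit with Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (k + 1)  # +1 for the error variance

def bootstrap_daic_cutoff(y, X0, X1, alpha=0.05, n_boot=1000, seed=None):
    """Parametric bootstrap of the AIC difference under the simpler model X0;
    the (1 - alpha) quantile serves as a Type-I-error-controlled cutoff."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    sigma = np.sqrt(np.mean((y - X0 @ beta0) ** 2))
    daic = np.empty(n_boot)
    for b in range(n_boot):
        yb = X0 @ beta0 + rng.normal(scale=sigma, size=n)  # data regenerated from the null fit
        daic[b] = gaussian_aic(yb, X0) - gaussian_aic(yb, X1)
    return np.quantile(daic, 1 - alpha)

# Toy example: intercept-only null model versus a model with one extra predictor.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + rng.normal(size=n)            # data truly generated from the null
X0 = np.ones((n, 1))
X1 = np.column_stack([np.ones(n), x])
observed = gaussian_aic(y, X0) - gaussian_aic(y, X1)
cutoff = bootstrap_daic_cutoff(y, X0, X1, seed=1)
print("observed dAIC:", round(observed, 2), " bootstrap 95% cutoff:", round(cutoff, 2))
print("select larger model" if observed > cutoff else "retain simpler model")
```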
|
148 |
Robustness of the General Factor Mean Difference Estimation in Bifactor Ordinal Data. January 2019 (has links)
abstract: A simulation study was conducted to explore the robustness of general factor mean difference estimation in bifactor ordered-categorical data. In the No Differential Item Functioning (DIF) conditions, the data generation conditions varied were sample size, the number of categories per item, the effect size of the general factor mean difference, and the size of the specific factor loadings; in data analysis, misspecification conditions were introduced in which the generated bifactor data were fit using a unidimensional model, and/or ordered-categorical data were treated as continuous data. In the DIF conditions, the data generation conditions varied were sample size, the number of categories per item, the effect size of the latent mean difference for the general factor, the type of item parameters that had DIF, and the magnitude of DIF; the data analysis conditions varied in whether or not equality constraints were set on the noninvariant item parameters.
Results showed that falsely fitting bifactor data using unidimensional models or failing to account for DIF in item parameters resulted in estimation bias in the general factor mean difference, while treating ordinal data as continuous had little influence on the estimation bias as long as there was no severe model misspecification. The extent of the estimation bias produced by misspecifying bifactor datasets with unidimensional models was mainly determined by the degree of unidimensionality (i.e., the size of the specific factor loadings) and the size of the general factor mean difference. When DIF was present, the estimation accuracy of the general factor mean difference was completely robust to ignoring noninvariance in specific factor loadings, while it was very sensitive to failing to account for DIF in threshold parameters. With respect to ignoring DIF in general factor loadings, the estimation bias of the general factor mean difference was substantial when the DIF was -0.15 and was negligible for smaller sizes of DIF. Despite the impact of model misspecification on estimation accuracy, the power to detect the general factor mean difference was mainly influenced by sample size and effect size. Serious Type I error rate inflation occurred only when DIF was present in threshold parameters. / Dissertation/Thesis / Doctoral Dissertation Educational Psychology 2019
|
149 |
Marked Determinantal Point Processes. Unknown Date (has links)
Determinantal point processes (DPPs), which can be defined by their correlation kernels with known moments, are useful models for point patterns in which nearby points exhibit repulsion. They have many nice properties, such as closed-form densities, tractable estimation of parameterized families, and no edge effects. Univariate DPPs have been well studied in both discrete and continuous settings, although their statistical applications are fairly recent and still rather limited, whereas multivariate DPPs, the so-called multi-type marked DPPs, have been little explored. In this thesis, we propose a class of multivariate DPPs based on a block kernel construction. For the marked DPP, we show that the conditions for existence of a DPP can easily be satisfied. The block construction allows us to model the individually marked DPPs as well as to control the scale of repulsion between points having different marks. Unlike other researchers, who model the kernel function of a DPP, we model its spectral representation, which not only guarantees the existence of the multivariate DPP but also makes simulation-based estimation methods readily available. In our research, we adopted a bivariate complex Fourier basis, which demonstrates nice properties such as constant intensity and approximate isotropy at short distances between nearby points. The parameterized block kernels can approximate commonly used covariance functions via Fourier expansion. The parameters can be estimated using maximum likelihood estimation, a Bayesian approach, and minimum contrast estimation. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2019. / April 17, 2019. / Determinantal Point Processes, DPP, Marked Point Processes, Multivariate DPP, Poisson Processes / Includes bibliographical references. / Fred Huffer, Professor Directing Dissertation; Craig Nolder, University Representative; Xufeng Niu, Committee Member; Jonathan Bradley, Committee Member.
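A hedged sketch of the block construction being described (illustrative notation): for $p$ mark types, the matrix-valued kernel is arranged in blocks

$$K(x, y) \;=\; \begin{pmatrix} K_{11}(x, y) & \cdots & K_{1p}(x, y) \\ \vdots & \ddots & \vdots \\ K_{p1}(x, y) & \cdots & K_{pp}(x, y) \end{pmatrix},$$

where the diagonal blocks $K_{jj}$ act as kernels of the individually marked DPPs and the off-diagonal blocks control the scale of repulsion between points with different marks; modeling the spectral representation of this block kernel keeps its eigenvalues in the range required for the process to exist.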
|
150 |
High-Dimensional Statistical Methods for Tensor Data and Efficient Algorithms. Unknown Date (has links)
In the contemporary sciences, it is of great interest to study supervised and unsupervised learning problems for high-dimensional tensor data. In this dissertation, we develop new methods for tensor classification and clustering problems, and discuss algorithms to enhance their performance. For supervised learning, we propose the CATCH model, short for Covariate-Adjusted Tensor Classification in High dimensions, which efficiently integrates low-dimensional covariates and the tensor to perform classification and variable selection. The CATCH model preserves and utilizes the structure of the data for maximum interpretability and optimal prediction. We propose a penalized approach to select a subset of tensor predictor entries that have direct discriminative effects after adjusting for covariates. Theoretical results confirm that our approach achieves variable selection consistency and optimal classification accuracy. For unsupervised learning, we consider the clustering problem for high-dimensional tensor data and propose an efficient procedure based on an EM algorithm. It directly estimates the sparse discriminant vector from a penalized objective function and provides computationally efficient rules to update all other parameters. Meanwhile, the algorithm takes advantage of the tensor structure to reduce the number of parameters, which leads to lower storage costs. The performance of our method over existing methods is demonstrated in simulated and real data examples. Moreover, based on tensor computation, we propose a novel algorithm, referred to as the SMORE algorithm, for differential network analysis. The SMORE algorithm has low storage cost and high computation speed, especially in the presence of strong sparsity. It also provides a unified framework for binary and multiple network problems. In addition, we note that the SMORE algorithm can be applied to high-dimensional quadratic discriminant analysis problems, providing a new approach for multiclass high-dimensional quadratic discriminant analysis. In the end, we discuss some directions for future work, including new approaches, applications, and relaxed assumptions. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2019. / April 16, 2019. / classification, clustering, high-dimension, tensor, variable selection / Includes bibliographical references. / Qing Mai, Professor Co-Directing Dissertation; Xin Zhang, Professor Co-Directing Dissertation; Weikuan Yu, University Representative; Elizabeth Slate, Committee Member.
|