1

Essays in Cluster Sampling and Causal Inference

Makela, Susanna, January 2018
This thesis consists of three papers in applied statistics, specifically in cluster sampling, causal inference, and measurement error.

The first paper studies the problem of estimating the finite population mean from a two-stage sample with unequal selection probabilities in a Bayesian framework. Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. In a two-stage cluster sampling design, clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within selected clusters. Methodological challenges arise when the sizes of nonsampled clusters are unknown. We propose both nonparametric and parametric Bayesian approaches for predicting the cluster sizes, and we carry out inference for the unknown cluster sizes simultaneously with inference for the survey outcome. We implement this method in Stan and use simulation studies to compare the frequentist properties of the integrated Bayesian approach with those of classical methods. We then apply our proposed method to the Fragile Families and Child Wellbeing Study as an illustration of complex survey inference.

The second paper focuses on the problem of weak instrumental variables, motivated by estimating the causal effect of incarceration on recidivism. An instrument is weak when it is only weakly predictive of the treatment of interest. Given the well-known pitfalls of weak instrumental variables, we propose a method for strengthening a weak instrument. We use a matching strategy that pairs observations to be close on observed covariates but far apart on the instrument. This strategy strengthens the instrument, but at the cost of a reduced sample size. To help guide the applied researcher in selecting a match, we propose simulating the power of a sensitivity analysis and the design sensitivity, and using graphical methods to examine the results. We also demonstrate the use of recently developed methods for identifying effect modification, that is, an interaction between a pretreatment covariate and the treatment. Larger and less variable treatment effects are less sensitive to unobserved bias, so it is important to identify when effect modification is present and which covariates may be its source. We undertake our analysis in the context of estimating the causal effect of incarceration on recidivism via a natural experiment in the state of Pennsylvania, a motivating example that illustrates each component of our analysis.

The third paper considers the issue of measurement error in the context of survey sampling and hierarchical models. Researchers are often interested in studying the relationship between community-level variables and individual outcomes. This approach often requires estimating the neighborhood-level variable of interest from the sampled households, which induces measurement error in the neighborhood-level covariate since not all households are sampled. Other times, neighborhood-level variables are not observed directly, and only a noisy proxy is available. In both cases, the observed variables may contain measurement error. Measurement error is known to attenuate the coefficient of the mismeasured variable, but it can also affect other coefficients in the model, and ignoring measurement error can lead to misleading inference. We propose a Bayesian hierarchical model that integrates an explicit model for the measurement error process, whether sampling-induced or classical, with a model for the outcome of interest. Advances in Bayesian computation, specifically the development of the Stan probabilistic programming language, make such models straightforward to implement.
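A minimal Stan sketch of the kind of model described in the first paper is given below. It is an illustrative simplification, not the authors' code: a varying-intercept outcome model in which log cluster size enters the cluster-level regression, one simple way to reflect probability-proportional-to-size selection in the outcome model. Sampled cluster sizes are treated as known, the paper's models for the unknown sizes of nonsampled clusters are omitted, and all variable names are hypothetical.

data {
  int<lower=1> J;                          // number of sampled clusters
  int<lower=1> N;                          // number of sampled units
  array[N] int<lower=1, upper=J> cluster;  // cluster membership of each unit
  vector[N] y;                             // survey outcome
  vector[J] log_size;                      // log of sampled cluster sizes
}
parameters {
  real mu;                    // overall mean
  real beta;                  // association between cluster size and outcome
  vector[J] alpha_raw;        // non-centered cluster effects
  real<lower=0> sigma_alpha;  // between-cluster standard deviation
  real<lower=0> sigma_y;      // within-cluster standard deviation
}
transformed parameters {
  // cluster means depend on (log) cluster size to capture the design effect
  vector[J] alpha = mu + beta * log_size + sigma_alpha * alpha_raw;
}
model {
  alpha_raw ~ std_normal();
  mu ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma_alpha ~ normal(0, 2);
  sigma_y ~ normal(0, 2);
  y ~ normal(alpha[cluster], sigma_y);
}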
2

Optimization under Uncertainty with Applications in Data-driven Stochastic Simulation and Rare-event Estimation

Zhang, Xinyu, January 2022
For many real-world problems, optimization can only be formulated with partial information or subject to uncertainty, due to reasons such as data measurement error, model misspecification, or dependence of the formulation on a non-stationary future. One must therefore make decisions without knowing the problem's full picture. This dissertation adopts the robust optimization framework, a worst-case perspective that characterizes uncertainty as feasible regions and optimizes over the worst possible scenarios. Two applications of this perspective are discussed: stochastic estimation and rare-event simulation.

Chapters 2 and 3 discuss a min-max framework to enhance existing estimators for simulation problems that involve a bias-variance tradeoff. Biased stochastic estimators, such as finite differences for noisy gradient estimation, often contain parameters that must be properly chosen to balance the impacts of bias and variance. While the optimal order of these parameters in terms of the simulation budget can be readily established, the precise best values depend on model characteristics that are typically unknown in advance. We introduce a framework for constructing new classes of estimators, based on judicious combinations of simulation runs over sequences of tuning-parameter values, such that the estimators consistently outperform a given tuning-parameter choice in the conventional approach, regardless of the unknown model characteristics. We quantify this outperformance via what we call the asymptotic minimax risk ratio, obtained by minimizing the worst-case asymptotic ratio between the mean squared errors of our estimators and the conventional one, where the worst case is taken over all possible values of the model unknowns. In particular, when the minimax ratio is less than 1, the calibrated estimator is guaranteed to perform better asymptotically. We identify this minimax ratio for general classes of weighted estimators and the regimes in which the ratio is less than 1. Moreover, we show that the best weighting scheme is characterized by a sum of two components with distinct decay rates. We explain how this arises from bias-variance balancing that combats the adversarial selection of the model constants, which can be analyzed via a tractable reformulation of a non-convex optimization problem.

Chapters 4 and 5 discuss extreme event estimation using a distributionally robust optimization framework. Conventional methods for extreme event estimation rely on well-chosen parametric models asymptotically justified by extreme value theory (EVT). These methods, while powerful and theoretically grounded, can encounter difficult bias-variance tradeoffs that are exacerbated when the data size is small, deteriorating the reliability of the tail estimation. These chapters study a framework based on the rapidly growing literature on distributionally robust optimization. This approach can be viewed as a nonparametric alternative to conventional EVT: it imposes general shape beliefs on the tail instead of parametric assumptions and uses worst-case optimization to handle the nonparametric uncertainty. We explain how this approach bypasses the bias-variance tradeoff in EVT; in exchange, it faces a conservativeness-variance tradeoff, which we describe how to tackle. We also demonstrate computational tools for the resulting optimization problems and compare our performance with conventional EVT across a range of numerical examples.
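In schematic notation (chosen here for illustration, not taken from the dissertation), the asymptotic minimax risk ratio described above can be written as

\rho^* = \min_{w \in \mathcal{W}} \; \max_{c \in \mathcal{C}} \; \lim_{\Gamma \to \infty} \frac{\mathrm{MSE}\big(\hat{\theta}_w(\Gamma); c\big)}{\mathrm{MSE}\big(\hat{\theta}_{\mathrm{conv}}(\Gamma); c\big)},

where \Gamma is the simulation budget, c ranges over the unknown model constants, \mathcal{W} indexes the weighted combinations of runs across tuning-parameter values, and \hat{\theta}_{\mathrm{conv}} is the conventional estimator with a fixed tuning-parameter choice. A value \rho^* < 1 certifies that the calibrated estimator asymptotically outperforms the conventional one regardless of c.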
3

Flexible models of time-varying exposures

Wang, Chenkun, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI)

With the availability of electronic medical records, medication dispensing data offers an unprecedented opportunity for researchers to explore complex relationships among long-term medication use, disease progression, and potential side effects in large patient populations. However, these data also pose challenges to existing statistical models because both medication exposure status and its intensity vary over time. This dissertation focuses on flexible models for investigating the association between time-varying exposures and different types of outcomes. First, a penalized functional regression model was developed to estimate the effect of time-varying exposures on multivariate longitudinal outcomes. Second, for survival outcomes, a regression-spline-based model was proposed in the Cox proportional hazards (PH) framework to compare disease risk among different types of time-varying exposures. Finally, a penalized-spline-based Cox PH model with functional interaction terms was developed to estimate interaction effects between multiple medication classes. Data from a primary care patient cohort are used to illustrate the proposed approaches in determining the association between antidepressant use and various outcomes.

NIH grants R01 AG019181 and P30 AG10133.
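As a hedged sketch of the kind of specification described above (not necessarily the exact formulation in the dissertation), a Cox proportional hazards model with a spline-weighted cumulative exposure can be written as

h_i(t) = h_0(t) \exp\Big\{ \gamma^\top Z_i + \int_0^t w(t-s)\, X_i(s)\, ds \Big\},

where X_i(s) is subject i's medication exposure intensity at time s, Z_i are baseline covariates, h_0(t) is the baseline hazard, and the weight function w(\cdot) is represented by a penalized regression spline so that recent and distant exposures can contribute differently to the current hazard. Functional interaction terms between the exposure histories of two medication classes extend the integrand analogously.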
4

Single-index regression models

Wu, Jingwei, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI)

Useful medical indices play important roles in predicting medical outcomes. Medical indices such as the well-known Body Mass Index (BMI) and the Charlson Comorbidity Index have been used extensively in research and clinical practice for the quantification of risks in individual patients. However, the development of these indices is challenging and primarily based on heuristic arguments. Statistically, most medical indices can be expressed as a function of a linear combination of individual variables and fitted by a single-index model. The single-index model represents a way to retain latent nonlinear features of the data without the usual complications that come with increased dimensionality. In this dissertation, I propose a single-index model approach to analytically derive indices from observed data; the resulting index inherently correlates with specific health outcomes of interest.

The first part of this dissertation discusses the derivation of an index function for the prediction of one outcome using longitudinal data. A cubic-spline estimation scheme for a partially linear single-index mixed-effects model is proposed to incorporate the within-subject correlations among outcome measures contributed by the same subject. A recursive algorithm based on the optimization of a penalized least-squares estimating equation is derived and is shown to work well both in simulated data and in the derivation of a new body mass measure for assessing hypertension risk in children.

The second part of this dissertation extends the single-index model to a multivariate setting. Specifically, a multivariate version of the single-index model for longitudinal data is presented. An important feature of the proposed model is its accommodation of correlations both among the multivariate outcomes and among the repeated measurements from the same subject, via random effects that link the outcomes in a unified modeling structure. A new body mass index measure that simultaneously predicts systolic and diastolic blood pressure in children is illustrated.

The final part of this dissertation establishes existence, root-n strong consistency, and asymptotic normality of the estimators in the multivariate single-index model under suitable conditions. These asymptotic results are assessed in finite-sample simulations and permit joint inference for all parameters.
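A schematic form of the partially linear single-index mixed-effects model in the first part (notation chosen here for illustration) is

Y_{ij} = W_{ij}^\top \gamma + \eta\big(X_{ij}^\top \beta\big) + b_i + \varepsilon_{ij}, \qquad \|\beta\| = 1,

where Y_{ij} is the j-th measurement on subject i, W_{ij} are covariates entering linearly, X_{ij} are the components of the index (for example, anthropometric measures), \eta(\cdot) is an unknown smooth link estimated with cubic splines, b_i is a subject-level random effect capturing within-subject correlation, and the unit-norm constraint on \beta identifies the index. The multivariate extension in the second part links several outcomes through shared random effects in the same structure.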
