1 
Inference and Prediction Problems for Spatial and Spatiotemporal Data. Cervone, Daniel Leonard, 17 July 2015.
This dissertation focuses on prediction and inference problems for complex spatiotemporal systems. I explore three specific problems in this area, motivated by real data examples, and discuss the theoretical motivations for the proposed methodology, implementation details, and inference/performance on data of interest.
Chapter 1 introduces a novel time series model that improves the accuracy of lung tumor tracking for radiotherapy. Tumor tracking requires real-time, multiple-step-ahead forecasting of a quasi-periodic time series recording instantaneous tumor locations. Our proposed model is a location-mixture autoregressive (LMAR) process that admits multimodal conditional distributions, fast approximate inference using the EM algorithm, and accurate multiple-step-ahead predictive distributions. Compared with other families of mixture autoregressive models, LMAR is easier to fit (with a smaller parameter space) and better suited to online inference and multiple-step-ahead forecasting, as there is no need for Monte Carlo. Against other candidate models in statistics and machine learning, our model provides superior predictive performance for clinical data.
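The point that mixture autoregressive forecasts need no Monte Carlo can be made concrete: composing Gaussian mixture transitions keeps the multiple-step-ahead predictive distribution an exact (if growing) Gaussian mixture. A minimal sketch with a hypothetical two-component mixture AR(1), not the fitted LMAR model itself:

```python
import numpy as np

# Hypothetical two-component mixture AR(1) (illustrative parameters only):
#   y_{t+1} | y_t, component k ~ Normal(c[k] + phi * y_t, sigma2), P(k) = w[k].
w = np.array([0.6, 0.4])       # mixture weights
c = np.array([-1.0, 1.0])      # component location offsets
phi, sigma2 = 0.8, 0.09        # shared AR coefficient and innovation variance

def predictive_mixture(y_t, steps):
    """Exact h-step-ahead predictive distribution given y_t: a Gaussian
    mixture whose components enumerate all length-h indicator sequences.
    Each composition of Gaussians stays Gaussian, so no sampling is needed."""
    means = np.array([float(y_t)])
    variances = np.array([0.0])
    weights = np.array([1.0])
    for _ in range(steps):
        # Propagate every current component through both mixture branches.
        variances = np.repeat(sigma2 + phi ** 2 * variances, len(w))
        means = (c[None, :] + phi * means[:, None]).ravel()
        weights = (weights[:, None] * w[None, :]).ravel()
    return means, variances, weights

# The 3-step predictive is a mixture of 2**3 = 8 Gaussians.
means, variances, weights = predictive_mixture(y_t=0.5, steps=3)
```

In practice the number of components can be pruned or collapsed at each step; the point of the sketch is only that the predictive distribution is available in closed form.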
Chapter 2 develops a stochastic process model for the spatiotemporal evolution of a basketball possession based on tracking data that records each player's exact location at 25 Hz. Our model comprises multiresolution transition kernels that simultaneously describe players' continuous motion dynamics along with their decisions, ball movements, and other discrete actions. Many such actions occur very sparsely in player × location space, so we use hierarchical models to share information across different players in the league and disjoint regions on the basketball court, a challenging problem given the scale of our data (over 400 players and 1 billion space-time observations) and the computational cost of inferential methods in spatial statistics. Our framework, in addition to offering valuable insight into individual players' behavior and decision-making, allows us to estimate the instantaneous expected point value of an NBA possession by averaging over all possible future possession paths.
In Chapter 3, we investigate Gaussian process regression where inputs are subject to measurement error. For instance, in spatial statistics, input measurement errors occur when the geographical locations of observed data are not known exactly. Such sources of error are not special cases of "nugget" or microscale variation, and require alternative methods for both interpolation and parameter estimation. We discuss some theory for Kriging in this regime, as well as using Hybrid Monte Carlo to provide predictive distributions (and parameter estimates, if necessary). Through a simulation study and an analysis of northern hemisphere temperature data from the summer of 2011, we show that appropriate methods for incorporating location measurement error are essential to reliable inference in this regime. / Statistics
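The input-error setting in Chapter 3 above can be illustrated with a toy one-dimensional example: naive Kriging plugs the noisy locations in as if exact, while a crude correction averages the Kriging predictor over draws of the unobserved true locations. All settings here (kernel, noise variances, grid) are hypothetical, and the averaging is only a stand-in for the full posterior that Hybrid Monte Carlo would explore:

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp(a, b, ell=1.0, tau2=1.0):
    """Squared-exponential covariance matrix between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return tau2 * np.exp(-0.5 * (d / ell) ** 2)

# Toy data: true inputs observed with Gaussian error, x_obs = x_true + u.
n, s2_u, s2_eps = 40, 0.05, 0.01
x_true = np.sort(rng.uniform(0.0, 5.0, n))
y = np.sin(x_true) + rng.normal(0.0, np.sqrt(s2_eps), n)
x_obs = x_true + rng.normal(0.0, np.sqrt(s2_u), n)
x_new = np.linspace(0.0, 5.0, 50)

# Naive Kriging treats the noisy locations as exact.
K = sq_exp(x_obs, x_obs) + s2_eps * np.eye(n)
naive = sq_exp(x_new, x_obs) @ np.linalg.solve(K, y)

# Crude correction: average the Kriging predictor over plausible draws of
# the unobserved true locations.
draws = []
for _ in range(200):
    x_d = x_obs + rng.normal(0.0, np.sqrt(s2_u), n)
    K_d = sq_exp(x_d, x_d) + s2_eps * np.eye(n)
    draws.append(sq_exp(x_new, x_d) @ np.linalg.solve(K_d, y))
corrected = np.mean(draws, axis=0)
```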

2 
Methods for Effectively Combining Group and Individual-Level Data. Smoot, Elizabeth, 17 July 2015.
In observational studies, researchers often have access to multiple sources of information but ultimately choose to apply well-established statistical methods that do not take advantage of the full range of information available. In this dissertation I discuss three methods that are able to incorporate this additional data and show how using each improves the quality of the analysis.
First, in Chapters 1 and 2, I focus on methods for improving estimator efficiency in studies in which both population (group) and individual-level data are available. In such settings, the hybrid design for ecological inference efficiently combines the two sources of information; however, in practice, maximizing the likelihood is often computationally intractable. I propose and develop an alternative, computationally efficient representation of the hybrid likelihood. I then demonstrate that this approximation incurs no penalty in terms of increased bias or reduced efficiency.
Second, in Chapters 3 and 4, I highlight the problem of applying standard analyses to outcome-dependent sampling schemes in settings in which study units are cluster-correlated. I demonstrate that incorporating known outcome totals into the likelihood via inverse probability weights results in valid estimation and inference. I further discuss the applicability of outcome-dependent sampling schemes in resource-limited settings, specifically to the analysis of national ART programs in sub-Saharan Africa. I propose the cluster-stratified case-control study as a valid and logistically reasonable study design in such resource-poor settings, discuss balanced versus unbalanced sampling techniques, and address the practical trade-off between logistic considerations and statistical efficiency of cluster-stratified case-control versus case-control studies.
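The core of the inverse-probability-weighting idea above can be sketched in a few lines: under outcome-dependent (case-control style) sampling, weighting each sampled unit by the inverse of its known sampling probability recovers population quantities. The population, rates, and sampling fraction below are all hypothetical, and the sketch ignores the clustering that the chapters handle:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population in which a binary exposure raises outcome risk.
N = 100_000
x = rng.binomial(1, 0.3, N)              # exposure indicator
y = rng.binomial(1, 0.02 + 0.06 * x)     # rare binary outcome

# Outcome-dependent sampling: keep every case and a 5% subsample of
# controls; the known sampling rates supply the weights.
pi0 = 0.05
keep = (y == 1) | (rng.random(N) < pi0)
w = np.where(y[keep] == 1, 1.0, 1.0 / pi0)   # inverse probability weights

# Weighted means recover the population risks despite the biased sampling.
exposed = x[keep] == 1
risk_exposed = np.average(y[keep][exposed], weights=w[exposed])
risk_unexposed = np.average(y[keep][~exposed], weights=w[~exposed])
# risk_exposed should be near 0.08 and risk_unexposed near 0.02.
```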
Finally, in Chapter 5, I demonstrate the benefit of incorporating the full range of possible outcomes into an observational data analysis, as opposed to running the analysis on a preselected set of outcomes. Testing all possible outcomes for associations with the exposure inherently incorporates negative controls into the analysis and further validates a study's statistically significant results. I apply this technique to an investigation of the relationship between particulate air pollution and hospital admission causes. / Biostatistics

3 
Extensions of Randomization-Based Methods for Causal Inference. Lee, Joseph Jiazong, 17 July 2015.
In randomized experiments, the random assignment of units to treatment groups justifies many of the traditional analysis methods for evaluating causal effects. Specifying subgroups of units for further examination after observing outcomes, however, may partially nullify any advantages of randomized assignment when data are analyzed naively. Some previous statistical literature has treated all post-hoc analyses homogeneously as entirely invalid and thus uninterpretable. Alternative analysis methods and the extent of the validity of such analyses remain largely unstudied. Here Chapter 1 proposes a novel, randomization-based method that generates valid post-hoc subgroup p-values, provided we know exactly how the subgroups were constructed. If we do not know the exact subgrouping procedure, our method may still place helpful bounds on the significance level of estimated effects. Chapter 2 extends the proposed methodology to generate valid posterior predictive p-values for partially post-hoc subgroup analyses, i.e., analyses that compare existing experimental data, from which a subgroup specification is derived, to new, subgroup-only data. Both chapters are motivated by pharmaceutical examples in which subgroup analyses played pivotal and controversial roles. Chapter 3 extends our randomization-based methodology to more general randomized experiments with multiple testing and nuisance unknowns. The results are valid familywise tests that are doubly advantageous, in terms of statistical power, over traditional methods. We apply our methods to data from the United States Job Training Partnership Act (JTPA) Study, where our analyses lead to different conclusions regarding the significance of estimated JTPA effects. In all chapters, we investigate the operating characteristics and demonstrate the advantages of our methods through a series of simulations. / Statistics
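The randomization-based machinery this abstract builds on is the Fisher randomization test: re-randomize the treatment labels and compare re-randomized test statistics to the observed one. A minimal sketch on simulated data (the chapter's post-hoc subgroup version would additionally re-apply the subgroup construction rule inside the re-randomization loop, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(2)

def randomization_p_value(y, z, n_draws=2000):
    """Fisher randomization test of the sharp null of no effect for any
    unit, using the absolute difference in means as the test statistic."""
    obs = y[z == 1].mean() - y[z == 0].mean()
    count = 0
    for _ in range(n_draws):
        z_star = rng.permutation(z)          # a fresh hypothetical assignment
        stat = y[z_star == 1].mean() - y[z_star == 0].mean()
        count += abs(stat) >= abs(obs)
    return (1 + count) / (1 + n_draws)       # add-one to keep p > 0

# Simulated two-arm experiment with a genuine unit-level treatment effect.
z = np.repeat([0, 1], 50)
y = rng.normal(0.0, 1.0, 100) + 1.0 * z
p = randomization_p_value(y, z)
```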

4 
Ordinal Outcome Prediction and Treatment Selection in Personalized Medicine. Shen, Yuanyuan, 01 May 2017.
In personalized medicine, two important tasks are predicting disease risk and selecting appropriate treatments for individuals based on their baseline information. This dissertation focuses on providing improved risk prediction for ordinal outcome data and proposing a score-based test to identify informative markers for treatment selection. In Chapter 1, we take up the first problem and propose a disease risk prediction model for ordinal outcomes. Traditional ordinal outcome models leave out intermediate models, which may lead to suboptimal prediction performance; they also do not allow for nonlinear covariate effects. To overcome these limitations, a continuation ratio kernel machine (CRKM) model is proposed both to let the data reveal the underlying model and to capture potential nonlinear effects among predictors, so that prediction accuracy is maximized. In Chapter 2, we seek to develop a kernel machine (KM) score test that can efficiently identify markers that are predictive of treatment difference. This new approach overcomes the shortcomings of the standard Wald test, which is scale-dependent and takes into account only linear effects among predictors. To do this, we propose a model-free score test statistic and implement the KM framework. Simulations and real data applications demonstrate the advantage of our methods over the Wald test. In Chapter 3, based on the procedure proposed in Chapter 2, we further add a sparsity assumption on the predictors to account for the real-world problem of sparse signals. We incorporate the generalized higher criticism (GHC) to threshold the signals in a group and maintain high detection power. A comprehensive comparison of the procedures in Chapters 2 and 3 demonstrates the advantages and disadvantages of the different procedures under different scenarios. / Biostatistics

5 
Essays in Causal Inference and Public Policy. Feller, Avi Isaac, 17 July 2015.
This dissertation addresses statistical methods for understanding treatment effect variation in randomized experiments, both in terms of variation across pre-treatment covariates and variation across post-randomization intermediate outcomes. These methods are then applied to data from the National Head Start Impact Study (HSIS), a large-scale randomized evaluation of the federally funded preschool program, which has become an important part of the policy debate in early childhood education.
Chapter 2 proposes a randomization-based approach for testing for the presence of treatment effect variation not explained by observed covariates. The key challenge in using this approach is the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting. We explore potential solutions and advocate for a method that guarantees valid tests in finite samples despite this nuisance. We also show how this method readily extends to testing for heterogeneity beyond a given model, which can be useful for assessing the sufficiency of a given scientific theory. We finally apply this method to the HSIS and find that there is indeed significant unexplained treatment effect variation.
Chapter 3 leverages model-based principal stratification to assess treatment effect variation across an intermediate outcome in the HSIS. In particular, we estimate differential impacts of Head Start by alternative care setting, the care that children would receive in the absence of the offer to enroll in Head Start. We find strong, positive short-term effects of Head Start on receptive vocabulary for those Compliers who would otherwise be in home-based care. By contrast, we find no meaningful impact of Head Start on vocabulary for those Compliers who would otherwise be in other center-based care. Our findings suggest that alternative care type is a potentially important source of variation in Head Start.
Chapter 4 reviews the literature on the use of principal score methods, which rely on predictive covariates rather than outcomes for estimating principal causal effects. We clarify the role of the Principal Ignorability assumption in this approach and show that there are in fact two versions: Strong and Weak Principal Ignorability. We then explore several estimators proposed in the literature and assess their finite sample properties via simulation. Finally, we propose some extensions to the case of two-sided noncompliance and apply these ideas to the HSIS, finding mixed results. / Statistics

6 
Exploring the Role of Randomization in Causal Inference. Ding, Peng, 17 July 2015.
This manuscript includes three topics in causal inference, all under the randomization inference framework (Neyman, 1923; Fisher, 1935a; Rubin, 1978), presented as three self-contained chapters.
Chapter 1. Under the potential outcomes framework, causal effects are defined as comparisons between potential outcomes under treatment and control. To infer causal effects from randomized experiments, Neyman proposed to test the null hypothesis of zero average causal effect (Neyman's null), and Fisher proposed to test the null hypothesis of zero individual causal effect (Fisher's null). Although the subtle difference between Neyman's null and Fisher's null has caused much controversy and confusion for both theoretical and practical statisticians, a careful comparison between the two approaches has been lacking in the literature for more than eighty years. I fill in this historical gap by making a theoretical comparison between them and highlighting an intriguing paradox that has not been recognized by previous researchers. Logically, Fisher's null implies Neyman's null. It is therefore surprising that, in actual completely randomized experiments, rejection of Neyman's null does not imply rejection of Fisher's null in many realistic situations, including the case with constant causal effect. Furthermore, I show that this paradox also exists in other commonly used experiments, such as stratified experiments, matched-pair experiments, and factorial experiments. Asymptotic analyses, numerical examples, and real data examples all support this surprising phenomenon. Besides its historical and theoretical importance, this paradox also leads to useful practical implications for modern researchers.
Chapter 2. Causal inference in completely randomized treatment-control studies with binary outcomes is discussed from Fisherian, Neymanian, and Bayesian perspectives, using the potential outcomes framework. A randomization-based justification of Fisher's exact test is provided. Arguing that the crucial assumption of constant causal effect is often unrealistic, and holds only for extreme cases, some new asymptotic and Bayesian inferential procedures are proposed. The proposed procedures exploit the intrinsic non-additivity of unit-level causal effects, can be applied to linear and nonlinear estimands, and dominate the existing methods, as verified theoretically and also through simulation studies.
Chapter 3. Recent literature has underscored the critical role of treatment effect variation in estimating and understanding causal effects. This approach, however, is in contrast to much of the foundational research on causal inference; Neyman, for example, avoided such variation through his focus on the average treatment effect and his definition of the confidence interval. In this chapter, I extend the Neymanian framework to explicitly allow both for treatment effect variation explained by covariates, known as the systematic component, and for unexplained treatment effect variation, known as the idiosyncratic component. This perspective enables estimation and testing of impact variation without imposing a model on the marginal distributions of potential outcomes, with the workhorse approach of regression with interaction terms being a special case. My approach leads to two practical results. First, I combine estimates of systematic impact variation with sharp bounds on overall treatment variation to obtain bounds on the proportion of total impact variation explained by a given model; this is essentially an R^2 for treatment effect variation. Second, by using covariates to partially account for the correlation of potential outcomes problem, I exploit this perspective to sharpen the bounds on the variance of the average treatment effect estimate itself. As long as the treatment effect varies across observed covariates, the resulting bounds are sharper than the current sharp bounds in the literature. I apply these ideas to a large randomized evaluation in educational research, showing that these results are meaningful in practice. / Statistics

7 
Three Aspects of Biostatistical Learning Theory. Neykov, Matey, 17 July 2015.
In the present dissertation we consider three classical problems in biostatistics and statistical learning: classification, variable selection, and statistical inference.
Chapter 2 is dedicated to multiclass classification. We characterize a class of loss functions, which we deem relaxed Fisher consistent, whose local minimizers not only recover the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions, and includes nonconvex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners, which works with any loss function in our class. We show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. In addition, we provide numerical studies and a real data example which serve to illustrate that the algorithm performs well in practice.
In Chapter 3, we provide insights on the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms, a thresholding-based algorithm known as diagonal thresholding and an L1 penalization algorithm (semidefinite programming), and show that they achieve optimal (up to a constant) sample size in terms of support recovery in the case of standard Gaussian predictors. In addition, we look into the performance of the linear regression LASSO in single index models with correlated Gaussian designs. We show that under certain restrictions on the covariance and signal, the linear regression LASSO can also enjoy optimal sample size in terms of support recovery. Our analysis extends existing results on the LASSO's variable selection capabilities for linear models.
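Diagonal thresholding for sliced inverse regression is simple enough to sketch end to end: slice on the response, compute per-coordinate variances of the within-slice means of the predictors, and keep the coordinates whose scores stand out. The dimensions, link function, and signal below are illustrative, not the regimes analyzed in the chapter:

```python
import numpy as np

rng = np.random.default_rng(3)

# Single index model y = f(x' beta) + noise with sparse beta and a
# standard Gaussian design (hypothetical settings).
n, p = 4000, 30
support = np.array([0, 1, 2])
beta = np.zeros(p)
beta[support] = 1.0 / np.sqrt(3)
X = rng.standard_normal((n, p))
y = (X @ beta) ** 3 + rng.normal(0.0, 0.5, n)

# Diagonal thresholding: slice on y, then score each coordinate by the
# variance across slices of its within-slice mean. Coordinates outside
# the support have slice means that are just noise of order 1/(n/H).
H = 10
order = np.argsort(y)
slices = np.array_split(order, H)
slice_means = np.stack([X[idx].mean(axis=0) for idx in slices])
scores = slice_means.var(axis=0)

# Keep the coordinates with the largest scores (here, the true support size).
selected = np.argsort(scores)[-len(support):]
```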
Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations. This framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME, and LDP, among others, none of which had previously been equipped with inferential procedures. / Biostatistics

8 
On Causal Inference for Ordinal Outcomes. Lu, Jiannan, 04 December 2015.
This dissertation studies the problem of causal inference for ordinal outcomes. Chapter 1 focuses on the sharp null hypothesis of no treatment effect on any experimental unit, and develops a systematic procedure for closed-form construction of sequences of alternative hypotheses in increasing order of their departures from the sharp null hypothesis. The resulting construction procedure helps assess the power of randomization tests with ordinal outcomes. Chapter 2 proposes two new causal parameters, namely the probabilities that the treatment is beneficial and strictly beneficial for the experimental units, and derives their sharp bounds using only the marginal distributions, without imposing any assumptions on the joint distribution of the potential outcomes. Chapter 3 generalizes the framework in Chapter 2 to address noncompliance. / Statistics
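The marginals-only bounds in Chapter 2 above reduce, in the binary special case, to classical Fréchet-style bounds, which make for a compact illustration (the chapter's result covers general ordinal outcomes; this sketch does not):

```python
# Bounds on the probability that treatment is strictly beneficial,
# P(Y(1) = 1, Y(0) = 0), using only the marginals p1 = P(Y(1) = 1) and
# p0 = P(Y(0) = 1). Nothing is assumed about the joint distribution of
# the two potential outcomes, which are never observed together.

def beneficial_prob_bounds(p1, p0):
    """Sharp lower and upper bounds on P(Y(1) = 1, Y(0) = 0)."""
    lower = max(0.0, p1 - p0)       # forced overlap of the two events
    upper = min(p1, 1.0 - p0)       # neither marginal can be exceeded
    return lower, upper

# Example: with P(Y(1)=1) = 0.7 and P(Y(0)=1) = 0.4, the probability of
# strict benefit lies between 0.3 and 0.6.
lo, hi = beneficial_prob_bounds(p1=0.7, p0=0.4)
```

The bounds are attained by the perfectly positively and negatively coupled joints, which is why they are sharp without further assumptions.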

9 
Topics in Bayesian Inference for Causal Effects. Garcia Horton, Viviana, 04 December 2015.
This manuscript addresses two topics in Bayesian inference for causal effects.
1) Treatment noncompliance is frequent in clinical trials, and because the treatment actually received may be different from that assigned, comparisons between groups as randomized will no longer assess the effect of the treatment received.
To address this complication, we create latent subgroups based on the potential outcomes of treatment received and focus on the subgroup of compliers, where under certain assumptions the estimands of causal effects of assignment can be interpreted as causal effects of receipt of treatment.
We propose estimands of causal effects for right-censored time-to-event endpoints, and discuss a framework to estimate those causal effects that relies on modeling survival times as parametric functions of pre-treatment variables.
We demonstrate a Bayesian estimation strategy that multiply imputes the missing data from posterior predictive distributions, illustrated with data from a randomized clinical trial involving breast cancer patients.
Finally, we establish a connection with the commonly used parametric proportional hazards and accelerated failure time models, and briefly discuss the consequences of relaxing the assumption of independent censoring.
2) Bayesian inference for causal effects based on data obtained from ignorable assignment mechanisms can be sensitive to the model specified for the data.
Ignorability is defined with respect to specific models for an assignment mechanism and data, which we call the "true" generating data models, generally unknown to the statistician; these, in turn, determine a true posterior distribution for a causal estimand of interest.
On the other hand, the statistician poses a set of models to conduct the analysis, which we call the "statistician's" models; a posterior distribution for the causal estimand can be obtained assuming these models.
Let $\Delta_M$ denote the difference between the true models and the statistician's models, and let $\Delta_D$ denote the difference between the true posterior distribution and the statistician's posterior distribution (for a specific estimand).
For fixed $\Delta_M$ and fixed sample size, $\Delta_D$ varies more with data-dependent assignment mechanisms than with data-free assignment mechanisms.
We illustrate this through a sequence of examples of $\Delta_M$ under various ignorable assignment mechanisms, namely complete randomization, rerandomization, and the finite selection model design.
In each case, we create the 95% posterior interval for an estimand under a statistician's model, and then compute its coverage probability under the correct posterior distribution; this Bayesian coverage probability is our choice of measure for $\Delta_D$.
The objective of these examples is to provide insights into the ranges of data models for which Bayesian inference for causal effects from datasets obtained through ignorable assignment mechanisms is approximately valid from the Bayesian perspective, and how these validities are influenced by data-dependent assignment mechanisms. / Statistics

10 
'Time for a New Angle!': Unravel the Mystery of Split-Plot Designs via the Potential Outcomes Prism. Zhao, Anqi, 25 July 2017.
This manuscript investigates two different approaches, namely the Neymanian randomization-based method (Neyman, 1923) and the Bayesian model-based method (Rubin, 1978), toward causal inference for 2-by-2 split-plot designs (Jones and Nachtsheim, 2009), both under the potential outcomes framework (Neyman, 1923; Rubin, 1974, 1978, 2005).
Chapters 1 through 5. Given two 2-level factors of interest, a 2-by-2 split-plot design (a) takes each of the 2 × 2 = 4 possible factorial combinations as a treatment, (b) identifies one factor as the 'whole-plot' factor, (c) divides the experimental units into blocks, and (d) assigns the treatments in such a way that all units within the same block receive the same level of the whole-plot factor. Assuming the potential outcomes framework, we propose in Chapters 1 through 5 a randomization-based estimation procedure for causal inference under such designs. Sampling variances of the point estimates are derived in closed form as linear combinations of the between- and within-block covariances of the potential outcomes. Results are compared to those under complete randomization as measures of design efficiency. Interval estimates are constructed based on conservative estimates of the sampling variances, and their frequency coverage properties are evaluated via simulation. Superiority over existing model-based alternatives is reported under a variety of settings for both binary and continuous outcomes.
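The assignment mechanism described in steps (a) through (d) above can be sketched directly: the whole-plot factor is randomized at the block level, and the sub-plot factor within each block. Block count, block size, and outcome model below are hypothetical, and the difference-in-cell-means contrast is only the simplest of the estimators the chapters study:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative randomization for a 2-by-2 split-plot design.
W, m = 8, 6   # number of blocks, units per block (hypothetical sizes)

# (b)-(d): half the blocks receive level 1 of the whole-plot factor A;
# within every block, half the units receive level 1 of sub-plot factor B.
a_block = rng.permutation(np.repeat([0, 1], W // 2))
A = np.repeat(a_block, m)                      # constant within each block
B = np.concatenate(
    [rng.permutation(np.repeat([0, 1], m // 2)) for _ in range(W)]
)

# With outcomes y, factorial effects are contrasts of the four cell means;
# here, toy outcomes with a true main effect of A.
y = rng.normal(0.0, 1.0, W * m) + 0.5 * A
cell_means = {(a, b): y[(A == a) & (B == b)].mean()
              for a in (0, 1) for b in (0, 1)}
main_A = ((cell_means[(1, 0)] + cell_means[(1, 1)])
          - (cell_means[(0, 0)] + cell_means[(0, 1)])) / 2
```

Because A is constant within blocks, its estimate carries between-block variability while B's carries within-block variability, which is exactly why the two factors have different sampling variances under this design.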
Chapter 6. Causal inference compares the differences in outcomes over a particular set of experimental units. Whereas randomization-based Neymanian inference focuses on the experimental units directly involved in the study, the introduction of the Bayesian inferential framework provides a principled way to extend such finite population concerns to the super-population (Rubin, 1978). We outline in this chapter the explicit procedure for analyzing 2-by-2 split-plot designs under this framework, and illustrate the various technical issues in the actual implementation via examples. / Statistics
