41 |
Topics in experimental and tournament design. Hennessy, Jonathan Philip, 21 October 2014
We examine three topics related to experimental design in this dissertation. Two are related to the analysis of experimental data and the other focuses on the design of paired comparison experiments, in this case knockout tournaments. The two analysis topics are motivated by how to estimate and test causal effects when the assignment mechanism fails to create balanced treatment groups. In Chapter 2, we apply conditional randomization tests to experiments where, through random chance, the treatment groups differ in their covariate distributions. In Chapter 4, we apply principal stratification to factorial experiments where the subjects fail to comply with their assigned treatment. The sources of imbalance differ, but, in both cases, ignoring the imbalance can lead to incorrect conclusions.
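A conditional randomization test can be sketched, very roughly, as a permutation test that keeps only those re-randomizations whose covariate imbalance resembles the observed one. The Python below is an illustrative assumption, not the dissertation's procedure. It conditions on a single covariate's difference in means falling within a hypothetical tolerance `tol`, and it uses the difference in mean outcomes as the test statistic. It also counts the observed assignment in the reference set, so the p-value is never zero.

```python
import random
from statistics import mean


def diff_in_means(values, assign):
    """Mean of the treated units minus mean of the control units."""
    treated = [v for v, a in zip(values, assign) if a == 1]
    control = [v for v, a in zip(values, assign) if a == 0]
    return mean(treated) - mean(control)


def conditional_randomization_test(y, x, assign, tol, draws=5000, seed=0):
    """Two-sided p-value for the difference in mean outcomes, using only
    re-randomizations whose covariate imbalance is within `tol` of the observed one."""
    rng = random.Random(seed)
    obs_stat = abs(diff_in_means(y, assign))
    obs_imbalance = diff_in_means(x, assign)
    kept, extreme = 1, 1  # the observed assignment always belongs to the reference set
    perm = list(assign)
    for _ in range(draws):
        rng.shuffle(perm)  # complete randomization: group sizes stay fixed
        if abs(diff_in_means(x, perm) - obs_imbalance) > tol:
            continue  # discard assignments whose covariate balance differs from the observed one
        kept += 1
        if abs(diff_in_means(y, perm)) >= obs_stat:
            extreme += 1
    return extreme / kept
```

With a constant outcome, every retained assignment is as extreme as the observed one, so the p-value is 1. Conditioning on the observed imbalance means the reference distribution reflects only assignments that are as (un)balanced as the one that actually occurred.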
In Chapter 3, we consider designing knockout tournaments to maximize different objectives given a prior distribution on the strengths of the players. These objectives include maximizing the probability that the best player wins the tournament. Our emphasis on balance in the other two chapters comes from a desire to create a fair comparison between treatments. In this case, however, the design uses the prior information to intentionally bias the tournament in favor of the better players. / Statistics
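To make the tournament-design objective concrete, the sketch below computes each player's probability of winning a fixed single-elimination bracket. It assumes known strengths and a Bradley–Terry model, P(i beats j) = s_i / (s_i + s_j), whereas the dissertation works with a prior distribution on strengths. The player names and strength values are hypothetical.

```python
def beats(s_i, s_j):
    """Bradley-Terry probability that a player of strength s_i beats one of strength s_j."""
    return s_i / (s_i + s_j)


def win_probabilities(bracket, strengths):
    """Map each player to P(wins the knockout); len(bracket) must be a power of two.

    Adjacent halves of `bracket` meet in later rounds, as on a standard draw sheet.
    """
    if len(bracket) == 1:
        return {bracket[0]: 1.0}
    half = len(bracket) // 2
    left = win_probabilities(bracket[:half], strengths)
    right = win_probabilities(bracket[half:], strengths)
    result = {}
    # A player wins the whole bracket by winning their half, then beating the other half's winner.
    for i, p_i in left.items():
        result[i] = p_i * sum(p_j * beats(strengths[i], strengths[j]) for j, p_j in right.items())
    for j, p_j in right.items():
        result[j] = p_j * sum(p_i * beats(strengths[j], strengths[i]) for i, p_i in left.items())
    return result


strengths = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
for bracket in (["A", "D", "B", "C"], ["A", "B", "C", "D"]):
    print(bracket, round(win_probabilities(bracket, strengths)["A"], 4))
# prints ['A', 'D', 'B', 'C'] 0.4876, then ['A', 'B', 'C', 'D'] 0.4063
```

In this toy example, pairing the strongest player with the weakest in the first round raises A's chance of winning from about 0.406 to about 0.488. This is the kind of seeding effect that a design objective such as "maximize the probability the best player wins" exploits.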
|
42 |
Evaluating the Performance of Propensity Scores to Address Selection Bias in a Multilevel Context: A Monte Carlo Simulation Study and Application Using a National Dataset. Lingle, Jeremy Andrew, 16 October 2009
When researchers are unable to randomly assign students to treatment conditions, selection bias is introduced into the estimates of treatment effects. Random assignment to treatment conditions, which has historically been the scientific benchmark for causal inference, is often impossible or unethical to implement in educational systems. For example, researchers cannot deny services to those who stand to gain from participation in an academic program. Additionally, students select into a particular treatment group through processes that are impossible to control, such as those that result in a child dropping out of high school or attending a resource-starved school. Propensity score methods provide valuable tools for removing selection bias from quasi-experimental research designs and observational studies by modeling the treatment assignment mechanism. The utility of propensity scores for removing selection bias has been validated when the observations are assumed to be independent; however, the ability of propensity scores to remove selection bias in a multilevel context, in which group membership plays a role in treatment assignment, is relatively unknown. A central purpose of the current study was to begin filling the gaps in knowledge regarding the performance of propensity scores for removing selection bias, as defined by covariate balance, in multilevel settings using a Monte Carlo simulation study. The performance of propensity scores was also examined using a large-scale national dataset. Results from this study support the conclusion that the multilevel characteristics of a sample have a bearing on the ability of propensity scores to balance covariates between treatment and control groups.
Findings suggest that propensity score estimation models should take into account the cluster-level effects when working with multilevel data; however, the numbers of treatment and control group individuals within each cluster must be sufficiently large to allow estimation of those effects. Propensity scores that take into account the cluster-level effects can have the added benefit of balancing covariates within each cluster as well as across the sample as a whole.
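Because the study judges propensity scores by the covariate balance they achieve, a balance check is a natural companion sketch. The code below is an illustration under assumptions, not the study's code. It uses the standardized mean difference (SMD) with a pooled sample standard deviation, a common balance metric. It applies this to a made-up two-cluster dataset in which the sample looks balanced overall while each cluster is badly imbalanced.

```python
from statistics import mean, variance


def smd(treated, control):
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


def balance_report(rows):
    """rows: (cluster, treated_flag, covariate) triples -> overall and per-cluster SMDs."""
    groups = {}
    for cluster, treated, x in rows:
        t_list, c_list = groups.setdefault(cluster, ([], []))
        (t_list if treated else c_list).append(x)
    all_t = [x for t_list, _ in groups.values() for x in t_list]
    all_c = [x for _, c_list in groups.values() for x in c_list]
    report = {"overall": smd(all_t, all_c)}
    for cluster, (t_list, c_list) in groups.items():
        report[cluster] = smd(t_list, c_list)
    return report


# Hypothetical data: two schools with opposite selection patterns.
rows = [("a", 1, 1), ("a", 1, 2), ("a", 1, 3), ("a", 0, 3), ("a", 0, 4), ("a", 0, 5),
        ("b", 1, 7), ("b", 1, 8), ("b", 1, 9), ("b", 0, 5), ("b", 0, 6), ("b", 0, 7)]
print(balance_report(rows))  # overall SMD 0.0, but -2.0 in "a" and 2.0 in "b"
```

The toy output mirrors the finding above: a propensity model that ignores clusters can achieve apparent overall balance while leaving large imbalances within each cluster.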
|
43 |
Searching for causal effects of road traffic safety interventions: applications of the interrupted time series design. Bonander, Carl, January 2015
Traffic-related injuries represent a global public health problem and contribute substantially to mortality and years lived with disability worldwide. Over recent decades, improvements to road traffic safety and injury surveillance systems have shifted the focus from the prevention of motor vehicle accidents to the control of injury events involving vulnerable road users (VRUs), such as cyclists and moped riders. There have been calls for better evaluation of safety interventions because of methodological problems associated with the most commonly used study designs. The purpose of this licentiate thesis was to assess the strengths and limitations of the interrupted time series (ITS) design, which has gained some attention for its ability to provide valid effect estimates. Two national road safety interventions involving VRUs were selected as cases: the Swedish bicycle helmet law for children under the age of 15, and the tightening of licensing rules for Class 1 mopeds. The empirical results suggest that both interventions were effective in improving the safety of VRUs. Unless other concurrent events affect the treatment population at the exact time of intervention, the effect estimates should be internally valid. One of the main limitations of the study design is the inability to identify why the interventions were successful, especially if they are complex and multifaceted. A lack of reliable exposure data can pose a further threat to studies of interventions involving VRUs if the intervention can affect the exposure itself. It may also be difficult to generalize the exact effect estimates to other regions and populations. Future studies should consider using the ITS design to enhance the internal validity of before-after measurements. / Traffic-related injuries represent a global public health problem and contribute substantially to mortality and years lived with disability.
Over recent decades, improvements to road traffic safety and injury surveillance systems have shifted the focus from motor vehicle accidents to injury events involving vulnerable road users (VRUs), such as cyclists and moped riders. There have been calls for better evaluation of safety interventions because of methodological problems associated with the most commonly used study designs. The purpose of this licentiate thesis was to assess the strengths and limitations of the interrupted time series (ITS) design, which has gained some attention for its ability to provide valid effect estimates while accounting for secular trends. Two national interventions involving VRUs were selected as cases: the Swedish bicycle helmet law for children under the age of 15, and the tightening of licensing rules for Class 1 mopeds. The empirical results suggest that both interventions were effective. These results are discussed in light of methodological considerations regarding internal and external validity, data quality, and the ability to fully understand the key causal mechanisms behind complex interventions.
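A common way to analyze an ITS design is segmented regression, which estimates a change in level and a change in slope at the interruption. The sketch below is a minimal version with several assumptions: a single interruption at a hypothetical index `t0`, ordinary least squares, no adjustment for autocorrelation or seasonality, and the standard library only. It is not the model fitted in the thesis.

```python
def solve(a, b):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def segmented_regression(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t, where post_t = 1 if t >= t0.

    Returns [b0, b1, b2, b3]: baseline level, pre-interruption trend,
    immediate level change, and change in trend after the interruption.
    """
    design = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        design.append([1.0, float(t), post, (t - t0) * post])
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y_t for row, y_t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y


# Hypothetical noise-free series: level drops by 3 and the trend by 0.2 at t = 12.
y = [10 + 0.5 * t if t < 12 else 10 + 0.5 * t - 3 - 0.2 * (t - 12) for t in range(24)]
b0, b1, level_change, slope_change = segmented_regression(y, t0=12)
```

On this series, the fit recovers the level change (-3) and slope change (-0.2) used to generate the data, up to rounding error. As noted above, the estimates are only as credible as the assumption that nothing else changed at the same moment.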
|
44 |
Adjusting for Selection Bias Using Gaussian Process Models. Du, Meng, 18 July 2014
This thesis develops techniques for adjusting for selection bias using Gaussian process models. Selection bias is a key issue both in sample surveys and in observational studies for causal inference. Despite recently developed techniques for dealing with selection bias in high-dimensional or complex situations, the use of Gaussian process models, and of Bayesian hierarchical models in general, has not been explored.
Three approaches are developed for using Gaussian process models to estimate the population mean of a response variable with a binary selection mechanism. The first approach models only the response, ignoring the selection probability. The second approach incorporates the selection probability when modeling the response using dependent Gaussian process priors. The third approach uses the selection probability as an additional covariate when modeling the response. The third approach requires knowledge of the selection probability, while the second approach can be used even when the selection probability is not available. In addition to these Gaussian process approaches, a new version of the Horvitz-Thompson estimator is also developed, which follows the conditionality principle and relates to importance sampling for Monte Carlo simulations.
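For reference, the standard Horvitz–Thompson estimator of a population mean weights each selected unit's response by the inverse of its selection probability. The Hájek form divides by the summed weights instead of the population size. The sketch assumes known selection probabilities and a known population size N. It shows only these textbook estimators, not the new conditionality-based version developed in the thesis, which is not described here in enough detail to reproduce.

```python
from itertools import product


def horvitz_thompson_mean(y, pi, population_size):
    """Unbiased estimate of the population mean from selected units with inclusion probabilities pi."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi)) / population_size


def hajek_mean(y, pi):
    """Ratio (Hajek) version: normalizes by the estimated population size sum(1/pi)."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi)) / sum(1 / p_i for p_i in pi)


# Check unbiasedness by enumerating every possible sample under Poisson sampling
# (each unit selected independently) for a hypothetical four-unit population.
population = [1.0, 2.0, 3.0, 4.0]
probs = [0.2, 0.4, 0.6, 0.8]
expected = 0.0
for selected in product([0, 1], repeat=len(population)):
    p_sample = 1.0
    for s, p in zip(selected, probs):
        p_sample *= p if s else 1 - p
    ys = [y for y, s in zip(population, selected) if s]
    ps = [p for p, s in zip(probs, selected) if s]
    expected += p_sample * horvitz_thompson_mean(ys, ps, len(population))
print(expected)  # equals the true mean 2.5 up to floating-point error
```

The enumeration shows why inverse-probability weighting corrects selection bias on average. The large weights attached to rarely selected units are also the source of the high sampling variance that model-based approaches such as Gaussian processes aim to reduce.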
Simulation studies and the analysis of an example due to Kang and Schafer show that the Gaussian process approaches that consider the selection probability not only correct selection bias effectively but also control sampling error well. They can therefore often provide more efficient estimates than the non-Gaussian-process methods tested, in both simple and complex situations. Even the Gaussian process approach that ignores the selection probability often, though not always, performs well when some selection bias is present.
These results demonstrate the strength of Gaussian process models in dealing with selection bias, especially in high-dimensional or complex situations. These results also demonstrate that Gaussian process models can be implemented rather effectively so that the benefits of using Gaussian process models can be realized in practice, contrary to the common belief that highly flexible models are too complex to use practically for dealing with selection bias.
|
45 |
Multilevel Potential Outcome Models for Causal Inference in Jury Research. January 2015
Recent advances in hierarchical or multilevel statistical models and causal inference using the potential outcomes framework hold tremendous promise for mock and real jury research. These advances enable researchers to explore how individual jurors can exert a bottom-up effect on the jury’s verdict and how case-level features can exert a top-down effect on a juror’s perception of the parties at trial. This dissertation explains these technical advances and then applies them to a pre-existing mock jury dataset, providing worked examples in an effort to spur the adoption of these techniques. In particular, the dissertation introduces two new cross-level mediated effects and then describes how to conduct ecological validity tests with these mediated effects. The first cross-level mediated effect, the a1b1 mediated effect, is the juror-level mediated effect for a jury-level manipulation. The second cross-level mediated effect, the a2bc mediated effect, is the unique contextual effect that being in a jury has on the individual juror. When a mock jury study includes a deliberation versus non-deliberation manipulation, the a1b1 mediated effect can be compared across the two conditions, enabling a general test of ecological validity. If deliberating in a group generally influences the individual, then the two indirect effects should be significantly different. The a2bc mediated effect can also be interpreted as a specific test of how much changes in the jury-level means of this specific mediator affect juror-level decision-making. / Dissertation/Thesis / Doctoral Dissertation Psychology 2015
|
46 |
Estimating Causal Direct and Indirect Effects in the Presence of Post-Treatment Confounders: A Simulation Study. January 2013
In investigating mediating processes, researchers usually use randomized experiments and linear regression or structural equation modeling to determine whether the treatment affects the hypothesized mediator and whether the mediator affects the targeted outcome. However, randomizing the treatment will not yield accurate causal path estimates unless certain assumptions are satisfied. Since randomization of the mediator may not be plausible in most studies (i.e., the mediator status is not randomly assigned but self-selected by participants), both the direct and indirect effects may be biased by confounding variables. The purpose of this dissertation is (1) to investigate the extent to which traditional mediation methods are affected by confounding variables and (2) to assess the statistical performance of several modern methods for addressing confounding variable effects in mediation analysis. This dissertation first reviewed the theoretical foundations of causal inference in statistical mediation analysis and modern statistical analysis for causal inference, and then described different methods to estimate causal direct and indirect effects in the presence of two post-treatment confounders. A large simulation study was designed to evaluate the extent to which ordinary regression and modern causal inference methods are able to obtain correct estimates of the direct and indirect effects when confounding variables that are present in the population are not included in the analysis. Five methods were compared in terms of bias, relative bias, mean squared error, statistical power, Type I error rates, and confidence interval coverage to test how robust the methods are to violation of the no-unmeasured-confounders assumption and to confounder effect sizes. The methods explored were linear regression with adjustment, inverse propensity weighting, inverse propensity weighting with truncated weights, sequential g-estimation, and doubly robust sequential g-estimation.
Results showed that, in estimating the direct and indirect effects, sequential g-estimation generally performed best in terms of bias, Type I error rates, power, and coverage across different confounder effect sizes, direct effect sizes, and sample sizes when all confounders were included in the estimation. When one of the two confounders was omitted from the estimation process, in general, none of the methods had acceptable relative bias in the simulation study. Omitting one of the confounders from estimation corresponds to the common case in mediation studies where a confounder may affect the analysis but no measure of it is available. Failing to measure potential post-treatment confounders in a mediation model leads to biased estimates regardless of the analysis method used, which emphasizes the importance of sensitivity analysis for causal mediation analysis. / Dissertation/Thesis / Ph.D. Psychology 2013
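Weight truncation, one of the compared variants, is easy to illustrate outside the mediation setting. The sketch below estimates a simple average treatment effect for a binary treatment using normalized (Hájek-type) inverse propensity weights, capping each weight at a hypothetical `max_weight`. The propensity scores are taken as given. The sequential g-estimation and mediation-specific weighting studied in the dissertation are not shown.

```python
def ipw_effect(y, z, ps, max_weight=None):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)], optionally truncating large weights.

    y: outcomes, z: treatment indicators (1/0), ps: propensity scores P(z = 1 | covariates).
    """
    weights = []
    for z_i, p_i in zip(z, ps):
        w_i = 1 / p_i if z_i == 1 else 1 / (1 - p_i)
        if max_weight is not None:
            w_i = min(w_i, max_weight)  # truncation: cap extreme weights
        weights.append(w_i)
    t_num = sum(w * y_i for w, y_i, z_i in zip(weights, y, z) if z_i == 1)
    t_den = sum(w for w, z_i in zip(weights, z) if z_i == 1)
    c_num = sum(w * y_i for w, y_i, z_i in zip(weights, y, z) if z_i == 0)
    c_den = sum(w for w, z_i in zip(weights, z) if z_i == 0)
    return t_num / t_den - c_num / c_den


# Hypothetical data: one treated unit had a low propensity (0.1), so its weight is 10.
y, z, ps = [3, 1, 2, 0], [1, 1, 0, 0], [0.5, 0.1, 0.5, 0.5]
print(ipw_effect(y, z, ps), ipw_effect(y, z, ps, max_weight=5))
```

Here the untruncated estimate is 1/3 and the truncated one is 4/7. Capping the weight of 10 at 5 reduces that unit's influence, trading some bias for lower variance. This trade-off is what the simulation study evaluates alongside the other methods.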
|
47 |
Inference of gene networks from time series expression data and application to type 1 Diabetes. Lopes, Miguel, 04 September 2015
The inference of gene regulatory networks (GRNs) is of great importance to medical research, as it can unravel the causal mechanisms responsible for phenotypes and identify potential therapeutic targets. In type 1 diabetes, insulin-producing pancreatic beta cells are the target of an autoimmune attack leading to apoptosis (cell suicide). Although key genes and regulations have been identified, a precise characterization of the process leading to beta-cell apoptosis has not yet been achieved. The inference of relevant molecular pathways in type 1 diabetes is therefore a crucial research topic. GRN inference from gene expression data (obtained from microarray and RNA-seq technologies) is a causal inference problem that may be tackled with well-established statistical and machine learning concepts. In particular, the use of time series facilitates identifying the causal direction in cause-effect gene pairs. However, inference from gene expression data is very challenging because of the large number of genes (over twenty thousand in humans) and the typically low number of samples in gene expression datasets. In this context, it is important to correctly assess the accuracy of network inference methods. This thesis makes contributions in three distinct areas. The first concerns inference assessment using precision-recall curves, in particular the area under the curve (AUPRC). The typical approach to assessing AUPRC significance is Monte Carlo simulation, and a parametric alternative is proposed. It consists of deriving the mean and variance of the null AUPRC and then using these parameters to fit a beta distribution approximating the true null distribution. The second contribution is an investigation into network inference from time series. Several state-of-the-art strategies are experimentally assessed and novel heuristics are proposed. One is a fast approximation of first-order Granger causality scores, suited to GRN inference when the number of variables is large.
Another identifies co-regulated genes (i.e., genes regulated by the same genes). Both are experimentally validated using microarray and simulated time series. The third contribution concerns type 1 diabetes: a study of beta-cell gene expression after exposure to cytokines, emulating the mechanisms leading to apoptosis. Eight datasets of beta-cell gene expression were used to identify genes differentially expressed before and after 24 h, which were then functionally characterized using bioinformatics tools. The two most differentially expressed genes, RIPK2 and ELF3, previously unknown in the type 1 diabetes literature, were found to modulate cytokine-induced apoptosis. A regulatory network was then inferred using a dynamic adaptation of a state-of-the-art network inference method. Three out of four predicted regulations (involving RIPK2 and ELF3) were experimentally confirmed, providing a proof of concept for the adopted approach. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished
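The parametric AUPRC assessment described above can be sketched in two steps. First, compute the AUPRC of a ranking; here it is estimated as average precision, one common estimator, though the thesis may use another. Second, match a beta distribution to the null mean and variance by the method of moments. The thesis derives the null moments analytically. As a stand-in assumption, this sketch estimates them from random rankings, which is the Monte Carlo approach the thesis aims to replace, so treat it as an illustration of the beta fit only.

```python
import random
from statistics import mean, pvariance


def average_precision(ranked_labels):
    """AUPRC estimated as average precision; ranked_labels[k] is 1 if the k-th ranked edge is true."""
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / k  # precision at each recall step
    return total / hits


def null_auprc_samples(n_pos, n_neg, draws=2000, seed=0):
    """AUPRC values of uniformly random rankings (the null of an uninformative method)."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    samples = []
    for _ in range(draws):
        rng.shuffle(labels)
        samples.append(average_precision(labels))
    return samples


def fit_beta(m, v):
    """Method-of-moments beta parameters for mean m and variance v (needs 0 < v < m(1 - m))."""
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical network: 10 true edges among 100 candidate edges.
null = null_auprc_samples(n_pos=10, n_neg=90)
alpha, beta = fit_beta(mean(null), pvariance(null))
```

Once the null mean and variance are available in closed form, only `fit_beta` is needed. The tail of the fitted beta then gives a significance level for an observed AUPRC without any resampling.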
|
48 |
Essays in Political Methodology. Blackwell, Matthew, 24 July 2012
This dissertation contributes three novel methodologies to the field of political science. In the first chapter, I describe how to make causal inferences in the face of dynamic strategies. Traditional causal inference methods assume that these dynamic decisions are made all at once, an assumption that forces a choice between omitted variable bias and post-treatment bias. I resolve this dilemma by adapting methods from biostatistics and using them to estimate the effectiveness of an inherently dynamic process: a candidate's decision to "go negative." Drawing on U.S. statewide elections (2000-2006), I find, in contrast to the previous literature, that negative advertising is an effective strategy for non-incumbents. In the second chapter, I develop a method for handling measurement error. Social scientists devote considerable effort to mitigating measurement error during data collection but then ignore the issue during analysis. Although many statistical methods have been proposed for reducing measurement-error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability to multiple mismeasured variables. This chapter develops an easy-to-use alternative without these problems that treats missing data as a special case of extreme measurement error and corrects for both. In the final chapter, I introduce a model for detecting changepoints in the distribution of contributions data that allows for overdispersion, a key feature of contributions data. While many extant changepoint models force researchers to choose the number of changepoints ex ante, the "game-changers" model incorporates a Dirichlet process prior in order to estimate the number of changepoints along with their location. I demonstrate the usefulness of the model on data from the 2012 Republican primary and the 2008 U.S. Senate elections. / Government
|
49 |
Causal inference, predictive modeling and medical decision-making. Nguyên, Tri Long, 20 September 2016
Medical decision-making is defined as the choice of treatment for an illness, made in anticipation of a probable outcome, with the aim of maximizing the benefit to the patient's health. The choice of treatment must therefore rest on scientific evidence of its efficacy, which raises the problem of estimating the treatment effect. In a first part, we present, propose and discuss causal inference methods for estimating this treatment effect using experimental or observational designs. However, the evidence these methods provide describes the treatment effect only at the level of the overall population, not at the level of the individual. Knowing the patient's probable outcome is essential for tailoring a clinical decision. In a second part, we therefore present the predictive modeling approach, which has brought a leap forward in personalized medicine. Predictive models give the clinician prognostic information about the patient at baseline, which the clinician can then use to decide whether to adapt treatment. This approach has its limits, however, since the choice of treatment is still based on evidence established at the overall population level. In a third part, we therefore propose an original method for estimating the individual treatment effect by combining causal inference and predictive modeling. When a treatment is being considered, our approach allows the clinician to know and compare, from the outset, the patient's prognosis without treatment and the patient's prognosis with treatment. This thesis presents a series of eight articles supporting these approaches.
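As a deliberately simple illustration of comparing a patient's prognosis without and with treatment, the sketch below evaluates a fitted logistic outcome model twice for the same patient: once with treatment switched off and once with it switched on. The model form (a treatment-by-covariate interaction), the covariate names and all coefficient values are hypothetical. In practice, the model would need to be estimated so that its treatment terms have a causal interpretation. This is not the specific method proposed in the thesis.

```python
import math


def expit(u):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-u))


def individual_prognosis(patient, model):
    """Predicted risk without and with treatment for one patient, plus their difference."""
    baseline = model["intercept"] + sum(model["main"][k] * v for k, v in patient.items())
    treatment_shift = model["treatment"] + sum(model["interaction"][k] * v for k, v in patient.items())
    risk_untreated = expit(baseline)
    risk_treated = expit(baseline + treatment_shift)
    return risk_untreated, risk_treated, risk_treated - risk_untreated


# Hypothetical coefficients on the log-odds scale.
model = {
    "intercept": -2.0,
    "main": {"age_decades": 0.3, "severity": 0.8},
    "interaction": {"age_decades": 0.05, "severity": -0.2},
    "treatment": -0.5,
}
patient = {"age_decades": 6.5, "severity": 1.0}
print(individual_prognosis(patient, model))
```

The interaction terms are what allow the predicted benefit to differ from patient to patient. Without them, every patient would share the same treatment effect on the log-odds scale.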
|
50 |
Statistical issues in Mendelian randomization: use of genetic instrumental variables for assessing causal associations. Burgess, Stephen, January 2012
Mendelian randomization is an epidemiological method for using genetic variation to estimate the causal effect of the change in a modifiable phenotype on an outcome from observational data. A genetic variant satisfying the assumptions of an instrumental variable for the phenotype of interest can be used to divide a population into subgroups which differ systematically only in the phenotype. This gives a causal estimate which is asymptotically free of bias from confounding and reverse causation. However, the variance of the causal estimate is large compared to traditional regression methods, requiring large amounts of data and necessitating methods for efficient data synthesis. Additionally, if the association between the genetic variant and the phenotype is not strong, then the causal estimates will be biased due to the “weak instrument” in finite samples in the direction of the observational association. This bias may convince a researcher that an observed association is causal. If the causal parameter estimated is an odds ratio, then the parameter of association will differ depending on whether viewed as a population-averaged causal effect or a personal causal effect conditional on covariates. We introduce a Bayesian framework for instrumental variable analysis, which is less susceptible to weak instrument bias than traditional two-stage methods, has correct coverage with weak instruments, and is able to efficiently combine gene–phenotype–outcome data from multiple heterogeneous sources. Methods for imputing missing genetic data are developed, allowing multiple genetic variants to be used without reduction in sample size.
We focus on the question of a binary outcome, illustrating how the collapsing of the odds ratio over heterogeneous strata in the population means that the two-stage and the Bayesian methods estimate a population-averaged marginal causal effect similar to that estimated by a randomized trial, but which typically differs from the conditional effect estimated by standard regression methods. We show how these methods can be adjusted to give an estimate closer to the conditional effect. We apply the methods and techniques discussed to data on the causal effect of C-reactive protein on fibrinogen and coronary heart disease, concluding with an overall estimate of causal association based on the totality of available data from 42 studies.
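The basic instrumental-variable estimate behind Mendelian randomization can be illustrated with summary statistics. The sketch shows two textbook estimators: the ratio (Wald) estimate for one variant, with its first-order standard error, and the inverse-variance-weighted combination across several independent variants. It is not the Bayesian framework proposed in the thesis. Effect sizes are hypothetical, and with a binary outcome the gene–outcome coefficients would typically be log odds ratios.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Causal estimate beta_gy / beta_gx with first-order standard error (ignores se of beta_gx)."""
    return beta_gy / beta_gx, se_gy / abs(beta_gx)


def ivw_estimate(variants):
    """Inverse-variance-weighted combination of ratio estimates from independent variants.

    variants: list of (beta_gx, beta_gy, se_gy) tuples.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in variants)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in variants)
    return num / den, math.sqrt(1 / den)


variants = [(0.5, 0.1, 0.02), (0.25, 0.05, 0.01)]  # hypothetical summary statistics
print(wald_ratio(*variants[0]))
print(ivw_estimate(variants))
```

Because the ratio divides by `beta_gx`, a weak gene–phenotype association makes the estimate unstable. This is the weak-instrument problem the abstract describes, and neither estimator above corrects for it.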
|