Global ETD Search

1	Finding common support and assessing matching methods for causal inference Mahmood, Sharif January 1900 (has links) Doctor of Philosophy / Department of Statistics / Michael J. Higgins / This dissertation presents an approach to assess and validate causal inference tools used to estimate the causal effect of a treatment. Finding treatment effects in observational studies is complicated by the need to control for confounders. Common approaches to controlling for confounders include using prognostically important covariates to form groups of similar units containing both treatment and control units, or modeling responses through interpolation. This dissertation proposes a series of new, computationally efficient methods to improve the analysis of observational studies. Treatment effects are only reliably estimated for a subpopulation in which a common support assumption holds, that is, one in which the treatment and control covariate spaces overlap. Given a distance metric measuring dissimilarity between units, graph theory is used to find common support. An adjacency graph is constructed in which edges are drawn between similar treated and control units, and regions of common support are determined by finding the largest connected components (LCC) of this graph. The results show that LCC improves on existing methods by efficiently constructing regions that preserve clustering in the data while ensuring interpretability of the region through the distance metric. This approach is extended to propose a new matching method called largest caliper matching (LCM). LCM is a version of cardinality matching, a type of matching used to maximize the number of units in an observational study under a covariate balance constraint between treatment groups. While traditional cardinality matching is NP-hard, LCM can be completed in polynomial time. The performance of LCM is compared with that of five other popular matching methods through a series of Monte Carlo simulations. 
Performance in the simulations is measured by the bias, empirical standard deviation, and mean squared error of the estimates under different treatment prevalences and different covariate distributions. The matched samples improve estimation of the population treatment effect in a wide range of settings, and the results suggest cases in which certain matching algorithms perform better than others. Finally, this dissertation presents an application of LCC and matching methods to a study of the effectiveness of right heart catheterization (RHC) and finds that clinical outcomes are significantly worse for patients who undergo RHC. Causal inference
2	Methodological problems in causal inference, with reference to transitional justice Lee, Byung-Jae 22 September 2014 (has links) This dissertation addresses methodological problems in causal inference in the presence of time-varying confounding, and provides methodological tools to handle these problems within the potential outcomes framework of causal inference. Time-varying confounding is common in longitudinal observational studies, in which covariates and treatments interact and change over time in response to intermediate outcomes and changing circumstances. Existing approaches in causal inference mostly focus on static, single-shot decision-making settings, and have limitations in estimating the effects of long-term treatments on chronic problems. In this dissertation, I attempt to conceptualize causal inference in this situation as a sequential decision problem, using the conceptual tools developed in decision theory, dynamic treatment regimes, and machine learning. I also provide methodological tools useful for this situation, especially when the treatments are multi-level and changing over time, using inverse probability weights and g-estimation. Substantively, this dissertation examines transitional justice's effects on human rights and democracy in emerging democracies. Using transitional justice as an example to illustrate the proposed methods, I conceptualize the adoption of transitional justice by a new government as a sequential decision-making process, and empirically examine the comparative effectiveness of transitional justice measures, independently or in combination with others, on human rights and democracy. Causal inference Transitional justice
3	Estimating Individual Causal Effects Lam, Patrick Kenneth 18 October 2013 (has links) Most empirical work focuses on the estimation of average treatment effects (ATE). In this dissertation, I argue for a different way of thinking about causal inference by estimating individual causal effects (ICEs). I argue that focusing on estimating ICEs allows for a more precise and clear understanding of causal inference, reconciles the difference between what the researcher is interested in and what the researcher estimates, allows the researcher to explore and discover treatment effect heterogeneity, bridges the quantitative-qualitative divide, and allows for easy estimation of any other causal estimand. / Government Political Science Statistics causal inference
4	Propensity Score for Causal Inference of Multiple and Multivalued Treatments Gu, Zirui 01 January 2016 (has links) Propensity score methods (PSM), which have been widely used to reduce selection bias in observational studies, are restricted to a binary treatment. Imai and van Dyk extended PSM to estimate non-binary treatment effects using stratification on the P-Function and generalized inverse probability of treatment weighting (GIPTW). However, propensity score (PS) matching methods for multiple treatments have received little attention, and existing generalized PSMs have focused on estimates of main treatment effects while omitting potential interaction effects that are of essential interest in many studies. In this dissertation, I extend Rubin’s PS matching theory to general treatment regimens under the P-Function framework. From theory to practice, I propose an innovative distance measure that can summarize similarities among subjects in multiple treatment groups. Based on this distance measure, I propose four generalized propensity score matching methodologies. The first two methods are extensions of nearest neighbor matching. I implement Monte Carlo simulation studies to compare them with GIPTW and stratification on P-Function methods. The next two methods are extensions of nearest neighbor caliper width matching and variable matching. I define the caliper width as the product of a weighted standard deviation of all possible pairwise distances between two treatment groups. I conduct a series of simulation studies to determine an optimal caliper width by searching for the lowest mean squared error of the average causal interaction effect. I further compare the methods that use the optimal caliper width with other methods using simulations. Finally, I apply these methods to the National Medical Expenditure Survey data to examine the average causal main effects of duration and frequency of smoking, as well as their interaction effect, on annual medical expenditures. 
Using the proposed methods, researchers can apply regression models with specified interaction terms to the matched data and simultaneously obtain estimates of both main and interaction effects with improved statistical properties. Propensity Score Causal Inference Multiple and Multivalued Treatments
5	Applications of machine learning to agricultural land values: prediction and causal inference Er, Emrah January 1900 (has links) Doctor of Philosophy / Department of Agricultural Economics / Nathan P. Hendricks / This dissertation focuses on the prediction of agricultural land values and the effects of water rights on land values using machine learning algorithms and hedonic pricing methods. I predict agricultural land values with different machine learning algorithms, including ridge regression, least absolute shrinkage and selection operator (LASSO), random forests, and extreme gradient boosting methods. To analyze the causal effects of water right seniority on agricultural land values, I use the double-selection LASSO technique. The second chapter presents the data used in the dissertation. A unique set of parcel sales from the Property Valuation Division of Kansas constitutes the backbone of the data used in the estimation. Along with parcel sales data, I collected detailed basis, water, tax, soil, weather, and urban influence data. This chapter provides a detailed explanation of the various data sources and variable construction processes. The third chapter presents different machine learning models for irrigated agricultural land price predictions in Kansas. Researchers and policymakers use different models and data sets for price prediction. Recently developed machine learning methods have the power to improve the predictive ability of the models estimated. In this chapter I estimate several machine learning models for predicting agricultural land values in Kansas. Results indicate that the predictive power of the machine learning methods is stronger than that of standard econometric methods. The median absolute error in the extreme gradient boosting estimation is 0.1312, whereas it is 0.6528 in a simple OLS model. The fourth chapter examines whether water right seniority is capitalized into irrigated agricultural land values in Kansas. 
Using a unique data set of irrigated agricultural land sales, I analyze the causal effect of water right seniority on agricultural land values. A possible concern in the estimation of hedonic models is omitted variable bias, so I use double-selection LASSO regression and its variable selection properties to overcome it. I also estimate generalized additive models to analyze the nonlinearities that may exist. Results show that water rights have a positive impact on irrigated land prices in Kansas. An additional year of water right seniority causes irrigated land value to increase by nearly $17 per acre. Further analysis also suggests a nonlinear relationship between seniority and agricultural land prices. Land Values Machine Learning Prediction Causal Inference
6	Essays on Causal Inference for Public Policy Zajonc, Tristan 07 August 2012 (has links) Effective policymaking requires understanding the causal effects of competing proposals. Relevant causal quantities include proposals' expected effect on different groups of recipients, the impact of policies over time, the potential trade-offs between competing objectives, and, ultimately, the optimal policy. This dissertation studies causal inference for public policy, with an emphasis on applications in economic development and education. The first chapter introduces Bayesian methods for time-varying treatments that commonly arise in economics, health, and education. I present methods that account for dynamic selection on intermediate outcomes and can estimate the causal effect of arbitrary dynamic treatment regimes, recover the optimal regime, and characterize the set of feasible outcomes under different regimes. I demonstrate these methods through an application to optimal student tracking in ninth and tenth grade mathematics. The proposed estimands characterize outcomes, mobility, equity, and efficiency under different tracking regimes. The second chapter studies regression discontinuity designs with multiple forcing variables. Leading examples include education policies where treatment depends on multiple test scores and spatial treatment discontinuities arising from geographic borders. I give local linear estimators for both the conditional effect along the boundary and the average effect over the boundary. For two-dimensional RD designs, I derive an optimal, data-dependent bandwidth selection rule for the conditional effect. I demonstrate these methods using a summer school and grade retention example. The third chapter illustrates the central role of persistence in estimating and interpreting value-added models of learning. 
Using data from Pakistani public and private schools, I apply dynamic panel methods that address three key empirical challenges: imperfect persistence, unobserved student heterogeneity, and measurement error. After correcting for these difficulties, the estimates suggest that only a fifth to a half of learning persists between grades and that private schools increase average achievement by 0.25 standard deviations each year. In contrast, value-added models that assume perfect persistence yield severely downwardly biased and occasionally wrong-signed estimates of the private school effect. causal inference economics program evaluation statistics education
7	The effect of sugar-sweetened beverage consumption on childhood obesity - causal evidence Yang, Yan 18 May 2016 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Communities and states are increasingly targeting the consumption of sugar-sweetened beverages (SSBs), especially soda, in their efforts to curb childhood obesity. However, the empirical evidence on which policy makers base the relevant policies is not causally interpretable. In the present study, we suggest a modeling framework that can be used for causal estimation and inference in the context of childhood obesity. This modeling framework is built upon the two-stage residual inclusion (2SRI) instrumental variables method and has two levels: level one models children’s lifestyle choices, and level two models children’s energy balance, which is assumed to depend on their lifestyle behaviors. We start with a simplified version of the model that includes only one policy, one lifestyle, one energy balance, and one observable control variable. We then extend this simple version to a general one that accommodates multiple policy and lifestyle variables. The two versions of the model are 1) first estimated via the nonlinear least squares (NLS) method (henceforth NLS-based 2SRI); and 2) then estimated via the maximum likelihood estimation (MLE) method (henceforth MLE-based 2SRI). Using simulated data, we show that 1) our proposed 2SRI method outperforms the conventional methods that ignore either the inherent nonlinearity [the linear instrumental variables (LIV) method] or the potential endogeneity [the nonlinear regression (NR) method] in obtaining the relevant estimators; and 2) the MLE-based 2SRI provides more efficient (and also consistent) estimators than the NLS-based one. A real data analysis is conducted to illustrate the implementation of the 2SRI method in practice using both the NLS and MLE methods. 
However, due to data limitations, we are not able to draw any inference regarding the impacts of lifestyle, specifically SSB consumption, on childhood obesity. We are in the process of getting better data and, after doing so, we will replicate and extend the analyses conducted here. These analyses, we believe, will produce causally interpretable evidence of the effects of SSB consumption and other lifestyle choices on childhood obesity. The empirical analyses presented in this dissertation should, therefore, be viewed as an illustration of our newly proposed framework for causal estimation and inference. Causal inference Childhood obesity Sugar-sweetened beverage
8	Empirical studies of online markets: the impact of product page cues on consumer decisions Banerjee, Shrabastee 14 May 2021 (has links) The widespread expansion of online markets in the past decade poses several questions for platforms, firms and customers alike. An important dimension to be explored in this domain is the provision of information on e-commerce platforms: given the increasing ease with which product pages can be customized to include a vast variety of content, how do these pieces of information interact? Further, what are the specific channels through which this information eventually influences consumer decision-making? My dissertation is situated in this space, and aims to look at how consumers respond to various “cues” that are being introduced by e-commerce platforms which offer products or services that can be purchased online, and how these cues might eventually influence decision-making. In my first dissertation project, the cue I focus on is user-generated content. More specifically, I study how the introduction of the Q&A technology (which enables customers to ask product-specific questions before purchase, and receive answers either from other customers or the platform itself) affects the more widely established reviews and ratings feature on e-commerce platforms. I find that the addition of Q&As leads to better matches between customers and products, higher customer satisfaction, and consequently higher ratings. My second project examines another cue that is common in online markets, which is the advertised reference price. My goal in this project is to examine how users react to a specific variant of such prices, namely the “Starting from...” price, using data from a large-scale field experiment conducted on Holidu.com. My results indicate that raising “From” prices gives users a more accurate price estimate, but it negatively impacts outbound clicks and other engagement metrics. 
Taken together, the two projects aim to shed light on factors that influence consumer decision-making in an e-commerce setting, and the possible mechanisms underlying this influence. Marketing Causal inference Field experiment Online markets
9	Precision improvement for Mendelian Randomization Zhu, Yineng 23 January 2023 (has links) Mendelian Randomization (MR) methods use genetic variants as instrumental variables (IVs) to infer causal relationships between an exposure and an outcome, which overcomes the inability to infer such a relationship in observational studies due to unobserved confounders. There are several MR methods, including the inverse variance weighted (IVW) method, which has been extended to deal with correlated IVs, and the median method, which provides consistent causal estimates in the presence of pleiotropy when less than half of the genetic variants are invalid IVs but assumes independent IVs. In this dissertation, we propose two new methods to improve precision for MR analysis. In the first chapter, we extend the median method to correlated IVs with the quasi-boots median method, which accounts for IV correlation in the standard error estimation using a quasi-bootstrap method. Simulation studies show that this method outperforms existing median methods under the correlated IVs setting with and without the presence of pleiotropic effects. In the second chapter, to overcome the lack of an effective solution to account for sample overlap in current IVW methods, we propose a new overall causal effect estimator by exploring the distribution of the estimator for individual IVs under the independent IVs setting, which we name the IVW-GH method. In the final chapter, we extend the IVW-GH method to correlated IVs. In simulation studies, the IVW-GH method outperforms the existing IVW methods under the one-sample setting for independent IVs and shows reasonable results for other settings. We apply these proposed methods to genome-wide association results from the Framingham Heart Study Offspring Study and the Million Veteran Program to identify potential causal relationships between a number of proteins and lipids. 
All the proposed methods are able to identify some proteins known to be related to lipids. In addition, the quasi-boots median method is robust to pleiotropic effects in the real data application. Consequently, the newly proposed quasi-boots median method and IVW-GH method may provide additional insights for identifying causal relationships. / 2025-01-23T00:00:00Z Biostatistics Causal inference Mendelian Randomization Pleiotropy
10	A conditional view of causality Weinert, Friedel January 2007 (has links) No / Causal inference is perhaps the most important form of reasoning in the sciences. A panoply of disciplines, ranging from epidemiology to biology, from econometrics to physics, make use of probability and statistics to infer causal relationships. The social and health sciences analyse population-level data using statistical methods to infer average causal relations. In diagnosis of disease, probabilistic statements are based on population-level causal knowledge combined with knowledge of a particular person's symptoms. For the physical sciences, the Salmon-Dowe account develops an analysis of causation based on the notion of process and interaction. In artificial intelligence, the development of graphical methods has lent impetus to a probabilistic analysis of causality. The biological sciences use probabilistic methods to look for evolutionary causes of the state of a current species and to look for genetic causal factors. This variegated situation raises at least two fundamental philosophical issues: about the relation between causality and probability, and about the interpretation of probability in causal analysis. In this book we bring philosophers and scientists together to discuss the relation between causality and probability, and the applications of these concepts within the sciences. Causality ; Probability ; Causal inference ; Causal relationships
