91 |
Using Machine Learning and Daytime Satellite Imagery to Estimate Aid's Effect on Wealth: Comparing China and World Bank Programs in Africa
Conlin, Cindy (January 2024)
A large literature has not reached consensus on foreign aid’s economic effects. Using geolocated aid data and daytime satellite images of nearly 10,000 African neighborhoods, I examine the economic growth impact of World Bank and Chinese aid to 36 African countries from 2002 to 2013, covering 88% of the continent’s population, by sector (e.g. Health, Education, Water Supply and Sanitation). I estimate the average treatment effect for each funder and aid sector with an inverse probability weighting approach and adjust for two types of confounders: those I provide in a tabular format and proxies based on satellite images of each neighborhood. The use of image-based confounders may reduce bias due to omitted variables and measurement errors when unobserved or mis-measured variables are visible remotely. To measure economic outcomes, I use a new wealth index generated by a machine learning algorithm trained to associate USAID-funded DHS survey wealth measures with daytime and nighttime satellite imagery from the same years and locations. The availability of the wealth estimate for 3-year periods over thirty years enabled the analysis to use panel data and fixed effects at the second administrative division (e.g. county, district, city) level. The results are heterogeneous across sectors but generally show small positive effects of World Bank aid and larger positive effects of Chinese aid. Substantive results are generally robust to the choice of computer vision image model, except for three funder-sectors where wide confidence intervals make the estimate statistically insignificant under one model but not the other.
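As a rough illustration of the inverse probability weighting idea mentioned in this abstract, the following is a minimal sketch only. It assumes the propensity scores (each unit's probability of receiving aid given its confounders) have already been estimated, and it uses the normalized (Hájek) form of the estimator. That choice, the function name `ipw_ate`, and the toy data are assumptions for illustration and are not taken from the thesis.

```python
def ipw_ate(outcomes, treated, propensity):
    """Normalized (Hajek) inverse-probability-weighted estimate of the ATE.

    outcomes   -- observed outcome per unit
    treated    -- 1 if the unit was treated, 0 otherwise
    propensity -- assumed-known probability of treatment per unit, in (0, 1)
    """
    # Treated units are weighted by 1/p, control units by 1/(1 - p).
    w1 = [t / p for t, p in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    mean_treated = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mean_control = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mean_treated - mean_control


# Toy data: two strata with different baselines and treatment probabilities,
# and a true effect of exactly 1 for every unit.
outcomes = [11, 11, 11, 11, 10, 1, 0, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
propensity = [0.8] * 5 + [0.2] * 5

treated_mean = sum(y for y, t in zip(outcomes, treated) if t) / sum(treated)
control_mean = sum(y for y, t in zip(outcomes, treated) if not t) / (len(treated) - sum(treated))
naive_difference = treated_mean - control_mean
weighted_estimate = ipw_ate(outcomes, treated, propensity)
```

In this toy setup the unweighted difference in means is pulled far from the true effect because the high-baseline stratum is treated more often, while the weighted estimate recovers it.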
|
92 |
Family Behavior and Children’s Wellbeing: Statistical Modeling and Measurement Issues
Rodríguez Sánchez, Alejandra (30 June 2023)
In this dissertation, I consider various statistical modeling and measurement issues that complicate causal interpretation of the associations reported in the literature on family sociology and social inequality. First, life-course-informed research suggests that the problem of selection bias in the father absence literature may be more complex than currently thought. After adjusting for dynamic biases, estimates of father absence's effect on children's wellbeing are reduced. Second, family instability experienced during childhood is said to negatively affect children's wellbeing. However, time-dependent confounders that are affected by past episodes of family instability and that affect future family stability might explain away part of the negative impact. I show that a dynamic version of the selection hypothesis counters the family instability hypothesis, and that the effects of cumulative family instability are small and not consistent with the family instability hypothesis. Third, research suggests that socioeconomic status (SES) gaps in language skills among preschoolers could be substantially reduced by intervening on the parenting styles, practices, and parental investments of low-resource parents. Employing interventional causal mediation analysis, however, I show that parenting mediates only around one third of the total effect of SES on early language skills. Fourth, the measurement of cognitive abilities is complicated by various features of standardized assessments. Those problems have important implications for the quantification of social inequality in unobservable variables and for causal inference research, because test scores capture non-random noise. The dissertation concludes with a plea for furthering causal inference thinking in family sociology, social inequality, social mobility, and family demography research.
|
93 |
Essays on using machine learning for causal inference
Jacob, Daniel (01 March 2022)
To use data effectively, modern econometricians need to expand and rethink their toolbox. One field where such a transformation has already started is causal inference. This thesis aims to explore further issues, provide solutions, and develop new methods for using machine learning to estimate causal parameters. I categorize novel methods for estimating heterogeneous treatment effects and provide a practitioner’s guide for implementation. The parameter of interest is the conditional average treatment effect (CATE). It can be shown that an ensemble of methods is preferable to relying on a single method. With respect to the CATE, a special focus is placed on comparing such methods and on the role of sample splitting and cross-fitting in restoring efficiency. Large differences in the accuracy of the estimated parameters can occur if sampling uncertainty is not correctly accounted for. The CATE can also be represented more coarsely through quantiles: estimating the CATE for groups yields estimates that are more robust to sampling uncertainty and the resulting high variance.
This thesis not only develops and explores methods to estimate treatment effect heterogeneity but also methods to identify confounding variables and the observations that should receive treatment. For these two tasks, this thesis proposes the outcome-adaptive random forest for automatic variable selection, as well as supervised randomization for a cost-efficient selection of the target group. Insights into important variables and those that are not true confounders are very helpful for policy evaluation and in the medical sector when randomized controlled trials are not possible.
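The coarser, quantile-based representation of the CATE described in this abstract can be sketched roughly as follows. This is an illustrative assumption about what "grouping" means here: unit-level CATE estimates (produced by some estimator that is not shown) are sorted, cut into equally sized quantile groups, and averaged within each group. The function name `grouped_cate` and the sample values are hypothetical, and the sketch leaves out the sample splitting and cross-fitting that the thesis discusses.

```python
def grouped_cate(cate_estimates, n_groups=4):
    """Average unit-level CATE estimates within quantile groups.

    Returns one mean per group, ordered from the lowest to the highest
    estimated effect.
    """
    n = len(cate_estimates)
    if n_groups < 1 or n < n_groups:
        raise ValueError("need at least one estimate per group")
    ordered = sorted(cate_estimates)
    means = []
    for g in range(n_groups):
        # Integer arithmetic keeps group sizes within one unit of each other.
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        chunk = ordered[lo:hi]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical unit-level estimates, summarized as quartile-group averages.
estimates = [0.1, 0.9, 0.4, 0.6, 0.2, 0.8, 0.3, 0.7]
quartile_means = grouped_cate(estimates, n_groups=4)
```

Averaging within groups trades resolution for stability: each reported number pools several noisy unit-level estimates.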
|
94 |
Comparaison d'estimateurs de la variance du TMLE
Boulanger, Laurence (09 1900)
No description available.
|
95 |
Strategies for assessing health risks from two occupational cohorts within the domain of northern Sweden / Strategier vid utvärdering av hälsorisker baserade på två arbetarekohorter från norra Sverige
Björ, Ove (January 2013)
Background: Studies based on a cohort design require access to both subject-specific and period-specific information. Conducting an occupational cohort study generally requires access to exposure information, together with the possibility of, and permission for, linking outcome information from other registers. The longitudinal dimension of cohort data also makes the analysis phase more complex. This thesis aims to increase knowledge of work-related hazards for mortality and cancer within cohort studies of miners and metal refiners, and to examine the complexity of the analysis by discussing and suggesting analytical strategies.
Methods: The study population for this thesis consisted of a cohort of 2264 blue-collar aluminium smelter workers (paper I) and a cohort of 13000 blue-collar iron-ore miners (papers II-IV), both followed for over 50 years. The outcomes were collected from the Swedish Cause of Death Register and the Swedish Cancer Register. The primary methods of analysis were either Standardized Morbidity Ratios (SMR) or internal comparisons based on Cox or Poisson regression modeling. In paper IV, g-estimation based on an accelerated failure-time model was performed to estimate the survival ratio.
Results: The results from paper I suggested that working as a blue-collar metal refiner was associated with increased rates of incident lung cancer. Elevated rates among short-term workers were observed for several outcomes. Paper I also showed that the choice of reference population when calculating the SMR could influence the conclusions. In paper II, several outcomes were elevated among the miners compared to the reference population from northern Sweden. However, no outcome except lung cancer was associated with cumulative employment time. The most recurrent pattern in the results was the negative association between cumulative employment time underground and several outcomes. The results from paper III showed that cumulative employment time working outdoors was associated with increased rates of cerebrovascular disease mortality. However, employment with heavy physical workloads did not explain the previously observed decreasing rates in the selected groups of outcomes. The adjustment for the healthy worker survivor effect by g-estimation in paper IV suggested that exposure to respirable dust was associated with elevated mortality risks that could not be observed with standard analytical methods.
Conclusion: Our studies found that several rates in the cohorts were elevated compared to external reference populations, but also that long-term employment was generally associated with decreasing rates. Furthermore, incident lung cancer rates were elevated among the metal refiners. Among the miners, mortality rates from cerebrovascular diseases depended on whether work was performed outdoors (higher rates) or underground (lower rates). Methodologically, this thesis has discussed different analytical strategies for handling confounding in occupational cohort studies. Paper IV showed that the healthy worker survivor effect could be adjusted for by performing g-estimation.
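The point that the choice of reference population can change the conclusions drawn from an SMR can be illustrated with a small sketch. It assumes the usual indirect-standardization form, observed events divided by the events expected if the cohort had experienced the reference population's stratum-specific rates. The function name and all numbers below are made up for illustration and are not from the thesis.

```python
def standardized_ratio(observed_events, person_years, reference_rates):
    """Observed events divided by the events expected under reference rates.

    person_years    -- cohort person-years per stratum (e.g. age band)
    reference_rates -- events per person-year in the reference population,
                       listed for the same strata in the same order
    """
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed_events / expected


# Hypothetical cohort: 30 observed events over two age strata.
person_years = [1000, 2000]
national_rates = [0.005, 0.0075]   # made-up national reference rates
regional_rates = [0.0075, 0.01125]  # made-up regional reference rates

smr_national = standardized_ratio(30, person_years, national_rates)
smr_regional = standardized_ratio(30, person_years, regional_rates)
```

With these invented figures the same cohort looks clearly elevated against the national reference but unremarkable against the regional one, which is the kind of sensitivity the abstract reports.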
|
96 |
Investigating the relationship between markers of ageing and cardiometabolic disease
Wright, Daniel John (January 2018)
Human ageing is accompanied by characteristic metabolic and endocrine changes, including altered hormone profiles, insulin resistance and deterioration of skeletal muscle. Obesity and diabetes may themselves drive an accelerated ageing phenotype. Untangling the causal web between ageing, obesity and diabetes is a priority in order to understand their aetiology and improve prevention and management. The role of biological ageing in determining the risk of obesity and associated conditions has often been examined using mean leukocyte telomere length (LTL), a marker of replicative fatigue and senescence. However, considering phenotypes which represent different domains of biological and functional ageing as exposures for obesity and related traits could allow the elucidation of new understudied phenotypes relevant to cardio-metabolic risk in the wider population. This PhD considers the causal role of (1) hand grip strength (HGS), a marker of overall strength and physical functioning, and (2) resting energy expenditure, an indicator of overall energy metabolism and the major component of daily energy expenditure, in cardio-metabolic risk. I also characterise a new and readily-quantifiable marker of age-related genomic instability, mosaic loss of the Y chromosome (mLOY). Observational evidence implicates each of these phenotypes in cardio-metabolic conditions and intermediate phenotypes. However, it is not possible to infer causality from these observational associations due to confounding and reverse-causality. Mendelian randomisation offers a solution to these limitations and can allow the causal nature of these relationships to be investigated. Using population-based data including UK Biobank, this thesis presents the first large-scale genetic discovery effort for each trait and provides new biological insight into their shared and separate aetiology. 
I used the identified variants to investigate the bidirectional causal associations of each trait with cardio-metabolic outcomes, intermediate phenotypes and other related traits such as frailty and mortality. In total I identified 16 loci for hand grip strength, 19 for mLOY, and one signal for resting energy expenditure (REE). I have shown that HGS is likely to be causally linked to fracture risk, and I have identified the important shared genetic architecture between mLOY, glycaemic traits and cancer. I have also demonstrated that at least one known genetic variant contributing to obesity risk acts partially via reduced REE. Overall the findings of my PhD contribute to our wider understanding of the aetiological role of ageing processes in metabolic dysfunction, and have implications for both basic science and translational applications.
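To make the Mendelian randomisation logic concrete, here is a minimal sketch of two standard summary-statistic estimators: the per-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) combination. The abstract does not say which estimators the thesis used, so these are an assumption chosen for illustration, and the numbers are invented.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect implied by one variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted estimate across variants.

    Equivalent to a weighted regression through the origin of the
    variant-outcome effects on the variant-exposure effects, with weights
    1 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator


# Invented summary statistics for two variants that imply the same effect.
beta_x = [0.1, 0.2]    # variant effects on the exposure
beta_y = [0.05, 0.1]   # variant effects on the outcome
se_y = [0.01, 0.02]    # standard errors of the outcome effects

single_variant = wald_ratio(beta_x[0], beta_y[0])
combined = ivw_estimate(beta_x, beta_y, se_y)
```

Both estimators rest on the usual instrumental-variable assumptions (the variants affect the outcome only through the exposure and are not confounded), which the code cannot check.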
|
97 |
Causal inference and prior integration in bioinformatics using information theory
Olsen, Catharina (17 October 2013)
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA-sequencing faces several difficulties. The first major issue is the high variable-to-sample ratio, which is due to a number of factors: a single experiment captures all genes, while the number of experiments is restricted by the experiment’s cost, time and patient cohort size. The second problem is that these data sets typically exhibit high amounts of noise.

Another important problem in bioinformatics is the question of how the inferred networks’ quality can be evaluated. The current best practice is a two-step procedure. In the first step, the highest-scoring interactions are compared to known interactions stored in biological databases. The inferred network passes this quality assessment if there is a large overlap with the known interactions. In this case, a second step is carried out in which unknown but high-scoring, and thus promising, new interactions are validated ’by hand’ via laboratory experiments. Unfortunately, when prior knowledge is integrated in the inference procedure, this validation procedure would be biased by using the same information in both the inference and the validation. Therefore, it would no longer allow an independent validation of the resulting network.

The main contribution of this thesis is a complete computational framework that uses experimental knock-down data in a cross-validation scheme to both infer and validate directed networks. Its components are i) a method that integrates genomic data and prior knowledge to infer directed networks, ii) its implementation in an R/Bioconductor package and iii) a web application to retrieve prior knowledge from PubMed abstracts and biological databases.
To infer directed networks from genomic data and prior knowledge, we propose a two-step procedure: First, we adapt the pairwise feature selection strategy mRMR to integrate prior knowledge in order to obtain the network’s skeleton. Then, for the subsequent orientation phase of the algorithm, we extend a criterion based on interaction information to include prior knowledge. The implementation of this method is available both as part of the prior retrieval tool Predictive Networks and as a stand-alone R/Bioconductor package named predictionet.

Furthermore, we propose a fully data-driven quantitative validation of such directed networks using experimental knock-down data: We start by identifying the set of genes that was truly affected by the perturbation experiment. The rationale of our validation procedure is that these truly affected genes should also be among the perturbed gene’s children in the inferred network. Consequently, we can compute a performance score / Doctorat en Sciences
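For readers unfamiliar with mRMR (minimum redundancy, maximum relevance), the greedy selection it performs can be sketched as below. This shows plain mRMR only: it takes precomputed relevance and redundancy scores (for instance mutual-information estimates) as inputs and does not include the prior-knowledge integration that the thesis adds. The function name and the scores are hypothetical, and this is not the predictionet implementation.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR: pick k features that are relevant but mutually non-redundant.

    relevance[i]     -- score between feature i and the target
    redundancy[i][j] -- score between feature i and feature j
    Returns the indices of the selected features in selection order.
    """
    selected = []
    candidates = set(range(len(relevance)))

    def score(i):
        # Relevance minus the mean redundancy with already selected features.
        if not selected:
            return relevance[i]
        penalty = sum(redundancy[i][j] for j in selected) / len(selected)
        return relevance[i] - penalty

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: features 0 and 1 are both relevant but nearly duplicate
# each other; feature 2 is weaker but carries independent information.
relevance = [0.9, 0.85, 0.3]
redundancy = [
    [1.0, 0.8, 0.05],
    [0.8, 1.0, 0.05],
    [0.05, 0.05, 1.0],
]
top_two = mrmr_select(relevance, redundancy, k=2)
```

In a network setting each gene would take a turn as the target, and the features selected for it would become its candidate neighbours in the skeleton.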
|
98 |
Attenuation, Stasis, or Amplification: Change in the Causal Effect of Coercive Policies
Smith, Gregory Lyman (January 2020)
No description available.
|
99 |
Detekce kauzality v časových řadách pomocí extrémních hodnot / Detection of causality in time series using extreme values
Bodík, Juraj (January 2021)
This thesis addresses the following problem: given two stationary time series with heavy-tailed marginal distributions, we want to detect whether they are causally related, i.e. whether a change in one of them causes a change in the other. The question of distinguishing between causality and correlation is essential in many different fields of science. The usual methods for causality detection are not well suited to cases where the causal mechanisms only manifest themselves in extremes. In this thesis, we propose a new method that can help distinguish between correlation and causality in such a nontraditional case. We define the so-called causal tail coefficient for time series, which, under some assumptions, correctly detects the asymmetrical causal relations between different time series. We rigorously prove this claim, and we also propose a method for statistically estimating the causal tail coefficient from a finite number of data points. The advantage is that this method works even if nonlinear relations and common ancestors are present. Moreover, we mention how our method can help detect a time delay between the two time series. We show how our method performs in simulations. Finally, we show on a real dataset how this method works, discussing a cause of...
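The abstract does not state the definition of the causal tail coefficient, so the sketch below rests on an assumption about its empirical form: among the k time points where X takes its largest values, average the largest empirical-CDF value of Y over the following `lag` steps. A value near 1 in the direction X to Y, together with a clearly smaller value in the direction Y to X, would then point to X driving Y in the extremes. All names and the toy series are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf_values(series):
    """Empirical CDF evaluated at each observation: share of values <= it."""
    ordered = sorted(series)
    n = len(series)
    return [bisect_right(ordered, value) / n for value in series]


def causal_tail_coefficient(x, y, lag, k):
    """Assumed empirical causal tail coefficient for the direction x -> y.

    Averages, over the k largest values of x, the maximum empirical-CDF
    value of y within the window of `lag` steps that follows.
    """
    fy = empirical_cdf_values(y)
    usable = range(len(x) - lag)  # time points with a full look-ahead window
    extremes = sorted(usable, key=lambda i: x[i], reverse=True)[:k]
    return sum(max(fy[i:i + lag + 1]) for i in extremes) / len(extremes)


# Toy series in which every spike in x is followed one step later by a spike in y.
x = [0, 10, 0, 0, 9, 0, 0, 0]
y = [0, 0, 8, 0, 0, 7, 0, 0]

x_to_y = causal_tail_coefficient(x, y, lag=1, k=2)
y_to_x = causal_tail_coefficient(y, x, lag=1, k=2)
```

On this toy data the coefficient is larger in the direction x to y than in the reverse direction, matching the asymmetry the abstract describes; real use would need long series and a data-driven choice of k.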
|
100 |
Causal Inference for Observational Survival Data using Restricted Mean Survival Time Model
Lin, Zihan (09 December 2022)
No description available.
|