Global ETD Search

1	Essays on using machine learning for causal inference Jacob, Daniel 01 March 2022 To use data effectively, modern econometricians need to expand and rethink their toolbox. One field where this transformation has already started is causal inference. This thesis explores open issues, provides solutions, and develops new methods for using machine learning to estimate causal parameters. I categorize novel methods for estimating heterogeneous treatment effects and provide a practitioner’s guide for their implementation. The parameter of interest is the conditional average treatment effect (CATE). It can be shown that an ensemble of methods is preferable to relying on a single method. With respect to the CATE, a special focus is placed on comparing such methods and on the role of sample splitting and cross-fitting in restoring efficiency. Large differences in the accuracy of the estimated parameters can occur if sampling uncertainty is not correctly accounted for. The CATE can also be represented more coarsely through quantiles: estimating groups of the CATE leads to estimates that are more robust to sampling uncertainty and the resulting high variance. This thesis not only develops and explores methods to estimate treatment effect heterogeneity but also methods to identify confounding variables and the observations that should receive treatment. For these two tasks, it proposes the outcome-adaptive random forest for automatic variable selection and supervised randomization for a cost-efficient selection of the target group. Insights into which variables are important and which are not true confounders are especially valuable for policy evaluation and in the medical sector when randomized controlled trials are not possible.
Causal Inference Machine Learning Variable Selection Heterogeneity 122 Causality QP 225 ddc:122
2	Calculating control variables with age at onset data to adjust for conditions prior to exposure Höfler, Michael, Brueck, Tanja, Lieb, Roselind, Wittchen, Hans-Ulrich 20 February 2013 Background: When assessing the association between a factor X and a subsequent outcome Y in observational studies, the question arises of which variables to adjust for in order to reduce confounding bias in causal inference on the effect of X on Y. Disregarding such factors is often a source of overestimation because these variables may affect both X and Y. On the other hand, adjusting for such variables can also be a source of underestimation because they may be a causal consequence of X and part of the mechanism that leads from X to Y. Methods: In this paper, we present a simple method to compute control variables when age at onset data are available for both X and a set of other variables. Using these age at onset data, control variables are computed that adjust only for conditions that occur prior to X. This strategy can be used in prospective as well as in survival analysis. Our method is motivated by an argument based on the counterfactual model of a causal effect. Results: The procedure is exemplified by examining the relation between panic attack and the subsequent incidence of MDD. Conclusions: The results reveal that adjusting for all other variables, irrespective of their temporal relation to X, can yield a false negative result (despite unconsidered confounders and other sources of bias). confounding causality causal inference age at onset logistic regression survival analysis mental disorders epidemiology ddc:150.2 rvk:CS 1000
3	Calculating control variables with age at onset data to adjust for conditions prior to exposure Höfler, Michael, Brueck, Tanja, Lieb, Roselind, Wittchen, Hans-Ulrich January 2005 Background: When assessing the association between a factor X and a subsequent outcome Y in observational studies, the question arises of which variables to adjust for in order to reduce confounding bias in causal inference on the effect of X on Y. Disregarding such factors is often a source of overestimation because these variables may affect both X and Y. On the other hand, adjusting for such variables can also be a source of underestimation because they may be a causal consequence of X and part of the mechanism that leads from X to Y. Methods: In this paper, we present a simple method to compute control variables when age at onset data are available for both X and a set of other variables. Using these age at onset data, control variables are computed that adjust only for conditions that occur prior to X. This strategy can be used in prospective as well as in survival analysis. Our method is motivated by an argument based on the counterfactual model of a causal effect. Results: The procedure is exemplified by examining the relation between panic attack and the subsequent incidence of MDD. Conclusions: The results reveal that adjusting for all other variables, irrespective of their temporal relation to X, can yield a false negative result (despite unconsidered confounders and other sources of bias). info:eu-repo/classification/ddc/150.2 ddc:150.2
4	Empirische Wirkungsanalyse direkter Transferzahlungen - am Beispiel von Agrarumweltmaßnahmen und der Ausgleichszulage für benachteiligte Gebiete / Empirical analysis of direct farm payments using the example of agri-environment programmes and the less favoured areas scheme / Dissertation for obtaining the doctoral degree of the Faculty of Agricultural Sciences of the Georg-August-University of Goettingen Pufahl, Andrea 11 November 2009 No description available. 630 Agriculture Veterinary Medicine Agricultural Sciences direct payments transfer payments agri-environment programmes less favoured areas scheme evaluation impact analysis factor input land use intensity control group comparison matching panel analysis causal inference methods Germany 48.18 83.66 YJA100 YGA000 YGA200
5	Development and application of new statistical methods for the analysis of multiple phenotypes to investigate genetic associations with cardiometabolic traits Konigorski, Stefan 27 April 2018 In recent years, biotechnological advances have made it possible to investigate associations of genetic and molecular markers with multiple complex phenotypes in much greater depth. However, available statistical methods often do not yield valid inference for the analysis of such complex datasets. The first aim of this thesis is to develop two novel statistical methods for association analyses of genetic markers with multiple phenotypes, to implement them in a computationally efficient and robust manner so that they can be used for large-scale analyses, and to evaluate them against existing statistical approaches under realistic scenarios. The first approach, the copula-based joint analysis of multiple phenotypes (C-JAMP) method, allows genetic associations with multiple traits to be investigated in a joint copula model and is evaluated for association analyses of rare genetic variants with quantitative traits. The second approach, the causal inference using estimating equations (CIEE) method, allows direct genetic effects to be estimated and tested in directed acyclic graphs and is evaluated for association analyses of common genetic variants with quantitative and time-to-event traits. The results of extensive simulation studies show that both approaches yield unbiased and efficient parameter estimators and can improve the power of association tests compared with existing approaches, which yield invalid inference in many scenarios. For the second aim of this thesis, to identify novel genetic and transcriptomic markers associated with cardiometabolic traits, C-JAMP and CIEE are applied in two large-scale studies that include genome- and transcriptome-wide data. In the analyses, several novel candidate markers and genes for blood pressure and obesity are identified, which highlights the merit of developing, evaluating, and implementing novel statistical approaches. R packages are available for both methods and enable their application in future studies.
Genome-wide association studies Multiple phenotypes Copula models Causal inference Cardiometabolic traits Rare genetic variants R packages RNA Sequencing 004 Computer Science 576 Genetics and Evolution 610 Medicine and Health WC 7700 ddc:519 ddc:004 ddc:576 ddc:610
