1 |
Gaining Insight With Recursive Partitioning Of Generalized Linear Models. Rusch, Thomas; Zeileis, Achim. 06 1900
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles.
Usually, a constant is then fitted in every partition. While this is a simple and
intuitive approach, it may give little insight into what a specific relationship between dependent and
independent variables looks like. Alternatively, a certain model may be assumed or of
interest, together with a number of candidate variables that may non-linearly give rise to
different model parameter values.
We present an approach that combines generalized linear models with recursive partitioning;
it enhances the interpretability of classical trees and provides an
exploratory way to assess a candidate variable's influence on a parametric model.
The method conducts recursive partitioning of a generalized linear model by
(1) fitting the model to the data set, (2) testing for parameter instability over a set of
partitioning variables, and (3) splitting the data set with respect to the variable associated with
the highest instability. The outcome is a tree in which each terminal node is associated with a generalized linear model.
We show the method's versatility and its suitability for gaining additional insight
into the relationship between dependent and independent variables with two examples: modelling
voting behaviour and a failure model for debt amortization. / Series: Research Report Series / Department of Statistics and Mathematics
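The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the node model is an ordinary least-squares line (a Gaussian GLM with identity link), and instability is measured by the maximum of component-wise standardized cumulative score sums, without p-values, full decorrelation, or a Bonferroni adjustment. The function names and the `threshold`, `minsize` and `maxdepth` values are hypothetical choices.

```python
import math
import random


def fit_line(xs, ys):
    """OLS fit of y = a + b*x, i.e. a Gaussian GLM with identity link."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def instability(xs, ys, z):
    """Step 2 (simplified): largest standardized cumulative sum of the score
    contributions when observations are ordered by partitioning variable z."""
    a, b = fit_line(xs, ys)
    res = [y - a - b * x for x, y in zip(xs, ys)]
    scores = [(r, r * x) for r, x in zip(res, xs)]  # sum to zero at the OLS fit
    n = len(scores)
    order = sorted(range(n), key=lambda i: z[i])
    stat = 0.0
    for j in range(2):  # intercept and slope components, standardized separately
        sd = math.sqrt(sum(s[j] ** 2 for s in scores) / n)
        if sd == 0:
            continue
        cum = 0.0
        for i in order:
            cum += scores[i][j]
            stat = max(stat, abs(cum) / (sd * math.sqrt(n)))
    return stat


def rss(xs, ys):
    a, b = fit_line(xs, ys)
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def best_cut(xs, ys, z, minsize):
    """Step 3: cut point on z that minimizes the summed RSS of the two children."""
    best = None
    for c in sorted(set(z))[:-1]:
        left = [i for i, v in enumerate(z) if v <= c]
        right = [i for i, v in enumerate(z) if v > c]
        if len(left) < minsize or len(right) < minsize:
            continue
        total = (rss([xs[i] for i in left], [ys[i] for i in left])
                 + rss([xs[i] for i in right], [ys[i] for i in right]))
        if best is None or total < best[0]:
            best = (total, c)
    return None if best is None else best[1]


def mob(xs, ys, Z, threshold=1.5, minsize=20, depth=0, maxdepth=3):
    """Z maps partitioning-variable names to value lists aligned with xs/ys."""
    node = {"coef": fit_line(xs, ys), "n": len(xs)}  # step 1: fit the node model
    if depth >= maxdepth or len(xs) < 2 * minsize:
        return node
    stats = {name: instability(xs, ys, z) for name, z in Z.items()}
    var = max(stats, key=stats.get)  # variable with the highest instability
    cut = best_cut(xs, ys, Z[var], minsize) if stats[var] >= threshold else None
    if cut is None:
        return node
    node["split"] = (var, cut)
    for side, go_left in (("left", True), ("right", False)):
        idx = [i for i, v in enumerate(Z[var]) if (v <= cut) == go_left]
        node[side] = mob([xs[i] for i in idx], [ys[i] for i in idx],
                         {k: [z[i] for i in idx] for k, z in Z.items()},
                         threshold, minsize, depth + 1, maxdepth)
    return node


# Hypothetical example: the slope of y on x flips sign at z1 = 0.5; z2 is noise.
random.seed(1)
x = [random.uniform(-1, 1) for _ in range(400)]
z1 = [random.random() for _ in range(400)]
z2 = [random.random() for _ in range(400)]
y = [(2 * xi if zi <= 0.5 else -2 * xi) + random.gauss(0, 0.3) for xi, zi in zip(x, z1)]
tree = mob(x, y, {"z1": z1, "z2": z2})
print(tree["split"], tree["left"]["coef"], tree["right"]["coef"])
```

In this simulated case one would expect a root split on `z1` near 0.5, with slopes of roughly +2 and -2 in the two children, while `z2` shows no systematic instability. A full implementation would convert the statistic into a p-value and stop splitting when no partitioning variable is significant.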
|
2 |
Modeling Mortality Rates In The WikiLeaks Afghanistan War Logs. Rusch, Thomas; Hofmarcher, Paul; Hatzinger, Reinhold; Hornik, Kurt. 09 1900
The WikiLeaks Afghanistan war logs contain more than 76,000 reports about fatalities and their circumstances in the US-led Afghanistan war, covering the period from January 2004 to December 2009. In this paper we use those reports to build statistical models that help us understand the mortality rates associated with specific circumstances. Our approach combines Latent Dirichlet Allocation (LDA) with recursive partitioning based on the negative binomial distribution. LDA is used to process the natural language information contained in each report summary: we estimate latent topics and assign each report to one of them. These topics, in addition to other variables in the data set, subsequently serve as explanatory variables for modeling the number of fatalities among the civilian population, ISAF Forces, Anti-Coalition Forces and the Afghan National Police or military, as well as the combined number of fatalities. Modeling is carried out with manifest mixtures of negative binomial distributions estimated with model-based recursive partitioning. For each group of fatalities, we identify segments with different mortality rates that correspond to a small number of topics and other explanatory variables as well as their interactions. Furthermore, we carve out the similarities between segments and connect them to stories that have been covered in the media. This provides an unprecedented description of the war in Afghanistan as covered by the war logs. Additionally, our approach can serve as an example of how modern statistical methods may lead to extra insight when applied to problems of data journalism. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
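Each segment in such a tree carries a negative binomial distribution for the fatality counts. As a rough sketch of how a candidate split could be judged, the Python below fits an intercept-only NB2 model (mean μ, variance μ + μ²/θ) to each candidate segment and compares the summed log-likelihood with that of the pooled data. Assumptions: μ is the sample mean (which is also its maximum likelihood estimate in this model), but θ is estimated by the method of moments instead of maximum likelihood; the counts and function names are hypothetical.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log-pmf of the NB2 negative binomial: mean mu, variance mu + mu**2 / theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def fit_nb(counts):
    """Intercept-only fit: mu = sample mean (also the ML estimate of mu);
    theta by the method of moments, None if there is no overdispersion."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / max(n - 1, 1)
    return mu, (mu * mu / (var - mu) if var > mu else None)


def segment_loglik(counts):
    mu, theta = fit_nb(counts)
    if mu == 0:
        return 0.0  # all counts are zero
    if theta is None:  # no overdispersion: fall back to the Poisson limit
        return sum(c * math.log(mu) - mu - math.lgamma(c + 1) for c in counts)
    return sum(nb_logpmf(c, mu, theta) for c in counts)


# Hypothetical fatality counts in two candidate segments.
seg_a = [0, 1, 0, 2, 1, 0, 3, 1]
seg_b = [4, 9, 2, 15, 7, 0, 11, 6]
gain = segment_loglik(seg_a) + segment_loglik(seg_b) - segment_loglik(seg_a + seg_b)
print(round(gain, 2))  # positive: the segments have clearly different rates
```

A larger log-likelihood gain indicates that separate rates describe the segments better than one pooled rate; a model-based tree would additionally test whether the gain is larger than chance.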
|
3 |
Gaining Insight with Recursive Partitioning of Generalized Linear Models. Rusch, Thomas; Zeileis, Achim. January 2013
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles.
Usually, a constant is then fitted in every partition. While this is a simple and intuitive approach, it may give little insight into what a specific relationship between dependent and independent variables looks like. Alternatively, a certain model may be assumed or of interest, together with a number of candidate variables that may non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models with recursive partitioning; it enhances the interpretability of classical trees and provides an exploratory way to assess a candidate variable's influence on a parametric model. This method conducts recursive partitioning of a generalized linear model by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, and (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree in which each terminal node is associated with a generalized linear model. We show the method's versatility and its suitability for gaining additional insight into the relationship between dependent and independent variables with two examples, modelling voting behaviour and a failure model for debt amortization, and compare it to alternative approaches.
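Written as equations, the segmented model and the instability test of step (2) can be summarized roughly as follows. The notation (x for regressors, z for partitioning variables, ψ for score contributions, W_j for the fluctuation process of variable z_j) follows the usual presentation of the MOB algorithm and is our own choice rather than taken from this paper.

```latex
% Segmented GLM: terminal nodes b = 1, ..., B partition the space of the
% partitioning variables z into cells \mathcal{B}_b, each with its own GLM.
g\bigl(\mathrm{E}[y_i \mid x_i, z_i]\bigr) = x_i^{\top}\beta_b,
  \qquad z_i \in \mathcal{B}_b,\quad b = 1,\dots,B.

% Instability test for partitioning variable z_j: score contributions
% \psi(y_i, x_i, \hat\beta) are ordered by z_j (permutation \sigma_j) and
% scaled by \hat J, an estimate of their covariance matrix.
W_j(t) = \hat J^{-1/2}\, n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor}
  \psi\bigl(y_{\sigma_j(i)}, x_{\sigma_j(i)}, \hat\beta\bigr),
  \qquad 0 \le t \le 1.
```

Under parameter stability W_j fluctuates around zero; systematic excursions indicate that β changes along z_j, and the variable with the strongest evidence is chosen for splitting.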
|
4 |
Influencing Elections with Statistics: Targeting Voters with Logistic Regression Trees. Rusch, Thomas; Lee, Ilro; Hornik, Kurt; Jank, Wolfgang; Zeileis, Achim. 03 1900
Political campaigning has become a multi-million dollar business. A substantial proportion of a campaign's budget is spent on voter mobilization, i.e., on identifying and
influencing as many people as possible to vote. Campaigns apply statistical tools to their data
to decide whom to target. While the available data are usually rich,
campaigns have traditionally relied on a rather limited selection of information, often including only previous voting behavior and one or two demographic variables. Statistical
procedures currently in use include logistic regression and standard classification
tree methods such as CHAID, but there is growing interest in employing modern data mining approaches. In line with this development, we propose a modern framework
for voter targeting called LORET (for logistic regression trees) that employs trees (with
possibly just a single root node) containing logistic regressions (with possibly just an intercept) in every leaf. LORET models thus contain logistic regression and classification trees as special
cases and allow for a synthesis of both techniques under one umbrella. We explore various
flavors of LORET models that (a) compare the effect of using the full set of available
variables against using only limited information and (b) investigate the varying effects of these variables
either as regressors in the logistic model components or as partitioning variables in the
tree components. To assess model performance and illustrate targeting, we apply LORET
to a data set of 19,634 eligible voters from the 2004 US presidential election. We find that
augmenting the standard set of variables (such as age and voting history) with
additional predictor variables (such as the household composition in terms of party affiliation and each individual's rank in the household) clearly improves predictive accuracy.
We also find that LORET models based on tree induction outperform their unpartitioned competitors. Additionally, LORET models using both partitioning variables and regressors
in the resulting nodes can improve the efficiency of allocating campaign resources while
still providing intelligible models. / Series: Research Report Series / Department of Statistics and Mathematics
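A LORET-style model can be pictured as a tree whose leaves hold logistic regressions. The Python sketch below keeps the tree fixed at one hypothetical split on `voted_before` (in LORET the tree itself would be grown by tree induction), fits a logistic regression on age in each leaf by Newton-Raphson, and ranks voters by predicted turnout probability. The simulated electorate, the variable names and the ranking rule are assumptions for illustration; the paper's targeting strategy may differ.

```python
import math
import random


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson ML fit of P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(leaves, voted_before, age):
    """Leaf lookup in the fixed one-split tree, then that leaf's logistic model."""
    b0, b1 = leaves[voted_before]
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - 50) / 10)))


# Hypothetical simulated electorate: turnout depends on age differently
# for past voters (voted_before = 1) and past non-voters (voted_before = 0).
random.seed(7)
true = {1: (1.0, 0.5), 0: (-1.0, 0.2)}
voters = []
for _ in range(4000):
    vb, age = random.randint(0, 1), random.uniform(18, 80)
    b0, b1 = true[vb]
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - 50) / 10)))
    voters.append((vb, age, int(random.random() < p)))

# Fit one logistic regression per leaf of the (here fixed) tree.
leaves = {}
for vb in (0, 1):
    sub = [(age, t) for v, age, t in voters if v == vb]
    leaves[vb] = fit_logistic([(a - 50) / 10 for a, _ in sub], [t for _, t in sub])

# Targeting: rank eligible voters by predicted turnout probability.
ranked = sorted(voters, key=lambda v: predict(leaves, v[0], v[1]), reverse=True)
print(leaves, ranked[:3])
```

With a single root node and intercept-only leaves, the same code would reduce to plain logistic regression or a plain classification tree respectively, which mirrors the special cases mentioned in the abstract.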
|
5 |
Model trees with topic model preprocessing: an approach for data journalism illustrated with the WikiLeaks Afghanistan war logs. Rusch, Thomas; Hofmarcher, Paul; Hatzinger, Reinhold; Hornik, Kurt. 06 1900
The WikiLeaks Afghanistan war logs contain nearly 77,000 reports of
incidents in the US-led Afghanistan war, covering the period from January
2004 to December 2009. The recent growth of data on complex social systems
and the potential to derive stories from them has shifted the focus of
journalistic and scientific attention increasingly toward data-driven journalism
and computational social science. In this paper we advocate the use
of modern statistical methods for problems of data journalism and beyond,
which may help journalistic and scientific work and lead to additional insight.
Using the WikiLeaks Afghanistan war logs for illustration, we present an approach
that builds intelligible statistical models for interpretable segments in
the data, in this case to explore the fatality rates associated with different circumstances
in the Afghanistan war. Our approach combines preprocessing by
Latent Dirichlet Allocation (LDA) with model trees. LDA is used to process
the natural language information contained in each report summary by estimating
latent topics and assigning each report to one of them. Together with
other variables these topic assignments serve as splitting variables for finding
segments in the data to which local statistical models for the reported number
of fatalities are fitted. Segmentation and fitting are carried out with recursive
partitioning of negative binomial distributions. We identify segments with
different fatality rates that correspond to a small number of topics and other
variables as well as their interactions. Furthermore, we carve out the similarities
between segments and connect them to stories that have been covered in
the media. This gives an unprecedented description of the war in Afghanistan
and serves as an example of how data journalism, computational social science
and other areas with interest in database data can benefit from modern
statistical techniques. (authors' abstract)
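The preprocessing step, in which each report is assigned to one LDA topic that then acts as a categorical splitting variable, can be sketched as follows. The document-topic proportions are assumed to come from an already fitted LDA model (for instance, scikit-learn's `LatentDirichletAllocation.transform` returns such a matrix); here both the proportions and the fatality counts are hypothetical. Segment means are reported because, for an intercept-only negative binomial node model, the maximum likelihood estimate of the mean is the segment's sample mean.

```python
from collections import defaultdict


def assign_topics(doc_topic):
    """Hard-assign each report to its most probable topic (argmax over its
    document-topic proportions)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def mean_by_segment(segments, counts):
    """Mean reported fatalities within each segment (here: each topic)."""
    total, size = defaultdict(float), defaultdict(int)
    for s, c in zip(segments, counts):
        total[s] += c
        size[s] += 1
    return {s: total[s] / size[s] for s in total}


# Hypothetical document-topic proportions for five reports and three topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
fatalities = [0, 3, 1, 2, 5]  # hypothetical counts
topics = assign_topics(doc_topic)           # [0, 1, 2, 0, 2]
print(mean_by_segment(topics, fatalities))  # {0: 1.0, 1: 3.0, 2: 3.0}
```

In a model tree, such topic labels would sit alongside the other splitting variables, and categories with similar fatality rates (here topics 1 and 2) could end up in the same segment.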
|
6 |
Score-Based Approaches to Heterogeneity in Psychological Models. Arnold, Manuel. 30 May 2022
Statistical models of human cognition and behavior often rely on aggregated data and may fail to consider heterogeneity, that is, differences across individuals or groups. If overlooked, heterogeneity can bias parameter estimates and may lead to false-positive or false-negative findings. Often, heterogeneity can be detected and predicted with the help of covariates. However, identifying predictors of heterogeneity can be a challenging task. To address this issue, I propose two novel approaches for detecting and predicting individual and group differences with covariates.
This cumulative dissertation is composed of three projects. Project 1 advances the individual parameter contribution (IPC) regression framework, which allows studying heterogeneity in structural equation model (SEM) parameters by means of covariates. I evaluate the use of IPC regression for dynamic panel models, propose an alternative estimation technique, and derive IPCs for general maximum likelihood estimators. Project 2 illustrates how IPC regression can be used in practice. To this end, I provide a step-by-step introduction to the IPC regression implementation in the ipcr package for the R system for statistical computing. Finally, Project 3 extends the SEM tree framework. SEM trees are a model-based recursive partitioning method for finding covariates that predict group differences in SEM parameters. Unfortunately, the original SEM tree implementation is computationally demanding. As a solution to this problem, I combine SEM trees with a family of score-based tests. The resulting score-guided SEM trees compute quickly, solving the runtime issues of the original SEM trees, and show favorable statistical properties.
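IPC regression can be illustrated on the simplest possible model. The Python sketch assumes the general form IPC_i = θ̂ + A⁻¹ s_i(θ̂), with s_i the score contribution of person i and A the per-observation Fisher information, and applies it to the mean and variance of a normal distribution instead of an SEM. The data and helper names are hypothetical, and this is not the ipcr package's API.

```python
import random


def ipc_normal(ys):
    """IPCs for the ML estimates (mu, sigma^2) of a normal model, using
    IPC_i = theta_hat + A^{-1} s_i(theta_hat), A = per-observation information."""
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n  # ML (divide-by-n) variance
    out = []
    for y in ys:
        s_mu = (y - mu) / var                                      # score for mu
        s_var = -1.0 / (2 * var) + (y - mu) ** 2 / (2 * var ** 2)  # score for sigma^2
        # A = diag(1 / var, 1 / (2 var^2)) for this model, so A^{-1} is diagonal too
        out.append((mu + var * s_mu, var + 2 * var ** 2 * s_var))
    return out


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


# Hypothetical data: the mean is 1.0 higher in group g = 1 than in group g = 0.
random.seed(3)
g = [random.randint(0, 1) for _ in range(1000)]
y = [random.gauss(1.0 * gi, 1.0) for gi in g]
ipcs = ipc_normal(y)
# IPC regression: regress each parameter's IPCs on the covariate.
print(ols_slope(g, [p[0] for p in ipcs]), ols_slope(g, [p[1] for p in ipcs]))
```

For this model the IPCs reduce to y_i for the mean and (y_i - μ̂)² for the variance, so regressing the mean IPCs on g recovers the group difference of about 1.0 that a single pooled estimate would hide.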
|
7 |
Recursive Partitioning of Models of a Generalized Linear Model Type. Rusch, Thomas. 10 June 2012
This thesis is concerned with recursive partitioning of models of a generalized linear model type (GLM-type), i.e., maximum likelihood models with a linear predictor for the linked mean, a topic that has received constant interest over the last twenty years. The resulting tree (a "model tree") can be seen as an extension of classic trees that allows for a GLM-type model in each partition. In this work, the focus lies on applied and computational aspects of model trees with GLM-type node models: the aim is to work out different areas where combining parametric models and trees will be beneficial and to build a computational scaffold for future applications of model trees. In the first part, model trees are defined, and some algorithms for fitting model trees with GLM-type node models are reviewed and compared in terms of their tree induction and node model fitting properties. Additionally, the design of a particularly versatile algorithm in R, the MOB algorithm (Zeileis et al. 2008), is described, and an in-depth discussion of how its functionality can be extended to various GLM-type models is provided. This is highlighted by an example that uses partitioned negative binomial models to investigate the effect of health care incentives. Part 2 consists of three research articles in which model trees are applied to different problems that frequently occur in the social sciences. The first uses trees with GLM-type node models and applies them to a data set of voters who show a non-monotone relationship between the frequency of attending past elections and turnout in 2004. Three different types of model tree algorithms are used to investigate this phenomenon, and for two of them the resulting trees can explain the counter-intuitive finding. Here, model trees are used to learn a nonlinear relationship between a target model and a large number of candidate variables to provide more insight into a data set.
A second application area is also discussed, namely using model trees to detect ill-fitting subsets in the data. The second article uses model trees to model the number of fatalities in the Afghanistan war, based on the WikiLeaks Afghanistan war diary. Data pre-processing with a topic model generates predictors that are used as explanatory variables in a model tree for overdispersed count data. Here, the combination of model trees and topic models allows database data, frequently encountered in data journalism, to be analysed flexibly and provides a coherent description of fatalities in the Afghanistan war. The third paper uses a new framework built around model trees to approach the classic problem of segmentation, frequently encountered in marketing and management science. Here, the framework is used to segment a sample of the US electorate in order to identify likely and unlikely voters. It is shown that the framework's model trees enable accurate identification, which in turn allows efficient targeted mobilisation of eligible voters. (author's abstract)
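The thesis discusses how MOB can be extended to further GLM-type node models. As an illustration of that idea (not the R interface of the MOB implementation), the Python class below bundles what a MOB-style tree needs from a node model: a maximum likelihood fit, an objective function, and per-observation score contributions for the instability tests. It uses a Poisson regression with log link; the class name, method names and simulated data are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PoissonNode:
    """Hypothetical GLM-type node model: log E[y | x] = b0 + b1 * x, fitted by ML.
    Exposes a fit, an objective (negative log-likelihood) and score contributions."""
    b0: float = 0.0
    b1: float = 0.0

    def fit(self, xs, ys, iters=30):
        self.b0, self.b1 = math.log(sum(ys) / len(ys)), 0.0  # intercept-only start
        for _ in range(iters):  # Newton-Raphson (= Fisher scoring for the log link)
            g0 = g1 = h00 = h01 = h11 = 0.0
            for x, y in zip(xs, ys):
                mu = math.exp(self.b0 + self.b1 * x)
                g0, g1 = g0 + (y - mu), g1 + (y - mu) * x
                h00, h01, h11 = h00 + mu, h01 + mu * x, h11 + mu * x * x
            det = h00 * h11 - h01 * h01
            self.b0 += (h11 * g0 - h01 * g1) / det
            self.b1 += (h00 * g1 - h01 * g0) / det
        return self

    def neg_loglik(self, xs, ys):
        return -sum(y * (self.b0 + self.b1 * x) - math.exp(self.b0 + self.b1 * x)
                    - math.lgamma(y + 1) for x, y in zip(xs, ys))

    def scores(self, xs, ys):
        """Per-observation score contributions (intercept, slope)."""
        out = []
        for x, y in zip(xs, ys):
            r = y - math.exp(self.b0 + self.b1 * x)
            out.append((r, r * x))
        return out


def rpois(lam, rng):
    """Knuth's multiplication method for Poisson draws (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Hypothetical simulated counts with log E[y] = 0.5 + 1.0 * x.
rng = random.Random(11)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
ys = [rpois(math.exp(0.5 + 1.0 * x), rng) for x in xs]
node = PoissonNode().fit(xs, ys)
print(node, node.neg_loglik(xs, ys))
```

Swapping in another GLM-type model (for example, a negative binomial regression) would only require the same three methods, which is the kind of extensibility the thesis attributes to the MOB design.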
|