1 |
Measuring the Stability of Results from Supervised Statistical Learning / Philipp, Michel; Rusch, Thomas; Hornik, Kurt; Strobl, Carolin. 17 January 2017 (has links) (PDF)
Stability is a major requirement for drawing reliable conclusions when interpreting results from supervised statistical learning. In this paper, we present a general framework for assessing and comparing the stability of results that can be used in real-world statistical learning applications or in benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form of the relationship between the predictors and the response matches the algorithm. Typical uses of the framework in practice would be to compare the stability of results generated by different candidate algorithms for a data set at hand or to assess the stability of algorithms in a benchmark study. Code to perform the stability analyses is provided in the form of an R package. / Series: Research Report Series / Department of Statistics and Mathematics
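A minimal sketch of the kind of analysis the framework supports, under illustrative assumptions (a generic bootstrap resampling scheme and correlation of predictions as the similarity measure; the interface of the accompanying R package is not shown): the stability of a tree and of a linear model are compared on the same simulated data.

```r
## Hedged sketch: stability measured as the agreement between predictions of
## two fits of the same algorithm on independently resampled versions of the
## data. The resampling scheme and similarity measure are illustrative
## assumptions, not the interface of the accompanying package.
library(rpart)

stability <- function(fit_fun, data, B = 100) {
  n <- nrow(data)
  agree <- numeric(B)
  for (b in seq_len(B)) {
    i1 <- sample(n, replace = TRUE)       # two independent bootstrap samples
    i2 <- sample(n, replace = TRUE)
    m1 <- fit_fun(data[i1, ])
    m2 <- fit_fun(data[i2, ])
    p1 <- predict(m1, newdata = data)     # predictions on the original data
    p2 <- predict(m2, newdata = data)
    agree[b] <- cor(p1, p2)               # similarity of the two result vectors
  }
  mean(agree, na.rm = TRUE)
}

## a step-shaped relationship matches the tree, so even the "unstable"
## algorithm can yield stable results here, as described above
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- ifelse(d$x > 0.5, 1, 3) + rnorm(200, sd = 0.3)
stability(function(dd) rpart(y ~ x, data = dd), d)
stability(function(dd) lm(y ~ x, data = dd), d)
```

Replacing the step-shaped relationship with a smooth one reverses the picture, which is the interplay between algorithm and data-generating process described above.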
|
2 |
Evaluating Model-based Trees in Practice / Zeileis, Achim; Hothorn, Torsten; Hornik, Kurt. January 2006 (has links) (PDF)
A recently suggested algorithm for recursive partitioning of statistical models (Zeileis, Hothorn and Hornik, 2005), such as models estimated by maximum likelihood or least squares, is evaluated in practice. The general algorithm is applied to linear, logistic, and survival regression and to economic and medical regression problems. Furthermore, its performance with respect to prediction quality and model complexity is compared in a benchmark study with a large collection of other tree-based algorithms, showing that the algorithm yields interpretable trees that are competitive with previously suggested approaches. / Series: Research Report Series / Department of Statistics and Mathematics
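A hedged sketch of the kind of benchmark comparison described above, written against the partykit re-implementation of the algorithm and simulated data; the competitors, hold-out split, and error measure are illustrative choices, not those of the report.

```r
## Hedged sketch of a small benchmark comparison: hold-out misclassification
## error and number of terminal nodes for a model-based tree, a conditional
## inference tree, and CART on simulated data.
library(partykit)
library(rpart)

set.seed(4)
n <- 600
d <- data.frame(x = rnorm(n), z = runif(n), noise = rnorm(n))
d$y <- factor(rbinom(n, 1, plogis(ifelse(d$z > 0.5, 2, -2) * d$x)),
              labels = c("no", "yes"))
train <- d[1:400, ]
test  <- d[401:600, ]

mb <- glmtree(y ~ x | z + noise, data = train, family = binomial)
ct <- ctree(y ~ x + z + noise, data = train)
rp <- rpart(y ~ x + z + noise, data = train, method = "class")

## prediction quality: misclassification on the hold-out set
c(mob   = mean((predict(mb, newdata = test, type = "response") > 0.5) != (test$y == "yes")),
  ctree = mean(predict(ct, newdata = test) != test$y),
  rpart = mean(predict(rp, newdata = test, type = "class") != test$y))
## model complexity: number of terminal nodes
c(mob = width(mb), ctree = width(ct), rpart = sum(rp$frame$var == "<leaf>"))
```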
|
3 |
Model-based recursive partitioning / Zeileis, Achim; Hothorn, Torsten; Hornik, Kurt. January 2005 (has links) (PDF)
Recursive partitioning is embedded into the general and well-established class of parametric models that can be fitted using M-type estimators (including maximum likelihood). An algorithm for model-based recursive partitioning is suggested for which the basic steps are: (1) fit a parametric model to a data set, (2) test for parameter instability over a set of partitioning variables, (3) if there is some overall parameter instability, split the model with respect to the variable associated with the highest instability, (4) repeat the procedure in each of the daughter nodes. The algorithm yields a partitioned (or segmented) parametric model that can effectively be visualized and that subject-matter scientists are accustomed to analyzing and interpreting. / Series: Research Report Series / Department of Statistics and Mathematics
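The four steps can be illustrated with the lmtree() convenience interface in the partykit package (a later implementation of this approach); the data set below is simulated for illustration only.

```r
## Hedged sketch of the four steps for a linear model, using lmtree() from
## the partykit package on simulated data: fit y ~ x, test parameter
## stability over z1 and z2, split on the most unstable variable, repeat.
library(partykit)

set.seed(1)
n <- 400
d <- data.frame(x = runif(n), z1 = runif(n),
                z2 = factor(sample(c("a", "b"), n, replace = TRUE)))
d$y <- ifelse(d$z1 > 0.5, 1 + 3 * d$x, 4 - 2 * d$x) + rnorm(n, sd = 0.5)

mb <- lmtree(y ~ x | z1 + z2, data = d)   # model: y ~ x; partitioning: z1, z2
print(mb)                                 # partitioned (segmented) linear model
plot(mb)                                  # visualization per terminal node
coef(mb)                                  # node-wise intercepts and slopes
```

The plot shows one fitted line per terminal node, which is the kind of visualization of the segmented parametric model referred to above.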
|
4 |
Generation of Individualized Treatment Decision Tree Algorithm with Application to Randomized Control Trials and Electronic Medical Record Data / Doubleday, Kevin. January 2016 (has links)
With new treatments and novel technology available, personalized medicine has become a key topic in the new era of healthcare. Traditional statistical methods for personalized medicine and subgroup identification primarily focus on single-treatment or two-arm randomized control trials (RCTs). With restricted inclusion and exclusion criteria, data from RCTs may not reflect real-world treatment effectiveness. However, electronic medical records (EMR) offer an alternative venue. In this paper, we propose a general framework to identify an individualized treatment rule (ITR), which connects subgroup identification methods with ITR estimation. It is applicable to both RCT and EMR data. Given the large scale of EMR datasets, we develop a recursive partitioning algorithm to solve the problem (ITR-Tree). A variable importance measure is also developed for personalized medicine using random forests. We demonstrate our method through simulations, and apply ITR-Tree to datasets from diabetes studies using both RCT and EMR data. A software package is available at https://github.com/jinjinzhou/ITR.Tree.
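The interface of the ITR.Tree package is not reproduced here; as a generic, hedged illustration of the underlying idea, a model-based tree can partition a simple outcome-on-treatment model over baseline covariates so that each terminal node carries its own treatment effect. All variable names and data below are hypothetical.

```r
## Hedged sketch (not the ITR.Tree interface): a model-based tree partitions
## an outcome ~ treatment model over baseline covariates, so each terminal
## node carries its own estimated treatment effect. All names are hypothetical.
library(partykit)

set.seed(7)
n <- 600
d <- data.frame(trt = factor(sample(c("A", "B"), n, replace = TRUE)),
                age = runif(n, 20, 80),
                a1c = rnorm(n, 7, 1))
## simulated truth: treatment B helps only patients with high baseline A1c
d$y <- 1 + ifelse(d$a1c > 7 & d$trt == "B", 1.5, 0) + rnorm(n)

it <- lmtree(y ~ trt | age + a1c, data = d)
plot(it)     # node-specific treatment effects
coef(it)     # the trtB coefficient per node indicates who benefits from B
```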
|
5 |
Application of Data Mining Algorithms to Images (Εφαρμογή αλγορίθμων εξόρυξης δεδομένων σε εικόνες) / Ζαχαρία, Ελισάβετ. 26 July 2013 (has links)
This thesis deals with techniques for data mining from images. It presents basic theoretical background on the various methods and then focuses on the implementation of dynamic recursive partitioning (DRP), a technique aimed specifically at data mining from images. The technique was studied in order to identify and characterize specific morphometric features across anatomical structures / brain images, for medical applications.
The aim was to show that this method reduces the required number of statistical tests compared with other similar methods, such as pixel-wise analysis. DRP was found to perform as well as pixel-wise analysis, while requiring a clearly smaller number of statistical tests to mine the data from the images and to identify the image regions with the most important morphological differences; this reduction reaches up to 50%.
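A hedged, self-contained sketch of a DRP-style analysis (illustrative, not the thesis implementation): a region is tested as a whole and only subdivided when a group difference is found, which is why far fewer tests are needed than in a pixel-wise analysis.

```r
## Hedged sketch of a DRP-style analysis (illustrative, not the thesis code):
## a region is tested as a whole and only subdivided into quadrants when a
## group difference is detected, so fewer tests are run than with a
## pixel-by-pixel analysis.
drp <- function(imgs_a, imgs_b, rows, cols, alpha = 0.05, min_size = 4) {
  ## compare the region's mean intensity between the two groups of images
  m_a <- sapply(imgs_a, function(im) mean(im[rows, cols]))
  m_b <- sapply(imgs_b, function(im) mean(im[rows, cols]))
  p   <- t.test(m_a, m_b)$p.value
  if (p >= alpha || length(rows) <= min_size || length(cols) <= min_size)
    return(list(list(rows = range(rows), cols = range(cols), p = p)))
  ## significant difference: split into four quadrants and recurse
  rs  <- split(rows, rows <= median(rows))
  cs  <- split(cols, cols <= median(cols))
  out <- list()
  for (r in rs) {
    for (co in cs) {
      out <- c(out, drp(imgs_a, imgs_b, r, co, alpha, min_size))
    }
  }
  out
}

## usage with simulated 32 x 32 "brain images" for two groups that differ
## only in the top-left corner
set.seed(3)
make_img <- function(shift) {
  im <- matrix(rnorm(32 * 32), 32, 32)
  im[1:8, 1:8] <- im[1:8, 1:8] + shift
  im
}
group_a <- replicate(20, make_img(0),   simplify = FALSE)
group_b <- replicate(20, make_img(0.8), simplify = FALSE)
regions <- drp(group_a, group_b, 1:32, 1:32)
length(regions)   # regions reported; far fewer tests than the 1024 pixel-wise tests
```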
|
6 |
Let's Have a party! An Open-Source Toolbox for Recursive Partytioning / Hothorn, Torsten; Zeileis, Achim; Hornik, Kurt. January 2007 (has links) (PDF)
Package party, implemented in the R system for statistical computing, provides basic classes and methods for recursive partitioning along with reference implementations for three recently suggested tree-based learners: conditional inference trees and forests, and model-based recursive partitioning. / Series: Research Report Series / Department of Statistics and Mathematics
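A brief sketch of the three learners on standard data sets; the model-based example is shown via the later partykit re-implementation as a convenience, which is an assumption rather than part of the toolbox described above.

```r
## Hedged sketch of the three learners; the model-based example uses the
## later partykit re-implementation as a convenience.
library("party")

airq <- subset(airquality, complete.cases(airquality))
ct <- ctree(Ozone ~ ., data = airq)                    # conditional inference tree
plot(ct)
cf <- cforest(Ozone ~ ., data = airq,
              control = cforest_unbiased(ntree = 50))  # conditional inference forest
predict(cf, newdata = airq[1:5, ])

## model-based recursive partitioning of a linear model over a made-up factor
cars2 <- transform(cars, road = factor(rep(c("dry", "wet"), length.out = nrow(cars))))
partykit::lmtree(dist ~ speed | road, data = cars2)
```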
|
7 |
Gaining Insight With Recursive Partitioning Of Generalized Linear Models / Rusch, Thomas; Zeileis, Achim. 06 1900 (has links) (PDF)
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles, and then, usually, a constant is fitted in every partition. While this is a simple and intuitive approach, it may still lack interpretability as to how a specific relationship between dependent and independent variables looks. Alternatively, a certain model may be assumed or of interest, and there may be a number of candidate variables that non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models with recursive partitioning, which enhances the interpretability of classical trees and provides an exploratory way to assess a candidate variable's influence on a parametric model. This method conducts recursive partitioning of a generalized linear model by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, and (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree in which each terminal node is associated with a generalized linear model. We show the method's versatility and its suitability for gaining additional insight into the relationship between dependent and independent variables with two examples, modelling voting behaviour and a failure model for debt amortization. / Series: Research Report Series / Department of Statistics and Mathematics
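As a hedged illustration of the first example, a logistic (binomial GLM) tree for simulated voting data; the variable names and data-generating process are assumptions, not the paper's data.

```r
## Hedged sketch of a GLM tree for a hypothetical voting example: a logistic
## regression of vote choice on a policy attitude, partitioned over
## demographic candidate variables. Data and variable names are assumptions.
library(partykit)

set.seed(5)
n <- 800
d <- data.frame(attitude = rnorm(n),
                age      = runif(n, 18, 90),
                region   = factor(sample(c("urban", "rural"), n, replace = TRUE)))
## simulated truth: the attitude effect differs by region
eta <- with(d, ifelse(region == "urban", 1.5, -0.5) * attitude)
d$vote <- factor(rbinom(n, 1, plogis(eta)), labels = c("no", "yes"))

vt <- glmtree(vote ~ attitude | age + region, data = d, family = binomial)
plot(vt)   # one logistic regression per terminal node
coef(vt)   # region-specific attitude effects
```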
|
8 |
Clinical Prediction of Symptomatic Vasospasm in Aneurysmal Subarachnoid Hemorrhage / Lee, Hubert. January 2017 (has links)
Objective: This study aims to derive a clinically applicable decision rule to predict the risk of symptomatic vasospasm, a neurological deficit primarily due to abnormal narrowing of cerebral arteries supplying an attributable territory, in aneurysmal subarachnoid hemorrhage (SAH).
Methods: SAH patients presenting from 2002 to 2011 were analyzed using logistic regression and recursive partitioning to identify clinical, radiological, and laboratory features that predict the occurrence of symptomatic vasospasm.
Results: The incidence of symptomatic vasospasm was 21.0%. On multivariate logistic regression analysis, significant predictors of symptomatic vasospasm included age 40-59 years, high Modified Fisher Grade (Grades 3 and 4), and anterior circulation aneurysms.
Conclusion: Development of symptomatic vasospasm can be reliably predicted using a clinical decision rule created by logistic regression. It exhibits increased accuracy over the Modified Fisher Grade alone and may serve as a useful clinical tool to individualize vasospasm risk once prospectively validated in other neurosurgical centres.
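A hedged sketch of the modelling strategy only (all variable names and data are simulated placeholders, not the study cohort): a multivariable logistic model alongside a classification tree on admission characteristics.

```r
## Hedged sketch of the modelling strategy only, with simulated placeholder
## data: a multivariable logistic model and a classification tree on
## admission characteristics.
library(rpart)

set.seed(11)
n <- 400
sah <- data.frame(age      = round(runif(n, 18, 85)),
                  mfisher  = sample(0:4, n, replace = TRUE),
                  anterior = sample(c(TRUE, FALSE), n, replace = TRUE))
eta <- with(sah, -2.2 + 0.8 * (age >= 40 & age < 60) +
                  1.0 * (mfisher >= 3) + 0.6 * anterior)
sah$vasospasm <- factor(rbinom(n, 1, plogis(eta)), labels = c("no", "yes"))

fit_lr <- glm(vasospasm ~ I(age >= 40 & age < 60) + I(mfisher >= 3) + anterior,
              data = sah, family = binomial)                 # candidate decision rule
fit_rp <- rpart(vasospasm ~ age + mfisher + anterior, data = sah, method = "class")
round(predict(fit_lr, type = "response")[1:5], 2)            # individual predicted risks
```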
|
9 |
Predicting Bankruptcy Using Recursive Partitioning and a Realistically Proportioned Data Set / McKee, Thomas E.; Greenstein, Marilyn. 01 January 2000 (has links)
Auditors must assess their clients' ability to function as a going concern for at least the year following the financial statement date. The audit profession has been severely criticized for failure to 'blow the whistle' in numerous highly visible bankruptcies that occurred shortly after unmodified audit opinions were issued. Financial distress indicators examined in this study are one mechanism for making such assessments. This study measures and compares the predictive accuracy of an easily implemented two-variable bankruptcy model originally developed using recursive partitioning on an equally proportioned data set of 202 firms. In this study, we test the predictive accuracy of this model, as well as previously developed logit and neural network models, using a realistically proportioned set of 14,212 firms' financial data covering the period 1981-1990. The previously developed recursive partitioning model had an overall accuracy for all firms ranging from 95 to 97%, which outperformed both the logit model at 93 to 94% and the neural network model at 86 to 91%. The recursive partitioning model predicted the bankrupt firms with 33-58% accuracy. A sensitivity analysis of recursive partitioning cutting points indicated that a newly specified model could achieve an all-firm and a bankrupt-firm predictive accuracy of approximately 85%. Auditors will be interested in the Type I and Type II error tradeoffs revealed in a detailed sensitivity table for this easily implemented model.
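A hedged sketch of the ingredients only (simulated ratios, not the study's data or its two specific variables): a two-variable classification tree in which a loss matrix shifts the Type I / Type II error tradeoff that matters when bankrupt firms are rare.

```r
## Hedged sketch with simulated ratios (not the study's variables or data):
## a two-variable classification tree in which a loss matrix shifts the
## Type I / Type II error tradeoff for the rare bankrupt class.
library(rpart)

set.seed(9)
n <- 5000
firms <- data.frame(cash_to_debt  = rnorm(n, 0.5, 0.2),
                    profit_margin = rnorm(n, 0.05, 0.05))
p_bkr <- plogis(1 - 10 * firms$cash_to_debt - 10 * firms$profit_margin)
firms$status <- factor(ifelse(rbinom(n, 1, p_bkr) == 1, "bankrupt", "nonbankrupt"),
                       levels = c("nonbankrupt", "bankrupt"))

## loss matrix: misclassifying a truly bankrupt firm as nonbankrupt costs 10,
## a false alarm costs 1 (rows = true class, columns = predicted class)
fit <- rpart(status ~ cash_to_debt + profit_margin, data = firms, method = "class",
             parms = list(loss = matrix(c(0, 10, 1, 0), nrow = 2)))
table(predicted = predict(fit, type = "class"), actual = firms$status)
```

Re-running with different loss ratios traces out the kind of Type I / Type II tradeoff table mentioned above.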
|
10 |
Addressing the Variable Selection Bias and Local Optimum Limitations of Longitudinal Recursive Partitioning with Time-Efficient Approximations. January 2019 (has links)
Longitudinal recursive partitioning (LRP) is a tree-based method for longitudinal data. It takes a sample of individuals that were each measured repeatedly across time, and it splits them based on a set of covariates such that individuals with similar trajectories become grouped together into nodes. LRP does this by fitting a mixed-effects model to each node every time that it becomes partitioned and extracting the deviance, which is the measure of node purity. LRP is implemented using the classification and regression tree algorithm, which suffers from a variable selection bias and does not guarantee reaching a global optimum. Additionally, fitting mixed-effects models to each potential split only to extract the deviance and discard the rest of the information is a computationally intensive procedure. Therefore, in this dissertation, I address the high computational demand, the variable selection bias, and the local optimum problem. I propose three approximation methods that reduce the computational demand of LRP and, at the same time, allow for a straightforward extension to recursive partitioning algorithms that do not have a variable selection bias and can reach the global optimum solution. In the three proposed approximations, a mixed-effects model is fit to the full data, and the growth curve coefficients for each individual are extracted. Then, (1) a principal component analysis is fit to the set of coefficients and the principal component score is extracted for each individual, (2) a one-factor model is fit to the coefficients and the factor score is extracted, or (3) the coefficients are summed. The three methods result in each individual having a single score that represents the growth curve trajectory. Now that the outcome is a single score for each individual, any tree-based method may be used for partitioning the data and grouping the individuals together. Once the individuals are assigned to their final nodes, a mixed-effects model is fit to each terminal node with the individuals belonging to it.
I conduct a simulation study, where I show that the approximation methods achieve the proposed goals while maintaining a level of out-of-sample prediction accuracy similar to that of LRP. I then illustrate and compare the methods using an applied data set. / Dissertation/Thesis / Doctoral Dissertation Psychology 2019
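A hedged sketch of the first approximation with simulated longitudinal data (illustrative variable names, not the dissertation's code): one mixed-effects model for the full sample, per-person growth coefficients, a principal component score per person, and then an ordinary regression tree on that single score.

```r
## Hedged sketch of the first approximation with simulated longitudinal data:
## one mixed-effects model for the full sample, per-person growth
## coefficients, a principal component score per person, then an ordinary
## regression tree on that single score.
library(lme4)
library(rpart)

set.seed(2)
n_id <- 200
n_t  <- 5
d    <- data.frame(id = rep(1:n_id, each = n_t), time = rep(0:(n_t - 1), n_id))
covs <- data.frame(id = 1:n_id,
                   group = factor(sample(c("x", "y"), n_id, replace = TRUE)),
                   age   = runif(n_id, 10, 16))
d <- merge(d, covs, by = "id")
intc  <- rnorm(n_id, mean = 2, sd = 0.5)                 # person-specific intercepts
slope <- ifelse(covs$group == "x", 1.5, 0.5)             # group-dependent growth
d$y   <- intc[d$id] + slope[d$id] * d$time + rnorm(nrow(d), sd = 0.7)

## one mixed-effects model for the full data, individual growth coefficients
fit  <- lmer(y ~ time + (time | id), data = d)
beta <- coef(fit)$id                                     # per-id intercept and slope

## reduce each trajectory to a single principal component score
scores <- prcomp(beta, scale. = TRUE)$x[, 1]             # named by id
covs$score <- scores[as.character(covs$id)]

## any standard tree can now partition the score over the covariates
rpart(score ~ group + age, data = covs)
```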
|