1 |
Discussion on Fifty Years of Classification and Regression Trees
Rusch, Thomas; Zeileis, Achim. 12 1900
In this discussion paper, we argue that the literature on tree algorithms is very fragmented. We identify possible causes and discuss the good and bad sides of this situation. Among the latter is the lack of free open-source implementations for many algorithms. We argue that if the community adopts a standard of creating and sharing free open-source implementations of the algorithms it develops, and makes these programs easy to access, the bad sides of the fragmentation will be actively combated, to the benefit of the whole scientific community. (authors' abstract)
|
2 |
Predicting emergency department events due to asthma: results from the BRFSS Asthma Call Back Survey 2006-2009
Chancellor, Courtney Marie. 05 December 2012
The identification of asthma patients most at risk of experiencing an emergency department (ED) event is an important step toward lessening public health burdens in the United States. In this report, the CDC BRFSS Asthma Call Back Survey data from 2006 to 2009 are explored for potential predictors to include in a model. A metric for classifying the control level of asthma patients is constructed and applied. The data are then used to construct a predictive model for ED events with the rpart algorithm.
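rpart grows classification trees by exhaustive split search, and its default impurity measure for classification is the Gini index. The sketch below shows only the split-search step for one numeric predictor (no pruning, no surrogate splits). The predictor `x` (a hypothetical asthma-control score) and the ED-event labels `y` are made-up illustration data, not values from the survey.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted_impurity) for the split x <= t that
    minimises the size-weighted Gini impurity of the two child nodes."""
    n = len(y)
    best = (None, gini(y))  # "no split" baseline: impurity of the parent
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical control score (0 = well controlled .. 3 = poorly controlled)
# and whether the respondent had an ED event (1) or not (0).
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_split(x, y))  # (1, 0.1875)
```

A full tree applies this search to every predictor, keeps the best split, and recurses into both children until a stopping or pruning rule ends growth.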
|
3 |
Unbiased Recursive Partitioning: A Conditional Inference Framework
Hothorn, Torsten; Hornik, Kurt; Zeileis, Achim. January 2004
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as that of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by both approaches are structurally different, indicating the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored and multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer and mammography experience are re-analyzed.
Series: Research Report Series / Department of Statistics and Mathematics
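The core idea of separating variable selection from split search can be sketched as follows: test each covariate for association with the response, adjust the p-values for multiplicity, pick the covariate with the smallest adjusted p-value, and stop if none is significant. This is a simplified illustration, not the framework's actual test statistics: it assumes absolute Pearson correlation as the association measure, a Monte Carlo permutation test, and a Bonferroni adjustment. The names `abs_corr`, `perm_pvalue` and `select_variable` are hypothetical.

```python
import random

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def perm_pvalue(x, y, rng, n_perm=999):
    """Monte Carlo permutation p-value for H0: x is independent of y."""
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Pick the covariate with the smallest Bonferroni-adjusted p-value.

    Returns (name, adjusted_p); name is None when no covariate is
    significant at level alpha, which acts as the stopping rule."""
    rng = random.Random(seed)
    m = len(covariates)
    adjusted = {name: min(1.0, m * perm_pvalue(x, y, rng, n_perm))
                for name, x in covariates.items()}
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] <= alpha else None), adjusted[best]
```

Because a covariate is chosen by a p-value rather than by the best achievable split, a covariate with many distinct values gains no advantage merely from offering more candidate cut points; the split point is searched only afterwards, within the selected covariate.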
|
4 |
Ovarian tumor risk factors study in a south medical center in Taiwan
Wu, Wei-Wen. 05 July 2012
This study examines the main risk factors of ovarian tumors in order to determine the tumor type. Because ovarian tumors are located in the pelvis, their symptoms are not obvious and the tumors are difficult to detect; the symptoms are mostly swelling of the stomach or lower abdomen, which is often ignored. Ovarian cancer is less common than cervical cancer, but its mortality rate is the highest of all gynecologic diseases. The study uses statistical methods to analyze patients' risk factors in order to determine the tumor type and enable early treatment, with the aim of reducing the death rate. The data come from Kaohsiung Veterans General Hospital; the cases are classified by tumor type based on ultrasound examinations and other relevant risk factors, such as age and tumor markers. The goal is a method for distinguishing among benign, borderline and malignant ovarian tumors and appropriate classification criteria for follow-ups, surgeries and tracking. To differentiate between malignant and non-malignant (benign and borderline) cases, we use the risk factors to construct classification and regression trees that help physicians determine the tumor type. When a tumor is determined to be non-malignant, we use a logistic regression model, based on the degree of influence of the risk factors, to further classify it as benign or borderline. This process can determine tumor types precisely, can inform the choice of surgery type, and helps determine whether patients need a follow-up.
|
5 |
The Application of Atheoretical Regression Trees to Problems in Time Series Analysis
Rea, William Stanley. January 2008
This thesis applies Atheoretical Regression Trees (ART) to the problem of locating changes in mean in a time series when the number and location of those changes are unknown. We undertook an extensive simulation study of ART's performance on a range of time series. We found ART to be a useful addition to currently established structural break methodologies such as the CUSUM and the procedure of Bai and Perron. ART was found to be useful in the analysis of long time series, which are not practical to analyze with the optimal procedure of Bai and Perron.
ART was applied to a long-standing problem in the analysis of long memory time series. We propose two new methods based on ART for distinguishing between true long memory and spurious long memory due to structural breaks. These methods are fundamentally different from current tests and procedures intended to discriminate between the two sets of competing models. The methods were subjected to a simulation study and shown to be effective in discriminating between simple regime-switching models and fractionally integrated processes.
We applied the new methods to 16 realized volatility series and concluded that they were not fractionally integrated series. All 16 series had mean shifts, some of which could be identified with historical events. We also applied the new methods to a range of geophysical time series and concluded that they were not fractional Gaussian noises. All of the series examined had mean shifts, some of which could be identified with known climatic changes.
We conclude that our new methods are a significant advance in model discrimination in long memory series.
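As we understand it, ART fits an ordinary regression tree whose only covariate is the time index, so every split is a candidate break date and every leaf is a regime with its own mean. The sketch below assumes a simple stopping rule (maximum depth, minimum segment length `min_size`, and a hypothetical minimum reduction in squared error `min_gain`) in place of the pruning a full tree implementation would apply.

```python
def sse(seg):
    """Sum of squared deviations from the segment mean."""
    if not seg:
        return 0.0
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def art_breaks(series, start=0, min_size=5, min_gain=0.0, depth=3):
    """Break dates from a regression tree on the time index.

    Each split chooses the date k minimising SSE(left) + SSE(right);
    recursion stops at max depth, short segments, or too little gain."""
    n = len(series)
    if depth == 0 or n < 2 * min_size:
        return []
    parent = sse(series)
    best_k, best_cost = None, parent
    for k in range(min_size, n - min_size + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    if best_k is None or parent - best_cost <= min_gain:
        return []
    left = art_breaks(series[:best_k], start, min_size, min_gain, depth - 1)
    right = art_breaks(series[best_k:], start + best_k, min_size, min_gain, depth - 1)
    return sorted(left + [start + best_k] + right)

# Noise-free illustration: two mean shifts, at t = 10 and t = 20.
print(art_breaks([0.0] * 10 + [3.0] * 10 + [0.0] * 10))  # [10, 20]
```

On noisy data `min_gain` (or a proper pruning rule) matters: with `min_gain=0.0` the tree keeps splitting noise until it reaches the depth limit.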
|
6 |
Tree-based Models for Longitudinal Data
Liu, Dan. 16 June 2014
No description available.
|
7 |
Urychlení evolučních algoritmů pomocí rozhodovacích stromů a jejich zobecnění / Accelerating evolutionary algorithms by decision trees and their generalizations
Klíma, Jan. January 2011
Evolutionary algorithms are among the most successful methods for solving non-traditional optimization problems. Because they use only values of the objective function, evolutionary algorithms converge much more slowly than optimization methods for smooth functions. This is particularly disadvantageous when values of the objective function must be obtained empirically, which is costly and time-consuming. However, evolutionary algorithms can be substantially sped up by employing a sufficiently accurate regression model of the empirical objective function. This thesis surveys the usability of regression trees and their ensembles as surrogate models for accelerating the convergence of evolutionary optimization.
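One common way to use a regression-tree surrogate is pre-screening: generate many offspring, rank them with the cheap tree, and spend a true (costly) evaluation only on the most promising one. The sketch below is a minimal illustration under assumptions of our own: a tiny depth-limited CART-style regression tree, a (1+λ) evolution strategy with fixed step size `sigma`, and a test function standing in for the empirical objective. It is not necessarily the scheme studied in the thesis.

```python
import random

def _sse(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v)

def fit_tree(X, y, depth=3, min_leaf=2):
    """Tiny regression tree as nested dicts; each leaf holds the mean of y."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:
            left = [v for x, v in zip(X, y) if x[j] <= t]
            right = [v for x, v in zip(X, y) if x[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return mean
    _, j, t = best
    le = [(x, v) for x, v in zip(X, y) if x[j] <= t]
    gt = [(x, v) for x, v in zip(X, y) if x[j] > t]
    return {"j": j, "t": t,
            "le": fit_tree([x for x, _ in le], [v for _, v in le], depth - 1, min_leaf),
            "gt": fit_tree([x for x, _ in gt], [v for _, v in gt], depth - 1, min_leaf)}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["le"] if x[tree["j"]] <= tree["t"] else tree["gt"]
    return tree

def surrogate_es(f, x0, sigma=0.5, lam=20, generations=30, seed=1):
    """(1+lambda)-ES: the tree ranks offspring, only the top one is evaluated."""
    rng = random.Random(seed)
    archive_X, archive_y = [list(x0)], [f(x0)]
    parent, f_parent = list(x0), archive_y[0]
    for _ in range(generations):
        offspring = [[xi + rng.gauss(0, sigma) for xi in parent] for _ in range(lam)]
        if len(archive_y) >= 4:  # need a few true evaluations before trusting the tree
            tree = fit_tree(archive_X, archive_y)
            offspring.sort(key=lambda c: predict(tree, c))
        child = offspring[0]
        f_child = f(child)  # the only costly evaluation in this generation
        archive_X.append(child)
        archive_y.append(f_child)
        if f_child <= f_parent:  # elitist replacement
            parent, f_parent = child, f_child
    return parent, f_parent, len(archive_y)

sphere = lambda x: sum(v * v for v in x)  # stand-in for a costly objective
print(surrogate_es(sphere, [2.0, 2.0]))
```

The design point is the evaluation budget: every generation samples 20 candidates but calls `f` once, so 30 generations cost 31 true evaluations instead of 601.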
|
8 |
Predicting rifle shooting accuracy from context and sensor data: A study of how to perform data mining and knowledge discovery in the target shooting domain / Prediktering av skytteträffsäkerhet baserat på kontext och sensordata
Pettersson, Max; Jansson, Viktor. January 2019
The purpose of this thesis is to develop an interpretable model that predicts which factors impacted a shooter's results. We chose experimentation as our research method. Our three independent variables are weapon movement, trigger pull force and heart rate; our dependent variable is shooting accuracy. A random forest regression model is trained on the experiment data to predict shooting accuracy and to show the correlation between the independent and dependent variables. Our method shows that an increase in weapon movement, trigger pull force or heart rate decreases the predicted accuracy score. Weapon movement impacted shooting results the most (53.61%), while trigger pull force and heart rate accounted for 22.20% and 24.18%, respectively. We have also shown that LIME can be a viable method for explaining how the measured factors impacted shooting results. The results of this thesis lay the groundwork for better target-shooting training tools that combine sensors with explainable prediction models.
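The abstract does not say how the impact percentages were computed; shares that sum to 100% are typical of normalized feature importances. As one model-agnostic way to obtain such shares, the sketch below computes permutation importance (the increase in mean squared error when one feature column is shuffled) and rescales it to percentages. The linear `model` is a hypothetical stand-in for a trained random forest, and the data are synthetic.

```python
import random

def mse(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def permutation_importance_pct(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when one feature is shuffled, as % of the total."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    raw = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            Xp = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, col)]
            increase += mse(model, Xp, y) - base
        raw.append(max(increase / n_repeats, 0.0))  # clip noise below zero
    total = sum(raw)
    return [100 * r / total for r in raw] if total > 0 else raw

# Hypothetical features: weapon movement, trigger pull force, heart rate.
rng = random.Random(42)
X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(50)]
model = lambda x: 100 - 3 * x[0] - 1 * x[1]  # stand-in model ignores heart rate
y = [model(x) for x in X]
print(permutation_importance_pct(model, X, y))
```

Unlike impurity-based importances, which are specific to tree ensembles, this works with any fitted model, at the cost of extra predictions per feature.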
|
9 |
Classifying natural forests using LiDAR data / Klassificering av nyckelbiotoper med hjälp av LiDAR-data
Arvidsson, Simon; Gullstrand, Marcus. January 2019
In forestry, natural forests are forest areas with high biodiversity that are in need of preservation. Mapping natural forests is currently a tedious manual task that could possibly be automated. In this paper we explore the main features used by a random forest algorithm to classify natural forest and managed forest in northern Sweden. The goal was a model with substantial strength of agreement, meaning a Kappa value of 0.61 or higher, which would place it in the same range as models produced in previous research. We used raster data gathered from airborne LiDAR, combined with labeled sample areas, both supplied by the Swedish Forest Agency. Two experiments were performed with different features: Experiment 1 used features extracted with methods inspired by previous research, while Experiment 2 extended that feature set with additional features. Of the sample areas used (n=2882), 70% were used to train the models and 30% for evaluation. The result was a Kappa value of 0.26 for Experiment 1 and 0.32 for Experiment 2. The most prominent features were those derived from canopy height, which was also the supplied data with the highest resolution. Percentiles, kurtosis and canopy crown areas derived from canopy height proved the most important for classification. The results fell short of our goal, possibly indicating flaws in the data used. The size of the sample areas and the resolution of the raster data are likely important factors when extracting features and play a large role in the resulting model's performance.
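Cohen's Kappa measures agreement beyond chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e the agreement expected if predictions and labels were independent with the same class frequencies. The sketch below computes it, together with a shuffled 70/30 split of the kind described above. The helper names and the fixed `seed` are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's Kappa for two label sequences; undefined if p_e == 1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def train_test_split(items, train_frac=0.7, seed=0):
    """Shuffle, then cut into train/test parts of roughly train_frac : rest."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# 80% raw accuracy, but with 4/6 class balance chance agreement is 0.52,
# so kappa = (0.80 - 0.52) / 0.48, about 0.58: below the 0.61 target.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(cohens_kappa(y_true, y_pred))

train, test = train_test_split(range(2882))
print(len(train), len(test))  # 2017 865
```

The worked example shows why Kappa is stricter than accuracy: a classifier can look respectable in raw accuracy while much of that agreement is attributable to chance.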
|
10 |
Statistical Methods in Credit Rating
Sezgin, Ozge. 01 September 2006
Credit risk is one of the major risks that banks and financial institutions face. Under the New Basel Capital Accord, banks and financial institutions have the opportunity to improve their risk management process by using the Internal Ratings-Based (IRB) approach. In this thesis, we focused on the internal credit rating process. First, a short overview of credit scoring and validation techniques was given. Using a real data set on manufacturing firms obtained from a Turkish bank, default prediction models were built with logistic regression, probit regression, discriminant analysis, and classification and regression trees. To improve the performance of the models, the optimum sample for logistic regression was selected from the data set and taken as the model construction sample. In addition, we described how to convert continuous variables to ordered (ordinal) variables to avoid problems caused by differences in scale. After the models were built, their performance on the whole data set, both in-sample and out-of-sample, was evaluated with validation techniques suggested by the Basel Committee. In most cases, the classification and regression tree model dominated the other techniques. After the credit scoring models were constructed and evaluated, the cut-off values used to map the probabilities of default obtained from logistic regression to rating classes were determined with dual-objective optimization. The cut-off values that gave the maximum area under the ROC curve and the minimum mean squared error of the regression tree were taken as the optimum thresholds after 1000 simulations.
Keywords: Credit Rating, Classification and Regression Trees, ROC curve, Pietra Index
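Two building blocks of the rating step can be sketched directly: the area under the ROC curve, computed as the probability that a randomly chosen defaulter receives a higher probability of default (PD) than a randomly chosen non-defaulter (the Mann-Whitney form, ties counted as one half), and the mapping of a PD to a rating class through ascending cut-off values. The cut-offs and grade labels below are hypothetical; the thesis chose its cut-offs by dual-objective optimization, which is not reproduced here.

```python
import bisect

def auc(scores, labels):
    """P(random defaulter's PD > random non-defaulter's PD); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def to_rating(prob_default, cutoffs, grades):
    """Map a PD to a grade; cutoffs ascending, len(grades) == len(cutoffs) + 1.

    A PD exactly on a cut-off falls into the worse (higher-PD) class."""
    return grades[bisect.bisect_right(cutoffs, prob_default)]

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75

cutoffs = [0.01, 0.05, 0.2]      # hypothetical PD thresholds
grades = ["A", "B", "C", "D"]   # hypothetical rating classes, best to worst
print([to_rating(p, cutoffs, grades) for p in (0.005, 0.03, 0.05, 0.5)])
```

The pairwise AUC is O(n_pos * n_neg), which is fine for illustration; for large portfolios a rank-based computation after sorting the scores gives the same value in O(n log n).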
|