Global ETD Search

1	Discussion on Fifty Years of Classification and Regression Trees. Rusch, Thomas; Zeileis, Achim. 12 1900. In this discussion paper, we argue that the literature on tree algorithms is very fragmented. We identify possible causes and discuss the good and bad sides of this situation. Among the latter is the lack of free open-source implementations for many algorithms. We argue that if the community adopts a standard of creating and sharing free open-source implementations of its algorithms, and makes these programs easy to access, the bad sides of the fragmentation will be actively combated, to the benefit of the whole scientific community. (authors' abstract)
2	Predicting emergency department events due to asthma : results from the BRFSS Asthma Call Back Survey 2006-2009. Chancellor, Courtney Marie. 05 December 2012. Identifying the asthma patients most at risk of experiencing an emergency department (ED) event is an important step toward lessening public health burdens in the United States. In this report, the CDC BRFSS Asthma Call Back Survey data from 2006 to 2009 are explored for potential factors for a predictive model. A metric for classifying the control level of asthma patients is constructed and applied. The data are then used to construct a predictive model for ED events with the rpart algorithm. / text. Keywords: Asthma; Predictive modeling; rpart; Regression trees
3	Unbiased Recursive Partitioning: A Conditional Inference Framework. Hothorn, Torsten; Hornik, Kurt; Zeileis, Achim. January 2004. Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but these lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, indicating the need for an unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node positive breast cancer and mammography experience are re-analyzed. / Series: Research Report Series / Department of Statistics and Mathematics
4	Ovarian tumor risk factors study in a south medical center in Taiwan. Wu, Wei-Wen. 05 July 2012. This study examines the main risk factors of ovarian tumors in order to determine the tumor type. Because the tumors are located in the pelvis and their symptoms are not obvious, ovarian tumors are difficult to detect. The symptoms are mostly swelling of the stomach or lower abdomen, which is often ignored. Ovarian cancer is less common than cervical cancer, but its mortality rate is the highest of all gynecologic diseases. The study uses statistical methods to analyze patients' risk factors, so that the tumor type can be determined and treated early in order to reduce the death rate. The data come from Kaohsiung Veterans General Hospital and are classified by tumor type based on ultrasound examinations and other relevant risk factors, such as age and tumor markers. The aim is a definite method for distinguishing among benign, borderline and malignant ovarian tumors, giving appropriate classification criteria for follow-up, surgery and tracking. To differentiate between malignant and non-malignant (benign and borderline) cases, we use the risk factors to construct classification and regression trees that help physicians determine the tumor type. Where a tumor is determined to be non-malignant, we use a logistic regression model, based on the degree of influence of the risk factors, to further classify it as benign or borderline. This process can determine tumor types precisely and can also determine the type of surgery, helping to decide whether a patient needs follow-up. Keywords: borderline; benign; classification and regression trees; malignant; logistic regression
5	Tree-based Models for Longitudinal Data. Liu, Dan. 16 June 2014. No description available. Keywords: Statistics; Longitudinal data; Classification and regression trees; Quadratic inference functions
6	Predicting rifle shooting accuracy from context and sensor data : A study of how to perform data mining and knowledge discovery in the target shooting domain / Prediktering av skytteträffsäkerhet baserat på kontext och sensordata. Pettersson, Max; Jansson, Viktor. January 2019. The purpose of this thesis is to develop an interpretable model that predicts which factors impacted a shooter's results. Experiment is our chosen research method. Our three independent variables are weapon movement, trigger pull force and heart rate. Our dependent variable is shooting accuracy. A random forest regression model is trained on the experiment data to produce predictions of shooting accuracy and to show the correlation between the independent and dependent variables. Our method shows that an increase in weapon movement, trigger pull force or heart rate decreases the predicted accuracy score. Weapon movement impacted shooting results the most, with 53.61%, while trigger pull force and heart rate impacted shooting results by 22.20% and 24.18% respectively. We have also shown that LIME can be a viable method for explaining how the measured factors impacted shooting results. The results from this thesis lay the groundwork for better training tools for target shooting using explainable prediction models with sensors. Keywords: Interpretability; Target shooting; Regression trees; Feature selection; Cross-validation; Computer Sciences; Datavetenskap (datalogi)
7	Classifying natural forests using LiDAR data / Klassificering av nyckelbiotoper med hjälp av LiDAR-data. Arvidsson, Simon; Gullstrand, Marcus. January 2019. In forestry, natural forests are forest areas with high biodiversity that are in need of preservation. The current mapping of natural forests is a tedious task requiring manual labor that could possibly be automated. In this paper we explore the main features used by a random forest algorithm to classify natural forest and managed forest in northern Sweden. The goal was to create a model with a substantial strength of agreement, meaning a Kappa value of 0.61 or higher, placing the model in the same range as models produced in previous research. We used raster data gathered from airborne LiDAR, combined with labeled sample areas, both supplied by the Swedish Forest Agency. Two experiments were performed with different features. Experiment 1 used features extracted using methods inspired by previous research, while Experiment 2 added further features to those. Of the total number of sample areas used (n=2882), 70% were used to train the models and 30% were used for evaluation. The result was a Kappa value of 0.26 for Experiment 1 and 0.32 for Experiment 2. The prominent features are those derived from canopy height, where the supplied data also had the highest resolution. Percentiles, kurtosis and canopy crown areas derived from the canopy height were shown to be the most important for classification. The results fell short of our goal, possibly indicating a range of flaws in the data used. The size of the sample areas and the resolution of the raster data are likely important factors when extracting features, playing a large role in the produced model's performance. Keywords: Geographic information systems; Classification and regression trees; Supervised learning by classification; Computer Systems; Datorsystem
8	Statistical Methods In Credit Rating. Sezgin, Ozge. 01 September 2006. Credit risk is one of the major risks that banks and financial institutions face. With the New Basel Capital Accord, banks and financial institutions have the opportunity to improve their risk management process by using the Internal Rating Based (IRB) approach. In this thesis, we focus on the internal credit rating process. First, a short overview of credit scoring techniques and validation techniques is given. Using a real data set on manufacturing firms obtained from a Turkish bank, default prediction models based on logistic regression, probit regression, discriminant analysis and classification and regression trees were built. To improve the performance of the models, the optimum sample for logistic regression was selected from the data set and taken as the model construction sample. In addition, guidance is given on how to convert continuous variables to ordered scaled variables to avoid the difference-in-scale problem. After the models were built, their performance on the whole data set, including both in-sample and out-of-sample data, was evaluated with the validation techniques suggested by the Basel Committee. In most cases the classification and regression trees model dominates the other techniques. After the credit scoring models were constructed and evaluated, the cut-off values used to map the probability of default obtained from logistic regression to rating classes were determined with dual objective optimization. The cut-off values that gave the maximum area under the ROC curve and the minimum mean square error of the regression tree were taken as the optimum thresholds after 1000 simulations. Keywords: Credit Rating, Classification and Regression Trees, ROC curve, Pietra Index. QA General 15707
9	Applying Classification and Regression Trees to manage financial risk. Martin, Stephen Fredrick. 16 August 2012. The goal of this project is to develop a set of business rules to mitigate risk related to a specific financial decision within the prepaid debit card industry. Under certain circumstances, issuers of prepaid debit cards may need to decide whether funds on hold can be released early for use by card holders prior to the final transaction settlement. After a brief introduction to the prepaid card industry and the financial risk associated with the early release of funds on hold, the paper presents the motivation for applying the CART (Classification and Regression Trees) method. The paper provides a tutorial on the CART algorithms formally developed by Breiman, Friedman, Olshen and Stone in the monograph Classification and Regression Trees (1984), as well as a detailed explanation of the R programming code to implement the RPART function (Therneau 2010). Special attention is given to parameter selection and the process of finding an optimal solution that balances complexity against predictive classification accuracy when measured against an independent data set through a cross validation process. Lastly, the paper presents an analysis of the financial risk mitigation based on the resulting business rules. / text. Keywords: CART; Classification and Regression Trees; Breiman; Risk; Prepaid; Debit cards; Rollback; R; RPART; Cross validation
10	Habitat determinants and predatory interactions of the endemic freshwater crayfish (koura, Paranephrops planifrons) in the lower North Island, New Zealand : a thesis presented in partial fulfillment of the requirements for the degree of Masters of Science in Ecology at Massey University, Palmerston North, New Zealand. Brown, Logan Arthur. January 2009. A study in the lower North Island located Paranephrops planifrons (koura) at 73 of the 104 sites visited (Appendix 1). There was a significant difference in habitat variables between the sites where koura were present and those where they were absent. Examples of sites are shown in Appendix 3. Habitat variables important for classifying koura habitat included riparian cover, predators, winter equilibrium temperature, and the presence of in-stream habitat in the form of vegetation, litter cover and the stream sequence composition. The regression trees built could accurately describe the data, but the kappa statistic was low. Keywords: habitat variables; regression trees
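Entries 7 and 10 both judge their tree models by the kappa statistic, and entry 7 treats a value of 0.61 or higher as substantial agreement. The following is a minimal Python sketch of how Cohen's kappa can be computed from a confusion matrix. The function name and the example counts are hypothetical and are not taken from either thesis.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no observations")

    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class counts (for illustration only).
example = [[40, 10],
           [20, 30]]
print(round(cohen_kappa(example), 2))
```

With these illustrative counts, 70% of cases are classified correctly, yet kappa comes out near 0.40 because half of that agreement is expected by chance. This is the sense in which a model can describe the data reasonably well and still have a low kappa.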
