1. Classification and Regression Trees in R
Nemčíková, Lucia, January 2014
Tree-based methods are a useful complement to traditional statistical methods for solving classification and regression problems. The aim of this master's thesis is not to judge which approach is better but rather to give an overview of these methods and apply them to real data using R. The focus is especially on the basic methodology of tree-based models and on their application in specific software, in order to give the reader a wide range of tools for using these methods. One part of the thesis also covers advanced tree-based methods to provide a full picture of the possibilities.
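As an illustration of the basic methodology: a CART-style regression tree is grown by greedily choosing, at each node, the split that most reduces the residual sum of squares. The following is a minimal Python sketch of that split search on a single numeric predictor. It is illustrative only and not code from the thesis (which works in R); the function name `best_split` is hypothetical, and pruning, categorical predictors, and the recursion over child nodes are left out.

```python
def best_split(x, y):
    """Find the threshold on x that minimizes the total squared error of y.

    Candidate thresholds are midpoints between consecutive distinct sorted
    x values; each split (x <= threshold vs. x > threshold) is scored by the
    sum of within-child squared deviations from the child mean, the CART
    criterion for regression. Returns (threshold, sse), or (None, inf) if
    no split exists.
    """
    def sse(values):
        # Sum of squared deviations from the mean of `values` (never empty here).
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


# Two clearly separated groups of responses: the best cut falls between x=3 and x=10.
threshold, err = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
print(threshold)  # 6.5
```

Applying the same search to every predictor, picking the overall best, and recursing into each child until a stopping rule is met yields the full tree.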

2. High-dimensional classification and attribute-based forecasting
Lo, Shin-Lian, 27 August 2010
This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments; the second part deals with forecasting problems with a large number of categories in the predictors. Classification problems in microarray experiments involve discriminating between subjects with different biological phenotypes or known tumor subtypes, as well as predicting the clinical outcomes or prognostic stages of subjects. One important characteristic of microarray data is that the number of genes is much larger than the sample size. Penalized logistic regression is known for performing variable selection and classification simultaneously; however, its performance declines as the number of variables increases. With this concern in mind, in the first study we propose a new classification approach that applies penalized logistic regression iteratively to gene subsets of controlled size, in order to maintain both variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. In this setting the assumption of independent observations is violated, so most existing classification methods, including penalized logistic regression, cannot be applied directly. To address this, we propose a new classification method that incorporates random effects into penalized logistic regression, so that both the heterogeneity among experimental subjects and the correlation among repeated measurements are taken into account. An efficient hybrid algorithm is introduced to tackle the computational challenges of estimation and integration. In an application to a breast cancer study, the proposed method yields smaller models with higher prediction accuracy than the method that assumes independent observations.
The second part of this thesis develops a new forecasting approach for large-scale datasets with a large number of predictor categories and with structure among the predictors. Going beyond conventional tree-based methods, the new approach incorporates a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. In an empirical study in the air cargo industry and a simulation study covering several different settings, the new approach achieves higher forecasting accuracy and higher computational efficiency than existing tree-based methods.
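The first part of this abstract rests on penalized logistic regression, which performs variable selection and classification at once. The abstract does not name the penalty; a common choice is the L1 (lasso) penalty, and the minimal Python sketch below assumes it, fitting the model by proximal gradient descent so that weak coefficients are set exactly to zero. This is only the plain baseline, not the iterative gene-subset procedure or the random-effects extension the thesis proposes, and the names `l1_logistic` and `soft_threshold` and the defaults for `step` and `iters` are hypothetical choices for illustration.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t, snapping to 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_logistic(X, y, lam, step=0.5, iters=5000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes (1/n) * sum_i [log(1 + exp(eta_i)) - y_i * eta_i] + lam * sum_j |w_j|,
    where eta_i = b + x_i . w and each y_i is 0 or 1. The intercept b is not
    penalized. `step` must not exceed 1/L, where L is the Lipschitz constant of
    the gradient of the average log-loss; 0.5 is safe for the +/-1 example below.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        # Predicted probabilities under the current coefficients.
        probs = [1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))
                 for row in X]
        resid = [pi - yi for pi, yi in zip(probs, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * grad_b  # plain gradient step for the unpenalized intercept
        # Gradient step, then soft-thresholding: this is what produces exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return b, w


# Column 0 is informative (y = 1 in 2/3 of rows with x0 = 1, in 1/3 with x0 = -1);
# column 1 is uninformative noise.
X = [[1, 1], [1, -1], [1, 1], [1, -1], [1, 1], [1, -1],
     [-1, 1], [-1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(l1_logistic(X, y, lam=0.01))  # w[0] shrunk slightly below log(2); w[1] is 0.0
print(l1_logistic(X, y, lam=1.5))   # every weight in w is exactly 0.0
```

With `lam=1.5` every weight is guaranteed to stay at zero: since each feature has magnitude at most 1, each gradient component is at most 1 in absolute value, so the threshold `step * lam` always exceeds the gradient step. Proximal gradient is used here for transparency; in practice a dedicated solver such as the glmnet package in R would typically be used.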