Global ETD Search

1	Ensemble learning for ranking interesting attributes Sinsel, Erik W. January 2005 Thesis (M.S.)--West Virginia University, 2005. / Title from document title page. Document formatted into pages; contains viii, 81 p. : ill. Includes abstract. Includes bibliographical references (p. 72-74). Machine learning. Decision trees.
2	Computational complexity analysis of decision tree algorithms Sani, Habiba M., Lei, Ci, Neagu, Daniel 16 November 2018 Yes / The decision tree is a simple but powerful learning technique, considered one of the best-known learning algorithms and used successfully in practice for various classification tasks. Decision trees have the advantage of producing a comprehensible classification model with satisfactory accuracy in several application domains. In recent years, the volume of data available for learning has been increasing dramatically. As a result, many application domains face large amounts of data, which poses a major bottleneck for the computability of learning techniques. The decision tree has been implemented in different ways using different techniques. In this paper, we theoretically and experimentally study and compare the computational power of the most common classical top-down decision tree algorithms (C4.5 and CART). This work can serve as part of a review of the computational complexity of existing decision tree classifier algorithms, giving an understanding of their operational steps with the aim of optimizing the learning algorithm for large datasets. Classification Decision trees Complexity
3	Efficient decision tree building algorithms for uncertain data Tsang, Pui-kwan, Smith, 曾沛坤. January 2008 published_or_final_version / Computer Science / Master / Master of Philosophy Decision trees. Data mining. Algorithms.
4	A Monte-Carlo approach to tool selection for sheet metal punching and nibbling Summad, Emad January 2001 Selecting the best set of tools to produce certain geometrical shapes/features in sheet metal punching is a problem that greatly affects product development time, cost and achieved quality. The current trend is, wherever possible, to limit design to standard tools. This makes selecting the appropriate set of tools even more complex, especially since sheet metal features can have a wide range of complex shapes. Limited tool rack capacity adds another dimension of complexity. Thus, an inappropriate tool selection strategy leads to punching inefficiency and may require frequently stopping the machine to replace tools, which is expensive and time consuming. This work demonstrates that selecting the best set of tools is in fact a search of an explosively large decision tree. The difficulty in searching such decision trees is that intermediate decisions do not necessarily reflect the total cost implication of carrying them out. This thesis introduces a new approach to this complex optimisation problem using Monte Carlo simulation methods. The aim of the work was to establish Monte Carlo methods as an "assumptions or rule free" baseline or benchmark for assessing search strategies. A number of case studies demonstrate the feasibility of Monte Carlo simulation as an efficient and viable way to optimise such a complex problem. Using a Monte Carlo approach to select the best set of punching tools revealed an interesting point: the effect of dominant "one-to-one" feature/tool matches on the efficiency of the search.
This naturally led to the need for a search methodology more efficient than the Monte Carlo method alone. The thesis presents some speculations on a hybrid approach to tool selection that would achieve a better solution than the Monte Carlo method alone, reaching the optimum in a shorter time. 670 Decision trees; Simulation; Tools
5	The development of an objective methodology for the prediction of helicopter pilot workload MacDonald, Calum Angus January 2001 No description available. 510 Decision trees; Rule induction
6	Text Document Categorization by Machine Learning Sendur, Zeynel 01 January 2008 Because of the explosion of digital and online text information, automatic organization of documents has become a very important research area. There are two main machine learning approaches to organizing digital documents. One is the supervised approach, in which predefined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents; the other is the unsupervised approach, which requires no human intervention or labeled documents at any point in the process. In this thesis, we concentrate on the supervised learning task of document classification. One of the most important tasks of information retrieval is to induce classifiers capable of categorizing text documents. The same document can belong to two or more categories; this situation is referred to as multi-label classification. Multi-label classification domains have been encountered in diverse fields. Most existing machine learning techniques for multi-label classification are extremely expensive, since documents are characterized by an extremely large number of features. In this thesis, we try to reduce these computational costs by applying different types of algorithms to documents characterized by a large number of features. Another goal of this thesis is to achieve the highest possible accuracy while maintaining high computational performance in text document categorization.
7	Model-based decision trees for ranking data Lee, Hong, 李匡 January 2010 published_or_final_version / Statistics and Actuarial Science / Doctoral / Doctor of Philosophy Ranking and selection (Statistics) Decision trees.
8	Efficient decision tree building algorithms for uncertain data Tsang, Pui-kwan, Smith. January 2008 Thesis (M. Phil.)--University of Hong Kong, 2009. / Includes bibliographical references (leaves 84-88) Also available in print. Decision trees. Data mining. Algorithms.
9	Mining Shared Decision Trees between Datasets Han, Qian 07 June 2010 No description available. Computer Science Shared Decision Trees
10	Robust Statistical Approaches Dealing with High-Dimensional Observational Data Zhu, Huichen January 2019 The theme of this dissertation is the development of robust statistical approaches for high-dimensional observational data. Technological development has made data sets more accessible than at any other time in history. Abundant data leads to numerous appealing findings but, at the same time, demands more careful analysis. We encounter many obstacles when dealing with high-dimensional data. Heterogeneity and complex interaction structures rule out traditional mean regression and call for a novel approach that circumvents this complexity and yields significant conclusions. The missing data mechanism in high-dimensional data is complicated and hard to handle with existing methods. This dissertation contains three parts that tackle these obstacles: (1) a tree-based method integrated with domain knowledge to improve prediction accuracy; (2) a tree-based method with linear splits to accommodate large-scale, highly correlated data sets; (3) an integrative analysis method that simultaneously reduces the dimension and imputes block-wise missing data. In the first part of the dissertation, we propose a tree-based method called conditional quantile random forest (CQRF) to improve the screening and intervention of the onset of mental disorder, incorporating rich and comprehensive electronic medical records (EMR). Our research is motivated by the REactions to Acute Care and Hospitalization (REACH) study, an ongoing prospective observational cohort study of patients with symptoms of a suspected acute coronary syndrome (ACS). We aim to develop a robust and effective statistical prediction method. The proposed approach fully takes population heterogeneity into account: we partition the sample space guided by quantile regression over the entire quantile process.
The proposed CQRF can provide more comprehensive and accurate predictions. We also provide theoretical justification for the estimated quantile process. In the second part of the dissertation, we apply the proposed CQRF to the REACH data set. The predictive analysis shows that, for both the entire sample and the high-risk group, the proposed CQRF provides more accurate predictions than other existing, widely used methods. The variable importance scores based on the proposed CQRF give a promising result: they identify two variables that a qualitative study has shown to be critical features. We also apply the proposed CQRF to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study data set and show that it improves personalized treatment recommendations compared with an existing treatment recommendation method. We also conduct two simulation studies based on the two real data sets; both validate the consistency of the estimated quantile process. In the second part, we also extend the proposed CQRF from univariate splits to linear splits to accommodate a large number of highly correlated variables. Gene-environment interaction is a topic of wide interest, since the traits of complex diseases are difficult to understand and we are eager to find interventions tailored to individual genetic variation. The proposed approach is applied to a Breast Cancer Family Registry (BCFR) study data set with body mass index (BMI) as the response variable, along with several nutrition intake factors and genotype variables. We aim to identify which genetic variations affect the heterogeneous effect of the environmental factors on BMI. To determine the optimal linear-combination split, we devise a criterion that measures the relationship between the response variable and gene variants conditional on the environmental factor.
The variable importance score is calculated by summing this criterion across all splits in the random forest. The results show that the effect of the environmental factors on BMI differs across the top-ranked genes prioritized by the proposed importance scores. In the third part, we introduce an integrative analysis approach called generalized integrative principal component analysis (GIPCA). Heterogeneous data types and the presence of block-wise missing data pose significant challenges to integrating multi-source data and to further statistical analysis. No existing method in the literature easily accommodates data of multiple types with a block-wise missing structure. The proposed GIPCA is a low-rank method that simultaneously performs dimension reduction and imputation of block-wise missing data for data of multiple types. Both a simulation study and a real data analysis show that the proposed approach achieves good imputation accuracy for missing data and identifies some meaningful signals. Biometry Biometry--Data processing Statistics Decision trees
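
Several records above (notably record 2, which compares C4.5 and CART) turn on the cost of top-down decision tree induction. As an illustration only, and not code from any of the listed works, here is a minimal sketch of a CART-style split search on a single numeric attribute, assuming the Gini impurity criterion (C4.5 uses gain ratio instead). The function names `gini` and `best_numeric_split` are hypothetical. It shows why sorting dominates the cost: after one O(n log n) sort, every candidate threshold is scored in a single sweep by updating class counts incrementally.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity 1 - sum_k p_k^2 of a node, given its class counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(values, labels):
    """Best binary split "x <= t" on one numeric attribute (Gini criterion).

    Returns (threshold, weighted_gini); threshold is None when no split
    beats the unsplit node. Sorting costs O(n log n); the sweep costs
    O(n * k) for k classes because class counts are updated incrementally.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    left = Counter()
    right = Counter(label for _, label in pairs)
    best = (None, gini(right, n))  # baseline: impurity of the parent node
    for i in range(n - 1):
        x, y = pairs[i]
        left[y] += 1   # move one example from the right child to the left
        right[y] -= 1
        x_next = pairs[i + 1][0]
        if x == x_next:
            continue  # cannot cut between equal attribute values
        n_left, n_right = i + 1, n - i - 1
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((x + x_next) / 2, score)  # midpoint threshold
    return best


if __name__ == "__main__":
    print(best_numeric_split([4, 1, 3, 2], ["b", "a", "b", "a"]))
```

Running the script prints `(2.5, 0.0)`: the cut between 2 and 3 separates the two classes perfectly. A full tree builder would repeat this for every attribute at every node, which is where the per-node cost in such complexity analyses comes from.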
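
Record 4 above uses Monte Carlo simulation as an "assumptions or rule free" baseline for choosing punching tools under a limited rack capacity. The record does not give the thesis's cost model, so the following is a hypothetical sketch: it assumes each feature can be produced by some tools at a known per-tool cost, that a rack holds exactly `capacity` tools, and that a rack is infeasible if some feature has no loaded tool. The names `rack_cost` and `monte_carlo_rack` are illustrative. Random racks are sampled and the cheapest one seen is kept, with no search heuristics at all.

```python
import math
import random


def rack_cost(rack, feature_costs):
    """Total cost of producing every feature with only the tools in `rack`.

    Hypothetical model: feature_costs maps feature -> {tool: cost}. Each
    feature uses its cheapest loaded tool; if none can produce it, the
    rack is infeasible and the cost is infinite.
    """
    total = 0.0
    for options in feature_costs.values():
        usable = [cost for tool, cost in options.items() if tool in rack]
        if not usable:
            return math.inf
        total += min(usable)
    return total


def monte_carlo_rack(tools, feature_costs, capacity, trials=1000, seed=0):
    """Sample random racks of `capacity` tools and keep the cheapest found."""
    rng = random.Random(seed)  # seeded for reproducible runs
    best_rack, best_cost = None, math.inf
    for _ in range(trials):
        rack = frozenset(rng.sample(tools, capacity))
        cost = rack_cost(rack, feature_costs)
        if cost < best_cost:
            best_rack, best_cost = rack, cost
    return best_rack, best_cost


if __name__ == "__main__":
    tools = ["T1", "T2", "T3"]
    feature_costs = {"hole": {"T1": 1, "T2": 3}, "slot": {"T2": 2, "T3": 1}}
    print(monte_carlo_rack(tools, feature_costs, capacity=2))
```

In this toy instance the best rack is {T1, T3} at cost 2. Choosing T2 first looks attractive because it can produce both features, yet it appears only in the more expensive racks, a small-scale echo of the record's point that intermediate decisions need not reflect total cost.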
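
Record 10 above describes CQRF as partitioning the sample space "guided by quantile regression over the entire quantile process". The abstract does not state the splitting rule, so this is background only, not the author's method: quantile regression minimizes the check (pinball) loss, and the constant that minimizes it within a node is an empirical quantile of the node's responses. A minimal sketch, with the hypothetical helper names `pinball_loss` and `node_quantile`:

```python
def pinball_loss(y, q, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) with u = y - q."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def node_quantile(ys, tau):
    """Constant q minimising the total check loss over a node's responses.

    The total loss is convex and piecewise linear with kinks at the observed
    values, so a minimiser is always among them; scanning them suffices.
    """
    candidates = sorted(set(ys))
    return min(candidates, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))


if __name__ == "__main__":
    ys = [1, 2, 3, 4, 10]
    print(node_quantile(ys, 0.5), node_quantile(ys, 0.9))
```

With `tau=0.5` the minimizer is the median (3 here), unaffected by the outlier 10; raising `tau` moves it toward the upper tail. Evaluating such quantiles over a grid of `tau` values is one simple way to approximate a quantile process within a leaf.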
