1 |
Large margin strategies for machine learning. Cristianini, Nello, January 2000.
No description available.
|
2 |
The Application of Fuzzy Decision Trees in Data Mining - Using Taiwan Stock Market as An Example. Cheng, Yuan-Chung, 18 June 2002.
The Taiwan stock market has a special feature: over 80% of its participants are natural persons, while only 20% are legal persons. Compared to the latter, natural persons have less expertise in stock trading, so the efficiency of the local stock market is an interesting subject for research. In this paper, we investigate this question by applying technical analysis to the past two years of trading data to see whether it can yield investment gains. Most similar research in the past suffers from problems: using only a single technical index or a pair of indices for prediction, predicting only one specific stock, or filtering out unwanted training and testing data during preprocessing. Such results may not truly reflect the efficiency of the market, so we adopt a different experimental design to conduct the test. Past research has shown that a fuzzy decision tree outperforms a normal crisp decision tree in data classification when the target domain contains numerical attributes (Y.M. Jeng, 1993). Since most technical indices are expressed as numerical values, we choose the fuzzy decision tree as the tool to generate rules from the eight stocks in the local market with the largest capitalization and highest turnover rates. The trees are evaluated with objective criteria and used to predict whether stock prices will move up or down the next day. The experimental results show that the resulting fuzzy trees have better predictive accuracy than a random walk, and that investment returns based on the trees are much better than under a buy-and-hold policy.
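To make the classification mechanism concrete, here is a minimal sketch of fuzzy decision tree inference; the triangular membership functions, the attribute name, and the thresholds are illustrative assumptions, not taken from the thesis:

```python
# Minimal sketch of fuzzy decision tree inference (not the thesis's actual
# implementation): each internal node splits on a numeric attribute via
# fuzzy sets, an example descends *all* matching branches weighted by
# membership degree, and leaf class weights are aggregated.

def triangular(a, b, c):
    """Return a triangular membership function peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

class Leaf:
    def __init__(self, class_weights):         # e.g., {"up": 0.7, "down": 0.3}
        self.class_weights = class_weights

class Node:
    def __init__(self, attribute, branches):   # branches: [(mu, subtree), ...]
        self.attribute, self.branches = attribute, branches

def infer(tree, example, weight=1.0, scores=None):
    scores = {} if scores is None else scores
    if isinstance(tree, Leaf):
        for cls, w in tree.class_weights.items():
            scores[cls] = scores.get(cls, 0.0) + weight * w
        return scores
    x = example[tree.attribute]
    for mu, subtree in tree.branches:
        if mu(x) > 0.0:                         # follow every matching branch
            infer(subtree, example, weight * mu(x), scores)
    return scores

# Hypothetical tree over one technical index (RSI); thresholds are made up.
tree = Node("rsi", [
    (triangular(-1, 20, 50), Leaf({"up": 0.8, "down": 0.2})),   # "low" RSI
    (triangular(30, 60, 101), Leaf({"up": 0.3, "down": 0.7})),  # "high" RSI
])
scores = infer(tree, {"rsi": 45})
print(max(scores, key=scores.get), scores)      # predicted direction
```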
|
3 |
Discrete and Continuous Nonconvex Optimization: Decision Trees, Valid Inequalities, and Reduced Basis Techniques. Dalkiran, Evrim, 26 April 2011.
This dissertation addresses the modeling and analysis of a strategic risk management problem via a novel decision tree optimization approach, as well as development of enhanced Reformulation-Linearization Technique (RLT)-based linear programming (LP) relaxations for solving nonconvex polynomial programming problems, through the generation of valid inequalities and reduced representations, along with the design and implementation of efficient algorithms. We first conduct a quantitative analysis for a strategic risk management problem that involves allocating certain available failure-mitigating and consequence-alleviating resources to reduce the failure probabilities of system safety components and subsequent losses, respectively, together with selecting optimal strategic decision alternatives, in order to minimize the risk or expected loss in the event of a hazardous occurrence. Using a novel decision tree optimization approach to represent the cascading sequences of probabilistic events as controlled by key decisions and investment alternatives, the problem is modeled as a nonconvex mixed-integer 0-1 factorable program. We develop a specialized branch-and-bound algorithm in which lower bounds are computed via tight linear relaxations of the original problem that are constructed by utilizing a polyhedral outer-approximation mechanism in concert with two alternative linearization schemes having different levels of tightness and complexity. We also suggest three alternative branching schemes, each of which is proven to guarantee convergence to a global optimum for the underlying problem. Extensive computational results and sensitivity analyses are presented to provide insights and to demonstrate the efficacy of the proposed algorithm. In particular, our methodology outperformed the commercial software BARON (Version 8.1.5), yielding a more robust performance along with an 89.9% savings in effort on average.
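A generic form consistent with this description (a sketch, not the dissertation's exact model) is:

```latex
% Scenarios are root-to-leaf paths of the decision tree; failure
% probabilities depend on mitigation investments y, losses depend on
% consequence-alleviating investments z, all subject to a budget B:
\[
  \min_{y,\,z \,\in\, \{0,1\}^n}
  \;\sum_{s \in \mathcal{S}}
  \Bigl(\, \prod_{e \in \mathrm{path}(s)} p_e(y) \Bigr)\, L_s(z)
  \qquad \text{s.t.} \quad c^{\top}(y, z) \le B .
\]
% The products of investment-dependent probabilities with
% investment-dependent losses are what make this a nonconvex
% mixed-integer 0-1 factorable program, as stated above.
```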
Next, we enhance RLT-based LP relaxations for polynomial programming problems by developing two classes of valid inequalities: v-semidefinite cuts and bound-grid-factor constraints. The first of these uses concepts derived from semidefinite programming. Given an RLT relaxation, we impose positive semidefiniteness on suitable dyadic variable-product matrices, and correspondingly derive implied semidefinite cuts. In the case of polynomial programs, there are several possible variants for selecting such dyadic variable-product matrices for imposing positive semidefiniteness restrictions in order to derive implied valid inequalities, which leads to a new class of cutting planes that we call v-semidefinite cuts. We explore various strategies for generating such cuts within the context of an RLT-based branch-and-cut scheme, and exhibit their relative effectiveness towards tightening the RLT relaxations and solving the underlying polynomial programming problems, using a test-bed of randomly generated instances as well as standard problems from the literature. Our results demonstrate that these cutting planes achieve a significant tightening of the lower bound in contrast with using RLT as a stand-alone approach, thereby enabling an appreciable reduction in the overall computational effort, even in comparison with the commercial software BARON. Empirically, our proposed cut-enhanced algorithm reduced the computational effort required by the latter two approaches by 44% and 77%, respectively, over a test-bed of 60 polynomial programming problems.
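For intuition, the semidefinite cut mechanism can be sketched in the quadratic case (the dissertation treats general polynomials and several choices of dyadic variable-product matrices):

```latex
% RLT replaces each product x_i x_j by a new linear variable X_{ij}, so
% that at any feasible point of the original problem the dyadic matrix
\[
  M(x, X) \;=\;
  \begin{bmatrix} 1 & x^{\top} \\ x & X \end{bmatrix} \;\succeq\; 0
\]
% would be positive semidefinite. If the current LP solution
% (\bar{x}, \bar{X}) violates this, i.e., some eigenvector \alpha of
% M(\bar{x}, \bar{X}) has a negative eigenvalue, then the linear cut
\[
  \alpha^{\top} M(x, X)\, \alpha \;\ge\; 0
\]
% is valid for the original problem and cuts off (\bar{x}, \bar{X}).
```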
As a second cutting plane strategy, we introduce a new class of bound-grid-factor constraints that can be judiciously used to augment the basic RLT relaxations in order to improve the quality of lower bounds and enhance the performance of global branch-and-bound algorithms. Certain theoretical properties are established that shed light on the effect of these valid inequalities in driving the discrepancies between RLT variables and their associated nonlinear products to zero. To preserve computational expediency while promoting efficiency, we propose certain concurrent and sequential cut generation routines and various grid-factor selection rules. The results indicate a significant tightening of lower bounds, which yields an overall reduction in computational effort of 21% for solving a test-bed of 15 challenging polynomial programming problems to global optimality in comparison with the basic RLT procedure, and over a 100-fold speed-up in comparison with the commercial software BARON.
Finally, we explore equivalent, reduced size RLT-based formulations for polynomial programming problems. Utilizing a basis partitioning scheme for an embedded linear equality subsystem, we show that a strict subset of RLT defining equalities imply the remaining ones. Applying this result, we derive significantly reduced RLT representations and develop certain coherent associated branching rules that assure convergence to a global optimum, along with static as well as dynamic basis selection strategies to implement the proposed procedure. In addition, we enhance the RLT relaxations with v-semidefinite cuts, which are empirically shown to further improve the relative performance of the reduced RLT method over the usual RLT approach. Computational results presented using a test-bed of 10 challenging polynomial programs to evaluate the different reduction strategies demonstrate that our best-performing proposed approach achieved more than a four-fold improvement in computational effort in comparison with both the commercial software BARON and a recently developed open-source code, Couenne, for solving nonconvex mixed-integer nonlinear programming problems. Moreover, our approach robustly solved all the test cases to global optimality, whereas BARON and Couenne were jointly able to solve only a single instance to optimality within the set computational time limit, having an unresolved average optimality gap of 260% and 437%, respectively, for the other nine instances.
This dissertation makes several broader contributions to the field of nonconvex optimization, including factorable, nonlinear mixed-integer programming problems. The proposed decision tree optimization framework can serve as a versatile management tool in the arenas of homeland security and healthcare. Furthermore, we have advanced the frontier for tackling formidable nonconvex polynomial programming problems that arise in emerging fields such as signal processing, biomedical engineering, materials science, and risk management. An open-source software package using the proposed reduced RLT representations, semidefinite cuts, bound-grid-factor constraints, and range reduction strategies is currently under preparation. In addition, the different classes of challenging polynomial programming test problems that are utilized in the computational studies conducted in this dissertation have been made available for other researchers via the Web page http://filebox.vt.edu/users/dalkiran/website/. It is our hope and belief that the modeling and methodological contributions made in this dissertation will serve society in a broader context through the myriad of widespread applications they support. / Ph. D.
|
4 |
Benchmarking purely functional data structures. Moss, Graeme E., January 2000.
No description available.
|
5 |
Fast sequential implementation of a lightweight, data stream driven, parallel language with application to intrusion detection. Martin, Xavier, 18 December 2007.
The general problem we consider in this thesis is the following: we have to analyze a stream of data (records, packets, events, ...) by successively applying to each piece of data a set of "rules". Rules are best viewed as lightweight parallel processes synchronizing on each arrival of a new piece of data. In many applications, such as signature-based intrusion detection, only a few rules are concerned with each new piece of data. But all other rules have to be executed anyway just to conclude that they can ignore it. Our goal is to make it possible to avoid this useless work completely.
To do so, we perform a static analysis of the code of each rule and we build a decision tree that we apply to each piece of data before executing the rule. The decision tree tells us whether executing the rule or not will change anything to the global analysis results. The decision trees are built at compile time, but their evaluation at each cycle (i.e., for each piece of data) entails an overhead. Thus we organize the set of all computed decision trees in a way that makes their evaluation as fast as possible.
The two main original contributions of this thesis are the following. Firstly, we propose a method to organize the set of decision trees and the set of active rules in such a way that deciding which rules to execute can be done optimally in O(r_u) time, where r_u is the number of useful rules. This time complexity is thus independent of the actual (total) number of active rules. This method is based on the use of a global decision tree that integrates all individual decision trees built from the code of the rules.
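As a hypothetical illustration of this dispatch idea (a sketch, not the ASAX implementation itself), a global decision tree can route each incoming record to exactly the rules that need to run:

```python
# Hypothetical sketch: each leaf of the global decision tree stores the
# ids of the rules that are "useful" for records reaching that leaf;
# all other rules are skipped entirely.

class DTNode:
    def __init__(self, field, value, yes, no):
        self.field, self.value, self.yes, self.no = field, value, yes, no

class DTLeaf:
    def __init__(self, useful_rules):
        self.useful_rules = useful_rules      # ids of rules to execute

def dispatch(tree, record):
    while isinstance(tree, DTNode):
        tree = tree.yes if record.get(tree.field) == tree.value else tree.no
    return tree.useful_rules

# Toy global tree compiled from two rules' guard conditions:
#   rule 0 fires only on proto == "tcp"; rule 1 only on port == 23.
tree = DTNode("proto", "tcp",
              yes=DTNode("port", 23, yes=DTLeaf([0, 1]), no=DTLeaf([0])),
              no=DTNode("port", 23, yes=DTLeaf([1]), no=DTLeaf([])))

rules = [lambda r: print("tcp rule on", r),
         lambda r: print("telnet rule on", r)]

for record in ({"proto": "tcp", "port": 80}, {"proto": "udp", "port": 53}):
    for rid in dispatch(tree, record):        # cost ~ tree depth + r_u
        rules[rid](record)
```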
Secondly, as such a global tree may quickly become much too large if usual data structures are used, we introduce a novel kind of data structure called sequential tree that allows us to keep global decision trees much smaller in many situations where the individual trees share few common conditions. (When many conditions are shared by individual trees the global tree remains small.)
To assess our contribution, we first modify the implementation of ASAX, a generic system for data stream analysis based on the rule paradigm presented above. Then we compare the efficiency of the optimized system with respect to its original implementation, using the MIT Lincoln Laboratory Evaluation Dataset and a classical set of intrusion detection rules. Impressive speed-ups are obtained.
Finally, our optimized implementation has been used by Nicolas Vanderavero, in his PhD thesis, for the design of stateful honeytanks (i.e., low-interaction honeypots). It makes it possible to simulate tens of thousands of hosts on a single computer, with a high level of realism.
|
6 |
Estimating the credit risk of consumer loan by decision tree. Lu, Chin-Pin, 21 June 2001.
No description available.
|
7 |
Constructing Decision Tree Using Learners' Portfolio for Supporting e-Learning. Liao, Shen-Jai, 01 July 2003.
In recent years, with the development of electronic media, e-learning has begun to replace traditional teaching and learning with Internet-based services. Newly developed technology gives e-learning teachers the opportunity to use students' learning logs, recorded via the Web site, to understand students' learning states. This research proposes an analytical mechanism that integrates multidimensional logs so that teachers can immediately observe all of a student's learning behaviors and learning status, and applies decision tree analysis to detect when and where students may encounter a learning bottleneck. Teachers can then use these results to give the right student the right remedial instruction at the right time.
In summary, we draw four conclusions: (1) Decision rules differ from course to course, depending for example on the instruction and assessment methods; assignments are a basis for assessing students' learning effectiveness, and the attributes associated with learning effectiveness are related to students' learning behaviors. (2) Accumulating learning behavior attributes up to a given time point can detect learners' probable learning effectiveness early; the variation in effectiveness across different time intervals is not clear, but every time interval allows early detection. (3) When detecting learning effectiveness with different grade-level classifications, every classification yields good decision rules, but none detects the learning effectiveness of all students. (4) Although detecting high-grade students' learning effectiveness is very difficult, lower-grade students' learning effectiveness can be detected. Overall, this research makes it possible to observe students' learning states immediately and to detect their learning effectiveness early, so that teachers can make decisions about managing learning activities to improve learning outcomes.
|
8 |
NOVEL MACHINE LEARNING ALGORITHMS. Farhangfar, Alireza, Unknown Date.
No description available.
|
9 |
Using statistical learning to predict survival of passengers on the RMS Titanic. Whitley, Michael Aaron, January 1900.
Master of Science / Statistics / Christopher Vahl / When exploring data, predictive analytics techniques have proven to be effective. In this report, the efficiency of several predictive analytics methods is explored. At the time of this study, Kaggle.com, a data science competition website, hosted the predictive modeling competition "Titanic: Machine Learning from Disaster". This competition posed a classification problem: build a predictive model to predict the survival of passengers on the RMS Titanic. Our approach focused on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which consequently can yield non-optimal prediction accuracy. To correct such issues with the classification and regression tree algorithm, we implemented cost-complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed here, which may be an artifact associated with the Titanic data and may not be representative of those methods' performances. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that the predictors sex/title, fare price, age, and passenger class are the most important variables in predicting survival of the passengers.
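A minimal sketch of such a comparison, assuming the Kaggle train.csv file and scikit-learn; the pruning strength and preprocessing are illustrative choices, not taken from the report:

```python
# Compare a pruned CART tree against bagging and a random forest on the
# Titanic data, using the predictors named above (sex, fare, age, class).
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

df = pd.read_csv("train.csv")                       # Kaggle Titanic data
df["Sex"] = (df["Sex"] == "female").astype(int)     # encode sex numerically
df["Age"] = df["Age"].fillna(df["Age"].median())    # impute missing ages
X = df[["Sex", "Fare", "Age", "Pclass"]]
y = df["Survived"]

models = {
    "pruned CART": DecisionTreeClassifier(ccp_alpha=0.005, random_state=0),
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10).mean()  # 10-fold CV accuracy
    print(f"{name}: {acc:.3f}")
```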
|
10 |
Increasing Nursing Staff Knowledge of Palliative Care Criteria with a Decision Tree. Cotton, Juliana, 01 January 2019.
Palliative care is often not considered during care or is considered too late in the patient’s healthcare journey to provide much benefit. The underutilization of palliative care contributes to increased healthcare costs, poor patient outcomes, and decreased patient satisfaction. The practice-focused question guiding this evidence-based practice (EBP) project was whether an education program would increase nursing knowledge regarding palliative care criteria. The program was developed using Rogers’s diffusion of innovation model and a literature review to create educational tools and achieve a sustainable EBP change. An evidence-based decision tree was developed and used as a tool for teaching and learning. Other assessment tools included a pretest, posttest, and program evaluation. Twenty staff nurses from the same department participated in the education program. Registered nurses were selected based on the amount of regular face-to-face contact they have with patients. The education program increased knowledge of palliative care by 58% and validated the need for nursing education on palliative care criteria. It might be beneficial to disseminate the program to all nurses who have patient contact. The findings of this project have the potential to generate positive social change by improving satisfaction, quality of care, and outcomes for the patients and families benefiting from palliative care services.
|