Global ETD Search

81	Improving Hoeffding Trees. Kirkby, Richard Brendon. January 2008. Modern information technology allows information to be collected at a far greater rate than ever before. So fast, in fact, that the main problem is making sense of it all. Machine learning offers the promise of a solution, but the field mainly focusses on achieving high accuracy when data supply is limited. While this has created sophisticated classification algorithms, many do not cope with increasing data set sizes. When the data set sizes get to a point where they could be considered to represent a continuous supply, or data stream, then incremental classification algorithms are required. In this setting, the effectiveness of an algorithm cannot simply be assessed by accuracy alone. Consideration needs to be given to the memory available to the algorithm and the speed at which data is processed, in terms of both the time taken to predict the class of a new data sample and the time taken to include this sample in an incrementally updated classification model. The Hoeffding tree algorithm is a state-of-the-art method for inducing decision trees from data streams. The aim of this thesis is to improve this algorithm. To measure improvement, a comprehensive framework for evaluating the performance of data stream algorithms is developed. Within the framework, memory size is fixed in order to simulate realistic application scenarios. In order to simulate continuous operation, classes of synthetic data are generated, providing an evaluation on a large scale. Improvements to many aspects of the Hoeffding tree algorithm are demonstrated. First, a number of methods for handling continuous numeric features are compared. Second, tree prediction strategy is investigated to evaluate the utility of various methods. Finally, the possibility of improving accuracy using ensemble methods is explored.
The experimental results provide meaningful comparisons of accuracy and processing speeds between different modifications of the Hoeffding tree algorithm under various memory limits. The study on numeric attributes demonstrates that sacrificing accuracy for space at the local level often results in improved global accuracy. The prediction strategy shown to perform best adaptively chooses between standard majority class and Naive Bayes prediction in the leaves. The ensemble method investigation shows that combining trees can be worthwhile, but only when sufficient memory is available, and improvement is less likely than in traditional machine learning. In particular, issues are encountered when applying the popular boosting method to streams. Keywords: machine learning; classification; data streams; decision trees; Hoeffding trees; boosting; bagging; option trees
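As background to the Hoeffding tree entry above: the abstract does not state the split rule, but Hoeffding trees are conventionally described as splitting a leaf once the observed gap in split merit between the two best attributes exceeds the Hoeffding bound. The sketch below shows only that bound and the resulting test. The function names, the value range of 1.0, and the example gains are illustrative assumptions, not taken from the thesis.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this distance of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical helper: split when the gap between the two best
    candidate attributes is larger than the bound for n samples."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # The bound shrinks as more stream examples reach the leaf.
    for n in (100, 1_000, 10_000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because the bound depends only on the number of examples seen, a leaf can defer its decision until the evidence is sufficient, which is what makes single-pass induction over a stream possible.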
82	Structural classification of glaucomatous optic neuropathy. Twa, Michael Duane. January 2006. Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 115-121).
83	Predicting High-cost Patients in General Population Using Data Mining Techniques. Izad Shenas, Seyed Abdolmotalleb. 26 October 2012. In this research, we apply data mining techniques to nationally representative expenditure data from the US to predict very high-cost patients, those in the top 5 cost percentiles, among the general population. Samples are derived from the Medical Expenditure Panel Survey (MEPS) Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning and balancing the data, the final MEPS dataset of 31,704 records is modeled with decision trees (including C5.0 and CHAID) and neural networks. Multiple predictive models are built and their performance is analyzed using various measures, including accuracy, G-mean, and area under the ROC curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC measures for the top-performing predictive models, ranging from 76% to 85% and from 0.812 to 0.942, respectively. Among a primary set of 66 attributes, the best predictors of the top 5% high-cost population include the individual's overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. It is worth noting that we do not consider the number of visits to care providers as a predictor, since it is highly correlated with expenditure and offers no new insight into the data (i.e. it is a trivial predictor). We predict high-cost patients without knowing how many times the patient visited doctors or was hospitalized. Consequently, the results of this study can be used by policy makers, health planners, and insurers to plan and improve the delivery of health services. Keywords: cost prediction; data mining; decision trees; neural networks; Medical Expenditure Panel Survey; predictive modelling
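The entry above reports G-mean alongside accuracy and AUC but does not define it. Assuming the usual definition for imbalanced classification (the geometric mean of sensitivity and specificity), a minimal sketch follows. The function name and the confusion-matrix counts are made up for illustration and are not results from the thesis.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity, computed from
    binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (high-cost) class
    specificity = tn / (tn + fp)  # recall on the negative class
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Hypothetical counts: 100 high-cost and 100 other patients.
    print(round(g_mean(tp=80, fn=20, tn=90, fp=10), 3))
```

Unlike plain accuracy, this measure drops to zero when either class is never predicted correctly, which is why it is often preferred when the target class is rare.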
84	Defect Cause Modeling With Decision Tree And Regression Analysis: A Case Study In Casting Industry. Bakir, Berna. 01 May 2007. In this thesis, we study the improvement of product quality in the manufacturing industry by identifying and optimizing the influential process variables that cause defects in the items produced. Real data provided by a manufacturing company in the metal casting industry were studied. Two well-known approaches, logistic regression and decision trees, were used to model the relationship between process variables and defect types, and the two approaches were compared. Subject: T Information Technology 58.5-58.64
85	A Comprehensive Review Of Data Mining Applications In Quality Improvement And A Case Study. Gunturkun, Fatma. 01 August 2007. In today's world, knowledge is the most powerful factor in the success of organizations. One of the most important resources for reaching this knowledge is the huge volume of data stored in their databases, and data mining (DM) techniques are essential to the analysis of this data. In this thesis, a comprehensive literature review of DM techniques for quality improvement in manufacturing is presented first. One of these techniques is then applied in a case study that analyzes customer quality perception data for driver seat quality. A decision tree approach is implemented to identify the variables that most influence customer satisfaction with the comfort of the driver seat. The results obtained are compared with those of a logistic regression analysis implemented in another study. Subject: QA General 15707
86	Business Failure Predictions In Istanbul Stock Exchange. Tekel, Onur. 01 June 2009. This study aims to develop business failure prediction models using data from selected firms in the Istanbul Stock Exchange (ISE) markets. The sample data comprise ten selected financial ratios for 27 non-going concerns (failed businesses) and 27 paired going concerns. Two non-parametric classification methods are used in the study: Artificial Neural Networks (ANN) and Decision Trees. The classification results show that the two models perform comparably on the training samples, but the ANN model outperforms the decision tree model on the testing samples. Further, the potential usefulness of ANN and Decision Tree data mining techniques in the analysis of complex and non-linear relationships is observed. Subject: HG Finance 1-9999
87	Analyzing biological expression data based on decision tree induction. Flöter, André. January 2005. Modern biological analysis techniques supply scientists with various forms of data. One category of such data is the so-called "expression data". These data indicate the quantities of biochemical compounds present in tissue samples.
Expression data can now be generated at high speed. This in turn leads to amounts of data that can no longer be analysed by classical statistical techniques. Systems biology is the new field that focuses on the modelling of this information.
At present, various methods are used for this purpose. One superordinate class of these methods is machine learning. Methods of this kind had, until recently, predominantly been used for classification and prediction tasks. This neglected a powerful secondary benefit: the ability to induce interpretable models.
Obtaining such models from data has become a key issue within systems biology. Numerous approaches have been proposed and intensively discussed. This thesis focuses on the examination and exploitation of one basic technique: decision trees.
The concept of comparing sets of decision trees is developed. This method offers the possibility of identifying significant thresholds in continuous or discrete valued attributes through their corresponding set of decision trees. Finding significant thresholds in attributes is a means of identifying states in living organisms. Knowing about states is an invaluable clue to the understanding of dynamic processes in organisms. Applied to metabolite concentration data, the proposed method was able to identify states which were not found with conventional techniques for threshold extraction.
A second approach exploits the structure of sets of decision trees for the discovery of combinatorial dependencies between attributes. Previous work on this issue has focused either on expensive computational methods or on the interpretation of single decision trees, a very limited exploitation of the data. This has led to incomplete or unstable results. A new method is therefore developed that uses sets of decision trees to overcome these limitations.
Both of the introduced methods are available as software tools. They can be applied consecutively or separately, and together they make up a package of analytical tools that usefully supplements existing methods.
By means of these tools, the newly introduced methods were able to confirm existing knowledge and to suggest interesting new relationships between metabolites in plant metabolism. Keywords: molecular bioinformatics; machine learning; decision trees; computational biology; data processing; computer science
88	Comparing NR Expression among Metabolic Syndrome Risk Factors. Jacobsson, Annelie. January 2003. The metabolic syndrome is a cluster of metabolic risk factors, such as type II diabetes, dyslipidemia, hypertension, obesity, microalbuminuria and insulin resistance, whose prevalence has increased greatly in many parts of the world in recent years. In this thesis, decision trees were applied to the BioExpress database, which includes both clinical data about donors and gene expression data, to investigate the ability of nuclear receptors to serve as markers for the metabolic syndrome. Decision trees were created and the classification performance for each individual risk factor was then analysed. The rules generated from the risk factor trees were compared in order to search for similarities and dissimilarities. The rules were compared in pairs of risk factors, in groups of three, and across all risk factors. These comparisons resulted in the discovery of a set of genes, of which the most interesting were the Peroxisome Proliferator Activated Receptor - Alpha, the Peroxisome Proliferator Activated Receptor - Gamma and the Glucocorticoid Receptor. These genes appear in pathways associated with the metabolic syndrome and in the recent scientific literature. Keywords: metabolic syndrome; nuclear receptors; data mining; decision trees; gene expression analysis; bioinformatics
89	Application of decision diagrams for information storage and retrieval. Komaragiri, Vivek Chakravarthy. January 2002. Thesis (M.S.)--Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.
90	Discrete function representations utilizing decision diagrams and spectral techniques. Townsend, Whitney Jeanne. January 2002. Thesis (M.S.)--Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.
