Global ETD Search

1	An Empirical Study of Instance Hardness Smith, Michael Reed 20 November 2009 (has links) (PDF) Most widely accepted measures of performance for learning algorithms, such as accuracy and area under the ROC curve, provide information about behavior at the data set level. They say nothing about which instances are misclassified, whether two learning algorithms with the same classification accuracy on a data set misclassify the same instances, or whether there are instances misclassified by all learning algorithms. These questions about behavior at the instance level motivate our empirical analysis of instance hardness, a measure of expected classification accuracy for an instance. We analyze the classification of 57 data sets using 9 learning algorithms. Of the over 175000 instances investigated, 5% are misclassified by all 9 of the considered learning algorithms, and 15% are misclassified by at least half. We find that the major cause of misclassification for hard instances is class overlap, manifested as outliers and border points which can be exacerbated by class skew. We analyze these causes and show to what extent each leads to misclassifications, both in isolation and jointly. 19.8% of all misclassified instances are outliers; 71.3% are border points; 21% belong to a minority class. We also find that 91.6% of all outliers and 38.3% of all border points are misclassified whereas only 3.5% of instances without class overlap are misclassified. We propose a set of heuristics to predict when an instance will be hard to correctly classify. Additionally, we analyze how different learning algorithms perform on tasks with varying degrees of outliers, border points and class skew. instance hardness outliers border points class skew Computer Sciences
2	Using Instance-Level Meta-Information to Facilitate a More Principled Approach to Machine Learning Smith, Michael Reed 01 April 2015 (has links) (PDF) As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of an experienced practitioner, intuition, trial and error, or another heuristic approach. This dissertation proposes a more principled approach to understand how the learning algorithm, hyperparameters, and data interact with each other to facilitate a data-driven approach for applying machine learning techniques. Specifically, this dissertation examines the properties of the training data and proposes techniques to integrate this information into the learning process and for preprocessing the training set.It also proposes techniques and tools to address selecting a learning algorithm and setting its hyperparameters.This dissertation is comprised of a collection of papers that address understanding the data used in machine learning and the relationship between the data, the performance of a learning algorithm, and the learning algorithms associated hyperparameter settings.Contributions of this dissertation include:* Instance hardness that examines how difficult an instance is to classify correctly.* hardness measures that characterize properties of why an instance may be misclassified.* Several techniques for integrating instance hardness into the learning process. These techniques demonstrate the importance of considering each instance individually rather than doing a global optimization which considers all instances equally.* Large-scale examinations of the investigated techniques including a large numbers of examined data sets and learning algorithms. This provides more robust results that are less likely to be affected by noise.* The Machine Learning Results Repository, a repository for storing the results from machine learning experiments at the instance level (the prediction for each instance is stored). This allows many data set-level measures to be calculated such as accuracy, precision, or recall. These results can be used to better understand the interaction between the data, learning algorithms, and associated hyperparameters. Further, the repository is designed to be a tool for the community where data can be downloaded and uploaded to follow the development of machine learning algorithms and applications. machine learning supervised learning classification meta-learning instance hardness machine learning results repository Computer Sciences
3	Caractérisation des instances difficiles de problèmes d'optimisation NP-difficiles / Characterization of difficult instances for NP-hard problems Weber, Valentin 08 July 2013 (has links) L'étude expérimentale d'algorithmes est un sujet crucial dans la conception de nouveaux algorithmes, puisque le contexte d'évaluation influence inévitablement la mesure de la qualité des algorithmes. Le sujet particulier qui nous intéresse dans l'étude expérimentale est la pertinence des instances choisies pour servir de base de test à l'expérimentation. Nous formalisons ce critère par la notion de "difficulté d'instance" qui dépend des performances pratiques de méthodes de résolution. Le coeur de la thèse porte sur un outil pour évaluer empiriquement la difficulté d'instance. L'approche proposée présente une méthode de benchmarking d'instances sur des jeux de test d'algorithmes. Nous illustrons cette méthode expérimentale pour évaluer des classes d'instances à travers plusieurs exemples d'applications sur le problème du voyageur de commerce. Nous présentons ensuite une approche pour générer des instances difficiles. Elle repose sur des opérations qui modifient les instances, mais qui permettent de retrouver facilement une solution optimale, d'une instance à l'autre. Nous étudions théoriquement et expérimentalement son impact sur les performances de méthodes de résolution. / The empirical study of algorithms is a crucial topic in the design of new algorithms because the context of evaluation inevitably influences the measure of the quality of algorithms. In this topic, we particularly focus on the relevance of instances forming testbeds. We formalize this criterion with the notion of 'instance hardness' that depends on practical performance of some resolution methods. The aim of the thesis is to introduce a tool to evaluate instance hardness. The approach uses benchmarking of instances against a testbed of algorithms. We illustrate our experimental methodology to evaluate instance classes through several applications to the traveling salesman problem. We also suggest possibilities to generate hard instances. They rely on operations that modify instances but that allow to easily find the optimal solution of one instance from the other. We theoretically and empirically study their impact on the performance of some resolution methods. Difficulté d'instance Modification d'instances Science expérimentale des algorithmes Fixed-parameter tractability (FPT) Problème du voyageur de commerce Instance hardness Instance modification Empirical science of algorithms Fixed-parameter tractability (FPT) Traveling salesman problem (TSP) 510 004

1

Page generated in 0.0687 seconds