1

Generalization Performance of Margin Multi-category Classifiers / Performances en généralisation des classifieurs multi-classes à marge

Musayeva, Khadija 23 September 2019
This thesis deals with the theory of margin multi-category classification, and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as on its "capacity". In this work, the following scale-sensitive capacity measures are considered: the Rademacher complexity, the covering numbers and the fat-shattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by a classifier have polynomially growing fat-shattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fat-shattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fat-shattering dimension. On the other hand, this negatively impacts the rate of convergence (the dependency on m), an indication that optimizing the dependencies on the three basic parameters amounts to looking for a trade-off.
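To make the structure of such margin-based guarantees concrete, the sketch below shows a generic bound of the standard Rademacher-complexity form (as in the Bartlett and Mendelson framework the abstract's setting builds on). The constants c1 and c2, the exponent alpha on C, and the exact capacity term are placeholders for illustration only; the precise dependencies on C, m and gamma are exactly what the thesis studies, and this is not one of its own bounds.

```latex
% Generic margin-based risk bound (standard Rademacher-complexity form);
% c_1, c_2 and the exponent \alpha on C are placeholders, not results
% established in the thesis.
\[
  \Pr\{\, Y \neq f(X) \,\}
  \;\le\;
  \underbrace{\widehat{R}_{\gamma,m}(f)}_{\text{empirical margin risk}}
  \;+\;
  \underbrace{\frac{c_1\, C^{\alpha}}{\gamma}\, \mathcal{R}_m(\mathcal{F})}_{\text{capacity term}}
  \;+\;
  \underbrace{c_2 \sqrt{\frac{\ln(1/\delta)}{2m}}}_{\text{confidence term}}
  \qquad \text{with probability at least } 1-\delta .
\]
```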
2

Aspects of Online Learning

Harrington, Edward, edwardharrington@homemail.com.au January 2004
Online learning algorithms have several key advantages over their batch learning counterparts: they are generally more memory efficient and computationally more efficient; they are simpler to implement; and they are able to adapt to changes when the learning model is time varying. Because of their simplicity, online algorithms are very appealing to practitioners. This thesis investigates several online learning algorithms and their applications. The thesis has an underlying theme of combining several simple algorithms to obtain better performance. In this thesis we investigate: combining weights, combining hypotheses, and (loosely) hierarchical combining.

Firstly, we propose a new online variant of the Bayes point machine (BPM), called the online Bayes point machine (OBPM). We study the theoretical and empirical performance of the OBPM algorithm. We show that the empirical performance of the OBPM algorithm is comparable with that of other large margin classifier methods such as the approximately large margin algorithm (ALMA) and methods which maximise the margin explicitly, like the support vector machine (SVM). The OBPM algorithm, when used with a parallel architecture, offers potential computational savings compared to ALMA. We compare the test error performance of the OBPM algorithm with other online algorithms: the Perceptron, the voted-Perceptron, and Bagging. We demonstrate that the combination of the voted-Perceptron algorithm and the OBPM algorithm, called the voted-OBPM algorithm, has better test error performance than the voted-Perceptron and Bagging algorithms. We investigate the use of various online voting methods on the problems of ranking and collaborative filtering of instances. We look at the application of the online Bagging and OBPM algorithms to the telecommunications problem of channel equalization. We show that both online methods were successful at reducing the effect of label flipping and additive noise on the test error.

Secondly, we introduce a new mixture of experts algorithm, the fixed-share hierarchy (FSH) algorithm. The FSH algorithm is able to track the mixture of experts when the switching rate between the best experts may not be constant. We study the theoretical aspects of the FSH algorithm and its practical application to adaptive equalization. Using simulations we show that the FSH algorithm is able to track the best expert, or mixture of experts, both in the case where the switching rate is constant and in the case where the switching rate is time varying.
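For readers unfamiliar with the baselines mentioned above, here is a minimal Python sketch of the standard voted-Perceptron of Freund and Schapire, one of the online algorithms the OBPM algorithm is compared against. It is not the OBPM or FSH algorithm proposed in the thesis, and the toy data at the bottom is purely illustrative.

```python
import numpy as np

def voted_perceptron_train(X, y, epochs=1):
    """Standard voted-Perceptron (Freund & Schapire, 1999).

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of (weight_vector, vote_count) pairs.
    """
    n, d = X.shape
    w = np.zeros(d)          # current perceptron weights
    c = 1                    # votes earned by the current weights
    experts = []             # (weights, votes) pairs accumulated online
    for _ in range(epochs):
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:   # mistake: retire current expert, update
                experts.append((w.copy(), c))
                w = w + y[i] * X[i]
                c = 1
            else:                              # correct: current expert earns a vote
                c += 1
    experts.append((w.copy(), c))
    return experts

def voted_perceptron_predict(experts, x):
    """Weighted majority vote over all stored perceptrons."""
    score = sum(c * np.sign(np.dot(w, x)) for w, c in experts)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # Tiny linearly separable toy problem, for illustration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    experts = voted_perceptron_train(X, y, epochs=3)
    print(voted_perceptron_predict(experts, np.array([1.0, 1.0])))  # expected: 1
```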
3

Sparse Multiclass And Multi-Label Classifier Design For Faster Inference

Bapat, Tanuja 12 1900
Many real-world problems like hand-written digit recognition or semantic scene classification are treated as multiclass or multi-label classification problems. Solutions to these problems using support vector machines (SVMs) are well studied in the literature. In this work, we focus on building sparse max-margin classifiers for multiclass and multi-label classification. A sparse representation of the resulting classifier is important both from the efficient training and the fast inference viewpoints. This is especially true when the training and test set sizes are large. Very few of the existing multiclass and multi-label classification algorithms give importance to directly controlling the sparsity of the designed classifiers. Further, these algorithms were not found to be scalable. Motivated by this, we propose new formulations for sparse multiclass and multi-label classifier design and also give efficient algorithms to solve them. The formulation for sparse multi-label classification also incorporates prior knowledge of label correlations. In both cases, the classification model is designed using a common set of basis vectors across all the classes. These basis vectors are greedily added to an initially empty model to approximate the target function. The sparsity of the classifier can be controlled by a user-defined parameter, d_max, which indicates the maximum number of common basis vectors. The computational complexity of these algorithms for multiclass and multi-label classifier design is O(l k^2 d_max^2), where l is the number of training set examples and k is the number of classes. The inference time for the proposed multiclass and multi-label classifiers is O(k d_max). Numerical experiments on various real-world benchmark datasets demonstrate that the proposed algorithms result in sparse classifiers that require fewer basis vectors than state-of-the-art algorithms to attain the same generalization performance. A very small value of d_max results in a significant reduction in inference time. Thus, the proposed algorithms provide useful alternatives to the existing algorithms for sparse multiclass and multi-label classifier design.
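The following Python sketch illustrates the general idea described above: greedily growing a single set of basis vectors shared by all classes, capped at a user-defined d_max, so that inference costs O(k * d_max) per example. It is only a schematic stand-in: the RBF kernel, the random candidate pool, and the least-squares selection criterion are assumptions made for illustration, not the max-margin formulation or the optimization algorithm developed in the thesis.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_shared_basis(X, y, n_classes, d_max, gamma=1.0, candidates=50, seed=0):
    """Greedily pick up to d_max training points as basis vectors shared by all
    one-vs-rest classifiers; each step keeps the candidate giving the lowest
    squared error on +/-1 targets (an illustrative criterion, not the thesis's
    max-margin objective)."""
    rng = np.random.default_rng(seed)
    Y = -np.ones((X.shape[0], n_classes))
    Y[np.arange(X.shape[0]), y] = 1.0              # +/-1 one-vs-rest targets
    basis, K = [], np.empty((X.shape[0], 0))
    for _ in range(d_max):
        pool = rng.choice(X.shape[0], size=min(candidates, X.shape[0]), replace=False)
        best_err, best_j, best_col = np.inf, None, None
        for j in pool:
            if j in basis:
                continue
            col = rbf_kernel(X, X[[j]], gamma)     # kernel column for candidate j
            K_try = np.hstack([K, col])
            W, *_ = np.linalg.lstsq(K_try, Y, rcond=None)
            err = ((K_try @ W - Y) ** 2).sum()
            if err < best_err:
                best_err, best_j, best_col = err, j, col
        if best_j is None:
            break
        basis.append(best_j)
        K = np.hstack([K, best_col])
    W, *_ = np.linalg.lstsq(K, Y, rcond=None)      # final weights, one column per class
    return basis, W

def predict(X_test, X_train, basis, W, gamma=1.0):
    """Inference costs O(k * d_max) per example: d_max kernel evaluations shared
    by all k class scores, followed by an argmax."""
    K = rbf_kernel(X_test, X_train[basis], gamma)
    return np.argmax(K @ W, axis=1)
```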
