Spelling suggestions: "subject:"atatistical 1earning 1heory"" "subject:"atatistical 1earning atheory""
1 
Fundamental Limitations of SemiSupervised LearningLu, Tyler (Tian) 30 April 2009 (has links)
The emergence of a new paradigm in machine learning known as semisupervised learning (SSL) has seen benefits to many applications where labeled data is expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semisupervised learning, which uses additional unlabeled data for training, still remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this research thesis is to take a first step towards bridging this theorypractice gap.
We focus on investigating the inherent limitations of the benefits SSL can provide over SL. We develop a framework under which one can analyze the potential benefits, as measured by the sample complexity of SSL. Our framework is utopian in the sense that a SSL algorithm trains on a labeled sample and an unlabeled distribution, as opposed to an unlabeled sample in the usual SSL model. Thus, any lower bound on the sample complexity of SSL in this model implies lower bounds in the usual model.
Roughly, our conclusion is that unless the learner is absolutely certain there is some nontrivial relationship between labels and the unlabeled distribution (``SSL type assumption''), SSL cannot provide significant advantages over SL. Technically speaking, we show that the sample complexity of SSL is no more than a constant factor better than SL for any unlabeled distribution, under a nopriorknowledge setting (i.e. without SSL type assumptions). We prove that for the class of thresholds in the realizable setting the sample complexity of SL is at most twice that of SSL. Also, we prove that in the agnostic setting for the classes of thresholds and union of intervals the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture this to be a general phenomenon applying to any hypothesis class.
We also discuss issues regarding SSL type assumptions, and in particular the popular cluster assumption. We give examples that show even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and lead to prediction performance much worse than simply ignoring the unlabeled data and doing supervised learning.
We conclude with a look into future research directions that build on our investigation.

2 
Fundamental Limitations of SemiSupervised LearningLu, Tyler (Tian) 30 April 2009 (has links)
The emergence of a new paradigm in machine learning known as semisupervised learning (SSL) has seen benefits to many applications where labeled data is expensive to obtain. However, unlike supervised learning (SL), which enjoys a rich and deep theoretical foundation, semisupervised learning, which uses additional unlabeled data for training, still remains a theoretical mystery lacking a sound fundamental understanding. The purpose of this research thesis is to take a first step towards bridging this theorypractice gap.
We focus on investigating the inherent limitations of the benefits SSL can provide over SL. We develop a framework under which one can analyze the potential benefits, as measured by the sample complexity of SSL. Our framework is utopian in the sense that a SSL algorithm trains on a labeled sample and an unlabeled distribution, as opposed to an unlabeled sample in the usual SSL model. Thus, any lower bound on the sample complexity of SSL in this model implies lower bounds in the usual model.
Roughly, our conclusion is that unless the learner is absolutely certain there is some nontrivial relationship between labels and the unlabeled distribution (``SSL type assumption''), SSL cannot provide significant advantages over SL. Technically speaking, we show that the sample complexity of SSL is no more than a constant factor better than SL for any unlabeled distribution, under a nopriorknowledge setting (i.e. without SSL type assumptions). We prove that for the class of thresholds in the realizable setting the sample complexity of SL is at most twice that of SSL. Also, we prove that in the agnostic setting for the classes of thresholds and union of intervals the sample complexity of SL is at most a constant factor larger than that of SSL. We conjecture this to be a general phenomenon applying to any hypothesis class.
We also discuss issues regarding SSL type assumptions, and in particular the popular cluster assumption. We give examples that show even in the most accommodating circumstances, learning under the cluster assumption can be hazardous and lead to prediction performance much worse than simply ignoring the unlabeled data and doing supervised learning.
We conclude with a look into future research directions that build on our investigation.

3 
Mathematical Theories of Interaction with OraclesYang, Liu 01 October 2013 (has links)
No description available.

4 
Efficient Algorithms for Learning Combinatorial Structures from Limited DataAsish Ghoshal (5929691) 15 May 2019 (has links)
<div>Recovering combinatorial structures from noisy observations is a recurrent problem in many application domains, including, but not limited to, natural language processing, computer vision, genetics, health care, and automation. For instance, dependency parsing in natural language processing entails recovering parse trees from sentences which are inherently ambiguous. From a computational standpoint, such problems are typically intractable and call for designing efficient approximation or randomized algorithms with provable guarantees. From a statistical standpoint, algorithms that recover the desired structure using an optimal number of samples are of paramount importance.</div><div><br></div><div>We tackle several such problems in this thesis and obtain computationally and statistically efficient procedures. We demonstrate optimality of our methods by proving fundamental lower bounds on the number of samples needed by any method for recovering the desired structures. Specifically, the thesis makes the following contributions:</div><div><br></div><div>(i) We develop polynomialtime algorithms for learning linear structural equation models  which are a widely used class of models for performing causal inference  that recover the correct directed acyclic graph structure under identifiability conditions that are weaker than existing conditions. We also show that the sample complexity of our method is informationtheoretically optimal.</div><div><br></div><div>(ii) We develop polynomialtime algorithms for learning the underlying graphical game from observations of the behavior of selfinterested agents. The key combinatorial problem here is to recover the Nash equilibria set of the true game from behavioral data. We obtain fundamental lower bounds on the number of samples required for learning games and show that our method is statistically optimal.</div><div><br></div><div>(iii) Lastly, departing from the generative model framework, we consider the problem of structured prediction where the goal is to learn predictors from data that predict complex structured objects directly from a given input. We develop efficient learning algorithms that learn structured predictors by approximating the partition function and obtain generalization guarantees for our method. We demonstrate that randomization can not only improve efficiency but also generalization to unseen data.</div><div><br></div>

5 
Neural NetworksJordan, Michael I., Bishop, Christopher M. 13 March 1996 (has links)
We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.

6 
A Note on Support Vector Machines DegeneracyRifkin, Ryan, Pontil, Massimiliano, Verri, Alessandro 11 August 1999 (has links)
When training Support Vector Machines (SVMs) over nonseparable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by ${f w} = {f 0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unboundedspolyhedron, which we characterize in terms of its extreme points and rays.

7 
Secession and Survival: Nations, States and Violent ConflictSiroky, David S. January 2009 (has links)
<p>Secession is a watershed event not only for the new state that is created and the old state that is dissolved, but also for neighboring states, proximate ethnopolitical groups and major powers. This project examines the problem of violent secessionist conflict and addresses an important debate at the intersection of comparative and international politics about the conditions under which secession is a peaceful solution to ethnic conflict. It demonstrates that secession is rarely a solution to ethnic conflict, does not assure the protection of remaining minorities and produces new forms of violence. To explain why some secessions produce peace, while others generate violence, the project develops a theoretical model of the conditions that produce internally coherent, stable and peaceful postsecessionist states rather than recursive secession (i.e., secession from a new secessionist state) or interstate disputes between the rump and secessionist state. Theoretically, the analysis reveals a curvilinear relationship between ethnoterritorial heterogeneity and conflict, explains disparate findings in the literature on ethnic conflict and conclusively links ethnic structure and violence. The project also contributes to the literature on secessionist violence, and civil war more generally, by linking intrastate and interstate causes, showing that what is frequently thought of as a domestic phenomenon is in fact mostly a phenomenon of international politics. Drawing upon original data, methodological advances at the interface of statistics, computer science and probability theory, and qualitative methods such as elite interviews and archival research, the project offers a comprehensive, comparative and contextual treatment of secession and violence.</p> / Dissertation

8 
Generalization Performance of Margin Multicategory Classifiers / Performances en généralisation des classifieurs multiclasses à margeMusayeva, Khadija 23 September 2019 (has links)
Cette thèse porte sur la théorie de la discrimination multiclasse à marge. Elle a pour cadre la théorie statistique de l’apprentissage de Vapnik et Chervonenkis. L’objectif est d’établir des bornes de généralisation possédant une dépendances explicite au nombre C de catégories, à la taille m de l’échantillon et au paramètre de marge gamma, lorsque la fonction de perte considérée est une fonction de perte à marge possédant la propriété d’être lipschitzienne. La borne de généralisation repose sur la performance empirique du classifieur ainsi que sur sa "capacité". Dans cette thèse, les mesures de capacité considérées sont les suivantes : la complexité de Rademacher, les nombres de recouvrement et la dimension fatshattering. Nos principales contributions sont obtenues sous l’hypothèse que les classes de fonctions composantes calculées par le classifieur ont des dimensions fatshattering polynomiales et que les fonctions composantes sont indépendantes. Dans le contexte du schéma de calcul introduit par Mendelson, qui repose sur les relations entre les mesures de capacité évoquées plus haut, nous étudions l’impact que la décomposition au niveau de l’une de ces mesures de capacité a sur les dépendances (de la borne de généralisation) à C, m et gamma. En particulier, nous démontrons que la dépendance à C peut être considérablement améliorée par rapport à l’état de l’art si la décomposition est reportée au niveau du nombre de recouvrement ou de la dimension fatshattering. Ce changement peut affecter négativement le taux de convergence (dépendance à m), ce qui souligne le fait que l’optimisation par rapport aux trois paramètres fondamentaux se traduit par la recherche d’un compromis. / This thesis deals with the theory of margin multicategory classification, and is based on the statistical learning theory founded by Vapnik and Chervonenkis. We are interested in deriving generalization bounds with explicit dependencies on the number C of categories, the sample size m and the margin parameter gamma, when the loss function considered is a Lipschitz continuous margin loss function. Generalization bounds rely on the empirical performance of the classifier as well as its "capacity". In this work, the following scalesensitive capacity measures are considered: the Rademacher complexity, the covering numbers and the fatshattering dimension. Our main contributions are obtained under the assumption that the classes of component functions implemented by a classifier have polynomially growing fatshattering dimensions and that the component functions are independent. In the context of the pathway of Mendelson, which relates the Rademacher complexity to the covering numbers and the latter to the fatshattering dimension, we study the impact that decomposing at the level of one of these capacity measures has on the dependencies on C, m and gamma. In particular, we demonstrate that the dependency on C can be substantially improved over the state of the art if the decomposition is postponed to the level of the metric entropy or the fatshattering dimension. On the other hand, this impacts negatively the rate of convergence (dependency on m), an indication of the fact that optimizing the dependencies on the three basic parameters amounts to looking for a tradeoff.

9 
DataDependent Analysis of Learning AlgorithmsPhilips, Petra Camilla, petra.philips@gmail.com January 2005 (has links)
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the datadependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.¶
First, we propose an extension of the standard framework for the derivation of
generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour of the worstcase uniform bounds as done in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a “small complexity” of certain random coordinate
projections. We demonstrate how this notion of complexity relates to learnability
and how one can explore geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithmdependent compression schemes and the datadependent luckiness
frameworks, which fall into our random subclass framework.¶
Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the highprobability estimates for the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm.
These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these highprobability convergence rates in a datadependent manner. We provide an algorithm which computes a datadependent upper bound for the expected error of empirical minimizers in terms of the “complexity” of datadependent local subsets. These subsets are sets of functions of empirical errors of a given range and can be
determined based solely on empirical data.
We then show that recent direct estimates, which are essentially sharp estimates on the highprobability convergence rate for the ERM algorithm, can not be recovered universally from empirical data.

10 
A Note on the Generalization Performance of Kernel Classifiers with MarginEvgeniou, Theodoros, Pontil, Massimiliano 01 May 2000 (has links)
We present distribution independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine classifiers (SVM) stem out of this class of machines. The bounds are derived through computations of the $V_gamma$ dimension of a family of loss functions where the SVM one belongs to. Bounds that use functions of margin distributions (i.e. functions of the slack variables of SVM) are derived.

Page generated in 0.1506 seconds