• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • 1
  • Tagged with
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

On sparse representations and new meta-learning paradigms for representation learning

Mehta, Nishant A. 27 August 2014 (has links)
Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning is fundamental to machine learning, and it translates to learning to learn itself. The presentation unfolds in two parts. In the first part, we establish learning theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning. On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more original dimensions than learned features) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso that shows the stability of the solution vector to design matrix perturbations. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries. The second part introduces new meta-learning paradigms to realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks. Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML). Regarding minimax MTL, we provide a high probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the task average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
2

Data-Dependent Analysis of Learning Algorithms

Philips, Petra Camilla, petra.philips@gmail.com January 2005 (has links)
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.¶ First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour of the worst-case uniform bounds as done in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a “small complexity” of certain random coordinate projections. We demonstrate how this notion of complexity relates to learnability and how one can explore geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithm-dependent compression schemes and the data-dependent luckiness frameworks, which fall into our random subclass framework.¶ Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the high-probability estimates for the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm. These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these high-probability convergence rates in a data-dependent manner. We provide an algorithm which computes a data-dependent upper bound for the expected error of empirical minimizers in terms of the “complexity” of data-dependent local subsets. These subsets are sets of functions of empirical errors of a given range and can be determined based solely on empirical data. We then show that recent direct estimates, which are essentially sharp estimates on the high-probability convergence rate for the ERM algorithm, can not be recovered universally from empirical data.

Page generated in 0.0935 seconds