1

Notes on Intersective Polynomials

Gegner, Ethan 27 August 2018 (has links)
No description available.
2

Benefits of Additive Noise in Composing Classes of Functions with Applications to Neural Networks

Fathollah Pour, Alireza January 2022 (has links)
Let F and H be two (compatible) classes of functions. We observe that even when both F and H have small capacities as measured by their uniform covering numbers, the capacity of the composition class H ∘ F = {h ∘ f | f in F, h in H} can become prohibitively large or even unbounded. To this end, in this thesis we provide a framework for controlling the capacity of composition and extend our results to bound the capacity of neural networks.

Composition of Random Classes: We show that adding a small amount of Gaussian noise to the output of F before composing it with H can effectively control the capacity of H ∘ F, offering a general recipe for modular design. To prove our results, we define new notions of uniform covering numbers of random functions with respect to the total variation and Wasserstein distances. The bounds for composition then follow naturally from the data processing inequality.

Capacity of Neural Networks: We instantiate our results for the case of sigmoid neural networks. We start by bounding the covering number of a single-layer noisy neural network by approximating input distributions with mixtures of Gaussians and covering them. Next, we use our composition theorems to propose a novel bound for the covering number of a multi-layer network. This bound does not require a Lipschitz assumption and holds for networks with potentially large weights.

Empirical Investigation of Generalization Bounds: We include preliminary empirical results on the MNIST dataset to compare several covering number bounds based on the generalization bounds they imply. To compare these bounds, we propose a new metric (NVAC) that measures the minimum number of samples required to make the bound non-vacuous. The empirical results indicate that the amount of noise required to improve over existing uniform bounds can be numerically negligible. The source code is available at https://github.com/fathollahpour/composition_noise

Thesis / Master of Science (MSc)

Given two classes of functions with bounded capacity, is there a systematic way to bound the capacity of their composition? We show that this is not generally true. The capacity of a class of functions is a learning-theoretic quantity that may be used to explain its sample complexity and generalization behaviour. In other words, bounding the capacity of a class can be used to ensure that, given enough samples, with high probability the deviation between training and expected errors is small. In this thesis, we show that adding a small amount of Gaussian noise to the output of functions can effectively control the capacity of composition, introducing a general framework for modular design. We instantiate our results for sigmoid neural networks and derive capacity bounds that hold for networks with large weights. Our empirical results show that the amount of Gaussian noise required to improve over existing bounds is negligible.
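As a rough illustration of the noise-injection idea described in this abstract (not code from the thesis or its linked repository; the layer sizes, noise scale sigma, and function names below are assumptions for concreteness), the following NumPy sketch adds zero-mean Gaussian noise to the output of one sigmoid layer before the next layer is applied:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_sigmoid_layer(x, W, b, sigma=0.1, rng=rng):
    """Apply a sigmoid layer, then perturb its output with Gaussian noise.

    Per the abstract, it is this injected noise that allows the composition
    to be analyzed via covering numbers of random functions (total variation
    and Wasserstein) and the data processing inequality.
    """
    out = sigmoid(x @ W + b)
    return out + rng.normal(scale=sigma, size=out.shape)

# Compose two noisy layers, i.e. apply H to the noise-perturbed output of F.
d_in, d_hidden, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_out)), np.zeros(d_out)

x = rng.normal(size=(32, d_in))                 # a batch of 32 inputs
h = noisy_sigmoid_layer(x, W1, b1, sigma=0.1)   # F plus Gaussian noise
y = noisy_sigmoid_layer(h, W2, b2, sigma=0.1)   # H on the noisy output
print(y.shape)                                  # (32, 4)
```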
3

Random parameters in learning: advantages and guarantees

Evzenie Coupkova (18396918) 22 April 2024 (has links)
The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n times, and the best classifier among those n, based on its performance on the training data, is chosen.

We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set, as well as any Boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity.

We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O(ln(n)). In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n. Thus, there is a potentially large gain in generalization properties from selecting parameters at random rather than by optimization.

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much.

We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio for the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family, and we provide guarantees that such an estimate is close to the true value of the Rashomon ratio.
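As an illustration of the classifier family studied in this abstract (a sketch under assumed details, not code from the thesis: the Gaussian choice of random directions and the exhaustive threshold search below are assumptions for concreteness), the following NumPy snippet embeds the data with monomials of order up to k, draws n random projection directions, and keeps the direction/threshold pair with the best training accuracy:

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)

def monomial_embedding(X, k):
    """Map each row of X to all monomials of its coordinates of order <= k."""
    n, d = X.shape
    feats = [np.ones(n)]                                    # order-0 term
    for order in range(1, k + 1):
        for idx in combinations_with_replacement(range(d), order):
            feats.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(feats)

def best_threshold(s, y):
    """Best threshold and sign for the 1-D feature s, by training accuracy."""
    svals = np.unique(s)
    # midpoints between consecutive values, plus one threshold below the minimum
    thresholds = np.concatenate(([svals[0] - 1.0], (svals[:-1] + svals[1:]) / 2.0))
    preds = s[None, :] > thresholds[:, None]          # (n_thresholds, n_samples)
    acc_pos = (preds == y).mean(axis=1)               # predict True above t
    acc_neg = 1.0 - acc_pos                           # sign-flipped classifier
    i, j = np.argmax(acc_pos), np.argmax(acc_neg)
    if acc_pos[i] >= acc_neg[j]:
        return acc_pos[i], thresholds[i], +1
    return acc_neg[j], thresholds[j], -1

def fit_random_projection_classifier(X, y, k=2, n_proj=100, rng=rng):
    """Embed X with monomials up to order k, try n_proj random projections,
    and keep the projection/threshold pair with the best training accuracy."""
    Z = monomial_embedding(X, k)
    best = None
    for _ in range(n_proj):
        w = rng.normal(size=Z.shape[1])               # random direction
        acc, t, sign = best_threshold(Z @ w, y)
        if best is None or acc > best[0]:
            best = (acc, w, t, sign)
    return best                                       # (accuracy, w, t, sign)

def predict(model, X, k=2):
    _, w, t, sign = model
    return sign * (monomial_embedding(X, k) @ w - t) > 0

# Toy usage: two Gaussian blobs in the plane.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([False] * 50 + [True] * 50)
model = fit_random_projection_classifier(X, y, k=2, n_proj=100)
print("training accuracy:", model[0])
```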
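Similarly, the Rashomon ratio discussed in the last two paragraphs can be estimated by Monte Carlo sampling from the classifier family. The sketch below is not from the thesis; the affine family, the Gaussian reference distribution on its parameters, and the loss threshold theta are illustrative assumptions. It counts the fraction of randomly drawn affine classifiers whose empirical 0-1 loss falls below theta:

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_loss(w, b, X, y):
    """0-1 loss of the affine classifier sign(x.w + b) on labels y in {-1, +1}."""
    return np.mean(np.sign(X @ w + b) != y)

def estimate_rashomon_ratio(X, y, theta=0.25, n_samples=5000, rng=rng):
    """Monte Carlo estimate of the Rashomon ratio: the probability, under a
    reference distribution on the classifier family (here standard Gaussian
    weights and bias), that the empirical loss is below theta."""
    d = X.shape[1]
    hits = 0
    for _ in range(n_samples):
        w, b = rng.normal(size=d), rng.normal()
        hits += empirical_loss(w, b, X, y) < theta
    return hits / n_samples

# Toy usage: two well-separated Gaussian classes in the plane.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)
print("estimated Rashomon ratio:", estimate_rashomon_ratio(X, y, theta=0.25))
```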
