1 |
Learning with non-Standard Supervision. Urner, Ruth. January 2013.
Machine learning has enjoyed astounding practical
success in a wide range of applications in recent
years; this practical success often hurries ahead of our
theoretical understanding. The standard framework for machine
learning theory assumes full supervision, that is, training data
consisting of correctly labeled i.i.d. examples from the same task
that the learned classifier is supposed to be applied to.
However, many practical applications successfully make use of
the sheer abundance of data that is currently produced. Such
data may not be labeled or may be collected from various
sources.
The focus of this thesis is to provide theoretical analysis of
machine learning regimes where the learner is given such
(possibly large amounts of) non-perfect training data. In
particular, we investigate the benefits and limitations of
learning with unlabeled data in semi-supervised learning and
active learning, as well as the benefits and limitations of learning
from data that has been generated by a task that is different
from the target task (domain adaptation learning).
For all three settings, we propose
Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain space, and we
discuss our suggested notion by comparing it to other common
data assumptions.
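For intuition, here is a small sketch (an illustration of the general flavor, not the thesis's formal definition) that measures an empirical analogue of Probabilistic Lipschitzness on a labeled sample: the fraction of points that have an oppositely labeled neighbor within distance lam. The function name and the nearest-neighbor plug-in estimator are assumptions made for this example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def empirical_prob_lipschitz(X, y, lam):
    """Fraction of sample points with a differently labeled point within
    distance lam -- a plug-in proxy for the probability of label change
    at scale lam. Illustrative only; the thesis's Probabilistic
    Lipschitzness is a distributional assumption, not a sample statistic."""
    nn = NearestNeighbors(radius=lam).fit(X)
    neighbors = nn.radius_neighbors(X, return_distance=False)
    violations = 0
    for i, idx in enumerate(neighbors):
        idx = idx[idx != i]  # exclude the point itself
        if np.any(y[idx] != y[i]):
            violations += 1
    return violations / len(X)

# Two well-separated Gaussian blobs: almost no close opposite-label pairs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
print(empirical_prob_lipschitz(X, y, lam=0.5))
```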
|
2 |
A Comprehensive Analysis of Deep Learning for Interference Suppression, Sample and Model Complexity in Wireless Systems. Oyedare, Taiwo Remilekun. 12 March 2024.
The wireless spectrum is limited, and the demand for its use is increasing due to technological advancements in wireless communication, resulting in persistent interference issues. Despite progress in addressing interference, it remains a challenge to effective spectrum usage, particularly in the use of license-free and managed shared bands and other opportunistic spectrum access solutions. Therefore, efficient and interference-resistant spectrum usage schemes are critical. In the past, most interference solutions have relied on avoidance techniques and expert-system-based mitigation approaches. Recently, researchers have utilized artificial intelligence/machine learning techniques at the physical (PHY) layer, particularly deep learning, which suppresses or compensates for the interfering signal rather than simply avoiding it. In addition, deep learning has been utilized in recent years to address various difficult problems in wireless communications, such as transmitter classification, interference classification, and modulation recognition, among others. To this end, this dissertation presents a thorough analysis of deep learning techniques for interference classification and suppression, and it thoroughly examines the complexity issues (sample and model) that arise from using deep learning. First, we address the knowledge gap in the literature with respect to the state of the art in deep learning-based interference suppression. To account for the limitations of deep learning-based interference suppression techniques, we discuss several challenges, including a lack of interpretability, the stochastic nature of the wireless channel, issues with open set recognition (OSR), and challenges with implementation. We also provide a technical discussion of the prominent deep learning algorithms proposed in the literature and offer guidelines for their successful implementation. Next, we investigate convolutional neural network (CNN) architectures for interference and transmitter classification tasks. In particular, we utilize a CNN architecture to classify interference, investigate the model complexity of CNN architectures for classifying homogeneous and heterogeneous devices, and examine their impact on test accuracy. Next, we explore issues of sample size and sample quality in the training data for deep learning. In doing so, we also propose a rule of thumb for transmitter classification using CNNs, based on the findings from our sample complexity study. Finally, in cases where interference cannot be avoided, it is important to suppress it. To achieve this, we build upon autoencoder work from other fields to design a convolutional neural network (CNN)-based autoencoder model to suppress interference, thereby ensuring the coexistence of different wireless technologies in both licensed and unlicensed bands. / Doctor of Philosophy / Wireless communication has advanced greatly in recent years, but it is still hard to use the limited amount of available spectrum without interference from other devices. In the past, researchers tried to avoid interference using expert systems. Now, researchers are using artificial intelligence and machine learning, particularly deep learning, to mitigate interference in a different way. Deep learning has also been used to solve other tough problems in wireless communication, such as classifying the type of device transmitting a signal, classifying the signal itself, or avoiding interference.
This dissertation presents a comprehensive review of deep learning techniques for reducing interference in wireless communication. It also leverages a deep learning model called a convolutional neural network (CNN) to classify interference and investigates how the complexity of the CNN affects its performance. It then looks at the relationship between model performance and dataset size (i.e., sample complexity) in wireless communication. Finally, it discusses a CNN-based autoencoder technique to suppress interference in a digital amplitude-phase modulation system. All of these techniques are important for making sure different wireless technologies can work together in both licensed and unlicensed bands.
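As a rough sketch of the kind of CNN-based autoencoder suppression described above, the following minimal PyTorch model (a hypothetical architecture, not the dissertation's) learns to map an interference-corrupted two-channel I/Q signal back to the clean signal:

```python
import torch
import torch.nn as nn

class InterferenceSuppressor(nn.Module):
    """Minimal 1-D conv autoencoder: input is a 2-channel (I/Q) signal
    corrupted by interference; the training target is the clean signal."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=7, stride=2,
                               padding=3, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, 2, kernel_size=7, stride=2,
                               padding=3, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = InterferenceSuppressor()
clean = torch.randn(8, 2, 256)                     # stand-in for clean I/Q frames
corrupted = clean + 0.5 * torch.randn_like(clean)  # additive interference
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):                                # training loop sketch
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(corrupted), clean)
    loss.backward()
    opt.step()
```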
|
3 |
SELEÇÃO DE VARIÁVEIS NA MINERAÇÃO DE DADOS AGRÍCOLAS: Uma abordagem baseada em análise de componentes principais (Variable Selection in Agricultural Data Mining: An Approach Based on Principal Component Analysis). Oliveira Jr., Juscelino Izidoro de. 30 July 2012.
Multivariate data analysis allows the researcher to verify the interaction among many attributes that can influence the behavior of a response variable. Such analysis uses models that can be induced from experimental data sets. An important issue in the induction of multivariate regressors and classifiers is the sample size, because it determines the reliability of the model for tasks of regression or classification of the response variable. This work approaches the sample size issue through the theory of Probably Approximately Correct learning, which arose from machine learning problems concerning the induction of models. Given the importance of agricultural modelling, this work presents two procedures for selecting variables. Variable Selection by Principal Component Analysis is an unsupervised procedure that allows the researcher to select the most relevant variables from agricultural data by considering the variation in the data. Variable Selection by Supervised Principal Component Analysis is a supervised procedure that allows the researcher to perform the same process as the previous procedure, but concentrates the selection on the variables with the most influence on the behavior of the response variable. Both procedures allow sample complexity information to be exploited in the variable selection process. The procedures were tested in five experiments, showing that the supervised procedure induced models that produced better scores, on average, than models induced over variables selected by the unsupervised procedure. The experiments also showed that the variables selected by both procedures exhibited reduced indices of multicollinearity.
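A minimal sketch of the unsupervised variant is below; the loading-based scoring rule is an assumption made for illustration, not necessarily the thesis's exact criterion:

```python
import numpy as np
from sklearn.decomposition import PCA

def select_by_pca(X, n_components=2, n_select=3):
    """Score each variable by the variance-weighted magnitude of its
    loadings on the leading principal components, then keep the top ones.
    An illustrative criterion, not the thesis's exact rule."""
    pca = PCA(n_components=n_components).fit(X)
    # components_: (n_components, n_features); weight by explained variance
    scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return np.argsort(scores)[::-1][:n_select]

rng = np.random.default_rng(1)
base = rng.normal(size=(100, 2))
# 5 variables: two informative directions, one near-copy, and pure noise
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.1 * rng.normal(size=100),
                     rng.normal(size=100), rng.normal(size=100)])
print(select_by_pca(X))  # indices of the selected variables
```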
|
4 |
STRUCTURED PREDICTION: STATISTICAL AND COMPUTATIONAL GUARANTEES IN LEARNING AND INFERENCE. Kevin Segundo Bello Medina. 28 July 2021.
Structured prediction consists of receiving a structured input and producing a combinatorial structure such as trees, clusters, networks, sequences, or permutations. From the computational viewpoint, structured prediction is in general considered intractable because the size of the output space is exponential in the input size. For instance, in image segmentation tasks, the number of admissible segmentations is exponential in the number of pixels. A second factor is the input dimensionality combined with the amount of available data. In structured prediction it is common for the input to live in a high-dimensional space, which involves jointly reasoning about thousands or millions of variables while contending with a limited amount of data. Thus, learning and inference methods with strong computational and statistical guarantees are desired. The focus of our research is to propose principled methods for structured prediction that are both polynomial time, i.e., computationally efficient, and require only a polynomial number of data samples, i.e., statistically efficient.

The main contributions of this thesis are as follows:

(i) We develop an efficient and principled learning method for latent variable models in structured prediction under Gaussian perturbations. We derive a Rademacher-based generalization bound and argue that the use of non-convex formulations in learning latent-variable models leads to tighter bounds on the Gibbs decoder distortion.

(ii) We study the fundamental limits of structured prediction, i.e., we characterize the sample complexity necessary for learning factor graph models in the context of structured prediction. In particular, we show that finiteness of our novel MaxPair-dimension is necessary for learning. Lastly, we show a connection between the MaxPair-dimension and the VC-dimension, which allows existing results on the VC-dimension to be used to calculate the MaxPair-dimension.

(iii) We analyze a generative model based on connected graphs and find the structural conditions on the graph that allow exact recovery of the node labels. In particular, we show that exact recovery is achievable in polynomial time for a large class of graphs. Our analysis is based on convex relaxations, where we thoroughly analyze a semidefinite program and a degree-4 sum-of-squares program. Finally, we extend this model to incorporate linear constraints (e.g., fairness) and formally explain the effect of the added constraints on the probability of exact recovery.
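To make the exponential-output-space point concrete, here is a standard illustration (my own example, not from the thesis): in sequence labeling the output space has K^T elements, yet when the score factorizes over adjacent positions, Viterbi dynamic programming finds the best labeling in O(T·K²) time.

```python
import numpy as np

def viterbi(unary, pairwise):
    """Max-scoring label sequence under a chain-factorized score.
    unary: (T, K) per-position scores; pairwise: (K, K) transition scores.
    Searches the K**T sequences implicitly in O(T * K^2) time."""
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise + unary[t][None, :]  # (K, K)
        back[t] = cand.argmax(axis=0)   # best predecessor for each label
        score = cand.max(axis=0)
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):       # backtrack
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(6, 3)), rng.normal(size=(3, 3))))
```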
|
5 |
CONTINUOUS RELAXATION FOR COMBINATORIAL PROBLEMS - A STUDY OF CONVEX AND INVEX PROGRAMS. Adarsh Barik. 27 April 2023.
In this thesis, we study optimization problems that have a combinatorial aspect to them. The search space for such problems quickly grows exponentially with respect to the problem dimension, so exhaustive search becomes intractable and good relaxations are needed to solve combinatorial problems efficiently. Another challenge arises from the high dimensionality of such problems and the lack of a large number of samples. Our aim is to come up with innovative approaches that solve the problem in polynomial time and with polynomial sample complexity. We discuss three combinatorial optimization problems and provide continuous relaxations for them. Our continuous relaxations involve both convex and nonconvex (invex) relaxations. Furthermore, we provide efficient first-order algorithms to solve a general class of invex problems with provable convergence rate guarantees. The three combinatorial problems we study in this work are: learning the directed structure of a Bayesian network from blackbox data, fair sparse regression on a biased dataset where the bias depends on a hidden binary attribute, and mixed linear regression. We propose a convex relaxation for the first problem, while the other two are solved using invex relaxations. For the first problem, we introduce a novel notion of low-rank representation of conditional probability tables for a Bayesian network and connect it to the Fourier transform of real-valued set functions to recover the exact structure of the Bayesian network. For the second problem, we propose a novel invex relaxation for the combinatorial version of sparse linear regression with fairness. For the final problem, we again use invex relaxation to learn a mixture of sparse linear regression models. We formally show the correctness of our proposed methods and provide provable guarantees of efficient computational and sample complexity. We also develop efficient first-order algorithms to solve invex problems and provide convergence rate analyses for our proposed methods. Finally, we discuss possible future research directions and the problems we want to tackle next.
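For intuition about why invex relaxations play well with first-order methods, a toy example (mine, not the thesis's): a differentiable function is invex exactly when every stationary point is a global minimizer, so plain gradient descent cannot be trapped at a spurious stationary point. The function log(1 + x²) below is nonconvex yet invex, since its only stationary point, x = 0, is its global minimum.

```python
import numpy as np

def f(x):
    return np.log(1.0 + x * x)        # nonconvex but invex

def grad_f(x):
    return 2.0 * x / (1.0 + x * x)

# Plain gradient descent: on an invex function, reaching a stationary
# point is enough -- it must be a global minimizer.
x, lr = 5.0, 0.5
for _ in range(200):
    x -= lr * grad_f(x)
print(x, f(x))                        # x approaches 0, f approaches 0
```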
|
6 |
PROBABLY APPROXIMATELY CORRECT BOUNDS FOR ESTIMATING MARKOV TRANSITION KERNELS. Imon Banerjee. 06 December 2023.
<p dir="ltr">This thesis presents probably approximately correct (PAC) bounds on estimates of the transition kernels of Controlled Markov chains (CMC’s). CMC’s are a natural choice for modelling various industrial and medical processes, and are also relevant to reinforcement learning (RL). Learning the transition dynamics of CMC’s in a sample efficient manner is an important question that is open. This thesis aims to close this gap in knowledge in literature.</p><p dir="ltr">In Chapter 2, we lay the groundwork for later chapters by introducing the relevant concepts and defining the requisite terms. The two subsequent chapters focus on non-parametric estimation. </p><p dir="ltr">In Chapter 3, we restrict ourselves to a finitely supported CMC with d states and k controls and produce a general theory for minimax sample complexity of estimating the transition matrices.</p><p dir="ltr">In Chapter 4 we demonstrate the applicability of this theory by using it to recover the sample complexities of various controlled Markov chains, as well as RL problems.</p><p dir="ltr">In Chapter 5 we move to a continuous state and action spaces with compact supports. We produce a robust, non-parametric test to find the best histogram based estimator of the transition density; effectively reducing the problem into one of model selection based on empricial processes.</p><p dir="ltr">Finally, in Chapter 6 we move to a parametric and Bayesian regime, and restrict ourselves to Markov chains. Under this setting we provide a PAC-Bayes bound for estimating model parameters under tempered posteriors.</p>
|
7 |
Model Based Reinforcement Learning for Automatic Tuning of Cavity Filters / Modellbaserad förstärkningsinlärning för automatisk inställning av filter. Nimara, Doumitrou Daniil. January 2021.
As telecommunication continues to develop, the demand for mass production of well calibrated Base Transceiver Station (BTS) components increases. Cavity filters are an essential part of every BTS; however, manufacturing tolerances often lead to detuned filters that require costly post-production fine tuning. Model-free reinforcement learning has been proposed to automate this process; however, the agents are not sample efficient. This is especially problematic, as agent training with newer, more precise environment simulators is time demanding. This work aims to leverage model-based reinforcement learning to improve sample efficiency while maintaining the same degree of accuracy. To this end, we evaluate and improve upon the performance of three state-of-the-art methods from the literature. The proposed modifications to these methods can serve as a template for their application to other high-dimensional, non-image data problems. In particular, the proposed modification of the Dreamer is modular, improves training stability, and greatly decreases sample complexity. More specifically, sample complexity was reduced by a factor of 4 for the 6p2z filter and by a factor of 10 for the 8p4z filter. Furthermore, a hyperparameter sensitivity analysis is provided to add extra insight into each approach. Overall, the results facilitate further research in this field: the reduced sample complexity opens the possibility of training on more accurate simulators of more complicated filters, which would previously have been intractable due to the high number of samples required.
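A minimal sketch of the model-based idea, with made-up linear dynamics standing in for the filter simulator (none of the thesis's environment details are used): learn a dynamics model from a few real transitions, then generate cheap imagined rollouts from it, which is where the sample-efficiency gain comes from.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def true_env_step(state, action):
    """Stand-in for an expensive simulator (unknown to the agent)."""
    return 0.9 * state + 0.1 * action + 0.01 * rng.normal(size=state.shape)

# 1. Collect a small batch of real transitions.
states = rng.normal(size=(50, 4))
actions = rng.normal(size=(50, 2))
next_states = np.array([true_env_step(s, a) for s, a in zip(states, actions)])

# 2. Fit a dynamics model on the real data.
model = LinearRegression().fit(np.hstack([states, actions]), next_states)

# 3. Imagined rollout: many cheap model steps per real sample.
s = rng.normal(size=4)
for _ in range(20):
    a = rng.normal(size=2)                       # placeholder policy
    s = model.predict(np.hstack([s, a])[None])[0]
print(s)
```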
|
8 |
Optimization tools for non-asymptotic statistics in exponential families. Le Priol, Rémi. 04 1900.
Exponential families are a ubiquitous class of models in statistics.
On the one hand, they can model any data type.
In fact, the most common distributions are exponential families: Gaussian, categorical, Poisson, Gamma, Wishart, and Dirichlet.
On the other hand, they sit at the core of generalized linear models (GLM), a foundational class of models in machine learning.
They are also supported by beautiful mathematics thanks to their connection with convex duality and the Laplace transform.
This beauty is definitely responsible for the existence of this thesis.
In this manuscript, we make three contributions at the intersection of optimization and statistics, all revolving around exponential families.
The first contribution adapts and improves a variance-reduction optimization algorithm called stochastic dual coordinate ascent (SDCA) to train a particular class of GLMs called conditional random fields (CRFs). CRFs are one of the cornerstones of structured prediction. CRFs were notoriously hard to train until the advent of variance reduction techniques, and our improved version of SDCA performs favorably compared to the previous state of the art.
The second contribution focuses on causal discovery.
Exponential families are widely used in graphical models, and in particular in causal graphical models.
This contribution investigates a specific conjecture that gained some traction in previous work: causal models adapt faster to perturbations of the environment.
Using results from optimization, we find strong support for this assumption when the perturbation comes from an intervention on a cause, and evidence against it when the perturbation comes from an intervention on an effect.
These pieces of evidence call for a refinement of the conjecture.
The third contribution addresses a fundamental property of exponential families.
One of the most appealing properties of exponential families is the closed form of the maximum likelihood estimate (MLE) and of the maximum a posteriori (MAP) estimate for a natural choice of conjugate prior. These two estimators are used almost everywhere, often unknowingly: how often are the mean and variance computed for bell-shaped data without a thought for the underlying Gaussian model?
Nevertheless, the literature to date lacks results on the finite-sample convergence, measured in Kullback-Leibler (KL) divergence, of these estimators to the true distribution.
Drawing on a parallel with optimization, we take some steps towards such a result, and we highlight directions for progress both in statistics and optimization.
These three contributions all put tools from optimization at the service of statistics in exponential families: improving an algorithm for learning GLMs for structured prediction, characterizing the adaptation speed of causal models, and estimating the learning speed of ubiquitous estimators.
By tying together optimization and statistics, this thesis is taking a step towards a better understanding of the fundamentals of machine learning.
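As a small illustration of the closed-form estimators discussed in the third contribution, here are the standard textbook formulas for the Bernoulli model with its conjugate Beta prior, together with the KL divergence used as the error measure (the prior constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3
x = rng.binomial(1, theta_true, size=50)   # Bernoulli sample

# Closed-form estimators for the Bernoulli exponential family
mle = x.mean()
alpha, beta = 2.0, 2.0                      # conjugate Beta(2, 2) prior
map_est = (x.sum() + alpha - 1) / (len(x) + alpha + beta - 2)

def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) -- the discrepancy measure discussed above."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

print(mle, map_est)
print(kl_bernoulli(theta_true, mle), kl_bernoulli(theta_true, map_est))
```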
|
9 |
Random parameters in learning: advantages and guarantees. Evzenie Coupkova. 22 April 2024.
<p dir="ltr">The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n-times and the best classifier among those n, based on its performance on training data, is chosen. </p><p dir="ltr">We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set as well as any Boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity. </p><p dir="ltr">We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O(ln(n)). In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n. Thus, there is a potentially large gain in generalization properties by selecting parameters at random, rather than optimization. </p><p dir="ltr">Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. </p><p dir="ltr">We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.</p>
|