Global ETD Search

61	Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo Wei Deng (11804435) 18 December 2021 (has links) The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification, which boils down to a non-convex Bayesian learning problem. A standard tool to handle the problem is Langevin Monte Carlo, which proposes to approximate the posterior distribution with theoretical guarantees. However, non-convex Bayesian learning in real big data applications can be arbitrarily slow and often fails to capture the uncertainty or informative modes given a limited time. As a result, advanced techniques are still required. In this thesis, we start with the replica exchange Langevin Monte Carlo (also known as parallel tempering), which is a Markov jump process that proposes appropriate swaps between exploration and exploitation to achieve accelerations. However, the naive extension of swaps to big data problems leads to a large bias, and bias-corrected swaps are required. Such a mechanism leads to few effective swaps and insignificant accelerations. To alleviate this issue, we first propose a control variates method to reduce the variance of noisy energy estimators and show a potential to accelerate the exponential convergence. We also present the population-chain replica exchange and propose a generalized deterministic even-odd scheme to track the non-reversibility and obtain an optimal round trip rate. Further approximations are conducted based on stochastic gradient descents, which yield a user-friendly nature for large-scale uncertainty approximation tasks without much tuning costs. In the second part of the thesis, we study scalable dynamic importance sampling algorithms based on stochastic approximation. 
Traditional dynamic importance sampling algorithms have achieved successes in bioinformatics and statistical physics; however, their lack of scalability has greatly limited their extension to big data applications. To handle this scalability issue, we resolve the vanishing gradient problem and propose two dynamic importance sampling algorithms based on stochastic gradient Langevin dynamics. Theoretically, we establish the stability condition for the underlying ordinary differential equation (ODE) system and guarantee the asymptotic convergence of the latent variable to the desired fixed point. Interestingly, such a result still holds given non-convex energy landscapes. In addition, we propose a pleasingly parallel version of such algorithms with interacting latent variables. We show that the interacting algorithm can be theoretically more efficient than the single-chain alternative with an equivalent computational budget. Statistics Stochastic Analysis and Modelling Monte Carlo Algorithm Artificial intelligence Importance sampling Computer vision Langevin Dynamics Variance reduction techniques Wang-Landau algorithm Interacting particles Hamiltonian Monte Carlo Log-Sobolev inequality Metropolis Hasting Deep neural network Stochastic variance-reduced gradient Wasserstein distance Convolutional neural network Deterministic even odd scheme Non-reversibility Stochastic approximation Monte Carlo Stochastic differential equation Stochastic gradient descent Parallel tempering Stochastic approximation Replica exchange Stochastic gradient Langevin dynamics Markov Chain Monte Carlo
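For readers unfamiliar with the baseline that record 61 builds on, the following is a minimal sketch of plain stochastic gradient Langevin dynamics (SGLD). It is not the thesis's replica exchange or dynamic importance sampling algorithm. The toy model (data drawn from N(theta, 1) with a flat prior), the function name `sgld_gaussian_mean`, and all step-size and batch-size values are illustrative assumptions.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=5000, step=1e-3, batch=32, seed=0):
    """Illustrative SGLD sampler for theta under x_i ~ N(theta, 1), flat prior.

    The energy is U(theta) = sum_i (theta - x_i)^2 / 2, so the exact
    posterior is N(mean(data), 1 / len(data)).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Draw a minibatch with replacement and rescale so the gradient
        # estimate is unbiased for the full-data gradient of U.
        minibatch = [data[rng.randrange(n)] for _ in range(batch)]
        grad = (n / batch) * sum(theta - x for x in minibatch)
        # Langevin update: gradient step plus Gaussian noise of variance 2 * step.
        theta = theta - step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


# Usage: synthetic data centred at 2.0 (an arbitrary choice for illustration).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
samples = sgld_gaussian_mean(data)
kept = samples[500:]  # discard an initial burn-in segment
posterior_mean_estimate = sum(kept) / len(kept)
```

With a constant step size the minibatch noise slightly inflates the sampled variance, which is one reason the variance reduction techniques named in the abstract matter in practice.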
62	Approximate Dynamic Programming and Reinforcement Learning - Algorithms, Analysis and an Application Lakshminarayanan, Chandrashekar January 2015 (has links) (PDF) Problems involving optimal sequential decision making in uncertain dynamic systems arise in domains such as engineering, science and economics. Such problems can often be cast in the framework of the Markov Decision Process (MDP). Solving an MDP requires computing the optimal value function and the optimal policy. Dynamic programming (DP) and the Bellman equation (BE) are at the heart of solution methods. The three important exact DP methods are value iteration, policy iteration and linear programming. The exact DP methods compute the optimal value function and the optimal policy. However, they are inadequate in practice because the state space is often large, and one might have to resort to approximate methods that compute sub-optimal policies. Further, in certain cases, the system observations are known only in the form of noisy samples and we need to design algorithms that learn from these samples. In this thesis we study interesting theoretical questions pertaining to approximate and learning algorithms, and also present an interesting application of MDPs in the domain of crowd sourcing. Approximate Dynamic Programming (ADP) methods handle the issue of large state space by computing an approximate value function and/or a sub-optimal policy. In this thesis, we are concerned with conditions that result in provably good policies. Motivated by the limitations of the projected Bellman equation (PBE) in the conventional linear algebra, we study the PBE in the (min, +) linear algebra. It is a well known fact that deterministic optimal control problems with cost/reward criterion are (min, +)/(max, +) linear and ADP methods have been developed for such systems in the literature. However, it is straightforward to show that infinite horizon discounted reward/cost MDPs are neither (min, +) nor (max, +) linear. 
We develop novel ADP schemes, namely the Approximate Q Iteration (AQI) and Variational Approximate Q Iteration (VAQI), where the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. We show that the new ADP methods are convergent and we present a bound on the performance of the sub-optimal policy. The Approximate Linear Program (ALP) makes use of linear function approximation (LFA) and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to the presence of a large number of constraints and in practice, a reduced linear program (RLP) is solved instead. The RLP has a tractable number of constraints sampled from the original constraints of the ALP. Though the RLP is known to perform well in experiments, theoretical guarantees are available only for a specific RLP obtained under idealized assumptions. In this thesis, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution here is the novel theoretical framework developed to obtain error bounds for any given GRLP. Reinforcement Learning (RL) algorithms can be viewed as sample trajectory based solution methods for solving MDPs. Typically, RL algorithms that make use of stochastic approximation (SA) are iterative schemes taking small steps towards the desired value at each iteration. Actor-Critic algorithms form an important sub-class of RL algorithms, wherein the critic is responsible for policy evaluation and the actor is responsible for policy improvement. The actor and critic iterations have different step-size schedules; in particular, the step-sizes used by the actor updates generally have to be much smaller than those used by the critic updates. 
Such SA schemes that use different step-size schedules for different sets of iterates are known as multi-timescale stochastic approximation schemes. One of the most important conditions required to ensure the convergence of the iterates of a multi-timescale SA scheme is that the iterates need to be stable, i.e., they should be uniformly bounded almost surely. However, the conditions that imply the stability of the iterates in a multi-timescale SA scheme have not been well established. In this thesis, we provide verifiable conditions that imply stability of two timescale stochastic approximation schemes. As an example, we also demonstrate that the stability of a widely used actor-critic RL algorithm follows from our analysis. Crowd sourcing (crowd) is a new mode of organizing work in multiple groups of smaller chunks of tasks and outsourcing them to a distributed and large group of people in the form of an open call. Recently, crowd sourcing has become a major pool for human intelligence tasks (HITs) such as image labeling, form digitization, natural language processing, machine translation evaluation and user surveys. Large organizations/requesters are increasingly interested in crowd sourcing the HITs generated out of their internal requirements. Task starvation leads to huge variation in the completion times of the tasks posted on to the crowd. This is an issue for frequent requesters desiring predictability in the completion times of tasks specified in terms of percentage of tasks completed within a stipulated amount of time. An important task attribute that affects the completion time of a task is its price. However, a pricing policy that does not take the dynamics of the crowd into account might fail to achieve the desired predictability in completion times. Here, we make use of the MDP framework to compute a pricing policy that achieves predictable completion times in simulations as well as real world experiments. 
Dynamic Programming (DP) Markov Decision Process (MDP) Bellman Equation CBE Machine Learning Bellman Operator Crowdsourcing Approximate Linear Programming (ALP) Reinforcement Learning Stochastic Approximation Approximate Dynamic Programming (ADP) Approximate Linear Program Linear Function Approximation (LFA) Reduced Linear Program (RLP) Crowd Sourcing Computer Science and Automation
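Record 62 names value iteration as one of the three exact DP methods. A minimal sketch of it on a toy problem may help make the Bellman equation concrete. The two-state MDP, the function name `value_iteration`, and the discount factor 0.9 are assumptions chosen for illustration and do not come from the thesis.

```python
def value_iteration(P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Illustrative value iteration.

    P[s][a] is a list of (probability, next_state, reward) outcomes for
    taking action a in state s. Returns (values, greedy_policy).
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a, values):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in P[s][a])

    for _ in range(max_iter):
        # Bellman optimality backup applied to every state.
        V_new = [max(q_value(s, a, V) for a in range(len(P[s])))
                 for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break

    policy = [max(range(len(P[s])), key=lambda a, s=s: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Toy MDP (hypothetical): action 0 = stay, action 1 = move to the other state.
# Only staying in state 1 pays a reward of 1.
toy_mdp = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # state 1
]
values, policy = value_iteration(toy_mdp)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and moving there from state 0 is worth 0.9 × 10 = 9. The state space here is tiny, and the abstract's point is that this exact backup becomes impractical when the state space is large.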
63	Stochastic approximation and least-squares regression, with applications to machine learning / Approximation stochastique et régression par moindres carrés : applications en apprentissage automatique Flammarion, Nicolas 24 July 2017 (has links) De multiples problèmes en apprentissage automatique consistent à minimiser une fonction lisse sur un espace euclidien. Pour l’apprentissage supervisé, cela inclut les régressions par moindres carrés et logistique. Si les problèmes de petite taille sont résolus efficacement avec de nombreux algorithmes d’optimisation, les problèmes de grande échelle nécessitent en revanche des méthodes du premier ordre issues de la descente de gradient. Dans ce manuscrit, nous considérons le cas particulier de la perte quadratique. Dans une première partie, nous nous proposons de la minimiser grâce à un oracle stochastique. Dans une seconde partie, nous considérons deux de ses applications à l’apprentissage automatique : au partitionnement de données et à l’estimation sous contrainte de forme. La première contribution est un cadre unifié pour l’optimisation de fonctions quadratiques non-fortement convexes. Celui-ci comprend la descente de gradient accélérée et la descente de gradient moyennée. Ce nouveau cadre suggère un algorithme alternatif qui combine les aspects positifs du moyennage et de l’accélération. La deuxième contribution est d’obtenir le taux optimal d’erreur de prédiction pour la régression par moindres carrés en fonction de la dépendance au bruit du problème et à l’oubli des conditions initiales. Notre nouvel algorithme est issu de la descente de gradient accélérée et moyennée. La troisième contribution traite de la minimisation de fonctions composites, somme de l’espérance de fonctions quadratiques et d’une régularisation convexe. Nous étendons les résultats existants pour les moindres carrés à toute régularisation et aux différentes géométries induites par une divergence de Bregman. 
Dans une quatrième contribution, nous considérons le problème du partitionnement discriminatif. Nous proposons sa première analyse théorique, une extension parcimonieuse, son extension au cas multi-labels et un nouvel algorithme ayant une meilleure complexité que les méthodes existantes. La dernière contribution de cette thèse considère le problème de la sériation. Nous adoptons une approche statistique où la matrice est observée avec du bruit et nous étudions les taux d’estimation minimax. Nous proposons aussi un estimateur computationellement efficace. / Many problems in machine learning are naturally cast as the minimization of a smooth function defined on a Euclidean space. For supervised learning, this includes least-squares regression and logistic regression. While small problems are efficiently solved by classical optimization algorithms, large-scale problems are typically solved with first-order techniques based on gradient descent. In this manuscript, we consider the particular case of the quadratic loss. In the first part, we are interested in its minimization when its gradients are only accessible through a stochastic oracle. In the second part, we consider two applications of the quadratic loss in machine learning: clustering and estimation with shape constraints. In the first main contribution, we provide a unified framework for optimizing non-strongly convex quadratic functions, which encompasses accelerated gradient descent and averaged gradient descent. This new framework suggests an alternative algorithm that exhibits the positive behavior of both averaging and acceleration. The second main contribution aims at obtaining the optimal prediction error rates for least-squares regression, both in terms of dependence on the noise of the problem and of forgetting the initial conditions. Our new algorithm rests upon averaged accelerated gradient descent. 
The third main contribution deals with minimization of composite objective functions composed of the expectation of quadratic functions and a convex function. We extend earlier results on least-squares regression to any regularizer and any geometry represented by a Bregman divergence. As a fourth contribution, we consider the discriminative clustering framework. We propose its first theoretical analysis, a novel sparse extension, a natural extension for the multi-label scenario and an efficient iterative algorithm with better running-time complexity than existing methods. The fifth main contribution deals with the seriation problem. We propose a statistical approach to this problem where the matrix is observed with noise and study the corresponding minimax rate of estimation. We also suggest a computationally efficient estimator whose performance is studied both theoretically and experimentally. Optimisation convexe Accélération Moyennage Gradient stochastique Régression par moindres carrés Approximation stochastique Algorithme dual moyenné Descente miroire Partionnement discriminatif Relaxation convexe Parcimonie Sériation statistique Apprentissage de permutation Estimation minimax Contraintes de forme Convex optimization Acceleration Averaging Stochastic gradient Least-squares regression Stochastic approximation Dual averaging Mirror descent Discriminative clustering Convex relaxation Sparsity Statistical seriation Permutation learning Minimax estimation Shape constraints 519
64	Langevinized Ensemble Kalman Filter for Large-Scale Dynamic Systems Peiyi Zhang (11166777) 26 July 2021 (has links) The Ensemble Kalman filter (EnKF) has achieved great successes in data assimilation in atmospheric and oceanic sciences, but its failure to converge to the right filtering distribution precludes its use for uncertainty quantification. Other existing methods, such as the particle filter or the sequential importance sampler, do not scale well to the dimension of the system and the sample size of the datasets. In this dissertation, we address these difficulties in a coherent way. In the first part of the dissertation, we reformulate the EnKF under the framework of Langevin dynamics, which leads to a new particle filtering algorithm, the so-called Langevinized EnKF (LEnKF). The LEnKF algorithm inherits the forecast-analysis procedure from the EnKF and the use of mini-batch data from the stochastic gradient Langevin-type algorithms, which make it scalable with respect to both the dimension and sample size. We prove that the LEnKF converges to the right filtering distribution in Wasserstein distance under the big data scenario that the dynamic system consists of a large number of stages and has a large number of samples observed at each stage, and thus it can be used for uncertainty quantification. We reformulate the Bayesian inverse problem as a dynamic state estimation problem based on the techniques of subsampling and Langevin diffusion process. We illustrate the performance of the LEnKF using a variety of examples, including the Lorenz-96 model, high-dimensional variable selection, Bayesian deep learning, and Long Short-Term Memory (LSTM) network learning with dynamic data. In the second part of the dissertation, we focus on two extensions of the LEnKF algorithm. Like the EnKF, the LEnKF algorithm was developed for Gaussian dynamic systems containing no unknown parameters. 
We propose the so-called stochastic approximation-LEnKF (SA-LEnKF) for simultaneously estimating the states and parameters of dynamic systems, where the parameters are estimated on the fly based on the state variables simulated by the LEnKF under the framework of stochastic approximation. Under mild conditions, we prove the consistency of the resulting parameter estimator and the ergodicity of the SA-LEnKF. For non-Gaussian dynamic systems, we extend the LEnKF algorithm (Extended LEnKF) by introducing a latent Gaussian measurement variable to dynamic systems. Those two extensions inherit the scalability of the LEnKF algorithm with respect to the dimension and sample size. The numerical results indicate that they outperform other existing methods in both state/parameter estimation and uncertainty quantification. Statistics Dynamical Systems in Applications Applied Statistics Data Assimilation Inverse Problems State Space Model Dynamic System Markov chain Monte Carlo Uncertainty Quantification Ensemble Kalman Filter Stochastic Approximation Long Short Term Memory Networks Dynamic Poisson Model Embedding Latent Representation
65	Network Utility Maximization Based on Information Freshness Cho-Hsin Tsai (12225227) 20 April 2022 (has links) It is predicted that there will be 41.6 billion IoT devices by 2025, which has kindled new interest in the timing coordination between sensors and controllers, i.e., how to use the waiting time to improve the performance. Sun et al. showed that a controller can strictly improve the data freshness, the so-called Age-of-Information (AoI), via careful scheduling designs. The optimal waiting policy for the sensor side was later characterized in the context of remote estimation. The first part of this work develops the jointly optimal sensor/controller waiting policy. It generalizes the above two important results in that not only do we consider joint sensor/controller designs, but we also assume random delay in both the forward and feedback directions. The second part of the work revisits and significantly strengthens the seminal results of Sun et al. on the following fronts: (i) When designing the optimal offline schemes with full knowledge of the delay distributions, a new fixed-point-based method is proposed with quadratic convergence rate; (ii) When the distributional knowledge is unavailable, two new low-complexity online algorithms are proposed, which provably attain the optimal average AoI penalty; and (iii) the online schemes also admit a modular architecture, which allows the designer to upgrade certain components to handle additional practical challenges. Two such upgrades are proposed: (iii.1) the AoI penalty function incurred at the destination is unknown to the source node and must also be estimated on the fly, and (iii.2) the unknown delay distribution is Markovian instead of i.i.d. 
With the exponential growth of interconnected IoT devices and the increasing risk of excessive resource consumption in mind, the third part of this work derives an optimal joint cost-and-AoI minimization solution for multiple coexisting source-destination (S-D) pairs. The results admit a new AoI-market-price-based interpretation and are applicable to the setting of (i) general heterogeneous AoI penalty functions and Markov delay distributions for each S-D pair, and (ii) a general network cost function of aggregate throughput of all S-D pairs. In each part of this work, extensive simulation is used to demonstrate the superior performance of the proposed schemes. The discussion on analytical as well as numerical results sheds some light on designing practical network utility maximization protocols. Computer Engineering Information Systems Coding and Information Theory Networking and Communications Information Engineering and Theory Age-of-information (AoI) Data freshness Information freshness Remote estimation Online algorithm Fixed-point equation Stochastic approximation Stochastic control Information update system Markov decision process (MDP) Network utility maximization Information theory Wireless networking Networking Communication systems Communication theory
66	Accelerated algorithms for temporal difference learning methods Rankawat, Anushree 12 1900 (has links) L'idée centrale de cette thèse est de comprendre la notion d'accélération dans les algorithmes d'approximation stochastique. Plus précisément, nous tentons de répondre à la question suivante : Comment l'accélération apparaît-elle naturellement dans les algorithmes d'approximation stochastique ? Nous adoptons une approche de systèmes dynamiques et proposons de nouvelles méthodes accélérées pour l'apprentissage par différence temporelle (TD) avec approximation de fonction linéaire : Polyak TD(0) et Nesterov TD(0). Contrairement aux travaux antérieurs, nos méthodes ne reposent pas sur une conception des méthodes de TD comme des méthodes de descente de gradient. Nous étudions l'interaction entre l'accélération, la stabilité et la convergence des méthodes accélérées proposées en temps continu. Pour établir la convergence du système dynamique sous-jacent, nous analysons les modèles en temps continu des méthodes d'approximation stochastique accélérées proposées en dérivant la loi de conservation dans un système de coordonnées dilaté. Nous montrons que le système dynamique sous-jacent des algorithmes proposés converge à un rythme accéléré. Ce cadre nous fournit également des recommandations pour le choix des paramètres d'amortissement afin d'obtenir ce comportement convergent. Enfin, nous discrétisons ces ODE convergentes en utilisant deux schémas de discrétisation différents, Euler explicite et Euler symplectique, et nous analysons leurs performances sur de petites tâches de prédiction linéaire. / The central idea of this thesis is to understand the notion of acceleration in stochastic approximation algorithms. Specifically, we attempt to answer the question: How does acceleration naturally show up in SA algorithms? 
We adopt a dynamical systems approach and propose new accelerated methods for temporal difference (TD) learning with linear function approximation: Polyak TD(0) and Nesterov TD(0). In contrast to earlier works, our methods do not rely on viewing TD methods as gradient descent methods. We study the interplay between acceleration, stability, and convergence of the proposed accelerated methods in continuous time. To establish the convergence of the underlying dynamical system, we analyze continuous-time models of the proposed accelerated stochastic approximation methods by deriving the conservation law in a dilated coordinate system. We show that the underlying dynamical system of our proposed algorithms converges at an accelerated rate. This framework also provides us recommendations for the choice of the damping parameters to obtain this convergent behavior. Finally, we discretize these convergent ODEs using two different discretization schemes, explicit Euler, and symplectic Euler, and analyze their performance on small, linear prediction tasks. Temporal difference learning Stochastic Approximation Accelerated methods Momentum methods Reinforcement learning Approximate Dynamic Programming Function approximation Conservation laws Convergence rates Machine learning Méthodes des différences temporelles Approximation Stochastique Méthodes accélérées Méthodes de quantité de mouvement Apprentissage par renforcement Programmation dynamique approchée Lois de conservation Taux de convergence Apprentissage automatique
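As background for record 66, the following is a minimal sketch of the standard (non-accelerated) TD(0) update with linear function approximation, which is the starting point the abstract refers to. It does not implement the thesis's Polyak TD(0) or Nesterov TD(0) variants. The two-state deterministic reward process, the one-hot features, and the names `td0_linear`, `cycle_step` and `one_hot` are illustrative assumptions.

```python
def td0_linear(step_fn, features, n_features, gamma=0.5, alpha=0.1,
               n_steps=2000, start=0):
    """Illustrative TD(0) with a linear value estimate V(s) = w . phi(s).

    step_fn(s) returns (next_state, reward); features(s) returns phi(s)
    as a list of floats of length n_features.
    """
    w = [0.0] * n_features
    s = start
    for _ in range(n_steps):
        s_next, reward = step_fn(s)
        phi, phi_next = features(s), features(s_next)
        v = sum(wi * xi for wi, xi in zip(w, phi))
        v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
        # Temporal difference error for the observed transition.
        delta = reward + gamma * v_next - v
        # Semi-gradient step along the features of the current state.
        w = [wi + alpha * delta * xi for wi, xi in zip(w, phi)]
        s = s_next
    return w


def cycle_step(s):
    # Hypothetical deterministic chain 0 -> 1 -> 0, reward 1 when leaving state 0.
    return (1, 1.0) if s == 0 else (0, 0.0)


def one_hot(s):
    # One-hot features make the linear estimate equivalent to a lookup table.
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


weights = td0_linear(cycle_step, one_hot, n_features=2)
```

With discount 0.5 the true values satisfy V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, which the learned weights approach. Note that the update is not the gradient of any fixed objective, which is consistent with the abstract's remark that its methods do not rely on viewing TD as gradient descent.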
67	Optimisation et Auto-Optimisation dans les réseaux LTE / Optimization and Self-Optimization in LTE-Advanced Networks Tall, Abdoulaye 17 December 2015 (has links) Le réseau mobile d’Orange France comprend plus de 100 000 antennes 2G, 3G et 4G sur plusieurs bandes de fréquences sans compter les nombreuses femto-cells fournies aux clients pour résoudre les problèmes de couverture. Ces chiffres ne feront que s’accroître pour répondre à la demande sans cesse croissante des clients pour les données mobiles. Cela illustre le défi énorme que rencontrent les opérateurs de téléphonie mobile en général à savoir gérer un réseau aussi complexe tout en limitant les coûts d’opération pour rester compétitifs. Cette thèse s’attache à utiliser le concept SON (réseaux auto-organisants) pour réduire cette complexité en automatisant les tâches répétitives ou complexes. Plus spécifiquement, nous proposons des algorithmes d’optimisation automatique pour des scénarios liés à la densification par les small cells ou les antennes actives. Nous abordons les problèmes classiques d’équilibrage de charge mais avec un lien backhaul à capacité limitée et de coordination d’interférence que ce soit dans le domaine temporel (notamment avec le eICIC) ou le domaine fréquentiel. Nous proposons aussi des algorithmes d’activation optimale de certaines fonctionnalités lorsque cette activation n’est pas toujours bénéfique. Pour la formulation mathématique et la résolution de tous ces algorithmes, nous nous appuyons sur les résultats de l’approximation stochastique et de l’optimisation convexe. Nous proposons aussi une méthodologie systématique pour la coordination de multiples fonctionnalités SON qui seraient exécutées en parallèle. Cette méthodologie est basée sur les jeux concaves et l’optimisation convexe avec comme contraintes des inégalités matricielles linéaires. 
/ The mobile network of Orange in France comprises more than 100 000 2G, 3G and 4G antennas with several frequency bands, not to mention many femto-cells for deep-indoor coverage. These numbers will continue to increase in order to address the customers’ exponentially increasing need for mobile data. This is an illustration of the challenge faced by the mobile operators for operating such a complex network with low Operational Expenditures (OPEX) in order to stay competitive. This thesis is about leveraging the Self-Organizing Network (SON) concept to reduce this complexity by automating repetitive or complex tasks. We specifically propose automatic optimization algorithms for scenarios related to network densification using either small cells or Active Antenna Systems (AASs) used for Vertical Sectorization (VeSn), Virtual Sectorization (ViSn) and multilevel beamforming. Problems such as load balancing with limited-capacity backhaul and interference coordination either in time-domain (eICIC) or in frequency-domain are tackled. We also propose optimal activation algorithms for VeSn and ViSn when their activation is not always beneficial. We make use of results from stochastic approximation and convex optimization for the mathematical formulation of the problems and their solutions. We also propose a generic methodology for the coordination of multiple SON algorithms running in parallel using results from concave game theory and Linear Matrix Inequality (LMI)-constrained optimization. 
Réseaux Auto-Organisants LTE-Advanced Approximation stochastique SON Optimisation convexe Jeux concaves Theorie des files d’attente LTE LTE-Advanced Réseaux hétérogènes Antennes actives Small cells Sectorisation verticale Sectorisation virtuelle Beamforming hiérarchique Équilibrage de charge Coordination d’interférence EICIC Coordination SON Inégalités matricielles linéaires Self-Organizing Networks (SON) LTE-Advanced Stochastic Approximation SON Convex optimization Concave games Queuing theory LTE LTE-Advanced Heterogeneous networks HetNets Active antenna systems AAS Small cells Vertical sectorization Virtual sectorization Multilevel beamforming Load balancing Backhaul-constrained load balancing Interference coordination EICIC SON coordination Linear matrix inequalities LMI
