1 |
Algebraic Methods for the Estimation of Statistical DistributionsGrosdos Koutsoumpelias, Alexandros 15 July 2021 (has links)
This thesis deals with the problem of estimating statistical distributions from data. In the first part, the method of moments is used in combination with computational algebraic techniques in order to estimate parameters coming from local Dirac mixtures and their convolutions. The second part focuses on the nonparametric setting, in particular on combinatorial and algebraic aspects of the estimation of log-concave distributions.
|
2 |
Computing Markov bases, Gröbner bases, and extreme raysMalkin, Peter 25 June 2007 (has links)
In this thesis, we address problems from two topics of applied mathematics: linear integer programming and polyhedral computation. Linear integer programming concerns solving optimisation problems to maximise a linear cost function over the set of integer points in a polyhedron. Polyhedral computation is concerned with algorithms for computing different properties of convex polyhedra. First, we explore the theory and computation of Gröbner bases and Markov bases for linear integer programming. Second, we investigate and improve an algorithm from polyhedral computation that converts between different representations of cones and
polyhedra.
A Markov basis is a set of integer vectors such that we can move between any two feasible solutions of an integer program by adding or subtracting vectors in the Markov basis while never moving outside the set of feasible solutions. Markov bases are mainly used in algebraic statistics for sampling from a set of feasible solutions. The major contribution of this thesis is a fast algorithm for computing Markov bases, which we used to solve a previously intractable computational challenge.
Gröbner basis methods are exact local search approaches for solving integer programs. We present a Gröbner basis approach that can use the structure of an integer program in order to solve it more efficiently. Gröbner basis methods are interesting mainly from a purely theoretical viewpoint, but they are also interesting because they may provide insight into why some classes of integer programs are difficult to solve using standard techniques and because someday they may be able to solve these difficult problems.
Computing the properties of convex polyhedra is useful for solving problems within different areas of mathematics such as linear programming, integer programming, combinatorial optimisation, and computational geometry. We investigate and improve an algorithm for converting between a generator representation of a cone or polyhedron and a constraint representation of the cone or polyhedron and vice versa. This algorithm can be extended to compute circuits of matrices, which are used in computational biology for metabolic pathway analysis.
|
3 |
The algebraic statistics of sampling, likelihood, and regressionMarigliano, Orlando 04 December 2020 (has links)
This thesis is about statistical models and algebraic varieties. Algebraic Statistics unites these two concepts, turning algebraic structure into statistical insight. Featured here are three types of models that have such an algebraic structure.
Linear Gaussian covariance models are continuous models which are simple to define but hard to analyze. We compute their maximum likelihood degree in dimension two and find it equal to $2n-3$ generically if the model has $n$ covariates.
Discrete models with rational MLE are those discrete models for which likelihood estimation is easiest. We characterize them geometrically by building on the work of Huh and Kapranov on Horn uniformization.
Algebraic manifolds are a more general kind of object which is used to encode continuous data. We introduce a new method for computing integrals and sampling from distributions on them, based on intersecting with random linear spaces.
A brief report on mathematics in the sciences featuring case studies from soil ecology and nonparametric statistics closes the thesis.
|
4 |
Classifying Maximum Likelihood Degree for Small Colored Gaussian Graphical Models / Klassifikation av Maximum Likelihood Graden av Små Färgade Gaussiska Grafiska ModellerKuhlin, Jacob January 2023 (has links)
The Maximum Likelihood Degree (ML degree) of a statistical model is the number of complex critical points of the likelihood function. In this thesis we study this on Colored Gaussian Graphical Models, classifying the ML degree of colored graphs of order up to three. We do this by calculating the rational function degree of the gradient of the log- likelihood. Moreover we find that coloring a graph can lower the ML degree. Finally we calculate solutions to the homaloidal partial differential equation developed by Améndola et al. The code developed for these calculations can be used on graphs of higher orders. / Maximum likelihood-graden (ML-graden) för en statistisk modell är antalet komplexa kritiska punkter för likelihoodfunktionen. I denna avhandling studerar vi detta på färgade Gaussiska grafiska modeller och klassificerar ML-graden för färgade grafer av ordning upp till tre. Detta görs genom att beräkna den rationella funktionsgraden för gradienten av logaritmen av likelihoodfunktionen. Dessutom finner vi att ML-graden av en graf kan minskas genom att färgläggas. Slutligen beräknar vi lösningar till den homaloidala partiella differentialekvationen utvecklad av Améndola et al. Den kod som utvecklats för dessa beräkningar kan användas på grafer av högre ordning.
|
5 |
On Boundaries of Statistical Models / Randeigenschaften statistischer ModelleKahle, Thomas 24 June 2010 (has links) (PDF)
In the thesis "On Boundaries of Statistical Models" problems related to a description of probability
distributions with zeros, lying in the boundary of a statistical model, are treated. The
distributions considered are joint distributions of finite collections of finite discrete random
variables. Owing to this restriction, statistical models are subsets of finite dimensional real
vector spaces. The support set problem for exponential families, the main class of models considered
in the thesis, is to characterize the possible supports of distributions in the boundaries of these
statistical models. It is shown that this problem is equivalent to a characterization of the face
lattice of a convex polytope, called the convex support. The main tool for treating questions
related to the boundary are implicit representations. Exponential families are shown to be sets of
solutions of binomial equations, connected to an underlying combinatorial structure, called oriented
matroid. Under an additional assumption these equations are polynomial and one is placed in the
setting of commutative algebra and algebraic geometry. In this case one recovers results from
algebraic statistics. The combinatorial theory of exponential families using oriented matroids makes
the established connection between an exponential family and its convex support completely natural:
Both are derived from the same oriented matroid.
The second part of the thesis deals with hierarchical models, which are a special class of
exponential families constructed from simplicial complexes. The main technical tool for their
treatment in this thesis are so called elementary circuits. After their introduction, they are used
to derive properties of the implicit representations of hierarchical models. Each elementary circuit
gives an equation holding on the hierarchical model, and these equations are shown to be the
"simplest", in the sense that the smallest degree among the equations corresponding to elementary
circuits gives a lower bound on the degree of all equations characterizing the model. Translating
this result back to polyhedral geometry yields a neighborliness property of marginal polytopes, the
convex supports of hierarchical models. Elementary circuits of small support are related to
independence statements holding between the random variables whose joint distributions the
hierarchical model describes. Models for which the complete set of circuits consists of elementary
circuits are shown to be described by totally unimodular matrices. The thesis also contains an
analysis of the case of binary random variables. In this special situation, marginal polytopes can
be represented as the convex hulls of linear codes. Among the results here is a classification of
full-dimensional linear code polytopes in terms of their subgroups.
If represented by polynomial equations, exponential families are the varieties of binomial prime
ideals. The third part of the thesis describes tools to treat models defined by not necessarily
prime binomial ideals. It follows from Eisenbud and Sturmfels' results on binomial ideals that these
models are unions of exponential families, and apart from solving the support set problem for each
of these, one is faced with finding the decomposition. The thesis discusses algorithms for
specialized treatment of binomial ideals, exploiting their combinatorial nature. The provided
software package Binomials.m2 is shown to be able to compute very large primary decompositions,
yielding a counterexample to a recent conjecture in algebraic statistics.
|
6 |
On Boundaries of Statistical ModelsKahle, Thomas 26 May 2010 (has links)
In the thesis "On Boundaries of Statistical Models" problems related to a description of probability
distributions with zeros, lying in the boundary of a statistical model, are treated. The
distributions considered are joint distributions of finite collections of finite discrete random
variables. Owing to this restriction, statistical models are subsets of finite dimensional real
vector spaces. The support set problem for exponential families, the main class of models considered
in the thesis, is to characterize the possible supports of distributions in the boundaries of these
statistical models. It is shown that this problem is equivalent to a characterization of the face
lattice of a convex polytope, called the convex support. The main tool for treating questions
related to the boundary are implicit representations. Exponential families are shown to be sets of
solutions of binomial equations, connected to an underlying combinatorial structure, called oriented
matroid. Under an additional assumption these equations are polynomial and one is placed in the
setting of commutative algebra and algebraic geometry. In this case one recovers results from
algebraic statistics. The combinatorial theory of exponential families using oriented matroids makes
the established connection between an exponential family and its convex support completely natural:
Both are derived from the same oriented matroid.
The second part of the thesis deals with hierarchical models, which are a special class of
exponential families constructed from simplicial complexes. The main technical tool for their
treatment in this thesis are so called elementary circuits. After their introduction, they are used
to derive properties of the implicit representations of hierarchical models. Each elementary circuit
gives an equation holding on the hierarchical model, and these equations are shown to be the
"simplest", in the sense that the smallest degree among the equations corresponding to elementary
circuits gives a lower bound on the degree of all equations characterizing the model. Translating
this result back to polyhedral geometry yields a neighborliness property of marginal polytopes, the
convex supports of hierarchical models. Elementary circuits of small support are related to
independence statements holding between the random variables whose joint distributions the
hierarchical model describes. Models for which the complete set of circuits consists of elementary
circuits are shown to be described by totally unimodular matrices. The thesis also contains an
analysis of the case of binary random variables. In this special situation, marginal polytopes can
be represented as the convex hulls of linear codes. Among the results here is a classification of
full-dimensional linear code polytopes in terms of their subgroups.
If represented by polynomial equations, exponential families are the varieties of binomial prime
ideals. The third part of the thesis describes tools to treat models defined by not necessarily
prime binomial ideals. It follows from Eisenbud and Sturmfels'' results on binomial ideals that these
models are unions of exponential families, and apart from solving the support set problem for each
of these, one is faced with finding the decomposition. The thesis discusses algorithms for
specialized treatment of binomial ideals, exploiting their combinatorial nature. The provided
software package Binomials.m2 is shown to be able to compute very large primary decompositions,
yielding a counterexample to a recent conjecture in algebraic statistics.
|
7 |
Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE SolversMüller, Johannes 07 June 2024 (has links)
This thesis is divided into two parts dealing with the optimization problems in Markov decision processes (MDPs) and different neural network-based numerical solvers for partial differential equations (PDEs).
In Part I we analyze the optimization problem arising in (partially observable) Markov decision processes using tools from algebraic statistics and information geometry, which can be viewed as neighboring fields of applied algebra and differential geometry, respectively. Here, we focus on infinite horizon problems and memoryless stochastic policies. Markov decision processes provide a mathematical framework for sequential decision-making on which most current reinforcement learning algorithms are built. They formalize the task of optimally controlling the state of a system through appropriate actions. For fully observable problems, the action can be selected knowing the current state of the system. This case has been studied extensively and optimizing the action selection is known to be equivalent to solving a linear program over the (generalized) stationary distributions of the Markov decision process, which are also referred to as state-action frequencies.
In Chapter 3, we study partially observable problems where an action must be chosen based solely on an observation of the current state, which might not fully reveal the underlying state. We characterize the feasible state-action frequencies of partially observable Markov decision processes by polynomial inequalities. In particular, the optimization problem in partially observable MDPs is described as a polynomially constrained linear objective program that generalizes the (dual) linear programming formulation of fully observable problems. We use this to study the combinatorial and algebraic complexity of this optimization problem and to upper bound the number of critical points over the individual boundary components of the feasible set. Furthermore, we show that our polynomial programming formulation can be used to effectively solve partially observable MDPs using interior point methods, numerical algebraic techniques, and convex relaxations. Gradient-based methods, including variants of natural gradient methods, have gained tremendous attention in the theoretical reinforcement learning community, where they are commonly referred to as (natural) policy gradient methods.
In Chapter 4, we provide a unified treatment of a variety of natural policy gradient methods for fully observable problems by studying their state-action frequencies from the standpoint of information geometry. For a variety of NPGs and reward functions, we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Morimura and co-authors and Kakade by observing that these arise from the Hessian geometries of the entropy and conditional entropy, respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. We provide experimental evidence indicating that our predicted rates are essentially tight. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the inverse penalization strength, which recovers existing results as special cases.
Part II addresses neural network-based PDE solvers that have recently experienced tremendous growth in popularity and attention in the scientific machine learning community. We focus on two approaches that represent the approximation of a solution of a PDE as the minimization over the parameters of a neural network: the deep Ritz method and physically informed neural networks.
In Chapter 5, we study the theoretical properties of the boundary penalty for these methods and obtain a uniform convergence result for the deep Ritz method for a large class of potentially nonlinear problems. For linear PDEs, we estimate the error of the deep Ritz method in terms of the optimization error, the approximation capabilities of the neural network, and the strength of the penalty. This reveals a trade-off in the choice of the penalization strength, where too little penalization allows large boundary values, and too strong penalization leads to a poor solution of the PDE inside the domain. For physics-informed networks, we show that when working with neural networks that have zero boundary values also the second derivatives of the solution are approximated whereas otherwise only lower-order derivatives are approximated.
In Chapter 6, we propose energy natural gradient descent, a natural gradient method with respect to second-order information in the function space, as an optimization algorithm for physics-informed neural networks and the deep Ritz method. We show that this method, which can be interpreted as a generalized Gauss-Newton method, mimics Newton’s method in function space except for an orthogonal projection onto the tangent space of the model. We show that for a variety of PDEs, natural energy gradients converge rapidly and approximations to the solution of the PDE are several orders of magnitude more accurate than gradient descent, Adam and Newton’s methods, even when these methods are given more computational time.:Chapter 1. Introduction 1
1.1 Notation and conventions 7
Part I. Geometry of Markov decision processes 11
Chapter 2. Background on Markov decision processes 12
2.1 State-action frequencies 19
2.2 The advantage function and Bellman optimality 23
2.3 Rational structure of the reward and an explicit line theorem 26
2.4 Solution methods for Markov decision processes 35
Chapter 3. State-action geometry of partially observable MDPs 44
3.1 The state-action polytope of fully observables systems 45
3.2 State-action geometry of partially observable systems 54
3.3 Number and location of critical points 69
3.4 Reward optimization in state-action space (ROSA) 83
Chapter 4. Geometry and convergence of natural policy gradient methods 94
4.1 Natural gradients 96
4.2 Natural policy gradient methods 101
4.3 Convergence of natural policy gradient flows 107
4.4 Locally quadratic convergence for regularized problems 128
4.5 Discussion and outlook 131
Part II. Neural network-based PDE solvers 133
Chapter 5. Theoretical analysis of the boundary penalty method for neural network-based PDE solvers 134
5.1 Presentation and discussion of the main results 137
5.2 Preliminaries regarding Sobolev spaces and neural networks 146
5.3 Proofs regarding uniform convergence for the deep Ritz method 150
5.4 Proofs of error estimates for the deep Ritz method 156
5.5 Proofs of implications of exact boundary values in residual minimization 167
Chapter 6. Energy natural gradients for neural network-based PDE solvers 174
6.1 Energy natural gradients 176
6.2 Experiments 183
6.3 Conclusion and outlook 192
Bibliography 193
|
Page generated in 0.0734 seconds