1

Learning to rank in supervised and unsupervised settings using convexity and monotonicity

Acharyya, Sreangsu 10 September 2013 (has links)
This dissertation addresses the task of learning to rank, in both the supervised and unsupervised settings, by exploiting the interplay of convex functions, monotonic mappings, and their fixed points. In the supervised setting of learning to rank, one wishes to learn from examples of correctly ordered items, whereas in the unsupervised setting, one tries to maximize some quantitatively defined characteristic of a "good" ranking. A ranking method selects one permutation from among the combinatorially many permutations defined on the items to rank. Accomplishing this optimally in the supervised setting, with minimal loss in generality, if any, is challenging. This dissertation addresses the problem by optimizing, globally and efficiently, a statistically consistent loss functional over the class of compositions of a linear function with an arbitrary, strictly monotonic, separable mapping, with large margins. This capability also enables learning the parameters of a generalized linear model with an unknown link function. The method can handle infinite-dimensional feature spaces if the corresponding kernel function is known. In the unsupervised setting, a popular ranking approach is link analysis over a graph of recommendations, as exemplified by PageRank. This dissertation shows that PageRank may be viewed as an instance of an unsupervised consensus optimization problem. The dissertation then solves a more general problem of unsupervised consensus over noisy, directed recommendation graphs that have uncertainty over the set of "out" edges emanating from a vertex. The proposed consensus rank is essentially the PageRank over the expected edge set, where the expectation is computed over the distribution that achieves the most agreeable consensus. This consensus is measured geometrically by a suitable Bregman divergence between the consensus rank and the ranks induced by item-specific distributions. Real-world deployed ranking methods need to be resistant to spam, a particularly sophisticated type of which is link spam. A popular class of countermeasures "de-spams" the corrupted webgraph by removing abusive pages identified by supervised learning. Since exhaustive detection and neutralization is infeasible, there is a need for ranking functions that can, on the one hand, attenuate the effects of link spam without supervision and, on the other hand, counter spam more aggressively when supervision is available. A family of non-linear, iteratively defined monotonic functions is proposed that propagates "rank" and "trust" scores through the webgraph. It relies on non-linearity, monotonicity, and Schur-convexity to provide resistance against spam.
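For readers unfamiliar with the link-analysis baseline the abstract refers to, the sketch below shows standard PageRank power iteration over a (possibly expected) transition matrix. It is only an illustration of the classical algorithm, not the dissertation's consensus-rank formulation; the matrix `P` and the toy graph are invented for the example.

```python
import numpy as np

def pagerank(P, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a column-stochastic matrix P.

    P[i, j] is the probability of moving to page i from page j, here
    thought of as an *expected* transition matrix when the out-edge
    set of each vertex is uncertain.
    """
    n = P.shape[0]
    r = np.full(n, 1.0 / n)          # uniform initial rank vector
    teleport = np.full(n, 1.0 / n)   # uniform teleportation vector
    for _ in range(max_iter):
        r_new = damping * P @ r + (1 - damping) * teleport
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Toy 3-page web: column j holds the expected out-edge weights of page j.
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
print(pagerank(P))
```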
2

Generalized Maximum Entropy, Convexity and Machine Learning

Sears, Timothy Dean, tim.sears@biogreenoil.com January 2008 (has links)
This thesis identifies and extends techniques that can be linked to the principle of maximum entropy (maxent) and applied to parameter estimation in machine learning and statistics. Entropy functions based on deformed logarithms are used to construct Bregman divergences, and together these represent a generalization of relative entropy. The framework is analyzed using convex analysis to characterize generalized forms of exponential family distributions. Various connections to the existing machine learning literature are discussed and the techniques are applied to the problem of non-negative matrix factorization (NMF).
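The Bregman divergences mentioned throughout these abstracts have a standard definition, reproduced here for reference; the deformed-logarithm entropies studied in this thesis generate further members of the same family.

```latex
% Bregman divergence generated by a differentiable, strictly convex \varphi:
D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla\varphi(y),\, x - y \rangle .
% Special cases:
%   \varphi(x) = \|x\|^2              gives the squared Euclidean distance,
%   \varphi(x) = \sum_i x_i \log x_i  gives the (generalized) KL divergence.
```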
3

Structured numerical problems in contemporary applications

Sustik, Mátyás Attila 31 October 2013 (has links)
The presence of structure in a computational problem can often be exploited to yield a more efficient numerical algorithm. In this dissertation, we look at structured numerical problems that arise from applications in wireless communications and machine learning and that also impact other areas of scientific computing. In wireless communication system designs, certain structured matrices (frames) need to be generated. The design of such matrices is equivalent to a symmetric inverse eigenvalue problem in which the values of the diagonal elements are prescribed. We present algorithms that are capable of generating a larger set of these constructions than previous algorithms. We also discuss the existence of equiangular tight frames---frames that satisfy additional structural properties. Kernel learning is an important class of problems in machine learning. It often relies on efficient numerical algorithms that solve underlying convex optimization problems. In our work, the objective functions to be minimized are the von Neumann and the LogDet Bregman matrix divergences. The algorithm that solves this optimization problem performs matrix updates based on repeated eigendecompositions of diagonal-plus-rank-one matrices in the case of the von Neumann matrix divergence, and on Cholesky updates in the case of the LogDet Bregman matrix divergence. Our contribution exploits the low-rank representations and the structure of the constraint matrices, resulting in more efficient algorithms than previously known. We also present two specialized zero-finding algorithms in which we exploit structure through the shape and exact formulation of the objective function. The first zero-finding task arises during the matrix update step that is part of the above-mentioned kernel learning application. The second zero-finding problem concerns the secular equation; it is equivalent to computing the eigenvalues of a diagonal-plus-rank-one matrix. The secular equation arises in various applications, the most well known being the divide-and-conquer eigensolver. In our solutions, we build upon a somewhat forgotten zero-finding method by P. Jarratt, first described in 1966. The method employs first derivatives only and needs the same number of evaluations as Newton's method, but converges faster. Our contributions are the more efficient specialized zero-finding algorithms.
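For context, the secular equation mentioned in the abstract has the following standard form for a diagonal-plus-rank-one matrix; the thesis's specialized zero-finders for this equation are not reproduced here.

```latex
% Eigenvalues of D + \rho z z^T, with D = \mathrm{diag}(d_1,\dots,d_n),
% d_1 < \dots < d_n, \rho > 0, and all z_i \neq 0, are the roots \lambda of
f(\lambda) \;=\; 1 + \rho \sum_{i=1}^{n} \frac{z_i^2}{d_i - \lambda} \;=\; 0 ,
% with exactly one root in each interval (d_i, d_{i+1}) and one root above d_n.
```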
4

Collective reasoning under uncertainty and inconsistency

Adamcik, Martin January 2014 (has links)
In this thesis we investigate some global desiderata for probabilistic knowledge merging given several possibly jointly inconsistent, but individually consistent, knowledge bases. We show that the most naive methods of merging, which combine applications of a single expert inference process with the application of a pooling operator, fail to satisfy certain basic consistency principles. We therefore adopt a different approach. Following recent developments in machine learning, where Bregman divergences appear to be powerful, we define several probabilistic merging operators which minimise the joint divergence between the merged knowledge and the given knowledge bases; a schematic example is given below. In particular we prove that in many cases the result of applying such operators coincides with the sets of fixed points of averaging projective procedures - procedures which combine knowledge updating with pooling operators of decision theory. We develop relevant results concerning the geometry of Bregman divergences and prove new theorems in this field. We show that this geometry connects nicely with some desirable principles which have arisen in the epistemology of merging. In particular, we prove that the merging operators which we define by means of convex Bregman divergences satisfy analogues of the principles of merging due to Konieczny and Pino-Perez. Additionally, we investigate how such merging operators behave with respect to principles concerning irrelevant information, independence and relativisation, which have previously been intensively studied in the case of single-expert probabilistic inference. Finally, we argue that two particular probabilistic merging operators, which are based on the Kullback-Leibler divergence, a special type of Bregman divergence, have overall the most appealing properties amongst the merging operators hitherto considered. By investigating some iterative procedures we propose algorithms to compute them in practice.
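As a rough illustration of what a divergence-minimising merging operator can look like, the schematic below gives one plausible shape; it is an assumption made for exposition only and need not match the thesis's exact definitions.

```latex
% Illustrative schematic of a divergence-minimising merging operator:
% given knowledge bases K_1, \dots, K_n (sets of probability functions)
% and a Bregman divergence D,
\Delta(K_1,\dots,K_n) \;=\; \operatorname*{arg\,min}_{w}\; \sum_{i=1}^{n}\; \inf_{v_i \in K_i} D(w, v_i) .
```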
5

Convex regression and its extensions to learning a Bregman divergence and difference of convex functions

Siahkamari, Ali 26 January 2022 (has links)
Nonparametric convex regression has been extensively studied over the last two decades. It has been shown that any Lipschitz convex function can be approximated to arbitrary accuracy by a maximum of linear functions. Using this framework, in this thesis we generalize convex regression to learning an arbitrary Bregman divergence and learning a difference of convex functions. We provide approximation guarantees and sample complexity bounds for both of these extensions. Furthermore, we provide algorithms to solve the resulting optimization problems based on the two-block alternating direction method of multipliers (ADMM). For this algorithm, we provide convergence guarantees with iteration complexity O(n√d/ε) for a dataset X ∈ ℝ^{n×d} and arbitrary positive ε. Finally, we provide experiments for both Bregman divergence learning and difference-of-convex-functions learning on UCI datasets that demonstrate state-of-the-art performance on regression and classification tasks.
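A learned convex function immediately induces a Bregman divergence, which is the link between convex regression and divergence learning that the abstract describes. The sketch below evaluates the divergence generated by a hand-specified max-affine function; the fitting step is omitted and all parameter names (`A`, `b`) are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

def max_affine(x, A, b):
    """Evaluate phi(x) = max_k (A[k] @ x + b[k]), a piecewise-linear convex function."""
    return np.max(A @ x + b)

def max_affine_subgradient(x, A, b):
    """A subgradient of phi at x: the slope of the active affine piece."""
    k = np.argmax(A @ x + b)
    return A[k]

def bregman(x, y, A, b):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <g(y), x - y>
    generated by the max-affine function phi."""
    g = max_affine_subgradient(y, A, b)
    return max_affine(x, A, b) - max_affine(y, A, b) - g @ (x - y)

# Toy example: phi(x) = max(x1, x2, 0) on R^2 (A, b chosen by hand, not learned).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.zeros(3)
print(bregman(np.array([2.0, 1.0]), np.array([0.5, 3.0]), A, b))  # >= 0 by convexity
```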
6

Detecting Influential observations in spatial models using Bregman divergence

Danilevicz, Ian Meneghel 26 February 2018 (has links)
How can one evaluate whether a spatial model is well adjusted to a problem? How can one know whether it is the best model within the class of conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models, including the homoscedastic and heteroscedastic cases? To answer these questions within the Bayesian framework, we propose new ways to apply the Bregman divergence, as well as recent information criteria such as the widely applicable information criterion (WAIC) and leave-one-out cross-validation (LOO). The functional Bregman divergence is a generalized form of the well-known Kullback-Leibler (KL) divergence. There are many special cases of it which may be used to identify influential points. All the posterior distributions presented in this text were estimated by Hamiltonian Monte Carlo (HMC), an optimized version of the Metropolis-Hastings algorithm. All ideas presented here were evaluated by both simulation and real data.
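The WAIC criterion cited in the abstract has the following standard definition (given S posterior draws), included here only for reference.

```latex
% Widely applicable information criterion, using S posterior draws \theta^{(s)}:
\mathrm{lppd} = \sum_{i=1}^{n} \log\!\Big(\tfrac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^{(s)})\Big),
\qquad
p_{\mathrm{WAIC}} = \sum_{i=1}^{n} \operatorname{Var}_{s}\!\big[\log p(y_i \mid \theta^{(s)})\big],
\qquad
\mathrm{WAIC} = -2\,\big(\mathrm{lppd} - p_{\mathrm{WAIC}}\big).
```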
7

Convergence rates for variational regularization of statistical inverse problems

Sprung, Benjamin 04 October 2019 (has links)
No description available.
8

Unsupervised 3D image clustering and extension to joint color and depth segmentation

Hasnat, Md Abul 01 October 2014 (has links)
Access to 3D image sequences at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is strong demand to enhance existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner?
In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature from the 3D image and cluster them with methods based on finite statistical mixture models. We use the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel model-based clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterwards, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for RGB-D images. Then, we propose a novel RGB-D segmentation method using a joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time. Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable for clustering different types of data, such as speech or gene expression data. Moreover, they can be used for complex tasks such as joint image-speech data analysis.
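As a rough illustration of the Bregman Soft Clustering framework mentioned above, the sketch below implements its squared-Euclidean special case (equivalent to a spherical Gaussian mixture); the thesis instead uses von Mises-Fisher and Watson models for surface normals, which are not reproduced here.

```python
import numpy as np

def bregman_soft_clustering(X, k, n_iter=100, seed=0):
    """Bregman soft clustering with the squared Euclidean divergence
    (the spherical-Gaussian special case of the general framework)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, k, replace=False)]   # initial centroids
    weights = np.full(k, 1.0 / k)                   # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior responsibilities from Bregman "distances"
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        logp = np.log(weights) - dist
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: centroids are responsibility-weighted means
        weights = resp.mean(axis=0)
        centers = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return centers, resp

# Toy data: two well-separated clouds in R^3.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(4, 1, (50, 3)), rng.normal(-4, 1, (50, 3))])
centers, resp = bregman_soft_clustering(X, k=2)
print(centers)
```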
9

Optimization tools for non-asymptotic statistics in exponential families

Le Priol, Rémi 04 1900 (has links)
Exponential families are a ubiquitous class of models in statistics. On the one hand, they can model any data type. Indeed, the most common distributions are exponential families: Gaussian, categorical, Poisson, Gamma, Wishart, and Dirichlet. On the other hand, they sit at the core of generalized linear models (GLM), a foundational class of models in machine learning. They are also supported by beautiful mathematics thanks to their connection with convex duality and the Laplace transform. This beauty is largely responsible for the existence of this thesis. In this manuscript, we make three contributions at the intersection of optimization and statistics, all revolving around exponential families. The first contribution adapts and improves a variance-reduction optimization algorithm called stochastic dual coordinate ascent (SDCA) to train a particular class of GLM called conditional random fields (CRF). CRFs are one of the cornerstones of structured prediction. CRFs were notoriously hard to train until the advent of variance-reduction techniques, and our improved version of SDCA performs favorably compared to the previous state of the art. The second contribution focuses on causal discovery. Exponential families are widely used in graphical models, and in particular in causal graphical models. This contribution investigates a specific conjecture that gained some traction in previous work: causal models adapt faster to perturbations of the environment. Using results from optimization, we find strong support for this assumption when the perturbation comes from an intervention on a cause, and evidence against it when the perturbation comes from an intervention on an effect. These findings call for a refinement of the conjecture. The third contribution addresses a fundamental property of exponential families. One of their most appealing properties is the closed-form maximum likelihood estimate (MLE) and maximum a posteriori (MAP) estimate for a natural choice of conjugate prior. These two estimators are used almost everywhere, often unknowingly -- how often are mean and variance computed for bell-shaped data without thinking about the underlying Gaussian model? Nevertheless, the literature to date lacks results on the finite-sample convergence of the Kullback-Leibler divergence between these estimators and the true distribution. Drawing on a parallel with optimization, we take some steps towards such a result, and we highlight directions for progress both in statistics and in optimization. These three contributions all use tools from optimization in the service of statistics in exponential families: improving an algorithm to learn GLMs, characterizing the adaptation speed of causal models, and estimating the learning speed of ubiquitous models. By tying together optimization and statistics, this thesis takes a step towards a better understanding of the fundamentals of machine learning.
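The closed-form MLE property discussed in the abstract follows from the standard exponential-family form below, included here for context.

```latex
% Natural exponential family and its maximum likelihood estimate:
p_\theta(x) \;=\; h(x)\,\exp\!\big(\langle \theta, T(x)\rangle - A(\theta)\big),
\qquad
\hat\theta_{\mathrm{MLE}} \;\text{solves}\;\; \nabla A(\hat\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} T(x_i),
% i.e. the fitted mean parameter equals the empirical mean of the sufficient
% statistics, which yields the familiar closed forms (sample mean and variance
% for the Gaussian, empirical frequencies for the categorical, etc.).
```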
