121 |
Supervised metric learning with generalization guarantees / Apprentissage supervisé de métriques avec garanties en généralisation
Bellet, Aurélien, 11 December 2012
Ces dernières années, l'importance cruciale des métriques en apprentissage automatique a mené à un intérêt grandissant pour l'optimisation de distances et de similarités en utilisant l'information contenue dans des données d'apprentissage pour les rendre adaptées au problème traité. Ce domaine de recherche est souvent appelé apprentissage de métriques. En général, les méthodes existantes optimisent les paramètres d'une métrique devant respecter des contraintes locales sur les données d'apprentissage. Les métriques ainsi apprises sont généralement utilisées dans des algorithmes de plus proches voisins ou de clustering. Concernant les données numériques, beaucoup de travaux ont porté sur l'apprentissage de distance de Mahalanobis, paramétrisée par une matrice positive semi-définie. Les méthodes récentes sont capables de traiter des jeux de données de grande taille. Moins de travaux ont été dédiés à l'apprentissage de métriques pour les données structurées (comme les chaînes ou les arbres), car cela implique souvent des procédures plus complexes. La plupart des travaux portent sur l'optimisation d'une notion de distance d'édition, qui mesure (en termes de nombre d'opérations) le coût de transformer un objet en un autre. Au regard de l'état de l'art, nous avons identifié deux limites importantes des approches actuelles. Premièrement, elles permettent d'améliorer la performance d'algorithmes locaux comme les k plus proches voisins, mais l'apprentissage de métriques pour des algorithmes globaux (comme les classifieurs linéaires) n'a pour l'instant pas été beaucoup étudié. Le deuxième point, sans doute le plus important, est que la question de la capacité de généralisation des méthodes d'apprentissage de métriques a été largement ignorée. Dans cette thèse, nous proposons des contributions théoriques et algorithmiques qui répondent à ces limites. Notre première contribution est la construction d'un nouveau noyau construit à partir de probabilités d'édition apprises.
À l'inverse d'autres noyaux entre chaînes, sa validité est garantie et il ne comporte aucun paramètre. Notre deuxième contribution est une nouvelle approche d'apprentissage de similarités d'édition pour les chaînes et les arbres inspirée par la théorie des (epsilon,gamma,tau)-bonnes fonctions de similarité et formulée comme un problème d'optimisation convexe. En utilisant la notion de stabilité uniforme, nous établissons des garanties théoriques pour la similarité apprise qui donnent une borne sur l'erreur en généralisation d'un classifieur linéaire construit à partir de cette similarité. Dans notre troisième contribution, nous étendons ces principes à l'apprentissage de métriques pour les données numériques en proposant une méthode d'apprentissage de similarité bilinéaire qui optimise efficacement l'(epsilon,gamma,tau)-goodness. La similarité est apprise sous contraintes globales, plus appropriées à la classification linéaire. Nous dérivons des garanties théoriques pour notre approche, qui donnent de meilleures bornes en généralisation pour le classifieur que dans le cas des données structurées. Notre dernière contribution est un cadre théorique permettant d'établir des bornes en généralisation pour de nombreuses méthodes existantes d'apprentissage de métriques. Ce cadre est basé sur la notion de robustesse algorithmique et permet la dérivation de bornes pour des fonctions de perte et des régulariseurs variés / In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample.
The learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of the work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning an object into another. We identify two important limitations of current supervised metric learning approaches. First, they improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon,gamma,tau)-goodness.
The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers.
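As a reading aid for the two metric families named in this abstract, here is a minimal sketch of a Mahalanobis distance and a bilinear similarity. The function names and the pure-Python matrix representation (a list of rows) are assumptions made for illustration; nothing here is taken from the thesis itself, and no learning step is shown.

```python
import math

def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)), with M assumed positive semi-definite."""
    d = [xi - yi for xi, yi in zip(x, y)]
    Md = [sum(mij * dj for mij, dj in zip(row, d)) for row in M]
    q = sum(di * mdi for di, mdi in zip(d, Md))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative values caused by rounding

def bilinear_similarity(x, y, A):
    """K_A(x, y) = x^T A y for a square matrix A (illustrative, A is not learned here)."""
    Ay = [sum(aij * yj for aij, yj in zip(row, y)) for row in A]
    return sum(xi * ayi for xi, ayi in zip(x, Ay))
```

With the identity matrix, the first function reduces to the Euclidean distance and the second to the ordinary dot product.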
|
122 |
Graph-based variational optimization and applications in computer vision / Optimisation variationnelle discrète et applications en vision par ordinateur
Couprie, Camille, 10 October 2011
De nombreuses applications en vision par ordinateur comme le filtrage, la segmentation d'images, et la stéréovision peuvent être formulées comme des problèmes d'optimisation. Récemment les méthodes discrètes, convexes, globalement optimales ont reçu beaucoup d'attention. La méthode des "graph cuts", très utilisée en vision par ordinateur, est basée sur la résolution d'un problème de flot maximum discret, mais les solutions souffrent d'un effet de blocs, notamment en segmentation d'images. Une nouvelle formulation basée sur le problème continu est introduite dans le premier chapitre et permet d'éviter cet effet. La méthode de point intérieur employée permet d'optimiser le problème plus rapidement que les méthodes existantes, et la convergence est garantie. Dans le second chapitre, la formulation proposée est efficacement étendue à la restauration d'image. Grâce à une approche du à la contrainte et à un algorithme proximal parallèle, la méthode permet de restaurer (débruiter, déflouter, fusionner) des images rapidement et préserve un meilleur contraste qu'avec la méthode de variation totale classique. Le chapitre suivant met en évidence l'existence de liens entre les méthodes de segmentation "graph cuts", le "random walker", et les plus courts chemins avec un algorithme de segmentation par ligne de partage des eaux (LPE). Ces liens ont inspiré un nouvel algorithme de segmentation multi-labels rapide produisant une ligne de partage des eaux unique, moins sensible aux fuites que la LPE classique. Nous avons nommé cet algorithme "LPE puissance".
L'expression de la LPE sous forme d'un problème d'optimisation a ouvert la voie à de nombreuses applications possibles au-delà de la segmentation d'images, par exemple dans le dernier chapitre en filtrage pour l'optimisation d'un problème non convexe, en stéréovision, et en reconstruction rapide de surfaces lisses délimitant des objets à partir de nuages de points bruités / Many computer vision applications such as image filtering, segmentation and stereovision can be formulated as optimization problems. Recently discrete, convex, globally optimal methods have received a lot of attention. Many graph-based methods suffer from metrication artefacts: segmented contours are blocky in areas where contour information is lacking. In the first part of this work, we develop a discrete yet isotropic energy minimization formulation for the continuous maximum flow problem that prevents metrication errors. This new convex formulation leads us to a provably globally optimal solution. The employed interior point method can optimize the problem faster than the existing continuous methods. The energy formulation is then adapted and extended to multi-label problems, and shows improvements over existing methods. Fast parallel proximal optimization tools have been tested and adapted for the optimization of this problem. In the second part of this work, we introduce a framework that generalizes several state-of-the-art graph-based segmentation algorithms, namely graph cuts, random walker, shortest paths, and watershed. This generalization allowed us to exhibit a new case, for which we developed a globally optimal optimization method, named "Power watershed". Our proposed power watershed algorithm computes a unique global solution to multi-labeling problems, and is very fast.
We further generalize and extend the framework to applications beyond image segmentation, for example image filtering optimizing an L0 norm energy, stereovision, and fast and smooth surface reconstruction from a noisy cloud of 3D points.
|
123 |
Factor analysis of dynamic PET images
Cruz Cavalcanti, Yanna, 31 October 2018
La tomographie par émission de positrons (TEP) est une technique d'imagerie nucléaire non invasive qui permet de quantifier les fonctions métaboliques des organes à partir de la diffusion d'un radiotraceur injecté dans le corps. Alors que l'imagerie statique est souvent utilisée afin d'obtenir une distribution spatiale de la concentration du traceur, une meilleure évaluation de la cinétique du traceur est obtenue par des acquisitions dynamiques. En ce sens, la TEP dynamique a suscité un intérêt croissant au cours des dernières années, puisqu'elle fournit des informations à la fois spatiales et temporelles sur la structure des prélèvements de traceurs en biologie in vivo. Les techniques de quantification les plus efficaces en TEP dynamique nécessitent souvent une estimation de courbes temps-activité (CTA) de référence représentant les tissus ou une fonction d'entrée caractérisant le flux sanguin. Dans ce contexte, de nombreuses méthodes ont été développées pour réaliser une extraction non-invasive de la cinétique globale d'un traceur, appelée génériquement analyse factorielle. L'analyse factorielle est une technique d'apprentissage non-supervisée populaire pour identifier un modèle ayant une signification physique à partir de données multivariées. Elle consiste à décrire chaque voxel de l'image comme une combinaison de signatures élémentaires, appelées facteurs, fournissant non seulement une CTA globale pour chaque tissu, mais aussi un ensemble de coefficients reliant chaque voxel à chaque CTA tissulaire. Parallèlement, le démélange - une instance particulière d'analyse factorielle - est un outil largement utilisé dans la littérature de l'imagerie hyperspectrale.
En imagerie TEP dynamique, il peut être très pertinent pour l'extraction des CTA, puisqu'il prend directement en compte à la fois la non-négativité des données et la somme-à-une des proportions de facteurs, qui peuvent être estimées à partir de la diffusion du sang dans le plasma et les tissus. Inspiré par la littérature de démélange hyperspectral, ce manuscrit s'attaque à deux inconvénients majeurs des techniques générales d'analyse factorielle appliquées en TEP dynamique. Le premier est l'hypothèse que la réponse de chaque tissu à la distribution du traceur est spatialement homogène. Même si cette hypothèse d'homogénéité a prouvé son efficacité dans plusieurs études d'analyse factorielle, elle ne fournit pas toujours une description suffisante des données sous-jacentes, en particulier lorsque des anomalies sont présentes. Pour faire face à cette limitation, les modèles proposés ici permettent un degré de liberté supplémentaire aux facteurs liés à la liaison spécifique. Dans ce but, une perturbation spatialement variante est introduite en complément d'une CTA nominale et commune. Cette variation est indexée spatialement et contrainte avec un dictionnaire, qui est soit préalablement appris ou explicitement modélisé par des non-linéarités convolutives affectant les tissus de liaisons non-spécifiques. Le deuxième inconvénient est lié à la distribution du bruit dans les images TEP. Même si le processus de désintégration des positrons peut être décrit par une distribution de Poisson, le bruit résiduel dans les images TEP reconstruites ne peut généralement pas être simplement modélisé par des lois de Poisson ou gaussiennes. Nous proposons donc de considérer une fonction de coût générique, appelée $\beta$-divergence, capable de généraliser les fonctions de coût conventionnelles telles que la distance euclidienne, les divergences de Kullback-Leibler et Itakura-Saito, correspondant respectivement à des distributions gaussiennes, de Poisson et Gamma.
Cette fonction de coût est appliquée à trois modèles d'analyse factorielle afin d'évaluer son impact sur des images TEP dynamiques avec différentes caractéristiques de reconstruction. / Thanks to its ability to evaluate metabolic functions in tissues from the temporal evolution of a previously injected radiotracer, dynamic positron emission tomography (PET) has become a ubiquitous analysis tool to quantify biological processes. Several quantification techniques from the PET imaging literature require a previous estimation of global time-activity curves (TACs) (herein called factors) representing the concentration of tracer in a reference tissue or blood over time. To this end, factor analysis has often appeared as an unsupervised learning solution for the extraction of factors and their respective fractions in each voxel. Inspired by the hyperspectral unmixing literature, this manuscript addresses two main drawbacks of general factor analysis techniques applied to dynamic PET. The first one is the assumption that the elementary response of each tissue to tracer distribution is spatially homogeneous. Even though this homogeneity assumption has proven its effectiveness in several factor analysis studies, it may not always provide a sufficient description of the underlying data, in particular when abnormalities are present. To tackle this limitation, the models herein proposed introduce an additional degree of freedom to the factors related to specific binding. To this end, a spatially-variant perturbation affects a nominal and common TAC representative of the high-uptake tissue. This variation is spatially indexed and constrained with a dictionary that is either previously learned or explicitly modelled with convolutional nonlinearities affecting non-specific binding tissues. The second drawback is related to the noise distribution in PET images.
Even though the positron decay process can be described by a Poisson distribution, the actual noise in reconstructed PET images is not expected to be simply described by Poisson or Gaussian distributions. Therefore, we propose to consider a popular and quite general loss function, called the $\beta$-divergence, that is able to generalize conventional loss functions such as the least-squares distance, Kullback-Leibler and Itakura-Saito divergences, respectively corresponding to Gaussian, Poisson and Gamma distributions. This loss function is applied to three factor analysis models in order to evaluate its impact on dynamic PET images with different reconstruction characteristics.
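The way the $\beta$-divergence covers the three losses named above can be illustrated with a short sketch. It uses the standard scalar definition of the divergence; the function name is an assumption, and the factor analysis models of the thesis are not reproduced here.

```python
import math

def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for x, y > 0 (standard definition).

    beta = 2 gives half the squared Euclidean distance, beta = 1 the
    (generalized) Kullback-Leibler divergence, beta = 0 the Itakura-Saito
    divergence.
    """
    if beta == 1:
        return x * math.log(x / y) - x + y
    if beta == 0:
        return x / y - math.log(x / y) - 1
    return (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1)) / (beta * (beta - 1))
```

For a whole image one would sum this quantity over all voxels and time frames; the two special cases are the limits of the general formula as beta tends to 1 and to 0.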
|
124 |
String-averaging incremental subgradient methods for constrained convex optimization problems / Média das sequências e métodos de subgradientes incrementais para problemas de otimização convexa com restrições
Oliveira, Rafael Massambone de, 12 July 2017
In this doctoral thesis, we propose new iterative methods for solving a class of convex optimization problems. In general, we consider problems in which the objective function is composed of a finite sum of convex functions and the set of constraints is, at least, convex and closed. The iterative methods we propose are basically designed through the combination of incremental subgradient methods and string-averaging algorithms. Furthermore, in order to obtain methods able to solve optimization problems with many constraints (and possibly in high dimensions), generally given by convex functions, our analysis includes an operator that calculates approximate projections onto the feasible set, instead of the Euclidean projection. This feature is employed in the two methods we propose; one deterministic and the other stochastic. A convergence analysis is proposed for both methods and numerical experiments are performed in order to verify their applicability, especially in large scale problems. / Nesta tese de doutorado, propomos novos métodos iterativos para a solução de uma classe de problemas de otimização convexa. Em geral, consideramos problemas nos quais a função objetivo é composta por uma soma finita de funções convexas e o conjunto de restrições é, pelo menos, convexo e fechado. Os métodos iterativos que propomos são criados, basicamente, através da junção de métodos de subgradientes incrementais e do algoritmo de média das sequências. Além disso, visando obter métodos flexíveis para soluções de problemas de otimização com muitas restrições (e possivelmente em altas dimensões), dadas em geral por funções convexas, a nossa análise inclui um operador que calcula projeções aproximadas sobre o conjunto viável, no lugar da projeção Euclideana. Essa característica é empregada nos dois métodos que propomos; um determinístico e o outro estocástico. 
Uma análise de convergência é proposta para ambos os métodos e experimentos numéricos são realizados a fim de verificar a sua aplicabilidade, principalmente em problemas de grande escala.
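The combination of incremental subgradient steps with string averaging can be sketched on a toy one-dimensional problem: minimize the sum of |x - a_i| over an interval. Everything below is an illustrative assumption (function names, the toy objective, the 1/(k+1) step size, equal string weights), and the exact interval projection stands in for the approximate projection operator studied in the thesis.

```python
def sign(t):
    """A subgradient of |t| (0 is a valid choice at t = 0)."""
    return (t > 0) - (t < 0)

def string_averaging_subgradient(a, strings, lo, hi, x0=0.0, iters=2000):
    """Minimize sum_i |x - a[i]| over [lo, hi].

    `strings` is a list of index lists; each string is processed
    incrementally from the same starting point, and the string end
    points are then averaged and projected onto the feasible interval.
    """
    x = x0
    for k in range(iters):
        step = 1.0 / (k + 1)          # diminishing step size
        ends = []
        for s in strings:
            xs = x
            for i in s:               # incremental pass along one string
                xs -= step * sign(xs - a[i])
            ends.append(xs)
        x = sum(ends) / len(ends)     # string averaging (equal weights)
        x = min(max(x, lo), hi)       # exact projection onto [lo, hi]
    return x
```

Calling it with `a = [1, 2, 3, 10, 11]` and `strings = [[0, 1, 2], [3, 4]]` drives the iterate toward the median of the data, which minimizes this toy objective.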
|
125 |
Métodos de busca em coordenada / Coordinate descent methods
Santos, Luiz Gustavo de Moura dos, 22 November 2017
Problemas reais em áreas como aprendizado de máquina têm chamado atenção pela enorme quantidade de variáveis (> 10^6) e volume de dados. Em problemas dessa escala o custo para se obter e trabalhar com informações de segunda ordem é proibitivo. Tais problemas apresentam características que podem ser aproveitadas por métodos de busca em coordenada. Essa classe de métodos é caracterizada pela alteração de apenas uma ou poucas variáveis a cada iteração. A variante do método comumente descrita na literatura é a minimização cíclica de variáveis. Porém, resultados recentes sugerem que variantes aleatórias do método possuem melhores garantias de convergência. Nessa variante, a cada iteração, a variável a ser alterada é sorteada com uma probabilidade preestabelecida não necessariamente uniforme. Neste trabalho estudamos algumas variações do método de busca em coordenada. São apresentados aspectos teóricos desses métodos, porém focamos nos aspectos práticos de implementação e na comparação experimental entre variações do método de busca em coordenada aplicados a diferentes problemas com aplicações reais. / Real-world problems in areas such as machine learning are known for their huge number of decision variables (> 10^6) and data volume. For such problems, working with second-order derivatives is prohibitive. These problems have properties that benefit the application of coordinate descent/minimization methods. This kind of method is defined by the change of a single decision variable, or a small number of them, at each iteration. In the literature, the commonly found description of this type of method is based on the cyclic change of variables. Recent papers have shown that randomized versions of this method have better convergence properties. This version is based on the change of a single variable chosen randomly at each iteration, based on a fixed, but not necessarily uniform, distribution.
In this work we present some theoretical aspects of such methods, but we focus on practical aspects.
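The difference between the cyclic and the randomized variants described above can be sketched on a small strictly convex quadratic. The function name, the quadratic test problem, and the use of exact minimization along each coordinate are assumptions chosen for illustration, not the experimental setup of the dissertation.

```python
import random

def coordinate_descent(A, b, iters=200, probs=None, seed=0):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive definite A.

    probs=None selects coordinates cyclically; otherwise coordinate i is
    drawn at each iteration with probability proportional to probs[i]
    (a fixed, not necessarily uniform, distribution).
    """
    n = len(b)
    x = [0.0] * n
    rng = random.Random(seed)
    for k in range(iters):
        if probs is None:
            i = k % n                                   # cyclic rule
        else:
            i = rng.choices(range(n), weights=probs)[0]  # randomized rule
        # Exact minimization along coordinate i, all other variables fixed.
        r = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = r / A[i][i]
    return x
```

Each iteration touches one row of `A` only, which is why no second-order information about the full problem is needed.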
|
126 |
Robust Control with Complexity Constraint: A Nevanlinna-Pick Interpolation Approach
Nagamune, Ryozo, January 2002
No description available.
|
127 |
Spectral Estimation by Geometric, Topological and Optimization Methods
Enqvist, Per, January 2001
|
128 |
A convex optimization approach to complexity constrained analytic interpolation with applications to ARMA estimation and robust control
Blomqvist, Anders, January 2005
Analytic interpolation theory has several applications in systems and control. In particular, solutions of low degree, or more generally of low complexity, are of special interest since they allow for synthesis of simpler systems. The study of degree constrained analytic interpolation was initiated in the early 1980s and during the past decade it has seen significant progress. This thesis contributes to three different aspects of complexity constrained analytic interpolation: theory, numerical algorithms, and design paradigms. The contributions are closely related; shortcomings of previous design paradigms motivate development of the theory, which in turn calls for new robust and efficient numerical algorithms. Two main theoretical developments are studied in the thesis. Firstly, the spectral Kullback-Leibler approximation formulation is merged with simultaneous cepstral and covariance interpolation. For this formulation, both uniqueness of the solution and smoothness with respect to data are proven. Secondly, the theory is generalized to matrix-valued interpolation, but then only allowing for covariance-type interpolation conditions. Again, uniqueness and smoothness with respect to data are proven. Three algorithms are presented. Firstly, a refinement of a previous algorithm allowing for multiple as well as matrix-valued interpolation in an optimization framework is presented. Secondly, an algorithm capable of solving the boundary case, that is, with spectral zeros on the unit circle, is given. This also yields an inherent numerical robustness. Thirdly, a new algorithm treating the problem with both cepstral and covariance conditions is presented. Two design paradigms have sprung out of the complexity constrained analytic interpolation theory. Firstly, in robust control it enables low degree H-infinity controller design. This is illustrated by a low degree controller design for a benchmark problem in MIMO sensitivity shaping.
Also, user support for the tuning of controllers within the design paradigm for the SISO case is presented. Secondly, in ARMA estimation it provides unique model estimates, which depend smoothly on the data, and it enables frequency weighting. For AR estimation, a covariance extension approach to frequency weighting is discussed, and an example is given as an illustration. For ARMA estimation, simultaneous cepstral and covariance matching is generalized to include prefiltering. An example indicates that this might yield asymptotically efficient estimates.
|
129 |
Baseband Processing in Analog Combining MIMO Systems: From Theoretical Design to FPGA Implementation
Elvira Arregui, Víctor, 21 July 2011
In this thesis, we consider an analog antenna combining architecture for a MIMO wireless transceiver, while pointing out its advantages with respect to the traditional MIMO architectures. In the first part of this work, we focus on the transceiver design, especially the calculation of the beamformers that must be applied at the RF. This analysis is performed in an OFDM system under different assumptions on the channel state information. As a result, several criteria and algorithms for the selection of the beamformers are proposed. In the second part, we address the FPGA design and implementation of a baseband processor for this architecture. This baseband processor is based on the IEEE 802.11a standard. Finally, some real-time tests of the implemented baseband processor are carried out both in stand-alone configuration and with the whole physical layer setup. / En esta tesis consideramos una arquitectura de combinación analógica de antenas para una estación inalámbrica MIMO, señalando las ventajas de ésta con respecto a la arquitectura tradicional MIMO. En la primera parte de este trabajo analizamos el cálculo de los pesos que se deben aplicar en RF. Este análisis es realizado para un sistema OFDM bajo diferentes suposiciones sobre el conocimiento del canal en el transmisor. Como resultado, se ofrecen varios criterios y algoritmos para el cálculo de los pesos. La segunda parte se centra en el diseño y la implementación FPGA de un procesador banda base para esta arquitectura. Este procesador está basado en el estándar IEEE 802.11a. Finalmente se llevan a cabo algunos experimentos en tiempo-real del procesador banda base. Estos experimentos se han realizado tanto con el procesador aislado como integrado en el resto de la capa física del sistema.
|
130 |
Dynamic Graph Generation and an Asynchronous Parallel Bundle Method Motivated by Train Timetabling
Fischer, Frank, 12 July 2013
Lagrangian relaxation is a successful solution approach for many combinatorial optimisation problems, one of them being the train timetabling problem (TTP). We model this problem using time expanded networks for the single train schedules and coupling constraints to enforce restrictions like station capacities and headway times. Lagrangian relaxation of these coupling constraints leads to shortest path subproblems in the time expanded networks and is solved using a proximal bundle method. However, large instances from our practical partner Deutsche Bahn lead to computationally intractable models. In this thesis we develop two new algorithmic techniques to improve the solution process for this kind of optimisation problem.
The first new technique, Dynamic Graph Generation (DGG), aims at improving the computation of the shortest path subproblems in large time expanded networks. Without sacrificing any accuracy, DGG makes it possible to store only small parts of the networks and to dynamically extend them whenever the stored part proves to be too small. This is possible by exploiting the property of the objective function in many scheduling applications to prefer early paths or due times, respectively. We prove that DGG can be implemented very efficiently and that its running time and the number of nodes that have to be stored additionally do not depend on the size of the time expanded network but only on the length of the train routes.
The second technique is an asynchronous and parallel bundle method (APBM). Traditional bundle methods require one solution of each subproblem in each iteration. However, many practical applications, e.g. the TTP, consist of rather loosely coupled subproblems. The APBM chooses only small subspaces corresponding to the Lagrange multipliers of strongly violated coupling constraints and optimises only these variables while keeping all other variables fixed. Several subspaces of disjoint variables may be chosen simultaneously and are optimised in parallel. The solutions of the subspace problems are incorporated into the global data as soon as they are available, without any synchronisation mechanism. However, in order to guarantee convergence, the algorithm automatically detects dependencies between different subspaces and respects these dependencies in future subspace selections. We prove the convergence of the APBM under reasonable assumptions for both the dual and the associated primal aggregate data. The APBM is then further extended to problems with unknown dependencies between subproblems and constraints in the Lagrangian relaxation problem. The algorithm automatically detects these dependencies and respects them in future iterations. Again we prove the convergence of this algorithm under reasonable assumptions.
Finally we test our solution approach for the TTP on some real world instances of Deutsche Bahn. Using an iterative rounding heuristic based on the approximate fractional solutions obtained by the Lagrangian relaxation we are able to compute feasible schedules for all trains in a subnetwork of about 10% of the whole German network in about 12 hours. In these timetables 99% of all passenger trains could be scheduled with no significant delay and the travel time of the freight trains could be reduced by about one hour on average.
|