1

Convex relaxation for the planted clique, biclique, and clustering problems

Ames, Brendan January 2011 (has links)
A clique of a graph G is a set of pairwise adjacent nodes of G. Similarly, a biclique (U, V) of a bipartite graph G is a pair of disjoint, independent vertex sets such that each node in U is adjacent to every node in V in G. We consider the problems of identifying the maximum clique of a graph, known as the maximum clique problem, and identifying the biclique (U, V) of a bipartite graph that maximizes the product |U| · |V|, known as the maximum edge biclique problem. We show that finding a clique or biclique of a given size in a graph is equivalent to finding a rank-one matrix satisfying a particular set of linear constraints. These problems can therefore be formulated as rank minimization problems and relaxed to convex programs by replacing rank with its convex envelope, the nuclear norm. Both problems are NP-hard, yet we show that our relaxation is exact when the input graph contains a large clique or biclique plus additional nodes and edges. For each problem, we provide two analyses of when our relaxation is exact. In the first, the diversionary edges are added deterministically by an adversary. In the second, each potential edge is added to the graph independently at random with fixed probability p. In the random case, our bounds match the earlier bounds of Alon, Krivelevich, and Sudakov, as well as Feige and Krauthgamer, for the maximum clique problem. We extend these results and techniques to the k-disjoint-clique problem. The maximum node k-disjoint-clique problem is to find a set of k disjoint cliques of a given input graph containing the maximum number of nodes. Given an input graph G and nonnegative edge weights w, the maximum mean weight k-disjoint-clique problem seeks the set of k disjoint cliques of G that maximizes the sum of the average weights, with respect to w, of the edges of the complete subgraphs of G induced by the cliques. These problems may be considered as a way to pose the clustering problem, in which one wants to partition a given data set so that the data items in each partition, or cluster, are similar and the items in different clusters are dissimilar. For the graph G whose nodes represent a given data set and in which two nodes are adjacent if and only if the corresponding items are similar, clustering the data into k disjoint clusters is equivalent to partitioning G into k disjoint cliques. Similarly, given a complete graph with nodes corresponding to a given data set and edge weights indicating the similarity between each pair of items, the data may be clustered by solving the maximum mean weight k-disjoint-clique problem. We show that both instances of the k-disjoint-clique problem can be formulated as rank-constrained optimization problems and relaxed to semidefinite programs using the nuclear norm relaxation of rank. We also show that when the input instance corresponds to a collection of k disjoint planted cliques plus additional edges and nodes, this semidefinite relaxation is exact for both problems. We provide theoretical bounds that guarantee exactness of our relaxation, and empirical examples of successful applications of our algorithm to synthetic data sets as well as data sets from clustering applications.
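
A minimal sketch of the kind of nuclear-norm relaxation this abstract describes, assuming cvxpy and a toy 5-node graph (the toy graph and this exact constraint set are our illustration and may differ from the thesis's precise formulation): minimize the nuclear norm over matrices that vanish on non-edges and whose entries sum to k², so that the rank-one indicator matrix v vᵀ of a planted k-clique is feasible.

```python
import cvxpy as cp
import numpy as np

# 5-node toy graph: a planted triangle on nodes {0, 1, 2} plus two extra nodes.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
])
k, n = 3, A.shape[0]

X = cp.Variable((n, n), symmetric=True)
# X must vanish on non-edges (off-diagonal pairs with no edge) ...
constraints = [X[i, j] == 0
               for i in range(n) for j in range(n)
               if i != j and A[i, j] == 0]
# ... and its entries must sum to k^2, as they do for the rank-one
# indicator matrix v v^T of a k-clique with v in {0,1}^n.
constraints.append(cp.sum(X) == k * k)

# Nuclear norm: the convex envelope of rank on the unit spectral-norm ball.
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()

# When the relaxation is exact, X is (nearly) rank one and the clique shows
# up as the large entries of the top eigenvector.
w, V = np.linalg.eigh(X.value)
print("clique indicator (approx.):", np.round(np.abs(V[:, -1]) * np.sqrt(k), 2))
```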
2

A probabilistic framework and algorithms for modeling and analyzing multi-instance data

Behmardi, Behrouz 28 November 2012 (has links)
Multi-instance data, in which each object (e.g., a document) is a collection of instances (e.g., words), are widespread in machine learning, signal processing, computer vision, bioinformatics, music, and the social sciences. Existing probabilistic models, e.g., latent Dirichlet allocation (LDA), probabilistic latent semantic indexing (pLSI), and discrete component analysis (DCA), have been developed for modeling and analyzing multi-instance data. Such models introduce a generative process for multi-instance data which includes a low-dimensional latent structure. While these models offer great freedom in capturing the natural structure of the data, their inference can present challenges. For example, the sensitivity of such models to the choice of hyper-parameters requires careful tuning (e.g., through cross-validation), which incurs a large computational cost. Inference for fully Bayesian models, which contain no hyper-parameters, often involves slowly converging sampling methods. In this work, we develop approaches for addressing these challenges and further enhancing the utility of such models. This dissertation demonstrates a unified convex framework for probabilistic modeling of multi-instance data. The three main aspects of the proposed framework are as follows. First, joint regularization is incorporated into multiple density estimation to simultaneously learn the structure of the distribution space and infer each distribution. Second, a novel confidence-constraints framework is used to facilitate a tuning-free approach to controlling the amount of regularization required for joint multiple density estimation, with theoretical guarantees on correct structure recovery. Third, we formulate the problem in a convex framework and propose efficient optimization algorithms to solve it. This work addresses the unique challenges associated with both discrete and continuous domains. In the discrete domain, we propose confidence-constrained rank minimization (CRM) to recover the exact number of topics in topic models, with theoretical guarantees on the recovery probability and the mean squared error of the estimate. We provide a computationally efficient optimization algorithm for the problem to extend the applicability of the proposed framework to large real-world datasets. In the continuous domain, we propose to use the maximum entropy (MaxEnt) framework for multi-instance datasets. In this approach, bags of instances are represented as distributions using the principle of MaxEnt. We learn basis functions which span the space of distributions for jointly regularized density estimation; the basis functions are analogous to topics in a topic model. We validate the efficiency of the proposed framework in the discrete and continuous domains through an extensive set of experiments on synthetic datasets as well as real-world image and text datasets, and compare the results with state-of-the-art algorithms. / Graduation date: 2013
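
A minimal sketch of the confidence-constrained idea described above, assuming cvxpy; the synthetic matrix and the radius eps are illustrative stand-ins for the statistic and the theory-driven confidence bound used in the thesis, not its actual quantities.

```python
import cvxpy as cp
import numpy as np

# Synthetic noisy low-rank matrix standing in for a topic-model statistic.
rng = np.random.default_rng(0)
L = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 30))  # rank 5
Y = L + 0.05 * rng.standard_normal((20, 30))                     # noisy observation

# Tuning-free idea: instead of min ||X||_* + lam * ||Y - X||_F^2 with a
# cross-validated weight lam, constrain the fit to a confidence ball whose
# radius eps comes from a concentration bound on the noise, and minimize
# the nuclear norm (the convex rank surrogate) alone.
eps = 0.05 * np.sqrt(Y.size) * 1.1   # stand-in for a theory-driven radius

X = cp.Variable(Y.shape)
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [cp.norm(Y - X, "fro") <= eps])
prob.solve()

# Tolerance chosen loosely to absorb the solver's default precision.
print("recovered rank:", np.linalg.matrix_rank(X.value, tol=1e-2))
```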
3

Nonsmooth optimization for statistical learning with structured matrix regularization

Pierucci, Federico 23 June 2017 (has links)
Training machine learning methods boils down to solving optimization problems whose objective functions often decompose into two parts: a) the empirical risk, built upon the loss function, whose shape is determined by the performance metric and the noise assumptions; b) the regularization penalty, built upon a norm or a gauge function, whose structure is determined by the prior information available for the problem at hand. Common loss functions, such as the hinge loss for binary classification, or more advanced loss functions, such as the one arising in classification with a reject option, are non-smooth. Sparse regularization penalties such as the (vector) ℓ1 penalty, or the (matrix) nuclear-norm penalty, are also non-smooth. However, basic non-smooth optimization algorithms, such as subgradient optimization or bundle-type methods, do not leverage the composite structure of the objective. The goal of this thesis is to study doubly non-smooth learning problems (with non-smooth loss functions and non-smooth regularization penalties) and first-order optimization algorithms that leverage the composite structure of non-smooth objectives.

In the first chapter, we introduce new regularization penalties, called the group Schatten norms, to generalize the standard Schatten norms to block-structured matrices. We establish the main properties of the group Schatten norms using tools from convex analysis and linear algebra; in particular, we retrieve some convex envelope properties. We discuss several potential applications of the group nuclear norm, in collaborative filtering, database compression, and multi-label image tagging.

In the second chapter, we present a survey of smoothing techniques that allow us to use first-order optimization algorithms designed for composite objectives decomposing into a smooth part and a non-smooth part. We also show how smoothing can be used on the loss function corresponding to the top-k accuracy, used for ranking and multi-class classification problems. We outline some first-order algorithms that can be used in combination with the smoothing technique: i) conditional gradient algorithms; ii) proximal gradient algorithms; iii) incremental gradient algorithms.

In the third chapter, we study further conditional gradient algorithms for solving doubly non-smooth optimization problems. We show that adaptive smoothing combined with the standard conditional gradient algorithm yields new conditional gradient algorithms with the expected theoretical convergence guarantees. We present promising experimental results in collaborative filtering for movie recommendation and in image categorization.
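
A minimal sketch of the pattern the thesis studies, in plain numpy: Huber-smooth a non-smooth ℓ1-type fit, then run a conditional gradient (Frank-Wolfe) loop over a nuclear-norm ball. The data, the radius tau, and the smoothing parameter mu are illustrative assumptions, and the thesis's adaptive smoothing schedule is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, tau, mu = 15, 12, 5.0, 0.05
M = np.outer(rng.standard_normal(m), rng.standard_normal(n))   # rank-1 target
mask = rng.random((m, n)) < 0.5                                # observed entries

def huber_grad(r, mu):
    # Gradient of the Huber smoothing of |r| with parameter mu:
    # r/mu on [-mu, mu], sign(r) outside.
    return np.clip(r / mu, -1.0, 1.0)

W = np.zeros((m, n))
for t in range(200):
    G = mask * huber_grad(W - M, mu)        # gradient of the smoothed loss
    # Linear minimization oracle over {W : ||W||_* <= tau}:
    # -tau times the top singular vector pair of the gradient.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    S = -tau * np.outer(U[:, 0], Vt[0])
    gamma = 2.0 / (t + 2.0)                 # standard Frank-Wolfe step size
    W = (1 - gamma) * W + gamma * S

print("observed-entry MAE:", np.abs(mask * (W - M)).sum() / mask.sum())
```

The appeal of the conditional gradient step here is that its linear minimization oracle over the nuclear-norm ball needs only one top singular vector pair per iteration, rather than a full singular value decomposition as a proximal step would.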
4

Robust Subspace Estimation Using Low-rank Optimization. Theory And Applications In Scene Reconstruction, Video Denoising, And Activity Recognition.

Oreifej, Omar 01 January 2013 (has links)
In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time. Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories, etc.). If the assumption that these observations are drawn from a linear subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve this is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the ℓ1 norm. The robust estimation has a two-fold advantage: first, the obtained basis better represents the actual subspace because it does not include contributions from the outliers; second, the detected outliers are often of specific interest in many applications, as we show throughout this thesis. We demonstrate four different formulations and applications of low-rank optimization.

First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by water waves. The main drawback of most previous attempts to tackle this problem is that they depend heavily on modelling the waves, which is in fact ill-posed since the actual behavior of the waves, along with the imaging process, is complicated and includes several noise components; therefore, their results are not satisfactory. In contrast, we propose a novel approach which outperforms the state of the art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot be detected directly using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage "sparsifies" the noise and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high-quality mean and a better-structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage in which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms the state of the art on all testing sequences.

Second, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present in addition to independently moving targets. Typical approaches to turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which can often be of great interest. Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulent sequence into three components: the background, the turbulence, and the object. We simplify this extremely difficult problem into a minimization of the nuclear norm, the Frobenius norm, and the ℓ1 norm. Our method is based on two observations: first, the turbulence causes dense, Gaussian-like noise and can therefore be captured by the Frobenius norm, while the moving objects are sparse and can thus be captured by the ℓ1 norm; second, since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted by atmospheric turbulence and include extremely tiny moving objects.

In addition to robustly detecting the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use that to estimate the camera motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicate the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories, which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done on frames of unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features, which capture the characteristics of the trajectories. Consequently, an SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets and obtained promising results.

Finally, we consider a more challenging problem, referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically poses significant challenges because it involves videos of highly variable content, noise, length, frame size, etc. In this extremely challenging task, high-level features have recently shown a promising direction, as in [53, 129], where core low-level events, referred to as concepts, are annotated and modelled using a portion of the training data, and each event is then described by its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. To address this problem, we propose a novel low-rank formulation which combines the precisely annotated videos used to train the concepts with the rich high-level features. Our approach finds a new representation for each event which is not only low-rank but also constrained to adhere to the concept annotation, thus suppressing the noise and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on the large-scale real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminative power of the high-level features by a significant margin.
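
The three-term decomposition described above maps naturally onto a small convex program. The following is a minimal sketch, assuming cvxpy, with synthetic data and illustrative weights lam and gam rather than the dissertation's tuned values (and without its Gaussian turbulence-model constraint).

```python
import cvxpy as cp
import numpy as np

# Synthetic "sequence": one vectorized frame per column.
rng = np.random.default_rng(1)
npix, nframes = 60, 20
F = np.outer(rng.random(npix), np.ones(nframes))   # static background (rank one)
F += 0.05 * rng.standard_normal((npix, nframes))   # dense "turbulence"
F[5:8, 10] += 1.0                                  # a tiny "moving object"

B = cp.Variable((npix, nframes))   # background: low rank -> nuclear norm
T = cp.Variable((npix, nframes))   # turbulence: dense    -> Frobenius norm
O = cp.Variable((npix, nframes))   # object:     sparse   -> entrywise l1 norm

lam = 1.0 / np.sqrt(max(npix, nframes))   # common RPCA-style weight (assumption)
gam = 0.5                                 # illustrative turbulence weight
obj = cp.normNuc(B) + gam * cp.norm(T, "fro") + lam * cp.sum(cp.abs(O))
prob = cp.Problem(cp.Minimize(obj), [F == B + T + O])
prob.solve()

# The sparse component should concentrate on the frame holding the "object".
print("object energy per frame:", np.round(np.abs(O.value).sum(axis=0), 2))
```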
