Spelling suggestions: "subject:"aximum variance"" "subject:"amaximum variance""
1 |
Isometry and convexity in dimensionality reductionVasiloglou, Nikolaos 30 March 2009 (has links)
The size of data generated every year follows an exponential growth. The number of data points as well as the dimensions have increased dramatically the past 15 years. The gap between the demand from the industry in data processing and the solutions provided by the machine learning community is increasing. Despite the growth in memory and computational power, advanced statistical processing on the order of gigabytes is beyond any possibility. Most sophisticated Machine Learning algorithms require at least quadratic complexity. With the current computer model architecture, algorithms with higher complexity than linear O(N) or O(N logN) are not considered practical. Dimensionality reduction is a challenging problem in machine learning. Often data represented as multidimensional points happen to have high dimensionality. It turns out that the information they carry can be expressed with much less dimensions. Moreover the reduced dimensions of the data can have better interpretability than the original ones. There is a great variety of dimensionality reduction algorithms under the theory of Manifold Learning. Most of the methods such as Isomap, Local Linear Embedding, Local Tangent Space Alignment, Diffusion Maps etc. have been extensively studied under the framework of Kernel Principal Component Analysis (KPCA). In this dissertation we study two current state of the art dimensionality reduction methods, Maximum Variance Unfolding (MVU) and Non-Negative Matrix Factorization (NMF). These two dimensionality reduction methods do not fit under the umbrella of Kernel PCA. MVU is cast as a Semidefinite Program, a modern convex nonlinear optimization algorithm, that offers more flexibility and power compared to iv KPCA. Although MVU and NMF seem to be two disconnected problems, we show that there is a connection between them. Both are special cases of a general nonlinear factorization algorithm that we developed. Two aspects of the algorithms are of particular interest: computational complexity and interpretability. In other words computational complexity answers the question of how fast we can find the best solution of MVU/NMF for large data volumes. Since we are dealing with optimization programs, we need to find the global optimum. Global optimum is strongly connected with the convexity of the problem. Interpretability is strongly connected with local isometry1 that gives meaning in relationships between data points. Another aspect of interpretability is association of data with labeled information. The contributions of this thesis are the following:
1. MVU is modified so that it can scale more efficient. Results are shown on 1 million speech datasets. Limitations of the method are highlighted.
2. An algorithm for fast computations for the furthest neighbors is presented for the first time in the literature.
3. Construction of optimal kernels for Kernel Density Estimation with modern convex programming is presented. For the first time we show that the Leave One Cross Validation (LOOCV) function is quasi-concave.
4. For the first time NMF is formulated as a convex optimization problem
5. An algorithm for the problem of Completely Positive Matrix Factorization is presented.
6. A hybrid algorithm of MVU and NMF the isoNMF is presented combining advantages of both methods.
7. The Isometric Separation Maps (ISM) a variation of MVU that contains classification information is presented.
8. Large scale nonlinear dimensional analysis on the TIMIT speech database is performed.
9. A general nonlinear factorization algorithm is presented based on sequential convex programming. Despite the efforts to scale the proposed methods up to 1 million data points in reasonable time, the gap between the industrial demand and the current state of the art is still orders of magnitude wide.
|
2 |
Linear systems with Markov jumps and multiplicative noises: the constrained total variance problem. / Sistemas lineares com saltos Markovianos e ruídos multiplicativos: o problema da variância total restrita.Barbieri, Fabio 20 December 2016 (has links)
In this work we study the stochastic optimal control problem of discrete-time linear systems subject to Markov jumps and multiplicative noises. We consider the multiperiod and finite time horizon optimization of a mean-variance cost function under a new criterion. In this new problem, we apply a constraint on the total output variance weighted by its risk parameter while maximizing the expected output. The optimal control law is obtained from a set of interconnected Riccati difference equations, extending previous results in the literature. The application of our results is exemplified by numerical simulations of a portfolio of stocks and a risk-free asset. / Neste trabalho, estudamos o problema do controle ótimo estocástico de sistemas lineares em tempo discreto sujeitos a saltos Markovianos e ruídos multiplicativos. Consideramos a otimização multiperíodo, com horizonte de tempo finito, de um funcional da média-variância sob um novo critério. Neste novo problema, maximizamos o valor esperado da saída do sistema ao mesmo tempo em que limitamos a sua variância total ponderada pelo seu parâmetro de risco. A lei de controle ótima é obtida através de um conjunto de equações de diferenças de Riccati interconectadas, estendendo resultados anteriores da literatura. São apresentadas simulações numéricas para uma carteira de investimentos com ações e um ativo de risco para exemplificarmos a aplicação de nossos resultados.
|
3 |
Linear systems with Markov jumps and multiplicative noises: the constrained total variance problem. / Sistemas lineares com saltos Markovianos e ruídos multiplicativos: o problema da variância total restrita.Fabio Barbieri 20 December 2016 (has links)
In this work we study the stochastic optimal control problem of discrete-time linear systems subject to Markov jumps and multiplicative noises. We consider the multiperiod and finite time horizon optimization of a mean-variance cost function under a new criterion. In this new problem, we apply a constraint on the total output variance weighted by its risk parameter while maximizing the expected output. The optimal control law is obtained from a set of interconnected Riccati difference equations, extending previous results in the literature. The application of our results is exemplified by numerical simulations of a portfolio of stocks and a risk-free asset. / Neste trabalho, estudamos o problema do controle ótimo estocástico de sistemas lineares em tempo discreto sujeitos a saltos Markovianos e ruídos multiplicativos. Consideramos a otimização multiperíodo, com horizonte de tempo finito, de um funcional da média-variância sob um novo critério. Neste novo problema, maximizamos o valor esperado da saída do sistema ao mesmo tempo em que limitamos a sua variância total ponderada pelo seu parâmetro de risco. A lei de controle ótima é obtida através de um conjunto de equações de diferenças de Riccati interconectadas, estendendo resultados anteriores da literatura. São apresentadas simulações numéricas para uma carteira de investimentos com ações e um ativo de risco para exemplificarmos a aplicação de nossos resultados.
|
4 |
Social Graph Anonymization / Anonymisation de graphes sociauxNguyen, Huu-Hiep 04 November 2016 (has links)
La vie privée est une préoccupation des utilisateurs des réseaux sociaux. Les réseaux sociaux sont une source de données précieuses pour des analyses scientifiques ou commerciales. Cette thèse aborde trois problèmes de confidentialité des réseaux sociaux: l'anonymisation de graphes sociaux, la détection de communautés privées et l'échange de liens privés. Nous abordons le problème d'anonymisation de graphes via la sémantique de l'incertitude et l'intimité différentielle. Pour la première, nous proposons un modèle général appelé Uncertain Adjacency Matrix (UAM) qui préserve dans le graphe anonymisé les degrés des nœuds du graphe non-anonymisé. Nous analysons deux schémas proposés récemment et montrons leur adaptation dans notre modèle. Nous aussi présentons notre approche dite MaxVar. Pour la technique d'intimité différentielle, le problème devient difficile en raison de l'énorme espace des graphes anonymisés possibles. Un grand nombre de systèmes existants ne permettent pas de relâcher le budget contrôlant la vie privée, ni de déterminer sa borne supérieure. Dans notre approche nous pouvons calculer cette borne. Nous introduisons le nouveau schéma Top-m-Filter de complexité linéaire et améliorons la technique récente EdgeFlip. L'évaluation de ces algorithmes sur une large gamme de graphes donne un panorama de l'état de l'art. Nous présentons le problème original de la détection de la communauté dans le cadre de l'intimité différentielle. Nous analysons les défis majeurs du problème et nous proposons quelques approches pour les aborder sous deux angles: par perturbation d'entrée (schéma LouvainDP) et par perturbation d'algorithme (schéma ModDivisive) / Privacy is a serious concern of users in daily usage of social networks. Social networks are a valuable data source for large-scale studies on social organization and evolution and are usually published in anonymized forms. This thesis addresses three privacy problems of social networks: graph anonymization, private community detection and private link exchange. First, we tackle the problem of graph anonymization via uncertainty semantics and differential privacy. As for uncertainty semantics, we propose a general obfuscation model called Uncertain Adjacency Matrix (UAM) that keep expected node degrees equal to those in the unanonymized graph. We analyze two recently proposed schemes and show their fitting into the model. We also present our scheme Maximum Variance (MaxVar) to fill the gap between them. Using differential privacy, the problem is very challenging because of the huge output space of noisy graphs. A large body of existing schemes on differentially private release of graphs are not consistent with increasing privacy budgets as well as do not clarify the upper bounds of privacy budgets. In this thesis, such a bound is provided. We introduce the new linear scheme Top-m-Filter (TmF) and improve the existing technique EdgeFlip. Thorough comparative evaluation on a wide range of graphs provides a panorama of the state-of-the-art's performance as well as validates our proposed schemes. Second, we present the problem of community detection under differential privacy. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation (LouvainDP) and algorithm perturbation (ModDivisive)
|
Page generated in 0.356 seconds