11

Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models

Kastner, Gregor, Frühwirth-Schnatter, Sylvia, Lopes, Hedibert Freitas 24 February 2016
We discuss efficient Bayesian estimation of dynamic covariance matrices in multivariate time series through a factor stochastic volatility model. In particular, we propose two interweaving strategies (Yu and Meng, Journal of Computational and Graphical Statistics, 20(3), 531-570, 2011) to substantially accelerate convergence and mixing of standard MCMC approaches. Similar to marginal data augmentation techniques, the proposed acceleration procedures exploit non-identifiability issues which frequently arise in factor models. Our new interweaving strategies are easy to implement and come at almost no extra computational cost; nevertheless, they can boost estimation efficiency by several orders of magnitude as is shown in extensive simulation studies. To conclude, the application of our algorithm to a 26-dimensional exchange rate data set illustrates the superior performance of the new approach for real-world data. / Series: Research Report Series / Department of Statistics and Mathematics
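As a point of reference for the model class being estimated, here is a minimal Python sketch that simulates from a basic factor stochastic volatility model; it is not the authors' interweaving sampler, and the dimensions, parameter values, and function name are illustrative only. The rescaling freedom it exposes (multiplying the loadings by a diagonal matrix while dividing the factors by the same matrix leaves the observations unchanged) is the kind of non-identifiability that interweaving strategies exploit.

```python
import numpy as np

def simulate_factor_sv(T=500, m=5, r=2, seed=0):
    """Simulate y_t = Lambda f_t + eps_t, where the log-variances of both the
    factors f_t and the idiosyncratic errors eps_t follow AR(1) processes
    (a basic factor stochastic volatility model; all parameters illustrative)."""
    rng = np.random.default_rng(seed)
    Lambda = rng.normal(scale=0.5, size=(m, r))      # factor loadings
    mu, phi, sigma = -1.0, 0.95, 0.2                 # log-variance AR(1) parameters

    def sv_path(T):
        """One AR(1) log-variance path started from its stationary distribution."""
        h = np.empty(T)
        h[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()
        for t in range(1, T):
            h[t] = mu + phi * (h[t - 1] - mu) + sigma * rng.standard_normal()
        return h

    h_fac = np.column_stack([sv_path(T) for _ in range(r)])   # factor log-variances
    h_idi = np.column_stack([sv_path(T) for _ in range(m)])   # idiosyncratic log-variances
    f = np.exp(h_fac / 2) * rng.standard_normal((T, r))       # latent factors
    eps = np.exp(h_idi / 2) * rng.standard_normal((T, m))     # idiosyncratic shocks
    y = f @ Lambda.T + eps                                    # observed series
    return y, Lambda

y, Lambda = simulate_factor_sv()
print(y.shape)   # (500, 5)
```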
12

Representações textuais e a geração de hubs: um estudo comparativo / Textual representations and the generation of hubs: a comparative study

Aguiar, Raul Freire January 2017
Advisor: Prof. Dr. Ronaldo Pratti / Master's dissertation, Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2017. / The hubness phenomenon, together with the curse of dimensionality, has been studied from different perspectives in recent years. These studies show that the problem is present in many real-world data sets and that the presence of hubs (the tendency of a few examples to appear frequently in the nearest-neighbor lists of other examples) has a number of undesirable consequences, such as degrading classifier performance. In text mining tasks, the problem also depends on how the documents are represented, so the main objective of this dissertation is to evaluate the impact of hub formation in different textual representations. To the best of our knowledge, and during the period of this research, no in-depth study of the effects of hubness across different textual representations was available in the literature. The results suggest that different textual representations lead to corpora with different propensities for hub formation. We also observed that the incidence of hubs in the different representations has a similar influence on some classifiers. In addition, we analyzed classifier performance after removing documents flagged as hubs in pre-established proportions of the total data set size. For some algorithms, this removal tended to improve performance. Thus, although not always effective, identifying and removing hubs with a predominantly bad neighborhood can be a useful preprocessing technique for improving the predictive performance of text classification.
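To make the k-occurrence idea concrete, here is a short Python sketch (the toy corpus, function name, and parameter choices are mine, not the dissertation's experimental setup) that scores the hubness of a TF-IDF representation as the skewness of how often each document appears in other documents' k-nearest-neighbor lists; a strongly right-skewed count distribution indicates hub formation.

```python
import numpy as np
from scipy.stats import skew
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def hubness_skewness(X, k=10, metric="cosine"):
    """Skewness of the k-occurrence distribution: how often each document
    appears in the k-nearest-neighbor lists of the other documents."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric=metric).fit(X)
    _, idx = nn.kneighbors(X)            # column 0 is each point itself
    counts = np.bincount(idx[:, 1:].ravel(), minlength=X.shape[0])
    return skew(counts)

docs = ["the cat sat on the mat", "dogs and cats play", "stock markets fell today",
        "the market rallied", "cats chase mice", "mice fear cats"]   # toy corpus
X = TfidfVectorizer().fit_transform(docs)
print(hubness_skewness(X, k=2))
```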
13

Contributions à l'inférence statistique dans les modèles de régression partiellement linéaires additifs / Contributions to the statistical inference in partially linear additive regression model

Chokri, Khalid 21 November 2014
Parametric regression models provide powerful tools for analyzing data when the models are correctly specified, but they may suffer from large modelling biases when their structure is misspecified. As an alternative, nonparametric smoothing methods ease these concerns by letting the data themselves shape the model. In multivariate settings, however, nonparametric estimators are hampered by the so-called curse of dimensionality: their rate of convergence deteriorates as the dimension of the covariates grows. One way to attenuate this difficulty is to model covariate effects through a partially linear structure, a combination of a linear part and a nonlinear part. To reduce the impact of dimension on the estimation of the nonlinear part as well, we impose an additive structure on it, which leads to the partially linear additive regression model. The aim of this thesis is to establish asymptotic results for the various parameters of this model (consistency, rates of convergence, asymptotic normality and a law of the iterated logarithm) and to construct hypothesis tests concerning the model structure, such as the additivity of the nonlinear part, and its parameters.
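For reference, the model under study can be written compactly; this is a transcription in standard notation rather than the thesis's own display. Because each additive component depends on a single covariate, each can be estimated at a one-dimensional nonparametric rate, which is how the additive structure tempers the curse of dimensionality affecting the nonlinear part.

```latex
% Partially linear additive regression model, standard notation
% (E[m_j(Z_j)] = 0 is imposed so the additive components are identifiable):
Y \;=\; \mathbf{X}^{\top}\boldsymbol{\beta} \;+\; \sum_{j=1}^{d} m_{j}(Z_{j}) \;+\; \varepsilon,
\qquad \mathbb{E}\!\left[\varepsilon \mid \mathbf{X}, \mathbf{Z}\right] = 0.
```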
14

A Design Space Exploration Process for Large Scale, Multi-Objective Computer Simulations

Zentner, John Marc 07 July 2006 (has links)
The primary contributions of this thesis are associated with the development of a new method for exploring the relationships between inputs and outputs of large-scale computer simulations. The proposed design space exploration procedure uses a hierarchical partitioning method to help mitigate the curse of dimensionality often associated with the analysis of large-scale systems. Closely coupled with the use of a partitioning approach is the problem of how to partition the system. This thesis therefore also introduces a quantitative method developed to aid the user in finding a set of good partitions for creating partitioned metamodels of large-scale systems. The new hierarchically partitioned metamodeling scheme, the lumped parameter model (LPM), was developed to address two primary limitations of current partitioning methods for large-scale metamodeling. First, the LPM was formulated to remove the need to rely on variable redundancies between partitions to account for potentially important interactions. By using a hierarchical structure, the LPM accounts for neglected direct interactions indirectly, through the interactions that occur between the lumped parameters in intermediate- to top-level mappings. Second, the LPM was developed to allow hierarchical modeling of black-box analyses that lack natural intermediate variables around which to partition the system. The second contribution of this thesis is a graph-based partitioning method for large-scale, black-box systems. It combines the graph and sparse-matrix decomposition methods used in the electrical engineering community with the results of a screening test to create a quantitative method for partitioning large-scale, black-box systems. An ANOVA analysis of the screening-test results reveals the sparsity structure of the large-scale system; with this information known, sparse-matrix and graph-theoretic partitioning schemes can then be used to generate candidate sets of partitions for use with the lumped parameter model.
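The partitioning idea lends itself to a small illustration. The sketch below is not the thesis's lumped parameter model or its ANOVA-based procedure; it is a toy Python version of the underlying intuition, in which a crude two-level probe of a black-box function exposes which input pairs interact, and a graph algorithm then groups the inputs into candidate partitions (the example function and all names are invented).

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def interaction_matrix(f, n_vars, lo=-1.0, hi=1.0):
    """Crude two-level screening: estimate the magnitude of each two-factor
    interaction of a black-box f from a 2x2 factorial probe around a baseline."""
    base = np.zeros(n_vars)
    A = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            def probe(a, b):
                x = base.copy()
                x[i], x[j] = a, b
                return f(x)
            A[i, j] = A[j, i] = abs(probe(hi, hi) - probe(hi, lo)
                                    - probe(lo, hi) + probe(lo, lo)) / 4.0
    return A

def partition(A, tol=1e-8):
    """Group variables that interact, directly or transitively, above tol."""
    n_comp, labels = connected_components(A > tol, directed=False)
    return [np.where(labels == c)[0].tolist() for c in range(n_comp)]

# Toy "black box": x0 and x1 interact, x2 and x3 interact, x4 is purely additive.
f = lambda x: x[0] * x[1] + np.sin(x[2]) * x[3] + x[4]
print(partition(interaction_matrix(f, 5)))   # e.g. [[0, 1], [2, 3], [4]]
```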
15

Novel computationally intelligent machine learning algorithms for data mining and knowledge discovery

Gheyas, Iffat A. January 2009 (has links)
This thesis addresses three major issues in data mining: feature subset selection in high-dimensional domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting of univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm, SAGA. SAGA combines the ability of Simulated Annealing to avoid being trapped in local minima with the very high convergence rate of the Genetic Algorithm crossover operator, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble: a committee of GRNNs trained on different feature subsets generated by SAGA, whose predictions are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features that set it apart among ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNNs are used both for the base classifiers and for the top-level combiner. Because of the GRNN combiner, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches based on simple voting or static weighting. The basic idea of dynamic weighting is to give higher reliability weights to scenarios that are similar to the new one. The simulation results demonstrate the validity of the proposed ensemble model.
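Since the GRNN is the workhorse here, serving as both base learner and combiner, a minimal version may help fix ideas. This is a generic Specht-style GRNN sketch in Python, not the thesis's SAGA-tuned implementation; the class name, toy data, and fixed bandwidth are mine. Prediction is a Gaussian-kernel-weighted average of the training targets, so the only quantity to tune is the smoothing parameter sigma.

```python
import numpy as np

class GRNN:
    """Minimal generalized regression neural network: the prediction at a query
    point is a Gaussian-weighted average of the training targets."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, Xq):
        Xq = np.atleast_2d(np.asarray(Xq, dtype=float))
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * self.sigma ** 2))   # pattern-layer activations
        return (w @ self.y) / w.sum(axis=1)         # normalized weighted average

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
model = GRNN(sigma=0.3).fit(X, y)
print(model.predict([[0.0], [1.5]]))   # roughly [0.0, 1.0], up to noise
```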
16

Nonparametric Learning in High Dimensions

Liu, Han 01 December 2010 (has links)
This thesis develops flexible and principled nonparametric learning algorithms to explore, understand, and predict high-dimensional and complex datasets. Such data appear frequently in modern scientific domains and lead to numerous important applications. For example, exploring high-dimensional functional magnetic resonance imaging data helps us to better understand brain function; inferring large-scale gene regulatory networks is crucial for new drug design and development; detecting anomalies in high-dimensional transaction databases is vital for corporate and government security. Our main results include a rigorous theoretical framework and efficient nonparametric learning algorithms that exploit hidden structures to overcome the curse of dimensionality when analyzing massive high-dimensional datasets. These algorithms have strong theoretical guarantees and provide high-dimensional nonparametric recipes for many important learning tasks, ranging from unsupervised exploratory data analysis to supervised predictive modeling. In this thesis, we address three aspects: (1) understanding the statistical theory of high-dimensional nonparametric inference, including risk, estimation, and model-selection consistency; (2) designing new methods for different data-analysis tasks, including regression, classification, density estimation, graphical model learning, multi-task learning, and spatial-temporal adaptive learning; (3) demonstrating the usefulness of these methods in scientific applications, including functional genomics, cognitive neuroscience, and meteorology. In the last part of this thesis, we also present a vision for the future of high-dimensional and large-scale nonparametric inference.
17

Conjurer la malédiction de la dimension dans le calcul du noyau de viabilité à l'aide de parallélisation sur carte graphique et de la théorie de la fiabilité : application à des dynamiques environnementales / Dispel the dimensionality curse in viability kernel computation with the help of GPGPU and reliability theory : application to environmental dynamics

Brias, Antoine 15 December 2016 (has links)
Viability theory provides tools for controlling a dynamical system so as to keep it within a constraint domain. The central concept of the theory is the viability kernel: the set of initial states from which there is at least one controlled trajectory that remains in the constraint domain. However, the time and memory needed to compute the viability kernel grow exponentially with the number of dimensions of the problem. This is the curse of dimensionality, and it is even more acute for systems with uncertainty. In that case, the viability kernel becomes the set of states for which there exists a control strategy that keeps the system in the constraint domain with at least a given probability up to the time horizon. The objective of this thesis is to study and develop approaches to push back this curse of dimensionality. We propose two lines of research: parallelizing the computations and using tools from reliability theory. The results are illustrated by several applications. The first line explores parallel computing on graphics cards. The GPU version of the program is up to 20 times faster than the sequential version and handles problems up to dimension 7. Beyond these gains in computation time, our work shows that most of the resources are spent computing the system's transition probabilities. This observation links to the second line of research, which proposes an algorithm that approximates stochastic viability kernels using reliability methods to compute the transition probabilities. The memory required by this algorithm is a linear function of the number of grid states, whereas the memory required by the classical dynamic programming algorithm depends quadratically on the number of states. These approaches open the way to applying viability theory to higher-dimensional systems. We thus applied it to a model of phosphorus dynamics for managing lake eutrophication, previously calibrated on data from Lake Bourget. In addition, the links between reliability and viability are highlighted through an application of stochastic viability kernel computation, also known as reliability kernel computation, to the reliable design of a corroded beam.
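To make the viability kernel concept concrete, here is a small grid-based dynamic-programming sketch in Python for a deterministic one-dimensional toy system. It is not the thesis's GPU or reliability-based algorithm, and the dynamics, grid, and control set are invented for illustration; a grid point survives an iteration if at least one admissible control sends it to a surviving point, and the fixed point of this pruning approximates the viability kernel.

```python
import numpy as np

def viability_kernel(f, grid, controls, n_iter=500):
    """Grid approximation of a viability kernel: a grid point stays viable if
    some control maps it to a successor that lies inside the grid (the
    constraint set) and is itself still viable. Iterate to a fixed point."""
    lo, hi = grid[0], grid[-1]
    n = len(grid)
    succ = np.full((n, len(controls)), -1, dtype=int)   # -1: leaves the constraint set
    for i, x in enumerate(grid):
        for j, u in enumerate(controls):
            y = f(x, u)
            if lo <= y <= hi:
                succ[i, j] = np.argmin(np.abs(grid - y))
    viable = np.ones(n, dtype=bool)                     # start from the full constraint set
    for _ in range(n_iter):
        new = np.array([any(s >= 0 and viable[s] for s in succ[i]) for i in range(n)])
        if np.array_equal(new, viable):
            break
        viable = new
    return viable

# Toy controlled dynamics (illustrative only, not the thesis's lake model):
# x_{t+1} = x + dt * (r * x - u), constraint set [0, 1], control u in [0, 0.4].
# Choosing u = r * x holds any x <= 0.4 / r = 0.8 fixed, so the exact kernel is [0, 0.8].
dt, r = 0.1, 0.5
f = lambda x, u: x + dt * (r * x - u)
grid = np.linspace(0.0, 1.0, 201)
controls = np.linspace(0.0, 0.4, 5)
kernel = viability_kernel(f, grid, controls)
print(grid[kernel].min(), grid[kernel].max())   # about 0.0 and 0.85: the coarse grid overestimates
```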
18

SINBAD Automation of Scientific Process: From Hidden Factor Analysis to Theory Synthesis

Kursun, Olcay 01 January 2004 (has links)
Modern science is turning to progressively more complex and data-rich subjects, which challenges the existing methods of data analysis and interpretation. Consequently, there is a pressing need for development of ever more powerful methods of extracting order from complex data and for automation of all steps of the scientific process. Virtual Scientist is a set of computational procedures that automate the method of inductive inference to derive a theory from observational data dominated by nonlinear regularities. The procedures utilize SINBAD, a novel computational method of nonlinear factor analysis that is based on the principle of maximization of mutual information among non-overlapping sources (Imax), yielding higher-order features of the data that reveal hidden causal factors controlling the observed phenomena. One major advantage of this approach is that it is not dependent on a particular choice of learning algorithm to use for the computations. The procedures build a theory of the studied subject by finding inferentially useful hidden factors, learning interdependencies among its variables, reconstructing its functional organization, and describing it by a concise graph of inferential relations among its variables. The graph is a quantitative model of the studied subject, capable of performing elaborate deductive inferences and explaining behaviors of the observed variables by behaviors of other such variables and discovered hidden factors. The set of Virtual Scientist procedures is a powerful analytical and theory-building tool designed to be used in research of complex scientific problems characterized by multivariate and nonlinear relations.
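The Imax principle is easier to see on a toy example. The sketch below is not SINBAD itself and performs no factor learning; it only illustrates, with an invented hidden cause and a histogram-based estimator, the signal that SINBAD maximizes: variables from two non-overlapping groups of observations that share a hidden factor carry mutual information about each other, while unrelated variables do not.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi(x, y, bins=16):
    """Histogram-based mutual information estimate (in nats) between two 1-D samples."""
    cx = np.digitize(x, np.histogram_bin_edges(x, bins))
    cy = np.digitize(y, np.histogram_bin_edges(y, bins))
    return mutual_info_score(cx, cy)

rng = np.random.default_rng(1)
h = rng.standard_normal(5000)                             # hidden causal factor
group_a = np.tanh(h) + 0.3 * rng.standard_normal(5000)    # source A sees h through one nonlinearity
group_b = h ** 2 + 0.3 * rng.standard_normal(5000)        # source B sees h through another
noise = rng.standard_normal(5000)                         # unrelated variable

print(mi(group_a, group_b))   # clearly positive: both groups reflect the hidden factor
print(mi(group_a, noise))     # close to zero (up to estimation bias): nothing shared
```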
19

Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen / New indexing methods for similarity search in metric spaces over large data sets

Guhlemann, Steffen 08 April 2016 (has links)
A topic of growing importance in computer science is the handling of similarity across a large number of heterogeneous domains. Currently there is no universally usable infrastructure for similarity search in general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, one that could be integrated into classical database management systems. An analysis of the state of the art identifies the M-Tree as the most suitable base structure; it is then extended to the EM-Tree while retaining structural compatibility with the M-Tree. The query algorithms are optimized to minimize the number of necessary distance calculations. Building on a mathematical analysis of the relationship between tree structure and query cost, degrees of freedom in the tree-modification algorithms are exploited to construct trees that can answer similarity queries with a minimal number of distance calculations.
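The pruning principle behind M-Tree and EM-Tree query processing can be shown without building any tree. The Python sketch below, a flat LAESA-style pivot filter on synthetic data (names, data, and parameters are mine, and it is not the EM-Tree), uses the same triangle-inequality bound to discard candidates without computing their distance to the query, which is exactly the quantity the thesis seeks to minimize.

```python
import numpy as np

def pivot_filtered_range_query(data, dists_to_pivots, pivots, query, radius, dist):
    """Range query in a metric space using precomputed pivot distances.
    The triangle inequality gives |d(q, p) - d(x, p)| <= d(q, x), so a candidate
    x can be discarded without evaluating d(q, x) whenever that lower bound
    already exceeds the query radius."""
    q_to_p = np.array([dist(query, p) for p in pivots])
    results, computed = [], 0
    for i, x in enumerate(data):
        lower_bound = np.max(np.abs(q_to_p - dists_to_pivots[i]))
        if lower_bound > radius:
            continue                      # pruned: no distance computation needed
        computed += 1
        if dist(query, x) <= radius:
            results.append(i)
    return results, computed

euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
rng = np.random.default_rng(3)
data = rng.uniform(size=(2000, 8))
pivots = data[:5]                                                  # arbitrary pivot choice
d2p = np.array([[euclid(x, p) for p in pivots] for x in data])     # built once, offline
hits, computed = pivot_filtered_range_query(data, d2p, pivots,
                                            rng.uniform(size=8), radius=0.5, dist=euclid)
print(len(hits), "results,", computed, "of", len(data), "distances computed")
```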
20

Navigating the Metric Zoo: Towards a More Coherent Model For Quantitative Evaluation of Generative ML Models

Dozier, Robbie 26 August 2022 (has links)
No description available.
