About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Distribution-based Exploration and Visualization of Large-scale Vector and Multivariate Fields

Lu, Kewei 08 August 2017 (has links)
No description available.
12

Multi-tree Monte Carlo methods for fast, scalable machine learning

Holmes, Michael P. 09 January 2009 (has links)
As modern applications of machine learning and data mining are forced to deal with ever more massive quantities of data, practitioners quickly run into difficulty with the scalability of even the most basic and fundamental methods. We propose to provide scalability through a marriage between classical, empirical-style Monte Carlo approximation and deterministic multi-tree techniques. This union entails a critical compromise: losing determinism in order to gain speed. In the face of large-scale data, such a compromise is arguably often not only the right but the only choice. We refer to this new approximation methodology as Multi-Tree Monte Carlo. In particular, we have developed the following fast approximation methods:

1. Fast training for kernel conditional density estimation, showing speedups as high as 10⁵ on up to 1 million points.

2. Fast training for general kernel estimators (kernel density estimation, kernel regression, etc.), showing speedups as high as 10⁶ on tens of millions of points.

3. Fast singular value decomposition, showing speedups as high as 10⁵ on matrices containing billions of entries.

The level of acceleration we have shown represents improvement over the prior state of the art by several orders of magnitude. Such improvement entails a qualitative shift, a commoditization, that opens doors to new applications and methods that were previously invisible, outside the realm of practicality. Further, we show how these particular approximation methods can be unified in a Multi-Tree Monte Carlo meta-algorithm which lends itself as scaffolding to the further development of new fast approximation methods. Thus, our contribution includes not just the particular algorithms we have derived but also the Multi-Tree Monte Carlo methodological framework, which we hope will lead to many more fast algorithms that can provide the kind of scalability we have shown here to other important methods from machine learning and related fields.
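The abstract contains no code, but the plain Monte Carlo half of the idea can be sketched. Below is a minimal, assumption-laden illustration (the function name, batch size, and confidence-interval stopping rule are all mine, and the multi-tree machinery that gives the thesis its speedups is omitted entirely): a Gaussian kernel density sum is approximated by sampling reference points until the running mean looks accurate to a target relative error.

```python
import numpy as np
from scipy.stats import norm

def mc_gaussian_kde(query, refs, h, eps=0.05, delta=0.05, batch=256):
    """Monte Carlo estimate of a Gaussian KDE at `query`.

    Rather than summing kernel contributions from all n reference
    points (O(n) per query), draw random batches and stop once a
    normal-approximation confidence interval suggests the running
    mean is within relative error `eps` with probability ~1 - delta.
    """
    n, d = refs.shape
    z = norm.ppf(1 - delta / 2)                      # CI critical value
    samples = []
    while len(samples) < n:
        idx = np.random.randint(0, n, size=batch)    # sample with replacement
        d2 = np.sum((refs[idx] - query) ** 2, axis=1)
        samples.extend(np.exp(-d2 / (2 * h * h)))
        mean = np.mean(samples)
        half = z * np.std(samples) / np.sqrt(len(samples))
        if mean > 0 and half <= eps * mean:          # relative-error stop rule
            break
    return mean / (2 * np.pi * h * h) ** (d / 2)     # Gaussian normalization
```

Sampling with replacement makes the running mean an unbiased estimate of the kernel average; roughly speaking, the thesis's multi-tree layer stratifies this sampling so that the variance, and hence the number of samples needed, drops much faster.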
13

Bayesian Inference in Large Data Problems

Quiroz, Matias January 2015 (has links)
In the last decade or so, there has been a dramatic increase in storage capacity and in the ability to process huge amounts of data. This has made large, high-quality data sets widely accessible to practitioners, and the shift seriously challenges traditional modeling and inference methodology. This thesis is devoted to developing inference and modeling tools to handle large data sets. Four included papers treat various important aspects of this topic, with a special emphasis on Bayesian inference by scalable Markov chain Monte Carlo (MCMC) methods. In the first paper, we propose a novel mixture-of-experts model for longitudinal data. The model and inference methodology allow for manageable computations with a large number of subjects, and the model dramatically improves the out-of-sample predictive density forecasts compared to existing models. The second paper develops a scalable MCMC algorithm: ideas from the survey sampling literature are used to estimate the likelihood on a random subset of data, the estimate is used within the pseudo-marginal MCMC framework, and we develop a theoretical framework for such subset-based algorithms. The third paper further develops these ideas, introducing the difference estimator into the framework and modifying the methods for estimating the likelihood on a random subset of data, which yields scalable inference for a wider class of models. Finally, the fourth paper brings the survey sampling tools for estimating the likelihood developed in the thesis into the delayed-acceptance MCMC framework. We compare against an existing approach in the literature and document promising results for our algorithm. / At the time of the doctoral defense, the following papers were unpublished: Paper 1: submitted; Paper 2: submitted; Paper 3: manuscript; Paper 4: manuscript.
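As a hedged sketch of the difference-estimator idea attributed to the third paper (the function names and the cheap proxy are illustrative assumptions, not the thesis's code), the full-data log-likelihood over n observations can be estimated from a random subset of size m:

```python
import numpy as np

def diff_estimator_loglik(y, theta, loglik_term, approx_term, m, rng):
    """Survey-sampling 'difference estimator' of sum_i log p(y_i | theta).

    loglik_term(y_i, theta):  exact per-observation log-density (costly).
    approx_term(y_i, theta):  cheap proxy, e.g. a Taylor expansion around
        a reference parameter; in practice its sum is often available in
        closed form rather than via the O(n) loop used here.
    """
    n = len(y)
    q_total = sum(approx_term(yi, theta) for yi in y)   # cheap full pass
    idx = rng.choice(n, size=m, replace=True)           # simple random sample
    diffs = np.array([loglik_term(y[i], theta) - approx_term(y[i], theta)
                      for i in idx])
    return q_total + n * diffs.mean()   # unbiased for the full log-likelihood

# usage sketch: rng = np.random.default_rng(0); choosing m << n keeps each
# MCMC iteration cheap
```

Plugged into a pseudo-marginal Metropolis-Hastings acceptance ratio, such an estimator trades exactness for per-iteration cost; the better the proxy, the smaller the variance of the sampled correction term.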
14

Financial crisis forecasts and applications to systematic trading strategies

Kornprobst, Antoine 23 October 2017 (has links)
This thesis consists of three research papers organized around the construction of financial crisis indicators whose signals are then used to devise systematic trading strategies. The first paper establishes a framework for the construction of our financial crisis indicators. Their predictive power is then demonstrated by using one of them to build an active protective-put strategy, which outperforms a passive strategy as well as, most of the time, multiple paths of a random strategy. The second paper goes further in applying our crisis indicators to systematic trading strategies, using the aggregated signal produced by many of our indicators to govern the composition of a portfolio constituted of a mix of cash and shares of an ETF replicating an equity index such as the SP500. Finally, in the third paper, we build financial crisis indicators using a completely different approach. By studying the dynamics of the evolution of the distribution of the spreads of the components of a CDS index such as the iTraxx Europe 125, a Bollinger band is built around the empirical cumulative distribution function of the spreads, fitted on a basis of two lognormal distributions chosen beforehand. A crossing of either the upper or lower boundary of this Bollinger band by the empirical cumulative distribution function is then interpreted in terms of risk and yields a trading signal.
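To make the third paper's construction concrete, here is a heavily simplified sketch. Everything in it — the non-negative least-squares projection, the grid, and the window and width parameters — is an assumption for illustration, not the paper's exact procedure: each day the constituents' empirical spread CDF is projected onto two fixed lognormal CDFs, and a Bollinger band over the resulting time series of "stressed" weights yields the signal.

```python
import numpy as np
from scipy.stats import lognorm
from scipy.optimize import nnls

def mixture_coordinate(spreads, dist_lo, dist_hi, grid):
    """Project the empirical CDF of today's constituent spreads onto a
    fixed two-lognormal basis; return the weight on the 'stressed' one."""
    ecdf = np.searchsorted(np.sort(spreads), grid, side="right") / len(spreads)
    basis = np.column_stack([dist_lo.cdf(grid), dist_hi.cdf(grid)])
    w, _ = nnls(basis, ecdf)                  # non-negative least squares
    return w[1] / max(w.sum(), 1e-12)         # normalized stressed weight

def bollinger_signal(series, window=20, k=2.0):
    """+1 when the series crosses above its upper Bollinger band (rising
    stress), -1 below the lower band, 0 otherwise."""
    s = np.asarray(series)
    sig = np.zeros(len(s), dtype=int)
    for t in range(window, len(s)):
        mu, sd = s[t - window:t].mean(), s[t - window:t].std()
        if s[t] > mu + k * sd:
            sig[t] = 1
        elif s[t] < mu - k * sd:
            sig[t] = -1
    return sig

# e.g. dist_lo = lognorm(0.4, scale=80), dist_hi = lognorm(0.6, scale=300)
```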
15

Computational optimization and comparative study of techniques for extracting knowledge from large data repositories

Fernando Luiz Coelho Senra 16 September 2009 (has links)
When conducting a study in any field of knowledge, the more data one has available, the harder it becomes to extract useful knowledge from the database. The purpose of this work is to present some so-called intelligent tools for extracting knowledge from these large data repositories. Although the term has several connotations, in this work knowledge extraction from data repositories is understood as the combined occurrence of certain data with a frequency and confidence considered interesting; that is, whenever a given item or set of items appears in the repository with reasonable frequency, another item or set of items will also appear. Running on repositories of georeferenced data about students of UERJ (Universidade do Estado do Rio de Janeiro), we analyze the results of two data-extraction tools and present possibilities for their computational optimization.
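The "combined occurrence with frequency and confidence" notion in the abstract is the classic association-rule setting. As a toy illustration only — the thesis compares dedicated tools, and this naive miner is mine, restricted to single-item antecedents and consequents — a support/confidence rule search might look like:

```python
from itertools import combinations
from collections import Counter

def association_rules(transactions, min_support=0.3, min_conf=0.7):
    """Naive miner for rules 'if A appears, B appears', judged by
    support = P(A and B) and confidence = P(B | A). Apriori or
    FP-growth compute the same quantities at scale."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in set(t))
    pair_count = Counter(p for t in transactions
                         for p in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            conf = c / item_count[ante]      # P(cons | ante)
            if conf >= min_conf:
                rules.append((ante, cons, support, conf))
    return rules

# e.g. association_rules([["campus_A", "evening"], ["campus_A", "evening"],
#                         ["campus_B", "morning"]])
```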
17

Parallel 3D Image Segmentation by GPU-Amenable Level Set Solution

Hagan, Aaron M. 17 June 2009 (has links)
No description available.
18

Data Summarization for Large Time-varying Flow Visualization and Analysis

Chen, Chun-Ming 29 December 2016 (has links)
No description available.
19

A multiresolutional approach for large data visualization

Wang, Chaoli 30 November 2006 (has links)
No description available.
20

Using Statistical Methods to Determine Geolocation Via Twitter

Wright, Christopher M. 01 May 2014 (has links)
With the ever-expanding use of social media websites such as Twitter, it is possible to use statistical methods to infer a person's geographic location solely from the content of their tweets. In a 2010 study, Zhiyuan Cheng was able to place a Twitter user within 100 miles of their actual location 51% of the time. While this may seem a significant result, the study was conducted while Twitter was still finding its footing: in 2010 Twitter had 75 million registered users, whereas as of March 2013 it has around 500 million. In this thesis, I collect my own dataset and, using Excel macros, compare my results with Cheng's to see whether the results have changed in the three years since his study. If Cheng's 51% accuracy can be matched more efficiently with a simpler methodology, this could have a significant impact on Homeland Security and cybersecurity measures.
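The abstract does not spell out the statistical machinery, so the following is only a hedged sketch of the general content-based idea — scoring candidate cities by how well a tweet's words match per-city word distributions. The smoothing, tokenization, and function names are all assumptions rather than Cheng's or the author's actual method:

```python
import math
from collections import Counter

def train_city_models(city_tweets):
    """city_tweets: {city: [tweet strings]} -> per-city word
    log-probabilities with add-one (Laplace) smoothing."""
    counts = {city: Counter(w for t in tweets for w in t.lower().split())
              for city, tweets in city_tweets.items()}
    vocab = set().union(*(c.keys() for c in counts.values()))
    V = len(vocab)
    return {city: {w: math.log((c[w] + 1) / (sum(c.values()) + V))
                   for w in vocab}
            for city, c in counts.items()}

def guess_city(tweet, logp):
    """Pick the city maximizing the summed log-probability of the
    tweet's words; words outside the training vocabulary contribute
    equally to every city and so are effectively ignored."""
    words = tweet.lower().split()
    return max(logp, key=lambda city: sum(logp[city].get(w, 0.0)
                                          for w in words))
```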
