Global ETD Search

41	Efficient And Scalable Evaluation Of Continuous, Spatio-temporal Queries In Mobile Computing Environments Cazalas, Jonathan M 01 January 2012 (has links) A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach at addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit. We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view oriented approach of the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects, while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatiotemporal data streams and the management of moving objects can naturally come together. [IlMI10, ChFr03, MoXA04] For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time. 
For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly outweighs those costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications. Mobile computing continuous queries data streams spatio temporal queries spatio temporal data streams evaluation scalable range query knn gpu geds nvidia cuda Computer Sciences Engineering
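As a reading aid for entry 41, the naive CPU-style evaluation of continuous range-monitoring queries mentioned in the abstract might be sketched as follows. This is only an illustrative sketch and not the thesis's or GEDS's implementation: the names `evaluate_range_queries` and `apply_update`, the rectangle encoding, and the in-memory dictionaries are all assumptions made here.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # hypothetical encoding: (xmin, ymin, xmax, ymax)


def evaluate_range_queries(
    positions: Dict[str, Point], queries: Dict[str, Rect]
) -> Dict[str, Set[str]]:
    """Naively test every object against every query rectangle."""
    results: Dict[str, Set[str]] = {qid: set() for qid in queries}
    for qid, (xmin, ymin, xmax, ymax) in queries.items():
        for oid, (x, y) in positions.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:
                results[qid].add(oid)
    return results


def apply_update(positions: Dict[str, Point], oid: str, new_pos: Point) -> None:
    """Record one location update from the stream (assumed format: id plus x, y)."""
    positions[oid] = new_pos


positions = {"a": (1.0, 1.0), "b": (5.0, 5.0)}
queries = {"q1": (0.0, 0.0, 2.0, 2.0)}
before = evaluate_range_queries(positions, queries)
apply_update(positions, "b", (1.5, 0.5))
after = evaluate_range_queries(positions, queries)
```

Each (query, object) test is independent of the others, which is the kind of work that can be spread across many parallel threads; the sketch itself stays sequential.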
42	Some Advanced Semiparametric Single-index Modeling for Spatially-Temporally Correlated Data Mahmoud, Hamdy F. F. 09 October 2014 (has links) Semiparametric modeling is a hybrid of the parametric and nonparametric modelings where some function forms are known and others are unknown. In this dissertation, we have made several contributions to semiparametric modeling based on the single index model related to the following three topics: the first is to propose a model for detecting change points simultaneously with estimating the unknown function; the second is to develop two models for spatially correlated data; and the third is to further develop two models for spatially-temporally correlated data. To address the first topic, we propose a unified approach in its ability to simultaneously estimate the nonlinear relationship and change points. We propose a single index change point model as our unified approach by adjusting for several other covariates. We nonparametrically estimate the unknown function using kernel smoothing and also provide a permutation based testing procedure to detect multiple change points. We show the asymptotic properties of the permutation testing based procedure. The advantage of our approach is demonstrated using the mortality data of Seoul, Korea from January, 2000 to December, 2007. On the second topic, we propose two semiparametric single index models for spatially correlated data. One additively separates the nonparametric function and spatially correlated random effects, while the other does not separate the nonparametric function and spatially correlated random effects. We estimate these two models using two algorithms based on Markov Chain Expectation Maximization algorithm. Our approaches are compared using simulations, suggesting that the semiparametric single index nonadditive model provides more accurate estimates of spatial correlation. 
The advantage of our approach is demonstrated using the mortality data of six cities in Korea from January 2000 to December 2007. The third topic involves proposing two semiparametric single index models for spatially and temporally correlated data. In our first model, the nonparametric function is separable from the spatially and temporally correlated random effects; we refer to it as the "semiparametric spatio-temporal separable single index model (SSTS-SIM)". The second model does not separate the nonparametric function from the spatially correlated random effects but does separate the time random effects; we refer to it as the "semiparametric nonseparable single index model (SSTN-SIM)". Two algorithms based on the Markov Chain Expectation Maximization algorithm are introduced to simultaneously estimate parameters, spatial effects, and time effects. The proposed models are then applied to the mortality data of six major cities in Korea. Our results suggest that SSTN-SIM is more flexible than SSTS-SIM because it can estimate various nonparametric functions, while SSTS-SIM enforces similar nonparametric curves. SSTN-SIM also provides better estimation and prediction. / Ph. D. Change Point Generalized Linear Model Generalized Additive Model Markov Chain Expectation Maximization Mixed model Permutation Test Semiparametric regression Single Index model Spatially correlated data Spatio-temporal data.
43	應用在空間認知發展的學習歷程分析之高效率空間探勘演算法 / Efficient Mining of Spatial Co-orientation Patterns for Analyzing Portfolios of Spatial Cognitive Development 魏綾音, WEI, LING-YIN Unknown Date (has links) 空間認知(Spatial Cognition)指出人所理解的空間複雜度，也就是人與環境互動的過程中，經由記憶與感官經驗，透過內化與重建產生物體在空間的關係認知。認知圖(Cognitive Map)是最常被使用在評估空間認知。分析學生所畫的認知圖有助於老師們瞭解學生的空間認知能力，進而擬定適當的地理教學設計。我們視空間認知發展的學習歷程檔案是由這些認知圖所構成。隨著數位學習科技的進步，我們可以透過探勘認知圖的方式，探討空間認知發展的學習歷程檔案。因此，我們藉由透過圖像的空間資料探勘，分析學生空間認知發展的學習歷程。空間資料探勘(Spatial Data Mining)主要是從空間資料庫或圖像資料庫中找出有趣且有意義的樣式。在論文中，我們介紹一種空間樣式(Spatial Co-orientation Pattern)探勘以提供空間認知發展學習歷程的分析。Spatial Co-orientation Pattern是指圖像資料庫中，具有共同相對方向關係的物體(Object)常一起出現。例如，我們可以從圖像資料庫中發現物體P常出現在物體Q的左邊，我們利用二維字串(2D String)來表示物體分佈在圖像中的空間方向關係。我們透過Pattern-growth的方法探勘此種空間樣式，藉由實驗結果呈現Pattern-growth的方法與過去Apriori-based的方法[14]之優缺點。我們延伸Spatial Co-orientation Pattern的概念至時空資料庫(Spatio-temporal Database)，提出從時空資料庫中，探勘Temporal Co-orientation Pattern。Temporal Co-orientation Pattern是指Spatial Co-orientation Pattern隨著時間的變化。論文中，我們提出兩種此類樣式，即是Coarse Temporal Co-orientation Pattern與Fine Temporal Co-orientation Pattern。針對此兩種樣式，我們提出三階段(three-stage)演算法，透過實驗分析演算法的效率。 / Spatial cognition means how human interpret spatial complexity. Cognitive maps are mostly used to test the spatial cognition. Analyzing cognitive maps drawn by students is helpful for teachers to understand students’ spatial cognitive ability and to draft geography teaching plans. Cognitive maps constitute the portfolios of spatial cognitive development. With the advance of e-learning technology, we can analyze portfolios of spatial cognitive development by spatial data mining of cognitive images. Therefore, we can analyze portfolios of spatial cognitive development by spatial data mining of images. Spatial data mining is an important task to discover interesting and meaningful patterns from spatial or image databases. In this thesis, we investigate the spatial co-orientation patterns for analyzing portfolios of spatial cognitive development. 
Spatial co-orientation patterns refer to objects that frequently occur with the same spatial orientation (e.g., left, right, below) among images. For example, an object P is frequently to the left of an object Q among images. We use the 2D string data structure to represent the spatial orientation of objects. We propose a pattern-growth approach for mining co-orientation patterns. An experimental evaluation with synthetic datasets shows the advantages and disadvantages of the pattern-growth approach relative to the Apriori-based approach proposed by Huang [14]. Moreover, we extend the concept of spatial co-orientation patterns to temporal patterns. Temporal co-orientation patterns refer to the change of spatial co-orientation patterns over time. Two temporal patterns, coarse temporal co-orientation patterns and fine temporal co-orientation patterns, are introduced and extracted from spatio-temporal databases. We propose two three-stage algorithms, CTPMiner and FTPMiner, for mining coarse and fine temporal co-orientation patterns, respectively. An experimental evaluation with synthetic datasets shows the performance of these algorithms. 空間認知 認知圖 空間資料探勘 時空資料探勘 Spatial Cognition Cognitive Map Spatial Data Mining Spatio-temporal Data Mining
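To make the "P is frequently to the left of Q" idea in entry 43 concrete, here is a minimal sketch. It uses a simplified stand-in for the 2D string (just the object orderings along the x and y axes, without the full operator set) and plain counting rather than the pattern-growth miner of the thesis; `to_2d_string`, `left_of_support`, and the centroid-based image encoding are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

Image = Dict[str, Tuple[float, float]]  # hypothetical: object name -> (x, y) centroid


def to_2d_string(image: Image) -> Tuple[List[str], List[str]]:
    """Return object names ordered along the x-axis and along the y-axis."""
    u = [name for name, _ in sorted(image.items(), key=lambda kv: (kv[1][0], kv[0]))]
    v = [name for name, _ in sorted(image.items(), key=lambda kv: (kv[1][1], kv[0]))]
    return u, v


def left_of_support(images: List[Image], p: str, q: str) -> float:
    """Fraction of images in which p lies strictly to the left of q."""
    if not images:
        return 0.0
    count = 0
    for image in images:
        if p in image and q in image:
            u, _ = to_2d_string(image)
            # The strict comparison on x excludes ties, which the name-based
            # tie-break in the ordering would otherwise count.
            if u.index(p) < u.index(q) and image[p][0] < image[q][0]:
                count += 1
    return count / len(images)


images = [
    {"P": (1.0, 2.0), "Q": (3.0, 1.0)},
    {"P": (4.0, 0.0), "Q": (2.0, 0.0)},
    {"P": (0.0, 0.0), "Q": (5.0, 5.0), "R": (2.0, 2.0)},
]
support = left_of_support(images, "P", "Q")
first_string = to_2d_string(images[0])
```

A pattern would then be reported as frequent when its support reaches a user-chosen threshold, which is left out of this sketch.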
44	Les figures de la discontinuité dans le développement résidentiel périurbain : application à la région Limousin. / Discontinuous urban patterns of peri-urban residential development. : application to the Limousin region Reux, Sara 16 January 2015 (has links) Alors que la continuité du bâti ne suffit plus pour appréhender l’espace urbain d’aujourd’hui, la discontinuité du tissu urbain est devenue une clé de compréhension de la ville contemporaine et de son processus de formation. Elle suscite l'intérêt des chercheurs, d'autant plus que le déploiement des systèmes d'information géographique offre de nouvelles perspectives de mesure des formes urbaines. Mais, si les travaux en écologie du paysage ou en géographie permettent de mesurer l'émergence de ces formes discontinues, il nous semble important de nous intéresser aux fondements économiques de l'urbanisation discontinue qui commencent à faire l’objet de travaux empiriques en économie. La constitution d’une grille de lecture de l’urbanisation discontinue nous permet de comprendre de manière concomitante la formation des espaces périurbains et les formes de développement de l’habitat à l’échelle parcellaire. Cette recherche est appliquée au Limousin sur la période 1950-2009. Le prisme de la discontinuité nous apporte un éclairage sur les trajectoires de développement résidentiel des communes de cette région. La construction d’une base de données spatio-temporelles nous offre la possibilité de lire ces trajectoires à partir de l’association de mesures de dispersion géographique et de dispersion morphologique de l’habitat. À partir de ces mesures de dispersion, nous abordons l’articulation des logiques fonctionnelles et morphologiques du développement résidentiel grâce à la construction d’une base de données multithématiques. 
Pour comprendre les schémas de localisation des ménages, nous analysons plus particulièrement les problématiques de la production des logements, de l’interaction entre structure foncière et régulation publique à l’échelle des communes et de l’influence des aménités et désaménités des espaces urbains et ruraux sur la dispersion de l’habitat. / While understanding urban areas through the continuity of developed land has reached its limits, the discontinuity of urban fabrics has become a key to understanding today's cities and their shaping dynamics. It raises researchers’ interest, especially as GIS development gives new opportunities to measure urban patterns. While research in landscape ecology or geography makes it possible to measure discontinuous patterns, it seems important to focus on their economic foundations, which are the subject of recent empirical research in economics. The construction of an analytical grid of discontinuous urban patterns allows us to understand simultaneously peri-urban development and patterns of residential development at the parcel level. This research is applied to the Limousin region over the 1950-2009 period. The focus on discontinuous urban patterns sheds light on the residential trajectories of the Limousin region's communes. The construction of a spatio-temporal database allows us to understand these trajectories through combined measures of geographical dispersion and morphological dispersion. With these measures, we address the link between functional and morphological dynamics thanks to a multi-theme database. 
To understand household location and residential dispersion, we analyze the issue of housing production, the interaction between land ownership structure and public regulation at the scale of communes, and the influence of amenities and disamenities of urban and rural spaces. Urbanisation discontinue Étalement urbain Demande résidentielle Régulation publique Offre résidentielle Trajectoires résidentielles Données spatio-temporelles Limousin Discontinuous urban patterns Sprawl Residential demand Public policies Residential supply Residential trajectories Spatio-temporal data Limousin 333.1
45	Indexování dat pohybujících se objektů / Moving Objects Indexing Křížová, Martina January 2010 (has links) This thesis deals with the indexing of spatio-temporal data. It describes existing approaches to indexing such data and the support for indexing in Oracle Database 11g. The aim of this work is to design database structures for storing spatio-temporal data in Oracle Database 11g and to propose experiments for these databases. The ways of storing spatio-temporal data are evaluated through these experiments in terms of query time and the suitability of the available indexing structures and spatial operators.
46	Leveraging formal concept analysis and pattern mining for moving object trajectory analysis / Exploitation de l'analyse formelle de concepts et de l'extraction de motifs pour l'analyse de trajectoires d'objets mobiles Almuhisen, Feda 10 December 2018 (has links) Cette thèse présente un cadre de travail d'analyse de trajectoires contenant une phase de prétraitement et un processus d’extraction de trajectoires d’objets mobiles. Le cadre offre des fonctions visuelles reflétant le comportement d'évolution des motifs de trajectoires. L'originalité de l’approche est d’allier extraction de motifs fréquents, extraction de motifs émergents et analyse formelle de concepts pour analyser les trajectoires. A partir des données de trajectoires, les méthodes proposées détectent et caractérisent les comportements d'évolution des motifs. Trois contributions sont proposées : Une méthode d'analyse des trajectoires, basée sur les concepts formels fréquents, est utilisée pour détecter les différents comportements d’évolution de trajectoires dans le temps. Ces comportements sont “latents”, "emerging", "decreasing", "lost" et "jumping". Ils caractérisent la dynamique de la mobilité par rapport à l'espace urbain et le temps. Les comportements détectés sont visualisés sur des cartes générées automatiquement à différents niveaux spatio-temporels pour affiner l'analyse de la mobilité dans une zone donnée de la ville. Une deuxième méthode basée sur l'extraction de concepts formels séquentiels fréquents a également été proposée pour exploiter la direction des mouvements dans la détection de l'évolution. Enfin, une méthode de prédiction basée sur les chaînes de Markov est présentée pour prévoir le comportement d’évolution dans la future période pour une région. Ces trois méthodes sont évaluées sur ensembles de données réelles . 
Les résultats expérimentaux obtenus sur ces données valident la pertinence de la proposition et l'utilité des cartes produites. / This dissertation presents a trajectory analysis framework, which includes both a preprocessing phase and a trajectory mining process. Furthermore, the framework offers visual functions that reflect the evolution behavior of trajectory patterns. The originality of the mining process is that it combines frequent pattern mining, emerging pattern mining, and formal concept analysis to analyze moving object trajectories. These methods detect and characterize pattern evolution behaviors bound to time in trajectory data. Three contributions are proposed: (1) a method for analyzing trajectories based on frequent formal concepts, used to detect the different evolution behaviors of trajectory patterns over time. These behaviors are "latent", "emerging", "decreasing", "lost" and "jumping". They characterize the dynamics of mobility with respect to urban space and time. The detected behaviors are visualized on automatically generated maps at different spatio-temporal levels to refine the analysis of mobility in a given area of the city; (2) a second trajectory analysis method, based on sequential concept lattice extraction, which exploits the movement direction in the evolution detection process; and (3) a prediction method based on Markov chains, which predicts the evolution behavior of a region in the next period. These three methods are evaluated on two real-world datasets. The experimental results obtained on these data show the relevance of the proposal and the utility of the generated maps. Trajectoires Données spatio-Temporelles Motifs fréquents Motifs séquentiels Motifs émergents Concepts formels Treillis de concepts Comportement Visualisation Prédiction Modèle de Markov Trajectories Spatio-Temporal data Frequent itemsets Sequential patterns Emerging patterns Formal concepts Concept lattice Behavior Visualization Prediction Markov model.
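The Markov-chain prediction step described in entry 46 can be illustrated with a first-order chain over the five behavior labels. This is a minimal sketch under assumptions made here (one label sequence per region, maximum-likelihood transition estimates, hypothetical function names `fit_transitions` and `predict_next`); the abstract does not specify the dissertation's actual model.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def fit_transitions(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities from label sequences."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: c / total for nxt, c in row.items()}
    return probs


def predict_next(probs: Dict[str, Dict[str, float]], current: str) -> Optional[str]:
    """Return the most probable next behavior, or None if the state was never seen."""
    row = probs.get(current)
    if not row:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(row), key=row.get)


# Hypothetical behavior histories of two regions over consecutive periods.
sequences = [
    ["latent", "emerging", "emerging", "decreasing"],
    ["latent", "emerging", "decreasing", "lost"],
]
probs = fit_transitions(sequences)
prediction = predict_next(probs, "emerging")
```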
47	Získávání znalostí z objektově relačních databází / Knowledge Discovery in Object Relational Databases Chytka, Karel Unknown Date (has links) The goal of this master's thesis is to introduce the problem of knowledge discovery and the classification of object-relational data. It summarizes the problems connected with mining spatio-temporal data and describes the SVM kernel-based data mining algorithm. The second part covers the implementation of a classification method that mines data in the Caretaker trajectory database. The thesis also covers the implementation of an application for preprocessing spatio-temporal data, organizing them in a database, and presenting them.
48	Databáze pohybujících se objektů / The Database of Moving Objects Vališ, Jaroslav January 2008 (has links) This work treats the representation of moving objects and operations over these objects. It introduces the support for spatio-temporal data in Oracle Database 10g and presents two designs of a moving objects database structure. Based on these designs, a database was implemented using user-defined data types. A sample application provides graphical output of the stored spatial data and allows the implemented spatio-temporal operations to be called. Finally, the achieved results are evaluated and possible extensions of the project are discussed.
49	Co-evolution pattern mining in dynamic attributed graphs / Fouille de motifs de co-evolution dans des graphes dynamiques attribués Desmier, Elise 15 July 2014 (has links) Cette thèse s'est déroulée dans le cadre du projet ANR FOSTER, "FOuille de données Spatio-Temporelles : application à la compréhension et à la surveillance de l'ERosion" (ANR-2010-COSI-012-02, 2011-2014). Dans ce contexte, nous nous sommes intéressés à la modélisation de données spatio-temporelles dans des graphes enrichis de sorte que des calculs de motifs sur de telles données permettent de formuler des hypothèses intéressantes sur les phénomènes à comprendre. Plus précisément, nous travaillons sur la fouille de motifs dans des graphes relationnels (chaque noeud est identifié de façon unique), attribués (chaque noeud du graphe est décrit par des attributs qui sont ici numériques), et dynamiques (les valeurs des attributs et les relations entre les noeuds peuvent évoluer dans le temps). Nous proposons un nouveau domaine de motifs nommé motifs de co-évolution. Ce sont des triplets d'ensembles de noeuds, d'ensembles de pas de temps et d'ensembles d'attributs signés, c'est à dire des attributs associés à une tendance (croissance, décroissance). L'intérêt de ces motifs est de décrire un sous-ensemble des données qui possède un comportement spécifique et a priori intéressant pour conduire des analyses non triviales. Dans ce but, nous définissons deux types de contraintes, une contrainte sur la structure du graphe et une contrainte sur la co-évolution de la valeur des attributs portés par les noeuds. Pour confirmer la spécificité du motif par rapport au reste des données, nous définissons trois mesures de densité qui tendent à répondre à trois questions. À quel point le comportement des noeuds en dehors du motif est similaire à celui des noeuds du motif ? Quel est le comportement du motif dans le temps, est-ce qu'il apparaît soudainement ? 
Est-ce que les noeuds du motif ont un comportement similaire seulement sur les attributs du motif ou aussi en dehors ? Nous proposons l'utilisation d'une hiérarchie sur les attributs comme connaissance a priori de l'utilisateur afin d'obtenir des motifs plus généraux et adaptons l'ensemble des contraintes à l'utilisation de cette hiérarchie. Finalement, pour simplifier l'utilisation de l'algorithme par l'utilisateur en réduisant le nombre de seuils à fixer et pour extraire uniquement l'ensemble des motifs les plus intéressants, nous utilisons le concept de "skyline" réintroduit récemment dans le domaine de la fouille de données. Nous proposons ainsi trois algorithmes MINTAG, H-MINTAG et Sky-H-MINTAG qui sont complets pour extraire l'ensemble de tous les motifs qui respectent les différentes contraintes. L'étude des propriétés des contraintes (anti-monotonie, monotonie/anti-monotonie par parties) nous permet de les pousser efficacement dans les algorithmes proposés et d'obtenir ainsi des extractions sur des données réelles dans des temps raisonnables. / This thesis was conducted within the ANR FOSTER project, "Spatio-Temporal Data Mining: application to the understanding and monitoring of erosion" (ANR-2010-COSI-012-02, 2011-2014). In this context, we are interested in the modeling of spatio-temporal data in enriched graphs, so that computing patterns on such data can be used to formulate interesting hypotheses about the phenomena to be understood. Specifically, we work on pattern mining in graphs that are relational (each vertex is uniquely identified), attributed (each vertex of the graph is described by numerical attributes) and dynamic (attribute values and relations between vertices may change over time). We propose a new pattern domain called co-evolution patterns. These are triples of sets of vertices, sets of time steps and sets of signed attributes, i.e., attributes associated with a trend (increasing or decreasing). 
The interest of these patterns is that they describe a subset of the data that has a specific behaviour and is a priori interesting for conducting non-trivial analyses. For this purpose, we define two types of constraints: a constraint on the structure of the graph and a constraint on the co-evolution of the attribute values carried by the vertices. To confirm the specificity of the pattern with regard to the rest of the data, we define three measures of density that aim to answer three questions. How similar is the behaviour of the vertices outside the co-evolution pattern to that of the vertices inside it? What is the behaviour of the pattern over time: does it appear suddenly? Do the vertices of the pattern behave similarly only on the attributes of the pattern, or also on attributes outside it? We propose the use of a hierarchy on the attributes as a priori knowledge from the user to obtain more general patterns, and we adapt the set of constraints to the use of this hierarchy. Finally, to simplify the use of the algorithm by reducing the number of thresholds to be set, and to extract only the most interesting patterns, we use the concept of "skyline", recently reintroduced in the domain of data mining. We propose three constraint-based algorithms, called MINTAG, H-MINTAG and Sky-H-MINTAG, that are complete, i.e., they extract the set of all patterns that meet the different constraints. These algorithms use the anti-monotonicity and piecewise monotonicity/anti-monotonicity properties of the constraints to prune the search space and make the computation feasible in practical contexts. To validate our method, we experiment on several datasets (graphs) created from real-world data. 
Informatique Fouille de données Fouille sous contrainte Données spatio-Temporelles Graphes dynamiques attribués Motifs de co-Evolution Mesures d'intérêt Analyse skyline Information Technology Data mining Constraint-Based mining Spatio-Temporal data Dynamic attributed graphs Co-Evolution patterns Interestingness measures Skyline analysis 006.310 72
50	Non-Parametric Clustering of Multivariate Count Data Tekumalla, Lavanya Sita January 2017 (has links) (PDF) The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed data or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to get meaningful clusters. As the first contribution, this thesis explores extensions to the Multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While the Poisson is the most popular distribution for count modelling, the Multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse-Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the Multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson, for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications. As a second contribution, this thesis addresses moving beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. 
We explore, for the first time, marginal-independent inference techniques based on the Gaussian Copula for multivariate count data in the Dirichlet Process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to belong to a particular family, such as the Poisson or the negative binomial. This inference technique can also work for mixed data (combinations of counts, binary and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wide range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with Vine Copula based Dirichlet process mixtures. While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to ties that arise with discrete marginals. An efficient marginal-independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines to multivariate counts and mixed data in practical clustering scenarios. This thesis also explores the novel systems application of Bulk Cache Preloading by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data. State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at millisecond granularity or smaller and cannot leverage long-range correlations in traces. 
We explore, for the first time, Bulk Cache Preloading, the process of proactively predicting data to load into cache minutes or hours before the actual request from the application, by leveraging longer-range correlations at the granularity of minutes or hours. The relaxed timing constraints enable the development of machine learning techniques tailored for caching. Our approach involves a data aggregation process, converting I/O traces into a temporal sequence of multivariate counts, which we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of our thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way for more interdisciplinary research on using data mining techniques in the systems domain. As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document needs to be associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multilevel admixtures. Consider the example of non-parametric entity-topic modelling, i.e., simultaneously learning entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities could themselves be modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora. 
Multivariate Count Data Clustering Mixture Models Non-parametric Clustering Bulk Cache Preloading Dirichlet Process Mixture Models Spatio-Temporal Data Aggregation Sparse Multivariate Poisson MultiVariate Poisson (MVP) Copulas Nested Hierarchical Dirichlet Processes Dirichlet Process Mixtures Sparse-Multivariate Poisson Dirichlet Process Mixture Model Computer Science
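The data aggregation step mentioned in entry 50 (turning an I/O trace into a temporal sequence of multivariate counts) might look roughly like the sketch below. The trace format, the mapping of blocks to fixed-size regions, and the name `aggregate_trace` are assumptions made for illustration; the abstract does not describe the thesis's actual aggregation scheme.

```python
from typing import List, Tuple


def aggregate_trace(
    trace: List[Tuple[float, int]],
    window_seconds: float,
    region_size: int,
    num_regions: int,
) -> List[List[int]]:
    """Convert (timestamp, block_id) accesses into per-window count vectors.

    Each row is one time window; each column counts accesses to one block
    region (hypothetical choice: region = block_id // region_size).
    """
    if not trace:
        return []
    start = min(t for t, _ in trace)
    end = max(t for t, _ in trace)
    num_windows = int((end - start) // window_seconds) + 1
    counts = [[0] * num_regions for _ in range(num_windows)]
    for t, block in trace:
        w = int((t - start) // window_seconds)
        # Blocks beyond the last region are clamped into it.
        r = min(block // region_size, num_regions - 1)
        counts[w][r] += 1
    return counts


# Hypothetical trace: timestamps in seconds, integer block identifiers.
trace = [(0, 5), (10, 150), (59, 7), (60, 220), (125, 90)]
counts = aggregate_trace(trace, window_seconds=60, region_size=100, num_regions=3)
```

Each row of `counts` is one multivariate count observation of the kind a temporal clustering model would take as input.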

Search results