Global ETD Search

1	Multivariate Charts for Multivariate Poisson-Distributed Data January 2010 abstract: There has been much research on the simultaneous monitoring of several correlated quality characteristics that relies on the assumptions of multivariate normality and independence. In real-world applications, these assumptions are not always met, particularly when small counts are of interest. In general, the use of the normal approximation to the Poisson distribution seems to be justified when the Poisson means are large enough. A new two-sided Multivariate Poisson Exponentially Weighted Moving Average (MPEWMA) control chart is proposed, and the control limits are directly derived from the multivariate Poisson distribution. The MPEWMA and the conventional Multivariate Exponentially Weighted Moving Average (MEWMA) charts are evaluated by using the multivariate Poisson framework. The MPEWMA chart outperforms the MEWMA with the normal-theory limits in terms of the in-control average run lengths. An extension study of the two-sided MPEWMA to a one-sided version is performed; this is useful for detecting an increase in the count means. The results of comparison with the one-sided MEWMA chart are quite similar to the two-sided case. The implementation of the MPEWMA scheme for multiple count data is illustrated, with step-by-step guidelines and several examples. In addition, the method is compared to other model-based control charts that are used to monitor the residual values, such as the regression adjustment. The MPEWMA scheme shows better performance in detecting the mean shift in count data when positive correlation exists among all variables. / Dissertation/Thesis / Ph.D. Industrial Engineering 2010 Industrial Engineering MEWMA chart Multivariate Poisson Distribution
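As a rough illustration of the setting this abstract describes, the sketch below simulates correlated counts and computes a conventional MEWMA statistic for them. It is a minimal sketch under stated assumptions, not the thesis's MPEWMA scheme: the common-shock construction of the multivariate Poisson, the smoothing weight `lam`, the parameter values, and the function names are all illustrative choices, and no control limit is derived here.

```python
import numpy as np

def simulate_mv_poisson(n, theta, theta0, rng):
    """Common-shock construction (assumed): X_i = Y_i + Y_0, independent Poisson parts."""
    own = rng.poisson(theta, size=(n, len(theta)))
    shared = rng.poisson(theta0, size=(n, 1))
    return own + shared

def mewma_statistics(x, mean, cov, lam=0.1):
    """Return the MEWMA statistic T^2_t for each row of x."""
    z = np.asarray(mean, dtype=float).copy()  # start the EWMA at the in-control mean
    t2 = np.empty(len(x))
    for t, obs in enumerate(x, start=1):
        z = lam * obs + (1.0 - lam) * z
        # Exact covariance factor of Z_t at time t for independent observations.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        dev = z - mean
        t2[t - 1] = dev @ np.linalg.solve(scale * cov, dev)
    return t2

rng = np.random.default_rng(0)
theta = np.array([2.0, 3.0, 1.5])   # hypothetical component-specific means
theta0 = 1.0                        # hypothetical common-shock mean
mean = theta + theta0
# Var(X_i) = theta_i + theta0 and Cov(X_i, X_j) = theta0 under this construction.
cov = np.diag(theta) + theta0
x = simulate_mv_poisson(200, theta, theta0, rng)
t2 = mewma_statistics(x, mean, cov)
```

A chart would signal when `t2` exceeds a control limit; under the common-shock assumption all pairwise correlations are positive, which matches the positively correlated case the abstract highlights.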
2	Multivariate Poisson hidden Markov models for analysis of spatial counts Karunanayake, Chandima Piyadharshani 08 June 2007 Multivariate count data are found in a variety of fields. For modeling such data, one may consider the multivariate Poisson distribution. Overdispersion is a problem when modeling the data with the multivariate Poisson distribution. Therefore, in this thesis we propose a new multivariate Poisson hidden Markov model based on the extension of independent multivariate Poisson finite mixture models, as a solution to this problem. This model, which can take into account the spatial nature of weed counts, is applied to weed species counts in an agricultural field. The distribution of counts depends on the underlying sequence of states, which are unobserved or hidden. These hidden states represent the regions where weed counts are relatively homogeneous. Analysis of these data involves the estimation of the number of hidden states, Poisson means and covariances. Parameter estimation is done using a modified EM algorithm for maximum likelihood estimation. We extend the univariate Markov-dependent Poisson finite mixture model to the multivariate Poisson case (bivariate and trivariate) to model counts of two or three species. Also, we contribute to the hidden Markov model research area by developing Splus/R code for the analysis of the multivariate Poisson hidden Markov model. Splus/R code is written for the estimation of the multivariate Poisson hidden Markov model using the EM algorithm and the forward-backward procedure, and for the bootstrap estimation of standard errors. The estimated parameters are used to calculate the goodness-of-fit measures of the models. Results suggest that the multivariate Poisson hidden Markov model, with five states and an independent covariance structure, gives a reasonable fit to this dataset.
Since this model deals with overdispersion and spatial information, it will help to get an insight about weed distribution for herbicide applications. This model may lead researchers to find other factors such as soil moisture, fertilizer level, etc., to determine the states, which govern the distribution of the weed counts. EM algorithm multivariate Poisson hidden Markov model Weed species counts Multivariate Poisson distribution
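The forward pass mentioned in this abstract can be sketched as follows. This is only a hedged illustration in Python (the thesis reports Splus/R code, which is not reproduced here): it assumes independent Poisson components within each hidden state, matching the "independent covariance structure" case, and the function and argument names are hypothetical.

```python
import math
import numpy as np

def log_poisson_pmf(counts, means):
    """Log-probability of a count vector under independent Poisson components."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1)
               for k, m in zip(counts, means))

def hmm_log_likelihood(obs, init, trans, means):
    """Scaled forward recursion for an HMM with independent Poisson emissions.

    obs:   (T, d) integer counts
    init:  (K,) initial state probabilities
    trans: (K, K) transition matrix whose rows sum to 1
    means: (K, d) Poisson means for each hidden state
    """
    n_states = len(init)
    log_lik = 0.0
    alpha = np.asarray(init, dtype=float)
    for t, y in enumerate(obs):
        if t > 0:
            alpha = alpha @ trans              # propagate one step
        emis = np.array([math.exp(log_poisson_pmf(y, means[k]))
                         for k in range(n_states)])
        alpha = alpha * emis                   # weight by emission probability
        norm = alpha.sum()
        log_lik += math.log(norm)              # accumulate the scaling factors
        alpha = alpha / norm                   # rescale to avoid underflow over time
    return log_lik

obs = np.array([[1, 0], [3, 2], [0, 1]])
init = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([[1.0, 0.5], [4.0, 2.0]])
ll = hmm_log_likelihood(obs, init, trans, means)
```

An EM fit would repeat this pass (plus a backward pass) inside each iteration; the sketch covers only the likelihood evaluation.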
4	Developing accident-speed relationships using a new modelling approach Imprialou, Maria-Ioanna January 2015 Changing a speed limit leads to proportional changes in average speeds, which may affect the number of traffic accident occurrences. It is, however, critical and challenging to evaluate the impact of a speed limit alteration on the number and severity of accidents, due primarily to the unavailability of adequate data and the inherent limitations of existing approaches. Although speed is regarded as one of the main contributory factors in traffic accident occurrences, research findings are inconsistent. Independent of the robustness of their statistical approaches, accident frequency models typically use accident grouping concepts based on spatial criteria (e.g. accident counts by link, termed a link-based approach). In the link-based approach, the variability of accidents is explained by highly aggregated average measures of explanatory variables that may be inappropriate, especially for time-varying variables such as speed and volume. This thesis re-examines accident-speed relationships by developing a new accident data aggregation method that enables improved representation of the road conditions just before accident occurrences in order to evaluate the impact of a potential speed limit increase on the UK motorways (e.g. from 70 mph to 80 mph). In this work, accidents are aggregated according to the similarity of their pre-accident traffic and geometric conditions, forming an alternative accident count dataset, termed the condition-based approach. Accident-speed relationships are separately developed and compared for both approaches (i.e. link-based and condition-based) by employing the reported annual accidents that occurred on the Strategic Road Network of England in 2012 along with traffic and geometric variables. Accident locations were refined using a fuzzy-logic-based algorithm designed for the study area with 98.9% estimated accuracy.
The datasets were modelled by injury severity (i.e. fatal and serious or slight) and by number of vehicles involved (i.e. single-vehicle and multiple-vehicle) using the multivariate Poisson lognormal regression, with spatial effects for the link-based model under a full Bayesian inference method. The results of the condition-based models imply that single-vehicle accidents of all severities and multiple-vehicle accidents with fatal or serious injuries increase at higher speed conditions, particularly when these are combined with lower volumes. Multiple-vehicle slight injury accidents were not found to be related to higher speeds, but instead to congested traffic. The outcomes of the link-based model were almost the opposite, suggesting that the speed-accident relationship is negative. The differences between the results reveal that data aggregation may be crucial, yet so far overlooked, in the methodological aspect of accident data analyses. By employing the speed elasticity of motorway accidents that was derived from the calibrated condition-based models, it has been found that a 10 mph increase in the UK motorway speed limit (i.e. from 70 mph to 80 mph) would result in a 6-12% increase in fatal and serious injury accidents and a 1-3% increase in slight injury accidents. 388.3
5	Probabilistic Modeling of Multi-relational and Multivariate Discrete Data Wu, Hao 07 February 2017 Modeling and discovering knowledge from multi-relational and multivariate discrete data is a crucial task that arises in many research and application domains, e.g. text mining, intelligence analysis, epidemiology, social science, etc. In this dissertation, we study and address three problems involving the modeling of multi-relational discrete data and multivariate multi-response count data, viz. (1) discovering surprising patterns from multi-relational data, (2) constructing a generative model for multivariate categorical data, and (3) simultaneously modeling multivariate multi-response count data and estimating covariance structures between multiple responses. To discover surprising multi-relational patterns, we first study the "where do I start?" problem originating from intelligence analysis. By studying nine methods with origins in association analysis, graph metrics, and probabilistic modeling, we identify several classes of algorithmic strategies that can supply starting points to analysts, and thus help to discover interesting multi-relational patterns from datasets. To actually mine for interesting multi-relational patterns, we represent the multi-relational patterns as dense and well-connected chains of biclusters over multiple relations, and model the discrete data by the maximum entropy principle, such that in a statistically well-founded way we can gauge the surprisingness of a discovered bicluster chain with respect to what we already know. We design an algorithm for approximating the most informative multi-relational patterns, and provide strategies to incrementally organize discovered patterns into the background model. We illustrate how our method is adept at discovering the hidden plot in multiple synthetic and real-world intelligence analysis datasets.
Our approach naturally generalizes traditional attribute-based maximum entropy models for single relations, and further supports iterative, human-in-the-loop knowledge discovery. To build a generative model for multivariate categorical data, we apply the maximum entropy principle to propose a categorical maximum entropy model such that in a statistically well-founded way we can optimally use given prior information about the data, and are unbiased otherwise. Generally, inferring the maximum entropy model could be infeasible in practice. Here, we leverage the structure of the categorical data space to design an efficient model inference algorithm to estimate the categorical maximum entropy model, and we demonstrate how the proposed model is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application. Modeling data with multivariate count responses is a challenging problem due to the discrete nature of the responses. Existing methods for univariate count responses cannot be easily extended to the multivariate case since the dependency among multiple responses needs to be properly accounted for. To model multivariate data with multiple count responses, we propose a novel multivariate Poisson log-normal model (MVPLN). By simultaneously estimating the regression coefficients and the inverse covariance matrix over the latent variables with an efficient Monte Carlo EM algorithm, the proposed model takes advantage of the association among multiple count responses to improve the model prediction accuracy. Simulation studies and applications to real-world data are conducted to systematically evaluate the performance of the proposed method in comparison with conventional methods. / Ph. D.
Multivariate Discrete Data Multi-relational Data Maximum Entropy Modeling Subjective Interestingness Latent Variable Model Multivariate Poisson Regression Covariance Estimation.
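To make the multivariate Poisson log-normal idea in this abstract concrete, the sketch below draws counts from a generic Poisson log-normal regression: a latent Gaussian vector with mean given by the regression and a shared covariance, exponentiated to give Poisson rates. This is an assumption-laden illustration of the generative side only; it does not implement the dissertation's Monte Carlo EM estimator, and the function name, dimensions, and parameter values are hypothetical.

```python
import numpy as np

def sample_mvpln(design, coef, cov, rng):
    """Draw Y with Y_ij | Z_ij ~ Poisson(exp(Z_ij)) and Z_i ~ N(design_i @ coef, cov)."""
    mean = design @ coef                        # (n, d) latent means from the regression
    chol = np.linalg.cholesky(cov)              # cov = chol @ chol.T
    noise = rng.standard_normal(mean.shape) @ chol.T
    latent = mean + noise                       # correlated latent Gaussian variables
    return rng.poisson(np.exp(latent))          # counts given the latent log-rates

rng = np.random.default_rng(1)
n = 5000
design = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
coef = np.array([[0.5, 1.0],    # intercepts for the two responses (illustrative)
                 [0.2, 0.1]])   # covariate effects (illustrative)
cov = np.array([[0.3, 0.2],
                [0.2, 0.4]])    # latent covariance linking the two responses
counts = sample_mvpln(design, coef, cov, rng)
corr = np.corrcoef(counts.T)[0, 1]
```

Because the dependence lives in the latent covariance, the simulated responses are correlated and over-dispersed relative to a plain Poisson, which is the association the abstract says the model exploits.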
6	Non-Parametric Clustering of Multivariate Count Data Tekumalla, Lavanya Sita January 2017 The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed data or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to get meaningful clusters. As the first contribution, this thesis explores extensions to the Multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While Poisson is the most popular distribution for count modelling, the Multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse-Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the Multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications. As a second contribution, this thesis addresses moving beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations.
We explore, for the first time, marginal-independent inference techniques based on the Gaussian Copula for multivariate count data in the Dirichlet Process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to belong to a particular family, such as the Poisson or the negative-binomial. This inference technique can also work for mixed data (a combination of counts, binary and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wide range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with Vine Copula based Dirichlet process mixtures. While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to ties that arise with discrete marginals. An efficient marginal-independent inference approach based on extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines for multivariate counts and mixed data in practical clustering scenarios. This thesis also explores the novel systems application of Bulk Cache Preloading by analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data. State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at the millisecond granularity or smaller and cannot leverage long-range correlations in traces.
We explore, for the first time, Bulk Cache Preloading, the process of proactively predicting data to load into cache minutes or hours before the actual request from the application, by leveraging longer-range correlation at the granularity of minutes or hours. This enables the development of machine learning techniques tailored for caching due to relaxed timing constraints. Our approach involves a data aggregation process, converting I/O traces into a temporal sequence of multivariate counts, that we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of our thesis is models for non-parametric clustering for discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way to more interdisciplinary research on using data mining techniques in the systems domain. As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document needs to be associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multilevel admixtures. Consider the example of non-parametric entity-topic modelling, that is, simultaneously learning entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities could themselves be modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
Multivariate Count Data Clustering Mixture Models Non-parametric Clustering Bulk Cache Preloading Dirichlet Process Mixture Models Spatio-Temporal Data Aggregation Sparse Multivariate Poisson MultiVariate Poisson (MVP) Copulas Nested Hierarchical Dirichlet Processes Dirichlet Process Mixtures Sparse-Multivariate Poisson Dirichlet Process Mixture Model Computer Science
