41

Temporally Correlated Dirichlet Processes in Pollution Receptor Modeling

Heaton, Matthew J. 31 May 2007 (has links) (PDF)
Understanding the effect of human-induced pollution on the environment is an important precursor to promoting public health and environmental stability. One aspect of understanding pollution is understanding pollution sources. Various methods have been used and developed to understand pollution sources and the amount of pollution those sources emit. Multivariate receptor modeling seeks to estimate pollution source profiles and pollution emissions from concentrations of pollutants such as particulate matter (PM) in the air. Previous approaches to multivariate receptor modeling make the following two key assumptions: (1) PM measurements are independent and (2) source profiles are constant through time. Notwithstanding these assumptions, the existence of temporal correlation among PM measurements and time-varying source profiles is commonly accepted. In this thesis an approach to multivariate receptor modeling is developed in which the temporal structure of PM measurements is accounted for by modeling source profiles as a time-dependent Dirichlet process. The Dirichlet process (DP) pollution model developed herein is evaluated using several simulated data sets. In the presence of time-varying source profiles, the DP model more accurately estimates source profiles and source contributions than other multivariate receptor model approaches. Additionally, when source profiles are constant through time, the DP model outperforms other pollution receptor models by more accurately estimating source profiles and source contributions.
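For orientation, the kind of multivariate receptor model the abstract refers to can be sketched as follows; the notation is generic rather than the thesis's own, with y_t the measured PM-species concentrations, Λ_t the matrix of source profiles and f_t the source contributions at time t.

```latex
% Generic multivariate receptor model (illustrative notation, not the thesis's own):
%   y_t : measured concentrations of PM species at time t
%   \Lambda_t : matrix whose columns are source profiles (allowed to vary in time)
%   f_t : nonnegative source contributions, \epsilon_t : measurement error
\[
  y_t = \Lambda_t f_t + \epsilon_t , \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma).
\]
% The thesis places a time-dependent Dirichlet process prior on the source
% profiles, so consecutive time points can share profile "atoms" while drifting:
\[
  \Lambda_t \mid G_t \sim G_t , \qquad G_t \sim \mathrm{DP}(\alpha, G_0)
  \quad \text{with dependence across } t .
\]
```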
42

Clustering metagenome contigs using coverage with CONCOCT / Klustring av metagenom-kontiger baserat på abundans-profiler med CONCOCT

Bjarnason, Brynjar Smári January 2017 (has links)
Metagenomics allows studying the genetic potential of microorganisms without prior cultivation. Since metagenome assembly results in fragmented genomes, a key challenge is to cluster the genome fragments (contigs) into more or less complete genomes. The goal of this project was to investigate how well CONCOCT bins assembled contigs into taxonomically relevant clusters using the abundance profiles of the contigs over multiple samples. This was done by studying how different parameter settings for CONCOCT affect the clustering of metagenome contigs from in silico model communities generated by mixing data from isolate genomes. These parameters control how the model that CONCOCT trains is tuned and how the trained model assigns contigs to clusters. Each parameter was tested in isolation while the others were kept at their default values. For each of the data sets used, the number of clusters was kept fixed at the known number of species and strains in that data set. The best-performing configuration used a tied covariance model, principal components explaining 90% of the variance, and filtering out contigs shorter than 3000 bp. The results also suggested that all available samples should be used for the abundance profiles. CONCOCT was then run with these parameters while estimating the number of clusters automatically. This gave poor results, typically too few and too large clusters, which led to the conclusion that the model-selection procedure implemented in CONCOCT, the Bayesian Information Criterion, was not good enough. A similar model that estimates the number of clusters differently, the Dirichlet Process Gaussian Mixture Model, was therefore tested. This model gave much better results, and later versions of CONCOCT have adopted a similar approach.
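The workflow described above can be approximated with scikit-learn's mixture models. This is an illustrative sketch rather than CONCOCT's actual implementation; the input file names and the cluster count of 20 are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# Hypothetical inputs: per-contig feature matrix (abundance profiles over
# samples, optionally with composition features) and contig lengths in bp.
profiles = np.load("profiles.npy")           # assumed file, (n_contigs, n_features)
lengths = np.load("contig_lengths.npy")      # assumed file, (n_contigs,)

# Filter out contigs shorter than 3000 bp and keep the principal components
# explaining 90% of the variance, as in the configuration above.
X = PCA(n_components=0.90).fit_transform(profiles[lengths >= 3000])

# Fixed number of clusters with a tied covariance matrix (the setting used
# when the true number of genomes in the mock community is known; 20 is an
# arbitrary placeholder here).
gmm = GaussianMixture(n_components=20, covariance_type="tied", random_state=0)
fixed_labels = gmm.fit_predict(X)

# Estimating the number of clusters with a truncated Dirichlet process prior
# on the mixture weights (the idea later versions of CONCOCT adopted):
# superfluous components simply receive negligible weight.
dpgmm = BayesianGaussianMixture(
    n_components=50,                          # truncation level, not the answer
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="tied",
    random_state=0,
)
auto_labels = dpgmm.fit_predict(X)
print("effective number of clusters:", len(np.unique(auto_labels)))
```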
43

Semi-parametric Bayesian Inference of Accelerated Life Test Using Dirichlet Process Mixture Model

Liu, Xi January 2015 (has links)
No description available.
44

Bayesian Sparse Regression with Application to Data-driven Understanding of Climate

Das, Debasish January 2015 (has links)
Sparse regressions based on constraining the L1-norm of the coefficients became popular due to their ability to handle high-dimensional data, unlike regular regressions, which suffer from overfitting and model-identifiability issues, especially when the sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that generalize better and are easily interpretable. However, significant challenges may be posed by the need to accommodate extremes and other domain constraints such as dynamical relations among variables, spatial and temporal constraints, the need to provide uncertainty estimates, and feature correlations, among others. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate these constraints. We applied sparse regression to the feature-selection problem of statistical downscaling of climate variables, with particular focus on their extremes. This is important for many impact studies where climate change information is required at a spatial scale much finer than that provided by global or regional climate models. Characterizing the dependence of extremes on covariates can help in identifying plausible causal drivers and inform the downscaling of extremes. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. We obtain posteriors over regression coefficients, which indicate the dependence of extremes on the corresponding covariates and provide uncertainty estimates, using a variational Bayes approximation. The method is applied to selecting informative atmospheric covariates at multiple spatial scales, as well as indices of large-scale circulation and global warming, related to the frequency of precipitation extremes over the continental United States. Our results confirm the dependence relations that may be expected from known precipitation physics and generate novel insights that can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in the future. We further extend our framework to handle the dynamic relationship among climate variables using a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP). The extended model can achieve simultaneous clustering and discovery of covariates within each cluster. Moreover, a priori knowledge about the association between pairs of data points is incorporated in the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables. / Computer and Information Science
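As a reference point for the framework described above, a standard hierarchical formulation of Bayesian sparse regression (the Bayesian lasso, written as a scale mixture of normals) is shown below; the dissertation's own priors and its non-Gaussian extreme-value likelihood differ from this baseline sketch.

```latex
% A standard hierarchical Bayesian sparse (lasso-type) regression; the thesis's
% priors and extreme-value likelihood may differ from this baseline.
\[
  y_i \mid \beta, \sigma^2 \sim \mathcal{N}(x_i^{\top}\beta,\; \sigma^2), \qquad
  \beta_j \mid \tau_j^2 \sim \mathcal{N}(0,\; \tau_j^2), \qquad
  \tau_j^2 \sim \mathrm{Exp}\!\left(\tfrac{\lambda^2}{2}\right) \ \text{(rate parameterization)}.
\]
% Marginalizing the \tau_j^2 gives independent Laplace (double-exponential)
% priors on the coefficients, so the posterior mode corresponds to an
% L1-penalized (lasso) estimate:
\[
  p(\beta_j \mid \lambda) = \tfrac{\lambda}{2}\, e^{-\lambda |\beta_j|}.
\]
```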
45

Bayesian Approach Dealing with Mixture Model Problems

Zhang, Huaiye 05 June 2012 (has links)
In this dissertation, we focus on two research topics related to mixture models. The first topic is Adaptive Rejection Metropolis Simulated Annealing for detecting global maximum regions, and the second is Bayesian model selection for nonlinear mixed effects models. In the first topic, we consider a finite mixture model, which is used to fit data from heterogeneous populations in many applications. The Expectation Maximization (EM) algorithm and Markov Chain Monte Carlo (MCMC) are two popular methods for estimating the parameters of a finite mixture model. However, both methods may converge to local maximum regions rather than the global maximum when multiple local maxima exist. In this dissertation, we propose a new approach, Adaptive Rejection Metropolis Simulated Annealing (ARMS annealing), to improve the EM algorithm and MCMC methods. Combining simulated annealing (SA) and adaptive rejection Metropolis sampling (ARMS), ARMS annealing generates a set of proper starting points that help to reach all possible modes. ARMS uses a piecewise linear envelope function as a proposal distribution. Under the SA framework, we start with a set of proposal distributions constructed by ARMS, and the method finds a set of proper starting points that help to detect separate modes; we refer to this approach as ARMS annealing. By combining ARMS annealing with the EM algorithm and with the Bayesian approach, respectively, we propose two approaches: an EM ARMS annealing algorithm and a Bayesian ARMS annealing approach. EM ARMS annealing implements the EM algorithm using the set of starting points proposed by ARMS annealing; ARMS annealing likewise helps MCMC approaches determine starting points. Both approaches capture the global maximum region and estimate the parameters accurately. An illustrative example uses survey data on the number of charitable donations. The second topic is related to the nonlinear mixed effects model (NLME). Typically, a parametric NLME model requires strong assumptions that make the model less flexible and are often not satisfied in real applications. To allow more flexible assumptions, we present three semiparametric Bayesian NLME models constructed with Dirichlet process (DP) priors; a Dirichlet process mixture can be viewed as an infinite mixture model. We propose a unified approach, the penalized posterior Bayes factor, for the purpose of model comparison. Using simulation studies, we compare the performance of two of the three semiparametric hierarchical Bayesian approaches with that of the parametric Bayesian approach. Simulation results suggest that our penalized posterior Bayes factor is a robust method for comparing hierarchical parametric and semiparametric models. An application to gastric emptying studies is used to demonstrate the advantages of our estimation and evaluation approaches. / Ph. D.
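The local-maxima problem that motivates ARMS annealing is easy to reproduce; the sketch below uses scikit-learn's EM for a finite Gaussian mixture, with plain random restarts standing in for the ARMS-annealing starting points (it does not implement ARMS annealing itself).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Heterogeneous data: two large, well-separated groups plus a small middle one,
# a setting where single-start EM is prone to getting stuck in a local maximum.
data = np.concatenate([rng.normal(-5, 1, 300),
                       rng.normal(0, 1, 50),
                       rng.normal(6, 1, 300)]).reshape(-1, 1)

# Single-start EM from different random initializations can converge to
# different local maxima of the likelihood.
for seed in range(5):
    gm = GaussianMixture(n_components=3, n_init=1, init_params="random",
                         random_state=seed).fit(data)
    print(seed, gm.lower_bound_)   # per-sample log-likelihood bound at convergence

# Multi-start estimation keeps the best of several runs; plain random restarts
# here play the role that ARMS-annealing starting points play in the dissertation.
best = GaussianMixture(n_components=3, n_init=20, init_params="random",
                       random_state=0).fit(data)
print("best of 20 starts:", best.lower_bound_)
```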
46

Non-Parametric Clustering of Multivariate Count Data

Tekumalla, Lavanya Sita January 2017 (has links) (PDF)
The focus of this thesis is models for non-parametric clustering of multivariate count data. While there has been significant work in Bayesian non-parametric modelling in the last decade, in the context of mixture models for real-valued data and some forms of discrete data such as multinomial mixtures, there has been much less work on non-parametric clustering of multivariate count data. The main challenges in clustering multivariate counts include choosing a suitable multivariate distribution that adequately captures the properties of the data, for instance handling over-dispersed or sparse multivariate data, while at the same time leveraging the inherent dependency structure between dimensions and across instances to obtain meaningful clusters. As the first contribution, this thesis explores extensions to the Multivariate Poisson distribution, proposing efficient algorithms for non-parametric clustering of multivariate count data. While the Poisson is the most popular distribution for count modelling, the Multivariate Poisson often leads to intractable inference and a suboptimal fit of the data. To address this, we introduce a family of models based on the Sparse Multivariate Poisson that exploit the inherent sparsity in multivariate data, reducing the number of latent variables in the formulation of the Multivariate Poisson and leading to a better fit and more efficient inference. We explore Dirichlet process mixture model extensions and temporal non-parametric extensions to models based on the Sparse Multivariate Poisson for practical use of Poisson-based models for non-parametric clustering of multivariate counts in real-world applications. As a second contribution, this thesis addresses moving beyond the limitations of Poisson-based models for non-parametric clustering, for instance in handling over-dispersed data or data with negative correlations. We explore, for the first time, marginal independent inference techniques based on the Gaussian copula for multivariate count data in the Dirichlet process mixture model setting. This enables non-parametric clustering of multivariate counts without limiting assumptions that usually restrict the marginals to a particular family, such as the Poisson or the negative binomial. This inference technique can also handle mixed data (combinations of count, binary and continuous data), enabling Bayesian non-parametric modelling to be used for a wide variety of data types. As the third contribution, this thesis addresses modelling a wide range of more complex dependencies, such as asymmetric and tail dependencies, during non-parametric clustering of multivariate count data with vine-copula-based Dirichlet process mixtures. While vine copula inference has been well explored for continuous data, it is still a topic of active research for multivariate counts and mixed multivariate data. Inference for multivariate counts and mixed data is a hard problem owing to the ties that arise with discrete marginals. An efficient marginal independent inference approach based on the extended rank likelihood, building on recent work in the statistics literature, is proposed in this thesis, extending the use of vines to multivariate counts and mixed data in practical clustering scenarios. This thesis also explores a novel systems application, Bulk Cache Preloading, analysing I/O traces through predictive models for temporal non-parametric clustering of multivariate count data.
State-of-the-art techniques in the caching domain are limited to exploiting short-range correlations in memory accesses at millisecond or finer granularity and cannot leverage long-range correlations in traces. We explore, for the first time, Bulk Cache Preloading: the process of proactively predicting data to load into the cache, minutes or hours before the actual request from the application, by leveraging longer-range correlations at the granularity of minutes or hours. The relaxed timing constraints enable the development of machine learning techniques tailored for caching. Our approach involves a data aggregation process that converts I/O traces into a temporal sequence of multivariate counts, which we analyse with the temporal non-parametric clustering models proposed in this thesis. While the focus of the thesis is models for non-parametric clustering of discrete data, particularly multivariate counts, we also hope our work on bulk cache preloading paves the way to more interdisciplinary research on using data mining techniques in the systems domain. As an additional contribution, this thesis addresses multi-level non-parametric admixture modelling for discrete data in the form of grouped categorical data, such as document collections. Non-parametric clustering for topic modelling in document collections, where a document is associated with an unknown number of semantic themes or topics, is well explored with admixture models such as the Hierarchical Dirichlet Process. However, there exist scenarios where a document needs to be associated with themes at multiple levels, where each theme is itself an admixture over themes at the previous level, motivating the need for multi-level admixtures. Consider the example of non-parametric entity-topic modelling: simultaneously learning entities and topics from document collections. This can be realized by modelling a document as an admixture over entities, while entities are themselves modelled as admixtures over topics. We propose the nested Hierarchical Dirichlet Process to address this gap and apply a two-level version of our model to automatically learn author entities and topics from research corpora.
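As a minimal point of reference for non-parametric clustering of multivariate counts, the sketch below implements a collapsed (Chinese restaurant process) Gibbs sampler for a Dirichlet process mixture of independent Poissons with conjugate Gamma priors. The Gamma(1, 1) priors and concentration alpha = 1 are arbitrary illustration values; the Sparse Multivariate Poisson and copula-based models described above are considerably richer.

```python
import numpy as np
from scipy.special import gammaln

# Collapsed Gibbs sampler for a Dirichlet process mixture of independent
# Poissons with conjugate Gamma(a0, b0) priors on the per-dimension rates.
# A deliberately minimal sketch; a0, b0 and alpha are illustration values only.

def log_predictive(x, sums, n, a0=1.0, b0=1.0):
    """Log posterior-predictive of count vector x for a cluster whose members
    sum to `sums` per dimension over `n` points (Gamma-Poisson conjugacy gives
    a negative-binomial predictive). sums=0, n=0 gives the prior predictive."""
    a, b = a0 + sums, b0 + n
    p = b / (b + 1.0)
    return np.sum(gammaln(a + x) - gammaln(a) - gammaln(x + 1)
                  + a * np.log(p) + x * np.log1p(-p))

def gibbs_sweep(X, z, alpha=1.0):
    """One Chinese-restaurant-process reassignment sweep over all points."""
    for i in range(len(X)):
        z[i] = -1                                   # take point i out of its cluster
        labels = [k for k in np.unique(z) if k >= 0]
        logp = []
        for k in labels:                            # weight: cluster size * predictive
            members = X[z == k]
            logp.append(np.log(len(members))
                        + log_predictive(X[i], members.sum(axis=0), len(members)))
        logp.append(np.log(alpha) + log_predictive(X[i], 0.0, 0))  # new cluster
        logp = np.array(logp)
        probs = np.exp(logp - logp.max())
        probs /= probs.sum()
        choice = np.random.choice(len(probs), p=probs)
        z[i] = labels[choice] if choice < len(labels) else max(labels, default=-1) + 1
    return z

# Toy data: two Poisson "behaviours" over five count dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson([1, 1, 8, 8, 1], size=(60, 5)),
               rng.poisson([7, 7, 1, 1, 7], size=(60, 5))])
z = np.zeros(len(X), dtype=int)     # start with a single cluster
for _ in range(25):
    z = gibbs_sweep(X, z)
print("clusters found:", len(np.unique(z)))
```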
47

Extensões em modelos de sobrevivência com fração de cura e efeitos aleatórios / Extensions in survival models with cure rate and random effects

Gallardo Mateluna, Diego Ignacio 03 February 2014 (has links)
In this work, some extensions of survival models with cure fraction are presented, assuming a context in which the observations are grouped into clusters. Two random effects are incorporated for each group: one to explain the effect on the survival time of susceptible observations and another to explain the probability of cure. A classical approach based on REML estimators is presented, as well as a Bayesian approach based on Dirichlet processes. Besides comparing the two approaches, simulation studies evaluating the performance of the proposed estimators are discussed. Finally, the results are illustrated with a real data set.
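For context, the standard mixture cure rate model that such extensions build on writes the population survival function as a mixture of cured and susceptible individuals; the notation below is generic rather than the thesis's, and the clustered random effects described above would enter through π and the hazard of S_u.

```latex
% Standard mixture cure rate model (generic notation, not the thesis's own):
%   \pi    : probability of being cured (long-term survivor)
%   S_u(t) : survival function of the susceptible (uncured) sub-population
\[
  S_{\mathrm{pop}}(t) = \pi + (1 - \pi)\, S_u(t),
  \qquad \lim_{t \to \infty} S_{\mathrm{pop}}(t) = \pi > 0 .
\]
% With clustered data, one random effect per group can shift \pi (e.g. through a
% logit link) and another can scale the hazard underlying S_u(t).
```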
48

Économie informelle en Haïti, marché du travail et pauvreté : analyses quantitatives / Informal economy in Haiti, labour market and poverty : a quantitative analysis

Aspilaire, Roseman 03 November 2017 (has links)
The predominance of the informal sector in the economy of Haiti, where more than 80% of the population lives below the poverty line and more than 35% is unemployed, suggests close links between the informal economy, poverty and the labour market. Highlighting these interrelationships requires an assessment of the informal economy, which is the subject of the four chapters of this thesis, dealing successively with the evolution of the macroeconomic situation, human capital, the earnings of informal workers, and the segmentation of the labour market. The first chapter diagnoses the phenomenon in light of existing theories and the evolution of Haiti's macroeconomic framework from 1980 to 2010, and then offers a macroeconomic assessment of the informal sector as a percentage of GDP based on a PLS (Partial Least Squares) model. Chapter two sets out the relationship between the evolution of the informal economy, deregulation and neoliberal policies through a LISREL (Linear Structural Relations) model. We examine the impact of the budgetary, fiscal and monetary policies of the past 30 years on the informal economy, and reassess the causes of the growth of the informal economy generally put forward in empirical studies (taxes, social security). In chapter three, we analyse the micro-level dimension of the informal economy through a Mincer-type earnings model estimated with logit equations from the data of a 2007 national survey on employment and the informal economy (EEEI). We analyse the determinants of informal earnings according to workers' position in the market (employees, entrepreneurs and self-employed), and compare the incomes (formal and informal) and socio-economic characteristics of poor and non-poor workers relative to the poverty line. In chapter four, we first test the competitiveness and segmentation of the labour market using the Roy model and the extended Roy model through the estimation of a Tobit model. We then use a Dirichlet process model, first to analyse the segmentation and possible competitiveness of the informal labour market and its determinants, based on the EEEI-2007 data, and second to distinguish the fundamental characteristics of involuntary informal workers (excluded from the formal labour market) from those of voluntary informal workers who derive comparative advantages from informality.
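The Mincer-type earnings model referred to in chapter three has a standard textbook baseline, shown below; the covariates actually used in the thesis (informality status, sector, worker position) extend this form.

```latex
% Baseline Mincer earnings equation (textbook form; the thesis augments it
% with informality and sector covariates):
%   w_i : earnings, S_i : years of schooling, E_i : years of potential experience
\[
  \ln w_i = \beta_0 + \beta_1 S_i + \beta_2 E_i + \beta_3 E_i^2 + \varepsilon_i .
\]
```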
50

Automatic role detection in online forums / Détection automatique des rôles dans les forums en ligne

Lumbreras, Alberto 07 November 2016 (has links)
This thesis addresses the problem of detecting user roles in online discussion forums. A role may be defined as the set of behaviors characteristic of a person or a position. In discussion forums, behaviors are primarily observed through conversations. Hence, we focus our attention on how users discuss. We propose three methods to detect groups of users with similar conversational behaviors. Our first method for the detection of roles is based on conversational structures. We apply different notions of neighborhood for posts in tree graphs (radius-based, order-based, and time-based) and compare the conversational patterns that they detect, as well as the clusters of users with similar conversational patterns. Our second method is based on stochastic models of growth for conversation threads. Building upon these models, we propose a method to find groups of users that tend to reply to the same type of posts. We show that, while there are clusters of users with similar replying patterns, there is no strong evidence that these behaviors are predictive of future behaviors, except for some groups of users with extreme behaviors. In our last method, we integrate the types of data used in the two previous methods (feature-based and behavioral or functional-based) and show that we can find clusters using fewer examples. The model exploits the idea that users with similar features have similar behaviors.
