Global ETD Search

1	Nonparametric Bayesian analysis of some clustering problems Ray, Shubhankar 30 October 2006 Nonparametric Bayesian models have been researched extensively in the past 10 years following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes. The infinite mixture representation of the Dirichlet process makes it useful for clustering problems where the number of clusters is unknown. We develop nonparametric Bayesian models for two different clustering problems, namely functional and graphical clustering. We propose a nonparametric Bayes wavelet model for clustering functional or longitudinal data. The wavelet modelling is aimed at resolving both global and local features during clustering. The model also allows the elicitation of prior beliefs about the regularity of the functions and can adapt to a wide range of functional regularity. Posterior inference is carried out by Gibbs sampling with conjugate priors for fast computation. We use simulated as well as real datasets to illustrate the advantages of the approach over alternatives. The functional clustering model is extended to analyze splice microarray data. New microarray technologies probe consecutive segments along genes to observe alternative splicing (AS) mechanisms that produce multiple proteins from a single gene. Clues regarding the number of splice forms can be obtained by clustering the functional expression profiles from different tissues. The analysis was carried out on the Rosetta dataset (Johnson et al., 2003) to obtain a splice-variant-by-tissue distribution for all 10,000 genes. We were able to identify a number of splice forms that appear to be unique to cancer. We propose a Bayesian model for partitioning graphs depicting dependencies in a collection of objects. After suitable transformations and modelling techniques, the problem of graph cutting can be approached by nonparametric Bayes clustering. 
We draw motivation from recent work (Dhillon, 2001) showing the equivalence of kernel k-means clustering and certain graph cutting algorithms. We show that loss functions similar to the kernel k-means objective arise naturally in this model, and that minimizing the associated posterior risk yields an effective graph-cutting strategy. We present results from the analysis of two microarray datasets, namely the melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al., 2006). Nonparametric Bayesian Clustering Dirichlet Processes
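The Dirichlet process mixture idea behind this abstract can be illustrated with a collapsed Gibbs sampler based on the Chinese restaurant process. This is a minimal sketch, not the author's wavelet model: it assumes one-dimensional data, a normal likelihood with known variance `sigma2`, and a conjugate normal prior N(`m0`, `s02`) on each cluster mean; all function and parameter names are hypothetical. Each point is reassigned to an existing cluster with weight proportional to the cluster size times the posterior predictive density, or to a new cluster with weight proportional to `alpha` times the prior predictive density, so the number of clusters is never fixed in advance.

```python
import math
import random


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def predictive_logpdf(x, n, total, sigma2, m0, s02):
    """Log posterior predictive of x for a cluster holding n points with sum `total`,
    under x ~ N(mu, sigma2) with conjugate prior mu ~ N(m0, s02)."""
    prec = 1.0 / s02 + n / sigma2
    mean = (m0 / s02 + total / sigma2) / prec
    return normal_logpdf(x, mean, 1.0 / prec + sigma2)


def crp_gibbs(data, alpha=1.0, sigma2=1.0, m0=0.0, s02=100.0, iters=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D DP mixture of normals (known variance)."""
    rng = random.Random(seed)
    z = [0] * len(data)                  # start with every point in one cluster
    counts, sums = {0: len(data)}, {0: float(sum(data))}
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(data):
            c = z[i]                     # remove x from its current cluster
            counts[c] -= 1
            sums[c] -= x
            if counts[c] == 0:
                del counts[c], sums[c]
            labels = list(counts)
            # existing cluster k: n_k * predictive; new cluster: alpha * prior predictive
            logw = [math.log(counts[k])
                    + predictive_logpdf(x, counts[k], sums[k], sigma2, m0, s02)
                    for k in labels]
            labels.append(next_label)
            logw.append(math.log(alpha) + predictive_logpdf(x, 0, 0.0, sigma2, m0, s02))
            top = max(logw)              # subtract the max for numerical stability
            k = rng.choices(labels, weights=[math.exp(v - top) for v in logw])[0]
            if k == next_label:
                counts[k], sums[k] = 0, 0.0
                next_label += 1
            counts[k] += 1
            sums[k] += x
            z[i] = k
    return z
```

For example, `crp_gibbs([-10.0, -10.2, -9.8, 10.0, 10.1, 9.9], alpha=0.5)` should never put a negative and a positive value in the same cluster. The returned labels are arbitrary integers, so only the induced partition is meaningful.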
2	Nonparametric Bayesian Models for Supervised Dimension Reduction and Regression Mao, Kai January 2009 We propose nonparametric Bayesian models for supervised dimension reduction and regression problems. Supervised dimension reduction is a setting where one needs to reduce the dimensionality of the predictors, or find the dimension reduction subspace, while losing little or no predictive information. Our first method retrieves the dimension reduction subspace in the inverse regression framework by utilizing a dependent Dirichlet process that allows for natural clustering of the data in terms of both the response and predictor variables. Our second method is based on ideas from the gradient learning framework and retrieves the dimension reduction subspace through coherent nonparametric Bayesian kernel models. We also discuss and provide a new rationalization of kernel regression based on nonparametric Bayesian models, allowing for direct and formal inference on the uncertain regression functions. Our proposed models apply to high-dimensional cases where the number of variables far exceeds the sample size, and hold both in the classical setting of Euclidean subspaces and in the Riemannian setting where the marginal distribution is concentrated on a manifold. Our Bayesian perspective adds appropriate probabilistic and statistical frameworks that allow for rich inference, such as uncertainty estimation, which is important for assessing the reliability of the estimates. Formal probabilistic models with likelihoods and priors are given, and efficient posterior sampling can be obtained by Markov chain Monte Carlo methods, particularly Gibbs sampling schemes. 
For supervised dimension reduction, since the posterior draws are linear subspaces, i.e., points on a Grassmann manifold, we carry out posterior inference with respect to geodesics on the Grassmannian. The utility of our approaches is illustrated on simulated and real examples. / Dissertation Statistics Dirichlet process Kernel models Nonparametric Bayesian Supervised dimension reduction
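Because each posterior draw is a subspace, comparing draws requires a distance on the Grassmannian rather than on raw basis matrices. The sketch below, assuming NumPy and draws supplied as p x d basis matrices, computes the arc-length geodesic distance from principal angles. For a point summary it uses an extrinsic mean (the top-d eigenvectors of the averaged projection matrices), a simpler stand-in that is not necessarily the geodesic-based procedure used in the dissertation; all function names are hypothetical.

```python
import numpy as np


def orthonormal_basis(a):
    """Orthonormal basis (as columns) for the column span of a p x d matrix."""
    q, _ = np.linalg.qr(a)
    return q


def grassmann_distance(a, b):
    """Geodesic (arc-length) distance between span(a) and span(b) via principal angles."""
    s = np.linalg.svd(orthonormal_basis(a).T @ orthonormal_basis(b), compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))    # principal angles between the subspaces
    return float(np.sqrt(np.sum(theta ** 2)))


def extrinsic_mean(draws, d):
    """Average the projection matrices of the draws and return the top-d eigenvectors."""
    p = sum(orthonormal_basis(a) @ orthonormal_basis(a).T for a in draws) / len(draws)
    _, vecs = np.linalg.eigh(p)                 # eigenvalues come in ascending order
    return vecs[:, -d:]
```

Working with projection matrices makes the summary invariant to the choice of basis within each draw, which is the property that motivates Grassmannian inference in the first place.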
3	Modeling and projection of respondent driven network samples Zhuang, Zhihe January 1900 Master of Science / Department of Statistics / Perla E. Reyes Cuellar / The term network has become part of our everyday vocabulary. The most popular networks are perhaps social ones, but the concept also includes business partnerships, literature citations, and biological networks, among others. Formally, networks are defined as sets of items and their connections. Often modeled as the mathematical object known as a graph, networks have been studied extensively for many years, and research is widely available. In statistics, a variety of modeling techniques and statistical terms have been developed to analyze them and predict individual behaviors. Specifically, statistics such as the degree distribution and the clustering coefficient are considered important indicators in traditional social network studies. However, while conventional network models assume that the whole network population is known, complete information is not always available. Thus, sampling methods are often required when population data are inaccessible. Less time has been dedicated to studying how accurately these sampling methods produce a representative sample. As such, the aim of this report is to assess the capacity of sampling techniques to reflect the features of the original network. In particular, we study Anti-cluster Respondent Driven Sampling (AC-RDS). We also explore whether standard modeling techniques paired with sample data can estimate statistics often used in the study of social networks. Respondent Driven Sampling (RDS) is a chain-referral approach to studying rare and/or hidden populations. Originating from the link-tracing design, RDS has been further developed into a series of methods used in social network studies, such as locating target populations or estimating the number and proportion of needle-sharing among drug addicts. 
However, RDS does not always perform as well as expected. When the social network contains tight communities (or clusters) with few connections between them, traditional RDS tends to oversample one community, introducing bias. AC-RDS is a special Markov chain process that collects samples across communities, capturing the whole network. With special referral requests, the initial seeds are more likely to refer individuals outside their own communities. In this report, we fitted the Exponential Random Graph Model (ERGM) and a Stochastic Block Model (SBM) to an empirical study of the Facebook friendship network of 1034 participants. Then, given our goal of identifying techniques that produce a representative sample, we compared two versions of AC-RDS, in addition to traditional RDS, with Simple Random Sampling (SRS). We compared the methods by drawing 100 network samples with each sampling technique, fitting an SBM to each sample network, and using the results to project the sample network to the size of the population. We then calculated essential network statistics, such as the degree distribution, for each sampling method and compared them with the statistics observed in the original network. Networks Respondent Driven Sampling Nonparametric Bayesian Sampling Methods Stochastic Blockmodel
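The contrast between traditional RDS and an anti-cluster variant, and the SBM-based projection step, can be sketched as follows. This is an illustrative sketch under several assumptions not stated in the abstract: the network is an adjacency dictionary of sets with integer node IDs, community labels are known, an anti-cluster referral is modelled simply as choosing an out-of-community neighbour with probability `p_out`, and the SBM is fitted by block-wise edge densities on the sample's induced subgraph. The function names are hypothetical, and the report's actual AC-RDS Markov chain may differ.

```python
import random
from collections import defaultdict


def chain_referral_sample(adj, community, seeds, size, p_out=0.0, coupons=3, seed=0):
    """Chain-referral (RDS-style) sample without replacement.

    Each recruiter hands out up to `coupons` referrals. With probability `p_out`
    a referral is steered to a neighbour outside the recruiter's community, a
    hypothetical stand-in for AC-RDS referral requests (p_out=0 gives plain RDS).
    """
    rng = random.Random(seed)
    sampled = list(dict.fromkeys(seeds))[:size]
    in_sample = set(sampled)
    queue = list(sampled)
    while queue and len(sampled) < size:
        u = queue.pop(0)
        for _ in range(coupons):
            cand = [v for v in sorted(adj[u]) if v not in in_sample]
            if not cand or len(sampled) >= size:
                break
            outside = [v for v in cand if community[v] != community[u]]
            pool = outside if outside and rng.random() < p_out else cand
            v = rng.choice(pool)
            sampled.append(v)
            in_sample.add(v)
            queue.append(v)
    return sampled


def sbm_projected_mean_degree(sample, adj, community, pop_size):
    """Fit block edge probabilities on the sample's induced subgraph, then project the
    mean degree to a population of `pop_size` nodes with the sample's block shares."""
    members = set(sample)
    n_block = defaultdict(int)
    for v in sample:
        n_block[community[v]] += 1
    edges = defaultdict(int)
    for u in sample:
        for v in adj[u]:
            if v in members and u < v:          # count each undirected edge once
                edges[tuple(sorted((community[u], community[v])))] += 1

    def prob(a, b):
        a, b = sorted((a, b))
        pairs = n_block[a] * (n_block[a] - 1) / 2 if a == b else n_block[a] * n_block[b]
        return edges[(a, b)] / pairs if pairs else 0.0

    n = len(sample)
    share = {b: n_block[b] / n for b in n_block}
    total = 0.0
    for a in n_block:
        # expected degree of a block-a node: neighbours in each block, excluding itself
        expected_deg = sum(prob(a, b) * (pop_size * share[b] - (a == b)) for b in n_block)
        total += pop_size * share[a] * expected_deg
    return total / pop_size
```

Setting `p_out=0.0` gives plain chain referral; values near 1 push recruitment across community boundaries. The projected mean degree assumes the population keeps the sample's block proportions, which is exactly where an unrepresentative sample would bias the projection.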
4	On Nonparametric Bayesian Inference for Tukey Depth Han, Xuejun January 2017 The Dirichlet process is perhaps the most popular prior used in nonparametric Bayesian inference. This prior, which is placed on the space of probability distributions, has the conjugacy property and is asymptotically consistent. In this thesis, we focus on applying this nonparametric Bayesian inference to the Tukey depth and the Tukey median. Because the distribution of the Tukey median is complex, we use this nonparametric Bayesian inference, namely Lo's bootstrap, to approximate the distribution of the Tukey median. We also compare our results with Efron's bootstrap and Rubin's bootstrap. Furthermore, the existing asymptotic theory for the Tukey median is reviewed. Based on these existing results, we conjecture that the bootstrap sample Tukey median converges to the same asymptotic distribution, and our simulation supports the conjecture that asymptotic consistency holds. Nonparametric Bayesian Inference Dirichlet Process Data Depth Tukey Depth
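In one dimension the Tukey (halfspace) median coincides with the ordinary median, which makes a small comparison of bootstrap schemes easy to write. The sketch below, assuming NumPy, contrasts Efron's bootstrap (multinomial resampling weights) with Rubin's Bayesian bootstrap (flat Dirichlet weights on the observed points). Rubin's scheme is the limit of the Dirichlet process posterior as the prior mass goes to zero; Lo's version with positive prior mass would also mix in draws from the base measure, which this sketch omits. Function names are hypothetical.

```python
import numpy as np


def weighted_median(x, w):
    """Smallest x whose cumulative weight reaches half of the total weight."""
    order = np.argsort(x)
    xs, cw = x[order], np.cumsum(w[order])
    return xs[np.searchsorted(cw, 0.5 * cw[-1])]


def efron_bootstrap(x, b, rng):
    n = len(x)
    # multinomial resampling counts act as integer weights on the data
    return np.array([weighted_median(x, rng.multinomial(n, np.full(n, 1.0 / n)).astype(float))
                     for _ in range(b)])


def rubin_bootstrap(x, b, rng):
    n = len(x)
    # flat Dirichlet(1, ..., 1) weights on the observed points
    return np.array([weighted_median(x, rng.dirichlet(np.ones(n))) for _ in range(b)])
```

Usage: with `rng = np.random.default_rng(0)`, `efron_bootstrap(x, 1000, rng)` and `rubin_bootstrap(x, 1000, rng)` each return 1000 draws of the median whose spreads can be compared directly. In higher dimensions the weighted median would have to be replaced by a weighted Tukey median, which is far more expensive to compute.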
5	Modeling Non-Gaussian Time-correlated Data Using Nonparametric Bayesian Method Xu, Zhiguang 20 October 2014 No description available. Statistics
6	Nonparametric Discovery of Human Behavior Patterns from Multimodal Data Sun, Feng-Tso 01 May 2014 Recent advances in sensor technologies and the growing interest in context-aware applications, such as targeted advertising and location-based services, have led to a demand for understanding human behavior patterns from sensor data. People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition, such as sitting or standing, and analyzes human behaviors at a higher level (e.g., commuting to work). The goal of the research presented in this thesis is to automatically discover high-level semantic human routines from low-level sensor streams. One recent line of research mines human routines from sensor data using parametric topic models. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Furthermore, it is even more difficult to find optimal parameter values in advance for personalized applications. The research presented in this thesis offers a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent low-level activities beforehand. More specifically, the framework automatically finds the size of the low-level feature vocabulary from sensor feature vectors in the vocabulary extraction phase. In the routine discovery phase, the framework further selects the appropriate number of latent low-level activities automatically and discovers latent routines. Moreover, we propose a new generative graphical model to incorporate multimodal sensor streams for the human activity discovery task. The hypotheses and approaches presented in this thesis are evaluated on public datasets in two routine domains: two daily-activity datasets and a transportation mode dataset. 
Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from multimodal sensor data without any form of manual model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks. Activity recognition machine learning topic modeling nonparametric Bayesian probabilistic graphical models context-aware systems
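The abstract does not say how the vocabulary size is inferred. As a swapped-in illustration of choosing the number of codewords from the data, the sketch below implements DP-means (Kulis and Jordan), a hard-clustering small-variance limit of a Dirichlet process mixture: a point whose squared distance to every current center exceeds the penalty `lam` opens a new center. It assumes NumPy and real-valued feature vectors; `lam` is a hypothetical tuning value, and this is not the thesis's method.

```python
import numpy as np


def dp_means(x, lam, max_iter=100):
    """DP-means clustering: returns (labels, centers); the number of centers is data-driven."""
    centers = [x.mean(axis=0)]                  # start from a single global center
    labels = np.zeros(len(x), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, xi in enumerate(x):
            d = np.array([np.sum((xi - c) ** 2) for c in centers])
            k = int(np.argmin(d))
            if d[k] > lam:                      # too far from every center: open a new one
                centers.append(xi.copy())
                k = len(centers) - 1
            if k != labels[i]:
                labels[i] = k
                changed = True
        # recompute centers, dropping any that lost all their points
        keep = [k for k in range(len(centers)) if np.any(labels == k)]
        remap = {k: j for j, k in enumerate(keep)}
        centers = [x[labels == k].mean(axis=0) for k in keep]
        labels = np.array([remap[int(l)] for l in labels])
        if not changed:
            break
    return labels, np.array(centers)
```

Larger values of `lam` yield fewer codewords. The penalty plays the role that the concentration parameter plays in the full Bayesian model, so it still has to be set, but it replaces a search over the vocabulary size itself.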
7	Nonparametric Bayesian Modelling in Machine Learning Habli, Nada January 2016 Nonparametric Bayesian inference has widespread applications in statistics and machine learning. In this thesis, we examine the most popular priors used in Bayesian nonparametric inference. The Dirichlet process and its extensions are priors on an infinite-dimensional space. The Dirichlet process was originally introduced by Ferguson (1983), and its conjugacy property allows tractable posterior inference, which has lately given rise to significant developments in machine learning applications. Another widely used prior in nonparametric Bayesian inference is the Beta process and its extensions. It was originally introduced by Hjort (1990) for applications in survival analysis. It is a prior on the space of cumulative hazard functions, and it has recently been widely used as a prior on an infinite-dimensional space for latent feature models. Our contribution in this thesis is to collect many diverse groups of nonparametric Bayesian tools and explore algorithms to sample from them. We also explore the machinery behind the theory in order to apply these procedures and expose some of their distinguishing features. These tools can be used by practitioners in many applications. Nonparametric Bayesian Dirichlet process Gamma process Beta process Machine Learning Beta Bernoulli process
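Two standard ways to sample from the priors named here are truncated stick-breaking for the Dirichlet process and the Indian buffet process, which is the marginal of the Beta-Bernoulli process. The sketch below assumes NumPy; the truncation level, the user-supplied base sampler, and the function names are assumptions of the sketch, not choices taken from the thesis.

```python
import numpy as np


def dp_stick_breaking(alpha, base_sampler, truncation, rng):
    """Truncated stick-breaking draw from DP(alpha, G0): returns (weights, atoms)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    # length of stick remaining before each break: prod_{j<k} (1 - beta_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining                 # sums to slightly less than 1
    atoms = base_sampler(truncation, rng)       # i.i.d. draws from the base measure G0
    return weights, atoms


def indian_buffet(alpha, n_customers, rng):
    """Binary feature matrix Z drawn from the Indian buffet process with mass alpha."""
    dish_counts = []                            # customers who have taken each dish so far
    rows = []
    for i in range(1, n_customers + 1):
        # customer i takes an existing dish k with probability m_k / i
        row = [rng.random() < m / i for m in dish_counts]
        for k, took in enumerate(row):
            if took:
                dish_counts[k] += 1
        new = rng.poisson(alpha / i)            # then tries Poisson(alpha / i) new dishes
        dish_counts.extend([1] * new)
        row.extend([True] * new)
        rows.append(row)
    z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        z[i, :len(row)] = row
    return z
```

Usage: `dp_stick_breaking(2.0, lambda m, r: r.normal(size=m), 50, np.random.default_rng(1))` gives 50 weights and standard-normal atoms; the truncation level should be large enough that the leftover stick mass is negligible.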
8	A semi-parametric model for Taguchi's On-Line Quality-Monitoring Procedure for Attributes Tsunemi, Miriam Harumi 27 April 2009 In this work, we propose an alternative model for Taguchi's On-Line Quality-Monitoring Procedure for Attributes under a Bayesian nonparametric framework. This model applies to production processes in which the sequence of defective fractions increases gradually during a production cycle (a common situation, for example, when equipment wears out gradually). It differs from the models of Taguchi, Nayebpour and Woodall, and Nandi and Sreehari (1997), which allow at most three values for the defective fraction, and from those of Nandi and Sreehari (1999) and Trindade, Ho and Quinino (2007), which consider simpler deterioration functions. 
The development is based on the work of Ferguson and Antoniak to obtain the posterior distribution of an unknown measure P, associated with an unknown distribution function F that represents the sequence of defective fractions over a cycle, assuming a prior mixture of Dirichlet processes. The results are applied to the estimation of the distribution function F, and the Bayes estimates are analyzed through some particular cases. mixture of Dirichlet Processes nonparametric Bayesian Inference
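Ferguson's posterior-mean result gives a closed form for the Bayes estimate of F under squared-error loss with a single Dirichlet process prior DP(alpha, F0): E[F(t) | data] = (alpha F0(t) + #{x_i <= t}) / (alpha + n). The sketch below extends this to a simple mixture of Dirichlet processes by placing a discrete prior on the precision alpha, whose posterior (for a continuous base measure) is proportional to prior(alpha) * alpha^k * Gamma(alpha) / Gamma(alpha + n), with k the number of distinct observations. Mixing over alpha, the uniform base CDF in the usage note, and all names are assumptions for illustration; the dissertation's mixture may be over other parameters.

```python
import math


def dp_posterior_mean_cdf(t, data, alpha, base_cdf):
    """Bayes estimate of F(t) under squared-error loss with a DP(alpha, F0) prior."""
    n = len(data)
    count = sum(1 for x in data if x <= t)
    return (alpha * base_cdf(t) + count) / (alpha + n)


def alpha_posterior(data, alpha_grid, prior_weights):
    """Posterior over a discrete grid of DP precisions, assuming a continuous F0
    so that ties in the data arise only from the discreteness of the DP."""
    n, k = len(data), len(set(data))
    logs = [math.log(p) + k * math.log(a) + math.lgamma(a) - math.lgamma(a + n)
            for a, p in zip(alpha_grid, prior_weights)]
    top = max(logs)                             # stabilise before exponentiating
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]


def mdp_bayes_cdf(t, data, alpha_grid, prior_weights, base_cdf):
    """Average the single-DP estimates over the posterior of alpha."""
    post = alpha_posterior(data, alpha_grid, prior_weights)
    return sum(p * dp_posterior_mean_cdf(t, data, a, base_cdf)
               for a, p in zip(alpha_grid, post))
```

Usage: for defective fractions in [0, 1], a hypothetical choice is `base = lambda t: min(max(t, 0.0), 1.0)` and `mdp_bayes_cdf(0.5, data, [0.5, 1.0, 2.0], [1/3] * 3, base)`. Small alpha lets the empirical CDF dominate; large alpha shrinks the estimate toward F0.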
10	Efficient Bayesian methods for mixture models with genetic applications Zuanetti, Daiane Aparecida 14 December 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / We propose Bayesian methods for selecting and estimating different types of mixture models, which are widely used in Genetics and Molecular Biology. Specifically, we propose data-driven selection and estimation methods for a generalized mixture model, which accommodates the usual (independent) and the first-order (dependent) models in one framework, and for QTL (quantitative trait locus) mapping models for independent and pedigree data. For clustering genes through a mixture model, we propose three nonparametric Bayesian methods: a marginal nested Dirichlet process (NDP), which is able to cluster distributions; a predictive recursion clustering scheme (PRC); and a subset nonparametric Bayesian (SNOB) clustering algorithm for clustering big data. We analyze and compare the performance of the proposed methods and of traditional selection, estimation and clustering procedures on simulated and real data sets. 
The proposed methods are more flexible, improve the convergence of the algorithms and provide more accurate estimates in many situations. In addition, we propose methods for predicting nonobservable QTL genotypes and missing parents, and for improving the Mendelian probability of inheritance of nonfounder genotypes using conditional independence structures. We also suggest applying diagnostic measures to check the goodness of fit of QTL mapping models. 
Mixture models Data-driven Bayesian methods Nonparametric Bayesian methods QTL mapping Clustering distributions
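The predictive recursion clustering scheme builds on predictive recursion, a fast recursive estimator of a mixing density. The sketch below implements only the basic recursion, not the dissertation's PRC clustering scheme, assuming NumPy, a normal location kernel with known `sigma`, a uniform grid for the mixing variable, and the weight sequence w_i = (i + 1)^(-gamma); these are assumptions of the sketch. Each observation pulls the current estimate toward its one-step Bayes update: f_i(u) = (1 - w_i) f_{i-1}(u) + w_i k(x_i | u) f_{i-1}(u) / ∫ k(x_i | v) f_{i-1}(v) dv.

```python
import numpy as np


def predictive_recursion(x, grid, sigma=1.0, gamma=0.67):
    """Predictive recursion estimate of the mixing density f of a location mixture
    of N(u, sigma^2) kernels, evaluated on a uniform grid of u values."""
    du = grid[1] - grid[0]
    f = np.full(len(grid), 1.0 / (len(grid) * du))   # flat initial guess, integrates to 1
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)                     # decreasing weight sequence
        kernel = np.exp(-0.5 * ((xi - grid) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        post = kernel * f
        post /= post.sum() * du                       # Bayes update of f given xi alone
        f = (1.0 - w) * f + w * post                  # convex mix keeps total mass at 1
    return f
```

Because the recursion depends on the order of the data, estimates are often averaged over several random permutations. Its single pass over the data is what makes it attractive for large gene-expression datasets compared with full MCMC.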
