1 |
Nonparametric Bayesian analysis of some clustering problems / Ray, Shubhankar / 30 October 2006
Nonparametric Bayesian models have been researched extensively in the past 10 years
following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes.
The infinite mixture representation of the Dirichlet process makes it useful
for clustering problems where the number of clusters is unknown. We develop nonparametric
Bayesian models for two different clustering problems, namely functional
and graphical clustering.
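The infinite mixture representation can be made concrete. Below is a minimal NumPy sketch, not taken from the thesis, of two standard views of a Dirichlet process DP(alpha, G0). The first draws truncated stick-breaking weights. The second uses the Chinese restaurant process to sample a partition directly. In both views, the number of occupied clusters is random rather than fixed in advance. The function names and the truncation level are illustrative assumptions.

```python
import numpy as np


def stick_breaking_weights(alpha: float, truncation: int, rng: np.random.Generator) -> np.ndarray:
    """Truncated stick-breaking weights for DP(alpha, G0).

    pi_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha); the last
    stick is set to 1 so that the truncated weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def crp_partition(n: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Cluster labels for n items drawn from the Chinese restaurant process."""
    labels = np.zeros(n, dtype=int)
    counts = [1]  # item 0 opens cluster 0
    for i in range(1, n):
        # join cluster k with prob counts[k]/(i+alpha); open a new one with prob alpha/(i+alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    return labels


rng = np.random.default_rng(0)
labels = crp_partition(100, alpha=1.0, rng=rng)
print("occupied clusters:", len(np.unique(labels)))
```

Larger values of `alpha` tend to open more clusters. In a mixture model, each cluster would also get its own parameter drawn from G0.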
We propose a nonparametric Bayes wavelet model for clustering of functional or
longitudinal data. The wavelet modelling is aimed at the resolution of global and
local features during clustering. The model also allows the elicitation of prior belief
about the regularity of the functions and has the ability to adapt to a wide range
of functional regularity. Posterior inference is carried out by Gibbs sampling with
conjugate priors for fast computation. We use simulated as well as real datasets to
illustrate the advantages of the approach over alternative methods.
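To show how a wavelet representation separates global from local features, here is a minimal orthonormal Haar decomposition in NumPy. The Haar basis is an assumption made for illustration; the abstract does not say which wavelet family the model uses. In a clustering model of this kind, each curve would be summarized by its coefficients. Coarse levels describe the overall shape, and fine levels describe localized detail.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Orthonormal Haar decomposition of a signal whose length is divisible by 2**levels.

    Returns (approximation, details) with details ordered coarsest to finest:
    coarse coefficients carry global shape, fine ones carry local features.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences at this scale
        approx = (even + odd) / np.sqrt(2.0)         # smoothed signal passed to the next level
    return approx, details[::-1]


curve = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0])
coarse, details = haar_dwt(curve, levels=3)
```

The transform is orthonormal, so it preserves the sum of squares. This is one reason conjugate normal priors on the coefficients remain convenient.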
The functional clustering model is extended to analyze splice microarray data.
New microarray technologies probe consecutive segments along genes to observe alternative
splicing (AS) mechanisms that produce multiple proteins from a single gene.
Clues regarding the number of splice forms can be obtained by clustering the functional
expression profiles from different tissues. The analysis was carried out on the Rosetta dataset (Johnson et al., 2003) to obtain the distribution of splice variants across tissues
for all 10,000 genes. We were able to identify a number of splice forms that appear
to be unique to cancer.
We propose a Bayesian model for partitioning graphs depicting dependencies
in a collection of objects. After suitable transformations and modelling techniques,
the problem of graph cutting can be approached by nonparametric Bayes clustering.
We draw motivation from recent work (Dhillon, 2001) showing the equivalence of
kernel k-means clustering and certain graph cutting algorithms. It is shown that
loss functions similar to the kernel k-means objective naturally arise in this model, and
minimizing the associated posterior risk yields an effective graph cutting strategy.
We present here results from the analysis of two microarray datasets, namely the
melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al.,
2006).
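Kernel k-means touches the data only through a kernel matrix. That property is what links it to graph cutting: for a graph, the kernel can be built from the adjacency matrix. The following is a minimal NumPy sketch of the standard Lloyd-style kernel k-means iteration on a precomputed kernel. It is not the posterior-risk procedure developed in the thesis. The optional `init` argument is a hypothetical convenience for reproducible starts.

```python
import numpy as np


def kernel_kmeans(K, k, n_iter=100, init=None, seed=0):
    """Lloyd-style kernel k-means on a precomputed n x n kernel matrix K.

    Squared feature-space distance from point i to the mean of cluster c:
        K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    if init is None:
        labels = np.random.default_rng(seed).integers(k, size=n)
    else:
        labels = np.asarray(init).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinite distance
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            cross = K[:, members].sum(axis=1) / size
            within = K[np.ix_(members, members)].sum() / size**2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

For a graph with adjacency matrix A, one common (assumed) choice is K = A + sigma * I, with a diagonal shift large enough to make K positive semidefinite.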
|
2 |
Nonparametric Bayesian Models for Supervised Dimension Reduction and Regression / Mao, Kai / January 2009
We propose nonparametric Bayesian models for supervised dimension reduction and regression problems. Supervised dimension reduction is a setting where one needs to reduce the dimensionality of the predictors, or find the dimension reduction subspace, while losing little or no predictive information. Our first method retrieves the dimension reduction subspace in the inverse regression framework by utilizing a dependent Dirichlet process that allows for natural clustering of the data in terms of both the response and predictor variables. Our second method is based on ideas from the gradient learning framework and retrieves the dimension reduction subspace through coherent nonparametric Bayesian kernel models. We also discuss and provide a new rationalization of kernel regression based on nonparametric Bayesian models, allowing for direct and formal inference on the uncertain regression functions. Our proposed models apply to high-dimensional cases where the number of variables far exceeds the sample size. They hold both in the classical setting of Euclidean subspaces and in the Riemannian setting where the marginal distribution is concentrated on a manifold. Our Bayesian perspective adds appropriate probabilistic and statistical frameworks that allow for rich inference, such as uncertainty estimation, which is important for assessing the estimates. Formal probabilistic models with likelihoods and priors are given, and efficient posterior sampling can be obtained by Markov chain Monte Carlo methods, particularly Gibbs sampling schemes. For supervised dimension reduction, the posterior draws are linear subspaces, which are points on a Grassmann manifold, so we carry out posterior inference with respect to geodesics on the Grassmannian. The utility of our approaches is illustrated on simulated and real examples. / Dissertation
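Posterior inference with respect to geodesics on the Grassmannian requires a distance between subspaces. A common choice is the arc-length distance computed from principal angles; this is assumed here, not taken from the thesis. The NumPy sketch below computes that distance and uses it to pick a representative posterior draw (a sample medoid). The thesis may summarize the posterior differently, for example with a Karcher mean.

```python
import numpy as np


def grassmann_distance(A, B):
    """Arc-length distance between span(A) and span(B), both p x d with full column rank.

    With orthonormal bases Qa, Qb, the singular values of Qa^T Qb are cos(theta_i),
    where theta_i are the principal angles; the distance is ||theta||_2.
    """
    Qa, _ = np.linalg.qr(np.asarray(A, dtype=float))
    Qb, _ = np.linalg.qr(np.asarray(B, dtype=float))
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(sigma, 0.0, 1.0))  # clip guards against round-off above 1
    return float(np.sqrt(np.sum(theta**2)))


def posterior_medoid(draws):
    """Index of the draw with the smallest summed squared geodesic distance to all draws."""
    m = len(draws)
    D = np.array([[grassmann_distance(draws[i], draws[j]) for j in range(m)] for i in range(m)])
    return int(np.argmin((D**2).sum(axis=1)))
```

The distance depends only on the spanned subspaces, not on the particular basis matrices. That is the invariance needed when posterior draws are treated as points on the Grassmannian.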
|
3 |
Modeling and projection of respondent driven network samples / Zhuang, Zhihe / January 1900
Master of Science / Department of Statistics / Perla E. Reyes Cuellar / The term network has become part of our everyday vocabulary. The most familiar networks are perhaps social ones, but the concept also includes business partnerships, literature citations, and biological networks, among others. Formally, a network is a set of items together with their connections. Networks are often modeled as the mathematical object known as a graph. They have been studied extensively for many years, and research on them is widely available. In statistics, a variety of modeling techniques and summary statistics have been developed to analyze networks and predict individual behavior. In particular, statistics such as the degree distribution and the clustering coefficient are considered important indicators in traditional social network studies. However, conventional network models assume that the whole network population is known, and complete information is not always available. Sampling methods are therefore required when the population data are inaccessible. Less attention has been paid to how well these sampling methods produce a representative sample. The aim of this report is therefore to assess the capacity of sampling techniques to reflect the features of the original network, focusing in particular on Anti-cluster Respondent Driven Sampling (AC-RDS). We also explore whether standard modeling techniques paired with sample data can estimate statistics commonly used in the study of social networks.
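Both summary statistics named in this paragraph are simple functions of the adjacency matrix. Below is a minimal NumPy sketch for an undirected graph without self-loops. It uses the global clustering coefficient (transitivity), which can differ from the average local coefficient that some studies report.

```python
import numpy as np


def degree_distribution(adj):
    """Fraction of nodes having degree 0, 1, ..., max degree."""
    degrees = np.asarray(adj).sum(axis=1).astype(int)
    return np.bincount(degrees) / degrees.size


def global_clustering(adj):
    """Transitivity: 3 * (#triangles) / (#connected triples)."""
    A = np.asarray(adj, dtype=float)
    closed = np.trace(A @ A @ A)                # equals 6 * #triangles
    deg = A.sum(axis=1)
    open_and_closed = np.sum(deg * (deg - 1))   # equals 2 * #connected triples
    return float(closed / open_and_closed) if open_and_closed > 0 else 0.0
```

Comparing these quantities between a full network and networks reconstructed from samples is one direct way to judge how representative a sampling scheme is.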
Respondent Driven Sampling (RDS) is a chain-referral approach to studying rare and/or hidden populations. Originating from the link-tracing design, RDS has been developed into a series of methods used in social network studies. Examples include locating target populations and estimating the number and proportion of people who share needles among drug users. However, RDS does not always perform as well as expected. When the social network contains tight communities (or clusters) with few connections between them, traditional RDS tends to oversample one community, introducing bias. AC-RDS is a Markov chain sampling process that collects samples across communities, capturing the whole network. Through special referral requests, the initial seeds are more likely to refer individuals outside their own communities. In this report, we fitted an Exponential Random Graph Model (ERGM) and a Stochastic Block Model (SBM) to an empirical study of the Facebook friendship network of 1034 participants. Then, given our goal of identifying techniques that produce a representative sample, we compared two versions of AC-RDS and traditional RDS with Simple Random Sampling (SRS). We drew 100 network samples with each sampling technique and fitted an SBM to each sample network. We then used the fitted model to project the sample to a network of population size. Finally, we calculated key network statistics, such as the degree distribution, for each sampling method and compared them with the statistics observed in the original network.
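The chain-referral mechanism, and the way it can stay trapped in one tight community, can be sketched in a few lines. The hypothetical `rds_sample` below models only traditional RDS, as a wave-by-wave referral process with a fixed number of coupons per recruit. The AC-RDS referral rule and the ERGM and SBM fitting described in the report are not reproduced.

```python
import numpy as np


def rds_sample(adj, n_seeds, coupons, target_size, seed=0):
    """Chain-referral sample: each recruit refers up to `coupons` not-yet-sampled
    neighbours, wave by wave, until target_size is reached or referrals run out."""
    A = np.asarray(adj)
    rng = np.random.default_rng(seed)
    sampled = [int(s) for s in rng.choice(A.shape[0], size=n_seeds, replace=False)]
    in_sample = set(sampled)
    frontier = list(sampled)
    while frontier and len(sampled) < target_size:
        next_wave = []
        for person in frontier:
            # neighbours in random order, skipping anyone already recruited
            fresh = [int(j) for j in rng.permutation(np.flatnonzero(A[person]))
                     if int(j) not in in_sample]
            for j in fresh[:coupons]:
                if len(sampled) >= target_size:
                    break
                sampled.append(j)
                in_sample.add(j)
                next_wave.append(j)
        frontier = next_wave
    return np.array(sampled)
```

With two disconnected cliques and a single seed, the sample never leaves the seed's community. This is an extreme form of the bias that AC-RDS is designed to reduce.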
|
4 |
On Nonparametric Bayesian Inference for Tukey Depth / Han, Xuejun / January 2017
The Dirichlet process is perhaps the most popular prior used in nonparametric Bayesian inference. This prior, placed on the space of probability distributions, has the conjugacy property and asymptotic consistency. In this thesis, we focus on applying this nonparametric Bayesian inference to the Tukey depth and the Tukey median. Because the distribution of the Tukey median is complex, we use this nonparametric Bayesian inference, namely Lo's bootstrap, to approximate the distribution of the Tukey median. We also compare our results with Efron's bootstrap and Rubin's bootstrap. Furthermore, we review the existing asymptotic theory for the Tukey median. Based on these existing results, we conjecture that the bootstrap sample Tukey median converges to the same asymptotic distribution, and our simulation supports the conjecture that asymptotic consistency holds.
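The bootstrap schemes compared here differ mainly in how they weight the observations. The minimal NumPy sketch below assumes 2-D data and has three parts: an approximate weighted Tukey (halfspace) depth computed over a finite set of directions, a crude Tukey median taken as the deepest data point, and Efron (multinomial counts) or Rubin (Dirichlet(1, ..., 1)) weights. Lo's bootstrap is not implemented here. Under a Dirichlet process prior whose total mass tends to zero, the posterior weights on the observations reduce to Rubin's Dirichlet(1, ..., 1) weights. The function names are illustrative.

```python
import numpy as np


def tukey_depth(z, X, weights=None, n_dirs=360):
    """Approximate weighted halfspace (Tukey) depth of point z for 2-D data X.

    Takes the minimum, over n_dirs evenly spaced directions u, of the total weight
    of points with u.(x - z) >= 0. Checking finitely many directions gives an
    upper bound on the exact depth.
    """
    X = np.asarray(X, dtype=float)
    w = np.full(len(X), 1.0 / len(X)) if weights is None else np.asarray(weights, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    U = np.column_stack((np.cos(angles), np.sin(angles)))  # n_dirs x 2
    inside = ((X - z) @ U.T >= 0).astype(float)             # n x n_dirs indicators
    return float(np.min(w @ inside))


def tukey_median(X, weights=None):
    """Crude Tukey median: the data point with the largest approximate depth."""
    X = np.asarray(X, dtype=float)
    depths = [tukey_depth(x, X, weights) for x in X]
    return X[int(np.argmax(depths))]


def bootstrap_medians(X, n_boot, scheme="rubin", seed=0):
    """Bootstrap draws of the Tukey median under Efron or Rubin weights."""
    rng = np.random.default_rng(seed)
    n = len(X)
    draws = []
    for _ in range(n_boot):
        if scheme == "efron":
            w = rng.multinomial(n, np.full(n, 1.0 / n)) / n  # resampling counts
        else:
            w = rng.dirichlet(np.ones(n))                      # Bayesian bootstrap weights
        draws.append(tukey_median(X, w))
    return np.array(draws)
```

Restricting the median to data points keeps the sketch short. A serious implementation would compute exact depth contours instead.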
|
5 |
Modeling Non-Gaussian Time-correlated Data Using Nonparametric Bayesian Method / Xu, Zhiguang / 20 October 2014
No description available.
|
6 |
Nonparametric Discovery of Human Behavior Patterns from Multimodal Data / Sun, Feng-Tso / 01 May 2014
Recent advances in sensor technologies and the growing interest in context-aware applications, such as targeted advertising and location-based services, have led to a demand for understanding human behavior patterns from sensor data. People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition, such as sitting or standing, and analyzes human behaviors at a higher level (e.g., commuting to work). The goal of the research presented in this thesis is to automatically discover high-level semantic human routines from low-level sensor streams. One recent line of research mines human routines from sensor data using parametric topic models. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Finding optimal parameter values in advance is even more difficult for personalized applications. The research presented in this thesis offers a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent low-level activities beforehand. More specifically, in the vocabulary extraction phase, the framework automatically finds the size of the low-level feature vocabulary from sensor feature vectors. In the routine discovery phase, the framework further selects the appropriate number of latent low-level activities automatically and discovers latent routines. Moreover, we propose a new generative graphical model that incorporates multimodal sensor streams for the human activity discovery task. The hypotheses and approaches presented in this thesis are evaluated on public datasets in two routine domains: two daily-activity datasets and a transportation mode dataset.
Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from multimodal sensor data without any form of manual model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.
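The abstract does not specify how the framework chooses the vocabulary size. One well-known illustration of letting the data determine the number of clusters is DP-means (Kulis and Jordan), a small-variance limit of Dirichlet process mixtures, sketched below in NumPy. It is swapped in for illustration and is not claimed to be the thesis's method. Note that it replaces the choice of K with a squared-distance penalty `lam`.

```python
import numpy as np


def dp_means(X, lam, n_iter=50):
    """DP-means: like k-means, but a point farther than lam (squared distance)
    from every centre opens a new cluster, so K is learned from the data."""
    X = np.asarray(X, dtype=float)
    centres = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centres]
            k = int(np.argmin(d2))
            if d2[k] > lam:              # too far from everything: open a new cluster
                centres.append(x.copy())
                k = len(centres) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # recompute means and drop clusters that lost all their points
        used = np.unique(labels)
        centres = [X[labels == c].mean(axis=0) for c in used]
        labels = np.searchsorted(used, labels)
        if not changed:
            break
    return labels, np.array(centres)
```

In a vocabulary-extraction setting, each resulting centre would play the role of one "word" in the low-level feature vocabulary.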
|
7 |
Nonparametric Bayesian Modelling in Machine Learning / Habli, Nada / January 2016
Nonparametric Bayesian inference has widespread applications in statistics and machine learning. In this thesis, we examine the most popular priors used in nonparametric Bayesian inference. The Dirichlet process and its extensions are priors on an infinite-dimensional space. The Dirichlet process was originally introduced by Ferguson (1983). Its conjugacy property allows tractable posterior inference, which has recently given rise to significant developments in machine learning applications. Another widespread prior used in nonparametric Bayesian inference is the beta process and its extensions. It was originally introduced by Hjort (1990) for applications in survival analysis. It is a prior on the space of cumulative hazard functions, and it has recently been widely used as a prior on an infinite-dimensional space for latent feature models.
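Latent feature models built on the beta process are usually simulated through its marginal. Integrating out a beta process paired with Bernoulli-process observations gives the Indian buffet process (IBP). Below is a minimal NumPy sketch of the IBP generative scheme. The function name and seed handling are illustrative, and the thesis may use different samplers.

```python
import numpy as np


def indian_buffet(n_customers, alpha, seed=0):
    """Binary feature matrix Z (customers x dishes) drawn from IBP(alpha).

    Customer i (1-based) takes each existing dish k with probability m_k / i,
    where m_k counts earlier customers who took it, then samples
    Poisson(alpha / i) brand-new dishes.
    """
    rng = np.random.default_rng(seed)
    counts = []   # m_k for every dish seen so far
    rows = []     # dish indices taken by each customer
    for i in range(1, n_customers + 1):
        taken = [k for k, m in enumerate(counts) if rng.random() < m / i]
        n_new = int(rng.poisson(alpha / i))
        taken += list(range(len(counts), len(counts) + n_new))
        counts = [m + (k in taken) for k, m in enumerate(counts)] + [1] * n_new
        rows.append(taken)
    Z = np.zeros((n_customers, len(counts)), dtype=int)
    for i, taken in enumerate(rows):
        Z[i, taken] = 1
    return Z
```

The number of columns (features) is not fixed in advance; it grows roughly like alpha times the harmonic number of the number of customers.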
Our contribution in this thesis is to collect many diverse groups of nonparametric Bayesian tools and explore algorithms to sample from them. We also explore the machinery behind the theory in order to apply these procedures and expose some of their distinctive features. These tools can be used by practitioners in many applications.
|
8 |
A semi-parametric Bayesian model for Taguchi's On-Line Quality-Monitoring Procedure for Attributes / Tsunemi, Miriam Harumi / 27 April 2009
In this work, we propose an alternative model for Taguchi's On-Line Quality-Monitoring Procedure for Attributes under a Bayesian nonparametric framework. The model covers production processes in which the sequence of nonconforming (defective) fractions during a production cycle increases gradually. This situation is common, for example, when equipment wears gradually. This sets it apart from the models of Taguchi, Nayebpour and Woodall, and Nandi and Sreehari (1997), which allow the nonconforming fraction to take at most three values. It also differs from those of Nandi and Sreehari (1999) and Trindade, Ho and Quinino (2007), which consider simpler deterioration functions. The development builds on the work of Ferguson and Antoniak to obtain the posterior distribution of an unknown measure P. This measure is associated with an unknown distribution function F that represents the sequence of nonconforming fractions over a cycle, assuming a mixture of Dirichlet processes as the prior. The results are applied to the estimation of F, and the Bayes estimates are analyzed through some particular cases.
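The posterior calculation builds on Ferguson's and Antoniak's results. Its simplest ingredient is the posterior mean of F under a single Dirichlet process prior DP(alpha, F0), which shrinks the empirical CDF toward the prior guess F0. Below is a minimal NumPy sketch, assuming a uniform F0 on [0, 1] for illustration. The mixture of Dirichlet processes used in the thesis additionally averages over the mixing parameter and is not shown.

```python
import numpy as np


def dp_posterior_cdf(t, data, alpha, base_cdf):
    """Posterior mean of F(t) under a DP(alpha, F0) prior:

        E[F(t) | x_1, ..., x_n] = (alpha * F0(t) + #{x_i <= t}) / (alpha + n)

    Returns an array with one value per entry of t.
    """
    data = np.asarray(data, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n_below = (data[None, :] <= t[:, None]).sum(axis=1)  # empirical counts at each t
    return (alpha * base_cdf(t) + n_below) / (alpha + data.size)


def uniform_cdf(t):
    """Prior guess F0: uniform on [0, 1] (an illustrative assumption)."""
    return np.clip(t, 0.0, 1.0)


fractions = [0.1, 0.2, 0.4]   # hypothetical observed nonconforming fractions
estimate = dp_posterior_cdf([0.3], fractions, alpha=1.0, base_cdf=uniform_cdf)
```

Here `alpha` acts as a prior sample size. A small `alpha` trusts the data, while a large `alpha` keeps the estimate close to F0.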
|
10 |
Efficient Bayesian methods for mixture models with genetic applications / Zuanetti, Daiane Aparecida / 14 December 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / We propose Bayesian methods for selecting and estimating different types of mixture models that are widely used in genetics and molecular biology. Specifically, we propose data-driven selection and estimation methods for two kinds of models. The first is a generalized mixture model, which accommodates the usual (independent) and the first-order (dependent) models in one framework. The second is QTL (quantitative trait locus) mapping models for independent and pedigree data. For clustering genes through a mixture model, we propose three nonparametric Bayesian methods: a marginal nested Dirichlet process (NDP), which is able to cluster distributions; a predictive recursion clustering scheme (PRC); and a subset nonparametric Bayesian (SNOB) clustering algorithm for clustering big data. We analyze and compare the performance of the proposed methods and of traditional procedures for selection, estimation and clustering on simulated and real data sets. The proposed methods are more flexible, improve the convergence of the algorithms, and provide more accurate estimates in many situations. In addition, we propose methods for predicting unobservable QTL genotypes and missing parents. We also improve the Mendelian inheritance probabilities of nonfounder genotypes using conditional independence structures. Finally, we suggest applying diagnostic measures to check the goodness of fit of QTL mapping models.
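Predictive recursion has a compact algorithmic core. Below is a minimal grid-based NumPy sketch of Newton's predictive recursion for estimating a mixing density in a one-dimensional location mixture. It assumes a user-supplied kernel and weights w_i = (i + 1)^(-gamma). How the thesis turns this estimate into the PRC clustering scheme is not reproduced, and the default `gamma` is an illustrative choice.

```python
import numpy as np


def predictive_recursion(x, grid, kernel, gamma=0.67, prior=None):
    """Newton's predictive recursion estimate of a mixing density on an even grid.

    f_i(u) = (1 - w_i) f_{i-1}(u) + w_i * k(x_i | u) f_{i-1}(u) / m_{i-1}(x_i),
    m_{i-1}(x) = integral of k(x | u) f_{i-1}(u) du,  w_i = (i + 1) ** (-gamma).
    """
    grid = np.asarray(grid, dtype=float)
    du = grid[1] - grid[0]  # assumes evenly spaced grid points
    if prior is None:
        f = np.full(grid.size, 1.0 / (du * grid.size))  # uniform start on the grid
    else:
        f = np.asarray(prior, dtype=float)
    for i, xi in enumerate(x, start=1):
        w = (i + 1) ** (-gamma)
        k = kernel(xi, grid)
        m = np.sum(k * f) * du          # Riemann-sum marginal density of x_i
        f = (1.0 - w) * f + w * k * f / m
    return f
```

Each update keeps the Riemann-sum integral of `f` equal to one, so the output is directly usable as a discretized density. Unlike a Gibbs sampler, the result depends on the order in which the data are processed.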
|