11 |
Bayesian Methods for Data-Dependent Priors
Darnieder, William Francis (22 July 2011)
No description available.
|
12 |
The Cauchy-Net Mixture Model for Clustering with Anomalous Data
Slifko, Matthew D. (11 September 2019)
We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible Bayesian nonparametric tool that employs a mixture between a Dirichlet Process Mixture Model (DPMM) and a Cauchy distributed component, which we call the Cauchy-Net (CN). Each portion of the model offers benefits: the DPMM removes the need to fix the number of components in advance, and the CN uses its heavy tails to capture observations that do not belong to the well-defined components. By isolating the anomalous observations in a single component, we simultaneously flag the observations in the net as warranting further inspection and prevent them from interfering with the formation of the remaining components. The result is a framework that allows for simultaneously clustering observations and making predictions in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predict housing prices in Fairfax County, Virginia. / Doctor of Philosophy / We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible tool for identifying and isolating the anomalies, while simultaneously discovering cluster structure and making predictions among the nonanomalous observations. The result is a framework that allows for simultaneously clustering and predicting in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predict housing prices in Fairfax County, Virginia.
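The role of the heavy-tailed component can be illustrated with a deliberately simplified sketch. It assumes a two-component mixture (one Gaussian cluster plus one wide Cauchy "net") rather than the full DPMM of the thesis, and the function names, the weight `net_weight`, and all parameter values are hypothetical choices made for illustration only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian cluster component."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x, loc=0.0, scale=1.0):
    """Density of a Cauchy component; its tails decay only polynomially."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))

def net_responsibility(x, net_weight=0.05, mu=0.0, sigma=1.0, loc=0.0, scale=5.0):
    """Posterior probability that x came from the Cauchy component.

    Assumption: a fixed two-component mixture with known parameters,
    standing in for the DPMM-plus-net structure described in the abstract.
    """
    net = net_weight * cauchy_pdf(x, loc, scale)
    cluster = (1.0 - net_weight) * normal_pdf(x, mu, sigma)
    return net / (net + cluster)

# A typical point stays with the cluster; a far-out point is caught by the net.
print(net_responsibility(0.0))
print(net_responsibility(8.0))
```

Under these assumed settings, a point near the cluster center is assigned almost entirely to the Gaussian, while a point eight standard deviations away is assigned almost entirely to the Cauchy component, which is the isolating behavior the abstract describes.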
|
13 |
Adaptive methods for risk calibration
Weining, Wang (19 September 2012)
This work comprises four chapters. The first chapter, entitled "Local Quantile Regression", is summarized as follows. Quantile regression is a technique for estimating conditional quantile curves. It provides a comprehensive picture of a response contingent on explanatory variables. In a flexible modeling framework, a specific form of the conditional quantile curve is not fixed a priori. This motivates a local parametric rather than a global fixed-model fitting approach. A nonparametric smoothing estimate of the conditional quantile curve requires balancing local curvature against stochastic variability. In the first essay, we suggest a local model selection technique that provides an adaptive estimate of the conditional quantile regression curve at each design point. Theoretical results show that the proposed adaptive procedure performs as well as an oracle that would minimize the local estimation risk for the problem at hand. We illustrate the performance of the procedure with an extensive simulation study and consider two applications: tail dependence analysis for the Hong Kong stock market and analysis of the distributions of the risk factors of temperature dynamics.
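As background for the quantile regression idea above, the following minimal sketch shows the standard check (pinball) loss and verifies that minimizing it over a constant recovers a sample quantile. It is not the adaptive local procedure of the thesis; a local version would, under the usual construction, weight each term by a kernel centered at the design point. The helper names are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def constant_quantile_fit(ys, tau):
    """Constant c minimizing sum_i rho_tau(y_i - c).

    The objective is piecewise linear and convex in c, so a minimizer
    is always attained at one of the observations; we search only those.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]
print(constant_quantile_fit(ys, 0.5))  # the sample median
print(constant_quantile_fit(ys, 0.9))  # an upper quantile
```

The outlier at 100 leaves the median fit untouched, which is one reason quantile curves give a fuller picture of a response than a mean curve alone.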
|
14 |
Bayesian Nonparametric Modeling of Latent Structures
Xing, Zhengming (January 2014)
An unprecedented amount of data has been collected in diverse fields such as social networks, infectious disease, and political science in this era of information explosion. High-dimensional, complex, and heterogeneous data pose tremendous challenges for traditional statistical models. Bayesian nonparametric methods address these challenges by providing models whose complexity can grow with the data. In this thesis, we design novel Bayesian nonparametric models for datasets from three different fields: hyperspectral image analysis, infectious disease, and voting behavior.
First, we consider analysis of noisy and incomplete hyperspectral imagery, with the objective of removing the noise and inferring the missing data. The noise statistics may be wavelength-dependent, and the fraction of data missing (at random) may be substantial, including potentially entire bands, offering the potential to significantly reduce the quantity of data that need be measured. We achieve this objective by employing a Bayesian dictionary learning model, considering two distinct means of imposing sparse dictionary usage and drawing the dictionary elements from a Gaussian process prior, which imposes structure on the wavelength dependence of the dictionary elements.
Second, a Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by three terms: (i) a general time-evolving trend of the overall population, (ii) a semi-periodic term that accounts for effects caused by the days of the week, and (iii) a regression term that relates the probability of infection to covariates (here, specifically, to the Google Flu Trends data).
Third, extensive information on 3 million randomly sampled United States citizens is used to construct a statistical model of constituent preferences for each U.S. congressional district. This model is linked to the legislative voting record of the legislator from each district, yielding an integrated model for constituency data, legislative roll-call votes, and the text of the legislation. The model is used to examine the extent to which legislators' voting records are aligned with constituent preferences, and the implications of that alignment (or lack thereof) for subsequent election outcomes. The analysis is based on a Bayesian nonparametric formalism, with fast inference via a stochastic variational Bayesian analysis. / Dissertation
|
15 |
Análise bayesiana de densidades aleatórias simples / Bayesian analysis of simple random densities
Paulo Cilas Marques Filho (19 December 2011)
Starting from a known partition of a bounded interval of the real line into subintervals, we define a prior distribution over a class of densities with respect to Lebesgue measure by constructing a random density whose realizations are nonnegative simple functions that integrate to one and take a constant value on each subinterval of the partition. These simple random densities are used in the Bayesian analysis of a set of absolutely continuous observables, and the prior distribution is proved to be closed under sampling. We explore the prior and posterior distributions through stochastic simulations and find Bayesian solutions to the problem of density estimation. Simulation results show the asymptotic behavior of the posterior distribution as the size of the analyzed data samples increases. When the partition is unknown, we propose a choice criterion based on the information contained in the sample. Although the expectation of a simple random density is always a discontinuous density, we obtain smooth estimates by solving a decision problem in which the states of nature are realizations of the simple random density and the actions are smooth densities of a suitable class.
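The phrase "closed under sampling" can be made concrete with a small sketch. It assumes, purely for illustration, that the probabilities of the subintervals carry a Dirichlet prior; the abstract does not state which prior the thesis uses, so the function name, the `alpha` weights, and the update rule below are hypothetical stand-ins rather than the author's construction.

```python
from bisect import bisect_right

def posterior_mean_density(edges, alpha, data):
    """Heights of the posterior-mean step density on each subinterval.

    edges : sorted partition points, with len(edges) == len(alpha) + 1
    alpha : Dirichlet prior weights on the subinterval probabilities
            (an assumed prior, not taken from the thesis)
    data  : observations lying in [edges[0], edges[-1]]
    """
    k = len(alpha)
    counts = [0] * k
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError("observation outside the partitioned interval")
        # Index of the subinterval containing x; the right endpoint of
        # the whole interval is assigned to the last subinterval.
        j = min(bisect_right(edges, x) - 1, k - 1)
        counts[j] += 1
    total = sum(alpha) + len(data)
    # Conjugate update: the posterior is Dirichlet(alpha + counts), so the
    # mean probability of bin j is (alpha_j + n_j) / total; dividing by the
    # bin width converts that probability into a density height.
    return [(alpha[j] + counts[j]) / total / (edges[j + 1] - edges[j])
            for j in range(k)]

edges = [0.0, 0.5, 1.0]
heights = posterior_mean_density(edges, alpha=[1.0, 1.0],
                                 data=[0.1, 0.2, 0.3, 0.9])
print(heights)
```

Under this assumption the posterior stays in the same family as the prior, and the resulting estimate is a step function, consistent with the abstract's remark that the expectation of a simple random density is discontinuous.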
|
17 |
Bayesian Nonparametric Modeling and Theory for Complex Data
Pati, Debdeep (January 2012)
The dissertation focuses on solving some important theoretical and methodological problems associated with Bayesian modeling of infinite dimensional 'objects', popularly called nonparametric Bayes. The term 'infinite dimensional object' can refer to a density, a conditional density, a regression surface or even a manifold. Although Bayesian density estimation as well as function estimation are well-justified in the existing literature, there has been little or no theory justifying the estimation of more complex objects (e.g. conditional density, manifold, etc.). Part of this dissertation focuses on exploring the structure of the spaces on which the priors for conditional densities and manifolds are supported while studying how the posterior concentrates as increasing amounts of data are collected.
With the advent of new acquisition devices, there has been a need to model complex objects associated with complex data-types, e.g. millions of genes affecting a bio-marker, 2D pixelated images, a cloud of points in the 3D space, etc. A significant portion of this dissertation has been devoted to developing adaptive nonparametric Bayes approaches for learning low-dimensional structures underlying higher-dimensional objects, e.g. a high-dimensional regression function supported on a lower dimensional space, closed curves representing the boundaries of shapes in 2D images and closed surfaces located on or near the point cloud data. Characterizing the distribution of these objects has a tremendous impact in several application areas ranging from tumor tracking for targeted radiation therapy, to classifying cells in the brain, to model based methods for 3D animation and so on.
The first three chapters are devoted to Bayesian nonparametric theory and modeling in unconstrained Euclidean spaces, e.g. mean regression and density regression, the next two focus on Bayesian modeling of manifolds, e.g. closed curves and surfaces, and the final one on nonparametric Bayes spatial point pattern data modeling when the sampling locations are informative of the outcomes. / Dissertation
|
18 |
Bayesian Hierarchical, Semiparametric, and Nonparametric Methods for International New Product Diffusion
Hartman, Brian Matthew (August 2010)
Global marketing managers are keenly interested in being able to predict the sales of their new products. Understanding how a product is adopted over time allows the managers to optimally allocate their resources. With the world becoming ever more global, there are strong and complex interactions between the countries in the world. My work explores how to describe the relationship between those countries and determines the best way to leverage that information to improve the sales predictions.
In Chapter II, I describe how diffusion speed has changed over time. The most recent major study on this topic, by Christophe Van den Bulte, investigated new product diffusions in the United States. Van den Bulte notes that a similar study is needed in the international context, especially in developing countries. Additionally, his model contains the implicit assumption that the diffusion speed parameter is constant throughout the life of a product. I model the time component as a nonparametric function, allowing the speed parameter the flexibility to change over time. I find that early in the product's life, the speed parameter is higher than expected. Additionally, as the Internet has grown in popularity, the speed parameter has increased.
In Chapter III, I examine whether the interactions can be described through a reference hierarchy in addition to the cross-country word-of-mouth effects already in the literature. I also expand the word-of-mouth effect by relating the magnitude of the effect to the distance between the two countries. The current literature only applies that effect equally to the n closest countries (forming a neighbor set). This also leads to an analysis of how to best measure the distance between two countries. I compare four possible distance measures: distance between the population centroids, trade flow, tourism flow, and cultural similarity. Including the reference hierarchy improves the predictions by 30 percent over the current best model.
Finally, in Chapter IV, I look more closely at the Bass Diffusion Model. It is prominently used in the marketing literature and is the base of my analysis in Chapter III. All of the current formulations include the implicit assumption that all the regression parameters are equal for each country. A one-dollar increase in GDP should have more of an effect in a poor country than in a rich country. A Dirichlet process prior enables me to cluster the countries by their regression coefficients. Incorporating the distance measures can improve the predictions by 35 percent in some cases.
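For readers unfamiliar with the Bass Diffusion Model named above, here is a minimal sketch of its standard closed form with constant coefficients. The parameter names `p` (innovation) and `q` (imitation) follow common usage, the numeric values are illustrative assumptions, and none of the hierarchical, cross-country, or time-varying extensions described in the abstract are included.

```python
import math

def bass_cumulative(t, p, q):
    """Cumulative fraction of eventual adopters at time t.

    Standard Bass form: F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}),
    with constant coefficient of innovation p and imitation q.
    """
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_peak_time(p, q):
    """Time at which the adoption rate peaks; meaningful when q > p."""
    return math.log(q / p) / (p + q)

p, q = 0.03, 0.38  # illustrative values only, not estimates from the thesis
for year in (0, 2, 5, 10, 20):
    print(year, round(bass_cumulative(year, p, q), 3))
print("peak at t =", round(bass_peak_time(p, q), 2))
```

Because `p` and `q` are held fixed here, the curve has a single fixed speed; the abstract's Chapter II relaxes exactly that assumption by letting the speed parameter vary over time.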
|
19 |
Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables
Murray, Jared (January 2013)
This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm.
The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science.
The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely the overly strong local independence assumptions. In the proposed model local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Program Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks.
The third contribution is a latent variable model for density regression. Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet process mixture models in density estimation settings (i.e., without covariates) even though these Dirichlet process mixtures have better theoretical properties asymptotically. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index. / Dissertation
|
20 |
BAYESIAN SEMIPARAMETRIC GENERALIZATIONS OF LINEAR MODELS USING POLYA TREES
Schoergendorfer, Angela (01 January 2011)
In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions.
One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations from the assumption of logistic error can result in great bias in odds ratio estimates. A one-step approximation to the Savage-Dickey ratio will be presented as a Bayesian test for distributional assumptions in the traditional logistic regression model. The approximation utilizes least-squares estimates in the place of a full Bayesian Markov Chain simulation, and the equivalence of inferences based on the two implementations will be shown. A framework for flexible, semiparametric estimation of risks in the case that the assumption of logistic error is rejected will be proposed.
A second application deals with regression scenarios in which residuals are correlated and their distribution evolves over an ordinal covariate such as time. In the context of prediction, such complex error distributions need to be modeled carefully and flexibly. The proposed model introduces dependent, but separate Polya tree priors for each time point, thus pooling information across time points to model gradual changes in distributional shapes. Theoretical properties of the proposed model will be outlined, and its potential predictive advantages in simulated scenarios and real data will be demonstrated.
|