271 |
Latent Dirichlet Allocation in R
Ponweiser, Martin 05 1900
Topic models are a new research field within the computer science areas of information retrieval and text mining. They are generative probabilistic models of text corpora, inferred by machine learning, and they can be used for retrieval and text mining tasks. The most prominent topic model is latent Dirichlet allocation (LDA), which was introduced in 2003 by Blei et al. and has since sparked the development of other topic models for domain-specific purposes.
This thesis focuses on LDA's practical application. Its main goal is the replication of the data analyses from the 2004 LDA paper "Finding scientific topics" by Thomas Griffiths and Mark Steyvers within the framework of the R statistical programming language and the R package topicmodels by Bettina Grün and Kurt Hornik. The complete process, including extraction of a text corpus from the PNAS journal's website, data preprocessing, transformation into a document-term matrix, model selection, model estimation, and presentation of the results, is fully documented and commented. The outcome closely matches the analyses of the original paper, so the research by Griffiths and Steyvers can be reproduced. Furthermore, this thesis demonstrates the suitability of the R environment for text mining with LDA. (author's abstract) / Series: Theses / Institute for Statistics and Mathematics
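The "transformation into a document-term matrix" step named in the abstract above can be sketched as follows. This is a minimal illustration in Python, not the thesis's R code; the function name `document_term_matrix`, its `stopwords` and `min_doc_freq` parameters, and the toy documents are all hypothetical.

```python
from collections import Counter


def document_term_matrix(docs, stopwords=frozenset(), min_doc_freq=1):
    """Return (vocabulary, matrix); matrix[d][j] counts term j in document d."""
    # Lowercase, split on whitespace, keep purely alphabetic non-stopword tokens.
    tokenized = [
        [tok for tok in doc.lower().split() if tok.isalpha() and tok not in stopwords]
        for doc in docs
    ]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vocabulary = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    index = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            j = index.get(tok)
            if j is not None:  # terms dropped by min_doc_freq are skipped
                row[j] += 1
        matrix.append(row)
    return vocabulary, matrix


# Toy corpus (hypothetical), keeping only terms found in at least two documents.
docs = ["protein binding sites", "protein folding and binding", "climate model and ocean"]
vocab, dtm = document_term_matrix(docs, stopwords={"and"}, min_doc_freq=2)
print(vocab)
print(dtm)
```

Pruning rare terms before estimation keeps the vocabulary, and therefore the topic-term distributions, small; the threshold used here is arbitrary.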
|
272 |
Modeling Endogenous Treatment Effects with Heterogeneity: A Bayesian Nonparametric Approach
Hu, Xuequn 01 January 2011
This dissertation explores the estimation of endogenous treatment effects in the presence of heterogeneous responses. A Bayesian nonparametric approach is taken to model the heterogeneity in treatment effects. Specifically, I adopt the Dirichlet Process Mixture (DPM) model to capture the heterogeneity and show that the DPM often outperforms the Finite Mixture Model (FMM) by providing more flexible functional forms and thus better model fit. Rather than fixing the number of components in a mixture model, the DPM allows the data and prior knowledge to determine the number of components, thus providing an automatic mechanism for model selection.
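How a Dirichlet process prior lets the number of mixture components grow with the data can be illustrated with the Chinese restaurant process, the clustering scheme a DP induces. The sketch below is a generic illustration only, not the dissertation's selection or Roy-type model; the function name and the concentration values are assumptions chosen for the example.

```python
import random


def chinese_restaurant_process(n, alpha, rng):
    """Sample cluster sizes for n observations under a CRP with concentration alpha."""
    sizes = []  # sizes[k] = number of observations currently in cluster k
    for i in range(n):
        # Observation i joins cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                sizes[k] += 1
                break
        else:
            sizes.append(1)  # no existing cluster chosen: open a new one
    return sizes


rng = random.Random(0)
for alpha in (0.5, 5.0):
    sizes = chinese_restaurant_process(500, alpha, rng)
    print(f"alpha={alpha}: {len(sizes)} clusters")
```

The number of occupied clusters is random and is not fixed in advance; larger `alpha` values favor more clusters, which is the sense in which the prior and the data jointly determine the number of components.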
Two DPM models are presented in this dissertation. The first DPM model is based on a two-equation selection model. A Dirichlet Process (DP) prior is specified on some or all of the parameters of the structural equation, and marginal likelihoods are calculated to select the best DPM model. This model is used to study the incentive and selection effects of having prescription drug coverage on total drug expenditures among Medicare beneficiaries.
The second DPM model utilizes a three-equation Roy-type framework to model the observed heterogeneity that arises due to the treatment status, while the unobserved heterogeneity is handled by separate DPM models for the treated and untreated outcomes. This Roy-type DPM model is applied to a data set consisting of 33,081 independent individuals from the Medical Expenditure Panel Survey (MEPS), and the treatment effects of having private medical insurance on the outpatient expenditures are estimated.
Key Words: Treatment Effects, Endogeneity, Heterogeneity, Finite Mixture Model, Dirichlet Process Prior, Dirichlet Process Mixture, Roy-type Modeling, Importance Sampling, Bridge Sampling
|
273 |
Forced vibrations via Nash-Moser iterations
Fokam, Jean-Marcel 11 April 2014
In this thesis, we prove the existence of large-frequency periodic solutions for the nonlinear wave equation $u_{tt} - u_{xx} - v(x)u = u^3 + f(\Omega t, x)$ (1) with Dirichlet boundary conditions. Here, $\Omega$ represents the frequency of the solution. The method we use to find the periodic solutions $u(\Omega)$ for large $\Omega$ originates in the work of Craig and Wayne [10], where they constructed solutions for free vibrations, i.e., for $f = 0$. Here we construct smooth solutions for forced vibrations ($f \neq 0$). Given an x-dependent analytic potential $v(x)$, previous works on (1) either assume a smallness condition on $f$ or yield a weak solution. The study of equations like (1) goes back at least to Rabinowitz in the sixties [25]. The main difficulty in finding periodic solutions of an equation like (1) is the appearance of small denominators in the linearized operator stemming from the left-hand side. To overcome this difficulty, we use a Nash-Moser scheme introduced by Craig and Wayne in [10]. / text
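The small denominators mentioned in the abstract above can be made explicit with a short calculation. It is a sketch under assumptions not stated in the abstract: $(\lambda_j, \psi_j)$ denote the Dirichlet eigenpairs of $-\partial_x^2 - v(x)$ on the spatial interval, and the time dependence is expanded in the modes $e^{ik\Omega t}$.

```latex
\[
  \bigl(\partial_t^2 - \partial_x^2 - v(x)\bigr)\, e^{ik\Omega t}\,\psi_j(x)
  = \bigl(\lambda_j - k^2\Omega^2\bigr)\, e^{ik\Omega t}\,\psi_j(x),
  \qquad
  -\psi_j'' - v\,\psi_j = \lambda_j\,\psi_j .
\]
```

Inverting the linear part therefore divides each mode by $\lambda_j - k^2\Omega^2$, which can be arbitrarily close to zero for suitable pairs $(j, k)$. In the illustrative case $v = 0$ on $[0, \pi]$, one has $\lambda_j = j^2$ and the divisor factors as $(j - k\Omega)(j + k\Omega)$.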
|
274 |
Diskreti ribinė teorema bendrosioms Dirichlė eilutėms meromorfinių funkcijų erdvėje / A discrete limit theorem for general Dirichlet series in the space of meromorphic functions
Šemiotas, Donatas 29 September 2008
A discrete limit theorem is proved for a subclass of general Dirichlet series in the space of meromorphic functions. The explicit form of the limit measure is given.
|
275 |
Bayesian Hierarchical Models for Model Choice
Li, Yingbo January 2013
<p>With the development of modern data collection approaches, researchers may collect hundreds to millions of variables, yet may not need to utilize all explanatory variables available in predictive models. Hence, choosing models that consist of a subset of variables often becomes a crucial step. In linear regression, variable selection not only reduces model complexity, but also prevents over-fitting. From a Bayesian perspective, prior specification of model parameters plays an important role in model selection as well as parameter estimation, and often prevents over-fitting through shrinkage and model averaging.</p><p>We develop two novel hierarchical priors for selection and model averaging, for Generalized Linear Models (GLMs) and normal linear regression, respectively. They can be considered as "spike-and-slab" prior distributions or, more appropriately, "spike-and-bell" distributions. Under these priors we achieve dimension reduction, since their point masses at zero allow predictors to be excluded with positive posterior probability. In addition, these hierarchical priors have heavy tails to provide robustness when MLEs are far from zero.</p><p>Zellner's g-prior is widely used in linear models. It preserves correlation structure among predictors in its prior covariance, and yields closed-form marginal likelihoods, which lead to huge computational savings by avoiding sampling in the parameter space. Mixtures of g-priors avoid fixing g in advance, and can resolve consistency problems that arise with fixed g. For GLMs, we show that the mixture of g-priors using a Compound Confluent Hypergeometric distribution unifies existing choices in the literature and maintains their good properties, such as tractable (approximate) marginal likelihoods and asymptotic consistency for model selection and parameter estimation under specific values of the hyperparameters.</p><p>While the g-prior is invariant under rotation within a model, a potential problem with the g-prior is that it inherits the instability of ordinary least squares (OLS) estimates when predictors are highly correlated. We build a hierarchical prior based on scale mixtures of independent normals, which incorporates invariance under rotations within models like ridge regression and the g-prior, but has heavy tails like the Zellner-Siow Cauchy prior. We find this method outperforms the gold standard mixture of g-priors and other methods in the case of highly correlated predictors in Gaussian linear models. We incorporate a non-parametric structure, the Dirichlet Process (DP), as a hyperprior to allow more flexibility and adaptivity to the data.</p> / Dissertation
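The closed-form marginal likelihood that makes Zellner's g-prior attractive can be illustrated for the simplest setting: a normal linear model with an intercept and a fixed g, compared against the intercept-only model. The sketch below uses the standard fixed-g Bayes factor, which depends on the data only through the model's R²; it is not the dissertation's mixture-of-g-priors construction, and the function name and example numbers are hypothetical.

```python
import math


def log_bayes_factor_g_prior(r_squared, n, p, g):
    """Log Bayes factor of a p-predictor model vs. the intercept-only model.

    Standard fixed-g result for Gaussian linear regression:
        BF = (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2)
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    if g <= 0 or n <= p + 1:
        raise ValueError("need g > 0 and n > p + 1")
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r_squared)))


# Hypothetical comparison with n = 100 observations and g = n (unit-information choice).
n = 100
small = log_bayes_factor_g_prior(r_squared=0.30, n=n, p=2, g=n)
large = log_bayes_factor_g_prior(r_squared=0.31, n=n, p=6, g=n)
print("2-predictor model:", round(small, 3))
print("6-predictor model:", round(large, 3))
```

No sampling over regression coefficients is needed: each candidate model is scored from `n`, `p`, and its R² alone, which is where the computational savings come from.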
|
276 |
An evaluation of latent Dirichlet allocation in the context of plant-pollinator networks
Callaghan, Liam 08 January 2013
There may be several mechanisms that drive observed interactions
between plants and pollinators in an ecosystem, many of which may
involve trait matching or trait complementarity. Hence a model of
insect species activity on plant species should be represented as
a mixture of these linkage rules. Unfortunately, ecologists do not
always know how many, or even which, traits are the main contributors
to the observed interactions. This thesis proposes the Latent Dirichlet
Allocation (LDA) model from artificial intelligence for modelling
the observed interactions in an ecosystem as a finite mixture of
(latent) interaction groups in which plant and pollinator pairs that
share common linkage rules are placed in the same interaction group.
Several model selection criteria are explored for estimating how many
interaction groups best describe the observed interactions. This thesis
also introduces a new model selection score called "penalized perplexity".
The performance of the model selection criteria, and of LDA in general,
is evaluated through a comprehensive simulation study that considers
networks of various sizes along with varying levels of nesting and numbers of
interaction groups. Results of the simulation study suggest that LDA
works well on networks with mild-to-no nesting, but loses accuracy with
increased nestedness. Further, the penalized perplexity tended to
outperform the other model selection criteria in identifying the correct
number of interaction groups used to simulate the data. Finally, LDA was
demonstrated on a real network, the results of which provided insights
into the functional roles of pollinator species in the study region.
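For orientation, ordinary (unpenalized) perplexity, the usual starting point for choosing the number of groups in LDA, can be computed as below. This is a generic sketch: the thesis's "penalized perplexity" is not defined in the abstract and is not reproduced here, and the argument names, as well as the reading of rows as pollinator species and columns as plant species, are assumptions made for the example.

```python
import math


def perplexity(counts, row_group, group_col):
    """Perplexity of a fitted mixture on a count matrix.

    counts[d][w]    : observed interaction count for row d and column w
    row_group[d][k] : P(group k | row d)
    group_col[k][w] : P(column w | group k)
    """
    log_lik = 0.0
    total = 0
    n_groups = len(group_col)
    for d, row in enumerate(counts):
        for w, c in enumerate(row):
            if c == 0:
                continue
            # Mixture probability of observing column w in row d.
            p = sum(row_group[d][k] * group_col[k][w] for k in range(n_groups))
            log_lik += c * math.log(p)
            total += c
    return math.exp(-log_lik / total)


# Toy example: a model that spreads mass uniformly over 4 columns has perplexity 4.
counts = [[1, 2, 0, 1], [3, 0, 0, 1]]
row_group = [[0.5, 0.5], [0.2, 0.8]]
group_col = [[0.25] * 4, [0.25] * 4]
print(perplexity(counts, row_group, group_col))
```

Lower values indicate a better fit. Because in-sample perplexity tends to keep falling as groups are added, it is usually computed on held-out data or combined with a complexity penalty.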
|
277 |
Sudėtinės funkcijos universalumas / Universality of one composite function
Tamašauskaitė, Ugnė 30 July 2013
A bachelor's thesis proving the universality of a composite function.
|
278 |
Jungtine universalumo teorema Dirichle L funkcijoms / Joint universality theorem for Dirichlet L-functions
Daukšaitė, Renata 02 July 2012
Let χ be a Dirichlet character modulo q, and let s = σ + it be a complex variable. The Dirichlet L-function L(s, χ) is defined in the half-plane σ > 1 by a Dirichlet series. It is well known that, when χ is not the principal character, L(s, χ) can be analytically continued to the whole complex plane, that is, it is an entire function. If χ is the principal character, the function has a simple pole. In 1975, S. M. Voronin discovered a remarkable universality property of the functions L(s, χ). Roughly speaking, it means that every analytic function in a certain region can be approximated to any desired accuracy by shifts L(s + iτ, χ) of L-functions. A somewhat more general, joint version of the theorem is now known, in which χ_1, ..., χ_r are Dirichlet characters satisfying the conditions of Theorem 1, but its proof has not been published anywhere. The aim of this master's thesis is to fill that gap by giving a proof of this joint universality theorem for Dirichlet L-functions.
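The Dirichlet series that defines L(s, χ) for Re s > 1 can be evaluated numerically by truncation. The sketch below uses the non-principal character modulo 4 as a concrete example; the function names are hypothetical, and the code only illustrates the definition, not the universality theorem.

```python
def chi_mod_4(n):
    """Non-principal Dirichlet character modulo 4: +1, 0, -1, 0 on residues 1, 2, 3, 0."""
    r = n % 4
    if r == 1:
        return 1
    if r == 3:
        return -1
    return 0


def dirichlet_l_partial_sum(s, character, terms):
    """Partial sum of the Dirichlet series sum_{n >= 1} character(n) / n**s."""
    total = 0
    for n in range(1, terms + 1):
        c = character(n)
        if c != 0:
            total += c / n ** s
    return total


# At s = 2 the series for this character converges to Catalan's constant (about 0.9159656).
print(dirichlet_l_partial_sum(2, chi_mod_4, 100_000))
# Complex arguments work too, e.g. a point shifted along the vertical line Re s = 2.
print(dirichlet_l_partial_sum(complex(2, 1), chi_mod_4, 100_000))
```

Truncation is reliable only in the half-plane of absolute convergence; values elsewhere require the analytic continuation described above.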
|
279 |
Bayesian Methods for Two-Sample Comparison
Soriano, Jacopo January 2015
<p>Two-sample comparison is a fundamental problem in statistics. Given two samples of data, the interest lies in understanding whether the two samples were generated by the same distribution or not. Traditional two-sample comparison methods are not suitable for modern data where the underlying distributions are multivariate and highly multi-modal, and the differences across the distributions are often locally concentrated. The focus of this thesis is to develop novel statistical methodology for two-sample comparison which is effective in such scenarios. Tools from the nonparametric Bayesian literature are used to flexibly describe the distributions. Additionally, the two-sample comparison problem is decomposed into a collection of local tests on individual parameters describing the distributions. This strategy not only yields high statistical power, but also allows one to identify the nature of the distributional difference. In many real-world applications, detecting the nature of the difference is as important as the existence of the difference itself. Generalizations to multi-sample comparison and more complex statistical problems, such as multi-way analysis of variance, are also discussed.</p> / Dissertation
|
280 |
Dirichlė L funkcijų universalumas / Universality of Dirichlet L-functions
Jančiauskienė, Dovilija 17 July 2014
The Russian mathematician S. M. Voronin proved that a single function can be used to approximate, to any desired accuracy, any analytic function on certain sets of the complex plane. However, he did not prove the analogue of Theorem 1 for Dirichlet L-functions. The aim of this thesis is to provide a complete proof of that theorem.
|