  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Estimation in accelerated failure time models for correlated survival data (Estimação em modelos de tempo de falha acelerado para dados de sobrevivência correlacionados)

Santos, Patrícia Borchardt 01 December 2009 (has links)
We present two estimation methods for accelerated failure time (AFT) models with random effects for correlated (grouped) survival data. The first method, implemented in SAS through the NLMIXED procedure, uses adaptive Gauss-Hermite quadrature to obtain the marginal likelihood. The second method, implemented in the free software R, is based on penalized likelihood to estimate the model parameters. For the first method we describe the main theoretical aspects; for the second, we briefly present the approach adopted, together with a simulation study to investigate the method's performance. We apply the models to real data on the operating time of oil wells in the Potiguar Basin (RN/CE).
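As a rough illustration of the first method's key step, the Python sketch below shows how Gauss-Hermite quadrature replaces the integral over a cluster's random effect with a weighted sum over nodes. It assumes, for illustration only, a log-normal AFT model with a normal random intercept per cluster, log T_ij = mu + b_i + s * eps_ij with eps_ij ~ N(0, 1) and b_i ~ N(0, sigma^2), plus right censoring. It uses the plain (non-adaptive) rule; the adaptive rule used by NLMIXED additionally recentres and rescales the nodes around the mode of each cluster's integrand. All function names are hypothetical, and this is not the thesis's SAS or R code.

```python
import math


def gauss_hermite(n):
    """Nodes x_k and weights w_k with  int exp(-x^2) f(x) dx ~= sum_k w_k f(x_k).

    Newton's method on the orthonormal Hermite polynomials, with the
    initial root guesses of the classic Numerical Recipes `gauher` routine.
    """
    x, w = [0.0] * n, [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):  # roots are symmetric: find the largest half
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-0.16667)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * x[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * x[1]
        else:
            z = 2.0 * z - x[i - 2]
        for _ in range(100):
            p1, p2 = math.pi ** -0.25, 0.0
            for j in range(1, n + 1):  # three-term recurrence up to degree n
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            dp = math.sqrt(2.0 * n) * p2  # derivative of the degree-n polynomial
            z_old, z = z, z - p1 / dp
            if abs(z - z_old) < 1e-14:
                break
        x[i], x[n - 1 - i] = z, -z
        w[i] = w[n - 1 - i] = 2.0 / (dp * dp)
    return x, w


def normal_logpdf(y, mean, sd):
    u = (y - mean) / sd
    return -0.5 * u * u - math.log(sd * math.sqrt(2.0 * math.pi))


def normal_logsf(y, mean, sd):
    """log P(Y > y) for Y ~ N(mean, sd^2): contribution of a censored time."""
    return math.log(0.5 * math.erfc((y - mean) / (sd * math.sqrt(2.0))))


def cluster_marginal_likelihood(log_times, events, mu, s, sigma, n_nodes=20):
    """Marginal likelihood of one cluster, integrating out b ~ N(0, sigma^2).

    Assumed model (illustration only): log T_j = mu + b + s * eps_j.
    events[j] = 1 for an observed failure, 0 for a right-censored time.
    Substituting b = sqrt(2) * sigma * x turns the N(0, sigma^2) density
    into exp(-x^2) / sqrt(pi), which matches the Gauss-Hermite weight.
    """
    nodes, weights = gauss_hermite(n_nodes)
    total = 0.0
    for x_k, w_k in zip(nodes, weights):
        b = math.sqrt(2.0) * sigma * x_k
        loglik = sum(
            normal_logpdf(y, mu + b, s) if d else normal_logsf(y, mu + b, s)
            for y, d in zip(log_times, events)
        )
        total += w_k * math.exp(loglik)
    return total / math.sqrt(math.pi)
```

With one observation per cluster, log T is exactly N(mu, s^2 + sigma^2), which gives a closed form to check the quadrature against; with several observations, the product of their contributions at each node is what induces the within-cluster correlation. Densities here are on the log-time scale, which differs from the time-scale density only by a factor 1/t that does not involve the parameters.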
12

Scalable Sparse Bayesian Nonparametric and Matrix Tri-factorization Models for Text Mining Applications

Ranganath, B N January 2017 (has links) (PDF)
Hierarchical Bayesian Models and matrix factorization methods provide an unsupervised way to learn latent components from grouped or sequence data. For example, in document data, a latent component corresponds to a topic, with each topic being a distribution over a finite vocabulary of words. In many applications, sparse relationships exist between the domain entities and the latent components of the data. Traditional approaches to topic modelling do not take these sparsity considerations into account. Modelling these sparse relationships helps extract relevant information, leading to improved topic accuracy and scalable solutions. In this thesis, we explore these sparse relationships for different applications, such as text segmentation, topical analysis, and entity resolution in dyadic data, through Bayesian and matrix tri-factorization approaches, proposing scalable solutions. In our first work, we address the problem of segmenting a collection of sequence data, such as documents, using probabilistic models. Existing state-of-the-art Hierarchical Bayesian Models are connected to the notion of either Complete Exchangeability or Markov Exchangeability. Bayesian nonparametric models based on Markov Exchangeability, such as the HDP-HMM and the Sticky HDP-HMM, allow only very restricted permutations of latent variables in grouped data (topics in documents), which in turn leads to computational challenges for inference. At the other extreme, models based on Complete Exchangeability, such as the HDP, allow arbitrary permutations within each group or document; inference is significantly more tractable as a result, but segmentation is not meaningful with such models.
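The two exchangeability notions above can be made concrete with their standard sufficient statistics: under Complete Exchangeability a sequence's probability depends only on how often each symbol occurs, while under Markov Exchangeability it depends only on the first symbol and the count of each transition. The Python sketch below uses a first-order Markov chain, one example of a Markov exchangeable distribution, with made-up probabilities and hypothetical helper names; it does not model Block Exchangeability or BEM.

```python
import math
from collections import Counter


def complete_stat(seq):
    """Complete exchangeability: probability depends only on symbol counts."""
    return Counter(seq)


def markov_stat(seq):
    """Markov exchangeability: probability depends only on the first symbol
    and on the counts of each transition (a, b)."""
    return seq[0], Counter(zip(seq, seq[1:]))


def markov_prob(seq, init, trans):
    """Probability of `seq` under a first-order Markov chain."""
    p = init[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= trans[a][b]
    return p


# Illustrative (made-up) chain over the symbols "a" and "b".
init = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

s1, s2, s3 = "aabab", "abaab", "baaab"
# s1 and s2 share the first symbol and all transition counts: equal probability.
print(markov_stat(s1) == markov_stat(s2),
      math.isclose(markov_prob(s1, init, trans), markov_prob(s2, init, trans)))
# s1 and s3 share symbol counts only: a Markov chain can tell them apart.
print(complete_stat(s1) == complete_stat(s3), markov_stat(s1) == markov_stat(s3))
```

Any permutation preserving the Markov statistic also preserves the symbol counts, but not conversely, which is why completely exchangeable models allow far more permutations than Markov exchangeable ones.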
To overcome these problems, we explore a new notion of exchangeability, called Block Exchangeability, that lies between Markov Exchangeability and Complete Exchangeability: segmentation is meaningful under it, but inference is computationally less expensive than under both Markov and Complete Exchangeability. In terms of parameters, Block Exchangeability requires a sparser set of transition parameters: their number is linear in the number of states, compared with quadratic for Markov Exchangeability, and still smaller than for Complete Exchangeability, whose parameters are on the order of the number of documents. Based on this notion, we propose a nonparametric Block Exchangeable Model (BEM), and we show that Block Exchangeability is a superclass of Complete Exchangeability and a subclass of Markov Exchangeability. We propose a scalable inference algorithm for BEM that uses collapsed Gibbs sampling to infer the topics of the words and the segment boundaries associated with topics in a document. Empirical results on a news dataset, a product review dataset, and a synthetic dataset show that BEM outperforms state-of-the-art nonparametric models in scalability and generalization ability, while achieving nearly the same segmentation quality. Interestingly, scalability can be tuned by varying the block size through a parameter of the model, at a small cost in segmentation quality. In addition to the association between documents and words, we also explore sparse relationships in dyadic data, where associations between one pair of domain entities, such as (documents, words), and associations between another pair, such as (documents, users), are completely observed. We motivate the analysis of such dyadic data by introducing an additional discrete dimension, which we call topics, and we explore sparse relationships between the domain entities and the topics, such as user-topic and document-topic relationships.
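The BEM sampler itself is not specified in this abstract. As a generic illustration of what "collapsed" Gibbs sampling means, the Python sketch below implements it for plain LDA, a completely exchangeable model and explicitly not BEM: the topic proportions and topic-word distributions are integrated out, so only each token's topic assignment is resampled from count statistics. The function name and the hyperparameter defaults are assumptions of this sketch.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_sweeps=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustration only, not BEM).

    docs: list of documents, each a list of word ids in range(vocab_size).
    The per-document topic proportions and per-topic word distributions are
    integrated out, so only the token-level topic assignments are sampled.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens per topic

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic for every token.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    topics = range(n_topics)
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the current token
                # p(z_i = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                           / (n_k[k] + vocab_size * beta) for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)
    return z, n_dk, n_kw
```

A segmentation model such as BEM would additionally have to sample segment boundaries and restrict which topics may follow one another; only the counting-and-resampling pattern is shared with this sketch.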
In our second work, we address this problem of sparse topical analysis of dyadic data with a formulation based on sparse matrix tri-factorization. This formulation requires sparsity constraints not only on the individual factor matrices but also on the product of two of the factors. To the best of our knowledge, this problem of sparse matrix tri-factorization has not been studied before. We propose a solution that introduces a surrogate for the product of factors and enforces sparsity on this surrogate, as well as on the individual factors, through L1-regularization. The resulting optimization problem can be solved efficiently in an alternating minimization framework over sub-problems involving individual factors, using the well-known FISTA algorithm. For the constrained sub-problems, we use a projected variant of FISTA. We also show that our formulation leads to independent sub-problems when solving for a factor matrix, which supports a parallel implementation and hence a scalable solution. Experiments on bibliographic and product review data show that the proposed sparse tri-factorization framework achieves better generalization ability and factorization accuracy than baselines that use sparse bi-factorization. Although the second work performs sparse topical analysis of dyadic data, finding sparse topical associations for the users, user references with different names may belong to the same entity, and references with the same name may belong to different entities. The problem of entity resolution, whose goal is to identify the real users associated with the user references in documents, is widely studied in the research community.
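With the other factors held fixed, a sub-problem of this kind can be written as an L1-regularized least-squares problem, which is what FISTA solves. The Python sketch below assumes a squared-error loss and uses non-negativity as an example of a constraint handled by a projected variant; the thesis's exact objective, constraints, and surrogate term are not reproduced, and all names are hypothetical. For instance, if X is approximated by U S V^T with S and V fixed, each row of U would solve such a problem with A = (S V^T)^T, independently of the other rows, which is one way sub-problems can become parallel.

```python
import math


def matvec(A, x):
    """A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """A^T r for a dense matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def fista_l1(A, b, lam, nonneg=False, iters=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 with FISTA.

    nonneg=True adds the constraint x >= 0 (a projected/proximal variant).
    """
    # ||A||_F^2 >= ||A||_2^2, so 1/L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * len(A[0])
    y, t = x[:], 1.0
    for _ in range(iters):
        residual = [r - bi for r, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        z = [yi - step * gi for yi, gi in zip(y, grad)]
        # Proximal step: soft-thresholding, then clipping at zero if x >= 0
        # is required (together this is the exact prox of the combined term).
        x_new = [math.copysign(max(abs(zi) - step * lam, 0.0), zi) for zi in z]
        if nonneg:
            x_new = [max(v, 0.0) for v in x_new]
        # Nesterov-style momentum that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# With A = I the solution is simply the soft-thresholded b.
print(fista_l1([[1.0, 0.0], [0.0, 1.0]], [3.0, -2.0], lam=1.0))
```

Because the L1 penalty sets small coefficients exactly to zero at every proximal step, the factor rows come out sparse without any separate pruning.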
Finally, we focus on the problem of entity resolution in dyadic data, where associations between one pair of domain entities, such as documents-words, and associations between another pair, such as documents-users, are observed; bibliographic data is one example. In our final work, for entity resolution in bibliographic data, we propose a Bayesian nonparametric "Sparse entity resolution model" (SERM) that exploits the sparse relationships between groups of documents and the topics/author entities in each group. We also exploit the sparseness between an author entity and its associated author aliases. Documents are grouped using the stick-breaking prior for Dirichlet processes (DP). To achieve sparseness, we introduce separate Indian Buffet process (IBP) priors over the topics and the author entities of each group, and a k-NN mechanism for selecting author aliases for the author entities. We propose a scalable inference procedure for SERM that combines the partially collapsed Gibbs sampling scheme of the Focused Topic Model (FTM), the inference scheme used for the parametric IBP prior, and the k-NN mechanism. Experiments on the bibliographic datasets CiteSeer and Rexa show that, compared with the state-of-the-art baseline, the proposed SERM model improves the accuracy of entity resolution by finding relevant author entities through modelling sparse relationships, and that it is scalable.
