Global ETD Search

11	Agrupamento de textos utilizando divergência Kullback-Leibler / Texts grouping using Kullback-Leibler divergence Willian Darwin Junior 22 February 2016 (has links) O presente trabalho propõe uma metodologia para agrupamento de textos que possa ser utilizada tanto em busca textual em geral como mais especificamente na distribuição de processos jurídicos para fins de redução do tempo de resolução de conflitos judiciais. A metodologia proposta utiliza a divergência Kullback-Leibler aplicada às distribuições de frequência dos radicais (semantemas) das palavras presentes nos textos. Diversos grupos de radicais são considerados, formados a partir da frequência com que ocorrem entre os textos, e as distribuições são tomadas em relação a cada um desses grupos. Para cada grupo, as divergências são calculadas em relação à distribuição de um texto de referência formado pela agregação de todos os textos da amostra, resultando em um valor para cada texto em relação a cada grupo de radicais. Ao final, esses valores são utilizados como atributos de cada texto em um processo de clusterização utilizando uma implementação do algoritmo K-Means, resultando no agrupamento dos textos. A metodologia é testada em exemplos simples de bancada e aplicada a casos concretos de registros de falhas elétricas, de textos com temas em comum e de textos jurídicos e o resultado é comparado com uma classificação realizada por um especialista. Como subprodutos da pesquisa realizada, foram gerados um ambiente gráfico de desenvolvimento de modelos baseados em Reconhecimento de Padrões e Redes Bayesianas e um estudo das possibilidades de utilização de processamento paralelo na aprendizagem de Redes Bayesianas. / This work proposes a methodology for grouping texts for the purposes of textual searching in general but also specifically for aiding in distributing law processes in order to reduce time applied in solving judicial conflicts. 
The proposed methodology applies the Kullback-Leibler divergence to the frequency distributions of the word stems occurring in the texts. Several groups of stems are considered, formed according to how frequently the stems occur among the texts, and the distributions are taken with respect to each of those groups. For each group, divergences are computed relative to the distribution of a reference text formed by aggregating all sample texts, yielding one value for each text in relation to each group of stems. Finally, those values are used as attributes of each text in a clustering process using an implementation of the K-Means algorithm, which produces the grouping of the texts. The methodology is tested on simple toy examples and applied to concrete cases of electrical failure records, texts with common themes, and legal texts, and the result is compared with a classification made by an expert. As byproducts of the research, a graphical development environment for models based on Pattern Recognition and Bayesian Networks and a study of the possibilities of using parallel processing in Bayesian Network learning were also produced. Agrupamento de textos Algoritmo K-Means Divergência Kullback-Leibler Informação mútua K-Means algorithm Kullback-Leibler divergence Mutual information Text clustering
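The feature-construction step described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: whitespace tokens stand in for word stems, the function names `kl_divergence` and `kl_features` are hypothetical, and the rule that bins tokens into groups by document frequency is an assumption made only for illustration. The resulting rows would then be passed to a K-Means implementation, which is omitted here.

```python
import math
from collections import Counter


def kl_divergence(p_counts, q_counts):
    """KL(P || Q) in nats for two count dicts; Q must cover P's support."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0:
        # Assumption: a text with no tokens in the group contributes 0.
        return 0.0
    kl = 0.0
    for token, count in p_counts.items():
        p = count / p_total
        q = q_counts[token] / q_total
        kl += p * math.log(p / q)
    return kl


def kl_features(texts, n_groups=2):
    """Return one row per text: its KL divergence from the aggregated
    reference text, computed separately within each token group."""
    # Whitespace tokens stand in for stems (assumption).
    docs = [Counter(t.lower().split()) for t in texts]

    # Reference text: the aggregation of all sample texts.
    reference = Counter()
    for doc in docs:
        reference.update(doc)

    # Document frequency: in how many texts each token occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    n_docs = len(docs)

    def group_of(token):
        # Hypothetical binning: equal-width bins over document frequency.
        return min(n_groups - 1, (doc_freq[token] - 1) * n_groups // n_docs)

    features = []
    for doc in docs:
        row = []
        for g in range(n_groups):
            p = {t: c for t, c in doc.items() if group_of(t) == g}
            q = {t: c for t, c in reference.items() if group_of(t) == g}
            row.append(kl_divergence(p, q))
        features.append(row)
    return features


features = kl_features(["a b", "a c"], n_groups=2)
```

Because the reference text aggregates every sample text, each token that occurs in a text also occurs in the reference, so every divergence in this sketch is finite.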
12	Agrupamento de textos utilizando divergência Kullback-Leibler / Texts grouping using Kullback-Leibler divergence Darwin Junior, Willian 22 February 2016 (has links) O presente trabalho propõe uma metodologia para agrupamento de textos que possa ser utilizada tanto em busca textual em geral como mais especificamente na distribuição de processos jurídicos para fins de redução do tempo de resolução de conflitos judiciais. A metodologia proposta utiliza a divergência Kullback-Leibler aplicada às distribuições de frequência dos radicais (semantemas) das palavras presentes nos textos. Diversos grupos de radicais são considerados, formados a partir da frequência com que ocorrem entre os textos, e as distribuições são tomadas em relação a cada um desses grupos. Para cada grupo, as divergências são calculadas em relação à distribuição de um texto de referência formado pela agregação de todos os textos da amostra, resultando em um valor para cada texto em relação a cada grupo de radicais. Ao final, esses valores são utilizados como atributos de cada texto em um processo de clusterização utilizando uma implementação do algoritmo K-Means, resultando no agrupamento dos textos. A metodologia é testada em exemplos simples de bancada e aplicada a casos concretos de registros de falhas elétricas, de textos com temas em comum e de textos jurídicos e o resultado é comparado com uma classificação realizada por um especialista. Como subprodutos da pesquisa realizada, foram gerados um ambiente gráfico de desenvolvimento de modelos baseados em Reconhecimento de Padrões e Redes Bayesianas e um estudo das possibilidades de utilização de processamento paralelo na aprendizagem de Redes Bayesianas. / This work proposes a methodology for grouping texts for the purposes of textual searching in general but also specifically for aiding in distributing law processes in order to reduce time applied in solving judicial conflicts. 
The proposed methodology applies the Kullback-Leibler divergence to the frequency distributions of the word stems occurring in the texts. Several groups of stems are considered, formed according to how frequently the stems occur among the texts, and the distributions are taken with respect to each of those groups. For each group, divergences are computed relative to the distribution of a reference text formed by aggregating all sample texts, yielding one value for each text in relation to each group of stems. Finally, those values are used as attributes of each text in a clustering process using an implementation of the K-Means algorithm, which produces the grouping of the texts. The methodology is tested on simple toy examples and applied to concrete cases of electrical failure records, texts with common themes, and legal texts, and the result is compared with a classification made by an expert. As byproducts of the research, a graphical development environment for models based on Pattern Recognition and Bayesian Networks and a study of the possibilities of using parallel processing in Bayesian Network learning were also produced. Agrupamento de textos Algoritmo K-Means Divergência Kullback-Leibler Informação mútua K-Means algorithm Kullback-Leibler divergence Mutual information Text clustering
13	Kombinování diskrétních pravděpodobnostních rozdělení pomocí křížové entropie pro distribuované rozhodování / Cross-entropy based combination of discrete probability distributions for distributed decision making Sečkárová, Vladimíra January 2015 Dissertation abstract Title: Cross-entropy based combination of discrete probability distributions for distributed decision making Author: Vladimíra Sečkárová Author's email: seckarov@karlin.mff.cuni.cz Department: Department of Probability and Mathematical Statistics Faculty of Mathematics and Physics, Charles University in Prague Supervisor: Ing. Miroslav Kárný, DrSc., The Institute of Information Theory and Automation of the Czech Academy of Sciences Supervisor's email: school@utia.cas.cz Abstract: In this work we propose a systematic way to combine discrete probability distributions based on decision making theory and information theory, namely the cross-entropy (also known as the Kullback-Leibler (KL) divergence). The optimal combination is a probability mass function minimizing the conditional expected KL divergence. The expectation is taken with respect to a probability density function that also minimizes the KL divergence under problem-reflecting constraints. Although the combination is derived for the case when the sources provide probabilistic information on a common support, it can be applied to other types of given information by the proposed transformation and/or extension. The discussion regarding proposed combining and sequential processing of available data, duplicate data, influence...
14	Cellular diagnostic systems using hidden Markov models Mohammad, Maruf H. 29 November 2006 (has links) Radio frequency system optimization and troubleshooting remains one of the most challenging aspects of working in a cellular network. To stay competitive, cellular providers continually monitor the performance of their networks and use this information to determine where to improve or expand services. As a result, operators are saddled with the task of wading through overwhelmingly large amounts of data in order to trouble-shoot system problems. Part of the difficulty of this task is that for many complicated problems such as hand-off failure, clues about the cause of the failure are hidden deep within the statistics of underlying dynamic physical phenomena like fading, shadowing, and interference. In this research we propose that Hidden Markov Models (HMMs) be used as a method to infer signature statistics about the nature and sources of faults in a cellular system by fitting models to various time-series data measured throughout the network. By including HMMs in the network management tool, a provider can explore the statistical relationships between channel dynamics endemic to a cell and its resulting performance. This research effort also includes a new distance measure between a pair of HMMs that approximates the Kullback-Leibler divergence (KLD). Since there is no closed-form solution to calculate the KLD between the HMMs, the proposed analytical expression is very useful in classification and identification problems. A novel HMM based position location technique has been introduced that may be very useful for applications involving cognitive radios. / Ph. D. Kullback-Leibler divergence 1xEV-DV Cellular diagnostics Dropped call prediction Hidden Markov models Coverage problem Position location
15	BAYESIAN OPTIMAL DESIGN OF EXPERIMENTS FOR EXPENSIVE BLACK-BOX FUNCTIONS UNDER UNCERTAINTY Piyush Pandita (6561242) 10 June 2019 Researchers and scientists across various areas face the perennial challenge of selecting experimental conditions or inputs for computer simulations in order to achieve promising results. The aim of conducting these experiments could be to study the production of a material that has great applicability. One might also be interested in accurately modeling and analyzing a simulation of a physical process through a high-fidelity computer code. The presence of noise in the experimental observations or simulator outputs, called aleatory uncertainty, is usually accompanied by limited amount of data due to budget constraints. This gives rise to what is known as epistemic uncertainty. This problem of designing of experiments with limited number of allowable experiments or simulations under aleatory and epistemic uncertainty needs to be treated in a Bayesian way. The aim of this thesis is to extend the state-of-the-art in Bayesian optimal design of experiments where one can optimize and infer statistics of the expensive experimental observation(s) or simulation output(s) under uncertainty. Mechanical Engineering Optimal experimental design Uncertainty quantification Bayesian inference Non-stationary Gaussian Processes Kullback Leibler divergence Bayesian Optimization Stochastic Optimization
16	Uso dos métodos clássico e bayesiano para os modelos não-lineares heterocedásticos simétricos / Use of the classical and bayesian methods for nonlinear heterocedastic symmetric models Macêra, Márcia Aparecida Centanin 21 June 2011 (has links) Os modelos normais de regressão têm sido utilizados durante muitos anos para a análise de dados. Mesmo nos casos em que a normalidade não podia ser suposta, tentava-se algum tipo de transformação com o intuito de alcançar a normalidade procurada. No entanto, na prática, essas suposições sobre normalidade e linearidade nem sempre são satisfeitas. Como alternativas à técnica clássica, foram desenvolvidas novas classes de modelos de regressão. Nesse contexto, focamos a classe de modelos em que a distribuição assumida para a variável resposta pertence à classe de distribuições simétricas. O objetivo geral desse trabalho é a modelagem desta classe no contexto bayesiano, em particular a modelagem da classe de modelos não-lineares heterocedásticos simétricos. Vale ressaltar que esse trabalho tem ligação com duas linhas de pesquisa, a saber: a inferência estatística abordando aspectos da teoria assintótica e a inferência bayesiana considerando aspectos de modelagem e critérios de seleção de modelos baseados em métodos de simulação de Monte Carlo em Cadeia de Markov (MCMC). Uma primeira etapa consiste em apresentar a classe dos modelos não-lineares heterocedásticos simétricos bem como a inferência clássica dos parâmetros desses modelos. Posteriormente, propomos uma abordagem bayesiana para esses modelos, cujo objetivo é mostrar sua viabilidade e comparar a inferência bayesiana dos parâmetros estimados via métodos MCMC com a inferência clássica das estimativas obtidas por meio da ferramenta GAMLSS. Além disso, utilizamos o método bayesiano de análise de influência caso a caso baseado na divergência de Kullback-Leibler para detectar observações influentes nos dados. 
A implementação computacional foi desenvolvida no software R e para detalhes dos programas pode ser consultado aos autores do trabalho / Normal regression models have been used for many years for data analysis. Even in cases where normality could not be assumed, some kind of transformation was attempted in order to achieve the normality sought. However, in practice, these assumptions of normality and linearity are not always satisfied. As alternatives to the classical technique, new classes of regression models were developed. In this context, we focus on the class of models in which the distribution assumed for the response variable belongs to the class of symmetric distributions. The general aim of this work is the modeling of this class in the Bayesian context, in particular the modeling of the class of symmetric heteroscedastic nonlinear models. This work is connected with two lines of research: statistical inference addressing aspects of asymptotic theory, and Bayesian inference considering aspects of modeling and model selection criteria based on Markov Chain Monte Carlo (MCMC) simulation methods. A first step is to present the class of symmetric heteroscedastic nonlinear models as well as classical inference for the parameters of these models. Subsequently, we propose a Bayesian approach to these models, with the aim of showing its feasibility and comparing Bayesian inference for the parameters estimated via MCMC methods with classical inference for the estimates obtained with the GAMLSS tool. In addition, we use the Bayesian case-by-case influence analysis method based on the Kullback-Leibler divergence to detect influential observations in the data. 
The computational implementation was developed in R, and details of the programs can be requested from the authors of the work. Divergência de Kullback-Leibler GAMLSS GAMLSS Heterocedasticidade Heterocedasticity Kullback-Leibler divergence MCMC MCMC Modelos não-lineares Modelos simétricos Nonlinear models Symmetric models
17	Estimação e diagnóstico na distribuição exponencial por partes em análise de sobrevivência com fração de cura / Estimation and diagnostics for the piecewise exponential distribution in survival analysis with fraction cure Sibim, Alessandra Cristiane 31 March 2011 O principal objetivo deste trabalho é desenvolver procedimentos inferências em uma perspectiva bayesiana para modelos de sobrevivência com (ou sem) fração de cura baseada na distribuição exponencial por partes. A metodologia bayesiana é baseada em métodos de Monte Carlo via Cadeias de Markov (MCMC). Para detectar observações influentes nos modelos considerados foi usado o método bayesiano de análise de influência caso a caso (Cho et al., 2009), baseados na divergência de Kullback-Leibler. Além disso, propomos o modelo destrutivo binomial negativo com fração de cura. O modelo proposto é mais geral que os modelos de sobrevivência com fração de cura, já que permitem estimar a probabilidade do número de causas que não foram eliminadas por um tratamento inicial / The main objective of this work is to develop inferential procedures from a Bayesian perspective for survival models with (or without) a cure fraction based on the piecewise exponential distribution. The Bayesian methodology is based on Markov Chain Monte Carlo (MCMC) methods. To detect influential observations in the models considered, the Bayesian case-deletion influence diagnostic method (Cho et al., 2009), based on the Kullback-Leibler divergence, was used. Furthermore, we propose the destructive negative binomial model with cure fraction. The proposed model is more general than the survival models with cure fraction, since it allows estimating the probability of the number of causes that were not eliminated by an initial treatment. Análise de sobrevivência Bayesian inference Divergência de Kullback-Leibler Inferência bayesiana Kullback-Leibler divergence MCMC methods Measures of diagnostic bayesian Medidas de diagnóstico bayesiano Métodos MCMC Survival analysis
18	Estimação e diagnóstico na distribuição exponencial por partes em análise de sobrevivência com fração de cura / Estimation and diagnostics for the piecewise exponential distribution in survival analysis with fraction cure Alessandra Cristiane Sibim 31 March 2011 O principal objetivo deste trabalho é desenvolver procedimentos inferências em uma perspectiva bayesiana para modelos de sobrevivência com (ou sem) fração de cura baseada na distribuição exponencial por partes. A metodologia bayesiana é baseada em métodos de Monte Carlo via Cadeias de Markov (MCMC). Para detectar observações influentes nos modelos considerados foi usado o método bayesiano de análise de influência caso a caso (Cho et al., 2009), baseados na divergência de Kullback-Leibler. Além disso, propomos o modelo destrutivo binomial negativo com fração de cura. O modelo proposto é mais geral que os modelos de sobrevivência com fração de cura, já que permitem estimar a probabilidade do número de causas que não foram eliminadas por um tratamento inicial / The main objective of this work is to develop inferential procedures from a Bayesian perspective for survival models with (or without) a cure fraction based on the piecewise exponential distribution. The Bayesian methodology is based on Markov Chain Monte Carlo (MCMC) methods. To detect influential observations in the models considered, the Bayesian case-deletion influence diagnostic method (Cho et al., 2009), based on the Kullback-Leibler divergence, was used. Furthermore, we propose the destructive negative binomial model with cure fraction. The proposed model is more general than the survival models with cure fraction, since it allows estimating the probability of the number of causes that were not eliminated by an initial treatment. Análise de sobrevivência Divergência de Kullback-Leibler Inferência bayesiana Medidas de diagnóstico bayesiano Métodos MCMC Bayesian inference Kullback-Leibler divergence MCMC methods Measures of diagnostic bayesian Survival analysis
19	Uso dos métodos clássico e bayesiano para os modelos não-lineares heterocedásticos simétricos / Use of the classical and bayesian methods for nonlinear heterocedastic symmetric models Márcia Aparecida Centanin Macêra 21 June 2011 (has links) Os modelos normais de regressão têm sido utilizados durante muitos anos para a análise de dados. Mesmo nos casos em que a normalidade não podia ser suposta, tentava-se algum tipo de transformação com o intuito de alcançar a normalidade procurada. No entanto, na prática, essas suposições sobre normalidade e linearidade nem sempre são satisfeitas. Como alternativas à técnica clássica, foram desenvolvidas novas classes de modelos de regressão. Nesse contexto, focamos a classe de modelos em que a distribuição assumida para a variável resposta pertence à classe de distribuições simétricas. O objetivo geral desse trabalho é a modelagem desta classe no contexto bayesiano, em particular a modelagem da classe de modelos não-lineares heterocedásticos simétricos. Vale ressaltar que esse trabalho tem ligação com duas linhas de pesquisa, a saber: a inferência estatística abordando aspectos da teoria assintótica e a inferência bayesiana considerando aspectos de modelagem e critérios de seleção de modelos baseados em métodos de simulação de Monte Carlo em Cadeia de Markov (MCMC). Uma primeira etapa consiste em apresentar a classe dos modelos não-lineares heterocedásticos simétricos bem como a inferência clássica dos parâmetros desses modelos. Posteriormente, propomos uma abordagem bayesiana para esses modelos, cujo objetivo é mostrar sua viabilidade e comparar a inferência bayesiana dos parâmetros estimados via métodos MCMC com a inferência clássica das estimativas obtidas por meio da ferramenta GAMLSS. Além disso, utilizamos o método bayesiano de análise de influência caso a caso baseado na divergência de Kullback-Leibler para detectar observações influentes nos dados. 
A implementação computacional foi desenvolvida no software R e para detalhes dos programas pode ser consultado aos autores do trabalho / Normal regression models have been used for many years for data analysis. Even in cases where normality could not be assumed, some kind of transformation was attempted in order to achieve the normality sought. However, in practice, these assumptions of normality and linearity are not always satisfied. As alternatives to the classical technique, new classes of regression models were developed. In this context, we focus on the class of models in which the distribution assumed for the response variable belongs to the class of symmetric distributions. The general aim of this work is the modeling of this class in the Bayesian context, in particular the modeling of the class of symmetric heteroscedastic nonlinear models. This work is connected with two lines of research: statistical inference addressing aspects of asymptotic theory, and Bayesian inference considering aspects of modeling and model selection criteria based on Markov Chain Monte Carlo (MCMC) simulation methods. A first step is to present the class of symmetric heteroscedastic nonlinear models as well as classical inference for the parameters of these models. Subsequently, we propose a Bayesian approach to these models, with the aim of showing its feasibility and comparing Bayesian inference for the parameters estimated via MCMC methods with classical inference for the estimates obtained with the GAMLSS tool. In addition, we use the Bayesian case-by-case influence analysis method based on the Kullback-Leibler divergence to detect influential observations in the data. 
The computational implementation was developed in R, and details of the programs can be requested from the authors of the work. Divergência de Kullback-Leibler GAMLSS Heterocedasticidade MCMC Modelos não-lineares Modelos simétricos GAMLSS Heterocedasticity Kullback-Leibler divergence MCMC Nonlinear models Symmetric models
20	Text-Based Information Retrieval Using Relevance Feedback Krishnan, Sharenya January 2011 Europeana, a freely accessible digital library intended to make Europe's cultural and scientific heritage available to the public, was founded by the European Commission in 2008. The goal was to deliver semantically enriched digital content with multilingual access. Although the amount of content grew, the portal gradually faced the problem of retrieving information held in unstructured form. To complement the Europeana portal services, ASSETS (Advanced Search Service and Enhanced Technological Solutions) was introduced, with services that sought to improve the usability and accessibility of Europeana. My contribution is to study different text-based information retrieval models and their relevance feedback techniques, and to implement one simple model. The thesis gives a detailed overview of the information retrieval process, along with the implementation of the chosen relevance feedback strategy, which generates automatic query expansion. Finally, the thesis concludes with an analysis of the results obtained using relevance feedback, a discussion of the implemented model, and an assessment of the future use of this model, both as a continuation of my work and within ASSETS. Information Retrieval Relevance Feedback Query Expansion Rocchio classification Probabilistic model Lucene Similarity scoring function Kullback-Leibler Divergence (KLD) Engineering and Technology Teknik och teknologier
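The abstract does not say which relevance feedback strategy was implemented, but its keywords list Rocchio classification. As a rough illustration of how relevance feedback can drive query expansion, the sketch below applies the textbook Rocchio update to sparse term-weight vectors. The function name `rocchio`, the dict representation, and the default weights `alpha`, `beta`, and `gamma` are assumptions for illustration, not details taken from the thesis.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query as a dict mapping term -> weight.

    query: dict of term -> weight for the original query.
    relevant / non_relevant: lists of dicts of term -> weight, one per
    document judged relevant or non-relevant.
    """
    new_query = Counter()
    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move toward the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Move away from the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped from the query.
    return {term: w for term, w in new_query.items() if w > 0}


expanded = rocchio(
    query={"kl": 1.0},
    relevant=[{"kl": 1.0, "divergence": 2.0}],
    non_relevant=[{"music": 1.0}],
)
```

In this toy call the term "divergence" enters the query because it appears in a relevant document, which is the query expansion effect; a term seen only in non-relevant documents receives a negative weight and is discarded.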
