1 |
Computational Gains Via a Discretization of the Parameter Space in Individual Level Models of Infectious Disease. FANG, XUAN, 13 January 2012
The Bayesian Markov chain Monte Carlo (MCMC) approach to inference is commonly used to estimate the parameters in spatial infectious disease models. However, such MCMC analyses can impose a heavy computational burden. Here we present a new method to reduce the computing time of such MCMC analyses and study its usefulness. The method is based on discretizing the spatial parameters in the infectious disease model. A normal approximation of the posterior density of the output from the original model is compared to that of the modified model using the Kullback-Leibler (KL) divergence.
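When both posteriors are summarized by normal approximations, the KL divergence used for the comparison has a closed form. The sketch below is a minimal illustration, assuming each posterior is reduced to a moment-matched univariate normal; the function names and the posterior draws are hypothetical, not taken from the thesis.

```python
import math
import statistics


def kl_normal(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) in nats for P = N(mu_p, sd_p^2) and Q = N(mu_q, sd_q^2)."""
    if sd_p <= 0 or sd_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)


def normal_approx(samples):
    """Moment-match a normal distribution to posterior draws (e.g. MCMC output)."""
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical posterior draws of one spatial parameter under the
# original (continuous) model and the discretized model.
original = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0]
discretized = [2.0, 2.2, 2.1, 2.3, 1.9, 2.1]

mu_o, sd_o = normal_approx(original)
mu_d, sd_d = normal_approx(discretized)
score = kl_normal(mu_o, sd_o, mu_d, sd_d)
print(f"KL(original || discretized) = {score:.4f}")
```

A value near zero would suggest that discretization barely changes the approximate posterior. KL is asymmetric, so the direction of the comparison should be fixed in advance.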
|
2 |
Data-rich document geotagging using geodesic grids. Wing, Benjamin Patai, 07 July 2011
This thesis investigates automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just 11.8 kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.
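The abstract does not spell out how grid cells or the reported error are computed. As a hedged sketch, assuming a regular latitude/longitude grid of `cell_deg` degrees (a simplification of a geodesic grid), a spherical Earth of radius 6371 km, and cell centers as the predicted coordinates, a median prediction error could be computed with the haversine formula. All names here are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def cell_of(lat, lon, cell_deg):
    """Row/column index of the cell_deg x cell_deg grid cell containing a point."""
    return (math.floor((lat + 90.0) / cell_deg),
            math.floor((lon + 180.0) / cell_deg))


def cell_center(cell, cell_deg):
    """Latitude/longitude of a cell's center, used as the predicted location."""
    row, col = cell
    return (row + 0.5) * cell_deg - 90.0, (col + 0.5) * cell_deg - 180.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(predicted_cells, true_points, cell_deg):
    """Median distance between predicted cell centers and true coordinates."""
    errors = [haversine_km(*cell_center(cell, cell_deg), lat, lon)
              for cell, (lat, lon) in zip(predicted_cells, true_points)]
    return statistics.median(errors)
```

Coarser cells make the classification easier but raise the floor on the error, because even a correctly chosen cell is scored from its center. This trade-off is one reason grid resolution matters.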
|
3 |
Representações hierárquicas de vocábulos de línguas indígenas brasileiras: modelos baseados em mistura de Gaussianas / Hierarchical representations of words of Brazilian indigenous languages: models based on Gaussian mixture. Sepúlveda Torres, Lianet, 08 December 2010
Although Brazil has a wide diversity of indigenous languages, little research has studied these languages and their relationships. Numerous efforts have been devoted to searching for similarities among words of indigenous languages in order to classify them into language families. Following the most widely accepted classification of Brazilian indigenous languages, this research compares words from 10 Brazilian indigenous languages. The words are treated as speech signals, and the probability density function (PDF) of each word is estimated with a Gaussian mixture model (GMM); this estimate serves as the model representing the word. The models were compared using distance measures to construct hierarchical structures that illustrate possible relationships among the words. The hypothesis of this research is that GMM-based PDF estimates can characterize the words of indigenous languages, so that distance measures between the PDFs can establish relationships among the words that confirm some of the existing classifications.
The expectation-maximization (EM) algorithm was used to estimate the GMM parameters. The Kullback-Leibler (KL) divergence was used to measure the similarity between pairs of PDFs, and it is the basis for the hierarchical structures that show the relationships among the models. The GMM-based PDF estimation was tested on simulated signals, confirming that the estimated parameters are close to the original ones. Several distance measures were implemented to check that the similarities among the models were determined by the models of the words themselves and not by the particular distance measure adopted. The results of all measures were similar; only the clusters produced by the C2 distance showed some differences, so the C2 distance was proposed as a complement to the KL divergence. These results suggest that the relationships between the models depend on their characteristics rather than on the distance measures selected, and that GMM-based PDFs can properly characterize the words. In general, words from languages of the same linguistic branch were grouped together, and isolated languages tended to be included in the groups of the linguistic branches. Because the GMMs of some language families show a characteristic behavior, each of these families can be identified. Although the results for the words of the indigenous languages are inconclusive, the study is considered useful for increasing knowledge of these 10 languages and for proposing new lines of research on this type of signal.
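The KL divergence between two GMMs has no closed form, so it is often estimated by Monte Carlo sampling; the abstract does not say which approximation the thesis used. The sketch below assumes 1-D mixtures given as `(weights, means, sds)` tuples of NumPy arrays and symmetrizes the estimate so it can serve as a distance. All names are illustrative.

```python
import numpy as np


def gmm_logpdf(x, weights, means, sds):
    """Log-density of a 1-D Gaussian mixture at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]               # shape (n, 1)
    log_comp = (-0.5 * ((x - means) / sds) ** 2
                - np.log(sds) - 0.5 * np.log(2 * np.pi))  # shape (n, k)
    a = log_comp + np.log(weights)
    m = a.max(axis=1, keepdims=True)                      # log-sum-exp for stability
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def gmm_sample(n, weights, means, sds, rng):
    """Draw n points: choose a component per draw, then sample that Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], sds[comp])


def kl_gmm_mc(p, q, n=100_000, seed=0):
    """Monte Carlo estimate of KL(p || q) = E_{x~p}[log p(x) - log q(x)]."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng=rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))


def symmetric_kl(p, q, **kwargs):
    """KL(p || q) + KL(q || p), a symmetric score usable as a distance."""
    return kl_gmm_mc(p, q, **kwargs) + kl_gmm_mc(q, p, **kwargs)
```

A matrix of pairwise symmetric KL values between word models could then be passed to an agglomerative clustering routine (for example SciPy's `scipy.cluster.hierarchy.linkage`, which accepts a condensed distance matrix) to build hierarchical structures of the kind described above.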
|
5 |
Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient. January 2019
abstract: Deep neural networks (DNNs) have had tremendous success in a variety of statistical learning applications due to their vast expressive power. Most applications run DNNs in the cloud on parallelized architectures, but there is a need for efficient DNN inference at the edge on low-precision hardware and analog accelerators. To make trained models more robust in this setting, quantization noise and analog computation noise are modeled as weight-space perturbations of the DNN, and an information-theoretic regularization scheme is used to penalize the KL divergence between the perturbed and unperturbed models. This regularizer has similarities to both natural gradient descent and knowledge distillation, but has the advantage of explicitly encouraging the network to find a broader minimum that is robust to weight-space perturbations. In addition to the proposed regularization, the KL divergence is directly minimized using knowledge distillation. Initial validation on FashionMNIST and CIFAR10 shows that the information-theoretic regularizer and knowledge distillation outperform existing quantization schemes based on the straight-through estimator or L2-constrained quantization. / Dissertation/Thesis / Masters Thesis, Computer Engineering, 2019
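The regularizer penalizes the KL divergence between the predictions of the original network and those of a copy with perturbed weights. The following minimal NumPy sketch evaluates that penalty and the regularized loss. It assumes a single linear softmax layer in place of a DNN, quantization modeled as rounding to a uniform grid with step `delta`, and analog noise modeled as additive Gaussian noise with standard deviation `noise_sd`. These are hypothetical choices, not the thesis's exact setup.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def quantize(w, delta):
    """Uniform quantizer: round each weight to the nearest multiple of delta."""
    return delta * np.round(w / delta)


def kl_rows(p, q, eps=1e-12):
    """Batch mean of KL(p_i || q_i) between categorical distributions (rows)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def perturbation_kl(w, x, delta=None, noise_sd=None, rng=None):
    """KL between predictions with clean weights w and with perturbed weights.

    Quantization (delta) and/or additive Gaussian "analog" noise (noise_sd)
    are applied to a copy of w; both are illustrative noise models.
    """
    w_pert = w.copy()
    if delta is not None:
        w_pert = quantize(w_pert, delta)
    if noise_sd is not None:
        rng = rng or np.random.default_rng(0)
        w_pert = w_pert + rng.normal(0.0, noise_sd, size=w.shape)
    return kl_rows(softmax(x @ w), softmax(x @ w_pert))


def regularized_loss(w, x, y, lam, **noise):
    """Cross-entropy on integer labels y plus lam times the perturbation KL."""
    p = softmax(x @ w)
    ce = -float(np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))
    return ce + lam * perturbation_kl(w, x, **noise)
```

Training with this penalty needs gradients, so in practice the same computation would be written in an automatic-differentiation framework. This version only evaluates the loss.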
|
6 |
Mera sličnosti između modela Gausovih smeša zasnovana na transformaciji prostora parametara (A similarity measure between Gaussian mixture models based on a transformation of the parameter space). Krstanović Lidija, 25 September 2017
This thesis studies and exploits the possibility that the parameters of the Gaussian components of a particular Gaussian mixture model (GMM) lie approximately on a lower-dimensional surface embedded in the cone of positive definite matrices. For that case, we propose a novel, more efficient similarity measure between GMMs, obtained by projecting the component parameters of a particular GMM with an LPP-like (locality preserving projection) mapping from the high-dimensional original parameter space to a space of much lower dimension. Thus, finding the distance between two GMMs in the original space reduces to finding the distance between two sets of lower-dimensional Euclidean vectors, weighted by the corresponding component weights. The proposed measure is suitable for applications that use high-dimensional feature spaces and/or a large overall number of Gaussian components. We confirm our results on both synthetic and real experimental data.
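The key step is reducing a GMM-to-GMM distance to a distance between weighted sets of projected vectors. The sketch below is hypothetical throughout. Each component is flattened into its mean plus the upper triangle of its covariance, and `proj` stands in for an LPP-like projection learned elsewhere. The set distance is a weighted symmetric nearest-neighbour distance chosen for simplicity, which may differ from the measure defined in the thesis.

```python
import numpy as np


def component_vector(mean, cov):
    """Flatten one Gaussian component into [mean, upper triangle of cov]."""
    iu = np.triu_indices(cov.shape[0])
    return np.concatenate([mean, cov[iu]])


def project_gmm(means, covs, proj):
    """Map every component of a GMM to a low-dimensional Euclidean vector."""
    params = np.stack([component_vector(m, c) for m, c in zip(means, covs)])
    return params @ proj.T  # shape: (n_components, low_dim)


def weighted_set_distance(a, wa, b, wb):
    """Symmetric, weight-averaged nearest-neighbour distance between point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * (np.dot(wa, d.min(axis=1)) + np.dot(wb, d.min(axis=0)))


def gmm_distance(gmm_f, gmm_g, proj):
    """Distance between two GMMs given as (weights, means, covs) tuples."""
    wf, mf, cf = gmm_f
    wg, mg, cg = gmm_g
    return weighted_set_distance(project_gmm(mf, cf, proj), wf,
                                 project_gmm(mg, cg, proj), wg)
```

Once components are projected, each comparison costs only pairwise distances between a few low-dimensional vectors. This would be the source of the efficiency gain for high-dimensional features.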
|
7 |
Mining for Frequent Community Structures using Approximate Graph Matching. Kolli, Lakshmi Priya, 15 July 2021
No description available.
|
8 |
Recommending Answers to Math Questions Using KL-Divergence and the Approximate XML Tree Matching Approach. Gao, Siqi, 30 May 2023
Mathematics is the science and study of quantity, structure, space, and change. It seeks out patterns, formulates new conjectures, and establishes truth by rigorous deduction from appropriately chosen axioms and definitions. Studying mathematics makes a person better at solving problems and provides skills that (s)he can use across other subjects and apply in many different job roles. In the modern world, builders use mathematics every day in their work, since construction workers add, subtract, divide, multiply, and work with fractions. Mathematics is clearly a major contributor to many areas of study. For this reason, retrieving, ranking, and recommending Math answers, an application of Math information retrieval (IR), deserves attention: a reliable recommender system helps users find relevant answers to Math questions and benefits all Math learners whenever they need help solving a Math problem, regardless of time and place. Such a recommender system can enhance the learning experience and enrich its users' knowledge of Math. We have developed MaRec, a recommender system that retrieves and ranks Math answers to a Math question based on their textual content and embedded formulas. MaRec (i) applies KL divergence to rank the textual content of a potential answer A with respect to the textual content of a Math question Q, and (ii) represents the Math formulas in Q and A as XML trees and uses their subtree matching scores, together with the KL-based score, to rank A as an answer to Q. The design of MaRec is simple, since it does not require the training and testing process mandated by machine learning-based Math IR systems, which is tedious to set up and time-consuming to train.
Empirical studies show that MaRec significantly outperforms (i) three existing state-of-the-art Math IR systems in an offline evaluation, and (ii) a top-of-the-line machine learning system in an online performance analysis.
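A common way to rank an answer's text against a question with KL divergence uses unigram language models. Each answer A is scored by the negative KL divergence from the question model to a smoothed model of A. The sketch below assumes whitespace tokenization, Dirichlet smoothing with a hypothetical `mu`, and a collection built from the question and its candidate answers. It omits the XML subtree matching of formulas and is not MaRec's actual code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercased whitespace tokenizer (a placeholder for real text processing)."""
    return text.lower().split()


def dirichlet_lm(tokens, collection, mu):
    """Dirichlet-smoothed unigram model over the collection vocabulary."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: (counts[w] + mu * p_c) / (n + mu) for w, p_c in collection.items()}


def kl_score(question, answer, collection, mu=100.0):
    """Return -KL(theta_Q || theta_A); a higher score means A is closer to Q."""
    q_counts = Counter(tokenize(question))
    q_len = sum(q_counts.values())
    theta_a = dirichlet_lm(tokenize(answer), collection, mu)
    kl = 0.0
    for w, c in q_counts.items():
        p_q = c / q_len                      # maximum-likelihood question model
        kl += p_q * math.log(p_q / theta_a[w])
    return -kl


def rank_answers(question, answers, mu=100.0):
    """Sort candidate answers by decreasing KL score against the question."""
    tokens = [t for doc in [question, *answers] for t in tokenize(doc)]
    collection = {w: c / len(tokens) for w, c in Counter(tokens).items()}
    return sorted(answers, key=lambda a: kl_score(question, a, collection, mu),
                  reverse=True)
```

Smoothing matters here: without it, any question word missing from an answer would make the divergence infinite.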
|
9 |
Extraction of gating mechanisms from Markov state models of a pentameric ligand-gated ion channel. Karalis, Dimitrios, January 2021
GLIC is a transmembrane proton-gated pentameric ligand-gated ion channel (pLGIC) found in the prokaryote Gloeobacter violaceus. GLIC is the prokaryotic homolog of several receptors found in the nervous system of many eukaryotic organisms. These receptors are targets for the development of pharmaceutical drugs that interfere with the gating of these channels, such as anesthetics and stimulants. Understanding the mechanism of a drug's target is a high priority when developing a novel medicine. However, eukaryotic pLGICs are complex to analyze, because some of them are heteromeric, they have more domains, and they undergo post-translational modifications (PTMs).
GLIC, on the other hand, has a simpler structure, and it is enough to study the structure of only one subunit, since all subunits are identical. Several possible gating mechanisms have been proposed by the scientific community, but the complete gating mechanism of GLIC remains unclear. The goal of this project is to apply machine learning (ML) to discover novel gating mechanisms by computational approaches. The starting data were taken from previous research in which computational tools such as unbiased molecular dynamics (MD), elastic network-driven Brownian dynamics (eBDIMS), and Markov state models (MSMs) were used. With these tools, the protein was simulated as the wild type and as a gain-of-function mutant at two different pH values. Five macrostates were constructed: two open, two closed, and one intermediate. In this project, another tool was used: the KL divergence, which scored the difference between the distance distributions of one open and one closed macrostate. The starting data were used to build a tensor storing all residue-residue distances. Each residue pair had its own metadata, which in turn was used to obtain its distance distributions in all five macrostates of the pre-built MSMs. The average KL scores between two states of interest were then calculated and used to filter out residue pairs with overlapping distance distributions. To make sure that the residues within a pair can interact with each other, all residue pairs with large minimum and average distances were filtered out as well. The remaining residue pairs were then evaluated across all five macrostates for further study. Important novel mechanisms discovered through both the KL divergence and the macrostate distributions involved the M2-M3 loop of one subunit and both the β8-β9 loop/N-terminal β9 strand and the preM1/N-terminal M1 region of the neighboring subunit.
The β8-β9 loop (Loop F) also showed high KL scores with the β1-β2 and β6-β7 (Pro-loop) loops, with distances decreasing upon channel opening. Other notable gating mechanisms include the pairing of residues from the β4-β5 loop (Loop A) with residues from strands β1 and β6, as well as the kink of the pore-lining helix. KL divergence proved to be a valuable tool for filtering the available data, and the novel mechanisms may prove useful both to the academic community seeking to unravel the complete gating mechanism of GLIC and to pharmaceutical companies searching for new drug-binding sites within the molecule.
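The filtering step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions. Each residue pair's distance distribution per macrostate is estimated by a histogram on shared bins with add-half smoothing, and the "average KL" is taken as the mean of both KL directions. Frames are weighted equally, whereas an MSM analysis would normally weight them by stationary probabilities. The thresholds are hypothetical values in nm.

```python
import numpy as np


def histogram_pmf(samples, bins, pseudocount=0.5):
    """Histogram estimate of a distance distribution with add-half smoothing."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts.astype(float) + pseudocount
    return p / p.sum()


def kl(p, q):
    """KL(p || q) in nats for strictly positive PMFs on the same bins."""
    return float(np.sum(p * np.log(p / q)))


def average_kl(d_open, d_closed, n_bins=50):
    """Mean of KL(open || closed) and KL(closed || open) on shared bins."""
    lo = min(d_open.min(), d_closed.min())
    hi = max(d_open.max(), d_closed.max())
    if hi == lo:                      # all distances identical: nothing to compare
        return 0.0
    bins = np.linspace(lo, hi, n_bins + 1)
    p = histogram_pmf(d_open, bins)
    q = histogram_pmf(d_closed, bins)
    return 0.5 * (kl(p, q) + kl(q, p))


def filter_pairs(open_dists, closed_dists, kl_min=1.0,
                 max_min_dist=0.5, max_mean_dist=1.5):
    """Keep pairs whose distributions differ and whose residues are not too far apart.

    open_dists, closed_dists: arrays of shape (n_pairs, n_frames), distances in nm.
    kl_min, max_min_dist and max_mean_dist are hypothetical thresholds.
    """
    kept, scores = [], []
    for i, (d_o, d_c) in enumerate(zip(open_dists, closed_dists)):
        score = average_kl(d_o, d_c)
        both = np.concatenate([d_o, d_c])
        # drop pairs whose minimum and average distances are both large
        too_far = both.min() > max_min_dist and both.mean() > max_mean_dist
        if score >= kl_min and not too_far:
            kept.append(i)
            scores.append(score)
    return kept, scores
```

The retained indices and scores could then be inspected across all five macrostates, as the abstract describes.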
|
10 |
Robust Change Detection with Unknown Post-Change Distribution. Sargun, Deniz, January 2021
No description available.
|