1 |
The Use Of Kullback-Leibler Divergence In Opinion Retrieval
Cen, Kun. 24 September 2008
With the huge amount of subjective content in online documents, there is a clear need for an information retrieval system that supports retrieval of documents containing opinions about the topic expressed in a user’s query. In recent years, blogs, a new publishing medium, have attracted a large number of people who express personal opinions on all kinds of topics in response to real-world events. The opinionated nature of blogs makes them an interesting new research area for opinion retrieval. Identifying and extracting subjective content from blogs has become the subject of several research projects.
In this thesis, four novel methods are proposed to retrieve blog posts that express opinions about the given topics. The first method utilizes the Kullback-Leibler divergence (KLD) to weight the lexicon of subjective adjectives around query terms. Considering the distances between the query terms and subjective adjectives, the second method uses KLD scores of subjective adjectives based on distances from the query terms for document re-ranking. The third method calculates KLD scores of subjective adjectives for predefined query categories. In the fourth method, collocates, words co-occurring with query terms in the corpus, are used to construct the subjective lexicon automatically. The KLD scores of collocates are then calculated and used for document ranking.
Four groups of experiments are conducted to evaluate the proposed methods on the TREC test collections. The results of the experiments are compared with the baseline systems to determine the effectiveness of using KLD in opinion retrieval. Further studies are recommended to explore more sophisticated approaches to identify subjectivity and promising techniques to extract opinions.
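The abstract does not give the scoring formula, so the following is only a minimal sketch of how a KLD-style weight for subjective adjectives could be computed. It assumes the weight of a term is its pointwise contribution p·log(p/q) to the divergence between an opinionated sample and the whole collection, with add-one smoothing; the function names, the smoothing choice, and the toy texts are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def kld_term_scores(opinion_tokens, collection_tokens, lexicon):
    """Score each lexicon term by p * log(p / q), where p and q are the
    add-one-smoothed relative frequencies of the term in the opinionated
    sample and in the whole collection (an assumed weighting scheme)."""
    vocab = set(opinion_tokens) | set(collection_tokens) | set(lexicon)
    op_counts = Counter(opinion_tokens)
    col_counts = Counter(collection_tokens)
    op_total = len(opinion_tokens) + len(vocab)
    col_total = len(collection_tokens) + len(vocab)
    scores = {}
    for term in lexicon:
        p = (op_counts[term] + 1) / op_total
        q = (col_counts[term] + 1) / col_total
        scores[term] = p * math.log(p / q)
    return scores


def opinion_score(doc_tokens, scores):
    """Sum the weights of the lexicon terms that occur in a document."""
    return sum(scores.get(tok, 0.0) for tok in doc_tokens)


# Toy data, invented for illustration only.
opinion_text = "the film was awful awful and boring but the cast was great".split()
collection_text = ("the film was released in may and the cast was large "
                   "the plot was great and the review was awful").split()
lexicon = {"awful", "boring", "great"}
scores = kld_term_scores(opinion_text, collection_text, lexicon)
```

Under these assumptions, a term that is relatively more frequent in the opinionated sample than in the collection receives a larger weight, and `opinion_score` could then be combined with a topical relevance score for re-ranking.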
|
3 |
Using Kullback-Leibler Information Criterion on balancing diagnostics for baseline covariates between treatment groups in propensity-score matched samples
Li, Pei Chia (李珮嘉). Unknown Date
In observational studies, propensity scores are frequently used to balance the distribution of baseline covariates between treated and untreated groups, so that the data can be treated as if they came from a randomized controlled trial (RCT) and causal inferences can be made. In the literature, balance has mostly been diagnosed by comparing means and variances. In this study, we propose using the Kullback-Leibler Information Criterion (KLIC) or the Kolmogorov-Smirnov (KS) statistic, both of which measure the difference between distribution functions, as an alternative balance diagnostic, and we evaluate their feasibility. In addition, a low propensity-score matching rate decreases the power of the subsequent statistical inference, and a pilot study showed that the matching rate was negatively correlated with KLIC and KS. We therefore also examine whether KLIC and KS can serve as preliminary indicators of whether propensity score matching should be carried out. Our simulation study suggests that both uses are feasible.
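As an illustration of the two diagnostics, the sketch below computes a two-sample KS statistic and a histogram-based KL estimate for one baseline covariate, before and after matching. The histogram estimator with additive smoothing, the function names, and the age values are assumptions made for this example; the abstract does not say how the thesis estimates KLIC.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def binned_kl(x, y, bins=5, alpha=0.5):
    """KL(P_x || P_y) on a shared equal-width histogram.
    alpha is an additive smoothing constant that avoids empty bins."""
    pooled = list(x) + list(y)
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / bins or 1.0

    def hist(values):
        counts = [alpha] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(x), hist(y)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical ages of treated subjects and of controls before/after matching.
treated = [52, 55, 58, 60, 63, 65]
control_all = [30, 35, 41, 44, 53, 56, 59, 61, 64, 66]
control_matched = [53, 56, 59, 61, 64, 66]

ks_before = ks_statistic(treated, control_all)
ks_after = ks_statistic(treated, control_matched)
kl_before = binned_kl(treated, control_all)
kl_after = binned_kl(treated, control_matched)
```

In this toy example both statistics shrink after matching, which is the behavior a balance diagnostic is expected to show; smaller values indicate that the covariate distributions of the two groups are closer.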
|
4 |
Fault tolerant data fusion: application on integrity monitoring of a localization system
Al Hage, Joelle. 17 October 2016
Interest in multi-sensor data fusion research is growing because of its diverse application sectors. In robotics and localization in particular, exploiting the information provided by different sensors is a vital step in ensuring a reliable position estimate. In this context of multi-sensor data fusion, we address diagnosis, which leads to identifying the cause of a failure, and the tolerance of the proposed approach to sensor faults, both of which have received little attention in the literature. We chose to develop an approach based on a purely informational formalism: the information filter on the one hand, and tools from information theory on the other. Residuals based on the Kullback-Leibler divergence are developed. Through optimized thresholding methods, these residuals make it possible to detect and exclude faulty sensors. The proposed theory is tested on two localization applications. The first is the fault-tolerant collaborative localization of a multi-robot system. The second is localization in outdoor environments using fault-tolerant, tightly coupled GNSS/odometry.
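The abstract does not state the exact form of the residual. As a hedged illustration only, if the predicted and updated state estimates are both modeled as k-dimensional Gaussians (the usual setting for an information filter), a KL-based residual could use the standard closed form below. The choice of which distribution plays the role of the first argument is an assumption made here, not something the abstract specifies.

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
= \frac{1}{2}\left[
    \operatorname{tr}\!\big(\Sigma_1^{-1}\Sigma_0\big)
    + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
    - k
    + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
  \right]
```

The quadratic term grows when the means disagree, and the trace and log-determinant terms grow when the covariances disagree, so a faulty sensor that shifts or distorts the updated estimate would raise the residual.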
|
5 |
High integrity personal tracking using fault tolerant multi-sensor data fusion
Daher, Mohamad. 13 December 2017
About one third of older people living at home suffer a fall each year. The most serious falls occur when the person is alone and unable to get up, which results in a large number of elderly people being institutionalized and in a high morbidity and mortality rate. The PAL (Personally Assisted Living) system appears to be one solution to this problem. This ambient intelligence system allows elderly people to live in an intelligent and proactive environment. This thesis addresses in-home tracking of elderly people, recognition of activities of daily living, and automatic fall detection using a set of non-intrusive sensors that preserve the privacy and comfort of the elderly. In addition, a fault-tolerant fusion method is proposed using a purely informational formalism: the information filter on the one hand, and information theory tools on the other. Residuals based on the Kullback-Leibler divergence are used. With appropriate thresholding, these residuals lead to the detection and exclusion of sensor faults. The proposed algorithms were validated on several scenarios containing the different activities: walking, sitting, standing, lying down, and falling. The developed methods achieved a sensitivity above 94% for fall detection and above 92% for discriminating between the different ADLs (activities of daily living).
|
6 |
State space time series clustering using discrepancies based on the Kullback-Leibler information and the Mahalanobis distance
Foster, Eric D. 01 December 2012
In this thesis, we consider the clustering of time series data; specifically, time series that can be modeled in the state space framework. Of primary focus is the pairwise discrepancy between two state space time series. The state space model can be formulated in terms of two equations: the state equation, based on a latent process, and the observation equation. Because the unobserved state process is often of interest, we develop discrepancy measures based on the estimated version of the state process. We compare these measures to discrepancies based on the observed data. In all, seven novel discrepancies are formulated.
First, discrepancies derived from Kullback-Leibler (KL) information and Mahalanobis distance (MD) measures are proposed based on the observed data. Next, KL information and MD discrepancies are formulated based on the composite marginal contributions of the smoothed estimates of the unobserved state process. Furthermore, an MD is created based on the joint contributions of the collection of smoothed estimates of the unobserved state process. The cross trajectory distance, a discrepancy heavily influenced by both observed and smoothed data, is proposed as well as a Euclidean distance based on the smoothed state estimates. The performance of these seven novel discrepancies is compared to the often used Euclidean distance based on the observed data, as well as a KL information discrepancy based on the joint contributions of the collection of smoothed state estimates (Bengtsson and Cavanaugh, 2008).
We find that those discrepancy measures based on the smoothed estimates of the unobserved state process outperform those discrepancy measures based on the observed data. The best performance was achieved by the discrepancies founded upon the joint contributions of the collection of unobserved states, followed by the discrepancies derived from the marginal contributions.
We observed a non-trivial degradation in clustering performance when estimating the parameters of the state space model. To improve estimation, we propose an iterative estimation and clustering routine based on the notion of finding a series' most similar counterparts, pooling them, and estimating a new set of parameters. Under ideal circumstances, we show that the iterative estimation and clustering algorithm can potentially achieve results that approach those obtained in settings where parameters are known. In practice, the algorithm often improves the performance of the model-based clustering measures.
We apply our methods to two examples. The first application pertains to the clustering of time course genetic data. We use data from Cho et al. (1998) where a time course experiment of yeast gene expression was performed in order to study the yeast mitotic cell cycle. We attempt to discover the phase to which 219 genes belong.
The second application seeks to answer whether or not influenza and pneumonia mortality can be explained geographically. Data from a collection of cities across the U.S. are acquired from the Morbidity and Mortality Weekly Report (MMWR). We cluster the MMWR data without geographic constraints, and compare the results to clusters defined by MMWR geographic regions. We find that influenza and pneumonia mortality cannot be explained by geography.
|
7 |
Improving transmission quality in sensor networks using the correlation of observed data
小林, 健太郎, 山里, 敬也, 岡田, 啓, 片山, 正昭. 01 December 2005
No description available.
|
8 |
Development and Evaluation of Model-Based Misfire Detection Algorithm
Therén, Linus. January 2014
This report presents the development of a misfire detection algorithm for on-board diagnostics on a spark-ignited combustion engine. The work builds on a previously developed model-based detection algorithm, created to meet more stringent future legislation and to reduce the cost of calibration. In the existing approach, a simplified engine model is used to estimate the torque from the flywheel angular velocity, and the algorithm can detect misfires under various conditions. The main contribution of this work is the further development of the misfire detection algorithm, with a focus on improving the handling of disturbances and of variations between vehicles. The resulting detection algorithm can be calibrated automatically from training data and can handle disturbances such as manufacturing errors on the flywheel and torsional vibrations in the crankshaft occurring after a misfire. Furthermore, a robustness analysis with different engine configurations is carried out, and the algorithm is evaluated using the Kullback-Leibler divergence in relation to the diagnosis requirements. In the validation, data from vehicles with four-cylinder engines are used, and the algorithm shows good performance, with few false alarms and missed detections.
|
9 |
Detection of queuing fault conditions in prepaid mobile state changes using the Kullback-Leibler divergence
Torres Huircaman, Milflen. January 2012
Master's in Communication Networks Engineering / The Chilean prepaid mobile telephony industry accounts for 70% of the mobile customers of the country's main operators. This service uses an online debit and credit process that deducts, almost instantly, the credit consumed when using the voice and data services enabled on the terminal, and adds the corresponding credit when a prepaid top-up is applied. These are the most common operations used to change the operating state of a prepaid mobile terminal.
The dynamics of these transitions depend closely on the operation of the computer system that manages and executes these changes. Its architecture, of the server and command queue type, uses a first-in first-out (FIFO) policy to process each command associated with the state transition to be applied to each terminal in the network. This command management system can collapse if the demand for state changes rises suddenly and exceeds the server's processing capacity. The result is an excessive growth of the command queue, which in turn can cause problems in telecommunications services within the network and monetary losses for the operator by taking the billing system offline.
This phenomenon, called queuing, is controlled in commercial systems using threshold alarms, which signal to system administrators the need to activate the countermeasures required to restore correct operation. However, the value of this threshold is set without necessarily using performance-optimality criteria, which reduces the efficiency of the technical and commercial operation of the service.
The working hypothesis of this research is that the use of a "hard" threshold can be improved by an approach that incorporates the history of the process describing the command queue length, such as one based on the probability distributions of the normal-operation and queuing conditions. To validate this conjecture, a queuing detector based on the Kullback-Leibler divergence was designed, which makes it possible to compare the instantaneous distribution of the observations with those corresponding to the normal-operation and queuing conditions.
The methodology used to validate this thesis was based on computer simulation of the state transitions, described by a 3-state Markov chain, which was used to quantify the detector's operation and compare it with the metrics associated with hard detection by thresholds. The performance metrics were the percentage of type I errors (missed detection) and type II errors (false positive), which were computed empirically for both detectors. In addition, the detector's operation was validated with real operating data from a record of 14 months of observations.
The results support the hypothesis, in that performance improvements of up to 60% in queuing detection and 85% in the reduction of false positives were observed when comparing the Kullback-Leibler detector with threshold-based ones.
These results therefore constitute an important advance in the precision and reliability of fault-condition detection, which justifies incorporating this new strategy into the operations environment of a telecommunications company. They also make the strategy potentially extensible to other processes controlled through queues.
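The comparison described above, between the distribution of recent observations and reference distributions for normal operation and for queuing, can be sketched as follows. This is a minimal illustration under stated assumptions: queue lengths are discretized into three hypothetical levels, the reference distributions are made-up counts, and the decision rule (pick the reference with the smaller divergence) is one plausible reading of the abstract rather than the thesis's actual detector.

```python
import math
from collections import Counter


def distribution(samples, states, alpha=1.0):
    """Smoothed empirical distribution of samples over a fixed set of states."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(states)
    return {s: (counts[s] + alpha) / total for s in states}


def kl(p, q):
    """KL(p || q) for two distributions defined over the same states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


def classify_window(window, normal_ref, queuing_ref, states):
    """Label a window by whichever reference distribution it diverges from least."""
    p = distribution(window, states)
    return "queuing" if kl(p, queuing_ref) < kl(p, normal_ref) else "normal"


# Hypothetical discretization of the command queue length.
STATES = ("low", "medium", "high")
# Made-up reference data for each operating condition.
normal_ref = distribution(["low"] * 80 + ["medium"] * 18 + ["high"] * 2, STATES)
queuing_ref = distribution(["low"] * 10 + ["medium"] * 30 + ["high"] * 60, STATES)

calm = ["low"] * 9 + ["medium"]
busy = ["high"] * 7 + ["medium"] * 3
labels = [classify_window(w, normal_ref, queuing_ref, STATES) for w in (calm, busy)]
```

Unlike a hard threshold on the current queue length, this rule uses the whole window of recent observations, which is the sense in which the detector incorporates the history of the process.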
|
10 |
A Kullback-Leibler Divergence Filter for Anomaly Detection in Non-Destructive Pipeline Inspection
Zhou, Ruikun. 14 September 2020
Anomaly detection generally refers to algorithmic procedures aimed at identifying relatively rare events that differ substantially from the majority of the data set to which they belong. In data series generated by sensors mounted on mobile devices for non-destructive inspection and monitoring, anomalies typically correspond to the defects to be detected, and therefore define the main task of this class of devices. In this case, a useful operational definition of an anomaly is based on its information content relative to the background data, which is typically noisy and therefore easily masks the relevant events if left unfiltered. In this thesis, a Kullback-Leibler (KL) divergence filter is proposed to detect signals with relatively high information content, namely anomalies, within data series. The data are generated using a model of a broad class of proximity sensors that covers devices commonly used in engineering practice. This includes, for example, sensors mounted on mobile robots for the non-destructive inspection of hazardous or other environments that humans cannot access for direct inspection. The raw data generated by this class of sensors are often challenging to analyze because noise dominates the signal content that reveals relevant features, such as damage in gas pipelines. The proposed filter is built to detect the difference in information content between the data series collected by the sensor and a baseline data series, with the advantage of not requiring the design of a threshold. Moreover, unlike traditional filters, which need prior knowledge of or distributional assumptions about the data, this KL divergence filter is model-free and suitable for all kinds of raw sensory data. It is also compatible with classical signal distribution assumptions, such as the Gaussian approximation. Finally, the robustness and sensitivity of the KL divergence filter are discussed under different scenarios with various signal-to-noise ratios, using data generated by a simulator that reproduces realistic scenarios and is based on models of real sensors provided by manufacturers or widely accepted in the literature.
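To make the idea of comparing a sensor series against a baseline series concrete, here is a small sketch that slides a window over the series and scores each window by its KL divergence from the baseline. It uses the Gaussian approximation that the abstract mentions as a compatible special case, not the model-free filter itself; the window length, the variance floor, and the sample values are assumptions made for this example.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(m0, v0, m1, v1):
    """KL( N(m0, v0) || N(m1, v1) ) for univariate Gaussians with variances v0, v1."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def kl_scores(series, baseline, window=5, floor=1e-6):
    """Score every sliding window of the series against the baseline.
    floor keeps variances strictly positive for constant windows."""
    mb = fmean(baseline)
    vb = max(pvariance(baseline), floor)
    scores = []
    for i in range(len(series) - window + 1):
        chunk = series[i:i + window]
        mw = fmean(chunk)
        vw = max(pvariance(chunk), floor)
        scores.append(gaussian_kl(mw, vw, mb, vb))
    return scores


# Invented readings: background noise with a short bump at indices 5 to 7.
baseline = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.1, 0.0]
series = [0.0, 0.1, -0.1, 0.2, 0.0, 2.0, 2.2, 1.9, 0.1, -0.1, 0.0, 0.1]
scores = kl_scores(series, baseline)
peak = scores.index(max(scores))
```

Windows that contain only background noise score close to zero, while windows overlapping the bump score far higher, so the location of the largest score points to the anomalous segment.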
|