1 |
SALZA : mesure d’information universelle entre chaînes pour la classification et l’inférence de causalité / SALZA: universal information measure between strings for classification and causality
Revolle, Marion 25 October 2018 (has links)
Data in the form of symbol strings are highly varied (DNA, text, quantized EEG, ...) and cannot always be modeled. A universal description of strings, independent of probabilities, is therefore necessary. Kolmogorov complexity was introduced in the 1960s to address this issue. The concept is simple: a string is complex if no short description of it exists. Kolmogorov complexity is the algorithmic counterpart of Shannon entropy and is the foundation of algorithmic information theory. However, Kolmogorov complexity is not computable in finite time, which makes it unusable in practice.
The first to make Kolmogorov complexity operational were Lempel and Ziv, who in 1976 proposed restricting the operations allowed in a description. Another approach uses the length of the string compressed by a lossless compressor. Both estimators, however, are ill-defined in the conditional and joint cases, which makes it difficult to extend Lempel-Ziv complexity or compressors to the rest of algorithmic information theory.
Starting from this observation, we introduce SALZA, a new universal information measure based on Lempel-Ziv complexity. The implementation and the sound definition of our measure allow the quantities of algorithmic information theory to be computed efficiently.
Ordinary lossless compressors were used by Cilibrasi and Vitányi to build a very popular universal classifier: the normalized compression distance (NCD). For this application we propose our own estimator, the NSD, and show that it is a universal semi-distance on strings. The NSD outperforms the NCD by adapting naturally to more diverse data and by defining appropriate conditioning through SALZA.
Using the universal prediction qualities of Lempel-Ziv complexity, we then explore questions of causal inference. First, the algorithmic Markov conditions are made computable thanks to SALZA. Then, defining algorithmic directed information for the first time, we propose an algorithmic counterpart of Granger causality. We demonstrate the relevance of our approach on both synthetic and real data.
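The NCD mentioned above is easy to state: with C(s) the compressed length of s under a lossless compressor, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). A minimal sketch using zlib as the compressor follows; it illustrates only the baseline classifier the thesis improves on, not SALZA or the NSD themselves.

```python
# Minimal NCD sketch (Cilibrasi & Vitanyi), with zlib standing in for
# the lossless compressor C; any lossless compressor could be substituted.
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes, a computable stand-in for complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar strings compress well together and score near 0;
# unrelated strings score closer to 1.
print(ncd(b"ACGTACGTACGT" * 40, b"ACGTACGAACGT" * 40))
print(ncd(b"ACGTACGTACGT" * 40, b"TTAGGCCATGGA" * 40))
```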
|
2 |
Correlation attacks on stream ciphers using convolutional codes
Bruwer, Christian S 24 January 2006 (has links)
This dissertation investigates four methods for attacking stream ciphers that are based on nonlinear combining generators:
-- two exhaustive-search correlation attacks, based on the binary derivative and the Lempel-Ziv complexity measure;
-- a fast correlation attack utilizing the Viterbi algorithm;
-- a decimation attack that can be combined with any of the three attacks above.
These are ciphertext-only attacks that exploit the correlation between the ciphertext and an internal linear feedback shift register (LFSR) of a stream cipher. This leads to a so-called divide-and-conquer attack that is able to reconstruct the secret initial states of all the internal LFSRs within the stream cipher. The binary derivative attack and the Lempel-Ziv attack apply an exhaustive search to find the secret key that is used to initialize the LFSRs. The binary derivative and Lempel-Ziv complexity measures discriminate between correct and incorrect solutions in order to identify the secret key. Both attacks are ideal for implementation on parallel processors. Experimental results show that the Lempel-Ziv correlation attack succeeds at correlation levels of p = 0.482, requiring approximately 62,000 ciphertext bits, while the binary derivative attack succeeds at correlation levels of p = 0.47, using approximately 24,500 ciphertext bits. The fast correlation attack, utilizing the Viterbi algorithm, applies principles from convolutional coding theory to identify an embedded low-rate convolutional code in the PN-sequence generated by an internal LFSR. The embedded code can then be decoded with a low-complexity Viterbi algorithm. The attack operates in two phases: in the first phase, a set of suitable parity-check equations is found, based on the feedback taps of the LFSR; this has to be done only once for a targeted system. In the second phase, these parity-check equations are used in a Viterbi decoding algorithm to recover the transmitted PN-sequence, thereby obtaining the secret initial state of the LFSR. Simulation results for a 19-bit LFSR show that this attack can recover the secret key at correlation levels of p = 0.485, requiring an average of only 153,448 ciphertext bits. All three attacks investigated in this dissertation are capable of attacking LFSRs approximately 40 bits long. They can, however, be extended to much longer LFSRs by means of the decimation attack, which reduces (decimates) the size of a targeted LFSR and can be combined with any of the three correlation attacks above. / Dissertation (MEng (Electronic Engineering))--University of Pretoria, 2007. / Electrical, Electronic and Computer Engineering / unrestricted
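To make the two discriminating statistics concrete, here is a minimal sketch, assuming the keystream is available as a Python list of 0/1 integers; the exhaustive key search, the LFSR models, and all attack parameters are omitted. The LZ76 phrase count below is the classic parsing-based complexity measure, not code from the dissertation.

```python
def binary_derivative(bits):
    """XOR of adjacent bits; a sequence correlated with an LFSR output
    keeps a detectable bias under this map."""
    return [bits[k] ^ bits[k + 1] for k in range(len(bits) - 1)]

def lz76_complexity(bits):
    """Number of phrases in the Lempel-Ziv (1976) parsing of the sequence;
    lower counts mean lower apparent randomness."""
    s = "".join(str(b) for b in bits)
    n, i, phrases = len(s), 0, 0
    while i < n:
        l = 1
        # grow the current phrase while it already occurs earlier
        # in the string (overlapping occurrences allowed)
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        phrases += 1
        i += l
    return phrases

# A periodic (structured) sequence scores far lower than a random one,
# which is what lets the attack separate correct from incorrect guesses.
print(lz76_complexity([0, 1, 1] * 60))
```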
|
3 |
Análise de textura em imagens baseado em medidas de complexidade / Image texture analysis based on complexity measures
Condori, Rayner Harold Montes 30 November 2015 (has links)
Texture analysis is one of the most basic and popular research areas in computer vision. It is also important in many other disciplines, such as the medical and biological sciences; detecting unhealthy tissue in lung Magnetic Resonance images, for instance, is a common texture analysis task. In this dissertation we propose a novel method for texture characterization based on complexity measures, namely the Hurst exponent, the Lyapunov exponent and the Lempel-Ziv complexity. These measures are applied to samples taken from images in the frequency domain. Three sampling methods are proposed: radial sampling, circular sampling, and sampling by partially self-avoiding deterministic walks (CDPA sampling). Each sampling method produces one feature vector per complexity measure, containing a set of descriptors that characterize the processed image. Each image is therefore represented by nine feature vectors (three complexity measures times three sampling methods), which are compared on texture classification tasks.
We then concatenate the feature vectors obtained by computing the Lempel-Ziv complexity on radial and circular samples with descriptors obtained through traditional texture analysis techniques, namely Local Binary Patterns (LBP), Gabor Wavelets (GW), Gray-Level Co-occurrence Matrices (GLCM) and partially self-avoiding deterministic walks in graphs (CDPAg). This approach was tested on three image databases: Brodatz, USPtex and UIUC, each with its own well-known challenges. The success rates of all traditional methods increased with the concatenation of relatively few Lempel-Ziv descriptors; in the LBP case, for example, the rate rose from 84.25% to 89.09% with the addition of only five descriptors. In fact, concatenating just five descriptors is enough to raise the success rate of every traditional method studied, while concatenating too many Lempel-Ziv descriptors (more than 40, say) generally brings no further improvement. Given the similar results obtained on all three databases, we conclude that the proposed method can be used to increase success rates in other texture classification tasks. Finally, CDPA sampling also yields promising results, which can be improved in future work.
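As an illustration of the frequency-domain sampling step, here is a minimal sketch of radial sampling with NumPy; the number of rays, the half-plane sweep (the magnitude spectrum of a real image is symmetric) and the nearest-pixel interpolation are assumptions for illustration, not the dissertation's exact parameters. Each 1-D profile would then be binarized (for instance at its median) and fed to a complexity estimator such as the Lempel-Ziv phrase count, yielding one descriptor per ray.

```python
import numpy as np

def radial_samples(image: np.ndarray, n_rays: int = 16):
    """Sample the centered 2-D FFT magnitude along n_rays straight
    lines running from the spectrum center toward the border."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    r = np.arange(min(cy, cx) - 1)          # radii along each ray
    profiles = []
    for theta in np.linspace(0.0, np.pi, n_rays, endpoint=False):
        ys = (cy + r * np.sin(theta)).astype(int)
        xs = (cx + r * np.cos(theta)).astype(int)
        profiles.append(spectrum[ys, xs])   # 1-D profile for this ray
    return profiles
```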
|
4 |
Biomechanical online signature modeling applied to verification / Modélisation biomécanique des signatures en ligne appliquée à la vérification
Coutinho Canuto, Jânio 08 December 2014 (has links)
This thesis deals with the modeling and verification of online signatures. The first part is devoted to the biomechanical modeling of the hand movements associated with the signing gesture. A model based on the Minimum Jerk (MJ) criterion was chosen from among several motor control theories. The problem of segmenting signature trajectories into strokes that fit the chosen kinematic model is then studied, leading to the development of an iterative segmentation method. Both the choice of the model and the segmentation method rest on the trade-off between reconstruction quality and compression.
In the second part, the polynomial model provided by the MJ criterion is deliberately degraded: the non-real zeros of the polynomials are discarded, and the effects of this degradation are studied from a biometric verification perspective. The degradation is equivalent to the signal processing technique known as infinity clipping, originally applied to speech signals. For signatures, as for speech, the essential information is observed to be preserved on verification tasks: using only the Levenshtein distance over the infinitely clipped representation, verification error rates comparable to those of more elaborate methods were obtained. Furthermore, the symbolic representation yielded by infinity clipping establishes a conceptual relationship between the number of polynomial segments obtained through the MJ-based iterative segmentation and the Lempel-Ziv complexity. This relationship is potentially useful for analyzing online signature signals and improving recognition systems.
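Infinity clipping and the Levenshtein comparison are both compact enough to sketch. The following is a minimal illustration assuming a zero-mean 1-D signal; how the thesis actually derives the symbol string from the degraded Minimum Jerk polynomials is not reproduced here.

```python
def infinity_clip(signal):
    """Keep only the sign of the samples: the clipped signal retains
    nothing but the zero-crossing pattern, encoded as a binary string."""
    return "".join("1" if v >= 0.0 else "0" for v in signal)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Verification score between two clipped signals: the smaller the
# distance, the more likely the signatures come from the same writer.
ref = infinity_clip([0.2, 0.5, -0.1, -0.4, 0.3, 0.6, -0.2])
probe = infinity_clip([0.3, 0.4, -0.2, 0.1, 0.5, 0.4, -0.3])
print(levenshtein(ref, probe))
```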
|
5 |
Fast Low Memory T-Transform: string complexity in linear time and space with applications to Android app store security.
Rebenich, Niko 27 April 2012 (has links)
This thesis presents flott, the Fast Low Memory T-Transform, currently the fastest and most memory-efficient linear time and space algorithm available to compute the string complexity measure T-complexity. The flott algorithm uses 64.3% less memory and, in our experiments, runs asymptotically 20% faster than its predecessor. A full C implementation is provided and published under the Apache License 2.0. From the flott algorithm two deterministic information measures are derived and applied to Android app store security: the normalized T-complexity distance and the instantaneous T-complexity rate, which are used to detect, locate, and visualize unusual information changes in Android applications. The information measures introduced present a novel, scalable approach to assisting with the detection of malware in app stores. / Graduate
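The exact definitions of the two T-complexity measures are given in the thesis; as a rough sketch of the "instantaneous rate" idea (a per-window complexity profile used to locate unusual information changes in a file such as an APK), here is an analogue that substitutes a zlib compression ratio for the T-complexity, with window and step sizes chosen arbitrarily for illustration.

```python
import zlib

def complexity_rate(data: bytes, window: int = 4096, step: int = 1024):
    """Per-window complexity profile: compressed size over window size.
    Spikes or dips in the profile mark regions whose information content
    differs from their surroundings (e.g. injected or repacked code)."""
    rates = []
    for off in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[off:off + window]
        if not chunk:
            break
        rates.append(len(zlib.compress(chunk, 9)) / len(chunk))
    return rates
```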
|