11

SALZA : mesure d’information universelle entre chaînes pour la classification et l’inférence de causalité / SALZA : universal information measure between strings for classification and causality

Revolle, Marion 25 October 2018 (has links)
Data in the form of strings (DNA, text, quantized EEG, ...) are highly varied and cannot always be modelled, so a universal description of strings, independent of any probability model, is needed. Kolmogorov complexity was introduced in the 1960s to address this issue. The idea is simple: a string is complex if no short description of it exists. Kolmogorov complexity is the algorithmic counterpart of Shannon entropy and is the foundation of algorithmic information theory. However, it is not computable in finite time, which makes it unusable in practice. Lempel and Ziv were the first to make Kolmogorov complexity operational, in 1976, by restricting the operations allowed in the description. Another approach uses the size of the string compressed by a lossless compressor. Both estimators, however, are poorly defined in the conditional and joint cases, so it is difficult to extend the Lempel-Ziv complexity or compressors to the rest of algorithmic information theory. Starting from this observation, we introduce SALZA, a new universal information measure based on the Lempel-Ziv complexity. The implementation and the sound definition of our measure allow the quantities of algorithmic information theory to be computed efficiently. Standard lossless compressors were used by Cilibrasi and Vitányi to build a very popular universal classifier, the normalized compression distance (NCD). For this application we propose our own estimator, the NSD, and show that it is a universal semi-distance between strings. The NSD outperforms the NCD by adapting naturally to a greater diversity of data and by defining the appropriate conditioning through SALZA. Using the universal prediction qualities of the Lempel-Ziv complexity, we then explore causal inference. First, the algorithmic Markov conditions are made computable thanks to SALZA. Then, defining the algorithmic directed information for the first time, we propose an algorithmic interpretation of Granger causality. We demonstrate the relevance of our approach on both synthetic and real data.
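The NCD mentioned above has a simple operational form. A minimal sketch, using Python's zlib as a stand-in lossless compressor (any real compressor, or an LZ-based estimator in the spirit of SALZA, could be substituted; the example strings are illustrative):

```python
import zlib

def c(data: bytes) -> int:
    """Compressed size in bytes, used as a rough complexity estimate."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance of Cilibrasi and Vitanyi."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Two related strings should come out closer than two unrelated ones.
a = b"ACGTACGTACGTACGT" * 8
b = b"ACGTACGTACGAACGT" * 8
r = bytes(i % 251 for i in range(256))
print(ncd(a, b), ncd(a, r))
```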
12

Contributions to arithmetic complexity and compression / Contributions à la complexité arithmétique et à la compression

Lagarde, Guillaume 05 July 2018 (has links)
This thesis explores two territories of computer science: complexity and compression. In the first part, we investigate the power of non-commutative arithmetic circuits, which compute multivariate non-commutative polynomials. To this end, we introduce several models of computation that are restricted in the way they are allowed to compute monomials. These models generalize older, widely studied ones such as algebraic branching programs. The results are of three kinds. First, we give strong lower bounds on the number of arithmetic operations needed to compute certain polynomials such as the determinant or the permanent. Second, we design deterministic polynomial-time algorithms for the white-box polynomial identity testing problem. Third, we build a bridge between automata theory and non-commutative arithmetic circuits, which allows us to derive both old and new tight lower bounds for some classes of non-commutative circuits, using a measure based on the rank of the so-called Hankel matrix from automata theory. The second part concerns the analysis of the Lempel-Ziv lossless compression algorithm. Although widely used in practice, its stability is still poorly understood. In the late 1990s, Jack Lutz popularized the following question, known as the “one-bit catastrophe”: given a compressible word, is it possible to make it incompressible by changing only a single bit? We show that such a catastrophe is indeed possible: an infinite word compressible by LZ78 can become incompressible when a single bit is added in front of it. More precisely, by giving tight bounds on the variation of the compression ratio between a finite word and its one-bit perturbation, we show that a “very compressible” word always remains compressible after a one-bit change, whereas some “mildly compressible” words do become incompressible.
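To make the compression part concrete, here is a minimal Python sketch of LZ78 parsing; the number of phrases in the parsing is the usual measure of how compressible a finite word is (the function and the toy comparison are illustrative, not code from the thesis):

```python
def lz78_phrase_count(word: str) -> int:
    """Count the phrases in the LZ78 parsing of `word`.

    Each new phrase is a previously seen phrase extended by one symbol;
    fewer phrases means the word is more compressible by LZ78.
    """
    dictionary, current, count = set(), "", 0
    for symbol in word:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            count += 1
            current = ""
    if current:          # trailing phrase that was already in the dictionary
        count += 1
    return count

w = "01" * 64
# Prepending a single bit shifts the whole parsing; the thesis quantifies
# how much the phrase count (hence the compression ratio) can change.
print(lz78_phrase_count(w), lz78_phrase_count("1" + w))
```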
13

A Dynamically Partitionable Compressed Cache

Chen, David, Peserico, Enoch, Rudolph, Larry 01 1900 (has links)
The effective size of an L2 cache can be increased by using a dictionary-based compression scheme. A naive application of this idea performs poorly, since the data values in a cache vary greatly in their “compressibility.” The novelty of this paper is a scheme that dynamically partitions the cache into sections of different compressibilities. While compression is usually studied in the context of a large stream, here it is applied repeatedly to smaller, cache-line-sized blocks so as to preserve the random-access requirement of a cache. When a cache line is brought into the L2 cache, or when a cache line is to be modified, the line is compressed using a dynamic LZW dictionary and, depending on how well it compresses, placed into the relevant partition. The partitioning is dynamic in that the ratio of space allocated to compressed and uncompressed lines varies with the observed performance. Certain SPEC-2000 benchmarks using a compressed L2 cache show an 80% reduction in L2 miss rate compared with an uncompressed L2 cache of the same area, taking into account all area overhead associated with the compression circuitry. For other SPEC-2000 benchmarks, the compressed cache performs as well, in terms of hit rate, as a traditional cache 4.3 times its size. The adaptivity ensures that, in terms of miss rate, the compressed cache never performs worse than a traditional cache. / Singapore-MIT Alliance (SMA)
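For illustration, a toy LZW encoder over a single cache-line-sized block might look like the sketch below; it ignores the fixed code widths, dictionary sharing and replacement policy that a hardware implementation such as the one described above would need:

```python
def lzw_compress(block: bytes) -> list[int]:
    """Toy LZW: return the sequence of dictionary codes for `block`."""
    dictionary = {bytes([i]): i for i in range(256)}   # start with all single bytes
    codes, current = [], b""
    for value in block:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate                 # keep extending the current match
        else:
            codes.append(dictionary[current])   # emit code for the longest match
            dictionary[candidate] = len(dictionary)
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes

line = bytes(64)                                # a 64-byte cache line of zeros
codes = lzw_compress(line)
# A line is worth placing in a compressed partition only if its encoded size
# (e.g. len(codes) * 2 bytes for 16-bit codes) beats the raw 64 bytes.
print(len(line), len(codes))
```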
14

Quantization for Low Delay and Packet Loss

Subasingha, Subasingha Shaminda 22 April 2010 (has links)
Quantization of multimodal vector data in Realtime Interactive Communication Networks (RICNs), associated with application areas such as speech, video, audio, and haptic signals, introduces a set of unique challenges. In particular, achieving the necessary distortion performance at minimum rate while maintaining low end-to-end delay and handling packet losses is of paramount importance. This dissertation presents vector quantization schemes that aim to satisfy these requirements based on two source coding paradigms: 1) predictive coding and 2) distributed source coding (DSC). Gaussian Mixture Models (GMMs) can model any probability density function (pdf) with arbitrarily small error given a sufficient number of mixture components, so they can be used effectively to model the underlying pdfs of a variety of data in RICN applications. We first present GMM Kalman predictive coding, which combines transform-domain predictive GMM quantization techniques with Kalman filtering principles. In particular, we show how suitable modeling of the quantization noise leads to a signal-adaptive GMM Kalman predictive coder with improved coding performance. Moreover, we demonstrate how running a GMM Kalman predictive coder to convergence can be used to design a stationary GMM Kalman predictive coding system that provides improved coding of GMM vector data with only a modest increase in run-time complexity over the baseline. Next, we address packet loss in the network using GMM Kalman predictive coding principles. In particular, we show how an initial GMM Kalman predictive coder can be used to obtain a robust GMM predictive coder specifically designed to operate under packet loss. We demonstrate how to define sets of encoding and decoding modes and how to design special Kalman encoding and decoding gains for each mode. In this framework, GMM predictive coding design amounts to determining the special Kalman gains that minimize the expected mean squared error at the decoder under packet loss. Finally, we present analytical techniques for modeling, analyzing and designing Wyner-Ziv (WZ) quantizers for distributed source coding of jointly Gaussian vector data with imperfect side information. In most DSC implementations the side information is not explicitly available at the decoder, so almost all practical implementations obtain it from previously decoded frames. Due to model imperfections, packet losses, previous decoding errors, and quantization noise, the available side information is usually noisy; however, the design of Wyner-Ziv quantizers for imperfect side information has not been widely addressed in the DSC literature. The analytical techniques presented in this dissertation explicitly assume imperfect side information at the decoder. Furthermore, we demonstrate how the design problem for vector data can be decomposed into independent scalar design subproblems, and we present analytical techniques to compute the optimum step size and bit allocation for each scalar quantizer so that the decoder's expected vector Mean Squared Error (MSE) is minimized. Simulation results verify that the MSE predicted by the analytical techniques closely matches the simulated performance.
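To make the predictive-coding idea concrete, the sketch below shows a deliberately simplified scalar DPCM-style coder: a fixed linear predictor plus a uniform quantizer on the residual, rather than the signal-adaptive GMM Kalman design described above; the coefficient and step size are arbitrary illustrative values:

```python
import math

def dpcm_encode(samples, a=0.9, step=0.05):
    """Toy predictive quantizer: quantize only the prediction residual.

    The decoder can rebuild the same reconstruction from the indices alone,
    because the predictor runs on reconstructed (not original) samples.
    """
    recon_prev, indices, recon = 0.0, [], []
    for x in samples:
        pred = a * recon_prev                   # linear prediction
        idx = round((x - pred) / step)          # uniform quantization of the residual
        recon_prev = pred + idx * step          # what the decoder also computes
        indices.append(idx)
        recon.append(recon_prev)
    return indices, recon

signal = [math.sin(0.1 * n) for n in range(50)]
idx, rec = dpcm_encode(signal)
print(max(abs(s - r) for s, r in zip(signal, rec)))   # error stays within about step / 2
```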
15

Slepian-Wolf coded nested quantization (SWC-NQ) for Wyner-Ziv coding: high-rate performance analysis, code design, and application to cooperative networks

Liu, Zhixin 15 May 2009 (has links)
No description available.
16

Multiterminal source coding: sum-rate loss, code designs, and applications to video sensor networks

Yang, Yang 15 May 2009 (has links)
Driven by a host of emerging applications (e.g., sensor networks and wireless video), distributed source coding (i.e., Slepian-Wolf coding, Wyner-Ziv coding and various other forms of multiterminal source coding) has recently become a very active research area. This dissertation focuses on the multiterminal (MT) source coding problem and consists of three parts. The first part studies the sum-rate loss of an important special case of quadratic Gaussian multiterminal source coding, where all sources are positively symmetric and all target distortions are equal. We first give the minimum sum rate for joint encoding of Gaussian sources in the symmetric case, and then show that the supremum of the sum-rate loss due to distributed encoding in this case is (1/2) log2(5/4) = 0.161 b/s when L = 2 and increases on the order of (√L/2) log2 e b/s as the number of terminals L goes to infinity. The supremum sum-rate loss of 0.161 b/s in the symmetric case equals that of general quadratic Gaussian two-terminal source coding without the symmetry assumption; it is conjectured that this equality holds for any number of terminals. In the second part, we present two practical MT coding schemes under the framework of Slepian-Wolf coded quantization (SWCQ) for both direct and indirect MT problems. The first, asymmetric SWCQ scheme relies on quantization and Wyner-Ziv coding, and is implemented via source splitting to achieve any point on the sum-rate bound. In the second, conceptually simpler scheme, symmetric SWCQ, the two quantized sources are compressed using symmetric Slepian-Wolf coding via a channel-code partitioning technique that can achieve any point on the Slepian-Wolf sum-rate bound. Our practical designs employ trellis-coded quantization and turbo/LDPC codes for both asymmetric and symmetric Slepian-Wolf coding. Simulation results show a gap of only 0.139-0.194 bit per sample from the sum-rate bound for both direct and indirect MT coding problems. The third part applies the above two MT coding schemes to practical sources, namely stereo video sequences, to save sum rate over independent coding of both sequences. Experiments with both schemes on stereo video sequences, using H.264 together with LDPC codes for Slepian-Wolf coding of the motion vectors and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients, give a slightly smaller sum rate than separate H.264 coding of both sequences at the same video quality.
18

Klasifikace biologických sekvencí s využitím bezeztrátové komprese / Biological sequence classification utilizing lossless data compression algorithms

Kruml, Ondřej January 2016 (has links)
This master's thesis investigates the use of lossless compression algorithms for the classification of biological sequences. A literature review of lossless compression algorithms is first presented; it led to the selection of the dictionary-based algorithm introduced by A. Lempel and J. Ziv in 1977 (LZ77). This algorithm is commonly used for data compression and is modified here so that it can classify biological sequences; further modifications are proposed that extend its classification capabilities. A collection of biological sequence datasets was assembled to allow thorough testing of the algorithm. The algorithm was compared with classical alignment-based methods using the Jukes-Cantor, Tamura and Kimura distance models, and was shown to achieve comparable classification results, even outperforming them on 20% of the datasets. It performs particularly well on sequences that are mutually distant.
19

Análise de textura em imagens baseado em medidas de complexidade / Image texture analysis based on complexity measures

Condori, Rayner Harold Montes 30 November 2015 (has links)
Texture analysis is one of the most basic and popular research areas in computer vision. It is also important in many other disciplines, such as the medical and biological sciences; detecting unhealthy tissue in lung Magnetic Resonance images, for instance, is a common texture analysis task. We propose a novel method for texture characterization based on complexity measures such as the Lyapunov exponent, the Hurst exponent and the Lempel-Ziv complexity. These measures were applied to samples taken from the images in the frequency domain. Three sampling methods were proposed: radial sampling, circular sampling and sampling by partially self-avoiding deterministic walks (CDPA sampling). Each sampling method produces, for each complexity measure, a feature vector containing a set of descriptors that characterize the processed image. Each image is therefore represented by nine feature vectors (three complexity measures over samples from three sampling methods), which are compared on texture classification tasks. Finally, we concatenate the Lempel-Ziv feature vectors from the circular and radial samplings with descriptors obtained through traditional texture analysis techniques, namely Local Binary Patterns (LBP), Gabor Wavelets (GW), the Gray Level Co-occurrence Matrix (GLCM) and partially self-avoiding deterministic walks on graphs (CDPAg). This approach was tested on three datasets, Brodatz, USPtex and UIUC, each with its own well-known challenges. The success rates of all traditional methods were increased by adding relatively few Lempel-Ziv descriptors; for LBP, for example, the rate went from 84.25% to 89.09% with the addition of only five descriptors. In fact, adding just five Lempel-Ziv descriptors is enough to improve the success rate of every traditional method studied, whereas adding too many (for example, more than 40) generally does not improve the results further. Given the similar results obtained on the three databases, we conclude that this approach can be used to increase success rates in many other texture classification tasks. Finally, the CDPA sampling also yields promising results that can be improved in future work.
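For reference, the Lempel-Ziv (1976) complexity used as one of the descriptors above can be computed with the standard Kaspar-Schuster-style counting sketched below; it assumes the sampled values have already been quantized to a small symbol alphabet, and it is not code from the dissertation:

```python
def lempel_ziv_complexity(sequence: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of `sequence`."""
    n = len(sequence)
    if n == 0:
        return 0
    complexity, prefix_len, i, k, k_max = 1, 1, 0, 1, 1
    while prefix_len + k <= n:
        if sequence[i + k - 1] == sequence[prefix_len + k - 1]:
            k += 1                        # phrase still reproducible from the prefix
        else:
            k_max = max(k, k_max)
            i += 1
            if i == prefix_len:           # no earlier start reproduces it: new phrase
                complexity += 1
                prefix_len += k_max
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    if k > 1:                             # unfinished trailing phrase
        complexity += 1
    return complexity

# Periodic samples are far less complex than irregular ones.
print(lempel_ziv_complexity("0101010101010101"),
      lempel_ziv_complexity("0110100110010110"))
```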
20

Biomechanical online signature modeling applied to verification / Modélisation biomécanique des signatures en ligne appliqué à la vérification

Coutinho Canuto, Jânio 08 December 2014 (has links)
This thesis deals with the modelling and verification of online signatures. The main theme of the first part is the biomechanical modelling of the hand movements associated with the signing gesture. A model based on the Minimum Jerk (MJ) criterion was chosen from among the available motor control theories. Next, the problem of segmenting signature trajectories into strokes that better fit the chosen kinematic model is studied, leading to an iterative segmentation method. Both the choice of model and the segmentation method are based on the trade-off between reconstruction quality and compression. In the second part, the polynomial model provided by the MJ criterion is intentionally degraded: the non-real zeros of the polynomials are discarded, and the effects of this degradation are studied from a biometric verification perspective. This degradation is equivalent to the signal processing technique known as infinity clipping, originally applied to speech signals. For signatures, as for speech, the essential information is observed to be preserved in signature verification tasks. In fact, using only the Levenshtein distance over the infinitely clipped representation, verification error rates comparable to those of more elaborate methods were obtained. Furthermore, the symbolic representation yielded by infinity clipping allows a conceptual relationship to be established between the number of segments obtained by the MJ-based iterative segmentation and the Lempel-Ziv complexity. This relationship is potentially useful for the analysis of online signature signals and for the improvement of recognition systems.
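As a rough illustration of the verification idea above, the sketch below reduces a sampled signal to a binary symbol sequence by keeping only its sign (one common reading of infinity clipping) and compares two such sequences with the Levenshtein distance; the signals and parameters are illustrative, not data or code from the thesis:

```python
import math

def infinity_clip(samples) -> str:
    """Keep only the sign of each sample: one binary symbol per sample."""
    return "".join("1" if x >= 0 else "0" for x in samples)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two symbol strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

genuine   = [math.sin(0.2 * n) for n in range(100)]
attempt   = [math.sin(0.2 * n + 0.1) for n in range(100)]   # close to the reference
unrelated = [math.sin(0.57 * n) for n in range(100)]        # different dynamics
ref = infinity_clip(genuine)
print(levenshtein(ref, infinity_clip(attempt)),
      levenshtein(ref, infinity_clip(unrelated)))
```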
