Global ETD Search

1	Lempel-Ziv Factorization Using Less Time and Space Chen, Gang 08 1900 (has links) <p> For 30 years the Lempel-Ziv factorization LZx of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. When the Internet came in, a huge need for Lempel-Ziv factorization was created. Nowadays it has become a basic efficient data transmission format on the Internet.</p> <p> Traditionally the standard method for computing LZx was based on O(n)-time processing of the suffix tree STx of x. Ukkonen's algorithm constructs suffix tree online and so permits LZ to be built from subtrees of ST; this gives it an advantage, at least in terms of space, over the fast and compact version of McCreight's STCA [37] due to Kurtz [24]. In 2000 Abouelhoda, Kurtz & Ohlebusch proposed a O(n)-time Lempel-Ziv factorization algorithm based on an "enhanced" suffix array - that is, a suffix array SAx together with other supporting data structures.</p> <p> In this thesis we first examine some previous algorithms for computing Lempel-Ziv factorization. We then analyze the rationale of development and introduce a collection of new algorithms for computing LZ-factorization. By theoretical proof and experimental comparison based on running time and storage usage, we show that our new algorithms appear either in their theoretical behavior or in practice or both to be superior to those previously proposed. In the last chapter the conclusion of our new algorithms are given, and some open problems are pointed out for our future research.</p> / Thesis / Master of Science (MSc)
2	SOME MEASURED PERFORMANCE BOUNDS AND IMPLEMENTATION CONSIDERATIONS FOR THE LEMPEL-ZIV-WELCH DATA COMPACTION ALGORITHM Jacobsen, H. D. 10 1900 (has links) International Telemetering Conference Proceedings / October 26-29, 1992 / Town and Country Hotel and Convention Center, San Diego, California / Lempel-Ziv-Welch (LZW) algorithm is a popular data compaction technique that has been adopted by CCITT in its V.42bis recommendation and is often implemented in association with the V.32 standard for 9600 bps modems. It has also been implemented as Microcom Networking Protocol (MNP) Level 7, where it goes by the name of Enhanced Data Compression. LZW compacts data by encoding frequently occurring input strings with a single output symbol. The algorithm automatically generates a string dictionary for each symbol at each end of the transmission path. The amount of compaction that can be derived with the LZW algorithm varies with the type of data being transmitted and the efficiency by which table entries can be indexed. Table indexing is usually implemented by use of a hashing table. Although some manufacturers advertise a 4-to-1 gain in throughput, this seems to be an extreme case. This paper documents a implementation of the exact ZLW algorithm. The results presented in this paper are significantly less, typically on the order of 1-to-2 for ASCII text, with substantially less compaction for pre-compacted files or files containing random bit patterns. The efficiency of the LZW algorith on ASCII text is shown to be a function of dictionary size and block size. Although fewer transmitted symbols are required for larger dictionary tables, the additional bits required for the symbol index is marginally greater than the efficiency that is gained. The net effect is that dictionary sizes beyond 2K in size are increasingly less efficient for input data block sizes of 10K or more. The author concludes that the algorithm could be implemented as a direct table look-up rather than through a hashing algorithm. This would allow the LZW to be implemented with very simple firmware and with a maximum of hardware efficiency. Data Compression (Principal Topic) Lempel-Ziv Algorithm (Primary Category)
3	Optimizing Lempel-Ziv Factorization for the GPU Architecture Ching, Bryan 01 June 2014 (has links) (PDF) Lossless data compression is used to reduce storage requirements, allowing for the relief of I/O channels and better utilization of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General purpose computing on graphic processing units (GPGPUs) allows us to take advantage of the massively parallel nature of GPUs for computations other that their original purpose of rendering graphics. Our work targets the use of GPUs for general lossless data compression. Specifically, we developed and ported an algorithm that constructs the Lempel-Ziv factorization directly on the GPU. Our implementation bypasses the sequential nature of the LZ factorization and attempts to compute the factorization in parallel. By breaking down the LZ factorization into what we call the PLZ, we are able to outperform the fastest serial CPU implementations by up to 24x and perform comparatively to a parallel multicore CPU implementation. To achieve these speeds, our implementation outputted LZ factorizations that were on average only 0.01 percent greater than the optimal solution that what could be computed sequentially. We are also able to reevaluate the fastest GPU suffix array construction algorithm, which is needed to compute the LZ factorization. We are able to find speedups of up to 5x over the fastest CPU implementations. GPU CUDA Parallelism Compression LZ77 Lempel-Ziv Computer and Systems Architecture
4	SALZA : mesure d’information universelle entre chaînes pour la classificationet l’inférence de causalité / SALZA : universal information measure between strings for classifiation and causality Revolle, Marion 25 October 2018 (has links) Les données sous forme de chaîne de symboles sont très variées (ADN, texte, EEG quantifié,…) et ne sont pas toujours modélisables. Une description universelle des chaînes de symboles indépendante des probabilités est donc nécessaire. La complexité de Kolmogorov a été introduite en 1960 pour répondre à cette problématique. Le concept est simple : une chaîne de symboles est complexe quand il n'en existe pas une description courte. La complexité de Kolmogorov est le pendant algorithmique de l’entropie de Shannon et permet de définir la théorie algorithmique de l’information. Cependant, la complexité de Kolmogorov n’est pas calculable en un temps fini ce qui la rend inutilisable en pratique.Les premiers à rendre opérationnelle la complexité de Kolmogorov sont Lempel et Ziv en 1976 qui proposent de restreindre les opérations de la description. Une autre approche est d’utiliser la taille de la chaîne compressée par un compresseur sans perte. Cependant ces deux estimateurs sont mal définis pour le cas conditionnel et le cas joint, il est donc difficile d'étendre la complexité de Lempel-Ziv ou les compresseurs à la théorie algorithmique de l’information.Partant de ce constat, nous introduisons une nouvelle mesure d’information universelle basée sur la complexité de Lempel-Ziv appelée SALZA. L’implémentation et la bonne définition de notre mesure permettent un calcul efficace des grandeurs de la théorie algorithmique de l’information.Les compresseurs sans perte usuels ont été utilisés par Cilibrasi et Vitányi pour former un classifieur universel très populaire : la distance de compression normalisée [NCD]. Dans le cadre de cette application, nous proposons notre propre estimateur, la NSD, et montrons qu’il s’agit d’une semi-distance universelle sur les chaînes de symboles. La NSD surclasse la NCD en s’adaptant naturellement à davantage de diversité des données et en définissant le conditionnement adapté grâce à SALZA.En utilisant les qualités de prédiction universelle de la complexité de Lempel-Ziv, nous explorons ensuite les questions d’inférence de causalité. Dans un premier temps, les conditions algorithmiques de Markov sont rendues calculables grâce à SALZA. Puis en définissant pour la première l’information dirigée algorithmique, nous proposons une interprétation algorithmique de la causalité de Granger algorithmique. Nous montrons, sur des données synthétiques et réelles, la pertinence de notre approche. / Data in the form of strings are varied (DNA, text, quantify EEG) and cannot always be modeled. A universal description of strings, independent of probabilities, is thus necessary. The Kolmogorov complexity was introduced in 1960 to address the issue. The principle is simple: a string is complex if a short description of it does not exist. The Kolmogorov complexity is the counterpart of the Shannon entropy and defines the algorithmic information theory. Yet, the Kolmogorov complexity is not computable in finit time making it unusable in practice.The first ones to make operational the Kolmogorov complexity are Lempel and Ziv in 1976 who proposed to restrain the operations of the description. Another approach uses the size of the compressed string by a lossless data compression algorithm. Yet these two estimators are not well-defined regarding the joint and conditional complexity cases. So, compressors and Lempel-Ziv complexity are not valuable to estimate algorithmic information theory.In the light of this observation, we introduce a new universal information measure based on the Lempel-Ziv complexity called SALZA. The implementation and the good definition of our measure allow computing efficiently values of the algorithmic information theory.Usual lossless compressors have been used by Cilibrasi and Vitányi to define a very popular universal classifier: the normalized compression distance [NCD]. As part of this application, we introduce our own estimator, called the NSD, and we show that the NSD is a universal semi-distance between strings. NSD surpasses NCD because it gets used to a large data set and uses the adapted conditioning with SALZA.Using the accurate universal prediction quality of the Lempel-Ziv complexity, we explore the question of causality inference. At first, we compute the algorithmic causal Markov condition thanks to SALZA. Then we define, for the first time, the algorithmic directed information and based on it we introduce the algorithmic Granger causality. The relevance of our approach is demonstrated on real and synthetic data. Complexité de Lempel Ziv Mesure d’information universelle Classification non supervisée Causalité Lempel-Ziv complexity Algorithmic information theory Universal information measure Unsupervised classification Causality 004 620
5	Nelinearna dinamička analiza fizičkih procesa u žiivotnoj sredini / Nonlinear dynamical analysis of the physical processes in the environment Mimić Gordan 29 September 2016 (has links) <p>Ispitivan  je  spregnut  sistem  jednačina  za  prognozu  temperature  na povr&scaron;ini  i  u  dubljem sloju zemlji&scaron;ta.  Računati  su  Ljapunovljevi eksponenti,  bifurkacioni dijagram, atraktor i analiziran je domen re&scaron;enja. Uvedene su nove informacione mere  bazirane na<br />Kolmogorovljevoj kompleksnosti,  za kvantifikaciju  stepena nasumičnosti u vremenskim serijama,.  Nove mere su primenjene na razne serije dobijene merenjem fizičkih faktora životne sredine i pomoću klimatskih modela.</p> / <p>Coupled system of prognostic equations for  the  ground surface temperature and  the deeper layer temperature was examind. Lyapunov exponents, bifurcation diagrams, attractor and the domain of solutions were analyzed.  Novel information measures based on Kolmogorov complexity  and used  for the quantification of randomness in time series, were presented.Novel measures were tested on various time series obtained by measuring physical factors of the environment or as the climate model outputs.</p>
6	Contributions to arithmetic complexity and compression / Contributions à la complexité arithmétique et à la compression Lagarde, Guillaume 05 July 2018 (has links) Cette thèse explore deux territoires distincts de l’informatique fondamentale : la complexité et la compression. Plus précisément, dans une première partie, nous étudions la puissance des circuits arithmétiques non commutatifs, qui calculent des polynômes non commutatifs en plusieurs indéterminées. Pour cela, nous introduisons plusieurs modèles de calcul, restreints dans leur manière de calculer les monômes. Ces modèles en généralisent d’autres, plus anciens et largement étudiés, comme les programmes à branchements. Les résultats sont de trois sortes. Premièrement, nous donnons des bornes inférieures sur le nombre d’opérations arithmétiques nécessaires au calcul de certains polynômes tels que le déterminant ou encore le permanent. Deuxièmement, nous concevons des algorithmes déterministes fonctionnant en temps polynomial pour résoudre le problème du test d’identité polynomiale. Enfin, nous construisons un pont entre la théorie des automates et les circuits arithmétiques non commutatifs, ce qui nous permet de dériver de nouvelles bornes inférieures en utilisant une mesure reposant sur le rang de la matrice dite de Hankel, provenant de la théorie des automates. Une deuxième partie concerne l’analyse de l’algorithme de compression sans perte Lempel-Ziv. Pourtant très utilisé, sa stabilité est encore mal établie. Vers la fin des années 90s, Jack Lutz popularise la question suivante, connue sous le nom de « one-bit catastrophe » : « étant donné un mot compressible, est-il possible de le rendre incompressible en ne changeant qu’un seul bit ? ». Nous montrons qu’une telle catastrophe est en effet possible. Plus précisément, en donnant des bornes optimales sur la variation de la taille de la compression, nous montrons qu’un mot « très compressible » restera toujours compressible après modification d’un bit, mais que certains mots « peu compressibles » deviennent en effet incompressibles. / This thesis explores two territories of computer science: complexity and compression. More precisely, in a first part, we investigate the power of non-commutative arithmetic circuits, which compute multivariate non-commutative polynomials. For that, we introduce various models of computation that are restricted in the way they are allowed to compute monomials. These models generalize previous ones that have been widely studied, such as algebraic branching programs. The results are of three different types. First, we give strong lower bounds on the number of arithmetic operations needed to compute some polynomials such as the determinant or the permanent. Second, we design some deterministic polynomial-time algorithm to solve the white-box polynomial identity problem. Third, we exhibit a link between automata theory and non-commutative arithmetic circuits that allows us to derive some old and new tight lower bounds for some classes of non-commutative circuits, using a measure based on the rank of a so-called Hankel matrix. A second part is concerned with the analysis of the data compression algorithm called Lempel-Ziv. Although this algorithm is widely used in practice, we know little about its stability. Our main result is to show that an infinite word compressible by LZ’78 can become incompressible by adding a single bit in front of it, thus closing a question proposed by Jack Lutz in the late 90s under the name “one-bit catastrophe”. We also give tight bounds on the maximal possible variation between the compression ratio of a finite word and its perturbation—when one bit is added in front of it. Complexité algébrique Bornes inférieures Test d’identité polynomiale Circuits arithmétiques Lempel-Ziv Compression Algebraic complexity Lower bounds Polynomial identity testing Arithmetic circuits Lempel-Ziv Compression
7	A Dynamically Partitionable Compressed Cache Chen, David, Peserico, Enoch, Rudolph, Larry 01 1900 (has links) The effective size of an L2 cache can be increased by using a dictionary-based compression scheme. Naive application of this idea performs poorly since the data values in a cache greatly vary in their “compressibility.” The novelty of this paper is a scheme that dynamically partitions the cache into sections of different compressibilities. While compression is often researched in the context of a large stream, in this work it is applied repeatedly on smaller cache-line sized blocks so as to preserve the random access requirement of a cache. When a cache-line is brought into the L2 cache or the cache-line is to be modified, the line is compressed using a dynamic, LZW dictionary. Depending on the compression, it is placed into the relevant partition. The partitioning is dynamic in that the ratio of space allocated to compressed and uncompressed varies depending on the actual performance, Certain SPEC-2000 benchmarks using a compressed L2 cache show an 80reduction in L2 miss-rate when compared to using an uncompressed L2 cache of the same area, taking into account all area overhead associated with the compression circuitry. For other SPEC-2000 benchmarks, the compressed cache performs as well as a traditional cache that is 4.3 times as large as the compressed cache in terms of hit rate, The adaptivity ensures that, in terms of miss rates, the compressed cache never performs worse than a traditional cache. / Singapore-MIT Alliance (SMA) dictionary-based compression Lempel-Ziv-Welch dictionary Partitioned Compressed Cache clock-scheme
8	Klasifikace biologických sekvencí s využitím bezeztrátové komprese / Biological sequence classification utilizing lossless data compression algorithms Kruml, Ondřej January 2016 (has links) Tato diplomová práce se zabývá možností využití bezeztrátových kompresních algoritmů ke klasifikaci biologických sekvencí. Nejdříve je představena literární rešerše o bezeztrátových kompresních algoritmech, která byla využita k výběru slovníkového algoritmu vytvořeného A. Lempelem a J. Zivem v roce 1976 (LZ77). Tento algoritmus je běžně používán k datové kompresi a v předkládané práci byl modifikován tak, aby umožnil klasifikaci biologických sekvencí. K algoritmu byly navrženy další modifikace, které rozvíjí jeho klasifikační možnosti. V průběhu práce byla sestavena sada datasetů biologických sekvencí, která umožnila podrobné testování algoritmu. Algoritmus byl porovnán s klasickými zarovnávacími metodami: Jukes-Cantor, Tamura a Kimura. Bylo ukázáno, že algoritmus dosahuje srovnatelných výsledků v oblasti klasifikace biologických sekvencí a dokonce je u 20% datasetů překonává. Lepší výsledky dosahuje zejména u sekvencí, jež jsou si vzájemně vzdálené.
9	Análise de textura em imagens baseado em medidas de complexidade / Image Texture Analysis based on complex measures Condori, Rayner Harold Montes 30 November 2015 (has links) A análise de textura é uma das mais básicas e famosas áreas de pesquisa em visão computacional. Ela é também de grande importância em muitas outras disciplinas, tais como ciências médicas e biológicas. Por exemplo, uma tarefa comum de análise de textura é a detecção de tecidos não saudáveis em imagens de Ressonância Magnética do pulmão. Nesta dissertação, nós propomos um método novo de caracterização de textura baseado nas medidas de complexidade tais como o expoente de Hurst, o expoente de Lyapunov e a complexidade de Lempel-Ziv. Estas medidas foram aplicadas sobre amostras de imagens no espaço de frequência. Três métodos de amostragem foram propostas, amostragem: radial, circular e por caminhadas determinísticas parcialmente auto- repulsivas (amostragem CDPA). Cada método de amostragem produz um vetor de características por medida de complexidade aplicada. Esse vetor contem um conjunto de descritores que descrevem a imagem processada. Portanto, cada imagem será representada por nove vetores de características (três medidas de complexidade e três métodos de amostragem), os quais serão comparados na tarefa de classificação de texturas. No final, concatenamos cada vetor de características conseguido calculando a complexidade de Lempel-Ziv em amostras radiais e circulares com os descritores obtidos através de técnicas de análise de textura tradicionais, tais como padrões binários locais (LBP), wavelets de Gabor (GW), matrizes de co-ocorrência en níveis de cinza (GLCM) e caminhadas determinísticas parcialmente auto-repulsivas em grafos (CDPAg). Este enfoque foi testado sobre três bancos de imagens: Brodatz, USPtex e UIUC, cada um com seus próprios desafios conhecidos. As taxas de acerto de todos os métodos tradicionais foram incrementadas com a concatenação de relativamente poucos descritores de Lempel-Ziv. Por exemplo, no caso do método LBP, o incremento foi de 84.25% a 89.09% com a concatenação de somente cinco descritores. De fato, simplesmente concatenando cinco descritores são suficientes para ver um incremento na taxa de acerto de todos os métodos tradicionais estudados. Por outro lado, a concatenação de un número excessivo de descritores de Lempel-Ziv (por exemplo mais de 40) geralmente não leva a melhora. Neste sentido, vendo os resultados semelhantes obtidos nos três bancos de imagens analisados, podemos concluir que o método proposto pode ser usado para incrementar as taxas de acerto em outras tarefas que envolvam classificação de texturas. Finalmente, com a amostragem CDPA também se obtém resultados significativos, que podem ser melhorados em trabalhos futuros. / Texture analysis is one of the basic and most popular computer vision research areas. It is also of importance in many other disciplines, such as medical sciences and biology. For example, non-healthy tissue detection in lung Magnetic Resonance images is a common texture analysis task. We proposed a novel method for texture characterization based on complexity measures such as Lyapunov exponent, Hurst exponent and Lempel-Ziv complexity. This measurements were applied over samples taken from images in the frequency domain. Three types of sampling methods were proposed: radial sampling, circular sampling and sampling by using partially self-avoiding deterministic walks (CDPA sampling). Each sampling method produce a feature vector which contains a set of descriptors that characterize the processed image. Then, each image will be represented by nine feature vectors which are means to be compared in texture classification tasks (three complexity measures over samples from three sampling methods). In the end, we combine each Lempel-Ziv feature vector from the circular and radial sampling with descriptors obtained through traditional image analysis techniques, such as Local Binary Patterns (LBP), Gabor Wavelets (GW), Gray Level Co-occurrence Matrix (GLCM) and Self-avoiding Deterministic Walks in graphs (CDPAg). This approach were tested in three datasets: Brodatz, USPtex and UIUC, each one with its own well-known challenges. All traditional methods success rates were increased by adding relatively few Lempel-Ziv descriptors. For example in the LBP case the increment went from 84.25% to 89.09% with the addition of only five descriptors. In fact, just adding five Lempel-Ziv descriptors are enough to see an increment in the success rate of every traditional method. However, adding too many Lempel-Ziv descriptors (for example more than 40) generally doesnt produce better results. In this sense, seeing the similar results we obtain in all three databases, we conclude that this approach may be used to increment the success rate in a lot of others texture classification tasks. Finally, the CDPA sampling also obtain very promising results that we can improve further on future works. Análise de textura Complexidade de Lempel-Ziv complexity measurement Expoente de Hurst Fourier Transform Hurst exponent Lempel-Ziv complexity Medidas de complexidade Texture analysis Transformada de Fourier
10	Biomechanical online signature modeling applied to verification / Modélisation biomécanique des signatures en ligne appliqué à la vérification Coutinho Canuto, Jânio 08 December 2014 (has links) Cette thèse porte sur la modélisation et vérification des signatures en ligne. La première partie a pour thème principal la modélisation biomécanique des mouvements de la main. Un modèle basé sur le critère de Minimum de Secousse (MS) a été choisi parmi plusieurs théories du contrôle moteur. Ensuite, le problème de la segmentation des trajectoires en traits qui correspondent au modèle cinématique choisi a été étudié, ce qui a conduit à la mise au point d'une méthode de segmentation itérative. Le choix du modèle et de la méthode de segmentation sont basé sur le compromis entre la qualité de reconstruction et la compression. Dans la deuxième partie, le modèle polynomial issu du critère de MS est volontairement dégradé. Les zéros non-Réels des polynômes sont jetés et les effets de cette dégradation sont étudiés dans une perspective de vérification biométrique. Cette dégradation est équivalente à la technique connue sous le nom d’Infinity Clipping, initialement appliqué à des signaux de parole. Pour les signatures en ligne, comme pour la parole, la préservation de l'information essentielle a été observée sur des tâches de vérification de signature. En fait, en utilisant seulement la distance de Levenshtein sur la représentation dégradée, un taux d'erreur comparable à ceux des méthodes plus élaborées a été obtenu. En outre, la représentation symbolique issue de l’Infinity Clipping permet d’établir une relation conceptuelle entre le nombre de segments obtenus par la segmentation itératif basée sur le MS et la complexité de Lempel-Ziv. Cette relation est potentiellement utile pour l'analyse des signatures en ligne et pour l’amélioration des systèmes de reconnaissance / This thesis deals with the modelling and verification of online signatures. The first part has as main theme the biomechanical modelling of hand movements associated to the signing gesture. A model based on the Minimum Jerk (MJ) criterion was chosen amongst the several available motor control theories. Next, the problem of signature trajectory segmentation into strokes that better fit the chosen kinematic model is studied, leading to the development of an iterative segmentation method. Both the choice of the model and the segmentation method are strongly based on the tradeoff between reconstruction quality and compression. On the second part, the polynomial model provided by the MJ criterion is intentionally degraded. The non-Real zeroes of the polynomials are discarded and the effects of this degradation are studied from a biometric verification perspective. This degradation is equivalent to the signal processing technique known as Infinity Clipping, originally applied to speech signals. On signatures, as for speech, the preservation of essential information was observed on signature verification tasks. As a matter of fact, using only the Levenshtein distance over the infinitely clipped representation, verification error rates comparable to those of more elaborate methods were obtained. Furthermore, the symbolic representation yielded by the infinity clipping technique allows for a conceptual relationship between the number of polynomial segments obtained through the Minimum Jerk-Based iterative segmentation and the Lempel-Ziv complexity. This relationship is potentially useful for the analysis of online signature signals and the improvement of recognition systems Biométrique Vérification des signatures Modélisation Jerk minimum Infinity clipping Complexité de Lempel-Ziv Représentation symbolique Génération de signatures Biometrics Signature verification Modeling Minimum jerk Infinity clipping Lempel-Ziv complexity Symbolic representation Signature generation

Search results