Global ETD Search

1	Higher Compression from the Burrows-Wheeler Transform with New Algorithms for the List Update Problem Chapin, Brenton 08 1900 (has links) Burrows-Wheeler compression is a three stage process in which the data is transformed with the Burrows-Wheeler Transform, then transformed with Move-To-Front, and finally encoded with an entropy coder. Move-To-Front, Transpose, and Frequency Count are some of the many algorithms used on the List Update problem. In 1985, Competitive Analysis first showed the superiority of Move-To-Front over Transpose and Frequency Count for the List Update problem with arbitrary data. Earlier studies due to Bitner assumed independent identically distributed data, and showed that while Move-To-Front adapts to a distribution faster, incurring less overwork, the asymptotic costs of Frequency Count and Transpose are less. The improvements to Burrows-Wheeler compression this work covers are increases in the amount, not speed, of compression. Best x of 2x-1 is a new family of algorithms created to improve on Move-To-Front's processing of the output of the Burrows-Wheeler Transform which is like piecewise independent identically distributed data. Other algorithms for both the middle stage of Burrows-Wheeler compression and the List Update problem for which overwork, asymptotic cost, and competitive ratios are also analyzed are several variations of Move One From Front and part of the randomized algorithm Timestamp. The Best x of 2x - 1 family includes Move-To-Front, the part of Timestamp of interest, and Frequency Count. Lastly, a greedy choosing scheme, Snake, switches back and forth as the amount of compression that two List Update algorithms achieves fluctuates, to increase overall compression. The Burrows-Wheeler Transform is based on sorting of contexts. The other improvements are better sorting orders, such as “aeioubcdf...” instead of standard alphabetical “abcdefghi...” on English text data, and an algorithm for computing orders for any data, and Gray code sorting instead of standard sorting. Both techniques lessen the overwork incurred by whatever List Update algorithms are used by reducing the difference between adjacent sorted contexts. Data compression (Computer science) Burrows-Wheeler data compression list update
2	Compression et indexation de séquences annotées / Compressing and indexing labeled sequences Rocher, Tatiana 12 February 2018 (has links) Cette thèse en algorithmique du texte étudie la compression, l'indexation et les requêtes sur un texte annoté. Un texte annoté est un texte sur lequel nous ajoutons des informations. Ce peut être par exemple une recombinaison V(D)J, un marqueur de globules blancs, où le texte est une séquence ADN et les annotations sont des noms de gènes. Le système immunitaire d'une personne se représente par un ensemble de recombinaisons V(D)J. Avec le séquençage à haut débit, on peut avoir accès à des millions de recombinaisons V(D)J qui sont stockées et doivent pouvoir être retrouvées et comparées rapidement.La première contribution de ce manuscrit est une méthode de compression d'un texte annoté qui repose sur le principe du stockage par références. Le texte est découpé en facteurs pointant vers les séquences annotées déjà connues. La seconde contribution propose deux index pour un texte annoté. Ils utilisent une transformée de Burrows-Wheeler indexant le texte ainsi qu'un Wavelet Tree stockant les annotations. Ces index permettent des requêtes efficaces sur le texte, les annotations ou les deux. Nous souhaitons à terme utiliser l'un de ces index pour indexer des recombinaisons V(D)J obtenues dans des services d'hématologie lors du diagnostic et du suivi de patients atteints de leucémie. / This thesis in text algorithm studies the compression, indexation and querying on a labeled text. A labeled text is a text to which we add information. For example: a V(D)J recombination, a marker for lymphocytes, where the text is a DNA sequence and the labels are the genes' names. A person's immune system can be represented with a set of V(D)J recombinations. With high-throughput sequencing, we have access to millions of V(D)J recombinations which are stored and need to be recovered and compared quickly.The first contribution of this manuscript is a compression method for a labeled text which uses the concept of storage by references. The text is divided into sections which point to pre-established labeled sequences. The second contribution offers two indexes for a labeled text. Both use a Burrows-Wheeler transform to index the text and a Wavelet Tree to index the labels. These indexes allow efficient queries on text, labels or both. We would like to use one of these indexes on V(D)J recombinations which are obtained with hematology services from the diagnostic or follow-up of patients suffering from leukemia. Algorithmique du texte Indexation de texte Transformée de Burrows-Wheeler Wavelet Tree 005.741
3	Experiments in Compressing Wikipedia Wotschka, Marco January 2013 (has links) No description available. Computer Science BWT Burrows-Wheeler PPM compression Wikipedia
4	Lossless and nearly-lossless image compression based on combinatorial transforms / Compression d'images sans perte ou quasi sans perte basée sur des transformées combinatoires Syahrul, Elfitrin 29 June 2011 (has links) Les méthodes classiques de compression d’image sont communément basées sur des transformées fréquentielles telles que la transformée en Cosinus Discret (DCT) ou encore la transformée discrète en ondelettes. Nous présentons dans ce document une méthode originale basée sur une transformée combinatoire celle de Burrows-Wheeler(BWT). Cette transformée est à la base d’un réagencement des données du fichier servant d’entrée au codeur à proprement parler. Ainsi après utilisation de cette méthode sur l’image originale, les probabilités pour que des caractères identiques initialement éloignés les uns des autres se retrouvent côte à côte sont alors augmentées. Cette technique est utilisée pour la compression de texte, comme le format BZIP2 qui est actuellement l’un des formats offrant un des meilleurs taux de compression. La chaîne originale de compression basée sur la transformée de Burrows-Wheeler est composée de 3 étapes. La première étape est la transformée de Burrows-Wheeler elle même qui réorganise les données de façon à regrouper certains échantillons de valeurs identiques. Burrows et Wheeler conseillent d’utiliser un codage Move-To-Front (MTF) qui va maximiser le nombre de caractères identiques et donc permettre un codage entropique (EC) (principalement Huffman ou un codeur arithmétique). Ces deux codages représentent les deux dernières étapes de la chaîne de compression. Nous avons étudié l’état de l’art et fait des études empiriques de chaînes de compression basées sur la transformée BWT pour la compression d’images sans perte. Les données empiriques et les analyses approfondies se rapportant aux plusieurs variantes de MTF et EC. En plus, contrairement à son utilisation pour la compression de texte,et en raison de la nature 2D de l’image, la lecture des données apparaît importante. Ainsi un prétraitement est utilisé lors de la lecture des données et améliore le taux de compression. Nous avons comparé nos résultats avec les méthodes de compression standards et en particulier JPEG 2000 et JPEG-LS. En moyenne le taux de com-pression obtenu avec la méthode proposée est supérieur à celui obtenu avec la norme JPEG 2000 ou JPEG-LS / Common image compression standards are usually based on frequency transform such as Discrete Cosine Transform or Wavelets. We present a different approach for loss-less image compression, it is based on combinatorial transform. The main transform is Burrows Wheeler Transform (BWT) which tends to reorder symbols according to their following context. It becomes a promising compression approach based on contextmodelling. BWT was initially applied for text compression software such as BZIP2 ; nevertheless it has been recently applied to the image compression field. Compression scheme based on Burrows Wheeler Transform is usually lossless ; therefore we imple-ment this algorithm in medical imaging in order to reconstruct every bit. Many vari-ants of the three stages which form the original BWT-based compression scheme can be found in the literature. We propose an analysis of the more recent methods and the impact of their association. Then, we present several compression schemes based on this transform which significantly improve the current standards such as JPEG2000and JPEG-LS. In the final part, we present some open problems which are also further research directions Transformé de Burrows-Wheeler Compression sans perte et quasi sans Burrows-Wheeler Transform (BWT) Lossless (nearly lossless) image 006.6 005.7
5	[en] THE BURROWS-WHEELER TRANSFORM AND ITS APPLICATIONS TO COMPRESSION / [pt] A TRANSFORMADA DE BURROWS-WHEELER E SUA APLICAÇÃO À COMPRESSÃO JULIO CESAR DUARTE 23 July 2003 (has links) [pt] A transformada de Burrows-Wheeler, baseada na ordenação de contextos, transforma uma seqüência de caracteres em uma nova seqüência mais facilmente comprimida por um algoritmo que explore grandes seqüências de repetições de caracteres. Aliado a recodificação do MoverParaFrente e seguida de uma codificação para os inteiros gerados, eles formam uma nova família de compressores, que possuem excelentes taxas de compressão, com boas performances nos tempos de compressão e descompressão. Este trabalho examina detalhadamente essa transformada, suas variações e algumas alternativas para os algoritmos utilizados em conjunto com ela. Como resultado final, apresentamos uma combinação de estratégias que produz taxas de compressão para texto melhores do que as oferecidas pelas implementações até aqui disponíveis. / [en] The Burrows-Wheeler Transform, based on sorting of contexts, transforms a sequence of characters into a new sequence easier to compress by an algorithm that exploits long sequences of repeted characters. Combined with the coding provided by the MoveToFront Algorithm and followed by a codification for the generated integers, they propose a new family of compressors, that achieve excellent compression rates with good time performances in compression and decompression. This work examines detaildedly this transform, its variations and some alternatives for the algorithms used together with it. As a final result, we present a combination of strategies that producescompression rates for text data that are better than those offered by implementations available nowadays. [pt] TEORIA DA INFORMACAO [en] INFORMATION THEORY [pt] COMPRESSAO DE DADOS [en] DATA COMPRESSION [pt] TRANSFORMADA DE BURROWS-WHEELER [en] BURROWS-WHEELER TRANSFORM [pt] MOVERPARAFRENTE [en] MOVETOFRONT [pt] CODIFICACAO DE INTEIROS [en] INTEGER CODING
6	Lossless and nearly-lossless image compression based on combinatorial transforms Syahrul, Elfitrin 29 June 2011 (has links) (PDF) Common image compression standards are usually based on frequency transform such as Discrete Cosine Transform or Wavelets. We present a different approach for loss-less image compression, it is based on combinatorial transform. The main transform is Burrows Wheeler Transform (BWT) which tends to reorder symbols according to their following context. It becomes a promising compression approach based on contextmodelling. BWT was initially applied for text compression software such as BZIP2 ; nevertheless it has been recently applied to the image compression field. Compression scheme based on Burrows Wheeler Transform is usually lossless ; therefore we imple-ment this algorithm in medical imaging in order to reconstruct every bit. Many vari-ants of the three stages which form the original BWT-based compression scheme can be found in the literature. We propose an analysis of the more recent methods and the impact of their association. Then, we present several compression schemes based on this transform which significantly improve the current standards such as JPEG2000and JPEG-LS. In the final part, we present some open problems which are also further research directions [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Burrows-Wheeler Transform (BWT) Lossless (nearly lossless) image
7	The analysis of enumerative source codes and their use in Burrows‑Wheeler compression algorithms McDonald, Andre Martin 10 September 2010 (has links) In the late 20th century the reliable and efficient transmission, reception and storage of information proved to be central to the most successful economies all over the world. The Internet, once a classified project accessible to a selected few, is now part of the everyday lives of a large part of the human population, and as such the efficient storage of information is an important part of the information economy. The improvement of the information storage density of optical and electronic media has been remarkable, but the elimination of redundancy in stored data and the reliable reconstruction of the original data is still a desired goal. The field of source coding is concerned with the compression of redundant data and its reliable decompression. The arithmetic source code, which was independently proposed by J. J. Rissanen and R. Pasco in 1976, revolutionized the field of source coding. Compression algorithms that use an arithmetic code to encode redundant data are typically more effective and computationally more efficient than compression algorithms that use earlier source codes such as extended Huffman codes. The arithmetic source code is also more flexible than earlier source codes, and is frequently used in adaptive compression algorithms. The arithmetic code remains the source code of choice, despite having been introduced more than 30 years ago. The problem of effectively encoding data from sources with known statistics (i.e. where the probability distribution of the source data is known) was solved with the introduction of the arithmetic code. The probability distribution of practical data is seldomly available to the source encoder, however. The source coding of data from sources with unknown statistics is a more challenging problem, and remains an active research topic. Enumerative source codes were introduced by T. J. Lynch and L. D. Davisson in the 1960s. These lossless source codes have the remarkable property that they may be used to effectively encode source sequences from certain sources without requiring any prior knowledge of the source statistics. One drawback of these source codes is the computationally complex nature of their implementations. Several years after the introduction of enumerative source codes, J. G. Cleary and I. H. Witten proved that approximate enumerative source codes may be realized by using an arithmetic code. Approximate enumerative source codes are significantly less complex than the original enumerative source codes, but are less effective than the original codes. Researchers have become more interested in arithmetic source codes than enumerative source codes since the publication of the work by Cleary and Witten. This thesis concerns the original enumerative source codes and their use in Burrows–Wheeler compression algorithms. A novel implementation of the original enumerative source code is proposed. This implementation has a significantly lower computational complexity than the direct implementation of the original enumerative source code. Several novel enumerative source codes are introduced in this thesis. These codes include optimal fixed–to–fixed length source codes with manageable computational complexity. A generalization of the original enumerative source code, which includes more complex data sources, is proposed in this thesis. The generalized source code uses the Burrows–Wheeler transform, which is a low–complexity algorithm for converting the redundancy of sequences from complex data sources to a more accessible form. The generalized source code effectively encodes the transformed sequences using the original enumerative source code. It is demonstrated and proved mathematically that this source code is universal (i.e. the code has an asymptotic normalized average redundancy of zero bits). AFRIKAANS : Die betroubare en doeltreffende versending, ontvangs en berging van inligting vorm teen die einde van die twintigste eeu die kern van die mees suksesvolle ekonomie¨e in die wˆereld. Die Internet, eens op ’n tyd ’n geheime projek en toeganklik vir slegs ’n klein groep verbruikers, is vandag deel van die alledaagse lewe van ’n groot persentasie van die mensdom, en derhalwe is die doeltreffende berging van inligting ’n belangrike deel van die inligtingsekonomie. Die verbetering van die bergingsdigteid van optiese en elektroniese media is merkwaardig, maar die uitwissing van oortolligheid in gebergde data, asook die betroubare herwinning van oorspronklike data, bly ’n doel om na te streef. Bronkodering is gemoeid met die kompressie van oortollige data, asook die betroubare dekompressie van die data. Die rekenkundige bronkode, wat onafhanklik voorgestel is deur J. J. Rissanen en R. Pasco in 1976, het ’n revolusie veroorsaak in die bronkoderingsveld. Kompressiealgoritmes wat rekenkundige bronkodes gebruik vir die kodering van oortollige data is tipies meer doeltreffend en rekenkundig meer effektief as kompressiealgoritmes wat vroe¨ere bronkodes, soos verlengde Huffman kodes, gebruik. Rekenkundige bronkodes, wat gereeld in aanpasbare kompressiealgoritmes gebruik word, is ook meer buigbaar as vroe¨ere bronkodes. Die rekenkundige bronkode bly na 30 jaar steeds die bronkode van eerste keuse. Die probleem om data wat afkomstig is van bronne met bekende statistieke (d.w.s. waar die waarskynlikheidsverspreiding van die brondata bekend is) doeltreffend te enkodeer is opgelos deur die instelling van rekenkundige bronkodes. Die bronenkodeerder het egter selde toegang tot die waarskynlikheidsverspreiding van praktiese data. Die bronkodering van data wat afkomstig is van bronne met onbekende statistieke is ’n groter uitdaging, en bly steeds ’n aktiewe navorsingsveld. T. J. Lynch and L. D. Davisson het tel–bronkodes in die 1960s voorgestel. Tel– bronkodes het die merkwaardige eienskap dat bronsekwensies van sekere bronne effektief met hierdie foutlose kodes ge¨enkodeer kan word, sonder dat die bronenkodeerder enige vooraf kennis omtrent die statistieke van die bron hoef te besit. Een nadeel van tel–bronkodes is die ho¨e rekenkompleksiteit van hul implementasies. J. G. Cleary en I. H. Witten het verskeie jare na die instelling van tel–bronkodes bewys dat benaderde tel–bronkodes gerealiseer kan word deur die gebruik van rekenkundige bronkodes. Benaderde tel–bronkodes het ’n laer rekenkompleksiteit as tel–bronkodes, maar benaderde tel–bronkodes is minder doeltreffend as die oorspronklike tel–bronkodes. Navorsers het sedert die werk van Cleary en Witten meer belangstelling getoon in rekenkundige bronkodes as tel–bronkodes. Hierdie tesis is gemoeid met die oorspronklike tel–bronkodes en die gebruik daarvan in Burrows–Wheeler kompressiealgoritmes. ’n Nuwe implementasie van die oorspronklike tel–bronkode word voorgestel. Die voorgestelde implementasie het ’n beduidende laer rekenkompleksiteit as die direkte implementasie van die oorspronklike tel–bronkode. Verskeie nuwe tel–bronkodes, insluitende optimale vaste–tot–vaste lengte tel–bronkodes met beheerbare rekenkompleksiteit, word voorgestel. ’n Veralgemening van die oorspronklike tel–bronkode, wat meer komplekse databronne insluit as die oorspronklike tel–bronkode, word voorgestel in hierdie tesis. The veralgemeende tel–bronkode maak gebruik van die Burrows–Wheeler omskakeling. Die Burrows–Wheeler omskakeling is ’n lae–kompleksiteit algoritme wat die oortolligheid van bronsekwensies wat afkomstig is van komplekse databronne omskakel na ’n meer toeganklike vorm. Die veralgemeende bronkode enkodeer die omgeskakelde sekwensies effektief deur die oorspronklike tel–bronkode te gebruik. Die universele aard van hierdie bronkode word gedemonstreer en wiskundig bewys (d.w.s. dit word bewys dat die kode ’n asimptotiese genormaliseerde gemiddelde oortolligheid van nul bisse het). Copyright / Dissertation (MEng)--University of Pretoria, 2010. / Electrical, Electronic and Computer Engineering / unrestricted Source coding Burrows wheeler Enumerative Source code Compression Fixed length code Universal code UCTD
8	Inexact Mapping of Short Biological Sequences in High Performance Computational Environments Salavert Torres, José 30 October 2014 (has links) La bioinformática es la aplicación de las ciencias computacionales a la gestión y análisis de datos biológicos. A partir de 2005, con la aparición de los secuenciadores de ADN de nueva generación surge lo que se conoce como Next Generation Sequencing o NGS. Un único experimento biológico puesto en marcha en una máquina de secuenciación NGS puede producir fácilmente cientos de gigabytes o incluso terabytes de datos. Dependiendo de la técnica elegida este proceso puede realizarse en unas pocas horas o días. La disponibilidad de recursos locales asequibles, tales como los procesadores multinúcleo o las nuevas tarjetas gráfi cas preparadas para el cálculo de propósito general GPGPU (General Purpose Graphic Processing Unit ), constituye una gran oportunidad para hacer frente a estos problemas. En la actualidad, un tema abordado con frecuencia es el alineamiento de secuencias de ADN. En bioinformática, el alineamiento permite comparar dos o más secuencias de ADN, ARN, o estructuras primarias proteicas, resaltando sus zonas de similitud. Dichas similitudes podrían indicar relaciones funcionales o evolutivas entre los genes o proteínas consultados. Además, la existencia de similitudes entre las secuencias de un individuo paciente y de otro individuo con una enfermedad genética detectada podría utilizarse de manera efectiva en el campo de la medicina diagnóstica. El problema en torno al que gira el desarrollo de la tesis doctoral consiste en la localización de fragmentos de secuencia cortos dentro del ADN. Esto se conoce bajo el sobrenombre de mapeo de secuencia o sequence mapping. Dicho mapeo debe permitir errores, pudiendo mapear secuencias incluso existiendo variabilidad genética o errores de lectura en el mapeo. Existen diversas técnicas para abordar el mapeo, pero desde la aparición de la NGS destaca la búsqueda por pre jos indexados y agrupados mediante la transformada de Burrows-Wheeler [28] (o BWT en lo sucesivo). Dicha transformada se empleó originalmente en técnicas de compresión de datos, como es el caso del algoritmo bzip2. Su utilización como herramienta para la indización y búsqueda posterior de información es más reciente [22]. La ventaja es que su complejidad computacional depende únicamente de la longitud de la secuencia a mapear. Por otra parte, una gran cantidad de técnicas de alineamiento se basan en algoritmos de programación dinámica, ya sea Smith-Watterman o modelos ocultos de Markov. Estos proporcionan mayor sensibilidad, permitiendo mayor cantidad de errores, pero su coste computacional es mayor y depende del tamaño de la secuencia multiplicado por el de la cadena de referencia. Muchas herramientas combinan una primera fase de búsqueda con la BWT de regiones candidatas al alineamiento y una segunda fase de alineamiento local en la que se mapean cadenas con Smith-Watterman o HMM. Cuando estamos mapeando permitiendo pocos errores, una segunda fase con un algoritmo de programación dinámica resulta demasiado costosa, por lo que una búsqueda inexacta basada en BWT puede resultar más e ficiente. La principal motivación de la tesis doctoral es la implementación de un algoritmo de búsqueda inexacta basado únicamente en la BWT, adaptándolo a las arquitecturas paralelas modernas, tanto en CPU como en GPGPU. El algoritmo constituirá un método nuevo de rami cación y poda adaptado a la información genómica. Durante el periodo de estancia se estudiarán los Modelos ocultos de Markov y se realizará una implementación sobre modelos de computación funcional GTA (Aggregate o Test o Generate), así como la paralelización en memoria compartida y distribuida de dicha plataforma de programación funcional. / Salavert Torres, J. (2014). Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721 Inexact mapping Backward search BWT Burrows-Wheeler Transform Suffix Array GPGPU GPU
9	Akcelerace Burrows-Wheelerovy transformace s využitím GPU / Acceleration of Burrows-Wheeler Transform Using GPU Zahradníček, Tomáš January 2019 (has links) This thesis deals with Burrows-Wheeler transform (BWT) and possibilities of acceleration of this transform on graphics processing unit (GPU). Methods of compression based on BWT are introduced, as well as software libraries CUDA and OpenCL for writing programs for GPU. Parallel variants of BWT are implemented, as well as following steps necessary for compression, using CUDA library. Amount of compression of used approaches are tested and parallel versions are compared to their sequential counterparts.
10	Implementace statistických kompresních metod / Implementation of Statistical Compression Methods Štys, Jiří January 2013 (has links) This thesis describes Burrow-Wheeler compression algorithm. It focuses on each part of Burrow-Wheeler algorithm, most of all on and entropic coders. In section are described methods like move to front, inverse frequences, interval coding, etc. Among the described entropy coders are Huffman, arithmetic and Rice-Golomg coders. In conclusion there is testing of described methods of global structure transformation and entropic coders. Best combinations are compared with the most common compress algorithm.

Search results