301 |
Information retrieval via universal source coding. Bae, Soo Hyun, 17 November 2008.
This dissertation explores the intersection of information retrieval and universal source coding techniques and studies an optimal multidimensional source representation from an information-theoretic point of view. Previous research on information retrieval has focused primarily on learning probabilistic or deterministic source models from two types of source representations, namely fixed-shape partitions or uniform regions. We study the limitations of these conventional representations in capturing the semantics of multidimensional source sequences and propose a new type of primitive source representation generated by a universal source coding technique. We propose a multidimensional incremental parsing algorithm, extended from Lempel-Ziv incremental parsing, together with three component schemes for multidimensional source coding. The properties of the proposed coding algorithm are examined under two-dimensional lossless and lossy source coding. With the proposed algorithm, a given multidimensional source sequence is parsed into a number of variable-size patches. We call this methodology a parsed representation.
Based on this source representation, we propose an information retrieval framework that analyzes a set of source sequences with a linguistic processing technique, and we implement content-based image retrieval systems on top of it. We examine the relevance of the proposed source representation by comparing it with conventional representations of visual information. To further extend the framework, we apply a probabilistic linguistic processing technique to model the latent aspects of a set of documents. In addition, going beyond the symbol-wise pattern matching employed in the source coding and image retrieval systems, we devise a robust pattern-matching scheme that compares the first- and second-order statistics of source patches. Qualitative and quantitative analyses justify the superiority of the proposed information retrieval framework based on the parsed representation. The proposed source representation technique and retrieval frameworks encourage future work on a systematic way of understanding multidimensional sources that parallels linguistic structure.
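For context, the following minimal sketch (not taken from the dissertation) shows classical one-dimensional Lempel-Ziv incremental parsing, the scheme that the proposed multidimensional algorithm generalizes from strings to variable-size patches; the function name and example string are illustrative only.

```python
def lz78_incremental_parse(sequence):
    """Parse a 1-D sequence into phrases, each being a previously seen
    phrase extended by one new symbol (classical LZ78 incremental parsing)."""
    phrases = {(): 0}          # phrase -> index; () is the empty phrase
    parsed = []                # list of (prefix_index, new_symbol) pairs
    current = ()
    for symbol in sequence:
        candidate = current + (symbol,)
        if candidate in phrases:
            current = candidate          # keep extending a known phrase
        else:
            parsed.append((phrases[current], symbol))
            phrases[candidate] = len(phrases)
            current = ()                 # start a new phrase
    if current:                          # flush a trailing known phrase
        parsed.append((phrases[current[:-1]], current[-1]))
    return parsed

# Example: the string is parsed into the phrases a|b|ab|ba|bab|aa
print(lz78_incremental_parse("ababbababaa"))
```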
|
302 |
Managing and Exploring Large Data Sets Generated by Liquid Separation - Mass Spectrometry. Bäckström, Daniel, January 2007.
A trend in natural science and especially in analytical chemistry is the increasing need for analysis of a large number of complex samples with low analyte concentrations. Biological samples (urine, blood, plasma, cerebral spinal fluid, tissue etc.) are often suitable for analysis with liquid separation mass spectrometry (LS-MS), resulting in two-way data tables (time vs. m/z). Such biological 'fingerprints' taken for all samples in a study correspond to a large amount of data. Detailed characterization requires a high sampling rate in combination with high mass resolution and a wide mass range, which presents a challenge in data handling and exploration. This thesis describes methods for managing and exploring large data sets made up of such detailed 'fingerprints' (represented as data matrices). The methods were implemented as scripts and functions in Matlab, a widespread environment for matrix manipulations. A single-file structure to hold the imported data facilitated both easy access and fast manipulation. Routines for baseline removal and noise reduction were intended to reduce the amount of data without losing relevant information. A tool for visualizing and exploring single runs was also included. When comparing two or more 'fingerprints' they usually have to be aligned due to unintended shifts in analyte positions in time and m/z. A PCA-like multivariate method proved to be less sensitive to such shifts, and an ANOVA implementation made it easier to find systematic differences within the data sets. The above strategies and methods were applied to complex samples such as plasma, protein digests, and urine. The fields of application included urine profiling (paracetamol intake; beverage effects), peptide mapping (different digestion protocols) and the search for potential biomarkers (appendicitis diagnosis). The influence of the experimental factors was visualized by PCA score plots as well as clustering diagrams (dendrograms).
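The thesis implements its routines in Matlab; the sketch below is only an illustrative Python/numpy rendition of the PCA scoring step, assuming synthetic fingerprint matrices and hypothetical names, to show how unfolded two-way data tables yield score plots.

```python
import numpy as np

def pca_scores(fingerprints, n_components=2):
    """Compute PCA scores for a stack of fingerprints (samples x time x m/z).
    Each fingerprint matrix is unfolded into one row vector."""
    X = np.array([f.ravel() for f in fingerprints])   # samples x variables
    X = X - X.mean(axis=0)                            # mean-centre columns
    # Economy-size SVD: X = U * diag(s) * Vt; scores are U * diag(s)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, explained

# Synthetic example: 10 samples, each a 50 x 80 (time x m/z) matrix
rng = np.random.default_rng(0)
samples = [rng.poisson(5, size=(50, 80)).astype(float) for _ in range(10)]
scores, explained = pca_scores(samples)
print(scores.shape, explained)   # score plot coordinates and explained variance
```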
|
303 |
New dictionary-based techniques for test data compression. Σισμάνογλου, Παναγιώτης, 01 October 2012.
This work addresses dictionary-based test data compression. The dictionary-based compression methods already proposed in the literature are first presented comparatively, and a new test data compression method is then proposed for cores tested through scan paths. We show that the compression achieved by a dictionary-based scheme can be improved significantly by suitably reusing blocks of the dictionary entries to compose test vectors. To this end, two new algorithms are proposed, suitable for partial and complete coding of the test set into the dictionary, respectively. The proposed method is compared with existing methods in terms of both compression ratio and implementation cost. For the evaluation, test sets generated for single stuck-at faults, stuck-at faults with N-detect coverage, and transition delay faults are considered.
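As a rough illustration of the baseline dictionary coding idea (not the improved method proposed here, which additionally reuses blocks of the dictionary entries), the following sketch encodes fixed-width slices of a test set either as a short dictionary index or as a literal block; all parameters and data are illustrative.

```python
from collections import Counter

def dictionary_encode(test_bits, block_len=8, dict_size=16):
    """Split a test-data bit string into fixed-width blocks and encode the
    most frequent blocks as dictionary indices, the rest as literals."""
    blocks = [test_bits[i:i + block_len]
              for i in range(0, len(test_bits), block_len)]
    dictionary = [b for b, _ in Counter(blocks).most_common(dict_size)]
    index_bits = max(1, (dict_size - 1).bit_length())
    encoded_len = 0
    for b in blocks:
        if b in dictionary:           # flag bit + dictionary index
            encoded_len += 1 + index_bits
        else:                         # flag bit + literal block
            encoded_len += 1 + block_len
    return dictionary, encoded_len, len(test_bits)

dictionary, coded, original = dictionary_encode("0110" * 40 + "1011" * 24)
print(f"compressed {original} bits down to {coded} bits")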
|
304 |
Desenvolvimento e implementação em FPGA de um compressor sem perdas de baixa complexidade para imagens de satélite [Development and FPGA implementation of a low-complexity lossless compressor for satellite images]. Costa, Yuri Gonzaga Gonçalves da, 31 July 2012.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / The amount of data generated and transmitted by satellites to ground stations keeps growing. As technology advances, space imaging systems, especially those used in Earth observation missions, employ equipment of ever higher resolution. It is therefore necessary to ensure that this large quantity of data reaches its destination reliably, and among the techniques involved, data compression plays an important role in meeting this requirement. A compression system for this purpose must satisfy several conditions, particularly regarding performance. In this context, hardware implementations based on prediction and Golomb-Rice coding have achieved excellent results in terms of both hardware and compression performance, in lossless as well as lossy settings. This work proposes a digital hardware design of a low-complexity lossless compressor for satellite images, based on prediction and Golomb-Rice coding, that balances performance requirements against error propagation, a common issue in space systems that is aggravated by data compression. To validate and analyze the compressor, a functional verification and FPGA prototyping methodology was followed. For an image set from the CBERS-2B satellite provided by Brazil's National Institute for Space Research (INPE), the FPGA results show that the compressor achieves an average compression ratio of 3.4, comparable to related work in this area, and a throughput of 28 MPixel/s (224 Mbit/s). Taking advantage of the nature of the images, compression can be parallelized across simultaneous compressor cores; using 5 cores, for example, the design compresses these images at a rate of 142 MPixel/s (1.1 Gbit/s). These features make it useful and effective in current remote sensing imaging systems.
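The predictor and the parameter selection of the actual compressor are not reproduced here; the sketch below only illustrates the Golomb-Rice step assumed by such designs, in which signed prediction residuals are mapped to non-negative integers and split into a unary quotient plus k remainder bits.

```python
def zigzag(value):
    """Map a signed prediction residual to a non-negative integer."""
    return 2 * value if value >= 0 else -2 * value - 1

def golomb_rice_encode(residuals, k):
    """Encode residuals with Rice parameter k (divisor m = 2**k)."""
    out = []
    for r in residuals:
        n = zigzag(r)
        q, rem = n >> k, n & ((1 << k) - 1)
        out.append("1" * q + "0" + format(rem, f"0{k}b"))  # unary q, then k bits
    return "".join(out)

# Prediction residuals of a smooth image row are typically small
residuals = [0, -1, 2, 1, 0, -3, 1, 0]
bits = golomb_rice_encode(residuals, k=1)
print(bits, len(bits), "bits vs", 8 * len(residuals), "bits for 8-bit residuals")
```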
|
305 |
Sistema Embarcado para um Monitor Holter que Utiliza o Modelo PPM na Compressão de Sinais ECG [Embedded system for a Holter monitor using the PPM model for ECG signal compression]. Farias, Thyago Maia Tavares de, 04 March 2010.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / In this work we present the development of an embedded system, prototyped on an FPGA with the Nios II soft-core processor, for a Holter monitor that implements data compression with the PPM algorithm and simulates ECG signals through an implementation of Fourier series. With a Holter monitor, cardiologists can obtain ECG signals that serve as the basis for assessing symptoms and activities of patients. These signals are captured and recorded by the monitor over periods of 24 hours or more, requiring large storage capacity and therefore increasing the cost of the device. Using the PPM algorithm, a Holter monitor can considerably reduce the size of the stored signals, thus reducing storage space and device cost while also allowing rapid transmission of the data. Integrating the ECG signal simulator into the device makes it possible to generate ECG samples directly on the embedded system, saving time and eliminating the difficulties of obtaining signals when compared with capturing real ECGs by invasive or noninvasive methods, and it enables the analysis and study of both normal and abnormal ECGs. A system on a programmable chip (SOPC) was prototyped with a development kit containing peripherals and an FPGA compatible with the Nios II; the soft-core architecture was configured with a compact operating system, and software modules were successfully developed, ported, and validated on this platform.
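The abstract does not give the Fourier-series model used by the simulator, so the following sketch is only a generic truncated Fourier series synthesizing a periodic, ECG-like waveform with made-up coefficients; a real simulator would fit the coefficients to the P, QRS, and T wave morphology of a beat.

```python
import numpy as np

def fourier_series_signal(t, period, a0, a, b):
    """Evaluate a truncated Fourier series
    s(t) = a0/2 + sum_k (a[k-1]*cos(2*pi*k*t/T) + b[k-1]*sin(2*pi*k*t/T))."""
    s = np.full_like(t, a0 / 2.0)
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        s += ak * np.cos(2 * np.pi * k * t / period) \
           + bk * np.sin(2 * np.pi * k * t / period)
    return s

# Illustrative coefficients only, not taken from the thesis
fs, period = 360.0, 1.0                       # 360 Hz sampling, 60 bpm
t = np.arange(0, 5 * period, 1 / fs)          # five beats
a = [0.3, -0.15, 0.05, 0.02]
b = [0.0, 0.25, -0.1, 0.04]
ecg_like = fourier_series_signal(t, period, a0=0.0, a=a, b=b)
print(ecg_like.shape, ecg_like.min(), ecg_like.max())
```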
|
306 |
Reconhecimento de formas utilizando modelos de compressão de dados e espaços de escalas de curvatura [Shape recognition using data compression models and curvature scale spaces]. Lordão, Fernando Augusto Ferreira, 27 August 2009.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / As the processing power of computers increases, the quantity and complexity of stored data grow as well, requiring ever more sophisticated mechanisms to retrieve this information effectively and efficiently. In image processing, retrieval based on the content of the image itself, namely Content-Based Image Retrieval (CBIR), has become common; it eliminates the need for additional annotations such as textual descriptions and keywords registered by an observer. The purpose of this work is the development of an image retrieval mechanism based on shape recognition. The mechanism consists of (1) computing the Full Curvature Scale Space (FullCSS) image descriptors; (2) applying a lossless compression method to them; and (3) classifying these descriptors to retrieve the corresponding images. The FullCSS descriptors register the curvature variations along the image contour, indicating the degree and the sign of these variations, which makes it possible to identify where the contour is concave or convex. The adopted compression method uses the Prediction by Partial Matching (PPM) compression model, which has been used successfully in other work to classify texture images. The results obtained show that this approach reaches levels of efficacy and efficiency that are competitive with other work recently developed in the same area.
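The classification step relies on a PPM compression model; the sketch below conveys only the general compression-based matching idea with a nearest-neighbour rule, using Python's bz2 module as a stand-in compressor and purely illustrative descriptor byte strings (the actual FullCSS encoding and PPM classifier are not reproduced).

```python
import bz2

def compressed_size(data: bytes) -> int:
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(query: bytes, labelled: dict) -> str:
    """Assign the label of the nearest training descriptor under NCD."""
    return min(labelled, key=lambda name: ncd(query, labelled[name]))

# Toy descriptors serialized as byte strings (illustrative only)
training = {
    "square-like": bytes([10, 10, 250, 250] * 64),
    "blob-like":   bytes([120, 130, 125, 128] * 64),
}
query = bytes([10, 10, 250, 250] * 60 + [10, 20, 240, 250] * 4)
print(classify(query, training))   # the query shares most bytes with "square-like"
```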
|
307 |
Distributed compressed data gathering in wireless sensor networks. Leinonen, M. (Markus), 02 October 2018.
Abstract
Wireless sensor networks (WSNs) consisting of battery-powered sensors are increasingly deployed for a myriad of Internet of Things applications, e.g., environmental, industrial, and healthcare monitoring. Since wireless access is typically the main contributor to battery usage, minimizing communications is crucial to prolong network lifetime and improve user experience. The objective of this thesis is to develop and analyze energy-efficient distributed compressed data acquisition techniques for WSNs. The thesis proposes four approaches to conserve sensors' energy by minimizing the amount of information each sensor has to transmit to meet given application requirements.
The first part addresses a cross-layer design to minimize the sensors' sum transmit power via joint optimization of resource allocation and multi-path routing. A distributed algorithm based on consensus optimization is proposed to solve the problem. The algorithm is shown to have superior convergence compared to several baselines.
The remaining parts deal with compressed sensing (CS) of sparse/compressible sources. The second part focuses on the distributed CS acquisition of spatially and temporally correlated sensor data streams. A CS algorithm based on sliding window and recursive decoding is developed. The method is shown to achieve higher reconstruction accuracy with fewer transmissions and less decoding delay and complexity compared to several baselines, and to progressively refine past estimates.
The last two approaches incorporate the quantization of CS measurements and focus on lossy source coding. The third part addresses the distributed quantized CS (QCS) acquisition of correlated sparse sources. A distortion-rate optimized variable-rate QCS method is proposed. The method is shown to achieve higher distortion-rate performance than the baselines and to enable a trade-off between compression performance and encoding complexity via the pre-quantization of measurements.
The fourth part investigates information-theoretic rate-distortion (RD) performance limits of single-sensor QCS. A lower bound to the best achievable compression, defined by the remote RD function (RDF), is derived. A method to numerically approximate the remote RDF is proposed. The results compare practical QCS methods to the derived limits, and show a novel QCS method to approach the remote RDF.
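As background only (none of the thesis's distributed, streaming, or quantized algorithms are reproduced), the following is a minimal single-sensor compressed sensing sketch: a sparse signal is observed through a random Gaussian matrix and recovered by iterative soft thresholding (ISTA); the dimensions and regularization weight are illustrative.

```python
import numpy as np

def ista(A, y, lam=0.05, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(1)
n, m, k = 200, 60, 5                        # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)
A = rng.normal(0, 1 / np.sqrt(m), (m, n))   # random Gaussian sensing matrix
y = A @ x_true                              # m << n compressed measurements
x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```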
|
308 |
Procedimentos para tratamento e compressão de imagens e video utilizando tecnologia fractal e transformadas wavelet [Procedures for processing and compression of images and video using fractal technology and wavelet transforms]. Silva, Fernando Silvestre da, 23 September 2005.
Orientador (Advisor): Yuzo Iano / Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
The excellent visual quality and compression rate of fractal image coding have limited applications due to the exhaustive encoding time inherent to the method. This research presents a new fast and efficient hybrid image coder that applies the speed of the wavelet transform to the image quality of fractal compression. In this scheme, a fast fractal encoding using Fisher's domain classification is applied to the lowpass subband of the wavelet-transformed image, and a modified SPIHT (Set Partitioning in Hierarchical Trees) coding is applied to the remaining coefficients. The image details and the progressive transmission characteristics of the wavelet transform are maintained; no blocking effects common to fractal techniques are introduced; and the encoding fidelity problem common in fractal-wavelet hybrid coders is solved. The proposed scheme provides an average 94% reduction in encoding-decoding time compared with the pure accelerated fractal coder, and a 0 to 2.4 dB gain in PSNR over pure SPIHT coding. In both cases, the new scheme improves the subjective quality of pictures at high, medium, and low bit rates. / Doctorate in Electrical Engineering (Telecommunications and Telematics)
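The hybrid coder applies fractal coding to the lowpass subband of the wavelet-transformed image; the sketch below shows a single-level 2-D Haar decomposition, chosen only because it is the simplest filter that makes the subband structure concrete (the thesis is not tied to this wavelet), in an unnormalized averaging form.

```python
import numpy as np

def haar2d_level(image):
    """One level of a 2-D Haar wavelet transform (averaging form).
    Returns the lowpass subband LL and the detail subbands (LH, HL, HH)."""
    img = image.astype(float)
    a = img[0::2, :]           # even rows
    b = img[1::2, :]           # odd rows
    lo_rows, hi_rows = (a + b) / 2.0, (a - b) / 2.0
    def cols(x):
        return (x[:, 0::2] + x[:, 1::2]) / 2.0, (x[:, 0::2] - x[:, 1::2]) / 2.0
    LL, LH = cols(lo_rows)
    HL, HH = cols(hi_rows)
    return LL, (LH, HL, HH)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))      # even dimensions required
LL, details = haar2d_level(image)
print(LL.shape)                # (4, 4): fractal coding would act on this subband
```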
|
309 |
Analysis and Compression of Large CFD Data Sets Using Proper Orthogonal Decomposition. Blanc, Trevor Jon, 01 July 2014.
Efficient analysis and storage of data is an integral but often challenging task when working with computational fluid dynamics, mainly due to the amount of data a simulation can output. Methods centered around the proper orthogonal decomposition were used to analyze, compress, and model various simulation cases. Two different high-fidelity, time-accurate turbomachinery simulations were investigated to show various applications of the analysis techniques. The first turbomachinery example was used to illustrate the extraction of turbulent coherent structures such as traversing shocks, vortex shedding, and wake variation from deswirler and rotor blade passages. Using only the most dominant modes, flow fields were reconstructed and analyzed for error. The reconstructions reproduced the general dynamics within the flow well but failed to fully resolve shock fronts and smaller vortices. By decomposing the domain into smaller, independent pieces, reconstruction error was reduced by up to 63 percent. A new method of data compression that combined an image compression algorithm with the proper orthogonal decomposition was used to store the reconstructions of the flow field, increasing data compression ratios by a factor of 40.
The second turbomachinery simulation studied was a three-stage fan with inlet total pressure distortion. Both the snapshot and repeating-geometry methods were used to characterize structures of static pressure fluctuation within the blade passages of the third rotor blade row. Modal coefficients filtered by frequencies related to the inlet distortion pattern were used to produce reconstructions of the pressure field dependent solely on the inlet boundary condition. A hybrid proper orthogonal decomposition method was proposed to limit the burden on computational resources while providing high-temporal-resolution analysis.
Parametric reduced-order models were created from large databases of transient and steady conjugate heat transfer and airfoil simulations. Performance of the models was found to depend heavily on the range of the parameters varied as well as the number of simulations used to traverse that range. The heat transfer models gave excellent predictions of temperature profiles in heated solids for ambitious parameter ranges. Model development for the airfoil case showed that accuracy was highly dependent on modal truncation. The flow fields were predicted very well, especially outside the boundary-layer region of the flow.
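A minimal snapshot-POD sketch on synthetic data (the turbomachinery simulations and the image-compression coupling are not reproduced): snapshots are stored as columns, the SVD supplies the modes, and truncation gives both a low-order reconstruction and a compressed representation.

```python
import numpy as np

def pod_compress(snapshots, n_modes):
    """Snapshot POD: keep the n_modes most energetic modes of a
    (n_points x n_snapshots) matrix and return the truncated factors."""
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, U[:, :n_modes], s[:n_modes], Vt[:n_modes, :]

def pod_reconstruct(mean, U, s, Vt):
    return mean + U @ np.diag(s) @ Vt

# Synthetic flow-like data: a travelling wave sampled at 500 points, 100 steps
x = np.linspace(0, 2 * np.pi, 500)[:, None]
t = np.linspace(0, 10, 100)[None, :]
snaps = np.sin(x - 2 * t) + 0.3 * np.sin(3 * x + t)
mean, U, s, Vt = pod_compress(snaps, n_modes=4)
recon = pod_reconstruct(mean, U, s, Vt)
err = np.linalg.norm(recon - snaps) / np.linalg.norm(snaps)
stored = mean.size + U.size + s.size + Vt.size
print(f"relative error {err:.2e}, compression ratio {snaps.size / stored:.1f}")
```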
|
310 |
From a Comprehensive Experimental Survey to a Cost-based Selection Strategy for Lightweight Integer Compression Algorithms. Damme, Patrick; Ungethüm, Annett; Hildebrandt, Juliana; Habich, Dirk; Lehner, Wolfgang, 11 January 2023.
Lightweight integer compression algorithms are frequently applied in in-memory database systems to tackle the growing gap between processor speed and main memory bandwidth. In recent years, the vectorization of basic techniques such as delta coding and null suppression has considerably enlarged the corpus of available algorithms. As a result, today there is a large number of algorithms to choose from, while different algorithms are tailored to different data characteristics. However, a comparative evaluation of these algorithms under different data and hardware characteristics has never been sufficiently conducted in the literature. To close this gap, we conducted an exhaustive experimental survey by evaluating several state-of-the-art lightweight integer compression algorithms as well as cascades of basic techniques. We systematically investigated the influence of data as well as hardware properties on the performance and the compression rates. The evaluated algorithms are based on publicly available implementations as well as our own vectorized reimplementations. We summarize our experimental findings, which lead to several new insights and to the conclusion that there is no single best algorithm. Moreover, in this article, we also introduce and evaluate a novel cost model for the selection of a suitable lightweight integer compression algorithm for a given dataset.
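A scalar sketch of two of the basic techniques the survey builds on, cascaded as delta coding followed by byte-oriented null suppression (a varint); the vectorized SIMD implementations evaluated in the article are considerably more involved, and the data here are illustrative.

```python
def delta_encode(values):
    """Replace each integer by its difference to the predecessor."""
    prev, deltas = 0, []
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas

def varint_encode(deltas):
    """Null suppression via LEB128-style varints, with zigzag for signs."""
    out = bytearray()
    for d in deltas:
        n = (d << 1) ^ (d >> 63)                        # zigzag: signed -> unsigned
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)               # 7 payload bits + continue flag
            n >>= 7
        out.append(n)
    return bytes(out)

# Sorted postings-like data: small gaps compress to one byte each
values = list(range(1000, 1000 + 512, 3))
encoded = varint_encode(delta_encode(values))
print(len(encoded), "bytes vs", 4 * len(values), "bytes uncompressed")
```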
|