1.
Efficient Algorithms for the Block Edit Distance and Related Problems. Ann, Hsing-Yen, 18 May 2010.
Computing the similarity of two strings or sequences is one of the most fundamental problems in computer science, and it has been widely studied for several decades.
In the last decade, it has regained researchers' attention because of improvements in hardware computing power and the emergence of huge amounts of data in biotechnology.
In this dissertation, we focus on computing the edit distance between two sequences where block-edit operations are allowed in addition to character-edit operations.
Previous research shows that this problem is NP-hard if recursive block moves are allowed.
Since we are interested in solving editing problems with polynomial-time optimization algorithms, we consider simplified versions of the edit distance problem.
We first focus on the longest common subsequence (LCS) of run-length encoded (RLE) strings, where the runs can be seen as a class of simplified blocks.
Then, we apply a constraint to the problem, i.e., we find the constrained LCS (CLCS) of RLE strings.
In addition, we show that problems involving block-edit operations can still be solved by polynomial-time optimization algorithms if some restrictions are applied.
Let X and Y be two sequences of lengths n and m, respectively.
Also, let N and M be the numbers of runs in the corresponding RLE forms of X and Y, respectively.
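As a quick illustration of this notation (a sketch added here for concreteness, not taken from the dissertation), run-length encoding groups consecutive equal characters into runs, so the run count N can be far smaller than the plain length n:

```python
from itertools import groupby

def rle_encode(s):
    """Encode a string as a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]

X = "aaabbbbcc"
runs = rle_encode(X)
print(runs)       # [('a', 3), ('b', 4), ('c', 2)]
print(len(X))     # n = 9
print(len(runs))  # N = 3
```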
In this dissertation, first, we propose a simple algorithm for computing the LCS of X and Y in O(NM + min{ p_1, p_2 }) time, where p_1 and p_2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively.
This new algorithm improves the previously known time bound O(min{nM, Nm}) and outperforms the time bounds O(NM log NM) or O((N+M+q) log (N+M+q)) in some cases, where q denotes the number of matched blocks.
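For reference, the baseline these bounds improve on is the classic O(nm) dynamic program over individual characters. The sketch below shows that standard textbook algorithm, not the dissertation's block-based method, which instead processes the DP lattice run by run:

```python
def lcs_length(x, y):
    """Classic O(nm) dynamic program for the length of the LCS of x and y."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

print(lcs_length("aaabbbbcc", "aabbccc"))  # 6 ("aabbcc")
```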
Next, we give an efficient algorithm for solving the CLCS problem, which is to find a common subsequence Z of X and Y such that a given constrained sequence P is a subsequence of Z and the length of Z is maximized.
Suppose X, Y and P are all in RLE format, and the lengths of X, Y and P are n, m and r, respectively.
Let N, M and R be the numbers of runs in X, Y, and P, respectively.
We show that by RLE, the CLCS problem can be solved in O(NMr + min{q_1 r + q_4, q_2 r + q_5 }) time, where q_1 and q_2 denote the numbers of elements in the south and east boundaries of the partially matched blocks on the first layer, respectively, and q_4 and q_5 denote the numbers of elements of the west and north pillars in the bottom boundaries of all fully matched cuboids in the DP lattice, respectively.
When the input strings have good compression ratios, our work clearly outperforms the previously known DP algorithms and the Hunt-Szymanski-like algorithms.
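For the uncompressed problem, CLCS is solvable by a well-known O(nmr) three-dimensional dynamic program; the sketch below is a hedged reconstruction of that standard recurrence, not the RLE-based algorithm of this dissertation:

```python
def clcs_length(x, y, p):
    """Length of the longest common subsequence of x and y that contains p
    as a subsequence, via the standard O(n*m*r) dynamic program."""
    n, m, r = len(x), len(y), len(p)
    NEG = float("-inf")  # marks states where p's prefix cannot yet be embedded
    # dp[k][i][j]: best length over x[:i], y[:j] with p[:k] already embedded.
    dp = [[[0.0 if k == 0 else NEG for _ in range(m + 1)]
           for _ in range(n + 1)] for k in range(r + 1)]
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(dp[k][i - 1][j], dp[k][i][j - 1])
                if x[i - 1] == y[j - 1]:
                    # Match the characters without advancing the constraint.
                    best = max(best, dp[k][i - 1][j - 1] + 1)
                    if k > 0 and x[i - 1] == p[k - 1]:
                        # Match and let this character satisfy p[k-1].
                        best = max(best, dp[k - 1][i - 1][j - 1] + 1)
                dp[k][i][j] = best
    return dp[r][n][m]  # -inf if p fits in no common subsequence

print(clcs_length("aaabbbbcc", "aabbccc", "bc"))  # 6.0
```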
Finally, we consider variations of the block edit distance problem that involve character insertions, character deletions, block copies and block deletions, for two given sequences X and Y.
In this dissertation, three variations are defined with different measuring functions, denoted P(EIS, C), P(EI, L) and P(EI, N).
Then we show that with some preprocessing, the minimum block edit distances of these three variations can be obtained by dynamic programming in O(nm), O(nm log m) and O(nm^2) time, respectively, where n and m are the lengths of X and Y.
2.
Boundless Fluids Using the Lattice-Boltzmann Method. Haughey, Kyle J, 01 June 2009.
Computer-generated imagery is ubiquitous in today's society, appearing in advertisements, video games, and computer-animated movies among other places. Much of this imagery needs to be as realistic as possible, and animators have turned to techniques such as fluid simulation to create scenes involving substances like smoke, fire, and water. The Lattice-Boltzmann Method (LBM) is one fluid simulation technique that has gained recent popularity due to its relatively simple basic algorithm and the ease with which it can be distributed across multiple processors. Unfortunately, current LBM simulations also suffer from high memory usage and restrict free surface fluids to domains of fixed size. This thesis modifies the LBM to utilize a recursive run-length-encoded (RLE) grid data structure instead of the standard fixed array of grid cells, which reduces the amount of memory required for LBM simulations and allows the domain to grow and shrink as necessary to accommodate a liquid surface. The modified LBM is implemented within the open-source 3D animation package Blender and compared to Blender's current LBM simulator using the metrics of memory usage and time required to complete a given simulation. Results show that, although the RLE-based simulator can take several times longer than the current simulator to complete a given simulation, the memory usage is significantly reduced, making an RLE-based simulation preferable in a few specific circumstances.
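To illustrate the idea behind an RLE grid (a toy sketch with hypothetical names, not Blender's or the thesis's actual data structures), a row of mostly uniform cell flags can be stored as runs, with cell lookup walking the run list instead of indexing a dense array:

```python
from itertools import groupby

# Cell flags for one row of the simulation grid: 'E' empty, 'F' fluid, 'I' interface.
row = ["E", "E", "E", "F", "F", "F", "F", "I", "E", "E"]

# Store the row as (flag, count) runs instead of one array entry per cell.
rle_row = [(flag, len(list(g))) for flag, g in groupby(row)]
print(rle_row)  # [('E', 3), ('F', 4), ('I', 1), ('E', 2)]

def cell_at(rle_row, x):
    """Look up the flag of cell x by walking the runs (O(number of runs))."""
    for flag, count in rle_row:
        if x < count:
            return flag
        x -= count
    raise IndexError("cell outside the encoded row")

print(cell_at(rle_row, 5))  # 'F'
```

Growing or shrinking the domain then amounts to lengthening or shortening the boundary runs, which is what lets the encoded grid follow a moving liquid surface.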
3.
LSHSIM: A Locality Sensitive Hashing Based Method for Multiple-Point Geostatistics. Moura, Pedro Nuno de Souza, 14 November 2017.
Reservoir modeling is a very important task that permits the representation of a geological region of interest. Given the uncertainty involved in the process, one wants to generate a considerable number of possible scenarios so as to find those which best represent this region. Then, there is a strong demand for quickly generating each simulation. Since its inception, many methodologies have been proposed for this purpose and, in the last two decades, multiple-point geostatistics (MPS) has been the dominant one. This methodology is strongly based on the concept of a training image (TI) and the use of its characteristics, which are called patterns. In this work, we propose a new MPS method that combines the application of a technique called Locality Sensitive Hashing (LSH), which accelerates the search for patterns similar to a target one, with a Run-Length Encoding (RLE) compression technique that speeds up the calculation of the Hamming similarity. We have performed experiments with both categorical and continuous images which showed that LSHSIM is computationally efficient and produces good-quality realizations, while achieving a reasonable space of uncertainty. In particular, for categorical data, the results suggest that LSHSIM is faster than MS-CCSIM, one of the state-of-the-art methods.
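To show the general idea behind RLE-accelerated Hamming comparison (a sketch of the principle under the assumption of non-empty, equal-length inputs, not the paper's implementation), two run-length encoded sequences can be compared run by run, so the cost scales with the number of runs rather than the pattern size:

```python
def hamming_rle(runs_a, runs_b):
    """Hamming distance of two equal-length sequences given as (symbol, count) runs."""
    dist = 0
    i = j = 0
    rem_a, rem_b = runs_a[0][1], runs_b[0][1]
    while i < len(runs_a) and j < len(runs_b):
        overlap = min(rem_a, rem_b)
        if runs_a[i][0] != runs_b[j][0]:
            dist += overlap  # the whole overlapping stretch disagrees at once
        rem_a -= overlap
        rem_b -= overlap
        if rem_a == 0:
            i += 1
            if i < len(runs_a):
                rem_a = runs_a[i][1]
        if rem_b == 0:
            j += 1
            if j < len(runs_b):
                rem_b = runs_b[j][1]
    return dist

# "aaabbb" vs "aabbbb" differ only at position 2.
print(hamming_rle([("a", 3), ("b", 3)], [("a", 2), ("b", 4)]))  # 1
```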
4.
Quantization of Random Processes and Related Statistical Problems. Shykula, Mykola, January 2006.
In this thesis we study scalar uniform and non-uniform quantization of random processes (or signals) in the average case setting. Quantization (or discretization) of a signal is a standard task in all analog/digital devices (e.g., digital recorders, remote sensors, etc.). We evaluate the necessary memory capacity (or quantization rate) needed for quantized process realizations by exploiting the correlation structure of the model random process. The thesis consists of an introductory survey of the subject and related theory followed by four included papers (A-D).

In Paper A we develop a quantization coding method in which crossings of quantization levels by a process realization are used for its coding. The asymptotic behavior of the mean quantization rate is investigated in terms of the correlation structure of the original process. For uniform and non-uniform quantization, we assume that the quantization cell width tends to zero and the number of quantization levels tends to infinity, respectively.

In Papers B and C we focus on an additive noise model for a quantized random process. Stochastic structures of asymptotic quantization errors are derived for some bounded and unbounded non-uniform quantizers when the number of quantization levels tends to infinity. The obtained results can be applied, for instance, to some optimization design problems for quantization levels.

Random signals are quantized at sampling points with further compression. In Paper D the concern is statistical inference for the run-length encoding (RLE) method, one of the compression techniques, applied to quantized stationary Gaussian sequences. This compression method is widely used, for instance, in digital signal and image processing. First, we deal with mean RLE quantization rates for various probabilistic models. For a time series with unknown stochastic structure, we investigate asymptotic properties (e.g., asymptotic normality) of two estimates for the mean RLE quantization rate based on an observed sample when the sample size tends to infinity.

These results can be used in communication theory, signal processing, coding, and compression applications. Some examples and numerical experiments demonstrating applications of the obtained results for synthetic and real data are presented.
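To make the setting concrete, here is a small numerical sketch of measuring an empirical mean RLE rate. The AR(1) Gaussian model, the uniform quantizer, and all parameter values are illustrative assumptions, not the estimators studied in Paper D:

```python
import numpy as np
from itertools import groupby

rng = np.random.default_rng(0)

# Stationary Gaussian AR(1) sequence: x_t = phi * x_{t-1} + e_t, unit variance.
phi, n = 0.9, 10_000
e = rng.normal(scale=np.sqrt(1 - phi**2), size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# Uniform quantizer with cell width delta.
delta = 0.25
q = np.floor(x / delta).astype(int)

# Empirical mean RLE rate in runs per sample: the stronger the correlation,
# the longer the runs of equal quantized values, and the lower the rate.
num_runs = sum(1 for _ in groupby(q))
print(num_runs / n)
```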
5.
Komprese signálu EKG / Compression of ECG signal. Blaschová, Eliška, January 2016.
This paper presents the most well-known compression methods that have been published. Compression of the ECG signal is important primarily for saving space on memory cards and for improving the efficiency of data transfer. The application of the wavelet transform to compression is a widely discussed topic worldwide, which is why the paper focuses in this direction. The obtained wavelet coefficients may first be quantized and then compressed using a suitable method. There are many options for the choice of wavelet and the degree of decomposition, which will be tested with respect to the most efficient compression of the ECG signal.
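A minimal sketch of such a pipeline follows, using the PyWavelets package. The wavelet 'db4', the decomposition level, the thresholds, and the synthetic stand-in signal are all illustrative assumptions; a real evaluation would use actual ECG records:

```python
import numpy as np
import pywt

# Stand-in signal; in practice this would be an ECG record.
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(1024)

# Wavelet decomposition: the choice of wavelet and level drives compression quality.
coeffs = pywt.wavedec(signal, "db4", level=5)

# Crude quantization/thresholding: zero out small coefficients, round the rest coarsely.
threshold, step = 0.1, 0.05
compressed = [np.where(np.abs(c) > threshold, np.round(c / step) * step, 0.0)
              for c in coeffs]

# Reconstruct and measure distortion (PRD is a common ECG compression metric).
rec = pywt.waverec(compressed, "db4")[: len(signal)]
prd = 100 * np.linalg.norm(signal - rec) / np.linalg.norm(signal)
print(f"PRD: {prd:.2f} %")
```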
6.
Conflict Detection-Based Run-Length Encoding: AVX-512 CD Instruction Set in Action. Lehner, Wolfgang; Ungethum, Annett; Pietrzyk, Johannes; Damme, Patrick; Habich, Dirk, 18 January 2023.
Data as well as hardware characteristics are two key aspects for efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, efficient in-memory processing benefits from novel processing concepts based on lightweight compressed data. Thus, an active research field deals with the adaptation of new hardware features such as vectorization using SIMD instructions to speedup lightweight data compression algorithms. Following this trend, we propose a novel approach for run-length encoding, a well-known and often applied lightweight compression technique. Our novel approach is based on newly introduced conflict detection (CD) instructions in Intel's AVX-512 instruction set extension. As we are going to show, our CD-based approach has unique properties and outperforms the state-of-the-art RLE approach for data sets with small run lengths.
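The paper's approach relies on AVX-512 conflict-detection intrinsics, which cannot be reproduced here; as a rough data-parallel analogue (a sketch under that caveat, not the paper's algorithm), run boundaries can be found with bulk vectorized comparisons instead of a character-by-character scan:

```python
import numpy as np

def rle_vectorized(a):
    """Run-length encode a non-empty 1-D array by detecting run boundaries in bulk."""
    a = np.asarray(a)
    # Positions where the value changes; each closes one run and opens the next.
    boundaries = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(a)]))
    return list(zip(a[starts], ends - starts))

print(rle_vectorized([7, 7, 7, 3, 3, 9]))  # [(7, 3), (3, 2), (9, 1)]
```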
7.
Implementace statistických kompresních metod / Implementation of Statistical Compression Methods. Štys, Jiří, January 2013.
This thesis describes the Burrows-Wheeler compression algorithm. It focuses on each part of the Burrows-Wheeler algorithm, above all on the global structure transformations and the entropy coders. Methods such as move-to-front, inverse frequencies, and interval coding are described. Among the described entropy coders are the Huffman, arithmetic, and Rice-Golomb coders. In conclusion, the described global structure transformation methods and entropy coders are tested, and the best combinations are compared with the most common compression algorithms.
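A toy sketch of the pipeline's first two stages follows: the Burrows-Wheeler transform (built here from sorted rotations, which is only practical for short inputs) and move-to-front coding, whose output an entropy coder such as Huffman would then compress:

```python
def bwt(s, end="\0"):
    """Burrows-Wheeler transform via sorted rotations (O(n^2 log n), toy sizes only)."""
    s += end  # unique terminator so the transform is invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def move_to_front(s):
    """Move-to-front coding: recently seen symbols get small indices."""
    alphabet = sorted(set(s))
    out = []
    for ch in s:
        idx = alphabet.index(ch)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))  # promote ch to the front
    return out

transformed = bwt("banana")
print(repr(transformed))           # 'annb\x00aa': equal characters cluster together
print(move_to_front(transformed))  # [1, 3, 0, 3, 3, 3, 0]: repeats become zeros
```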