281 |
Statistické jazykové modely založené na neuronových sítích / STATISTICAL LANGUAGE MODELS BASED ON NEURAL NETWORKS
Mikolov, Tomáš January 2012
Statistical language models are an important part of many successful applications, including automatic speech recognition and machine translation (a well-known example is Google Translate). Traditional techniques for estimating these models are based on so-called N-grams. Despite the known shortcomings of these techniques and the enormous effort of research groups across many fields (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.), N-grams have essentially remained the most successful technique. The goal of this thesis is to present several language model architectures based on neural networks. Although these models are computationally more demanding than N-gram models, the techniques developed in this thesis make their efficient use in real applications possible. The achieved reduction in speech recognition errors relative to the best N-gram models reaches 20%. A model based on a recurrent neural network achieves the best published results on a very well-known dataset (Penn Treebank).
|
282 |
High-Throughput BitPacking Compression
Lisa, Nusrat Jahan; Nguyen, Tuan Duy Anh; Habich, Dirk; Kumar, Akash; Lehner, Wolfgang 03 July 2023
To efficiently support analytical applications from a data management perspective, in-memory column store database systems are state of the art. In this kind of database system, lossless lightweight integer compression schemes are crucial to keep the memory footprint as low as possible and to speed up query processing. In this specific compression domain, BitPacking is one of the most frequently applied compression schemes. However, (de)compression should not come with any additional cost at run time, but should be provided transparently without compromising the overall system performance. To achieve that, we focus on accelerating BitPacking using Field Programmable Gate Arrays (FPGAs). We therefore outline several FPGA designs for BitPacking in this paper. As we show in our evaluation, our specific designs provide the BitPacking compression scheme with high throughput.
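As a rough illustration of the scheme itself (not of the FPGA designs the paper proposes), BitPacking stores each integer with a fixed number of bits instead of a full machine word. The sketch below is a plain software version under stated assumptions: the function names `bitpack` and `bitunpack`, the little-endian layout, and the caller-supplied bit width are all hypothetical choices made for this example.

```python
def bitpack(values, width):
    """Pack non-negative integers using `width` bits per value (little-endian layout)."""
    if width <= 0:
        raise ValueError("width must be positive")
    buffer = 0
    for i, v in enumerate(values):
        # Reject values that would overflow into the neighbouring slot.
        if v < 0 or v >> width:
            raise ValueError(f"value {v} does not fit in {width} bits")
        buffer |= v << (i * width)
    n_bytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return buffer.to_bytes(n_bytes, "little")


def bitunpack(data, width, count):
    """Recover `count` integers of `width` bits each from packed bytes."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(buffer >> (i * width)) & mask for i in range(count)]


# Five values below 8 need 3 bits each: 15 bits, stored in 2 bytes.
packed = bitpack([3, 0, 7, 5, 1], 3)
restored = bitunpack(packed, 3, 5)
```

The bit width is usually derived from the largest value in a block; here it is passed in explicitly to keep the sketch short.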
|
283 |
Differential pulse code modulation data compression
Lum, Randall M. G. 01 January 1989
The requirement to store and transmit information efficiently has generated an ever-increasing number of uses for data compression techniques in diverse fields such as television, surveillance, remote sensing, medical processing, office automation, and robotics. Rapid increases in the processing capabilities and speed of complex integrated circuits make data compression techniques a prime candidate for application in these areas. This report addresses, from a theoretical viewpoint, three major data compression techniques: Pixel Coding, Predictive Coding, and Transform Coding. It begins with a project description and continues with data compression techniques, focusing on Differential Pulse Code Modulation.
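For readers unfamiliar with Differential Pulse Code Modulation, the idea is to transmit the quantized difference between each sample and a prediction of it, rather than the sample itself. The following is a minimal one-dimensional sketch, not the report's own formulation: it assumes a previous-sample predictor and a uniform quantizer with step size `step`, and the function names are hypothetical.

```python
def dpcm_encode(samples, step):
    """Encode samples as quantized prediction errors (previous-sample predictor)."""
    codes = []
    prediction = 0
    for s in samples:
        q = round((s - prediction) / step)  # quantized prediction error
        codes.append(q)
        # Track the decoder's reconstruction so quantization errors do not accumulate.
        prediction += q * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    prediction = 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out


signal = [10, 12, 15, 15, 11]
reconstructed = dpcm_decode(dpcm_encode(signal, 2), 2)
```

Because the encoder predicts from the reconstructed value rather than the original one, each decoded sample stays within half a quantization step of its input.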
|
284 |
The Compression of IoT operational data time series in vehicle embedded systems
Xing, Renzhi January 2018
This thesis examines compression algorithms for operational time series data collected from the Controller Area Network (CAN) bus in an automotive Internet of Things (IoT) setting. The purpose of a compression algorithm is to decrease the size of a set of time series data (such as vehicle speed, wheel speed, etc.) so that the data to be transmitted from the vehicle is small, thus decreasing the cost of transmission while potentially enabling better offboard data analysis. The project helped improve the quality of the data collected by the data analysts and reduced the cost of data transmission. Since time series data compression mostly concerns data storage and transmission, the difficulty in this project was where to place the combination of data compression and transmission within the limited performance of the onboard embedded systems. These embedded systems have limited hardware and software resources; hence the efficiency of the compression algorithm becomes very important. Additionally, there is a tradeoff between the compression ratio and real-time performance. Moreover, the error rate introduced by the compression algorithm must be smaller than an expected value. The compression algorithm contains two phases: (1) an online lossy compression algorithm, piecewise approximation, which shrinks the total number of data samples while maintaining a guaranteed precision, and (2) a lossless compression algorithm, Delta-XOR encoding, which compresses the output of the lossy algorithm. The algorithm was tested with four typical time series data samples from real CAN logs with different functions and properties. The similarities and differences between these logs are discussed. These differences helped to determine the algorithms that should be used. After experiments that compared different algorithms and checked their performance, a simulation was implemented based on the experimental results.
The results of this simulation show that the combined compression algorithm can meet the need for a certain compression ratio by controlling the error bound. Finally, the possibility of improving the compression algorithm in the future is discussed.
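The two-phase structure described in this abstract can be sketched as follows. This is an illustration under explicit assumptions, not the thesis's algorithm: phase 1 is taken to be a greedy piecewise-constant approximation with error bound `eps`, and phase 2 is taken to be an XOR of consecutive 64-bit IEEE-754 bit patterns. The abstract does not spell out either variant, and all function names are hypothetical.

```python
import struct


def piecewise_constant(samples, eps):
    """Phase 1 (lossy): greedy segments whose value is within eps of every sample."""
    segments = []  # list of (run_length, value)
    lo = hi = None
    count = 0
    for x in samples:
        if count == 0:
            lo = hi = x
            count = 1
            continue
        new_lo, new_hi = min(lo, x), max(hi, x)
        if new_hi - new_lo <= 2 * eps:
            lo, hi, count = new_lo, new_hi, count + 1
        else:
            # Close the segment at the midrange, then start a new one.
            segments.append((count, (lo + hi) / 2))
            lo = hi = x
            count = 1
    if count:
        segments.append((count, (lo + hi) / 2))
    return segments


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE-754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def xor_encode(values):
    """Phase 2 (lossless): XOR each bit pattern with the previous one."""
    out, prev = [], 0
    for v in values:
        bits = float_bits(v)
        out.append(bits ^ prev)  # equal neighbours yield 0
        prev = bits
    return out


def xor_decode(words):
    """Invert xor_encode by re-accumulating the XOR chain."""
    out, prev = [], 0
    for w in words:
        prev ^= w
        out.append(struct.unpack("<d", struct.pack("<Q", prev))[0])
    return out


segments = piecewise_constant([20.0, 20.5, 21.0, 30.0, 30.5], eps=0.5)
words = xor_encode([value for _, value in segments])
```

Raising `eps` merges more samples into each segment, which is the lever the abstract refers to when it mentions controlling the error bound to reach a target compression ratio.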
|
285 |
Experimental Study on Machine Learning with Approximation to Data Streams
Jiang, Jiani January 2019
Real-time transfer of data streams enables many data analytics and machine learning applications in areas such as massive IoT and industrial automation. The large data volume of those streams is a significant burden or overhead not only for the transport network but also for the corresponding application servers. Therefore, researchers focus on reducing the amount of data that needs to be transferred via data compression and approximation. Data compression techniques such as lossy compression can significantly reduce data volume at the price of information loss. Meanwhile, how to compress the data is highly dependent on the corresponding applications. However, when the decompressed data is used in a data analysis application such as machine learning, the results may be affected by the information loss. In this paper, the author studies the impact of data compression on machine learning applications. In particular, from an experimental perspective, it shows the tradeoff among the approximation error bound, the compression ratio, and the prediction accuracy of multiple machine learning methods. The author believes that, with a proper choice, data compression can dramatically reduce the amount of data transferred with limited impact on the machine learning applications.
|
286 |
Real-time Realistic Rendering And High Dynamic Range Image Display And Compression
Xu, Ruifeng 01 January 2005
This dissertation focuses on the many issues that arise from the visual rendering problem. Of primary consideration is light transport simulation, which is known to be computationally expensive. Monte Carlo methods represent a simple and general class of algorithms often used for light transport computation. Unfortunately, the images resulting from Monte Carlo approaches generally suffer from visually unacceptable noise artifacts. The result of any light transport simulation is, by its very nature, an image of high dynamic range (HDR). This leads to the issues of the display of such images on conventional low dynamic range devices and the development of data compression algorithms to store and recover the corresponding large amounts of detail found in HDR images. This dissertation presents our contributions relevant to these issues. Our contributions to high dynamic range image processing include tone mapping and data compression algorithms. This research proposes and shows the efficacy of a novel level set based tone mapping method that preserves visual details in the display of high dynamic range images on low dynamic range display devices. The level set method is used to extract the high frequency information from HDR images. The details are then added to the range compressed low frequency information to reconstruct a visually accurate low dynamic range version of the image. Additional challenges associated with high dynamic range images include the requirements to reduce excessively large amounts of storage and transmission time. To alleviate these problems, this research presents two methods for efficient high dynamic range image data compression. One is based on the classical JPEG compression. It first converts the raw image into RGBE representation, and then sends the color base and common exponent to classical discrete cosine transform based compression and lossless compression, respectively. The other is based on the wavelet transformation. 
It first transforms the raw image data into the logarithmic domain, then quantizes the logarithmic data into the integer domain, and finally applies the wavelet based JPEG2000 encoder for entropy compression and bit stream truncation to meet the desired bit rate requirement. We believe that these and similar contributions will make a wide application of high dynamic range images possible. The contributions to light transport simulation include Monte Carlo noise reduction, dynamic object rendering and complex scene rendering. Monte Carlo noise is an inescapable artifact in synthetic images rendered using stochastic algorithms. This dissertation proposes two noise reduction algorithms to obtain high quality synthetic images. The first one models the distribution of noise in the wavelet domain using a Laplacian function, and then suppresses the noise using a Bayesian method. The other extends the bilateral filtering method to reduce all types of Monte Carlo noise in a unified way. All our methods reduce Monte Carlo noise effectively. Rendering of dynamic objects adds another dimension to the expensive light transport simulation issue. This dissertation presents a pre-computation based method. It pre-computes the surface radiance for each basis lighting and animation key frame, and then renders the objects by synthesizing the pre-computed data in real time. Realistic rendering of complex scenes is computationally expensive. This research proposes a novel 3D space subdivision method, which leads to a new rendering framework. The light is first distributed to each local region to form local light fields, which are then used to illuminate the local scenes. The method allows us to render complex scenes at interactive frame rates. Rendering has important applications in mixed reality. Consistent lighting and shadows between real scenes and virtual scenes are important features of visual integration.
The dissertation proposes to render the virtual objects by irradiance rendering using live captured environmental lighting. This research also introduces a virtual shadow generation method that computes shadows cast by virtual objects onto the real background. We finally conclude the dissertation by discussing a number of future directions for rendering research, and presenting our proposed approaches.
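The first two steps of the wavelet-based HDR compression method described in this abstract (logarithmic transform, then quantization to integers) can be illustrated with a short sketch. It assumes strictly positive radiance values, a natural logarithm, and uniform quantization over the observed log range; the dissertation's exact mapping is not given in the abstract, the function names are hypothetical, and the subsequent JPEG2000 encoding step is omitted.

```python
import math


def log_quantize(values, bits=16):
    """Map positive HDR values to integer codes that are uniform in the log domain."""
    logs = [math.log(v) for v in values]  # requires v > 0
    lo, hi = min(logs), max(logs)
    levels = (1 << bits) - 1
    scale = levels / (hi - lo) if hi > lo else 0.0
    codes = [round((x - lo) * scale) for x in logs]
    return codes, lo, hi


def log_dequantize(codes, lo, hi, bits=16):
    """Invert log_quantize up to the quantization error."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [math.exp(lo + c * step) for c in codes]


# Radiance values spanning six orders of magnitude.
radiance = [0.01, 1.0, 250.0, 10000.0]
codes, lo, hi = log_quantize(radiance)
approx = log_dequantize(codes, lo, hi)
```

Quantizing in the log domain keeps the relative error roughly constant across the whole dynamic range, which is why it suits HDR data better than uniform quantization of the raw values.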
|
287 |
Conflict Detection-Based Run-Length Encoding: AVX-512 CD Instruction Set in Action
Lehner, Wolfgang; Ungethüm, Annett; Pietrzyk, Johannes; Damme, Patrick; Habich, Dirk 18 January 2023
Data and hardware characteristics are two key aspects of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, efficient in-memory processing benefits from novel processing concepts based on lightweight compressed data. Thus, an active research field deals with adapting new hardware features, such as vectorization using SIMD instructions, to speed up lightweight data compression algorithms. Following this trend, we propose a novel approach for run-length encoding, a well-known and often applied lightweight compression technique. Our approach is based on the newly introduced conflict detection (CD) instructions in Intel's AVX-512 instruction set extension. As we show, our CD-based approach has unique properties and outperforms the state-of-the-art RLE approach for data sets with small run lengths.
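For orientation, run-length encoding replaces each run of equal values with a (value, run length) pair. The sketch below is a plain scalar version in Python, intended only to show the technique. It is not the vectorized AVX-512 CD approach the paper proposes, and the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


column = [7, 7, 7, 2, 2, 9]
runs = rle_encode(column)  # [(7, 3), (2, 2), (9, 1)]
```

With short runs, as in the last element above, each pair can cost more space than the values it replaces, which is the regime the abstract singles out.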
|
288 |
A Benchmark Framework for Data Compression Techniques
Damme, Patrick; Habich, Dirk; Lehner, Wolfgang 03 February 2023
Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse. Moreover, there is a high number of existing lightweight compression techniques. Therefore, choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained from a systematic benchmarking of many compression algorithms on many different datasets. A naïve benchmark evaluates every algorithm under consideration separately. This yields many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naïve approach.
|
289 |
Computer Graphics and Visualization based Analysis and Record System for Hand Surgery and Therapy Practice
Gokavarapu, Venkatamanikanta Subrahmanyakartheek 27 May 2016
No description available.
|
290 |
Make Larger Vector Register Sizes New Challenges?: Lessons Learned from the Area of Vectorized Lightweight Compression Algorithms
Habich, Dirk; Damme, Patrick; Ungethüm, Annett; Lehner, Wolfgang 15 September 2022
The exploitation of data as well as hardware properties is a core aspect of efficient data management. This holds in particular for the field of in-memory data processing. Aside from increasing main memory capacities, in-memory data processing also benefits from novel processing concepts based on lightweight compressed data. To speed up compression as well as decompression, an active research field deals with the specialization of these algorithms to hardware features such as vectorization using SIMD instructions. Most of the vectorized implementations have been proposed for 128-bit vector registers. However, hardware vendors continue to increase vector register sizes, and a straightforward transformation to these wider vector sizes is possible in most cases. Thus, we systematically investigated the impact of different SIMD instruction set extensions with wider vector sizes on the behavior of straightforwardly transformed implementations. In this paper, we describe our evaluation methodology and present selected results of our exhaustive evaluation. In particular, we highlight some challenges and present first approaches to tackle them.
|