Global ETD Search

1	Adaptive Cache-Oblivious All-to-All Operation
Chung, Shin Yee; Hsu, Wen Jing. 01 1900
Modern processors rely on cache memories to reduce the latency of data accesses, so frequent cache misses undermine this benefit. Cache-aware algorithms use knowledge of the cache, such as the cache line size L and the cache size Z, to be cache efficient. However, these parameters must be tuned carefully for each hardware platform. Cache-oblivious (CO) algorithms, first introduced by Leiserson and colleagues, work without knowledge of these cache parameters yet still achieve optimal work complexity and optimal cache complexity. Here we present CO algorithms for all-to-all operations (analogous to the cross-product operation). Applications include convolution, polynomial arithmetic, multiple sequence alignment, and N-body simulation. Given two lists of n elements each, a naive implementation of the all-to-all operation incurs O(n²/L) cache misses, whereas our CO version incurs only O(n²/L²√Z). Preliminary experiments on a 1.4 GHz Opteron and a 250 MHz MIPS processor show that the CO implementation runs twice as fast. Profiling further confirms that the number of cache misses is significantly lower. We also consider cases where (a) elements have non-uniform sizes, (b) an element does not fit in the cache, (c) the lists differ in length, and (d) an element is a linked list. In addition, we study the extension to the K-list all-to-all operation and its applications. Finally, we present empirical results and compare them with cache-aware algorithms.
Singapore-MIT Alliance (SMA). Keywords: cache-oblivious algorithms; all-to-all operations; cache misses
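The abstract does not spell out how the CO all-to-all is structured, so the following is only a generic divide-and-conquer sketch of the idea, not the authors' algorithm. The names naive_all_to_all, co_all_to_all, base and convolve are hypothetical. Both functions call f(i, j) for every index pair; the recursive version halves the longer index range until blocks are small, never consulting L or Z. Python does not expose cache behaviour, so this shows the visiting order rather than the speedup or the miss bound quoted above.

```python
from typing import Callable


def naive_all_to_all(n: int, m: int, f: Callable[[int, int], None]) -> None:
    """Call f(i, j) for every index pair, row by row.

    For each i the whole second list is scanned again; once that list is
    larger than the cache, every scan reloads it from memory.
    """
    for i in range(n):
        for j in range(m):
            f(i, j)


def co_all_to_all(n: int, m: int, f: Callable[[int, int], None], base: int = 1) -> None:
    """Call f(i, j) for every pair in [0, n) x [0, m) in recursive block order.

    No cache parameter is used: index ranges are halved until both are at
    most `base` (which must be at least 1), so at some recursion level the
    two sub-ranges fit in whatever cache exists and their pairs are
    processed together.
    """
    def rec(i0: int, i1: int, j0: int, j1: int) -> None:
        ni, nj = i1 - i0, j1 - j0
        if ni == 0 or nj == 0:
            return
        if ni <= base and nj <= base:
            # Base case: a small block, processed with plain loops.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    f(i, j)
            return
        # Split the longer side in half to keep sub-blocks roughly square.
        if ni >= nj:
            mid = (i0 + i1) // 2
            rec(i0, mid, j0, j1)
            rec(mid, i1, j0, j1)
        else:
            mid = (j0 + j1) // 2
            rec(i0, i1, j0, mid)
            rec(i0, i1, mid, j1)

    rec(0, n, 0, m)


def convolve(a, b, traverse):
    """Naive convolution c[i + j] += a[i] * b[j], expressed as an all-to-all."""
    c = [0] * (len(a) + len(b) - 1)

    def step(i: int, j: int) -> None:
        c[i + j] += a[i] * b[j]

    traverse(len(a), len(b), step)
    return c


print(convolve([1, 2, 3], [4, 5], co_all_to_all))  # [4, 13, 22, 15]
```

Convolution is used here because it is one of the applications the abstract lists; any per-pair operation can be passed as f, and both traversals produce the same result because they visit the same set of pairs, only in a different order.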

2	Aplikace Grayových kódů v cache-oblivious algoritmech / Applications of Gray codes in cache-oblivious algorithms
Mička, Ondřej. January 2019
Modern computers employ a sophisticated hierarchy of caches to decrease the latency of memory accesses. This led to the development of cache-oblivious algorithms, which strive to achieve the best possible performance on such memory hierarchies with minimal knowledge of the exact parameters of the hierarchy. A common technique in the design of cache-oblivious algorithms is recursion-based divide and conquer. In this work, we show an alternative technique based on Gray codes. We use the binary reflected Gray code to traverse arrays in a cache-oblivious way, which allows us to design algorithms for problems such as matrix transposition, naive matrix multiplication, and naive convolution that match the asymptotic performance of their recursion-based counterparts. The advantage is that our algorithms can be implemented without recursion (or a stack that simulates it) by generating the Gray code with a loopless algorithm. We also introduce a variant of the binary reflected Gray code tuned to certain applications of our technique, together with an almost loopless algorithm to generate it. In addition to the theoretical analysis of the technique's performance, we examine its practical performance on matrix transposition.
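To make the Gray-code idea concrete, here is a minimal sketch under our own assumptions; it is not the construction from the thesis, and the thesis's tuned Gray code variant is not reproduced. It assumes the bits of the row index i and column index j are interleaved (i on odd bit positions, j on even ones) and that the matrix is padded to a power-of-two square grid. The names gray, gray_cells and transpose_gray are hypothetical. Stepping from gray(k - 1) to gray(k) flips the bit at the position of k's lowest set bit, so each step updates one bit of i or j with constant work and no recursion or stack.

```python
def gray(k: int) -> int:
    """Binary reflected Gray code of k; gray(k) and gray(k + 1) differ in one bit."""
    return k ^ (k >> 1)


def gray_cells(bits: int):
    """Yield all (i, j) with 0 <= i, j < 2**bits in Gray-code order.

    Assumed layout: odd bit positions of the code hold the bits of i, even
    positions hold the bits of j. Going from gray(k - 1) to gray(k) flips
    exactly the bit at the position of k's lowest set bit, so each step
    changes one bit of i or of j without recursion or an explicit stack.
    """
    i = j = 0
    yield i, j
    for k in range(1, 1 << (2 * bits)):
        p = (k & -k).bit_length() - 1  # index of the lowest set bit of k
        if p % 2:
            i ^= 1 << (p // 2)
        else:
            j ^= 1 << (p // 2)
        yield i, j


def transpose_gray(src):
    """Transpose a rows x cols matrix by visiting cells in gray_cells order."""
    rows = len(src)
    cols = len(src[0]) if rows else 0
    if rows == 0 or cols == 0:
        return []
    dst = [[None] * rows for _ in range(cols)]
    # Pad to a 2**bits x 2**bits grid and skip the padding cells.
    bits = (max(rows, cols) - 1).bit_length()
    for i, j in gray_cells(bits):
        if i < rows and j < cols:
            dst[j][i] = src[i][j]
    return dst


print(list(gray_cells(1)))                     # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(transpose_gray([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

Because the top bits of gray(k) depend only on the top bits of k, every aligned 2^t × 2^t block is visited as one contiguous run of steps, which is the blocking property recursive transposition relies on. The padding in this sketch wastes many steps for strongly non-square matrices, so it is illustrative rather than efficient.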
