Spelling suggestions: "subject:"cache oblivion""
1 |
Chování datových struktur při změnách velikosti vyrovnávací paměti / Data structure behavior with variable cache sizeKrál, Karel January 2017 (has links)
Cache-oblivious algorithms are well understood when the cache size remains constant. Recently variable cache sizes have been considered. We are motivated by programs running in pseudo-parallel and competing for a single cache. This thesis studies the underlying cache model and gives a generalization of two models considered in the literature. We give a new cache model called the "depth model" where pages are accessed by page depths in an LRU cache instead of their ad- dresses. This model allows us to construct cache-oblivious algorithms that cause a certain number of cache misses prescribed by an arbitrary function computable without causing a cache miss. Finally we prove that two algorithms satisfying the regularity property running in pseudo-parallel cause asymptotically the same number of cache misses as their serial computations provided that the cache is satisfying the tall-cache assumption.
|
2 |
Aplikace Grayových kódů v cache-oblivious algoritmech / Applications of Gray codes in cache-oblivious algorithmsMička, Ondřej January 2019 (has links)
Modern computers employ a sophisticated hierarchy of caches to decrease the latency of memory accesses. This led to the development of cache-oblivious algorithms that strive to achieve the best possible performance on such memory hierarchies with minimal knowledge of the exact parameters of the hierarchy. A common technique used in the design of cache-oblivious algorithms is a recursion-based divide-and-conquer method. In this work, we show an alternative technique based on the Gray codes. We use the binary reflected Gray code to traverse arrays in the cache-oblivious way, allowing us to design algorithms for problems such as matrix transposition, naive matrix multiplication or naive convolution that match the asymptotic performance of their recursion-based counterparts. The advantage is that our algorithms can be implemented without recursion (or a stack that simulates it) by using a loopless algorithm. We also introduce a variant of the binary reflected Gray code tuned to certain applications of our technique and an almost loopless algorithm to generate it. Apart from the theoretical analysis of our technique's performance, we also examine its practical performance on the problem of matrix transposition.
|
3 |
Implicitní reprezentace množin / An implicit representation of setsLieskovský, Matej January 2020 (has links)
In our bachelor thesis, we described an implicit data structure that, given a way to maintain an implicit representation of polylogarithmic buckets, could implement all the dynamic ordered dictionary operations in logarithmic time. We now fulfill our obligation and provide a corresponding construction of implicit buckets. 1
|
4 |
Adaptive Cache-Oblivious All-to-All OperationChung, Shin Yee, Hsu, Wen Jing 01 1900 (has links)
Modern processors rely on cache memories to reduce the latency of data accesses. Extensive cache misses would thus compromise the usefulness of the scheme. Cache-aware algorithms make use of the knowledge about the cache, such as the cache line size, L, and cache size, Z, to be cache efficient. However, careful tuning of these parameters for these algorithms is needed for different hardware platforms. Cache-oblivious (CO) algorithms were first introduced by Leiserson to work without the knowledge of the cache parameters mentioned earlier, but still achieve optimal work complexity and optimal cache complexity. Here we present CO algorithms for all-to-all operations (analogous to the cross-product operation). Its applications include Convolution, Polynomial Arithmetic, Multiple Sequence Alignment, N-Body Simulation, etc. Given two lists each with n elements, a naive implementation of all-to-all operation incurs O(n²/L) cache misses. Our CO version incurs only O(n²/L²√Z) cache misses. Preliminary experiments on Opteron 1.4GHz and MIPS 250MHz show that the CO implementation achieves two times faster. The profiling tool further confirms that the amount of cache misses is significantly lower. We also consider various situations where (a) the elements have non-uniform sizes, (b) an element cannot fit into the cache, (c) the lengths of the lists vary, and (d) an element is linked list. In addition, we study the extension to K-lists All-to-All Operation and its application. Finally, we will present the empirical results and compare with cache-aware algorithms. / Singapore-MIT Alliance (SMA)
|
5 |
Cache Oblivious Data StructuresOhashi, Darin January 2001 (has links)
This thesis discusses cache oblivious data structures. These are structures which have good caching characteristics without knowing Z, the size of the cache, or L, the length of a cache line. Since the structures do not require these details for good performance they are portable across caching systems. Another advantage of such structures isthat the caching results hold for every level of cache within a multilevel cache. Two simple data structures are studied; the array used for binary search and the linear list. As well as being cache oblivious, the structures presented in this thesis are space efficient, requiring little additional storage. We begin the discussion with a layout for a search tree within an array. This layout allows Searches to be performed in O(log n) time and in O(log n/log L) (the optimal number) cache misses. An algorithm for building this layout from a sorted array in linear time is given. One use for this layout is a heap-like implementation of the priority queue. This structure allows Inserts, Heapifies and ExtractMaxes in O(log n) time and O(log nlog L) cache misses. A priority queue using this layout can be builtfrom an unsorted array in linear time. Besides the n spaces required to hold the data, this structure uses a constant amount of additional storage. The cache oblivious linear list allows scans of the list taking Theta(n) time and incurring Theta(n/L) (the optimal number) cache misses. The running time of insertions and deletions is not constant, however it is sub-polynomial. This structure requires e*n additional storage, where e is any constant greater than zero.
|
6 |
Cache Oblivious Data StructuresOhashi, Darin January 2001 (has links)
This thesis discusses cache oblivious data structures. These are structures which have good caching characteristics without knowing Z, the size of the cache, or L, the length of a cache line. Since the structures do not require these details for good performance they are portable across caching systems. Another advantage of such structures isthat the caching results hold for every level of cache within a multilevel cache. Two simple data structures are studied; the array used for binary search and the linear list. As well as being cache oblivious, the structures presented in this thesis are space efficient, requiring little additional storage. We begin the discussion with a layout for a search tree within an array. This layout allows Searches to be performed in O(log n) time and in O(log n/log L) (the optimal number) cache misses. An algorithm for building this layout from a sorted array in linear time is given. One use for this layout is a heap-like implementation of the priority queue. This structure allows Inserts, Heapifies and ExtractMaxes in O(log n) time and O(log nlog L) cache misses. A priority queue using this layout can be builtfrom an unsorted array in linear time. Besides the n spaces required to hold the data, this structure uses a constant amount of additional storage. The cache oblivious linear list allows scans of the list taking Theta(n) time and incurring Theta(n/L) (the optimal number) cache misses. The running time of insertions and deletions is not constant, however it is sub-polynomial. This structure requires e*n additional storage, where e is any constant greater than zero.
|
7 |
Analyse réaliste d'algorithmes standards / Realistic analysis of standard algorithmsAuger, Nicolas 20 December 2018 (has links)
À l'origine de cette thèse, nous nous sommes intéressés à l'algorithme de tri TimSort qui est apparu en 2002, alors que la littérature sur le problème du tri était déjà bien dense. Bien qu'il soit utilisé dans de nombreux langages de programmation, les performances de cet algorithme n'avaient jamais été formellement analysées avant nos travaux. L'étude fine de TimSort nous a conduits à enrichir nos modèles théoriques, en y incorporant des caractéristiques modernes de l'architecture des ordinateurs. Nous avons, en particulier, étudié le mécanisme de prédiction de branchement. Grâce à cette analyse théorique, nous avons pu proposer des modifications de certains algorithmes élémentaires (comme l'exponentiation rapide ou la dichotomie) qui utilisent ce principe à leur avantage, améliorant significativement leurs performances lorsqu'ils sont exécutés sur des machines récentes. Enfin, même s'il est courant dans le cadre de l'analyse en moyenne de considérer que les entrées sont uniformément distribuées, cela ne semble pas toujours refléter les distributions auxquelles nous sommes confrontés dans la réalité. Ainsi, une des raisons du choix d'implanter TimSort dans des bibliothèques standard de Java et Python est probablement sa capacité à s'adapter à des entrées partiellement triées. Nous proposons, pour conclure cette thèse, un modèle mathématique de distribution non-uniforme sur les permutations qui favorise l'apparition d'entrées partiellement triées, et nous en donnons une analyse probabiliste détaillée / At first, we were interested in TimSort, a sorting algorithm which was designed in 2002, at a time where it was hard to imagine new results on sorting. Although it is used in many programming languages, the efficiency of this algorithm has not been studied formally before our work. The fine-grain study of TimSort leads us to take into account, in our theoretical models, some modern features of computer architecture. In particular, we propose a study of the mechanisms of branch prediction. This theoretical analysis allows us to design variants of some elementary algorithms (like binary search or exponentiation by squaring) that rely on this feature to achieve better performance on recent computers. Even if uniform distributions are usually considered for the average case analysis of algorithms, it may not be the best framework for studying sorting algorithms. The choice of using TimSort in many programming languages as Java and Python is probably driven by its efficiency on almost-sorted input. To conclude this dissertation, we propose a mathematical model of non-uniform distribution on permutations, for which permutations that are almost sorted are more likely, and provide a detailed probabilistic analysis
|
8 |
Cache-oblivious Algorithms / Cache-oblivious AlgorithmsVaner, Michal January 2012 (has links)
In this work, we study the cache-oblivious computation model, which is inspired by the behaviour of the memory hierarchy of current computers. We study several graph algorithms and techniques of their design in this model. We consider graph searching, identifying connected components and computing maximal matching. We also study sorting and matrix multiplication as subproblems of many graph algorithms. In ad- dition to previously known algorithms, we present several new ones. We study their efficiency both by the means of asymptotic complexity and by benchmarking them on real hardware and we compare them with classical algorithms.
|
9 |
Cache-Oblivious Searching and Sorting in MultisetsFarzan, Arash January 2004 (has links)
We study three problems related to searching and sorting in multisets in the cache-oblivious model: Finding the most frequent element (the mode), duplicate elimination and finally multi-sorting. We are interested in minimizing the cache complexity (or number of cache misses) of algorithms for these problems in the context under which the cache size and block size are unknown.
We start by showing the lower bounds in the comparison model. Then we present the lower bounds in the cache-aware model, which are also the lower bounds in the cache-oblivious model. We consider the input multiset of size <i>N</i> with multiplicities <i>N</i><sub>1</sub>,. . . , <i>N<sub>k</sub></i>. The lower bound for the cache complexity of determining the mode is Ω({<i>N</i> over <i>B</i>} log {<i>M</i> over <i>B</i>} {<i>N</i> over <i>fB</i>}) where ƒ is the frequency of the mode and <i>M</i>, <i>B</i> are the cache size and block size respectively. Cache complexities of duplicate removal and multi-sorting have lower bounds of Ω({<i>N</i> over <i>B</i>} log {<i>M</i> over <i>B</i>} {<i>N</i> over <i>B</i>} - £{<i>k</i> over <i>i</i>}=1{<i>N<sub>i</sub></i> over <i>B</i>}log {<i>M</i> over <i>B</i>} {<i>N<sub>i</sub></i> over <i>B</i>}).
We present two deterministic approaches to give algorithms: selection and distribution. The algorithms with these deterministic approaches differ from the lower bounds by at most an additive term of {<i>N</i> over <i>B</i>} loglog <i>M</i>. However, since loglog <i>M</i> is very small in real applications, the gap is tiny. Nevertheless, the ideas of our deterministic algorithms can be used to design cache-aware algorithms for these problems. The algorithms turn out to be simpler than the previously-known cache-aware algorithms for these problems.
Another approach to design algorithms for these problems is the probabilistic approach. In contrast to the deterministic algorithms, our randomized cache-oblivious algorithms are all optimal and their cache complexities exactly match the lower bounds.
All of our algorithms are within a constant factor of optimal in terms of the number of comparisons they perform.
|
10 |
Cache-Oblivious Searching and Sorting in MultisetsFarzan, Arash January 2004 (has links)
We study three problems related to searching and sorting in multisets in the cache-oblivious model: Finding the most frequent element (the mode), duplicate elimination and finally multi-sorting. We are interested in minimizing the cache complexity (or number of cache misses) of algorithms for these problems in the context under which the cache size and block size are unknown.
We start by showing the lower bounds in the comparison model. Then we present the lower bounds in the cache-aware model, which are also the lower bounds in the cache-oblivious model. We consider the input multiset of size <i>N</i> with multiplicities <i>N</i><sub>1</sub>,. . . , <i>N<sub>k</sub></i>. The lower bound for the cache complexity of determining the mode is Ω({<i>N</i> over <i>B</i>} log {<i>M</i> over <i>B</i>} {<i>N</i> over <i>fB</i>}) where ƒ is the frequency of the mode and <i>M</i>, <i>B</i> are the cache size and block size respectively. Cache complexities of duplicate removal and multi-sorting have lower bounds of Ω({<i>N</i> over <i>B</i>} log {<i>M</i> over <i>B</i>} {<i>N</i> over <i>B</i>} - £{<i>k</i> over <i>i</i>}=1{<i>N<sub>i</sub></i> over <i>B</i>}log {<i>M</i> over <i>B</i>} {<i>N<sub>i</sub></i> over <i>B</i>}).
We present two deterministic approaches to give algorithms: selection and distribution. The algorithms with these deterministic approaches differ from the lower bounds by at most an additive term of {<i>N</i> over <i>B</i>} loglog <i>M</i>. However, since loglog <i>M</i> is very small in real applications, the gap is tiny. Nevertheless, the ideas of our deterministic algorithms can be used to design cache-aware algorithms for these problems. The algorithms turn out to be simpler than the previously-known cache-aware algorithms for these problems.
Another approach to design algorithms for these problems is the probabilistic approach. In contrast to the deterministic algorithms, our randomized cache-oblivious algorithms are all optimal and their cache complexities exactly match the lower bounds.
All of our algorithms are within a constant factor of optimal in terms of the number of comparisons they perform.
|
Page generated in 0.0539 seconds