• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 5
  • 5
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Adaptive Cache-Oblivious All-to-All Operation

Chung, Shin Yee, Hsu, Wen Jing 01 1900 (has links)
Modern processors rely on cache memories to reduce the latency of data accesses. Extensive cache misses would thus compromise the usefulness of the scheme. Cache-aware algorithms make use of the knowledge about the cache, such as the cache line size, L, and cache size, Z, to be cache efficient. However, careful tuning of these parameters for these algorithms is needed for different hardware platforms. Cache-oblivious (CO) algorithms were first introduced by Leiserson to work without the knowledge of the cache parameters mentioned earlier, but still achieve optimal work complexity and optimal cache complexity. Here we present CO algorithms for all-to-all operations (analogous to the cross-product operation). Its applications include Convolution, Polynomial Arithmetic, Multiple Sequence Alignment, N-Body Simulation, etc. Given two lists each with n elements, a naive implementation of all-to-all operation incurs O(n²/L) cache misses. Our CO version incurs only O(n²/L²√Z) cache misses. Preliminary experiments on Opteron 1.4GHz and MIPS 250MHz show that the CO implementation achieves two times faster. The profiling tool further confirms that the amount of cache misses is significantly lower. We also consider various situations where (a) the elements have non-uniform sizes, (b) an element cannot fit into the cache, (c) the lengths of the lists vary, and (d) an element is linked list. In addition, we study the extension to K-lists All-to-All Operation and its application. Finally, we will present the empirical results and compare with cache-aware algorithms. / Singapore-MIT Alliance (SMA)
2

Analyse und Erweiterung des Paradyn Performance Tools

Arndt, Michael 12 May 2006 (has links) (PDF)
Das kostenfrei erhältliche Performanz Analyse Werkzeug Paradyn wird im Hinblick auf die Tauglichkeit zur Performanzanalyse quantenmechanischer Anwendungen (konkret Abinit) untersucht. Zusätzlich wird Paradyn so erweitert, dass eine Analyse mittels vorhandener Hardwarecounter möglich ist. Da Paradyn plattformunabhängig ist werden Performance Counter Bibliotheken wie PCL oder PAPI verwendet.
3

Analyse und Erweiterung des Paradyn Performance Tools

Arndt, Michael 12 May 2006 (has links)
Das kostenfrei erhältliche Performanz Analyse Werkzeug Paradyn wird im Hinblick auf die Tauglichkeit zur Performanzanalyse quantenmechanischer Anwendungen (konkret Abinit) untersucht. Zusätzlich wird Paradyn so erweitert, dass eine Analyse mittels vorhandener Hardwarecounter möglich ist. Da Paradyn plattformunabhängig ist werden Performance Counter Bibliotheken wie PCL oder PAPI verwendet.
4

Experiments with the pentium Performance monitoring counters

Agarwal, Gunjan 06 1900 (has links)
Performance monitoring counters are implemented in most recent microprocessors. In this thesis, we describe various performance measurement experiments for a program and a system that we conducted on a Linux operating system using the Pentium performance counters. We carried out our performance measurements on a Pentium II microprocessor. The Pentium II performance counters can be configured to count events such as cache misses, TLB misses, instructions executed etc. We used a low intrusive overhead technique to access these performance counters. We used these performance counters to measure the cache miss overheads due to context switches in Linux system. Our methodology involves sampling the hardware counters every 50ps. The sampling was set up using signals related to interval timers. We describe an analytical cache performance model under multiprogrammed condition from the literature and validate it using the performance monitoring counters. We next explores the long term performance of a system under different workload conditions. Various performance monitoring events - data cache h, data TLB misses, data cache reads or writes, branches etc. - are monitored over a 24 hour period. This is useful in identifying activities which cause loss of system performance. We used timer interrupts for sampling the performance counters. We develop a profiling methodology to give a perspective of performance of the different functions of a program, not only on the basis of execution-time but also on the data cache misses. Available tools like prof on Unix can be used to pinpoint the regions of performance loss of programs, but they mainly rely on an execution-time profiles. This does not give insight into problems in cache performance for that program. So we develop this methodology to get the performance of each function of the program not only on the basis of its execution time but also on the basis of its cache behavior.
5

Investigating the effect of implementing Data-Oriented Design principles on performance and cache utilization

Nyberg, Frank January 2021 (has links)
Game engines process a lot of data under strict deadlines. Therefore, measures to increase performance are important in this area. Data-Oriented Design (DOD) promotes principles that are meant to increase performance by better cache utilization. The purpose of this thesis is to examine a selection of these principles to give a better understanding of how DOD affects CPU time and the rate of cache misses, with focus on the area of game development. More specifically, the principles examined are removal of run-time polymorphism, iteration over contiguous data, and lowering the amount of data in hot loops. Also, the Entity-Component-System pattern is examined, which is based upon DOD principles. The approach was to first present a theoretical background on the subject, and then to conduct tests by implementing a simulation of movement and collision detection utilizing said principles. The tests were written in C++ and executed on an Intel Core i7 4770k with no rendering. CPU time was measured in updated entities per μs, and cache utilization was measured in the form of cache miss rate. The results showed that the DOD principles did increase performance. Cache miss rate was also lower, with the exception of when removing run-time polymorphism. The conclusion is that Data-Oriented Design, used in game development, is likely to result in better performance, mostly as a result of better cache utilization.

Page generated in 0.0698 seconds