1. A Study on Flat-Address-Space Heterogeneous Memory Architectures

Islam, Mahzabeen, 05 1900
In this dissertation, we present a number of studies that primarily focus on data-movement challenges among different types of memories (viz., 3D-DRAM, DDRx DRAM, and NVM) employed together as a flat-address heterogeneous memory system. We introduce two different hardware-based techniques for prefetching data from slow off-chip phase change memory (PCM) to fast on-chip memories. The prefetching techniques efficiently fetch data from PCM and place it into processor-resident or 3D-DRAM-resident buffers without placing high demands on bandwidth, and they provide significant performance improvements. Next, we explore different page migration techniques for flat-address memory systems, which differ in when pages are migrated (i.e., periodically or instantaneously) and how the migrations are managed (i.e., OS-based or hardware-based). In the first page migration study, we present several epoch-based page migration policies for different organizations of flat-address memories consisting of two (2-level) and three (3-level) types of memory modules. These policies result in significant energy savings. In the next page migration study, we devise an efficient "on-the-fly" page migration technique that migrates a page from slow PCM to fast 3D-DRAM whenever it receives a certain number of memory accesses, without waiting for any specific time interval. Furthermore, we present a lightweight hardware-assisted address reconciliation process for managing the addresses of migrated pages. This on-the-fly page migration with hardware-assisted address reconciliation provides significant performance improvement over systems using epoch-based page migration and OS-based address management. Finally, we have developed an analytical model, based on offline analysis of per-page memory access counts, that recommends whether or not an application is migration-friendly. This can be useful in deciding whether page migration (either epoch-based or on-the-fly) should be used or turned off for a given application. Thus, our data management techniques and model enable significant performance improvements for flat-address heterogeneous memory systems involving NVMs.
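The threshold-triggered migration idea described in this abstract can be illustrated with a minimal sketch. All names and values below (MigrationController, HOT_THRESHOLD, the remap table) are assumptions for illustration, not details taken from the dissertation: a controller counts accesses to PCM-resident pages, copies a page to 3D-DRAM once it crosses a threshold, and reconciles later addresses through a small remap table.

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative sketch only; names, threshold value, and page size are assumptions.
constexpr int HOT_THRESHOLD = 32;        // PCM accesses before a page is migrated (assumed)
constexpr uint64_t PAGE_SHIFT = 12;      // 4 KiB pages (assumed)

struct MigrationController {
    std::unordered_map<uint64_t, int> access_count;   // per-page access counters
    std::unordered_map<uint64_t, uint64_t> remap;     // PCM page -> 3D-DRAM frame
    uint64_t next_dram_frame = 0;                     // stand-in for a real frame allocator

    // Called on every access that falls in the PCM region.
    void on_pcm_access(uint64_t paddr) {
        uint64_t page = paddr >> PAGE_SHIFT;
        if (remap.count(page)) return;                // already migrated
        if (++access_count[page] == HOT_THRESHOLD) {
            uint64_t frame = next_dram_frame++;       // in hardware: pick a free 3D-DRAM frame
            // In a real system a DMA-style engine would copy the page contents here.
            remap[page] = frame;                      // record the new location
        }
    }

    // Address reconciliation: redirect accesses to a migrated page's new frame.
    uint64_t translate(uint64_t paddr) const {
        uint64_t page = paddr >> PAGE_SHIFT;
        auto it = remap.find(page);
        if (it == remap.end()) return paddr;          // not migrated: use original address
        uint64_t offset = paddr & ((1ull << PAGE_SHIFT) - 1);
        return (it->second << PAGE_SHIFT) | offset;
    }
};

int main() {
    MigrationController mc;
    for (int i = 0; i < HOT_THRESHOLD; ++i) mc.on_pcm_access(0x1000);
    return mc.translate(0x1004) == 0x4 ? 0 : 1;       // page 1 is now served from frame 0
}
```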
2. Improving Memory Performance for Both High Performance Computing and Embedded/Edge Computing Systems

Adavally, Shashank, 12 1900
The CPU-memory bottleneck is a widely recognized problem. It is known that the majority of high-performance computing (HPC) database systems are configured with large memories and dedicated to processing specific workloads such as weather prediction and molecular dynamics simulations. My research on optimal address mapping improves memory performance by increasing channel- and bank-level parallelism. In another research direction, I proposed and evaluated adaptive page migration techniques that obviate the need for offline analysis of an application to determine page migration strategies. Furthermore, I explored other migration strategies, such as reverse migration and sub-page migration, which I found to be beneficial depending on application behavior. Ideally, page migration strategies redirect demand memory traffic to faster memory to improve memory performance. In my third contribution, I worked on and evaluated a memory-side accelerator that assists the main computational core in locating the non-zero elements of sparse matrices, which are typical of scientific and machine learning workloads, on a low-power embedded system configuration. Thus, my contributions narrow the speed gap by improving the latency and/or bandwidth between CPU and memory.
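As a rough illustration of the channel/bank-interleaved address-mapping idea mentioned above, the sketch below carves a physical address into DRAM coordinates so that consecutive cache lines are serviced by different channels and banks. The field widths and bit positions are assumptions for a generic DDRx organization, not the mapping proposed in the thesis.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative address-mapping sketch; all field widths below are assumptions.
struct DramCoord {
    unsigned channel, rank, bank, row, column;
};

constexpr unsigned LINE_BITS    = 6;  // 64-byte cache line
constexpr unsigned CHANNEL_BITS = 2;  // 4 channels (assumed)
constexpr unsigned BANK_BITS    = 4;  // 16 banks per rank (assumed)
constexpr unsigned RANK_BITS    = 1;  // 2 ranks (assumed)
constexpr unsigned COLUMN_BITS  = 7;  // column bits per row segment (assumed)

// Place channel and bank bits just above the line offset so that consecutive
// cache lines map to different channels and banks and can be served in parallel.
DramCoord map_address(uint64_t paddr) {
    uint64_t a = paddr >> LINE_BITS;              // drop the line offset
    DramCoord c{};
    c.channel = a & ((1u << CHANNEL_BITS) - 1);   a >>= CHANNEL_BITS;
    c.bank    = a & ((1u << BANK_BITS) - 1);      a >>= BANK_BITS;
    c.rank    = a & ((1u << RANK_BITS) - 1);      a >>= RANK_BITS;
    c.column  = a & ((1u << COLUMN_BITS) - 1);    a >>= COLUMN_BITS;
    c.row     = static_cast<unsigned>(a);         // remaining bits select the row
    return c;
}

int main() {
    // Four consecutive 64-byte lines land on four different channels.
    for (uint64_t addr = 0; addr < 4 * 64; addr += 64) {
        DramCoord c = map_address(addr);
        std::printf("addr %#llx -> channel %u, bank %u\n",
                    static_cast<unsigned long long>(addr), c.channel, c.bank);
    }
    return 0;
}
```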
3. Iterative and Adaptive PDE Solvers for Shared Memory Architectures / Iterativa och adaptiva PDE-lösare för parallelldatorer med gemensam minnesorganisation

Löf, Henrik, January 2006
Scientific computing is used frequently in an increasing number of disciplines to accelerate scientific discovery. Many such computing problems involve the numerical solution of partial differential equations (PDE). In this thesis we explore and develop methodology for high-performance implementations of PDE solvers for shared-memory multiprocessor architectures. We consider three realistic PDE settings: solution of the Maxwell equations in 3D using an unstructured grid and the method of conjugate gradients, solution of the Poisson equation in 3D using a geometric multigrid method, and solution of an advection equation in 2D using structured adaptive mesh refinement. We apply software optimization techniques to increase both parallel efficiency and the degree of data locality. In our evaluation we use several different shared-memory architectures ranging from symmetric multiprocessors and distributed shared-memory architectures to chip-multiprocessors. For distributed shared-memory systems we explore methods of data distribution to increase the amount of geographical locality. We evaluate automatic and transparent page migration based on runtime sampling, user-initiated page migration using a directive with an affinity-on-next-touch semantic, and algorithmic optimizations for page-placement policies. Our results show that page migration increases the amount of geographical locality and that the parallel overhead related to page migration can be amortized over the iterations needed to reach convergence. This is especially true for the affinity-on-next-touch methodology whereby page migration can be initiated at an early stage in the algorithms. We also develop and explore methodology for other forms of data locality and conclude that the effect on performance is significant and that this effect will increase for future shared-memory architectures. Our overall conclusion is that, if the involved locality issues are addressed, the shared-memory programming model provides an efficient and productive environment for solving many important PDE problems.
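As a small illustration of the geographical-locality idea discussed above, the sketch below shows generic parallel first-touch placement in C++/OpenMP. This is not code from the thesis, which evaluates runtime page migration and an affinity-on-next-touch directive on its target systems; the array names and the CG-style update are illustrative only.

```cpp
#include <cstdint>
#include <memory>

// Generic first-touch placement sketch (C++/OpenMP); illustrative assumptions only.
// On a NUMA / distributed shared-memory system, the thread that first writes a
// page typically determines where that page is placed, so the arrays are left
// uninitialized here and written inside the same static schedule that the
// compute loops later reuse.
int main() {
    const std::int64_t n = 1 << 24;
    std::unique_ptr<double[]> p(new double[n]);   // default-initialized: pages not yet touched
    std::unique_ptr<double[]> q(new double[n]);

    // Parallel first touch: each thread initializes (and thereby places) its chunk.
    #pragma omp parallel for schedule(static)
    for (std::int64_t i = 0; i < n; ++i) {
        p[i] = 1.0;
        q[i] = 0.0;
    }

    // A later CG-style update reuses the same schedule, so most accesses hit
    // locally placed pages (better geographical locality).
    const double alpha = 0.5;
    #pragma omp parallel for schedule(static)
    for (std::int64_t i = 0; i < n; ++i) {
        q[i] += alpha * p[i];
    }
    return 0;
}
```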
