• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 118
  • 11
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 145
  • 145
  • 145
  • 134
  • 39
  • 35
  • 33
  • 29
  • 27
  • 19
  • 13
  • 11
  • 11
  • 10
  • 10
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

An API for adaptive loop scheduling in shared address space architectures

Govindaswamy, Kirthilakshmi. January 2003 (has links) (PDF)
Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
132

Active management of Cache resources

Ramaswamy, Subramanian 08 July 2008 (has links)
This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of challenges relates to the memory bottleneck. As the focus shifts from scaling processor frequency to scaling the number of cores, performance growth demands increasing die area. Scaling the number of cores also places a concurrent area demand in the form of larger caches. While on-chip caches occupy 50-60% of area and consume 20-30% of energy expended on-chip, their performance and energy efficiencies are less than 15% and 1% respectively for a range of benchmarks! The second set of challenges is posed by transistor leakage and process variation (inter-die and intra-die) at future technology nodes. Leakage power is anticipated to increase exponentially and sharply lower defect-free yield with successive technology generations. For performance scaling to continue, cache efficiencies have to improve significantly. This thesis proposes and evaluates a broad family of such improvements. This dissertation first contributes a model for cache efficiencies and finds them to be extremely low - performance efficiencies less than 15% and energy efficiencies in the order of 1%. Studying the sources of inefficiency leads to a framework for efficiency improvement based on two interrelated strategies. The approach for improving energy efficiency primarily relies on sizing the cache to match the application memory footprint during a program phase while powering down all remaining cache sets. Importantly, the sized is fully functional with no references to inactive sets. Improving performance efficiency primarily relies on cache shaping, i.e., changing the placement function and thereby the manner in which memory shares the cache. Sizing and shaping are applied at different phase of the design cycle: i) post-manufacturing & offline, ii) at compile-time, and at iii) run-time. This thesis proposes and explores techniques at each phase collectively realizing a repertoire of techniques for future memory system designers. The techniques use a combination of HW-SW techniques and are demonstrated to provide substantive improvements with modest overheads.
133

Dynamic partitioned global address spaces for high-efficiency computing

Young, Jeffrey 19 November 2008 (has links)
The current trend of ever larger clusters and data centers has coincided with a dramatic increase in the cost and power of these installations. While many efficiency improvements have focused on processor power and cooling costs, reducing the cost and power consumption of high-performance memory has mostly been overlooked. This thesis proposes a new address translation model called Dynamic Partitioned Global Address Space (DPGAS) that extends the ideas of NUMA and software-based approaches to create a high-performance hardware model that can be used to reduce the overall cost and power of memory in larger server installations. A memory model and hardware implementation of DPGAS is developed, and simulations of memory-intensive workloads are used to show potential cost and power reductions when DPGAS is integrated into a server environment.
134

An initial operating system adaptation heuristic for Swap Cluster Max (SCM)

Somanathan, Muthuveer, January 2008 (has links)
Thesis (M.S.)--University of Texas at El Paso, 2008. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
135

Improving instruction fetch rate with code pattern cache for superscalar architecture

Beg, Azam Muhammad, January 2005 (has links)
Thesis (Ph.D.) -- Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.
136

SRAM system design for memory based computing

Zia, Muneeb 03 April 2013 (has links)
The objective of the research was to design and test an SRAM system which can meet the performance criteria for Memory Based Computing (MBC). This form of computing consists of a Look-Up Table (LUT) which is basically memory array mapped with a function; the computations thereafter consist of essentially read operations. An MBC framework requires very fast and low power read operations. Moreover, the cells need to be read stable as major part of the computation is done by reading the LUTs mapped in the SRAM array. Design and measurement of a prototype MBC test-chip with SRAM system optimized for read-heavy applications is presented in this thesis. For this purpose, a prototype MBC system was designed and taped out. Essential study of the write-ability of the core LUT is also presented. The core memory array for function table mapping was characterized for leakage, write-ability and power saving associated with pulsed read mode.
137

Design of heterogeneous coherence hierarchies using manager-client pairing

Beu, Jesse Garrett 09 April 2013 (has links)
Over the past ten years, the architecture community has witnessed the end of single-threaded performance scaling and a subsequent shift in focus toward multicore and manycore processing. While this is an exciting time for architects, with many new opportunities and design spaces to explore, this brings with it some new challenges. One area that is especially impacted is the memory subsystem. Specifically, the design, verification, and evaluation of cache coherence protocols becomes very challenging as cores become more numerous and more diverse. This dissertation examines these issues and presents Manager-Client Pairing as a solution to the challenges facing next-generation coherence protocol design. By defining a standardized coherence communication interface and permissions checking algorithm, Manager-Client Pairing enables coherence hierarchies to be constructed and evaluated quickly without the high design-cost previously associated with hierarchical composition. Further, Manager-Client Pairing also allows for verification composition, even in the presence of protocol heterogeneity. As a result, this rapid development of diverse protocols is ensured to be bug-free, enabling architects to focus on performance optimization, rather than debugging and correctness concerns, while comparing diverse coherence configurations for use in future heterogeneous systems.
138

Hardware assisted memory checkpointing and applications in debugging and reliability

Doudalis, Ioannis 25 July 2011 (has links)
The problems of software debugging and system reliability/availability are among the most challenging problems the computing industry is facing today, with direct impact on the development and operating costs of computing systems. A promising debugging technique that assists programmers identify and fix the causes of software bugs a lot more efficiently is bidirectional debugging, which enables the user to execute the program in "reverse", and a typical method used to recover a system after a fault is backwards error recovery, which restores the system to the last error-free state. Both reverse execution and backwards error recovery are enabled by creating memory checkpoints, which are used to restore the program/system to a prior point in time and re-execute until the point of interest. The checkpointing frequency is the primary factor that affects both the latency of reverse execution and the recovery time of the system; more frequent checkpoints reduce the necessary re-execution time. Frequent creation of checkpoints poses performance challenges, because of the increased number of memory reads and writes necessary for copying the modified system/program memory, and also because of software interventions, additional synchronization and I/O, etc., needed for creating a checkpoint. In this thesis I examine a number of different hardware accelerators, whose role is to create frequent memory checkpoints in the background, at minimal performance overheads. For the purpose of reverse execution, I propose the HARE and Euripus hardware checkpoint accelerators. HARE and Euripus create different types of checkpoints, and employ different methods for keeping track of the modified memory. As a result, HARE and Euripus have different hardware costs and provide different functionality which directly affects the latency of reverse execution. For improving the availability of the system, I propose the Kyma hardware accelerator. Kyma enables simultaneous creation of checkpoints at different frequencies, which allows the system to recover from multiple types of errors and tolerate variable error-detection latencies. The Kyma and Euripus hardware engines have similar architectures, but the functionality of the Kyma engine is optimized for further reducing the performance overheads and improving the reliability of the system. The functionality of the Kyma and Euripus engines can be combined into a unified accelerator that can serve the needs of both bidirectional debugging and system recovery.
139

The use of memory state knowledge to improve computer memory system organization

Isen, Ciji 01 June 2011 (has links)
The trends in virtualization as well as multi-core, multiprocessor environments have translated to a massive increase in the amount of main memory each individual system needs to be fitted with, so as to effectively utilize this growing compute capacity. The increasing demand on main memory implies that the main memory devices and their issues are as important a part of system design as the central processors. The primary issues of modern memory are power, energy, and scaling of capacity. Nearly a third of the system power and energy can be from the memory subsystem. At the same time, modern main memory devices are limited by technology in their future ability to scale and keep pace with the modern program demands thereby requiring exploration of alternatives to main memory storage technology. This dissertation exploits dynamic knowledge of memory state and memory data value to improve memory performance and reduce memory energy consumption. A cross-boundary approach to communicate information about dynamic memory management state (allocated and deallocated memory) between software and hardware viii memory subsystem through a combination of ISA support and hardware structures is proposed in this research. These mechanisms help identify memory operations to regions of memory that have no impact on the correct execution of the program because they were either freshly allocated or deallocated. This inference about the impact stems from the fact that, data in memory regions that have been deallocated are no longer useful to the actual program code and data present in freshly allocated memory is also not useful to the program because the dynamic memory has not been defined by the program. By being cognizant of this, such memory operations are avoided thereby saving energy and improving the usefulness of the main memory. Furthermore, when stores write zeros to memory, the number of stores to the memory is reduced in this research by capturing it as compressed information which is stored along with memory management state information. Using the methods outlined above, this dissertation harnesses memory management state and data value information to achieve significant savings in energy consumption while extending the endurance limit of memory technologies. / text
140

Global address spaces for efficient resource provisioning in the data center

Young, Jeffrey Scott 13 January 2014 (has links)
The rise of large data sets, or "Big Data'', has coincided with the rise of clusters with large amounts of memory and GPU accelerators that can be used to process rapidly growing data footprints. However, the complexity and performance limitations of sharing memory and accelerators in a cluster limits the options for efficient management and allocation of resources for applications. The global address space model (GAS), and specifically hardware-supported GAS, is proposed as a means to provide a high-performance resource management platform upon which resource sharing between nodes and resource aggregation across nodes can take place. This thesis builds on the initial concept of GAS with a model that is matched to "Big Data'' computing and its data transfer requirements. The proposed model, Dynamic Partitioned Global Address Spaces (DPGAS), is implemented using a commodity converged interconnect, HyperTransport over Ethernet (HToE), and a software framework, the Oncilla runtime and API. The DPGAS model and associated hardware and software components are used to investigate two application spaces, resource sharing for time-varying workloads and resource aggregation for GPU-accelerated data warehousing applications. This work demonstrates that hardware-supported GAS can be used improve the performance and power consumption of memory-intensive applications, and that it can be used to simplify host and accelerator resource management in the data center.

Page generated in 0.6646 seconds