131 |
An API for adaptive loop scheduling in shared address space architectures. Govindaswamy, Kirthilakshmi. January 2003 (has links) (PDF)
Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
|
132 |
Active management of Cache resources. Ramaswamy, Subramanian 08 July 2008 (has links)
This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of challenges relates to the memory bottleneck. As the focus shifts from scaling processor frequency to scaling the number of cores, performance growth demands increasing die area. Scaling the number of cores also places a concurrent area demand in the form of larger caches. While on-chip caches occupy 50-60% of area and consume 20-30% of energy expended on-chip, their performance and energy efficiencies are less than 15% and 1% respectively for a range of benchmarks! The second set of challenges is posed by transistor leakage and process variation (inter-die and intra-die) at future technology nodes. Leakage power is anticipated to increase exponentially and sharply lower defect-free yield with successive technology generations. For performance scaling to continue, cache efficiencies have to improve significantly. This thesis proposes and evaluates a broad family of such improvements.
This dissertation first contributes a model for cache efficiencies and finds them to be extremely low: performance efficiencies below 15% and energy efficiencies on the order of 1%. Studying the sources of inefficiency leads to a framework for efficiency improvement based on two interrelated strategies. The approach for improving energy efficiency primarily relies on sizing the cache to match the application's memory footprint during a program phase while powering down all remaining cache sets. Importantly, the sized cache is fully functional, with no references to inactive sets. Improving performance efficiency primarily relies on cache shaping, i.e., changing the placement function and thereby the manner in which memory shares the cache.
Sizing and shaping are applied at different phases of the design cycle: i) post-manufacturing and offline, ii) at compile time, and iii) at run time. This thesis proposes and explores techniques at each phase, collectively realizing a repertoire of techniques for future memory system designers. The techniques combine hardware and software mechanisms and are demonstrated to provide substantive improvements with modest overheads.
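The shaping idea can be illustrated with a toy model: below, a plain modulo set-index is contrasted with an XOR-folded placement function on a conflict-heavy strided access pattern. The cache model, its parameters, and the XOR hash are illustrative assumptions, not the dissertation's actual mechanisms.

```python
# Toy set-associative cache: "shaping" = swapping the placement (set-index) function.

class Cache:
    def __init__(self, num_sets, ways, index_fn):
        self.num_sets = num_sets
        self.ways = ways
        self.index_fn = index_fn                    # placement function: block addr -> set
        self.sets = [[] for _ in range(num_sets)]   # each set holds block tags in LRU order
        self.misses = 0

    def access(self, block_addr):
        s = self.index_fn(block_addr) % self.num_sets
        ways = self.sets[s]
        if block_addr in ways:
            ways.remove(block_addr)                 # hit: refresh LRU position
        else:
            self.misses += 1
            if len(ways) == self.ways:
                ways.pop(0)                         # miss: evict LRU block
        ways.append(block_addr)

# A strided pattern that maps every block to set 0 under plain modulo:
trace = [i * 64 for i in range(64)] * 4             # 64 blocks, stride = number of sets

modulo = Cache(64, 4, index_fn=lambda a: a)                 # conventional placement
xor_hash = Cache(64, 4, index_fn=lambda a: a ^ (a >> 6))    # fold upper address bits in

for blk in trace:
    modulo.access(blk)
    xor_hash.access(blk)

print(modulo.misses, xor_hash.misses)   # 256 64 -- XOR hashing spreads the conflicts
```

Here every access conflicts under the modulo index, while the XOR-folded function spreads the same working set across all sets, leaving only compulsory misses.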
|
133 |
Dynamic partitioned global address spaces for high-efficiency computing. Young, Jeffrey 19 November 2008 (has links)
The current trend of ever larger clusters and data centers has coincided with a dramatic increase in the cost and power of these installations. While many efficiency improvements have focused on processor power and cooling costs, reducing the cost and power consumption of high-performance memory has mostly been overlooked. This thesis proposes a new address translation model called Dynamic Partitioned Global Address Space (DPGAS) that extends the ideas of NUMA and software-based approaches to create a high-performance hardware model that can be used to reduce the overall cost and power of memory in larger server installations. A memory model and hardware implementation of DPGAS are developed, and simulations of memory-intensive workloads are used to show potential cost and power reductions when DPGAS is integrated into a server environment.
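A minimal sketch of the translation idea behind a partitioned global address space: a physical address is split into a page number and offset, and a small remap table lets pages of the local address space be dynamically backed by memory on a remote node. The field widths, table layout, and function names below are illustrative assumptions, not the DPGAS hardware design.

```python
# Sketch: dynamic remapping of local pages onto remote nodes' memory.

LOCAL_NODE = 0
PAGE_BITS = 12                 # assume 4 KB pages
partition_table = {}           # local page number -> (remote node, remote page number)

def map_partition(local_page, remote_node, remote_page):
    """Dynamically borrow a remote node's page into the local address space."""
    partition_table[local_page] = (remote_node, remote_page)

def translate(addr):
    """Return (node, physical address) for a local memory request."""
    page, off = addr >> PAGE_BITS, addr & ((1 << PAGE_BITS) - 1)
    if page in partition_table:
        node, rpage = partition_table[page]
        return node, (rpage << PAGE_BITS) | off    # forward over the interconnect
    return LOCAL_NODE, addr                        # serve from local DRAM

map_partition(local_page=2, remote_node=5, remote_page=0x1A)
print(translate(0x2004))   # page 2 is remapped: served by node 5
print(translate(0x3004))   # unmapped: served locally
```

Because the table is mutable at run time, a node short on DRAM can "grow" its memory by claiming unused pages elsewhere in the rack, which is the cost/power lever the abstract describes.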
|
134 |
An initial operating system adaptation heuristic for Swap Cluster Max (SCM). Somanathan, Muthuveer, January 2008 (has links)
Thesis (M.S.)--University of Texas at El Paso, 2008. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
|
135 |
Improving instruction fetch rate with code pattern cache for superscalar architecture. Beg, Azam Muhammad, January 2005 (has links)
Thesis (Ph.D.) -- Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.
|
136 |
SRAM system design for memory based computing. Zia, Muneeb 03 April 2013 (has links)
The objective of the research was to design and test an SRAM system that can meet the performance criteria for Memory Based Computing (MBC). This form of computing consists of a Look-Up Table (LUT), essentially a memory array mapped with a function; the computations thereafter consist largely of read operations. An MBC framework requires very fast and low-power read operations. Moreover, the cells need to be read-stable, as the major part of the computation is done by reading the LUTs mapped into the SRAM array.
Design and measurement of a prototype MBC test chip with an SRAM system optimized for read-heavy applications are presented in this thesis. For this purpose, a prototype MBC system was designed and taped out. An essential study of the write-ability of the core LUT is also presented. The core memory array for function-table mapping was characterized for leakage, write-ability, and the power savings associated with a pulsed read mode.
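The mapping/read split at the heart of MBC can be sketched in a few lines: a function is precomputed into a table (standing in for the SRAM array), after which each "computation" is just a read at an address formed from the operands. The 4-bit multiplier below is an assumed example, not the function used on the test chip.

```python
# Memory-based computing sketch: precompute f into a LUT, then compute by reading.

WIDTH = 4   # 4-bit operands -> 8-bit address into the LUT

# One-time "mapping" phase: fill the array with f(a, b) = a * b.
lut = [(addr >> WIDTH) * (addr & ((1 << WIDTH) - 1))
       for addr in range(1 << (2 * WIDTH))]

def mbc_multiply(a, b):
    # Run-time phase: concatenate the operands into an address and read.
    return lut[(a << WIDTH) | b]

print(mbc_multiply(7, 9))   # a pure read operation yields 63
```

This is why the abstract stresses read stability and low-power reads: once the function table is written, correctness of every subsequent computation rests entirely on reliable reads of the array.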
|
137 |
Design of heterogeneous coherence hierarchies using manager-client pairing. Beu, Jesse Garrett 09 April 2013 (has links)
Over the past ten years, the architecture community has witnessed the end of single-threaded performance scaling and a subsequent shift in focus toward multicore and manycore processing. While this is an exciting time for architects, with many new opportunities and design spaces to explore, this brings with it some new challenges. One area that is especially impacted is the memory subsystem. Specifically, the design, verification, and evaluation of cache coherence protocols becomes very challenging as cores become more numerous and more diverse.
This dissertation examines these issues and presents Manager-Client Pairing as a solution to the challenges facing next-generation coherence protocol design. By defining a standardized coherence communication interface and permissions-checking algorithm, Manager-Client Pairing enables coherence hierarchies to be constructed and evaluated quickly, without the high design cost previously associated with hierarchical composition. Further, Manager-Client Pairing also allows for verification composition, even in the presence of protocol heterogeneity. As a result, rapidly developed, diverse protocols are ensured to be bug-free, enabling architects to focus on performance optimization rather than debugging and correctness concerns while comparing diverse coherence configurations for use in future heterogeneous systems.
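A loose sketch of the pairing idea: every level of the hierarchy exposes the same client interface to the level above it, and a manager may only grant permissions it itself holds, escalating to its parent manager otherwise. The states below are the standard MSI lattice; the class and method names are illustrative assumptions, not the dissertation's actual interface.

```python
# Hierarchical coherence sketch: each node is a manager for its children
# and simultaneously a client of its parent.

RANK = {"I": 0, "S": 1, "M": 2}   # Invalid < Shared < Modified

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.state = "M" if parent is None else "I"   # the root owns all memory

    def acquire(self, want):
        # Permissions check: a manager may grant only what it holds,
        # so recurse upward through the pairing first if needed.
        if RANK[self.state] < RANK[want]:
            if self.parent is not None:
                self.parent.acquire(want)
            self.state = want
        return self.state

root = Node()            # e.g., directory at the last-level cache
l2 = Node(parent=root)   # cluster-level manager, client of the root
l1 = Node(parent=l2)     # private cache, client of the cluster manager

print(l1.acquire("S"), l2.state)   # permission flows down through each pairing
```

Because the interface is identical at every level, levels running different internal protocols can still be composed and checked against the same permission rule, which is the composition property the abstract highlights. (Downgrades and invalidations are omitted here for brevity.)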
|
138 |
Hardware assisted memory checkpointing and applications in debugging and reliability. Doudalis, Ioannis 25 July 2011 (has links)
The problems of software debugging and system reliability/availability are among the most challenging problems the computing industry is facing today, with direct impact on the development and operating costs of computing systems. A promising debugging technique that helps programmers identify and fix the causes of software bugs far more efficiently is bidirectional debugging, which enables the user to execute the program in "reverse"; a typical method used to recover a system after a fault is backwards error recovery, which restores the system to the last error-free state. Both reverse execution and backwards error recovery are enabled by creating memory checkpoints, which are used to restore the program/system to a prior point in time and re-execute until the point of interest. The checkpointing frequency is the primary factor affecting both the latency of reverse execution and the recovery time of the system; more frequent checkpoints reduce the necessary re-execution time.
Frequent creation of checkpoints poses performance challenges, because of the increased number of memory reads and writes necessary for copying the modified system/program memory, and also because of software interventions, additional synchronization and I/O, etc., needed for creating a checkpoint. In this thesis I examine a number of different hardware accelerators, whose role is to create frequent memory checkpoints in the background, at minimal performance overheads. For the purpose of reverse execution, I propose the HARE and Euripus hardware checkpoint accelerators. HARE and Euripus create different types of checkpoints, and employ different methods for keeping track of the modified memory. As a result, HARE and Euripus have different hardware costs and provide different functionality which directly affects the latency of reverse execution. For improving the availability of the system, I propose the Kyma hardware accelerator. Kyma enables simultaneous creation of checkpoints at different frequencies, which allows the system to recover from multiple types of errors and tolerate variable error-detection latencies. The Kyma and Euripus hardware engines have similar architectures, but the functionality of the Kyma engine is optimized for further reducing the performance overheads and improving the reliability of the system. The functionality of the Kyma and Euripus engines can be combined into a unified accelerator that can serve the needs of both bidirectional debugging and system recovery.
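The bookkeeping such an accelerator performs in the background can be sketched in software as an undo log: on the first write to a location since the last checkpoint, save the old value; restoring a checkpoint replays those saved values. This is a minimal illustrative sketch, not the design of HARE, Euripus, or Kyma.

```python
# Undo-log memory checkpointing sketch.

memory = {}           # address -> value (stands in for program memory)
checkpoints = [{}]    # each checkpoint is an undo log: address -> old value

def checkpoint():
    checkpoints.append({})                    # start a new, empty undo log

def store(addr, value):
    log = checkpoints[-1]
    if addr not in log:                       # first write since the checkpoint?
        log[addr] = memory.get(addr, 0)       # save the old value exactly once
    memory[addr] = value

def rollback():
    """Restore memory to the most recent checkpoint (reverse execution
    or backwards error recovery would then re-execute forward)."""
    for addr, old in checkpoints.pop().items():
        memory[addr] = old

store(0x10, 1)
checkpoint()
store(0x10, 2)
store(0x20, 3)
rollback()                                    # undo everything after the checkpoint
print(memory[0x10], memory.get(0x20, 0))      # back to the checkpointed state: 1 0
```

The "first write per interval" test is exactly what makes frequent checkpointing expensive in software, and why offloading the old-value copying to hardware pays off.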
|
139 |
The use of memory state knowledge to improve computer memory system organization. Isen, Ciji 01 June 2011 (has links)
The trends in virtualization as well as multi-core, multiprocessor environments have translated to a massive increase in the amount of main memory each individual system needs to be fitted with, so as to effectively utilize this growing compute capacity. The increasing demand on main memory implies that the main memory devices and their issues are as important a part of system design as the central processors. The primary issues of modern memory are power, energy, and scaling of capacity. Nearly a third of the system power and energy can come from the memory subsystem. At the same time, modern main memory devices are limited by technology in their future ability to scale and keep pace with modern program demands, thereby requiring exploration of alternatives to main memory storage technology. This dissertation exploits dynamic knowledge of memory state and memory data values to improve memory performance and reduce memory energy consumption.
A cross-boundary approach is proposed in this research to communicate information about dynamic memory-management state (allocated and deallocated memory) between software and the hardware memory subsystem through a combination of ISA support and hardware structures. These mechanisms help identify memory operations to regions of memory that have no impact on the correct execution of the program because they were either freshly allocated or deallocated. This inference stems from the fact that data in memory regions that have been deallocated is no longer useful to the program, and data present in freshly allocated memory is likewise not useful, because that dynamic memory has not yet been defined by the program. By being cognizant of this, such memory operations are avoided, thereby saving energy and improving the usefulness of the main memory. Furthermore, when stores write zeros to memory, the number of stores to memory is reduced by capturing the zeros as compressed information stored alongside the memory-management state information.
Using the methods outlined above, this dissertation harnesses memory-management state and data-value information to achieve significant savings in energy consumption while extending the endurance limit of memory technologies.
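A hedged sketch of the mechanism described above: lines in freshly allocated or deallocated regions carry no live program data, so fetches and writebacks to them can be skipped, and stores of zero can be captured as a per-line "zero" flag rather than data traffic. The region-tracking structures and function names below are illustrative stand-ins for the ISA and hardware support the abstract describes.

```python
# Memory-controller sketch: exploit allocation state and zero-value stores.

dead_lines = set()   # lines in allocated-but-undefined or freed regions
zero_lines = set()   # lines whose content is compressed to "all zeros"
dram = {}            # line address -> data actually held in DRAM

def on_malloc(lines):
    dead_lines.update(lines)        # software hint: contents are undefined

def on_free(lines):
    dead_lines.update(lines)        # software hint: contents are dead
    for ln in lines:
        dram.pop(ln, None)          # drop the data; no writeback needed

def store_line(line, data):
    dead_lines.discard(line)        # the program defined it; it is live now
    if data == 0:
        zero_lines.add(line)        # zero store captured as a flag, no DRAM write
        dram.pop(line, None)
    else:
        zero_lines.discard(line)
        dram[line] = data

def load_line(line):
    if line in dead_lines:          # undefined data: any value is "correct",
        return 0                    # so skip the DRAM access entirely
    if line in zero_lines:
        return 0                    # reconstructed from the zero flag
    return dram.get(line, 0)

on_malloc({0x40})
store_line(0x40, 0)                 # zero store: captured, never written
print(load_line(0x40), 0x40 in dram)   # 0 False -- no DRAM traffic occurred
```

Every avoided fetch, writeback, or zero-store is energy saved, and for wear-limited technologies each avoided write also extends endurance, which is the dissertation's stated payoff.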
|
140 |
Global address spaces for efficient resource provisioning in the data center. Young, Jeffrey Scott 13 January 2014 (has links)
The rise of large data sets, or "Big Data", has coincided with the rise of clusters with large amounts of memory and GPU accelerators that can be used to process rapidly growing data footprints. However, the complexity and performance limitations of sharing memory and accelerators in a cluster limit the options for efficient management and allocation of resources for applications. The global address space model (GAS), and specifically hardware-supported GAS, is proposed as a means to provide a high-performance resource management platform upon which resource sharing between nodes and resource aggregation across nodes can take place. This thesis builds on the initial concept of GAS with a model matched to "Big Data" computing and its data transfer requirements.
The proposed model, Dynamic Partitioned Global Address Spaces (DPGAS), is implemented using a commodity converged interconnect, HyperTransport over Ethernet (HToE), and a software framework, the Oncilla runtime and API. The DPGAS model and associated hardware and software components are used to investigate two application spaces: resource sharing for time-varying workloads, and resource aggregation for GPU-accelerated data warehousing applications. This work demonstrates that hardware-supported GAS can be used to improve the performance and power consumption of memory-intensive applications, and that it can simplify host and accelerator resource management in the data center.
|