Spelling suggestions: "subject:"scratchpad demory"" "subject:"scratchpad amemory""
1 |
Scratchpad Management in Software Managed Manycore ArchitecturesJanuary 2017 (has links)
abstract: Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM called scratchpad memories or SPMs. SPMs consume significantly less area, and are more energy-efficient per access than caches, and therefore make the design of on-chip memories much simpler. Unlike caches, which fetch data from memories automatically, an SPM requires explicit instructions for data transfers. SPM-only architectures are thus named as software managed manycore (SMM), since the data movements of such architectures rely on software. SMM processors have been widely used in different areas, such as embedded computing, network processing, or even high performance computing. While SMM processors provide a low-power platform, the hardware alone does not guarantee power efficiency, if applications on such processors deliver low performance. Efficient software techniques are therefore required. A big body of management techniques for SMM architectures are compiler-directed, as inserting data movement operations by hand forces programmers to trace flow of data, which can be error-prone and sometimes difficult if not impossible. This thesis develops compiler-directed techniques to manage data transfers for embedded applications on SMMs efficiently. The techniques analyze and find out the proper program points and insert data movement instructions accordingly. The techniques manage code, stack and heap data of applications, and reduce execution time by 14%, 52% and 80% respectively compared to their predecessors on typical embedded applications. On top of managing local data, a technique is also developed for shared data in SMM architectures. Experimental results show it achieves more than 2X speedup than the previous technique on average. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
|
2 |
An Instruction Scratchpad Memory Allocation for the Precision Timed ArchitecturePrakash, Aayush 11 December 2012 (has links)
This work presents a static instruction allocation scheme for the precision timed architecture’s (PRET) scratchpad memory. Since PRET provides timing instructions to control the temporal execution of programs, the objective of the allocation scheme is to ensure that the explicitly specified temporal requirements are met. Furthermore, this allocation incorporates instructions from multiple hardware threads of the PRET architecture. We formulate the allocation as an integer-linear programming problem, and we implement a tool that takes binaries, constructs a control-flow graph, performs the allocation, rewrites the binary with the new allocation, and generates an output binary for the PRET architecture. We carry out experiments on a modified version of the Malardalen benchmarks to illustrate that commonly known ACET and WCET based approaches cannot be directly applied to meet explicit timing requirements. We also show the advantage of performing the allocation across multiple threads. We present a real time benchmark controlling an Unmanned Air Vehicle as the case study.
|
3 |
A Memory-Realistic SPM Allocator with WCET/ACET Tunable PerformanceBai, Jia-yu 16 September 2010 (has links)
Real-time systems often use SPM instead of cache, because SPM allows a program¡¦s run time to be more predictable. Real-time system need predictable runtimes, because they must schedule programs to finish within specific deadlines. A deadline should be larger than its program¡¦s worst-case execution time (WCET).
Our laboratory is conducting ongoing research into scratchpad memory allocation (SPM) for reducing the WCET of a program. Compared to our previous work, this current thesis improves our memory model, our allocation algorithms, our real-time support, and our measurement benchmarks and platform.
Our key accomplishments in this paper are to: 1) add, for the first time in the literature, true WCETmeas analysis to an SPM allocator, 2) to modestly improve the performance of our previous allocator, and 3) to greatly increase the applicability over that allocator, by extending the method to support recursive programs.
|
4 |
A Stack-Optimized Scratchpad Memory Allocator for Reducing Either the Average-Case or the Worst-Case Execution TimeWu, Cheng-Ying 10 August 2009 (has links)
Scratchpad memory (SPM) is popular for real-time embedded systems. Whereas caches use a memory management unit (MMU) to control which data accesses go to the fast, on-chip SRAM, SPM directly maps certain addresses to the SRAM. One advantage of SPM is that it avoids the cache¡¦s costly MMU. Another advantage is that the SPM is 100% statically predictable, whereas the variables stored in the cache depend upon the dynamic execution history. This predictability is beneficial for real-time systems which must schedule tasks to finish by fixed deadlines. To set these deadlines, system designers must determine the worst-case execution times (WCETs) of the applications. The predictability of SPM makes these WCETs easier to measure.
This thesis presents a new method for allocating stack and global data to the SPM. It is the first method to make use of the special properties of non-escaping variables so as to increase the effective size of the SPM. Our insight is that many local variables of caller functions can be temporarily swapped out of the SPM while the callee function executes.
Ours is also the first method to support profiled WCET measurements in the allocation strategy. Most previous SPM methods optimize only for the average-case execution time (ACET), despite the fact that SPMs are often used in real-time environments where the WCET is also important. This new memory allocation strategy is also the first to be WCET/ACET tunable, a feature that is particular useful for soft real-time systems.
Only one previous work considers a WCET-targeted SPM allocator. That work, however, only applies to static WCET analysis tools. Such tools are difficult to program and are not widely used. Also, they only have application to the most safety-critical of real-time systems. In contrast, our approach is the first to employ measurement-based WCET analysis (such as is most commonly used in industry) for SPM allocation.
|
5 |
An Instruction Scratchpad Memory Allocation for the Precision Timed ArchitecturePrakash, Aayush 11 December 2012 (has links)
This work presents a static instruction allocation scheme for the precision timed architecture’s (PRET) scratchpad memory. Since PRET provides timing instructions to control the temporal execution of programs, the objective of the allocation scheme is to ensure that the explicitly specified temporal requirements are met. Furthermore, this allocation incorporates instructions from multiple hardware threads of the PRET architecture. We formulate the allocation as an integer-linear programming problem, and we implement a tool that takes binaries, constructs a control-flow graph, performs the allocation, rewrites the binary with the new allocation, and generates an output binary for the PRET architecture. We carry out experiments on a modified version of the Malardalen benchmarks to illustrate that commonly known ACET and WCET based approaches cannot be directly applied to meet explicit timing requirements. We also show the advantage of performing the allocation across multiple threads. We present a real time benchmark controlling an Unmanned Air Vehicle as the case study.
|
6 |
Compiler and Runtime for Memory Management on Software Managed Manycore ProcessorsJanuary 2014 (has links)
abstract: We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology. / Dissertation/Thesis / Ph.D. Computer Science 2014
|
7 |
WCET-Aware Scratchpad Memory Management for Hard Real-Time SystemsJanuary 2017 (has links)
abstract: Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on the worst-case performance due to expensive off- chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by executing data movement instructions explicitly at runtime, and such explicit control facilitates static analyses to obtain safe and tight upper bounds of WCETs. SPM management techniques, used in compilers targeting an SPM-based processor, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCETs of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. The proposed stack data management technique, on the other hand, finds an optimal set of program locations to evict and restore stack frames to avoid stack overflows, when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks including real-world automotive applications are statically calculated for SPMs and caches in several different memory configurations. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
|
8 |
Architectures and Theoretical Models for Shared Scratchpad Memory SystemsWittig, Robert Klaus 10 November 2021 (has links)
Computer engineering is advancing rapidly. For 55 years, the performance of integrated circuits has almost doubled every 18 months. Mostly, these advancements were enabled by technological progress. Even the end of frequency scaling could not bring the ever-increasing performance growth to a halt. However, technology burdens, like noticeable leakage currents, have piled up, which shifts the focus towards architectural improvements. Especially the multi-core paradigm has proven its virtue for chip designs over the last decade. While having been introduced in high-performance computing areas, modern technology nodes also enable low-cost, low-power embedded designs to benefi t from multiple cores and accelerators. Since the majority of cores depend on memory, which requires a considerable amount of chip area, this common resource needs to be shared effi ciently. High-performance cores use shared caches to increase memory utilization. However, many accelerators do not use caches as they need predictable and fast scratchpad memory (SM). But sharing SM entails confl icts, questioning its fast and predictable nature. Hence, the question arises on how to adapt architectures for sharing while retaining SM’s advantages.
This thesis presents a novel, shared SM architecture that embraces the idea of a minimal logic path between core and memory, thereby increasing the maximum operating frequency. Because of its additional capabilities, like dynamic address translation and programmable priorities, it is also well suited for heterogeneous platforms that use dynamic scheduling and require predictable behavior. Demonstrating its advantages, we analyze the characteristics of the new architecture and compare it to state-of-the-art approaches. To further mitigate confl icts, we present the conception of access interval prediction (AIP). By predicting memory accesses with a granularity of a single clock cycle, AIP guides the allocation of resources. This method maximizes memory utilization while reducing confl ict delays. With the help of various methods inspired by branch prediction, we achieve over 90 % of accurate predictions and reduce stall cycles signifi cantly. Another key contribution of this thesis is the extension of analytic models to estimate the throughput of shared SM systems. Again, the focus lies on heterogeneous systems with different priorities and access patterns. The results show a promising error reduction, boosting the used
models applicability for real design use cases.
|
9 |
Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip MemoryChen, Zhi 01 January 2013 (has links)
The gradually widening speed disparity of between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance with tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis focuses on proposing a new heterogeneous scratchpad memory architecture which is configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming and a genetic algorithm, to perform data allocation to different memory units, therefore reducing memory access cost in terms of power consumption and latency. Extensive and intensive experiments are performed to show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms.
|
10 |
Dynamic Allocation for Embedded Heterogeneous Memory : An Empirical StudyPeterson, Thomas January 2018 (has links)
Embedded systems are omnipresent and contribute to our lives in many ways by instantiating functionality in larger systems. To operate, embedded systems require well-functioning software, hardware as well as an interface in-between these. The hardware and software of these systems is under constant change as new technologies arise. An actual change these systems are undergoing are the experimenting with different memory management techniques for RAM as novel non-volatile RAM(NVRAM) technologies have been invented. These NVRAM technologies often come with asymmetrical read and write latencies and thus motivate designing memory consisting of multiple NVRAMs. As a consequence of these properties and memory designs there is a need for memory management that minimizes latencies.This thesis addresses the problem of memory allocation on heterogeneous memory by conducting an empirical study. The first part of the study examines free list, bitmap and buddy system based allocation techniques. The free list allocation technique is then concluded to be superior. Thereafter, multi-bank memory architectures are designed and memory bank selection strategies are established. These strategies are based on size thresholds as well as memory bank occupancies. The evaluation of these strategies did not result in any major conclusions but showed that some strategies were more appropriate for someapplication behaviors. / Inbyggda system existerar allestädes och bidrar till våran livsstandard på flertalet avseenden genom att skapa funktionalitet i större system. För att vara verksamma kräver inbyggda system en välfungerande hård- och mjukvara samt gränssnitt mellan dessa. Dessa tre måste ständigt omarbetas i takt med utvecklingen av nya användbara teknologier för inbyggda system. En förändring dessa system genomgår i nuläget är experimentering med nya minneshanteringstekniker för RAM-minnen då nya icke-flyktiga RAM-minnen utvecklats. Dessa minnen uppvisar ofta asymmetriska läs och skriv fördröjningar vilket motiverar en minnesdesign baserad på flera olika icke-flyktiga RAM. Som en konsekvens av dessa egenskaper och minnesdesigner finns ett behov av att hitta minnesallokeringstekniker som minimerar de fördröjningar som skapas. Detta dokument adresserar problemet med minnesallokering på heterogena minnen genom en empirisk studie. I den första delen av studien studerades allokeringstekniker baserade på en länkad lista, bitmapp och ett kompissystem. Med detta som grund drogs slutsatsen att den länkade listan var överlägsen alternativen. Därefter utarbetades minnesarkitekturer med flera minnesbanker samtidigt som framtagandet av flera strategier för val av minnesbank utfördes. Dessa strategier baserades på storleksbaserade tröskelvärden och nyttjandegrad hos olika minnesbanker. Utvärderingen av dessa strategier resulterade ej i några större slutsatser men visade att olika strategier var olika lämpade för olika beteenden hos applikationer.
|
Page generated in 0.048 seconds