Return to search

Architectures and Theoretical Models for Shared Scratchpad Memory Systems

Computer engineering is advancing rapidly. For 55 years, the performance of integrated circuits has almost doubled every 18 months. Mostly, these advancements were enabled by technological progress. Even the end of frequency scaling could not bring the ever-increasing performance growth to a halt. However, technology burdens, like noticeable leakage currents, have piled up, which shifts the focus towards architectural improvements. Especially the multi-core paradigm has proven its virtue for chip designs over the last decade. While having been introduced in high-performance computing areas, modern technology nodes also enable low-cost, low-power embedded designs to benefi t from multiple cores and accelerators. Since the majority of cores depend on memory, which requires a considerable amount of chip area, this common resource needs to be shared effi ciently. High-performance cores use shared caches to increase memory utilization. However, many accelerators do not use caches as they need predictable and fast scratchpad memory (SM). But sharing SM entails confl icts, questioning its fast and predictable nature. Hence, the question arises on how to adapt architectures for sharing while retaining SM’s advantages.
This thesis presents a novel, shared SM architecture that embraces the idea of a minimal logic path between core and memory, thereby increasing the maximum operating frequency. Because of its additional capabilities, like dynamic address translation and programmable priorities, it is also well suited for heterogeneous platforms that use dynamic scheduling and require predictable behavior. Demonstrating its advantages, we analyze the characteristics of the new architecture and compare it to state-of-the-art approaches. To further mitigate confl icts, we present the conception of access interval prediction (AIP). By predicting memory accesses with a granularity of a single clock cycle, AIP guides the allocation of resources. This method maximizes memory utilization while reducing confl ict delays. With the help of various methods inspired by branch prediction, we achieve over 90 % of accurate predictions and reduce stall cycles signifi cantly. Another key contribution of this thesis is the extension of analytic models to estimate the throughput of shared SM systems. Again, the focus lies on heterogeneous systems with different priorities and access patterns. The results show a promising error reduction, boosting the used
models applicability for real design use cases.

Identiferoai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:76573
Date10 November 2021
CreatorsWittig, Robert Klaus
ContributorsFettweis, Gerhard, Benini, Luca, Matus, Emil, Technische Universität Dresden
Source SetsHochschulschriftenserver (HSSS) der SLUB Dresden
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0041 seconds