Global ETD Search

11	Compiler and Runtime for Memory Management on Software Managed Manycore Processors January 2014 (has links) abstract: We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. Lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Now programmers must worry about data management, on top of worrying about the functional correctness of the program - which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We firstly developed a Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for the same. Later on, we propose a general cost estimation algorithm, based on which CMSM heuristic for code mapping problem is developed. Finally, heap data is dynamic in nature and therefore it is hard to manage it. We provide two schemes to manage unlimited amount of heap data in constant sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology. / Dissertation/Thesis / Ph.D. Computer Science 2014 Computer science Code Generation Compiler Manycore Memory Scratchpad Memory
12	TIME-PREDICTABLE FAST MEMORIES: CACHES VS. SCRATCHPAD MEMORIES Liu, Yu 01 August 2011 (has links) In modern processor architectures, caches are widely used to shorten the gap between the processor speed and memory access time. However, caches are time unpredictable, especially the shared L2 cache used by different cores on multicore processors. Thus, it can significantly increase the complexity of worst-case execution time (WCET) analysis, which is crucial for real-time systems. This dissertation designs several time-predictable scratchpad memory (SPM) based architectures for both VLIW (Very Long InstructionWord) based single-core and multicore processors. First, this dissertation proposes a time predictable two-level SPM based architecture for VLIW based single-core processors, and an ILP (Integer Linear Programming) based static memory objects allocation algorithm is extended to support the multi-level SPMs without harming the time predictability of SPMs. Second, several SPM based architectures for VLIW based multicore processors are designed. To support these architectures, the dynamic memory objects allocation based partition, the static memory objects allocation based partition and the static memory objects allocation based priority L2 SPM strategy are proposed, which retain the characteristic of time predictability. Also, both the WCET and worst-case energy consumption (WCEC) of our SPM based single-core and multicore architectures are completely evaluated in this dissertation. Last, to exploit the load/store latencies that are statically known in this architecture, we study a SPM-aware scheduling method to improve the performance. Our experimental results indicate the strengths and weaknesses of each proposed architecture and allocation method, which offers interesting memory design options to enable real-time computing. The strength of the two-level architecture is its superior performance compared to the one-level architecture, while the strength of the one-level architecture is its simple implementation. Also, the two-level architecture with separated L1 SPM for each core better fits for the data-intensive real-time applications, which not only retains good performance but also achieves a higher bandwidth by accessing both instruction and data SPM at the same time. Compared to the static based strategies, the dynamic allocation based partition L2 SPM strategy offers the better performance on each core because of the reuse of SPM space at the run-time, but has much higher complexity. In addition, the experimental results show that the timing and energy performance of our proposed SPM based architectures are superior to the similar cache based and hybrid architectures. Meanwhile, our architectures can ensure time predictability which is desirable for the real-time systems. Caches Multicore Real-Time Systems Scratchpad Memories Time-Predictable
13	Dynamic Allocation for Embedded Heterogeneous Memory : An Empirical Study Peterson, Thomas January 2018 (has links) Embedded systems are omnipresent and contribute to our lives in many ways by instantiating functionality in larger systems. To operate, embedded systems require well-functioning software, hardware as well as an interface in-between these. The hardware and software of these systems is under constant change as new technologies arise. An actual change these systems are undergoing are the experimenting with different memory management techniques for RAM as novel non-volatile RAM(NVRAM) technologies have been invented. These NVRAM technologies often come with asymmetrical read and write latencies and thus motivate designing memory consisting of multiple NVRAMs. As a consequence of these properties and memory designs there is a need for memory management that minimizes latencies.This thesis addresses the problem of memory allocation on heterogeneous memory by conducting an empirical study. The first part of the study examines free list, bitmap and buddy system based allocation techniques. The free list allocation technique is then concluded to be superior. Thereafter, multi-bank memory architectures are designed and memory bank selection strategies are established. These strategies are based on size thresholds as well as memory bank occupancies. The evaluation of these strategies did not result in any major conclusions but showed that some strategies were more appropriate for someapplication behaviors. / Inbyggda system existerar allestädes och bidrar till våran livsstandard på flertalet avseenden genom att skapa funktionalitet i större system. För att vara verksamma kräver inbyggda system en välfungerande hård- och mjukvara samt gränssnitt mellan dessa. Dessa tre måste ständigt omarbetas i takt med utvecklingen av nya användbara teknologier för inbyggda system. En förändring dessa system genomgår i nuläget är experimentering med nya minneshanteringstekniker för RAM-minnen då nya icke-flyktiga RAM-minnen utvecklats. Dessa minnen uppvisar ofta asymmetriska läs och skriv fördröjningar vilket motiverar en minnesdesign baserad på flera olika icke-flyktiga RAM. Som en konsekvens av dessa egenskaper och minnesdesigner finns ett behov av att hitta minnesallokeringstekniker som minimerar de fördröjningar som skapas. Detta dokument adresserar problemet med minnesallokering på heterogena minnen genom en empirisk studie. I den första delen av studien studerades allokeringstekniker baserade på en länkad lista, bitmapp och ett kompissystem. Med detta som grund drogs slutsatsen att den länkade listan var överlägsen alternativen. Därefter utarbetades minnesarkitekturer med flera minnesbanker samtidigt som framtagandet av flera strategier för val av minnesbank utfördes. Dessa strategier baserades på storleksbaserade tröskelvärden och nyttjandegrad hos olika minnesbanker. Utvärderingen av dessa strategier resulterade ej i några större slutsatser men visade att olika strategier var olika lämpade för olika beteenden hos applikationer. Memory management NVRAM Scratchpad memory Embedded systems Minneshantering NVRAM Scratchpad minne Inbyggda system Information Systems
14	Decoupled approaches to register and software controlled memory allocations / Approches découplées aux problèmes d'allocations de registres et de mémoires locales Diouf, Boubacar 15 December 2011 (has links) Malgré la hiérarchie mémoire utilisée dans les ordinateurs modernes, il convient toujours d'optimiser l'utilisation des registres du processeur et des mémoires locales gérées de manières logicielles (mémoires locales) présentes dans beaucoup de systèmes embarqués, de processeurs graphiques (GPUs) et de multiprocesseurs. Lors de la compilation, d'un code source vers un langage machine, deux optimisations de la mémoire revêtent une importance capitale : l'allocation de registres et l'allocation de mémoires locales. Dans ce manuscrit de thèse nous nous intéressons à des approches découplées, qui traitent séparément les problèmes d'allocation et d'assignation, permettant d'améliorer les allocations de registres et de mémoires locales. Dans la première partie de la thèse, nous nous penchons sur le problème de l'allocation de registres. Tout d'abord, nous proposons dans le contexte des compilateurs-juste-à-temps, une allocation de registres fractionnées (split register allocation). Avec cette approche l'allocation de registres est effectuée en deux étapes: une faite durant la phase de compilation statique et l'autre pendant la phase de compilation dynamique. Ce qui permet de réduire le temps d'exécution des programmes avec un impact négligeable sur le temps de compilation. Ensuite Nous introduisons une allocation de registres incrémentale qui permet de résoudre d'une manière quasi-optimale le problème d'allocation. Cette méthode est pseudo-polynomiale alors que le problème d'allocation est NP-complet même à l'intérieur d'un « basic block ». Dans la deuxième partie de la thèse nous nous intéressons au problème de l'allocation de mémoires locales. Au vu des dernières avancées dans le domaine de l'allocation de registres, nous étudions dans quelle mesure le problème d'allocation pourrait être séparé de celui de l'assignation dans le contexte des mémoires locales. Dans un premier temps nous validons expérimentalement que les problèmes d'allocation et d'assignation peuvent être résolus séparément. Ensuite, nous procédons à une étude plus théorique d'une approche découplée de l'allocation de mémoires locales. Cela permet d'introduire de nouveaux résultats sur le « submarine-building problem », une variante du « ship-building problem », que nous avons défini. L'un de ces résultats met en évidence pour la première fois une différence de complexité (P vs. NP-complet) entre les graphes d'intervalles et les graphes d'intervalles unitaires. Dans la troisième partie de la thèse nous proposons une nouvelle heuristique, appelée « clustering allocator » fondée sur la construction de sous-graphes stables d'un graphe d'interférence, permettant de découpler aussi bien le problème d'allocation pour les registres que pour les mémoires locales. Cette nouvelle heuristique se veut le pont qui permettra de réconcilier les problèmes d'allocations de registres et de mémoires locales. / Despite the benefit of the memory hierarchy, it is still essential, in order to reduce accesses to higher levels of memory, to have an efficient usage of registers and local memories (also called scratchpad memories) present in most embedded processors, graphical processors (GPUs) and network processors. During the compilation, from a source language to an executable code, there are two optimizations that are of utmost importance: the register allocation and the local memory allocation. In this thesis's report we are interested in decoupled approaches, solving separately the allocation and assignment problems, that helps to improve the quality of the register and local memory allocations. In the first part of this thesis we are interested in two aspects of the register allocation problem: the improvements of the just-in-time (JIT) register allocation and the spill minimization problem. We introduce the split register allocation which leverages the decoupled approach to improve register allocation in the context of JIT compilation. We experimentally validate the effectiveness of split register allocation and its portability with respect to register count variations, relying on annotations whose impact on the bytecode size is negligible. We introduce a new decoupled approach, called iterated-optimal allocation, which focus on the spill minimization problem. The iterated-optimal allocation algorithm achieves results close to optimal while offering pseudo-polynomial guarantees for SSA programs and fast allocations on general programs. In the second part of this thesis, we study how a decoupled local memory allocation can be proposed in light of recent progresses in register allocation. We first validate our intuition for decoupled approach to local memory allocation. Then, we study the local memory allocation in a more theoretical way setting the junction between local memory allocation for linearized programs and weighted interval graph coloring. We design and analyze a new variant of the ship-building problem called the submarine-building problem. We show that this problem is NP-complete on interval graphs, while it is solvable in linear time for proper interval graphs, equivalent to unit interval graphs. The submarine-building problem is the first problem that is known to be NP-complete on interval graphs, while it is solvable in linear time for unit interval graphs. In the third part of this thesis, we propose a heuristic-based solution, the clustering allocator, which decouples the local memory allocation problem and aims to minimize the allocation cost. The clustering allocator while devised for local memory allocation, it appears to be a very good solution to the register allocation problem. After many years of separation, this new algorithm seems to be a bridge to reconcile the local memory allocation and the register allocation problems. Compilation Allocation de registres Allocation de mémoire Scratchpad Coloration de graphes pondérés Problème de submarine-building Compilation Register allocation Memory allocations Scratchpad Weighted graphs coloring Submarine-building problem
15	WCET-Aware Scratchpad Memory Management for Hard Real-Time Systems January 2017 (has links) abstract: Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on the worst-case performance due to expensive off- chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by executing data movement instructions explicitly at runtime, and such explicit control facilitates static analyses to obtain safe and tight upper bounds of WCETs. SPM management techniques, used in compilers targeting an SPM-based processor, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCETs of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. The proposed stack data management technique, on the other hand, finds an optimal set of program locations to evict and restore stack frames to avoid stack overflows, when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks including real-world automotive applications are statically calculated for SPMs and caches in several different memory configurations. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017 Computer science Computer engineering Compiler Embedded Systems Predictability Real-time Systems Scratchpad Memory WCET
16	Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory Chen, Zhi 01 January 2013 (has links) The gradually widening speed disparity of between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, increasing penalties caused by frequent on-chip memory accesses have raised critical challenges in delivering high memory access performance with tight power and latency budgets. To overcome the daunting memory wall and energy wall issues, this thesis focuses on proposing a new heterogeneous scratchpad memory architecture which is configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming and a genetic algorithm, to perform data allocation to different memory units, therefore reducing memory access cost in terms of power consumption and latency. Extensive and intensive experiments are performed to show the merits of the heterogeneous scratchpad architecture over the traditional pure memory system and the effectiveness of the proposed algorithms. Heterogeneousmemory magnetic randomaccess memory MRAM Zerocapacitor random access memory Z-RAM scratchpad memory scheduling Computer and Systems Architecture
17	Decoupled approaches to register and software controlled memory allocations Diouf, Boubacar 15 December 2011 (has links) (PDF) Despite the benefit of the memory hierarchy, it is still essential, in order to reduce accesses to higher levels of memory, to have an efficient usage of registers and local memories (also called scratchpad memories) present in most embedded processors, graphical processors (GPUs) and network processors. During the compilation, from a source language to an executable code, there are two optimizations that are of utmost importance: the register allocation and the local memory allocation. In this thesis's report we are interested in decoupled approaches, solving separately the allocation and assignment problems, that helps to improve the quality of the register and local memory allocations. In the first part of this thesis we are interested in two aspects of the register allocation problem: the improvements of the just-in-time (JIT) register allocation and the spill minimization problem. We introduce the split register allocation which leverages the decoupled approach to improve register allocation in the context of JIT compilation. We experimentally validate the effectiveness of split register allocation and its portability with respect to register count variations, relying on annotations whose impact on the bytecode size is negligible. We introduce a new decoupled approach, called iterated-optimal allocation, which focus on the spill minimization problem. The iterated-optimal allocation algorithm achieves results close to optimal while offering pseudo-polynomial guarantees for SSA programs and fast allocations on general programs. In the second part of this thesis, we study how a decoupled local memory allocation can be proposed in light of recent progresses in register allocation. We first validate our intuition for decoupled approach to local memory allocation. Then, we study the local memory allocation in a more theoretical way setting the junction between local memory allocation for linearized programs and weighted interval graph coloring. We design and analyze a new variant of the ship-building problem called the submarine-building problem. We show that this problem is NP-complete on interval graphs, while it is solvable in linear time for proper interval graphs, equivalent to unit interval graphs. The submarine-building problem is the first problem that is known to be NP-complete on interval graphs, while it is solvable in linear time for unit interval graphs. In the third part of this thesis, we propose a heuristic-based solution, the clustering allocator, which decouples the local memory allocation problem and aims to minimize the allocation cost. The clustering allocator while devised for local memory allocation, it appears to be a very good solution to the register allocation problem. After many years of separation, this new algorithm seems to be a bridge to reconcile the local memory allocation and the register allocation problems. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Compilation Register allocation Memory allocations Scratchpad Weighted graphs coloring Submarine-building problem
18	FORmac DEsk CALculator‎ : un outil de mise au point et d'aide au calcul formel sur ordinateur Laplace, André 21 February 1973 (has links) (PDF) . FORTRAN calculateur calculs formels FORDECAL batch FORMAC REDUCE LISP systèmes interactifs système d'exploitation calcul symbolique programmation langage MATHLAB ALADIN MACSYMA SCRATCHPAD FINSTER TUTOR SYMFORM
19	Compiler Assisted Energy Management For Sensor Network Nodes Jindal, Prachee 08 1900 (has links) Emerging low power, embedded, wireless sensor devices are useful for wide range of applications, yet have very limited processing storage and especially energy resources. Sensor networks have a wide variety of applications in medical monitoring, environmental sensing and military surveillance. Due to the large number of sensor nodes that may be deployed and the required long system lifetimes, replacing the battery is not an option. Sensor systems must utilize the minimal possible energy while operating over a wide range of operating scenarios. The most of the efforts in the energy management in sensor networks have concentrated on minimizing energy consumption in the communication subsystem. Some researchers have also dealt with the issue of minimizing the energy in computing subsystem of a sensor network node. Some proposals using energy aware software have also been made. Relatively little work has been done on compiler controlled energy management in sensor networks. In this thesis, we present our investigations on how compiler techniques can be used to minimize CPU energy consumption in sensor network nodes. One effectively used energy management technique in general purpose processors, is dynamic voltage scaling. In this thesis we implement and evaluate a compiler assisted DVS algorithm and show its usefulness for a small sensor node processor. We were able to achieve an energy saving of 29% with a little performance slowdown. Scratchpad memories have been widely used for improving performance. In this thesis we show that if the scratchpad size for the system is chosen carefully, then large energy savings can be achieved by using a compiler assisted scratchpad allocation policy. With a small size of 512 byte scratchpad memory we were able to achieve 50% of energy savings. We also studied the behavior of dynamic voltage scaling in presence of scratchpad memory. Our results show that in presence of scratchpad memory less opportunities are found for applying dynamic voltage scaling techniques. The sensor network community lacks a comprehensive benchmark suite, for our study we also implemented a set of applications, representative of computational workload on sensor network nodes. The techniques studied in this thesis can easily be integrated with existing energy management techniques in sensor networks, yielding in additional energy savings. Sensor Networks - Data Processing Electronic Detector Networks Data Processing Compilers (Computer Science) Dynamic Voltage Scaling Energy Optimization Scratchpad Memory Sensor Node Architecture Sensor Network Node Node Architecture Computer Science

Search results