51 |
Towards Predictable Real-Time Performance on Multi-Core Platforms
Kim, Hyoseung, 01 June 2016
Cyber-physical systems (CPS) integrate sensing, computing, communication and actuation capabilities to monitor and control operations in the physical environment. A key requirement of such systems is the need to provide predictable real-time performance: the timing correctness of the system should be analyzable at design time with a quantitative metric and guaranteed at runtime with high assurance. This requirement of predictability is particularly important for safety-critical domains such as automobiles, aerospace, defense, manufacturing and medical devices. The work in this dissertation focuses on the challenges arising from the use of modern multi-core platforms in CPS. Even as of today, multi-core platforms are rarely used in safety-critical applications, primarily due to the temporal interference caused by contention on various resources shared among processor cores, such as caches, memory buses, and I/O devices. Such interference is hard to predict and can significantly increase task execution time, e.g., by up to 12x on commodity quad-core platforms. To address the problem of ensuring timing predictability on multi-core platforms, we develop novel analytical and systems techniques in this dissertation. Our proposed techniques theoretically bound the temporal interference that tasks may suffer when accessing shared resources. Our techniques also involve software primitives and algorithms for real-time operating systems and hypervisors, which significantly reduce the degree of temporal interference. Specifically, we tackle the issues of cache and memory contention, locking and synchronization, interrupt handling, and access control for computational accelerators such as general-purpose graphics processing units (GPGPUs), all of which are crucial to achieving predictable real-time performance on a modern multi-core platform. Our solutions are readily applicable to commodity multi-core platforms, and can be used not only for developing new systems but also for migrating existing applications from single-core to multi-core platforms.
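A concrete example of the kind of software primitive involved: cache interference between cores can be bounded with page coloring, where the OS partitions physical pages by the last-level-cache sets they map to and gives each core's tasks a disjoint set of colors. The sketch below illustrates only the color computation, with assumed cache parameters; it is not code from the dissertation.

```c
/* Minimal page-coloring sketch: a shared last-level cache can be
 * partitioned among cores by controlling which physical pages each
 * core's tasks may use. Cache parameters are assumed values for
 * illustration, not taken from the dissertation. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define LLC_SIZE    (2u * 1024 * 1024)  /* assumed 2 MiB shared LLC */
#define LLC_WAYS    16u                  /* assumed associativity   */

/* Number of page colors: (cache size / ways) / page size, i.e. how
 * many distinct groups of cache sets a page can fall into. */
static unsigned num_colors(void) {
    return (LLC_SIZE / LLC_WAYS) / PAGE_SIZE;
}

/* Color of a physical page: the set-index bits above the page offset. */
static unsigned page_color(uint64_t phys_addr) {
    return (unsigned)((phys_addr / PAGE_SIZE) % num_colors());
}

int main(void) {
    printf("colors available: %u\n", num_colors());
    /* Pages with different colors never evict each other's cache
     * lines, so giving disjoint color sets to different cores bounds
     * their mutual cache interference. */
    printf("color of page at 0x10000: %u\n", page_color(0x10000));
    printf("color of page at 0x11000: %u\n", page_color(0x11000));
    return 0;
}
```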
|
52 |
Hardware and software co-design of a process scheduler for heterogeneous multicore architectures based on reconfigurable computing
Bueno, Maikon Adiles Fernandez, 05 November 2013
Heterogeneous multiprocessor architectures aim to extract higher performance from processes by running each one on a core suited to its demands. Extracting that performance, however, depends on an efficient scheduling mechanism that can identify the demands of processes in real time and assign each process to the most appropriate processor according to its resources. This work proposes and implements a scheduler model for heterogeneous multiprocessor architectures, based on both software and hardware, applied to the Linux operating system and the SPARC Leon3 processor as a proof of concept. Performance monitors were implemented within the processors to identify the demands of processes in real time. For each process, its demand is projected onto the other processors in the architecture, and a balancing step is then performed to maximize total system performance, distributing processes among processors so as to reduce the total processing time of all processes. The Hungarian maximization algorithm used in the scheduler's balancing step was implemented in hardware, providing parallelism and higher performance in its execution. The scheduler was validated through the parallel execution of several benchmarks, resulting in reduced execution times compared to a scheduler without heterogeneity support.
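The balancing step described above is an instance of the assignment problem: given each process's projected execution time on each processor, find the one-to-one mapping that minimizes total time, which the Hungarian algorithm solves optimally in O(n³). As an illustration only, the sketch below solves a tiny instance by brute force over a made-up cost matrix; it is not the hardware implementation from this work.

```c
/* Assignment-problem sketch: minimize total projected execution time
 * over all process-to-core mappings. A brute-force permutation search
 * stands in for the Hungarian algorithm used (in hardware) by the
 * thesis; the cost matrix values are made-up examples. */
#include <stdio.h>

#define N 3  /* processes == cores for a square assignment */

/* cost[p][c]: projected execution time of process p on core c */
static const int cost[N][N] = {
    { 8, 4, 7 },
    { 5, 2, 3 },
    { 9, 4, 8 },
};

static int best_total;
static int best_map[N];

static void search(int p, int used, int total, int map[N]) {
    if (p == N) {
        if (total < best_total) {
            best_total = total;
            for (int i = 0; i < N; i++) best_map[i] = map[i];
        }
        return;
    }
    for (int c = 0; c < N; c++) {
        if (!(used & (1 << c))) {     /* core c still unassigned */
            map[p] = c;
            search(p + 1, used | (1 << c), total + cost[p][c], map);
        }
    }
}

int main(void) {
    int map[N];
    best_total = 1 << 30;
    search(0, 0, 0, map);
    printf("minimum total time: %d\n", best_total);
    for (int p = 0; p < N; p++)
        printf("process %d -> core %d\n", p, best_map[p]);
    return 0;
}
```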
|
53 |
Single-phase laminar flow heat transfer from confined electron beam enhanced surfaces
Ferhati, Arben, January 2015
The continuing demand for computational processing power, multi-functional devices and component miniaturization has emphasised the need for thermal management systems able to maintain temperatures at safe operating conditions. The thermal management industry is constantly seeking new, cutting-edge, efficient and cost-effective heat transfer enhancement technologies. The aim of this study is to utilize electron beam treatment to enlarge the heat transfer area of liquid cooled plates and to evaluate the resulting performance experimentally. Given the complexity of the technology, this thesis focuses on the design and production of electron beam enhanced test samples, the construction of the test facility, the testing procedure, and the evaluation of thermal and hydraulic characteristics. In particular, the research presented in this thesis contains a number of challenging and cutting-edge technological developments, including: (1) an overview of the semiconductor industry, cooling requirements and the market for thermal management systems; (2) an integral literature review of pin-fin enhancement technology; (3) design and fabrication of the electron beam enhanced test samples; (4) upgrade and construction of the experimental test rig and development of the test procedure; (5) reduction and analysis of the experimental data to evaluate thermal and hydraulic performance. The experimental results show that electron beam treatment can improve the thermal efficiency of current untreated liquid cooled plates by approximately a factor of three. The highest heat transfer rate was observed for sample S3; this is attributed to the irregularities of the enhanced structure, which increase the heat transfer area, improve mixing, and disturb the thermal and velocity boundary layers. For all three samples, the heat transfer enhancement was accompanied by an increase in pressure drop. The electron beam enhancement technique is a rapid, cost-effective process with zero material waste. It allows thermal management systems to be produced smaller and faster with reduced material usage, without compromising safety, labour cost or the environment.
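For context, the data-reduction step in such an evaluation typically works back from measured temperatures and flow rate to a heat transfer coefficient: the coolant heat gain is Q = mdot * cp * (Tout - Tin), and the average coefficient follows from h = Q / (A * dT), with dT the wall-to-fluid temperature difference. The sketch below shows that arithmetic with assumed example values, not data from this thesis.

```c
/* Data-reduction sketch for a liquid cooled plate test: compute the
 * coolant heat gain and an average heat transfer coefficient from
 * measured temperatures. All numeric values are assumed examples,
 * not measurements from the thesis. */
#include <stdio.h>

int main(void) {
    /* Assumed measurements */
    double mdot   = 0.010;   /* coolant mass flow rate [kg/s]     */
    double cp     = 4180.0;  /* specific heat of water [J/(kg K)] */
    double t_in   = 20.0;    /* coolant inlet temperature [C]     */
    double t_out  = 24.5;    /* coolant outlet temperature [C]    */
    double t_wall = 45.0;    /* mean wall temperature [C]         */
    double area   = 0.0012;  /* enhanced heat transfer area [m^2] */

    /* Heat absorbed by the coolant: Q = mdot * cp * (t_out - t_in) */
    double q = mdot * cp * (t_out - t_in);

    /* Mean fluid temperature and wall-to-fluid difference */
    double t_fluid = 0.5 * (t_in + t_out);
    double dt = t_wall - t_fluid;

    /* Average heat transfer coefficient: h = Q / (A * dT) */
    double h = q / (area * dt);

    printf("heat gain Q        = %.1f W\n", q);
    printf("h (average)        = %.0f W/(m^2 K)\n", h);
    printf("thermal resistance = %.4f K/W\n", dt / q);
    return 0;
}
```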
|
54 |
Optimizing a software build system through multi-core processing
Dahlberg, Robin, January 2019
In modern software development, continuous integration has become an integral part of agile development methods, advocating that developers should integrate their code frequently. Configura currently has one dedicated machine performing tasks such as building the software and running system tests each time a developer submits new code to the main repository. One of the main practices of continuous integration is keeping the build fast, so that the feedback loop for developers stays short, leading to increased productivity. Configura's build system, named Build Central, currently uses a sequential build procedure to execute these tasks and was becoming too slow to keep up with the number of requested builds. The primary method for speeding up this procedure was to utilize the multi-core architecture of the build machine. To accomplish this, the system needed a scheduling algorithm to distribute and order tasks correctly. In this thesis, six scheduling algorithms are implemented and compared. Four of these algorithms are based on the classic list scheduling approach, and two additional algorithms based on dynamic scheduling principles are proposed. In this particular system, the dynamic algorithms proved to perform better than the static scheduling algorithms. Performance on Build Central, using four processing cores, improved to approximately 3.4 times faster execution on an average daily build, resulting in a large increase in the number of builds that can be performed each day.
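To illustrate the static baseline: classic list scheduling takes tasks in a priority order and places each on the core that becomes free first, no earlier than the finish time of its dependencies. The sketch below simulates this for a small hypothetical task graph; Build Central's actual tasks and algorithms are not reproduced here.

```c
/* List-scheduling sketch: tasks with durations and dependencies are
 * taken in a fixed priority order and placed on whichever core is
 * free first, no earlier than when all predecessors finish. The task
 * graph and durations are made-up examples. */
#include <stdio.h>

#define NTASKS 6
#define NCORES 4

static const int duration[NTASKS] = { 4, 3, 2, 6, 1, 5 };
/* pred[t][u] != 0 means task t depends on task u */
static const int pred[NTASKS][NTASKS] = {
    {0,0,0,0,0,0},          /* task 0: no deps      */
    {1,0,0,0,0,0},          /* task 1 after 0       */
    {1,0,0,0,0,0},          /* task 2 after 0       */
    {0,1,1,0,0,0},          /* task 3 after 1 and 2 */
    {0,0,1,0,0,0},          /* task 4 after 2       */
    {0,0,0,1,1,0},          /* task 5 after 3 and 4 */
};

int main(void) {
    int finish[NTASKS] = {0};
    int core_free[NCORES] = {0};
    int makespan = 0;

    /* Assume tasks are already listed in a valid topological order. */
    for (int t = 0; t < NTASKS; t++) {
        int ready = 0;  /* earliest start: all predecessors done */
        for (int u = 0; u < NTASKS; u++)
            if (pred[t][u] && finish[u] > ready) ready = finish[u];

        int best = 0;   /* core that becomes free first */
        for (int c = 1; c < NCORES; c++)
            if (core_free[c] < core_free[best]) best = c;

        int start = ready > core_free[best] ? ready : core_free[best];
        finish[t] = start + duration[t];
        core_free[best] = finish[t];
        if (finish[t] > makespan) makespan = finish[t];
        printf("task %d: core %d, start %d, finish %d\n",
               t, best, start, finish[t]);
    }
    printf("makespan: %d\n", makespan);
    return 0;
}
```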
|
55 |
Controlling execution time variability using COTS for safety-critical systems
Bin, Jingyi, 10 July 2014
While relying during the last decade on single-core Commercial Off-The-Shelf (COTS) architectures despite their inherent runtime variability, the safety-critical industry is now considering a shift to multi-core COTS in order to match increasing performance requirements. However, the shift to multi-core COTS worsens the runtime variability issue due to contention on shared hardware resources. Standard techniques for handling this variability, such as resource over-provisioning, cannot be applied to multi-cores, as the additional safety margins would offset most if not all of the multi-core performance gains. A possible solution would be to capture the behavior of potential contention mechanisms on shared hardware resources relative to each application co-running on the system. However, these contention mechanisms are usually very poorly documented. In this thesis, we introduce measurement techniques, based on a set of dedicated stressing benchmarks and architecture hardware monitors, to characterize (1) the architecture, by identifying the shared hardware resources and revealing their associated contention mechanisms, and (2) the applications, by learning how they behave relative to shared resources. Based on such information, we propose a technique to estimate the WCET of an application in a pre-determined co-running context by simulating the worst-case contention on shared resources produced by the application's co-runners.
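A stressing benchmark of the kind described is deliberately simple: it saturates one shared resource while hardware monitors measure the slowdown it induces in a co-running application. The sketch below is a minimal memory-bus stressor; the buffer size and stride are assumptions chosen to defeat a typical last-level cache, and it is not one of the thesis's actual benchmarks.

```c
/* Memory-stressing benchmark sketch: repeatedly sweep a buffer much
 * larger than the last-level cache so that every pass generates bus
 * and DRAM traffic, creating contention for co-running tasks. The
 * 64 MiB size and 64-byte stride are assumptions (larger than a
 * typical LLC, one touch per cache line). */
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE   (64u * 1024 * 1024)  /* assumed >> LLC size */
#define LINE_SIZE  64u                   /* typical cache line  */

int main(void) {
    volatile char *buf = malloc(BUF_SIZE);
    if (!buf) return 1;

    unsigned long sum = 0;
    /* Run alongside the application under test; a hardware monitor
     * on the victim core reveals the induced slowdown. */
    for (unsigned pass = 0; pass < 1000; pass++) {
        for (unsigned i = 0; i < BUF_SIZE; i += LINE_SIZE) {
            buf[i] = (char)i;   /* write miss -> bus traffic */
            sum += buf[i];      /* keep the access live      */
        }
    }
    printf("done (%lu)\n", sum);
    free((void *)buf);
    return 0;
}
```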
|
56 |
Dealing with actor runtime environments on hierarchical shared memory multi-core platforms
De Camargo Francesquini, Emilio, 15 May 2014
The actor model is present in several mission-critical systems, such as those supporting WhatsApp and Facebook Chat. These systems serve thousands of clients simultaneously, demanding substantial computing resources usually provided by multi-processor and multi-core platforms. Non-Uniform Memory Access (NUMA) architectures account for an important share of these platforms, yet research on the suitability of current actor runtime environments for these machines is very limited. Current runtime environments, in general, assume a flat memory space and thus do not perform as well as they could. In this thesis we study the challenges that hierarchical shared-memory multi-core platforms present to actor runtime environments. In particular, we investigate aspects related to memory management, scheduling, and load balancing. We analyze and characterize actor-based applications in order to propose improvements to actor runtime environments. This analysis highlighted the existence of peculiar communication structures. We argue that an understanding of these structures, combined with knowledge of the underlying hardware architecture, can be used to improve application performance. As a proof of concept, we implemented our proposal in a real actor runtime environment, the Erlang Virtual Machine (VM). Concurrency in Erlang is based on the actor model, and the language has a consistent syntax for actor handling. Our modifications to the Erlang VM significantly improved the performance of some applications thanks to better informed decisions on scheduling and load balancing. As future work we envision the integration of our approach into other actor runtime environments such as Kilim and Akka.
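The affinity idea can be sketched as follows: given a communication graph weighting how many messages each pair of actors exchanges, a placement pass keeps heavily communicating actors on the same NUMA node. The code below is a simplified, standalone illustration of this principle with made-up weights; the actual work modifies the Erlang VM's schedulers rather than running a separate pass.

```c
/* Affinity-placement sketch: actors that exchange many messages are
 * greedily co-located on the same NUMA node. Communication weights
 * and node capacities are made-up examples. */
#include <stdio.h>

#define NACTORS 6
#define NNODES  2
#define CAP     3  /* actors per node */

/* msgs[a][b]: messages exchanged between actors a and b (symmetric) */
static const int msgs[NACTORS][NACTORS] = {
    {0, 9, 1, 0, 0, 0},
    {9, 0, 8, 0, 1, 0},
    {1, 8, 0, 0, 0, 0},
    {0, 0, 0, 0, 7, 6},
    {0, 1, 0, 7, 0, 5},
    {0, 0, 0, 6, 5, 0},
};

int main(void) {
    int node[NACTORS];
    int load[NNODES] = {0};
    for (int a = 0; a < NACTORS; a++) node[a] = -1;

    for (int a = 0; a < NACTORS; a++) {
        if (node[a] != -1) continue;
        /* Seed the least-loaded node with actor a... */
        int best = 0;
        for (int n = 1; n < NNODES; n++)
            if (load[n] < load[best]) best = n;
        node[a] = best; load[best]++;
        /* ...then pull in its heaviest unplaced partners while room. */
        while (load[best] < CAP) {
            int partner = -1, w = 0;
            for (int b = 0; b < NACTORS; b++)
                if (node[b] == -1 && msgs[a][b] > w) {
                    w = msgs[a][b]; partner = b;
                }
            if (partner == -1) break;
            node[partner] = best; load[best]++;
        }
    }
    for (int a = 0; a < NACTORS; a++)
        printf("actor %d -> node %d\n", a, node[a]);
    return 0;
}
```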
|
57 |
Optimizing Inter-core Data-propagation Delays in Multi-core Embedded Systems
Grosic, Hasan; Hasanovic, Emir, January 2019
The demand for computing power and performance in real-time embedded systems is continuously increasing, since new customer requirements and more advanced features appear every day. To support these functionalities and handle them more efficiently, multi-core computing platforms are being introduced. These platforms allow parallel execution of tasks on multiple cores, which, in addition to its performance benefits, introduces a major problem for the timing predictability of the system. This problem manifests as unpredictable inter-core interference, which occurs due to resources shared among the cores, such as the system bus. This thesis investigates the application of different optimization techniques for the offline scheduling of tasks on the individual cores, together with a global scheduling policy for access to the shared bus. The main effort of this thesis focuses on optimizing inter-core data propagation delays, which can provide a new way of creating optimized schedules. For that purpose, Constraint Programming optimization techniques are employed and a Phased Execution Model of the tasks is assumed. Also, in order to enforce the end-to-end timing constraints imposed on the system, job-level dependencies are generated beforehand and applied during the scheduling procedure. Finally, an experiment with a large number of test cases is conducted to evaluate the performance of the implemented scheduling approach. The obtained results show that the method is applicable to a wide spectrum of abstract systems with variable requirements, but also open to further improvement in several aspects.
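Under a Phased Execution Model, each task is split into a read phase that fetches inputs over the shared bus, a bus-free execute phase, and a write phase that publishes outputs; a schedule is free of bus contention when no two memory phases overlap in time. The sketch below shows this check for two tasks with made-up timings; it is an illustration of the model, not the thesis's constraint-programming formulation.

```c
/* Phased-execution-model sketch: each task is a read phase, an
 * execute phase, and a write phase. Only read/write phases use the
 * shared bus, so a schedule is bus-contention-free when no two such
 * memory phases overlap in time. Task timings are made-up examples. */
#include <stdio.h>

typedef struct {
    int start;              /* scheduled start time of the task */
    int read, exec, write;  /* phase durations                  */
} Task;

/* Half-open interval overlap test: [a0,a1) vs [b0,b1). */
static int overlap(int a0, int a1, int b0, int b1) {
    return a0 < b1 && b0 < a1;
}

/* Bus intervals of a task: [start, start+read) and
 * [start+read+exec, start+read+exec+write). */
static int bus_conflict(const Task *x, const Task *y) {
    int xr0 = x->start,      xr1 = xr0 + x->read;
    int xw0 = xr1 + x->exec, xw1 = xw0 + x->write;
    int yr0 = y->start,      yr1 = yr0 + y->read;
    int yw0 = yr1 + y->exec, yw1 = yw0 + y->write;
    return overlap(xr0, xr1, yr0, yr1) || overlap(xr0, xr1, yw0, yw1)
        || overlap(xw0, xw1, yr0, yr1) || overlap(xw0, xw1, yw0, yw1);
}

int main(void) {
    /* Two tasks on different cores sharing one memory bus. */
    Task t1 = { 0, 2, 5, 1 };   /* bus busy [0,2) and [7,8)  */
    Task t2 = { 2, 3, 4, 2 };   /* bus busy [2,5) and [9,11) */
    printf("bus conflict: %s\n", bus_conflict(&t1, &t2) ? "yes" : "no");
    return 0;
}
```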
|
58 |
Roko: Balancing Performance and Usability in Coarse-grain Parallelization
Segulja, Cedomir, 06 April 2010
We present Roko, a system that allows parallelization of sequential C code with modest user intervention. The user exposes parallelism at the function level by annotating the code with pragmas. Roko defines only two pragmas: the parallel pragma denotes function calls that will be executed asynchronously, and the exposed pragma describes the data usage of the marked function calls. Architecturally, Roko consists of three components: a compiler that analyzes the pragmas, a software environment that spreads execution over multiple processors, and hardware support that implements a novel synchronization scheme, versioning. We have designed, implemented and evaluated an FPGA-based prototype of Roko. Our experimental evaluation shows (i) that a few simple pragmas are all that is needed to expose parallelism in benchmark applications and (ii) that Roko can deliver good performance in terms of application speedup.
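The abstract names Roko's two pragmas but not their concrete syntax, so the following is a hypothetical illustration of how annotated sequential C code might look; the directive spellings and the data-usage clause syntax are assumptions, and the program still runs sequentially under an ordinary compiler since unknown pragmas are ignored.

```c
/* Hypothetical illustration of Roko-style annotations on sequential
 * C code. Only the "parallel" and "exposed" pragma names come from
 * the abstract; the directive and clause syntax is an assumption. */
#include <stdio.h>
#include <stddef.h>

static void filter(double *out, const double *in, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] * 0.5;
}

static void total(double *sum, const double *in, size_t n) {
    *sum = 0.0;
    for (size_t i = 0; i < n; i++) *sum += in[i];
}

int main(void) {
    double a[4] = {1, 2, 3, 4}, b[4], s;

    /* "exposed" describes the call's data usage so the versioning
     * hardware can detect conflicts; "parallel" launches the call
     * asynchronously on another processor. */
    #pragma roko exposed read(a[0:4]) write(b[0:4])
    #pragma roko parallel
    filter(b, a, 4);

    /* Both calls only read a, so they may run concurrently. */
    #pragma roko exposed read(a[0:4]) write(s)
    #pragma roko parallel
    total(&s, a, 4);

    /* Uses of b and s below would synchronize via versioning. */
    printf("b[0]=%.1f sum=%.1f\n", b[0], s);
    return 0;
}
```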
|
60 |
Castell: a heterogeneous CMP architecture scalable to hundreds of processors
Cabarcas Jaramillo, Felipe, 19 September 2011
Technology improvements and power constraints have led multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator-based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but at the cost of a high programming effort. We propose Castell, a scalable chip multiprocessor architecture that can be programmed like a uniprocessor while providing the high throughput of accelerator-based architectures.

Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is DMA transfers to overlap computation and data movement, a technique known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system with a focus on memory bandwidth.

Beyond programmability and the memory system design, we use a hierarchical NoC and add a synchronization module. The NoC design distributes memory traffic efficiently, allowing the architecture to scale. The synchronization module addresses the large performance degradation that applications suffer under high synchronization latencies.

Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and ClustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers and architectural support for task-based programming models.
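The double-buffering feature mentioned above follows a classic pattern: while the core processes one buffer, the DMA engine fills the other, hiding memory latency behind computation. The sketch below shows the pattern generically; dma_start/dma_wait are stand-in names (simulated here with memcpy), not Castell's actual runtime interface.

```c
/* Double-buffering sketch: overlap DMA transfers with computation by
 * processing one buffer while the next chunk streams into the other.
 * dma_start()/dma_wait() are stand-ins for a real DMA engine's API
 * (simulated with memcpy), not Castell's interface. */
#include <stdio.h>
#include <string.h>

#define CHUNK   4
#define NCHUNKS 4

static int input[NCHUNKS][CHUNK] = {
    {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16}
};

static void dma_start(int *dst, const int *src, size_t n) {
    memcpy(dst, src, n * sizeof *src);  /* stand-in for async DMA */
}
static void dma_wait(void) { /* would block until the DMA completes */ }

static int process(const int *buf, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += buf[i];
    return sum;
}

int main(void) {
    int buf[2][CHUNK];
    int total = 0;

    dma_start(buf[0], input[0], CHUNK);       /* prefetch first chunk */
    for (int i = 0; i < NCHUNKS; i++) {
        dma_wait();                           /* chunk i has arrived  */
        if (i + 1 < NCHUNKS)                  /* fetch next chunk ... */
            dma_start(buf[(i + 1) % 2], input[i + 1], CHUNK);
        total += process(buf[i % 2], CHUNK);  /* ... while computing  */
    }
    printf("total = %d\n", total);
    return 0;
}
```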
|