Global ETD Search

11	Mapping Concurrent Applications to Multiprocessor Systems with Multithreaded Processors and Network on Chip-Based Interconnections Pop, Ruxandra January 2011 (has links) Network on Chip (NoC) architectures provide scalable platforms for designing Systems on Chip (SoC) with large number of cores. Developing products and applications using an NoC architecture offers many challenges and opportunities. A tool which can map an application or a set of applications to a given NoC architecture will be essential. In this thesis we first survey current techniques and we present our proposals for mapping and scheduling of concurrent applications to NoCs with multithreaded processors as computational resources. NoC platforms are basically a special class of Multiprocessor Embedded Systems (MPES). Conventional MPES architectures are mostly bus-based and, thus, are exposed to potential difficulties regarding scalability and reusability. There has been a lot of research on MPES development including work on mapping and scheduling of applications. Many of these results can also be applied to NoC platforms. Mapping and scheduling are known to be computationally hard problems. A large range of exact and approximate optimization algorithms have been proposed for solving these problems. The methods include Branch-and–Bound (BB), constructive and transformative heuristics such as List Scheduling (LS), Genetic Algorithms (GA) and various types of Mathematical Programming algorithms. Concurrent applications are able to capture a typical embedded system which is multifunctional. Concurrent applications can be executed on an NoC which provides a large computational power with multiple on-chip computational resources. Improving the time performances of concurrent applications which are running on Network on Chip (NoC) architectures is mainly correlated with the ability of mapping and scheduling methodologies to exploit the Thread Level Parallelism (TLP) of concurrent applications through the available NoC parallelism. Matching the architectural parallelism to the application concurrency for obtaining good performance-cost tradeoffs is another aspect of the problem. Multithreading is a technique for hiding long latencies of memory accesses, through the overlapped execution of several threads. Recently, Multi-Threaded Processors (MTPs) have been designed providing the architectural infrastructure to concurrently execute multiple threads at hardware level which, usually, results in a very low context switching overhead. Simultaneous Multi-Threaded Processors (SMTPs) are superscalar processor architectures which adaptively exploit the coarse grain and the fine grain parallelism of applications, by simultaneously executing instructions from several thread contexts. In this thesis we make a case for using SMTPs and MTPs as NoC resources and show that such a multiprocessor architecture provides better time performances than an NoC with solely General-purpose Processors (GP). We have developed a methodology for task mapping and scheduling to an NoC with mixed SMTP, MTP and GP resources, which aims to maximize the time performance of concurrent applications and to satisfy their soft deadlines. The developed methodology was evaluated on many configurations of NoC-based platforms with SMTP, MTP and GP resources. The experimental results demonstrate that the use of SMTPs and MTPs in NoC platforms can significantly speed-up applications. Network on Chip Multiprocessor Embedded Systems Task Mapping Task Scheduling Multithreading Simultaneous Multithreading Response Time Estimation Genetic Algorithms List Scheduling Soft Deadline Task Graphs Engineering and Technology Teknik och teknologier
12	Cache-conscious off-line real-time scheduling for multi-core platforms : algorithms and implementation / Ordonnanceur hors-ligne temps-réel et conscient du cache ciblant les architectures multi-coeurs : algorithmes et implémentations Nguyen, Viet Anh 22 February 2018 (has links) Les temps avancent et les applications temps-réel deviennent de plus en plus gourmandes en ressources. Les plate-formes multi-cœurs sont apparues dans le but de satisfaire les demandes des applications en ressources, tout en réduisant la taille, le poids, et la consommation énergétique. Le challenge le plus pertinent, lors du déploiement d'un système temps-réel sur une plate-forme multi-cœur, est de garantir les contraintes temporelles des applications temps réel strict s'exécutant sur de telles plate-formes. La difficulté de ce challenge provient d'une interdépendance entre les analyses de prédictabilité temporelle. Cette interdépendance peut être figurativement liée au problème philosophique de l'œuf et de la poule, et expliqué comme suit. L'un des pré-requis des algorithmes d'ordonnancement est le Pire Temps d'Exécution (PTE) des tâches pour déterminer leur placement et leur ordre d'exécution. Mais ce PTE est lui aussi influencé par les décisions de l'ordonnanceur qui va déterminer quelles sont les tâches co-localisées ou concurrentes propageant des effets sur les caches locaux et les ressources physiquement partagées et donc le PTE. La plupart des méthodes d'analyse pour les architectures multi-cœurs supputent un seul PTE par tâche, lequel est valide pour toutes conditions d'exécutions confondues. Cette hypothèse est beaucoup trop pessimiste pour entrevoir un gain de performance sur des architectures dotées de caches locaux. Pour de telles architectures, le PTE d'une tâche est dépendant du contenu du cache au début de l'exécution de la dite tâche, qui est lui-même dépendant de la tâche exécutée avant et ainsi de suite. Dans cette thèse, nous proposons de prendre en compte des PTEs incluant les effets des caches privés sur le contexte d’exécution de chaque tâche. Nous proposons dans cette thèse deux techniques d'ordonnancement ciblant des architectures multi-cœurs équipées de caches locaux. Ces deux techniques ordonnancent une application parallèle modélisée par un graphe de tâches, et génèrent un planning statique partitionné et non-préemptif. Nous proposons une méthode optimale à base de Programmation Linéaire en Nombre Entier (PLNE), ainsi qu'une méthode de résolution par heuristique basée sur de l'ordonnancement par liste. Les résultats expérimentaux montrent que la prise en compte des effets des caches privés sur les PTE des tâches réduit significativement la longueur des ordonnancements générés, ce comparé à leur homologue ignorant les caches locaux. Afin de parfaire les résultats ainsi obtenus, nous avons réalisé l'implémentation de nos ordonnancements dirigés par le temps et conscients du cache pour un déploiement sur une machine Kalray MPPA-256, une plate-forme multi-cœur en grappes (clusters). En premier lieu, nous avons identifié les challenges réels survenant lors de ce type d'implémentation, tel que la pollution des caches, la contention induite par le partage du bus, les délais de lancement d'une tâche introduits par la présence de l'ordonnanceur, et l'absence de cohérence des caches de données. En second lieu, nous proposons des stratégies adaptées et incluant, dans la formulation PLNE, les contraintes matérielles ; ainsi qu'une méthode permettant de générer le code final de l'application. Enfin, l'évaluation expérimentale valide la correction fonctionnelle et temporelle de notre implémentation pendant laquelle nous avons pu observé le facteur le plus impactant la longueur de l'ordonnancement: la contention. / Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy the computing demand while reducing the size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the real-time constraints of hard real-time applications on such platforms. This is caused by interdependent problems, referred to as a chicken and egg situation, which is explained as follows. Due to the effect of multi-core hardware, such as local caches and shared hardware resources, the timing behavior of tasks are strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which are determined by scheduling strategies. Symetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and their execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, which is valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches. In such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, itself depending on the task that was executed before the task under study. In this thesis, we address the issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks due to the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. The two techniques schedule a parallel application modeled as a task graph, and generate a static partitioned non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks’ WCETs, the length of generated schedules are significantly reduced as compared to schedules generated by cache-unaware scheduling methods. Furthermore, we perform the implementation of time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on the machine, including cache pollution cause by the scheduler, shared bus contention, delay to the start time of tasks, and data cache inconsistency. We then propose our strategies including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and the temporal correctness of our implementation. Additionally, shared bus contention is observed to be the most impacting factor on the length of adapted cache-conscious schedules. Ordonnancement temps-Réel Ordonnancements conscient du cache PLNE Ordonnancement par liste Architectures multi-Cœur Kalray MPPA-256 Real-Time scheduling Cache-Conscious schedules ILP List scheduling Multi-Core architectures Kalray MPPA-256

Search results

Mapping Concurrent Applications to Multiprocessor Systems with Multithreaded Processors and Network on Chip-Based Interconnections

Cache-conscious off-line real-time scheduling for multi-core platforms : algorithms and implementation / Ordonnanceur hors-ligne temps-réel et conscient du cache ciblant les architectures multi-coeurs : algorithmes et implémentations