161

Conception d’un crypto-système reconfigurable pour la radio logicielle sécurisée

Grand, Michaël 02 December 2011 (has links)
The research detailed in this document deals with the design and implementation of a hardware component intended to serve as the cryptographic sub-system of a secure software defined radio. Since the early 1990s, radio systems have gradually evolved from traditional radio to software defined radio. This evolution has enabled the integration of an ever larger number of communication standards on a single hardware platform. The designer of a secure software defined radio faces many problems that can be summarized by the following question: how can a maximum of communication standards be implemented on a single hardware and software platform? This work focuses in particular on the implementation of the cryptographic standards used to protect radio communications. Ideally, the solution to this problem relies exclusively on digital processors. However, cryptographic algorithms usually require so much computing power that a purely software implementation is not practical. A secure software defined radio must therefore sometimes integrate dedicated hardware components, whose use conflicts with the flexibility expected of software defined radios. In recent years, however, the evolution of FPGA technology has changed the situation. The latest FPGAs embed enough logic resources to implement the complex digital functions used by software defined radios, and their ability to be reconfigured entirely (or even partially for the most recent devices) makes them ideal candidates for implementing hardware components that remain flexible and can evolve over time. Following these observations, research was conducted within the Conception des Systèmes Numériques team of the IMS laboratory. This work first led to the publication of a cryptographic sub-system architecture for secure software defined radio as defined by the Software Communication Architecture and its security supplement. It then continued with the design and implementation of a dynamically reconfigurable multi-core cryptoprocessor on FPGA.
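The cryptoprocessor above is a hardware design; purely as an illustrative sketch of the scheduling idea behind a dynamically reconfigurable multi-core crypto subsystem (not code from the thesis, and all names such as CryptoCorePool are hypothetical), one can model crypto channels being mapped onto a small pool of reconfigurable core slots, reloading a slot only when the requested algorithm is not already configured:

```python
# Toy model of a pool of reconfigurable crypto cores (illustrative only;
# the design described in the thesis is an FPGA hardware architecture).

class CryptoCorePool:
    def __init__(self, num_slots):
        # Each slot holds the name of the algorithm currently configured, or None.
        self.slots = [None] * num_slots
        self.reconfigurations = 0

    def dispatch(self, algorithm, payload):
        """Run `payload` on a core configured for `algorithm`, reconfiguring a slot if needed."""
        if algorithm not in self.slots:
            # Pick a victim slot and "load" the new algorithm (models partial reconfiguration).
            victim = self.slots.index(None) if None in self.slots else 0
            self.slots[victim] = algorithm
            self.reconfigurations += 1
        return f"{algorithm} processed {len(payload)} bytes"

pool = CryptoCorePool(num_slots=2)
for alg, data in [("AES-GCM", b"msg1"), ("SNOW-3G", b"msg2"), ("AES-GCM", b"msg3")]:
    print(pool.dispatch(alg, data))
print("reconfigurations:", pool.reconfigurations)
```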
162

Improving Last-Level Cache Performance in Single and Multi-Core Processors

Manikanth, R January 2013 (has links) (PDF)
With off-chip memory accesses taking hundreds of processor cycles, getting data to the processor in a timely fashion remains one of the key performance bottlenecks in current systems. With increasing core counts this problem is aggravated, and memory access latency becomes even more critical in multi-core systems. The Last Level Cache (LLC) is therefore of particular importance, as any miss experienced at the LLC translates into a costly off-chip memory access. A combination of on-chip caches and prefetchers is used to hide the off-chip memory access latency. While a hierarchy of caches focuses on exploiting locality by retaining useful data, prefetchers complement them by initiating data accesses early for blocks that are likely to be accessed in the future. In the first half of this thesis, we focus on improving the performance of the LLC in single-core processors by focusing on prefetchers. In multi-cores, the LLC is shared across many cores and therefore by many programs running on them. Thus, in the second half of this thesis, we focus on novel and efficient management mechanisms for the shared LLC to improve the performance of programs running on the various cores. Prefetchers observe a training stream of primary misses in the cache and rely on the regularity present in them to predict and avoid future misses. We quantify the regularity present in the training stream using the information-theoretic measure of entropy and study the impact on regularity of extending the training stream to include secondary misses and accesses. We also consider triggering prefetches on secondary misses. We find that the extended histories are more regular in general and that it is beneficial to trigger prefetches on secondary misses as well. However, the best design choice varies on a per-benchmark and per-prefetcher basis, necessitating a dynamic approach to identify the best prefetcher configuration. We propose an inexpensive Bloom-filter-based dynamic mechanism to identify the best performing prefetch design point at run time. The adaptive scheme improves performance in terms of Instructions Per Cycle (IPC) by 4.6% on average over a baseline prefetcher. This performance improvement is achieved along with a reduction in memory traffic requirements. It is well known that aggressive prefetching can harm performance due to increased contention for memory bandwidth and cache pollution. Prefetchers treat all loads as equal and try to eliminate as many misses as possible, while certain (static) load instructions are known to be more performance critical. As our second contribution, we propose Focused Prefetching, a generic mechanism to introduce performance awareness in prefetching. We identify that a small number of static loads, referred to as Loads Incurring Majority of Commit Stalls (LIMCOS), account for a majority of the commit stalls in processors. We propose a simple history-based classifier to identify LIMCOS with high accuracy, and use it to focus the prefetching effort on LIMCOS. This is achieved in a generic, prefetcher-agnostic fashion by filtering the history used by the prefetchers. Focused Prefetching improves performance in terms of IPC by 9.8% for a set of memory-intensive SPEC2000 workloads. This performance gain is achieved along with a reduction in memory traffic and an improvement in prefetch accuracy.
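As a hedged, purely illustrative aside (not code or data from the thesis), one simple way to quantify the regularity of a prefetcher training stream is the Shannon entropy of the deltas between successive miss addresses; the exact formulation used in the thesis may differ:

```python
from collections import Counter
from math import log2

def delta_entropy(addresses):
    """Shannon entropy (bits) of the deltas between successive miss addresses.
    Lower entropy means a more regular, easier-to-prefetch stream."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    counts = Counter(deltas)
    total = len(deltas)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A perfectly strided stream (all deltas equal) has entropy 0.
print(delta_entropy([0x100, 0x140, 0x180, 0x1c0, 0x200]))   # 0.0
# An irregular stream has higher entropy.
print(delta_entropy([0x100, 0x900, 0x180, 0x4c0, 0x200]))   # > 0
```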
In the second part of the thesis, we focus on improving the performance of shared caches in multi-core systems. Last-level caches are affected by a lack of temporal locality in the access stream, as the locality gets filtered out by the caches above them. In multi-cores, the interleaving of accesses from the various cores further adds to the problem. To overcome this, we propose a PC-Centric Next-Use Aware Cache Organization (NUcache) for shared caches in multi-cores, with the ability to retain a subset of cache blocks longer. This is achieved by a logical partitioning of the associative ways of a cache set into Main Ways and Deli Ways. While all blocks have access to the Main Ways, blocks that are likely to be accessed in the near future (with shorter Next-Use distance) are candidates to be retained longer in the Deli Ways to eliminate future misses. We make use of the fact that a small number of PCs, referred to as delinquent PCs, bring in a majority of the cache blocks, and learn the Next-Use characteristics of the blocks they bring in. We propose an intelligent cost-benefit-based PC-selection mechanism to identify the best set of delinquent PCs that should have access to the Deli Ways so as to maximize cache hits. Performance evaluation reveals that NUcache improves the performance (in terms of Average Normalized Turnaround Time, ANTT) of multi-programmed workloads by 6.2%, 13.9%, 15.8% and 19.6% on dual, quad, eight and sixteen core machines respectively. NUcache also performs better than some of the state-of-the-art cache partitioning mechanisms. The last part of the thesis deals with effective shared-cache management in multi-core systems to achieve various performance objectives. Explicitly controlling the shared-cache occupancy of competing applications is a flexible and practical way to achieve a variety of high-level performance goals. Existing solutions control cache occupancy at a coarser granularity, do not scale well to large core counts and, in some cases, lack the flexibility to support a variety of performance goals. To overcome this, we propose Probabilistic Shared Cache Management (PriSM), a framework that manages the cache occupancy of different cores at cache-block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes to implement, scales to larger core counts and is flexible enough to support a variety of performance goals such as hit-maximization, fairness and QoS. PriSM with hit-maximization improves the performance of multi-programmed workloads in terms of ANTT by 16.5%, 18.7% and 12.7% over baseline LRU on eight, sixteen and thirty-two core machines respectively.
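To make the eviction-probability idea concrete, here is a small hedged sketch (a simplified model, not the actual PriSM hardware policy): each core is assigned a target share of the shared cache, and on an eviction the victim core is drawn with probability proportional to how far it exceeds its target.

```python
import random

def pick_victim_core(occupancy, target_share, total_blocks):
    """Pick the core whose block should be evicted.
    occupancy[c]   : number of blocks core c currently holds
    target_share[c]: desired fraction of the cache for core c
    Cores above their target are evicted from with proportionally higher probability.
    (Illustrative only; PriSM derives its probabilities from its own allocation policy.)"""
    excess = {c: max(occupancy[c] - target_share[c] * total_blocks, 0.0)
              for c in occupancy}
    total_excess = sum(excess.values())
    if total_excess == 0:
        return random.choice(list(occupancy))      # everyone within target: evict at random
    r = random.uniform(0, total_excess)
    for c, e in excess.items():
        r -= e
        if r <= 0:
            return c
    return c

occ = {0: 600, 1: 300, 2: 80, 3: 44}               # blocks held per core
share = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}       # equal target shares
print(pick_victim_core(occ, share, total_blocks=1024))
```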
163

Analyse von Test-Pattern für SoC Multiprozessortest und -debugging mittels Test Access Port (JTAG)

Vogelsang, Stefan, Köhler, Steffen, Spallek, Rainer G. 11 June 2007 (has links)
When developing System-on-Chip (SoC) debuggers, it is unfortunately quite often necessary to examine the debugger itself for possible faults. Since every serious debugger is, by construction, itself an embedded system, there is a need to design simple, reliably controllable diagnostic hardware that provides access to the debugger's behaviour through its outputs. Currently, the Test Access Port (TAP, per the IEEE 1149.1 standard) is the basis on which many integrators access their instantiated hardware. Even research-oriented multi-core System-on-Chip architectures such as ARM's ARM11MP still use this approach. In this contribution we present a special-purpose tool for analysing the TAP communication protocol, which makes expensive analysis equipment (logic analysers) unnecessary and, moreover, offers convenient, extended support for multi-core systems. Starting from the problem of sampling and capturing the signal states at the TAP using an FPGA, we discuss the various visualisation and analysis aspects of the TAP protocol phases in a multi-core-processor target-system environment. The solution presented here was developed within a collaborative R&D project. The project is funded under the technology support programme with funds from the European Regional Development Fund (ERDF) 2000-2006 and funds of the Free State of Saxony.
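As an illustration of what such a TAP protocol analyser must track (a software sketch, not the FPGA-based tool described above), the following snippet steps through the standard IEEE 1149.1 TAP controller state machine from a stream of sampled TMS values:

```python
# Minimal decoder for the IEEE 1149.1 TAP controller state machine.
# It follows TMS values sampled on rising TCK edges; TDI/TDO handling is omitted.
TAP_NEXT = {  # state: (next state if TMS=0, next state if TMS=1)
    "Test-Logic-Reset": ("Run-Test/Idle",  "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",     "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",       "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",       "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",       "Update-DR"),
    "Pause-DR":         ("Pause-DR",       "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",       "Update-DR"),
    "Update-DR":        ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",     "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",       "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",       "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",       "Update-IR"),
    "Pause-IR":         ("Pause-IR",       "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",       "Update-IR"),
    "Update-IR":        ("Run-Test/Idle",  "Select-DR-Scan"),
}

def decode_tms(tms_samples, state="Test-Logic-Reset"):
    """Return the sequence of TAP states visited for a stream of TMS samples."""
    trace = [state]
    for tms in tms_samples:
        state = TAP_NEXT[state][tms]
        trace.append(state)
    return trace

# Example: five TMS=1 clocks reset the TAP, then the sequence enters Shift-DR.
print(decode_tms([1, 1, 1, 1, 1, 0, 1, 0, 0]))
```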
164

Mitteilungen des URZ 3/2009

Clauß, Matthias, Müller, Thomas, Riedel, Wolfgang, Schier, Thomas, Vodel, Matthias 31 August 2009 (has links)
Information from the University Computing Centre (URZ): news on the 'Server Housing' service; new technologies in the data center; public computer pools; interactive building-guidance and infotainment system. Brief notes: staff news; new Firefox versions include our certificate; learning vocabulary with Beolingus. Software news: ANSYS 12.0 available; SimulationX 3.2 update; Pro/ENGINEER Manikin available.
165

Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems

Hashmi, Jahanzeb Maqbool 17 September 2020 (has links)
No description available.
166

Diseño, fabricación y caracterización de sensores basados en fibras ópticas de múltiples núcleos

Madrigal Madrigal, Javier 14 February 2022 (has links)
Optical fiber has revolutionized telecommunications thanks to its high transmission capacity and low losses. Today it would not be possible to carry the amount of traffic generated on the Internet without communication systems based on optical fibers. However, the number of devices connected to the Internet keeps growing, so the capacity of standard single-core optical fiber may become a limitation in the not-too-distant future. One way to increase this capacity is to use optical fibers with several cores. There is currently great research interest in this type of fiber for telecommunication applications, so commercial multi-core fibers are not difficult to find. Although the most common use of optical fiber is for telecommunications, it can also be used as a sensor. One of the most common methods for implementing sensors is the inscription of diffraction gratings in single-core optical fibers; inscribing diffraction gratings in multi-core fibers opens new lines of research for the development of advanced sensors. In this thesis, different types of diffraction gratings inscribed in a seven-core fiber have been studied for their application in the implementation of sensors. First, the fabrication system is described, which allows different types of diffraction gratings to be inscribed in the multi-core fiber selectively, that is, it allows selecting in which cores the grating will be inscribed. Using this system, long-period gratings have been inscribed and subsequently characterized as strain, torsion and curvature sensors. Then, tilted fiber Bragg gratings have been inscribed to intentionally increase the crosstalk between the fiber cores by coupling light between them. This crosstalk has been shown experimentally to be sensitive to fiber strain, curvature, temperature, and the refractive index surrounding the fiber. In addition, it has been shown that Bragg gratings inscribed in multi-core fibers can be used to implement curvature sensors capable of operating in radioactive environments. Finally, regenerated Bragg gratings capable of operating at high temperatures have been fabricated and characterized as temperature, strain and curvature sensors. / I thank the Universitat Politècnica de València for the FPI grant (PAID-01-18) awarded to me to carry out this thesis. / Madrigal Madrigal, J. (2022). Diseño, fabricación y caracterización de sensores basados en fibras ópticas de múltiples núcleos [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/180806 / TESIS
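As a brief, hedged aside (typical textbook coefficients for silica fiber at 1550 nm, not measurements from this thesis), the Bragg condition lambda_B = 2 * n_eff * Lambda explains why such gratings work as sensors: strain and temperature change the effective index and grating period, shifting the reflected wavelength roughly as computed below.

```python
def bragg_shift_pm(strain_ue=0.0, delta_T=0.0,
                   lambda_b_nm=1550.0, k_eps=0.78, k_t=6.7e-6):
    """Approximate Bragg wavelength shift in picometres.
    strain_ue : applied axial strain in microstrain
    delta_T   : temperature change in kelvin
    k_eps     : 1 - effective photo-elastic coefficient (~0.78 for silica, assumed)
    k_t       : combined thermo-optic + thermal-expansion coefficient (~6.7e-6 /K, assumed)
    """
    shift_nm = lambda_b_nm * (k_eps * strain_ue * 1e-6 + k_t * delta_T)
    return shift_nm * 1e3  # nm -> pm

print(round(bragg_shift_pm(strain_ue=100), 1))  # ~120.9 pm for 100 microstrain
print(round(bragg_shift_pm(delta_T=10), 1))     # ~103.9 pm for a 10 K temperature change
```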
167

Hardware Acceleration of a Neighborhood Dependent Component Feature Learning (NDCFL) Super-Resolution Algorithm

Mathari Bakthavatsalam, Pagalavan 22 May 2013 (has links)
No description available.
168

Specialty Fiber Lasers and Novel Fiber Devices

Jollivet, Clemence 01 January 2014 (has links)
At the dawn of the 21st century, the field of specialty optical fibers experienced a scientific revolution with the introduction of the stack-and-draw technique, a multi-step, advanced fiber fabrication method which enabled the creation of well-controlled micro-structured designs. Since then, an extremely wide variety of finely tuned fiber structures have been demonstrated, including novel materials and novel designs. As the complexity of fiber designs increased, highly controlled fabrication processes became critical. To determine the ability of a novel fiber design to deliver light with properties tailored to a specific application, several mode-analysis techniques have been reported, addressing the recurring need for in-depth fiber characterization. The first part of this dissertation details a novel experiment that achieves modal decomposition with extended capabilities, reaching beyond the limits set by existing mode-analysis techniques. As a result, individual transverse modes carrying between ~0.01% and ~30% of the total light were resolved with unmatched accuracy. Furthermore, this approach was employed to decompose the light guided in Large-Mode-Area (LMA) fiber, Photonic Crystal Fiber (PCF) and Leakage Channel Fiber (LCF). The single-mode performances were evaluated and compared, and as a result the suitability of each specialty fiber design for power-scaling applications in fiber laser systems was experimentally determined. The second part of this dissertation is dedicated to novel specialty fiber laser systems. First, challenges related to the monolithic integration of novel and complex specialty fiber designs in all-fiber systems were addressed. The poor design and size compatibility between specialty fibers and conventional fiber-based components limits their monolithic integration due to high coupling loss and unstable performance. Here, novel all-fiber Mode-Field Adapter (MFA) devices made of selected segments of Graded-Index Multimode Fiber (GIMF) were implemented to mitigate the coupling losses between an LMA PCF and a conventional Single-Mode Fiber (SMF) presenting an initial 18-fold mode-field-area mismatch. It was experimentally demonstrated that the overall transmission in the mode-matched fiber chain was increased by more than 11 dB (the MFA was a 250 µm piece of 50 µm core diameter GIMF). This approach was further employed to assemble monolithic fiber laser cavities combining an active LMA PCF and fiber Bragg gratings (FBG) in conventional SMF. It was demonstrated that intra-cavity mode matching results in efficient (60%) and narrow-linewidth (200 pm) laser emission at the FBG wavelength. In the last section of this dissertation, monolithic Multi-Core Fiber (MCF) laser cavities are reported for the first time. Compared to existing MCF lasers, renowned for high-brightness beam delivery after selection of the in-phase supermode, the present new generation of 7-coupled-core Yb-doped fiber lasers uses the gain from several supermodes simultaneously. In order to uncover the mode-competition mechanisms during amplification and the complex dynamics of multi-supermode lasing, novel diagnostic approaches were demonstrated. After characterizing the laser behavior, self-mode-locking in linear MCF laser cavities was observed for the first time.
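Purely to give a feel for why an 18-fold mode-field-area mismatch is costly without a mode-field adapter, the sketch below evaluates the standard Gaussian-overlap butt-coupling formula; this is a crude textbook approximation (PCF modes are not Gaussian), so the result is not a number taken from the dissertation.

```python
from math import log10, sqrt

def butt_coupling_loss_db(w1, w2):
    """Butt-coupling loss (dB) between two fundamental modes approximated as
    Gaussians with mode-field radii w1 and w2 (same units), perfectly aligned."""
    eta = (2 * w1 * w2 / (w1**2 + w2**2)) ** 2   # overlap-integral power efficiency
    return -10 * log10(eta)

# An 18-fold mode-field-AREA mismatch corresponds to a radius ratio of sqrt(18).
print(round(butt_coupling_loss_db(1.0, sqrt(18.0)), 1))  # ~7 dB under this crude model
```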
169

Programming Model and Protocols for Reconfigurable Distributed Systems

Arad, Cosmin January 2013 (has links)
Distributed systems are everywhere. From large datacenters to mobile devices, an ever richer assortment of applications and services relies on distributed systems, infrastructure, and protocols. Despite their ubiquity, testing and debugging distributed systems remains notoriously hard. Moreover, aside from inherent design challenges posed by partial failure, concurrency, or asynchrony, there remain significant challenges in the implementation of distributed systems. These programming challenges stem from the increasing complexity of the concurrent activities and reactive behaviors in a distributed system on the one hand, and the need to effectively leverage the parallelism offered by modern multi-core hardware, on the other hand. This thesis contributes Kompics, a programming model designed to alleviate some of these challenges. Kompics is a component model and programming framework for building distributed systems by composing message-passing concurrent components. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic execution replay for debugging, testing, and reproducible behavior evaluation for large-scale Kompics distributed systems. The same system code is used for both simulation and production deployment, greatly simplifying the system development, testing, and debugging cycle. We highlight the architectural patterns and abstractions facilitated by Kompics through a case study of a non-trivial distributed key-value storage system. CATS is a scalable, fault-tolerant, elastic, and self-managing key-value store which trades off service availability for guarantees of atomic data consistency and tolerance to network partitions. We present the composition architecture for the numerous protocols employed by the CATS system, as well as our methodology for testing the correctness of key CATS algorithms using the Kompics simulation framework. Results from a comprehensive performance evaluation attest that CATS achieves its claimed properties and delivers a level of performance competitive with similar systems which provide only weaker consistency guarantees. More importantly, this testifies that Kompics admits efficient system implementations. Its use as a teaching framework as well as its use for rapid prototyping, development, and evaluation of a myriad of scalable distributed systems, both within and outside our research group, confirm the practicality of Kompics. / Kompics / CATS / REST
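To illustrate the message-passing component style that frameworks such as Kompics build on (a generic Python sketch, not the actual Kompics API, which targets the JVM), components subscribe handlers for event types and a scheduler delivers queued events one at a time:

```python
# Generic event-driven component sketch (illustrative; not the Kompics API).
from collections import deque

class Component:
    def __init__(self):
        self.handlers = {}          # event type -> handler
        self.outbox = deque()       # events this component wants to send

    def on(self, event_type, handler):
        self.handlers[event_type] = handler

    def trigger(self, event):
        self.outbox.append(event)

class Scheduler:
    """Delivers queued events to subscribed components one at a time; a simulation
    setting could fix the delivery order to obtain deterministic execution replay."""
    def __init__(self):
        self.subscriptions = {}     # event type -> list of components

    def subscribe(self, component, event_type):
        self.subscriptions.setdefault(event_type, []).append(component)

    def run(self, initial_events):
        queue = deque(initial_events)
        while queue:
            event = queue.popleft()
            for comp in self.subscriptions.get(type(event), []):
                comp.handlers[type(event)](event)
                queue.extend(comp.outbox)
                comp.outbox.clear()

class Ping: pass
class Pong: pass

pinger, ponger, sched = Component(), Component(), Scheduler()
ponger.on(Ping, lambda e: (print("got ping"), ponger.trigger(Pong())))
pinger.on(Pong, lambda e: print("got pong"))
sched.subscribe(ponger, Ping)
sched.subscribe(pinger, Pong)
sched.run([Ping()])
```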
170

Cache-conscious off-line real-time scheduling for multi-core platforms : algorithms and implementation / Ordonnanceur hors-ligne temps-réel et conscient du cache ciblant les architectures multi-coeurs : algorithmes et implémentations

Nguyen, Viet Anh 22 February 2018 (has links)
Nowadays, real-time applications are more compute-intensive as more functionalities are introduced. Multi-core platforms have been released to satisfy this computing demand while reducing size, weight, and power requirements. The most significant challenge when deploying real-time systems on multi-core platforms is to guarantee the timing constraints of hard real-time applications on such platforms. The difficulty stems from interdependent problems, a chicken-and-egg situation that can be explained as follows. Because of multi-core hardware features such as local caches and shared hardware resources, the timing behavior of tasks is strongly influenced by their execution context (i.e., co-located tasks, concurrent tasks), which is determined by the scheduling strategy. Symmetrically, scheduling algorithms require the Worst-Case Execution Time (WCET) of tasks as prior knowledge to determine their allocation and execution order. Most schedulability analysis techniques for multi-core architectures assume a single WCET per task, valid in all execution conditions. This assumption is too pessimistic for parallel applications running on multi-core architectures with local caches: in such architectures, the WCET of a task depends on the cache contents at the beginning of its execution, which in turn depend on the task executed before the task under study. In this thesis, we address this issue by proposing scheduling algorithms that take into account context-sensitive WCETs of tasks arising from the effect of private caches. We propose two scheduling techniques for multi-core architectures equipped with local caches. Both techniques schedule a parallel application modeled as a task graph and generate a static, partitioned, non-preemptive schedule. We propose an optimal method, using an Integer Linear Programming (ILP) formulation, as well as a heuristic method based on list scheduling. Experimental results show that by taking into account the effect of private caches on tasks' WCETs, the length of the generated schedules is significantly reduced compared to schedules generated by cache-unaware scheduling methods. Furthermore, we implement time-driven cache-conscious schedules on the Kalray MPPA-256 machine, a clustered many-core platform. We first identify the practical challenges arising when implementing time-driven cache-conscious schedules on this machine, including cache pollution caused by the scheduler, shared bus contention, delays to the start times of tasks, and data cache inconsistency. We then propose strategies to address them, including an ILP formulation for adapting cache-conscious schedules to the identified practical factors, and a method for generating the code of applications to be executed on the machine. Experimental validation shows the functional and temporal correctness of our implementation; shared bus contention is observed to be the factor with the largest impact on the length of the adapted cache-conscious schedules.
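As a hedged illustration of list scheduling with context-sensitive WCETs (a deliberately simplified model, not the algorithm from the thesis), the sketch below maps the tasks of a small DAG onto cores and charges a task a reduced WCET when its predecessor on the same core leaves useful content in the private cache:

```python
# Simplified cache-conscious list scheduling over a task DAG (illustrative only).
# Each task has a baseline WCET and an optional reduced WCET when it runs
# immediately after a given "cache-friendly" predecessor on the same core.

tasks = {                      # task: (baseline_wcet, dependencies)
    "A": (10, []),
    "B": (8,  ["A"]),
    "C": (6,  ["A"]),
    "D": (4,  ["B", "C"]),
}
reduced_wcet = {("A", "B"): 5, ("B", "D"): 2}   # (prev_on_same_core, task) -> WCET

def list_schedule(tasks, reduced_wcet, num_cores=2):
    core_ready = [0] * num_cores          # time at which each core becomes free
    last_on_core = [None] * num_cores     # last task run on each core
    finish, placement = {}, {}
    remaining = dict(tasks)
    while remaining:
        # Pick any ready task (all dependencies finished); a real heuristic would use priorities.
        name = next(t for t, (_, deps) in remaining.items() if all(d in finish for d in deps))
        wcet, deps = remaining.pop(name)
        best = None
        for core in range(num_cores):
            cost = reduced_wcet.get((last_on_core[core], name), wcet)
            start = max([core_ready[core]] + [finish[d] for d in deps])
            if best is None or start + cost < best[0]:
                best = (start + cost, core, start)
        end, core, start = best
        core_ready[core], last_on_core[core] = end, name
        finish[name], placement[name] = end, (core, start, end)
    return placement

for task, (core, start, end) in list_schedule(tasks, reduced_wcet).items():
    print(f"{task}: core {core}, [{start}, {end}]")
```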
