Global ETD Search

11	Run-time scalable NoC for virtualized FPGA based accelerators as cloud services / NoC évolutif à l'exécution pour les accélérateurs basés sur FPGA virtualisés en tant que services cloud Kidane, Hiliwi Leake 05 November 2018 In recent years, cloud providers and data centers have integrated FPGAs into their environments for acceleration purposes. This is because FPGA-based accelerators are known for their low power consumption and good performance per watt. In addition, the introduction of dynamic partial reconfiguration (DPR) in some FPGAs has prompted researchers in industry and academia to propose DPR-based virtualized FPGA cloud services. In most existing work, the interconnection between vFPGAs relies on either a bus or OpenFlow networks. However, neither the bus nor OpenFlow is an optimal solution for virtualization. In this thesis, we proposed a run-time scalable NoC for virtualized FPGA-based accelerators in cloud computing. The NoC components adapt dynamically to the number of active virtualized accelerators by adding and removing sub-NoCs. To minimize the complexity of designing the NoC architecture at a low level (HDL implementation), we proposed a high-level Unified Modeling Language (UML) approach based on model-driven engineering. An approach based on UML/MARTE and IP-XACT is used to define the components of the high-level NoC topology and to generate the required HDL files. Experimental results show that the run-time scalable NoC can reduce power consumption by 17%. NoC characterization through MDE-based high-level modeling also reduces design time by 25%. / In the last few years, cloud providers and data centers have been integrating FPGAs in their environments for acceleration purposes. 
This is because FPGA-based accelerators are known for their low power consumption and good performance per watt. Moreover, the introduction of dynamic partial reconfiguration (DPR) in some FPGAs has prompted researchers in both industry and academia to propose DPR-based virtualized FPGA (vFPGA) cloud services. In most existing work, the interconnection between the vFPGAs relies on either a bus or OpenFlow networks. However, neither the bus nor OpenFlow is a virtualization-aware, optimal solution. In this thesis, we propose a virtualization-aware, dynamically scalable NoC for virtualized FPGA accelerators in cloud computing. The NoC components adapt dynamically to the number of active virtualized accelerators by adding and removing sub-NoCs. To minimize the complexity of NoC architecture design at a low level (HDL implementation), we propose a high-level modeling approach based on Model-Driven Engineering (MDE) and the Unified Modeling Language (UML). An approach based on UML/MARTE and IP-XACT is used to define the NoC topology components at a high level and to generate the required HDL files. Experimental results show that the dynamically scalable NoC can reduce power consumption by 17%. NoC characterization through MDE-based high-level modeling also reduces design time by 25%. Réseau sur puce (NoC) FPGA virtualisé L'informatique en nuage La reconfiguration dynamique Network-On-Chip (NoC) Virtualized-FPGA Cloud computing Dynamic partial reconfiguration 004.5 004.6
12	DESIGN AND PROTOTYPE OF RESOURCE NETWORK INTERFACES FOR NETWORK ON CHIP Mahmood, Adnan, Mohammed, Zaheer Ahmed January 2009 Network on Chip (NoC) has emerged as a competitive and efficient communication infrastructure for the core-based design of Systems on Chip. The resource (core), the router, and the interface between router and core are the three main parts of a NoC. Each core communicates with the network through this interface, called the Resource Network Interface (RNI). One approach to speeding up the design of NoC-based systems is to develop a standardized RNI. The design of the RNI depends to some extent on the routing technique used in the NoC. Routing algorithms are categorized as source or distributed according to where the route decision is made. In source routing, a complete path to the destination is provided in the packet header at the source, whereas in distributed routing the path is computed dynamically in the routers as the packet moves through the network. Buffering, flitization, deflitization, and the transfer of data from core to router and vice versa are responsibilities of the RNI common to both types of routing. In source routing, the RNI has the additional tasks of storing complete paths to all destinations in tables, extracting the path to the desired destination, and adding it to the header flit. In this thesis, we have worked towards designing and prototyping a standardized and efficient RNI for both source and distributed routing. VHDL is used as the design language, and both types of RNI have been prototyped on an Altera DE2 FPGA board. The RNI was tested using a Nios II soft core. Simulation results show that the best-case flit latency for both types of RNI is 4 clock cycles. The RNI design is also resource-efficient, consuming only 2% of the available resources on the target platform. 
Network on Chip (NoC) System on Chip (SoC) Resource Network Interface (RNI) Altera FPGA Nios II Core On Chip Communication Distributed Routing Source Routing Electrical engineering Elektroteknik
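The RNI duties described in this abstract (flitization, deflitization, and looking up a stored path for the header flit) can be illustrated with a minimal Python sketch. It is not the thesis's VHDL design: the `Flit` type, the `ROUTE_TABLE` contents, the single-letter hop names, and the 4-byte flit size are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flit:
    kind: str       # "head", "body", or "tail"
    payload: bytes


# Hypothetical route table of a source-routing RNI: destination id -> hop list.
ROUTE_TABLE: Dict[int, List[str]] = {
    3: ["E", "E", "S"],
    5: ["S", "E"],
}


def flitize(dest: int, data: bytes, flit_bytes: int = 4) -> List[Flit]:
    """Split data into flits; the head flit carries the complete path."""
    path = "".join(ROUTE_TABLE[dest]).encode("ascii")
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)]
    flits = [Flit("head", path)]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, chunk))
    return flits


def deflitize(flits: List[Flit]) -> bytes:
    """Reassemble the original data at the receiving RNI (head flit is dropped)."""
    return b"".join(f.payload for f in flits if f.kind != "head")
```

In a distributed-routing RNI the table lookup would disappear and the head flit would carry only the destination, which is the difference the abstract points to.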
14	Exploration architecturale et étude des performances des réseaux sur puce 3D partiellement connectés verticalement / Architectural exploration and performance analysis of Vertically-Partially-Connected Mesh-based 3D-NoC Bahmani, Maryam 09 December 2013 Using the third dimension can significantly reduce power and average traffic latency in Networks-on-Chip (NoCs). Through-Silicon Via (TSV) technology is the most promising technology for 3D integration, because it provides short vertical links that remedy the long-wire problem of 2D NoCs. TSVs are, however, very large and their manufacturing processes are immature, which reduces the yield of 3D-NoC-based systems on chip. The idea of vertically-partially-connected 3D NoCs was therefore introduced to benefit from 3D technology while maintaining a high yield. Moreover, such networks are flexible, because the number, location, and assignment of the vertical links in each layer can be decided according to the application's requirements. However, this type of network raises a number of challenges. Routing is the major problem, because removing some vertical links rules out the classical dimension-ordered algorithms. To address this issue, we explain and evaluate a deterministic routing algorithm called “Elevator First”, which guarantees, on the one hand, that a path is found whenever one exists and, on the other hand, that no deadlock occurs. Fundamentally, NoC performance is affected by (a) the micro-architecture of the routers and (b) the interconnection architecture. The router architecture has a significant effect on NoC performance because of the latency it introduces. 
We present the design and implementation of the micro-architecture of a low-latency router that implements the Elevator First routing algorithm and consumes a reasonable amount of area and power. From the architectural point of view, the number and placement of the vertical links play an important role in the performance of vertically-partially-connected 3D networks, because they affect the average hop count and the utilization of the FIFOs in the network. In addition, assigning vertical links to the routers that have no up and/or down port is an important issue that strongly influences performance. The architectural exploration of vertically-partially-connected 3D NoCs is therefore important. We define, study, and evaluate parameters that describe the behavior of the network, so as to determine the placement and assignment of the vertical links in the layers simply and efficiently. We propose a quadratic estimation method, based on these parameters, that aims to anticipate the saturation threshold. / Utilization of the third dimension can lead to a significant reduction in power and average hop count in Networks-on-Chip (NoCs). TSV technology, the most promising technology for 3D integration, offers short and fast vertical links that cope with the long-wire problem of 2D NoCs. Nonetheless, TSVs are very large and their manufacturing process is still immature, which reduces the yield of 3D-NoC-based SoCs. Therefore, the Vertically-Partially-Connected 3D-NoC has been introduced to benefit from 3D technology while keeping a high yield. Moreover, the Vertically-Partially-Connected 3D-NoC is flexible, because the number, placement, and assignment of the vertical links in each layer can be decided based on the limitations and requirements of the design. 
However, the removed vertical links between layers make it challenging to build a feasible, high-performance Vertically-Partially-Connected Mesh-based 3D-NoC. This thesis addresses these challenges. Routing is the major problem of the Vertically-Partially-Connected 3D-NoC. Since some vertical links are removed, some routers have no up and/or down port. A routing algorithm must therefore determine a path for sending a packet to an upper or lower layer, and the chosen paths must not cause deadlock in the network. To cope with this problem, we explain and evaluate a deadlock- and livelock-free routing algorithm called Elevator First. Fundamentally, NoC performance is affected by both (1) the micro-architecture of the routers and (2) the architecture of the interconnection. The router architecture has a significant effect on NoC performance, as it contributes to the transport delay. The simplicity and efficiency of the router micro-architecture are therefore critical, especially in a Vertically-Partially-Connected 3D-NoC, which already suffers from high average latency because of the removed vertical links. We therefore present the design and implementation of the micro-architecture of a router that not only transfers packets correctly and quickly according to the Elevator First routing algorithm, but also consumes a reasonable amount of area and power. From the architectural point of view, the number and placement of the vertical links play a key role in the performance of the Vertically-Partially-Connected Mesh-based 3D-NoC, since they affect the average hop count and the link and buffer utilization in the network. Furthermore, the assignment of vertical links to the routers that have no up and/or down port is an important issue that influences the performance of the 3D routers. 
Therefore, the architectural exploration of Vertically-Partially-Connected Mesh-based 3D-NoC is both important and non-trivial. We define, study, and evaluate the parameters which describe the behavior of the network. The parameters can be helpful to place and assign the vertical links in the layers effectively. Finally, we propose a quadratic-based estimation method to anticipate the saturation threshold of the network's average latency. Intégration Tridimensionnelle Réseaux sur Puce Interconnexion Routage en Réseaux SoC Routeur de Réseaux sur Puce Three Dimensional integration Network-on-Chip (NoC) Interconnection Network Routing System-on-Chip (SoC) NoC Router 004
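The routing idea summarized in this abstract (first reach a router that has a vertical link, change layer, then finish in the destination layer) can be sketched as follows. This is a simplified illustration, not the thesis's router: the `up` and `down` maps of elevator positions per layer, the nearest-elevator selection rule, and the use of plain XY routing inside a layer are assumptions, and deadlock avoidance is not modelled here.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int, int]   # (x, y, z)
XY = Tuple[int, int]


def xy_path(x: int, y: int, tx: int, ty: int) -> List[XY]:
    """Dimension-ordered hops (X first, then Y), excluding the start position."""
    hops = []
    while x != tx:
        x += 1 if tx > x else -1
        hops.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        hops.append((x, y))
    return hops


def elevator_first(src: Node, dst: Node,
                   up: Dict[int, Set[XY]],
                   down: Dict[int, Set[XY]]) -> List[Node]:
    """Return the list of routers visited from src to dst, both included."""
    x, y, z = src
    path = [src]
    while z != dst[2]:
        step = 1 if dst[2] > z else -1
        candidates = (up if step == 1 else down).get(z, set())
        if not candidates:
            raise ValueError(f"no vertical link leaves layer {z} in the needed direction")
        # Assumed policy: nearest elevator by Manhattan distance, ties broken by coordinates.
        ex, ey = min(candidates, key=lambda e: (abs(e[0] - x) + abs(e[1] - y), e))
        path += [(px, py, z) for px, py in xy_path(x, y, ex, ey)]
        x, y, z = ex, ey, z + step
        path.append((x, y, z))
    path += [(px, py, z) for px, py in xy_path(x, y, dst[0], dst[1])]
    return path
```

With a single elevator at (2, 0) between layers 0 and 1, a packet from (0, 0, 0) to (0, 1, 1) first detours to the elevator, which shows how removed vertical links lengthen paths and raise the average hop count.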
15	Securing Multiprocessor Systems-on-Chip Biswas, Arnab Kumar 16 August 2016 MHRD PhD scholarship / With Multiprocessor Systems-on-Chip (MPSoCs) pervading our lives, security issues are emerging as a serious problem, and attacks against these systems are becoming more critical and sophisticated. We have designed and implemented different hardware-based solutions to ensure the security of an MPSoC. Security-assisting modules can be implemented at different abstraction levels of an MPSoC design. We propose solutions at both the circuit and system levels of abstraction. At the VLSI circuit level, we consider the problem of noise voltage in input signals coming from the outside world. This noise voltage disturbs normal circuit operation inside a chip and causes incorrect logic values to be received. If the disturbance is caused intentionally, the security of the chip may be compromised by a glitch/transient attack. We propose an input receiver with a hysteresis characteristic that can work at voltage levels between 0.9 V and 5 V. The circuit can protect the MPSoC from glitch/transient attacks. At the system level, we propose solutions targeting the Network-on-Chip (NoC) as the on-chip communication medium. We survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, the router attack, targeted at NoC-enabled MPSoCs. We propose different monitoring-based countermeasures against routing-table-based router attacks in an MPSoC with multiple Trusted Execution Environments (TEEs). Software attacks, the most common type of attack, mainly exploit vulnerabilities such as buffer overflows. Such attacks are possible when the system lacks proper access control to memory. We propose four hardware-based mechanisms to implement the Role Based Access Control (RBAC) model in a NoC-based MPSoC. 
Network-on-Chip (NoC) Computer Security Multiprocessor Systems-on-Chips (MPSoCs) MP-SoC Role Based Access Control (RBAC) Role Based Shared Memory Access Control MPSoC Electronics Engineering
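The role-based check behind the RBAC model mentioned in this abstract can be illustrated in a few lines. This is only a software sketch of the general RBAC idea, not one of the four hardware mechanisms of the thesis: the role names, memory regions, and the mapping of processing elements to roles are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical policy: role -> set of (memory region, operation) permissions.
ROLE_PERMS: Dict[str, Set[Tuple[str, str]]] = {
    "secure_os": {("shared_ram", "read"), ("shared_ram", "write"), ("key_store", "read")},
    "user_app": {("shared_ram", "read")},
}

# Hypothetical assignment of processing elements (by id) to roles.
PE_ROLE: Dict[int, str] = {0: "secure_os", 1: "user_app"}


def allowed(pe_id: int, region: str, op: str) -> bool:
    """Grant a memory access only if the requester's role holds that permission."""
    role = PE_ROLE.get(pe_id)
    return role is not None and (region, op) in ROLE_PERMS.get(role, set())
```

Permissions attach to roles rather than to individual processing elements, so an unknown requester or an unlisted operation is denied by default.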
16	Stratégie de fiabilisation au niveau système des architectures MPSoC / Dependable Reconfigurable Processor Array (RPA) Hebert, Nicolas 06 July 2011 This thesis is set in a context where, with each technology leap, integrated circuits are produced ever earlier in the qualification phase and their technology comes ever closer to the physical limits of the material. Despite technological countermeasures, the failure rate is growing, which creates favorable conditions for the return of fault-tolerance techniques in non-critical integrated circuits. The integration density reached today allows us to consider reconfigurable processor arrays as the SoC architectures of the future. Indeed, the homogeneity of these architectures suggests possible reconfigurations of the platform that would ensure quality of service, and therefore a minimum level of reliability, in the presence of defects. New protection solutions must therefore be proposed to guarantee correct circuit operation, no longer only for a few critical sub-functions but at the level of the system architecture itself. Building on these considerations, we present an innovative distributed and dynamic protection method, D-Scale. The method consists of detecting, isolating, and recovering systems in the presence of "crash"-type errors. The detection of errors that result in a "crash" of the platform is based on a mechanism of diagnostic messages exchanged between the processing units. The recovery phase is based on a mechanism that allows the platform to reconfigure itself autonomously. A hardware and software implementation of this protection is proposed. The protection cost is kept low so that it can be integrated into future multiprocessor architectures. 
Finally, a tool for evaluating the impact of faults on the platform is also studied in order to validate the effectiveness of the protection. / This thesis is set in a context where, at each technology node, integrated circuits are designed at an earlier stage of the qualification process and CMOS technology comes closer to the physical limits of silicon. Despite technological countermeasures, we face an increasing failure rate, which creates conditions that favor the return of fault-tolerance techniques for non-critical integrated circuits. The integration density reached today allows us to consider reconfigurable processor arrays as future SoC architectures. Indeed, these homogeneous architectures suggest possible platform reconfigurations that would ensure quality of service, and consequently a minimum level of reliability, in the presence of defects. New protection solutions must therefore be proposed to ensure correct circuit operation, not only for critical sub-functions but at the level of the system architecture itself. Based on these considerations, we present an innovative dynamic and distributed protection method named D-Scale. The method consists of detecting, isolating, and recovering systems in the presence of errors that lead to a "crash" of the platform. Crash detection is based on dedicated heartbeat messages exchanged between processing elements (PEs). The recovery phase is based on an autonomous mechanism that reconfigures the platform. A hardware/software implementation was proposed and evaluated. The protection cost is kept low so that it can be integrated into future multiprocessor SoC architectures. Finally, a fault-effect analysis tool is studied in order to validate the robustness of the fault-tolerance method. MP-SoC Système tolerant aux fautes MPSoC Fault tolerant system
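The heartbeat-based crash detection and isolation mentioned in this abstract can be sketched generically as below. This is not the D-Scale protocol itself: the `HeartbeatMonitor` class, the integer time base, and the fixed timeout are assumptions, and the reconfiguration step that would follow isolation is left out.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer processing element."""

    def __init__(self, peers: List[str], timeout: int) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, int] = {p: 0 for p in peers}
        self.isolated: Set[str] = set()

    def on_heartbeat(self, peer: str, now: int) -> None:
        # Heartbeats from already isolated peers are ignored.
        if peer not in self.isolated:
            self.last_seen[peer] = now

    def check(self, now: int) -> List[str]:
        """Return peers newly considered crashed at time `now` and isolate them."""
        crashed = [p for p, t in self.last_seen.items()
                   if p not in self.isolated and now - t > self.timeout]
        self.isolated.update(crashed)
        return crashed
```

A peer that stays silent for longer than the timeout is reported once and then isolated, which corresponds to the detect and isolate steps; recovery would start from the list returned by `check`.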
17	EVALUATION OF SOURCE ROUTING FOR MESH TOPOLOGY NETWORK ON CHIP PLATFORMS MUBEEN, SAAD January 2009 Network on Chip is a scalable and flexible communication infrastructure for the design of core-based Systems on Chip. The communication performance of a NoC depends heavily on the routing algorithm. Deterministic and adaptive distributed routing algorithms have been advocated in all current NoC architectural proposals. In this thesis we make a case for the use of source routing in NoCs, especially for regular topologies such as the mesh. The advantages of source routing include in-order packet delivery, faster and simpler router design, and the possibility of mixing non-minimal paths into mainly minimal routing. We propose a method to compute paths for the various communications in such a way that traffic congestion is avoided while deadlock-free routing is ensured. We also propose an efficient scheme to encode the paths. We developed a tool in Matlab that computes paths for source routing for both general and application-specific communications. Depending on the type of traffic, the tool computes paths by selecting the best of several routing algorithms. It uses a constructive path-improvement algorithm to compute paths that give a more uniform link-load distribution, and it also generates different types of traffic. We also developed a simulator capable of simulating source routing for a mesh-topology NoC. The experiments and simulations we performed were successful, and the results show that the advantages of source routing, especially lower packet latency, more than compensate for its disadvantages. 
The results also demonstrate that source routing can be a good candidate for practical core-based SoC design using a network-on-chip communication infrastructure. Network on Chip (NoC) System on Chip (SoC) Core Based Design On Chip Communication Distributed Routing Source Routing Routing Algorithms Performance Analysis Packet Switched Network Electronics Elektronik Electrical engineering Elektroteknik Computer engineering Datorteknik
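One way a source-routed path could be packed into a packet header is two bits per hop on a mesh, with each router consuming the lowest two bits. The abstract does not describe its encoding scheme, so the direction codes, the bit order, and the function names below are assumptions made for illustration.

```python
from typing import Tuple

DIRS = "NESW"   # assumed 2-bit codes: N=0, E=1, S=2, W=3


def encode_path(hops: str) -> Tuple[int, int]:
    """Pack a hop string such as "EES" into (bits, hop_count), first hop in the low bits."""
    bits = 0
    for i, h in enumerate(hops):
        bits |= DIRS.index(h) << (2 * i)
    return bits, len(hops)


def next_hop(bits: int, count: int) -> Tuple[str, int, int]:
    """Router step: read the low two bits, then shift the remaining path down."""
    if count == 0:
        raise ValueError("path exhausted: the packet is at its destination")
    return DIRS[bits & 0b11], bits >> 2, count - 1
```

Because each router only reads and shifts a fixed-width field, no routing computation is needed per hop, which is in line with the simpler router design the abstract attributes to source routing.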
18	Estudo da viabilidade do desenvolvimento de sistemas integrados baseados em redes em chip sem processadores: sistema IPNoSys / The study of viability of development of no processor integrated system based on network-on-chip: IPNoSys system Araújo, Sílvio Roberto Fernandes de 11 April 2008 Increasing transistor integration capacity has made it possible to build complete systems, with many components, on a single chip; these are called SoCs (Systems-on-Chip). However, the interconnection subsystem can limit the scalability of SoCs, as buses do, or can be an ad hoc solution, such as a bus hierarchy. The ideal interconnection subsystem for SoCs is therefore the Network-on-Chip (NoC). NoCs allow multiple simultaneous point-to-point channels between components and can be reused in other projects. However, NoCs can increase design complexity, chip area, and dissipated power. It is therefore necessary either to change the way they are used or to change the development paradigm. Accordingly, a NoC-based system is proposed in which applications are described as packets and executed in each router along the path between source and destination, without traditional processors. To execute applications regardless of the number of instructions and of the NoC dimensions, the spiral complement algorithm was developed, which finds another destination until all instructions have been executed. The objective is therefore to study the viability of developing such a system, called the IPNoSys system. In this study, a cycle-accurate tool was developed in SystemC to simulate the system; it executes applications written in a packet description language, which was also developed for this study. 
Through the simulation tool, several results were obtained that can be used to evaluate the system's performance. The methodology used to describe an application consists of transforming the high-level application into a data-flow graph, which then becomes one or more packets. This methodology was used in three applications: a counter, a 2D DCT, and floating-point addition. The counter was used to evaluate a deadlock solution and to run a parallel application. The DCT was used for comparison with the STORM platform. Finally, the floating-point addition was used to evaluate the efficiency of a software routine that handles an instruction not implemented in hardware. The simulation results confirm the viability of developing the IPNoSys system. They show that it is possible to execute applications described as packets, sequentially or in parallel, without interruptions caused by deadlock, and that IPNoSys achieves better execution times than the STORM platform. / The increase in transistor integration capacity has enabled the development of complete systems, with numerous components, inside a single chip; these are the so-called SoCs (Systems-on-Chip). However, the interconnection subsystem used can limit the scalability of SoCs, as buses do, or be an ad hoc solution, such as a bus hierarchy. The ideal interconnection solution for SoCs is therefore the network on chip, or NoC (Network-on-Chip). NoCs allow multiple point-to-point connections between components and can be reused in different projects. However, using NoCs can increase the complexity of the system design, the chip area, and/or the dissipated power. It is therefore necessary to broaden the ways in which these systems are used or to break their development paradigm. Accordingly, a NoC-based system is proposed in which applications are described as packets and executed from router to router along the packets' path between source and destination, without the need for conventional processors. To allow applications to be executed regardless of the number of instructions and of the network dimensions, the spiral complement algorithm was developed, which re-routes packets until all the instructions they contain have been executed. The objective of this work was therefore to study the viability of developing such a system, called the IPNoSys system. In this study, a cycle-accurate simulation tool for the system was developed in SystemC; it executes applications implemented in the packet description language, which was also developed for this purpose. The tool yields various results for evaluating the operation and performance of the system. The methodology used to describe applications consists, first, of obtaining the high-level data-flow graph of the application and then of describing it, from that graph, in one or more packets. Using this methodology, three case studies were carried out: a counter, a 2D DCT, and floating-point addition. The counter was used to evaluate the system's ability to handle deadlock situations and to execute applications in parallel. The 2D DCT was used for comparisons with the STORM platform. Finally, the floating-point addition was intended to serve as the handling routine for an instruction not implemented in hardware. The simulation results are favorable regarding the viability of developing the IPNoSys system: they show that applications can be executed in the form of packets, including in parallel, without interruptions caused by possible deadlocks, and they indicate that the IPNoSys system is more efficient in execution time than the STORM platform. Sistema em chip (SoC) Redes em chip (NoC) Algoritmo spiral complement Sistema IPNoSys System-on-chip (SoC) Network-on-chip (NoC) Spiral complement algorithm IPNoSys system
19	Hiérarchie mémoire dans les systèmes intégrés multiprocesseurs construits autour de réseaux sur puce / Memory hierarchy in embedded multiprocessor system built around networks on chip Belhadj Amor, Hela 05 October 2017 Parallel multi/many-core systems that deliver high computing power at a low energy cost are now a reality. Nevertheless, exploiting the performance of these architectures depends on how efficiently the system manages data accesses. The aim of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new organization of the cache hierarchy that maximizes the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter- and intra-level transfers within the hierarchy. It requires a cache coherence protocol adapted to its specifications. The transfer of data within the hierarchy is also a determinant of system performance. In the second part, we take into account the specific communication needs of the protocol. We propose a virtualized network as an ad hoc communication medium to handle coherence traffic at a lower cost. This network links the caches of the same level to support intra-level transfers, which are specific to our protocol, in order to reduce the average access latency. / Multi/many-core parallel systems that deliver high computing power at a low energy cost are now a reality. However, exploiting the performance of these architectures depends on the efficiency of the system in managing data accesses. 
The aim of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new cache hierarchy organization that aims to maximize the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter- and intra-level transfers within the hierarchy. It requires a cache coherence protocol that suits its specifications. The transfer of data within the hierarchy is also a determinant of system performance. In the second part, we consider the specific communication needs of the protocol. We suggest using a virtualized network as an ad hoc communication medium to manage coherence traffic at a lower cost. It links the caches of the same level to support intra-level transfers, which are specific to our protocol, in order to reduce the average access latency. Hiérarchie mémoire Cohérence Réseau sur Puce (NoC) Non Uniform Cache Architecture (NUCA) Multi-Processor System-On-Chip (MPSoC) Memory hierarchy Consistency Network-On-Chip (NoC) Non Uniform Cache Architecture (NUCA) 004
20	Worst-case delay analysis of core-to-IO flows over many-cores architectures / Analyse des délais pire cas des flux entre coeur et interfaces entrées/sorties sur des architectures pluri-coeurs Abdallah, Laure 05 April 2017 Many-core architectures are more attractive than multi-core systems for designing real-time systems, because they are easier to master and can integrate a larger number of applications, potentially with different criticality levels. In embedded real-time systems, these architectures can be used as processing elements within a backbone network, because they provide a large number of input/output interfaces such as Ethernet controllers and DDR-SDRAM memory interfaces. Applications with different criticality levels can also be allocated to them. These applications communicate with one another through the many-core's network on chip (NoC), and with sensors and actuators through the Ethernet interface. To guarantee the real-time constraints of these applications, worst-case transmission times (WCTTs) must be computed for the flows between cores ("inter-core") and for the flows between cores and input/output interfaces ("core-to-I/O"). Several networks on chip (NoCs) targeting hard real-time systems have been designed on the basis of specific hardware extensions. However, none of these extensions is currently available in commercial NoC architectures, which rely on wormhole switching with round-robin arbitration. With this switching strategy, different types of interference can occur between flows on the NoC. In addition, the task mapping of critical and non-critical applications affects the contention that "core-to-I/O" flows may experience. 
These core-to-I/O flows cross two networks of different speeds: the NoC and Ethernet. On the NoC, the maximum allowed packet size is much smaller than the size of an Ethernet frame. So when an Ethernet frame is transmitted over the NoC, it is split into several packets. The frame is removed from the Ethernet interface buffer only once all of its data has been transmitted. Unfortunately, NoC congestion adds delay to packet transmission, and the Ethernet interface buffer has a limited size. As a result, Ethernet frames may be dropped. The idea is therefore to analyze worst-case transmission delays on the NoC and to reduce them in order to avoid frame drops. In this thesis, we show that the pessimism of existing WCTT computation methods and the existing placement strategies lead to Ethernet frames being dropped because of internal congestion on the NoC. Properties of networks using wormhole switching were defined and validated in order to better account for conflicts between flows. A task placement strategy that takes communications with the I/O into account was then proposed. This strategy aims to reduce the contention of flows coming from the I/O and thereby reduce their WCTTs. Results obtained with the computation method defined in this thesis show that flow WCTT values can be reduced by up to 50% compared with those obtained by existing computation methods. In addition, experimental results on real avionics applications show significant improvements in core-to-I/O flow transmission delays, up to 94%, with no significant impact on those of inter-core flows. 
These improvements stem from the proposed allocation strategy, which places applications so as to reduce the impact of non-critical flows on critical flows. These reductions in core-to-I/O WCTT prevent Ethernet frames from being dropped. / Many-core architectures are more promising hardware for designing real-time systems than multi-core systems, as they should make it easier to integrate a larger number of applications, potentially of different criticality levels. In embedded real-time systems, these architectures will be integrated within backbone Ethernet networks, as they mostly provide Ethernet controllers as Input/Output (I/O) interfaces. Thus, a number of applications of different criticality levels could be allocated on the Network-on-Chip (NoC) and required to communicate with sensors and actuators. However, the worst-case behavior of the NoC for both inter-core and core-to-I/O communications must be established. Several NoCs targeting hard real-time systems, built on specific hardware extensions, have been designed. However, none of these extensions is currently available in commercial NoC-based many-core architectures, which instead rely on wormhole switching with round-robin arbitration. With this switching strategy, interference patterns can occur between direct and indirect flows on many-cores. Besides, the mapping of both critical and non-critical applications over the NoC affects the network contention that these core-to-I/O communications experience. These core-to-I/O flows (coming from the Ethernet interface of the NoC) cross two networks of different speeds: NoC and Ethernet. On the NoC, the maximum allowed packet size is much smaller than the size of an Ethernet frame. Thus, when an Ethernet frame is transmitted over the NoC, it is divided into many packets. 
When all the data corresponding to this frame have been received by the DDR-SDRAM memory on the NoC, the frame is removed from the buffer of the Ethernet interface. In addition, congestion on the NoC, due to wormhole switching, can delay these flows. Besides, the buffer in the Ethernet interface has a limited capacity. This behavior may therefore cause Ethernet frames to be dropped. The idea is therefore to analyze the worst-case transmission delays on the NoC and to reduce the delays of the core-to-I/O flows. In this thesis, we show that the pessimism of the existing Worst-Case Traversal Time (WCTT) computing methods and the existing mapping strategies lead to Ethernet frames being dropped because of internal congestion in the NoC. We therefore demonstrate properties of such wormhole-switched NoCs that reduce the pessimism when modeling flows in contention. We then propose a mapping strategy that minimizes the contention of core-to-I/O flows in order to solve this problem. We show that WCTT values can be reduced by up to 50% compared with current state-of-the-art real-time packet schedulability analysis. These results are due to our proposed computing method modeling the real impact of the flows in contention. Besides, experimental results on real avionics applications show significant improvements in the transmission delays of core-to-I/O flows, up to 94%, without significantly affecting the transmission delays of core-to-core flows. These improvements are due to our mapping strategy, which allocates the applications so as to reduce the impact of non-critical flows on critical flows. These reductions in the WCTT of the core-to-I/O flows prevent Ethernet frames from being dropped. Network-on-Chip (NoC); Worst-case traversal time (WCTT); Wormhole switching; Mapping; I/O interfaces; Real-time systems
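The frame-drop mechanism described in the abstract above (an Ethernet frame is split into NoC packets and keeps its buffer slot until the whole frame has crossed the NoC) can be illustrated with a small Python sketch. This is not the analysis method of the thesis: the function names, the periodic-arrival assumption, the one-frame-at-a-time FIFO transfer, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def packets_per_frame(frame_bytes: int, noc_payload_bytes: int) -> int:
    """Number of NoC packets needed to carry one Ethernet frame.

    Assumes every NoC packet carries at most `noc_payload_bytes` of frame data.
    """
    if frame_bytes <= 0 or noc_payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(frame_bytes / noc_payload_bytes)


def count_dropped_frames(n_frames: int,
                         arrival_period_us: float,
                         frame_transfer_us: float,
                         buffer_frames: int) -> int:
    """Count frames dropped at a bounded Ethernet interface buffer.

    Hypothetical model: frame k arrives at time k * arrival_period_us, frames
    are sent over the NoC one at a time in FIFO order, each transfer takes
    `frame_transfer_us` (a stand-in for a per-frame worst-case delay), and a
    frame holds its buffer slot until its transfer completes.
    """
    completions = []        # completion times of all accepted frames
    last_completion = 0.0   # time at which the NoC side becomes free again
    dropped = 0
    for k in range(n_frames):
        now = k * arrival_period_us
        # Frames whose transfer has not finished yet still occupy a slot.
        occupied = sum(1 for done in completions if done > now)
        if occupied >= buffer_frames:
            dropped += 1
            continue
        start = max(now, last_completion)
        last_completion = start + frame_transfer_us
        completions.append(last_completion)
    return dropped


if __name__ == "__main__":
    # Hypothetical sizes: a 1500-byte frame over 128-byte NoC packets.
    print(packets_per_frame(1500, 128))
    # Transfer faster than the arrival period: the buffer never fills.
    print(count_dropped_frames(10, 10.0, 5.0, 2))
    # Transfer slower than the arrival period: the bounded buffer overflows.
    print(count_dropped_frames(10, 10.0, 25.0, 2))
```

In this toy model, drops appear as soon as the per-frame transfer time exceeds the arrival period for long enough to fill the buffer, which is why a tighter worst-case bound and lower contention on core-to-I/O flows matter for avoiding frame loss.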
