Global ETD Search

101	Fault tolerant techniques for asynchronous networks on chip Zhang, Guangda January 2016 Advancing semiconductor technology is boosting the core count on a single chip to achieve continuously increasing performance, posing a growing demand for scalable, efficient and reliable on-chip interconnection. However, this advance also makes the electronics increasingly vulnerable to faults. Inter-core connection is increasingly provided by Networks-on-Chip (NoCs), typically using conventional synchronous designs. Scaling makes it increasingly hard to avoid problems with clock distribution, and in many chips a single synchronous domain is inappropriate anyway. In place of the well-studied synchronous NoCs, event-driven asynchronous NoCs have emerged as a promising replacement. Asynchronous NoCs have many promising advantages over synchronous ones; however, their fault tolerance has rarely been studied. Implemented in a Quasi-Delay-Insensitive (QDI) fashion, asynchronous NoCs can achieve high timing robustness but show complicated failure scenarios in the presence of faults and behave differently from synchronous ones, posing a challenge to asynchronous circuit advocates. This research studies the impact of different faults on QDI NoC fabrics and presents thorough and systematic fault-tolerant solutions at the circuit level, providing a holistic, efficient and resilient interconnection solution for QDI NoCs. The contributions of this research include: 1) a thorough analysis of fault impact on QDI NoCs; 2) a Delay-Insensitive Redundant Check (DIRC) coding scheme protecting QDI links from transient faults; 3) a novel time-out technique detecting the fault-caused physical-layer deadlock in a QDI NoC (the adaptability of a QDI circuit to timing variation makes it vulnerable to this kind of deadlock); 4) a fine-grained recovery technique utilising a Spatial Division Multiplexing (SDM) implementation to recover the deadlocked network from a link fault. Both unprotected and protected QDI NoCs are implemented, along with a fault simulation environment, to provide a detailed performance and fault-tolerance evaluation of these techniques. The improvements to the NoC operation, together with the costs in circuit overhead and throughput, are enumerated using a typical example of QDI interconnection. 621.3815
102	Analysing Real-Time Traffic in Wormhole-Switched On-Chip Networks Wu, Taodi, Ding, Shuyang January 2016 With the increasing demand for computation capabilities, many-core processors are gaining more and more attention. As the communication subsystem of many-core processors, the Network-on-Chip (NoC) draws a lot of attention in the related research fields. A NoC is used to deliver messages among different cores. For many applications, timeliness is of great importance, especially when the application has hard real-time requirements. Thus, the worst-case end-to-end delays of all the messages passing through a NoC are of concern. Unfortunately, there is no existing analysis tool that can support multiple NoC architectures as well as provide a user-friendly interface. This thesis focuses on a wormhole-switched NoC using two different arbitration policies, Fixed Priority (FP) and Round Robin (RR). FP-based arbitration includes distinct and shared priority based arbitration policies. We have developed a timing analysis tool targeting the above NoC designs. The Graphical User Interface (GUI) of the tool simplifies its operation. The tool takes characteristics of flow sets as input, and returns results regarding the worst-case end-to-end delay of each flow. These results can be used to assist the design of real-time applications on the corresponding platform. A number of experiments have been conducted to compare different arbitration mechanisms using the developed tool. The evaluation focuses on the effect of different parameters, including the number of flows and the number of virtual channels in a NoC, and the number of hops of each flow. In the first set of experiments, we focus on the schedulability ratio achieved by different arbitration policies with respect to the number of flows. The second set of experiments focuses on the comparison between NoCs with different numbers of virtual channels. In the last set of experiments, we compare different arbitration mechanisms with respect to the worst-case end-to-end latencies. network-on-chip wormhole switching fixed-priority round robin schedulability analysis Computer Sciences Datavetenskap (datalogi)
103	Consolidating Automotive Real-Time Applications on Many-Core Platforms Becker, Matthias January 2017 Automotive systems have transitioned from basic transportation utilities to sophisticated systems. The rapid increase in functionality comes along with a steep increase in software complexity. This manifests itself in a surge in the number of functionalities as well as in the complexity of existing functions. To cope with this transition, current trends shift away from today’s distributed architectures towards integrated architectures, where previously distributed functionality is consolidated on fewer, more powerful computers. This can ease the integration process, reduce the hardware complexity, and ultimately save costs. One promising hardware platform for these powerful embedded computers is the many-core processor. A many-core processor hosts a vast number of compute cores that are partitioned on tiles, which are connected by a Network-on-Chip. These natural partitions can provide exclusive execution spaces for different applications, since most resources are not shared among them. Hence, natural building blocks towards temporally and spatially separated execution spaces exist as a result of the hardware architecture. In addition to the traditional task-local deadlines, automotive applications are often subject to timing constraints on the data propagation through a chain of semantically related tasks. Such requirements pose challenges to system designers, as they are only able to verify them after the system synthesis (i.e. very late in the design process). In this thesis, we present methods that transform complex timing constraints on the data propagation delay into precedence constraints between individual jobs. An execution framework for the cluster of the many-core is proposed that allows access to cluster-external memory while avoiding contention on shared resources by design. A partitioning and configuration of the Network-on-Chip provides isolation between the different applications and reduces the access time from the clusters to external memory. Moreover, methods that facilitate the verification of data propagation delays in each development step are provided. Many-Core Automotive Network-on-Chip Real-Time Timing analysis Embedded Systems Inbäddad systemteknik
104	Proximity coherence for chip-multiprocessors Barrow-Williams, Nick January 2011 Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power-limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating-system-assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, it is possible to improve the efficiency of communication, specifically in chip-multiprocessors. This thesis explores such a design - Proximity Coherence - a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. 621.39
105	Adaptive NoC for reconfigurable SoC / NoC adaptatif pour SoC reconfigurable Pratomo, Istas 08 November 2013 Modern embedded systems-on-chip integrate billions of transistors and heterogeneous integrated components to provide all the functionality required by current applications. The communication solution in this context relies on the notion of a network on chip (NoC). The main goals of NoC design are to achieve high performance at the lowest possible implementation cost (notably in area and power consumption). The NoC designer must therefore take into account the impact of the NoC parameters on the trade-off between network performance and the silicon area required for its implementation. The use of deep sub-micron technology brings variability and ageing phenomena that cause Single Event Upsets (SEUs). An SEU changes the state of a bit, which causes a data transmission in the NoC to fail. Implementing routing that supports fault tolerance is therefore necessary. In this thesis, we first propose an evaluation of the impact of NoC design parameters on its performance. The result guides the designer in choosing and tuning the network parameters so as to avoid performance degradation. Second, we propose new fault-tolerant adaptive routing algorithms: one for a 2D mesh network, called Gradient, and one for a 3D mesh network, called Diagonal. These algorithms adapt and offer sequences of alternative paths for packets when the main path is faulty. We also evaluated the implementation cost of Gradient on a current FPGA. All of this work was validated and characterised by simulation and implemented on FPGA. The results compare the performance of our algorithms with state-of-the-art algorithms. / Chips will be designed with billions of transistors and heterogeneous components integrated to provide the full functionality of current embedded-system applications. These applications also require a highly parallel and flexible communication architecture through a regular interconnection network. The emerging solution that can fulfil this requirement is the Network-on-Chip (NoC). Designing an ideal NoC with high throughput, low latency, minimal resource usage, minimal power consumption and small area is very time consuming. Each application requires different levels of QoS, such as minimum throughput, delay and jitter. In this thesis, we first propose an evaluation of the impact of design parameters on NoC performance. We evaluate the impact of NoC design parameters on the performance of an adaptive NoC. The objective is to evaluate how strongly changing a parameter value affects performance. The results show that accurately choosing and adjusting the network parameters can avoid performance degradation. This can be considered as the control mechanism in an adaptive NoC to avoid degradation of the NoC's QoS. The use of deep sub-micron technology in embedded systems and its process variability cause Single Event Upsets (SEUs) and circuit ageing. SEUs and circuit ageing are the major problems that cause packet transmission failures in a NoC. Implementing fault-tolerant routing techniques in NoC switching instead of adding virtual channels is the best solution to avoid faults in a NoC. The communication performance of a NoC depends heavily on the routing algorithm. Adaptive routing algorithms, such as fault-tolerant ones, have been proposed for deadlock avoidance and load balancing. This thesis proposes a novel adaptive fault-tolerant routing algorithm for 2D mesh called Gradient and one for 3D mesh called Diagonal. Both algorithms consider sequences of alternative paths for packets when the main path fails. The proposed algorithms tolerate faults under worst-case traffic conditions in NoCs. The number of hops, the number of alternative paths, latency and throughput in a faulty network are determined and compared with other 2D mesh routing algorithms. Finally, we implemented the Gradient routing algorithm on FPGA. All of this work was validated and characterised through simulation and implemented on FPGA. The results compare the performance of the proposed method with existing related methods in several scenarios. NoC Réseau sur puce Routage adaptatif Tolérance aux fautes NoC Network on chip Adaptive routing Fault tolerant
106	A chip multiprocessor for a large-scale neural simulator Painkras, Eustace January 2013 A Chip Multiprocessor for a Large-scale Neural Simulator. Eustace Painkras. A thesis submitted to The University of Manchester for the degree of Doctor of Philosophy, 17 December 2012. The modelling and simulation of large-scale spiking neural networks in biological real-time places very high demands on computational processing capabilities and communications infrastructure. These demands are difficult to satisfy even with powerful general-purpose high-performance computers. Taking advantage of the remarkable progress in semiconductor technologies it is now possible to design and build an application-driven platform to support large-scale spiking neural network simulations. This research investigates the design and implementation of a power-efficient chip multiprocessor (CMP) which constitutes the basic building block of a spiking neural network modelling and simulation platform. The neural modelling requirements of many processing elements, high-fanout communications and local memory are addressed in the design and implementation of the low-level modules in the design hierarchy as well as in the CMP. By focusing on a power-efficient design, the energy consumption and related cost of SpiNNaker, the massively-parallel computation engine, are kept low compared with other state-of-the-art hardware neural simulators. The SpiNNaker CMP is composed of many simple power-efficient processors with small local memories, asynchronous networks-on-chip and numerous bespoke modules specifically designed to serve the demands of neural computation with a globally asynchronous, locally synchronous (GALS) architecture. The SpiNNaker CMP, realised as part of this research, fulfills the demands of neural simulation in a power-efficient and scalable manner, with added fault-tolerance features. The CMPs have, to date, been incorporated into three versions of SpiNNaker system PCBs with up to 48 chips onboard. All chips on the PCBs are performing successfully, during both functional testing and their targeted role of neural simulation. 004
107	Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors Lodde, Mario 04 February 2014 The cache hierarchy and the network-on-chip (NoC) are two key components of chip multiprocessors (CMPs). Most of the traffic in the NoC is due to messages that the caches send as dictated by the coherence protocol. The amount of traffic, the percentage of short and long messages, and the traffic pattern in general vary depending on the cache geometry and the coherence protocol. The NoC architecture and the cache hierarchy are in fact tightly coupled, and these two components must be designed and evaluated together to study how varying one affects the performance of the other. Moreover, each component must adapt to the requirements and opportunities of the other, and vice versa. Normally, different classes of messages are sent over different virtual networks or over NoCs with different bandwidths, separating long and short messages. However, messages can also be classified according to the type of information they provide: some messages, such as data requests, need fields to store information (block address, request type, etc.); others, such as acknowledgement (ACK) messages, provide no information other than the ID of the destination node: they provide only temporal information, in the sense that receiving an ACK indicates that the source node has received the message it is answering with the ACK and has completed all the operations determined by the coherence protocol. This second class of message does not need much bandwidth: latency is much more important, since the destination node is typically blocked waiting to receive them. In this thesis, a dedicated network is developed to transmit the second class of messages; the network is very simple and fast, and delivers ACKs with a latency of a few clock cycles. By reducing the latency and the NoC traffic due to ACKs, it is possible to: 1) speed up the invalidation phase of writes in a system that uses a directory-based coherence protocol; 2) improve the performance of a broadcast-based coherence protocol, up to performance comparable to that of a directory protocol but without the area cost of storing the directory; 3) implement an efficient dynamic mapping of blocks to the last-level caches, with the aim of bringing blocks as close as possible to the cores that use them. The final goal is to obtain a co-design of NoC and cache hierarchy that minimises the scalability problems of coherence protocols. As an overarching final goal, the aim is to implement a CMP with dynamic allocation of cache and network resources, so that these resources can be partitioned efficiently and independently in order to assign different partitions to different applications in a virtualised environment. / Lodde, M. (2014). Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/35325 / TESIS Chip multiprocessors Computer architecture Cache hierarchy Cache coherence protocols Network-on-chip
108	Network-on-Chip Synchronization Buckler, Mark 07 November 2014 Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs, and thereby SoCs as a whole, by reducing this synchronization latency. First, a survey of NoC improvement techniques is presented. One such improvement technique, a multi-layer NoC, has been successfully simulated. Given that DVFS is one of the most commonly used techniques, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers, which require only a single cycle of latency, have been proposed. To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers. 
Network-on-Chip Synchronizer VLSI Computer and Systems Architecture Digital Circuits Hardware Systems
109	Protocols and algorithms for secure Software Defined Network on Chip (SDNoC) Ellinidou, Soultana 16 February 2021 Under the umbrella of Internet of Things (IoT) and Internet of Everything (IoE), new applications with diverse requirements have emerged and the traditional System-on-Chips (SoCs) were unable to support them. Hence, new versatile SoC architectures were designed, like chiplets and Cloud-of-Chips (CoC). A key component of every SoC is the on-chip interconnect technology, which is responsible for the communication between the Processing Elements (PEs) of a system. Network-on-Chip (NoC) is the current widely used interconnect technology, which is a layered, scalable approach. However, in recent years the high structural complexity, together with the functional diversity and the challenges (QoS, high latency, security) of NoC, has motivated researchers to explore alternatives to it. One NoC alternative that recently gained attention is the Software Defined Network-on-Chip (SDNoC). SDNoC originated from Software Defined Network (SDN) technology, which supports the dynamic nature of future networks and applications, while lowering operating costs through simplified hardware and software. Nevertheless, SDN technology was designed for large-scale networks. Thus, in order to be ported to micro-scale networks, proper alterations and new hardware architectures need to be considered. In this thesis, an exploration of how to embed SDN technology within micro-scale networks in order to provide secure and manageable communication, improve the network performance and reduce the hardware complexity is presented. Specifically, the design and implementation of an SDNoC architecture is thoroughly described, followed by the creation and evaluation of a novel SDNoC communication protocol, called MicroLET, in order to provide secure and efficient communication within system components. Furthermore, the security aspect of SDNoC constitutes a big gap in the literature. Hence, it has been addressed by proposing a secure SDNoC Group Key Agreement (GKA) communication protocol, called SSPSoC, followed by the exploration of Byzantine faults within SDNoC and the investigation of a novel Hardware Trojan (HT) attack together with a proposed detection and defence method. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished Sciences de l'ingénieur Software Defined Network-on-Chip routing algorithms NoC Hardware Trojan Byzantine Faults Group Key Agreement
110	Fault-Tolerant Nostrum NoC on FPGA for the ForSyDe/NoC System Generator Tool Suite Gkalea, Salvator January 2014 Moore’s law is the observation that over the years, the transistor density will increase, allowing billions of transistors to be integrated on a single chip. Over the last two decades, Moore’s law has enabled the implementation of complex systems on a single chip (SoCs). The challenge of the System-on-Chip (SoC) era was the demand for an efficient communication mechanism between the growing number of processing cores on the chip. The outcome established a new interconnection scheme (among others, like crossbars, rings and buses) based on telecommunication networks, and the Network-on-Chip (NoC) appeared on the scene. The NoC has been developed not only to support systems embedded into a single processor, but also to support a set of processors embedded on a single chip. Therefore, the Multi-Processor System on Chip (MPSoC) has arisen, which incorporates processing elements, memories and I/O with a fixed interconnection infrastructure in a complete integrated system. In such systems, the NoC constitutes the backbone of the communication architecture that targets future SoCs composed of hundreds of processing elements. Besides that, together with the progress of deep sub-micron technology, some drawbacks have arisen. The communication efficiency and the reliability of the systems rely on the proper functionality of the NoC for on-chip data communication. A NoC must deal with the susceptibility of transistors to failure, which indicates the demand for a fault-tolerant communication infrastructure. Such an infrastructure needs a mechanism that can deal with the different classes of faults (transient, intermittent and permanent [11]) that can occur in the communication network. In this thesis, different algorithms are investigated that implement fault-tolerant techniques for permanent faults in the NoC. The outcome would be to deliver a fault-tolerant mechanism for the NoC System Generator Tool [29], which is a Network-on-Chip research effort carried out at the Royal Institute of Technology. The fault-tolerant algorithm implemented in the switch to reroute packets around faulty communication links is described explicitly. Fault-Tolerant Nostrum Network-on-Chip FPGA ForSyDe Embedded Systems Inbäddad systemteknik
