Global ETD Search

1	Low Power Design Methodology and Photonics Networks on Chip for Multiprocessor System on Chip
Hamwi, Khawla, 30 May 2013
Multiprocessor systems on chip (MPSoCs) are emerging as key components of high-performance embedded systems. MPSoC design raises several challenges, notably those that come from the interconnect infrastructure. Networks-on-Chip (NoCs), which must satisfy multiple constraints, are a promising solution to these challenges. The ITRS predicts that hundreds of cores will be used in future generations of systems on chip (SoCs), which raises issues of scalability, bandwidth, and implementation cost for NoCs. These issues arise within the various technological trends in semiconductors and photonics. This PhD thesis advocates NoC synthesis as the most appropriate approach to exploit these technological trends and keep pace with application requirements. Starting from several design methodologies based on FPGA technology and low-power estimation techniques (HLS) for several IPs, we propose an ASIC implementation based on 3D Tezzaron technology. Multi-FPGA technology is used to validate an MPSoC design with up to 64 processors connected by a Butterfly NoC. NoC synthesis is based on clustering masters and slaves, which generates asymmetric architectures in which very-high-bandwidth requests are served by an Optical NoC (ONoC) while lower-bandwidth requests are handled by an electronic NoC. A linear programming formulation is proposed to solve the NoC synthesis problem.
Keywords: Power consumption, MPSoC, NoC, Optical NoC, Hybrid NoC synthesis
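The abstract solves the optical/electronic split with a linear program whose formulation is not given. As a much simpler stand-in (a greedy heuristic, not the thesis's LP, and with no master/slave clustering), the sketch below sends master-to-slave flows to a hypothetical optical NoC in decreasing order of bandwidth while an assumed ONoC capacity allows, and routes everything else over the electronic NoC. The `Flow` type, `onoc_capacity`, `min_optical_bw`, and all bandwidth values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A master-to-slave communication demand (hypothetical units: Gb/s)."""
    master: str
    slave: str
    bandwidth: float


def split_flows(flows, onoc_capacity, min_optical_bw):
    """Greedy stand-in for LP-based hybrid NoC synthesis (assumption).

    Flows at or above `min_optical_bw` are placed on the optical NoC,
    largest first, while the hypothetical optical capacity allows; every
    other flow is routed over the electronic NoC.
    """
    optical, electronic = [], []
    remaining = onoc_capacity
    for flow in sorted(flows, key=lambda f: f.bandwidth, reverse=True):
        if min_optical_bw <= flow.bandwidth <= remaining:
            optical.append(flow)
            remaining -= flow.bandwidth
        else:
            electronic.append(flow)
    return optical, electronic


if __name__ == "__main__":
    demo = [
        Flow("cpu0", "mem0", 40.0),
        Flow("cpu1", "mem1", 25.0),
        Flow("dma", "mem0", 30.0),
        Flow("cpu0", "uart", 0.1),
    ]
    opt, elec = split_flows(demo, onoc_capacity=70.0, min_optical_bw=10.0)
    print("ONoC:", [(f.master, f.slave) for f in opt])
    print("ENoC:", [(f.master, f.slave) for f in elec])
```

Here the 40 and 30 Gb/s flows fill the assumed 70 Gb/s optical budget, so the 25 Gb/s flow falls back to the electronic NoC; an LP would instead optimize the assignment globally.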
2	Modeling and Management of Traffic in Multi-FPGA Networks-on-Chip
Dorai, Atef, 07 July 2017
With the increasing complexity of systems on chip, the design of efficient embedded systems dedicated to multimedia applications must integrate effective communication interconnects such as Networks-on-Chip (NoCs). Given the limited resources of a single FPGA, multi-FPGA platforms are considered the most appropriate means to experiment with, emulate, and evaluate such large systems. Deployment often involves partitioning the NoC over several FPGAs and replacing internal communication links with external ones. The limitation of this solution is that, as FPGAs evolve, their I/O resources become relatively scarcer from one generation to the next, which reduces the available inter-FPGA bandwidth. Currently, the number of inter-FPGA signals is considered a major obstacle to prototyping a large NoC on a multi-FPGA platform, because routers need more signals than there are FPGA I/Os available. Inter-FPGA links must therefore be shared between routers, which creates significant bottlenecks. Since the ratio of logic capacity to I/O count keeps increasing, albeit slowly, with each FPGA generation, this technological bottleneck will persist in future system designs. The main contributions of this thesis are as follows. (1) Two collision management architectures were developed, one based on random access (Backoff) and the other on scheduled access (round-robin). Timing and resource comparisons were made to select the best-performing access method for prototyping a NoC on a multi-FPGA platform. The Backoff-based sub-NoC architecture effectively shares external links among multiple routers with a minimum number of collisions and balances access among all routers. The new Backoff-based inter-FPGA architecture achieves lower latency with fewer resources than other solutions such as Round-Robin (RR) and the Hierarchical Round-Robin Arbiter (HRRA). (2) A modeling methodology based on linear regression was developed to estimate the resources used by each architecture; its estimates show considerably larger overestimations for round-robin than for Backoff. (3) A NoC architecture dedicated to multimedia applications was proposed, whose objective is to transmit traffic with different priority levels under appropriate conditions. In this multimedia NoC, the physical links are doubled instead of using virtual channels, which allows high-priority traffic to make up for delays and ensures quality of service. In addition, a simple arbiter integrated in the routers handles the priority level of each packet. This architecture was compared, using several test partitionings, with traditional NoC architectures with quality of service (based on virtual channels) and without it (handshake NoC). Lastly, an analytical study was proposed to estimate the number of APs the multimedia NoC needs when deployed on multi-FPGA systems in order to meet user requirements.
Keywords: NoC, Multi-FPGA, Collision management architecture, Multimedia NoC
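The abstract compares two ways of sharing an external inter-FPGA link among routers: scheduled access (round-robin) and random access (Backoff). The hardware itself is not described, so the Python below is only a behavioral sketch of a generic round-robin arbiter and of the classic binary exponential backoff rule. The class and function names, the `max_exponent` cap, and the choice of binary exponential backoff (rather than whatever backoff policy the thesis uses) are assumptions.

```python
import random


class RoundRobinArbiter:
    """Grants one requester per cycle, rotating priority after each grant."""

    def __init__(self, n_requesters):
        self.n = n_requesters
        self.pointer = 0  # requester with the highest priority next cycle

    def grant(self, requests):
        """Return the index granted the shared link, or None if no one requests."""
        for offset in range(self.n):
            idx = (self.pointer + offset) % self.n
            if requests[idx]:
                # The winner drops to lowest priority for the next cycle.
                self.pointer = (idx + 1) % self.n
                return idx
        return None


def backoff_delay(collisions, rng, max_exponent=10):
    """Binary exponential backoff: wait a random number of slots in [0, 2**k - 1],
    where k is the collision count capped at max_exponent."""
    k = min(collisions, max_exponent)
    return rng.randrange(2 ** k)


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    print([arb.grant([True, False, True, False]) for _ in range(3)])
    rng = random.Random(0)
    print([backoff_delay(c, rng) for c in range(5)])
```

The round-robin arbiter needs per-cycle bookkeeping across all requesters, whereas backoff lets each router decide locally after a collision; that locality is one plausible reason a backoff scheme can use fewer resources, though the abstract does not say so.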
3	Networks-on-chip: modeling, system-level abstraction, and application-specific architecture customization
Morgan, Ahmed Abdel Fattah Hassan, 19 October 2011
This dissertation proposes different methodologies, with their associated models, to customize the architectural design of Application-Specific Networks-on-Chip (ASNoCs). Specifically, system-level evaluation models are presented, and architecture generation methodologies are built on them so that the designer can generate the most efficient architecture for a given NoC-based application. Our system-level methodologies enable the designer to discover flaws early in the design process and to quickly investigate the effect of various design choices on the resulting NoC cost and performance. This dissertation makes four main contributions. First, we propose power and reliability evaluation models at the system level to allow quick evaluation of different design decisions. The power model captures the power consumption of NoC routers and links, whereas the reliability model captures the probability that packets are affected by on-chip noise sources. Second, we propose a cost-efficient architecture generation methodology for NoCs based on network partitioning techniques. The methodology partially customizes the on-chip network architecture with respect to two cost metrics: power and area. The partitioning technique is formulated in NoC terms on the basis of the Fiduccia-Mattheyses graph partitioning algorithm. Compared with other partitioning techniques, our scheme is found to be the most efficient for NoCs. We further analyze the effect of network partitioning on NoC power, area, and delay; this analysis proves that network partitioning guarantees an area reduction, and the power and delay efficiency of network partitioning for NoCs is formulated mathematically. Experimental results show that the proposed methodology efficiently reduces the power and area costs of NoCs relative to both standard and previous custom architecture generation techniques. Third, we propose a multi-objective Genetic Algorithm (GA)-based optimization methodology for full-custom NoC architectures. For any application, the designer can steer the optimization process through different optimization weight factors. The methodology is evaluated on different NoC benchmark applications as case studies. Results show that the architectures it generates outperform those generated by other techniques with respect to power, area, delay, reliability, and the combination of the four metrics, and its running time is an order of magnitude shorter than that of previous architecture optimization techniques. Fourth, we propose a multi-objective GA-based methodology to optimize the use, in NoCs, of standard architectures previously proposed for computer networks. The methodology combines the best selection of a standard NoC architecture with the optimal mapping of application cores onto that architecture, and it is further used to carry out an application-specific, mapping-oriented evaluation of different standard NoC architectures. Experimental results show that the mappings it achieves outperform those generated by previous mapping techniques with respect to power, area, delay, reliability, and the combination of the four metrics. This research aims to validate various design decisions quickly by proposing system-level power and reliability evaluation models. Moreover, the dissertation presents three application-specific methodologies to customize the three main categories of architectures currently used to implement on-chip networks: semi-custom, full-custom, and standard architectures. These methodologies consider several NoC metrics simultaneously: power, area, delay, and reliability. We believe they bridge an open gap in NoC research by matching the on-chip network architecture to the characteristics and rapidly growing requirements of modern NoC applications.
Degree level: Graduate
Keywords: ASNoC, NoC-based application
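The second contribution builds on Fiduccia-Mattheyses (FM) partitioning, whose NoC-specific formulation is not given in the abstract. The sketch below shows one simplified FM pass on a weighted graph (not a hypergraph, and without FM's gain-bucket data structure): repeatedly move the unlocked node whose move most reduces the cut, subject to a balance limit, lock it, and at the end keep only the prefix of moves that gave the smallest cut. Node names, edge weights (think of them as communication volumes between cores), and `max_imbalance` are illustrative assumptions.

```python
def cut_weight(edges, side):
    """Total weight of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if side[u] != side[v])


def fm_pass(nodes, edges, side, max_imbalance=2):
    """One simplified Fiduccia-Mattheyses pass on a weighted graph bipartition.

    side maps node -> 0 or 1. Each node moves at most once (then is locked);
    moves must keep |size0 - size1| <= max_imbalance. Moves after the best
    intermediate cut are undone. Returns (new_side, best_cut).
    """
    side = dict(side)
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    def gain(n):
        # External weight minus internal weight = cut reduction if n switches side.
        return sum(w if side[m] != side[n] else -w for m, w in adj[n])

    locked, moves = set(), []
    cut = cut_weight(edges, side)
    best_cut, best_len = cut, 0
    for _ in nodes:
        sizes = [sum(1 for n in nodes if side[n] == s) for s in (0, 1)]
        candidates = []
        for n in nodes:
            if n in locked:
                continue
            new_sizes = list(sizes)
            new_sizes[side[n]] -= 1
            new_sizes[1 - side[n]] += 1
            if abs(new_sizes[0] - new_sizes[1]) <= max_imbalance:
                candidates.append(n)
        if not candidates:
            break
        n = max(candidates, key=gain)  # ties go to the earliest node in `nodes`
        cut -= gain(n)
        side[n] = 1 - side[n]
        locked.add(n)
        moves.append(n)
        if cut < best_cut:
            best_cut, best_len = cut, len(moves)
    for n in moves[best_len:]:
        side[n] = 1 - side[n]  # roll back moves past the best prefix
    return side, best_cut


if __name__ == "__main__":
    edges = {("a", "b"): 5, ("c", "d"): 5, ("a", "c"): 1, ("b", "d"): 1}
    start = {"a": 0, "c": 0, "b": 1, "d": 1}  # poor start: cut = 10
    print(fm_pass(["a", "b", "c", "d"], edges, start))
```

Allowing temporary negative-gain moves and then rolling back to the best prefix is what lets FM escape some local minima that a purely greedy swap would get stuck in.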
4	SPAcENoCs: A Scalable Platform for FPGA Accelerated Emulator of NoCs
Chen, Guangming, 03 October 2013
The majority of modern high-performance computing systems employ on-chip multiprocessors. As the number of on-chip cores soars, traditional non-scalable communication infrastructures, such as shared buses or crossbars, can no longer accommodate the increasing communication demand of modern multi-core chips. The emerging Network-on-Chip (NoC) interconnection scheme provides a scalable, robust, and power-efficient solution that also satisfies both bandwidth and latency requirements. A tool that enables swift exploration of the vast NoC design space is therefore in great demand under tight research and development schedules. Building on AcENoCs, an NoC simulator based on hardware/software co-design that targets large simulatable network sizes, SPAcENoCs (Scalable Platform for FPGA Accelerated Emulator of NoCs) employs Time-Division Multiplexing (TDM) techniques to simulate even larger NoCs without sacrificing the simulation speed and cycle accuracy highlighted in the AcENoCs work. This work focuses on reorganizing the existing hardware/software co-designed frameworks so that TDM techniques can be applied. While both frameworks require redesign, the major effort involves reconstructing the hardware framework by adding data buffers and associated logic to ensure that data generated in different time divisions are properly preserved and transmitted. Various design tradeoffs between hardware budget and simulation performance are also discussed and explored. During development, device virtualization and generic programming techniques are introduced to overcome the verification challenges commonly seen in hardware/software co-designed systems. The synthesis results of various design options suggest that the device can accommodate the simulation of a 9 × 6 network, more than twice the largest size supported by AcENoCs. Based on the simulation results of AcENoCs, the estimated speedup of SPAcENoCs over a software simulator for the 9 × 6 NoC is around 28x to 94x, twice that achieved by AcENoCs on a smaller network.
Keywords: NoC, FPGA, Simulator
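TDM lets a fixed pool of physical router instances on the FPGA emulate a larger logical network by processing different logical routers in successive time slots, with buffers preserving each router's data between slots. The sketch below only illustrates that bookkeeping under an assumed row-major, modulo assignment; the function name, the assignment policy, and the unit count of 27 are hypothetical and are not SPAcENoCs's actual scheme.

```python
def tdm_schedule(width, height, physical_units):
    """Map each logical router (x, y) of a width x height mesh to
    (physical unit, time slot) under a hypothetical row-major policy.

    Logical router i runs on unit i % P during slot i // P, so one simulated
    NoC cycle takes ceil(N / P) time slots. Returns (mapping, slots_per_cycle).
    """
    n = width * height
    slots_per_cycle = -(-n // physical_units)  # ceiling division
    mapping = {}
    for i in range(n):
        x, y = i % width, i // width
        mapping[(x, y)] = (i % physical_units, i // physical_units)
    return mapping, slots_per_cycle


if __name__ == "__main__":
    mapping, slots = tdm_schedule(9, 6, physical_units=27)
    print(slots, mapping[(0, 0)], mapping[(8, 5)])
```

With 27 hypothetical units, a 9 × 6 mesh needs 2 slots per simulated cycle; fewer units save FPGA area at the cost of emulation speed, which is the hardware-budget versus simulation-performance tradeoff mentioned in the abstract.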
5	STT-MRAM Based NoC Buffer Design
Vikram Kulkarni, Nikhil, August 2012
As Chip Multiprocessor (CMP) design moves toward many-core architectures, communication delay in the Network-on-Chip (NoC) becomes a major bottleneck. STT-MRAM (Spin-Transfer Torque Magnetic RAM), an emerging non-volatile memory, provides substantial power and area savings, near-zero leakage power, and higher memory density than conventional SRAM. However, STT-MRAM suffers from inherent drawbacks such as multi-cycle write latency and high write power consumption. These problems must be addressed to obtain an efficient design that uses STT-MRAM instead of traditional SRAM for NoC input buffers. Motivated by the short intra-router latency, a previously proposed write-latency reduction technique that sacrifices retention time is explored, and a hybrid input buffer design using both SRAM and STT-MRAM is proposed to hide the long write latency efficiently. Because simple data migration in the hybrid buffer consumes more dynamic power than SRAM, a lazy migration scheme that reduces the hybrid buffer's dynamic power consumption is also proposed.
Keywords: STT-MRAM, NoC, Buffer
6	Handshake and Circulation Flow Control in Nanophotonic Interconnects
Jayabalan, Jagadish, August 2012
Nanophotonics has been proposed for designing low-latency, high-bandwidth Networks-on-Chip (NoCs) for future Chip Multiprocessors (CMPs). Recent nanophotonic NoC designs adopt token-based arbitration coupled with credit-based flow control, which leads to low bandwidth utilization. This thesis proposes two handshake schemes for nanophotonic interconnects in CMPs, Global Handshake (GHS) and Distributed Handshake (DHS), which eliminate traditional credit-based flow control, reduce the average token waiting time, and thereby improve network throughput. Furthermore, we enhance the basic handshake schemes with setaside buffer and circulation techniques to overcome Head-of-Line (HOL) blocking. Evaluations show that the proposed handshake schemes improve network throughput by up to 11x under synthetic workloads. With trace traffic extracted from real applications, the handshake schemes reduce communication delay by up to 55%. The basic handshake schemes add only 0.4% hardware overhead for optical components and negligible power consumption. In addition, their performance is independent of on-chip buffer space, which makes them feasible for large-scale nanophotonic interconnect designs.
Keywords: Nanophotonics, NOC, Handshake, Circulation
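For context on the baseline these handshake schemes replace, the sketch below models generic credit-based flow control from the sender's side: the sender holds one credit per free slot in the receiver's buffer, spends a credit for each flit it sends, and stalls at zero credits until the receiver frees a slot and returns one. It is a simplification: credits return instantly here, whereas real links incur a credit round-trip delay, and it does not model the token arbitration or the GHS/DHS handshakes. The class and method names are hypothetical.

```python
class CreditChannel:
    """Sender-side model of generic credit-based flow control (baseline scheme)."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # one credit per free receiver buffer slot
        self.in_flight = []          # flits currently held in the receiver buffer

    def try_send(self, flit):
        """Send a flit if a credit is available; otherwise the sender must stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.in_flight.append(flit)
        return True

    def receiver_consume(self):
        """Receiver forwards its oldest flit downstream and returns one credit."""
        flit = self.in_flight.pop(0)
        self.credits += 1
        return flit


if __name__ == "__main__":
    ch = CreditChannel(buffer_slots=2)
    print([ch.try_send(f) for f in ("f0", "f1", "f2")])  # third send stalls
    ch.receiver_consume()
    print(ch.try_send("f2"))
```

Because the sender's throughput is bounded by how many credits it holds, performance in this scheme depends on buffer depth; the abstract's point that the handshake schemes are independent of on-chip buffer space contrasts with this behavior.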
7	VLPW: The Very Long Packet Window Architecture for High Throughput Network-On-Chip Router Designs
Gu, Haiyin, August 2011
Chip Multiprocessor (CMP) architectures have become mainstream in processor design. With a large number of cores, a Network-on-Chip (NoC) provides a scalable communication method for CMPs. The NoC must be carefully designed to provide low latency and high throughput in a resource-constrained environment. To improve network throughput, we propose the Very Long Packet Window (VLPW) architecture for NoC router design, which aims to close the throughput gap between state-of-the-art on-chip routers and the ideal interconnect fabric. To this end, VLPW optimizes Switch Allocation (SA) efficiency. Existing SA normally applies round-robin scheduling to arbitrate among packets targeting the same output port. However, this simple approach suffers from low arbitration efficiency and results in low network throughput. Instead of relying solely on simple switch scheduling, the VLPW router globally schedules all input packets, resolves output conflicts, and achieves high throughput. Within the VLPW architecture, we propose two scheduling schemes: Global Fairness and Global Diversity. Simulation results show that the VLPW router achieves more than 20% throughput improvement without negatively affecting zero-load latency.
Keywords: NOC, High Throughput
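The abstract contrasts per-output round-robin switch allocation with VLPW's global scheduling of all input packets, but gives no algorithmic detail. The sketch below illustrates only the general point with two stand-ins: one iteration of a generic separable input-first allocator with round-robin arbiters (pointer updates omitted), and a maximum bipartite matching (Kuhn's augmenting-path algorithm) computed over all requests at once. Neither is the Global Fairness or Global Diversity scheme; the function names and the request encoding (`requests[i]` is the set of output ports input `i` wants) are assumptions.

```python
def separable_input_first(requests, n_outputs, in_ptr, out_ptr):
    """One iteration of a separable input-first allocator with round-robin arbiters.

    Each input picks one requested output (round-robin from in_ptr[i]); each
    output then grants one input that picked it (round-robin from out_ptr[o]).
    Returns {input: output} grants.
    """
    picks = {}
    for i, wanted in enumerate(requests):
        for k in range(n_outputs):
            o = (in_ptr[i] + k) % n_outputs
            if o in wanted:
                picks[i] = o
                break
    grants = {}
    n_inputs = len(requests)
    for o in range(n_outputs):
        contenders = [i for i, p in picks.items() if p == o]
        for k in range(n_inputs):
            i = (out_ptr[o] + k) % n_inputs
            if i in contenders:
                grants[i] = o
                break
    return grants


def global_matching(requests, n_outputs):
    """Maximum bipartite matching of inputs to outputs via augmenting paths."""
    owner = [None] * n_outputs  # owner[o] = input currently matched to output o

    def try_assign(i, seen):
        for o in sorted(requests[i]):
            if o in seen:
                continue
            seen.add(o)
            # Take a free output, or re-route its current owner elsewhere.
            if owner[o] is None or try_assign(owner[o], seen):
                owner[o] = i
                return True
        return False

    for i in range(len(requests)):
        try_assign(i, set())
    return {i: o for o, i in enumerate(owner) if i is not None}


if __name__ == "__main__":
    req = [{0, 1}, {0}]
    print(separable_input_first(req, 2, [0, 0], [0, 0]))
    print(global_matching(req, 2))
```

With requests `[{0, 1}, {0}]`, both inputs pick output 0 in the separable allocator and only one grant is issued, whereas the global matching sends input 0 to output 1 and input 1 to output 0, switching two packets in the same cycle.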
8	Strategy for Reducing Congestion in NoC-Based Multiprocessor Systems (original title: Estratégia para redução de congestionamento em sistemas multiprocessadores baseados em NOC)
Kamei, Camila Ascendina Nunes, 07 August 2015
Funding: CNPq
Two issues are critical in NoC-based MPSoC systems with memory parallelism: message delivery order and network congestion. Congestion is frequent in NoCs when packet demand exceeds the capacity of network resources, and message order must be maintained so that cache coherence information remains meaningful to the memories. Congestion control methods are therefore needed that handle network congestion while preserving transaction order. This work proposes a routing technique based on the Odd-Even routing algorithm combined with the notion of local and global network congestion to choose the best path for communication packets, with the aim of reducing communication bottlenecks in NoC-based MPSoC systems. In experiments with 16 cores, the proposed technique reduced energy consumption by 13.35%, and reduced packet latency by 25% compared with the XY algorithm and by 23% compared with the unmodified Odd-Even algorithm.
Keywords: Network-on-Chip (NoC), NoC congestion, Adaptive algorithm
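The proposed technique builds on the Odd-Even routing algorithm and is compared with XY routing. The Python below sketches the route function of Chiu's Odd-Even turn model, which returns the set of admissible minimal output directions, alongside plain XY routing. The congestion-aware selection step, which picks the admissible port with the lowest score from a caller-supplied `congestion` dict, is a hypothetical placeholder: the abstract does not describe the thesis's local/global congestion metric. Coordinates are assumed to be (x, y) with north as increasing y.

```python
EAST, WEST, NORTH, SOUTH = "E", "W", "N", "S"


def xy_route(cur, dst):
    """Deterministic XY routing: correct x first, then y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return {EAST if dx > cx else WEST}
    if dy != cy:
        return {NORTH if dy > cy else SOUTH}
    return set()  # already at the destination


def odd_even_route(src, cur, dst):
    """Admissible minimal output directions under the Odd-Even turn model.

    East-to-north and east-to-south turns are forbidden in even columns;
    north-to-west and south-to-west turns are forbidden in odd columns.
    """
    (sx, _), (cx, cy), (dx, dy) = src, cur, dst
    e0, e1 = dx - cx, dy - cy
    vertical = NORTH if e1 > 0 else SOUTH  # only used when e1 != 0
    dirs = set()
    if e0 == 0:
        if e1 != 0:
            dirs.add(vertical)
    elif e0 > 0:  # eastbound
        if e1 == 0:
            dirs.add(EAST)
        else:
            if cx % 2 == 1 or cx == sx:
                dirs.add(vertical)
            if dx % 2 == 1 or e0 != 1:
                dirs.add(EAST)
    else:  # westbound
        dirs.add(WEST)
        if cx % 2 == 0 and e1 != 0:
            dirs.add(vertical)
    return dirs


def pick_least_congested(dirs, congestion):
    """Hypothetical selection step: lowest congestion score wins (ties alphabetical)."""
    return min(sorted(dirs), key=lambda d: congestion.get(d, 0))


if __name__ == "__main__":
    options = odd_even_route((0, 0), (0, 0), (2, 1))
    print(sorted(options), pick_least_congested(options, {"N": 5, "E": 2}))
```

From (1, 0) toward (2, 1) only north is allowed: moving east would reach the even column 2 and require a forbidden east-to-north turn there. The adaptive freedom that remains is what a congestion-aware selection can exploit, unlike XY, which always offers a single port.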
9	Comparison of Single-Port and Multi-Port NoCs with Contemporary Buses on FPGAs
Bhattacharya, Prasun, 20 July 2006
No description available.
Keywords: NoC, Routers, SoC, FPGA
10	High Level Synthesis for a Network on Chip Topology
Ali, Baraa Saeed, 01 May 2013
Networks-on-Chip (NoCs) have emerged as a solution to many interconnection issues imposed by the rapid growth of VLSI designs. NoCs have been deployed to address inter-core communication delay, area overhead, power consumption, and related problems. One of the leading factors in speeding up systems on chip (SoCs) is the efficiency of the scheduling algorithms for the applications running on them. In this thesis we argue that a global scheduling view can significantly improve latency in NoCs. This view can be achieved by having NoC nodes communicate in a predefined, application-based fashion: calculating in advance how many clock cycles the nodes need to execute and transmit packets to the network, and how many clock cycles the packets need to travel through the routers to their destination (including queuing delay). Knowing this, some cores can be kept in a "Hold-On" state until the right time to start transmitting. This technique could reduce congestion and may guarantee that cores do not suffer from severe resource contention, for example when accessing memory. The worst-case latency is determined by running a network simulator (such as OPNET) and gathering statistics. Therefore, if NoC nodes can postpone sending packets without violating their tasks' deadlines, packet dropping and livelock can be avoided. The NoC nodes are assumed to need their own buffers to hold ready-to-transmit packets, which may be the cost of this approach.
Keywords: cores, Network on Chip (NoC), NoC nodes, Routing Algorithm, synthesis
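The Hold-On idea reduces to slack arithmetic: a packet ready at cycle t_ready that needs C_tx cycles to inject and at most L_wc cycles to reach its destination (queuing included, for example as measured in simulation) can be delayed by up to deadline - C_tx - L_wc - t_ready cycles without missing its deadline. The function names and this simple additive model are assumptions for illustration, not the thesis's formulation.

```python
def latest_release_cycle(deadline, transmit_cycles, worst_case_latency):
    """Latest cycle at which a core can start injecting a packet and still
    meet its deadline, given a worst-case network latency (queuing included)."""
    return deadline - transmit_cycles - worst_case_latency


def hold_on_cycles(ready_cycle, deadline, transmit_cycles, worst_case_latency):
    """How many cycles a core could stay in 'Hold-On' after its packet is ready.

    Returns None if the deadline cannot be met even when sending immediately.
    """
    slack = latest_release_cycle(deadline, transmit_cycles, worst_case_latency) - ready_cycle
    return slack if slack >= 0 else None


if __name__ == "__main__":
    # Hypothetical numbers: ready at cycle 100, 8 cycles to inject,
    # 40-cycle worst-case network latency, deadline at cycle 200.
    print(hold_on_cycles(100, 200, 8, 40))
```

In this example the latest release cycle is 200 - 8 - 40 = 152, so the core can hold for up to 52 cycles; a scheduler with a global view could use that slack to stagger injections from different cores.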