• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 93
  • 46
  • 20
  • 13
  • 8
  • 2
  • 1
  • 1
  • Tagged with
  • 198
  • 198
  • 60
  • 55
  • 50
  • 46
  • 35
  • 32
  • 32
  • 27
  • 27
  • 27
  • 26
  • 24
  • 22
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
171

Dynamic Voltage/Frequency Scaling and Power-Gating of Network-on-Chip with Machine Learning

Clark, Mark A. 05 June 2019 (has links)
No description available.
172

Exploring trade-offs between Latency and Throughput in the Nostrum Network on Chip

Nilsson, Erland January 2006 (has links)
During the past years has the Nostrum Network on Chip (NoC) been developed to become a competitive platform for network based on-chip communication. The Nostrum NoC provides a versatile communication platform to connect a large number of intellectual properties (IP) on a single chip. The communication is based on a packet switched network which aims for a small physical footprint while still providing a low communication overhead. To reduce the communication network size, a queue-less network has been the research focus. This network uses de ective hot-potato routing which is implemented to perform routing decisions in a single clock cycle. Using a platform like this results in increased design reusability, validated signal integrity, and well developed test strategies, in contrast to a fully customised designs which can have a more optimal communication structure but has a significantly longer development cycle to verify the new design accordingly. Several factors are considered when designing a communication platform. The goal is to create a platform which provides low communication latency, high throughput, low power consumption, small footprint, and low design, verification, and test overhead. Proximity Congestion Awareness is one technique that serves to reduce the network load. This leads to that the latency is reduced which also increases the network throughput. Another technique is to implement low latency paths called Data Motorways achieved through a clocking method called Globally Pseudochronous Locally Synchronous clocking. Furthermore, virtual circuits can be used to provide guarantees on latency and throughput. Such guarantees are dificult in hot-potato networks since network access has to be ensured. A technique that implements virtual circuits use looped containers that are circulating on a predefined circuit. Several overlapping virtual circuits are possible by allocating the virtual circuits in different Temporally Disjoint Networks. This thesis summarise the impact the presented techniques and methods have on the characteristics on the Nostrum model. It is possible to reduce the network load by a factor of 20 which reduces the communication latency. This is done by distributing load information between the Switches in the network. Data Motorways can reduce the communication latency with up to 50% in heavily loaded networks. Such latency reduction results in freed buffer space in the Switch registers which allows the traffic rate to be increased with about 30%. / QC 20101122
173

Runtime Adaptive Scrubbing in Fault-Tolerant Network-on-Chips (NoC) Architectures

Boraten, Travis H. 09 June 2014 (has links)
No description available.
174

A Cycle-Accurate Simulator for Accelerating Convolution on AXI4-based Network-on-Chip Architecture / En cykelexakt simulator för att accelerera konvolution på AXI4-baserad nätverk-på-chip-arkitektur

Liu, Mingrui January 2024 (has links)
Artificial intelligence is probably one of the most prevalent research topics in computer science area, because the technology, if well developed and used properly, is promising to affect the daily lives of ordinaries or even reshape the structure of society. In the meantime, the end of Moore’s Law has promoted the development trend towards domain-specific architectures. The upsurge in researching specific architectures for artificial intelligence applications is unprecedented. Network-on-Chip (NoC) was proposed to address the scalability problem of multi-core system. Recently, NoC has gradually appeared in deep learning computing engines. NoC-based deep learning accelerator is an area worthy of research and currently understudied. Simulating a system is an important step in computer architecture research because it not only allows for rapid verification and measurement of design’s performance, but also provides guidance for subsequent hardware design. In this thesis, we present CNNoCaXiM, a flexible and cycle-accurate simulator for accelerating 2D convolution based on NoC interconnection and AXI4 protocol. We demonstrate its ability by simulating and measuring a convolution example with two different data flows. This simulator can be very useful for upcoming research, either as a baseline case or as a building block for further research. / Artificiell intelligens är förmodligen ett av de vanligaste forskningsämnena inom datavetenskap, eftersom tekniken, om den väl utvecklas och används på rätt sätt, lovar att påverka vanliga människors vardag eller till och med omforma samhällets struktur. Under tiden har slutet av Moores lag främjat utvecklingstrenden mot domänspecifika arkitekturer. Uppsvinget i forskning om specifika arkitekturer för tillämpningar av artificiell intelligens är utan motstycke. Network-on-Chip (NoC) föreslogs för att ta itu med skalbarhetsproblemet med flerkärniga system. Nyligen har NoC gradvis dykt upp i djuplärande datormotorer. NoC-baserad accelerator för djupinlärning är ett område som är värt forskning och för närvarande understuderat. Simulering av ett system är ett viktigt steg i forskning om datorarkitektur eftersom det inte bara möjliggör snabb verifiering och mätning av designens prestanda, utan också ger vägledning för efterföljande hårdvarudesign. I detta examensarbete presenterar vi CNNoCaXiM, en flexibel och cykelnoggrann simulator för att accelerera 2D-faltning baserad på NoC-interconnection och AXI4-protokoll. Vi visar dess förmåga genom att simulera och mäta ett faltningsexempel med två olika dataflöden. Denna simulator kan vara mycket användbar för kommande forskning, antingen som ett grundfall eller som en byggsten för vidare forskning.
175

Projeto e implementa??o de uma plataforma MP-SoC usando SystemC

Rego, Rodrigo Soares de Lima S? 19 May 2006 (has links)
Made available in DSpace on 2014-12-17T15:47:57Z (GMT). No. of bitstreams: 1 RodrigoSLSR.pdf: 1278461 bytes, checksum: ac21fe12bc1ce120cf688ba59e4bf754 (MD5) Previous issue date: 2006-05-19 / This work presents the concept, design and implementation of a MP-SoC platform, named STORM (MP-SoC DirecTory-Based PlatfORM). Currently the platform is composed of the following modules: SPARC V8 processor, GPOP processor, Cache module, Memory module, Directory module and two different modles of Network-on-Chip, NoCX4 and Obese Tree. All modules were implemented using SystemC, simulated and validated, individually or in group. The modules description is presented in details. For programming the platform in C it was implemented a SPARC assembler, fully compatible with gcc s generated assembly code. For the parallel programming it was implemented a library for mutex managing, using the due assembler s support. A total of 10 simulations of increasing complexity are presented for the validation of the presented concepts. The simulations include real parallel applications, such as matrix multiplication, Mergesort, KMP, Motion Estimation and DCT 2D / Este trabalho apresenta o conceito, desenvolvimento e implementa??o de uma plataforma MP-SoC, batizada STORM (MP-SoC DirecTory-Based PlatfORM). A plataforma atualmente ? composta pelos seguintes m?dulos: processador SPARC V8, processador GPOP, m?dulo de Cache, m?dulo de Mem?ria, m?dulo de Diret?rio e dois diferentes modelos de Network-on-Chip, a NoCX4 e a ?rvore Obesa. Todos os m?dulos foram implementados usando a linguagem SystemC, simulados e validados, tanto separadamente quanto em conjunto. A descri??o dos m?dulos ? apresentada em detalhes. Para a programa??o da plataforma usando C foi implementado um montador SPARC, totalmente compat?vel com o c?digo assembly gerado pelo compilador gcc. Para a programa??o concorrente foi implementada uma biblioteca de fun??es para gerenciamento de mutexes, com o devido suporte por parte do montador. S?o apresentadas 10 simula??es do sistema, de complexidade crescente, para valida??o de todos os conceitos apresentados. As simula??es incluem aplica??es paralelas reais, como a multiplica??o de matrizes, Mergesort, KMP, Estima??o de Movimento e DCT 2D
176

A model-based design approach for heterogeneous NoC-based MPSoCs on FPGA

Robino, Francesco January 2014 (has links)
Network-on-chip (NoC) based multi-processor systems-on-chip (MPSoCs) are promising candidates for future multi-processor embedded platforms, which are expected to be composed of hundreds of heterogeneous processing elements (PEs) to potentially provide high performances. However, together with the performances, the systems complexity will increase, and new high level design techniques will be needed to efficiently model, simulate, debug and synthesize them. System-level design (SLD) is considered to be the next frontier in electronic design automation (EDA). It enables the description of embedded systems in terms of abstract functions and interconnected blocks. A promising complementary approach to SLD is the use of models of computation (MoCs) to formally describe the execution semantics of functions and blocks through a set of rules. However, also when this formalization is used, there is no clear way to synthesize system-level models into software (SW) and hardware (HW) towards a NoC-based MPSoC implementation, i.e., there is a lack of system design automation (SDA) techniques to rapidly synthesize and prototype system-level models onto heterogeneous NoC-based MPSoCs. In addition, many of the proposed solutions require large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In order to tackle the problem, a novel model-based SDA flow has been developed as part of the thesis. It starts from a system-level specification, where functions execute according to the synchronous MoC, and then it can rapidly prototype the system onto an FPGA configured as an heterogeneous NoC-based MPSoC. In the first part of the thesis the HeartBeat model is proposed as a model-based technique which fills the abstraction gap between the abstract system-level representation and its implementation on the multiprocessor prototype. Then details are provided to describe how this technique is automated to rapidly prototype the modeled system on a flexible platform, permitting to adjust the system specification until the designer is satisfied with the results. Finally, the proposed SDA technique is improved defining a methodology to automatically explore possible design alternatives for the modeled system to be implemented on a heterogeneous NoC-based MPSoC. The goal of the exploration is to find an implementation satisfying the designer's requirements, which can be integrated in the proposed SDA flow. Through the proposed SDA flow, the designer is relieved from implementation details and the design time of systems targeting heterogeneous NoC-based MPSoCs on FPGA is significantly reduced. In addition, it reduces possible design errors proposing a completely automated technique for fast prototyping. Compared to other SDA flows, the proposed technique targets a bare-metal solution, avoiding the use of an operating system (OS). This reduces the memory requirements on the FPGA platform comparing to related work targeting MPSoC on FPGA. At the same time, the performance (throughput) of the modeled applications can be increased when the number of processors of the target platform is increased. This is shown through a wide set of case studies implemented on FPGA. / <p>QC 20140609</p>
177

High-Performance Network-on-Chip Design for Many-Core Processors

Wang, Boqian January 2020 (has links)
With the development of on-chip manufacturing technologies and the requirements of high-performance computing, the core count is growing quickly in Chip Multi/Many-core Processors (CMPs) and Multiprocessor System-on-Chip (MPSoC) to support larger scale parallel execution. Network-on-Chip (NoC) has become the de facto solution for CMPs and MPSoCs in addressing the communication challenge. In the thesis, we tackle a few key problems facing high-performance NoC designs. For general-purpose CMPs, we encompass a full system perspective to design high-performance NoC for multi-threaded programs. By exploring the cache coherence under the whole system scenario, we present a smart communication service called Advance Virtual Channel Reservation (AVCR) to provide a highway to target packets, which can greatly reduce their contention delay in NoC. AVCR takes advantage of the fact that we can know or predict the destination of some packets ahead of their arrival at the Network Interface (NI). Exploiting the time interval before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built up by reserving the Virtual Channel (VC) resources ahead of the target packet transmission and offering priority service to flits in the reserved VC in the wormhole router, which can avoid the target packets’ VC allocation and switch arbitration delay. Besides, we also propose an admission control method in NoC with a centralized Artificial Neural Network (ANN) admission controller, which can improve system performance by predicting the most appropriate injection rate of each node using the network performance information. In the online control process, a data preprocessing unit is applied to simplify the ANN architecture and make the prediction results more accurate. Based on the preprocessed information, the ANN predictor determines the control strategy and broadcasts it to each node where the admission control will be applied. For application-specific MPSoCs, we focus on developing high-performance NoC and NI compatible with the common AMBA AXI4 interconnect protocol. To offer the possibility of utilizing the AXI4 based processors and peripherals in the on-chip network based system, we propose a whole system architecture solution to make the AXI4 protocol compatible with the NoC based communication interconnect in the many-core system. Due to possible out-of-order transmission in the NoC interconnect, which conflicts with the ordering requirements specified by the AXI4 protocol, in the first place, we especially focus on the design of the transaction ordering units, realizing a high-performance and low cost solution to the ordering requirements. The microarchitectures and the functionalities of the transaction ordering units are also described and explained in detail for ease of implementation. Then, we focus on the NI and the Quality of Service (QoS) support in NoC. In our design, the NI is proposed to make the NoC architecture independent from the AXI4 protocol via message format conversion between the AXI4 signal format and the packet format, offering high flexibility to the NoC design. The NoC based communication architecture is designed to support high-performance multiple QoS schemes. The NoC system contains Time Division Multiplexing (TDM) and VC subnetworks to apply multiple QoS schemes to AXI4 signals with different QoS tags and the NI is responsible for traffic distribution between two subnetworks. Besides, a QoS inheritance mechanism is applied in the slave-side NI to support QoS during packets’ round-trip transfer in NoC. / Med utvecklingen av tillverkningsteknologi av on-chip och kraven på högpresterande da-toranläggning växer kärnantalet snabbt i Chip Multi/Many-core Processors (CMPs) ochMultiprocessor Systems-on-Chip (MPSoCs) för att stödja större parallellkörning. Network-on-Chip (NoC) har blivit den de facto lösningen för CMP:er och MPSoC:er för att mötakommunikationsutmaningen. I uppsatsen tar vi upp några viktiga problem med hög-presterande NoC-konstruktioner.Allmänna CMP:er omfattas ett fullständigt systemperspektiv för att design högprester-ande NoC för flertrådad program. Genom att utforska cachekoherensen under hela system-scenariot presenterar vi en smart kommunikationstjänst, AVCR (Advance Virtual ChannelReservation) för att tillhandahålla en motorväg till målpaket, vilket i hög grad kan min-ska deras förseningar i NoC. AVCR utnyttjar det faktum att vi kan veta eller förutsägadestinationen för vissa paket före deras ankomst till nätverksgränssnittet (Network inter-face, NI). Genom att utnyttja tidsintervallet innan ett paket är klart, etablerar AVCRen ände till ände motorväg från källan NI till destinationen NI. Denna motorväg byggsupp genom att reservera virtuell kanal (Virtual Channel, VC) resurser före målpaket-söverföringen och erbjuda prioriterade tjänster till flisar i den reserverade VC i wormholerouter. Dessutom föreslår vi också en tillträdeskontrollmetod i NoC med en centraliseradartificiellt neuronät (Artificial Neural Network, ANN) tillträdeskontroll, som kan förbättrasystemets prestanda genom att förutsäga den mest lämpliga injektionshastigheten för varjenod via nätverksprestationsinformationen. I onlinekontrollprocessen används en förbehan-dlingsenhet på data för att förenkla ANN-arkitekturen och göra förutsägningsresultatenmer korrekta. Baserat på den förbehandlade informationen bestämmer ANN-prediktornkontrollstrategin och sänder den till varje nod där tillträdeskontrollen kommer att tilläm-pas.För applikationsspecifika MPSoC:er fokuserar vi på att utveckla högpresterande NoCoch NI kompatibla med det gemensamma AMBA AXI4 protokoll. För att erbjuda möj-ligheten att använda AXI4-baserade processorer och kringutrustning i det on-chip baseradenätverkssystemet föreslår vi en hel systemarkitekturlösning för att göra AXI4 protokolletkompatibelt med den NoC-baserade kommunikation i det multikärnsystemet. På grundav den out-of-order överföring i NoC, som strider mot ordningskraven som anges i AXI4-protokollet, fokuserar vi i första hand på utformningen av transaktionsordningsenheterna,för att förverkliga en hög prestanda och låg kostnad-lösning på ordningskraven. Sedanfokuserar vi på NI och Quality of Service (QoS)-stödet i NoC. I vår design föreslås NI attgöra NoC-arkitekturen oberoende av AXI4-protokollet via meddelandeformatkonverteringmellan AXI4 signalformatet och paketformatet, vilket erbjuder NoC-designen hög flexi-bilitet. Den NoC-baserade kommunikationsarkitekturen är utformad för att stödja fleraQoS-schema med hög prestanda. NoC-systemet innehåller Time-Division Multiplexing(TDM) och VC-subnät för att tillämpa flera QoS-scheman på AXI4-signaler med olikaQoS-taggar och NI ansvarar för trafikdistribution mellan två subnät. Dessutom tillämpasen QoS-arvsmekanism i slav-sidan NI för att stödja QoS under paketets tur-returöverföringiNoC / <p>QC 20201008</p>
178

AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors

Kamaleldin, Ahmed, Göhringer, Diana 31 May 2024 (has links)
Tile-based many-core architectures are extensively used in modern system-on-chip designs to achieve scalable computing performance with adequate energy efficiency. Heterogeneity is the key element to boost computing performance and keep energy consumption under certain limits for several application domains. However, the steady increase of using many custom heterogeneous tiles leads to an expansion in design and integration cost with limited tiles re-usability. The recent widespread of open-source RISC-V ISA provides the potential to develop modular compute units that can be used for many application domains with high reduction in non-recurring engineering costs. The motivation of this work is to bring design modularity and adaptability features for heterogeneous tile-based many-core architectures by increasing their flexibility to realize different many-core configurations with less design time and costs. In this work, AGILER is proposed as an adaptive tile-base many-core architecture for heterogeneous RISC-V based processors. The proposed architecture consists of modular and adaptable heterogeneous multi-/single-core compute tiles that supports 32-/64-bit RISC-V ISAs with different memory hierarchies. Inter-tile communication is developed based on a scalable network-on-chip architecture to achieve a high degree of system scalability. AGILER supports run-time adaptation through a custom internal reconfiguration manager for dynamic and partial reconfiguration over Xilinx FPGAs. Evaluation results demonstrate that the proposed architecture features a scalable computing performance up to 685 MOPS for 8 x 32-bit tiles and 316 MOPS for 8 x 64-bit tiles with a scalable memory bandwidth up to 7.4 GB/s. AGILER is evaluated on Xilinx Virtex UltrascaleC FPGA with a maximum reconfiguration time of 38.1 ms for a single compute tile.
179

EVALUATION OF SOURCE ROUTING FOR MESH TOPOLOGY NETWORK ON CHIP PLATFORMS

MUBEEN, SAAD January 2009 (has links)
<p>Network on Chip is a scalable and flexible communication infrastructure for the design of core based System on Chip. Communication performance of a NoC depends heavily on the routing algorithm. Deterministic and adaptive distributed routing algorithms have been advocated in all the current NoC architectural proposals. In this thesis we make a case for the use of source routing for NoCs, especially for regular topologies like mesh. The advantages of source routing include in-order packet delivery; faster and simpler router design; and possibility of mixing non-minimal paths in a mainly minimal routing. We propose a method to compute paths for various communications in such a way that traffic congestion is avoided while ensuring deadlock free routing. We also propose an efficient scheme to encode the paths.</p><p>We developed a tool in Matlab that computes paths for source routing for both general and application specific communications. Depending upon the type of traffic, this tool computes paths for source routing by selecting best routing algorithm out of many routing algorithms. The tool uses a constructive path improvement algorithm to compute paths that give more uniform link load distribution. It also generates different types of traffics. We also developed a simulator capable of simulating source routing for mesh topology NoC. The experiments and simulations which we performed were successful and the results show that the advantages of source routing especially lower packet latency more than compensate its disadvantages. The results also demonstrate that source routing can be a good routing candidate for practical core based SoCs design using network on chip communication infrastructure.</p>
180

Desenvolvimento de um sistema dinamicamente reconfigurável baseado em redes intra-chip e ferramenta para posicionamento de módulos. / Development of a dynamically reconfigurable systems under noc and CAD for modules mapping.

Raffo Jara, Mario Andrés 05 February 2010 (has links)
Os sistemas dinamicamente reconfiguráveis (SDRs) são uma alternativa para o desenvolvimento de sistemas sobre silício baseados em circuitos programáveis (SoPC), cujo principal beneficio é o bom aproveitamento da área do dispositivo. Sendo neles implementados circuitos que representam as tarefas que devem operar numa etapa específica do tempo de operação do sistema, permitem um menor consumo de área e de energia, parâmetros importantes nos sistemas portáveis. Isto tem gerado muito interesse no que se refere às metodologias de projeto utilizando FPGAs (Field Programmable Gate Arrays) dinamicamente reconfiguráveis (DRFPGAs) e à definição de um meio de comunicação estruturado para tratar da transferência de dados entre as partes reconfiguráveis e as fixas, mas estas tarefas, assim como a concretização de sua comunicação, seguem sendo ainda essencialmente manuais, devido à falta de metodologias de projeto e ferramentas de CAD que simplifiquem o projeto de SDRs. Este trabalho foca uma das limitações mais efetivas para a adoção da reconfiguração dinâmica: a falta de ferramentas de CAD que suportem o projeto de SDRs, inclusive os baseados em redes intra-chip (NoCs), em particular, no posicionamento dos módulos. Neste trabalho, uma arquitetura para SDRs baseado em NoCs é proposta e um algoritmo de posicionamento dos módulos de um SDR baseado em aspectos reais da família do DRFPGAs é desenvolvido, dentro de uma ferramenta denominada DynoPlace. Desenvolveu-se também um modelo de validação e simulação de SDRs, em tempo de operação, utilizando-se a técnica de chaveamento dinâmico de circuitos. Para o estudo do caso, de validação da arquitetura e metodologia, propõe-se uma aplicação teste baseada em computação de operações aritméticas. A metodologia de simulação permite determinar o tempo da reconfiguração e verificar o comportamento do SDR no momento da reconfiguração. A ferramenta DynoPlace permite gerar os arquivos de restrição de usuário (UCF) de posicionamento dos módulos do SDR no DRFPGA Virtex-4LX25. Este contém informações do posicionamento dos módulos do sistema, dos dispositivos usados para as entradas e saídas do sistema além do posicionamento dos bus-macros. Com os arquivos gerados pela metodologia e ferramenta DynoPlace, pode-se executar com sucesso os scripts da metodologia Early Access da Xilinx para gerar o SDR de forma automática. / Dynamically Reconfigurable Systems (DRSs) are an alternative for developing Systems on a Programmable Chip (SoPC), being the efficient use of device\'s area one of its main advantages. Circuits implemented as DRSs represent tasks which must be active in specific times into the system operation, allowing area and energy saving, which is an important goal for portable systems. This has generated interests on the design methodology using Dynamically Reconfigurable Field Programmable Gate Arrays (DRFPGAs) and on the definition of communication systems for handling data transfer between static and reconfigurable partitions. However, these tasks, as well as the communication structure, are still carried out manually due to lack of design methodologies and CAD tools applied to DRSs design. This work focuses on the one of main drawbacks to the adoption of dynamic reconfiguration methods: the absence of CAD tools which support DRS designs, specifically, in the module positioning task, included, for those based on Network-on-Chip (NoCs). In this work, an architecture for DRSs based on NoCs is presented and an algorithm for module positioning is developed in a tool called DynoPlace as well, based on real specifications of DRFPGAs families. It is also developed a run-time simulation and validation model for DRSs, through a dynamic circuit switching technique. For the validation of architecture and methodology study case, an application test based on arithmetic operations has been proposed. The simulations methodology allows to determine the reconfiguration time and verify the DRS behavior at the moment of reconfiguration. The DynoPlace tool allows to generate User Constraint File (UCF) of DRS\'s modules positioning for the DRFPGA Virtex-4LX25. This file contains information of modules positioning in the system, of the devices used for inputs and outputs of the system, and the positioning of bus-macros. After the files generation by the methodology, and the DynoPlace tool, it is possible to successfully execute the Early Access scripts for generating the DRS automatically.

Page generated in 0.0397 seconds