About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A Link-Level Communication Analysis for Real-Time NoCs

Gholamian, Sina January 2012
This thesis presents a link-level latency analysis for real-time network-on-chip interconnects that use priority-based wormhole switching. The analysis incorporates both direct and indirect interference from other traffic flows, and it exploits pipelining and the parallel transmission of data across links. The resulting link-level analysis provides a tighter worst-case upper bound than existing techniques, which we verify through analytical and simulation experiments. Our experiments show that, on average, link-level analysis reduces the worst-case latency by 28.8% and increases the number of schedulable flows by 13.2% compared with previous work.
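For context, existing analyses of this kind are typically flow-level: each flow's worst-case latency is found by a fixed-point iteration over the higher-priority flows that share at least one link with it. The sketch below shows that classic iteration, in the style of Shi and Burns' analysis for priority-preemptive wormhole NoCs, under simplifying assumptions (interference jitter is folded into one per-flow value; `Flow` and `response_time` are hypothetical names). It is not the link-level analysis proposed in the thesis.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Flow:
    name: str
    c: int                      # basic network latency with no contention (cycles)
    t: int                      # period / minimum inter-release time (cycles)
    jitter: int = 0             # interference jitter folded into one term (simplification)
    deadline: float = math.inf


def response_time(flow: Flow, direct: Sequence[Flow], max_iter: int = 1000) -> Optional[int]:
    """Fixed-point iteration R = C_i + sum_j ceil((R + J_j) / T_j) * C_j, where `direct`
    holds the higher-priority flows that share at least one link with `flow`.
    Returns None if the bound exceeds the deadline or does not converge."""
    r = flow.c
    for _ in range(max_iter):
        nxt = flow.c + sum(math.ceil((r + j.jitter) / j.t) * j.c for j in direct)
        if nxt == r:
            return r if r <= flow.deadline else None
        if nxt > flow.deadline:
            return None
        r = nxt
    return None


a = Flow("a", c=2, t=10)
b = Flow("b", c=3, t=15)
print(response_time(Flow("i", c=4, t=50, deadline=20), [a, b]))  # 9
```

Here flow `i` converges to a bound of 9 cycles after two iterations. Because interference is charged per flow rather than per link, such bounds can be pessimistic, which is the gap a link-level analysis that accounts for pipelining aims to close.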
22

A verilog-hdl implementation of virtual channels in a network-on-chip router

Park, Sungho 15 May 2009
As feature sizes continue to shrink and integration density increases, interconnect has become a dominant factor in determining the overall quality of a chip. Because of its limited scalability, a system bus cannot meet the requirements of current System-on-Chip (SoC) implementations, since it can support only a limited number of functional units. Long global wires also cause many design problems, such as routing congestion, noise coupling, and difficult timing closure. Network-on-Chip (NoC) architectures have been proposed as an alternative that addresses these problems by using a packet-based communication network. The processing elements (PEs) communicate by exchanging messages over the network, and these messages pass through buffers in each router. Buffers are one of the major resources used by routers in virtual-channel flow control. In this thesis, we analyze two buffer allocation approaches, static and dynamic, both of which aim to increase throughput and minimize latency through virtual-channel flow control. In a statically allocated buffer architecture, buffer size and organization are fixed at design time and therefore do not perform optimally under all traffic conditions. In addition, statically allocated virtual channels waste area and consume significant leakage power. Dynamic buffer allocation, by contrast, aims to increase buffer utilization by using dynamic virtual channels. The Dynamic Virtual Channel Regulator (ViChaR) was proposed as a centralized buffer architecture that dynamically allocates virtual channels and buffer slots in real time depending on traffic conditions. ViChaR's dynamic buffer management increases buffer utilization, but it also increases design complexity. In this research, we reexamine the performance, power consumption, and area of ViChaR's buffer architecture through implementation.
We implement a generic router and a ViChaR architecture in Verilog-HDL. The RTL code is verified by dynamic simulation and synthesized with Design Compiler to obtain area and power consumption, and latency is obtained through static timing analysis. The results show that ViChaR's dynamic buffer management scheme significantly increases latency and power consumption, even though it can increase buffer utilization. A new design is therefore needed to achieve high buffer utilization without these penalties.
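To illustrate the buffer-utilization argument, the following sketch compares a statically partitioned input buffer with a single shared pool that any virtual channel can draw from, a simplified stand-in for dynamic allocation. It assumes a fixed number of VCs and one flit per slot; ViChaR itself also varies the number of VCs and adds slot-tracking and arbitration logic, which this sketch omits and which is where the thesis reports the latency and power costs. The class names are hypothetical.

```python
from typing import Iterable


class StaticBuffers:
    """Each virtual channel (VC) owns a fixed slice of the slots, chosen at design time."""

    def __init__(self, total_slots: int, num_vcs: int):
        self.cap = total_slots // num_vcs
        self.occ = [0] * num_vcs

    def try_enqueue(self, vc: int) -> bool:
        if self.occ[vc] < self.cap:
            self.occ[vc] += 1
            return True
        return False  # this VC's slice is full, even if other slices are empty


class SharedBuffers:
    """Unified pool: any VC may claim any free slot (simplified dynamic allocation)."""

    def __init__(self, total_slots: int, num_vcs: int):
        self.free = total_slots
        self.occ = [0] * num_vcs

    def try_enqueue(self, vc: int) -> bool:
        if self.free > 0:
            self.free -= 1
            self.occ[vc] += 1
            return True
        return False


def accepted(buf, arrivals: Iterable[int]) -> int:
    """Count how many arriving flits (each tagged with its VC) find a free slot."""
    return sum(buf.try_enqueue(vc) for vc in arrivals)


# Skewed traffic: 12 flits all on VC 0, with 16 slots split across 4 VCs.
print(accepted(StaticBuffers(16, 4), [0] * 12))  # 4
print(accepted(SharedBuffers(16, 4), [0] * 12))  # 12
```

Under this skewed load the static slice for VC 0 fills after four flits while the other twelve slots sit idle; the shared pool accepts all twelve. The per-slot bookkeeping that makes the shared pool possible is the hardware cost the thesis measures.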
23

HW/SW Codesign and Design, Evaluation of Software Framework for AcENoCs : An FPGA-Accelerated NoC Emulation Platform

Pai, Vinayak December 2010
The majority of modern compute-intensive applications are heterogeneous in nature. To support their ever-increasing computational requirements, present-day System-on-Chip (SoC) architectures have adopted a multicore design style, incorporating multiple heterogeneous processing cores on a single chip. The emerging Network-on-Chip (NoC) interconnect paradigm provides a scalable and power-efficient solution for communication among multiple cores, serving as a powerful replacement for traditional bus-based architectures. A fast, robust, and flexible emulation platform is key to realizing and validating such architectures within a short span of time. This research focuses on various aspects of hardware/software (HW/SW) codesign for AcENoCs (Accelerated Emulation Platform for NoCs), a Field Programmable Gate Array (FPGA)-accelerated, configurable, cycle-accurate platform for the emulation and validation of NoC architectures. This work also details the design, implementation, and evaluation of the AcENoCs software framework, along with the design optimizations carried out and the tradeoffs considered in the AcENoCs HW/SW codesign to achieve an optimum balance between emulated network dimensions and emulation performance. The AcENoCs emulation platform is realized on a Xilinx Virtex-5 FPGA. Its hardware framework consists of the NoC, built from configurable hardware library components, while its software framework consists of Traffic Generators (TGs) and their associated source queues, Traffic Receptors (TRs), a statistics analysis module, and a dynamically controlled emulation clock generator. The software framework is implemented on the on-chip Xilinx MicroBlaze processor. This report also describes the interaction between HW/SW events in an emulation cycle and assesses the performance speedup and tradeoffs of AcENoCs relative to existing FPGA emulators and software simulators.
FPGA synthesis results showed that networks with dimensions up to 5x5 could be accommodated inside the device. Varying synthetic traffic workloads, generated by the TGs, were used to evaluate the network. Traces from real applications were also run on the AcENoCs platform to evaluate the performance improvement achieved in comparison to software simulators. To improve emulator performance, software profiling was carried out to identify and optimize the software components that consume the most processor cycles in an emulation cycle. Emulation test cases were run and latency values recorded for varying traffic patterns to evaluate the AcENoCs platform. Experimental results showed emulation speedups on the order of 10,000-12,000× over HDL (Hardware Description Language) simulators and 14-47× over software simulators, without sacrificing cycle accuracy.
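The abstract mentions that Traffic Generators produce varying synthetic workloads. A minimal sketch of how such generators are commonly written, assuming a k×k mesh, Bernoulli injection at a fixed per-node rate, and three standard destination patterns (uniform random, transpose, and coordinate complement). This is not the AcENoCs MicroBlaze code, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Node = Tuple[int, int]


def uniform_random(src: Node, k: int, rng: random.Random) -> Node:
    """Any node other than the source, chosen uniformly."""
    while True:
        dst = (rng.randrange(k), rng.randrange(k))
        if dst != src:
            return dst


def transpose(src: Node, k: int, rng: random.Random) -> Node:
    x, y = src
    return (y, x)


def complement(src: Node, k: int, rng: random.Random) -> Node:
    """Mirror both coordinates; equals bit-complement when k is a power of two."""
    x, y = src
    return (k - 1 - x, k - 1 - y)


def generate(pattern: Callable[[Node, int, random.Random], Node],
             k: int, rate: float, cycles: int, seed: int = 0) -> List[Tuple[int, Node, Node]]:
    """Each node injects a packet with probability `rate` per cycle.
    Returns (cycle, src, dst) tuples; self-addressed packets are dropped."""
    rng = random.Random(seed)
    packets = []
    for cycle in range(cycles):
        for x in range(k):
            for y in range(k):
                if rng.random() < rate:
                    src = (x, y)
                    dst = pattern(src, k, rng)
                    if dst != src:
                        packets.append((cycle, src, dst))
    return packets
```

For a 5x5 network, `generate(transpose, 5, 0.1, 1000)` injects at roughly 0.1 packets per node per cycle, except that the five diagonal nodes never send, since their transpose is themselves. Seeding the generator keeps runs reproducible across emulation experiments.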
24

CoNoC: Fast Full Chip Topology Generation for Application-Specific Network on Chip

Chen, Shu-yu 08 January 2010
We propose a synthesis methodology for Networks-on-Chip (NoCs) and NoC-based multiprocessor systems-on-chip (MPSoCs) that generates application-specific, irregular topologies. Unlike the traditional approach of inserting routers and links after floorplanning, we synthesize the processor and communication architectures simultaneously, so that area and routing can be estimated more accurately during the floorplanning stage. Our NoC topology generation is optimized simultaneously for speed, power, and wirelength. Compared with the state of the art, our method outperforms it on average by 445.45× in CPU time, 33.20% in power consumption, and 96.86% in wirelength. These gains come at the cost of a 2.26% larger NoC size, because our method accounts for router shape; 20.63% more routers, because our method limits each router to 5 ports; and 3.93% more links, because our method allows different link lengths. The method is also scalable: in experiments at 2×, 4×, 8×, and 16× scale, it outperforms the state of the art on average by 355,089.11× in CPU time, 1.21× in the number of hops, and 78.33% in power consumption. These results show that our synthesis method is effective, efficient, and scalable.
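The trade-offs reported above are expressed in router count, link count, wirelength, and a 5-port limit per router. A small helper that computes these metrics for a candidate topology could look like the following, assuming routers are placed at floorplan coordinates and link wirelength is measured as Manhattan distance between connected routers. The data layout and function name are hypothetical, and this is not the CoNoC synthesis algorithm itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

MAX_PORTS = 5  # per-router port limit stated in the abstract


def evaluate(routers: Dict[str, Tuple[int, int]],
             links: List[Tuple[str, str]],
             attachments: Dict[str, str]) -> Dict[str, int]:
    """routers: name -> (x, y) placement; links: router pairs; attachments: core -> router.
    Raises ValueError if any router needs more than MAX_PORTS ports."""
    ports: Counter = Counter()
    for a, b in links:          # each link uses one port on both end routers
        ports[a] += 1
        ports[b] += 1
    for r in attachments.values():  # each attached core uses one local port
        ports[r] += 1
    over = [r for r in routers if ports[r] > MAX_PORTS]
    if over:
        raise ValueError(f"port limit exceeded at {over}")
    wirelength = sum(abs(routers[a][0] - routers[b][0]) + abs(routers[a][1] - routers[b][1])
                     for a, b in links)
    return {"routers": len(routers), "links": len(links), "wirelength": wirelength}


print(evaluate({"r0": (0, 0), "r1": (3, 0), "r2": (3, 4)},
               [("r0", "r1"), ("r1", "r2")],
               {"c0": "r0", "c1": "r1", "c2": "r2"}))
# {'routers': 3, 'links': 2, 'wirelength': 7}
```

A synthesis loop would call such a check on each candidate and reject placements that exceed the port limit, which is why a port-limited method can end up with more routers than an unconstrained one.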
26

Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

An, Baik Song August 2012
High-performance systems have been widely adopted in many fields, and the demand for better performance is constantly increasing. The need for powerful yet flexible systems that meet varying application requirements from diverse domains is also growing. Power efficiency has likewise become one of the major issues in high-performance computing: the power density of core components is increasing significantly, and the power supply accounts for the dominant fraction of total management cost. Dependability is also a main concern in large-scale systems, since more hardware resources can be abused by attackers. Designing high-performance, power-efficient, and secure systems is therefore crucial to provide users with adequate performance as well as reliability. Traditional design methodologies for large-scale computing systems are limited in their ability to meet this demand under restricted resource budgets; interconnecting a large number of uniprocessor chips to build parallel processing systems is not efficient in terms of performance and power. A chip multiprocessor (CMP), which integrates multiple processing cores and caches on a single chip, is considered a good alternative to previous design trends. In this dissertation, we address various design issues of high-performance CMP-based multiprocessor systems to achieve both performance and power efficiency while maintaining security. First, we propose fast and secure off-chip interconnects that minimize network overheads and provide an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems that makes the best use of the characteristics of CMP environments and multi-threaded workloads. Third, we propose a new router design for networks-on-chip (NoCs) based on a new memory technology: hybrid input buffers that use both SRAM and STT-MRAM for better performance and power efficiency.
Simulation results show that the proposed schemes improve the performance of off-chip networks by reducing message size by 54% on average. The schemes also reduce the overhead of bounds-checking operations, improving overall performance by 11% on average. Adopting hybrid buffers in NoC routers increases network throughput by up to 21%.
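The third contribution combines SRAM and STT-MRAM in router input buffers. As a rough illustration only, the sketch below models a hypothetical hybrid FIFO that fills a few SRAM slots first and spills to STT-MRAM slots, counting writes to each technology (STT-MRAM writes are generally slower and more energy-hungry than SRAM writes, while STT-MRAM cells are denser and leak less). The abstract does not describe the actual placement or migration policy, so `HybridInputBuffer` and its spill rule are assumptions.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class HybridInputBuffer:
    """Hypothetical hybrid FIFO: a few fast SRAM slots plus denser STT-MRAM slots.
    Arriving flits go to SRAM while it has space and spill to STT-MRAM otherwise;
    one logical queue records where each flit lives, so FIFO order is preserved."""

    def __init__(self, sram_slots: int, stt_slots: int):
        self.capacity = {"sram": sram_slots, "stt": stt_slots}
        self.used: Dict[str, int] = {"sram": 0, "stt": 0}
        self.writes: Dict[str, int] = {"sram": 0, "stt": 0}
        self.queue: Deque[Tuple[Any, str]] = deque()  # (flit, technology) in arrival order

    def push(self, flit: Any) -> bool:
        for where in ("sram", "stt"):
            if self.used[where] < self.capacity[where]:
                self.used[where] += 1
                self.writes[where] += 1
                self.queue.append((flit, where))
                return True
        return False  # buffer full: the upstream router must stall

    def pop(self) -> Any:
        flit, where = self.queue.popleft()
        self.used[where] -= 1
        return flit


buf = HybridInputBuffer(sram_slots=2, stt_slots=4)
print([buf.push(i) for i in range(7)])  # [True, True, True, True, True, True, False]
print(buf.writes)                       # {'sram': 2, 'stt': 4}
```

Counting writes per technology is the kind of statistic a designer would combine with per-access latency and energy figures to decide how large the SRAM portion should be.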
27

Design and Analysis of Location Cache in a Network-on-Chip Based Multiprocessor System

Ramakrishnan, Divya 20 April 2009
No description available.
28

Scalable Hybrid Neuromorphic Accelerator & Hybrid Neural Networks

Nardone, Joshua 01 June 2024
With machine learning workloads now at very large scales, models are distributed across large compute systems. On distributed systems, the performance of these models is limited by the bandwidth of chip-to-chip communication. To relieve this bottleneck, spiking neural networks (SNNs) can be used to reduce inter-chip communication traffic by exploiting inherent network sparsity. However, compared with traditional artificial neural networks (ANNs), SNNs can suffer significant performance degradation as network scale and complexity increase. This research proposes a hybrid neural network accelerator that uses the best of both spiking and non-spiking layers: it allocates the majority of resources to non-spiking layers in the interior of the chip, while bandwidth-limited areas (e.g., I/O pads or chip separation boundaries) employ spike-based data traffic. By limiting the overall use of spiking layers within the network, we realize the energy savings of SNNs without the degradation in accuracy that comes with large spike-based models. We present a scalable chiplet architecture and show how hybrid data is managed with both spiking and non-spiking data communication. We also demonstrate how the asynchronous spike-based model is integrated efficiently with synchronous deep learning workloads based on artificial neural networks. We demonstrate that our hybrid architecture offers significant improvements in performance, accuracy, and energy consumption compared with SNNs and ANNs. The architecture achieves up to a 1.34× increase in energy efficiency and a 1.56× decrease in single-inference latency, and its versatility is demonstrated by validation across multiple datasets encompassing both language processing and computer vision tasks.
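One common way to turn dense activations into spike traffic at a bandwidth-limited boundary is rate coding. The sketch below shows generic Bernoulli rate coding, assuming activations are normalized to [0, 1]; the thesis's actual spike encoding and chiplet interface protocol are not described in the abstract, and the function names are hypothetical.

```python
import random
from typing import List


def rate_encode(activations: List[float], timesteps: int, rng: random.Random) -> List[List[int]]:
    """Bernoulli rate coding: activation a in [0, 1] fires with probability a per timestep."""
    return [[1 if rng.random() < a else 0 for a in activations] for _ in range(timesteps)]


def rate_decode(spikes: List[List[int]]) -> List[float]:
    """Recover each activation as its mean firing rate over the window."""
    t = len(spikes)
    return [sum(step[i] for step in spikes) / t for i in range(len(spikes[0]))]


def spike_events(spikes: List[List[int]]) -> int:
    """Number of spike events that would cross the boundary (e.g., as address events)."""
    return sum(map(sum, spikes))


# A sparse layer output: 90 silent neurons and 10 saturated ones.
acts = [0.0] * 90 + [1.0] * 10
spikes = rate_encode(acts, timesteps=8, rng=random.Random(1))
print(spike_events(spikes))         # 80
print(rate_decode(spikes) == acts)  # True
```

Silent neurons send nothing, so traffic shrinks as activations become sparser; the cost is that decoded precision is limited to steps of 1/timesteps, one reason a purely spike-based model can lose accuracy while a hybrid design confines spikes to the boundaries.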
29

Algoritmo de prefetching de dados temporizado para sistemas multiprocessadores baseados em NOC (Timed data prefetching algorithm for NoC-based multiprocessor systems)

Silveira, Maria Cireno Ribeiro 09 March 2015
Prefetching is an effective technique for mitigating a well-known problem in multi-core processors: the gap between computing performance and data-access performance. The goal of prefetching is to bring data closer to the CPU by retrieving it from memory and loading it into the local cache, so that when the CPU requests the data it is already available in the cache, reducing both the miss rate and the miss penalty. In NoC-based multiprocessor systems, prefetching efficiency is even more critical to system performance, since the access time depends on the distance between the requesting processor and the memory and on network traffic. This work proposes a timed data prefetching algorithm that aims to minimize the penalty of the cores through a prefetching solution based on time prediction for NoC-based multiprocessor systems. The algorithm uses a proactive, server-initiated process to issue prefetch requests based on the cache-miss history and information from the NoC. In experiments with 16 cores, the proposed algorithm reduced the processor penalty by 53.6% compared with event-based prefetching (triggered by cache misses), and the best case was a penalty reduction of 29%.
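The abstract describes a server-initiated prefetch whose timing combines cache-miss history with NoC information. A minimal sketch of that timing idea follows, assuming a 2D mesh with Manhattan hop counts, a fixed per-hop latency, an externally supplied congestion estimate, and a simple mean-interval miss predictor. All names and parameters are hypothetical; this is not the thesis's algorithm.

```python
from typing import Optional, Sequence, Tuple

Coord = Tuple[int, int]


def manhattan_hops(a: Coord, b: Coord) -> int:
    """Hop count between two mesh nodes under minimal (e.g., XY) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def predict_next_miss(miss_times: Sequence[float]) -> Optional[float]:
    """Predict the next miss from the mean interval between past misses."""
    if len(miss_times) < 2:
        return None
    intervals = [b - a for a, b in zip(miss_times, miss_times[1:])]
    return miss_times[-1] + sum(intervals) / len(intervals)


def prefetch_issue_time(miss_times: Sequence[float], server: Coord, core: Coord,
                        cycles_per_hop: int, congestion_delay: float = 0.0) -> Optional[float]:
    """Time at which the memory-side server should push the data so that it arrives
    just before the predicted miss, clamped to no earlier than the last observed miss."""
    predicted = predict_next_miss(miss_times)
    if predicted is None:
        return None
    latency = manhattan_hops(server, core) * cycles_per_hop + congestion_delay
    return max(miss_times[-1], predicted - latency)


print(prefetch_issue_time([100, 200, 300], server=(0, 0), core=(3, 2),
                          cycles_per_hop=4, congestion_delay=10))  # 370.0
```

The point of timing is visible in the arithmetic: a core five hops away with some congestion needs the push to start 30 cycles before the predicted miss at cycle 400, whereas an event-based prefetcher would only react after a miss has already occurred.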
30

Estratégia para redução de congestionamento em sistemas multiprocessadores baseados em NOC (Strategy for reducing congestion in NoC-based multiprocessor systems)

Kamei, Camila Ascendina Nunes 07 August 2015
Two issues are critical in NoC-based MPSoC systems with memory parallelism: message delivery order and network congestion. Congestion is frequent in NoCs when packet demand exceeds the capacity of the network resources, and message order must be preserved so that cache-coherence information remains meaningful to the memories. Congestion-control methods for these systems must therefore handle network congestion while preserving the order of transactions. This work proposes a routing technique based on the Odd-Even routing algorithm, combined with the concepts of local and global network congestion, to choose the best forwarding path for communication packets, with the aim of reducing network communication bottlenecks in NoC-based MPSoC systems. In experiments with 16 cores, the proposed technique reduced energy consumption by 13.35%, and reduced packet latency by 25% compared with the XY algorithm and by 23% compared with the unmodified Odd-Even algorithm.
Funding: CNPq.
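The technique builds on the Odd-Even routing algorithm, which derives from Chiu's Odd-Even turn model: east-to-north/south turns are forbidden in even columns and north/south-to-west turns are forbidden in odd columns, which prevents deadlock while leaving some routing freedom. The sketch below shows the standard minimal Odd-Even route computation plus a selection function that weighs local and global congestion estimates. The weighting `alpha` and the load dictionaries are assumptions; the thesis's actual congestion metric is not given in the abstract. Note that adaptive routing of this kind can deliver packets out of order, which is why the abstract stresses preserving transaction order; this sketch does not handle reordering.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (column, row); "N" means increasing row


def odd_even_candidates(src: Coord, cur: Coord, dst: Coord) -> List[str]:
    """Admissible minimal output directions under the Odd-Even turn model."""
    s0, _ = src
    c0, c1 = cur
    d0, d1 = dst
    e0, e1 = d0 - c0, d1 - c1
    if e0 == 0 and e1 == 0:
        return ["LOCAL"]
    vertical = "N" if e1 > 0 else "S"
    if e0 == 0:
        return [vertical]
    dirs: List[str] = []
    if e0 > 0:  # eastbound
        if e1 == 0:
            return ["E"]
        if c0 % 2 == 1 or c0 == s0:  # E->N/S turns only allowed in odd columns (or at source)
            dirs.append(vertical)
        if d0 % 2 == 1 or e0 != 1:   # avoid needing an E->N/S turn in an even destination column
            dirs.append("E")
    else:  # westbound
        dirs.append("W")
        if c0 % 2 == 0 and e1 != 0:  # later N/S->W turn must not happen in an odd column
            dirs.append(vertical)
    return dirs


def select_output(candidates: List[str], local_load: Dict[str, float],
                  global_load: Dict[str, float], alpha: float = 0.5) -> str:
    """Pick the candidate with the lowest weighted congestion score (hypothetical metric)."""
    return min(candidates, key=lambda d: alpha * local_load.get(d, 0.0)
               + (1 - alpha) * global_load.get(d, 0.0))


cands = odd_even_candidates(src=(0, 0), cur=(0, 0), dst=(3, 2))
print(cands)                                                   # ['N', 'E']
print(select_output(cands, {"N": 3, "E": 1}, {"N": 0, "E": 4}))  # N
```

With only local buffer occupancy (`alpha=1.0`) the router would pick east; adding a global estimate that sees congestion further along the eastern path flips the choice to north, which is the kind of decision a local-plus-global scheme is meant to improve.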
