• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 91
  • 46
  • 20
  • 13
  • 8
  • 2
  • 1
  • 1
  • Tagged with
  • 195
  • 195
  • 59
  • 55
  • 50
  • 45
  • 35
  • 32
  • 32
  • 27
  • 27
  • 27
  • 26
  • 24
  • 22
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A Link-Level Communication Analysis for Real-Time NoCs

Gholamian, Sina January 2012 (has links)
This thesis presents a link-level latency analysis for real-time network-on-chip interconnects that use priority-based wormhole switching. This analysis incorporates both direct and indirect interferences from other traffic flows, and it leverages pipelining and parallel transmission of data across the links. The resulting link-level analysis provides a tighter worst-case upper-bound than existing techniques, which we verify with our analysis and simulation experiments. Our experiments show that on average, link-level analysis reduces the worst-case latency by 28.8%, and improves the number of flows that are schedulable by 13.2% when compared to previous work.
22

A verilog-hdl implementation of virtual channels in a network-on-chip router

Park, Sungho 15 May 2009 (has links)
As the feature size is continuously decreasing and integration density is increasing, interconnections have become a dominating factor in determining the overall quality of a chip. Due to the limited scalability of system bus, it cannot meet the requirement of current System-on-Chip (SoC) implementations where only a limited number of functional units can be supported. Long global wires also cause many design problems, such as routing congestion, noise coupling, and difficult timing closure. Network-on-Chip (NoC) architectures have been proposed to be an alternative to solve the above problems by using a packet-based communication network. The processing elements (PEs) communicate with each other by exchanging messages over the network and these messages go through buffers in each router. Buffers are one of the major resource used by the routers in virtual channel flow control. In this thesis, we analyze two kinds of buffer allocation approaches, static and dynamic buffer allocations. These approaches aim to increase throughput and minimize latency by means of virtual channel flow control. In statically allocated buffer architecture, size and organization are design time decisions and thus, do not perform optimally for all traffic conditions. In addition, statically allocated virtual channel consumes a waste of area and significant leakage power. However, dynamic buffer allocation scheme claims that buffer utilization can be increased using dynamic virtual channels. Dynamic virtual channel regulator (ViChaR), have been proposed to use centralized buffer architecture which dynamically allocates virtual channels and buffer slots in real-time depending on traffic conditions. This ViChaR’s dynamic buffer management scheme increases buffer utilization, but it also increases design complexity. In this research, we reexamine performance, power consumption, and area of ViChaR’s buffer architecture through implementation. We implement a generic router and a ViChaR architecture using Verilog-HDL. These RTL codes are verified by dynamic simulation, and synthesized by Design Compiler to get area and power consumption. In addition, we get latency through Static Timing Analysis. The results show that a ViChaR’s dynamic buffer management scheme increases the latency and power consumption significantly even though it could increase buffer utilization. Therefore, we need a novel design to achieve high buffer utilization without a loss.
23

HW/SW Codesign and Design, Evaluation of Software Framework for AcENoCs : An FPGA-Accelerated NoC Emulation Platform

Pai, Vinayak 2010 December 1900 (has links)
Majority of the modern day compute intensive applications are heterogeneous in nature. To support their ever increasing computational requirements, present day System-on-Chip (SoC) architectures have adapted multicore style of modeling, thereby incorporating multiple, heterogeneous processing cores on a single chip. The emerging Network-On-Chip (NoC) interconnect paradigm provides a scalable and power-efficient solution for communication among multiple cores, serving as a powerful replacement for traditional bus based architectures. A fast, robust and exible emulation platform is the key to successful realization and validation of such architectures within a very short span of time. This research focuses on various aspects of Hardware/Software (HW/SW) codesign for AcENoCs (Accelerated Emulation Platform for NoCs), a Field Programmable Gate Array (FPGA) accelerated, con gurable, cycle accurate platform for emulation and validation of NoC architectures. This work also details the design, implementation and evaluation of AcENoCs' software framework along with the various design optimizations carried out and tradeoffs considered in AcENoCs' HW/SW codesign for achieving an optimum balance between emulated network dimensions and emulation performance. AcENoCs emulation platform is realized on a Xilinx Virtex-5 FPGA. AcENoCs' hardware framework consists of the NoC built using configurable hardware library components, while the software framework consists of Traffic Generators (TGs) and their associated source queues, Traffic Receptors (TRs) along with statistics analysis module and dynamically controlled emulation clock generator. The software framework is implemented using on-chip Xilinx MicroBlaze processor. This report also describes the interaction between various HW/SW events in an emulation cycle and assesses AcENoCs' performance speedup and tradeoffs over existing FPGA emulators and software simulators. FPGA synthesis results showed that networks with dimensions upto 5x5 could be accommodated inside the device. Varying synthetic traffic workloads, generated by TGs, were used to evaluate the network. Real application based traces were also run on AcENoCs platform to evaluate the performance improvement achieved in comparison to software simulators. For improving the emulator performance, software profiling was carried out to identify and optimize the software components consuming highest number of processor cycles in an emulation cycle. Emulation testcases were run and latency values recorded for varying traffic patterns in order to evaluate AcENoCs platform. Experimental results showed emulation speedups in order of 10000-12000X over HDL (Hardware Description Language) simulators and 14-47X over software simulators, without sacri cing cycle accuracy.
24

CoNoC: Fast Full Chip Topology Generation for Application-Specific Network on Chip

Chen, Shu-yu 08 January 2010 (has links)
We propose a synthesis methodology for Network-on-Chips (NoC) or NoC-based multiprocessor systems-on-chip (MPSoCs) for application-specific or irregular topology generation.We first propose simultaneously synthesize both for processor and communication architectures in order to estimate area and routing more accurately during floorplanning stage, which is different with traditional router and link insertion after floorplanning. Our NoC topology generation is simultaneously optimized for fast, low power and wirelength. Compared with the state of art, our results outperforms averagely 445.45 X in CPU time, 33.20 % in power consumption, and 96.86 % in wirelength at cost of NoC Size of more 2.26 % because our method considering router shape; the number of routers of more 20.63 % because our method only allows router port limit of 5; the number of links of more 3.93 % because our method allows different link lengths. Also our method is scalable and experiments of 2 X, 4 X, 8 X and 16 X outperform averagely 355,089.11 X in CPU time, 1.21 X in the number hops, 78.33 % in power consumption. Our experimental results show our synthesis method is effective, efficiently and scalable.
25

A Link-Level Communication Analysis for Real-Time NoCs

Gholamian, Sina January 2012 (has links)
This thesis presents a link-level latency analysis for real-time network-on-chip interconnects that use priority-based wormhole switching. This analysis incorporates both direct and indirect interferences from other traffic flows, and it leverages pipelining and parallel transmission of data across the links. The resulting link-level analysis provides a tighter worst-case upper-bound than existing techniques, which we verify with our analysis and simulation experiments. Our experiments show that on average, link-level analysis reduces the worst-case latency by 28.8%, and improves the number of flows that are schedulable by 13.2% when compared to previous work.
26

Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems

An, Baik Song 2012 August 1900 (has links)
High performance systems have been widely adopted in many fields and the demand for better performance is constantly increasing. And the need of powerful yet flexible systems is also increasing to meet varying application requirements from diverse domains. Also, power efficiency in high performance computing has been one of the major issues to be resolved. The power density of core components becomes significantly higher, and the fraction of power supply in total management cost is dominant. Providing dependability is also a main concern in large-scale systems since more hardware resources can be abused by attackers. Therefore, designing high-performance, power-efficient and secure systems is crucial to provide adequate performance as well as reliability to users. Adhering to using traditional design methodologies for large-scale computing systems has a limit to meet the demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. Chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is thought of as a good alternative to previous design trends. In this dissertation, we deal with various design issues of high performance multiprocessor systems based on CMP to achieve both performance and power efficiency while maintaining security. First, we propose a fast and secure off-chip interconnects through minimizing network overheads and providing an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics in CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency. Simulation results show that the proposed schemes improve the performance of off-chip networks through reducing the message size by 54% on average. Also, the schemes diminish the overheads of bounds checking operations, thus enhancing the overall performance by 11% on average. Adopting hybrid buffers in NoC routers contributes to increasing the network throughput up to 21%.
27

Design and Analysis of Location Cache in a Network-on-Chip Based Multiprocessor System

Ramakrishnan, Divya 20 April 2009 (has links)
No description available.
28

Algoritmo de prefetching de dados temporizado para sistemas multiprocessadores baseados em NOC

SILVEIRA, Maria Cireno Ribeiro 09 March 2015 (has links)
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2016-03-15T13:58:26Z No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) UFPE-MEI 2015-078 - Maria Cireno Ribeiro Silveira.pdf: 4578273 bytes, checksum: 1c434494e0c03cb02156a37ebfd1c7da (MD5) / Made available in DSpace on 2016-03-15T13:58:26Z (GMT). No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) UFPE-MEI 2015-078 - Maria Cireno Ribeiro Silveira.pdf: 4578273 bytes, checksum: 1c434494e0c03cb02156a37ebfd1c7da (MD5) Previous issue date: 2015-03-09 / O prefetching é uma técnica considerada e ciente para mitigar um problema já conhecido em sistemas computacionais: a diferença entre o desempenho do processador e do acesso à memória. O objetivo do prefetching é aproximar o dado do processador retirando-o da memória e carregando na cache local. Uma vez que o dado seja requisitado pelo processador, ele já estará disponível na cache, reduzindo a taxa de perdas e a penalidade do sistema. Para sistemas multiprocessadores baseados em NoCs a e ciência do prefetching é ainda mais crítica em relação ao desempenho, uma vez que o tempo de acesso ao dado varia dependendo da distância entre processador e memória e do tráfego da rede. Este trabalho propõe um algoritmo de prefetching de dados temporizado, que tem como objetivo minimizar a penalidade dos núcleos através uma solução de prefetching baseada em predição de tempo para sistemas multiprocessadores baseados em NoC. O algoritmo utiliza um processo pró-ativo iniciado pelo servidor para realizar requisições de prefetching baseado no histórico de perdas de cache e informações da NoC. Nos experimentos realizados para 16 núcleos, o algoritmo proposto reduziu a penalidade dos processadores em 53,6% em comparação com o prefetching baseado em eventos (faltas na cache), sendo a maior redução de 29% da penalidade. / The prefetching technique is an e ective approach to mitigate a well-known problem in multi-core processors: the gap between computing and data access performance. The goal of prefetching is to approximate data to the CPU by retrieving the data from the memory and loading it in the cache. When the data is requested by the CPU, it is already available in the cache, reducing the miss rate and penalty. In multiprocessor NoC-based systems the prefetching e ciency is even more critical to system performance, since the access time depends of the distance between the requesting processor and the memory and also of the network tra c. This work proposes a temporized data prefetching algorithm that aims to minimize the penalty of the cores through one prefetching solution based on time prediction for multiprocessor NoC-based systems. The algorithm utilizes a proactive process initiated by the server to request prefetching data based on cache miss history and NoC's information. In the experiments for 16 cores, the proposed algorithm has successfully reduced the processors penalty in 53,6% compared to the event-based prefetching and the best case was a penalty reduction of 29%.
29

Estratégia para redução de congestionamento em sistemas multiprocessadores baseados em NOC

KAMEI, Camila Ascendina Nunes 07 August 2015 (has links)
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2016-07-01T13:03:48Z No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) dissertacao_Camila_Ascendina_Nunes_Kamei.pdf: 2427056 bytes, checksum: 9c4bd5bb499271557f86edce757edec2 (MD5) / Made available in DSpace on 2016-07-01T13:03:48Z (GMT). No. of bitstreams: 2 license_rdf: 1232 bytes, checksum: 66e71c371cc565284e70f40736c94386 (MD5) dissertacao_Camila_Ascendina_Nunes_Kamei.pdf: 2427056 bytes, checksum: 9c4bd5bb499271557f86edce757edec2 (MD5) Previous issue date: 2015-08-07 / CNPq / Duas questões são críticas em sistemas com paralelismo de memória em rede NoC baseados em MPSoC, a ordem de entrega da mensagem e o congestionamento da rede. Os congestionamentos são frequentes em NoC quando as demandas de pacotes excedem a capacidade dos recursos da rede e a ordem das mensagens precisam ser mantidas para que a informação de coerência de cache tenha signi cado para as memórias. Assim, métodos de controle de congestionamento são necessários para estes sistemas e devem lidar com o congestionamento da rede, enquanto mantém a ordem das transações. Este trabalho propõe uma técnica de roteamento baseada no algoritmo de roteamento Odd-Even associado ao conceito de congestionamento local e global da rede para a escolha do melhor caminho de encaminhamento dos pacotes de comunicação. Desta forma se objetiva a redução dos gargalos de comunicação da rede para os sistemas NoC baseado em MPSoC. Nos experimentos realizados para 16 núcleos, a técnica proposta alcançou a redução de 13,35% da energia consumida, 25% de redução de latência de envio de pacotes em comparação o algoritmo XY e 23% de redução de latência de envio de pacotes em comparação o algoritmo Odd-Even sem modi cação. / Two issues are critical in systems with memory parallelism network NoC-based MPSoC, the delivery order of messages and network congestion. The congestions are frequent in NoC when the packages demands exceed the capacity of the network resources and the order of the messages need to be maintained so that the cache coherency information is meaningful to the memories. Thus, congestion control methods are needed to deal with network congestion while they keep the order of the transactions. This paper proposes the use of the routing algorithm Odd-Even associated with the concept of local and global network congestion to choose the best routing path of communication packages. In this way it aims to reduce the network communication bottlenecks for NoC systems based on MPSoC. In experiments conducted for 16 cores, the proposed technique has achieved the reduction of 13.35 % of energy consumption, 25% of latency compared with the XY algorithm and 23% of latency compared with the Odd-Even algorithm without the modi cation.
30

Advanced Connection Allocation Techniques in Circuit Switching Network on Chip

Chen, Yong 14 September 2017 (has links) (PDF)
With the advancement of semiconductor technology, the System on Chip (SoC) is becoming more and more complex, so the on-chip communication has become a bottleneck of SoC Design. Since the traditional bus system is inefficient and not scalable, the Network-On-Chip (NoC) has emerged as the promising communication mechanism for complex SoCs. As some systems have specific performance requirements, such as a minimum throughput (for real-time streaming data) or bounded latency (for interrupts, process synchronization, etc), communication with Guaranteed Service (GS) support becomes crucial for predictable SoC architectures. Circuit Switching (CS) is a popular approach to support GS, which firstly has to allocate an exclusively connection (circuit) between the source and destination nodes, and then the data packets are delivered over this connection. However, it is inefficient and inflexible because the resource is occupied by single connection during its whole lifetime, which can block other communications. Hence, two extensions of CS have been proposed to share resources: i) Time-Division Multiplexing (TDM), in which the available link capacity is split into multiple time slots to be shared by different flows in TDM scheme; and ii) Space-Division-Multiplexing (SDM), in which only a subset (sub-channel) of the link wires is exclusively allocated to a specific connection, while the remaining wires of the link can be used by other flows. The connection allocation is critical for CS, since the data delivery can start only after the associated connection is allocated. In this thesis, we propose a dedicated hardware connection allocator to solve the dynamic connection allocation problem for CS NoCs, which has to i) allocate a contention-free path between source-destination pairs and ii) allocate appropriate portions of link bandwidth (appropriate number of time slots and subsets) along the path. The dedicated connection allocator, called NoCManager, solves the connection allocation problem by employing a trellis-search based shortest path algorithm. The trellis search can explore all possible paths between source node and destination. Moreover, it shall find the requested path in a fixed low latency and can guarantee the path optimality in terms of path length if the path is available. In this thesis, two different trellis graphs, Forward-Backtrack trellis and Register-Exchange trellis are proposed. The Forward-Backtrack trellis completes the path search in two steps: forward search and backtracking. Firstly, the forward search begins at source node that traverses the network to find the free path. When destination node is reached, the backtrack starts from destination to select the survivor path and collect the associated path parameters. However, Register-Exchange trellis saves the entire survivor path sequences during forward search. Consequently, the backtracking step can be omitted, and thus the allocation time is halved compared to forward-backtrack approaches. Moreover, each trellis graph consists of three categories, unfolded structure, folded structure and bidirectional structure. The unfolded structure can provide high allocation speed while folded structure is more efficient from a hardware point of view. The bidirectional structure starts the search at two sides, source node and destination node simultaneously, so the allocation speed is 2 times faster than previous unidirectional search. Furthermore, in order to address the scalability issue of previous centralized systems, the partitioned architecture (i.e. spatial partitioning technique) is proposed to divide the large system into multiple smaller differentiated logical partitions served by local NoCManagers. This partitioning technique keeps the request load of the manager and manager-node communication overhead moderate. Inside each partition, the path search problem is solved by a local manager with trellis-search algorithm. To establish a path that crosses partitions, the managers communicate with each other in distributed manner to converge the global path. In order to further enhance the path diversity and resource utilization, we adopt the combined TDM and SDM technique. In combined TDM-SDM approach, each SDM sub-channel is split into multiple time slots so that can be shared by multiple flows. Hence, the number of sub-channels can be kept moderate to reduce router complexity, while still providing higher path diversity than TDM scheme. In order to investigate and optimize TDM-SDM partitioning strategy, we studied the influence of different TDM-SDM link partitioning strategies on success rate and path length that allowed us to find the optimal solution. The dedicated connection allocator using the trellis-search algorithm is employed for TDM, SDM and TDM-SDM CS. In the end, we present the router architecture that combines the circuit-switching network (for GS communication) and packet-switching network (for best-effort communication).

Page generated in 0.0327 seconds