Global ETD Search

11	On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation Bhardwaj, Kshitij January 2018 (has links) In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating multi-billion transistors. However, with this ever increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) is an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. Once such pattern is multicast communication, where a source sends packets to arbitrary number of destinations. Multicast is not only common in parallel computing, such as for cache coherency, but also for emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs. This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques, and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as significantly advancing the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there has been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computeraided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative that always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission. In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch. Engineering Computer science Networks on a chip Multicasting (Computer networks)
12	Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip Yoon, Young Jin January 2017 (has links) Due to the tight power budget and reduced time-to-market, Systems-on-Chip (SoC) have emerged as a power-efficient solution that provides the functionality required by target applications in embedded systems. To support a diverse set of applications such as real-time video/audio processing and sensor signal processing, SoCs consist of multiple heterogeneous components, such as software processors, digital signal processors, and application-specific hardware accelerators. These components offer different flexibility, power, and performance values so that SoCs can be designed by mix-and-matching them. With the increased amount of heterogeneous cores, however, the traditional interconnects in an SoC exhibit excessive power dissipation and poor performance scalability. As an alternative, Networks-on-Chip (NoC) have been proposed. NoCs provide modularity at design-time because communications among the cores are isolated from their computations via standard interfaces. NoCs also exploit communication parallelism at run-time because multiple data can be transferred simultaneously. In order to construct an efficient NoC, the communication behaviors of various heterogeneous components in an SoC must be considered with the large amount of NoC design parameters. Therefore, providing an efficient NoC design and optimization framework is critical to reduce the design cycle and address the complexity of future heterogeneous SoCs. This is the thesis of my dissertation. Some existing design automation tools for NoCs support very limited degrees of automation that cannot satisfy the requirements of future heterogeneous SoCs. First, these tools only support a limited number of NoC design parameters. Second, they do not provide an integrated environment for software-hardware co-development. Thus, I propose FINDNOC, an integrated framework for the generation, optimization, and validation of NoCs for future heterogeneous SoCs. The proposed framework supports software-hardware co-development, incremental NoC design-decision model, SystemC-based NoC customization and generation, and fast system protyping with FPGA emulations. Virtual channels (VC) and multiple physical (MP) networks are the two main alternative methods to provide better performance, support quality-of-service, and avoid protocol deadlocks in packet-switched NoC design. To examine the effect of using VCs and MPs with other NoC architectural parameters, I completed a comprehensive comparative analysis that combines an analytical model, synthesis-based designs for both FPGAs and standard-cell libraries, and system-level simulations. Based on the results of this analysis, I developed VENTTI, a design and simulation environment that combines a virtual platform (VP), a NoC synthesis tool, and four NoC models characterized at different abstraction levels. VENTTI facilitates an incremental decision-making process with four NoC abstraction models associated with different NoC parameters. The selected NoC parameters can be validated by running simulations with the corresponding model instantiated in the VP. I augmented this framework to complete FINDNOC by implementing ICON, a NoC generation and customization tool that dynamically combines and customizes synthesizable SystemC components from a predesigned library. Thanks to its flexibility and automatic network interface generation capabilities, ICON can generate a rich variety of NoCs that can be then integrated into any Embedded Scalable Platform (ESP) architectures for fast prototying with FPGA emulations. I designed FINDNOC in a modular way that makes it easy to augmenting it with new capabilities. This, combined with the continuous progress of the ESP design methodology, will provide a seamless SoC integration framework, where the hardware accelerators, software applications, and NoCs can be designed, validated, and integrated simultaneously, in order to reduce the design cycle of future SoC platforms. Signal processing Networks on a chip Systems on a chip Computer science Computer engineering
13	Design Space Analysis and a Novel Routing Agorithm for Unstructured Networks-on-Chip Parashar, Neha 01 January 2010 (has links) Traditionally, on-chip network communication was achieved with shared medium networks where devices shared the transmission medium with only one device driving the network at a time. To avoid performance losses, it required a fast bus arbitration logic. However, a single shared bus has serious limitations with the heterogeneous and multi-core communication requirements of today's chip designs. Point-to-point or direct networks solved some of the scalability issues, but the use of routers and of rather complex algorithms to connect nodes during each cycle caused new bottlenecks. As technology scales, the on-chip physical interconnect presents an increasingly limiting factor for performance and energy consumption. Network-on-chip, an emerging interconnect paradigm, provide solutions to these interconnect and communication challenges. Motivated by future bottom-up self-assembled fabrication techniques, which are believed to produce largely unstructured interconnect fabrics in a very inexpensive way, the goal of this thesis is to explore the design trade-offs of such irregular, heterogeneous, and unreliable networks. The important measures we care about for our complex on-chip network models are the information transfer, congestion avoidance, throughput, and latency. We use two control parameters and a network model inspired by Watts and Strogatz's small-world network model to generate a large class of different networks. We then evaluate their cost and performance and introduce a function which allows us to systematically explore the trade-offs between cost and performance depending on the designer's requirement. We further evaluate these networks under different traffic conditions and introduce an adaptive and topology-agnostic ant routing algorithm that does not require any global control and avoids network congestion. Networks on a chip Computer algorithms Routing (Computer network management)
14	Optimal Network Topologies and Resource Mappings for Heterogeneous Networks-on-Chip Chung, Haera 01 January 2013 (has links) Communication has become a bottleneck for modern microprocessors and multi-core chips because metal wires don't scale. The problem becomes worse as the number of components increases and chips become bigger. Traditional Systems-on-Chips (SoCs) interconnect architectures are based on shared-bus communication, which can carry only one communication transaction at a time. This limits the communication bandwidth and scalability. Networks-on-Chip (NoC) were proposed as a promising solution for designing large and complex SoCs. The NoC paradigm provides better scalability and reusability for future SoCs, however, long-distance multi-hop communication through traditional metal wires suffers from both high latency and power consumption. A radical solution to address this challenge is to add long-range, low power, and high-bandwidth single-hop links between distant cores. The use of optical or on-chip RF wireless links has been explored in this context. However, all previous work has focused on regular mesh-based metal wire fabrics that were expanded with one or two additional link types only for long-distance communication. In this thesis we address the following main research questions to address the above-mentioned challenges: (1) What library of different link types would represent an optimum in the design space? (2) How would these links be used to design an application-specific NoC architecture? (3) How would applications use the resulting NoC architecture efficiently? We hypothesize that networks with a higher degree of heterogeneity, i.e., three or more link types, will improve the network throughput and consume less energy compared to traditional NoC architectures. In order to verify our hypothesis and to address the research challenges, we design and analyze optimal heterogeneous networks under different realistic traffic models by considering different cost and performance trade-offs in a comprehensive technology-agnostic simulation framework that uses metaheuristic optimization techniques. As opposed to related work, our heterogeneous links can be placed anywhere in the network, which allows to explore the entire search space. The resulting application-specific networks are then analyzed by using complex network techniques, such as community detection and small-worldness, to understand how heterogeneous link types are used to improve the NoCs performance and cost. Next, we use the application-specific networks as a target architecture for other applications. The goal is to evaluate the performance of our new NoCs for applications they have not been designed for by finding optimal resource allocations. Our results show that there is an optimal number of heterogeneous link types for each set of constraints and that networks with three or more heterogeneous link types provide significantly higher throughput along with lower energy consumption compared to both homogeneous link type and regular 2D mesh networks under three different traffic scenarios. Our evolved networks with three different technology-driven link types, namely metal wires, wireless, and optical links, provide 15% more throughput and fourteen times less energy consumption compared to homogeneous link type network. When ten different abstract link types are used in the design, 12% more throughput and 52% less energy consumption are obtained compared to networks with three different technology-driven link types. This shows that heterogeneous NoC designs based on traditional metal wires, wireless, and optical links, occupy a non-optimal spot in the entire design space. Our results further show that heterogeneous NoCs scale up significantly better in terms of performance and cost compared to mesh networks. We uncovered that network communities evolve robustly and that heterogeneous link types are efficiently establishing inter- and intra-subnet connections depending on their link type properties. We also show that mapping an application on our application-specific NoC architecture provides on average 45% more throughput at 70% less energy consumption compared to regular 2D mesh networks. The NoCs are therefore not only good for the application they were designed for, but for a broad range of other applications as well. Networks on a chip Heterogeneous computing Computer and Systems Architecture Systems and Communications
15	Analysis and optimization of global interconnects for many-core architectures Balakrishnan, Anant 02 December 2010 (has links) The objective of this thesis is to develop circuit-aware interconnect technology optimization for network-on-chip based many-core architectures. The dimensions of global interconnects in many-core chips are optimized for maximum bandwidth density and minimum delay taking into account network-on-chip router latency and size effects of copper. The optimal dimensions thus obtained are used to characterize different network-on-chip topologies based on wiring area utilization, maximum core-to-core channel width, aggregate chip bandwidth and worse case latency. Finally, the advantages of many-core many-tier chips are evaluated for different network-on-chip topologies. Area occupied by a router within a core is shown to be the bottleneck to achieve higher performance in network-on-chip based architectures. Circuit optimization Routing Delay estimation Multiprocessor interconnections Networks on a chip Moore's law Semiconductors
16	Construção e avaliação de uma solução eficiente para comunicação entre processadores SPARCv8 / Development and evaluation of an efficient solution for SPARCv8 processors communication Abdnur, Thiago Borges, 1984- 12 November 2012 (has links) Orientador: Rodolfo Jardim de Azevedo / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-22T08:24:56Z (GMT). No. of bitstreams: 1 Abdnur_ThiagoBorges_M.pdf: 3580657 bytes, checksum: 2f83cda26eeb7b31a6ed647c31e27117 (MD5) Previous issue date: 2012 / Resumo: Com a mudança da maior parte das arquiteturas convencionais para multi-core a comunica _cão entre as diferentes unidades de processamento se torna um problema de destaque, principalmente no que tange _a transferência de dados entre cores. Apesar do enorme impacto no desempenho, é limitado o número de trabalhos científicos que tratam sobre novas soluções para o problema, o foco mais comum é realizar a comunicação através da memória ou endereços específicos mapeados em memória. Nesta dissertação foi definido um modelo de comunicação que acrescenta três novas instruções ao conjunto de instruções do SPARCv8, permitindo que diferentes cores transportem dados entre si diretamente, sem a latência derivada do uso de uma memória compartilhada e de Lucas, como _e o caso da atual implementação do LEON3. Avaliou-se esse modelo de comunicação através de diversos tipos de aplicações sintéticas como produtor-consumidor e pipeline. Para tornar o protótipo em FPGA mais realista, também foi construído um modelo de atraso para a memória principal do sistema, para que o desempenho relativo entre processador e memória _que mais próximo do real. Foi adicionado um suporte básico _as novas instruções no compilador para seu uso em código C através de asm-inline. De forma geral, obteve-se ganhos de 3% _a até 70 vezes, em termos de tempo de execução, em comparação ao uso de memória compartilhada e Lucas / Abstract: As processors design shift towards multicore architectures, new challenges arise to increase the core to core communication efficiency. Despite the potential huge performance impact, the number of papers focusing on this problem is limited. In this project, we define a communication model, adding three new instructions to the SPARCv8 instruction set, to allow different cores to communicate directly, without the shared memory and lock latencies. We implemented the model inside the LEON3 VHDL and evaluated it using synthetic benchmarks like producer-consumer and pipeline. To make the FPGA prototype timings more realistic, we also implemented a new memory timer so that it keeps the processor-memory speed ratio closer to real values. We also created the basic compiler support for these new instructions through intrinsic, converted to inline assembly in C code. Our overall results improve the performance from 3% to up to 70 times faster / Mestrado / Ciência da Computação / Mestre em Ciência da Computação Arquitetura de computador Redes-em-chip Circuitos integrados digitais Computer architecture Networks on a chip Digital integrated circuits
17	Integrated Hybrid Voltage Regulation and Adaptive Clocking for System-on-Chips Loscalzo, Erik Jens January 2024 (has links) System-on-chips (SoCs) have become fundamental components in modern electronic devices, from low-power microcontrollers to high-performance AI computing systems. With the increasing demand for performance and efficiency, innovative approaches in power management and clocking mechanisms are increasingly important. One such approach combines multiple regulator architectures to form a hybrid voltage regulation, which this work demonstrated with buck converters and digital low-dropout (D-LDO) regulators. Additionally, the increasing demand for sub-micro-second voltage scaling in SoCs has pushed regulators to be fully integrated in-package and/or on-chip. Buck converters still offer the highest efficiency compared to other converter topologies but present integration challenges that this work addresses by utilizing a package integrated voltage regulator (PIVR) with full back-end integration of magnetic-core power inductors. The on-chip D-LDO demonstrated a fully standard cell-based distributed design integrated into an advanced 12nm FinFET process. A focus on reducing excess timing margins has led to a push towards advanced clocking mechanisms like adaptive clocking, which has caused a shift from more traditional PLL-based dynamic voltage and frequency scaling to unified voltage and frequency scaling architectures that use tunable replica oscillators to decrease timing excess timing margins due to voltage droop, process variations, thermals, and aging. This work implemented UVFS with an HVR architecture using a multi-output PIVR cascaded with on-chip D-LDOs and demonstrated it in a complex 22-core network-on-chip SoC in 12nm FinFET. Electrical engineering Computer engineering Systems on a chip Voltage-frequency converters Networks on a chip
18	An Exploration Of Heterogeneous Networks On Chip Grimm, Allen Gary 01 January 2011 (has links) As the the number of cores on a single chip continue to grow, communication increasingly becomes the bottleneck to performance. Networks on Chips (NoC) is an interconnection paradigm showing promise to allow system size to increase while maintaining acceptable performance. One of the challenges of this paradigm is in constructing the network of inter-core connections. Using the traditional wire interconnect as long range links is proving insufficient due to the increase in relative delay as miniaturization progresses. Novel link types are capable of delivering single-hop long-range communication. We investigate the potential benefits of constructing networks with many link types applied to heterogeneous NoCs and hypothesize that a network with many link types available can achieve a higher performance at a given cost than its homogeneous network counterpart. To investigate NoCs with heterogeneous links, a multiobjective evolutionary algorithm is given a heterogeneous set of links and optimizes the number and placement of those links in an NoC using objectives of cost, throughput, and energy as a representative set of a NoC's quality. The types of links used and the topology of those links is explored as a consequence of the properties of available links and preference set on the objectives. As the platform of experimentation, the Complex Network Evolutionary Algorithm (CNEA) and the associated Complex Network Framework (CNF) are developed. CNEA is a multiobjective evolutionary algorithm built from the ParadisEO framework to facilitate the construction of optimized networks. CNF is designed and used to model and evaluate networks according to: cost of a given topology; performance in terms of a network's throughput and energy consumption; and graph-theory based metrics including average distance, degree-, length-, and link-distributions. It is shown that optimizing complex networks to cost as a function of total link length and average distance creates a power-law link-length distribution. This offers a way to decrease the average distance of a network for a given cost when compared to random networks or the standard mesh network. We then explore the use of several types of constrained-length links in the same optimization problem and find that, when given access to all link types, we obtain networks that have the same or smaller average distance for a given cost than any network that is produced when given access to only one link type. We then introduce traffic on the networks with an interconnect-based packet-level shortest-path-routed traffic model. We find heterogeneous networks can achieve a throughput as good or better than the homogeneous network counterpart using the same amount of link. Finally, these results are confirmed by augmenting a wire-based mesh network with non-traditional link types and finding significant increases the overall performance of that network. Multiobjective optimization Evolutionary algorithm Network design problem Heterogeneous computing

Search results