Global ETD Search

1	Energy-aware synthesis for networks on chip architectures Chan, Jeremy, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) The Network on Chip (NoC) paradigm was introduced as a scalable communication infrastructure for future System-on-Chip applications. Designing application specific customized communication architectures is critical for obtaining low power, high performance solutions. Two significant design automation problems are the creation of an optimized configuration, given application requirement the implementation of this on-chip network. Automating the design of on-chip networks requires models for estimating area and energy, algorithms to effectively explore the design space and network component libraries and tools to generate the hardware description. Chip architects are faced with managing a wide range of customization options for individual components, routers and topology. As energy is of paramount importance, the effectiveness of any custom NoC generation approach lies in the availability of good energy models to effectively explore the design space. This thesis describes a complete NoC synthesis ???ow, called NoCGEN, for creating energy-efficient custom NoC architectures. Three major automation problems are addressed: custom topology generation, energy modeling and generation. An iterative algorithm is proposed to generate application specific point-to-point and packet-switched networks. The algorithm explores the design space for efficient topologies using characterized models and a system-level ???oorplanner for evaluating placement and wire-energy. Prior to our contribution, building an energy model required careful analysis of transistor or gate implementations. To alleviate the burden, an automated linear regression-based methodology is proposed to rapidly extract energy models for many router designs. The resulting models are cycle accurate with low-complexity and found to be within 10% of gate-level energy simulations, and execute several orders of magnitude faster than gate-level simulations. A hardware description of the custom topology is generated using a parameterizable library and custom HDL generator. Fully reusable and scalable network components (switches, crossbars, arbiters, routing algorithms) are described using a template approach and are used to compose arbitrary topologies. A methodology for building and composing routers and topologies using a template engine is described. The entire flow is implemented as several demonstrable extensible tools with powerful visualization functionality. Several experiments are performed to demonstrate the design space exploration capabilities and compare it against a competing min-cut topology generation algorithm. Networks on a chip.
2	Energy-aware synthesis for networks on chip architectures Chan, Jeremy, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) The Network on Chip (NoC) paradigm was introduced as a scalable communication infrastructure for future System-on-Chip applications. Designing application specific customized communication architectures is critical for obtaining low power, high performance solutions. Two significant design automation problems are the creation of an optimized configuration, given application requirement the implementation of this on-chip network. Automating the design of on-chip networks requires models for estimating area and energy, algorithms to effectively explore the design space and network component libraries and tools to generate the hardware description. Chip architects are faced with managing a wide range of customization options for individual components, routers and topology. As energy is of paramount importance, the effectiveness of any custom NoC generation approach lies in the availability of good energy models to effectively explore the design space. This thesis describes a complete NoC synthesis ???ow, called NoCGEN, for creating energy-efficient custom NoC architectures. Three major automation problems are addressed: custom topology generation, energy modeling and generation. An iterative algorithm is proposed to generate application specific point-to-point and packet-switched networks. The algorithm explores the design space for efficient topologies using characterized models and a system-level ???oorplanner for evaluating placement and wire-energy. Prior to our contribution, building an energy model required careful analysis of transistor or gate implementations. To alleviate the burden, an automated linear regression-based methodology is proposed to rapidly extract energy models for many router designs. The resulting models are cycle accurate with low-complexity and found to be within 10% of gate-level energy simulations, and execute several orders of magnitude faster than gate-level simulations. A hardware description of the custom topology is generated using a parameterizable library and custom HDL generator. Fully reusable and scalable network components (switches, crossbars, arbiters, routing algorithms) are described using a template approach and are used to compose arbitrary topologies. A methodology for building and composing routers and topologies using a template engine is described. The entire flow is implemented as several demonstrable extensible tools with powerful visualization functionality. Several experiments are performed to demonstrate the design space exploration capabilities and compare it against a competing min-cut topology generation algorithm. Networks on a chip.
3	Design and implementation of low-latency networks-on-chip. / CUHK electronic theses & dissertations collection January 2010 (has links) Asynchronous circuits are usually applied for the communications between multiple clock-domain blocks in some SoCs. According to application-specific traffic, efficiently allocating reasonable buffers in an asynchronous NoC router can avoid the waste or shortage of buffer resource. The method of application-specific asynchronous First-In-First-Out buffer allocation can reduce the silicon area and the power consumption to improve the network latency. According to given traffic pattems, the save of area buffer of our buffer-allocation method can be up to near 30% and the latency is reduced a little at same time. / Bypass schemes is efficient to reduce the average propagation cycles in NoCs. We propose novel lookahead bypass scheme to improve the network latency. The lookahead bypass router is implemented and evaluations of valious configurations are compared, where the proposed architecture significantly improves the packet latency up to 32.1 % over a baseline router. These prove that the router can reduce the average network latency and power consumption, and decreases the reliance on large buffers and virtual channels. Furthermore, the application-specific short-circuit channel is introduced to add some short cuts in a router to bypass the crossbar switch. It can provide additional internal channels to bypass the crossbar and increase the total probability of lookahead bypass. Therefore, the latency can be further reduced. And the throughput can be increased in some applications. / Multicast is preferred in parallel computers. It is an inherent fault of network-on-chip as compared with competitor bus architecture. Software method is a conventional method to implement multicast, but there is a large overhead in latency. The latency overhead of a 4-flit multicast packet achieves 6&sim;7 times as compared with tree-based or path-based hardware multicast. Hardware multicast support is necessary in these applications. A group-based hardware multicast method is desclibed and estimated in this thesis. Quality of service is also introduced to speed up multicast packets. / On-chip communication infrastructures are inunensely important today. As silicon technology allows more than one billion of transistors in a single piece of silicon, the system-on-chip (SoC) circuits can contain already a large number of processing elements (PEs). Therefore, the Networks-on-Chip (NoCs) are a generally accepted concept to solve the problems such as the scalability and throughput limitation, and physical design problems inherent in dedicated links and shared buses. However, the state-of-the-art on-chip network suffers from latency overhead due to the additional network as compared with dedicated wire connection. According to the different application enviromnents, there are different low-latency technologies for networks-on-chip. This thesis proposes some methods for low-latency NoCs design to relax the latency overhead, which include application-specific asynchronous buffer allocation, hardware multicast support, lookahead bypass scheme and short-circuit crossbar channel optimization. / Xin, Ling. / Adviser: Chui-Sing Choy. / Source: Dissertation Abstracts International, Volume: 73-03, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 157-164). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
4	Design and implementation of networks-on-chip: a cost-efficient framework. / CUHK electronic theses & dissertations collection January 2010 (has links) Integrating many processing elements (PE) in a single chip is inevitable as silicon technology allows more than one billion of transistors in a single piece of silicon. Networks-on-Chip (NoCs) has been proposed as a scalable solution to both increasing bandwidth requirements and physical design problems for multi-PE chips. However, as multi-PE chips drive the design focus to shift from the computation-centric to communication-centric, area and power costs consumed by communication has become comparable to what computation consumes. / The second direction is to reduce hop counts of packets when they travel from sources to destinations, and thus to reduce power consumption of NoCs. The reduction of hop counts is realized by using a recently proposed express virtual channel (EVC) technique to virtually bypass intermediate routers. We study the EVC technique in two domains. The first domain is to present a high-level, application-specific methodology to improve power efficiency of EVC paths early in the design stage. The methodology includes three steps. Firstly, aggregate communication loads between routers are calculated. Secondly, an energy reduction model and an energy overhead model are developed. Finally, energy savings of all possible EVCs path are calculated and a greedy algorithm is applied to insert EVCs paths in an iterative way. / The second domain is to exploit the EVC flow control in design and implementation of low-power NoCs. We firstly present cost-efficient hardware components for both EVC source and EVC bypass routers, then propose a statistical approach to customize buffer architectures for EVC networks, then describe creative use of low-power circuit techniques such as clock gating and operand isolation for EVC routers, and finally evaluate EVC NoCs through detailed ASIC implementations. Results show that EVC NoCs can save up to 34.26% of power compared to baseline NoCs. / This thesis tackles design and implementation of cost-efficient NoCs along two orthogonal directions. The first direction is to reduce area and power costs of a single virtual channel router. Through ASIC implementations, we find that allocator logic, including both virtual channel allocator (VA) and switch allocator (SA), consumes a large amount of costs. Based on RTL simulations for the entire NoCs, we identify great opportunities to reduce design costs of VA and then propose two low-complexity allocators: look-ahead VA and combined switch-VC allocator (SVA). Evaluations are performed for a wide range of traffic patterns and router parameters. Results show that both proposed architectures significantly reduce area and power costs of allocators without penalties on network performances. / Zhang, Min. / Source: Dissertation Abstracts International, Volume: 72-01, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 139-145). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
5	MNoC a network on chip for monitors / Madduri, Sailaja, January 2008 (has links) Thesis (M.S.E.C.E.)--University of Massachusetts Amherst, 2008. / Includes bibliographical references (p. 68-73). Networks on a chip. Computer monitors.
6	Design and performance optimization of asynchronous networks-on-chip Jiang, Weiwei January 2018 (has links) As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems. In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style. The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions. First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains. Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow. Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel. In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement. Computer science Networks on a chip Computer networks--Design
7	Silicon Photonic Subsystems for Inter-Chip Optical Networks Gazman, Alexander January 2019 (has links) The continuous growth of electronic compute and memory nodes in terms of the number of I/O pins, bandwidth, and areal throughput poses major integration and packaging challenges associated with offloading multi-Tbit/s data rates within the few pJ/bit targets. While integrated photonics are already deployed in long and short distances such as inter and intra data centers communications, the promising characteristics of the silicon photonic platform set it as the future technology for optical interconnects in ultra short inter-chip distances. The high index contrast between the waveguide and the cladding together with strong thermo-optic and carrier effects in silicon allows developing a wide range of micro-scale and low power optical devices compatible with the CMOS fabrication processes. Furthermore, the availability of photonic foundries and new electrical and optical co-packaging techniques further pushes this platform for the next steps of commercial deployment. The work in this dissertation presents the current trends in high-performance memory and processor nodes and gives motivation for disaggregated and reconfigurable inter-chip network enabled with the silicon photonic layer. A dense WDM transceiver and broadband switch architectures are discussed to support a bi-directional network of ten hybrid-memory cubes (HMC) interconnected to ten processor nodes with an overall aggregated bandwidth of 9.6Tbit/s. Latency and energy consumption are key performance parameters in a processor to primary memory nodes connectivity. The transceiver design is based on energy-efficient micro-ring resonators, and the broadband switch is constructed with 2x2 Mach-Zehnder elements for nano-second reconfiguration. Each transceiver is based on hundreds of micro-rings to convert the native HMC electrical protocol to the optical domain and the switch is based on tens of hundreds of 2x2 elements to achieve non-blocking all-to-all connectivity. The next chapters focus on developing methods for controlling and monitoring such complex and highly integrated silicon photonic subsystems. The thermo-optic effect is characterized and we show experimentally that the phase of the optical carrier can be reliably controlled with pulse-width modulation (PWM) signal, ultimately relaxing the need for hundreds of digital to analog converters (DACs). We further show that doped waveguide heaters can be utilized as \textit{in-line} optical power monitors by measuring photo-conductance current, which is an alternative for the conventional tapping and integration of photo-diodes. The next part concerned with a common cascaded micro-ring resonator in a WDM transceiver design. We develop on an FPGA control algorithm that abstracts the physical layer and takes user-defined inputs to set the resonances to the desired wavelength in a unicast and multicast transmission modes. The associated sensitivities of these silicon ring resonators are presented and addressed with three closed-loop solutions. We first show a closed-loop operation based on tapping the error signal from the drop port of the micro-ring. The second solution presents a resonance wavelength locking with a single digital I/O for control and feedback signals. Lastly, we leverage the photo-conductance effect and demonstrate the locking procedure using only the doped heater for both control and feedback purposes. To achieve the inter-chip reconfigurability we discuss recent advances of high-port-count SiP broadband switches for reconfigurable inter-chip networks. To ensure optimal operation in terms of low insertion loss, low cross-talk and high signal integrity per routing path, hundreds of 2x2 Mach-Zehnder elements need to be biased precisely for the cross and bar states. We address this challenge with a tapless and a design agnostic calibration approach based on the photo-conductance effect. The automated algorithm returns a look-up table for all for each 2x2 element and the associated calibrated biases. Each routing scenario is then tested for insertion loss, crosstalk and bit-error rate of 25Gbit/s 4-level pulse amplitude modulation signals. The last part utilizes the Mach-Zehnder interferometers in WDM transceiver applications. We demonstrate a polarization insensitive four-channel WDM receiver with 40Gbit/s per channel and a transmitter design generating 8-level pulse amplitude modulation signals at 30Gbit/s. Electrical engineering Networks on a chip Photonics--Materials
8	Design of platform for exploring application-specific NoC architecture. January 2011 (has links) Liu, Zhouyi. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 110-114). / Abstracts in English and Chinese. / ABSTRACTS --- p.I / 摘要 --- p.II / CONTENTS --- p.III / LIST OF FIGURE --- p.V / LIST OF TABLE --- p.VI / ACKNOWLEDGEMENT --- p.VII / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter 1.1 --- NETWORK-ON-CHIP --- p.1 / Chapter 1.2 --- RELATED WORKS --- p.2 / Chapter 1.3 --- PLATFORM OVERVEW --- p.6 / Chapter 1.4 --- AUTHOR'S CONTRIBUTION --- p.10 / Chapter CHAPTER 2 --- NOC LIBRARY --- p.12 / Chapter 2.1 --- NETWORK TERMINOLOGY --- p.12 / Chapter 2.2 --- BASIC STRUCTURE --- p.15 / Chapter 2.3 --- LOW-POWER ORIENTED ARCHITECTURE --- p.20 / Chapter 2.3.1 --- Low-Cost Allocator Design --- p.21 / Chapter 2.3.2 --- Clock Gating --- p.22 / Chapter 2.3.3 --- Express Virtual Channel Insertion --- p.22 / Chapter 2.4 --- LOW-LATENCY ORIENTED ARCHITECTURE --- p.28 / Chapter 2.4.1. --- Lookahead Bypass Scheme --- p.29 / Chapter 2.4.2. --- Lookahead Bypass Router Architecture --- p.29 / Chapter CHAPTER 3 --- BENCHMARK AND MEASUREMENT --- p.31 / Chapter 3.1 --- BENCHMARK GENERATION --- p.32 / Chapter 3.1.1 --- Types of Traffic Patterns --- p.32 / Chapter 3.1.2 --- Traffic Generator --- p.36 / Chapter 3.2 --- MEASUREMENT SETTING --- p.38 / Chapter 3.2.1 --- Warming-up Period. --- p.38 / Chapter 3.2.2 --- Latency Definition --- p.39 / Chapter 3.2.3 --- Throughput Definition --- p.40 / Chapter 3.2.4 --- Virtual Channel Utilization --- p.40 / Chapter CHAPTER 4 --- PLATFORM STRUCTURE --- p.41 / Chapter 4.1 --- FILE TREE --- p.42 / Chapter 4.1.1 --- System Files --- p.46 / Chapter 4.1.2 --- Low-Power NoC Related --- p.47 / Chapter 4.1.3 --- Low-Latency NoC Related --- p.50 / Chapter 4.1.4 --- Project Related --- p.51 / Chapter 4.2 --- PROCESSES --- p.52 / Chapter 4.3 --- GUI ACCESS --- p.56 / Chapter 4.3.1 --- Section 1: Project Setup --- p.58 / Chapter 4.3.2 --- Section 2-a: Low-Power Router Structure --- p.59 / Chapter 4.3.3 --- Section 2-b: Low-Latency Router Structure --- p.60 / Chapter 4.3.4 --- Section 3: Benchmark & Measurement --- p.60 / Chapter 4.3.5 --- Section 4: View Result --- p.62 / Chapter 4.3.6 --- Low-Power NoC Example --- p.62 / Chapter CHAPTER 5 --- OPTIMIZATION AND COMPARISON --- p.72 / Chapter 5.1 --- OPTIMIZATION TECHNIQUE --- p.72 / Chapter 5.1.1 --- Optimization Phase 1: Inactive Buffer Removal --- p.73 / Chapter 5.1.2 --- Optimization Phase 2: Infighting Analysis --- p.74 / Chapter 5.1.3 --- Over-Optimization --- p.75 / Chapter 5.1.4 --- Optimization Example --- p.79 / Chapter 5.2 --- NOCS COMPARISON --- p.83 / Chapter 5.3 --- LOW-POWER IMPLEMENTATION CODE EXPORT --- p.88 / Chapter CHAPTER 6 --- SUMMARY AND FUTURE WORK --- p.92 / Chapter 6.1. --- SUMMARY --- p.92 / Chapter 6.2. --- FUTURE WORK --- p.93 / REFERENCES --- p.95 Computer architecture
9	Modeling, evaluation, and implementation of ring-based interconnects for network-on-chip Bourduas, Stephan. January 1900 (has links) Thesis (Ph.D.). / Written for the Dept. of Electrical and Computer Engineering. Title from title page of PDF (viewed 2008/07/23). Includes bibliographical references.
10	PROPEL power & area-efficient, scalable opto-electronic network-on-chip / Morris, Randy W. January 2009 (has links) Thesis (M.S.)--Ohio University, June, 2009. / Title from PDF t.p. Includes bibliographical references.

Search results