Global ETD Search

1	Characterization and optimization of low-swing on-chip interconnect circuits Irfansyah, Astria Nur, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 Low-swing on-chip interconnect circuits have been viewed as alternative solutions to the problem of increasing delay and power in on-chip interconnects. This thesis aims to characterize and optimize several basic low-swing interconnect circuits by developing simple delay and power estimation methodologies. The accuracy of the proposed methods is validated against SPICE-based simulations at the 90 nm technology node. Based on the delay and power estimation methods developed, optimum power-delay trade-off curves are obtained and used directly to compare different interconnect circuit strategies. Three low-swing techniques are included: the conventional level converter (CLC), the pseudodifferential interconnect circuit (PDIFF), and current-mode signaling (CM). These techniques represent significantly different driver and receiver topologies: CLC uses a normal inverter driver with a lower supply voltage, PDIFF uses NMOS-only drivers, and CM has a low-impedance termination at the receiving end. In addition, an optimized full-swing repeater-based technique is included as a baseline for comparison. A simplified repeater performance estimation technique that considers ramp input signals is also proposed. The most important step in estimating the delay of different driver circuits is the accurate estimation of transistor effective resistance, which takes into account velocity saturation effects and voltage transition patterns. Optimizing the CM circuit for on-chip interconnects requires completely different treatment from the voltage-mode circuits, because its effective driver resistance and termination resistance modeling is different and more complex. 
Driver and receiver transistors should be sized simultaneously, because their resistances, which affect circuit performance, depend on each other. Optimum transistor sizing depends strongly on the voltage swing chosen. Our comparisons show that optimized CLC (reduced supply voltage) repeaters appear to give the best general performance, with a slight delay overhead compared to full-swing repeaters. The fact that CLC with repeaters has a shorter delay than single-segment CM and PDIFF highlights the effectiveness of repeater structures in long wires. Including inductance and deriving closed-form solutions for optimum transistor sizing in various low-swing interconnect circuits are left as future work; both could build on the delay and power estimation models presented in this thesis, although the non-linear equations involved make this a challenging task. current-mode signaling on-chip interconnects low-swing
2	Wave-Pipelined Multiplexed (WPM) Routing for Gigascale Integration (GSI) Joshi, Ajay Jayant 12 April 2006 The main objective of this research is to develop a pervasive wire sharing technique that can be easily applied across the entire range of on-chip interconnects in a very large scale integration (VLSI) system. A wave-pipelined multiplexed (WPM) routing technique that can be applied to both intra-macrocell and inter-macrocell interconnects is proposed in this thesis. It is shown that extensive application of the WPM routing technique can provide significant advantages in terms of area, power and performance. To study the WPM routing technique, a hierarchical approach is adopted. Circuit-level, system-level and physical-level analyses are carried out to explore the limits and opportunities of applying WPM routing to current VLSI and future gigascale integration (GSI) systems. The circuit-level analysis comprises the design, verification and optimization of the WPM circuit and the measurement of its tolerance to external noise. The physical-level study involves designing wire-sharing-aware placement algorithms to maximize the advantages of WPM routing. A system-level simulator that designs the entire multilevel interconnect network is developed to perform the system-level analysis. The effect of WPM routing on a full-custom interconnect network and a semi-custom interconnect network is studied. Wave-pipelining Wire sharing System simulator Low power On-chip interconnects
3	Efficient high-speed on-chip global interconnects Caputa, Peter January 2006 The continuous miniaturization of integrated circuits has opened the path towards System-on-Chip realizations. Process shrinking into the nanometer regime improves transistor performance, while the delay of global interconnects, which connect circuit blocks separated by a long distance, increases significantly. In fact, global interconnects extending across a full chip can have a delay corresponding to multiple clock cycles. At the same time, global clock skew constraints, not only between blocks but also along the pipelined interconnects, become even tighter. On-chip interconnects have always been considered RC-like, that is, exhibiting long RC-delays. This has motivated large efforts on alternatives such as on-chip optical interconnects, which have not yet been demonstrated, or complex schemes utilizing on-chip F-transmission or pulsed current-mode signaling. In this thesis, we show that well-designed electrical global interconnects, behaving as transmission lines, have the capacity for very high data rates (higher than can be delivered by the actual process) and support near velocity-of-light delay for single-ended voltage-mode signaling, thus mitigating the RC-problem. We critically explore key interconnect performance measures such as data delay, maximum data rate, crosstalk, edge rates and power dissipation. To experimentally demonstrate the feasibility and superior properties of on-chip transmission line interconnects, we have designed and fabricated a test chip carrying a 5 mm long global communication link. 
Measurements show that we can achieve 3 Gb/s/wire over the 5 mm long, repeaterless on-chip bus implemented in a standard 0.18 μm CMOS process, achieving a signal velocity of 1/3 of the velocity of light in vacuum. To manage the problems due to global wire delays, we describe and implement a Synchronous Latency Insensitive Design (SLID) scheme, based on source-synchronous data transfer between blocks and data re-timing at the receiving block. The SLID technique not only mitigates unknown global wire delays, but also removes the need for controlling global clock skew. The high performance and robustness of the SLID method are demonstrated in practice through a successful implementation of a SLID-based, 5.4 mm long, on-chip global bus, supporting 3 Gb/s/wire and dynamically accepting ± 2 clock cycles of data-clock skew, in a standard 0.18 μm CMOS process. In the context of technology scaling, there is a tendency for interconnects to dominate chip power dissipation due to their large total capacitance. In this thesis we address the problem of interconnect power dissipation by proposing and analyzing a transition-energy cost model aimed at efficient power estimation of performance-critical buses. The model, which includes properties that closely capture effects present in high-performance VLSI buses, can be used to determine more accurately the energy benefits of, e.g., transition coding of bus topologies. We further show a power optimization scheme based on an appropriate choice of reduced voltage swing on the interconnect and scaling of the receiver amplifier. Finally, the power-saving impact of swing reduction in combination with a sense-amplifying flip-flop receiver is shown on a microprocessor cache bus architecture used in industry. Microelectronics Global Interconnects On-Chip Interconnects Velocity-of-Light Delay On-Chip Communication Low-Latency Transmission Lines Electronics
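As a rough illustration of the figures quoted in the abstract above (a 5 mm wire, a signal velocity of one third of the vacuum speed of light, and 3 Gb/s per wire), the sketch below computes the implied time of flight and bit period. The function names are hypothetical and the calculation is only a back-of-the-envelope check, not a result taken from the thesis.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract.
# Function names are illustrative assumptions, not from the thesis.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def time_of_flight_s(length_m: float, velocity_fraction: float) -> float:
    """Propagation time over a wire at a given fraction of c."""
    return length_m / (velocity_fraction * C_VACUUM_M_PER_S)


def bit_period_s(rate_bps: float) -> float:
    """Duration of one bit at the given data rate."""
    return 1.0 / rate_bps


tof = time_of_flight_s(5e-3, 1.0 / 3.0)  # 5 mm at c/3
period = bit_period_s(3e9)               # 3 Gb/s per wire

print(f"time of flight: {tof * 1e12:.1f} ps")
print(f"bit period:     {period * 1e12:.1f} ps")
print(f"bits in flight: {tof / period:.2f}")
```

Under these assumptions the flight time is roughly 50 ps, well below the bit period of about 333 ps, so less than one bit is on the wire at any instant.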
5	Design Space Exploration for Networks On-chip Gilabert Villamón, Francisco 12 September 2011 Multi-core designs are becoming the most popular solution to most of the limitations of single-core designs. A multi-core design follows the design paradigm known as System on-Chip (SoC), in which several cores are integrated on a single chip. The performance of an SoC design depends largely on the interconnect infrastructure it implements. In this context, the design paradigm known as Network on-Chip (NoC) emerges as a solution to the interconnect challenges of new SoC designs. For a given design, the large number of possible NoC-based solutions increases the complexity of analyzing the design space and choosing the optimal NoC. The most common approach to this problem is to use high-level tools to obtain performance estimates for each candidate solution, which the designer then uses to prune the design space in the early stages of the design process. However, there is a large gap between the performance estimated by high-level tools and the actual performance obtained once the system is implemented. This work focuses on developing new high-level tools for NoC design, modeling and simulation, in order to prune the least attractive candidates from the design space. In a first step, we focus on the design and development of an experimental platform for analyzing alternative NoC architectures, so that any point in the design space can be evaluated quickly and accurately by annotating some key parameters from the physical synthesis process. 
In the second step, architectures and design techniques adopted from the domain of off-chip interconnection networks were reviewed, selecting the most promising ones and, in some cases, exploiting characteristics specific to on-chip networks to obtain new solutions. This step, which precedes the development of the design space exploration (DSE) tool, aims to refine the techniques for abstracting the effects of the physical implementation of NoCs on their performance. / Gilabert Villamón, F. (2011). Design Space Exploration for Networks On-chip [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11521 Design space exploration Networks on-chip On-chip interconnects
6	Models and Techniques for Green High-Performance Computing Adhinarayanan, Vignesh 01 June 2020 High-performance computing (HPC) systems have become power limited. For instance, the U.S. Department of Energy set a power envelope of 20 MW in 2008 for the first exascale supercomputer, now expected to arrive in 2021-22. Toward this end, we seek to improve the greenness of HPC systems by improving their performance per watt at the allocated power budget. In this dissertation, we develop a series of models and techniques to manage power at the micro-, meso-, and macro-levels of the system hierarchy, specifically addressing data movement and heterogeneity. We target the chip interconnect at the micro-level, heterogeneous nodes at the meso-level, and a supercomputing cluster at the macro-level. Overall, our goal is to improve the greenness of HPC systems by intelligently managing power. The first part of this dissertation focuses on measurement and modeling problems for power. First, we study how to infer chip-interconnect power by observing the system-wide power consumption. Our proposal is to design a novel micro-benchmarking methodology based on data-movement distance, by which we can properly isolate the chip interconnect and measure its power. Next, we study how to develop software power meters to monitor a GPU's power consumption at runtime. Our proposal is to adapt performance counter-based models for use at runtime via a combination of heuristics, statistical techniques, and application-specific knowledge. In the second part of this dissertation, we focus on managing power. First, we propose to reduce the chip-interconnect power by proactively managing its dynamic voltage and frequency scaling (DVFS) state. Toward this end, we develop a novel phase predictor that uses approximate pattern matching to forecast future requirements and, in turn, proactively manage power. Second, we study the problem of applying a power cap to a heterogeneous node. 
Our proposal proactively manages the GPU power using phase prediction and a DVFS power model but reactively manages the CPU. The resulting hybrid approach can take advantage of the differences in the capabilities of the two devices. Third, we study how in-situ techniques can be applied to improve the greenness of HPC clusters. Overall, in our dissertation, we demonstrate that it is possible to infer power consumption of real hardware components without directly measuring them, using the chip interconnect and GPU as examples. We also demonstrate that it is possible to build models of sufficient accuracy and apply them for intelligently managing power at many levels of the system hierarchy. / Doctor of Philosophy / Past research in green high-performance computing (HPC) mostly focused on managing the power consumed by general-purpose processors, known as central processing units (CPUs) and to a lesser extent, memory. In this dissertation, we study two increasingly important components: interconnects (predominantly focused on those inside a chip, but not limited to them) and graphics processing units (GPUs). Our contributions in this dissertation include a set of innovative measurement techniques to estimate the power consumed by the target components, statistical and analytical approaches to develop power models and their optimizations, and algorithms to manage power statically and at runtime. Experimental results show that it is possible to build models of sufficient accuracy and apply them for intelligently managing power on multiple levels of the system hierarchy: chip interconnect at the micro-level, heterogeneous nodes at the meso-level, and a supercomputing cluster at the macro-level. Green Supercomputing Power Modeling Power Management DVFS Phase Prediction Heterogeneity GPUs Data Movement On-chip Interconnects In-situ Techniques
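To make the performance-per-watt target in the abstract above concrete, the following worked arithmetic assumes that "exascale" means a sustained $10^{18}$ floating-point operations per second (an assumption on our part; the abstract states only the 20 MW envelope):

```latex
\[
  \text{efficiency}
  = \frac{10^{18}\ \text{FLOP/s}}{20 \times 10^{6}\ \text{W}}
  = 5 \times 10^{10}\ \text{FLOP/s per watt}
  = 50\ \text{GFLOPS/W}.
\]
```

Under that assumption, a machine meeting the 20 MW envelope would need to deliver about 50 GFLOPS for every watt consumed.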
7	Dynamic Bandwidth allocation algorithms for an RF on-chip interconnect / Allocation dynamique de bande passante pour l’interconnexion RF d’un réseau sur puce Unlu, Eren 21 June 2016 With the rapidly increasing number of cores on a single chip, scalability problems have arisen due to congestion and latency in conventional interconnects. To address these issues, the WiNoCoD project (Wired RF Network-on-Chip Reconfigurable-on-Demand) was initiated with the support of the French National Research Agency (ANR). This thesis contributes to the WiNoCoD project. A special RF controller structure is proposed for the OFDMA-based wired RF interconnect of WiNoCoD. Based on this architecture, effective bandwidth allocation algorithms, both distributed and centralized, are presented, taking into account the very specific requirements and constraints of the on-chip environment. An innovative subcarrier allocation protocol for the bimodal packet lengths of cache coherency traffic is presented; it requires no additional signaling and is shown to decrease average latency significantly. 
In addition, effective modulation order selection policies for this interconnect are introduced, which seek the optimal delay-power trade-off when using higher modulation orders at the cost of greater energy consumption. OFDMA Dynamic bandwidth allocation Network-on-chip Multicore processors On-chip interconnects
9	Metody optimalizace pro zajištění integrity signálů pro vysokorychlostní přenos dat mezi čipy / Signal Integrity Optimization Techniques for High-Speed Chips Signaling Ševčík, Břetislav January 2017 This dissertation focuses on signal integrity in modern chip circuits. Based on simulations and practical experiments, a second-order equalization technique was designed for more efficient high-speed communication. The proposed design respects current requirements on signaling techniques under development, which include more efficient use of the transmission channel bandwidth and energy savings. The analyses show in detail that the proposed signaling method can increase the data rate when a signal is transmitted over low-cost transmission channels. The performance of the proposed signaling technique is demonstrated on various types of transmission channels with higher-order transfer functions. Options for limiting interference effects on the transmission channels during design are also discussed.
