Spelling suggestions: "subject:"monolithic 3D"" "subject:"fonolithic 3D""
1 |
Temperature-aware 3D-integrated systolic array DNN acceleratorsShukla, Prachi 17 January 2023 (has links)
Deep neural networks (DNNs) are extensively used for inference in a wide range of emerging mobile and edge application domains, including autonomous vehicles, drones, augmented and virtual reality (AR/VR), etc. Due to the increasing popularity of these applications, there has been an increasing demand for mobile/edge DNN accelerators to achieve low inference latency and high efficiency. Furthermore, these mobile/edge applications also need to execute multi-DNN workloads, where multiple independent DNNs execute subtasks to complete one large task.
This thesis aims to optimize the efficiency of systolic arrays for DNN acceleration because they are among the most popular architectures for DNN inference in mobile/edge systems due to their straightforward design and dataflow. Systolic arrays provide several degrees of freedom to co-optimize performance, power, area, and temperature–namely, die/chiplet architecture (number of processing elements, on-chip memory capacity and its architecture), quantity, placement, and dataflow.
While recent works have focused on 2D DNN systolic arrays, 2D scaling has been saturating and, thus, improving the performance and power characteristics of computing systems is becoming increasingly challenging. To overcome traditional scaling bottlenecks, 3D integration has emerged as a promising integration technology. 3D technology provides several benefits over 2D systems such as high integration density, high bandwidth, high energy efficiency, and footprint savings. This thesis focuses on two 3D integration technologies: (i) die-stacked 3D (TSV3D), and (ii) monolithic 3D (MONO3D).
Both of these 3D technologies provide significant performance and power benefits over 2D systems and thus, are potent technologies for energy efficient design of systolic arrays for DNNs. However, the dense integration in 3D causes high power densities and inter-tier thermal coupling, further escalating thermal issues and resulting in hot spots across tiers. Furthermore, mobile/edge devices have tight area, power, and thermal constraints due to the absence of heat sinks and fans. Thus, temperature is a critical design concern in 3D DNN accelerators for mobile/edge devices.
This thesis states that to glean the benefits of 3D technology in mobile/edge devices to improve energy efficiency and satisfy performance and power constraints, it is imperative to design thermally-aware 3D systolic arrays for DNNs. To realize this statement, this thesis makes the following contributions: (i) it designs a thermally-aware optimization flow to select a near-optimal MONO3D DNN systolic array for a given DNN and an optimization goal under a performance constraint. The optimizer is facilitated by circuit and architecture-level cross-layer performance/power models that are developed as part of this thesis. (ii) It introduces thermal awareness in tuning a given TSV3D systolic array chiplet architecture and the chiplet’s placement in a multi-chip module (MCM) executing a multi-DNN workload to balance both cost and power of the MCM, while satisfying latency, area, power, thermal packaging, and workload constraints. (iii) It optimizes a dataflow implementation by utilizing the massive bandwidth available in MONO3D systolic arrays with a dense on-chip resistive RAM to improve energy efficiency while satisfying the thermal and performance constraints. Results demonstrate 81% improvement in inference per second per watt over 2D systolic arrays due to high-density and high-bandwidth resistive RAM interface using monolithic inter-tier vias (MIVs). We also demonstrate up to 44% MCM cost savings and 63% DRAM power savings over temperature-unaware optimization at iso-frequency and iso-MCM area for TSV3D MCMs. In addition, we show that optimization without thermal awareness leads to over-estimation of efficiency gains and thermal violations in both MONO3D and TSV3D systolic arrays. / 2025-01-16T00:00:00Z
|
2 |
Design methodology and technology assessment for high-desnity 3D technologies / Méthodologie de conception et de l'évaluation des technologies 3D haute densitéSarhan, Hossam 23 November 2015 (has links)
L'impact des interconnections d'un circuit intégré sur les performances et la consommation est de plus en plus important à partir du nœud CMOS 28 nm et au-delà, ayant pour effet de minimiser de plus ne plus la loi de Moore. Cela a motivé l'intérêt des technologies d'empilement 3D pour réduire l'effet des interconnections sur les performances des circuits. Les technologies d'empilement 3D varient suivant différents procédés de fabrication d'où l'on mettra en avant la technologie Trough Silicon Via (TSV) – Collage Cuivre-Cuivre (Cu-Cu) et 3D Monolithique. TSV et Cu-Cu présentent des diamètres d'interconnexions 3D de l'ordre de 10 µm tandis que le diamètre d'une interconnexion 3D Monolithique est 0.1 µm, c'est-à-dire cent fois plus petit. Un tel diamètre d'interconnexion créée de nouveaux challenge en terme de conception de circuit intégré numérique. Dans ce contexte, notre objectif est de proposer des méthodologies de conception de circuits 3D innovantes afin d'utiliser au mieux la densité d'intégration possible et d'évaluer efficacement les gains en performance, surface et consommation potentiels de ces différentes technologies d'empilement par rapport à la conception de circuit 2D.Trois contributions principales constituent cette thèse : La densité d'intégration offerte par les technologies d'empilement étudiées laisse le possibilité de revoir la topologie des cellules de bases en les concevant directement en 3D. C'est ce qui a été fait dans l'approche Cellule sur Buffer (Cell-on-Buffer – CoB), en empilant la fonction logique de base d'une cellule sur l'étage d'amplification. Les simulations montrent des gains substantiels par rapport aux circuits 2D. On a imaginé par la suite désaligner les niveaux d'alimentation de chaque tranche afin de créer une technique de Multi-VDD adaptée à l'empilement 3D pour réduire encore plus la consommation des circuits 3D.Dans un deuxième temps, le partitionnement grain fin des cellules a été étudié. En effet au niveau VLSI, quand on conçoit un circuit de plusieurs milliers voir million de cellules standard en 3D, se pose la question de l'attribution de telle ou telle cellule sur la tranche haute ou basse du circuit 3D afin d'accroitre au mieux les performances et consommation du circuit 3D. Une méthodologie de partitionnement physique est introduite pour cela.Enfin un environnement d'évaluation des performances et consommation des technologies 3D est présenté avec pour objectif de rapidement tester les gains possibles de telle ou telle technologie 3D tout en donnant des directives quant à l'impact des certains paramètres technologiques 3D sur les performances et consommation. / Scaling limitations of advanced technology nodes are increasing and the BEOL parasitics are becoming more dominant. This has led to an increasing interest in 3D technologies to overcome such limitations and to continue the scaling predicted by Moore's Law. 3D technologies vary according to the fabrication process which creates a wide spectrum of technologies including Through-Silicon-VIA (TSV), Copper-to-Copper (CuCu) and Monolithic 3D (M3D). TSV and CuCu provide 3D contacts of pitch around 5-10um while M3D scales down 3D via pitch extremely to 0.11um. Such high-density capability of Monolithic 3D technology creates new design paradigms. In this context, our objective is to propose innovative design methodologies to well utilize M3D technology and introduce a technology assessment framework to evaluate different M3D technology parameters from design perspective.This thesis can be divided into three main contributions. As creating 3D standard cells become achievable thanks to M3D technology, a new 3D standard cell approach has been introduced which we call it ‘3D Cell-on-Buffer' (3DCoB). 3DCoB cells are created by splitting 2D cells into functioning gates and driving buffers stacked over each other. The simulation results show gain in timing performances compared to 2D. By applying an additionally Multi-VDD low-power approach, iso-performance power gain has been achieved. Afterwards cell-on-cell design approach has been explored where a partitioning methodology is needed to distribute cells between different tiers, i.e. determine which cell is placed on which tier. A physical-aware partitioning methodology has been introduced which improves power-performance-area results comparing to the state-of-the-art partitioning techniques. Finally a full high-density 3D technology assessment study is presented to explore the trade-offs between different 3D technologies, block complexities and partitioning methodologies.
|
Page generated in 0.0879 seconds