Deep neural networks (DNNs) are extensively used for inference in a wide range of emerging mobile and edge application domains, including autonomous vehicles, drones, augmented and virtual reality (AR/VR), etc. Due to the increasing popularity of these applications, there has been an increasing demand for mobile/edge DNN accelerators to achieve low inference latency and high efficiency. Furthermore, these mobile/edge applications also need to execute multi-DNN workloads, where multiple independent DNNs execute subtasks to complete one large task.
This thesis aims to optimize the efficiency of systolic arrays for DNN acceleration because they are among the most popular architectures for DNN inference in mobile/edge systems due to their straightforward design and dataflow. Systolic arrays provide several degrees of freedom to co-optimize performance, power, area, and temperature–namely, die/chiplet architecture (number of processing elements, on-chip memory capacity and its architecture), quantity, placement, and dataflow.
While recent works have focused on 2D DNN systolic arrays, 2D scaling has been saturating and, thus, improving the performance and power characteristics of computing systems is becoming increasingly challenging. To overcome traditional scaling bottlenecks, 3D integration has emerged as a promising integration technology. 3D technology provides several benefits over 2D systems such as high integration density, high bandwidth, high energy efficiency, and footprint savings. This thesis focuses on two 3D integration technologies: (i) die-stacked 3D (TSV3D), and (ii) monolithic 3D (MONO3D).
Both of these 3D technologies provide significant performance and power benefits over 2D systems and thus, are potent technologies for energy efficient design of systolic arrays for DNNs. However, the dense integration in 3D causes high power densities and inter-tier thermal coupling, further escalating thermal issues and resulting in hot spots across tiers. Furthermore, mobile/edge devices have tight area, power, and thermal constraints due to the absence of heat sinks and fans. Thus, temperature is a critical design concern in 3D DNN accelerators for mobile/edge devices.
This thesis states that to glean the benefits of 3D technology in mobile/edge devices to improve energy efficiency and satisfy performance and power constraints, it is imperative to design thermally-aware 3D systolic arrays for DNNs. To realize this statement, this thesis makes the following contributions: (i) it designs a thermally-aware optimization flow to select a near-optimal MONO3D DNN systolic array for a given DNN and an optimization goal under a performance constraint. The optimizer is facilitated by circuit and architecture-level cross-layer performance/power models that are developed as part of this thesis. (ii) It introduces thermal awareness in tuning a given TSV3D systolic array chiplet architecture and the chiplet’s placement in a multi-chip module (MCM) executing a multi-DNN workload to balance both cost and power of the MCM, while satisfying latency, area, power, thermal packaging, and workload constraints. (iii) It optimizes a dataflow implementation by utilizing the massive bandwidth available in MONO3D systolic arrays with a dense on-chip resistive RAM to improve energy efficiency while satisfying the thermal and performance constraints. Results demonstrate 81% improvement in inference per second per watt over 2D systolic arrays due to high-density and high-bandwidth resistive RAM interface using monolithic inter-tier vias (MIVs). We also demonstrate up to 44% MCM cost savings and 63% DRAM power savings over temperature-unaware optimization at iso-frequency and iso-MCM area for TSV3D MCMs. In addition, we show that optimization without thermal awareness leads to over-estimation of efficiency gains and thermal violations in both MONO3D and TSV3D systolic arrays. / 2025-01-16T00:00:00Z
Identifer | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/45480 |
Date | 17 January 2023 |
Creators | Shukla, Prachi |
Contributors | Coskun, Ayse K. |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Page generated in 0.0022 seconds