1 |
Architecting heterogeneous memory systems with 3D die-stacked memory
Sim, Jae Woong, 21 September 2015
The main objective of this research is to efficiently enable 3D die-stacked memory and heterogeneous memory systems. 3D die-stacking is an emerging technology that allows large amounts of high-bandwidth memory to be integrated in-package. Die-stacked memory has the potential to provide extraordinary performance and energy benefits for computing environments ranging from data-intensive to mobile computing. However, incorporating die-stacked memory into these environments requires innovations across the system stack, from hardware to software. This dissertation presents several architectural innovations to practically deploy die-stacked memory in a variety of computing systems.
First, this dissertation proposes using die-stacked DRAM as a hardware-managed cache in a practical and efficient way. The proposed DRAM cache architecture employs two novel techniques: hit-miss speculation and self-balancing dispatch. These techniques virtually eliminate the hardware overhead of maintaining a multi-megabyte SRAM structure when scaling to gigabytes of stacked DRAM cache, and they improve overall memory bandwidth utilization.
Second, this dissertation proposes a DRAM cache organization that provides a high level of reliability for die-stacked DRAM caches in a cost-effective manner. The proposed DRAM cache uses error-correcting codes (ECCs), strong checksums (CRCs), and dirty data duplication to detect and correct a wide range of stacked DRAM failures (from traditional bit errors to large-scale row, column, bank, and channel failures) within the constraints of commodity, non-ECC DRAM stacks. With only a modest performance degradation compared to a DRAM cache with no ECC support, the proposed organization can correct all single-bit failures and 99.9993% of all row, column, and bank failures.
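As a rough illustration of why checksums and dirty data duplication complement each other, the following minimal Python sketch models the recovery decision only. It is not the dissertation's design: the function names, the dictionary layout, and the use of `zlib.crc32` as the checksum are all assumptions, and ECC-based correction of small errors is left out. The idea it shows is that a clean line that fails its CRC can be refetched from main memory, while a dirty line has no valid copy elsewhere and therefore needs a duplicate.

```python
import zlib

def store(cache, duplicates, addr, data, dirty):
    """Store a line with its CRC; dirty lines also get a second copy (hypothetical layout)."""
    entry = {"data": bytes(data), "crc": zlib.crc32(data), "dirty": dirty}
    cache[addr] = entry
    if dirty:
        # Main memory is stale for dirty data, so keep a duplicate elsewhere.
        duplicates[addr] = dict(entry)
    else:
        duplicates.pop(addr, None)

def load(cache, duplicates, main_memory, addr):
    """Return the line at addr, recovering from a detected corruption if possible."""
    entry = cache[addr]
    if zlib.crc32(entry["data"]) == entry["crc"]:
        return entry["data"]  # checksum matches: no error detected
    if not entry["dirty"]:
        # Clean line: main memory still holds a valid copy.
        return main_memory[addr]
    duplicate = duplicates[addr]
    if zlib.crc32(duplicate["data"]) == duplicate["crc"]:
        return duplicate["data"]  # dirty line recovered from its duplicate
    raise RuntimeError("uncorrectable error in dirty line")

def flip_first_bit(data):
    """Simulate a single-bit fault in a stored line."""
    return bytes([data[0] ^ 1]) + data[1:]
```

In this toy model a single flipped bit in either kind of line is detected by the CRC and recovered; only the simultaneous loss of a dirty line and its duplicate is unrecoverable.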
Third, this dissertation proposes architectural mechanisms to use large, fast, on-chip memory structures as part of memory (PoM) seamlessly through the hardware. The proposed design achieves the performance benefit of on-chip memory caches without sacrificing a large fraction of total memory capacity to serve as a cache. To achieve this, PoM implements the ability to dynamically remap regions of memory based on their access patterns and expected performance benefits.
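The remapping idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the PoM hardware: the class name, the per-segment access counters, the fixed threshold, and the "swap with the coldest fast segment" policy are all hypothetical choices made for illustration. What it shows is that a hot segment is swapped into fast memory rather than copied, so every segment still has exactly one home and no capacity is given up to a cache.

```python
class PartOfMemory:
    """Toy model: segments live in either fast (on-chip) or slow (off-chip) slots."""

    def __init__(self, fast_slots, total_segments, threshold):
        self.fast_slots = fast_slots                   # slots 0..fast_slots-1 are on-chip
        self.location = list(range(total_segments))    # segment -> slot
        self.resident = list(range(total_segments))    # slot -> segment
        self.counts = [0] * total_segments             # per-segment access counters
        self.threshold = threshold                     # assumed swap trigger

    def is_fast(self, segment):
        return self.location[segment] < self.fast_slots

    def access(self, segment):
        """Record an access, remap if the segment became hot, and return its slot."""
        self.counts[segment] += 1
        if not self.is_fast(segment) and self.counts[segment] >= self.threshold:
            self._swap_with_coldest_fast(segment)
        return self.location[segment]

    def _swap_with_coldest_fast(self, hot):
        # Pick the fast slot whose resident segment has been accessed least.
        victim_slot = min(range(self.fast_slots),
                          key=lambda slot: self.counts[self.resident[slot]])
        victim = self.resident[victim_slot]
        if self.counts[victim] >= self.counts[hot]:
            return  # no expected benefit from swapping
        hot_slot = self.location[hot]
        # Swap rather than copy, so total capacity is preserved.
        self.resident[victim_slot], self.resident[hot_slot] = hot, victim
        self.location[hot], self.location[victim] = victim_slot, hot_slot
```

For example, with two fast slots, six segments, and a threshold of three, three accesses to segment 5 move it into fast slot 0 and move segment 0 to the slow slot that segment 5 used to occupy.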
Lastly, this dissertation explores a new usage model for die-stacked DRAM involving a hybrid of caching and virtual memory support. In the common case where the system's physical memory is not over-committed, die-stacked DRAM operates as a cache to provide performance and energy benefits to the system. However, when the workload's active memory demands exceed the capacity of physical memory, the proposed scheme dynamically converts the stacked DRAM cache into a fast swap device to avoid the otherwise severe performance penalty of swapping to disk.
|
2 |
Temperature-aware 3D-integrated systolic array DNN accelerators
Shukla, Prachi, 17 January 2023
Deep neural networks (DNNs) are extensively used for inference in a wide range of emerging mobile and edge application domains, such as autonomous vehicles, drones, and augmented and virtual reality (AR/VR). Due to the growing popularity of these applications, there is increasing demand for mobile/edge DNN accelerators that achieve low inference latency and high efficiency. Furthermore, these mobile/edge applications also need to execute multi-DNN workloads, where multiple independent DNNs execute subtasks to complete one large task.
This thesis aims to optimize the efficiency of systolic arrays for DNN acceleration because they are among the most popular architectures for DNN inference in mobile/edge systems due to their straightforward design and dataflow. Systolic arrays provide several degrees of freedom to co-optimize performance, power, area, and temperature: namely, die/chiplet architecture (number of processing elements, on-chip memory capacity and its architecture), quantity, placement, and dataflow.
While recent works have focused on 2D DNN systolic arrays, 2D scaling has been saturating and, thus, improving the performance and power characteristics of computing systems is becoming increasingly challenging. To overcome traditional scaling bottlenecks, 3D integration has emerged as a promising integration technology. 3D technology provides several benefits over 2D systems such as high integration density, high bandwidth, high energy efficiency, and footprint savings. This thesis focuses on two 3D integration technologies: (i) die-stacked 3D (TSV3D), and (ii) monolithic 3D (MONO3D).
Both of these 3D technologies provide significant performance and power benefits over 2D systems and are thus promising technologies for the energy-efficient design of systolic arrays for DNNs. However, the dense integration in 3D causes high power densities and inter-tier thermal coupling, further escalating thermal issues and resulting in hot spots across tiers. Furthermore, mobile/edge devices have tight area, power, and thermal constraints due to the absence of heat sinks and fans. Thus, temperature is a critical design concern in 3D DNN accelerators for mobile/edge devices.
This thesis states that to glean the benefits of 3D technology in mobile/edge devices to improve energy efficiency and satisfy performance and power constraints, it is imperative to design thermally-aware 3D systolic arrays for DNNs. To support this statement, this thesis makes the following contributions: (i) It designs a thermally-aware optimization flow to select a near-optimal MONO3D DNN systolic array for a given DNN and an optimization goal under a performance constraint. The optimizer is facilitated by circuit- and architecture-level cross-layer performance/power models that are developed as part of this thesis. (ii) It introduces thermal awareness in tuning a given TSV3D systolic array chiplet architecture and the chiplet's placement in a multi-chip module (MCM) executing a multi-DNN workload to balance both cost and power of the MCM, while satisfying latency, area, power, thermal packaging, and workload constraints. (iii) It optimizes a dataflow implementation by utilizing the massive bandwidth available in MONO3D systolic arrays with a dense on-chip resistive RAM to improve energy efficiency while satisfying the thermal and performance constraints. Results demonstrate an 81% improvement in inferences per second per watt over 2D systolic arrays due to the high-density, high-bandwidth resistive RAM interface using monolithic inter-tier vias (MIVs). We also demonstrate up to 44% MCM cost savings and 63% DRAM power savings over temperature-unaware optimization at iso-frequency and iso-MCM area for TSV3D MCMs. In addition, we show that optimization without thermal awareness leads to over-estimation of efficiency gains and thermal violations in both MONO3D and TSV3D systolic arrays. / 2025-01-16T00:00:00Z
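The effect of adding a thermal constraint to such a selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's optimization flow: the `Candidate` fields, the exhaustive search, and all numbers are made up, and a real flow would obtain latency, power, and temperature from performance/power models and thermal simulation. It shows how a temperature-unaware search can pick a design that looks most efficient but violates the thermal limit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class Candidate:
    """A hypothetical systolic array design point (all values are illustrative)."""
    name: str
    latency_ms: float    # inference latency
    power_w: float       # average power
    peak_temp_c: float   # peak on-chip temperature

def inferences_per_second_per_watt(c: Candidate) -> float:
    return (1000.0 / c.latency_ms) / c.power_w

def select(candidates: Iterable[Candidate],
           max_latency_ms: float,
           max_temp_c: Optional[float] = None) -> Optional[Candidate]:
    """Pick the most efficient candidate meeting the latency (and, if given, thermal) limit."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms
        and (max_temp_c is None or c.peak_temp_c <= max_temp_c)
    ]
    return max(feasible, key=inferences_per_second_per_watt, default=None)

# Made-up design points for illustration only.
designs = [
    Candidate("A", latency_ms=10.0, power_w=2.0, peak_temp_c=95.0),
    Candidate("B", latency_ms=12.0, power_w=2.5, peak_temp_c=78.0),
    Candidate("C", latency_ms=30.0, power_w=1.0, peak_temp_c=60.0),
]

unaware = select(designs, max_latency_ms=20.0)
aware = select(designs, max_latency_ms=20.0, max_temp_c=85.0)
```

With these made-up numbers, the temperature-unaware search returns design A, which exceeds the assumed 85 °C limit, while the thermally-aware search returns design B at a lower but achievable efficiency.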
|
3 |
Polymères underfills innovants pour l'empilement de puces éléctroniques. / Innovative underfills polymers for chips stacking
Taluy, Alisée, 18 December 2013
Since the invention of the transistor in the 1950s, the performance of microelectronic components has improved steadily, largely through increases in their density. Unfortunately, miniaturizing components raises manufacturing costs prohibitively. One solution that increases density and functionality while limiting cost is to stack microelectronic components. Their electrical connections are then made through vertical interconnections bonded with solder joints. To prevent these interconnections from breaking during thermal expansion, they are protected with an underfill polymer.
The objective of this thesis is to evaluate the feasibility and relevance of a new polymer filling solution called wafer-level underfill (WLUF). The flow of the underfill during the component assembly step is modeled in order to predict the ideal bonding parameters that allow the electrical interconnections to form. Then, new underfills are integrated whose different thermomechanical properties may affect the integrity and operation of the device, the reliability of the WLUF process is studied, and, on that basis, its potential for industrialization is evaluated.
|