Recently, machine learning and deep neural networks (DNNs) have gained significant attention because they achieve human-like performance on various tasks, such as image classification, recommendation, and natural language processing. As the tasks grow more complicated, people build bigger and deeper networks to obtain high accuracy, which challenges existing hardware to compute DNNs quickly and energy-efficiently because of the memory wall problem. First, traditional hardware spends a significant amount of energy moving data between memory and the arithmetic logic units (ALUs). Second, traditional memory blocks support only row-by-row access, which limits computation speed and energy efficiency.
In-memory computing (IMC) is a promising approach to solving the aforementioned problems in DNN computation. It combines memory blocks with computation units to enable high computation throughput and low energy consumption. At the macro level, both digital and analog-mixed-signal (AMS) IMC macros achieve high performance in multiply-and-accumulate (MAC) computation: AMS designs offer high energy efficiency and high compute density, while digital designs offer PVT robustness and technology scalability. At the architecture level, specialized hardware accelerators that integrate these IMC macros outperform traditional hardware accelerators in end-to-end DNN inference. Beyond IMC, other approaches also reduce energy consumption. For example, sparsity-aware training reduces arithmetic energy by adding more zeros to the weights and zero-gating the multiplication and/or addition, and weight and activation compression reduces off-chip memory access energy.
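To make the zero-gating idea concrete, the following minimal sketch (hypothetical, not from the thesis; the function name zero_gated_mac is invented for illustration) models a MAC unit that skips multiplications and additions whenever the weight is zero, which is the arithmetic-energy saving that sparsity-aware training exposes.

```python
import numpy as np

def zero_gated_mac(weights: np.ndarray, activations: np.ndarray) -> tuple[int, int]:
    """Dot product that skips zero weights, mimicking the zero-gating a
    MAC unit can apply to save arithmetic energy.

    Returns the MAC result and the number of multiplies actually performed.
    """
    acc = 0
    active_ops = 0
    for w, x in zip(weights, activations):
        if w == 0:               # zero-gating: the multiply and add are skipped
            continue
        acc += int(w) * int(x)
        active_ops += 1
    return acc, active_ops

# Example: a weight vector with 50% zeros, as sparsity-aware training might produce
w = np.array([0, 3, 0, -2, 0, 1, 0, 4], dtype=np.int8)
x = np.array([5, 1, 7, 2, 9, 3, 8, 1], dtype=np.int8)
result, ops = zero_gated_mac(w, x)
print(f"MAC = {result}, active multiplies = {ops} of {len(w)}")  # MAC = 6, 4 of 8
```

Here half of the multiply/add pairs never fire; in hardware, the gated operations draw (ideally) no dynamic power.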
This thesis presents new circuit and architecture designs for efficient DNN inference with in-memory computing architectures. First, it presents two SRAM-based analog-mixed-signal IMC macros. One is a macro with custom 10T1C cells for binary/ternary MAC operation. The other, MACC-SRAM, is a multistep-accumulation capacitor-based IMC macro for 4b MAC computation; it features stepwise charging and discharging, sparsity optimization, and an adder-first architecture for energy efficiency. Second, we propose PIMCA, a programmable DNN accelerator that integrates 108 AMS IMC macros; with its own pipeline structure and instruction set architecture, it can flexibly support inference at the instruction level. Last, we implement a fully digital accelerator that integrates IMC macros supporting floating-point computation. The accelerator contains online decompression hardware to reduce the data movement energy of weights and activations, as well as online activation compressors to reduce the activation memory footprint.
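MACC-SRAM's multistep accumulation happens in the analog charge domain, but its function can be sketched digitally: a 4b MAC decomposes into binary MAC steps over weight bit planes, whose partial sums are shifted and accumulated. The sketch below (a functional model under that assumption, not the thesis circuit; unsigned 4b weights for simplicity) shows the decomposition.

```python
import numpy as np

def mac_4b_bitserial(weights: np.ndarray, activations: np.ndarray) -> int:
    """Functional model of a 4b MAC built from binary (1b) MAC steps.

    Each step computes a binary MAC over one weight bit plane; the partial
    sums are then shifted and accumulated, which is the role multistep
    accumulation plays at the macro level.
    """
    # Unsigned 4b weights for simplicity; signed handling needs an extra step.
    assert weights.min() >= 0 and weights.max() < 16
    acc = 0
    for bit in range(4):
        plane = (weights >> bit) & 1                # 1b weight plane
        partial = int(np.dot(plane, activations))   # binary MAC (one step)
        acc += partial << bit                       # shift-and-accumulate
    return acc

w = np.array([5, 12, 3, 9], dtype=np.int32)
x = np.array([2, 1, 4, 3], dtype=np.int32)
assert mac_4b_bitserial(w, x) == int(np.dot(w, x))  # both give 61
```

The four shifted partial sums reproduce the full-precision dot product exactly, which is why a binary-MAC macro can serve multi-bit weights at the cost of extra accumulation steps.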
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/6b0f-aq66
Date | January 2024
Creators | Zhang, Bo
Source Sets | Columbia University
Language | English
Type | Theses