• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Deep Learning Optimization and Acceleration

Jiang, Beilei 08 1900 (has links)
The novelty of this dissertation is the optimization and acceleration of deep neural networks aimed at real-time predictions with minimal energy consumption. It consists of cross-layer optimization, output directed dynamic quantization, and opportunistic near-data computation for deep neural network acceleration. On two datasets (CIFAR-10 and CIFAR-100), the proposed deep neural network optimization and acceleration frameworks are tested using a variety of Convolutional neural networks (e.g., LeNet-5, VGG-16, GoogLeNet, DenseNet, ResNet). Experimental results are promising when compared to other state-of-the-art deep neural network acceleration efforts in the literature.
2

Adaptive Beyond Von-Neumann Computing Devices and Reconfigurable Architectures for Edge Computing Applications

Hossain, Mousam 01 January 2024 (has links) (PDF)
The Von-Neumann bottleneck, a major challenge in computer architecture, results from significant data transfer delays between the processor and main memory. Crossbar arrays utilizing spin-based devices like Magnetoresistive Random Access Memory (MRAM) aim to overcome this bottleneck by offering advantages in area and performance, particularly for tasks requiring linear transformations. These arrays enable single-cycle and in-memory vector-matrix multiplication, reducing overheads, which is crucial for energy and area-constrained Internet of Things (IoT) sensors and embedded devices. This dissertation focuses on designing, implementing, and evaluating reconfigurable computation platforms that leverage MRAM-based crossbar arrays and analog computation to support deep learning and error resilience implementations. One key contribution is the investigation of Spin Torque Transfer MRAM (STT-MRAM) technology scaling trends, considering power dissipation, area, and process variation (PV) across different technology nodes. A predictive model for power estimation in hybrid CMOS/MTJ technology has been developed and validated, along with new metrics considering the Internet of Things (IoT) energy profile of various applications. The dissertation introduces the Spintronically Configurable Analog Processing in-memory Environment (SCAPE), integrating analog arithmetic, runtime reconfigurability, and non-volatile devices within a selectable 2-D topology of hybrid spin/CMOS devices. Simulation results show improvements in error rates, power consumption, and power-error-product metric for real-world applications like machine learning and compressive sensing, while assessing process variation impact. Additionally, it explores transportable approaches to more robust SCAPE implementations, including applying redundancy techniques for artificial neural network (ANN)-based digit recognition applications. Generic redundancy techniques are developed and applied to hybrid spin/CMOS-based ANNs, showcasing improved/comparable accuracy with smaller-sized networks. Furthermore, the dissertation examines hardware security considerations for emerging memristive device-based applications, discussing mitigation approaches against malicious manufacturing interventions. It also discusses reconfigurable computing for AI/ML applications based on state-of-the-art FPGAs, along with future directions in adaptive computing architectures for AI/ML at the edge of the network.
3

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024 (has links)
<p>Convolutional neural networks (CNNs) have become important workloads due to their<br> impressive accuracy in tasks like image classification and recognition. Convolution operations<br> are compute intensive, and this cost profoundly increases with newer and better CNN models.<br> However, convolutions come with characteristics such as sparsity which can be exploited. In<br> this dissertation, we propose three different works to capture sparsity for faster performance<br> and reduced energy. </p> <p><br></p> <p>The first work is an accelerator design called <em>SparTen</em> for improving two-<br> sided sparsity (i.e, sparsity in both filters and feature maps) convolutions with fine-grained<br> sparsity. <em>SparTen</em> identifies efficient inner join as the key primitive for hardware acceleration<br> of sparse convolution. In addition, <em>SparTen</em> proposes load balancing schemes for higher<br> compute unit utilization. <em>SparTen</em> performs 4.7x, 1.8x and 3x better than dense architecture,<br> one-sided architecture and SCNN, the previous state of the art accelerator. The second work<br> <em>BARISTA</em> scales up SparTen (and SparTen like proposals) to large-scale implementation<br> with as many compute units as recent dense accelerators (e.g., Googles Tensor processing<br> unit) to achieve full speedups afforded by sparsity. However at such large scales, buffering,<br> on-chip bandwidth, and compute utilization are highly intertwined where optimizing for<br> one factor strains another and may invalidate some optimizations proposed in small-scale<br> implementations. <em>BARISTA</em> proposes novel techniques to balance the three factors in large-<br> scale accelerators. <em>BARISTA</em> performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-<br> sided, naively scaled two-sided and an iso-area two-sided architecture, respectively. The last<br> work, <em>EUREKA</em> builds an efficient tensor core to execute dense, structured and unstructured<br> sparsity with losing efficiency. <em>EUREKA</em> achieves this by proposing novel techniques to<br> improve compute utilization by slightly tweaking operand stationarity. <em>EUREKA</em> achieves a<br> speedup of 5x, 2.5x, along with 3.2x and 1.7x energy reductions over Dense and structured<br> sparse execution respectively. <em>EUREKA</em> only incurs area and power overheads of 6% and<br> 11.5%, respectively, over Ampere</p>

Page generated in 0.0979 seconds