
ENHANCING VISUAL UNDERSTANDING AND ENERGY-EFFICIENCY IN DEEP NEURAL NETWORKS

<p dir="ltr">Today’s deep neural networks (DNNs) have achieved tremendous performance in domains such as computer vision, natural language processing, robotics, and generative tasks. However, these high-performing DNNs require enormous amounts of compute, resulting in significant power consumption. Moreover, they often struggle with visual understanding. To that end, this thesis focuses on two aspects: enhancing the efficiency of neural networks and improving their visual understanding. On the efficiency front, we leverage brain-inspired Spiking Neural Networks (SNNs), which offer a promising alternative to traditional deep learning. We first perform a comparative analysis between models with and without leak, revealing that the leaky-integrate-and-fire (LIF) model provides improved robustness and better generalization compared to the integrate-and-fire (IF) model; however, leak decreases the sparsity of computation. In the second work, we introduce a novel Discrete Cosine Transform-based spike encoding scheme (DCT-SNN) that achieves significant performance improvements, with a 2-14X reduction in latency compared to state-of-the-art SNNs. Next, we propose a novel temporal pruning method that dynamically reduces the number of timesteps during training, enabling SNN inference with just one timestep while maintaining high accuracy.</p>
<p dir="ltr">The second focus of the thesis is improving the visual understanding of DNNs. The first work in this direction introduces a framework for visual syntactic understanding, drawing parallels between linguistic syntax and the visual components of an image. By manipulating images to create syntactically incorrect examples and using a BERT-like autoencoder for reconstruction, the study significantly enhances the visual syntactic recognition capabilities of DNNs, evidenced by substantial improvements in classification accuracy on the CelebA and AFHQ datasets. Further, the thesis tackles unsupervised procedure learning from videos, given multiple videos of the same underlying task. Employing optimal transport (OT) and introducing novel regularization strategies, we develop the ‘OPEL’ framework, which substantially outperforms existing methods (27-46% average improvement in F1-score) on both egocentric and third-person benchmarks. Overall, the dissertation advances the field by proposing brain-inspired models and novel learning frameworks that significantly enhance the efficiency and visual understanding of deep learning systems, making them more suitable for real-world applications.</p>
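The IF-versus-LIF comparison above hinges on a leak term in the neuron's membrane-potential update. A minimal single-neuron sketch of the two dynamics (illustrative values only, not the thesis's actual models or parameters) shows how the leak factor makes a neuron discount stale input:

```python
def simulate(inputs, leak=1.0, v_th=1.0):
    """Simulate a single spiking neuron over discrete timesteps.

    leak=1.0 gives an integrate-and-fire (IF) neuron (no decay);
    leak<1.0 gives a leaky-integrate-and-fire (LIF) neuron whose
    membrane potential decays every step.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = leak * v + x      # (leaky) integration of input current
        if v >= v_th:         # threshold crossing -> emit a spike
            spikes.append(1)
            v -= v_th         # soft reset: subtract the threshold
        else:
            spikes.append(0)
    return spikes

inputs = [0.4, 0.4, 0.4, 0.0, 0.4, 0.4, 0.4]
if_spikes = simulate(inputs, leak=1.0)    # IF accumulates all past input
lif_spikes = simulate(inputs, leak=0.5)   # LIF forgets input over time
```

With these toy values the IF neuron reaches threshold and fires, while the LIF neuron's potential decays before it can cross the threshold — the same forgetting of old inputs that underlies the robustness-versus-sparsity trade-off discussed in the first work.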

DOI: 10.25394/pgs.26811904.v1
Identifier: oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26811904
Date: 23 August 2024
Creators: Sayeed Shafayet Chowdhury (19469710)
Source Sets: Purdue University
Detected Language: English
Type: Text, Thesis
Rights: CC BY 4.0
Relation: https://figshare.com/articles/thesis/ENHANCING_VISUAL_UNDERSTANDING_AND_ENERGY-EFFICIENCY_IN_DEEP_NEURAL_NETWORKS/26811904
