
ENHANCING VISUAL UNDERSTANDING AND ENERGY-EFFICIENCY IN DEEP NEURAL NETWORKS

Sayeed Shafayet Chowdhury (19469710), 23 August 2024
Today’s deep neural networks (DNNs) have achieved tremendous performance in various domains such as computer vision, natural language processing, robotics, and generative tasks. However, these high-performing DNNs require enormous amounts of compute, resulting in significant power consumption. Moreover, they often struggle in terms of visual understanding capabilities. To that end, this thesis focuses on two aspects: enhancing the efficiency of neural networks and improving their visual understanding. On the efficiency front, we leverage brain-inspired Spiking Neural Networks (SNNs), which offer a promising alternative to traditional deep learning. We first perform a comparative analysis between models with and without leak, revealing that the leaky-integrate-and-fire (LIF) model provides improved robustness and better generalization compared to the integrate-and-fire (IF) model, although the leak decreases the sparsity of computation. In the second work, by introducing a novel Discrete Cosine Transform-based spike encoding scheme (DCT-SNN), we demonstrate significant performance improvements, achieving a 2-14X reduction in latency compared to state-of-the-art SNNs. Next, a novel temporal pruning method is proposed, which dynamically reduces the number of timesteps during training, enabling SNN inference with just one timestep while maintaining high accuracy.

The second focus of the thesis is on improving the visual understanding of DNNs. The first work in this direction introduces a framework for visual syntactic understanding, drawing parallels between linguistic syntax and the visual components of an image. By manipulating images to create syntactically incorrect examples and using a BERT-like autoencoder to reconstruct them, the study significantly enhances the visual syntactic recognition capabilities of DNNs, evidenced by substantial improvements in classification accuracies on the CelebA and AFHQ datasets. Further, the thesis tackles unsupervised procedure learning from videos, given multiple videos of the same underlying task. Employing optimal transport (OT) and introducing novel regularization strategies, we develop the ‘OPEL’ framework, which substantially outperforms existing methods (27-46% average improvement in F1-score) on both egocentric and third-person benchmarks. Overall, the dissertation advances the field by proposing brain-inspired models and novel learning frameworks that significantly enhance the efficiency and visual understanding capabilities of deep learning systems, making them more suitable for real-world applications.
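To make the IF/LIF comparison referenced in the abstract concrete, the sketch below simulates both neuron models on the same input current. It is a minimal illustration and not the thesis code: the soft-reset formulation, the decay factor `beta`, and the threshold `v_th` are assumed values chosen only to show how the leak affects membrane dynamics and spike counts.

```python
# Minimal sketch (not from the thesis): IF vs. LIF neuron updates.
# beta = 1.0 gives integrate-and-fire (no leak); beta < 1.0 gives leaky IF.
import numpy as np

def simulate(inputs, beta=1.0, v_th=1.0):
    """Run one neuron over a sequence of input currents and return its spike train."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x          # leak (beta < 1) decays accumulated potential
        spike = int(v >= v_th)    # fire when the membrane crosses the threshold
        v = v - spike * v_th      # soft reset: subtract the threshold after a spike
        spikes.append(spike)
    return np.array(spikes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.uniform(0.0, 0.5, size=20)
    print("IF  spike count:", simulate(inputs, beta=1.0).sum())  # integrates all input
    print("LIF spike count:", simulate(inputs, beta=0.8).sum())  # leak forgets old input
```

Because the leaky neuron discards part of its accumulated potential at every step, it typically needs stronger or more recent evidence to fire, which is one intuition for the robustness/sparsity trade-off the abstract describes.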
