1 |
ARCHITECTURE AND MAPPING CO-EXPLORATION AND OPTIMIZATION FOR DNN ACCELERATORS
Trewin, Benjamin Nicholas 01 May 2024
Optimizing a deep neural network (DNN) accelerator's energy and/or latency across a range of networks is extremely difficult because of the sheer size of the search space. DNN accelerators span a huge space of hardware architecture topologies and characteristics, each of which may perform better or worse on a given DNN, and DNN layers can in turn be mapped to hardware in a huge number of configurations. Further, a mapping that is optimal for one architecture is not necessarily optimal for another, so the two factors depend on one another. Co-optimization is therefore needed, so that hardware characteristics and mapping are optimized simultaneously to find both an optimal mapping and the best architecture for a DNN. This work presents Blink, a design space exploration (DSE) tool that co-optimizes hardware attributes and mapping configurations. The tool searches for optimal hardware architectures with a genetic algorithm and finds optimal mappings for each hardware configuration using a pruned random selection method. Architectures, layers, and mappings are each sent to Timeloop, a DNN accelerator simulator, to obtain accelerator statistics, which are returned to the genetic algorithm to select the next population. With this method, novel DNN accelerator solutions can be identified without the computationally massive task of exhaustive simulation.
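The loop described in this abstract can be sketched as follows. This is a minimal illustration and not Blink itself: the hardware parameters, tile sizes, pruning rule, and `toy_cost` function are all hypothetical stand-ins, and `toy_cost` only takes the place of a simulator call (the thesis uses Timeloop).

```python
import random

# Hypothetical search space; names and value ranges are illustrative only.
HW_SPACE = {"pe_count": [64, 128, 256, 512], "buffer_kib": [32, 64, 128, 256]}
TILE_SIZES = [1, 2, 4, 8, 16, 32]


def toy_cost(hw, mapping):
    """Stand-in for a simulator call; NOT a real accelerator model."""
    tile_m, tile_n = mapping
    utilization = (tile_m * tile_n) / hw["pe_count"]
    latency = 1.0 / (hw["pe_count"] * utilization)
    energy = 0.001 * hw["pe_count"] + 0.0005 * hw["buffer_kib"]
    return latency + energy


def mapping_is_valid(hw, mapping):
    """Assumed pruning rule: a tile may not need more PEs than exist."""
    tile_m, tile_n = mapping
    return tile_m * tile_n <= hw["pe_count"]


def best_mapping(hw, rng, samples=50):
    """Pruned random selection: sample mappings, skip invalid ones, keep the best."""
    best = (toy_cost(hw, (1, 1)), (1, 1))  # (1, 1) is always valid
    for _ in range(samples):
        mapping = (rng.choice(TILE_SIZES), rng.choice(TILE_SIZES))
        if not mapping_is_valid(hw, mapping):
            continue  # pruned before any cost evaluation
        cost = toy_cost(hw, mapping)
        if cost < best[0]:
            best = (cost, mapping)
    return best


def evolve(generations=10, pop_size=8, seed=0):
    """Genetic algorithm over hardware configs; each is scored by its best mapping."""
    rng = random.Random(seed)
    population = [{k: rng.choice(v) for k, v in HW_SPACE.items()}
                  for _ in range(pop_size)]
    best_overall = None
    for _ in range(generations):
        scored = []
        for hw in population:
            cost, mapping = best_mapping(hw, rng)
            scored.append((cost, hw, mapping))
        scored.sort(key=lambda item: item[0])
        if best_overall is None or scored[0][0] < best_overall[0]:
            best_overall = scored[0]
        # Selection: keep the better half, then refill by crossover and mutation.
        parents = [hw for _, hw, _ in scored[: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice([a[k], b[k]]) for k in HW_SPACE}
            if rng.random() < 0.2:
                key = rng.choice(list(HW_SPACE))
                child[key] = rng.choice(HW_SPACE[key])
            children.append(child)
        population = parents + children
    return best_overall


if __name__ == "__main__":
    cost, hw, mapping = evolve()
    print(cost, hw, mapping)
```

The point of the nesting is that a hardware configuration is never judged on an arbitrary mapping: its fitness is the cost of the best mapping found for it, which is what makes the search a co-optimization.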
|
2 |
Deep Neural Network Approach for Single Channel Speech Enhancement Processing
Li, Dongfu January 2016
Speech intelligibility represents how comprehensible speech is. In some applications it is more important than speech quality. Single-channel speech intelligibility enhancement is much more difficult than multi-channel intelligibility enhancement. It has recently been reported that training-based single-channel speech intelligibility enhancement algorithms perform better than Signal to Noise Ratio (SNR) based algorithms. In this thesis, a training-based Deep Neural Network (DNN) is used to improve single-channel speech intelligibility. To increase the performance of the DNN, the Multi-Resolution Cochlea Gram (MRCG) feature set is used as the input of the DNN. MATLAB objective test results show that the MRCG-DNN approach is more robust than a Gaussian Mixture Model (GMM) approach. The MRCG-DNN also works better than other DNN training algorithms. Various conditions, such as different speakers, different noise conditions, and reverberation, were tested in the thesis.
|
3 |
Camera ISP optimization for computer vision tasks performed by deep neural networks
Xiao, Zhenghong January 2023
This thesis aims to improve the performance of Deep Neural Networks (DNNs) in Computer Vision tasks by optimizing the Image Signal Processor (ISP) parameters. The research investigates the use of simulated RAW images and the application of the DRL-ISP (Deep Reinforcement Learning for Image Signal Processor) method to enhance the accuracy and robustness of DNNs. The study begins by using the Unpaired CycleR2R method to generate simulated RAW images from RGB images. The trained inverse ISP model successfully transforms the RGB images into simulated RAW images. The performance of DNNs in the Semantic Segmentation and Object Detection tasks is evaluated using both the simulated RAW and original RGB datasets. The results show that models trained on the original RGB dataset perform better, highlighting the challenges and limitations of using simulated RAW images. Furthermore, applying the DRL-ISP method for ISP parameter optimization improves Object Detection performance. This thesis provides insights into the challenges and opportunities of using simulated RAW data and optimizing ISP parameters for improved DNN performance in Computer Vision tasks. The findings contribute to the advancement of research in this field and lay the foundation for future investigations.
|
4 |
Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks
January 2018
Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their use on resource-constrained edge devices has been limited by their high computation and large memory requirements.
To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. Most of these works apply the compression techniques in isolation, and very few studies have applied quantization and structured sparsity together to a DNN model.
This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by performing a combined exploration of quantization and structured compression settings. The optimal DNN model achieves a 50X weight memory reduction compared to a floating-point uncompressed DNN. This memory saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that a deep-sparse DNN outperforms a shallow-dense DNN, with the level of memory savings varying with DNN precision and sparsity levels. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes activation memory and weight memory. For the low-sparsity DNNs, this changes the memory footprint of the optimal designs only slightly. However, activation memory cannot be ignored for high-sparsity DNNs. / Dissertation/Thesis / Masters Thesis Computer Engineering 2018
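The Pareto-optimal extraction mentioned in this abstract can be illustrated with a short sketch. It assumes two objectives to minimize, memory footprint and classification error. The design names and numbers are hypothetical, not results from the thesis, and the thesis may use different objectives or a different selection procedure.

```python
def pareto_front(designs):
    """Return the designs not dominated by any other design.

    Each design is (name, memory_mb, error_pct); both values are minimized.
    One design dominates another if it is no worse in both objectives and
    strictly better in at least one.
    """
    front = []
    for name, mem, err in designs:
        dominated = any(
            (m2 <= mem and e2 <= err) and (m2 < mem or e2 < err)
            for _, m2, e2 in designs
        )
        if not dominated:
            front.append((name, mem, err))
    return sorted(front, key=lambda d: d[1])


# Hypothetical candidate models (illustrative values only).
candidates = [
    ("A", 1.0, 12.0),
    ("B", 2.0, 9.0),
    ("C", 2.5, 9.5),  # dominated by B: more memory and more error
    ("D", 4.0, 7.0),
    ("E", 5.0, 7.5),  # dominated by D
]

if __name__ == "__main__":
    for design in pareto_front(candidates):
        print(design)
```

Every design left on the front represents a different trade-off: moving along it can only reduce error by spending more memory, which is why the front is a natural shortlist when picking from a large set of sparse and dense models.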
|
5 |
Design Space Exploration of MobileNet for Suitable Hardware Deployment
DEBJYOTI SINHA (8764737) 28 April 2020
Designing self-regulating machines that can see and comprehend the real-world objects around them is the main purpose of the AI domain. Recently, there have been marked advancements in the field of deep learning to create state-of-the-art DNNs for various CV applications. It is challenging to deploy these DNNs on resource-constrained micro-controller units because they are often quite memory intensive. Design Space Exploration is a technique that makes a CNN/DNN memory efficient and more flexible to deploy on resource-constrained hardware. MobileNet is a small DNN architecture designed for embedded and mobile vision, but researchers have still faced many challenges in deploying this model on resource-limited real-time processors.
This thesis proposes three new DNN architectures, which are developed using the Design Space Exploration technique. The state-of-the-art MobileNet baseline architecture is used as the foundation for the DNN architectures proposed in this study, which are enhanced versions of the baseline. DSE techniques such as data augmentation, architecture tuning, and architecture modification are applied to improve the baseline architecture. First, the Thin MobileNet architecture is proposed, which uses more intricate block modules than the baseline MobileNet. It is a compact, efficient, and flexible architecture with good model accuracy. To obtain more compact models, the KilobyteNet and the Ultra-thin MobileNet DNN architectures are proposed. Techniques such as channel depth alteration and hyperparameter tuning are introduced along with some of the techniques used for designing the Thin MobileNet. All the models are trained and validated from scratch on the CIFAR-10 dataset. The experimental results (training and testing) can be visualized using the live accuracy and logloss graphs provided by the Liveloss package. Of the three, the Ultra-thin MobileNet model offers the best balance between model accuracy and model size, and hence it is deployed on the NXP i.MX RT1060 embedded hardware unit for an image classification application.
|
6 |
Robust speech recognition in noisy and reverberant environments using deep neural network-based systems
Novoa Ilic, José Eduardo January 2018
Doctor of Electrical Engineering / In this thesis, an uncertainty weighting scheme for deep neural network-hidden Markov model (DNN-HMM) based automatic speech recognition (ASR) is proposed to increase discriminability in the decoding process. To this end, the DNN pseudo-log-likelihoods are weighted according to the uncertainty variance assigned to the acoustic observation. The results presented here suggest that a substantial reduction in word error rate (WER) is achieved with clean training. Moreover, modelling the uncertainty propagation through the DNN is not required, and no approximations for nonlinear activation functions are made. The presented method can be applied to any network topology that delivers log-likelihood-like scores. It can be combined with any noise removal technique and adds minimal computational cost. This technique was exhaustively evaluated and combined with uncertainty-propagation-based schemes for computing the pseudo-log-likelihoods and uncertainty variance at the DNN output. Two proposed methods optimized the parameters of the weighting function using grid search, either on a development database representing the given task or on each utterance based on discrimination metrics. Experiments on the Aurora-4 task showed that, with clean training, the proposed weighting scheme can reduce WER by a maximum of 21% compared with a baseline system with spectral subtraction and uncertainty propagation using the unscented transform.
Additionally, it is proposed to replace the classical black box integration of automatic speech recognition technology in human-robot interaction (HRI) applications with the incorporation of the HRI environment representation and modeling, and the robot and user states and contexts. Accordingly, this thesis focuses on the environment representation and modeling by training a DNN-HMM based automatic speech recognition engine combining clean utterances with the acoustic channel responses and noise that were obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids recording a training database in all the possible acoustic environments given an HRI scenario. In the generated testbed, the resulting ASR engine provided a WER that is at least 26% and 38% lower than publicly available speech recognition application programming interfaces (APIs) with the loudspeaker and human speakers testing databases, respectively, with a limited amount of training data.
This thesis demonstrates that even state-of-the-art DNN-HMM based speech recognizers can benefit from combining systems whose acoustic models have been trained using different feature sets. In this context, the complementarity of DNN-HMM based ASR systems trained with the same data set but with different signal representations is discussed. DNN fusion methods based on flat-weight combination, the minimization of mutual information, and the maximization of discrimination metrics were proposed and tested. Schemes that combine ASR systems using lattice combination and minimum Bayes risk decoding were also evaluated and combined with DNN fusion techniques. The experimental results were obtained using publicly available, naturally recorded, highly reverberant speech data. Significant improvements in WER were observed by combining DNN-HMM based ASR systems with different feature sets, obtaining relative improvements of 10% with two classifiers and 18% with four classifiers, without any tuning or a priori information about the ASR accuracy.
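Of the fusion methods listed, flat-weight combination is the simplest to illustrate. The sketch below assumes that each system outputs per-frame, per-class log scores and that fusion is an equal-weight average in the log domain. The abstract does not state the exact formulation, so both the function name and the domain of the average are assumptions.

```python
import math


def flat_weight_fusion(score_sets):
    """Equal-weight average of log scores from several systems.

    score_sets[s][t][c] is the log score that system s assigns to class c
    at frame t. All systems must cover the same frames and classes.
    """
    n_systems = len(score_sets)
    n_frames = len(score_sets[0])
    fused = []
    for t in range(n_frames):
        n_classes = len(score_sets[0][t])
        fused.append([
            sum(system[t][c] for system in score_sets) / n_systems
            for c in range(n_classes)
        ])
    return fused


# Two hypothetical systems, one frame, two classes (illustrative values).
system_a = [[math.log(0.6), math.log(0.4)]]
system_b = [[math.log(0.2), math.log(0.8)]]

if __name__ == "__main__":
    print(flat_weight_fusion([system_a, system_b]))
```

In this toy case the two systems disagree on the first frame, and the fused scores favor the second class because system B is more confident there than system A is in the first class. No tuning is involved, which matches the abstract's note that the reported gains need no a priori information about each system's accuracy.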
|
7 |
Squeeze-and-Excitation SqueezeNext: An Efficient DNN for Hardware Deployment
Naga Venkata Sai Ravi Teja Chappa (8742342) 22 April 2020
Convolutional neural networks (CNNs) are used in autonomous driving vehicles and advanced driver assistance systems (ADAS), where they have achieved great success. Before the convolutional neural network, traditional machine learning algorithms supported driver assistance systems. Currently, architectures such as MobileNet, SqueezeNext, and SqueezeNet are being widely explored. This exploration has improved CNN architectures and made them more suitable for implementation on real-time embedded systems.
This thesis proposes an efficient and compact CNN to improve on the performance of existing CNN architectures. The intuition behind the proposed architecture is to replace convolution layers with a more sophisticated block module and to develop a compact architecture with competitive accuracy. The thesis further explores the bottleneck module and the SqueezeNext basic block structure. The state-of-the-art SqueezeNext baseline architecture is used as a foundation to recreate and propose a high-performance SqueezeNext architecture. The proposed architecture is trained from scratch on the CIFAR-10 dataset. All the training and testing results are visualized with live loss and accuracy graphs. The focus of this thesis is an adaptable and flexible model for efficient CNN performance that achieves the minimum tradeoff between model accuracy, size, and speed. With a model size of 0.595MB, an accuracy of 92.60%, and a satisfactory training and validation speed of 9 seconds, this model can be deployed on a real-time autonomous system platform such as Bluebox 2.0 by NXP.
|
8 |
Squeeze-and-Excitation SqueezeNext: An Efficient DNN for Hardware Deployment
Chappa, Naga Venkata Sai Raviteja 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Convolutional neural networks (CNNs) are used in autonomous driving vehicles and advanced driver assistance systems (ADAS), where they have achieved great success. Before the convolutional neural network, traditional machine learning algorithms supported driver assistance systems. Currently, architectures such as MobileNet, SqueezeNext, and SqueezeNet are being widely explored. This exploration has improved CNN architectures and made them more suitable for implementation on real-time embedded systems.
This thesis proposes an efficient and compact CNN to improve on the performance of existing CNN architectures. The intuition behind the proposed architecture is to replace convolution layers with a more sophisticated block module and to develop a compact architecture with competitive accuracy. The thesis further explores the bottleneck module and the SqueezeNext basic block structure. The state-of-the-art SqueezeNext baseline architecture is used as a foundation to recreate and propose a high-performance SqueezeNext architecture. The proposed architecture is trained from scratch on the CIFAR-10 dataset. All the training and testing results are visualized with live loss and accuracy graphs. The focus of this thesis is an adaptable and flexible model for efficient CNN performance that achieves the minimum tradeoff between model accuracy, size, and speed. With a model size of 0.595MB, an accuracy of 92.60%, and a satisfactory training and validation speed of 9 seconds, this model can be deployed on a real-time autonomous system platform such as Bluebox 2.0 by NXP.
|
9 |
Design Space Exploration of MobileNet for Suitable Hardware Deployment
Sinha, Debjyoti 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Designing self-regulating machines that can see and comprehend the real-world objects around them is the main purpose of the AI domain. Recently, there have been marked advancements in the field of deep learning to create state-of-the-art DNNs for various CV applications. It is challenging to deploy these DNNs on resource-constrained micro-controller units because they are often quite memory intensive. Design Space Exploration is a technique that makes a CNN/DNN memory efficient and more flexible to deploy on resource-constrained hardware. MobileNet is a small DNN architecture designed for embedded and mobile vision, but researchers have still faced many challenges in deploying this model on resource-limited real-time processors.
This thesis proposes three new DNN architectures, which are developed using the Design Space Exploration technique. The state-of-the-art MobileNet baseline architecture is used as the foundation for the DNN architectures proposed in this study, which are enhanced versions of the baseline. DSE techniques such as data augmentation, architecture tuning, and architecture modification are applied to improve the baseline architecture. First, the Thin MobileNet architecture is proposed, which uses more intricate block modules than the baseline MobileNet. It is a compact, efficient, and flexible architecture with good model accuracy. To obtain more compact models, the KilobyteNet and the Ultra-thin MobileNet DNN architectures are proposed. Techniques such as channel depth alteration and hyperparameter tuning are introduced along with some of the techniques used for designing the Thin MobileNet. All the models are trained and validated from scratch on the CIFAR-10 dataset. The experimental results (training and testing) can be visualized using the live accuracy and logloss graphs provided by the Liveloss package. Of the three, the Ultra-thin MobileNet model offers the best balance between model accuracy and model size, and hence it is deployed on the NXP i.MX RT1060 embedded hardware unit for an image classification application.
|
10 |
A Series of Improved and Novel Methods in Computer Vision Estimation
Adams, James J 07 December 2023
In this thesis, findings in three areas of computer vision estimation are presented. First, an improvement to the Kanade-Lucas-Tomasi (KLT) feature tracking algorithm is presented in which gyroscope data is incorporated to compensate for camera rotation. This improved algorithm is then compared with the original algorithm and shown to be more effective at tracking features in the presence of large rotational motion. Next, a deep neural network approach to depth estimation is presented. Equations are derived relating camera and feature motion to depth. The information necessary for depth estimation is given as inputs to a deep neural network, which is trained to predict depth across an entire scene. This deep neural network approach is shown to be effective at predicting the general structure of a scene. Finally, a method of passively estimating the position and velocity of constant velocity targets using only bearing and time-to-collision measurements is presented. This method is paired with a path planner to avoid tracked targets. Results are given to show the effectiveness of the method at avoiding collision while maneuvering as little as possible.
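One common way to use gyroscope data for rotation compensation in feature tracking is to predict where each feature will appear after the camera rotates and to start the tracker's search from that prediction. The sketch below assumes a pinhole camera with square pixels and a pure rotation between frames. The thesis may integrate the gyroscope differently, and the function names and camera parameters here are hypothetical.

```python
import math


def rotation_about_y(theta):
    """Rotation matrix for an angle theta (radians) about the camera y axis.

    Assumed convention: the matrix maps coordinates in the old camera frame
    to coordinates in the new camera frame. theta could come from integrating
    a gyroscope rate over the frame interval (theta = rate * dt).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def predict_pixel(pixel, rotation, focal, cx, cy):
    """Predict a feature's pixel position after a pure camera rotation."""
    u, v = pixel
    # Back-project the pixel to a viewing ray in the old camera frame.
    ray = [(u - cx) / focal, (v - cy) / focal, 1.0]
    # Rotate the ray into the new camera frame.
    r = [sum(rotation[i][j] * ray[j] for j in range(3)) for i in range(3)]
    # Re-project with the same intrinsics and dehomogenize.
    return (focal * r[0] / r[2] + cx, focal * r[1] / r[2] + cy)


if __name__ == "__main__":
    # Hypothetical camera: 500-pixel focal length, principal point (320, 240).
    guess = predict_pixel((320.0, 240.0), rotation_about_y(0.1), 500.0, 320.0, 240.0)
    print(guess)
```

Because the prediction depends only on the rotation and the camera intrinsics, not on scene depth, it removes the part of the image motion that pure rotation causes and leaves the tracker a much smaller residual displacement to resolve.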
|