11

Enhancing Efficiency and Trustworthiness of Deep Learning Algorithms

Isha Garg (15341896) 24 April 2023 (has links)
This dissertation explores two major goals in deep learning algorithm design: efficiency and trustworthiness. We motivate these concerns in Chapter 1 and give relevant background in Chapter 2. We then discuss six works that target these two goals.
The first work, covered in Chapter 3, makes model compression more efficient so that it can be done in a single shot. This yields models with reduced size and fewer layers, and therefore faster and more efficient inference. Chapter 4 extends this approach to efficient continual learning while mitigating catastrophic forgetting. The method also circumvents the potential for data leakage because it does not need to store any data from past tasks. Next, we consider brain-inspired computing as an alternative to traditional neural networks for improving compute efficiency. The spiking neural networks discussed, however, have high inference latency because spikes must be accumulated over many timesteps. Chapter 5 tackles this with a new scheme that distributes an image over time by breaking it down into a sum of its ranked sinusoidal bases, which results in networks that are faster and more efficient to deploy. Chapter 6 targets both the communication expense and the potential for data leakage in federated learning by distilling the gradients to be communicated into a small number of images that resemble noise. Communicating these images is more efficient, and because they resemble noise, it circumvents the potential for data leakage. The last two chapters explore applications of the curvature of the loss with respect to input data points. In Chapter 7, we use curvature to create performant coresets that reduce dataset size and make training more efficient. In Chapter 8, we use curvature as a metric for overfitting and use it to expose dataset integrity issues arising from memorization.
12

Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment

Akash Gaikwad (5931047) 17 January 2019 (has links)
In recent years, deep learning models have become popular in real-time embedded applications, but hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in deep learning focuses on reducing the model size of Convolution Neural Networks (CNNs) through compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the promising techniques for solving these problems.
This thesis proposes methods to prune the convolution neural network SqueezeNet without introducing network sparsity in the pruned model. It proposes three methods that decrease the model size of the CNN without a significant drop in accuracy:
1: Pruning based on the Taylor expansion of the change in the cost function, Delta C.
2: Pruning based on L2 normalization of activation maps.
3: Pruning based on a combination of method 1 and method 2.
The proposed methods use these ranking criteria to rank the convolution kernels and prune the lower-ranked filters; afterwards, the SqueezeNet model is fine-tuned by backpropagation. Transfer learning is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model by 72% without a significant drop in accuracy (the optimal pruning efficiency result). Results also show that pruning based on a combination of the Taylor expansion of the cost function and L2 normalization of activation maps achieves better pruning efficiency than either individual pruning criterion, and that most of the pruned kernels come from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software, and its performance was evaluated.
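The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the code below assumes a commonly used first-order Taylor criterion (the absolute mean of activation times gradient) and a simple sum of the two normalized scores for method 3. All function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math

def l2_normalize(scores):
    """Scale a list of scores so its L2 norm is 1 (unchanged if all zeros)."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm > 0 else list(scores)

def taylor_scores(activations, gradients):
    """Assumed first-order Taylor estimate of |Delta C| per filter:
    |mean(activation * dC/d(activation))| over the flattened map."""
    return [abs(sum(a * g for a, g in zip(acts, grads)) / len(acts))
            for acts, grads in zip(activations, gradients)]

def activation_l2_scores(activations):
    """L2 norm of each filter's flattened activation map."""
    return [math.sqrt(sum(a * a for a in acts)) for acts in activations]

def filters_to_prune(activations, gradients, num_to_prune):
    """Indices of the lowest-ranked filters under the combined score.
    Summing the two normalized criteria is an assumption of this sketch."""
    t = l2_normalize(taylor_scores(activations, gradients))
    n = l2_normalize(activation_l2_scores(activations))
    combined = [a + b for a, b in zip(t, n)]
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    return sorted(order[:num_to_prune])

# Toy layer: three filters, four activation values each (made-up numbers).
acts = [[1.0, 2.0, 0.5, 1.5], [0.01, 0.0, 0.02, 0.0], [0.5, 0.5, 0.5, 0.5]]
grads = [[0.3, -0.1, 0.2, 0.4], [0.5, 0.5, 0.5, 0.5], [0.1, 0.1, 0.1, 0.1]]
pruned = filters_to_prune(acts, grads, 1)
print(pruned)
```

In this toy layer the second filter has near-zero activations, so it ranks lowest under both criteria. Removing whole filters, rather than zeroing individual weights, is what keeps the pruned model dense, which matches the abstract's goal of avoiding network sparsity.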
13

[en] APPROXIMATE BORN AGAIN TREE ENSEMBLES / [pt] ÁRVORES BA APROXIMADAS

28 October 2021 (has links)
Ensemble methods in machine learning such as random forest, boosting, and bagging have been thoroughly studied and proven to have better accuracy than a single predictor. Their drawback, however, is that they give models that can be much harder to interpret than those given by, for example, decision trees. In this work, we approach in a principled way the problem of constructing a decision tree that approximately reproduces a tree ensemble, exploring the tradeoff between accuracy and interpretability that can be obtained once exact reproduction is relaxed. First, we formally define the problem of obtaining the decision tree of a given depth that is most adherent to a tree ensemble and give a Dynamic Programming algorithm for solving this problem. We also prove that the decision trees obtained by this procedure satisfy generalization guarantees related to the generalization of the original tree ensembles, a crucial element for their effectiveness in practice. Since the computational complexity of the Dynamic Programming algorithm is exponential in the number of features, we also design two heuristics to compute trees of a given depth with good adherence to a tree ensemble. Finally, we conduct a comprehensive computational evaluation of the proposed algorithms. The results indicate that in many situations there is little or no loss in accuracy from working with more interpretable classifiers: even when restricted to depth-6 decision trees, our algorithms produce trees with average accuracies that are within 1 percent (for the Dynamic Programming algorithm) or 2 percent (for the heuristics) of the original random forest.
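To make the notion of adherence concrete, the sketch below measures how often a single candidate tree agrees with the majority vote of an ensemble on a set of sample points. The abstract does not state the formal definition used in the thesis, so treating adherence as an agreement rate is an assumption, and the stump classifiers, sample points, and function names are hypothetical.

```python
from collections import Counter

def ensemble_predict(trees, x):
    """Majority vote of the ensemble on input x."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

def adherence(candidate, trees, samples):
    """Fraction of samples on which the candidate tree matches the ensemble.
    This agreement rate is an assumed stand-in for the thesis's measure."""
    agree = sum(candidate(x) == ensemble_predict(trees, x) for x in samples)
    return agree / len(samples)

# Three depth-1 "trees" (stumps) over two-feature inputs; made-up thresholds.
trees = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]
candidate = lambda x: int(x[0] > 0.5)  # the single interpretable tree
samples = [(0.2, 0.2), (0.9, 0.0), (0.0, 0.9), (0.8, 0.8)]
score = adherence(candidate, trees, samples)
print(score)
```

A search procedure such as the Dynamic Programming algorithm mentioned in the abstract would look for the tree of bounded depth that maximizes a quantity of this kind; the search itself is not shown here.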
14

Representation and Efficient Computation of Sparse Matrix for Neural Networks in Customized Hardware

Yan, Lihao January 2022 (has links)
Deep neural networks are now widely applied in many fields. However, hundreds of thousands of neurons in each layer result in intensive memory storage requirements and a massive number of operations, making it difficult to employ deep neural networks on mobile devices where hardware resources are limited. One common technique for addressing the memory limitation is to prune and quantize the neural networks. In addition, because of the frequent use of the Rectified Linear Unit (ReLU) function or network pruning, the majority of the data in the weight matrices will be zeros, which not only take up a large amount of memory but also cause unnecessary computation. In this thesis, a new value-based compression method is put forward to represent sparse matrices more efficiently by eliminating these zero elements, and customized hardware is implemented to realize the decompression and computation operations. The value-based compression method replaces the nonzero data in each column of the weight matrix with a reference value (the arithmetic mean) and the relative differences between each nonzero element and the reference value. Intuitively, the data stored in each column is likely to contain similar values. The differences will therefore have a narrow range, and fewer bits than the full representation will suffice to represent all the differences. In this way, the weight matrix can be further compressed to save memory. The proposed value-based compression method reduces the memory storage requirement for the fully-connected layers of AlexNet to 37%, 41%, 47% and 68% of that of the compressed model (e.g., in the Compressed Sparse Column (CSC) format) when the data size is set to 8 bits and the sparsity is 20%, 40%, 60% and 80%, respectively. Meanwhile, compression rates of 41%, 53% and 63% are achieved for the fully-connected layers of the compressed AlexNet model with respect to 8-bit, 16-bit and 32-bit data when the sparsity is 40%. Similar results are obtained for the VGG16 experiment.
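The value-based encoding of a single column can be sketched as follows. The abstract does not specify how the mean is rounded, how row indices are stored, or how the difference width is chosen, so this sketch assumes integer weights, a mean rounded to the nearest integer, explicit row indices, and two's-complement differences. The function names and the example column are hypothetical.

```python
def compress_column(column):
    """Encode one column as (row indices, reference value, differences)."""
    rows = [i for i, v in enumerate(column) if v != 0]
    values = [column[i] for i in rows]
    if not values:
        return rows, 0, []
    # Reference value: arithmetic mean of the nonzeros, rounded (assumption).
    reference = round(sum(values) / len(values))
    diffs = [v - reference for v in values]
    return rows, reference, diffs

def decompress_column(rows, reference, diffs, length):
    """Rebuild the dense column from its compressed form."""
    column = [0] * length
    for r, d in zip(rows, diffs):
        column[r] = reference + d
    return column

def bits_for_diffs(diffs):
    """Smallest two's-complement width that holds every difference."""
    if not diffs:
        return 0
    lo, hi = min(diffs), max(diffs)
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits

# One sparse column of 8-bit weights with similar nonzero values (made up).
column = [0, 100, 0, 97, 103, 0, 0, 99]
rows, ref, diffs = compress_column(column)
width = bits_for_diffs(diffs)
print(rows, ref, diffs, width)
```

Here the four nonzero values cluster around 100, so each difference fits in 3 bits instead of the full 8, which is the intuition the abstract gives for why the differences need fewer bits.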
15

Efficient Edge Intelligence In the Era of Big Data

Jun Hua Wong (11013474) 05 August 2021 (has links)
Smart wearables, an emerging paradigm for capturing vital big data, have been attracting intense attention. One crucial problem, however, is their power-hungriness: continuous data streaming consumes energy dramatically and requires devices to be charged frequently. Targeting this obstacle, we propose to investigate the biodynamic patterns in the data and design a data-driven approach for intelligent data compression. We leverage Deep Learning (DL), more specifically a Convolutional Autoencoder (CAE), to learn a sparse representation of the vital big data. The minimized energy need, even taking the CAE-induced overhead into consideration, is tremendously lower than the original energy need. Further, compared with a state-of-the-art wavelet compression-based method, our method can compress the data with dramatically lower error for a similar energy budget. Our experiments and the validated approach are expected to boost the energy efficiency of wearables and thus greatly advance ubiquitous big data applications in the era of smart health.
In recent years, there has also been growing interest in edge intelligence for emerging instantaneous big data inference. However, inference algorithms, especially deep learning, usually have heavy computational requirements, which greatly limits their deployment on the edge. We take special interest in smart health wearable big data mining and inference.
Given deep learning's high computational complexity and large memory and energy requirements, new approaches are urgently needed to make deep learning algorithms ultra-efficient for wearable big data analysis. We propose to leverage knowledge distillation to achieve an ultra-efficient, edge-deployable deep learning model. More specifically, by transferring knowledge from a teacher model to the on-edge student model, the student model can effectively learn the soft target distribution of the teacher model. In addition, we propose to introduce adversarial robustness to the student model by training it to correctly identify inputs that carry adversarial perturbations. Experiments demonstrate that the knowledge distillation student model performs comparably to the heavy teacher model but has a substantially smaller model size. With adversarial learning, the student model effectively preserves its robustness. In this way, we demonstrate that the framework with knowledge distillation and adversarial learning can not only advance ultra-efficient edge inference but also preserve robustness against perturbed inputs.
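The idea of a student learning the teacher's soft target distribution can be sketched with the widely used temperature-scaled distillation loss. The abstract does not give the loss actually used, so the formulation below, including the `temperature` and `alpha` parameters and their default values, is an assumption. The adversarial-robustness component is not shown.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.
    temperature and alpha are illustrative values, not from the thesis."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # KL divergence from the student's to the teacher's soft distribution.
    kl = sum(p * math.log(p / q)
             for p, q in zip(soft_teacher, soft_student) if p > 0)
    # Standard cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor is the usual rescaling of the soft-target term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard

teacher = [2.0, 1.0, 0.0]
matched = distillation_loss([2.0, 1.0, 0.0], teacher, label=0)
mismatched = distillation_loss([0.0, 1.0, 2.0], teacher, label=0)
print(matched, mismatched)
```

A student whose logits match the teacher's incurs no soft-target penalty, while a student that ranks the classes in the opposite order is penalized by both terms.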
16

Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment

Gaikwad, Akash S. 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In recent years, deep learning models have become popular in real-time embedded applications, but hardware deployment is complicated by limited resources such as memory, computational power, and energy. Recent research in deep learning focuses on reducing the model size of Convolution Neural Networks (CNNs) through compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the promising techniques for solving these problems. This thesis proposes methods to prune the convolution neural network SqueezeNet without introducing network sparsity in the pruned model. It proposes three methods that decrease the model size of the CNN without a significant drop in accuracy. 1: Pruning based on the Taylor expansion of the change in the cost function, Delta C. 2: Pruning based on L2 normalization of activation maps. 3: Pruning based on a combination of method 1 and method 2. The proposed methods use these ranking criteria to rank the convolution kernels and prune the lower-ranked filters; afterwards, the SqueezeNet model is fine-tuned by backpropagation. Transfer learning is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model by 72% without a significant drop in accuracy (the optimal pruning efficiency result). Results also show that pruning based on a combination of the Taylor expansion of the cost function and L2 normalization of activation maps achieves better pruning efficiency than either individual pruning criterion, and that most of the pruned kernels come from mid- and high-level layers. The pruned model was deployed on BlueBox 2.0 using RTMaps software, and its performance was evaluated.
17

Convolutional and recurrent neural networks for real-time speech separation in the complex domain

Tan, Ke 16 September 2021 (has links)
No description available.
18

Towards Green AI: Cost-Efficient Deep Learning using Domain Knowledge

Srivastava, Sangeeta 12 August 2022 (has links)
No description available.
19

Distributed Intelligence for Multi-Robot Environment : Model Compression for Mobile Devices with Constrained Computing Resources / Distribuerad intelligens för multirobotmiljö : Modellkomprimering för mobila enheter med begränsade datorresurser

Souroulla, Timotheos January 2021 (has links)
Human-Robot Collaboration (HRC), where humans and robots work in the same environment simultaneously, is an emerging field that has grown massively during the past decade. For this collaboration to be feasible and safe, robots need to perform a proper safety analysis to avoid hazardous situations. This safety analysis involves complex computer vision tasks that require a lot of processing power. Robots with constrained computing resources therefore cannot execute these tasks without delay, and instead rely on edge infrastructure, such as remote computational resources accessible over wireless communication. In some cases, though, the edge may be unavailable, or connecting to it may not be possible. In such cases, robots still have to navigate around the environment while maintaining high levels of safety. This thesis project focuses on reducing the complexity and the total number of parameters of pre-trained computer vision models by using model compression techniques such as pruning and knowledge distillation. These techniques have strong theoretical and practical foundations, but work on their combination is limited, so it is investigated in this work. The results of this thesis project show that, in the test cases, up to 90% of the total number of parameters of a computer vision model can be removed without any considerable reduction in the model's accuracy.
20

Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks

Mealey, Thomas C. 31 May 2018 (has links)
No description available.
