1
Vector Quantization of Deep Convolutional Neural Networks with Learned Codebook. Yang, Siyuan, 16 February 2022.
Deep neural networks (DNNs), particularly convolutional neural networks (CNNs), have been widely applied in many fields, such as computer vision, natural language processing, and speech recognition. Although DNNs achieve dramatic accuracy improvements in these real-world tasks, they require significant amounts of resources (e.g., memory, energy, storage, bandwidth, and computation). This limits the application of these networks on resource-constrained systems, such as mobile and edge devices. A large body of literature addresses this problem from the perspective of compressing DNNs while preserving their performance. In this thesis, we focus on compressing deep CNNs based on vector quantization techniques.
The first part of this thesis summarizes some basic concepts in machine learning and popular model compression techniques, including pruning, quantization, low-rank factorization, and knowledge distillation. Our main interest is quantization, which compresses networks by reducing the precision of their parameters. Full-precision weights, activations, and even gradients can be quantized to 16-bit floating-point numbers, 8-bit integers, or even binary values. Although quantization may cause some performance degradation, it can greatly reduce the model size while largely maintaining model accuracy.
In the second part of this thesis, we propose a novel vector quantization approach for CNNs, which we refer to as Vector Quantization with Learned Codebook (VQLC). Rather than performing scalar quantization, we use vector quantization, which quantizes multiple weights at once. Instead of taking a pretraining/clustering approach as in most prior work, in VQLC the codebook for quantization is learned together with the network, trained from scratch. In the forward pass, the traditional convolutional filters are replaced by convex combinations of a set of learnable codewords. During inference, the compressed model is represented by a small codebook and a set of indices, resulting in a significant reduction of model size while preserving the network's performance.
Lastly, we validate our approach by quantizing multiple modern CNNs on several popular image classification benchmarks and comparing with state-of-the-art quantization techniques. Our experimental results show that VQLC achieves performance at least comparable, and often superior, to existing schemes. In particular, VQLC shows significant advantages over existing approaches on wide networks at high compression rates.
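To make the filter construction concrete, the following is a minimal PyTorch-style sketch of a convolution whose filters are convex combinations of shared, learnable codewords, in the spirit of the VQLC description above. It is an illustrative reconstruction rather than the thesis's implementation; the module name, codebook size, and initialization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookConv2d(nn.Module):
    """Convolution whose filters are convex combinations of shared, learnable codewords."""
    def __init__(self, in_channels, out_channels, kernel_size, num_codewords=32):
        super().__init__()
        self.out_channels = out_channels
        self.filter_shape = (in_channels, kernel_size, kernel_size)
        dim = in_channels * kernel_size * kernel_size
        # Shared codebook: each row is one flattened codeword.
        self.codebook = nn.Parameter(0.1 * torch.randn(num_codewords, dim))
        # Per-filter mixing logits; a softmax turns them into convex weights.
        self.logits = nn.Parameter(torch.zeros(out_channels, num_codewords))
        self.bias = nn.Parameter(torch.zeros(out_channels))

    def forward(self, x):
        mix = torch.softmax(self.logits, dim=1)      # rows sum to 1 (convex combination)
        weight = (mix @ self.codebook).view(self.out_channels, *self.filter_shape)
        return F.conv2d(x, weight, self.bias, padding=1)

layer = CodebookConv2d(3, 64, 3)
y = layer(torch.randn(1, 3, 32, 32))                 # -> shape (1, 64, 32, 32)
```

At deployment, only the codebook, the per-filter mixing coefficients (or their indices), and the biases need to be stored, which is where the reduction in model size comes from.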
2
Towards the Inference, Understanding, and Reasoning on Edge Devices. Ma, Guoqing, 10 May 2023.
This thesis explores the potential of edge devices in three applications: indoor localization, urban traffic prediction, and multi-modal representation learning. For indoor localization, we propose a reliable data transmission network and a robust data processing framework based on visible light communications and machine learning to enhance the intelligence of smart buildings. For urban traffic prediction, we propose a dynamic spatial-temporal origin-destination feature-enhanced deep network that uses a graph convolutional network to collaboratively learn a low-dimensional representation for each region and predict inbound and outbound traffic for every city region simultaneously. For multi-modal representation learning, we propose using dynamic contexts to uniformly model visual and linguistic causalities, introducing a novel dynamic-contexts-based similarity metric that considers the correlation of potential causes and effects to measure the relevance among images.
To enhance distributed training on edge devices, we introduce a new system called Distributed Artificial Intelligence Over-the-Air (AirDAI), in which clients train locally on their raw data and send only the trained outputs, such as model parameters, back to a central server for aggregation. To aid the development of AirDAI in wireless communication networks, we suggest a general system design and an associated simulator that can be tailored to wireless channels and system-level configurations. We also conduct experiments to confirm the effectiveness and efficiency of the proposed design and analyze the effects of wireless environments to facilitate future implementations and updates.
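As a rough sketch of the client-to-server workflow the AirDAI description implies (local training on raw data followed by server-side aggregation of model parameters), the following FedAvg-style snippet illustrates the idea while ignoring the wireless-channel aspects. The function names and the weighted-averaging rule are illustrative assumptions, not the thesis's exact protocol.

```python
import copy
import torch

def local_update(global_model, data_loader, epochs=1, lr=0.01):
    """Client side: train a copy of the global model on local raw data."""
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model.state_dict()            # only parameters leave the device

def aggregate(client_states, client_sizes):
    """Server side: weighted average of the received client parameters."""
    total = float(sum(client_sizes))
    new_state = copy.deepcopy(client_states[0])
    for key in new_state:
        new_state[key] = sum(state[key] * (n / total)
                             for state, n in zip(client_states, client_sizes))
    return new_state
```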
Finally, this thesis proposes FedForest to address the communication and computation limitations of heterogeneous edge networks; it optimizes the global network by distilling knowledge from aggregated sub-networks. The sub-network sampling process is differentiable, and the model size is used as an additional constraint to extract a new sub-network for the subsequent local optimization process. FedForest significantly reduces server-to-client communication and local device computation costs compared to conventional algorithms, while matching the performance of the benchmark Top-K sparsification method. FedForest can accelerate the deployment of large-scale deep learning models on edge devices.
3
Efficient and Secure Deep Learning Inference System: A Software and Hardware Co-design Perspective. January 2020.
abstract: Recent advances in Deep Learning (DL) have demonstrated its great potential to approach or surpass human-level performance across multiple domains. Consequently, there is rising demand to deploy state-of-the-art DL algorithms, e.g., Deep Neural Networks (DNNs), in real-world applications to relieve humans of repetitive work. On the one hand, the impressive performance achieved by DNNs is normally accompanied by intensive memory and power usage, due to enormous model size and heavy computation workloads, which significantly hampers their deployment on resource-limited cyber-physical systems or edge devices. Thus, enhancing the inference efficiency of DNNs has attracted great research interest across various communities. On the other hand, scientists and engineers still have insufficient knowledge of the principles of DNNs, which are therefore mostly treated as black boxes. Under such circumstances, a DNN is like "the sword of Damocles": its security and fault-tolerance capability are essential concerns that cannot be circumvented.
Motivated by the aforementioned concerns, this dissertation comprehensively investigates the emerging efficiency and security issues of DNNs from both software and hardware design perspectives. From the efficiency perspective, model compression via quantization is elaborated as the foundational technique for efficient inference of the target DNN. To maximize the inference performance boost, the deployment of quantized DNNs on a revolutionary Computing-in-Memory-based neural accelerator is presented in a cross-layer (device/circuit/system) fashion. From the security perspective, the well-known adversarial attack is investigated, spanning from its original input-attack form (i.e., adversarial example generation) to its parameter-attack variant. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
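The input-attack form mentioned above, adversarial example generation, is easiest to see in the classic fast gradient sign method (FGSM). The sketch below is generic background rather than the dissertation's specific attack; the step size and clipping range are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Perturb the input one step along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()      # move in the direction that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()            # keep pixel values in the valid range
```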
4
A Method of Combining GANs to Improve the Accuracy of Object Detection on Autonomous Vehicles. Ye, Fanjie, 12 1900.
As technology in the field of computer vision matures, autonomous vehicles have developed rapidly in recent years. However, the camera-based object detection and classification tasks of autonomous vehicles may face problems when the vehicle is driving at relatively high speed. One is that the camera collects blurred images at high speed, which can hurt the accuracy of deep neural networks. The other is that small objects far away from the vehicle are difficult for the networks to recognize. In this paper, we present a method that combines two kinds of GANs to solve these problems. We choose DeblurGAN as the base model to remove blur from images, and SRGAN to address the small-object detection problem. Because the combined runtime of these two models is too long, we also apply model compression to make the pipeline lighter. We then use YOLOv4 for object detection. Finally, we evaluate the whole model architecture and propose a second version based on DeblurGAN and ESPCN, which is faster than the first but may be less accurate.
5
Improving Communication Efficiency and Convergence in Federated Learning. Liu, Yangyi, January 2024.
Federated learning is an emerging field that has received tremendous attention, as it enables training Deep Neural Networks in a distributed fashion. By keeping data decentralized, Federated Learning enhances data privacy and security while maintaining the ability to train robust machine learning models. Unfortunately, despite these advantages, the communication overhead resulting from frequent communication between the central server and remote clients poses a serious challenge to present-day communication infrastructure. As the size of deep learning models and the number of devices participating in training keep increasing, the transmission of model gradients between the remote clients and the central server orchestrating the training process becomes the critical performance bottleneck. In this thesis, we investigate and address the problem of improving communication efficiency while maintaining convergence speed and accuracy in Federated Learning. To characterize the trade-off between communication cost and convergence in Federated Learning, we propose an innovative formulation that exploits the clients' correlation and casts gradient transmission and reconstruction as a multi-terminal source coding problem. Leveraging this formulation, the model update problem in Federated Learning is converted into a convex optimization problem from a rate-distortion perspective. Technical results are provided, including an iterative algorithm to compute upper and lower bounds on the sum-rate, as well as rate allocation schemes. Additionally, a correlation-aware client selection strategy is proposed and evaluated against state-of-the-art methods. Extensive simulations are conducted to validate our theoretical analysis and the effectiveness of the proposed approaches.
Furthermore, based on statistical insights about the model gradient, we propose a gradient compression algorithm that is also inspired by rate-distortion theory. More specifically, the proposed algorithm applies model-wise sparsification for preliminary gradient dimension reduction and then performs layer-wise gradient quantization for further compression. The experimental results show that our approach achieves compression as aggressive as 1 bit while maintaining proper model convergence speed and final accuracy. / Thesis / Doctor of Science (PhD) / Federated Learning is a machine learning framework that allows remote clients to collaboratively train a model without exchanging raw data, which preserves local data privacy. It differs from traditional machine learning scenarios where data needs to be stored centrally. This decentralized framework is advantageous in several respects, including data security, data diversity, real-time continual learning, and hardware efficiency. However, the demand for frequent communication between clients and the server imposes tremendous communication challenges in applying Federated Learning to real-world scenarios. This thesis aims to tackle these problems by theoretically characterizing them and developing practical methodologies. The theoretical results allow for a systematic analysis of the communication cost and convergence rate, and the experimental results validate the effectiveness of the proposed methods in improving communication efficiency and convergence in Federated Learning.
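The sparsify-then-quantize pipeline described above can be illustrated with a toy sketch: keep only the largest-magnitude gradient entries, then represent the survivors with one bit plus a scale. This is a simplified stand-in for the thesis's rate-distortion-based scheme; the helper names, keep ratio, and 1-bit rule are assumptions.

```python
import torch

def topk_sparsify(grad, keep_ratio=0.01):
    """Keep only the largest-magnitude entries of a flattened gradient."""
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]                    # transmit indices + surviving values

def quantize_1bit(values):
    """1-bit quantization: send signs plus a single scale (the mean magnitude)."""
    scale = values.abs().mean()
    return torch.sign(values), scale

def reconstruct(signs, scale, indices, numel):
    """Receiver side: rebuild a dense (sparse, quantized) gradient."""
    flat = torch.zeros(numel)
    flat[indices] = signs * scale
    return flat
```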
6
Μοντέλο προσομοίωσης συμπίεσης μαστού / A simulation model for breast compression. Ζυγανιτίδης, Χρήστος, 28 June 2007.
Breast compression during mammography is currently simulated using finite element (FEM) analysis, which, due to its computational demands, forces the use of a relatively small number of nodes and leads to a low-resolution image of the compressed breast. Moreover, generating a mesh of the volume under compression is a tedious and complex task that demands user interaction. In this work, a novel method for simulating compression during mammography has been developed that uses a high-resolution 3D breast phantom with any geometrical structure and contents. The work was not focused on producing a precise model of the structure and behavior of human tissues, but on achieving realistic results of breast compression during mammography, thereby contributing to a more accurate mammography simulation.
This method is based on a linear spring model and uses a repetitive random process to reach the equilibrium position of all the discrete points (nodes) in the volume. The elements of the model consist of 27 nodes each: one center node and 26 neighbor nodes. Neighboring elements share common nodes and therefore overlap each other. The critical issue of volume preservation was resolved by introducing springs with variable equilibrium lengths which, depending on the compression applied to each element, expand or contract so that the volume of each element is preserved. Finally, user interaction is minimized by dismissing the tedious and time-consuming mesh generation and using a fully automated voxel-based model instead. Applying this method, it was possible to compute the new position of each of the 500,000 nodes of a breast phantom subjected to 50% compression, composed of fatty tissue, glands, and various abnormalities (calcifications), with an average deviation of less than 0.1 mm. The reversibility of the algorithm, as well as the validity of the deviation estimator, were verified by means of a "reverse simulation" during which the compressed breast phantom was decompressed back to its initial uncompressed state. Compression and decompression each took approximately 12 hours on a 2.4 GHz Windows XP PC. The source code is written in Java. The results show that it is possible to obtain a 512x512x512 compressed 3D breast phantom with an unlimited number of different tissues and structures, which can be used in high-resolution mammography simulation.
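The thesis implements the method in Java; purely as an illustration of the underlying linear-spring relaxation idea (not the 27-node overlapping-element scheme or the volume-preserving variable rest lengths), a single Hookean relaxation pass over a node set might look like the sketch below, with all names and constants assumed.

```python
import numpy as np

def relax_pass(positions, springs, rest_lengths, stiffness=0.5, fixed=frozenset()):
    """One relaxation pass: move each free node along the net linear-spring force.

    positions:    (N, 3) array of node coordinates
    springs:      list of (i, j) node-index pairs
    rest_lengths: equilibrium length of each spring
    fixed:        indices of boundary nodes (e.g., nodes touching the compression plates)
    """
    forces = np.zeros_like(positions)
    for (i, j), rest in zip(springs, rest_lengths):
        d = positions[j] - positions[i]
        dist = np.linalg.norm(d) + 1e-12
        # Hookean spring: force proportional to extension beyond the rest length.
        f = stiffness * (dist - rest) * (d / dist)
        forces[i] += f
        forces[j] -= f
    for n in range(len(positions)):
        if n not in fixed:
            positions[n] = positions[n] + forces[n]
    return positions
```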
7
Exploiting diversity for efficient machine learning. Geras, Krzysztof Jerzy, January 2018.
A common practice for solving machine learning problems is currently to consider each problem in isolation, starting from scratch every time a new learning problem is encountered or a new model is proposed. This is a perfectly feasible solution when the problems are sufficiently easy or, if the problem is hard, when a large amount of resources, in terms of both training data and computation, is available. Although this naive approach has been the main focus of machine learning research for a few decades and has had a lot of success, it becomes infeasible when the problem is too hard in proportion to the available resources. When using a complex model in this naive approach, it is necessary to collect large data sets (if possible at all) to avoid overfitting, and hence also necessary to use large computational resources to handle the increased amount of data, first during training to process a large data set and then at test time to execute a complex model. An alternative to treating each learning problem independently is to leverage related data sets and the computation encapsulated in previously trained models. By doing so, we can decrease the amount of data necessary to reach a satisfactory level of performance and, consequently, improve the achievable accuracy and decrease training time. Our attack on this problem is to exploit diversity - in the structure of the data set, in the features learnt, and in the inductive biases of different neural network architectures. In the setting of learning from multiple sources we introduce multiple-source cross-validation, which gives an unbiased estimator of the test error when the data set is composed of data coming from multiple sources and the data at test time come from a new, unseen source. We also propose new estimators of the variance of standard k-fold cross-validation and multiple-source cross-validation, which have lower bias than previously known ones. To improve unsupervised learning we introduce scheduled denoising autoencoders, which learn a more diverse set of features than the standard denoising autoencoder. This is due to their training procedure, which starts with a high level of noise, when the network learns coarse features, and then gradually lowers the noise, allowing the network to learn more local features. A connection between this training procedure and curriculum learning is also drawn. We develop the idea of learning a diverse representation further by explicitly incorporating the goal of obtaining a diverse representation into the training objective. The proposed model, the composite denoising autoencoder, learns multiple subsets of features focused on modelling variations in the data set at different levels of granularity. Finally, we introduce the idea of model blending, a variant of model compression, in which the two models, the teacher and the student, are both strong models but differ in their inductive biases. As an example, we train convolutional networks using the guidance of bidirectional long short-term memory (LSTM) networks. This allows training the convolutional neural network to be more accurate than the LSTM network at no extra cost at test time.
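For intuition, multiple-source cross-validation can be sketched as holding out one source at a time so that every test fold comes from a source unseen during training. The snippet below is a schematic reconstruction of that estimator; the fit and score callables are assumed placeholders, not the thesis's code.

```python
import numpy as np

def multiple_source_cv(X, y, sources, fit, score):
    """Estimate test error on data from an unseen source by holding out one source at a time."""
    errors = []
    for held_out in np.unique(sources):
        train = sources != held_out
        test = ~train
        model = fit(X[train], y[train])              # train on all remaining sources
        errors.append(score(model, X[test], y[test]))
    return float(np.mean(errors))
```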
8
Efficient and Online Deep Learning through Model Plasticity and Stability. January 2020.
abstract: The rapid advancement of Deep Neural Networks (DNNs), computing, and sensing technology has enabled many new applications, such as self-driving vehicles, surveillance drones, and robotic systems. Compared to conventional edge devices (e.g., cell phones or smart home devices), these emerging devices must deal with much more complicated and dynamic situations in real time with bounded computation resources. However, there are several challenges, including but not limited to efficiency, real-time adaptation, model stability, and automation of architecture design.
To tackle the challenges mentioned above, model plasticity and stability are leveraged to achieve efficient and online deep learning, especially in the scenario of learning streaming data at the edge:
First, a dynamic training scheme named Continuous Growth and Pruning (CGaP) is proposed to compress DNNs by growing important parameters and pruning unimportant ones, achieving up to a 98.1% reduction in the number of parameters (a minimal pruning sketch appears after the summary below).
Second, this dissertation presents Progressive Segmented Training (PST), which targets catastrophic forgetting problems in continual learning through importance sampling, model segmentation, and memory-assisted balancing. PST achieves state-of-the-art accuracy with 1.5X FLOPs reduction in the complete inference path.
Third, to facilitate online learning in real applications, acquisitive learning (AL) is further proposed to emphasize both knowledge inheritance and acquisition: the majority of the knowledge is first pre-trained into the inherited model and then adapted to acquire new knowledge. The inherited model's stability is monitored by noise injection and the landscape of the loss function, while acquisition is realized by importance sampling and model segmentation. Compared to a conventional scheme, AL reduces the accuracy drop by >10X on the CIFAR-100 dataset, with a 5X reduction in latency per training image and a 150X reduction in training FLOPs.
Finally, this dissertation presents evolutionary neural architecture search in light of model stability (ENAS-S). ENAS-S uses a novel fitness score, which addresses not only the accuracy but also the model stability, to search for an optimal inherited model for the application of continual learning. ENAS-S outperforms hand-designed DNNs when learning from a data stream at the edge.
In summary, in this dissertation, several algorithms exploiting model plasticity and model stability are presented to improve the efficiency and accuracy of deep neural networks, especially for the scenario of continual learning. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
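As promised above, here is a minimal sketch of the pruning half of a grow-and-prune scheme such as CGaP: weights with the smallest magnitudes are treated as unimportant and masked out. The threshold rule and sparsity level are illustrative assumptions, not the dissertation's criteria.

```python
import torch

def magnitude_mask(weight, sparsity=0.9):
    """Mask out the smallest-magnitude entries, keeping a (1 - sparsity) fraction."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(64, 128)
mask = magnitude_mask(w, sparsity=0.98)
pruned = w * mask                                   # re-apply the mask at each training step
print(f"fraction pruned: {1.0 - mask.mean().item():.3f}")
```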
9
Classifying hand-drawn documents in mobile settings, using transfer learning and model compression / Klassificerare av handskrivna dokument för mobil användning. Riese, Axel, January 2017.
In recent years, the state of the art in computer vision has improved immensely due to the increased use of convolutional neural networks (CNNs). However, the best-performing models are typically complex and too slow or too large for mobile use. We investigate whether the power of these large models can be transferred to smaller models and used in mobile applications. A small CNN model was designed based on VGG Net. Using transfer learning, three pre-trained ImageNet networks were tuned to perform hand-drawn image classification. The models were evaluated on their predictive power, and the best model was compressed into the small CNN model using knowledge distillation, a flavor of model compression. We found a small but significant improvement in classification performance compared to training the small CNN model directly on the training data. No such improvement was found in localization abilities. We claim that model compression, and knowledge distillation in particular, is a valuable tool for mobile deep learning development.
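Knowledge distillation, as used above, trains the small model against a blend of the teacher's softened outputs and the true labels. The following is a generic sketch of that loss; the temperature and mixing weight are illustrative assumptions, not the thesis's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend the soft-target (teacher) loss with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                      # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```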
10
Deep Learning Model Compression for Edge Deployment. Vaishnav, Ashutosh, January 2019.
Powerful deep learning algorithms today allow us to solve many difficult classification and regression tasks. However, running them on memory-constrained and low-power devices for efficient inference at the edge is a challenge. The goal is to develop a highly generalizable, low-complexity compression algorithm for deep neural networks. In this thesis, we propose two novel approaches to this end. The first approach learns a new network with L1-norm-regularized parameters from the original trained model; this new model is trained with only a fraction of the original dataset. The second approach uses information about the second-order derivative of the loss to find solutions that are robust to quantization. Combining these approaches allows us to achieve significant compression of the trained model, with only marginal loss in performance, measured using test set classification accuracy.
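The first approach, retraining with L1-norm-regularized parameters, amounts to adding a weight-sparsity penalty to the task loss. A minimal sketch is below; the penalty strength is chosen arbitrarily for illustration and is not the thesis's value.

```python
import torch
import torch.nn.functional as F

def l1_regularized_loss(model, x, y, lam=1e-4):
    """Task loss plus an L1 penalty that drives many weights toward zero."""
    task = F.cross_entropy(model(x), y)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return task + lam * l1
```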