1 |
Efficient and Online Deep Learning through Model Plasticity and Stability. January 2020 (has links)
abstract: The rapid advancement of Deep Neural Networks (DNNs), computing, and sensing technology has enabled many new applications, such as self-driving vehicles, surveillance drones, and robotic systems. Compared to conventional edge devices (e.g., cell phones or smart home devices), these emerging devices must deal with much more complicated and dynamic situations in real time with bounded computation resources. However, several challenges remain, including but not limited to efficiency, real-time adaptation, model stability, and automation of architecture design.
To tackle the challenges mentioned above, model plasticity and stability are leveraged to achieve efficient and online deep learning, especially in the scenario of learning from streaming data at the edge:
First, a dynamic training scheme named Continuous Growth and Pruning (CGaP) is proposed to compress DNNs by growing important parameters and pruning unimportant ones, achieving up to a 98.1% reduction in the number of parameters.
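A minimal sketch of one grow-and-prune step in this spirit is shown below; magnitude is used as the importance measure, and the growth/pruning fractions and the convolutional layer are illustrative assumptions, not the exact CGaP criteria:

```python
import torch

def grow_and_prune_step(layer, grow_frac=0.01, prune_frac=0.02):
    """One illustrative plasticity step: re-enable weights in the most important
    output units, then prune the smallest-magnitude weights globally."""
    with torch.no_grad():
        w = layer.weight
        importance = w.abs()
        # Grow: re-activate weights in the rows (output units) with the largest total importance.
        flat = importance.view(importance.shape[0], -1)
        top_units = flat.sum(dim=1).topk(max(1, int(grow_frac * w.shape[0]))).indices
        mask = (w != 0).float()
        mask[top_units] = 1.0
        # Prune: zero out the globally smallest-magnitude weights.
        k = int(prune_frac * w.numel())
        if k > 0:
            threshold = importance.view(-1).kthvalue(k).values
            mask[importance <= threshold] = 0.0
        w.mul_(mask)
    return mask  # reuse this mask to keep pruned weights at zero during training

# Example: apply to a convolutional layer after each training epoch.
conv = torch.nn.Conv2d(64, 128, 3)
sparsity_mask = grow_and_prune_step(conv)
```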
Second, this dissertation presents Progressive Segmented Training (PST), which targets the catastrophic forgetting problem in continual learning through importance sampling, model segmentation, and memory-assisted balancing. PST achieves state-of-the-art accuracy with a 1.5X FLOPs reduction in the complete inference path.
Third, to facilitate online learning in real applications, acquisitive learning (AL) is further proposed to emphasize both knowledge inheritance and acquisition: the majority of the knowledge is first pre-trained in the inherited model and then adapted to acquire new knowledge. The inherited model's stability is monitored by noise injection and the landscape of the loss function, while the acquisition is realized by importance sampling and model segmentation. Compared to a conventional scheme, AL reduces the accuracy drop by more than 10X on the CIFAR-100 dataset, with a 5X reduction in latency per training image and a 150X reduction in training FLOPs.
Finally, this dissertation presents evolutionary neural architecture search in light of model stability (ENAS-S). ENAS-S uses a novel fitness score that accounts for both accuracy and model stability to search for an optimal inherited model for continual learning. ENAS-S outperforms hand-designed DNNs when learning from a data stream at the edge.
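As a rough illustration of a stability-aware fitness score of this kind (the noise level, penalty weight, and evaluation routine below are assumptions rather than the exact ENAS-S formulation):

```python
import copy
import torch

def stability_aware_fitness(model, eval_fn, noise_std=0.01, alpha=0.5, trials=3):
    """Fitness = clean accuracy minus a penalty for the accuracy drop observed
    when Gaussian noise is injected into the weights (a stability proxy)."""
    clean_acc = eval_fn(model)
    drops = []
    for _ in range(trials):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(noise_std * p.abs().mean() * torch.randn_like(p))
        drops.append(clean_acc - eval_fn(noisy))
    return clean_acc - alpha * (sum(drops) / len(drops))
```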
In summary, in this dissertation, several algorithms exploiting model plasticity and model stability are presented to improve the efficiency and accuracy of deep neural networks, especially for the scenario of continual learning. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2020
|
2 |
Designing a Performant Ablation Study Framework for PyTorch. Molinari, Alessio. January 2020 (has links)
PyTorch is becoming an increasingly important library for deep learning practitioners, as it provides many low-level functionalities that allow fine-grained control of neural networks from training to inference. For this reason it is also heavily used in deep learning research, where ablation studies are often conducted to validate the neural architectures that researchers come up with. To the best of our knowledge, Maggy is the first open-source framework for asynchronous parallel ablation studies and hyperparameter optimization for TensorFlow. In this work we added important functionalities, such as the ability to execute ablation studies on PyTorch models and a generalization of feature ablation to any data type. This work also discusses the main challenges and interesting points of developing a framework on top of PyTorch and how these challenges were addressed in the extension of Maggy.
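As an illustration of what a single feature-ablation trial amounts to, the sketch below uses plain PyTorch tensors; it is not Maggy's actual API, and build_model and train_and_eval are hypothetical callables standing in for a real training pipeline:

```python
import torch

def run_feature_ablation(features, build_model, train_and_eval, X, y):
    """Train once per ablated feature (one input column zeroed out) and compare metrics."""
    results = {}
    results["baseline"] = train_and_eval(build_model(X.shape[1]), X, y)
    for i, name in enumerate(features):
        X_ablated = X.clone()
        X_ablated[:, i] = 0.0            # remove one feature's signal
        model = build_model(X.shape[1])  # fresh model for a fair comparison
        results[f"without_{name}"] = train_and_eval(model, X_ablated, y)
    return results
```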
|
3 |
Grafové neuronové sítě pro odhad výkonnosti při hledání architektur / Graph Neural Networks for Performance Prediction in Neural Architecture Search. Suchopárová, Gabriela. January 2021 (has links)
In this work we present a novel approach to network embedding for neural architecture search: info-NAS. The model learns to predict the output features of a trained convolutional neural network on a set of input images. We use the NAS-Bench-101 search space as the neural architecture dataset and CIFAR-10 as the image dataset. For this task, we extend an existing unsupervised graph variational autoencoder, arch2vec, by jointly training on unlabeled and labeled neural architectures in a semi-supervised manner. To evaluate our approach, we analyze how our model learns on the data, compare it to the original arch2vec, and finally evaluate both models on the NAS-Bench-101 search task and on the performance prediction task.
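A schematic of such a semi-supervised objective might look as follows; the additive combination, the loss terms, and the vae/regressor interfaces are illustrative assumptions rather than the exact info-NAS formulation:

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(vae, regressor, unlabeled_graphs, labeled_graphs, targets, beta=1.0):
    """One training step: a VAE loss on all architectures plus a regression loss on
    labeled ones, whose latent codes are used to predict output features."""
    recon_u, mu_u, logvar_u, _ = vae(unlabeled_graphs)
    recon_l, mu_l, logvar_l, z_l = vae(labeled_graphs)

    def vae_loss(recon, graphs, mu, logvar):
        rec = F.binary_cross_entropy(recon, graphs)  # e.g. adjacency reconstruction
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl

    unsup = vae_loss(recon_u, unlabeled_graphs, mu_u, logvar_u)
    sup = F.mse_loss(regressor(z_l), targets)        # predict the trained network's output features
    return unsup + vae_loss(recon_l, labeled_graphs, mu_l, logvar_l) + beta * sup
```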
|
4 |
Redukce počtu parametrů v konvolučních neuronových sítích / Reducing Number of Parameters in Convolutional Neural Networks. Hübsch, Ondřej. January 2021 (has links)
In the current deep learning era, convolutional neural networks are commonly used as the backbone of systems that process images or videos. However, many existing neural network architectures are needlessly overparameterized, and their performance can be closely matched by an alternative that uses a much smaller number of parameters. Our aim is to design a method that is able to find such alternative(s) for a given convolutional architecture. We propose a general scheme for architecture reduction and evaluate three algorithms that search for the optimal smaller architecture. We run multiple experiments with ResNet and Wide ResNet architectures as the base, using the CIFAR-10 dataset. The best method is able to reduce the number of parameters by 75-85% without any loss in accuracy, even in these already quite efficient architectures.
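One simple instance of such a reduction scheme (a generic width-scaling search, not necessarily one of the three algorithms evaluated in the thesis) is sketched below; build_and_train is a hypothetical callable that constructs, trains, and evaluates the scaled-down network:

```python
def reduce_architecture(build_and_train, base_accuracy, tolerance=0.0):
    """Halve a width multiplier until retrained accuracy drops below the baseline.

    build_and_train(width_multiplier) is assumed to construct, train, and evaluate
    the scaled-down network, returning (accuracy, n_parameters).
    """
    best = (1.0, None)
    multiplier = 0.5
    while multiplier >= 0.125:
        accuracy, n_params = build_and_train(multiplier)
        if accuracy >= base_accuracy - tolerance:
            best = (multiplier, n_params)
            multiplier /= 2          # accuracy kept: try an even smaller network
        else:
            break                    # too small: accuracy was lost
    return best
```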
|
5 |
Bayesian Optimization for Neural Architecture Search using Graph Kernels. Krishnaswami Sreedhar, Bharathwaj. January 2020 (has links)
Neural architecture search is a popular method for automating architecture design. Bayesian optimization is a widely used approach for hyperparameter optimization and can estimate a function from limited samples. However, Bayesian optimization methods are not usually preferred for architecture search, since they expect vector inputs while architectures are high-dimensional graph data. This thesis presents a Bayesian approach with Gaussian process priors that uses graph kernels specifically targeted to work in the higher-dimensional graph space. We implemented three different graph kernels and show that, on the NAS-Bench-101 dataset, an untrained graph convolutional network kernel significantly outperforms previous methods in terms of the best network found and the number of samples required to find it. We follow the AutoML guidelines to make this work reproducible.
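A compact sketch of the underlying idea, Gaussian process regression over graph embeddings produced by an untrained (randomly initialized) GCN-style propagation, is shown below; the propagation depth, RBF kernel, and UCB acquisition are illustrative assumptions, not the thesis' exact kernels:

```python
import numpy as np

def gcn_embedding(adjacency, features, depth=2, hidden=16, seed=0):
    """Untrained GCN-style propagation: H <- ReLU(A_hat H W), repeated `depth` times."""
    rng = np.random.default_rng(seed)
    a_hat = adjacency + np.eye(adjacency.shape[0])
    a_hat = np.diag(1.0 / a_hat.sum(axis=1)) @ a_hat      # row-normalize
    h = features
    for _ in range(depth):
        w = rng.normal(scale=0.5, size=(h.shape[1], hidden))
        h = np.maximum(a_hat @ h @ w, 0.0)
    return h.mean(axis=0)                                  # graph-level embedding

def rbf(E1, E2, lengthscale=1.0):
    d = ((E1[:, None, :] - E2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * lengthscale ** 2))

def gp_posterior(K, K_star, k_star_star, y, noise=1e-3):
    """Standard GP regression posterior mean and variance."""
    K_inv = np.linalg.inv(K + noise * np.eye(len(y)))
    mean = K_star @ K_inv @ y
    var = k_star_star - np.einsum('ij,jk,ik->i', K_star, K_inv, K_star)
    return mean, np.maximum(var, 1e-12)

# Usage: embed evaluated and candidate architectures, then pick the candidate
# maximizing an upper-confidence-bound acquisition.
# E_train, y = embeddings and accuracies of evaluated architectures; E_cand = candidates
# mean, var = gp_posterior(rbf(E_train, E_train), rbf(E_cand, E_train), np.ones(len(E_cand)), y)
# next_arch = np.argmax(mean + 2.0 * np.sqrt(var))
```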
|
6 |
[pt] BUSCA POR ARQUITETURA NEURAL COM INSPIRAÇÃO QUÂNTICA APLICADA A SEGMENTAÇÃO SEMÂNTICA / [en] QUANTUM-INSPIRED NEURAL ARCHITECTURE SEARCH APPLIED TO SEMANTIC SEGMENTATION. GUILHERME BALDO CARLOS. 14 July 2023 (has links)
[en] Deep neural networks are responsible for great progress in performance on several perceptual tasks, especially in the fields of computer vision, speech recognition, and natural language processing. These results produced a paradigm shift in pattern recognition techniques, shifting the demand from feature extractor design to neural architecture design. However, designing novel deep neural network architectures is very time-consuming and relies heavily on expert intuition, knowledge, and a trial-and-error process. In that context, the idea of automating the architecture design of deep neural networks has gained popularity, establishing the field of neural architecture search (NAS). To tackle the NAS problem, authors have proposed several approaches regarding the search space definition, algorithms for the search strategy, and techniques to mitigate the resource consumption of those algorithms. Q-NAS (Quantum-inspired Neural Architecture Search) is one approach proposed to address the NAS problem using a quantum-inspired evolutionary algorithm as the search strategy. That method has been successfully applied to image classification, outperforming handcrafted models on the CIFAR-10 and CIFAR-100 datasets and also on a real-world seismic application. Motivated by this success, we propose SegQNAS (Quantum-inspired Neural Architecture Search applied to Semantic Segmentation), an adaptation of Q-NAS to semantic segmentation. We carried out several experiments to verify the applicability of SegQNAS on two datasets from the Medical Segmentation Decathlon challenge. SegQNAS achieved a 0.9583 dice similarity coefficient on the spleen dataset, outperforming traditional architectures such as U-Net and ResU-Net and reaching results comparable to a similar NAS work from the literature with a network that has far fewer parameters. On the prostate dataset, SegQNAS achieved a 0.6887 dice similarity coefficient, also outperforming U-Net, ResU-Net, and the similar NAS work from the literature.
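For reference, the Dice similarity coefficient reported above is the standard overlap measure for segmentation masks and can be computed as follows (the smoothing constant is a common convention, not something specified in the abstract):

```python
import numpy as np

def dice_coefficient(prediction, target, smooth=1e-6):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for binary segmentation masks."""
    prediction = prediction.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(prediction, target).sum()
    return (2.0 * intersection + smooth) / (prediction.sum() + target.sum() + smooth)
```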
|
7 |
[pt] BUSCA DE ARQUITETURAS NEURAIS COM ALGORITMOS EVOLUTIVOS DE INSPIRAÇÃO QUÂNTICA / [en] QUANTUM-INSPIRED NEURAL ARCHITECTURE SEARCH. DANIELA DE MATTOS SZWARCMAN. 13 August 2020 (has links)
[en] Deep neural networks are powerful and flexible models that have gained the attention of the machine learning community over the last decade. For a variety of tasks, they can even surpass human-level performance. Usually, to reach these excellent results, an expert spends significant time designing the neural architecture, with long trial-and-error sessions. In this scenario, there is growing interest in automating this design process. To address the neural architecture search (NAS) problem, authors have presented new methods based on techniques such as reinforcement learning and evolutionary algorithms, but high computational cost is still an issue for many of them. To reduce this cost, researchers have proposed restricting the search space with the help of expert knowledge. Quantum-inspired evolutionary algorithms present promising results regarding faster convergence. Motivated by this idea, we propose Q-NAS: a quantum-inspired algorithm to search for deep networks by assembling substructures. Q-NAS can also evolve some numerical hyperparameters, which is a first step towards complete automation. We ran several experiments with the CIFAR-10 dataset to analyze the details of the algorithm. For many parameter settings, Q-NAS was able to achieve satisfactory results. Our best accuracies on the CIFAR-10 task were 93.85 percent for a residual network and 93.70 percent for a convolutional network, outperforming hand-designed models and some NAS works. With the addition of a simple early-stopping mechanism, the evolution times for these runs were 67 GPU days and 48 GPU days, respectively. We also applied Q-NAS to CIFAR-100 without any parameter adjustment, reaching an accuracy of 74.23 percent, which is comparable to a ResNet with 164 layers. Finally, we present a case study with real datasets, in which we used Q-NAS to solve a seismic classification task. In less than 8.5 GPU days, Q-NAS generated networks with 12 times fewer weights and higher accuracy than a model specially created for this task.
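A toy sketch of the quantum-inspired update at the heart of such algorithms, written for a plain binary encoding (the rotation step and sampling scheme are illustrative assumptions; Q-NAS itself evolves richer architecture representations than bit strings):

```python
import numpy as np

def qiea_generation(q_prob, fitness_fn, pop_size=20, rotation=0.05, rng=None):
    """One generation: sample classical individuals from the quantum-inspired
    probability vector, then rotate probabilities toward the best individual."""
    rng = rng or np.random.default_rng()
    population = (rng.random((pop_size, q_prob.size)) < q_prob).astype(int)
    scores = np.array([fitness_fn(ind) for ind in population])
    best = population[scores.argmax()]
    # Move each gene's probability of being 1 toward the best individual's gene.
    q_prob = q_prob + rotation * (best - q_prob)
    return np.clip(q_prob, 0.01, 0.99), scores.max()

# Usage: start from the maximum-uncertainty state (all probabilities 0.5).
# q = np.full(32, 0.5)
# for _ in range(100):
#     q, best_score = qiea_generation(q, fitness_fn=lambda bits: bits.sum())
```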
|
8 |
[pt] APRIMORAÇÃO DO ALGORITMO Q-NAS PARA CLASSIFICAÇÃO DE IMAGENS / [en] ENHANCED Q-NAS FOR IMAGE CLASSIFICATION. JULIA DRUMMOND NOCE. 31 October 2022 (has links)
[en] Deep neural networks are powerful and flexible models that have gained the attention of the machine learning community over the last decade. Usually, an expert spends significant time designing the neural architecture, with long trial-and-error sessions to reach good and relevant results. Because of this manual process, there is growing interest in Neural Architecture Search (NAS), an automated method for searching neural network architectures. NAS is a subarea of Automated Machine Learning (AutoML) and an essential step towards automating machine learning methods. It is a technique that aims to automate the construction of a neural network architecture and is defined by its search space, search strategy, and performance estimation strategy. Quantum-inspired evolutionary algorithms present promising results regarding faster convergence when compared to other solutions with restricted search spaces and high computational costs. In this work, we enhance Q-NAS: a quantum-inspired algorithm that searches for deep networks by assembling simple substructures. Q-NAS can also evolve some numerical hyperparameters, which is a first step towards complete automation. Our contribution involves experimenting with other types of optimizers in the algorithm and making an in-depth study of the Q-NAS parameters. Additionally, we present Q-NAS results, evolved from scratch, on the CIFAR-100 dataset using only 18 GPU days. We were able to achieve an accuracy of 76.40%, which is competitive with other works in the literature. Finally, we also present the enhanced Q-NAS applied to a case study of COVID-19 vs. healthy classification on a real chest computed tomography database. In 9 GPU days, we were able to achieve an accuracy of 99.44% using fewer than 1000 samples as training data. This accuracy surpassed benchmark networks such as ResNet, GoogLeNet, and VGG.
|
9 |
EMONAS: Evolutionary Multi-objective Neuron Architecture Search of Deep Neural Network / EMONAS: Evolutionär multi-objektiv neuronarkitektursökning av djupa neurala nätverk för inbyggda system. Feng, Jiayi. January 2023 (has links)
Customized Deep Neural Network (DNN) accelerators have become increasingly popular in various applications, from autonomous driving and natural language processing to healthcare and finance. However, deploying DNNs directly on embedded system peripherals within real-time operating systems (RTOS) is not easy because of the paradox between the complexity of DNNs and the simplicity of embedded system devices. As a result, DNN implementation on embedded system devices requires customized accelerators with tailored hardware to cope with the networks' heavy computation, latency, and power demands. Moreover, the computational capacity provided by potent microprocessors or graphics processing units (GPUs) is necessary to unleash the full potential of DNNs, but such computational resources are often not available in embedded system devices. In this thesis, we propose an innovative method to evaluate and improve the efficiency of DNN implementation within the constraints of resource-limited embedded system devices. The Evolutionary Multi-Objective Neuron Architecture Search-Binary One Optimization (EMONAS-BOO) optimizes both the image classification accuracy and the innovative Binary One Optimization (BOO) objective using Multiple Objective Optimization (MOO) methods. EMONAS-BOO automates neural network search and training, and the diversity of the neural network architectures is guaranteed by an evolutionary algorithm consisting of tournament selection, polynomial mutation, and point crossover mechanisms. Binary One Optimization (BOO) evaluates the difficulty of implementing DNNs on resource-limited embedded system peripherals, employing a binary format for the DNN weights. A deeper implementation of Binary One Optimization significantly improves not only computational efficiency but also memory storage and power dissipation. It is based on reducing the number of binary 1s in the weights that must be computed and stored: fewer binary 1s mean fewer arithmetic operations and thus simpler neural network structures. In addition, from a digital circuit waveform perspective, more zero weights mean fewer voltage transitions when the embedded system executes the network, which in turn improves power efficiency. The proposed EMONAS employs an MOO method that optimizes two objectives: the first is image classification accuracy, and the second is Binary One Optimization (BOO). This approach enables EMONAS to outperform manually constructed and randomly searched DNNs. Notably, 12 out of 100 distinct DNNs maintained their image classification accuracy while also exhibiting superior BOO performance. Additionally, the proposed EMONAS ensures automated searching and training of DNNs and achieved significant reductions in key performance metrics: compared with random search, the evolutionary-searched BOO was lowered by up to 85.1%, parameter size by 85.3%, and FLOPs by 83.3%. These improvements were accomplished without sacrificing image classification accuracy, which instead increased by 8.0%. These results demonstrate that EMONAS is an excellent choice for optimizing novel objectives that did not exist before, and greater multi-objective optimization performance can be achieved simultaneously if computational resources are adequate.
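One plausible reading of the BOO objective is a count of the binary ones in the fixed-point representation of all weights; the sketch below assumes an 8-bit two's-complement format, which is an illustrative choice rather than the thesis' exact encoding:

```python
import numpy as np

def binary_one_count(weights, bits=8):
    """Count binary 1s in the fixed-point representation of a list of weight arrays.
    Fewer ones suggests cheaper arithmetic, storage, and switching activity."""
    total = 0
    for w in weights:
        max_abs = float(np.abs(w).max())
        scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
        q = np.round(w / scale).astype(np.int64) & ((1 << bits) - 1)  # two's-complement bit pattern
        ones = np.zeros_like(q)
        for b in range(bits):
            ones += (q >> b) & 1
        total += int(ones.sum())
    return total

# Example: a second NAS objective used alongside validation accuracy.
# boo = binary_one_count([p.detach().cpu().numpy() for p in model.parameters()])
```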
|
10 |
Research and Design of Neural Processing Architectures Optimized for Embedded Applications. Wu, Binyi. 28 May 2024 (has links)
Deploying neural networks on edge devices and bringing them into our daily lives is attracting more and more attention. However, their expensive computational cost makes many embedded applications daunting. The primary objective of my doctoral studies is to contribute to resolving this predicament: optimizing neural networks and designing corresponding efficient neural processing units for edge devices. This work took algorithmic research, specifically the optimization of deep neural networks, as a starting point and then applied its findings to steer the architecture design of Neural Processing Units (NPUs). The optimization of neural network models started with single-precision neural network quantization and progressed to mixed precision. The NPU architecture development followed the algorithmic research findings to achieve hardware/software co-design. Furthermore, a new approach to hardware and software co-development was introduced, aimed at expediting the prototyping and performance assessment of NPUs. This approach targets early-stage development; it helps developers focus on the design and optimization of NPUs and significantly shortens the development cycle. In the final project, a machine learning-based approach was applied to explore and optimize the computational and memory resources of the NPU. The work covers several different areas, from algorithmic research to hardware design, but all of them aim at improving the inference efficiency of neural networks. Specifically, the algorithm optimization reduces the memory footprint and computational cost of neural networks, while the NPU design focuses on improving the utilization of hardware resources. The proposed software and hardware co-development approach shortens the design cycle and speeds up design iteration. The order presented above corresponds to the structure of this dissertation; each chapter is devoted to a topic and covers relevant research, methodology, and experimental results.
1 Introduction
2 Convolutional Neural Networks
2.1 Convolutional layer
2.1.1 Padding
2.1.2 Convolution
2.1.3 Batch Normalization
2.1.4 Nonlinearity
2.2 Pooling Layer
2.3 Fully Connected Layer
2.4 Characterization
2.4.1 Composition of Operations and Parameters
2.4.2 Arithmetic Intensity
2.5 Optimization
3 Quantization with Double-Stage Squeeze-and-Threshold 19
3.1 Overview
3.1.1 Binarization
3.1.2 Multi-bit Quantization
3.2 Quantization of Convolutional Neural Networks
3.2.1 Quantization Scheme
3.2.2 Operator fusion of Conv2D
3.3 Activation Quantization with Squeeze-and-Threshold
3.3.1 Double-Stage Squeeze-and-Threshold
3.3.2 Inference Optimization
3.4 Experiment
3.4.1 Ablation Study of Squeeze-and-Threshold
3.4.2 Comparison with State-of-the-art Methods
3.5 Summary
4 Low-Precision Neural Architecture Search 39
4.1 Overview
4.2 Differentiable Architecture Search
4.2.1 Gumbel Softmax
4.2.2 Disadvantage and Solution
4.3 Low-Precision Differentiable Architecture Search
4.3.1 Convolution Sharing
4.3.2 Forward-and-Backward Scaling
4.3.3 Power Estimation
4.3.4 Architecture of Supernet
4.4 Experiment
4.4.1 Effectiveness of solutions to the dominance problem
4.4.2 Softmax and Gumbel Softmax
4.4.3 Optimizer and Inverted Learning Rate Scheduler
4.4.4 NAS Method Evaluation
4.4.5 Searched Model Analysis
4.4.6 NAS Cost Analysis
4.4.7 NAS Training Analysis
4.5 Summary
5 Configurable Sparse Neural Processing Unit 65
5.1 Overview
5.2 NPU Architecture
5.2.1 Buffer
5.2.2 Reshapeable Mixed-Precision MAC Array
5.2.3 Sparsity
5.2.4 Post Process Unit
5.3 Mapping
5.3.1 Mixed-Precision MAC
5.3.2 MAC Array
5.3.3 Support of Other Operation
5.3.4 Configurability
5.4 Experiment
5.4.1 Performance Analysis of Runtime Configuration
5.4.2 Roofline Performance Analysis
5.4.3 Mixed-Precision
5.4.4 Comparison with Cortex-M7
5.5 Summary
6 Agile Development and Rapid Design Space Exploration 91
6.1 Overview
6.1.1 Agile Development
6.1.2 Design Space Exploration
6.2 Agile Development Infrastructure
6.2.1 Chisel Backend
6.2.2 NPU Software Stack
6.3 Modeling and Exploration
6.3.1 Area Modeling
6.3.2 Performance Modeling
6.3.3 Layered Exploration Framework
6.4 Experiment
6.4.1 Efficiency of Agile Development Infrastructure
6.4.2 Effectiveness of Agile Development Infrastructure
6.4.3 Area Modeling
6.4.4 Performance Modeling
6.4.5 Rapid Exploration and Pareto Front
6.5 Summary
7 Summary and Outlook 123
7.1 Summary
7.2 Outlook
A Appendix of Double-Stage ST Quantization 127
A.1 Training setting of ResNet-18 in Table 3.3
A.2 Training setting of ReActNet in Table 3.4
A.3 Training setting of ResNet-18 in Table 3.4
A.4 Pseudocode Implementation of Double-Stage ST
B Appendix of Low-Precision Neural Architecture Search 131
B.1 Low-Precision NAS on CIFAR-10
B.2 Low-Precision NAS on Tiny-ImageNet
B.3 Low-Precision NAS on ImageNet
Bibliography 137
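Since the low-precision NAS chapters build on Gumbel-softmax-based differentiable architecture search (Chapter 4 above), a minimal sketch of that relaxation is given here; the candidate operations and temperature are placeholders, not the dissertation's actual search space:

```python
import torch
import torch.nn.functional as F

class MixedOp(torch.nn.Module):
    """Differentiable choice among candidate operations via Gumbel-softmax weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = torch.nn.ModuleList([
            torch.nn.Conv2d(channels, channels, 3, padding=1),
            torch.nn.Conv2d(channels, channels, 5, padding=2),
            torch.nn.Identity(),
        ])
        self.alpha = torch.nn.Parameter(torch.zeros(len(self.ops)))  # architecture logits

    def forward(self, x, tau=1.0):
        # Sample relaxed one-hot weights; gradients flow back into self.alpha.
        weights = F.gumbel_softmax(self.alpha, tau=tau, hard=False)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Example: x = torch.randn(2, 16, 8, 8); out = MixedOp(16)(x)
```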
|