Global ETD Search

1	OBJECT DETECTION IN DEEP LEARNING Haoyu Shi (8100614) 10 December 2019 Advances in computing and the availability of GPUs (Graphics Processing Units) for mathematical computation have made deep learning increasingly popular and prevalent. Object detection with deep learning, a part of image processing, plays an important role in autonomous driving and computer vision. Object detection comprises object localization and object classification. In object localization, the computer scans the image and returns the coordinates that localize the object. In object classification, the computer assigns targets to categories. The traditional object detection pipeline comes from Fast/Faster R-CNN [32] [58]. A region proposal network generates the regions that contain objects and passes them to a classifier. The first step performs object localization and the second performs object classification. This two-step pipeline is not time-efficient. To address this problem, the You Only Look Once (YOLO) [4] network was introduced. YOLO is a single end-to-end neural network pipeline that processes images at 45 frames per second in real time for network prediction. This thesis introduces convolutional neural networks, including state-of-the-art convolutional neural networks of recent years. YOLO implementation details are illustrated step by step. We adopt the YOLO network for our applications because it converges faster in training, provides high accuracy, and has an end-to-end architecture, which makes the network easy to optimize and train. Computer Engineering Deep learning neural network convolution neural network yolo
2	Semantic Segmentation Using Deep Learning Neural Architectures Sarpangala, Kishan January 2019 No description available. Artificial Intelligence Semantic Segmentation Convolutional Neural Network Computer Vision Deep Learning Neural Network Artificial Intelligence Fully Convolutional Network
3	Segmentace cévního řečiště ve snímcích sítnice metodami hlubokého učení / Blood vessel segmentation in retinal images using deep learning approaches Serečunová, Stanislava January 2018 This diploma thesis deals with the application of deep neural networks, with a focus on image segmentation. The theoretical part describes deep neural networks and summarizes widely used convolutional architectures for segmenting objects from images. The practical part was devoted to testing existing network architectures. For this purpose, the open-source software library TensorFlow, used from the Python programming language, was employed. A frequent problem with convolutional neural networks is their need for a large amount of input data. To overcome this obstacle, a new data set was created by combining five freely available databases. The selected U-Net network architecture was tested on the first modification of the newly created data set. Based on the test results, the chosen network architecture was modified, yielding a new network that performs better than the original. The modified architecture was then trained on the newly created data set, which contains images of different types taken with various fundus cameras. As a result, the trained network is more robust and can segment retinal blood vessels from images with different parameters. The modified architecture was tested on the STARE, CHASE, and HRF databases. The results were compared with published segmentation methods from the literature, both those based on convolutional neural networks and classical segmentation methods. The created network achieves a high success rate in retinal blood vessel segmentation, comparable to state-of-the-art methods.
4	Odhad kanálu v OFDM systémech pomocí deep learning metod / Utilization of deep learning for channel estimation in OFDM systems Hubík, Daniel January 2019 This paper describes a wireless communication model based on IEEE 802.11n. Typical methods for channel estimation and equalization are described, such as the least squares method and the minimum mean square error method. Equalization based on deep learning was used as well. Coded and uncoded bit error rates were used as performance indicators. Experiments with the topology of the neural network were performed. The programming languages MATLAB and Python were used in this work.
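For readers unfamiliar with the least squares method named in this abstract, the following is a minimal sketch and not the thesis's implementation. It assumes one known pilot symbol per subcarrier and a noiseless toy channel, and it pairs the estimate with a simple zero-forcing equalizer chosen only for illustration; all names and values are hypothetical.

```python
def ls_channel_estimate(rx_pilots, tx_pilots):
    """Per-subcarrier least-squares estimate: H_hat[k] = Y[k] / X[k]."""
    return [y / x for y, x in zip(rx_pilots, tx_pilots)]


def zero_forcing_equalize(rx_symbols, h_hat):
    """Undo the estimated channel by dividing each subcarrier by H_hat[k]."""
    return [y / h for y, h in zip(rx_symbols, h_hat)]


# Hypothetical toy setup: 4 subcarriers, known BPSK pilots, no noise.
tx_pilots = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]
channel = [0.8 + 0.2j, 0.5 - 0.4j, 1.1 + 0.0j, 0.3 + 0.9j]
rx_pilots = [h * x for h, x in zip(channel, tx_pilots)]

# Estimate the channel from the pilots.
h_hat = ls_channel_estimate(rx_pilots, tx_pilots)

# Send data through the same channel and equalize with the estimate.
tx_data = [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j]
rx_data = [h * x for h, x in zip(channel, tx_data)]
equalized = zero_forcing_equalize(rx_data, h_hat)
```

Without noise the estimate matches the true channel up to floating-point error. With noise, the least squares estimate inherits the noise on each pilot, which is the usual motivation for the minimum mean square error method.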
5	Research and Design of Neural Processing Architectures Optimized for Embedded Applications Wu, Binyi 28 May 2024 Deploying neural networks on edge devices and bringing them into our daily lives is attracting more and more attention. However, their high computational cost makes many embedded applications challenging. The primary objective of my doctoral studies is to contribute to resolving this predicament: optimizing neural networks and designing corresponding efficient neural processing units for edge devices. This work took algorithmic research, specifically the optimization of deep neural networks, as a starting point and then applied its findings to steer the architecture design of Neural Processing Units (NPUs). The optimization of neural network models started with single-precision neural network quantization and progressed to mixed precision. The NPU architecture development followed the algorithmic research findings to achieve hardware/software co-design. Furthermore, a new approach to hardware and software co-development was introduced, aimed at expediting the prototyping and performance assessment of NPUs. This approach targets early-stage development. It helps developers focus on the design and optimization of NPUs and significantly shortens the development cycle. In the final project, a machine-learning-based approach was applied to explore and optimize the computational and memory resources of the NPU. The entire work covers several different areas, from algorithmic research to hardware design, but all of them aim to improve the inference efficiency of neural networks. Specifically, algorithm optimization aims to reduce the memory footprint and computational cost of neural networks. The NPU design, on the other hand, focuses on improving the utilization of hardware resources. The proposed software and hardware co-development approach shortens the design cycle and speeds up design iteration. The order presented above corresponds to the structure of this dissertation. Each chapter corresponds to a topic and covers relevant research, methodology, and experimental results.
1 Introduction
2 Convolutional Neural Networks 2.1 Convolutional layer 2.1.1 Padding 2.1.2 Convolution 2.1.3 Batch Normalization 2.1.4 Nonlinearity 2.2 Pooling Layer 2.3 Fully Connected Layer 2.4 Characterization 2.4.1 Composition of Operations and Parameters 2.4.2 Arithmetic Intensity 2.5 Optimization
3 Quantization with Double-Stage Squeeze-and-Threshold 3.1 Overview 3.1.1 Binarization 3.1.2 Multi-bit Quantization 3.2 Quantization of Convolutional Neural Networks 3.2.1 Quantization Scheme 3.2.2 Operator fusion of Conv2D 3.3 Activation Quantization with Squeeze-and-Threshold 3.3.1 Double-Stage Squeeze-and-Threshold 3.3.2 Inference Optimization 3.4 Experiment 3.4.1 Ablation Study of Squeeze-and-Threshold 3.4.2 Comparison with State-of-the-art Methods 3.5 Summary
4 Low-Precision Neural Architecture Search 4.1 Overview 4.2 Differentiable Architecture Search 4.2.1 Gumbel Softmax 4.2.2 Disadvantage and Solution 4.3 Low-Precision Differentiable Architecture Search 4.3.1 Convolution Sharing 4.3.2 Forward-and-Backward Scaling 4.3.3 Power Estimation 4.3.4 Architecture of Supernet 4.4 Experiment 4.4.1 Effectiveness of solutions to the dominance problem 4.4.2 Softmax and Gumbel Softmax 4.4.3 Optimizer and Inverted Learning Rate Scheduler 4.4.4 NAS Method Evaluation 4.4.5 Searched Model Analysis 4.4.6 NAS Cost Analysis 4.4.7 NAS Training Analysis 4.5 Summary
5 Configurable Sparse Neural Processing Unit 5.1 Overview 5.2 NPU Architecture 5.2.1 Buffer 5.2.2 Reshapeable Mixed-Precision MAC Array 5.2.3 Sparsity 5.2.4 Post Process Unit 5.3 Mapping 5.3.1 Mixed-Precision MAC 5.3.2 MAC Array 5.3.3 Support of Other Operation 5.3.4 Configurability 5.4 Experiment 5.4.1 Performance Analysis of Runtime Configuration 5.4.2 Roofline Performance Analysis 5.4.3 Mixed-Precision 5.4.4 Comparison with Cortex-M7 5.5 Summary
6 Agile Development and Rapid Design Space Exploration 6.1 Overview 6.1.1 Agile Development 6.1.2 Design Space Exploration 6.2 Agile Development Infrastructure 6.2.1 Chisel Backend 6.2.2 NPU Software Stack 6.3 Modeling and Exploration 6.3.1 Area Modeling 6.3.2 Performance Modeling 6.3.3 Layered Exploration Framework 6.4 Experiment 6.4.1 Efficiency of Agile Development Infrastructure 6.4.2 Effectiveness of Agile Development Infrastructure 6.4.3 Area Modeling 6.4.4 Performance Modeling 6.4.5 Rapid Exploration and Pareto Front 6.5 Summary
7 Summary and Outlook 7.1 Summary 7.2 Outlook
A Appendix of Double-Stage ST Quantization A.1 Training setting of ResNet-18 in Table 3.3 A.2 Training setting of ReActNet in Table 3.4 A.3 Training setting of ResNet-18 in Table 3.4 A.4 Pseudocode Implementation of Double-Stage ST
B Appendix of Low-Precision Neural Architecture Search B.1 Low-Precision NAS on CIFAR-10 B.2 Low-Precision NAS on Tiny-ImageNet B.3 Low-Precision NAS on ImageNet
Bibliography
info:eu-repo/classification/ddc/621 ddc:621
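As background for the quantization theme in this record, the following is a minimal sketch of plain symmetric uniform quantization. It is a generic textbook scheme and not the dissertation's Double-Stage Squeeze-and-Threshold method; the function names and the example weights are hypothetical.

```python
def quantize_symmetric(values, num_bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, quantized at two bit widths.
weights = [0.5, -1.0, 0.25, 0.0]
q8, scale8 = quantize_symmetric(weights, 8)
q4, scale4 = quantize_symmetric(weights, 4)
restored8 = dequantize(q8, scale8)
restored4 = dequantize(q4, scale4)
```

Lower bit widths shrink the memory footprint but enlarge the step size, so the rounding error per value grows. A mixed-precision scheme picks the bit width per layer or per tensor instead of using one width everywhere.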
