Global ETD Search

1	Trénovatelná segmentace obrazu s použitím hlubokého učení / Trainable image segmentation using deep learning Dolníček, Pavel January 2017 This work focuses on machine learning, specifically on the implementation of a program for automated classification using deep learning. It compares different trainable neural network models and describes practical solutions encountered during their implementation.
2	TOWARDS AN UNDERSTANDING OF RESIDUAL NETWORKS USING NEURAL TANGENT HIERARCHY Yuqing Li (10223885) 06 May 2021 Deep learning has become an important toolkit for data science and artificial intelligence. In contrast to its practical success across a wide range of fields, theoretical understanding of the principles behind that success remains a matter of controversy. Optimization, as an important component of theoretical machine learning, has attracted much attention. The optimization problems induced by deep learning are often non-convex and non-smooth, which makes locating the global optima challenging. However, in practice, global convergence of first-order methods like gradient descent can be guaranteed for deep neural networks. In particular, gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the problem. Another mysterious phenomenon is the compelling performance of the Deep Residual Network (ResNet). Not only does training ResNet require weaker conditions, but the residual connections employed by ResNet also enable first-order methods to train neural networks with an order of magnitude more layers. The advantages arising from the use of residual connections remain to be discovered. In this thesis, we demystify these two phenomena in turn. First, we contribute to a further understanding of gradient descent. The core of our analysis is the neural tangent hierarchy (NTH), which captures the gradient descent dynamics of deep neural networks. A recent work introduced the Neural Tangent Kernel (NTK) and proved that the limiting NTK describes the asymptotic behavior of neural networks trained by gradient descent in the infinite width limit. The NTH outperforms the NTK in two ways: (i) it can directly study the time variation of the NTK for neural networks; (ii) it improves the result to non-asymptotic settings.
Moreover, by applying the NTH to ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width m with respect to the number of training samples n from quartic to cubic, obtaining a state-of-the-art result. Second, we extend our scope of analysis to structural properties of deep neural networks. By making fair and consistent comparisons between fully-connected networks and ResNet, we suggest strongly that the particular skip-connection architecture possessed by ResNet is the main reason for its triumph over fully-connected networks. Residual Network Deep Learning Neural Networks optimization method Gradient Descent
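As a rough illustration of the kernel this abstract refers to, the sketch below computes the empirical neural tangent kernel K(x, x') = <grad f(x), grad f(x')> for a toy two-layer network in plain Python. The network form, the width `m`, the tanh activation, and all function names are assumptions made for illustration; they are not taken from the thesis.

```python
import math
import random

def init_params(m, seed=0):
    """Draw hidden weights w_r and output signs a_r for a width-m toy network."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return w, a

def grad_f(x, w, a):
    """Gradient of f(x) = (1/sqrt(m)) * sum_r a_r * tanh(w_r * x) w.r.t. (w, a)."""
    m = len(w)
    scale = 1.0 / math.sqrt(m)
    # d f / d w_r = scale * a_r * (1 - tanh(w_r x)^2) * x
    dw = [scale * a[r] * (1.0 - math.tanh(w[r] * x) ** 2) * x for r in range(m)]
    # d f / d a_r = scale * tanh(w_r x)
    da = [scale * math.tanh(w[r] * x) for r in range(m)]
    return dw + da

def empirical_ntk(xs, w, a):
    """Gram matrix K[i][j] = <grad f(x_i), grad f(x_j)> at the current parameters."""
    grads = [grad_f(x, w, a) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in grads] for gi in grads]

w, a = init_params(m=256)
K = empirical_ntk([-1.0, 0.5, 2.0], w, a)
```

Under gradient flow on the squared loss (up to the normalization of the loss), the training outputs u evolve as du/dt = -K(t)(u - y). The NTK analysis works with the limiting kernel, whereas the abstract describes the NTH as tracking how the kernel varies over time.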
3	Post-Training Optimization of Cross-layer Approximate Computing for Edge Inference of Deep Learning Applications De la Parra Aparicio, Cecilia Eugenia 07 February 2024 Over the past decade, the rapid development of deep learning (DL) algorithms has enabled extraordinary advances in perception tasks throughout different fields, from computer vision to audio signal processing. Additionally, increasing computational resources available in supercomputers and graphics processor clusters have provided a suitable environment to train larger and deeper deep neural network (DNN) models for improved performance. However, the resulting memory bandwidth and computational requirements of such DNN models restrict their deployment in embedded systems with constrained hardware resources. To overcome this challenge, it is important to establish new paradigms to reduce the computational workload of such DL algorithms while maintaining their original accuracy. A key observation of previous research is that DL models are resilient to input noise and computational errors; therefore, a reasonable approach to decreasing such hardware requirements is to embrace DNN resiliency and utilize approximate computing techniques at different system design layers. This approach requires, however, constant monitoring as well as a careful combination of approximation techniques to avoid performance degradation while maximizing computational savings. Within this context, the focus of this thesis is the simulation of cross-layer approximate computing (AC) methods for DNN computation and the development of optimization methods to compensate for AC errors in approximated DNNs. The first part of this thesis proposes the simulation framework ProxSim. This framework enables accelerated approximate computational unit (ACU) simulation for evaluation and training of approximated DNNs. 
ProxSim supports quantization and approximation of common neural layers such as fully connected (FC), convolutional, and recurrent layers. A performance evaluation using a variety of DNN architectures, as well as a comparison with the state of the art, is also presented. The author used ProxSim to implement and evaluate the following methods presented in this work. The second part of this thesis introduces an approach to model the approximation error in DNN computation. First, the author thoroughly analyzes the error caused by using approximate multipliers to compute the multiply and accumulate (MAC) operations in DNN models. From this analysis, a statistical model of the approximation error is obtained. Through various experiments with DNNs for image classification, the proposed model is verified and compared with other methods from the literature. The results demonstrate the validity of the approximation error model and reinforce a general understanding of approximate computing in DNNs. In the third part of this thesis, the author presents a methodology for uniform systematic approximation of DNNs. This methodology focuses on the optimization of full DNN approximation with a single type of ACU to minimize power consumption without accuracy loss. The backbone of this methodology is the set of custom fine-tuning methods the author proposes to compensate for the approximation error. These methods enable the use of ACUs with large approximation errors, which results in significant power savings and negligible accuracy losses. This process is corroborated by extensive experiments, where the estimated savings and the accuracy achieved after approximation are thoroughly examined using ProxSim. In the last part of this thesis, the author proposes two different methodologies to further boost energy savings after applying uniform approximation. This increase in energy savings is achieved by computing more resilient DNN elements (neurons or layers) with increased approximation levels. 
The first methodology focuses on iterative kernel-wise approximation and quantization enabled by a custom approximate MAC unit. The second methodology is based on flexible layer-wise approximation and is applied to bit-decomposed in-memory computing (IMC) architectures as a case study to demonstrate the effectiveness of the proposed approach. info:eu-repo/classification/ddc/006 ddc:006
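The error analysis described in this abstract can be illustrated with a toy approximate multiplier. The sketch below is an example under stated assumptions, not ProxSim and not any ACU from the thesis: `approx_mul` is a hypothetical unit that drops the `t` least significant bits of each unsigned operand before multiplying, and `error_stats` enumerates all 8-bit operand pairs to obtain the mean and standard deviation of the resulting error.

```python
import math

def approx_mul(a, b, t=2):
    """Hypothetical approximate multiplier: truncate t low bits of each operand."""
    return ((a >> t) * (b >> t)) << (2 * t)

def error_stats(bits=8, t=2):
    """Mean and standard deviation of (exact - approximate) over all operand pairs."""
    n = 1 << bits
    errors = [a * b - approx_mul(a, b, t) for a in range(n) for b in range(n)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, math.sqrt(var)

def approx_mac(xs, ws, t=2):
    """Multiply-accumulate built on the approximate multiplier."""
    return sum(approx_mul(x, w, t) for x, w in zip(xs, ws))

mean_err, std_err = error_stats()
```

Because truncation always rounds toward zero, this particular unit never overestimates a product, so its error has a positive mean. Whether the units studied in the thesis share that property is not stated in the abstract.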
4	Deep learning for text spotting Jaderberg, Maxwell January 2015 This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. 
A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power. 004
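The abstract does not spell out the low-rank approximation schemes. One common instance of the general idea, shown here as an assumption rather than as the author's method, is a rank-1 (separable) filter: a d x d kernel that factors as the outer product of a column vector and a row vector can be applied as two 1-D passes, cutting the multiplications per output pixel from d^2 to 2d. All names below are illustrative.

```python
def conv2d_valid(img, ker):
    """Plain 2-D cross-correlation with 'valid' borders on lists of lists."""
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * ker[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]

def separable_conv(img, col, row):
    """Apply the rank-1 kernel col * row^T as a vertical pass, then a horizontal pass."""
    vertical = conv2d_valid(img, [[c] for c in col])  # kernel of shape (d, 1)
    return conv2d_valid(vertical, [row])              # kernel of shape (1, d)

col, row = [1, 2, 1], [-1, 0, 1]
full_kernel = [[c * r for r in row] for c in col]     # rank-1 3x3 kernel
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

direct = conv2d_valid(image, full_kernel)
factored = separable_conv(image, col, row)
```

Both paths produce the same output here because the kernel is exactly rank 1. For learned filters that are only approximately low-rank, the factored form would be an approximation, which is consistent with the "negligible loss in predictive power" the abstract reports.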
5	HBONEXT: AN EFFICIENT DNN FOR LIGHT EDGE EMBEDDED DEVICES Sanket Ramesh Joshi (10716561) 10 May 2021 Every year, the most effective deep learning models and CNN architectures are showcased based on their compatibility and performance on embedded edge hardware, especially for applications like image classification. These deep learning models require a significant amount of computation and memory, so they can only be used on high-performance computing systems like CPUs or GPUs. However, they often struggle to meet the specifications of portable devices due to resource, energy, and real-time constraints. Hardware accelerators have recently been designed to provide the computational resources that AI and machine learning tools need. These edge accelerators have high-performance hardware which helps maintain the precision needed for this task. Furthermore, the classification problem, which investigates channel interdependencies using either depth-wise or group-wise convolutional features, has benefited from the inclusion of Bottleneck modules. Because of its increasing use in portable applications, the classic inverted residual block, a well-known architecture technique, has gained more recognition. This work takes it a step further by introducing a design method for porting CNNs to low-resource embedded systems, essentially bridging the gap between deep learning models and embedded edge systems. To achieve these goals, we use closer computing strategies to reduce computational load and memory usage while retaining excellent deployment efficiency. This thesis introduces HBONext, a mutated version of Harmonious Bottlenecks (DHbneck) combined with a Flipped version of Inverted Residual (FIR), which outperforms the current HBONet architecture in terms of accuracy and model size miniaturization.
Unlike the current definition of inverted residual, this FIR block performs identity mapping and spatial transformation at its higher dimensions. The HBO solution, on the other hand, focuses on two orthogonal dimensions: spatial (H/W) contraction-expansion and later channel (C) expansion-contraction, which are both organized in a bilaterally symmetric manner. HBONext is one such variant designed specifically for embedded and mobile applications. In this research work, we also show how to use NXP Bluebox 2.0 to build a real-time HBONext image classifier. Integrating the model into this hardware was successful owing to its small model size of 3 MB. The model was trained and validated on the CIFAR10 dataset and performed exceptionally well, combining a smaller size with higher accuracy. The validation accuracy of the baseline HBONet architecture is 80.97%, and the model is 22 MB in size. The proposed HBONext variants, on the other hand, achieve a higher validation accuracy of 89.70% and a model size of 3.00 MB, measured using the number of parameters. The performance metrics of the HBONext architecture and its variants are compared in the following chapters. Computer Engineering Convolution Neural Networks Artificial Intelligence CIFAR10 Embedded Systems Deep Learning Neural Networks image classification CNNs
6	Structural priors in deep neural networks Ioannou, Yani Andrew January 2018 Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity and strong regularization, an approach that leaves much to be desired in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation can improve the memory and compute efficiency of state-of-the-art networks, and even improve generalization. We propose to denote such design choices as structural priors. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks. 
We propose a new discriminative learning model, conditional networks, that jointly exploits the accurate representation learning capabilities of deep neural networks and the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.
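Filters with limited channel extent can be illustrated by a simple parameter count. The sketch below assumes the restriction is realized as a grouped convolution, where each filter sees only `c_in / groups` input channels; that reading, the channel counts, and the function name are assumptions for illustration and are not details given in the abstract.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution whose filters each span c_in / groups channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    return (c_in // groups) * c_out * k * k

dense = conv_params(256, 256, 3)              # every filter spans all 256 channels
grouped = conv_params(256, 256, 3, groups=8)  # every filter spans 32 channels
```

With these illustrative numbers the grouped layer holds one eighth of the weights of the dense layer, which is the kind of size reduction the abstract attributes to sparser connection structures.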
7	U-net based deep learning architectures for object segmentation in biomedical images Nahian Siddique (11219427) 04 August 2021 U-net is an image segmentation technique developed primarily for medical image analysis that can precisely segment images using a scarce amount of training data. These traits provide U-net with a high utility within the medical imaging community and have resulted in extensive adoption of U-net as the primary tool for segmentation tasks in medical imaging. The success of U-net is evident in its widespread use in nearly all major image modalities from CT scans and MRI to X-rays and microscopy. Furthermore, while U-net is largely a segmentation tool, there have been instances of the use of U-net in other applications. Given that U-net's potential is still increasing, this review examines the numerous developments and breakthroughs in the U-net architecture and provides observations on recent trends. We also discuss the many innovations that have advanced in deep learning and discuss how these tools facilitate U-net. In addition, we review the different image modalities and application areas that have been enhanced by U-net. In recent years, deep learning for health care has been rapidly infiltrating and transforming medical fields thanks to advances in computing power, data availability, and algorithm development. In particular, U-Net, a deep learning technique, has achieved remarkable success in medical image segmentation and has become one of the premier tools in this area. While the accomplishments of U-Net and other deep learning algorithms are evident, there still exist many challenges in medical image processing to achieve human-like performance. In this thesis, we propose a U-net architecture that integrates residual skip connections and recurrent feedback with EfficientNet as a pretrained encoder.
Residual connections help feature propagation in deep neural networks and significantly improve performance against networks with a similar number of parameters, while recurrent connections ameliorate gradient learning. We also propose a second model that utilizes densely connected layers, aiding deeper neural networks. The proposed third model incorporates fractal expansions to bypass diminishing gradients. EfficientNet is a family of powerful pretrained encoders that streamline neural network design. The use of EfficientNet as an encoder provides the network with robust feature extraction that can be used by the U-Net decoder to create highly accurate segmentation maps. The proposed networks are evaluated against state-of-the-art deep learning based segmentation techniques to demonstrate their superior performance. Computer Vision U-net Image Segmentation Semantic Segmentation Medical imaging Artificial neural network Deep Learning Neural Networks
8	Tiefes Reinforcement Lernen auf Basis visueller Wahrnehmungen / Deep reinforcement learning based on visual perception Lange, Sascha 19 May 2010 This thesis investigates and further develops autonomously learning machine learning methods (reinforcement learning) applied to visual perception. The recent introduction of memory-based methods into reinforcement learning has brought great progress in learning on real systems, but handling highly complex visual input data, such as that recorded by a digital camera, remains an unsolved problem. Existing methods are limited to low-dimensional state descriptions, which has so far ruled out applying them directly to the stream of image data and requires classical image-understanding methods to be applied beforehand to extract and suitably encode the relevant information. The use of so-called "deep autoencoders" offers a way out. These multi-layer neural networks can learn, in a self-organized manner, low-dimensional feature spaces for representing high-dimensional input data and can thus replace a classical, task-specific image analysis. In typical object recognition tasks, impressive results have already been achieved on the basis of these learned representations. This thesis examines the fundamental suitability of deep autoencoder networks for use in reinforcement learning. With the "Deep Fitted Q" algorithm, a new algorithm is developed that efficiently integrates the training of deep autoencoder networks into the reinforcement learning process and thereby makes it possible to handle visual perception when learning policies. Besides data efficiency, particular attention is paid to the stability of the method.
Following a discussion of the theoretical aspects of the method, an extensive empirical evaluation of the generated feature spaces and the learned policies is carried out on simulated and real systems. Using the methods developed, this thesis succeeds for the first time in learning policies for controlling real systems directly from unpreprocessed image information, with only the goal to be reached having to be specified externally. 54.72 - Artificial Intelligence 54.74 - Computer Vision ddc:500
