Global ETD Search

391	Decomposition of Morphological Structuring Elements and Segmentation of Human Objects in Video Sequences Yang, Hsin-Tai 11 September 2007 (has links) With the rapid development of image processing techniques, many unprecedented applications are emerging from all kinds of science branches, such as medicine, meteorology, astronomy, industrial control, etc. This dissertation presents an achievement of our research work related to the image processing field. Technically, the work consists of two parts. The first part concerns decomposition of morphological structuring elements while the second part explores the problems of human object segmentation in video/image data. In the first part, an integrated method, aiming to decompose a morphological structuring element into dilations of smaller ones, is proposed. By first formulating the decomposition problem into a set of linear constraints, the integer linear programming echnique is then applied to obtain an optimal decomposition. Compared to other existing approaches, the proposed method is more general and has several advantages. Firstly, it provides a systematic way of decomposing arbitrarily shaped structuring elements. Secondly, for convex images, factors can be of any size, not restricted to 3x3. Thirdly, the candidate set can be freely assigned by the user and finally the criteria of optimality can be flexible. In the second part, we present a three-stage system for segmentation of multiple human objects in a video stream. In the first stage, for a base frame to be segmented, we propose a hybrid self-clustering technique that incorporates the spatial concept as well as color attributes to reduce the number of small segments. In the second stage, the face shape modeled by the eight-directional convex polygons and the face features including two eyes and a mouth are extracted, parameterized, and fed to a trained neural network for detection of a human face. In the last stage, the size and orientation of the detected face region as well as the motion information among frames are used to roughly detect the corresponding body. To locate human objects more accurately, another neural network is constructed for recognizing the ambiguous regions. image processing morphology clustering human object segmentation neural network neural network human object segmentation clustering morphology image processing
392	Optical character recognition using deep learning / Reconhecimento óptico de caracteres usando aprendizado profundo Santos, Claudio Filipi Gonçalves dos 26 April 2018 (has links) Submitted by Claudio Filipi Gonçalves dos Santos (cfsantos85@gmail.com) on 2018-05-24T11:51:59Z No. of bitstreams: 1 optical-character-recognition-16052018.pdf: 8334356 bytes, checksum: 8dd05363a96c946ae1f6d665edc80d09 (MD5) / Rejected by Elza Mitiko Sato null (elzasato@ibilce.unesp.br), reason: Solicitamos que realize correções na submissão seguindo as orientações abaixo: Problema 01) Falta a FOLHA DE APROVAÇÃO (Obrigatório pela ABNT NBR14724) Problema 02) Corrigir a ordem das páginas pré-textuais; a ordem correta (capa, folha de rosto, dedicatória, agradecimentos, epígrafe, resumo na língua vernácula, resumo em língua estrangeira, listas de ilustrações, de tabelas, de abreviaturas, de siglas e de símbolos e sumário). Problema 03) Faltam as palavras-chave no resumo e no abstracts. Na página da Seção de pós-graduação, em Instruções para Qualificação e Defesas de Dissertação e Tese, você pode acessar o modelo das páginas pré-textuais. Lembramos que o arquivo depositado no repositório deve ser igual ao impresso, o rigor com o padrão da Universidade se deve ao fato de que o seu trabalho passará a ser visível mundialmente. Agradecemos a compreensão. on 2018-05-24T20:59:53Z (GMT) / Submitted by Claudio Filipi Gonçalves dos Santos (cfsantos85@gmail.com) on 2018-05-25T00:43:19Z No. of bitstreams: 1 optical-character-recognition-16052018.pdf: 11084990 bytes, checksum: 6f8d7431cd17efd931a31c0eade10c65 (MD5) / Rejected by Elza Mitiko Sato null (elzasato@ibilce.unesp.br), reason: Solicitamos que realize correções na submissão seguindo as orientações abaixo: Problema 01) Falta a FOLHA DE APROVAÇÃO (Obrigatório pela ABNT NBR14724) Problema 02) A paginação deve ser sequencial, iniciando a contagem na folha de rosto e mostrando o número a partir da introdução, a ficha catalográfica ficará após a folha de rosto e não deverá ser contada. Problema 03) Na descrição do item: Título em outro idioma – Se você colocou no título em inglês deve por neste campo o título em outro idioma (ex: português, espanhol, francês...) Estamos encaminhando via e-mail o template/modelo para que você possa fazer as correções. Lembramos que o arquivo depositado no repositório deve ser igual ao impresso, o rigor com o padrão da Universidade se deve ao fato de que o seu trabalho passará a ser visível mundialmente. Agradecemos a compreensão. on 2018-05-25T15:22:45Z (GMT) / Submitted by Claudio Filipi Gonçalves dos Santos (cfsantos85@gmail.com) on 2018-05-25T15:52:53Z No. of bitstreams: 1 optical-character-recognition-16052018.pdf: 11089966 bytes, checksum: d6c863077a995bd2519035b8a3e97c80 (MD5) / Rejected by Elza Mitiko Sato null (elzasato@ibilce.unesp.br), reason: Solicitamos que realize correções na submissão seguindo as orientações abaixo: Problema 01) Falta a FOLHA DE APROVAÇÃO (Obrigatório pela ABNT NBR14724) Agradecemos a compreensão. on 2018-05-25T18:03:19Z (GMT) / Submitted by Claudio Filipi Gonçalves dos Santos (cfsantos85@gmail.com) on 2018-05-25T18:08:09Z No. of bitstreams: 1 Claudio Filipi Gonçalves dos Santos Corrigido Biblioteca.pdf: 8257484 bytes, checksum: 3a61ebfa8e1d16c9d0c694f46b979c1f (MD5) / Approved for entry into archive by Elza Mitiko Sato null (elzasato@ibilce.unesp.br) on 2018-05-25T18:51:24Z (GMT) No. of bitstreams: 1 santos_cfg_me_sjrp.pdf: 8257484 bytes, checksum: 3a61ebfa8e1d16c9d0c694f46b979c1f (MD5) / Made available in DSpace on 2018-05-25T18:51:24Z (GMT). No. of bitstreams: 1 santos_cfg_me_sjrp.pdf: 8257484 bytes, checksum: 3a61ebfa8e1d16c9d0c694f46b979c1f (MD5) Previous issue date: 2018-04-26 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Detectores óticos de caracteres, ou Optical Character Recognition (OCR) é o nome dado à técnologia de traduzir dados de imagens em arquivo de texto. O objetivo desse projeto é usar aprendizagem profunda, também conhecido por aprendizado hierárquico ou Deep Learning para o desenvolvimento de uma aplicação com a habilidade de detectar áreas candidatas, segmentar esses espaços dan imagem e gerar o texto contido na figura. Desde 2006, Deep Learning emergiu como uma nova área em aprendizagem de máquina. Em tempos recentes, as técnicas desenvolvidas em pesquisas com Deep Learning têm influenciado e expandido escopo, incluindo aspectos chaves nas área de inteligência artificial e aprendizagem de máquina. Um profundo estudo foi conduzido com a intenção de desenvolver um sistema OCR usando apenas arquiteturas de Deep Learning.A evolução dessas técnicas, alguns trabalhos passados e como esses trabalhos influenciaram o desenvolvimento dessa estrutura são explicados nesse texto. Essa tese demonstra com resultados como um classificador de caracteres foi desenvolvido. Em seguida é explicado como uma rede neural pode ser desenvolvida para ser usada como um detector de objetos e como ele pode ser transformado em um detector de texto. Logo após é demonstrado como duas técnicas diferentes de Deep Learning podem ser combinadas e usadas na tarefa de transformar segmentos de imagens em uma sequência de caracteres. Finalmente é demonstrado como o detector de texto e o sistema transformador de imagem em texto podem ser combinados para se desenvolver um sistema OCR completo que detecta regiões de texto nas imagens e o que está escrito nessa região. Esse estudo demonstra que a idéia de usar apenas estruturas de Deep Learning podem ter performance melhores do técnicas baseadas em outras áreas da computação como por exemplo o processamento de imagens. Para detecção de texto foi alcançado mais de 70% de precisão quando uma arquitetura mais complexa foi usada, por volta de 69% de traduções de imagens para texto corretas e por volta de 50% na tarefa ponta-à-ponta de detectar as áreas de texto e traduzi-las em sequência de caracteres. / Optical Character Recognition (OCR) is the name given to the technology used to translate image data into a text file. The objective of this project is to use Deep Learning techniques to develop a software with the ability to segment images, detecting candidate characters and generating textthatisinthepicture. Since2006,DeepLearningorhierarchicallearning, emerged as a new machine learning area. Over recent years, the techniques developed from deep learning research have influenced and expanded scope, including key aspects of artificial intelligence and machine learning. A thorough study was carried out in order to develop an OCR system using only Deep Learning architectures. It is explained the evolution of these techniques, some past works and how they influenced thisframework’sdevelopment. Inthisthesisitisdemonstratedwithresults how a single character classifier was developed. Then it is explained how a neural network can be developed to be an object detector and how to transform this object detector into a text detector. After that it shows how a set of two Deep Learning techniques can be combined and used in the taskoftransformingacroppedregionofanimageinastringofcharacters. Finally, it demonstrates how the text detector and the Image-to-Text systemswerecombinedinordertodevelopafullend-to-endOCRsystemthat detects the regions of a given image containing text and what is written in this region. It shows the idea of using only Deep Learning structures can outperform other techniques based on other areas like image processing. In text detection it reached over 70% of precision when a more complex architecture was used, around 69% of correct translation of image-to-text areasandaround50%onend-to-endtaskofdetectingareasandtranslating them into text. / 1623685 Aprendizado profundo Redes neurais convolucionais Redes neurais recorrentes OCR Deep learning Convolutional neural network Recurrent neural network
393	State Estimation for Truck and Trailer Systems using Deep Learning / Tillståndsskattning med hjälp av djupinlärning för lastbilar med dolly och semitrailer Arnström, Daniel January 2018 (has links) High precision control of a truck and trailer system requires accurate and robust state estimation of the system. This thesis work explores the possibility of estimating the states with high accuracy from sensors solely mounted on the truck. The sensors used are a LIDAR sensor, a rear-view camera and a RTK-GNSS receiver. Information about the angles between the truck and the trailer are extracted from LIDAR scans and camera images through deep learning and through model-based approaches. The estimates are fused together with a model of the dynamics of the system in an Extended Kalman Filter to obtain high precision state estimates. Training data for the deep learning approaches and data to evaluate and compare these methods with the model-based approaches are collected in a simulation environment established in Gazebo. The deep learning approaches are shown to give decent angle estimations but the model-based approaches are shown to result in more robust and accurate estimates. The flexibility of the deep learning approach to learn any model given sufficient training data has been highlighted and it is shown that a deep learning approach can be viable if the trailer has an irregular shape and a large amount of data is available. It is also shown that biases in measured lengths of the system can be remedied by estimating the biases online in the filter and this improves the state estimates. state estimation observer deep learning neural network convolutional neural network EKF truck trailer semi-trailer Control Engineering Reglerteknik
394	Multistage neural networks for pattern recognition Zieba, Maciej January 2009 (has links) In this work the concept of multistage neural networks is going to be presented. The possibility of using this type of structure for pattern recognition would be discussed and examined with chosen problem from eld area. The results of experiment would be confront with other possible methods used for the problem. two-stage neural network multi-stage neural network multistage pattern recognition writer identification iconic gesture recognition Computer Sciences Datavetenskap (datalogi)
395	EVALUATING THE IMPACT OF UNCERTAINTY ON THE INTEGRITY OF DEEP NEURAL NETWORKS Harborn, Jakob January 2021 (has links) Deep Neural Networks (DNNs) have proven excellent performance and are very successful in image classification and object detection. Safety critical industries such as the automotive and aerospace industry aim to develop autonomous vehicles with the help of DNNs. In order to certify the usage of DNNs in safety critical systems, it is essential to prove the correctness of data within the system. In this thesis, the research is focused on investigating the sources of uncertainty, what effects various sources of uncertainty has on NNs, and how it is possible to reduce uncertainty within an NN. Probabilistic methods are used to implement an NN with uncertainty estimation to analyze and evaluate how the integrity of the NN is affected. By analyzing and discussing the effects of uncertainty in an NN it is possible to understand the importance of including a method of estimating uncertainty. Preventing, reducing, or removing the presence of uncertainty in such a network improves the correctness of data within the system. With the implementation of the NN, results show that estimating uncertainty makes it possible to identify and classify the presence of uncertainty in the system and reduce the uncertainty to achieve an increased level of integrity, which improves the correctness of the predictions. Uncertainty Deep Neural Network Bayesian Neural Network Dependability Integrity Probability
396	Comparing Encoder-Decoder Architectures for Neural Machine Translation: A Challenge Set Approach Doan, Coraline 19 November 2021 (has links) Machine translation (MT) as a field of research has known significant advances in recent years, with the increased interest for neural machine translation (NMT). By combining deep learning with translation, researchers have been able to deliver systems that perform better than most, if not all, of their predecessors. While the general consensus regarding NMT is that it renders higher-quality translations that are overall more idiomatic, researchers recognize that NMT systems still struggle to deal with certain classic difficulties, and that their performance may vary depending on their architecture. In this project, we implement a challenge-set based approach to the evaluation of examples of three main NMT architectures: convolutional neural network-based systems (CNN), recurrent neural network-based (RNN) systems, and attention-based systems, trained on the same data set for English to French translation. The challenge set focuses on a selection of lexical and syntactic difficulties (e.g., ambiguities) drawn from literature on human translation, machine translation, and writing for translation, and also includes variations in sentence lengths and structures that are recognized as sources of difficulties even for NMT systems. This set allows us to evaluate performance in multiple areas of difficulty for the systems overall, as well as to evaluate any differences between architectures’ performance. Through our challenge set, we found that our CNN-based system tends to reword sentences, sometimes shifting their meaning, while our RNN-based system seems to perform better when provided with a larger context, and our attention-based system seems to struggle the longer a sentence becomes. neural machine translation machine translation evaluation convolutional neural network recurrent neural network challenge set
397	RMNv2: Reduced Mobilenet V2 An Efficient Lightweight Model for Hardware Deployment MANEESH AYI (8735112) 22 April 2020 (has links) Humans can visually see things and can differentiate objects easily but for computers, it is not that easy. Computer Vision is an interdisciplinary field that allows computers to comprehend, from digital videos and images, and differentiate objects. With the Introduction to CNNs/DNNs, computer vision is tremendously used in applications like ADAS, robotics and autonomous systems, etc. This thesis aims to propose an architecture, RMNv2, that is well suited for computer vision applications such as ADAS, etc.<br><div>RMNv2 is inspired by its original architecture Mobilenet V2. It is a modified version of Mobilenet V2. It includes changes like disabling downsample layers, Heterogeneous kernel-based convolutions, mish activation, and auto augmentation. The proposed model is trained from scratch in the CIFAR10 dataset and produced an accuracy of 92.4% with a total number of parameters of 1.06M. The results indicate that the proposed model has a model size of 4.3MB which is like a 52.2% decrease from its original implementation. Due to its less size and competitive accuracy the proposed model can be easily deployed in resource-constrained devices like mobile and embedded devices for applications like ADAS etc. Further, the proposed model is also implemented in real-time embedded devices like NXP Bluebox 2.0 and NXP i.MX RT1060 for image classification tasks. <br></div> Computer Engineering convolution neural network Deep Neural Network (DNN) embedded systems Bluebox 2.0
398	Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient January 2019 (has links) abstract: Deep neural networks (DNNs) have had tremendous success in a variety of statistical learning applications due to their vast expressive power. Most applications run DNNs on the cloud on parallelized architectures. There is a need for for efficient DNN inference on edge with low precision hardware and analog accelerators. To make trained models more robust for this setting, quantization and analog compute noise are modeled as weight space perturbations to DNNs and an information theoretic regularization scheme is used to penalize the KL-divergence between perturbed and unperturbed models. This regularizer has similarities to both natural gradient descent and knowledge distillation, but has the advantage of explicitly promoting the network to and a broader minimum that is robust to weight space perturbations. In addition to the proposed regularization, KL-divergence is directly minimized using knowledge distillation. Initial validation on FashionMNIST and CIFAR10 shows that the information theoretic regularizer and knowledge distillation outperform existing quantization schemes based on the straight through estimator or L2 constrained quantization. / Dissertation/Thesis / Masters Thesis Computer Engineering 2019 Artificial intelligence Computer engineering Computer science analog neural network distillation fisher information kl-divergence non-volatile memory accelerator quantized neural network
399	Hybrid Model Approach to Appliance Load Disaggregation : Expressive appliance modelling by combining convolutional neural networks and hidden semi Markov models. / Hybridmodell för disaggregering av hemelektronik : Detaljerad modellering av elapparater genom att kombinera neurala nätverk och Markovmodeller. Huss, Anders January 2015 (has links) The increasing energy consumption is one of the greatest environmental challenges of our time. Residential buildings account for a considerable part of the total electricity consumption and is further a sector that is shown to have large savings potential. Non Intrusive Load Monitoring (NILM), i.e. the deduction of the electricity consumption of individual home appliances from the total electricity consumption of a household, is a compelling approach to deliver appliance specific consumption feedback to consumers. This enables informed choices and can promote sustainable and cost saving actions. To achieve this, accurate and reliable appliance load disaggregation algorithms must be developed. This Master's thesis proposes a novel approach to tackle the disaggregation problem inspired by state of the art algorithms in the field of speech recognition. Previous approaches, for sampling frequencies <img src="http://www.diva-portal.org/cgi-bin/mimetex.cgi?%5Cleq" />1 Hz, have primarily focused on different types of hidden Markov models (HMMs) and occasionally the use of artificial neural networks (ANNs). HMMs are a natural representation of electric appliances, however with a purely generative approach to disaggregation, basically all appliances have to be modelled simultaneously. Due to the large number of possible appliances and variations between households, this is a major challenge. It imposes strong restrictions on the complexity, and thus the expressiveness, of the respective appliance model to make inference algorithms feasible. In this thesis, disaggregation is treated as a factorisation problem where the respective appliance signal has to be extracted from its background. A hybrid model is proposed, where a convolutional neural network (CNN) extracts features that correlate with the state of a single appliance and the features are used as observations for a hidden semi Markov model (HSMM) of the appliance. Since this allows for modelling of a single appliance, it becomes computationally feasible to use a more expressive Markov model. As proof of concept, the hybrid model is evaluated on 238 days of 1 Hz power data, collected from six households, to predict the power usage of the households' washing machine. The hybrid model is shown to perform considerably better than a CNN alone and it is further demonstrated how a significant increase in performance is achieved by including transitional features in the HSMM. / Den ökande energikonsumtionen är en stor utmaning för en hållbar utveckling. Bostäder står för en stor del av vår totala elförbrukning och är en sektor där det påvisats stor potential för besparingar. Non Intrusive Load Monitoring (NILM), dvs. härledning av hushållsapparaters individuella elförbrukning utifrån ett hushålls totala elförbrukning, är en tilltalande metod för att fortlöpande ge detaljerad information om elförbrukningen till hushåll. Detta utgör ett underlag för medvetna beslut och kan bidraga med incitament för hushåll att minska sin miljöpåverakan och sina elkostnader. För att åstadkomma detta måste precisa och tillförlitliga algoritmer för el-disaggregering utvecklas. Denna masteruppsats föreslår ett nytt angreppssätt till el-disaggregeringsproblemet, inspirerat av ledande metoder inom taligenkänning. Tidigare angreppsätt inom NILM (i frekvensområdet <img src="http://www.diva-portal.org/cgi-bin/mimetex.cgi?%5Cleq" />1 Hz) har huvudsakligen fokuserat på olika typer av Markovmodeller (HMM) och enstaka förekomster av artificiella neurala nätverk. En HMM är en naturlig representation av en elapparat, men med uteslutande generativ modellering måste alla apparater modelleras samtidigt. Det stora antalet möjliga apparater och den stora variationen i sammansättningen av dessa mellan olika hushåll utgör en stor utmaning för sådana metoder. Det medför en stark begränsning av komplexiteten och detaljnivån i modellen av respektive apparat, för att de algoritmer som används vid prediktion ska vara beräkningsmässigt möjliga. I denna uppsats behandlas el-disaggregering som ett faktoriseringsproblem, där respektive apparat ska separeras från bakgrunden av andra apparater. För att göra detta föreslås en hybridmodell där ett neuralt nätverk extraherar information som korrelerar med sannolikheten för att den avsedda apparaten är i olika tillstånd. Denna information används som obervationssekvens för en semi-Markovmodell (HSMM). Då detta utförs för en enskild apparat blir det beräkningsmässigt möjligt att använda en mer detaljerad modell av apparaten. Den föreslagna Hybridmodellen utvärderas för uppgiften att avgöra när tvättmaskinen används för totalt 238 dagar av elförbrukningsmätningar från sex olika hushåll. Hybridmodellen presterar betydligt bättre än enbart ett neuralt nätverk, vidare påvisas att prestandan förbättras ytterligare genom att introducera tillstånds-övergång-observationer i HSMM:en. NILM disaggregation neural network convolutional neural network CNN HMM HSMM Other Computer and Information Science Annan data- och informationsvetenskap
400	Using machine learning to help find paths through the map in Slay the Spire Porenius, Oscar, Hansson, Nils January 2021 (has links) Slay the Spire is a complex deck-building and roguelike game with many possibilities of improving players ability to win. An important part of Slay the Spire is choosing a path that makes the players character as successful as possible. In this study we show that machine learning can help players pick better paths by creating an Artificial Neural Network (ANN) that predicts the most successful path of all available paths, we also discuss what makes a path successful. This study performed two experiments, one user study and one simulation experiment, with the intention of evaluating the created ANN and analysing what makes paths successful. Through the user study this paper shows that the ANN was effective at predicting paths, outperforming all other human players who played normally in all three cases. This study concludes that machine learning can be used effectively to help make pathing decisions in Slay the Spire. Furthermore the study proves the importance of the room types ’Elite’ and ’Campfires’ through the simulation experiment, user study and analysis of data from previous playthroughs. Slay the Spire Artificial Intelligence Neural Network TensorFlow AI Artificial Neural Network Pathfinding Machine learning Computer Sciences Datavetenskap (datalogi)

Search results