Global ETD Search

11	Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach Coimbra, Danilo Barbosa 11 April 2011 (has links) Este trabalho tem como objetivo desenvolver um método de segmentação de cenas em vídeos digitais que trate segmentos semânticamente complexos. Como prova de conceito, é apresentada uma abordagem multimodal que utiliza uma definição mais geral para cenas em telejornais, abrangendo tanto cenas onde âncoras aparecem quanto cenas onde nenhum âncora aparece. Desse modo, os resultados obtidos da técnica multimodal foram signifiativamente melhores quando comparados com os resultados obtidos das técnicas monomodais aplicadas em separado. Os testes foram executados em quatro grupos de telejornais brasileiros obtidos de duas emissoras de TV diferentes, cada qual contendo cinco edições, totalizando vinte telejornais / This work aims to develop a method for scene segmentation in digital video which deals with semantically complex segments. As proof of concept, we present a multimodal approach that uses a more general definition for TV news scenes, covering both: scenes where anchors appear on and scenes where no anchor appears. The results of the multimodal technique were significantly better when compared with the results from monomodal techniques applied separately. The tests were performed in four groups of Brazilian news programs obtained from two different television stations, containing five editions each, totaling twenty newscasts Multimodal scene segmentation Multimodal video segmentation Segmentação de cena multimodal Segmentação de vídeo multimodal Segmentaçãop semântica Semantic segmentation
12	Multi-Task Learning using Road Surface Condition Classification and Road Scene Semantic Segmentation Westell, Jesper January 2019 (has links) Understanding road surface conditions is an important component in active vehicle safety. Estimations can be achieved through image classification using increasingly popular convolutional neural networks (CNNs). In this paper, we explore the effects of multi-task learning by creating CNNs capable of simultaneously performing the two tasks road surface condition classification (RSCC) and road scene semantic segmentation (RSSS). A multi-task network, containing a shared feature extractor (VGG16, ResNet-18, ResNet-101) and two taskspecific network branches, is built and trained using the Road-Conditions and Cityscapes datasets. We reveal that utilizing task-dependent homoscedastic uncertainty in the learning process improvesmulti-task model performance on both tasks. When performing task adaptation, using a small set of additional data labeled with semantic information, we gain considerable RSCC improvements on complex models. Furthermore, we demonstrate increased model generalizability in multi-task models, with up to 12% higher F1-score compared to single-task models. Computer Vision Deep Learning Machine Learning Convolutional Neural Networks Classification Semantic Segmentation Signal Processing Signalbehandling
13	Evaluation of Deep Learning-Based Semantic Segmentation Approaches for Autonomous Corrosion Detection on Metallic Surfaces Cheng Qian (7479359) 17 October 2019 (has links) <div> The structural defects can lead to serious safety issues and the corrosponding economic losses. In 2013, it was estimated that 2.5 trillion US dollars were spent on corrosion around the world, which was 3.4\% of the global Gross Domestic Product (GDP) (Koch, 2016). Periodical inspection of corrosion and maintenance of steel structures are essential to minimize these losses. Current corrosion inspection guidelines require inspectors to visually assess every critical member within arm's reach. This process is time-consuming, subjective and labor-intensive, and therefore is done only once every two years. </div><div><br></div><div>A promising solution is to use a robotic system, such as an Unmanned Aerial Vehicle (UAV), with computer vision techniques to assess corrosion on metallic surfaces. Several studies have been conducted in this area, but the shortcoming is that they cannot quantify the corroded region reliably: some studies only classify whether corrosion exists in the image or not; some only draw a box around corroded region; and some need human-engineered features to identify corrosion. This study aims to address this problem by using deep learning-based semantic segmentation to let the computer capture useful features and find the bounding of corroded regions accurately.</div><div><br></div><div>In this study, the performance of four state-of-the-art deep learning techniques for semantic segmentation was investigated for corrosion assessment task，including U-Net, DeepLab, PSPNet, and RefineNet. Six hundred high-resolution images of corroded regions were used to train and test the networks. Ten sets of experiments were performed on each architecture for cross-validation. Since the images were large, two approaches were used to analyze images: 1) subdividing images, 2) down-sampling images. A parametric analysis on these two prepossessing methods was also considered.</div><div><br></div><div>Prediction results were evaluated based on intersection over union (IoU), recall and precision scores. Statistical analysis using box chart and Wilcoxon singled ranked test showed that subdivided image dataset gave a better result, while resized images required less time for prediction. Performance of PSPNet outperformed the other three architectures on the subdivided dataset. DeepLab showed the best performance on the resized dataset. It was found Refinenet was not appropriate for corrosion detection task. U-Net was found to be ideal for real-time processing of image while RefineNet did not perform well for corrosion assessment.</div><div> </div> corrosion assessment semantic segmentation deep learning
14	Improving Photogrammetry using Semantic Segmentation Kernell, Björn January 2018 (has links) 3D reconstruction is the process of constructing a three-dimensional model from images. It contains multiple steps where each step can induce errors. When doing 3D reconstruction of outdoor scenes, there are some types of scene content that regularly cause problems and affect the resulting 3D model. Two of these are water, due to its fluctuating nature, and sky because of it containing no useful (3D) data. These areas cause different problems throughout the process and do generally not benefit it in any way. Therefore, masking them early in the reconstruction chain could be a useful step in an outdoor scene reconstruction pipeline. Manual masking of images is a time-consuming and boring task and it gets very tedious for big data sets which are often used in large scale 3D reconstructions. This master thesis explores if this can be done automatically using Convolutional Neural Networks for semantic segmentation, and to what degree the masking would benefit a 3D reconstruction pipeline. / 3D-rekonstruktion är teknologin bakom att skapa 3D-modeller utifrån bilder. Det är en process med många steg där varje steg kan medföra fel. Vid 3D-rekonstruktion av stora utomhusmiljöer finns det vissa typer av bildinnehåll som ofta ställer till problem. Två av dessa är vatten och himmel. Vatten är problematiskt då det kan fluktuera mycket från bild till bild samt att det kan innehålla reflektioner som ger olika utseenden från olika vinklar. Himmel å andra sidan ska aldrig ge upphov till 3D-information varför den lika gärna kan maskas bort. Manuell maskning av bilder är väldigt tidskrävande och dyrt. Detta examensarbete undersöker huruvida denna maskning kan göras automatiskt med Faltningsnät för Semantisk Segmentering och hur detta skulle kunna förbättra en 3D-rekonstruktionsprocess. photogrammetry semantic segmentation convolutional neural networks
15	Automated taxiing for unmanned aircraft systems Eaton, William H. January 2017 (has links) Over the last few years, the concept of civil Unmanned Aircraft System(s) (UAS) has been realised, with small UASs commonly used in industries such as law enforcement, agriculture and mapping. With increased development in other areas, such as logistics and advertisement, the size and range of civil UAS is likely to grow. Taken to the logical conclusion, it is likely that large scale UAS will be operating in civil airspace within the next decade. Although the airborne operations of civil UAS have already gathered much research attention, work is also required to determine how UAS will function when on the ground. Motivated by the assumption that large UAS will share ground facilities with manned aircraft, this thesis describes the preliminary development of an Automated Taxiing System(ATS) for UAS operating at civil aerodromes. To allow the ATS to function on the majority of UAS without the need for additional hardware, a visual sensing approach has been chosen, with the majority of work focusing on monocular image processing techniques. The purpose of the computer vision system is to provide direct sensor data which can be used to validate the vehicle s position, in addition to detecting potential collision risks. As aerospace regulations require the most robust and reliable algorithms for control, any methods which are not fully definable or explainable will not be suitable for real-world use. Therefore, non-deterministic methods and algorithms with hidden components (such as Artificial Neural Network (ANN)) have not been used. Instead, the visual sensing is achieved through a semantic segmentation, with separate segmentation and classification stages. Segmentation is performed using superpixels and reachability clustering to divide the image into single content clusters. Each cluster is then classified using multiple types of image data, probabilistically fused within a Bayesian network. The data set for testing has been provided by BAE Systems, allowing the system to be trained and tested on real-world aerodrome data. The system has demonstrated good performance on this limited dataset, accurately detecting both collision risks and terrain features for use in navigation.
16	Neural Networks for Semantic Segmentation in the Food Packaging Industry Carlsson, Mattias January 2018 (has links) Industrial applications of computer vision often utilize traditional image processing techniques whereas state-of-the-art methods in most image processing challenges are almost exclusively based on convolutional neural networks (CNNs). Thus there is a large potential for improving the performance of many machine vision applications by incorporating CNNs. One such application is the classification of juice boxes with straws, where the baseline solution uses classical image processing techniques on depth images to reject or accept juice boxes. This thesis aim to investigate how CNNs perform on the task of semantic segmentation (pixel-wise classification) of said images and if the result can be used to increase classification performance. A drawback of CNNs is that they usually require large amounts of labelled data for training to be able to generalize and learn anything useful. As labelled data is hard to come by, two ways to get cheap data are investigated, one being synthetic data generation and the other being automatic labelling using the baseline solution. The implemented network performs well on semantic segmentation, even when trained on synthetic data only, though the performance increases with the ratio of real (automatically labelled) to synthetic images. The classification task is very sensitive to small errors in semantic segmentation and the results are therefore not as good as the baseline solution. It is suspected that the drop in performance between validation and test data is due to a domain shift between the data sets, e.g. variations in data collection and straw and box type, and fine-tuning to the target domain could definitely increase performance. When trained on synthetic data the domain shift is even larger and the performance on classification is next to useless. It is likely that the results could be improved by using more advanced data generation, e.g. a generative adversarial network (GAN), or more rigorous modelling of the data. Deep learning neural networks semantic segmentation
17	Semantic-oriented Object Segmentation / Segmentation d'objet pour l'interprétation sémantique Zou, Wenbin 13 March 2014 (has links) Cette thèse porte sur les problèmes de segmentation d’objets et la segmentation sémantique qui visent soit à séparer des objets du fond, soit à l’attribution d’une étiquette sémantique spécifique à chaque pixel de l’image. Nous proposons deux approches pour la segmentation d’objets, et une approche pour la segmentation sémantique. La première approche est basée sur la détection de saillance. Motivés par notre but de segmentation d’objets, un nouveau modèle de détection de saillance est proposé. Cette approche se formule dans le modèle de récupération de la matrice de faible rang en exploitant les informations de structure de l’image provenant d’une segmentation ascendante comme contrainte importante. La segmentation construite à l’aide d’un schéma d’optimisation itératif et conjoint, effectue simultanément, d’une part, une segmentation d’objets basée sur la carte de saillance résultant de sa détection et, d’autre part, une amélioration de la qualité de la saillance à l’aide de la segmentation. Une carte de saillance optimale et la segmentation finale sont obtenues après plusieurs itérations. La deuxième approche proposée pour la segmentation d’objets se fonde sur des images exemples. L’idée sous-jacente est de transférer les étiquettes de segmentation d’exemples similaires, globalement et localement, à l’image requête. Pour l’obtention des exemples les mieux assortis, nous proposons une représentation nouvelle de haut niveau de l’image, à savoir le descripteur orienté objet, qui reflète à la fois l’information globale et locale de l’image. Ensuite, un prédicteur discriminant apprend en ligne à l’aide les exemples récupérés pour attribuer à chaque région de l’image requête un score d’appartenance au premier plan. Ensuite, ces scores sont intégrés dans un schéma de segmentation du champ de Markov (MRF) itératif qui minimise l’énergie. La segmentation sémantique se fonde sur une banque de régions et la représentation parcimonieuse. La banque des régions est un ensemble de régions générées par segmentations multi-niveaux. Ceci est motivé par l’observation que certains objets peuvent être capturés à certains niveaux dans une segmentation hiérarchique. Pour la description de la région, nous proposons la méthode de codage parcimonieux qui représente chaque caractéristique locale avec plusieurs vecteurs de base du dictionnaire visuel appris, et décrit toutes les caractéristiques locales d’une région par un seul histogramme parcimonieux. Une machine à support de vecteurs (SVM) avec apprentissage de noyaux multiple est utilisée pour l’inférence sémantique. Les approches proposées sont largement évaluées sur plusieurs ensembles de données. Des expériences montrent que les approches proposées surpassent les méthodes de l’état de l’art. Ainsi, par rapport au meilleur résultat de la littérature, l’approche proposée de segmentation d’objets améliore la mesure d F-score de 63% à 68,7% sur l’ensemble de données Pascal VOC 2011. / This thesis focuses on the problems of object segmentation and semantic segmentation which aim at separating objects from background or assigning a specific semantic label to each pixel in an image. We propose two approaches for the object segmentation and one approach for semantic segmentation. The first proposed approach for object segmentation is based on saliency detection. Motivated by our ultimate goal for object segmentation, a novel saliency detection model is proposed. This model is formulated in the low-rank matrix recovery model by taking the information of image structure derived from bottom-up segmentation as an important constraint. The object segmentation is built in an iterative and mutual optimization framework, which simultaneously performs object segmentation based on the saliency map resulting from saliency detection, and saliency quality boosting based on the segmentation. The optimal saliency map and the final segmentation are achieved after several iterations. The second proposed approach for object segmentation is based on exemplar images. The underlying idea is to transfer segmentation labels of globally and locally similar exemplar images to the query image. For the purpose of finding the most matching exemplars, we propose a novel high-level image representation method called object-oriented descriptor, which captures both global and local information of image. Then, a discriminative predictor is learned online by using the retrieved exemplars. This predictor assigns a probabilistic score of foreground to each region of the query image. After that, the predicted scores are integrated into the segmentation scheme of Markov random field (MRF) energy optimization. Iteratively finding minimum energy of MRF leads the final segmentation. For semantic segmentation, we propose an approach based on region bank and sparse coding. Region bank is a set of regions generated by multi-level segmentations. This is motivated by the observation that some objects might be captured at certain levels in a hierarchical segmentation. For region description, we propose sparse coding method which represents each local feature descriptor with several basic vectors in the learned visual dictionary, and describes all local feature descriptors within a region by a single sparse histogram. With the sparse representation, support vector machine with multiple kernel learning is employed for semantic inference. The proposed approaches have been extensively evaluated on several challenging and widely used datasets. Experiments demonstrated the proposed approaches outperform the stateofthe- art methods. Such as, compared to the best result in the literature, the proposed object segmentation approach based on exemplar images improves the F-score from 63% to 68.7% on Pascal VOC 2011 dataset. Segmentation d’objets Segmentation sémantique Détection de saillance Object segmentation Semantic segmentation Saliency detection 621.367
18	Segmentação de cenas em telejornais: uma abordagem multimodal / Scene segmentation in news programs: a multimodal approach Danilo Barbosa Coimbra 11 April 2011 (has links) Este trabalho tem como objetivo desenvolver um método de segmentação de cenas em vídeos digitais que trate segmentos semânticamente complexos. Como prova de conceito, é apresentada uma abordagem multimodal que utiliza uma definição mais geral para cenas em telejornais, abrangendo tanto cenas onde âncoras aparecem quanto cenas onde nenhum âncora aparece. Desse modo, os resultados obtidos da técnica multimodal foram signifiativamente melhores quando comparados com os resultados obtidos das técnicas monomodais aplicadas em separado. Os testes foram executados em quatro grupos de telejornais brasileiros obtidos de duas emissoras de TV diferentes, cada qual contendo cinco edições, totalizando vinte telejornais / This work aims to develop a method for scene segmentation in digital video which deals with semantically complex segments. As proof of concept, we present a multimodal approach that uses a more general definition for TV news scenes, covering both: scenes where anchors appear on and scenes where no anchor appears. The results of the multimodal technique were significantly better when compared with the results from monomodal techniques applied separately. The tests were performed in four groups of Brazilian news programs obtained from two different television stations, containing five editions each, totaling twenty newscasts Segmentação de cena multimodal Segmentação de vídeo multimodal Segmentaçãop semântica Multimodal scene segmentation Multimodal video segmentation Semantic segmentation
19	Sémantická segmentace obrazu pomocí konvolučních neuronových sítí / Semantic segmentation of images using convolutional neural networks Špila, Filip January 2020 (has links) Tato práce se zabývá rešerší a implementací vybraných architektur konvolučních neuronových sítí pro segmentaci obrazu. V první části jsou shrnuty základní pojmy z teorie neuronových sítí. Tato část také představuje silné stránky konvolučních sítí v oblasti rozpoznávání obrazových dat. Teoretická část je uzavřena rešerší zaměřenou na konkrétní architekturu používanou na segmentaci scén. Implementace této architektury a jejích variant v Caffe je převzata a upravena pro konkrétní použití v praktické části práce. Nedílnou součástí tohoto procesu jsou kroky potřebné ke správnému nastavení softwarového a hardwarového prostředí. Příslušná kapitola proto poskytuje přesný návod, který ocení zejména noví uživatelé Linuxu. Pro trénování všech variant vybrané sítě je vytvořen vlastní dataset obsahující 2600 obrázků. Je také provedeno několik nastavení původní implementace, zvláště pro účely použití předtrénovaných parametrů. Trénování zahrnuje výběr hyperparametrů, jakými jsou například typ optimalizačního algoritmu a rychlost učení. Na závěr je provedeno vyhodnocení výkonu a výpočtové náročnosti všech natrénovaných sítí na testovacím datasetu.
20	Self-supervised učení v aplikacích počítačového vidění / Self-supervised learning in computer vision applications Vančo, Timotej January 2021 (has links) The aim of the diploma thesis is to make research of the self-supervised learning in computer vision applications, then to choose a suitable test task with an extensive data set, apply self-supervised methods and evaluate. The theoretical part of the work is focused on the description of methods in computer vision, a detailed description of neural and convolution networks and an extensive explanation and division of self-supervised methods. Conclusion of the theoretical part is devoted to practical applications of the Self-supervised methods in practice. The practical part of the diploma thesis deals with the description of the creation of code for working with datasets and the application of the SSL methods Rotation, SimCLR, MoCo and BYOL in the role of classification and semantic segmentation. Each application of the method is explained in detail and evaluated for various parameters on the large STL10 dataset. Subsequently, the success of the methods is evaluated for different datasets and the limiting conditions in the classification task are named. The practical part concludes with the application of SSL methods for pre-training the encoder in the application of semantic segmentation with the Cityscapes dataset.

Search results