  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
101

Mixed Precision Quantization for Computer Vision Tasks in Autonomous Driving / Blandad Precisionskvantisering för Datorvisionsuppgifter vid Autonom Körning

Rengarajan, Sri Janani January 2022 (has links)
Quantization of neural networks is a popular technique for adapting computation-intensive deep learning applications to edge devices. In this work, low-bit mixed-precision quantization of an FPN-Resnet18 model trained for the task of semantic segmentation is explored using the Cityscapes and Arriver datasets. The Hessian information of each layer in the model is used to determine the bit precision for that layer; in some experiments, the bit precisions for the layers are instead chosen randomly. The networks are quantization-aware trained with bit combinations of 2, 4 and 8. The results for both the Cityscapes and Arriver datasets show that the quantization-aware trained networks with the low-bit mixed-precision technique offer performance on par with the 8-bit quantization-aware trained networks, and that segmentation performance degrades when the network activations are quantized below 8 bits. It was also found that using the Hessian information had little effect on the network's performance.
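The two core ideas in the abstract above, fake quantization during training and Hessian-guided bit allocation, can be sketched in a few lines. This is an illustrative NumPy sketch rather than the thesis' implementation; the equal-group bit-assignment policy and the sensitivity scores are assumptions.

```python
import numpy as np

def fake_quantize(x, num_bits):
    """Uniformly quantize x to num_bits (symmetric), then dequantize.

    Mimics the forward pass of quantization-aware training: values are
    rounded to a low-bit grid but kept as floats for backpropagation.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def assign_bits_by_sensitivity(sensitivities, bit_choices=(2, 4, 8)):
    """Give the most sensitive layers the highest precision.

    `sensitivities` holds one Hessian-based score per layer (e.g. a trace
    estimate); layers are split into equal-sized groups across `bit_choices`.
    """
    order = np.argsort(sensitivities)        # least sensitive first
    n = len(sensitivities)
    bits = np.empty(n, dtype=int)
    group = max(1, n // len(bit_choices))
    for rank, layer in enumerate(order):
        bits[layer] = bit_choices[min(rank // group, len(bit_choices) - 1)]
    return bits

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
err8 = np.mean((w - fake_quantize(w, 8)) ** 2)
err2 = np.mean((w - fake_quantize(w, 2)) ** 2)
print(err2 > err8)   # coarser grids give larger quantization error
```

A layer with sensitivity score far above the others would thus end up with 8 bits, while the flattest layers drop to 2 bits.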
102

Использование диффузионных моделей для аугментации данных и улучшения качества сегментации изображений (на примере модели Stable Diffusion и наборе данных Caltech-UCSD Birds-200-2011) : магистерская диссертация / Using diffusion models to augment data and improve the quality of image segmentation (using the example of the Stable Diffusion model and the Caltech-UCSD Birds-200-2011 data set)

Морий, С. М., Moriy, S. M. January 2023 (has links)
Object of study: the process of image augmentation for solving a segmentation task. Subject of research: the augmentation and machine learning methods used for image segmentation. Purpose of the work: to study the effectiveness of generative image augmentation performed with the Stable Diffusion model, using a semantic segmentation task as an example. The research covered the main approaches to image segmentation and data augmentation methods, and the design and implementation of experiments to evaluate the effectiveness of generative image augmentation. The work demonstrates the effectiveness of an image augmentation approach that expands part of the original dataset by generating new data with a diffusion model. 
Area of practical application: the proposed approach can be used to improve the quality of semantic image segmentation models when source data is limited, labeled data is scarce, or the data is imbalanced.
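The augmentation strategy described above, expanding part of a labeled dataset with generated samples, can be sketched generically. Here `generate_fn` is a stand-in for a text-to-image diffusion call (in practice, a Stable Diffusion pipeline prompted with the class name); the function and data names are illustrative assumptions, not the thesis' code.

```python
import random

def augment_dataset(real_samples, generate_fn, target_size, seed=0):
    """Expand a labeled dataset with synthetic samples up to `target_size`.

    real_samples: list of (image, label) pairs.
    generate_fn(label): placeholder for a diffusion-model generation call.
    """
    rng = random.Random(seed)
    augmented = list(real_samples)
    while len(augmented) < target_size:
        label = rng.choice(real_samples)[1]   # reuse existing class labels
        augmented.append((generate_fn(label), label))
    return augmented

# toy usage: "images" are strings, generation is a stub
real = [("img0", "sparrow"), ("img1", "finch")]
fake = augment_dataset(real, lambda lbl: f"synthetic-{lbl}", target_size=5)
print(len(fake))   # 5
```

The segmentation model is then trained on the mixed real-plus-synthetic set, which is where the reported quality gains under data scarcity come from.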
103

Pushing the boundary of Semantic Image Segmentation

Jain, Shipra January 2020 (has links)
The state-of-the-art object detection and image classification methods can perform impressively on more than 9k classes. In contrast, the number of classes in semantic segmentation datasets is fairly limited. This is not surprising when the restrictions caused by the lack of labeled data and the high computational demand are considered. To efficiently perform pixel-wise classification for c classes, segmentation models use a cross-entropy loss on a c-channel output for each pixel. The computational demand of this prediction turns out to be a major bottleneck for higher numbers of classes. The major goal of this thesis is to reduce the number of channels in the output prediction, thus allowing semantic segmentation with a very high number of classes. The reduction of dimension is approached using metric learning for the semantic feature space. Metric learning provides a mapping from pixel to embedding with a minimal, yet still sufficient, number of dimensions. Our proposed approximation of the ground-truth class probability for the cross-entropy loss helps the model place the embeddings of same-class pixels closer together, reducing intra-class variability within clusters and increasing inter-class separation. The model also learns a prototype embedding for each class. In the loss function, these class embeddings act as positive and negative samples for the pixel embeddings (anchors). We show that, given limited computational memory and resources, our approach can be used to train a segmentation model for any number of classes. We perform all experiments on one GPU and show that our approach performs similarly to, and in some cases slightly better than, the DeepLabv3+ baseline model on the Cityscapes and ADE20K datasets. We also perform experiments to understand the trade-offs in terms of memory usage, inference time and performance metrics. 
Our work helps alleviate the problem of computational complexity, paving the way for image segmentation with a very high number of semantic classes.
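The prototype mechanism described above can be illustrated in miniature: pixels are mapped to low-dimensional embeddings and classified by distance to learned class prototypes, so the output needs only d channels instead of c. This is a toy NumPy illustration of the mechanism, not the thesis' training code; the dimensions and values are made up.

```python
import numpy as np

def prototype_probs(embeddings, prototypes):
    """Class probabilities from distances to learned class prototypes.

    embeddings: (n_pixels, d) pixel embeddings; prototypes: (n_classes, d).
    Softmax over negative squared Euclidean distance, so a pixel most
    likely belongs to the class whose prototype is nearest; the prototypes
    act as positive and negative samples for each pixel "anchor".
    """
    d2 = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def metric_xent(embeddings, prototypes, labels):
    """Cross-entropy on prototype distances: pulls same-class pixels toward
    their prototype (low intra-class variance) and away from the others."""
    p = prototype_probs(embeddings, prototypes)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

protos = np.array([[0.0, 0.0], [10.0, 10.0]])          # 2 classes, d = 2
pix = np.array([[0.1, -0.2], [9.8, 10.3]])             # well-clustered pixels
labels = np.array([0, 1])
print(metric_xent(pix, protos, labels) < 0.01)         # tight clusters → low loss
```

Because d can stay small even as the class count grows, the per-pixel output no longer scales with c, which is the memory saving the thesis exploits.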
104

FGSSNet: Applying Feature-Guided Semantic Segmentation on real world floorplans

Norrby, Hugo, Färm, Gabriel January 2024 (has links)
This master thesis introduces FGSSNet, a novel multi-headed feature-guided semantic segmentation (FGSS) architecture designed to improve the generalization ability of segmentation models on floorplans by injecting domain-specific information into the latent space, guiding the segmentation process. FGSSNet features a U-Net segmentation backbone with a jointly trained reconstruction head attached to the U-Net decoder, tasked with reconstructing the injected feature maps and thereby forcing their utilization throughout the decoding process. A multi-headed dedicated feature extractor is used to extract the domain-specific feature maps used by FGSSNet, while also predicting the wall width used for our novel dynamic scaling algorithm, designed to ensure spatial consistency between the training and real-world floorplans. The results show that the reconstruction head proved redundant, diverting the network's attention away from the segmentation task and ultimately hindering its performance. Instead, the ablated model without the reconstruction head, FGSSNet-NoRec, showed increased performance by utilizing the injected features freely, showcasing their importance. FGSSNet-NoRec slightly improves the IoU performance of comparable U-Net models, achieving a wall IoU of 79.3% on a preprocessed CubiCasa5K dataset, and shows an average IoU increase of 3.0 units (5.3%) on the more challenging real-world floorplans, displaying superior generalization performance by leveraging the injected domain-specific information.
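The dynamic scaling idea above, resizing a floorplan so its measured wall width matches what the model saw during training, can be sketched as follows. The training wall width of 5 px is an assumed placeholder, not a figure from the thesis.

```python
def dynamic_scale_factor(measured_wall_px, train_wall_px=5.0):
    """Scale factor that maps real-world wall widths onto the training
    distribution: the wall width predicted by the feature extractor is
    compared against the wall width seen during training."""
    if measured_wall_px <= 0:
        raise ValueError("wall width must be positive")
    return train_wall_px / measured_wall_px

def scaled_size(size, factor):
    """New (width, height) after applying the scale factor."""
    w, h = size
    return (max(1, round(w * factor)), max(1, round(h * factor)))

factor = dynamic_scale_factor(measured_wall_px=10.0)   # walls twice too wide
print(factor)                          # 0.5
print(scaled_size((800, 600), factor)) # (400, 300)
```

Resizing the input by this factor before segmentation keeps the spatial statistics of real-world floorplans consistent with the training data, which is what the abstract credits for the generalization gain.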
105

Data-driven Infrastructure Inspection

Bianchi, Eric Loran 18 January 2022 (has links)
Bridge inspection and infrastructure inspection are critical steps in the lifecycle of the built environment. Emerging technologies and data are driving factors that are disrupting the traditional processes for conducting these inspections. Because inspections are mainly conducted visually by human inspectors, this work focuses on improving the visual inspection process with data-driven approaches. Data-driven approaches, however, require significant data, which was sparse in the existing literature. Therefore, this research first examined the present state of the existing data in the research domain. We reviewed hundreds of image-based visual inspection papers that used machine learning to augment the inspection process and, from this, compiled a comprehensive catalog of over forty available datasets in the literature and identified promising, emerging techniques and trends in the field. Based on the findings of our review, we contributed six significant datasets to target gaps in the field's data. The six datasets comprise structural material segmentation, corrosion condition state segmentation, crack detection, structural detail detection, and bearing condition state classification. The contributed datasets used novel annotation guidelines and benefited from a novel semi-automated annotation process for both object detection and pixel-level detection models. Using the data obtained from our collected sources, task-appropriate deep learning models were trained. From these datasets and models, we developed a change detection algorithm to monitor damage evolution between two inspection videos, and trained a GAN-inversion model which generated hyper-realistic synthetic bridge inspection image data and could forecast a future deterioration state of an existing bridge element. 
While the application of machine learning techniques in civil engineering is not yet widespread, this research provides an impactful contribution that demonstrates the advantages data-driven sciences can provide in more economically and efficiently inspecting structures, cataloging deterioration, and forecasting potential outcomes. / Doctor of Philosophy
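As a toy illustration of change detection between two aligned inspection frames (a deliberate simplification of the algorithm described above, with the alignment assumed done and the threshold chosen arbitrarily):

```python
import numpy as np

def change_mask(before, after, threshold=0.2):
    """Flag pixels whose intensity changed more than `threshold`.

    A simple stand-in for video-based damage-evolution monitoring: it
    assumes the two inspection frames are already registered to each
    other and normalized to the [0, 1] range.
    """
    diff = np.abs(after.astype(float) - before.astype(float))
    return diff > threshold

before = np.zeros((4, 4))
after = before.copy()
after[1:3, 1:3] = 0.9            # simulated new corrosion patch
mask = change_mask(before, after)
print(mask.sum())                # 4 changed pixels
```

A production system would compare per-pixel class labels from the trained segmentation models rather than raw intensities, but the masking step has the same shape.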
106

Semantic Segmentation of Remote Sensing Data using Self-Supervised Learning

Wallin, Emma, Åhlander, Rebecka January 2024 (has links)
Semantic segmentation is the process of assigning a specific class label to each pixel in an image. There are multiple areas of use for semantic segmentation of remote sensing images, including climate change studies and urban planning and development. When training a network to perform semantic segmentation in a supervised manner, annotated data is crucial, and annotating satellite images is an expensive and time-consuming task. A resolution to this issue might be self-supervised learning: training a pretext task on a large unlabeled dataset, and a downstream task on a smaller labeled dataset, could mitigate the need for large amounts of labeled data. In this thesis, the use of self-supervised learning for semantic segmentation of remote sensing data is investigated and compared to the traditional use of supervised pre-training on ImageNet. Two different methods of self-supervised learning are evaluated: a reconstructive method and a contrastive method. Furthermore, it is investigated whether including modalities unique to remote sensing data yields greater performance for semantic segmentation. The findings indicate that self-supervised learning with in-domain data shows significant potential. While the performance of models pre-trained using self-supervised learning on remote sensing data does not surpass that of models pre-trained using supervised learning on ImageNet, it reaches a comparable level. This is notable given the substantially smaller amount of training data used. However, in cases where the in-domain dataset is small (as in this thesis, with approximately 20,000 images), leveraging ImageNet for pre-training is preferable. Furthermore, self-supervised learning demonstrates promise as a more effective pre-training approach than supervised learning when both methods are trained on ImageNet. 
The reconstructive method proves more suitable for semantic segmentation of remote sensing data than the contrastive method, and incorporating modalities unique to remote sensing further enhances performance.
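The contrastive family of self-supervised methods evaluated above can be illustrated with an InfoNCE-style objective on pairs of augmented views of the same tile. This is a NumPy sketch assuming cosine similarity and a fixed temperature; the thesis' exact method and hyperparameters may differ.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for a batch of positive pairs (a contrastive objective).

    z1[i] and z2[i] are embeddings of two augmented views of the same
    remote-sensing tile; every other tile in the batch acts as a negative.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(np.diag(p) + 1e-12))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))   # matching views
mismatched = info_nce(z, z[::-1])                            # wrong pairing
print(aligned < mismatched)   # correct pairings score a lower loss
```

The reconstructive alternative (e.g. a masked autoencoder) instead learns by regressing masked-out pixels, which the thesis found transfers better to segmentation of remote sensing data.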
107

Unsupervised construction of 4D semantic maps in a long-term autonomy scenario

Ambrus, Rares January 2017 (has links)
Robots are operating for longer times and collecting much more data than just a few years ago. In this setting we are interested in exploring ways of modeling the environment, segmenting out areas of interest and keeping track of the segmentations over time, with the purpose of building 4D models (i.e. space and time) of the relevant parts of the environment. Our approach relies on repeatedly observing the environment and creating local maps at specific locations. The first question we address is how to choose where to build these local maps. Traditionally, an operator defines a set of waypoints on a pre-built map of the environment which the robot visits autonomously. Instead, we propose a method to automatically extract semantically meaningful regions from a point cloud representation of the environment. The resulting segmentation is purely geometric, and in the context of mobile robots operating in human environments, the semantic label associated with each segment (i.e. kitchen, office) can be of interest for a variety of applications. We therefore also look at how to obtain per-pixel semantic labels given the geometric segmentation, by fusing probabilistic distributions over scene and object types in a Conditional Random Field. For most robotic systems, the elements of interest in the environment are the ones which exhibit some dynamic properties (such as people, chairs, cups, etc.), and the ability to detect and segment such elements provides a very useful initial segmentation of the scene. We propose a method to iteratively build a static map from observations of the same scene acquired at different points in time. Dynamic elements are obtained by computing the difference between the static map and new observations. We address the problem of clustering together dynamic elements which correspond to the same physical object, observed at different points in time and in significantly different circumstances. 
To address some of the inherent limitations in the sensors used, we autonomously plan, navigate around and obtain additional views of the segmented dynamic elements. We look at methods of fusing the additional data and we show that both a combined point cloud model and a fused mesh representation can be used to more robustly recognize the dynamic object in future observations. In the case of the mesh representation, we also show how a Convolutional Neural Network can be trained for recognition by using mesh renderings. Finally, we present a number of methods to analyse the data acquired by the mobile robot autonomously and over extended time periods. First, we look at how the dynamic segmentations can be used to derive a probabilistic prior which can be used in the mapping process to further improve and reinforce the segmentation accuracy. We also investigate how to leverage spatial-temporal constraints in order to cluster dynamic elements observed at different points in time and under different circumstances. We show that by making a few simple assumptions we can increase the clustering accuracy even when the object appearance varies significantly between observations. The result of the clustering is a spatial-temporal footprint of the dynamic object, defining an area where the object is likely to be observed spatially as well as a set of time stamps corresponding to when the object was previously observed. Using this data, predictive models can be created and used to infer future times when the object is more likely to be observed. In an object search scenario, this model can be used to decrease the search time when looking for specific objects.
108

Fully Convolutional Neural Networks for Pixel Classification in Historical Document Images

Stewart, Seth Andrew 01 October 2018 (has links)
We use a Fully Convolutional Neural Network (FCNN) to classify pixels in historical document images, enabling the extraction of high-quality, pixel-precise and semantically consistent layers of masked content. We also analyze a dataset of hand-labeled historical form images of unprecedented detail and complexity. The semantic categories we consider in this new dataset include handwriting, machine-printed text, dotted and solid lines, and stamps. Segmentation of document images into distinct layers allows handwriting, machine print, and other content to be processed and recognized discriminatively, and therefore more intelligently than might be possible with content-unaware methods. We show that an efficient FCNN with relatively few parameters can accurately segment documents having similar textural content when trained on a single representative pixel-labeled document image, even when layouts differ significantly. In contrast to the overwhelming majority of existing semantic segmentation approaches, we allow multiple labels to be predicted per pixel location, which allows for direct prediction and reconstruction of overlapped content. We perform an analysis of prevalent pixel-wise performance measures, and show that several popular performance measures can be manipulated adversarially, yielding arbitrarily high measures based on the type of bias used to generate the ground-truth. We propose a solution to the gaming problem by comparing absolute performance to an estimated human level of performance. We also present results on a recent international competition requiring the automatic annotation of billions of pixels, in which our method took first place.
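The multi-label-per-pixel idea above, an independent sigmoid per class channel instead of a single softmax over channels, can be sketched as:

```python
import numpy as np

def multilabel_pixels(logits, threshold=0.5):
    """Independent sigmoid per class channel, so one pixel can carry
    several labels at once (e.g. handwriting overlapping a printed line),
    unlike softmax, which forces the channels to compete.

    logits: (classes, H, W) raw network outputs.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

logits = np.zeros((3, 1, 1))
logits[0, 0, 0] = 4.0     # strong "handwriting" evidence
logits[1, 0, 0] = 3.0     # strong "solid line" evidence at the same pixel
logits[2, 0, 0] = -4.0    # no "stamp" evidence
labels = multilabel_pixels(logits)
print(labels[:, 0, 0])    # two labels active on one pixel
```

The class names in the comments are drawn from the dataset categories listed above; the thresholding itself is a generic illustration, not the paper's exact decision rule.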
110

Počítačová podpora rozpoznávání a klasifikace rodových erbů / Computer Aided Recognition and Classification of Coats of Arms

Vídeňský, František January 2017 (has links)
This master thesis describes the design and development of a system for the detection and recognition of whole coats of arms as well as their individual heraldic parts. The thesis presents computer vision methods for object segmentation and detection and selects those that are most suitable. Most of the heraldic parts are segmented using convolutional neural networks, and the rest using active contours. The Histogram of Oriented Gradients method was selected for detecting coats of arms in an image. A custom data set is used for training and functional verification. The resulting system can serve as an auxiliary tool in the auxiliary sciences of history.
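The core of Histogram of Oriented Gradients features, binning gradient directions weighted by gradient magnitude, can be sketched in a few lines of NumPy. Real HOG adds cells, blocks and block normalization on top of this, and the system described above presumably used a library implementation; this is only an illustration of the underlying idea.

```python
import numpy as np

def gradient_orientation_histogram(img, n_bins=9):
    """Histogram of unsigned gradient orientations, magnitude-weighted.

    The simplest building block of HOG-style descriptors: edges of a
    given orientation contribute mass to the corresponding angle bin.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)   # unsigned angles
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = mag[bins == b].sum()
    return hist / (hist.sum() + 1e-12)

# a vertical edge produces horizontal gradients, so mass lands in the 0° bin
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = gradient_orientation_histogram(img)
print(h.argmax())   # 0
```

Sliding such descriptors over an image and feeding them to a classifier is the classic HOG detection pipeline that the selected method builds on.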
