  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Robust Real-Time Estimation of Region Displacements in Video Sequences

Skoglund, Johan January 2007 (has links)
The possibility to use real-time computer vision in video sequences offers many opportunities for a system to interact with its environment. Possible applications include augmented reality, as in the MATRIS project, where the purpose is to add new objects into the video sequence, and surveillance, where the purpose is to detect abnormal events. The increase in computer speed in recent years has simplified this process, and it is now possible to use at least some of the more advanced computer vision algorithms available. Computational speed is still a limiting factor, however: an efficient real-time system requires efficient code and methods. This thesis deals with both problems; one part concerns efficient implementations using single instruction, multiple data (SIMD) instructions, and one part concerns robust tracking. An efficient real-time system requires efficient implementations of the computer vision methods used, and efficient implementations require knowledge of the CPU and the possibilities it offers. In this thesis, one such technique, SIMD, is explained. SIMD is useful when the same operation is applied to multiple data elements, which is usually the case in computer vision, where the same operation is executed on each pixel.
Following the position of a feature or object in a video sequence is called tracking, and it can be used in a number of applications. The application in this thesis is pose estimation. One way to do tracking is to cut out a small region around the feature, creating a patch, and then find the position of this patch in subsequent frames. To find the position, a measure of the difference between the patch and the image at a given position is used. This thesis thoroughly investigates the sum of absolute differences (SAD) error measure, including different ways to improve its robustness and to decrease the average error. A method to estimate the average error, the covariance of the position error, is proposed; such an estimate is needed when different measurements are combined. Finally, a system for camera pose estimation is presented. The computer vision part of this system is based on the results in this thesis, and the presentation also contains a discussion of the system's results. / Report code: LIU-TEK-LIC-2007:5. The report code printed in the thesis is incorrect.
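The patch-tracking idea described above can be sketched in a few lines. The snippet below is illustrative only (the function names and the exhaustive search are not from the thesis, which concerns SIMD-optimised implementations): it locates a patch in a new frame by minimising the SAD score over a search window around the previous position.

```python
import numpy as np

def sad(patch, window):
    # Sum of absolute differences between a patch and an image window.
    return np.abs(patch.astype(np.int32) - window.astype(np.int32)).sum()

def track_patch(image, patch, search_radius, prev_pos):
    # Exhaustively search a region around the previous position for the
    # displacement that minimises the SAD score.
    ph, pw = patch.shape
    py, px = prev_pos
    best = (None, np.inf)
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = py + dy, px + dx
            if y < 0 or x < 0 or y + ph > image.shape[0] or x + pw > image.shape[1]:
                continue
            score = sad(patch, image[y:y + ph, x:x + pw])
            if score < best[1]:
                best = ((y, x), score)
    return best  # ((y, x), sad_score)
```

The inner per-pixel absolute differences are exactly the kind of uniform operation that SIMD instructions accelerate, which is why SAD maps well onto them.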
72

Neural Networks for Semantic Segmentation in the Food Packaging Industry

Carlsson, Mattias January 2018 (has links)
Industrial applications of computer vision often utilize traditional image processing techniques, whereas state-of-the-art methods in most image processing challenges are almost exclusively based on convolutional neural networks (CNNs). Thus, there is a large potential for improving the performance of many machine vision applications by incorporating CNNs. One such application is the classification of juice boxes with straws, where the baseline solution uses classical image processing techniques on depth images to reject or accept juice boxes. This thesis aims to investigate how CNNs perform on the task of semantic segmentation (pixel-wise classification) of said images and whether the result can be used to increase classification performance. A drawback of CNNs is that they usually require large amounts of labelled training data to generalize and learn anything useful. As labelled data is hard to come by, two ways to obtain cheap data are investigated: synthetic data generation and automatic labelling using the baseline solution. The implemented network performs well on semantic segmentation, even when trained on synthetic data only, though performance increases with the ratio of real (automatically labelled) to synthetic images. The classification task is very sensitive to small errors in semantic segmentation, and the results are therefore not as good as the baseline solution. It is suspected that the drop in performance between validation and test data is due to a domain shift between the data sets, e.g. variations in data collection and straw and box type, and fine-tuning to the target domain could definitely increase performance. When trained on synthetic data the domain shift is even larger and the classification performance is next to useless. It is likely that the results could be improved by using more advanced data generation, e.g. a generative adversarial network (GAN), or more rigorous modelling of the data.
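"Semantic segmentation (pixel-wise classification)" simply means assigning a class label to every pixel. A minimal sketch, not taken from the thesis: if a fully convolutional network outputs a per-class score map, the segmentation is the per-pixel argmax, and a simple quality measure is pixel accuracy.

```python
import numpy as np

def segment(score_maps):
    # score_maps: (num_classes, H, W) array of per-pixel class scores,
    # e.g. the output of a fully convolutional network's final layer.
    # Semantic segmentation assigns each pixel the highest-scoring class.
    return np.argmax(score_maps, axis=0)

def pixel_accuracy(pred, target):
    # Fraction of pixels whose predicted class matches the label.
    return float((pred == target).mean())
```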
73

Improving Discriminative Correlation Filters for Visual Tracking / Förbättring av korrelationsfilter för visuell följning

Häger, Gustav January 2015 (has links)
Generic visual tracking is one of the classical problems in computer vision. In this problem, no prior knowledge of the target is available aside from a bounding box in the initial frame of the sequence. Generic visual tracking is a difficult task due to a number of factors such as momentary occlusions, target rotations, changes in illumination, and variations in target size. In recent years, discriminative correlation filter (DCF) based trackers have shown promising results for visual tracking. These DCF-based methods use the Fourier transform to efficiently calculate detections and model updates, allowing significantly higher frame rates than competing methods. However, existing DCF-based methods only estimate the translation of the object, ignoring changes in size. This thesis investigates the problem of accurately estimating scale variations within a DCF-based framework. A novel scale estimation method is proposed that explicitly constructs separate translation and scale filters. The proposed scale estimation technique is robust, significantly improves tracking performance, and operates in real time. In addition, a comprehensive evaluation of feature representations in a DCF framework is performed. Experiments are performed on the benchmark OTB-2015 dataset as well as the VOT 2014 dataset. The proposed methods are shown to significantly improve the performance of existing DCF-based trackers.
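The Fourier-domain training and detection that make DCF trackers fast can be sketched as follows. This is a generic, MOSSE-style single-image correlation filter for illustration, not the thesis's exact formulation; the regulariser `lam` and the Gaussian target are standard choices in this family of methods.

```python
import numpy as np

def gaussian_peak(h, w, cy, cx, sigma=2.0):
    # Desired correlation output: a Gaussian centred on the target.
    y, x = np.mgrid[0:h, 0:w]
    return np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, target, lam=1e-2):
    # Closed-form DCF training in the Fourier domain:
    # H = (G . conj(F)) / (F . conj(F) + lam), element-wise.
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H, patch):
    # Correlation response; the argmax gives the estimated translation.
    resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

Because training and detection are element-wise products of FFTs, the per-frame cost is a handful of O(N log N) transforms, which is what enables the high frame rates mentioned above.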
74

Fisheye Camera Calibration and Image Stitching for Automotive Applications

Söderroos, Anna January 2015 (has links)
Integrated camera systems for increasing safety and maneuverability are becoming increasingly common in heavy vehicles. One problem with heavy vehicles today is that there are blind spots where the driver has little or no view. There is great demand for increasing safety and helping the driver get a better view of the surroundings. This can be achieved with a sophisticated camera system, using cameras with a wide field of view, that covers dangerous blind spots. This master's thesis aims to investigate and develop a prototype solution for a camera system consisting of two fisheye cameras. The solution covers both hardware choices and software development, including camera calibration and image stitching. Two different fisheye camera calibration toolboxes are compared and their results discussed, with the aim of finding the one most suitable for this application. The results from the two toolboxes differ in performance, and the result from only one of them is sufficient for image stitching.
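What makes fisheye calibration different from ordinary calibration is the projection model. As a sketch (not from the thesis), the standard pinhole model maps a 3D point with image radius r = f·tan(θ), which diverges at 90°, while the common equidistant fisheye model uses r = f·θ, allowing fields of view of 180° and beyond:

```python
import numpy as np

def project_pinhole(point, f, cx, cy):
    # Standard pinhole projection: image radius r = f * tan(theta).
    X, Y, Z = point
    return (cx + f * X / Z, cy + f * Y / Z)

def project_equidistant_fisheye(point, f, cx, cy):
    # Equidistant fisheye model: the image radius grows linearly with
    # the incidence angle theta, r = f * theta, which is what allows
    # fields of view of 180 degrees and more.
    X, Y, Z = point
    theta = np.arctan2(np.hypot(X, Y), Z)
    phi = np.arctan2(Y, X)
    r = f * theta
    return (cx + r * np.cos(phi), cy + r * np.sin(phi))
```

Calibration toolboxes differ, among other things, in which of these models (plus distortion terms) they fit, which is one reason their results can diverge.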
75

Anomaly Detection for Product Inspection and Surveillance Applications / Anomalidetektion för produktinspektions- och övervakningsapplikationer

Thulin, Peter January 2015 (has links)
Anomaly detection is the general problem of detecting unusual patterns or events in data. This master thesis investigates anomaly detection in two different applications: product inspection using a camera, and surveillance using a 2D laser scanner. The first part of the thesis presents a system for automatic visual defect inspection. The system is based on aligning the images of the product to a common template and performing pixel-wise comparisons. The system is trained using only images of products that are defined as normal, i.e. non-defective products. The visual properties of the inspected products are modelled using three different methods, and the performance of the system and the methods has been evaluated on four different datasets. The second part of the thesis presents a surveillance system based on a single laser range scanner. The system is able to detect certain anomalous events based on the time, position and velocity of individual objects in the scene. The practical usefulness of the system is made plausible by a qualitative evaluation using unlabelled data.
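The "train on normal images only, then compare pixel-wise" idea can be illustrated with the simplest possible model. This sketch is not one of the thesis's three methods; it just shows the pattern: fit a per-pixel mean and standard deviation on aligned defect-free images, then flag test pixels that deviate by more than a chosen number of standard deviations.

```python
import numpy as np

class PixelAnomalyModel:
    # Models each pixel of aligned, defect-free training images with a
    # per-pixel mean and standard deviation; test pixels deviating by
    # more than `threshold` standard deviations are flagged as anomalous.
    def __init__(self, threshold=3.0, min_std=1e-3):
        self.threshold = threshold
        self.min_std = min_std  # floor to avoid division by ~zero

    def fit(self, normal_images):
        stack = np.stack(normal_images).astype(np.float64)
        self.mean = stack.mean(axis=0)
        self.std = np.maximum(stack.std(axis=0), self.min_std)
        return self

    def anomaly_map(self, image):
        # Per-pixel z-score against the model of "normal".
        return np.abs(image - self.mean) / self.std

    def is_defective(self, image, max_anomalous_pixels=0):
        return (self.anomaly_map(image) > self.threshold).sum() > max_anomalous_pixels
```

Note that this only works because the images are first aligned to a common template; without alignment, per-pixel statistics are meaningless.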
76

Photogrammetric methods for calculating the dimensions of cuboids from images / Fotogrammetriska metoder för beräkning av dimensionerna på rätblock från bilder

Lennartsson, Louise January 2015 (has links)
There are situations where you would like to know the size of an object but do not have a ruler nearby. However, it is likely that you are carrying a smartphone with an integrated digital camera, so imagine if you could snap a photo of the object to get a size estimate. In this project, different methods for finding the dimensions of a cuboid from a photograph are evaluated. A simple Android application implementing these methods has also been created. To be able to measure objects in images, we need to know how the scene is reproduced by the camera. This depends on the traits of the camera, called the intrinsic parameters. These parameters are unknown unless a camera calibration is performed, which is a non-trivial task. Because of this, eight smartphone cameras of different models were calibrated in search of similarities that could give grounds for generalisations. To determine the size of the cuboid, the scale needs to be known, which is why a reference object is used. In this project a credit card is used as reference, placed on top of the cuboid. The four corners of the reference as well as four corners of the cuboid are used to determine the dimensions of the cuboid. Two methods, one dependent on and one independent of the intrinsic parameters, are used to find the width and length, i.e. the sizes of the two dimensions that share a plane with the reference. These results are then used in another two methods to find the height of the cuboid. Errors were purposely introduced to the corners to investigate the performance of the different methods. The results show that the different methods perform very well and are all equally suitable for this type of problem. They also show that having correct reference corners is more important than having correct object corners, as the results were highly dependent on the accuracy of the reference corners. Another conclusion is that camera calibration is not strictly necessary, because different approximations of the intrinsic parameters can be used instead.
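The role of the reference object can be shown with a deliberately simplified sketch (not one of the thesis's methods, which handle perspective properly): when the reference and the measured edge lie in the same plane, roughly parallel to the image plane, a known credit card edge gives a millimetres-per-pixel scale directly.

```python
import numpy as np

# ISO/IEC 7810 ID-1 card dimensions in millimetres (credit card standard).
CARD_WIDTH_MM = 85.60
CARD_HEIGHT_MM = 53.98

def pixel_distance(p, q):
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

def estimate_size(ref_corners_px, obj_edge_px):
    # ref_corners_px: two adjacent card corners along its long edge.
    # obj_edge_px: the two endpoints of the cuboid edge to be measured.
    # Both lie in (approximately) the same fronto-parallel plane, so one
    # scale factor (mm per pixel) relates image distances to metric ones.
    mm_per_px = CARD_WIDTH_MM / pixel_distance(*ref_corners_px)
    return pixel_distance(*obj_edge_px) * mm_per_px
```

The scale factor is derived entirely from the reference corners, which is consistent with the conclusion above: errors in the reference corners propagate into every measurement, so their accuracy matters most.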
77

FPGA-Accelerated Dehazing by Visible and Near-infrared Image Fusion

Karlsson, Jonas January 2015 (has links)
Fog and haze can have a dramatic impact on vision systems for land and sea vehicles. The impact of such conditions on infrared images is not as severe as on standard images. By fusing images from two cameras, one ordinary and one near-infrared, a complete dehazing system with colour preservation can be achieved. By applying several different algorithms to an image set and evaluating the results, the most suitable image fusion algorithm has been identified. Using an FPGA, a programmable integrated circuit, a crucial part of the algorithm has been implemented. It is capable of producing processed images 30 times faster than a laptop computer. This implementation lays the foundation of a real-time dehazing system and provides a significant part of the full solution. The results show that such a system can be accomplished with an FPGA.
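A naive version of visible/NIR fusion with colour preservation can be sketched as follows. This is not the algorithm selected in the thesis, just an illustration of the principle: blend the visible luminance with the haze-penetrating NIR signal, then rescale the RGB channels so their colour ratios are preserved.

```python
import numpy as np

def fuse_visible_nir(rgb, nir, alpha=0.7):
    # Naive fusion: blend the visible luminance with the NIR image, which
    # penetrates haze better, then restore the colour ratios of the
    # original RGB image (colour preservation). alpha weights the NIR.
    rgb = rgb.astype(np.float64)
    lum = rgb.mean(axis=2)
    fused_lum = (1 - alpha) * lum + alpha * nir.astype(np.float64)
    scale = fused_lum / np.maximum(lum, 1e-6)
    return np.clip(rgb * scale[..., None], 0, 255)
```

Per-pixel multiply/blend operations like these map naturally onto a streaming FPGA pipeline, since each output pixel depends only on the corresponding input pixels.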
78

Object Detection and Semantic Segmentation Using Self-Supervised Learning

Gustavsson, Simon January 2021 (has links)
In this thesis, three well-known self-supervised methods have been implemented and trained on road scene images. The three so-called pretext tasks RotNet, MoCo v2, and DeepCluster were used to train a neural network in a self-supervised manner. The self-supervised networks were then evaluated with different amounts of labeled data on two downstream tasks: object detection and semantic segmentation. The performance of the self-supervised methods is compared to networks trained from scratch on the respective downstream task. The results show that it is possible to achieve a performance increase using self-supervision on a dataset containing only road scene images. When only a small amount of labeled data is available, the increase can be substantial, e.g., an mIoU improvement from 33 to 39 when training semantic segmentation on 1750 images with a RotNet pre-trained backbone compared to training from scratch. However, when a large amount of labeled images is available (>70000 images), the self-supervised pretraining does not increase performance as much, or at all.
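The appeal of a pretext task like RotNet is that its labels are generated from the data itself. As a sketch (data-preparation only; the network and training loop are omitted), each image is rotated by a random multiple of 90° and the network is trained to predict which rotation was applied:

```python
import numpy as np

def make_rotation_batch(images, rng):
    # RotNet pretext task: rotate each image by 0, 90, 180 or 270 degrees
    # and train the network to predict which rotation was applied.
    # The labels come for free, so no manual annotation is needed.
    rotated, labels = [], []
    for img in images:
        k = int(rng.integers(0, 4))  # number of quarter turns, 0..3
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return np.stack(rotated), np.array(labels)
```

Solving this 4-way classification task forces the backbone to learn features about object orientation and scene layout, which is what then transfers to the downstream tasks.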
79

Generating synthetic brain MR images using a hybrid combination of Noise-to-Image and Image-to-Image GANs

Schilling, Lennart January 2020 (has links)
Generative Adversarial Networks (GANs) have attracted much attention because of their ability to learn high-dimensional, realistic data distributions. In the field of medical imaging, they can be used to augment the often small image sets available; in this way, for example, the training of image classification or segmentation models can be improved to support clinical decision making. GANs can be distinguished according to their input: while Noise-to-Image GANs synthesize new images from a random noise vector, Image-to-Image GANs translate a given image into another domain. This study investigates whether the performance of a Noise-to-Image GAN, defined by the quality and diversity of its generated output, can be improved by using elements of a previously trained Image-to-Image GAN within its training. The data used consists of paired T1- and T2-weighted MR brain images. With the objective of generating additional T1-weighted images, a hybrid model (Hybrid GAN) is implemented that combines elements of a Deep Convolutional GAN (DCGAN) as the Noise-to-Image GAN and a Pix2Pix as the Image-to-Image GAN. Starting from a dependence on an input image, the model is gradually converted into a Noise-to-Image GAN. Performance is evaluated using an independent classifier that estimates the divergence between the generative output distribution and the real data distribution. When comparing the Hybrid GAN with the DCGAN baseline, no improvement could be observed, neither in the quality nor in the diversity of the generated images. Consequently, it could not be shown that the performance of a Noise-to-Image GAN is improved by using elements of a previously trained Image-to-Image GAN within its training.
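The Noise-to-Image input convention can be made concrete with a deliberately tiny, untrained sketch (nothing here is from the study; a real DCGAN generator uses transposed convolutions and adversarial training): the generator is simply a function from a noise vector to an image.

```python
import numpy as np

class TinyGenerator:
    # Minimal Noise-to-Image generator: one linear layer mapping a noise
    # vector to a flattened image, squashed to [-1, 1] with tanh.
    # An Image-to-Image generator would instead take an image as input.
    def __init__(self, noise_dim, img_shape, rng):
        self.img_shape = img_shape
        n_out = int(np.prod(img_shape))
        self.W = rng.standard_normal((noise_dim, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def __call__(self, z):
        # z: (batch, noise_dim) -> (batch, *img_shape)
        return np.tanh(z @ self.W + self.b).reshape((-1,) + self.img_shape)
```

The hybrid idea above amounts to gradually replacing the image input of a Pix2Pix-style generator with such a noise input during training.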
80

FPGA acceleration of superpixel segmentation

Östgren, Magnus January 2020 (has links)
Superpixel segmentation is a preprocessing step for computer vision applications in which an image is split into segments referred to as superpixels. Running the main algorithm on these superpixels reduces the number of data points processed, compared to running it on pixels directly, while retaining much of the same information. In this thesis, the possibility of running superpixel segmentation on an FPGA is researched. This has resulted in the development of a modified version of the SLIC algorithm (Simple Linear Iterative Clustering). An FPGA implementation of this algorithm has been built in VHDL; it is designed as a pipeline that unrolls the iterations of SLIC. The designed algorithm shows a lot of potential and runs on real hardware, but more work is required to make the implementation more robust and to remove some visual artefacts.
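The core of SLIC, which the pipeline above unrolls, is its assignment step. As a sketch (grayscale only, not the thesis's modified version): each cluster centre searches a limited 2S x 2S window and pixels are assigned by a distance combining intensity and spatial terms, weighted by the compactness parameter m.

```python
import numpy as np

def slic_assign(image, centers, m=10.0, S=8):
    # One SLIC assignment step: each pixel is assigned to the nearest
    # cluster centre within a 2S x 2S window, using a distance that
    # combines intensity and spatial terms:
    #   D = sqrt(dc^2 + (ds^2 / S^2) * m^2)
    # The limited search window is what makes SLIC O(N) per iteration.
    h, w = image.shape
    labels = -np.ones((h, w), dtype=int)
    best = np.full((h, w), np.inf)
    for idx, (cy, cx, cval) in enumerate(centers):
        y0, y1 = max(0, int(cy) - S), min(h, int(cy) + S + 1)
        x0, x1 = max(0, int(cx) - S), min(w, int(cx) + S + 1)
        y, x = np.mgrid[y0:y1, x0:x1]
        dc2 = (image[y0:y1, x0:x1] - cval) ** 2
        ds2 = (y - cy) ** 2 + (x - cx) ** 2
        D = np.sqrt(dc2 + (ds2 / S ** 2) * m ** 2)
        win = best[y0:y1, x0:x1]
        mask = D < win
        win[mask] = D[mask]
        labels[y0:y1, x0:x1][mask] = idx
    return labels
```

The bounded window and fixed per-pixel arithmetic are also what make the step hardware-friendly: each pixel touches only a constant number of candidate centres.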
