21

Global Pose Estimation from Aerial Images : Registration with Elevation Models

Grelsson, Bertil January 2014 (has links)
Over the last decade, the use of unmanned aerial vehicles (UAVs) has increased drastically. Originally, the use of these aircraft was mainly military, but today many civil applications have emerged. UAVs are frequently the preferred choice for surveillance missions in disaster areas, after earthquakes or hurricanes, and in hazardous environments, e.g. for detection of nuclear radiation. The UAVs employed in these missions are often relatively small in size, which implies payload restrictions. For navigation of the UAVs, continuous global pose (position and attitude) estimation is mandatory. Cameras can be fabricated both small in size and light in weight. This makes vision-based methods well suited for pose estimation onboard these vehicles. It is obvious that no single method can be used for pose estimation in all different phases throughout a flight. The image content will be very different on the runway, during ascent, during flight at low or high altitude, above urban or rural areas, etc. In total, a multitude of pose estimation methods is required to handle all these situations. Over the years, a large number of vision-based pose estimation methods for aerial images have been developed. But there are still open research areas within this field, e.g. the use of omnidirectional images for pose estimation is relatively unexplored. The contributions of this thesis are three vision-based methods for global ego-positioning and/or attitude estimation from aerial images. The first method, for full 6DoF (degrees of freedom) pose estimation, is based on registration of local height information with a geo-referenced 3D model. A dense local height map is computed using motion stereo. A pose estimate from navigation sensors is used as an initialization. The global pose is inferred from the 3D similarity transform between the local height map and the 3D model. Aligning height information is assumed to be more robust to seasonal variations than feature matching in a single-view based approach. The second contribution is a method for attitude (pitch and roll angle) estimation via horizon detection. It is one of only a few methods in the literature that use an omnidirectional (fisheye) camera for horizon detection in aerial images. The method is based on edge detection and a probabilistic Hough voting scheme. In a flight scenario, there is often some knowledge of the probability density for the altitude and the attitude angles. The proposed method allows this prior information to be used to make the attitude estimation more robust. The third contribution is a further development of the second method. It is the very first method presented where the attitude estimates from the detected horizon in omnidirectional images are refined through registration with the geometrically expected horizon from a digital elevation model. It is one of few methods where the ray refraction in the atmosphere is taken into account, which contributes to the highly accurate pose estimates. The attitude errors obtained are about one order of magnitude smaller than for any previous vision-based method for attitude estimation from horizon detection in aerial images.
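The first contribution hinges on estimating a 3D similarity transform between the local height map and the geo-referenced model. Below is a minimal sketch of that estimation step using Umeyama's closed-form solution on point correspondences; the point sets, the known-transform demo, and the assumption of given correspondences are illustrative stand-ins, not the thesis's actual registration pipeline.

```python
import numpy as np

def similarity_transform(local_pts, model_pts):
    """Estimate s, R, t such that model_pts ~ s * R @ local_pts + t (Umeyama)."""
    mu_l, mu_m = local_pts.mean(axis=0), model_pts.mean(axis=0)
    L, M = local_pts - mu_l, model_pts - mu_m
    cov = M.T @ L / len(local_pts)              # 3 x 3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # guard against a reflection
    R = U @ S @ Vt
    var_l = (L ** 2).sum() / len(local_pts)     # mean squared deviation of local points
    s = (D * np.diag(S)).sum() / var_l
    t = mu_m - s * R @ mu_l
    return s, R, t

# Hypothetical demo: a known transform is recovered from exact correspondences.
rng = np.random.default_rng(0)
local = rng.normal(size=(100, 3))               # stand-in local height map points
a = 0.4
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
model = 1.3 * local @ R_true.T + np.array([10.0, -4.0, 250.0])
s, R, t = similarity_transform(local, model)    # s ~ 1.3, t ~ (10, -4, 250)
```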
22

Development and Evaluation of a Kinect based Bin-Picking System

Mishra, Chintan, Khan, Zeeshan January 2015 (has links)
No description available.
23

Evaluation of Optical Flow for Estimation of Liquid Glass Flow Velocity

Rudin, Malin January 2021 (has links)
In the glass wool industry, the molten glass flow is monitored for regulation purposes. Given the progress in the computer vision field, the current monitoring solution might be replaced by a camera-based solution. The aim of this thesis is to investigate the possibility of using optical flow techniques for estimation of the molten glass flow displacement. Three glass melt flow datasets, as well as two additional melt flow datasets, were recorded using a NIR camera. The block matching techniques Full Search (FS) and Adaptive Rood Pattern Search (ARPS), as well as the local feature methods ORB and A-KAZE, were considered. These four techniques were compared to RAFT, the state-of-the-art approach for optical flow estimation, using available pre-trained models, as well as an approach using the tracking method ECO for optical flow estimation. The methods have been evaluated using the metrics MAE, MSE, and SSIM to compare the warped flow to the target image. In addition, ground truth for 50 frames from each dataset was manually annotated so that the optical flow metric End-Point Error could be used. To investigate the computational complexity, the average computation time per frame was calculated. The investigation found that RAFT does not perform well on the given data, due to the large displacements of the flows. For simulated displacements of up to about 100 pixels at full resolution, the performance is satisfactory, with results comparable to the traditional methods. Using ECO for optical flow estimation encounters similar problems as RAFT, where the large displacements proved challenging for the tracker. Simulating smaller motions of up to 60 pixels resulted in good performance, though the computation time of the implementation used is far too high for real-time use. The four traditional block matching and local feature approaches examined in this thesis outperform the state-of-the-art approaches. FS, ARPS, A-KAZE, and ORB all perform similarly on the glass flow datasets, whereas the block matching approaches fail on the alternative melt flow data because the template extraction approach is inadequate. The two local feature approaches, though working reasonably well on all datasets at full resolution, struggle to identify features on down-sampled data. This might be mitigated by fine-tuning the settings of the methods. Generally, ORB outperforms A-KAZE with respect to the evaluation metrics and is considerably faster.
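As a hedged illustration of two components the evaluation relies on, the sketch below computes the End-Point Error metric and performs exhaustive Full Search block matching with a sum-of-absolute-differences cost; the block size, search range, and float-valued frames are assumptions, not the settings used in the thesis.

```python
import numpy as np

def end_point_error(flow_est, flow_gt):
    """Mean Euclidean distance between estimated and ground-truth flow fields (H x W x 2)."""
    return np.linalg.norm(flow_est - flow_gt, axis=-1).mean()

def full_search(block, frame, top, left, search=32):
    """Displacement (dy, dx) minimising the SAD cost for one block; float arrays assumed."""
    h, w = block.shape
    best_sad, best = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > frame.shape[0] or x + w > frame.shape[1]:
                continue                      # candidate falls outside the frame
            sad = np.abs(frame[y:y + h, x:x + w] - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best
```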
24

Image-based fashion recommender systems : Considering Deep learning role in computer vision development

Shirkhani, Shaghayegh January 2021 (has links)
Fashion is perceived as a meaningful way of self-expression that people use for different purposes. It seems to be an integral part of every person in modern societies, from everyday life to exceptional events and occasions. Fashionable products are highly demanded, and consequently, fashion is perceived as a desirable and profitable industry. Although this massive demand for fashion products provides an excellent opportunity for companies to invest in fashion-related sectors, they also face different challenges in answering their customers' needs. Fashion recommender systems have been introduced to address these needs. This thesis aims to provide deeper insight into the fashion recommender system domain by conducting a comprehensive literature review of more than 100 papers in this field, focusing on image-based fashion recommender systems and considering computer vision advancements. Accounting for the domain-specific characteristics of fashion, the subtle notions of this domain and their relevance have been conceptualized. Four main tasks in image-based fashion recommender systems have been recognized: cloth-item retrieval, complementary item recommendation, outfit recommendation, and capsule wardrobes. An evolution trajectory of image-based fashion recommender systems with respect to computer vision advancements has been illustrated, consisting of three main eras and the most recent developments. Finally, a comparison between traditional computer vision techniques and deep learning-based ones has been made. Although the main objective of this literature review was to provide a comprehensive, integrated overview of research in this field, there is still a need for further studies considering image-based fashion recommender systems from a more practical perspective.
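To make the retrieval task concrete, the sketch below embeds catalogue images with a pretrained CNN and ranks them by cosine similarity to a query, which is the common skeleton behind cloth-item retrieval; the ResNet-18 backbone, the recent torchvision weights API, and the random stand-in images are assumptions, as the surveyed papers use many different models.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()      # keep the 512-d feature vector
backbone.eval()

@torch.no_grad()
def embed(batch):                      # batch: N x 3 x 224 x 224, normalised
    return F.normalize(backbone(batch), dim=1)

catalogue = embed(torch.randn(100, 3, 224, 224))   # stand-in catalogue images
query = embed(torch.randn(1, 3, 224, 224))         # stand-in query image
scores = (catalogue @ query.T).squeeze()           # cosine similarities
top5 = scores.topk(5).indices                      # indices of items to recommend
```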
25

Classification of black plastic granulates using computer vision / Classification of black plastic granulates using computer vision

Persson, Anton, Dymne, Niklas January 2021 (has links)
Pollution and climate change are some of the biggest challenges facing humanity, and for a sustainable future, recycling is needed. Plastic is a big part of the recycled material today, but there are problems that the recycling world is facing. Modern-day recycling facilities can handle plastics of all colours except black. For this reason, most recycling companies have resorted to methods unaffected by colour, like the method used at Stena Nordic Recycling Central. Because the individual plastic types are unknown, Stena Nordic Recycling Central has to wait until an entire bag of plastic granulates has been run through the production line and sorted before its purity can be tested using a chemical method. Finding out whether the electrostatic divider settings are correct using this testing method is costly and causes many re-runs. If the divider setting could be validated at an earlier stage, both time and the number of re-runs needed would be saved. This thesis aims to create a system that can classify different types of plastics by using image analysis. Two computer vision techniques are explored to solve this problem: the RGB method (see 3.3.2) and machine learning (see 3.3.4) using transfer learning with an AlexNet. The aim is an accuracy of at least 95% when classifying the plastic granulates. The Convolutional neural network used in this thesis is an AlexNet. The choice of which method to explore further is made in the method part of this thesis. The results of the RGB method, presented in section 4.2, were hard to draw conclusions from, as it was not clear whether one plastic was blacker than the other. This uncertainty, and the fact that a Convolutional neural network takes more features than just RGB into account, as discussed in section 3.3, made the Convolutional neural network the method to explore further in this thesis. The Convolutional neural network reached 95% accuracy in classifying the plastic granulates during training. A separate test was also needed to make sure the accuracy is close to the network accuracy. The result of the stand-alone test was 86.6% accuracy, where the plastic type Polystyrene had a subpar result of 73.3%, while Acrylonitrile butadiene styrene was classified with 100% accuracy. The results from the Convolutional neural network show that black plastics can be classified using machine learning, and this could be an excellent solution for classifying and recycling black plastics if further research in the field is conducted.
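A minimal sketch of the transfer-learning setup described above follows, assuming a recent torchvision: a pretrained AlexNet whose final layer is replaced with a two-class head (Polystyrene vs. Acrylonitrile butadiene styrene). The frozen backbone, optimizer, and learning rate are illustrative assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False                  # freeze the convolutional backbone
model.classifier[6] = nn.Linear(4096, 2)     # new head: PS vs. ABS

optimiser = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One step on a batch; images: N x 3 x 224 x 224, labels: N (0 = PS, 1 = ABS)."""
    optimiser.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimiser.step()
    return loss.item()
```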
26

Point Cloud Data Augmentation for Safe 3D Object Detection using Geometric Techniques

Kapoor, Shrayash January 2021 (has links)
Background: Autonomous navigation has become increasingly popular. This surge in popularity has generated considerable interest in sensor technologies, driving their cost down, which in turn has resulted in growing development of deep learning for computer vision. There is, however, little available, adaptable research on performing data augmentation directly on point cloud data, independent of the training process. This thesis focuses on the impact of point cloud augmentation techniques on 3D object detection quality. Objectives: The objectives of this thesis are to evaluate the efficiency of geometric data augmentation techniques for point cloud data. The identified techniques are then implemented on a 3D object detector, and the results obtained are compared based on selected metrics. Methods: This thesis uses two literature reviews to find appropriate point cloud techniques for data augmentation and a 3D object detector on which to implement it. Subsequently, an experiment is performed to quantitatively discern how much improvement augmentation offers in detection quality. Metrics used to compare the algorithms include precision, recall, average precision, mean average precision, memory usage, and training time. Results: The literature review results indicate flipping, scaling, translation, and rotation to be ideal candidates for performing geometric data augmentation, and ComplexYOLO to be a capable detector for 3D object detection. Experimental results indicate that, at the expense of some training time, the developed library "Aug3D" can boost the detection quality and results of the ComplexYOLO algorithm. Conclusions: Analysis of the results showed that implementing the geometric data augmentations (namely flipping, translation, scaling, and rotation) yielded an increase of over 50% in mean average precision for the ComplexYOLO 3D detection model on the Car and Pedestrian classes.
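As a sketch of the four geometric augmentations identified in the review, the function below applies random flipping, rotation, scaling, and translation to an N x 3 point cloud; the parameter ranges are illustrative guesses, not the settings of the developed Aug3D library.

```python
import numpy as np

def augment(points, rng=np.random.default_rng()):
    """Randomly flip, rotate, scale, and translate an N x 3 point cloud."""
    if rng.random() < 0.5:                        # flip: mirror the y coordinate
        points = points * np.array([1.0, -1.0, 1.0])
    a = rng.uniform(-np.pi / 4, np.pi / 4)        # rotation about the vertical (z) axis
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
    points = points @ R.T
    points = points * rng.uniform(0.95, 1.05)     # global scaling
    points = points + rng.uniform(-0.2, 0.2, 3)   # global translation (metres)
    return points
```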
27

Hyperspectral Image Registration and Construction From Irregularly Sampled Data

Freij, Hannes January 2021 (has links)
Hyperspectral imaging based on the use of an exponentially variable filter makes it possible to construct a lightweight hyperspectral sensor. The exponentially variable filter captures the whole spectral range in each image, where each column captures a different wavelength. Gathering the full spectrum for any given point in the scene therefore requires fusing several images captured with movement in between. The construction of a hyperspectral cube requires registration of the gathered images. A lightweight sensor can be mounted on an unmanned aerial vehicle to collect aerial footage. This thesis presents a registration algorithm capable of constructing a complete hyperspectral cube of almost any chosen area in the captured region. It presents the results of a construction method that uses a multi-frame super-resolution algorithm to increase the spectral resolution and a spline interpolation method to interpolate missing spectral data. The result of an algorithm that suggests the optimal spectral and spatial resolution before constructing the hyperspectral cube is also presented, as is the result of an algorithm providing information about the quality of the constructed hyperspectral cube.
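A hedged sketch of the spline-interpolation step follows: for one spatial position, the registered images yield reflectance samples at irregular wavelengths, and a cubic spline fills in the missing bands along the cube's spectral axis. The sample wavelengths and values are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical (wavelength, reflectance) samples gathered for one pixel.
wavelengths = np.array([450.0, 520.0, 610.0, 700.0, 840.0, 950.0])   # nm
reflectance = np.array([0.12, 0.18, 0.25, 0.31, 0.44, 0.40])

spline = CubicSpline(wavelengths, reflectance)
bands = np.linspace(450.0, 950.0, 101)   # the cube's regular spectral axis
spectrum = spline(bands)                 # interpolated full spectrum for this pixel
```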
28

Domain Adaptation of Unreal Images for Image Classification / Domänöversättning av syntetiska bilder för bildklassificiering

Thornström, Johan January 2019 (has links)
Deep learning has been intensively researched in computer vision tasks like image classification. Collecting and labeling images that these neural networks are trained on is labor-intensive, which is why alternative methods of collecting images are of interest. Virtual environments allow rendering images and automatic labeling, which could speed up the process of generating training data and reduce costs. This thesis studies the problem of transfer learning in image classification when the classifier has been trained on rendered images using a game engine and tested on real images. The goal is to render images using a game engine to create a classifier that can separate images depicting people wearing civilian clothing or camouflage. The thesis also studies how domain adaptation techniques using generative adversarial networks could be used to improve the performance of the classifier. Experiments show that it is possible to generate images that can be used for training a classifier capable of separating the two classes. However, the experiments with domain adaptation were unsuccessful. It is instead recommended to improve the quality of the rendered images in terms of features used in the target domain to achieve better results.
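A minimal sketch of the experimental setup, training on rendered images and measuring accuracy on real ones to expose the synthetic-to-real gap, is given below; the folder paths, ResNet-18 backbone, and two-class labels (civilian vs. camouflage) follow the abstract but are otherwise illustrative assumptions.

```python
import torch
import torchvision
from torchvision import transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
# Hypothetical folder layout: one subdirectory per class under each root.
rendered = torchvision.datasets.ImageFolder("data/rendered", transform=tf)  # training set
real = torchvision.datasets.ImageFolder("data/real", transform=tf)          # test set

model = torchvision.models.resnet18(weights=None, num_classes=2)  # civilian vs. camouflage

@torch.no_grad()
def accuracy(model, dataset):
    """Fraction of correctly classified images; run on `real` after training on `rendered`."""
    model.eval()
    loader = torch.utils.data.DataLoader(dataset, batch_size=32)
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```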
29

Vehicle Detection, at a Distance : Done Efficiently via Fusion of Short- and Long-Range Images / Fordonsdetektion, på avstånd

Luusua, Emil January 2020 (has links)
Object detection is a classical computer vision task, encountered in many practical applications such as robotics and autonomous driving. The latter involves serious consequences of failure and a multitude of challenging demands, including high computational efficiency and detection accuracy. Distant objects are notably difficult to detect accurately due to their small scale in the image, consisting of only a few pixels. This is especially problematic in autonomous driving, as objects should be detected at the earliest possible stage to facilitate handling of hazardous situations. Previous work has addressed small objects via use of feature pyramids and super-resolution techniques, but the efficiency of such methods is limited as computational cost increases with image resolution. Therefore, a trade-off must be made between accuracy and cost. Opportunely though, a common characteristic of driving scenarios is the predominance of distant objects in the centre of the image. Thus, the full-frame image can be downsampled to reduce computational cost, and a crop can be extracted from the image centre to preserve resolution for distant vehicles. In this way, short- and long-range images are generated. This thesis investigates the fusion of such images in a convolutional neural network, particularly the fusion level, fusion operation, and spatial alignment. A novel framework — DetSLR — is proposed for the task and examined via the aforementioned aspects. Through adoption of the framework for the well-established SSD detector and MobileNetV2 feature extractor, it is shown that the framework significantly improves upon the original detector without incurring additional cost. The fusion level is shown to have great impact on the performance of the framework, favouring high-level fusion, while only insignificant differences exist between investigated fusion operations. Finally, spatial alignment of features is demonstrated to be a crucial component of the framework.
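As a sketch of the input generation the framework builds on, the function below produces a coarse short-range image from the full frame and a centre crop at native resolution for distant vehicles; the decimation factor and crop size are assumptions, and naive decimation stands in for proper anti-aliased downsampling.

```python
import numpy as np

def split_ranges(frame, down=4, crop=512):
    """frame: H x W x 3 array. Returns (short_range, long_range) detector inputs."""
    short_range = frame[::down, ::down]              # whole scene, coarse resolution
    h, w = frame.shape[:2]
    y0, x0 = (h - crop) // 2, (w - crop) // 2
    long_range = frame[y0:y0 + crop, x0:x0 + crop]   # image centre, native resolution
    return short_range, long_range

# Example: a 1080p frame yields a 270 x 480 overview and a 512 x 512 centre crop.
frame = np.zeros((1080, 1920, 3), dtype=np.float32)
short_range, long_range = split_ranges(frame)
```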
30

Obstacle avoidance for platforms in three-dimensional environments / Kollisionsundvikande metoder för plattformar i tredimensionella miljöer

Ekström, Johan January 2016 (has links)
The field of obstacle avoidance is well researched. Despite this, research on obstacle avoidance in three dimensions is surprisingly sparse. For platforms which are able to navigate three-dimensional space, such as multirotor UAVs, such methods will become more common. In this thesis, an obstacle avoidance method intended for a three-dimensional environment is presented. First, the method reduces the dimensionality of the three-dimensional world to two dimensions by projecting obstacle observations onto a two-dimensional spherical depth map, retaining information on direction and distance to obstacles. Next, the method accounts for the dimensions of the platform by applying a post-processing step to the depth map. Finally, knowing the motion model, a look-ahead verification step is taken, using information from the depth map, to ensure that the platform does not collide with any obstacles, by disallowing control inputs which lead to collisions. If there are multiple control input candidates after verification that lead to velocity vectors close to a desired velocity vector, a heuristic cost function is used to select a single control input, where similarity in direction and magnitude between the resulting and desired velocity vectors is valued. Evaluation of the method reveals that platforms are able to maintain distances to obstacles. However, more work is suggested in order to improve the reliability of the method and to perform a real-world evaluation.
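A minimal sketch of the method's first step is given below: projecting 3D obstacle observations onto a two-dimensional spherical depth map indexed by azimuth and elevation, keeping the nearest distance per cell. The grid resolution is an assumption.

```python
import numpy as np

def spherical_depth_map(points, n_az=180, n_el=90):
    """points: N x 3 obstacle observations in the platform frame. Returns n_el x n_az map."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                               # distance to obstacle
    az = np.arctan2(y, x)                                            # azimuth in [-pi, pi)
    el = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))      # elevation
    i = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    j = ((el + np.pi / 2) / np.pi * n_el).astype(int).clip(0, n_el - 1)
    depth = np.full((n_el, n_az), np.inf)
    np.minimum.at(depth, (j, i), r)                                  # nearest obstacle per cell
    return depth
```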
