621

Visual control of multi-rotor UAVs

Duncan, Stuart Johann Maxwell January 2014 (has links)
Recent miniaturization of computer hardware, MEMS sensors, and high-energy-density batteries has enabled highly capable mobile robots to become available at low cost. This has driven the rapid expansion of interest in multi-rotor unmanned aerial vehicles. Another area which has expanded simultaneously is small, powerful computers, in the form of smartphones, which nearly always have a camera attached and many of which now contain an OpenCL-compatible graphics processing unit. By combining the results of those two developments, a low-cost multi-rotor UAV can be produced with a low-power onboard computer capable of real-time computer vision. The system should also use general-purpose computer vision software to facilitate a variety of experiments. To demonstrate this I have built a quadrotor UAV based on control hardware from the Pixhawk project, and paired it with an ARM-based single-board computer similar to those in high-end smartphones. The quadrotor weighs 980 g and has a flight time of 10 minutes. The onboard computer is capable of running a pose estimation algorithm above the 10 Hz requirement for stable visual control of a quadrotor. A feature tracking algorithm was developed for efficient pose estimation, which relaxed the requirement for outlier rejection during matching. Compared with a RANSAC-only algorithm the pose estimates were less variable, with a Z-axis standard deviation of 0.2 cm compared with 2.4 cm for RANSAC. Processing time per frame was also faster with tracking, with 95% confidence that tracking would process the frame within 50 ms, while for RANSAC the 95% confidence time was 73 ms. The onboard computer ran the algorithm with a total system load of less than 25%. All computer vision software uses the OpenCV library for common computer vision algorithms, fulfilling the requirement for running general-purpose software.
The tracking algorithm was used to demonstrate the capability of the system by performing visual servoing of the quadrotor (after manual takeoff). Response to external perturbations was poor, however, requiring manual intervention to avoid crashing. This was due to poor visual controller tuning, and to variations in image acquisition and attitude estimate timing caused by using free-running image acquisition. The system, and the tracking algorithm, serve as proof of concept that visual control of a quadrotor is possible using small low-power computers and general-purpose computer vision software.
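The abstract contrasts feature tracking against a RANSAC-only pipeline. As a rough illustration of the outlier-rejection step RANSAC performs during matching (a toy 2D line fit, not the thesis's 6-DoF pose estimator), a minimal sketch might look like:

```python
import random

def ransac_line(points, n_iters=200, thresh=0.5, seed=0):
    """Minimal RANSAC: fit y = a*x + b while rejecting outliers.

    Illustrative only; pose estimation would instead fit a 6-DoF pose
    to 2D-3D feature correspondences.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample
        if x1 == x2:
            continue  # vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for (x, y) in points
                   if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# 20 points lie exactly on y = 2x + 1; 5 gross outliers are mixed in.
pts = [(x, 2 * x + 1) for x in range(20)] + [(x, 40 - 5 * x) for x in range(5)]
(a, b), inliers = ransac_line(pts)
```

Each iteration hypothesizes a model from a minimal sample and scores it by its consensus set; the tracking approach in the text avoids paying this per-frame cost by relaxing the need for outlier rejection.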
622

Human Action Recognition on Videos: Different Approaches

Mejia, Maria Helena January 2012 (has links)
The goal of human action recognition on videos is to determine automatically what is happening in a video. This work focuses on answering the question: given consecutive frames from a video where one or more persons are performing an action, can an automatic system recognize the action that is going on for each person? Seven approaches are provided, most of them based on an alignment process in order to find a measure of distance or similarity for obtaining the classification. Some are based on fluents that are converted to qualitative sequences of Allen relations, making it possible to measure the distance between a pair of sequences by aligning them. The fluents are generated in various ways: representations based on feature extraction of human pose propositions in a single image or a small sequence of images, changes of time series mainly in the angle of slope, changes of the time series focused on the slope direction, and propositions based on symbolic sequences generated by SAX. Another approach based on alignment applies Dynamic Time Warping to subsets of highly dependent parts of the body. An additional approach is based on SAX symbolic sequences and their pairwise alignment. The last approach is based on discretization of the multivariate time series, but instead of alignment, a spectrum kernel and SVM are used, as is done to classify protein sequences in biology. Finally, a sliding window method is used to recognize the actions along the video. These approaches were tested on three datasets derived from RGB-D cameras (e.g., Microsoft Kinect) as well as ordinary video, and a selection of the approaches was compared to the results of other researchers.
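One of the approaches above applies Dynamic Time Warping to time series from body parts. A minimal DTW distance between two 1-D sequences (an illustrative sketch, not the author's implementation) can be written as:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    D[i][j] is the cost of the best alignment of a[:i] with b[:j];
    each cell extends an alignment by a match, insertion or deletion.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # deletion
                                 D[i][j - 1],      # insertion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A time-stretched copy of a motion signal aligns at zero cost, while a
# lock-step (Euclidean) comparison would penalize the tempo difference.
ref = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 1, 1, 2, 2, 3, 3, 2, 1, 0]
d = dtw_distance(ref, stretched)  # 0.0: warping absorbs the stretch
```

This tolerance to speed variation is what makes alignment-based distances attractive for comparing the same action performed at different tempos.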
623

Evaluation of online hardware video stabilization on a moving platform / Utvärdering av hårdvarustabilisering av video i realtid på rörlig plattform

Gratorp, Eric January 2013 (has links)
Recording a video sequence with a camera during movement often produces blurred results. This is mainly due to motion blur, which is caused by rapid movement of objects in the scene or of the camera during recording. By correcting for changes in the orientation of the camera, caused by e.g. uneven terrain, it is possible to minimize the motion blur and thus produce a stabilized video. To do this, data gathered from a gyroscope and the camera itself can be used to measure the orientation of the camera. The raw data needs to be processed, synchronized and filtered to produce a robust estimate of the orientation. This estimate can then be used as input to an automatic control system in order to correct for changes in the orientation. This thesis focuses on examining the possibility of such a stabilization; the actual stabilization is left for future work. An evaluation of the hardware as well as the implemented methods is done with emphasis on speed, which is crucial in real-time computing.
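The abstract does not specify which filter combines the gyroscope and camera data; one common, simple choice for this kind of fusion is a complementary filter, sketched here under that assumption:

```python
def complementary_filter(gyro_rates, cam_angles, dt, alpha=0.98):
    """Blend integrated gyro rate (fast but drifting) with an absolute
    camera-derived angle (slower but drift-free):

        angle <- alpha * (angle + rate * dt) + (1 - alpha) * cam_angle
    """
    angle = cam_angles[0]
    estimates = []
    for rate, cam in zip(gyro_rates, cam_angles):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * cam
        estimates.append(angle)
    return estimates

# Constant true rate of 10 deg/s; the gyro carries a 1 deg/s bias.
dt, n = 0.01, 1000
gyro = [10.0 + 1.0] * n                        # biased rate measurements
cam = [10.0 * dt * (k + 1) for k in range(n)]  # noiseless camera angles
est = complementary_filter(gyro, cam, dt)
raw = sum(g * dt for g in gyro)                # pure integration drifts
```

Pure integration of the biased gyro drifts by 10 degrees over the 10-second run, while the filtered estimate stays within a fraction of a degree of the camera reference, which is why the raw data must be filtered as the abstract describes.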
624

Shot classification in broadcast soccer video.

Guimaraes, Lionel. January 2013 (has links)
Event understanding systems, which automatically generate human-relatable event descriptions from video sequences, address an open problem in computer vision research with many applications in the sports domain, such as indexing and retrieval systems for sports video. Background modelling and shot classification of broadcast video are important steps in event understanding in video sequences. Shot classification seeks to label continuous frame sequences captured by a single camera action, such as long shot, close-up and audience shot, while background modelling seeks to classify pixels in an image as foreground or background. Many features used for shot classification are built upon the background model; background modelling is therefore an essential part of shot classification. This dissertation reports on an investigation into techniques and procedures for background modelling and classification of shots in broadcast soccer videos. Broadcast video refers to video which would typically be viewed by a person at home on their television set, and it imposes constraints that are often not considered in many approaches to event detection. In this work we analyse the performance of two background modelling techniques appropriate for broadcast video: the colour distance model and the Gaussian mixture model. The performance of the background models depends on correctly set parameters. Some techniques offer better updating schemes and thus adapt better to the changing conditions of a game; some are shown to be more robust to changes in broadcast technique and are therefore of greater value in shot classification. Our results show the colour distance model slightly outperformed the Gaussian mixture model, with both techniques performing similarly to those found in the literature. Many features useful for shot classification have been proposed in the literature.
This dissertation identifies these features and presents a detailed analysis and comparison of various features appropriate for shot classification in broadcast soccer video. Once a feature set is established, a classifier is required to determine a shot class based on the extracted features. We establish the feature set and decision tree parameters that result in the best performance, and then use a combined feature set to train a neural network to classify shots. The combined feature set in conjunction with the neural network classifier proved effective in classifying shots and in some situations outperformed the techniques found in the literature. / Thesis (M.Sc.)-University of KwaZulu-Natal, Durban, 2012.
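As a rough sketch of per-pixel background modelling (a single-Gaussian simplification of the Gaussian mixture model discussed above, not the dissertation's code), each pixel keeps a running mean and variance and flags large deviations as foreground:

```python
import math

def update_background(mean, var, pixel, lr=0.05, k=2.5):
    """Per-pixel running-Gaussian background model.

    A pixel farther than k standard deviations from the background
    mean is labelled foreground; background pixels update the model.
    Returns (is_foreground, new_mean, new_var).
    """
    d = pixel - mean
    is_fg = abs(d) > k * math.sqrt(var)
    if not is_fg:  # only background observations adapt the model
        mean = mean + lr * d
        var = (1 - lr) * var + lr * d * d
    return is_fg, mean, var

# Simulate a pitch-green pixel with a bright player passing over it.
mean, var = 100.0, 16.0
stream = [101, 99, 100, 102, 250, 255, 248, 100, 101]
labels = []
for p in stream:
    fg, mean, var = update_background(mean, var, p, lr=0.1)
    labels.append(fg)
```

A mixture model extends this by keeping several (mean, variance, weight) components per pixel, which handles multi-modal backgrounds such as flickering advertising boards.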
625

A Probabilistic Approach to Image Feature Extraction, Segmentation and Interpretation

Pal, Chris January 2000 (has links)
This thesis describes a probabilistic approach to image segmentation and interpretation. The focus of the investigation is the development of a systematic way of combining color, brightness, texture and geometric features extracted from an image to arrive at a consistent interpretation for each pixel in the image. The contribution of this thesis is thus the presentation of a novel framework for the fusion of extracted image features producing a segmentation of an image into relevant regions. Further, a solution to the sub-pixel mixing problem is presented based on solving a probabilistic linear program. This work is specifically aimed at interpreting and digitizing multi-spectral aerial imagery of the Earth's surface. The features of interest for extraction are those of relevance to environmental management, monitoring and protection. The presented algorithms are suitable for use within a larger interpretive system. Some results are presented and contrasted with other techniques. The integration of these algorithms into a larger system is based firmly on a probabilistic methodology and the use of statistical decision theory to accomplish uncertain inference within the visual formalism of a graphical probability model.
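One simple instance of probabilistically combining independent per-pixel features into a label posterior is naive-Bayes-style fusion; the thesis's graphical model is more elaborate, but the core multiplication of evidence looks like this:

```python
def fuse_features(priors, likelihoods):
    """Combine per-feature likelihoods into a normalized posterior
    over pixel labels, assuming the features are conditionally
    independent given the label (naive Bayes).

    priors: {label: P(label)}
    likelihoods: list of {label: P(feature | label)}, one per feature.
    """
    post = dict(priors)
    for feat in likelihoods:
        for label in post:
            post[label] *= feat[label]
    z = sum(post.values())  # normalize so the posterior sums to 1
    return {label: p / z for label, p in post.items()}

# Colour strongly suggests "water"; texture mildly agrees.
posterior = fuse_features(
    {"water": 0.5, "forest": 0.5},
    [{"water": 0.9, "forest": 0.1},   # colour feature
     {"water": 0.6, "forest": 0.4}],  # texture feature
)
```

The labels, feature values and probabilities here are invented for illustration; in the aerial-imagery setting each likelihood table would come from a trained per-feature model.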
626

Learning generative models of mid-level structure in natural images

Heess, Nicolas Manfred Otto January 2012 (has links)
Natural images arise from complicated processes involving many factors of variation. They reflect the wealth of shapes and appearances of objects in our three-dimensional world, but they are also affected by factors such as distortions due to perspective, occlusions, and illumination, giving rise to structure with regularities at many different levels. Prior knowledge about these regularities and suitable representations that allow efficient reasoning about the properties of a visual scene are important for many image processing and computer vision tasks. This thesis focuses on models of image structure at intermediate levels of complexity as required, for instance, for image inpainting or segmentation. It aims at developing generative, probabilistic models of this kind of structure, and, in particular, at devising strategies for learning such models in a largely unsupervised manner from data. One hallmark of natural images is that they can often be decomposed into regions with very different visual characteristics. The main approach of this thesis is therefore to represent images in terms of regions that are characterized by their shapes and appearances, and an image is then composed from many such regions. We explore approaches to learn about the appearance of regions, to learn about region shapes, and ways to combine several regions to form a full image. To achieve this goal, we make use of some ideas for unsupervised learning developed in the literature on models of low-level image structure and in the “deep learning” literature. These models are used as building blocks of more structured model formulations that incorporate additional prior knowledge of how images are formed. The thesis makes the following contributions: Firstly, we investigate a popular, MRF-based prior of natural image structure, the Field of Experts, with respect to its ability to model image textures, and propose an extended formulation that is considerably more successful at this task.
This formulation gives rise to a fully parametric, translation-invariant probabilistic generative model of image textures. We illustrate how this model can be used as a component of a more comprehensive model of images comprising multiple textured regions. Secondly, we develop a model of region shape. This work is an extension of the “Masked Restricted Boltzmann Machine” proposed by Le Roux et al. (2011) and it allows explicit reasoning about the independent shapes and relative depths of occluding objects. We develop an inference and unsupervised learning scheme and demonstrate how this shape model, in combination with the masked RBM, gives rise to a good model of natural image patches. Finally, we demonstrate how this model of region shape can be extended to model shapes in large images. The result is a generative model of large images which are formed by composition from many small, partially overlapping and occluding objects.
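For readers unfamiliar with the RBM machinery these masked models build on, a toy block-Gibbs step for a binary RBM (a generic sketch, not the thesis's masked formulation) is:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_step(v, W, b_vis, b_hid, rng):
    """One block-Gibbs step of a binary restricted Boltzmann machine:
    sample all hidden units given the visible units, then resample the
    visible units given the hiddens. Conditional independence within
    each layer is what makes block sampling possible."""
    n_vis, n_hid = len(b_vis), len(b_hid)
    p_h = [sigmoid(b_hid[j] + sum(W[i][j] * v[i] for i in range(n_vis)))
           for j in range(n_hid)]
    h = [1 if rng.random() < p else 0 for p in p_h]
    p_v = [sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
           for i in range(n_vis)]
    v_new = [1 if rng.random() < p else 0 for p in p_v]
    return v_new, h, p_h

rng = random.Random(0)
W = [[2.0, -2.0], [2.0, -2.0], [-2.0, 2.0]]  # 3 visible x 2 hidden weights
v, h, p_h = gibbs_step([1, 1, 0], W, [0.0] * 3, [0.0] * 2, rng)
```

The masked RBM in the text augments this basic machinery with explicit shape masks so that occluding regions can be reasoned about separately.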
627

Investigations of an "Objectness" Measure for Object Localization

Coates, Lewis Richard James 18 May 2016 (has links)
Object localization is the task of locating objects in an image, typically by finding bounding boxes that isolate those objects. Identifying objects in images that have not had regions of interest labeled by humans often requires object localization to be performed first. The sliding window method is a common naïve approach, wherein the image is covered with bounding boxes of different sizes that form windows in the image. An object classifier is then run on each of these windows to determine if each given window contains a given object. However, because object classification algorithms tend to be computationally expensive, it is helpful to have an effective filter to reduce the number of times those classifiers have to be run. In this thesis I evaluate one promising approach to object localization: the objectness algorithm proposed by Alexe et al. Specifically, I verify the results given by Alexe et al., and further explore the weaknesses and strengths of their "objectness" measure.
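The sliding window enumeration described above can be sketched as follows (illustrative window sizes and strides; an objectness score would then prune most of these boxes before the expensive classifier runs):

```python
def sliding_windows(img_w, img_h, min_size=32, scale=1.5, stride_frac=0.5):
    """Enumerate square multi-scale sliding windows over an image.

    Returns (x, y, w, h) boxes; each scale uses a stride proportional
    to the window size so overlap stays roughly constant.
    """
    boxes = []
    size = min_size
    while size <= min(img_w, img_h):
        step = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                boxes.append((x, y, size, size))
        size = int(size * scale)  # next, coarser scale
    return boxes

boxes = sliding_windows(128, 128)  # 70 candidate windows on a 128x128 image
```

Even on this tiny image the naive scheme produces dozens of windows, which is why a cheap objectness filter pays off before running a heavyweight classifier on each one.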
628

Choosing where to go : mobile robot exploration

Shade, Robert J. January 2011 (has links)
For a mobile robot to engage in exploration of a priori unknown environments, it must be able to identify locations which will yield new information when visited. This thesis presents two novel algorithms which attempt to answer the question of choosing where a robot should go next in a partially explored workspace. To begin, we describe the process of acquiring highly accurate dense 3D data from a stereo camera. This approach combines techniques from a number of existing implementations and is demonstrated to be more accurate than a range of commercial offerings. Combined with state-of-the-art visual-odometry-based pose estimation, we can use these point clouds to drive exploration. The first exploration algorithm we present is an attempt to represent the three-dimensional world as a continuous two-dimensional surface. The surface is maintained as a planar graph structure in which vertices correspond to points in space as seen by the stereo camera. Edges connect vertices which have been seen as adjacent pixels in a stereo image pair, and have a weight equal to the Euclidean distance between the end points. Discontinuities in the input stereo data manifest as areas of the graph with high average edge weight, and by moving the camera to view such areas and merging the new scan with the existing graph, we push back the boundary of the explored workspace. Motivated by scaling and precision problems with the graph-based method, we present a second exploration algorithm based on continuum methods. We show that by solving Laplace’s equation over the freespace of the partially explored environment, we can guide exploration by following streamlines in the resulting vector field. Choosing appropriate boundary conditions ensures that these streamlines run parallel to obstacles and are guaranteed to lead to a frontier – a boundary between explored and unexplored space.
Results are shown which demonstrate this method fully exploring three-dimensional environments and outperforming commonly used information-gain-based approaches. We show how analysis of the potential field solution can be used to identify volumes of the workspace which have been fully explored, thus reducing future computation.
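The continuum method above can be illustrated on a toy occupancy grid: fix the robot cell at high potential and a frontier cell at zero, relax Laplace's equation over free space, then walk downhill. This is a crude discrete stand-in for the thesis's streamlines, with simplified obstacle boundary handling:

```python
def solve_laplace(grid, iters=500):
    """Jacobi iteration for Laplace's equation over free cells.

    'S' = robot (potential fixed at 1), 'F' = frontier (fixed at 0),
    '#' = obstacle, '.' = free space.
    """
    h, w = len(grid), len(grid[0])
    u = [[1.0 if c == 'S' else 0.0 for c in row] for row in grid]
    for _ in range(iters):
        nxt = [row[:] for row in u]
        for y in range(h):
            for x in range(w):
                if grid[y][x] != '.':
                    continue  # boundary cells keep their fixed values
                vals = [u[ny][nx]
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] != '#']
                nxt[y][x] = sum(vals) / len(vals)
        u = nxt
    return u

def descend(grid, u, start, max_steps=100):
    """Follow steepest descent from the robot. A harmonic potential has
    no interior minima, so the walk must terminate at a frontier."""
    y, x = start
    path = [(y, x)]
    for _ in range(max_steps):
        if grid[y][x] == 'F':
            break
        nbrs = [(ny, nx)
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                if 0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                and grid[ny][nx] != '#']
        y, x = min(nbrs, key=lambda p: u[p[0]][p[1]])
        path.append((y, x))
    return path

grid = ["S...F",
        ".....",
        "..#..",
        ".....",
        "....."]
u = solve_laplace(grid)
path = descend(grid, u, (0, 0))
```

The guarantee the abstract mentions falls out of the maximum principle for harmonic functions: free cells can never be local minima of the potential, so gradient descent cannot get stuck anywhere except at a frontier.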
629

Strukturální rozpoznávání fasád / Structural recognition of facades

Dobiaš, Martin January 2010 (has links)
We investigate a method for interpretation of facades from single images. The emphasis is on the separation of knowledge about facade structure from the detection of facade elements. The interpretation task is formulated as a Bayesian inference problem of finding the maximum a posteriori estimate. A stochastic model that encompasses the structural knowledge about facade elements is presented, and it is used together with an integrated classifier to determine the correct positions of facade elements. We construct a Markov chain Monte Carlo sampler that solves the problem. Various improvements of the model and sampling algorithm are discussed. Finally, we propose a more general approach for structural recognition using context-free grammars that could be used for other computer vision tasks.
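A generic sketch of the MCMC machinery the abstract describes, searching for a maximum a posteriori state with a Metropolis-Hastings sampler (the one-dimensional toy posterior here stands in for the facade-specific model):

```python
import math
import random

def metropolis_map(log_post, propose, x0, n_steps=5000, seed=1):
    """Metropolis-Hastings with a symmetric proposal; also tracks the
    best (highest log-posterior) state visited as a MAP estimate."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    best_x, best_lp = x, lp
    for _ in range(n_steps):
        y = propose(x, rng)
        lq = log_post(y)
        # Accept uphill moves always, downhill with probability exp(lq-lp).
        if lq >= lp or rng.random() < math.exp(lq - lp):
            x, lp = y, lq
            if lp > best_lp:
                best_x, best_lp = x, lp
    return best_x

# Toy unnormalized log-posterior with its mode at 3.0.
log_post = lambda x: -((x - 3.0) ** 2)
propose = lambda x, rng: x + rng.gauss(0.0, 0.5)
x_map = metropolis_map(log_post, propose, x0=0.0)
```

In the facade setting the state would instead be a structured configuration of element positions, and the proposal would add, remove or move elements, but the accept/reject logic is the same.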
630

Obstacle detection for image-guided surface water navigation

Sadhu, Tanmana 09 September 2016 (has links)
A maritime safety concern when operating a small to medium-sized sailboat is that floating logs in the navigational route can cause a severe collision if undetected. As a precautionary measure to prevent such a collision with a log, a 2D vision-based detection system is proposed. We take a combined approach involving predictive mapping by linear regression and saliency detection. This approach is found to overcome specific issues related to illumination changes and the unstructured environment in the dataset. The proposed method has been evaluated using precision and recall measures. This proof of concept demonstrates the potential of the method for deployment in a real-time onboard detection system. The algorithm is robust and of reasonable computational complexity.
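The predictive-mapping step above relies on linear regression; a minimal closed-form ordinary least squares fit (an illustrative sketch of the regression machinery, not the author's exact model or features) is:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, in closed form:
    a = cov(x, y) / var(x), b = mean(y) - a * mean(x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

# Noiseless data on y = 2x + 1 is recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

In the detection pipeline, such a regression could, for example, map image row to expected water appearance so that deviations become candidate obstacles; that specific use is an assumption here, not a detail stated in the abstract.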
