31

Bring Your Body into Action : Body Gesture Detection, Tracking, and Analysis for Natural Interaction

Abedan Kondori, Farid January 2014 (has links)
Due to the large influx of computers into our daily lives, human-computer interaction has become crucially important. For a long time, focusing on what users need has been critical for designing interaction methods. However, a new perspective tends to extend this attitude to encompass how human desires, interests, and ambitions can be met and supported. This implies that the way we interact with computers should be revisited. Centralizing human values rather than user needs is of the utmost importance for providing new interaction techniques. These values drive our decisions and actions, and are essential to what makes us human. This motivated us to introduce new interaction methods that will support human values, particularly human well-being. The aim of this thesis is to design new interaction methods that will empower humans to have a healthy, intuitive, and pleasurable interaction with tomorrow’s digital world. To achieve this aim, this research is concerned with developing theories and techniques for exploring interaction methods beyond the keyboard and mouse, utilizing the human body. The thesis therefore addresses a very fundamental problem: human motion analysis. The technical contributions of this thesis introduce computer vision-based, marker-less systems to estimate and analyze body motion. The main focus of this research work is on head and hand motion analysis, since these are the body parts most frequently used for interacting with computers. This thesis gives an insight into the technical challenges and provides new perspectives and robust techniques for solving the problem.
32

AUV SLAM constraint formation using side scan sonar / AUV SLAM Begränsningsbildning med hjälp av sidescan sonar

Schouten, Marco January 2022 (has links)
Autonomous underwater vehicle (AUV) navigation has long been a challenging problem, due to the drift present in underwater environments and the lack of precise localisation systems such as GPS. The uncertainty of the vehicle’s pose therefore grows with the mission’s duration. This research investigates methods to form constraints on the vehicle’s pose throughout typical surveys. Current underwater navigation relies on acoustic sensors. Side Scan Sonar (SSS) is cheaper than a Multibeam Echosounder (MBES), but generates 2D intensity images of wide sections of the seafloor instead of 3D representations. The methodology consists of extracting information from pairs of side-scan sonar images representing overlapping portions of the seafloor, and computing the sensor pose transformation between the two image reference frames to generate constraints on the pose. The chosen approach relies on optimisation methods within a Simultaneous Localisation and Mapping (SLAM) framework to directly correct the trajectory and provide the best estimate of the AUV pose. I tested the optimisation system on simulated data to evaluate the proof of concept. Lastly, as an experimental trial, I tested the implementation on an annotated dataset of overlapping side-scan sonar images provided by SMaRC. The simulated results indicate that the AUV pose error can be reduced by optimisation, even with various noise levels in the measurements.
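The core idea of the optimisation step, treating odometry and loop-closure measurements as soft constraints and solving for the trajectory that best satisfies them all, can be sketched in miniature. The toy below is a hypothetical illustration (2D positions only, no rotations or sonar processing, plain gradient descent rather than a SLAM back-end), not the thesis's implementation:

```python
def optimize_poses(odometry, loop, iters=500, lr=0.1):
    """Least-squares trajectory correction sketch.

    odometry[i] is the measured step (dx, dy) from pose i to pose i+1;
    loop = (i, j, (dx, dy)) constrains pose j relative to pose i.
    """
    n = len(odometry) + 1
    # Dead-reckoned initial guess: integrate odometry from the origin.
    poses = [(0.0, 0.0)]
    for dx, dy in odometry:
        px, py = poses[-1]
        poses.append((px + dx, py + dy))
    i0, j0, (lx, ly) = loop
    for _ in range(iters):
        grads = [[0.0, 0.0] for _ in range(n)]
        # Odometry residuals: pose_{k+1} - pose_k should equal the measurement.
        for k, (dx, dy) in enumerate(odometry):
            rx = poses[k + 1][0] - poses[k][0] - dx
            ry = poses[k + 1][1] - poses[k][1] - dy
            grads[k + 1][0] += rx; grads[k + 1][1] += ry
            grads[k][0] -= rx;     grads[k][1] -= ry
        # Loop-closure residual: pose_j - pose_i should equal the loop measurement.
        rx = poses[j0][0] - poses[i0][0] - lx
        ry = poses[j0][1] - poses[i0][1] - ly
        grads[j0][0] += rx; grads[j0][1] += ry
        grads[i0][0] -= rx; grads[i0][1] -= ry
        # Anchor pose 0 (gauge freedom) and descend on the rest.
        poses = [poses[0]] + [
            (poses[k][0] - lr * grads[k][0], poses[k][1] - lr * grads[k][1])
            for k in range(1, n)
        ]
    return poses
```

With a loop closure stating that the final pose should coincide with the first, the accumulated drift is redistributed over the whole trajectory instead of piling up at the end.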
33

Channel-Coded Feature Maps for Computer Vision and Machine Learning

Jonsson, Erik January 2008 (has links)
This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. 
Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version and identifies and analyzes a key feature of the method -- that it is able to handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically, and the effect of outliers and convergence properties are analyzed. All presented methods have been evaluated experimentally. The work has been conducted within the cognitive systems research project COSPAL funded by EC FP6, and much of the content has been put to use in the final COSPAL demonstrator system.
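As a rough illustration of the soft-histogram idea, a scalar feature value can be encoded into a set of overlapping channels whose activations decay smoothly with distance from each channel centre. The cos² kernel below is one common choice in the channel-coding literature and stands in for the thesis's piecewise polynomial basis functions; the centres and width are made up for the example:

```python
import math

def channel_encode(value, centers, width):
    """Soft-histogram encoding of one scalar: each channel activates with a
    smooth cos^2 profile that peaks at its center and fades to zero at
    |value - center| >= width."""
    out = []
    for c in centers:
        d = abs(value - c) / width
        out.append(math.cos(math.pi * d / 2.0) ** 2 if d < 1.0 else 0.0)
    return out
```

A channel-coded feature map then amounts to accumulating such encodings jointly over pixel position and feature value, so each bin responds softly to a combination such as "this orientation and color near this position".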
34

Enhancing mobile camera pose estimation through the inclusion of sensors

Hughes, Lloyd Haydn 12 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: Monocular structure from motion (SfM) is a widely researched problem; however, many of the existing approaches prove to be too computationally expensive for use on mobile devices. In this thesis we investigate how inertial sensors can be used to increase the performance of SfM algorithms on mobile devices. Making use of the low-cost inertial sensors found on most mobile devices, we design and implement an extended Kalman filter (EKF) to exploit their complementary nature, in order to produce an accurate estimate of the attitude of the device. We make use of a quaternion-based system model in order to linearise the measurement stage of the EKF, thus reducing its computational complexity. We use this attitude estimate to enhance the feature tracking and camera localisation stages in our SfM pipeline. In order to perform feature tracking, we implement a hybrid tracking algorithm which makes use of Harris corners and an approximate nearest neighbour search to reduce the search space for possible correspondences. We increase the robustness of this approach by using inertial information to compensate for inter-frame camera rotation. We further develop an efficient bundle adjustment algorithm which only optimises the pose of the previous three key frames and the 3D map points common between at least two of these frames. We implement an optimisation-based localisation algorithm which makes use of our EKF attitude estimate and the tracked features, in order to estimate the pose of the device relative to the 3D map points. This optimisation is performed in two steps, the first of which optimises only the translation and the second of which optimises the full pose. We integrate the aforementioned three sub-systems into an inertial-assisted pose estimation pipeline. We evaluate our algorithms on datasets captured on the iPhone 5 in the presence of a Vicon motion capture system for ground truth data. 
We find that our EKF can estimate the device’s attitude with an average dynamic accuracy of ±5°. Furthermore, we find that the inclusion of sensors in the visual pose estimation pipeline can lead to improvements in the robustness and computational efficiency of the algorithms, and is unlikely to negatively affect the accuracy of such a system. Even though we managed to reduce execution time dramatically compared to typical existing techniques, our full system is still too computationally expensive for real-time performance and currently runs at 3 frames per second; however, the ever-improving computational power of mobile devices, together with the future work we describe, will lead to improved performance. From this study we conclude that inertial sensors make a valuable addition to a visual pose estimation pipeline implemented on a mobile device.
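The attitude-propagation step that such a filter relies on can be illustrated with a minimal quaternion gyro-integration sketch. This is only the prediction half; the accelerometer update and covariance bookkeeping of a full EKF are omitted, and the conventions (Hamilton product, scalar-first ordering) are assumptions for the example, not taken from the thesis:

```python
import math

def quat_mul(p, q):
    """Hamilton quaternion product, scalar-first (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def gyro_quat(w, dt):
    """Incremental rotation quaternion for angular rate w (rad/s) over dt."""
    wx, wy, wz = w
    n = math.sqrt(wx*wx + wy*wy + wz*wz)
    if n < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    half = 0.5 * n * dt
    s = math.sin(half) / n  # axis = w / n, scaled by sin(half-angle)
    return (math.cos(half), wx * s, wy * s, wz * s)

def propagate(q, gyro, dt):
    """One prediction step: compose the attitude with the body-rate increment."""
    return normalize(quat_mul(q, gyro_quat(gyro, dt)))
```

Integrating a constant yaw rate of π/2 rad/s for one second, for instance, yields the quaternion for a 90° rotation about the z-axis.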
35

Face pose estimation in monocular images

Shafi, Muhammad January 2010 (has links)
People use the orientation of their faces to convey rich, inter-personal information. For example, a person will direct his face to indicate who the intended target of the conversation is. Similarly, in a conversation, face orientation is a non-verbal cue to the listener on when to switch roles and start speaking, and a nod indicates that a person has understood, or agrees with, what is being said. Furthermore, face pose estimation plays an important role in human-computer interaction, virtual reality applications, human behaviour analysis, pose-independent face recognition, driver vigilance assessment, gaze estimation, etc. Robust face recognition has been a focus of research in the computer vision community for more than two decades. Although substantial research has been done and numerous methods have been proposed for face recognition, challenges remain in this field. One of these is face recognition under varying poses, and that is why face pose estimation is still an important research area. In computer vision, face pose estimation is the process of inferring the face orientation from digital imagery. It requires a series of image processing steps to transform a pixel-based representation of a human face into a high-level concept of direction. An ideal face pose estimator should be invariant to a variety of image-changing factors such as camera distortion, lighting conditions, skin colour, projective geometry, facial hair, facial expressions, presence of accessories like glasses and hats, etc. Face pose estimation has been a focus of research for about two decades, and numerous research contributions have been presented in this field.
Face pose estimation techniques in the literature still have shortcomings and limitations in terms of accuracy, applicability to monocular images, being autonomous, identity and lighting variations, image resolution variations, range of face motion, computational expense, presence of facial hair, presence of accessories like glasses and hats, etc. These shortcomings of existing face pose estimation techniques motivated the research work presented in this thesis. The main focus of this research is to design and develop novel face pose estimation algorithms that improve automatic face pose estimation in terms of processing time, computational expense, and invariance to different conditions.
36

Stereo Camera Pose Estimation to Enable Loop Detection / Estimering av kamera-pose i stereo för att återupptäcka besökta platser

Ringdahl, Viktor January 2019 (has links)
Visual Simultaneous Localization And Mapping (SLAM) allows for three-dimensional reconstruction from a camera’s output and simultaneous positioning of the camera within the reconstruction. With use cases ranging from autonomous vehicles to augmented reality, the SLAM field has garnered interest both commercially and academically. A SLAM system performs odometry as it estimates the camera’s movement through the scene. The incremental estimation of odometry is not error free and exhibits drift over time, with map inconsistencies as a result. Detecting the return to a previously seen place, a loop, means that this new information regarding our position can be incorporated to correct the trajectory retroactively. Loop detection can also facilitate relocalization if the system loses tracking due to e.g. heavy motion blur. This thesis proposes an odometric system making use of bundle adjustment within a keyframe-based stereo SLAM application. This system is capable of detecting loops by utilizing the algorithm FAB-MAP. Two aspects of this system are evaluated: the odometry and the capability to relocate. Both are evaluated using the EuRoC MAV dataset, with an absolute trajectory RMS error ranging from 0.80 m to 1.70 m for the machine hall sequences. The capability to relocate is evaluated using a novel methodology that can be interpreted intuitively. Results are given for different levels of strictness to encompass different use cases. The method makes use of reprojection of points seen in keyframes to define whether a relocalization is possible or not. The system shows a capability to relocate in up to 85% of all cases when a keyframe exists that can project 90% of its points into the current view. Errors in estimated poses were found to be correlated with the relative distance, with errors less than 10 cm in 23% to 73% of all cases. The evaluation of the whole system is augmented with an evaluation of local image descriptors and pose estimation algorithms.
The descriptor SIFT was found to perform best overall, but is demanding to compute. BRISK was deemed the best alternative for a fast yet accurate descriptor. A conclusion that can be drawn from this thesis is that FAB-MAP works well for detecting loops as long as the addition of keyframes is handled appropriately.
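The relocalization criterion, checking what fraction of a keyframe's map points reproject into the current view, can be sketched with a plain pinhole camera model. The intrinsics and the thresholds in the example are illustrative assumptions, not values from the thesis:

```python
def fraction_visible(points, R, t, fx, fy, cx, cy, width, height):
    """Project 3D world points into a camera with rotation R (3x3, row-major
    tuples), translation t, and pinhole intrinsics; return the fraction that
    lands inside the image bounds."""
    visible = 0
    for X, Y, Z in points:
        # World point into camera coordinates: p_c = R p_w + t.
        xc = R[0][0]*X + R[0][1]*Y + R[0][2]*Z + t[0]
        yc = R[1][0]*X + R[1][1]*Y + R[1][2]*Z + t[1]
        zc = R[2][0]*X + R[2][1]*Y + R[2][2]*Z + t[2]
        if zc <= 0:
            continue  # behind the camera, cannot be seen
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            visible += 1
    return visible / len(points)
```

A keyframe would then count as a relocalization candidate when, say, `fraction_visible(...) >= 0.9`, mirroring the 90% figure quoted above.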
37

From shape-based object recognition and discovery to 3D scene interpretation

Payet, Nadia 12 May 2011 (has links)
This dissertation addresses a number of inter-related and fundamental problems in computer vision. Specifically, we address object discovery, recognition, segmentation, and 3D pose estimation in images, as well as 3D scene reconstruction and scene interpretation. The key ideas behind our approaches include using shape as a basic object feature, and using structured prediction modeling paradigms for representing objects and scenes. In this work, we make a number of new contributions both in computer vision and machine learning. We address the vision problems of shape matching, shape-based mining of objects in arbitrary image collections, context-aware object recognition, monocular estimation of 3D object poses, and monocular 3D scene reconstruction using shape from texture. Our work on shape-based object discovery is the first to show that meaningful objects can be extracted from a collection of arbitrary images, without any human supervision, by shape matching. We also show that a spatial repetition of objects in images (e.g., windows on a building facade, or cars lined up along a street) can be used for 3D scene reconstruction from a single image. The aforementioned topics had not previously been addressed in the literature. The dissertation also presents new algorithms and object representations for the aforementioned vision problems. We fuse two traditionally different modeling paradigms, Conditional Random Fields (CRF) and Random Forests (RF), into a unified framework, referred to as (RF)^2. We also derive theoretical error bounds for estimating distribution ratios by a two-class RF, which are then used to derive theoretical performance bounds for a two-class (RF)^2. Thorough experimental evaluation of individual aspects of all our approaches is presented. In general, the experiments demonstrate that we outperform the state of the art on the benchmark datasets, without increasing complexity and supervision in training. 
/ Graduation date: 2011 / Access restricted to the OSU Community at author's request from May 12, 2011 - May 12, 2012
38

Inferring 3D Structure with a Statistical Image-Based Shape Model

Grauman, Kristen, Shakhnarovich, Gregory, Darrell, Trevor 17 April 2003 (has links)
We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure" model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
39

Robust and Efficient 3D Recognition by Alignment

Alter, Tao Daniel 01 September 1992 (has links)
Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.
40

Stereo-Based Head Pose Tracking Using Iterative Closest Point and Normal Flow Constraint

Morency, Louis-Philippe 01 May 2003 (has links)
In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.
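As a very reduced illustration of the ICP idea underlying the hybrid tracker, the sketch below aligns two 2D point sets by alternating nearest-neighbour matching with a translation update. Real ICP estimates the full rigid motion, and the thesis additionally blends in the normal flow constraint; none of that is reproduced here, and the data in the usage note are hypothetical:

```python
def icp_translation(src, dst, iters=10):
    """Translation-only ICP sketch: repeatedly match each source point to its
    nearest destination point, then shift the estimate by the mean residual."""
    tx = ty = 0.0
    for _ in range(iters):
        sx = sy = 0.0
        for x, y in src:
            # Nearest destination point to the currently translated source point.
            nx, ny = min(dst, key=lambda p: (p[0] - x - tx) ** 2
                                            + (p[1] - y - ty) ** 2)
            sx += nx - (x + tx)
            sy += ny - (y + ty)
        tx += sx / len(src)
        ty += sy / len(src)
    return tx, ty
```

Each iteration re-matches points and shifts the source by the mean residual; with a reasonable initial guess the matches, and hence the estimate, stabilise after a few iterations.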
