21.
Exploring the Feasibility of Machine Learning Techniques in Recognizing Complex Human Activities. Hu, Shengnan, 01 January 2023.
This dissertation introduces several technical innovations that improve the ability of machine learning models to recognize a wide range of complex human activities. As human sensor data becomes more abundant, the need to develop algorithms for understanding and interpreting complex human actions has become increasingly important. Our research focuses on three key areas: multi-agent activity recognition, multi-person pose estimation, and multimodal fusion.
To tackle the problem of monitoring coordinated team activities from spatio-temporal traces, we introduce a new framework that incorporates field of view data to predict team performance. Our framework uses Spatial Temporal Graph Convolutional Networks (ST-GCN) and recurrent neural network layers to capture and model the dynamic spatial relationships between agents. The second part of the dissertation addresses the problem of multi-person pose estimation (MPPE) from video data. Our proposed technique (Language Assisted Multi-person Pose estimation) leverages text representations from multimodal foundation models to learn a visual representation that is more robust to occlusion. By infusing semantic information into pose estimation, our approach enables precise estimations, even in cluttered scenes. The final part of the dissertation examines the problem of fusing multimodal physiological input from cardiovascular and gaze tracking sensors to exploit the complementary nature of these modalities. When dealing with multimodal features, uncovering the correlations between different modalities is as crucial as identifying effective unimodal features. This dissertation introduces a hybrid multimodal tensor fusion network that is effective at learning both unimodal and bimodal dynamics.
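As a concrete, deliberately simplified illustration of the ST-GCN idea, the sketch below implements one spatial graph-convolution step over agent features followed by temporal mean-pooling, which stands in for the recurrent layers. All shapes, names, and values are hypothetical, and plain NumPy is used in place of a deep learning framework:

```python
import numpy as np

def spatial_graph_conv(X, A, W):
    """One spatial graph-convolution step: aggregate each agent's features
    from its neighbours (adjacency A), then project with W.
    X: (T, N, F) features for T timesteps and N agents; A: (N, N); W: (F, F_out)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalised mean aggregation
    return np.einsum("ij,tjf,fk->tik", A_hat, X, W)

def temporal_mean_pool(H):
    """Collapse the time axis -- a crude stand-in for the recurrent layers."""
    return H.mean(axis=0)                           # (N, F_out)

# Toy trace: 4 timesteps, 3 agents, 2 features (e.g. x, y position).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 2))
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # chain graph
W = rng.normal(size=(2, 5))
H = spatial_graph_conv(X, A, W)
team_embedding = temporal_mean_pool(H)
print(team_embedding.shape)  # (3, 5)
```

A real ST-GCN stacks many such layers with learned adjacency weighting; this only shows the core aggregate-then-project operation.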
The outcomes of this dissertation contribute to advancing the field of complex human activity recognition by addressing the challenges associated with multi-agent activity recognition, multi-person pose estimation, and multimodal fusion. The proposed innovations have potential applications in various domains, including video surveillance, human-robot interaction, sports analysis, and healthcare monitoring. By developing intelligent systems capable of accurately recognizing complex human activities, this research paves the way for improved safety, efficiency, and decision-making in a wide range of real-world applications.
22.
A Comparison of Two-Dimensional Pose Estimation Algorithms Based on Natural Features. Korte, Christopher M., 23 September 2011.
No description available.
23.
3D Deep Learning for Object-Centric Geometric Perception. Li, Xiaolong, 30 June 2022.
Object-centric geometric perception aims at extracting the geometric attributes of 3D objects.
These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it becomes increasingly feasible to achieve such tasks directly from 3D input data. Among different 3D representations, a 3D point cloud is a simple, common, and memory-efficient representation that can be directly retrieved from multi-view images, depth scans, or LiDAR range images.
Different challenges exist in achieving object-centric geometric perception, such as achieving a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, or tackling dynamic and sequential geometric input in an end-to-end fashion. Here we identify and solve these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation method for common articulated objects, built on a novel hierarchical invariant representation.
To push the boundary of 6D pose estimation for common rigid objects, a simple yet effective self-supervised framework is designed to handle unlabeled partial segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet to learn spatio-temporal features from 3D point cloud sequences. All these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective. / Doctor of Philosophy / 3D sensors are now widely deployed on mobile devices, such as the depth camera on an iPhone, and on autonomous driving vehicles as laser LiDAR sensors. These 3D sensing techniques give us accurate measurements of the 3D world. In machine intelligence, we likewise want to build intelligent systems and algorithms that learn useful information and understand the 3D world better.
We human beings have the incredible ability to sense and understand this 3D world through our visual and tactile systems. For example, humans can infer the geometric structure and arrangement of furniture in a room without seeing the full room; we are able to track a 3D object no matter how its appearance, shape, and scale change; and we can predict the future motion of multiple objects from sequential observation and complex reasoning.
My work designs frameworks to learn such 3D information from geometric data represented as large sets of 3D points. These frameworks achieve a fine-grained geometric understanding of individual objects, helping machines infer a target object's geometry, state, and dynamics.
The work in this dissertation serves as building blocks towards a better understanding of this dynamic world.
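A 4D network such as the PointMotionNet mentioned above consumes point cloud sequences. The hedged sketch below only illustrates how such input can be assembled: lifting a sequence of 3D frames into (x, y, z, t) points and querying a spatio-temporal neighbourhood. All sizes and radii are made-up values, not the dissertation's settings:

```python
import numpy as np

def lift_to_4d(frames, dt=0.1):
    """Stack a sequence of 3D point clouds into one (M, 4) array of
    (x, y, z, t) points -- a natural input for 4D spatio-temporal convolution."""
    out = []
    for k, pts in enumerate(frames):
        t = np.full((pts.shape[0], 1), k * dt)  # timestamp column for frame k
        out.append(np.hstack([pts, t]))
    return np.vstack(out)

def st_neighbours(points4d, query, radius_xyz=0.5, radius_t=0.15):
    """Indices of points inside a spatio-temporal cylinder around `query`."""
    d_xyz = np.linalg.norm(points4d[:, :3] - query[:3], axis=1)
    d_t = np.abs(points4d[:, 3] - query[3])
    return np.where((d_xyz <= radius_xyz) & (d_t <= radius_t))[0]

rng = np.random.default_rng(1)
frames = [rng.uniform(-1, 1, size=(100, 3)) for _ in range(3)]
cloud = lift_to_4d(frames)
idx = st_neighbours(cloud, cloud[0])
print(cloud.shape)  # (300, 4)
```

A 4D convolution then aggregates features over exactly such spatio-temporal neighbourhoods instead of purely spatial ones.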
24.
Enhancing Online Yoga Instruction: Evaluating the Effectiveness of Visual Augmentations for Performance Assessment. Gopal, Ajit Ayyadurai, 23 October 2024.
Yoga is a mind-body practice known for its substantial psychological and physiological benefits, contributing to a healthy lifestyle. However, without professional guidance, individuals may experience reduced performance and an increased risk of injury. While online yoga classes on platforms like Zoom have grown in popularity, tools to support instructors in accurately assessing and monitoring student performance remain insufficient. For certain populations, this lack of real-time professional guidance poses safety risks and limits the effectiveness of the practice.
This study examined the effectiveness of using computer-vision-based visual augmentations in enhancing instructors' ability to assess student performance and ensure safety. Specifically, we investigated the effectiveness of various visual augmentations in aiding instructors' visual search for unstable or unsafe poses. Eleven certified yoga instructors (8 female, 3 male), each holding RYT-200 to RYT-500 certifications, participated in the study. Instructors completed eight trials assessing 12 yoga poses using four different visual augmentations (Raw Video, Skeleton (joint locations overlay), Contour (participant outlines), and Contour + Skeleton) across two camera views (Single vs. Multiple Views). During each trial, eye-tracking data was collected as instructors identified potentially unstable (unsafe) poses, after which they completed a usability questionnaire and a NASA-TLX rating. Upon finishing all trials, instructors provided overall feedback on the usability of the visual augmentations and camera views.
Instructors showed no significant difference in their assessment performance across the visual augmentations and camera views. The Skeleton augmentation led to increased cognitive workload, as indicated by larger pupil diameters. The Contour-alone augmentation was less effective for visual search based on the usability ratings, and combining Contour with Skeleton did not offer notable improvements. Simpler visualizations, such as Raw and Skeleton, received higher usability ratings, and instructors preferred Single View layouts over Multiple Views for their ease of use and lower cognitive demand.
In conclusion, while the Skeleton augmentation increased cognitive load, it did not significantly enhance visual search performance. Future research should explore alternative visual augmentation techniques and configurations that better support instructors in performance assessment, improving assessment quality without substantially increasing cognitive workload. / Master of Science / Yoga is a great way to improve both mental and physical health. However, practicing yoga without proper guidance can sometimes lead to injuries or mistakes. With more people attending yoga classes online, like through Zoom, it's harder for instructors to closely monitor how their students are performing, which can reduce the safety and benefits of the practice.
This study looked at whether certain computer tools could help instructors better see and correct their students' poses during online yoga classes. Eleven experienced yoga instructors tried out different visual aids while watching students perform yoga poses. These aids included a simple video, a video with lines showing where the students' joints were (called Skeleton), a video that showed just the outline of the student (Contour), and a mix of both (Contour + Skeleton). The instructors were asked to identify any unstable or unsafe poses while using these aids.
The results showed that none of the visual aids helped the instructors spot mistakes better than regular video. While the Skeleton aid made the instructors work harder mentally, it didn't actually help them perform better. The instructors preferred using simple video over the more complex tools and found that using a single camera view was easier to work with.
In short, more complex visual tools didn't help instructors improve their performance. Future studies should explore other ways, like using different camera angles or adding sound, to help instructors in online yoga classes.
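As a rough illustration of the Skeleton augmentation used in this study, the sketch below rasterises a joint overlay onto a grayscale frame from 2D keypoints. The joint names, coordinates, and the toy line-drawing routine are all hypothetical; a real system would obtain keypoints from a pose estimator and draw with a graphics library such as OpenCV:

```python
import numpy as np

# Hypothetical joint pairs -- a tiny subset of a typical pose skeleton.
BONES = [("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist")]

def draw_skeleton(image, joints, bones=BONES, value=255, samples=100):
    """Rasterise a joint-skeleton overlay onto a grayscale image array.
    `joints` maps joint name -> (row, col) pixel coordinates."""
    for a, b in bones:
        (r0, c0), (r1, c1) = joints[a], joints[b]
        for t in np.linspace(0.0, 1.0, samples):   # sample points along the bone
            r = int(round(r0 + t * (r1 - r0)))
            c = int(round(c0 + t * (c1 - c0)))
            if 0 <= r < image.shape[0] and 0 <= c < image.shape[1]:
                image[r, c] = value
    return image

frame = np.zeros((120, 160), dtype=np.uint8)
joints = {"l_shoulder": (30, 40), "l_elbow": (60, 50), "l_wrist": (90, 45)}
overlay = draw_skeleton(frame.copy(), joints)
print(int(overlay.sum() > 0))  # 1
```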
25.
Real-Time Head Pose Estimation in Low-Resolution Football Footage. Launila, Andreas, January 2009.
This report examines the problem of real-time head pose estimation in low-resolution football footage. A method is presented for inferring the head pose using a combination of footage and knowledge of the locations of the football and players. An ensemble of randomized ferns is compared with a support vector machine for processing the footage, while a support vector machine performs pattern recognition on the location data. Combining the two sources of information outperforms either in isolation. The location of the football turns out to be an important piece of information. / Capturing and Visualizing Large scale Human Action (ACTVIS)
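An ensemble of randomized ferns, as compared against the SVM above, can be sketched as follows: each fern evaluates a few random pixel comparisons, and the resulting bit string indexes a smoothed per-class count table. This is a generic illustration of the technique, not the report's implementation; all sizes and the toy patch classes are invented:

```python
import numpy as np

class FernEnsemble:
    """Minimal randomized-fern classifier sketch. Each fern tests S random
    pixel pairs; the S bits index a per-class count table (with Laplace
    smoothing), and class log-posteriors are summed over ferns."""
    def __init__(self, n_ferns=10, n_bits=4, patch_size=8, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        n_px = patch_size * patch_size
        self.pairs = rng.integers(0, n_px, size=(n_ferns, n_bits, 2))
        self.counts = np.ones((n_ferns, 2 ** n_bits, n_classes))  # Laplace prior
        self.weights = 2 ** np.arange(n_bits)

    def _index(self, patch):
        flat = patch.ravel()
        bits = flat[self.pairs[:, :, 0]] > flat[self.pairs[:, :, 1]]  # (F, S)
        return bits @ self.weights                                    # (F,)

    def fit_one(self, patch, label):
        idx = self._index(patch)
        self.counts[np.arange(len(idx)), idx, label] += 1

    def predict(self, patch):
        idx = self._index(patch)
        probs = self.counts / self.counts.sum(axis=2, keepdims=True)
        logp = np.log(probs[np.arange(len(idx)), idx]).sum(axis=0)
        return int(np.argmax(logp))

def make_patch(cls, rng):
    """Toy classes: class 0 has a bright left half, class 1 a bright right half."""
    p = rng.uniform(0, 0.2, (8, 8))
    if cls == 0:
        p[:, :4] += 0.8
    else:
        p[:, 4:] += 0.8
    return p

rng = np.random.default_rng(1)
ferns = FernEnsemble()
for _ in range(50):
    for cls in (0, 1):
        ferns.fit_one(make_patch(cls, rng), cls)
print(ferns.predict(make_patch(0, rng)), ferns.predict(make_patch(1, rng)))
```

Training a fern is just counting, which is why ferns are attractive for real-time settings like this one.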
26.
3D Vision Geometry for Rolling Shutter Cameras. Lao, Yizhen, 16 May 2019.
Many modern CMOS cameras are equipped with Rolling Shutter (RS) sensors, which offer low cost, low power consumption, and high acquisition rates. In this acquisition mode, the pixel rows are exposed sequentially from the top to the bottom of the image. Therefore, images captured by moving RS cameras exhibit distortions (e.g. wobble and skew) which make the classic algorithms at best less precise, at worst unusable due to singularities or degeneracies.
The goal of this thesis is to propose a general framework for modelling and solving structure from motion (SfM) with RS cameras. Our approach consists of addressing each sub-task of the SfM pipeline (namely image correction, absolute and relative pose estimation, and bundle adjustment) and proposing improvements. The first part of this manuscript presents a novel RS correction method which uses line features. Unlike existing methods, which use iterative solutions and make the Manhattan World (MW) assumption, our method R4C computes the camera's instantaneous motion linearly from a few image features. Besides, the method is integrated into a RANSAC-like framework which enables us to detect curves that correspond to actual 3D straight lines and to reject outlier curves, making image correction more robust and fully automated. The second part revisits Bundle Adjustment (BA) for RS images. It addresses a limitation of existing RS bundle adjustment methods in the case of close read-out directions among RS views, which is a common configuration in many real-life applications. In contrast, we propose a novel camera-based RS projection algorithm and incorporate it into RS BA to calculate reprojection errors. We found that this new algorithm makes SfM survive the degenerate configuration mentioned above. The third part proposes a new RS homography matrix based on point correspondences from an RS pair. Linear solvers for the computation of this matrix are also presented. Specifically, a practical solver with 13 point correspondences is proposed. In addition, we present two essential applications in computer vision that use RS homography: plane-based RS relative pose estimation and RS image stitching. The last part of this thesis studies the absolute camera pose problem (PnP) and SfM, handling RS effects by drawing analogies with non-rigid vision, namely Shape-from-Template (SfT) and Non-rigid SfM (NRSfM) respectively.
Unlike all existing methods, which perform 3D-2D registration after augmenting the Global Shutter (GS) projection model with velocity parameters under various kinematic models, we propose to use local differential constraints. The proposed methods outperform the state of the art and handle configurations that are critical for existing methods.
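The core difficulty with RS geometry described above (each image row sees the camera at a different pose) can be illustrated with a toy projection model. The sketch below interpolates a translation-only pose linearly across rows and solves the row fixed point with a few iterations; the constant-velocity model, intrinsics, and iteration count are illustrative assumptions, not the thesis's camera-based RS projection algorithm:

```python
import numpy as np

def project_rs(points, K, t_vel, n_rows=480, iters=5):
    """Rolling-shutter projection sketch under a constant-velocity translation
    model: the camera centre at row v is c(v) = (v / n_rows) * t_vel. Since the
    row a point lands on depends on the pose at that row, we solve the fixed
    point v = proj_v(X; c(v)) by iteration.
    points: (N, 3) in the row-0 camera frame; K: (3, 3) intrinsics."""
    uv = np.zeros((len(points), 2))
    for i, X in enumerate(points):
        v = n_rows / 2.0                       # initial row guess
        for _ in range(iters):
            Xc = X - (v / n_rows) * t_vel      # point in the row-v camera frame
            x = K @ Xc
            u, v = x[0] / x[2], x[1] / x[2]
        uv[i] = (u, v)
    return uv

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.array([[0.0, 0.0, 5.0], [0.5, 0.3, 5.0]])
static = project_rs(pts, K, t_vel=np.zeros(3))
moving = project_rs(pts, K, t_vel=np.array([0.2, 0.0, 0.0]))
print(static[0])  # [320. 240.]
```

With zero velocity the model reduces to an ordinary global-shutter pinhole projection; with lateral motion the projected columns shift row-dependently, which is exactly the skew distortion the thesis corrects.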
27.
Bring Your Body into Action: Body Gesture Detection, Tracking, and Analysis for Natural Interaction. Abedan Kondori, Farid, January 2014.
Due to the large influx of computers in our daily lives, human-computer interaction has become crucially important. For a long time, designing interaction methods has focused on what users need. A newer perspective, however, extends this attitude to encompass how human desires, interests, and ambitions can be met and supported. This implies that the way we interact with computers should be revisited. Centralizing human values, rather than user needs alone, is of the utmost importance for providing new interaction techniques. These values drive our decisions and actions, and are essential to what makes us human. This motivated us to introduce new interaction methods that support human values, particularly human well-being. The aim of this thesis is to design new interaction methods that empower humans to have a healthy, intuitive, and pleasurable interaction with tomorrow's digital world. To achieve this aim, this research is concerned with developing theories and techniques for exploring interaction methods beyond the keyboard and mouse, utilizing the human body. The thesis therefore addresses a very fundamental problem: human motion analysis. Its technical contributions introduce computer-vision-based, marker-less systems to estimate and analyze body motion. The main focus of this research is on head and hand motion analysis, since these are the body parts most frequently used for interacting with computers. This thesis gives insight into the technical challenges and provides new perspectives and robust techniques for solving the problem.
28.
AUV SLAM constraint formation using side scan sonar. Schouten, Marco, January 2022.
Autonomous underwater vehicle (AUV) navigation has long been a challenging problem, owing to the drift present in underwater environments and the lack of precise localisation systems such as GPS. The uncertainty of the vehicle's pose therefore grows with the mission's duration. This research investigates methods to form constraints on the vehicle's pose throughout typical surveys. Current underwater navigation relies on acoustic sensors. Side Scan Sonar (SSS) is cheaper than a Multibeam Echosounder (MBES), but generates 2D intensity images of wide sections of the seafloor instead of 3D representations. The methodology consists of extracting information from pairs of side-scan sonar images representing overlapping portions of the seafloor, and computing the sensor pose transformation between the two image reference frames to generate constraints on the pose. The chosen approach relies on optimisation methods within a Simultaneous Localisation and Mapping (SLAM) framework to directly correct the trajectory and provide the best estimate of the AUV pose. I tested the optimisation system on simulated data to evaluate the proof of concept. Lastly, as an experimental trial, I tested the implementation on an annotated dataset of overlapping side-scan sonar images provided by SMaRC. The simulated results indicate that AUV pose error can be reduced by optimisation, even with various noise levels in the measurements.
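The optimisation step (turning relative-pose constraints from registered sonar images into a corrected trajectory) can be sketched as a linear pose-graph over 2D positions. Real SLAM back-ends optimise full poses with nonlinear factors, so everything below (weights, noise model, toy loop closure) is a simplified assumption:

```python
import numpy as np

def optimise_trajectory(n_poses, odometry, loop_closures, sigma_odo=1.0, sigma_loop=0.1):
    """Linear pose-graph sketch on 2D positions: odometry gives relative
    displacements between consecutive poses; loop closures give relative
    displacements between arbitrary pose pairs (e.g. from registering
    overlapping side-scan images). Solve the stacked weighted least squares
    with pose 0 anchored at the origin."""
    rows, rhs, w = [], [], []
    def add(i, j, d, sigma):          # constraint: x_j - x_i = d
        r = np.zeros(n_poses); r[j] = 1.0
        if i >= 0:
            r[i] = -1.0
        rows.append(r); rhs.append(d); w.append(1.0 / sigma)
    add(-1, 0, np.zeros(2), 1e-6)     # anchor pose 0 (very high weight)
    for i, d in enumerate(odometry):
        add(i, i + 1, d, sigma_odo)
    for i, j, d in loop_closures:
        add(i, j, d, sigma_loop)
    A = np.array(rows) * np.array(w)[:, None]
    b = np.array(rhs) * np.array(w)[:, None]
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X                           # (n_poses, 2) optimised positions

# Biased dead reckoning drifts; one loop closure says pose 4 is back at pose 0.
odo = [np.array([1.0, 0.1])] * 4
loops = [(0, 4, np.array([0.0, 0.0]))]
X = optimise_trajectory(5, odo, loops)
print(np.linalg.norm(X[4] - X[0]) < 0.5)  # True
```

The trusted loop closure pulls the drifted end of the trajectory back toward the start, which is the mechanism by which sonar-derived constraints reduce pose error here.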
29.
Channel-Coded Feature Maps for Computer Vision and Machine Learning. Jonsson, Erik, January 2008.
This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. 
Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version, and identifies and analyzes a key feature of the method: that it is able to handle a learning situation where the correspondence structure between the input and output spaces is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically, and the effect of outliers and convergence properties are analyzed. All presented methods have been evaluated experimentally. The work was conducted within the cognitive systems research project COSPAL, funded by EC FP6, and much of the content has been put to use in the final COSPAL demonstrator system.
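The basic channel-encoding operation described in this abstract (a soft histogram with overlapping basis functions) can be sketched with first-order (linear) B-spline channels. The thesis considers more general bases and periodic orientation channels, so the non-periodic linear version below is only a simplified illustration:

```python
import numpy as np

def channel_encode(values, n_channels, lo, hi):
    """Linear (first-order B-spline) channel encoding of scalar values: each
    value activates its two nearest channel centres with weights summing to
    one -- the soft counterpart of hard histogram binning."""
    centres = np.linspace(lo, hi, n_channels)
    width = centres[1] - centres[0]
    w = 1.0 - np.abs(values[:, None] - centres[None, :]) / width
    return np.clip(w, 0.0, 1.0)                    # (N, n_channels)

def channel_coded_feature_map(xs, oris, n_x=4, n_ori=6):
    """Joint soft histogram over pixel position and local orientation:
    the outer product of the two encodings, summed over pixels."""
    cx = channel_encode(xs, n_x, 0.0, 1.0)
    co = channel_encode(oris, n_ori, 0.0, np.pi)
    return np.einsum("ni,nj->ij", cx, co)          # (n_x, n_ori)

rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 200)                        # toy pixel x-positions
oris = rng.uniform(0, np.pi, 200)                  # toy local orientations
M = channel_coded_feature_map(xs, oris)
print(M.shape, np.isclose(M.sum(), 200.0))  # (4, 6) True
```

Because each pixel's weights sum to one, the map's total mass equals the pixel count, and the smooth overlap is what makes the map differentiable with respect to image transformations, the property the tracking chapter exploits.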
30.
Enhancing mobile camera pose estimation through the inclusion of sensors. Hughes, Lloyd Haydn, 2014.
Thesis (MSc)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: Monocular structure from motion (SfM) is a widely researched problem; however, many of the existing approaches prove too computationally expensive for use on mobile devices. In this thesis we investigate how inertial sensors can be used to increase the performance of SfM algorithms on mobile devices.
Making use of the low-cost inertial sensors found on most mobile devices, we design and implement an extended Kalman filter (EKF) that exploits their complementary nature to produce an accurate estimate of the attitude of the device. We use a quaternion-based system model to linearise the measurement stage of the EKF, thus reducing its computational complexity. This attitude estimate is used to enhance the feature tracking and camera localisation stages of our SfM pipeline.
For feature tracking we implement a hybrid algorithm which uses Harris corners and an approximate nearest neighbour search to reduce the search space for possible correspondences, and we increase the robustness of this approach by using inertial information to compensate for inter-frame camera rotation. We further develop an efficient bundle adjustment algorithm which optimises only the poses of the previous three key frames and the 3D map points common to at least two of these frames. We implement an optimisation-based localisation algorithm which uses our EKF attitude estimate and the tracked features to estimate the pose of the device relative to the 3D map points. This optimisation is performed in two steps, the first of which optimises only the translation and the second the full pose. We integrate the aforementioned three sub-systems into an inertial-assisted pose estimation pipeline.
We evaluate our algorithms on datasets captured with an iPhone 5 in the presence of a Vicon motion capture system for ground-truth data. We find that our EKF can estimate the device's attitude with an average dynamic accuracy of ±5°. Furthermore, we find that including sensors in the visual pose estimation pipeline can improve the robustness and computational efficiency of the algorithms and is unlikely to negatively affect the accuracy of such a system. Even though we managed to reduce execution time dramatically compared with typical existing techniques, our full system is still too computationally expensive for real-time performance and currently runs at 3 frames per second; however, the ever-improving computational power of mobile devices, together with the future work we describe, will lead to improved performance. From this study we conclude that inertial sensors make a valuable addition to a visual pose estimation pipeline implemented on a mobile device.
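The quaternion-based attitude propagation at the heart of such an EKF can be sketched as first-order gyro integration. The full filter also fuses complementary sensors (e.g. accelerometer measurements) to correct gyro drift, which is omitted here, and all rates and step sizes are made-up values:

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1,
    ])

def integrate_gyro(q, omega, dt):
    """First-order quaternion integration of a body angular rate (rad/s):
    q_{k+1} = normalise(q_k * [1, 0.5*omega*dt])."""
    dq = np.concatenate([[1.0], 0.5 * omega * dt])
    q = quat_mul(q, dq)
    return q / np.linalg.norm(q)   # renormalise to stay on the unit sphere

q = np.array([1.0, 0.0, 0.0, 0.0])
# Rotate at 90 deg/s about z for 1 s, in 100 small steps.
for _ in range(100):
    q = integrate_gyro(q, np.array([0.0, 0.0, np.pi / 2]), 0.01)
yaw = 2 * np.arctan2(q[3], q[0])
print(round(np.degrees(yaw), 1))  # 90.0
```

In an EKF this integration forms the prediction step; the accelerometer's gravity direction then supplies the complementary measurement update that bounds the drift in roll and pitch.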