331 |
3D Rekonstrukce historických míst z obrázků na Flickru / 3D Reconstruction of Historic Landmarks from Flickr Pictures
Šimetka, Vojtěch January 2015
This thesis describes the design and development of an application for reconstructing 3D models from 2D image data, referred to as bundle adjustment. The thesis analyzes the 3D reconstruction process and describes its individual steps thoroughly. The first step is the automated acquisition of an image set from the internet. A set of scripts for bulk downloading of images from Flickr and Google Images is presented, and the requirements that these images must meet for the best possible 3D reconstruction are summarized. The thesis further describes various keypoint detectors, extractors, and matching algorithms, with the aim of finding the most suitable combination for reconstructing buildings. It then explains the process of reconstructing the 3D structure, its optimization, and how this is implemented in our program. The final part tests the results obtained from the implemented program on several different datasets and compares them with the results of other similar programs introduced at the beginning of the thesis.
|
332 |
Robust Optimization for Simultaneous Localization and Mapping
Sünderhauf, Niko 19 April 2012
SLAM (Simultaneous Localization And Mapping) has been a very active and almost ubiquitous problem in the field of mobile and autonomous robotics for over two decades. For many years, filter-based methods have dominated the SLAM literature, but a change of paradigms could be observed recently.
Current state of the art solutions of the SLAM problem are based on efficient sparse least squares optimization techniques. However, it is commonly known that least squares methods are by default not robust against outliers. In SLAM, such outliers arise mostly from data association errors like false positive loop closures. Since the optimizers in current SLAM systems are not robust against outliers, they have to rely heavily on certain preprocessing steps to prevent or reject all data association errors. Especially false positive loop closures will lead to catastrophically wrong solutions with current solvers. The problem is commonly accepted in the literature, but no concise solution has been proposed so far.
The main focus of this work is to develop a novel formulation of the optimization-based SLAM problem that is robust against such outliers. The developed approach allows the back-end part of the SLAM system to change parts of the topological structure of the problem's factor graph representation during the optimization process. The back-end can thereby discard individual constraints and converge towards correct solutions even in the presence of many false positive loop closures. This largely increases the overall robustness of the SLAM system and closes a gap between the sensor-driven front-end and the back-end optimizers. The approach is evaluated on both large scale synthetic and real-world datasets.
This work furthermore shows that the developed approach is versatile and can be applied beyond SLAM, in other domains where least squares optimization problems are solved and outliers have to be expected. This is successfully demonstrated in the domain of GPS-based vehicle localization in urban areas where multipath satellite observations often impede high-precision position estimates.
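The idea of letting the back-end discard individual constraints can be illustrated on a toy problem. The sketch below is not taken from the thesis: it assumes one possible realization in which each loop closure is scaled by a switch variable s in [0, 1] with a prior term (1 - s)^2, uses a hypothetical five-pose 1D graph with unit weights, and substitutes plain gradient descent for the sparse least squares solvers the abstract refers to.

```python
# Toy 1D pose graph: five poses on a line, pose 0 fixed at the origin.
# Each entry is (i, j, measured x_j - x_i). All values are illustrative.
ODOMETRY = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
LOOPS = [(0, 4, 4.0),   # correct loop closure
         (1, 4, 0.0)]   # false positive: claims pose 4 coincides with pose 1


def optimize(use_switches, steps=20000, lr=0.01):
    """Minimize the squared-error cost by plain gradient descent.

    Cost = sum(odometry residual^2) + sum((s * loop residual)^2) + sum((1 - s)^2).
    With use_switches=False every s stays at 1 (ordinary least squares).
    """
    x = [0.0, 1.0, 2.0, 3.0, 4.0]   # initial guess from odometry
    s = [1.0] * len(LOOPS)          # all loop closures start enabled
    for _ in range(steps):
        gx = [0.0] * len(x)
        gs = [0.0] * len(s)
        for i, j, z in ODOMETRY:
            r = x[j] - x[i] - z
            gx[j] += 2 * r
            gx[i] -= 2 * r
        for k, (i, j, z) in enumerate(LOOPS):
            e = x[j] - x[i] - z
            gx[j] += 2 * s[k] ** 2 * e
            gx[i] -= 2 * s[k] ** 2 * e
            if use_switches:
                # Gradient of (s*e)^2 + (1 - s)^2 with respect to s.
                gs[k] = 2 * s[k] * e * e - 2 * (1 - s[k])
        for n in range(1, len(x)):  # pose 0 stays fixed
            x[n] -= lr * gx[n]
        for k in range(len(s)):
            s[k] = min(1.0, max(0.0, s[k] - lr * gs[k]))
    return x, s


plain_poses, _ = optimize(use_switches=False)
robust_poses, switches = optimize(use_switches=True)
print("least squares pose 4:", round(plain_poses[4], 3))
print("with switches pose 4:", round(robust_poses[4], 3), "switches:", switches)
```

In this toy setting the false loop closure pulls the ordinary least squares estimate of the last pose well away from its true position of 4, while the switched version drives the switch of the false constraint towards zero and leaves the correct one enabled.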
|
333 |
Cooperative Navigation of Fixed-Wing Micro Air Vehicles in GPS-Denied Environments
Ellingson, Gary James 05 November 2019
Micro air vehicles have recently gained popularity due to their potential as autonomous systems. Their future impact, however, will depend in part on how well they can navigate in GPS-denied and GPS-degraded environments. In response to this need, this dissertation investigates a potential solution for GPS-denied operations called relative navigation. The method utilizes keyframe-to-keyframe odometry estimates and their covariances in a global back end that represents the global state as a pose graph. The back end is able to effectively represent nonlinear uncertainties and incorporate opportunistic global constraints. The GPS-denied research community has, for the most part, neglected to consider fixed-wing aircraft. This dissertation enables fixed-wing aircraft to utilize relative navigation by accounting for their sensing requirements. The development of an odometry-like, front-end, EKF-based estimator that utilizes only a monocular camera and an inertial measurement unit is presented. The filter uses the measurement model of the multi-state-constraint Kalman filter and regularly performs relative resets in coordination with keyframe declarations. In addition to the front-end development, a method is provided to account for front-end velocity bias in the back-end optimization. Finally, a method is presented for enabling multiple vehicles to improve navigational accuracy by cooperatively sharing information. Modifications to the relative navigation architecture are presented that enable decentralized, cooperative operations amidst temporary communication dropouts. The proposed framework also includes the ability to incorporate inter-vehicle measurements and utilizes a new concept called the coordinated reset, which is necessary for optimizing the cooperative odometry and improving localization. Each contribution is demonstrated through simulation and/or hardware flight testing. Simulation and Monte-Carlo testing are used to show the expected quality of the results. Hardware flight-test results show the front-end estimator performance, several back-end optimization examples, and cooperative GPS-denied operations.
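As a rough illustration of how keyframe-to-keyframe odometry feeds a pose graph, the sketch below chains relative motions into global poses. It is a simplification made for this listing, not the dissertation's estimator: poses are planar (x, y, heading) rather than full 3D, covariances are omitted, and the function names are hypothetical.

```python
import math


def compose(pose, delta):
    """Apply a relative motion `delta`, expressed in the frame of `pose`.

    Both arguments are (x, y, heading) tuples with heading in radians.
    """
    x, y, th = pose
    dx, dy, dth = delta
    return (x + math.cos(th) * dx - math.sin(th) * dy,
            y + math.sin(th) * dx + math.cos(th) * dy,
            th + dth)


def chain(edges, start=(0.0, 0.0, 0.0)):
    """Turn a list of keyframe-to-keyframe edges into global keyframe poses."""
    poses = [start]
    for edge in edges:
        poses.append(compose(poses[-1], edge))
    return poses


# Four identical edges: move one unit forward, then turn left by 90 degrees.
square = chain([(1.0, 0.0, math.pi / 2)] * 4)
for p in square:
    print(tuple(round(c, 3) for c in p))
```

Each edge here would correspond to one odometry estimate between consecutive keyframes; a back end would additionally weight the edges by their covariances and add global constraints when they become available.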
|
334 |
Enabling Autonomous Operation of Micro Aerial Vehicles Through GPS to GPS-Denied Transitions
Jackson, James Scott 11 November 2019
Micro aerial vehicles and other autonomous systems have the potential to truly transform life as we know it. However, much of the potential of autonomous systems remains unrealized because reliable navigation is still an unsolved problem with significant challenges. This dissertation presents solutions to many aspects of autonomous navigation. First, it presents ROSflight, a software and hardware architecture that allows for rapid prototyping and experimentation of autonomy algorithms on MAVs with lightweight, efficient flight control. Next, this dissertation presents improvements to the state-of-the-art in optimal control of quadrotors by utilizing the error-state formulation frequently utilized in state estimation. It is shown that performing optimal control directly over the error-state results in a vastly more computationally efficient system than competing methods while also dealing with the non-vector rotation components of the state in a principled way. In addition, real-time robust flight planning is considered with a method to navigate cluttered, potentially unknown scenarios with real-time obstacle avoidance. Robust state estimation is a critical component to reliable operation, and this dissertation focuses on improving the robustness of visual-inertial state estimation in a filtering framework by extending the state-of-the-art to include better modeling and sensor fusion. Further, this dissertation takes concepts from the visual-inertial estimation community and applies them to tightly-coupled GNSS, visual-inertial state estimation. This method is shown to demonstrate significantly more reliable state estimation than visual-inertial or GNSS-inertial state estimation alone in a hardware experiment through a GNSS-GNSS denied transition flying under a building and back out into open sky. Finally, this dissertation explores a novel method to combine measurements from multiple agents into a coherent map. Traditional approaches to this problem attempt to solve for the position of multiple agents at specific times in their trajectories. This dissertation instead attempts to solve this problem in a relative context, resulting in a much more robust approach that is able to handle much greater initial error than traditional approaches.
|
335 |
Isomorphic Visualization and Understanding of the Commutativity of Multiplication: from multiplication of whole numbers to multiplication of fractions
Malaty, George 16 March 2012
No description available.
|
336 |
Learning Sampling-Based 6D Object Pose Estimation
Krull, Alexander 31 August 2018
The task of 6D object pose estimation, i.e. of estimating an object's position (three degrees of freedom) and orientation (three degrees of freedom) from images, is an essential building block of many modern applications, such as robotic grasping, autonomous driving, or augmented reality. Automatic pose estimation systems have to overcome a variety of visual ambiguities, including texture-less objects, clutter, and occlusion. Since many applications demand real-time performance, the efficient use of computational resources is an additional challenge.
In this thesis, we will take a probabilistic stance on trying to overcome said issues. We build on a highly successful automatic pose estimation framework based on predicting pixel-wise correspondences between the camera coordinate system and the local coordinate system of the object. These dense correspondences are used to generate a pool of hypotheses, which in turn serve as a starting point in a final search procedure. We will present three systems that each use probabilistic modeling and sampling to improve upon different aspects of the framework.
The goal of the first system, System I, is to enable pose tracking, i.e. estimating the pose of an object in a sequence of frames instead of a single image. By including information from previous frames tracking systems can resolve many visual ambiguities and reduce computation time. System I is a particle filter (PF) approach. The PF represents its belief about the pose in each frame by propagating a set of samples through time. Our system uses the process of hypothesis generation from the original framework as part of a proposal distribution that efficiently concentrates samples in the appropriate areas.
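The propagate, weight, and resample cycle of a particle filter can be sketched in a few lines. The example below is a generic bootstrap filter on a scalar state, written for illustration only: it assumes Gaussian motion and observation noise with made-up parameters, and it proposes new samples from the motion model alone rather than from the hypothesis-generation step that System I uses as part of its proposal distribution.

```python
import math
import random


def particle_filter(observations, n_particles=500, motion_std=0.2,
                    obs_std=0.5, seed=0):
    """Track a scalar state; returns the weighted-mean estimate per frame."""
    rng = random.Random(seed)
    # Initial belief: samples spread uniformly over a wide interval.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # 1. Propagate every sample through a random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # 2. Weight each sample by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # 3. Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


frames = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9]  # noisy readings near 2.0
print([round(e, 2) for e in particle_filter(frames)])
```

A proposal that concentrates samples where the current image suggests the object is, as described above, would replace step 1 and require a matching correction of the weights in step 2.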
In System II, we focus on the problem of evaluating the quality of pose hypotheses. This task plays an essential role in the final search procedure of the original framework. We use a convolutional neural network (CNN) to assess the quality of a hypothesis by comparing rendered and observed images. To train the CNN we view it as part of an energy-based probability distribution in pose space. This probabilistic perspective allows us to train the system under the maximum likelihood paradigm. We use a sampling approach to approximate the required gradients. The resulting system for pose estimation yields superior results in particular for highly occluded objects.
In System III, we take the idea of machine learning a step further. Instead of learning to predict a hypothesis quality measure, to be used in a search procedure, we present a way of learning the search procedure itself. We train a reinforcement learning (RL) agent, termed PoseAgent, to steer the search process and make optimal use of a given computational budget. PoseAgent dynamically decides which hypothesis should be refined next, and which one should ultimately be output as the final estimate. Since the search procedure includes discrete non-differentiable choices, training of the system via gradient descent is not easily possible. To solve the problem, we model the behavior of PoseAgent as a non-deterministic stochastic policy, which is ultimately governed by a CNN. This allows us to use a sampling-based stochastic policy gradient training procedure.
We believe that some of the ideas developed in this thesis, such as the sampling-driven probabilistically motivated training of a CNN for the comparison of images or the search procedure implemented by PoseAgent, have the potential to be applied in fields beyond pose estimation as well.
|
337 |
Through the Blur with Deep Learning: A Comparative Study Assessing Robustness in Visual Odometry Techniques
Berglund, Alexander January 2023
In this thesis, the robustness of deep learning techniques in the field of visual odometry is investigated, with a specific focus on the impact of motion blur. A comparative study is conducted, evaluating the performance of state-of-the-art deep convolutional neural network methods, namely DF-VO and DytanVO, against ORB-SLAM3, a well-established non-deep-learning technique for visual simultaneous localization and mapping. The objective is to quantitatively assess the performance of these models as a function of motion blur. The evaluation is carried out on a custom synthetic dataset, which simulates a camera navigating through a forest environment. The dataset includes trajectories with varying degrees of motion blur, caused by camera translation, and optionally, pitch and yaw rotational noise. The results demonstrate that deep learning-based methods maintained robust performance despite the challenging conditions presented in the test data, while excessive blur led to tracking failures in the geometric model. This suggests that the ability of deep neural network architectures to automatically learn hierarchical feature representations and capture complex, abstract features may enhance the robustness of deep learning-based visual odometry techniques in challenging conditions, compared to their geometric counterparts.
|
338 |
Modulating Depth Map Features to Estimate 3D Human Pose via Multi-Task Variational Autoencoders / Modulerande djupkartfunktioner för att uppskatta människans ställning i 3D med multi-task-variationsautoenkoder
Moerman, Kobe January 2023
Human pose estimation (HPE) constitutes a fundamental problem within the domain of computer vision, finding applications in diverse fields like motion analysis and human-computer interaction. This paper introduces innovative methodologies aimed at enhancing the accuracy and robustness of 3D joint estimation. Through the integration of Variational Autoencoders (VAEs), pertinent information is extracted from depth maps, even in the presence of inevitable image-capturing inconsistencies. This concept is enhanced through the introduction of noise to the body or specific regions surrounding key joints. The deliberate introduction of noise to these areas enables the VAE to acquire a robust representation that captures authentic pose-related patterns. Moreover, the introduction of a localised mask as a constraint in the loss function ensures the model predominantly relies on pose-related cues while disregarding potential confounding factors that may hinder the compact representation of accurate human pose information. Delving into the latent space modulation further, a novel model architecture is devised, joining a VAE and fully connected network into a multi-task joint training objective. In this framework, the VAE and regressor harmoniously influence the latent representations for accurate joint detection and localisation. By combining the multi-task model with the loss function constraint, this study attains results that compete with state-of-the-art techniques. These findings underscore the significance of leveraging latent space modulation and customised loss functions to address challenging human poses. Additionally, these novel methodologies pave the way for future explorations and provide prospects for advancing HPE. Subsequent research endeavours may optimise these techniques, evaluate their performance across diverse datasets, and explore potential extensions to unravel further insights and advancements in the field.
/ Human pose estimation (HPE) är ett grundläggande problem inom datorseende och används inom områden som rörelseanalys och människa-datorinteraktion. I detta arbete introduceras innovativa metoder som syftar till att förbättra noggrannheten och robustheten i 3D-leduppskattning. Genom att integrera variationsautokodare (eng. variational autoencoder, VAE) extraheras relevant information från djupkartor, trots närvaro av inkonsekventa avvikelser i bilden. Dessa avvikelser förstärks genom att applicera brus på kroppen eller på specifika regioner som omger viktiga leder. Det avsiktliga införandet av brus i dessa områden gör det möjligt för VAE att lära sig en robust representation som fångar autentiska poseringsrelaterade mönster. Dessutom införs en lokaliserad mask som en begränsning i förlustfunktionen, vilket säkerställer att modellen främst förlitar sig på poseringsrelaterade signaler samtidigt som potentiella störande faktorer som hindrar den kompakta representationen av korrekt mänsklig poseringsinformation bortses ifrån. Genom att fördjupa sig ytterligare i den latenta rumsmoduleringen har en ny modellarkitektur tagits fram som förenar en VAE och ett fullständigt anslutet nätverk i en fleruppgiftsmodell. I detta ramverk påverkar VAE och det fullständigt ansluta nätverket de latenta representationerna på ett harmoniskt sätt för att uppnå korrekt leddetektering och lokalisering. Genom att kombinera fleruppgiftsmodellen med förlustfunktionsbegränsningen uppnår denna studie resultat som konkurrerar med toppmoderna tekniker. Dessa resultat understryker betydelsen av att utnyttja latent rymdmodulering och anpassade förlustfunktioner för att hantera utmanande mänskliga poser. Dessutom banar dessa nya metoder väg för framtida utveckling inom uppskattning av HPE. Efterföljande forskningsinsatser kan optimera dessa tekniker, utvärdera deras prestanda över olika datamängder och utforska potentiella tillägg för att avslöja ytterligare insikter och framsteg inom området.
|
339 |
Structureless Camera Motion Estimation of Unordered Omnidirectional Images
Sastuba, Mark 08 August 2022
This work aims at providing a novel camera motion estimation pipeline from large collections of unordered omnidirectional images. In order to keep the pipeline as general and flexible as possible, cameras are modelled as unit spheres, which allows any central camera type to be incorporated. For each camera an unprojection lookup is generated from intrinsics, which is called P2S-map (Pixel-to-Sphere-map), mapping pixels to their corresponding positions on the unit sphere. Consequently the camera geometry becomes independent of the underlying projection model. The pipeline also generates P2S-maps from world map projections with fewer distortion effects, as known from cartography. Using P2S-maps from camera calibration and world map projection makes it possible to convert omnidirectional camera images to an appropriate world map projection in order to apply standard feature extraction and matching algorithms for data association. The proposed estimation pipeline combines the flexibility of SfM (Structure from Motion), which handles unordered image collections, with the efficiency of PGO (Pose Graph Optimization), which is used as back-end in graph-based Visual SLAM (Simultaneous Localization and Mapping) approaches to optimize camera poses from large image sequences. SfM uses BA (Bundle Adjustment) to jointly optimize camera poses (motion) and 3d feature locations (structure), which becomes computationally expensive for large-scale scenarios. In contrast, PGO solves for camera poses (motion) from measured transformations between cameras, keeping the optimization manageable. The proposed estimation algorithm combines both worlds. It obtains up-to-scale transformations between image pairs using two-view constraints, which are jointly scaled using trifocal constraints. A pose graph is generated from scaled two-view transformations and solved by PGO to obtain camera motion efficiently even for large image collections. Obtained results can be used as input data to provide initial pose estimates for further 3d reconstruction purposes, e.g. to build a sparse structure from feature correspondences in an SfM or SLAM framework with further refinement via BA.
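A pixel-to-sphere lookup of the kind described above can be sketched for one simple world map projection. The example assumes an equirectangular layout and an axis convention chosen here for illustration (z up, longitude measured from the x axis, pixel centres at half-integer offsets); the function name is hypothetical and the thesis may define its P2S-maps differently.

```python
import math


def equirectangular_p2s_map(width, height):
    """Return a height x width table of unit vectors, one (x, y, z) per pixel."""
    rows = []
    for v in range(height):
        # Latitude runs from +pi/2 at the top edge to -pi/2 at the bottom edge.
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        row = []
        for u in range(width):
            # Longitude runs from -pi at the left edge to +pi at the right edge.
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            row.append((math.cos(lat) * math.cos(lon),
                        math.cos(lat) * math.sin(lon),
                        math.sin(lat)))
        rows.append(row)
    return rows


p2s = equirectangular_p2s_map(8, 4)
print(p2s[0][0], p2s[3][7])
```

Once such a table exists, later geometry only needs the unit vectors, which is what makes the camera model independent of the projection that produced the image.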
The pipeline also incorporates fixed extrinsic constraints from multi-camera setups as well as depth information provided by RGBD sensors. The entire camera motion estimation pipeline does not need to generate a sparse 3d structure of the captured environment and thus is called SCME (Structureless Camera Motion Estimation).
1 Introduction
1.1 Motivation
1.1.1 Increasing Interest of Image-Based 3D Reconstruction
1.1.2 Underground Environments as Challenging Scenario
1.1.3 Improved Mobile Camera Systems for Full Omnidirectional Imaging
1.2 Issues
1.2.1 Directional versus Omnidirectional Image Acquisition
1.2.2 Structure from Motion versus Visual Simultaneous Localization and Mapping
1.3 Contribution
1.4 Structure of this Work
2 Related Work
2.1 Visual Simultaneous Localization and Mapping
2.1.1 Visual Odometry
2.1.2 Pose Graph Optimization
2.2 Structure from Motion
2.2.1 Bundle Adjustment
2.2.2 Structureless Bundle Adjustment
2.3 Corresponding Issues
2.4 Proposed Reconstruction Pipeline
3 Cameras and Pixel-to-Sphere Mappings with P2S-Maps
3.1 Types
3.2 Models
3.2.1 Unified Camera Model
3.2.2 Polynomial Camera Model
3.2.3 Spherical Camera Model
3.3 P2S-Maps - Mapping onto Unit Sphere via Lookup Table
3.3.1 Lookup Table as Color Image
3.3.2 Lookup Interpolation
3.3.3 Depth Data Conversion
4 Calibration
4.1 Overview of Proposed Calibration Pipeline
4.2 Target Detection
4.3 Intrinsic Calibration
4.3.1 Selected Examples
4.4 Extrinsic Calibration
4.4.1 3D-2D Pose Estimation
4.4.2 2D-2D Pose Estimation
4.4.3 Pose Optimization
4.4.4 Uncertainty Estimation
4.4.5 PoseGraph Representation
4.4.6 Bundle Adjustment
4.4.7 Selected Examples
5 Full Omnidirectional Image Projections
5.1 Panoramic Image Stitching
5.2 World Map Projections
5.3 World Map Projection Generator for P2S-Maps
5.4 Conversion between Projections based on P2S-Maps
5.4.1 Proposed Workflow
5.4.2 Data Storage Format
5.4.3 Real World Example
6 Relations between Two Camera Spheres
6.1 Forward and Backward Projection
6.2 Triangulation
6.2.1 Linear Least Squares Method
6.2.2 Alternative Midpoint Method
6.3 Epipolar Geometry
6.4 Transformation Recovery from Essential Matrix
6.4.1 Cheirality
6.4.2 Standard Procedure
6.4.3 Simplified Procedure
6.4.4 Improved Procedure
6.5 Two-View Estimation
6.5.1 Evaluation Strategy
6.5.2 Error Metric
6.5.3 Evaluation of Estimation Algorithms
6.5.4 Concluding Remarks
6.6 Two-View Optimization
6.6.1 Epipolar-Based Error Distances
6.6.2 Projection-Based Error Distances
6.6.3 Comparison between Error Distances
6.7 Two-View Translation Scaling
6.7.1 Linear Least Squares Estimation
6.7.2 Non-Linear Least Squares Optimization
6.7.3 Comparison between Initial and Optimized Scaling Factor
6.8 Homography to Identify Degeneracies
6.8.1 Homography for Spherical Cameras
6.8.2 Homography Estimation
6.8.3 Homography Optimization
6.8.4 Homography and Pure Rotation
6.8.5 Homography in Epipolar Geometry
7 Relations between Three Camera Spheres
7.1 Three View Geometry
7.2 Crossing Epipolar Planes Geometry
7.3 Trifocal Geometry
7.4 Relation between Trifocal, Three-View and Crossing Epipolar Planes
7.5 Translation Ratio between Up-To-Scale Two-View Transformations
7.5.1 Structureless Determination Approaches
7.5.2 Structure-Based Determination Approaches
7.5.3 Comparison between Proposed Approaches
8 Pose Graphs
8.1 Optimization Principle
8.2 Solvers
8.2.1 Additional Graph Solvers
8.2.2 False Loop Closure Detection
8.3 Pose Graph Generation
8.3.1 Generation of Synthetic Pose Graph Data
8.3.2 Optimization of Synthetic Pose Graph Data
9 Structureless Camera Motion Estimation
9.1 SCME Pipeline
9.2 Determination of Two-View Translation Scale Factors
9.3 Integration of Depth Data
9.4 Integration of Extrinsic Camera Constraints
10 Camera Motion Estimation Results
10.1 Directional Camera Images
10.2 Omnidirectional Camera Images
11 Conclusion
11.1 Summary
11.2 Outlook and Future Work
Appendices
A.1 Additional Extrinsic Calibration Results
A.2 Linear Least Squares Scaling
A.3 Proof Rank Deficiency
A.4 Alternative Derivation Midpoint Method
A.5 Simplification of Depth Calculation
A.6 Relation between Epipolar and Circumferential Constraint
A.7 Covariance Estimation
A.8 Uncertainty Estimation from Epipolar Geometry
A.9 Two-View Scaling Factor Estimation: Uncertainty Estimation
A.10 Two-View Scaling Factor Optimization: Uncertainty Estimation
A.11 Depth from Adjoining Two-View Geometries
A.12 Alternative Three-View Derivation
A.12.1 Second Derivation Approach
A.12.2 Third Derivation Approach
A.13 Relation between Trifocal Geometry and Alternative Midpoint Method
A.14 Additional Pose Graph Generation Examples
A.15 Pose Graph Solver Settings
A.16 Additional Pose Graph Optimization Examples
Bibliography
|
340 |
Crime Detection From Pre-crime Video Analysis
Sedat Kilic (18363729) 03 June 2024
This research investigates the detection of pre-crime events, specifically targeting behaviors indicative of shoplifting, through the advanced analysis of CCTV video data. The study introduces an innovative approach that leverages augmented human pose and emotion information within individual frames, combined with the extraction of activity information across subsequent frames, to enhance the identification of potential shoplifting actions before they occur. Utilizing a diverse set of models including 3D Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), and a specially developed transformer architecture, the research systematically explores the impact of integrating additional contextual information into video analysis.
By augmenting frame-level video data with detailed pose and emotion insights, and focusing on the temporal dynamics between frames, our methodology aims to capture the nuanced behavioral patterns that precede shoplifting events. The comprehensive experimental evaluation of our models across different configurations reveals a significant improvement in the accuracy of pre-crime detection. The findings underscore the crucial role of combining visual features with augmented data and the importance of analyzing activity patterns over time for a deeper understanding of pre-shoplifting behaviors.
The study’s contributions are multifaceted, including a detailed examination of pre-crime frames, strategic augmentation of video data with added contextual information, the creation of a novel transformer architecture customized for pre-crime analysis, and an extensive evaluation of various computational models to improve predictive accuracy.
|