Spelling suggestions: "subject:"cisual place arecognition"" "subject:"cisual place 2recognition""
1 |
Deep visual place recognition for mobile surveillance services : Evaluation of localization methods for GPS denied environmentBlomqvist, Linus January 2022 (has links)
Can an outward facing camera on a bus, be used to recognize its location in GPS denied environment? Observit, provides cloud-based mobile surveillance services for bus operators using IP cameras with wireless connectivity. With the continuous gathering of video information, it opens up new possibilities for additional services. One service is to use the information with the technology, visual place recognition, to locate the vehicle, where the image was taken. The objective of this thesis has been to answer, how well can learnable visual place recognition methods localize a bus in a GPS denied environment and if a lightweight model can achieve the same accurate results as a heavyweight model. In order to achieve this, four model architecture has been implemented, trained and evaluate on a created dataset of interesting places. A visual place recognition application has been implemented as well, in order to test the models on bus video footage. The results show that the heavyweight model constructed of VGG16 with Patch-NetVLAD, performed best on the task with different recall@N values and got a recall@1 score of 92.31%. The lightweight model that used the backbone of MobileNetV2 with Patch-NetVLAD, scored similar recall@N results as the heavyweight model and got the same recall@1 score. The thesis shows that, with different localization methods, it is possible for a vehicle to identify its position in a GPS denied environment, with a model that could be deploy on a camera. This work, impacts companies that rely on cameras as their source of service.
|
2 |
Monocular Camera-based Localization and Mapping for Autonomous MobilityShyam Sundar Kannan (6630713) 10 October 2024 (has links)
<p dir="ltr">Visual localization is a crucial component for autonomous vehicles and robots, enabling them to navigate effectively by interpreting visual cues from their surroundings. In visual localization, the agent estimates its six degrees of freedom camera pose using images captured by onboard cameras. However, the operating environment of the agent can undergo various changes, such as variations in illumination, time of day, seasonal shifts, and structural modifications, all of which can significantly affect the performance of vision-based localization systems. To ensure robust localization in dynamic conditions, it is vital to develop methods that can adapt to these variations.</p><p dir="ltr">This dissertation presents a suite of methods designed to enhance the robustness and accuracy of visual localization for autonomous agents, addressing the challenges posed by environmental changes. First, we introduce a visual place recognition system that aids the autonomous agent in identifying its location within a large-scale map by retrieving a reference image closely matching the query image captured by the camera. This system employs a vision transformer to extract both global and patch-level descriptors from the images. Global descriptors, which are compact vectors devoid of geometric details, facilitate the rapid retrieval of candidate images from the reference dataset. Patch-level descriptors, derived from the patch tokens of the transformer, are subsequently used for geometric verification, re-ranking the candidate images to pinpoint the reference image that most closely matches the query.</p><p dir="ltr">Building on place recognition, we present a method for pose refinement and relocalization that integrates the environment's 3D point cloud with the set of reference images. The closest image retrieved in the initial place recognition step provides a coarse pose estimate of the query image, which is then refined to compute a precise six degrees of freedom pose. This refinement process involves extracting features from the query image and the closest reference image and then regressing these features using a transformer-based network that estimates the pose of the query image. The features are appended with 2D and 3D positional embeddings that are calculated based on the camera parameters and the 3D point cloud of the environment. These embeddings add spatial awareness to the regression model, hence enhancing the accuracy of the pose estimation. The resulting refined pose can serve as a robust initialization for various localization frameworks or be used for localization on the go. </p><p dir="ltr">Recognizing that the operating environment may undergo permanent changes, such as structural modifications that can render existing reference maps outdated, we also introduce a zero-shot visual change detection framework. This framework identifies and localizes changes by comparing current images with historical images from the same locality on the map, leveraging foundational vision models to operate without extensive annotated training data. It accurately detects changes and classifies them as temporary or permanent, enabling timely and informed updates to reference maps. This capability is essential for maintaining the accuracy and robustness of visual localization systems over time, particularly in dynamic environments.</p><p dir="ltr">Collectively, the contributions of this dissertation in place recognition, pose refinement, and change detection advance the state of visual localization, providing a comprehensive and adaptable framework that supports the evolving requirements of autonomous mobility. By enhancing the accuracy, robustness, and adaptability of visual localization, these methods contribute significantly to the development and deployment of fully autonomous systems capable of navigating complex and changing environments with high reliability.</p>
|
3 |
Indoor scene verification : Evaluation of indoor scene representations for the purpose of location verification / Verifiering av inomhusbilder : Bedömning av en inomhusbilder framställda i syfte att genomföra platsverifieringFinfando, Filip January 2020 (has links)
When human’s visual system is looking at two pictures taken in some indoor location, it is fairly easy to tell whether they were taken in exactly the same place, even when the location has never been visited in reality. It is possible due to being able to pay attention to the multiple factors such as spatial properties (windows shape, room shape), common patterns (floor, walls) or presence of specific objects (furniture, lighting). Changes in camera pose, illumination, furniture location or digital alteration of the image (e.g. watermarks) has little influence on this ability. Traditional approaches to measuring the perceptual similarity of images struggled to reproduce this skill. This thesis defines the Indoor scene verification (ISV) problem as distinguishing whether two indoor scene images were taken in the same indoor space or not. It explores the capabilities of state-of-the-art perceptual similarity metrics by introducing two new datasets designed specifically for this problem. Perceptual hashing, ORB, FaceNet and NetVLAD are evaluated as the baseline candidates. The results show that NetVLAD provides the best results on both datasets and therefore is chosen as the baseline for the experiments aiming to improve it. Three of them are carried out testing the impact of using the different training dataset, changing deep neural network architecture and introducing new loss function. Quantitative analysis of AUC score shows that switching from VGG16 to MobileNetV2 allows for improvement over the baseline. / Med mänskliga synförmågan är det ganska lätt att bedöma om två bilder som tas i samma inomhusutrymme verkligen har tagits i exakt samma plats även om man aldrig har varit där. Det är möjligt tack vare många faktorer, sådana som rumsliga egenskaper (fönsterformer, rumsformer), gemensamma mönster (golv, väggar) eller närvaro av särskilda föremål (möbler, ljus). Ändring av kamerans placering, belysning, möblernas placering eller digitalbildens förändring (t. ex. vattenstämpel) påverkar denna förmåga minimalt. Traditionella metoder att mäta bildernas perceptuella likheter hade svårigheter att reproducera denna färdighet . Denna uppsats definierar verifiering av inomhusbilder, Indoor SceneVerification (ISV), som en ansats att ta reda på om två inomhusbilder har tagits i samma utrymme eller inte. Studien undersöker de främsta perceptuella identitetsfunktionerna genom att introducera två nya datauppsättningar designade särskilt för detta. Perceptual hash, ORB, FaceNet och NetVLAD identifierades som potentiella referenspunkter. Resultaten visar att NetVLAD levererar de bästa resultaten i båda datauppsättningarna, varpå de valdes som referenspunkter till undersökningen i syfte att förbättra det. Tre experiment undersöker påverkan av användning av olika datauppsättningar, ändring av struktur i neuronnätet och införande av en ny minskande funktion. Kvantitativ AUC-värdet analys visar att ett byte frånVGG16 till MobileNetV2 tillåter förbättringar i jämförelse med de primära lösningarna.
|
4 |
Superpixels and their Application for Visual Place Recognition in Changing EnvironmentsNeubert, Peer 03 December 2015 (has links) (PDF)
Superpixels are the results of an image oversegmentation. They are an established intermediate level image representation and used for various applications including object detection, 3d reconstruction and semantic segmentation. While there are various approaches to create such segmentations, there is a lack of knowledge about their properties. In particular, there are contradicting results published in the literature. This thesis identifies segmentation quality, stability, compactness and runtime to be important properties of superpixel segmentation algorithms. While for some of these properties there are established evaluation methodologies available, this is not the case for segmentation stability and compactness. Therefore, this thesis presents two novel metrics for their evaluation based on ground truth optical flow. These two metrics are used together with other novel and existing measures to create a standardized benchmark for superpixel algorithms. This benchmark is used for extensive comparison of available algorithms. The evaluation results motivate two novel segmentation algorithms that better balance trade-offs of existing algorithms: The proposed Preemptive SLIC algorithm incorporates a local preemption criterion in the established SLIC algorithm and saves about 80 % of the runtime. The proposed Compact Watershed algorithm combines Seeded Watershed segmentation with compactness constraints to create regularly shaped, compact superpixels at the even higher speed of the plain watershed transformation.
Operating autonomous systems over the course of days, weeks or months, based on visual navigation, requires repeated recognition of places despite severe appearance changes as they are for example induced by illumination changes, day-night cycles, changing weather or seasons - a severe problem for existing methods. Therefore, the second part of this thesis presents two novel approaches that incorporate superpixel segmentations in place recognition in changing environments. The first novel approach is the learning of systematic appearance changes. Instead of matching images between, for example, summer and winter directly, an additional prediction step is proposed. Based on superpixel vocabularies, a predicted image is generated that shows, how the summer scene could look like in winter or vice versa. The presented results show that, if certain assumptions on the appearance changes and the available training data are met, existing holistic place recognition approaches can benefit from this additional prediction step. Holistic approaches to place recognition are known to fail in presence of viewpoint changes. Therefore, this thesis presents a new place recognition system based on local landmarks and Star-Hough. Star-Hough is a novel approach to incorporate the spatial arrangement of local image features in the computation of image similarities. It is based on star graph models and Hough voting and particularly suited for local features with low spatial precision and high outlier rates as they are expected in the presence of appearance changes. The novel landmarks are a combination of local region detectors and descriptors based on convolutional neural networks. This thesis presents and evaluates several new approaches to incorporate superpixel segmentations in local region detection. While the proposed system can be used with different types of local regions, in particular the combination with regions obtained from the novel multiscale superpixel grid shows to perform superior to the state of the art methods - a promising basis for practical applications.
|
5 |
Superpixels and their Application for Visual Place Recognition in Changing EnvironmentsNeubert, Peer 01 December 2015 (has links)
Superpixels are the results of an image oversegmentation. They are an established intermediate level image representation and used for various applications including object detection, 3d reconstruction and semantic segmentation. While there are various approaches to create such segmentations, there is a lack of knowledge about their properties. In particular, there are contradicting results published in the literature. This thesis identifies segmentation quality, stability, compactness and runtime to be important properties of superpixel segmentation algorithms. While for some of these properties there are established evaluation methodologies available, this is not the case for segmentation stability and compactness. Therefore, this thesis presents two novel metrics for their evaluation based on ground truth optical flow. These two metrics are used together with other novel and existing measures to create a standardized benchmark for superpixel algorithms. This benchmark is used for extensive comparison of available algorithms. The evaluation results motivate two novel segmentation algorithms that better balance trade-offs of existing algorithms: The proposed Preemptive SLIC algorithm incorporates a local preemption criterion in the established SLIC algorithm and saves about 80 % of the runtime. The proposed Compact Watershed algorithm combines Seeded Watershed segmentation with compactness constraints to create regularly shaped, compact superpixels at the even higher speed of the plain watershed transformation.
Operating autonomous systems over the course of days, weeks or months, based on visual navigation, requires repeated recognition of places despite severe appearance changes as they are for example induced by illumination changes, day-night cycles, changing weather or seasons - a severe problem for existing methods. Therefore, the second part of this thesis presents two novel approaches that incorporate superpixel segmentations in place recognition in changing environments. The first novel approach is the learning of systematic appearance changes. Instead of matching images between, for example, summer and winter directly, an additional prediction step is proposed. Based on superpixel vocabularies, a predicted image is generated that shows, how the summer scene could look like in winter or vice versa. The presented results show that, if certain assumptions on the appearance changes and the available training data are met, existing holistic place recognition approaches can benefit from this additional prediction step. Holistic approaches to place recognition are known to fail in presence of viewpoint changes. Therefore, this thesis presents a new place recognition system based on local landmarks and Star-Hough. Star-Hough is a novel approach to incorporate the spatial arrangement of local image features in the computation of image similarities. It is based on star graph models and Hough voting and particularly suited for local features with low spatial precision and high outlier rates as they are expected in the presence of appearance changes. The novel landmarks are a combination of local region detectors and descriptors based on convolutional neural networks. This thesis presents and evaluates several new approaches to incorporate superpixel segmentations in local region detection. While the proposed system can be used with different types of local regions, in particular the combination with regions obtained from the novel multiscale superpixel grid shows to perform superior to the state of the art methods - a promising basis for practical applications.
|
6 |
Visual Place Recognition in Changing Environments using Additional Data-Inherent KnowledgeSchubert, Stefan 15 November 2023 (has links)
Visual place recognition is the task of finding same places in a set of database images for a given set of query images. This becomes particularly challenging for long-term applications when the environmental condition changes between or within the database and query set, e.g., from day to night. Visual place recognition in changing environments can be used if global position data like GPS is not available or very inaccurate, or for redundancy. It is required for tasks like loop closure detection in SLAM, candidate selection for global localization, or multi-robot/multi-session mapping and map merging.
In contrast to pure image retrieval, visual place recognition can often build upon additional information and data for improvements in performance, runtime, or memory usage. This includes additional data-inherent knowledge about information that is contained in the image sets themselves because of the way they were recorded. Using data-inherent knowledge avoids the dependency on other sensors, which increases the generality of methods for an integration into many existing place recognition pipelines.
This thesis focuses on the usage of additional data-inherent knowledge. After the discussion of basics about visual place recognition, the thesis gives a systematic overview of existing data-inherent knowledge and corresponding methods. Subsequently, the thesis concentrates on a deeper consideration and exploitation of four different types of additional data-inherent knowledge. This includes 1) sequences, i.e., the database and query set are recorded as spatio-temporal sequences so that consecutive images are also adjacent in the world, 2) knowledge of whether the environmental conditions within the database and query set are constant or continuously changing, 3) intra-database similarities between the database images, and 4) intra-query similarities between the query images. Except for sequences, all types have received only little attention in the literature so far.
For the exploitation of knowledge about constant conditions within the database and query set (e.g., database: summer, query: winter), the thesis evaluates different descriptor standardization techniques. For the alternative scenario of continuous condition changes (e.g., database: sunny to rainy, query: sunny to cloudy), the thesis first investigates the qualitative and quantitative impact on the performance of image descriptors. It then proposes and evaluates four unsupervised learning methods, including our novel clustering-based descriptor standardization method K-STD and three PCA-based methods from the literature. To address the high computational effort of descriptor comparisons during place recognition, our novel method EPR for efficient place recognition is proposed. Given a query descriptor, EPR uses sequence information and intra-database similarities to identify nearly all matching descriptors in the database. For a structured combination of several sources of additional knowledge in a single graph, the thesis presents our novel graphical framework for place recognition. After the minimization of the graph's error with our proposed ICM-based optimization, the place recognition performance can be significantly improved. For an extensive experimental evaluation of all methods in this thesis and beyond, a benchmark for visual place recognition in changing environments is presented, which is composed of six datasets with thirty sequence combinations.
|
Page generated in 0.0632 seconds