
[en] A FACE RECOGNITION SYSTEM FOR VIDEO SEQUENCES BASED ON A MULTITHREAD IMPLEMENTATION OF TLD / [pt] UM SISTEMA DE RECONHECIMENTO FACIAL EM VÍDEO BASEADO EM UMA IMPLEMENTAÇÃO MULTITHREAD DO ALGORITMO TLD

CIZENANDO MORELLO BONFA 04 October 2018 (has links)
Face recognition in video is an application of great interest to the scientific community and the surveillance industry, driving the search for more robust and efficient techniques. In the facial recognition field, frontal identification techniques currently achieve the best hit rates compared with non-frontal techniques. The main objective of this work is to find methods for scanning video images for people (faces) and assessing whether the image quality falls within an acceptable range that allows a frontal face recognition algorithm to identify the individuals. Ways of reducing the processing load are proposed so that the maximum number of individuals in an image can be assessed without affecting real-time performance. This is achieved through an analysis of most of the techniques used in recent years and of the state of the art, compiling the information into a design that exploits the strengths of each technique and offsets its shortcomings. The outcome is a multithreaded platform. Performance was evaluated through computational load tests using a public video made available by AVSS (Advanced Video and Signal based Surveillance). The results show that the architecture makes better use of computational resources, allowing a wider range of algorithms in each segment of the architecture, selectable according to image quality and capture-environment criteria.
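The abstract gives no implementation details of the multithreaded platform; as a hedged sketch, a pipeline of the kind it describes — one thread scanning frames for faces, another gating them on image quality before recognition — can be written with Python's standard threading primitives. All names, fields and the quality threshold below are invented for illustration:

```python
import queue
import threading

def detect_stage(frames, out_q):
    """Stage 1: scan frames for candidate faces (detector stubbed out)."""
    for frame in frames:
        # A real system would run a face detector here; we pass frames through.
        out_q.put(frame)
    out_q.put(None)  # sentinel: no more frames

def quality_stage(in_q, results, min_quality=0.5):
    """Stage 2: keep only faces whose image quality permits frontal recognition."""
    while True:
        item = in_q.get()
        if item is None:
            break
        if item["quality"] >= min_quality:  # hypothetical quality score in [0, 1]
            results.append(item["face_id"])

frames = [{"face_id": i, "quality": q}
          for i, q in enumerate([0.9, 0.2, 0.7, 0.4, 0.8])]
q = queue.Queue(maxsize=8)   # bounded queue throttles the producer
accepted = []

t1 = threading.Thread(target=detect_stage, args=(frames, q))
t2 = threading.Thread(target=quality_stage, args=(q, accepted))
t1.start(); t2.start()
t1.join(); t2.join()
print(accepted)  # faces 0, 2 and 4 pass the quality gate
```

The bounded queue is the design point: it decouples the stages so a slow quality check never blocks frame capture for long, which is the property the thesis attributes to its multithreaded architecture.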

No tempo da pose: uma genealogia das figuras de aceleração do tempo em tecnologias fotossensíveis do século XIX [In the time of the pose: a genealogy of figures of time acceleration in photosensitive technologies of the nineteenth century]

SOUZA, Camila Targino de 26 November 2012 (has links)
The passage from the 18th to the 19th century is one of the most important for understanding what we are today. Koselleck states that the period from 1750 to 1850 concerns a divided time: we were no longer taking the first steps of an incipient modernity, yet we had not reached the full rationalization announced by the 'Lights'. This conflict is reflected exemplarily in the first decades of the practice of printing images by sunlight, from 1814 to 1850; the analysis of nineteenth-century photochemical manuals and of the correspondence between Niépce and Daguerre attests to this. Heliography and daguerreotypy, understood as social apparatuses, disputed, through their photosensitive substances, a space for constructing the truth about a technical time of image capture. This research assumed that such temporality, linked to photographic aesthetics, is not neutral, since it relates to historical-temporal modalizations emerging in the photosensitive field. Eighteenth-century thought and practice, through a distended time, influenced the printing of Niépce's images, whose solar copies derived from an intimacy with the vital world formed at the interface of a physiology of life and a pre-modern chemistry of the natural realms. Daguerre's practices, by contrast, developed out of a swift modern time. The impressions that result in the image on mirrored silver plated with iodine, bromine and chlorine strongly revive the first movements of theoretical chemistry, whose identity was tied to the inorganic of laboratory synthesis. This dichotomous arrangement of knowledge shows that, in the passage from the 18th to the 19th century, a series of concepts belonging to a more utilitarian reason had not yet attained its ideal of scientificity, being permeated by a whole tradition of knowledge of a natural and spiritual world. Niépce's experiments emerge from a moment in which alchemy no longer reigned and the figure of the specialized chemist was still far off; in this context, Niépce occupied a place of resistance to modern knowledge. This implies revising the claim, traditionally found in histories of photography, that between the duration of Niépcean image capture and the imperative of temporal acceleration represented by Daguerrean image capture there was a horizon of expectation that would lead both experiences toward modern temporal urgency. That claim should no longer stand, lest we continue to mask, through an unbounded will to truth, a thriving space of resistance that gave rise, from a pre-modern logic, to a certain dream of the image printed by sunlight.

Analyse des personnes dans les films stéréoscopiques / Person analysis in stereoscopic movies

Seguin, Guillaume 29 April 2016 (has links)
People are at the center of many computer vision tasks, such as surveillance systems or self-driving cars. They are also at the center of most visual content, potentially providing very large datasets for training models and algorithms. While stereoscopic data has been studied for a long time, it is only recently that feature-length stereoscopic ("3D") movies became widely available. In this thesis, we study how the additional information provided by 3D movies can be exploited for person analysis. We first explore how to extract a notion of depth from stereo movies in the form of disparity maps. We then evaluate how person detection and human pose estimation methods perform on such data. Leveraging the relative ease of the person detection task in 3D movies, we develop a method to automatically harvest examples of persons in 3D movies and train a person detector for standard color movies. We then focus on the task of segmenting multiple people in videos. We first propose a method to segment multiple people in 3D videos by combining cues derived from pose estimates with cues derived from disparity maps. We formulate the segmentation problem as a multi-label Conditional Random Field problem, and our method integrates an occlusion model to produce a layered, multi-instance segmentation. After showing the effectiveness of this approach as well as its limitations, we propose a second model which relies only on tracks of person detections, not on pose estimates. We formulate this as a convex optimization problem: the minimization of a quadratic cost under linear equality or inequality constraints, where the constraints weakly encode the localization information provided by person detections. This method does not explicitly require pose estimates or disparity maps but can easily integrate these additional cues, and it can also be used for segmenting instances of other object classes from videos. We evaluate all these aspects and demonstrate the superior performance of this new method.
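The thesis's actual cost and constraint matrices are not reproduced in this abstract; as a hedged illustration of the optimization pattern — a quadratic cost minimized under linear inequality constraints — projected gradient descent on a toy quadratic with box constraints (the simplest linear inequalities) looks like this:

```python
import numpy as np

def projected_gradient(Q, b, lo, hi, steps=500, lr=0.1):
    """Minimize 0.5 * x'Qx - b'x subject to lo <= x <= hi
    by gradient steps followed by projection onto the box."""
    x = np.zeros_like(b)
    for _ in range(steps):
        grad = Q @ x - b
        x = np.clip(x - lr * grad, lo, hi)  # projection enforces the constraints
    return x

# Toy problem: the unconstrained minimum is x = [2, -3]; the box forces x >= 0.
Q = np.eye(2)
b = np.array([2.0, -3.0])
x = projected_gradient(Q, b, lo=np.zeros(2), hi=np.full(2, 10.0))
print(x)  # close to [2, 0]
```

Convexity is what makes this reliable: with a positive semidefinite Q and linear constraints, any such local scheme (or an off-the-shelf QP solver) reaches the global optimum, which is presumably why the thesis adopts a convex formulation.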

Stéréotomie et vision artificielle pour la construction robotisée de structures maçonnées complexes / Stereotomy and computer vision for robotic construction of complex masonry structures

Loing, Vianney 22 January 2019 (has links)
The context of this thesis is the development of robotics in the construction industry. We explore the robotic construction of complex masonry structures with the help of computer vision. Construction without formwork is an important issue for both productivity on a construction site and the amount of waste generated; to this end, we study topologically interlocking masonries and the possibilities offered by their inherent bending rigidity. The design of this kind of masonry is standard for planar structures; we generalize it to the design of curved structures in a parametric way, using planar quad (PQ) meshes and the software Rhinoceros 3D and Grasshopper. To achieve this, we introduce a set of inequalities that must be respected for the resulting structure to be topologically interlocked. These inequalities also allow us to present a new result: it is possible to have an assembly of blocks in which each block is topologically interlocked in translation while a subset composed of several of these blocks is not. We also present a prototype of topologically interlocking masonry; its design is based on joints of variable inclination, which allows it to be built without formwork. In parallel, we study robust computer vision for unstructured environments such as construction sites, in which sensors are exposed to dust and shocks or can be accidentally moved. The goal is to estimate the relative pose (position and orientation) of a masonry block with respect to a robot arm, using only cheap 2D cameras without a calibration step. Our approach relies on classification convolutional neural networks trained on hundreds of thousands of synthetically rendered scenes of the robot arm and block, with randomized parameters such as block dimensions and poses, lighting and textures, so that the robot learns to locate the block without being biased by the environment. These images are generated with Unreal Engine 4. The method localizes the block relative to the robot with millimetric accuracy without using a single real image for training, a strong advantage since acquiring representative training data is a long and expensive process. We also built a rich dataset of about 12,000 real images containing a robot and a precisely localized block, so that our approach can be evaluated quantitatively and compared to alternatives. A real demonstrator, comprising an ABB IRB 120 arm, cuboid blocks and three webcams, was set up to demonstrate the feasibility of the method.
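The domain randomization described above — sampling block dimensions, poses, lighting and textures before rendering each synthetic scene — reduces to a parameter sampler feeding the renderer. A minimal sketch follows; the field names and numeric ranges are invented for illustration and are not taken from the thesis:

```python
import random
from dataclasses import dataclass

@dataclass
class SceneParams:
    """One synthetic scene: a block in front of the robot arm."""
    block_dims_mm: tuple   # width, depth, height
    block_pos_mm: tuple    # position of the block in the robot frame
    light_intensity: float
    texture_id: int

def sample_scene(rng):
    """Draw one domain-randomized scene (ranges are illustrative only)."""
    return SceneParams(
        block_dims_mm=tuple(rng.uniform(50, 200) for _ in range(3)),
        block_pos_mm=tuple(rng.uniform(-300, 300) for _ in range(3)),
        light_intensity=rng.uniform(0.2, 2.0),
        texture_id=rng.randrange(1000),
    )

rng = random.Random(0)  # fixed seed for reproducibility
scenes = [sample_scene(rng) for _ in range(100)]  # scaled-down batch
print(len(scenes))
```

In the thesis's setup each such parameter set would drive one Unreal Engine 4 render; randomizing everything except the quantity to be learned is what prevents the network from latching onto environment-specific cues.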

Alignement de données 2D, 3D et applications en réalité augmentée. / 2D, 3D data alignment and application in augmented reality

El Rhabi, Youssef 12 June 2017 (has links)
This thesis belongs within the context of augmented reality (AR). The main issue resides in estimating a camera pose in real time, meeting three main criteria: precision, robustness and computational efficiency. We establish methods enabling better use of image primitives; in our case, the primitives are keypoints that are detected and then described in an image based on its texture. We first set up an architecture enabling faster pose estimation without loss of precision or robustness. It exploits an offline phase in which a 3D point cloud of the scene is reconstructed; the information obtained during this phase is used to build a neighbourhood graph linking the images in the database. This graph lets us select the most relevant database images and thus compute the camera pose more efficiently. Since the description and matching steps were not fast enough with the SIFT descriptor, we optimized the bottleneck parts of the pipeline, which led us to propose our own descriptor. Towards this aim, we built a generic framework, grounded in information theory, that encompasses a good share of binary descriptors, including the recent state-of-the-art descriptor BOLD [BTM15]. Our goal, like BOLD's, was to increase the stability of the produced descriptors with respect to rotations. To achieve this, we designed a novel offline selection scheme better adapted to the online matching procedure, and integrating these improvements into our descriptor improves its performance, notably in speed, compared with state-of-the-art descriptors. This thesis details the various methods we devised to optimize camera pose estimation. Our work has resulted in two publications (one national and one international) and a patent application.
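Binary descriptors of the BOLD family are compared by Hamming distance, which is what makes them fast to match. As an illustrative sketch (not the thesis's actual descriptor or selection scheme), nearest-neighbour matching of bit-vector descriptors can be done with an XOR-and-popcount pattern:

```python
import numpy as np

def hamming_match(query, database):
    """Return the index of the database descriptor closest to `query`
    in Hamming distance. Descriptors are 0/1 uint8 bit vectors."""
    dists = np.count_nonzero(database != query, axis=1)  # per-row popcount of differing bits
    return int(np.argmin(dists)), int(dists.min())

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(5, 256), dtype=np.uint8)  # 5 descriptors, 256 bits each
query = db[3].copy()
query[:4] ^= 1          # flip 4 bits: a mildly corrupted view of descriptor 3
idx, dist = hamming_match(query, db)
print(idx, dist)
```

Unrelated 256-bit descriptors differ in roughly 128 bits on average, so a 4-bit corruption still matches its source unambiguously; production systems pack the bits into machine words so the whole comparison is a handful of XOR and popcount instructions.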

Adaptive Losses for Camera Pose Supervision

Dahlqvist, Marcus January 2021 (has links)
This master's thesis studies the learning of dense feature descriptors where camera poses are the only supervisory signal. The use of camera poses as a supervisory signal has only been published once before, and this thesis expands on that work with a couple of techniques meant to increase the robustness of the method, which is particularly important when ground-truth correspondences are not available. Firstly, an adaptive robust loss is used to better differentiate inliers from outliers. Secondly, statistical properties during training are both enforced and adapted to, in an attempt to alleviate the uncertainty introduced by the absence of true correspondences. These additions are shown to slightly increase performance and also highlight some key ideas related to prediction certainty and robustness when working with camera poses as a supervisory signal. Finally, possible directions for future work are discussed.
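The abstract does not state which adaptive robust loss is used; a common choice for differentiating inliers from outliers is Barron's general robust loss ρ(x, α, c), whose shape parameter α interpolates between L2-like and heavily outlier-suppressing behaviour. A minimal sketch, assuming that family:

```python
import numpy as np

def robust_loss(x, alpha, c=1.0):
    """Barron-style general robust loss on residuals x.
    alpha=2 gives a scaled L2 loss, alpha=0 a Cauchy/log loss;
    smaller alpha down-weights large residuals (outliers) more."""
    z = (x / c) ** 2
    if alpha == 2.0:
        return 0.5 * z
    if alpha == 0.0:                      # limiting case of the general formula
        return np.log(0.5 * z + 1.0)
    a = abs(alpha - 2.0)
    return (a / alpha) * ((z / a + 1.0) ** (alpha / 2.0) - 1.0)

x = np.array([0.5, 1.0, 5.0])
l2 = robust_loss(x, 2.0)      # quadratic: the residual at 5.0 dominates
cauchy = robust_loss(x, 0.0)  # the outlier at 5.0 contributes far less
print(l2, cauchy)
```

Making α a learnable parameter is what turns this into an *adaptive* loss: the training procedure itself decides how aggressively to discount residuals it cannot trust, which matches the thesis's motivation of coping with uncertain correspondences.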

En jämförelse av inlärningsbaserade lösningar för mänsklig positionsuppskattning i 3D / A comparison of learning-based solutions for 3D human pose estimation

Lange, Alfons, Lindfors, Erik January 2019 (has links)
In fields such as sports science and entertainment, there is occasionally a need to analyze a person's body pose in 3D, for example to analyze a golf swing or to enable human interaction with games. Today, reliably performing human pose estimation usually requires specialized hardware that is often expensive and difficult to access. In recent years, multiple learning-based solutions have been developed that can perform the same kind of estimation on ordinary images. The purpose of this work has been to identify and compare popular learning-based solutions and to investigate whether any of these perform on par with an established hardware-based solution. To accomplish this, testing tools were developed, pose estimations were conducted, and the result data for each test were analyzed. The results show that the solutions do not perform on par with Kinect and that they are currently not sufficiently well developed to be used as a substitute for specialized hardware.
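The report's exact comparison metric is not given in this abstract; a standard way to score a pose estimator against a reference device such as Kinect is the mean per-joint position error (MPJPE), sketched here as a hedged illustration:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance
    between predicted and reference 3D joints, each of shape (joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

gt = np.zeros((3, 3))                       # 3 reference joints at the origin
pred = np.array([[3.0, 4.0, 0.0],           # 5 units off
                 [0.0, 0.0, 0.0],           # exact
                 [0.0, 0.0, 1.0]])          # 1 unit off
print(mpjpe(pred, gt))  # (5 + 0 + 1) / 3 = 2.0
```

Averaging over joints (and then over frames) gives a single number per solution, which is what makes a "performs on par with Kinect" comparison across several learning-based systems tractable.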

Conformal Tracking For Virtual Environments

Davis, Larry Dennis, Jr. 01 January 2004 (has links)
A virtual environment is a set of surroundings that appears to exist to a user through sensory stimuli provided by a computer. By virtual environment, we mean to include environments spanning the full range from VR to pure reality. A necessity for virtual environments is knowledge of the location of objects in the environment. This is referred to as the tracking problem, which points to the need for accurate and precise tracking in virtual environments. Marker-based tracking is a technique which employs fiducial marks to determine the pose of a tracked object; a collection of markers arranged in a rigid configuration is called a tracking probe. The performance of marker-based tracking systems depends upon the fidelity of the pose estimates provided by tracking probes. The realization that tracking performance is linked to probe performance necessitates investigation into the design of tracking probes for proponents of marker-based tracking. The challenges involved with probe design include prediction of the accuracy and precision of a tracking probe, the creation of arbitrarily shaped tracking probes, and the assessment of the newly created probes. To address these issues, we present a pioneering framework for designing conformal tracking probes. Conformal in this work means adapting to the shape of the tracked objects and to the environmental constraints. As part of the framework, the accuracy in position and orientation of a given probe may be predicted given the system noise. The framework is a methodology for designing tracking probes based upon performance goals and environmental constraints. After presenting the conformal tracking framework, we discuss the elements used for completing its steps. We start with the application of optimization methods for determining the probe geometry; two overall methods for mapping markers onto tracking probes are presented, the Intermediary Algorithm and the Viewpoints Algorithm. Next, we examine the method used for pose estimation and present a mathematical model of error propagation for predicting probe performance in pose estimation. The model uses first-order error propagation, perturbing the simulated marker locations with Gaussian noise; the marker locations with error are then traced through the pose estimation process and the effects of the noise are analyzed. The effects of changing the probe size or the number of markers are also discussed.

Finally, the conformal tracking framework is validated experimentally. The assessment methods are divided into simulation and post-fabrication methods. Under simulation, we discuss testing the performance of each probe design; post-fabrication assessment then includes accuracy measurements in orientation and position. The framework is validated with four tracking probes. The first probe is a six-marker planar probe: its predicted accuracy was 0.06 deg and its measured accuracy 0.083 ± 0.015 deg. The second probe was a pair of concentric planar tracking probes mounted together. The smaller probe had a predicted accuracy of 0.206 deg and a measured accuracy of 0.282 ± 0.03 deg; the larger probe had a predicted accuracy of 0.039 deg and a measured accuracy of 0.017 ± 0.02 deg. The third probe was a semi-spherical head tracking probe, with predicted accuracy in orientation and position of 0.54 ± 0.24 deg and 0.24 ± 0.1 mm, respectively, and experimental accuracy of 0.60 ± 0.03 deg and 0.225 ± 0.05 mm. The last probe was an integrated head-mounted display probe created using the conformal design process. Its predicted accuracy was 0.032 ± 0.02 deg in orientation and 0.14 ± 0.08 mm in position; its measured accuracy was 0.028 ± 0.01 deg in orientation and 0.11 ± 0.01 mm in position. These results constitute an order-of-magnitude improvement in orientation over current marker-based tracking probes, indicating the benefits of a conformal tracking approach. This result also translates to a predicted positional overlay error for a virtual object presented at 1 m of less than 0.5 mm, which surpasses reported overlay performance in virtual environments.
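The error-propagation model described above — perturbing marker locations with Gaussian noise and tracing them through pose estimation — can be sketched as a small Monte Carlo experiment. The probe geometry and noise level below are invented, and a standard SVD-based (Kabsch) rigid alignment stands in for the dissertation's pose estimator:

```python
import numpy as np

def estimate_rotation(markers, observed):
    """Kabsch: least-squares rotation aligning `markers` to `observed`."""
    A = (markers - markers.mean(axis=0)).T @ (observed - observed.mean(axis=0))
    U, _, Vt = np.linalg.svd(A)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# A hypothetical 4-marker, nearly planar probe (units: mm).
probe = np.array([[0.0, 0.0, 0.0], [60.0, 0.0, 0.0],
                  [0.0, 60.0, 0.0], [60.0, 60.0, 5.0]])

rng = np.random.default_rng(0)
errors = []
for _ in range(200):  # Monte Carlo propagation of marker noise
    noisy = probe + rng.normal(scale=0.1, size=probe.shape)  # 0.1 mm marker noise
    R = estimate_rotation(probe, noisy)
    # Orientation error: rotation angle of R relative to the identity.
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    errors.append(np.degrees(angle))
print(np.mean(errors))  # predicted mean orientation error in degrees
```

Rerunning this with larger probes or more markers shows the trend the framework exploits: orientation error shrinks roughly with the probe's spatial extent, which is why probe geometry can be optimized against a performance goal before fabrication.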
89

AUV SLAM constraint formation using side scan sonar / AUV SLAM Begränsningsbildning med hjälp av sidescan sonar

Schouten, Marco January 2022 (has links)
Autonomous underwater vehicle (AUV) navigation has long been a challenging problem. Navigation is difficult because of the drift present in underwater environments and the lack of precise localisation systems such as GPS, so the uncertainty of the vehicle's pose grows with the mission's duration. This research investigates methods for forming constraints on the vehicle's pose throughout typical surveys. Current underwater navigation relies on acoustic sensors. Side-scan sonar (SSS) is cheaper than a multibeam echosounder (MBES), though it generates 2D intensity images of wide swaths of the seafloor rather than 3D representations. The methodology consists of extracting information from pairs of side-scan sonar images that represent overlapping portions of the seafloor and computing the sensor pose transformation between the two image reference frames, which yields constraints on the pose. The chosen approach relies on optimisation methods within a Simultaneous Localisation and Mapping (SLAM) framework to correct the trajectory directly and provide the best estimate of the AUV pose. I tested the optimisation system on simulated data as a proof of concept. Lastly, as an experimental trial, I tested the implementation on an annotated dataset of overlapping side-scan sonar images provided by SMaRC. The simulated results indicate that the AUV pose error can be reduced by optimisation, even with various noise levels in the measurements.
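The constraint idea in this abstract, dead-reckoning odometry plus loop-closure constraints from registering overlapping sonar images, reduces to a pose-graph least-squares problem. The sketch below is a deliberately simplified, hypothetical illustration: 2D positions only (no heading), unit weights, and a noiseless registration measurement, none of which reflects the thesis's actual SLAM formulation.

```python
import numpy as np

def optimise_trajectory(odom, closures, n):
    """Least-squares pose graph over 2D positions p_1..p_n (p_0 fixed at
    the origin). odom[i] measures p_{i+1} - p_i (dead reckoning);
    closures holds (i, j, z) constraints with z measuring p_j - p_i,
    e.g. obtained by registering two overlapping side-scan images."""
    A, b = [], []
    def add_constraint(i, j, z):
        row = np.zeros(n)
        if i > 0:
            row[i - 1] = -1.0
        if j > 0:
            row[j - 1] = 1.0
        A.append(row)
        b.append(z)
    for i, z in enumerate(odom):
        add_constraint(i, i + 1, z)
    for i, j, z in closures:
        add_constraint(i, j, z)
    A, b = np.array(A), np.array(b)        # x and y share the same graph
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.vstack([np.zeros(2), sol])   # prepend the fixed p_0

# Hypothetical survey: a straight transect out and back (revisiting the start)
rng = np.random.default_rng(1)
true = np.array([[t, 0.0] for t in list(range(6)) + list(range(4, -1, -1))])
odom = np.diff(true, axis=0) + rng.normal(0, 0.1, (10, 2))  # drifting DR
loop = [(0, 10, true[10] - true[0])]  # SSS overlap constraint at the revisit
est = optimise_trajectory(odom, loop, n=10)
```

Without the loop constraint, the estimate is pure dead reckoning and the endpoint error accumulates; with it, the optimiser redistributes the drift along the trajectory, which is the effect the abstract's simulated results describe.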
90

A Composite Field-Based Learning Framework for Pose Estimation and Object Detection : Exploring Scale Variation Adaptations in Composite Field-Based Pose Estimation and Extending the Framework for Object Detection / En sammansatt fältbaserad inlärningsramverk för posuppskattning och objektdetektering : Utforskning av skalvariationsanpassningar i sammansatt fältbaserad posuppskattning och utvidgning av ramverket för objektdetektering

Guo, Jianting January 2024 (has links)
This thesis addresses the concurrent challenges of multi-person 2D pose estimation and object detection within a unified bottom-up framework. Our foundational solution builds on a recently proposed pose estimation framework named OpenPifPaf, grounded in composite fields. OpenPifPaf employs the Composite Intensity Field (CIF) for precise joint localization and the Composite Association Field (CAF) for seamless joint connectivity. To assess the model's robustness to scale variation, a Feature Pyramid Network (FPN) is incorporated into the baseline. Additionally, we present a variant of OpenPifPaf known as CifDet, which uses the Composite Intensity Field to classify and detect object centers and then regresses bounding boxes from these identified centers. We further introduce an extended version of CifDet tailored for enhanced object detection, CifCafDet, designed to tackle the challenges inherent in object detection tasks more effectively. The baseline OpenPifPaf model outperforms most existing bottom-up pose estimation methods and achieves results comparable to some state-of-the-art top-down methods on the COCO keypoint dataset. Its variant, CifDet, adapts OpenPifPaf's composite field-based architecture to object detection tasks. Further modifications yield CifCafDet, which outperforms CifDet on the MS COCO detection dataset, suggesting its viability as a multi-task framework.
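The center-based detection step the abstract attributes to CifDet, classify and detect object centers, then regress bounding boxes from them, can be sketched as a toy decoder. This is a simplified, hypothetical illustration of the general idea (local maxima on a confidence map plus a regressed size map), not OpenPifPaf's actual decoder or API.

```python
import numpy as np

def decode_boxes(center_map, size_map, threshold=0.5):
    """Toy CifDet-style decoding: treat confident local maxima of a
    centre-confidence map as object centres, then read the regressed
    (w, h) at each centre to form a box (x1, y1, x2, y2, score)."""
    H, W = center_map.shape
    boxes = []
    for y in range(H):
        for x in range(W):
            s = center_map[y, x]
            if s < threshold:
                continue
            # keep only maxima of the 3x3 neighbourhood as centres
            patch = center_map[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if s < patch.max():
                continue
            w, h = size_map[:, y, x]  # regressed box size at this centre
            boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2, s))
    return boxes

# Toy fields: one confident centre at (x=5, y=4) carrying a 4x2 box
cmap = np.zeros((8, 10)); cmap[4, 5] = 0.9
smap = np.zeros((2, 8, 10)); smap[:, 4, 5] = [4.0, 2.0]
dets = decode_boxes(cmap, smap)
```

A real decoder would add per-class maps, sub-pixel refinement, and non-maximum suppression across overlapping detections, but the centre-then-regress structure is the same.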
