131

Simultaneous real-time object recognition and pose estimation for artificial systems operating in dynamic environments

Van Wyk, Frans Pieter January 2013 (has links)
Recent advances in technology have increased awareness of the necessity for automated systems in people’s everyday lives. Artificial systems are more frequently being introduced into environments previously thought to be too perilous for humans to operate in. Some robots can be used to extract potentially hazardous materials from sites inaccessible to humans, while others are being developed to aid humans with laborious tasks. A crucial aspect of all artificial systems is the manner in which they interact with their immediate surroundings. Developing such a deceptively simple aspect has proven to be significantly challenging, as it entails not only the methods through which the system perceives its environment, but also its ability to perform critical tasks. These undertakings often involve the coordination of numerous subsystems, each performing its own complex duty. To complicate matters further, it is becoming increasingly important for these artificial systems to perform their tasks in real-time. The task of object recognition is typically described as the process of retrieving the object in a database that is most similar to an unknown, or query, object. Pose estimation, on the other hand, involves estimating the position and orientation of an object in three-dimensional space, as seen from an observer’s viewpoint. These two tasks are regarded as vital to many computer vision techniques and regularly serve as input to more complex perception algorithms. An approach is presented which regards the object recognition and pose estimation procedures as mutually dependent. The core idea is that dissimilar objects might appear similar when observed from certain viewpoints. A feature-based conceptualisation, which makes use of a database, is implemented and used to perform simultaneous object recognition and pose estimation. The design incorporates data compression techniques, originally suggested by the image-processing community, to facilitate fast processing of large databases. System performance is quantified primarily on object recognition, pose estimation and execution time characteristics. These aspects are investigated under ideal conditions by exploiting three-dimensional models of relevant objects. The performance of the system is also analysed for practical scenarios by acquiring input data from a structured light implementation, which resembles that obtained from many commercial range scanners. Practical experiments indicate that the system was capable of performing simultaneous object recognition and pose estimation in approximately 230 ms once a novel object had been sensed. An average object recognition accuracy of approximately 73% was achieved. The pose estimation results were reasonable but prompted further research. The results are comparable to what has been achieved with other suggested approaches such as Viewpoint Feature Histograms and Spin Images. / Dissertation (MEng)--University of Pretoria, 2013. / gm2014 / Electrical, Electronic and Computer Engineering / unrestricted
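The mutual-dependence idea above amounts to a single nearest-neighbour lookup over per-view descriptors: each database entry couples an object identity with the viewpoint it was captured from, so one match answers both questions at once. Below is a minimal sketch of that lookup, assuming PCA as the compression step (the abstract does not name the specific technique); all names, sizes and data are illustrative.

```python
import numpy as np

# Hypothetical database: one feature descriptor per (object, viewpoint) pair.
# PCA here stands in for the image-processing-style data compression the
# abstract mentions; the actual technique may differ.
rng = np.random.default_rng(0)
n_objects, n_views, dim = 5, 60, 128
descriptors = rng.normal(size=(n_objects * n_views, dim))
labels = [(obj, view) for obj in range(n_objects) for view in range(n_views)]

# PCA via SVD: keep k components so the search runs on short vectors.
k = 16
mean = descriptors.mean(axis=0)
_, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
basis = vt[:k].T                      # (dim, k) projection matrix
db_compressed = (descriptors - mean) @ basis

def recognise(query_descriptor):
    """Return (object_id, viewpoint_id) of the most similar stored view."""
    q = (query_descriptor - mean) @ basis
    idx = np.argmin(np.linalg.norm(db_compressed - q, axis=1))
    return labels[idx]

# A query close to a stored view recovers that view's object and pose bin.
print(recognise(descriptors[123] + 0.01 * rng.normal(size=dim)))
```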
132

Calcul de pose dynamique avec les caméras CMOS utilisant une acquisition séquentielle / Dynamic pose estimation with CMOS cameras using sequential acquisition

Magerand, Ludovic 18 December 2014 (has links)
In computer science, computer vision is concerned with extracting information from cameras. Camera sensors can be produced with the CMOS technology found in mobile devices thanks to its low cost and small footprint. This technology allows fast image acquisition by exposing the scan-lines of the image sequentially. However, this method produces deformations in the image if there is motion between the camera and the filmed scene. This effect is known as rolling shutter, and many methods have attempted to remove these artifacts. Instead of correcting it, previous works have developed methods to extract information about the motion from this effect. These methods rely on an extension of the usual geometric camera model that accounts for the sequential acquisition and the motion, assumed uniform, between the sensor and the scene. From this model, it is possible to extend the usual pose estimation (estimating the position and orientation of the scene with respect to the sensor) to also estimate the motion parameters. Following on from this approach, we present a generalization to non-uniform motions based on smoothing the derivatives of the motion parameters. We then present a polynomial model of the rolling shutter and a global optimization method for estimating its parameters. Correctly implemented, this makes it possible to establish an automatic matching between the three-dimensional model and the image. We conclude with a comparison of these methods on both simulated and real data.
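The key modelling step described above is that each image row is exposed at its own instant, so the pose applied to a 3D point depends on the row where its projection lands. Here is a minimal sketch of such a rolling-shutter projection under uniform motion, using a first-order rotation update and a fixed-point iteration on the row coordinate; the function and all values are illustrative assumptions, not the thesis' actual formulation.

```python
import numpy as np

def project_rolling_shutter(X, K, R0, t0, omega, v, line_period):
    """Project a 3D point under a rolling-shutter camera with uniform motion.

    Each row is exposed at time tau = row * line_period, so the pose applied
    to the point depends on the row where it lands. A small-angle rotation
    omega*tau and translation v*tau model the uniform motion; the projection
    is solved by fixed-point iteration on the row coordinate.
    """
    row = 0.0
    for _ in range(10):                      # fixed-point iteration on the row
        tau = row * line_period
        # First-order rotation update: R(tau) ~ (I + [omega*tau]_x) R0
        wx, wy, wz = omega * tau
        skew = np.array([[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]])
        R = (np.eye(3) + skew) @ R0
        x_cam = R @ X + t0 + v * tau
        u = K @ x_cam
        u = u[:2] / u[2]
        row = u[1]
    return u

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
p = project_rolling_shutter(np.array([0.1, 0.2, 2.0]), K, np.eye(3),
                            np.zeros(3), omega=np.array([0, 0.5, 0]),
                            v=np.array([0.2, 0, 0]), line_period=1e-4)
print(p)
```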
133

Analyse des personnes dans les films stéréoscopiques / Person analysis in stereoscopic movies

Seguin, Guillaume 29 April 2016 (has links)
People are at the center of many computer vision tasks, such as surveillance systems or self-driving cars. They are also at the center of most visual content, potentially providing very large datasets for training models and algorithms. While stereoscopic data has been studied for a long time, it is only recently that feature-length stereoscopic ("3D") movies became widely available. In this thesis, we study how to exploit the additional information provided by 3D movies for person analysis. We first explore how to extract a notion of depth from stereo movies in the form of disparity maps. We then evaluate how person detection and human pose estimation methods perform on such data. Leveraging the relative ease of the person detection task in 3D movies, we develop a method to automatically harvest examples of persons in 3D movies and train a person detector for standard color movies. We then focus on the task of segmenting multiple people in videos. We first propose a method to segment multiple people in 3D videos by combining cues derived from pose estimates with ones derived from disparity maps. We formulate the segmentation problem as a multi-label Conditional Random Field problem, and our method integrates an occlusion model to produce a layered, multi-instance segmentation. After showing the effectiveness of this approach as well as its limitations, we propose a second model which relies only on tracks of person detections and not on pose estimates. We formulate the problem as a convex optimization: the minimization of a quadratic cost under linear equality and inequality constraints. These constraints weakly encode the localization information provided by the person detections. This method does not explicitly require pose estimates or disparity maps but can integrate these additional cues, and it can also be used to segment instances of other object classes from videos. We evaluate all these aspects and demonstrate the superior performance of this new method.
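To make the second model's formulation concrete, here is a deliberately tiny stand-in for "quadratic cost under linear constraints": a few cells receive soft person assignments, the quadratic term follows an appearance score, and inequalities force mass inside a detection box and away from its exterior. This only illustrates the shape of the problem, not the thesis' actual variables or solver.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in: x holds per-cell soft assignments to one person, the
# quadratic cost pulls them toward an appearance score, and linear
# inequalities encode a detection box. All numbers are illustrative.
scores = np.array([0.9, 0.8, 0.2, 0.1])      # appearance affinity per cell
inside = np.array([1.0, 1.0, 0.0, 0.0])      # cells covered by the detection

cost = lambda x: np.sum((x - scores) ** 2)   # quadratic data term
constraints = [
    {"type": "ineq", "fun": lambda x: inside @ x - 1.5},        # mass inside
    {"type": "ineq", "fun": lambda x: 0.3 - (1 - inside) @ x},  # little outside
]
res = minimize(cost, x0=np.full(4, 0.5), bounds=[(0, 1)] * 4,
               constraints=constraints, method="SLSQP")
print(res.x)   # assignment concentrated in the detected region
```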
134

Aplikace rozšířené reality: Měření rozměrů objektů / Application of Augmented Reality: Measurement of Object Dimensions

Karásek, Miroslav January 2019 (has links)
The goal of this diploma thesis is the design and implementation of an application for automated measurement of objects in augmented reality. It focuses on automating the entire process so that the user carries out the fewest possible manual actions. The proposed interface divides the measurement into several steps, in each of which it gives the user instructions for progressing to the next stage. The result is an Android application built on ARCore technology. It is capable of determining the minimal bounding box of an object of general shape lying on a horizontal surface. The measurement error depends on ambient conditions and is on the order of a few percent.
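The minimal-bounding-box step mentioned above has a classical solution once the measured points are projected onto the supporting plane: the optimal rectangle is aligned with an edge of the 2D convex hull. A sketch under that assumption follows (the thesis' actual method may differ).

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_area_rect(points_2d):
    """Minimal-area bounding rectangle of 2D points (object footprint).

    The optimal rectangle is aligned with some edge of the convex hull, so we
    rotate the hull into each edge's frame and keep the tightest axis-aligned
    box. For a measurement app, points_2d would be the sensed point cloud
    projected onto the supporting horizontal plane.
    """
    hull = points_2d[ConvexHull(points_2d).vertices]
    best = (np.inf, None)
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        c, s = edge / np.linalg.norm(edge)
        rot = np.array([[c, s], [-s, c]])            # rotate edge onto x-axis
        pts = hull @ rot.T
        w, h = pts.max(axis=0) - pts.min(axis=0)
        if w * h < best[0]:
            best = (w * h, (w, h))
    return best[1]                                    # (width, depth) of box

pts = np.random.default_rng(1).normal(size=(100, 2)) @ np.array([[2, 1], [0, 1]])
print(min_area_rect(pts))
```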
135

Automatická kalibrace robotického ramene pomocí kamer/y / Automatic calibration of robotic arm using cameras

Adámek, Daniel January 2019 (has links)
Replacing a human in the task of testing touch-based embedded devices requires the development of a complex automated robotic system. One of the essential tasks is to calibrate this system automatically. In this thesis, I examined possible approaches to the automatic spatial calibration of a robotic arm with respect to a touch device using one or more cameras. I then presented a solution based on single-camera pose estimation using iterative methods such as Gauss-Newton or Levenberg-Marquardt. Finally, I evaluated the achieved accuracy and proposed a procedure for increasing it.
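As a concrete illustration of the iterative pose estimation mentioned above, the sketch below refines a camera pose by minimising reprojection error with Levenberg-Marquardt via scipy's least_squares; the intrinsics, reference points and ground-truth pose are made-up placeholders, not the thesis' setup.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Known 3D reference points (e.g. markers on the touch device) and their
# detected pixel positions; refine the camera pose minimising reprojection
# error. K and the points below are illustrative placeholders.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
object_pts = np.array([[0, 0, 0], [0.1, 0, 0], [0.1, 0.06, 0], [0, 0.06, 0]])

def residuals(params, pts_3d, pts_2d):
    rvec, t = params[:3], params[3:]
    cam = Rotation.from_rotvec(rvec).apply(pts_3d) + t   # world -> camera
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]
    return (proj - pts_2d).ravel()

# Synthetic "detections" from a ground-truth pose, then refinement from a
# perturbed initial guess; method="lm" is Levenberg-Marquardt.
gt = np.array([0.1, -0.2, 0.05, 0.02, 0.01, 0.5])
detections = residuals(gt, object_pts, np.zeros((4, 2))).reshape(-1, 2)
fit = least_squares(residuals, gt + 0.05, args=(object_pts, detections),
                    method="lm")
print(np.round(fit.x - gt, 6))   # ~0: recovered the ground-truth pose
```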
136

6-DOF lokalizace objektů v průmyslových aplikacích / 6-DOF Object Localization in Industrial Applications

Macurová, Nela January 2021 (has links)
The aim of this work is to design a method for object localization in a point cloud that estimates the 6D pose of known objects in an industrial bin-picking scene as accurately as possible. The design of the solution is inspired by the PoseCNN network. The solution also includes a scene simulator that generates artificial data; it is used to produce a training dataset containing two objects for training a convolutional neural network. The network is tested on annotated real scenes and achieves low success rates: only 23.8 % and 31.6 % for estimating translation and rotation for one type of object, and 12.4 % and 21.6 % for the other, with a tolerance for a correct estimate of 5 mm and 15°. However, applying the ICP algorithm to the estimated results raises the success rate of the translation estimate to 81.5 % and of the rotation to 51.8 % for the first object, and to 51.9 % and 48.7 % for the second. The contribution of this work is the creation of the generator and the validation of the network on small objects.
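The large jump in accuracy after ICP comes from iteratively re-matching model points to the scanned cloud and solving the best rigid alignment in closed form. A minimal point-to-point ICP sketch of that refinement step (not the thesis' exact implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(model, scene, R, t, iters=30):
    """Refine an initial 6D pose estimate (R, t) by point-to-point ICP.

    model: (N,3) object points; scene: (M,3) observed cloud. Each iteration
    matches transformed model points to nearest scene points, then solves the
    best rigid alignment in closed form (Kabsch / SVD), mirroring how a
    coarse network estimate can be polished before grasping.
    """
    tree = cKDTree(scene)
    for _ in range(iters):
        moved = model @ R.T + t
        matched = scene[tree.query(moved)[1]]          # nearest neighbours
        mu_m, mu_s = moved.mean(0), matched.mean(0)
        H = (moved - mu_m).T @ (matched - mu_s)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ D @ U.T                            # rotation, no reflection
        R, t = dR @ R, dR @ (t - mu_m) + mu_s          # compose the increment
    return R, t
```

Seeding (R, t) with the network's estimate and checking the refined pose against the 5 mm / 15° tolerances mirrors the evaluation described above.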
137

Automated Gait Analysis : Using Deep Metric Learning

Engström, Isak January 2021 (has links)
Sectors of security, safety, and defence require methods for identifying people on the individual level. Automation of these tasks has the potential to outperform manual labor, as well as to relieve workloads. The ever-extending surveillance camera networks, advances in human pose estimation from monocular cameras, and the progress of deep learning techniques pave the way for automated walking-gait analysis as an identification method. This thesis investigates the use of 2D kinematic pose sequences to represent gait, monocularly extracted from a limited dataset containing walking individuals captured from five camera views. The sequential information of the gait is captured using recurrent neural networks. Techniques from deep metric learning are applied to evaluate two network models, with contrasting output dimensionalities, against deep-metric- and non-deep-metric-based embedding spaces. The results indicate that the gait representation, network designs, and network learning structure show promise for identifying individuals, scaling particularly well to unseen individuals. However, with the limited dataset, the network models performed best when the dataset included the labels from both the individuals and the camera views simultaneously, in contrast to when the data contained only the labels of the individuals without the information of the camera views. For further investigation, an extension of the data would be required to evaluate the accuracy and effectiveness of these methods for the re-identification task of each individual. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
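A minimal sketch of the kind of recurrent metric-learning setup described above: a GRU consumes per-frame 2D keypoints and emits a normalized gait embedding, trained with a triplet margin loss so that sequences from the same individual embed close together. Layer sizes, joint counts and the loss margin are illustrative assumptions, not the thesis' configuration.

```python
import torch
import torch.nn as nn

# A GRU consumes a sequence of 2D keypoints (17 joints -> 34 values per
# frame, an assumed layout) and emits a fixed-size gait embedding; a triplet
# margin loss pulls same-person sequences together. Sizes are illustrative.
class GaitEncoder(nn.Module):
    def __init__(self, n_joints=17, hidden=128, embed=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * n_joints, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, embed)

    def forward(self, seq):                    # seq: (batch, frames, 2*joints)
        _, h = self.rnn(seq)                   # h: (1, batch, hidden)
        return nn.functional.normalize(self.head(h[-1]), dim=1)

encoder = GaitEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.randn(8, 60, 34) for _ in range(3))
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
print(float(loss))
```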
139

Stéréotomie et vision artificielle pour la construction robotisée de structures maçonnées complexes / Stereotomy and computer vision for robotic construction of complex masonry structures

Loing, Vianney 22 January 2019 (has links)
This thesis work is set in the context of the development of robotics in the construction industry. We explore the robotic construction of complex masonry structures with the help of computer vision. Construction without formwork is an important issue for both productivity on a construction site and the amount of waste generated. To this end, we study topologically interlocking masonries and the possibilities offered by their inherent bending rigidity. The design of this kind of masonry is standard for planar structures; we generalize it to the design of curved structures in a parametric way, using planar-quad (PQ) meshes and the software Rhinoceros 3D and Grasshopper. To achieve this, we introduce a set of inequalities that must be respected for the structure to be topologically interlocked. These inequalities also allow us to present a new result: it is possible to have an assembly of blocks in which each individual block is interlocked in translation while a subset composed of several of these blocks is not. We also present a prototype of topologically interlocking masonry whose design relies on joints of variable inclination, allowing it to be built without formwork. In parallel, we study computer vision that is robust to unstructured environments such as construction sites, in which sensors are vulnerable to dust or can be accidentally jostled. The goal is to estimate the relative pose (position and orientation) of a masonry block with respect to a robot arm, using only cheap 2D cameras without a calibration step. Our approach relies on classification convolutional neural networks trained on hundreds of thousands of synthetically rendered scenes of the robot and a block, with randomized parameters such as block dimensions and poses, lighting, textures, etc., so that the robot can learn to locate the block without being biased by the environment. The generation of these images is performed with Unreal Engine 4. This method estimates a block pose with millimetric accuracy without using a single real image for training, a strong advantage since acquiring representative training data is a long and expensive process. We also built a rich dataset of about 12,000 real images containing a robot and a precisely localized block, so that we can evaluate our approach quantitatively and compare it to alternative approaches. A real demonstrator, comprising an ABB IRB 120 arm, cuboid blocks and three webcams, was set up to prove the feasibility of the method.
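Since the localisation network is a classifier, one plausible reading (an assumption on our part, not a detail given in the abstract) is that each continuous block coordinate is discretised into bins for training, and a sub-bin estimate is decoded from the predicted distribution, which is how millimetric precision can emerge from a classification output:

```python
import numpy as np

# Sketch of classification-based localisation: each continuous block
# coordinate is discretised into bins, the network predicts a distribution
# over bins, and a sub-bin estimate is decoded from that distribution. The
# bin count and range are illustrative, not the thesis' exact values.
N_BINS, LO, HI = 100, -0.3, 0.3            # metres, along one axis
centers = LO + (np.arange(N_BINS) + 0.5) * (HI - LO) / N_BINS

def encode(x):
    """Continuous coordinate -> class index (training label)."""
    return int(np.clip((x - LO) / (HI - LO) * N_BINS, 0, N_BINS - 1))

def decode(logits):
    """Class scores -> continuous estimate, via the softmax expectation."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(p @ centers)

# A sharp prediction around the true bin decodes to near-millimetre error.
true_x = 0.1234
logits = -0.5 * ((centers - true_x) / 0.005) ** 2
print(encode(true_x), decode(logits) - true_x)
```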
140

Towards Color-Based Two-Hand 3D Global Pose Estimation

Lin, Fanqing 14 June 2022 (has links)
Pose estimation and tracking are essential for applications involving human control. Specifically, as the hands are the primary operating tool for human activities, hand pose estimation plays a significant role in applications such as hand tracking, gesture recognition, human-computer interaction and VR/AR. As the field has developed, there has been a trend toward using deep learning to estimate 2D/3D hand poses from color information alone, without depth data. Within both the depth-based and color-based approaches, the research community has primarily focused on single-hand scenarios in a localized/normalized coordinate system. Because both hands are used in most applications, we propose to push the frontier by addressing two-hand pose estimation in the global coordinate system using only color information. Our first chapter introduces the first system capable of estimating global 3D joint locations for both hands from monocular RGB input images alone. To enable training and evaluation of learning-based models, we introduce Ego3DHands, a large-scale synthetic 3D hand pose dataset. As knowledge learned from synthetic data cannot be directly applied to the real-world domain, a natural two-hand pose dataset is necessary for real-world applications. To this end, we present Ego2Hands, a large-scale RGB-based egocentric hand dataset, across two chapters. In chapter 2, we address the task of two-hand segmentation/detection using images in the wild. In chapter 3, we focus on the task of two-hand 2D/3D pose estimation using real-world data. In addition to research in hand pose estimation, chapter 4 covers our work on interactive refinement, which generalizes the backpropagating refinement technique to dense prediction models.
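As context for what "global" adds over a localized/normalized prediction: a root-relative hand pose only becomes a set of camera-frame joint locations once the root itself is placed in 3D. The sketch below shows one common lifting recipe, back-projecting the detected root pixel with an estimated root depth; the intrinsics and sample values are illustrative and not taken from Ego3DHands or Ego2Hands.

```python
import numpy as np

# One common way to lift a root-relative hand prediction into the global
# camera frame: back-project the detected root pixel with an estimated root
# depth, then add the root-relative 3D joint offsets. Values are made up.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])

def to_global(root_uv, root_depth, joints_rel):
    """root_uv: (2,) pixel; root_depth: metres; joints_rel: (21,3) offsets."""
    root = np.linalg.inv(K) @ np.array([*root_uv, 1.0]) * root_depth
    return joints_rel + root                 # (21,3) joints in camera frame

joints = to_global(np.array([350.0, 220.0]), 0.45,
                   np.zeros((21, 3)))        # flat placeholder hand
print(joints[0])                             # global wrist position
```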
