181

Infrared Light-Based Data Association and Pose Estimation for Aircraft Landing in Urban Environments

Akagi, David 10 June 2024 (has links) (PDF)
In this thesis we explore an infrared light-based approach to the problem of pose estimation during aircraft landing in urban environments where GPS is unreliable or unavailable. We introduce a novel fiducial constellation composed of sparse infrared lights that incorporates projective invariant properties in its design to allow for robust recognition and association from arbitrary camera perspectives. We propose a pose estimation pipeline capable of producing high-accuracy pose measurements at real-time rates from monocular infrared camera views of the fiducial constellation, and present as part of that pipeline a data association method that is able to robustly identify and associate individual constellation points in the presence of clutter and occlusions. We demonstrate the accuracy and efficiency of this pose estimation approach on real-world data obtained from multiple flight tests, and show that we can obtain decimeter-level accuracy from distances of over 100 m from the constellation. To achieve greater robustness to the potentially large number of outlier infrared detections that can arise in urban environments, we also explore learning-based approaches to the outlier rejection and data association problems. By formulating the problem of camera image data association as a 2D point cloud analysis, we can apply deep learning methods designed for 3D point cloud segmentation to achieve robust, high-accuracy associations at constant real-time speeds on infrared images with high outlier-to-inlier ratios. We again demonstrate the efficiency of our learning-based approach on both synthetic and real-world data, and compare the results and limitations of this method to our first-principles-based data association approach.
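Once constellation points are detected and associated, recovering the camera pose from the resulting 2D-3D correspondences is a standard Perspective-n-Point problem. A minimal sketch with OpenCV's solver follows; the constellation layout, detections, and intrinsics are illustrative stand-ins, not the thesis's actual data:

```python
import cv2
import numpy as np

# Hypothetical 3D constellation layout in meters (world frame).
# A real deployment would use the surveyed positions of the IR lights.
constellation_3d = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],   # unequal spacing gives distinctive cross-ratios
    [0.0, 1.5, 0.0],
    [0.0, 3.5, 0.0],
], dtype=np.float64)

# Detected 2D image points (pixels), already associated to the 3D points.
detections_2d = np.array([
    [612.3, 402.1],
    [654.8, 401.7],
    [718.9, 400.9],
    [611.5, 338.2],
    [610.2, 249.6],
], dtype=np.float64)

# Intrinsics of the monocular IR camera (fx, fy, cx, cy are assumptions).
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume lens distortion already corrected

ok, rvec, tvec = cv2.solvePnP(constellation_3d, detections_2d, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix, world -> camera
    cam_pos = -R.T @ tvec        # camera position in the world frame
    print("camera position [m]:", cam_pos.ravel())
```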
182

6DOF MAGNETIC TRACKING AND ITS APPLICATION TO HUMAN GAIT ANALYSIS

Ravi Abhishek Shankar (18855049) 28 June 2024 (has links)
<p dir="ltr">There is growing research in analyzing human gait in the context of various applications. This has been aided by the improvement in sensing technologies and computation power. A complex motor skill that it is, gait has found its use in medicine for diagnosing different neurological ailments and injuries. In sports, gait can be used to provide feedback to the player/athlete to improve his/her skill and to prevent injuries. In biometrics, gait can be used to identify and authenticate individuals. This can be easier to scale to perform biometrics of individuals in large crowds compared to conventional biometric methods. In the field of Human Computer Interaction (HCI), gait can be an additional input that could be provided to be used in applications such as video games. Gait analysis has also been used for Human Activity Recognition (HAR) for purposes such as personal fitness, elderly care and rehabilitation. </p><p dir="ltr">The current state-of-the-art methods for gait analysis involves non-wearable technology due to its superior performance. The sophistication afforded in non-wearable technologies, such as cameras, is better able to capture gait information as compared to wearables. However, non-wearable systems are expensive, not scalable and typically, inaccessible to the general public. These systems sometimes need to be set up in specialized clinical facilities by experts. On the other hand, wearables offer scalability and convenience but are not able to match the performance of non-wearables. So the current work is a step in the direction to bridge the gap between the performance of non-wearable systems and the convenience of wearables. </p><p dir="ltr">A magnetic tracking system is developed to be applied for gait analysis. The system performs position and orientation tracking, i.e. 6 degrees of freedom or 6DoF tracking. One or more tracker modules, called Rx modules, is tracked with respect to a module called the Tx module. The Tx module mainly consists of a magnetic field generating coil, Inertial Measurement Unit (IMU) and magnetometer. The Rx module mainly consists of a tri-axis sensing coil, IMU and magnetometer. The system is minimally intrusive, works with Non-Line-of-Sight (NLoS) condition, low power consuming, compact and light weight. </p><p dir="ltr">The magnetic tracking system has been applied to the task of Human Activity Recognition (HAR) in this work as a proof-of-concept. The tracking system was worn by participants, and 4 activities - walking, walking with weight, marching and jogging - were performed. The Tx module was worn on the waist and the Rx modules were placed on the feet. To compare magnetic tracking with the most commonly used wearable sensors - IMUs + magnetometer - the same system was used to provide IMU and magnetometer data for the same 4 activities. The gait data was processed by 2 commonly used deep learning models - Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM). The magnetic tracking system shows an overall accuracy of 92\% compared to 86.69\% of the IMU + magnetometer system. Moreover, an accuracy improvement of 8\% is seen with the magnetic tracking system in differentiating between the walking and walking with weight activities, which are very similar in nature. This goes to show the improvement in gait information that 6DoF tracking brings, that manifests as increased classification accuracy. This increase in gait information will have a profound impact in other applications of gait analysis as well.</p>
183

Gappy POD and Temporal Correspondence for Lizard Motion Estimation

Kurdila, Hannah Robertshaw 20 June 2018 (has links)
With the maturity of conventional industrial robots, there has been increasing interest in designing robots that emulate realistic animal motions. This discipline requires careful and systematic investigation of a wide range of animal motions, from biped to quadruped, and even to the serpentine motion of centipedes, millipedes, and snakes. Collecting optical motion capture data of such complex animal motions can be complicated for several reasons. Often there is the need to use many high-quality cameras for detailed subject tracking, and self-occlusion, loss of focus, and contrast variations challenge any imaging experiment. The problem of self-occlusion is especially pronounced for animals. In this thesis, we walk through the process of collecting motion capture data of a running lizard. In our collected raw video footage, it is difficult to make temporal correspondences using interpolation methods because of prolonged blurriness, occlusion, or the limited field of view of our cameras. To work around this, we first make a model data set by making our best guess of the points' locations through these corruptions. Then, we randomly eclipse the data, use Gappy POD to repair the data and then see how closely it resembles the initial set, culminating in a test case where we simulate the actual corruptions we see in the raw video footage. / Master of Science / There has been increasing interest over the past few years in designing robots that emulate realistic animal motions. To make these designs as accurate as possible requires thorough analysis of animal motion. This is done by recording video and then converting it into numerical data, which can be analyzed in a rigorous way. But this conversion cannot be made when the raw video footage is ambiguous, for instance, when the footage is blurry, the shot is too dark or too light, the subject (or parts of the subject) are out of view of the camera, etc. In this thesis, we walk through the process of collecting video footage of a lizard running and then converting it into data. Ambiguities in the video footage result in an incomplete translation into numerical data, and we use a mathematical technique called the Gappy Proper Orthogonal Decomposition to fill in this incompleteness in an intelligible way. And in the process, we lay our hands on the fundamental drivers of the animal's motion.
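The repair step is compact enough to sketch. A minimal Gappy POD example in NumPy, with synthetic low-rank snapshots standing in for the lizard marker trajectories: fit the retained POD mode coefficients to the surviving entries by least squares, then reconstruct the gaps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "motion capture" snapshots: columns are time frames of
# stacked marker coordinates (sizes and signals are illustrative).
t = np.linspace(0, 2 * np.pi, 200)
modes = rng.normal(size=(8, 3))                           # spatial patterns
coeffs = np.vstack([np.sin(t), np.cos(2 * t), np.sin(3 * t)])
snapshots = modes @ coeffs                                # 8 x 200, rank 3

# POD basis from the complete (model) data set.
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
r = 3                       # retained POD modes
Ur = U[:, :r]

# Randomly "eclipse" about 30% of one frame, as in the occlusion tests.
frame = snapshots[:, 57].copy()
mask = rng.random(frame.shape) > 0.3      # True where data survived
gappy = np.where(mask, frame, np.nan)

# Gappy POD: fit mode coefficients using only the observed entries,
# then fill the gaps from the reconstruction.
a, *_ = np.linalg.lstsq(Ur[mask, :], gappy[mask], rcond=None)
repaired = Ur @ a
print("max reconstruction error:", np.max(np.abs(repaired - frame)))
```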
184

Training and Evaluation of a Neural Network for Solving the “Visual Referee Challenge”

Jurkat, Freijdis 14 October 2024 (has links)
Pose estimation is an important research area in artificial intelligence that drives human-machine interaction and is becoming increasingly relevant in sports. While human football players interact naturally with the referees on the field, this aspect has so far been neglected in the Standard Platform League of RoboCup. This thesis investigates a further approach to classifying static and dynamic referee poses, taking one step closer to the grand goal that, by the middle of the 21st century, a fully autonomous robot team should beat the reigning world champion under the official FIFA rules. To this end, videos of the relevant referee poses were recorded and collected. Human joints were then extracted with MoveNet and the poses classified with a Convolutional Neural Network, following two different approaches: one model per pose and one model for all poses. The evaluation shows that good to very good results can be achieved for both static and dynamic poses: the one-model-per-pose approach reaches accuracies from 91.3% to 99.3% with an average of 96.1%, while the single model for all poses reaches an accuracy of 90.9%. The successful application of the developed pose estimation methodology to robot football opens promising perspectives for the future of this field. The insights gained can not only improve the performance of football robots but also contribute substantially to the further integration of AI technologies into our society.

Contents: List of Figures; List of Tables; List of Abbreviations; 1 Introduction; 2 Application Scenario (2.1 RoboCup; 2.2 The Standard Platform League; 2.3 The In-Game Visual Referee Challenge); 3 Fundamentals of Neural Networks (3.1 Artificial Neural Networks; 3.2 Convolutional Neural Networks: architecture, activation functions, further optimizations; 3.3 Learning Methods; 3.4 Evaluation); 4 State of the Art (4.1 Machine Learning Approaches: decision trees, k-NN; 4.2 Deep Learning Approaches: artificial, convolutional and recurrent neural networks; 4.3 Choice of Approach: keypoint detection, pose recognition); 5 Implementation (5.1 Dataset; 5.2 Data Preprocessing; 5.3 Approach 1: One Model per Pose; 5.4 Approach 2: One Model for All Poses; 5.5 Comparison of the Approaches); 6 Conclusion and Outlook; Bibliography; A Appendix (A.1 RoboCup Standard Platform League (NAO) Technical Challenges; A.2 MoveNet Model Card; A.3 Code and Datasets); Declaration of Authorship
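For flavor, a minimal sketch of the pipeline's two stages, MoveNet keypoint extraction followed by a small CNN classifier over keypoint clips; the TF Hub model handle is the publicly documented one, while clip length and the number of referee signal classes are assumptions:

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load MoveNet (single-pose Lightning) from TF Hub.
movenet = hub.load(
    "https://tfhub.dev/google/movenet/singlepose/lightning/4"
).signatures["serving_default"]

def extract_keypoints(frame_rgb):
    """frame_rgb: HxWx3 uint8 image -> (17, 3) array of (y, x, score)."""
    inp = tf.image.resize_with_pad(frame_rgb[tf.newaxis], 192, 192)
    out = movenet(tf.cast(inp, tf.int32))
    return out["output_0"].numpy()[0, 0]   # 17 COCO keypoints

# Classifier over a short clip of keypoints: one model for all poses
# (the thesis's second approach). Clip length and class count are
# illustrative assumptions.
FRAMES, KEYPOINTS = 30, 17
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(FRAMES, KEYPOINTS * 3)),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(13, activation="softmax"),  # e.g. 13 referee signals
])
```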
185

Visuo-inertial data fusion for pose estimation and self-calibration

Scandaroli, Glauco Garcia 14 June 2013 (has links)
Systems with multiple sensors can provide information unavailable from a single source, and complementary sensory characteristics can improve accuracy and robustness to many vulnerabilities. Explicit pose measurements are usually obtained with either high frequency or high precision; visuo-inertial sensors offer both. Vision algorithms accurately measure pose at low frequencies and limit the drift due to integration of inertial data, while inertial measurement units yield incremental displacements at high frequencies that initialize the vision algorithms and compensate for momentary loss of sight. This thesis analyzes two aspects of that problem. First, we survey direct visual tracking methods for pose estimation, and propose a new technique based on normalized cross-correlation, region- and pixel-wise weighting, and a Newton-like optimization. This method can accurately estimate pose under severe illumination changes. Second, we investigate the data fusion problem from a control point of view. The main results consist of novel observers for concurrent estimation of pose, IMU bias and self-calibration. We analyze the rotational dynamics using tools from nonlinear control, and provide stable observers on the group of rotation matrices. Additionally, we analyze the translational dynamics using tools from linear time-varying systems, and propose sufficient conditions for uniform observability. The observability analyses allow us to prove uniform stability of the proposed observers. The proposed visual method and nonlinear observers are tested and compared to classical methods using several simulations and experiments with real visuo-inertial data.
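As a small illustration of the observer viewpoint, the following sketch implements a discrete-time complementary attitude observer on the rotation group, driven by a gyro and one measured reference direction; gains, IMU bias states, and the thesis's observability analysis are omitted:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def expm_so3(w):
    """Rodrigues formula: exponential map of a rotation vector."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = skew(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def attitude_observer_step(R_hat, omega_meas, v_body, v_world, kp, dt):
    """One step of a complementary attitude observer on SO(3).

    omega_meas : gyro reading (rad/s)
    v_body     : reference direction measured in the body frame
                 (e.g. normalized gravity from the accelerometer)
    v_world    : the same direction expressed in the world frame
    """
    # Innovation: misalignment between measured and predicted directions.
    v_pred = R_hat.T @ v_world
    sigma = np.cross(v_body, v_pred)
    # Integrate the corrected angular velocity on the rotation group, so
    # the estimate stays a valid rotation matrix by construction.
    return R_hat @ expm_so3((omega_meas + kp * sigma) * dt)
```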
186

3D detection and pose estimation of medical staff in operating rooms using RGB-D images

Kadkhodamohammadi, Abdolrahim 01 December 2016 (has links)
In this thesis, we address the two problems of person detection and pose estimation in Operating Rooms (ORs), which are key ingredients in the development of surgical assistance applications. We perceive the OR using compact RGB-D cameras that can be conveniently integrated in the room. These sensors provide complementary information about the scene, which enables us to develop methods that can cope with numerous challenges present in the OR, e.g. clutter, textureless surfaces and occlusions. We present novel part-based approaches that take advantage of depth, multi-view and temporal information to construct robust human detection and pose estimation models. Evaluation is performed on new single- and multi-view datasets recorded in operating rooms. We demonstrate very promising results and show that our approaches outperform state-of-the-art methods on this challenging data acquired during real surgeries.
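The part-based models themselves do not fit in a listing entry, but the common first step with RGB-D data, back-projecting depth pixels into a 3D point cloud, is easy to sketch (pinhole model; the intrinsics below are placeholder values):

```python
import numpy as np

def depth_to_pointcloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (meters) to 3D points in the camera frame.

    Standard pinhole back-projection; the intrinsics would come from the
    RGB-D camera's calibration.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

# Example with a synthetic depth map.
depth = np.full((480, 640), 2.0)       # flat wall 2 m away
cloud = depth_to_pointcloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)
```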
187

Dynamic Headpose Classification and Video Retargeting with Human Attention

Anoop, K R January 2015 (has links) (PDF)
Over the years, extensive research has been devoted to the study of people's head pose due to its relevance in security, human-computer interaction and advertising, as well as in cognitive, neuro- and behavioural psychology. One of the main goals of this thesis is to estimate people's 3D head orientation as they move freely in naturalistic settings such as parties and supermarkets. Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult, as the captured faces are at low resolution with a blurred appearance. Labelling sufficient training data for head pose estimation in such settings is also difficult due to the motion of targets and the large possible range of head orientations. Domain adaptation approaches are useful for transferring knowledge from the training source to test target data with different attributes, minimizing target data labelling efforts in the process. This thesis examines the use of transfer learning for efficient multi-view head pose classification. The relationship between head pose and facial appearance is first learned from many labelled examples corresponding to the source data, and domain adaptation techniques are then employed to transfer this knowledge to the target data. Three challenging situations are addressed: (I) the ranges of head poses in the source and target images differ; (II) source images capture a stationary person while target images capture a moving person whose facial appearance varies due to changing perspective and scale; and (III) a combination of (I) and (II). All proposed transfer learning methods are thoroughly tested and benchmarked on DPOSE, a newly compiled dataset for head pose classification. This thesis also introduces a novel signature representation for describing object sets via covariance descriptors: Covariance Profiles (CPs). A CP is well suited to representing a set of similarly related objects; CPs posit that the covariance matrices pertaining to a specific entity share the same eigen-structure. Such a representation is not only compact but also eliminates the need to store all the training data. Experiments on images as well as videos are shown for applications such as object-track clustering and head pose estimation using CPs.

In the second part, human gaze is explored for interest point detection in video retargeting. Regions in video streams that attract human interest contribute significantly to human understanding of the video, and predicting salient and informative Regions of Interest (ROIs) from a sequence of eye movements is a challenging problem. This thesis proposes an interactive human-in-the-loop framework to model eye movements and predict visual saliency in yet-unseen frames. Eye-tracking and video content are used to model visual attention in a manner that accounts for temporal discontinuities due to sudden eye movements, noise and behavioural artefacts. Gaze buffering for eye-gaze analysis, and its fusion with content-based features, are proposed; the method uses eye-gaze information along with bottom-up and top-down saliency to boost the importance of image pixels. The resulting robust visual saliency prediction is instantiated for content-aware video retargeting.
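A region covariance descriptor of the kind CPs build on can be computed in a few lines; the per-pixel feature set below is a common choice from the covariance-descriptor literature, not necessarily the one used in the thesis:

```python
import numpy as np

def region_covariance(image_gray):
    """Covariance descriptor of an image region.

    Per-pixel feature vector: (x, y, intensity, |Ix|, |Iy|).
    """
    h, w = image_gray.shape
    Iy, Ix = np.gradient(image_gray.astype(np.float64))
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    F = np.stack([xs.ravel(), ys.ravel(), image_gray.ravel(),
                  np.abs(Ix).ravel(), np.abs(Iy).ravel()])
    return np.cov(F)                    # 5 x 5 symmetric positive-definite

# Two regions of the same entity should yield nearby covariances; a CP
# further assumes their matrices share an eigen-structure.
patch = np.random.rand(32, 32)
C = region_covariance(patch)
eigvals, eigvecs = np.linalg.eigh(C)   # shared eigenbasis is the CP premise
```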
188

Study of a technique for handling dead-times in object tracking operations using visual servoing

Saqui, Diego 22 May 2014 (has links)
Visual servoing is a technique that uses computer vision to acquire visual information (via a camera) and a closed-loop control system to control robots. A typical application of visual servoing is tracking objects on conveyors in industrial environments. Compared to other types of sensors, visual servoing has the advantage of obtaining a large amount of information from the environment and greater flexibility in operations. A disadvantage is the delays, known as dead-times or time-delays, that can occur while processing visual information in computer vision tasks or in other control-system tasks that require substantial processing capacity. Dead-times in visual servoing applied to industrial operations, such as tracking objects on conveyors, are critical and can negatively affect production capacity in manufacturing environments. Several methodologies for this problem can be found in the literature, many of them based on the Kalman filter. In this work, a technique based on the Kalman filter formulation, previously studied for predicting the future pose of objects in linear motion, was selected. This methodology was studied in detail, tested through simulations, and analyzed for other motions and some applications. Three types of experiments were carried out: one for different types of motion, and two others applying the filter to different signals in the velocity controller. The results for object motion showed that the technique is able to estimate the future pose of objects moving along straight lines and smooth curves, but is ineffective under drastic changes in motion. With respect to the signal filtered in the velocity controller, the methodology proved applicable (under the motion conditions above) only for estimating the object's pose after the dead-times caused by computer vision; this estimate is subsequently used to compute the object's future error relative to the robotic manipulator, which in turn is used to compute the robot's velocity. Applying the technique directly to the error used to compute the velocity command for the robot did not produce good results. Given these results, the methodology can be applied to tracking objects that move along straight lines and smooth curves, as with objects transported by conveyors in industrial environments.
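The core of such a dead-time compensator is a constant-velocity Kalman filter that extrapolates the filtered state over the processing delay. A minimal sketch, with illustrative noise levels and timings:

```python
import numpy as np

# Constant-velocity Kalman filter used to predict an object's position
# ahead by the vision dead-time. Matrices follow the standard textbook
# formulation; noise levels and the dead-time are illustrative.
dt = 0.033                      # camera frame period (s)
dead_time = 0.10                # processing delay to compensate (s)

F = np.array([[1, 0, dt, 0],    # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])    # only position is measured
Q = 1e-3 * np.eye(4)
Rn = 1e-2 * np.eye(2)

x = np.zeros(4)
P = np.eye(4)

def kf_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the (delayed) vision measurement z = [x, y]
    S = H @ P @ H.T + Rn
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

def predict_ahead(x, horizon):
    """Extrapolate the filtered state over the dead-time."""
    n = int(round(horizon / dt))
    return np.linalg.matrix_power(F, n) @ x

x, P = kf_step(x, P, np.array([0.50, 0.20]))
print(predict_ahead(x, dead_time)[:2])   # predicted position after delay
```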
189

Positioning and movement of a humanoid robot using images from an external mobile camera

Nogueira, Marcelo Borges 20 December 2005 (has links)
This work proposes a method to localize a simple humanoid robot, without embedded sensors, using images taken from an external camera and image processing techniques. Once the robot is localized relative to the camera, and supposing the position of the camera relative to the world is known, we can compute the position of the robot relative to the world. To move the camera around the workspace, we place it on a second, wheeled mobile robot that has a precise localization system. Once the humanoid is localized in the workspace, we can take the actions needed to move it; simultaneously, we move the camera robot so that it keeps a good view of the humanoid. The main contributions of this work are: the idea of using a second mobile robot to aid the navigation of a humanoid robot that lacks advanced embedded electronics; the choice of intrinsic and extrinsic camera calibration methods appropriate to the task, especially for the real-time part; and the collaborative algorithm for simultaneous navigation of the two robots.
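The intrinsic calibration step mentioned above typically follows the standard OpenCV checkerboard recipe, sketched below; board dimensions and image file names are assumptions (the extrinsic pose of the camera would then come from the camera robot's own localization system):

```python
import cv2
import numpy as np

# Intrinsic calibration from checkerboard views. Board size, square
# size, and file names are placeholders; the images are assumed to exist.
PATTERN = (9, 6)                 # inner corners per row, per column
SQUARE = 0.025                   # square edge length in meters

objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points, size = [], [], None
for i in range(20):
    img = cv2.imread(f"view_{i:02d}.png", cv2.IMREAD_GRAYSCALE)
    if img is None:
        continue
    found, corners = cv2.findChessboardCorners(img, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        size = img.shape[::-1]   # (width, height)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, size, None, None)
print("RMS reprojection error:", rms)
```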
190

3D Pose estimation of continuously deformable instruments in robotic endoscopic surgery

Cabras, Paolo 24 February 2016 (has links)
Knowing the 3D position of robotized instruments can be useful in the surgical context, e.g. for their automatic control or for gesture guidance. We propose two methods to infer the 3D pose of an instrument with a single bending section and equipped with colored markers, using only the images provided by the monocular camera embedded in the endoscope. A graph-based method is used to segment the markers, and their corners are extracted by detecting color transitions along Bézier curves fitted on edge points. These features are used to estimate the 3D pose of the instrument using an adaptive model that takes into account the mechanical play of the system. Since this method can be affected by model uncertainties, the image-to-3D-pose function can instead be learned from a training set. We opted for two techniques, which we improved: Radial Basis Function networks with Gaussian kernels and Locally Weighted Projection regression. The proposed methods are validated on a robotic experimental cell and on in-vivo sequences.
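A Gaussian-kernel RBF regressor of the kind used for the learned image-to-pose function can be sketched as ridge regression on kernel features; dimensions, centers, and data below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# RBF network with Gaussian kernels fitted by ridge regression: learn a
# mapping from image features to pose parameters. All dimensions and
# data are illustrative stand-ins for the thesis's marker features and
# bending-section pose.
def rbf_features(X, centers, gamma):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n, d_in, m = 500, 6, 50                  # samples, feature dim, kernels
X = rng.normal(size=(n, d_in))           # e.g. marker corner coordinates
Y = np.column_stack([np.sin(X[:, 0]),    # toy 3-DoF "poses"
                     X[:, 1] * X[:, 2],
                     X[:, 3]])

centers = X[rng.choice(n, m, replace=False)]
Phi = rbf_features(X, centers, gamma=0.5)
lam = 1e-3                               # ridge regularization
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ Y)

# Predict the pose for a new image's features.
x_new = rng.normal(size=(1, d_in))
print(rbf_features(x_new, centers, 0.5) @ W)
```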
