11 |
Segmentation d'objets mobiles par fusion RGB-D et invariance colorimétrique / Moving objects segmentation by RGB-D fusion and color constancy. Murgia, Julian, 24 May 2016
This PhD thesis falls within the scope of video surveillance, and focuses more precisely on the robust detection of moving objects in image sequences. Good detection of moving objects is an indispensable prerequisite to any processing applied to these objects in many applications, such as car or people tracking, passenger counting in public transport, detection of dangerous situations in specific environments (level crossings, pedestrian crossings, intersections, etc.), or the control of autonomous vehicles. A great number of these applications rely on a computer vision system, whose reliability demands robustness to difficult conditions often caused by illumination (day/night changes, cast shadows), weather (rain, wind, snow), and the topology of the observed scene itself (occlusions). The work presented in this thesis aims to improve the quality of moving-object detection in indoor and outdoor environments, at any time of day. To this end, we propose three strategies that can be combined:
i) the use of colorimetric invariants and/or color representation spaces with invariance properties;
ii) the use of a passive stereoscopic camera (in outdoor environments) and an active Microsoft Kinect camera (in indoor environments) in addition to the color camera, in order to partially reconstruct the 3D environment of the scene and provide the moving-object detection algorithm with an additional dimension, namely depth information, for characterizing pixels;
iii) a new fusion algorithm based on fuzzy logic, combining color and depth information while allowing a margin of uncertainty as to whether a pixel belongs to the background or to a moving object.
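To make strategy iii) concrete, the sketch below is a minimal illustration of fuzzy color/depth fusion, not code from the thesis: it assumes the background model already yields per-pixel normalised color and depth distances, and all membership thresholds are invented for illustration.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular fuzzy membership function on [a, c], peaking at b."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9),
                              (c - x) / (c - b + 1e-9)), 0.0, 1.0)

def fuse_foreground(color_dist, depth_dist):
    """Combine per-pixel color and depth deviations from the background model
    into a fuzzy foreground degree in [0, 1]."""
    fg_color = triangular(color_dist, 0.2, 1.0, 1.8)   # thresholds are invented
    fg_depth = triangular(depth_dist, 0.1, 1.0, 1.9)
    # Average of t-norm (min) and t-conorm (max): one confident modality can
    # compensate for a weak one while full agreement stays decisive.
    return 0.5 * np.minimum(fg_color, fg_depth) + 0.5 * np.maximum(fg_color, fg_depth)

# Example: color barely deviates, depth clearly disagrees with the background
print(fuse_foreground(np.array([0.3]), np.array([1.2])))
```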
|
12 |
Adaptive registration using 2D and 3D features for indoor scene reconstruction. / Registro adaptativo usando características 2D e 3D para reconstrução de cenas em ambientes internos. Juan Carlos Perafán Villota, 27 October 2016
Pairwise alignment between point clouds is an important task in building 3D maps of indoor environments from partial information. The combination of 2D local features with the depth information provided by RGB-D cameras is often used to improve such alignment. However, under varying lighting or low visual texture, indoor pairwise frame registration with sparse 2D local features is not particularly robust: features are hard to detect, leading to misalignment between consecutive pairs of frames. The use of 3D local features can be a solution, as such features are computed from the 3D points themselves and are resistant to variations in visual texture and illumination. Because varying conditions in real indoor scenes are unavoidable, we propose a new framework that improves pairwise frame alignment through an adaptive combination of sparse 2D and 3D features, based on the levels of geometric structure and visual texture contained in each scene. Experiments on datasets that include unrestricted RGB-D camera motion and natural changes in illumination show that the proposed framework convincingly outperforms methods using 2D or 3D features alone, as reflected in higher alignment accuracy.
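One plausible way to realise such an adaptive combination is sketched below in Python. This is a speculative toy based only on the abstract's description, not the thesis's method: a texture score from intensity gradients, a structure score from depth gradients, and a convex weighting of 2D versus 3D correspondences; every threshold is invented.

```python
import numpy as np

def texture_score(gray):
    """Fraction of pixels with strong intensity gradients (visual texture)."""
    gy, gx = np.gradient(gray.astype(float))
    return float((np.hypot(gx, gy) > 10.0).mean())        # threshold invented

def structure_score(depth):
    """Fraction of valid pixels with strong depth variation (3D structure)."""
    valid = depth > 0
    gy, gx = np.gradient(depth.astype(float))
    strong = (np.hypot(gx, gy) > 0.02) & valid            # threshold invented
    return float(strong.sum() / max(int(valid.sum()), 1))

def feature_weights(gray, depth):
    """Relative weights for 2D vs 3D correspondences in the alignment cost."""
    t, s = texture_score(gray), structure_score(depth)
    w2d = t / (t + s + 1e-9)
    return w2d, 1.0 - w2d      # scale 2D and 3D match residuals by these

# Example with synthetic frames
rng = np.random.default_rng(0)
print(feature_weights(rng.uniform(0, 255, (48, 64)), rng.uniform(0.5, 4.0, (48, 64))))
```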
|
13 |
Unconstrained Gaze Estimation Using RGB-D Camera. / Estimation du regard avec une caméra RGB-D dans des environnements utilisateur non-contraints. Kacete, Amine, 15 December 2016
In this thesis, we tackle the problem of automatic gaze estimation in unconstrained user environments. This work belongs to the computer vision research field applied to the perception of humans and their behavior. Many industrial solutions are commercialized today and provide accurate gaze estimates, but they often rely on complex hardware, such as infrared cameras embedded in a head-mounted device or in a remote system, making them intrusive, heavily constrained by the user's environment, and unsuitable for large-scale public use. We focus instead on estimating gaze with cheap, low-resolution, non-intrusive devices such as the Kinect sensor, and we develop new methods to address challenging conditions such as head-pose changes, varying illumination, and large user-sensor distances. In this work we investigated several gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: one feature-based, the other semi-appearance-based. The major limitation of these paradigms lies in their design, which assumes total independence between the eye-appearance and head-pose components. To overcome this limitation, we converged on a novel paradigm that unifies the two components by building a global gaze manifold; we explored two global approaches in our experiments, using synthetic and real RGB-D gaze samples respectively.
|
14 |
[en] GRAPH OPTIMIZATION AND PROBABILISTIC SLAM OF MOBILE ROBOTS USING AN RGB-D SENSOR / [pt] OTIMIZAÇÃO DE GRAFOS E SLAM PROBABILÍSTICO DE ROBÔS MÓVEIS USANDO UM SENSOR RGB-D. 23 March 2021
[en] Mobile robots have a wide range of applications, including autonomous vehicles, industrial robots and unmanned aerial vehicles. Autonomous mobile navigation is a challenging subject due to the high uncertainty and non-linearity inherent to unstructured environments, robot motion and sensor measurements. To perform autonomous navigation, a robot needs a map of the environment and an estimate of its own pose with respect to the global coordinate system. However, the robot usually has no prior knowledge about the environment and has to create a map from sensor information while localizing itself at the same time, a problem called Simultaneous Localization and Mapping (SLAM). SLAM formulations use probabilistic algorithms to handle the uncertainties of the problem, and the graph-based approach is one of the state-of-the-art solutions for SLAM. For many years, laser range finders (LRFs) were the most popular sensor choice for SLAM; however, RGB-D sensors are an interesting alternative due to their low cost. This work presents an RGB-D SLAM implementation with a graph-based probabilistic approach. The proposed methodology uses the Robot Operating System (ROS) as middleware. The implementation is tested on a low-cost robot and on real-world datasets from the literature. The implementation of a pose-graph optimization tool for MATLAB is also presented.
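To give a flavour of the pose-graph optimization at the core of graph-based SLAM, here is a self-contained Python toy, not the thesis's ROS pipeline or MATLAB tool: three planar poses connected by odometry edges plus one slightly inconsistent loop closure, with the relative-pose residuals minimized by SciPy's nonlinear least-squares solver.

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def residuals(x, edges):
    """Relative-pose errors of all edges, plus an anchor on pose 0.

    x: flattened poses [x0, y0, th0, x1, y1, th1, ...]
    edges: (i, j, dx, dy, dth) with (dx, dy, dth) measured in frame i.
    """
    poses = x.reshape(-1, 3)
    res = [poses[0]]                            # gauge fix: pin pose 0 to the origin
    for i, j, dx, dy, dth in edges:
        xi, yi, thi = poses[i]
        xj, yj, thj = poses[j]
        c, s = np.cos(thi), np.sin(thi)
        pdx = c * (xj - xi) + s * (yj - yi)     # predicted motion of j in frame i
        pdy = -s * (xj - xi) + c * (yj - yi)
        res.append(np.array([pdx - dx, pdy - dy, wrap(thj - thi - dth)]))
    return np.concatenate(res)

# Three poses: two odometry edges and one slightly inconsistent loop closure.
edges = [(0, 1, 1.0, 0.0, 0.0),
         (1, 2, 1.0, 0.0, np.pi / 2),
         (2, 0, 0.0, 2.1, -np.pi / 2)]          # consistent value would be (0, 2, -pi/2)
sol = least_squares(residuals, np.zeros(9), args=(edges,))
print(sol.x.reshape(-1, 3))                     # optimised (x, y, theta) per pose
```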
|
15 |
Improving Visual Question Answering by Leveraging Depth and Adapting Explainability / Förbättring av Visual Question Answering (VQA) genom utnyttjandet av djup och anpassandet av förklaringsförmågan. Panesar, Amrita Kaur, January 2022
To produce smooth human-robot interactions, it is important for robots to be able to answer users' questions accurately and to provide a suitable explanation of how they arrive at the answer they give. In the wild, however, the user may ask the robot questions about aspects of the scene the robot is unfamiliar with, so it cannot answer correctly all of the time. To build trust in the robot and to resolve failure cases where an incorrect answer is given, we propose a method that applies Grad-CAM explainability to RGB-D data. Depth is a critical component in producing more intelligent robots that respond correctly most of the time, since some questions rely on spatial relations within the scene, for which 2D RGB data alone is insufficient. To our knowledge, this work is the first to leverage depth together with an explainability module to produce an explainable Visual Question Answering (VQA) system. Furthermore, we introduce a new dataset for the task of VQA on RGB-D data, VQA-SUNRGBD. We evaluate our explainability method against Grad-CAM on RGB data and find that ours produces better visual explanations. When we compare our proposed model on RGB-D data against the baseline VQN network on RGB data alone, we show that ours outperforms it, particularly on questions relating to depth, such as those about the proximity of objects and their relative positions.
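Since Grad-CAM is a general, well-documented technique, it can be sketched independently of the thesis's model. The minimal PyTorch version below is a generic illustration, not the thesis's VQA architecture; the toy network and layer choice are assumptions.

```python
import torch
import torch.nn as nn

def grad_cam(model, conv_layer, inputs, class_idx):
    """Minimal Grad-CAM: weight the layer's activations by the spatially
    averaged gradients of the target class score, then ReLU and normalise."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(inputs)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)   # GAP over spatial dims
    cam = torch.relu((weights * acts['a']).sum(dim=1))    # weighted channel sum
    return cam / (cam.max() + 1e-8)

# Toy usage with a tiny CNN (purely illustrative)
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5))
heat = grad_cam(net, net[0], torch.randn(1, 3, 32, 32), class_idx=2)
print(heat.shape)   # torch.Size([1, 32, 32])
```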
|
16 |
RGB-D Deep Learning keypoints and descriptors extraction Network for feature-based Visual Odometry systems / RGB-D Deep Learning-nätverk för utvinning av nyckelpunkter och deskriptorer för nyckelpunktsbaserad Visuella Odometri. Bennasciutti, Federico, January 2022
Feature extractors in Visual Odometry pipelines rarely exploit depth signals, even though depth sensors and RGB-D cameras are commonly used in later stages of Visual Odometry systems. Nonetheless, the depth sensors of RGB-D cameras work even without external light and can provide feature extractors with additional structural information otherwise invisible in RGB images. Deep learning feature extractors, which have recently been shown to outperform their classical counterparts, still exploit only RGB information. Against this background, this thesis presents a self-supervised deep learning feature extraction algorithm that takes both RGB and depth signals as input. The proposed approach builds on existing deep learning feature extractors, adapting the architecture and the training procedure to introduce the depth signal. The developed RGB-D system is compared with an RGB-only feature extractor in a qualitative study of keypoint locations and a quantitative evaluation of pose estimation. The qualitative evaluation demonstrates that the proposed system exploits information from both the RGB and depth domains and adapts robustly to the degradation of either input signal. The pose-estimation results indicate that the RGB-D system performs comparably to the RGB-only one in normal and low-light conditions. Thanks to its use of depth information, the RGB-D feature extractor can still operate, with only limited performance degradation, even in completely dark environments, where RGB methods fail for lack of input information. Together, the qualitative and quantitative results suggest that the proposed system extracts features from both the RGB and depth input domains and can autonomously transition from normal brightness to a no-light environment, exploiting the depth signal to compensate for the degraded RGB information.
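The abstract mentions adapting the architecture to introduce the depth signal. One common concrete move, shown below as a hedged PyTorch sketch rather than the thesis's actual code, is to widen the network's first convolution to four input channels and initialise the new depth channel from the mean of the RGB weights, so any pretrained behavior is preserved at the start of training.

```python
import torch
import torch.nn as nn

def widen_first_conv(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a copy of `conv` accepting RGB-D (4-channel) input.

    The depth channel's weights start as the mean of the RGB weights,
    so the network initially treats depth like a grey image."""
    new = nn.Conv2d(4, conv.out_channels, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = conv.weight
        new.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

# Example: adapt the stem of a SuperPoint-like encoder (hypothetical shapes)
stem = nn.Conv2d(3, 64, 3, stride=1, padding=1)
rgbd_stem = widen_first_conv(stem)
out = rgbd_stem(torch.randn(1, 4, 120, 160))   # RGB-D input now accepted
print(out.shape)
```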
|
17 |
Robust Registration of ToF and RGB-D Camera Point Clouds / Robust registrering av punktmoln från ToF och RGB-D kamera. Chen, Shuo, January 2021
This thesis presents a comparison of the M-estimator, BLAVE, and RANSAC methods for point cloud registration. The comparison is performed empirically by applying all the estimators to simulated data corrupted with noise and gross errors, to ToF data, and to RGB-D data. In this comparison, RANSAC proves to be the fastest and most robust estimator. The 2D feature extraction methods (Harris corner detector, SIFT, and SURF) and the 3D extraction method ISS are also compared on real-world scene data; across the different datasets, SIFT extracts the most feature points with the most accurate features. Finally, the ICP algorithm is used to refine the registration result based on the initial transform estimate.
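For reference, the RANSAC scheme compared here can be sketched over putative point correspondences; the Python below is a generic textbook variant, not the thesis's code: sample three pairs, fit a rigid transform with the Kabsch algorithm, count inliers, and refit on the best inlier set.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) that best maps points P onto Q (both N x 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_registration(P, Q, iters=500, thresh=0.05, seed=0):
    """RANSAC over putative correspondences P[i] <-> Q[i]: sample three pairs,
    fit a rigid transform, keep the hypothesis with the most inliers."""
    rng = np.random.default_rng(seed)
    best_R, best_t = np.eye(3), np.zeros(3)
    best_inl = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = kabsch(P[idx], Q[idx])
        inl = np.linalg.norm(P @ R.T + t - Q, axis=1) < thresh
        if inl.sum() > best_inl.sum():
            best_R, best_t, best_inl = R, t, inl
    if best_inl.sum() >= 3:                     # refine on the full inlier set
        best_R, best_t = kabsch(P[best_inl], Q[best_inl])
    return best_R, best_t, best_inl

# Toy check: rotated and shifted cloud with a few gross outliers
rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, (100, 3))
th = 0.3
Rz = np.array([[np.cos(th), -np.sin(th), 0], [np.sin(th), np.cos(th), 0], [0, 0, 1]])
Q = P @ Rz.T + np.array([0.2, -0.1, 0.05])
Q[:10] += rng.normal(0, 1.0, (10, 3))           # simulate gross correspondence errors
R, t, inl = ransac_registration(P, Q)
print(inl.sum(), np.round(t, 3))
```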
|
18 |
Human Object Interaction Recognition / Reconnaissance d'actions humaines et d'interaction avec l'objet. Meng, Meng, 09 January 2017
In this thesis, we investigate human-object interaction recognition using skeleton data and local depth information provided by RGB-D sensors. We address two main applications: human-object interaction recognition and abnormal activity recognition. First, we propose a spatio-temporal modeling of human-object interaction videos for online and offline recognition. For the spatial modeling, we propose a low-level feature and an object-related distance feature, which are used for online human-object interaction recognition and abnormal gait detection. We then propose an object feature, a rough description of the object's shape and size, as a new feature for modeling human-object interactions; this object feature is fused with the low-level feature for online recognition. For the temporal modeling, we propose a shape-analysis framework based on the low-level feature and the object-related distance feature for full-sequence, offline recognition. Experiments on two representative benchmarks demonstrate that the proposed method is effective and discriminative for human-object interaction analysis. Second, we extend the study to abnormal gait detection using the online classification framework; experiments conducted on the benchmark under state-of-the-art settings show the effectiveness of the proposed method. Finally, we collected a multi-view human-object interaction dataset involving normal and abnormal human behaviors with RGB-D sensors; we test our model on this new dataset and evaluate the potential of the proposed approach.
|
19 |
An Obstacle Avoidance System for the Visually Impaired Using 3-D Point Cloud Processing. Taylor, Evan Justin, 01 December 2017
The long white cane offers many benefits for the blind and visually impaired. Still, many users report being injured both indoors and outdoors while using it, frequently because the long white cane cannot detect obstacles above the user's waist. This thesis presents a system that augments the capabilities of the long white cane by sensing the environment around the user, creating a map of obstacles within it, and providing simple haptic feedback. The proposed augmented-cane system uses the Asus Xtion Pro Live infrared depth sensor to capture the user's environment as a point cloud, which is processed with the open-source Point Cloud Library (PCL) and the Robot Operating System (ROS). The points representing the ground plane are extracted to more clearly delineate potential obstacles, and the system determines the nearest point for each 1-degree bin across the horizontal view. These nearest points are recorded as a ROS LaserScan message and drive a simple haptic feedback system whose rumble intensity is based on two different cost functions. Twenty-two volunteers participated in a user demonstration showing that the augmented-cane system can successfully communicate the presence of obstacles, including obstacles above waist height, to blindfolded users, who reported a sense of safety and confidence in the system's abilities. The system requires additional development before it could be considered a viable product for the visually impaired.
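The nearest-point-per-degree step lends itself to a compact sketch. The NumPy toy below illustrates the idea only and is not the thesis's PCL/ROS code; the 58-degree horizontal field of view and the camera-frame axis convention are assumptions.

```python
import numpy as np

def nearest_per_degree(points, fov_deg=58):
    """Collapse an obstacle point cloud (ground plane already removed) into
    the nearest obstacle distance per 1-degree horizontal bin.

    points: N x 3 array in the camera frame (x right, y down, z forward).
    Returns fov_deg ranges in metres, inf where a bin saw no obstacle."""
    angles = np.degrees(np.arctan2(points[:, 0], points[:, 2]))  # bearing from forward
    dists = np.linalg.norm(points[:, [0, 2]], axis=1)            # ignore height
    bins = np.floor(angles + fov_deg / 2).astype(int)            # 0 .. fov_deg - 1
    ok = (bins >= 0) & (bins < fov_deg)
    ranges = np.full(fov_deg, np.inf)
    np.minimum.at(ranges, bins[ok], dists[ok])                   # per-bin minimum
    return ranges

# Example: one obstacle straight ahead, one about 20 degrees to the right
cloud = np.array([[0.0, 0.0, 1.5], [0.8, -0.2, 2.2]])
print(nearest_per_degree(cloud))
```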
|
20 |
Capture and generalisation of close interaction with objects. Sandilands, Peter James, January 2015
Robust manipulation capture and retargeting have been a longstanding goal in both animation and robotics. In this thesis I describe a new approach to capturing both the geometry and the motion of interactions with objects, dealing with the problem of occlusion through the use of magnetic systems and reconstructing the geometry with an RGB-D sensor alongside visual markers. This 'interaction capture' allows the scene to be described in terms of the spatial relationships between the character and the object, using novel topological representations such as the Electric Parameters, which parametrise the outer space of an object using properties of its surface. I describe the properties of these representations for motion generalisation and discuss how they can be applied to the problems of human-like motion generation and programming by demonstration. These generalised interactions are shown to be valid by retargeting grasping and manipulation to robots with dissimilar kinematics and morphology, using only local, gradient-based planning.
|