1 |
A novel augmented laser pointer interface and shared autonomy paradigm to enable object retrieval via an assistive robot
Hamilton, Kali. 15 May 2020 (has links)
Assistive robots have the potential to enable persons with motor disabilities to live more independent lives. Object retrieval has been rated a high-priority task for assistive robots. A key challenge in creating effective assistive robots lies in designing control interfaces that enable the human user to control the robot. This thesis builds on prior work that uses a laser pointer to allow the person to intuitively communicate their goals to a robot by creating a `clickable world'. Specifically, this thesis reduces the infrastructure needed for the robot to recognize the user's goal by augmenting the laser pointer with a small camera, an inertial measurement unit (IMU), and a laser rangefinder to estimate the location of the object to be grasped. The robot then drives to the approximate target location given by input from the laser pointer while using an onboard camera to detect an object near the target location. Local autonomy on the robot is used to visually navigate to the detected object to enable object retrieval.
Results show a successful proof of concept, demonstrating reasonable detection of user intent on a 1.23 m × 1.83 m test grid. Estimates of object location in the odometry frame fell within the range required for successful local-autonomy object retrieval in an environment with a single object. Future work includes testing on a wide variety of dropped objects and in cluttered environments, which is needed to validate the effectiveness of the system for potential end users.
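The geometry behind this kind of estimate can be sketched simply: given the pointer's orientation (from the IMU) and the rangefinder distance, the target is the endpoint of the laser beam. The sketch below is an illustration with invented function and parameter names, ignoring roll and sensor offsets; it is not the thesis's implementation.

```python
import math

def estimate_target(pointer_pos, yaw_deg, pitch_deg, range_m):
    """Project a rangefinder reading along the pointer's IMU-derived
    orientation to get a world-frame target point (simplified: no roll,
    no sensor offsets)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    # Unit direction vector of the laser beam.
    dx = math.cos(pitch) * math.cos(yaw)
    dy = math.cos(pitch) * math.sin(yaw)
    dz = math.sin(pitch)
    x0, y0, z0 = pointer_pos
    return (x0 + range_m * dx, y0 + range_m * dy, z0 + range_m * dz)

# Pointer held 1 m up, aimed 30 degrees below horizontal, reading 2 m:
# the beam should hit the floor (z = 0) about 1.73 m ahead.
target = estimate_target((0.0, 0.0, 1.0), yaw_deg=0.0, pitch_deg=-30.0, range_m=2.0)
```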
|
2 |
Rule Governance in an African White-necked Raven (Corvus albicollis)
Cory, Emily Faun. January 2012 (has links)
Rule governance is critical to human society. However, could rule governance be found in non-human animals? A six-year-old female African white-necked raven (Corvus albicollis) named Shade had previously followed informal verbal commands to retrieve specified objects correctly. This ability was tested using two different methods. Both methods involved the researcher verbally asking the bird to retrieve one of two objects, either from the same room or an adjacent room. While initial results were not significantly different from chance, review of trial recordings revealed that it is possible to predict when the bird will retrieve an incorrect object based solely on specific behaviors, termed inattentive or uninterested. Trials marked as inattentive by observers were significantly more likely to be incorrect than correct. This indicates that the bird was capable of retrieving the correct object, but that she also occasionally, intentionally retrieved the incorrect object.
|
3 |
Advancing large scale object retrieval
Arandjelovic, Relja. January 2013 (links)
The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time. Such a system has a wide variety of applications including object or location recognition, video search, near duplicate detection and 3D reconstruction. The task is very challenging because of large variations in the imaged object appearance due to changes in lighting conditions, scale and viewpoint, as well as partial occlusions. A starting point of established systems which tackle the same task is detection of viewpoint invariant features, which are then quantized into visual words and efficient retrieval is performed using an inverted index. We make the following three improvements to the standard framework: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel discriminative method for query expansion; (iii) a new feature augmentation method. Scaling up to searching millions of images involves either distributing storage and computation across many computers, or employing very compact image representations on a single computer combined with memory-efficient approximate nearest neighbour search (ANN). We take the latter approach and improve VLAD, a popular compact image descriptor, using: (i) a new normalization method to alleviate the burstiness effect; (ii) vocabulary adaptation to reduce influence of using a bad visual vocabulary; (iii) extraction of multiple VLADs for retrieval and localization of small objects. We also propose a method, SCT, for extremely low bit-rate compression of descriptor sets in order to reduce the memory footprint of ANN. The problem of finding images of an object in an unannotated image corpus starting from a textual query is also considered. 
Our approach is to first obtain multiple images of the queried object using textual Google image search, and then use these images to visually query the target database. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. Current retrieval techniques work only for objects which have a light coating of texture, while failing completely for smooth (fairly textureless) objects best described by shape. We present a scalable approach to smooth object retrieval and illustrate it on sculptures. A smooth object is represented by its imaged shape using a set of quantized semi-local boundary descriptors (a bag-of-boundaries); the representation is suited to the standard visual word based object retrieval. Furthermore, we describe a method for automatically determining the title and sculptor of an imaged sculpture using the proposed smooth object retrieval system.
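The RootSIFT comparison mentioned above is simple enough to sketch: L1-normalise each SIFT descriptor and take the element-wise square root, so that Euclidean distance between transformed vectors corresponds to the Hellinger kernel on the originals. A minimal illustration on a toy histogram (function and variable names are our own):

```python
import math

def root_sift(descriptor):
    """RootSIFT: L1-normalise a SIFT descriptor, then take the
    element-wise square root. The result is automatically
    L2-normalised, and Euclidean distance between RootSIFT vectors
    corresponds to the Hellinger kernel on the original descriptors."""
    s = sum(descriptor)
    if s == 0:
        return [0.0] * len(descriptor)
    return [math.sqrt(v / s) for v in descriptor]

desc = [4.0, 1.0, 0.0, 4.0]   # toy 4-bin "SIFT" histogram
rs = root_sift(desc)
# RootSIFT vectors have unit L2 norm.
assert abs(sum(v * v for v in rs) - 1.0) < 1e-9
```

Because the transform leaves descriptor dimensionality unchanged, it can be dropped into an existing pipeline without increasing storage, which is the appeal noted in the abstract.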
|
4 |
Interactive Object Retrieval using Interpretable Visual Models
Rebai, Ahmed. 18 May 2011 (links) (PDF)
This thesis is an attempt to improve visual object retrieval by allowing users to interact with the system. Our solution lies in constructing an interactive system that allows users to define their own visual concept from a concise set of visual patches given as input. These patches, which represent the most informative clues of a given visual category, are trained beforehand with a supervised learning algorithm in a discriminative manner. Then, to specialize their models, users can give feedback on the model itself by choosing and weighting the patches they are confident in. The real challenge lies in generating concise and visually interpretable models. Our contribution relies on two points. First, in contrast to state-of-the-art approaches that use bag-of-words, we propose embedding local visual features without any quantization, which means that each component of the high-dimensional feature vectors used to describe an image is associated with a unique and precisely localized image patch. Second, we suggest using regularization constraints in the loss function of our classifier to favor sparsity in the models produced. Sparsity is indeed preferable for concision (a reduced number of patches in the model) as well as for decreasing prediction time. To meet these objectives, we developed a multiple-instance learning scheme using a modified version of the BLasso algorithm. BLasso is a boosting-like procedure that behaves in the same way as Lasso (Least Absolute Shrinkage and Selection Operator). It efficiently regularizes the loss function with an additive L1-constraint by alternating between forward and backward steps at each iteration. The method we propose here is generic in the sense that it can be used with any local features or feature sets representing the content of an image region.
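As a rough illustration of the forward/backward alternation described above, the following toy loop applies BLasso-style steps to a least-squares problem: forward steps grow whichever coefficient most reduces the loss, and backward steps shrink a coefficient whenever that lowers the L1-penalised objective. This is a didactic sketch under squared loss with invented names, not the thesis's multiple-instance implementation:

```python
def blasso_sketch(X, y, lam=0.05, eps=0.1, iters=200):
    """Toy BLasso-style loop for squared loss: forward steps grow the
    coefficient that most reduces the fit error; backward steps shrink
    a coefficient when that lowers the L1-penalised objective."""
    n, d = len(X), len(X[0])

    def loss(beta):
        return sum((sum(b * x for b, x in zip(beta, row)) - t) ** 2
                   for row, t in zip(X, y)) / (2 * n)

    def penalised(beta):
        return loss(beta) + lam * sum(abs(b) for b in beta)

    beta = [0.0] * d
    for _ in range(iters):
        # Forward: best single coordinate step of size eps (either sign).
        best_j, best_s, best_l = 0, eps, float("inf")
        for j in range(d):
            for s in (eps, -eps):
                trial = beta[:]
                trial[j] += s
                if loss(trial) < best_l:
                    best_j, best_s, best_l = j, s, loss(trial)
        # Backward: undo part of a coefficient if the penalised
        # objective strictly improves; otherwise take the forward step.
        stepped_back = False
        for j in range(d):
            if abs(beta[j]) >= eps:
                trial = beta[:]
                trial[j] -= eps if beta[j] > 0 else -eps
                if penalised(trial) < penalised(beta) - 1e-12:
                    beta, stepped_back = trial, True
                    break
        if not stepped_back:
            beta[best_j] += best_s
    return beta

# Toy data: y depends only on the first of three features (y = 2*x0),
# so the L1 penalty should keep the other coefficients at zero.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [2, 0, 1]]
y = [2, 0, 2, 4]
beta = blasso_sketch(X, y)   # expect a sparse solution near [2, 0, 0]
```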
|
5 |
Large scale image retrieval based on user generated content
Olivares Ríos, Ximena. 02 March 2011 (links)
Los sistemas online para compartir fotos proporcionan una valiosa fuente de contenidos generados por el usuario (UGC). La mayoría de los sistemas de recuperación de imágenes Web utilizan las anotaciones textuales para rankear los resultados; sin embargo, estas anotaciones no sólo ilustran el contenido visual de una imagen, sino que también describen situaciones subjetivas, espaciales, temporales y sociales, que complican la tarea de búsqueda basada en palabras clave.
La investigación en esta tesis se centra en cómo mejorar la recuperación de imágenes en sistemas de gran escala, es decir, la Web, combinando información proporcionada por los usuarios más el contenido visual de las imágenes. En el presente trabajo se exploran distintos tipos de UGC, tales como anotaciones de texto, anotaciones visuales y datos de click-through, así como diversas técnicas para combinar esta información con el objetivo de mejorar la recuperación de imágenes usando información visual.
En conclusión, la investigación realizada en esta tesis se centra en la importancia de incluir la información visual en distintas etapas de la recuperación de contenido. Combinando información visual con otras formas de UGC, es posible mejorar significativamente el rendimiento de un sistema de recuperación de imágenes y cambiar la experiencia del usuario en la búsqueda de contenidos multimedia en la Web. / Online photo sharing systems provide a valuable source of user-generated content (UGC). Most Web image retrieval systems use textual annotations to rank the results, although these annotations do not only illustrate the visual content of an image, but also describe subjective, spatial, temporal, and social dimensions, complicating the task of keyword-based search.
The research in this thesis is focused on how to improve the retrieval of images in a large-scale context, i.e. the Web, using information provided by users combined with visual content from images. Different forms of UGC are explored, such as textual annotations, visual annotations, and click-through data, as well as different techniques to combine these data to improve the retrieval of images using visual information.
In conclusion, the research conducted in this thesis focuses on the importance of including visual information in various steps of the retrieval of media content. Using visual information, in combination with various forms of UGC, can significantly improve retrieval performance and alter the user experience when searching for multimedia content on the Web.
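One generic way to combine textual UGC scores with visual scores is late fusion: normalise each score source and rank by a weighted sum. The sketch below is an illustration under our own assumptions (names, weighting scheme, and min-max normalisation are ours), not the thesis's actual model:

```python
def fuse_rankings(text_scores, visual_scores, alpha=0.6):
    """Late fusion of textual and visual relevance scores: min-max
    normalise each source to [0, 1], then rank images by a weighted
    sum (alpha weights the textual side)."""
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    t, v = normalise(text_scores), normalise(visual_scores)
    fused = {img: alpha * t.get(img, 0.0) + (1 - alpha) * v.get(img, 0.0)
             for img in set(t) | set(v)}
    return sorted(fused, key=fused.get, reverse=True)

# img2 is strong on both modalities, so it should rank first even
# though img1 wins on text alone and img3 wins on visuals alone.
text = {"img1": 0.9, "img2": 0.8, "img3": 0.1}
visual = {"img1": 0.2, "img2": 0.9, "img3": 0.95}
ranking = fuse_rankings(text, visual)
```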
|
6 |
Interactive Object Retrieval using Interpretable Visual Models / Recherche Interactive d'Objets à l'Aide de Modèles Visuels Interprétables
Rebai, Ahmed. 18 May 2011 (links)
L'objectif de cette thèse est d'améliorer la recherche d'objets visuels à l'aide de l'interactivité avec l'utilisateur. Notre solution est de construire un système intéractif permettant aux utilisateurs de définir leurs propres concepts visuels à partir de certains mots-clés visuels. Ces mots-clés visuels, qui en théorie représentent les mots visuels les plus informatifs liés à une catégorie d'objets, sont appris auparavant à l'aide d'un algorithme d'apprentissage supervisé et d'une manière discriminative. Le challenge est de construire des mots-clés visuels concis et interprétables. Notre contribution repose sur deux points. D'abord, contrairement aux approches existantes qui utilisent les sacs de mots, nous proposons d'employer les descripteurs locaux sans aucune quantification préalable. Deuxièmement, nous proposons d'ajouter une contrainte de régularisation à la fonction de perte de notre classifieur pour favoriser la parcimonie des modèles produits. La parcimonie est en effet préférable pour sa concision (nombre de mots visuels réduits) ainsi pour sa diminution du temps de prédiction. Afin d'atteindre ces objectifs, nous avons développé une méthode d'apprentissage à instances multiples utilisant une version modifiée de l'algorithme BLasso. Cet algorithme est une forme de boosting qui se comporte similairement au LASSO (Least Absolute Shrinkage and Selection Operator). Il régularise efficacement la fonction de perte avec une contrainte additive de type L1 et ceci en alternant entre des itérations en avant et en arrière. La méthode proposée est générique dans le sens où elle pourrait être utilisée avec divers descripteurs locaux voire un ensemble structuré de descripteurs locaux qui décrit une région locale de l'image. / This thesis is an attempt to improve visual object retrieval by allowing users to interact with the system. 
Our solution lies in constructing an interactive system that allows users to define their own visual concept from a concise set of visual patches given as input. These patches, which represent the most informative clues of a given visual category, are trained beforehand with a supervised learning algorithm in a discriminative manner. Then, and in order to specialize their models, users have the possibility to send their feedback on the model itself by choosing and weighting the patches they are confident of. The real challenge consists in how to generate concise and visually interpretable models. Our contribution relies on two points. First, in contrast to the state-of-the-art approaches that use bag-of-words, we propose embedding local visual features without any quantization, which means that each component of the high-dimensional feature vectors used to describe an image is associated to a unique and precisely localized image patch. Second, we suggest using regularization constraints in the loss function of our classifier to favor sparsity in the models produced. Sparsity is indeed preferable for concision (a reduced number of patches in the model) as well as for decreasing prediction time. To meet these objectives, we developed a multiple-instance learning scheme using a modified version of the BLasso algorithm. BLasso is a boosting-like procedure that behaves in the same way as Lasso (Least Absolute Shrinkage and Selection Operator). It efficiently regularizes the loss function with an additive L1-constraint by alternating between forward and backward steps at each iteration. The method we propose here is generic in the sense that it can be used with any local features or feature sets representing the content of an image region. / تعالج هذه الأطروحة مسألة البحث عن الأشياء في الصور الثابتة و هي محاولة لتحسين نتائج البحث المنتظرة عن طريق تفاعل المستخدم مع النظام . 
يتمثل الحل المقترح في تصميم نظام تفاعلي يتيح للمستخدم صياغة مفهومه المرئي عن طريق مجموعة مقتضبة من أجزاء صغيرة للصور هي عبارة عن كلمات مفاتيح قد تم تعلمها سابقا عن طريق تعلم آلي استنتاجي . يمكن للمستخدم حينئذ تخصيص أنموذجه أولا بالاختيار ثم بترجيح الأجزاء التي يراها مناسبة . يتمثل التحدي القائم في كيفية توليد نماذج مرئية مفهومة و مقتضبة . نكون قد ساهمنا في هذا المجال بنقطتين أساسيتين تتمثل الأولى في إدماج الواصفات المحلية للصور دون أي تكميم ، و بذلك يكون كل مكون من ناقلات الميزات ذات الأبعاد العالية مرتبط حصريا بمكان وحيد و محدد في الصورة . ثانيا ، نقترح إضافة قيود تسوية لدالة الخسارة من أجل التحصل على حلول متفرقة و مقتضبة . يساهم ذلك في تقلص عدد هذه الأجزاء المرئية و بالتالي في ربح إضافي لوقت التكهن . في إطار تحقيق الأهداف المرسومة ، قمنا بإعداد مشروع تعلم قائم على تعدد الأمثلة يرتكز أساسا على نسخة محورة لخوارزمية بلاسو . تجدر الإشارة في الأخير أنه يمكن توظيف هذا العمل باستخدام نوع أو عدة أنواع من الواصفات المحلية للصور.
|
7 |
Indexation et recherche de contenus par objet visuel / Object-based visual content indexing and retrieval
Bursuc, Andrei. 21 December 2012 (links)
La recherche d'objets vidéo basée sur le contenu lui-même est de plus en plus difficile et devient un élément obligatoire pour les moteurs de recherche vidéo. Cette thèse présente un cadre pour la recherche des objets vidéo définis par l'utilisateur et apporte deux grandes contributions. La première contribution, intitulée DOOR (Dynamic Object Oriented Retrieval), est un cadre méthodologique pour la recherche et la récupération des instances d'objets vidéo sélectionnés par un utilisateur, tandis que la seconde contribution concerne le support offert pour la recherche des vidéos, à savoir la navigation dans les vidéos, le système de récupération de vidéos et l'interface avec son architecture sous-jacente. Dans le cadre DOOR, l'objet comporte une représentation hybride obtenue par une sur-segmentation des images, consolidée par la construction de graphes d'adjacence et par l'agrégation de points d'intérêt. L'identification des instances d'objets à travers plusieurs vidéos est formulée comme un problème d'optimisation d'énergie approximant un problème NP-difficile. Les objets candidats sont des sous-graphes qui produisent une énergie optimale par rapport à la requête définie par l'utilisateur. Quatre stratégies d'optimisation sont proposées : Greedy, Greedy relâché, recuit simulé et GraphCut. La représentation de l'objet est encore améliorée par l'agrégation des points d'intérêt dans la représentation hybride, où la mesure de similarité repose sur une technique spectrale intégrant plusieurs types de descripteurs. Le cadre DOOR est capable de s'adapter à des archives vidéo à grande échelle grâce à l'utilisation d'une représentation sac-de-mots, enrichie d'un algorithme de définition et d'expansion de la requête basé sur une approche multimodale texte-image-vidéo.
Les techniques proposées sont évaluées sur plusieurs corpus de test TRECVID, ce qui prouve leur efficacité. La deuxième contribution, OVIDIUS (On-line VIDeo Indexing Universal System), est une plate-forme en ligne pour la navigation et la récupération des vidéos, intégrant le cadre DOOR. Les contributions de cette plate-forme portent sur le support assuré aux utilisateurs pour la recherche vidéo : navigation et récupération des vidéos, interface graphique. La plate-forme OVIDIUS dispose de fonctionnalités de navigation hiérarchique qui exploitent la norme MPEG-7 pour la description structurelle du contenu vidéo. L'avantage majeur de l'architecture proposée est sa structure modulaire, qui permet de déployer le système sur différents terminaux (fixes et mobiles), indépendamment des systèmes d'exploitation impliqués. Le choix des technologies employées pour chacun des modules composant la plate-forme est argumenté par rapport à d'autres options technologiques. / With the ever increasing amount of available video content on video repositories, the issue of content-based video object retrieval is growing in difficulty and becomes a mandatory feature for video search engines. The present thesis advances a user-defined video object retrieval framework and brings two major contributions. The first contribution is a methodological framework for retrieving instances of user-selected video objects, entitled DOOR (Dynamic Object Oriented Retrieval), while the second one concerns the support offered for video retrieval, namely the video navigation and retrieval system and interface and its underlying architecture. Under the DOOR framework, the user-defined video object comprises a hybrid representation obtained by over-segmenting the frames, constructing region adjacency graphs and aggregating interest points. The identification of object instances across multiple videos is formulated as an energy optimization problem approximating an NP-hard problem.
Object candidates are sub-graphs that yield an optimum energy towards the user-defined query. In order to obtain the optimum energy, four optimization strategies are proposed: Greedy, Relaxed Greedy, Simulated Annealing and GraphCut. The region-based object representation is further improved by the aggregation of interest points into a hybrid object representation. The similarity between an object and a frame is computed with the help of a spectral matching technique integrating both colorimetric and interest point descriptors. The DOOR framework is suited to large-scale video archives through the use of a Bag-of-Words representation enriched with a query definition and expansion mechanism based on a multi-modal text-image-video principle. The performance of the proposed techniques is evaluated on multiple TRECVID video datasets, proving their effectiveness. The second contribution is related to the user support for video retrieval (video navigation, video retrieval, graphical interface) and consists in the OVIDIUS (On-line VIDeo Indexing Universal System) on-line video browsing and retrieval platform. The OVIDIUS platform features hierarchical video navigation functionalities that exploit the MPEG-7 approach for structural description of video content. The DOOR framework is integrated in the OVIDIUS platform, ensuring the search functionalities of the system. The major advantage of the proposed system is its modular architecture, which makes it possible to deploy the system on various terminals (both fixed and mobile), independently of the operating systems involved. The choice of the technologies employed for each composing module of the platform is argued in comparison with other technological options. Finally, different scenarios and use cases for the OVIDIUS platform are presented.
|
8 |
Meta-Pseudo Labelled Multi-View 3D Shape Recognition / Meta-pseudomärking med Bilder från Flera Kameravinklar för 3D Objektigenkänning
Uçkun, Fehmi Ayberk. January 2023 (links)
The field of computer vision has long pursued the challenge of understanding the three-dimensional world. This endeavour is further fuelled by the increasing demand for technologies that rely on accurate perception of the 3D environment, such as autonomous driving and augmented reality. However, labelled data scarcity in the 3D domain continues to be a hindrance to extensive research and development. Semi-Supervised Learning is a valuable tool to overcome data scarcity, yet most state-of-the-art methods are primarily developed and tested for two-dimensional vision problems. To address this challenge, there is a need to explore innovative approaches that can bridge the gap between the 2D and 3D domains. In this work, we propose a technique that both leverages the existing abundance of two-dimensional data and makes state-of-the-art semi-supervised learning methods directly applicable to 3D tasks. Multi-View Meta Pseudo Labelling (MV-MPL) combines one of the best-performing architectures in 3D shape recognition, Multi-View Convolutional Neural Networks, with the state-of-the-art semi-supervised method Meta Pseudo Labelling. To evaluate the performance of MV-MPL, comprehensive experiments are conducted on the widely used shape recognition benchmarks ModelNet40, ShapeNetCore-v1, and ShapeNetCore-v2, as well as Objaverse-LVIS. The results demonstrate that MV-MPL achieves competitive accuracy compared to fully supervised models, even when only 10% of the labels are available. Furthermore, the study reveals that the object descriptors extracted from the MV-MPL model exhibit strong performance on shape retrieval tasks, indicating the effectiveness of the approach beyond classification objectives. Further analysis includes the evaluation of MV-MPL under more constrained scenarios, enhancements to the view aggregation and pseudo-labelling processes, and the exploration of the potential of employing multi-views as augmentations for semi-supervised learning.
/ Forskningsområdet för datorseende har länge strävat efter utmaningen att förstå den tredimensionella världen. Denna strävan drivs ytterligare av den ökande efterfrågan på teknologier som är beroende av en korrekt uppfattning av den tredimensionella miljön, såsom autonom körning och förstärkt verklighet. Dock fortsätter bristen på märkt data inom det tredimensionella området att vara ett hinder för omfattande forskning och utveckling. Halv-vägledd lärning (semi-supervised learning) framträder som ett värdefullt verktyg för att övervinna bristen på data, ändå är de flesta av de mest avancerade semisupervised-metoderna primärt utvecklade och testade för tvådimensionella problem inom datorseende. För att möta denna utmaning krävs det att utforska innovativa tillvägagångssätt som kan överbrygga klyftan mellan 2D- och 3D-domänerna. I detta arbete föreslår vi en teknik som både utnyttjar det befintliga överflödet av tvådimensionella data och gör det möjligt att direkt tillämpa de mest avancerade semisupervised-lärandemetoderna på 3D-uppgifter. Multi-View Meta Pseudo Labelling (MV-MPL) kombinerar en av de bästa arkitekturerna för 3D-formigenkänning, Multi-View Convolutional Neural Networks, tillsammans med den mest avancerade semisupervised-metoden, Meta Pseudo Labelling. För att utvärdera prestandan hos MV-MPL genomförs omfattande experiment på väl använda utvärderingar för formigenkänning: ModelNet40, ShapeNetCore-v1 och ShapeNetCore-v2. Resultaten visar att MV-MPL uppnår konkurrenskraftig noggrannhet jämfört med helt vägledda modeller, även när endast 10 % av etiketterna är tillgängliga. Dessutom visar studien att objektbeskrivningarna som extraherats från MV-MPL-modellen uppvisar en stark prestanda i formåterhämtningsuppgifter, vilket indikerar effektiviteten hos tillvägagångssättet bortom klassificeringsmål.
Vidare analys inkluderar utvärderingen av MV-MPL under mer begränsade scenarier, förbättringar av vyaggregerings- och pseudomärkningsprocesserna samt utforskning av potentialen att använda bilder från flera vinklar som en metod att få mer data för halv-vägledd lärande.
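The pseudo-labelling idea at the heart of MV-MPL can be illustrated in miniature: a model trained on the labelled pool assigns labels to unlabelled examples and keeps only the confident ones. The sketch below uses a nearest-centroid classifier with a margin-based confidence as a stand-in; it is a hypothetical simplification, not Meta Pseudo Labelling itself (which additionally updates the teacher from student feedback):

```python
import math

def nearest_centroid(point, centroids):
    """Return (label, margin): the nearest-centroid label and the gap
    between the two closest centroids, used as a confidence proxy."""
    dists = sorted((math.dist(point, c), lab) for lab, c in centroids.items())
    (d0, lab0), (d1, _) = dists[0], dists[1]
    return lab0, d1 - d0

def pseudo_label(labelled, unlabelled, threshold=0.5):
    """One round of pseudo-labelling: fit class centroids on labelled
    data, then adopt unlabelled points whose confidence margin clears
    the threshold (a stand-in for the teacher step of MPL)."""
    centroids = {}
    for lab in {l for _, l in labelled}:
        pts = [p for p, l in labelled if l == lab]
        centroids[lab] = tuple(sum(v) / len(pts) for v in zip(*pts))
    adopted = []
    for p in unlabelled:
        lab, margin = nearest_centroid(p, centroids)
        if margin >= threshold:
            adopted.append((p, lab))
    return adopted

labelled = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
unlabelled = [(0.2, 0.1), (5.4, 5.1), (2.8, 2.5)]  # last one is ambiguous
new_labels = pseudo_label(labelled, unlabelled, threshold=1.0)
```

The ambiguous point between the two clusters falls below the margin threshold and is left unlabelled, which is exactly the filtering behaviour that keeps pseudo-label noise down in semi-supervised training.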
|