1 |
Object localization in weakly labeled images and videos
Rochan, Mrigank 06 1900
We consider the problem of localizing objects in weakly labeled images/videos. An image/video (e.g., Flickr image and YouTube video) is weakly labeled if it is associated with a tag describing the main object present in the image/video. It is weakly labeled because the tag only indicates the presence/absence of the object, but does not provide the detailed spatial location of the object. Given an image/video with an object tag, our goal is to localize the object in it. In this thesis, we propose two novel techniques to handle this challenging problem. First, we build a video-specific object appearance model and then incorporate temporal consistency information to localize the object. Second, we make use of existing detectors of some other object classes (which we call "familiar objects") to build the appearance model of the unseen object class (i.e., the object of interest). Experimental results show the effectiveness of the proposed methods. / October 2016
|
2 |
Robust 2-D Model-Based Object Recognition
Cass, Todd A. 01 May 1988
Techniques, suitable for parallel implementation, for robust 2D model-based object recognition in the presence of sensor error are studied. Models and scene data are represented as local geometric features and robust hypothesis of feature matchings and transformations is considered. Bounds on the error in the image feature geometry are assumed constraining possible matchings and transformations. Transformation sampling is introduced as a simple, robust, polynomial-time, and highly parallel method of searching the space of transformations to hypothesize feature matchings. Key to the approach is that error in image feature measurement is explicitly accounted for. A Connection Machine implementation and experiments on real images are presented.
|
3 |
Real-time localization of balls and hands in videos of juggling using a convolutional neural network
Åkerlund, Rasmus January 2019
Juggling can be both a recreational activity that provides a wide variety of challenges to participants and an art form that can be performed on stage. Non-learning-based computer vision techniques, depth sensors, and accelerometers have been used in the past to augment these activities. These solutions either require specialized hardware or only work in a very limited set of environments. In this project, a video dataset of 54,000 annotated juggling frames was created, and a convolutional neural network was successfully trained to locate the balls and hands with high accuracy in a variety of environments. The network was sufficiently lightweight to provide real-time inference on CPUs. In addition, the locations of the balls and hands were recorded for thirty-six common juggling patterns, and small neural networks were trained that could categorize them almost perfectly. By building on the publicly available code, models and datasets that this project has produced, jugglers will be able to create interactive juggling games for beginners and novel audio-visual enhancements for live performances.
|
4 |
Detekce mobilního robotu zpracováním obrazu / Mobile robot detection using image processing
Novotný, Stanislav January 2012
This master's thesis deals with processing an image sequence taken by a statically placed camera above the plane of robot movement. First, methods for image segmentation and localization are described. Next, selected methods are implemented and compared on individual images. Finally, the selected methods are implemented in an algorithm for batch processing of an image sequence.
|
5 |
3D Position Estimation using Deep Learning
Pedrazzini, Filippo January 2018
The estimation of the 3D position of an object is one of the most important topics in the computer vision field. Since the final aim is to create automated solutions that can localize and detect objects in images, new high-performing models and algorithms are needed. Due to the lack of relevant information in single 2D images, approximating the 3D position can be considered a complex problem. This thesis describes a method based on two deep learning models, the image net and the temporal net, that can tackle this task. The former is a deep convolutional neural network intended to extract meaningful features from the images, while the latter exploits temporal information to reach a more robust prediction. This solution reaches a lower Mean Absolute Error than existing computer vision methods under different conditions and configurations. A new data-driven pipeline has been created to deal with 2D videos and extract the 3D information of an object. The same architecture can be generalized to different domains and applications. / Uppskattning av 3D-positionen för ett objekt är ett viktigt område inom datorseende. Då det slutliga målet är att skapa automatiserade lösningar som kan lokalisera och upptäcka objekt i bilder, behövs nya, högpresterande modeller och algoritmer. Bristen på relevant information i de enskilda 2D-bilderna gör att approximering av 3D-positionen blir ett komplext problem. Denna uppsats beskriver en metod baserad på två djupinlärningsmodeller: image net och temporal net. Den förra är ett djupt nätverk som kan extrahera meningsfulla egenskaper från bilderna, medan den senare utnyttjar den tidsmässiga informationen för att kunna göra mer robusta förutsägelser. Denna lösning erhåller ett lägre genomsnittligt absolut fel jämfört med existerande metoder, under olika villkor och konfigurationer. En ny datadriven arkitektur har skapats för att hantera 2D-videoklipp och extrahera 3D-informationen för ett objekt. Samma arkitektur kan generaliseras till olika domäner och applikationer.
|
6 |
Localisation d'objets urbains à partir de sources multiples dont des images aériennes / Localization of urban objects from multiple sources, including aerial imagery
Pibre, Lionel 30 November 2018
Cette thèse aborde des problèmes liés à la localisation et la reconnaissance d’objets urbains dans des images multi-sources (optique, infrarouge, Modèle Numérique de Surface) de très haute précision acquises par voie aérienne. Les objets urbains (lampadaires, poteaux, voitures, arbres…) présentent des dimensions, des formes, des textures et des couleurs très variables. Ils peuvent être collés les uns aux autres et sont de petite taille par rapport à la dimension d’une image. Ils sont présents en grand nombre mais peuvent être partiellement occultés. Tout ceci rend les objets urbains difficilement identifiables par les techniques actuelles de traitement d’images. Dans un premier temps, nous avons comparé les approches d’apprentissage classiques, composées de deux étapes - extraction de caractéristiques par le biais d’un descripteur prédéfini et utilisation d’un classifieur - aux approches d’apprentissage profond (Deep Learning), et plus précisément aux réseaux de neurones convolutionnels (CNN). Les CNN donnent de meilleurs résultats mais leurs performances ne sont pas suffisantes pour une utilisation industrielle. Nous avons donc proposé deux améliorations. Notre première contribution consiste à combiner de manière efficace les données provenant de sources différentes. Nous avons comparé une approche naïve qui consiste à considérer toutes les sources comme des composantes d’une image multidimensionnelle à une approche qui réalise la fusion des informations au sein même du CNN. Pour cela, nous avons traité les différentes informations dans des branches séparées du CNN. Nous avons ainsi montré que lorsque la base d’apprentissage contient peu de données, combiner intelligemment les sources dans une phase de pré-traitement (nous combinons l'optique et l'infrarouge pour créer une image NDVI) avant de les donner au CNN améliore les performances. Pour notre seconde contribution, nous nous sommes concentrés sur le problème des données incomplètes. Jusque-là, nous considérions que nous avions accès à toutes les sources pour chaque image mais nous pouvons aussi nous placer dans le cas où une source n’est pas disponible ou utilisable pour une image. Nous avons proposé une architecture permettant de prendre en compte toutes les données, même lorsqu’il manque une source sur une ou plusieurs images. Nous avons évalué notre architecture et montré que sur un scénario d’enrichissement, cette architecture permet d'obtenir un gain de plus de 2% sur la F-mesure. Les méthodes proposées ont été testées sur une base de données publique. Elles ont pour objectif d’être intégrées dans un logiciel de la société Berger-Levrault afin d’enrichir les bases de données géographiques et ainsi faciliter la gestion du territoire par les collectivités locales. / This thesis addresses problems related to the localization and recognition of urban objects in very-high-precision multi-source images (optical, infrared, Digital Surface Model) acquired from the air. Urban objects (lamp posts, poles, cars, trees...) vary widely in size, shape, texture and color. They can touch one another and are small relative to the size of an image. They are present in large numbers but can be partially occluded. All this makes urban objects difficult to identify with current image processing techniques. First, we compared traditional learning approaches, consisting of two stages (feature extraction with a predefined descriptor, followed by a classifier), to deep learning approaches, and more precisely to Convolutional Neural Networks (CNNs). CNNs give better results, but their performance is not sufficient for industrial use. We therefore proposed two improvements. The first is to efficiently combine data from different sources. We compared a naive approach that treats all sources as components of a multidimensional image to an approach that fuses the information within the CNN itself. For this, we processed the different sources in separate branches of the CNN. We showed that when the training set contains little data, intelligently combining the sources in a pre-processing phase (we combine the optical and infrared channels to create an NDVI image) before feeding them to the CNN improves performance. For our second contribution, we focused on the problem of incomplete data. Until then, we had assumed access to all the sources for each image, but a source may also be unavailable or unusable for an image. We proposed an architecture that takes all the data into account, even when a source is missing for one or more images. We evaluated our architecture and showed that, in an enrichment scenario, it yields a gain of more than 2% in F-measure. The proposed methods were tested on a public database. They are intended to be integrated into software from the company Berger-Levrault in order to enrich geographic databases and thus facilitate territorial management by local authorities.
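The NDVI pre-processing step mentioned in the abstract (combining the optical and infrared sources) can be illustrated with the standard index definition, NDVI = (NIR - Red) / (NIR + Red). This is a minimal sketch and not the thesis code: the function name `ndvi`, the list-of-rows band layout, and the `eps` guard against division by zero are assumptions made for illustration.

```python
def ndvi(nir, red, eps=1e-9):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D bands.

    `nir` and `red` are lists of rows of reflectance values (hypothetical layout).
    Pixels whose band sum is not above `eps` are set to 0.0 to avoid dividing by zero.
    """
    if len(nir) != len(red):
        raise ValueError("bands must have the same number of rows")
    result = []
    for nir_row, red_row in zip(nir, red):
        if len(nir_row) != len(red_row):
            raise ValueError("bands must have the same number of columns")
        result.append([
            (n - r) / (n + r) if (n + r) > eps else 0.0
            for n, r in zip(nir_row, red_row)
        ])
    return result
```

The resulting single-channel image could then be given to a network as one input instead of the separate red and infrared bands; how the thesis wires this into its CNN is not stated in the abstract.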
|
7 |
Modélisation, détection et classification d'objets urbains à partir d’images photographiques aériennes / Modeling, detection and classification of urban objects from aerial images
Pasquet, Jérôme 03 November 2016
Cette thèse aborde des problèmes liés à la localisation et reconnaissance d'objets urbains dans des images aériennes de très haute définition. Les objets urbains se caractérisent par une représentation très variable en terme de forme, texture et couleur. De plus, ils sont présents de multiples fois sur les images à analyser et peuvent être collés les uns aux autres. Pour effectuer la localisation et reconnaissance automatiquement des différents objets nous proposons d'utiliser des approches d'apprentissage supervisé. De part leurs caractéristiques, les objets urbains sont difficilement détectables et les approches classiques de détections n'offrent pas de performances satisfaisantes. Nous avons proposé l'utilisation d'un réseau de séparateurs à vaste marge (SVM) afin de mieux fusionner les informations issues des différentes résolutions et donc d'améliorer la représentativité de l'objet urbain. L'utilisation de réseau de SVM permet d'améliorer les performances mais à un coût calculatoire important. Nous avons alors proposé d'utiliser un chemin d'activation permettant de réduire la complexité sans perdre en efficacité. Ce chemin va activer le réseau de manière séquentielle et stoppera l'exploration lorsque la probabilité de détection d'un objet est importante. Dans le cas d'une localisation basée sur l'extraction de caractéristiques puis la classification, la réduction calculatoire est d'un facteur cinq. Par la suite, nous avons montré que nous pouvons combiner le réseau de SVM avec les cartes de caractéristiques issues de réseaux de neurones convolutifs. Cette architecture combinée avec le chemin d'activation permet une réduction théorique du coût d'activation pouvant aller jusqu'à 97% avec un gain de performances d'environ 8% sur les données utilisées. Les méthodes développées ont pour objectif d'être intégrées dans un logiciel de la société Berger-Levrault afin de faciliter et d'améliorer la gestion de cadastre dans les collectivités locales. 
/ This thesis deals with the automatic localization and recognition of urban objects in high-definition aerial images. Detecting urban objects is challenging because they vary in appearance, color and size. Moreover, many urban objects can appear in an image, and they can be very close to each other. Because of these characteristics, urban objects are difficult to detect, and classical detection approaches do not perform well. We therefore propose to use supervised learning. First, we built a network of Support Vector Machines (SVMs) to merge information from different resolutions efficiently. However, this method greatly increases the computational cost. We then proposed an “activation path” that reduces the complexity without any loss of effectiveness. This path activates the network sequentially and stops the exploration when an urban object has a high probability of detection. For localization based on a feature extraction step followed by a classification step, this reduces the computational cost by a factor of five. We then show that the SVM network can be combined with feature maps extracted by a Convolutional Neural Network. This architecture, combined with the activation path, increased performance by about 8% on our database while giving a theoretical reduction in computational cost of up to 97%. The methods were developed to be integrated into software from the company Berger-Levrault, in order to improve land registry management for local authorities.
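The idea of an activation path that evaluates a network sequentially and stops early can be sketched as a generic early-exit cascade. This is an illustrative assumption and not the architecture from the thesis: the function `run_activation_path`, the `stages` callables, and the `stop_threshold` value are all hypothetical names chosen for the example.

```python
from typing import Callable, Sequence, Tuple


def run_activation_path(
    window: object,
    stages: Sequence[Callable[[object], float]],
    stop_threshold: float = 0.9,
) -> Tuple[float, int]:
    """Evaluate scoring stages in order and stop as soon as one is confident.

    Each stage maps an image window to a detection probability in [0, 1].
    Returns the last computed score and the number of stages evaluated,
    so the saved computation is len(stages) minus that count.
    """
    score = 0.0
    evaluated = 0
    for stage in stages:
        score = stage(window)
        evaluated += 1
        if score >= stop_threshold:
            break  # confident detection: skip the remaining, costlier stages
    return score, evaluated
```

With stages ordered from cheapest to most expensive, a window that is confidently detected at an early stage never pays for the later ones, which is the source of the cost reduction in this kind of scheme.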
|
8 |
DETECTION AND SUB-PIXEL LOCALIZATION OF DIM POINT OBJECTS
Mridul Gupta (15426011) 08 May 2023
Detection of dim point objects plays an important role in many imaging applications such as early warning systems, surveillance, astronomy, and microscopy. In satellite imaging, natural phenomena, such as clouds, can confound object detection methods. We propose an object detection method that uses spatial, spectral, and temporal information to reject detections that are not consistent with a moving object and achieve a high probability of detection with a low false alarm rate. We propose another method for dim object detection using convolutional neural networks (CNN). The method augments a conventional space-based detection processing chain with a lightweight CNN to improve detection performance. To evaluate the performance of our proposed methods, we used a set of curated satellite images and generated receiver operating characteristic (ROC) curves.
Most satellite images have adequate spatial resolution and signal-to-noise ratio (SNR) for the detection and localization of common large objects, such as buildings. In many applications, the spatial resolution of the imaging system is not enough to localize a point object or two closely-spaced objects (CSOs) that are described by only a few pixels (or less than one pixel). A low SNR, such as when the objects are dim, increases the difficulty. We describe a method to estimate the objects’ amplitudes and spatial locations with sub-pixel accuracy using non-linear optimization and information from multiple spectral bands. We also propose a machine learning method that minimizes a cost function derived from the maximum likelihood estimation of the observed image to determine an object’s sub-pixel spatial location and amplitude. We derive the Cramér-Rao Lower Bound and compare the proposed estimators’ variance with this bound.
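For readers unfamiliar with the bound named in the abstract, its standard textbook statement for an unbiased estimator is given below. The notation is assumed here for illustration and is not taken from the thesis: \boldsymbol{\theta} stands for the unknown parameters (for example sub-pixel position and amplitude), \mathbf{y} for the observed pixel values, and p for their likelihood.

```latex
\operatorname{Cov}\!\left(\hat{\boldsymbol{\theta}}\right) \succeq \mathbf{I}(\boldsymbol{\theta})^{-1},
\qquad
\mathbf{I}(\boldsymbol{\theta}) =
\mathbb{E}\!\left[
  \nabla_{\boldsymbol{\theta}} \log p(\mathbf{y};\boldsymbol{\theta})\,
  \nabla_{\boldsymbol{\theta}} \log p(\mathbf{y};\boldsymbol{\theta})^{\top}
\right]
```

In particular, the variance of each estimated parameter is at least the corresponding diagonal entry of the inverse Fisher information matrix, which is the quantity an estimator's empirical variance would be compared against.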
|
9 |
Automatic parameter tuning in localization algorithms / Automatisk parameterjustering av lokaliseringsalgoritmer
Lundberg, Martin January 2019
Many algorithms today require a number of parameters to be set in order to perform well in a given application. The tuning of these parameters is often difficult and tedious to do manually, especially when the number of parameters is large. It is also unlikely that a human can find the best possible solution for difficult problems. To be able to automatically find good sets of parameters could both provide better results and save a lot of time. The prominent methods Bayesian optimization and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are evaluated for automatic parameter tuning in localization algorithms in this work. Both methods are evaluated using a localization algorithm on different datasets and compared in terms of computational time and the precision and recall of the final solutions. This study shows that it is feasible to automatically tune the parameters of localization algorithms using the evaluated methods. In all experiments performed in this work, Bayesian optimization was shown to make the biggest improvements early in the optimization but CMA-ES always passed it and proceeded to reach the best final solutions after some time. This study also shows that automatic parameter tuning is feasible even when using noisy real-world data collected from 3D cameras.
|
10 |
An Energy-efficient And Reactive Remote Surveillance Framework Using Wireless Multimedia Sensor Networks
Oztarak, Hakan 01 May 2012
With the introduction of Wireless Multimedia Sensor Networks, large-scale remote outdoor surveillance applications in which the majority of the cameras will be battery-operated are envisioned. These are applications where the frequency of incidents is too low to justify permanent staffing, such as monitoring of land and marine borders, critical infrastructure, bridges, water supplies, etc. Given the low cost of wireless resource-constrained camera sensors, these networks will be significantly larger than traditional multi-camera systems. While a large number of cameras may increase the coverage of the network, such a large size, along with resource constraints, poses new challenges, e.g., localization, classification, tracking or reactive behavior. This dissertation proposes a framework that transforms current multi-camera networks into low-cost and reactive systems which can be used in large-scale remote surveillance applications. Specifically, a remote surveillance system framework with three components is proposed: 1) localization and tracking of objects; 2) classification and identification of objects; and 3) reactive behavior at the base station. For each component, novel lightweight, storage-efficient and real-time algorithms at both the computation and communication levels are designed, implemented and tested under a variety of conditions. The results indicate that this framework can work with limited energy while achieving high object localization and classification accuracy. The results of this research will facilitate the design and development of very large-scale remote border surveillance systems and improve the systems' effectiveness in dealing with intrusions while reducing human involvement and labor costs.
|