• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 334
  • 42
  • 19
  • 13
  • 10
  • 8
  • 4
  • 3
  • 3
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 522
  • 522
  • 239
  • 201
  • 162
  • 128
  • 109
  • 108
  • 104
  • 86
  • 84
  • 76
  • 73
  • 72
  • 69
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
121

Visual saliency computation for image analysis

Zhang, Jianming 08 December 2016 (has links)
Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, etc. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems. 1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretic connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experiment results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets. 2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the Minimum Barrier Distance (MBD) transform. We present a fast approximate MBD transform algorithm with an error bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieve state-of-the-art performance on four benchmark datasets. 3) Salient Object Detection. Salient object detection targets at localizing each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as Non-maximum Suppression, and our method substantially outperforms previous methods on three challenging datasets. 4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images which are annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. A method is proposed to further improve the training of the CNN subitizing model by leveraging synthetic images. 5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets. The usefulness of our method is further validated in the text-to-region association task, where our method provides state-of-the-art performance using only weakly labeled web images for training.
122

Image context for object detection, object context for part detection

Gonzalez-Garcia, Abel January 2018 (has links)
Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image. Similarly, semantic part detection attempts to recognize and localize the object parts. This thesis proposes four contributions. The first two make object detection more efficient by using active search strategies guided by image context. The last two involve parts. One of them explores the emergence of parts in neural networks trained for object detection, whereas the other improves on part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows using a window classifier. Instead, our search sequentially chooses what window to evaluate next based on all the information gathered before. This results in a significant reduction on the number of necessary window evaluations to detect the objects in the image. We guide our search strategy using image context and the score of the classifier. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction on the number of necessary evaluations for the smaller, more challenging classes. In the third contribution of this thesis, we study whether semantic parts emerge in Convolutional Neural Networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of their internal representation by investigating the responses of the network filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insights on the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context. We complement part appearance with the object appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone in this challenging task.
123

Localizing spatially and temporally objects and actions in videos

Kalogeiton, Vasiliki January 2018 (has links)
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization. Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, training an object detector on one domain (either still images or videos) and testing on the other one results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing the detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and their combined effect explains the performance gap. While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both tasks of object and action detection benefit from this joint learning. In experiments on the A2D dataset [Xu et al., 2015], we obtain state-of-the-art results on segmentation of object-action pairs. In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. The same way modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids by taking as input a sequence of frames and outputing tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms all state of the art on the UCF-Sports [Rodriguez et al., 2008], J-HMDB [Jhuang et al., 2013a], and UCF-101 [Soomro et al., 2012] action localization datasets especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization.
124

Object Detection using deep learning and synthetic data

Lidberg, Love January 2018 (has links)
This thesis investigates how synthetic data can be utilized when training convolutional neural networks to detect flags with threatening symbols. The synthetic data used in this thesis consisted of rendered 3D flags with different textures and flags cut out from real images. The synthetic data showed that it can achieve an accuracy above 80% compared to 88% accuracy achieved by a data set containing only real images. The highest accuracy scored was achieved by combining real and synthetic data showing that synthetic data can be used as a complement to real data. Some attempts to improve the accuracy score was made using generative adversarial networks without achieving any encouraging results.
125

Object Detection and Tracking

Al-Ridha, Moatasem Yaseen 01 May 2013 (has links)
An improved object tracking algorithm based Kalman filtering is developed in this thesis. The algorithm uses a median filter and morphological operations during tracking. The problem created by object shadows is identified and the primary focus is to incorporate shadow detection and removal to improve tracking multiple objects in complex scenes. It is shown that the Kalman filter, without the improvements, fails to remove shadows that connect different objects. The application of the median filter helps the separation of different objects and thus enables the tracking of multiple objects individually. The performances of the Kalman filter and the improved tracking algorithm were tested on a highway video sequence of moving cars and it is shown that the proposed algorithm yields better performance in the presence of shadows.
126

Robust visual detection and tracking of complex objects : applications to space autonomous rendez-vous and proximity operations / Détection et suivi visuels robustes d'objets complexes : applications au rendezvous spatial autonome

Petit, Antoine 19 December 2013 (has links)
Dans cette thèse nous étudions le fait de localiser complètement un objet connu par vision artificielle, en utilisant une caméra monoculaire, ce qui constitue un problème majeur dans des domaines comme la robotique. Une attention particulière est ici portée sur des applications de robotique spatiale, dans le but de concevoir un système de localisation visuelle pour des opérations de rendez-vous spatial autonome. Deux composantes principales du problème sont abordées: celle de la localisation initiale de l'objet ciblé, puis celle du suivi de cet objet image par image, donnant la pose complète entre la caméra et l'objet, connaissant le modèle 3D de l'objet. Pour la détection, l'estimation de pose est basée sur une segmentation de l'objet en mouvement et sur une procédure probabiliste d'appariement et d'alignement basée contours de vues synthétiques de l'objet avec une séquence d'images initiales. Pour la phase de suivi, l'estimation de pose repose sur un algorithme de suivi basé modèle 3D, pour lequel nous proposons trois différents types de primitives visuelles, dans l'idée de décrire l'objet considéré par ses contours, sa silhouette et par un ensemble de points d'intérêts. L'intégrité du système de localisation est elle évaluée en propageant l'incertitude sur les primitives visuelles. Cette incertitude est par ailleurs utilisée au sein d'un filtre de Kalman linéaire sur les paramètres de vitesse. Des tests qualitatifs et quantitatifs ont été réalisés, sur des données synthétiques et réelles, avec notamment des conditions d'image difficiles, montrant ainsi l'efficacité et les avantages des différentes contributions proposées, et leur conformité avec un contexte de rendez vous spatial. / In this thesis, we address the issue of fully localizing a known object through computer vision, using a monocular camera, what is a central problem in robotics. A particular attention is here paid on space robotics applications, with the aims of providing a unified visual localization system for autonomous navigation purposes for space rendezvous and proximity operations. Two main challenges of the problem are tackled: initially detecting the targeted object and then tracking it frame-by-frame, providing the complete pose between the camera and the object, knowing the 3D CAD model of the object. For detection, the pose estimation process is based on the segmentation of the moving object and on an efficient probabilistic edge-based matching and alignment procedure of a set of synthetic views of the object with a sequence of initial images. For the tracking phase, pose estimation is handled through a 3D model-based tracking algorithm, for which we propose three different types of visual features, pertinently representing the object with its edges, its silhouette and with a set of interest points. The reliability of the localization process is evaluated by propagating the uncertainty from the errors of the visual features. This uncertainty besides feeds a linear Kalman filter on the camera velocity parameters. Qualitative and quantitative experiments have been performed on various synthetic and real data, with challenging imaging conditions, showing the efficiency and the benefits of the different contributions, and their compliance with space rendezvous applications.
127

Content Detection in Handwritten Documents

January 2018 (has links)
abstract: Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such aspect could be a region on the document with a mathematical expression; in this case, the label would be math. This differentiation facilitates the performance of specific recognition tasks depending on the content type. We hypothesize that the recognition accuracy of the subsequent tasks such as textual, math, and shape recognition will increase, further leading to a better analysis of the document. Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods to collect, pre-process and detect content type in the collected handwritten documents. A total of 4049 documents were extracted in the form of image, and json format; and were labelled using an object labelling software with tags being text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to the Tensorflow’s object detection API to learn a neural network model. We show our results from two neural networks models, Faster Region-based Convolutional Neural Network (Faster R-CNN) and Single Shot detection model (SSD). / Dissertation/Thesis / Masters Thesis Computer Science 2018
128

Uma abordagem estrutural para detecção de objetos e localização em ambientes internos por dispositivos móveis / A structural approach for object detection and indoor localization with mobile devices

Henrique Morimitsu 29 August 2011 (has links)
A detecção de objetos é uma área de extrema importância para sistemas de visão computacional. Em especial, dado o aumento constante da utilização de dispositivos móveis, torna-se cada vez mais importante o desenvolvimento de métodos e aplicações capazes de serem utilizadas em tais aparelhos. Neste sentido, neste trabalho propõe-se o estudo e implementação de um aplicativo para dispositivos móveis capaz de detectar, em tempo real, objetos existentes em ambientes internos com uma aplicação para auxiliar um usuário a se localizar dentro do local. O aplicativo depende somente das capacidades do próprio aparelho e, portanto, procura ser mais flexível e sem restrições. A detecção de objetos é realizada por casamento de grafos-chave entre imagens de objetos pré-escolhidas e a imagem sendo capturada pela câmera do dispositivo. Os grafos-chave são uma generalização do método de detecção de pontos-chave tradicional e, por levarem em consideração um conjunto de pontos e suas propriedades estruturais, são capazes de descrever e detectar os objetos de forma robusta e eficiente. Para realizar a localização, optou-se por detectar placas existentes no próprio local. Após cada detecção, aplica-se um simples, mas bastante eficaz, sistema de localização baseado na comparação da placa detectada com uma base de dados de imagens de todo o ambiente. A base foi construída utilizando diversas câmeras colocadas sobre uma estrutura móvel, capturando sistematicamente imagens do ambiente em intervalos regulares. A implementação é descrita em detalhes e são apresentados resultados obtidos por testes reais no ambiente escolhido utilizando um celular Nokia N900. Tais resultados são avaliados em termos da precisão da detecção e da estimativa de localização, bem como do tempo decorrido para a realização de todo o processo. / Object detection is an area of extreme importance for computer vision systems. Specially because of the increasing use of mobile devices, it becomes more and more important to develop methods and applications that can be used in such devices. In this sense, we propose the study and implementation of an application for mobile devices that is able to detect, in real time, existing indoor objects with an application to help a user in localization in the environment. The application depends solely on the device capabilities and hence, it is flexible and unconstrained. Object detection is accomplished by keygraph matching between images of previously chosen signs and the image currently being captured by the camera device. Keygraphs are a generalization of the traditional keypoints method and, by taking into consideration a set of points and its structural properties, are capable of describing the objects robustly and efficiently. In order to perform localization, we chose to detect signs existing in the environment. After each detection, we apply a simple, but very effective, localization method based on a comparison between the detected sign and a dataset of images of the whole environment. The dataset was built using several cameras atop a mobile structure, systematically capturing images of the environment at regular intervals. The implementation is described in details and we show results obtained from real tests in the chosen environment using a Nokia N900 cell phone. Such results are evaluated in terms of detection and localization estimation precision, as well as the elapsed time to perform the whole process.
129

On the Construction of an Automatic Traffic Sign Recognition System

Jonsson, Fredrik January 2017 (has links)
This thesis proposes an automatic road sign recognition system, including all steps from the initial detection of road signs from a digital image to the final recognition step that determines the class of the sign. We develop a Bayesian approach for image segmentation in the detection step using colour information in the HSV (Hue, Saturation and Value) colour space. The image segmentation uses a probability model which is constructed based on manually extracted data on colours of road signs collected from real images. We show how the colour data is fitted using mixture multivariate normal distributions, where for the case of parameter estimation Gibbs sampling is used. The fitted models are then used to find the (posterior) probability of a pixel colour to belong to a road sign using the Bayesian approach. Following the image segmentation, regions of interest (ROIs) are detected by using the Maximally Stable Extremal Region (MSER) algorithm, followed by classification of the ROIs using a cascade of classifiers. Synthetic images are used in training of the classifiers, by applying various random distortions to a set of template images constituting most road signs in Sweden, and we demonstrate that the construction of such synthetic images provides satisfactory recognition rates. We focus on a large set of the signs on the Swedish road network, including almost 200 road signs. We use classification models such as the Support Vector Machine (SVM), and Random Forest (RF), where for features we use Histogram of Oriented Gradients (HOG).
130

Dynamic Data-Driven Visual Surveillance of Human Crowds via Cooperative Unmanned Vehicles

Minaeian, Sara, Minaeian, Sara January 2017 (has links)
Visual surveillance of human crowds in a dynamic environment has attracted a great amount of computer vision research efforts in recent years. Moving object detection, which conventionally includes motion segmentation and optionally, object classification, is the first major task for any visual surveillance application. After detecting the targets, estimation of their geo-locations is needed to create the same reference coordinate system for them for higher-level decision-making. Depending on the required fidelity of decision, multi-target data association may be also needed at higher levels to differentiate multiple targets in a series of frames. Applying all these vision-based algorithms to a crowd surveillance system (a major application studied in this dissertation) using a team of cooperative unmanned vehicles (UVs), introduces new challenges to the problem. Since the visual sensors move with the UVs, and thus the targets and the environment are dynamic, it adds to the complexity and uncertainty of the video processing. Moreover, the limited onboard computation resources require more efficient algorithms to be proposed. Responding to these challenges, the goal of this dissertation is to design and develop an effective and efficient visual surveillance system based on dynamic data driven application system (DDDAS) paradigm to be used by the cooperative UVs for autonomous crowd control and border patrol. The proposed visual surveillance system includes different modules: 1) a motion detection module, in which a new method for detecting multiple moving objects, based on sliding window is proposed to segment the moving foreground using the moving camera onboard the unmanned aerial vehicle (UAV); 2) a target recognition module, in which a customized method based on histogram-of-oriented-gradients is applied to classify the human targets using the onboard camera of unmanned ground vehicle (UGV); 3) a target geo-localization module, in which a new moving-landmark-based method is proposed for estimating the geo-location of the detected crowd from the UAV, while a heuristic method based on triangulation is applied for geo-locating the detected individuals via the UGV; and 4) a multi-target data association module, in which the affinity score is dynamically adjusted to comply with the changing dispersion of the detected targets over successive frames. In this dissertation, a cooperative team of one UAV and multiple UGVs with onboard visual sensors is used to take advantage of the complementary characteristics (e.g. different fidelities and view perspectives) of these UVs for crowd surveillance. The DDDAS paradigm is also applied toward these vision-based modules, where the computational and instrumentation aspects of the application system are unified for more accurate or efficient analysis according to the scenario. To illustrate and demonstrate the proposed visual surveillance system, aerial and ground video sequences from the UVs, as well as simulation models are developed, and experiments are conducted using them. The experimental results on both developed videos and literature datasets reveal the effectiveness and efficiency of the proposed modules and their promising performance in the considered crowd surveillance application.

Page generated in 0.0963 seconds