31

Interactive segmentation of multiple 3D objects in medical images by optimum graph cuts = Segmentação interativa de múltiplos objetos 3D em imagens médicas por cortes ótimos em grafo

Moya, Nikolas, 1991- 03 December 2015 (has links)
Advisor: Alexandre Xavier Falcão / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Abstract: Medical image segmentation is crucial to extract measures from 3D objects (body anatomical structures) that are useful for the diagnosis and treatment of diseases. In such applications, interactive segmentation is necessary whenever automated methods fail or are not feasible. Graph-cut methods are considered the state of the art in interactive segmentation, but most approaches rely on the min-cut/max-flow algorithm, which is limited to binary segmentation, while multi-object segmentation can considerably save user time and effort. This work revisits the differential image foresting transform (DIFT) – a graph-cut approach suitable for multi-object segmentation in linear time – and solves several problems related to it.
Indeed, the DIFT algorithm can take time proportional to the number of voxels in the regions modified at each segmentation execution (sublinear time). Such a characteristic is highly desirable in 3D interactive segmentation to respond to the user's actions as close as possible to real time. Segmentation using the DIFT works as follows: the user draws labeled markers (strokes of connected seed voxels) inside each object and the background, while the computer interprets the image as a graph, whose nodes are the voxels and whose arcs are defined by neighboring voxels, and outputs an optimum-path forest (an image partition) rooted at the seed nodes of the graph. In the forest, each object is represented by the optimum-path trees rooted at its internal seeds. Such trees are painted with the same color associated with the label of the corresponding marker. By adding/removing markers, the user can correct the segmentation until the forest (its object label map) represents the desired result. For the sake of consistency in segmentation, seed-based methods should always maintain the connectivity between voxels and the seeds that have labeled them. However, this does not hold in some approaches, such as random walkers, or when the segmentation is filtered to smooth object boundaries. That connectivity is also paramount to making corrections without restarting the process at each user intervention. However, we observed that the DIFT algorithm fails to maintain segmentation consistency in some cases. We have fixed this problem both in the DIFT algorithm and when the obtained object boundaries are smoothed. These results are presented and evaluated on several 3D body anatomical structures from MR and CT images / Master's / Computer Science / Master in Computer Science
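The optimum-path forest computation that this abstract describes can be illustrated with a short sketch. The following Python snippet is a minimal, non-differential version of the seeded image foresting transform on a 2D image, assuming 4-neighbor adjacency, an intensity-difference arc weight, and the fmax path cost; the actual DIFT additionally updates the forest differentially when markers change, which is not shown here.

```python
# A minimal sketch of seeded multi-object segmentation with the image
# foresting transform (IFT), the non-differential core of the approach
# described above. The arc weight, the 4-neighbor adjacency, and the
# toy image are illustrative assumptions, not the thesis implementation.
import heapq
import numpy as np

def ift_segment(image, seeds):
    """image: 2D float array; seeds: dict (row, col) -> label.
    Each pixel receives the label of the seed reaching it with the
    cheapest path, where path cost = max arc weight along it (fmax)."""
    rows, cols = image.shape
    cost = np.full(image.shape, np.inf)
    label = np.zeros(image.shape, dtype=int)
    heap = []
    for (r, c), lab in seeds.items():
        cost[r, c] = 0.0
        label[r, c] = lab
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        c0, r, c = heapq.heappop(heap)
        if c0 > cost[r, c]:
            continue  # stale queue entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # arc weight: intensity difference between neighbors
                w = abs(image[r, c] - image[nr, nc])
                new_cost = max(c0, w)
                if new_cost < cost[nr, nc]:
                    cost[nr, nc] = new_cost
                    label[nr, nc] = label[r, c]
                    heapq.heappush(heap, (new_cost, nr, nc))
    return label

# Toy example: bright object on a dark background, one seed each.
img = np.zeros((8, 8)); img[2:6, 2:6] = 1.0
print(ift_segment(img, {(0, 0): 1, (4, 4): 2}))
```

In the toy example, the seed inside the bright square claims the whole square while the background seed claims the rest, mirroring how each marker's trees are painted with its label.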
32

Building an Understanding of Human Activities in First Person Video using Fuzzy Inference

Schneider, Bradley A. 23 May 2022 (has links)
No description available.
33

Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance

Liu Yang (16642902) 07 August 2023 (has links)
This research is motivated by the potential of autonomous mobile robots (AMRs) to enhance safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping – a process where AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective on construction sites because of their underlying assumption of a static environment, leading to unreliable outcomes. There is therefore a pressing need to enhance the applicability of AMRs in construction by addressing the limitations of current localization and mapping methods with respect to the dynamic nature of construction sites, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.

The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that not only detects and segments diverse dynamic objects in construction environments but also effectively interprets the semantic structure of the environment, and that efficiently integrates these functionalities into a unified system to provide an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment.

Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs; by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization is expected to improve greatly. The key to this approach is a four-stage method for dynamic object segmentation: objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method achieves up to a 6.4% improvement in dynamic object segmentation over state-of-the-art methods, as well as the lowest localization errors when integrated into a visual SLAM system, on a public dataset.

The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses image-level labels and Building Information Modeling (BIM) object data to replace traditional, often labor-intensive pixel-level annotations. The hypothesis for this objective is that, by fusing image-level labels with BIM-derived object information, segmentation competitive with pixel-level annotation can be achieved while drastically reducing the associated cost and labor. The research method involves initializing object locations, extracting object information, and incorporating location priors. Extensive experiments indicate that the proposed method, using simple image-level labels, achieves results competitive with full pixel-level supervision while completely removing the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments.

The third objective aims at an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that more efficient dynamic object segmentation with adaptively selected frames, combined with a semantic floorplan leveraged from an as-built BIM, would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach rests on two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation and semantic-based feature reliability updates. The resulting framework seamlessly integrates dynamic object segmentation and semantic interpretation into visual SLAM. Experiments demonstrate that the proposed framework achieves competitive performance over the testing scenarios, with processing time almost halved compared to counterpart dynamic SLAM algorithms.

In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically for dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With broader applications of visual SLAM such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.
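To make the first objective's fusion stage concrete, here is a hedged sketch of one plausible reading of it: class-agnostic objectness masks are scored by the motion saliency they cover, masks judged dynamic are merged, and image features inside the merged mask are discarded before tracking. The threshold `tau`, the saliency source, and the brute-force data layout are illustrative assumptions, not the dissertation's implementation.

```python
# A hedged sketch of fusing class-agnostic objectness masks with a
# motion-saliency map to flag dynamic regions, then dropping image
# features that fall inside them before SLAM tracking.
import numpy as np

def fuse_dynamic_mask(object_masks, motion_saliency, tau=0.5):
    """object_masks: list of boolean HxW arrays (one per proposal);
    motion_saliency: float HxW array in [0, 1]. A mask is declared
    dynamic when its mean motion saliency exceeds tau (assumed rule)."""
    h, w = motion_saliency.shape
    dynamic = np.zeros((h, w), dtype=bool)
    for m in object_masks:
        if m.any() and motion_saliency[m].mean() > tau:
            dynamic |= m  # union of all masks judged dynamic
    return dynamic

def filter_static_features(keypoints, dynamic_mask):
    """Keep only keypoints (x, y) that land on static pixels."""
    return [(x, y) for (x, y) in keypoints
            if not dynamic_mask[int(y), int(x)]]

# Toy example: one moving proposal, one static proposal.
sal = np.zeros((6, 6)); sal[0:3, 0:3] = 0.9
m1 = np.zeros((6, 6), bool); m1[0:3, 0:3] = True   # moving object
m2 = np.zeros((6, 6), bool); m2[4:6, 4:6] = True   # static object
mask = fuse_dynamic_mask([m1, m2], sal)
print(filter_static_features([(1, 1), (5, 5)], mask))  # keeps (5, 5)
```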
34

Scene Recognition for Safety Analysis in Collaborative Robotics

Wang, Shaolei January 2018 (has links)
In modern industrial environments, human-robot collaboration is a trend in automation to improve performance and productivity. Instead of isolating the robot from the human to guarantee safety, collaborative robotics allows humans and robots to work in the same area at the same time. New hazards and risks, such as collisions between robot and human, arise in this situation. Safety analysis is necessary to protect both human and robot when using a collaborative robot. To perform safety analysis, robots need to perceive the surrounding environment in real time. This surrounding environment is perceived and stored in the form of a scene graph, which is a directed graph with a semantic representation of the environment, the relationships between the detected objects, and the properties of these objects. In order to generate the scene graph, a simulated warehouse is used: robots and humans work in a common area, transferring products between shelves and conveyor belts. Each robot generates its own scene graph from the attached camera sensor. In the graph, each detected object is represented by a node, and edges are used to denote the relationships among the identified objects. A graph node includes values such as velocity, bounding box size, orientation, and the distance and direction between the object and the robot. We generate the scene graph in a simulated warehouse scenario at a frequency of 7 Hz and present a study of Mask R-CNN based on a qualitative comparison. Mask R-CNN is a method for object instance segmentation used to obtain the properties of the objects. It uses a ResNet-FPN backbone for feature extraction and adds a branch to Faster R-CNN for predicting a segmentation mask for each object; its results outperform almost all existing single-model entries on instance segmentation and bounding-box object detection. With the help of this method, the boundaries of the detected objects are extracted from the camera images. We initialize the Mask R-CNN model with three different types of weights – COCO pre-trained weights, ImageNet pre-trained weights, and random weights – and the results of these three initializations are compared w.r.t. precision and recall. Results showed that Mask R-CNN is also suitable for simulated environments and can meet requirements in both detection precision and speed. Moreover, the model trained using the COCO pre-trained weights outperformed the models with ImageNet and randomly assigned initial weights. The calculated mean average precision (mAP) on the validation dataset reaches 0.949 with COCO pre-trained weights, at an execution speed of 11.35 fps.
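As a concrete illustration of the scene graph described above, the sketch below assembles nodes and robot-to-object edges from per-frame detections such as those produced by Mask R-CNN. The detection dictionary format, the 2D ground-plane coordinates, and the Euclidean distance measure are illustrative assumptions.

```python
# A minimal sketch of assembling the kind of scene graph described
# above from per-frame detections. Node and edge attributes mirror
# the properties listed in the abstract.
import math

def build_scene_graph(robot_pos, detections):
    """detections: list of dicts with 'id', 'bbox' (x, y, w, h) and
    'velocity' (vx, vy). Returns nodes keyed by id and directed edges
    robot -> object carrying distance and bearing."""
    nodes, edges = {}, []
    for det in detections:
        x, y, w, h = det['bbox']
        cx, cy = x + w / 2.0, y + h / 2.0   # object center
        nodes[det['id']] = {'bbox': det['bbox'],
                            'velocity': det['velocity']}
        dx, dy = cx - robot_pos[0], cy - robot_pos[1]
        edges.append({'from': 'robot', 'to': det['id'],
                      'distance': math.hypot(dx, dy),
                      'bearing': math.degrees(math.atan2(dy, dx))})
    return nodes, edges

nodes, edges = build_scene_graph(
    (0.0, 0.0),
    [{'id': 'human_1', 'bbox': (3, 4, 1, 2), 'velocity': (0.1, 0.0)}])
print(edges[0]['distance'], edges[0]['bearing'])
```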
35

Efficient hierarchical layered graph approach for multi-region segmentation / Abordagem eficiente baseada em grafo hierárquico em camadas para a segmentação de múltiplas regiões

Leon, Leissi Margarita Castaneda 15 March 2019 (has links)
Image segmentation refers to the process of partitioning an image into meaningful regions of interest (objects) by assigning distinct labels to their composing pixels. Images are usually composed of multiple objects with distinctive features, thus requiring distinct high-level priors for their appropriate modeling. In order to obtain a good segmentation result, the segmentation method must satisfy all the individual priors of each object, as well as capture their inclusion/exclusion relations. However, many existing classical approaches do not combine any form of structural information with different high-level priors for each object in a single energy optimization, and consequently may be inappropriate in this context. We propose a novel efficient seed-based method for the multi-object segmentation of images based on graphs, named Hierarchical Layered Oriented Image Foresting Transform (HLOIFT). It uses a tree of the relations between the image objects, with each object represented by a node. Each tree node may contain different individual high-level priors and defines a weighted digraph, named a layer. The layer graphs are then integrated into a hierarchical graph, considering the hierarchical relations of inclusion and exclusion. A single energy optimization is performed in the hierarchical layered weighted digraph, leading to globally optimal results satisfying all the high-level priors. The experimental evaluation of HLOIFT and its extensions, on medical, natural, and synthetic images, indicates promising results comparable to state-of-the-art methods, but with lower computational complexity. Compared to hierarchical segmentation by the min-cut/max-flow algorithm, our approach is less restrictive, leading to globally optimal results in more general scenarios, and has a better running time.
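A hedged sketch of the hierarchical layered construction may help: each object in the hierarchy tree becomes a layer (one node per pixel), intra-layer arcs carry boundary weights, and infinite-weight inter-layer arcs encode inclusion between child and parent layers, so that a minimum cut cannot separate a child from its parent. The 1D image, the symmetric weights, and the arc rules are simplifications of the thesis's oriented, energy-optimized formulation.

```python
# A hedged, data-structure-level sketch of a hierarchical layered
# graph: one layer per object, intra-layer boundary arcs, and
# uncuttable inter-layer arcs enforcing inclusion in a min-cut
# setting. The weighting scheme is an illustrative assumption.
import math

def build_hierarchical_graph(weights, hierarchy):
    """weights: list of arc weights between consecutive pixels of a
    1D image (length n-1); hierarchy: dict object -> parent (None for
    the root). Nodes are (object, pixel); returns a dict of arcs."""
    n = len(weights) + 1
    arcs = {}
    for obj in hierarchy:
        for p in range(n - 1):
            # intra-layer arcs: cutting here pays the boundary weight
            arcs[((obj, p), (obj, p + 1))] = weights[p]
            arcs[((obj, p + 1), (obj, p))] = weights[p]
        parent = hierarchy[obj]
        if parent is not None:
            for p in range(n):
                # inclusion: child object at pixel p cannot leave the
                # parent, enforced by an infinite-weight arc
                arcs[((obj, p), (parent, p))] = math.inf
    return arcs

arcs = build_hierarchical_graph([1.0, 0.2, 1.0],
                                {'body': None, 'tumor': 'body'})
print(len(arcs))  # 2 layers x 6 intra arcs + 4 inclusion arcs = 16
```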
36

2D/3D knowledge inference for intelligent access to enriched visual content

Sambra-Petre, Raluca-Diana 18 June 2013 (has links) (PDF)
This Ph.D. thesis tackles the issue of still and video object categorization. The objective is to associate semantic labels to 2D objects present in natural images/videos. The principle of the proposed approach consists of exploiting categorized 3D model repositories in order to identify unknown 2D objects based on 2D/3D matching techniques. We propose an object recognition framework designed to work for real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is further employed in order to determine the most probable categories of the 2D object. A representative viewing angle selection strategy and a new contour-based descriptor (so-called AH) are proposed. The experimental evaluation proved that, by employing the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results have also shown the superiority of AH with respect to other state-of-the-art descriptors. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. We finally present an indexing/retrieval/classification Web platform, so-called Diana, which integrates the various methodologies employed in this thesis.
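The voting procedure mentioned above can be sketched as a nearest-view vote: the unknown 2D object's descriptor is compared against descriptors of pre-rendered views of categorized 3D models, and the k most similar views vote for their categories. The Euclidean distance, the value of k, and the random feature vectors standing in for the AH descriptor are illustrative assumptions.

```python
# A hedged sketch of 2D/3D category voting: the k projected views
# closest to the query descriptor each vote for their model's
# category. Synthetic descriptors stand in for the AH descriptor.
import numpy as np

def vote_category(query_desc, view_descs, view_categories, k=5):
    """view_descs: (num_views, dim) array of projected-view
    descriptors; view_categories: category label per view."""
    dists = np.linalg.norm(view_descs - query_desc, axis=1)
    votes = {}
    for idx in np.argsort(dists)[:k]:   # k most similar views
        cat = view_categories[idx]
        votes[cat] = votes.get(cat, 0) + 1
    return max(votes, key=votes.get)

rng = np.random.default_rng(0)
views = np.vstack([np.zeros((10, 8)), np.ones((10, 8))])
views += rng.normal(scale=0.1, size=views.shape)  # small view noise
cats = ['chair'] * 10 + ['car'] * 10
query = np.zeros(8) + 0.05            # an unknown chair-like contour
print(vote_category(query, views, cats))  # -> 'chair'
```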
37

Robust visual detection and tracking of complex objects : applications to space autonomous rendez-vous and proximity operations

Petit, Antoine 19 December 2013 (has links) (PDF)
In this thesis, we address the issue of fully localizing a known object through computer vision, using a monocular camera, which is a central problem in robotics. Particular attention is paid here to space robotics applications, with the aim of providing a unified visual localization system for autonomous navigation during space rendezvous and proximity operations. Two main challenges of the problem are tackled: initially detecting the targeted object and then tracking it frame by frame, providing the complete pose between the camera and the object, given the 3D CAD model of the object. For detection, the pose estimation process is based on the segmentation of the moving object and on an efficient probabilistic edge-based matching and alignment procedure between a set of synthetic views of the object and a sequence of initial images. For the tracking phase, pose estimation is handled through a 3D model-based tracking algorithm, for which we propose three different types of visual features, pertinently representing the object by its edges, its silhouette, and a set of interest points. The reliability of the localization process is evaluated by propagating the uncertainty from the errors of the visual features. This uncertainty also feeds a linear Kalman filter on the camera velocity parameters. Qualitative and quantitative experiments have been performed on various synthetic and real data, with challenging imaging conditions, showing the efficiency and the benefits of the different contributions, and their compliance with space rendezvous applications.
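The final filtering step can be illustrated with a scalar sketch: a linear Kalman filter over one camera velocity component, where the measurement variance comes from the uncertainty propagated from the visual-feature errors, so that unreliable frames are weighted down. The constant-velocity model and the noise values are assumptions for illustration, not the thesis's parameters.

```python
# A hedged sketch of a linear Kalman filter on a camera velocity
# component, with measurement noise R supplied by the propagated
# pose uncertainty. One scalar state is shown for clarity.
def kalman_step(v_est, p_est, v_meas, r_meas, q_process=1e-3):
    """One predict/update cycle for a scalar velocity state.
    v_est/p_est: prior mean and variance; v_meas/r_meas: measured
    velocity and its variance from the uncertainty propagation."""
    # Predict: constant-velocity model, so the mean is unchanged
    p_pred = p_est + q_process
    # Update: weight the measurement by its propagated uncertainty
    k_gain = p_pred / (p_pred + r_meas)
    v_new = v_est + k_gain * (v_meas - v_est)
    p_new = (1.0 - k_gain) * p_pred
    return v_new, p_new

v, p = 0.0, 1.0
for v_meas, r in [(0.10, 0.05), (0.12, 0.20), (0.11, 0.05)]:
    v, p = kalman_step(v, p, v_meas, r)  # noisier frames weigh less
print(round(v, 3), round(p, 4))
```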
38

Unsupervised construction of 4D semantic maps in a long-term autonomy scenario

Ambrus, Rares January 2017 (has links)
Robots are operating for longer times and collecting much more data than just a few years ago. In this setting we are interested in exploring ways of modeling the environment, segmenting out areas of interest and keeping track of the segmentations over time, with the purpose of building 4D models (i.e. space and time) of the relevant parts of the environment. Our approach relies on repeatedly observing the environment and creating local maps at specific locations. The first question we address is how to choose where to build these local maps. Traditionally, an operator defines a set of waypoints on a pre-built map of the environment which the robot visits autonomously. Instead, we propose a method to automatically extract semantically meaningful regions from a point cloud representation of the environment. The resulting segmentation is purely geometric, and in the context of mobile robots operating in human environments, the semantic label associated with each segment (i.e. kitchen, office) can be of interest for a variety of applications. We therefore also look at how to obtain per-pixel semantic labels given the geometric segmentation, by fusing probabilistic distributions over scene and object types in a Conditional Random Field. For most robotic systems, the elements of interest in the environment are the ones which exhibit some dynamic properties (such as people, chairs, cups, etc.), and the ability to detect and segment such elements provides a very useful initial segmentation of the scene. We propose a method to iteratively build a static map from observations of the same scene acquired at different points in time. Dynamic elements are obtained by computing the difference between the static map and new observations. We address the problem of clustering together dynamic elements which correspond to the same physical object, observed at different points in time and in significantly different circumstances. To address some of the inherent limitations in the sensors used, we autonomously plan, navigate around and obtain additional views of the segmented dynamic elements. We look at methods of fusing the additional data and we show that both a combined point cloud model and a fused mesh representation can be used to more robustly recognize the dynamic object in future observations. In the case of the mesh representation, we also show how a Convolutional Neural Network can be trained for recognition by using mesh renderings. Finally, we present a number of methods to analyse the data acquired by the mobile robot autonomously and over extended time periods. First, we look at how the dynamic segmentations can be used to derive a probabilistic prior which can be used in the mapping process to further improve and reinforce the segmentation accuracy. We also investigate how to leverage spatial-temporal constraints in order to cluster dynamic elements observed at different points in time and under different circumstances. We show that by making a few simple assumptions we can increase the clustering accuracy even when the object appearance varies significantly between observations. The result of the clustering is a spatial-temporal footprint of the dynamic object, defining an area where the object is likely to be observed spatially as well as a set of time stamps corresponding to when the object was previously observed. Using this data, predictive models can be created and used to infer future times when the object is more likely to be observed. 
In an object search scenario, this model can be used to decrease the search time when looking for specific objects.
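The static-map differencing step lends itself to a short sketch: observed points with no close neighbor in the static map are flagged as dynamic and would then be clustered into objects. The brute-force nearest-neighbor search and the distance threshold are illustrative simplifications of the actual pipeline.

```python
# A hedged sketch of change detection against a static map: points of
# a new observation unexplained by the map are treated as dynamic.
import numpy as np

def dynamic_points(static_map, observation, radius=0.05):
    """static_map, observation: (N, 3) arrays of 3D points. Returns
    the observed points farther than `radius` from every map point."""
    keep = []
    for p in observation:
        d = np.linalg.norm(static_map - p, axis=1).min()
        if d > radius:      # unexplained by the static map
            keep.append(p)
    return np.array(keep)

static = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
obs = np.array([[0.01, 0.0, 0.0],   # matches the static map
                [0.50, 0.2, 0.0]])  # new, likely dynamic
print(dynamic_points(static, obs))
```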
39

Visual Flow Analysis and Saliency Prediction

Srinivas, Kruthiventi S S January 2016 (has links) (PDF)
Nowadays, millions of cameras in public places such as traffic junctions and railway stations capture video data round the clock. This humongous amount of data has resulted in an increased need for automation of visual surveillance. Analysis of crowd and traffic flows is an important step towards achieving this goal. In this work, we present our algorithms for identifying and segmenting dominant flows in surveillance scenarios. In the second part, we present our work aimed at predicting visual saliency. The ability of humans to discriminate and selectively pay attention to a few regions in the scene over the others is a key attentional mechanism. Here, we present our algorithms for predicting human eye fixations and segmenting salient objects in the scene. (i) Flow Analysis in Surveillance Videos: We propose algorithms for segmenting flows of static and dynamic nature in surveillance videos in an unsupervised manner. In static flow scenarios, we assume the motion patterns to be consistent over the entire duration of the video and analyze them in the compressed domain using H.264 motion vectors. Our approach is based on modeling the motion vector field as a Conditional Random Field (CRF) and obtaining oriented motion segments which are merged to obtain the final flow segments. This compressed-domain approach is shown to be both accurate and computationally efficient. In the case of dynamic flow videos (e.g. flows at a traffic junction), we propose a method for segmenting the individual object flows over long durations. This long-term flow segmentation is achieved in the framework of a CRF using local color and motion features. We propose a Dynamic Time Warping (DTW) based distance measure between flow segments for clustering them and generating representative dominant flow models. Using these dominant flow models, we perform path prediction for the vehicles entering the camera's field of view and detect anomalous motions. (ii) Visual Saliency Prediction using Deep Convolutional Neural Networks: We propose a deep fully convolutional neural network (CNN) – DeepFix – for accurately predicting eye fixations in the form of saliency maps. Unlike classical works which characterize the saliency map using various hand-crafted features, our model automatically learns features in a hierarchical fashion and predicts the saliency map in an end-to-end manner. DeepFix is designed to capture visual semantics at multiple scales while taking global context into account. Generally, fully convolutional nets are spatially invariant, which prevents them from modeling location-dependent patterns (e.g. centre-bias). Our network overcomes this limitation by incorporating a novel Location Biased Convolutional layer. We experimentally show that our network outperforms other recent approaches by a significant margin. In general, human eye fixations correlate with the locations of salient objects in the scene. However, only a handful of approaches have attempted to simultaneously address these related aspects of eye fixations and object saliency. In our work, we also propose a deep convolutional network capable of simultaneously predicting eye fixations and segmenting salient objects in a unified framework. We design the initial network layers, shared between both tasks, such that they capture the global contextual aspects of saliency, while the deeper layers of the network address task-specific aspects.
Our network shows a significant improvement over the current state-of-the-art for both eye fixation prediction and salient object segmentation across a number of challenging datasets.
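The Location Biased Convolutional layer can be approximated in a few lines: a spatially varying bias map is appended to the feature maps as an extra channel, so that the following convolution can learn location-dependent patterns such as the centre-bias of eye fixations. In DeepFix the bias channels are learned; the fixed centred Gaussian below is an illustrative assumption.

```python
# A hedged sketch of the location-bias idea: append a centre-peaked
# bias map to the feature channels so that a subsequent convolution
# can exploit spatial location, which plain FCNs cannot.
import numpy as np

def add_location_bias(features, sigma=0.3):
    """features: (channels, H, W) array. Returns (channels+1, H, W)
    with a centre-Gaussian channel appended as the location bias."""
    _, h, w = features.shape
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    bias = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return np.concatenate([features, bias[None]], axis=0)

feats = np.zeros((16, 8, 8))
out = add_location_bias(feats)
print(out.shape, out[-1].max())  # (17, 8, 8); bias peaks near centre
```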
40

2D/3D knowledge inference for intelligent access to enriched visual content / Modélisation et inférence 2D/3D de connaissances pour l'accès intelligent aux contenus visuels enrichis

Sambra-Petre, Raluca-Diana 18 June 2013 (has links)
This Ph.D. thesis tackles the issue of still and video object categorization. The objective is to associate semantic labels to 2D objects present in natural images/videos. The principle of the proposed approach consists of exploiting categorized 3D model repositories in order to identify unknown 2D objects based on 2D/3D matching techniques. We propose an object recognition framework designed to work for real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is further employed in order to determine the most probable categories of the 2D object. A representative viewing angle selection strategy and a new contour-based descriptor (so-called AH) are proposed. The experimental evaluation proved that, by employing the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results have also shown the superiority of AH with respect to other state-of-the-art descriptors. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. We finally present an indexing/retrieval/classification Web platform, so-called Diana, which integrates the various methodologies employed in this thesis.
