21

Information fusion for scene understanding

Xu, Philippe 28 November 2014 (has links)
Image understanding is a key issue in modern robotics, computer vision and machine learning. In particular, driving scene understanding is very important in the context of advanced driver assistance systems for intelligent vehicles. In order to recognize the large number of objects that may be found on the road, several sensors and decision algorithms are necessary. To make the most of existing state-of-the-art methods, we address the issue of scene understanding from an information fusion point of view. The combination of many diverse detection modules, which may deal with distinct classes of objects and different data representations, is handled by reasoning in the image space. We consider image understanding at two levels: object detection and semantic segmentation. The theory of belief functions is used to model and combine the outputs of these detection modules. We emphasize the need for a fusion framework flexible enough to easily include new classes, new sensors and new object detection algorithms. In this thesis, we propose a general method to model the outputs of classical machine learning techniques as belief functions. Next, we apply our framework to the combination of pedestrian detectors using the Caltech Pedestrian Detection Benchmark. The KITTI Vision Benchmark Suite is then used to validate our approach in a semantic segmentation context using multi-modal information.
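The belief-function combination at the heart of this fusion framework can be illustrated with Dempster's rule. The sketch below is a minimal stand-in, not the thesis's actual method: the frame of discernment, the detector names and the mass values are invented for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of labels -> mass)
    with Dempster's rule, renormalizing away conflicting mass."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    k = 1.0 - conflict  # normalization constant (assumes k > 0)
    return {s: w / k for s, w in combined.items()}

# Hypothetical frame of discernment and detector outputs:
P, V = frozenset({"pedestrian"}), frozenset({"vehicle"})
PV = P | V  # mass on the whole frame expresses ignorance
m_cam = {P: 0.6, PV: 0.4}            # e.g. a camera-based detector
m_lidar = {P: 0.3, V: 0.3, PV: 0.4}  # e.g. a lidar-based detector
fused = dempster_combine(m_cam, m_lidar)
```

Note how the fused masses still sum to one, and how the mass left on the whole frame (ignorance) shrinks when the two sources agree, which is exactly what makes the framework easy to extend with new sensors: each one simply contributes another mass function to combine.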
22

Geometric modeling of indoor scenes from acquired point data

Oesau, Sven 24 June 2015 (has links)
Geometric modeling and semantization of indoor scenes from sampled point data is an emerging research topic. Recent advances in acquisition technologies provide highly accurate laser scanners and low-cost handheld RGB-D cameras for real-time acquisition. However, the processing of large data sets is hampered by high amounts of clutter and various defects such as missing data, outliers and anisotropic sampling. This thesis investigates three novel methods for efficient geometric modeling and semantization from unstructured point data: shape detection, classification and geometric modeling. Chapter 2 introduces two methods for abstracting the input point data with primitive shapes. First, we propose a line extraction method to detect wall segments from a horizontal cross-section of the input point cloud. Second, we introduce a region growing method that progressively detects and reinforces regularities of planar shapes. This method utilizes regularities common to man-made architecture, i.e. coplanarity, parallelism and orthogonality, to reduce complexity and improve data fitting in defect-laden data. Chapter 3 introduces a method based on statistical analysis for separating clutter from structure. We also contribute a supervised machine learning method for object classification based on sets of planar shapes. Chapter 4 introduces a method for 3D geometric modeling of indoor scenes. We first partition the space using primitive shapes detected from permanent structures. An energy formulation is then used to solve an inside/outside labeling of a space partitioning, the latter providing robustness to missing data and outliers.
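As a rough illustration of the region-growing idea from Chapter 2, here is a minimal sketch that grows planar regions by comparing each candidate's normal against the seed's normal; the normals and neighbor graph are toy inputs, and the thesis's actual method goes further by detecting and reinforcing regularities (coplanarity, parallelism, orthogonality) among the detected shapes.

```python
import math
from collections import deque

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def region_grow_planes(normals, neighbors, angle_thresh_deg=10.0):
    """Greedy region growing: attach a neighbor to the current region
    when its unit normal deviates from the seed normal by less than
    the angular threshold. Returns one region label per point."""
    cos_t = math.cos(math.radians(angle_thresh_deg))
    labels = [-1] * len(normals)
    region = 0
    for seed in range(len(normals)):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if labels[j] == -1 and dot(normals[seed], normals[j]) >= cos_t:
                    labels[j] = region
                    queue.append(j)
        region += 1
    return labels

# Toy scene: three "floor" points (normal +z) chained to two "wall"
# points (normal +x) in a nearest-neighbor graph.
normals = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (1, 0, 0), (1, 0, 0)]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
labels = region_grow_planes(normals, neighbors)
```

The floor and wall points end up in two different regions because the 90° jump in normal direction blocks the growth front at the corner.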
23

Modeling and recognizing interactions between people, objects and scenes

Delaitre, Vincent 07 April 2015 (has links)
In this thesis, we focus on modeling interactions between people, objects and scenes and show the benefits of combining the corresponding cues for improving both action classification and scene understanding. In the first part, we seek to exploit the scene and object context to improve action classification in still images. We explore alternative bag-of-features models and propose a method that takes advantage of the scene context. We then propose a new model exploiting the object context for action classification based on pairs of body part and object detectors. We evaluate our methods on our newly collected still image dataset as well as three other datasets for action classification and show performance close to the state of the art. In the second part of this thesis, we address the reverse problem and aim at using the contextual information provided by people to help object localization and scene understanding. We collect a new dataset of time-lapse videos involving people interacting with indoor scenes. We develop an approach to describe image regions by the distribution of human co-located poses and use this pose-based representation to improve object localization. We further demonstrate that people cues can improve several steps of existing pipelines for indoor scene understanding. Finally, we extend the annotation of our time-lapse dataset to 3D and show how to infer object labels for occupied 3D volumes of a scene. To summarize, the contributions of this thesis are the following: (i) we design action classification models for still images that take advantage of the scene and object context and we gather a new dataset to evaluate their performance, (ii) we develop a new model to improve object localization thanks to observations of people interacting with an indoor scene and test it on a new dataset centered on person, object and scene interactions, (iii) we propose the first method to evaluate the volumes occupied by different object classes in a room, which allows us to analyze the current 3D scene understanding pipeline and identify its main sources of errors.
24

Exploring Situation Awareness for Advanced Driver-Assistance Systems

Chengxi Li (11530579) 22 November 2021 (has links)
From prehistoric man, who needed to be aware of his surroundings to hunt for food, to modern industry, where machines and robots are programmed to explore the environment and accomplish assignments, situation awareness has always been an essential topic.

Advanced Driver-Assistance Systems (ADAS) are among the modern technologies seeking effective solutions for driving safety. They also use a situation awareness model to interpret the driver's state in the environment and provide safe driving advice, with the potential to significantly reduce traffic accident fatalities.

To enable situation awareness, an intelligent driving system needs to fulfill the following: (1) perceive the traffic elements in the environment, (2) comprehend the spatial-temporal interactions between the driver and other objects, and (3) project the states of traffic elements to forecast future actions.

However, each level of situation awareness encounters unique challenges in driving scenarios. For example, how can vehicles be perceived in low-illumination conditions? How can the interactive relations in complicated driving situations be represented? And how can the temporal dynamics of traffic elements be anticipated to identify where potential risk comes from? To answer these questions, we explore a situation awareness model for Advanced Driver-Assistance Systems at three levels: perception, comprehension and projection. We discuss how to realize situation awareness based on three different computer vision tasks. We demonstrate that our proposed system is able to forecast the driver's operational intentions and identify risk objects to avoid hazards.
25

Towards Efficient Convolutional Neural Architecture Design

Richter, Mats L. 10 May 2022 (has links)
The design and adjustment of convolutional neural network architectures is an opaque and mostly trial-and-error-driven process. The main reasons for this are the lack of proper paradigms, beyond general conventions, for the development of neural network architectures, and the lack of effective insights into the models that can be propagated back to design decisions. For the task-specific design of deep learning solutions to become more efficient and goal-oriented, novel design strategies need to be developed that are founded on an understanding of convolutional neural network models. This work develops tools for the analysis of the inference process in trained neural network models. Based on these tools, characteristics of convolutional neural network models are identified that can be linked to inefficiencies in predictive and computational performance. Building on these insights, this work presents methods for effectively diagnosing these design faults before and during training with little computational overhead. These findings are empirically tested and demonstrated on architectures with sequential and multi-pathway structures, covering all the common types of convolutional neural network architectures used for classification. Furthermore, this work proposes simple optimization strategies that allow for goal-oriented and informed adjustment of the neural architecture, opening the potential for a less trial-and-error-driven design process.
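One concrete, well-known quantity that links sequential architecture parameters to inference behavior is the receptive field. The sketch below uses the standard recurrence over (kernel, stride) pairs and a hypothetical layer stack; it is not the analysis tooling developed in this thesis, only a flavor of the kind of static architecture property such tooling can reason about.

```python
def receptive_field(layers):
    """Receptive field size and effective stride ("jump") of a stack
    of (kernel, stride) conv/pool layers, via the standard recurrence:
    rf += (k - 1) * jump; jump *= s."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf, jump

# Hypothetical stack: two 3x3 convs, a 2x2/2 pool, one more 3x3 conv.
stack = [(3, 1), (3, 1), (2, 2), (3, 1)]
rf, stride = receptive_field(stack)
```

Comparing the receptive field of a layer stack against the input resolution is one cheap, training-free check of whether late layers can still contribute new spatial context, in the spirit of diagnosing design faults before training.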
26

Redefining Visual SLAM for Construction Robots: Addressing Dynamic Features and Semantic Composition for Robust Performance

Liu Yang (16642902) 07 August 2023 (has links)
This research is motivated by the potential of autonomous mobile robots (AMRs) to enhance safety, productivity, and efficiency in the construction industry. The dynamic and complex nature of construction sites presents significant challenges to AMRs, particularly in localization and mapping, the process by which AMRs determine their own position in the environment while creating a map of the surrounding area. These capabilities are crucial for autonomous navigation and task execution but are inadequately addressed by existing solutions, which primarily rely on visual Simultaneous Localization and Mapping (SLAM) methods. These methods are often ineffective on construction sites because of their underlying assumption of a static environment, leading to unreliable outcomes. There is therefore a pressing need to enhance the applicability of AMRs in construction by addressing the limitations of current localization and mapping methods with respect to the dynamic nature of construction sites, thereby empowering AMRs to function more effectively and fully realize their potential in the construction industry.

The overarching goal of this research is to fulfill this critical need by developing a novel visual SLAM framework that is capable of not only detecting and segmenting diverse dynamic objects in construction environments but also effectively interpreting the semantic structure of the environment, and that can efficiently integrate these functionalities into a unified system to provide an improved SLAM solution for dynamic, complex, and unstructured environments. The rationale is that such a SLAM system could effectively address the dynamic nature of construction sites, thereby significantly improving the efficiency and accuracy of robot localization and mapping in the construction working environment.

Towards this goal, three specific objectives have been formulated. The first objective is to develop a novel methodology for comprehensive dynamic object segmentation that can support visual SLAM within highly variable construction environments. This method integrates class-agnostic objectness masks and motion cues into video object segmentation, thereby significantly improving the identification and segmentation of dynamic objects within construction sites. These dynamic objects present a significant challenge to the reliable operation of AMRs, and by accurately identifying and segmenting them, the accuracy and reliability of SLAM-based localization is expected to improve greatly. The approach is a four-stage method for dynamic object segmentation: objectness mask generation, motion saliency estimation, fusion of objectness masks and motion saliency, and bi-directional propagation of the fused mask. Experimental results show that the proposed method achieves up to a 6.4% improvement in dynamic object segmentation over state-of-the-art methods, as well as the lowest localization errors when integrated into a visual SLAM system on a public dataset.

The second objective focuses on developing a flexible, cost-effective method for semantic segmentation of construction images of structural elements. This method harnesses image-level labels and Building Information Modeling (BIM) object data to replace the traditional and often labor-intensive pixel-level annotations. The hypothesis for this objective is that by fusing image-level labels with BIM-derived object information, a segmentation that is competitive with pixel-level annotations can be achieved while drastically reducing the associated cost and labor intensity. The research method involves initializing object locations, extracting object information, and incorporating location priors. Extensive experiments indicate that the proposed method, using simple image-level labels, achieves results competitive with full pixel-level supervision while completely removing the need for laborious and expensive pixel-level annotations when adapting networks to unseen environments.

The third objective aims to create an efficient integration of dynamic object segmentation and semantic interpretation within a unified visual SLAM framework. It is proposed that more efficient dynamic object segmentation with adaptively selected frames, combined with the leveraging of a semantic floorplan from an as-built BIM, would speed up the removal of dynamic objects and enhance localization while reducing the frequency of scene segmentation. The technical approach is based on two major modifications to the classic visual SLAM system: adaptive dynamic object segmentation, and a semantic-based feature reliability update. The resulting framework seamlessly integrates dynamic object segmentation and semantic interpretation into visual SLAM. Experiments demonstrate that the proposed framework achieves competitive performance over the testing scenarios, with processing time almost half that of counterpart dynamic SLAM algorithms.

In conclusion, this research contributes significantly to the adoption of AMRs in construction by tailoring a visual SLAM framework specifically for dynamic construction sites. Through the integration of dynamic object segmentation and semantic interpretation, it enhances localization accuracy, mapping efficiency, and overall SLAM performance. With broader applications of visual SLAM algorithms such as site inspection in dangerous zones, progress monitoring, and material transportation, the study promises to advance AMR capabilities, marking a significant step towards a new era in construction automation.
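The fusion of objectness masks with motion saliency (the third of the four stages) can be caricatured as a simple per-mask thresholding step. The mask coordinates and the saliency grid below are invented, and the thesis's actual fusion and bi-directional propagation are more involved; this sketch only shows the core idea of letting motion evidence decide which object proposals count as dynamic.

```python
def fuse_masks(object_masks, motion_saliency, thresh=0.5):
    """Keep an objectness mask as 'dynamic' when the mean motion
    saliency of its pixels exceeds thresh. Masks are lists of
    (row, col) pixel coordinates; saliency is a 2-D grid in [0, 1]."""
    dynamic = []
    for mask in object_masks:
        mean = sum(motion_saliency[r][c] for r, c in mask) / len(mask)
        if mean > thresh:
            dynamic.append(mask)
    return dynamic

# Invented example: a moving worker (high saliency) and a static wall
# fixture (low saliency) on a 2x4 motion-saliency grid.
saliency = [[0.9, 0.8, 0.1, 0.0],
            [0.2, 0.1, 0.1, 0.2]]
worker = [(0, 0), (0, 1)]
fixture = [(1, 2), (1, 3)]
dynamic = fuse_masks([worker, fixture], saliency)
```

In a SLAM front end, features falling inside the returned dynamic masks would then be excluded from pose estimation, which is the mechanism by which better segmentation translates into lower localization error.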
27

Identifying User Profile Using Facebook Photos

張婷雅, Chang, Ting Ya Unknown Date (has links)
Apart from text messages, photo posting is a popular function of Facebook. The uploaded photos are of various kinds, including selfies, outdoor scenes, and food. In this thesis, we employ state-of-the-art computer vision techniques to analyze image content and establish the relationship between user profiles and the types of photos posted. We collected photos from 32 Facebook users, then applied techniques such as face detection, scene understanding and saliency map identification to gather information for automatic image tagging and classification. Grouping of users can be achieved either by tag statistics or photo classes. Characteristics of each group can be further investigated based on the results of hierarchical clustering. We wish to identify the profiles of different users and respond to questions such as the types of photos most frequently posted, gender differentiation in photo posting behavior and user classification according to image content, which will promote our understanding of photo uploading activities on Facebook.
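The user-grouping step can be sketched as naive single-linkage agglomerative clustering over per-user tag histograms. The feature dimensions and values below are hypothetical stand-ins; the thesis's actual features (tag statistics or photo classes) and clustering settings differ.

```python
def agglomerative(vectors, k):
    """Naive single-linkage agglomerative clustering: repeatedly merge
    the two clusters with the smallest minimum pairwise Euclidean
    distance until k clusters remain. Returns lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(ca, cb):
        return min(
            sum((vectors[i][d] - vectors[j][d]) ** 2
                for d in range(len(vectors[i]))) ** 0.5
            for i in ca for j in cb
        )

    while len(clusters) > k:
        x, y = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters

# Hypothetical per-user tag histograms (selfie, food, scenery shares):
users = [(0.8, 0.1, 0.1), (0.7, 0.2, 0.1),
         (0.1, 0.1, 0.8), (0.0, 0.2, 0.8)]
groups = agglomerative(users, k=2)
```

The two selfie-heavy users and the two scenery-heavy users end up in separate groups, after which each group's posting behavior can be characterized, as in the thesis's per-cluster analysis.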
28

A computational framework for unsupervised analysis of everyday human activities

Hamid, Muhammad Raffay 07 July 2008 (has links)
In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed and variable length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance into one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
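The fixed-length event-subsequence representation discussed above can be sketched as n-gram counting, with a similarity between activity profiles from which an activity graph could then be built. The event names are invented examples, and the thesis's actual similarity measure and clique discovery are not reproduced here.

```python
from collections import Counter

def ngram_profile(events, n=2):
    """Fixed-length representation: counts of the length-n event
    subsequences of an activity (a finite event sequence)."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def cosine_sim(p, q):
    """Cosine similarity between two subsequence-count profiles."""
    num = sum(p[k] * q.get(k, 0) for k in p)
    den = (sum(v * v for v in p.values()) ** 0.5
           * sum(v * v for v in q.values()) ** 0.5)
    return num / den if den else 0.0

# Invented kitchen activities encoded as event sequences:
a = ngram_profile(["open_fridge", "take_milk", "close_fridge", "pour"])
b = ngram_profile(["open_fridge", "take_juice", "close_fridge", "pour"])
sim = cosine_sim(a, b)  # the two activities share one of three bigrams
```

Pairwise similarities like this one would populate the completely connected activity graph in which maximally similar activity-cliques are then sought.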
29

Context-aware anchoring, semantic mapping and active perception for mobile robots

Günther, Martin 30 November 2021 (has links)
An autonomous robot that acts in a goal-directed fashion requires a world model of the elements that are relevant to the robot's task. In real-world, dynamic environments, the world model has to be created and continually updated from uncertain sensor data. The symbols used in plan-based robot control have to be anchored to detected objects. Furthermore, robot perception is not only a bottom-up and passive process: Knowledge about the composition of compound objects can be used to recognize larger-scale structures from their parts. Knowledge about the spatial context of an object and about common relations to other objects can be exploited to improve the quality of the world model and can inform an active search for objects that are missing from the world model. This thesis makes several contributions to address these challenges: First, a model-based semantic mapping system is presented that recognizes larger-scale structures like furniture based on semantic descriptions in an ontology. Second, a context-aware anchoring process is presented that creates and maintains the links between object symbols and the sensor data corresponding to those objects while exploiting the geometric context of objects. Third, an active perception system is presented that actively searches for a required object while being guided by the robot's knowledge about the environment.
30

Dynamic Scene Understanding for Mobile Robot Navigation

Mikšík, Ondřej January 2012 (has links)
This master's thesis deals with dynamic scene understanding for mobile robot navigation. In the first part, we present a novel approach to self-supervised models: a fusion of road vanishing point estimation based on frequency-domain processing with probabilistic color-based segmentation models. Road vanishing point detection is based on estimating the dominant orientations of texture flow, obtained with a bank of Gabor wavelets, followed by a voting scheme. The vanishing point then defines a training region that is used for self-supervised learning of the color models. Finally, road regions are selected by measuring the Mahalanobis distance. A few rules handle situations such as heavy shadows, overexposure, and the speed of adaptation. In addition, the whole vanishing point estimation is reworked: the wavelets are replaced by approximations using binary box functions, which enables efficient filtering with integral images. The main bottleneck of the whole algorithm was the voting itself, so we present a scheme that first coarsely estimates the vanishing point and then refines it, achieving a considerable speed-up (up to 40x) while accuracy degrades by only 3-5%. In the second part of the thesis, we present a smoothing filter for spatio-temporal consistency of predictions, which is important for advanced systems. The key component of the filter is a new metric measuring similarity between classes, which discriminates much better than the standard Euclidean distance. This metric can be used for a variety of computer vision tasks. The smoothing filter first estimates optical flow to define a local neighborhood, which is then used for recursive filtering based on the similarity metric. The overall accuracy of the proposed method, measured on the pixels whose predictions differ between the original data and the filtered output, is almost 18% higher than that of the original predictions. Although we use SHIM as the source of the original predictions, the algorithm can be combined with any other system (MRF, CRF, ...) that provides predictions in the form of probabilities. The proposed filter represents a first step towards complete reasoning.
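The Mahalanobis-distance road test used in the first part can be sketched as follows. The color-model mean and covariance below are illustrative placeholders rather than values learned by the self-supervised pipeline, and a 2-D feature is used so the covariance can be inverted in closed form.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for 2-D features, inverting the
    2x2 covariance matrix in closed form."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

# Placeholder road color model (e.g. mean and covariance of two
# chromaticity channels inside the training region below the
# vanishing point):
road_mean = (100.0, 110.0)
road_cov = ((25.0, 0.0), (0.0, 25.0))

def is_road(pixel, thresh_sq=9.0):
    # 3-sigma gate: accept pixels whose squared distance is below 3^2
    return mahalanobis_sq(pixel, road_mean, road_cov) < thresh_sq
```

Because the covariance captures the spread of road colors, this gate tolerates natural color variation along each channel better than a plain Euclidean threshold would.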
