Global ETD Search

91	Segmentation and structuring of video documents for indexing applications Tapu, Ruxandra Georgina 07 December 2012 (has links) (PDF) Recent advances in telecommunications, collaborated with the development of image and video processing and acquisition devices has lead to a spectacular growth of the amount of the visual content data stored, transmitted and exchanged over Internet. Within this context, elaborating efficient tools to access, browse and retrieve video content has become a crucial challenge. In Chapter 2 we introduce and validate a novel shot boundary detection algorithm able to identify abrupt and gradual transitions. The technique is based on an enhanced graph partition model, combined with a multi-resolution analysis and a non-linear filtering operation. The global computational complexity is reduced by implementing a two-pass approach strategy. In Chapter 3 the video abstraction problem is considered. In our case, we have developed a keyframe representation system that extracts a variable number of images from each detected shot, depending on the visual content variation. The Chapter 4 deals with the issue of high level semantic segmentation into scenes. Here, a novel scene/DVD chapter detection method is introduced and validated. Spatio-temporal coherent shots are clustered into the same scene based on a set of temporal constraints, adaptive thresholds and neutralized shots. Chapter 5 considers the issue of object detection and segmentation. Here we introduce a novel spatio-temporal visual saliency system based on: region contrast, interest points correspondence, geometric transforms, motion classes' estimation and regions temporal consistency. The proposed technique is extended on 3D videos by representing the stereoscopic perception as a 2D video and its associated depth [INFO:INFO_OH] Computer Science/Other Video transition detection Graph partition Multi-resolution analysis Automatic abstraction High level semantic segmentation
92	Living in a dynamic world : semantic segmentation of large scale 3D environments Miksik, Ondrej January 2017 (has links) As we navigate the world, for example when driving a car from our home to the work place, we continuously perceive the 3D structure of our surroundings and intuitively recognise the objects we see. Such capabilities help us in our everyday lives and enable free and accurate movement even in completely unfamiliar places. We largely take these abilities for granted, but for robots, the task of understanding large outdoor scenes remains extremely challenging. In this thesis, I develop novel algorithms for (near) real-time dense 3D reconstruction and semantic segmentation of large-scale outdoor scenes from passive cameras. Motivated by "smart glasses" for partially sighted users, I show how such modeling can be integrated into an interactive augmented reality system which puts the user in the loop and allows her to physically interact with the world to learn personalized semantically segmented dense 3D models. In the next part, I show how sparse but very accurate 3D measurements can be incorporated directly into the dense depth estimation process and propose a probabilistic model for incremental dense scene reconstruction. To relax the assumption of a stereo camera, I address dense 3D reconstruction in its monocular form and show how the local model can be improved by joint optimization over depth and pose. The world around us is not stationary. However, reconstructing dynamically moving and potentially non-rigidly deforming texture-less objects typically require "contour correspondences" for shape-from-silhouettes. Hence, I propose a video segmentation model which encodes a single object instance as a closed curve, maintains correspondences across time and provide very accurate segmentation close to object boundaries. Finally, instead of evaluating the performance in an isolated setup (IoU scores) which does not measure the impact on decision-making, I show how semantic 3D reconstruction can be incorporated into standard Deep Q-learning to improve decision-making of agents navigating complex 3D environments.
93	Real-time 3D Semantic Segmentation of Timber Loads with Convolutional Neural Networks Sällqvist, Jessica January 2018 (has links) Volume measurements of timber loads is done in conjunction with timber trade. When dealing with goods of major economic values such as these, it is important to achieve an impartial and fair assessment when determining price-based volumes. With the help of Saab’s missile targeting technology, CIND AB develops products for digital volume measurement of timber loads. Currently there is a system in operation that automatically reconstructs timber trucks in motion to create measurable images of them. Future iterations of the system is expected to fully automate the scaling by generating a volumetric representation of the timber and calculate its external gross volume. The first challenge towards this development is to separate the timber load from the truck. This thesis aims to evaluate and implement appropriate method for semantic pixel-wise segmentation of timber loads in real time. Image segmentation is a classic but difficult problem in computer vision. To achieve greater robustness, it is therefore important to carefully study and make use of the conditions given by the existing system. Variations in timber type, truck type and packing together create unique combinations that the system must be able to handle. The system must work around the clock in different weather conditions while maintaining high precision and performance. Semantic Segmentation Pixel-wise classification Convolutional Neural Networks Fully Convolutional Networks Patch-based training.
94	Active Learning for Road Segmentation using Convolutional Neural Networks Sörsäter, Michael January 2018 (has links) In recent years, development of Convolutional Neural Networks has enabled high performing semantic segmentation models. Generally, these deep learning based segmentation methods require a large amount of annotated data. Acquiring such annotated data for semantic segmentation is a tedious and expensive task. Within machine learning, active learning involves in the selection of new data in order to limit the usage of annotated data. In active learning, the model is trained for several iterations and additional samples are selected that the model is uncertain of. The model is then retrained on additional samples and the process is repeated again. In this thesis, an active learning framework has been applied to road segmentation which is semantic segmentation of objects related to road scenes. The uncertainty in the samples is estimated with Monte Carlo dropout. In Monte Carlo dropout, several dropout masks are applied to the model and the variance is captured, working as an estimate of the model’s uncertainty. Other metrics to rank the uncertainty evaluated in this work are: a baseline method that selects samples randomly, the entropy in the default predictions and three additional variations/extensions of Monte Carlo dropout. Both the active learning framework and uncertainty estimation are implemented in the thesis. Monte Carlo dropout performs slightly better than the baseline in 3 out of 4 metrics. Entropy outperforms all other implemented methods in all metrics. The three additional methods do not perform better than Monte Carlo dropout. An analysis of what kind of uncertainty Monte Carlo dropout capture is performed together with a comparison of the samples selected by baseline and Monte Carlo dropout. Future development and possible improvements are also discussed. Active learning Monte Carlo dropout Semantic Segmentation Convolutional Neural Networks Uncertainty CNN Deep Learning Road Segmentation Machine learning
95	Information fusion for scene understanding / Fusion d'informations pour la compréhesion de scènes Xu, Philippe 28 November 2014 (has links) La compréhension d'image est un problème majeur de la robotique moderne, la vision par ordinateur et l'apprentissage automatique. En particulier, dans le cas des systèmes avancés d'aide à la conduite, la compréhension de scènes routières est très importante. Afin de pouvoir reconnaître le grand nombre d’objets pouvant être présents dans la scène, plusieurs capteurs et algorithmes de classification doivent être utilisés. Afin de pouvoir profiter au mieux des méthodes existantes, nous traitons le problème de la compréhension de scènes comme un problème de fusion d'informations. La combinaison d'une grande variété de modules de détection, qui peuvent traiter des classes d'objets différentes et utiliser des représentations distinctes, est faites au niveau d'une image. Nous considérons la compréhension d'image à deux niveaux : la détection d'objets et la segmentation sémantique. La théorie des fonctions de croyance est utilisée afin de modéliser et combiner les sorties de ces modules de détection. Nous mettons l'accent sur la nécessité d'avoir un cadre de fusion suffisamment flexible afin de pouvoir inclure facilement de nouvelles classes d'objets, de nouveaux capteurs et de nouveaux algorithmes de détection d'objets. Dans cette thèse, nous proposons une méthode générale permettant de transformer les sorties d’algorithmes d'apprentissage automatique en fonctions de croyance. Nous étudions, ensuite, la combinaison de détecteurs de piétons en utilisant les données Caltech Pedestrian Detection Benchmark. Enfin, les données du KITTI Vision Benchmark Suite sont utilisées pour valider notre approche dans le cadre d'une fusion multimodale d'informations pour de la segmentation sémantique. / Image understanding is a key issue in modern robotics, computer vison and machine learning. In particular, driving scene understanding is very important in the context of advanced driver assistance systems for intelligent vehicles. In order to recognize the large number of objects that may be found on the road, several sensors and decision algorithms are necessary. To make the most of existing state-of-the-art methods, we address the issue of scene understanding from an information fusion point of view. The combination of many diverse detection modules, which may deal with distinct classes of objects and different data representations, is handled by reasoning in the image space. We consider image understanding at two levels : object detection ans semantic segmentation. The theory of belief functions is used to model and combine the outputs of these detection modules. We emphazise the need of a fusion framework flexible enough to easily include new classes, new sensors and new object detection algorithms. In this thesis, we propose a general method to model the outputs of classical machine learning techniques as belief functions. Next, we apply our framework to the combination of pedestrian detectors using the Caltech Pedestrain Detection Benchmark. The KITTI Vision Benchmark Suite is then used to validate our approach in a semantic segmentation context using multi-modal information Fusion d'informations Compréhension de scènes routières Théorie des fonctions de croyance Détection d'objets Segmentation sémantique Information fusion Driving scene understanding Theory of belief function Demster-Shafer theory Object detection Semantic segmentation
96	Melanoma Diagnostics Using Fully Convolutional Networks on Whole Slide Images Phillips, Adon January 2017 (has links) Semantic segmentation as an approach to recognizing and localizing objects within an image is a major research area in computer vision. Now that convolutional neural networks are being increasingly used for such tasks, there have been many improve- ments in grand challenge results, and many new research opportunities in previously untennable areas. Using fully convolutional networks, we have developed a semantic segmentation pipeline for the identification of melanocytic tumor regions, epidermis, and dermis lay- ers in whole slide microscopy images of cutaneous melanoma or cutaneous metastatic melanoma. This pipeline includes processes for annotating and preparing a dataset from the output of a tissue slide scanner to the patch-based training and inference by an artificial neural network. We have curated a large dataset of 50 whole slide images containing cutaneous melanoma or cutaneous metastatic melanoma that are fully annotated at 40× ob- jective resolution by an expert pathologist. We will publish the source images of this dataset online. We also present two new FCN architectures that fuse multiple deconvolutional strides, combining coarse and fine predictions to improve accuracy over similar networks without multi-stride information. Our results show that the system performs better than our comparators. We include inference results on thousands of patches from four whole slide images, reassembling them into whole slide segmentation masks to demonstrate how our system generalizes on novel cases. Machine Learning Medical Imaging Deep Learning Melanoma Whole Slide Images Digital Pathology Convolutional Neural Network Cancer Semantic Segmentation Multi-Class Segmentation
97	Modélisation géométrique à différent niveau de détails d'objets fabriqués par l'homme / Geometric modeling of man-made objects at different level of details Fang, Hao 16 January 2019 (has links) La modélisation géométrique d'objets fabriqués par l'homme à partir de données 3D est l'un des plus grands défis de la vision par ordinateur et de l'infographie. L'objectif à long terme est de générer des modèles de type CAO de la manière la plus automatique possible. Pour atteindre cet objectif, des problèmes difficiles doivent être résolus, notamment (i) le passage à l'échelle du processus de modélisation sur des données d'entrée massives, (ii) la robustesse de la méthodologie contre des mesures d'entrées erronés, et (iii) la qualité géométrique des modèles de sortie. Les méthodes existantes fonctionnent efficacement pour reconstruire la surface des objets de forme libre. Cependant, dans le cas d'objets fabriqués par l'homme, il est difficile d'obtenir des résultats dont la qualité approche celle des représentations hautement structurées, comme les modèles CAO. Dans cette thèse, nous présentons une série de contributions dans ce domaine. Tout d'abord, nous proposons une méthode de classification basée sur l'apprentissage en profondeur pour distinguer des objets dans des environnements complexes à partir de nuages de points 3D. Deuxièmement, nous proposons un algorithme pour détecter des primitives planaires dans des données 3D à différents niveaux d'abstraction. Enfin, nous proposons un mécanisme pour assembler des primitives planaires en maillages polygonaux compacts. Ces contributions sont complémentaires et peuvent être utilisées de manière séquentielle pour reconstruire des modèles de ville à différents niveaux de détail à partir de données 3D aéroportées. Nous illustrons la robustesse, le passage à l'échelle et l'efficacité de nos méthodes sur des données laser et multi-vues stéréo sur des scènes composées d'objets fabriqués par l'homme. / Geometric modeling of man-made objects from 3D data is one of the biggest challenges in Computer Vision and Computer Graphics. The long term goal is to generate a CAD-style model in an as-automatic-as-possible way. To achieve this goal, difficult issues have to be addressed including (i) the scalability of the modeling process with respect to massive input data, (ii) the robustness of the methodology to various defect-laden input measurements, and (iii) the geometric quality of output models. Existing methods work well to recover the surface of free-form objects. However, in case of manmade objects, it is difficult to produce results that approach the quality of high-structured representations as CAD models.In this thesis, we present a series of contributions to the field. First, we propose a classification method based on deep learning to distinguish objects from raw 3D point cloud. Second, we propose an algorithm to detect planar primitives in 3D data at different level of abstraction. Finally, we propose a mechanism to assemble planar primitives into compact polygonal meshes. These contributions are complementary and can be used sequentially to reconstruct city models at various level-of-details from airborne 3D data. We illustrate the robustness, scalability and efficiency of our methods on both laser and multi-view stereo data composed of man-made objects. Nuage de points Maillage polygonal Segmentation sémantique Apprentissage approfondi Direction de forme Primitives géométriques Reconstruction de surface Point cloud Polygonal mesh Semantic segmentation Deep learning Shape detection Geometric primitives Surface reconstruction
98	Semantic Segmentation of Iron Pellets as a Cloud Service Christopher, Rosenvall January 2020 (has links) This master’s thesis evaluates automatic data annotation and machine learning predictions of iron ore pellets using tools provided by Amazon Web Services (AWS) in the cloud. The main tool in focus is Amazon SageMaker which is capable of automatic data annotation as well as building, training and deploying machine learning models quickly. Three different models was trained using SageMakers built in semantic segmentation algorithm, PSP, FCN and DeepLabV3. The dataset used for training and evaluation contains 180 images of iron ore pellets collected from LKAB’s experimental blast furnace in Luleå, Sweden. The Amazon Web Services solution for automatic annotation was shown to be of no use when annotating microscopic images of iron ore pellets. Ilastik which is an interactive learning and segmentation toolkit showed far superiority for the task at hand. Out of the three trained networks Fully-Convolutional Network (FCN) performed best looking at inference and training times, it was the quickest network to train and performed within 1% worse than the fastest in regard to inference time. The Fully-Convolutional Network had an average accuracy of 85.8% on the dataset, where both PSP & DeepLabV3 was showing similar performance. From the results in this thesis it was concluded that there are benefits of running deep neural networks as a cloud service for analysis and management ofiron ore pellets. AWS Amazon Web Services Iron Pellets Iron Ore Iron Ore Pellets Cloud Cloud Service Semantic Segmentation Machine Learning SageMaker Neural Networks FCN PSP DeepLab Computer Systems Datorsystem
99	U-net based deep learning architectures for object segmentation in biomedical images Nahian Siddique (11219427) 04 August 2021 (has links) <div>U-net is an image segmentation technique developed primarily for medical image analysis that can precisely segment images using a scarce amount of training data. These traits provide U-net with a high utility within the medical imaging community and have resulted in extensive adoption of U-net as the primary tool for segmentation tasks in medical imaging. The success of U-net is evident in its widespread use in nearly all major image modalities from CT scans and MRI to X-rays and microscopy. Furthermore, while U-net is largely a segmentation tool, there have been instances of the use of U-net in other applications. Given that U-net's potential is still increasing, this review examines the numerous developments and breakthroughs in the U-net architecture and provides observations on recent trends. We also discuss the many innovations that have advanced in deep learning and discuss how these tools facilitate U-net. In addition, we review the different image modalities and application areas that have been enhanced by U-net.</div><div>In recent years, deep learning for health care is rapidly infiltrating and transforming medical fields thanks to the advances in computing power, data availability, and algorithm development. In particular, U-Net, a deep learning technique, has achieved remarkable success in medical image segmentation and has become one of the premier tools in this area. While the accomplishments of U-Net and other deep learning algorithms are evident, there still exist many challenges in medical image processing to achieve human-like performance. In this thesis, we propose a U-net architecture that integrates a residual skip connections and recurrent feedback with EfficientNet as a pretrained encoder. Residual connections help feature propagation in deep neural networks and significantly improve performance against networks with a similar number of parameters while recurrent connections ameliorate gradient learning. We also propose a second model that utilizes densely connected layers aiding deeper neural networks. And the proposed third model that incorporates fractal expansions to bypass diminishing gradients. EfficientNet is a family of powerful pretrained encoders that streamline neural network design. The use of EfficientNet as an encoder provides the network with robust feature extraction that can be used by the U-Net decoder to create highly accurate segmentation maps. The proposed networks are evaluated against state-of-the-art deep learning based segmentation techniques to demonstrate their superior performance.</div> Computer Vision U-net Image Segmentation Semantic Segmentation Medical imaging Artificial neural network Deep Learning Neural Networks
100	Automatic segmentation of articular cartilage in arthroscopic images using deep neural networks and multifractal analysis Ångman, Mikael, Viken, Hampus January 2020 (has links) Osteoarthritis is a large problem affecting many patients globally, and diagnosis of osteoarthritis is often done using evidence from arthroscopic surgeries. Making a correct diagnosis is hard, and takes years of experience and training on thousands of images. Therefore, developing an automatic solution to perform the diagnosis would be extremely helpful to the medical field. Since machine learning has been proven to be useful and effective at classifying and segmenting medical images, this thesis aimed at solving the problem using machine learning methods. Multifractal analysis has also been used extensively for medical imaging segmentation. This study proposes two methods of automatic segmentation using neural networks and multifractal analysis. The thesis was performed using real arthroscopic images from surgeries. MultiResUnet architecture is shown to be well suited for pixel perfect segmentation. Classification of multifractal features using neural networks is also shown to perform well when compared to related studies. Convolutional neural networks multifractal analysis arthroscopy semantic segmentation wavelet leaders wavelet p-leaders Medical Image Processing Medicinsk bildbehandling Computer Engineering Datorteknik

Search results