61

Compréhension de scènes urbaines par combinaison d'information 2D/3D / Urban scenes understanding by combining 2D/3D information

Bauda, Marie-Anne 13 June 2016 (has links)
Cette thèse traite du problème de segmentation sémantique d'une séquence d'images calibrées acquises dans un environnement urbain. Ce problème consiste, plus précisément, à partitionner chaque image en régions représentant les objets de la scène (façades, routes, etc.). Ainsi, à chaque région est associée une étiquette sémantique. Dans notre approche, l'étiquetage s'opère via des primitives visuelles de niveau intermédiaire appelées super-pixels, lesquels regroupent des pixels similaires au sens de différents critères proposés dans la littérature, qu'ils soient photométriques (s'appuyant sur les couleurs) ou géométriques (limitant la taille des super-pixels formés). Contrairement à l'état de l'art, où les travaux récents traitant le même problème s'appuient en entrée sur une sur-segmentation initiale sans la remettre en cause, notre idée est de proposer, dans un contexte multi-vues, une nouvelle approche de construction de super-pixels s'appuyant sur une analyse tridimensionnelle de la scène et, en particulier, de ses structures planes. Pour construire de «meilleurs» super-pixels, une mesure de planéité locale, qui quantifie à quel point la zone traitée de l'image correspond à une surface plane de la scène, est introduite. Cette mesure est évaluée à partir d'une rectification homographique entre deux images proches, induite par un plan candidat au support des points 3D associés à la zone traitée. Nous analysons l'apport de la mesure UQI (Universal Quality Image) et montrons qu'elle se compare favorablement aux autres métriques qui ont le potentiel de détecter des structures planes. On introduit ensuite un nouvel algorithme de construction de super-pixels, fondé sur l'algorithme SLIC (Simple Linear Iterative Clustering), dont le principe est de regrouper les plus proches voisins au sens d'une distance fusionnant similarités en couleur et en distance, et qui intègre cette mesure de planéité. Ainsi, la sur-segmentation obtenue, couplée à la cohérence inter-images provenant de la validation de la contrainte de planéité locale de la scène, permet d'attribuer une étiquette à chaque entité et d'obtenir ainsi une segmentation sémantique qui partitionne l'image en objets plans. / This thesis deals with the problem of semantic segmentation of a calibrated image sequence acquired in an urban environment. More precisely, the problem is to partition each image into regions representing the objects of the scene (facades, roads, etc.), so that each region is associated with a semantic label. In our approach, the labelling operates on mid-level visual primitives called super-pixels, which group similar pixels according to various criteria proposed in the literature, whether photometric (based on colour) or geometric (limiting the size of the super-pixels formed). Unlike the state of the art, where recent work addressing the same problem takes an initial over-segmentation as input without calling it into question, our idea is to propose, in a multi-view context, a new super-pixel construction approach based on a three-dimensional analysis of the scene and, in particular, of its planar structures. To build "better" super-pixels, a local planarity measure is introduced which quantifies the extent to which the image region under consideration corresponds to a planar surface of the scene. This measure is evaluated from a homographic rectification between two nearby images, induced by a candidate plane supporting the 3D points associated with the region. We analyse the contribution of the UQI (Universal Image Quality) measure and show that it compares favourably with other metrics capable of detecting planar structures. We then introduce a new super-pixel construction algorithm, based on the SLIC (Simple Linear Iterative Clustering) algorithm, whose principle is to group nearest neighbours according to a distance combining colour similarity and spatial proximity, and which integrates this planarity measure. The resulting over-segmentation, coupled with the inter-image coherence obtained by validating the local planarity constraint of the scene, makes it possible to assign a label to each entity and thus obtain a semantic segmentation that partitions the image into planar objects.
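As a rough illustration of the planarity test described above (a sketch, not the author's actual implementation), the following Python snippet computes the universal image quality index between a reference patch and the corresponding patch of a neighbouring image warped by a candidate plane-induced homography. The function names, and the use of OpenCV for the warp, are assumptions.

```python
import numpy as np
import cv2  # assumed here for the homographic warp; any warping routine would do

def uqi(x: np.ndarray, y: np.ndarray) -> float:
    """Universal image quality index (Wang & Bovik, 2002) between two patches.

    Q = 4 * cov(x, y) * mean(x) * mean(y)
        / ((var(x) + var(y)) * (mean(x)^2 + mean(y)^2))
    Q equals 1 for identical patches and decreases with distortion.
    """
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    denom = (vx + vy) * (mx ** 2 + my ** 2)
    return 4.0 * cxy * mx * my / denom if denom > 0 else 1.0

def planarity_score(img_a, img_b, H, patch_slice):
    """Score how well a candidate plane (homography H from view A to view B)
    explains an image region: rectify view B into A's frame and compare
    the region with UQI. A high score suggests a locally planar surface."""
    h, w = img_a.shape[:2]
    rectified = cv2.warpPerspective(img_b, np.linalg.inv(H), (w, h))
    return uqi(img_a[patch_slice], rectified[patch_slice])
```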
62

Make it Meaningful : Semantic Segmentation of Three-Dimensional Urban Scene Models

Lind, Johan January 2017 (has links)
Semantic segmentation of a scene aims to give meaning to the scene by dividing it into meaningful (semantic) parts. Understanding the scene is of great interest for all kinds of autonomous systems, but manual annotation is simply too time-consuming, which is why there is a need for an alternative approach. This thesis investigates the possibility of automatically segmenting 3D models of urban scenes, such as buildings, into a predetermined set of labels. The approach was to first acquire ground-truth data by manually annotating five 3D models of different urban scenes. The next step was to extract features from the 3D models and evaluate which ones constitute a suitable feature space. Finally, three supervised learners were implemented and evaluated: k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Classification Forest (RCF). The classification was done point-wise, assigning a label to each 3D point in the dense point cloud belonging to the model being classified. The results showed that the most suitable feature space is not necessarily the one containing all features. The KNN classifier achieved the highest average accuracy over all models, classifying 42.5% of the 3D points correctly. The RCF classifier managed to classify 66.7% of the points correctly in one of the models, but performed worse on the remaining models, resulting in a lower average accuracy than KNN. In general, KNN, SVM and RCF seemed to have different benefits and drawbacks. KNN is simple and intuitive but by far the slowest classifier when dealing with a large set of training data. SVM and RCF are both fast but difficult to tune, as there are more parameters to adjust. Whether the relatively low peak accuracy was due to the lack of ground-truth training data, unbalanced validation models, or the capacity of the learners was not investigated due to the limited time span. However, this ought to be investigated in future studies.
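A minimal sketch of the point-wise KNN classification step using scikit-learn. The feature files and the choice of k are placeholders, not the thesis's actual pipeline; per-point features would be whatever the evaluated feature space contains (colour, normals, local geometry, etc.).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical per-point features and labels from the manually annotated models.
train_features = np.load("train_features.npy")  # shape (N, d), assumed file
train_labels = np.load("train_labels.npy")      # shape (N,)
test_features = np.load("test_features.npy")    # points of the model to classify
test_labels = np.load("test_labels.npy")

knn = KNeighborsClassifier(n_neighbors=15)      # k is a tuning choice
knn.fit(train_features, train_labels)
pred = knn.predict(test_features)               # one label per 3D point

accuracy = (pred == test_labels).mean()
print(f"point-wise accuracy: {accuracy:.1%}")
```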
63

Semantic segmentation of terrain and road terrain for advanced driver assistance systems

Gheorghe, I. V. January 2015 (has links)
Modern automobiles, and particularly those with off-road lineage, possess subsystems that can be configured to better negotiate certain terrain types. Different terrain classes amount to different adherence (or surface grip) and compressibility properties that impact vehicle manoeuvrability and should therefore incur a tailored throttle response, suspension stiffness and so on. This thesis explores prospective terrain recognition for an anticipating terrain-response driver assistance system. Recognition of terrain and road terrain is cast as a semantic segmentation task whereby forward driving images or point clouds are pre-segmented into atomic units and subsequently classified. Terrain classes are typically of amorphous spatial extent, containing homogeneous or granularly repetitive patterns. For this reason, colour and texture appearance is the saliency of choice for monocular vision. In this work, colour, texture and surface saliency of atomic units are obtained with a bag-of-features approach. Five terrain classes are considered, namely grass, dirt, gravel, shrubs and tarmac. Since colour can be ambiguous among terrain classes such as dirt and gravel, several texture flavours are explored with scalar and structured output learning in a bid to devise an appropriate combination of visual terrain saliency and predictor. Texture variants are obtained using local binary patterns (LBP), filter responses (or textons) and dense key-point descriptors with daisy. Learning algorithms tested include support vector machine (SVM), random forest (RF) and logistic regression (LR) as scalar predictors, while a conditional random field (CRF) is used for structured output learning. The latter encourages smooth labelling by incorporating the prior knowledge that neighbouring segments with similar saliency are likely segments of the same class. Once a suitable texture representation is devised, the attention shifts from monocular vision to stereo vision. Surface saliency from reconstructed point clouds can be used to enhance terrain recognition. The previously obtained superpixels span corresponding supervoxels in real-world coordinates, and two surface saliency variants are proposed and tested with all predictors: one using the height coordinates of the point clouds and the other using fast point feature histograms (FPFH). Upon the realisation that road recognition and terrain recognition can be treated as equivalent problems in urban environments, the most accurate models, consisting of CRFs, are augmented with compositional high-order pattern potentials (CHOPP). This leads to models that strike a good balance between smooth local labelling and global road shape. For urban environments the label set is restricted to road and non-road (or equivalently tarmac and non-tarmac). Experiments are conducted using a proprietary terrain dataset and a public road evaluation dataset.
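A sketch of one of the texture flavours mentioned above: per-segment histograms of uniform LBP codes as a bag-of-features descriptor. It uses scikit-image's LBP implementation; the parameter choices (P, R, the 'uniform' variant) are assumptions, not the thesis's configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_bag_of_features(gray, segments, P=8, R=1.0):
    """Per-superpixel LBP histograms as a bag-of-features texture descriptor.

    gray:     2D greyscale image
    segments: 2D array of superpixel ids (e.g. from SLIC), same shape as gray
    Returns an (n_segments, n_bins) array of normalised histograms.
    """
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                        # number of distinct 'uniform' codes
    ids = np.unique(segments)
    hists = np.zeros((len(ids), n_bins))
    for row, sid in enumerate(ids):
        codes = lbp[segments == sid]
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
        hists[row] = hist / max(codes.size, 1)   # normalise by segment size
    return hists
```

These histograms would then be fed to the scalar predictors (SVM, RF, LR) or used as unary saliency in the CRF.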
64

Polyp segmentation using artificial neural networks

Rodríguez Villegas, Antoni January 2020 (has links)
Colorectal cancer is the second leading cause of cancer death in the world. Aiming to detect and prevent this type of cancer early, clinicians perform screenings through the colon searching for polyps (colorectal cancer precursor lesions). If found, these lesions can be removed in order to further analyze their degree of malignancy. Automatic polyp segmentation is of primary importance when it comes to computer-aided medical diagnosis using images obtained in colonoscopy screenings; its results allow for more precise medical diagnosis, which can lead to earlier detection. This project proposes a neural-network-based solution for semantic segmentation, using the U-net architecture. Combining different data augmentation techniques to alleviate the problem of data scarcity and conducting experiments on the different hyperparameters of the network, the U-net scored a mean Intersection over Union (IoU) of 0.6814. A final approach that combines the prediction maps of different models scored a mean IoU of 0.7236.
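A minimal sketch of the evaluation metric and the model-combination step described above, in NumPy. The averaging-then-thresholding scheme is an assumption about how "combining prediction maps" might be realised; the abstract does not specify the mechanism.

```python
import numpy as np

def binary_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union for binary masks (polyp vs. background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:                 # both masks empty: define IoU as 1
        return 1.0
    return np.logical_and(pred, target).sum() / union

def ensemble_prediction(prob_maps, threshold=0.5):
    """Average per-pixel probability maps from several models, then threshold."""
    return np.mean(prob_maps, axis=0) >= threshold

def mean_iou(per_image_probs, masks):
    """Mean IoU over a test set of (per-model probability maps, ground truth)."""
    return np.mean([binary_iou(ensemble_prediction(p), m)
                    for p, m in zip(per_image_probs, masks)])
```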
65

Semantic Segmentation of Point Clouds Using Deep Learning / Semantisk Segmentering av Punktmoln med Deep Learning

Tosteberg, Patrik January 2017 (has links)
In computer vision, it has in recent years become more popular to use point clouds to represent 3D data. To understand what a point cloud contains, methods like semantic segmentation can be used. Semantic segmentation is the problem of segmenting images or point clouds and understanding what the different segments are. One application of semantic segmentation of point clouds is autonomous driving, where the car needs information about the objects in its surroundings. Our approach to the problem is to project the point clouds into 2D virtual images using the Katz projection. Then we use pre-trained convolutional neural networks to semantically segment the images. To get the semantically segmented point clouds, we project the scores from the segmentation back into the point cloud. Our approach is evaluated on the Semantic3D dataset. We find our method comparable to the state of the art, without any fine-tuning on the Semantic3D dataset.
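A sketch of the back-projection step: each 3D point receives the label of the pixel it projects to in the segmented virtual image. This assumes a simple pinhole projection matrix and ignores visibility handling (which the Katz projection addresses in the actual method); it is an illustration, not the thesis's implementation.

```python
import numpy as np

def backproject_labels(points, P, label_img):
    """Assign each 3D point the label of the pixel it projects to.

    points:    (N, 3) world-space points
    P:         (3, 4) camera projection matrix for the virtual view
    label_img: (H, W) semantic labels predicted for the rendered 2D image
    Returns an (N,) label array; points projecting outside the image get -1.
    """
    H, W = label_img.shape
    homog = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    uvw = homog @ P.T                                       # (N, 3)
    w = uvw[:, 2]
    front = w > 1e-9                                        # points in front of camera
    u = np.zeros(len(points), dtype=int)
    v = np.zeros(len(points), dtype=int)
    u[front] = np.round(uvw[front, 0] / w[front]).astype(int)
    v[front] = np.round(uvw[front, 1] / w[front]).astype(int)
    labels = np.full(len(points), -1)
    inside = front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels[inside] = label_img[v[inside], u[inside]]
    return labels
```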
66

Deep neural networks for semantic segmentation

Bojja, Abhishake Kumar 28 April 2020 (has links)
Segmenting an image into multiple meaningful regions is an essential task in computer vision. Deep learning has been highly successful for segmentation, benefiting from the availability of annotated datasets and deep neural network architectures. However, depth-based hand segmentation, an important application area of semantic segmentation, has yet to benefit from rich and large datasets. In addition, while deep methods provide robust solutions, they are often not efficient enough for low-powered devices. In this thesis, we focus on these two problems. To tackle the lack of rich data, we propose an automatic method for generating high-quality annotations and introduce a large-scale hand segmentation dataset. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. Our automatic annotation method lowers the cost and complexity of creating high-quality datasets and makes it easy to expand the dataset in the future. To reduce the computational requirements and allow real-time segmentation on low-power devices, we propose a new representation and architecture for deep networks that predict segmentation maps based on Voronoi diagrams. Voronoi diagrams split space into discrete regions based on proximity to a set of points, making them a powerful representation of regions, which we can then use to represent our segmentation outcomes. Specifically, we propose to estimate the location and class of these sets of points, which are then rasterized into an image. Notably, we use a differentiable definition of the Voronoi diagram based on the softmax operator, enabling its use as a decoder layer in an end-to-end trainable network. As rasterization can take place at any given resolution, our method especially excels at rendering high-resolution segmentation maps given a low-resolution image. We believe that our new HandSeg dataset will open new frontiers in hand segmentation research, and our cost-effective automatic annotation pipeline can benefit other relevant labeling tasks. Our newly proposed segmentation network enables high-quality segmentation representations that are not practically possible on low-power devices using existing approaches. / Graduate
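One way to realise the softmax-based differentiable Voronoi rasterization described above is sketched below in PyTorch. The normalised coordinate convention and the temperature parameter tau are assumptions; the thesis's actual decoder may differ.

```python
import torch

def soft_voronoi_rasterize(sites, class_logits, H, W, tau=0.01):
    """Differentiable Voronoi rasterization via a softmax over distances.

    sites:        (K, 2) site coordinates in [0, 1]^2 (predicted by the network)
    class_logits: (K, C) per-site class scores
    Returns an (H, W, C) soft segmentation map; as tau -> 0 each pixel
    takes the class of its nearest site, i.e. a hard Voronoi diagram.
    """
    ys = torch.linspace(0, 1, H)
    xs = torch.linspace(0, 1, W)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)  # (H, W, 2)
    # squared distance from every pixel to every site: (H, W, K)
    d2 = ((grid[:, :, None, :] - sites[None, None, :, :]) ** 2).sum(-1)
    weights = torch.softmax(-d2 / tau, dim=-1)   # soft pixel-to-site assignment
    return weights @ class_logits                # (H, W, C) soft class map
```

Because the softmax is differentiable with respect to the site locations, gradients flow back to the network predicting them, and H and W can be chosen freely at inference time, which is what enables high-resolution maps from low-resolution inputs.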
67

Semantic segmentation using convolutional neural networks to facilitate motion tracking of feet : For real-time analysis of perioperative microcirculation images in patients with critical limb threatening ischemia

Öberg, Andreas, Hulterström, Martin January 2021 (has links)
This thesis investigates the use of Convolutional Neural Networks (CNNs) to perform semantic segmentation of feet during endovascular surgery in patients with Critical Limb Threatening Ischemia (CLTI). It is currently being investigated whether objective assessment of perfusion can aid surgeons during endovascular surgery. By segmenting feet, it is possible to perform automatic analysis of perfusion data, which could give information about the impact of the surgery in specific Regions of Interest (ROIs). The CNN was developed in Python with a U-net architecture, which has been shown to be state of the art when it comes to medical image segmentation. An image set containing approximately 78 000 images of feet and their ground-truth segmentations was manually created from 11 videos taken during surgery and one video taken of three healthy test subjects. All videos were captured with a Multi-Exposure Laser Speckle Contrast Imaging (MELSCI) camera developed by Hultman et al. [1]. The best-performing CNN was an ensemble model consisting of 10 sub-models, each trained with a different set of training data. An ROI tracking algorithm was developed based on the U-net output, taking advantage of the simplicity of edge detection in binary images. The algorithm converts images into point clouds and calculates a transformation between two point clouds with the use of the Iterative Closest Point (ICP) algorithm. The result is a system that performs automatic tracking of manually selected ROIs, which enables continuous measurement of perfusion in the ROIs during endovascular surgery.
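A self-contained sketch of the ICP step underlying the ROI tracking: a minimal rigid 2D ICP with nearest-neighbour matching and a closed-form (Kabsch/SVD) alignment per iteration. The abstract does not give the implementation details, so this is only an illustration of the technique named.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, n_iters=50, tol=1e-6):
    """Minimal rigid 2D ICP: align `source` (M, 2) to `target` (N, 2).

    Returns a rotation R (2, 2) and translation t (2,) such that
    source @ R.T + t approximates its nearest neighbours in target.
    """
    R, t = np.eye(2), np.zeros(2)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(n_iters):
        moved = source @ R.T + t
        dists, idx = tree.query(moved)              # nearest-neighbour matches
        matched = target[idx]
        # closed-form rigid alignment of moved -> matched (Procrustes via SVD)
        mu_s, mu_m = moved.mean(0), matched.mean(0)
        U, _, Vt = np.linalg.svd((moved - mu_s).T @ (matched - mu_m))
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:               # keep a proper rotation
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_m - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step      # compose with running transform
        err = dists.mean()
        if abs(prev_err - err) < tol:               # converged
            break
        prev_err = err
    return R, t
```

Applying the estimated transform to the manually selected ROI coordinates moves the ROI along with the foot between frames.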
68

AUTOMATIC ASSESSMENT OF BURN INJURIES USING ARTIFICIAL INTELLIGENCE

Daniela Chanci Arrubla (11154033) 20 July 2021 (has links)
Accurate assessment of burn injuries is critical for the correct management of such wounds. Depending on the total body surface area affected by the burn and the severity of the injury, the optimal treatment and the surgical requirements are selected. However, such assessment is considered a clinical challenge. In this thesis, to address this challenge, an automatic framework is proposed and implemented that segments the burn using RGB images and classifies the injury by severity using ultrasound images. With the use of this framework, the conventional assessment approach, which relies exclusively on a physical and visual examination of the injury performed by medical practitioners, could be complemented and supported, yielding accurate results. The ultrasound data enable the assessment of internal structures of the body, which can provide complementary and useful information; ultrasound is a noninvasive imaging modality that gives access to internal body structures not visible during the typical physical examination of the burn. The semantic segmentation module of the proposed approach was evaluated through one experiment. Similarly, the classification module was evaluated through two experiments; the second assessed the effect of incorporating texture features as additional features for the classification task. Experimental results and evaluation metrics demonstrated the satisfactory performance of the proposed framework on the segmentation and classification problems. This work therefore acts as a first step towards the creation of a Computer-Aided Diagnosis and Detection system for burn injury assessment.
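The abstract does not specify which texture features were used, so the following is only one plausible option: grey-level co-occurrence matrix (GLCM) statistics, a common choice of "extra texture features" for classical classifiers on ultrasound patches, using scikit-image.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in older skimage

def glcm_texture_features(ultrasound_patch: np.ndarray) -> np.ndarray:
    """GLCM texture features for a burn-region ultrasound patch.

    Returns a flat feature vector of contrast, homogeneity, energy and
    correlation over several distances and angles, to be concatenated
    with whatever other features the classifier uses.
    """
    patch = ultrasound_patch.astype(np.uint8)
    glcm = graycomatrix(patch,
                        distances=[1, 2],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```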
69

Deep Learning for 3D Perception: Computer Vision and Tactile Sensing

Garcia-Garcia, Alberto 23 October 2019 (has links)
The care of dependent people (for reasons of aging, accidents, disabilities or illnesses) is one of the top-priority lines of research for the European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of the therapies for care and rehabilitation, it is desirable that such care is administered at the patient's home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we will focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the latter consists of determining whether an object which has been grasped by a robotic hand is in a stable configuration or whether it will fall. This thesis presents contributions towards solving these three tasks using deep learning as the main tool for such recognition, segmentation, and prediction problems. All these solutions share one core observation: they all rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. The four main contributions of this thesis are: first, we show a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art of semantic segmentation datasets and methods; third, we introduce a novel synthetic and large-scale photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability.
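The abstract does not describe the thesis's specific architectures, so the sketch below is a generic PointNet-style classifier in PyTorch illustrating the first contribution's core idea: classifying raw point clouds with a network that is invariant to the ordering of the points.

```python
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    """PointNet-style classifier: a shared per-point MLP followed by a
    symmetric max-pool, making the prediction invariant to point order."""

    def __init__(self, n_classes: int, n_feats: int = 3):
        super().__init__()
        self.per_point = nn.Sequential(
            nn.Linear(n_feats, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, n_points, n_feats), e.g. raw xyz coordinates
        feats = self.per_point(points)     # (batch, n_points, 256)
        pooled = feats.max(dim=1).values   # order-invariant global feature
        return self.head(pooled)           # (batch, n_classes) logits

# usage: classify a batch of 4 clouds of 1024 points into 10 classes
logits = PointCloudClassifier(n_classes=10)(torch.randn(4, 1024, 3))
```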
70

Automatic classification of fish and bubbles at pixel-level precision in multi-frequency acoustic echograms using U-Net convolutional neural networks

Slonimer, Alex 05 April 2022 (has links)
Multi-frequency backscatter acoustic profilers (echosounders) are used to measure biological and physical phenomena in the ocean in ways that are not possible with optical methods. Echosounders are commonly used on ocean observatories and by commercial fisheries but require significant manual effort to classify species of interest within the collected echograms. The work presented in this thesis tackles the challenging task of automating the identification of fish and other phenomena in echosounder data, with specific application to aggregations of juvenile salmon, schools of herring, and bubbles of air that have been mixed into the water. U-Net convolutional neural networks (CNNs) are used to accomplish this task by identifying classes at the pixel level. The data considered here were collected in Okisollo Channel on the coast of British Columbia, Canada, using an Acoustic Zooplankton and Fish Profiler at four frequencies (67.5, 125, 200, and 455 kHz). The entrainment of air bubbles and the behaviour of fish are both governed by the surrounding physical environment. To improve the classification, simulated channels for water depth and solar elevation angle (a proxy for sunlight) are used to encode the CNNs with information related to the environment providing spatial and temporal context. The manual annotation of echograms at the pixel level is a challenging process, and a custom application was developed to aid in this process. A relatively small set of annotations were created and are used to train the CNNs. During training, the echogram data are divided into randomly-spaced square tiles to encode the models with robust features, and into overlapping tiles for added redundancy during classification. This is done without removing noise in the data, thus ensuring broad applicability. This approach is proven highly successful, as evidenced by the best-performing U-Net model producing F1 scores of 93.0%, 87.3% and 86.5% for herring, salmon, and bubble classes, respectively. These models also achieve promising results when applied to echogram data with coarser resolution. One goal in fisheries acoustics is to detect distinct schools of fish. Following the initial pixel level classification, the results from the best performing U-Net model are fed through a heuristic module, inspired by traditional fisheries methods, that links connected components of identified fish (school candidates) into distinct school objects. The results are compared to the outputs from a recent study that relied on a Mask R-CNN architecture to apply instance segmentation for classifying fish schools. It is demonstrated that the U-Net/heuristic hybrid technique improves on the Mask R-CNN approach by a small amount for the classification of herring schools, and by a large amount for aggregations of juvenile salmon (improvement in mean average precision from 24.7% to 56.1%). / Graduate
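A toy sketch of the linking step in the heuristic module described above: connected components of fish-classified pixels are grouped into candidate school objects and filtered by size. The actual module also incorporates fisheries-inspired rules that the abstract does not detail, so the threshold and connectivity here are assumptions.

```python
import numpy as np
from scipy import ndimage

def extract_school_objects(fish_mask: np.ndarray, min_pixels: int = 25):
    """Link connected components of fish-classified pixels into school objects.

    fish_mask:  2D boolean echogram mask from the pixel-level classifier
    min_pixels: heuristic size threshold to discard spurious detections
    Returns a list of (bounding slice, pixel count) per retained school.
    """
    labelled, n = ndimage.label(fish_mask)   # default 4-connectivity
    schools = []
    for i, region in enumerate(ndimage.find_objects(labelled), start=1):
        size = int((labelled[region] == i).sum())
        if size >= min_pixels:
            schools.append((region, size))
    return schools
```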
