Global ETD Search

51	Semantic segmentation of terrain and road terrain for advanced driver assistance systems Gheorghe, I. V. January 2015 Modern automobiles, particularly those with off-road lineage, possess subsystems that can be configured to better negotiate certain terrain types. Different terrain classes have different adherence (or surface grip) and compressibility properties that affect vehicle manoeuvrability and should therefore incur a tailored throttle response, suspension stiffness and so on. This thesis explores prospective terrain recognition for an anticipating terrain response driver assistance system. Recognition of terrain and road terrain is cast as a semantic segmentation task whereby forward driving images or point clouds are pre-segmented into atomic units and subsequently classified. Terrain classes are typically of amorphous spatial extent, containing homogeneous or granularly repetitive patterns. For this reason, colour and texture appearance is the saliency of choice for monocular vision. In this work, colour, texture and surface saliency of atomic units are obtained with a bag-of-features approach. Five terrain classes are considered, namely grass, dirt, gravel, shrubs and tarmac. Since colour can be ambiguous among terrain classes such as dirt and gravel, several texture flavours are explored with scalar and structured output learning in a bid to devise an appropriate visual terrain saliency and predictor combination. Texture variants are obtained using local binary patterns (LBP), filter responses (or textons) and dense key-point descriptors with DAISY. Learning algorithms tested include support vector machine (SVM), random forest (RF) and logistic regression (LR) as scalar predictors, while a conditional random field (CRF) is used for structured output learning. The latter encourages smooth labelling by incorporating the prior knowledge that neighbouring segments with similar saliency are likely segments of the same class. 
Once a suitable texture representation is devised, the attention shifts from monocular vision to stereo vision. Surface saliency from reconstructed point clouds can be used to enhance terrain recognition. Previous superpixels span corresponding supervoxels in real world coordinates and two surface saliency variants are proposed and tested with all predictors: one using the height coordinates of point clouds and the other using fast point feature histograms (FPFH). Upon realisation that road recognition and terrain recognition can be assumed as equivalent problems in urban environments, the most accurate models, consisting of CRFs, are augmented with compositional high order pattern potentials (CHOPP). This leads to models that are able to strike a good balance between smooth local labelling and global road shape. For urban environments the label set is restricted to road and non-road (or equivalently tarmac and non-tarmac). Experiments are conducted using a proprietary terrain dataset and a public road evaluation dataset. 629.28
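The local binary pattern texture cue named in entry 51 can be illustrated with a minimal sketch. It assumes the basic 3x3, 8-neighbour LBP variant on a grayscale image stored as a list of rows; the abstract does not say which LBP variant the thesis uses, and the names `lbp_code` and `lbp_histogram` are hypothetical. The normalised histogram is the kind of per-segment texture descriptor a bag-of-features pipeline could feed to a classifier.

```python
from collections import Counter

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
# Assumption: basic 3x3 LBP; the thesis may use another variant.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of an interior pixel."""
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # Set the bit when the neighbour is at least as bright as the centre.
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalised histogram of LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    counts = Counter(
        lbp_code(image, r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    )
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}
```

In a superpixel setting, the histogram would be accumulated only over the pixels of one segment rather than the whole image.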
52	Deep neural networks for semantic segmentation Bojja, Abhishake Kumar 28 April 2020 Segmenting an image into multiple meaningful regions is an essential task in Computer Vision. Deep Learning has been highly successful for segmentation, benefiting from the availability of annotated datasets and deep neural network architectures. However, depth-based hand segmentation, an important application area of semantic segmentation, has yet to benefit from rich and large datasets. In addition, while deep methods provide robust solutions, they are often not efficient enough for low-powered devices. In this thesis, we focus on these two problems. To tackle the lack of rich data, we propose an automatic method for generating high-quality annotations and introduce a large-scale hand segmentation dataset. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. Our automatic annotation method lowers the cost/complexity of creating high-quality datasets and makes it easy to expand the dataset in the future. To reduce the computational requirement and allow real-time segmentation on low power devices, we propose a new representation and architecture for deep networks that predict segmentation maps based on Voronoi Diagrams. Voronoi Diagrams split space into discrete regions based on proximity to a set of points, making them a powerful representation of regions, which we can then use to represent our segmentation outcomes. Specifically, we propose to estimate the location and class for these sets of points, which are then rasterized into an image. Notably, we use a differentiable definition of the Voronoi Diagram based on the softmax operator, enabling its use as a decoder layer in an end-to-end trainable network. 
As rasterization can take place at any given resolution, our method especially excels at rendering high-resolution segmentation maps, given a low-resolution image. We believe that our new HandSeg dataset will open new frontiers in Hand Segmentation research, and our cost-effective automatic annotation pipeline can benefit other relevant labeling tasks. Our newly proposed segmentation network enables high-quality segmentation representations that are not practically possible on low power devices using existing approaches. / Graduate Deep Learning Computer Vision Semantic Segmentation Dataset Hands Hand Segmentation Automatic Labelling Voronoi Implicit Representation Rendering Cityscapes HandSeg
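The softmax-based Voronoi rasterization described in entry 52 can be sketched as follows. This is one plausible reading, not the thesis's implementation: each pixel takes a softmax over negative squared distances to the sites, and the resulting weights blend the per-site class probabilities. The function name `soft_voronoi`, the `temperature` parameter and the unit-square coordinate convention are all assumptions. Plain Python is used for clarity; in a trainable network the same arithmetic would be written in an autodiff framework.

```python
import math


def soft_voronoi(sites, site_classes, height, width, temperature=0.05):
    """Rasterize sites into per-pixel class probabilities.

    sites:        list of (x, y) positions in the unit square (assumed layout)
    site_classes: list of per-site class probability lists
    temperature:  hypothetical sharpness control; smaller values approach
                  a hard Voronoi diagram
    """
    num_classes = len(site_classes[0])
    image = []
    for r in range(height):
        y = (r + 0.5) / height  # pixel centre in unit coordinates
        row = []
        for c in range(width):
            x = (c + 0.5) / width
            # Logits are negative squared distances to every site.
            logits = [-((x - sx) ** 2 + (y - sy) ** 2) / temperature
                      for sx, sy in sites]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in logits]
            norm = sum(weights)
            # Blend the site class distributions with the softmax weights.
            row.append([
                sum(w / norm * cls[k] for w, cls in zip(weights, site_classes))
                for k in range(num_classes)
            ])
        image.append(row)
    return image
```

Because `height` and `width` are free arguments, the same set of sites can be rendered at any resolution, which matches the abstract's point about producing high-resolution maps from a compact representation.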
53	AUTOMATIC ASSESSMENT OF BURN INJURIES USING ARTIFICIAL INTELLIGENCE Daniela Chanci Arrubla (11154033) 20 July 2021 Accurate assessment of burn injuries is critical for the correct management of such wounds. Depending on the total body surface area affected by the burn, and the severity of the injury, the optimal treatment and the surgical requirements are selected. However, such assessment is considered a clinical challenge. In this thesis, to address this challenge, an automatic framework that segments the burn using RGB images and classifies the injury by severity using ultrasound images is proposed and implemented. With the use of this framework, the conventional assessment approach, which relies exclusively on a physical and visual examination of the injury performed by medical practitioners, could be complemented and supported, yielding accurate results. Ultrasound is a noninvasive imaging modality that gives access to internal body structures that are not visible during the typical physical examination of the burn, and so provides complementary and useful information. The semantic segmentation module of the proposed approach was evaluated through one experiment. Similarly, the classification module was evaluated through two experiments. The second experiment assessed the effects of incorporating texture features as extra features for the classification task. Experimental results and evaluation metrics showed that the proposed framework performs satisfactorily on both the segmentation and classification problems. Therefore, this work acts as a first step towards the creation of a Computer-Aided Diagnosis and Detection system for burn injury assessment. Burn Injuries Ultrasound Artificial Intelligence Texture Computer Vision Semantic Segmentation Burn Depth Classification Burn Size
54	Deep Learning for 3D Perception: Computer Vision and Tactile Sensing Garcia-Garcia, Alberto 23 October 2019 The care of dependent people (for reasons of aging, accidents, disabilities or illnesses) is one of the top priority lines of research for the European countries, as stated in the Horizon 2020 goals. In order to minimize the cost and the intrusiveness of the therapies for care and rehabilitation, it is desirable that such care be administered at the patient’s home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems that lie at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we will focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact. To achieve full scene understanding, various tasks must be accomplished. In this thesis we will focus on three of them: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to the process of categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the third consists of determining whether an object that has been grasped by a robotic hand is in a stable configuration or will fall. This thesis presents contributions towards solving those three tasks using deep learning as the main tool for solving such recognition, segmentation, and prediction problems. All those solutions share one core observation: they all rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. 
The four main contributions of this thesis are: first, we show a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art of semantic segmentation datasets and methods; third, we introduce a novel synthetic and large-scale photorealistic dataset for solving various robotic and vision problems together; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability. Deep Learning Computer Vision Synthetic Data Tactile Sensing Convolutional Neural Networks Semantic Segmentation
55	Automatic classification of fish and bubbles at pixel-level precision in multi-frequency acoustic echograms using U-Net convolutional neural networks Slonimer, Alex 05 April 2022 Multi-frequency backscatter acoustic profilers (echosounders) are used to measure biological and physical phenomena in the ocean in ways that are not possible with optical methods. Echosounders are commonly used on ocean observatories and by commercial fisheries but require significant manual effort to classify species of interest within the collected echograms. The work presented in this thesis tackles the challenging task of automating the identification of fish and other phenomena in echosounder data, with specific application to aggregations of juvenile salmon, schools of herring, and bubbles of air that have been mixed into the water. U-Net convolutional neural networks (CNNs) are used to accomplish this task by identifying classes at the pixel level. The data considered here were collected in Okisollo Channel on the coast of British Columbia, Canada, using an Acoustic Zooplankton and Fish Profiler at four frequencies (67.5, 125, 200, and 455 kHz). The entrainment of air bubbles and the behaviour of fish are both governed by the surrounding physical environment. To improve the classification, simulated channels for water depth and solar elevation angle (a proxy for sunlight) are used to encode the CNNs with information related to the environment providing spatial and temporal context. The manual annotation of echograms at the pixel level is a challenging process, and a custom application was developed to aid in this process. A relatively small set of annotations were created and are used to train the CNNs. During training, the echogram data are divided into randomly-spaced square tiles to encode the models with robust features, and into overlapping tiles for added redundancy during classification. 
This is done without removing noise in the data, thus ensuring broad applicability. This approach is proven highly successful, as evidenced by the best-performing U-Net model producing F1 scores of 93.0%, 87.3% and 86.5% for herring, salmon, and bubble classes, respectively. These models also achieve promising results when applied to echogram data with coarser resolution. One goal in fisheries acoustics is to detect distinct schools of fish. Following the initial pixel level classification, the results from the best performing U-Net model are fed through a heuristic module, inspired by traditional fisheries methods, that links connected components of identified fish (school candidates) into distinct school objects. The results are compared to the outputs from a recent study that relied on a Mask R-CNN architecture to apply instance segmentation for classifying fish schools. It is demonstrated that the U-Net/heuristic hybrid technique improves on the Mask R-CNN approach by a small amount for the classification of herring schools, and by a large amount for aggregations of juvenile salmon (improvement in mean average precision from 24.7% to 56.1%). / Graduate CNN bioacoustics echogram machine learning salmon herring semantic segmentation instance segmentation fish school ocean acoustics active acoustics u-net
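The per-class F1 scores quoted in entry 55 follow the standard definition F1 = 2TP / (2TP + FP + FN), computed separately for each class. A minimal sketch, assuming pixel labels flattened into two equal-length sequences; the function name `per_class_f1` and the toy labels below are illustrative and are not data from the thesis.

```python
def per_class_f1(truth, predicted, classes):
    """Return {class: F1} for two equal-length label sequences."""
    scores = {}
    for cls in classes:
        pairs = list(zip(truth, predicted))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a class absent from both sequences scores 0.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy example with made-up pixel labels.
truth = ["herring", "herring", "salmon", "bubble"]
predicted = ["herring", "salmon", "salmon", "bubble"]
scores = per_class_f1(truth, predicted, ["herring", "salmon", "bubble"])
```

For full echograms the same counts would normally be accumulated with array operations rather than Python loops.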
56	Apprentissage statistique de classes sémantiques pour l'interprétation d'images aériennes / Learning of semantic classes for aerial image analysis Randrianarivo, Hicham 15 December 2016 This work concerns the interpretation of the content of very high resolution panchromatic optical aerial images. Two methods are developed for classifying the content of such images: the first detects instances of the different object categories, and the second performs semantic segmentation of the image's superpixels using a model of the context between superpixels. The object detection method for very high resolution images learns a mixture of appearance models for the object category to be detected and then fuses the hypotheses returned by the individual models. We propose a method that partitions the training examples into visual subcategories through a two-stage procedure based on the available metadata and on the appearance of the training examples. This partitioning makes it possible to learn appearance models that each specialise in recognising a subset of the dataset, and whose fusion generalises detection to all objects of the class. The performance of the resulting detector is evaluated on several very high resolution aerial image datasets at different resolutions and from several locations around the world. The contextual semantic segmentation method combines the visual description of a superpixel extracted from an image with context information extracted between that superpixel and its neighbors. 
The context between superpixels is represented with a graphical model over neighboring superpixels, in which the nodes are the visual representations of the superpixels and the edges are the contextual representations between two neighbors. Finally, we present a method that predicts the category of a superpixel from the decisions given by its neighbors, in order to make the predictions more robust. The method was tested on a dataset of very high resolution aerial images. 
Apprentissage statistique Détection d'objets Modèle de contexte Segmentation sémantique Machine learning Object detection Contextual Model Semantic segmentation 006.3 621.367 8
57	Semantic Segmentation Using Deep Learning Neural Architectures Sarpangala, Kishan January 2019 No description available. Artificial Intelligence Semantic Segmentation Convolutional Neural Network Computer Vision Deep Learning Neural Network Artificial Intelligence Fully Convolutional Network
58	Semantic Segmentation of RGB images for feature extraction in Real Time Elavarthi, Pradyumna January 2019 No description available. Computer Science Target Identification semantic segmentation depth-wise convolution fully convolutional neural networks neural networks object detection
59	The World in 3D : Geospatial Segmentation and Reconstruction Robín Karlsson, David January 2022 Deep learning has proven a powerful tool for image analysis during the past two decades. With the rise of high resolution overhead imagery, an opportunity for automatic geospatial 3D-recreation has presented itself. This master thesis researches the possibility of 3D-recreation through deep learning based image analysis of overhead imagery. The goal is a model capable of making predictions for three different tasks: heightmaps, boundary proximity heatmaps and semantic segmentations. A new neural network is designed with the novel feature of supplying the predictions from one task to another with the goal of improving performance. A number of strategies to ensure the model generalizes to unseen data are employed. The model is trained using satellite and aerial imagery from a variety of cities on the planet. The model is meticulously evaluated by using four common performance metrics. For datasets with no ground truth data, the results were assessed visually. This thesis concludes that it is possible to create a deep learning network capable of making predictions for the three tasks with varying success, performing best for heightmaps and worst for semantic segmentation. It was observed that supplying estimations from one task to another can both improve and decrease performance. Analysis of which features in an image are important for the three tasks was clear in some images and unclear in others. Lastly, validation proved that a number of random transformations during the training process helped the model generalize to unseen data. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University. deep learning ai machine learning geospatial gis heightmap semantic segmentation satellite imagery aerial imagery Media and Communication Technology Medieteknik
60	2D object detection and semantic segmentation in the Carla simulator / 2D-objekt detektering och semantisk segmentering i Carla-simulatorn Wang, Chen January 2020 The subject of self-driving car technology has drawn growing interest in recent years. Many companies, such as Baidu and Tesla, have already introduced automatic driving techniques in their newest cars for driving in specific areas. However, there are still many challenges ahead on the way to fully autonomous cars. Tesla cars have caused several severe accidents while using autonomous driving functions, which makes the public doubt self-driving car technology. Therefore, it is necessary to use a simulator environment to help verify and perfect algorithms for the perception, planning, and decision-making of autonomous vehicles before implementation in real-world cars. This project aims to build a benchmark for implementing the whole self-driving car system in software. The entire autonomous driving system has three main components: perception, planning, and control. This thesis focuses on two sub-tasks in the perception part: 2D object detection and semantic segmentation. All of the experiments are tested in a simulator environment called The CAR Learning to Act (Carla), an open-source platform for autonomous car research. The Carla simulator is built on a game engine (Unreal4) and has a server-client system that provides a flexible Python API. 2D object detection uses the You Only Look Once (Yolov4) algorithm, which incorporates recent deep learning techniques in network structure and data augmentation to strengthen the network’s ability to learn objects. Yolov4 achieves higher accuracy and shorter inference time than other popular object detection algorithms. Semantic segmentation uses Efficient networks for Computer Vision (ESPnetv2). 
It is a lightweight and power-efficient network, which achieves the same performance as other semantic segmentation algorithms while using fewer network parameters and FLOPs. In this project, Yolov4 and ESPnetv2 are integrated into the Carla simulator, and the two modules work together to help the autonomous car understand the world. A minimal distance awareness application is also implemented in the Carla simulator to detect the distance to the vehicles ahead; it can serve as a basic function for collision avoidance. Experiments are run on a single Nvidia GPU (RTX2060) on an Ubuntu 18.0 system. Object-detection Semantic segmentation Yolov4 ESPnetv2 Carla Objekt-detection semantisk segmentering Yolov4 ESPnetv2 Carla Computer and Information Sciences Data- och informationsvetenskap

Search results