61

Multitask Deep Learning models for real-time deployment in embedded systems / Deep Learning-modeller för multitaskproblem, anpassade för inbyggda system i realtidsapplikationer

Martí Rabadán, Miquel January 2017 (has links)
Multitask Learning (MTL) was conceived as an approach to improve the generalization ability of machine learning models. When applied to neural networks, multitask models take advantage of sharing resources to reduce the total inference time, memory footprint and model size. We propose MTL as a way to speed up deep learning models for applications in which multiple tasks need to be solved simultaneously, which is particularly useful in embedded, real-time systems such as the ones found in autonomous cars or UAVs.

In order to study this approach, we apply MTL to a Computer Vision problem in which both Object Detection and Semantic Segmentation tasks are solved, based on the Single Shot Multibox Detector and Fully Convolutional Networks with skip connections respectively, using a ResNet-50 as the base network. We train multitask models for two different datasets: Pascal VOC, which is used to validate the decisions made, and a combination of datasets with aerial-view images captured from UAVs.

Finally, we analyse the challenges that appear during the process of training multitask networks and try to overcome them. However, these hinder the capacity of our multitask models to reach the performance of the best single-task models trained without the limitations imposed by applying MTL. Nevertheless, multitask networks benefit from sharing resources and are 1.6x faster, lighter and use less memory than deploying the single-task models in parallel, which becomes essential when running them on a Jetson TX1 SoC, as the parallel approach does not fit into memory. We conclude that MTL has the potential to give superior performance as far as the object detection and semantic segmentation tasks are concerned, in exchange for a more complex training process that requires overcoming challenges not present in the training of single-task models.
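As a rough illustration of the resource sharing that makes the multitask model faster and lighter, here is a minimal PyTorch sketch of a shared ResNet-50 backbone feeding an FCN-style segmentation head and an SSD-style detection head. The class name, head designs and sizes are hypothetical simplifications, not the architecture used in the thesis:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultitaskNet(nn.Module):
    """Shared ResNet-50 backbone with two task-specific heads (illustrative)."""
    def __init__(self, num_classes_seg=21, num_anchors=6, num_classes_det=21):
        super().__init__()
        backbone = resnet50(weights=None)
        # Shared feature extractor: everything up to the final pooling layer.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Segmentation head: 1x1 classifier + upsampling (FCN-style).
        self.seg_head = nn.Conv2d(2048, num_classes_seg, kernel_size=1)
        # Detection head: per-location box offsets and class scores (SSD-style).
        self.det_head = nn.Conv2d(2048, num_anchors * (4 + num_classes_det),
                                  kernel_size=3, padding=1)

    def forward(self, x):
        feats = self.backbone(x)  # computed once, shared by both tasks
        seg = nn.functional.interpolate(
            self.seg_head(feats), size=x.shape[-2:],
            mode="bilinear", align_corners=False)
        det = self.det_head(feats)
        return seg, det

model = MultitaskNet()
seg_logits, det_maps = model(torch.randn(1, 3, 512, 512))
```

The point of the shared `self.backbone` call is exactly the saving claimed in the abstract: the expensive feature extraction runs once instead of once per task.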
62

Semantic Stixels fusing LIDAR for Scene Perception / Semantiska Stixlar med LIDAR för självkörande bilar

Forsberg, Olof January 2018 (has links)
Autonomous driving is the concept of a vehicle that operates in traffic without instructions from a driver. A major challenge for such a system is to provide a comprehensive, accurate and compact scene model based on information from sensors. For such a model to be comprehensive, it must provide 3D position and semantics for the relevant surroundings to enable safe traffic behavior; such a model creates a foundation for autonomous driving to make substantiated driving decisions. The model must also be compact to enable efficient processing, allowing driving decisions to be made in real time. In this thesis, rectangular objects (the Stixelworld) are used to represent the surroundings of a vehicle and provide a scene model. LIDAR and semantic segmentation are fused in the computation of these rectangles. The method indicates that a dense and compact scene model can be provided even from sparse LIDAR data through the use of semantic segmentation.
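To give a flavour of the stixel representation, the toy sketch below (not the thesis' algorithm) compresses each column of a per-pixel label map, with a depth map standing in for LIDAR, into vertical runs of constant class; the function name and tuple layout are invented for illustration:

```python
import numpy as np

def columns_to_stixels(labels, depth):
    """Compress a (H, W) semantic label map into per-column vertical runs.

    Returns a list of (col, row_top, row_bottom, class_id, median_depth)
    tuples: a crude stand-in for a stixel world. Illustrative only.
    """
    H, W = labels.shape
    stixels = []
    for c in range(W):
        top = 0
        for r in range(1, H + 1):
            # Close the current run at a class change or at the bottom row.
            if r == H or labels[r, c] != labels[top, c]:
                d = float(np.median(depth[top:r, c]))
                stixels.append((c, top, r - 1, int(labels[top, c]), d))
                top = r
    return stixels

labels = np.random.randint(0, 3, size=(8, 4))
depth = np.random.rand(8, 4) * 50.0
print(len(columns_to_stixels(labels, depth)), "stixels")
```

The compactness argument in the abstract corresponds to the output here being a short list of sticks per column rather than a full per-pixel map.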
63

Creating a semantic segmentation machine learning model for sea ice detection on radar images to study the Thwaites region

Fuentes Soria, Carmen January 2022 (has links)
This thesis presents a deep learning tool able to identify ice in radar images from the sea-ice environment of the Thwaites glacier outlet. The project is motivated by the threatening situation of the Thwaites glacier, which has been increasing its mass loss rate during the last decade. This is of concern considering the large mass of ice held by the glacier, which, in case of melting, could increase the mean sea level by more than 65 cm [1]. The algorithm developed in this work is intended to help in the generation of navigation charts and the identification of icebergs in future stages of the project, outside the scope of this thesis.

The data used for this task are ICEYE's X-band radar images from the Thwaites sea-ice environment, the target area to be studied. The corresponding ground truth for each of the samples has been manually generated by identifying the ice and icebergs present in each image. Additional data processing includes tiling, to increase the number of samples, and augmentation, done by horizontal and vertical flips of a random subset of tiles.

The proposed tool performs semantic segmentation on radar images, classifying the class "Ice". It is implemented as a deep learning Convolutional Neural Network (CNN) model, trained on the prepared ICEYE radar images. The model reaches F1 scores higher than 89% on images of the target area (the Thwaites sea-ice environment) and is able to generalize to different regions of Antarctica, reaching F1 = 80%. A potential alternative version of the algorithm is proposed and discussed; this alternative scores F1 > 95% on images of the target environment and F1 = 87% on the image of the different region. However, it cannot yet be confirmed as the final algorithm, as further verification is needed.
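The tiling and flip augmentation described above is simple to sketch; the tile size and flip probability below are hypothetical choices rather than the thesis' settings:

```python
import numpy as np

def tile_and_augment(image, mask, tile=256, flip_prob=0.5, rng=None):
    """Cut an image/mask pair into tiles and randomly flip some of them.

    Illustrative sketch of the preprocessing described in the abstract:
    tiling increases the number of samples; horizontal/vertical flips
    augment a random subset of them.
    """
    if rng is None:
        rng = np.random.default_rng()
    tiles = []
    H, W = image.shape[:2]
    for r in range(0, H - tile + 1, tile):
        for c in range(0, W - tile + 1, tile):
            im = image[r:r + tile, c:c + tile].copy()
            ms = mask[r:r + tile, c:c + tile].copy()
            if rng.random() < flip_prob:
                axis = int(rng.integers(0, 2))  # 0 = vertical, 1 = horizontal
                im, ms = np.flip(im, axis=axis), np.flip(ms, axis=axis)
            tiles.append((im, ms))
    return tiles

img = np.random.rand(1024, 1024)
msk = (np.random.rand(1024, 1024) > 0.5).astype(np.uint8)
print(len(tile_and_augment(img, msk)), "tiles")
```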
64

[en] A DATA-CENTRIC APPROACH TO IMPROVING SEGMENTATION MODELS WITH DEEP LEARNING IN MAMMOGRAPHY IMAGES / [pt] UMA ABORDAGEM CENTRADA EM DADOS PARA O APRIMORAMENTO DE MODELOS DE SEGMENTAÇÃO COM APRENDIZADO PROFUNDO EM IMAGENS DE MAMOGRAFIA

SANTIAGO STIVEN VALLEJO SILVA 07 December 2023 (has links)
The semantic segmentation of anatomical structures in mammography images plays a significant role in supporting medical analysis. This task can be approached using a machine learning model, which must be capable of identifying and accurately delineating the structures of interest, such as the mammary papilla, fibroglandular tissue, pectoral muscle and fatty tissue. However, segmentation of small structures such as the papilla and the pectoral muscle is often challenging, especially the recognition or detection of the pectoral muscle in the craniocaudal (CC) view, due to its variable size, possible absence and overlap with fibroglandular tissue.

To tackle this challenge, this work proposes a data-centric approach to improve the segmentation model's performance on the mammary papilla and pectoral muscle, specifically by enhancing the training data and annotations in two stages. The first stage is based on modifications to the annotations: algorithms were developed to automatically search for uncommon annotations depending on their shape, and the annotations found were then manually reviewed and corrected. The second stage involves downsampling the dataset, reducing the image samples in the training set: cases of false positives and false negatives were analyzed to identify images that provide confusing information, which were subsequently removed from the set. Next, models were trained using the data from each stage, and classification metrics were obtained for the pectoral muscle in the CC view, along with IoU for each structure in the CC and MLO (mediolateral oblique) views.

The training results show a progressive improvement in the identification and segmentation of the pectoral muscle in the CC view and an enhancement in the mammary papilla in the MLO view, while maintaining the segmentation metrics for the other structures.
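The first-stage annotation screening lends itself to a small sketch: compute shape statistics per mask and flag statistical outliers for manual review. The descriptors and threshold below are illustrative assumptions; the abstract does not specify the actual criteria:

```python
import numpy as np

def shape_stats(mask):
    """Simple shape descriptors of a binary mask: area fraction and extent."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return np.array([0.0, 0.0])
    area = len(ys) / mask.size
    bbox_area = (np.ptp(ys) + 1) * (np.ptp(xs) + 1)
    extent = len(ys) / bbox_area  # fill ratio of the bounding box
    return np.array([area, extent])

def flag_unusual(masks, z_thresh=2.5):
    """Return indices of masks whose shape is a statistical outlier."""
    stats = np.array([shape_stats(m) for m in masks])
    z = np.abs((stats - stats.mean(0)) / (stats.std(0) + 1e-8))
    return [i for i, row in enumerate(z) if row.max() > z_thresh]

masks = [np.zeros((64, 64), np.uint8) for _ in range(20)]
for m in masks:
    m[20:40, 20:40] = 1
masks[3][:, :] = 1            # a deliberately odd annotation
print(flag_unusual(masks))    # -> [3]
```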
65

Knowledge Distillation for Semantic Segmentation and Autonomous Driving: A study on the influence of hyperparameters, initialization of a student network and the distillation method on the semantic segmentation of urban scenes.

Sanchez Nieto, Juan January 2022 (has links)
Reducing the size of a neural network while maintaining comparable performance is an important problem to be solved, since the resource constraints of small devices make it impossible to deploy large models in numerous real-life scenarios. A prominent example is autonomous driving, where computer vision tasks such as object detection and semantic segmentation need to be performed in real time by mobile devices. In this thesis, the knowledge distillation and spherical knowledge distillation techniques are utilized to train a small model (PSPNet50) under the supervision of a large model (PSPNet101) in order to perform semantic segmentation of urban scenes. The importance of the distillation hyperparameters is studied first, namely the influence of the temperature and the weights of the loss function on the performance of the distilled model, showing no decisive advantage over the individual training of the student. Thereafter, distillation is performed using a pretrained student, revealing a good improvement in performance. Contrary to expectations, the pretrained student benefits from a high learning rate when training resumes under distillation, especially in the spherical knowledge distillation case, displaying superior and more stable performance when compared to the regular knowledge distillation setting. These findings are validated by several experiments conducted using the Cityscapes dataset. The best distilled model achieves 87.287% pixel accuracy and a 42.0% mean Intersection-over-Union value (mIoU) on the validation set, higher than the 86.356% pixel accuracy and 39.6% mIoU obtained by the baseline student. On the test set, the official evaluation obtained by submission to the Cityscapes website yields 42.213% mIoU for the distilled model and 41.085% for the baseline student.
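The distillation objective being tuned here is, in its generic form, a weighted blend of hard-label cross-entropy and a temperature-softened KL term against the teacher. A minimal sketch of that standard loss (not the exact per-pixel formulation or the spherical variant used in the thesis):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.5):
    """Generic knowledge-distillation loss (logits flattened upstream).

    alpha weighs the hard-label term; T softens both distributions.
    The T**2 factor keeps gradient magnitudes comparable across temperatures.
    """
    hard = F.cross_entropy(student_logits, target)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

s = torch.randn(8, 19, requires_grad=True)  # 19 classes, as in Cityscapes
t = torch.randn(8, 19)
y = torch.randint(0, 19, (8,))
kd_loss(s, t, y).backward()
```

The temperature T and the weight alpha are precisely the hyperparameters whose influence the first part of the thesis studies.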
66

Towards a Smart Food Diary: Evaluating semantic segmentation models on a newly annotated dataset, FoodSeg103

Reibel, Yann January 2024 (has links)
Automatic food recognition is becoming a solution for diet control, as it can relieve the burden of self-assessment by offering an easy process that immediately detects the food items in a picture. The step of accurately segmenting the different image areas into the proper food categories is crucial for accurate calorie estimation. In this thesis, we use the PREVENT project as background for the task of creating a model capable of segmenting food. We carry out the research on FoodSeg103, a newly annotated dataset that offers a more realistic data setting for this study. Most work on FoodSeg103 focuses on Vision Transformer models, which are currently popular but computationally demanding. We instead chose DeepLabV3, a dilation-based semantic segmentation model, with the main objective of training it on the dataset and the additional hope of improving on the state-of-the-art results. We set up an iterative optimization process to maximize the results and attained 48.27% mIoU (also referred to as "mIoU all" in the thesis). We also observed a significant difference in average mIoU between the random search experiments and the Bayesian optimization experiments. This study has not surpassed the state-of-the-art performance but comes within about 1% of it, with BEiT v2 Large remaining in first position at 49.4% mIoU.
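For reference, the mIoU figure quoted in these benchmarks is computed from a per-class confusion matrix; below is a minimal sketch with a hypothetical helper, not FoodSeg103 tooling:

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean Intersection-over-Union from flattened label maps."""
    keep = gt != ignore_index
    pred, gt = pred[keep], gt[keep]
    # Confusion matrix via bincount on combined indices.
    cm = np.bincount(num_classes * gt + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - inter
    ious = inter[union > 0] / union[union > 0]  # skip absent classes
    return ious.mean()

gt = np.random.randint(0, 4, size=10_000)
pred = np.random.randint(0, 4, size=10_000)
print(f"mIoU = {mean_iou(pred, gt, num_classes=4):.3f}")
```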
67

Extracting Topography from Historic Topographic Maps Using GIS-Based Deep Learning

Pierce, Briar 01 May 2023 (has links) (PDF)
Historical topographic maps are valuable resources for studying past landscapes, but they are unsuitable for geospatial analysis. Cartographic map elements must be extracted and digitized for use in GIS. This can be accomplished by sophisticated image processing and pattern recognition techniques, and more recently, artificial intelligence. While these methods are generally effective, they require high levels of technical expertise. This study presents a straightforward method to digitally extract historical topographic map elements from within popular GIS software, using new and rapidly evolving toolsets. A convolutional neural network deep learning model was used to extract elevation contour lines from a 1940 United States Geological Survey (USGS) quadrangle in Sevier County, TN, ultimately producing a Digital Elevation Model (DEM). The topographically derived DEM (TOPO-DEM) is compared to a modern LiDAR-derived DEM to analyze its quality and utility. GIS-capable historians, archaeologists, geographers, and others can use this method in research and land management.
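The final step of that pipeline, turning digitized contour lines into a continuous elevation surface, can be sketched outside GIS software with simple interpolation; the sample points, grid and values below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical digitized contour vertices: (x, y) positions with elevations.
xy = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5], [2, 8]], float)
z = np.array([300.0, 320.0, 310.0, 340.0, 325.0, 315.0])

# Rasterize to a regular grid: the essence of producing a DEM from contours.
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
dem = griddata(xy, z, (gx, gy), method="linear")  # NaN outside convex hull

print(np.nanmin(dem), np.nanmax(dem))
```

Tools inside GIS software perform a more careful version of this interpolation (honouring contour topology), but the input/output relationship is the same.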
68

From interactive to semantic image segmentation

Gulshan, Varun January 2011 (has links)
This thesis investigates two well-defined problems in image segmentation, viz. interactive and semantic image segmentation. Interactive segmentation involves power-assisting a user in cutting out objects from an image, whereas semantic segmentation involves partitioning the pixels of an image into object categories. We investigate various models and energy formulations for both these problems in this thesis.

In order to improve the performance of interactive systems, low-level texture features are introduced as a replacement for the more commonly used RGB features. To quantify the improvement obtained by using these texture features, two annotated datasets of images are introduced (one consisting of natural images, the other of camouflaged objects). A significant improvement in performance is observed when using texture features for monochrome images and images containing camouflaged objects.

We also explore adding mid-level cues such as shape constraints to interactive segmentation by introducing the idea of geodesic star convexity, which extends the existing notion of a star-convexity prior in two important ways: (i) it allows for multiple star centres, as opposed to the single star of the original prior, and (ii) it generalises the shape constraint by allowing geodesic paths instead of Euclidean rays. Global minima of our energy function can be obtained subject to these new constraints. We also introduce Geodesic Forests, which exploit the structure of shortest paths in implementing the extended constraints. These extensions to star convexity allow us to use such constraints in a practical segmentation system. The system is evaluated by means of a "robot user" to measure the amount of interaction required in a precise way, and it is shown that having shape constraints reduces user effort significantly compared to existing interactive systems. We also introduce a new and harder dataset which augments the existing GrabCut dataset with more realistic images and ground truth taken from the PASCAL VOC segmentation challenge.

In the latter part of the thesis, we bring in object-category-level information in order to make interactive segmentation tasks easier, and move towards fully automated semantic segmentation. An algorithm is presented that automatically segments humans from cluttered images given their bounding boxes. A top-down segmentation of the human is obtained using classifiers trained to predict segmentation masks from local HOG descriptors. These masks are then combined with bottom-up image information in a local GrabCut-like procedure. This algorithm is later fully automated to segment humans without requiring a bounding box, and is quantitatively compared with other semantic segmentation methods. We also introduce a novel way to acquire large quantities of segmented training data relatively effortlessly using the Kinect.

In the final part of this work, we explore various semantic segmentation methods based on learning over bottom-up super-pixelisations. Different methods of combining multiple super-pixelisations are discussed and quantitatively evaluated on two segmentation datasets. We observe that simple combinations of independently trained classifiers on single super-pixelisations perform almost as well as complex methods based on jointly learning across multiple super-pixelisations. We also explore CRF-based formulations for semantic segmentation, and introduce a novel visual-words-based object boundary description in the energy formulation. The object appearance and boundary parameters are trained jointly using structured-output learning methods, and the benefit of adding pairwise terms is quantified on two different datasets.
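The "simple combinations" finding in the final paragraph can be made concrete: train one classifier per super-pixelisation, broadcast each classifier's per-segment class scores back to pixels, and average. A schematic sketch with hypothetical inputs:

```python
import numpy as np

def combine_superpixel_scores(segmentations, segment_scores):
    """Average per-pixel class scores from several superpixelisations.

    segmentations  : list of (H, W) arrays of segment ids.
    segment_scores : list of (num_segments_i, C) class-score arrays, one per
                     superpixelisation (e.g. from independently trained
                     classifiers). Returns an (H, W) label map.
    """
    acc = None
    for seg, scores in zip(segmentations, segment_scores):
        pixel_scores = scores[seg]  # broadcast segment scores to pixels
        acc = pixel_scores if acc is None else acc + pixel_scores
    return acc.argmax(-1)

H, W, C = 4, 4, 3
seg_a = np.random.randint(0, 5, (H, W))
seg_b = np.random.randint(0, 7, (H, W))
scores_a = np.random.rand(5, C)
scores_b = np.random.rand(7, C)
print(combine_superpixel_scores([seg_a, seg_b], [scores_a, scores_b]))
```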
69

Mise en correspondance robuste et détection de modèles visuels appliquées à l'analyse de façades / Robust feature correspondence and pattern detection for façade analysis

Ok, David 25 March 2013 (has links)
For a few years now, with the emergence of large image databases such as Google Street View, designing efficient, scalable, robust and accurate strategies has become a critical issue in processing very large data, which are also massively contaminated by false positives and massively ambiguous. This is of particular interest for property management and for diagnosing the health of building façades. Scientifically speaking, this issue puts into question the current state-of-the-art methods in fundamental computer vision problems. More particularly, we address the following problems: (1) robust and scalable feature correspondence and (2) façade image parsing.

First, we propose a mathematical formalization of geometric consistency, which plays a key role in robust feature correspondence. From this formalization, we derive a novel match propagation method. Our method is experimentally shown to be robust, efficient, scalable and accurate for highly contaminated and massively ambiguous sets of correspondences. Our experiments show that it performs well in deformable object matching and in the large-scale, high-accuracy matching problems arising in camera calibration. We then build a novel repetitive pattern search upon our feature correspondence method. The pattern search is shown to be effective for accurate window localization and robust to the potentially great appearance variability of repeated patterns and to occlusions; furthermore, it produces very few hallucinations.

Finally, we propose methodological contributions that exploit these repeated-pattern detection results, yielding substantially more robust and more accurate façade image parsing.
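As a crude stand-in for the geometric-consistency idea (the thesis' formalization is more general), one can filter putative matches by checking that each match's displacement agrees with that of its spatial neighbours; everything below, from function name to tolerance, is an illustrative assumption:

```python
import numpy as np

def consistent_matches(pts1, pts2, k=5, tol=20.0):
    """Filter putative correspondences by local geometric consistency.

    A match is kept if the displacement it induces agrees (within `tol`
    pixels) with the median displacement of its k nearest neighbours in
    image 1. A toy proxy for geometric consistency, not the thesis' method.
    """
    disp = pts2 - pts1
    keep = np.zeros(len(pts1), dtype=bool)
    for i in range(len(pts1)):
        d = np.linalg.norm(pts1 - pts1[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip the point itself
        med = np.median(disp[nbrs], axis=0)
        keep[i] = np.linalg.norm(disp[i] - med) < tol
    return keep

pts1 = np.random.rand(100, 2) * 500
pts2 = pts1 + np.array([30.0, 5.0])        # a coherent translation
pts2[::10] += np.random.rand(10, 2) * 200  # contaminate with outliers
print(consistent_matches(pts1, pts2).sum(), "of 100 kept")
```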
70

Segmentation sémantique d'images fortement structurées et faiblement structurées / Semantic Segmentation of Highly Structured and Weakly Structured Images

Gadde, Raghu Deep 30 June 2017 (has links)
The aim of this thesis is to develop techniques for segmenting strongly structured scenes (e.g. building images) and weakly structured scenes (e.g. natural images). Building images can naturally be expressed in terms of grammars, and inference is performed using a grammar to obtain the optimal segmentation. However, it is difficult and time-consuming to write such grammars. To alleviate this problem, a novel method is developed to automatically learn grammars from a given training set of image and ground-truth segmentation pairs. Experiments suggest that such learned grammars enable better and faster inference. Next, the effect of using grammars for strongly structured scenes is explored. To this end, a very simple technique based on Auto-Context is used to segment building images. Surprisingly, even without using any domain-specific knowledge, we observed significant improvements in performance on several benchmark datasets. Lastly, a novel technique based on convolutional neural networks is developed to segment images without any high-level structure. Image-adaptive filtering is performed within a CNN architecture to facilitate long-range connections. Experiments on several large-scale benchmarks show significant improvements in performance.
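The Auto-Context idea mentioned above is easy to sketch: each stage appends the previous stage's per-pixel class probabilities to the input features of the next classifier. A minimal two-stage version with hypothetical data (real Auto-Context also samples context around each pixel, omitted here for brevity):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def autocontext_train(features, labels, stages=2):
    """Two-stage Auto-Context sketch.

    features : (N, D) per-pixel descriptors; labels : (N,) class ids.
    Each stage's predicted probabilities become extra features for the
    next stage. Illustrative only.
    """
    models, feats = [], features
    for _ in range(stages):
        clf = RandomForestClassifier(n_estimators=50).fit(feats, labels)
        models.append(clf)
        proba = clf.predict_proba(feats)  # context for the next stage
        feats = np.hstack([features, proba])
    return models

X = np.random.rand(1000, 16)
y = np.random.randint(0, 4, 1000)
models = autocontext_train(X, y)
print(len(models), "stages trained")
```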
