  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
211

The Application of Index Based, Region Segmentation, and Deep Learning Approaches to Sensor Fusion for Vegetation Detection

Stone, David L. 01 January 2019 (has links)
This thesis investigates the application of index-based, region segmentation, and deep learning methods to the sensor fusion of omnidirectional (O-D) infrared (IR) sensors, Kinect sensors, and O-D vision sensors to increase the level of intelligent perception for unmanned robotic platforms. The goal of this work is first to provide a more robust calibration approach and improve the calibration of low-resolution, noisy IR O-D cameras, and then to explore the best approach to sensor fusion for vegetation detection. We compared index-based, region segmentation, and deep learning methods, aiming for a significant reduction in false positives while maintaining reasonable vegetation detection. The results are as follows. Direct Spherical Calibration of the IR camera provided more consistent and robust calibration board capture and gave the best overall calibration results, with sub-pixel accuracy. The best approach to sensor fusion for vegetation detection was the deep learning approach; the three methods are detailed in the following chapters, with the results summarized here. The modified Normalized Difference Vegetation Index approach achieved 86.74% recognition and 32.5% false positives, with peaks to 80%. Thermal Region Fusion (TRF) achieved a lower recognition rate of 75.16% but reduced false positives to 11.75% (a 64% reduction). Our Deep Learning Fusion Network (DeepFuseNet) showed the best results, with a significant (92%) reduction in false positives compared to our modified normalized difference vegetation index approach; recognition was 95.6% with 2% false positives. Current approaches focus primarily on O-D color vision for localization, mapping, and tracking, and do not adequately address the application of these sensors to vegetation detection. 
We demonstrate the contrast between current approaches and our deep sensor fusion (DeepFuseNet) for vegetation detection. The combination of O-D IR and O-D color vision, coupled with deep learning for the extraction of vegetation material type, has great potential for robot perception. This thesis examines two architectures: 1) autoencoder feature extractors feeding a deep convolutional neural network (CNN) fusion network (DeepFuseNet), and 2) bottleneck CNN feature extractors feeding a deep CNN fusion network (DeepFuseNet), for the fusion of O-D IR and O-D visual sensors. We show that the vegetation recognition rate and the number of false detections inherent in classical index-based spectral decomposition are greatly improved using our DeepFuseNet architecture. We first investigate the calibration of the omnidirectional IR camera for intelligent perception applications. The edge boundaries in low-resolution O-D IR images are not as sharp as those from color vision cameras; as a result, the standard calibration methods were harder to use and less accurate given the low definition of the omnidirectional IR camera. To address omnidirectional IR camera calibration more fully, we propose a new calibration-grid center-coordinate control point discovery methodology and a Direct Spherical Calibration (DSC) approach for a more robust and accurate method of calibration. DSC addresses the limitations of existing methods by using the spherical coordinates of the centroid of the calibration board to directly triangulate the location of the camera center and iteratively solve for the camera parameters. We compare DSC to three baseline visual calibration methodologies, augmenting them with additional output of the spherical results for comparison. 
We also examine the optimum number of calibration boards, using an evolutionary algorithm and Pareto optimization to find the best combination of accuracy, methodology, and number of calibration boards. The benefits of DSC are more efficient calibration board geometry selection and better accuracy than the three baseline visual calibration methodologies. In the context of vegetation detection, the fusion of omnidirectional (O-D) infrared and color vision sensors may increase the level of vegetation perception for unmanned robotic platforms. A literature search found no significant research in our area of interest: the fusion of O-D IR and O-D color vision sensors for the extraction of feature material type has not been adequately addressed. We augment index-based spectral decomposition with IR region-based spectral decomposition to address the number of false detections inherent in index-based spectral decomposition alone. Our work shows that fusing the Normalized Difference Vegetation Index (NDVI) from the O-D color camera with the thresholded IR signature region associated with the vegetation region minimizes the number of false detections seen with NDVI alone. The contribution of this work is the demonstration of two new techniques: the Thresholded Region Fusion (TRF) technique for the fusion of O-D IR and O-D color, and the fusion of the Kinect vision sensor with the O-D IR camera. Our experimental validation demonstrates a 64% reduction in false detections compared to classical index-based detection. We finally compare our DeepFuseNet results with our previous work on NDVI and IR region-based spectral fusion. This work shows that fusing the O-D IR and O-D visual streams with our DeepFuseNet deep learning approach outperforms the previous NDVI fused with far-infrared region segmentation. 
Our experimental validation demonstrates a 92% reduction in false detections compared to classical index-based detection. This work contributes a new technique for the fusion of O-D vision and O-D IR sensors using two deep CNN feature extractors feeding into a fully connected CNN network (DeepFuseNet).
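The index-based baseline above builds on the classical NDVI, which compares near-infrared and red reflectance per pixel. A minimal sketch of the classical index and thresholding is below; the thesis uses a modified variant whose exact form is not given here, and the 0.3 threshold is purely illustrative.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Classical Normalized Difference Vegetation Index, computed per pixel."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

def vegetation_mask(nir, red, threshold=0.3):
    """Binary vegetation mask; 0.3 is a common but illustrative cutoff."""
    return ndvi(nir, red) > threshold

# Toy 2x2 bands: vegetation reflects strongly in NIR relative to red.
nir = np.array([[0.8, 0.1], [0.7, 0.2]])
red = np.array([[0.1, 0.1], [0.2, 0.2]])
mask = vegetation_mask(nir, red)
```

False positives arise exactly where non-vegetation pixels happen to exceed the threshold, which is what the thermal fusion and DeepFuseNet stages are designed to suppress.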
212

Obstacle Avoidance for an Autonomous Robot Car using Deep Learning / En autonom robotbil undviker hinder med hjälp av djupinlärning

Norén, Karl January 2019 (has links)
The focus of this study was deep learning. A small autonomous robot car was used for obstacle avoidance experiments. The robot car used a camera to take images of its surroundings, and a convolutional neural network used the images for obstacle detection. The available dataset of 31,022 images was used to train the Xception model. We compared two different implementations for making the robot car avoid obstacles. Mapping image classes directly to steering commands was used as a reference implementation; the main implementation of this study separated obstacle detection and steering logic into different modules. The former reached an obstacle avoidance ratio of 80%, the latter 88%. Different hyperparameters were examined during training; we found that the number of frozen layers and the number of epochs were important to optimize. Weights were loaded from ImageNet before training, and the frozen-layers setting determined how many of those layers remained untrainable afterwards. Training all layers (no frozen layers) worked best, and it was important to train for between 10 and 25 epochs. The best model used no frozen layers, trained for 21 epochs, and reached a test accuracy of 85.2%.
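The reference implementation above maps predicted image classes straight to steering commands. A minimal sketch of that mapping is below; the class names and commands are hypothetical, since the abstract does not list them.

```python
# Hypothetical class labels and steering commands (not taken from the thesis).
STEERING = {
    "clear": "forward",
    "obstacle_left": "turn_right",
    "obstacle_right": "turn_left",
    "blocked": "stop",
}

def steer(class_probs, classes):
    """Pick the most probable image class and map it to a steering command."""
    best = classes[max(range(len(classes)), key=lambda i: class_probs[i])]
    return STEERING[best]

classes = ["clear", "obstacle_left", "obstacle_right", "blocked"]
cmd = steer([0.1, 0.7, 0.1, 0.1], classes)
```

The study's main implementation instead keeps detection and steering logic in separate modules, so the steering policy can change without retraining the classifier.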
213

Detecção de distorção arquitetural mamária em mamografia digital utilizando rede neural convolucional profunda / Detection of architectural distortion in digital mammography using deep convolutional neural network

Costa, Arthur Chaves 08 March 2019 (has links)
A proposta deste trabalho foi analisar diferentes metodologias de treinamento de uma rede neural convolucional profunda (CNN) para a detecção de distorção arquitetural mamária (DA) em imagens de mamografia digital. A DA é uma contração sutil do tecido mamário que pode representar o sinal mais precoce de um câncer de mama em formação. Os sistemas computacionais de auxílio ao diagnóstico (CAD) existentes ainda apresentam desempenho insatisfatório para a detecção da DA. Sistemas baseados em CNN têm atraído a atenção da comunidade científica, inclusive na área médica para a otimização dos sistemas CAD. No entanto, as CNNs necessitam de um grande volume de dados para serem treinadas adequadamente, o que é particularmente difícil na área médica. Dessa forma, foi realizada neste trabalho, uma comparação de diferentes abordagens de treinamento para uma arquitetura CNN avaliando-se o efeito de técnicas de geração de novas amostras (data augmentation) sobre o desempenho da rede. Para isso, foram utilizadas 240 mamografias digitais clínicas. Uma das redes (CNN-SW) foi treinada com recortes extraídos por varredura em janela sobre a área interna da mama (aprox. 21600 em média) e a outra rede (CNN-SW+) contou com o mesmo conjunto ampliado por data augmentation (aprox. 345000 em média). Para avaliar o método, foi utilizada validação cruzada por k-fold, gerando-se em rodízio, 10 modelos de cada rede. Os testes analisaram todas as ROIs extraídas da mama, sendo testados 14 mamogramas por fold, e obtendo-se uma diferença estatisticamente significativa entre os resultados (AUC de 0,81 para a CNN-SW e 0,83 para a CNN-SW+). Mapas de calor ilustraram as predições da rede, permitindo uma análise visual e quantitativa do comportamento de ambos os modelos. / The purpose of this work was to analyze different training methodologies of a deep convolutional neural network (CNN) to detect breast architectural distortion (AD) in digital mammography images. 
AD is a subtle contraction of the breast tissue that may represent the earliest sign of a breast cancer in formation. Current computer-aided detection (CAD) systems still perform unsatisfactorily on AD detection. CNN-based systems have attracted the attention of the scientific community, including in the medical field, for CAD optimization. However, CNNs require a large amount of data to be trained properly, which is particularly difficult to obtain in the medical field. Thus, in this work, different training approaches for a CNN architecture are compared, evaluating the effect of data augmentation techniques on the data set. For this, 240 clinical digital mammograms were used. One of the networks (CNN-SW) was trained with regions of interest (ROIs) extracted by a sliding window over the inner breast area (approx. 21,600 on average) and the other network (CNN-SW+) had the same set enlarged by data augmentation (approx. 345,000 on average). To evaluate the method, k-fold cross-validation was used, generating 10 instances of each model. The tests examined all the ROIs extracted from the breast (14 mammograms per fold), and the results showed a statistically significant difference between the two networks (AUC of 0.81 for CNN-SW and 0.83 for CNN-SW+). Heat maps illustrated the predictions of the networks, allowing a visual and quantitative analysis of the behavior of both models.
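The sliding-window ROI extraction and flip/rotation augmentation described above can be sketched as follows; window size, stride, and the specific augmentations are assumptions for illustration, not the thesis's actual parameters.

```python
import numpy as np

def sliding_windows(image, size, stride):
    """Yield square ROIs by scanning a window over the image."""
    h, w = image.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield image[y:y + size, x:x + size]

def augment(roi):
    """Simple flip/rotation augmentation; the thesis may use a richer set."""
    return [roi, np.fliplr(roi), np.flipud(roi), np.rot90(roi)]

# Toy 8x8 "mammogram": a 4x4 window with stride 2 yields 3x3 = 9 ROIs,
# and each ROI expands to 4 augmented samples.
img = np.arange(64, dtype=np.float32).reshape(8, 8)
rois = list(sliding_windows(img, size=4, stride=2))
aug = [a for r in rois for a in augment(r)]
```

This kind of multiplication (roughly 21,600 ROIs growing to about 345,000 samples) is what lets CNN-SW+ train on far more examples than the 240 source mammograms alone would provide.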
214

Le mouvement en action : estimation du flot optique et localisation d'actions dans les vidéos / Motion in action : optical flow estimation and action localization in videos

Weinzaepfel, Philippe 23 September 2016 (has links)
Avec la récente et importante croissance des contenus vidéos, la compréhension automatique de vidéos est devenue un problème majeur. Ce mémoire présente plusieurs contributions sur deux tâches de la compréhension automatique de vidéos : l'estimation du flot optique et la localisation d'actions humaines. L'estimation du flot optique consiste à calculer le déplacement de chaque pixel d'une vidéo et fait face à plusieurs défis tels que les grands déplacements non rigides, les occlusions et les discontinuités du mouvement. Nous proposons tout d'abord une méthode pour le calcul du flot optique, basée sur un modèle variationnel qui incorpore une nouvelle méthode d'appariement. L'algorithme d'appariement proposé repose sur une architecture corrélationnelle hiérarchique à plusieurs niveaux et gère les déformations non rigides ainsi que les textures répétitives. Il permet d'améliorer l'estimation du flot en présence de changements d'apparence significatifs et de grands déplacements. Nous présentons également une nouvelle approche pour l'estimation du flot optique basée sur une interpolation dense de correspondances clairsemées tout en respectant les contours. Cette méthode tire profit d'une distance géodésique basée sur les contours qui permet de respecter les discontinuités du mouvement et de gérer les occlusions. En outre, nous proposons une approche d'apprentissage pour détecter les discontinuités du mouvement. Les motifs de discontinuité du mouvement sont prédits au niveau d'un patch en utilisant des forêts aléatoires structurées. Nous montrons expérimentalement que notre approche surclasse la méthode basique construite sur le gradient du flot tant sur des données synthétiques que sur des vidéos réelles. Nous présentons à cet effet une base de données contenant des vidéos d'utilisateurs. La localisation d'actions humaines consiste à reconnaître les actions présentes dans une vidéo, comme `boire' ou `téléphoner', ainsi que leur étendue temporelle et spatiale. Nous proposons tout d'abord une nouvelle approche basée sur les réseaux de neurones convolutionnels profonds. La méthode passe par l'extraction de tubes dépendants de la classe à détecter, tirant parti des dernières avancées en matière de détection et de suivi. La description des tubes est enrichie par des descripteurs spatio-temporels locaux. La détection temporelle est effectuée à l'aide d'une fenêtre glissante à l'intérieur de chaque tube. Notre approche surclasse l'état de l'art sur des bases de données difficiles de localisation d'actions. Deuxièmement, nous présentons une méthode de localisation d'actions faiblement supervisée, c'est-à-dire qui ne nécessite pas l'annotation de boîtes englobantes. Des candidats de localisation d'actions sont calculés en extrayant des tubes autour des humains. Cela est fait en utilisant un détecteur d'humains robuste aux poses inhabituelles et aux occlusions, appris sur une base de données de poses humaines. Un rappel élevé est atteint avec seulement quelques tubes, permettant d'appliquer un apprentissage à plusieurs instances. En outre, nous présentons une nouvelle base de données pour la localisation d'actions humaines. Elle surmonte les limitations des bases existantes, telles la diversité et la durée des vidéos. Notre approche faiblement supervisée obtient des résultats proches des approches totalement supervisées alors qu'elle réduit significativement l'effort d'annotation requis. 
/ With the recent overwhelming growth of digital video content, automatic video understanding has become an increasingly important issue. This thesis introduces several contributions on two automatic video understanding tasks: optical flow estimation and human action localization. Optical flow estimation consists in computing the displacement of every pixel in a video and faces several challenges including large non-rigid displacements, occlusions and motion boundaries. We first introduce an optical flow approach based on a variational model that incorporates a new matching method. The proposed matching algorithm is built upon a hierarchical multi-layer correlational architecture and effectively handles non-rigid deformations and repetitive textures. It improves the flow estimation in the presence of significant appearance changes and large displacements. We also introduce a novel scheme for estimating optical flow based on a sparse-to-dense interpolation of matches while respecting edges. This method leverages an edge-aware geodesic distance tailored to respect motion boundaries and to handle occlusions. Furthermore, we propose a learning-based approach for detecting motion boundaries. Motion boundary patterns are predicted at the patch level using structured random forests. We experimentally show that our approach outperforms the flow gradient baseline on both synthetic data and real-world videos, including an introduced dataset of consumer videos. Human action localization consists in recognizing the actions that occur in a video, such as `drinking' or `phoning', as well as their temporal and spatial extent. We first propose a novel approach based on deep convolutional neural networks. The method extracts class-specific tubes leveraging recent advances in detection and tracking. Tube description is enhanced by spatio-temporal local features. Temporal detection is performed using a sliding-window scheme inside each tube. Our approach outperforms the state of the art on challenging action localization benchmarks. Second, we introduce a weakly-supervised action localization method, i.e., one that does not require bounding box annotation. Action proposals are computed by extracting tubes around the humans. This is performed using a human detector robust to unusual poses and occlusions, which is learned on a human pose benchmark. A high recall is reached with only a few human tubes, allowing us to effectively apply multiple instance learning. Furthermore, we introduce a new dataset for human action localization. It overcomes the limitations of existing benchmarks, such as the diversity and the duration of the videos. Our weakly-supervised approach obtains results close to fully-supervised ones while significantly reducing the required amount of annotations.
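The temporal sliding-window step inside each tube amounts to scoring every window of frames and keeping the best one. A minimal sketch under that assumption (per-frame scores averaged over a fixed window; the thesis's actual scoring function is not specified here):

```python
def temporal_detection(frame_scores, win):
    """Slide a temporal window along a tube's per-frame action scores and
    return (best_start_frame, best_mean_score)."""
    starts = range(len(frame_scores) - win + 1)
    best = max(starts, key=lambda s: sum(frame_scores[s:s + win]) / win)
    return best, sum(frame_scores[best:best + win]) / win

# Toy tube: the action fires around frames 2-3.
scores = [0.1, 0.2, 0.9, 0.8, 0.1]
best_start, best_score = temporal_detection(scores, win=2)
```

The detected window gives the temporal extent of the action, while the tube itself provides the spatial extent.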
215

Evaluating Deep Learning Algorithms for Steering an Autonomous Vehicle / Utvärdering av Deep Learning-algoritmer för styrning av ett självkörande fordon

Magnusson, Filip January 2018 (has links)
With self-driving cars on the horizon, vehicle autonomy and its problems are a hot topic. In this study we use convolutional neural networks to make a robot car avoid obstacles. The robot car has a monocular camera, and our approach is to use the images taken by the camera as input and then output a steering command; using this method, the car is to avoid any object in front of it. To reduce the amount of training data needed, we use models that are pretrained on ImageNet, a large image database containing millions of images. The models are then trained on our own dataset, which consists of images taken directly by the robot car while driving around, each labeled with the steering command in use when the image was taken. During training we experiment with different numbers of frozen layers; a frozen layer is a layer that has been pretrained on ImageNet but is not trained further on our dataset. The Xception, MobileNet and VGG16 architectures are tested and compared to each other. We find that fewer frozen layers produce better results, and our best model, which used the Xception architecture, achieved 81.19% accuracy on our test set. During a qualitative test the car avoided collisions 78.57% of the time.
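The frozen-layer idea above reduces to marking the first N pretrained layers untrainable before fine-tuning. A framework-agnostic toy sketch is below; in Keras this would be done by setting `layer.trainable = False` on `model.layers[:n_frozen]`, and the layer names here are placeholders.

```python
def set_frozen(layers, n_frozen):
    """Mark the first n_frozen layers untrainable (their ImageNet weights
    stay fixed); the remaining layers are fine-tuned on the new dataset."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= n_frozen
    return layers

# Toy 5-layer "network" represented as dicts instead of real Keras layers.
net = [{"name": f"conv{i}", "trainable": True} for i in range(5)]
set_frozen(net, 3)
trainable_names = [l["name"] for l in net if l["trainable"]]
```

Both this study and the related one above found that freezing nothing (n_frozen = 0) worked best, i.e., fine-tuning all ImageNet-initialized layers.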
216

Apprentissage de représentations pour la reconnaissance visuelle / Learning representations for visual recognition

Saxena, Shreyas 12 December 2016 (has links)
Dans cette dissertation, nous proposons des méthodes d'apprentissage automatique aptes à bénéficier de la récente explosion des volumes de données digitales. Premièrement nous considérons l'amélioration de l'efficacité des méthodes de récupération d'image. Nous proposons une approche d'apprentissage de métriques locales coordonnées (Coordinated Local Metric Learning, CLML) qui apprend des métriques locales de Mahalanobis, puis les intègre dans une représentation globale où la distance l2 peut être utilisée. Ceci permet de visualiser les données avec une unique représentation 2D, et l'utilisation de méthodes de récupération efficaces basées sur la distance l2. Notre approche peut être interprétée comme l'apprentissage d'une projection linéaire de descripteurs donnés par une méthode à noyaux de grande dimension définie explicitement. Cette interprétation permet d'appliquer des outils existants pour l'apprentissage de métriques de Mahalanobis à l'apprentissage de métriques locales coordonnées. Nos expériences montrent que la CLML améliore les résultats en matière de récupération de visage obtenus par les approches classiques d'apprentissage de métriques locales et globales. Deuxièmement, nous présentons une approche exploitant les modèles de réseaux neuronaux convolutionnels (CNN) pour la reconnaissance faciale dans le spectre visible. L'objectif est l'amélioration de la reconnaissance faciale hétérogène, c'est-à-dire la reconnaissance faciale à partir d'images infrarouges avec des images d'entraînement dans le spectre visible. Nous explorons différentes stratégies d'apprentissage de métriques locales à partir des couches intermédiaires d'un CNN, afin de faire le rapprochement entre des images de sources différentes. Dans nos expériences, la profondeur de la couche optimale pour une tâche donnée est positivement corrélée avec le changement entre le domaine source (données d'entraînement du CNN) et le domaine cible. 
Les résultats montrent que nous pouvons utiliser des CNN entraînés sur des images du spectre visible pour obtenir des résultats meilleurs que l'état de l'art pour la reconnaissance faciale hétérogène (images et dessins quasi-infrarouges). Troisièmement, nous présentons les "tissus de neurones convolutionnels" (Convolutional Neural Fabrics) permettant l'exploration de l'espace discret et exponentiellement large des architectures possibles de réseaux neuronaux, de manière efficiente et systématique. Au lieu de chercher à sélectionner une seule architecture optimale, nous proposons d'utiliser un "tissu" d'architectures combinant un nombre exponentiel d'architectures en une seule. Le tissu est une représentation 3D connectant les sorties de CNNs à différentes couches, échelles et canaux avec un motif de connectivité locale, homogène et creux. Les seuls hyper-paramètres du tissu (le nombre de canaux et de couches) ne sont pas critiques pour la performance. La nature acyclique du tissu nous permet d'utiliser la rétro-propagation du gradient durant la phase d'apprentissage. De manière automatique, nous pouvons donc configurer le tissu de manière à implémenter l'ensemble de toutes les architectures possibles (un nombre exponentiel) et, plus généralement, des ensembles (combinaisons) de ces modèles. La complexité de calcul et la taille mémoire du tissu évoluent de manière linéaire alors qu'il permet d'exploiter un nombre exponentiel d'architectures en parallèle, en partageant les paramètres entre architectures. Nous présentons des résultats à l'état de l'art pour la classification d'images sur les jeux de données MNIST et CIFAR10, et pour la segmentation sémantique sur le jeu de données Part Labels. / In this dissertation, we propose methods and data-driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content. First, we consider the problem of improving the efficiency of image retrieval. 
We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics and integrates them in a global representation where the l2 distance can be used. This allows for data visualization in a single view, and for the use of efficient l2-based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows the use of existing frameworks for Mahalanobis metric learning to learn local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval. Second, we present an approach to leverage the success of CNN models for visible spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks, to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show that we can use CNNs trained on visible spectrum images to obtain results that improve over the state of the art for heterogeneous face recognition with near-infrared images and sketches. Third, we present convolutional neural fabrics for exploring the discrete and exponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.
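The core trick that lets CLML fall back to efficient l2 retrieval can be shown in a few lines: a Mahalanobis metric M factors as M = LᵀL, so the Mahalanobis distance equals the plain l2 distance after projecting points through L. The sketch below demonstrates this identity on random data (it illustrates the embedding idea only, not the coordinated local-metric learning itself).

```python
import numpy as np

# A Mahalanobis metric M = L^T L becomes plain l2 after the projection x -> L x,
# which is what allows l2-based retrieval over the learned representation.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
M = A.T @ A                      # symmetric positive-definite metric
L = np.linalg.cholesky(M).T      # upper-triangular factor with M = L^T L

x, y = rng.normal(size=3), rng.normal(size=3)
d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
d_l2_projected = np.linalg.norm(L @ x - L @ y)
```

Because the two distances coincide, any l2 nearest-neighbor index built on the projected descriptors reproduces Mahalanobis-metric retrieval exactly.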
217

[en] MODELING OF GEOBODIES: AI FOR SEISMIC FAULT DETECTION AND ALL-QUADRILATERAL MESH GENERATION / [pt] MODELAGEM DE OBJETOS GEOLÓGICOS: IA PARA DETECÇÃO AUTOMÁTICA DE FALHAS E GERAÇÃO DE MALHAS DE QUADRILÁTEROS

AXELLE DANY JULIETTE POCHET 14 December 2018 (has links)
[pt] A exploração segura de reservatórios de petróleo necessita uma boa modelagem numérica dos objetos geológicos da sub superfície, que inclui entre outras etapas: interpretação sísmica e geração de malha. Esta tese apresenta um estudo nessas duas áreas. O primeiro estudo é uma contribuição para interpretação de dados sísmicos, que se baseia na detecção automática de falhas sísmicas usando redes neurais profundas. Em particular, usamos Redes Neurais Convolucionais (RNCs) diretamente sobre mapas de amplitude sísmica, com a particularidade de usar dados sintéticos para treinar a rede com o objetivo final de classificar dados reais. Num segundo estudo, propomos um novo algoritmo para geração de malhas bidimensionais de quadrilaterais para estudos geomecânicos, baseado numa abordagem inovadora do método de quadtree: definimos novos padrões de subdivisão para adaptar a malha de maneira eficiente a qualquer geometria de entrada. As malhas obtidas podem ser usadas para simulações com o Método de Elementos Finitos (MEF). / [en] Safe oil exploration requires good numerical modeling of the subsurface geobodies, which includes among other steps: seismic interpretation and mesh generation. This thesis presents a study in these two areas. The first study is a contribution to data interpretation, examining the possibilities of automatic seismic fault detection using deep learning methods. In particular, we use Convolutional Neural Networks (CNNs) on seismic amplitude maps, with the particularity to use synthetic data for training with the goal to classify real data. In the second study, we propose a new two-dimensional all-quadrilateral meshing algorithm for geomechanical domains, based on an innovative quadtree approach: we define new subdivision patterns to efficiently adapt the mesh to any input geometry. The resulting mesh is suited for Finite Element Method (FEM) simulations.
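The quadtree approach in the second study starts from recursive cell subdivision that refines only where the input geometry demands it. A basic sketch of that refinement loop is below; the thesis's contribution (new subdivision patterns that keep the final mesh all-quadrilateral and conforming) is not reproduced here, and the refinement criterion is an illustrative stand-in.

```python
def subdivide(cell, needs_refinement, min_size):
    """Recursively split a square cell (x, y, size) into four quadrants.
    This is plain quadtree refinement; the thesis adds custom subdivision
    patterns so the resulting mesh stays all-quadrilateral and conforming."""
    x, y, size = cell
    if size <= min_size or not needs_refinement(cell):
        return [cell]
    half = size / 2
    quads = [(x, y, half), (x + half, y, half),
             (x, y + half, half), (x + half, y + half, half)]
    out = []
    for q in quads:
        out.extend(subdivide(q, needs_refinement, min_size))
    return out

# Refine only cells containing the point (0.1, 0.1), starting from a unit cell.
touch = lambda c: c[0] <= 0.1 <= c[0] + c[2] and c[1] <= 0.1 <= c[1] + c[2]
cells = subdivide((0.0, 0.0, 1.0), touch, min_size=0.25)
```

The result is a graded mesh that is fine near the feature of interest and coarse elsewhere, which is what makes the approach attractive for geomechanical FEM domains.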
218

Organ Detection and Localization in Radiological Image Volumes

Linder, Tova, Jigin, Ola January 2017 (has links)
Using convolutional neural networks for classification of images and for localization and detection of objects in images is becoming increasingly popular. Within radiology a huge amount of image data is produced, and metadata describing what the images depict is currently added manually by a radiologist. To aid in streamlining physicians' workflow, this study has investigated the possibility of using convolutional neural networks (CNNs) that are pre-trained on natural images to automatically detect the presence and location of multiple organs and body parts in medical CT images. The results show promise for multiclass classification, with an average precision of 89.41% and average recall of 86.40%. This also confirms that a CNN that is pre-trained on natural images can be successfully transferred to solve a different task. It was also found that adding more data to the dataset does not necessarily increase precision and recall or decrease the error rate; it is rather the type of data and the preprocessing techniques used that matter.
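The averaged precision and recall figures above follow the standard per-class definitions. A minimal sketch with macro-averaging is below; the per-class counts are invented for illustration and are not the study's data.

```python
def precision_recall(tp, fp, fn):
    """Per-class precision and recall from true positives, false positives,
    and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_average(per_class):
    """Unweighted average of per-class (precision, recall) pairs."""
    ps = [p for p, _ in per_class]
    rs = [r for _, r in per_class]
    return sum(ps) / len(ps), sum(rs) / len(rs)

# Illustrative counts for three hypothetical organ classes.
stats = [precision_recall(90, 10, 5),
         precision_recall(80, 20, 10),
         precision_recall(70, 5, 30)]
avg_p, avg_r = macro_average(stats)
```

Whether the reported averages are macro- or micro-averaged is not stated in the abstract; the sketch shows the macro variant.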
219

Data-efficient Transfer Learning with Pre-trained Networks

Lundström, Dennis January 2017 (has links)
Deep learning has dominated the computer vision field since 2012, but a common criticism of deep learning methods is their dependence on large amounts of data, and research into data-efficient deep learning is growing in response. The foremost success in data-efficient deep learning is transfer learning with networks pre-trained on the ImageNet dataset; pre-trained networks have achieved state-of-the-art performance on many tasks. We consider the pre-trained network method for a new task where we have to collect the data, and hypothesize that the data efficiency of pre-trained networks can be improved through informed data collection. After exhaustive experiments on CaffeNet and VGG16, we conclude that the data efficiency can indeed be improved. Furthermore, we investigate an alternative approach to data-efficient learning, namely adding domain knowledge in the form of a spatial transformer to the pre-trained networks. We find that spatial transformers are difficult to train and do not appear to improve data efficiency.
220

SiameseVO-Depth: odometria visual através de redes neurais convolucionais siamesas / SiameseVO-Depth: visual odometry through siamese neural networks

Santos, Vinícius Araújo 11 October 2018 (has links)
Visual odometry is an important process in image-based robot navigation. The standard methods of this field rely on good feature matching between frames, and feature detection on images stands as a well-addressed problem within computer vision. Such techniques are subject to illumination problems, noise and poor feature localization accuracy. Thus, 3D information on a scene may mitigate the uncertainty of the features on images. Deep learning techniques show great results when dealing with common difficulties of VO such as low illumination conditions and bad feature selection. While visual odometry and deep learning have been connected previously, no techniques applying siamese convolutional networks to the depth information given by disparity maps were found as far as this work's research went. This work aims to fill this gap by applying deep learning to estimate egomotion through disparity maps in a siamese architecture. 
The SiameseVO-Depth architecture is compared to state-of-the-art VO techniques using the KITTI Vision Benchmark Suite. The results reveal that the chosen methodology succeeded in estimating visual odometry, although it does not outperform the state-of-the-art techniques. This work requires fewer steps than standard VO techniques, as it consists of an end-to-end solution, and demonstrates a new approach of deep learning applied to visual odometry. / Odometria Visual é um importante processo na navegação de robôs baseada em imagens. Os métodos clássicos deste tema dependem de boas correspondências de características feitas entre imagens, sendo que a detecção de características em imagens é um tema amplamente discutido no campo de Visão Computacional. Estas técnicas estão sujeitas a problemas de iluminação, presença de ruído e baixa acurácia de localização. Nesse contexto, a informação tridimensional de uma cena pode ser uma forma de mitigar as incertezas sobre as características em imagens. Técnicas de Deep Learning têm demonstrado bons resultados lidando com problemas comuns em técnicas de OV como insuficiente iluminação e erros na seleção de características. Ainda que já existam trabalhos que relacionam Odometria Visual e Deep Learning, não foram encontradas durante esta pesquisa técnicas que utilizem Redes Convolucionais Siamesas com sucesso utilizando informações de profundidade de mapas de disparidade. Este trabalho visa preencher esta lacuna aplicando Deep Learning na estimativa do movimento por meio de mapas de disparidade em uma arquitetura Siamesa. A arquitetura SiameseVO-Depth proposta neste trabalho é comparada a técnicas do estado da arte em OV utilizando a base de dados KITTI Vision Benchmark Suite. Os resultados demonstram que através da metodologia proposta é possível a estimativa dos valores de uma Odometria Visual ainda que o desempenho não supere técnicas consideradas estado da arte. 
O trabalho proposto possui menos etapas em comparação com técnicas clássicas de OV por apresentar-se como uma solução fim-a-fim e apresenta nova abordagem no campo de Deep Learning aplicado à Odometria Visual.
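The defining property of the siamese architecture above is that both branches share the same weights, so two inputs are embedded by an identical function and compared afterwards. A toy numpy sketch of that property is below; the single linear-plus-ReLU branch and the random "disparity features" are illustrative stand-ins for the thesis's actual network and inputs.

```python
import numpy as np

def embed(x, W):
    """One branch of a siamese network: a shared linear map plus ReLU."""
    return np.maximum(W @ x, 0.0)

def siamese_distance(x1, x2, W):
    """Both branches use the SAME W -- the defining property of a siamese net."""
    return np.linalg.norm(embed(x1, W) - embed(x2, W))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
frame_t = rng.normal(size=6)   # stand-in for disparity-map features at time t
frame_t1 = frame_t + 0.01      # a nearly identical next frame
d_small = siamese_distance(frame_t, frame_t1, W)
d_zero = siamese_distance(frame_t, frame_t, W)
```

In the SiameseVO-Depth setting, the distance (or a regression head on the joined embeddings) would be trained to reflect the egomotion between consecutive frames rather than raw similarity.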
