Global ETD Search

91	O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos / The semi-supervised learning algorithm co-training applied to label text documents Edson Takashi Matsubara 26 May 2004 In Machine Learning, the supervised approach usually requires a large number of labeled training examples to learn accurately. However, labeling is often performed manually, which makes the process costly and time-consuming. By contrast, unlabeled examples are often inexpensive and easier to obtain than labeled examples. This is especially true for text classification tasks involving online data sources, such as web pages, email and scientific papers. Text classification is of great practical importance today given the massive volume of online text available. Semi-supervised learning, a relatively new area in Machine Learning, blends supervised and unsupervised learning. It has the potential to reduce the need for expensive labeled data whenever only a small set of labeled examples is available. This work describes the semi-supervised learning algorithm co-training, which requires each example to be described by two distinct views. The two views required by co-training can easily be obtained from textual documents through pre-processing. In this work, several extensions of the co-training algorithm have been implemented. Furthermore, we have implemented a computational environment for text pre-processing, called PreTexT, in order to apply co-training to text classification problems. Experimental results using co-training on three data sets are described: two data sets are related to text classification and the third to web-page classification. The results, which range from excellent to poor, show that co-training, like other semi-supervised learning algorithms, is affected by modelling assumptions in a rather complicated way.
aprendizado de máquina aprendizado multi-visão aprendizado semi-supervisionado co-training mineração de textos pré-processamento de textos co-training machine learning multi-view learning semi-supervised learning text mining text pre-processing
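The co-training loop described in this entry uses two classifiers, one per view, each labeling unlabeled examples that then train the other. It can be illustrated with a minimal Python sketch. This is not the thesis's implementation or its PreTexT environment. The nearest-centroid base classifier, the margin used as a confidence score and the names `co_train` and `per_view` are our own assumptions; the original formulation by Blum and Mitchell uses naive Bayes classifiers.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid model: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_with_margin(model, X):
    """Predict labels; confidence is the gap between the two nearest centroids."""
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = classes[np.argmin(dist, axis=1)]
    if dist.shape[1] < 2:
        return labels, np.zeros(len(X))
    nearest_two = np.sort(dist, axis=1)[:, :2]
    return labels, nearest_two[:, 1] - nearest_two[:, 0]


def co_train(X1, X2, y, labeled, n_rounds=10, per_view=1):
    """Simplified co-training over two views X1 and X2 (same row order).

    y holds labels; only entries where `labeled` is True are trusted.
    """
    y, labeled = y.copy(), labeled.copy()
    for _ in range(n_rounds):
        for X in (X1, X2):  # labels added from one view train the other next
            unlabeled = np.flatnonzero(~labeled)
            if unlabeled.size == 0:
                return y, labeled
            model = fit_centroids(X[labeled], y[labeled])
            pred, margin = predict_with_margin(model, X[unlabeled])
            best = np.argsort(-margin)[:per_view]  # most confident examples
            y[unlabeled[best]] = pred[best]
            labeled[unlabeled[best]] = True
    return y, labeled
```

For text, Blum and Mitchell's classic pair of views is the words on a web page and the words in the hyperlinks pointing to that page.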
92	Forêt aléatoire pour l'apprentissage multi-vues basé sur la dissimilarité : Application à la Radiomique / Random forest for dissimilarity based multi-view learning : application to radiomics Cao, Hongliu 02 December 2019 The work of this thesis was initiated by a Radiomics learning problem. Radiomics is a medical discipline that aims at the large-scale analysis of data from traditional medical imaging to assist in the diagnosis and treatment of cancer. The main hypothesis of this discipline is that by extracting a large amount of information from the images, the specificities of this pathology can be characterized much better than by the human eye. To this end, Radiomics data generally comprise several types of images and/or several types of features (from images, clinical, genomic). This thesis approaches the problem from the perspective of Machine Learning (ML) and aims to propose a generic solution, applicable to any similar learning problem. We identify two types of ML problems behind Radiomics: (i) learning from high dimension, low sample size (HDLSS) data and (ii) multi-view learning. The solutions proposed in this manuscript exploit dissimilarity representations obtained using the Random Forest method. Dissimilarity representations make it possible to overcome the well-known difficulties of learning from high-dimensional data and facilitate the joint analysis of the multiple descriptions, i.e. the views. The contributions of this thesis focus on the use of the dissimilarity measure embedded in the Random Forest method for HDLSS multi-view learning.
In particular, we present three main results: (i) the demonstration and analysis of the effectiveness of this measure for HDLSS multi-view learning; (ii) a new method for measuring dissimilarities from Random Forests, better adapted to this type of learning problem; and (iii) a new way to exploit the heterogeneity of views, using a dynamic combination mechanism. These results have been obtained on radiomic data but also on classical multi-view learning problems.
Espace de dissimilarité Forêt aléatoire Apprentissage multi-vue Dimension élevée Taille réduite de l'échantillon Apprentissage de dissimilarité Sélection dynamique Dissimilarity space Random forest Multi-view learning High dimension Low sample size Dissimilarity learning Dynamic selection 006.3
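Random Forest dissimilarity is commonly derived from the forest's proximity, which is the fraction of trees in which two samples fall into the same leaf. The minimal NumPy sketch below assumes that per-view leaf indices are already available. The per-view weighted average in `combine_views` is a simple static fusion of our own choosing. It is not the new dissimilarity measure or the dynamic combination mechanism that the thesis proposes.

```python
import numpy as np


def rf_dissimilarity(leaves):
    """Random-forest dissimilarity from a leaf-index matrix.

    leaves[i, t] is the leaf reached by sample i in tree t. The proximity
    p(i, j) is the fraction of trees where i and j share a leaf, and the
    dissimilarity is 1 - p(i, j).
    """
    leaves = np.asarray(leaves)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]  # (n, n, n_trees)
    return 1.0 - same_leaf.mean(axis=2)


def combine_views(dissimilarities, weights=None):
    """Static fusion: weighted average of per-view dissimilarity matrices."""
    D = np.stack(dissimilarities)
    if weights is None:
        weights = np.full(len(D), 1.0 / len(D))
    return np.tensordot(np.asarray(weights, dtype=float), D, axes=1)
```

With scikit-learn, a fitted `RandomForestClassifier` returns such a leaf-index matrix (one column per tree) from its `apply(X)` method.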
93	Recognition of Incomplete Objects based on Synthesis of Views Using a Geometric Based Local-Global Graphs Robbeloth, Michael Christopher 31 May 2019 No description available. Computer Science incomplete objects obstruction Local-Global L-G Graph synthesis of views recognition six-sided views multi-view chain code geometric segmentation computer vision image processing graph algorithm machine learning Delaunay triangulations
94	Human Pose and Action Recognition using Negative Space Analysis Janse Van Vuuren, Michaella 12 1900 This thesis proposes a novel approach to extracting pose information from image sequences. Current state-of-the-art techniques focus exclusively on the image space occupied by the body for pose and action recognition. The method proposed here, however, focuses on the negative spaces: the areas surrounding the individual. This has resulted in the colour-coded negative space approach, an image preprocessing step that circumvents the need for complicated model fitting or template matching methods. The approach can be described as follows: negative spaces surrounding the human silhouette are extracted using horizontal and vertical scanning processes. These negative space areas are more numerous and undergo more radical changes in shape than the single area occupied by the figure of the person performing an action. The colour-coded negative space representation is formed from the four binary images produced by the scanning processes. Features are then extracted from the colour-coded images. These features are based on the percentage of area occupied by distinct coloured regions and on the bounding box proportions. Pose clusters are identified using feedback from an independent action set. Subsequent images are classified using a simple Euclidean distance measure. An image sequence is thus temporally segmented into its corresponding pose representations. Action recognition then reduces to detecting a temporally ordered sequence of poses that characterises the action. The method is purely vision-based and uses monocular images, with no need for body markers or special clothing. Two datasets were constructed using several actors performing different poses and actions, including waving their arms, sitting down and kicking a leg.
These actions were recorded against a monochrome background to simplify segmenting the actors from the background. The actions were recorded on a DV camera and digitised into a database. The silhouette images from these actions were isolated and placed in a frame, or bounding box. The next step was to highlight the negative spaces using a directional scanning method, which colour-codes the negative spaces of each action. It became immediately apparent that very distinctive colour patterns formed for different actions. To emphasise the action, different colours were allocated to the negative spaces surrounding the silhouette. For example, the space between the legs of an actor standing in a T-pose with legs apart is allocated yellow. The spaces below the arms are allocated different shades of green, and the space surrounding the head different shades of purple. When the actor raises one leg in a kicking motion, the yellow area increases. Conversely, when the actor brings his legs together, the yellow area decreases substantially. It also became apparent that these coloured negative spaces are interdependent and influence each other during the course of an action. For example, when an actor lifts one of his legs, increasing the yellow-coded negative space, the green space between that leg and the arm decreases. This interrelationship between colours holds true for all poses and actions presented in this thesis. For pose recognition, it is significant that these colour-coded negative spaces, and the way they change during an action or movement, are substantial and instantly recognisable. Compare, for example, watching someone lift an arm with seeing a vast negative space change shape. In a controlled research environment, several actors were instructed to perform a number of different actions.
After colour-coding the negative spaces, it became apparent that every action can be recognised by a unique colour-coded pattern. The challenge is to find a numerical representation that captures the essence of what is so visually apparent. The essence of pose recognition, and its measurability, lies in the relationship between the colours in these negative spaces and how they affect each other during a pose or an action. The simplest way to measure this relationship is to calculate the percentage of each colour present during an action. These percentages become the basis of pose and action recognition. Plotting them on a graph confirms that the essence of the different actions and poses can in fact be captured and recognised. The traces vary because of timing differences, personal appearance and mannerisms. Despite this, a clear, recognisable pattern emerged that can be matched to an action or to different parts of an action. Actors might lift their left leg, some slightly higher or more slowly than others, and these variations in colour percentages would be recorded as a trace. However, there would be very specific stages during the action where the traces correspond, making the action recognisable. In conclusion, using negative space as a tool for human pose recognition and tracking presents an exciting research avenue, because it is less influenced by variations such as differences in personal appearance and changes in the angle of observation.
This approach is also simple and does not rely on complicated models and templates.
Image processing pattern recognition computer vision surveillance chroma-key silhouette preprocessing color coded colour-coded horizontal scanning vertical scanning RGB colour image bounding box multi-view multi view aerobics exercise feature extraction clustering over generalisation over fitting k-means SOM self organising map automatic partitioning pose labelling pose clusters pose classification correlation sequence feedback recognising actions visualization feature plots direct able directable character database animated sequences QA75 NX QA76 TA T1 TK Q1
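The scanning, feature and classification steps described in this entry can be sketched in NumPy. The abstract does not say how the four binary images map to colours, so this sketch rests on three assumptions. First, the input is already cropped to the silhouette's bounding box. Second, each binary image marks the background pixels that a scan from one border (left, right, top or bottom) reaches before meeting the figure. Third, the four bits are combined into a code from 0 to 15 that stands in for a colour. The features (area share per code plus the bounding-box aspect ratio) and the nearest-centre Euclidean classifier follow the description. The function names and the code scheme are hypothetical.

```python
import numpy as np


def scan_masks(silhouette):
    """Four directional negative-space masks for a bool silhouette image.

    Each mask is True for background pixels that a scan line starting at
    one border reaches before it first meets the figure.
    """
    fig = np.asarray(silhouette, dtype=bool)
    seen = np.logical_or.accumulate  # running OR: True once the figure is hit
    from_left = ~seen(fig, axis=1)
    from_right = ~seen(fig[:, ::-1], axis=1)[:, ::-1]
    from_top = ~seen(fig, axis=0)
    from_bottom = ~seen(fig[::-1, :], axis=0)[::-1, :]
    return from_left, from_right, from_top, from_bottom


def colour_codes(silhouette):
    """Combine the four masks into a 4-bit code per pixel (0 = figure or enclosed)."""
    left, right, top, bottom = scan_masks(silhouette)
    return 1 * left + 2 * right + 4 * top + 8 * bottom


def features(silhouette):
    """Area share of each non-zero code plus bounding-box aspect ratio (h / w)."""
    codes = colour_codes(silhouette)
    h, w = codes.shape
    share = np.bincount(codes.ravel(), minlength=16) / codes.size
    return np.append(share[1:], h / w)


def classify(feature, centres):
    """Index of the nearest pose-cluster centre (Euclidean distance)."""
    dists = np.linalg.norm(np.asarray(centres) - feature, axis=1)
    return int(np.argmin(dists))
```

Under this scheme, a gap that the figure closes at the top but leaves open at the bottom, such as the space between an actor's legs, is reached only by the bottom scan. It therefore gets its own code (8), which corresponds to the kind of region the thesis colours yellow.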
95	Modélisation de scènes urbaines à partir de données aériennes / Urban scene modeling from airborne data Verdie, Yannick 15 October 2013 Analysis and 3D reconstruction of urban scenes from physical measurements is a fundamental problem in computer vision and geometry processing. Over the last decades, demand has grown for automatic methods that generate urban scene representations. This thesis investigates the design of pipelines for solving the complex problem of reconstructing 3D urban elements from either aerial Lidar data or Multi-View Stereo (MVS) meshes. Our approaches generate accurate and compact mesh representations enriched with urban-related semantic labeling. In urban scene reconstruction, two important steps are necessary: identifying the different elements of the scene, and representing these elements with 3D meshes. Chapter 2 presents two classification methods that yield a segmentation of the scene into semantic classes of interest. The benefit is twofold. First, it provides a better understanding of the scene. Second, different reconstruction strategies can be adopted for each type of urban element. Our idea of inserting both semantic and structural information into urban scenes is discussed and validated through experiments. In Chapter 3, a top-down approach to detect 'Vegetation' elements from Lidar data is proposed, using Marked Point Processes and a novel optimization method. In Chapter 4, bottom-up approaches are presented for reconstructing 'Building' elements from Lidar data and from MVS meshes. Experiments on complex urban structures illustrate the robustness and scalability of our systems.
Traitement de l'image Traitement de la géométrie Modélisation 3D Processus ponctuels marqués LiDAR Minimisation d'énergie Champs aléatoires de Markov Image processing Geometry processing Urban scene reconstruction Scene understanding 3D modeling Marked point processes LiDAR data Multi-view stereo data Energy minimization Markov random field
96	Modèles spectraux à transferts de flux appliqués à la prédiction de couleurs sur des surfaces imprimées en demi-ton / Flux transfer spectral models for predicting colors of duplex halftone prints Mazauric, Serge 07 December 2016 The protection of banknotes and identity documents against counterfeiting demands control tools based on visual effects that are continuously renewed, so that they remain difficult to counterfeit, even for an expert forger. This research addresses that issue. Its objective is to provide new solutions through, on the one hand, the printing of diffusing materials and, on the other hand, the development of visual rendering models. The visual effects sought are color matches between the two sides of a printed document when it is observed against the light. To obtain a color match easily, whatever the target colors, it is essential to have a model that computes the quantity of ink to deposit on the document. Such a model must predict the spectral reflectance and transmittance factors of the print by describing the optical diffusion phenomena actually present in the ink layers and in the substrate. We focus especially on translucent printed documents that have halftone colors on both sides. Our goal is to predict the visual rendering in different observation configurations. To that end, we propose a new approach based on flux transfer matrices to predict the spectral reflectance and transmittance factors of prints when they are lit simultaneously on both sides. Representing the optical behavior of the different components of a print with transfer matrices simplifies the description of flux transfers between these components. This mathematical framework leads to the construction of prediction models for halftone colors printed on diffusing materials. We also show that some existing models, such as the Kubelka-Munk and Clapper-Yule models, can be formulated in terms of transfer matrices. The models proposed in this work achieve prediction quality equivalent to, and in some cases better than, the state of the art, while simplifying the mathematical formulation and the physical description of the flux transfers. This simplification makes these models easy-to-use calculation tools, especially for determining the quantities of ink to deposit on both sides of the document in order to obtain color matching.
Impression recto-verso Rendu visuel Couleurs en demi-ton Modèle de prédiction de couleurs Transferts de flux Réflectance et transmittance spectrales Matrice de transfert Ajustement de couleurs Images multi-vues Duplex printing Visual rendering Halftone colors Colors prediction model Flux transfer Transfer matrices Color matching Multi-view imaging
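The minimal two-flux sketch below makes the transfer-matrix idea concrete. It uses symmetric layers, each characterized only by a reflectance R and a transmittance T. Writing the downward and upward fluxes above a layer in terms of those below it gives the matrix (1/T)[[1, -R], [R, T^2 - R^2]]. Multiplying the matrices of stacked layers composes them, and the product reproduces the classical two-layer formulas R = R1 + T1^2 R2 / (1 - R1 R2) and T = T1 T2 / (1 - R1 R2). This is a textbook illustration under those assumptions. It is not the thesis's halftone or duplex-illumination models, and the function names are hypothetical.

```python
import numpy as np


def layer_matrix(R, T):
    """Transfer matrix of a symmetric layer (same R and T from both sides).

    It maps the fluxes below the layer, (down, up), to the fluxes above it.
    Requires T > 0.
    """
    return np.array([[1.0, -R], [R, T * T - R * R]]) / T


def stack(*layers):
    """Overall (R, T) of layers given top to bottom as (R, T) pairs.

    Light enters from the top; no light enters from below.
    """
    M = np.eye(2)
    for R, T in layers:
        M = M @ layer_matrix(R, T)
    # [down_top, up_top] = M @ [down_bottom, 0]
    return M[1, 0] / M[0, 0], 1.0 / M[0, 0]
```

Composition only needs matrix products, which is the simplification the abstract attributes to the transfer-matrix framework.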
97	Reconstruction multi-vues et texturation Aganj, Ehsan 11 December 2009 In this thesis, we study the problems of static and dynamic reconstruction from multiple views, and of texturing, with a focus on real, practical applications. We propose three reconstruction methods for estimating a representation of a static or dynamic scene from a set of images or videos. We then consider the problem of multi-view texturing, focusing on the visual quality of the rendering.
Multi-view reconstruction dynamic reconstruction stereovision surface reconstruction point cloud Delaunay triangulation Voronoi diagram medial axis transform cell complex minimum s-t cut simulated annealing visibility thin-plate spline texturing
98	Compression des données Multi-View-plus-Depth (MVD): de l'analyse de la qualité perçue à l'élaboration d'outils pour le codage des données MVD Bosc, Emilie 22 October 2012 This thesis addresses the compression of multi-view video in the context of 3D video, with constant attention to the human perception of the medium. The studies and choices made during this thesis are guided by the search for the best possible perceived quality of the synthesized views. The challenge of this work lies in investigating new compression techniques for multi-view-plus-depth (MVD) data that limit, as much as possible, the perceptible degradations in views synthesized from the decoded data. The difficulty is that the sources of degradation in synthesized views are, on the one hand, multiple and, on the other hand, hard to measure with current image quality assessment techniques. For this reason, the work is organized around two main axes: the assessment of the quality of synthesized views and of their specific artifacts, and the study of MVD compression schemes guided by perceptual criteria. We carried out studies to characterize the artifacts related to DIBR algorithms. Student's t-tests performed on the scores from Paired Comparison and ACR-HR tests made it possible to determine how well these subjective quality assessment methods suit synthesized views. The evaluation of objective image and video quality metrics also made it possible to establish their correlation with the subjective scores. We then focused on depth-map compression and proposed two derived methods for coding depth maps, based on the LAR method.
Building on our observations, we proposed a representation and coding strategy that preserves the discontinuities of the depth map while achieving high compression ratios. Comparisons with state-of-the-art codecs (H.264/AVC, HEVC) show that our method yields images of better visual quality at low bit rates. We also studied how to allocate the bit rate between texture and depth when compressing MVD sequences. The results of this thesis can be used in four ways: to help design new quality assessment protocols for synthesized data; to design new quality metrics; to improve coding schemes for MVD data, notably through the original approaches proposed; and to optimize MVD coding schemes, based on our studies of the relationships between texture and depth.
3D video quality assessment compression multi-view MVD ACR-HR PC H.264 HEVC LAR 3DTV FTV depth map
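This entry compares subjective scores with objective metrics, and two of the computations behind such a comparison can be sketched. The first is the ACR-HR differential score, which, as we understand ITU-T P.910, is DV = V(PVS) - V(REF) + 5 per viewer. The second is the Pearson correlation between an objective metric and the resulting DMOS. The function names are hypothetical. The thesis's actual protocol, including its Student's t-tests and any observer screening, is not reproduced here.

```python
import numpy as np


def acr_hr_dmos(scores_processed, scores_reference):
    """Differential mean opinion score for one processed sequence.

    scores_processed[k] and scores_reference[k] are viewer k's ACR ratings
    (1-5) of the processed sequence and of its hidden reference.
    Per viewer: DV = V(processed) - V(reference) + 5; the DVs are averaged.
    """
    proc = np.asarray(scores_processed, dtype=float)
    ref = np.asarray(scores_reference, dtype=float)
    return float(np.mean(proc - ref + 5.0))


def metric_correlation(objective_scores, dmos_values):
    """Pearson linear correlation between an objective metric and DMOS."""
    return float(np.corrcoef(objective_scores, dmos_values)[0, 1])
```

A DV above 5 means a viewer rated the processed sequence higher than its reference; this sketch keeps such values rather than clipping them.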
99	3D Video Playback : A modular cross-platform GPU-based approach for flexible multi-view 3D video rendering Andersson, Håkan January 2010 The evolution of depth-perception visualization technologies, emerging format standardization work and research within the field of multi-view 3D video and imagery highlight the need for flexible 3D video visualization. The wide variety of available 3D display types and visualization techniques for multi-view video, together with the high throughput requirements of high-definition video, call for a real-time 3D video playback solution. Such a solution should take advantage of hardware-accelerated graphics while providing a high degree of flexibility through format configuration and cross-platform interoperability. A modular, component-based software solution is proposed. It uses FFmpeg for video demultiplexing and decoding, OpenGL and GLUT for hardware-accelerated graphics, and POSIX threads for increased CPU utilization. The solution has been verified to have sufficient throughput to display 1080p video at the native frame rate on the experimental system, a standard high-end desktop PC using only commercial hardware. To evaluate the performance of the proposed solution, several throughput evaluation metrics have been introduced. They measure the average frame rate as a function of video bit rate, video resolution and number of views. The results indicate that the GPU is the primary bottleneck in a multi-view lenticular rendering system and that multi-view rendering performance degrades as the number of views increases. This is a consequence of current GPU square-matrix texture cache architectures: when the number of views is high, texture lookups behave like random memory accesses. The proposed solution has been found to have low CPU efficiency, i.e. low CPU hardware utilization. It is recommended to increase performance by investigating the gains of scalable multithreading techniques. It is also recommended to investigate the gains of buffering video frames in video memory, or of moving more calculations to the CPU, in order to increase GPU performance.
3D Video Player Multi-view Video Lenticular Rendering Auto-stereoscopy 3D Visualization FFmpeg GPU OpenGL C. 3D Video Videospelare Visualisering Multi-vy Lentikulär Rendering Auto-stereoskopi Visualisering Systemdesign FFmpeg GPU OpenGL C PThreads GLUT Annan elektroteknik och elektronik Signal Processing Signalbehandling Computer Engineering Datorteknik
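The entry recommends investigating frame buffering and scalable multithreading. The decoupling idea can be sketched as a bounded producer/consumer queue between a decoding thread and a rendering loop. The average frame rate is then measured as frames rendered divided by elapsed wall-clock time. This is a Python illustration only. The thesis's player is written in C with POSIX threads, FFmpeg and OpenGL, and `decode`, `render` and `buffer_size` here are hypothetical placeholders for those stages.

```python
import queue
import threading
import time


def play(n_frames, buffer_size=8, decode=lambda i: i, render=lambda frame: None):
    """Decode on a worker thread, render on the caller's thread.

    Frames pass through a bounded queue, so decoding can run ahead of
    rendering by at most `buffer_size` frames. Returns the number of
    rendered frames and the average frame rate.
    """
    frames = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def decoder():
        for i in range(n_frames):
            frames.put(decode(i))  # blocks while the buffer is full
        frames.put(done)

    worker = threading.Thread(target=decoder)
    start = time.perf_counter()
    worker.start()
    rendered = 0
    while True:
        frame = frames.get()
        if frame is done:
            break
        render(frame)
        rendered += 1
    worker.join()
    elapsed = time.perf_counter() - start
    return rendered, (rendered / elapsed if elapsed > 0 else float("inf"))
```

Varying `buffer_size`, or the cost of `decode` and `render`, is one way to see which stage limits the measured frame rate.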
100	Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels Bharath Kumar Comandur Jagannathan Raghunathan (9187466) 31 July 2020 This dissertation addresses how to design a convolutional neural network (CNN) that assigns semantic labels to points on the ground. The inputs are the satellite image coverage of the area and, as ground truth, the noisy labels in OpenStreetMap (OSM). The problem is challenging for three reasons: (1) most of the images are likely to have been recorded from off-nadir viewpoints for the area of interest on the ground; (2) the user-supplied labels in OSM are frequently inaccurate and, not uncommonly, entirely missing; and (3) the area covered on the ground must be large enough to have any engineering utility. As this dissertation demonstrates, solving this problem requires first constructing a DSM (Digital Surface Model) from a stereo fusion of the available images. The DSM is then used to map the individual pixels in the satellite images to points on the ground. This creates an association between the pixels in the images and the noisy labels in OSM. The CNN-based solution presented yields a 4-8% improvement in per-class segmentation IoU (Intersection over Union) scores compared with traditional approaches that use the views independently of one another. The system is end-to-end automated. This makes it possible to compare classifiers trained directly on true orthophotos vis-à-vis classifiers first trained on the off-nadir images, whose predicted labels are subsequently translated to geographical coordinates. This work also presents, arguably for the first time, an in-depth discussion of large-area image alignment and DSM construction using tens of true multi-date and multi-view WorldView-3 satellite images on a distributed OpenStack cloud computing platform.
Photogrammetry and Remote Sensing Computer Vision Semantic segmentation Deep Learning Applications Open Street Map Data Fusion Approach Stereo Matching Algorithm 3D reconstruction, multi-view data processing satellite images Building detection Road detection Remote sensing imagery Automated methods
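The dissertation reports its results as per-class IoU, which for each class is the number of pixels labeled with that class in both the prediction and the reference, divided by the number labeled with it in either. The minimal NumPy sketch below computes it. Treating missing OSM labels as an `ignore` value is our assumption, not necessarily how the dissertation handles them.

```python
import numpy as np


def per_class_iou(pred, truth, n_classes, ignore=None):
    """IoU for each class from integer label maps of the same shape.

    Pixels whose reference label equals `ignore` (e.g. missing labels) are
    skipped. Classes absent from both maps get NaN.
    """
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    if ignore is not None:
        keep = truth != ignore
        pred, truth = pred[keep], truth[keep]
    ious = np.full(n_classes, np.nan)
    for c in range(n_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious
```

Averaging the non-NaN entries (for example with `np.nanmean`) gives a mean IoU over the classes present.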
