151

Klasifikace na množinách bodů v 3D / Classification on point sets in 3D

Střelský, Jakub January 2018 (has links)
Increasing interest in the classification of 3D geometrical data has led to the discovery of PointNet, a neural network architecture capable of processing unordered point sets. This thesis explores several methods of utilizing conventional point features within PointNet and their impact on classification. The classification performance of the presented methods was experimentally evaluated and compared with a baseline PointNet model on four different datasets. The results of the experiments suggest that some of the considered features can improve the classification effectiveness of PointNet on difficult datasets with objects that are not aligned into a canonical orientation. In particular, the well-known spin image representations can be employed successfully and reliably within PointNet. Furthermore, a feature-based alternative to the spatial transformer, the sub-network of PointNet responsible for aligning misaligned objects into a canonical orientation, has been introduced. Additional experiments demonstrate that this alternative can be competitive with the spatial transformer on challenging datasets.
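As a rough illustration of the core idea behind PointNet referenced in this abstract — a shared per-point MLP followed by a symmetric, order-invariant max-pooling aggregation, optionally fed extra hand-crafted per-point features — here is a minimal NumPy sketch. The layer sizes, random weights, and the shape of the extra features are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pointnet_forward(points, extra_features=None, num_classes=4, seed=0):
    """Toy PointNet-style forward pass.

    points:          (N, 3) array of xyz coordinates (an unordered set).
    extra_features:  optional (N, F) array of hand-crafted per-point
                     features (e.g. spin-image descriptors) concatenated
                     to the raw coordinates, as explored in the thesis.
    Returns class logits of shape (num_classes,).
    """
    x = points if extra_features is None else np.hstack([points, extra_features])
    rng = np.random.default_rng(seed)

    # Shared per-point MLP: the same weights are applied to every point,
    # so the representation does not depend on point ordering.
    dims = [x.shape[1], 64, 128]
    h = x
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(scale=0.1, size=(d_in, d_out))
        h = relu(h @ W)

    # Symmetric aggregation: max over the point dimension yields a global
    # feature vector invariant to permutations of the input points.
    global_feat = h.max(axis=0)

    # Small classification head on the global feature.
    W_out = rng.normal(scale=0.1, size=(global_feat.shape[0], num_classes))
    return global_feat @ W_out

# Example: 1024 random points with 8 extra per-point features.
pts = np.random.rand(1024, 3)
feats = np.random.rand(1024, 8)
print(pointnet_forward(pts, feats))
```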
152

Techniques d'analyse de contenu appliquées à l'imagerie spatiale / Machine learning applied to remote sensing images

Le Goff, Matthieu 20 October 2017 (has links)
Since the 1970s, remote sensing has been a powerful tool for studying the Earth, in particular thanks to satellite images produced in digital format. Compared to airborne images, satellite images provide more information, with greater spatial coverage and a short revisit period. The rise of remote sensing was followed by the development of processing technologies enabling users to analyze satellite images with the help of increasingly automatic processing chains. Since the 1970s, the various Earth observation missions have gathered an important amount of information over time. This is due in particular to the frequent revisit time for the same region, the improvement of spatial resolution, and the increase of the swath (the spatial coverage of an acquisition). Remote sensing, which was once confined to the study of a single image, has gradually turned into the analysis of long time series of multispectral images acquired at different dates. The annual flow of satellite images is expected to reach several petabytes in the near future. The availability of such a large amount of data is an asset for developing advanced processing chains. The machine learning techniques used in remote sensing have greatly improved, whereas the robustness of traditional machine learning approaches was often limited by the amount of available data; new techniques have been developed to use this large data flow effectively. However, the amount of data and the complexity of the algorithms embedded in the new processing pipelines require high computing power. In parallel, the computing power available for image processing has also increased: Graphics Processing Units (GPUs) are increasingly being used, and the use of public or private clouds is becoming more widespread. All the power required for automatic processing chains is now available at a reasonable cost, and the design of new processing chains must take this factor into account. In remote sensing, the volume of data available for exploitation has become a problem because of the computing power required for its analysis. Traditional remote sensing algorithms were designed for data that can be stored in internal memory throughout processing, a condition that is increasingly violated given the quantity of images and their resolution. These algorithms need to be revisited and adapted for large-scale data processing. This need is not specific to remote sensing; it is found in other sectors such as the web, medicine, and speech recognition, which have already solved some of these problems, and part of the techniques and technologies they developed still need to be adapted to satellite images. This thesis focuses on remote sensing algorithms for processing massive data volumes. In particular, an existing machine learning algorithm is studied and adapted for a distributed implementation, the aim being scalability, i.e. the ability to process a large quantity of data given suitable computing power. Finally, the second proposed methodology is based on recent machine learning algorithms, convolutional neural networks, and proposes a way to apply them to our use cases on satellite images.
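The scalability concern raised in this abstract — classical algorithms assuming the whole image fits in memory — can be illustrated with a simple tile-based, out-of-core-style processing loop. The tile size and the pixel-wise "classifier" below are placeholder assumptions; a real pipeline would stream tiles from a raster file and apply a trained model rather than a threshold.

```python
import numpy as np

def classify_tile(tile):
    """Placeholder per-pixel classifier: thresholds a crude NDVI-like index.
    Assumes band 0 = red, band 1 = near-infrared (illustrative only)."""
    red, nir = tile[..., 0], tile[..., 1]
    ndvi = (nir - red) / (nir + red + 1e-6)
    return (ndvi > 0.3).astype(np.uint8)

def process_in_tiles(image, tile_size=512):
    """Process a large multispectral image tile by tile, so that only one
    tile needs to be held in memory at a time."""
    h, w, _ = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(0, h, tile_size):
        for j in range(0, w, tile_size):
            tile = image[i:i + tile_size, j:j + tile_size]
            out[i:i + tile_size, j:j + tile_size] = classify_tile(tile)
    return out

# Example with a synthetic 2-band "scene"; a real workflow would read each
# tile from disk (or a distributed store) instead of slicing an array.
scene = np.random.rand(2048, 2048, 2).astype(np.float32)
mask = process_in_tiles(scene)
print(mask.mean())
```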
153

Image Reconstruction, Classification, and Tracking for Compressed Sensing Imaging and Video

January 2016 (has links)
abstract: Compressed sensing (CS) is a novel approach to collecting and analyzing data of all types. By exploiting prior knowledge of the compressibility of many naturally occurring signals, specially designed sensors can dramatically undersample the data of interest and still achieve high performance. However, the generated data are pseudorandomly mixed and must be processed before use. In this work, a model of a single-pixel compressive video camera is used to explore the problems of performing inference based on these undersampled measurements. Three broad types of inference from CS measurements are considered: recovery of video frames, target tracking, and object classification/detection. Potential applications include automated surveillance, autonomous navigation, and medical imaging and diagnosis. Recovery of CS video frames is far more complex than that of still images, which are known to be (approximately) sparse in a linear basis such as the discrete cosine transform. By combining the sparsity of individual frames with an optical flow-based model of inter-frame dependence, the perceptual quality and peak signal-to-noise ratio (PSNR) of reconstructed frames are improved. The efficacy of this approach is demonstrated for the cases of a priori known image motion and unknown but constant image-wide motion. Although video sequences can be reconstructed from CS measurements, the process is computationally costly. In autonomous systems, this reconstruction step is unnecessary if higher-level conclusions can be drawn directly from the CS data. A tracking algorithm is described and evaluated which can track target vehicles at very high levels of compression, where reconstruction of video frames fails. The algorithm performs tracking by detection using a particle filter, with the likelihood given by a maximum average correlation height (MACH) target template model. Motivated by possible improvements over the MACH filter-based likelihood estimation of the tracking algorithm, the application of deep learning models to detection and classification of compressively sensed images is explored. In tests, a Deep Boltzmann Machine trained on CS measurements outperforms a naive reconstruct-first approach. Taken together, progress in these three areas of CS inference has the potential to lower system cost and improve performance, opening up new applications of CS video cameras. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016
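The compressive measurement model and a basic sparse recovery step underlying this kind of work can be sketched as follows: a sparse signal is observed through a random sensing matrix, and an iterative soft-thresholding (ISTA) loop recovers it. The canonical-basis sparsity, step size, and threshold are illustrative choices, not the reconstruction algorithm used in the dissertation.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.05, n_iter=300):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                        # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)    # random (pseudorandom-mixing) sensing matrix
y = A @ x_true                              # compressive measurements, m << n

x_hat = ista(A, y)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```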
154

Compressive Light Field Reconstruction using Deep Learning

January 2017 (has links)
abstract: Light field imaging is limited by the heavy computational demands of dense sampling in both the spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, the iterative solutions involved are slow in processing a light field. We present a deep learning approach using a new two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high-resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes compared to the dictionary method, for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning opens the potential for real-time light field video acquisition systems in the future. / Dissertation/Thesis / Masters Thesis Computer Engineering 2017
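The coded single-shot capture described above can be sketched as a measurement model: each angular view of the 4D light field is modulated by a per-view mask and the results are summed onto a single 2D sensor image. The mask statistics and light-field dimensions below are arbitrary assumptions used only to show the shape of the inverse problem the network must solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4D light field: (angular_u, angular_v, height, width).
U, V, H, W = 5, 5, 64, 64
light_field = rng.random((U, V, H, W)).astype(np.float32)

# One random binary modulation mask per angular view (illustrative coding scheme).
masks = (rng.random((U, V, H, W)) > 0.5).astype(np.float32)

# Single coded 2D sensor image: all angular views multiplexed onto one sensor.
coded_image = (masks * light_field).sum(axis=(0, 1)) / (U * V)

# The reconstruction network in the thesis learns the inverse mapping
# coded_image (H, W)  ->  light_field (U, V, H, W); here we only report shapes.
print(coded_image.shape, "->", light_field.shape)
```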
155

Compressive Visual Question Answering

January 2017 (has links)
abstract: Compressive sensing theory makes it possible to sense and reconstruct signals/images at a lower sampling rate than the Nyquist rate. Applications in resource-constrained environments stand to benefit from this theory, opening up many possibilities for new applications at the same time. The traditional inference pipeline for computer vision first reconstructs the image from compressive measurements. However, the reconstruction process is a computationally expensive step that also provides poor results at high compression rates. There have been several successful attempts to perform inference tasks, such as activity recognition, directly on compressive measurements. In this thesis, I tackle a more challenging vision problem - visual question answering (VQA) - without reconstructing the compressive images. I investigate the feasibility of this problem with a series of experiments, evaluate the proposed methods on a VQA dataset, and discuss promising results and directions for future work. / Dissertation/Thesis / Masters Thesis Computer Engineering 2017
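The idea of skipping reconstruction and running inference directly on compressive measurements can be shown with a toy pipeline: images are reduced to random projections and a simple classifier operates on those projections alone. The nearest-centroid classifier and synthetic two-class data are stand-ins; the thesis targets the much harder VQA task with learned models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": two classes with different mean brightness patterns.
n_per_class, dim, m = 200, 28 * 28, 64          # m compressive measurements per image
class0 = rng.normal(0.2, 0.3, size=(n_per_class, dim))
class1 = rng.normal(0.6, 0.3, size=(n_per_class, dim))
X = np.vstack([class0, class1])
y = np.array([0] * n_per_class + [1] * n_per_class)

Phi = rng.normal(size=(dim, m)) / np.sqrt(m)    # random sensing matrix
Z = X @ Phi                                     # compressive measurements (no reconstruction)

# Nearest-centroid classifier fit directly in measurement space.
centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
print("training accuracy on measurements:", (pred == y).mean())
```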
156

Unconstrained Periocular Face Recognition: From Reconstructive Dictionary Learning to Generative Deep Learning and Beyond

Juefei-Xu, Felix 01 April 2018 (has links)
Many real-world face recognition tasks are performed under unconstrained conditions such as off-angle pose variations, illumination variations, facial occlusion, facial expression, etc. In this work, we focus on real-world scenarios where only the periocular region of a face is visible, such as in the ISIS case. In Part I of the dissertation, we will showcase the face recognition capability based on the periocular region, which we call periocular face recognition. We will demonstrate that face matching using the periocular region directly is more robust than using the full face in terms of age-tolerant, expression-tolerant, and pose-tolerant face recognition, and that it contains more cues for determining the gender of a subject. In this dissertation, we will study direct periocular matching more comprehensively and systematically using both shallow and deep learning methods. Based on this, in Parts II and III of the dissertation, we will continue to explore an indirect way of carrying out periocular face recognition: periocular-based full face hallucination, because we want to capitalize on the powerful commercial face matchers and deep learning-based face recognition engines which are all trained on large-scale full face images. The reproducibility and feasibility of re-training for a proprietary facial region, such as the periocular region, is relatively low, due to the non-open-source nature of commercial face matchers as well as the amount of training data and computational power required by the deep learning-based models. We will carry out periocular-based full face hallucination with two proposed reconstructive dictionary learning methods, the dimensionally weighted K-SVD (DW-KSVD) dictionary learning approach and its kernel feature space counterpart using a Fastfood kernel expansion approximation, to reconstruct high-fidelity full face images from the periocular region. We will also propose two generative deep learning approaches that build upon deep convolutional generative adversarial networks (DCGAN) to generate the full face from periocular observations: the Gang of GANs (GoGAN) method and the discriminant nonlinear many-to-one generative adversarial networks (DNMM-GAN), with applications such as generative open-set landmark-free frontalization (Golf) for faces and universal face optimization (UFO), which tackle an even broader set of problems than periocular-based full face hallucination. Throughout Parts I-III, we will study how to handle challenging real-world conditions such as unconstrained pose variations, unconstrained illumination, and unconstrained low resolution of the periocular and facial images. Together, we aim to achieve unconstrained periocular face recognition through both direct periocular face matching and indirect periocular-based full face hallucination. In the final Part IV of the dissertation, we will go beyond and explore several new deep learning methods that are statistically efficient for general-purpose image recognition. These methods include local binary convolutional neural networks (LBCNN), perturbative neural networks (PNN), and polynomial convolutional neural networks (PolyCNN).
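A minimal version of the direct periocular matching described above — cropping the eye region from an aligned face image and comparing crops with a simple similarity score — is sketched below. The crop coordinates and cosine similarity on raw pixels are placeholder assumptions; the dissertation uses far richer shallow and deep features.

```python
import numpy as np

def periocular_crop(face, top=0.25, bottom=0.50):
    """Crop a horizontal band around the eyes from an aligned face image.
    The band fractions are illustrative and assume rough eye alignment."""
    h = face.shape[0]
    return face[int(top * h):int(bottom * h), :]

def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def periocular_match_score(face_a, face_b):
    return cosine_similarity(periocular_crop(face_a), periocular_crop(face_b))

# Example with two synthetic 128x128 grayscale "faces".
rng = np.random.default_rng(0)
face1 = rng.random((128, 128))
face2 = face1 + 0.05 * rng.normal(size=(128, 128))   # slightly perturbed copy
print(periocular_match_score(face1, face2))
```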
157

A study of semantics across different representations of language

Dharmaretnam, Dhanush 28 May 2018 (has links)
Semantics is the study of meaning, and here we explore it through three major representations: brain, image, and text. Researchers have performed various studies to understand the similarities between semantic features across all three representations. Distributional Semantic (DS) models, or word vectors, trained on text corpora have been widely used to study the convergence of semantic information in the human brain. Moreover, they have been incorporated into various NLP applications such as document categorization, speech-to-text, and machine translation. Due to their widespread adoption by researchers and industry alike, it becomes imperative to test and evaluate the performance of different word vector models. In this thesis, we publish the second iteration of BrainBench: a system designed to evaluate and benchmark word vectors using brain data, now incorporating two new Italian brain datasets collected using fMRI and EEG technology. In the second half of the thesis, we explore semantics in Convolutional Neural Networks (CNNs). The CNN is a computational model that is the state-of-the-art technology for object recognition from images. However, these networks are currently considered a black box, and there is an apparent lack of understanding of why various CNN architectures perform better than others. In this thesis, we also propose a novel method to understand CNNs by studying the semantic representation through their hierarchical layers. The convergence of semantic information in these networks is studied with the help of DS models, following methodologies similar to those used to study semantics in the human brain. Our results provide substantial evidence that Convolutional Neural Networks do learn semantics from images, and that the features learned by CNNs correlate with the semantics of the object in the image. Our methodology and results could potentially pave the way for improved design and debugging of CNNs. / Graduate
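The methodology of comparing semantic spaces across representations can be illustrated with a small representational-similarity sketch: build a pairwise similarity matrix for the same set of concepts in each representation (e.g., a CNN layer's activations and distributional word vectors) and correlate the two matrices. The random features below stand in for real activations and embeddings, and the simple rank correlation ignores ties.

```python
import numpy as np

def cosine_sim_matrix(X):
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Xn @ Xn.T

def upper_triangle(M):
    i, j = np.triu_indices_from(M, k=1)
    return M[i, j]

def spearman(a, b):
    """Spearman correlation implemented as Pearson correlation of simple ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
n_concepts = 40
cnn_layer_feats = rng.normal(size=(n_concepts, 512))   # e.g. activations for 40 object images
word_vectors = rng.normal(size=(n_concepts, 300))      # e.g. DS vectors for the 40 object names

rsa_score = spearman(upper_triangle(cosine_sim_matrix(cnn_layer_feats)),
                     upper_triangle(cosine_sim_matrix(word_vectors)))
print("representational similarity:", rsa_score)
```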
158

Data-Driven Representation Learning in Multimodal Feature Fusion

January 2018 (has links)
abstract: Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core of achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction. We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Targeted at learning the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision and beyond. Utilizing the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems. In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently a special design of the learning architecture is needed. In order to improve temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss as a metric learning framework, is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described that derives node embeddings from word embedding techniques and also exploits the complementary relational information contained in each layer of the graph. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
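A bare-bones version of the kernel-combination idea behind MKL-style fusion mentioned in this abstract is sketched below: RBF kernels computed on two modalities are mixed with weights and fed to kernel ridge regression. In true MKL the weights are learned jointly with the predictor; here they are fixed by hand as an assumption to keep the sketch short, and the synthetic modalities are arbitrary.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 120
modality_a = rng.normal(size=(n, 5))          # e.g. accelerometer descriptors
modality_b = rng.normal(size=(n, 8))          # e.g. gyroscope descriptors
y = np.sin(modality_a[:, 0]) + 0.5 * modality_b[:, 1] + 0.1 * rng.normal(size=n)

# Fixed fusion weights (an MKL solver would learn these from data).
w_a, w_b = 0.6, 0.4
K = w_a * rbf_kernel(modality_a, modality_a) + w_b * rbf_kernel(modality_b, modality_b)

# Kernel ridge regression on the fused kernel: alpha = (K + lam*I)^-1 y.
lam = 1e-2
alpha = np.linalg.solve(K + lam * np.eye(n), y)
y_hat = K @ alpha
print("training MSE:", float(((y_hat - y) ** 2).mean()))
```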
159

Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition

January 2018 (has links)
abstract: Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements. First, this work presents an application of mixture of experts models for quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks on classification of distorted images, and a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as increase performance. Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model. Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with the baseline models because of the diversity of the individual model predictions. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
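The basic mixture-of-experts computation described here — expert outputs combined with weights produced by a gating network — reduces to a few lines. The linear experts and softmax gate below are deliberately tiny assumptions; the dissertation's experts are deep networks and its gates condition on distortion type or image content.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_of_experts(x, expert_weights, gate_weights):
    """x: (batch, d). Each expert is a linear map d -> k; the gate produces
    one weight per expert, and the output is the gate-weighted sum."""
    expert_outputs = np.stack([x @ W for W in expert_weights], axis=1)  # (batch, n_experts, k)
    gate = softmax(x @ gate_weights)                                    # (batch, n_experts)
    return (gate[..., None] * expert_outputs).sum(axis=1)               # (batch, k)

rng = np.random.default_rng(0)
d, k, n_experts, batch = 16, 10, 3, 4
experts = [rng.normal(scale=0.1, size=(d, k)) for _ in range(n_experts)]
gate_W = rng.normal(scale=0.1, size=(d, n_experts))

x = rng.normal(size=(batch, d))
print(mixture_of_experts(x, experts, gate_W).shape)   # -> (4, 10)
```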
160

Aprimoramento na predição de doses em casos de acidentes nucleares utilizando deep nets e GPU / Improving dose prediction for nuclear accident scenarios using deep nets and GPU

Desterro, Filipe Santana Moreira do, Instituto de Engenharia Nuclear, March 2018 (has links)
Recently, the use of mobile devices has been proposed for dose assessment during nuclear accidents. The idea is to support field teams by providing a rough estimate of the dose distribution map in the vicinity of the nuclear power plant (NPP), without the need to connect to the NPP systems. In order to provide this autonomous execution, a set of artificial neural networks (ANNs) is proposed in place of the traditional atmospheric dispersion of radionuclides (ADR) systems, which use complex physical models that require excessive processing time. One limitation observed in this approach is the very time-consuming training of the ANNs. In addition, if the number of input parameters increases, the performance of standard ANNs, such as the Multilayer Perceptron (MLP) with backpropagation training or General Regression Neural Networks (GRNN), is affected, significantly degrading the prediction. This work therefore focuses on the study of computational technologies to improve the ANNs to be used in the mobile application, as well as their training algorithms. To refine learning and allow better dose estimates, more complex ANN architectures are required. ANNs with many layers (far more than the typical number), sometimes referred to as Deep Neural Networks (DNNs), have been shown to obtain better results. On the other hand, the training of such ANNs is very slow. Thus, in order to allow the use of these DNNs within a reasonable training time, a parallel programming solution is proposed, using Graphics Processing Units (GPUs). In this context, this work used the TensorFlow framework to develop deep neural networks with 9 layers. As a result, speedups between 50 and 100 times (depending on the ANN architectures compared) were achieved in the training process, without affecting the quality of the obtained results (dose estimates).
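A deep fully connected network of the kind described (nine layers, built with TensorFlow and trained on GPU when one is available) might be set up roughly as follows. The layer widths, activation, optimizer, and the synthetic meteorological-style inputs and dose targets are assumptions for illustration; they are not the architecture or data used in the dissertation.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: a few release/weather parameters -> dose values
# at a handful of monitoring points (purely illustrative shapes).
rng = np.random.default_rng(0)
X = rng.random((5000, 12)).astype("float32")
y = rng.random((5000, 8)).astype("float32")

# Nine-layer fully connected network (8 hidden layers + output layer).
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(128, activation="relu", input_shape=(12,))]
    + [tf.keras.layers.Dense(128, activation="relu") for _ in range(7)]
    + [tf.keras.layers.Dense(8)]
)

model.compile(optimizer="adam", loss="mse")

# TensorFlow places training on a GPU automatically if one is visible,
# which is where the reported speedups over CPU training come from.
model.fit(X, y, epochs=5, batch_size=256, verbose=0)
print(model.evaluate(X, y, verbose=0))
```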
