Global ETD Search

151	Multi-person Pose Estimation in Soccer Videos with Convolutional Neural Networks Skyttner, Axel January 2018 Pose estimation is the problem of detecting the poses of people in images; multi-person pose estimation is the problem of detecting the poses of multiple people in an image. This thesis investigates multi-person pose estimation by applying the associative embedding method to images from soccer videos. Three models are compared: a pre-trained model, a fine-tuned model, and a model extended to handle image sequences. The pre-trained model performed well on soccer images, and the fine-tuned model performed better than the pre-trained model. The image sequence model performed on par with the fine-tuned model, but not better. This thesis concludes that the associative embedding model is a feasible option for pose estimation in soccer videos and should be researched further. Deep Learning Pose Estimation Sport Analysis Engineering and Technology
152	Klasifikace na množinách bodů v 3D / Classification on Point Sets in 3D Střelský, Jakub January 2018 Increasing interest in the classification of 3D geometric data has led to the development of PointNet, a neural network architecture capable of processing unordered point sets. This thesis explores several methods of utilizing conventional point features within PointNet and their impact on classification. The classification performance of the presented methods was experimentally evaluated and compared with a baseline PointNet model on four different datasets. The results of the experiments suggest that some of the considered features can improve the classification effectiveness of PointNet on difficult datasets with objects that are not aligned into a canonical orientation. In particular, the well-known spin image representations can be employed successfully and reliably within PointNet. Furthermore, a feature-based alternative to the spatial transformer, the sub-network of PointNet responsible for aligning misaligned objects into a canonical orientation, has been introduced. Additional experiments demonstrate that the alternative might be competitive with the spatial transformer on challenging datasets.
153	Techniques d'analyse de contenu appliquées à l'imagerie spatiale / Machine learning applied to remote sensing images Le Goff, Matthieu 20 October 2017 Since the 1970s, remote sensing has been a valuable tool for studying the Earth, in particular thanks to satellite images produced in digital format. Compared to airborne images, satellite images provide more information, with greater spatial coverage and a short revisit period. The rise of remote sensing was accompanied by the development of processing technologies that enable users to analyze satellite images with increasingly automated processing chains. Since the 1970s, the various Earth observation missions have gathered a large amount of information over time. This is due in particular to shorter revisit times for the same region, finer spatial resolution, and a wider swath (the spatial coverage of an acquisition). Remote sensing, which was once confined to the study of a single image, has gradually turned to the analysis of long time series of multispectral images acquired at different dates. The annual flow of satellite images is expected to reach several petabytes in the near future. The availability of such a large amount of data is an asset for developing advanced processing chains. The machine learning techniques widely used in remote sensing have greatly improved. The robustness of traditional machine learning approaches was often limited by the amount of available data. New techniques have been developed to make effective use of this large new data flow. However, the amount of data and the complexity of the algorithms in the new processing chains require high computing power. In parallel, the computing power available for image processing has also increased. 
Graphics Processing Units (GPUs) are increasingly being used, and the use of public or private clouds is becoming more widespread. All the computing power required for automatic image processing chains is now available at a reasonable cost. The design of new processing chains must take this factor into account. In remote sensing, the growing volume of data to exploit has become a problem because of the computing power required for the analysis. Traditional remote sensing algorithms were designed for data that can be held in internal memory throughout processing. This condition is met less and less often given the quantity of images and their resolution. Traditional remote sensing algorithms need to be reviewed and adapted for large-scale data processing. This need is not specific to remote sensing and is found in other sectors such as the web, medicine, and speech recognition, which have already solved some of these problems. Some of the techniques and technologies developed in these other domains still need to be adapted before they can be applied to satellite images. This thesis focuses on remote sensing algorithms for processing massive data volumes. In particular, a first, existing machine learning algorithm is studied and adapted for a distributed implementation. The aim of the implementation is scalability, i.e. the algorithm can process a large quantity of data given suitable computing power. Finally, the second proposed methodology is based on recent machine learning algorithms, convolutional neural networks, and proposes a way to apply them to our use cases on satellite images. Machine learning Deep learning Remote Sensing
154	Image Reconstruction, Classification, and Tracking for Compressed Sensing Imaging and Video January 2016 Compressed sensing (CS) is a novel approach to collecting and analyzing data of all types. By exploiting prior knowledge of the compressibility of many naturally occurring signals, specially designed sensors can dramatically undersample the data of interest and still achieve high performance. However, the generated data are pseudorandomly mixed and must be processed before use. In this work, a model of a single-pixel compressive video camera is used to explore the problems of performing inference based on these undersampled measurements. Three broad types of inference from CS measurements are considered: recovery of video frames, target tracking, and object classification/detection. Potential applications include automated surveillance, autonomous navigation, and medical imaging and diagnosis. Recovery of CS video frames is far more complex than recovery of still images, which are known to be (approximately) sparse in a linear basis such as the discrete cosine transform. By combining sparsity of individual frames with an optical flow-based model of inter-frame dependence, the perceptual quality and peak signal-to-noise ratio (PSNR) of reconstructed frames are improved. The efficacy of this approach is demonstrated for the cases of a priori known image motion and unknown but constant image-wide motion. Although video sequences can be reconstructed from CS measurements, the process is computationally costly. In autonomous systems, this reconstruction step is unnecessary if higher-level conclusions can be drawn directly from the CS data. A tracking algorithm is described and evaluated that can hold target vehicles at very high levels of compression, where reconstruction of video frames fails. 
The algorithm performs tracking by detection using a particle filter with likelihood given by a maximum average correlation height (MACH) target template model. Motivated by possible improvements over the MACH filter-based likelihood estimation of the tracking algorithm, the application of deep learning models to detection and classification of compressively sensed images is explored. In tests, a Deep Boltzmann Machine trained on CS measurements outperforms a naive reconstruct-first approach. Taken together, progress in these three areas of CS inference has the potential to lower system cost and improve performance, opening up new applications of CS video cameras. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016 Electrical engineering Compressed Sensing Computer Vision Deep Learning Image Processing
155	Compressive Light Field Reconstruction using Deep Learning January 2017 Light field imaging is limited by the computational demands of dense sampling in both the spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high-resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured with a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning opens up the potential for real-time light field video acquisition systems in the future. / Dissertation/Thesis / Masters Thesis Computer Engineering 2017 Computer engineering Electrical engineering compressive deep learning field light
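The PSNR figures quoted in the abstract above use the standard peak signal-to-noise ratio. As a rough illustration only, the following sketch computes PSNR for two flattened images; the function name `psnr` and the assumption that pixel values lie in [0, `max_value`] are illustrative choices, not details taken from the thesis.

```python
import math

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Assumes pixel values lie in [0, max_value] (an illustrative convention).
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    # Mean squared error between reference and reconstruction.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy example: a two-pixel "image" with an error of 0.1 on each pixel.
value_db = psnr([0.0, 1.0], [0.1, 0.9])
```

Higher values mean the reconstruction is closer to the reference; each additional 10 dB corresponds to a tenfold reduction in mean squared error.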
156	Compressive Visual Question Answering January 2017 Compressive sensing theory makes it possible to sense and reconstruct signals and images at a sampling rate lower than the Nyquist rate. Applications in resource-constrained environments stand to benefit from this theory, which also opens up many possibilities for new applications. The traditional inference pipeline for computer vision starts by reconstructing the image from compressive measurements. However, the reconstruction process is a computationally expensive step that also gives poor results at high compression rates. There have been several successful attempts to perform inference tasks, such as activity recognition, directly on compressive measurements. In this thesis, I tackle a more challenging vision problem, visual question answering (VQA), without reconstructing the compressive images. I investigate the feasibility of this problem with a series of experiments, evaluate the proposed methods on a VQA dataset, and discuss promising results and directions for future work. / Dissertation/Thesis / Masters Thesis Computer Engineering 2017 Computer science Mathematics compressive sensing deep learning visual question answering
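To make the idea of sub-Nyquist measurement concrete, here is a minimal sketch of taking compressive measurements y = Φx with far fewer measurements than signal samples. The random ±1 sensing matrix, the function name `compressive_measure`, and the 8% ratio are assumptions chosen for illustration; the abstract does not specify the sensing scheme used in the thesis.

```python
import random

def compressive_measure(signal, ratio, seed=0):
    """Return m = round(ratio * n) random linear measurements of `signal`.

    The sensing matrix has random +/-1 entries (a hypothetical choice);
    each measurement is the dot product of one matrix row with the signal.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(signal)
    m = max(1, round(ratio * n))
    phi = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]
    return [sum(p * x for p, x in zip(row, signal)) for row in phi]

# A sparse toy signal: 100 samples, only two of them non-zero.
signal = [0.0] * 100
signal[7], signal[42] = 1.0, -2.0
measurements = compressive_measure(signal, ratio=0.08)  # 8 numbers instead of 100
```

Inference "directly on compressive measurements", as described in the abstract, would mean feeding a vector like `measurements` to a model without first recovering `signal`.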
157	Unconstrained Periocular Face Recognition: From Reconstructive Dictionary Learning to Generative Deep Learning and Beyond Juefei-Xu, Felix 01 April 2018 Many real-world face recognition tasks take place under unconstrained conditions such as off-angle pose variations, illumination variations, facial occlusion, and facial expression. In this work, we focus on real-world scenarios where only the periocular region of a face is visible, such as in the ISIS case. In Part I of the dissertation, we showcase face recognition based on the periocular region, which we call periocular face recognition. We demonstrate that face matching using the periocular region directly is more robust than matching the full face in terms of age-tolerant, expression-tolerant, and pose-tolerant face recognition, and that the periocular region contains more cues for determining the gender of a subject. In this dissertation, we study direct periocular matching more comprehensively and systematically using both shallow and deep learning methods. Building on this, in Part II and Part III of the dissertation, we explore an indirect way of carrying out periocular face recognition, periocular-based full face hallucination, because we want to capitalize on the powerful commercial face matchers and deep learning-based face recognition engines, which are all trained on large-scale full face images. The reproducibility and feasibility of re-training for a proprietary facial region, such as the periocular region, are relatively low, due to the closed-source nature of commercial face matchers as well as the amount of training data and computation power required by deep learning-based models. 
We carry out periocular-based full face hallucination with two proposed reconstructive dictionary learning methods, the dimensionally weighted K-SVD (DW-KSVD) dictionary learning approach and its kernel feature space counterpart using Fastfood kernel expansion approximation, which reconstruct high-fidelity full face images from the periocular region. We also propose two generative deep learning approaches that build upon deep convolutional generative adversarial networks (DCGAN) to generate the full face from periocular region observations: the Gang of GANs (GoGAN) method and the discriminant nonlinear many-to-one generative adversarial networks (DNMM-GAN), for applications such as generative open-set landmark-free frontalization (Golf) for faces and universal face optimization (UFO), which tackles an even broader set of problems than periocular-based full face hallucination. Throughout Parts I-III, we study how to handle challenging real-world scenarios such as unconstrained pose variations, unconstrained illumination conditions, and unconstrained low resolution of the periocular and facial images. Together, we aim to achieve unconstrained periocular face recognition through both direct periocular face matching and indirect periocular-based full face hallucination. In the final Part IV of the dissertation, we go further and explore several new methods in deep learning that are statistically efficient for general-purpose image recognition. These methods include local binary convolutional neural networks (LBCNN), perturbative neural networks (PNN), and polynomial convolutional neural networks (PolyCNN). Biometrics Deep Learning Dictionary Learning Face Recognition Periocular Recognition
158	A study of semantics across different representations of language Dharmaretnam, Dhanush 28 May 2018 Semantics is the study of meaning, and here we explore it through three major representations: brain, image, and text. Researchers have performed various studies to understand the similarities between semantic features across all three representations. Distributional Semantic (DS) models, or word vectors, that are trained on text corpora have been widely used to study the convergence of semantic information in the human brain. Moreover, they have been incorporated into various NLP applications such as document categorization, speech to text, and machine translation. Due to their widespread adoption by researchers and industry alike, it becomes imperative to test and evaluate the performance of different word vector models. In this thesis, we publish the second iteration of BrainBench, a system designed to evaluate and benchmark word vectors using brain data, by incorporating two new Italian brain datasets collected using fMRI and EEG technology. In the second half of the thesis, we explore semantics in Convolutional Neural Networks (CNNs). The CNN is a computational model that is the state-of-the-art technology for object recognition from images. However, these networks are currently considered a black box, and there is an apparent lack of understanding of why some CNN architectures perform better than others. In this thesis, we also propose a novel method to understand CNNs by studying the semantic representation across their hierarchical layers. The convergence of semantic information in these networks is studied with the help of DS models, following methodologies similar to those used to study semantics in the human brain. Our results provide substantial evidence that Convolutional Neural Networks do learn semantics from images, and that the features learned by the CNNs correlate with the semantics of the object in the image. 
Our methodology and results could potentially pave the way for improved design and debugging of CNNs. / Graduate Computational linguistics Semantics Semantics in Brain Convolutional Neural Networks Deep learning
159	Data-Driven Representation Learning in Multimodal Feature Fusion January 2018 Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is central to achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction. We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Designed to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision and other fields. Utilizing the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems. In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently a special design of the learning architecture is needed. To improve temporal modeling for multivariate sequences, we developed two architectures centered around attention models. 
A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss as a metric learning framework, is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having lower computational complexity. Finally, to perform community detection on multilayer graphs, a fusion algorithm is described that derives node embeddings using word embedding techniques and also exploits the complementary relational information contained in each layer of the graph. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018 Computer science Deep Learning Feature Fusion Multimodal Learning Representation Learning
160	Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition January 2018 Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements. First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks on the classification of distorted images, and a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models' outputs, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as to increase performance. Second, an application of mixture of experts to predicting visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model. 
Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018 Artificial intelligence computer vision deep learning human studies visual saliency
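The weighted combination described in the abstract above (expert outputs mixed by weights from a gating network) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MixQualNet implementation: the softmax normalization of the gating scores, the function names, and all numeric values are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw gating scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(expert_outputs, gate_scores):
    """Weighted sum of expert predictions; weights come from the gating scores."""
    weights = softmax(gate_scores)
    n_outputs = len(expert_outputs[0])
    return [
        sum(w * out[k] for w, out in zip(weights, expert_outputs))
        for k in range(n_outputs)
    ]

# Hypothetical example: three experts (say, one per distortion type), two classes.
expert_outputs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
gate_scores = [2.0, 0.5, -1.0]  # stand-in for the gating network's raw outputs
combined = mixture_output(expert_outputs, gate_scores)
```

Because the weights sum to 1, mixing class-probability vectors yields another probability vector; the same combination applies per pixel when the experts output saliency maps.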
