About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Improvement and Implementation of Gumbel-Softmax VAE

Fangshi, Zhou 10 August 2022 (has links)
No description available.
2

Proactive Planning through Active Policy Inference in Stochastic Environments

Poulin, Nolan 01 May 2018 (has links)
In multi-agent Markov Decision Processes, a controllable agent must perform optimal planning in a dynamic and uncertain environment that includes another unknown and uncontrollable agent. Given a task specification for the controllable agent, its ability to complete the task can be impeded by an inaccurate model of the intent and behaviors of the other agent. In this work, we introduce an active policy inference algorithm that allows a controllable agent to infer a policy of the environmental agent through interaction. Active policy inference is data-efficient and is particularly useful when data are time-consuming or costly to obtain. The controllable agent synthesizes an exploration-exploitation policy that incorporates the knowledge learned about the environment's behavior. Whenever possible, the agent also tries to elicit behavior from the other agent to improve the accuracy of the environmental model. This is done by mapping the uncertainty in the environmental model to a bonus reward, which helps elicit the most informative exploration and allows the controllable agent to return to its main task as fast as possible. Experiments demonstrate the improved sample efficiency of active learning and the convergence of the policy for the controllable agent.
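The abstract's mapping from model uncertainty to a bonus reward can be sketched with a simple count-based scheme. This is a generic formulation for illustration, not the author's exact algorithm; the `beta` weight and the inverse-square-root uncertainty measure are assumptions:

```python
import numpy as np

def bonus_augmented_reward(base_reward, visit_counts, beta=1.0):
    # Uncertainty shrinks as observations of the other agent's responses
    # accumulate, so the bonus fades and the main task reward dominates again.
    uncertainty = 1.0 / np.sqrt(1.0 + visit_counts)
    return base_reward + beta * uncertainty

# Toy use: rewards over 4 states; state 3 has never been observed, so it
# receives the largest exploration bonus.
base_reward = np.array([1.0, 0.0, 0.5, 0.0])
visit_counts = np.array([25, 9, 4, 0])
print(bonus_augmented_reward(base_reward, visit_counts, beta=0.5))
```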
3

The Automated Prediction of Solar Flares from SDO Images Using Deep Learning

Abed, Ali K., Qahwaji, Rami S.R., Abed, A. 21 March 2021 (has links)
In the last few years, there has been growing interest in near-real-time solar data processing, especially for space weather applications. This is due to the impact of space weather on both space-borne and ground-based systems and on industry, which in turn affects our lives. In the current study, a deep learning approach is used to build an automated hybrid computer system for short-term forecasting based on the complexity level of sunspot groups in SDO/HMI Intensitygram images. The system generates forecasts of solar flare occurrences within the following 24 h. Its inputs are SDO/HMI full-disk Intensitygram and full-disk magnetogram images; its output is a daily "Flare or Non-Flare" prediction (C, M, and X classes). The system integrates an image processing pipeline that automatically detects sunspot groups on SDO/HMI Intensitygram images using active-region data extracted from SDO/HMI magnetogram images (presented by Colak and Qahwaji, 2008) with deep learning to generate the forecasts. The deep learning component analyzes each sunspot group on the solar disk to predict whether it is capable of releasing a significant flare. The system introduced in this work is called ASAP_Deep. Its deep learning model combines a Convolutional Neural Network (CNN) with a Softmax classifier to extract features from the sunspot-group images detected in the SDO/HMI (Intensitygram and magnetogram) images. Furthermore, a CNN training scheme combining the back-propagation algorithm with mini-batch AdaGrad optimization is used for weight updates and learning-rate adaptation, respectively. The images of the sunspot regions are cropped automatically by the imaging system and processed with the trained network to provide near-real-time predictions. The major results of this study are as follows. First, the ASAP_Deep system builds on the ASAP system introduced in Colak and Qahwaji (2009), upgrading it with a deep learning-based prediction capability. Second, we successfully apply a CNN to the sunspot-group images without any manual pre-processing or feature extraction. Third, the system's results are considerably better, especially for the false alarm ratio (FAR), which reduces the losses incurred by the protective measures companies take in response to false alarms. The proposed system also achieves relatively high scores for True Skill Statistics (TSS) and the Heidke Skill Score (HSS).
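As a rough illustration of the training recipe described (CNN features feeding a softmax classifier, optimized by back-propagation with mini-batch AdaGrad), here is a minimal PyTorch sketch. The layer sizes, crop resolution, and learning rate are assumptions; this is not the ASAP_Deep architecture:

```python
import torch
import torch.nn as nn

class FlareCNN(nn.Module):
    def __init__(self, n_classes=2):  # "Flare" vs "Non-Flare"
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 13 * 13, n_classes)  # for 64x64 crops

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)  # CrossEntropyLoss applies log-softmax

model = FlareCNN()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)  # per-parameter rates
loss_fn = nn.CrossEntropyLoss()

crops = torch.randn(8, 1, 64, 64)    # stand-in for detected sunspot-group crops
labels = torch.randint(0, 2, (8,))   # 1 = flaring within 24 h
optimizer.zero_grad()
loss = loss_fn(model(crops), labels)
loss.backward()
optimizer.step()
```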
4

On the Softmax Bottleneck of Word-Level Recurrent Language Models

Parthiban, Dwarak Govind 06 November 2020 (has links)
For different input contexts (sequences of previous words), a neural word-level language model predicts the next word by outputting a probability distribution over all words in the vocabulary using a softmax function. When the log-probability outputs for all such contexts are stacked together, the result is a log-probability matrix, denoted Q_theta, where theta denotes the model parameters. When language modeling is formulated as a matrix factorization problem, the matrix to be factorized, Q_theta, is expected to be high-rank, since natural language is highly context-dependent. Existing softmax-based word-level language models, however, cannot produce such matrices; this limitation is known as the softmax bottleneck. Several works have attempted to overcome the softmax bottleneck, for example with models that can produce a high-rank Q_theta. While reproducing the results of these works, we observed that the rank of Q_theta does not always correlate positively with better performance (i.e., lower test perplexity). This puzzling observation prompted a systematic investigation of the influence of the rank of Q_theta on language model performance. We first introduce a new family of activation functions called the Generalized SigSoftmax (GSS). By controlling the parameters of GSS, we construct language models that produce Q_theta with diverse ranks (low, medium, and high). For models using GSS with different parameters, we observe that rank does not have a strong positive correlation with perplexity on the test data, reinforcing our initial observation. Inspecting the top-5 predictions made by different models for a selected set of input contexts, we find that a high-rank Q_theta does not guarantee strong qualitative performance. We then conduct experiments to check whether models that produce a high-rank Q_theta offer any additional benefits, and show that Q_theta instead suffers from fast singular value decay. We also propose an alternative metric for the rank of a matrix, the epsilon-effective rank, which approximately quantifies the singular value distribution as different values of epsilon are used. We conclude by showing that it is regularization that played the positive role in the performance of these high-rank models relative to the chosen baselines, and that no model yet truly gains improved expressiveness just by breaking the softmax bottleneck.
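The epsilon-effective rank idea can be sketched directly from a singular value decomposition. The relative-threshold definition below is an assumption made for illustration and may differ from the thesis's exact formulation:

```python
import numpy as np

def epsilon_effective_rank(Q, eps=1e-3):
    # Count singular values above eps times the largest one; this tracks how
    # quickly the singular values decay rather than the exact numerical rank.
    s = np.linalg.svd(Q, compute_uv=False)  # sorted in descending order
    return int(np.sum(s > eps * s[0]))

rng = np.random.default_rng(0)

# A random Gaussian matrix: singular values are all of the same order.
Q_flat = rng.standard_normal((200, 100))

# A matrix with fast singular-value decay, mimicking the phenomenon the
# thesis reports for the log-probability matrix Q_theta.
U, _ = np.linalg.qr(rng.standard_normal((200, 100)))
V, _ = np.linalg.qr(rng.standard_normal((100, 100)))
Q_decay = U @ np.diag(np.exp(-np.arange(100) / 5.0)) @ V.T

print(epsilon_effective_rank(Q_flat))   # close to 100 (effectively full rank)
print(epsilon_effective_rank(Q_decay))  # far smaller despite the same shape
```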
5

A multi-biometric iris recognition system based on a deep learning approach

Al-Waisy, Alaa S., Qahwaji, Rami S.R., Ipson, Stanley S., Al-Fahdawi, Shumoos, Nagem, Tarek A.M. 24 October 2017 (has links)
Multimodal biometric systems have been widely applied in many real-world applications due to their ability to deal with a number of significant limitations of unimodal biometric systems, including sensitivity to noise, population coverage, intra-class variability, non-universality, and vulnerability to spoofing. In this paper, an efficient real-time multimodal biometric system is proposed based on building deep learning representations for images of both the right and left irises of a person, and fusing the results with a ranking-level fusion method. The trained deep learning system, called IrisConvNet, combines a Convolutional Neural Network (CNN) with a Softmax classifier to extract discriminative features from the input image, the localized iris region, without any domain knowledge, and then classifies it into one of N classes. In this work, a discriminative CNN training scheme based on a combination of the back-propagation algorithm and the mini-batch AdaGrad optimization method is proposed for weight updates and learning-rate adaptation, respectively. In addition, other training strategies (e.g., the dropout method and data augmentation) are employed to evaluate different CNN architectures. The performance of the proposed system is tested on three public datasets collected under different conditions: the SDUMLA-HMT, CASIA-Iris-V3 Interval, and IITD iris databases. The proposed system outperforms other state-of-the-art approaches (e.g., Wavelet transform, Scattering transform, Local Binary Patterns, and PCA), achieving a Rank-1 identification rate of 100% on all the employed databases and a recognition time of less than one second per person.
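A ranking-level fusion of the two iris modalities can be illustrated with a generic Borda-style sketch; the paper's specific fusion rule is not reproduced, and the rank-sum combination here is an assumption:

```python
import numpy as np

def rank_level_fusion(scores_left, scores_right):
    # scores_left[i], scores_right[i]: match scores of the probe against
    # enrolled identity i, produced by the left- and right-iris CNNs.
    # Identities are re-ranked by the sum of their per-modality ranks.
    rank_l = np.argsort(np.argsort(-scores_left))   # 0 = best match
    rank_r = np.argsort(np.argsort(-scores_right))
    fused = rank_l + rank_r
    return int(np.argmin(fused))                    # Rank-1 identity

scores_left = np.array([0.1, 0.7, 0.2])
scores_right = np.array([0.2, 0.6, 0.5])
print(rank_level_fusion(scores_left, scores_right))  # identity 1
```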
6

Robot semantic place recognition based on deep belief networks and a direct use of tiny images

Hasasneh, Ahmad 23 November 2012 (has links)
Usually, human beings are able to quickly distinguish between different places solely from their visual appearance. This is because they can organize their space as composed of discrete units. These units, called "semantic places", are characterized by their spatial extent and their functional unity. Such semantic categories can thus be used as contextual information that fosters object detection and recognition. Recent works in semantic place recognition seek to endow robots with similar capabilities. Contrary to classical localization and mapping work, this problem is usually addressed as a supervised learning problem. Semantic place recognition, the ability to recognize the semantic category of the place to which a scene belongs, is therefore a major requirement for the future of autonomous robotics. An autonomous service robot must be able to recognize the environment in which it lives and to easily learn the organization of this environment in order to operate and interact successfully. To achieve that goal, different methods have already been proposed: some based on the identification of objects as a prerequisite to the recognition of scenes, and some based on a direct description of scene characteristics. If we make the hypothesis that objects are more easily recognized when the scene in which they appear is identified, the second approach seems more suitable. It is, however, strongly dependent on the nature of the image descriptors used, which are usually derived empirically from general considerations on image coding. In contrast to these proposals, another approach to image coding, based on a more theoretical point of view, has emerged in the last few years. Energy-based models of feature extraction, founded on the principle of minimizing an energy function tied to the quality of the image reconstruction, have led to Restricted Boltzmann Machines (RBMs), which can code an image as the superposition of a limited number of features taken from a larger alphabet. It has also been shown that this process can be repeated in a deep architecture, leading to a sparse and efficient representation of the initial data in the feature space. A complex classification problem in the input space is thus transformed into an easier one in the feature space. This approach has been successfully applied to the identification of tiny images from MIT's 80 million images database. In the present work, we demonstrate that semantic place recognition can be achieved using tiny images instead of conventional Bag-of-Words (BoW) methods, with Deep Belief Networks (DBNs) used for image coding. We show that, after appropriate coding, a softmax regression in the projection space is sufficient to achieve promising classification results. To our knowledge, this approach has not yet been investigated for scene recognition in autonomous robotics. We compare our methods with state-of-the-art algorithms on a standard robot localization database, study the influence of the system parameters, and compare different conditions on the same dataset. These experiments show that our proposed model, while being very simple, leads to state-of-the-art results on a semantic place recognition task.
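The final classification stage, a softmax regression applied in the feature space, can be sketched as follows. The DBN feature extraction itself is not shown, and the feature dimension, learning rate, and number of place categories are placeholders:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_regression_step(W, b, H, y, lr=0.1):
    # One gradient step of softmax regression: H holds feature codes of tiny
    # images (one row per image), y holds semantic place labels.
    n, k = H.shape[0], W.shape[1]
    P = softmax(H @ W + b)            # class probabilities
    Y = np.eye(k)[y]                  # one-hot labels
    grad_W = H.T @ (P - Y) / n        # cross-entropy gradient
    grad_b = (P - Y).mean(axis=0)
    return W - lr * grad_W, b - lr * grad_b

rng = np.random.default_rng(0)
H = rng.standard_normal((32, 64))     # stand-in for DBN feature codes
y = rng.integers(0, 5, 32)            # 5 semantic place categories
W, b = np.zeros((64, 5)), np.zeros(5)
W, b = softmax_regression_step(W, b, H, y)
```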
7

Human Activity Recognition Using Smartphone

Novák, Andrej January 2016 (has links)
The number of mobile smartphones continues to grow, and with it the demand for automation and for making full use of the phone's capabilities, whether in medicine (health care and monitoring) or in user applications (automatic position recognition, etc.). This work presents the design and implementation of a system for recognizing human activity from smartphone sensor data, along with the determination of optimal parameters, the achieved recognition rates, and a comparison of the individual evaluations. Further contributions include a proposed format for, and a visualization of, a large training set consisting of real recordings and their manual annotation. In addition, a software tool was created for validating elements of the training set and extracting features from it, together with software that uses deep learning to train models and then test them.
8

Regularizing Vision-Transformers Using Gumbel-Softmax Distributions on Echocardiography Data

Nilsson, Alfred January 2023 (has links)
This thesis introduces a novel approach to model regularization in Vision Transformers (ViTs), a category of deep learning models. It employs stochastic embedded feature selection in the context of echocardiography video analysis, focusing specifically on the EchoNet-Dynamic dataset. The proposed method, termed the Gumbel Vision-Transformer (G-ViT), combines ViTs and Concrete Autoencoders (CAE) to enhance the generalization of models predicting left ventricular ejection fraction (LVEF). The model comprises a ViT frame encoder for spatial representation and a transformer sequence model for temporal aspects, forming a Video ViT (V-ViT) architecture that, when used without feature selection, serves as a baseline for LVEF prediction performance. The key contribution lies in the incorporation of stochastic image-patch selection in video frames during training. The CAE method is adapted for this purpose, achieving approximately discrete patch selections by sampling from the Gumbel-Softmax distribution, a relaxation of the categorical distribution. The experiments conducted on EchoNet-Dynamic demonstrate a consistent and notable regularization effect. The G-ViT model, trained with learned feature selection, achieves a test R² of 0.66, outperforming random-masking baselines and the full-input V-ViT counterpart (R² of 0.63), and shows improved generalization on multiple evaluation metrics. The G-ViT is also compared against recent related work applying ViTs to EchoNet-Dynamic, notably outperforming UltraSwin, an application of Swin-transformers that achieved an R² of 0.59. Moreover, the thesis explores model explainability by visualizing the selected patches, providing insight into how the G-ViT uses regions known to be crucial when humans assess LVEF. The approach thus extends beyond regularization, offering a unique explainability tool for ViTs. Efficiency is also considered: trained with a reduced number of input tokens, the G-ViT yields comparable or superior results while significantly reducing GPU memory use and floating-point operations, an improvement with potential for reducing energy consumption during training.
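The Gumbel-Softmax relaxation at the heart of the G-ViT's patch selection is standard and can be sketched in a few lines. The logits shape and temperature below are illustrative assumptions, and the surrounding G-ViT wiring is not reproduced:

```python
import torch

def gumbel_softmax_sample(logits, tau=1.0):
    # Add Gumbel noise to the logits, then apply a temperature-scaled softmax.
    # Lower temperatures tau make samples closer to discrete one-hot picks.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel) / tau, dim=-1)

# Toy use: softly "select" one of 196 image patches per selector unit.
logits = torch.randn(32, 196)      # learnable selection logits (assumed shape)
weights = gumbel_softmax_sample(logits, tau=0.5)
print(weights.sum(dim=-1))         # each row sums to 1
```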
9

Distributed conditional computation

Léonard, Nicholas 08 1900 (has links)
The objective of this thesis is to present different applications of the distributed conditional computation research program. It is hoped that these applications, together with the theory presented here, will lead to a general solution of the problem of artificial intelligence, especially with regard to the need for efficiency. The vision of distributed conditional computation is to accelerate the evaluation and training of deep models, which is very different from the usual objective of improving their generalization and optimization capacity. The work presented here has close ties with mixture-of-experts models. In Chapter 2, we present a new deep learning algorithm that uses a form of reinforcement learning on a novel neural-network decision-tree model. We demonstrate the need for a balancing constraint to keep the distribution of examples to experts uniform and to prevent monopolies. To make the computation efficient, training and evaluation are constrained to be sparse by using a gater that samples experts from a multinomial distribution given an example. In Chapter 3, we present a new deep model consisting of a sparse representation divided into segments of experts. A neural-network language model is constructed from blocks of sparse transformations between these expert segments. The block-sparse operation is implemented for use on graphics cards. Its speed is compared with two dense operations of the same caliber to demonstrate and measure the actual efficiency gain that can be obtained. A deep model using these block-sparse operations controlled by a distinct gater is trained on a dataset of one billion words. A new data-partitioning (clustering) algorithm is applied to a set of words to organize the output layer of a language model into a conditional hierarchy, thereby making it much more efficient. The work presented in this thesis is central to the vision of distributed conditional computation put forward by Yoshua Bengio. It attempts to apply research in the area of mixture of experts to deep models to improve their speed and optimization capacity. We believe that the theory and experiments of this thesis are an important step on the path to distributed conditional computation, because they provide a good framework for the problem, especially concerning the competitiveness inherent to systems of experts.
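The Chapter 2 idea of a gater sampling experts from a multinomial distribution given an example can be sketched as a minimal sparse mixture-of-experts layer. The dimensions are placeholders, and the balancing constraint the abstract describes is omitted:

```python
import torch
import torch.nn as nn

class SampledMixtureOfExperts(nn.Module):
    # A gater produces a multinomial distribution over experts for each
    # example; one expert is sampled per example, so only that expert's
    # computation is performed (the source of the sparsity/speed gain).
    def __init__(self, d_in=16, d_out=8, n_experts=4):
        super().__init__()
        self.gater = nn.Linear(d_in, n_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out) for _ in range(n_experts)])

    def forward(self, x):
        probs = torch.softmax(self.gater(x), dim=-1)          # per-example gate
        idx = torch.multinomial(probs, num_samples=1).squeeze(1)
        out = torch.stack(
            [self.experts[i](xi) for i, xi in zip(idx.tolist(), x)])
        return out, idx

moe = SampledMixtureOfExperts()
y, chosen = moe(torch.randn(5, 16))
print(chosen)                                                 # expert id per example
```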
