Global ETD Search

1	Improve the Convergence Speed and Stability of Generative Adversarial Networks Zou, Xiaozhou 26 April 2018 In this thesis, we address two major problems in Generative Adversarial Networks (GANs), an important sub-field of deep learning. The first is the instability of the training process, which arises in many real-world problems; the second is the lack of a good evaluation metric for the performance of GAN algorithms. To understand and address the first problem, we develop three approaches: we introduce randomness into the training process, we investigate various normalization methods, and, most importantly, we develop a better parameter initialization strategy to help stabilize training. In the randomness part of the thesis, we develop two techniques: the addition of gradient noise and the batch-wise random flipping of the outputs of the discriminator section of a GAN. In the normalization part, we compare the performance of the z-score transform, min-max normalization, affine transformations and batch normalization. In the most novel and important part of the thesis, we develop techniques to initialize the GAN generator section with parameters that produce a uniform distribution over the range of the training data. As far as we are aware, this seemingly simple idea has not yet appeared in the extant literature, and the empirical results we obtain on 2-dimensional synthetic data show marked improvement. As to better evaluation metrics, we demonstrate a simple yet effective way to evaluate the generator using a novel "overlap loss". Keywords: batch normalization; overlap loss; randomness; step-up GAN
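Two of the normalization methods the abstract compares, the z-score transform and min-max normalization, can be sketched as follows. This is a minimal illustration in plain Python using the standard definitions; the function names `z_score` and `min_max`, the sample data, and the handling of constant inputs are assumptions made for this example and are not taken from the thesis.

```python
from statistics import mean, pstdev


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption for this sketch: a constant input maps to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Rescale linearly so the smallest value maps to `low`, the largest to `high`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption for this sketch: a constant input maps to `low`.
        return [low for _ in values]
    scale = (high - low) / (hi - lo)
    return [low + (v - lo) * scale for v in values]


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical 1-D training data
z = z_score(data)
m = min_max(data)
```

The z-score output has no fixed range, while the min-max output is bounded by construction, which is one reason the two can behave differently as preprocessing for a generator that should cover the range of the training data.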
3	The Effect of Batch Normalization on Deep Convolutional Neural Networks / Effekten av batch normalization på djupt faltningsneuronnät Schilling, Fabian January 2016 Batch normalization is a recently popularized method for accelerating the training of deep feed-forward neural networks. Apart from speed improvements, the technique reportedly enables the use of higher learning rates, less careful parameter initialization, and saturating nonlinearities. The authors of the method note that the precise effect of batch normalization on neural networks remains an area of further study, especially regarding gradient propagation. Our work compares the convergence behavior of batch-normalized networks with that of networks that lack such normalization. We train both a small multi-layer perceptron and a deep convolutional neural network on four popular image datasets. By systematically altering critical hyperparameters, we isolate the effects of batch normalization both in general and with respect to these hyperparameters. Our experiments show that batch normalization indeed has positive effects on many aspects of neural networks, but we cannot confirm significant convergence speed improvements, especially when wall time is taken into account. Overall, batch-normalized models achieve higher validation and test accuracies on all datasets, which we attribute to the regularizing effect of batch normalization and to more stable gradient propagation. Given these results, the use of batch normalization is generally advised, since it prevents model divergence and may increase convergence speed through higher learning rates. Regardless of these properties, we still recommend variance-preserving weight initialization, as well as rectifiers over saturating nonlinearities. Keywords: batch normalization; deep learning; convolutional neural network; Computer Sciences
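For readers unfamiliar with the operation under study, the training-time forward pass of batch normalization can be sketched as below. This is a plain-Python illustration of the standard formulation (per-feature mean and biased variance over the mini-batch, followed by a scale and shift); the function name `batch_norm`, the `eps` default and the sample batch are assumptions for this example, and the sketch omits the running statistics used at inference time.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of samples, each a list of feature values.
    gamma, beta: per-feature scale and shift (learned in a real network).
    eps: small constant for numerical stability (assumed default).
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [sample[j] for sample in batch]
        mu = sum(column) / n
        var = sum((x - mu) ** 2 for x in column) / n  # biased batch variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mu) * inv_std + beta[j]
    return out


# Hypothetical mini-batch: three samples, two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to one and `beta` to zero, both features end up with roughly zero mean and unit variance despite their different input scales, which is the property usually credited with permitting higher learning rates.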
4	Rekurentní neuronové sítě pro rozpoznávání řeči / Recurrent Neural Networks for Speech Recognition Nováčik, Tomáš January 2016 This master's thesis deals with the implementation of various types of recurrent neural networks in the Lua programming language using the Torch library. It focuses on finding an optimal strategy for training recurrent neural networks and also tries to minimize the duration of training. Furthermore, various regularization techniques are investigated and implemented in the recurrent neural network architecture. The implemented recurrent neural networks are compared on a speech recognition task using the AMI dataset, where they model the acoustic information. Their performance is also compared with that of a standard feedforward neural network. The best results are achieved using the BLSTM architecture. The recurrent neural networks are also trained with the CTC objective function on the TIMIT dataset. The best result is again achieved using the BLSTM architecture.
5	Advances in parameterisation, optimisation and pruning of neural networks Laurent, César 10 1900 Neural networks are a family of machine learning models able to learn complex tasks directly from data. Although they already produce impressive results in many areas such as speech recognition, computer vision and machine translation, many challenges remain in both the training and the deployment of neural networks. In particular, training neural networks typically requires huge amounts of computational resources, and trained models are often too big or too computationally expensive to be deployed on resource-limited devices, such as smartphones or low-power chips. The articles presented in this thesis investigate solutions to these issues. The first two articles focus on improving the training of Recurrent Neural Networks (RNNs), networks specially designed to process sequential data. RNNs are notoriously hard to train, so we propose to improve their parameterisation by adding Batch Normalisation (BN), a very effective parameterisation that was hitherto used only in feed-forward networks. In the first article, we apply BN to the input-to-hidden connections of the RNNs, thereby reducing internal covariate shift between layers. In the second article, we show how to apply it to both the input-to-hidden and the hidden-to-hidden connections of the Long Short-Term Memory (LSTM), a popular RNN architecture, thus also reducing internal covariate shift between time steps. Our experiments show that the proposed parameterisations allow for faster and better training of RNNs on several benchmarks. In the third article, we propose a new optimiser to accelerate the training of neural networks. Traditional diagonal optimisers, such as RMSProp, operate in parameter coordinates, which is not optimal when several parameters are updated at the same time. Instead, we propose to apply such optimisers in a basis in which the diagonal approximation is likely to be more effective. We leverage the same approximation used in Kronecker-factored Approximate Curvature (K-FAC) to efficiently build this Kronecker-factored Eigenbasis (KFE). Our experiments show improvements over K-FAC in training speed for several deep network architectures. The last article focuses on network pruning, the action of removing parameters from the network in order to reduce its memory footprint and computational cost. Typical pruning methods rely on first- or second-order Taylor approximations of the loss landscape to identify which parameters can be discarded. We propose to study the impact of the assumptions behind such approximations. Moreover, we systematically compare methods based on first- and second-order approximations with Magnitude Pruning (MP), showing how they perform both before and after a fine-tuning phase. Our experiments show that better preserving the original network function does not necessarily transfer to better-performing networks after fine-tuning, suggesting that considering only the impact of pruning on the loss might not be a sufficient objective for designing good pruning criteria. Keywords: neural networks; deep learning; recurrent neural networks; batch normalization; unstructured pruning; natural gradient; Kronecker factorisation
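Magnitude Pruning, the baseline the last article compares against, is commonly implemented by zeroing the weights with the smallest absolute values. The sketch below illustrates that common formulation on a flat list of weights; the function name `magnitude_prune`, the `sparsity` parameter and the sample weights are assumptions for this example, and whether the thesis prunes globally or layer by layer is not stated in the abstract.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]  # hypothetical weights
pruned = magnitude_prune(weights, sparsity=0.5)
```

This criterion uses no information about the loss at all, which makes it a natural reference point for the first- and second-order Taylor criteria discussed in the abstract.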
