121

On Recurrent and Deep Neural Networks

Pascanu, Razvan
Deep Learning is a quickly growing area of research in machine learning, providing impressive results on tasks ranging from image classification to speech and language modelling. In particular, one subclass of deep models, recurrent neural networks, promises even more. Recurrent models can capture the temporal structure in the data.
They can learn correlations between events that might be far apart in time and, potentially, store information for unbounded amounts of time in their internal memory. In this work we first focus on understanding why depth is useful. Similar to other published work, our results prove that deep models can be more efficient at expressing certain families of functions than shallow models. Unlike that work, we carry out our theoretical analysis on deep feedforward networks with piecewise linear activation functions, the kind of models that have obtained state-of-the-art results on different classification tasks. The second part of the thesis looks at the learning process. We analyse a few recently proposed optimization techniques, including Hessian-Free Optimization, natural gradient descent and Krylov Subspace Descent. We propose the framework of generalized trust region methods and show that many of these recently proposed algorithms can be viewed from this perspective. We argue that certain members of this family of approaches might be better suited for non-convex optimization than others. The last part of the document focuses on recurrent neural networks. We start by looking at the concept of memory. The questions we attempt to answer are: Can recurrent models exhibit unbounded memory? Can this behaviour be learnt? We show this to be true if hints are provided during learning. Afterwards, we explore two specific difficulties of training recurrent models, namely the vanishing and exploding gradients problems. Our analysis concludes with a heuristic solution for exploding gradients that involves clipping the norm of the gradients. We also propose a specific regularization term meant to address the vanishing gradients problem. On a toy dataset, employing these mechanisms, we provide anecdotal evidence that the recurrent model might be able to learn, without hints, to exhibit some sort of unbounded memory. Finally, we explore the concept of depth for recurrent neural networks. Compared to feedforward models, the meaning of depth for recurrent models can be ambiguous. We propose several ways in which a recurrent model can be made deep and empirically evaluate these proposals.
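The clipping heuristic described in this abstract is simple to make concrete. Below is a minimal sketch in PyTorch, assuming a generic recurrent model and a toy regression loss; the architecture, clipping threshold and learning rate are illustrative choices, not the thesis's actual settings.

```python
import torch
import torch.nn as nn

# Toy recurrent model and loss; all sizes here are illustrative.
rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)
params = list(rnn.parameters()) + list(readout.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(16, 100, 8)          # (batch, time, features)
y = torch.randn(16, 1)

optimizer.zero_grad()
out, _ = rnn(x)                      # out: (batch, time, hidden)
loss = nn.functional.mse_loss(readout(out[:, -1]), y)
loss.backward()

# Exploding-gradients heuristic: if the global gradient norm exceeds the
# threshold, rescale the gradients so their norm equals the threshold.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```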
122

Modélisation de l'interprétation des pianistes & applications d'auto-encodeurs sur des modèles temporels / Modeling pianists' interpretations & applications of auto-encoders to temporal models

Lauly, Stanislas
This thesis addresses the problem of modeling pianists' interpretations using machine learning, and presents new temporal models that use auto-encoders to improve sequence learning. We present previous work in the field of modeling musical expression, including Professor Widmer's statistical models. We then discuss the unique dataset created specifically for our task, composed of 13 different pianists recorded on the famous Bösendorfer 290SE piano. Finally, we present in detail the results of training neural networks and recurrent neural networks on this dataset to learn the expressive variations specific to a style of music. We also present novel statistical models involving the use of auto-encoders in recurrent neural networks. To test the limits of these models' ability to learn, we use two artificial datasets developed at the University of Toronto.
123

Métodos neuronais para a solução da equação algébrica de Riccati e o LQR / Neural methods for the solution of the algebraic Riccati equation and the LQR

SILVA, Fabio Nogueira da 20 June 2008
Funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA). / We present in this work results on two neural network methods for solving the algebraic Riccati equation (ARE), which is used in many applications, most notably the Linear Quadratic Regulator (LQR) and H2 and H∞ control. First, the real symmetric form of the ARE is shown, along with two methods based on neural computation: a feedforward neural network (FNN), which defines an error function in terms of the ARE, and a recurrent neural network (RNN), which converts a constrained optimization problem, restricted to the state-space model, into an unconstrained convex optimization problem by defining an energy function in terms of the ARE and its Cholesky factor. We also propose a way to choose the learning parameters of the RNN used to solve the ARE by mapping a surface over the parameter variations, so that the neural network can be tuned for better performance. Computational experiments involving perturbations of the plant matrices of the tested systems were carried out to analyse the behaviour of the presented methodologies, which are based on homotopy methods: a good initial condition is chosen and the results are compared to the Schur method. Two 6th-order systems were used, a Doubly Fed Induction Generator (DFIG) and an aircraft plant. The results show the RNN to be a good alternative compared with the FNN and the Schur method.
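For context, the continuous-time ARE is A^T P + P A - P B R^{-1} B^T P + Q = 0, and its solution P yields the LQR gain K = R^{-1} B^T P. A minimal sketch of the Schur-type baseline the dissertation compares against, using SciPy's direct solver; the plant below is an illustrative 2nd-order system, not the thesis's 6th-order DFIG or aircraft models.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative 2nd-order plant (not the thesis's 6th-order systems).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # state weighting
R = np.eye(1)   # control weighting

# Direct (Schur-type) ARE solver: A'P + PA - P B R^{-1} B'P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# LQR gain K = R^{-1} B'P; the closed loop A - BK should be stable.
K = np.linalg.solve(R, B.T @ P)
print("P =", P)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```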
124

Long Term Forecasting of Industrial Electricity Consumption Data With GRU, LSTM and Multiple Linear Regression

Buzatoiu, Roxana January 2020
Accurate long-term energy consumption forecasting for industrial entities is of interest to distribution companies, as it can potentially help reduce churn and support decision making when hedging. This thesis presents different methods to forecast the energy consumption of industrial entities over a long prediction horizon of 1 year. Notably, it includes experiments with two variants of Recurrent Neural Networks, namely the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). Their performance is compared against traditional approaches, namely Multiple Linear Regression (MLR) and the Seasonal Autoregressive Integrated Moving Average (SARIMA). Further on, the investigation focuses on tailoring the Recurrent Neural Network model to improve performance: first, on the impact of different model architectures, and secondly, on the effect of time-related feature selection as an additional input to the Recurrent Neural Network (RNN). Specifically, it explores how traditional methods such as Exploratory Data Analysis and Autocorrelation and Partial Autocorrelation Function plots can contribute to the performance of the RNN model. The current work shows, through an empirical study on three industrial datasets, that the GRU architecture is a powerful method for the long-term forecasting task, outperforming the LSTM in certain scenarios. In comparison to the MLR model, the RNN achieved a reduction in RMSE of between 5% and 10%. The most important findings include: (i) the GRU architecture outperforms the LSTM on industrial energy consumption datasets when compared with a lower number of hidden units, and also outperforms the LSTM on certain datasets regardless of the number of units; (ii) RNN variants yield better accuracy than statistical or regression models; (iii) using ACF and PACF as discovery tools in the feature selection process is inconclusive and inefficient when aiming for a general model; (iv) using deterministic features (such as day of the year or day of the month) has limited effect on improving the deep learning model's performance.
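As a concrete illustration of the GRU variant evaluated here, below is a minimal PyTorch sketch of a sequence-to-one forecaster; the feature count, hidden size and one-year daily horizon are assumptions for illustration, not the thesis's actual configuration. Swapping nn.GRU for nn.LSTM (whose forward call returns (output, (h, c)) instead of (output, h)) gives the LSTM comparison point.

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """Sequence-to-one forecaster: encode the consumption history with a GRU,
    then map the last hidden state to the full one-year-ahead horizon."""
    def __init__(self, n_features=4, hidden=64, horizon=365):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):            # x: (batch, time, n_features)
        _, h = self.gru(x)           # h: (num_layers, batch, hidden)
        return self.head(h[-1])      # (batch, horizon)

model = GRUForecaster()
history = torch.randn(8, 730, 4)     # two years of daily history, 4 features
forecast = model(history)            # (8, 365): daily predictions one year out
print(forecast.shape)
```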
125

Constrained measurement systems of low-dimensional signals

Yap, Han Lun 20 December 2012
The object of this thesis is the study of constrained measurement systems of signals having low-dimensional structure using analytic tools from Compressed Sensing (CS). Realistic measurement systems usually have architectural constraints that make them differ from their idealized, well-studied counterparts. Nonetheless, these measurement systems can exploit structure in the signals that they measure. Signals considered in this research have low-dimensional structure and can be broken down into two types: static or dynamic. Static signals are either sparse in a specified basis or lying on a low-dimensional manifold (called manifold-modeled signals). Dynamic signals, exemplified as states of a dynamical system, either lie on a low-dimensional manifold or have converged onto a low-dimensional attractor. In CS, the Restricted Isometry Property (RIP) of a measurement system ensures that distances between all signals of a certain sparsity are preserved. This stable embedding ensures that sparse signals can be distinguished one from another by their measurements and therefore be robustly recovered. Moreover, signal-processing and data-inference algorithms can be performed directly on the measurements instead of requiring a prior signal recovery step. Taking inspiration from the RIP, this research analyzes conditions on realistic, constrained measurement systems (of the signals described above) such that they are stable embeddings of the signals that they measure. Specifically, this thesis focuses on four different types of measurement systems. First, we study the concentration of measure and the RIP of random block diagonal matrices that represent measurement systems constrained to make local measurements. Second, we study the stable embedding of manifold-modeled signals by existing CS matrices. The third part of this thesis deals with measurement systems of dynamical systems that produce time series observations. While Takens' embedding result ensures that this time series output can be an embedding of the dynamical systems' states, our research establishes that a stronger stable embedding result is possible under certain conditions. The final part of this thesis is the application of CS ideas to the study of the short-term memory of neural networks. In particular, we show that the nodes of a recurrent neural network can be a stable embedding of sparse input sequences.
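The block diagonal measurement matrices from the first part are easy to experiment with numerically. A toy sketch, assuming Gaussian blocks normalized so the expected squared norm is preserved; the dimensions and sparsity level are arbitrary illustrative choices, and the thesis's actual results show that the quality of concentration depends on how the signal's energy is spread across blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def block_diag_gaussian(num_blocks, m, n):
    """Random block diagonal matrix with num_blocks Gaussian blocks of size
    m x n, modeling a measurement system constrained to sense locally.
    Entries have variance 1/m so that E||Phi x||^2 = ||x||^2."""
    Phi = np.zeros((num_blocks * m, num_blocks * n))
    for j in range(num_blocks):
        Phi[j*m:(j+1)*m, j*n:(j+1)*n] = rng.normal(size=(m, n)) / np.sqrt(m)
    return Phi

J, m, n, s = 8, 10, 50, 5       # 8 local blocks of size 10 x 50; 5-sparse signals
Phi = block_diag_gaussian(J, m, n)

# Empirically check how well squared norms of random sparse signals concentrate.
ratios = []
for _ in range(2000):
    x = np.zeros(J * n)
    support = rng.choice(J * n, size=s, replace=False)
    x[support] = rng.normal(size=s)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

print("squared-norm ratio: mean %.3f, min %.3f, max %.3f"
      % (np.mean(ratios), min(ratios), max(ratios)))
```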
126

Αναγνώριση ομιλητή / Speaker recognition

Ganchev, Todor 25 June 2007
This dissertation deals with speaker recognition in real-world conditions. The main emphasis falls on: (1) evaluation of various speech feature extraction approaches, (2) reduction of the impact of environmental interference on speaker recognition performance, and (3) study of alternatives to the present state-of-the-art classification techniques. Specifically, within (1), a novel wavelet-packet-based speech feature extraction scheme fine-tuned for speaker recognition is proposed. It is derived in an objective manner with respect to speaker recognition performance, in contrast to the state-of-the-art MFCC scheme, which is based on an approximation of human auditory perception. Next, within (2), an advanced noise-robust feature extraction scheme based on MFCC is offered for improving speaker recognition performance in real-world environments. In brief, a model-based noise reduction technique adapted to the specifics of the speaker verification task is incorporated directly into the MFCC computation scheme. This approach demonstrated a significant advantage in real-world, fast-varying environments. Finally, within (3), two novel classifiers, referred to as the Locally Recurrent Probabilistic Neural Network (LR PNN) and the Generalized Locally Recurrent Probabilistic Neural Network (GLR PNN), are introduced. They are hybrids between the Recurrent Neural Network (RNN) and the Probabilistic Neural Network (PNN) and combine the virtues of the generative and discriminative classification approaches. Moreover, these novel neural networks are sensitive to temporal and spatial correlations among consecutive inputs and are therefore able to exploit the inter-frame correlations among speech features derived from successive speech frames. The experiments demonstrated that the LR PNN and GLR PNN architectures provide better performance than the original PNN.
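The MFCC front end that the noise-robust scheme builds on is standard and easy to reproduce. A minimal sketch with librosa on a synthetic stand-in signal; the 16 kHz rate, 25 ms/10 ms framing and 13 coefficients are common defaults, not necessarily the dissertation's exact settings.

```python
import numpy as np
import librosa

sr = 16000                          # common rate for speaker recognition
t = np.linspace(0, 1.0, sr, endpoint=False)
# Synthetic stand-in for an utterance: an amplitude-modulated tone plus noise.
y = 0.5 * np.sin(2 * np.pi * 220 * t) * (1 + 0.3 * np.sin(2 * np.pi * 3 * t))
y = (y + 0.01 * np.random.randn(sr)).astype(np.float32)

# Standard MFCC front end: mel filterbank log-energies followed by a DCT.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)  # 25 ms window, 10 ms hop
print(mfcc.shape)                   # (13, num_frames): one vector per frame

# Cepstral mean normalization, a simple per-utterance noise-robustness step:
mfcc_cmn = mfcc - mfcc.mean(axis=1, keepdims=True)
```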
127

Dynamical modeling with application to friction phenomena / Dynamische Modellierung mit Anwendung auf Reibungsphaenomene

Hornstein, Alexander 09 November 2005
No description available.
129

Réseaux de neurones à relaxation entraînés par critère d'autoencodeur débruitant / Relaxation neural networks trained with a denoising autoencoder criterion

Savard, François
Machine learning is a vast field where we seek to learn the parameters of models from concrete data, in order to execute tasks requiring abilities normally associated more with human intelligence than with a computer program, such as the ability to process high-dimensional data containing a lot of variation. Artificial neural networks are a large class of such models. In some neural networks said to be deep, we can observe that high-level (or "abstract") concepts are automatically learned. The work we present here takes its inspiration from deep neural networks, from recurrent networks and from the neuroscience of the visual system. Our test tasks are classification and denoising of near-binary images. We aim to take advantage of a feedback mechanism through which high-level representations, that is to say relatively abstract concepts, can influence lower-level representations. This influence happens during what we call relaxation: iterations during which the different levels (or layers) of the model influence each other. We present two families of architectures based on this mechanism. One, the fully connected architecture, can in principle accept generic data; the other, the convolutional one, is made specifically for images. Both were trained on images, though, mostly images of written characters. In one type of experiment, we reconstruct data that has been corrupted. In these tasks, we observed the feedback influence phenomenon described above by comparing the results obtained with and without relaxation, and we note some numerical and visual improvement in reconstruction performance when we add the upper layers' influence. In another type of task, classification, little gain was noted. Still, in one setting where we tried to classify noisy data with a representation trained without prior class information, relaxation did seem to improve results significantly. The convolutional architecture, a bit more risky at first, was shown to produce numerical and visual reconstruction results close to those obtained with the fully connected version, even though its connectivity is much more constrained.
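The denoising autoencoder criterion named in the title is simple to state: corrupt the input, then train the network to reconstruct the clean version. A minimal sketch for near-binary images; the single-hidden-layer architecture and masking rate are illustrative assumptions and do not reproduce the thesis's relaxation networks.

```python
import torch
import torch.nn as nn

# One-layer denoising autoencoder; the thesis's relaxation networks are deeper
# and recurrent, so this only illustrates the training criterion itself.
encoder = nn.Sequential(nn.Linear(784, 256), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(256, 784), nn.Sigmoid())
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = (torch.rand(64, 784) > 0.5).float()     # stand-in for near-binary images

for step in range(100):
    # Corrupt: zero out a random 25% of the input pixels (masking noise).
    mask = (torch.rand_like(x) > 0.25).float()
    x_tilde = x * mask

    # Denoising criterion: reconstruct the *clean* x from the corrupted input.
    x_hat = decoder(encoder(x_tilde))
    loss = nn.functional.binary_cross_entropy(x_hat, x)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```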
130

Classifying Hate Speech using Fine-tuned Language Models

Brorson, Erik January 2018
Given the explosion in the size of social media, the amount of hate speech is also growing. To combat this issue efficiently we need reliable and scalable machine learning models. Current solutions rely on crowdsourced datasets that are limited in size, or use training data from self-identified hateful communities, which lacks specificity. In this thesis we introduce a novel semi-supervised modelling strategy: the model is first trained on the freely available data from the hateful communities and then fine-tuned to classify hateful tweets from crowdsourced annotated datasets. We show that our model reaches state-of-the-art performance with minimal hyper-parameter tuning.
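The abstract does not name the underlying language model, so the sketch below only illustrates the pretrain-then-fine-tune strategy it describes, using the Hugging Face transformers API; the checkpoint, label set and hyperparameters are assumptions, not the thesis's setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any pretrained language model with a classification head would do; the
# checkpoint and label count here are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # hateful vs. not hateful

texts = ["example tweet one", "example tweet two"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One fine-tuning step on a small crowdsourced, annotated dataset:
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```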
