1

Training Recurrent Neural Networks

Sutskever, Ilya 13 August 2013 (has links)
Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
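The sketch below illustrates the flavour of the last result: a sparse random recurrent initialization rescaled to a target spectral radius, combined with a classical momentum update. It is a minimal illustration only; the spectral radius, sparsity level, learning rate, and momentum coefficient are assumptions, not values taken from the thesis.

```python
# Hypothetical sketch of a sparse, scaled random initialization for an RNN's
# recurrent weight matrix plus a classical momentum update. The target
# spectral radius, sparsity level, learning rate, and momentum coefficient
# are illustrative assumptions, not values taken from the thesis.
import numpy as np

def init_recurrent_weights(n_hidden, spectral_radius=1.1, density=0.25, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, size=(n_hidden, n_hidden))
    W *= rng.random((n_hidden, n_hidden)) < density   # sparsify connections
    # Rescale so the largest eigenvalue magnitude hits the target value,
    # keeping early recurrent dynamics from dying out or exploding.
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / radius)

def momentum_step(param, grad, velocity, lr=1e-3, mu=0.9):
    # Classical momentum: the velocity accumulates a decaying sum of gradients.
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity
```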
2

Learning Sparse Recurrent Neural Networks in Language Modeling

Shao, Yuanlong 25 September 2014 (has links)
No description available.
3

Toward a Brain-like Memory with Recurrent Neural Networks

Salihoglu, Utku 12 November 2009 (has links)
For the last twenty years, several assumptions have been expressed in the fields of information processing, neurophysiology and cognitive sciences. First, neural networks and their dynamical behaviors in terms of attractors are the natural way adopted by the brain to encode information. Any information item to be stored in the neural network should be coded in some way or another in one of the dynamical attractors of the brain, and retrieved by stimulating the network to trap its dynamics in the desired item's basin of attraction. The second view shared by neural network researchers is to base the learning of the synaptic matrix on a local Hebbian mechanism. The third assumption is the presence of chaos and the benefit gained from it. Chaos, although very simply produced, inherently possesses an infinite number of cyclic regimes that can be exploited for coding information. Moreover, the network spontaneously wanders around these unstable regimes, rapidly proposing alternative responses to external stimuli and easily switching from one of these potential attractors to another in response to any incoming stimulus. Finally, since their introduction sixty years ago, cell assemblies have proved to be a powerful paradigm for brain information processing. After their introduction in artificial intelligence, cell assemblies became commonly used in computational neuroscience as a neural substrate for content-addressable memories. Based on these assumptions, this thesis provides a computer model of a brain-like memory built on neural network simulation. It first shows experimentally that the more information is stored in robust cyclic attractors, the more chaos appears as a background regime, erratically itinerating among brief appearances of these attractors. Chaos appears to be not the cause but the consequence of the learning; it is, however, a helpful consequence that widens the network's encoding capacity. To learn the information to be stored, two supervised iterative Hebbian learning algorithms are proposed. One leaves the semantics of the attractors to be associated with the input data unprescribed, while the other defines them a priori. Both algorithms show good results, although the first is more robust and has a greater storage capacity. Building on these promising results, a biologically plausible alternative to these algorithms is proposed, using cell assemblies as the substrate for information. Even though cell assemblies are not new, the mechanisms underlying their formation are poorly understood and, so far, there are no biologically plausible algorithms that can explain how external stimuli can be stored online in cell assemblies. This thesis provides such a solution, combining fast Hebbian/anti-Hebbian learning of the network's recurrent connections to create new cell assemblies with a slower feedback signal that stabilizes the cell assemblies by learning the feedforward input connections. This last mechanism is inspired by the retroaxonal hypothesis.
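As a generic, self-contained illustration of Hebbian attractor storage and retrieval, the toy sketch below stores binary patterns with an outer-product rule and recalls them by iterating the dynamics into a basin of attraction. It is a Hopfield-style fixed-point example, offered only to make the attractor idea concrete; the thesis itself works with cyclic attractors in chaotic recurrent networks, which this sketch does not attempt to reproduce.

```python
# Toy Hebbian attractor memory (Hopfield-style), for illustration only; the
# thesis works with cyclic attractors in chaotic recurrent networks, which
# this fixed-point sketch does not attempt to reproduce.
import numpy as np

def hebbian_store(patterns):
    # patterns: array of shape (num_patterns, n) with entries in {-1, +1}.
    n = patterns.shape[1]
    W = patterns.T @ patterns / n        # local outer-product (Hebbian) rule
    np.fill_diagonal(W, 0.0)             # no self-connections
    return W

def recall(W, probe, steps=20):
    # Iterate the dynamics; the state falls into the basin of attraction of
    # the stored pattern closest to the (possibly noisy) probe.
    s = probe.astype(float).copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s
```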
4

A Comparison of Simple Recurrent and Sequential Cascaded Networks for Formal Language Recognition

Jacobsson, Henrik January 1999 (has links)
Two classes of recurrent neural network models are compared in this report: simple recurrent networks (SRNs) and sequential cascaded networks (SCNs), which are first- and second-order networks, respectively. The comparison aims to describe and analyse the behaviour of the networks so that the differences between them become clear. A theoretical analysis, using techniques from dynamic systems theory (DST), shows that the second-order network has more possibilities in terms of dynamical behaviours than the first-order network. It also reveals that the second-order network can interpret its context with an input-dependent function in the output nodes. The experiments were based on training with backpropagation (BP) and an evolutionary algorithm (EA) on the AnBn grammar, which requires the ability to count. This analysis revealed some differences between the two training regimes tested and also between the performance of the two types of networks. The EA was found to be far more reliable than BP in this domain. Another important finding from the experiments was that although the SCN had more possibilities than the SRN in how it could solve the problem, these were not exploited in the domain tested in this project.
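To make the first-order versus second-order distinction concrete, here are minimal forward-step sketches of an Elman-style SRN and a second-order network in which the previous state and the current input interact multiplicatively through a weight tensor. The shapes and the tanh/sigmoid choices are illustrative assumptions, not the report's exact formulations.

```python
# Minimal forward-step sketches of a first-order simple recurrent network
# (SRN, Elman-style) and a second-order network in which the previous state
# and the current input interact multiplicatively through a weight tensor.
# Shapes and the tanh/sigmoid choices are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def srn_step(x, h, Wx, Wh, b):
    # First-order: input and previous state enter additively.
    return np.tanh(Wx @ x + Wh @ h + b)

def second_order_step(x, h, W, b):
    # Second-order: new state unit i sums W[i, j, k] * h[j] * x[k], so the
    # effective input weights depend on the current context.
    return sigmoid(np.einsum('ijk,j,k->i', W, h, x) + b)
```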
5

On Modeling Dependency Dynamics of Sequential Data: Methods and Applications

Ji, Taoran 04 February 2022 (has links)
Information mining and knowledge learning from sequential data is a field of growing importance in both industry and academia. Sequential data, the natural representation of the information flow in many applications, usually carries a wealth of information and can help researchers gain insights for tasks such as airport threat detection, cyber-attack detection, recommender systems, point-of-interest (POI) prediction, and citation forecasting. This dissertation focuses on developing methods for sequential data-driven applications and for characterizing evolutionary dynamics in topics such as transit service disruption detection, early event detection on social media, technology opportunity discovery, and traffic incident impact analysis. In particular, four applications are studied with four proposed novel methods: a spatiotemporal feature learning framework for transit service disruption detection, a multi-task learning framework for cybersecurity event detection, citation dynamics modeling via multi-context attentional recurrent neural networks, and traffic incident impact forecasting via hierarchical spatiotemporal graph neural networks.

For the first of these methods, existing transit service disruption detection methods usually suffer from two significant shortcomings: 1) failing to handle the sparsity of the social media feature domain, i.e., only a few important "particles" among the massive volume of data generated every day are actually related to service disruption, and 2) ignoring the real-world geographical connections of transit networks as well as the semantic consistency in the problem space. This work makes three contributions: 1) developing a spatiotemporal learning framework for metro disruption detection using open-source data, 2) modeling semantic similarity and spatial connectivity among metro lines in feature space, and 3) developing an optimization algorithm that solves the multi-convex, non-smooth objective function efficiently.

For the second of these methods, conventional studies in cybersecurity detection suffer from the following shortcomings: 1) inability to capture weak signals generated by cyber-attacks on small organizations or individual accounts, 2) lack of generalization across distinct types of security incidents, and 3) failure to consider the relatedness across different types of cyber-attacks in the feature domain. Three contributions are made in this work: 1) formulating social media-based cyber-attack detection as a multi-task learning problem, 2) modeling multi-type task relatedness in feature space, and 3) developing an efficient algorithm to solve the non-smooth model with inequality constraints.

For the third of these methods, conventional citation forecasting methods use the traditional temporal point process, which suffers from several drawbacks: 1) they cannot predict the technological categories of citing documents and are thus incapable of assessing technological diversity, and 2) they require prior domain knowledge and are thus hard to extend to different research areas. Two contributions are made in this work: 1) formulating a novel framework that provides long-term citation predictions in an end-to-end fashion by integrating the learning of intensity function representations with the prediction of future citations, and 2) designing two novel temporal attention mechanisms that improve the model's ability to capture complicated temporal dependencies and allow it to dynamically combine the observation and prediction sides during learning.

For the fourth of these methods, previous work treats traffic sensor readings as features and views incident duration prediction as a feature-driven regression, which typically suffers from three drawbacks: 1) ignoring the road-sensor hierarchical structure of real-world traffic networks, 2) inability to learn and exploit the hidden temporal patterns in the sensor readings, and 3) lack of consideration of the spatial connectivity between arterial roads and traffic sensors. This work makes three significant contributions: 1) designing a hierarchical graph convolutional network architecture that models the road-sensor hierarchy, 2) proposing a novel spatiotemporal attention mechanism over sensor- and road-level features for representation learning, and 3) presenting a graph convolutional network-based method for incident representation learning via spatial connectivity modeling and traffic characteristics modeling.
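As a small illustration of the temporal-attention idea that the citation-forecasting method builds on, the sketch below scores each hidden state, normalizes the scores with a softmax over time, and returns both the weighted summary and the weights. It is a generic form only; the scoring function and shapes are assumptions, not the dissertation's actual architecture.

```python
# Generic temporal attention pooling over a sequence of hidden states; the
# scoring function and shapes are assumptions for illustration, not the
# dissertation's actual architecture.
import numpy as np

def temporal_attention(H, v):
    # H: (T, d) hidden states over T time steps; v: (d,) learned query vector.
    scores = H @ v                         # one relevance score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over time
    context = weights @ H                  # (d,) attention-weighted summary
    return context, weights                # weights show which steps mattered
```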
6

An explainable method for prediction of sepsis in ICUs using deep learning

Baghaei, Kourosh T 30 April 2021 (has links)
As a complicated, lethal medical emergency, sepsis is not easy to diagnose until it is too late to take any life-saving action. Early prediction of sepsis in ICUs may reduce the inpatient mortality rate. Although deep learning models can predict the outcome of ICU stays with high accuracy, the opacity of such neural networks decreases their reliability, particularly in ICU settings, where time is not on the doctors' side and every mistake increases the chance of patient mortality. It is therefore crucial for the predictive model to provide some form of reasoning alongside its prediction, so that the medical staff can avoid acting on false alarms. To address this problem, we propose adding an attention layer to a deep recurrent neural network that can learn the relative importance of each parameter of the multivariate ICU-stay data. Our approach provides explainability through the attention mechanism. We compare our method with some state-of-the-art methods and show the superiority of our approach in terms of providing explanations.
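A hedged sketch of the general idea described above: a recurrent encoder over the multivariate ICU time series followed by an attention layer whose weights both feed the prediction and act as a per-time-step explanation. The LSTM choice and the layer sizes are assumptions, not the paper's exact model.

```python
# Hedged sketch: recurrent encoder over a multivariate ICU time series plus
# an attention layer whose weights both feed the prediction and act as a
# per-time-step explanation. The LSTM choice and layer sizes are assumptions,
# not the paper's exact model.
import torch
import torch.nn as nn

class AttentiveRNNClassifier(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)     # one attention score per step
        self.head = nn.Linear(hidden, 1)      # sepsis-risk logit

    def forward(self, x):                     # x: (batch, time, n_features)
        states, _ = self.rnn(x)               # (batch, time, hidden)
        weights = torch.softmax(self.score(states).squeeze(-1), dim=1)
        context = (weights.unsqueeze(-1) * states).sum(dim=1)
        return self.head(context), weights    # prediction + importances
```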
7

MIMO Channel Prediction Using Recurrent Neural Networks

Potter, Chris, Kosbar, Kurt, Panagos, Adam 10 1900 (has links)
ITC/USA 2008 Conference Proceedings / The Forty-Fourth Annual International Telemetering Conference and Technical Exhibition / October 27-30, 2008 / Town and Country Resort & Convention Center, San Diego, California / Adaptive modulation is a communication technique capable of maximizing throughput while guaranteeing a fixed symbol error rate (SER). However, this technique requires instantaneous channel state information at the transmitter. This can be obtained by predicting channel states at the receiver and feeding them back to the transmitter. Existing algorithms used to predict single-input single-output (SISO) channels with recurrent neural networks (RNN) are extended to multiple-input multiple-output (MIMO) channels for use with adaptive modulation and their performance is demonstrated in several examples.
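To make the setup concrete, here is an illustrative arrangement for one-step-ahead MIMO channel prediction: the complex channel matrix is flattened into real-valued features, a recurrent predictor is run over past observations, and its final output is reshaped into the predicted next channel matrix, which the receiver would feed back to the transmitter. The `rnn_step` interface and the real/imaginary encoding are placeholder assumptions, not the paper's algorithm.

```python
# Illustrative one-step-ahead MIMO channel prediction wrapper; the rnn_step
# interface and the real/imaginary feature encoding are placeholder
# assumptions, not the paper's algorithm.
import numpy as np

def channel_to_features(H):
    # H: (n_rx, n_tx) complex channel matrix -> real-valued feature vector.
    return np.concatenate([H.real.ravel(), H.imag.ravel()])

def features_to_channel(f, n_rx, n_tx):
    half = n_rx * n_tx
    return (f[:half] + 1j * f[half:]).reshape(n_rx, n_tx)

def predict_next(history, rnn_step, h0):
    # history: list of past channel matrices; rnn_step: any recurrent cell
    # returning (output, new_state) for a feature vector and previous state.
    h, out = h0, None
    for H in history:
        out, h = rnn_step(channel_to_features(H), h)
    n_rx, n_tx = history[-1].shape
    return features_to_channel(out, n_rx, n_tx)
```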
8

Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks

Ayoub, Issa 24 June 2019 (has links)
Affective computing has gained significant attention from researchers in the last decade due to the wide variety of applications that can benefit from this technology. Often, researchers describe affect using emotional dimensions such as arousal and valence. Valence refers to the spectrum of negative to positive emotions, while arousal determines the level of excitement. Describing emotions through continuous dimensions (e.g. valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete emotions such as the six basic emotions: happiness, anger, fear, disgust, sadness and neutral. Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we employ two modalities of information: video and audio. Hence, we extract visual and audio features using deep neural network models. Given that emotions are time-dependent, we apply a Temporal Convolutional Neural Network (TCN) to model the variations in emotions. Additionally, we investigate an alternative model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Because the latter deep model does not fit into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We configure the hyperparameters of all models using Gaussian processes to obtain a fair comparison between the proposed models. Our results show that the TCN outperforms the RNN for recognition of the arousal and valence emotional dimensions. We therefore propose the adoption of TCNs for emotion detection problems as a baseline method for future work. Our experimental results show that the TCN outperforms all RNN-based models, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset for emotion prediction.
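For readers unfamiliar with TCNs, the sketch below shows the causal, dilated 1-D convolution block that such networks stack: left-padding keeps each output frame from seeing future frames, and dilation widens the receptive field without extra parameters. The kernel size, dilation, activation, and residual form are generic assumptions, not the configuration used in the thesis.

```python
# Minimal causal, dilated 1-D convolution block of the kind TCNs stack; the
# kernel size, dilation, activation, and residual form are generic
# assumptions, not the configuration used in the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        # Left-pad so each output frame only sees current and past frames.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                     # x: (batch, channels, time)
        out = self.conv(F.pad(x, (self.pad, 0)))
        return torch.relu(out) + x            # residual connection
```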
