461 |
Neural computation of the eigenvectors of a symmetric positive definite matrix
Tsai, Wenyu Julie, 01 January 1996
No description available.
|
462 |
Temporal EKG signal classification using neural networks
Mohr, Sheila Jean, 02 February 2010
Master of Engineering
|
463 |
COMPARISON OF PRE-TRAINED CONVOLUTIONAL NEURAL NETWORK PERFORMANCE ON GLIOMA CLASSIFICATION
Unknown Date
Gliomas are an aggressive class of brain tumors that are associated with a better prognosis at lower grades. Effective differentiation and classification are imperative for early treatment. MRI is a popular medical imaging modality for detecting and diagnosing brain tumors because of its ability to non-invasively highlight the tumor region. With the rise of deep learning, researchers have used convolutional neural networks for classification in this domain, in particular pre-trained networks to reduce computational costs. However, variation in MRI modalities, MRI machines, and image scan quality causes different network structures to yield different performance. Each pre-trained network has a different structure that gives robust results under specific problem conditions. This thesis aims to fill a gap in the literature by comparing the performance of popular pre-trained networks on a controlled dataset that differs from the domain on which the networks were trained. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
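For illustration, a minimal transfer-learning sketch in the spirit of the approach described above: load an ImageNet-pre-trained backbone, freeze its features, and train a new classification head. The backbone choice, class count, and hyperparameters are assumptions for the example, not details taken from the thesis.

```python
# Illustrative sketch only: fine-tuning a pre-trained CNN for a small
# image-classification task, in the spirit of the transfer-learning setup the
# abstract describes. The backbone, class count, and hyperparameters are
# assumptions, not taken from the thesis.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # hypothetical: e.g. glioma classes in a labeled MRI dataset

# Load an ImageNet-pre-trained backbone and replace its classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False          # freeze the pre-trained features
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Stand-in batch of pre-processed MRI slices (3-channel, 224x224); a real run
# would iterate over a DataLoader built from the labeled scans.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"one fine-tuning step, loss = {loss.item():.3f}")
```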
|
464 |
Brain-inspired computing leveraging the transient non-linear dynamics of magnetic nano-oscillators
Riou, Mathieu, 23 January 2019
This thesis studies experimentally the transient dynamics of magnetic nano-oscillators for brain-inspired computing. For pattern recognition tasks such as speech or visual recognition, the brain is far more energy efficient than classical computers. Developing brain-inspired chips therefore opens a path to overcoming the limitations of present processors and gaining several orders of magnitude in the energy consumption of data processing. The efficiency of the brain originates from its architecture, which is particularly well adapted to pattern recognition. The building blocks of this architecture are biological neurons, which can be seen as interacting non-linear oscillators that generate spatial chain reactions of activations. Nevertheless, the brain has one hundred billion neurons, and a brain-inspired chip would require oscillators of extremely small dimensions. Spin-transfer torque oscillators (STNOs) have nanometric size, are fast (nanosecond time scales), are highly non-linear, and their spin-torque-dependent response is easily tunable (for instance by applying an external magnetic field or a d.c. current). They work at room temperature, have low thermal noise, and are compatible with CMOS technologies. Because of these features, they are excellent candidates for building hardware neural networks that are compatible with standard computers. In this thesis, we used a single STNO to emulate the behavior of a whole neural network. In this time-multiplexed approach, the oscillator emulates each neuron sequentially, and a temporal chain reaction replaces the spatial chain reaction of a biological neural network. In particular, we used the relaxation and the non-linear dependence of the oscillation amplitude on the applied current to perform neuromorphic computing. One of the main results of this thesis is the demonstration of speech recognition (digits spoken by five different speakers) with a state-of-the-art recognition rate of 99.6%. We show that the recognition performance depends strongly on the physical properties of the STNO, such as the linewidth, the emission power, and the frequency. We therefore optimized the experimental bias conditions (external applied magnetic field, d.c. current, and input rate) in order to leverage the physical properties of the STNO for recognition. Voice waveforms require a time-to-frequency transformation before being processed, and this step is performed numerically before the experiment. We studied the influence of different time-to-frequency transformations on the final recognition rate, shedding light on the critical role of their non-linear behavior. Finally, in order to solve problems requiring memory, such as temporal sequence analysis, we measured the intrinsic memory of an STNO, which comes from the relaxation of the oscillation amplitude. We also increased this memory using a delayed feedback loop, which extended the memory range from a few hundred nanoseconds to more than ten microseconds. This feedback memory allowed suppressing up to 99% of the errors on a temporal pattern recognition task (discrimination of sine and square waveforms).
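As a rough illustration of the time-multiplexed reservoir scheme described above, the sketch below replaces the physical STNO with a generic leaky non-linear node with delayed feedback and trains a linear readout on the sine-versus-square discrimination task. All constants (virtual node count, feedback gain, ridge penalty) are illustrative assumptions, not values from the experiment.

```python
# Minimal sketch of delay-based ("time-multiplexed") reservoir computing on a
# sine-vs-square discrimination task. A generic leaky tanh node with a
# delayed-feedback term stands in for the magnetic nano-oscillator; all
# constants are illustrative assumptions, not values from the thesis.
import numpy as np

rng = np.random.default_rng(0)
POINTS = 8                      # samples per waveform period
N_PERIODS = 400                 # total periods (sine or square, chosen at random)
N_VIRT = 24                     # virtual nodes per input sample (time multiplexing)

# Build the input stream: each period is either a sine or a square wave.
labels = rng.integers(0, 2, N_PERIODS)                      # 0 = sine, 1 = square
phase = np.sin(2 * np.pi * np.arange(POINTS) / POINTS)
periods = [phase if c == 0 else np.sign(phase) for c in labels]
u = np.concatenate(periods)                                 # input samples
y = np.repeat(labels, POINTS)                               # per-sample target class

mask = rng.uniform(-1, 1, N_VIRT)                           # fixed random input mask
states = np.zeros((len(u), N_VIRT))
x = np.zeros(N_VIRT)
leak, feedback = 0.5, 0.3
for t, ut in enumerate(u):
    prev = x.copy()                                         # delayed (previous-step) states
    for i in range(N_VIRT):
        drive = mask[i] * ut + feedback * prev[i]
        x[i] = (1 - leak) * x[i - 1] + leak * np.tanh(drive)
    states[t] = x

# Linear (ridge-regression) readout trained on the first half of the stream.
split = len(u) // 2
X_tr, X_te = states[:split], states[split:]
y_tr, y_te = y[:split], y[split:]
ridge = 1e-3
W = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(N_VIRT), X_tr.T @ y_tr)
pred = (X_te @ W > 0.5).astype(int)
print(f"per-sample test error: {np.mean(pred != y_te):.3%}")
```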
|
465 |
A simple artificial neural network development system for study and research
Southworth, David, 16 February 2010
Master of Science
|
466 |
Statistical Machine Learning & Deep Neural Networks Applied to Neural Data Analysis
Shokri Razaghi, Hooshmand, January 2020
Computational neuroscience seeks to discover the underlying mechanisms by which neural activity is generated. With the recent advancement in neural data acquisition methods, the bottleneck of this pursuit is the analysis of the ever-growing volume of neural data acquired in numerous labs from various experiments. These analyses can be broadly divided into two categories: first, extraction of high-quality neuronal signals from noisy large-scale recordings; second, inference for statistical models aimed at explaining the neuronal signals and the underlying processes that give rise to them. Conventionally, the majority of the methodologies employed for this effort have been based on statistics and signal processing. However, in recent years, recruiting artificial neural networks (ANNs) for neural data analysis has been gaining traction, owing to their immense success in computer vision and natural language processing and the stellar track record of ANN architectures generalizing to a wide variety of problems. In this work we investigate and improve upon statistical and ANN machine learning methods applied to multi-electrode array recordings and to inference for dynamical systems that play critical roles in computational neuroscience.
In the first and second parts of this thesis, we focus on the spike sorting problem. The analysis of large-scale multi-neuronal spike train data is crucial for current and future neuroscience research. However, this type of data is not available directly from recordings and requires further processing to be converted into spike trains. Dense multi-electrode arrays (MEAs) are a standard method for collecting such recordings. The processing needed to extract spike trains from these raw electrical signals is carried out by "spike sorting" algorithms. We introduce a robust and scalable MEA spike sorting pipeline, YASS (Yet Another Spike Sorter), to address many challenges that are inherent to this task. We primarily focus on MEA data collected from the primate retina, both for its unique challenges and for the available side information that ultimately helps us score different spike sorting pipelines. We also introduce a neural network architecture and an accompanying training scheme specifically devised to address the challenging task of deconvolution in MEA recordings.
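As a toy illustration of the first stage of such a pipeline, the sketch below detects candidate spike times on a single simulated channel by robust amplitude thresholding. It is a generic textbook-style detector, not the YASS detection or deconvolution networks; all numbers are assumptions.

```python
# Toy illustration of one early stage of spike sorting: detecting candidate
# spike times on a single simulated electrode channel by amplitude
# thresholding. Generic detector, not the YASS stages; numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
FS = 20_000                         # sampling rate (Hz), assumed
n = int(FS * 1.0)                   # one second of simulated recording

# Simulated trace: Gaussian background noise plus a few injected spike waveforms.
trace = rng.normal(0.0, 1.0, n)
spike_shape = -5.0 * np.exp(-0.5 * (np.arange(-15, 16) / 4.0) ** 2)  # negative-going spike
true_times = rng.choice(np.arange(100, n - 100), size=40, replace=False)
for t in true_times:
    trace[t - 15 : t + 16] += spike_shape

# Threshold at a multiple of the robust noise estimate (MAD-based sigma).
sigma = np.median(np.abs(trace)) / 0.6745
thresh = -4.0 * sigma
below = trace < thresh

# Keep one detection per threshold crossing: the trough inside each crossing.
detections = []
t = 0
while t < n:
    if below[t]:
        end = t
        while end < n and below[end]:
            end += 1
        detections.append(t + np.argmin(trace[t:end]))
        t = end
    else:
        t += 1

det = np.array(detections)
hits = sum(np.min(np.abs(det - tt)) <= 3 for tt in true_times)
print(f"{len(det)} detections, {hits}/{len(true_times)} true spikes recovered")
```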
In the last part, we shift our attention to inference for non-linear dynamics. Dynamical systems are the governing force behind many real-world phenomena and temporally correlated data. Recently, a number of neural network architectures have been proposed to address inference for nonlinear dynamical systems. We introduce two different methods based on normalizing flows for posterior inference in latent non-linear dynamical systems. We also present gradient-based amortized posterior inference approaches using the auto-encoding variational Bayes framework that can be applied to a wide range of generative models with nonlinear dynamics. We call our method Filtering Normalizing Flows (FNF). FNF performs favorably against state-of-the-art inference methods in terms of accuracy of predictions and quality of uncovered codes and dynamics on synthetic data.
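For illustration, a minimal numpy sketch of one planar normalizing-flow layer and its log-determinant term, the generic building block behind flow-based posterior approximations. This is not the FNF architecture itself; the latent dimension and parameter values are assumptions. Stacking several such layers and learning their parameters against an evidence lower bound yields a flexible posterior approximation.

```python
# Minimal numpy sketch of one planar normalizing-flow layer: a simple
# invertible transform whose log-det-Jacobian is cheap to evaluate. Generic
# building block, not the thesis's FNF architecture; values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
D = 2                                   # latent dimension (assumed)

# Flow parameters (in a real model these are learned).
w = rng.normal(size=D)
b = 0.1
u = rng.normal(size=D)
# Enforce invertibility: reparameterize u so that w.u >= -1.
wu = w @ u
u_hat = u + ((-1 + np.log1p(np.exp(wu))) - wu) * w / (w @ w)

def planar_flow(z):
    """Apply f(z) = z + u_hat * tanh(w.z + b) and return (f(z), log|det J|)."""
    a = z @ w + b                                      # (N,)
    f = z + np.outer(np.tanh(a), u_hat)                # (N, D)
    psi = np.outer(1.0 - np.tanh(a) ** 2, w)           # h'(a) * w, shape (N, D)
    logdet = np.log(np.abs(1.0 + psi @ u_hat))         # (N,)
    return f, logdet

# Push base samples z0 ~ N(0, I) through the flow; by change of variables the
# log-density of the transformed samples is log q(z0) - log|det J|.
z0 = rng.normal(size=(5, D))
log_q0 = -0.5 * np.sum(z0 ** 2, axis=1) - 0.5 * D * np.log(2 * np.pi)
z1, logdet = planar_flow(z0)
log_q1 = log_q0 - logdet
for zi, lq in zip(z1, log_q1):
    print(np.round(zi, 3), round(float(lq), 3))
```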
|
467 |
Algorithm Hardware Co-Design of Neural Networks for Always-On Devices
Chundi, Pavan Kumar, January 2021
Deep learning has become the algorithm of choice in many applications, such as face recognition, object detection, and speech recognition, because of its superior accuracy. Large models with many parameters were developed to obtain higher accuracy, which eventually gave diminishing returns at very large training and deployment cost. Consequently, greater attention is now being paid to the efficiency of neural networks.
Low power consumption is particularly important in the case of always-on applications. Examples of these applications include datacenters, cellular base stations, and battery-powered devices such as implantable devices, wearables, cell phones, and UAVs. Improving the efficiency of these devices by reducing the power they consume will bring down energy costs, extend battery life, or decrease the form factor, thereby improving the acceptability and adoption of the device.
Neural networks are a significant component of the total workload in IoT devices with smart functions and in datacenters. Base stations can also employ neural networks to improve the rate of convergence in channel estimation. Efficient execution of neural networks on always-on devices therefore helps lower the overall power dissipation.
Algorithm-only solutions target CPUs or GPUs as a platform and tend to focus on the number of computing operations. Hardware-only solutions tend to focus on programmability, low-voltage operation, standby power reduction, and on-chip data movement. Such solutions fail to take advantage of the joint optimization of both algorithm and hardware for the target application.
This thesis contributes to improving the efficiency of neural networks on always-on devices through both algorithmic and hardware interventions. It presents algorithm-hardware co-design works that achieve better power reduction in the case of a smart IoT device, a datacenter, and a small-cell base station. Power reduction is achieved through a combination of an appropriate neural network algorithm and architecture, simpler operations, and a reduction in the number of off-chip memory accesses.
|
468 |
When Can Nonconvex Optimization Problems be Solved with Gradient Descent? A Few Case Studies
Gilboa, Dar, January 2020
Gradient descent and related algorithms are ubiquitously used to solve optimization problems arising in machine learning and signal processing. In many cases, these problems are nonconvex, yet such simple algorithms are still effective. In an attempt to better understand this phenomenon, we study a number of nonconvex problems, proving that they can be solved efficiently with gradient descent. We will consider complete, orthogonal dictionary learning, and present a geometric analysis allowing us to obtain efficient convergence rates for gradient descent that hold with high probability. We also show that similar geometric structure is present in other nonconvex problems such as generalized phase retrieval.
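As a small numerical illustration, the sketch below runs plain gradient descent (after a spectral initialization) on a toy generalized phase retrieval instance with Gaussian measurements; problem sizes, step size, and iteration count are illustrative assumptions, not values from the thesis.

```python
# Toy numerical sketch of gradient descent on a nonconvex problem of the kind
# discussed in the abstract: real-valued generalized phase retrieval,
#     minimize f(x) = (1/4m) * sum_i ((a_i.x)^2 - y_i)^2,  y_i = (a_i.x*)^2,
# with Gaussian measurement vectors a_i. All sizes and the step size are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, m = 20, 400                              # signal dimension, number of measurements
x_star = rng.normal(size=n)
x_star /= np.linalg.norm(x_star)            # ground truth, unit norm
A = rng.normal(size=(m, n))                 # measurement vectors a_i as rows
y = (A @ x_star) ** 2                       # phaseless measurements

def grad(x):
    r = (A @ x) ** 2 - y                    # residuals (a_i.x)^2 - y_i
    return (A.T @ (r * (A @ x))) / m

# Spectral initialization: top eigenvector of (1/m) sum_i y_i a_i a_i^T,
# scaled by the norm estimate sqrt(mean(y)).
Y = (A.T * y) @ A / m
eigval, eigvec = np.linalg.eigh(Y)
x = eigvec[:, -1] * np.sqrt(np.mean(y))

step = 0.1
for _ in range(500):
    x -= step * grad(x)

# Error up to the unavoidable global sign ambiguity.
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
print(f"error up to sign: {err:.2e}")
```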
Turning next to neural networks, we will also calculate conditions on certain classes of networks under which signals and gradients propagate through the network in a stable manner during the initial stages of training. Initialization schemes derived using these calculations allow training recurrent networks on long sequence tasks, and in the case of networks with low precision activation functions they make explicit a tradeoff between the reduction in precision and the maximal depth of a model that can be trained with gradient descent.
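A quick numeric illustration of the propagation question: the root-mean-square activation of a deep, randomly initialized tanh network either decays, stays roughly stable, or saturates depending on the weight scale. The widths, depths, and scales below are assumptions chosen only to make the effect visible, not the conditions derived in the thesis.

```python
# Small numeric illustration of signal propagation at initialization in a deep
# tanh network: depending on the weight standard deviation, per-layer activation
# magnitudes decay, stay roughly stable, or saturate. Generic demonstration;
# widths, depths, and scales are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
width, depth = 500, 50

def forward_norms(weight_std):
    """Propagate a random input through `depth` tanh layers; return per-layer RMS activation."""
    h = rng.normal(size=width)
    norms = []
    for _ in range(depth):
        W = rng.normal(0.0, weight_std / np.sqrt(width), size=(width, width))
        h = np.tanh(W @ h)
        norms.append(np.sqrt(np.mean(h ** 2)))
    return norms

for std in (0.5, 1.0, 2.0):       # under-, near-, and over-critical scales for tanh
    norms = forward_norms(std)
    print(f"weight std {std}: layer-10 RMS {norms[9]:.3f}, layer-50 RMS {norms[-1]:.3f}")
```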
We finally consider manifold classification with a deep feed-forward neural network, for a particularly simple configuration of the manifolds. We provide an end-to-end analysis of the training process, proving that under certain conditions on the architectural hyperparameters of the network, it can successfully classify any point on the manifolds with high probability given a sufficient number of independent samples from the manifold, in a timely manner. Our analysis relates the depth and width of the network to its fitting capacity and statistical regularity respectively in early stages of training.
|
469 |
Communication optimizations for distributed deep learning
Shi, Shaohuai, 12 August 2020
With the increasing amount of data and the growing computing power, deep learning techniques using deep neural networks (DNNs) have been successfully applied in many practical artificial intelligence applications. The mini-batch stochastic gradient descent (SGD) algorithm and its variants are the most widely used algorithms in training deep models. The SGD algorithm is an iterative algorithm that needs to update the model parameters many times by traversing the training data, which is very time-consuming even using a single powerful GPU or TPU. Therefore, it has become common practice to exploit multiple processors (e.g., GPUs or TPUs) to accelerate the training process using distributed SGD. However, the iterative nature of distributed SGD requires multiple processors to iteratively communicate with each other to collaboratively update the model parameters. The intensive communication cost easily becomes the system bottleneck and limits the system scalability. In this thesis, we study communication-efficient techniques for distributed SGD to improve the system scalability and thus accelerate the training process. We identify the performance issues in distributed SGD through benchmarking and modeling and then propose several communication optimization algorithms to address the communication issues. First, we build a performance model with a directed acyclic graph (DAG) to model the training process of distributed SGD and verify the model with extensive benchmarks on existing state-of-the-art deep learning frameworks including Caffe, MXNet, TensorFlow, and CNTK. Our benchmarking and modeling point out that existing optimizations for the communication problems are sub-optimal, which we address in this thesis. Second, to address the startup problem (due to the high latency of each communication) of layer-wise communications with wait-free backpropagation (WFBP), we propose an optimal gradient merging solution for WFBP, named MG-WFBP, that exploits the layer-wise property to overlap the communication tasks with the computing tasks and can adapt to the training environment. Experiments are conducted on dense-GPU clusters with Ethernet and InfiniBand, and the results show that MG-WFBP can well address the startup problem in distributed training of layer-wise structured DNNs. Third, to make highly compute-intensive training tasks feasible on GPU clusters with low-bandwidth interconnects, we investigate gradient compression techniques in distributed training. Top-k sparsification can compress the communication traffic well with little impact on the model convergence, but it suffers from a communication complexity that is linear in the number of workers, so it cannot scale well to large clusters. To address this problem, we propose a global top-k (gTop-k) sparsification algorithm that reduces the communication complexity to be logarithmic in the number of workers. We also provide a detailed theoretical analysis of the gTop-k SGD training algorithm, and the theoretical results show that our gTop-k SGD has the same order of convergence rate as SGD. Experiments are conducted on a cluster with up to 64 GPUs to verify that gTop-k SGD significantly improves the system scalability with only a slight impact on the model convergence.
Lastly, to enjoy the benefits of both the pipelining technique and the gradient sparsification algorithm, we propose a new distributed training algorithm, layer-wise adaptive gradient sparsification SGD (LAGS-SGD), which supports layer-wise sparsification and communication, and we theoretically and empirically prove that LAGS-SGD preserves the convergence properties. To further alleviate the impact of the startup problem of layer-wise communications in LAGS-SGD, we also propose an optimal gradient merging solution for LAGS-SGD, named OMGS-SGD, and theoretically prove its optimality. The experimental results on a 16-node GPU cluster connected by 1 Gbps Ethernet show that OMGS-SGD can always improve the system scalability while the model convergence properties are not affected.
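As an in-process illustration of the gTop-k idea, the sketch below has each simulated worker keep only its k largest-magnitude gradient entries and obtains the global top-k by recursively merging pairs of sparse contributions (a tree reduction, hence a logarithmic number of merge rounds in the worker count). Communication is only simulated; tensor size, k, and worker count are assumptions.

```python
# In-process sketch of the global top-k (gTop-k) sparsification idea: each
# simulated worker keeps only its k largest-magnitude gradient entries, and the
# global top-k of the aggregated gradient is obtained by recursively merging
# pairs of sparse contributions (tree reduction). Real gTop-k runs this across
# machines; here everything lives in one process and all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(5)
N_WORKERS, DIM, K = 8, 10_000, 100

def local_topk(grad, k):
    """Return the gradient restricted to its k largest-magnitude entries."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

def merge_topk(a, b, k):
    """Sum two sparse vectors and keep only the k largest-magnitude entries."""
    return local_topk(a + b, k)

# Simulated per-worker gradients and their local top-k versions.
grads = [rng.normal(size=DIM) for _ in range(N_WORKERS)]
sparse = [local_topk(g, K) for g in grads]

# Tree reduction: log2(N_WORKERS) rounds of pairwise merges.
level = sparse
while len(level) > 1:
    level = [merge_topk(level[i], level[i + 1], K) for i in range(0, len(level), 2)]
g_topk = level[0]

dense_sum = np.sum(grads, axis=0)
kept = np.count_nonzero(g_topk)
overlap = np.count_nonzero(g_topk[np.argpartition(np.abs(dense_sum), -K)[-K:]])
print(f"kept {kept}/{DIM} entries; {overlap}/{K} coincide with the exact dense top-k")
```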
|