  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Algorithm and Hardware Design for Efficient Deep Learning Inference

January 2018
abstract: Deep learning (DL) has proved itself to be one of the most important developments to date, with far-reaching impacts in numerous fields such as robotics, computer vision, surveillance, speech processing, machine translation, and finance. Deep learning models are now widely used for countless applications because of their ability to generalize to real-world data, their robustness to noise in previously unseen data, and their high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have outperformed traditional AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network that automatically detects cough instances in audio recorded in unconstrained environments. For this, 24-hour recordings from 9 different patients were collected and carefully labeled by medical personnel. A pre-processing algorithm is proposed to convert the event-based cough dataset into a more informative dataset that marks the start and end of each cough, and data augmentation is introduced to regularize the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in the real world. Deep neural networks are composed of multiple compute- and memory-intensive layers, which makes it difficult to execute them in real time with low power consumption on existing general-purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forest trees and for two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, a ~30x performance improvement over a CPU was achieved for random forest trees. 
For deep CNNs, we demonstrate that much higher performance can be achieved through architecture design space exploration, using any optimization algorithm that takes system-level performance and area models of the hardware primitives as inputs and minimizes latency under given resource constraints. With this method, ~30 GOPS performance was achieved on Stratix V FPGA boards. Hardware acceleration of DL algorithms alone is not always the most efficient approach, nor is it always sufficient to achieve the desired performance. There is large headroom for performance improvement if the algorithms are designed with the hardware limitations and bottlenecks in mind. This work achieves hardware-software co-optimization for the Non-Maximal Suppression (NMS) algorithm. Using the proposed algorithmic changes and hardware architecture … With CMOS scaling coming to an end and memory bandwidth bottlenecks increasing, CMOS-based systems might not scale enough to accommodate the requirements of more complicated and deeper neural networks in the future. In this work, we explore RRAM crossbars and arrays as a compact, high-performance, and energy-efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM periphery read and write circuits and achieve a ~3000x performance improvement in online dictionary learning compared to a CPU. This work also examines realistic RRAM devices and their non-idealities. We perform an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM-based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme that tunes the model to compensate for the faults of the RRAM array onto which it is mapped. The proposed method achieves much higher inference accuracy than the traditional Read-Verify-Write (R-V-W) method, and RSA recovers lost inference accuracy 100x to 1000x faster than R-V-W. 
Using 32-bit high-precision RSA cells, we achieved ~10% higher accuracy with faulty RRAM arrays than can be achieved by mapping a deep network to a 32-level RRAM array with no variations. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
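The abstract above compares mapping a pretrained network onto a 32-level RRAM array with and without device variations. As a hedged illustration only (this is not the author's mapping scheme and not RSA), the sketch below models one crossbar column: weights are linearly quantized to 32 conductance levels, the analog dot product is the sum of cell currents on the shared column wire, and device variation is an assumed Gaussian perturbation of each conductance. The function names, the linear mapping, the unit conductances, and the noise model are all hypothetical choices for clarity; practical designs often use differential cell pairs for signed weights and calibrated device models instead.

```python
import random


def map_to_levels(weights, num_levels=32):
    """Linearly map real-valued weights to integer conductance levels 0..num_levels-1.

    Hypothetical mapping: min(weights) -> level 0, max(weights) -> top level.
    Returns the levels plus (w_min, step) so results can be decoded later.
    """
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant weight vector (zero range).
    step = (w_max - w_min) / (num_levels - 1) if w_max > w_min else 1.0
    levels = [round((w - w_min) / step) for w in weights]
    return levels, w_min, step


def crossbar_column_dot(levels, w_min, step, voltages, sigma=0.0, seed=0):
    """Analog dot product on one crossbar column, decoded back to weight units.

    Cell conductance is taken as G_i = level_i (arbitrary units). The column
    current is I = sum_i G_i * V_i (Ohm's law per cell, Kirchhoff's current
    law on the column wire). sigma models assumed device-to-device variation
    as G_i * (1 + sigma * N(0, 1)).
    """
    rng = random.Random(seed)
    current = 0.0
    for level, v in zip(levels, voltages):
        g = level * (1.0 + sigma * rng.gauss(0.0, 1.0))
        current += g * v
    # Undo the linear mapping w = w_min + step * level.
    return w_min * sum(voltages) + step * current


if __name__ == "__main__":
    weights = [0.12, -0.40, 0.33, 0.05]
    inputs = [1.0, 0.5, -0.2, 0.8]
    ideal = sum(w * x for w, x in zip(weights, inputs))
    levels, w_min, step = map_to_levels(weights)
    print("ideal:", ideal)
    print("32-level, no variation:", crossbar_column_dot(levels, w_min, step, inputs))
    print("32-level, sigma=0.1:", crossbar_column_dot(levels, w_min, step, inputs, sigma=0.1))
```

With `sigma=0.0` the only error is quantization, bounded by half a level step per weight; raising `sigma` shows how conductance variation alone perturbs the column output, which is the kind of accuracy loss the abstract says RSA is designed to recover.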
2

Low Power, Dense Circuit Architectures and System Designs for Neural Networks using Emerging Memristors

Fernando, Baminahennadige Rasitha Dilanjana Xavier 09 August 2021
No description available.
3

Improving Energy Efficiency of Network-on-Chips Using Emerging Wireless Technology and Router Optimizations

DiTomaso, Dominic F. 25 July 2012
No description available.
4

Reconfigurable Logic Architectures based on Disruptive Technologies / Reconfigurable Logic Architectures Using the Properties of Molecular Electronics

Gaillardon, Pierre-Emmanuel 15 September 2011
For the last four decades, the semiconductor industry has experienced exponential growth. 
According to the ITRS, as we advance into the era of nanotechnology, traditional CMOS electronics is reaching its physical and economic limits. The main objective of this thesis is to explore the novel design opportunities that emerging technologies offer for reconfigurable architectures. First, the thesis focuses on the traditional FPGA architecture and surveys structural improvements brought by disruptive technologies. Since the configuration memories and routing structures occupy most of an FPGA's total area and are the main limit on its performance, 3-D integration appears to be a good candidate for moving all this circuitry into the metal layers. Configuration and routing circuits based on back-end-compatible resistive memories, a monolithic 3-D process flow, and a prospective vertical-FET process flow are introduced and assessed within a complete architectural context. Second, the thesis presents novel architectural schemes for ultra-fine-grain computing. The size of the logic elements can be reduced thanks to inherent properties of the technologies, such as the crossbar organization of nanowires or the controllable polarity of carbon electronics. Given the fine granularity of the logic elements, specific fixed and incomplete interconnection topologies are required to avoid the large overhead of a configurable interconnection pattern. To evaluate the potential of this new architectural scheme, a dedicated benchmarking flow is presented and used to explore the ultra-fine-grain architectural design space.
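The abstract above notes that the controllable polarity of carbon electronics lets logic elements shrink. A hedged, switch-level sketch of why: a controllable-polarity transistor is commonly abstracted as a switch with a control gate and a polarity gate that behaves as n-type when the polarity gate is 1 and as p-type when it is 0, so it conducts exactly when both gates carry the same value. Under this idealized model (an assumption that ignores threshold drops and signal strength), one such device provides an equality-controlled (XNOR) connection that needs four conventional switches. All function names below are hypothetical and do not come from the thesis.

```python
from itertools import product


def cp_switch(cg: int, pg: int) -> bool:
    """Idealized controllable-polarity switch.

    pg=1 -> n-type: on when cg=1; pg=0 -> p-type: on when cg=0.
    Hence it conducts iff cg == pg.
    """
    return cg == pg


def n_switch(g: int) -> bool:
    """Conventional n-type switch: conducts when its gate is 1."""
    return g == 1


def p_switch(g: int) -> bool:
    """Conventional p-type switch: conducts when its gate is 0."""
    return g == 0


def equal_path_cp(a: int, b: int) -> bool:
    # One controllable-polarity device: control gate = a, polarity gate = b.
    return cp_switch(a, b)


def equal_path_conventional(a: int, b: int) -> bool:
    # (a AND b) OR (NOT a AND NOT b): two series pairs in parallel, four switches.
    return (n_switch(a) and n_switch(b)) or (p_switch(a) and p_switch(b))


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, equal_path_cp(a, b), equal_path_conventional(a, b))
```

The truth table printed by the demo shows both paths conduct on the same input combinations; the difference is one device versus four, which illustrates at the switch level how the intrinsic XNOR behavior of such devices can reduce logic-element size.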
5

Intrusion Detection and High-Speed Packet Classification Using Memristor Crossbars

Bontupalli, Venkataramesh January 2015
No description available.
