• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 3
  • Tagged with
  • 3
  • 3
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Performance Optimization and Parallelization of Turbo Decoding for Software-Defined Radio

Roth, Jonathan 26 September 2009 (has links)
Research indicates that multiprocessor-based architectures will provide a flexible alternative to hard-wired application-specific integrated circuits (ASICs) suitable to implement the multitude of wireless standards required by mobile devices, while meeting their strict area and power requirements. This shift in design philosophy has led to the software-defined radio (SDR) paradigm, where a significant portion of a wireless standard's physical layer is implemented in software, allowing multiple standards to share a common architecture. Turbo codes offer excellent error-correcting performance, however, turbo decoders are one of the most computationally complex baseband tasks of a wireless receiver. Next generation wireless standards such as Worldwide Interoperability for Microwave Access (WiMAX), support enhanced double-binary turbo codes, which offer even better performance than the original binary turbo codes, at the expense of additional complexity. Hence, the design of efficient double-binary turbo decoder software is required to support wireless standards in a SDR environment. This thesis describes the optimization, parallelization, and simulated performance of a software double-binary turbo decoder implementation supporting the WiMAX standard suitable for SDR. An adapted turbo decoder is implemented in the C language, and numerous software optimizations are applied to reduce its overall computationally complexity. Evaluation of the software optimizations demonstrated a combined improvement of at least 270% for serial execution, while maintaining good bit-error rate (BER) performance. Using a customized multiprocessor simulator, special instruction support is implemented to speed up commonly performed turbo decoder operations, and is shown to improve decoder performance by 29% to 40%. The development of a flexible parallel decoding algorithm is detailed, with multiprocessor simulations demonstrating a speedup of 10.8 using twelve processors, while maintaining good parallel efficiency (above 89%). A linear-log-MAP decoder implementation using four iterations was shown to have 90% greater throughput than a max-log-MAP decoder implementation using eight iterations, with comparable BER performance. Simulation also shows that multiprocessor cache effects do not have a significant impact on parallel execution times. An initial investigation into the use of vector processing to further enhance performance of the parallel decoder software reveals promising results. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-09-25 16:22:47.288
2

Large-Scale Real-Time Electromagnetic Transient Simulation of Power Systems Using Hardware Emulation on FPGAs

Chen, Yuan Unknown Date
No description available.
3

Algorithm Design and Optimization of Convolutional Neural Networks Implemented on FPGAs

Du, Zekun January 2019 (has links)
Deep learning develops rapidly in recent years. It has been applied to many fields, which are the main areas of artificial intelligence. The combination of deep learning and embedded systems is a good direction in the technical field. This project is going to design a deep learning neural network algorithm that can be implemented on hardware, for example, FPGA. This project based on current researches about deep learning neural network and hardware features. The system uses PyTorch and CUDA as assistant methods. This project focuses on image classification based on a convolutional neural network (CNN). Many good CNN models can be studied, like ResNet, ResNeXt, and MobileNet. By applying these models to the design, an algorithm is decided with the model of MobileNet. Models are selected in some ways, like floating point operations (FLOPs), number of parameters and classification accuracy. Finally, the algorithm based on MobileNet is selected with a top-1 error of 5.5%on software with a 6-class data set.Furthermore, the hardware simulation comes on the MobileNet based algorithm. The parameters are transformed from floating point numbers to 8-bit integers. The output numbers of each individual layer are cut to fixed-bit integers to fit the hardware restriction. A number handling method is designed to simulate the number change on hardware. Based on this simulation method, the top-1 error increases to 12.3%, which is acceptable. / Deep learning har utvecklats snabbt under den senaste tiden. Det har funnit applikationer inom många områden, som är huvudfälten inom Artificial Intelligence. Kombinationen av Deep Learning och innbyggda system är en god inriktning i det tekniska fältet. Syftet med detta projekt är att designa en Deep Learning-baserad Neural Network algoritm som kan implementeras på hårdvara, till exempel en FPGA. Projektet är baserat på modern forskning inom Deep Learning Neural Networks samt hårdvaruegenskaper.Systemet är baserat på PyTorch och CUDA. Projektets fokus är bild klassificering baserat på Convolutional Neural Networks (CNN). Det finns många bra CNN modeller att studera, t.ex. ResNet, ResNeXt och MobileNet. Genom att applicera dessa modeller till designen valdes en algoritm med MobileNetmodellen. Valet av modell är baserat på faktorer så som antal flyttalsoperationer, antal modellparametrar och klassifikationsprecision. Den mjukvarubaserade versionen av den MobileNet-baserade algoritmen har top-1 error på 5.5En hårdvarusimulering av MobileNet nätverket designades, i vilket parametrarna är konverterade från flyttal till 8-bit heltal. Talen från varje lager klipps till fixed-bit heltal för att anpassa nätverket till befintliga hårdvarubegränsningar. En metod designas för att simulera talförändringen på hårdvaran. Baserat på denna simuleringsmetod reduceras top-1 error till 12.3

Page generated in 0.0851 seconds