Global ETD Search

1	Realisierung der virtuellen Hardware-Maschine in VHDL [Implementation of the Virtual Hardware Machine in VHDL] Bierwisch, Nick 20 October 2017 Processing applications in software is a very flexible and powerful approach, but its energy consumption is very high and its performance is not commensurate. Implementing the same application in hardware, where possible, usually allows faster execution at substantially lower power consumption. Hardware implementations, however, are not very flexible. This drawback was mitigated by the development of reconfigurable hardware such as field programmable gate arrays (FPGAs). Their large number and differing architectures, however, prevent a standardized interface. To avoid developing a circuit from scratch for every FPGA, the virtual hardware machine (VHM) was introduced in Sebastian Lange's diploma thesis. It forms an intermediate layer between the underlying architecture and circuit development. This work attempted to produce a working version of the VHM. To this end, its behavior was described in VHDL and a design was synthesized, which was then tested on an FPGA. Because a complete implementation was produced but could not be made to run, no speeds can be measured. Only estimates of speed and statements about the resource usage of the implementation can be given. This thesis describes how the VHM was implemented and makes some statements about the expected speed and resource usage of such an implementation compared with the direct implementations of circuits on the various FPGAs used so far. info:eu-repo/classification/ddc/000 ddc:000
2	Implementierung von Java-Threads in Software und rekonfigurierbarer Hardware [Implementation of Java Threads in Software and Reconfigurable Hardware] Endrullis, Stefan 20 October 2017 The market for portable devices is becoming increasingly important. Mobile phones, PDAs (personal digital assistants), smartphones, and many other devices are continually being equipped with new functions and increasingly take over classic tasks of a personal computer (PC), such as word processing or running multimedia applications. The latter in particular place high demands on the devices, which cannot be met solely by using more powerful processors. For computationally intensive work, chips that implement the specific requirements in hardware are therefore often used. These are referred to as application-specific integrated circuits (ASICs). info:eu-repo/classification/ddc/000 ddc:000
3	Fast Split Arithmetic Encoder Architectures and Perceptual Coding Methods for Enhanced JPEG2000 Performance Varma, Krishnaraj M. 11 April 2006 JPEG2000 is a wavelet transform based image compression and coding standard. It provides superior rate-distortion performance when compared to the previous JPEG standard. In addition, JPEG2000 provides four dimensions of scalability: distortion, resolution, spatial, and color. These superior features make JPEG2000 ideal for use in power- and bandwidth-limited mobile applications like urban search and rescue. Such applications require a fast, low-power JPEG2000 encoder to be embedded on the mobile agent. This embedded encoder also needs to provide superior subjective quality for low-bitrate images. This research addresses these two aspects of enhancing the performance of JPEG2000 encoders. The JPEG2000 standard includes a perceptual weighting method based on the contrast sensitivity function (CSF). Recent literature shows that perceptual methods based on subband standard deviation are also effective in image compression. This research presents two new perceptual weighting methods that combine information from both the human contrast sensitivity function and the standard deviation within a subband or code-block. These two new sets of perceptual weights are compared to the JPEG2000 CSF weights. The results indicate that our new weights performed better than the JPEG2000 CSF weights for high-frequency images. Weights based solely on subband standard deviation are shown to perform worse than JPEG2000 CSF weights for all images at all compression ratios. Embedded block coding, EBCOT tier-1, is the most computationally intensive part of the JPEG2000 image coding standard. Past research on fast EBCOT tier-1 hardware implementations has concentrated on cycle-efficient context formation. These pass-parallel architectures require that JPEG2000's three mode switches be turned on.
While turning on the mode switches allows the arithmetic encoding of each coding pass to run independently of the others (and thus in parallel), it also disrupts the probability estimation engine of the arithmetic encoder, thus sacrificing coding efficiency for improved throughput. In this research a new fast EBCOT tier-1 design is presented: it is called the Split Arithmetic Encoder (SAE) process. The proposed process exploits concurrency to obtain improved throughput while preserving coding efficiency. The SAE process is evaluated using three methods: clock cycle estimation, a multithreaded software implementation, and a field programmable gate array (FPGA) hardware implementation. All three methods achieve throughput improvement; the hardware implementation exhibits the largest speedup, as expected. A high-speed, task-parallel, multithreaded software architecture for EBCOT tier-1 based on the SAE process is proposed. SAE was implemented in software on two shared-memory architectures: a PC using hyperthreading and a multi-processor non-uniform memory access (NUMA) machine. The implementation adopts appropriate synchronization mechanisms that preserve the algorithm's causality constraints. Tests show that the new architecture is capable of improving throughput by as much as 50% on the NUMA machine and by as much as 19% on a PC with two virtual processing units. A high-speed, multirate FPGA implementation of the SAE process is also proposed. The mismatch between the rate of production of data by the context formation (CF) module and the rate of consumption of data by the arithmetic encoder (AE) module is studied in detail. Appropriate choices for FIFO sizes and FIFO write and read capabilities are made based on the statistics obtained from test runs of the algorithm. Using a fast CF module, this implementation was able to achieve as much as 120% improvement in throughput. / Ph. D.
FPGA hardware Multithreaded software Split Arithmetic Encoder EBCOT JPEG2000 Perceptual weighting
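The rate mismatch between the CF and AE modules described in entry 3 can be illustrated with a small cycle-level FIFO model. This is only a sketch under stated assumptions: the function `simulate_fifo`, the production pattern, and the depths are hypothetical and are not taken from the thesis, which sizes its FIFOs from measured statistics.

```python
def simulate_fifo(produced_per_cycle, depth, reads_per_cycle=1):
    """Cycle-level model of a bounded FIFO between a producer and a consumer.

    produced_per_cycle: symbols the producer (e.g. a CF stage) emits each cycle.
    depth: FIFO capacity in symbols.
    reads_per_cycle: maximum symbols the consumer (e.g. an AE stage) removes per cycle.
    Returns (peak occupancy, number of symbols that did not fit when produced).
    """
    occupancy = 0
    peak = 0
    overflow = 0
    for produced in produced_per_cycle:
        # The consumer drains first, then the producer writes.
        occupancy -= min(occupancy, reads_per_cycle)
        free = depth - occupancy
        written = min(produced, free)
        overflow += produced - written  # in hardware the producer would stall here
        occupancy += written
        peak = max(peak, occupancy)
    return peak, overflow


bursty = [2, 2, 0, 0, 2, 0]  # hypothetical bursty production pattern
print(simulate_fifo(bursty, depth=2))  # (2, 1): one symbol does not fit
print(simulate_fifo(bursty, depth=4))  # (3, 0): a deeper FIFO absorbs the burst
```

With the same input, the deeper FIFO absorbs the burst that the shallow one cannot, which is the kind of trade-off a FIFO sizing study has to weigh against hardware cost.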
4	Návrh protokolu hardwarového akcelerátoru náročných výpočtů nad více jádry / A Hardware-acceleration Protocol Design for Demanding Computations over Multiple Cores Bareš, Jan January 2018 This work deals with the design of a communication protocol for data transmission between a control computer and computing cores implemented on FPGA chips. The purpose of the communication is to speed up performance-demanding software algorithms for non-stream data processing by computing them in hardware on an accelerating system. The work defines the terminology used for protocol design and analyses current solutions to the problem. It then designs the structure of the accelerating system and the communication protocol. The main part describes the implementation of the protocol in VHDL and the simulation of the implemented modules. Finally, an application of the designed solution is presented, along with possible extensions of this work.
5	Graphical Support for the Design and Evaluation of Configurable Logic Blocks Erxleben, Fredo 06 May 2015 Developing a tool that helps humans design and evaluate CLB-based circuits requires a lot of know-how and research from different fields of computer science. This work introduces the newly developed application q2d, in particular its design and implementation, as a possible tool for approaching CLB circuit development with graphical UI support. Design decisions and implementation are discussed and a workflow example is given. Contents: 1 Introduction 1.1 Forethoughts 1.2 Theoretical Background 1.2.1 Definitions 1.2.2 Expressing Connections between Circuit Elements 1.2.3 Global Context and Target Function 1.2.4 Problem formulation as QBF and SAT 2 Description of the Implemented Tool 2.1 Design Decisions 2.1.1 Choice of Language, Libraries and Frameworks 2.1.2 Solving the QBF Problem 2.1.3 Design of the Internally Used Meta-Model 2.1.4 User Interface Ergonomics 2.1.5 Aspects of Schematic Visualization 2.1.6 Limitations 2.2 Implemented Features 2.2.1 Basic Interaction 2.2.2 User-Defined Components 2.2.3 Generation of Circuit Symbols 2.2.4 Methods for Specifying Functional Behaviour 3 Implementation Details 3.1 Classes Involved in the Component Meta-Model 3.2 The Document Entry Class and its Factory 3.3 Model and View 3.3.1 The Model Element Hierarchy 3.3.2 The Schematics Element Hierarchy 3.4 The Quantor Interface 4 An Example Workflow 4.1 The Task 4.2 A Component Descriptor for Xilinx’ LUT6-2 4.3 Designing the Model 4.4 Computing the Desired Configuration 5 Summary and Outlook 5.1 Achieved Results 5.2 Suggested Improvements References A Acronyms and Glossary B UML Diagrams info:eu-repo/classification/ddc/004 ddc:004
6	A smart adaptive load for power-frequency support applications Carmona Sanchez, Jesus January 2016 At present, one of the main issues in electric power networks is the reduction in conventional generation and its replacement by low-inertia renewable energy generation. The balance between generation and demand has a direct impact on the system frequency, and system inertia limits the rate of change of frequency until compensating action can be taken. Traditionally, generation has managed frequency. In the future, loads may be required to do more than just be switched off during severe under-frequency events. This thesis focuses on the development and practical implementation of the control structure of a smart adaptive load for network power-frequency support applications. The control structure developed makes use of advanced demand side management of fan loads (powered by AC drives) used in heating, ventilation, and air conditioning systems, where a change in power at rated load has little effect on their speed due to the cubic relationship between speed and power. The AC drive implemented in this thesis is based on an induction motor and a two-level voltage source converter. To achieve the smart adaptive load functionality, first a power-frequency multi-slope droop control structure (feedforward control) is developed, relating the frequency limits imposed by the network supplier and the fan power-speed profile (Chapter 2, Fig 2.19). Secondly, this control structure is combined with the control developed in Chapter 3 for the AC drive powering the fan load. The full development of the control structure of the AC drive, its tuning process, and its practical implementation is given; an equation is developed to find suitable tuning parameters for the speed control of the nonlinear load (fan load), i.e. Eq. (3.59). The analysis and simulation results provided in Chapter 4 conclude that fast control of the active power drawn by the AC drive is possible by controlling the electromagnetic torque (hence current) of the induction motor without unduly disturbing the fan load. To achieve this, changes between closed-loop speed control and open-loop torque control (power control) are performed when needed. Two main issues were addressed before the hardware implementation of the smart adaptive load: the estimation of the network frequency under distorted voltage conditions, and the recovery period of the network frequency. In this thesis two slew rate limiters were implemented to deal with such situations. Other possible solutions are also outlined. Finally, experimental results in Chapter 5 support the results given in Chapter 4. A full power-frequency response is achieved by the smart adaptive load within 3 s. 621.31
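The cubic speed-power relationship and the multi-slope droop idea mentioned in entry 6 can be sketched numerically. Everything specific here is an assumption made for illustration: the function names, the 50 Hz style breakpoints, and the power fractions are hypothetical and do not come from the thesis (its actual profile is in its Fig 2.19).

```python
def speed_fraction_for_power(power_fraction):
    """Fan affinity law: power scales with the cube of speed, so
    speed scales with the cube root of power (both as fractions of rated)."""
    return power_fraction ** (1.0 / 3.0)


def droop_power_fraction(freq_hz, breakpoints):
    """Piecewise-linear (multi-slope) map from network frequency to the
    fraction of rated power the load is allowed to draw.

    breakpoints: list of (frequency_hz, power_fraction), sorted by frequency.
    Values outside the table are clamped to the end points.
    """
    if freq_hz <= breakpoints[0][0]:
        return breakpoints[0][1]
    if freq_hz >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (f0, p0), (f1, p1) in zip(breakpoints, breakpoints[1:]):
        if f0 <= freq_hz <= f1:
            return p0 + (p1 - p0) * (freq_hz - f0) / (f1 - f0)


# Hypothetical two-slope profile: full power at or above 49.8 Hz,
# steeper shedding between 49.8 and 49.5 Hz, gentler below that.
profile = [(49.0, 0.2), (49.5, 0.6), (49.8, 1.0)]

power = droop_power_fraction(49.7, profile)
print(power, speed_fraction_for_power(power))
# A 10% power reduction costs only about 3.5% of fan speed.
print(speed_fraction_for_power(0.9))
```

The last line shows why fan loads suit this role: because of the cube root, a sizeable change in drawn power translates into a much smaller change in fan speed near rated load.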
7	Algorithm Design and Optimization of Convolutional Neural Networks Implemented on FPGAs Du, Zekun January 2019 Deep learning has developed rapidly in recent years. It has been applied to many fields, which are the main areas of artificial intelligence. The combination of deep learning and embedded systems is a good direction in the technical field. This project designs a deep learning neural network algorithm that can be implemented on hardware, for example, an FPGA. The project is based on current research on deep learning neural networks and hardware features. The system uses PyTorch and CUDA as supporting tools. The project focuses on image classification based on a convolutional neural network (CNN). Many good CNN models can be studied, such as ResNet, ResNeXt, and MobileNet. By applying these models to the design, an algorithm based on the MobileNet model is chosen. Models are selected according to criteria such as floating point operations (FLOPs), number of parameters, and classification accuracy. Finally, the algorithm based on MobileNet is selected, with a top-1 error of 5.5% in software on a 6-class data set. Furthermore, a hardware simulation is performed on the MobileNet-based algorithm. The parameters are converted from floating point numbers to 8-bit integers. The outputs of each individual layer are truncated to fixed-bit integers to fit the hardware restrictions. A number handling method is designed to simulate the number changes on hardware. Based on this simulation method, the top-1 error increases to 12.3%, which is acceptable. / Deep learning has developed rapidly in recent times. It has found applications in many areas, which are the main fields of artificial intelligence. The combination of deep learning and embedded systems is a good direction in the technical field. The aim of this project is to design a deep learning based neural network algorithm that can be implemented on hardware, for example an FPGA. The project is based on modern research in deep learning neural networks and on hardware properties. The system is based on PyTorch and CUDA. The project's focus is image classification based on convolutional neural networks (CNN). There are many good CNN models to study, e.g. ResNet, ResNeXt, and MobileNet. By applying these models to the design, an algorithm using the MobileNet model was chosen. The choice of model is based on factors such as the number of floating point operations, the number of model parameters, and classification accuracy. The software version of the MobileNet-based algorithm has a top-1 error of 5.5%. A hardware simulation of the MobileNet network was designed, in which the parameters are converted from floating point numbers to 8-bit integers. The values from each layer are truncated to fixed-bit integers to adapt the network to existing hardware limitations. A method is designed to simulate the number changes on the hardware. Based on this simulation method, the top-1 error increases to 12.3%. Computer and Information Sciences
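The conversion of parameters from floating point to 8-bit integers described in entry 7 can be sketched as follows. This is a minimal illustration that assumes symmetric linear quantization with a single scale per tensor; the abstract does not say which scheme the thesis uses, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers with one shared scale.

    Assumed scheme: symmetric linear quantization, scale = max|v| / 127.
    Returns (list of ints in [-128, 127], scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical layer parameters
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is at most half a quantization step.
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(q, worst <= scale / 2 + 1e-12)
```

The bounded rounding error per parameter is what accumulates across layers and shows up as the rise in top-1 error that the abstract reports for the hardware simulation.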
