21

Offloading Virtual Network Functions – Hierarchical Approach

Langlet, Jonatan January 2020 (has links)
Next generation mobile networks are designed to run in a virtualized environment, enabling rapid infrastructure deployment and high flexibility for coping with increasing traffic demands and new service requirements. Such network function virtualization imposes additional packet latencies and potential bottlenecks not present when legacy network equipment runs on dedicated hardware; these bottlenecks include PCIe transfer delays, virtualization overhead, and the use of commodity server hardware that is not optimized for packet processing. Through recent developments in P4-programmable networking devices, it is possible to implement complex packet processing pipelines directly in the network data plane, allowing critical traffic flows to be offloaded and flexibly hardware-accelerated on programmable packet processing hardware before they enter the virtualized environment. In this thesis, we design and implement a novel hybrid NFV processing architecture that integrates programmable NICs and commodity server hardware and can offload virtual network functions for specified traffic flows directly to the server network card; these flows completely bypass the softwarization overhead, while less sensitive traffic is processed on the underlying host server. An evaluation in a testbed with customized traffic generators shows that accelerated flows have significantly lower jitter and latency than flows processed on commodity server hardware. Our evaluation gives important insights into the design of such hardware-accelerated virtual network deployments, showing that hybrid network architectures are a viable solution for enabling infrastructure scalability without sacrificing critical-flow performance.
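The offload decision described above can be pictured with a small, hypothetical controller sketch. It is not the thesis implementation: the packets-per-second threshold, the 5-tuple key, and the install_nic_rule placeholder are assumptions for illustration only.

```python
# Minimal sketch (not the thesis implementation): a controller that decides
# which 5-tuple flows to offload to a programmable NIC and which to leave
# to the software NFV pipeline. Threshold and field names are assumptions.
from collections import defaultdict

OFFLOAD_PPS_THRESHOLD = 50_000  # assumed packets/s cutoff for "critical" flows

class OffloadController:
    def __init__(self):
        self.flow_rates = defaultdict(int)   # 5-tuple -> packets seen this interval
        self.offloaded = set()               # flows currently installed on the NIC

    def record_packet(self, five_tuple):
        self.flow_rates[five_tuple] += 1

    def rebalance(self, interval_s=1.0):
        """Once per interval, push heavy flows to the NIC match-action table."""
        for flow, count in self.flow_rates.items():
            pps = count / interval_s
            if pps >= OFFLOAD_PPS_THRESHOLD and flow not in self.offloaded:
                self.install_nic_rule(flow)       # bypasses the virtualized path
                self.offloaded.add(flow)
        self.flow_rates.clear()

    def install_nic_rule(self, flow):
        # Placeholder: a real system would program a match-action table entry here.
        print(f"offloading {flow} to NIC pipeline")

ctrl = OffloadController()
for _ in range(60_000):
    ctrl.record_packet(("10.0.0.1", "10.0.0.2", 6, 1234, 80))
ctrl.rebalance()
```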
22

Accelerating RSA Public Key Cryptography via Hardware Acceleration

Ramesh, Pavithra 10 April 2020 (has links)
A large number and variety of sensors and actuators, also known as edge devices of the Internet of Things, have become prevalent in today's world across industries such as health care monitoring, home automation, and industrial automation. These edge devices need to communicate collected data to a central system occasionally, and often in burst mode; the data is then used for monitoring and control purposes. To secure these connections, asymmetric or Public Key Cryptography (PKC) schemes are used in combination with symmetric cryptography schemes. RSA (Rivest-Shamir-Adleman) is one of the most prevalent public key cryptosystems; its computationally intensive operations can incur high latency when implemented in resource-constrained environments. The objective of this thesis is to design an accelerator capable of increasing the speed of execution of the RSA algorithm in such resource-constrained environments. The bottleneck of the algorithm is determined by analyzing its performance on various platforms: an Intel Linux machine, a Raspberry Pi, and a Nios soft-core processor. In designing an accelerator to speed up the bottleneck function, we find that the accelerator architecture must change according to the resources available to it. We use high-level synthesis tools to explore the accelerator design space, taking into account system-level aspects such as the number of ports available for transferring inputs to the accelerator and the word size of the processor. We also propose a new accelerator architecture for the bottleneck function and the algorithm it implements, and compare its area and latency requirements with other designs obtained from the design space exploration. The functionality of the proposed design is verified and prototyped on the Zynq SoC of the Xilinx Zedboard.
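The operation that typically dominates RSA runtime is modular exponentiation; the sketch below shows the textbook square-and-multiply form of it as a software baseline. It is not the thesis's accelerator design, only an illustration of the kind of loop whose inner modular multiplication hardware accelerators usually target.

```python
# A sketch of the operation that typically dominates RSA runtime: modular
# exponentiation, shown here as binary (square-and-multiply) exponentiation.
# Hardware accelerators usually target the modular multiplication inside the loop.
def modexp(base, exponent, modulus):
    """Compute (base ** exponent) % modulus with O(log exponent) multiplications."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                     # multiply step when the bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus       # square step every iteration
        exponent >>= 1
    return result

# Toy check against Python's built-in three-argument pow()
assert modexp(65537, 1234567, 999331) == pow(65537, 1234567, 999331)
```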
23

Hardwarově akcelerovaná funkční verifikace / Hardware Accelerated Functional Verification

Zachariášová, Marcela January 2011 (has links)
Functional verification is one of the most widespread techniques for checking the correctness of hardware systems against their specification. As the complexity of current systems grows, so do the time demands placed on functional verification, and it is therefore important to look for new techniques to speed up this process. The theoretical part of this thesis describes the basic principles of different verification techniques, such as simulation and testing, functional verification, and formal analysis and verification. This is followed by a description of building verification environments for hardware components in the SystemVerilog language. The analysis part describes the requirements placed on a system for accelerating functional verification, the most important of which are the ability to easily launch the accelerated version of a verification run and the time equivalence of accelerated and non-accelerated runs. The thesis then presents the design of a verification framework that uses field-programmable gate array technology to accelerate verification runs while preserving the ability to run the verification in the user-friendly debugging environment of a simulator. According to experiments performed on a prototype implementation, the achieved speed-up is proportional to the number of verified transactions and to the complexity of the verified system; the highest speed-up achieved in the set of experiments is more than 130-fold.
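As a language-agnostic illustration of the verification-environment pattern described above (the thesis builds it in SystemVerilog, with the design under test later moved into an FPGA), the sketch below shows the reference-model/scoreboard loop in Python. The adder-like stand-in for the device under test and the transaction format are assumptions.

```python
# A sketch of the scoreboard pattern used in functional verification: drive the
# same transactions into a golden reference model and the device under test
# (here both are plain functions), then compare outputs and count mismatches.
import random

def reference_model(transaction):
    return (transaction["a"] + transaction["b"]) & 0xFFFF   # golden behaviour

def device_under_test(transaction):
    return (transaction["a"] + transaction["b"]) & 0xFFFF   # stand-in for the DUT

def run_verification(num_transactions=1000, seed=0):
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_transactions):
        tx = {"a": rng.randrange(1 << 16), "b": rng.randrange(1 << 16)}
        if reference_model(tx) != device_under_test(tx):
            mismatches += 1
    return mismatches

print("mismatching transactions:", run_verification())
```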
24

Akcelerace algoritmů pro hledání palindromu a opakujících se struktur / Acceleration of Methods for Searching Palindroms and Repetitive Structures

Voženílek, Jan January 2010 (has links)
Genetic information of all living organisms is stored in DNA. Exploring its structure and function is an important area of research in modern biology. One of the interesting structures occurring in DNA are palindromes. Research suggests they play an important role in interpreting the information stored in DNA, because they are often observed near important genes. Searching for palindromes is complicated by the presence of mutations (changes in the sequence of DNA elements), which increases the time complexity of the algorithms; it is therefore reasonable to study their parallelization and acceleration. The objective of this work is a study of palindrome-searching methods and the design of an acceleration architecture. The hardware unit, implemented on an FPGA placed on the ML555 board, can speed up the computation by up to 6,667 times compared with the best-known software method, which relies on suffix arrays.
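The core computational problem can be sketched in a few lines of naive software. This is not the accelerated FPGA design; the window length and mismatch budget below are assumptions, and a DNA palindrome is taken to mean a substring equal to its reverse complement.

```python
# A naive software sketch of the problem the FPGA unit accelerates: finding
# DNA palindromes (substrings equal to their reverse complement) while
# tolerating up to `max_mismatches` mutations. Window length is an assumption.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_palindromes(dna, window=12, max_mismatches=1):
    """Return (position, substring) pairs that are palindromic up to mutations."""
    hits = []
    for i in range(len(dna) - window + 1):
        sub = dna[i:i + window]
        rc = reverse_complement(sub)
        mismatches = sum(1 for a, b in zip(sub, rc) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, sub))
    return hits

print(find_palindromes("TTGAATTCAAAGGGCATGCCCTTT", window=10, max_mismatches=0))
```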
25

Algoritmy klasifikace paketů / Packet Classification Algorithms

Puš, Viktor Unknown Date (has links)
This thesis deals with packet classification in computer networks. Packet classification is a key task of many networking devices, most notably packet filters (firewalls), so the work relates to the field of computer security. It focuses on high-speed networks with transfer rates of 100 Gb/s and above. At these rates, general-purpose processors cannot be used for classification, as their performance falls far short of the speed requirements; specialized hardware, in particular ASICs and FPGAs, is used instead. The classification algorithm itself is no less important. Many packet classification algorithms assume a hardware implementation, yet these approaches are not ready for very fast networks. The dissertation therefore deals with the design of new packet classification algorithms aimed at high-speed implementation in specialized hardware. An algorithm is proposed that splits the classification problem into simpler subproblems. The first step is a longest prefix match operation, which is also used in packet routing in IP networks. This work assumes the use of an existing approach, since algorithms with sufficient speed have already been published. The next step maps the found prefixes to a rule number; here the thesis brings an improvement by using a custom-built hash function. Thanks to the hash function, the mapping can be performed in constant time using only a single memory with a narrow data interface. The speed of this algorithm can be determined analytically and does not depend on the number of rules or on the nature of the network traffic. With available components, a throughput of 266 million packets per second can be achieved. The following three algorithms presented in this thesis reduce the memory requirements of the first algorithm without affecting its speed. The second algorithm reduces the memory size by 11% to 96%, depending on the rule set. Its drawback of low stability is removed by the third algorithm, which reduces the memory requirements by 31% to 84% compared with the first one. The fourth algorithm combines the third algorithm with an older approach and, by employing several techniques, reduces the memory requirements by 73% to 99%.
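A software sketch of the two-stage idea above (longest prefix match followed by a single lookup mapping the prefix pair to a rule number) is given below. The prefixes and rule identifiers are made up, and Python's built-in dict stands in for the hand-crafted hardware hash function.

```python
# Software sketch: stage 1 is a longest prefix match (LPM) on source and
# destination addresses; stage 2 maps the resulting prefix pair to a rule
# number with one hash lookup (one memory access in the hardware design).
import ipaddress

PREFIXES = ["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/16", "0.0.0.0/0"]

def longest_prefix_match(addr, prefixes=PREFIXES):
    addr = ipaddress.ip_address(addr)
    best = None
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

DEFAULT = (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("0.0.0.0/0"))
RULE_TABLE = {  # (src prefix, dst prefix) -> rule id
    (ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_network("192.168.0.0/16")): 7,
    DEFAULT: 99,
}

def classify(src, dst):
    key = (longest_prefix_match(src), longest_prefix_match(dst))
    return RULE_TABLE.get(key, RULE_TABLE[DEFAULT])

print(classify("10.1.2.3", "192.168.5.5"))   # -> 7
print(classify("8.8.8.8", "1.1.1.1"))        # -> 99 (default rule)
```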
26

Hardware Acceleration for Homomorphic Encryption / Accélération matérielle pour la cryptographie homomorphe

Cathebras, Joël 17 December 2018 (has links)
In this thesis, we propose to contribute to the definition of encrypted-computing systems for the secure handling of private data. The particular objective of this work is to improve the performance of homomorphic encryption. The main problem lies in the definition of an acceleration approach that remains adaptable to the different application cases of these encryption schemes, and which is therefore consistent with the wide variety of parameters. With that objective, this thesis presents the exploration of a hybrid computing architecture for accelerating Fan and Vercauteren's encryption scheme (FV). This proposal is the result of an analysis of the memory and computational complexity of crypto-computation with FV. Some of the contributions improve the fit between a non-positional number representation system (RNS) and polynomial multiplication via the Fourier transform over finite fields (NTT). RNS-specific operations, which inherently embed parallelism, are accelerated on a SIMD computing unit such as a GPU. NTT-based polynomial multiplications are implemented on dedicated hardware such as an FPGA. Specific contributions support this proposal by reducing the storage and communication costs of handling the NTTs' twiddle factors. This thesis opens up perspectives for the definition of micro-servers for the manipulation of private data based on homomorphic encryption.
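The RNS part of the proposal can be illustrated independently of FV itself: the sketch below stores large integers as residues modulo pairwise-coprime moduli, multiplies them residue-wise (the part that is easy to parallelize, e.g. on a GPU), and reconstructs the product with the Chinese Remainder Theorem. The moduli are toy values, not FV parameters.

```python
# Residue number system (RNS) sketch: split a big multiplication into
# independent small ones, then recombine with the Chinese Remainder Theorem.
from math import prod

MODULI = [2_147_483_647, 2_147_483_629, 2_147_483_587]   # pairwise coprime primes

def to_rns(x):
    return [x % m for m in MODULI]

def rns_mul(a_res, b_res):
    return [(a * b) % m for a, b, m in zip(a_res, b_res, MODULI)]

def from_rns(residues):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)     # pow(..., -1, m) = modular inverse
    return x % M

a, b = 123_456_789_012, 987_654_321_098
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % prod(MODULI)
```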
27

Accelerated Graphical User Interfaces

Navrátil, Ladislav January 2013 (has links)
This thesis focuses on cross-platform graphical user interfaces and their hardware acceleration. It describes what user interfaces are and compares the tools for creating them and the ways they are implemented. The main contribution is the design and implementation of a tool for building cross-platform, hardware-accelerated graphical user interfaces. The thesis compares the proposed concept with existing solutions and puts it into practice in a project with an external company.
28

Hardware Acceleration of Video analytics on FPGA using OpenCL

January 2019 (has links)
With the exponential growth in video content over the last few years, the analysis of videos is becoming more crucial for many applications such as self-driving cars, healthcare, and traffic management. Most of these video analysis applications use deep learning algorithms such as convolutional neural networks (CNNs) because of their high accuracy in object detection, so enhancing the performance of CNN models becomes crucial for video analysis. CNN models are computationally expensive and often require high-end graphics processing units (GPUs) for acceleration. However, for real-time applications in energy- and thermally-constrained environments such as traffic management, GPUs are less preferred because of their high power consumption and limited energy efficiency, and because they are challenging to fit in small spaces. To enable real-time video analytics in emerging large-scale Internet of Things (IoT) applications, the computation must happen at the network edge (near the cameras) in a distributed fashion; thus, edge computing must be adopted. Recent studies have shown that field-programmable gate arrays (FPGAs) are highly suitable for edge computing due to their architectural adaptiveness, high computational throughput for streaming processing, and high energy efficiency. This thesis presents a generic OpenCL-defined CNN accelerator architecture optimized for FPGA-based real-time video analytics on the edge. The proposed CNN OpenCL kernel adopts a highly pipelined and parallelized 1-D systolic array architecture, which exploits both spatial and temporal parallelism for energy-efficient CNN acceleration on FPGAs. The large fan-in and fan-out of computational units to the memory interface are identified as the limiting factor causing scalability issues in existing designs, and solutions are proposed to resolve the issue with compiler automation. The proposed CNN kernel is highly scalable and parameterized by three architecture parameters, namely pe_num, reuse_fac, and vec_fac, which can be adapted to achieve 100% utilization of the coarse-grained computation resources (e.g., DSP blocks) of a given FPGA. The proposed CNN kernel is generic and can be used to accelerate a wide range of CNN models without recompiling the FPGA kernel hardware. The performance of AlexNet, ResNet-50, RetinaNet, and a light-weight RetinaNet has been measured with the proposed CNN kernel on an Intel Arria 10 GX1150 FPGA. The measurement results show that the proposed CNN kernel, when mapped with 100% utilization of computation resources, achieves latencies of 11 ms, 84 ms, 1614.9 ms, and 990.34 ms for AlexNet, ResNet-50, RetinaNet, and the light-weight RetinaNet respectively, when the input feature maps and weights are represented using the 32-bit floating-point data type. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2019
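The three architecture parameters can be pictured as loop-tiling knobs on a convolution. The sketch below is a rough software analogue under assumed meanings of pe_num, vec_fac, and reuse_fac; the actual kernel is an OpenCL systolic array on an FPGA, not NumPy.

```python
# A rough software analogue (not the thesis kernel) of how the three tuning
# knobs could partition a convolution layer's work; their exact meaning in the
# kernel is an assumption here:
#   pe_num    - output channels computed in parallel (one per processing element)
#   vec_fac   - input channels multiplied/accumulated per cycle (SIMD width)
#   reuse_fac - output pixels sharing one weight fetch (temporal reuse)
import numpy as np

def conv1x1_tiled(x, w, pe_num=4, vec_fac=8, reuse_fac=2):
    """x: (in_ch, pixels), w: (out_ch, in_ch) -> (out_ch, pixels), 1x1 conv."""
    in_ch, pixels = x.shape
    out_ch = w.shape[0]
    y = np.zeros((out_ch, pixels), dtype=x.dtype)
    for oc0 in range(0, out_ch, pe_num):            # spatial: PEs over out channels
        for p0 in range(0, pixels, reuse_fac):      # temporal: weight reuse window
            for ic0 in range(0, in_ch, vec_fac):    # SIMD: vectorized dot product
                y[oc0:oc0+pe_num, p0:p0+reuse_fac] += (
                    w[oc0:oc0+pe_num, ic0:ic0+vec_fac]
                    @ x[ic0:ic0+vec_fac, p0:p0+reuse_fac]
                )
    return y

x = np.random.rand(16, 8).astype(np.float32)
w = np.random.rand(8, 16).astype(np.float32)
assert np.allclose(conv1x1_tiled(x, w), w @ x, atol=1e-5)
```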
29

Machine Learning for Space Applications on Embedded Systems

Dengel, Ric January 2021 (has links)
As space missions continue to increase in complexity, the operational capabilities and the amount of gathered data demand ever more advanced systems. Currently, mission capabilities are often constrained by the link bandwidth as well as onboard processing capabilities, and a large number of commands and complex ground station systems are required for spacecraft operations. Methods that allow more efficient use of the bandwidth and computing capacity, together with increased autonomy, are therefore of strong research interest. Artificial Intelligence (AI), with its vast range of application scenarios, allows these challenges and more to be tackled in spacecraft design. In particular, the flexibility of Artificial Neural Networks as a Machine Learning technology provides many possibilities; for example, they can be used for object detection and classification tasks. Unfortunately, executing current Machine Learning algorithms consumes a large amount of power and memory resources, and the qualification of such algorithms remains challenging, which limits their possible applications in space systems. An increase in efficiency in all aspects is thus required to further enable these technologies for space applications. Optimizing the algorithms for System on Chip (SoC) platforms lets them benefit from the best of both a generic processor and hardware acceleration; this more complex processing system should allow broader and more flexible application of these technologies with a minimal increase in power consumption. Since commercial off-the-shelf embedded systems are commonly used in NewSpace applications and such SoCs are not yet available in a qualified form, the deployment of Machine Learning algorithms on these devices has been evaluated. For this deployment, a Convolutional Neural Network model was first optimised on a workstation. The network was then deployed with Xilinx's Vitis AI onto a SoC that combines a powerful generic processor with the hardware programming capabilities of a Field Programmable Gate Array (FPGA). The result was evaluated against relevant performance and efficiency parameters, and a summary is given in this thesis. Additionally, a tool utilising a different approach was developed: using a high-level synthesis tool, the hardware description language of an accelerated-linear-algebra-optimised network is generated and deployed directly into FPGA logic. The implementation of this tool was started and a proof of concept is presented. Existing challenges with the auto-generated code are outlined, along with future steps to automate and improve the entire workflow. As the two workflows are very different and thus target different usage scenarios, both are described and their respective benefits and disadvantages are compared.
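The Vitis AI flow itself is not reproduced here. As a generic illustration of the post-training quantization step such deployment flows perform before targeting FPGA logic, the sketch below maps float32 weights to int8 with a per-tensor scale so that narrow fixed-point multipliers can replace floating-point ones; the symmetric scheme and tensor shape are assumptions.

```python
# Generic post-training quantization sketch (not the Vitis AI API): map float32
# weights to int8 plus a scale factor, then measure the reconstruction error.
import numpy as np

def quantize_symmetric_int8(weights):
    """Return (int8 weights, scale) so that weights ~= q * scale."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_symmetric_int8(w)
err = np.max(np.abs(dequantize(q, scale) - w))
print(f"max absolute quantization error: {err:.5f}")
```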
30

A C to Register Transfer Level Algorithm Using Structured Circuit Templates: A Case Study with Simulated Annealing

Phillips, Jonathan D. 01 December 2008 (has links)
A tool flow is presented for deriving simulated annealing accelerator circuits on a field programmable gate array (FPGA) from C source code by exploring architecture solutions that conform to a preset template through scheduling and mapping algorithms. A case study is described, carried out on the simulated annealing-based Autonomous Mission Planning and Scheduling (AMPS) software used for autonomous spacecraft systems. The goal of the research is an automated method for deriving a hardware design that maximizes performance while minimizing the FPGA footprint. The results are compared with those of a peer C-to-register-transfer-level (RTL) tool, a state-of-the-art space-borne embedded processor, and a commodity desktop processor on a variety of problems. The automatically derived hardware circuits consistently outperform the other methods by one or more orders of magnitude.
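Simulated annealing itself has a simple control structure, which is what such template-based accelerators implement in hardware. The sketch below is a generic software version (not the AMPS code), minimizing a toy quadratic cost; the cooling schedule and neighbour function are assumptions.

```python
# Generic simulated annealing loop: propose a neighbouring solution, always
# accept improvements, and accept worsening moves with a probability that
# decays with temperature (geometric cooling).
import math
import random

def simulated_annealing(cost, neighbour, x0, t0=10.0, cooling=0.995, steps=5000,
                        seed=0):
    rng = random.Random(seed)
    x, t = x0, t0
    best, best_cost = x0, cost(x0)
    for _ in range(steps):
        cand = neighbour(x, rng)
        delta = cost(cand) - cost(x)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = cand
            if cost(x) < best_cost:
                best, best_cost = x, cost(x)
        t *= cooling                        # geometric cooling schedule
    return best, best_cost

cost = lambda x: (x - 3.0) ** 2
neighbour = lambda x, rng: x + rng.uniform(-0.5, 0.5)
print(simulated_annealing(cost, neighbour, x0=-10.0))   # converges near x = 3
```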
