21

Hardware accelerators for post-quantum cryptography and fully homomorphic encryption

Agrawal, Rashmi 16 January 2023 (has links)
With the monetization of user data, data breaches have become very common. In the past five years, there were more than 7,000 data breaches involving the theft of personal information of billions of people. In 2020 alone, the global average cost per data breach was $3.86 million, and this number rose to $4.24 million in 2021. The need to maintain data security and privacy is therefore becoming increasingly critical. Over the years, various encryption schemes, including RSA, ECC, and AES, have been used to provide data security and privacy. However, these schemes are vulnerable to quantum computers and their enormous processing power. As quantum computers are expected to become mainstream in the near future, post-quantum secure encryption schemes are required. To this end, through NIST's standardization efforts, code-based and lattice-based encryption schemes have emerged as plausible ways forward. Both code-based and lattice-based encryption schemes enable public key cryptosystems, key exchange mechanisms, and digital signatures. In addition, lattice-based encryption schemes support fully homomorphic encryption (FHE), which enables computation on encrypted data.

Over the years, there have been several efforts to design efficient FPGA-based and ASIC-based solutions for accelerating code-based and lattice-based encryption schemes. The conventional code-based McEliece cryptosystem uses binary Goppa codes, which have a good code rate and error-correction capability but suffer from high encoding and decoding complexity. Moreover, the generated public key is several megabytes in size, leading to cryptosystem designs that cannot be accommodated on low-end FPGAs. In lattice-based encryption schemes, large polynomial ring operations form the core compute kernel and remain a key challenge for many hardware designers. Supporting large modular arithmetic operations on an FPGA while keeping latency and hardware resource utilization low requires substantial design effort. Moreover, prior FPGA solutions for lattice-based FHE accelerate basic FHE primitives for impractical parameter sets, without support for the bootstrapping operation that is critical to building real-time privacy-preserving applications. Similarly, prior ASIC proposals for FHE that include bootstrapping are heavily memory bound, leading to long execution times and underutilized compute resources, and cost millions of dollars.

To respond to these challenges, this dissertation focuses on the design of efficient hardware accelerators for code-based and lattice-based public key cryptosystems (PKC). For code-based PKC, we propose a fully parameterized en/decryption co-processor based on a new variant of the McEliece cryptosystem. This co-processor takes advantage of the non-binary Orthogonal Latin Square Code (OLSC) to achieve lower computational complexity and a smaller key size than binary Goppa codes. Our FPGA-based implementation of the co-processor is ∼3.5× faster than an existing classic McEliece implementation. For lattice-based PKC, we propose a co-processor that implements large polynomial ring operations. It uses a fully pipelined NTT polynomial multiplier to perform fast polynomial multiplications. We also propose a highly optimized Gaussian noise sampler capable of producing millions of high-precision samples per second. Through an FPGA-based implementation of this lattice-based PKC co-processor, we achieve a speedup of 6.5× while using 5× fewer hardware resources than state-of-the-art implementations.

Leveraging our work on lattice-based PKC, we explore hardware accelerators for FHE operations in the Cheon-Kim-Kim-Song (CKKS) scheme. We first perform an in-depth architectural analysis of the various FHE operations in CKKS to explore ways to accelerate an end-to-end FHE application. For this analysis, we develop a custom architecture-modeling tool, SimFHE, to measure the compute and memory bandwidth requirements of hardware-accelerated CKKS. Our analysis using SimFHE reveals that, without a prohibitively large cache, all FHE operations exhibit low arithmetic intensity (<1 op/byte). To address the resulting memory bottleneck, we propose several memory-aware design (MAD) techniques, including caching and algorithmic optimizations, that reduce the memory requirements of CKKS-based application execution. We show that our MAD techniques can yield an ASIC design that is at least 5-10× cheaper than large-cache proposals while being only ∼2-3× slower. We also design FAB, an FPGA-based accelerator for bootstrappable FHE. FAB, for the first time, accelerates bootstrapping (along with the basic FHE primitives) on an FPGA for a secure and practical parameter set. FAB tackles the memory-bound nature of bootstrappable FHE through judicious datapath modification, smart operation scheduling, and on-chip memory management to maximize overall FHE compute throughput. FAB outperforms all prior CPU/GPU work by 9.5× to 456× and provides practical performance for our target application: secure training of logistic regression models.
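The polynomial ring arithmetic described above reduces to negacyclic multiplication in Z_q[x]/(x^n + 1), which an NTT turns into cheap pointwise products. Below is a minimal Python sketch with toy parameters (n = 4, q = 17, ψ = 2), chosen for illustration only; an accelerator like the one described would use a fully pipelined O(n log n) butterfly NTT with production-size n and q, whereas this model uses a quadratic-time transform for readability.

```python
# Negacyclic polynomial multiplication in Z_q[x]/(x^n + 1) via an
# NTT-style transform. Toy parameters, not the dissertation's.
n, q = 4, 17
psi = 2                     # primitive 2n-th root of unity mod q
inv_n = pow(n, -1, q)       # n^-1 mod q (requires Python 3.8+)

def ntt(a):
    # Evaluate a(x) at the odd powers psi^(2j+1); O(n^2) for clarity.
    return [sum(a[i] * pow(psi, (2 * j + 1) * i, q) for i in range(n)) % q
            for j in range(n)]

def intt(A):
    # Inverse transform, including the 1/n scaling.
    return [inv_n * sum(A[j] * pow(psi, -(2 * j + 1) * i, q)
                        for j in range(n)) % q for i in range(n)]

def polymul(a, b):
    # Pointwise products in the NTT domain = negacyclic convolution.
    return intt([x * y % q for x, y in zip(ntt(a), ntt(b))])

def naive(a, b):
    # Schoolbook multiplication reduced mod x^n + 1, for cross-checking.
    c = [0] * n
    for i in range(n):
        for j in range(n):
            s = -1 if i + j >= n else 1        # x^n wraps to -1
            c[(i + j) % n] = (c[(i + j) % n] + s * a[i] * b[j]) % q
    return c

assert polymul([1, 2, 3, 4], [5, 6, 7, 8]) == naive([1, 2, 3, 4], [5, 6, 7, 8])
```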
22

A Data Sorting Hardware Accelerator on FPGA

Liu, Boyan January 2020 (has links)
In recent years, with the rise of big-data applications, efficiency has become more important in data processing, and simple sorting methods require higher stability and efficiency in large-scale scenarios. This thesis explores hardware acceleration of sorting networks for massive inputs and data streams, which leads to three design approaches: running the whole pipeline in software (sorting and merging on a PC), a combination of the PC and a field-programmable gate array (FPGA) platform (hardware sorting with software merging), and a fully hardware design (sorting and merging on the FPGA). Parallel hardware sorters have been proposed before, but they usually do not account for the loading and off-loading of data often being serial in nature. In this analysis, we explore an insertion-sort solution that can sort data in the same clock cycle as it is written to the sorter and compare it with standard parallel sorters. The main contributions of this thesis are techniques for accelerating the sorting of large data streams, spanning a fully software design, a hardware/software co-design, and a fully hardware design on a reconfigurable FPGA platform. The experimental results largely match our predictions, and we show that insertion sort implemented in hardware can improve data processing speed for small input sizes.
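To make the sorter concrete, here is a minimal behavioral model of the insertion-sort approach described above: a chain of compare-and-shift cells that absorbs one element per clock cycle. In hardware all cells compare and shift in parallel, so each push corresponds to a single cycle; this sequential Python model only reproduces the resulting register state. The class name, depth parameter, and overflow policy are illustrative assumptions, not taken from the thesis.

```python
import bisect

class StreamingInsertionSorter:
    def __init__(self, depth):
        self.depth = depth   # number of register cells in the chain
        self.regs = []       # register contents, always sorted

    def push(self, x):
        # One clock cycle: every cell compares against x in parallel and
        # shifts; modeled here as a single sorted insert.
        bisect.insort(self.regs, x)
        if len(self.regs) > self.depth:
            self.regs.pop()  # overflow falls off the end of the chain

sorter = StreamingInsertionSorter(depth=8)
for v in [5, 1, 9, 3, 7]:
    sorter.push(v)
print(sorter.regs)           # [1, 3, 5, 7, 9]
```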
23

A Memory-Array Centric Reconfigurable Hardware Accelerator for Security Applications

Babecki, Christopher 03 June 2015 (has links)
No description available.
24

A SINDy Hardware Accelerator For Efficient System Identification On Edge Devices

Gallagher, Michael Sean 01 March 2024 (has links) (PDF)
The SINDy (Sparse Identification of Nonlinear Dynamics) algorithm turns a set of data representing nonlinear dynamics into a much smaller set of equations, each a sum of nonlinear functions. This provides a human-readable model of the dynamic system analyzed. The SINDy algorithm is important for a variety of applications, including high-precision industrial and robotic applications, and a hardware accelerator was designed to decrease the time spent on its calculations. This thesis proposes an efficient hardware-accelerator approach for a broad range of applications that use SINDy and similar system identification algorithms. The accelerator leverages systolic arrays to integrate neural network models with other numerical solvers. The novel and efficient reuse of similar processing elements keeps the footprint minimal, so the design can be added to microcontroller devices or implemented on lower-cost FPGAs. Our approach also allows the designer to offload calculations from controller nodes onto edge devices and requires less communication from those edge devices to the controller thanks to the reduced equation space.
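For context, the sparse regression at the core of SINDy is commonly implemented as sequentially thresholded least squares (STLSQ): fit, prune coefficients below a threshold, and refit on the surviving library terms. The sketch below is a generic software reference for the computation such an accelerator would offload, with illustrative names, defaults, and test data; it is not the thesis's hardware datapath.

```python
import numpy as np

def stlsq(theta, dxdt, lam=0.1, iters=10):
    # theta: candidate-function library evaluated on the data (m x p)
    # dxdt:  measured state derivatives (m x d)
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < lam              # prune negligible terms
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):        # refit each state equation
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k],
                                             rcond=None)[0]
    return xi                                 # sparse coefficient matrix

# Toy 1-D example: library columns [1, x, x^2, sin(x)], true dynamics
# dx/dt = 0.5 x - 1.2 x^2, so STLSQ should recover ~[0, 0.5, -1.2, 0].
x = np.linspace(-2, 2, 200).reshape(-1, 1)
theta = np.hstack([np.ones_like(x), x, x**2, np.sin(x)])
dxdt = 0.5 * x - 1.2 * x**2
print(stlsq(theta, dxdt, lam=0.05).ravel())
```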
25

Attacks and Vulnerabilities of Hardware Accelerators for Machine Learning: Degrading Accuracy Over Time by Hardware Trojans

Niklasson, Marcus, Uddberg, Simon January 2024 (has links)
The increasing application of Neural Networks (NNs) in various fields has heightened the demand for specialized hardware to enhance performance and efficiency. Field-Programmable Gate Arrays (FPGAs) have emerged as a popular choice for implementing NN accelerators due to their flexibility, high performance, and ability to be customized for specific NN architectures. However, the trend of outsourcing Integrated Circuit (IC) design to third parties has introduced new security vulnerabilities, particularly in the form of Hardware Trojans (HTs). These malicious alterations can severely compromise the integrity and functionality of NN accelerators. Building on this, this study investigates a novel type of HT that degrades the accuracy of Convolutional Neural Network (CNN) accelerators over time. Two variants of the attack are presented, the Gradually Degrading Accuracy Trojan (GDAT) and the Suddenly Degrading Accuracy Trojan (SDAT), implemented in various components of the CNN accelerator. The approach uses a sensitivity analysis to identify the most impactful targets for the trojan and evaluates the attack's effectiveness in terms of stealthiness, hardware overhead, and impact on accuracy. The overhead of the attacks was found to be competitive with other trojans, and the attack has the potential to undermine trust and cause economic damage if deployed. Of the components targeted, the feature-map memory was identified as the most vulnerable, closely followed by the bias memory. The feature-map trojans caused a significant accuracy degradation of 78.16% with a 0.15% and 0.29% increase in look-up table (LUT) utilization for the SDAT and GDAT variants, respectively. In comparison, the bias trojans caused an accuracy degradation of 63.33% with LUT utilization increases of 0.20% and 0.33% for the respective variants. The power consumption overhead was a consistent 0.16% across both attacks and trojan versions.
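To make the trigger mechanics concrete, the following is a purely illustrative software model of such a time-based trojan: a counter of completed inferences arms the payload, after which reads from the infected memory return corrupted words, all at once (SDAT-like) or with rising probability (GDAT-like). All thresholds, masks, and names here are invented; the actual trojans are RTL modifications inside the accelerator's feature-map and bias memories.

```python
import random

def trojaned_read(word, inference_count, sudden=True,
                  trigger=100_000, ramp=50_000):
    # Dormant phase: behave bit-exactly so functional tests pass.
    if inference_count < trigger:
        return word
    if sudden:
        return word ^ 0b1000_0000         # SDAT-like: corrupt immediately
    # GDAT-like: corruption probability ramps from 0 to 1 over `ramp`
    # further inferences, degrading accuracy gradually.
    p = min(1.0, (inference_count - trigger) / ramp)
    return word ^ 0b1000_0000 if random.random() < p else word
```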
26

In Support of High Quality 3-D Ultrasound Imaging for Hand-held Devices

January 2015 (has links)
Three-dimensional (3-D) ultrasound is safe and inexpensive, and has been shown to drastically improve system ease of use, diagnostic efficiency, and patient throughput. However, its high computational complexity and resulting high power consumption have precluded its use in hand-held applications. In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound a reality are presented. First, image enhancement methods to improve the signal-to-noise ratio (SNR) are proposed. These include virtual-source firing techniques and a low-overhead digital front-end architecture using orthogonal chirps and orthogonal Golay codes. Second, algorithm-architecture co-design techniques to reduce the power consumption of 3-D synthetic aperture ultrasound (SAU) imaging systems are presented. These include (i) a subaperture multiplexing strategy and the corresponding apodization method to alleviate the signal bandwidth bottleneck, and (ii) a highly efficient iterative delay calculation method that eliminates complex operations such as multiplications, divisions, and square roots in delay calculation during beamforming. These techniques were used to define Sonic Millip3De, a 3-D die-stacked architecture for digital beamforming in SAU systems. Sonic Millip3De produces high-resolution 3-D images at 2 frames per second with a system power consumption of 15 W in 45 nm technology. Third, a new beamforming method based on separable delay decomposition is proposed to reduce the computational complexity of the beamforming unit in an SAU system. The method is based on minimizing the root-mean-square error (RMSE) due to delay decomposition. It reduces the beamforming complexity of an SAU system by 19× while providing image fidelity comparable to non-separable beamforming. The resulting modified Sonic Millip3De architecture supports a frame rate of 32 volumes per second while maintaining power consumption of 15 W in 45 nm technology. Finally, a 3-D plane-wave imaging system that utilizes both separable beamforming and coherent compounding is presented. The resulting system has computational complexity comparable to a non-separable, non-compounding baseline while significantly improving contrast-to-noise ratio and SNR. The modified Sonic Millip3De architecture is now capable of generating high-resolution images at 1000 volumes per second with 9-fire-angle compounding.
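For reference, the baseline focusing-delay computation that the iterative method above streamlines looks like the sketch below: one square root per channel per focal point, which dominates the beamformer's arithmetic and is exactly what incremental add/compare updates eliminate. The geometry (a plane-wave transmit plus two-way receive path) and all names and constants are illustrative assumptions, not the dissertation's exact formulation.

```python
import math

def focus_delay(xf, zf, xe, c=1540.0):
    # Two-way time of flight to focal point (xf, zf) for an element at
    # lateral position xe: plane-wave transmit (zf / c) plus the
    # receive path, whose square root the iterative method removes.
    return (zf + math.sqrt(zf**2 + (xf - xe)**2)) / c

def delay_and_sum(rf, elem_x, xf, zf, fs=40e6, c=1540.0):
    # rf[ch][n]: sampled echoes per channel; sum each channel's sample
    # at its focusing delay to beamform one focal point.
    out = 0.0
    for ch, xe in enumerate(elem_x):
        n = int(round(focus_delay(xf, zf, xe, c) * fs))
        if n < len(rf[ch]):
            out += rf[ch][n]
    return out
```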
27

Testing and evaluation of the integratability of the Senior processor / Testning och evaluering av Senior processorns integrerbarhet

Hedin, Alexander January 2011 (has links)
The first version of the Senior processor was created as part of a thesis project in 2007. This processor was completed and used for educational purposes at Linköping University. In 2008, several parts of the processor were optimized and the processor was expanded with additional functionality as part of another thesis project. In 2009, an EU-funded project called MULTI-BASE started, in which the Computer Engineering Division at the Department of Electrical Engineering participated. For their part of the MULTI-BASE project, the Senior processor was selected. After continuous revision and development, this processor was sent for manufacturing. The assignment of this thesis project was to test and verify the different functions implemented in the Senior processor. To do this, a PCB was developed for testing the Senior processor together with a Virtex-4 FPGA. Extensive testing was done on the most important functions of the Senior processor. These tests showed that the manufactured Senior processor works as designed and that it can, on its own, perform larger calculations and use external hardware accelerators through its various interfaces.
28

Hardware Architecture of an XML/XPath Broker/Router for Content-Based Publish/Subscribe Data Dissemination Systems

El-Hassan, Fadi 25 February 2014 (has links)
The dissemination of various types of data faces ongoing challenges with the growing need to access manifold information. Since interest in content is what drives data networks, new technologies attempt to cope with these challenges by developing content-based rather than address-based architectures. The publish/subscribe paradigm is a promising approach to content-based data dissemination, especially as it provides total decoupling between publishers and subscribers. However, in content-based publish/subscribe systems, subscriptions are expressive and information is delivered based on matched expressive content, which poses considerable performance challenges. This dissertation explores a hardware solution for disseminating data in content-based publish/subscribe systems. The solution consists of an efficient hardware architecture of an XML/XPath broker that can route information based on content either to other XML/XPath brokers or to end users. A network of such brokers forms an overlay structure for XML content-based publish/subscribe data dissemination systems. Each broker can simultaneously process many XPath subscriptions, efficiently parse XML publications, and forward the notifications that result from high-performance matching. At the core of the broker architecture lies an XML parser that utilizes a novel Skeleton CAM-Based XML Parsing (SCBXP) technique, alongside an XPath processor and a high-performance matching engine. Moreover, the broker employs effective mechanisms for content-based routing, so that subscriptions, publications, and notifications are routed through the network based on content. The inherent reconfigurability of the broker's hardware allows the architecture to reside in any FPGA device of moderate logic density, and the system-on-chip architecture is upgradable should future hardware add-ons be needed. The architecture is nonetheless mature and can effectively be implemented as an ASIC. Finally, this thesis presents and analyzes experiments conducted on an FPGA prototype of the proposed broker/router. The experiments cover tests of the SCBXP alone and of two development phases of the whole broker, and the results indicate the high performance that the parsing, storing, matching, and routing processes can achieve.
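As a software-level illustration of the broker's matching step, the sketch below checks the simplest subscription class, linear XPath expressions such as /a/b/c, against a streamed publication using a path stack. It only mirrors the input/output behavior of the hardware SCBXP parser and matching engine; the subscription format and names are assumptions for illustration.

```python
import xml.sax

class LinearPathMatcher(xml.sax.ContentHandler):
    def __init__(self, subscriptions):
        # Index linear XPath subscriptions by their element-path tuple.
        self.subs = {tuple(s.strip("/").split("/")): s for s in subscriptions}
        self.path, self.matches = [], []

    def startElement(self, name, attrs):
        self.path.append(name)
        hit = self.subs.get(tuple(self.path))
        if hit:
            self.matches.append(hit)   # would trigger a notification

    def endElement(self, name):
        self.path.pop()

handler = LinearPathMatcher(["/catalog/book/title", "/catalog/price"])
xml.sax.parseString(b"<catalog><book><title>t</title></book>"
                    b"<price>9</price></catalog>", handler)
print(handler.matches)   # ['/catalog/book/title', '/catalog/price']
```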
29

Génération rapide d'accélerateurs matériels par synthèse d'architecture sous contraintes de ressources / High-level synthesis for fast generation of hardware accelerators under resource constraints

Prost-Boucle, Adrien 08 January 2014 (has links)
In the field of high-performance computing, FPGAs are attractive for their performance and low power consumption. However, their presence remains marginal, mainly because of the limitations of current development tools, which force users to master numerous technical concepts and to manually steer the synthesis process in order to obtain solutions that are both fast and compliant with the hardware constraints of the targeted platforms. A novel generation methodology based on high-level synthesis is proposed to push back these limits. Design space exploration consists of iteratively applying transformations to an initial circuit, progressively increasing its speed and its resource consumption; the rapidity of this process and its convergence under resource constraints are thus guaranteed. The exploration is also guided towards the most pertinent solutions by detecting the sections of the application that are most critical for the actual execution context; this information can be refined through an execution scenario supplied by the user. A demonstration tool for this methodology, AUGH, has been built, and experiments have been conducted on several applications well known in the high-level synthesis field. Of very different sizes, these applications confirm the pertinence of the proposed methodology for the fast and autonomous generation of complex hardware accelerators under strict resource constraints. The proposed methodology is very close to the compilation flow for microprocessors, which makes it usable even by those who are not experts in digital circuit design. This work thus constitutes a significant step towards broader adoption of FPGAs as general-purpose hardware accelerators, making computing machines both faster and more energy efficient.
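The exploration loop lends itself to a compact sketch: starting from a minimal circuit, greedily apply the transformation with the best estimated benefit that still fits the resource budget, and stop when nothing fits, which guarantees convergence under the constraint. The transformation names, cost numbers, and selection heuristic below are invented for illustration and are far simpler than AUGH's actual transformations and cost models.

```python
def explore(latency, luts, budget, transforms):
    # transforms: (name, speedup_factor, lut_cost) estimates
    applied = []
    while True:
        fits = [t for t in transforms if luts + t[2] <= budget and t[1] > 1.0]
        if not fits:
            return latency, luts, applied      # converged under budget
        # Greedy choice: best speedup per LUT spent.
        name, sp, cost = max(fits, key=lambda t: (t[1] - 1.0) / t[2])
        latency, luts = latency / sp, luts + cost
        applied.append(name)
        transforms = [t for t in transforms if t[0] != name]  # apply once

print(explore(latency=1e6, luts=4000, budget=6000, transforms=[
    ("unroll_inner_loop", 3.0, 1200),
    ("duplicate_multiplier", 1.8, 900),
    ("pipeline_memory_port", 1.3, 150)]))
```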
