Return to search

Hardware accelerators for post-quantum cryptography and fully homomorphic encryption

With the monetization of user data, data breaches have become very common these days. In the past five years, there were more than 7000 data breaches involving theft of personal information of billions of people. In the year 2020 alone, the global average cost per data breach was $3.86 million, and this number rose to $4.24 million in 2021. Therefore, the need for maintaining data security and privacy is becoming increasingly critical. Over the years, various data encryption schemes including RSA, ECC, and AES are being used to enable data security and privacy. However, these schemes are deemed vulnerable to quantum computers with their enormous processing power. As quantum computers are expected to become main stream in the near future, post-quantum secure encryption schemes are required. To this end, through
NIST’s standardization efforts, code-based and lattice-based encryption schemes have emerged as one of the plausible way forward. Both code-based and lattice-based encryption schemes enable public key cryptosystems, key exchange mechanisms, and digital signatures. In addition, lattice-based encryption schemes support fully homomorphic encryption (FHE) that enables computation on encrypted data.

Over the years, there have been several efforts to design efficient FPGA-based and ASIC-based solutions for accelerating the code-based and lattice-based encryption schemes. The conventional code-based McEliece cryptosystem uses binary Goppa code, which has good code rate and error correction capability, but suffers from high encoding and decoding complexity. Moreover, the size of the generated public key is in several MBs, leading to cryptosystem designs that cannot be accommodated on low-end FPGAs. In lattice-based encryption schemes, large polynomial ring operations
form the core compute kernel and remain a key challenge for many hardware designers. To extend support for large modular arithmetic operations on an FPGA, while incurring low latency and hardware resource utilization requires substantial design efforts. Moreover, prior FPGA solutions for lattice-based FHE include hardware acceleration of basic FHE primitives for impractical parameter sets without the support for bootstrapping operation that is critical to building real-time privacy-preserving applications. Similarly, prior ASIC proposals of FHE that include bootstrapping are heavily memory bound, leading to large execution times, underutilized compute resources, and cost millions of dollars.

To respond to these challenges, in this dissertation, we focus on the design of efficient hardware accelerators for code-based and lattice-based public key cryptosystems (PKC). For code-based PKC, we propose the design of a fully-parameterized en/decryption co-processor based on a new variant of McEliece cryptosystem. This co-processor takes advantage of the non-binary Orthogonal Latin Square Code (OLSC) to achieve a lower computational complexity along with smaller key size than that of the binary Goppa code. Our FPGA-based implementation of the co-processor is ∼3.5× faster than an existing classic McEliece cryptosystem implementation. For lattice-based PKC, we propose the design of a co-processor that implements large polynomial ring operations. It uses a fully-pipelined NTT polynomial multiplier to perform fast polynomial multiplications. We also propose the design of a highly-optimized Gaussian noise sampler, capable of sampling millions of high-precision samples per second. Through an FPGA-based implementation of this lattice-based PKC co-processor, we
achieve a speedup of 6.5× while utilizing 5× less hardware resources as compared to state-of-the-art implementations.

Leveraging our work on lattice-based PKC implementation, we explore the design of hardware accelerators that perform FHE operations using Cheon-Kim-Kim-Song (CKKS) scheme. Here, we first perform an in-depth architectural analysis of various FHE operations in the CKKS scheme so as to explore ways to accelerate an end-to-end FHE application. For this analysis, we develop a custom architecture modeling tool, SimFHE, to measure the compute and memory bandwidth requirements of hardware-accelerated CKKS. Our analysis using SimFHE reveals that, without a prohibitively large cache, all FHE operations exhibit low arithmetic intensity (<1 Op/byte). To address the memory bottleneck resulting from the low arithmetic intensity, we propose several memory-aware design (MAD) techniques, including caching and algorithmic
optimizations, to reduce the memory requirements of CKKS-based application execution. We show that the use of our MAD techniques can yield an ASIC design that is at least 5-10× cheaper than the large-cache proposals, but only ∼2-3× slower.

We also design FAB, an FPGA-based accelerator for bootstrappable FHE. FAB, for the first time ever, accelerates bootstrapping (along with basic FHE primitives) on an FPGA for a secure and practical parameter set. FAB tackles the memory-bounded nature of bootstrappable FHE through judicious datapath modification, smart operation scheduling, and on-chip memory management techniques to maximize the overall FHE-based compute throughput. FAB outperforms all prior CPU/GPU works by 9.5× to 456× and provides a practical performance for our target application: secure
training of logistic regression models. / 2025-01-16T00:00:00Z

Identiferoai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/45456
Date16 January 2023
CreatorsAgrawal, Rashmi
ContributorsJoshi, Ajay J.
Source SetsBoston University
Languageen_US
Detected LanguageEnglish
TypeThesis/Dissertation

Page generated in 0.0026 seconds