Modern public-key cryptography relies extensively on modular multiplication with long operands. We investigate opportunities to optimize this operation on a heterogeneous multiprocessing platform such as the TI OMAP3530. By migrating the long-operand modular multiplication from a general-purpose ARM Cortex-A8 to a specialized C64x+ VLIW DSP, we are able to exploit the XOR-Multiply instruction and the inherent parallelism of the DSP. The proposed multiplication uses Multi-Precision Binary Polynomial Multiplication with Unbalanced Exponent Modular Reduction. The resulting DSP implementation performs a GF(2^233) multiplication in less than 1.31 µs, a speedup of more than seven times over the ARM implementation on the same chip. We present several strategies for different field sizes and field polynomials, and show that a 360 MHz DSP easily outperforms the 500 MHz ARM. / Master of Science
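As an illustration of the operation the abstract describes, the following minimal Python sketch multiplies two elements of GF(2^233) in two steps: a carry-less (XOR-based) polynomial multiplication followed by a modular reduction. It is not the thesis's DSP implementation. The field polynomial x^233 + x^74 + 1 is an assumption (the abstract does not name one), the function names are hypothetical, and the simple folding reduction shown here stands in for, and should not be read as, the Unbalanced Exponent Modular Reduction named above.

```python
M = 233  # field degree, from the abstract
K = 74   # middle exponent; ASSUMPTION: trinomial x^233 + x^74 + 1


def clmul(a: int, b: int) -> int:
    """Carry-less multiply: polynomial product over GF(2), bits are coefficients."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def reduce_trinomial(c: int, m: int = M, k: int = K) -> int:
    """Reduce c modulo x^m + x^k + 1 by folding the high part back down."""
    mask = (1 << m) - 1
    while c >> m:
        hi = c >> m
        # x^m is congruent to x^k + 1, so hi * x^m folds to (hi << k) ^ hi
        c = (c & mask) ^ hi ^ (hi << k)
    return c


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements: polynomial product, then reduction."""
    return reduce_trinomial(clmul(a, b))
```

Python integers are used here for clarity only; a word-level implementation would split each 233-bit operand into machine words, which is where a hardware XOR-multiply instruction and multi-precision scheduling come into play.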
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/33693 |
Date | 08 July 2009 |
Creators | Tergino, Christian Sean |
Contributors | Electrical and Computer Engineering, Schaumont, Patrick R., Hsiao, Michael S., Feng, Wu-chun |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Relation | Thesis.pdf |