• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 51
  • 14
  • 5
  • 3
  • 3
  • 2
  • 1
  • Tagged with
  • 84
  • 84
  • 18
  • 17
  • 17
  • 14
  • 13
  • 13
  • 12
  • 12
  • 11
  • 10
  • 9
  • 9
  • 9
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Efficient Elliptic Curve Processor Architectures for Field Programmable Logic

Orlando, Gerardo 27 March 2002 (has links)
Elliptic curve cryptosystems offer security comparable to that of traditional asymmetric cryptosystems, such as those based on the RSA encryption and digital signature algorithms, with smaller keys and computationally more efficient algorithms. The ability to use smaller keys and computationally more efficient algorithms than traditional asymmetric cryptographic algorithms are two of the main reasons why elliptic curve cryptography has become popular. As the popularity of elliptic curve cryptography increases, the need for efficient hardware solutions that accelerate the computation of elliptic curve point multiplications also increases. This dissertation introduces elliptic curve processor architectures suitable for the computation of point multiplications for curves defined over fields GF(2^m) and curves defined over fields GF(p). Each of the processor architectures presented here allows designers to tailor the performance and hardware requirements according to their performance and cost goals. Moreover, these architectures are well suited for implementation in modern field programmable gate arrays (FPGAs). This point was proved with prototyped implementations. The fastest prototyped GF(2^m) processor can compute an arbitrary point multiplication for curves defined over fields GF(2^167) in 0.21 milliseconds and the prototyped processor for the field GF(2^192-2^64-1) is capable of computing a point multiplication in about 3.6 milliseconds. The most critical component of an elliptic curve processor is its arithmetic unit. A typical arithmetic unit includes an adder/subtractor, a multiplier, and possibly a squarer. Some of the architectures presented in this work are based on multiplier and squarer architectures developed as part of the work presented in this dissertation. The GF(2^m) least significant bit super-serial multiplier architecture, the GF(2^m) most significant bit super-serial multiplier architecture, and a new GF(p) Montgomery multiplier architecture were developed as part of this work together with a new squaring architecture for GF(2^m).
42

Versatile Montgomery Multiplier Architectures

Gaubatz, Gunnar 30 April 2002 (has links)
Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery's method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption.
43

Low Power and Low complexity Constant Multiplication using Serial Arithmetic

Johansson, Kenny January 2006 (has links)
<p>The main issue in this thesis is to minimize the energy consumption per operation for the arithmetic parts of DSP circuits, such as digital filters. More specific, the focus is on single- and multiple-constant multiplication using serial arithmetic. The possibility to reduce the complexity and energy consumption is investigated. The main difference between serial and parallel arithmetic, which is of interest here, is that a shift operation in serial arithmetic require a flip-flop, while it can be hardwired in parallel arithmetic.</p><p>The possible ways to connect a certain number of adders is limited, i.e., for single-constant multiplication, the number of possible structures is limited for a given number of adders. Furthermore, for each structure there is a limited number of ways to place the shift operations. Hence, it is possible to find the best solution for each constant, in terms of complexity, by an exhaustive search. Methods to bound the search space are discussed. We show that it is possible to save both adders and shifts compared to CSD serial/parallel multipliers. Besides complexity, throughput is also considered by defining structures where the critical path, for bit-serial arithmetic, is no longer than one full adder.</p><p>Two algorithms for the design of multiple-constant multiplication using serial arithmetic are proposed. The difference between the proposed design algorithms is the trade-offs between adders and shifts. For both algorithms, the total complexity is decreased compared to an algorithm for parallel arithmetic.</p><p>The impact of the digit-size, i.e., the number of bits to be processed in parallel, in FIR filters is studied. Two proposed multiple-constant multiplication algorithms are compared to an algorithm for parallel arithmetic and separate realization of the multipliers. The results provide some guidelines for designing low power multiple-constant multiplication algorithms for FIR filters implemented using digit-serial arithmetic.</p><p>A method for computing the number of logic switchings in bit-serial constant multipliers is proposed. The average switching activity in all possible multiplier structures with up to four adders is determined. Hence, it is possible to reduce the switching activity by selecting the best structure for any given constant. In addition, a simplified method for computing the switching activity in constant serial/parallel multipliers is presented. Here it is possible to reduce the energy consumption by selecting the best signed-digit representation of the constant.</p><p>Finally, a data dependent switching activity model is proposed for ripple-carry adders. For most applications, the input data is correlated, while previous estimations assumed un-correlated data. Hence, the proposed method may be included in high-level power estimation to obtain more accurate estimates. In addition, the model can be used as cost function in multiple-constant multiplication algorithms. A modified model based on word-level statistics, which is accurate in estimating the switching activity when real world signals are applied, is also presented.</p> / Report code: LiU-Tek-Lic-2006:30.
44

Design of high-speed summing circuitry and comparator for adaptive parallel multi-level decision feedback equalization

Gao, Hairong 23 June 1997 (has links)
Multi-level decision feedback equalization (MDFE) is an effective sampled signal processing technique to remove inter-symbol interference (ISI) from disk read-back signals. Parallelism which doubles the symbol rate can be realized by utilizing the characteristic of channel response and decision feedback equalization algorithm. A mixed-signal IC implementation has been chosen for the parallel MDFE. The differential current signals from the feedback equalizer are subtracted from the forward equalizer output at the summing node to cancel the non-causal ISI. A high-speed comparator with 6 bit resolution is used after the cancellation to detect the signal which contains no ISI. In this thesis, a description of the parallel MDFE structure and decision feedback equalization algorithm are presented. The design of a high-speed summing circuitry and a high-speed comparator are discussed. The same comparator design is used for the flash analog-to-digital converter (ADC) which generates error signals for adaptation.The circuits design and layout were carried out in an HP 1.2-��m n-well CMOS process. / Graduation date: 1998
45

A Multiprocessor Architecture Using Modular Arithmetic for Very High Precision Computation

Wu, Henry M. 01 April 1989 (has links)
We outline a multiprocessor architecture that uses modular arithmetic to implement numerical computation with 900 bits of intermediate precision. A proposed prototype, to be implemented with off-the-shelf parts, will perform high-precision arithmetic as fast as some workstations and mini- computers can perform IEEE double-precision arithmetic. We discuss how the structure of modular arithmetic conveniently maps into a simple, pipelined multiprocessor architecture. We present techniques we developed to overcome a few classical drawbacks of modular arithmetic. Our architecture is suitable to and essential for the study of chaotic dynamical systems.
46

Improved architectures for a fused floating-point add-subtract unit

Sohn, Jongwook 27 February 2012 (has links)
This report presents improved architecture designs and implementations for a fused floating-point add-subtract unit. The fused floating-point add-subtract unit is useful for DSP applications such as FFT and DCT butterfly operations. To improve the performance of the fused floating-point add-subtract unit, the dual path algorithm and pipelining technique are applied. The proposed designs are implemented for both single and double precision and synthesized with a 45nm standard-cell library. The fused floating-point add-subtract unit saves 40% of the area and power consumption and the dual path fused floating-point add-subtract unit reduces the latency by 30% compared to the traditional discrete floating-point add-subtract unit. By combining fused operation and the dual path design, the proposed floating-point add-subtract unit achieves low area, low power consumption and high speed. Based on the data flow analysis, the proposed fused floating-point add-subtract unit is split into two pipeline stages. Since the latencies of two pipeline stages are fairly well balanced the throughput of the entire logic is increased by 80% compared to the non-pipelined implementation. / text
47

Total delay optimization for column reduction multipliers considering non-uniform arrival times to the final adder

Waters, Ronald S. 26 June 2014 (has links)
Column Reduction Multiplier techniques provide the fastest multiplier designs and involve three steps. First, a partial product array of terms is formed by logically ANDing each bit of the multiplier with each bit of the multiplicand. Second, adders or counters are used to reduce the number of terms in each bit column to a final two. This activity is commonly described as column reduction and occurs in multiple stages. Finally, some form of carry propagate adder (CPA) is applied to the final two terms in order to sum them to produce the final product of the multiplication. Since forming the partial products, in the first step, is simply forming an array of the logical AND's of two bits, there is little opportunity for delay improvement for the first step. There has been much work done in optimizing the reduction stages for column multipliers in the second reduction step. All of the reduction approaches of the second step result in non-uniform arrival times to the input of the final carry propagate adder in the final step. The designs for carry propagate adders have been done assuming that the input bits all have the same arrival time. It is not evident that the non-uniform arrival times from the columns impacts the performance of the multiplier. A thorough analysis of the several column reduction methods and the impact of carry propagate adder designs, along with the column reduction design step, to provide the fastest possible final results, for an array of multiplier widths has not been undertaken. This dissertation investigates the design impact of three carry propagate adders, with different performance attributes, on the final delay results for four column reduction multipliers and suggests general ways to optimize the total delay for the multipliers. / text
48

Versatile architectures for cryptographic systems / Ευέλικτες αρχιτεκτονικές συστημάτων κρυπτογραφίας

Σχοινιανάκης, Δημήτριος 27 May 2014 (has links)
This doctoral thesis approaches the problem of designing versatile architectures for cryptographic hardware. By the term versatile we define hardware architectures capable of supporting a variety of arithmetic operations and algorithms useful in cryptography, with no need to reconfigure the internal interconnections of the integrated circuit. A versatile architecture could offer considerable benefits to the end-user. By embedding a variety of crucial operations in a common architecture, the user is able to switch seamlessly the underlying cryptographic protocols, which not only gives an added value in the design from flexibility but also from practicality point of view. The total cost of a cryptographic application can be also benefited; assuming a versatile integrated circuit which requires no additional circuitry for other vital operations (for example input–output converters) it is easy to deduce that the total cost of development and fabrication of these extra components is eliminated, thus reducing the total production cost. We follow a systematic approach for developing and presenting the proposed versatile architectures. First, an in-depth analysis of the algorithms of interest is carried out, in order to identify new research areas and weaknesses of existing solutions. The proposed algorithms and architectures operate on Galois Fields GF of the form GF(p) for integers and GF(2^n) for polynomials. Alternative number representation systems such as Residue Number System (RNS) for integers and Polynomial Residue Number System (PRNS) for polynomials are employed. The mathematical validity of the proposed algorithms and the applicability of RNS and PRNS in the context of cryptographic algorithms is also presented. The derived algorithms are decomposed in a way that versatile structures can be formulated and the corresponding hardware is developed and evaluated. New cryptanalytic properties of the proposed algorithms against certain types of attacks are also highlighted. Furthermore, we try to approach a fundamental problem in Very Large Scale Integration (VLSI) design, that is the problem of evaluating and comparing architectures using models independent from the underlying fabrication technology. We also provide generic methods to evaluate the optimal operation parameters of the proposed architectures and methods to optimize the proposed architectures in terms of speed, area, and area x speed product, based on the needs of the underlying application. The proposed methodologies can be expanded to include applications other than cryptography. Finally, novel algorithms based on new mathematical and design problems for the crucial operation of modular multiplication are presented. The new algorithms preserve the versatile characteristics discussed previously and it is proved that, along with existing algorithms in the literature, they may forma large family of algorithms applicable in cryptography, unified under the common frame of the proposed versatile architectures. / Η παρούσα διατριβή άπτεται του θέματος της ανάπτυξης ευέλικτων αρχιτεκτονικών κρυπτογραφίας σε ολοκληρωμένα κυκλώματα υψηλής ολοκλήρωσης (VLSI). Με τον όρο ευέλικτες ορίζονται οι αρχιτεκτονικές που δύνανται να υλοποιούν πλήθος βασικών αριθμητικών πράξεων για την εκτέλεση κρυπτογραφικών αλγορίθμων, χωρίς την ανάγκη επαναπροσδιορισμού των εσωτερικών διατάξεων στο ολοκληρωμένο κύκλωμα. Η χρήση ευέλικτων αρχιτεκτονικών παρέχει πολλαπλά οφέλη στο χρήστη. Η ενσωμάτωση κρίσιμων πράξεων απαραίτητων στη κρυπτογραφία σε μια κοινή αρχιτεκτονική δίνει τη δυνατότητα στο χρήστη να εναλλάσσει το υποστηριζόμενο κρυπτογραφικό πρωτόκολλο, εισάγοντας έτσι χαρακτηριστικά ευελιξίας και πρακτικότητας, χωρίς επιπρόσθετη επιβάρυνση του συστήματος σε υλικό. Αξίζει να σημειωθεί πως οι εναλλαγές αυτές δεν απαιτούν τη παρέμβαση του χρήστη. Σημαντική είναι η συνεισφορά μιας ευέλικτης αρχιτεκτονικής και στο κόστος μιας εφαρμογής. Αναλογιζόμενοι ένα ολοκληρωμένο κύκλωμα που μπορεί να υλοποιεί αυτόνομα όλες τις απαραίτητες πράξεις ενός αλγόριθμου χωρίς την εξάρτηση από εξωτερικά υποσυστήματα (π.χ. μετατροπείς εισόδου–εξόδου), είναι εύκολο να αντιληφθούμε πως το τελικό κόστος της εκάστοτε εφαρμογής μειώνεται σημαντικά καθώς μειώνονται οι ανάγκες υλοποίησης και διασύνδεσης επιπρόσθετων υποσυστημάτων στο ολοκληρωμένο κύκλωμα. Η ανάπτυξη των προτεινόμενων αρχιτεκτονικών ακολουθεί μια δομημένη προσέγγιση. Διενεργείται εκτενής μελέτη για τον προσδιορισμό γόνιμων ερευνητικών περιοχών και εντοπίζονται προβλήματα και δυνατότητες βελτιστοποίησης υπαρχουσών κρυπτογραφικών λύσεων. Οι νέοι αλγόριθμοι που αναπτύσσονται αφορούν τα Galois πεδία GF(p) και GF(2^n) και χρησιμοποιούν εναλλακτικές αριθμητικές αναπαράστασης δεδομένων όπως το αριθμητικό σύστημα υπολοίπων (Residue Number System (RNS)) για ακέραιους αριθμούς και το πολυωνυμικό αριθμητικό σύστημα υπολοίπων (Polynomial Residue Number System (PRNS)) για πολυώνυμα. Αποδεικνύεται η μαθηματική τους ορθότητα και βελτιστοποιούνται κατά τέτοιο τρόπο ώστε να σχηματίζουν ευέλικτες δομές. Αναπτύσσεται το κατάλληλο υλικό (hardware) και διενεργείται μελέτη χρήσιμων ιδιοτήτων των νέων αλγορίθμων, όπως για παράδειγμα νέες κρυπταναλυτικές ιδιότητες. Επιπρόσθετα, προσεγγίζουμε στα πλαίσια της διατριβής ένα βασικό πρόβλημα της επιστήμης σχεδιασμού ολοκληρωμένων συστημάτων μεγάλης κλίμακας (Very Large Scale Integration (VLSI)). Συγκεκριμένα, προτείνονται μέθοδοι σύγκρισης αρχιτεκτονικών ανεξαρτήτως τεχνολογίας καθώς και τρόποι εύρεσης των βέλτιστων συνθηκών λειτουργίας των προτεινόμενων αρχιτεκτονικών. Οι μέθοδοι αυτές επιτρέπουν στον σχεδιαστή να παραμετροποιήσει τις προτεινόμενες αρχιτεκτονικές με βάση τη ταχύτητα, επιφάνεια, ή το γινόμενο ταχύτητα x επιφάνεια. Οι προτεινόμενες μεθοδολογίες μπορούν εύκολα να επεκταθούν και σε άλλες εφαρμογές πέραν της κρυπτογραφίας. Τέλος, προτείνονται νέοι αλγόριθμοι για τη σημαντικότατη για την κρυπτογραφία πράξη του πολλαπλασιασμού με υπόλοιπα. Οι νέοι αλγόριθμοι ενσωματώνουν από τη μία τις ιδέες των ευέλικτων δομών, από την άλλη όμως βασίζονται σε νέες ιδέες και μαθηματικά προβλήματα τα οποία προσπαθούμε να προσεγγίσουμε και να επιλύσουμε. Αποδεικνύεται πως είναι δυνατή η ενοποίηση μιας μεγάλης οικογένειας αλγορίθμων για χρήση στην κρυπτογραφία, υπό τη στέγη των προτεινόμενων μεθοδολογιών για ευέλικτο σχεδιασμό.
49

A COMPREHENSIVE HDL MODEL OF A LINE ASSOCIATIVE REGISTER BASED ARCHITECTURE

Sparks, Matthew A. 01 January 2013 (has links)
Modern processor architectures suffer from an ever increasing gap between processor and memory performance. The current memory-register model attempts to hide this gap by a system of cache memory. Line Associative Registers(LARs) are proposed as a new system to avoid the memory gap by pre-fetching and associative updating of both instructions and data. This thesis presents a fully LAR-based architecture, targeting a previously developed instruction set architecture. This architecture features an execution pipeline supporting SWAR operations, and a memory system supporting the associative behavior of LARs and lazy writeback to memory.
50

Low Power and Low complexity Constant Multiplication using Serial Arithmetic

Johansson, Kenny January 2006 (has links)
The main issue in this thesis is to minimize the energy consumption per operation for the arithmetic parts of DSP circuits, such as digital filters. More specific, the focus is on single- and multiple-constant multiplication using serial arithmetic. The possibility to reduce the complexity and energy consumption is investigated. The main difference between serial and parallel arithmetic, which is of interest here, is that a shift operation in serial arithmetic require a flip-flop, while it can be hardwired in parallel arithmetic. The possible ways to connect a certain number of adders is limited, i.e., for single-constant multiplication, the number of possible structures is limited for a given number of adders. Furthermore, for each structure there is a limited number of ways to place the shift operations. Hence, it is possible to find the best solution for each constant, in terms of complexity, by an exhaustive search. Methods to bound the search space are discussed. We show that it is possible to save both adders and shifts compared to CSD serial/parallel multipliers. Besides complexity, throughput is also considered by defining structures where the critical path, for bit-serial arithmetic, is no longer than one full adder. Two algorithms for the design of multiple-constant multiplication using serial arithmetic are proposed. The difference between the proposed design algorithms is the trade-offs between adders and shifts. For both algorithms, the total complexity is decreased compared to an algorithm for parallel arithmetic. The impact of the digit-size, i.e., the number of bits to be processed in parallel, in FIR filters is studied. Two proposed multiple-constant multiplication algorithms are compared to an algorithm for parallel arithmetic and separate realization of the multipliers. The results provide some guidelines for designing low power multiple-constant multiplication algorithms for FIR filters implemented using digit-serial arithmetic. A method for computing the number of logic switchings in bit-serial constant multipliers is proposed. The average switching activity in all possible multiplier structures with up to four adders is determined. Hence, it is possible to reduce the switching activity by selecting the best structure for any given constant. In addition, a simplified method for computing the switching activity in constant serial/parallel multipliers is presented. Here it is possible to reduce the energy consumption by selecting the best signed-digit representation of the constant. Finally, a data dependent switching activity model is proposed for ripple-carry adders. For most applications, the input data is correlated, while previous estimations assumed un-correlated data. Hence, the proposed method may be included in high-level power estimation to obtain more accurate estimates. In addition, the model can be used as cost function in multiple-constant multiplication algorithms. A modified model based on word-level statistics, which is accurate in estimating the switching activity when real world signals are applied, is also presented. / Report code: LiU-Tek-Lic-2006:30.

Page generated in 0.0837 seconds