Return to search

Fused floating-point arithmetic for DSP

Floating-point arithmetic is attractive for the implementation for a variety of Digital Signal Processing (DSP) applications because it allows the designer and user to concentrate on the algorithms and architecture without worrying about numerical issues. In the past, many DSP applications used fixed point arithmetic due to the high cost (in delay, silicon area, and power consumption) of floating-point arithmetic units. In the realization of modern general purpose processors, fused floating-point multiply add units have become attractive since their delay and silicon area is often less than that of a discrete floating-point multiplier followed by a floating point adder. Further the accuracy is improved by the fused implementation since rounding is performed only once (after the multiplication and addition). This work extends the consideration of fused floating-point arithmetic to operations that are frequently encountered in DSP. The Fast Fourier Transform is a case in point since it uses a complex butterfly operation. For a radix-2 implementation, the butterfly consists of a complex multiply and the complex addition and subtraction of the same pair of data. For a radix-4 implementation, the butterfly consists of three complex multiplications and eight complex additions and subtractions. Both of these butterfly operations can be implemented with two fused primitives, a fused two-term dot-product unit and a fused add-subtract unit. The fused two-term dot-product multiplies two sets of operands and adds the products as a single operation. The two products do not need to be rounded (only the sum is normalized and rounded) which reduces the delay by about 15% while reducing the silicon area by about 33%. For the add-subtract unit, much of the complexity of a discrete implementation comes from the need to compare the operand exponents and align the significands prior to the add and the subtract operations. For the fused implementation, sharing the comparison and alignment greatly reduces the complexity. The delay and the arithmetic results are the same as if the operations are performed in the conventional manner with a floating-point adder and a separate floating-point subtracter. In this case, the fused implementation is about 20% smaller than the discrete equivalent. / text

Identiferoai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/18409
Date16 October 2012
CreatorsSaleh, Hani Hasan Mustafa, 1970-
Source SetsUniversity of Texas
LanguageEnglish
Detected LanguageEnglish
Formatelectronic
RightsCopyright is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works.

Page generated in 0.0024 seconds