1 |
Redundant Number Systems for Optimising Digital Signal Processing Performance in Field Programmable Gate ArrayKamp, William Hermanus Michael January 2010 (has links)
Speeding up addition is the key to faster digital signal processing (DSP). This can be achieved by exploiting the properties of redundant number systems. Their expanded symbol (digit) alphabet gives them multiple representations for most values. Utilising redundant representations at the output of an adder permits addition to be performed without carry-propagation, yielding fast, constant time performance irrespective of the word length. A resource efficient implementation of this fast adder structure is developed that re-purposes the fast carry logic of low-cost field programmable gate arrays (FPGAs). Experiments confirm constant time addition and show that it outperforms binary ripple carry addition at word lengths of greater than 44 bits in a Xilinx Spartan 3 FPGA and 24 bits in an Altera Cyclone III FPGA.
Redundancy also provides other properties that can be exploited for performance gain. Some redundant representations will have more zero-symbols than others. These maximise the opportunities to exploit the multiplicative absorbing and additive identity properties of zero that when exercised reduce superfluous calculations. A serial recoding algorithm is developed that generates a redundant representation for a specified value with as few nonzero symbols as possible. Unlike previously published methods, it accepts a wide specification of number systems including those with irregularly spaced symbol alphabets. A Markov analysis and analysis of the elementary cycles in the formulated state machine provides average and worst case measures for the tested number system. Typically, the average number of non-zero symbols is less than a third and the worst case is less than a half.
Further to the increase in zero-symbols, zero-dominance is proposed as a new property of redundant number representations. It promotes a set of representations that have uniquely positioned zero-symbols, in a Pareto-optimal sense. This set covers all representations of a value and is used to select representations to optimise the calculation of a dot-product.
The dot-product or vector-multiply is a fundamental operation in DSP, since it is employed in filtering, correlation and convolution. The nonzero partial products can be packed together, substantially reducing the calculation time. The application of redundant number systems provides a two-fold benefit. Firstly, the number of nonzero partial products is reduced. Secondly, a novel opportunity is identified to use the representations in the zero-dominant set to optimise the packing further, gaining an extra 18% improvement.
An implementation of the proposed dot-product with partial product packing is developed for a Cyclone II FPGA. It outperforms a quad-multiplier binary implementation in throughput by 50% .
Redundant number systems excel at increasing performance in particular DSP subsystems, those that are numerically intensive and consist of considerable accumulation. The conversion back to a binary result is the performance bottleneck in the DSP algorithm, taking a time proportional to a binary adder. Therefore, redundant number systems are best utilised when this conversion cost can be amortised over many fast redundant additions, which is typical in many DSP and communications applications.
|
2 |
Multiple Constant Multiplication Optimization Using Common Subexpression Elimination and Redundant NumbersAl-Hasani, Firas Ali Jawad January 2014 (has links)
The multiple constant multiplication (MCM) operation is a fundamental operation in digital signal processing (DSP) and digital image processing (DIP). Examples of the MCM are in finite impulse response (FIR) and infinite impulse response (IIR) filters, matrix multiplication, and transforms.
The aim of this work is minimizing the complexity of the MCM operation using common subexpression elimination (CSE) technique and redundant number representations. The CSE technique searches and eliminates common digit patterns (subexpressions) among MCM coefficients. More common subexpressions can be found by representing the MCM coefficients using redundant number representations.
A CSE algorithm is proposed that works on a type of redundant numbers called the zero-dominant set (ZDS). The ZDS is an extension over the representations of minimum number of non-zero digits called minimum Hamming weight (MHW). Using the ZDS improves CSE algorithms' performance as compared with using the MHW representations. The disadvantage of using the ZDS is it increases the possibility of overlapping patterns (digit collisions). In this case, one or more digits are shared between a number of patterns. Eliminating a pattern results in losing other patterns because of eliminating the common digits. A pattern preservation algorithm (PPA) is developed to resolve the overlapping patterns in the representations.
A tree and graph encoders are proposed to generate a larger space of number representations. The algorithms generate redundant representations of a value for a given digit set, radix, and wordlength. The tree encoder is modified to search for common subexpressions simultaneously with generating of the representation tree. A complexity measure is proposed to compare between the subexpressions at each node. The algorithm terminates generating the rest of the representation tree when it finds subexpressions with maximum sharing. This reduces the search space while minimizes the hardware complexity.
A combinatoric model of the MCM problem is proposed in this work. The model is obtained by enumerating all the possible solutions of the MCM that resemble a graph called the demand graph. Arc routing on this graph gives the solutions of the MCM problem. A similar arc routing is found in the capacitated arc routing such as the winter salting problem. Ant colony optimization (ACO) meta-heuristics is proposed to traverse the demand graph. The ACO is simulated on a PC using Python programming language. This is to verify the model correctness and the work of the ACO. A parallel simulation of the ACO is carried out on a multi-core super computer using C++ boost graph library.
|
Page generated in 0.0137 seconds