Spelling suggestions: "subject:"deduced are"" "subject:"educed are""
1 |
Total delay optimization for column reduction multipliers considering non-uniform arrival times to the final adderWaters, Ronald S. 26 June 2014 (has links)
Column Reduction Multiplier techniques provide the fastest multiplier designs and involve three steps. First, a partial product array of terms is formed by logically ANDing each bit of the multiplier with each bit of the multiplicand. Second, adders or counters are used to reduce the number of terms in each bit column to a final two. This activity is commonly described as column reduction and occurs in multiple stages. Finally, some form of carry propagate adder (CPA) is applied to the final two terms in order to sum them to produce the final product of the multiplication. Since forming the partial products, in the first step, is simply forming an array of the logical AND's of two bits, there is little opportunity for delay improvement for the first step. There has been much work done in optimizing the reduction stages for column multipliers in the second reduction step. All of the reduction approaches of the second step result in non-uniform arrival times to the input of the final carry propagate adder in the final step. The designs for carry propagate adders have been done assuming that the input bits all have the same arrival time. It is not evident that the non-uniform arrival times from the columns impacts the performance of the multiplier. A thorough analysis of the several column reduction methods and the impact of carry propagate adder designs, along with the column reduction design step, to provide the fastest possible final results, for an array of multiplier widths has not been undertaken. This dissertation investigates the design impact of three carry propagate adders, with different performance attributes, on the final delay results for four column reduction multipliers and suggests general ways to optimize the total delay for the multipliers. / text
|
2 |
High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware AcceleratorsUllah, Salim, Rehman, Semeen, Shafique, Muhammad, Kumar, Akash 07 February 2023 (has links)
Multiplication is one of the widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are not only limited in number and have fixed locations on FPGAs but can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to provide high-performance and resource efficiency. Toward this, we present generic area-optimized, low-latency accurate, and approximate softcore multiplier architectures, which exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains to reduce the overall critical path delay (CPD) and resource utilization of multipliers. Compared to Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architecture provides up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, with our unsigned approximate multiplier architectures, a reduction of up to 51% in the CPD can be achieved with an insignificant loss in output accuracy when compared with the LogiCORE IP. For illustration, we have deployed the proposed multiplier architecture in accelerators used in image and video applications, and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is opensource and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enabling a new research direction for the FPGA community.
|
Page generated in 0.0359 seconds