• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • 1
  • Tagged with
  • 3
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Fused floating-point arithmetic for application specific processors

Min, Jae Hong 25 February 2014 (has links)
Floating-point computer arithmetic units are used for modern-day computers for 2D/3D graphic and scientific applications due to their wider dynamic range than a fixed-point number system with the same word-length. However, the floating-point arithmetic unit has larger area, power consumption, and latency than a fixed-point arithmetic unit. It has become a big issue in modern low-power processors due to their limited power and performance margins. Therefore, fused architectures have been developed to improve floating-point operations. This dissertation introduces new improved fused architectures for add-subtract, sum-of-squares, and magnitude operations for graphics, scientific, and signal processing. A low-power dual-path fused floating-point add-subtract unit is introduced and compared with previous fused add-subtract units such as the single path and the high-speed dual-path fused add-subtract unit. The high-speed dual-path fused add-subtract unit has less latency compared with the single-path unit at a cost of large power consumption. To reduce the power consumption, an alternative dual-path architecture is applied to the fused add-subtract unit. The significand addition, subtraction and round units are performed after the far/close path. The power consumption of the proposed design is lower than the high-speed dual-path fused add-subtract unit at a cost in latency; however, the proposed fused unit is faster than the single-path fused unit. High-performance and low-power floating-point fused architectures for a two-term sum-of-squares computation are introduced and compared with discrete units. The fused architectures include pre/post-alignment, partial carry-sum width, and enhanced rounding. The fused floating-point sum-of-squares units with the post-alignment, 26 bit partial carry-sum width, and enhanced rounding system have less power-consumption, area, and latency compared with discrete parallel dot-product and sum-of-squares units. Hardware tradeoffs are presented between the fused designs in terms of power consumption, area, and latency. For example, the enhanced rounding processing reduces latency with a moderate cost of increased power consumption and area. A new type of fused architecture for magnitude computation with less power consumption, area, and latency than conventional discrete floating-point units is proposed. Compared with the discrete parallel magnitude unit realized with conventional floating-point squarers, an adder, and a square-root unit, the fused floating-point magnitude unit has less area, latency, and power consumption. The new design includes new designs for enhanced exponent, compound add/round, and normalization units. In addition, a pipelined structure for the fused magnitude unit is shown. / text
2

Design of a Table-Driven Function Evaluation Generator Using Bit-Level Truncation Methods

Tseng, Yu-ling 30 August 2011 (has links)
Functional evaluation is one of key arithmetic operations in many applications including 3D graphics and stereo. Among various designs of hardware-based function evaluators, piecewise polynomial approximation methods are the most popular which interpolate the piecewise function curve in a sub-interval using polynomials with polynomial coefficients of each sub-interval stored in an entry of a ROM. The conventional piecewise methods usually determine the bit-widths of each ROM entry and multipliers and adders by analyzing the various error sources, including polynomial approximation errors, coefficient quantization errors, truncation errors of arithmetic operations, and the final rounding error. In this thesis, we present a new piecewise function evaluation design by considering all the error sources together. By combining all the error sources during the approximation, quantization, truncation and rounding, we can efficiently reduce the area cost of ROM and the corresponding arithmetic units. The proposed method is applied to piecewise function evaluators of both uniform and non-uniform segmentation.
3

EXPLORAÇÃO DE OPERADORES ARITMÉTICOS NA TRANSFORMADA RÁPIDA DE FOURIER / ARITHMETICS OPERATORS EXPLORATION IN FAST FOURIER TRANSFORM

Fonseca, Mateus Beck 22 October 2010 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / The power consumption reduction in the fast Fourier transform (FFT) is important because applications in battery-powered embedded systems grows daily. Thus this work focuses on the application of techniques to reduce power in specific projects of FFT algorithms. The goal is to achieve an architectural exploration in the FFT core, the decimation in time butterfly radix-2 and the efficient implementation of arithmetic operators in the internal structure of this butterfly. The techniques applied to the butterfly are aimed at reducing power consumption through architectural exploration and data encryption. Five different butterfly topologies are shown, one of those, proposed in this work uses three real multipliers, and is based on the previous storage of the product of real and imaginary values of the twiddle factors. The advantage of this topology is the possibility of using 4:2 adder compressors, which performs the sum of four operands simultaneously with reduced critical path. These adder compressors have XOR gates in the critical path, is proposed in this paper a new XOR gate circuit, which is based on the use of pass transistors logic. This new XOR gate circuit has been applied to adder compressors 3:2 and 4:2, which are applied to adders blocks of the butterflies. Digital circuits have been developed in hardware description language and some in the electrical schematic level. Results of area, power consumption and cell count in the logic synthesis in 180nm at 100MHz and 20MHz with switching activity analysis for 10,000 random input vectors were obtained for this work. The electrical level simulations in an environment of mixed digital and analog signals were also performed to the evaluation of the compressors with new topology of XOR gate. Analyses show that 3:2 adder compressor has lower power consumption using the new XOR gate circuit. However, the same conclusion was not achieve in relation to the 4:2 adder compressor which has a lower power consumption using the CMOS XOR gate. Butterfly structures evaluated uses a significant amount of arithmetic operators in their internal structures, so was used different design strategies for implementation. Initially was used the arithmetic operators of automatic synthesis tool (Cadence). After, used dedicated arithmetic operators (adder compressors with the new XOR gate circuit, RNS adders and array multipliers). The results show that butterflies have lower power consumption with the use of adder compressors in their internal structures. / A redução no consumo de potência na transformada rápida de Fourier (FFT) é importante pois sua aplicação cresce em sistemas embarcados movidos à bateria. Sendo assim este trabalho tem como foco a aplicação de técnicas de redução de potência para projetos específicos de algoritmos da FFT. O objetivo é realizar uma exploração arquitetural no elemento central de cálculo da FFT, borboleta na base 2 com decimação no tempo, bem como a aplicação de operadores aritméticos eficientes na estrutura interna desta borboleta. As técnicas aplicadas à borboleta têm por objetivo a redução do consumo de potência através de exploração arquitetural e codificação de dados. São apresentadas cinco diferentes topologias de borboleta, sendo uma destas, proposta no âmbito deste trabalho utilizando três multiplicadores reais é baseada no armazenamento prévio do produto dos valores real e imaginário dos coeficientes. A vantagem desta topologia é a possibilidade do uso de somadores compressores 4:2, que realiza a soma simultânea de quatro operandos, com reduzido caminho crítico. Como estes somadores compressores apresentam portas XOR no caminho crítico, é proposta neste trabalho uma nova porta XOR, que é baseada no uso de transistores de passagem. Esta nova porta lógica XOR foi aplicada em somadores compressores 3:2 e 4:2, que são aplicados nos blocos somadores das borboletas. Os circuitos digitais foram desenvolvidos em linguagem de descrição de hardware e alguns em esquemáticos de nível elétrico. Resultados de área, potência e contagem de células na síntese lógica em 180nm a 100MHz e 20MHz com análise de atividade de chaveamento para 10.000 vetores aleatórios de entrada foram obtidos e simulações no nível elétrico em um ambiente de sinais digitais e analógicos misto também foram realizadas para a avaliação dos compressores com a nova topologia de porta XOR. As análises mostram que os somadores compressores 3:2 apresentam menor consumo de potência com o uso da nova porta XOR. Entretanto, o mesmo não se observa em relação ao compressor 4:2 que apresenta um menor consumo de potência utilizando a porta XOR CMOS. Como as estruturas de borboleta avaliadas utilizam uma quantidade significativa de operadores aritméticos nas suas estruturas internas, foram utilizadas diferentes estratégias de projeto para as suas implementações. Inicialmente foram utilizados os operadores aritméticos da ferramenta de síntese automática (Cadence). Após, foram utilizados operadores aritméticos dedicados (somadores compressores com a nova porta XOR, somadores RNS e multiplicadores array). Os resultados mostram que as borboletas apresentam menores consumos de potência com o uso dos somadores compressores em suas estruturas.

Page generated in 0.0664 seconds