Global ETD Search

11	Design and Analysis of Modular Architectures for an RNS to Mixed Radix Conversion Multi-processor Shivashankar, Nithin 27 October 2014 No description available. Computer Engineering Residue Number System Mixed Radix RNS to Mixed Radix Conversion Multi-processor FPGA parallelization pipelining Modular Inverse
12	Heterogeneity-Aware Operator Placement in Column-Store DBMS Karnagel, Tomas, Habich, Dirk, Schlegel, Benjamin, Lehner, Wolfgang 02 February 2023 Due to the tremendous increase in the amount of data efficiently managed by current database systems, optimization is still one of the most challenging issues in database research. Today's query optimizers determine the most efficient composition of physical operators to execute a given SQL query, under the assumption that the underlying hardware consists of a multi-core CPU. However, hardware systems are increasingly shifting towards heterogeneity, combining a multi-core CPU with various computing units, e.g., GPU or FPGA cores. To use the performance capability of such heterogeneous hardware efficiently, the assignment of physical operators to computing units gains importance. In this paper, we propose a heterogeneity-aware physical operator placement strategy (HOP) for in-memory columnar database systems in a heterogeneous environment. Our placement approach takes operators from the physical query execution plan as input and assigns them to computing units using a cost model at runtime. To enable this runtime decision, our cost model uses the characteristics of the computing units, execution properties of the operators, and runtime data to estimate execution costs for each unit. We evaluated our approach on full TPC-H queries within a prototype database engine. As we show, placement in a heterogeneous hardware system has a high influence on query performance. info:eu-repo/classification/ddc/004 ddc:004
13	Efficient Integer Representations for Cryptographic Operations Muir, James January 2004 Every positive integer has a unique radix 2 representation which uses the digits {0,1}. However, if we allow digits other than 0 and 1, say {0,1,-1}, then a positive integer has many representations. Of these redundant representations, it is possible to choose one that has few nonzero digits. It is well known that using representations of integers with few nonzero digits allows certain algebraic operations to be done more quickly. This thesis is concerned with various representations of integers that are related to efficient implementations of algebraic operations in cryptographic algorithms. The topics covered here include: (1) The width-w nonadjacent form (w-NAF). We prove that the w-NAF of an integer has a minimal number of nonzero digits; that is, no other representation of an integer which uses the w-NAF digits can have fewer nonzero digits than its w-NAF. (2) A left-to-right analogue of the w-NAF. We introduce a new family of radix 2 representations which use the same digits as the w-NAF, but have the property that they can be computed by sliding a window from left to right across the binary representation of an integer. We show these new representations have a minimal number of nonzero digits. (3) Joint representations. Solinas introduced a {0,1,-1}-radix 2 representation for pairs of integers called the joint sparse form. We consider generalizations of the joint sparse form which represent r ≥ 2 integers and use digits other than {0,1,-1}. We show how to construct a {0,1,2,3}-joint representation that has a minimal number of nonzero columns. (4) Nonadjacent digit sets. It is well known that if x equals 3 or -1 then every nonnegative integer has a unique {0,1,x}-nonadjacent form; that is, a {0,1,x}-radix 2 representation with the property that, of any two consecutive digits, at most one is nonzero. We investigate what other values of x have this property. Mathematics redundant representations radix 2 minimum weight representation left-to-right recodings joint representations
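To make the w-NAF in the abstract above concrete, the following is a minimal Python sketch of the standard right-to-left recoding. The function names `wnaf` and `wnaf_value` are illustrative choices, not taken from the thesis, and the sketch does not cover the left-to-right or joint representations the thesis introduces.

```python
def wnaf(n: int, w: int = 2) -> list[int]:
    """Return the width-w NAF digits of n >= 0, least significant digit first.

    Nonzero digits are odd and lie strictly between -2^(w-1) and 2^(w-1);
    any w consecutive digits contain at most one nonzero digit.
    """
    if n < 0 or w < 2:
        raise ValueError("this sketch expects n >= 0 and w >= 2")
    digits = []
    while n > 0:
        if n & 1:
            d = n % (1 << w)            # residue of n modulo 2^w
            if d >= (1 << (w - 1)):
                d -= (1 << w)           # choose the signed representative
            n -= d                      # now n is divisible by 2^w
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def wnaf_value(digits: list[int]) -> int:
    """Evaluate a radix 2 digit string given least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    # 7 is 111 in binary (three nonzero digits); its 2-NAF is 8 - 1.
    print(wnaf(7, 2), wnaf_value(wnaf(7, 2)))
```

For w = 2 this is the ordinary nonadjacent form with digits {0,1,-1}; fewer nonzero digits means fewer additions in a double-and-add scalar multiplication or a square-and-multiply exponentiation.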
14	Versatile Montgomery Multiplier Architectures Gaubatz, Gunnar 30 April 2002 Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. General-purpose processors are not always the best choice for performing this operation reasonably fast, which is why specialized hardware, in the form of cryptographic co-processors, is becoming more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word-based algorithm for Montgomery's method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2^n)). Previous approaches have relied mostly on bit-serial multiplication in combination with massive pipelining, or on Radix-8 multiplication limited to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log_{3/2} n), while the delay of bit-serial implementations grows with linear complexity, O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5 μm CMOS technology. The synthesized netlist has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption. computer arithmetic modular multiplication public key cryptography montgomery vlsi high radix Public key cryptography Computer algorithms
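As a rough software illustration of a word-based Montgomery multiplication, the sketch below scans one operand a word at a time and performs one reduction step per word. It is a textbook formulation for GF(p) only, not the thesis's hardware architecture; the function name `montgomery_multiply` and the `word_bits` parameter are assumptions made for this example, and the GF(2^n) side of the unified design is not modelled.

```python
def montgomery_multiply(a: int, b: int, p: int, word_bits: int = 16) -> int:
    """Return a * b * R^(-1) mod p, with R = 2^(word_bits * num_words).

    Assumes p is odd and 0 <= a, b < p. Requires Python 3.8+ for pow(x, -1, m).
    """
    if p % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    base = 1 << word_bits
    mask = base - 1
    num_words = -(-p.bit_length() // word_bits)   # ceil(bits / word_bits)
    p_neg_inv = (-pow(p, -1, base)) % base        # -p^(-1) mod 2^word_bits

    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask       # i-th word of a
        t += a_i * b                              # partial product
        m = ((t & mask) * p_neg_inv) & mask       # makes the low word of t vanish
        t = (t + m * p) >> word_bits              # exact division by the word base
    return t - p if t >= p else t                 # single final subtraction


if __name__ == "__main__":
    p = (1 << 127) - 1
    R = 1 << 128
    a, b = 123456789, 987654321
    print(montgomery_multiply(a, b, p) == (a * b * pow(R, -1, p)) % p)
```

A larger `word_bits` means fewer loop iterations but a wider multiplier per step, which is the word-size trade-off the abstract describes as scalability.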
15	FFT Implemention on FPGA for 5G Networks Vasilica, Vlad Valentin January 2019 The main goal of this thesis is the design and implementation of a 2048-point FFT on an FPGA using VHDL. The FFT uses a radix-2 butterfly architecture, with a focus on comparing system parameters across different wordlengths, coefficient wordlengths and symbol error rates, as well as different modulation types, comparing 64QAM and 256QAM for the 5G system. This implementation replaces an FFT function block in a Matlab-based open-source 5G NR simulator based on the 3GPP 15 standard and is used to simulate spectrum, MSE payload, and SER performance. FFT OFDMA Physical design FPGA 5G VHDL CP-OFDMA Radix-2 2048-Point Computer Engineering Datorteknik
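The radix-2 butterfly structure mentioned above can be sketched in a few lines of Python. This is a floating-point decimation-in-time model, not the thesis's VHDL design; the function names and the optional `coeff_bits` parameter, which rounds each twiddle factor to a given number of fractional bits to mimic a finite coefficient wordlength, are assumptions made for illustration.

```python
import cmath


def quantize(z: complex, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits (None = exact)."""
    if frac_bits is None:
        return z
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)


def fft_radix2(x, coeff_bits=None):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2], coeff_bits)
    odd = fft_radix2(x[1::2], coeff_bits)
    out = [0j] * n
    for k in range(n // 2):
        w = quantize(cmath.exp(-2j * cmath.pi * k / n), coeff_bits)  # twiddle factor
        t = w * odd[k]
        out[k] = even[k] + t             # butterfly, upper output
        out[k + n // 2] = even[k] - t    # butterfly, lower output
    return out


if __name__ == "__main__":
    spectrum = fft_radix2([1.0] * 2048)
    print(abs(spectrum[0]))  # DC bin of an all-ones input equals the length
```

Comparing `fft_radix2(x)` with `fft_radix2(x, coeff_bits=10)` gives a quick feel for how coefficient wordlength affects the output error, although a hardware model would also quantize the data path.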
16	Design and Implementation of an Asynchronous Pipelined FFT Processor / Design och implementering av en asynkron pipelinad FFT processor Claesson, Jonas January 2003 FFT processors are today one of the most important blocks in communication equipment. They are used in everything from broadband to 3G and digital TV to Radio LANs. This master's thesis project will deal with pipelined hardware solutions for FFT processors with long FFT transforms, 1K to 8K points. These processors could be used for instance in OFDM communication systems. The final implementation of the FFT processor uses a GALS (Globally Asynchronous Locally Synchronous) architecture that implements the SDF (Single Delay Feedback) radix-2^2 algorithm. The goal of this report is to outline the knowledge gained during the master's thesis project, to describe a design methodology and to document the different building blocks needed in these kinds of systems. Electronics DFT FFT Pipelined Parameterizable Processor GALS Radix-2^2 SDF Elektronik Electronics Elektronik
19	Energy-Efficient Multiple-Word Montgomery Modular Multiplier Chen, Chia-Wen 25 July 2012 The Internet plays an indispensable role in daily life. People use it to search for information, transmit data, download files, and so on. Data are converted to digital signals composed of '0' and '1' and transmitted over the Internet. However, the Internet is open and unreliable, and data that are not encrypted may be stolen by others. Cryptosystems are therefore very important for keeping data secure and secret. RSA is a well-known public-key cryptosystem that is conceptually simple and highly secure. It requires many modular exponentiations during encryption and decryption, and its key length is usually larger than 1024 bits to ensure high security. Achieving real-time transmission requires speeding up the RSA cryptosystem, so it must be implemented in hardware. Modular exponentiation is the core operation of RSA, and it is built from modular multiplications. Montgomery's algorithm implements the complex modular multiplication with simple additions and shifts. Because the key length is usually larger than 1024 bits, some signals have a large fan-out in the hardware architecture. These signals must be buffered to achieve enough driving ability, which may lead to longer delay and higher power consumption. Tenca et al. therefore proposed a Multiple-Word Montgomery Algorithm to alleviate the fan-out problem. Recently, Huang et al. proposed an algorithm that reduces the data dependency of Tenca's algorithm. This research is based on the architecture of Huang's algorithm and detects its redundant operations. We then block the unnecessary signals to reduce switching activity. In addition, we use a low-power shift register to reduce the power consumption of the shift registers. Experimental results show that our design is effective in decreasing power consumption.
Low-Power Energy-Efficient RSA Cryptosystems Montgomery's Algorithm
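The statement that Montgomery's algorithm needs only additions and shifts, and that modular exponentiation is built from modular multiplications, can be sketched as follows. This is the classic bit-serial (radix-2) form, not the multiple-word algorithm of Tenca et al. or the variant of Huang et al.; the function names `mont_mul_radix2` and `mont_pow` are hypothetical, and the built-in `pow` is used only to precompute the conversion constant.

```python
def mont_mul_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n using only additions and right shifts.

    Assumes n is odd, 0 <= a, b < n and n < 2^k.
    """
    s = 0
    for i in range(k):
        if (a >> i) & 1:     # add b when bit i of a is set
            s += b
        if s & 1:            # make s even by adding the odd modulus
            s += n
        s >>= 1              # exact division by two
    return s - n if s >= n else s


def mont_pow(base: int, exp: int, n: int) -> int:
    """Left-to-right square-and-multiply in the Montgomery domain (n odd, n > 1)."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)                      # R^2 mod n, precomputed once
    x = mont_mul_radix2(base % n, r2, n, k)    # base * R mod n
    acc = mont_mul_radix2(1, r2, n, k)         # R mod n, i.e. 1 in the Montgomery domain
    for bit in bin(exp)[2:]:
        acc = mont_mul_radix2(acc, acc, n, k)  # square
        if bit == "1":
            acc = mont_mul_radix2(acc, x, n, k)  # multiply
    return mont_mul_radix2(acc, 1, n, k)       # leave the Montgomery domain


if __name__ == "__main__":
    n = 3233                                   # toy RSA modulus 61 * 53
    print(mont_pow(65, 17, n) == pow(65, 17, n))
```

In this bit-serial form every loop iteration touches the full-width operands `b` and `n`, which is the source of the large fan-out in hardware that the multiple-word approach described above is meant to reduce.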
20	Decomposição de coeficientes trigonométricos para a redução de área e potência em arquiteturas FFT híbridas na base 2 / Trigonometric coefficients decomposition for area and power reduction in hybrid radix-2 FFT architectures Ghissoni, Sidinei January 2012 The increasing use of mobile devices that employ the Fast Fourier Transform (FFT) in digital signal operations may be restricted by constraints on battery life and device dimensions. These possible limitations increase the need for techniques that optimize the three basic requirements of digital design: power dissipation, area and delay. This thesis therefore presents a method for implementing FFT architectures with emphasis on optimization through the decomposition of twiddle factors (trigonometric coefficients). In the FFT computation, the butterflies play a central role, since they allow the computation of complex terms. Because this computation involves multiplying input data by the appropriate twiddle factors, optimizing the butterflies can contribute directly to reducing power and area. The proposed technique analyzes which twiddle factors exist in the FFT architecture used as a basis and chooses for decomposition the one with the lowest hardware implementation cost. The decomposition of a coefficient must guarantee the reconstruction of all the other twiddle factors needed to implement the whole FFT architecture. Thus, the decomposition decreases the number of twiddle factors needed to reconstruct the original FFT. The new set of coefficients is implemented with only adders/subtracters and shifts through Constant Matrix Multiplication (CMM), together with a control system of multiplexers that selects the path for the correct operation of the FFT. The adder/subtracter circuits are implemented with gate-level metrics, seeking lower delay and power dissipation for topologies with CSA (Carry Save Adder) and ripple-carry adders. Compared with solutions from the literature, the results of the proposed method are significantly satisfactory: power dissipation and area were reduced by 30% and 24%, respectively, and the number of adder components required to implement FFT architectures was also reduced. Microeletrônica Circuitos digitais Twiddle factors Gate-level CMM Radix-2 Low-power Area
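To illustrate what implementing a coefficient "with only adders and shifts" means, here is a small Python sketch that multiplies by a fixed-point twiddle coefficient using one shifted copy of the input per set bit of the constant. It is a plain binary shift-and-add model under assumed names (`quantize_coefficient`, `shift_add_multiply`); it does not reproduce the thesis's coefficient decomposition or its CMM sharing of partial terms.

```python
import math


def quantize_coefficient(value: float, frac_bits: int) -> int:
    """Scale a real coefficient to an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def shift_add_multiply(x: int, coeff: int):
    """Multiply x by a non-negative integer constant using shifts and adds only.

    Returns (product, adders), where adders counts the two-input additions
    a straightforward hardware realization would need.
    """
    if coeff < 0:
        raise ValueError("this sketch handles non-negative constants only")
    acc = 0
    terms = 0
    shift = 0
    while coeff:
        if coeff & 1:
            acc += x << shift    # one shifted copy of x per set bit
            terms += 1
        coeff >>= 1
        shift += 1
    return acc, max(terms - 1, 0)


if __name__ == "__main__":
    c = quantize_coefficient(math.cos(math.pi / 4), 8)   # cos(45 degrees) in Q8
    product, adders = shift_add_multiply(100, c)
    print(c, product, adders, product / 256)
```

The adder count grows with the number of nonzero bits in each constant, which is why reducing the set of distinct coefficients and sharing partial terms across them can save area and power.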

Search results