Global ETD Search

41	Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform Wu, Xiaolong 16 December 2015 Sparse matrix-matrix multiplication (SpMM) is a fundamental operation over irregular data and is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU- and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we utilize the ordered property of row indices for partial sorting instead of fully sorting all triples according to row and column indices. Finally, through a hybrid CPU-GPU approach using a two-level pipelining technique, our algorithm is able to better exploit a heterogeneous system. Compared with the state-of-the-art SpMM methods in the cuSPARSE and CUSP libraries, our approach achieves average speedups of 1.6x and 2.9x, respectively, on nine representative matrices from the University of Florida sparse matrix collection. Sparse matrix-matrix multiplication Data locality Pipelining GPU
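The multiply-then-sort idea mentioned in this abstract can be sketched in a few lines of Python. This is only an illustrative, sequential sketch under stated assumptions: the function name `spmm_esc` and the (row, column, value) triple format are hypothetical, and the sketch uses a full sort rather than the partial sorting and CPU-GPU pipelining described in the thesis.

```python
from collections import defaultdict

def spmm_esc(a_triples, b_triples):
    """Multiply two sparse matrices given as (row, col, value) triples.

    Hypothetical helper: expand all element-wise products, sort them,
    then sum triples that land on the same (row, col) position.
    """
    # Index B by row so each entry a_ik finds its partners b_kj directly.
    b_rows = defaultdict(list)
    for k, j, v in b_triples:
        b_rows[k].append((j, v))

    # Expand: one intermediate triple per product a_ik * b_kj.
    expanded = [(i, j, a * b) for i, k, a in a_triples for j, b in b_rows[k]]

    # Sort by (row, col) so duplicates become adjacent (full sort here).
    expanded.sort(key=lambda t: (t[0], t[1]))

    # Compress: accumulate adjacent triples with equal coordinates.
    result = []
    for i, j, v in expanded:
        if result and result[-1][0] == i and result[-1][1] == j:
            result[-1] = (i, j, result[-1][2] + v)
        else:
            result.append((i, j, v))
    return result

# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] as triples.
A = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
B = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]
C = spmm_esc(A, B)
```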
42	Multiplikativt tänkande : Olika strategier för beräkningar av uppgifter inom multiplikation och division / Multiplicative thinking: Different strategies for calculating multiplication and division tasks Knutsmark, Matilda January 2016 This study focuses on multiplicative thinking among pupils in grade 3. Multiplicative thinking is abstract (Clark & Kamii, 1996) and involves applying strategies to solve multiplication and division tasks. The purpose of the study is to examine how pupils use different strategies within multiplicative thinking for multiplication and division. The study was inspired by Grounded Theory. Based on this theory, semi-structured interviews, observations and an analysis of the data were carried out. Eight pupils participated in the interviews, after an initial pilot study. 
The collected material consisted of the pupils' solutions to multiplication and division tasks, notes from observations of the pupils' solutions, and audio-recorded interviews. The results show that almost every pupil used an additive strategy in their solutions to multiplication and division tasks. They also show that only four out of eight pupils could demonstrate an understanding of the connection between the two arithmetic operations. The results show that the pupils have different strategies and solutions within multiplicative thinking, even though they have had the same mathematics education. multiplicative thinking multiplication division strategies
43	Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic Khayyat, Ahmad 06 August 2013 The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, and reduces the bandwidth requirements across the device boundaries by reusing available elements. Exploiting the inherent parallelism in the matrix multiplication computation improves the performance and utilizes the available reconfigurable FPGA resources. The design is constructed by identifying the main design decisions and evaluating the alternatives for each one. The considered design decisions include the scheduling of block transfers, the scheduling of arithmetic operations in a block multiplication, the extent to which the parallelism is exploited, determining the block sizes and shapes, and the use of double buffers for storing matrix blocks. The choices offered by each decision are evaluated analytically in terms of their performance and utilization of FPGA resources. Based on this analysis, a detailed, flexible design that accommodates various alternative design choices is described. The design is optimized for matrices of floating-point elements, and for the FPGA target platform. Prior work is analyzed based on the considered design choices in order to identify the similarities and the differences. The proposed design is implemented using the VHDL hardware description language. The implementation is used to verify the correctness of the design and to confirm the analysis of the design decisions. 
Correctness is verified both by simulation using the ModelSim logic simulator, and in hardware through compiling the implementation using the Altera Quartus II CAD software and testing it on the Altera DE4 board, featuring a Stratix IV EP4SGX530C2 FPGA device. The implementation supports a range of parameters to facilitate the experimental evaluation of design choices. Experimental results show that the design scales linearly with respect to the consumed resources. Although increasing the system size reduces the maximum operating frequency, it also increases the parallelism, resulting in a higher performance. For instance, with 8 floating-point arithmetic units, the system runs at 320 MHz, which corresponds to a performance of 4 GFLOPS, whereas with 64 arithmetic units, it runs at 160 MHz, which corresponds to a performance of 16 GFLOPS. It is also shown that using a transfer schedule based on inner products reduces the transfer time by up to 50% compared to other schedules. Although using square blocks minimizes the number of required block multiplications, other non-square blocks minimize the transfer time, resulting in better total times. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2013-08-03 12:46:13.484 Reconfigurable logic Parallel FPGA Floating-point Matrix multiplication
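To make the blocking idea in this abstract concrete, here is a minimal software sketch in Python. It is not the VHDL design from the thesis: the function name `blocked_matmul` and the `block` parameter are assumptions for illustration, and the sketch only shows how a product can be computed one small tile at a time so that each tile's operands are reused while they are resident in fast memory.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of lists,
    by iterating over block x block tiles (hypothetical illustration)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of C
        for j0 in range(0, m, block):      # tile column of C
            for k0 in range(0, k, block):  # tiles along the shared dimension
                # Multiply one tile of A by one tile of B and accumulate
                # the partial result into the matching tile of C.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
c = blocked_matmul(a, b, block=2)
```

Non-square tiles, which the abstract reports as giving better transfer times, would correspond to using different block sizes for the three loop dimensions.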
45	On Space-Time Trade-Off for Montgomery Multipliers over Finite Fields Chen, Yiyang 04 1900 Multiplication in a Galois field with 2^m elements, GF(2^m), is an important arithmetic operation in coding theory and cryptography. In this thesis, we focus on bit-parallel multipliers over Galois fields generated by irreducible trinomials. We start by introducing GF(2^m) Montgomery multiplication, which calculates A(x)B(x)x^{-u} in GF(2^m) for two polynomials A(x), B(x) in GF(2^m) and a properly chosen u. Then, we investigate the rule for multiplicand partition used by PCHS, a divide-and-conquer algorithm originally proposed for multiplication over GF(2^m) with odd m. By adopting similar rules for splitting A(x) and B(x) in A(x)B(x)x^{-u}, we develop new Montgomery multiplication formulae for GF(2^m) with m either odd or even. Based on this new approach, we develop the corresponding bit-parallel Montgomery multipliers for Galois fields generated by irreducible trinomials. A new bit-reusing trick is applied to eliminate redundant XOR gates from the new multiplier. The time complexity (i.e. the delay) and the space complexity (i.e. the number of logic gates) of the new multiplier are explicitly analysed: 1. The new multiplier requires about 25% fewer logic gates than previous trinomial-based Montgomery multipliers or trinomial-based Mastrovito multipliers on GF(2^m) with m big enough. Its number of logic gates is very close to that of the Karatsuba multiplier proposed by Elia. 2. 
While having a significantly smaller number of logic gates, the new multiplier is at most two T_X larger in total delay than the fastest bit-parallel multiplier on GF(2^m), where T_X is the XOR gate delay. 3. We determine the space and time complexities of our multiplier on the two fields recommended by the National Institute of Standards and Technology (NIST). 
At the cost of at most one more T_X in total delay, our multiplier needs over 15% fewer logic gates than the other Montgomery or Mastrovito multipliers. Moreover, our multiplier is one T_X smaller in delay than Elia's multiplier, at the cost of an increase of less than 1% in the number of logic gates. Galois field Parallel computation Montgomery multiplication Karatsuba multiplication Irreducible trinomial
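As a rough illustration of the operation A(x)B(x)x^{-u} named in this abstract, the following Python sketch computes it bit-serially for the special case u = m. Polynomials over GF(2) are stored as integers whose bit i is the coefficient of x^i. The function names and the bit-serial approach are assumptions made for illustration; the thesis concerns bit-parallel hardware multipliers with a partitioned product, which this sketch does not model. The trinomial x^233 + x^74 + 1, the reduction polynomial of the NIST degree-233 binary field, serves as the example.

```python
def gf2_mulmod(a, b, f, m):
    """Ordinary product a(x) * b(x) mod f(x) over GF(2); deg f = m, a < 2**m."""
    r = 0
    while b:
        if b & 1:
            r ^= a              # add a(x) * x^i for each set bit of b
        b >>= 1
        a <<= 1                 # multiply a(x) by x
        if (a >> m) & 1:
            a ^= f              # reduce when the degree reaches m
    return r

def gf2_montmul(a, b, f, m):
    """Montgomery product a(x) * b(x) * x^(-m) mod f(x) over GF(2).

    Requires f(0) = 1, which holds for any irreducible trinomial.
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b              # accumulate a_i * b(x)
        if c & 1:
            c ^= f              # make c divisible by x without changing c mod f
        c >>= 1                 # exact division by x
    return c

M = 233
F = (1 << 233) | (1 << 74) | 1  # x^233 + x^74 + 1
X_M = F ^ (1 << M)              # x^m mod f, i.e. x^74 + 1

a = 0x1234567890ABCDEF
b = 0xFEDCBA0987654321
mont = gf2_montmul(a, b, F, M)
# Multiplying the Montgomery product by x^m recovers the ordinary product.
recovered = gf2_mulmod(mont, X_M, F, M)
```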
46	Využití GPU pro náročné výpočty / Using GPU for HPC Máček, Branislav Unknown Date Recently there has been significant growth in the building of HPC systems. Nowadays they are built from mainstream computer components, among them graphics accelerators with GPUs. This thesis describes graphics accelerators and examines how they can be used. A GPU chip contains hundreds of simple processors, and the thesis examines how to benefit from these parallel processors. It describes several test applications, discusses the results of the experiments and compares them with other components used for HPC.
47	Samband mellan elevers motivation och prestation i matematik : En fallstudie inom multiplikation för årskurs två / The relation between students’ motivation and achievement in the field of mathematics referring to multiplication in second grade Wetterfall, Lina January 2017 The aim of this study is to investigate the relationship between pupils' motivation and achievement in multiplication. Motivation is graded on the basis of the pupils' own ratings of how they experience the subject area and how difficult or easy they find it. The study was carried out in a class of 21 pupils in grade 2. Each pupil stated orally, on the basis of a questionnaire and using rating scales, how motivated they were regarding multiplication in general and regarding different task types within multiplication. The pupils then completed an individual written multiplication test. The data collected from the pupils' questionnaire responses and test results were compiled and compared. The results showed a relationship between low motivation and low results, and between higher motivation and higher results, in multiplication. The conclusion is that there is a relationship between pupils' motivation and achievement in multiplication, but that other aspects, such as task type, affect how fun or boring and/or easy or difficult multiplication is experienced to be. Multiplication motivation achievement Mathematics
48	Números: algumas atividades lúdicas / Numbers: some playful activities Lima, Denis Gomes 07 June 2018 Given the current state of learning in the country, in the chapters of this work we argue that it is possible to change this picture through teaching methods. We address division, multiplication and fractions in particular. We analyze the evolution of numbers from their origin to the standard used today, as well as the properties of the integers, followed by a discussion of rational numbers and congruences, which allow division and multiplication to be usable and easily understood operations. We conclude our research with a chapter devoted to playful activities that can be used in mathematics projects and laboratories. Division Fraction Laboratory Multiplication
49	Améliorations de la multiplication et de la factorisation d'entier / Speeding up integer multiplication and factorization Kruppa, Alexander 28 January 2010 This thesis explores improvements to well-known algorithms for integer multiplication and factorization. The Schönhage-Strassen algorithm for integer multiplication, published in 1971, was the first to achieve complexity O(n log(n) log(log(n))) for multiplication of n-bit numbers and is still among the fastest in practice. It reduces integer multiplication to multiplication of polynomials over finite rings, which allows the use of the Fast Fourier Transform for computing the convolution product. 
In joint work with Gaudry and Zimmermann, we describe an efficient implementation of the algorithm based on the GNU Multiple Precision arithmetic library, improving cache utilization, parameter selection and convolution length for the polynomial multiplication over previous implementations, resulting in a nearly 2-fold speedup. The P–1 and P+1 factoring algorithms find a prime factor p of a composite number quickly if p-1, respectively p+1, contains no large prime factors. They work in two stages: the first stage computes a high power g1 of an element g0 of a finite group defined over Fp, respectively Fp^2; the second stage looks for a collision of powers of g1, which can be performed efficiently via polynomial multi-point evaluation. In joint work with Peter Lawrence Montgomery, we present an improved stage 2 for these algorithms with faster construction of the required polynomial and very memory-efficient evaluation, increasing the practical search limit for the largest permissible prime in p-1, resp. p+1, approximately 100-fold over previous implementations. The Number Field Sieve (NFS) is the fastest known factoring algorithm for “hard” integers whose factors have no properties that would make them easy to find. In particular, the modulus of the RSA encryption system is chosen to be a hard composite integer, and its factorization breaks the encryption. Great efforts are therefore made to improve NFS in order to assess the security of RSA accurately. We give a brief overview of the NFS and its history. In the sieving phase of NFS, a great many smaller integers must be factored. We present in detail an implementation of the P–1, P+1, and Elliptic Curve methods of factorization optimized for high-throughput factorization of small integers. 
Finally, we show how parameters for these algorithms can be chosen accurately, taking into account the distribution of prime factors in integers produced by NFS, to obtain an accurate estimate of the probability of finding a prime factor with given parameters. Arithmetic Integer multiplication Integer factorization Elliptic curves Number field sieve
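The first stage of the P–1 method described in this abstract can be illustrated with a short Python sketch. It is a textbook-style sketch, not the optimized implementation from the thesis: the function name `pminus1_stage1`, the bound name `b1` and the choice of base 2 are assumptions for illustration, and the improved second stage is not shown.

```python
from math import gcd

def pminus1_stage1(n, b1, base=2):
    """Stage 1 of the P-1 method: raise `base` to the product of all
    maximal prime powers not exceeding b1, then take a gcd with n.

    Returns a nontrivial factor of n, or None if this bound fails.
    """
    g = base
    for q in range(2, b1 + 1):
        # Trial division is enough to test primality for a small bound.
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):
            qk = q
            while qk * q <= b1:     # largest power of q that is <= b1
                qk *= q
            g = pow(g, qk, n)
    d = gcd(g - 1, n)
    return d if 1 < d < n else None

# 299 = 13 * 23; 13 - 1 = 12 = 2^2 * 3 has only small prime-power factors,
# while 23 - 1 = 22 contains the prime 11, which exceeds the bound 5.
factor = pminus1_stage1(299, 5)
```

With the bound 5 the exponent is 4 · 3 · 5 = 60, a multiple of 12 but not of 11, so the gcd isolates the factor 13.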
50	Medidas do coeficiente de multiplicação gasosa no isobutano puro / Measurements of gaseous multiplication coefficient in pure isobutane Lima, Iara Batista de 15 March 2010 This work presents measurements of the gaseous multiplication coefficient (α) in pure isobutane obtained with a parallel plate chamber protected against discharges by one electrode (the anode) made of high-resistivity glass (ρ = 2 × 10^12 Ω·cm). The method applied was the pulsed Townsend technique, in which the primary ionization is produced by the incidence of a nitrogen laser beam on a metallic electrode (the cathode). 
The electric currents measured with the chamber operating in both the ionization and avalanche regimes were used to calculate the gaseous multiplication coefficient by solving the Townsend equation for uniform electric fields. The technique was validated by measurements of the gaseous multiplication coefficient in pure nitrogen, a widely studied gas for which well-established data exist in the literature. The coefficients in isobutane were measured as a function of the reduced electric field in the range from 139 Td to 208 Td. The values obtained were compared with those simulated by the Imonte software (version 4.5) and with the only experimental results available in the literature, recently obtained by our group. This comparison showed that the results agree within the experimental errors. isobutane multiplication coefficient transport parameters in gases
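The calculation step mentioned in this abstract can be sketched as follows. The sketch assumes the simplest form of the Townsend relation for a uniform field, I = I0 · exp(α d), with secondary processes neglected; the function name, argument names and the numbers in the usage lines are hypothetical and are not taken from the thesis.

```python
import math

def townsend_alpha(i_avalanche, i_ionization, gap_cm):
    """Estimate alpha (in 1/cm) from I = I0 * exp(alpha * d).

    i_avalanche  : current measured in the avalanche regime (I)
    i_ionization : current measured in the ionization regime (I0)
    gap_cm       : electrode spacing d in centimetres
    Both currents must be given in the same unit.
    """
    return math.log(i_avalanche / i_ionization) / gap_cm

# Hypothetical readings: a gain of e^3 across a 0.15 cm gap.
alpha = townsend_alpha(math.exp(3.0), 1.0, 0.15)
```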
