Global ETD Search

1	Design of a Multi-Core Multi-thread Floating-Point Processor and Its Application in Computer Graphics Yeh, Chia-Yu 06 September 2011 (has links) Graphics processing unit (GPU) designs usually adopts various computer architecture techniques to boost the computation speed, including single-instruction multiple data (SIMD), very-long-instruction word (VLIW), multi-threading, and/or multi-core. In OpenGL ES 2.0, user programmable vertex shader (VS) hardware unit can be designed using vectored SIMD computation unit so that it can efficiently compute the matrix-vector multiplication, one of the key operations in vertex transformation. Recently, high-performance GPU, such as Telsa series from nVidia, is designed with many-core architectures with each core responsible for scalar operations. The intention is to allow for efficient execution of general-purpose computations in addition to the specialized graphics computations. In this thesis, we design a scalar-based multi-threaded GPU design that is composed of four scalar processors, one special-function unit, and can execute multi-threaded instructions. We use the example of vertex transformation to demonstrate execution efficiency of the scalar-based multi-threaded GPU. We also make comparison with the vector-based SIMD GPU. multi-threading graphics processing unit (GPU) vertex shader SIMD matrix-vector multiplication OpenGL ES 2.0
2	Fast algorithms for setting up the stiffness matrix in hp-FEM: a comparison Eibner, Tino, Melenk, Jens Markus 11 September 2006 (has links) (PDF) We analyze and compare different techniques to set up the stiffness matrix in the hp-version of the finite element method. The emphasis is on methods for second order elliptic problems posed on meshes including triangular and tetrahedral elements. The polynomial degree may be variable. We present a generalization of the Spectral Galerkin Algorithm of [7], where the shape functions are adapted to the quadrature formula, to the case of triangles/tetrahedra. Additionally, we study on-the-fly matrix-vector multiplications, where merely the matrix-vector multiplication is realized without setting up the stiffness matrix. Numerical studies are included. elliptic problems matrix-vector multiplication ddc:510 Finite-Elemente-Methode Steifigkeitsmatrix hp-Methode
3	Πειραματική αξιολόγηση μεθοδολογίας βελτιστοποίησης του αλγόριθμου πολλαπλασιασμού πίνακα επί διάνυσμα σε μονοπύρηνες και πολυπύρηνες αρχιτεκτονικές Παπαδήμα, Ελισσάβετ 30 April 2014 (has links) Στην παρούσα διπλωματική εργασία έγινε υλοποίηση και πειραματική αξιολόγηση μιας μεθοδολογίας η οποία έχει αναπτυχθεί στο Εργαστήριο Ολοκληρωμένων Κυκλωμάτων και αφορά τη βελτιστοποίηση του Πολλαπλασιασμού Πίνακα επί Διάνυσμα (ΠΠΔ) σε μονοπύρηνους και πολυπύρηνους επεξεργαστές. Η μεθοδολογία εκμεταλλεύεται το σύνολο των χαρακτηριστικών της αρχιτεκτονικής που χρησιμοποιείται και συγκεκριμένα (α) την ιεραρχία της μνήμης, (β) το μέγεθος της κρυφής μνήμης, (γ) το βαθμό συσχέτισης κάθε επιπέδου της κρυφής μνήμης, (δ) την καθυστέρηση της μνήμης και (ε) το πλήθος των πυρήνων. Είναι η πρώτη φορά που λαμβάνεται υπόψη ο βαθμός συσχέτισης της μνήμης. Σκοπός της μεθοδολογίας είναι η βελτιστοποίηση με βάση όλες τις παραμέτρους μαζί και όχι καθεμία ξεχωριστά. Για να βελτιωθεί η απόδοση προτείνεται διαφορετικός χρονοπρογραμματισμός ανάλογα με το μέγεθος του πίνακα. Για την πειραματική αξιολόγηση χρησιμοποιήθηκαν οι επεξεργαστές γενικού σκοπού Intel Core 2 Duo E6065 και Τ6600, ο Intel i7-3930K και ο ενσωματωμένος επεξεργαστής ειδικού σκοπού Microblaze από το Virtex-5 FPGA (Xilinx). Τα αποτελέσματα συγκρίνονται με την state-of-the-art βιβλιοθήκη ATLAS (Automatically Tuned Linear Algebra Software) και παρουσιάζουν βελτίωση 30%. Από τα πειραματικά αποτελέσματα είναι φανερό ότι η κύρια μνήμη είναι το bottleneck του προβλήματος. Επίσης, η απόδοση βελτιώνεται όταν αλλάζει το layout του πίνακα τόσο σε μονοπύρηνες όσο σε πολυπύρηνες αρχιτεκτονικές. Όσον αφορά τη μέθοδο του tiling τα πειραματικά αποτελέσματα δείχνουν ότι η μείωση των αστοχιών δεν βελτιώνει πάντα την απόδοση γιατί υπάρχει trade-off ανάμεσα στο μέγεθος του tile και στις εντολές διευθυνσιοδότησης. Επίσης, είναι φανερό ότι στις πολυπύρηνες αρχιτεκτονικές δεν υπάρχει γραμμική σχέση της απόδοσης και του πλήθους των πυρήνων που χρησιμοποιούνται. Αυτό οφείλεται στο περιορισμένο εύρος ζώνης της μνήμης. / The subject of this MSc Thesis is the implementation and the experimental evaluation of a methodology that has been developed at the Laboratory of Integrated Circuits and optimizes the Matrix Vector Multiplication (MVM) in single-core and multi-core processors. The methodology fully exploits the characteristics of the architecture. Specifically, it exploits (a) the hierarchy of the memory, (b) the cache size, (c) the cache associativity, (d) the memory latency and (e) the number of the cores. It is the first time that the cache associativity is taken into account. The methodology optimizes all the parameters together as one problem and not separately. A different scheduling is proposed according to the size of the matrix. The general purpose processors Intel Core 2 Duo E6065, Intel Core 2 Duo T6600 and Intel i7-3930K and the embedded processor Virtex-5 Microblaze have been used. The results have been compared with the state-of-the-art library ATLAS (Automatically Tuned Linear Algebra Software) and the performance is improved by 30%. According to the experimental results, it is obvious that the bottleneck is the memory latency. Moreover, the performance is increased when a new way of saving the matrix in the main memory (data array layout) is used in both single-core and multi-core architectures. As far as the tiling is concerned, the experimental results indicate that the decrease of the misses does not always improve the performance because there is a trade-off between the tile size and the addressing instructions. According to the experimental results, as far as multicore architectures are concerned, there is no linear relation between the performance and the number of the cores, because of the limited memory bandwidth. 005.275 Matrix vector multiplication Multicore architectures
4	A New Representation of Structured Grids for Matrix-vector Operation and Optimization of Doitgen Kernel Murugandi, Iyyappa Thirunavukkarasu 27 September 2010 (has links) No description available. Computer Science Structure Grid Doitgen Matrix Vector multiplication vectorization Orio SIMD Tuning Optimization MADNESS
5	Fast algorithms for setting up the stiffness matrix in hp-FEM: a comparison Eibner, Tino, Melenk, Jens Markus 11 September 2006 (has links) We analyze and compare different techniques to set up the stiffness matrix in the hp-version of the finite element method. The emphasis is on methods for second order elliptic problems posed on meshes including triangular and tetrahedral elements. The polynomial degree may be variable. We present a generalization of the Spectral Galerkin Algorithm of [7], where the shape functions are adapted to the quadrature formula, to the case of triangles/tetrahedra. Additionally, we study on-the-fly matrix-vector multiplications, where merely the matrix-vector multiplication is realized without setting up the stiffness matrix. Numerical studies are included. info:eu-repo/classification/ddc/510 ddc:510 Finite-Elemente-Methode Steifigkeitsmatrix hp-Methode elliptic problems matrix-vector multiplication
6	Integral Equation Methods for Rough Surface Scattering Problems in three Dimensions / Integralgleichungsmethoden für Streuprobleme an rauhen Oberflächen in drei Dimensionen Heinemeyer, Eric 10 January 2008 (has links) No description available. 510 Mathematik EDFJ 050 EDFJ 250 EEF 000 Mathematics and Natural Science rough surface scattering boundary integral equations helmholtz equation fast matrix-vector multiplication Streuung an rauhen Oberflächen Randintegralgleichungen Helmholtz Gleichung schnelle Matrix-Vector-Multiplikation 31.80 31.76 33.06 33.12

1

Page generated in 0.1147 seconds