31 |
Combining Shortest Paths, Bottleneck Paths and Matrix MultiplicationShinn, Tong-Wook January 2014 (has links)
We provide a formal mathematical definition of the Shortest Paths for All Flows (SP-AF) problem and provide many efficient algorithms. The SP-AF problem combines the well known Shortest Paths (SP) and Bottleneck Paths (BP) problems, and can be solved by utilising matrix multiplication. Thus in our research of the SP-AF problem, we also make a series of contributions to the underlying topics of the SP problem, the BP problem, and matrix multiplication.
For the topic of matrix multiplication we show that on an n-by-n two dimensional (2D) square mesh array, two n-by-n matrices can be multiplied in exactly 1.5n ‒ 1 communication steps. This halves the number of communication steps required by the well known Cannon’s algorithm that runs
on the same sized mesh array.
We provide two contributions for the SP problem. Firstly, we enhance the breakthrough algorithm by Alon, Galil and Margalit (AGM), which was the first algorithm to achieve a deeply sub-cubic time bound for solving the All Pairs Shortest Paths (APSP) problem on dense directed graphs. Our enhancement allows the algorithm by AGM to remain sub-cubic for larger upper bounds on integer edge costs. Secondly, we show that for graphs with n vertices, the APSP problem can be solved in exactly 3n ‒ 2 communication steps on an n-by-n 2D square mesh array. This improves on the previous result of 3.5n communication steps achieved by Takaoka and Umehara.
For the BP problem, we show that we can compute the bottleneck of the entire graph without solving the All Pairs Bottleneck Paths (APBP) problem, resulting in a much more efficient time bound.
Finally we define an algebraic structure called the distance/flow semi-ring to formally introduce the SP-AF problem, and we provide many algorithms for solving the Single Source SP-AF (SSSP-AF) problem and the All Pairs SP-AF (APSP-AF) problem. For the APSP-AF problem, algebraic algorithms are given that utilise faster matrix multiplication over a ring.
|
32 |
Algorithm/architecture codesign of low power and high performance linear algebra compute fabricsPedram, Ardavan 27 September 2013 (has links)
In the past, we could rely on technology scaling and new micro-architectural techniques to improve the performance of processors. Nowadays, both of these methods are reaching their limits. The primary concern in future architectures with billions of transistors on a chip and limited power budgets is power/energy efficiency. Full-custom design of application-specific cores can yield up to two orders of magnitude better power efficiency over conventional general-purpose cores. However, a tremendous design effort is required in integrating a new accelerator for each new application. In this dissertation, we present the design of specialized compute fabrics that maintain the efficiency of full custom hardware while providing enough flexibility to execute a whole class of coarse-grain operations. The broad vision is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers ranging from the basic hardware foundations all the way to the application programming support through standard linear algebra libraries. We try to address these issues specifically in the context of dense linear algebra applications. In the process, we pursue the main questions that architects will face while designing such accelerators. How broad is this class of applications that the accelerator can support? What are the limiting factors that prevent utilization of these accelerators on the chip? What is the maximum achievable performance/efficiency? Answering these questions requires expertise and careful codesign of the algorithms and the architecture to select the best possible components, datapaths, and data movement patterns resulting in a more efficient hardware-software codesign. In some cases, codesign reduces complexities that are imposed on the algorithm side due to the initial limitations in the architectures. We design a specialized Linear Algebra Processor (LAP) architecture and discuss the details of mapping of matrix-matrix multiplication onto it. We further verify the flexibility of our design for computing a broad class of linear algebra kernels. We conclude that this architecture can perform a broad range of matrix-matrix operations as complex as matrix factorizations, and even Fast Fourier Transforms (FFTs), while maintaining its ASIC level efficiency. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design. Our power-performance model reveals sources of inefficiencies in CPUs and GPUs. We demonstrate how to overcome such inefficiencies in the process of designing our LAP. As we progress through this dissertation, we introduce modifications of the original matrix-matrix multiplication engine to facilitate the mapping of more complex operations. We observe the resulting performance and efficiencies on the modified engine using our power estimation methodology. When compared to other conventional architectures for linear algebra applications and FFT, our LAP is over an order of magnitude better in terms of power efficiency. Based on our estimations, up to 55 and 25 GFLOPS/W single- and double-precision efficiencies are achievable on a single chip in standard 45nm technology. / text
|
33 |
Utilização de matrizes no estudo de orientação e posição de um braço robótico por meio das coordenadas de Denavit-Hartenberg. / Use of matrices in the study of orientation and position of a robotic arm by Denavit-Hartenberg coordinatesCosta, Carlos Gomides da 08 August 2014 (has links)
Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2015-01-27T14:34:02Z
No. of bitstreams: 2
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Dissertação - Carlos Gomides da Costa - 2014.pdf: 5662268 bytes, checksum: 9c271f8544a1ec02f89c341d68f33801 (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2015-01-28T12:36:28Z (GMT) No. of bitstreams: 2
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Dissertação - Carlos Gomides da Costa - 2014.pdf: 5662268 bytes, checksum: 9c271f8544a1ec02f89c341d68f33801 (MD5) / Made available in DSpace on 2015-01-28T12:36:28Z (GMT). No. of bitstreams: 2
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Dissertação - Carlos Gomides da Costa - 2014.pdf: 5662268 bytes, checksum: 9c271f8544a1ec02f89c341d68f33801 (MD5)
Previous issue date: 2014-08-08 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This work have as the main objective present one a different way to teach the subject
matrix multiplication, a way in a ludic form, applied, that can transform the teaching
more pleasureable and motivate for the high scholl students, because for many this issue
has no practical applications, which for many reasons is the lack of interest in learning
such content. The tool used to attract as students so as teachers was the utilization of Kit
LEGO® Mindstorms NXT 2.0 for the construction of the robotic arm, or robotic
manipulator, that which for this propose, has three rotational joints. The LEGO kit was
choosen due its easy interaction with children and adolescents, and to encourage the
construction of knowledge stimulating the solution of problems that may arise during
the process of building robotic arm.The application of content takes place in obtaining
the four parameters Denavit-Hartenberg and those obtained after the placement of
reference systems, where three of these parameters are constants obtained by
measurement at robotic arm and the fourth parameter is variable dependent on the
intended movement or the final position which is to be determined. / O presente trabalho tem como principal objetivo apresentar mais uma opção de
ensinar o assunto multiplicação de matrizes, de uma forma mais lúdica, aplicada, que
pode tornar o ensino mais prazeroso e atrativo para alunos do ensino médio, pois para
muitos esse assunto não tem aplicações práticas, o que para outros é motivo da falta de
interesse em aprender tal conteúdo. A ferramenta apresentada com o intuito de atrair
tanto alunos quanto professores, foi o Kit LEGO® Mindstorms NXT 2.0, que
possibilitou a construção do braço robótico ou manipulador robótico, que nesse trabalho
foi apresentado com três juntas do tipo rotacionais. A escolha desse Kit LEGO® é
justificada pela sua facilidade de interação com crianças e adolescentes, além de
estimular a construção do aprendizado pois, estimula a solução de problemas que
possam aparecer durante o processo de construção do braço robótico. A aplicação do
conteúdo se dá na obtenção dos quatro parâmetros de Denavit-Hartenberg que são
obtidos após a colocação dos sistemas de referência, onde três desses parâmetros são
definidos a partir de medições feitas no braço robótico e o quarto parâmetro é variável
dependente do movimento pretendido ou a posição final que se queira determinar.
|
34 |
The Kronecker ProductBroxson, Bobbi Jo 01 January 2006 (has links)
This paper presents a detailed discussion of the Kronecker product of matrices. It begins with the definition and some basic properties of the Kronecker product. Statements will be proven that reveal information concerning the eigenvalues, singular values, rank, trace, and determinant of the Kronecker product of two matrices. The Kronecker product will then be employed to solve linear matrix equations. An investigation of the commutativity of the Kronecker product will be carried out using permutation matrices. The Jordan - Canonical form of a Kronecker product will be examined. Variations such as the Kronecker sum and generalized Kronecker product will be introduced. The paper concludes with an application of the Kronecker product to large least squares approximations.
|
35 |
AI Based Methods for Matrix Multiplication in High Resolution Simulations of Radio Access Networks / AI Baserade Metoder för Matris Multiplikationer för högupplösta simuleringar av RadionätverkJohnson, Marcus, Forslund, Herman January 2023 (has links)
The increasing demand for mobile data has placed significant strain on radio access networks (RANs), leading to a continuous need for increased network capacity. In keeping with that, a significant advancement in modern RANs is the ability to utilize several receivers and transmitters, to allow for beamforming. One way to increase the capacity of the network is therefore to optimize the resource allocation by preprocessing the transmitted signals, which involves several costly matrix multiplications (MMs). The aim of the project was to investigate the potential of accelerating Ericsson's RAN simulations by using AI based approximate matrix multiplication (AMM) algorithms. The main focus was on the multiply additionless (MADDNESS) algorithm, a product quantization technique that has achieved speedups of up to 100 times compared to exact MM, and 10 times faster than previous AMM methods. A complex matrix handling version of MADDNESS was implemented in Java and Python respectively, and its speed and accuracy were evaluated against Ericsson's current MM implementation. The proposed implementation did not beat the benchmark with respect to speed, instead resulting in a 4-10 times slowdown in runtime. However, this may largely be due to the fact that the used languages do not allow for complete control over memory resource allocation. As such, the implementations at hand do not incorporate all the crucial features of the algorithm. Particularly, the handicapped version does not fully leverage the vectorization potential, which is one of the key contributors to the speed of the algorithm. Consequently, further improvements are necessary before employing the techniques in an end-to-end implementation. / Den växande efterfrågan på mobildata har ökat belastningen på dagens radionätverk (RAN) och har medfört ett behov av att utvidga dess kapacitet. En betydande innovation inom RAN är beamforming, vilket är förmågan att fokusera digitala signaler mot mottagaren och på så vis öka singalstyrkan. En metod för att öka kapaciteten i ett nätverk är att optimera både kvaliteten av och resursallokeringen mellan nätverkets digitala kanaler, vilket medför tidskrävande matrismultiplikationer. Syftet med denna studie var att utforska om AI-baserade approximativa matrismultiplikationsalgoritmer har potentialen att accelerera Ericssons digitala tvilling-simuleringar. Studien fokuserade i huvudsak på produktkvantiseringsalgoritmen MADDNESS som påvisat potentialen att accelerera exakta matrismultiplikationer med en faktor 100, samt en faktor 10 snabbare än jämförbara approximativa metoder. En modifierad version av MADDNESS, som behandlar komplexa matriser, implementerades i Java samt Python, varefter precisionen och hastigheten utvärderades. Den föreslagna implementationen resulterade i en försämring med avseende på hastigheten med en faktor 4-10 jämfört med Ericssons nuvarande algoritmer. Den föreslagna implementationen saknar effektiv minnesallokering och misslyckas följaktligen att till fullo ta tillvara på vektoriseringspotentialen i MADDNESS. Detta indikerar att det är nödvändigt för ytterligare förbättringar innan algoritmen är användbar i den givna simuleringsmiljön.
|
36 |
Algebraic and multilinear-algebraic techniques for fast matrix multiplicationGouaya, Guy Mathias January 2015 (has links)
This dissertation reviews the theory of fast matrix multiplication from a multilinear-algebraic point of view, as
well as recent fast matrix multiplication algorithms based on discrete Fourier transforms over nite groups.
To this end, the algebraic approach is described in terms of group algebras over groups satisfying the triple
product Property, and the construction of such groups via uniquely solvable puzzles.
The higher order singular value decomposition is an important decomposition of tensors that retains some of
the properties of the singular value decomposition of matrices. However, we have proven a novel negative result
which demonstrates that the higher order singular value decomposition yields a matrix multiplication algorithm
that is no better than the standard algorithm. / Mathematical Sciences / M. Sc. (Applied Mathematics)
|
37 |
Power and Energy Efficiency Evaluation for HW and SW Implementation of nxn Matrix Multiplication on Altera FPGAsRenbi, Abdelghani January 2009 (has links)
<p>In addition to the performance, low power design became an important issue in the design process of mobile embedded systems. Mobile electronics with rich features most often involve complex computation and intensive processing, which result in short battery lifetime and particularly when low power design is not taken in consideration. In addition to mobile computers, thermal design is also calling for low power techniques to avoid components overheat especially with VLSI technology. Low power design has traced a new era. In this thesis we examined several techniques to achieve low power design for FPGAs, ASICs and Processors where ASICs were more flexible to exploit the HW oriented techniques for low power consumption. We surveyed several power estimation methodologies where all of them were prone to at least one disadvantage. We also compared and analyzed the power and energy consumption in three different designs, which perform matrix multiplication within Altera platform and using state-of-the-art FPGA device. We concluded that NIOS II\e is not an energy efficient alternative to multiply nxn matrices compared to HW matrix multipliers on FPGAs and configware is an enormous potential to reduce the energy consumption costs.</p>
|
38 |
Multiplication matricielle efficace et conception logicielle pour la bibliothèque de calcul exact LinBox / Efficient matrix multiplication and design for the exact linear algebra library LinBoxBoyer, Brice 21 June 2012 (has links)
Dans ce mémoire de thèse, nous développons d'abord des multiplications matricielles efficaces. Nous créons de nouveaux ordonnancements qui permettent de réduire la taille de la mémoire supplémentaire nécessaire lors d'une multiplication du type Winograd tout en gardant une bonne complexité, grâce au développement d'outils externes ad hoc (jeu de galets), à des calculs fins de complexité et à de nouveaux algorithmes hybrides. Nous utilisons ensuite des technologies parallèles (multicœurs et GPU) pour accélérer efficacement la multiplication entre matrice creuse et vecteur dense (SpMV), essentielles aux algorithmes dits /boîte noire/, et créons de nouveaux formats hybrides adéquats. Enfin, nous établissons des méthodes de /design/ générique orientées vers l'efficacité, notamment par conception par briques de base, et via des auto-optimisations. Nous proposons aussi des méthodes pour améliorer et standardiser la qualité du code de manière à pérenniser et rendre plus robuste le code produit. Cela permet de pérenniser de rendre plus robuste le code produit. Ces méthodes sont appliquées en particulier à la bibliothèque de calcul exact LinBox. / We first expose in this memoir efficient matrix multiplication techniques. We set up new schedules that allow us to minimize the extra memory requirements during a Winograd-style matrix multiplication, while keeping the complexity competitive. In order to get them, we develop external tools (pebble game), tight complexity computations and new hybrid algorithms. Then we use parallel technologies (multicore CPU and GPU) in order to accelerate efficiently the sparse matrix--dense vector multiplication (SpMV), crucial to /blackbox/ algorithms and we set up new hybrid formats to store them. Finally, we establish generic design methods focusing on efficiency, especially via building block conceptions or self-optimization. We also propose tools for improving and standardizing code quality in order to make it more sustainable and more robust. This is in particular applied to the LinBox computer algebra library.
|
39 |
Algebraic and multilinear-algebraic techniques for fast matrix multiplicationGouaya, Guy Mathias January 2015 (has links)
This dissertation reviews the theory of fast matrix multiplication from a multilinear-algebraic point of view, as
well as recent fast matrix multiplication algorithms based on discrete Fourier transforms over nite groups.
To this end, the algebraic approach is described in terms of group algebras over groups satisfying the triple
product Property, and the construction of such groups via uniquely solvable puzzles.
The higher order singular value decomposition is an important decomposition of tensors that retains some of
the properties of the singular value decomposition of matrices. However, we have proven a novel negative result
which demonstrates that the higher order singular value decomposition yields a matrix multiplication algorithm
that is no better than the standard algorithm. / Mathematical Sciences / M. Sc. (Applied Mathematics)
|
40 |
Numerical Quality and High Performance In Interval Linear Algebra on Multi-Core Processors / Algèbre linéaire d'intervalles - Qualité Numérique et Hautes Performances sur Processeurs Multi-CœursTheveny, Philippe 31 October 2014 (has links)
L'objet est de comparer des algorithmes de multiplication de matrices à coefficients intervalles et leurs implémentations.Le premier axe est la mesure de la précision numérique. Les précédentes analyses d'erreur se limitent à établir une borne sur la surestimation du rayon du résultat en négligeant les erreurs dues au calcul en virgule flottante. Après examen des différentes possibilités pour quantifier l'erreur d'approximation entre deux intervalles, l'erreur d'arrondi est intégrée dans l'erreur globale. À partir de jeux de données aléatoires, la dispersion expérimentale de l'erreur globale permet d'éclairer l'importance des différentes erreurs (de méthode et d'arrondi) en fonction de plusieurs facteurs : valeur et homogénéité des précisions relatives des entrées, dimensions des matrices, précision de travail. Cette démarche conduit à un nouvel algorithme moins coûteux et tout aussi précis dans certains cas déterminés.Le deuxième axe est d'exploiter le parallélisme des opérations. Les implémentations précédentes se ramènent à des produits de matrices de nombres flottants. Pour contourner les limitations d'une telle approche sur la validité du résultat et sur la capacité à monter en charge, je propose une implémentation par blocs réalisée avec des threads OpenMP qui exécutent des noyaux de calcul utilisant les instructions vectorielles. L'analyse des temps d'exécution sur une machine de 4 octo-coeurs montre que les coûts de calcul sont du même ordre de grandeur sur des matrices intervalles et numériques de même dimension et que l'implémentation par bloc passe mieux à l'échelle que l'implémentation avec plusieurs appels aux routines BLAS. / This work aims at determining suitable scopes for several algorithms of interval matrices multiplication.First, we quantify the numerical quality. Former error analyses of interval matrix products establish bounds on the radius overestimation by neglecting the roundoff error. We discuss here several possible measures for interval approximations. We then bound the roundoff error and compare experimentally this bound with the global error distribution on several random data sets. This approach enlightens the relative importance of the roundoff and arithmetic errors depending on the value and homogeneity of relative accuracies of inputs, on the matrix dimension, and on the working precision. This also leads to a new algorithm that is cheaper yet as accurate as previous ones under well-identified conditions.Second, we exploit the parallelism of linear algebra. Previous implementations use calls to BLAS routines on numerical matrices. We show that this may lead to wrong interval results and also restrict the scalability of the performance when the core count increases. To overcome these problems, we implement a blocking version with OpenMP threads executing block kernels with vector instructions. The timings on a 4-octo-core machine show that this implementation is more scalable than the BLAS one and that the cost of numerical and interval matrix products are comparable.
|
Page generated in 0.1281 seconds