1 |
REDUCING MEMORY SPACE FOR COMPLETELY UNROLLED LU FACTORIZATION OF SPARSE MATRICESTHIYAGARAJAN, SANJEEV 11 October 2001 (has links)
No description available.
|
2 |
Acceleration of Block-Aware Matrix Factorization on Heterogeneous PlatformsSomers, Gregory W. January 2016 (has links)
Block-structured matrices arise in several contexts in circuit
simulation problems. These matrices typically inherit the pattern of
sparsity from the circuit connectivity. However, they are also
characterized by dense spots or blocks. Direct factorization of those
matrices has emerged as an attractive approach if the host memory is sufficiently large to store the block-structured matrix. The approach proposed in this thesis aims to accelerate the direct factorization of general block-structured matrices by leveraging the power of multiple OpenCL accelerators such as Graphical Processing Units (GPUs).
The proposed approach utilizes the notion of a Directed Acyclic Graph representing the matrix in order to schedule its factorization on multiple accelerators. This thesis also describes memory management techniques that enable handling large matrices while minimizing the amount of memory transfer over the PCIe bus between the host CPU and the attached devices. The results demonstrate that by using two GPUs the proposed approach can achieve a nearly optimal speedup when compared to a
single GPU platform.
|
3 |
Improving the Execution Time of Large System SimulationsJanuary 2012 (has links)
abstract: Today, the electric power system faces new challenges from rapid developing technology and the growing concern about environmental problems. The future of the power system under these new challenges needs to be planned and studied. However, due to the high degree of computational complexity of the optimization problem, conducting a system planning study which takes into account the market structure and environmental constraints on a large-scale power system is computationally taxing. To improve the execution time of large system simulations, such as the system planning study, two possible strategies are proposed in this thesis. The first one is to implement a relative new factorization method, known as the multifrontal method, to speed up the solution of the sparse linear matrix equations within the large system simulations. The performance of the multifrontal method implemented by UMFAPACK is compared with traditional LU factorization on a wide range of power-system matrices. The results show that the multifrontal method is superior to traditional LU factorization on relatively denser matrices found in other specialty areas, but has poor performance on the more sparse matrices that occur in power-system applications. This result suggests that multifrontal methods may not be an effective way to improve execution time for large system simulation and power system engineers should evaluate the performance of the multifrontal method before applying it to their applications. The second strategy is to develop a small dc equivalent of the large-scale network with satisfactory accuracy for the large-scale system simulations. In this thesis, a modified Ward equivalent is generated for a large-scale power system, such as the full Electric Reliability Council of Texas (ERCOT) system. In this equivalent, all the generators in the full model are retained integrally. The accuracy of the modified Ward equivalent is validated and the equivalent is used to conduct the optimal generation investment planning study. By using the dc equivalent, the execution time for optimal generation investment planning is greatly reduced. Different scenarios are modeled to study the impact of fuel prices, environmental constraints and incentives for renewable energy on future investment and retirement in generation. / Dissertation/Thesis / M.S. Electrical Engineering 2012
|
4 |
A Hardware Interpreter for Sparse Matrix LU FactorizationSyed, Akber 16 September 2002 (has links)
No description available.
|
5 |
Bezmaticové předpodmínění / Matrix-free preconditioningTrojek, Lukáš January 2012 (has links)
The diploma theses is focused on matrix-free preconditioning of a linear system. It gives a very brief introduction into the area of iterative methods, preconditioning and matrix-free environment. The emphasis is put on a detailed description of a variant of LU factorization which can be computed in a matrix-free manner and on a new technique connected with this factorization for preconditioning by incomplete LU factors in matrix-free environment. Its main features are storage of only one of the two incomplete factors and low memory costs during the computation of the stored factor. The thesis closes with numerical experiments demonstrating the efficiency of the proposed technique.
|
6 |
Contribuição da atualização da decomposição LU no metodo Simplex / Contribution of the LU factorization update in the Simplex methodCantane, Daniela Renata 14 August 2018 (has links)
Orientadores: Aurelio Ribeiro Leite de Oliveira, Christiano Lyra Filho / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Made available in DSpace on 2018-08-14T10:57:39Z (GMT). No. of bitstreams: 1
Cantane_DanielaRenata.pdf: 1253133 bytes, checksum: 870b16a2b9360f77ebd88f50491d181c (MD5)
Previous issue date: 2009 / Resumo: A solução eficiente de sistemas lineares é fundamental em problemas de otimização linear e o primeiro método a obter êxito nesta classe de problemas foi o método Simplex. Com o objetivo de desenvolver alternativas eficientes para sua implementação, são apresentadas nesta tese técnicas de atualização da decomposição LU da base para aperfeiçoar a solução dos sistemas lineares oriundos do método Simplex, utilizando um reordenamento estático nas colunas da matriz. Uma simulação do método Simplex é implementada, realizando troca de bases obtidas pelo MINOS e verificando sua esparsidade. Somente os elementos afetados pela mudança de base são considerados para obter uma atualização da decomposição LU eficaz. As colunas da matriz são reordenadas de acordo com três estratégias: mínimo grau; forma bloco triangular e estratégia de Björck. Assim, obtém-se uma decomposição esparsa para qualquer base sem esforço computacional para obter a ordem das colunas, pois o reordenamento da matriz é estático e as colunas da base obedecem esta ordem. A forma bloco triangular obteve os melhores resultados, para os maiores problemas testados, em relação ao mínimo grau e a estratégia de Björck. Resultados computacionais para problemas da Netlib mostram a robustez e um bom desempenho computacional do método de atualização da decomposição LU proposto, pois não são necessárias refatorações periódicas da base como nos métodos de atualização tradicionais. O método proposto obteve uma redução do número de elementos não nulos da base em relação ao MINOS. Esta abordagem foi aplicada em problemas de corte de estoque e a atualização da decomposição LU proposta obteve uma redução do tempo computacional na solução destes problemas em relação ao GPLK. / Abstract: Finding efficient solution of linear systems is fundamental in the linear programming problems and the first method to obtain success for this class of problems was the Simplex method. With the objective to develop efficient alternatives to its implementation, techniques of the simplex basis LU factorization update are developed in this thesis to improve the solution of the Simplex method linear systems towards a matrix columns static reordering. A simulation of the Simplex method is implemented, carrying through the change of basis obtained from MINOS and verifying its sparsity. Only the factored columns actually modified by the change of the base are carried through to obtain an efficient LU factorization update. The matrix columns are reordered according to three strategies: minimum degree; block triangular form and the Björck strategy. Thus, sparse factorizations are obtained for any base without computational effort to obtain the order of columns, since the reordering of the matrix is static and base columns follow this ordering. The application of the block triangular form achieved the best results, for larger scale problems tested, in comparison to minimum degree method and the Björck strategy. Computational results for Netlib problems show the robustness of this approach and good computational performance, since there is no need of periodical factorizations as used in traditional updating methods. The proposed method obtained a reduction of the nonzero entries of the basis with respect to MINOS. This approach was applied in the cutting stock problems and the proposed method achieved a reduction of the computational time in the solution of such problems with respect to the GLPK. / Universidade Estadual de Campi / Automação / Doutor em Engenharia Elétrica
|
7 |
Pipelined IEEE-754 Double Precision Floating Point Arithmetic Operators on Virtex FPGA’sPathanjali, Nandini 22 May 2002 (has links)
No description available.
|
8 |
Methods and algorithms for solving linear systems of equations on massively parallel computers / Méthodes et algorithmes pour la résolution des systèmes d'équations linéaires sur les ordinateurs massivement parallèlesDonfack, Simplice 07 March 2012 (has links)
Les processeurs multi-cœurs sont considérés de nos jours comme l'avenir des calculateurs et auront un impact important dans le calcul scientifique. Cette thèse présente une nouvelle approche de résolution des grands systèmes linéaires creux et denses, qui soit adaptée à l'exécution sur les futurs machines pétaflopiques et en particulier celles ayant un nombre important de cœurs. Compte tenu du coût croissant des communications comparé au temps dont les processeurs mettent pour effectuer les opérations arithmétiques, notre approche adopte le principe de minimisation des communications au prix de quelques calculs redondants et utilise plusieurs adaptations pour atteindre de meilleures performances sur les machines multi-cœurs. Nous décomposons le problème à résoudre en plusieurs phases qui sont ensuite mises en œuvre séparément. Dans la première partie, nous présentons un algorithme basé sur le partitionnement d'hypergraphe qui réduit considérablement le remplissage ("fill-in") induit lors de la factorisation LU des matrices creuses non symétriques. Dans la deuxième partie, nous présentons deux algorithmes de réduction de communication pour les factorisations LU et QR qui sont adaptés aux environnements multi-cœurs. La principale contribution de cette partie est de réorganiser les opérations de la factorisation de manière à réduire la sollicitation du bus tout en utilisant de façon optimale les ressources. Nous étendons ensuite ce travail aux clusters de processeurs multi-cœurs. Dans la troisième partie, nous présentons une nouvelle approche d'ordonnancement et d'optimisation. La localité des données et l'équilibrage des charges représentent un sérieux compromis pour le choix des méthodes d'ordonnancement. Sur les machines NUMA par exemple où la localité des données n'est pas une option, nous avons observé qu'en présence de perturbations systèmes (" OS noise"), les performances pouvaient rapidement se dégrader et devenir difficiles à prédire. Pour cela, nous présentons une approche combinant un ordonnancement statique et dynamique pour ordonnancer les tâches de nos algorithmes. Nos résultats obtenues sur plusieurs architectures montrent que tous nos algorithmes sont efficaces et conduisent à des gains de performances significatifs. Nous pouvons atteindre des améliorations de l'ordre de 30 à 110% par rapport aux correspondants de nos algorithmes dans les bibliothèques numériques bien connues de la littérature. / Multicore processors are considered to be nowadays the future of computing, and they will have an important impact in scientific computing. In this thesis, we study methods and algorithms for solving efficiently sparse and dense large linear systems on future petascale machines and in particular these having a significant number of cores. Due to the increasing communication cost compared to the time the processors take to perform arithmetic operations, our approach embrace the communication avoiding algorithm principle by doing some redundant computations and uses several adaptations to achieve better performance on multicore machines.We decompose the problem to solve into several phases that would be then designed or optimized separately. In the first part, we present an algorithm based on hypergraph partitioning and which considerably reduces the fill-in incurred in the LU factorization of sparse unsymmetric matrices. In the second part, we present two communication avoiding algorithms that are adapted to multicore environments. The main contribution of this part is to reorganize the computations such as to reduce bus contention and using efficiently resources. Then, we extend this work for clusters of multi-core processors. In the third part, we present a new scheduling and optimization approach. Data locality and load balancing are a serious trade-off in the choice of the scheduling strategy. On NUMA machines for example, where the data locality is not an option, we have observed that in the presence of noise, performance could quickly deteriorate and become difficult to predict. To overcome this bottleneck, we present an approach that combines a static and a dynamic scheduling approach to schedule the tasks of our algorithms.Our results obtained on several architectures show that all our algorithms are efficient and lead to significant performance gains. We can achieve from 30 up to 110% improvement over the corresponding routines of our algorithms in well known libraries.
|
9 |
Jack Rabbit : an effective Cell BE programming system for high performance parallelismEllis, Apollo Isaac Orion 08 July 2011 (has links)
The Cell processor is an example of the trade-offs made when designing a mass market power efficient multi-core machine, but the machine-exposing architecture and raw communication mechanisms of Cell are hard to manage for a programmer. Cell's design is simple and causes software complexity to go up in the areas of achieving low threading overhead, good bandwidth efficiency, and load balance. Several attempts have been made to produce efficient and effective programming systems for Cell, but the attempts have been too specialized and thus fall short. We present Jack Rabbit, an efficient thread pool work queue implementation, with load balancing mechanisms and double buffering. Our system incurs low threading overhead, gets good load balance, and achieves bandwidth efficiency. Our system represents a step towards an effective way to program Cell and any similar current or future processors. / text
|
10 |
Sistemas lineares: métodos de eliminação de Gauss e fatoração LU / Linear systems: methods of gaussian eliminationand LU factorizationAssis, Carmencita Ferreira Silva 20 March 2014 (has links)
Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2015-05-07T13:34:15Z
No. of bitstreams: 2
Dissertação - Carmencita Ferreira Silva Assis - 2014.pdf: 1032992 bytes, checksum: dcfbc22b53a2352c6e65a7615ffb72b5 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2015-05-07T13:40:12Z (GMT) No. of bitstreams: 2
Dissertação - Carmencita Ferreira Silva Assis - 2014.pdf: 1032992 bytes, checksum: dcfbc22b53a2352c6e65a7615ffb72b5 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Made available in DSpace on 2015-05-07T13:40:12Z (GMT). No. of bitstreams: 2
Dissertação - Carmencita Ferreira Silva Assis - 2014.pdf: 1032992 bytes, checksum: dcfbc22b53a2352c6e65a7615ffb72b5 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Previous issue date: 2014-03-20 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This work aims to present te
hniques for solving systems of linear equations, in its
traditional formulation, where it sought to explore the referen
es
ommonly used in
ourses in linear algebra and numeri
al
omputation, fo
using on the dire
t methods of
Gauss elimination and LU fa
torization. Troubleshooters established in the literature
are
ondu
ted, in order to illustrate the operation and appli
ation of su
h methods to
real problems, thus highlighting the possibility of inserting them in high s
hool. The
ontents were treated and exposed so that exemplify the diversity of areas in
luding
linear systems, su
h as engineering, e
onomi
s and biology, showing the gains that
an
be a
hieved by students if they have
onta
t with the methods as soon as possible.
At the end we suggest the use of
omputational resour
es in math
lasses, sin
e the
redu
tion of time spent in algebrai
manipulation will allow the tea
her to deepen the
on
epts and to address larger systems, to enhan
e the resolution perspe
tive, and
motivate the student in the learning pro
ess. / Este trabalho tem por objetivo apresentar té
ni
as de resolução de sistemas de
equações lineares, em sua formulação tradi
ional, onde se bus
ou explorar as referên
ias
usualmente utilizadas em
ursos de álgebra linear e
ál
ulo numéri
o, enfo
ando os
métodos diretos de Eliminação de Gauss e Fatoração LU. Resoluções de problemas
onsolidados na literatura são realizadas,
om a nalidade de ilustrar o fun
ionamento
e apli
ação de tais métodos em problemas reais, desta
ando assim a possibilidade de
inserção dos mesmos no Ensino Médio. Os
onteúdos foram tratados e expostos de
modo que exempli quem a diversidade de áreas que abrangem os sistemas lineares, tais
omo engenharia, e
onomia e biologia, mostrando os ganhos que podem ser al
ançados
pelos alunos, se tiverem
ontato
om os métodos o quanto antes. Ao nal sugere-
se a utilização de re
ursos
omputa
ionais nas aulas de matemáti
a, uma vez que a
redução do tempo empregado na manipulação algébri
a permitirá que o professor possa
aprofundar os
on
eitos e abordar sistemas de maior porte, que ampliem a perspe
tiva
de resolução, além de motivar o aluno no pro
esso de aprendizagem.
|
Page generated in 0.1397 seconds