21 |
Characterization and Enhancement of Data Locality and Load Balancing for Irregular ApplicationsNiu, Qingpeng 14 May 2015 (has links)
No description available.
|
22 |
Paralelização de inferência em redes credais utilizando computação distribuída para fatoração de matrizes esparsas / Parallelization of credal network inference using distributed computing for sparse matrix factorization.Pereira, Ramon Fortes 25 April 2017 (has links)
Este estudo tem como objetivo melhorar o desempenho computacional dos algoritmos de inferência em redes credais, aplicando técnicas de computação paralela e sistemas distribuídos em algoritmos de fatoração de matrizes esparsas. Grosso modo, técnicas de computação paralela são técnicas para transformar um sistema em um sistema com algoritmos que possam ser executados concorrentemente. E a fatoração de matrizes são técnicas da matemática para decompor uma matriz em um produto de duas ou mais matrizes. As matrizes esparsas são matrizes que possuem a maioria de seus valores iguais a zero. E as redes credais são semelhantes as redes bayesianas, que são grafos acíclicos que representam uma probabilidade conjunta através de probabilidades condicionais e suas relações de independência. As redes credais podem ser consideradas como uma extensão das redes bayesianas para lidar com incertezas ou a má qualidade dos dados. Para aplicar a técnica de paralelização de fatoração de matrizes esparsas na inferência de redes credais, a inferência utiliza-se da técnica de eliminação de variáveis onde o grafo acíclico da rede credal é associado a uma matriz esparsa e cada variável eliminada é análoga a eliminação de uma coluna. / This study\'s objective is the computational performance improvement of credal network inference algorithms by applying computational parallel and distributed system techniques of sparse matrix factorization algorithms. Roughly, computational parallel techniques are used to transform systems in systems with algorithms that can be executed concurrently. And the matrix factorization is a group of mathematical techniques to decompose a matrix in a product of two or more matrixes. The sparse matrixes are matrixes which have most of their values equal to zero. And credal networks are similar to Bayesian networks, which are acyclic graphs representing a joint probability through conditional probabilities and their independence relations. Credal networks can be considered as a Bayesian network extension because of their manner of leading to uncertainty and the poor data quality. To apply parallel techniques of sparse matrix factorization in credal network inference the variable elimination method was used, where the credal network acyclic graph is associated to a sparse matrix and every eliminated variable is analogous to an eliminated column.
|
23 |
Paralelização de inferência em redes credais utilizando computação distribuída para fatoração de matrizes esparsas / Parallelization of credal network inference using distributed computing for sparse matrix factorization.Ramon Fortes Pereira 25 April 2017 (has links)
Este estudo tem como objetivo melhorar o desempenho computacional dos algoritmos de inferência em redes credais, aplicando técnicas de computação paralela e sistemas distribuídos em algoritmos de fatoração de matrizes esparsas. Grosso modo, técnicas de computação paralela são técnicas para transformar um sistema em um sistema com algoritmos que possam ser executados concorrentemente. E a fatoração de matrizes são técnicas da matemática para decompor uma matriz em um produto de duas ou mais matrizes. As matrizes esparsas são matrizes que possuem a maioria de seus valores iguais a zero. E as redes credais são semelhantes as redes bayesianas, que são grafos acíclicos que representam uma probabilidade conjunta através de probabilidades condicionais e suas relações de independência. As redes credais podem ser consideradas como uma extensão das redes bayesianas para lidar com incertezas ou a má qualidade dos dados. Para aplicar a técnica de paralelização de fatoração de matrizes esparsas na inferência de redes credais, a inferência utiliza-se da técnica de eliminação de variáveis onde o grafo acíclico da rede credal é associado a uma matriz esparsa e cada variável eliminada é análoga a eliminação de uma coluna. / This study\'s objective is the computational performance improvement of credal network inference algorithms by applying computational parallel and distributed system techniques of sparse matrix factorization algorithms. Roughly, computational parallel techniques are used to transform systems in systems with algorithms that can be executed concurrently. And the matrix factorization is a group of mathematical techniques to decompose a matrix in a product of two or more matrixes. The sparse matrixes are matrixes which have most of their values equal to zero. And credal networks are similar to Bayesian networks, which are acyclic graphs representing a joint probability through conditional probabilities and their independence relations. Credal networks can be considered as a Bayesian network extension because of their manner of leading to uncertainty and the poor data quality. To apply parallel techniques of sparse matrix factorization in credal network inference the variable elimination method was used, where the credal network acyclic graph is associated to a sparse matrix and every eliminated variable is analogous to an eliminated column.
|
24 |
Unconditionally stable finite difference time domain methods for frequency dependent mediaRouf, Hasan January 2010 (has links)
The efficiency of the conventional, explicit finite difference time domain (FDTD)method is constrained by the upper limit on the temporal discretization, imposed by the Courant–Friedrich–Lewy (CFL) stability condition. Therefore, there is a growing interest in overcoming this limitation by employing unconditionally stable FDTD methods for which time-step and space-step can be independently chosen. Unconditionally stable Crank Nicolson method has not been widely used in time domain electromagnetics despite its high accuracy and low anisotropy. There has been no work on the Crank Nicolson FDTD (CN–FDTD) method for frequency dependent medium. In this thesis a new three-dimensional frequency dependent CN–FDTD (FD–CN–FDTD) method is proposed. Frequency dependency of single–pole Debye materials is incorporated into the CN–FDTD method by means of an auxiliary differential formulation. In order to provide a convenient and straightforward algorithm, Mur’s first-order absorbing boundary conditions are used in the FD–CN–FDTD method. Numerical tests validate and confirm that the FD–CN–FDTD method is unconditionally stable beyond the CFL limit. The proposed method yields a sparse system of linear equations which can be solved by direct or iterative methods, but numerical experiments demonstrate that for large problems of practical importance iterative solvers are to be used. The FD–CN–FDTD sparse matrix is diagonally dominant when the time-stepis near the CFL limit but the diagonal dominance of the matrix deteriorates with the increase of the time-step, making the solution time longer. Selection of the matrix solver to handle the FD–CN–FDTD sparse system is crucial to fully harness the advantages of using larger time-step, because the computational costs associated with the solver must be kept as low as possible. Two best–known iterative solvers, Bi-Conjugate Gradient Stabilised (BiCGStab) and Generalised Minimal Residual (GMRES), are extensively studied in terms of the number of iteration requirements for convergence, CPU time and memory requirements. BiCGStab outperforms GMRES in every aspect. Many of these findings do not match with the existing literature on frequency–independent CN–FDTD method and the possible reasons for this are pointed out. The proposed method is coded in Fortran and major implementation techniques of the serial code as well as its parallel implementation in Open Multi-Processing (OpenMP) are presented. As an application, a simulation model of the human body is developed in the FD–CN–FDTD method and numerical simulation of the electromagnetic wave propagation inside the human head is shown. Finally, this thesis presents a new method modifying the frequency dependent alternating direction implicit FDTD (FD–ADI–FDTD) method. Although the ADI–FDTD method provides a computationally affordable approximation of the CN–FDTD method, it exhibits a loss of accuracy with respect to the CN-FDTD method which may become severe for some practical applications. The modified FD–ADI–FDTD method can improve the accuracy of the normal FD–ADI–FDTD method without significantly increasing the computational costs.
|
25 |
Akcelerace numerického výpočtu vedení tepla v tuhých tělesech v inverzních úlohách / Acceleration of numerical computation of heat conduction in solids in inverse tasksOndruch, Tomáš January 2019 (has links)
The master's thesis deals with possible ways of accelerating numerical computations, which are present in problems related to heat conduction in solids. The thesis summarizes basic characteristics of heat transfer phenomena with emphasis on heat conduction. Theoretical principles of control volume method are utilized to convert a direct heat conduction problem into a sparse linear system. Relevant fundamentals from the field of inverse heat conduction problems are presented with reference to intensive computations of direct problems of such kind. Numerical methods which are well-suited to find a solution of direct heat conduction problems are described. Remarks on practical implementation of time-efficient computations are made in relation with a two-dimensional heat conduction model. The results are compared and discussed with respect to obtained computational time for several tested methods.
|
26 |
Representation and Efficient Computation of Sparse Matrix for Neural Networks in Customized HardwareYan, Lihao January 2022 (has links)
Deep Neural Networks are widely applied to various kinds of fields nowadays. However, hundreds of thousands of neurons in each layer result in intensive memory storage requirement and a massive number of operations, making it difficult to employ deep neural networks on mobile devices where the hardware resources are limited. One common technique to address the memory limitation is to prune and quantize the neural networks. Besides, due to the frequent usage of Rectified Linear Unit (ReLU) function or network pruning, majority of the data in the weight matrices will be zeros, which will not only take up a large amount of memory space but also cause unnecessary computation operations. In this thesis, a new value-based compression method is put forward to represent sparse matrix more efficiently by eliminating these zero elements, and a customized hardware is implemented to realize the decompression and computation operations. The value-based compression method is aimed to replace the nonzero data in each column of the weight matrix with a reference value (arithmetic mean) and the relative differences between each nonzero element and the reference value. Intuitively, the data stored in each column is likely to contain similar values. Therefore, the differences will have a narrow range, and fewer bits rather than the full form will be sufficient to represent all the differences. In this way, the weight matrix can be further compressed to save memory space. The proposed value-based compression method reduces the memory storage requirement for the fully-connected layers of AlexNet to 37%, 41%, 47% and 68% of the compressed model, e.g., the Compressed Sparse Column (CSC) format, when the data size is set to 8 bits and the sparsity is 20%, 40%, 60% and 80% respectively. In the meanwhile, 41%, 53% and 63% compression rates of the fully-connected layers of the compressed AlexNet model with respect to 8-bit, 16-bit and 32-bit data are achieved when the sparsity is 40%. Similar results are obtained for VGG16 experiment. / Djupa neurala nätverk används i stor utsträckning inom olika fält nuförtiden. Emellertid ställer hundratusentals neuroner per lager krav på intensiv minneslagring och ett stort antal operationer, vilket gör det svårt att använda djupa neurala nätverk på mobila enheter där hårdvaruresurserna är begränsade. En vanlig teknik för att hantera minnesbegränsningen är att beskära och kvantifiera de neurala nätverken. På grund av den frekventa användningen av Rectified Linear Unit (ReLU) -funktionen eller nätverksbeskärning kommer majoriteten av datat i viktmatriserna att vara nollor, vilket inte bara tar upp mycket minnesutrymme utan också orsakar onödiga beräkningsoperationer. I denna avhandling presenteras en ny värdebaserad komprimeringsmetod för att representera den glesa matrisen mer effektivt genom att eliminera dessa nollelement, och en anpassad hårdvara implementeras för att realisera dekompressions- och beräkningsoperationerna. Den värdebaserade komprimeringsmetoden syftar till att ersätta icke-nolldata i varje kolumn i viktmatrisen med ett referensvärde (aritmetiskt medelvärde) och de relativa skillnaderna mellan varje icke-nollelement och referensvärdet. Intuitivt kommer data som lagras i varje kolumn sannolikt att innehålla liknande värden. Därför kommer skillnaderna att ha ett smalt intervall, och färre bitar snarare än den fullständiga formen kommer att räcka för att representera alla skillnader. På så sätt kan viktmatrisen komprimeras ytterligare för att spara minnesutrymme. Den föreslagna värdebaserade komprimeringsmetoden minskar minneslagringskravet för de helt anslutna lagren av AlexNet till 37%, 41%, 47% och 68% av den komprimerade modellen, t.ex. Compressed Sparse Column (CSC) format, när datastorleken är inställd på 8 bitar och sparsiteten är 20%, 40%, 60% respektive 80%. Under tiden uppnås 41%, 53% och 63% komprimeringshastigheter för de helt anslutna lagren i den komprimerade AlexNet-modellen med avseende på 8- bitars, 16-bitars och 32-bitars data när sparsiteten är 40%. Liknande resultat erhålls för VGG16-experiment.
|
27 |
Auto-tuning pour la détection automatique du meilleur format de compression pour matrice creuse / Auto-tuning for the automatic sparse matrix optimal compression format detectionMehrez, Ichrak 27 September 2018 (has links)
De nombreuses applications en calcul scientifique traitent des matrices creuses de grande taille ayant des structures régulières ou irrégulières. Pour réduire à la fois la complexité spatiale et la complexité temporelle du traitement, ces matrices nécessitent l'utilisation d'une structure particulière de stockage (ou de compression) des données ainsi que l’utilisation d’architectures cibles parallèles/distribuées. Le choix du format de compression le plus adéquat dépend généralement de plusieurs facteurs dont, en particulier, la structure de la matrice creuse, l'architecture de la plateforme cible et la méthode numérique utilisée. Etant donné la diversité de ces facteurs, un choix optimisé pour un ensemble de données d'entrée peut induire de mauvaises performances pour un autre. D’où l’intérêt d’utiliser un système permettant la sélection automatique du meilleur format de compression (MFC) en prenant en compte ces différents facteurs. C’est dans ce cadre précis que s’inscrit cette thèse. Nous détaillons notre approche en présentant la modélisation d'un système automatique qui, étant donnée une matrice creuse, une méthode numérique, un modèle de programmation parallèle et une architecture, permet de sélectionner automatiquement le MFC. Dans une première étape, nous validons notre modélisation par une étude de cas impliquant (i) Horner, et par la suite le produit matrice-vecteur creux (PMVC), comme méthodes numériques, (ii) CSC, CSR, ELL, et COO comme formats de compression, (iii) le data parallèle comme modèle de programmation et (iv) une plateforme multicœurs comme architecture cible. Cette étude nous permet d’extraire un ensemble de métriques et paramètres qui influent sur la sélection du MFC. Nous démontrons que les métriques extraites de l'analyse du modèle data parallèle ne suffisent pas pour prendre une décision (sélection du MFC). Par conséquent, nous définissons de nouvelles métriques impliquant le nombre d'opérations effectuées par la méthode numérique et le nombre d’accès à la mémoire. Ainsi, nous proposons un processus de décision prenant en compte à la fois l'analyse du modèle data parallèle et l'analyse de l’algorithme. Dans une deuxième étape, et en se basant sur les données que nous avons extrait précédemment, nous utilisons les algorithmes du Machine Learning pour prédire le MFC d’une matrice creuse donnée. Une étude expérimentale ciblant une plateforme parallèle multicœurs et traitant des matrices creuses aléatoires et/ou provenant de problèmes réels permet de valider notre approche et d’évaluer ses performances. Comme travail futur, nous visons la validation de notre approche en utilisant d'autres plateformes parallèles telles que les GPUs. / Several applications in scientific computing deals with large sparse matrices having regular or irregular structures. In order to reduce required memory space and computing time, these matrices require the use of a particular data storage structure as well as the use of parallel/distributed target architectures. The choice of the most appropriate compression format generally depends on several factors, such as matrix structure, numerical method and target architecture. Given the diversity of these factors, an optimized choice for one input data set will likely have poor performances on another. Hence the interest of using a system allowing the automatic selection of the Optimal Compression Format (OCF) by taking into account these different factors. This thesis is written in this context. We detail our approach by presenting a design of an auto-tuner system for OCF selection. Given a sparse matrix, a numerical method, a parallel programming model and an architecture, our system can automatically select the OCF. In a first step, we validate our modeling by a case study that concerns (i) Horner scheme, and then the sparse matrix vector product (SMVP), as numerical methods, (ii) CSC, CSR, ELL, and COO as compression formats; (iii) data parallel as a programming model; and (iv) a multicore platform as target architecture. This study allows us to extract a set of metrics and parameters that affect the OCF selection. We note that data parallel metrics are not sufficient to accurately choose the most suitable format. Therefore, we define new metrics involving the number of operations and the number of indirect data access. Thus, we proposed a new decision process taking into account data parallel model analysis and algorithm analysis.In the second step, we propose to use machine learning algorithm to predict the OCF for a given sparse matrix. An experimental study using a multicore parallel platform and dealing with random and/or real-world random matrices validates our approach and evaluates its performances. As future work, we aim to validate our approach using other parallel platforms such as GPUs.
|
28 |
Scalable, Memory-Intensive Scientific Computing on Field Programmable Gate ArraysMirza, Salma 01 January 2010 (has links) (PDF)
Cache-based, general purpose CPUs perform at a small fraction of their maximum floating point performance when executing memory-intensive simulations, such as those required for many scientific computing problems. This is due to the memory bottleneck that is encountered with large arrays that must be stored in dynamic RAM. A system of FPGAs, with a large enough memory bandwidth, and clocked at only hundreds of MHz can outperform a CPU clocked at GHz in terms of floating point performance. An FPGA core designed for a target performance that does not unnecessarily exceed the memory imposed bottleneck can then be distributed, along with multiple memory interfaces, into a scalable architecture that overcomes the bandwidth limitation of a single interface. Interconnected cores can work together to solve a scientific computing problem and exploit a bandwidth that is the sum of the bandwidth available from all of their connected memory interfaces. The implementation demonstrates this concept of scalability with two memory interfaces through the use of available FPGA prototyping platforms. Even though the FPGAs operate at 133 MHz, which is twenty one times slower than an AMD Phenom X4 processor operating at 2.8 GHz, the system of two FPGAs performs eight times slower than the processor for the example problem of SMVM in heat transfer. However, the system is demonstrated to be scalable with a run-time that decreases linearly with respect to the available memory bandwidth. The floating point performance of a single board implementation is 12 GFlops which doubles to 24 GFlops for a two board implementation, for a gather or scatter operation on matrices of varying sizes.
|
29 |
O método multigrid algébrico na resolução de sistemas lineares oriundos do método dos elementos finitos. / The algebric multigrid method for solving linear systems issued from the finite element method.Pereira, Fábio Henrique 14 February 2007 (has links)
Este trabalho propõe uma nova abordagem, baseada em wavelets, para o método Multigrid Algébrico (WAMG). Nesta nova abordagem, a Transformada Discreta Wavelet é aplicada na matriz de coeficientes do sistema linear gerando uma aproximação dessa matriz em cada nível do processo de multiresolução. As vantagens da nova abordagem, que incluem maior facilidade de paralelização e menor tempo de montagem, são apresentadas com detalhes e uma análise quantitativa de convergência do método WAMG é realizada a partir da sua aplicação em problemas testes. O WAMG também é testado como pré- condicionador para métodos iterativos no subespaço de Krylov na análise magnetostática e magnetodinâmica (regime permanente senoidal) pelo Método dos Elementos Finitos, e em matrizes esparsas extraidas das coleções Matrix Market e da Universidade da Flórida. São apresentados resultados numéricos comparando o WAMG com o Multigrid Algébrico tradicional e com os pré-condicionadores baseados em decomposições incompletas de Cholesky e LU. / In this work we propose a wavelet-based algebraic multigrid method (WAMG) as a linear system solver as well as a prediconditioner for Krylov subspace methods. It is a new approach for the Algebraic Multigrid method (AMG), which considers the use of Discrete Wavelet Transform (DWT) in the construction of a hierarchy of matrices. The two-dimensional DWT is applied to produce an approximation of the matrix in each level of the wavelets multiresolution decomposition process. The main advantages of this new approach are presented and a quantitative analysis of its convergence is shown after its application in some test problems. The WAMG also is tested as a preconditioner for Krylov subspace methods in problems with sparse matrices, in nonlinear magnetic field problems and in 3D time-harmonic Electromagnetic Edge-based Finite Element Analysis. Numerical results are presented comparing the WAMG with the standard Algebraic Multigrid method and with the preconditioners based on the incomplete Cholesky and LU decompositions.
|
30 |
O método multigrid algébrico na resolução de sistemas lineares oriundos do método dos elementos finitos. / The algebric multigrid method for solving linear systems issued from the finite element method.Fábio Henrique Pereira 14 February 2007 (has links)
Este trabalho propõe uma nova abordagem, baseada em wavelets, para o método Multigrid Algébrico (WAMG). Nesta nova abordagem, a Transformada Discreta Wavelet é aplicada na matriz de coeficientes do sistema linear gerando uma aproximação dessa matriz em cada nível do processo de multiresolução. As vantagens da nova abordagem, que incluem maior facilidade de paralelização e menor tempo de montagem, são apresentadas com detalhes e uma análise quantitativa de convergência do método WAMG é realizada a partir da sua aplicação em problemas testes. O WAMG também é testado como pré- condicionador para métodos iterativos no subespaço de Krylov na análise magnetostática e magnetodinâmica (regime permanente senoidal) pelo Método dos Elementos Finitos, e em matrizes esparsas extraidas das coleções Matrix Market e da Universidade da Flórida. São apresentados resultados numéricos comparando o WAMG com o Multigrid Algébrico tradicional e com os pré-condicionadores baseados em decomposições incompletas de Cholesky e LU. / In this work we propose a wavelet-based algebraic multigrid method (WAMG) as a linear system solver as well as a prediconditioner for Krylov subspace methods. It is a new approach for the Algebraic Multigrid method (AMG), which considers the use of Discrete Wavelet Transform (DWT) in the construction of a hierarchy of matrices. The two-dimensional DWT is applied to produce an approximation of the matrix in each level of the wavelets multiresolution decomposition process. The main advantages of this new approach are presented and a quantitative analysis of its convergence is shown after its application in some test problems. The WAMG also is tested as a preconditioner for Krylov subspace methods in problems with sparse matrices, in nonlinear magnetic field problems and in 3D time-harmonic Electromagnetic Edge-based Finite Element Analysis. Numerical results are presented comparing the WAMG with the standard Algebraic Multigrid method and with the preconditioners based on the incomplete Cholesky and LU decompositions.
|
Page generated in 0.0656 seconds