  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Sobre um método assemelhado ao de Francis para a determinação de autovalores de matrizes / On a Francis-like method for determining the eigenvalues of matrices

Oliveira, Danilo Elias de [UNESP] 23 February 2006 (has links) (PDF)
The main purpose of this work is to present, to discuss the qualities and performance of, and to prove the convergence of an iterative method for the numerical solution of the matrix eigenvalue problem, which we call the Método Assemelhado ao de Francis (MAF, "Francis-like method"). The method differs from Francis's QR method in its simpler and faster way of obtaining the orthogonal matrices Qk, k = 1, 2, ... We also present a comparison between the MAF and the QR algorithm of Francis and the LR algorithm of Rutishauser.
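As background for the entry above, here is a minimal numpy sketch of the basic (unshifted) QR iteration that Francis's method refines. This is not the MAF itself (the thesis's contribution is precisely a different construction of the orthogonal factors); the matrix and iteration count are illustrative:

```python
import numpy as np

def qr_iteration(A, iters=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k is similar to A_k, and for
    suitable matrices converges to a triangular matrix whose diagonal
    holds the eigenvalues of A."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q  # similarity transform: Q^T A_k Q
    return np.sort(np.diag(Ak))

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
print(qr_iteration(A))                  # approximate eigenvalues of A
print(np.sort(np.linalg.eigvalsh(A)))   # reference values
```

Francis's practical algorithm adds shifts and implicit bulge-chasing on Hessenberg form; the plain iteration above only shows the RQ-recombination idea that both it and the MAF build on.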
22

Sluggish Cognitive Tempo: Stability, Validity, and Heritability

Vu, Alexander 01 June 2016 (has links)
No description available.
23

Accelerated sampling of energy landscapes

Mantell, Rosemary Genevieve January 2017 (has links)
In this project, various computational energy landscape methods were accelerated using graphics processing units (GPUs). Basin-hopping global optimisation was treated using a version of the limited-memory BFGS algorithm adapted for CUDA, in combination with GPU-acceleration of the potential calculation. The Lennard-Jones potential was implemented using CUDA, and an interface to the GPU-accelerated AMBER potential was constructed. These results were then extended to form the basis of a GPU-accelerated version of hybrid eigenvector-following. The doubly-nudged elastic band method was also accelerated using an interface to the potential calculation on GPU. Additionally, a local rigid body framework was adapted for GPU hardware. Tests were performed for eight biomolecules represented using the AMBER potential, ranging in size from 81 to 22,811 atoms, and the effects of minimiser history size and local rigidification on the overall efficiency were analysed. Improvements relative to CPU performance of up to two orders of magnitude were obtained for the largest systems. These methods have been successfully applied to both biological systems and atomic clusters. An existing interface between a code for free energy basin-hopping and the SuiteSparse package for sparse Cholesky factorisation was refined, validated and tested. Tests were performed for both Lennard-Jones clusters and selected biomolecules represented using the AMBER potential. Significant acceleration of the vibrational frequency calculations was achieved, with negligible loss of accuracy, relative to the standard diagonalisation procedure. For the larger systems, exploiting sparsity reduces the computational cost by factors of 10 to 30. The acceleration of these computational energy landscape methods opens up the possibility of investigating much larger and more complex systems than previously accessible. A wide array of new applications is now computationally feasible.
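The Lennard-Jones potential mentioned above is simple to state. A minimal CPU-side numpy sketch of the cluster energy (not the thesis's CUDA implementation; reduced units with epsilon = sigma = 1 are assumed):

```python
import numpy as np

def lj_energy(coords):
    """Total Lennard-Jones energy of an N-atom cluster in reduced units:
    E = sum over pairs of 4 * (r^-12 - r^-6).
    coords: (N, 3) array of Cartesian positions."""
    d = coords[:, None, :] - coords[None, :, :]      # pairwise displacement
    r2 = (d ** 2).sum(-1)                            # squared distances
    iu = np.triu_indices(len(coords), k=1)           # each pair counted once
    inv6 = 1.0 / r2[iu] ** 3                         # r^-6
    return float(np.sum(4.0 * (inv6 ** 2 - inv6)))

# A dimer at the known minimum separation r* = 2**(1/6) has energy exactly -1.
dimer = np.array([[0.0, 0.0, 0.0],
                  [2 ** (1 / 6), 0.0, 0.0]])
print(lj_energy(dimer))  # -1.0
```

In basin-hopping, a function like this would be minimised repeatedly from randomly perturbed coordinates; the thesis offloads both the potential and the L-BFGS minimiser to the GPU.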
24

Memory-aware Algorithms and Scheduling Techniques for Matrix Computations / Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel

Herrmann, Julien 25 November 2015 (has links)
Throughout this thesis, we have designed memory-aware algorithms and scheduling techniques suited for modern memory architectures. We have shown special interest in improving the performance of matrix computations on multiple levels. At a high level, we have introduced new numerical algorithms for solving linear systems on large distributed platforms; in particular, we studied intelligently alternating LU factorization steps (faster) with QR factorization steps (numerically more stable, but more than twice as costly) when solving a dense linear system. Most of the time, these linear solvers rely on runtime systems to handle resource allocation and data management. We also focused on improving the dynamic schedulers embedded in these runtime systems by adding static information to their decision process, such as precomputations over the whole task graph of the Cholesky factorization that account for the heterogeneity of the architecture. We proposed new memory-aware dynamic heuristics to schedule workflows that could be implemented in such runtime systems. Altogether, we have dealt with multiple state-of-the-art factorization algorithms used to solve linear systems, such as the LU, QR and Cholesky factorizations. We targeted different platforms ranging from multicore processors to distributed-memory clusters, and worked with several reference runtime systems tailored for these architectures, such as PaRSEC and StarPU. On the theoretical side, we took special care to model convoluted hierarchical memory architectures, including task graphs with large input and output files on a heterogeneous architecture with two types of resources, each with its own memory. We have classified the problems that arise when dealing with these storage platforms, and designed many efficient polynomial-time heuristics for general problems that had been shown NP-complete beforehand. Finally, we designed optimal algorithms for scheduling an automatic-differentiation task graph on a platform with two types of memory: one free but limited, the other costly but unlimited.
25

QR與LR算則之位移策略 / On the shift strategies for the QR and LR algorithms

黃義哲, HUANG, YI-ZHE Unknown Date (has links)
The QR and LR algorithms are the general methods for computing the eigenvalues and eigenvectors of a dense matrix, and shift strategies, of which the Wilkinson shift is the most effective, have been proposed to accelerate their convergence. In this paper we look for shifts that give even faster convergence. We first try using an eigenvalue of a trailing 3×3 submatrix as the shift for a QR iteration, choosing the eigenvalue of this submatrix closest to the Wilkinson shift. Another strategy is to first compute the eigenvalues of the matrix (or its trailing submatrix) with a faster and more economical algorithm, and then use these computed values as shifts in the QR iteration to obtain the more costly eigenvectors. Among such eigenvalue algorithms, we chose the Cholesky iteration for its computational simplicity and speed. Our experiments show that, when incorporated into the QR algorithm, these two shift strategies save about 10% and 30%, respectively, of the arithmetic work of the EISPACK routines. We compare these strategies and report the results.
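The baseline that the abstract's strategies compete with can be sketched in numpy: a Wilkinson-shifted QR iteration on a symmetric tridiagonal matrix. This is the classical construction, not the thesis's own code, and the test matrix and iteration count are illustrative:

```python
import numpy as np

def wilkinson_shift(T):
    """Wilkinson shift: the eigenvalue of the trailing 2x2 block of T
    closest to the last diagonal entry."""
    a, b = T[-2, -2], T[-2, -1]
    c = T[-1, -1]
    d = (a - c) / 2.0
    den = abs(d) + np.hypot(d, b)
    if den == 0.0:
        return c
    sign = 1.0 if d >= 0 else -1.0
    return c - sign * b * b / den

def shifted_qr(T, iters=50):
    """Shifted QR iteration on a symmetric matrix: each step is a
    similarity transform, and the shift accelerates convergence of the
    trailing diagonal entry toward an eigenvalue."""
    T = np.array(T, dtype=float)
    I = np.eye(len(T))
    for _ in range(iters):
        mu = wilkinson_shift(T)
        Q, R = np.linalg.qr(T - mu * I)
        T = R @ Q + mu * I
    return np.sort(np.diag(T))

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(shifted_qr(A))                    # approximate eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))   # reference values
```

A production routine would add deflation (splitting off converged trailing entries); the abstract's proposal replaces the shift computation itself with precomputed eigenvalue estimates.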
26

多維異質變異模型於結構型商品評價上之應用研究 / An application of multivariate conditional heteroscedastic models to the valuation of structured products

王俊欽 Unknown Date (has links)
Structured products have proliferated in recent years. For equity-linked products, payoffs tied to several underlying assets often make closed-form solutions hard to obtain. Valuing such products therefore typically requires simulating the future paths of each underlying stock price (e.g., Monte Carlo simulation) and discounting the expected future cash flows. Because the underlying prices are mutually correlated, the simulation applies a Cholesky decomposition to their correlation matrix so that correlated multivariate normal random variables can be constructed from independent normal random variables. Historical data and empirical studies show that both the correlation matrix and the volatilities of stock returns are time-varying rather than constant. Instead of plugging sample variances and sample correlations computed from historical data directly into the simulation, this thesis uses multivariate conditional heteroscedastic models from time-series analysis, also known as multivariate volatility models, to forecast the correlation matrix and volatilities of the linked assets at each time point over the product's life, and uses these forecasts as the simulation inputs. Applying the volatility models to the valuation of two multi-asset equity-linked notes issued in China, we find that because the forecast volatilities and correlations exhibit mean reversion, the resulting valuations differ little from those obtained with historical volatilities and correlations. We therefore conclude that, for similar problems, parameters estimated directly from historical data can be fed into the simulation procedure. Keywords: volatility models, Cholesky decomposition, structured product valuation, Monte Carlo method.
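The Cholesky step described above, turning independent standard normals into correlated draws, can be sketched in a few lines of numpy (the correlation matrix below is illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative correlation matrix for three underlying assets.
corr = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])

L = np.linalg.cholesky(corr)            # corr = L @ L.T
z = rng.standard_normal((100_000, 3))   # independent N(0, 1) draws
x = z @ L.T                             # correlated N(0, corr) draws

# The sample correlation of x recovers the target matrix.
print(np.corrcoef(x, rowvar=False).round(2))
```

In a full pricer these correlated normals would drive the joint stock-price paths (e.g. geometric Brownian motion increments), with the thesis's contribution being how `corr` and the volatilities are forecast rather than held fixed.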
27

Longitudinal data analysis with covariates measurement error

Hoque, Md. Erfanul 05 January 2017 (has links)
Longitudinal data occur frequently in medical studies, and covariates measured with error are a typical feature of such data. Generalized linear mixed models (GLMMs) are commonly used to analyse longitudinal data. It is typically assumed in these models that the random effects covariance matrix is constant across subjects. In many situations, however, this correlation structure may differ among subjects, and ignoring this heterogeneity can cause biased estimates of model parameters. In this thesis, following Lee et al. (2012), we propose an approach to properly model the random effects covariance matrix in terms of covariates in the class of GLMMs where covariates are also measured with error. The resulting parameters from this decomposition have a sensible interpretation and can easily be modelled without concern for the positive definiteness of the resulting estimator. The performance of the proposed approach is evaluated through simulation studies, which show that the proposed method performs very well in terms of biases and mean square errors as well as coverage rates. The proposed method is also illustrated using data from the Manitoba Follow-up Study.
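A standard way to get unconstrained covariance parameters of the kind the abstract alludes to is the modified Cholesky decomposition (Pourahmadi's parameterization, which the Lee et al. line of work builds on). A minimal numpy sketch, with illustrative parameter values and the exact decomposition convention assumed rather than taken from the thesis:

```python
import numpy as np

def cov_from_modified_cholesky(phi, log_d):
    """Build a covariance matrix from unconstrained parameters via the
    modified Cholesky decomposition T @ Sigma @ T.T = D, i.e.
    Sigma = T^{-1} @ D @ T^{-T}. phi fills the strict lower triangle of
    the unit lower-triangular T (generalized autoregressive parameters);
    log_d holds the log innovation variances. Any real-valued phi and
    log_d yield a symmetric positive-definite Sigma by construction."""
    q = len(log_d)
    T = np.eye(q)
    T[np.tril_indices(q, k=-1)] = phi
    D = np.diag(np.exp(log_d))
    Tinv = np.linalg.inv(T)
    return Tinv @ D @ Tinv.T

# Illustrative parameters for a 3x3 random-effects covariance matrix;
# in the modelling context phi and log_d would be regressed on covariates.
Sigma = cov_from_modified_cholesky(phi=np.array([0.8, -0.3, 0.5]),
                                   log_d=np.array([0.0, -0.5, 0.2]))
print(np.linalg.eigvalsh(Sigma))  # all positive: Sigma is positive definite
```

Because `phi` and `log_d` are unconstrained, they can be modelled as linear functions of subject-level covariates without any positive-definiteness constraint on the fitted estimator, which is the "sensible interpretation" property the abstract mentions.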
28

Athapascan-1 : vers un modèle de programmation parallèle adapté au calcul scientifique / Athapascan-1: towards a parallel programming model suited to scientific computing

Doreille, Mathias 14 December 1999 (has links) (PDF)
Parallel computers offer an attractive alternative for scientific computing applications, which are heavy consumers of computing resources and memory. However, programming these machines efficiently is often difficult, and the resulting implementations are generally poorly portable. In this thesis we propose a parallel programming model that allows simple, portable and efficient programming of parallel applications. The model is based on an explicit decomposition of the application into computational tasks that communicate through objects in shared memory. The semantics of accesses to shared data is quasi-sequential, and the precedences between tasks are defined implicitly so as to respect this semantics. In the first part we present the implementation of this programming model in the C++ application interface Athapascan-1. A runtime analysis of the data dependencies between tasks extracts the data flow, and hence the precedences between the tasks to be executed. Scheduling algorithms adaptable to the application and to the target machine are also used. We show how, on distributed architectures, knowledge of the data flow between tasks can be used by the system to reduce communications and manage the distributed shared memory efficiently. This programming model and its implementation in the Athapascan-1 interface are then validated experimentally on various architectures and various linear algebra applications, notably sparse Cholesky factorization with two-dimensional partitioning. The ease of programming these applications with the interface and the results obtained (for example, improved performance relative to the dense Cholesky factorization code of the ScaLAPACK library on a 60-processor machine) confirm the interest of the proposed programming model.
29

Acceleration of Massive MIMO algorithms for Beyond 5G Baseband processing

Nihl, Ellen, de Bruijckere, Eek January 2023 (has links)
As the world becomes more globalised, user equipment such as smartphones and Internet of Things devices requires increasingly more data, which increases the demand for wireless data traffic. Hence, work on next-generation networks (5G and beyond) focuses mainly on increasing the bit rate and decreasing the latency. A crucial technology for 5G and beyond is massive MIMO. In a massive MIMO system, a detector processes the signals received on multiple antennas to decode the transmitted data. This has been implemented in many ways, and one of the most widely used algorithms is Zero Forcing (ZF). This thesis presents a novel parallel design that accelerates the ZF algorithm using the Cholesky decomposition. It is implemented on a GPU in the CUDA programming language, compared with existing state-of-the-art implementations in terms of latency and throughput, and validated against a MATLAB implementation. This research demonstrates promising performance for massive MIMO detection algorithms on GPUs. Our approach achieves a significant speedup factor of 350 in comparison to a serial version of the implementation, and a throughput 160 times greater than a comparable GPU-based approach; it does, however, reach a 2.4 times lower throughput than a solution that employed application-specific hardware. Given the promising results, we advocate continued research in this area to further optimise detection algorithms and enhance their performance on GPUs, potentially achieving even higher throughput and lower latency.
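The Cholesky-based ZF detection described above solves x = (H^H H)^{-1} H^H y without forming the inverse explicitly. A minimal numpy sketch (CPU-side, not the thesis's CUDA kernel; the antenna counts and noiseless channel are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def zf_detect(H, y):
    """Zero-forcing detection via Cholesky: factor the Gram matrix
    G = H^H H = L L^H, then solve L z = H^H y (forward substitution)
    and L^H x = z (back substitution) instead of inverting G."""
    G = H.conj().T @ H                      # Hermitian positive definite
    L = np.linalg.cholesky(G)
    b = H.conj().T @ y                      # matched-filter output
    z = np.linalg.solve(L, b)
    return np.linalg.solve(L.conj().T, z)

# Illustrative setup: 64 base-station antennas, 8 single-antenna users,
# i.i.d. complex Gaussian channel, noiseless receive for clarity.
H = (rng.standard_normal((64, 8)) + 1j * rng.standard_normal((64, 8))) / np.sqrt(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
x = qpsk[rng.integers(0, 4, size=8)]        # transmitted QPSK symbols
y = H @ x
print(np.allclose(zf_detect(H, y), x))      # True: exact recovery without noise
```

The two triangular solves are what a GPU implementation parallelises; with noise added, the output would be sliced to the nearest constellation point rather than recovered exactly.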
30

Recursive Blocked Algorithms, Data Structures, and High-Performance Software for Solving Linear Systems and Matrix Equations

Jonsson, Isak January 2003 (has links)
This thesis deals with the development of efficient and reliable algorithms and library software for factorizing matrices and solving matrix equations on high-performance computer systems. The architectures of today's computers consist of multiple processors, each with multiple functional units. The memory systems are hierarchical, with several levels, each having a different speed and size. The practical peak performance of a system is reached only by considering all of these characteristics. One portable method for achieving good system utilization is to express a linear algebra problem in terms of level 3 BLAS (Basic Linear Algebra Subprogram) operations. The most important operation is GEMM (GEneral Matrix Multiply), which typically defines the practical peak performance of a computer system. Efficient GEMM implementations are available for almost any platform, so an algorithm using this operation is highly portable.

The dissertation focuses on how recursion can be applied to solve linear algebra problems. Recursive linear algebra algorithms have the potential to automatically match the size of subproblems to the different memory hierarchies, leading to much better utilization of the memory system. Furthermore, recursive algorithms expose level 3 BLAS operations and reveal task parallelism. The first paper handles the Cholesky factorization for matrices stored in packed format. Our algorithm uses a recursive packed matrix data layout that enables the use of high-performance matrix-matrix multiplication, in contrast to the standard packed format. The resulting library routine requires half the memory of full storage, yet its performance is better than that of full storage routines.

Papers two and three introduce recursive blocked algorithms for solving triangular Sylvester-type matrix equations. For these problems, recursion together with superscalar kernels produces new algorithms that give 10-fold speedups compared to existing routines in the SLICOT and LAPACK libraries. We show that our recursive algorithms also have a significant impact on the execution time of solving unreduced problems and when used in condition estimation. By recursively splitting several problem dimensions simultaneously, parallel algorithms for shared memory systems are obtained. The fourth paper introduces a library, RECSY, consisting of a set of routines implemented in Fortran 90 using the ideas presented in papers two and three. Using performance monitoring tools, the last paper evaluates the possible gain of using different matrix blocking layouts and the impact of superscalar kernels in the RECSY library.
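The recursive blocking idea described above can be illustrated with a small numpy sketch of a recursive Cholesky factorization (a generic textbook formulation, not the thesis's Fortran 90 code; the base-case size and test matrix are illustrative):

```python
import numpy as np

def recursive_cholesky(A, block=64):
    """Recursive blocked Cholesky: split A into 2x2 blocks, factor the
    leading block, compute the off-diagonal panel via a triangular solve,
    update the trailing Schur complement (the GEMM-dominated step), and
    recurse. Subproblems shrink automatically to fit cache levels."""
    n = len(A)
    if n <= block:
        return np.linalg.cholesky(A)       # base case: unblocked factorization
    m = n // 2
    L11 = recursive_cholesky(A[:m, :m], block)
    # Solve L21 @ L11.T = A[m:, :m], i.e. L11 @ L21.T = A[:m, m:].
    L21 = np.linalg.solve(L11, A[:m, m:]).T
    # Schur complement update: trailing block minus the rank-m correction.
    S = A[m:, m:] - L21 @ L21.T
    L22 = recursive_cholesky(S, block)
    L = np.zeros_like(A)
    L[:m, :m], L[m:, :m], L[m:, m:] = L11, L21, L22
    return L

# Illustrative symmetric positive-definite test matrix.
rng = np.random.default_rng(2)
B = rng.standard_normal((200, 200))
A = B @ B.T + 200.0 * np.eye(200)
L = recursive_cholesky(A, block=32)
print(np.allclose(L @ L.T, A))  # True
```

In a high-performance version the triangular solve and Schur update map onto level 3 BLAS (TRSM and SYRK/GEMM), which is where the recursive formulation gets its speed; the numpy calls stand in for those kernels here.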
