About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Factorization of Quasiseparable Matrices

Johnson, Paul D. 21 November 2008 (has links)
This paper investigates some of the ideas and algorithms developed for exploiting the structure of quasiseparable matrices. The case of purely scalar generators is considered initially. The process by which a quasiseparable matrix is represented as the product of matrices built from its generators is explained. This is done explicitly in the scalar case, but it can be extended to block generators. The complete factoring approach is then considered; it consists of two stages: inner-outer factorization followed by inner-coprime factorization. Finally, the stability of the algorithm is investigated. The algorithm is used to factor various quasiseparable matrices R, created first using minimal generators and subsequently using non-minimal generators. The results show that the stability of the algorithm is compromised when non-minimal generators are present.
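For readers unfamiliar with the generator representation mentioned in this abstract, the following is a minimal sketch (my own illustration, not code from the thesis) of how an order-one quasiseparable matrix is assembled from scalar generators. The generator names d, p, a, q, g, b, h follow the common convention for this structure, and the example values are arbitrary.

```python
# Build a dense order-one quasiseparable matrix from scalar generators:
# diagonal entries are d[i], entries below the diagonal are p[i]*a[i-1]*...*a[j+1]*q[j],
# and entries above the diagonal are g[i]*b[i+1]*...*b[j-1]*h[j].
import numpy as np

def quasiseparable_from_generators(d, p, q, a, g, h, b):
    """Assemble the dense n x n matrix from scalar generators (0-based arrays)."""
    n = len(d)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = d[i]
        for j in range(i):                      # strictly lower triangular part
            prod = np.prod(a[j + 1:i]) if i - j > 1 else 1.0
            A[i, j] = p[i] * prod * q[j]
        for j in range(i + 1, n):               # strictly upper triangular part
            prod = np.prod(b[i + 1:j]) if j - i > 1 else 1.0
            A[i, j] = g[i] * prod * h[j]
    return A

# Example: a 4 x 4 quasiseparable matrix from arbitrary (hypothetical) scalar generators.
rng = np.random.default_rng(0)
n = 4
d = rng.standard_normal(n)
p, q, a = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
g, h, b = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
R = quasiseparable_from_generators(d, p, q, a, g, h, b)
```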
2

Algorithms for a Partially Regularized Least Squares Problem

Skoglund, Ingegerd January 2007 (has links)
When analysing water samples taken from, for example, a watercourse, the concentrations of various substances are determined. These concentrations often depend on the water flow. It is of interest to find out whether observed changes in the concentrations are due to natural variation or are caused by other factors. To investigate this, a statistical time-series model containing unknown parameters has been proposed. The model is fitted to measured data, which leads to an underdetermined system of equations. The thesis studies, among other things, different ways of ensuring a unique and reasonable solution. The basic idea is to impose certain additional conditions on the sought parameters. In the model studied one can, for example, require that certain parameters do not vary strongly over time while seasonal variation is still allowed. This is done by regularizing these parameters in the model. This gives rise to a least squares problem with one or two regularization parameters. Since not all of the parameters are regularized, we moreover obtain a partially regularized least squares problem. In general the values of the regularization parameters are not known, and the problem may need to be solved for several different values in order to obtain a reasonable solution. The thesis studies how this problem can be solved numerically with essentially two different methods, one iterative and one direct. Some ways of determining suitable values of the regularization parameters are also studied. In an iterative solution method a given initial approximation is improved step by step until a suitably chosen stopping criterion is satisfied. Here we use the conjugate gradient method with specially constructed preconditioners. The number of iterations required to solve the problem with and without preconditioning is compared both theoretically and in practice. The method is investigated here only with the same value for the two regularization parameters. In the direct method, QR factorization is used to solve the least squares problem. The idea is to first perform the computations that can be carried out independently of the regularization parameters, while taking the special structure of the problem into account. To determine values of the regularization parameters, Reinsch’s method is generalized to the case with two parameters. Generalized cross-validation and a less computationally demanding Monte Carlo method are also investigated. / Statistical analysis of data from rivers deals with time series which are dependent, e.g., on climatic and seasonal factors. For example, it is a well-known fact that the load of substances in rivers can be strongly dependent on the runoff. It is of interest to find out whether observed changes in riverine loads are due only to natural variation or caused by other factors. Semi-parametric models have been proposed for estimation of time-varying linear relationships between runoff and riverine loads of substances. The aim of this work is to study some numerical methods for solving the linear least squares problem which arises. The model gives a linear system of the form A1 x1 + A2 x2 + n = b1. The vector n consists of identically distributed random variables, all with mean zero. The unknowns, x, are split into two groups, x1 and x2. In this model there are usually more unknowns than observations, and the resulting linear system is most often consistent, with an infinite number of solutions. Hence some constraint on the parameter vector x is needed. One possibility is to avoid rapid variation in, e.g., the parameters x2. This can be accomplished by regularizing using a matrix A3, which is a discretization of some norm. The problem is formulated as a partially regularized least squares problem with one or two regularization parameters. The parameter x2 here has a two-dimensional structure. By using two different regularization parameters it is possible to regularize separately in each dimension. We first study (for the case of one parameter only) the conjugate gradient method for solution of the problem. To improve the rate of convergence, block preconditioners of Schur complement type are suggested, analyzed and tested. A direct solution method based on QR decomposition is also studied. The idea is to first perform operations independent of the values of the regularization parameters, utilizing the special block structure of the problem. We further discuss the choice of regularization parameters and in particular generalize Reinsch’s method to the case with two parameters. Finally the cross-validation technique is treated. Here a Monte Carlo method is also used, by which an approximation to the generalized cross-validation function can be computed efficiently.
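As a concrete illustration of the formulation just described, here is a minimal sketch under my own assumptions (not the thesis's code; the function name, the toy dimensions, and the choice of a second-difference matrix for A3 are mine): only the block x2 is penalized, and the penalty is appended as extra rows scaled by the square root of the regularization parameter.

```python
# Solve min || A1 x1 + A2 x2 - b ||^2 + lam * || A3 x2 ||^2 by stacking the
# regularization block under the data block and calling a standard LS solver.
import numpy as np

def partially_regularized_lstsq(A1, A2, A3, b, lam):
    m, n1 = A1.shape
    n2 = A2.shape[1]
    # Augmented system: [A1  A2; 0  sqrt(lam)*A3] [x1; x2] ~= [b; 0]
    top = np.hstack([A1, A2])
    bottom = np.hstack([np.zeros((A3.shape[0], n1)), np.sqrt(lam) * A3])
    A_aug = np.vstack([top, bottom])
    b_aug = np.concatenate([b, np.zeros(A3.shape[0])])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x[:n1], x[n1:]

# Toy example: fewer observations than unknowns, with a discretized second
# derivative regularizing x2 (a hypothetical choice of A3).
rng = np.random.default_rng(1)
m, n1, n2 = 15, 3, 20
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
A3 = np.diff(np.eye(n2), n=2, axis=0)
b = rng.standard_normal(m)
x1, x2 = partially_regularized_lstsq(A1, A2, A3, b, lam=0.5)
```

Since the regularization parameter is generally unknown, such a system would be re-solved for several values of lam, which is where performing the parameter-independent work once, as in the direct QR approach described in the abstract, pays off.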
3

Observability Methods in Sensor Scheduling

January 2015 (has links)
abstract: Modern measurement schemes for linear dynamical systems are typically designed so that different sensors can be scheduled to be used at each time step. To determine which sensors to use, various metrics have been suggested. One possible such metric is the observability of the system. Observability is a binary condition determining whether a finite number of measurements suffice to recover the initial state. However, to employ observability for sensor scheduling, the binary definition needs to be expanded so that one can measure how observable a system is under a particular measurement scheme, i.e., one needs a metric of observability. Most methods utilizing an observability metric address sensor selection rather than sensor scheduling. In this dissertation we present a new approach to utilizing observability for sensor scheduling, employing the condition number of the observability matrix as the metric and using column subset selection to create an algorithm that chooses which sensors to use at each time step. To this end we use a rank-revealing QR factorization algorithm to select sensors. Several numerical experiments are used to demonstrate the performance of the proposed scheme. / Dissertation/Thesis / Doctoral Dissertation Applied Mathematics 2015
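The sketch below gives the flavour of such an approach; it is my own toy illustration, not the dissertation's algorithm. Candidate sensors are ranked by a column-pivoted (rank-revealing) QR factorization of the observability blocks they generate, and the first k pivots are kept. The helper name `select_sensors`, the greedy per-step framing, and the horizon length are assumptions.

```python
# Rank candidate sensors with column-pivoted QR and keep the first k pivots.
import numpy as np
from scipy.linalg import qr

def select_sensors(A, C, k, horizon=5):
    """Pick k sensors (rows of C) whose observability blocks are ranked
    highest by the rank-revealing column ordering of a pivoted QR."""
    m = C.shape[0]
    blocks = []
    for i in range(m):
        ci = C[i:i + 1, :]
        # Observability block generated by sensor i alone over `horizon` steps.
        obs = np.vstack([ci @ np.linalg.matrix_power(A, t) for t in range(horizon)])
        blocks.append(obs.reshape(-1))
    M = np.column_stack(blocks)          # one column per candidate sensor
    _, _, piv = qr(M, pivoting=True)     # rank-revealing column ordering
    return piv[:k]

# Toy example: 6 candidate sensors on a 4-state system, choose 2 per step.
rng = np.random.default_rng(2)
A = 0.5 * rng.standard_normal((4, 4))
C = rng.standard_normal((6, 4))
chosen = select_sensors(A, C, k=2)
```

The condition-number criterion in the dissertation would then assess how well conditioned the observability matrix of the chosen subset is; pivoted QR is one standard tool for making that subset selection.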
4

Memory-aware Algorithms and Scheduling Techniques for Matrix Computations / Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matriciel

Herrmann, Julien 25 November 2015 (has links)
In this thesis we take both a theoretical and a practical view of the design of algorithms and scheduling techniques suited to the complex architectures of modern supercomputers. We are particularly interested in the memory usage and communication management of algorithms for high-performance computing (HPC). We exploit the heterogeneity of modern supercomputers to improve the performance of matrix computations. We study the possibility of intelligently alternating LU factorization steps (faster) with QR factorization steps (numerically more stable but more than twice as costly) to solve a dense linear system. We improve the performance of dynamic runtime systems by means of static precomputations that take into account the whole task graph of the Cholesky factorization as well as the heterogeneity of the architecture. We study the complexity of the problem of scheduling task graphs that use large input and output files on a heterogeneous architecture with two types of resources, each using its own memory. We design many polynomial-time heuristics for general problems that we had previously proved NP-complete. Finally, we design optimal algorithms for scheduling an automatic differentiation graph on a platform with two types of memory: a free but limited memory and a costly but unlimited memory. / Throughout this thesis, we have designed memory-aware algorithms and scheduling techniques suited for modern memory architectures. We have shown special interest in improving the performance of matrix computations on multiple levels. At a high level, we have introduced new numerical algorithms for solving linear systems on large distributed platforms. Most of the time, these linear solvers rely on runtime systems to handle resource allocation and data management. We also focused on improving the dynamic schedulers embedded in these runtime systems by adding static information to their decision process. We proposed new memory-aware dynamic heuristics to schedule workflows, which could be implemented in such runtime systems. Altogether, we have dealt with multiple state-of-the-art factorization algorithms used to solve linear systems, such as the LU, QR and Cholesky factorizations. We targeted different platforms ranging from multicore processors to distributed memory clusters, and worked with several reference runtime systems tailored for these architectures, such as PaRSEC and StarPU. On the theoretical side, we took special care to model complex hierarchical memory architectures. We have classified the problems that arise when dealing with these storage platforms. We have designed many efficient polynomial-time heuristics for general problems that had been shown NP-complete beforehand.
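To give a flavour of what "memory-aware" scheduling of a workflow means in this setting, here is a toy sketch that is entirely my own illustration (not one of the heuristics from the thesis; the DAG, output sizes, greedy rule, and budget are made up): tasks of a task graph are ordered greedily so that the total size of live output files stays within a fixed memory budget.

```python
# Greedy memory-aware list scheduling of a task DAG under a memory budget.
from collections import defaultdict

def memory_aware_schedule(succ, out_size, budget):
    """succ: task -> list of successor tasks; out_size: task -> size of its output."""
    pred_count = defaultdict(int)
    preds = defaultdict(list)
    for t, ss in succ.items():
        for s in ss:
            pred_count[s] += 1
            preds[s].append(t)
    ready = [t for t in succ if pred_count[t] == 0]
    remaining_uses = {t: len(succ[t]) for t in succ}
    order, memory = [], 0

    def gain(t):
        # Memory freed by running t (inputs it is the last consumer of)
        # minus the memory its own output will occupy.
        freed = sum(out_size[p] for p in preds[t] if remaining_uses[p] == 1)
        return freed - out_size[t]

    while ready:
        t = max(ready, key=gain)              # greedy: best memory trade-off first
        ready.remove(t)
        memory += out_size[t]
        if memory > budget:
            raise RuntimeError(f"memory budget {budget} exceeded at task {t}")
        order.append(t)
        for p in preds[t]:                    # release inputs no longer needed
            remaining_uses[p] -= 1
            if remaining_uses[p] == 0:
                memory -= out_size[p]
        for s in succ[t]:                     # new tasks may become ready
            pred_count[s] -= 1
            if pred_count[s] == 0:
                ready.append(s)
    return order

# Small diamond-shaped workflow: a feeds b and c, which both feed d.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
out_size = {"a": 4, "b": 2, "c": 3, "d": 1}
print(memory_aware_schedule(succ, out_size, budget=10))   # -> ['a', 'b', 'c', 'd']
```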
5

Algorithm-Architecture Co-Design for Dense Linear Algebra Computations

Merchant, Farhad January 2015 (has links) (PDF)
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, used, for example, in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of the theoretical lower bound in CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve computation efficiency. There are two standard approaches for improving computation efficiency: first, application-specific architecture customization, and second, algorithmic tuning. Accordingly, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element for improving the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By performing detailed experiments on a synthesized CGRA implementation, we demonstrate that our proposed algorithmic and architectural variations achieve lower CPI than off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations and report a better energy-performance-area product.

DLA computations are encountered in many engineering and scientific computing applications, ranging from Computational Fluid Dynamics (CFD) to eigenvalue problems. Traditionally, these applications are written in highly tuned High Performance Computing (HPC) software packages such as the Linear Algebra Package (LAPACK) and/or the Scalable Linear Algebra Package (ScaLAPACK). The basic building block for these packages is the Basic Linear Algebra Subprograms (BLAS). Algorithms in LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual effort in the development and tuning of these packages, there is still scope for further tuning. In this thesis, we revisit the most prominent and widely used compute-bound algorithms, such as General Matrix Multiplication (GMM), for further exploitation of Instruction Level Parallelism (ILP). We further look into LU and QR factorizations for generalizations and exhibit higher ILP in these algorithms. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on their parallel realization. The major algorithmic contributions of this thesis are as follows:

- We present a graph-based analysis of GMM and discuss the different types of parallelism available in it.
- We analyze Givens Rotation (GR) based QR factorization, improve GR, and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that CGR requires fewer multiplications than GR.
- We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of the columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and the Householder Transform (HT).
- We extend the generalization to Square-root-Free GR (also known as Fast Givens Rotation) and Square-root-and-Division-Free GR (SDFG), and derive Column-wise Fast Givens and Column-wise SDFG.
- We also extend the generalization to complex matrices and derive Complex Column-wise Givens Rotation.

Coarse-Grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also support domain customization. REDEFINE is an array of Tiles, where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while the Compute Elements in REDEFINE can be domain-customized to accelerate applications in the domain of interest. In this thesis, we take the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute the algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound of the CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE. We achieve better area and power performance than earlier customized architectures for DLA. The major architectural contributions of this thesis are as follows:

- We present the design of a PE for acceleration of GMM, which is a Level-3 BLAS operation.
- We methodically enhance the PE with different features to improve the performance of GMM.
- For efficient realization of LAPACK, we use the PE that can efficiently execute GMM and show better performance.
- For further acceleration of LU and QR factorizations in LAPACK, we identify macro operations encountered in these factorizations and realize them on a reconfigurable data-path, resulting in 25-30% lower run-time.
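For context on the Givens-rotation schemes discussed above, the sketch below shows plain textbook Givens QR (my own illustration, not the CGR/GGR variants from the thesis), where each rotation annihilates a single subdiagonal entry; the column-wise schemes derived in the thesis instead annihilate several entries of a column per step.

```python
# QR factorization via classical Givens rotations: zero subdiagonal entries
# one at a time, column by column, accumulating the orthogonal factor Q.
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def qr_givens(A):
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):                    # for each column
        for i in range(m - 1, j, -1):     # zero entries below the diagonal
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

A = np.random.default_rng(3).standard_normal((5, 3))
Q, R = qr_givens(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(5))
```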
