Spelling suggestions: "subject:"QR factorization"" "subject:"QR actorization""
1 |
Factorization of Quasiseparable MatricesJohnson, Paul D. 21 November 2008 (has links)
This paper investigates some of the ideas and algorithms developed for exploiting the structure of quasiseparable matrices. The case of purely scalar generators is considered initially. The process by which a quasiseparable matrix is represented as the product of matrices comprised of its generators is explained. This is done clearly in the scalar case, but may be extended to block generators. The complete factoring approach is then considered. This consists of two stages: inner-outer factorization followed by inner-coprime factorization. Finally, the stability of the algorithm is investigated. The algorithm is used to factor various quasiseparable matrices R created first using minimal generators, and subsequently using non-minimal generators. The result is that stability of the algorithm is compromised when non-minimal generators are present.
|
2 |
Algorithms for a Partially Regularized Least Squares ProblemSkoglund, Ingegerd January 2007 (has links)
<p>Vid analys av vattenprover tagna från t.ex. ett vattendrag betäms halten av olika ämnen. Dessa halter är ofta beroende av vattenföringen. Det är av intresse att ta reda på om observerade förändringar i halterna beror på naturliga variationer eller är orsakade av andra faktorer. För att undersöka detta har föreslagits en statistisk tidsseriemodell som innehåller okända parametrar. Modellen anpassas till uppmätta data vilket leder till ett underbestämt ekvationssystem. I avhandlingen studeras bl.a. olika sätt att säkerställa en unik och rimlig lösning. Grundidén är att införa vissa tilläggsvillkor på de sökta parametrarna. I den studerade modellen kan man t.ex. kräva att vissa parametrar inte varierar kraftigt med tiden men tillåter årstidsvariationer. Det görs genom att dessa parametrar i modellen regulariseras.</p><p>Detta ger upphov till ett minsta kvadratproblem med en eller två regulariseringsparametrar. I och med att inte alla ingående parametrar regulariseras får vi dessutom ett partiellt regulariserat minsta kvadratproblem. I allmänhet känner man inte värden på regulariseringsparametrarna utan problemet kan behöva lösas med flera olika värden på dessa för att få en rimlig lösning. I avhandlingen studeras hur detta problem kan lösas numeriskt med i huvudsak två olika metoder, en iterativ och en direkt metod. Dessutom studeras några sätt att bestämma lämpliga värden på regulariseringsparametrarna.</p><p>I en iterativ lösningsmetod förbättras stegvis en given begynnelseapproximation tills ett lämpligt valt stoppkriterium blir uppfyllt. Vi använder här konjugerade gradientmetoden med speciellt konstruerade prekonditionerare. Antalet iterationer som krävs för att lösa problemet utan prekonditionering och med prekonditionering jämförs både teoretiskt och praktiskt. Metoden undersöks här endast med samma värde på de två regulariseringsparametrarna.</p><p>I den direkta metoden används QR-faktorisering för att lösa minsta kvadratproblemet. Idén är att först utföra de beräkningar som kan göras oberoende av regulariseringsparametrarna samtidigt som hänsyn tas till problemets speciella struktur.</p><p>För att bestämma värden på regulariseringsparametrarna generaliseras Reinsch’s etod till fallet med två parametrar. Även generaliserad korsvalidering och en mindre beräkningstung Monte Carlo-metod undersöks.</p> / <p>Statistical analysis of data from rivers deals with time series which are dependent, e.g., on climatic and seasonal factors. For example, it is a well-known fact that the load of substances in rivers can be strongly dependent on the runoff. It is of interest to find out whether observed changes in riverine loads are due only to natural variation or caused by other factors. Semi-parametric models have been proposed for estimation of time-varying linear relationships between runoff and riverine loads of substances. The aim of this work is to study some numerical methods for solving the linear least squares problem which arises.</p><p>The model gives a linear system of the form <em>A</em><em>1x1</em><em> + A</em><em>2x2</em><em> + n = b</em><em>1</em>. The vector <em>n</em> consists of identically distributed random variables all with mean zero. The unknowns, <em>x,</em> are split into two groups, <em>x</em><em>1</em><em> </em>and <em>x</em><em>2</em><em>.</em> In this model, usually there are more unknowns than observations and the resulting linear system is most often consistent having an infinite number of solutions. Hence some constraint on the parameter vector x is needed. One possibility is to avoid rapid variation in, e.g., the parameters<em> x</em><em>2</em><em>.</em> This can be accomplished by regularizing using a matrix <em>A</em><em>3</em>, which is a discretization of some norm. The problem is formulated</p><p>as a partially regularized least squares problem with one or two regularization parameters. The parameter <em>x</em><em>2</em> has here a two-dimensional structure. By using two different regularization parameters it is possible to regularize separately in each dimension.</p><p>We first study (for the case of one parameter only) the conjugate gradient method for solution of the problem. To improve rate of convergence blockpreconditioners of Schur complement type are suggested, analyzed and tested. Also a direct solution method based on QR decomposition is studied. The idea is to first perform operations independent of the values of the regularization parameters. Here we utilize the special block-structure of the problem. We further discuss the choice of regularization parameters and generalize in particular Reinsch’s method to the case with two parameters. Finally the cross-validation technique is treated. Here also a Monte Carlo method is used by which an approximation to the generalized cross-validation function can be computed efficiently.</p>
|
3 |
Algorithms for a Partially Regularized Least Squares ProblemSkoglund, Ingegerd January 2007 (has links)
Vid analys av vattenprover tagna från t.ex. ett vattendrag betäms halten av olika ämnen. Dessa halter är ofta beroende av vattenföringen. Det är av intresse att ta reda på om observerade förändringar i halterna beror på naturliga variationer eller är orsakade av andra faktorer. För att undersöka detta har föreslagits en statistisk tidsseriemodell som innehåller okända parametrar. Modellen anpassas till uppmätta data vilket leder till ett underbestämt ekvationssystem. I avhandlingen studeras bl.a. olika sätt att säkerställa en unik och rimlig lösning. Grundidén är att införa vissa tilläggsvillkor på de sökta parametrarna. I den studerade modellen kan man t.ex. kräva att vissa parametrar inte varierar kraftigt med tiden men tillåter årstidsvariationer. Det görs genom att dessa parametrar i modellen regulariseras. Detta ger upphov till ett minsta kvadratproblem med en eller två regulariseringsparametrar. I och med att inte alla ingående parametrar regulariseras får vi dessutom ett partiellt regulariserat minsta kvadratproblem. I allmänhet känner man inte värden på regulariseringsparametrarna utan problemet kan behöva lösas med flera olika värden på dessa för att få en rimlig lösning. I avhandlingen studeras hur detta problem kan lösas numeriskt med i huvudsak två olika metoder, en iterativ och en direkt metod. Dessutom studeras några sätt att bestämma lämpliga värden på regulariseringsparametrarna. I en iterativ lösningsmetod förbättras stegvis en given begynnelseapproximation tills ett lämpligt valt stoppkriterium blir uppfyllt. Vi använder här konjugerade gradientmetoden med speciellt konstruerade prekonditionerare. Antalet iterationer som krävs för att lösa problemet utan prekonditionering och med prekonditionering jämförs både teoretiskt och praktiskt. Metoden undersöks här endast med samma värde på de två regulariseringsparametrarna. I den direkta metoden används QR-faktorisering för att lösa minsta kvadratproblemet. Idén är att först utföra de beräkningar som kan göras oberoende av regulariseringsparametrarna samtidigt som hänsyn tas till problemets speciella struktur. För att bestämma värden på regulariseringsparametrarna generaliseras Reinsch’s etod till fallet med två parametrar. Även generaliserad korsvalidering och en mindre beräkningstung Monte Carlo-metod undersöks. / Statistical analysis of data from rivers deals with time series which are dependent, e.g., on climatic and seasonal factors. For example, it is a well-known fact that the load of substances in rivers can be strongly dependent on the runoff. It is of interest to find out whether observed changes in riverine loads are due only to natural variation or caused by other factors. Semi-parametric models have been proposed for estimation of time-varying linear relationships between runoff and riverine loads of substances. The aim of this work is to study some numerical methods for solving the linear least squares problem which arises. The model gives a linear system of the form A1x1 + A2x2 + n = b1. The vector n consists of identically distributed random variables all with mean zero. The unknowns, x, are split into two groups, x1 and x2. In this model, usually there are more unknowns than observations and the resulting linear system is most often consistent having an infinite number of solutions. Hence some constraint on the parameter vector x is needed. One possibility is to avoid rapid variation in, e.g., the parameters x2. This can be accomplished by regularizing using a matrix A3, which is a discretization of some norm. The problem is formulated as a partially regularized least squares problem with one or two regularization parameters. The parameter x2 has here a two-dimensional structure. By using two different regularization parameters it is possible to regularize separately in each dimension. We first study (for the case of one parameter only) the conjugate gradient method for solution of the problem. To improve rate of convergence blockpreconditioners of Schur complement type are suggested, analyzed and tested. Also a direct solution method based on QR decomposition is studied. The idea is to first perform operations independent of the values of the regularization parameters. Here we utilize the special block-structure of the problem. We further discuss the choice of regularization parameters and generalize in particular Reinsch’s method to the case with two parameters. Finally the cross-validation technique is treated. Here also a Monte Carlo method is used by which an approximation to the generalized cross-validation function can be computed efficiently.
|
4 |
Observability Methods in Sensor SchedulingJanuary 2015 (has links)
abstract: Modern measurement schemes for linear dynamical systems are typically designed so that different sensors can be scheduled to be used at each time step. To determine which sensors to use, various metrics have been suggested. One possible such metric is the observability of the system. Observability is a binary condition determining whether a finite number of measurements suffice to recover the initial state. However to employ observability for sensor scheduling, the binary definition needs to be expanded so that one can measure how observable a system is with a particular measurement scheme, i.e. one needs a metric of observability. Most methods utilizing an observability metric are about sensor selection and not for sensor scheduling. In this dissertation we present a new approach to utilize the observability for sensor scheduling by employing the condition number of the observability matrix as the metric and using column subset selection to create an algorithm to choose which sensors to use at each time step. To this end we use a rank revealing QR factorization algorithm to select sensors. Several numerical experiments are used to demonstrate the performance of the proposed scheme. / Dissertation/Thesis / Doctoral Dissertation Applied Mathematics 2015
|
5 |
Memory-aware Algorithms and Scheduling Techniques for Matrix Computattions / Algorithmes orientés mémoire et techniques d'ordonnancement pour le calcul matricielHerrmann, Julien 25 November 2015 (has links)
Dans cette thèse, nous nous sommes penchés d’un point de vue à la foisthéorique et pratique sur la conception d’algorithmes et detechniques d’ordonnancement adaptées aux architectures complexes dessuperordinateurs modernes. Nous nous sommes en particulier intéressésà l’utilisation mémoire et la gestion des communications desalgorithmes pour le calcul haute performance (HPC). Nous avonsexploité l’hétérogénéité des superordinateurs modernes pour améliorerles performances du calcul matriciel. Nous avons étudié lapossibilité d’alterner intelligemment des étapes de factorisation LU(plus rapide) et des étapes de factorisation QR (plus stablenumériquement mais plus deux fois plus coûteuses) pour résoudre unsystème linéaire dense. Nous avons amélioré les performances desystèmes d’exécution dynamique à l’aide de pré-calculs statiquesprenants en compte l’ensemble du graphe de tâches de la factorisationCholesky ainsi que l’hétérogénéité de l’architecture. Nous noussommes intéressés à la complexité du problème d’ordonnancement degraphes de tâches utilisant de gros fichiers d’entrée et de sortiesur une architecture hétérogène avec deux types de ressources,utilisant chacune une mémoire spécifique. Nous avons conçu denombreuses heuristiques en temps polynomial pour la résolution deproblèmes généraux que l’on avait prouvés NP-complet aupréalable. Enfin, nous avons conçu des algorithmes optimaux pourordonnancer un graphe de différentiation automatique sur uneplateforme avec deux types de mémoire : une mémoire gratuite maislimitée et une mémoire coûteuse mais illimitée. / Throughout this thesis, we have designed memory-aware algorithms and scheduling techniques suitedfor modern memory architectures. We have shown special interest in improving the performance ofmatrix computations on multiple levels. At a high level, we have introduced new numerical algorithmsfor solving linear systems on large distributed platforms. Most of the time, these linear solvers rely onruntime systems to handle resources allocation and data management. We also focused on improving thedynamic schedulers embedded in these runtime systems by adding static information to their decisionprocess. We proposed new memory-aware dynamic heuristics to schedule workflows, that could beimplemented in such runtime systems.Altogether, we have dealt with multiple state-of-the-art factorization algorithms used to solve linearsystems, like the LU, QR and Cholesky factorizations. We targeted different platforms ranging frommulticore processors to distributed memory clusters, and worked with several reference runtime systemstailored for these architectures, such as P A RSEC and StarPU. On a theoretical side, we took specialcare of modelling convoluted hierarchical memory architectures. We have classified the problems thatare arising when dealing with these storage platforms. We have designed many efficient polynomial-timeheuristics on general problems that had been shown NP-complete beforehand.
|
6 |
Algorithm-Architecture Co-Design for Dense Linear Algebra ComputationsMerchant, Farhad January 2015 (has links) (PDF)
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing ap-
plication, which is used, for example, in LU and QR factorizations. Unfortunately, mod-
ern off-the-shelf microprocessors fall significantly short of achieving theoretical lower bound in CPI for high performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelisms and propose suitable algorithmic
and architectural variation to significantly improve the computation efficiency. There
are two standard approaches for improving the computation effficiency, first, to perform
application-specific architecture customization and second, to do algorithmic tuning.
In the same manner, we first perform a graph-based analysis of selected DLA kernels.
From the various forms of parallelism, thus identified, we design a custom processing
element for improving the CPI. The processing elements are used as building blocks for
a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By per-
forming detailed experiments on a synthesized CGRA implementation, we demonstrate
that our proposed algorithmic and architectural variations are able to achieve lower CPI compared to off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations to report higher energy-performance-area product.
DLA computations are encountered in many engineering and scientific computing ap-
plications ranging from Computational Fluid Dynamics (CFD) to Eigenvalue problem.
Traditionally, these applications are written in highly tuned High Performance Comput-
ing (HPC) software packages like Linear Algebra Package (LAPACK), and/or Scalable
Linear Algebra Package (ScaLAPACK). The basic building block for these packages is Ba-
sic Linear Algebra Subprograms (BLAS). Algorithms pertaining LAPACK/ScaLAPACK
are written in-terms of BLAS to achieve high throughput. Despite extensive intellectual
efforts in development and tuning of these packages, there still exists a scope for fur-
ther tuning in this packages. In this thesis, we revisit most prominent and widely used
compute bound algorithms like GMM for further exploitation of Instruction Level Parallelism (ILP). We further look into LU and QR factorizations for generalizations and
exhibit higher ILP in these algorithms. We first accelerate sequential performance of the algorithms in BLAS and LAPACK and then focus on the parallel realization of these
algorithms. Major contributions in the algorithmic tuning in this thesis are as follows:
Algorithms:
We present graph based analysis of General Matrix Multiplication (GMM) and
discuss different types of parallelisms available in GMM
We present analysis of Givens Rotation based QR factorization where we improve
GR and derive Column-wise GR (CGR) that can annihilate multiple elements of a
column of a matrix simultaneously. We show that the multiplications in CGR are
lower than GR
We generalize CGR further and derive Generalized GR (GGR) that can annihilate
multiple elements of the columns of a matrix simultaneously. We show that the
parallelism exhibited by GGR is much higher than GR and Householder Transform
(HT)
We extend generalizations to Square root Free GR (also knows as Fast Givens
Rotation) and Square root and Division Free GR (SDFG) and derive Column-wise
Fast Givens, and Column-wise SDFG . We also extend generalization for complex
matrices and derive Complex Column-wise Givens Rotation
Coarse-grained Recon gurable Architectures (CGRAs) have gained popularity in the
last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also exhibit support for domain customizations. REDEFINE is an array of Tiles where each Tile consists of a Compute Element and a Router. The Routers are responsible
for on-chip communication, while Compute Elements in the REDEFINE can be domain
customized to accelerate the applications pertaining to the domain of interest. In this
thesis, we consider REDEFINE base architecture as a starting point and we design Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently.
We perform several architectural enhancements in the PE to approach lower bound of the
CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of
REDEFINE. We achieve better area and power performance compared to the yesteryear
customized architecture for DLA. Major contributions in architecture in this thesis are as follows:
Architecture:
We present design of a PE for acceleration of GMM which is a Level-3 BLAS
operation
We methodically enhance the PE with different features for improvement in the
performance of GMM
For efficient realization of Linear Algebra Package (LAPACK), we use PE that can
efficiently execute GMM and show better performance
For further acceleration of LU and QR factorizations in LAPACK, we identify
macro operations encountered in LU and QR factorizations, and realize them on a
reconfigurable data-path resulting in 25-30% lower run-time
|
Page generated in 0.084 seconds