Global ETD Search

1	Design of Unified Arithmetic Units for 3D Graphics Vertex Shader Lin, Wei-Sen 02 September 2008 (has links) The vertex shader, one of the core parts of a 3D graphics system, speeds up coordinate transformation and lighting operations in the 3D graphics pipeline, and the vector ALU is the key part of a vertex shader. This thesis proposes several unified architectures that integrate the floating-point vector arithmetic unit and the special function unit in order to share hardware resources. We propose three different architectures for the design of the unified vector ALU. The first architecture includes a single-instruction-multiple-data (SIMD) vector arithmetic unit and uses a table-based method with first-order approximation to calculate some special functions. The second architecture uses higher-order approximation to reduce the table sizes and shares the floating-point multipliers in the SIMD vector unit. The third architecture has two copies of hardware that can compute two dot-product operations in parallel and thus doubles the throughput of matrix computation. Furthermore, the two dot-product units can be used to perform the interpolation for special function calculation. Vertex Shader higher-order approximation throughput of the matrix computation
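As a rough illustration of the table-based first-order approximation mentioned in the abstract above, the following Python sketch approximates the reciprocal on [1, 2) with one table lookup, one multiply, and one add. The choice of function, the 64-entry table, the secant-slope coefficients, and the names `build_table` and `approx` are all assumptions made for illustration; the hardware design in the thesis may differ.

```python
def build_table(f, lo, hi, entries):
    """Precompute (x_k, f(x_k), slope_k) for equal-width segments of [lo, hi)."""
    step = (hi - lo) / entries
    table = []
    for k in range(entries):
        x0 = lo + k * step
        x1 = x0 + step
        slope = (f(x1) - f(x0)) / step  # secant slope over the segment
        table.append((x0, f(x0), slope))
    return table, step


def approx(table, step, lo, x):
    """First-order approximation: one lookup, one multiply, one add."""
    k = min(int((x - lo) / step), len(table) - 1)  # segment index
    x0, y0, slope = table[k]
    return y0 + slope * (x - x0)


# Hypothetical setup: reciprocal on the mantissa range [1, 2), 64 entries.
table, step = build_table(lambda x: 1.0 / x, 1.0, 2.0, 64)

# Worst absolute error over a grid of sample points in [1, 2).
worst = max(
    abs(approx(table, step, 1.0, 1.0 + i / 1000) - 1.0 / (1.0 + i / 1000))
    for i in range(1000)
)
```

Under these assumptions the worst-case error on the sample grid stays below 1e-4. A higher-order scheme would typically reach similar accuracy with fewer table entries at the cost of extra multiplications, which is the trade-off the second architecture in the abstract refers to.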
2	Application of dependence analysis and runtime data flow graph scheduling to matrix computations Chan, Ernie W., 1982- 23 November 2010 (has links) We present a methodology for exploiting shared-memory parallelism within matrix computations by expressing linear algebra algorithms as directed acyclic graphs. Our solution involves a separation of concerns that completely hides the exploitation of parallelism from the code that implements the linear algebra algorithms. This approach to the problem is fundamentally different since we also address the issue of programmability instead of strictly focusing on parallelization. Using the separation of concerns, we present a framework for analyzing and developing scheduling algorithms and heuristics for this problem domain. As such, we develop a theory and practice of scheduling concepts for matrix computations in this dissertation. / text Matrix computation Directed acyclic graph Algorithm-by-blocks
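The idea of expressing a blocked linear algebra algorithm as a directed acyclic graph can be sketched in a few lines of Python. The example below is not the runtime described in the dissertation: the task list (a 3x3 blocked Cholesky factorization), the block names, and the functions `build_dag` and `schedule_waves` are hypothetical, and the scheduler is a deliberately simple one that groups tasks into waves of mutually independent work.

```python
from collections import defaultdict


def build_dag(tasks):
    """tasks: list of (name, reads, writes) in program order.
    Returns {task index: set of earlier task indices it depends on}."""
    deps = defaultdict(set)
    last_writer = {}             # block -> last task that wrote it
    readers = defaultdict(list)  # block -> tasks that read it since that write
    for i, (_, reads, writes) in enumerate(tasks):
        for b in reads:
            if b in last_writer:
                deps[i].add(last_writer[b])  # read-after-write
        for b in writes:
            if b in last_writer:
                deps[i].add(last_writer[b])  # write-after-write
            deps[i].update(readers[b])       # write-after-read
        for b in reads:
            readers[b].append(i)
        for b in writes:
            last_writer[b] = i
            readers[b] = []
        deps[i].discard(i)
    return {i: deps[i] for i in range(len(tasks))}


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    done, waves = set(), []
    while len(done) < len(deps):
        ready = [i for i in deps if i not in done and deps[i] <= done]
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical task list: right-looking Cholesky on a 3x3 grid of blocks.
tasks = [
    ("CHOL A00", {"A00"}, {"A00"}),
    ("TRSM A10", {"A00", "A10"}, {"A10"}),
    ("TRSM A20", {"A00", "A20"}, {"A20"}),
    ("SYRK A11", {"A10", "A11"}, {"A11"}),
    ("GEMM A21", {"A10", "A20", "A21"}, {"A21"}),
    ("SYRK A22", {"A20", "A22"}, {"A22"}),
    ("CHOL A11", {"A11"}, {"A11"}),
    ("TRSM A21", {"A11", "A21"}, {"A21"}),
    ("SYRK A22", {"A21", "A22"}, {"A22"}),
    ("CHOL A22", {"A22"}, {"A22"}),
]
deps = build_dag(tasks)
waves = schedule_waves(deps)
```

The code that lists the tasks never mentions threads or ordering constraints; the dependencies are derived from the read and write sets alone, which is one way to read the separation of concerns the abstract describes.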
3	Efficient Matrix-aware Relational Query Processing in Big Data Systems Yongyang Yu (5930462) 03 January 2019 (has links) In the big data era, the use of large-scale machine learning methods is becoming ubiquitous in data exploration tasks ranging from business intelligence and bioinformatics to self-driving cars. In these domains, a number of queries are composed of various kinds of operators, such as relational operators for preprocessing input data, and machine learning models for complex analysis. Usually, these learning methods rely heavily on matrix computations. As a result, it is imperative to develop novel query processing approaches and systems that are aware of big matrix data and corresponding operators, scale to clusters of hundreds of machines, and leverage distributed memory for high-performance computation. This dissertation introduces and studies several matrix-aware relational query processing strategies, and analyzes and optimizes their performance. The first contribution of this dissertation is MatFast, a matrix computation system for efficiently processing and optimizing matrix-only queries in a distributed in-memory environment. We introduce a set of heuristic rules to rewrite special features of a matrix query for a smaller memory footprint, and cost models to estimate the sparsity of sparse matrix multiplications and to distribute the matrix data partitions among various compute workers for a communication-efficient execution. We implement and test the query processing strategies in an open-source distributed dataflow engine (Apache Spark). In the second contribution of this dissertation, we extend MatFast to MatRel, where we study how to efficiently process queries that involve both matrix and relational operators. We identify a series of equivalent transformation rules to rewrite a logical plan when both relational and matrix operations are present. We introduce selection, projection, aggregation, and join operators over matrix data, and propose optimizations to reduce computation overhead. We also design a cost model to distribute matrix data among various compute workers for communication-efficient evaluation of relational join operations. In the third and last contribution of this dissertation, we demonstrate how to leverage MatRel for optimizing complex matrix-aware relational query evaluation pipelines. In particular, we showcase how to efficiently learn model parameters for deep neural networks of various applications with MatRel, e.g., Word2Vec. Applied Computer Science big data query optimization matrix computation distributed computing
4	Structured numerical problems in contemporary applications Sustik, Mátyás Attila 31 October 2013 (has links) The presence of structure in a computational problem can often be exploited and can lead to a more efficient numerical algorithm. In this dissertation, we look at structured numerical problems that arise from applications in wireless communications and machine learning that also impact other areas of scientific computing. In wireless communication system designs, certain structured matrices (frames) need to be generated. The design of such matrices is equivalent to a symmetric inverse eigenvalue problem where the values of the diagonal elements are prescribed. We present algorithms that are capable of generating a larger set of these constructions than previous algorithms. We also discuss the existence of equiangular tight frames---frames that satisfy additional structural properties. Kernel learning is an important class of problems in machine learning. It often relies on efficient numerical algorithms that solve underlying convex optimization problems. In our work, the objective functions to be minimized are the von Neumann and the LogDet Bregman matrix divergences. The algorithm that solves this optimization problem performs matrix updates based on repeated eigendecompositions of diagonal plus rank-one matrices in the case of von Neumann matrix divergence, and Cholesky updates in case of the LogDet Bregman matrix divergence. Our contribution exploits the low-rank representations and the structure of the constraint matrices, resulting in more efficient algorithms than previously known. We also present two specialized zero-finding algorithms where we exploit the structure through the shape and exact formulation of the objective function. The first zero-finding task arises during the matrix update step which is part of the above-mentioned kernel learning application. 
The second zero-finding problem is for the secular equation; it is equivalent to the computation of the eigenvalues of a diagonal plus rank-one matrix. The secular equation arises in various applications, the best known of which is the divide-and-conquer eigensolver. In our solutions, we build upon a somewhat forgotten zero-finding method by P. Jarratt, first described in 1966. The method employs first derivatives only and needs the same number of evaluations as Newton's method, but converges faster. Our contributions are the more efficient specialized zero-finding algorithms. / text Matrix computation Inverse eigenvalue problem Equiangular frame Bregman divergence Zero-finding Divide-and-conquer eigensolver
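To make the link between the secular equation and the eigenvalues of a diagonal plus rank-one matrix concrete, here is a minimal Python sketch. It assumes the matrix has the form diag(d) + rho * z z^T with rho > 0, strictly increasing d, and nonzero entries in z, in which case the eigenvalues are the roots of f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam)), one in each interval between consecutive d_i and one above the largest d_i. Plain bisection is used here as a stand-in; it is neither Jarratt's method nor the specialized algorithm of the dissertation, and the function names are hypothetical.

```python
def secular(lam, d, z, rho):
    """Secular function f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam))."""
    return 1.0 + rho * sum(zi * zi / (di - lam) for di, zi in zip(d, z))


def rank_one_eigenvalues(d, z, rho, iters=200):
    """Eigenvalues of diag(d) + rho * z z^T.
    Assumes rho > 0, d strictly increasing, and every z_i nonzero."""
    upper = d[-1] + rho * sum(zi * zi for zi in z)  # bound on the largest root
    roots = []
    for lo, hi in zip(d, d[1:] + [upper]):
        # f increases from -infinity to a nonnegative value on (lo, hi],
        # so bisection on the sign of f brackets the single root inside.
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid == lo or mid == hi:  # interval is down to machine precision
                break
            if secular(mid, d, z, rho) < 0.0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


# Hypothetical example data.
d = [1.0, 2.0, 4.0]
z = [1.0, 1.0, 1.0]
rho = 0.5
eigs = rank_one_eigenvalues(d, z, rho)
```

Two quick sanity checks follow from standard identities: the roots should sum to the trace, sum(d) + rho * sum(z_i^2), and their product should equal the determinant, prod(d) * (1 + rho * sum(z_i^2 / d_i)). Bisection gains only one bit per evaluation, which is why faster-converging methods such as the one the abstract builds on are of interest.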
5	Higher-order generalized singular value decomposition : comparative mathematical framework with applications to genomic signal processing Ponnapalli, Sri Priya 03 December 2010 (has links) The number of high-dimensional datasets recording multiple aspects of a single phenomenon is ever increasing in many areas of science. This is accompanied by a fundamental need for mathematical frameworks that can compare data tabulated as multiple large-scale matrices of different numbers of rows. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. This thesis addresses this limitation and defines a higher-order GSVD (HO GSVD) of N > 2 datasets, that provides a mathematical framework that can compare multiple high-dimensional datasets tabulated as large-scale matrices of different numbers of rows. / text Tensor and matrix computation Cell cycle DNA microarrays
6	Speeding up matrix computation kernels by sharing vector coprocessor among multiple cores on chip Dahlberg, Christopher January 2012 (has links) Today’s computer systems are developing towards lower energy consumption while maintaining high performance. These are contradictory requirements and pose a great challenge. A good example of an application where this matters is the smartphone: it must offer long battery life while delivering the high performance required by future 2D/3D applications. One solution is heterogeneous systems whose components are specialized in different tasks and can execute them fast with low energy consumption. These components could be specialized in, e.g., encoding/decoding, encryption/decryption, image processing or communication. At the Computer Architecture and Parallel Processing Laboratory (CAPPL) at New Jersey Institute of Technology (NJIT), a vector co-processor has been developed. The vector co-processor has the unusual feature of being able to receive instructions from multiple hosts (scalar cores). In addition, a test system in which a couple of scalar processors use the vector processor has been developed. This thesis describes this processor and its test system. It also shows the development of math applications involving matrix operations. The results lead to the conclusion that vector co-processing saves a substantial amount of energy while speeding up the execution of the applications. In addition, the thesis describes an extension of the vector co-processor design that makes it possible to monitor the throughput of instructions and data in the processor. Coprocessor Speeding-up Accelerator Power efficiency Shared resources Vector processor Multi-core on chip Matrix algorithm Matrix computation kernels
7	Diffraction électromagnétique par des réseaux et des surfaces rugueuses aléatoires : mise en œuvre de méthodes hautement efficaces pour la résolution de systèmes aux valeurs propres et de problèmes aux conditions initiales / Electromagnetic scattering by gratings and random rough surfaces : implementation of high performance algorithms for solving eigenvalue problems and problems with initial conditions Pan, Cihui 02 December 2015 (has links) In this thesis, we study electromagnetic diffraction by gratings and random rough surfaces. The C-method is an exact method developed for this purpose. It is based on Maxwell's equations in covariant form, written in a nonorthogonal coordinate system. The C-method leads to an eigenvalue problem. The scattered field is expanded as a linear combination of the eigensolutions that satisfy the outgoing wave condition. We focus on the numerical aspect of the C-method, trying to develop an efficient implementation of this exact method. For gratings, we propose a new version of the C-method that leads to a differential system with initial conditions. We show that this new version of the C-method can be used to study multilayer gratings with a homogeneous medium. We propose a parallel QR algorithm designed specifically for the C-method to solve the eigenvalue problem. This parallel QR algorithm is a variant of the QR algorithm based on three techniques: "fast shift", parallel bulge chasing, and parallel aggressive early deflation (AED). / We study the electromagnetic diffraction by gratings and random rough surfaces. The C-method is an exact method developed for this aim. It is based on Maxwell’s equations under covariant form written in a nonorthogonal coordinate system. 
The C-method leads to an eigenvalue problem, the solution of which gives the diffracted field. We focus on the numerical aspect of the C-method, trying to develop an efficient application of this exact method. For gratings, we have developed a new version of the C-method which leads to a differential system with initial conditions. This new version of the C-method can be used to study multilayer gratings with a homogeneous medium. We implemented high performance algorithms for the original versions of the C-method. In particular, we have developed a specifically designed parallel QR algorithm for the C-method and a spectral projection method to solve the eigenvalue problem more efficiently. Experiments have shown that the computation time can be reduced significantly. Physics of waves Numerical methods Matrix Computation
8	Analysis of Algorithms for Star Bicoloring and Related Problems Jones, Jeffrey S. 25 August 2015 (has links) No description available. Applied Mathematics Computer Science star bicoloring acyclic bicoloring Jacobian matrix computation approximation algorithms greedy star bicoloring Jacobian matrix optimization greedy optimization methods
9	Fast Order Basis and Kernel Basis Computation and Related Problems Zhou, Wei 28 November 2012 (has links) In this thesis, we present efficient deterministic algorithms for polynomial matrix computation problems, including the computation of order basis, minimal kernel basis, matrix inverse, column basis, unimodular completion, determinant, Hermite normal form, rank and rank profile for matrices of univariate polynomials over a field. The algorithm for kernel basis computation also immediately provides an efficient deterministic algorithm for solving linear systems. The algorithm for column basis also gives efficient deterministic algorithms for computing matrix GCDs, column reduced forms, and Popov normal forms for matrices of any dimension and any rank. We reduce all these problems to polynomial matrix multiplications. The computational costs of our algorithms are then similar to the costs of multiplying matrices, whose dimensions match the input matrix dimensions in the original problems, and whose degrees equal the average column degrees of the original input matrices in most cases. The use of the average column degrees instead of the commonly used matrix degrees, or equivalently the maximum column degrees, makes our computational costs more precise and tighter. In addition, the shifted minimal bases computed by our algorithms are more general than the standard minimal bases. polynomial matrix computation algorithm complexity computer algebra order basis kernel basis linear system solving matrix inverse determinant unimodular completion Popov form Hermite form column reduced form GCD matrix GCD rank rank profile column basis Computer Science
