Global ETD Search

11	Pthreads and OpenMP : A performance and productivity study Swahn, Henrik January 2016 (has links) Today most computer have a multicore processor and are depending on parallel execution to be able to keep up with the demanding tasks that exist today, that forces developers to write software that can take advantage of multicore systems. There are multiple programming languages and frameworks that makes it possible to execute the code in parallel on different threads, this study looks at the performance and effort required to work with two of the frameworks that are available to the C programming language, POSIX Threads(Pthreads) and OpenMP. The performance is measured by paralleling three algorithms, Matrix multiplication, Quick Sort and calculation of the Mandelbrot set using both Pthreads and OpenMP, and comparing first against a sequential version and then the parallel version against each other. The effort required to modify the sequential program using OpenMP and Pthreads is measured in number of lines the final source code has. The results shows that OpenMP does perform better than Pthreads in Matrix Multiplication and Mandelbrot set calculation but not on Quick Sort because OpenMP has problem with recursion and Pthreads does not. OpenMP wins the effort required on all the tests but because there is a large performance difference between OpenMP and Pthreads on Quick Sort OpenMP cannot be recommended for paralleling Quick Sort or other recursive programs. OpenMP Pthreads Algorithms Performance Productivity Quick Sort Matrix Multiplication Mandelbrot Set
12	DSP Platform Benchmarking : DSP Platform Benchmarking Xinyuan, Luo January 2009 (has links) <p><p>Benchmarking of DSP kernel algorithms was conducted in the thesis on a DSP processor for teaching in the course TESA26 in the department of Electrical Engineering. It includes benchmarking on cycle count and memory usage. The goal of the thesis is to evaluate the quality of a single MAC DSP instruction set and provide suggestions for further improvement in instruction set architecture accordingly. The scope of the thesis is limited to benchmark the processor only based on assembly coding. The quality check of compiler is not included. The method of the benchmarking was proposed by BDTI, Berkeley Design Technology Incorporations, which is the general methodology used in world wide DSP industry.</p><p>Proposals on assembly instruction set improvements include the enhancement of FFT and DCT. The cycle cost of the new FFT benchmark based on the proposal was XX% lower, showing that the proposal was right and qualified. Results also show that the proposal promotes the cycle cost score for matrix computing, especially matrix multiplication. The benchmark results were compared with general scores of single MAC DSP processors offered by BDTI.</p></p> DSP benchmark benchmarking FFT IFFT DCT blocktransfer Single FIR Filter Matrix Multiplication Crosscorrelation Electrical engineering Elektroteknik
13	DSP Platform Benchmarking : DSP Platform Benchmarking Xinyuan, Luo January 2009 (has links) Benchmarking of DSP kernel algorithms was conducted in the thesis on a DSP processor for teaching in the course TESA26 in the department of Electrical Engineering. It includes benchmarking on cycle count and memory usage. The goal of the thesis is to evaluate the quality of a single MAC DSP instruction set and provide suggestions for further improvement in instruction set architecture accordingly. The scope of the thesis is limited to benchmark the processor only based on assembly coding. The quality check of compiler is not included. The method of the benchmarking was proposed by BDTI, Berkeley Design Technology Incorporations, which is the general methodology used in world wide DSP industry. Proposals on assembly instruction set improvements include the enhancement of FFT and DCT. The cycle cost of the new FFT benchmark based on the proposal was XX% lower, showing that the proposal was right and qualified. Results also show that the proposal promotes the cycle cost score for matrix computing, especially matrix multiplication. The benchmark results were compared with general scores of single MAC DSP processors offered by BDTI. DSP benchmark benchmarking FFT IFFT DCT blocktransfer Single FIR Filter Matrix Multiplication Crosscorrelation Electrical engineering Elektroteknik
14	Σχεδιασμός και ανάλυση αλγορίθμων προσέγγισης με μητρώα χαμηλής τάξης / Algorithms for fast matrix computations Ζούζιας, Αναστάσιος 24 January 2012 (has links) Στόχος της εργασίας είναι η μελέτη πιθανοτικών αλγορίθμων για προσεγγιστική επίλυση προβλημάτων του επιστημονικού υπολογισμού. Τα προβλήματα τα οποία θα μας απασχολήσουν είναι ο πολλαπλασιασμός μητρών, ο υπολογισμός της διάσπασης ιδιαζουσών τιμών (SVD) ενός μητρώου και ο υπολογισμός μιας "συμπιεσμένης" διάσπασης ενός μητρώου. / - 512.943 4 Sampling Singular value decomposition Randomized algorithms Matrix multiplication
15	Nonnegative matrix factorization for clustering Kuang, Da 27 August 2014 (has links) This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, challenges in the widespread use of NMF as a clustering method lie in its correctness and efficiency: First, we need to know why and when NMF could detect the true clusters and guarantee to deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer time than other clustering methods. We show that the original NMF can be improved from both aspects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as an input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, it is difficult but important to choose the right number of clusters. We show that the widely-used consensus NMF in genomic analysis for choosing the number of clusters have critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performances in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling where users want to uncover the major themes in a large text collection. Our strategy of accelerating NMF-based clustering is to design algorithms that better suit the computer architecture as well as exploit the computing power of parallel platforms such as the graphic processing units (GPUs). A key observation is that applying rank-2 NMF that partitions a data set into two clusters in a recursive manner is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is to multiply a large sparse data matrix with a tall-and-skinny dense matrix. We use the GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents. Nonnegative matrix factorization Cluster analysis Hierarchical clustering Cancer subtype discovery GPU computing Sparse matrix multiplication
16	A planilha eletrônica como recurso didático: um exemplo com multiplicação de matrizes Rocha, Josimar Moreira 16 August 2014 (has links) Submitted by Renata Lopes (renatasil82@gmail.com) on 2016-02-19T16:07:51Z No. of bitstreams: 1 josimarmoreirarocha.pdf: 2068448 bytes, checksum: 720b45220091d2a8fe3fb2740d4b4cf9 (MD5) / Approved for entry into archive by Adriana Oliveira (adriana.oliveira@ufjf.edu.br) on 2016-02-26T13:50:01Z (GMT) No. of bitstreams: 1 josimarmoreirarocha.pdf: 2068448 bytes, checksum: 720b45220091d2a8fe3fb2740d4b4cf9 (MD5) / Made available in DSpace on 2016-02-26T13:50:01Z (GMT). No. of bitstreams: 1 josimarmoreirarocha.pdf: 2068448 bytes, checksum: 720b45220091d2a8fe3fb2740d4b4cf9 (MD5) Previous issue date: 2014-08-16 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / O propósito deste trabalho é dissertar sobre as possibilidades do uso da planilha eletrônica como recurso didático no ensino de Matemática. Para exemplificar esse uso, é apresentada uma sequência didática sobre multiplicação de matrizes, abordada sob a perspectiva de transformações no plano. Este trabalho tem um duplo caráter: mostrar que a planilha eletrônica é uma eficiente ferramenta a ser utilizada na intermediação do processo de ensino e aprendizagem de diversos conteúdos e sugerir uma abordagem diferenciada do tópico multiplicação de matrizes no Ensino Médio. É um trabalho de natureza teórica, fundamentado na pesquisa da bibliografia disponível sobre o assunto. Espera-se que sua leitura estimule o uso da planilha eletrônica em sala de aula e, em particular, que ela forneça elementos para o tratamento significativo da multiplicação de matrizes. De uma maneira mais geral, a importância deste trabalho está em promover a inserção das planilhas eletrônicas no contexto escolar, reconhecendo seu potencial pedagógico no ensino de Matemática e no ensino de outras áreas do conhecimento. / The purpose of this work is to discuss the possibilities of using the electronic spreadsheet as a teaching resource in the teaching of Mathematics. To illustrate this use, a didactic sequence on matrix multiplication is presented, approached from the perspective of changes in the plan. This work has a dual character: to show that the electronic spreadsheet is an efficient tool to be used in the intermediation of the teaching and learning process of several contents and suggest a differentiated approach to the matrix multiplication topic in high school. It is a work of theoretical nature, based on the research of the literature available on the subject. It is expected that its reading will stimulate the use of the electronic spreadsheet in the classroom and, in particular, that it will provide evidence for the significant treatment of matrix multiplication. In a more general way, the importance of this work is to promote the integration of electronic spreadsheets in the school context, recognizing its pedagogical potential in the teaching of mathematics and in the teaching of other areas of knowledge. Planilha Eletrônica Transformação Linear Multiplicação de Matrizes Spreadsheet Linear Transformation Matrix Multiplication
17	Adaptation of Legacy Codes to Context-Aware Composition using Aspect-Oriented Programming for Data Representation Conversion Sotsenko, Alisa January 2013 (has links) Different computation problem domains such as sorting, matrix multiplication, etc. usually require different data representations and algorithms variants implementations in order to be adapted and re-designed to context-aware composition (CAC). Context-aware composition is a technique for the design of applications that can adapt its behavior according to changes in the program. We considered two application domains: matrix multiplication and graph algorithms (DFS algorithm in particular). The main problem in the implementation of the representation mechanisms applied in these problem domains is time spent on the data representation conversion that in the end should not influence the application performance. This thesis work presents a flexible aspect-based architecture that includes the data structure representation adaptation in order to reduce implementation efforts required for adaptation different application domains. Although, manual approach has small overhead 4-10% for different problems compared to the AOP-based approach, experiments show that the manual adaptation to CAC requires on average three times more programming effort in terms of lines of code than AOP-based approach. Moreover, the AOP-based approach showed the average speed-up over baseline algorithms that use standard data structures of 2.1. aspect-oriented programing context-aware composition matrix multiplication graph algorithm data representation Software Engineering Programvaruteknik
18	Algebraic geometry for tensor networks, matrix multiplication, and flag matroids Seynnaeve, Tim 08 January 2021 (has links) This thesis is divided into two parts, each part exploring a different topic within the general area of nonlinear algebra. In the first part, we study several applications of tensors. First, we study tensor networks, and more specifically: uniform matrix product states. We use methods from nonlinear algebra and algebraic geometry to answer questions about topology, defining equations, and identifiability of uniform matrix product states. By an interplay of theorems from algebra, geometry, and quantum physics we answer several questions and conjectures posed by Critch, Morton and Hackbusch. In addition, we prove a tensor version of the so-called quantum Wielandt inequality, solving an open problem regarding the higher-dimensional version of matrix product states. Second, we present new contributions to the study of fast matrix multiplication. Motivated by the symmetric version of matrix multiplication we study the plethysm S^k(sl_n) of the adjoint representation sl_n of the Lie group SL_n . Moreover, we discuss two algebraic approaches for constructing new tensors which could potentially be used to prove new upper bounds on the complexity of matrix multiplication. One approach is based on the highest weight vectors of the aforementioned plethysm. The other approach uses smoothable finite-dimensional algebras. Finally, we study the Hessian discriminant of a cubic surface, a recently introduced invariant defined in terms of the Waring rank. We express the Hessian discriminant in terms of fundamental invariants. This answers Question 15 of the 27 questions on the cubic surface posed by Bernd Sturmfels. In the second part of this thesis, we apply algebro-geometric methods to study matroids and flag matroids. We review a geometric interpretation of the Tutte polynomial in terms of the equivariant K-theory of the Grassmannian. By generalizing Grassmannians to partial flag varieties, we obtain a new invariant of flag matroids: the flag-geometric Tutte polynomial. We study this invariant in detail, and prove several interesting combinatorial properties. info:eu-repo/classification/ddc/500 ddc:500
19	Nové Odhady pro Kombinatorických Problémů a Kvazi-Grayových Kódů / New Bounds for Combinatorial Problems and Quasi-Gray Codes Das, Debarati January 2019 (has links) This thesis consists of two parts. In part I, a group of combinatorial problems pertaining to strings, boolean matrices and graphs is studied. For given two strings x and y, their edit distance is the minimum number of character insertions, deletions and substitutions required to convert x into y. In this thesis we provide an algorithm that computes a constant approximation of edit distance in truly sub-quadratic time. Based on the provided ideas, we construct a separate sub- quadratic time algorithm that can find an occurrence of a pattern P in a given text T while allowing a few edit errors. Afterwards we study the boolean matrix multiplication (BMM) problem where given two boolean matrices, the aim is to find their product over boolean semi-ring. For this problem, we present two combinatorial models and show in these models BMM requires Ω(n3 /2O( √ log n) ) and Ω(n7/3 /2O( √ log n) ) work respectively. Furthermore, we also give a construction of a sparse sub-graph that preserves the distance between a designated source and any other vertex as long as the total weight increment of all the edges is bounded by some constant. In part II, we study the efficient construction of quasi-Gray codes. We give a construction of space optimal quasi-Gray codes over odd sized alphabets with read complexity 4...
20	Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication Ahmed, Salman, Houser, Jennifer, Hoque, Mohammad A., Raju, Rezaul, Pfeiffer, Phil 01 July 2017 (has links) Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplications, much of this time is spent on interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern based on the logarithmic complexity of the number of processors (i.e., O (log (p)) where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction of inter-process communication overhead for matrices with larger dimensions compared to another one dimensional parallel algorithm that takes O(p) run-time complexity for accumulating the results. communication overhead MPI communication parallel computing performance analysis scalability sparse matrix-matrix multiplication Computing

Search results