Detection and exploitation of data-parallelism in assignments of multi-dimensional tensors

This thesis studies data-parallelism in tensor assignments. Building on an existing domain-specific language for tensor calculations developed at the Chair of Compiler Construction, an extension is proposed to detect so-called compatible statements, a notion that characterizes when a statement is data-parallel. The correctness of the extension is shown using a type system, and a conjecture about its optimality is proposed. Based on the extension, two optimizations that exploit the data-parallelism are described. These optimizations reduce the memory used during computation, thereby reducing cache misses and improving runtime. The achievable speedup depends mostly on the complexity of the kernel and the size of the tensors. For simple kernels, such as multiplication of a vector by a scalar or elementwise multiplication of two vectors, a speedup of up to 2x can be achieved. For more complex kernels, such as one containing a matrix-matrix multiplication, the speed difference is a few percent. Additionally, a kernel called interpolation, which consists of incompatible statements, is analysed to check whether a similar optimization can be applied. The result is that while the optimization does not yield a speedup, similar improvements might be possible. Finally, to gain a better understanding of the effect the optimizations might have, different kernels are analysed with regard to the data size at which parallelization makes sense.

1 Introduction
1.1 Parallelization
1.2 Existing DSLs and compilers
2 Background
2.1 Tensors and tensor operations
2.2 A language for tensor manipulation
3 Compatible statements
3.1 Detecting compatible statements
4 Extension of the DSL
5 Correctness of the extension
6 Performance evaluation
6.1 Copy vs. in-place (avoid-copy)
6.2 Other variable vs. in-place (reduce-cache-miss)
6.2.1 Explanation of the optimization
6.2.2 Measuring the impact
6.3 Memory reusing for incompatible statements
7 Evaluation of data sizes for parallelization
8 Summary
9 Outlook
Appendices

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:31972
Date: 22 October 2018
Creators: Ullrich, Til Jasper
Contributors: Rink, Norman; Castrillon, Jeronimo; Vogler, Heiko; Technische Universität Dresden
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: doc-type:bachelorThesis, info:eu-repo/semantics/bachelorThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
