In this thesis, we evaluate the memory characteristics and parallel behaviour of the SUSAN (Smallest Univalue Segment Assimilating Nucleus) and Harris corner detection algorithms. Our purpose is to understand how memory behaviour affects the predictability of these algorithms and how multi-core machines can be used to improve their execution time. By investigating the execution patterns of the two algorithms, we were able to break them down into parallelizable and non-parallelizable parts. We implemented a fork-join model on the parallelizable parts and achieved a speedup of 7.9 to 8 times on both corner detection algorithms using an 8-core P4080 machine. For the sake of a wider study, we also executed these parallel adaptations on four different Intel platforms, which produced similar results. The parallelized algorithms are also candidates for further improvement. We therefore investigated their memory characteristics in terms of L1 data and instruction cache misses, cycles spent waiting for L2 cache miss loads, and TLB store misses. In these measurements, we found a strong correlation between L1 data cache replacement and the execution time. To counter this memory issue, we implemented loop tiling techniques that were adjusted according to the L1 cache size of our test systems. Our tests of the tiling techniques show a less fluctuating memory behaviour, which, however, comes at the cost of an increase in execution time.
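As a rough illustration of the loop tiling idea mentioned in the abstract, the sketch below visits an image tile by tile instead of row by row, with the tile side derived from an L1 cache size. It is not the thesis implementation: the 32 KiB cache size, the 4-byte pixel size, the 3x3 neighbourhood sum used as a stand-in for a corner response, and all function names are assumptions made for this example.

```python
import math

L1_BYTES = 32 * 1024      # assumed L1 data cache size; not taken from the thesis
BYTES_PER_PIXEL = 4       # assumed element size


def tile_side(l1_bytes=L1_BYTES, bytes_per_pixel=BYTES_PER_PIXEL):
    """Side of a square tile such that an input tile and an output tile fit in L1.

    The factor 2 accounts for the two buffers; the one-pixel halo read by the
    kernel is ignored to keep the sketch simple.
    """
    return max(1, math.isqrt(l1_bytes // (2 * bytes_per_pixel)))


def response(image, y, x):
    """Hypothetical per-pixel kernel: sum of the 3x3 neighbourhood, clamped at borders."""
    h, w = len(image), len(image[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            total += image[yy][xx]
    return total


def process_untiled(image):
    """Plain row-major traversal of the whole image."""
    h, w = len(image), len(image[0])
    return [[response(image, y, x) for x in range(w)] for y in range(h)]


def process_tiled(image, tile=None):
    """Same computation, but visiting the image one tile at a time."""
    tile = tile or tile_side()
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile):              # tile origin, rows
        for tx in range(0, w, tile):          # tile origin, columns
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    out[y][x] = response(image, y, x)
    return out
```

Both traversals compute the same output; only the visiting order differs. In Python the cache effect itself is not observable, so the sketch shows the structure of the transformation rather than its performance.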
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mdh-35842 |
Date | January 2017 |
Creators | Sääf, André, Samuelsson, Alvin |
Publisher | Mälardalens högskola, Akademin för innovation, design och teknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |