Memory and functional unit design for vector microprocessors

Modern mobile devices employ SIMD datapaths to exploit small-scale data-level parallelism and so achieve the performance required to process a continuously growing number of computation-intensive applications within a severely energy-constrained environment. The introduction of advanced SIMD features expands the applicability of vector ISA extensions from media and signal processing algorithms to general-purpose code. Considering the high memory bandwidth demands and the complexity of execution units associated with those features, this dissertation focuses on two main areas of investigation: the efficient handling of parallel memory accesses and the optimization of vector functional units. A key observation, obtained from simulation-based analysis of the type and frequency of memory access patterns exhibited by general-purpose workloads, is the tendency of consecutive memory references to access the same page. Exploiting this and further observations, Page-Based Memory Access Grouping enables a level-one data cache interface to utilize single-ported TLBs and cache banks to achieve performance similar to multi-ported components, while consuming significantly less energy. Page-Based Way Determination extends the proposed scheme with TLB-coupled structures holding way information on recently accessed lines. These structures improve the energy efficiency of the vast majority of memory references by enabling them to bypass tag arrays and directly target individual cache ways. A vector benchmarking environment, comprising a flexible ISA extension, a parameterizable simulation framework and a corresponding benchmark suite, is developed and utilized in the second part of this thesis to facilitate investigations into the design aspects and potential performance benefits of advanced SIMD features.
Based on this environment, a set of microarchitecture optimizations is introduced, including techniques to compute hardware-interpretable masks for segmented operations, partition scans to allow specific energy-performance trade-offs, reuse existing multiplexers to process predicated and segmented vectors, accelerate scans on incomplete vectors, efficiently handle micro-ops composed entirely of predicated elements, and reference multiple physical registers within individual operands to improve the utilization of the vector register file.
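
As an illustration of the same-page observation in the abstract, the following minimal sketch groups consecutive memory references by virtual page and counts how many address translations would be needed if each group shared one lookup. It is not the dissertation's mechanism: the 4 KiB page size, the function names `group_by_page` and `count_translations`, and the example address trace are all assumptions made for illustration.

```python
from itertools import groupby

PAGE_SHIFT = 12  # assumption: 4 KiB pages, so the page number is addr >> 12


def group_by_page(addresses, page_shift=PAGE_SHIFT):
    """Split an address trace into runs of consecutive same-page references.

    Returns a list of (page_number, [addresses]) tuples in trace order.
    """
    return [
        (page, list(run))
        for page, run in groupby(addresses, key=lambda a: a >> page_shift)
    ]


def count_translations(addresses, page_shift=PAGE_SHIFT):
    """Translations needed if each same-page run shares a single lookup."""
    return len(group_by_page(addresses, page_shift))


# Hypothetical trace: three references to page 1, two to page 2, one to page 1.
trace = [0x1000, 0x1008, 0x1FF8, 0x2000, 0x2010, 0x1004]
groups = group_by_page(trace)
translations = count_translations(trace)
```

In this hypothetical trace, six references fall into three same-page runs, so a grouped interface would perform three translations rather than six. Only consecutive references are merged here; the final access to page 1 starts a new run.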

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:605767
Date: January 2014
Creators: Boettcher, Matthias
Contributors: Al-Hashimi, Bashir
Publisher: University of Southampton
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://eprints.soton.ac.uk/365071/