Return to search

A High Performance Register Allocator for Vector Architectures with a Unified Register-Set

This thesis describes a compiler optimization targeted for machines with unified, vector-based register sets. This optimization combines register allocation and instruction scheduling. It examines places where the code performs computations on scalar variables. The goal is to identify instances where the same operation is performed. For example, a program might calculate ¡§base+offset¡¨ and then calculate ¡§i+j¡¨. Even though these computations are unrelated, yet they use the same operator; if ¡§base¡¨ and ¡§i¡¨ are packed into one vector register, while ¡§offset¡¨ and ¡§j¡¨ are packed into another, then these two computations can be performed simultaneously through the vectors¡¦ parallel addition operation. This would reduce the execution time of the compiled code.
Although other researchers have considered similar packing methods, their work has been limited by the hardware that they were studying. Such hardware usually imposed high costs for moving data between scalar and vector register banks. This present thesis, however, considers a novel hardware architecture that imposes no such costs. As a consequence, we are able to obtain significant speedups.
The architecture that we consider is a Graphics Processing Unit (GPU) for embedded systems that is under development at this university. This GPU has a single register set for integers, float, and vectors.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0629112-150235
Date29 June 2012
CreatorsSu, Yu-Dan
ContributorsTsung-Chuan Huang, Shen-Fu Hsiao, Steve W. Haga, Chung-Nan Lee
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0629112-150235
Rightsuser_define, Copyright information available at source archive

Page generated in 0.0016 seconds