Global ETD Search

1	Instruction scheduling optimizations for energy efficient VLIW processors Porpodas, Vasileios January 2013 (has links) Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitioning the shared hardware resources (e.g., register file). Such designs further increase the challenges of instruction scheduling since the compiler has the additional tasks of deciding on the placement of the instructions to the corresponding clusters and orchestrating the data movements across clusters. In this thesis we propose instruction scheduling optimizations for energy-efficient VLIW processors. Some of the techniques aim at improving the existing state-of-theart scheduling techniques, while others aim at using compiler techniques for closing the gap between lightweight hardware designs and more complex ones. Each of the proposed techniques target individual features of energy efficient VLIW architectures. Our first technique, called Aligned Scheduling, makes use of a novel scheduling heuristic for hiding memory latencies in lightweight VLIW processors without hardware load-use interlocks (Stall-On-Miss). With Aligned Scheduling, a software-only technique, a SOM processor coupled with non-blocking caches can better cope with the cache latencies and it can perform closer to the heavyweight designs. Performance is improved by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The rest of the techniques target a class of VLIW processors known as clustered VLIWs, that are more scalable and more energy efficient and operate at higher frequencies than their monolithic counterparts. The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors that solves the problem of the existing state-of-the-art schedulers being very susceptible to the inter-cluster communication latency. The proposed unified clustering and scheduling technique is a hybrid scheme that performs instruction by instruction switching between the two state-of-the-art clustering heuristics, leading to better scheduling than either of them. It generates better performing code compared to the state-of-the-art for a wide range of inter-cluster latency values on the Mediabench II benchmarks. The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures that minimizes inter-cluster communication by local caching and reuse of already received data. Unlike dynamically scheduled processors, where this can be supported by the register renaming hardware, in VLIWs it has to be done by the code generator. The proposed instruction scheduler unifies cluster assignment, instruction scheduling and communication minimization in a single unified algorithm, solving the phase ordering issues between all three parts. The proposed scheduler shows an improvement in execution time of up to 20.3% and 13.8% on average across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The last technique, applies to heterogeneous clustered VLIWs that support dynamic voltage and frequency scaling (DVFS) independently per cluster. In these processors there are no hardware interlocks between clusters to honor the data dependencies. Instead, the scheduler has to be aware of the DVFS decisions to guarantee correct execution. Effectively controlling DVFS, to selectively decrease the frequency of clusters with slack in their schedule, can lead to significant energy savings. The proposed technique (called UCIFF) solves the phase ordering problem between frequency selection and scheduling that is present in existing algorithms. The results show that UCIFF produces better code than the state-of-the-art and very close to the optimal across the Mediabench II benchmarks. Overall, the proposed instruction scheduling techniques lead to either better efficiency on existing designs or allow simpler lightweight designs to be competitive against ones with more complex hardware.
2	Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture Valiukas, Tadas 01 July 2014 (has links) Tradicinės x86 architektūros spartinimui artėjant prie galimybių ribos, kompanija Intel pradėjo kurti naują IA-64 architektūrą, paremtą EPIC – išreikštinai lygiagrečiai vykdomomis instrukcijomis vieno takto metu. Ši pagrindinė savybė leidžia vykdyti iki šešių instrukcijų per vieną taktą. Taipogi architektūra pasižymi tokiomis savybėmis, kurios leido efektyviai spręsti su kodo optimizavimu susijusias problemas tradicinėse architektūrose. Tačiau kompiliatorių optimizavimo algoritmai ilgą laiką buvo tobulinami tradicinėse architektūrose, todėl norint išnaudoti naująją architektūrą, reikia ieškoti būdų tobulinti esamus kompiliatorius. Vienas iš būdų – kompiliatoriaus vidinių parametrų atsakingų už optimizacijas reikšmių pritaikymas IA-64. Būtent toks yra šio darbo tikslas, kuriam pasiekti reikia išnagrinėti IA-64 savybes, jas vėliau eksperimentiškai taikyti realaus kodo pavyzdžiuose bei įvertinti jų įtaką kodo vykdymo spartai. Pagal gautus rezultatus nagrinėjami kompiliatoriaus vidiniai parametrai ir su specialia kompiliatorių testavimo programa randamas geriausias reikšmių rinkinys šiai architektūrai. Vėliau šis rinkinys išbandomas su taikomosiomis programomis. Gauto parametrų rinkinio reikšmės turėtų leisti generuoti efektyvesnį kodą IA-64 architektūrai. / After performance optimization of traditional architectures began to reach their limits, Intel corporation started to develop new architecture based on EPIC – Explicitly Parallel Instruction Counting. This main feature allowed up to six instructions to be executed in single CPU cycle. Also this architecture includes more features, which allowed efficient solution of traditional architectures code optimization problems. However for long time code optimization algorithms have been improved for traditional architectures only, as a result those algorithms should be adopted to new architecture. One of the ways to do that – exploration of internal compilers parameters, which are responsible for code optimizations. That is the primary target of this work and in order to reach it the features of the IA-64 architecture and impact to execution performance must be explored using real-life code examples. Tests results may be used later for internal parameters selection and further exploration of these parameters values by using special compiler performance testing benchmarks. The set of those new values could be tested with real life applications in order to prove efficiency of IA-64 architecture features. IA-64 architektūra Itanium Predikacija Išankstinis duomenų užkrovimas Ciklų optimizavimas Valdymo spėjimas IA-64 architecture VLIW – Very Long Instruction Word Predication Control and data speculation Software pipelining Branch prediction Prefetching RSE – rotating register engine Gcc optimization

Search results

Instruction scheduling optimizations for energy efficient VLIW processors

Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture