Global ETD Search

1	Transparent large-page support for Itanium linux Wienand, Ian Raymond, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links) The abstraction provided by virtual memory is central to the operation of modern operating systems. Making the most efficient use of the available translation hardware is critical to achieving high performance. The multiple page-size support provided by almost all architectures promises considerable benefits but poses a number of implementation challenges. This thesis presents a minimally-invasive approach to transparent multiple page-size support for Itanium Linux. In particular, it examines the interaction between supporting large pages and Itanium's two inbuilt hardware page-table walkers; one being a virtual linear page-table with limited support for storing different page-size translations and the other a more flexible but higher overhead hash table based translation cache. Compared to a single-page-size kernel, a range of benchmarks show performance improvements when multiple page-sizes are available, generally large working sets that stress the TLB. However, other benchmarks are negatively impacted. Analysis shows that the increased TLB coverage, resulting from the use of large pages, frequently does not reduce TLB miss rates sufficiently to make up for the increased cost of TLB reloads. These results, which are specific to the Itanium architecture, suggest that large-page support for Itanium Linux is best enabled selectively with insight into application behaviour. Superpage Itanium Large-Page Linux
2	Transparent large-page support for Itanium linux Wienand, Ian Raymond, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links) The abstraction provided by virtual memory is central to the operation of modern operating systems. Making the most efficient use of the available translation hardware is critical to achieving high performance. The multiple page-size support provided by almost all architectures promises considerable benefits but poses a number of implementation challenges. This thesis presents a minimally-invasive approach to transparent multiple page-size support for Itanium Linux. In particular, it examines the interaction between supporting large pages and Itanium's two inbuilt hardware page-table walkers; one being a virtual linear page-table with limited support for storing different page-size translations and the other a more flexible but higher overhead hash table based translation cache. Compared to a single-page-size kernel, a range of benchmarks show performance improvements when multiple page-sizes are available, generally large working sets that stress the TLB. However, other benchmarks are negatively impacted. Analysis shows that the increased TLB coverage, resulting from the use of large pages, frequently does not reduce TLB miss rates sufficiently to make up for the increased cost of TLB reloads. These results, which are specific to the Itanium architecture, suggest that large-page support for Itanium Linux is best enabled selectively with insight into application behaviour. Superpage Itanium Large-Page Linux
3	Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture Varanavičius, Andrius 25 November 2010 (has links) Šiame darbe buvo išnagrinėtos Intel Itanium (IA-64) architektūros savybės, įtakojančios kompiliatoriaus generuojamą kodą, ir išanalizuotos kompiliatoriaus optimizacijos, kurios buvo pritaikytos IA-64 architektūrai. Buvo prieita prie išvados, kad tokias optimizacijas galima susiskirstyti į kelis tipus. Pirmiausia nuo architektūros priklausomos optimizacijos, kurių efektyvumą galima padidinti išnaudojant predikaciją ir prognozavimo savybes ar kitas IA-64 specifines savybes. Antra, nuo architektūros nepriklausomos tradicinės optimizacijos, kurių pertvarkomo kodo efektyvumą galima padidinti parenkant kitokius šias optimizacijas valdančius kompiliavimo parametrus. Tyrime buvo išnagrinėtos ciklų optimizacijos, kurių kodą galimą būtų pakeisti valdomais parametrais. Tyrimas parodė, kad iš tiesų įmanoma sugeneruoti efektyvesnį kodą Intel Itanium architektūroje, keičiant šių parametrų reikšmes nuo numatytųjų reikšmių. / This thesis deeply explored Intel Itanium architecture features that improve a code generated by compiler. Compiler optimizations which are tuned to this architecture are also described. Accomplished research showed that there were several types of optimizations which can be improved on IA-64 architecture. Firstly, optimizations which are dependent on architecture can be optimized using predication and speculation or other unique IA-64 features. Secondly, optimizations that are undependable from traditional architecture can be improved using more aggressive compilation controllable parameters than they are by default. Loop optimizations were chosen for final research. Research proved that changing values of these parameters from default can improve program performance. Itanium Ciklų optimizacijos Kompiliatorius Kompiliatoriaus optimizacijos GCC
4	Predicated execution and register windows for out-of-order processors Quiñones Moreno, Eduardo 18 November 2008 (has links) ISA extensions are a very powerful approach to implement new hardware techniques that require or benefit from compiler support: decisions made at compile time can be complemented at runtime, achieving a synergistic effect between the compiler and the processor. This thesis is focused on two ISA extensions: predicate execution and register windows. Predicate execution is exploited by the if-conversion compiler technique. If-conversion removes control dependences by transforming them to data dependences, which helps to exploit ILP beyond a single basic-block. Register windows help to reduce the amount of loads and stores required to save and restore registers across procedure calls by storing multiple contexts into a large architectural register file.In-order processors specially benefit from using both ISA extensions to overcome the limitations that control dependences and memory hierarchy impose on static scheduling. Predicate execution allows to move control dependence instructions past branches. Register windows reduce the amount of memory operations across procedure calls. Although if-conversion and register windows techniques have not been exclusively developed for in-order processors, their use for out-of-order processors has been studied very little. In this thesis we show that the uses of if-conversion and register windows introduce new performance opportunities and new challenges to face in out-of-order processors.The use of if-conversion in out-of-order processors helps to eliminate hard-to-predict branches, alleviating the severe performance penalties caused by branch mispredictions. However, the removal of some conditional branches by if-conversion may adversely affect the predictability of the remaining branches, because it may reduce the amount of correlation information available to the branch predictor. Moreover, predicate execution in out-of-order processors has to deal with two performance issues. First, multiple definitions of the same logical register can be merged into a single control flow, where each definition is guarded with a different predicate. Second, instructions whose guarding predicate evaluates to false consume unnecessary resources. This thesis proposes a branch prediction scheme based on predicate prediction that solves the three problems mentioned above. This scheme, which is built on top of a predicated ISA that implement a compare-and-branch model such as the one considered in this thesis, has two advantages: First, the branch accuracy is improved because the correlation information is not lost after if-conversion and the mechanism we propose permits using the computed value of the branch predicate when available, achieving 100% of accuracy. Second it avoids the predicate out-of-order execution problems.Regarding register windows, we propose a mechanism that reduces physical register requirements of an out-of-order processor to the bare minimum with almost no performance loss. The mechanism is based on identifying which architectural registers are in use by current in-flight instructions. The registers which are not in use, i.e. there is no in-flight instruction that references them, can be early released.In this thesis we propose a very efficient and low-cost hardware implementation of predicate execution and register windows that provide important benefits to out-of-order processors. ISA-extensions out-of-order itanium predicate prediction register windows predicated execution 004
5	Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture Valiukas, Tadas 01 July 2014 (has links) Tradicinės x86 architektūros spartinimui artėjant prie galimybių ribos, kompanija Intel pradėjo kurti naują IA-64 architektūrą, paremtą EPIC – išreikštinai lygiagrečiai vykdomomis instrukcijomis vieno takto metu. Ši pagrindinė savybė leidžia vykdyti iki šešių instrukcijų per vieną taktą. Taipogi architektūra pasižymi tokiomis savybėmis, kurios leido efektyviai spręsti su kodo optimizavimu susijusias problemas tradicinėse architektūrose. Tačiau kompiliatorių optimizavimo algoritmai ilgą laiką buvo tobulinami tradicinėse architektūrose, todėl norint išnaudoti naująją architektūrą, reikia ieškoti būdų tobulinti esamus kompiliatorius. Vienas iš būdų – kompiliatoriaus vidinių parametrų atsakingų už optimizacijas reikšmių pritaikymas IA-64. Būtent toks yra šio darbo tikslas, kuriam pasiekti reikia išnagrinėti IA-64 savybes, jas vėliau eksperimentiškai taikyti realaus kodo pavyzdžiuose bei įvertinti jų įtaką kodo vykdymo spartai. Pagal gautus rezultatus nagrinėjami kompiliatoriaus vidiniai parametrai ir su specialia kompiliatorių testavimo programa randamas geriausias reikšmių rinkinys šiai architektūrai. Vėliau šis rinkinys išbandomas su taikomosiomis programomis. Gauto parametrų rinkinio reikšmės turėtų leisti generuoti efektyvesnį kodą IA-64 architektūrai. / After performance optimization of traditional architectures began to reach their limits, Intel corporation started to develop new architecture based on EPIC – Explicitly Parallel Instruction Counting. This main feature allowed up to six instructions to be executed in single CPU cycle. Also this architecture includes more features, which allowed efficient solution of traditional architectures code optimization problems. However for long time code optimization algorithms have been improved for traditional architectures only, as a result those algorithms should be adopted to new architecture. One of the ways to do that – exploration of internal compilers parameters, which are responsible for code optimizations. That is the primary target of this work and in order to reach it the features of the IA-64 architecture and impact to execution performance must be explored using real-life code examples. Tests results may be used later for internal parameters selection and further exploration of these parameters values by using special compiler performance testing benchmarks. The set of those new values could be tested with real life applications in order to prove efficiency of IA-64 architecture features. IA-64 architektūra Itanium Predikacija Išankstinis duomenų užkrovimas Ciklų optimizavimas Valdymo spėjimas IA-64 architecture VLIW – Very Long Instruction Word Predication Control and data speculation Software pipelining Branch prediction Prefetching RSE – rotating register engine Gcc optimization

1

Page generated in 0.0435 seconds