Global ETD Search

1	Explicitly Staged Software Pipelining Thaller, Wolfgang 08 1900 (has links) <p> Software Pipelining is a method of instruction scheduling where loops are scheduled more efficiently by executing operations from more than one iteration of the loop in parallel. Finding an optimal software pipelined schedule is NP-complete, but many heuristic algorithms exist. </p> In iteration i, a software pipelined loop will execute, in parallel, "stage" 1 of iteration i, stage 2 of iteration i- 1 and so on until stage k of iteration i-k+l. </p> <p> We present a new approach to software pipelining based on using a hemistic algorithm to explicitly assign each operation to its stage before the actual scheduling. </p> <p> This explicit assignment allows us to implement control flow mechanisms that are hard to implement with traditional methods of software pipelining, which do not give us direct control over what stages instructions are assigned to. </p> / Thesis / Master of Science (MSc) software pipelining optimal software pipeline schedule computing
2	Implementa??o da t?cnica de software pipelining na rede em chip IPNoSyS Medeiros, Aparecida Lopes de 21 February 2014 (has links) Made available in DSpace on 2014-12-17T15:48:10Z (GMT). No. of bitstreams: 1 AparecidaLM_DISSERT.pdf: 8059053 bytes, checksum: a243ee0772a785a00c8a0670955a7cae (MD5) Previous issue date: 2014-02-21 / Coordena??o de Aperfei?oamento de Pessoal de N?vel Superior / Alongside the advances of technologies, embedded systems are increasingly present in our everyday. Due to increasing demand for functionalities, many tasks are split among processors, requiring more efficient communication architectures, such as networks on chip (NoC). The NoCs are structures that have routers with channel point-to-point interconnect the cores of system on chip (SoC), providing communication. There are several networks on chip in the literature, each with its specific characteristics. Among these, for this work was chosen the Integrated Processing System NoC (IPNoSyS) as a network on chip with different characteristics compared to general NoCs, because their routing components also accumulate processing function, ie, units have functional able to execute instructions. With this new model, packets are processed and routed by the router architecture. This work aims at improving the performance of applications that have repetition, since these applications spend more time in their execution, which occurs through repeated execution of his instructions. Thus, this work proposes to optimize the runtime of these structures by employing a technique of instruction-level parallelism, in order to optimize the resources offered by the architecture. The applications are tested on a dedicated simulator and the results compared with the original version of the architecture, which in turn, implements only packet level parallelism / Com os avan?os tecnol?gicos os sistemas embarcados est?o cada vez mais presentes em nosso cotidiano. Devido a crescente demanda por funcionalidades, as fun??es s?o distribu?das entre os processadores, demandando arquiteturas de comunica??o mais eficientes, como as redes em chip (Network-on-Chip - NoC). As NoCs s?o estruturas que possuem roteadores com canais ponto-a-ponto que interconectam os cores do SoC (System-on-Chip), provendo comunica??o. Existem diversas redes em chip na literatura, cada uma com suas caracter?sticas espec?ficas. Dentre essas, para este trabalho foi a escolhida a IPNoSyS (Integrated Processing NoC System) por ser uma rede em chip com caracter?sticas diferenciadas em rela??o ?s NoCs em geral, pois seus componentes de roteamento acumulam tamb?m a fun??o de processamento, ou seja, possuem unidades funcionais capazes de executar instru??es. Com esse novo modelo, pacotes s?o processados e roteados pela arquitetura do roteador. Este trabalho visa melhorar o desempenho das aplica??es que possuem repeti??o, pois essas aplica??es gastam um tempo maior na sua execu??o, o que se d? pela repetida execu??o de suas instru??es. Assim, este trabalho prop?e otimizar o tempo de execu??o dessas estruturas, atrav?s do emprego de uma t?cnica de paralelismo em n?vel de instru??es, visando melhor aproveitar os recursos oferecidos pela arquitetura. As aplica??es s?o testadas em um simulador dedicado, e seus resultados comparados com a vers?o original da arquitetura, a qual prov? paralelismo apenas em n?vel de pacotes
3	Dynamic execution prediction and pipeline balancing of streaming applications Aleen, Farhana Afroz 30 August 2010 (has links) The number and scope of data driven streaming applications is growing. Such streaming applications are promising targets for effectively utilizing multi-cores because of their inherent amenability to pipelined parallelism. While existing methods of orchestrating streaming programs on multi-cores have mostly been static, real-world applications show ample variations in execution time that may cause the achieved speedup and throughput to be sub-optimal. One of the principle challenges for moving towards dynamic pipeline balancing has been the lack of approaches that can predict upcoming dynamic variations in execution efficiently, well before they occur. In this thesis, we propose an automated dynamic execution behavior prediction approach based on compiler analysis that can be used to efficiently estimate the time to be spent in different pipeline stages for upcoming inputs. Our approach first uses dynamic taint analysis to automatically generate an input-based execution characterization of the streaming program, which identifies the key control points where variation in execution might occur with respect to the associated input elements. We then automatically generate a light-weight emulator from the program using this characterization that can predict the execution paths taken for new streaming inputs and provide execution time estimates and possible dynamic variations. The main challenge in devising such an approach is the essential trade-off between accuracy and overhead of dynamic analysis. We present experimental evidence that our technique can accurately and efficiently estimate dynamic execution behaviors for several benchmarks with a small error rate. We also showed that the error rate could be lowered with the trade-off of execution overhead by implementing a selective symbolic expression generation for each of the complex conditions of control-flow operations. Our experiments show that dynamic pipeline balancing using our predicted execution behavior can achieve considerably higher speedup and throughput along with more effective utilization of multi-cores than static balancing approaches. Dynamic scheduling Software pipelining Multimedia systems Computers, Pipeline Pipelining (Electronics)
4	Des réseaux de processus cyclo-statiques à la génération de code pour le pipeline multi-dimensionnel / From Cyclo-Static Process Networks to Code Generation for Multidimensional Software Pipelining Fellahi, Mohammed 22 April 2011 (has links) Les applications de flux de données sont des cibles importantes de l’optimisation de programme en raison de leur haute exigence de calcul et la diversité de leurs domaines d’application: communication, systèmes embarqués, multimédia, etc. L’un des problèmes les plus importants et difficiles dans la conception des langages de programmation destinés à ce genre d’applications est comment les ordonnancer à grain fin à fin d’exploiter les ressources disponibles de la machine.Dans cette thèse on propose un "framework" pour l’ordonnancement à grain fin des applications de flux de données et des boucles imbriquées en général. Premièrement on essaye de paralléliser le nombre maximum de boucles en appliquant le pipeline logiciel. Après on merge le prologue et l’épilogue de chaque boucle (phase) parallélisée pour éviter l’augmentation de la taille du code. Ce processus est un pipeline multidimensionnel, quelques occurrences (ou instructions) sont décalées par des iterations de la boucle interne et d’autres occurrences (instructions) par des iterationsde la boucle externe. Les expériences montrent que l’application de cette technique permet l’amélioration des performances, extraction du parallélisme sans augmenter la taille du code, à la fois dans le cas des applications de flux des donnée et des boucles imbriquées en général. / Applications based on streams, ordered sequences of data values, are important targets of program optimization because of their high computational requirements and the diversity of their application domains: communication, embedded systems, multimedia, etc. One of the most important and difficult problems in special purpose stream language design and implementation is how to schedule these applications in a fine-grain way to exploit available machine resources In this thesis we propose a framework for fine-grain scheduling of streaming applications and nested loops in general. First, we try to pipeline steady state phases (inner loops), by finding the repeated kernel pattern, and executing actor occurrences in parallel as much as possible. Then we merge the kernel prolog and epilog of pipelined phases to move them out of the outer loop. Merging the kernel prolog and epilog means that we shift acotor occurrences, or instructions, from one phase iteration to another and from one outer loop iteration to another, a multidimensional shifting. Experimental shows that our framwork can imporove perfomance, prallelism extraction without increasing the code size, in streaming applications and nested loops in general. Applications de flux de données Boucles imbriquées Pipeline logiciel Ordonnancement multidimentionnel Streaming applications Nested loops Multidimensional scheduling Software Pipelining
5	Spill Code Minimization And Buffer And Code Size Aware Instruction Scheduling Techniques Nagarakatte, Santosh G 08 1900 (has links) Instruction scheduling and Software pipelining are important compilation techniques which reorder instructions in a program to exploit instruction level parallelism. They are essential for enhancing instruction level parallelism in architectures such as very Long Instruction Word and tiled processors. This thesis addresses two important problems in the context of these instruction reordering techniques. The first problem is for general purpose applications and architectures, while the second is for media and graphics applications for tiled and multi-core architectures. The first problem deals with software pipelining which is an instruction scheduling technique that overlaps instructions from multiple iterations. Software pipelining increases the register pressure and hence it may be required to introduce spill instructions. In this thesis, we model the problem of register allocation with optimal spill code generation and scheduling in software pipelined loops as a 0-1 integer linear program. By minimizing the amount of spill code produced, the formulation ensures that the initiation interval (II) between successive iterations of the loop is not increased unnecessarily. Experimental results show that our formulation performs better than the existing heuristics by preventing an increase in the II and also generating less spill code on average among loops extracted from Perfect Club and SPEC benchmarks. The second major contribution of the thesis deals with the code size aware scheduling of stream programs. Large scale synchronous dataflow graphs (SDF’s) and StreamIt have emerged as powerful programming models for high performance streaming applications. In these models, a program is represented as a dataflow graph where each node represents an autonomous filter and the edges represent the channels through which the nodes communicate. In constructing static schedules for programs in these models, it is important to optimize the execution time buffer requirements of the data channel and the space required to store the encoded schedule. Earlier approaches have either given priority to one of the requirements or proposed ad-hoc methods for generating schedules with good trade-offs. In this thesis, we propose a genetic algorithm framework based on non-dominated sorting for generating serial schedules which have good trade-off between code size and buffer requirement. We extend the framework to generate software pipelined schedules for tiled architectures. From our experiments, we observe that the genetic algorithm framework generates schedules with good trade-off and performs better than the earlier approaches. Compilers Parallel Processing Instruction Scheduling Software Pipelining Spill Code Scheduling Stream Programs - Scheduling Genetic Algorithms Integer Linear Programming Software Pipelined Loops Register Allocation Computer Science
6	Suporte especializado de hardware para geração automática de loop pipelining em FPGAS Souza, Guilherme Stefano Silva de 19 November 2014 (has links) Submitted by Daniele Amaral (daniee_ni@hotmail.com) on 2016-09-13T20:06:59Z No. of bitstreams: 1 DissGSSS.pdf: 12761989 bytes, checksum: 9e4c2b4e76a2502af072064ed081eec1 (MD5) / Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-09-15T13:34:53Z (GMT) No. of bitstreams: 1 DissGSSS.pdf: 12761989 bytes, checksum: 9e4c2b4e76a2502af072064ed081eec1 (MD5) / Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-09-15T13:35:23Z (GMT) No. of bitstreams: 1 DissGSSS.pdf: 12761989 bytes, checksum: 9e4c2b4e76a2502af072064ed081eec1 (MD5) / Made available in DSpace on 2016-09-15T13:35:30Z (GMT). No. of bitstreams: 1 DissGSSS.pdf: 12761989 bytes, checksum: 9e4c2b4e76a2502af072064ed081eec1 (MD5) Previous issue date: 2014-11-19 / Não recebi financiamento / Loop pipelining is a technique that may offer significant performance improvements, being employed not only in conventional compilation targeting microprocessors, but also by High Level Synthesis (HLS) tools, targeting heterogeneous architectures and hardware accelerators. This work presents a specialized hardware support aiming at facilitate compilation tasks for HLS tools, along with potential advantages in execution performance and total silicon area employed. Two specialized hardware modules are presented: a queue register file and an instruction predication control module. / O desempenho na execução de programas, que é cada vez mais uma prioridade, pode ter uma melhora significativa por meio do uso de paralelismo em nível de instrução (ILP). Uma técnica que utiliza o ILP e propicia ganhos de desempenho significativos é o loop pipelining, sendo usado não apenas por compiladores para microprocessadores, mas também por ferramentas de Síntese de Alto Nível (HLS), visando arquiteturas heterogêneas e aceleradores de hardware. Neste trabalho é apresentado o projeto e implementação de estruturas de hardware especializadas, objetivando-se em solucionar o problema de sobreposição de valores que ocorre no loop pipelining, facilitar tarefas de compilaçãoo em ferramentas HLS e diminuir a repetição de código. Além disso, ganhos potenciais de desempenho e área de silício total podem ser alcançados como resultado do uso das estruturas propostas. Serão apresentados: um arquivo de registradores baseado em filas e um módulo de controle para a execução de instruções predicadas. Loop Pipelining Software Pipelining Queued Register File Modulo Scheduling Register Files Queue Predicated Instructions Escalonamento de Módulo QRF Arquivos de Registradores Filas Instruções Predicadas
7	Efficient Resource Usage Modelling Ramanan, V Janaki 04 1900 (has links) (PDF) No description available. Compilers Resource Usage Models Group Automation Model Dynamic Collision Matrix Software Pipelining Group Automation Approach Resource Usage Modelling Instruction Scheduling Method Computer Science
8	Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture Valiukas, Tadas 01 July 2014 (has links) Tradicinės x86 architektūros spartinimui artėjant prie galimybių ribos, kompanija Intel pradėjo kurti naują IA-64 architektūrą, paremtą EPIC – išreikštinai lygiagrečiai vykdomomis instrukcijomis vieno takto metu. Ši pagrindinė savybė leidžia vykdyti iki šešių instrukcijų per vieną taktą. Taipogi architektūra pasižymi tokiomis savybėmis, kurios leido efektyviai spręsti su kodo optimizavimu susijusias problemas tradicinėse architektūrose. Tačiau kompiliatorių optimizavimo algoritmai ilgą laiką buvo tobulinami tradicinėse architektūrose, todėl norint išnaudoti naująją architektūrą, reikia ieškoti būdų tobulinti esamus kompiliatorius. Vienas iš būdų – kompiliatoriaus vidinių parametrų atsakingų už optimizacijas reikšmių pritaikymas IA-64. Būtent toks yra šio darbo tikslas, kuriam pasiekti reikia išnagrinėti IA-64 savybes, jas vėliau eksperimentiškai taikyti realaus kodo pavyzdžiuose bei įvertinti jų įtaką kodo vykdymo spartai. Pagal gautus rezultatus nagrinėjami kompiliatoriaus vidiniai parametrai ir su specialia kompiliatorių testavimo programa randamas geriausias reikšmių rinkinys šiai architektūrai. Vėliau šis rinkinys išbandomas su taikomosiomis programomis. Gauto parametrų rinkinio reikšmės turėtų leisti generuoti efektyvesnį kodą IA-64 architektūrai. / After performance optimization of traditional architectures began to reach their limits, Intel corporation started to develop new architecture based on EPIC – Explicitly Parallel Instruction Counting. This main feature allowed up to six instructions to be executed in single CPU cycle. Also this architecture includes more features, which allowed efficient solution of traditional architectures code optimization problems. However for long time code optimization algorithms have been improved for traditional architectures only, as a result those algorithms should be adopted to new architecture. One of the ways to do that – exploration of internal compilers parameters, which are responsible for code optimizations. That is the primary target of this work and in order to reach it the features of the IA-64 architecture and impact to execution performance must be explored using real-life code examples. Tests results may be used later for internal parameters selection and further exploration of these parameters values by using special compiler performance testing benchmarks. The set of those new values could be tested with real life applications in order to prove efficiency of IA-64 architecture features. IA-64 architektūra Itanium Predikacija Išankstinis duomenų užkrovimas Ciklų optimizavimas Valdymo spėjimas IA-64 architecture VLIW – Very Long Instruction Word Predication Control and data speculation Software pipelining Branch prediction Prefetching RSE – rotating register engine Gcc optimization

Search results