Global ETD Search

91	Linguagem e compilador para o paradigma orientado a notificações (PON): avanços e comparações Ferreira, Cleverson Avelino 28 August 2015 (has links) Atuais paradigmas correntes de programação de software, mais precisamente o Paradigma Imperativo (PI) e o Paradigma Declarativo (PD), apresentam deficiências que afetam o desempenho das aplicações e a obtenção de “desacoplamento” (ou acoplamento mínimo) entre elementos de software. Com o objetivo de amenizar essas deficiências, foi desenvolvido o Paradigma Orientado a Notificações (PON). O PON se inspira nos conceitos do PI (e.g. objetos) e do PD (e.g. base de fatos e regras), mas altera a essência da execução ou inferência lógica-causal. Basicamente, o PON usa objetos para tratar de fatos e regras na forma de composições de outros objetos menores que, entretanto, apresentam características comportamentais de certa autonomia, independência, reatividade e colaboração por meio de notificações pontuais para fins de inferência. Isto dito, salienta-se que a materialização dos conceitos do PON se deu por meio de um arquétipo ou Framework elaborado em linguagem de programação C++. Tal materialização do PON vem sendo utilizada como uma alternativa para o desenvolvimento de aplicações sob o domínio desse paradigma e possibilitou, de fato, a criação de aplicações para ambientes computacionais usuais baseados na chamada arquitetura Von Neumann. Apesar destas contribuições para com a sua materialização, o desenvolvimento de aplicações no PON ainda não apresentava resultados satisfatórios em termos de desempenho tal qual deveria a luz do seu cálculo assintótico, nem a facilidade de programação que seria uma das suas características principais. Nesse âmbito, o presente trabalho propõe como evolução para o estado da técnica do PON a criação de uma linguagem e compilador para o paradigma. Sendo assim, este trabalho apresenta a definição da linguagem criada com a utilização de exemplos práticos guiados pelo desenvolvimento de aplicações. Subsequentemente são apresentados detalhes do compilador bem como sua estrutura. Para demonstrar a evolução do estado da técnica do paradigma, no tocante a desempenho (e.g. tempo de processamento) e facilidade de programação foram realizados estudos comparativos com a utilização da linguagem e compilador. Os estudos comparativos foram guiados com a elaboração de dois softwares denominados aplicação Mira ao Alvo e aplicação de Vendas. Essas aplicações foram desenvolvidas com base na linguagem PON e foram realizados experimentos simulando sequências de execução com o intuito de avaliar o tempo de processamento para o resultado gerado pelo compilador PON. Ainda, tais experimentos possibilitaram a avaliação de maneira subjetiva da linguagem de programação PON no tocante a facilidade de programação. Deste modo, foi possível observar com tais estudos comparativos que os resultados apresentados pelo compilador PON foram satisfatórios quando comparados aos resultados obtidos pelo Framework e por aplicações equivalentes desenvolvidas baseadas no Paradigma Orientado a Objetos (POO). / The current software development paradigms, specifically the Imperative Paradigm (IP) and the Declarative Paradigm (DP), have weaknesses that affect the applications performance and decoupling (or minimal coupling) between the software modules. In order to provide a solution regarding these weaknesses, the Notification Oriented Paradigm (NOP) was developed. NOP is inspired by the concepts of the IP (e.g. objects) and DP (e.g. base of facts and Rules). Basically, NOP uses objects to deal with facts and Rules as compositions of other, smaller, objects. These objects have the following behavioral characteristics: autonomy, independence, responsiveness and collaboration through notifications. Thus, it’s highlighted that the realization of these concepts was firstly instantiated through a Framework developed in C++. Such NOP materialization has been used as an alternative for Application development in the domain of this paradigm and made possible, in fact, the creation of applications for typical computing environments based on Von Neumann architecture. The development of the C++ materialization of NOP has not presented satisfactory results in terms of performance as it should when taking into account its asymptotic calculation and programming facility. In this context, this work presents an evolution of NOP by creating a specific programming language, and its respective compiler, for this paradigm. Therefore, this work presents the language definition and the details of the development of its compiler. To evaluate the evolution regarding to performance (e.g. processing time) and programming facility, some comparative studies using the NOP language and compiler are presented. These comparative studies were performed by developing two software applications called Target and Sales Application. These applications have been developed based on NOP language, and the experiments were performed simulating sequences of execution in order to evaluate the processing time for the generated results by NOP compiler. Still, these experiments allowed the evaluation of NOP programming language, in a subjective way, regarding to ease programming. Thus, with such comparative studies, it was possible to observe that the results presented by the compiler NOP were satisfactory when compared to the results achieved via Framework and for equivalent applications developed based on the Oriented Object Paradigm (OOP). Compiladores (Programas de computador) Software - Desenvolvimento Métodos de simulação Computação Compilers (Computer programs) Computer software - Development C++ (Computer program language) Simulation methods Computer science
92	Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm Alle, Mythri 12 1900 (has links) (PDF) Coarse-Grained Reconfigurable Architectures(CGRAs) can be employed for accelerating computational workloads that demand both flexibility and performance. CGRAs comprise a set of computation elements interconnected using a network and this interconnection of computation elements is referred to as a reconfigurable fabric. The size of application that can be accommodated on the reconfigurable fabric is limited by the size of instruction buffers associated with each Compute element. When an application cannot be accommodated entirely, application is partitioned such that each of these partitions can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator. The orchestrator employs dynamic dataflow execution paradigm. Dynamic dataflow execution paradigm has inherent support for synchronization and helps in exploitation of parallelism that exists across application partitions. In this thesis, we present a compiler that targets such CGRAs. The compiler presented in this thesis is capable of accepting applications specified in C89 standard. To enable architectural design space exploration, the compiler is designed such that it can be customized for several instances of CGRAs employing dataflow execution paradigm at the orchestrator. This can be achieved by specifying the appropriate configuration parameters to the compiler. The focus of this thesis is to provide efficient support for various kinds of parallelism while ensuring correctness. The compiler is designed to support fine-grained task level parallelism that exists across iterations of loops and function calls. Additionally, compiler can also support pipeline parallelism, where a loop is split into multiple stages that execute in a pipelined manner. The prototype compiler, which targets multiple instances of a CGRA, is demonstrated in this thesis. We used this compiler to target multiple variants of CGRAs employing dataflow execution paradigm. We varied the reconfigur-able fabric, orchestration mechanism employed, size of instruction buffers. We also choose applications from two different domains viz. cryptography and linear algebra. The execution time of the CGRA (the best among all instances) is compared against an Intel Quad core processor. Cryptography applications show a performance improvement ranging from more than one order of magnitude to close to two orders of magnitude. These applications have large amounts of ILP and our compiler could successfully expose the ILP available in these applications. Further, the domain customization also played an important role in achieving good performance. We employed two custom functional units for accelerating Cryptography applications and compiler could efficiently use them. In linear algebra kernels we observe multiple iterations of the loop executing in parallel, effectively exploiting loop-level parallelism at runtime. Inspite of this we notice close to an order of magnitude performance degradation. The reason for this degradation can be attributed to the use of non-pipelined floating point units, and the delays involved in accessing memory. Pipeline parallelism was demonstrated using this compiler for FFT and QR factorization. Thus, the compiler is capable of efficiently supporting different kinds of parallelism and can support complete C89 standard. Further, the compiler can also support different instances of CGRAs employing dataflow execution paradigm. Reconfigurable Fabric Dataflow Execution Compilers (Computer Programs) Computer Architecture Reconfigurable Architectures Run Time Reconfigurable Platform Runtime Reconfigurable Platform Runtime Reconfigurable Hardware Coarse Grained Computation Computer Science
93	Tiling Stencil Computations To Maximize Parallelism Bandishti, Vinayaka Prakasha 12 1900 (has links) (PDF) Stencil computations are iterative kernels often used to simulate the change in a discretized spatial domain overtime (e.g., computational fluid dynamics) or to solve for unknowns in a discretized space by converging to a steady state (i.e., partial differential equations).They are commonly found in many scientific and engineering applications. Most stencil computations allow tile-wise concurrent start ,i.e., there exists a face of the iteration space and a set of tiling hyper planes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. Loop tiling is a key transformation used to exploit both data locality and parallelism from stencils simultaneously. Numerous works exist that target improving locality, controlling frequency of synchronization, and volume of communication wherever applicable. But, concurrent start-up of tiles that evidently translates into perfect load balance and often reduction in frequency of synchronization is completely ignored. Existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We ﬁrst provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then discuss an iterative approach to find such hyperplanes. It is not possible to directly apply automatic tiling techniques to periodic stencils because of the wrap-around dependences in them. To overcome this, we use iteration space folding techniques as a pre-processing stage after which our technique can be applied without any further change. We have implemented our techniques on top of Pluto-a source-level automatic parallelizer. Experimental evaluation on a 12-core Intel Westmere shows that our code is able to outperform a tuned domain-speciﬁc stencil code generator by 4% to2 x, and previous compiler techniques by a factor of 1.5x to 15x. For the swim benchmark from SPECFP2000, we achieve an .improvement of 5.12 x on a 12-core Intel Westmere and 2.5x on a 16-core AMD Magny-Cours machines, over the auto-parallelizer of Intel C Compiler. Stencil Computations Concurrent Start-Up Tiling Hyperplanes Periodic Stencils Compilers (Computer Programs) Multiprocessors Computer Architecture Parallelism (Computer Architecture) Tiling Stencil Computations Automatic Parallelizers Computer Science

Page generated in 0.0813 seconds