Global ETD Search

331	Uma linguagem para especificação de fluxo de execução em aplicações paralelas / A specification language for execution flow in parallel applications. Enomoto, Cristina. 22 August 2005. Advisor: Marco Aurelio Amaral Henriques / Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: Many distributed and parallel systems support only a basic task flow, in which parallel tasks are distributed and their results are then collected. Other systems let an application define dependences among tasks, forming a directed acyclic graph (DAG). Even with this model, however, many applications that could be parallelized cannot be executed, for example genetic algorithms and numerical methods that rely on some form of iterative processing. A new specification model is therefore needed, with more sophisticated flow control that allows iterative processing at the task-management level. This dissertation proposes a specification language for the execution flow of parallel applications that offers more flexible task-flow control, including conditional branches and loops with controlled iterations, and thus supports a broader range of applications. The language is based on XML (eXtensible Markup Language) notation, which gives it simplicity and flexibility. To evaluate these and other characteristics, the language was implemented on the JoiN parallel processing system. Besides enabling the creation and execution of new parallel applications whose task flows contain loops and/or conditional branches, the language proved easy to use and caused no noticeable overhead to the parallel system / Master's / Computer Engineering / Master in Electrical Engineering / Keywords: Parallel programming; Parallel processing; Workflow; Computational grids (Computer systems)
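The abstract does not give the language's actual XML schema. As a rough illustration of why loops and conditional branches matter beyond a DAG, the following Python sketch interprets a tiny XML flow. The element and attribute names (`flow`, `task`, `loop`, `if`, `max`, `until`, `test`) and the task registry are hypothetical, not the syntax of the proposed language or of JoiN.

```python
import xml.etree.ElementTree as ET

# Hypothetical flow spec: NOT the dissertation's schema, only an illustration.
SPEC = """
<flow>
  <task name="init"/>
  <loop max="10" until="converged">
    <task name="evolve"/>
  </loop>
  <if test="converged">
    <task name="report"/>
  </if>
</flow>
"""

def run(node, tasks, conds, state, trace):
    """Walk the XML tree, executing tasks in order and honouring loop/if nodes."""
    for child in node:
        if child.tag == "task":
            name = child.get("name")
            tasks[name](state)          # a real system would dispatch this to workers
            trace.append(name)
        elif child.tag == "loop":
            limit = int(child.get("max"))
            stop = conds[child.get("until")]
            for _ in range(limit):      # bounded iteration: at most `max` rounds
                run(child, tasks, conds, state, trace)
                if stop(state):
                    break
        elif child.tag == "if":
            if conds[child.get("test")](state):
                run(child, tasks, conds, state, trace)
        else:
            raise ValueError(f"unknown element: {child.tag}")

# Hypothetical tasks: a counter standing in for, e.g., one genetic-algorithm generation.
tasks = {
    "init": lambda s: s.update(gen=0),
    "evolve": lambda s: s.update(gen=s["gen"] + 1),
    "report": lambda s: None,
}
conds = {"converged": lambda s: s["gen"] >= 3}

state, trace = {}, []
run(ET.fromstring(SPEC.strip()), tasks, conds, state, trace)
print(trace)
```

Here `evolve` runs three times because the loop stops once the condition holds; a plain DAG could only express this by unrolling the repetition a fixed number of times in advance.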
332	Explorando memoria transacional em software nos contextos de arquiteturas assimetricas, jogos computacionais e consumo de energia / Exploiting software transactional memory in the contexts of asymmetric architectures, computer games, and energy consumption. Baldassin, Alexandro José. 15 August 2018 (issue date: 2009). Advisor: Paulo Cesar Centoducatte / Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação / Abstract: The semiconductor industry's shift toward multicore processors has opened an era in which new languages, methodologies, and tools are of paramount importance for developing efficient concurrent systems quickly, by programmers of all levels. One of the main obstacles in shared-memory programming is the use of synchronization mechanisms to avoid race conditions that could lead the system to an inconsistent state. Synchronization has traditionally been achieved with locks (or variations thereof), which are widely known for their anomalies and for being hard to get right. A new mechanism, transactional memory (TM), has recently been the focus of much research and promises to simplify code synchronization while exposing more parallelism and, therefore, better performance. This thesis presents three works focused on different aspects of software transactional memory (STM) systems. First, we present an STM implementation for asymmetric processors, focusing on the Cell/B.E. architecture. As a key result, we find that memory transactions are indeed promising for asymmetric architectures, especially due to their scalability. Second, we take a different approach to STM implementation by devising a system specifically targeted at computer games, a decision motivated by the poor performance of current STM implementations. A case study using a complex game shows the system's effectiveness. Finally, we present, for the first time, an energy consumption characterization of a state-of-the-art STM. Based on this characterization, we also propose a technique for reducing energy consumption in highly contended scenarios. Our results show that the technique is effective in such cases, yielding energy savings of up to 87% / Doctorate / Computer Systems / Doctor of Computer Science / Keywords: Transactional memory; Parallel programming (Computer science); Computer architecture; Power estimation
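To make the transaction idea concrete, here is a minimal toy STM in Python, assuming a simple design: each transactional variable carries a version number, a transaction buffers its writes and records the versions it read, and commit validates those versions under a single global lock before publishing, retrying on conflict. The names (`TVar`, `Transaction`, `atomically`) are illustrative; this is not the Cell/B.E. STM, the games-oriented system, or the energy-saving technique from the thesis, and it does not guarantee that an in-flight transaction sees a consistent snapshot.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()   # single global commit lock (coarse, for clarity)

class Conflict(Exception):
    pass

class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version observed at first read
        self.writes = {}   # TVar -> buffered new value

    def read(self, tv):
        if tv in self.writes:            # read-your-own-writes
            return self.writes[tv]
        if tv not in self.reads:
            self.reads[tv] = tv.version  # record version before reading the value
        return tv.value

    def write(self, tv, value):
        self.writes[tv] = value          # buffered until commit

    def commit(self):
        with _commit_lock:
            for tv, seen in self.reads.items():
                if tv.version != seen:   # another transaction committed since we read it
                    raise Conflict
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1

def atomically(fn):
    """Run fn(tx) as a transaction, retrying from scratch on conflict."""
    while True:
        tx = Transaction()
        result = fn(tx)
        try:
            tx.commit()
            return result
        except Conflict:
            continue                     # a contention manager (e.g. backoff) would act here

counter = TVar(0)

def increment(tx):
    tx.write(counter, tx.read(counter) + 1)

def worker(n):
    for _ in range(n):
        atomically(increment)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

No lock appears in `increment`; correctness comes from commit-time validation. Under high contention the retry loop is where wasted work (and energy) accumulates, which is the scenario the thesis targets, though the abstract does not say how its technique works.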
333	Implementação computacional paralela da homogeneização por expansão assintótica para análise de problemas mecânicos em 3D / Parallel computational implementation of asymptotic expansion homogenization for the analysis of 3D mechanical problems. Quintela, Bárbara de Melo. 31 January 2011 / Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais) / Abstract: Asymptotic Expansion Homogenization (AEH) is a multiscale technique for estimating the effective properties of heterogeneous media with a periodic structure. Its main advantages are the reduction of the size of the problem to be solved and the ability to use a homogenized property that retains information about the heterogeneous microstructure. When combined with the Finite Element Method (FEM), AEH requires periodic boundary conditions, which must be taken into account when generating the FE meshes. These models represent periodic cells, which are representative volumes of the heterogeneous medium and in some cases have a geometric and physical complexity that demands highly refined meshes, leading to a significant computational cost. The aim of this work is to develop a parallel finite element program that applies AEH to estimate the elastic properties of 3D bodies. For the 2D program, a sequential version in C and two parallel versions, using OpenMP and CUDA, were implemented. For 3D, a sequential version of the program, called HEA3D, was implemented in FORTRAN, along with a parallel version using OpenMP. The codes were validated on two-phase periodic cells by comparing the numerical results with numerical and experimental data available in the literature, showing good agreement. The parallel version achieved significant speedups, up to 5.3 times over its sequential counterpart. / Subject area (CNPq): Exact and Earth Sciences / Keywords: Multiscale modelling; Finite elements; Homogenization; Parallel programming
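The 3D AEH/FEM code is not reproduced in the abstract. In one dimension, however, asymptotic expansion homogenization of a periodic layered bar has a well-known closed form: the effective modulus is the volume-weighted harmonic mean of the phase moduli, E* = 1 / Σ(f_i / E_i). The Python sketch below only evaluates that 1D formula with made-up phase values; it is not a substitute for the 3D periodic-cell analysis described above.

```python
def effective_modulus_1d(phases):
    """Homogenized Young's modulus of a 1D periodic laminate loaded along its axis.

    phases: list of (volume_fraction, modulus) pairs; fractions must sum to 1.
    In 1D, AEH yields the harmonic mean: E* = 1 / sum(f_i / E_i).
    """
    total = sum(f for f, _ in phases)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return 1.0 / sum(f / e for f, e in phases)

# Hypothetical two-phase cell: 50% soft phase (E = 1) and 50% stiff phase (E = 4), arbitrary units.
e_star = effective_modulus_1d([(0.5, 1.0), (0.5, 4.0)])
print(e_star)  # 1.6
```

The result (1.6) is well below the naive rule-of-mixtures average (2.5), which illustrates why a homogenized property must retain information about how the phases are arranged in the microstructure.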
334	Mapas de símbolos proporcionais / Proportional symbol maps. Kunigami, Guilherme, 1986-. 09 May 2011. Advisors: Pedro Jussieu de Rezende, Cid Carvalho de Souza / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação / Abstract: In this dissertation, we present an extensive study, through integer linear programming, of a class of problems involving proportional symbol maps. Proportional symbol maps are a cartographic tool for representing events associated with specific values and geographic locations. Classic examples include maps of earthquakes and of city populations. Because of the size and proximity of the symbols, they may overlap, and the decision about which symbols are placed above others then affects how visible each symbol is in the drawing. The problems we address restrict symbols to opaque disks and consist of deciding the order in which overlapping disks are stacked, so as to maximize metrics related to the visual quality of the map; they are therefore combinatorial optimization problems. We designed integer linear programming models for two of these problems, one proven to be NP-hard and the other of still unknown complexity. Through polyhedral combinatorics, we obtained theoretical results about these models that allowed us to add several facet-defining inequalities to them. We also developed preprocessing techniques that break the input instances down into a large number of smaller components. These techniques led, for the first time, to optimal solutions for several instances created from real-world data. Furthermore, we describe work, to which we also contributed, that addresses one of these problems with a GRASP heuristic / Master's / Computer Science / Master of Computer Science / Keywords: Integer programming; Computational geometry; Cartography; Metaheuristic; Parallel programming
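The ILP models themselves are not given in the abstract. To make the ordering problem concrete, the Python sketch below brute-forces a max-min variant on a toy instance: for every stacking order it estimates, by sampling points on each circle's boundary, the fraction of that boundary left visible, then keeps the order whose least visible circle is best off. Using the visible *fraction* as the metric, boundary sampling, and exhaustive search are all illustrative assumptions, feasible only for a handful of disks; they are not the dissertation's method.

```python
import math
from itertools import permutations

def visible_fraction(i, order, circles, samples=360):
    """Fraction of circle i's boundary not covered by circles drawn above it."""
    x, y, r = circles[i]
    above = order[order.index(i) + 1:]          # order is bottom-to-top
    visible = 0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        px, py = x + r * math.cos(t), y + r * math.sin(t)
        covered = any(math.hypot(px - cx, py - cy) < cr
                      for cx, cy, cr in (circles[j] for j in above))
        visible += not covered
    return visible / samples

def best_order_max_min(circles):
    """Exhaustively pick the bottom-to-top order maximizing the minimum visible fraction."""
    def score(order):
        return min(visible_fraction(i, order, circles) for i in order)
    return max(permutations(range(len(circles))), key=score)

# Toy instance (x, y, radius): a large disk and a small one overlapping its edge.
circles = [(0.0, 0.0, 2.0), (2.0, 0.0, 1.0)]
best = best_order_max_min(circles)
print(best)  # (0, 1): large disk at the bottom, small disk on top
```

With n disks this enumerates n! orders, which is why exact approaches such as the ILP models and the decomposition into small components described above are needed for real maps.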
335	Performance Prediction of Parallel Programs in a Linux Environment. Farooq, Mohammad Habibur Rahman & Qaisar. January 2010. Context. Today's parallel systems are widely used for many computational tasks. Developing parallel programs that make full use of the computing power of parallel systems is tricky, and tuning them efficiently is often very hard. Objectives. In this study we present VPPB, a performance prediction and visualization tool for a Linux environment, which had previously been introduced by Broberg et al. [1] for a Solaris 2.x environment. VPPB shows the predicted behavior of a multithreaded program on any number of processors, displayed in two different graphs. The prediction is based on a monitored uniprocessor execution. Methods. An experimental evaluation was carried out to validate the prediction reliability of the developed tool. Results. The predictions were validated on an Intel multiprocessor with 8 processors, using application programs from the PARSEC 2.0 benchmark suite. The validation shows that the predicted speedups are within ±7% of real executions. Conclusions. The experiments showed that VPPB's predictions are reliable and that the overhead it adds to the application programs is low. / Keywords: VPPB; Pthreads; parallel programming; performance tuning; performance prediction; distributed memory system; shared memory system; Computer Sciences
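VPPB's simulation model is not detailed in the abstract. As a much simplified illustration of predicting multiprocessor behavior from a uniprocessor run, the Python sketch below assumes the monitored run yields independent per-thread CPU times, ignores synchronization between threads (which a real tool must account for), and predicts the makespan on p processors with greedy list scheduling. All function names and inputs are hypothetical.

```python
import heapq

def predict_makespan(thread_times, p):
    """Greedy list scheduling: each thread starts on the processor that becomes free first."""
    free_at = [0.0] * p                  # time at which each processor becomes idle
    heapq.heapify(free_at)
    for t in thread_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)

def predicted_speedup(thread_times, p):
    serial = sum(thread_times)           # uniprocessor time, ignoring scheduling overhead
    return serial / predict_makespan(thread_times, p)

def relative_error(predicted, measured):
    """Signed prediction error; 'within ±7%' means abs(...) <= 0.07."""
    return (predicted - measured) / measured

# Hypothetical per-thread CPU times (seconds) recorded on one processor.
times = [4, 4, 4, 4]
print(predicted_speedup(times, 2))       # 2.0
```

Comparing a prediction such as `predicted_speedup(times, 8)` against a measured speedup with `relative_error` is the kind of check the ±7% figure summarizes.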
336	Abstraction fonctionnelle pour la programmation d'architecture multi-niveaux : formalisation et implantation / Functional abstraction for programming multi-level architectures: formalisation and implementation. Allombert, Victor. 07 July 2017. Abstract: From personal computers with tens of cores to supercomputers with millions of computing units, parallel architectures are now the standard. Modern high-performance architectures are usually described as hierarchical, since they are composed of clusters of multiprocessors, which are themselves made of multicore processors. Programming such architectures is notoriously difficult: writing parallel programs is usually hard in both the algorithmic and the implementation phases. To address these concerns, many structured models and languages have been proposed to improve both expressiveness and efficiency. Among them, Multi-BSP is a bridging model dedicated to hierarchical architectures that ensures efficiency, execution safety, scalability, and cost prediction. It extends the well-known BSP model, which targets flat architectures. In this thesis we introduce the Multi-ML language, which allows Multi-BSP algorithms to be programmed "à la ML" and thus guarantees both the properties of the Multi-BSP model and execution safety, thanks to an ML type system. To handle the multi-level execution model of Multi-ML, we define formal semantics that describe the valid evaluation of an expression. To ensure the execution safety of Multi-ML programs, we also propose a type system that preserves replicated coherence. An abstract machine is defined to formally describe the evaluation of a Multi-ML program on a Multi-BSP architecture. An implementation of the language, developed as part of this thesis, is available as a compilation toolchain. It is thus possible to generate efficient parallel code from a program written in Multi-ML and execute it on various hierarchical machines / Keywords: Parallel programming; Multi-BSP; ML; Programming language; Parallel execution safety
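For the flat BSP model that Multi-BSP extends, the standard cost of one superstep is w + h·g + L, where w is the maximum local work on any processor, h is the maximum number of words any processor sends or receives, g is the per-word communication cost, and L is the barrier synchronization cost; a program's cost is the sum over its supersteps. The Python sketch below only evaluates that sum with made-up parameters. Multi-BSP applies analogous parameters at each level of the hierarchy, which this flat sketch does not model, and nothing here reflects Multi-ML's syntax.

```python
from dataclasses import dataclass

@dataclass
class Superstep:
    w: float   # maximum local computation over all processors
    h: int     # maximum words sent or received by any processor (h-relation)

def bsp_cost(supersteps, g, L):
    """Total flat-BSP cost: sum over supersteps of w + h*g + L."""
    return sum(s.w + s.h * g + L for s in supersteps)

# Hypothetical program: two supersteps on a machine with g = 4 and L = 100 (time units).
program = [Superstep(w=1000, h=50), Superstep(w=400, h=10)]
print(bsp_cost(program, g=4, L=100))  # 1300 + 540 = 1840
```

Because the cost is a closed-form function of a few machine parameters, it can be evaluated before running the program, which is what "cost prediction" refers to in the abstract.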
337	H.264 Baseline Real-time High Definition Encoder on CELL. Wei, Zhengzhe. January 2010. In this thesis, an H.264 baseline high-definition encoder is implemented on the CELL processor. The target video format of our encoder is YUV420 1080p at 30 frames per second. To meet real-time requirements, a system architecture that reduces DMA requests was designed for large memory accesses. Several key computing kernels (intra-frame encoding, motion estimation search, and entropy coding) were designed and ported to the CELL processor units. The main challenge is finding a good tradeoff between DMA latency and processing time. The SPE's limited 256 KB of on-chip memory has to be organized efficiently for SIMD processing. CAVLC is performed in non-real time on the PPE. The experimental results show that our encoder can encode I frames in high quality and encode common 1080p video sequences in real time. Using five SPEs and 63 KB of executable code, 20.72M cycles are needed for one SPE to encode its partition of a P frame. The average PSNR of P frames increases by up to 1.52%. For fast-motion video sequences, a 64x64 search range yields better frame quality than a 16x16 search range while requiring less than twice the computing cycles. Our results also show that more of the CELL processor's potential can be exploited in multimedia computing. The H.264 main profile will be implemented in future phases of this encoder project. Since our platform is the IBM Full-System Simulator, DMA performance on a real CELL processor remains an open issue. Real-time entropy coding is another challenge on CELL. / Keywords: Video coding; H.264; CELL processor; Real-time coding; Intra prediction; Parallel programming; SIMD; Computer Engineering
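The encoder's motion-estimation kernel is not shown in the abstract. The Python sketch below illustrates the basic full-search block matching that a search range refers to: for one block of the current frame, it scans every displacement within ±radius pixels in the reference frame and keeps the one with the smallest sum of absolute differences (SAD). Frame contents, block size, and function names are illustrative assumptions; per the abstract, a CELL implementation must additionally bring this data into the 256 KB SPE local store by DMA and process it in SIMD fashion.

```python
def sad(cur, ref, by, bx, ry, rx, n):
    """Sum of absolute differences between n x n blocks of cur at (by, bx) and ref at (ry, rx)."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, by, bx, n, radius):
    """Return ((dy, dx), best_sad) over all displacements within +/-radius that stay inside ref."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(cur, ref, by, bx, ry, rx, n)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best

# Synthetic 16x16 reference frame with a non-repeating pattern (values 0..255).
H = W = 16
ref = [[(31 * y + 17 * x) % 256 for x in range(W)] for y in range(H)]
# Current frame: content at ref[y + 2][x - 1] appears at cur[y][x] (clamped at the borders).
cur = [[ref[min(y + 2, H - 1)][max(x - 1, 0)] for x in range(W)] for y in range(H)]
result = full_search(cur, ref, by=6, bx=6, n=4, radius=3)
print(result)  # ((2, -1), 0)
```

The work grows with the square of the search radius, which is why the choice between a 16x16 and a 64x64 search range is a tradeoff between quality and cycles.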
338	Paralelní zpracování velkých objemů astronomických dat / Parallel Processing of Huge Astronomical Data. Haas, František. January 2016. This master's thesis focuses on the analysis and implementation of the Random Forests algorithm, a machine learning algorithm for data classification. The goal of the thesis is to implement Random Forests using parallel programming techniques and technologies for CPU and GPGPU, together with a reference serial implementation for CPU. The functional and performance attributes of these implementations are compared and evaluated. Various data sets are used for the comparison, with emphasis on real-world data obtained from astronomical observations of stellar spectra. The usefulness of these implementations for stellar spectra classification is assessed from both the functional and the performance points of view.
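Random Forests parallelize naturally because the trees are trained independently. The Python sketch below shows that structure with deliberately simplified trees (depth-1 "stumps"), bootstrap sampling, a random feature subset per tree, and majority voting, training trees concurrently with `concurrent.futures.ThreadPoolExecutor`. It is an illustration under those assumptions, not the thesis's CPU or GPGPU code; in CPython, threads will not speed up this pure-Python loop because of the GIL, so a process pool or native code would be needed for real gains.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(X, y, seed, max_features):
    """Fit a depth-1 tree on a bootstrap sample, searching only a random feature subset."""
    rng = random.Random(seed)                             # per-tree RNG keeps training reproducible
    idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: sample with replacement
    features = rng.sample(range(len(X[0])), max_features)
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in idx}):
            left = [y[i] for i in idx if X[i][f] <= t]    # never empty: t is a sampled value
            right = [y[i] for i in idx if X[i][f] > t]
            left_label = majority(left)
            right_label = majority(right) if right else left_label
            err = sum((left_label if X[i][f] <= t else right_label) != y[i] for i in idx)
            if best is None or err < best[0]:
                best = (err, f, t, left_label, right_label)
    return best[1:]                                       # (feature, threshold, left, right)

def train_forest(X, y, n_trees=15, max_features=1, workers=4):
    """Train independent stumps concurrently; each tree gets its own seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: train_stump(X, y, s, max_features), range(n_trees)))

def predict(forest, x):
    """Majority vote over all stumps."""
    return majority([left if x[f] <= t else right for f, t, left, right in forest])

# Toy one-feature data set with two well-separated classes.
X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(X, y)
print(predict(forest, [0]), predict(forest, [13]))  # 0 1
```

For a stump, drawing the feature subset once per tree is the same as drawing it per split; full-depth trees would redraw it at every node.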
339	Investigating tools and techniques for improving software performance on multiprocessor computer systems. Tristram, Waide Barrington. January 2012. The availability of modern commodity multicore processors and multiprocessor computer systems has led to the widespread adoption of parallel computers in a variety of environments, from the home to, in particular, workstations and servers. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model, and the variety of tools and parallel programming models available further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to help novice parallel programmers develop efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, briefly surveying commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions with each. A quantitative research methodology was applied to investigate the performance and programming effort of the selected models and libraries: automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). The performance of the GNU C/C++ and Intel C/C++ compilers was also examined. The results revealed that the choice of parallel programming model or library depends on the type of problem being solved and that no single choice is best for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort while providing performance similar to explicit threading models. The principal conclusion was that problem analysis and parallel design are important factors in selecting the parallel programming model and tools, but that models with higher levels of abstraction, such as OpenMP and Threading Building Blocks, are favoured. / Keywords: Multiprocessors; Multiprogramming (Electronic computers); Parallel programming (Computer science); Linux; Abstract data types (Computer science); Threads (Computer programs); Computer programming
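The thesis compares C/C++ models, none of which are shown here. As a language-neutral illustration of the effort gap it reports, the Python sketch below computes the same parallel sum twice: once with explicit threads, manual chunking, and a lock-protected shared accumulator (in the spirit of Pthreads), and once with a thread pool's `map`, which hides scheduling and result collection (closer in spirit to an OpenMP `reduction` clause or TBB's `parallel_reduce`). Python threads are used only to show program structure; because of the GIL they will not speed up this CPU-bound loop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def partial_sum(data, lo, hi):
    return sum(data[lo:hi])

def chunks(n, parts):
    """Split range(n) into at most `parts` contiguous (lo, hi) pieces."""
    step = max(1, (n + parts - 1) // parts)
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]

def sum_explicit_threads(data, nthreads=4):
    """Explicit threading: partition, create, synchronize, and join by hand."""
    total = 0
    lock = threading.Lock()
    def worker(lo, hi):
        nonlocal total
        s = partial_sum(data, lo, hi)
        with lock:                       # protect the shared accumulator
            total += s
    threads = [threading.Thread(target=worker, args=c) for c in chunks(len(data), nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_thread_pool(data, nthreads=4):
    """Higher-level abstraction: the pool schedules the tasks and collects the results."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return sum(pool.map(lambda c: partial_sum(data, *c), chunks(len(data), nthreads)))

data = list(range(1_000_000))
print(sum_explicit_threads(data), sum_thread_pool(data))  # 499999500000 499999500000
```

Both versions compute the same result; the second simply has fewer places to get synchronization wrong, which mirrors the reported finding that higher-level models require less programming effort.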
340	Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler / Idiomdriven vektorisering av inre loopar med databeroenden i HotSpots C2 kompilator. Sjöblom, William. January 2020. This thesis presents a technique for automatically vectorizing innermost single-statement loops that have a cross-iteration data dependence, by analyzing data flow to recognize frequently recurring program idioms. Recognition works by matching the circular SSA data flow around the loop body's φ-function against several primitive patterns, forming a tree representation of the relevant data flow. This tree is then pruned down to a single parameterized node, which gives a high-level specification of the data-flow idiom at hand and guides the algorithmic replacement applied to the intermediate representation. The versatility of the technique is shown by an implementation that supports vectorization of both a limited class of linear recurrences and prefix sums; the latter shows how the technique generalizes to intermediate representations with memory state in SSA form. Finally, a thorough performance evaluation shows the effectiveness of the vectorization technique. / Keywords: compiler; vectorization; SIMD; Java; HotSpot; code optimization; reductions; prefix sums; parallel programming; data-level parallelism; Computer Sciences
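The C2 implementation itself is not shown in the abstract. The Python sketch below only illustrates why a prefix sum, whose scalar loop `a[i] = a[i-1] + b[i]` carries a dependence from each iteration to the next, can still be vectorized: within a vector of width W, log2(W) shift-and-add steps (a Hillis-Steele style in-register scan) compute all W partial sums, and a scalar carry links consecutive vectors. Python lists stand in for SIMD registers; the width and function names are assumptions for illustration.

```python
from itertools import accumulate

def scalar_prefix_sum(b):
    """The original loop: each iteration depends on the previous one."""
    out, acc = [], 0
    for x in b:
        acc += x
        out.append(acc)
    return out

def vector_scan(lane):
    """In-register inclusive scan of one vector via log2(W) shift-and-add steps."""
    v = list(lane)
    shift = 1
    while shift < len(v):
        shifted = [0] * shift + v[:-shift]   # shift lanes toward higher indices, zero-filled
        v = [a + s for a, s in zip(v, shifted)]
        shift *= 2
    return v

def vectorized_prefix_sum(b, width=8):
    """Process `width` elements at a time, carrying the running total between vectors."""
    out, carry = [], 0
    for start in range(0, len(b), width):
        scanned = vector_scan(b[start:start + width])
        out.extend(x + carry for x in scanned)   # broadcast-add the carry to every lane
        carry = out[-1]
    return out

b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(vectorized_prefix_sum(b) == scalar_prefix_sum(b) == list(accumulate(b)))  # True
```

The dependence does not disappear; it is confined to one scalar carry per vector instead of one per element, which is the kind of idiom-specific replacement a compiler can only apply once it has recognized the loop as a prefix sum.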
