211

Path Selection Based Branching for Coarse Grained Reconfigurable Arrays

January 2014 (has links)
abstract: Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Techniques that handle control flow execution in CGRAs generally use predication. Such techniques execute both branches of an if-then-else structure and, based on the result of the conditional, select the outcome of one branch to commit. This results in poor utilization of the CGRA's computational resources. The dual-issue scheme, the state-of-the-art technique for control flow, fetches instructions from both paths of the branch and selects one to execute at runtime based on the result of the conditional; this imposes an overhead in instruction fetch bandwidth. In this thesis, to improve the performance of control flow execution in CGRAs, I propose a solution in which the result of the conditional expression that decides the branch outcome is communicated to the instruction fetch unit, which then selectively issues instructions from the path taken by the branch at run time. Experimental results show that my solution achieves 34.6% better performance and a 52.1% improvement in energy efficiency on average compared to the state-of-the-art dual-issue scheme, without imposing any overhead in instruction fetch bandwidth. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2014
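As an illustration of the predication overhead described above (a plain-C sketch of if-conversion, not code from the thesis): both sides of the branch are computed and only one result is committed, which is the wasted work the proposed fetch-steering scheme avoids.

    /* Sketch of predicated execution of an if-then-else (illustrative only). */
    #include <stdio.h>

    int predicated(int a, int b, int c) {
        int p = (a > 0);   /* conditional evaluated first                  */
        int t = b + c;     /* "then" path, always executed                 */
        int e = b - c;     /* "else" path, always executed (wasted if p)   */
        return p ? t : e;  /* select: only one result is committed         */
    }

    int main(void) {
        printf("%d\n", predicated(1, 5, 3));   /* takes the "then" result: 8 */
        printf("%d\n", predicated(-1, 5, 3));  /* takes the "else" result: 2 */
        return 0;
    }

In the thesis's scheme, the value of p would instead be forwarded to the instruction fetch unit so that only the taken path's instructions are issued at all.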
212

Ferramenta computacional para a definição e geração de estruturas cristalinas / Computational tool for the definition and generation of crystal structures

Ferreira, Roberto de Carvalho 29 August 2012 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The evolution of computers, more specifically the increase in storage and data-processing capacity, has allowed the construction of computational tools for the simulation of physical and chemical phenomena. Practical experiments are thus being replaced, in some cases, by computational experiments that simulate the behavior of the many elements that compose the original one. In this context, we can highlight the models used to simulate phenomena at the atomic scale. The construction of these simulators requires, from developers, the study and definition of accurate and reliable models. This complexity is often reflected in complex simulators that handle only a limited group of structures, sometimes expressed in a fixed manner using a limited set of geometric shapes. This work proposes a computational tool that generates a set of crystal structures, characterized by atoms arranged in a regular spatial organization. The proposed tool consists of a) a programming language, used to describe the structures by defining a crystal system and building objects from characteristic functions and CSG (Constructive Solid Geometry) operators, and b) a compiler/interpreter that analyzes source code written in the language and generates the corresponding object. The tool gives developers a simple mechanism for generating an unrestricted number of structures. Its applicability is demonstrated by incorporating a structure, generated from source code, into the Monte Carlo Spins Engine, a spin simulator developed by the Computer Graphics Group of the Federal University of Juiz de Fora.
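The characteristic-function and CSG combination described above can be sketched in a few lines of C; the lattice, shapes, and names below are illustrative assumptions, not the tool's actual language or API.

    /* Sketch: keep a cubic-lattice site when a CSG combination of
     * characteristic functions accepts the point (illustrative only). */
    #include <stdio.h>

    typedef int (*charfn)(double x, double y, double z);

    static int in_sphere(double x, double y, double z) {
        return x * x + y * y + z * z <= 25.0;   /* sphere of radius 5 */
    }
    static int in_slab(double x, double y, double z) {
        (void)x; (void)y;
        return z >= 0.0;                        /* upper half-space */
    }
    /* CSG intersection of two characteristic functions */
    static int csg_and(charfn a, charfn b, double x, double y, double z) {
        return a(x, y, z) && b(x, y, z);
    }

    int main(void) {
        const double a0 = 1.0;                  /* cubic lattice constant */
        for (int i = -5; i <= 5; i++)
            for (int j = -5; j <= 5; j++)
                for (int k = -5; k <= 5; k++) {
                    double x = i * a0, y = j * a0, z = k * a0;
                    if (csg_and(in_sphere, in_slab, x, y, z))
                        printf("atom %g %g %g\n", x, y, z);  /* emit site */
                }
        return 0;
    }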
213

Técnicas de formação de regiões para projetos de máquinas virtuais eficientes / Region formation techniques for efficient virtual machines design

Zinsly, Raphael Moreira, 1989- 23 August 2018 (has links)
Advisors: Sandro Rigo, Edson Borin / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The complete abstract is available with the full electronic document / Master's / Computer Science / Master in Computer Science
214

Compilador ASN.1 e codificador/decodificador para BER / ASN.1 compiler and encoder/decoder for BER

Restovic Valderrama, Maria Inés 09 March 1992 (has links)
Advisor: Manuel de Jesus Mendes / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: This work presents a tool called "Compilador ASN.1", whose main objective is to provide a concrete representation for the ASN.1 abstract syntax, so that application-protocol PDU specifications, generally written in ASN.1, can be translated to the C language and used computationally. One of the main functions of the presentation layer of a communication protocol is to produce an encoding of these PDU values based on the rules defined by the BER standard. As a second task, the compiler therefore provides the encoding and decoding routines specific to each compiled PDU, using a set of functions from two auxiliary libraries that carry out these conversions / Master's / Automation / Master in Electrical Engineering
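For context on what such generated routines do, here is a minimal sketch of BER encoding for the INTEGER type (tag 0x02). It follows the standard tag-length-value rules with minimal two's-complement content octets, but it is an illustration, not code from the thesis library.

    /* BER-encode a long as INTEGER: tag, short-form length, value octets. */
    #include <stdio.h>

    static size_t ber_encode_int(long v, unsigned char *out) {
        unsigned char tmp[sizeof(long) + 1];
        size_t n = 0;
        long x = v;
        do {                          /* collect little-endian bytes        */
            tmp[n++] = (unsigned char)(x & 0xFF);
            x >>= 8;                  /* arithmetic shift on typical targets */
        } while (x != 0 && x != -1);
        /* add a sign octet if the top bit disagrees with the sign */
        if ((v >= 0 && (tmp[n - 1] & 0x80)) || (v < 0 && !(tmp[n - 1] & 0x80)))
            tmp[n++] = (v >= 0) ? 0x00 : 0xFF;
        out[0] = 0x02;                           /* tag: INTEGER            */
        out[1] = (unsigned char)n;               /* length (short form)     */
        for (size_t i = 0; i < n; i++)           /* value, big-endian       */
            out[2 + i] = tmp[n - 1 - i];
        return 2 + n;
    }

    int main(void) {
        unsigned char buf[16];
        size_t len = ber_encode_int(300, buf);   /* expect: 02 02 01 2C */
        for (size_t i = 0; i < len; i++) printf("%02X ", buf[i]);
        printf("\n");
        return 0;
    }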
215

Mecanismo para execução especulativa de aplicações paralelizadas por técnicas DOPIPE usando replicação de estágios / Mechanism for speculative execution of applications parallelized by DOPIPE techniques using stage replication

Baixo, André Oliveira Loureiro do, 1986- 21 August 2018 (has links)
Advisor: Guido Costa Souza de Araújo / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: Maximal utilization of cores in multicore architectures is key to realizing the potential performance available from modern microprocessors. In order to achieve scalable performance, parallelization techniques rely on carefully tuning speculative architecture support, the runtime environment, and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem. They either require deep (and risky) changes to the existing hardware and cache coherence protocols, or exhibit poor performance scalability for a range of applications. Recent work on DOPIPE-based parallelization techniques (e.g. DSWP) has suggested that the combination of page-based data versioning with software speculation can result in good speed-ups. Although a software-only solution seems very attractive from an industry point of view, it does not enable the whole potential of the microarchitecture in detecting and exploiting parallelism. The addition of cache tags as an enabler for data versioning, as recently announced in the industry, could allow better exploitation of parallelism at the microarchitecture level. In this work we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache-tagging approach for data versioning, which integrates smoothly with typical cache coherence protocols and does not require any changes to them. Experimental results, using SPEC and PARSEC benchmarks, reveal a geometric mean speedup of 21.6x for nine sequential programs on a 24-core simulated CMP, while demonstrating improved scalability compared to a software-only approach / Master's / Computer Science / Master in Computer Science
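A conceptual sketch of per-line version tags (an illustration of the general idea, not the thesis's hardware design): a speculative store forks a tagged copy of the cache line, which is committed or discarded when the speculative stage ends.

    /* Copy-on-write versioning of a cache line, modeled in software. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES 64

    typedef struct {
        uint64_t addr;               /* line address                       */
        uint8_t  data[LINE_BYTES];   /* line contents                      */
        uint32_t version;            /* 0 = committed, >0 = speculative    */
        bool     valid;
    } cache_line;

    /* Speculative write: fork a version-tagged copy instead of updating
     * the committed line in place. */
    static void spec_write(cache_line *committed, cache_line *spec,
                           uint32_t version, size_t off, uint8_t byte) {
        if (!spec->valid) {                  /* first touch: fork the line */
            *spec = *committed;
            spec->version = version;
            spec->valid = true;
        }
        spec->data[off] = byte;
    }

    /* Commit: the speculative version becomes the committed copy;
     * an abort would simply clear spec->valid instead. */
    static void spec_commit(cache_line *committed, cache_line *spec) {
        if (spec->valid) { *committed = *spec; committed->version = 0; }
        spec->valid = false;
    }

    int main(void) {
        cache_line c = { .addr = 0x1000, .version = 0, .valid = true };
        cache_line s = { .valid = false };
        spec_write(&c, &s, 1, 0, 0xAB);      /* speculative store       */
        spec_commit(&c, &s);                 /* stage validated: commit */
        printf("%02X\n", c.data[0]);         /* prints AB               */
        return 0;
    }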
216

CLUSTER AND COLLECT: Compile Time Optimization For Effective Garbage Collection

Ravindar, Archana 05 1900 (has links) (PDF)
No description available.
217

Performance Measurement Of A Java Virtual Machine

Pramod, B S 07 1900 (has links) (PDF)
No description available.
218

Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterogeneous Processors

Prasad, Ashwin 01 1900 (has links) (PDF)
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and also throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control flow dominated regions of a MATLAB program to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this work, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. Our compiler identifies data parallel regions of the program and composes them into kernels. The kernel composition step eliminates a number of intermediate arrays which are otherwise required and also reduces the size of the scheduling and mapping problem the compiler needs to solve subsequently. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically, and the amount of data transfer needed is minimized. A heuristic technique to ensure that memory accesses on the CPU exploit locality and those on the GPU are coalesced is also presented. In order to ensure that data transfers required for dependences across basic blocks are performed, we propose a data flow analysis step and an edge-splitting strategy. Thus our compiler automatically handles kernel composition, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfers. Additionally, we address the problem of identifying what variables can coexist in GPU memory simultaneously under the GPU memory constraints. We formulate this problem as that of identifying maximal cliques in an interference graph. We approximate the interference graph using an interval graph and develop an efficient algorithm to solve the problem. Furthermore, we present two program transformations that optimize memory accesses on the GPU using the software managed scratchpad memory available in GPUs. We have prototyped the proposed compiler using the Octave system. Our experiments using this implementation show a geometric mean speedup of 12X on the GeForce 8800 GTS and 29.2X on the Tesla S1070 over baseline MATLAB execution for data parallel benchmarks. Experiments also reveal that our method provides up to 10X speedup over hand written GPUmat versions of the benchmarks. Our method also provides a speedup of 5.3X on the GeForce 8800 GTS and 13.8X on the Tesla S1070 compared to compiled MATLAB code running on the CPU.
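The interval-graph step can be illustrated with a standard endpoint sweep: for interval graphs, the largest set of arrays that must coexist in memory equals the maximum number of simultaneously live intervals. The C sketch below shows that textbook algorithm with hypothetical lifetimes; it is not MEGHA's implementation.

    /* Max overlap of lifetime intervals via a sorted endpoint sweep. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { int pos; int delta; } event;  /* +1 start, -1 end */

    static int cmp(const void *a, const void *b) {
        const event *x = a, *y = b;
        if (x->pos != y->pos) return x->pos - y->pos;
        return x->delta - y->delta;   /* close intervals before opening */
    }

    /* Returns the maximum number of simultaneously live intervals. */
    static int max_overlap(int (*iv)[2], int n) {
        event *ev = malloc(2 * (size_t)n * sizeof *ev);
        for (int i = 0; i < n; i++) {
            ev[2 * i]     = (event){ iv[i][0], +1 };
            ev[2 * i + 1] = (event){ iv[i][1], -1 };
        }
        qsort(ev, 2 * (size_t)n, sizeof *ev, cmp);
        int live = 0, best = 0;
        for (int i = 0; i < 2 * n; i++) {
            live += ev[i].delta;
            if (live > best) best = live;
        }
        free(ev);
        return best;
    }

    int main(void) {
        /* lifetimes of four hypothetical arrays, as statement ranges */
        int iv[][2] = { {0, 4}, {2, 6}, {5, 9}, {3, 5} };
        printf("max arrays live at once: %d\n", max_overlap(iv, 4));
        return 0;
    }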
219

Linda Implementations Using Monitors and Message Passing

Leveton, Alan L. 01 January 1990 (has links)
Linda is a new parallel programming language built around an interprocess communication model called generative communication. This model differs from previous models in specifying that shared data be added, in tuple form, to an environment called tuple space, where a tuple exists independently until some process chooses to use it. Interesting properties arise from the model, including space and time uncoupling as well as structured naming. We delineate the essential Linda operations, then discuss the properties of generative communication. We are particularly concerned with implementing Linda on top of two traditional parallel programming paradigms: process communication through globally shared memory via monitors, and process communication in local-memory architectures through the use of message-passing constructs. We discuss monitors and message passing, then follow with a description of the two Linda implementations.
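A minimal monitor-style sketch of the two core operations, out() and in(), using POSIX threads follows; it is an illustration only. Real Linda matching uses structured naming over typed tuple fields, not the single integer key assumed here.

    /* out() deposits a tuple; in() blocks until a match exists, removes it. */
    #include <pthread.h>
    #include <stdio.h>

    #define MAX_TUPLES 16
    typedef struct { int key; int value; int used; } tuple;

    static tuple space[MAX_TUPLES];
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;

    static void ts_out(int key, int value) {
        pthread_mutex_lock(&lock);
        for (int i = 0; i < MAX_TUPLES; i++)
            if (!space[i].used) {
                space[i] = (tuple){ key, value, 1 };
                break;
            }
        pthread_cond_broadcast(&changed);   /* wake any blocked in() */
        pthread_mutex_unlock(&lock);
    }

    static int ts_in(int key) {
        pthread_mutex_lock(&lock);
        for (;;) {
            for (int i = 0; i < MAX_TUPLES; i++)
                if (space[i].used && space[i].key == key) {
                    space[i].used = 0;
                    int v = space[i].value;
                    pthread_mutex_unlock(&lock);
                    return v;
                }
            pthread_cond_wait(&changed, &lock);  /* space changed: retry */
        }
    }

    static void *producer(void *arg) {
        (void)arg;
        ts_out(42, 7);                      /* out(42, 7) */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        printf("in(42) -> %d\n", ts_in(42)); /* blocks until producer runs */
        pthread_join(t, NULL);
        return 0;
    }

The monitor discipline (one lock plus a condition variable guarding all tuple-space state) mirrors the shared-memory implementation strategy the abstract describes; a message-passing variant would instead route out/in requests to a tuple-space server process.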
220

Automation in CS1 with the Factoring Problem Generator

Parker, Joshua B. 01 December 2009 (has links) (PDF)
As the field of computer science continues to grow, the number of students enrolled in related programs will grow as well. Though one-on-one tutoring is one of the more effective means of teaching, computer science instructors will have less and less time to devote to individual students. To address this growing concern, many tools that automate parts of an instructor’s job have been proposed. These tools can assist instructors in presenting concepts and grading student work, and they can help students learn to program more effectively. A growing group of intelligent tutoring systems attempts to tie all of this functionality into a single tool that is meant to be used throughout an entire CS course or series of courses. To contribute to this emerging area, the Factoring Problem Generator (FPG) is presented in this work. The FPG creates and grades problems in C in which students search for and extract blocks of repeated code into individual functions, learning to utilize parameters and return values as they do so. The problems created by the FPG are highly configurable by instructors such that the difficulty can be finely tuned to suit students’ individual needs. Instructors can choose whether or not to include arrays, pointers, certain elemental data types, certain operators, or certain kinds of statements, among other things. The FPG is additionally capable of generating a set of test cases for each generated problem. These test cases fully exercise students’ solutions by covering all branches of execution, and they ensure that program functionality does not change as students factor code into functions. Initial experimentation with the system has suggested that the FPG can be integrated into a beginning CS curriculum and with further refinement could become a standard tool in the CS classroom.
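To make the task concrete, here is a hypothetical example of the kind of problem the FPG might generate and the factored solution a student would produce; the specific code is illustrative, not taken from the tool.

    #include <stdio.h>

    /* Before: generated problem containing a duplicated block. */
    int before(void) {
        int a = 3, b = 5, r1, r2;
        r1 = a * a + 2 * a + 1;     /* repeated computation on a */
        r2 = b * b + 2 * b + 1;     /* same computation on b     */
        return r1 + r2;
    }

    /* After: the expected solution factors the block into a function
     * with a parameter and a return value. */
    static int poly(int x) {
        return x * x + 2 * x + 1;   /* the extracted function */
    }

    int after(void) {
        int a = 3, b = 5;
        return poly(a) + poly(b);   /* same behavior, no duplication */
    }

    int main(void) {
        /* a generated test case would check behavior is unchanged */
        printf("%s\n", before() == after() ? "PASS" : "FAIL");
        return 0;
    }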
