291

Instructional footprinting: a basis for exploiting concurrency through instructional decomposition and code motion

Landry, Kenneth D. 06 June 2008 (has links)
In many languages, the programmer can communicate with other, separate, independent processes through function calls. Such a call can be as simple as a service request made to the operating system or as advanced as the tuple-space operations of a Linda programming system. The problem with such calls, however, is that they block while waiting for data or information to be returned. This synchronous behaviour, and the concurrency it forfeits, can be avoided by initiating a non-blocking request for the data earlier in the code and retrieving the result later, when it is needed. To facilitate a better understanding of how this type of concurrency can be exploited, we introduce an instructional footprint model and application framework that formally describes instructional decomposition and code motion. To demonstrate the effectiveness of the approach, we apply instructional footprinting to programs written in the Linda coordination language. Linda Primitive Transposition (LPT) and Instruction Piggybacking are discussed as techniques that increase the size of instructional footprints and thereby improve the performance of Linda programs. We also present the concept of Lexical Proximity to demonstrate how the overlapping of footprints contributes to the speedup of Linda programs. / Ph. D.
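The transformation at the heart of this idea, splitting a blocking call into an early non-blocking request and a later retrieval, can be sketched with standard C++ futures. This is an illustrative analogy, not the thesis's Linda machinery; fetch_value stands in for any blocking service call.

```cpp
#include <future>
#include <iostream>

// Hypothetical blocking service call (e.g., a tuple-space "in" or an OS request).
int fetch_value() { return 42; }

int main() {
    // Before footprinting, request and use are adjacent, so the caller blocks:
    //   int v = fetch_value();
    // After decomposition and code motion, the request is issued early...
    std::future<int> pending = std::async(std::launch::async, fetch_value);

    // ...independent work between the request and its first use runs meanwhile
    // (loosely, this span is the "instructional footprint")...
    long acc = 0;
    for (int i = 0; i < 1000000; ++i) acc += i;

    // ...and the caller synchronizes only when the value is actually needed.
    int v = pending.get();
    std::cout << acc + v << "\n";
}
```

The further the issue point can legally migrate from the `get()`, the more latency is hidden behind useful work.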
292

Performance comparison of software tools for distributed computations

Peng, Yun 01 January 1999 (has links)
No description available.
293

An Analysis of an Interrupt-Driven Implementation of the Master-Worker Model with Application-Specific Coprocessors

Hickman, Joseph 17 January 2008 (has links)
In this thesis, we present a versatile parallel programming model composed of an individual general-purpose processor aided by several application-specific coprocessors. These computing units operate under a simplification of the master-worker model. The user-defined coprocessors may be either homogeneous or heterogeneous. We analyze system performance with regard to system size and task granularity, and we present experimental results to determine the optimal operating conditions. Finally, we consider the suitability of this approach for scientific simulations — specifically for use in agent-based models of biological systems. / Master of Science
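The simplified master-worker control flow described above can be sketched in software. In this hedged analogy, plain threads stand in for the application-specific coprocessors, and the interrupt-driven signalling of the actual design is elided.

```cpp
#include <atomic>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> tasks;        // work items produced by the master
std::mutex m;                 // protects the shared task queue
std::atomic<long> result{0};  // aggregated worker output

// Each worker stands in for an application-specific coprocessor.
void worker() {
    for (;;) {
        int t;
        {
            std::lock_guard<std::mutex> lk(m);
            if (tasks.empty()) return;
            t = tasks.front();
            tasks.pop();
        }
        result += t * t;  // the "application-specific" computation
    }
}

int main() {
    for (int i = 0; i < 1000; ++i) tasks.push(i);  // master enqueues tasks
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) pool.emplace_back(worker);  // homogeneous workers
    for (auto& th : pool) th.join();  // master collects results
    std::cout << result << "\n";
}
```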
294

Parallel programming on General Block Min Max Criterion

Lee, ChuanChe 01 January 2006 (has links)
The purpose of this thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). The thesis deals with two kinds of parallel overhead: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO).
295

Collecting and representing parallel programs with high performance instrumentation

Railing, Brian Paul 07 January 2016 (has links)
Computer architecture faces looming challenges: finding program parallelism, process technology limits, and a limited power budget. Navigating these challenges requires a deeper understanding of parallel programs. I will discuss the task graph representation and how it enables programmers and compiler optimizations to understand and exploit the dynamic aspects of a program. I will present Contech, a high performance framework for generating dynamic task graphs from arbitrary parallel programs. The Contech framework supports a variety of languages and parallelization libraries, and has been tested on both x86 and ARM. I will demonstrate how this framework supports a diversity of program analyses, in particular by modeling a dynamically reconfigurable, heterogeneous multi-core processor.
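As a rough picture of what a dynamic task graph captures, the generic structure below records tasks and their dependence edges. It is a sketch of the representation's shape, not Contech's actual format, which additionally records memory accesses, basic-block traces, and synchronization types.

```cpp
#include <cstdint>
#include <vector>

// A generic dynamic task graph node: the work unit's id, the tasks that
// must finish before it may run, and a coarse cost summary that analyses
// (e.g., critical-path length or core-assignment models) can consume.
struct Task {
    uint64_t id;
    std::vector<uint64_t> predecessors;  // dependence edges
    uint64_t instruction_count;          // coarse cost estimate
};

// A whole program execution is then just the set of recorded tasks.
using TaskGraph = std::vector<Task>;
```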
296

Parallel likelihood calculations for phylogenetic trees

Hayward, Peter 2011 (has links)
Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes, including inferring phylogenies, sampling from the space of likely trees, and inferring other parameters governing the evolutionary process. This is done using Felsenstein's algorithm, a widely implemented dynamic programming approach that reduces the computational complexity from exponential to linear in the number of taxa. However, with the advent of efficient modern sequencing techniques, the size of data sets is rapidly increasing beyond current computational capability. Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Previous work has used data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, which we call segmentation, where the tree is broken into subtrees and the likelihood of each subtree is calculated concurrently over multiple processes. We introduce our proposed system, which aims to drastically increase the size of trees that can practically be used in phylogenetic analysis. We then evaluate the system on large phylogenies constructed from both real and synthetic data, showing that substantially lower run times are obtained when the system is used.
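For reference, Felsenstein's pruning recursion computes, at each node and for each nucleotide state, the likelihood of the observed data below that node. The serial sketch below assumes, for brevity, a single shared transition matrix for all branches; the segmentation scheme proposed in the thesis would evaluate the two recursive calls for disjoint subtrees on separate processes.

```cpp
#include <array>
#include <memory>

constexpr int S = 4;  // nucleotide states A, C, G, T
using Probs = std::array<double, S>;

struct Node {
    std::unique_ptr<Node> left, right;  // both null for leaves
    Probs partial;  // leaf: 1.0 at the observed state, 0.0 elsewhere
};

// P[i][j]: assumed probability of state i becoming j along a branch
// (a real implementation derives this from branch length and a rate model).
using Matrix = std::array<Probs, S>;

// Felsenstein's recursion:
//   L_v(s) = [sum_t P(s,t) L_left(t)] * [sum_t P(s,t) L_right(t)]
Probs likelihood(const Node& v, const Matrix& P) {
    if (!v.left) return v.partial;       // leaf: return observed data
    Probs L = likelihood(*v.left, P);    // segmentation would run these two
    Probs R = likelihood(*v.right, P);   // calls concurrently on subtrees
    Probs out{};
    for (int s = 0; s < S; ++s) {
        double l = 0, r = 0;
        for (int t = 0; t < S; ++t) { l += P[s][t] * L[t]; r += P[s][t] * R[t]; }
        out[s] = l * r;
    }
    return out;
}
```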
297

SIMD Optimizations of Software Rendering in 2D Video Games

Mendel, Oskar, Bergström, Jesper January 2019 (has links)
Optimizing rendering is one of the greatest challenges faced by game developers. Most game engines use hardware rendering, which relies on technology specifically built for rendering. Before such hardware existed, game developers had to rely on the CPU to render their games, an approach known as software rendering. Software rendering is not commonly used nowadays, but it still appears, for example, as a fallback for when the end user's machine does not support the hardware-based renderer of the application. Since the CPU, unlike the GPU, is not purpose-built for rendering, the developer has to perform optimizations to make the renderer faster. In this thesis, we present an approach based on Single Instruction, Multiple Data (SIMD), a form of data parallelism. The technique operates on vector registers, so a single operation can be applied to multiple pieces of data at once. We apply it to an already built game engine in order to optimize its rendering. The results show a speed-up of 90.5% and a framerate increase from 30 frames per second to 133 frames per second within the rendering routine.
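As a concrete illustration of the technique (not the authors' actual rendering code), an SSE2 brightness adjustment can apply a saturating add to 16 channel bytes per instruction where the scalar loop handles one byte at a time:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, available on all x86-64 CPUs
#include <cstddef>
#include <cstdint>

// Scalar reference: saturating add of a constant to each 8-bit channel.
void brighten_scalar(uint8_t* px, size_t n, uint8_t amount) {
    for (size_t i = 0; i < n; ++i) {
        int v = px[i] + amount;
        px[i] = v > 255 ? 255 : static_cast<uint8_t>(v);
    }
}

// SIMD version: one _mm_adds_epu8 performs 16 saturating byte adds at once.
void brighten_simd(uint8_t* px, size_t n, uint8_t amount) {
    __m128i add = _mm_set1_epi8(static_cast<char>(amount));
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(px + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(px + i),
                         _mm_adds_epu8(v, add));
    }
    for (; i < n; ++i) {  // scalar tail for leftover bytes
        int v = px[i] + amount;
        px[i] = v > 255 ? 255 : static_cast<uint8_t>(v);
    }
}
```

The pattern generalises to blending and blitting loops: the same scalar tail handles buffer sizes that are not a multiple of the vector width.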
298

OOPS - Object-Oriented Parallel System. A class framework to support parallel scientific programming.

Sonoda, Eloiza Helena 23 March 2006 (has links)
This work describes the design and development of the OOPS (Object-Oriented Parallel System) class framework, a tool that uses object orientation to support the programming of concurrent scientific applications for parallel execution. OOPS provides high-level abstractions so that the application programmer need not deal directly with many parallel implementation details. For performance reasons, some parallel design aspects, such as decomposition and data distribution, are not completely hidden from the application programmer. To achieve this, OOPS encapsulates programming techniques frequently used in parallel systems: virtual processors are organized in groups, over which topologies that provide communication between the processors can be constructed; distributed containers have their elements distributed across the processors of a topology; and parallel components use these containers for their work. The classes supplied by OOPS simplify the implementation of parallel applications without incurring significant overhead, forming a thin layer over the message-passing library used for the implementation.
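The abstractions named above (virtual processor groups, topologies over them, and distributed containers) suggest an interface in the following style. This is a speculative, single-process sketch of the style only; none of the class names are taken from the actual OOPS source.

```cpp
#include <cstddef>
#include <vector>

class VirtualProcessorGroup {  // a named set of workers
public:
    explicit VirtualProcessorGroup(std::size_t n) : size_(n) {}
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

class RingTopology {  // defines a communication pattern over a group
public:
    explicit RingTopology(const VirtualProcessorGroup& g) : group_(g) {}
    std::size_t neighbor(std::size_t rank) const {
        return (rank + 1) % group_.size();
    }
private:
    const VirtualProcessorGroup& group_;
};

template <typename T>
class DistributedVector {  // elements partitioned over a topology
public:
    DistributedVector(const RingTopology& t, std::size_t global_size)
        : topo_(t), local_(global_size) {}  // single-process stand-in
    T& operator[](std::size_t i) { return local_[i]; }
private:
    const RingTopology& topo_;  // would drive element placement in practice
    std::vector<T> local_;
};
```

The point of the style is that partitioning and placement remain visible (the topology is an explicit constructor argument) while message passing itself stays hidden inside the containers.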
299

Delaunay refinement algorithm for sequential and adaptive meshes and for parallel computing.

Sakamoto, Mauro Massayoshi 09 May 2007 (has links)
This work presents the development of a finite element mesh generator based on the Delaunay refinement algorithm. The package is versatile and applicable to serial and adaptive meshes, and to the decomposition of either a coarse or a pre-refined initial mesh using parallel computing. The developed algorithm takes its input in the form of a Planar Straight Line Graph. The construction of the Delaunay algorithm was based on Watson's technique for the boundary triangulation and on the sequential methods of Ruppert and Shewchuk for the parallel refinement. The proposed technique produces meshes that maintain the properties of a Delaunay triangulation. The methodology was implemented using object-oriented programming concepts, supported by open-source libraries. The flexibility of some of these coupled libraries made it possible to parametrize the dimension of the problem, allowing both two- and three-dimensional sequential meshes to be generated. The results obtained with serial, adaptive, and parallel meshes show the effectiveness of this tool. An academic version of the two-dimensional Delaunay refinement algorithm for the Mathematica environment was also developed.
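At the core of both Watson-style insertion and Ruppert/Shewchuk refinement sits the incircle predicate. The determinant form below is a plain floating-point sketch; production refinement codes use exact or adaptive arithmetic (e.g., Shewchuk's predicates) to keep it robust against rounding failures.

```cpp
struct Point { double x, y; };

// Returns > 0 if d lies inside the circumcircle of the counter-clockwise
// triangle (a, b, c), < 0 if outside, and 0 if on the circle. Computed as
// the 3x3 determinant obtained after translating all points by -d.
double incircle(Point a, Point b, Point c, Point d) {
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double a2 = ax * ax + ay * ay;
    double b2 = bx * bx + by * by;
    double c2 = cx * cx + cy * cy;
    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}
```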
300

Parallel block preconditioning of the incompressible Navier-Stokes equations with weakly imposed boundary conditions

White, Raymon January 2016 (has links)
This project is concerned with the development and implementation of a novel preconditioning method for the iterative solution of linear systems that arise in the finite element discretisation of the incompressible Navier-Stokes equations with weakly imposed boundary conditions. In this context we studied an augmented approach where the Schur complement associated with the momentum block of the Navier-Stokes equations has a special sparse structure. We follow the standard inf-sup stable method of discretising the Navier-Stokes equations by Taylor-Hood elements, with the Lagrange multiplier constraints discretised using the same order of approximation on matching grids. The resulting system of nonlinear equations is solved iteratively by Newton's method. The spectrum of the linearised Oseen problem, preconditioned by the exact augmentation preconditioner, was analysed. We then developed inexact versions of the preconditioner aimed at achieving optimal scaling of the algorithm in terms of computational resources and wall-clock time. The experimental evaluation of the methodology involves a number of benchmark problems in two and three spatial dimensions. The obtained results demonstrate efficiency, robustness, and almost optimal scaling of the solution algorithm with respect to the discrete problem size. We used OOMPH-LIB as a testbed for our experiments. The preconditioning strategies were implemented using OOMPH-LIB's Parallel Block Preconditioning Framework. The initial version of the software was significantly upgraded during the course of this project with newly implemented functionality to facilitate the rapid development of sophisticated hierarchical parallel block preconditioners. Performance analysis of the newly introduced functionality shows negligible wall-clock overhead, and the framework demonstrates good weak and strong parallel scaling.
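For orientation, the linearised systems in question have the generic saddle-point structure below (a textbook statement, not the thesis's exact augmented variant):

```latex
\begin{pmatrix} F & B^{T} \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},
\qquad
S = -\,B\,F^{-1}B^{T}
```

Here F is the linearised momentum block and B the discrete divergence. A block preconditioner replaces the actions of F^{-1} and S^{-1} with cheap approximations, and the "augmented" approach modifies F so that the Schur complement S admits a sparse approximation, which is what makes the special sparse structure mentioned above exploitable.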
