  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
341

SIMULAÇÃO CLIMÁTICA DE DADOS DE VENTO EM REDES P2P UTILIZANDO GPU / Climate simulation of wind data on P2P networks using GPU

Baron Neto, Ciro 28 February 2014
This work evaluates GPGPU (General-Purpose Computing on Graphics Processing Units) and P2P (peer-to-peer) networks as means of improving the response time of climate data simulations. To that end, an application using the CUDA (Compute Unified Device Architecture) architecture and the wind data simulation model of the Venthor software was first developed and then integrated into the P2PComp framework. The results indicate a speedup factor of 70 on standalone computers. Furthermore, sharing processing over a P2P network makes higher speedup factors attainable. Computer simulation models usually demand high processing power, and this work shows that parallelism on GPUs and P2P networks is an alternative that delivers better performance than sequential computing.
342

"Processamento distribuído de áudio em tempo real" / "Distributed Real-Time Audio Processing"

Lago, Nelson Posse 04 June 2004
Computer systems for real-time multimedia processing require high processing power. Problems that demand high processing power are usually tackled with parallel or distributed computing techniques; however, the combined difficulties of real-time systems and of parallel and distributed systems have led real-time multimedia processing on general-purpose computers to be based on centralized, single-processor machines. Many multimedia systems also need low latency during user interaction, which further reinforces the tendency towards processing on a single node. In this work, we implemented a mechanism for synchronous, distributed, low-latency audio processing on a local area network, which makes a low-cost distributed system viable for this kind of processing. The main goal is to enable the use of distributed systems for recording and editing musical material in home and small studios, bypassing the need for costly dedicated hardware. The system consists of two parts: a generic one, implemented as middleware for synchronous and distributed processing of continuous media with low latency; and a specific one, built on the first, geared towards audio processing and compatible with legacy applications through the standard LADSPA interface. We expect that future research and applications with similar needs can use the middleware described here, both for other kinds of audio processing and for processing other media, such as video.
343

Métodos de fronteira imersa em mecânica dos fluidos / Immersed boundary methods in fluid mechanics

Larissa Alves Petri 24 March 2010
In the development of parallel codes, the PETSc library stands out as a practical and useful tool. Using this tool, this work presents a study of linear system solvers applied to incompressible fluid flows at the microscale, together with an analysis of their parallel behavior. After a study of several aspects of immersed boundary methods, and taking advantage of the flexibility of PETSc, a parallel first-order immersed boundary method is presented. An improvement in the accuracy of the method is then proposed, based on minimizing the distance between the exact and approximate boundary conditions in the least-squares sense. The development of an efficient parallel tool is demonstrated in the numerical solution of incompressible viscous flow problems with immersed boundaries.
344

Processamento paralelo na simulação de campos eletromagnéticos pelo método das diferenças finitas no domínio do tempo - FDTD. / Parallel processing in the electromagnetic fields simulation with the finite-difference time-domain method - FDTD.

Trevizan, Marcelo Porto 08 January 2007
Research and design projects involving electromagnetics are steadily increasing. In both, computer simulations of the problems involved can be used to investigate how electromagnetic phenomena behave in the situation at hand. In some cases, however, the problem becomes computationally large, requiring more memory and longer processing times because of the geometries involved or the accuracy desired. Parallel computing has been developed to work around these issues. One possible implementation of a parallel system is a network of computers and, with free software, it can be built at practically no cost. The present work aims to implement such a parallel system using the FDTD method. During development, special attention was given to good programming practices, in order to make the program flexible, modular and extensible. In addition, a mathematical tool was developed to estimate the total processing time of a parallel simulation and to indicate parameter adjustments that minimize this time. The code, the parallel system and the mathematical tool are validated with some examples. Finally, a study of a practical application of interest is carried out with the developed tool.
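
As a rough illustration of the kind of update loop the FDTD method is built on, the sketch below advances a one-dimensional field in normalized units. It is not the thesis code: the grid size, the Courant number of 0.5, the Gaussian source and the function name `fdtd_1d` are all assumptions chosen for brevity, and the loop is serial. A networked version of the kind the abstract describes would typically split the cell range among machines and exchange the boundary values each step.

```python
import math


def fdtd_1d(n_cells=200, n_steps=80, source_cell=100):
    """Serial 1D FDTD (Yee scheme) in normalized units; illustrative only."""
    ez = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered by half a cell
    courant = 0.5         # assumed Courant number (stable in 1D)

    for t in range(n_steps):
        # Update E from the spatial difference of H.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k - 1] - hy[k])

        # Soft Gaussian source (hypothetical pulse shape).
        ez[source_cell] += math.exp(-0.5 * ((t - 30) / 10.0) ** 2)

        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])

    return ez


field = fdtd_1d()
print(max(abs(v) for v in field))
```

Each cell update depends only on its immediate neighbours, which is what makes the method amenable to splitting the grid across several computers.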
346

A TIME-AND-SPACE PARALLELIZED ALGORITHM FOR THE CABLE EQUATION

Li, Chuan 01 August 2011
Electrical propagation in excitable tissue, such as nerve fibers and heart muscle, is described by a nonlinear reaction-diffusion parabolic partial differential equation for the transmembrane voltage $V(x,t)$, known as the cable equation. This equation involves a highly nonlinear source term, representing the total ionic current across the membrane, which is governed by a Hodgkin-Huxley type ionic model and requires the solution of a system of ordinary differential equations. Thus, the model consists of a PDE (in one, two or three dimensions) coupled to a system of ODEs, and it is very expensive to solve, especially in two and three dimensions. To solve this equation numerically, we develop an algorithm that extends the Parareal algorithm by efficiently incorporating space-parallelized solvers into its framework, achieving time-and-space parallelization. Numerical results are presented, along with a performance comparison of several serial, space-parallelized and time-and-space-parallelized time-stepping schemes in one and two dimensions.
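
For readers unfamiliar with Parareal, the following is a minimal sketch of the basic iteration on a scalar test problem y' = lam * y, not the cable equation and not the time-and-space extension the thesis develops. The explicit Euler propagators, the function names and the parameter values are assumptions made for illustration. The coarse solver takes one step per time slice, the fine solver takes several, and each iteration applies the correction U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]).

```python
def euler(y, lam, dt, steps):
    """Advance y' = lam * y by `steps` explicit Euler steps of size dt."""
    for _ in range(steps):
        y = y + dt * lam * y
    return y


def parareal(y0, lam, t_end, n_slices, fine_steps, iterations):
    """Basic Parareal on a scalar ODE; returns the values at slice boundaries."""
    dt_slice = t_end / n_slices

    def coarse(y):  # cheap propagator G: one step per slice
        return euler(y, lam, dt_slice, 1)

    def fine(y):    # accurate propagator F: many steps per slice
        return euler(y, lam, dt_slice / fine_steps, fine_steps)

    # Initial guess from a serial coarse sweep.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n]))

    for _ in range(iterations):
        # The fine solves are independent per slice, so they could run in parallel.
        f_old = [fine(u[n]) for n in range(n_slices)]
        g_old = [coarse(u[n]) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):  # cheap serial correction sweep
            new.append(coarse(new[n]) + f_old[n] - g_old[n])
        u = new
    return u


def serial_fine(y0, lam, t_end, n_slices, fine_steps):
    """Reference: the fine solver run serially over the whole interval."""
    total = n_slices * fine_steps
    return euler(y0, lam, t_end / total, total)


print(parareal(1.0, -1.0, 1.0, 4, 10, 4)[-1], serial_fine(1.0, -1.0, 1.0, 4, 10))
```

After as many iterations as there are time slices, the iteration reproduces the serial fine solution up to rounding; the method pays off only when far fewer iterations are enough.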
347

Melt convection in welding and crystal growth

Do-Quang, Minh January 2004
A parallel finite element code with adaptive meshing was developed and used to study three-dimensional, time-dependent fluid flows caused by thermocapillary convection, as well as temperature and dopant distribution, in fusion welding and floating zone crystal growth. A comprehensive numerical model of the three-dimensional, time-dependent fluid flow in a weld pool was developed. This model considered most of the physical mechanisms involved in gas tungsten arc welding and made it possible to obtain the actual chaotic, time-dependent melt flow. The fluid flow in the weld pool was found to be highly complex and to influence the weld pool's depth and width. A physicochemical model was also studied and applied numerically to simulate the effect of surfactant adsorption at the surface on the surface tension of the liquid metal in a weld pool. Another three-dimensional, time-dependent model, with adaptive mesh refinement and coarsening, was applied to simulate the effect of weak flow on radial segregation in floating zone crystal growth. The phase change equation was included in this model in order to simulate the real interface shape of the floating zone. In the new parallel code, a scheme that stores the level of each node and face, instead of the complete history of refinements, was used to facilitate derefinement. The information is thus local, and the exchange of information between processors during derefinement is minimized. This scheme helped to improve the efficiency of the parallel adaptive solver.
348

PARALLEL COMPUTING ALGORITHMS FOR TANDEM

2013 April 1900
Tandem mass spectrometry, also known as MS/MS, is an analytical technique that measures the mass-to-charge ratio of charged ions and is widely used in genomics, proteomics and metabolomics. There are two kinds of automatic methods for interpreting tandem mass spectra: de novo methods and database searching methods. Both require massive computational resources and complicated comparison algorithms. The real-time peptide-spectrum matching (RT-PSM) algorithm is a database searching method that interprets tandem mass spectra under strict time constraints. Restricted by the hardware and architecture of an individual workstation, the RT-PSM algorithm has to sacrifice accuracy in order to provide the required processing speed. The peptide-spectrum similarity scoring module is the most time-consuming of the four modules in the RT-PSM algorithm, and is also the core of the algorithm. In this study, a multi-core computing algorithm is developed for individual workstations. Moreover, a distributed computing algorithm is designed for a cluster. The improved algorithms can meet the speed requirement of RT-PSM without sacrificing accuracy. With some extension, the distributed computing algorithm can also support other PSM algorithms. Simulation results show that, compared with the original RT-PSM, the parallel version achieves a speed-up of 25 to 34 times, depending on the workstation. A cluster with 240 CPU cores could accelerate the similarity scoring module 210 times compared with the single-threaded similarity scoring module, and the whole peptide identification process 85 times compared with the single-threaded peptide identification process.
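
The cluster figures quoted above can be put in context with two standard measures, sketched below. Parallel efficiency is speed-up divided by core count, so 210 times on 240 cores corresponds to 210 / 240 = 0.875. The Karp-Flatt metric, which estimates the serial fraction from a measured speed-up, is an addition for illustration and is not mentioned in the abstract; the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speed-up achieved on `cores` cores."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Speed-ups quoted in the abstract for the 240-core cluster.
for label, speedup in [("similarity scoring", 210), ("whole identification", 85)]:
    eff = parallel_efficiency(speedup, 240)
    frac = karp_flatt(speedup, 240)
    print(f"{label}: efficiency {eff:.3f}, serial fraction {frac:.4f}")
```

The gap between the two efficiencies is consistent with the abstract's point that only the similarity scoring module was parallelized across the cluster.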
349

A model of dynamic compilation for heterogeneous compute platforms

Kerr, Andrew 10 December 2012
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds a further dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software faces obstacles to portability and efficient execution that go beyond differences in instruction sets: the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures provides an abstraction layer that decouples program representations from optimized binaries, enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- a framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling that attempts to maximize control locality.
350

Soporte arquitectónico a la sincronización imparcial de lectores y escritores en computadores paralelos / Architectural support for fair reader-writer synchronization in parallel computers

Vallejo Gutiérrez, Enrique 10 June 2010
Technological evolution in microprocessor design has led to parallel systems with multiple execution threads. These systems are more difficult to program and incur higher overheads than traditional uniprocessor systems, which may limit their performance and scalability: synchronization, coherence, consistency and other mechanisms required to guarantee correct execution. Traditional parallel programming is based on synchronization primitives such as barriers, critical sections and reader/writer locks, which are highly prone to programming errors. Transactional Memory (TM) hides these synchronization problems from the programmer. However, many TM systems still rely on reader/writer locks and would benefit from an efficient implementation of them. This thesis presents new hardware techniques to accelerate the execution of such parallel programs. We propose a hybrid TM system based on reader/writer locks, which minimizes software overheads when acceleration hardware is present while still allowing correct software-only execution. We develop a mechanism to guarantee fairness between hardware and software transactions. We introduce a low-cost distributed mechanism, the Lock Control Unit, to handle fine-grain reader/writer locks. Finally, we propose a multiprocessor organization based on Kilo-Instruction Processors that guarantees Sequential Consistency while allowing speculation in critical sections.
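
To make the notion of a reader/writer lock concrete, here is a small software-only sketch using the classic turnstile construction, in which a waiting writer blocks newly arriving readers so that a stream of readers cannot hold it off indefinitely. It is not the hardware Lock Control Unit proposed in the thesis; the class name `FairRWLock` and its methods are hypothetical, and the ordering guarantees are only as strong as those of the underlying Python locks.

```python
import threading


class FairRWLock:
    """Reader/writer lock with a turnstile so writers are not starved (sketch)."""

    def __init__(self):
        self._turnstile = threading.Lock()   # gate that every arrival passes through
        self._room_empty = threading.Lock()  # held while readers or a writer are inside
        self._mutex = threading.Lock()       # protects the reader count
        self._readers = 0

    def acquire_read(self):
        # Pass through the turnstile; blocks while a writer holds it.
        with self._turnstile:
            pass
        with self._mutex:
            self._readers += 1
            if self._readers == 1:           # first reader locks the room
                self._room_empty.acquire()

    def release_read(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:           # last reader unlocks the room
                self._room_empty.release()

    def acquire_write(self):
        self._turnstile.acquire()            # stop new readers from entering
        self._room_empty.acquire()           # wait for current readers to leave

    def release_write(self):
        self._room_empty.release()
        self._turnstile.release()


lock = FairRWLock()
lock.acquire_read()
lock.release_read()
lock.acquire_write()
lock.release_write()
```

Every acquisition here goes through shared software state, which is the kind of overhead that hardware support for locks aims to reduce.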
