• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 2
  • 1
  • 1
  • Tagged with
  • 7
  • 7
  • 4
  • 4
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Jämförelse av trådbibliotek för Linux

Lundell, Daniel January 2003 (has links)
<p>Den dominerande implementationen av POSIX Threads-standarden för Linux har länge varit LinuxThreads. Detta trådbibliotek har på senare tid kritiserats för att leverera bland annat bristfällig effektivitet, skalbarhet och följsamhet till POSIX-standarden. Två nya projekt, Next Generation POSIX Threads (IBM, Intel) och Native POSIX Threads Library (Red Hat) siktar på att erbjuda bättre alternativ till LinuxThreads.</p><p>Denna undersökning jämför effektiviteten för dessa tre trådbibliotek, i synnerhet i samband med växling, skapande och synkronisering av trådar samt skalbarhet, huvudsakligen genom användandet av ett testprogram från Sun Microsystems.</p><p>Resultaten visar att NGPT skalar linjärt och är det klart effektivaste trådbiblioteket över 4600 trådar. NPTL är effektivast under 4600 trådar fast dess exponentiella kurva medför sämre resultat vid ett högre antal trådar. LinuxThreads skalade sämst och fick sämst resultat över 1200 trådar och resultaten visade även högst varians. Det rekommenderas att ersätta LinuxThreads med NPTL eller NGPT, beroende på kravet på skalbarhet.</p>
2

Jämförelse av trådbibliotek för Linux

Lundell, Daniel January 2003 (has links)
Den dominerande implementationen av POSIX Threads-standarden för Linux har länge varit LinuxThreads. Detta trådbibliotek har på senare tid kritiserats för att leverera bland annat bristfällig effektivitet, skalbarhet och följsamhet till POSIX-standarden. Två nya projekt, Next Generation POSIX Threads (IBM, Intel) och Native POSIX Threads Library (Red Hat) siktar på att erbjuda bättre alternativ till LinuxThreads. Denna undersökning jämför effektiviteten för dessa tre trådbibliotek, i synnerhet i samband med växling, skapande och synkronisering av trådar samt skalbarhet, huvudsakligen genom användandet av ett testprogram från Sun Microsystems. Resultaten visar att NGPT skalar linjärt och är det klart effektivaste trådbiblioteket över 4600 trådar. NPTL är effektivast under 4600 trådar fast dess exponentiella kurva medför sämre resultat vid ett högre antal trådar. LinuxThreads skalade sämst och fick sämst resultat över 1200 trådar och resultaten visade även högst varians. Det rekommenderas att ersätta LinuxThreads med NPTL eller NGPT, beroende på kravet på skalbarhet.
3

Resolução de um problema térmico inverso utilizando processamento paralelo em arquiteturas de memória compartilhada / Resolution of an inverse thermal problem using parallel processing on shared memory architectures

Ansoni, Jonas Laerte 03 September 2010 (has links)
A programação paralela tem sido freqüentemente adotada para o desenvolvimento de aplicações que demandam alto desempenho computacional. Com o advento das arquiteturas multi-cores e a existência de diversos níveis de paralelismo é importante definir estratégias de programação paralela que tirem proveito desse poder de processamento nessas arquiteturas. Neste contexto, este trabalho busca avaliar o desempenho da utilização das arquiteturas multi-cores, principalmente o oferecido pelas unidades de processamento gráfico (GPUs) e CPUs multi-cores na resolução de um problema térmico inverso. Algoritmos paralelos para a GPU e CPU foram desenvolvidos utilizando respectivamente as ferramentas de programação em arquiteturas de memória compartilhada NVIDIA CUDA (Compute Unified Device Architecture) e a API POSIX Threads. O algoritmo do método do gradiente conjugado pré-condicionado para resolução de sistemas lineares esparsos foi implementado totalmente no espaço da memória global da GPU em CUDA. O algoritmo desenvolvido foi avaliado em dois modelos de GPU, os quais se mostraram mais eficientes, apresentando um speedup de quatro vezes que a versão serial do algoritmo. A aplicação paralela em POSIX Threads foi avaliada em diferentes CPUs multi-cores com distintas microarquiteturas. Buscando um maior desempenho do código paralelizado foram utilizados flags de otimização as quais se mostraram muito eficientes na aplicação desenvolvida. Desta forma o código paralelizado com o auxílio das flags de otimização chegou a apresentar tempos de processamento cerca de doze vezes mais rápido que a versão serial no mesmo processador sem nenhum tipo de otimização. Assim tanto a abordagem utilizando a GPU como um co-processador genérico a CPU como a aplicação paralela empregando as CPUs multi-cores mostraram-se ferramentas eficientes para a resolução do problema térmico inverso. / Parallel programming has been frequently adopted for the development of applications that demand high-performance computing. With the advent of multi-cores architectures and the existence of several levels of parallelism are important to define programming strategies that take advantage of parallel processing power in these architectures. In this context, this study aims to evaluate the performance of architectures using multi-cores, mainly those offered by the graphics processing units (GPUs) and CPU multi-cores in the resolution of an inverse thermal problem. Parallel algorithms for the GPU and CPU were developed respectively, using the programming tools in shared memory architectures, NVIDIA CUDA (Compute Unified Device Architecture) and the POSIX Threads API. The algorithm of the preconditioned conjugate gradient method for solving sparse linear systems entirely within the global memory of the GPU was implemented by CUDA. It evaluated the two models of GPU, which proved more efficient by having a speedup was four times faster than the serial version of the algorithm. The parallel application in POSIX Threads was evaluated in different multi-core CPU with different microarchitectures. Optimization flags were used to achieve a higher performance of the parallelized code. As those were efficient in the developed application, the parallelized code presented processing times about twelve times faster than the serial version on the same processor without any optimization. Thus both the approach using GPU as a coprocessor to the CPU as a generic parallel application using the multi-core CPU proved to be more efficient tools for solving the inverse thermal problem.
4

Resolução de um problema térmico inverso utilizando processamento paralelo em arquiteturas de memória compartilhada / Resolution of an inverse thermal problem using parallel processing on shared memory architectures

Jonas Laerte Ansoni 03 September 2010 (has links)
A programação paralela tem sido freqüentemente adotada para o desenvolvimento de aplicações que demandam alto desempenho computacional. Com o advento das arquiteturas multi-cores e a existência de diversos níveis de paralelismo é importante definir estratégias de programação paralela que tirem proveito desse poder de processamento nessas arquiteturas. Neste contexto, este trabalho busca avaliar o desempenho da utilização das arquiteturas multi-cores, principalmente o oferecido pelas unidades de processamento gráfico (GPUs) e CPUs multi-cores na resolução de um problema térmico inverso. Algoritmos paralelos para a GPU e CPU foram desenvolvidos utilizando respectivamente as ferramentas de programação em arquiteturas de memória compartilhada NVIDIA CUDA (Compute Unified Device Architecture) e a API POSIX Threads. O algoritmo do método do gradiente conjugado pré-condicionado para resolução de sistemas lineares esparsos foi implementado totalmente no espaço da memória global da GPU em CUDA. O algoritmo desenvolvido foi avaliado em dois modelos de GPU, os quais se mostraram mais eficientes, apresentando um speedup de quatro vezes que a versão serial do algoritmo. A aplicação paralela em POSIX Threads foi avaliada em diferentes CPUs multi-cores com distintas microarquiteturas. Buscando um maior desempenho do código paralelizado foram utilizados flags de otimização as quais se mostraram muito eficientes na aplicação desenvolvida. Desta forma o código paralelizado com o auxílio das flags de otimização chegou a apresentar tempos de processamento cerca de doze vezes mais rápido que a versão serial no mesmo processador sem nenhum tipo de otimização. Assim tanto a abordagem utilizando a GPU como um co-processador genérico a CPU como a aplicação paralela empregando as CPUs multi-cores mostraram-se ferramentas eficientes para a resolução do problema térmico inverso. / Parallel programming has been frequently adopted for the development of applications that demand high-performance computing. With the advent of multi-cores architectures and the existence of several levels of parallelism are important to define programming strategies that take advantage of parallel processing power in these architectures. In this context, this study aims to evaluate the performance of architectures using multi-cores, mainly those offered by the graphics processing units (GPUs) and CPU multi-cores in the resolution of an inverse thermal problem. Parallel algorithms for the GPU and CPU were developed respectively, using the programming tools in shared memory architectures, NVIDIA CUDA (Compute Unified Device Architecture) and the POSIX Threads API. The algorithm of the preconditioned conjugate gradient method for solving sparse linear systems entirely within the global memory of the GPU was implemented by CUDA. It evaluated the two models of GPU, which proved more efficient by having a speedup was four times faster than the serial version of the algorithm. The parallel application in POSIX Threads was evaluated in different multi-core CPU with different microarchitectures. Optimization flags were used to achieve a higher performance of the parallelized code. As those were efficient in the developed application, the parallelized code presented processing times about twelve times faster than the serial version on the same processor without any optimization. Thus both the approach using GPU as a coprocessor to the CPU as a generic parallel application using the multi-core CPU proved to be more efficient tools for solving the inverse thermal problem.
5

Διάφανη απεικόνιση προγραμματιστικών μοντέλων υψηλού επιπέδου σε ετερόμορφες παράλληλες αρχιτεκτονικές

Βενέτης, Ιωάννης 16 March 2009 (has links)
- / -
6

Accelerating Hardware Simulation on Multi-cores

Nanjundappa, Mahesh 04 June 2010 (has links)
Electronic design automation (EDA) tools play a central role in bridging the productivity gap for designing complex hardware systems. However, with an increase in the size and complexity of today's design requirements, current methodologies and EDA tools are unable to effectively mitigate the further widening of productivity gap. It is estimated that testing and verification takes 2/3rd of the total development time of complex hardware systems. Functional simulation forms the main stay of testing and verification process and is the most widely used technique for testing and verification. Most of the simulation algorithms and their implementations are designed for uniprocessor systems that cannot easily leverage the parallelism in multi-core and GPU platforms. For example, logic simulation often uses levelized sequential algorithms, whereas the discrete-event simulation frameworks for Verilog, VHDL and SystemC employ concurrency in the form of multi-threading to given an illusion of the inherent parallelism present in circuits. However, the discrete-event model of computation requires a global notion of an event-queue, which makes improving its simulation performance via parallelization even more challenging. This work investigates automatic parallelization of simulation algorithms used to simulate hardware models. In particular, we focus on parallelizing the simulation of hardware designs described at the RTL using SystemC/HDL with examples to clearly describe the parallelization. Even though multi-cores and GPUs other parallelism, efficiently exploiting this parallelism with their programming models is not straightforward. To overcome this, we also focus our research on building intelligent translators to map simulation applications onto multi-cores and GPUs such that the complexity of the low-level programming models is hidden from the designers. / Master of Science
7

Paralelní trénování neuronových sítí pro rozpoznávání řeči / Parallel Training of Neural Networks for Speech Recognition

Veselý, Karel January 2010 (has links)
This thesis deals with different parallelizations of training procedure for artificial neural networks. The networks are trained as phoneme-state acoustic descriptors for speech recognition. Two effective parallelization strategies were implemented and compared. The first strategy is data parallelization, where the training is split into several POSIX threads. The second strategy is node parallelization, which uses CUDA framework for general purpose computing on modern graphic cards. The first strategy showed a 4x speed-up, while using the second strategy we observed nearly 10x speed-up. The Stochastic Gradient Descent algorithm with error backpropagation was used for the training. After a short introduction, the second chapter of this thesis shows the motivation and introduces the neural networks into the context of speech recognition. The third chapter is theoretical, the anatomy of a neural network and the used training method are discussed. The following chapters are focused on the design and implementation of the project, while the phases of the iterative development are described. The last extensive chapter describes the setup of the testing system and reports the experimental results. Finally, the obtained results are concluded and the possible extensions of the project are proposed.

Page generated in 0.0464 seconds