  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
301

Automatic Parallel Memory Address Generation for Parallel DSP Computing

Dai, Jiehua January 2008
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for parallel computing in DSP; it can provide parallel memory addressing efficiently with minimum latency. Parallel programming becomes more efficient with the parallel address generator for PVM proposed in this thesis. However, because no cache hides the complexity of memory access, the cost of programming is high. To minimize the programming cost, automatic parallel memory address generation is needed to hide the complexities of memory access. This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, vector addressing requirements extracted from the behaviour model are matched to a prepared parallel memory addressing template, in order to supply data in parallel from the main memory to the on-chip vector memory. According to the template and the usage of the main and on-chip parallel vector memory, models for data pre-allocation and permutation in the scratch pad memories of an ASIP can be decided and configured. By exposing the parallel memory accesses of the source code, a memory access flow graph (MFG) is generated. The MFG is then combined with hardware information to match templates in the template library. When it matches a template, a suitable permutation equation is obtained, and the permutation table that includes the target addresses for data pre-allocation and permutation is created. Thus it is possible to automatically generate memory addresses for parallel memory accesses. A tool for achieving this goal, Permutator, was created; it is implemented in C++ combined with XML. A memory access coding template is selected, which specifies the permutation formulas. The PVM address table can then be generated to perform the data pre-allocation, so that efficient parallel memory access is possible.
The results show that the memory access complexity is hidden by using Permutator, so that the programming cost is reduced. It works well when each algorithm, together with its related hardware information, corresponds to a template case, so that extra memory cost is eliminated.
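To make the idea of a conflict-free permutation table concrete, the following is a minimal Python sketch of classical row skewing, a textbook scheme and not necessarily one of the templates used by Permutator. The function names, the square matrix, and the assumption of as many memory banks as matrix columns are all illustrative choices, not details taken from the thesis.

```python
def build_permutation_table(n):
    """Map element (row, col) of an n x n matrix to a (bank, offset) pair.

    Assumes n memory banks (an illustrative choice). Row skewing places
    element (row, col) in bank (row + col) mod n at offset row, so each
    bank holds exactly one element of every row.
    """
    return {(r, c): ((r + c) % n, r) for r in range(n) for c in range(n)}


def is_conflict_free(table, accesses):
    """Return True if all elements accessed together lie in different banks."""
    banks = [table[element][0] for element in accesses]
    return len(set(banks)) == len(banks)


n = 4
skewed = build_permutation_table(n)
# Naive layout for comparison: bank = col, offset = row.
naive = {(r, c): (c, r) for r in range(n) for c in range(n)}

row_access = [(1, c) for c in range(n)]     # one full row read in parallel
column_access = [(r, 2) for r in range(n)]  # one full column read in parallel

print(is_conflict_free(skewed, row_access), is_conflict_free(skewed, column_access))
print(is_conflict_free(naive, row_access), is_conflict_free(naive, column_access))
```

With the skewed table both the row and the column access touch every bank once, whereas the naive layout sends the whole column access to a single bank.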
302

Contact Sound Synthesis in Real-time Applications

Nilsson, Robin Lindh January 2014
Synthesizing sounds which occur when physically-simulated objects collide in a virtual environment can give more dynamic and realistic sounds compared to pre-recorded sound effects. This real-time computation of sound samples can be computationally intensive. In this study we investigate a synthesis algorithm operating in the frequency domain, previously shown to be more efficient than time domain synthesis, and propose a further optimization using multi-threading on the CPU. The multi-threaded synthesis algorithm was designed and implemented as part of a game being developed by Axolot Games. Measurements were done in three stress-testing cases to investigate how multi-threading improved the synthesis performance. Compared to our single-threaded approach, the synthesis speed was improved by 80% when using 8 threads, running on an i7 processor with hyper-threading enabled. We conclude that synthesis of contact sounds is viable for games and similar real-time applications, when using the investigated optimization. 140000 mode shapes were synthesized 30% faster than real-time, and this is arguably much more than a user can distinguish. / Synthesizing the sounds that arise when physics objects collide in a virtual environment can give more dynamic and realistic sound effects, but is demanding to compute. In this thesis, sound synthesis in the frequency domain was implemented based on an earlier study and then extended to use multiple threads. According to measurements in three test cases, the multi-threaded implementation could synthesize 80% more sound waves than the single-threaded one on an i7 processor. / Author's website: www.robinerd.com
303

A source-to-source compiler for the PRAM language Fork to the REPLICA many-core architecture

Zhou, Cheng January 2012
This thesis describes the implementation of a source-to-source compiler that translates the Fork language to the REPLICA baseline language. The Fork language is a high-level programming language designed for the PRAM (Parallel Random Access Machine) model. The baseline language is a low-level parallel programming language for the REPLICA architecture, which implements the PRAM computing model. To support the Fork language on REPLICA, a compiler that translates Fork to baseline is built. The Fork-to-baseline compiler is built to be compatible with the Fork implementation for SB-PRAM. Moreover, the libraries that support Fork's features are built using the baseline language. The evaluation results verify that the features of the Fork language are supported in the implementation. The evaluation also shows the scalability of our implementation and shows that the overhead introduced by Fork-to-baseline translation is small.
304

Cellular distributed and parallel computing

Xu, Lei January 2014
This thesis focuses on novel approaches to distributed and parallel computing that are inspired by the mechanism and functioning of biological cells. We refer to this concept as cellular distributed and parallel computing, which focuses on three important principles: simplicity, parallelism, and locality. We first give a parallel polynomial-time solution to the constraint satisfaction problem (CSP) based on a theoretical model of cellular distributed and parallel computing, which is known as neural-like P systems (or neural-like membrane systems). We then design a class of simple neural-like P systems to solve the fundamental maximal independent set (MIS) selection problem efficiently in a distributed way, by drawing inspiration from the way that developing cells in the fruit fly become specialised. Building on the novel bio-inspired approach to distributed MIS selection, we propose a new simple randomised algorithm for another fundamental distributed computing problem: the distributed greedy colouring (GC) problem. We then propose an improved distributed MIS selection algorithm that incorporates for the first time another important feature of the biological system: adapting the probabilities used at each node based on local feedback from neighbouring nodes. The improved distributed MIS selection algorithm is again extended to solve the distributed greedy colouring problem. Both improved algorithms are simple and robust and work under very restrictive conditions; moreover, they both achieve state-of-the-art performance in terms of their worst-case time complexity and message complexity. Given any n-node graph with maximum degree Delta, the expected time complexity of our improved distributed MIS selection algorithm is O(log n) and the message complexity per node is O(1). The expected time complexity of our improved distributed greedy colouring algorithm is O(Delta + log n) and the message complexity per node is again O(1).
Finally, we provide some experimental results to illustrate the time and message complexity of our proposed algorithms in practice. In particular, we show experimentally that the number of colours used by our distributed greedy colouring algorithms turns out to be optimal or near-optimal for many standard graph colouring benchmarks, so they provide effective simple heuristic approaches to computing a colouring with a small number of colours.
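The feedback idea described above, where each node adapts its probability according to what it hears from its neighbours, can be illustrated with a small synchronous simulation. This is only a sketch under assumptions: the halve-or-double update rule, the initial probability of 1/2, and the names `feedback_mis` and `graph` are illustrative choices and are not taken from the thesis, which should be consulted for the actual algorithm and its analysis.

```python
import random


def feedback_mis(adj, seed=0, max_rounds=10_000):
    """Simulate a randomised MIS selection with local feedback.

    adj maps each node to an iterable of its neighbours (undirected graph).
    Assumed rule: a node halves its firing probability when it hears a
    neighbour fire, and otherwise doubles it up to a cap of 1/2.
    """
    rng = random.Random(seed)
    prob = {v: 0.5 for v in adj}
    active = set(adj)
    mis = set()
    rounds = 0
    while active:
        if rounds >= max_rounds:
            raise RuntimeError("did not terminate within max_rounds")
        rounds += 1
        # Every active node fires independently with its current probability.
        firing = {v for v in active if rng.random() < prob[v]}
        # A node joins the MIS if it fired and none of its neighbours did.
        joined = {v for v in firing if not any(u in firing for u in adj[v])}
        mis |= joined
        for v in joined:
            active.discard(v)
            active.difference_update(adj[v])  # neighbours become inactive
        # Local feedback for the nodes that are still undecided.
        for v in active:
            if any(u in firing for u in adj[v]):
                prob[v] /= 2
            else:
                prob[v] = min(2 * prob[v], 0.5)
    return mis


# A 5-cycle plus one isolated node (hypothetical test graph).
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0], 5: []}
print(sorted(feedback_mis(graph)))
```

Each node only uses information from its immediate neighbours, which mirrors the locality principle emphasised in the abstract; the simulation itself is centralised purely for convenience.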
305

Accélération des calculs pour la simulation du laminage à pas de pèlerin en utilisant la méthode multimaillages / Speeding-up numerical simulation of cold pilgering using a mutlimesh method

Kpodzo, Koffi Woloe 18 March 2014
This work aims to speed up the computations in the numerical simulation of cold pilgering. To this end, we focused on the Parallel Multiphysics Multimesh (MMP) method implemented in the Forge® code, which is intended to speed up computations for forming processes in which the deformation is highly localized in a small zone of the domain. In this method, a mesh that is locally refined in the deformation zone and coarser over the rest of the domain is used to solve the mechanical equations, while a uniformly refined mesh is used for the thermal computation. Since the mechanical computation is the most expensive, reducing the number of nodes of its mesh yields very large speed-ups. Because the thermal mesh is uniformly fine, it also serves as the storage mesh for the fields computed for both physics. To apply the MMP method effectively to cold pilgering, several important aspects were taken into account. First, the complex geometry of the tube requires the development of a special coarsening technique in order to ensure maximum coarsening while guaranteeing a mesh suitable for computation. A mesh coarsening technique using a cylindrical anisotropic metric was therefore developed. Next, with the elastoplastic constitutive law used, large perturbations are observed in the stresses, caused by the numerical diffusion generated by the different types of transfer of the P0 fields (constant per element) from the thermal mesh to the mechanical mesh. To remedy this, an approach combining two techniques was developed. The first consists in updating the state variables directly on the mechanical mesh rather than on the thermal mesh, and transferring them afterwards.
The second technique is the use of a P0 transfer operator based on superconvergent recovery (SPR) and the construction of recovered higher-order fields. Good speed-ups are then obtained on the cold pilgering cases studied, up to a factor of 6.5 for the solution of the thermomechanical problem. The overall simulation speed-ups reach a factor of 3.3 on a mesh of about 70,000 nodes in sequential computation. In parallel the performance drops slightly but remains similar (2.7). / This work aims at speeding up the calculations of the numerical simulation of the cold pilgering process. To this end, it focuses on a Parallel Multiphysics Multimesh (MMP) method that has been implemented in the Forge® code; this method is dedicated to speeding up the calculations for processes in which the deformation is localized within a small area of the computational domain. A locally refined mesh is used to solve the mechanical equations, while a uniformly refined mesh serves as the basic mesh to store state variables and is preferred for thermal calculations. The mechanical computations being the most expensive, reducing the number of nodes of their mesh provides high speed-ups. To effectively apply the MMP method to the cold pilgering process, many important aspects have been taken into account. Firstly, the complex geometry of the tube requires the development of a special mesh coarsening technique, in order to ensure maximum coarsening while guaranteeing a suitable mesh for calculations. A technique using a cylindrical anisotropic metric is then introduced. Afterwards, with the elastoplastic behaviour law used for the considered process, inaccuracies were observed in the stress field. They are mainly due to the numerical diffusion generated by the different transfer operations of P0 variables (constant per element) from the thermal mesh to the mechanical one.
To remedy this issue, an approach combining two techniques has been developed. First, state variables are updated directly on the mechanical mesh, instead of being updated on the thermal mesh and then transferred to the mechanical mesh. The second technique consists in using a P0 transfer operator based on the superconvergent recovery (SPR) technique, which improves the accuracy of the transported field through the introduction of higher-order recovered fields. High speed-ups are obtained on the studied cold pilgering cases, up to a factor of 6.5 for the resolution of the thermomechanical problem, and the global simulation speed-up is up to a factor of 3.2 on a mesh with about 70,000 nodes in sequential calculations. For parallel calculations, performance drops slightly but remains quite good.
306

Análise de execução de aplicações paralelas em grades móveis com restrições de processamento e bateria / Analysis of the execution of parallel applications using a mobile grid environment

Frederico Cassis Ribeiro Santos 10 March 2016
There are currently several proposals for integrating mobile devices into a computational grid, but various problems are observed in such environments. This dissertation focuses on one problem: the restriction on the amount of energy spent executing applications when these mobile devices are used as resource providers in a computational grid that supplies processing for parallel applications. To that end, this work proposes a method for estimating the energy consumption of applications, considering that they use a given set of operations present in the great majority of parallel applications (mathematical operations and memory allocation). Based on the proposed method, two mobile devices were studied and a representation of energy consumption was created using regression methods. To validate the models, two applications were analysed and the actual energy consumption was compared with the estimated consumption. The model produced results close to the measured values, showing an increase of between 6% and 14.24% relative to the measured result. / Nowadays, there are different proposals to integrate mobile devices in a computational grid, although several problems are introduced. This dissertation focuses on the energy limitation problem when using mobile devices to provide resources, such as processing power to run parallel applications. It also proposes a method to estimate the energy consumption of a task that needs to be executed in this environment. To achieve this goal, two mobile devices were used as a test case and a representation of their energy consumption was created by running benchmarks and using regression techniques. To validate the model created, two applications were executed and their measured values were compared to the estimated ones. The estimation showed an increase of between 6 and 14.24 percent.
307

Desenvolvimento e otimização de um código paralelizado para simulação de escoamentos incompressíveis / Development and optimization of a parallel code for the simulation of incompressible flows

Josuel Kruppa Rogenski 06 April 2011
This research work aims to study the parallelization of algorithms for solving partial differential equations. These algorithms are used to generate the numerical solution of the Navier-Stokes equations for a two-dimensional incompressible flow of a Newtonian fluid. The spatial derivatives are computed with a compact finite difference method using high-order approximations. Since the compact high-order computation of spatial derivatives adopted in this study requires the solution of tridiagonal linear systems, it is important to study the solution of these systems in order to obtain good performance. The solution of linear systems also appears in the numerical solution of the Poisson equation. The results obtained from solving the partial differential equations are compared with cases where the analytical solution is known, in order to verify the accuracy of the implemented methods. The results of the parallelized Navier-Stokes code for the simulation of incompressible flows are compared with results from linear stability theory to validate the final code. The performance and speedup of the code are assessed by comparing the total time spent as a function of the number of processing elements used. / The objective of the present work is to study the parallelization of algorithms for solving partial differential equations. The aim is to achieve an effective parallelization to generate the numerical solution of the Navier-Stokes equations in a two-dimensional incompressible and isothermal flow of a Newtonian fluid. The spatial derivatives are calculated using compact finite difference approximations of high-order accuracy.
Since the high-order calculation of spatial derivatives adopted in the present work requires the solution of tridiagonal systems, it is important to study how to solve these systems and achieve good performance. In addition, the solution of linear systems is also present in the numerical solution of the Poisson equation. The results generated by the solution of the partial differential equations are compared to analytical solutions, in order to verify the accuracy of the implemented methods. The parallel numerical solution of the Navier-Stokes equations is compared with linear stability theory to validate the final code. The performance and the speedup of the code in question are also checked, comparing the execution time as a function of the number of processing elements.
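Because compact finite difference schemes lead to tridiagonal systems, it may help to see the standard serial solver that parallel approaches are usually compared against. The sketch below is the classical Thomas algorithm; it is offered as a baseline illustration only, not as the method used in the thesis, and the function name and argument layout are assumptions made here.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    All four lists have length n. lower[0] and upper[n - 1] are unused.
    Assumes no pivoting is needed (for example, a diagonally dominant
    matrix); the input lists are not modified.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# 3 x 3 example whose exact solution is [1, 2, 3].
solution = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                             [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
print(solution)
```

Both sweeps carry a dependency from one row to the next, which is why the algorithm does not parallelize directly and why dedicated studies of tridiagonal solvers matter for parallel codes.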
308

Estudo de escalabilidade de servidores baseados em eventos em sitemas multiprocessados: um estudo de caso completo / Scalability study of event-driven servers in multi-processed systems: a complete case study

Daniel de Angelis Cordeiro 27 October 2006
The explosive growth in the number of Internet users has led software architects to reassess questions related to the scalability of services that are made available on a large scale. Designing software architectures that show no performance degradation as the number of concurrent accesses grows is still a challenge. In this work, we investigate the impact of the operating system on questions related to the performance, parallelization, and scalability of interactive multi-user games. In particular, we study and extend the interactive multi-user game QuakeWorld, made publicly available by id Software under the GPL license. We created a parallelism model for the distributed simulation performed by the game and implemented it in the QuakeWorld server, with adaptations that allow the operating system to manage the execution of the generated workload appropriately. / The explosive growth in the number of Internet users made software architects reevaluate issues related to the scalability of services deployed on a large scale. It is still challenging to design software architectures that do not experience performance degradation when the number of concurrent accesses increases. In this work, we investigate the impact of the operating system on issues related to performance, parallelization, and scalability of interactive multiplayer games. In particular, we study and extend the interactive, multiplayer game QuakeWorld, made publicly available by id Software under the GPL license. We have created a new parallelization model for Quake's distributed simulation and implemented that model in the QuakeWorld server, with adaptations that allow the operating system to manage the execution of the generated workload in a more convenient way.
309

Simulating propeller and Propeller-Hull Interaction in OpenFOAM

Mehdipour, Reza January 2014
This is a master's thesis performed in the Hydrodynamics research group of the Department of Shipping and Marine Technology at Chalmers University of Technology and written for the Center for Naval Architecture at the Royal Institute of Technology, KTH. In order to meet increased requirements for efficient ship propulsion with low noise levels, it is important to consider the complete system, with both the hull and the propeller, in the simulation. OpenFOAM (Open Field Operation and Manipulation) provides different techniques for simulating a rotating propeller, with different physical and computational properties. MRF (the Multiple Reference Frame model) is perhaps the simplest approach and is a computationally efficient technique for modelling a rotating frame of reference. The sliding grid techniques provide a more complex way to simulate the propeller and its surrounding region, rotating the mesh and interpolating across the interface to capture transient effects. AMI, Arbitrary Mesh Interface, is a sliding grid implementation available in recent versions of OpenFOAM, introduced in the official releases after v2.1.0. In this study, the main objective is to compare these two techniques, MRF and AMI, by computing the open-water characteristics of the propeller with Reynolds-Averaged Navier-Stokes (RANS) computations, and to study the accuracy, the parallel performance, and the benefits of each approach. More specifically, a self-propelled ship is simulated to study the interaction between the hull and the propeller. In order to simplify the problem and decrease the computational complexity, the free surface is not considered. The ship under investigation is a 7000 DWT chemical tanker, which is the subject of a collaborative R&D project called STREAMLINE (strategic research for innovative marine propulsion concepts). In the self-propelled condition, the transient forces on the propeller are to be evaluated.
This study examines the results of the experimental work together with advanced CFD for accurate analysis and design of the propulsion. In this thesis, all simulations are conducted using parallel computing. Therefore, a scalability analysis is carried out to determine how the number of nodes affects the average computational time.
310

Reinforcement Learning in Eco-driving for Connected and Automated Vehicles

Zhu, Zhaoxuan January 2021
No description available.
