Global ETD Search

41	Analyzing OpenMP Parallelization Capabilities and Finding Thread Handling Optimums Olofsson, Simon, Olsson, Emrik January 2018 Utmaningar i modern processortillverkning begränsar klockfrekvensen för enkeltrådiga applikationer, vilket har resulterat i utvecklingen av flerkärniga processorer. Dessa processorer tillåter flertrådig exekvering och ökar därmed prestandan. För att undersöka möjligheterna med parallell exekvering används en Fast Fourier Transform algoritm där trådprestanda mäts för olika skapade tester med varierande problemstorlekar. Dessa tester körs på tre testsystem och använder olika sökalgoritmer för att dynamiskt justera antalet trådar vid exekvering. Denna prestanda jämförs sedan med den högsta möjliga prestanda som kan fås genom Brute-Forcing. Testerna använder OpenMP-instruktioner för att specificera antalet trådar som finns tillgängliga för programexekvering. För mindre problemstorlekar resulterar färre antal trådar i högre prestanda. Motsatsen gäller för större problemstorlekar, där många trådar föredras istället. Denna rapport visar att användning av alla tillgängliga trådar för ett system inte är optimalt i alla lägen då det finns en tydlig koppling mellan problemstorlek och det optimala antalet trådar för maximal prestanda. Detta gäller för alla tre testsystem som omfattas av rapporten. Metodiken som har använts för att skapa testerna har gjort det möjligt att dynamiskt kunna justera antalet trådar vid exekvering. Rapporten visar också att dynamisk justering av antalet trådar inte passar för alla typer av applikationer. / As physical limitations cap the clock frequencies available to a single thread, processor vendors increasingly build multi-core systems that support dividing processes across multiple threads for greater overall processing power. 
To examine parallelization capabilities, a fast Fourier transform algorithm is used to benchmark parallel execution and to compare the brute-forced optimum with results from various search algorithms and scenarios across three different testbed systems. These algorithms use OpenMP instructions to directly specify the number of threads available for program execution. For smaller problem sizes the tests heavily favour fewer threads, whereas the larger problems favour the native 'maximum' thread count. Several algorithms were used to compare ways of searching for the optimum thread values at runtime. We showed that running at maximum threads is not always the optimal choice, as there is a clear relationship between the problem size and the optimal thread count in the experimental setup across all three machines. The methods used also made it possible to identify a way to dynamically adjust the thread count during runtime of the benchmark; however, it is not certain that all applications would be suitable for this type of dynamic thread assignment. OpenMP Parallelization Capabilities Dynamic Thread Handling Performance OpenMP Parallell exekvering Dynamisk trådhantering Prestanda Optimum Computer Systems Datorsystem
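As a rough illustration of the comparison described in entry 41, the sketch below brute-forces the best thread count under a made-up cost model and compares it with a simple hill-climbing search. The cost model (`modeled_time`), the `overhead` constant, and all function names are assumptions chosen for illustration; they are not taken from the thesis, which measures real FFT runs under OpenMP.

```python
def modeled_time(problem_size, threads, overhead=50.0):
    """Hypothetical cost model: work split across threads plus a per-thread overhead."""
    return problem_size / threads + overhead * threads


def brute_force_optimum(problem_size, max_threads):
    """Evaluate every thread count and return the one with the lowest modeled time."""
    return min(range(1, max_threads + 1),
               key=lambda t: modeled_time(problem_size, t))


def hill_climb(problem_size, max_threads, start=1):
    """Add threads one at a time for as long as the modeled time keeps improving."""
    best = start
    while best < max_threads and (
        modeled_time(problem_size, best + 1) < modeled_time(problem_size, best)
    ):
        best += 1
    return best


if __name__ == "__main__":
    # Small problems should prefer few threads, large ones the maximum.
    for size in (50, 10_000, 1_000_000):
        print(size, brute_force_optimum(size, 16), hill_climb(size, 16))
```

Under this toy model the search agrees with the brute-force result because the cost curve has a single minimum; measured timings are noisier, so a real search would need repeated runs or a tolerance.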
42	Análise automática de acessos concorrentes a dados para refatoração de código sequencial em código paralelo OpenMP / Automatic analysis of concurrent access data for sequential code refactoring in OpenMP parallel code Tietzmann, Dionatan Kitzmann 16 December 2011 The manual transformation of sequential programs into parallel code is not an easy task. It requires considerable effort and attention from the developer, with a high risk of introducing errors that the programmer may not notice. One of these problems, strongly connected to shared-memory parallel programming, is the race condition. It occurs when more than one thread simultaneously manipulates a variable shared between them, so that the resulting value of the variable depends on the order of access. Exploring this difficulty, this work proposes an approach that helps the programmer refactor sequential code into OpenMP parallel code by automatically identifying variables that may be subject to race conditions. To this end, we propose a verification algorithm based on variable accesses and implement it using the framework of the Photran tool (a plugin for editing Fortran code integrated into the Eclipse IDE). For the empirical evaluation of the algorithm, we present tests with small programs and code examples showing how the tool operates in the expected cases. In addition, we present a case study based on a real and complex application, showing the ability of the algorithm to identify all the variables at risk and illustrating some of its known limitations. / A transformação manual de programas sequenciais em código paralelo não é uma tarefa fácil. Ela requer muito esforço e atenção do programador durante esse processo, correndo grande risco de se introduzir erros que podem não ser percebidos pelo programador. 
Um desses problemas, fortemente ligado à programação paralela em memória compartilhada, é a condição de corrida. Esse problema ocorre em virtude da manipulação concomitante realizada por mais de uma thread sobre uma variável compartilhada entre elas, sendo o resultado desta variável dependente da ordem de acesso. Explorando essa dificuldade, este trabalho propõe uma abordagem que auxilie o programador durante a refatoração de código sequencial para código paralelo OpenMP, identificando de forma automatizada variáveis que podem vir a ter problemas de condição de corrida. Para tanto, é proposto um algoritmo de verificação baseado no acesso às variáveis e feita a sua implementação utilizando-se do framework da ferramenta Photran (um plugin para edição de código Fortran integrado ao IDE Eclipse). Para fins de avaliação empírica do algoritmo, apresentam-se testes realizados com pequenos programas e exemplos de código, mostrando o funcionamento da ferramenta nos casos previstos. Além disso, apresenta-se um estudo de caso baseado em uma aplicação real e complexa, mostrando a habilidade do algoritmo em identificar as variáveis em risco, bem como ilustrando algumas de suas limitações conhecidas. Refatoração Programação paralela Condição de corrida OpenMP Refactoring Parallel programming Race condition OpenMP
43	Programmation haute performance pour architectures hybrides / High Performance Programming for Hybrid Architectures Habel, Rachid 19 November 2014 Les architectures parallèles hybrides constituées d'un grand nombre de noeuds de calcul multi-coeurs/GPU connectés en réseau offrent des performances théoriques très élevées, de l'ordre de quelques dizaines de TeraFlops. Mais la programmation efficace de ces machines reste un défi à cause de la complexité de l'architecture et de la multiplication des modèles de programmation utilisés. L'objectif de cette thèse est d'améliorer la programmation des applications scientifiques denses sur les architectures parallèles hybrides selon trois axes: réduction des temps d'exécution, traitement de données de très grande taille et facilité de programmation. Nous avons pour cela proposé un modèle de programmation à base de directives appelé DSTEP pour exprimer à la fois la distribution des données et des calculs. Dans ce modèle, plusieurs types de distribution de données sont exprimables de façon unifiée à l'aide d'une directive "dstep distribute" et une réplication de certains éléments distribués peut être exprimée par un "halo". La directive "dstep gridify" exprime à la fois la distribution des calculs ainsi que leurs contraintes d'ordonnancement. Nous avons ensuite défini un modèle de distribution et montré la correction de la transformation de code du domaine séquentiel au domaine distribué. À partir du modèle de distribution, nous avons dérivé un schéma de compilation pour la transformation de programmes annotés de directives DSTEP en des programmes parallèles hybrides. Nous avons implémenté notre solution sous la forme d'un compilateur intégré à la plateforme de compilation PIPS ainsi qu'une bibliothèque fournissant les fonctionnalités du support d'exécution, notamment les communications. 
Notre solution a été validée sur des programmes de calcul scientifiques standards tirés des NAS Parallel Benchmarks et des Polybenchs ainsi que sur une application industrielle. / Clusters of multicore/GPU nodes connected by a fast network offer very high theoretical peak performance, reaching tens of TeraFlops. Unfortunately, programming such architectures efficiently remains challenging because of their complexity and the diversity of existing programming models. The purpose of this thesis is to improve the programmability of dense scientific applications on hybrid architectures in three ways: reducing execution times, processing larger data sets and reducing the programming effort. We propose DSTEP, a directive-based programming model expressing both data and computation distribution. A large set of distribution types is unified in a "dstep distribute" directive, and the replication of some distributed elements can be expressed using a "halo". The "dstep gridify" directive expresses both the computation distribution and the scheduling constraints of loop iterations. We define a distribution model and demonstrate the correctness of the code transformation from the sequential domain to the parallel domain. From the distribution model, we derive a generic compilation scheme transforming DSTEP-annotated input programs into parallel hybrid ones. We have implemented this tool as a compiler integrated into the PIPS compilation workbench, together with a library offering the runtime functionality, especially the communication. Our solution is validated on scientific programs from the NAS Parallel Benchmarks and the PolyBenchs as well as on an industrial signal processing application. Compilation Mémoire distribuée Mémoire partagée Gpu Mpi OpenMP Compilation Distributed-Memory Shared-Memory Gpu Mpi OpenMP 004
44	Benchmarking of Vision-Based Prototyping and Testing Tools Balasubramanian, ArunKumar 08 November 2017 The demand for Advanced Driver Assistance System (ADAS) applications is increasing steadily, and their development requires efficient prototyping and real-time testing. ADTF (Automotive Data and Time Triggered Framework) is a software tool from Elektrobit used for the development, validation and visualization of vision-based applications, mainly for ADAS and autonomous driving. With ADTF, image or video data can be recorded and visualized, and data can be processed for testing both online and offline. The development of ADAS applications requires image and video processing, and the algorithms have to be highly efficient and must satisfy real-time requirements. The main objective of this research is to integrate the OpenCV library with the cross-platform ADTF. OpenCV provides efficient image processing algorithms that can be used with ADTF for quick benchmarking and testing. An ADTF filter framework has been developed in which OpenCV algorithms can be used directly, and the framework is tested with .DAT and image files using a modular approach. CMake is also explained in this thesis as a way to build the system with ease. The ADTF filters are developed in C++ in Microsoft Visual Studio 2010, and the OpenMP API is used for parallel programming. Bildverarbeitung OpenCV ADAS ADTF Image Processing OpenCV Advanced Driver Assistance Systems ADTF OpenMP ddc:004 Informatik Bildverarbeitung OpenCV ADAS OpenMP
45	Optimisation des temps de calculs dans le domaine de la simulation par éléments discrets pour des applications ferroviaires / Optimization of computation time in the numerical simulation using discrete element method. Application to railway ballast Hoang Thi Minh Phuong, Thi minh Phuong 05 December 2011 La dégradation géométrique de la voie ballastée sous circulation commerciale nécessite des opérations de maintenance fréquentes et onéreuses. La caractérisation du comportement des procédés de maintenance comme le bourrage, la stabilisation dynamique, est nécessaire pour proposer des améliorations en terme de méthode, paramétrage pour augmenter la pérennité des travaux. La simulation numérique d'une portion de voie soumise à un bourrage ou une stabilisation dynamique permet de comprendre les phénomènes physiques mis en jeu dans le ballast. Toutefois, la complexité numérique de ce problème concernant l'étude de systèmes à très grand nombre de grains et en temps de sollicitation long, demande donc une attention particulière pour une résolution à moindre coût. L'objectif de cette thèse est de développer un outil de calcul numérique performant qui permet de réaliser des calculs dédiés à ce grand problème granulaire moins consommateur en temps. La méthodologie utilisée ici se base sur l'approche Non Smooth Contact Dynamics (NSCD) avec une discrétisation par Éléments Discrets (DEM). Dans ce cadre, une méthode de décomposition de domaine (DDM) alliée à une parallélisation adaptée en environnement à mémoire partagée utilisant OpenMP sont appliquées pour améliorer l'efficacité de la simulation numérique. / The track deterioration rate is strongly influenced by the ballast behaviour under commercial traffic. In order to restore the initial track geometry, different maintenance processes are performed, such as tamping and dynamic stabilisation. 
A better understanding of the ballast behaviour under these operations on a portion of railway track is key to optimizing the process, limiting degradation and proposing concepts for a more homogeneous compaction. Numerical simulation is developed here to investigate the mechanical behaviour of ballast. However, the main difficulty of this research concerns the size of the simulated granular system, which grows both in the number of grains and in process duration. The purpose of this thesis is to develop an efficient numerical tool that enables faster computations devoted to large-scale granular samples. In this framework, the Non-Smooth Contact Dynamics (NSCD) approach for three-dimensional Discrete Element Method (DEM) simulations, improved by a Domain Decomposition Method (DDM) and parallelized with a shared-memory technique (using OpenMP), has been applied to study the mechanics of ballast media. Ballast Maintenance Simulation numérique Dem Nscd-nlgs DDM-OpenMP Railway ballast Maintenance process Numerical simulation Dem Nscd-nlgs DDM-OpenMP
46	Équilibrage dynamique de charge sur supercalculateur exaflopique appliqué à la dynamique moléculaire / Dynamic load balancing on exaflop supercomputer applied to molecular dynamics Prat, Raphaël 09 October 2019 Dans le contexte de la dynamique moléculaire classique appliquée à la physique de la matière condensée, les chercheurs du CEA étudient des phénomènes physiques à une échelle atomique. Pour cela, il est primordial d'optimiser continuellement les codes de dynamique moléculaire sur les dernières architectures de supercalculateurs massivement parallèles pour permettre aux physiciens d'exploiter la puissance de calcul pour reproduire numériquement des phénomènes physiques toujours plus complexes. Cependant, les codes de simulations doivent être adaptés afin d'équilibrer la répartition de la charge de calcul entre les cœurs d'un supercalculateur. Pour ce faire, dans cette thèse nous proposons d'incorporer la méthode de raffinement de maillage adaptatif dans le code de dynamique moléculaire ExaSTAMP. L'objectif est principalement d'optimiser la boucle de calcul effectuant le calcul des interactions entre particules grâce à des structures de données multi-threading et vectorisables. La structure permet également de réduire l'empreinte mémoire de la simulation. La conception de l’AMR est guidée par le besoin d'équilibrage de charge et d'adaptabilité soulevé par des ensembles de particules se déplaçant très rapidement au cours du temps. Les résultats de cette thèse montrent que l'utilisation d'une structure AMR dans ExaSTAMP permet d'améliorer les performances de celui-ci. L'AMR permet notamment de multiplier par 1.31 la vitesse d'exécution de la simulation d'un choc violent entraînant un micro-jet d'étain de 1 milliard 249 millions d'atomes sur 256 KNLs. De plus, l'AMR permet de réaliser des simulations qui jusqu'à présent n'étaient pas concevables comme l'impact d'une nano-goutte d'étain sur une surface solide avec plus de 500 millions d'atomes. 
/ In the context of classical molecular dynamics applied to condensed matter physics, CEA researchers study complex phenomena at the atomic scale. To do this, it is essential to continuously optimize molecular dynamics codes for the latest massively parallel supercomputer architectures, so that physicists can exploit their computing power to numerically reproduce ever more complex physical phenomena. Nevertheless, simulation codes must be adapted to balance the load between the cores of a supercomputer. To this end, in this thesis we propose to incorporate the Adaptive Mesh Refinement (AMR) method into the ExaSTAMP molecular dynamics code. The main objective is to optimize the computation loop that calculates particle interactions, using multi-threaded and vectorizable data structures. The structure also reduces the memory footprint of the simulation. The design of the AMR is guided by the need for load balancing and adaptability raised by sets of particles moving dynamically over time. The results of this thesis show that using an AMR structure in ExaSTAMP improves its performance. In particular, the AMR makes the simulation of a violent shock causing a tin microjet of 1.249 billion atoms run 1.31 times faster on 256 KNLs. In addition, simulations that were not conceivable so far can be carried out thanks to AMR, such as the impact of a tin nanodroplet on a solid surface with more than 500 million atoms. ExaStamp Hpc OpenMP Amr Graphe de dépendances de tâches Maillage adaptatif ExaStamp Hpc OpenMP Amr Task dependency graph Adaptive mesh refinement
47	OpenMP parallelization in the NFFT software library Volkmer, Toni 29 August 2012 We describe an implementation of a multi-threaded NFFT (nonequispaced fast Fourier transform) software library and present the parallelization approaches used. Besides the NFFT kernel, the NFFT on the two-sphere and the fast summation based on NFFT are also parallelized. The parallelization is based on OpenMP and the multi-threaded FFTW library. Furthermore, benchmarks for various cases are performed. The results show that an efficiency higher than 0.50 and up to 0.79 can still be achieved at 12 threads. NFFT FFT OpenMP parallel fast Fourier transform nonequispaced fast Fourier transform NFFT FFT OpenMP ddc:004 ddc:518 Schnelle Fourier-Transformation Parallelverarbeitung OpenMP
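To put the efficiency figures of entry 47 in perspective, the short sketch below uses the usual definition of parallel efficiency, speedup divided by the number of threads. That the abstract uses this definition is an assumption, and the function names are illustrative only.

```python
def speedup(serial_time, parallel_time):
    """Speedup: how many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, threads):
    """Parallel efficiency: speedup divided by the number of threads used."""
    return speedup(serial_time, parallel_time) / threads


def implied_speedup(eff, threads):
    """Invert the definition: the speedup implied by a given efficiency."""
    return eff * threads


if __name__ == "__main__":
    for eff in (0.50, 0.79):
        print(f"efficiency {eff:.2f} at 12 threads -> speedup {implied_speedup(eff, 12):.2f}")
```

Under that definition, an efficiency between 0.50 and 0.79 at 12 threads corresponds to a speedup between 6 and roughly 9.5 over the single-threaded run.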
48	Exploitation efficace des architectures parallèles de type grappes de NUMA à l’aide de modèles hybrides de programmation Clet-Ortega, Jérôme 18 April 2012 Les systèmes de calcul actuels sont généralement des grappes de machines composées de nombreux processeurs à l'architecture fortement hiérarchique. Leur exploitation constitue le défi majeur des implémentations de modèles de programmation tels MPI ou OpenMP. Une pratique courante consiste à mélanger ces deux modèles pour bénéficier des avantages de chacun. Cependant ces modèles n'ont pas été pensés pour fonctionner conjointement ce qui pose des problèmes de performances. Les travaux de cette thèse visent à assister le développeur dans la programmation d'applications de type hybride. Ils s'appuient sur une analyse de la hiérarchie architecturale du système de calcul pour dimensionner les ressources d'exécution (processus et threads). Plutôt qu'une approche hybride classique, créant un processus MPI multithreadé par noeud, nous évaluons de façon automatique des solutions alternatives, avec plusieurs processus multithreadés par noeud, mieux adaptées aux machines de calcul modernes. / Modern computing servers usually consist of clusters of computers with several multi-core CPUs featuring a highly hierarchical hardware design. The major challenge for implementations of programming models is to exploit these servers efficiently. Combining two types of models, such as MPI and OpenMP, is a current trend to achieve this. However, these programming models were not designed to work together, which leads to performance issues. In this thesis, we propose to assist the programmer who develops hybrid applications. We rely on an analysis of the computing system architecture to set the number of processes and threads. 
Rather than a classical hybrid approach, that is to say creating one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, better suited to modern computing systems. Calcul hautes performances Mpi OpenMP Numa Parallélisme (Informatique) Modèle de programmation Hiérarchie de mémoire (Informatique) High performance computing Mpi OpenMP Numa Parallel Computing Programming models Memory Hierarchy
49	Construção automática de mosaicos de imagens digitais aéreas agrícolas utilizando transformada SIFT e processamento paralelo / Automatic construction of mosaics from aerial digital images agricultural using SIFT transform and parallel processing Tarallo, André de Souza 26 August 2013 A construção automática de grandes mosaicos a partir de imagens digitais de alta resolução é uma área de grande importância, encontrando aplicações em diferentes áreas, como agricultura, meteorologia, sensoriamento remoto e biológicas. Na agricultura, a eficiência no processo de tomada de decisão para controle de pragas, doenças ou queimadas está relacionada com a obtenção rápida de informações. Até o presente momento este controle vem sendo feito de maneira semiautomática, necessitando obter o modelo digital do terreno, fazer a ortorretificação de imagens, inserir marcações manualmente no solo, para que um software possa construir um mosaico de maneira convencional. Para automatizar este processo, o presente projeto propõe três metodologias (1, 2, 3) baseadas em algoritmos já consolidados na literatura (SIFT, BBF e RANSAC) e processamento paralelo (OpenMP), utilizando imagens aéreas agrícolas de alta resolução, de pastagens e plantações de diversas culturas. As metodologias diferem na maneira como os cálculos são realizados para a construção dos mosaicos. Construir mosaicos com este padrão de imagem não é uma tarefa trivial, pois requer alto esforço computacional para processamento destas imagens. As metodologias incluem um pré-processamento das imagens para minimizar possíveis distorções que surgem no processo de aquisição de imagens e contêm também algoritmos para suavização das emendas das imagens no mosaico. 
A base de imagens, denominada base de imagens sem redimensionamento, foi construída a partir de imagens com dimensão de 2336 x 3504 pixels (100 imagens divididas em 10 grupos de 10 imagens), obtidas na região de Santa Rita do Sapucaí - MG, as quais foram utilizadas para validar a metodologia. Outra base de imagens, referida como base de imagens redimensionada, contêm 200 imagens de 533 x 800 pixels (10 grupos de 20 imagens) e foi utilizada para avaliação de distorção para comparação com os softwares livres Autostitch e PTGui, os quais possuem parâmetros pré-definidos para a resolução de 533 x 800 pixels. Os resultados do tempo de processamento sequencial para as três metodologias evidenciaram a metodologia 3 com o menor tempo, sendo esta 63,5% mais rápida que a metodologia 1 e 44,5% do que a metodologia 2. O processamento paralelo foi avaliado para um computador com 2, 4 e 8 threads (4 núcleos físicos e 4 núcleos virtuais), reduzindo em 60% o tempo para a construção dos mosaicos de imagens para a metodologia 1. Verificou-se que um computador com 4 threads (núcleos físicos) é o mais adequado em tempo de execução e Speedup, uma vez que quando se utilizam 8 threads são incluídos as threads virtuais. Os resultados dos testes de distorção obtidos evidenciam que os mosaicos gerados com a metodologia 1 apresentam menores distorções para 7 grupos de imagens em um total de 10. Foram também avaliadas as distorções nas junções de cinco mosaicos constituídos apenas por pares de imagens utilizando a metodologia 3, evidenciando que a metodologia 3 apresenta menor distorção para 4 mosaicos, em um total de 5. O presente trabalho contribui com a metodologia 2 e 3, com a minimização das distorções das junções do mosaico, com o paralelismo em OpenMP e com a avaliação de paralelismo com MPI. 
/ The automatic construction of large mosaics from high-resolution digital images is an area of great importance, with applications in different fields, especially agriculture, meteorology, biology and remote sensing. In agriculture, the efficiency of decision making is linked to obtaining faster and more accurate information, especially for the control of pests, diseases or fires. So far this has been done semiautomatically: it is necessary to obtain the digital terrain model, orthorectify the images and place markings on the ground manually, so that software can build a mosaic in the conventional way. To automate this process, this project proposes three methodologies (1, 2, 3) based on algorithms already well established in the literature (SIFT, BBF and RANSAC) and parallel processing (OpenMP), using high-resolution agricultural aerial images of pastures and plantations of various crops. The methodologies differ in how the calculations are performed for the construction of mosaics. Building mosaics from this kind of image is not a trivial task, as it requires high computational effort to process these images. The methodologies include a pre-processing of images to minimize possible distortions that arise in the process of image acquisition and also contain algorithms for smoothing the seams of the images in the mosaic. The image database, called the image database without scaling, was constructed from images with dimensions of 2336 x 3504 pixels (100 images divided into 10 groups of 10 images), obtained in the region of Santa Rita do Sapucaí - MG, which were used to validate the methodology. Another image database, referred to as the resized image database, contains 200 images of 533 x 800 pixels (10 groups of 20 images). It was used to evaluate distortion in comparison with the free software packages Autostitch and PTGui, which have pre-defined parameters for the resolution of 533 x 800 pixels. 
The sequential processing times for the three methodologies showed methodology 3 to have the shortest time, being 63.5% faster than methodology 1 and 44.5% faster than methodology 2. Parallel processing was evaluated on a computer with 2, 4 and 8 threads (4 physical cores and 4 virtual cores), reducing the time to build the image mosaics by 60% for methodology 1. Running with 4 threads (the physical cores) was found to be the most appropriate in terms of execution time and speedup, since using 8 threads brings in the virtual threads. The distortion test results show that the mosaics generated with methodology 1 have lower distortion for 7 groups of images out of a total of 10. Distortions at the junctions of five mosaics consisting only of pairs of images were also evaluated using methodology 3, indicating that methodology 3 has less distortion for 4 mosaics out of a total of 5. The contributions of this work are methodologies 2 and 3, the minimization of distortions at the mosaic junctions, the parallelism in OpenMP and the assessment of parallelism with MPI. OpenMP Speedup Aerial images agricultural Automatic mosaic Imagens aéreas agrícolas Mosaico automático OpenMP Parallel processing Processamento paralelo SIFT transform Speedup Transformada SIFT
50	Hybrid parallel algorithms for solving nonlinear Schrödinger equation / Hibridni paralelni algoritmi za rešavanje nelinearne Šredingerove jednačine Lončar Vladimir 17 October 2017 Numerical methods and algorithms for solving partial differential equations, especially parallel algorithms, are an important research topic, given their very broad applicability in all areas of science. Rapid advances in computer technology open up new possibilities for the development of faster algorithms and numerical simulations of higher resolution. This is achieved through parallelization at different levels, which practically all current computers support. In this thesis we develop parallel algorithms for solving one kind of partial differential equation known as the nonlinear Schrödinger equation (NLSE) with a convolution integral kernel. Equations of this type arise in many fields of physics such as nonlinear optics, plasma physics and physics of ultracold atoms, as well as economics and quantitative finance. We focus on a special type of NLSE, the dipolar Gross-Pitaevskii equation (GPE), which characterizes the behavior of ultracold atoms in the state of Bose-Einstein condensation. We present novel parallel algorithms for numerically solving the GPE for a wide range of modern parallel computing platforms, from shared memory systems and dedicated hardware accelerators in the form of graphics processing units (GPUs), to heterogeneous computer clusters. For shared memory systems, we provide an algorithm and implementation targeting multi-core processors using OpenMP. We also extend the algorithm to GPUs using the CUDA toolkit and combine the OpenMP and CUDA approaches into a hybrid, heterogeneous algorithm that is capable of utilizing all available resources on a single computer. 
Given the inherent memory limitation of a single computer, we develop a distributed memory algorithm based on the Message Passing Interface (MPI) and the previous shared memory approaches. To maximize the performance of hybrid implementations, we optimize the parameters governing the distribution of data and workload using a genetic algorithm. Visualization of the increased volume of output data, enabled by the efficiency of the newly developed algorithms, represents a challenge in itself. To address this, we integrate the implementations with the state-of-the-art visualization tool (VisIt), and use it to study two use-cases which demonstrate how the developed programs can be applied to simulate real-world systems. / Numerički metodi i algoritmi za rešavanje parcijalnih diferencijalnih jednačina, naročito paralelni algoritmi, predstavljaju izuzetno značajnu oblast istraživanja, uzimajući u obzir veoma široku primenljivost u svim oblastima nauke. Veliki napredak informacione tehnologije otvara nove mogućnosti za razvoj bržih algoritama i numeričkih simulacija visoke rezolucije. Ovo se ostvaruje kroz paralelizaciju na različitim nivoima koju poseduju praktično svi moderni računari. U ovoj tezi razvijeni su paralelni algoritmi za rešavanje jedne vrste parcijalnih diferencijalnih jednačina poznate kao nelinearna Šredingerova jednačina sa integralnim konvolucionim kernelom. Jednačine ovog tipa se javljaju u raznim oblastima fizike poput nelinearne optike, fizike plazme i fizike ultrahladnih atoma, kao i u ekonomiji i kvantitativnim finansijama. 
Teza se bavi posebnim oblikom nelinearne Šredingerove jednačine, Gros-Pitaevski jednačinom sa dipol-dipol interakcionim članom, koja karakteriše ponašanje ultrahladnih atoma u stanju Boze-Ajnštajn kondenzacije. U tezi su predstavljeni novi paralelni algoritmi za numeričko rešavanje Gros-Pitaevski jednačine za širok spektar modernih računarskih platformi, od sistema sa deljenom memorijom i specijalizovanih hardverskih akceleratora u obliku grafičkih procesora, do heterogenih računarskih klastera. Za sisteme sa deljenom memorijom, razvijen je algoritam i implementacija namenjena višejezgarnim centralnim procesorima korišćenjem OpenMP tehnologije. Ovaj algoritam je proširen tako da radi i u okruženju grafičkih procesora korišćenjem CUDA alata, a takođe je razvijen i predstavljen hibridni, heterogeni algoritam koji kombinuje OpenMP i CUDA pristupe i koji je u stanju da iskoristi sve raspoložive resurse jednog računara. Imajući u vidu inherentna ograničenja raspoložive memorije koju pojedinačan računar poseduje, razvijen je i algoritam za sisteme sa distribuiranom memorijom zasnovan na Message Passing Interface tehnologiji i prethodnim algoritmima za sisteme sa deljenom memorijom. Da bi se maksimalizovale performanse razvijenih hibridnih implementacija, parametri koji određuju raspodelu podataka i računskog opterećenja su optimizovani korišćenjem genetskog algoritma. Poseban izazov je vizualizacija povećane količine izlaznih podataka, koji nastaju kao rezultat efikasnosti novorazvijenih algoritama. Ovo je u tezi rešeno kroz integraciju implementacija sa najsavremenijim alatom za vizualizaciju (VisIt), što je omogućilo proučavanje dva primera koji pokazuju kako razvijeni programi mogu da se iskoriste za simulacije realnih sistema.
