Global ETD Search

141	Automatic Parallelization of Simulation Code from Equation Based Simulation Languages Aronsson, Peter January 2002 Modern state-of-the-art equation-based object-oriented modeling languages such as Modelica have made it easy to model large and complex physical systems. When such models are simulated, simulation tools typically perform a number of optimizations on the underlying set of equations. The goal is better simulation performance through a smaller and less complex equation system. The tools then typically generate efficient code so that the simulations execute fast. However, as modeled systems grow more complex, the numbers of equations and variables increase. Parallel computing can therefore be exploited to simulate these large, complex systems efficiently. This thesis presents an automatic parallelization tool that produces an efficient parallel version of the simulation code. It does so by building a data dependency graph (task graph) from the simulation code and applying efficient scheduling and clustering algorithms to that graph. Various scheduling and clustering algorithms, adapted to the requirements of this type of simulation code, have been implemented and evaluated. These algorithms can also be used for functional dataflow languages in general, since they work on a task graph with dataflow edges between nodes. Results are given as speedup measurements and task graph statistics produced by the tool. The conclusion is that some of the investigated and adapted algorithms give reasonable measured speedups for some specific Modelica models. For example, a model of a thermofluid pipe gave a speedup of about 2.5 on 8 processors in a PC cluster. However, finding an algorithm that works well in general remains future work. / Report code: LiU-Tek-Lic-2002:06. state-of-the-art equation object oriented modeling automatic parallelization tool data dependency graph clustering algorithms Computer science
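The scheduling step can be illustrated with a generic greedy list scheduler. This is a minimal Python sketch, not the thesis tool. The priority rule (bottom level), the fixed `comm_cost` paid when a predecessor ran on another processor, and all function names are assumptions chosen for illustration. The example shows how communication delays eat into the speedup that the parallelism in the task graph would otherwise allow.

```python
import heapq
from collections import defaultdict


def list_schedule(costs, edges, num_procs, comm_cost=0.0):
    """Greedy list scheduling of a task graph onto identical processors.

    costs: dict mapping task -> execution time
    edges: list of (src, dst) dataflow dependencies
    comm_cost: hypothetical fixed delay when a predecessor ran on another processor
    Returns (makespan, assignment) with assignment[task] = (processor, start_time).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Priority: bottom level = longest cost path from a task to any exit task.
    blevel = {}

    def bottom_level(t):
        if t not in blevel:
            blevel[t] = costs[t] + max((bottom_level(s) for s in succs[t]), default=0.0)
        return blevel[t]

    for t in costs:
        bottom_level(t)

    indeg = {t: len(preds[t]) for t in costs}
    ready = [(-blevel[t], t) for t in costs if indeg[t] == 0]
    heapq.heapify(ready)  # the task with the highest bottom level is popped first
    proc_free = [0.0] * num_procs
    finish, assignment = {}, {}

    while ready:
        _, t = heapq.heappop(ready)
        best_proc, best_start = None, None
        for p in range(num_procs):
            # Inputs are available once every predecessor has finished,
            # plus the communication delay if it ran on another processor.
            data_ready = max(
                (finish[u] + (0.0 if assignment[u][0] == p else comm_cost) for u in preds[t]),
                default=0.0,
            )
            start = max(proc_free[p], data_ready)
            if best_start is None or start < best_start:
                best_proc, best_start = p, start
        assignment[t] = (best_proc, best_start)
        finish[t] = best_start + costs[t]
        proc_free[best_proc] = finish[t]
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-blevel[s], s))

    return max(finish.values()), assignment


# Fork-join graph: a -> {b, c, d, e} -> f
costs = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 2.0, "e": 2.0, "f": 1.0}
edges = [("a", x) for x in "bcde"] + [(x, "f") for x in "bcde"]
sequential, _ = list_schedule(costs, edges, 1)
parallel, _ = list_schedule(costs, edges, 4)
with_comm, _ = list_schedule(costs, edges, 4, comm_cost=1.0)
print(sequential / parallel, sequential / with_comm)  # 2.5, then about 1.67
```

With a nonzero `comm_cost`, spreading the four middle tasks over four processors delays the join task. Clustering algorithms address exactly this trade-off by keeping communicating tasks on the same processor.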
142	Java Code Transformation for Parallelization Iftikhar, Muhammad Usman January 2011 This thesis describes techniques for defining independent tasks in Java programs for parallelization. Existing Java parallelization APIs such as JOMP, Parallel Java, Deterministic Parallel Java, JConqurr and JaMP are discussed. JaMP is an implementation of OpenMP for Java, with a set of OpenMP directives and runtime library functions. However, JaMP has a source-to-bytecode compiler, which does not help in debugging parallel source code. It also offers no design-time syntax checking of JaMP directives, so mistakes are found only when the source code is compiled with the JaMP compiler. We therefore contributed to JaMP by adding a compiler option that produces parallel source code. We also created an Eclipse plug-in that supports design-time syntax checking of JaMP directives. The plug-in lets programmers obtain parallel source code with a single click instead of running shell commands with the JaMP compiler. Parallel Java Parallel processing Parallelization OpenMP JaMP JOMP Deterministic Parallel Java DPJ PJ Cluster Hybrid Amdahl's law Parallel APIs JConqurr
143	Development, analysis and applications of the technology for parallelization of numerical algorithms for solution of PDE and systems of PDEs Jakušev, Aleksandr 20 June 2008 This work presents a new parallelization technology suitable for parallelizing the linear algebra problems that arise when solving PDEs and systems of PDEs. The technology combines the strengths of the "data parallel" and "global memory" parallel programming models. By exploiting the peculiarities of a given class of problems, it makes it easy to write efficient code that can be parallelized semi-automatically. The work consists of three parts: a review of existing technologies, a description of the new technology, and various applications. Informatics Engineering PDE Parallelization Parallel algorithms C++ MPI
144	Development, analysis and applications of the technology for parallelization of numerical algorithms for solution of PDE and systems of PDEs Jakušev, Aleksandr 17 February 2009 This work presents a new parallelization technology suitable for parallelizing the linear algebra problems that arise when solving PDEs and systems of PDEs. The technology combines the strengths of the "data parallel" and "global memory" parallel programming models. By exploiting the peculiarities of a given class of problems, it makes it easy to write efficient code that can be parallelized semi-automatically. The work consists of three parts: a review of existing technologies, a description of the new technology, and various applications. Informatics Engineering Parallel algorithms Parallelization PDE systems solution C++ MPI
145	Collision detection methods using parallel computing Šiukščius, Martynas 26 August 2013 Collision detection is the task of finding intersections between two or more objects. In practice, collision detection is applied in computer games, nonlinear finite element analysis, particle hydrodynamics, multifunctional dynamics analysis, various physics simulations, and other fields. Many collision detection algorithms exist. The most popular are spatial subdivision, hierarchical structuring, and selection-and-sorting (sort-and-sweep) methods. This work studies the behavior of these algorithms on the CPU (central processing unit) and on the GPU (graphics processing unit) and analyzes ways of performing collision detection. It also examines the possibilities of accelerating the selected algorithms using CUDA (Compute Unified Device Architecture), a data processing architecture developed by Nvidia that uses the resources of the graphics processor for general-purpose computation. To achieve the goals of the work, several baseline versions of the algorithms and their adaptations for parallel computing were implemented. The baseline algorithms were then studied with respect to computation time, GPU memory usage, and various factors affecting running time. The work concludes with a discussion of the advantages of parallel programming for the topic under study. The experiments showed that offloading the computations to the GPU gives the studied algorithms 200 times higher performance than running them on the CPU. / Collision detection is a well-studied and active research field whose main problem is to determine whether two or more objects collide with each other in 3D virtual space. Collision detection affects many fields of study, including computer animation, physically based simulation, robotics, video games, and haptic applications. There is a wide variety of collision detection algorithms, three of which are spatial subdivision, octrees, and sort-and-sweep. This document gives a short summary of collision detection algorithms. Its main focus, however, is on analyzing and improving their performance on the CPU and the GPU separately by making use of CUDA, which Nvidia developed to enable the use of the graphics processor for general-purpose computation. The main goal of this research is achieved by analyzing implementations of the spatial subdivision, octree, and sort-and-sweep algorithms. The analysis covers general performance, parallelization performance, and various factors affecting performance. At the end of the document, the advantages of parallel programming for the present subject are discussed. Informatics Collision detection Graphics processor Computation acceleration Parallel computing GPU-based parallel computing Spatial subdivision Parallelization
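The sort-and-sweep idea named in the abstract can be sketched on the CPU in a few lines. This is a minimal sequential Python sketch, not the thesis's CUDA implementation, and it rests on assumed conventions. Boxes are axis-aligned bounding boxes given as `(min_xyz, max_xyz)` tuples, and touching boxes count as overlapping. The function names are hypothetical. Sorting by the minimum x coordinate means each box only has to be compared against boxes whose x-interval is still open.

```python
import random
from itertools import combinations


def aabb_overlap(a, b):
    """True if two axis-aligned boxes (min_xyz, max_xyz) overlap on all three axes."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[k] <= bmax[k] and bmin[k] <= amax[k] for k in range(3))


def sort_and_sweep(boxes):
    """Broad phase: sort by min x, then sweep while keeping the boxes whose x-interval is open."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active, pairs = [], set()
    for i in order:
        x_start = boxes[i][0][0]
        # A box that ended before this one starts cannot overlap it or any later box.
        active = [j for j in active if boxes[j][1][0] >= x_start]
        for j in active:
            if aabb_overlap(boxes[i], boxes[j]):
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs


def brute_force(boxes):
    """O(n^2) reference: test every pair."""
    return {(i, j) for i, j in combinations(range(len(boxes)), 2)
            if aabb_overlap(boxes[i], boxes[j])}


def random_box(rng, size=1.0, extent=10.0):
    lo = tuple(rng.uniform(0.0, extent) for _ in range(3))
    return lo, tuple(c + rng.uniform(0.1, size) for c in lo)


rng = random.Random(0)
boxes = [random_box(rng) for _ in range(200)]
# The sweep must report exactly the pairs that the brute-force check finds.
print(len(sort_and_sweep(boxes)), sort_and_sweep(boxes) == brute_force(boxes))
```

On a GPU the sort and the per-box sweeps are typically the parts that get parallelized. The sequential version above only shows why sorting prunes most pair tests.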
146	Efficient search-based strategies for polyhedral compilation: algorithms and experience in a production compiler Trifunovic, Konrad 04 July 2011 To exploit the performance of current multicore and heterogeneous architectures, compilers must perform increasingly complex program transformations. The search space of possible program optimizations is huge and unstructured. Selecting the best transformation and predicting its potential performance benefits is the major problem in today's optimizing compilers. A promising approach is to focus on automatic loop optimizations expressed in the polyhedral model. Current approaches to optimizing programs in the polyhedral model fall broadly into two classes. The first class is based on linear optimization of an analytical cost function. The second class is based on exhaustive iterative search. The first approach is fast but can easily miss the optimal solution. The iterative approach is more precise, but its running time can be prohibitive. In this thesis we present a novel search-based approach to program transformations in the polyhedral model. The new method combines the benefits of the current approaches (effectiveness and precision) while trying to minimize their drawbacks. Our approach is based on enumerating evaluations of a precise, nonlinear, performance-predicting cost function. Current practice is to use the polyhedral model in source-to-source compilers. We have implemented our techniques in a GCC framework based on the low-level three-address code representation. We show that this level of abstraction for the intermediate representation poses scalability challenges, and we show ways to overcome them. On the other hand, we show that the low-level IR abstraction opens new degrees of freedom that benefit search-based transformation strategies and polyhedral compilation in general. Computer Science/Other Compilers Programming languages Polyhedral model Program transformations Loop transformations Automatic parallelization Intermediate representation
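A toy version of searching over legal transformations and ranking them with a nonlinear cost model can be shown with loop permutations. The sketch below is a simplification built on assumptions throughout. It restricts transformations to loop interchange and represents dependences as constant distance vectors. A permutation counts as legal if every permuted vector stays lexicographically non-negative, meaning its first nonzero entry is positive or it is all zero. The cost function is made up and penalizes a large memory stride in the innermost loop. Real polyhedral schedules are affine and far richer.

```python
from itertools import permutations


def lex_nonnegative(vec):
    """A permuted distance vector is legal if its first nonzero entry is positive."""
    for x in vec:
        if x != 0:
            return x > 0
    return True  # all-zero vector: loop-independent dependence, unaffected by reordering


def is_legal(perm, deps):
    """perm lists loop indices from outermost to innermost."""
    return all(lex_nonnegative([d[k] for k in perm]) for d in deps)


def best_loop_order(num_loops, deps, cost):
    """Enumerate every legal loop order and return the one with the lowest predicted cost."""
    legal = [p for p in permutations(range(num_loops)) if is_legal(p, deps)]
    return min(legal, key=cost)


def make_stride_cost(strides):
    """Hypothetical nonlinear cost: the innermost loop's stride raised to the power 1.5."""
    return lambda perm: strides[perm[-1]] ** 1.5


# Loop nest over (i, j) that accesses A[j][i] in row-major storage:
# stepping i moves 1 element, stepping j moves N elements.
N = 1024
cost = make_stride_cost({0: 1, 1: N})
print(best_loop_order(2, [(1, 0)], cost))   # prints (1, 0): interchange is legal and cheaper
print(best_loop_order(2, [(1, -1)], cost))  # prints (0, 1): interchange would reverse a dependence
```

Exhaustive enumeration is affordable here only because a two-loop nest has two orders. The abstract's point is that the real search space is much larger, which is why the method it describes prunes and evaluates candidates more carefully.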
147	Systematic and Scalable Testing of Concurrent Programs Simsa, Jiri 16 December 2013 The challenge this thesis addresses is to speed up the development of concurrent programs by increasing the efficiency with which they can be tested and consequently evolved. Its goal is to develop methods and tools that help software engineers increase confidence in the correct operation of their programs. To achieve this goal, the thesis advocates testing concurrent software with a systematic approach capable of enumerating the possible executions of a concurrent program. The practicality of systematic testing is demonstrated by a novel software infrastructure that repeatedly executes a program test. The infrastructure controls the order in which concurrent events happen so that different behaviors are explored across test executions. In this way, systematic testing circumvents the limitations of traditional ad-hoc testing, which relies on chance to discover concurrency errors. However, systematic testing alone does not solve the problem of testing concurrent software. The number of ways in which a program's concurrent events can be ordered grows combinatorially, so the number of possible interleavings explodes, a problem referred to as state space explosion. To address it, this thesis studies techniques for quantifying the extent of state space explosion. It also explores several directions for mitigating it: parallel state space exploration, restricted runtime scheduling, and abstraction reduction. In the course of this research, the thesis pushes the practical limits of systematic testing by orders of magnitude, scaling it to real-world programs of unprecedented complexity.
Concurrent Programming Systematic Testing State Space Explosion State Space Exploration State Space Estimation Parallelization State Space Reduction Computer Sciences
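The state space explosion described above is easy to see on a toy program. The Python sketch below is a simulation, not the thesis's runtime infrastructure, which controls real threads, and all names in it are hypothetical. It enumerates every interleaving of two threads that each perform a non-atomic increment, runs each schedule, and groups the final values. Ad-hoc testing would have to be lucky to hit a lost-update schedule, whereas enumeration visits all of them. Two threads with a and b steps have C(a+b, a) interleavings, which grows very quickly.

```python
from math import comb


def interleavings(a, b):
    """Yield every merge of event lists a and b that preserves each thread's program order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule of a non-atomic 'counter += 1' performed by each thread."""
    shared, local = 0, {}
    for tid, op in schedule:
        if op == "read":
            local[tid] = shared          # load the shared counter into a register
        else:
            shared = local[tid] + 1      # store the incremented register value
    return shared


t1 = [("T1", "read"), ("T1", "write")]
t2 = [("T2", "read"), ("T2", "write")]
outcomes = {}
for schedule in interleavings(t1, t2):
    outcomes.setdefault(run(schedule), []).append(schedule)

for value, schedules in sorted(outcomes.items()):
    print(value, len(schedules))   # "1 4" (lost updates), then "2 2" (serial orders)
print(comb(4, 2), comb(20, 10))    # 6 interleavings here; 184756 with 10 steps per thread
```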
148	Acceleration of Compressible Flow Simulations with Edge Using Implicit Time Stepping Otero, Evelyn January 2014 Computational fluid dynamics (CFD) is a significant tool routinely used in design and optimization in the aerospace industry. Unsteady flows must often be computed, and the long compute times of standard methods have motivated the present work on new implicit methods to replace the standard explicit schemes. The implementation and numerical experiments were done with the Swedish national flow solver Edge, developed by FOI, universities, and collaboration partners. The work concentrates on a Lower-Upper Symmetric Gauss-Seidel (LU-SGS) type of time stepping. For the very anisotropic grids needed for Reynolds-Averaged Navier-Stokes (RANS) computations of turbulent boundary layers, LU-SGS is combined with a line-implicit technique. The inviscid flux Jacobians, which contribute to the diagonal blocks of the system matrix, are based on a flux splitting method with upwind-type dissipation. This gives control over diagonal dominance and artificial dissipation. The method is controlled by several parameters. Comprehensive numerical experiments were carried out to identify their influence and interaction so that close-to-optimal values can be suggested. For example, the optimal number of iterations per time step increases with the resolution of the computational grid. The numbering of the unknowns is important, and the numberings produced by Delaunay- and advancing-front-type mesh generators were among the best. The solver has been parallelized with the Message Passing Interface (MPI) for runs on multi-processor hardware, and its performance scales with the number of processors at least as efficiently as that of the explicit methods. The new method typically saves between 50 and 80 percent of the runtime, depending on the case, and the largest computations have reached 110M grid nodes. In the cases tested, the classical multigrid acceleration for 3D RANS simulations was found ineffective in combination with the LU-SGS solver using optimal parameters. Finally, preliminary time-accurate simulations of unsteady flows have shown promising results. / QC 20141201 Compressible CFD Convergence Acceleration Implicit Time-Stepping LU-SGS Upwind Type Dissipation Line-implicit Ordering Parallelization Parameters Multigrid
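The core of LU-SGS is an approximate factorization of the system matrix A = L + D + U into (D + L) D^-1 (D + U). This factorization is inverted with one forward and one backward Gauss-Seidel sweep. The sketch below shows that idea on a small dense scalar system with NumPy, used in a simple defect-correction loop. It is an assumption-laden illustration, not Edge's implementation, which works with block Jacobians, a line-implicit treatment, and time stepping. The function names are hypothetical. Because the sweeps visit unknowns in index order, renumbering the unknowns changes the approximation, which is consistent with the abstract's observation that the numbering matters.

```python
import numpy as np


def lusgs_apply(A, r):
    """Approximately solve A @ dx = r using (D + L) D^-1 (D + U) dx = r."""
    n = len(r)
    d = np.diag(A)
    # Forward sweep: (D + L) y = r
    y = np.zeros(n)
    for i in range(n):
        y[i] = (r[i] - A[i, :i] @ y[:i]) / d[i]
    # Backward sweep: (D + U) dx = D y
    dx = np.zeros(n)
    for i in reversed(range(n)):
        dx[i] = (d[i] * y[i] - A[i, i + 1:] @ dx[i + 1:]) / d[i]
    return dx


def solve(A, b, max_iters=100, tol=1e-10):
    """Defect correction: x <- x + M^-1 (b - A x), with M the LU-SGS factorization."""
    x = np.zeros(len(b))
    for it in range(max_iters):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return x, it
        x = x + lusgs_apply(A, r)
    return x, max_iters


# Diagonally dominant tridiagonal test matrix
n = 10
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iterations = solve(A, b)
print(iterations, np.allclose(x, np.linalg.solve(A, b)))
```

Each sweep only needs matrix-vector work on one triangle, so no matrix is factorized or stored beyond A itself. This is what makes the approach attractive for large implicit CFD systems.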
149	Scheduling workflows to optimize for execution time Peters, Mathias January 2018 Many functions in today's society depend heavily on data. Data drives everything from business decisions to self-driving cars to intelligent home assistants like Amazon Echo and Google Home. To make good decisions based on data, of which exabytes are generated every day, that data has to be processed. Data processing can be complex and time-consuming. One way of reducing the complexity is to create workflows consisting of several steps that together produce the right result. Klarna is an example of a company that relies on workflows for transforming and analyzing data. Since its core business involves analyzing customer data, performing those analyses faster brings direct business value in the form of better-informed decisions. The workflows Klarna uses are currently all written in sequential form. However, workflows in which independent tasks execute in parallel perform better than workflows that execute only one task at a time. Due to limitations in human attention span, parallelized workflows are harder for humans to write than sequential ones. In this work, a computer application was created that automates the parallelization of a workflow. This lets humans write sequential workflows while still getting the performance of parallelized workflows. The application takes a simple sequential workflow and identifies the dependencies in it. It then schedules the workflow with as much parallelism as those dependencies allow. Such a solution has not been created before. Experimental evaluation shows that parallelizing a sequential workflow used in daily production at Klarna can reduce execution time by up to 80%. This shows that the application can bring value to Klarna and other organizations that use workflows to analyze big data.
Hadoop Hive big data workflows scheduling parallelization automatic dependency identification dependency graph SQL HiveQL Information Systems
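The dependency identification and scheduling step can be sketched as follows. This is a hedged Python illustration, not the thesis application. It assumes each step already declares which tables it reads and writes. The thesis works on Hive/HiveQL workflows, and parsing the SQL is not shown. The sketch treats read-after-write, write-after-read, and write-after-write conflicts as dependencies and assumes unlimited parallel workers. The table and step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    reads: frozenset
    writes: frozenset
    duration: float


def dependencies(steps):
    """A step depends on an earlier step if they conflict on any table."""
    deps = {s.name: set() for s in steps}
    for j, later in enumerate(steps):
        for earlier in steps[:j]:
            raw = later.reads & earlier.writes    # read-after-write
            war = later.writes & earlier.reads    # write-after-read
            waw = later.writes & earlier.writes   # write-after-write
            if raw or war or waw:
                deps[later.name].add(earlier.name)
    return deps


def parallel_schedule(steps):
    """Group steps into stages that can run concurrently; return (stages, makespan)."""
    deps = dependencies(steps)
    finish, level = {}, {}
    for s in steps:  # the sequential order is already a valid topological order
        start = max((finish[d] for d in deps[s.name]), default=0.0)
        finish[s.name] = start + s.duration
        level[s.name] = max((level[d] + 1 for d in deps[s.name]), default=0)
    stages = {}
    for name, lv in level.items():
        stages.setdefault(lv, []).append(name)
    return [stages[k] for k in sorted(stages)], max(finish.values())


steps = [
    Step("load_orders", frozenset({"raw_orders"}), frozenset({"orders"}), 10.0),
    Step("load_customers", frozenset({"raw_customers"}), frozenset({"customers"}), 10.0),
    Step("build_report", frozenset({"orders", "customers"}), frozenset({"report"}), 5.0),
    Step("load_events", frozenset({"raw_events"}), frozenset({"events"}), 10.0),
]
stages, makespan = parallel_schedule(steps)
print(stages)  # [['load_orders', 'load_customers', 'load_events'], ['build_report']]
print(sum(s.duration for s in steps), makespan)  # 35.0 sequential vs 15.0 in parallel
```

Including write-after-read and write-after-write conflicts keeps the parallel run equivalent to the sequential one even when a later step overwrites a table that an earlier step still reads.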
150	Algorithms for adaptively restrained molecular dynamics Singh, Krishna Kant 08 November 2017 Molecular Dynamics (MD) is often used to simulate large and complex systems. However, simulating such systems over experimental time scales is still computationally challenging. In fact, the most computationally expensive step in MD is the computation of forces between particles. Adaptively Restrained Molecular Dynamics (ARMD) is a recently introduced particle simulation method that switches positional degrees of freedom on and off during simulation. Force computations mainly depend on inter-atomic distances. The force computation between particles whose positional degrees of freedom are off (restrained particles) can therefore be avoided. Forces involving active particles (particles whose positional degrees of freedom are on) are computed. To take advantage of the adaptability of ARMD, we designed novel algorithms to compute and update forces efficiently. We designed algorithms not only to construct neighbor lists but also to update them incrementally. Additionally, we designed a single-pass incremental force update algorithm that is almost two times faster than the previously designed two-pass incremental algorithm. The proposed algorithms are implemented and validated in the LAMMPS MD simulator, but they can be applied to other MD simulators. We assessed our algorithms on diverse benchmarks in both the microcanonical (NVE) and canonical (NVT) ensembles. In the NVE ensemble, ARMD allows users to trade precision for speed. In the NVT ensemble, it makes it possible to compute statistical averages faster. Finally, we introduce parallel algorithms for single-pass incremental force computations that take advantage of adaptive restraints using the Message Passing Interface (MPI) standard. Adaptive Simulation Molecular Dynamics Parallelization Active Neighbor List Single-Pass algorithm Incremental algorithms 004
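The incremental force update that ARMD enables can be illustrated in one dimension. The sketch below is an assumption-laden toy, not the LAMMPS implementation. Only the particles in `active` moved since the last step, so any pair of two restrained particles keeps its old force. A single pass over the pairs that involve an active particle subtracts each old pair contribution and adds the new one. The result is then checked against a full recomputation. The soft repulsive pair force, the cutoff `rc`, and all function names are hypothetical, and no neighbor list is used.

```python
import math
import random


def pair_force(xi, xj, k=1.0, rc=1.0):
    """Hypothetical soft repulsion: force on particle i from j, zero beyond cutoff rc."""
    r = xi - xj
    if r == 0.0 or abs(r) >= rc:
        return 0.0
    return k * (rc - abs(r)) * (1.0 if r > 0 else -1.0)


def full_forces(x):
    """Reference: recompute every pair (Newton's third law gives the partner's force)."""
    f = [0.0] * len(x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            fij = pair_force(x[i], x[j])
            f[i] += fij
            f[j] -= fij
    return f


def incremental_forces(f_old, x_old, x_new, active):
    """Single pass over pairs with at least one active particle: remove old, add new."""
    f = list(f_old)
    for i in active:
        for j in range(len(x_new)):
            # Skip self-pairs, and active-active pairs already visited from the smaller index.
            if j == i or (j in active and j < i):
                continue
            delta = pair_force(x_new[i], x_new[j]) - pair_force(x_old[i], x_old[j])
            f[i] += delta
            f[j] -= delta
    return f


rng = random.Random(1)
x_old = [rng.uniform(0.0, 5.0) for _ in range(20)]
f_old = full_forces(x_old)
active = {2, 5, 11}  # restrained particles keep their positions
x_new = [xi + rng.uniform(-0.3, 0.3) if i in active else xi for i, xi in enumerate(x_old)]
f_inc = incremental_forces(f_old, x_old, x_new, active)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(f_inc, full_forces(x_new))))
```

The work per step scales with the number of active particles times the number of partners, rather than with all pairs. This is the saving the adaptive restraints are meant to unlock.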
