41

Simulation of Modelica Models on the CUDA Architecture

Östlund, Per January 2009 (has links)
Simulations are very important for many reasons, and finding ways of accelerating them is therefore of great interest. This thesis studies the feasibility of automatically generating simulation code, executable on NVIDIA's CUDA architecture, for a limited set of Modelica models. For this purpose the OpenModelica compiler, an open-source Modelica compiler, was extended to generate CUDA code. The thesis presents an overview of the CUDA architecture and examines the problems that must be solved to generate efficient simulation code for it. Methods of finding parallelism in models, suited to the highly parallel CUDA architecture, are shown, and methods of using the available memory spaces on the architecture efficiently are also presented. The thesis shows that it is possible to generate CUDA simulation code for the chosen set of Modelica models. It also shows that models with a large amount of parallelism can achieve significant speedups compared with simulation on a conventional processor; a speedup of 4.6 was reached for one of the models used in the thesis. Several suggestions are also given for using the CUDA architecture even more efficiently for Modelica simulations.
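The per-equation parallelism such a code generator exploits can be pictured with a toy explicit-Euler solver step (a plain-Python sketch under invented assumptions — a three-equation decay model and a fixed step size — not the CUDA code the extended OpenModelica compiler generates):

```python
def euler_step(state, t, dt, deriv):
    """One explicit-Euler step. Each state variable's update depends only
    on the derivative vector, not on the other updates, so a GPU code
    generator can assign one thread per equation; the comprehension
    below stands in for that parallel loop."""
    dx = deriv(state, t)
    return [x + dt * d for x, d in zip(state, dx)]

# Hypothetical toy model: three independent decay equations x_i' = -k_i * x_i
ks = [1.0, 2.0, 3.0]
decay = lambda state, t: [-k * x for k, x in zip(ks, state)]

state = [1.0, 1.0, 1.0]
state = euler_step(state, 0.0, 0.1, decay)
```

In the generated CUDA code the derivative evaluations would be the per-thread work, with the thesis's memory-space choices deciding where `state` lives.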
42

Automatic and Semi-Automatic Adaptation of Program Optimizations

Bagnères, Lénaïc 30 September 2016 (has links)
Compilers usually offer a good trade-off between productivity and single-thread performance thanks to a wide range of automatic optimizations. However, they remain fragile when addressing the computation-intensive parts of applications on the parallel architectures with deep memory hierarchies that are now omnipresent. The shift to multicore architectures for desktop and embedded systems, as well as the emergence of cloud computing, raises the problem of the execution context's impact on performance. Firstly, we present a static-dynamic compiler optimization technique that generates loop-based programs with dynamic auto-tuning capabilities at very low overhead. Our strategy introduces switchable scheduling, a family of program transformations that allows switching between optimized versions while always performing useful computation. We present both the technique for generating self-adaptive programs based on switchable scheduling and experimental evidence of their ability to sustain high performance in a dynamic environment. Secondly, we propose a novel approach that aims at opening up the polyhedral compilation engines powering the loop-level optimization and parallelization frameworks of several modern compilers. Building on the state-of-the-art polyhedral representation of programs, we present ways to translate comprehensible syntactic transformation sequences to and from the internal polyhedral compiler abstractions. This new way to interact with high-level optimization frameworks provides invaluable feedback to programmers, along with the ability to design, replay, or refine loop-level compiler optimizations.
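The core idea of interchangeable versions — timing candidates while still producing useful output, then committing to the winner — can be caricatured in a few lines of Python (the version bodies and selection policy are invented for illustration; the thesis operates on compiled loop nests with polyhedral transformations, not Python lists):

```python
import time

def version_a(chunk):            # hypothetical optimized version 1
    return [x * x for x in chunk]

def version_b(chunk):            # hypothetical version 2: same semantics, different order
    out = [0] * len(chunk)
    for i in range(len(chunk) - 1, -1, -1):
        out[i] = chunk[i] * chunk[i]
    return out

VERSIONS = [version_a, version_b]

def autotune(chunks):
    """Run each candidate version once on a real chunk (so the timing
    work is itself useful output), then commit to the fastest for the
    remaining chunks -- a caricature of switching between
    interchangeable program versions at low overhead."""
    results, timings = [], []
    for version, chunk in zip(VERSIONS, chunks):
        t0 = time.perf_counter()
        results.extend(version(chunk))
        timings.append(time.perf_counter() - t0)
    best = VERSIONS[timings.index(min(timings))]
    for chunk in chunks[len(VERSIONS):]:
        results.extend(best(chunk))
    return results

squares = autotune([[1, 2], [3, 4], [5, 6]])
```

Whichever version wins the timing race, the output is the same — which is exactly the property that makes switching between versions safe mid-computation.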
43

Parallelization of Droplet Microfluidic Systems for the Sustainable Production of Micro-Reactors at Industrial Scale

Conchouso Gonzalez, David 04 1900 (has links)
At the cutting edge of chemical and biological research, innovation takes place in a field referred to as Lab on Chip (LoC), a multi-disciplinary area that combines biology, chemistry, electronics, microfabrication, and fluid mechanics. Within this field, droplets have been used as microreactors to produce advanced materials such as quantum dots, micro- and nanoparticles, and active pharmaceutical ingredients. The size of these microreactors offers distinct advantages that were not possible with batch technologies: lower reagent waste, minimal energy consumption, increased safety, and better control of reaction conditions such as temperature, residence times, and response times. One of the biggest drawbacks of this technology is its limited production volume, which prevents it from reaching industrial applications. The standard production rate for a single droplet microfluidic device is in the range of 1-10 mL/h, whereas industrial applications usually demand production rates several orders of magnitude higher. Although substantial work has recently been undertaken to develop scaled-out solutions that run several droplet generators in parallel, complex fluid mechanics and limits on manufacturing capacity have constrained this work to in-plane parallelization. This thesis investigates three-dimensional parallelization, proposing a microfluidic system comprised of a stack of droplet-generation layers working in the liquid-liquid flow regime. Its realization required a study of the characteristics of conventional droplet generators and the development of a fabrication process for 3D networks of microchannels. The combination of these studies resulted in a functional 3D parallelization system with the highest production rate (1 L/h) reported at the time of its publication.
Additionally, this architecture can reach industrially relevant production rates, as more devices can be integrated into the same chip and many chips can compose a manufacturing plant. The thesis also addresses concerns about system reliability and quality control by proposing capacitive and radio-frequency resonator sensors that can accurately measure increments as small as 2.4% in the water-in-oil volume fraction and identify errors during droplet production.
44

Scaling and distribution of Particle Swarm Optimization Algorithms on Microsoft Azure

Delis, Nikolaos January 2023 (has links)
Introduction. Particle Swarm Optimization (PSO) is a heavy-duty algorithm used to identify the optimum (maximum or minimum) of a formula with multiple unknown factors. PSO algorithms are widely used for various optimization problems, and all face the same challenge: being iterative algorithms that evaluate a mathematical formula in each iteration, they demand a high capacity of physical resources and are often time-consuming. This combination is even more challenging when executing a PSO algorithm on the cloud, since expensive resources used over a long time come at a high cost and cheaper resources struggle to perform the task. To avoid high costs and achieve the best possible performance, one needs to choose the correct computational resources and configure them accordingly. Objectives. The goal of this study is to identify the optimal tools and configurations for executing a PSO algorithm on Microsoft's cloud platform, Azure. To achieve that, we choose the Azure resources designed to perform deterministic tasks and to be distributed and scaled automatically by Azure: Azure Functions and Azure Durable Functions. We experiment with various configurations, and we collect and compare the results to draw conclusions about which combination performs best. Methods. To identify which combination of Azure resources and configurations performs best on the cloud (Microsoft Azure), we perform experiments and collect metrics, which we then aggregate and compare with each other as well as with the metrics collected by executing the same combination on-premises. During these experiments, we execute the same PSO algorithm using the same variables, the values of which were calculated before performing the experiments. Results. Upon performing the experiments, we collected the results of each experiment, consisting of the execution time, the number of zeros (beyond the decimal point) found in the result, and the Global Priority percentage that led to that result. The results indicate differences both between on-premises and on-cloud execution and between the various configurations and Azure resources. Conclusion. We succeeded in finding a combination, using Azure Durable Functions with the appropriate configuration, that vastly outperforms all others. The outcome of this study is that heavy-duty algorithms such as PSO can indeed be executed on Azure with significantly improved performance when using the right configuration and exploiting the resources to their full extent. Additionally, we learned that an appropriately configured Azure resource can even outperform an identical execution on-premises (using equal resources).
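The iteration the abstract describes can be sketched in a few lines (an illustrative minimal PSO, not the thesis's implementation; the coefficient values and the sphere test function are invented for the example):

```python
import random

def pso(f, dim, n_particles=20, iters=100, seed=1):
    """Minimal particle swarm minimization: each particle remembers its
    personal best, the swarm shares a global best, and velocities blend
    inertia with pulls toward both."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.4, 1.4          # inertia and attraction coefficients
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i][:]
            if f(xs[i]) < f(gbest):
                gbest = xs[i][:]
    return gbest

# Toy objective: the sphere function, whose minimum (0) is at the origin
best = pso(lambda p: sum(v * v for v in p), dim=2)
```

Within one iteration the particle updates are independent of each other (only `gbest` is shared), which is the property a fan-out over cloud workers such as Azure Durable Functions can exploit.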
45

Efficiently Solving the Exact Cover Problem in OpenMP

Hall, Leo January 2023 (has links)
The exact cover problem is an NP-complete problem with many widespread use cases, such as crew scheduling, railway scheduling, and benchmarking, as well as applications in set theory. Existing algorithms can, however, be slow when dealing with large datasets. To solve the problem quickly, this thesis uses a new method based on an existing algorithm, Algorithm X, utilizing parallelization with the task construct of OpenMP to produce better results, at best providing a speedup of 4.5 compared with a serial optimized implementation of Algorithm X. Since creating child tasks through the task construct incurs additional overhead, this thesis examines the effect granularity has on the solver by varying how many child tasks are created before the rest of the problem is solved serially. The optimal number of child tasks is found to be very low when using a high number of cores, and vice versa when using fewer cores. Since the new method created for this thesis can solve the exact cover problem faster than Algorithm X, it can prove beneficial for the problems mentioned earlier.
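For reference, the serial core being parallelized — Knuth's Algorithm X — can be sketched over plain Python sets (this is the set-based formulation for illustration, not the thesis's optimized or OpenMP implementation; the example matrix is Knuth's classic one):

```python
def algorithm_x(universe, sets, partial=None):
    """Knuth's Algorithm X over plain sets: branch on the element covered
    by the fewest candidate sets, try each candidate, and recurse on the
    sets disjoint from the chosen one. Returns one exact cover (a list
    of set names) or None if no cover exists."""
    if partial is None:
        partial = []
    if not universe:
        return partial
    # choose the most constrained element to branch on
    e = min(universe, key=lambda u: sum(1 for s in sets.values() if u in s))
    for name, s in sets.items():
        if e not in s:
            continue
        rest = {n: t for n, t in sets.items() if not (t & s)}
        found = algorithm_x(universe - s, rest, partial + [name])
        if found is not None:
            return found
    return None

sets = {"A": {1, 4, 7}, "B": {1, 4}, "C": {4, 5, 7},
        "D": {3, 5, 6}, "E": {2, 3, 6, 7}, "F": {2, 7}}
cover = algorithm_x({1, 2, 3, 4, 5, 6, 7}, sets)
```

An OpenMP-task version of this search would spawn one task per candidate set near the root of the recursion and fall back to the serial search below a granularity cutoff — which is exactly the child-task threshold the thesis varies.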
46

Using MPI One-Sided Communication for Parallel Sudoku Solving

Aili, Henrik January 2023 (has links)
This thesis investigates the scalability of parallel Sudoku solving using Donald Knuth's Dancing Links and Algorithm X with two different MPI communication methods: MPI one-sided communication and MPI send-receive. The study compares the performance of the two communication approaches and finds that one-sided communication exhibits better scalability in terms of both speedup and efficiency. The research contributes to the understanding of parallel Sudoku solving and provides insights into the suitability of MPI one-sided communication for this task, highlighting its advantages over send-receive in parallel Sudoku solving scenarios. This work lays a foundation for future investigations in distributed computing environments and for advancements in parallel Sudoku solving algorithms.
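The reduction that lets Dancing Links solve Sudoku at all is compact enough to sketch: the standard 324-column exact-cover encoding, shown here in Python for illustration (the MPI distribution itself, one-sided or send-receive, is not shown):

```python
def sudoku_row(r, c, v):
    """Columns covered by the choice 'digit v at cell (r, c)' in the
    standard 9x9 exact-cover encoding: 81 cell constraints, then 81 row,
    81 column, and 81 box constraints, packed into one flat range of
    324 column indices."""
    b = (r // 3) * 3 + c // 3                 # index of the 3x3 box
    return {0 * 81 + r * 9 + c,               # cell (r, c) is filled
            1 * 81 + r * 9 + (v - 1),         # row r contains v
            2 * 81 + c * 9 + (v - 1),         # column c contains v
            3 * 81 + b * 9 + (v - 1)}         # box b contains v
```

A full puzzle becomes 729 candidate rows (9 digits for each of 81 cells), givens pre-select some of them, and any exact cover of the 324 columns is a solved grid; Algorithm X with Dancing Links searches that matrix, and the MPI processes divide its branches.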
47

Quasi 3D Multi-stage Turbomachinery Pre-optimizer

Burdyshaw, Chad Eric 04 August 2001 (has links)
A pre-optimizer has been developed that modifies existing turbomachinery blades to create new geometries with improved values of selected aerodynamic coefficients, calculated using a linear panel method. These blade rows can then be further refined using a Navier-Stokes method for evaluation. The pre-optimizer was developed in hopes of reducing the overall CPU time required for optimization compared with using only Navier-Stokes evaluations. The primary method chosen to effect this optimization is a parallel evolutionary algorithm. Variations of this method have been analyzed and compared for convergence and degree of improvement. Test cases involved both single-row and multi-row turbomachinery; for each case, both single- and multiple-criteria fitness evaluations were used.
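The evolutionary loop at the heart of such a pre-optimizer can be sketched in a few lines (a bare-bones truncation-selection scheme with an invented toy objective standing in for the panel-method aerodynamic coefficients; the thesis's actual operators and fitness functions differ):

```python
import random

def evolve(fitness, init, mutate, pop_size=30, gens=50, seed=0):
    """Bare-bones evolutionary loop: keep the fitter half of the
    population each generation and refill with mutated copies
    (truncation selection with implicit elitism). Fitness evaluations
    are mutually independent, which is what makes the parallel version
    natural -- each candidate geometry can be scored on its own processor."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)                 # ascending: best (lowest) first
        half = pop[: pop_size // 2]
        pop = half + [mutate(rng, rng.choice(half)) for _ in half]
    return min(pop, key=fitness)

# Toy stand-in for an aerodynamic objective: minimize (x - 2)^2 + (y + 1)^2
best = evolve(
    fitness=lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2,
    init=lambda rng: (rng.uniform(-10, 10), rng.uniform(-10, 10)),
    mutate=lambda rng, p: (p[0] + rng.gauss(0, 0.3), p[1] + rng.gauss(0, 0.3)),
)
```

Swapping the toy objective for a cheap linear panel evaluation, and later a Navier-Stokes one, is precisely the two-stage cost trade-off the abstract describes.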
48

Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model

Bondhugula, Uday Kumar 11 September 2008 (has links)
No description available.
49

Adapting the polytope model for dynamic and speculative parallelization

Jimborean, Alexandra 14 September 2012 (has links)
In this thesis, we describe the design and implementation of a Thread-Level Speculation (TLS) software framework called VMAD, for "Virtual Machine for Advanced Dynamic analysis and transformation", whose main feature is the ability to speculatively parallelize a sequential loop nest in various ways by reordering its iterations. The transformation to apply is selected at runtime with the goals of minimizing the number of rollbacks and maximizing performance. We perform code transformations by applying the polyhedral model, which we have adapted for speculative parallelization at runtime. To this end, we build in advance a code pattern that is patched by our runtime system according to profiling information collected on samples of the execution. Adaptability is ensured by considering chunks of code of different sizes, executed successively, each parallelized differently or executed sequentially according to the observed memory-access behavior. We show on several benchmarks that our framework yields good performance on codes that could not be handled efficiently by previously proposed TLS systems.
50

Automated Reasoning Support for Invasive Interactive Parallelization

Moshir Moghaddam, Kianosh January 2012 (has links)
To parallelize a sequential source code, a parallelization strategy must be defined that transforms the sequential source code into an equivalent parallel version. Since parallelizing compilers can sometimes transform sequential loops and other well-structured code into parallel form automatically, we are interested in finding a solution for semi-automatically parallelizing code that compilers are not able to parallelize automatically, mostly because of the weakness of classical data and control dependence analysis, in order to simplify the process of transforming the code for programmers. Invasive Interactive Parallelization (IIP) hypothesizes that an intelligent system guiding the user through an interactive process can boost parallelization in this direction. The intelligent system's guidance relies on classical code analysis and pre-defined parallelizing transformation sequences. To support its main hypothesis, IIP suggests encoding parallelizing transformation sequences as IIP parallelization strategies that dictate default ways to parallelize various code patterns, using facts obtained both from classical source code analysis and directly from the user. In this project, we investigate how automated reasoning can support the IIP method in order to parallelize sequential code with acceptable performance, but faster than manual parallelization. We have looked at two problem areas in particular: divide-and-conquer algorithms and loops in source code. Our focus is on parallelizing four sequential legacy C programs — quicksort, merge sort, the Jacobi method, and matrix multiplication and summation — for both OpenMP and MPI environments, by developing an interactive parallelization assistance tool that provides users with the assistance needed to parallelize a sequential source code.
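The divide-and-conquer pattern such a tool targets — fork a task per independent half, solve serially below a cutoff — can be sketched in Python (threads stand in for OpenMP tasks purely to show the structure; Python's GIL means no real speedup here, and the cutoff and depth values are invented):

```python
from concurrent.futures import ThreadPoolExecutor

CUTOFF = 8   # below this size, recurse serially (task-granularity control)

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def par_mergesort(xs, pool, depth=2):
    """Sort xs, forking the left half into a pool task while the caller
    handles the right half, down to a fixed recursion depth."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    if depth > 0 and len(xs) > CUTOFF:
        left_future = pool.submit(par_mergesort, xs[:mid], pool, depth - 1)
        right = par_mergesort(xs[mid:], pool, depth - 1)
        left = left_future.result()
    else:
        left = par_mergesort(xs[:mid], pool, 0)
        right = par_mergesort(xs[mid:], pool, 0)
    return merge(left, right)

with ThreadPoolExecutor(max_workers=4) as pool:
    result = par_mergesort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0], pool)
```

In the OpenMP version the `submit`/`result` pair corresponds to a `task` and its `taskwait`, and choosing `CUTOFF` and the fork depth is exactly the granularity decision an interactive assistant would put to the user.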
