Automatic methods for distribution of data-parallel programs on multi-device heterogeneous platforms. Moreń, Konrad, 07 February 2024.
This thesis deals with the problem of finding effective methods for programming and distributing data-parallel applications on heterogeneous multiprocessor systems. These systems are ubiquitous today, ranging from embedded devices with low power consumption to high-performance distributed systems, and demand for them is growing steadily, driven by the increasing number of data-intensive applications and the general growth of digital applications. Systems with multiple devices offer higher performance but unfortunately add complexity to software development. Programming heterogeneous multiprocessor systems presents several unique challenges compared to single-device systems.
The first challenge is the programmability of such systems. Despite constant innovation in programming languages and frameworks, these remain limited: they are either platform-specific, like CUDA, which supports only NVIDIA GPUs, or operate at a low level of abstraction, like OpenCL. Developers who design OpenCL programs must manually distribute data to the different devices and synchronize the distributed computations. These limitations reduce developer productivity. To reduce programming complexity and development time, this thesis introduces two approaches that automatically distribute and synchronize data-parallel workloads. Another challenge is multi-device hardware utilization. In contrast to single-device platforms, the application optimization process for a multi-device system is even more complicated.
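To illustrate the burden the thesis automates, the following minimal Python sketch mimics the split-compute-synchronize pattern a multi-device programmer would otherwise code by hand. The two "devices" are simulated with a thread pool, and the kernel, split ratio, and names are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def kernel(chunk: np.ndarray) -> np.ndarray:
    # Stand-in for a data-parallel device kernel (element-wise work).
    return chunk * 2.0 + 1.0

def run_on_two_devices(data: np.ndarray, split_ratio: float = 0.5) -> np.ndarray:
    # Manual distribution: partition the input between the two devices.
    cut = int(len(data) * split_ratio)
    parts = [data[:cut], data[cut:]]
    # Each worker plays the role of one device's command queue.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(kernel, p) for p in parts]
        # Manual synchronization: wait for both devices, then merge.
        results = [f.result() for f in futures]
    return np.concatenate(results)

print(run_on_two_devices(np.arange(8, dtype=np.float64)))
```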
Application designers must apply not only optimization strategies specific to a single-device architecture but must also carefully balance the workload between all the platform's processors. For the balancing problem, this thesis proposes a method based on a platform model created with machine learning techniques. Using machine learning, the thesis automatically builds a reliable platform model that is portable and adaptable to different platform setups with minimal manual involvement from the programmers.
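As a rough illustration of such a learned platform model (the feature set, training numbers, and regressor below are assumptions for the sketch, not the thesis's actual model), one could fit a per-device throughput predictor and turn its predictions into workload split ratios:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical profiling samples: (problem size, arithmetic intensity) per run.
features = np.array([[1e6, 2.0], [4e6, 2.0], [1e6, 8.0], [4e6, 8.0]])
# Measured throughput (items/s) per device for those runs (made-up numbers).
gpu_throughput = np.array([2.0e8, 2.2e8, 3.5e8, 3.8e8])
cpu_throughput = np.array([0.8e8, 0.9e8, 1.0e8, 1.1e8])

gpu_model = LinearRegression().fit(features, gpu_throughput)
cpu_model = LinearRegression().fit(features, cpu_throughput)

def split_ratios(size: float, intensity: float) -> tuple[float, float]:
    # Balance so both devices finish together: give each device work
    # proportional to its predicted throughput.
    x = np.array([[size, intensity]])
    g, c = gpu_model.predict(x)[0], cpu_model.predict(x)[0]
    return g / (g + c), c / (g + c)

print(split_ratios(2e6, 4.0))  # e.g. roughly (0.75, 0.25) GPU/CPU share
```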
High-Performance Analytics (HPA). Soukup, Petr, January 2012.
The aim of this thesis on High-Performance Analytics is to gain a structured overview of high-performance methods for data analysis. The introduction concerns definitions of primary and secondary data analysis, and the primary systems that are not appropriate for analytical processing. The use of mobile devices, modern information technologies and other factors has rapidly changed the character of data. The major part of the thesis is devoted to the historical turn towards new approaches to analytical data analysis brought about by Big Data, a very frequent term these days. Towards the end of the thesis, the system resources that play a major role in these new approaches, as well as the technological solutions of High-Performance Analytics themselves, are discussed. The second, practical part of the thesis compares the performance of conventional methods for data analysis with one of the high-performance methods of High-Performance Analytics (specifically, In-Memory Analytics). The individual solutions are compared in an identical environment on a High-Performance Analytics server. The methods are applied to a data sample whose volume is increased after each round of measurement. The conclusion evaluates the test results and discusses the possible uses of the individual High-Performance Analytics methods.
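A minimal sketch of the kind of scaling benchmark the practical part describes (the aggregation query, file format, and sample sizes are assumptions, not the thesis's actual setup): time the same analysis executed from disk versus fully in memory while the sample grows each round.

```python
import time
import numpy as np
import pandas as pd

def make_sample(n_rows: int) -> pd.DataFrame:
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        "key": rng.integers(0, 100, n_rows),
        "value": rng.random(n_rows),
    })

def timed(fn) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

for n in (10**5, 10**6, 10**7):          # grow the sample each round
    df = make_sample(n)
    df.to_csv("sample.csv", index=False)  # "conventional": data lives on disk
    disk = timed(lambda: pd.read_csv("sample.csv").groupby("key")["value"].mean())
    mem = timed(lambda: df.groupby("key")["value"].mean())  # in-memory analytics
    print(f"{n:>9} rows  disk {disk:.3f}s  in-memory {mem:.3f}s")
```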
Auto-tuning for the automatic sparse matrix optimal compression format detection. Mehrez, Ichrak, 27 September 2018.
Several applications in scientific computing deal with large sparse matrices having regular or irregular structures. In order to reduce both the required memory space and the computing time, these matrices require the use of a particular data storage (or compression) structure as well as the use of parallel/distributed target architectures. The choice of the most appropriate compression format generally depends on several factors, in particular the matrix structure, the numerical method and the target architecture. Given the diversity of these factors, a choice optimized for one input data set will likely perform poorly on another. Hence the interest of a system allowing the automatic selection of the Optimal Compression Format (OCF) by taking these different factors into account. This thesis is written in this context.
We detail our approach by presenting the design of an auto-tuner system for OCF selection. Given a sparse matrix, a numerical method, a parallel programming model and an architecture, our system can automatically select the OCF. In a first step, we validate our modeling by a case study that concerns (i) the Horner scheme, and then the sparse matrix-vector product (SMVP), as numerical methods; (ii) CSC, CSR, ELL, and COO as compression formats; (iii) data parallelism as a programming model; and (iv) a multicore platform as target architecture. This study allows us to extract a set of metrics and parameters that affect OCF selection. We observe that data-parallel metrics alone are not sufficient to choose the most suitable format. Therefore, we define new metrics involving the number of operations and the number of indirect data accesses, and propose a new decision process taking into account both the data-parallel model analysis and the algorithm analysis. In the second step, we use machine learning algorithms to predict the OCF for a given sparse matrix. An experimental study using a multicore parallel platform and dealing with random and/or real-world sparse matrices validates our approach and evaluates its performance. As future work, we aim to validate our approach using other parallel platforms such as GPUs.
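For illustration, the following sketch (using SciPy's standard format names, which match three of the four formats studied; the density and the metric choices are simplified assumptions) builds COO and CSR representations, runs the sparse matrix-vector product, and extracts the kind of structural features a format-selection model could consume:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# A random sparse matrix, as in the experimental study (density is an assumption).
A_coo = sparse_random(1000, 1000, density=0.01, format="coo", random_state=42)
A_csr = A_coo.tocsr()
x = np.ones(1000)

# The SMVP kernel whose performance depends on the compression format.
y = A_csr @ x

# Simple structural metrics that could feed a format-selection classifier.
nnz_per_row = np.diff(A_csr.indptr)           # CSR row pointers give row lengths
metrics = {
    "nnz": A_csr.nnz,
    "density": A_csr.nnz / (A_csr.shape[0] * A_csr.shape[1]),
    "max_row_nnz": int(nnz_per_row.max()),    # drives ELL padding overhead
    "row_nnz_variance": float(nnz_per_row.var()),  # irregularity indicator
}
print(metrics)
```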
Algorithms for Molecular Dynamics Simulations. Hedman, Fredrik, January 2006.
Methods for performing large-scale parallel Molecular Dynamics (MD) simulations are investigated. A perspective on the field of parallel MD simulations is given. Hardware and software aspects are characterized and the interplay between the two is briefly discussed. A method for performing ab initio MD is described; the method essentially recomputes the interaction potential at each time step. It has been tested on a system of liquid water by comparing results with other simulation methods and experimental results. Different strategies for parallelization are explored. Furthermore, data-parallel methods for short-range and long-range interactions on massively parallel platforms are described and compared. Next, a method for treating electrostatic interactions in MD simulations is developed. It combines the traditional Ewald summation technique with the nonuniform fast Fourier transform (ENUF for short). The method scales as N log N, where N is the number of charges in the system. ENUF behaves very similarly to Ewald summation and can be easily and efficiently implemented in existing simulation programs. Finally, an outlook is given and some directions for further developments are suggested.
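For context, the standard Ewald decomposition (textbook form, not necessarily the thesis's notation) splits the slowly converging electrostatic sum into a short-range real-space part, a smooth reciprocal-space part, and a self-term; it is the reciprocal sum that FFT-based methods such as ENUF evaluate in O(N log N):

```latex
% Standard Ewald splitting of the Coulomb energy, with splitting parameter \alpha.
\begin{align}
E_{\mathrm{real}} &= \frac{1}{2} \sum_{i \neq j} q_i q_j
    \frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}}, \\
E_{\mathrm{recip}} &= \frac{1}{2V} \sum_{\mathbf{k} \neq \mathbf{0}}
    \frac{4\pi}{k^2} \, e^{-k^2/4\alpha^2}
    \Big| \sum_{j=1}^{N} q_j \, e^{i \mathbf{k} \cdot \mathbf{r}_j} \Big|^2, \\
E_{\mathrm{self}} &= -\frac{\alpha}{\sqrt{\pi}} \sum_{j=1}^{N} q_j^2.
\end{align}
```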
Petri Net Model Based Energy Optimization of Programs Using Dynamic Voltage and Frequency Scaling. Arun, R, 06 1900.
High power dissipation and on-chip temperature limit performance and affect reliability in modern microprocessors. For servers and data centers, they determine the cooling cost, whereas for handheld and mobile systems, they limit continuous usage. For mobile systems, energy consumption affects battery life. It cannot be ignored for desktop and server systems either, as energy accounts for a growing share of organizations' budgets, influences strategic decisions, and carries environmental implications that are increasingly appreciated. Intelligent trade-offs involving these quantities are critical to meet the performance demands of many modern applications.
Dynamic Voltage and Frequency Scaling (DVFS) offers great potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In our work, we propose and evaluate DVFS schemes that aim at minimizing energy consumption while meeting a performance constraint, for both sequential and parallel applications.
We propose a Petri net based program performance model, parameterized by application properties, microarchitectural settings and system resource configuration, and use this model to find energy-efficient DVFS settings. We first propose a DVFS scheme using this model for sequential programs running on single-core multiple clock domain (MCD) processors, and evaluate it on an MCD processor simulator. We then extend this scheme to data-parallel (Single Program Multiple Data style) applications, generalize it to stream applications as well, and evaluate these two schemes on a full-system CMP simulator. Our experimental evaluation shows that the proposed schemes achieve significant energy savings with only a small performance degradation.
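As a toy illustration of the optimization problem such schemes solve (the frequency/voltage table and the simple dynamic-power model below are textbook-style assumptions, not the thesis's Petri net model), one can pick the lowest-energy operating point that still meets a deadline:

```python
# Toy DVFS selection: minimize energy subject to a performance constraint.
# Assumed operating points (GHz, Volts) and a P ~ C * f * V^2 power model.
OPERATING_POINTS = [(0.8, 0.9), (1.2, 1.0), (1.6, 1.1), (2.0, 1.2)]
CAPACITANCE = 1.0          # arbitrary constant for the sketch
WORK_CYCLES = 2.0e9        # cycles the program needs
DEADLINE_S = 1.4           # performance constraint (seconds)

def energy_and_time(freq_ghz: float, volts: float) -> tuple[float, float]:
    t = WORK_CYCLES / (freq_ghz * 1e9)               # execution time
    p = CAPACITANCE * (freq_ghz * 1e9) * volts ** 2  # dynamic power
    return p * t, t                                  # energy = power * time

best = None
for f, v in OPERATING_POINTS:
    e, t = energy_and_time(f, v)
    if t <= DEADLINE_S and (best is None or e < best[0]):
        best = (e, f, v, t)

e, f, v, t = best
print(f"chosen {f} GHz @ {v} V: time {t:.2f}s, energy {e:.2e} (model units)")
```

With these numbers the scheme picks 1.6 GHz rather than the fastest 2.0 GHz point: both meet the deadline, but the slower setting costs noticeably less energy, which is exactly the trade-off DVFS exposes.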
Application for Parallel Data Processing Control. Grepl, Filip, January 2021.
This work deals with the design and implementation of a system for the parallel execution of tasks in the Knowledge Technology Research Group. The goal is to create a web application that allows users to control the processing of these tasks and to monitor their runs, including the use of system resources. The work first analyzes the current method of parallel data processing and the shortcomings of this solution. It then describes the existing tools, including the problems that their test deployment revealed. Based on this knowledge, the requirements for a new application are defined and the design of the entire system is created. Finally, selected parts of the implementation and the testing of the whole system are described, together with a comparison of its efficiency with the original system.
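As a minimal sketch of the core mechanism such a system controls (the task function, concurrency limit, and status record are assumptions for illustration, not the application's actual design), a bounded worker pool can run tasks in parallel while recording each run's status and duration for a monitoring view:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_PARALLEL = 4  # assumed concurrency limit set by an administrator

def task(job_id: int) -> float:
    start = time.perf_counter()
    time.sleep(0.1 * (job_id % 3 + 1))  # stand-in for real data processing
    return time.perf_counter() - start

runs = {}  # job id -> status record, what a monitoring UI would display
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    futures = {pool.submit(task, i): i for i in range(8)}
    for fut in as_completed(futures):
        job = futures[fut]
        runs[job] = {"status": "finished", "seconds": round(fut.result(), 2)}

print(runs)
```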
Hardware Acceleration Using Functional Languages. Hodaňová, Andrea, January 2013.
The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration, with an emphasis on data-parallel tasks. The level of abstraction of the traditional hardware description languages, such as VHDL or Verilog, is becoming too low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description at the algorithmic or behavioral level. Functional languages are not so commonly used, but they outperform imperative languages in verification, in the ability to capture inherent parallelism, and in the compactness of code. Data-parallel tasks are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use Accelerate, a library for general-purpose GPU programming, and extend it to produce VHDL. Accelerate is a domain-specific language embedded in Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.
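To illustrate the frontend/backend split this approach relies on (a toy sketch in Python, deliberately not Accelerate's Haskell API; all names and the emitted pseudo-VHDL are hypothetical), an embedded DSL can record a data-parallel expression as a small AST and hand it to interchangeable code generators:

```python
from dataclasses import dataclass

# A tiny expression AST, the kind of structure a DSL frontend records.
@dataclass
class Input:
    name: str

@dataclass
class Map:
    op: str         # element-wise operation, e.g. "* 2"
    source: object  # upstream expression node

def gen_python(node) -> str:
    # Software backend: emit an executable Python comprehension.
    if isinstance(node, Input):
        return node.name
    return f"[x {node.op} for x in {gen_python(node.source)}]"

def gen_vhdl_like(node, depth=0) -> str:
    # Hardware-flavored backend: emit a schematic, illustrative pipeline stage.
    if isinstance(node, Input):
        return f"signal stage{depth} := {node.name};"
    inner = gen_vhdl_like(node.source, depth + 1)
    return f"{inner}\nstage{depth} <= map_elementwise(stage{depth + 1}, \"{node.op}\");"

expr = Map("* 2", Map("+ 1", Input("xs")))  # the same program, two backends
print(gen_python(expr))
print(gen_vhdl_like(expr))
```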