171 |
Neblokující vstup/výstup pro projekt k-Wave / Non-Blocking Input/Output for the k-Wave Toolbox. Kondula, Václav. January 2020.
This thesis deals with the implementation of a non-blocking I/O interface for the k-Wave project, which is designed for time-domain simulation of ultrasound propagation. The main focus is on large-domain simulations that, due to high computing power requirements, must run on supercomputers and produce tens of GB of data in a single simulation step. I have designed and implemented a non-blocking interface for storing data using dedicated threads, which allows simulation calculations to overlap with disk operations in order to speed up the simulation. An acceleration of up to 33% was achieved compared to the current implementation of the k-Wave project, which, among other things, also reduced the cost of the simulation.
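The overlap the abstract describes, a dedicated I/O thread draining snapshots while the solver keeps computing, can be sketched as follows (a minimal Python illustration; the actual k-Wave implementation is C++ writing HDF5, and all names here are made up):

```python
import queue
import threading

class AsyncWriter:
    """Dedicated writer thread: the simulation enqueues snapshots and
    continues computing while this thread drains the queue to storage."""

    def __init__(self, sink, depth=2):
        self._q = queue.Queue(maxsize=depth)   # bounded: limits memory held in flight
        self._sink = sink                      # any object with a .write(bytes) method
        self._t = threading.Thread(target=self._drain, daemon=True)
        self._t.start()

    def submit(self, chunk):
        # Blocks only when `depth` writes are already pending,
        # i.e. when storage cannot keep up with the simulation.
        self._q.put(chunk)

    def close(self):
        self._q.put(None)                      # sentinel: no more data
        self._t.join()

    def _drain(self):
        while True:
            chunk = self._q.get()
            if chunk is None:
                break
            self._sink.write(chunk)

# Toy usage: "simulate" three steps, writing each snapshot asynchronously.
class ListSink:
    def __init__(self):
        self.data = []
    def write(self, chunk):
        self.data.append(chunk)

sink = ListSink()
w = AsyncWriter(sink)
for step in range(3):
    snapshot = bytes([step]) * 4               # stand-in for tens of GB of field data
    w.submit(snapshot)                         # returns quickly; compute can continue
w.close()
```

The bounded queue is the key design point: it caps the memory held by in-flight snapshots, so a slow disk throttles the producer instead of exhausting RAM.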
|
172 |
Výpočet optického pole v GP-GPU / Optical Field Calculations in GP-GPU. Srnec, Erik. January 2012.
This work describes a relatively new technique for writing highly parallel programs, named OpenCL. It targets GPUs as well as CPUs and other parallel processors. The library is designed for processor architectures containing a large number of small cores. These cores are not as general-purpose as conventional processor cores, and are therefore suited to calculations that are numerous but individually simple. It is this property that could, under certain conditions, accelerate the calculation of a hologram, namely the calculation of the optical field. The calculation itself is simple, but the amount of processed data is large, which makes it slow. The work also explains the basic concepts of optical and digital holography.
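The underlying computation, superposing spherical waves from scene points onto samples of the hologram plane, can be sketched as a naive reference implementation (an illustrative Python sketch; the thesis targets OpenCL, and the function and parameter names here are assumptions):

```python
import cmath
import math

def optical_field(points, plane_xy, z, wavelength):
    """Naive optical-field evaluation for a point-cloud hologram:
    each output sample is the superposition of spherical waves emitted
    by every scene point. O(points * samples) simple arithmetic --
    exactly the many-small-tasks shape that suits GPU-style cores."""
    k = 2.0 * math.pi / wavelength             # wave number
    field = []
    for (px, py) in plane_xy:
        u = 0j
        for (sx, sy, sz, amp) in points:
            r = math.sqrt((px - sx) ** 2 + (py - sy) ** 2 + (z - sz) ** 2)
            u += amp / r * cmath.exp(1j * k * r)   # spherical-wave contribution
        field.append(u)
    return field

# One unit-amplitude source on the optical axis, sampled at two points
# of a hologram plane placed at z = 10 (arbitrary units).
samples = optical_field(
    points=[(0.0, 0.0, 0.0, 1.0)],
    plane_xy=[(0.0, 0.0), (1.0, 0.0)],
    z=10.0,
    wavelength=0.5,
)
```

Each output sample is independent of the others, which is why the loop over `plane_xy` maps directly onto one OpenCL work-item per sample.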
|
173 |
Support à l'exécution pour objets actifs multi-threadés : conception et implémentation / Execution support for multi-threaded active objects: design and implementation. Rochas, Justine. 22 September 2016.
Pour aborder le développement d'applications concurrentes et distribuées, le modèle de programmation à objets actifs procure une abstraction de haut niveau pour programmer de façon concurrente. Les objets actifs sont des entités indépendantes qui communiquent par messages asynchrones. Peu de systèmes à objets actifs considèrent actuellement une exécution multi-threadée. Cependant, introduire un parallélisme contrôlé permet d'éviter les coûts induits par des appels de méthodes distants. Dans cette thèse, nous nous intéressons aux enjeux que présentent les objets actifs multi-threadés, et à la coordination des threads pour exécuter de façon sûre les tâches d'un objet actif en parallèle. Nous enrichissons dans un premier temps le modèle de programmation, afin de contrôler l'ordonnancement interne des tâches. Puis nous exhibons son expressivité de deux façons différentes : d'abord en développant et en analysant les performances de plusieurs applications, puis en compilant un autre langage à objets actifs avec des primitives de synchronisation différentes dans notre modèle de programmation. Aussi, nous rendons nos objets actifs multi-threadés résilients dans un contexte distribué en utilisant les paradigmes de programmation que nous avons développés. Enfin, nous développons une application pair-à-pair qui met en scène des objets actifs multi-threadés. Globalement, nous concevons un cadre de développement et d'exécution complet pour les applications hautes performances distribuées. Nous renforçons notre modèle de programmation en formalisant nos contributions et les propriétés du modèle. Cela munit le programmeur de garanties fortes sur le comportement du modèle de programmation. / In order to tackle the development of concurrent and distributed applications, the active object programming model provides a high-level abstraction to program concurrent behaviours. Active objects are independent entities that communicate by means of asynchronous messages.
Very few of the existing active object frameworks consider a multi-threaded execution of active objects. Introducing a controlled parallelism enables removing some latency induced by remote method invocations. In this thesis, we take interest in the challenges of having multiple threads inside an active object, and in their safe coordination to execute tasks in parallel. We enhance this programming model by adding language constructs that control the internal scheduling of tasks. We then show its expressiveness in two ways: first with a classical approach, by developing and analysing the performance of several applications, and second, by compiling another active object language with different synchronisation primitives into our programming model. Also, we make multi-threaded active objects resilient in a distributed context through generic engineering constructs, and by using our programming abstractions. Finally, we develop a peer-to-peer application that shows multi-threaded active objects and their features in action. Overall, we design a thorough framework for the development and execution of high performance distributed applications. We reinforce our programming model by formalising our work and the model's properties.
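The baseline model the thesis extends, one mailbox, one service thread, futures for replies, can be sketched in a few lines (an illustrative Python sketch of the general active-object pattern, not the thesis's framework; the default shown here is the mono-threaded execution that multi-threaded active objects relax):

```python
import queue
import threading
from concurrent.futures import Future

class ActiveObject:
    """Minimal active object: callers post requests as asynchronous
    messages and immediately receive a future; a single service thread
    executes the requests one at a time, so the object's state is never
    accessed concurrently."""

    def __init__(self):
        self._mailbox = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def send(self, fn, *args):
        fut = Future()
        self._mailbox.put((fut, fn, args))     # asynchronous method invocation
        return fut                             # the caller is not blocked

    def _serve(self):
        while True:
            fut, fn, args = self._mailbox.get()
            try:
                fut.set_result(fn(*args))      # requests served sequentially
            except Exception as exc:
                fut.set_exception(exc)

obj = ActiveObject()
f1 = obj.send(lambda a, b: a + b, 1, 2)
f2 = obj.send(str.upper, "ok")
```

Multi-threaded active objects replace the strictly sequential `_serve` loop with a controlled scheduler that may run compatible requests in parallel, which is precisely the internal-scheduling control the thesis adds to the language.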
|
174 |
Tools for Understanding, Debugging, and Simulation Performance Improvement of Equation-based Models. Sjölund, Martin. January 2013.
Equation-based object-oriented (EOO) modelling languages provide a convenient, declarative method for describing models of cyber-physical systems. Because of the ease of use of EOO languages, large and complex models can be built with limited effort. However, current state-of-the-art tools do not provide the user with enough information when errors appear or simulation results are wrong. It is paramount that the tools give the user enough information to correct errors or understand where the problems that lead to wrong simulation results are located. However, understanding the model translation process of an EOO compiler is a daunting task that requires knowledge not only of the numerical algorithms the tool executes during simulation, but also of the complex symbolic transformations being performed. In this work, we develop and explore methods where the EOO tool records the transformations during the translation process in order to provide better diagnostics, explanations, and analysis. This information can be used to generate better error messages during translation. It can also be used to provide better debugging for a simulation that produces unexpected results or where numerical methods fail. Meeting deadlines is particularly important for real-time applications. It is usually important to identify possible bottlenecks and either simplify the model or give hints to the compiler that enable it to generate faster code. When profiling and measuring execution times of parts of the model, the recorded information can also be used to find out why a particular system is slow. Combined with debugging information, it is possible to find out why a system of equations is slow to solve, which helps in understanding what can be done to simplify the model. Finally, we provide a method and tool prototype suitable for speeding up simulations by compiling a simulation executable for a parallel platform, partitioning the model at appropriate places.
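The recording idea can be sketched as a small trace structure that stores each symbolic rewrite alongside the rule that produced it, so a wrong result can be walked back to the source equation (an illustrative Python sketch; the class, rule names, and string representation of equations are assumptions, not the actual tool's design):

```python
class TransformationTrace:
    """Records each symbolic rewrite the compiler performs on an
    equation, so diagnostics can show the full chain from the original
    source equation to the form that is actually solved."""

    def __init__(self):
        self._steps = {}                       # equation id -> [(rule, before, after)]

    def record(self, eq_id, rule, before, after):
        self._steps.setdefault(eq_id, []).append((rule, before, after))

    def history(self, eq_id):
        return self._steps.get(eq_id, [])

    def explain(self, eq_id):
        # Human-readable chain: raw material for better error messages.
        return "\n".join(f"{rule}: {before}  ==>  {after}"
                         for rule, before, after in self.history(eq_id))

# Hypothetical rewrites of one equation during translation.
trace = TransformationTrace()
trace.record("eq1", "solve-for-x", "x + y = 0", "x = -y")
trace.record("eq1", "substitute",  "x = -y",    "x = -2")
```

The same per-equation history doubles as profiling metadata: tagging each recorded step with the time spent solving the resulting equation block points directly at the rewrites that produced a slow system.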
|
175 |
Effektivisera generering av parameterfiler för betalterminaler / Improve the efficiency of generating parameter files for payment terminals. Villabona, Antonio; Dietrichson, Fredrik. January 2014.
Denna rapport återger arbetsprocessen kring att utvärdera datalagringsstruktur och förbättra prestanda för generering av parameterfiler för kortterminaler. Arbetet utfördes på plats hos Esplanad AB, ett företag som bland annat arbetar med säkerhetslösningar och distribution av inställningar för betalstationer. Uppgiften bestod av att utvärdera möjligheter till att förbättra databasen som sparar alla inställningarna för betalsystemen, samt att förbättra kodstruktur och prestanda för programmet som genererar filerna. Rapporten beskriver testning av prestanda, både på Esplanads gamla program för att generera parameterfiler och det nya som konstruerades. En lösning presenteras som inkluderar förbättring av filgenereringens prestanda och en ny struktur på databasen för ökad skalbarhet. Tester visar att det nya systemet klarar av att skapa parameterfiler på TLV-format ungefär 16 gånger snabbare. Den föreslagna lösningen implementerar parallella processer och replikering av databasen. / This thesis describes the process of analyzing and evaluating the structure of data storage and improving the performance of generating parameter files destined for card terminals. The work was done in house at Esplanad AB, a company dealing with security solutions and distribution of settings for payment stations. The first task was to evaluate the possibilities for improving the database storing the settings for each card reader. The second task was to improve the structure of the code and the file-generating system's performance. The thesis describes performance testing, both of Esplanad's old system for generating parameter files and of the new one constructed. The solution presented includes improved performance of the file-generating process and a new structure for the database that increases scalability. Tests show that the new system is capable of generating parameter files in TLV format about 16 times faster.
The proposed solution implements parallel processes and database replication.
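A TLV (tag-length-value) parameter record can be sketched with a simple encoder/decoder pair (an illustrative Python sketch; the 1-byte tag and 2-byte big-endian length are assumptions, since the report does not specify the exact TLV layout the terminals use):

```python
def encode_tlv(entries):
    """Pack (tag, value) pairs into a TLV byte stream:
    1-byte tag, 2-byte big-endian length, then the value bytes."""
    out = bytearray()
    for tag, value in entries:
        out.append(tag)
        out += len(value).to_bytes(2, "big")
        out += value
    return bytes(out)

def decode_tlv(blob):
    """Inverse of encode_tlv, useful for round-trip checks in tests."""
    entries, i = [], 0
    while i < len(blob):
        tag = blob[i]
        length = int.from_bytes(blob[i + 1:i + 3], "big")
        entries.append((tag, blob[i + 3:i + 3 + length]))
        i += 3 + length
    return entries

# Hypothetical terminal parameters: tags and values are made up.
params = [(0x01, b"merchant-42"), (0x02, b"\x00\x01")]
blob = encode_tlv(params)
```

Because each terminal's file is encoded independently, generation parallelizes naturally across processes, which is one ingredient of the reported 16x speedup.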
|
176 |
DIGITAL FRAMING OF CLIMATE CHANGE IN MEXICO: Climate change during the 2020 local elections. Viggiano Austria, Aldo Jesus. January 2021.
Studies of journalism have predominantly focused on the West, neglecting large parts of the world, including all parts of the Americas apart from the USA. Studying media in Mexico attempts to contribute to the de-Westernisation of media studies, since there is a clear research gap in the field. The Natural Resources Defence Council (2021) has warned that Mexico's retreat from international climate commitments is globally significant and notorious. This makes a study of journalistic representations of climate change and environmental issues relevant from an international perspective. The aim of this study is to analyze a series of articles collected from the digital newspapers Milenio and La Jornada within the scope of framing theory. Frames are organizing principles; they are analyzed in terms of frequency as well as patterns of frame usage, taking as a guide some of the main modes of framing that scholars have pointed out, while remaining open to framing categories that could be detected inductively in the material. The work carried out by the research attempts to shed some light on specific variables such as the actors that are given voice in these digital outlets, the frames that are salient in the media coverage of climate change, and the events that trigger the coverage. The results of the study indicate that in Mexico, a country whose vital productive sectors are deeply intertwined with North America and Latin America, the coverage of climate change relies heavily on sources from abroad, especially on global news agencies.
|
177 |
Utilizing Heterogeneity in Manycore Architectures for Streaming Applications. Savas, Süleyman. January 2017.
In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores; the transition now continues towards heterogeneous architectures with different types of cores specialized for different purposes. The applications executed on these architectures usually consist of several tasks requiring different hardware resources to be executed efficiently. We therefore believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, developing heterogeneous architectures is more challenging, and the transition from homogeneous to heterogeneous architectures increases the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on what kind of heterogeneous manycore design is most efficient for different applications, and on the performance of these applications when executed on current commercial manycores.
This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. It finally presents a methodology for developing heterogeneous manycore architectures which target specific application domains. Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance easily increases by 15x for the target application, while the core size increases by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development efforts significantly (25-50%) while having a small impact (2-17%) on performance. / HiPEC (High Performance Embedded Computing) / NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
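The area-versus-speedup trade-off quoted above can be turned into a back-of-the-envelope figure of merit (an illustrative sketch using the abstract's own numbers; the metric itself is an assumption, not taken from the thesis):

```python
def perf_per_area(speedup, area_growth):
    """Performance gained per unit of silicon spent: a crude figure of
    merit for judging whether a specialized hardware block pays off.
    area_growth is the fractional increase in core size (0.5 = +50%)."""
    return speedup / (1.0 + area_growth)

# Figures quoted in the abstract: ~15x speedup for a 40-50% larger core.
accelerated = perf_per_area(speedup=15.0, area_growth=0.5)
baseline = perf_per_area(speedup=1.0, area_growth=0.0)
```

Even at the pessimistic 50% area cost, the specialized core delivers ten times the baseline's performance per unit area, which is why the thesis argues for domain-specific heterogeneity.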
|
178 |
A Comparison of Parallel Design Patterns for Game Development. Andblom, Robin; Sjöberg, Carl. January 2018.
As processor performance capabilities can only be increased through the use of a multicore architecture, software needs to be developed to utilize the parallelism offered by the additional cores. Especially game developers need to seize this opportunity to save cycles and decrease the general rendering time. One of the existing advances towards this potential has been the creation of multithreaded game engines that take advantage of the additional processing units. In such engines, different branches of the game loop are parallelized. However, the specifics of the parallel design patterns used are not outlined. Neither are any ideas of how to combine these patterns proposed. These missing factors are addressed in this article, to provide a guideline for when to use which one of two parallel design patterns: fork-join and pipeline parallelism. Through a collection of data and a comparison using the metrics speedup and efficiency, conclusions were derived that shed light on the ways in which a typical part of a game loop most efficiently can be organized for parallel execution through the use of different parallel design patterns. The pipeline and fork-join patterns were applied respectively in a variety of test cases for two branches of a game loop: a BOIDS system and an animation system.
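The two patterns under comparison can be sketched side by side on a toy workload (an illustrative Python sketch; real measurements of speedup and efficiency would use the BOIDS and animation branches of an actual game loop):

```python
import queue
import threading

def fork_join(items, work, n_workers=4):
    """Fork-join: split one frame's work items across workers, then
    join at a barrier before the next stage of the game loop runs."""
    results = [None] * len(items)

    def worker(lo, hi):
        for i in range(lo, hi):
            results[i] = work(items[i])

    step = (len(items) + n_workers - 1) // n_workers
    threads = [threading.Thread(target=worker, args=(i, min(i + step, len(items))))
               for i in range(0, len(items), step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                               # the "join" barrier
    return results

def pipeline(items, stages):
    """Pipeline: each stage runs in its own thread; items stream through
    queues, so stage k can process item n+1 while stage k+1 handles item n."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def stage_runner(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is None:                   # sentinel: propagate shutdown
                q_out.put(None)
                break
            q_out.put(stage(item))

    threads = [threading.Thread(target=stage_runner, args=(s, qs[k], qs[k + 1]))
               for k, s in enumerate(stages)]
    for t in threads:
        t.start()
    for it in items:
        qs[0].put(it)
    qs[0].put(None)
    out = []
    while (r := qs[-1].get()) is not None:
        out.append(r)
    for t in threads:
        t.join()
    return out

fj = fork_join(list(range(8)), lambda x: x * x)
pl = pipeline(list(range(8)), [lambda x: x + 1, lambda x: x * 2])
```

Timing both functions on the same workload and dividing by the sequential time yields the speedup metric; dividing speedup by thread count yields efficiency, the two metrics the article compares.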
|
179 |
Étude de transformations et d’optimisations de code parallèle statique ou dynamique pour architecture "many-core" / Study of transformations and static or dynamic parallel code optimization for manycore architectures. Gallet, Camille. 13 October 2016.
L’évolution des supercalculateurs, de leur origine dans les années 60 jusqu’à nos jours, a fait face à 3 révolutions : (i) l’arrivée des transistors pour remplacer les triodes, (ii) l’apparition des calculs vectoriels, et (iii) l’organisation en grappe (clusters). Ces derniers se composent actuellement de processeurs standards qui ont profité de l’accroissement de leur puissance de calcul via une augmentation de la fréquence, la multiplication des cœurs sur la puce et l’élargissement des unités de calcul (jeu d’instructions SIMD). Un exemple récent comportant un grand nombre de cœurs et des unités vectorielles larges (512 bits) est le co-processeur Intel Xeon Phi. Pour maximiser les performances de calcul sur ces puces en exploitant au mieux ces instructions SIMD, il est nécessaire de réorganiser le corps des nids de boucles en tenant compte des aspects irréguliers (flot de contrôle et flot de données). Dans ce but, cette thèse propose d’étendre la transformation nommée Deep Jam pour extraire de la régularité d’un code irrégulier et ainsi faciliter la vectorisation. Ce document présente notre extension et son application sur une mini-application d’hydrodynamique multi-matériaux HydroMM. Ces travaux montrent ainsi qu’il est possible d’obtenir un gain de performances significatif sur des codes irréguliers. / From the 60s to the present, the evolution of supercomputers has faced three revolutions: (i) the arrival of transistors to replace triodes, (ii) the appearance of vector computing, and (iii) clusters. These currently consist of standard processors that have benefited from increased computing power via an increase in frequency, the proliferation of cores on the chip, and the widening of computing units (SIMD instruction sets). A recent example with a large number of cores and wide (512-bit) vector units is the Intel Xeon Phi co-processor.
To maximize computing performance on these chips by better exploiting these SIMD instructions, it is necessary to reorganize the body of the loop nests, taking into account irregular aspects (control flow and data flow). To this end, this thesis proposes to extend the transformation named Deep Jam to extract regularity from an irregular code and thus facilitate vectorization. This document presents our extension and its application to HydroMM, a multi-material hydrodynamics mini-application. These studies show that it is possible to achieve a significant performance gain on irregular codes.
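The principle, extracting branch-homogeneous work from an irregular loop so that each group runs a straight-line, uniform body, can be illustrated as follows (a Python sketch of the general idea only; Deep Jam itself is a compile-time control-flow transformation that jams paths together, not the runtime regrouping shown here):

```python
def irregular(values, threshold):
    """Branchy loop: the per-element branch defeats SIMD vectorization,
    since neighbouring iterations may take different paths."""
    out = []
    for v in values:
        if v >= threshold:
            out.append(v * 2)                  # "hot" path
        else:
            out.append(v + 100)                # "cold" path
    return out

def regularized(values, threshold):
    """Same computation restructured: iterations are first sorted into
    branch-homogeneous groups, then each group runs a uniform body that
    a vectorizer could handle. Results land back at original indices."""
    hot = [(i, v) for i, v in enumerate(values) if v >= threshold]
    cold = [(i, v) for i, v in enumerate(values) if v < threshold]
    out = [0] * len(values)
    for i, v in hot:                           # uniform body: v * 2
        out[i] = v * 2
    for i, v in cold:                          # uniform body: v + 100
        out[i] = v + 100
    return out

data = [5, 1, 9, 3, 7]
```

The payoff depends on the regrouping overhead being smaller than the vectorization gain, which is why the thesis evaluates the transformation on a realistic workload (HydroMM) rather than microbenchmarks.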
|
180 |
A Scheduling and Partitioning Model for Stencil-based Applications on Many-Core Devices / Modèle d'Ordonnancement et de Partitionnement pour Applications à Maillages et Calculs Réguliers dans le Cadre d'Accélérateurs de Type «ManyCore». Papin, Jean-Charles. 8 September 2016.
La puissance de calcul des plus grands calculateurs ne fait qu'augmenter: de quelques centaines de cœurs de calculs dans les années 1990, on en est maintenant à plusieurs millions! Leur infrastructure évolue aussi: elle n'est plus linéaire, mais complètement hiérarchique. Les applications de calcul intensif, largement utilisées par la communauté scientifique, doivent donc se munir d'outils permettant d'utiliser pleinement l'ensemble de ces ressources de manière efficace. La simulation numérique repose bien souvent sur d'importants calculs dont le coût, en termes de temps et d'accès mémoire, peut fortement varier au cours du temps: on parle de charge de calcul variable. Dans cette Thèse, on se propose d'étudier les outils actuels de répartition des données et des calculs, afin de voir les raisons qui font que de tels outils ne sont pas pleinement adaptés aux fortes variations de charge ainsi qu'à la hiérarchie toujours plus importante des nouveaux calculateurs. Nous proposerons alors un nouveau modèle d'ordonnancement et de partitionnement, basé sur des interactions physiques, particulièrement adapté aux applications basées sur des maillages réguliers et présentant de fortes variations de charge au cours du temps. Nous validerons alors ce modèle en le comparant à des outils de partitionnement de graphes reconnus et largement utilisés, et verrons les raisons qui le rendent plus performant pour des applications aussi bien parallèles que distribuées. Enfin, nous proposerons une interface nous permettant d'utiliser cette méthode d'ordonnancement dans des calculateurs toujours plus hiérarchiques. / The computing capability of the largest computing centers keeps increasing: from a few hundred cores in the 1990s, they can now exceed several million cores!
Their infrastructure also evolves: it is no longer linear, but fully hierarchical. High-performance applications, widely used by the scientific community, rely on tools that allow them to use computing resources efficiently and fully. Numerical simulations mostly rely on large computation chains whose cost (computing load), either computing time or memory-access time, can vary strongly over time: this is referred to as dynamic computing-load evolution. In this thesis, we propose to study current data partitioning and computation scheduling tools, and to explore their limitations with regard to strong and repetitive load variation as well as the ever-increasing cluster hierarchy. We then propose a new scheduling and partitioning model, based on physical interactions, particularly suitable for regular-mesh-based applications that produce strong computing-load variations over time. We then compare our model against well-known and widely used graph partitioning tools and show the reasons that make this model more reliable for such parallel and distributed applications. Lastly, we propose a multi-level scheduling interface specially designed to allow the use of our model in even more hierarchical clusters.
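The physics-inspired idea, cut points pulled by the load on either side until partitions balance, can be sketched in one dimension (an illustrative toy only; the thesis model targets regular meshes in higher dimensions, and the function and parameter names are assumptions):

```python
def balance(loads, n_parts, iters=50):
    """Toy 1-D load-driven partitioning: each partition owns a contiguous
    range of cells; at every iteration a cut point moves one cell toward
    the heavier side, mimicking an attractive force exerted by heavily
    loaded cells, until the load on both sides roughly equalizes."""
    n = len(loads)
    cuts = [n * k // n_parts for k in range(1, n_parts)]  # initial even split
    for _ in range(iters):
        bounds = [0] + cuts + [n]
        new_cuts = []
        for k in range(1, n_parts):
            left = sum(loads[bounds[k - 1]:bounds[k]])
            right = sum(loads[bounds[k]:bounds[k + 1]])
            cut = bounds[k]
            if left > right and cut - 1 > bounds[k - 1]:
                cut -= 1                       # net force pulls the cut left
            elif right > left and cut + 1 < bounds[k + 1]:
                cut += 1                       # ... or right
            new_cuts.append(cut)
        cuts = new_cuts
    return cuts

# Load concentrated at the right end: the cut should drift right of center,
# shrinking the heavily loaded partition.
cuts = balance([1, 1, 1, 1, 10, 10, 10, 10], n_parts=2)
```

Because each adjustment is local and incremental, this style of rebalancing naturally tracks load that varies over time, which is the regime where the abstract argues static graph partitioners fall short.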
|