21

PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

Kristipati, Pavan K. 01 January 2008
This thesis focuses on optimizing the performance of an in-house, structured, 2D CFD code, GHOST, on commodity cluster architectures. The basic philosophy of the work is to improve the code's cache usage through efficient coding techniques, without changing the underlying numerical algorithm. The optimization techniques implemented and the resulting changes in performance are presented. Two techniques implemented earlier to tune the performance of this code, external and internal blocking, are reviewed, followed by further tuning work to circumvent the problems associated with the blocking techniques. To establish the generality of the optimization techniques, testing was then extended to a more complicated test case. All the techniques presented in this thesis were tested on steady, laminar test cases. The optimized versions of the code are shown to achieve better performance on the variety of commodity cluster architectures chosen in this study.
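The blocking techniques named above are a standard cache optimization. As a rough illustration only (GHOST's own code is not reproduced here; the array names and block size are hypothetical), a minimal C sketch of loop blocking applied to a 2D stencil update:

```c
/* Minimal sketch of cache blocking (tiling) on a 2D stencil update.
 * Illustrates the general technique, not GHOST itself. */
#define BS 64 /* block size, tuned to the cache hierarchy */

void smooth_blocked(int n, double a[n][n], double b[n][n]) {
    for (int ii = 1; ii < n - 1; ii += BS)
        for (int jj = 1; jj < n - 1; jj += BS)
            /* work on one cache-sized tile at a time */
            for (int i = ii; i < ii + BS && i < n - 1; i++)
                for (int j = jj; j < jj + BS && j < n - 1; j++)
                    b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                    + a[i][j-1] + a[i][j+1]);
}
```

Tiling keeps each tile of `a` resident in cache across the inner iterations instead of streaming the whole array on every sweep, which is the effect the internal and external blocking described in the thesis aim for.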
22

Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler

Sjöblom, William January 2020
This thesis presents a technique for automatic vectorization of innermost single-statement loops with a cross-iteration data dependence, based on analyzing data flow to recognize frequently recurring program idioms. Recognition is carried out by matching the circular SSA data flow found around the loop body's φ-function against several primitive patterns, forming a tree representation of the relevant data flow that is then pruned down to a single parameterized node. This node provides a high-level specification of the data-flow idiom at hand, which guides an algorithmic replacement applied to the intermediate representation. The versatility of the technique is shown by presenting an implementation supporting vectorization of both a limited class of linear recurrences and prefix sums, where the latter shows how the technique generalizes to intermediate representations with memory state in SSA form. Finally, a thorough performance evaluation is presented, showing the effectiveness of the vectorization technique.
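For readers unfamiliar with the idiom: a prefix sum carries a dependence from each iteration to the next (`s[i] = s[i-1] + a[i]`), which defeats ordinary vectorization. A minimal C sketch of the log-step, in-register rewrite that makes it vectorizable, using SSE2 intrinsics (the thesis works on HotSpot C2's intermediate representation, not C source; this only illustrates the replacement idiom):

```c
#include <emmintrin.h> /* SSE2 */

/* Inclusive prefix sum of four packed ints:
 * [a,b,c,d] -> [a, a+b, a+b+c, a+b+c+d] in two shift-and-add steps. */
static __m128i prefix4(__m128i x) {
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4)); /* + [0, a, b, c]   */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8)); /* + [0, 0, a, a+b] */
    return x;
}

/* In-place inclusive prefix sum, carrying the running total across
 * 4-wide chunks. */
void prefix_sum(int *a, int n) {
    int carry = 0, i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = prefix4(_mm_loadu_si128((__m128i *)(a + i)));
        v = _mm_add_epi32(v, _mm_set1_epi32(carry));
        _mm_storeu_si128((__m128i *)(a + i), v);
        carry = a[i + 3]; /* last lane holds the chunk's full prefix */
    }
    for (; i < n; i++) { /* scalar tail */
        carry += a[i];
        a[i] = carry;
    }
}
```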
23

Code optimization based on source-to-source transformations using profile-guided metrics

Lebras, Youenn 03 July 2019
Our goal is to develop a framework for defining source-code transformations driven by dynamic metrics. This framework is integrated into the MAQAO tool suite developed at UVSQ/ECR. We present a set of source-to-source transformations that can be guided by the end user and by the dynamic metrics coming from the various MAQAO analysis tools, making it possible to work on both source and binary objects. The framework can also serve as a pre-processor that simplifies development by performing, cleanly and automatically, transformations that are simple but time-consuming and error-prone (e.g., loop or function specialization).
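As a hypothetical example of the kind of "simple but time-consuming and error-prone" transformation meant here, loop specialization (also called unswitching) duplicates a loop to hoist a loop-invariant test; a minimal C sketch, with illustrative names:

```c
/* Before: a loop-invariant test evaluated on every iteration. */
void scale(double *a, int n, int mode) {
    for (int i = 0; i < n; i++) {
        if (mode == 0) a[i] *= 2.0;
        else           a[i] += 1.0;
    }
}

/* After specialization: one test, two branch-free loops that the
 * compiler can vectorize independently. */
void scale_specialized(double *a, int n, int mode) {
    if (mode == 0)
        for (int i = 0; i < n; i++) a[i] *= 2.0;
    else
        for (int i = 0; i < n; i++) a[i] += 1.0;
}
```

Applying such rewrites mechanically at the source level, guided by profiling data showing which loops are hot, is what such a framework automates.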
24

Transformation selection process automation for execution time optimization through machine learning on the LLVM framework

Sabaliauskas, Jorge Augusto 28 April 2015
The rapid evolution of hardware demands a continuous evolution of compilers. Compiler designers must carry out a tuning process to ensure that the code generated by the compiler maintains a given level of quality, whether measured by processing time or by another predefined characteristic. This work automates the compiler tuning process through machine learning techniques. As a result, the compilation plans obtained using machine learning with the proposed features produced code whose execution times approached those obtained with the standard plan used by LLVM.
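The tuning loop being automated can be pictured as a search over compilation plans. A minimal C sketch of the brute-force baseline that such learning replaces (the file name and flag sets are illustrative assumptions, and the thesis substitutes a learned model over program features for the exhaustive search):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Try a few candidate optimization plans, compile and time a benchmark,
 * keep the fastest. Wall-clock timing via clock_gettime (POSIX). */
int main(void) {
    const char *plans[] = { "-O1", "-O2", "-O3", "-O2 -funroll-loops" };
    int n_plans = (int)(sizeof plans / sizeof plans[0]), best_i = -1;
    double best = 1e30;
    for (int i = 0; i < n_plans; i++) {
        char cmd[256];
        snprintf(cmd, sizeof cmd, "cc %s bench.c -o bench", plans[i]);
        if (system(cmd) != 0) continue; /* skip failed builds */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (system("./bench") != 0) continue;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (secs < best) { best = secs; best_i = i; }
    }
    if (best_i >= 0)
        printf("best plan: %s (%.3f s)\n", plans[best_i], best);
    return 0;
}
```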
26

Study and numerical implementation of Biot's theory for elastoplastic media and use of optimization strategies for processing

Costa, Joseanderson Augusto de Caldas 03 May 2012
This work presents a strongly coupled poro-elasto-plastic formulation. The Finite Element Method (FEM) is used to discretize the governing differential equations, interpolating the displacement and pore-pressure fields. The poro-mechanical problem is solved in a fully coupled manner, based on a single system of equations. The nonlinear problem is solved globally by the Newton-Raphson procedure, and the Closest Point algorithm is implemented for the local return mapping in the elasto-plastic models. Starting from a pre-existing computational module (PORO), written in C++ using Object-Oriented Programming (OOP), this work extends the program with new classes for elasto-plastic constitutive models with an associated flow rule in the poro-mechanical coupling. The program is verified against classical problems from the literature, such as the poro-elastic column and the Schiffman problem. Strategies for optimizing the computational cost are also described, based on specialized math libraries (MKL) and code parallelization (OpenMP).
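For context, the Closest Point return mapping mentioned above reduces, in the one-dimensional linear-isotropic-hardening case, to the classical radial return. A minimal C sketch under that textbook simplification (not the PORO implementation):

```c
#include <math.h>

/* State at an integration point: accumulated plastic strain and
 * isotropic hardening variable. */
typedef struct { double eps_p, alpha; } PlasticState;

/* One elastic-predictor / plastic-corrector step for 1D plasticity
 * with linear isotropic hardening. E: Young's modulus, H: hardening
 * modulus, sigma_y: initial yield stress. Returns the updated stress. */
double return_map_1d(double eps, PlasticState *s,
                     double E, double H, double sigma_y) {
    double sig_trial = E * (eps - s->eps_p);               /* elastic predictor */
    double f = fabs(sig_trial) - (sigma_y + H * s->alpha); /* yield check */
    if (f <= 0.0)
        return sig_trial;                                  /* elastic step */
    double dgamma = f / (E + H);                           /* consistency condition */
    double sgn = (sig_trial >= 0.0) ? 1.0 : -1.0;
    s->eps_p += dgamma * sgn;                              /* update plastic strain */
    s->alpha += dgamma;                                    /* update hardening */
    return sig_trial - E * dgamma * sgn;                   /* project back to surface */
}
```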
27

Generalization of decremental performance analysis to differential analysis

Bendifallah, Zakaria 17 September 2015
A crucial step in application performance analysis is the accurate detection of program bottlenecks. A bottleneck is any event that contributes to extending the execution time; determining its cause is important for application developers, as it enables them to detect code design and generation flaws. Bottleneck detection is becoming a difficult art. Techniques such as event counting, which easily found bottlenecks in the past, have become less effective because of the increasing complexity of modern micro-processors and the introduction of parallelism at several levels. Consequently, new analysis approaches are needed to face these challenges. Our work focuses on performance analysis and bottleneck detection for compute-intensive loops in scientific applications. We work on Decan, a performance analysis and bottleneck detection tool, which offers an interesting and promising approach called decremental analysis. The tool, which operates at the binary level, performs controlled modifications on the instructions of a loop and compares the new version (called a variant) to the original one. The goal is to assess the cost of specific events, and thus the existence or not of bottlenecks. Our first contribution consists of extending Decan with new variants that we designed, tested, and validated. Based on these variants, we developed analysis methods that we used to characterize hot loops and find their bottlenecks. We later integrated the tool into a performance analysis methodology (Pamda) which coordinates several analysis tools in order to achieve a more effective application performance analysis. Second, we introduce several improvements to the Decan tool: techniques developed to preserve the control flow of the modified programs allow the tool to be used on real applications instead of extracted kernels, and support for parallel programs (thread- and process-based) was also added. Finally, since the tool primarily relies on execution time for its analysis, we study the opportunity of also using other hardware-generated events, through a study of their stability, precision, and overhead.
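The decremental idea can be pictured at the source level (Decan itself rewrites binaries, so this is only an analogy, with illustrative loops): time a loop against a variant whose memory accesses have been stripped, and attribute the difference to the memory subsystem.

```c
/* Original loop: one load plus one FP multiply-add per iteration. */
double original(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * 1.000001;
    return s;
}

/* "Decremental" variant: loads removed, FP work kept. If this runs
 * much faster than original() for the same n, the original loop is
 * likely memory-bound. (Compile without aggressive optimization, or
 * the compiler may collapse the variant entirely.) */
double variant_no_loads(int n) {
    double s = 0.0, x = 1.0;
    for (int i = 0; i < n; i++)
        s += x * 1.000001;
    return s;
}
```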
28

Test particle transport in turbulent magnetohydrodynamic structures

Lalescu, Cristian 01 July 2011
Turbulent phenomena are found in both natural (e.g. the Earth's oceans, the Sun's corona) and artificial (e.g. flows through pipes, the plasma in a tokamak device) settings; evidence suggests that turbulence is the normal behaviour in most cases. Turbulence has been studied extensively for more than a century, but a complete and consistent theoretical description of it has not yet been proposed. It is in this context that the motion of particles under the influence of turbulent fields is studied in this work, with direct numerical simulations. The thesis is structured in three main parts. The first part describes the tools that are used. Methods of integrating particle trajectories are presented, together with a discussion of the properties that these methods should have. The simulation of magnetohydrodynamic (MHD) turbulence is discussed, while also introducing fundamental concepts of fluid turbulence. Particle trajectory integration requires information that is not readily available from simulations of turbulent flows, so the interpolation methods needed to adapt the fluid simulation results are constructed as well. The second part is dedicated to the study of two MHD problems: simulations of Kolmogorov flow in incompressible MHD, and simulations of the dynamo effect in compressible MHD. These two scenarios are chosen because large-scale structures are formed spontaneously by the turbulent flow, and there is an interest in studying particle transport in the presence of structures. Studies of particle transport are discussed in the third part. The properties of the overall approach are first analyzed in detail for stationary predefined fields, with a focus on the qualitative properties of the different methods presented. Charged particle transport in frozen turbulent fields is then studied. Results concerning transport of particles in fully developed, time-evolving turbulent fields are presented in the final chapter.
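Among trajectory-integration methods, the classical fourth-order Runge-Kutta scheme is representative. A minimal C sketch for a tracer particle advected by a velocity field u(x, t), where u would in practice come from interpolating the simulated turbulent fields (the names here are illustrative, not the thesis's code):

```c
typedef struct { double x, y, z; } Vec3;

/* a + s*b, componentwise */
static Vec3 axpy(Vec3 a, Vec3 b, double s) {
    return (Vec3){ a.x + s * b.x, a.y + s * b.y, a.z + s * b.z };
}

/* One classical RK4 step for dx/dt = u(x, t). The callback u stands in
 * for interpolation of the simulated MHD velocity field at an arbitrary
 * particle position. */
Vec3 rk4_step(Vec3 p, double t, double dt, Vec3 (*u)(Vec3, double)) {
    Vec3 k1 = u(p, t);
    Vec3 k2 = u(axpy(p, k1, 0.5 * dt), t + 0.5 * dt);
    Vec3 k3 = u(axpy(p, k2, 0.5 * dt), t + 0.5 * dt);
    Vec3 k4 = u(axpy(p, k3, dt), t + dt);
    Vec3 sum = axpy(axpy(axpy(k1, k2, 2.0), k3, 2.0), k4, 1.0);
    return axpy(p, sum, dt / 6.0); /* p + dt/6 * (k1 + 2k2 + 2k3 + k4) */
}
```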
