About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Avaliação de métodos de paralelização automática. / Evaluation of automatic parallelization methods.

Edson Pedro Ferlin, 24 March 1997
This work reviews concepts and definitions of parallel processing that apply to automatic parallelization, together with the analyses and conditions governing data dependences, in order to apply four parallelization methods: Hyperplane, Unimodular Transformation, Communication-Free Data Allocation, and Partitioning & Labeling. In this way, a sequential program is transformed into its parallel equivalent. The resulting programs are run on a distributed-memory system communicating through MPI (Message-Passing Interface) message passing, and several metrics are collected to evaluate and compare the methods.
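The thesis code itself is not reproduced here, but the kind of transformation described above — rewriting a sequential loop as SPMD message-passing code — can be illustrated with a minimal, hypothetical MPI sketch; the array, its size and the block distribution are assumptions made for the illustration, not taken from the thesis:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* illustrative problem size; assumes N is divisible by the rank count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Sequential form: for (i = 0; i < N; i++) a[i] = f(i);
       Parallel form: each rank owns a contiguous block of iterations. */
    int chunk = N / size;
    int lo = rank * chunk;

    double *local = malloc(chunk * sizeof(double));
    for (int i = 0; i < chunk; i++)
        local[i] = 2.0 * (lo + i);   /* stands in for the loop body f(i) */

    /* Collect the distributed blocks on rank 0 through message passing. */
    double *result = (rank == 0) ? malloc(N * sizeof(double)) : NULL;
    MPI_Gather(local, chunk, MPI_DOUBLE, result, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("result[%d] = %f\n", N - 1, result[N - 1]);
        free(result);
    }
    free(local);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with mpirun, each rank computes only its own block; the gather at the end is the only communication step.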
92

Programação paralela e sequencial aplicada à otimização de estruturas metálicas com o algoritmo PSO / Parallel and sequential programming applied to the optimization of steel structures with the PSO algorithm

Esposito, Adelano, January 2012
Among heuristic methods, PSO (Particle Swarm Optimization) is one of the most widely explored in engineering. PSO is a metaheuristic based on a population of individuals in which candidate solutions evolve by simulating a simplified model of social adaptation. The method has become popular, but the large number of objective-function evaluations limits its application to large-scale engineering problems. On the other hand, the algorithm is easily parallelized, which makes parallel computing an attractive alternative. In this work, two serial versions of the particle swarm algorithm and their parallel extensions are developed. The parallel algorithms, built on functions available in the MATLAB® library, use the master-slave and multiple-population paradigms, differing from each other in how the swarm particles are updated (flocking or pseudo-flocking) and in how the processors communicate (synchronously or asynchronously). The proposed models were applied to classical structural engineering benchmark problems from the literature, and the results are compared using the metrics commonly adopted for algorithm evaluation. The results show that parallel computing enabled an improvement over the performance of the asynchronous sequential algorithm. Good processing-time gains were also recorded for both parallel extensions, although the synchronous parallel algorithm, unlike the asynchronous parallel version, showed increasing computational performance as more processors were used.
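As an illustration of the master-slave scheme described above, here is a minimal synchronous sketch in C/MPI; the thesis itself works in MATLAB®, and the objective function, swarm size and problem dimension below are placeholders rather than anything from the dissertation:

```c
#include <mpi.h>
#include <stdlib.h>

#define DIM 10     /* illustrative problem dimension */
#define SWARM 32   /* illustrative swarm size; assumed divisible by the rank count */

/* Hypothetical objective function standing in for a structural-weight evaluation. */
static double objective(const double *x) {
    double s = 0.0;
    for (int d = 0; d < DIM; d++) s += x[d] * x[d];
    return s;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double swarm[SWARM][DIM], fitness[SWARM];
    if (rank == 0) {                              /* master: random initial positions */
        for (int p = 0; p < SWARM; p++)
            for (int d = 0; d < DIM; d++)
                swarm[p][d] = (double)rand() / RAND_MAX;
    }

    /* One synchronous master-slave step: scatter particles, evaluate, gather fitness.
       A full PSO would wrap this in the velocity/position update loop. */
    int per_rank = SWARM / size;
    double local[SWARM][DIM], local_fit[SWARM];

    MPI_Scatter(swarm, per_rank * DIM, MPI_DOUBLE,
                local, per_rank * DIM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int p = 0; p < per_rank; p++)
        local_fit[p] = objective(local[p]);       /* the expensive evaluations run in parallel */

    MPI_Gather(local_fit, per_rank, MPI_DOUBLE,
               fitness, per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ...master updates personal/global bests and particle velocities here... */

    MPI_Finalize();
    return 0;
}
```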
93

An Improved Design and Implementation of the Session-based SAMBO with Parallelization Techniques and MongoDB

Zhao, Yidan, January 2017
The session-based SAMBO is an ontology alignment system that uses MySQL to store matching results. Currently, SAMBO is able to align most ontologies within acceptable time; however, for large-scale ontologies it fails to meet that target. The main purpose of this thesis work is therefore to improve the performance of SAMBO, especially when matching large-scale ontologies. To that end, a comprehensive literature study and an investigation of two leading large-scale ontology alignment systems are carried out to set the directions for improvement. A detailed investigation of the existing SAMBO is conducted to determine in which respects the system can be improved, and optimization of the parallel matching process and of data management are chosen as the primary goals of the work. A number of relevant techniques are then studied and compared, and an optimized design is proposed and implemented. System tests of the improved SAMBO show that both the parallel matching optimization and the data management optimization contribute greatly to its performance. However, the execution time of SAMBO when aligning large-scale ontologies with database interaction is still unacceptable.
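The abstract does not detail the parallel matching optimization; a generic sketch of how pairwise concept matching can be parallelized with OpenMP is shown below, with a stand-in similarity measure — it is not SAMBO's actual matcher or data model:

```c
#include <string.h>

/* Stand-in similarity: exact label match scores 1.0, anything else 0.0.
   Real matchers (n-gram, edit distance, structural, ...) are far richer. */
static double similarity(const char *a, const char *b) {
    return strcmp(a, b) == 0 ? 1.0 : 0.0;
}

/* Score every concept pair of two ontologies in parallel; the flat score
   array and the scheduling choice are illustrative only. */
void match_all(const char **ont1, long n1, const char **ont2, long n2,
               double *score /* n1 * n2 entries */) {
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (long i = 0; i < n1; i++)
        for (long j = 0; j < n2; j++)
            score[i * n2 + j] = similarity(ont1[i], ont2[j]);
}
```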
94

Transformations de programme automatiques et source-à-source pour accélérateurs matériels de type GPU / Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators

Amini, Mehdi, 13 December 2012
Since the beginning of the 2000s, the raw performance of processor cores has stopped increasing exponentially. Modern graphics processing units (GPUs) are designed as arrays of hundreds or even thousands of compute units. Their compute capacity quickly led them to be diverted from their original purpose, display rendering, and used as accelerators for general-purpose computation. However, programming a GPU efficiently for computations other than 3D rendering remains challenging. The current jungle in the hardware ecosystem is mirrored in the software world, with more and more programming models, languages and APIs, yet no one-size-fits-all solution has emerged. This thesis proposes a compiler-based solution to partially address the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated through numerous experiments. Programmability and portability are enforced by construction, and while the performance is not always what an expert developer would obtain, it remains excellent over a wide range of kernels and applications. A survey of GPU architectures and of trends in language and framework design is presented. Data movement between the host and the accelerator is managed without involving the developer: an optimization algorithm is proposed that sends data to the GPU as early as possible and keeps it there as long as it is not required by the host. Loop transformation techniques are used for kernel code generation, and even well-known, proven transformations have to be adapted to GPU-specific constraints. They are combined in a coherent way and scheduled within the flow of an interprocedural compiler. Preliminary work on extending the approach to multiple GPUs is also presented.
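The communication-placement idea — copy data to the accelerator as early as possible and leave it resident until the host needs it — can be sketched roughly as follows; the accel_* helpers and the two kernels are hypothetical stand-ins (stubbed here with host memory so the sketch compiles), not Par4All's real runtime API:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accelerator runtime, stubbed with host memory so the sketch
   builds; a real backend would map these to CUDA or OpenCL calls. */
static void *accel_alloc(size_t bytes)                              { return malloc(bytes); }
static void  accel_free(void *dev)                                  { free(dev); }
static void  copy_to_accel(void *dev, const void *h, size_t bytes)  { memcpy(dev, h, bytes); }
static void  copy_from_accel(void *h, const void *dev, size_t bytes){ memcpy(h, dev, bytes); }

/* Stand-ins for two generated GPU kernels operating on the same array. */
static void kernel_step1(double *dev, size_t n) { for (size_t i = 0; i < n; i++) dev[i] *= 2.0; }
static void kernel_step2(double *dev, size_t n) { for (size_t i = 0; i < n; i++) dev[i] += 1.0; }

void run(double *a, size_t n) {
    double *dev_a = accel_alloc(n * sizeof(double));

    /* Naively generated code would copy in and out around each kernel call.
       The optimized placement copies in once, as early as possible... */
    copy_to_accel(dev_a, a, n * sizeof(double));

    kernel_step1(dev_a, n);      /* data stays resident on the device between kernels */
    kernel_step2(dev_a, n);

    /* ...and copies back only at the point where the host needs the result. */
    copy_from_accel(a, dev_a, n * sizeof(double));
    accel_free(dev_a);
}
```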
95

A Data-Parallel Graphics Pipeline Implemented in OpenCL / En Data-Parallell Grafikpipeline Implementerad i OpenCL

Ek, Joel, January 2012
This report documents the implementation details, results, benchmarks and technical discussion of the work carried out for a master's thesis at Linköping University, which explores software rendering in the age of parallel computing. Using the Open Computing Language (OpenCL), a complete graphics pipeline was implemented for use on processing units from different vendors. The pipeline is tile-based, fully configurable and capable of rendering visually compelling images in real time. However, further optimization for parallel architectures is needed, as uneven workloads drastically decrease the overall performance of the pipeline.
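A central step of any tile-based pipeline is binning primitives to screen tiles; the following C sketch shows the general idea with a bounding-box test, using an illustrative tile size and a simplified per-tile counter rather than the thesis's actual OpenCL implementation:

```c
#include <math.h>

#define TILE 16   /* illustrative tile size in pixels */

typedef struct { float x, y; } Vec2;
typedef struct { Vec2 v[3]; } Triangle;

/* Mark every screen tile whose rectangle overlaps the triangle's bounding box.
   bins holds one counter per tile; a real binner would record triangle indices
   per tile so each tile can later be rasterized independently. */
void bin_triangle(const Triangle *tri, int width, int height, int *bins) {
    int tiles_x = (width + TILE - 1) / TILE;

    float minx = fminf(fminf(tri->v[0].x, tri->v[1].x), tri->v[2].x);
    float maxx = fmaxf(fmaxf(tri->v[0].x, tri->v[1].x), tri->v[2].x);
    float miny = fminf(fminf(tri->v[0].y, tri->v[1].y), tri->v[2].y);
    float maxy = fmaxf(fmaxf(tri->v[0].y, tri->v[1].y), tri->v[2].y);

    if (maxx < 0.0f || maxy < 0.0f || minx >= (float)width || miny >= (float)height)
        return;   /* fully off-screen */

    /* Clamp to the screen and convert pixel coordinates to tile coordinates. */
    int tx0 = (int)fmaxf(0.0f, minx) / TILE;
    int ty0 = (int)fmaxf(0.0f, miny) / TILE;
    int tx1 = (int)fminf((float)(width  - 1), maxx) / TILE;
    int ty1 = (int)fminf((float)(height - 1), maxy) / TILE;

    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++)
            bins[ty * tiles_x + tx] += 1;   /* this tile must process the triangle */
}
```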
96

Garbage Collection supporting automatic JIT parallelization in JVM

Österlund, Erik, January 2012
With the increase in CPU clock rates coming to an end, a need for parallelization has emerged. This thesis proposes a dynamic purity analysis of objects that detects independent execution paths which may be run in parallel. The analysis relies on speculative guesses and may be rolled back when proven wrong. It piggybacks on an efficient replicating garbage collector integrated into the JVM. The efficiency of the algorithms is demonstrated in benchmarks and is comparable to the speed of state-of-the-art garbage collectors in the HotSpot JVM. With this dynamic purity analysis accessible to Java programs, automatic JIT parallelization of pure methods becomes possible.
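The speculate/validate/rollback pattern underlying the analysis can be illustrated, far from the JVM machinery of the thesis, with a toy C sketch in which two calls run in parallel on private copies and commit only if their recorded write sets do not overlap:

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define N 8

typedef struct {
    double heap[N];   /* private copy of the shared data */
    int wrote[N];     /* write set recorded during speculative execution */
    int lo, hi;       /* slots this call ends up touching */
} Spec;

static void *speculative_call(void *arg) {
    Spec *s = (Spec *)arg;
    for (int i = s->lo; i < s->hi; i++) {   /* stands in for the method body */
        s->heap[i] += 1.0;
        s->wrote[i] = 1;
    }
    return NULL;
}

int main(void) {
    double shared[N] = {0};
    Spec a = { .lo = 0, .hi = 5 }, b = { .lo = 4, .hi = N };   /* overlap at slot 4 */
    memcpy(a.heap, shared, sizeof shared);
    memcpy(b.heap, shared, sizeof shared);

    pthread_t ta, tb;
    pthread_create(&ta, NULL, speculative_call, &a);
    pthread_create(&tb, NULL, speculative_call, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    int conflict = 0;                        /* validate: are the write sets disjoint? */
    for (int i = 0; i < N; i++) conflict |= (a.wrote[i] && b.wrote[i]);

    if (!conflict) {                         /* commit the speculative results */
        for (int i = 0; i < N; i++) {
            if (a.wrote[i])      shared[i] = a.heap[i];
            else if (b.wrote[i]) shared[i] = b.heap[i];
        }
    } else {                                 /* rollback: discard copies, redo sequentially */
        for (int i = a.lo; i < a.hi; i++) shared[i] += 1.0;
        for (int i = b.lo; i < b.hi; i++) shared[i] += 1.0;
    }
    printf("conflict=%d shared[4]=%.1f\n", conflict, shared[4]);
    return 0;
}
```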
97

Etude du couplage convection-rayonnement en cavité différentiellement chauffée à haut nombre de Rayleigh en ambiances habitables / Convection-radiation coupling in differentially heated cavity at high Rayleigh number in building situations

Cadet, Laurent, 7 December 2015
The influence of radiative transfer on natural convection flows in building-scale cavities is studied numerically in turbulent regimes. The study uses DNS and LES approaches for the convection problem and a discrete ordinates method (DOM), combined with the SLW real-gas model, to solve the radiative problem. The configuration studied is based on an experimental differentially heated air-filled cavity located at the Pprime Institute, with a vertical aspect ratio of 4, for Rayleigh numbers ranging from 1.5×10⁹ to 1.2×10¹¹. The first part of the study focuses on hybrid MPI + OpenMP parallelization of the DOM. The methods developed show performance improvements of 13% to 1600% over the classical wavefront method at high levels of hybridization. A study of the coupling between convection and wall radiation is then carried out through a sensitivity analysis of the flow with respect to the wall emissivities for different values of the Rayleigh number. Gas radiation is then added, and its impact is assessed by varying the relative humidity of the dry air/water vapour mixture. The results are compared with the case of a convectively adiabatic cavity (i.e. zero convective flux at the passive walls). Radiative transfer reduces the central thermal stratification and increases the overall dynamics of the flow. The emissivity of the passive walls mainly drives the location of the laminar-turbulent transition on the active walls and the central stratification, while gas radiation seems to affect only the boundary layers of the horizontal walls.
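A hybrid MPI + OpenMP decomposition of the kind mentioned above can be sketched as follows — MPI ranks over ordinate directions, OpenMP threads over cells — with purely illustrative sizes and a stand-in per-cell computation rather than the thesis's actual DOM sweep:

```c
#include <mpi.h>
#include <stdio.h>

#define NDIR  24      /* illustrative number of discrete-ordinate directions */
#define NCELL 100000  /* illustrative number of mesh cells */

/* Stand-in for one direction's contribution to the radiative source term of a cell. */
static double direction_contribution(int dir, int cell) {
    return (double)(dir + 1) / (cell + 1);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Outer MPI level: each rank handles a subset of ordinate directions.
       Inner OpenMP level: cells within a direction are swept by threads. */
    double local_sum = 0.0;
    for (int dir = rank; dir < NDIR; dir += size) {
        #pragma omp parallel for reduction(+ : local_sum) schedule(static)
        for (int cell = 0; cell < NCELL; cell++)
            local_sum += direction_contribution(dir, cell);
    }

    /* Combine the per-rank partial sums (a real solver would exchange fields). */
    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("integrated quantity = %f\n", total);

    MPI_Finalize();
    return 0;
}
```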
98

Résolution des équations de Maxwell tridimensionnelles instationnaires sur architecture massivement multicoeur / Resolution of tridimensional instationary Maxwell's equations on massively multicore architecture

Strub, Thomas, 13 March 2015
This thesis is part of a dual-innovation RAPID project funded by DGA/DS/MRIS, called GREAT, involving the company Axessim, ONERA, INRIA, IRMA and the CEA. The project aims to build an industrial electromagnetic simulation tool based on a parallel Discontinuous Galerkin (DG) method on hexahedral meshes. First, a numerical scheme suited to a system of conservation laws is established; this approach applies to the Maxwell equations but also to any hyperbolic system. Second, a two-level parallelization of the scheme is set up. On the one hand, the computations are parallelized on graphics cards using the OpenCL library; on the other hand, several graphics cards can be used, each driven by an MPI process. In addition, the MPI communications and the OpenCL computations are launched asynchronously and overlapped, which yields a strong speed-up.
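The asynchronous overlap of MPI communication with device computation can be sketched with non-blocking MPI calls; the GPU kernel is stubbed as a host function here (in the real code it would be a non-blocking OpenCL enqueue), and the halo size and periodic neighbours are illustrative assumptions:

```c
#include <mpi.h>
#include <stdio.h>

#define NGHOST 1024   /* illustrative size of the halo exchanged with a neighbour */

/* Stand-in for launching the GPU kernel on the interior cells; the real code
   would issue a non-blocking kernel enqueue on an OpenCL command queue. */
static void compute_interior(double *field, int n) {
    for (int i = 0; i < n; i++) field[i] += 1.0;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* periodic neighbours, for illustration */
    int right = (rank + 1) % size;

    double send[NGHOST] = {0}, recv[NGHOST] = {0};
    double interior[NGHOST] = {0};

    /* Start the halo exchange without blocking... */
    MPI_Request req[2];
    MPI_Irecv(recv, NGHOST, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(send, NGHOST, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* ...and overlap it with the computation that does not need the halo. */
    compute_interior(interior, NGHOST);

    /* Only the boundary terms wait for the communication to complete. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    /* ...boundary flux computation using recv[] would run here... */

    if (rank == 0) printf("interior[0] = %f\n", interior[0]);
    MPI_Finalize();
    return 0;
}
```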
99

Fast and flexible compilation techniques for effective speculative polyhedral parallelization / Techniques de compilation flexibles et rapides pour la parallelization polyédrique et spéculative

Martinez Caamaño, Juan Manuel, 29 September 2016
In this thesis, we present our contributions to APOLLO, an automatic parallelization compiler that combines polyhedral optimization with thread-level speculation to optimize dynamic codes on the fly. Thanks to an online profiling phase and a speculative model of the target code's memory behaviour, Apollo is able to select an optimization and to generate code based on it. During execution of the optimized code, Apollo constantly verifies the validity of the speculative model. The main contribution of this thesis is a code generation mechanism, called Code-Bones, that can instantiate any polyhedral transformation at runtime without incurring a major time overhead. This mechanism is now in use inside Apollo and provides significant performance benefits compared to other approaches.
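The speculation scheme can be illustrated with a toy C sketch: an indirect access stream is assumed to follow the linear model found during profiling, each chunk is verified against that model before its optimized body runs, and a violated chunk falls back to the original sequential code. All names, the chunking and the model are illustrative, not Apollo's actual implementation:

```c
#include <stdio.h>
#include <stdbool.h>

#define N 1024
#define CHUNK 256

/* The profiling phase is assumed to have observed that the accesses a[idx[i]]
   follow the linear model idx[i] = base + stride * i. */
static bool run_chunk(const long *idx, double *a, long start, long base, long stride) {
    long end = start + CHUNK < N ? start + CHUNK : N;
    for (long i = start; i < end; i++)   /* verify the speculation first */
        if (idx[i] != base + stride * i)
            return false;                /* model violated: cancel this chunk */
    for (long i = start; i < end; i++)   /* model holds: this loop could now be
                                            parallelized safely */
        a[idx[i]] += 1.0;
    return true;
}

int main(void) {
    static long idx[N];
    static double a[N];
    for (long i = 0; i < N; i++) idx[i] = i;   /* the accesses really are linear... */
    idx[700] = 3;                              /* ...until one of them is not */

    long base = 0, stride = 1;                 /* model produced by online profiling */
    for (long start = 0; start < N; start += CHUNK) {
        if (!run_chunk(idx, a, start, base, stride)) {
            printf("chunk at %ld mis-speculated, re-executing original code\n", start);
            long end = start + CHUNK < N ? start + CHUNK : N;
            for (long i = start; i < end; i++) a[idx[i]] += 1.0;   /* sequential fallback */
        }
    }
    return 0;
}
```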
100

Parallel Evaluation of Numerical Models for Algorithmic Trading

Ligr, David, January 2016
This thesis addresses the problem of the parallel evaluation of algorithmic trading models based on multiple-kernel support vector regression. Various approaches to parallelizing the evaluation of these models are proposed, and their suitability for highly parallel architectures, namely the Intel Xeon Phi coprocessor, is analysed, taking into account the specifics of this coprocessor and of its programming. Based on this analysis, a prototype is implemented and its performance is compared against serial and multi-core baselines in a set of experiments.
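The evaluation itself parallelizes naturally, since each support-vector term of the prediction is independent; the following C/OpenMP sketch shows the idea with two toy kernels and made-up sizes — it is not the thesis's model or code, and a Xeon Phi port would additionally rely on vectorizing the inner loops:

```c
#include <math.h>
#include <stdio.h>

#define NSV 4096   /* illustrative number of support vectors */
#define DIM 32     /* illustrative feature dimension */

/* Two toy kernels; prediction(x) = bias + sum_i alpha_i * (w_rbf*K_rbf + w_lin*K_lin). */
static double rbf(const double *a, const double *b, double gamma) {
    double d = 0.0;
    for (int j = 0; j < DIM; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
    return exp(-gamma * d);
}

static double linear(const double *a, const double *b) {
    double d = 0.0;
    for (int j = 0; j < DIM; j++) d += a[j] * b[j];
    return d;
}

/* Each support-vector term is independent, so the sum parallelizes across the
   many threads of a coprocessor with a simple OpenMP reduction. */
static double predict(double sv[NSV][DIM], const double *alpha, const double *x,
                      double w_rbf, double w_lin, double gamma, double bias) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum) schedule(static)
    for (int i = 0; i < NSV; i++)
        sum += alpha[i] * (w_rbf * rbf(sv[i], x, gamma) + w_lin * linear(sv[i], x));
    return sum + bias;
}

int main(void) {
    static double sv[NSV][DIM], alpha[NSV];
    double x[DIM];
    for (int i = 0; i < NSV; i++) {
        alpha[i] = 0.001;
        for (int j = 0; j < DIM; j++) sv[i][j] = 0.1 * j;
    }
    for (int j = 0; j < DIM; j++) x[j] = 0.1 * j;

    printf("prediction = %f\n", predict(sv, alpha, x, 0.5, 0.5, 0.1, 0.0));
    return 0;
}
```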
