Global ETD Search

121	Mouvement de données et placement des tâches pour les communications haute performance sur machines hiérarchiques Moreaud, Stéphanie 12 October 2011 (has links) Les architectures des machines de calcul sont de plus en plus complexes et hiérarchiques, avec des processeurs multicœurs, des bancs mémoire distribués, et de multiples bus d'entrées-sorties. Dans le cadre du calcul haute performance, l'efficacité de l'exécution des applications parallèles dépend du coût de communication entre les tâches participantes qui est impacté par l'organisation des ressources, en particulier par les effets NUMA ou de cache.Les travaux de cette thèse visent à l'étude et à l'optimisation des communications haute performance sur les architectures hiérarchiques modernes. Ils consistent tout d'abord en l'évaluation de l'impact de la topologie matérielle sur les performances des mouvements de données, internes aux calculateurs ou au travers de réseaux rapides, et pour différentes stratégies de transfert, types de matériel et plateformes. Dans une optique d'amélioration et de portabilité des performances, nous proposons ensuite de prendre en compte les affinités entre les communications et le matériel au sein des bibliothèques de communication. Ces recherches s'articulent autour de l'adaptation du placement des tâches en fonction des schémas de transfert et de la topologie des calculateurs, ou au contraire autour de l'adaptation des stratégies de mouvement de données à une répartition définie des tâches. Ce travail, intégré aux principales bibliothèques MPI, permet de réduire de façon significative le coût des communications et d'améliorer ainsi les performances applicatives. Les résultats obtenus témoignent de la nécessité de prendre en compte les caractéristiques matérielles des machines modernes pour en exploiter la quintessence. / The emergence of multicore processors led to an increasing complexity inside the modern servers, with many cores, distributed memory banks and multiple Input/Output buses. The execution time of parallel applications depends on the efficiency of the communications between computing tasks. On recent architectures, the communication cost is largely impacted by hardware characteristics such as NUMA or cache effects. In this thesis, we propose to study and optimize high performance communication on hierarchical architectures. We first evaluate the impact of the hardware affinities on data movement, inside servers or across high-speed networks, and for multiple transfer strategies, technologies and platforms. We then propose to consider affinities between hardware and communicating tasks inside the communication libraries to improve performance and ensure their portability. To do so,we suggest to adapt the tasks binding according to the transfer method and thetopology, or to adjust the data transfer strategies to a defined task distribution. Our approaches have been integrated in some main MPI implementations. They significantly reduce the communication costs and improve the overall application performance. These results highlight the importance of considering hardware topology for nowadays servers. Calcul intensif Communication réseau Mémoire partagée Mpi Multiprocesseur Numa Mulicœur Affinité matérielle Topologie High Performance Computing Network communication Shared memory Mpi Multiprocessor Numa Multicore Hardware affinity Topology
122	Monitorovací systém mobilních jednotek / Monitoring System for the Mobile Units Ševčík, Pavel January 2011 (has links) This thesis deals with real-time image processing including preprocessing, segmentation and classification of objects. On the basis of classification is determined rotation and position of objects. The aim of this project is to develop a modular application which will be able to monitor mobile units and determine their rotation and position in real time.
123	Portierbare numerische Simulation auf parallelen Architekturen Rehm, W. 30 October 1998 (has links) The workshop ¨Portierbare numerische Simulationen auf parallelen Architekturen¨ (¨Portable numerical simulations on parallel architectures¨) was organized by the Fac- ulty of Informatics/Professorship Computer Architecture at 18 April 1996 and held in the framework of the Sonderforschungsbereich (Joint Research Initiative) ¨Numerische Simulationen auf massiv parallelen Rechnern¨ (SFB 393) (¨Numerical simulations on massiv parallel computers¨) ( http://www.tu-chemnitz.de/~pester/sfb/sfb393.html ) The SFB 393 is funded by the German National Science Foundation (DFG). The purpose of the workshop was to bring together scientists using parallel computing to provide integrated discussions on portability issues, requirements and future devel- opments in implementing parallel software efficiently as well as portable on Clusters of Symmetric Multiprocessorsystems. I hope that the present paper gives the reader some helpful hints for further discussions in this field. info:eu-repo/classification/ddc/004 ddc:004 Benchmarking Parallel Computing FEM Shared Memory Implementations Message Passing Efficiency, MSC 68M99 MSC 68Q22
124	Parallellisering av Sliding Extensive Cancellation Algorithm (ECA-S) för passiv radar med OpenMP / Parallelization of Sliding Extensive Cancellation Algorithm (ECA-S) for Passive Radar with OpenMP Johansson Hultberg, Andreas January 2021 (has links) Software parallelization has gained increasing interest since the transistor manufacturing of smaller chips within an integrated circuit has begun to stagnate. This has led to the development of new processing units with an increasing number of cores. Parallelization is an optimization technique that allows the user to utilize parallel processes in order to streamline algorithm flows. This study examines the performance benefits that a passive bistatic radar system can obtain by parallelization and code refactorization. The study focuses mainly on investigating the use of parallel instructions within a shared memory model on a Central Processing Unit (CPU) with the use of an application programming interface, namely OpenMP. Quantitative data is collected to compare the runtime of the most central algorithm in the passive radar system, namely the Extensive Cancellation Algorithm (ECA). ECA can be used to suppress unwanted clutter in the surveillance signal, which purpose is to create clear target detections of airborne objects. The algorithm on the other hand is computationally demanding, which has led to the development of faster versions such as the Sliding ECA (ECA-S). Despite the ongoing development, the algorithm is still relatively computationally demanding which can lead to long execution times within the radar system. In this study, a MATLAB implementation of ECA-S is transformed to C in order to take advantage of the fast execution time of the procedural programming language. Parallelism is introduced within the converted algorithm by the use of Intel's thread methodology and then applied within two different operating systems. The study shows that a speedup can be obtained, in the programming language C, by a factor of 24 while still ensuring the correctness of the results. The results also showed that code refactorization of a MATLAB algorithm could result in 73% faster code and that C-MEX implementations are twice as slow as a C-implementation. Finally, the study pointed out that real-time can be achieved for a passive bistatic radar system with the use of the programming language C and by using parallel instructions within a shared memory model on a CPU. / Parallellisering av mjukvara har fått ett ökat intresse sedan transistortillverkningen av mindre chip inom en integrerade krets har börjat att stagnera. Detta har lett till utveckling av moderna processorer med ett ökande antal av kärnor. Parallellisering är en optimeringsteknik vilken tillåter användaren att utnyttja parallella processer till att effektivisera algoritmflöden. Denna studie undersöker de tidsmässiga fördelar ett passivt bistatiskt radarsystem kan erhålla genom att, bland annat tillämpa parallellisering och omformning. Studien fokuserar främst på att undersöka användandet av parallella trådar inom det delade minnesutrymmet på en centralprocessor (CPU), detta med hjälp av applikationsprogrammeringsgränssnittet OpenMP. Kvantitativa jämförelser tas fram med hjälp av en av de mest centrala algoritmerna inom det passiva radarsystemet, nämligen Extensive Cancellation Algorithm (ECA). ECA kan används till att undertrycka oönskat klotter i övervakningssignalen, vilket har till syfte att skapa klara måldetektioner av luftföremål. Algoritmen är däremot beräkningstung, vilket har medfört utveckling av snabbare versioner som exempelvis Sliding ECA (ECA-S). Trots utvecklingen är algoritmen fortfarande relativt beräkningstung och kan medföra en lång exekeveringstid inom hela radarsystemet. I denna studie transformeras en MATLAB-implementation av ECA-S till C för att kunna dra nytta av den snabba exekeveringstiden i det procedurella programmeringsspråket. Parallellism införs inom den transformerade algoritmen med hjälp av Intels trådmetodik och appliceras sedan inom två olika operativsystem. Studien visar på en tidsmässig optimering i C med upp till 24 gånger snabbare exekeveringstid och bibehållen noggrannhet. Resultaten visade även på att en enklare omformning av en MATLAB-algoritm kunde resultera till 73% snabbare kod och att en C-MEX-implementation är dubbelt så långsam i jämförelse med en C-implementering. Slutligen pekade studien på att realtid kan uppnås för ett passivt bistatiskt radarsystem vid användandet av programmeringsspråket C och med utnyttjandet av parallella instruktioner inom det delade minnet på en CPU. Extensive Cancellation Algorithm (ECA) ECA-S Passive Radar Optimization OpenMP Parallel Processing Shared Memory Model Central Processing Unit (CPU) Signal Processing Signalbehandling Computer Engineering Datorteknik Software Engineering Programvaruteknik
125	Formalisation et automatisation de YAO, générateur de code pour l’assimilation variationnelle de données / Formalisation and automation of YAO, code generator for variational data assimilation Nardi, Luigi 08 March 2011 (has links) L’assimilation variationnelle de données 4D-Var est une technique très utilisée en géophysique, notamment en météorologie et océanographie. Elle consiste à estimer des paramètres d’un modèle numérique direct, en minimisant une fonction de coût mesurant l’écart entre les sorties du modèle et les mesures observées. La minimisation, qui est basée sur une méthode de gradient, nécessite le calcul du modèle adjoint (produit de la transposée de la matrice jacobienne avec le vecteur dérivé de la fonction de coût aux points d’observation). Lors de la mise en œuvre de l’AD 4D-Var, il faut faire face à des problèmes d’implémentation informatique complexes, notamment concernant le modèle adjoint, la parallélisation du code et la gestion efficace de la mémoire. Aﬁn d’aider au développement d’applications d’AD 4D-Var, le logiciel YAO qui a été développé au LOCEAN, propose de modéliser le modèle direct sous la forme d’un graphe de ﬂot de calcul appelé graphe modulaire. Les modules représentent des unités de calcul et les arcs décrivent les transferts des données entre ces modules. YAO est doté de directives de description qui permettent à un utilisateur de décrire son modèle direct, ce qui lui permet de générer ensuite le graphe modulaire associé à ce modèle. Deux algorithmes, le premier de type propagation sur le graphe et le second de type rétropropagation sur le graphe permettent, respectivement, de calculer les sorties du modèle direct ainsi que celles de son modèle adjoint. YAO génère alors le code du modèle direct et de son adjoint. En plus, il permet d’implémenter divers scénarios pour la mise en œuvre de sessions d’assimilation.Au cours de cette thèse, un travail de recherche en informatique a été entrepris dans le cadre du logiciel YAO. Nous avons d’abord formalisé d’une manière plus générale les spécifications deYAO. Par la suite, des algorithmes permettant l’automatisation de certaines tâches importantes ont été proposés tels que la génération automatique d’un parcours “optimal” de l’ordre des calculs et la parallélisation automatique en mémoire partagée du code généré en utilisant des directives OpenMP. L’objectif à moyen terme, des résultats de cette thèse, est d’établir les bases permettant de faire évoluer YAO vers une plateforme générale et opérationnelle pour l’assimilation de données 4D-Var, capable de traiter des applications réelles et de grandes tailles. / Variational data assimilation 4D-Var is a well-known technique used in geophysics, and in particular in meteorology and oceanography. This technique consists in estimating the control parameters of a direct numerical model, by minimizing a cost function which measures the misﬁt between the forecast values and some actual observations. The minimization, which is based on a gradient method, requires the computation of the adjoint model (product of the transpose Jacobian matrix and the derivative vector of the cost function at the observation points). In order to perform the 4DVar technique, we have to cope with complex program implementations, in particular concerning the adjoint model, the parallelization of the code and an efﬁcient memory management. To address these difﬁculties and to facilitate the implementation of 4D-Var applications, LOCEAN is developing the YAO framework. YAO proposes to represent a direct model with a computation ﬂow graph called modular graph. Modules depict computation units and edges between modules represent data transfer. Description directives proper to YAO allow a user to describe its direct model and to generate the modular graph associated to this model. YAO contains two core algorithms. The ﬁrst one is a forward propagation algorithm on the graph that computes the output of the numerical model; the second one is a back propagation algorithm on the graph that computes the adjoint model. The main advantage of the YAO framework, is that the direct and adjoint model programming codes are automatically generated once the modular graph has been conceived by the user. Moreover, YAO allows to cope with many scenarios for running different data assimilation sessions.This thesis introduces a computer science research on the YAO framework. In a ﬁrst step, we have formalized in a more general way the existing YAO speciﬁcations. Then algorithms allowing the automatization of some tasks have been proposed such as the automatic generation of an “optimal” computational ordering and the automatic parallelization of the generated code on shared memory architectures using OpenMP directives. This thesis permits to lay the foundations which, at medium term, will make of YAO a general and operational platform for data assimilation 4D-Var, allowing to process applications of high dimensions. Assimilation variationnelle de données Modèle numérique Modèle adjoint Génération automatique Parallélisation automatique Mémoire partagée OpenMP Variational data assimilation Numerical model Adjoint model Automatic generation Automatic parallelization Shared memory OpenMP
126	High Performance and Scalable Cooperative Communication Middleware for Next Generation Architectures Chakraborty, Sourav 10 October 2019 (has links) No description available. Computer Science Computer Engineering
127	Parallel Processing of Reactive Transport Models Using OpenMP McLaughlin, Jared D. 20 March 2008 (has links) (PDF) Transport codes are beginning to be parallelized in order to allow more complex add-ons, such as geochemical packages, to utilize finer, more accurate grids, and to reduce solution times making stochastic and Monte Carlo simulations more feasible. Most codes parallelized via MPI (message passing interface) offer good results, but require the development of a new parallel code. OpenMP, the shared-memory standard, offers incremental parallelization, allowing sequential codes to remain relatively intact with minimal changes or additions. OpenMP allows speedup to be seen on personal computers with dual processors or greater, unlike some other parallelization approaches that require a supercomputer. An operator-split strategy creates an environment for easy parallelization by decoupling the transport and reactions of species. The transport, when decoupled from the reactions, is dependent on surrounding nodes and not on species. Therefore, each species transport can be solved on a different processor. The reactions, when decoupled from the transport, are dependant on the other species concentrations and not on the surrounding nodes, allowing the concentrations for all species to be solve for at a given node as if in a batch reactor. This allows a parallelization of the nodes. Two codes are parallelized in this work. The first is a 100-species 1D theoretical problem. The second is RT3D, a modular computer code for simulating reactive multi-species transport in 3-dimensional groundwater systems written and developed by Dr. T. Prabhakar Clement. RT3D is a sub-component of a parent code, MT3DMS, which utilizes RT3D to solve reaction terms. A speedup factor of 3.91 is seen on four processors, accomplishing a processor efficiency of approximately 98% while spent in RT3D itself. reactive transport OpenMP RT3D parallel speedup shared-memory multiprocessing modeling TVD flux limiters operator-split advection dispersion Amdahl Civil and Environmental Engineering
128	Parallel Solution of the Subset-sum Problem: An Empirical Study Bokhari, Saniyah S. 21 July 2011 (has links) No description available. Computer Engineering Computer Science Cray XMT Dynamic Programming IBM x3755 Multicore Multithreading NVIDIA FX 5800 OMP Opteron Parallel Algorithms Parallel Computing Shared Memory Subset-sum problem
129	[pt] REVISITANDO MONITORES / [en] REVISITING MONITORS RENAN ALMEIDA DE MIRANDA SANTOS 13 August 2020 (has links) [pt] A maioria das linguagens de programação modernas fornece ferramentas para programação concorrente sem restringir seu uso. Assim, fica a cargo do programador evitar a ocorrência de condições de corrida. Nessa dissertação, revisitamos o modelo de monitores, projetados para prevenir condições de corrida ao limitar o acesso à variáveis compartilhadas, e mostramos que monitores podem ser implementados em linguagens de programação com semântica referencial, dadas as regras de tipagem apropriadas. Nós descrevemos a linguagem de programação Aria, projetada com monitores nativos seguindo a proposta original do modelo. Através da resolução de problemas clássicos de concorrência, nós avaliamos o uso de monitores em Aria para sincronização em diferentes níveis de granularidade, e extendemos a linguagem com novos recursos a fim de contemplar as limitações do modelo envolvendo desempenho e expressividade. / [en] Most current programming languages do not restrict the use of the concurrency primitives they provide, leaving it to the programmer to detect data races. In this dissertation, we revisit the monitor model, which guards against data races by guaranteeing that accesses to shared variables occur only inside monitors, and show that this concept can be implemented in a programming language with referential semantics, given appropriate typing rules. We describe the Aria programming language, designed with native monitors according to these rules. Through the discussion of classic concurrency problems, we evaluate the use of Aria monitors for synchronization at different levels of granularity and extend the language with new features to address the limitations of monitors regarding performance and expressiveness. [pt] IMUTABILIDADE [pt] PROBLEMAS DE CONCORRENCIA CLASSICOS [pt] MONITORES [pt] MEMORIA COMPARTILHADA [pt] CONCORRENCIA [en] IMMUTABILITY [en] CONCURRENT PROGRAMMING LANGUAGES [en] CLASSIC CONCURRENCY PROBLEMS [en] MONITORS [en] SHARED MEMORY [en] CONCURRENCE
130	The HELLS-Join: A Heterogeneous Stream join for ExtremeLy Large windows Karnagel, Tomas, Habich, Dirk, Schlegel, Benjamin, Lehner, Wolfgang 19 September 2022 (has links) Upcoming processors are combining different computing units in a tightly-coupled approach using a unified shared memory hierarchy. This tightly-coupled combination leads to novel properties with regard to cooperation and interaction. This paper demonstrates the advantages of those processors for a stream-join operator as an important data-intensive example. In detail, we propose our HELLS-Join approach employing all heterogeneous devices by outsourcing parts of the algorithm on the appropriate device. Our HELLS-Join performs better than CPU stream joins, allowing wider time windows, higher stream frequencies, and more streams to be joined as before. info:eu-repo/classification/ddc/004 ddc:004

Search results