151

Metodologia de paralelização híbrida do DEM com controle de balanço de carga baseado em curva de Hilbert / Hybrid parallelization methodology for the DEM with Hilbert-curve-based load balance control

CINTRA, Diogo Tenório 29 January 2016 (has links)
This thesis introduces a hybrid parallelization methodology for the Discrete Element Method (DEM) that combines MPI and OpenMP to improve computational performance. The methodology uses domain decomposition strategies to distribute the computation of large-scale models across a cluster, and additionally partitions the workload of each subdomain among threads. This extra level of parallelism aims to reach higher performance by tuning the balance between message passing between processes and thread-based parallelization. The main objective is to reduce the expensive inter-process communication on shared-memory hardware such as modern multi-core processors. The division of work among threads employs a Hilbert space-filling curve (HSFC) to improve data locality and to avoid the overhead of repeatedly sorting the particle array. The numerical simulations presented evaluate domain decomposition schemes, partitioning methods, and mechanisms of memory access control, among other aspects. Distinct partitioning algorithms and parallel solution strategies are investigated for distributed-memory, shared-memory, and hybrid environments. The methodology and the computational tool used in the implementations, the DEMOOP software, provide resources applicable to several engineering problems involving large-scale particle models. Some of these problems are addressed in this thesis, notably particle flows on inclined ramps, in discharge hoppers, and in real landslide scenarios. The results show that hybrid executions generally reach better computational performance than those based on message passing alone, and that the developed hybrid technique achieves good load-balance control among threads. The case studies show good scalability and parallel efficiency. The proposed method allows a configurable execution of DEM models and introduces a combined scheme of improved data locality and iterative workload balancing.
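The Hilbert-curve work division described above can be illustrated with a short sketch (an illustrative assumption, not the DEMOOP code): particles are ranked by the standard Hilbert index of their quantized coordinates, and the curve-ordered list is cut into contiguous chunks, one per thread.

```python
import numpy as np

def hilbert_index(order, x, y):
    """Standard xy -> Hilbert distance on a 2^order x 2^order grid."""
    d = 0
    s = 1 << (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                          # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s >>= 1
    return d

def partition_particles(coords, n_threads, order=10):
    """Sort particles along the curve, then cut equal contiguous chunks."""
    grid = (coords * (1 << order)).astype(int).clip(0, (1 << order) - 1)
    keys = np.array([hilbert_index(order, gx, gy) for gx, gy in grid])
    ranked = np.argsort(keys)                # curve-ordered particle indices
    return np.array_split(ranked, n_threads)

chunks = partition_particles(np.random.rand(1000, 2), n_threads=4)
```

Because nearby particles map to nearby curve positions, contiguous chunks stay spatially compact, which is what preserves data locality without re-sorting the whole particle array at every step.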
152

Laser Triangulation Using Spacetime Analysis

Benderius, Björn January 2007 (has links)
In this thesis, spacetime analysis is applied to laser triangulation in an attempt to eliminate certain artifacts caused mainly by reflectance variations of the surface being measured. It is shown that spacetime analysis eliminates these artifacts almost completely. It is also shown that, thanks to the spacetime analysis, the shape of the laser beam is no longer critical, and that in some cases the laser could probably even be exchanged for a non-coherent light source. Furthermore, experiments running the derived algorithm on a GPU (Graphics Processing Unit) are conducted, with very promising results. The thesis starts by deriving the theory needed for doing spacetime analysis in a laser triangulation setup, taking perspective distortions into account; then several experiments evaluating the method are conducted.
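The key idea — analyzing each sensor pixel through time rather than searching for the laser peak within each frame — can be sketched as follows. This is a deliberately simplified temporal-centroid version under the assumption of a uniform sweep; the thesis derives the full perspective-corrected theory.

```python
import numpy as np

def spacetime_peak_times(stack):
    """stack: (T, H, W) frames of a laser sweep; per-pixel time of peak.

    A temporal intensity centroid per pixel; practical implementations
    typically fit a Gaussian around the temporal peak instead.
    """
    t = np.arange(stack.shape[0], dtype=float)[:, None, None]
    w = stack - stack.min(axis=0, keepdims=True)  # crude background removal
    return (w * t).sum(axis=0) / np.maximum(w.sum(axis=0), 1e-9)

# synthetic sweep: a bright line moving one row per frame
frames = np.zeros((32, 32, 8))
for k in range(32):
    frames[k, k, :] = 1.0
print(spacetime_peak_times(frames)[5, 0])  # ~5.0: row 5 is lit at frame 5
```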
153

Parallelization of ray casting for solar irradiance calculations in urban environments

Eggers, Patrick January 2017 (has links)
The growing number of photovoltaic systems in urban environments creates peaks of energy generation in local energy grids, and these peaks can lead to unwanted instability in the electrical grid. By aligning solar panels differently, such spikes could be avoided. Planning locations for solar panels in urban environments is very time-intensive, as the calculations require a high spatial and temporal resolution. The aim of this thesis is to investigate the decrease in runtime of planning applications achieved by parallelizing ray-casting algorithms. The thesis includes a software tool for professionals and laymen, developed in a user-centered design process, which shows ways to perform these calculations on a graphics processing unit. After creating a computational concept and a software design concept, both were implemented, starting with an implementation of the Möller-Trumbore ray-casting algorithm run with Python on the central processing unit (CPU). The same test, with the same algorithm and the same data, was then performed on the graphics processing unit (GPU) using PyCUDA, a Python wrapper for NVIDIA's Compute Unified Device Architecture (CUDA). Comparing both results showed that parallelizing these calculations and transferring them to the graphics processing unit can decrease the runtime of a software package significantly: in the system setup used, the same calculations were 42 times faster on the GPU than on the CPU. It was also found that other factors, such as the time of the year, the location of the tested points in the data model, the test interval length, and the design of the ray-casting algorithm, have a major impact on performance; in the test scenario, the processing time for the same case at another time of the year increased by a factor of 4. The findings of this thesis can be used in a wide range of software, as they show that computationally intensive calculations can easily be moved out of Python code and executed on another platform, decreasing the runtime significantly and giving the whole software package an enormous speed boost.
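For reference, the Möller-Trumbore intersection test mentioned above can be sketched as follows (a plain-Python/NumPy version, not the thesis's PyCUDA kernel): it solves for the ray parameter t and barycentric coordinates (u, v) of the hit point, rejecting misses early.

```python
import numpy as np

def ray_triangle_intersect(orig, direction, v0, v1, v2, eps=1e-9):
    """Return distance t along the ray to the triangle, or None on a miss."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1.dot(p)
    if abs(det) < eps:                       # ray parallel to triangle plane
        return None
    inv_det = 1.0 / det
    tvec = orig - v0
    u = tvec.dot(p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(tvec, e1)
    v = direction.dot(q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = e2.dot(q) * inv_det
    return t if t > eps else None

# ray from the origin along +z hits a triangle lying in the z=1 plane
t = ray_triangle_intersect(np.zeros(3), np.array([0., 0., 1.]),
                           np.array([-1., -1., 1.]), np.array([2., -1., 1.]),
                           np.array([-1., 2., 1.]))
print(t)  # 1.0
```

For solar irradiance, a sun direction that reaches a test point without hitting any triangle counts as direct illumination; casting one such ray per sun position per point is what makes the workload embarrassingly parallel and GPU-friendly.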
154

Movement sensor using image correlation on a multicore platform

Lind, Christoffer, Green, Jonas, Ingvarsson, Thomas January 2012 (has links)
The purpose of this study was to investigate the possibility of measuring the speed of a vehicle using image correlation. It was identified that a new solution for measuring vehicle speed would open up possibilities for high-precision driving applications, as today's solutions do not give the true speed over ground. It was also the intention to evaluate the performance of the proposed algorithm on a multicore platform. The study was commissioned by Halmstad University. The investigation of image correlation as a method to measure vehicle speed was conducted by applying the proposed algorithm to a sequence of images; the result was compared to reference points in the image sequence to confirm the accuracy. The performance on the multicore platform was measured by counting the clock cycles taken by one measurement cycle of the algorithm. It was found that measuring speed using image correlation has a positional accuracy of close to half a percent. The results also revealed that one measurement cycle of the algorithm could be performed in close to half a millisecond, and that the achieved parallel utilization of the multicore platform was close to eighty-seven percent. It was concluded that the algorithm performed well within the limit of acceptance. The low execution time of a measurement cycle makes it possible to execute the algorithm at a frequency of eighteen hundred hertz; at that frequency, in combination with the camera settings proposed in the thesis, the algorithm would be able to measure speeds close to one thousand one hundred kilometers per hour. The authors recommend that future work focus on investigating the camera parameters in order to optimize both the memory and the computational requirements of the application. It is also recommended to look closer at the algorithm and the possibilities of detecting transversal and angular changes, as this would open up other application areas requiring more than just the speed.
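The measurement principle — estimate the inter-frame image displacement by correlation, then scale by ground resolution and frame rate — can be sketched as below. Phase correlation is used here for brevity and is an assumption, not necessarily the correlator of the thesis.

```python
import numpy as np

def pixel_shift(prev, curr):
    """Integer (dy, dx) displacement of curr relative to prev."""
    cps = np.conj(np.fft.fft2(prev)) * np.fft.fft2(curr)
    cps /= np.maximum(np.abs(cps), 1e-12)        # keep phase only
    corr = np.fft.ifft2(cps).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

def speed_m_per_s(prev, curr, metres_per_pixel, frame_rate_hz):
    dy, dx = pixel_shift(prev, curr)
    return np.hypot(dy, dx) * metres_per_pixel * frame_rate_hz

# synthetic ground texture: curr is prev moved 3 px down, 1 px right
rng = np.random.default_rng(0)
prev = rng.random((64, 64))
curr = np.roll(prev, (3, 1), axis=(0, 1))
print(pixel_shift(prev, curr))  # (3, 1)
```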
155

Parallélisations de méthodes de programmation par contraintes / Parallelizations of constraint programming methods

Menouer, Tarek 26 June 2015 (has links)
In the context of the PAJERO project, we propose in this thesis an external parallelization of a Constraint Programming (CP) solver, using both search parallelization and portfolio parallelization, in order to solve constraint satisfaction and optimization problems. In our work the search parallelization is adapted for deterministic and non-deterministic executions, according to the needs of the user. The principle is to partition the unique search tree generated by one search strategy into a set of sub-trees, on demand, and assign each sub-tree to one computing core. A search strategy here means an algorithm that decides which variable is selected to be assigned in each node of the search tree and that also decides the scheduling of the search. In CP, several search strategies exist, and each one can be better than the others for solving a specific problem; the difficulty lies in how to choose the best strategy. To benefit from the variety of strategies and from the availability of computational resources, another parallelization, called portfolio parallelization, is used. Its principle is to execute N search strategies in parallel; the first strategy to find a solution stops the others. The novelty of our work in the context of the portfolio is to adapt the scheduling of the N strategies in order to favour the most promising strategy — the one most likely to find a solution first — by giving it more cores than the others. The promising strategy is selected using one of two methods: an estimation function that selects the strategy with the smallest search tree, or a learning algorithm that automatically determines the number of cores allocated to each strategy based on training over previously solved instances. We have also proposed a new resource allocation system, based on a scheduling strategy combined with an economic model, for executing several CP applications. These applications are solved using parallel solvers in a cloud computing infrastructure. The originality of this system is that the number of resources allocated to each CP application is determined automatically according to the economic class of the user. The performance obtained with our parallelization methods is illustrated by solving CP problems with the Google OR-Tools solver on top of the parallel Bobpp framework.
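A minimal sketch of the basic portfolio idea — run N strategies concurrently and keep the first answer — might look like this in Python (my own illustration with toy stand-in strategies; the thesis additionally re-allocates cores to the most promising strategy, which is not shown here):

```python
import multiprocessing as mp

def _worker(strategy, problem, results):
    results.put((strategy.__name__, strategy(problem)))

def portfolio_solve(strategies, problem):
    """Run all strategies in parallel; the first one to finish wins."""
    results = mp.Queue()
    procs = [mp.Process(target=_worker, args=(s, problem, results))
             for s in strategies]
    for p in procs:
        p.start()
    winner, solution = results.get()  # blocks until some strategy finishes
    for p in procs:
        p.terminate()                 # cancel the losing strategies
    return winner, solution

def brute_force(problem):             # toy stand-ins for CP search strategies
    return sorted(problem)[0]

def greedy(problem):
    return min(problem)

if __name__ == "__main__":
    print(portfolio_solve([brute_force, greedy], [5, 3, 8]))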
156

Genetické algoritmy – implementace paralelního zpracování / Genetic Algorithms - Implementation of Multiprocessing

Tuleja, Martin January 2018 (has links)
Genetic algorithms are modern algorithms intended to solve optimization problems, with inspiration originating in evolutionary principles in nature. Parallelization of genetic algorithms provides not only faster processing but also new and better solutions, and parallel genetic algorithms are closer to real nature than their sequential counterparts. This thesis describes the most widely used models of parallelization of genetic algorithms, provides a design and implementation in the Python programming language, and verifies the implementation in several test cases.
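As one of the common models, the sketch below shows master-slave (global) parallelization, where only the fitness evaluation is distributed over worker processes. The model choice and the toy objective are illustrative assumptions, not necessarily what the thesis implements.

```python
import random
from multiprocessing import Pool

def fitness(ind):
    """Toy objective: maximize closeness of every gene to 0.5."""
    return -sum((x - 0.5) ** 2 for x in ind)

def step(pop, pool, p_mut=0.1):
    scores = pool.map(fitness, pop)          # parallel fitness evaluation
    ranked = sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)
    parents = [ind for _, ind in ranked[: len(pop) // 2]]
    children = []
    while len(children) < len(pop):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))    # one-point crossover
        child = a[:cut] + b[cut:]
        children.append([g + random.gauss(0, 0.1) if random.random() < p_mut
                         else g for g in child])
    return children

if __name__ == "__main__":
    pop = [[random.random() for _ in range(10)] for _ in range(40)]
    with Pool(4) as pool:
        for _ in range(50):
            pop = step(pop, pool)
    print(max(map(fitness, pop)))
```

The other classic models (island and cellular GAs) change only what is distributed — whole subpopulations with periodic migration, or a spatial neighborhood structure — while the per-generation operators stay the same.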
157

Hardware Accelerated Digital Image Stabilization in a Video Stream

Pacura, Dávid January 2016 (has links)
The goal of this work is to design a new image stabilization technique with hardware acceleration through GPGPU. This technique enables real-time stabilization of video sequences, even for high-resolution video, which is needed to ease further processing in computer vision or in military applications. Because multiple programming models exist for GPGPU, the proposed stabilization algorithm is implemented in the three most widely used ones, and their performance and results are then compared and discussed.
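The generic skeleton of such a stabilizer — estimate per-frame motion, smooth the accumulated camera trajectory, and apply the corrective shift — can be sketched as below. This is a CPU/NumPy illustration only; the thesis's contribution is the GPGPU-accelerated version, whose details are not reproduced here.

```python
import numpy as np

def corrective_shifts(frame_motion, radius=15):
    """frame_motion: (N, 2) per-frame (dy, dx) estimates from any matcher.

    Returns the (N, 2) shift to apply to each frame so the camera path
    follows a moving-average-smoothed trajectory.
    """
    traj = np.cumsum(frame_motion, axis=0)          # raw camera trajectory
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smooth = np.column_stack([np.convolve(traj[:, i], kernel, mode="same")
                              for i in range(2)])
    return smooth - traj                            # correction per frame

# jittery pan: constant rightward motion plus random shake
rng = np.random.default_rng(1)
motion = np.tile([0.0, 1.0], (200, 1)) + rng.normal(0, 2, (200, 2))
print(corrective_shifts(motion)[:3].round(2))
```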
158

Software concepts and algorithms for an efficient and scalable parallel finite element method

Witkowski, Thomas 19 December 2013 (has links)
Software packages for the numerical solution of partial differential equations (PDEs) using the finite element method are important in many fields of research. The underlying data structures and algorithms change over time, as users' requirements grow and the software must make efficient use of the newest, highly parallel computing systems. This is the central point of this work. To use parallel computing systems with a growing number of independent basic computing units, i.e. CPUs, efficiently, we have to combine data structures and algorithms from different areas of mathematics and computer science. Two crucial parts are a distributed mesh and a parallel solver for linear systems of equations. For both, multiple independent approaches exist. In this work we argue that it is necessary to combine them to allow for an efficient and scalable implementation of the finite element method. First, we present concepts, data structures, and algorithms for distributed meshes that allow for local refinement. The central point of our presentation is to provide arbitrary geometrical information about the mesh and its distribution to the linear solver. A large part of the overall computing time of the finite element method is spent in the linear solver, so its parallelization is of major importance. Based on the presented concept for distributed meshes, we present several linear solver methods, concentrating on general-purpose linear solvers that make few assumptions about the systems to be solved. For this, a new FETI-DP (Finite Element Tearing and Interconnect - Dual Primal) method is proposed. Though the standard FETI-DP method is quasi-optimal from a mathematical point of view, it is not possible to implement it efficiently for a large number of processors (> 10,000). The main reason is a relatively small but globally distributed coarse mesh problem. To circumvent this problem, we propose a new multilevel FETI-DP method which hierarchically decomposes the coarse grid problem. This leads to a more local communication pattern for solving the coarse grid problem and makes it possible to scale to a large number of processors. Besides the parallelization of the finite element method, we discuss an approach to speed up serial computations of existing finite element packages. In many computations the PDE to be solved consists of more than one variable; this is especially the case in multi-physics modeling. Observations show that in many of these computations the solution structure of the variables differs, yet in the standard finite element method only one mesh is used for the discretization of all variables. We present a multi-mesh finite element method, which allows a system of PDEs to be discretized with two independently refined meshes.
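The communication localization behind the multilevel coarse problem can be illustrated with a two-level reduction over MPI sub-communicators. This is a schematic sketch with mpi4py; the cluster size and the scalar "coarse contribution" are placeholders, not the thesis's actual FETI-DP operators.

```python
# run with e.g.: mpiexec -n 256 python sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
CLUSTER = 64                               # ranks per first-level cluster (assumed)

contribution = float(rank)                 # placeholder for a local coarse-grid term

# level 1: reduce inside each cluster -> communication stays local
local = comm.Split(color=rank // CLUSTER, key=rank)
cluster_sum = local.reduce(contribution, op=MPI.SUM, root=0)

# level 2: only cluster leaders participate in the global coarse solve
is_leader = local.Get_rank() == 0
leaders = comm.Split(color=0 if is_leader else MPI.UNDEFINED, key=rank)
if is_leader:
    total = leaders.allreduce(cluster_sum, op=MPI.SUM)
    total = local.bcast(total, root=0)     # push the result back down
else:
    total = local.bcast(None, root=0)
```

Only the (few) leaders touch the global level, so the all-to-all pressure of the coarse problem shrinks from the full process count to the number of clusters — the effect the multilevel method exploits.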
159

Paralelizace evolučních algoritmů pomocí GPU / GPU Parallelization of Evolutionary Algorithms

Valkovič, Patrik January 2021 (has links)
Graphics Processing Units stand behind the success of Artificial Neural Networks over the past decade and their broader application in industry. Another promising field of Artificial Intelligence is Evolutionary Algorithms. Their ability to be parallelized is well known and has been successfully applied in practice; however, these attempts focused on multi-core and multi-machine parallelization rather than on the GPU. This work explores the possibilities of parallelizing Evolutionary Algorithms on the GPU. I propose an implementation in the PyTorch library, allowing EAs to execute on both CPU and GPU. The proposed implementation provides the most common evolutionary operators for Genetic Algorithms, Real-Coded Evolutionary Algorithms, and Particle Swarm Optimization Algorithms. Finally, I show that the performance is an order of magnitude faster on the GPU for medium- and big-sized problems and populations.
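A tensorized evolutionary loop of the kind described — selection, crossover, and mutation as batched tensor operations that run unchanged on CPU or GPU — could look roughly like this (a sketch of the approach, not the thesis's library):

```python
import torch

def evolve(pop, fitness_fn, generations=200, mut_std=0.05):
    """pop: (N, D) tensor; every operator is a batched op, so the same
    code runs on CPU or GPU depending on pop.device."""
    n = pop.shape[0]
    for _ in range(generations):
        fit = fitness_fn(pop)                                    # (N,)
        i = torch.randint(0, n, (n,), device=pop.device)
        j = torch.randint(0, n, (n,), device=pop.device)
        winners = torch.where((fit[i] > fit[j]).unsqueeze(1),    # binary tournament
                              pop[i], pop[j])
        mates = winners[torch.randperm(n, device=pop.device)]
        mask = torch.rand_like(winners) < 0.5                    # uniform crossover
        children = torch.where(mask, winners, mates)
        pop = children + mut_std * torch.randn_like(children)    # Gaussian mutation
    return pop

device = "cuda" if torch.cuda.is_available() else "cpu"
pop = torch.rand(4096, 32, device=device)
pop = evolve(pop, lambda p: -(p - 0.3).pow(2).sum(dim=1))        # sphere objective
print(pop.mean().item())  # genes drift toward 0.3
```

Because the whole population is one tensor, the GPU advantage appears exactly where the abstract reports it: for large populations and dimensions, where the batched operators amortize kernel-launch overheads.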
160

Plasma discharge 2D modeling of a Hall thruster / Modélisation bidimensionnelle de la décharge plasma dans un propulseur de Hall

Croes, Vivien 24 October 2017 (has links)
As space applications become increasingly crucial in our daily life, satellite operating costs need to be decreased. This can be achieved through the use of cost-efficient electric propulsion systems. One of the most successful and competitive electric propulsion systems is the Hall effect thruster, but this system is characterized by its complexity and remains poorly understood; some key questions, concerning anomalous electron transport and plasma/wall interactions, are still to be answered. Answers to both questions rest on kinetic mechanisms and thus cannot be obtained with fluid models; furthermore, the temporal and geometrical scales of these mechanisms make them difficult to measure experimentally. Consequently, to answer these questions, we chose to develop a two-dimensional fully kinetic simulation tool. Using a simplified simulation of the Hall effect thruster, we observed the importance of the azimuthal electron drift instability for anomalous cross-field electron transport. Then, using a realistic model of a Hall effect thruster, we studied the effects of plasma/wall interactions on the plasma discharge characteristics and quantified the coupled effects of secondary electron emission and the electron drift instability on the anomalous transport. Through a parametric study of secondary electron emission, three plasma discharge regimes were identified. Finally, the impact of alternative propellants was studied using realistic collision processes.
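Fully kinetic (particle-in-cell) tools of this kind advance each particle's velocity in the local electric and magnetic fields; a standard ingredient is the Boris rotation, sketched below as a generic single-particle step (an illustration of the method family, not code from the thesis):

```python
import numpy as np

def boris_push(v, E, B, q_over_m, dt):
    """One Boris velocity update: half electric kick, magnetic
    rotation, half electric kick. v, E, B are 3-vectors."""
    v_minus = v + 0.5 * q_over_m * dt * E
    t = 0.5 * q_over_m * dt * B
    s = 2.0 * t / (1.0 + t.dot(t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    return v_plus + 0.5 * q_over_m * dt * E

# electron gyrating in a uniform axial B field (SI units)
q_over_m = -1.76e11
v = np.array([1e5, 0.0, 0.0])
B = np.array([0.0, 0.0, 0.02])
for _ in range(100):
    v = boris_push(v, np.zeros(3), B, q_over_m, 1e-12)
print(np.linalg.norm(v))  # speed conserved (~1e5) by the rotation
```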
