Global ETD Search

401	Précision de modèle et efficacité algorithmique : exemples du traitement de l'occultation en stéréovision binoculaire et de l'accélération de deux algorithmes en optimisation convexe / Model accuracy and algorithmic efficiency: examples of occlusion handling in binocular stereovision and the acceleration of two convex optimization algorithms
Tan, Pauline 28 November 2016
This thesis is split into two relatively independent parts.
The first part is devoted to the binocular stereovision problem, specifically to occlusion handling. An analysis of this phenomenon leads to a regularity model that includes a convex visibility constraint. The resulting energy functional is minimized by convex relaxation. The occluded areas are then detected from the horizontal slope of the disparity map and densified. Another method that handles occlusion is the graph cuts method proposed by Kolmogorov and Zabih. Because of its efficiency, we adapted it to two auxiliary problems encountered in stereovision, namely the densification of sparse disparity maps and the subpixel refinement of pixel-accurate maps.
The second part of this thesis studies, more generally, two convex optimization algorithms, for each of which an accelerated variant is proposed. The first is the Alternating Direction Method of Multipliers (ADMM). A slight relaxation of the constraints on the parameters of this method is shown to yield a better theoretical convergence rate. The second is an alternating proximal descent algorithm, which allows the approximate solution of the Rudin-Osher-Fatemi (ROF) pure denoising model to be computed in parallel for color images. A FISTA-like acceleration is also proposed.
Stereovision Occlusion Variational methods Computer vision Convex optimisation Parallel computing
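The FISTA-like acceleration mentioned in this abstract can be illustrated on a much simpler problem. The following is a minimal sketch, not the thesis's algorithm: it applies the standard FISTA momentum scheme to a toy l1-regularised least-squares problem. The names `soft_threshold` and `fista_lasso`, and the choice to have the caller supply the Lipschitz constant `lipschitz`, are assumptions made for illustration.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fista_lasso(A, b, lam, lipschitz, iterations=200):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 with FISTA.

    A is a list of rows and b a list. lipschitz must be an upper bound on
    the largest eigenvalue of A^T A (supplied by the caller in this sketch).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    y = x[:]              # extrapolated point
    t = 1.0               # momentum parameter
    step = 1.0 / lipschitz
    for _ in range(iterations):
        # Gradient of the smooth part at y: A^T (A y - b)
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step taken from the extrapolated point y
        x_new = [soft_threshold(y[j] - step * grad[j], step * lam) for j in range(n)]
        # FISTA momentum update
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x
```

Dropping the momentum update (keeping `y` equal to `x_new`) gives the plain proximal gradient method, which makes it easy to compare the two on the same problem.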
402	Direct Numerical Simulation of bubbles with Adaptive Mesh Refinement with Distributed Algorithms / Simulation numérique directe de bulles sur maillage adaptatif avec algorithmes distribués
Talpaert, Arthur 24 February 2017
This PhD thesis presents an implementation of the simulation of two-phase flows under the conditions of water-cooled nuclear reactors, at the scale of individual bubbles. To that end, we study several models for thermal-hydraulic flows and focus on a technique for capturing the thin interface between the liquid and vapour phases. We then review some possible techniques for Adaptive Mesh Refinement (AMR) and provide algorithmic and computational tools adapted to patch-based AMR, whose aim is to locally improve precision in regions of interest.
More precisely, we introduce a patch-covering algorithm designed with balanced parallel computing in mind. This approach lets us finely capture changes located at the interface, as we show for advection test cases as well as for models with hyperbolic-elliptic coupling. The computations we present also include the simulation of the incompressible Navier-Stokes system, which models the shape changes of the interface between two immiscible fluids.
Numerical simulation Adaptive mesh refinement Parallel computing Two-Phase flow Lid-Driven cavity Bubble
403	Méthodes itératives à retard pour architecture massivement parallèles / Iterative methods with retards for massively parallel architecture
Zhang, Hanyu 29 September 2016
With the rise of multi-core architectures, many algorithms need to be revisited and modified to exploit the power of these new architectures. These algorithms divide the original problem into “small pieces” and distribute these pieces to the available processors; communication among the processors is therefore indispensable to ensure convergence.
My thesis mainly focuses on new methods for solving large sparse systems of linear equations in parallel. These methods are based on the gradient method. Two key parameters of the gradient method are the descent direction and the step length of each iteration. Our methods compute the direction and the step length independently and locally on each computing unit, which requires less synchronization and computation, leads to faster iterations, and makes an asynchronous extension straightforward. With appropriate scaling parameters for the step lengths, convergence can be proved in both the synchronous and asynchronous cases. Numerical tests demonstrate the efficiency of these methods.
The other part of my thesis uses an extrapolation method to accelerate the vector sequences generated by classical iterative algorithms with delays. Although general chaotic sequences may not be accelerated, it is possible to prove that, once the computation and communication pattern (the delay pattern) is fixed during execution, the generated sequence can be accelerated. Numerical tests demonstrate the efficiency of this acceleration.
Parallel computing Synchronization Asynchronization Iterative methods Gradient methods Chaotic relaxation Acceleration
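The two ingredients named in this abstract, the descent direction and the step length, can be seen in the classical synchronous gradient (steepest descent) method for a symmetric positive-definite system. The sketch below shows only that textbook baseline, with a single global step length; it does not reproduce the thesis's locally computed directions and steps. The helper names `matvec`, `dot` and `steepest_descent` are chosen here for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    for k in range(max_iter):
        # Descent direction: the residual r = b - A x
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            return x, k
        # Step length from exact line search along r
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter
```

In this baseline both inner products are global reductions, which is the kind of synchronization point that a locally computed step, as described in the abstract, is meant to reduce.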
404	Environnement décentralisé et protocole de communication pour le calcul intensif sur grille / A decentralized environment and a protocol of communication for high performance computing on grid architecture
Fakih, Bilal 09 November 2018
This thesis aims at designing an environment for the implementation of high performance computing applications on Grid platforms. We are interested in two kinds of applications: loosely synchronous applications and pleasingly parallel applications.
For loosely synchronous applications, we are interested in particular in numerical simulation applications that can be solved via parallel or distributed iterative methods, i.e., synchronous, asynchronous and hybrid iterative methods; for pleasingly parallel applications, we are interested in planning problems. Our work centres on the design and implementation of the decentralized environment GRIDHPC. GRIDHPC exploits all the computing resources (all the available cores of the computing nodes) using OpenMP, as well as the several types of network of the grid platform, such as Ethernet, Infiniband and Myrinet, using the Reconfigurable Multi Network Protocol (RMNP). RMNP can configure itself automatically and dynamically according to application requirements such as the computation scheme (synchronous or asynchronous iterative schemes) and elements of context such as the network topology and the type of network (Ethernet, Infiniband or Myrinet), by choosing the best communication mode between computing nodes and the best network. We present and analyze a set of computational results obtained on the Grid5000 platform for the obstacle problem and the planning problem.
Communication Protocol Grid Computing High Performance Computing Parallel Computing Planning Numerical Simulation
405	Web-based atmospheric nucleation data management and visualization
Zhu, Kai 01 January 2012
Atmospheric nucleation is a phase-transformation process, such as liquid water transforming into solid or gas-phase water, that has a significant impact on many atmospheric and technological processes. In the course of atmospheric nucleation, certain 3D molecular models are generated, which are mainly mixtures of water and hexanol molecules. Analyzing these 3D molecular models can improve the understanding of the nucleation and growth of particles and phases in a multi-component mixture, as well as of changes in climate and weather. Research on atmospheric nucleation can therefore be recast as research on the visualization and comparison of 3D molecular models, that is, on similarity calculations. Unfortunately, research on atmospheric nucleation processes is restricted by the lack of efficient visual data exploration tools.
In this work, the lack of efficient data visualization tools is tackled by implementing our own application to visualize atmospheric nucleation. A similarity calculation for the 3D molecules is implemented in order to analyze and compare the atmospheric nucleation processes and molecular models. Admittedly, there are various 3D molecular similarity calculation algorithms, such as clique-detection algorithms and point matching; however, these algorithms are used specifically in the fields of protein amino acids and pharmacophores. Because of the large scale of the atmospheric nucleation data, GPUs (Graphics Processing Units) are employed to significantly reduce computation times. This is achieved with CUDA (Compute Unified Device Architecture) technology, which allows us to execute our algorithm in parallel.
Furthermore, hypertree visualization is used to enhance the previously developed web-based visualization and analysis tool that allows remote users to effectively mine the wealth of particle-based nucleation simulation data. The research goal is to speed up knowledge discovery and improve users' productivity through effective data visualization techniques and a friendlier user interface design. In addition, a feasible parallel computing solution is developed to overcome the slow response caused by expensive pre-processing of large data. The core research of my thesis is the calculation of the similarity between distinct 3D molecules.
Engineering Physical Sciences and Mathematics
406	Dynamic Soil-Structure Interaction of Soil-Steel Composite Bridges: A Frequency Domain Approach Using PML Elements and Model Updating
FERNANDEZ BARRERO, DIEGO January 2019
This master thesis covers the dynamic soil-structure interaction of soil-steel culverts, applying a methodology based on the frequency domain response. In the first stage of the thesis, field tests were performed on one bridge using controlled excitation. The methodology then draws on previous research, the field tests, finite element models (FEM) and perfectly matched layer (PML) elements.
First, a 2D model of the analysed bridge, Hårestorp, was made to compare its frequency response functions (FRF) with those obtained from the field tests. Simultaneously, a 3D model of the bridge was created for two purposes: to compare it against the 2D model and the field tests, and to implement a model updating procedure with the particle swarm algorithm to calibrate the model parameters. Both models use PML elements, which are verified against previous solutions from the literature. The verification concludes that the PML elements behave correctly except for extreme parameter values.
In the course of this thesis, relatively advanced computation techniques were required to ensure the computational feasibility of the problem with the resources available. To that end, a literature review of the theoretical aspects of parallel computing was performed, as well as of the practical aspects in Comsol. Then, in collaboration with Comsol Support and with the help of PDC at KTH, it was possible to reduce the computational time to a feasible level of around two weeks for the model updating of the 3D model.
The results are inconclusive in terms of finding a perfectly fitting model. Therefore, further research is required to adequately address the problem. Nevertheless, some accelerometers show a considerable level of agreement. This thesis concludes that the 2D models should be discarded because they cannot represent reality correctly, and establishes a model optimisation methodology using Comsol in connection with Matlab.
Dynamics soil-steel bridges soil-structure interaction model updating PML particle swarm corrugated steel plates frequency domain assurance criteria parallel computing 2 Other Engineering and Technologies
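The model updating step in this abstract calibrates model parameters with the particle swarm algorithm. The following is a minimal, generic sketch of particle swarm optimisation, not the thesis's Comsol/Matlab workflow: the `objective` argument stands in for whatever misfit between simulated and measured frequency response functions is being minimised, and the inertia, cognitive and social coefficients are common textbook defaults assumed here.

```python
import random


def particle_swarm(objective, bounds, particles=20, iterations=100, seed=0,
                   inertia=0.7, cognitive=1.5, social=1.5):
    """Minimise objective over a box given as a list of (low, high) pairs.

    Returns the best position found and its objective value.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]                 # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                lo, hi = bounds[d]
                # Velocity: inertia + pull to personal best + pull to global best
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * rng.random() * (best_pos[i][d] - pos[i][d])
                             + social * rng.random() * (g_pos[d] - pos[i][d]))
                # Move and clamp to the admissible parameter range
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

Each objective evaluation is independent within an iteration, so when one evaluation is an expensive finite element run, the inner loop over particles is the natural place to distribute work.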
407	PhD_ShunjiangTao_May2023.pdf
Shunjiang Tao (15209053) 12 April 2023
The broad implementation of three-dimensional full-core modeling, with pin-resolved detail, for computational simulation and analysis of nuclear reactors highlights the importance of accurate and efficient simulation codes. The primary objective of this dissertation is to develop a high-fidelity code capable of solving time-dependent neutron transport problems with 3D whole-core pin-resolved detail in nuclear reactor cores. Additionally, the dissertation explores the optimization of the code's parallelism to enhance its computational efficiency. To reduce the computational intensity associated with the direct 3D calculation of the neutron transport equation, a high-fidelity neutron transport code called PANDAS-MOC is developed using the 2D/1D approach. The 2D radial solution is obtained using the 2D Method of Characteristics (MOC), the axial 1D solution is determined through the Nodal Expansion Method (NEM), and the two solutions are then coupled using transverse leakages to find the 3D solution. The convergence of the iterative scheme is accelerated using the multi-level coarse mesh finite difference (ML-CMFD) technique. The code's validation and verification are carried out using the C5G7-TD benchmark exercises.
The significant and innovative aspect of this work involves parallelizing and optimizing the PANDAS-MOC code. Three parallel models are developed and evaluated based on the distributed memory and shared memory architectures: the MPI parallel model (PMPI), the Segment OpenMP threading hybrid model (SGP), and the Whole-code OpenMP threading hybrid model (WCP). When computing the steady state of the C5G7 3D core with the same resources, the obtained speedup relationship between the three models is PMPI > WCP > SGP, whereas the WCP model consumed only 60% of the memory of the PMPI model. Furthermore, the hybrid reduction in the ML-CMFD solver and the parallelism design of the MOC sweep are significant issues that decreased the speedup of WCP. Therefore, this study also addresses further optimizations of these two modules.
Concerning the MOC parallelism, two improvements are discussed: the No-atomic schedule and Additional Axial Decomposition (AAD) parallelism. The No-atomic schedule evenly distributes the workload among threads and removes the omp atomic directive from the code by predefining the MOC calculation sequence for each launched OpenMP thread while ensuring a thread-safe parallel environment. It can significantly reduce the calculation time and improve parallel efficiency. Furthermore, AAD divides the axial layers and OpenMP threads into multiple groups and restricts each thread to work on the layers designated to the same group.
Meanwhile, the Flag-Save-Update reduction is designed to increase the computational efficiency of the hybrid MPI/OpenMP reduction operations in the ML-CMFD module. It is accomplished by using global arrays and status flags and by establishing a tree configuration of all threads, and it includes no implicit or explicit barriers. In the case of the C5G7 3D core, the parallel efficiency of the MOC solver is about 0.872 when using 32 threads (= #MPI × #OpenMP), and the Flag-Save-Update reduction yields a better speedup than the traditional hybrid MPI/OpenMP reduction, with its advantage growing as more OpenMP threads are used. As a result, the WCP model outperforms the PMPI model for the overall steady-state calculation.
This research also investigates parallelizable preconditioners to accelerate the convergence of the generalized minimal residual method (GMRES) in the CMFD solver. Preconditioners such as Incomplete LU factorization (ILU), Symmetric Successive Over-relaxation (SOR), and Reduced Symmetric Successive Over-Relaxation (RSOR) are implemented in PANDAS-MOC. Except for RSOR, these are unsuitable for hybrid MPI/OpenMP parallel machines because of their inherently sequential nature and their dependency on computation order. Their counterparts using the Red-Black ordering algorithm, namely RB-SOR, RB-RSOR, and RB-ILU, are formulated and examined on benchmark reactors such as TWIGL-2D, C5G7-2D, C5G7-3D, and their corresponding subplane models (TWIGL-2D(5S), C5G7-2D(5S), C5G7-3D(5S)), with relaxed convergence criteria (10^-3). Results show that all preconditioners significantly reduce the number of iterations required to converge the GMRES solutions, and RB-SOR is the best one for most reactors. In the case of C5G7-3D(5S), the preconditioners exhibit similar sublinear speedup but varying runtimes across all tests for both MG-GMRES and 1G-GMRES. However, the speedup results in 1G-GMRES are more than twice as high as those in MG-GMRES. RB-RSOR has an optimal efficiency of 0.6967 at (4,8), while RB-SOR and RB-ILU have optimal efficiencies of 0.6855 and 0.7275 at (32,1), respectively.
Numerical analysis Nuclear physics Neutron transport Parallel computing PANDAS-MOC Large-scale Linear System Method of characteristics CMFD acceleration Preconditioner 2D/1D method Reactor physics
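The Red-Black ordering mentioned in this abstract removes the dependency on computation order by splitting the unknowns into two colours so that each point depends only on points of the other colour. The sketch below illustrates the idea under simplifying assumptions that are not from the dissertation: it uses red-black SOR as a stand-alone solver (not as a GMRES preconditioner) on a 1D Poisson problem on the unit interval, and the function name `red_black_sor` and its parameters are hypothetical.

```python
def red_black_sor(f, left, right, omega=1.5, sweeps=500):
    """Solve -u'' = f on (0, 1) with u(0) = left and u(1) = right.

    f holds the source term at the n interior grid points. Returns the
    grid values including both boundary points.
    """
    n = len(f)
    h = 1.0 / (n + 1)
    u = [left] + [0.0] * n + [right]
    for _ in range(sweeps):
        # Colour 1: odd interior indices, colour 2: even interior indices.
        for start in (1, 2):
            # Every update in this loop reads only points of the other
            # colour, so the iterations are independent of each other and
            # could be shared among threads without atomics or ordering.
            for i in range(start, n + 1, 2):
                gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                u[i] += omega * (gauss_seidel - u[i])
    return u
```

The Python loops here run sequentially; the point of the sketch is only the data dependency pattern, which is what makes the coloured variants suitable for hybrid parallel machines.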
408	Exploring Selective coherence as a Solution to Self-invalidation in ArgoDSM
Edberg, Christopher January 2022
Maintaining coherency in a distributed system can prove challenging, and this is especially true for distributed shared memory systems. In the distributed shared memory software ArgoDSM, the problem with remote synchronization occurs when a lock operation has to cross the boundaries of a node: this causes a large number of self-invalidations (SI) or self-downgrades (SD), which is costly. The performance of the coherency protocol can be improved if the SI/SD situations can be avoided by using a suitable alternative. This work explores whether selective coherence operations and non-synchronizing locking can alleviate the issue of SI and SD in ArgoDSM and so improve performance compared to the cache-wide coherence operations that are triggered by the default locking mechanism in ArgoDSM. The concept is implemented by replacing the standard coherence protocol used in locking operations with selective operations, and this implementation is then used to analyze performance against the baseline software. The selective coherence operations perform better than the default protocol on synchronization-heavy benchmarks, while the baseline software performs better when less parallel work is being done.
ArgoDSM Coherence Performance Selective Coherence Distributed Shared Memory Software-based Distributed Shared Memory DSM Software DSM Computer Architecture Parallel Computing Computer and Information Sciences
409	A Descriptive Performance Model of Small, Low Cost, Diskless Beowulf Clusters
Nielson, Curtis R. 16 September 2003
Commodity supercomputing clusters, known as Beowulf clusters, have become a low-cost alternative to traditional supercomputers. Beowulf clusters combine inexpensive computers and specialized software to achieve supercomputing power. Unlike the nodes in most commodity clusters, the processing nodes in a diskless Beowulf cluster do not have a local hard disk. Research has provided performance information for diskless clusters built with expensive, high-performance equipment. Beowulf clusters use commodity off-the-shelf hardware, and little information is available about their performance. This research includes the construction of several diskless Beowulf clusters. The performance of these clusters was measured using the NAS Parallel Benchmarks. Through analysis of these measurements, a descriptive performance model of diskless Beowulf clusters was produced.
clustered computing parallel computing Beowulf cluster Linux cluster cluster diskless cluster parallel processing NAS parallel benchmark benchmarking Construction Engineering and Management Manufacturing
410	A Conjugate Residual Solver with Kernel Fusion for massive MIMO Detection
Broumas, Ioannis January 2023
This thesis compares a GPU implementation of the Conjugate Residual method built as a sequence of generic library kernels against implementations of the method with custom kernels, in order to expose the performance gains of a key optimization strategy for memory-bound operations, kernel fusion, whose purpose is to reuse the processed data efficiently.
For massive MIMO, the iterative solver is to be employed at the linear detection stage to overcome the computational bottleneck of the matrix inversion required in the equalization process, which is O(n^3) for direct solvers. A detailed analysis is given of how one more of the Krylov subspace methods that is feasible for massive MIMO can be implemented on a GPU as a unified kernel.
Further, the thesis shows that kernel fusion can improve execution performance not only when the input data is large matrices and vectors, as in scientific computing, but also in the case of massive MIMO, and possibly similar cases, where the input data is a large number of small matrices and vectors that must be processed in parallel. In more detail, focusing on the small number of iterations required for the solver to achieve a close enough approximation of the exact solution in the case of massive MIMO, and on the case where the number of users matches the size of a warp, two different approaches are proposed and tested that make it possible to fully unroll the algorithm and gradually fuse all the separate kernels into a single one, until reaching a top-down hardcoded implementation.
To overcome the algorithm's main computational burden, the matrix-vector product, further optimization techniques are tested and proposed to achieve high efficiency and high parallelism, such as two ways of using the fast on-chip memories: preloading the matrix in shared memory and preloading the vector in shared memory.
MIMO massive MIMO GPU CUDA Software Defined Radio SDR MMSE ZF zero-forcing parallel detection iterative methods conjugate residual parallel computing kernel fusion Embedded Systems
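To make the structure of the solver in this abstract concrete, here is a minimal sketch of the textbook Conjugate Residual iteration. It is written in plain Python for a real symmetric positive-definite matrix, which is an assumption made for readability: the thesis targets GPU kernels, and massive MIMO detection would involve complex-valued matrices. The helper names `matvec`, `dot` and `conjugate_residual` are chosen here for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_residual(A, b, iterations=10, tol=1e-12):
    """Approximately solve A x = b for symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = b[:]                      # residual for the starting guess x = 0
    p = r[:]                      # search direction
    Ar = matvec(A, r)
    Ap = Ar[:]
    rAr = dot(r, Ar)
    for _ in range(iterations):
        if dot(r, r) ** 0.5 < tol:
            break
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * q for ri, q in zip(r, Ap)]
        Ar = matvec(A, r)         # the only matrix-vector product per iteration
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        p = [ri + beta * pj for ri, pj in zip(r, p)]
        Ap = [s + beta * q for s, q in zip(Ar, Ap)]
        rAr = rAr_new
    return x
```

In an implementation assembled from generic library kernels, each `matvec`, `dot` and vector update above would typically be its own kernel launch with its own pass over memory; kernel fusion, as described in the abstract, merges these steps so that data already loaded is reused.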
