341 |
Paralelizace náročných úloh rekonstrukce v dynamické magnetické rezonanci / Parallelization of complex tasks in reconstruction of dynamic magnetic resonance
Bijotová, Kateřina January 2019 (has links)
This thesis deals with the parallelization of complex tasks in the reconstruction of dynamic magnetic resonance. It describes the basic principle of magnetic resonance and its relation to the Fourier transform, and explains the difference between static and dynamic magnetic resonance image reconstruction. It analyzes the SVD algorithm and its use in magnetic resonance image reconstruction. It presents the principles and importance of parallel computing in magnetic resonance imaging and describes CUDA technology. The thesis also describes and carries out an implementation of the reconstruction model in MATLAB and Java; the Java implementation was optimized with the JCuda library and the MATLAB implementation with the gpuArray function.
|
343 |
Simulating propeller and Propeller-Hull Interaction in OpenFOAM
Mehdipour, Reza January 2014 (has links)
This master's thesis was carried out in the Hydrodynamics research group of the Department of Shipping and Marine Technology at Chalmers University of Technology and is written for the Center for Naval Architecture at the Royal Institute of Technology, KTH. To meet increased requirements for efficient ship propulsion with low noise levels, it is important to consider the complete system, with both the hull and the propeller, in the simulation. OpenFOAM (Open Field Operation and Manipulation) provides different techniques for simulating a rotating propeller, each with different physical and computational properties. MRF (the Multiple Reference Frame model) is perhaps the easiest approach, and it is also a computationally efficient technique for modeling a rotating frame of reference. Sliding grid techniques provide a more complex way to simulate the propeller and its surrounding region: the region rotates, and values are interpolated across the interface to capture transient effects. AMI (Arbitrary Mesh Interface) is a sliding grid implementation available in recent versions of OpenFOAM, introduced in the official releases after v2.1.0. The main objective of this study is to compare these two techniques, MRF and AMI, by computing the open water characteristics of the propeller with Reynolds-Averaged Navier-Stokes (RANS) computations, and to study the accuracy, parallel performance, and benefits of each approach. More specifically, a self-propelled ship is simulated to study the interaction between the hull and the propeller. To simplify the problem and reduce its computational complexity, the free surface is not considered. The ship under investigation is a 7000 DWT chemical tanker that is the subject of a collaborative R&D project called STREAMLINE (strategic research for innovative marine propulsion concepts). In the self-propelled condition, the transient forces on the propeller are evaluated.
This study investigates the results of the experimental work with advanced CFD for accurate analysis and design of the propulsion. All simulations in this thesis are run with parallel computing. A scalability analysis is therefore carried out to determine how the number of nodes affects the average computational time.
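
A scalability analysis of this kind is commonly summarized with speedup and parallel efficiency. The following is a minimal sketch of those two quantities, not the thesis's own procedure; the function names and the timings in the usage loop are hypothetical and chosen only for illustration.

```python
def speedup(t_reference, t_parallel):
    """Speedup relative to a reference run: S = T_ref / T_n."""
    return t_reference / t_parallel


def parallel_efficiency(t_reference, t_parallel, n_nodes):
    """Parallel efficiency: E = S / n, where 1.0 means ideal scaling."""
    return speedup(t_reference, t_parallel) / n_nodes


if __name__ == "__main__":
    # Hypothetical average times per time step (seconds), keyed by node count.
    # These numbers are made up for illustration, not taken from the thesis.
    timings = {1: 120.0, 2: 64.0, 4: 35.0, 8: 21.0}
    t_ref = timings[1]
    for n_nodes, t in sorted(timings.items()):
        s = speedup(t_ref, t)
        e = parallel_efficiency(t_ref, t, n_nodes)
        print(f"{n_nodes} node(s): speedup {s:.2f}, efficiency {e:.2f}")
```

Efficiency that falls well below 1.0 as nodes are added usually indicates that communication or load imbalance is starting to dominate the run time.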
|
344 |
Reinforcement Learning in Eco-driving for Connected and Automated Vehicles
Zhu, Zhaoxuan January 2021 (has links)
No description available.
|
345 |
Optimization of Monte Carlo Neutron Transport Simulations with Emerging Architectures / Optimisation du code Monte Carlo neutronique à l’aide d’accélérateurs de calculs
Wang, Yunsong 14 December 2017 (has links)
Access to the basic data, namely the cross sections, is the main performance bottleneck when solving the neutron transport equations with the Monte Carlo (MC) method. These cross sections characterize the probabilities of collision between neutrons and the nuclides that make up the material being traversed. They are specific to each nuclide and depend on the energy of the incident neutron and on the temperature of the material. Reference MC codes load these data into memory for all temperatures occurring in the system and use a binary search algorithm on the tables that store the cross sections. On many-core architectures (typically Intel MIC), these methods are dramatically inefficient, both because random memory accesses prevent the code from benefiting from the various levels of memory cache and because these algorithms lack vectorization. The first part of the thesis work consisted of finding alternatives to this basic algorithm, proposing the best trade-off between performance and memory footprint that takes advantage of the specific features of the MIC (multithreading and vectorization). In the second part, we took a radically opposite approach, in which the data are not stored in memory but computed on the fly. A whole series of optimizations, covering the algorithm, the data structures, vectorization, loop unrolling, and the influence of data representation precision, yielded considerable gains over the initial implementation. Finally, the two approaches (data in memory and data computed on the fly) were compared in order to propose the best trade-off in terms of performance and memory footprint.
Beyond the targeted application (MC transport), this work is also a study that can be generalized, showing how to transform a problem initially limited by memory latency ("memory latency bound") into one that saturates the processor ("CPU-bound") and makes it possible to take advantage of many-core architectures. / Monte Carlo (MC) neutron transport simulations are widely used in the nuclear community to perform reference calculations with minimal approximations. The conventional MC method converges slowly, as dictated by the law of large numbers, which makes simulations computationally expensive. Cross section computation has been identified as the major performance bottleneck for MC neutron codes. Typically, cross section data are precalculated and stored in memory for each nuclide before the simulation, so that during the simulation only table lookups are required to retrieve data from memory and the compute cost is trivial. We implemented and optimized a large collection of lookup algorithms in order to accelerate this data retrieval process. Results show that significant speedup can be achieved over the conventional binary search on both CPU and MIC in unit tests rather than real-case simulations. Vectorization instructions have proved effective on the many-core architecture thanks to its 512-bit vector units; on CPU this improvement is limited by a smaller register size. Further optimization such as memory reduction turns out to be very important, since it largely improves computing performance. As can be imagined, all the energy lookup proposals are entirely memory-bound: the computing units do little but wait for data. In other words, the computing capability of modern architectures is largely wasted.
Another major issue of energy lookup is that the memory requirement is huge: cross section data at one temperature for up to 400 nuclides involved in a real-case simulation require nearly 1 GB of memory, which makes simulations with several thousand temperatures infeasible on current computer systems. To address the problems of energy lookup, we investigate another proposal, an on-the-fly cross section computation called reconstruction. The basic idea behind reconstruction is to perform the Doppler broadening computation of cross sections (a convolution integral) on the fly, each time a cross section is needed, with a formulation close to standard neutron cross section libraries and based on the same amount of data. Reconstruction converts the problem from memory-bound to compute-bound: only a few variables are required for each resonance, instead of the conventional pointwise table covering the entire resolved resonance region. Although memory use is greatly reduced, this method is very time-consuming. After a series of optimizations, results show that the reconstruction kernel benefits well from vectorization and can achieve 1806 GFLOPS (single precision) on a Knights Landing 7250, which represents 67% of its effective peak performance. Even though the optimization efforts on reconstruction significantly improve FLOP usage, this on-the-fly calculation is still slower than the conventional lookup method. We have therefore begun porting the code to GPGPU to exploit potentially higher performance as well as higher FLOP usage. In addition, another evaluation is planned to compare lookup and reconstruction in terms of power consumption: with the help of hardware and software energy measurement support, we expect to find a compromise between performance and energy consumption in order to face the "power wall" challenge as hardware evolves.
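
The conventional energy lookup described above, a binary search on a precomputed pointwise table, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the tiny table, and the use of linear interpolation between grid points are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right


def lookup_cross_section(energies, sigmas, energy):
    """Return the cross section at `energy` from a pointwise table.

    `energies` must be sorted in ascending order, with `sigmas[i]` the
    cross section at `energies[i]`. Values between grid points are
    linearly interpolated (an assumption made for this sketch); values
    outside the grid are clamped to the nearest end point.
    """
    if energy <= energies[0]:
        return sigmas[0]
    if energy >= energies[-1]:
        return sigmas[-1]
    # Binary search: index of the first grid point strictly above `energy`.
    hi = bisect_right(energies, energy)
    lo = hi - 1
    frac = (energy - energies[lo]) / (energies[hi] - energies[lo])
    return sigmas[lo] + frac * (sigmas[hi] - sigmas[lo])


if __name__ == "__main__":
    # Hypothetical three-point table, for illustration only.
    grid = [1.0, 2.0, 4.0]
    xs = [10.0, 20.0, 40.0]
    print(lookup_cross_section(grid, xs, 3.0))
```

Each lookup touches only a handful of table entries at effectively random positions, which is consistent with the abstract's observation that this approach is bound by memory access rather than by arithmetic.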
|
346 |
A Parallel Adaptive Mesh Refinement Library for Cartesian Meshes
January 2019 (has links)
abstract: This dissertation introduces FARCOM (Fortran Adaptive Refiner for Cartesian Orthogonal Meshes), a new general library for adaptive mesh refinement (AMR) based on an unstructured hexahedral mesh framework. As a result of the underlying unstructured formulation, the refinement and coarsening operators of the library operate on a single-cell basis and perform in-situ replacement of old mesh elements. This approach allows for h-refinement without the memory and computational expense of calculating masked coarse grid cells, as is done in traditional patch-based AMR approaches, and enables unstructured flow solvers to have access to the automated domain generation capabilities usually only found in tree AMR formulations.
The library is written to let the user determine where to refine and coarsen through custom refinement selector functions for static mesh generation and dynamic mesh refinement, and can handle smooth fields (such as level sets) or localized markers (e.g. density gradients). The library was parallelized with the use of the Zoltan graph-partitioning library, which provides interfaces to both a graph partitioner (PT-Scotch) and a partitioner based on Hilbert space-filling curves. The partitioned adjacency graph, mesh data, and solution variable data are then packed and distributed across all MPI ranks in the simulation, which then regenerate the mesh, generate domain decomposition ghost cells, and create communication caches.
Scalability runs were performed using a LeVeque wave propagation scheme for solving the Euler equations. The results of simulations on up to 1536 cores indicate that the parallel performance is highly dependent on the graph partitioner being used, and differences between the partitioners were analyzed. FARCOM is found to have better performance if each MPI rank has more than 60,000 cells. / Dissertation/Thesis / Doctoral Dissertation Aerospace Engineering 2019
|
347 |
Pending Event Set Management in Parallel Discrete Event Simulation
Gupta, Sounak 02 October 2018 (has links)
No description available.
|
348 |
Detection And Classification Of Buried Radioactive Materials
Wei, Wei 09 December 2011 (has links)
This dissertation develops new approaches for detection and classification of buried radioactive materials. Different spectral transformation methods are proposed to effectively suppress noise and to better distinguish signal features in the transformed space. The contributions of this dissertation are detailed as follows. 1) Propose an unsupervised method for buried radioactive material detection. In the experiments, the original Reed-Xiaoli (RX) algorithm performs similarly to the gross count (GC) method; however, the constrained energy minimization (CEM) method performs better when it uses feature vectors selected from the RX output. Thus, an unsupervised method is developed by combining the RX and CEM methods, which can efficiently suppress the background noise when applied to the dimensionality-reduced data from principal component analysis (PCA). 2) Propose an approach for buried target detection and classification, which applies spectral transformation followed by noise-adjusted PCA (NAPCA). To meet the requirements of practical survey mapping, we focus on the case where the sensor dwell time is very short. The results show that spectral transformation can alleviate the effects of noisy spectral variation and background clutter, while NAPCA, a better choice than PCA, can extract key features for the subsequent detection and classification. 3) Propose a particle swarm optimization (PSO)-based system to automatically determine the optimal partition for spectral transformation. Two PSOs are incorporated in the system, with the outer one responsible for selecting the optimal number of bins and the inner one for the optimal bin widths. The experimental results demonstrate that using variable bin widths is better than a fixed bin width, and that PSO can provide better results than the traditional Powell’s method. 4) Develop parallel implementation schemes for the PSO-based spectral partition algorithm.
Both cluster and graphics processing unit (GPU) implementations are designed. The computational burden of the serial version is greatly reduced. The experimental results also show that the GPU algorithm achieves a speedup similar to that of the cluster-based algorithm.
|
349 |
An Internal Representation for Adaptive Online Parallelization
Rehme, Koy D. 29 May 2009 (has links) (PDF)
Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available and the compiler will have no foreknowledge of the environment until run time. Adaptive Online Parallelization (ADOPAR) is a unifying framework that attempts to overcome diversity by separating discovery and packaging of parallelism. Scheduling for execution may then occur at run time when diversity may best be resolved. This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph, with relationship descriptors forming the edges of the graph, may be represented. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so the front end may have a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner. The representation also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity.
We have successfully implemented this representation using new compiler intrinsics, allowed ADOPAR schedulers to operate on the described task graph for parallel execution, and demonstrated the low code size overhead and the necessity for native schedulers.
|
350 |
Scalable Extraction and Visualization of Scientific Features with Load-Balanced Parallelism
Xu, Jiayi January 2021 (has links)
No description available.
|