  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

Non-Equilibrium Many-Body Influence on Mode-Locked Vertical External-Cavity Surface-Emitting Lasers

Kilen, Isak Ragnvald January 2017 (has links)
Vertical external-cavity surface-emitting lasers are ideal testbeds for studying the influence of non-equilibrium many-body dynamics on mode locking. As we show in this thesis, ultrashort pulse generation involves a marked departure from the Fermi carrier distributions assumed in prior theoretical studies. A quantitative model of the mode-locking dynamics is presented in which the semiconductor Bloch equations are coupled with Maxwell's equations in order to study the influence of quantum-well carrier scattering on the mode-locking dynamics. This is the first work in which the full model is solved without adiabatically eliminating the microscopic polarizations. In many instances we find that higher-order correlation contributions (e.g. polarization dephasing, carrier scattering, and screening) can be represented by rate models, with the effective rates extracted at the level of the second Born-Markov approximation. In other circumstances, such as continuous-wave multi-wavelength lasing, we are forced to fully include these higher-order correlation terms. In this thesis we identify the key contributors that control the mode-locking dynamics, the stability of single-pulse mode locking, and the influence of higher-order correlations in sustaining multi-wavelength continuous-wave operation.
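The idea of representing higher-order correlations by effective rates can be illustrated with the simplest member of that model hierarchy: a two-variable rate-equation laser model. The sketch below is far below the coupled semiconductor Bloch/Maxwell level the thesis actually solves, and every parameter value (gamma_n, gamma_s, gain, n_tr) is a made-up placeholder, not a number from the thesis:

```python
def simulate_rate_equations(pump_rate, n_steps=20000, dt=1e-12):
    """Forward-Euler integration of a two-variable laser rate-equation model.

    N is a normalized carrier density, S a normalized photon density.
    All rates below are illustrative placeholders, not values from the thesis.
    """
    gamma_n = 1e9     # carrier decay rate (1/s), hypothetical
    gamma_s = 1e11    # cavity photon decay rate (1/s), hypothetical
    gain = 2e11       # linear gain coefficient (1/s per unit density), hypothetical
    n_tr = 1.0        # transparency density (normalized)

    N, S = 0.0, 1e-6  # tiny photon seed so lasing can start
    for _ in range(n_steps):
        g = gain * (N - n_tr)              # gain treated at the effective-rate level
        dN = pump_rate - gamma_n * N - g * S
        dS = g * S - gamma_s * S
        N += dt * dN
        S += dt * dS
    return N, S
```

Above the threshold pump the photon density builds up and clamps the gain near the cavity loss; below it, the photon seed simply decays away.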
222

SPC-PM Po 3D --- Programmers Manual

Apel, Th., Milde, F., Theß, M. 30 October 1998 (has links) (PDF)
The experimental program "SPC-PM Po 3D" is part of the ongoing research of the Chemnitz research group Scientific Parallel Computing (SPC) into finite element methods for problems over three-dimensional domains. The package in its version 2.0 is documented in two manuals. The User's Manual provides an overview of the program, its capabilities, its installation, and its handling. Moreover, test examples are explained. The aim of the Programmer's Manual is to provide a description of the algorithms and their realization. It is written for those who are interested in a deeper insight into the code, for example for improving or extending it. In version 2.0 the program can solve the Poisson equation and the Lamé system of linear elasticity with, in general, mixed boundary conditions of Dirichlet and Neumann type. The domain $\Omega\subset\R^3$ can be an arbitrary bounded polyhedron. The input is a coarse mesh, a description of the data, and some control parameters. The program distributes the elements of the coarse mesh to the processors, refines the elements, generates the system of equations using linear or quadratic shape functions, solves this system, and offers graphical tools to display the solution. Further, the behavior of the algorithms can be monitored: arithmetic and communication time is measured, the discretization error is measured, and different preconditioners can be compared. We plan to extend the program in the near future by including a multigrid solver, an error estimator and adaptive mesh refinement, as well as the treatment of coupled thermo-elastic problems. The program has been developed for MIMD computers; it has been tested on Parsytec machines (GCPowerPlus-128 with Motorola PowerPC 601 processors and the transputer-based GCel-192) and on workstation clusters using PVM. The special case of only one processor is included, which means the package can be compiled for single-processor machines without any change in the source files.
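The pipeline the abstract describes (mesh, linear shape functions, assembly, solve) can be sketched in one dimension. This is a toy analogue of the package's 3D solver, not code from SPC-PM Po 3D: linear hat functions for -u'' = f on (0,1) with homogeneous Dirichlet conditions, a midpoint-quadrature load vector, and a tridiagonal (Thomas) solve:

```python
def solve_poisson_1d(n, f=lambda x: 1.0):
    """FEM with linear shape functions for -u'' = f on (0,1), u(0) = u(1) = 0.

    A one-dimensional toy analogue; all names are illustrative,
    not taken from SPC-PM Po 3D.
    """
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    # Tridiagonal stiffness matrix from the hat-function basis.
    diag = [2.0 / h] * (n - 1)
    off = [-1.0 / h] * (n - 2)
    rhs = [0.0] * (n - 1)
    for e in range(n):                        # element-wise load assembly
        xm = (nodes[e] + nodes[e + 1]) / 2.0  # midpoint quadrature point
        fe = f(xm) * h / 2.0                  # half the element load per basis function
        if e >= 1:
            rhs[e - 1] += fe
        if e <= n - 2:
            rhs[e] += fe
    # Thomas algorithm: forward elimination, then back substitution.
    d = diag[:]
    bvec = rhs[:]
    for i in range(1, n - 1):
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        bvec[i] -= w * bvec[i - 1]
    u = [0.0] * (n - 1)
    u[-1] = bvec[-1] / d[-1]
    for i in range(n - 3, -1, -1):
        u[i] = (bvec[i] - off[i] * u[i + 1]) / d[i]
    return [0.0] + u + [0.0]                  # reattach Dirichlet boundary nodes
```

For f = 1 the exact solution is u(x) = x(1-x)/2, and linear FEM reproduces it exactly at the nodes.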
223

Random Forests for CUDA GPUs

Lapajne, Mikael Hellborg, Slat, Daniel January 2010 (has links)
Context. Machine learning is a complex and resource-consuming process that requires a lot of computing power. With the constant growth of information, the need for efficient high-performance algorithms is increasing. Today's commodity graphics cards are parallel multiprocessors offering high computing capacity at an attractive price, and they are usually pre-installed in new PCs. Graphics cards thus provide an additional resource to be used in machine-learning applications. The Random Forest learning algorithm, which has been shown to be competitive within machine learning, has good potential for performance gains through parallelization. Objectives. In this study we implement and review a revised Random Forest algorithm for GPU execution using CUDA. Methods. A review of previous work in the area has been done by studying articles from several sources, including Compendex, Inspec, IEEE Xplore, the ACM Digital Library and SpringerLink. Additional information regarding GPU architecture and implementation-specific details has been obtained mainly from documentation available from Nvidia and the Nvidia developer forums. The implemented algorithm has been benchmarked and compared with two state-of-the-art CPU implementations of the Random Forest algorithm, regarding both the time consumed for training and classification and the classification accuracy. Results. Measurements from benchmarks made on the three different algorithms are gathered, showing the performance results of the algorithms for two publicly available data sets. Conclusion. We conclude that our implementation is, under the right conditions, able to outperform its competitors. We also conclude that this holds only for certain data sets, depending on their size. Moreover, we conclude that there is potential for further improvement of the algorithm, both regarding performance and regarding adaptation towards a wider range of real-world applications.
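The parallelism that makes Random Forests attractive for GPUs is that every tree is trained independently. The sketch below illustrates that structure on the CPU with threads standing in for CUDA blocks, and depth-1 trees (stumps) standing in for full trees; it is an invented toy, not the thesis's CUDA implementation, and real speedups require GPU or process-level parallelism:

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def train_stump(args):
    """Train one depth-1 tree (stump) on a bootstrap sample.

    Each tree is independent of the others, which is exactly the
    parallelism a GPU Random Forest implementation exploits.
    """
    data, seed = args
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]      # bootstrap resampling
    feature = rng.randrange(len(sample[0][0]))     # random feature (bagging)
    best = None
    for x, _ in sample:                            # candidate thresholds
        t = x[feature]
        for sign in (1, -1):
            acc = sum(1 for xi, yi in sample
                      if (sign * (xi[feature] - t) > 0) == yi) / len(sample)
            if best is None or acc > best[0]:
                best = (acc, feature, t, sign)
    _, feature, t, sign = best
    return feature, t, sign

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(s * (x[f] - t) > 0 for f, t, s in forest)
    return votes.most_common(1)[0][0]

def train_forest(data, n_trees=25, seed=0):
    # Threads stand in for CUDA blocks; the tasks share no mutable state.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(train_stump,
                             [(data, seed + i) for i in range(n_trees)]))
```

On a toy two-feature data set where only the first feature carries signal, the majority vote recovers good accuracy even though half the stumps split on noise.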
224

A Skeleton Programming Library for Multicore CPU and Multi-GPU Systems

Enmyren, Johan January 2010 (has links)
This report presents SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP back end. It also supports multi-GPU systems. Benchmarks show that copying data between the host and the GPU is often a bottleneck. Therefore a container which uses lazy memory copying has been implemented to avoid unnecessary memory transfers. SkePU was evaluated with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that skeletal parallel programming is indeed a viable approach for GPU computing and that a generalized interface for multiple back ends is also reasonable. The best performance gains are achieved when the computational load is large compared to the memory I/O (the lazy memory copying can help to achieve this). We see that SkePU offers good performance with a more complex and realistic task such as ODE solving, with up to ten times faster run times when using SkePU with a GPU back end compared to a sequential solver running on a fast CPU. SkePU does, however, have some disadvantages: there is some overhead in using the library, which we can see from the dot product and LibSolve benchmarks. Although not big, it is still there, and if performance is of utmost importance, a hand-coded solution would be best. Nor can all calculations be expressed in terms of skeletons; for such problems, specialized routines must still be created.
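The lazy memory copying idea (defer host-to-device transfers until a device computation needs the data, and device-to-host transfers until the host actually reads an element) can be sketched independently of CUDA. `LazyVector` and `device_map` below are invented names for illustration; real SkePU uses C++ template containers and CUDA/OpenCL transfers:

```python
class LazyVector:
    """Sketch of a SkePU-style lazily copied container (hypothetical simplification).

    Host data is only "uploaded" to the device when a device computation
    needs it, and "downloaded" back only when the host reads an element.
    """
    def __init__(self, data):
        self._host = list(data)
        self._device = None      # simulated device buffer
        self.transfers = 0       # counts host<->device copies

    def _upload(self):
        if self._device is None:
            self._device = list(self._host)   # stands in for a H->D memcpy
            self.transfers += 1

    def _download(self):
        if self._device is not None:
            self._host = list(self._device)   # stands in for a D->H memcpy
            self._device = None
            self.transfers += 1

    def __getitem__(self, i):
        self._download()          # host read forces the copy-back
        return self._host[i]

def device_map(fn, vec):
    """'Map' skeleton: operates on the simulated device buffer only."""
    vec._upload()
    vec._device = [fn(x) for x in vec._device]
    return vec
```

Chaining two maps triggers only one upload; the copy-back happens once, on the first host read. This is precisely the pattern that avoids the host/GPU transfer bottleneck the benchmarks identified.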
225

Simulating Partial Differential Equations using the Explicit Parallelism of ParModelica

Thorslund, Gustaf January 2015 (has links)
The Modelica language is a modelling and programming language for modelling cyber-physical systems using equations and algorithms. In this thesis two suggested extensions of the Modelica language are covered: partial differential equations (PDEs) and explicit parallelism in algorithmic code. While PDEs are not yet supported by the Modelica language, this thesis presents a framework for solving PDEs using the algorithmic part of the Modelica language, including parallel extensions. Different numerical solvers have been implemented using the explicit parallel constructs suggested for Modelica by the ParModelica language extensions and implemented as part of OpenModelica. The solvers have been evaluated using different models, and it can be seen that larger models benefit from a parallel solver. The intention has been to write a framework suitable for modelling and parallel simulation of PDEs. This work can, however, also be seen as a case study of how to write a custom solver using parallel algorithmic Modelica and how to evaluate the performance of a parallel solver.
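The kind of solver such a framework contains can be sketched with an explicit finite-difference scheme: the grid-point update loop has no cross-iteration dependencies, which is the loop shape that explicit parallel constructs like ParModelica's parallel for target. The sketch below is plain Python with illustrative parameters, not ParModelica code:

```python
def solve_heat_1d(n=50, steps=2000, alpha=1.0):
    """Explicit FTCS scheme for u_t = alpha * u_xx on (0,1), u = 0 at both ends.

    The inner loop over grid points is embarrassingly parallel: each new
    value depends only on the previous time level. Parameters are illustrative.
    """
    dx = 1.0 / n
    dt = 0.4 * dx * dx / alpha        # respects the stability limit dt <= dx^2/(2*alpha)
    u = [0.0] * (n + 1)
    for i in range(n + 1):            # initial condition: a hat profile
        x = i * dx
        u[i] = min(x, 1.0 - x)
    for _ in range(steps):
        new = u[:]
        for i in range(1, n):         # independent updates: the parallelizable loop
            new[i] = u[i] + alpha * dt / (dx * dx) * (u[i-1] - 2*u[i] + u[i+1])
        u = new
    return u
```

The abstract's observation that larger models suit a parallel solver matches this structure: per-step overhead is fixed, while the parallelizable work grows with the grid size.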
226

Re-scheduling the Railway Traffic using Parallel Simulated Annealing and Tabu Search : A comparative study

Gerdovci, Petrit, Boman, Sebastian January 2015 (has links)
Context: This study has been conducted in the area of train re-scheduling. One of the most common types of disturbance scenarios is a train that has deviated from its originally planned arrival or departure times. This type of disturbance is today handled manually by the train dispatcher, which in some cases can be cumbersome and overwhelmingly complex. There is therefore an essential need for a train re-scheduling decision support system. Objectives: The aim of the study is to determine whether parallel adaptations of simulated annealing (SA) and tabu search (TS) are able to find high-quality solutions for the train re-scheduling problem. The study also aims to compare the two proposed meta-heuristics in order to determine the more adequate algorithm for the given problem. Methods: To answer the research question, sequential and parallel versions of the algorithms were implemented. The research methodology of choice was experimentation, where the meta-heuristics are evaluated on 10 disturbance scenarios. Results: Parallel simulated annealing (PSA) is overall the better-performing algorithm, as it is able to reduce the total delay by 585 seconds more than parallel tabu search (PTS) over the 10 disturbance scenarios. However, PTS is able to solve more conflicts per millisecond than PSA, when compared to their sequential versions. Conclusions: We conclude that both parallel versions perform better than their sequential counterparts. Further, PSA clearly outperforms PTS in terms of minimizing the accumulated delay. One observation is that the parallel versions do not reach their maximum efficiency per thread, which is assumed to be caused by RAM limitations. For future work we propose further investigation of why maximum efficiency per thread is not reached, and further tuning of the algorithm settings.
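Simulated annealing on an ordering problem can be sketched compactly. The toy objective below, weighted completion time of trains on a single track, is an invented stand-in for the thesis's accumulated-delay evaluation, which is far richer; the swap neighbourhood and geometric cooling schedule are standard SA ingredients:

```python
import math
import random

def anneal_order(durations, weights, steps=20000, seed=0):
    """Simulated annealing on a toy single-track ordering problem.

    Cost = sum_i weights[i] * completion_time(i), a hypothetical stand-in
    for the accumulated-delay objective of the real re-scheduling problem.
    """
    rng = random.Random(seed)

    def cost(order):
        t, total = 0.0, 0.0
        for i in order:
            t += durations[i]
            total += weights[i] * t
        return total

    order = list(range(len(durations)))
    best = current = cost(order)
    best_order = order[:]
    temp = max(best, 1.0)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)    # swap neighbourhood
        order[i], order[j] = order[j], order[i]
        candidate = cost(order)
        if candidate <= current or rng.random() < math.exp((current - candidate) / temp):
            current = candidate                    # accept (possibly uphill)
            if current < best:
                best, best_order = current, order[:]
        else:
            order[i], order[j] = order[j], order[i]   # reject: undo the swap
        temp *= 0.9995                                # geometric cooling
    return best_order, best
```

For weighted completion time the optimum is known (order by duration/weight, Smith's rule), which makes the toy useful for checking that the annealer actually reaches it.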
227

Strain and defects in irradiated materials : a study using X-ray diffraction and diffuse scattering / Défauts et déformations au sein de matériaux irradiés : Etude par diffraction et diffusion diffuse des rayons X

Channagiri, Jayanth 04 December 2015 (has links)
Ion beams are commonly used in the study of nuclear materials in order to reproduce, in a controlled way, the different sources of irradiation to which these materials are submitted. The interaction of ions with matter induces the formation of crystalline defects along the path of the ions, associated with high strains in the irradiated region. One of the main issues of the electro-nuclear industry is the long-term encapsulation of nuclear waste. Yttria-stabilized zirconia (YSZ) is one of the materials that could be used as an inert matrix for the transmutation of actinides; understanding its behaviour under different irradiation conditions is therefore of utmost importance. This thesis is divided into two distinct parts. In the first part of this work, we used advanced X-ray diffraction (XRD) techniques in order to characterize the strain and damage levels within the irradiated region of the crystals. The strain and damage profiles were modelled using cubic B-spline functions, and the XRD data were simulated using the dynamical theory of diffraction combined with a generalized simulated annealing algorithm. This approach was applied to YSZ single crystals irradiated with Au 2+ ions over a wide range of temperatures and fluences. The results were compared with Rutherford backscattering spectrometry in channelling mode (RBS/C) results obtained for the same samples.
The second part of the thesis is devoted to the development of a specific model for calculating the two-dimensional XRD intensity distribution from irradiated single crystals with realistic dimensions and defect distributions. In order to achieve this goal, we implemented a high-performance parallel computing approach (based on both multi-core CPUs and GPUs) to reduce the computation times. This approach was used to successfully model the reciprocal space maps of YSZ single crystals exhibiting a complex defect structure.
228

Enforcing Security Policies On GPU Computing Through The Use Of Aspect-Oriented Programming Techniques

Albassam, Bader 29 June 2016 (has links)
This thesis presents a new security policy enforcer designed for securing parallel computation on CUDA GPUs. We show how the very features that make a GPGPU desirable have already been utilized in existing exploits, underscoring the need for security protections on a GPGPU. An aspect weaver was designed for CUDA with the goal of utilizing aspect-oriented programming for security policy enforcement. Empirical testing verified the ability of our aspect weaver to enforce various policies. Furthermore, a performance analysis demonstrated that using this policy enforcer incurs no significant performance penalty over manual insertion of policy code. Finally, future research goals are presented through a plan of work. We hope that this thesis will provide long-term research goals to guide the field of GPU security.
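The core aspect-oriented idea, weaving policy-checking advice around a join point instead of scattering checks through the code, can be sketched without CUDA. Here a Python decorator stands in for the thesis's aspect weaver and a function call stands in for a kernel launch; all names (`enforce_policy`, `bounds_policy`, `kernel_write`) are invented for illustration:

```python
import functools

class PolicyViolation(Exception):
    """Raised when woven-in advice detects a policy breach."""

def enforce_policy(check):
    """A minimal 'aspect weaver': wraps a function with policy advice.

    The join point here is a Python call rather than a GPU kernel launch;
    the weaving itself, not the GPU, is what this sketch illustrates.
    """
    def weave(fn):
        @functools.wraps(fn)
        def woven(*args, **kwargs):
            check(*args, **kwargs)       # 'before' advice: enforce the policy
            return fn(*args, **kwargs)
        return woven
    return weave

def bounds_policy(buffer, index, value):
    # Example policy: reject out-of-bounds writes before they happen.
    if not 0 <= index < len(buffer):
        raise PolicyViolation(f"out-of-bounds write at index {index}")

@enforce_policy(bounds_policy)
def kernel_write(buffer, index, value):
    """Toy 'kernel' that writes into a shared buffer."""
    buffer[index] = value
```

The business logic of `kernel_write` stays untouched; swapping or adding policies only changes the decorator, which mirrors the separation of concerns the thesis's weaver provides for CUDA code.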
229

Techniques for algorithm design on the instruction systolic array

Schmidt, Bertil January 1999 (has links)
Instruction systolic arrays (ISAs) provide programmable high-performance hardware for specific computationally intensive applications. Typically, such an array is connected to a sequential host, thus operating like a coprocessor which solves only the computationally intensive tasks within a global application. The ISA model is a mesh-connected processor grid, which combines the advantages of special-purpose systolic arrays with the flexible programmability of general-purpose machines. The subject of this thesis is the analysis, design, and implementation of several special-purpose algorithms and subroutines on the ISA that take advantage of the special features of the systolic information flow. The ability of ISAs to perform parallel prefix computations in an extremely efficient way is exploited as a key operation, alongside local operations within each processor. A given sequential algorithm therefore has to be decomposed into simple building blocks of parallel prefix computations and parallel local operations. Several techniques for adapting sequential algorithms to this form are introduced in this thesis, e.g. swapping of loops in the sequential algorithm, shearing of data, and appropriate mapping of input data onto the processor array. It is demonstrated how these techniques can be exploited to derive efficient ISA algorithms for several computationally intensive applications. These include cryptographic applications (e.g. arithmetic operations on long operands, RSA encryption, RSA key generation) and image-processing applications (e.g. convolution, wavelet transform, morphological operators, median filter, Fourier transform, Hough transform, morphological Hough transform, and tomographic image reconstruction). Their implementation on Systola 1024, the first commercial parallel computer with the ISA architecture, shows that the ISA concept is very suitable for these applications and results in significant run-time savings.
The results of this thesis emphasize the suitability of the ISA concept as an accelerator for computationally intensive applications in the areas of cryptography and image processing. This might lead research towards further high-speed, low-cost systems based on ISA hardware.
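The parallel prefix building block named above can be sketched with the classic Hillis-Steele inclusive scan: in each round every position combines its value with the one `offset` positions to its left, and all updates within a round are mutually independent, which is what the systolic hardware exploits. This is a generic illustration of the pattern, not code for the ISA or Systola 1024:

```python
def prefix_scan(values, op=lambda a, b: a + b):
    """Hillis-Steele inclusive scan over an associative operator.

    Each of the log2(n) rounds performs n independent updates;
    the simulation below runs them sequentially, but a systolic
    array performs each round in a single parallel step.
    """
    result = list(values)
    offset = 1
    while offset < len(result):
        # The comprehension reads only the previous round's values,
        # so all updates in this round are independent of each other.
        result = [op(result[i - offset], result[i]) if i >= offset else result[i]
                  for i in range(len(result))]
        offset *= 2
    return result
```

Because the operator only needs to be associative, the same skeleton serves running sums, running maxima, and the carry-propagation steps used in long-operand arithmetic for RSA.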
230

Robust and scalable hierarchical matrix-based fast direct solver and preconditioner for the numerical solution of elliptic partial differential equations

Chavez, Gustavo Ivan 10 July 2017 (has links)
This dissertation introduces a novel fast direct solver and preconditioner for the solution of block tridiagonal linear systems that arise from the discretization of elliptic partial differential equations on a Cartesian product mesh, such as the variable-coefficient Poisson equation, the convection-diffusion equation, and the wave Helmholtz equation in heterogeneous media. The algorithm extends the traditional cyclic reduction method with hierarchical matrix techniques. The resulting method exposes substantial concurrency, and its arithmetic operations and memory consumption grow only log-linearly with problem size, assuming bounded rank of off-diagonal matrix blocks, even for problems with arbitrary coefficient structure. The method can be used as a standalone direct solver with tunable accuracy, or as a black-box preconditioner in conjunction with Krylov methods. The challenges that distinguish this work from other thrusts in this active field are the hybrid distributed-shared parallelism that demonstrates the algorithm at large scale, full three-dimensionality, and the three stressors of current state-of-the-art multigrid technology: high-wavenumber Helmholtz (indefiniteness), high-Reynolds convection (nonsymmetry), and high-contrast diffusion (inhomogeneity). Numerical experiments corroborate the robustness, accuracy, and complexity claims and provide a baseline of the performance and memory footprint through comparisons with competing approaches such as the multigrid solver hypre and the STRUMPACK implementation of the multifrontal factorization with hierarchically semi-separable matrices. The companion implementation can utilize many thousands of cores of Shaheen, KAUST's Haswell-based Cray XC-40 supercomputer, and compares favorably with other implementations of hierarchical solvers in terms of time-to-solution and memory consumption.
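The cyclic reduction method the dissertation builds on can be sketched in its scalar form: each level eliminates the odd-indexed unknowns by combining each equation with its two neighbours, leaving a half-size tridiagonal system of the same shape, and all eliminations within a level are independent. The sketch below handles n = 2^k - 1 scalar unknowns and omits the hierarchical-matrix compression that is the dissertation's actual contribution:

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by scalar cyclic reduction (n must be 2^k - 1).

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    Every elimination within a level is independent of the others,
    which is the concurrency the hierarchical variant exploits at scale.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n - 1, 2):          # combine equations i-1, i, i+1 (parallel)
        alpha = -a[i] / b[i - 1]
        beta = -c[i] / b[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + beta * a[i + 1])
        rc.append(beta * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + beta * d[i + 1])
    odd = cyclic_reduction(ra, rb, rc, rd)    # recurse on the half-size system
    x = [0.0] * n
    for k, i in enumerate(range(1, n - 1, 2)):
        x[i] = odd[k]
    for i in range(0, n, 2):              # back-substitute even unknowns (parallel)
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x
```

In the dissertation's block variant the scalar divisions become inverses of matrix blocks, whose off-diagonal parts are kept low-rank by hierarchical-matrix compression, which is what keeps the arithmetic and memory log-linear.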