91

Analyzing communication flow and process placement in Linda programs on transputers

De-Heer-Menlah, Frederick Kofi 28 November 2012
With the evolution of parallel and distributed systems, users from diverse disciplines have looked to these systems as a solution to their ever-increasing needs for computer processing resources. Because parallel processing systems currently require a high level of expertise to program, many researchers are investing effort into developing programming approaches which hide some of the difficulties of parallel programming from users. Linda is one such parallel paradigm: it is intuitive to use, and it provides a high level of decoupling between the distributable components of parallel programs. In Linda, efficiency becomes a concern of the implementation rather than of the programmer. There is a substantial overhead in implementing Linda, an inherently shared-memory model, on a distributed system. This thesis describes a compile-time analysis of tuple space interactions which reduces the run-time matching costs and permits the distribution of the tuple space data. A language-independent module is presented which partitions the tuple space data and suggests appropriate storage schemes for the partitions so as to optimise Linda operations. The thesis also discusses hiding the network topology from the user by automatically allocating Linda processes and tuple space partitions to nodes in the network of transputers. This is done by introducing a fast placement algorithm developed for Linda.
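The core Linda operations the abstract refers to can be sketched in a few lines. This is a minimal, illustrative shared-memory tuple space (not the thesis's distributed transputer implementation): `out` deposits a tuple, `rd` reads a matching tuple without removing it, and `in_` withdraws one, with `None` acting as a wildcard in patterns. All names here are hypothetical.

```python
import threading

class TupleSpace:
    """Minimal Linda-style tuple space: out() deposits, rd() reads,
    in_() withdraws; None in a pattern matches any field."""

    def __init__(self):
        self._tuples = []
        self._cv = threading.Condition()

    def out(self, tup):
        with self._cv:
            self._tuples.append(tup)
            self._cv.notify_all()  # wake readers blocked on a pattern

    def _match(self, pattern):
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                    p is None or p == f for p, f in zip(pattern, t)):
                return t
        return None

    def rd(self, pattern):
        with self._cv:
            while (t := self._match(pattern)) is None:
                self._cv.wait()  # block until a matching tuple arrives
            return t

    def in_(self, pattern):
        with self._cv:
            while (t := self._match(pattern)) is None:
                self._cv.wait()
            self._tuples.remove(t)  # withdrawal removes the tuple
            return t
```

The linear scan in `_match` is exactly the run-time matching cost the thesis's compile-time analysis aims to reduce, by partitioning the tuple space so each operation only searches a small partition.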
92

Towards a portable occam

Hill, David Timothy 07 March 2013
Occam is designed for concurrent programming on a network of transputers. Allocation and partitioning of the program are specified within the source code, binding the program to a specific network. An alternative approach is proposed which completely separates the source code from hardware considerations. Static allocation is performed as a separate phase and should, ideally, be automatic, but at present is manual. Complete hardware abstraction requires that non-local, shared communication be provided for, introducing an efficiency overhead which can be minimised by the allocation. The proposal was implemented on a network of IBM PCs, modelled on a transputer network, and implementation issues are discussed.
93

Unsupervised-based Distributed Machine Learning for Efficient Data Clustering and Prediction

Baligodugula, Vishnu Vardhan 23 May 2023
No description available.
94

A Parallelized Naïve Algorithm for Pattern Matching

Svensson, William January 2022
Pattern matching is the problem of locating one string, a pattern, inside another, a text; it is required in, for example, databases, search engines, and text editors. Several algorithms have been created to tackle this problem, and this thesis evaluates whether a parallel version of the naïve algorithm, given a reasonable number of threads for a personal computer, can become more efficient than some state-of-the-art algorithms used today. An algorithm from the Deadzone family, the Horspool algorithm, and a parallel naïve algorithm were therefore implemented and evaluated on two alphabets of different sizes. The results show that the parallel naïve implementation is to be favoured over Deadzone and Horspool on an alphabet of size 4 for patterns of length greater than 2 up to 20. Furthermore, for an alphabet of size 256, the parallel naïve algorithm should also be used for patterns of lengths 1 to 20.
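The parallelization the thesis describes is straightforward to sketch: split the start positions of the text among threads, let each thread run the naïve scan over its own range (comparisons may read past the range boundary), and merge the results. This is an illustrative version under those assumptions, not the thesis's benchmarked implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def naive_search(text, pattern, start, end):
    """Naive scan: try every start position in [start, end)."""
    m = len(pattern)
    return [i for i in range(start, min(end, len(text) - m + 1))
            if text[i:i + m] == pattern]

def parallel_naive(text, pattern, workers=4):
    """Partition start positions among workers; each worker may read
    past its chunk boundary when comparing, so no match is missed."""
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    ranges = [(k * chunk, (k + 1) * chunk) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(lambda r: naive_search(text, pattern, *r), ranges)
    return sorted(p for part in parts for p in part)
```

For example, `parallel_naive("abracadabra", "abra")` finds the occurrences at positions 0 and 7. (In CPython the GIL limits true parallelism for this pure-Python comparison loop; the thesis's measurements would rely on a runtime without that constraint.)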
95

Dynamic memory management for the Loci framework

Zhang, Yang 08 May 2004
Resource management is a critical part of high-performance computing software. While management of processing resources to increase performance is the most critical, efficient management of memory resources plays an important role in solving large problems. This thesis research seeks to create an effective dynamic memory management scheme for a declarative data-parallel programming system. In such systems, some form of automatic resource management is a requirement. Using the Loci framework, this research explores such opportunities. We believe there exists an automatic memory management scheme for such declarative data-parallel systems that provides a good compromise between memory utilization and performance. Beyond basic memory management, this research also seeks to develop methods that take advantage of the cache memory subsystem and explore the balance between memory utilization and parallel communication costs in such declarative data-parallel frameworks.
96

Chitra: a visualization system to analyze the dynamics of parallel programs

Doraswamy, Naganand. January 1991
M.S.
97

Problem specific environments for parallel scientific computing

Auvil, Loretta Sue 04 December 2009
Parallelism is one of the key components of large-scale, high-performance computing. Extensive use of parallelism has yielded a tremendous increase in the raw processing speed of high-performance systems, but parallel problem solving remains difficult. These difficulties are typically addressed by building software tools, such as parallel programming environments. Existing parallel programming environments are general purpose and use a broad paradigm. This thesis illustrates that problem-specific environments are more beneficial than general-purpose environments. A problem-specific environment permits the design of the algorithm while also facilitating definition of the problem. We developed problem-specific environments for a simple and a complex class of problems. The simple class consists of two classic graph problems, namely all-pairs shortest path and connected components. The complex class consists of elliptic partial differential equations solved via domain decomposition. Specific problems were solved with the problem-specific environments and with the general-purpose environment BUILD, which allows the algorithm to be described with a control flow graph. Comparisons show that the special-purpose environments allow the user to concentrate on the problem, while general-purpose environments force the user to think about mapping the problem to the environment rather than about solving the problem in parallel. We therefore conclude that more effort should be spent on building tools and environments for parallel computing that focus on a particular class of problems. / Master of Science
98

Explicit parallel programming

Gamble, James Graham 08 June 2009
While many parallel programming languages exist, they rarely address the issue of communication (implying expressibility and readability). A new language called Explicit Parallel Programming (EPP) attempts to provide this quality by separating the responsibility for executing run-time actions from the responsibility for deciding the order in which they occur. The ordering of a parallel algorithm is specified in the new EPP language; run-time actions are written in FORTRAN and embedded in the EPP code, from which they are later extracted and given to a FORTRAN compiler for compilation. The separation of ordering and run-time actions is taken to its logical extreme in an attempt to evaluate its benefits and costs in the parallel environment. As part of the evaluation, a compiler and executive were implemented on a Sequent Symmetry 881 shared-memory multiprocessor. The design decisions and difficulties in implementation are discussed at some length, since certain problems are unique to this approach. In the final evaluation, the EPP project asserts that structured, parallel programming languages require a significant amount of interaction with an overseeing task in order to provide some basic, desirable functions. It also asserts that the description of run-time actions (e.g., expression syntax) need not change from the uniprocessor environment to the multiprocessor environment. / Master of Science
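The EPP idea of keeping the ordering specification separate from the run-time actions can be sketched as follows. Plain Python functions stand in for the embedded FORTRAN routines, and a small interpreter executes a separate ordering spec with sequential and parallel combinators; all names and the spec syntax are hypothetical, not EPP's actual notation.

```python
from concurrent.futures import ThreadPoolExecutor

actions = {}  # registry of run-time actions, kept apart from ordering

def action(name):
    """Register a run-time action under a name the ordering spec can use."""
    def register(fn):
        actions[name] = fn
        return fn
    return register

def run(ordering, results=None):
    """Interpret an ordering spec: a string names an action,
    ('seq', ...) runs steps in order, ('par', ...) runs them concurrently."""
    results = results if results is not None else []
    if isinstance(ordering, str):
        results.append(actions[ordering]())
    elif ordering[0] == "seq":
        for step in ordering[1:]:
            run(step, results)
    elif ordering[0] == "par":
        with ThreadPoolExecutor() as ex:
            for sub in ex.map(lambda s: run(s, []), ordering[1:]):
                results.extend(sub)
    return results
```

The interpreter here plays the role of EPP's "executive": the actions never decide when they run, which is exactly the separation the thesis pushes to its logical extreme.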
99

A Runtime Framework for Regular and Irregular Message-Driven Parallel Applications on GPU Systems

Rengasamy, Vasudevan January 2014
The effective use of GPUs for accelerating applications depends on a number of factors, including effective asynchronous use of heterogeneous resources, reducing data transfer between CPU and GPU, increasing occupancy of GPU kernels, overlapping data transfers with computations, reducing GPU idling, and kernel optimizations. Overcoming these challenges requires considerable effort on the part of application developers, and optimization strategies are often proposed and tuned specifically for individual applications. Message-driven executions with over-decomposition of tasks constitute an important model for parallel programming and provide multiple benefits, including communication-computation overlap and reduced idling on resources. Charm++ is one such message-driven language, which employs over-decomposition of tasks, computation-communication overlap and a measurement-based load balancer to achieve high CPU utilization. This research has developed an adaptive runtime framework for efficient execution of Charm++ message-driven parallel applications on GPU systems. In the first part of our research, we developed a runtime framework, G-Charm, focused primarily on optimizing regular applications. At runtime, G-Charm automatically combines multiple small GPU tasks into a single larger kernel, which reduces the number of kernel invocations while improving CUDA occupancy. G-Charm also enables reuse of existing data in GPU global memory, performs GPU memory management, and dynamically schedules tasks across CPU and GPU in order to reduce idle time. In order to combine the partial results obtained from the computations performed on CPU and GPU, G-Charm allows the user to specify an operator with which the partial results are combined at runtime. We also perform compile-time code generation to reduce programming overhead. For Cholesky factorization, a regular parallel application, G-Charm provides a 14% improvement over a highly tuned implementation.
In the second part of our research, we extended our runtime to overcome the challenges presented by irregular applications, such as aperiodic generation of tasks, irregular memory access patterns and varying workloads during application execution. We developed models for deciding the number of tasks that can be combined into a kernel based on the rate of task generation and the GPU occupancy of the tasks. For irregular applications, data reuse results in uncoalesced GPU memory access; we evaluated the effect of altering the global memory access pattern to improve coalesced access. We have also developed adaptive methods for hybrid execution on CPU and GPU, wherein we consider the varying workloads while scheduling tasks across the CPU and GPU. We demonstrate that our dynamic strategies result in an 8-38% reduction in execution times for an N-body simulation application and a molecular dynamics application over the corresponding static strategies that are amenable to regular applications.
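The task-combining strategy at the heart of G-Charm can be sketched as a simple batching scheduler: small tasks accumulate until their combined thread demand reaches an occupancy target, then launch as one batch. This is an illustrative sketch of the idea only; the class and parameter names are hypothetical, and the real runtime works with CUDA kernels rather than a callback.

```python
class KernelBatcher:
    """Buffer small GPU tasks and launch them as one combined 'kernel'
    once an estimated occupancy target is met, cutting launch overhead."""

    def __init__(self, target_threads, launch):
        self.target = target_threads  # threads needed for full occupancy
        self.launch = launch          # callable that launches a batch
        self.pending = []
        self.pending_threads = 0

    def submit(self, task, threads):
        """Queue a task; launch the whole batch when the target is met."""
        self.pending.append(task)
        self.pending_threads += threads
        if self.pending_threads >= self.target:
            self.flush()

    def flush(self):
        """Launch whatever is pending (e.g., at a phase boundary)."""
        if self.pending:
            self.launch(list(self.pending))
            self.pending.clear()
            self.pending_threads = 0
```

The irregular-application extension described above corresponds to making `target_threads` adaptive, based on the task generation rate and per-task occupancy, instead of a fixed constant.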
100

3D Multi-parameters Full Waveform Inversion for challenging 3D elastic land targets

Trinh, Phuong-Thu 24 September 2018
Seismic imaging of onshore targets is very challenging due to complex 3D near-surface effects. In such areas, the seismic wavefield is dominated by elastic and visco-elastic effects such as highly energetic and dispersive surface waves. The interaction of elastic waves with the rough topography and shallow heterogeneities leads to significant converted and scattered energy, implying that both an accurate 3D geometry representation and the correct physics of wave propagation are required for reliable structural imaging. In this manuscript, we present an efficient and flexible full waveform inversion (FWI) strategy for velocity model building on land, specifically in foothill areas. Viscoelastic FWI is a challenging task for current acquisition deployments at the crustal scale. We propose an efficient formulation based on a time-domain spectral element method (SEM) on a flexible Cartesian-based mesh, in which the topography variation is represented by an accurate high-order geometry interpolation. The wave propagation is described by anisotropic elasticity and isotropic attenuation physics. The numerical implementation of the forward problem includes efficient matrix-vector products for solving second-order elastodynamic equations, even for completely deformed 3D geometries.
Complete misfit gradient expressions, including the attenuation contribution spread into density, elastic parameters and attenuation factors, are given in a consistent way. Combined adjoint and forward field recomputation from the final state and previously saved boundary values allows the estimation of gradients with no I/O effort. Two-level parallelism is implemented over sources and domain decomposition, which is necessary for realistic 3D configurations. The gradient preconditioning is performed by a so-called Bessel filter, using an efficient differential implementation based on the SEM discretization on the forward mesh instead of the costly, often-used convolution approach. A non-linear model constraint on the ratio of compressional and shear velocities is introduced into the optimization process at no extra cost. The challenges of elastic multi-parameter FWI in complex land areas are highlighted through synthetic and real data applications. A 3D synthetic inverse-crime illustration is considered on a subset of the SEAM Phase II Foothills model with 4 lines of 20 sources, providing complete 3D illumination. As the data is dominated by surface waves, it is mainly sensitive to the S-wave velocity. We propose a two-step data-windowing strategy, focusing on early body waves before considering the entire wavefield, including surface waves. The use of this data hierarchy, together with the structurally-based Bessel preconditioning, makes it possible to accurately reconstruct both P- and S-wavespeeds. The designed inversion strategy is combined with a low-to-high frequency hierarchy and successfully applied to the pseudo-2D dip-line survey of the SEAM II Foothills dataset. Under the limited illumination of a 2D acquisition, the model constraint on the ratio of P- and S-wavespeeds plays an important role in mitigating the ill-posedness of the multi-parameter inversion process.
By also considering surface waves, we manage to exploit the maximum amount of information in the observed data to obtain a reliable estimation of the model parameters, both in the near surface and at depth. The developed FWI framework and workflow are finally applied to a real foothill dataset. The application is challenging due to the sparse acquisition design, especially noisy recordings and complex underlying structures. Additional prior information, such as well-log data, is considered to assist the FWI design. The preliminary results, relying only on body waves, are shown to improve the kinematic fit and to follow the expected geological interpretation. Model quality control through data-fit analysis and uncertainty studies helps to identify artifacts in the inverted models.
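The misfit-gradient machinery described above follows the standard adjoint-state pattern of FWI. As a generic sketch (not the thesis's exact viscoelastic expressions), the least-squares misfit over sources $s$ and receivers $r$ is

```latex
\chi(\mathbf{m}) = \frac{1}{2} \sum_{s}\sum_{r} \int_0^T
  \left| u_s(\mathbf{x}_r, t; \mathbf{m}) - d_{s,r}(t) \right|^2 \, dt ,
```

and, writing the forward problem abstractly as $A(\mathbf{m})\, u_s = f_s$, the gradient with respect to a model parameter $m_i$ is obtained by correlating the forward field with the adjoint field $\lambda_s$ driven backward in time by the data residuals:

```latex
\frac{\partial \chi}{\partial m_i}
  = - \sum_{s} \int_0^T \lambda_s^{\top}
      \frac{\partial A(\mathbf{m})}{\partial m_i}\, u_s \, dt .
```

The "no I/O" gradient estimation mentioned above exploits the fact that this correlation needs $u_s$ and $\lambda_s$ at the same time instants: $u_s$ is recomputed backward on the fly from its final state and saved boundary values while $\lambda_s$ is propagated, so neither field is written to disk.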
