131

Towards a portable occam

Hill, David Timothy 07 March 2013 (has links)
Occam is designed for concurrent programming on a network of transputers. Allocation and partitioning of the program are specified within the source code, binding the program to a specific network. An alternative approach is proposed which completely separates the source code from hardware considerations. Static allocation is performed as a separate phase and should, ideally, be automatic, but at present is manual. Complete hardware abstraction requires that non-local, shared communication be provided for, introducing an efficiency overhead which can be minimised by the allocation. The proposal was implemented on a network of IBM PCs, modelled on a transputer network, and implementation issues are discussed.
132

Unsupervised-based Distributed Machine Learning for Efficient Data Clustering and Prediction

Baligodugula, Vishnu Vardhan 23 May 2023 (has links)
No description available.
133

A Parallelized Naïve Algorithm for Pattern Matching

Svensson, William January 2022 (has links)
Pattern matching is the problem of locating one string, a pattern, inside another, a text; it is required in, for example, databases, search engines, and text editors. Several algorithms have been created to tackle this problem, and this thesis evaluates whether a parallel version of the Naïve algorithm, given a reasonable number of threads for a personal computer, can become more efficient than some state-of-the-art algorithms used today. To that end, an algorithm from the Deadzone family, the Horspool algorithm, and a parallel Naïve algorithm were implemented and evaluated on two differently sized alphabets. The results show that the parallel Naïve implementation is to be favoured over Deadzone and Horspool on an alphabet of size 4 for pattern lengths greater than 2 and up to 20. Furthermore, for an alphabet of size 256, the parallel Naïve algorithm should also be used for pattern lengths 1 to 20.
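As a rough illustration of the approach evaluated above (not the thesis's own code), the Python sketch below parallelizes the naïve matcher by partitioning starting positions across workers; slices may read past a chunk boundary, so matches that straddle chunks are still found exactly once. Note that CPython threads will not actually speed up a pure-Python comparison loop because of the global interpreter lock; the sketch only illustrates the partitioning scheme.

```python
# Minimal sketch of a parallelized naive matcher: each worker owns a
# contiguous range of starting positions and scans it with plain comparison.
from concurrent.futures import ThreadPoolExecutor

def naive_scan(text, pattern, start, end):
    """Naively check every starting position in [start, end)."""
    m = len(pattern)
    last = min(end, len(text) - m + 1)
    return [i for i in range(start, last) if text[i:i + m] == pattern]

def parallel_naive_search(text, pattern, workers=4):
    n = len(text)
    chunk = max(1, -(-n // workers))  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(naive_scan, text, pattern, s, s + chunk)
                   for s in range(0, n, chunk)]
    return sorted(i for f in futures for i in f.result())

print(parallel_naive_search("abracadabra", "abra"))  # [0, 7]
```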
134

Millipyde: A Cross-Platform Python Framework for Transparent GPU Acceleration

Asbury, James B 01 December 2021 (has links) (PDF)
The prevalence of general-purpose GPU computing continues to grow, tackling a wider variety of problems that benefit from GPU acceleration. This acceleration often suffers from a high barrier to entry, however, due to the complexity of software tools that closely map to the underlying GPU hardware, the fast-changing landscape of GPU environments, and the fragmentation of tools and languages that only support specific platforms. Because of this, new solutions will continue to be needed to make GPGPU acceleration more accessible to the developers who can benefit from it. AMD’s new cross-platform development ecosystem ROCm shows promise for developing applications and solutions that work across systems running both AMD and non-AMD GPU computing hardware. This thesis presents Millipyde, a framework for GPU acceleration in Python using AMD’s ROCm. Millipyde includes two new types, the gpuarray and gpuimage, as well as three new constructs for building GPU-accelerated applications – the Operation, Pipeline, and Generator. Using these tools, Millipyde hopes to make it easier for engineers and researchers to write GPU-accelerated code in Python. Millipyde also has the potential to schedule work across many GPUs in complex multi-device environments. These capabilities are demonstrated in a sample application that augments images on-device for machine learning applications. Our results showed that Millipyde is capable of making individual image-related transformations up to around 200 times faster than their CPU-only equivalents. Constructs such as Millipyde’s Pipeline were also able to further improve performance in certain situations, performing best when allowed to transparently schedule work across multiple devices.
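Millipyde's own API is not reproduced here; as a loose illustration of the underlying idea (move data to the GPU once, chain transformations on-device, and copy back only at the end), the sketch below uses CuPy, a different GPU array library, as a stand-in:

```python
# Illustration only: this is CuPy, not Millipyde's API. The point is the
# Pipeline-style pattern of keeping intermediate results on the GPU so that
# chained transformations avoid round-trips through host memory.
import cupy as cp
from cupyx.scipy import ndimage as gpu_ndimage

def augment(image_host):
    img = cp.asarray(image_host)                     # host -> device, once
    img = gpu_ndimage.rotate(img, 90, reshape=True)  # runs on the GPU
    img = gpu_ndimage.gaussian_filter(img, sigma=2.0)
    return cp.asnumpy(img)                           # device -> host, once
```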
135

Dynamic memory management for the Loci framework

Zhang, Yang 08 May 2004 (has links)
Resource management is a critical part of high-performance computing software. While management of processing resources to increase performance is the most critical, efficient management of memory resources plays an important role in solving large problems. This thesis research seeks to create an effective dynamic memory management scheme for a declarative data-parallel programming system, in which some form of automatic resource management is a requirement. Using the Loci framework, this research focuses on exploring such opportunities. We believe there exists an automatic memory management scheme for such declarative data-parallel systems that provides a good compromise between memory utilization and performance. In addition to basic memory management, this research also seeks to develop methods that take advantage of the cache memory subsystem and explore the balance between memory utilization and parallel communication costs in such declarative data-parallel frameworks.
136

The application of the key-value-reference model in dynamic irregular parallel computation

Zhang, Yang 02 May 2009 (has links)
This dissertation studies the effects of the "key-value-ref" model in the computational field simulation software development process. The motivation of this study is rooted in addressing the high cost of designing and implementing high-performance simulation software that runs on modern parallel supercomputers. Unlike traditional sequential programming, where a number of effective tools exist, parallel super-cluster programming involves many low-level constructs that increase the complexity of implementing a software design. More importantly, the dynamic nature of the simulation problems brings additional challenges to the design stage. Often a designer has to face a number of competing factors and needs to devise strategies to make trade-offs and to find better software structures that can be realized with reasonable performance and flexibility. Proper modeling can help to address many of these issues in the design and implementation stages. Using a two-phase Lagrangian particle-field simulation problem as a case study, this dissertation shows that the "key-space" concept developed in this dissertation's "key-value-ref" model is able to model the essential components of available design approaches for parallel computational field simulation, that it helps to expose design choices in a more sensible way, and that it offers guidance towards crafting a better software structure. In addition, a programming interface is designed and implemented that allows the development of computational field simulation software utilizing the "key-space" concept. Empirical results show that the current implementation provides reasonable performance compared to highly optimized, hand-tuned programs.
137

Shared Memory Abstractions for Heterogeneous Multicore Processors

Schneider, Scott 21 January 2011 (has links)
We are now seeing diminishing returns from classic single-core processor designs, yet the number of transistors available for a processor is still increasing. Processor architects are therefore experimenting with a variety of multicore processor designs. Heterogeneous multicore processors with Explicitly Managed Memory (EMM) hierarchies are one such experimental design which has the potential for high performance, but at the cost of great programmer effort. EMM processors have cores that are divorced from the normal memory hierarchy, thus the onus is on the programmer to manage locality and parallelism. This dissertation presents the Cellgen source-to-source compiler which moves some of this complexity back into the compiler. Cellgen offers a directive-based programming model with semantics similar to OpenMP for the Cell Broadband Engine, a general-purpose processor with EMM. The compiler implicitly handles locality and parallelism, schedules memory transfers for data parallel regions of code, and provides performance predictions which can be leveraged to make scheduling decisions. We compare this approach to using a software cache, to a different programming model which is task based with explicit data transfers, and to programming the Cell directly using the native SDK. We also present a case study which uses the Cellgen compiler in a comparison across multiple kinds of multicore architectures: heterogeneous, homogeneous and radically data-parallel graphics processors. / Ph. D.
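Cellgen itself compiles C with OpenMP-like directives for the Cell; purely as a loose Python analogy (every name below is illustrative, none of it is Cellgen's API), the sketch shows the shape of the programming model: the user writes plain per-chunk code, and a wrapper hides the partitioning and scheduling, much as the directive-based compiler does for data-parallel regions.

```python
# Loose analogy to a directive-based data-parallel model: the wrapped
# function is the "parallel region"; the decorator handles splitting the
# data and scheduling the pieces, which the caller never writes by hand.
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

def data_parallel(workers=4):
    def wrap(fn):
        @wraps(fn)
        def run(data):
            chunk = max(1, -(-len(data) // workers))  # ceiling division
            pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                parts = list(pool.map(fn, pieces))
            return [x for part in parts for x in part]
        return run
    return wrap

@data_parallel(workers=4)
def scale(chunk):            # plain per-element code, no scheduling logic
    return [2 * x for x in chunk]

print(scale(list(range(10))))  # [0, 2, 4, ..., 18]
```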
138

Problem specific environments for parallel scientific computing

Auvil, Loretta Sue 04 December 2009 (has links)
Parallelism is one of the key components of large scale, high performance computing. Extensive use of parallelism has yielded a tremendous increase in the raw processing speed of high performance systems, but parallel problem solving remains difficult. These difficulties are typically solved by building software tools, such as parallel programming environments. Existing parallel programming environments are general purpose and use a broad paradigm. This thesis illustrates that problem specific environments are more beneficial than general purpose environments. A problem specific environment permits the design of the algorithm, while also facilitating definition of the problem. We developed problem specific environments for a simple and a complex class of problems. The simple class consists of two classic graph problems, namely, all pairs shortest path and connected components. The complex class consists of elliptic partial differential equations solved via domain decomposition. Specific problems were solved with the problem specific environments and the general purpose environment, BUILD, which allows the algorithm to be described with a control flow graph. Comparisons between special purpose environments and general purpose environments show that the special purpose environments allow the user to concentrate on the problem, while general purpose environments force the user to think about mapping the problem to the environment rather than solving the problem in parallel. Therefore, we conclude more effort should be spent on building tools and environments for parallel computing that focus specifically on a particular class of problems. / Master of Science
139

Explicit parallel programming

Gamble, James Graham 08 June 2009 (has links)
While many parallel programming languages exist, they rarely address the issue of communication (that is, expressibility and readability). A new language called Explicit Parallel Programming (EPP) attempts to provide this quality by separating the responsibility for the execution of run-time actions from the responsibility for deciding the order in which they occur. The ordering of a parallel algorithm is specified in the new EPP language; run-time actions are written in FORTRAN and embedded in the EPP code, from which they are later extracted and given to a FORTRAN compiler for compilation. The separation of ordering and run-time actions is taken to its logical extreme in an attempt to evaluate its benefits and costs in the parallel environment. As part of the evaluation, a compiler and executive were implemented on a Sequent Symmetry 881 shared-memory multiprocessor computer. The design decisions and difficulties in implementation are discussed at some length, since certain problems are unique to this approach. In the final evaluation, the EPP project asserts that structured, parallel programming languages require a significant amount of interaction with an overseeing task in order to provide some basic, desirable functions. It also asserts that the description of run-time actions (e.g., expression syntax) need not change from the uniprocessor environment to the multiprocessor environment. / Master of Science
140

Chitra: a visualization system to analyze the dynamics of parallel programs

Doraswamy, Naganand. January 1991 (has links)
Visualization is gaining popularity in the field of Computer Science, especially in areas such as performance evaluation and program animation. In this thesis, we explore the possibility of using visualization to analyze parallel program dynamics, and we have developed a visualization system, Chitra, for this purpose. Chitra visualizes program execution sequences collected by monitoring parallel and distributed program execution, and it provides multiple views to aid in the analysis. Through analysis of program execution sequences, we develop an empirical model, a hybrid stochastic/deterministic model, that describes parallel program behavior. Chitra provides various clues and capabilities that assist in developing an empirical model to fit the observed program behavior. Developing empirical models helps in predicting and evaluating the efficiency of parallel programs. A case study of the dining philosophers problem, a classic resource-sharing problem, applies Chitra to develop empirical models for a range of processes. These models help in evaluating efficiency across a range of parameter values, given observations at a few parameter values. Working with Chitra has strengthened our belief that as parallel and distributed programs become more common, visualization systems will have an important role to play in analyzing them. / M.S.
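For context, the dining philosophers case study mentioned above concerns threads (philosophers) contending for shared locks (forks); a minimal Python sketch of the problem, with the classic fixed-ordering fix for deadlock, might look like this (illustrative only, unrelated to Chitra's own tooling):

```python
# Dining philosophers: N threads each need two adjacent forks (locks).
# Acquiring forks in a fixed global order prevents the circular wait
# that would otherwise deadlock the system.
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]

def philosopher(i, meals=3):
    left, right = i, (i + 1) % N
    first, second = min(left, right), max(left, right)  # global ordering
    for _ in range(meals):
        with forks[first]:
            with forks[second]:
                pass  # eat
        # think

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```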
