Global ETD Search

71	IPPM : Interactive parallel program monitor / Brandis, Robert Craig, January 1986 (has links) Thesis (M.S.)--Oregon Graduate Center, 1986.
72	Ensuring performance and correctness for legacy parallel programs McPherson, Andrew John January 2015 (has links) Modern computers are based on manycore architectures, with multiple processors on a single silicon chip. In this environment programmers are required to make use of parallelism to fully exploit the available cores. This can either be within a single chip, normally using shared-memory programming or at a larger scale on a cluster of chips, normally using message-passing. Legacy programs written using either paradigm face issues when run on modern manycore architectures. In message-passing the problem is performance related, with clusters based on manycores introducing necessarily tiered topologies that unaware programs may not fully exploit. In shared-memory it is a correctness problem, with modern systems employing more relaxed memory consistency models, on which legacy programs were not designed to operate. Solutions to this correctness problem exist, but introduce a performance problem as they are necessarily conservative. This thesis focuses on addressing these problems, largely through compile-time analysis and transformation. The first technique proposed is a method for statically determining the communication graph of an MPI program. This is then used to optimise process placement in a cluster of CMPs. Using the 64-process versions of the NAS parallel benchmarks, we see an average of 28% (7%) improvement in communication localisation over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement. Secondly, we move into the shared-memory paradigm, identifying and proving necessary conditions for a read to be an acquire. This can be used to improve solutions in several application areas, two of which we then explore. We apply our acquire signatures to the problem of fence placement for legacy well-synchronised programs. We find that applying our signatures, we can reduce the number of fences placed by an average of 62%, leading to a speedup of up to 2.64x over an existing practical technique. Finally, we develop a dynamic synchronisation detection tool known as SyncDetect. This proof of concept tool leverages our acquire signatures to more accurately detect ad hoc synchronisations in running programs and provides the programmer with a report of their locations in the source code. The tool aims to assist programmers with the notoriously difficult problem of parallel debugging and in manually porting legacy programs to more modern (relaxed) memory consistency models. 005.7
73	A distributed Linda server on a network of heterogeneous processors Smith, Graham Leslie January 1993 (has links) Linda is an approach to parallelism which relies on a virtual associative shared memory called tuple space. Tuple space is accessed through a small set of primitive operations and is conceptually easy to understand and manipulate. The physical implementation of a Linda tuple space may of course be completely different from the conceptual model. Rhodes has implemented versions of Linda on a ring of RS-232 joined PC's and on a cluster of T800 transputers with a single copy of tuple space on one transputer. Current research targets the implementation of a distributed Linda server on a network of heterogeneous processors. This work describes the design and implementation of a distributed Linda server. Emphasis is placed on aspects of the design which enhance portability and efficiency. LINDA (Computer system) Parallel programming (Computer science)
74	A Distributed Approach to EpiFast using Apache Spark Kannan, Vijayasarathy 04 August 2015 (has links) EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of the stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using message-passing-interface (MPI). However, it suffers from few shortcomings with respect to scale of networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast based on the Apache Spark big data processing engine and Charm-EpiFast based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could potentially benefit in terms of performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider scale of networks and environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the preliminary efforts of using Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches. / Master of Science computational epidemiology parallel programming distributed computing
75	Performance Impact on Neural Network with Partitioned Convolution Implemented with GPU Programming / Partitioned Convolution in Neuron Network Lee, Bill January 2021 (has links) For input data of homogenous type, the standard form of convolutional neural network is normally constructed with universally applied filters to identify global patterns. However, for certain datasets, there are identifiable trends and patterns within subgroups of input data. This research proposes a convolutional neural network that deliberately partitions input data into groups to be processed with unique sets of convolutional layers, thus identifying the underlying features of individual data groups. Training and testing data are built from historical prices of stock market and preprocessed so that the generated datasets are suitable for both standard and the proposed convolutional neural network. The author of this research also developed a software framework that can construct neural networks to perform necessary testing. The calculation logic was implemented using parallel programming and executed on a Nvidia graphic processing unit, thus allowing tests to be executed without expensive hardware. Tests were executed for 134 sets of datasets to benchmark the performance between standard and the proposed convolutional neural network. Test results show that the partitioned convolution method is capable of performance that rivals its standard counterpart. Further analysis indicates that more sophisticated method of building datasets, larger sets of training data, or more training epochs can further improve the performance of the partitioned neural network. For suitable datasets, the proposed method could be a viable replacement or supplement to the standard convolutional neural network structure. / Thesis / Master of Applied Science (MASc) / A convolutional neural network is a machine learning tool that allows complex patterns in datasets to be identified and modelled. For datasets with input that consists of the same type of data, a convolutional neural network is often architected to identify global patterns. This research explores the viability of partitioning input data into groups and processing them with separate convolutional layers so unique patterns associated with individual subgroups of input data can be identified. The author of this research built suitable test datasets and developed a (parallel computation enabled) framework that can construct both standard and proposed convolutional neural networks. The test results show that the proposed structure is capable of performance that matches its standard counterpart. Further analysis indicates that there are potential methods to further improve the performance of partitioned convolution, making it a viable replacement or supplement to standard convolution. convolutional neural network parallel programming partitioned
76	Modular Implementation of Program Adaptation with Existing Scientific Codes Kang, Pilsung 01 September 2010 (has links) Often times, scientific software needs to be adapted for different execution environments, problem sets, and available resources to ensure its efficiency and reliability. Directly modifying source code to implement adaptation is time-consuming, error-prone, and difficult to manage for today's complex software. This thesis studies modular approaches to implementing program adaptation with existing scientific codes, whereby application-specific adaptation strategies can be implemented in separate code which is then transparently combined with a given program. By using the approaches developed in this thesis, scientific programmers can focus on the design and implementation of adaptation schemes, manage an adaptation code separately from the main program components, and compose an adaptive application whose original capabilities are enhanced in diverse aspects such as application performance and stability. The primary objective of the modular approaches in this study is to provide a language-independent development method of adapting existing scientific software, so that applications written in different languages can be supported when implementing adaptation schemes. In particular, the emphasis is on Fortran, which has been a mainstream language for programming scientific applications. Three research questions are formulated in this thesis, each of which aims to: design and implement high-level abstractions for expressing adaptation strategies, develop a dynamic tuning approach for parallel programs, and support flexible runtime adaptation schemes, respectively. The applicability of the proposed approaches is demonstrated through example applications to real-world scientific software. / Ph. D. scientific computing parallel programming program adaptation
77	Semaphore Solutions for General Mutual Exclusion Problems Yue, Kwok B. (Kwok Bun) 08 1900 (has links) Automatic generation of starvation-free semaphore solutions to general mutual exclusion problems is discussed. A reduction approach is introduced for recognizing edge-solvable problems, together with an O(N^2) algorithm for graph reduction, where N is the number of nodes. An algorithm for the automatic generation of starvation-free edge-solvable solutions is presented. The solutions are proved to be very efficient. For general problems, there are two ways to generate efficient solutions. One associates a semaphore with every node, the other with every edge. They are both better than the standard monitor—like solutions. Besides strong semaphores, solutions using weak semaphores, weaker semaphores and generalized semaphores are also considered. Basic properties of semaphore solutions are also discussed. Tools describing the dynamic behavior of parallel systems, as well as performance criteria for evaluating semaphore solutions are elaborated. computer science parallel programming and processing semaphore solutions Parallel programming (Computer science)
78	Model-based automatic performance diagnosis of parallel computations / Li, Li, January 2007 (has links) Thesis (Ph. D.)--University of Oregon, 2007. / Typescript. Includes vita and abstract. Includes bibliographical references (leaves 119-123). Also available for download via the World Wide Web; free to University of Oregon users.
79	Efficient Conditional Synchronization for Transactional Memory Based System Naik, Aniket Dilip 10 April 2006 (has links) Multi-threaded applications are needed to realize the full potential of new chip-multi-threaded machines. Such applications are very difficult to program and orchestrate correctly, and transactional memory has been proposed as a way of alleviating some of the programming difficulties. However, transactional memory can directly be applied only to critical sections, while conditional synchronization remains difficult to implement correctly and efficiently. This dissertation describes EasySync, a simple and inexpensive extension to transactional memory that allows arbitrary conditional synchronization to be expressed in a simple and composable way. Transactional memory eliminates the need to use locks and provides composability for critical sections: atomicity of a transaction is guaranteed regardless of how other code is written. EasySync provides the same benefits for conditional synchronizations: it eliminates the need to use conditional variables, and it guarantees wakeup of the waiting transaction when the real condition it is waiting for is satisfied, regardless of whether other code correctly signals that change. EasySync also allows transactional memory systems to efficiently provide lock-free and condition variable-free conditional critical regions and even more advanced synchronization primitives, such as guarded execution with arbitrary conditional or guard code. Because EasySync informs the hardware the that a thread is waiting, it allows simple and effective optimizations, such as stopping the execution of a thread until there is a change in the condition it is waiting for. Like transactional memory, EasySync is backward compatible with existing code, which we confirm by running unmodified Splash-2 applications linked with an EasySync-based synchronization library. We also re-write some of the synchronization in three Splash-2 applications, to take advantage of better code readability, and to replace spin-waiting with its more efficient EasySync equivalents. Our experimental evaluation shows that EasySync successfully eliminates processor activity while waiting, reducing the number of executed instructions by 8.6% on average in a 16-processor CMP. We also show that these savings increase with the number of processors, and also for applications written for transactional memory systems. Finally, EasySync imposes virtually no performance overheads, and can in fact improve performance. Parallel programming Transactional memory Synchronization Transaction systems (Computer systems) Parallel programming (Computer science) Threads (Computer programs)
80	Augmenting MPI Programming Process with Cognitive Computing Kazilas, Panagiotis January 2019 (has links) Cognitive Computing is a new and quickly advancing technology. In thelast decade Cognitive Computing has been used to assist researchers in theirendeavors in many different scientific fields such as Health & medicine,Education, Marketing, Psychology and Financial Services. On the otherhand, Parallel programming is a more complex concept than sequentialprogramming. The additional complexity of Parallel Programming isintroduced by its nature that requires implementations of more complexalgorithms and it introduces additional concepts to the developers, namelythe communication between the processes (Distributed memory systems)that execute the parallel program and their synchronization (Share memorysystems). As a result of this additional complexity, a lot of novice developersare reserved in their attempts to implement parallel programs. The objectiveof this research project was to investigate whether we can assist parallelprogramming process through cognitive computing solutions. In order toachieve our objective, the MPI Assistant, a Q&A system has been developedand a case study has been carried out to determine our application’s efficiencyin our attempt to assist parallel programming developers. The case studyshowed that our MPI Assistant system indeed helped developers reduce thetime they spend to develop their solutions, but not improve the quality ofthe program or its efficiency as these improvements require features that areout of this research project’s scope. However, the case study had limitednumber of participants, which may affect our results’ reliability. As a nextstep in our attempt to determine if cognitive computing technologies are ableto assist developers in their parallel programming development, we movedto investigate if cognitive solutions can extract better and more completeresponses compared to our manually-created responses that we created forthe MPI Assistant. We have experimented with 2 different approaches to theproblem. An approach where we manually created responses for the MPIAssistant, and an approach where we investigated if cognitive solutions canautomatically extract better and complete responses. We compared the qualityof the latter automatic responses with the quality of the former which weremanually created. Parallel Programming Cognitive Computing Distributed parallel programming MPI IBM Watson IBM Assistant service IBM Discovery service MPI Assistance Augment parallel programming processes Computer Systems Datorsystem

Search results