41

Optimizing data parallelism in applicative languages

Alahmadi, Marwan Ibrahim January 1990
No description available.
42

PEM - Modelo de Ejecución Paralela basado en redes de Petri

Wolfmann, Aaron Gustavo Horacio January 2015
The goal of this thesis is to define a parallel execution model that, based on representing a parallel algorithm with Petri nets, allows a flexible set of mutually independent processors to execute the algorithm asynchronously with high performance, while giving the programmer the ability to tune execution parameters in pursuit of performance improvements. The rationale is clear: the aim is a tool for executing parallel programs that allows the algorithm to be modelled and then carried from the model to asynchronous execution while preserving the model. Petri nets are the basic and unquestionably fitting tool for achieving this goal. One challenge is bridging the gap between the model and an execution of the parallel program with acceptable, scalable performance. To that end, the model must be bound to a set of processing units running in parallel.
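For concreteness, a minimal Python sketch (invented for illustration, not PEM's implementation) of the core idea the abstract describes: a transition fires as soon as its input places hold tokens, so token availability is what drives asynchronous task execution.

    # Minimal sketch: each transition carries a task; a transition is
    # enabled when every input place holds a token, mirroring how data
    # availability drives execution in a Petri-net-based model.
    from collections import defaultdict

    class PetriNet:
        def __init__(self):
            self.marking = defaultdict(int)      # tokens per place
            self.transitions = []                # (inputs, outputs, task)

        def add_transition(self, inputs, outputs, task):
            self.transitions.append((inputs, outputs, task))

        def enabled(self):
            return [t for t in self.transitions
                    if all(self.marking[p] > 0 for p in t[0])]

        def fire(self, transition):
            inputs, outputs, task = transition
            for p in inputs:
                self.marking[p] -= 1             # consume input tokens
            task()                               # run the associated work
            for p in outputs:
                self.marking[p] += 1             # produce output tokens

    net = PetriNet()
    net.marking["start"] = 2
    net.add_transition(["start"], ["done"], lambda: print("task ran"))
    while (ready := net.enabled()):              # fire until quiescent
        net.fire(ready[0])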
43

Providing Support for the Movidius Myriad1 Platform in the SkePU Skeleton Programming Framework

Cuello, Rosandra January 2014
The Movidius Myriad1 Platform is a multicore embedded platform primed to offer high performance and power efficiency for computer vision applications in mobile devices. The challenges of programming multicore environments are well known, and skeleton programming offers a high-level programming alternative for parallel computing, intended to hide the complexities of the system from the programmer. The SkePU Skeleton Programming Framework includes backend implementations for CPU and GPU systems and it has the capacity to support more platforms by extending its backend implementations. With this master's thesis project we aim to extend the SkePU Skeleton Programming Framework to provide support for execution on the Movidius Myriad1 embedded platform. Our SkePU backend for Myriad1 consists of a set of macros and functions to compose the different elements of a Myriad1 application, data communication structures to exchange data between the host system and Myriad1, and a helper script and auxiliary files to generate a Myriad1 application. Evaluation and testing demonstrate that our backend is usable; however, further optimizations are needed to obtain performance good enough to make it practical for real-life applications, particularly when it comes to data communication. As part of this project, we have outlined some improvements that could be applied in the future to obtain better overall performance, addressing the issues found with the methods of data communication.
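As a rough illustration of the skeleton idea (a toy Python sketch, not SkePU's actual C++ API; the backend names and dispatch scheme are invented), a map skeleton lets the same user function run on different backends without the user code changing:

    # Toy "map" skeleton: user code calls one interface; the skeleton
    # hides how the computation is actually parallelized.
    from multiprocessing import Pool

    def map_skeleton(user_func, data, backend="cpu"):
        if backend == "cpu":                     # sequential fallback
            return [user_func(x) for x in data]
        elif backend == "parallel":              # multi-process backend
            with Pool() as pool:
                return pool.map(user_func, data)
        raise ValueError(f"no backend named {backend!r}")

    def square(x):
        return x * x

    if __name__ == "__main__":
        print(map_skeleton(square, range(8), backend="parallel"))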
44

Building a foundation for the future of software practices within the multi-core domain

Berg, Celina 31 August 2011
Multi-core programming presents developers with a dramatic paradigm shift. Where the conceptual models of sequential programming largely supported the decoupling of source from underlying architecture, it is now unwise to develop new patterns, abstractions and parallel software in complete isolation from issues of modern hardware utilization. Challenging issues historically associated with complex systems code are now compounded within the parallel domain. These issues are manifested at all stages of software development, including design, development, testing and maintenance. Programmers currently lack the essential tools to even partially automate reasoning techniques, resource utilization and system configuration management. Current trial-and-error strategies lack a systematic approach that will scale to growing multi-core and multi-processor environments. In fact, current algorithm and data-layout conceptual models applied to design, implementation and pedagogy often conflict with effective parallelization strategies. This dissertation calls for a rethinking, rebuilding and retooling of conceptual models, taking into account opportunities to introduce parallelism for multi-core architectures from the ground up. In order to establish new conceptual models, we must first 1) identify inherent complexities in multi-core development, 2) establish support strategies to make handling them more explicit and 3) evaluate the impact of these strategies in terms of proposed software development practices and tool support. / Graduate
45

IPPM: Interactive parallel program monitor

Brandis, Robert Craig, January 1986
Thesis (M.S.)--Oregon Graduate Center, 1986.
46

Ensuring performance and correctness for legacy parallel programs

McPherson, Andrew John January 2015
Modern computers are based on manycore architectures, with multiple processors on a single silicon chip. In this environment programmers are required to make use of parallelism to fully exploit the available cores. This can either be within a single chip, normally using shared-memory programming or at a larger scale on a cluster of chips, normally using message-passing. Legacy programs written using either paradigm face issues when run on modern manycore architectures. In message-passing the problem is performance related, with clusters based on manycores introducing necessarily tiered topologies that unaware programs may not fully exploit. In shared-memory it is a correctness problem, with modern systems employing more relaxed memory consistency models, on which legacy programs were not designed to operate. Solutions to this correctness problem exist, but introduce a performance problem as they are necessarily conservative. This thesis focuses on addressing these problems, largely through compile-time analysis and transformation. The first technique proposed is a method for statically determining the communication graph of an MPI program. This is then used to optimise process placement in a cluster of CMPs. Using the 64-process versions of the NAS parallel benchmarks, we see an average of 28% (7%) improvement in communication localisation over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement. Secondly, we move into the shared-memory paradigm, identifying and proving necessary conditions for a read to be an acquire. This can be used to improve solutions in several application areas, two of which we then explore. We apply our acquire signatures to the problem of fence placement for legacy well-synchronised programs. We find that applying our signatures, we can reduce the number of fences placed by an average of 62%, leading to a speedup of up to 2.64x over an existing practical technique. Finally, we develop a dynamic synchronisation detection tool known as SyncDetect. This proof of concept tool leverages our acquire signatures to more accurately detect ad hoc synchronisations in running programs and provides the programmer with a report of their locations in the source code. The tool aims to assist programmers with the notoriously difficult problem of parallel debugging and in manually porting legacy programs to more modern (relaxed) memory consistency models.
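As a hedged sketch of the placement idea (a greedy heuristic invented for illustration, not the thesis's static analysis or optimiser), one can pack heavily-communicating MPI ranks onto the same chip so that more traffic stays on-chip than under by-rank scheduling:

    # Greedy placement sketch: comm_volume maps (rank_a, rank_b) to the
    # bytes they exchange; the heaviest pairs are co-located first.
    def place(comm_volume, n_ranks, cores_per_chip):
        pairs = sorted(comm_volume.items(), key=lambda kv: -kv[1])
        chip_of, chips = {}, []                  # rank -> chip, chip members
        for (a, b), _ in pairs:
            for r in (a, b):
                if r not in chip_of:
                    # join the partner's chip if it has room, else a new chip
                    other = chip_of.get(b if r == a else a)
                    if other is not None and len(chips[other]) < cores_per_chip:
                        chips[other].add(r); chip_of[r] = other
                    else:
                        chips.append({r}); chip_of[r] = len(chips) - 1
        for r in range(n_ranks):                 # ranks with no recorded traffic
            if r not in chip_of:
                chips.append({r}); chip_of[r] = len(chips) - 1
        return chip_of

    volumes = {(0, 1): 900, (2, 3): 800, (0, 2): 10}
    print(place(volumes, 4, cores_per_chip=2))   # (0,1) and (2,3) share chips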
47

A distributed Linda server on a network of heterogeneous processors

Smith, Graham Leslie January 1993
Linda is an approach to parallelism which relies on a virtual associative shared memory called tuple space. Tuple space is accessed through a small set of primitive operations and is conceptually easy to understand and manipulate. The physical implementation of a Linda tuple space may of course be completely different from the conceptual model. Rhodes has implemented versions of Linda on a ring of RS-232-joined PCs and on a cluster of T800 transputers with a single copy of tuple space on one transputer. Current research targets the implementation of a distributed Linda server on a network of heterogeneous processors. This work describes the design and implementation of a distributed Linda server. Emphasis is placed on aspects of the design which enhance portability and efficiency.
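For readers unfamiliar with Linda, a minimal single-process Python sketch of the classic primitives (out, in, rd); the matching rule is a simplified assumption, and this is not the distributed server the thesis builds:

    # Tuple space sketch: out deposits a tuple, in_ is a blocking
    # destructive read, rd is a blocking non-destructive read.
    import threading

    class TupleSpace:
        def __init__(self):
            self._tuples = []
            self._cv = threading.Condition()

        def out(self, *tup):
            with self._cv:
                self._tuples.append(tup)
                self._cv.notify_all()

        def _match(self, pattern, tup):          # None acts as a wildcard
            return len(pattern) == len(tup) and all(
                p is None or p == f for p, f in zip(pattern, tup))

        def _take(self, pattern, remove):
            with self._cv:
                while True:
                    for tup in self._tuples:
                        if self._match(pattern, tup):
                            if remove:
                                self._tuples.remove(tup)
                            return tup
                    self._cv.wait()              # block until a new tuple

        def in_(self, *pattern):
            return self._take(pattern, remove=True)

        def rd(self, *pattern):
            return self._take(pattern, remove=False)

    ts = TupleSpace()
    ts.out("count", 1)
    print(ts.rd("count", None))                  # -> ('count', 1)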
48

Performance Impact on Neural Network with Partitioned Convolution Implemented with GPU Programming / Partitioned Convolution in Neuron Network

Lee, Bill January 2021
For input data of homogeneous type, the standard form of convolutional neural network is normally constructed with universally applied filters to identify global patterns. However, for certain datasets, there are identifiable trends and patterns within subgroups of input data. This research proposes a convolutional neural network that deliberately partitions input data into groups to be processed with unique sets of convolutional layers, thus identifying the underlying features of individual data groups. Training and testing data are built from historical stock market prices and preprocessed so that the generated datasets are suitable for both the standard and the proposed convolutional neural network. The author of this research also developed a software framework that can construct neural networks to perform the necessary testing. The calculation logic was implemented using parallel programming and executed on an Nvidia graphics processing unit, thus allowing tests to be executed without expensive hardware. Tests were executed for 134 datasets to benchmark the performance of the standard against the proposed convolutional neural network. Test results show that the partitioned convolution method is capable of performance that rivals its standard counterpart. Further analysis indicates that a more sophisticated method of building datasets, larger sets of training data, or more training epochs can further improve the performance of the partitioned neural network. For suitable datasets, the proposed method could be a viable replacement or supplement to the standard convolutional neural network structure. / Thesis / Master of Applied Science (MASc) / A convolutional neural network is a machine learning tool that allows complex patterns in datasets to be identified and modelled. For datasets whose input consists of the same type of data, a convolutional neural network is often architected to identify global patterns. This research explores the viability of partitioning input data into groups and processing them with separate convolutional layers, so that unique patterns associated with individual subgroups of input data can be identified. The author of this research built suitable test datasets and developed a parallel-computation-enabled framework that can construct both the standard and the proposed convolutional neural networks. The test results show that the proposed structure is capable of performance that matches its standard counterpart. Further analysis indicates that there are potential methods to further improve the performance of partitioned convolution, making it a viable replacement or supplement to standard convolution.
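A small numpy sketch of the partitioning idea (the shapes and kernels are assumptions, not the thesis's GPU implementation): each subgroup of the input gets its own filter instead of one universally applied filter bank.

    # Partitioned 1-D convolution sketch: one kernel per input group,
    # so each group's features are extracted independently.
    import numpy as np

    def partitioned_conv1d(x, kernels):
        # x: (n_groups, length); kernels: one (k,) filter per group
        return np.stack([np.convolve(g, k, mode="valid")
                         for g, k in zip(x, kernels)])

    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 16))             # e.g. two feature subgroups
    kernels = [rng.standard_normal(3) for _ in range(2)]
    print(partitioned_conv1d(x, kernels).shape)  # (2, 14)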
49

Modular Implementation of Program Adaptation with Existing Scientific Codes

Kang, Pilsung 01 September 2010
Oftentimes, scientific software needs to be adapted for different execution environments, problem sets, and available resources to ensure its efficiency and reliability. Directly modifying source code to implement adaptation is time-consuming, error-prone, and difficult to manage for today's complex software. This thesis studies modular approaches to implementing program adaptation with existing scientific codes, whereby application-specific adaptation strategies can be implemented in separate code which is then transparently combined with a given program. By using the approaches developed in this thesis, scientific programmers can focus on the design and implementation of adaptation schemes, manage adaptation code separately from the main program components, and compose an adaptive application whose original capabilities are enhanced in diverse aspects such as application performance and stability. The primary objective of the modular approaches in this study is to provide a language-independent method of adapting existing scientific software, so that applications written in different languages can be supported when implementing adaptation schemes. In particular, the emphasis is on Fortran, which has been a mainstream language for programming scientific applications. Three research questions are formulated in this thesis, aiming respectively to design and implement high-level abstractions for expressing adaptation strategies, to develop a dynamic tuning approach for parallel programs, and to support flexible runtime adaptation schemes. The applicability of the proposed approaches is demonstrated through example applications to real-world scientific software. / Ph. D.
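The general pattern, adaptation logic kept in separate code and transparently combined with an existing routine, can be sketched in Python (the thesis targets Fortran; the names and the timing-based strategy here are hypothetical):

    # Adaptation-by-wrapping sketch: the legacy routine is untouched;
    # the wrapper measures it and adjusts a parameter for the next call.
    import functools, time

    def adapt_timestep(solver):
        @functools.wraps(solver)
        def wrapper(state, dt):
            start = time.perf_counter()
            result = solver(state, dt)
            elapsed = time.perf_counter() - start
            # strategy lives here, not in the solver: halve dt if too slow
            wrapper.next_dt = dt * 0.5 if elapsed > 0.1 else dt
            return result
        wrapper.next_dt = None
        return wrapper

    @adapt_timestep
    def solve(state, dt):                        # stands in for legacy code
        return [s + dt for s in state]

    print(solve([0.0, 1.0], 0.01), solve.next_dt)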
50

A Distributed Approach to EpiFast using Apache Spark

Kannan, Vijayasarathy 04 August 2015
EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using the Message Passing Interface (MPI). However, it suffers from a few shortcomings with respect to the scale of the networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast, based on the Apache Spark big data processing engine, and Charm-EpiFast, based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could potentially benefit performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider the scale of the networks and the environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the first efforts to use Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches. / Master of Science
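A simplified Python sketch of one EpiFast-style time step (the graph and transmission probability are assumptions; the actual algorithm and its Spark and Charm++ mappings are described in the thesis): infected nodes transmit across contact-network edges with some probability.

    # One stochastic propagation step over a contact network.
    import random

    def step(edges, infected, susceptible, p=0.3, rng=random.Random(1)):
        newly = set()
        for u, v in edges:
            for src, dst in ((u, v), (v, u)):    # contacts work both ways
                if src in infected and dst in susceptible and rng.random() < p:
                    newly.add(dst)
        return infected | newly, susceptible - newly

    edges = [(0, 1), (1, 2), (2, 3)]
    inf, sus = {0}, {1, 2, 3}
    for _ in range(3):                           # three simulated time steps
        inf, sus = step(edges, inf, sus)
    print(sorted(inf))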
