  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Escalonamento por roubo de tarefas em sistemas Multi-CPU e Multi-GPU

Pinto, Vinícius Garcia January 2013 (has links)
In recent years, one of the alternatives adopted to increase the performance of high-performance computing systems has been the use of hybrid architectures. These architectures combine multicore processors with specialized coprocessors, such as GPUs, which act as accelerators for some types of operations. Current parallel programming models and tools, however, are not well suited to hybrid scenarios and produce poorly portable applications. Task parallelism, a generic and high-level programming paradigm, can be adopted in this scenario, but it requires dynamic scheduling algorithms such as work stealing.
In this context, this work presents a middleware (WORMS) that supports task parallelism with work-stealing scheduling on hybrid multi-CPU and multi-GPU systems. The middleware allows each task to provide both a CPU and a GPU implementation, deciding at runtime which implementation to execute according to the available hardware resources. The results obtained with WORMS show that, for some applications, it is possible to outperform both reference tools for CPU execution and reference tools for GPU execution.
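The two mechanisms this abstract combines — a per-worker work-stealing deque and tasks that carry both a CPU and a GPU implementation chosen at runtime — can be sketched as follows. This is a minimal, single-threaded Python illustration with hypothetical names (`Task`, `Worker`), not the WORMS API, which is not specified in the record:

```python
from collections import deque

class Task:
    """A task with two interchangeable implementations (hypothetical API,
    loosely modeled on the dual CPU/GPU tasks described in the abstract)."""
    def __init__(self, name, cpu_impl, gpu_impl):
        self.name = name
        self.cpu_impl = cpu_impl
        self.gpu_impl = gpu_impl

    def run(self, on_gpu):
        # At runtime, pick the implementation matching the executing resource.
        return self.gpu_impl() if on_gpu else self.cpu_impl()

class Worker:
    """Each worker owns a deque: it pushes and pops at the bottom (LIFO),
    while idle workers steal from the top (FIFO) -- the classic
    work-stealing discipline."""
    def __init__(self, is_gpu=False):
        self.tasks = deque()
        self.is_gpu = is_gpu

    def push(self, task):
        self.tasks.append(task)                              # bottom

    def pop(self):
        return self.tasks.pop() if self.tasks else None      # own end (bottom)

    def steal_from(self, victim):
        return victim.tasks.popleft() if victim.tasks else None  # top end

cpu = Worker(is_gpu=False)
gpu = Worker(is_gpu=True)
for i in range(4):
    cpu.push(Task(f"t{i}",
                  cpu_impl=lambda i=i: ("cpu", i),
                  gpu_impl=lambda i=i: ("gpu", i)))

results = []
# The GPU worker is idle, so it steals the oldest task from the CPU worker.
stolen = gpu.steal_from(cpu)
results.append(stolen.run(on_gpu=gpu.is_gpu))  # runs the GPU implementation
while (t := cpu.pop()) is not None:
    results.append(t.run(on_gpu=cpu.is_gpu))
print(results)
```

Note how the same task object yields a GPU result when stolen by the GPU worker and CPU results when executed locally; a real runtime would make the steal/pop race safe with atomics or locks.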
34

Exploiting data parallelism in artificial neural networks with Haskell

Heartsfield, Gregory Lynn 2009 August 1900 (has links)
Functional parallel programming techniques for feed-forward artificial neural networks trained using backpropagation learning are analyzed. In particular, the Data Parallel Haskell extension to the Glasgow Haskell Compiler is considered as a tool for achieving data parallelism. We find much potential and elegance in this method, and determine that a sufficiently large workload is critical in achieving real gains. Several additional features are recommended to increase usability and improve results on small datasets. / text
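The data parallelism this abstract analyzes — applying the same gradient computation to independent chunks of training examples and combining the results — can be shown in a language-neutral way. A minimal pure-Python sketch for a single linear neuron with squared-error loss (an illustration of the pattern, not the thesis's Haskell network):

```python
# Data parallelism over training examples: each chunk's gradient contribution
# is computed independently, then summed -- the pattern Data Parallel Haskell
# expresses with parallel arrays. Chunks run sequentially here for clarity;
# each call could run on a separate core.

def chunk_gradient(w, examples):
    # d/dw of 0.5 * (w*x - y)^2, summed over the chunk
    return sum((w * x - y) * x for x, y in examples)

def data_parallel_gradient(w, dataset, n_workers=4):
    size = (len(dataset) + n_workers - 1) // n_workers
    chunks = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    partials = [chunk_gradient(w, c) for c in chunks]  # independent work
    return sum(partials)                               # combine step

data = [(x, 2.0 * x) for x in range(8)]  # targets generated with w_true = 2
w = 0.0
for _ in range(100):
    w -= 0.01 * data_parallel_gradient(w, data)
print(round(w, 3))  # prints 2.0
```

The combine step is a plain sum because gradients are additive over examples; this is also why the abstract's observation holds that the workload per chunk must be large enough to amortize the parallel overhead.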
35

START : a parallel signal track analytical research tool for flexible and efficient analysis of genomic data

Zhu, Xinjie, 朱信杰 January 2015 (has links)
Signal Track Analytical Research Tool (START) is a parallel system for analyzing large-scale genomic data. Currently, genomic data analyses are usually performed using custom scripts developed by individual research groups and/or through the integrated use of multiple existing tools (such as BEDTools and Galaxy). The goals of START are 1) to provide a single tool that supports a wide spectrum of genomic data analyses commonly performed by analysts; and 2) to greatly simplify these analysis tasks by means of a simple declarative language (STQL) with which users only need to specify what they want to do, rather than the detailed computational steps of how the analysis task should be performed. START consists of four major components: 1) A declarative language called Signal Track Query Language (STQL), a SQL-like language specifically designed to suit the needs of analyzing genomic signal tracks. 2) An STQL processing system built on top of a large-scale distributed architecture, based on Hadoop distributed storage and the MapReduce Big Data processing framework, which processes each user query on multiple machines in parallel. 3) A simple and user-friendly web site that helps users construct and execute queries, upload/download compressed data files in various formats, manage stored data, queries, and analysis results, and share queries with other users. It also provides a complete help system, a detailed specification of STQL, and a large number of sample queries for users to learn STQL and try START easily. Private files and queries are not accessible by other users. 4) A repository of public data popularly used for large-scale genomic data analysis, including data from ENCODE and Roadmap Epigenomics, that users can use in their analyses. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
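The record does not show STQL's actual syntax, so the sketch below illustrates only the core relational operation such a signal-track language must support: intersecting two tracks of `(chrom, start, end)` intervals, the same primitive BEDTools exposes as `intersect`. Names and the commented query syntax are hypothetical:

```python
def intersect(track_a, track_b):
    """Return the overlapping regions of two interval tracks.
    Intervals are half-open (start inclusive, end exclusive)."""
    out = []
    for chrom_a, sa, ea in track_a:
        for chrom_b, sb, eb in track_b:
            # Two intervals on the same chromosome overlap iff each
            # starts before the other ends.
            if chrom_a == chrom_b and sa < eb and sb < ea:
                out.append((chrom_a, max(sa, sb), min(ea, eb)))
    return out

peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
genes = [("chr1", 150, 550), ("chr2", 0, 100)]
print(intersect(peaks, genes))
# A declarative STQL-style query for this might read (hypothetical syntax):
#   SELECT * FROM peaks p, genes g WHERE p OVERLAPS g
```

A production system would replace the quadratic nested loop with a sort-merge or interval-tree join, and — as the abstract describes — distribute the join across machines with MapReduce.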
36

Integer performance evaluation of the dynamically trace scheduled VLIW

De Souza, Alberto Ferreira January 1999 (has links)
No description available.
37

A PARALLEL MOLECULAR DYNAMICS PROGRAM FOR SIMULATION OF WATER IN ION CHANNELS

Mullapudi, Laxmi 24 April 2009 (has links)
From a modest beginning with a model of the dynamics of hard liquid spheres (Alder et al., 1957), molecular dynamics (MD) simulations have come to a point where complex biomolecules can be simulated with precision close to reality (Noskov et al., 2007). In this context, a parallel molecular dynamics program for the simulation of ion channels associated with cellular membranes has been developed. The parallel MD code is simple, efficient, and easily coupled to other codes, such as the hybrid molecular dynamics/Brownian dynamics (MD/BD) code developed for the study of protein interactions (Ying et al., 2005). The Atom Decomposition (AD) method was used to partition calculations on atoms across processors. One of the major impediments to using AD is the relatively large amount of data that must be communicated between processes (Plimpton et al., 1995); replicating only the positions of atoms eased the congestion created by communicating both force terms and atom positions between processes. The performance of the code was tested on KcsA, a bacterial potassium channel. The program was written in Fortran 90 with parallel functions from the mpich-1.2.7 library, and the idle time of processes was optimized by message-driven ordering of communication. The scaling of the parallel program with 2,000-60,000 atoms was determined and compared with the results obtained from the serial program. As expected, the parallel program scaled better than the serial program as the number of atoms in the simulation increased from 2,000 to 60,000. The performance of the parallel program was also tested on 4-15 processes for a system comprising 20,000 atoms, and the results were compared with those of the serial program; the parallel program scaled better as the number of processes increased from 4 to 15.
Compared with the serial program, which applied Newton's Third Law to calculate each force term once per pair of atoms, the parallel program scaled better on 6-15 processes for a physical system comprising 20,000 atoms.
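The Atom Decomposition rule the abstract relies on — give each process a fixed, near-equal block of atom indices, while every process keeps a replicated copy of all positions — reduces to a simple partitioning function. A sequential sketch (hypothetical helper, not the thesis's Fortran code):

```python
def atom_ranges(n_atoms, n_procs):
    """Split atom indices 0..n_atoms-1 into near-equal contiguous ranges,
    one per process; process p computes forces only for its own range."""
    base, extra = divmod(n_atoms, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

# 10 atoms on 4 processes: two processes get 3 atoms, two get 2.
print(atom_ranges(10, 4))  # prints [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because positions are replicated everywhere, each step only needs an all-gather of the owned position blocks — which is exactly the communication-volume saving over also exchanging force terms that the abstract describes.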
38

Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

Meng, Xiang January 2017 (has links)
The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allow optical structures to be scaled down to nanoscale size or even to the atomic level, far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties, and their interactions with quantum materials can produce important near-field and far-field optical responses. These optical properties have many important applications, ranging from efficient and tunable light sources, detectors, filters, modulators, and high-speed all-optical switches to next-generation classical and quantum computation and biophotonic medical sensors. This emerging field of nanoscience, known as nanophotonics, is highly interdisciplinary, requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling, and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales, where the nature of the nanostructured matter controls the interactions. In addition, fast advancements in computing capabilities, such as parallel computing, have become a critical element in investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, as the design of these devices requires extensive memory and extremely long core hours; distributed computing platforms built on parallel computing are therefore required for faster design processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques and uses computing machines to analyze and solve otherwise intractable scientific challenges.
In particular, parallel computing is a form of computation operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. In this dissertation, we report a series of new nanophotonic developments using advanced parallel computing techniques. The applications include structure optimizations at the nanoscale to control the electromagnetic response of materials and to manipulate nanoscale structures for enhanced field concentration, enabling breakthroughs in imaging and sensing systems (chapters 3 and 4) and improving the spatial-temporal resolution of spectroscopies (chapter 5). We also report investigations of the confinement of optical-matter interactions in the quantum mechanical regime, where size-dependent novel properties enhance a wide range of technologies, from tunable and efficient light sources and detectors to other nanophotonic elements with enhanced functionality (chapters 6 and 7).
39

Performance Modelling of Message-Passing Parallel Programs

Grove, Duncan January 2003 (has links)
Parallel computing is essential for solving very large scientific and engineering problems. An effective parallel computing solution requires an appropriate parallel machine and a well-optimised parallel program, both of which can be selected via performance modelling. This dissertation describes a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). Unlike previous techniques, the PEVPM system is relatively easy to use, inexpensive to apply and extremely accurate. It uses a novel bottom-up approach, where submodels of individual computation and communication events are dynamically constructed from data-dependencies, current contention levels and the performance distributions of low-level operations, which define performance variability in the face of contention. During model evaluation, the performance distribution attached to each submodel is sampled using Monte Carlo techniques, thus simulating the effects of contention. This allows the PEVPM to accurately simulate a program's execution structure, even if it is non-deterministic, and thus to predict its performance. Obtaining these performance distributions required the development of a new benchmarking tool, called MPIBench. Unlike previous tools, which simply measure average message-passing time over a large number of repeated message transfers, MPIBench uses a highly accurate and globally synchronised clock to measure the performance of individual communication operations. MPIBench was used to benchmark three parallel computers, which encompassed a wide range of network performance capabilities, namely those provided by Fast Ethernet, Myrinet and QsNet. Network contention, a problem ignored by most research in this area, was found to cause extensive performance variation during message-passing operations. For point-to-point communication, this variation was best described by Pearson 5 distributions. 
Collective communication operations could be modelled using their constituent point-to-point operations. In cases of severe contention, extreme outliers were common in the observed performance distributions; these were shown to be the result of lost messages and their subsequent retransmit timeouts. The highly accurate benchmark results provided by MPIBench were coupled with PEVPM models of a range of parallel programs and simulated by the PEVPM. These case studies proved that, unlike previous modelling approaches, the PEVPM technique successfully unites generality, flexibility, cost-effectiveness and accuracy in one performance modelling system for parallel programs. This makes it a valuable tool for the development of parallel computing solutions. / Thesis (Ph.D.)--Computer Science, 2003.
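The abstract's core modelling idea — sample each communication event's time from a measured distribution (Pearson type 5, i.e. inverse-gamma, for point-to-point messages) and accumulate the samples Monte Carlo-style — can be sketched with the standard library. The distribution parameters below are invented for illustration; they are not MPIBench measurements:

```python
import random

def pearson5(alpha, beta, rng):
    """Sample a Pearson type V (inverse-gamma) variate:
    if X ~ Gamma(shape=alpha, scale=1/beta), then 1/X ~ InvGamma(alpha, beta)."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

rng = random.Random(42)  # fixed seed for reproducibility
n_messages = 1000

# Monte Carlo estimate of the total time of a chain of sequential message
# transfers, each drawn from the (made-up) per-message time distribution.
total = sum(pearson5(alpha=3.0, beta=2.0e-4, rng=rng) for _ in range(n_messages))
mean_per_msg = total / n_messages
# Analytical mean of InvGamma(alpha, beta) is beta/(alpha-1) = 1e-4 s here.
print(f"mean per-message time: {mean_per_msg:.2e} s")
```

Repeating the whole accumulation many times would yield a distribution of predicted runtimes rather than a single number, which is how the PEVPM captures contention-driven variability instead of averaging it away.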
40

IPPM : Interactive parallel program monitor

Brandis, Robert Craig 08 1900 (has links) (PDF)
M.S. / Computer Science & Engineering / The tasks associated with designing and implementing parallel programs involve effectively partitioning the problem, defining an efficient control strategy, and mapping the design to a particular system. The task then becomes one of analyzing the program for correctness and stepwise refinement of its performance. New tools are needed to assist the programmer with these last two stages: metrics and methods of instrumentation are needed to help with behavior analysis (debugging) and performance analysis. First, current tools and analysis methods are reviewed, and then a set of models is proposed for analyzing parallel programs. The design of IPPM, based on these models, is then presented. IPPM is an interactive parallel program monitor for the Intel iPSC. It gives a post-mortem view of an iPSC program based on a script of events collected during execution. A user can observe changes in program state and synchronization, select statistics, interactively filter events, and time critical sequences.
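Post-mortem monitoring of the kind this record describes amounts to querying a recorded script of events. A minimal sketch of interactive filtering and statistics over such a trace (the event format is hypothetical; the iPSC trace layout is not given in the record):

```python
# A recorded "script of events" as IPPM would replay it after execution.
trace = [
    {"t": 0.001, "node": 0, "event": "send", "bytes": 512},
    {"t": 0.002, "node": 1, "event": "recv", "bytes": 512},
    {"t": 0.004, "node": 0, "event": "send", "bytes": 1024},
    {"t": 0.009, "node": 1, "event": "recv", "bytes": 1024},
]

def filter_events(trace, node=None, event=None):
    """Narrow the trace post-mortem, as an interactive event filter would."""
    return [e for e in trace
            if (node is None or e["node"] == node)
            and (event is None or e["event"] == event)]

sends = filter_events(trace, node=0, event="send")
stats = {"count": len(sends), "bytes": sum(e["bytes"] for e in sends)}
print(stats)  # prints {'count': 2, 'bytes': 1536}
```

Timing a critical sequence is the same query shape: filter to the matching send/recv pair and subtract their `t` fields.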
