261 |
The Implementation of A Fingerprint Enhancement System Based on GPU via CUDA
Yang, Kaiyuan; Wang, Fuliang. January 2017
In order to reduce the long execution time of an existing fingerprint enhancement system, a parallel implementation method based on GPU via CUDA is proposed. Firstly, the necessity and feasibility of employing parallel programming for the whole system are analyzed. Then the pre-processing, global analysis, local analysis and matched filtering stages of the fingerprint enhancement system are each designed, optimized and implemented using parallel computing technology via CUDA. Finally, numerous fingerprints from the FVC2000 databases are tested and the resulting execution time is compared with that of the CPU-based system. The results show that the execution time is significantly reduced by using the parallel implementation method based on GPU.
|
262 |
From dataflow-based video coding tools to dedicated embedded multi-core platforms / Depuis des outils de codage vidéo basés sur la programmation flux de données vers des plates-formes multi-coeur embarquées et dédiées
Yviquel, Hervé. 25 October 2013
The development of multimedia technology, along with the emergence of parallel architectures, has revived interest in dataflow programming for designing embedded systems. Indeed, dataflow programming offers a development approach flexible enough to build complex applications while expressing concurrency and parallelism explicitly. Paradoxically, most studies focus on static dataflow models of computation, even though a pragmatic development process requires the expressiveness and practicality of a programming language based on dynamic dataflow models, such as the language included in the Reconfigurable Video Coding framework. In this thesis, we describe a complete development environment for dataflow programming that eases multimedia development for embedded multi-core platforms. This development environment is built upon a modular software architecture that benefits from modern software engineering techniques such as meta-modeling and aspect-oriented programming. Then, we develop an optimized software implementation of dataflow programs targeting desktop and embedded multi-core platforms. Our implementation aims to bridge the gap between the practicality of the programming language and the efficiency of its execution. Finally, we present a set of runtime actor mapping and scheduling algorithms that enable the execution of dynamic dataflow programs on multi-core platforms with scalable performance.
|
263 |
Using the IBM Watson™ Dialog Service for Assisting Parallel Programming
Calvo, Adrián. January 2016
IBM Watson is on the verge of becoming a milestone in computer science, as it uses a new technology that relies on cognitive systems. IBM Watson is able to understand questions in natural language and give proper answers. The use of cognitive computing in parallel programming is an open research issue. Therefore, the objective of this project is to investigate how IBM Watson can help in parallel programming by using the Dialog Service. To answer our research question, an application has been built on the IBM Watson Dialog Service and a survey has been carried out. The results of our research demonstrate that the developed application offers valuable answers to the questions asked by a programmer, and the survey reveals that students would be interested in using it.
|
264 |
Real-Time Space-Time Adaptive Processing on the STI CELL Multiprocessor
Li, Yi-Hsien. January 2007
Space-Time Adaptive Processing (STAP) has been widely used in modern radar systems such as Ground Moving Target Indication (GMTI) systems in order to suppress jamming and interference. However, the high performance comes at the price of higher computational complexity, which requires extensive, powerful hardware. The new STI Cell Broadband Engine (CBE) processor, which combines a PowerPC core with eight streamlined high-performance SIMD processing engines, offers an opportunity to implement STAP baseband signal processing without any full-custom hardware. This paper presents the implementation of a STAP baseband signal processing flow on the state-of-the-art STI CELL multiprocessor, which enables the concept of Software-Defined Radar (SDR). The potential of the Cell BE processor is studied so that STAP kernel subroutines such as QR decomposition, Fast Fourier Transform (FFT), and FIR filtering are mapped to the SPE co-processors of the Cell BE processor with a variety of architecture-specific optimization techniques. This report starts with an overview of airborne radar techniques and then introduces standard STAP, specifically third-order Doppler-factored STAP. Next, it gives a thorough description of the Cell BE architecture, its programming tool chain, and parallel programming methods for the Cell BE. A later chapter discusses how STAP is implemented on the Cell BE processor and presents the simulation results. Furthermore, based on the results of earlier benchmarking, an optimized task partitioning and scheduling method is proposed to improve the overall performance.
|
265 |
Modelica PARallel benchmark suite (MPAR) - a test suite for evaluating the performance of parallel simulations of Modelica models
Hemmati Moghadam, Afshin. January 2011
Using the object-oriented, equation-based modeling language Modelica, it is possible to model and simulate computationally intensive models. To reduce the simulation time, a desirable approach is to perform the simulations on parallel multi-core platforms. Several works have pursued this goal so far; the most recent one adds explicit parallel programming constructs to the algorithmic parts of the Modelica language. This extension automatically generates parallel simulation code for execution on OpenCL-enabled platforms, and it has been implemented in the open-source OpenModelica environment. However, to ensure that this extension, as well as future developments regarding parallel simulations of Modelica models, is feasible, systematic benchmarking against a set of appropriate Modelica models is essential; this is the main focus of this thesis. A benchmark test suite is presented that contains computationally intensive Modelica models relevant for parallel simulations. The suite is used to evaluate the feasibility and measure the performance of the OpenCL code generated with the new Modelica language extension. In addition, several considerations and suggestions are given on how the modeler can efficiently parallelize sequential models to achieve better performance on OpenCL-enabled GPUs and multi-core CPUs. The measurements have been made for both sequential and parallel implementations of the benchmark suite, using code generated by the OpenModelica compiler, on different hardware configurations including single-core and multi-core CPUs as well as GPUs. The results obtained in this thesis show that simulating Modelica models using OpenCL as a target language is feasible. In addition, it is concluded that for models with large data sizes and a high degree of parallelism, it is possible to achieve considerable speedup on GPUs compared to single-core and multi-core CPUs.
|
266 |
A Selection of H.264 Encoder Components Implemented and Benchmarked on a Multi-core DSP Processor
Einemo, Jonas; Lundqvist, Magnus. January 2010
H.264 is a video coding standard which offers a high data compression rate at the cost of a high computational load. This thesis evaluates how well parts of the H.264 standard can be implemented for a new multi-core digital signal processor architecture called ePUMA. The thesis investigates whether real-time encoding of high-definition video sequences could be performed. The implementation consists of the motion estimation, motion compensation, discrete cosine transform, inverse discrete cosine transform, quantization and rescaling parts of the H.264 standard. Benchmarking is done using the ePUMA system simulator, and the results are compared to an implementation of an existing H.264 encoder for another multi-core processor architecture, the STI Cell. The results show that the selected parts of the H.264 encoder could be run on 6 calculation cores in 5 million cycles per frame. This setup leaves 2 calculation cores to run the remaining parts of the encoder.
|
267 |
Compiling the parallel programming language NestStep to the CELL processor
Holm, Magnus. January 2010
The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compile to form an executable program that runs on the multicore processor Cell Broadband Engine (Cell BE). The NestStep runtime system has been ported to the Cell BE and is available from the start of this project.
|
268 |
Schedule Based Code Generation for Parallel Processors
Nygård, Johan. January 2010
Dynamic model driven architecture (DMDA) is an architecture made to aid in the development of parallel computing code. This thesis is applied to an implementation of DMDA known as DMDA3, which should convert graphs of computations into efficient computation code, and it deals with the translation of Platform Specific Models (PSM) into running systems. Currently DMDA3 can generate schedules of operations but not finished code. This thesis describes a DMDA3 module that turns a schedule of operations into a runnable program. Code was obtained from the DMDA3 schedules by reflection, and a framework was built that allows generation of low-level language code from schedules. The module is written in Java and can currently generate C and Fortran code for computational tasks. Based on runtime tests for matrix multiplication algorithms, the generated code is almost as fast as handwritten code.
|
269 |
Optimizing MPI Collective Communication by Orthogonal Structures
Kühnemann, Matthias; Rauber, Thomas; Rünger, Gudula. 28 June 2007
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster, respectively. On a Cray T3E, too, a significant improvement can be obtained by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data parallel implementations of complex application programs without any other change to the computation and communication structure. Furthermore, we investigate how the execution time of orthogonal realizations can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
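One way to picture the two-level orthogonal restructuring mentioned in this abstract is as a two-phase broadcast on a logical processor grid. The Python sketch below is a pure simulation under that assumption: it is not MPI code and not the authors' implementation, and the names `orthogonal_broadcast` and `largest_group` as well as the row/column grid layout are hypothetical, chosen only for illustration.

```python
def orthogonal_broadcast(value, rows, cols, root=(0, 0)):
    """Simulate a two-phase broadcast on a rows x cols processor grid.

    Phase 1: the root sends the value along its own column, so one
    processor in every row (the row leader) holds it.
    Phase 2: every row leader broadcasts within its own row.
    (Hypothetical model for illustration; no real communication happens.)
    """
    grid = [[None] * cols for _ in range(rows)]
    root_row, root_col = root
    grid[root_row][root_col] = value

    # Phase 1: broadcast inside the root's column group.
    for r in range(rows):
        grid[r][root_col] = grid[root_row][root_col]

    # Phase 2: broadcast inside each row group, led by the column member.
    for r in range(rows):
        for c in range(cols):
            grid[r][c] = grid[r][root_col]
    return grid


def largest_group(rows, cols):
    """Size of the largest processor group taking part in a single phase."""
    return max(rows, cols)


if __name__ == "__main__":
    result = orthogonal_broadcast("payload", rows=4, cols=8, root=(2, 5))
    print(all(cell == "payload" for row in result for cell in row))
    print(largest_group(4, 8), "processors per phase instead of", 4 * 8)
```

In this model each phase involves a group of at most max(rows, cols) processors rather than all rows × cols at once, which is the intuition behind restructuring a flat collective operation into smaller group operations. Whether that yields a speedup on a real machine depends on the MPI implementation and the network, as the abstract notes.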
|
270 |
Combining Conditional Constant Propagation And Interprocedural Alias Analysis
Nandakumar, K S. 05 1900
No description available.
|